okumin commented on a change in pull request #1531:
URL: https://github.com/apache/hive/pull/1531#discussion_r500420813
##########
File path:
ql/src/test/results/clientpositive/llap/annotate_stats_lateral_view_join.q.out
##########
@@ -503,14 +503,14 @@ STAGE PLANS:
Statistics: Num rows: 1 Data size: 376 Basic stats: COMPLETE Column stats: COMPLETE
Lateral View Join Operator
outputColumnNames: _col0, _col1, _col5, _col6
- Statistics: Num rows: 0 Data size: 24 Basic stats: PARTIAL Column stats: NONE
+ Statistics: Num rows: 0 Data size: 24 Basic stats: PARTIAL Column stats: COMPLETE
Review comment:
With clone, the following condition is no longer satisfied because the basic
stats of the parent operators are PARTIAL:
https://github.com/apache/hive/blob/91e492de239427fc1e38e5e4350cfdce409ebb70/ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java#L2969
##########
File path:
ql/src/test/results/clientpositive/llap/annotate_stats_lateral_view_join.q.out
##########
@@ -503,14 +503,14 @@ STAGE PLANS:
Statistics: Num rows: 1 Data size: 376 Basic stats: COMPLETE Column stats: COMPLETE
Lateral View Join Operator
outputColumnNames: _col0, _col1, _col5, _col6
- Statistics: Num rows: 0 Data size: 24 Basic stats: PARTIAL Column stats: NONE
+ Statistics: Num rows: 0 Data size: 24 Basic stats: PARTIAL Column stats: COMPLETE
Review comment:
BTW, it would be better if the UDTF rule set num rows to one whenever the
estimate would otherwise become zero.
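To illustrate the suggestion, here is a minimal sketch of flooring a row estimate at one. The class `RowCountFloor` and the method `atLeastOne` are hypothetical names used only for illustration; they are not part of Hive, and the actual UDTF rule may need a different fix.

```java
public class RowCountFloor {
    // Hypothetical helper, not Hive API: never report fewer than one row,
    // so that later ratios such as T(right) / T(left) stay well defined.
    static long atLeastOne(long estimatedRows) {
        return Math.max(1L, estimatedRows);
    }

    public static void main(String[] args) {
        System.out.println(atLeastOne(0));   // prints 1
        System.out.println(atLeastOne(42));  // prints 42
    }
}
```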
##########
File path:
ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java
##########
@@ -2921,6 +2920,97 @@ public Object process(Node nd, Stack<Node> stack, NodeProcessorCtx procCtx,
}
}
+  /**
+   * LateralViewJoinOperator changes the data size and column level statistics.
+   *
+   * A diagram of LATERAL VIEW.
+   *
+   *   [Lateral View Forward]
+   *          /     \
+   *    [Select]  [Select]
+   *        |        |
+   *        |     [UDTF]
+   *        \       /
+   *   [Lateral View Join]
+   *
+   * For each row of the source, the left branch just picks columns and the right branch processes UDTF.
+   * And then LVJ joins a row from the left branch with rows from the right branch.
+   * The join has one-to-many relationship since UDTF can generate multiple rows.
+   *
+   * This rule multiplies the stats from the left branch by T(right) / T(left) and sums up the both sides.
+   */
+  public static class LateralViewJoinStatsRule extends DefaultStatsRule implements SemanticNodeProcessor {
+    @Override
+    public Object process(Node nd, Stack<Node> stack, NodeProcessorCtx procCtx,
+        Object... nodeOutputs) throws SemanticException {
+      final LateralViewJoinOperator lop = (LateralViewJoinOperator) nd;
+      final AnnotateStatsProcCtx aspCtx = (AnnotateStatsProcCtx) procCtx;
+      final HiveConf conf = aspCtx.getConf();
+
+      if (!isAllParentsContainStatistics(lop)) {
+        return null;
+      }
+
+      final List<Operator<? extends OperatorDesc>> parents = lop.getParentOperators();
+      if (parents.size() != 2) {
+        LOG.warn("LateralViewJoinOperator should have just two parents but actually has "
+            + parents.size() + " parents.");
+        return null;
+      }
+
+      final Statistics selectStats = parents.get(LateralViewJoinOperator.SELECT_TAG).getStatistics().clone();
Review comment:
Nothing was unexpectedly broken. CI failed, but the failure does not appear
to be related to this PR...
-
https://github.com/apache/hive/pull/1531/commits/91e492de239427fc1e38e5e4350cfdce409ebb70
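For readers following the Javadoc of the rule above, here is a rough sketch of the arithmetic it describes: scale the left (select) branch by T(right) / T(left), then add the right (UDTF) branch. `SimpleStats` and `estimate` are hypothetical stand-ins, not Hive's `Statistics` API. Taking the output row count as T(right) and flooring T(left) at one are assumptions of this sketch, not details confirmed by the PR.

```java
public class LateralViewJoinSketch {
    /** Hypothetical stand-in for a statistics object: row count and data size in bytes. */
    static final class SimpleStats {
        final long numRows;
        final long dataSize;

        SimpleStats(long numRows, long dataSize) {
            this.numRows = numRows;
            this.dataSize = dataSize;
        }
    }

    /** Scales the left branch by T(right) / T(left) and adds the right branch. */
    static SimpleStats estimate(SimpleStats left, SimpleStats right) {
        // Assumption: treat an empty left side as one row to avoid dividing by zero.
        long leftRows = Math.max(1L, left.numRows);
        double factor = (double) right.numRows / leftRows;

        // Each left row is repeated once per UDTF output row it joins with.
        long scaledLeftSize = Math.round(left.dataSize * factor);
        long dataSize = Math.addExact(scaledLeftSize, right.dataSize);

        // Assumption: the join emits one row per row of the right branch.
        return new SimpleStats(right.numRows, dataSize);
    }

    public static void main(String[] args) {
        SimpleStats left = new SimpleStats(10, 100);   // 10 source rows, 100 bytes
        SimpleStats right = new SimpleStats(30, 240);  // UDTF emits 30 rows, 240 bytes
        SimpleStats out = estimate(left, right);
        System.out.println(out.numRows);   // prints 30
        System.out.println(out.dataSize);  // prints 540
    }
}
```

In the example, the factor is 30 / 10 = 3, so the left side contributes 300 bytes and the right side 240 bytes.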
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]