[GitHub] miklosgergely commented on a change in pull request #544: HIVE-16924 Support distinct in presence of Group By
miklosgergely commented on a change in pull request #544: HIVE-16924 Support distinct in presence of Group By
URL: https://github.com/apache/hive/pull/544#discussion_r260671073

## File path: ql/src/test/results/clientpositive/distinct_groupby_without_cbo.q.out ##
@@ -0,0 +1,1191 @@
+PREHOOK: query: explain select distinct count(*) from src1 where key in (128,146,150)
+PREHOOK: type: QUERY
+PREHOOK: Input: default@src1
+ A masked pattern was here
+POSTHOOK: query: explain select distinct count(*) from src1 where key in (128,146,150)
+POSTHOOK: type: QUERY
+POSTHOOK: Input: default@src1
+ A masked pattern was here
+STAGE DEPENDENCIES:
+ Stage-1 is a root stage
+ Stage-0 depends on stages: Stage-1
+
+STAGE PLANS:
+ Stage: Stage-1
+Map Reduce
+ Map Operator Tree:
+ TableScan
+alias: src1
+filterExpr: (key) IN (128, 146, 150) (type: boolean)
+Statistics: Num rows: 25 Data size: 2150 Basic stats: COMPLETE Column stats: COMPLETE
+Filter Operator
+ predicate: (key) IN (128, 146, 150) (type: boolean)
+ Statistics: Num rows: 5 Data size: 430 Basic stats: COMPLETE Column stats: COMPLETE
+ Select Operator
+Statistics: Num rows: 5 Data size: 430 Basic stats: COMPLETE Column stats: COMPLETE
+Group By Operator
+ aggregations: count()
+ mode: hash
+ outputColumnNames: _col0
+ Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
+ Reduce Output Operator
+sort order:
+Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
+value expressions: _col0 (type: bigint)
+ Reduce Operator Tree:
+Group By Operator
+ aggregations: count(VALUE._col0)
+ mode: mergepartial
+ outputColumnNames: _col0
+ Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
+ File Output Operator
+compressed: false
+Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
+table:
+input format: org.apache.hadoop.mapred.SequenceFileInputFormat
+output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
+serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
+
+ Stage: Stage-0
+Fetch Operator
+ limit: -1
+ Processor Tree:
+ListSink
+
+PREHOOK: query: select distinct count(*) from src1 where key in (128,146,150)
+PREHOOK: type: QUERY
+PREHOOK: Input: default@src1
+ A masked pattern was here
+POSTHOOK: query: select distinct count(*) from src1 where key in (128,146,150)
+POSTHOOK: type: QUERY
+POSTHOOK: Input: default@src1
+ A masked pattern was here
+3
+PREHOOK: query: explain select distinct * from (select distinct count(*) from src1 where key in (128,146,150)) as T
+PREHOOK: type: QUERY
+PREHOOK: Input: default@src1
+ A masked pattern was here
+POSTHOOK: query: explain select distinct * from (select distinct count(*) from src1 where key in (128,146,150)) as T
+POSTHOOK: type: QUERY
+POSTHOOK: Input: default@src1
+ A masked pattern was here
+STAGE DEPENDENCIES:
+ Stage-1 is a root stage
+ Stage-2 depends on stages: Stage-1
+ Stage-0 depends on stages: Stage-2
+
+STAGE PLANS:
+ Stage: Stage-1
+Map Reduce
+ Map Operator Tree:
+ TableScan
+alias: src1
+filterExpr: (key) IN (128, 146, 150) (type: boolean)
+Statistics: Num rows: 25 Data size: 2150 Basic stats: COMPLETE Column stats: COMPLETE
+Filter Operator
+ predicate: (key) IN (128, 146, 150) (type: boolean)
+ Statistics: Num rows: 5 Data size: 430 Basic stats: COMPLETE Column stats: COMPLETE
+ Select Operator
+Statistics: Num rows: 5 Data size: 430 Basic stats: COMPLETE Column stats: COMPLETE
+Group By Operator
+ aggregations: count()
+ mode: hash
+ outputColumnNames: _col0
+ Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
+ Reduce Output Operator
+sort order:
+Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
+value expressions: _col0 (type: bigint)
+ Reduce Operator Tree:
+Group By Operator
+ aggregations: count(VALUE._col0)
+ mode: mergepartial
+ outputColumnNames: _col0
+ Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
+ Group By Operator
+keys: _col0 (type: bigint)
+mode: hash
+
[GitHub] miklosgergely commented on a change in pull request #544: HIVE-16924 Support distinct in presence of Group By
miklosgergely commented on a change in pull request #544: HIVE-16924 Support distinct in presence of Group By
URL: https://github.com/apache/hive/pull/544#discussion_r260656405

## File path: ql/src/test/results/clientpositive/distinct_groupby_without_cbo.q.out ##
@@ -0,0 +1,2018 @@
+PREHOOK: query: explain select distinct count(a.value) from src a group by a.key
+PREHOOK: type: QUERY
+PREHOOK: Input: default@src
+ A masked pattern was here
+POSTHOOK: query: explain select distinct count(a.value) from src a group by a.key
+POSTHOOK: type: QUERY
+POSTHOOK: Input: default@src
+ A masked pattern was here
+STAGE DEPENDENCIES:
+ Stage-1 is a root stage
+ Stage-0 depends on stages: Stage-1
+
+STAGE PLANS:
+ Stage: Stage-1
+Map Reduce
+ Map Operator Tree:
+ TableScan
+alias: a
+Statistics: Num rows: 500 Data size: 89000 Basic stats: COMPLETE Column stats: COMPLETE
+Select Operator
+ expressions: key (type: string), value (type: string)
+ outputColumnNames: key, value
+ Statistics: Num rows: 500 Data size: 89000 Basic stats: COMPLETE Column stats: COMPLETE
+ Group By Operator
+aggregations: count(value)
+keys: key (type: string)
+mode: hash
+outputColumnNames: _col0, _col1
+Statistics: Num rows: 250 Data size: 23750 Basic stats: COMPLETE Column stats: COMPLETE
+Reduce Output Operator
+ key expressions: _col0 (type: string)
+ sort order: +
+ Map-reduce partition columns: _col0 (type: string)
+ Statistics: Num rows: 250 Data size: 23750 Basic stats: COMPLETE Column stats: COMPLETE
+ value expressions: _col1 (type: bigint)
+ Execution mode: vectorized
+ Reduce Operator Tree:
+Group By Operator
+ aggregations: count(VALUE._col0)
+ keys: KEY._col0 (type: string)
+ mode: mergepartial
+ outputColumnNames: _col0, _col1
+ Statistics: Num rows: 250 Data size: 23750 Basic stats: COMPLETE Column stats: COMPLETE
+ Select Operator
+expressions: _col1 (type: bigint)
+outputColumnNames: _col0
+Statistics: Num rows: 250 Data size: 2000 Basic stats: COMPLETE Column stats: COMPLETE
+File Output Operator
+ compressed: false
+ Statistics: Num rows: 250 Data size: 2000 Basic stats: COMPLETE Column stats: COMPLETE
+ table:
+ input format: org.apache.hadoop.mapred.SequenceFileInputFormat
+ output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
+ serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
+
+ Stage: Stage-0
+Fetch Operator
+ limit: -1
+ Processor Tree:
+ListSink
+
+PREHOOK: query: select distinct count(a.value) from src a group by a.key
+PREHOOK: type: QUERY
+PREHOOK: Input: default@src
+ A masked pattern was here
+POSTHOOK: query: select distinct count(a.value) from src a group by a.key
+POSTHOOK: type: QUERY
+POSTHOOK: Input: default@src
+ A masked pattern was here
+3
+1
+2
+2
+2
+1

Review comment:
This is removed because of the error. For now, we don't support DISTINCT together with GROUP BY and an aggregate function when CBO is not enabled.

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

With regards,
Apache Git Services
[GitHub] miklosgergely commented on a change in pull request #544: HIVE-16924 Support distinct in presence of Group By
miklosgergely commented on a change in pull request #544: HIVE-16924 Support distinct in presence of Group By
URL: https://github.com/apache/hive/pull/544#discussion_r260656545

## File path: ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java ##
@@ -4195,21 +4195,18 @@ public static long unsetBit(long bitmap, int bitIdx) {
    * DISTINCT, if present, will be handled when generating the SELECT.
    */
   List getGroupByForClause(QBParseInfo parseInfo, String dest) throws SemanticException {
-    // When *not* invoked by CalcitePlanner, return the DISTINCT as a GBY
-    // CBO will handle the DISTINCT in CalcitePlannerAction.genSelectLogicalPlan
     ASTNode selectExpr = parseInfo.getSelForClause(dest);
     Collection aggregateFunction = parseInfo.getDestToAggregationExprs().get(dest).values();
-    if (isSelectDistinct(selectExpr) && !isGroupBy(selectExpr) && !isAggregateInSelect(selectExpr, aggregateFunction)) {
+    if (isSelectDistinct(selectExpr) && !hasGroupBySibling(selectExpr) &&
+        !isAggregateInSelect(selectExpr, aggregateFunction)) {
       List result = new ArrayList(selectExpr == null ? 0 : selectExpr.getChildCount());

Review comment:
Removed.
[GitHub] miklosgergely commented on a change in pull request #544: HIVE-16924 Support distinct in presence of Group By
miklosgergely commented on a change in pull request #544: HIVE-16924 Support distinct in presence of Group By
URL: https://github.com/apache/hive/pull/544#discussion_r260387274

## File path: ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java ##
@@ -4194,27 +4191,29 @@ public static long unsetBit(long bitmap, int bitIdx) {
   }
   /**
-   * This function is a wrapper of parseInfo.getGroupByForClause which
-   * automatically translates SELECT DISTINCT a,b,c to SELECT a,b,c GROUP BY
-   * a,b,c.
+   * Returns the GBY, if present;
+   * DISTINCT, if present, will be handled when generating the SELECT.
    */
   List getGroupByForClause(QBParseInfo parseInfo, String dest) throws SemanticException {
-    if (parseInfo.getSelForClause(dest).getToken().getType() == HiveParser.TOK_SELECTDI) {
-      ASTNode selectExprs = parseInfo.getSelForClause(dest);
-      List result = new ArrayList(selectExprs == null ? 0
-          : selectExprs.getChildCount());
-      if (selectExprs != null) {
-        for (int i = 0; i < selectExprs.getChildCount(); ++i) {
-          if (((ASTNode) selectExprs.getChild(i)).getToken().getType() == HiveParser.QUERY_HINT) {
+    // When *not* invoked by CalcitePlanner, return the DISTINCT as a GBY
+    // CBO will handle the DISTINCT in CalcitePlannerAction.genSelectLogicalPlan
+    ASTNode selectExpr = parseInfo.getSelForClause(dest);
+    Collection aggregateFunction = parseInfo.getDestToAggregationExprs().get(dest).values();
+    if (isSelectDistinct(selectExpr) && !isGroupBy(selectExpr) && !isAggregateInSelect(selectExpr, aggregateFunction)) {

Review comment:
As we agreed, DISTINCT with an aggregate function and with GROUP BY will be supported only if CBO is enabled.
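For readers following the condition in the added lines, it can be read as a small truth table. The following is only a minimal sketch, assuming the results of the three helper checks are already available as booleans; `DistinctRewriteRule` and `rewriteDistinctAsGroupBy` are hypothetical names used for illustration and are not part of SemanticAnalyzer.

```java
// Hypothetical helper, not Hive code: models the non-CBO decision as a pure function.
final class DistinctRewriteRule {
  private DistinctRewriteRule() {}

  // True only for a plain SELECT DISTINCT, i.e. one with neither an explicit
  // GROUP BY nor an aggregate function in the select list. Only in that case
  // is the DISTINCT translated into a GROUP BY over the selected expressions.
  static boolean rewriteDistinctAsGroupBy(boolean selectDistinct,
                                          boolean hasGroupBy,
                                          boolean hasAggregate) {
    return selectDistinct && !hasGroupBy && !hasAggregate;
  }
}
```

Under this reading, a query such as "select distinct count(a.value) from src a group by a.key" does not take the rewrite branch, which matches the comment that such queries are supported only when CBO is enabled.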
[GitHub] miklosgergely commented on a change in pull request #544: HIVE-16924 Support distinct in presence of Group By
miklosgergely commented on a change in pull request #544: HIVE-16924 Support distinct in presence of Group By
URL: https://github.com/apache/hive/pull/544#discussion_r259958209

## File path: ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java ##
@@ -4230,6 +4229,34 @@ public static long unsetBit(long bitmap, int bitIdx) {
     }
   }
+  protected boolean isGroupBy(ASTNode expr) {
+    boolean isGroupBy = false;
+    if (expr.getParent() != null && expr.getParent() instanceof Node)
+    for (Node sibling : ((Node)expr.getParent()).getChildren()) {
+      isGroupBy |= sibling instanceof ASTNode && ((ASTNode)sibling).getType() == HiveParser.TOK_GROUPBY;
+    }
+
+    return isGroupBy;
+  }
+
+  protected boolean isSelectDistinct(ASTNode expr) {
+    return expr.getType() == HiveParser.TOK_SELECTDI;
+  }
+
+  protected boolean isAggregateInSelect(Node node, Collection aggregateFunction) {
+    if (node.getChildren() == null) {
+      return false;
+    }
+
+    for (Node child : node.getChildren()) {

Review comment:
I doubt there is any. The example above is not valid; it fails with:
Unsupported SubQuery Expression Invalid subquery. Subquery with DISTINCT clause is not supported!
[GitHub] miklosgergely commented on a change in pull request #544: HIVE-16924 Support distinct in presence of Group By
miklosgergely commented on a change in pull request #544: HIVE-16924 Support distinct in presence of Group By
URL: https://github.com/apache/hive/pull/544#discussion_r259953429

## File path: ql/src/test/queries/clientpositive/distinct_groupby.q ##
@@ -0,0 +1,57 @@
+--! qt:dataset:src1
+

Review comment:
Adding q tests for non-CBO as well; good idea!
[GitHub] miklosgergely commented on a change in pull request #544: HIVE-16924 Support distinct in presence of Group By
miklosgergely commented on a change in pull request #544: HIVE-16924 Support distinct in presence of Group By
URL: https://github.com/apache/hive/pull/544#discussion_r259953166

## File path: ql/src/test/results/clientpositive/distinct_groupby.q.out ##
@@ -0,0 +1,1530 @@
+PREHOOK: query: explain select distinct count(*) from src1 where key in (128,146,150)
+PREHOOK: type: QUERY
+PREHOOK: Input: default@src1
+ A masked pattern was here
+POSTHOOK: query: explain select distinct count(*) from src1 where key in (128,146,150)
+POSTHOOK: type: QUERY
+POSTHOOK: Input: default@src1
+ A masked pattern was here
+STAGE DEPENDENCIES:
+ Stage-1 is a root stage
+ Stage-0 depends on stages: Stage-1
+
+STAGE PLANS:
+ Stage: Stage-1
+Map Reduce
+ Map Operator Tree:
+ TableScan
+alias: src1
+filterExpr: (UDFToDouble(key)) IN (128.0D, 146.0D, 150.0D) (type: boolean)
+Statistics: Num rows: 25 Data size: 2150 Basic stats: COMPLETE Column stats: COMPLETE
+Filter Operator
+ predicate: (UDFToDouble(key)) IN (128.0D, 146.0D, 150.0D) (type: boolean)
+ Statistics: Num rows: 12 Data size: 1032 Basic stats: COMPLETE Column stats: COMPLETE
+ Select Operator
+Statistics: Num rows: 12 Data size: 1032 Basic stats: COMPLETE Column stats: COMPLETE
+Group By Operator
+ aggregations: count()
+ mode: hash
+ outputColumnNames: _col0
+ Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
+ Reduce Output Operator
+sort order:
+Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
+value expressions: _col0 (type: bigint)
+ Execution mode: vectorized
+ Reduce Operator Tree:
+Group By Operator
+ aggregations: count(VALUE._col0)
+ mode: mergepartial
+ outputColumnNames: _col0
+ Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
+ File Output Operator
+compressed: false
+Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
+table:
+input format: org.apache.hadoop.mapred.SequenceFileInputFormat
+output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
+serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
+
+ Stage: Stage-0
+Fetch Operator
+ limit: -1
+ Processor Tree:
+ListSink
+
+PREHOOK: query: select distinct count(*) from src1 where key in (128,146,150)
+PREHOOK: type: QUERY
+PREHOOK: Input: default@src1
+ A masked pattern was here
+POSTHOOK: query: select distinct count(*) from src1 where key in (128,146,150)
+POSTHOOK: type: QUERY
+POSTHOOK: Input: default@src1
+ A masked pattern was here
+3
+PREHOOK: query: explain select distinct * from (select distinct count(*) from src1 where key in (128,146,150)) as T
+PREHOOK: type: QUERY
+PREHOOK: Input: default@src1
+ A masked pattern was here
+POSTHOOK: query: explain select distinct * from (select distinct count(*) from src1 where key in (128,146,150)) as T
+POSTHOOK: type: QUERY
+POSTHOOK: Input: default@src1
+ A masked pattern was here
+STAGE DEPENDENCIES:
+ Stage-1 is a root stage
+ Stage-0 depends on stages: Stage-1
+
+STAGE PLANS:
+ Stage: Stage-1
+Map Reduce
+ Map Operator Tree:
+ TableScan
+alias: src1
+filterExpr: (UDFToDouble(key)) IN (128.0D, 146.0D, 150.0D) (type: boolean)
+Statistics: Num rows: 25 Data size: 2150 Basic stats: COMPLETE Column stats: COMPLETE
+Filter Operator
+ predicate: (UDFToDouble(key)) IN (128.0D, 146.0D, 150.0D) (type: boolean)
+ Statistics: Num rows: 12 Data size: 1032 Basic stats: COMPLETE Column stats: COMPLETE
+ Select Operator
+Statistics: Num rows: 12 Data size: 1032 Basic stats: COMPLETE Column stats: COMPLETE
+Group By Operator
+ aggregations: count()
+ mode: hash
+ outputColumnNames: _col0
+ Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
+ Reduce Output Operator
+sort order:
+Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
+value expressions: _col0 (type: bigint)
+ Execution mode: vectorized
+ Reduce Operator Tree:
+Group By Operator
+ aggregations: count(VALUE._col0)
+ mode: mergepartial
+ outputColumnNames: _col0
+ Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE
[GitHub] miklosgergely commented on a change in pull request #544: HIVE-16924 Support distinct in presence of Group By
miklosgergely commented on a change in pull request #544: HIVE-16924 Support distinct in presence of Group By
URL: https://github.com/apache/hive/pull/544#discussion_r259857044

## File path: ql/src/test/results/clientpositive/distinct_groupby.q.out ##
@@ -0,0 +1,1530 @@
[GitHub] miklosgergely commented on a change in pull request #544: HIVE-16924 Support distinct in presence of Group By
miklosgergely commented on a change in pull request #544: HIVE-16924 Support distinct in presence of Group By
URL: https://github.com/apache/hive/pull/544#discussion_r259839288

## File path: ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java ##
@@ -4230,6 +4229,34 @@ public static long unsetBit(long bitmap, int bitIdx) {
     }
   }
+  protected boolean isGroupBy(ASTNode expr) {
+    boolean isGroupBy = false;
+    if (expr.getParent() != null && expr.getParent() instanceof Node)
+    for (Node sibling : ((Node)expr.getParent()).getChildren()) {
+      isGroupBy |= sibling instanceof ASTNode && ((ASTNode)sibling).getType() == HiveParser.TOK_GROUPBY;
+    }
+
+    return isGroupBy;
+  }
+
+  protected boolean isSelectDistinct(ASTNode expr) {
+    return expr.getType() == HiveParser.TOK_SELECTDI;
+  }
+
+  protected boolean isAggregateInSelect(Node node, Collection aggregateFunction) {
+    if (node.getChildren() == null) {
+      return false;
+    }
+
+    for (Node child : node.getChildren()) {

Review comment:
I don't see how: all the nodes under the SELECT DISTINCT node are expressions that may be aggregations. The SELECT DISTINCT node may contain things like "1 + count(*)", in which case the aggregation sits at a lower level of the tree.
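To illustrate the point about "1 + count(*)", here is a minimal, self-contained sketch of a recursive search that finds an aggregation at any depth below the select node. It is an assumption about the general shape of such a check, not the body of isAggregateInSelect (which is cut off in the quoted diff); `SketchNode` and `AggregateSearch` are hypothetical stand-ins for Hive's ASTNode and the analyzer.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Hypothetical stand-in for an AST node; it holds only what the sketch needs.
final class SketchNode {
  final String text;
  final List<SketchNode> children = new ArrayList<>();

  SketchNode(String text, SketchNode... kids) {
    this.text = text;
    for (SketchNode kid : kids) {
      children.add(kid);
    }
  }
}

final class AggregateSearch {
  private AggregateSearch() {}

  // Returns true if 'node' itself, or any node below it, is one of the known
  // aggregate nodes. Nodes are compared by identity, on the assumption that the
  // collection holds the very nodes that appear in the tree.
  static boolean containsAggregate(SketchNode node, Collection<SketchNode> aggregates) {
    for (SketchNode aggregate : aggregates) {
      if (aggregate == node) {
        return true;
      }
    }
    // Descend: in "1 + count(*)" the aggregate is a grandchild of the select expression.
    for (SketchNode child : node.children) {
      if (containsAggregate(child, aggregates)) {
        return true;
      }
    }
    return false;
  }
}
```

A check that inspects only the direct children of the select node would miss the nested case, which is why the sketch recurses over the whole subtree.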
[GitHub] miklosgergely commented on a change in pull request #544: HIVE-16924 Support distinct in presence of Group By
miklosgergely commented on a change in pull request #544: HIVE-16924 Support distinct in presence of Group By
URL: https://github.com/apache/hive/pull/544#discussion_r259837145

## File path: ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java ##
@@ -4230,6 +4229,34 @@ public static long unsetBit(long bitmap, int bitIdx) {
     }
   }
+  protected boolean isGroupBy(ASTNode expr) {

Review comment:
Renamed to hasGroupBySibling(ASTNode selectExpr).
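The new name describes what the check does: it looks at the siblings of the select node, not at the node itself. A minimal sketch of that idea follows, assuming the GROUP BY clause appears as a sibling of the select node under a common parent; `SiblingNode` and `GroupByCheck` are hypothetical stand-ins, and the string token names replace the HiveParser constants purely for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical tree node with a parent pointer; not Hive's ASTNode.
final class SiblingNode {
  final String type;
  SiblingNode parent;
  final List<SiblingNode> children = new ArrayList<>();

  SiblingNode(String type) {
    this.type = type;
  }

  // Attaches a child and returns this node so calls can be chained.
  SiblingNode add(SiblingNode child) {
    child.parent = this;
    children.add(child);
    return this;
  }
}

final class GroupByCheck {
  private GroupByCheck() {}

  // True if any child of the select node's parent is a GROUP BY node.
  // Returns early on the first match instead of OR-ing over every sibling.
  static boolean hasGroupBySibling(SiblingNode selectExpr) {
    if (selectExpr.parent == null) {
      return false;
    }
    for (SiblingNode sibling : selectExpr.parent.children) {
      if ("TOK_GROUPBY".equals(sibling.type)) {
        return true;
      }
    }
    return false;
  }
}
```

The early return is a design choice of the sketch; the quoted method accumulates the result with `|=` and returns it after the loop, which gives the same answer.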
[GitHub] miklosgergely commented on a change in pull request #544: HIVE-16924 Support distinct in presence of Group By
miklosgergely commented on a change in pull request #544: HIVE-16924 Support distinct in presence of Group By
URL: https://github.com/apache/hive/pull/544#discussion_r259832367

## File path: ql/src/test/results/clientpositive/distinct_groupby.q.out ##
@@ -0,0 +1,1530 @@
[GitHub] miklosgergely commented on a change in pull request #544: HIVE-16924 Support distinct in presence of Group By
miklosgergely commented on a change in pull request #544: HIVE-16924 Support distinct in presence of Group By
URL: https://github.com/apache/hive/pull/544#discussion_r259824341

## File path: ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java ##
@@ -4194,27 +4191,29 @@ public static long unsetBit(long bitmap, int bitIdx) {
   }
   /**
-   * This function is a wrapper of parseInfo.getGroupByForClause which
-   * automatically translates SELECT DISTINCT a,b,c to SELECT a,b,c GROUP BY
-   * a,b,c.
+   * Returns the GBY, if present;
+   * DISTINCT, if present, will be handled when generating the SELECT.
    */
   List getGroupByForClause(QBParseInfo parseInfo, String dest) throws SemanticException {
-    if (parseInfo.getSelForClause(dest).getToken().getType() == HiveParser.TOK_SELECTDI) {
-      ASTNode selectExprs = parseInfo.getSelForClause(dest);
-      List result = new ArrayList(selectExprs == null ? 0
-          : selectExprs.getChildCount());
-      if (selectExprs != null) {
-        for (int i = 0; i < selectExprs.getChildCount(); ++i) {
-          if (((ASTNode) selectExprs.getChild(i)).getToken().getType() == HiveParser.QUERY_HINT) {
+    // When *not* invoked by CalcitePlanner, return the DISTINCT as a GBY
+    // CBO will handle the DISTINCT in CalcitePlannerAction.genSelectLogicalPlan
+    ASTNode selectExpr = parseInfo.getSelForClause(dest);
+    Collection aggregateFunction = parseInfo.getDestToAggregationExprs().get(dest).values();
+    if (isSelectDistinct(selectExpr) && !isGroupBy(selectExpr) && !isAggregateInSelect(selectExpr, aggregateFunction)) {
+      List result = new ArrayList(selectExpr == null ? 0 : selectExpr.getChildCount());
+      if (selectExpr != null) {

Review comment:
Agreed, removed.
[GitHub] miklosgergely commented on a change in pull request #544: HIVE-16924 Support distinct in presence of Group By
miklosgergely commented on a change in pull request #544: HIVE-16924 Support distinct in presence of Group By
URL: https://github.com/apache/hive/pull/544#discussion_r259823592

## File path: ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java ##
@@ -3730,7 +3697,9 @@ private RelNode genGBLogicalPlan(QB qb, RelNode srcRel) throws SemanticException
         ASTNode node = (ASTNode) selExprList.getChild(0).getChild(0);
         if (node.getToken().getType() == HiveParser.TOK_ALLCOLREF) {
           // As we said before, here we use genSelectLogicalPlan to rewrite AllColRef
-          srcRel = genSelectLogicalPlan(qb, srcRel, srcRel, null, null, true).getKey();
+          if (!(isSelectDistinct(selExprList) && isGroupBy(selExprList))) {

Review comment:
You are right, fixed.