[Impala-ASF-CR] IMPALA-9495: Support struct in select list for ORC tables
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17638 ) Change subject: IMPALA-9495: Support struct in select list for ORC tables .. Patch Set 17: Verified-1 Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/7468/ -- To view, visit http://gerrit.cloudera.org:8080/17638 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0fbe56bdcd372b72e99c0195d87a818e7fa4bc3a Gerrit-Change-Number: 17638 Gerrit-PatchSet: 17 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Mon, 13 Sep 2021 20:55:55 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-2581: LIMIT can be propagated down into some aggregations
Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/17821 ) Change subject: IMPALA-2581: LIMIT can be propagated down into some aggregations .. Patch Set 13: (4 comments) I think if we can improve the observability a little bit, it will be great. http://gerrit.cloudera.org:8080/#/c/17821/13/be/src/exec/streaming-aggregation-node.cc File be/src/exec/streaming-aggregation-node.cc: http://gerrit.cloudera.org:8080/#/c/17821/13/be/src/exec/streaming-aggregation-node.cc@134 PS13, Line 134: VLOG_QUERY << "the number of rows (" << aggs_[0]->GetNumKeys() << ") returned" : " from the streaming aggregation node has exceeded the limit of " : << limit(); If we can add the info to runtime_profile_, it will be more useful. For example, to verify that the feature is able to kick in in query tests. runtime_profile_->AddInfoString("Hdfs Read Thread Concurrency Bucket", ss.str()); http://gerrit.cloudera.org:8080/#/c/17821/11/testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test File testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test: http://gerrit.cloudera.org:8080/#/c/17821/11/testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test@2934 PS11, Line 2934: limit: 2 > where id = subquery,If this subQuery returns 2 rows, we can sure that it is Okay. Looks this is a badly written query when it returns more one row. My fault. The following version runs fine on my box and I suppose your new feature should not kick in. select * from functional.alltypes where id in (select i from (select bigint_col as i from functional.alltypes union select tinyint_col as i from functional.alltypes) t ) ; http://gerrit.cloudera.org:8080/#/c/17821/13/testdata/workloads/functional-query/queries/QueryTest/spilling.test File testdata/workloads/functional-query/queries/QueryTest/spilling.test: http://gerrit.cloudera.org:8080/#/c/17821/13/testdata/workloads/functional-query/queries/QueryTest/spilling.test@446 PS13, Line 446: Verify Can we also verify that some rows are indeed skipped in spill situation? http://gerrit.cloudera.org:8080/#/c/17821/13/testdata/workloads/targeted-perf/queries/aggregation.test File testdata/workloads/targeted-perf/queries/aggregation.test: http://gerrit.cloudera.org:8080/#/c/17821/13/testdata/workloads/targeted-perf/queries/aggregation.test@2726 PS13, Line 2726: speed up aggregations Can we verify that most of the rows are indeed skipped fast? -- To view, visit http://gerrit.cloudera.org:8080/17821 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995 Gerrit-Change-Number: 17821 Gerrit-PatchSet: 13 Gerrit-Owner: liuyao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: liuyao Gerrit-Comment-Date: Mon, 13 Sep 2021 15:15:19 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9495: Support struct in select list for ORC tables
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17638 ) Change subject: IMPALA-9495: Support struct in select list for ORC tables .. Patch Set 17: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/17638 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0fbe56bdcd372b72e99c0195d87a818e7fa4bc3a Gerrit-Change-Number: 17638 Gerrit-PatchSet: 17 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Mon, 13 Sep 2021 14:44:16 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9495: Support struct in select list for ORC tables
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17638 ) Change subject: IMPALA-9495: Support struct in select list for ORC tables .. Patch Set 17: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7468/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/17638 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0fbe56bdcd372b72e99c0195d87a818e7fa4bc3a Gerrit-Change-Number: 17638 Gerrit-PatchSet: 17 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Mon, 13 Sep 2021 14:44:16 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9495: Support struct in select list for ORC tables
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/17638 ) Change subject: IMPALA-9495: Support struct in select list for ORC tables .. Patch Set 16: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/17638 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0fbe56bdcd372b72e99c0195d87a818e7fa4bc3a Gerrit-Change-Number: 17638 Gerrit-PatchSet: 16 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Mon, 13 Sep 2021 13:23:07 + Gerrit-HasComments: No
[Impala-ASF-CR] WIP IMPALA-10873: Push down EQUALS, IS NULL and IN-list predicate to ORC reader
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/17815 ) Change subject: WIP IMPALA-10873: Push down EQUALS, IS NULL and IN-list predicate to ORC reader .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/17815/2/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java: http://gerrit.cloudera.org:8080/#/c/17815/2/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@604 PS2, Line 604: buildStatsPredicate(analyzer, slotRef, binaryPred, binaryPred.getOp()); Parquet has a somewhat hacky way of finding EQ predicates in the backend and using it in bloom filters: https://github.com/apache/impala/blob/master/be/src/exec/parquet/hdfs-parquet-scanner.cc#L1884 It would be great to use a common logic here - I prefer doing the logic in FE, but we did it in BE because we (Daniel Becker + me) were more familiar with BE. -- To view, visit http://gerrit.cloudera.org:8080/17815 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iaa89f080fe2e87d94fc8ea7f1be83e087fa34225 Gerrit-Change-Number: 17815 Gerrit-PatchSet: 2 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Mon, 13 Sep 2021 10:00:48 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2581: LIMIT can be propagated down into some aggregations
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17821 ) Change subject: IMPALA-2581: LIMIT can be propagated down into some aggregations .. Patch Set 13: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/9452/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17821 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995 Gerrit-Change-Number: 17821 Gerrit-PatchSet: 13 Gerrit-Owner: liuyao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: liuyao Gerrit-Comment-Date: Mon, 13 Sep 2021 07:01:22 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-2581: LIMIT can be propagated down into some aggregations
liuyao has posted comments on this change. ( http://gerrit.cloudera.org:8080/17821 ) Change subject: IMPALA-2581: LIMIT can be propagated down into some aggregations .. Patch Set 13: (6 comments) http://gerrit.cloudera.org:8080/#/c/17821/12//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/17821/12//COMMIT_MSG@10 PS12, Line 10: > nit. exceed the max line threshold. Done http://gerrit.cloudera.org:8080/#/c/17821/12/be/src/exec/aggregation-node.cc File be/src/exec/aggregation-node.cc: http://gerrit.cloudera.org:8080/#/c/17821/12/be/src/exec/aggregation-node.cc@76 PS12, Line 76: VLOG_QUERY << Substitute("the number of rows ($0) returned from the aggregation" : " node has exceeded the limit of $1", aggs_[0]->GetNumKeys(), limit()); > nit. May use Substitute() which is faster. Done http://gerrit.cloudera.org:8080/#/c/17821/12/common/thrift/PlanNodes.thrift File common/thrift/PlanNodes.thrift: http://gerrit.cloudera.org:8080/#/c/17821/12/common/thrift/PlanNodes.thrift@479 PS12, Line 479: compl > nit. complete Done http://gerrit.cloudera.org:8080/#/c/17821/12/fe/src/main/java/org/apache/impala/planner/AggregationNode.java File fe/src/main/java/org/apache/impala/planner/AggregationNode.java: http://gerrit.cloudera.org:8080/#/c/17821/12/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@646 PS12, Line 646: When both conditions below are true, aggregati > nit. I think we should mention the two conditions in the commit message her Done http://gerrit.cloudera.org:8080/#/c/17821/12/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@648 PS12, Line 648: on n > nit. May use Complete, as Halt implies stop for some reason. Done http://gerrit.cloudera.org:8080/#/c/17821/11/testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test File testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test: http://gerrit.cloudera.org:8080/#/c/17821/11/testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test@2934 PS11, Line 2934: limit: 2 > Since we do not push down id from the outer side to the inner, I would thin where id = subquery,If this subQuery returns 2 rows, we can sure that it is not meet the semantic requirement, we should report a semantic error.We don't need to get all the results. -- To view, visit http://gerrit.cloudera.org:8080/17821 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995 Gerrit-Change-Number: 17821 Gerrit-PatchSet: 13 Gerrit-Owner: liuyao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: liuyao Gerrit-Comment-Date: Mon, 13 Sep 2021 06:40:40 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2581: LIMIT can be propagated down into some aggregations
Hello Qifan Chen, Tim Armstrong, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17821 to look at the new patch set (#13). Change subject: IMPALA-2581: LIMIT can be propagated down into some aggregations .. IMPALA-2581: LIMIT can be propagated down into some aggregations This patch contains 2 parts: 1. When both conditions below are true, push down limit to pre-aggregation a) aggregation node has no aggregate function b) aggregation node has no predicate 2. finish aggregation when number of unique keys of hash table has exceeded the limit. Sample queries: SELECT DISTINCT f FROM t LIMIT n Can pass the LIMIT all the way down to the pre-aggregation, which leads to a nearly unbounded speedup on these queries in large tables when n is low. Testing: Add test targeted-perf/queries/aggregation.test Pass core test Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995 --- M be/src/exec/aggregation-node-base.cc M be/src/exec/aggregation-node-base.h M be/src/exec/aggregation-node.cc M be/src/exec/aggregator.h M be/src/exec/grouping-aggregator.cc M be/src/exec/grouping-aggregator.h M be/src/exec/non-grouping-aggregator.h M be/src/exec/streaming-aggregation-node.cc M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/planner/AggregationNode.java M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java M testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test M testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q06.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q54.test M testdata/workloads/functional-query/queries/QueryTest/spilling.test M testdata/workloads/targeted-perf/queries/aggregation.test 18 files changed, 142 insertions(+), 28 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/21/17821/13 -- To view, visit http://gerrit.cloudera.org:8080/17821 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995 Gerrit-Change-Number: 17821 Gerrit-PatchSet: 13 Gerrit-Owner: liuyao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: liuyao