Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/17821 )

Change subject: IMPALA-2581: LIMIT can be propagated down into some aggregations
......................................................................


Patch Set 13:

(4 comments)

It would be great if we could improve the observability a little.

http://gerrit.cloudera.org:8080/#/c/17821/13/be/src/exec/streaming-aggregation-node.cc
File be/src/exec/streaming-aggregation-node.cc:

http://gerrit.cloudera.org:8080/#/c/17821/13/be/src/exec/streaming-aggregation-node.cc@134
PS13, Line 134:       VLOG_QUERY << "the number of rows (" << aggs_[0]->GetNumKeys() << ") returned"
:                  " from the streaming aggregation node has exceeded the limit of "
:                  << limit();
It would be more useful to also add this info to runtime_profile_, for example so that query tests can verify that the feature kicks in.

  runtime_profile_->AddInfoString("Hdfs Read Thread Concurrency Bucket", ss.str());


http://gerrit.cloudera.org:8080/#/c/17821/11/testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
File testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test:

http://gerrit.cloudera.org:8080/#/c/17821/11/testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test@2934
PS11, Line 2934:   limit: 2
> where id = subquery,If this subQuery returns 2 rows, we can sure that it is
Okay. It looks like this is a badly written query, since it returns more than one row. My fault.

The following version runs fine on my box, and I suppose your new feature should not kick in.

select * from functional.alltypes
where id in
  (select i from
    (select bigint_col as i from functional.alltypes
     union
     select tinyint_col as i from functional.alltypes) t
  );


http://gerrit.cloudera.org:8080/#/c/17821/13/testdata/workloads/functional-query/queries/QueryTest/spilling.test
File testdata/workloads/functional-query/queries/QueryTest/spilling.test:

http://gerrit.cloudera.org:8080/#/c/17821/13/testdata/workloads/functional-query/queries/QueryTest/spilling.test@446
PS13, Line 446: Verify
Can we also verify that some rows are indeed skipped when the aggregation spills?


http://gerrit.cloudera.org:8080/#/c/17821/13/testdata/workloads/targeted-perf/queries/aggregation.test
File testdata/workloads/targeted-perf/queries/aggregation.test:

http://gerrit.cloudera.org:8080/#/c/17821/13/testdata/workloads/targeted-perf/queries/aggregation.test@2726
PS13, Line 2726: speed up aggregations
Can we verify that most of the rows are indeed skipped fast?



-- 
To view, visit http://gerrit.cloudera.org:8080/17821
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995
Gerrit-Change-Number: 17821
Gerrit-PatchSet: 13
Gerrit-Owner: liuyao <liu...@sensorsdata.cn>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Reviewer: liuyao <liu...@sensorsdata.cn>
Gerrit-Comment-Date: Mon, 13 Sep 2021 15:15:19 +0000
Gerrit-HasComments: Yes