Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/17821 )

Change subject: IMPALA-2581: LIMIT can be propagated down into some aggregations
......................................................................


Patch Set 13:

(4 comments)

It would be great if we could improve the observability a little.

http://gerrit.cloudera.org:8080/#/c/17821/13/be/src/exec/streaming-aggregation-node.cc
File be/src/exec/streaming-aggregation-node.cc:

http://gerrit.cloudera.org:8080/#/c/17821/13/be/src/exec/streaming-aggregation-node.cc@134
PS13, Line 134:  VLOG_QUERY << "the number of rows (" << aggs_[0]->GetNumKeys() << ") returned"
              :               " from the streaming aggregation node has exceeded the limit of "
              :               << limit();
It would be more useful to also add this info to runtime_profile_, for example so that query tests can verify that the feature actually kicks in. Something along the lines of:

runtime_profile_->AddInfoString("Hdfs Read Thread Concurrency Bucket", ss.str());
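
A minimal sketch of that suggestion, assuming a stand-in profile class rather than Impala's real RuntimeProfile. The class FakeRuntimeProfile, the helper RecordLimitReached() and the key "Streaming Aggregation Limit Reached" are hypothetical names used only for illustration; they are not part of the patch.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <sstream>
#include <string>

// Stand-in for a runtime profile. It only mimics an AddInfoString(key, value)
// call and is NOT Impala's RuntimeProfile class.
class FakeRuntimeProfile {
 public:
  void AddInfoString(const std::string& key, const std::string& value) {
    info_strings_[key] = value;
  }

  // Returns nullptr when the key was never recorded.
  const std::string* GetInfoString(const std::string& key) const {
    auto it = info_strings_.find(key);
    return it == info_strings_.end() ? nullptr : &it->second;
  }

 private:
  std::map<std::string, std::string> info_strings_;
};

// Hypothetical key name; the real patch may choose a different one.
const char* const kLimitReachedKey = "Streaming Aggregation Limit Reached";

// Records an info string when the number of keys has exceeded the limit.
// Assumption of this sketch: a negative limit means "no limit".
// Returns true if the info string was recorded.
bool RecordLimitReached(FakeRuntimeProfile* profile, int64_t num_keys,
                        int64_t limit) {
  assert(profile != nullptr);
  if (limit < 0 || num_keys <= limit) return false;
  std::stringstream ss;
  ss << "the number of rows (" << num_keys << ") returned from the streaming"
     << " aggregation node has exceeded the limit of " << limit;
  profile->AddInfoString(kLimitReachedKey, ss.str());
  return true;
}
```

With something like this in place, a query test could look for the key in the profile output instead of relying on the VLOG_QUERY message.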


http://gerrit.cloudera.org:8080/#/c/17821/11/testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
File testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test:

http://gerrit.cloudera.org:8080/#/c/17821/11/testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test@2934
PS11, Line 2934: limit: 2
> where id = subquery,If this subQuery returns 2 rows, we can sure that it is
Okay. It looks like this is a badly written query when the subquery returns more than one row. My fault.

The following version runs fine on my box, and I suppose your new feature should not kick in for it.

select * from functional.alltypes where id in
  (select i from (select bigint_col as i from functional.alltypes
                  union
                  select tinyint_col as i from functional.alltypes) t
);


http://gerrit.cloudera.org:8080/#/c/17821/13/testdata/workloads/functional-query/queries/QueryTest/spilling.test
File testdata/workloads/functional-query/queries/QueryTest/spilling.test:

http://gerrit.cloudera.org:8080/#/c/17821/13/testdata/workloads/functional-query/queries/QueryTest/spilling.test@446
PS13, Line 446: Verify
Can we also verify that some rows are indeed skipped in the spilling case?


http://gerrit.cloudera.org:8080/#/c/17821/13/testdata/workloads/targeted-perf/queries/aggregation.test
File testdata/workloads/targeted-perf/queries/aggregation.test:

http://gerrit.cloudera.org:8080/#/c/17821/13/testdata/workloads/targeted-perf/queries/aggregation.test@2726
PS13, Line 2726:  speed up aggregations
Can we verify that most of the rows are indeed skipped fast?
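
One way such a check could be made possible is to count skipped rows explicitly. The sketch below is a toy model only: it counts input rows per batch rather than distinct grouping keys, and LimitCounters and ConsumeBatches() are hypothetical names, not code from the patch or from Impala.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical counters a test could inspect after the query has run.
struct LimitCounters {
  int64_t rows_processed = 0;
  int64_t rows_skipped = 0;
};

// Toy model: consumes batches of the given sizes. Once at least `limit` rows
// have been processed, every later batch is counted as skipped instead.
LimitCounters ConsumeBatches(const std::vector<int64_t>& batch_sizes,
                             int64_t limit) {
  assert(limit >= 0);
  LimitCounters counters;
  for (int64_t num_rows : batch_sizes) {
    if (counters.rows_processed >= limit) {
      counters.rows_skipped += num_rows;  // Limit already reached: skip batch.
      continue;
    }
    counters.rows_processed += num_rows;
  }
  return counters;
}
```

A test could then assert that rows_skipped is non-zero (for the spilling case) or that it dominates rows_processed (for the perf case).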



--
To view, visit http://gerrit.cloudera.org:8080/17821
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995
Gerrit-Change-Number: 17821
Gerrit-PatchSet: 13
Gerrit-Owner: liuyao <liu...@sensorsdata.cn>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Reviewer: liuyao <liu...@sensorsdata.cn>
Gerrit-Comment-Date: Mon, 13 Sep 2021 15:15:19 +0000
Gerrit-HasComments: Yes
