[Impala-ASF-CR] IMPALA-9495: Support struct in select list for ORC tables

2021-09-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17638 )

Change subject: IMPALA-9495: Support struct in select list for ORC tables
..


Patch Set 17: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/7468/


--
To view, visit http://gerrit.cloudera.org:8080/17638
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0fbe56bdcd372b72e99c0195d87a818e7fa4bc3a
Gerrit-Change-Number: 17638
Gerrit-PatchSet: 17
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Mon, 13 Sep 2021 20:55:55 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-2581: LIMIT can be propagated down into some aggregations

2021-09-13 Thread Qifan Chen (Code Review)
Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/17821 )

Change subject: IMPALA-2581: LIMIT can be propagated down into some aggregations
..


Patch Set 13:

(4 comments)

It would be great if we could improve the observability a little.

http://gerrit.cloudera.org:8080/#/c/17821/13/be/src/exec/streaming-aggregation-node.cc
File be/src/exec/streaming-aggregation-node.cc:

http://gerrit.cloudera.org:8080/#/c/17821/13/be/src/exec/streaming-aggregation-node.cc@134
PS13, Line 134:  VLOG_QUERY << "the number of rows (" << aggs_[0]->GetNumKeys() << ") returned"
              :   " from the streaming aggregation node has exceeded the limit of "
              :   << limit();
It would be more useful to add this info to runtime_profile_, for example so
that query tests can verify that the feature kicks in. An AddInfoString() call
such as the following shows the pattern:

runtime_profile_->AddInfoString("Hdfs Read Thread Concurrency Bucket", ss.str());
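
A minimal sketch of this suggestion, assuming a stand-in profile class:
FakeProfile, RecordLimitReached and the key "Limit Reached" are hypothetical
names used only for illustration, not Impala's RuntimeProfile API. It shows how
an info string recorded when the limit is hit could be checked by a test,
instead of relying on a log line.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical stand-in for a query profile: a key/value store of info strings.
// This is NOT Impala's RuntimeProfile; it only mimics the AddInfoString() idea.
class FakeProfile {
 public:
  void AddInfoString(const std::string& key, const std::string& value) {
    info_[key] = value;
  }

  // Returns the stored value, or an empty string if the key was never set.
  std::string GetInfoString(const std::string& key) const {
    auto it = info_.find(key);
    return it == info_.end() ? std::string() : it->second;
  }

 private:
  std::map<std::string, std::string> info_;
};

// Records that the aggregation stopped early because the number of distinct
// keys reached the limit. The key name "Limit Reached" is an assumption.
inline void RecordLimitReached(FakeProfile* profile, int64_t num_keys,
                               int64_t limit) {
  if (limit >= 0 && num_keys >= limit) {
    profile->AddInfoString("Limit Reached",
        "keys=" + std::to_string(num_keys) + " limit=" + std::to_string(limit));
  }
}
```

A test could then assert on the presence of the info string rather than on the
log output.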


http://gerrit.cloudera.org:8080/#/c/17821/11/testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
File testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test:

http://gerrit.cloudera.org:8080/#/c/17821/11/testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test@2934
PS11, Line 2934: limit: 2
> where id = subquery,If this subQuery returns 2 rows, we can sure that it is
Okay. It looks like this is a badly written query when it returns more than one
row. My fault.

The following version runs fine on my box, and I suppose your new feature
should not kick in.

select * from functional.alltypes where id in
  (select i from (select bigint_col as i from functional.alltypes
                  union
                  select tinyint_col as i from functional.alltypes) t
  );


http://gerrit.cloudera.org:8080/#/c/17821/13/testdata/workloads/functional-query/queries/QueryTest/spilling.test
File testdata/workloads/functional-query/queries/QueryTest/spilling.test:

http://gerrit.cloudera.org:8080/#/c/17821/13/testdata/workloads/functional-query/queries/QueryTest/spilling.test@446
PS13, Line 446: Verify
Can we also verify that some rows are indeed skipped in the spilling case?


http://gerrit.cloudera.org:8080/#/c/17821/13/testdata/workloads/targeted-perf/queries/aggregation.test
File testdata/workloads/targeted-perf/queries/aggregation.test:

http://gerrit.cloudera.org:8080/#/c/17821/13/testdata/workloads/targeted-perf/queries/aggregation.test@2726
PS13, Line 2726:  speed up aggregations
Can we verify that most of the rows are indeed skipped fast?



--
To view, visit http://gerrit.cloudera.org:8080/17821

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995
Gerrit-Change-Number: 17821
Gerrit-PatchSet: 13
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: liuyao 
Gerrit-Comment-Date: Mon, 13 Sep 2021 15:15:19 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9495: Support struct in select list for ORC tables

2021-09-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17638 )

Change subject: IMPALA-9495: Support struct in select list for ORC tables
..


Patch Set 17: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/17638

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0fbe56bdcd372b72e99c0195d87a818e7fa4bc3a
Gerrit-Change-Number: 17638
Gerrit-PatchSet: 17
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Mon, 13 Sep 2021 14:44:16 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9495: Support struct in select list for ORC tables

2021-09-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17638 )

Change subject: IMPALA-9495: Support struct in select list for ORC tables
..


Patch Set 17:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7468/ DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/17638

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0fbe56bdcd372b72e99c0195d87a818e7fa4bc3a
Gerrit-Change-Number: 17638
Gerrit-PatchSet: 17
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Mon, 13 Sep 2021 14:44:16 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9495: Support struct in select list for ORC tables

2021-09-13 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/17638 )

Change subject: IMPALA-9495: Support struct in select list for ORC tables
..


Patch Set 16: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/17638

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0fbe56bdcd372b72e99c0195d87a818e7fa4bc3a
Gerrit-Change-Number: 17638
Gerrit-PatchSet: 16
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Mon, 13 Sep 2021 13:23:07 +
Gerrit-HasComments: No


[Impala-ASF-CR] WIP IMPALA-10873: Push down EQUALS, IS NULL and IN-list predicate to ORC reader

2021-09-13 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/17815 )

Change subject: WIP IMPALA-10873: Push down EQUALS, IS NULL and IN-list predicate to ORC reader
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17815/2/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

http://gerrit.cloudera.org:8080/#/c/17815/2/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@604
PS2, Line 604: buildStatsPredicate(analyzer, slotRef, binaryPred, binaryPred.getOp());
Parquet has a somewhat hacky way of finding EQ predicates in the backend and
using them in bloom filters:
https://github.com/apache/impala/blob/master/be/src/exec/parquet/hdfs-parquet-scanner.cc#L1884

It would be great to use common logic here. I would prefer doing this in the
FE, but we did it in the BE because we (Daniel Becker and I) were more familiar
with the BE.



--
To view, visit http://gerrit.cloudera.org:8080/17815

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iaa89f080fe2e87d94fc8ea7f1be83e087fa34225
Gerrit-Change-Number: 17815
Gerrit-PatchSet: 2
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Mon, 13 Sep 2021 10:00:48 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-2581: LIMIT can be propagated down into some aggregations

2021-09-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17821 )

Change subject: IMPALA-2581: LIMIT can be propagated down into some aggregations
..


Patch Set 13:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/9452/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/17821

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995
Gerrit-Change-Number: 17821
Gerrit-PatchSet: 13
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: liuyao 
Gerrit-Comment-Date: Mon, 13 Sep 2021 07:01:22 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-2581: LIMIT can be propagated down into some aggregations

2021-09-13 Thread liuyao (Code Review)
liuyao has posted comments on this change. ( http://gerrit.cloudera.org:8080/17821 )

Change subject: IMPALA-2581: LIMIT can be propagated down into some aggregations
..


Patch Set 13:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/17821/12//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17821/12//COMMIT_MSG@10
PS12, Line 10:
> nit. exceed the max line threshold.
Done


http://gerrit.cloudera.org:8080/#/c/17821/12/be/src/exec/aggregation-node.cc
File be/src/exec/aggregation-node.cc:

http://gerrit.cloudera.org:8080/#/c/17821/12/be/src/exec/aggregation-node.cc@76
PS12, Line 76:  VLOG_QUERY << Substitute("the number of rows ($0) returned from the aggregation"
             :   " node has exceeded the limit of $1", aggs_[0]->GetNumKeys(), limit());
> nit. May use Substitute() which is faster.
Done


http://gerrit.cloudera.org:8080/#/c/17821/12/common/thrift/PlanNodes.thrift
File common/thrift/PlanNodes.thrift:

http://gerrit.cloudera.org:8080/#/c/17821/12/common/thrift/PlanNodes.thrift@479
PS12, Line 479: compl
> nit. complete
Done


http://gerrit.cloudera.org:8080/#/c/17821/12/fe/src/main/java/org/apache/impala/planner/AggregationNode.java
File fe/src/main/java/org/apache/impala/planner/AggregationNode.java:

http://gerrit.cloudera.org:8080/#/c/17821/12/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@646
PS12, Line 646: When both conditions below are true, aggregati
> nit. I think we should mention the two conditions in the commit message her
Done


http://gerrit.cloudera.org:8080/#/c/17821/12/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@648
PS12, Line 648: on n
> nit. May use Complete, as Halt implies stop for some reason.
Done


http://gerrit.cloudera.org:8080/#/c/17821/11/testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
File testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test:

http://gerrit.cloudera.org:8080/#/c/17821/11/testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test@2934
PS11, Line 2934: limit: 2
> Since we do not push down id from the outer side to the inner, I would thin
With "where id = subquery", if the subquery returns 2 rows we can be sure that
it does not meet the semantic requirement, and we should report a semantic
error. We don't need to get all the results.
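
A sketch of why fetching two rows is enough, under the assumption that the
subquery output is modelled as a plain vector. CheckScalarSubquery and
ScalarCheck are hypothetical names for illustration, not Impala functions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Outcome of checking how many rows a scalar subquery produced.
enum class ScalarCheck { kEmpty, kSingleRow, kTooManyRows };

// Inspects at most two rows of `rows` (a stand-in for the subquery output).
// Seeing a second row already proves the cardinality violation, so the
// remaining rows are never read. `rows_inspected` reports how many were read.
inline ScalarCheck CheckScalarSubquery(const std::vector<long>& rows,
                                       std::size_t* rows_inspected) {
  const std::size_t kLimit = 2;  // mirrors the "limit: 2" in the plan
  *rows_inspected = 0;
  for (std::size_t i = 0; i < rows.size() && i < kLimit; ++i) {
    ++*rows_inspected;
  }
  if (*rows_inspected == 0) return ScalarCheck::kEmpty;
  if (*rows_inspected == 1) return ScalarCheck::kSingleRow;
  return ScalarCheck::kTooManyRows;
}
```

However many rows the subquery could produce, the check reads no more than two
of them before the error can be reported.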



--
To view, visit http://gerrit.cloudera.org:8080/17821

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995
Gerrit-Change-Number: 17821
Gerrit-PatchSet: 13
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: liuyao 
Gerrit-Comment-Date: Mon, 13 Sep 2021 06:40:40 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-2581: LIMIT can be propagated down into some aggregations

2021-09-13 Thread liuyao (Code Review)
Hello Qifan Chen, Tim Armstrong, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17821

to look at the new patch set (#13).

Change subject: IMPALA-2581: LIMIT can be propagated down into some aggregations
..

IMPALA-2581: LIMIT can be propagated down into some aggregations

This patch contains 2 parts:
1. When both conditions below are true, push down the limit to the
pre-aggregation:
 a) the aggregation node has no aggregate function
 b) the aggregation node has no predicate
2. Finish the aggregation when the number of unique keys in the hash
table has exceeded the limit.

Sample query:
SELECT DISTINCT f FROM t LIMIT n
The LIMIT can be passed all the way down to the pre-aggregation, which
leads to a nearly unbounded speedup on such queries over large tables
when n is low.
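
The early-termination idea can be sketched as follows. This is an illustrative
stand-alone function, not the patch's code: DistinctWithLimit and
rows_consumed are hypothetical names, and a std::unordered_set stands in for
the aggregator's hash table. With no aggregate function and no predicate, any
n distinct keys form a valid answer, so reading the input can stop once n keys
have been collected.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// Returns up to `limit` distinct values from `input`, in first-seen order.
// `rows_consumed` reports how many input rows were read before stopping.
inline std::vector<std::string> DistinctWithLimit(
    const std::vector<std::string>& input, std::size_t limit,
    std::size_t* rows_consumed) {
  std::unordered_set<std::string> seen;  // stand-in for the hash table
  std::vector<std::string> result;
  *rows_consumed = 0;
  for (const std::string& row : input) {
    if (result.size() >= limit) break;  // enough distinct keys: skip the rest
    ++*rows_consumed;
    if (seen.insert(row).second) result.push_back(row);
  }
  return result;
}
```

For the input {"a", "a", "b", "c", "a", "d"} and a limit of 2, the loop stops
after reading three rows; the cost depends on how soon n distinct keys appear,
not on the table size.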

Testing:
Added test targeted-perf/queries/aggregation.test
Passed core tests

Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995
---
M be/src/exec/aggregation-node-base.cc
M be/src/exec/aggregation-node-base.h
M be/src/exec/aggregation-node.cc
M be/src/exec/aggregator.h
M be/src/exec/grouping-aggregator.cc
M be/src/exec/grouping-aggregator.h
M be/src/exec/non-grouping-aggregator.h
M be/src/exec/streaming-aggregation-node.cc
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/planner/AggregationNode.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test
M testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test
M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q06.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q54.test
M testdata/workloads/functional-query/queries/QueryTest/spilling.test
M testdata/workloads/targeted-perf/queries/aggregation.test
18 files changed, 142 insertions(+), 28 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/21/17821/13
--
To view, visit http://gerrit.cloudera.org:8080/17821

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995
Gerrit-Change-Number: 17821
Gerrit-PatchSet: 13
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: liuyao