Alex Behm has posted comments on this change.

Change subject: IMPALA-5036: Parquet count star optimization
......................................................................


Patch Set 4:

(11 comments)

http://gerrit.cloudera.org:8080/#/c/6812/4/be/src/exec/hdfs-parquet-scanner.cc
File be/src/exec/hdfs-parquet-scanner.cc:

Line 455:     // There are no materialized slots, e.g.
There are no materialized slots and we are not optimizing count(*)


Line 456:     // "select count(*) from alltypes where int_col > 5" and "select 
1 from alltypes".
The first query is not correct, in that case int_col is materialized. I think 
the second query is a sufficient example.


http://gerrit.cloudera.org:8080/#/c/6812/4/fe/src/main/java/org/apache/impala/planner/ScanNode.java
File fe/src/main/java/org/apache/impala/planner/ScanNode.java:

Line 171:       if (slot.getColumn() != null && !slot.getStats().hasStats()) 
return true;
Thanks. This was a bug in general, not really specific to your change.


http://gerrit.cloudera.org:8080/#/c/6812/4/testdata/workloads/functional-planner/queries/PlannerTest/parquet-stats-agg.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/parquet-stats-agg.test:

Line 105: from functional_parquet.alltypes
single line query


Line 150: # The optimization is disabled if the output of the count(*) inline 
view is being
Optimization is not applied because the inner count(*) is not materialized. The 
outer count(*) does not reference a base table.


Line 285: # Optimization is not applied when selecting from an empty table.
Do we apply the optimization when we reference a non-empty partitioned table, 
but then we prune all partitions?


Line 323: # materialized. Not materialized agg exprs are ignored.
Non-materialized agg exprs are ignored.


http://gerrit.cloudera.org:8080/#/c/6812/4/testdata/workloads/functional-query/queries/QueryTest/parquet-stats-agg.test
File 
testdata/workloads/functional-query/queries/QueryTest/parquet-stats-agg.test:

Line 86: # Verify that 0 is returned when we are selecting from an empty table 
and the optimization
the optimization is not applied in this case right?


Line 95: # Verify that 0 is returned when all the partitioned columns are 
filtered.
when all partitions are pruned


http://gerrit.cloudera.org:8080/#/c/6812/4/tests/query_test/test_aggregation.py
File tests/query_test/test_aggregation.py:

Line 275:     if (vector.get_value('table_format').file_format != 'text' or
seems weird, why not

vector.get_value('table_format').file_format != 'parquet'


Line 280:     vector.get_value('exec_option')['batch_size'] = 1
Doesn't exhaustive run this test with multiple batch sizes already? If so, then 
no need for this.


-- 
To view, visit http://gerrit.cloudera.org:8080/6812
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I536b85c014821296aed68a0c68faadae96005e62
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <tbobrovyt...@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.b...@cloudera.com>
Gerrit-Reviewer: Lars Volker <l...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <mar...@cloudera.com>
Gerrit-Reviewer: Mostafa Mokhtar <mmokh...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tbobrovyt...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <zams...@cloudera.com>
Gerrit-HasComments: Yes

Reply via email to