Alex Behm has posted comments on this change.

Change subject: IMPALA-5036: Parquet count star optimization
......................................................................

Patch Set 4:

(11 comments)

http://gerrit.cloudera.org:8080/#/c/6812/4/be/src/exec/hdfs-parquet-scanner.cc
File be/src/exec/hdfs-parquet-scanner.cc:

Line 455: // There are no materialized slots, e.g.
There are no materialized slots and we are not optimizing count(*)

Line 456: // "select count(*) from alltypes where int_col > 5" and "select 1 from alltypes".
The first query is not correct, in that case int_col is materialized. I think the second query is a sufficient example.

http://gerrit.cloudera.org:8080/#/c/6812/4/fe/src/main/java/org/apache/impala/planner/ScanNode.java
File fe/src/main/java/org/apache/impala/planner/ScanNode.java:

Line 171: if (slot.getColumn() != null && !slot.getStats().hasStats()) return true;
Thanks. This was a bug in general, not really specific to your change.

http://gerrit.cloudera.org:8080/#/c/6812/4/testdata/workloads/functional-planner/queries/PlannerTest/parquet-stats-agg.test
File testdata/workloads/functional-planner/queries/PlannerTest/parquet-stats-agg.test:

Line 105: from functional_parquet.alltypes
single line query

Line 150: # The optimization is disabled if the output of the count(*) inline view is being
Optimization is not applied because the inner count(*) is not materialized. The outer count(*) does not reference a base table.

Line 285: # Optimization is not applied when selecting from an empty table.
Do we apply the optimization when we reference a non-empty partitioned table, but then we prune all partitions?

Line 323: # materialized. Not materialized agg exprs are ignored.
Non-materialized agg exprs are ignored.

http://gerrit.cloudera.org:8080/#/c/6812/4/testdata/workloads/functional-query/queries/QueryTest/parquet-stats-agg.test
File testdata/workloads/functional-query/queries/QueryTest/parquet-stats-agg.test:

Line 86: # Verify that 0 is returned when we are selecting from an empty table and the optimization
the optimization is not applied in this case right?

Line 95: # Verify that 0 is returned when all the partitioned columns are filtered.
when all partitions are pruned

http://gerrit.cloudera.org:8080/#/c/6812/4/tests/query_test/test_aggregation.py
File tests/query_test/test_aggregation.py:

Line 275: if (vector.get_value('table_format').file_format != 'text' or
seems weird, why not vector.get_value('table_format').file_format != 'parquet'

Line 280: vector.get_value('exec_option')['batch_size'] = 1
Doesn't exhaustive run this test with multiple batch sizes already? If so, then no need for this.

--
To view, visit http://gerrit.cloudera.org:8080/6812
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I536b85c014821296aed68a0c68faadae96005e62
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <tbobrovyt...@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.b...@cloudera.com>
Gerrit-Reviewer: Lars Volker <l...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <mar...@cloudera.com>
Gerrit-Reviewer: Mostafa Mokhtar <mmokh...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tbobrovyt...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <zams...@cloudera.com>
Gerrit-HasComments: Yes
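
Regarding the Line 275 comment in tests/query_test/test_aggregation.py, the suggested check might look roughly like the sketch below. This is only an illustration of the reviewer's point: `TableFormat` and `should_skip_for_format` are hypothetical stand-ins and not names from the Impala test framework. Only the `file_format != 'parquet'` comparison comes from the review, and the rest of the original condition (truncated after `or` in the quoted line) is not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class TableFormat:
    """Hypothetical stand-in for the object returned by
    vector.get_value('table_format') in the test framework."""
    file_format: str


def should_skip_for_format(table_format: TableFormat) -> bool:
    """Return True when a Parquet-only test should be skipped.

    Comparing against 'parquet' states the intent directly: the test
    exercises Parquet row-count statistics, so every other file format
    is skipped, instead of naming some unrelated format such as 'text'.
    """
    return table_format.file_format != 'parquet'
```

In a test body this check would gate an early skip, so the Parquet-specific queries only run when the test vector actually targets Parquet.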