Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/18327 )
Change subject: IMPALA-11123: Optimize count(star) for ORC scans
......................................................................


Patch Set 4:

(6 comments)

The patch looks pretty good now!

http://gerrit.cloudera.org:8080/#/c/18327/4/be/src/exec/hdfs-orc-scanner.cc
File be/src/exec/hdfs-orc-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/18327/4/be/src/exec/hdfs-orc-scanner.cc@776
PS4, Line 776: int64_t num_rows = static_cast<int64_t>(reader_->getNumberOfRows());
Could you add a comment that only the scanner of the footer split will run in this case? Also mention the special logic we have in HdfsScanner::IssueFooterRanges().

http://gerrit.cloudera.org:8080/#/c/18327/4/be/src/exec/hdfs-orc-scanner.cc@806
PS4, Line 806: This is an unoptimized count(*) case.
I think count(*) won't reach this path now. If the count(*) has any conjuncts, we need to materialize some slots, so it's not a zero slot table scan. I think we should use the comment at line 796 (move it here) and change the example to "select 1" over the table.

http://gerrit.cloudera.org:8080/#/c/18327/4/be/src/exec/hdfs-orc-scanner.cc@807
PS4, Line 807: // Insert 'num_to_commit' template tuples into 'row_batch'.
Could you add a comment that only the scanner of the footer split will run in this case? Also mention the special logic we have in HdfsScanner::IssueFooterRanges().

http://gerrit.cloudera.org:8080/#/c/18327/2/testdata/workloads/functional-planner/queries/PlannerTest/orc-stats-agg.test
File testdata/workloads/functional-planner/queries/PlannerTest/orc-stats-agg.test:

http://gerrit.cloudera.org:8080/#/c/18327/2/testdata/workloads/functional-planner/queries/PlannerTest/orc-stats-agg.test@4
PS2, Line 4: functional_orc_def.uncomp_src_alltypes
> This table follow schema from functional.alltypes, but without "transaction
I see. I thought managed tables could only be transactional, but that's wrong. I double-checked that the file schema is non-transactional. Thanks for the explanation!
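For readers following the thread, here is a minimal C++ sketch of the two footer-split paths discussed in the hdfs-orc-scanner.cc comments above: the optimized count(*) at line 776 and the zero slot scan at line 807. It assumes, as those comments ask to document, that only the scanner of the footer split produces rows for a file. ScanRange, RowsForRange and ZeroSlotBatches are hypothetical names, not Impala's API. The real scanner reads the count via reader_->getNumberOfRows() and commits template tuples into a RowBatch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified stand-in for a scan range over one ORC file.
struct ScanRange {
  bool is_footer_split;  // true only for the split that contains the file footer
};

// Rows this range accounts for. Assumption: the file's row count comes from the
// ORC footer and is reported only by the footer split, so each file is counted
// exactly once (in Impala the gating is tied to HdfsScanner::IssueFooterRanges()).
int64_t RowsForRange(const ScanRange& range, int64_t footer_num_rows) {
  return range.is_footer_split ? footer_num_rows : 0;
}

// Zero slot scan (e.g. "select 1 from t"): no column is read, so the scanner
// only needs to emit 'num_rows' copies of a template tuple, at most
// 'batch_capacity' per row batch. Returns the size of each committed batch.
std::vector<int64_t> ZeroSlotBatches(int64_t num_rows, int64_t batch_capacity) {
  assert(batch_capacity > 0);  // a non-positive capacity would never make progress
  std::vector<int64_t> batches;
  while (num_rows > 0) {
    int64_t num_to_commit = std::min(num_rows, batch_capacity);
    batches.push_back(num_to_commit);
    num_rows -= num_to_commit;
  }
  return batches;
}
```

With a batch capacity of 1024, a 2500-row file yields zero slot batches of 1024, 1024 and 452 rows, while an optimized count(*) over the same file only needs the single footer value.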
http://gerrit.cloudera.org:8080/#/c/18327/4/testdata/workloads/functional-query/queries/QueryTest/orc-stats-agg.test
File testdata/workloads/functional-query/queries/QueryTest/orc-stats-agg.test:

http://gerrit.cloudera.org:8080/#/c/18327/4/testdata/workloads/functional-query/queries/QueryTest/orc-stats-agg.test@5
PS4, Line 5: from functional_orc_def.uncomp_src_alltypes
Could you add a test to cover the old optimization (i.e. zero slot table scan)? E.g. select 1 from functional_orc_def.alltypestiny

http://gerrit.cloudera.org:8080/#/c/18327/2/tests/query_test/test_aggregation.py
File tests/query_test/test_aggregation.py:

http://gerrit.cloudera.org:8080/#/c/18327/2/tests/query_test/test_aggregation.py@279
PS2, Line 279: if (vector.get_value('table_format').file_format != 'text' or
             :     vector.get_value('table_format').compression_codec != 'none'):
> Looking again, the core exploration of this test only have single 'text/non
Thanks for looking into this! I think it's worth a comment here to save other developers' time.


--
To view, visit http://gerrit.cloudera.org:8080/18327

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0fafa1182f97323aeb9ee39dd4e8ecd418fa6091
Gerrit-Change-Number: 18327
Gerrit-PatchSet: 4
Gerrit-Owner: Riza Suminto <riza.sumi...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <huangquanl...@gmail.com>
Gerrit-Reviewer: Riza Suminto <riza.sumi...@cloudera.com>
Gerrit-Comment-Date: Sat, 26 Mar 2022 09:52:35 +0000
Gerrit-HasComments: Yes