Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/17860 )
Change subject: IMPALA-9873: Avoid materilization of columns for filtered out rows in Parquet table. ...................................................................... Patch Set 12: (5 comments) Looks great. We may need to beef up the test section a bit. http://gerrit.cloudera.org:8080/#/c/17860/12/be/src/exec/scratch-tuple-batch-test.cc File be/src/exec/scratch-tuple-batch-test.cc: http://gerrit.cloudera.org:8080/#/c/17860/12/be/src/exec/scratch-tuple-batch-test.cc@69 PS12, Line 69: 2, 4, 8, 16, 32 nit. I wonder if we can construct a test with randomly assigned true/false for the batch as follows. 1. Random assign true/false: for each element, flip a coin; 2. Construct the micro batches; 3. Verify the micro batches as follows: each one has valid start/end and length per selected_rows[start, end], and the gap between batch i and i+1 is correct. http://gerrit.cloudera.org:8080/#/c/17860/12/be/src/exec/scratch-tuple-batch-test.cc@86 PS12, Line 86: Creates nit. Expect to observe http://gerrit.cloudera.org:8080/#/c/17860/10/be/src/exec/scratch-tuple-batch.h File be/src/exec/scratch-tuple-batch.h: http://gerrit.cloudera.org:8080/#/c/17860/10/be/src/exec/scratch-tuple-batch.h@29 PS10, Line 29: ScratchMicroBatch > Using aggregate initialisers instead of constructor accepting arguments as Done http://gerrit.cloudera.org:8080/#/c/17860/10/be/src/exec/scratch-tuple-batch.h@203 PS10, Line 203: << "should be true"; : /// Add the last micro batch which was b > We can avoid that extra branch and condition and also extra condition on cl Done http://gerrit.cloudera.org:8080/#/c/17860/12/testdata/workloads/functional-query/queries/QueryTest/min_max_filters.test File testdata/workloads/functional-query/queries/QueryTest/min_max_filters.test: http://gerrit.cloudera.org:8080/#/c/17860/12/testdata/workloads/functional-query/queries/QueryTest/min_max_filters.test@436 PS12, Line 436: row_regex:.* RF00.\[min_max\] -. .\.wr_item_sk.* In addition, I wonder if we can grab a few counters on late materialized rows. For example a total of 100 rows is returned out of a total 1000 rows, then row_regex: .*LATE_MATERIALIZED_ROWS=100.* Such test probably can be done for a mix of each kind of expressions handled in the code, and the return type of scan column. -- To view, visit http://gerrit.cloudera.org:8080/17860 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I46406c913297d5bbbec3ccae62a83bb214ed2c60 Gerrit-Change-Number: 17860 Gerrit-PatchSet: 12 Gerrit-Owner: Amogh Margoor <amarg...@gmail.com> Gerrit-Reviewer: Amogh Margoor <amarg...@gmail.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Kurt Deschler <kdesc...@cloudera.com> Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com> Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com> Gerrit-Comment-Date: Tue, 26 Oct 2021 13:24:52 +0000 Gerrit-HasComments: Yes