Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17860 )

Change subject: IMPALA-9873: Avoid materilization of columns for filtered out 
rows in Parquet table.
......................................................................


Patch Set 12:

(5 comments)

Looks great. We may need to beef up the test section a bit.

http://gerrit.cloudera.org:8080/#/c/17860/12/be/src/exec/scratch-tuple-batch-test.cc
File be/src/exec/scratch-tuple-batch-test.cc:

http://gerrit.cloudera.org:8080/#/c/17860/12/be/src/exec/scratch-tuple-batch-test.cc@69
PS12, Line 69: 2, 4, 8, 16, 32
nit. I wonder if we can construct a test with randomly assigned true/false for 
the batch as follows.

1. Random assign true/false: for each element, flip a coin;
2. Construct the micro batches;
3. Verify the micro batches as follows: each one has valid start/end and length 
per selected_rows[start, end], and the gap between batch i and i+1 is correct.


http://gerrit.cloudera.org:8080/#/c/17860/12/be/src/exec/scratch-tuple-batch-test.cc@86
PS12, Line 86: Creates
nit. Expect to observe


http://gerrit.cloudera.org:8080/#/c/17860/10/be/src/exec/scratch-tuple-batch.h
File be/src/exec/scratch-tuple-batch.h:

http://gerrit.cloudera.org:8080/#/c/17860/10/be/src/exec/scratch-tuple-batch.h@29
PS10, Line 29: ScratchMicroBatch
> Using aggregate initialisers instead of constructor accepting arguments as
Done


http://gerrit.cloudera.org:8080/#/c/17860/10/be/src/exec/scratch-tuple-batch.h@203
PS10, Line 203:                     << "should be true";
              :     /// Add the last micro batch which was b
> We can avoid that extra branch and condition and also extra condition on cl
Done


http://gerrit.cloudera.org:8080/#/c/17860/12/testdata/workloads/functional-query/queries/QueryTest/min_max_filters.test
File testdata/workloads/functional-query/queries/QueryTest/min_max_filters.test:

http://gerrit.cloudera.org:8080/#/c/17860/12/testdata/workloads/functional-query/queries/QueryTest/min_max_filters.test@436
PS12, Line 436: row_regex:.* RF00.\[min_max\] -. .\.wr_item_sk.*
In addition, I wonder if we can grab a few counters on late materialized rows.  
For example a total of 100 rows is returned out of a total 1000 rows, then

row_regex: .*LATE_MATERIALIZED_ROWS=100.*

Such test probably can be done for a mix of each kind of expressions handled in 
the code, and the return type of scan column.



--
To view, visit http://gerrit.cloudera.org:8080/17860
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I46406c913297d5bbbec3ccae62a83bb214ed2c60
Gerrit-Change-Number: 17860
Gerrit-PatchSet: 12
Gerrit-Owner: Amogh Margoor <amarg...@gmail.com>
Gerrit-Reviewer: Amogh Margoor <amarg...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kdesc...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com>
Gerrit-Comment-Date: Tue, 26 Oct 2021 13:24:52 +0000
Gerrit-HasComments: Yes

Reply via email to