Kurt Deschler has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17860 )

Change subject: [WIP] IMPALA-9873: Avoid materilization of columns for filtered 
out rows in Parquet table.
......................................................................


Patch Set 1:

(13 comments)

http://gerrit.cloudera.org:8080/#/c/17860/1/be/src/exec/hdfs-columnar-scanner-ir.cc
File be/src/exec/hdfs-columnar-scanner-ir.cc:

http://gerrit.cloudera.org:8080/#/c/17860/1/be/src/exec/hdfs-columnar-scanner-ir.cc@25
PS1, Line 25: selected_rows
Can you make this a member of RowBatch or a devived classso you don't need to 
pass the 2 (related) arguments separately?


http://gerrit.cloudera.org:8080/#/c/17860/1/be/src/exec/hdfs-orc-scanner.cc
File be/src/exec/hdfs-orc-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/17860/1/be/src/exec/hdfs-orc-scanner.cc@629
PS1, Line 629:     bool selected_rows[scratch_batch_->capacity];
Should check end of stack here or allocate memory if capacity is anything 
substantial. If not, comment/DCHECK that size is small. Non-issue if this moves 
into RowBatch.


http://gerrit.cloudera.org:8080/#/c/17860/1/be/src/exec/parquet/hdfs-parquet-scanner.h
File be/src/exec/parquet/hdfs-parquet-scanner.h:

http://gerrit.cloudera.org:8080/#/c/17860/1/be/src/exec/parquet/hdfs-parquet-scanner.h@431
PS1, Line 431:   /// Column readers among 'column_readers_' used for filtering
Consider packaging these in a class if they are going to be passed together.


http://gerrit.cloudera.org:8080/#/c/17860/1/be/src/exec/parquet/hdfs-parquet-scanner.h@911
PS1, Line 911:   vector<SlotId> GetSlotIdsForConjuncts(vector<ScalarExpr*> 
conjuncts);
Pass vector as reference to avoid copying.


http://gerrit.cloudera.org:8080/#/c/17860/1/be/src/exec/parquet/hdfs-parquet-scanner.cc
File be/src/exec/parquet/hdfs-parquet-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/17860/1/be/src/exec/parquet/hdfs-parquet-scanner.cc@246
PS1, Line 246:     vector<ParquetColumnReader*>& non_filter_readers) {
Declare function const.


http://gerrit.cloudera.org:8080/#/c/17860/1/be/src/exec/parquet/hdfs-parquet-scanner.cc@254
PS1, Line 254:   slot_ids.insert(std::begin(conjunct_slot_ids), 
std::end(conjunct_slot_ids));
use conjunct_slot_ids instead of making a copy.


http://gerrit.cloudera.org:8080/#/c/17860/1/be/src/exec/parquet/hdfs-parquet-scanner.cc@2147
PS1, Line 2147:       // Old logic
Old vs new not meaningful comment here. This doesn't look like the old logic..


http://gerrit.cloudera.org:8080/#/c/17860/1/be/src/exec/parquet/hdfs-parquet-scanner.cc@2148
PS1, Line 2148:       ScratchMicroBatch batches[1];
change array[1] to single variable.


http://gerrit.cloudera.org:8080/#/c/17860/1/be/src/exec/parquet/hdfs-parquet-scanner.cc@2151
PS1, Line 2151:       batches[0].length = scratch_batch_->capacity;
Probably best to leave the old logic in-tact here both to avoid any potential 
performance regressions and so that the new logic can be completely disabled 
via parameter. Plus calling the new interface for a single batch seems clunky.


http://gerrit.cloudera.org:8080/#/c/17860/1/be/src/exec/parquet/hdfs-parquet-scanner.cc@2166
PS1, Line 2166:       ScratchMicroBatch batches[1];
Move this new logic into a function as well.


http://gerrit.cloudera.org:8080/#/c/17860/1/be/src/exec/parquet/hdfs-parquet-scanner.cc@2238
PS1, Line 2238:         // continue the old range as 'last' is within 
'skip_length' of last range.
How does it end up in this case?


http://gerrit.cloudera.org:8080/#/c/17860/1/be/src/exec/parquet/hdfs-parquet-scanner.cc@2253
PS1, Line 2253:     batches[range].start = start;
Can we only get here if batch_size==0?


http://gerrit.cloudera.org:8080/#/c/17860/1/be/src/exec/parquet/hdfs-parquet-scanner.cc@2275
PS1, Line 2275:             return Status(Substitute("Couldn't skip rows in 
file $0.", filename()));
Add comments how this can occur



--
To view, visit http://gerrit.cloudera.org:8080/17860
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I46406c913297d5bbbec3ccae62a83bb214ed2c60
Gerrit-Change-Number: 17860
Gerrit-PatchSet: 1
Gerrit-Owner: Amogh Margoor <amarg...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kdesc...@cloudera.com>
Gerrit-Comment-Date: Wed, 29 Sep 2021 18:07:13 +0000
Gerrit-HasComments: Yes

Reply via email to