Amogh Margoor has posted comments on this change. ( http://gerrit.cloudera.org:8080/17860 )
Change subject: [WIP] IMPALA-9873: Avoid materialization of columns for filtered out rows in Parquet table.
......................................................................


Patch Set 5:

(44 comments)

http://gerrit.cloudera.org:8080/#/c/17860/1/be/src/exec/hdfs-columnar-scanner-ir.cc
File be/src/exec/hdfs-columnar-scanner-ir.cc:

http://gerrit.cloudera.org:8080/#/c/17860/1/be/src/exec/hdfs-columnar-scanner-ir.cc@25
PS1, Line 25: selected_rows
> Can you make this a member of RowBatch or a derived class so you don't need
'selected_rows' is not related to RowBatch. RowBatch is also used by many other operators and is serialized/deserialized, where 'selected_rows' would have no relevance. Additionally, RowBatches hold on to memory much longer than the VLA (variable-length array), as they are streamed from one operator to another until they hit a blocking operator or the root of the fragment. However, 'selected_rows' and this function are related to ScratchTupleBatch. If the scratch batch is not shared between threads, we can make it a member of ScratchTupleBatch - just that getting that memory from malloc/new might be slower than a VLA.

http://gerrit.cloudera.org:8080/#/c/17860/3/be/src/exec/hdfs-columnar-scanner-ir.cc
File be/src/exec/hdfs-columnar-scanner-ir.cc:

http://gerrit.cloudera.org:8080/#/c/17860/3/be/src/exec/hdfs-columnar-scanner-ir.cc@25
PS3, Line 25: selected_rows
> Can you make this a member of RowBatch or a derived class so you don't need
'selected_rows' is not related to RowBatch. Additionally, RowBatch is used by many other operators and is serialized/deserialized, where 'selected_rows' has no significance; it can also keep holding on to memory much longer than the VLA (variable-length array) currently used, since it is streamed from one operator to another until it reaches the fragment root or a blocking operator. I had thought about pushing it into ScratchTupleBatch, to which both this function and 'selected_rows' are related, but didn't do it because a VLA is faster than malloc.
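If 'selected_rows' were moved into ScratchTupleBatch as discussed, the once-per-batch heap allocation replacing the per-call VLA could look roughly like the following sketch. The names here are hypothetical and this is not the actual Impala code; it only illustrates the design trade-off being discussed (one heap allocation at construction vs. a stack VLA on every call).

```cpp
#include <cassert>
#include <memory>

// Hypothetical sketch: a scratch-batch-owned selection buffer, allocated
// once when the batch is created instead of as a VLA on each call.
struct ScratchTupleBatchSketch {
  int capacity;
  // Value-initialized to all-false; lives as long as the scratch batch.
  std::unique_ptr<bool[]> selected_rows;

  explicit ScratchTupleBatchSketch(int cap)
      : capacity(cap), selected_rows(new bool[cap]()) {}
};
```

The cost is one malloc per scratch batch rather than a stack allocation per call, which matches the concern raised in the reply above.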
http://gerrit.cloudera.org:8080/#/c/17860/1/be/src/exec/hdfs-orc-scanner.cc
File be/src/exec/hdfs-orc-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/17860/1/be/src/exec/hdfs-orc-scanner.cc@629
PS1, Line 629: bool selected_rows[scratch_batch_->capacity];
> Should check end of stack here or allocate memory if capacity is anything s
Thinking about making it a member of ScratchTupleBatch, as mentioned in another discussion.

http://gerrit.cloudera.org:8080/#/c/17860/3/be/src/exec/parquet/hdfs-parquet-scanner.h
File be/src/exec/parquet/hdfs-parquet-scanner.h:

http://gerrit.cloudera.org:8080/#/c/17860/3/be/src/exec/parquet/hdfs-parquet-scanner.h@430
PS3, Line 430: std::vector<ParquetColumnReader*> column_readers_;
> Consider packaging these in a class if they are going to be passed together
They are not passed together in the current change except when being initialized, and they are not used outside this class, so we can avoid the extra indirection.

http://gerrit.cloudera.org:8080/#/c/17860/3/be/src/exec/parquet/hdfs-parquet-scanner.h@560
PS3, Line 560: d materializ
> materializing
Changed it throughout.

http://gerrit.cloudera.org:8080/#/c/17860/3/be/src/exec/parquet/hdfs-parquet-scanner.h@560
PS3, Line 560: th
> nit: them?
Done

http://gerrit.cloudera.org:8080/#/c/17860/3/be/src/exec/parquet/hdfs-parquet-scanner.h@562
PS3, Line 562: skip materia
> materializing
Done

http://gerrit.cloudera.org:8080/#/c/17860/3/be/src/exec/parquet/hdfs-parquet-scanner.h@565
PS3, Line 565: Materializing
> materializing
Done

http://gerrit.cloudera.org:8080/#/c/17860/3/be/src/exec/parquet/hdfs-parquet-scanner.h@568
PS3, Line 568: _;
> Why is it a one-element array? The name also misses a last underscore.
Changed it.

http://gerrit.cloudera.org:8080/#/c/17860/3/be/src/exec/parquet/hdfs-parquet-scanner.h@927
PS3, Line 927: /// Micro batches are sub ranges in 0..num_tuples-1 which needs to be read.
> nit: Unnecessary empty line.
Done

http://gerrit.cloudera.org:8080/#/c/17860/3/be/src/exec/parquet/hdfs-parquet-scanner.h@928
PS3, Line 928: /// Tuple memory to write to is specified by 'scratch_batch->tuple_mem'.
> Pass vector as reference to avoid copying.
Done

http://gerrit.cloudera.org:8080/#/c/17860/3/be/src/exec/parquet/hdfs-parquet-scanner.h@930
PS3, Line 930: ch* row_batch, bool
> nit: 'micro_batches'
Done

http://gerrit.cloudera.org:8080/#/c/17860/3/be/src/exec/parquet/hdfs-parquet-scanner.h@941
PS3, Line 941:
> nit: first
Done

http://gerrit.cloudera.org:8080/#/c/17860/1/be/src/exec/parquet/hdfs-parquet-scanner.h
File be/src/exec/parquet/hdfs-parquet-scanner.h:

http://gerrit.cloudera.org:8080/#/c/17860/1/be/src/exec/parquet/hdfs-parquet-scanner.h@431
PS1, Line 431: /// Column readers among 'column_readers_' used for filtering
> Consider packaging these in a class if they are going to be passed together
In the recent change, they are not passed together except when being initialized.

http://gerrit.cloudera.org:8080/#/c/17860/1/be/src/exec/parquet/hdfs-parquet-scanner.h@911
PS1, Line 911: /// values that pass the conjuncts.
> Pass vector as reference to avoid copying.
Done

http://gerrit.cloudera.org:8080/#/c/17860/1/be/src/exec/parquet/hdfs-parquet-scanner.cc
File be/src/exec/parquet/hdfs-parquet-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/17860/1/be/src/exec/parquet/hdfs-parquet-scanner.cc@246
PS1, Line 246: }
> Declare function const.
Done

http://gerrit.cloudera.org:8080/#/c/17860/1/be/src/exec/parquet/hdfs-parquet-scanner.cc@254
PS1, Line 254: std::end(scan_node_->conjuncts()));
> use conjunct_slot_ids instead of making a copy.
Done

http://gerrit.cloudera.org:8080/#/c/17860/1/be/src/exec/parquet/hdfs-parquet-scanner.cc@2147
PS1, Line 2147: bool continue_execution;
> Old vs new not meaningful comment here. This doesn't look like the old logi
This is changed now.
http://gerrit.cloudera.org:8080/#/c/17860/1/be/src/exec/parquet/hdfs-parquet-scanner.cc@2148
PS1, Line 2148: if (col_reader->max_rep_level() > 0) {
> change array[1] to single variable.
It has been pulled into a member variable.

http://gerrit.cloudera.org:8080/#/c/17860/1/be/src/exec/parquet/hdfs-parquet-scanner.cc@2151
PS1, Line 2151: &scratch_batch_->num_tuples);
> Probably best to leave the old logic in-tact here both to avoid any potenti
Makes sense. I have kept the old code intact in AssemblyRowInternal.

http://gerrit.cloudera.org:8080/#/c/17860/1/be/src/exec/parquet/hdfs-parquet-scanner.cc@2166
PS1, Line 2166: Status err(Substitute("Corrupt Parquet file '$0': column '$1' "
> Move this new logic into a function as well.
This function has only the new logic now.

http://gerrit.cloudera.org:8080/#/c/17860/1/be/src/exec/parquet/hdfs-parquet-scanner.cc@2238
PS1, Line 2238: if (num_row_to_commit == 0) {
> How does it end up in this case?
It ends up here even for a simple case like consecutive true values. For more complicated cases, assume 'skip_length' is 5 and a batch of 10: TTTFFFTTTT - it will enter here for all true values except the first one.

http://gerrit.cloudera.org:8080/#/c/17860/1/be/src/exec/parquet/hdfs-parquet-scanner.cc@2253
PS1, Line 2253: // When using Page Index, materialize the entire batch. Currently, only avoiding
> Can we only get here if batch_size==0?
We get here for pretty much everything except when batch_size is 0 or when all values in the batch are false. I am assuming you meant to ask when we would hit the false branch here. However, I am removing the condition (start == -1) and adding a DCHECK to ensure batch sizes are not 0 and values are not all false.

http://gerrit.cloudera.org:8080/#/c/17860/1/be/src/exec/parquet/hdfs-parquet-scanner.cc@2275
PS1, Line 2275: if (row_end) {
> Add comments how this can occur
Added a comment to the function declaration of 'SkipRows'.
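The reply above describes splitting a selection vector like TTTFFFTTTT into micro batches, where runs of selected rows separated by a gap no larger than 'skip_length' are merged (skipping a very short gap can cost more than just materializing it). A standalone sketch of that idea, with invented names and not the patch's actual code, could look like:

```cpp
#include <utility>
#include <vector>

// Hypothetical sketch: turn a selection vector into inclusive [start, end]
// "micro batch" ranges. Two runs of selected rows are merged into one range
// when the number of unselected rows between them is <= skip_length.
std::vector<std::pair<int, int>> ComputeMicroBatches(
    const std::vector<bool>& selected, int skip_length) {
  std::vector<std::pair<int, int>> batches;
  int start = -1;  // start of the current range, -1 if none open
  int last = -1;   // last selected index seen
  for (int i = 0; i < static_cast<int>(selected.size()); ++i) {
    if (!selected[i]) continue;
    if (start == -1) {
      start = i;  // open the first range
    } else if (i - last - 1 > skip_length) {
      // Gap is big enough to be worth skipping: close the range.
      batches.emplace_back(start, last);
      start = i;
    }
    last = i;
  }
  if (start != -1) batches.emplace_back(start, last);
  return batches;
}
```

With the example from the review (skip_length 5, batch TTTFFFTTTT), the gap of three false values is below the threshold, so the whole batch becomes a single range; a smaller skip_length of 2 would split it into two ranges.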
http://gerrit.cloudera.org:8080/#/c/17860/3/be/src/exec/parquet/hdfs-parquet-scanner.cc
File be/src/exec/parquet/hdfs-parquet-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/17860/3/be/src/exec/parquet/hdfs-parquet-scanner.cc@249
PS3, Line 249: const vector<ParquetColumnRead
> nit. const vector<ParquetColumnReader*>&.
Done

http://gerrit.cloudera.org:8080/#/c/17860/3/be/src/exec/parquet/hdfs-parquet-scanner.cc@250
PS3, Line 250: vector<ParquetColumnReader*>* filter_readers,
             : vector<ParquetColumnReader*>* non_filter_readers)
> Declare function const.
Done

http://gerrit.cloudera.org:8080/#/c/17860/3/be/src/exec/parquet/hdfs-parquet-scanner.cc@250
PS3, Line 250: vector<ParquetColumnReader*>* filter_readers,
             : vector<ParquetColumnReader*>* non_filter_readers)
> In Impala BE, prefer to use vector<ParquetColumnReader*>* when the argument
Done

http://gerrit.cloudera.org:8080/#/c/17860/3/be/src/exec/parquet/hdfs-parquet-scanner.cc@259
PS3, Line 259: ParquetColumnReader* column_reader = column_readers[column_idx];
> use conjunct_slot_ids instead of making a copy.
Done

http://gerrit.cloudera.org:8080/#/c/17860/3/be/src/exec/parquet/hdfs-parquet-scanner.cc@270
PS3, Line 270:
> should use const T&.
Done

http://gerrit.cloudera.org:8080/#/c/17860/3/be/src/exec/parquet/parquet-column-chunk-reader.h
File be/src/exec/parquet/parquet-column-chunk-reader.h:

http://gerrit.cloudera.org:8080/#/c/17860/3/be/src/exec/parquet/parquet-column-chunk-reader.h@127
PS3, Line 127: should be called after we know that the first page is not a dictionary page.
             : // Th
> nit. format
Done

http://gerrit.cloudera.org:8080/#/c/17860/3/be/src/exec/parquet/parquet-column-readers.h
File be/src/exec/parquet/parquet-column-readers.h:

http://gerrit.cloudera.org:8080/#/c/17860/3/be/src/exec/parquet/parquet-column-readers.h@439
PS3, Line 439: and
> will?
Done

http://gerrit.cloudera.org:8080/#/c/17860/3/be/src/exec/parquet/parquet-column-readers.h@439
PS3, Line 439:
> nit.
Regarding inserting "regardless of the page type" to make it clear: it is going to be a data page only; this is just a wrapper over ReadNextPageHeader. Changed the comment.

http://gerrit.cloudera.org:8080/#/c/17860/3/be/src/exec/parquet/parquet-column-readers.h@526
PS3, Line 526:
> missing comment?
Added.

http://gerrit.cloudera.org:8080/#/c/17860/3/be/src/exec/parquet/parquet-column-readers.h@527
PS3, Line 527: /// Skip values in the page data. Returns true on success, false otherwise.
> May add a DCHECK(candidate_page_idx_ <= candidate_data_pages_.size() - 1) h
Done

http://gerrit.cloudera.org:8080/#/c/17860/3/be/src/exec/parquet/parquet-column-readers.h@553
PS3, Line 553:
> nit. Noticed that this change and all the ones following this one are due t
Sure. Not sure if this happened due to clang-format.

http://gerrit.cloudera.org:8080/#/c/17860/3/be/src/exec/parquet/parquet-column-readers.cc
File be/src/exec/parquet/parquet-column-readers.cc:

http://gerrit.cloudera.org:8080/#/c/17860/3/be/src/exec/parquet/parquet-column-readers.cc@1151
PS3, Line 1151: Status::OK();
> We should return a non Status::OK() object here.
Not really - it depends on whether abort_on_error is set as a query option. LogCorruptNumValuesInMetadataError is used similarly in other places.

http://gerrit.cloudera.org:8080/#/c/17860/3/be/src/exec/parquet/parquet-column-readers.cc@1160
PS3, Line 1160: UNLIKELY(num_v
> nit. UNLIKELY()
Done

http://gerrit.cloudera.org:8080/#/c/17860/3/be/src/exec/parquet/parquet-column-readers.cc@1210
PS3, Line 1210: remaining
> init to 0.
Done

http://gerrit.cloudera.org:8080/#/c/17860/3/be/src/exec/parquet/parquet-column-readers.cc@1372
PS3, Line 1372:
> This is redundant.
Right, removed it. Thanks.

http://gerrit.cloudera.org:8080/#/c/17860/3/be/src/exec/parquet/parquet-column-readers.cc@1479
PS3, Line 1479: AdvanceNextPageHea
> AdvanceNextPageHeader() sounds better.
For sure. Changed it. I was under the influence of 'JumpToNextPage'.
http://gerrit.cloudera.org:8080/#/c/17860/3/be/src/exec/parquet/parquet-column-readers.cc@1484
PS3, Line 1484: num_buffered_values_ == 0
> I wonder if we could skip this empty page and advance to the next one.
This signifies the end of the RowGroup. The same logic is currently being used (check NextPage()).

http://gerrit.cloudera.org:8080/#/c/17860/3/be/src/exec/parquet/parquet-column-readers.cc@1509
PS3, Line 1509: /// Wrapper around 'SkipTopLevelRows' to skip across multiple pages.
             : /// Function handles 3 scenarios:
             : /// 1. Page Filtering: When this is enabled this function can be used
             : /// to skip to a particular 'skip_row_id'.
             : /// 2. Collection: When this scalar reader is reading elements of a collection
             : /// 3. Rest of the cases.
             : //
> This probably should be inlined.
Makes sense. Done.

http://gerrit.cloudera.org:8080/#/c/17860/3/be/src/exec/parquet/parquet-column-readers.cc@1517
PS3, Line 1517: /// At that point, we can just skip to the required row id.
> Please add a description for the algorithm used in the method, as well as o
Done

http://gerrit.cloudera.org:8080/#/c/17860/3/be/src/exec/parquet/parquet-column-readers.cc@1538
PS3, Line 1538: }
> Do we need to check the status here?
We don't need to check it here; we can just return it and the caller will check. But before returning, we can fix 'candidate_row_range' even when skipping happened only partially and failed.
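For the "skip across multiple pages" behavior discussed above, a simplified standalone sketch may help. All names and structures here are invented for illustration (they are not Impala's actual reader types): the loop drains rows from the current page's buffered values and advances to the next data page whenever the current one is exhausted, failing if the row group ends first.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a column chunk's sequence of data pages.
struct PageStream {
  std::vector<int64_t> page_sizes;  // number of top-level rows per data page
  size_t page_idx = 0;              // next page to load
  int64_t buffered = 0;             // rows left in the currently loaded page
};

// Sketch of skipping 'num_rows' top-level rows, crossing page boundaries
// as needed. Returns false if the row group runs out of pages first.
bool SkipTopLevelRowsSketch(PageStream* s, int64_t num_rows) {
  while (num_rows > 0) {
    if (s->buffered == 0) {
      if (s->page_idx >= s->page_sizes.size()) return false;  // end of row group
      s->buffered = s->page_sizes[s->page_idx++];  // "load" the next page
    }
    int64_t n = std::min(num_rows, s->buffered);
    s->buffered -= n;  // the real reader would decode or skip encoded values here
    num_rows -= n;
  }
  return true;
}
```

The real method additionally has to handle repetition levels for collections and page-index row ids, which this sketch deliberately omits.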
--
To view, visit http://gerrit.cloudera.org:8080/17860
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I46406c913297d5bbbec3ccae62a83bb214ed2c60
Gerrit-Change-Number: 17860
Gerrit-PatchSet: 5
Gerrit-Owner: Amogh Margoor <amarg...@gmail.com>
Gerrit-Reviewer: Amogh Margoor <amarg...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kdesc...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com>
Gerrit-Comment-Date: Wed, 06 Oct 2021 14:45:00 +0000
Gerrit-HasComments: Yes