Amogh Margoor has posted comments on this change. ( http://gerrit.cloudera.org:8080/17860 )
Change subject: [WIP] IMPALA-9873: Avoid materialization of columns for filtered out rows in Parquet table.
......................................................................


Patch Set 5:

(44 comments)

http://gerrit.cloudera.org:8080/#/c/17860/1/be/src/exec/hdfs-columnar-scanner-ir.cc
File be/src/exec/hdfs-columnar-scanner-ir.cc:

http://gerrit.cloudera.org:8080/#/c/17860/1/be/src/exec/hdfs-columnar-scanner-ir.cc@25
PS1, Line 25: selected_rows
> Can you make this a member of RowBatch or a derived class so you don't need
'selected_rows' is not related to RowBatch. RowBatch is also used by many other operators and is serialized/deserialized, where 'selected_rows' would have no relevance. Additionally, RowBatches hold on to memory much longer than the VLA (variable-length array), as they are streamed from one operator to another until they hit a blocking operator or the root of the fragment. However, 'selected_rows' and this function are related to ScratchTupleBatch. If the scratch batch is not shared between threads, we can make it a member of ScratchTupleBatch - just that getting that memory from malloc/new might be slower than a VLA.

http://gerrit.cloudera.org:8080/#/c/17860/3/be/src/exec/hdfs-columnar-scanner-ir.cc
File be/src/exec/hdfs-columnar-scanner-ir.cc:

http://gerrit.cloudera.org:8080/#/c/17860/3/be/src/exec/hdfs-columnar-scanner-ir.cc@25
PS3, Line 25: selected_rows
> Can you make this a member of RowBatch or a derived class so you don't need
'selected_rows' is not related to RowBatch. Additionally, RowBatch is used by many other operators and is serialized/deserialized, where 'selected_rows' has no significance; it can also keep holding on to memory much longer than the VLA (variable-length array) currently used, since it is streamed from one operator to another until it reaches the fragment root or a blocking operator. I had thought about pushing it into ScratchTupleBatch, to which both this function and 'selected_rows' are related, but didn't do it because a VLA is faster than malloc.
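If 'selected_rows' were moved into ScratchTupleBatch as discussed, the once-per-batch heap allocation replacing the per-call VLA could look roughly like the following sketch. The names here are hypothetical and this is not the actual Impala code; it only illustrates the design trade-off being discussed (one heap allocation at construction vs. a stack VLA on every call).

```cpp
#include <cassert>
#include <memory>

// Hypothetical sketch: a scratch-batch-owned selection buffer, allocated
// once when the batch is created instead of as a VLA on each call.
struct ScratchTupleBatchSketch {
  int capacity;
  // Value-initialized to all-false; lives as long as the scratch batch.
  std::unique_ptr<bool[]> selected_rows;

  explicit ScratchTupleBatchSketch(int cap)
      : capacity(cap), selected_rows(new bool[cap]()) {}
};
```

The cost is one malloc per scratch batch rather than a stack allocation per call, which matches the concern raised in the reply above.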
http://gerrit.cloudera.org:8080/#/c/17860/1/be/src/exec/hdfs-orc-scanner.cc
File be/src/exec/hdfs-orc-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/17860/1/be/src/exec/hdfs-orc-scanner.cc@629
PS1, Line 629: bool selected_rows[scratch_batch_->capacity];
> Should check end of stack here or allocate memory if capacity is anything s
Thinking about making it a member of ScratchTupleBatch, as mentioned in another discussion.

http://gerrit.cloudera.org:8080/#/c/17860/3/be/src/exec/parquet/hdfs-parquet-scanner.h
File be/src/exec/parquet/hdfs-parquet-scanner.h:

http://gerrit.cloudera.org:8080/#/c/17860/3/be/src/exec/parquet/hdfs-parquet-scanner.h@430
PS3, Line 430: std::vector<ParquetColumnReader*> column_readers_;
> Consider packaging these in a class if they are going to be passed together
They are not passed together in the current change except when being initialized, and they are not used outside this class, so we can avoid the extra indirection.

http://gerrit.cloudera.org:8080/#/c/17860/3/be/src/exec/parquet/hdfs-parquet-scanner.h@560
PS3, Line 560: d materializ
> materializing
Changed it throughout.

http://gerrit.cloudera.org:8080/#/c/17860/3/be/src/exec/parquet/hdfs-parquet-scanner.h@560
PS3, Line 560: th
> nit: them?
Done

http://gerrit.cloudera.org:8080/#/c/17860/3/be/src/exec/parquet/hdfs-parquet-scanner.h@562
PS3, Line 562: skip materia
> materializing
Done

http://gerrit.cloudera.org:8080/#/c/17860/3/be/src/exec/parquet/hdfs-parquet-scanner.h@565
PS3, Line 565: Materializing
> materializing
Done

http://gerrit.cloudera.org:8080/#/c/17860/3/be/src/exec/parquet/hdfs-parquet-scanner.h@568
PS3, Line 568: _;
> Why is it a one-element array? The name also misses a last underscore.
Changed it.

http://gerrit.cloudera.org:8080/#/c/17860/3/be/src/exec/parquet/hdfs-parquet-scanner.h@927
PS3, Line 927: /// Micro batches are sub ranges in 0..num_tuples-1 which needs to be read.
> nit: Unnecessary empty line.
Done

http://gerrit.cloudera.org:8080/#/c/17860/3/be/src/exec/parquet/hdfs-parquet-scanner.h@928
PS3, Line 928: /// Tuple memory to write to is specified by 'scratch_batch->tuple_mem'.
> Pass vector as reference to avoid copying.
Done

http://gerrit.cloudera.org:8080/#/c/17860/3/be/src/exec/parquet/hdfs-parquet-scanner.h@930
PS3, Line 930: ch* row_batch, bool
> nit: 'micro_batches'
Done

http://gerrit.cloudera.org:8080/#/c/17860/3/be/src/exec/parquet/hdfs-parquet-scanner.h@941
PS3, Line 941:
> nit: first
Done

http://gerrit.cloudera.org:8080/#/c/17860/1/be/src/exec/parquet/hdfs-parquet-scanner.h
File be/src/exec/parquet/hdfs-parquet-scanner.h:

http://gerrit.cloudera.org:8080/#/c/17860/1/be/src/exec/parquet/hdfs-parquet-scanner.h@431
PS1, Line 431: /// Column readers among 'column_readers_' used for filtering
> Consider packaging these in a class if they are going to be passed together
In the recent change, they are not passed together except when being initialized.

http://gerrit.cloudera.org:8080/#/c/17860/1/be/src/exec/parquet/hdfs-parquet-scanner.h@911
PS1, Line 911: /// values that pass the conjuncts.
> Pass vector as reference to avoid copying.
Done

http://gerrit.cloudera.org:8080/#/c/17860/1/be/src/exec/parquet/hdfs-parquet-scanner.cc
File be/src/exec/parquet/hdfs-parquet-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/17860/1/be/src/exec/parquet/hdfs-parquet-scanner.cc@246
PS1, Line 246: }
> Declare function const.
Done

http://gerrit.cloudera.org:8080/#/c/17860/1/be/src/exec/parquet/hdfs-parquet-scanner.cc@254
PS1, Line 254: std::end(scan_node_->conjuncts()));
> use conjunct_slot_ids instead of making a copy.
Done

http://gerrit.cloudera.org:8080/#/c/17860/1/be/src/exec/parquet/hdfs-parquet-scanner.cc@2147
PS1, Line 2147: bool continue_execution;
> Old vs new not meaningful comment here. This doesn't look like the old logi
This is changed now.
http://gerrit.cloudera.org:8080/#/c/17860/1/be/src/exec/parquet/hdfs-parquet-scanner.cc@2148
PS1, Line 2148: if (col_reader->max_rep_level() > 0) {
> change array[1] to single variable.
It has been pulled into a member variable.

http://gerrit.cloudera.org:8080/#/c/17860/1/be/src/exec/parquet/hdfs-parquet-scanner.cc@2151
PS1, Line 2151: &scratch_batch_->num_tuples);
> Probably best to leave the old logic in-tact here both to avoid any potenti
Makes sense. I have kept the old code intact in AssemblyRowInternal.

http://gerrit.cloudera.org:8080/#/c/17860/1/be/src/exec/parquet/hdfs-parquet-scanner.cc@2166
PS1, Line 2166: Status err(Substitute("Corrupt Parquet file '$0': column '$1' "
> Move this new logic into a function as well.
This function has only the new logic now.

http://gerrit.cloudera.org:8080/#/c/17860/1/be/src/exec/parquet/hdfs-parquet-scanner.cc@2238
PS1, Line 2238: if (num_row_to_commit == 0) {
> How does it end up in this case?
It ends up here even for a simple case like consecutive true values. For more complicated cases, assume 'skip_length' is 5 and a batch of 10: TTTFFFTTTT - it will enter here for all true values except the first one.

http://gerrit.cloudera.org:8080/#/c/17860/1/be/src/exec/parquet/hdfs-parquet-scanner.cc@2253
PS1, Line 2253: // When using Page Index, materialize the entire batch. Currently, only avoiding
> Can we only get here if batch_size==0?
We get here for pretty much everything except when batch_size is 0 or when all values in the batch are false. I am assuming you meant to ask when we would hit the false branch here. However, I am removing the condition (start == -1) and adding a DCHECK to ensure batch sizes are not 0 and values are not all false.

http://gerrit.cloudera.org:8080/#/c/17860/1/be/src/exec/parquet/hdfs-parquet-scanner.cc@2275
PS1, Line 2275: if (row_end) {
> Add comments how this can occur
Added a comment to the function declaration of 'SkipRows'.
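The reply above describes splitting a selection vector like TTTFFFTTTT into micro batches, where runs of selected rows separated by a gap no larger than 'skip_length' are merged (skipping a very short gap can cost more than just materializing it). A standalone sketch of that idea, with invented names and not the patch's actual code, could look like:

```cpp
#include <utility>
#include <vector>

// Hypothetical sketch: turn a selection vector into inclusive [start, end]
// "micro batch" ranges. Two runs of selected rows are merged into one range
// when the number of unselected rows between them is <= skip_length.
std::vector<std::pair<int, int>> ComputeMicroBatches(
    const std::vector<bool>& selected, int skip_length) {
  std::vector<std::pair<int, int>> batches;
  int start = -1;  // start of the current range, -1 if none open
  int last = -1;   // last selected index seen
  for (int i = 0; i < static_cast<int>(selected.size()); ++i) {
    if (!selected[i]) continue;
    if (start == -1) {
      start = i;  // open the first range
    } else if (i - last - 1 > skip_length) {
      // Gap is big enough to be worth skipping: close the range.
      batches.emplace_back(start, last);
      start = i;
    }
    last = i;
  }
  if (start != -1) batches.emplace_back(start, last);
  return batches;
}
```

With the example from the review (skip_length 5, batch TTTFFFTTTT), the gap of three false values is below the threshold, so the whole batch becomes a single range; a smaller skip_length of 2 would split it into two ranges.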
http://gerrit.cloudera.org:8080/#/c/17860/3/be/src/exec/parquet/hdfs-parquet-scanner.cc
File be/src/exec/parquet/hdfs-parquet-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/17860/3/be/src/exec/parquet/hdfs-parquet-scanner.cc@249
PS3, Line 249: const vector<ParquetColumnRead
> nit. const vector<ParquetColumnReader*>&.
Done

http://gerrit.cloudera.org:8080/#/c/17860/3/be/src/exec/parquet/hdfs-parquet-scanner.cc@250
PS3, Line 250: vector<ParquetColumnReader*>* filter_readers,
             : vector<ParquetColumnReader*>* non_filter_readers)
> Declare function const.
Done

http://gerrit.cloudera.org:8080/#/c/17860/3/be/src/exec/parquet/hdfs-parquet-scanner.cc@250
PS3, Line 250: vector<ParquetColumnReader*>* filter_readers,
             : vector<ParquetColumnReader*>* non_filter_readers)
> In Impala BE, prefer to use vector<ParquetColumnReader*>* when the argument
Done

http://gerrit.cloudera.org:8080/#/c/17860/3/be/src/exec/parquet/hdfs-parquet-scanner.cc@259
PS3, Line 259: ParquetColumnReader* column_reader = column_readers[column_idx];
> use conjunct_slot_ids instead of making a copy.
Done

http://gerrit.cloudera.org:8080/#/c/17860/3/be/src/exec/parquet/hdfs-parquet-scanner.cc@270
PS3, Line 270:
> should use const T&.
Done

http://gerrit.cloudera.org:8080/#/c/17860/3/be/src/exec/parquet/parquet-column-chunk-reader.h
File be/src/exec/parquet/parquet-column-chunk-reader.h:

http://gerrit.cloudera.org:8080/#/c/17860/3/be/src/exec/parquet/parquet-column-chunk-reader.h@127
PS3, Line 127: should be called after we know that the first page is not a dictionary page.
             : // Th
> nit. format
Done

http://gerrit.cloudera.org:8080/#/c/17860/3/be/src/exec/parquet/parquet-column-readers.h
File be/src/exec/parquet/parquet-column-readers.h:

http://gerrit.cloudera.org:8080/#/c/17860/3/be/src/exec/parquet/parquet-column-readers.h@439
PS3, Line 439: and
> will?
Done

http://gerrit.cloudera.org:8080/#/c/17860/3/be/src/exec/parquet/parquet-column-readers.h@439
PS3, Line 439:
> nit.
Regarding inserting "regardless of the page type" to make it clear: it is going to be a data page only; this is just a wrapper over ReadNextPageHeader. Changed the comment.

http://gerrit.cloudera.org:8080/#/c/17860/3/be/src/exec/parquet/parquet-column-readers.h@526
PS3, Line 526:
> missing comment?
Added.

http://gerrit.cloudera.org:8080/#/c/17860/3/be/src/exec/parquet/parquet-column-readers.h@527
PS3, Line 527: /// Skip values in the page data. Returns true on success, false otherwise.
> May add a DCHECK(candidate_page_idx_ <= candidate_data_pages_.size() - 1) h
Done

http://gerrit.cloudera.org:8080/#/c/17860/3/be/src/exec/parquet/parquet-column-readers.h@553
PS3, Line 553:
> nit. Noticed that this change and all the ones following this one are due t
Sure. Not sure if this happened due to clang-format.

http://gerrit.cloudera.org:8080/#/c/17860/3/be/src/exec/parquet/parquet-column-readers.cc
File be/src/exec/parquet/parquet-column-readers.cc:

http://gerrit.cloudera.org:8080/#/c/17860/3/be/src/exec/parquet/parquet-column-readers.cc@1151
PS3, Line 1151: Status::OK();
> We should return a non Status::OK() object here.
Not really - it depends on whether abort_on_error is set as a query option. LogCorruptNumValuesInMetadataError is used similarly in other places.

http://gerrit.cloudera.org:8080/#/c/17860/3/be/src/exec/parquet/parquet-column-readers.cc@1160
PS3, Line 1160: UNLIKELY(num_v
> nit. UNLIKELY()
Done

http://gerrit.cloudera.org:8080/#/c/17860/3/be/src/exec/parquet/parquet-column-readers.cc@1210
PS3, Line 1210: remaining
> init to 0.
Done

http://gerrit.cloudera.org:8080/#/c/17860/3/be/src/exec/parquet/parquet-column-readers.cc@1372
PS3, Line 1372:
> This is redundant.
Right, removed it. Thanks.

http://gerrit.cloudera.org:8080/#/c/17860/3/be/src/exec/parquet/parquet-column-readers.cc@1479
PS3, Line 1479: AdvanceNextPageHea
> AdvanceNextPageHeader() sounds better.
For sure. Changed it. I was under the influence of 'JumpToNextPage'.
http://gerrit.cloudera.org:8080/#/c/17860/3/be/src/exec/parquet/parquet-column-readers.cc@1484
PS3, Line 1484: num_buffered_values_ == 0
> I wonder if we could skip this empty page and advance to the next one.
This signifies the end of the RowGroup. The same logic is currently being used (check NextPage()).

http://gerrit.cloudera.org:8080/#/c/17860/3/be/src/exec/parquet/parquet-column-readers.cc@1509
PS3, Line 1509: /// Wrapper around 'SkipTopLevelRows' to skip across multiple pages.
             : /// Function handles 3 scenarios:
             : /// 1. Page Filtering: When this is enabled this function can be used
             : /// to skip to a particular 'skip_row_id'.
             : /// 2. Collection: When this scalar reader is reading elements of a collection
             : /// 3. Rest of the cases.
             : //
> This probably should be inlined.
Makes sense. Done.

http://gerrit.cloudera.org:8080/#/c/17860/3/be/src/exec/parquet/parquet-column-readers.cc@1517
PS3, Line 1517: /// At that point, we can just skip to the required row id.
> Please add a description for the algorithm used in the method, as well as o
Done

http://gerrit.cloudera.org:8080/#/c/17860/3/be/src/exec/parquet/parquet-column-readers.cc@1538
PS3, Line 1538: }
> Do we need to check the status here?
We don't need to check it here; we can just return it and the caller will check. But before returning, we can fix 'candidate_row_range' even when skipping happened only partially and failed.
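For the "skip across multiple pages" behavior discussed above, a simplified standalone sketch may help. All names and structures here are invented for illustration (they are not Impala's actual reader types): the loop drains rows from the current page's buffered values and advances to the next data page whenever the current one is exhausted, failing if the row group ends first.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a column chunk's sequence of data pages.
struct PageStream {
  std::vector<int64_t> page_sizes;  // number of top-level rows per data page
  size_t page_idx = 0;              // next page to load
  int64_t buffered = 0;             // rows left in the currently loaded page
};

// Sketch of skipping 'num_rows' top-level rows, crossing page boundaries
// as needed. Returns false if the row group runs out of pages first.
bool SkipTopLevelRowsSketch(PageStream* s, int64_t num_rows) {
  while (num_rows > 0) {
    if (s->buffered == 0) {
      if (s->page_idx >= s->page_sizes.size()) return false;  // end of row group
      s->buffered = s->page_sizes[s->page_idx++];  // "load" the next page
    }
    int64_t n = std::min(num_rows, s->buffered);
    s->buffered -= n;  // the real reader would decode or skip encoded values here
    num_rows -= n;
  }
  return true;
}
```

The real method additionally has to handle repetition levels for collections and page-index row ids, which this sketch deliberately omits.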
--
To view, visit http://gerrit.cloudera.org:8080/17860
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I46406c913297d5bbbec3ccae62a83bb214ed2c60
Gerrit-Change-Number: 17860
Gerrit-PatchSet: 5
Gerrit-Owner: Amogh Margoor <amarg...@gmail.com>
Gerrit-Reviewer: Amogh Margoor <amarg...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kdesc...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com>
Gerrit-Comment-Date: Wed, 06 Oct 2021 14:45:00 +0000
Gerrit-HasComments: Yes