Alex Behm has posted comments on this change. Change subject: IMPALA-4923: reduce memory transfer for selective scans ......................................................................
Patch Set 3: (6 comments) http://gerrit.cloudera.org:8080/#/c/6949/2/be/src/exec/parquet-scratch-tuple-batch.h File be/src/exec/parquet-scratch-tuple-batch.h: Line 130: // because the batch we returned earlier may hold a reference into 'tuple_mem'. > I don't think this is a significant problem for performance - if the scan i Thanks for explaining. Makes sense to me. http://gerrit.cloudera.org:8080/#/c/6949/3/be/src/exec/parquet-scratch-tuple-batch.h File be/src/exec/parquet-scratch-tuple-batch.h: Line 53: MemPool aux_mem_pool; Another thought: How about splitting this pool up into memory that is only referenced by tuples in tuple_mem and memory that could be refd by already-returned batches. For example, the decompression buffers live across multiple scratch/transfer rounds, but as far as I can tell all var-len memory that is allocated from this pool is only referenced by tuples in tuple_mem. I'd even be ok with naming the pools 'var_len_pool' and 'decompression_pool'. That would make it less mysterious what is inside them. This change is mostly for understandability, but it would also allow us to avoid unnecessarily transferring var-len data (I know it's only a nice-to-have from a perf perspective). People reading this code may have the same question as me regarding the var-len data. Line 100: void FinalizeTransfer(RowBatch* dst_batch, int num_to_commit) { FinalizeTupleTransfer() is a little clearer because we typically mean resources in the context of "transfer" Line 102: ++num_output_batches; Wondering if we should make this "non-empty output batches". Line 129: // Compaction is unsafe if the scratch batch was split across multiple output batches Mind moving these two checks into the caller? Checking 'num_output_batches' in the same fn where it is updated makes it more obvious what it is used for. This will also clarify in which situations TryCompact() can "fail", i.e., only when there is not enough mem Line 160: int64_t total_allocated_bytes() { const -- To view, visit http://gerrit.cloudera.org:8080/6949 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I3773dc63c498e295a2c1386a15c5e69205e747ea Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-Reviewer: Alex Behm <alex.b...@cloudera.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-HasComments: Yes