Tim Armstrong has posted comments on this change. Change subject: IMPALA-4923: reduce memory transfer for selective scans ......................................................................
Patch Set 3: (5 comments) http://gerrit.cloudera.org:8080/#/c/6949/3/be/src/exec/parquet-scratch-tuple-batch.h File be/src/exec/parquet-scratch-tuple-batch.h: Line 53: MemPool aux_mem_pool; > Another thought: How about splitting this pool up into memory that is only I started to implement this but realised there were some serious caveats to deep-copying var-len data that mean I'd rather not bundle it into this patch. It may have some "interesting" effects on performance and memory consumption. E.g. if a string column is fully or partially dictionary-encoded it could blow up the memory footprint by duplicating the string values. The change in this patch is more of an unambiguous win. If we don't do the compaction, it seems a bit silly to group memory that is essentially managed in the same way into different MemPools. I don't have a strong objection to it but my intuition is that it's better not to do it. I extended the comment a bit to give better guidance about the appropriate usage patterns and when memory can be added to the pools, which may help make it more comprehensible. Line 100: void FinalizeTransfer(RowBatch* dst_batch, int num_to_commit) { > FinalizeTupleTransfer() is a little clearer because we typically mean resou Done Line 102: ++num_output_batches; > Wondering if we should make this "non-empty output batches". Done. Yeah that's a more precise thing to do. Line 129: // Compaction is unsafe if the scratch batch was split across multiple output batches > Mind moving these two checks into the caller? Checking 'num_output_batches' I moved the first check into the caller, but the second check is actually unnecessary because the code below handles that case correctly. I removed it and commented to that effect. Line 160: int64_t total_allocated_bytes() { > const Done -- To view, visit http://gerrit.cloudera.org:8080/6949 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I3773dc63c498e295a2c1386a15c5e69205e747ea Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-Reviewer: Alex Behm <alex.b...@cloudera.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-HasComments: Yes