Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4923: reduce memory transfer for selective scans
......................................................................


Patch Set 3:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/6949/3/be/src/exec/parquet-scratch-tuple-batch.h
File be/src/exec/parquet-scratch-tuple-batch.h:

Line 53:   MemPool aux_mem_pool;
> Another thought: How about splitting this pool up into memory that is only 
I started to implement this but realised there were some serious caveats to 
deep-copying var-len data that mean I'd rather not bundle it into this patch. 
It may have some "interesting" effects on performance and memory consumption. 
E.g. if a string column is fully or partially dictionary-encoded it could blow 
up the memory footprint by duplicating the string values.

The change in this patch is more of an unambiguous win.

If we don't do the compaction, it seems a bit silly to group memory that is 
essentially managed in the same way into different MemPools. I don't have a 
strong objection to it but my intuition is that it's better not to do it. I 
extended the comment a bit to give better guidance about the appropriate usage 
patterns and when memory can be added to the pools, which may help make it more 
comprehensible.


Line 100:   void FinalizeTransfer(RowBatch* dst_batch, int num_to_commit) {
> FinalizeTupleTransfer() is a little clearer because we typically mean resou
Done


Line 102:     ++num_output_batches;
> Wondering if we should make this "non-empty output batches".
Done. Yeah that's a more precise thing to do.


Line 129:     // Compaction is unsafe if the scratch batch was split across 
multiple output batches
> Mind moving these two checks into the caller? Checking 'num_output_batches'
I moved the first check into the caller, but the second check is actually 
unnecessary because the code below handles that case correctly. I removed it 
and commented to that effect.


Line 160:   int64_t total_allocated_bytes() {
> const
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/6949
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3773dc63c498e295a2c1386a15c5e69205e747ea
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.b...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-HasComments: Yes

Reply via email to