Alex Behm has posted comments on this change.

Change subject: IMPALA-4923: reduce memory transfer for selective scans
......................................................................


Patch Set 3:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/6949/2/be/src/exec/parquet-scratch-tuple-batch.h
File be/src/exec/parquet-scratch-tuple-batch.h:

Line 130:     // because the batch we returned earlier may hold a reference 
into 'tuple_mem'.
> I don't think this is a significant problem for performance - if the scan i
Thanks for explaining. Makes sense to me.


http://gerrit.cloudera.org:8080/#/c/6949/3/be/src/exec/parquet-scratch-tuple-batch.h
File be/src/exec/parquet-scratch-tuple-batch.h:

Line 53:   MemPool aux_mem_pool;
Another thought: How about splitting this pool up into memory that is only 
referenced by tuples in tuple_mem and memory that could be refd by 
already-returned batches.

For example, the decompression buffers live across multiple scratch/transfer 
rounds, but as far as I can tell all var-len memory that is allocated from this 
pool is only referenced by tuples in tuple_mem. I'd even be ok with naming the 
pools 'var_len_pool' and 'decompression_pool'. That would make it less 
mysterious what is inside them.

This change is mostly for understandability, but it would also allow us to 
avoid unnecessarily transferring var-len data (I know it's only a nice-to-have 
from a perf perspective).

People reading this code may have the same question as me regarding the var-len 
data.


Line 100:   void FinalizeTransfer(RowBatch* dst_batch, int num_to_commit) {
FinalizeTupleTransfer() is a little clearer because we typically mean resources 
in the context of "transfer"


Line 102:     ++num_output_batches;
Wondering if we should make this "non-empty output batches".


Line 129:     // Compaction is unsafe if the scratch batch was split across 
multiple output batches
Mind moving these two checks into the caller? Checking 'num_output_batches' in 
the same fn where it is updated makes it more obvious what it is used for.

This will also clarify in which situations TryCompact() can "fail", i.e., only 
when there is not enough mem


Line 160:   int64_t total_allocated_bytes() {
const


-- 
To view, visit http://gerrit.cloudera.org:8080/6949
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3773dc63c498e295a2c1386a15c5e69205e747ea
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.b...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-HasComments: Yes

Reply via email to