Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4923: reduce memory transfer for selective scans
......................................................................


Patch Set 2:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/6949/2/be/src/exec/hdfs-parquet-scanner.cc
File be/src/exec/hdfs-parquet-scanner.cc:

Line 1031:     DCHECK_EQ(0, scratch_batch_->total_allocated_bytes());
> Where do the decompression buffers get freed?
IMPALA-5304 made that redundant (I missed doing this cleanup in that patch).


http://gerrit.cloudera.org:8080/#/c/6949/2/be/src/exec/parquet-scratch-tuple-batch.h
File be/src/exec/parquet-scratch-tuple-batch.h:

PS2, Line 48: MemPool
> It's not clear from the var names comments where the var-len data goes. Tha
I elaborated on the comments to make it a bit more explicit.


Line 50:   // Pool used to accumulate other memory such as decompression buffers that may be
> may be referenced
Done


Line 109:     dst_batch->tuple_data_pool()->AcquireData(&aux_mem_pool, false);
> I would have thought that the var-len data like strings or collections can 
We need to transfer the allocations for var-len data regardless since we have 
no way to know for sure that previously-returned batches don't reference some 
of that data.

So we could still run into a similar problem with excessive transfer of var-len 
data in some cases, but I think that's impossible to solve without larger 
changes to the memory ownership model: it would require more precise tracking 
of which buffers may be referenced by already-returned rows. Reducing the 
amount of fixed-length data transferred will still help a lot by reducing the 
overall amount of lock contention.

Copying the var-len data is also counter-productive if the data is 
dictionary-encoded, since many rows can point at the same dictionary values 
and copying would duplicate them.
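
For reference, the transfer boils down to roughly this (a simplified sketch, 
not the literal patch code; 'tuple_mem_pool' and the 'compacted' flag are 
approximations of the actual member names):

  void FinalizeTransfer(RowBatch* dst_batch) {
    // Var-len data and decompression buffers may be referenced by rows in
    // previously-returned batches, so their ownership always moves to the
    // destination batch - there is no per-buffer reference tracking.
    dst_batch->tuple_data_pool()->AcquireData(&aux_mem_pool, false);
    // Fixed-length tuple memory only needs to move if compaction couldn't
    // copy the surviving tuples into dst_batch.
    if (!compacted) {
      dst_batch->tuple_data_pool()->AcquireData(&tuple_mem_pool, false);
    }
  }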


Line 130:     if (num_output_batches > 1) return false;
> This new compaction has non-obvious caveats like this one, and I find the f
I don't think this is a significant problem for performance: if the scan is 
selective then a scratch batch contributes many fewer rows than an output 
batch holds, so only a handful of scratch batches will span multiple output 
batches and hit this edge case (see the sketch at the end of this comment).

Agree it's subtle, but I think there will be subtleties regardless with our 
current memory transfer model. I tried to keep this encapsulated in 
FinalizeTransfer()/TryCompact()/Reset() so that it isn't spread throughout 
the code.

I think the alternative has some issues that I would rather avoid. I think 
doing two passes over the batch will be slower since the loop in 
ProcessScratchBatch() is already tight.

It would also complicate the logic around 'tuple_mem' because compacted memory 
lives temporarily in the scratch batch before being moved to the destination 
batch - I think you would need additional state in the scratch batch to track 
both the compacted tuples and the tuple buffer that you're going to reuse.
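
To make the edge case concrete, this is roughly what the guard is protecting 
against (a sketch in the spirit of TryCompact(), not the literal code):

  // Compaction copies the surviving tuples out of the scratch tuple buffer
  // into memory allocated from dst_batch. That is only safe if all rows
  // materialized from this scratch batch went into dst_batch; if some rows
  // already went into an earlier output batch, that batch may still point at
  // the scratch tuple buffer, so we give up and transfer the buffer instead.
  bool TryCompact(RowBatch* dst_batch) {
    if (num_output_batches > 1) return false;
    // ... copy surviving tuples into memory allocated from dst_batch ...
    return true;
  }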


Line 139:     for (int i = dst_batch->num_rows(); i < end_row; ++i) {
> Don't we have a CopyRows() for this in RowBatch?
That seems to copy the tuple pointers rather than the tuples themselves.
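
To illustrate the distinction (simplified, not the actual RowBatch code): 
copying at the row level just duplicates the Tuple* pointers, which would 
still reference the scratch batch's memory, whereas this loop needs the tuple 
bytes themselves copied into memory owned by dst_batch:

  // Row copy: dst row ends up pointing at the scratch batch's tuple memory.
  memcpy(dst_row, src_row, num_tuples_per_row * sizeof(Tuple*));
  // What the loop does instead: copy the tuple bytes into dst_batch's buffer
  // so the scratch memory can be reused or freed without being transferred.
  memcpy(dst_tuple_mem, src_tuple, tuple_byte_size);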


-- 
To view, visit http://gerrit.cloudera.org:8080/6949
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3773dc63c498e295a2c1386a15c5e69205e747ea
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.b...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-HasComments: Yes
