Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4923: reduce memory transfer for selective scans
......................................................................


Patch Set 2:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/6949/2/be/src/exec/hdfs-parquet-scanner.cc
File be/src/exec/hdfs-parquet-scanner.cc:

Line 1031: DCHECK_EQ(0, scratch_batch_->total_allocated_bytes());
> Where do the decompression buffers get freed?
IMPALA-5304 made that redundant (I missed doing this cleanup in that patch).


http://gerrit.cloudera.org:8080/#/c/6949/2/be/src/exec/parquet-scratch-tuple-batch.h
File be/src/exec/parquet-scratch-tuple-batch.h:

PS2, Line 48: MemPool
> It's not clear from the var names comments where the var-len data goes. Tha
I elaborated on the comments to make it a bit more explicit.


Line 50: // Pool used to accumulate other memory such as decompression buffers that may be
> may be referenced
Done


Line 109: dst_batch->tuple_data_pool()->AcquireData(&aux_mem_pool, false);
> I would have thought that the var-len data like strings or collections can
We need to transfer the allocations for var-len data regardless, since we have no way to know for sure that previously-returned batches don't reference some of that data. So we could still run into a similar problem with excessive transfer of var-len data in some cases, but I think that's impossible to solve without larger changes to the memory ownership model, because it would require more precise tracking of which buffers may be referenced by already-returned rows.

Reducing the amount of fixed-length data transferred will still help a lot by reducing the overall volume of transferred memory and the associated lock contention. Copying the var-len data would also be counter-productive if the data is dictionary-encoded.


Line 130: if (num_output_batches > 1) return false;
> This new compaction has non-obvious caveats like this one, and I find the f
I don't think this is a significant problem for performance - if the scan is selective, then the scratch batch will have many fewer rows than the output batches and only a handful of batches will hit this edge case.

Agree it's subtle, but I think there will be subtleties regardless with our current memory transfer model. I tried to keep this encapsulated in FinalizeTransfer()/TryCompact()/Reset() so that it isn't spread throughout the code.

I think the alternative has some issues that I would rather avoid: doing two passes over the batch will be slower, since the loop in ProcessScratchBatch() is already tight. It would also complicate the logic around 'tuple_mem', because compacted memory lives temporarily in the scratch batch before being moved to the destination batch - I think you would need additional state in the scratch batch to track both the compacted tuples and the tuple buffer that you're going to reuse.


Line 139: for (int i = dst_batch->num_rows(); i < end_row; ++i) {
> Don't we have a CopyRows() for this in RowBatch?
That seems to copy the tuple pointers rather than the tuples themselves.
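
To make the Line 109 reply concrete, here is a toy stand-in for the pool hand-off (ToyMemPool and its members are invented for illustration and are not Impala's actual MemPool API). The point is that an AcquireData()-style transfer moves ownership of the underlying chunks without copying, so var-len slots in already-returned rows that point into those chunks stay valid:

  #include <cstdint>
  #include <utility>
  #include <vector>

  // Toy illustration only: moves ownership of the source pool's memory
  // chunks into this pool instead of copying or freeing them. Strings and
  // collections in previously returned rows may point into these chunks,
  // so the chunks must stay alive as long as the destination batch does.
  struct ToyMemPool {
    std::vector<std::vector<uint8_t>> chunks;

    void AcquireData(ToyMemPool* src) {
      for (auto& chunk : src->chunks) chunks.push_back(std::move(chunk));
      src->chunks.clear();  // 'src' gives up ownership; nothing is freed here.
    }
  };

This is why the var-len pools are always transferred: without per-buffer reference tracking there is no way to prove that a previously returned batch holds no pointers into them.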
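
A condensed sketch of the trade-off described in the Line 130 reply, with invented member names and an assumed cost threshold (the real decision lives in TryCompact()):

  #include <cstdint>

  // Hypothetical sketch of the compaction decision; not the actual patch code.
  struct ScratchBatchSketch {
    int num_output_batches = 0;      // Output batches that took rows from this scratch batch.
    int64_t tuple_buffer_bytes = 0;  // Size of the scratch tuple buffer.

    // True if it is safe and worthwhile to copy the surviving tuples into a
    // right-sized buffer instead of transferring the whole allocation.
    bool ShouldCompact(int64_t surviving_bytes) const {
      // Once a second output batch references tuples in this buffer, we can
      // no longer prove the earlier batch holds no pointers into it, so the
      // whole buffer must be transferred instead.
      if (num_output_batches > 1) return false;
      // Compact only when the survivors are a small fraction of the buffer;
      // otherwise copying costs more than the transfer saves (threshold assumed).
      return surviving_bytes < tuple_buffer_bytes / 2;
    }
  };

In a selective scan the scratch batch holds far fewer surviving rows than an output batch, so it is normally drained by a single output batch and the early-out above is rarely taken.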
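
And for the Line 139 exchange, a minimal sketch (hypothetical Tuple/TupleRow layout, not the real classes) of the difference between a pointer-copying CopyRows()-style helper and the deep copy the loop performs:

  #include <cstdint>
  #include <cstring>

  struct Tuple;                          // Opaque fixed-length tuple.
  struct TupleRow { Tuple** tuples; };   // A row is an array of tuple pointers.

  // Shallow copy: destination rows point at the *same* tuple memory as the
  // source rows, so the scratch batch's tuple buffer must still be kept alive.
  void CopyRowPointers(const TupleRow* src, TupleRow* dst, int num_rows,
                       int tuples_per_row) {
    for (int i = 0; i < num_rows; ++i) {
      memcpy(dst[i].tuples, src[i].tuples, tuples_per_row * sizeof(Tuple*));
    }
  }

  // Deep copy: the surviving tuples themselves are copied into a compact
  // buffer, after which the original scratch buffer can be released.
  void CopyTuples(Tuple** src_tuples, uint8_t* dst_tuple_mem, int num_tuples,
                  int tuple_byte_size) {
    for (int i = 0; i < num_tuples; ++i) {
      memcpy(dst_tuple_mem + i * tuple_byte_size, src_tuples[i], tuple_byte_size);
    }
  }

Only the second form lets the scratch buffer be freed, which is why a pointer copy in the style of RowBatch::CopyRows() doesn't help here.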
--
To view, visit http://gerrit.cloudera.org:8080/6949
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3773dc63c498e295a2c1386a15c5e69205e747ea
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.b...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-HasComments: Yes