Hello Michael Ho, Lars Volker, Zoram Thanga, I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/8814 to look at the new patch set (#10). Change subject: IMPALA-6290: limit ScannerContext to 1 buffer at a time ...................................................................... IMPALA-6290: limit ScannerContext to 1 buffer at a time This is a prerequisite for constraining the number of buffers per scan range. Before this patch, calling ReadBytes(), SkipBytes(), etc could cause an arbitrary number of I/O buffers to accumulate in 'completed_io_buffers_'. E.g. if we allocated 3 * 8MB I/O buffers for a range and then called ReadBytes(30MB), we would hit resource exhaustion as soon as 3 buffers were accumulated in 'completed_io_buffers_'. The fix is to avoid accumulating any buffers in 'completed_io_buffers_'. Instead of adding them to 'completed_io_buffers_', completed buffers are just returned to the I/O manager. It turned out that this did not weaken the ScannerContext's guarantees about memory lifetime, because ScannerContext::GetBytesInternal() cleared 'boundary_buffer_' each time it was called regardless. I checked that this behaviour wasn't a bug by inspecting the scanner code. I could not find any cases where scanners depended on returned memory remaining valid beyond the next Read*()/Get*()/Skip*() call on the stream. This change makes that lifetime explicit in the comments. A side-effect of this fix is that scanners do not need to call ReleaseCompletedResources() in CommitRows() and means that the ScannerContext only ever needs to hold one I/O buffer at a time. This change also reimplements SkipBytes() to avoid it accumulating memory in the boundary buffer for large skip sizes. Also clarifies some of the invariants in ScannerContext. E.g. some places assumed io_buffer_ != NULL, but that is no longer needed. Testing: Ran core tests with ASAN and exhaustive tests with DEBUG. Change-Id: I74c5960a75f7d88b0e1de4199af731fb13e592f0 --- M be/src/exec/base-sequence-scanner.cc M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/hdfs-scanner.cc M be/src/exec/hdfs-scanner.h M be/src/exec/hdfs-text-scanner.cc M be/src/exec/parquet-column-readers.cc M be/src/exec/scanner-context.cc M be/src/exec/scanner-context.h M be/src/exec/scanner-context.inline.h M be/src/runtime/io/disk-io-mgr.cc M testdata/workloads/functional-query/queries/DataErrorsTest/hdfs-scan-node-errors.test 11 files changed, 320 insertions(+), 210 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/14/8814/10 -- To view, visit http://gerrit.cloudera.org:8080/8814 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I74c5960a75f7d88b0e1de4199af731fb13e592f0 Gerrit-Change-Number: 8814 Gerrit-PatchSet: 10 Gerrit-Owner: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-Reviewer: Lars Volker <l...@cloudera.com> Gerrit-Reviewer: Michael Ho <k...@cloudera.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-Reviewer: Zoram Thanga <zo...@cloudera.com>