[Impala-ASF-CR] IMPALA-3202,IMPALA-2298: rework scratch file I/O
Tim Armstrong has posted comments on this change. Change subject: IMPALA-3202,IMPALA-2298: rework scratch file I/O .. Patch Set 15: (15 comments) I wasn't planning in the immediate future to expose the allocation policy, since that would complicate the interface and not provide an immediate benefit. I don't think there's anything that precludes us from doing it at a later point: we'd need to separate out the scratch allocator from the I/O (E.g. FileGroup and TmpFileIoContext) and have a way to pass the scratch range into Write(). Then probably have some kind of callback to allocate a new range if we hit an error. http://gerrit.cloudera.org:8080/#/c/5141/15//COMMIT_MSG Commit Message: PS15, Line 7: IMPALA-2298 > this one is already marked as resolved. is it the right jira? It was a duplicate - fixed Line 54: the remaining data (72M) to /tmp. > this might be a good py.test. what do you think? I couldn't think of a way to achieve this without admin privileges to the machine, which we can't assume we have. Users probably also wouldn't want our test suite messing around with their fstab. http://gerrit.cloudera.org:8080/#/c/5141/15/be/src/runtime/buffered-block-mgr.h File be/src/runtime/buffered-block-mgr.h: Line 177: > this could use a comment. Done http://gerrit.cloudera.org:8080/#/c/5141/15/be/src/runtime/tmp-file-mgr.h File be/src/runtime/tmp-file-mgr.h: PS15, Line 45: File > given that File is no longer exposed, make it "file". Or reword this now th Done PS15, Line 48: a > delete or make singular Done Line 51: /// data once the write finishes. > does the read on WriteHandle block if the write wasn't complete yet, or how Reworded to clarify how the callback fits in here. PS15, Line 53: The files are : /// allocated lazily upon write. > lazily with respect to what? this also seems redundant with the first para Removed. Line 56: /// WriteHandle. > these two paragraphs kind of mix together the class abstraction and the imp Tried to reorganise these paragraphs so they start off with abstraction then discuss implementation. Line 57: /// > also, reading through the code (though I haven't got through it all yet), t That is TODO for the next patch - I didn't want to tackle recycling of variable-length scratch ranges in this patch, which isn't needed for BufferedBlockMgr (its previous behaviour was to always allocate max_block_size_ ranges). Added a TODO up the top here. Line 62: class File; > is this public declaration needed? Yeah it's needed for TmpFileMgrTest unfortunately. There wasn't an easy workaround I could find because of how gtest creates a separate class for each test. Line 69: typedef boost::function WriteDoneCallback; > how does the callback know which write was completed? boost::function is really a function pointer plus a data payload, so whoever constructs the function can include a pointer to whatever data they need. I could rework as a plain function pointer with arguments (void* context, WriteHandle* handle, const Status&) or similar - I was just copying DiskIoMgr's interface here. Line 96: /// data). > this last sentence could use clarification that the rewrite is by the calle Clarified the rewrite. I intended the part about "the memory referenced must remain valid" to communicate the ownership requirements. The buffer ownership isn't transferred. Also added an additional sentence to make it explicit that callers shouldn't modify the buffer while the write is in flight. PS15, Line 99: > be Done PS15, Line 102: CancelAndRestoreData > CancelWriteAndRestoreData Done http://gerrit.cloudera.org:8080/#/c/5141/15/be/src/util/buffer-ref.h File be/src/util/buffer-ref.h: Line 26: /// a separate pointer and length. > does this go away once "everything" is switched to using BufferPool BufferH No, since this could refer to a portion of a buffer (or really any range of memory). -- To view, visit http://gerrit.cloudera.org:8080/5141 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I8c9c587df006d2f09d72dd636adafbd295fcdc17 Gerrit-PatchSet: 15 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3202,IMPALA-2298: rework scratch file I/O
Dan Hecht has posted comments on this change. Change subject: IMPALA-3202,IMPALA-2298: rework scratch file I/O .. Patch Set 15: (15 comments) Haven't made it all the way through the code, but had some comments about the header to get out there. One high level comment is that at some point we had thought that the exec nodes (or buffered tuple stream equivalent layer) would need to have some control over how the buffers are laid out in scratch files, to get good sequential read patterns. This new abstraction seems to preclude that possibility, no? http://gerrit.cloudera.org:8080/#/c/5141/15//COMMIT_MSG Commit Message: PS15, Line 7: IMPALA-2298 this one is already marked as resolved. is it the right jira? Line 54: the remaining data (72M) to /tmp. this might be a good py.test. what do you think? http://gerrit.cloudera.org:8080/#/c/5141/15/be/src/runtime/buffered-block-mgr.h File be/src/runtime/buffered-block-mgr.h: Line 177: this could use a comment. http://gerrit.cloudera.org:8080/#/c/5141/15/be/src/runtime/tmp-file-mgr.h File be/src/runtime/tmp-file-mgr.h: PS15, Line 45: File given that File is no longer exposed, make it "file". Or reword this now that files are really implementation details. PS15, Line 48: a delete or make singular Line 51: /// data once the write finishes. does the read on WriteHandle block if the write wasn't complete yet, or how is this synchronized? PS15, Line 53: The files are : /// allocated lazily upon write. lazily with respect to what? this also seems redundant with the first paragraph. Line 56: /// WriteHandle. these two paragraphs kind of mix together the class abstraction and the implementation (i.e. files and directories are only implementation details now, right?). Separating it would make the abstraction clearer. Something else I wasn't sure about when reading this comment was the relationship with DiskIoMgr. With the old interface, it wasn't relevant since the user of TmpFileMgr() determined how to do the actually reads/writes. Now that this abstraction does the read/writes, it is relevant. Line 57: /// also, reading through the code (though I haven't got through it all yet), the block_size concept seems pretty integral to the abstraction, but isn't explained here. Line 62: class File; is this public declaration needed? Line 69: typedef boost::function WriteDoneCallback; how does the callback know which write was completed? Line 96: /// data). this last sentence could use clarification that the rewrite is by the callee (i.e. it's not saying the caller may rewrite the data in place). Also, at what point can the caller reclaim the buffer for another use? The buffer ownership is not transferred, right? PS15, Line 99: be PS15, Line 102: CancelAndRestoreData CancelWriteAndRestoreData http://gerrit.cloudera.org:8080/#/c/5141/15/be/src/util/buffer-ref.h File be/src/util/buffer-ref.h: Line 26: /// a separate pointer and length. does this go away once "everything" is switched to using BufferPool BufferHandles? -- To view, visit http://gerrit.cloudera.org:8080/5141 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I8c9c587df006d2f09d72dd636adafbd295fcdc17 Gerrit-PatchSet: 15 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3202,IMPALA-2298: rework scratch file I/O
Tim Armstrong has uploaded a new patch set (#15). Change subject: IMPALA-3202,IMPALA-2298: rework scratch file I/O .. IMPALA-3202,IMPALA-2298: rework scratch file I/O Refactor BufferedBlockMgr/TmpFileMgr to push more I/O logic into TmpFileMgr, in anticipation of it being shared with BufferPool. TmpFileMgr now handles: * Scratch space allocation and recycling * Read and write I/O The interface is also greatly changed so that it is built around Write() and Read() calls, abstracting away the details of temporary file allocation from clients. This means the TmpFileMgr::File class can be hidden from clients. Write error recovery: Also implement write error recovery in TmpFileMgr. If an error occurs while writing to scratch and we have multiple scratch directories, we will try one of the other directories before cancelling the query. File-level blacklisting is used to prevent excessive repeated attempts to resize a scratch file during a single query. Device-level blacklisting is still disabled because it is problematic to permanently take a scratch directory out of use. To reduce the number of error paths, all I/O errors are now handled asynchronously. Previously errors creating or extending the file were returned synchronously from WriteUnpinnedBlock(). This required modifying DiskIoMgr to create the file if not present when opened. Future Work: * Support for recycling variable-length scratch file ranges. I omitted this to avoid making the patch even large. Testing: Updated BufferedBlockMgr unit test to reflect changes in behaviour: * Scratch space is no longer permanently associated with a block, and is remapped every time a new block is written to disk . * Files are now blacklisted - updated existing tests and enable the disable blacklisting test. Added some basic testing of recycling of scratch file ranges in the TmpFileMgr unit test. I also manually tested the code in two ways. First by removing permissions for /tmp/impala-scratch and ensuring that a spilling query fails cleanly. Second, by creating a tiny ramdisk (16M) and running with two scratch directories: one on /tmp and one on the tiny ramdisk. When spilling, an out of space error is encountered for the tiny ramdisk and impala spills the remaining data (72M) to /tmp. Change-Id: I8c9c587df006d2f09d72dd636adafbd295fcdc17 --- M be/src/runtime/buffered-block-mgr-test.cc M be/src/runtime/buffered-block-mgr.cc M be/src/runtime/buffered-block-mgr.h M be/src/runtime/disk-io-mgr-test.cc M be/src/runtime/disk-io-mgr.cc M be/src/runtime/disk-io-mgr.h M be/src/runtime/query-state.cc A be/src/runtime/tmp-file-mgr-internal.h M be/src/runtime/tmp-file-mgr-test.cc M be/src/runtime/tmp-file-mgr.cc M be/src/runtime/tmp-file-mgr.h A be/src/util/buffer-ref.h M be/src/util/filesystem-util.cc M be/src/util/filesystem-util.h M common/thrift/ImpalaInternalService.thrift 15 files changed, 1,151 insertions(+), 506 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/41/5141/15 -- To view, visit http://gerrit.cloudera.org:8080/5141 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I8c9c587df006d2f09d72dd636adafbd295fcdc17 Gerrit-PatchSet: 15 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong