Dan Hecht has posted comments on this change. Change subject: IMPALA-3202,IMPALA-2298: rework scratch file I/O ......................................................................
Patch Set 15: (15 comments) Haven't made it all the way through the code, but had some comments about the header to get out there. One high level comment is that at some point we had thought that the exec nodes (or buffered tuple stream equivalent layer) would need to have some control over how the buffers are laid out in scratch files, to get good sequential read patterns. This new abstraction seems to preclude that possibility, no? http://gerrit.cloudera.org:8080/#/c/5141/15//COMMIT_MSG Commit Message: PS15, Line 7: IMPALA-2298 this one is already marked as resolved. is it the right jira? Line 54: the remaining data (72M) to /tmp. this might be a good py.test. what do you think? http://gerrit.cloudera.org:8080/#/c/5141/15/be/src/runtime/buffered-block-mgr.h File be/src/runtime/buffered-block-mgr.h: Line 177: this could use a comment. http://gerrit.cloudera.org:8080/#/c/5141/15/be/src/runtime/tmp-file-mgr.h File be/src/runtime/tmp-file-mgr.h: PS15, Line 45: File given that File is no longer exposed, make it "file". Or reword this now that files are really implementation details. PS15, Line 48: a delete or make singular Line 51: /// data once the write finishes. does the read on WriteHandle block if the write wasn't complete yet, or how is this synchronized? PS15, Line 53: The files are : /// allocated lazily upon write. lazily with respect to what? this also seems redundant with the first paragraph. Line 56: /// WriteHandle. these two paragraphs kind of mix together the class abstraction and the implementation (i.e. files and directories are only implementation details now, right?). Separating it would make the abstraction clearer. Something else I wasn't sure about when reading this comment was the relationship with DiskIoMgr. With the old interface, it wasn't relevant since the user of TmpFileMgr() determined how to do the actually reads/writes. Now that this abstraction does the read/writes, it is relevant. Line 57: /// also, reading through the code (though I haven't got through it all yet), the block_size concept seems pretty integral to the abstraction, but isn't explained here. Line 62: class File; is this public declaration needed? Line 69: typedef boost::function<void(const Status&)> WriteDoneCallback; how does the callback know which write was completed? Line 96: /// data). this last sentence could use clarification that the rewrite is by the callee (i.e. it's not saying the caller may rewrite the data in place). Also, at what point can the caller reclaim the buffer for another use? The buffer ownership is not transferred, right? PS15, Line 99: be PS15, Line 102: CancelAndRestoreData CancelWriteAndRestoreData http://gerrit.cloudera.org:8080/#/c/5141/15/be/src/util/buffer-ref.h File be/src/util/buffer-ref.h: Line 26: /// a separate pointer and length. does this go away once "everything" is switched to using BufferPool BufferHandles? -- To view, visit http://gerrit.cloudera.org:8080/5141 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I8c9c587df006d2f09d72dd636adafbd295fcdc17 Gerrit-PatchSet: 15 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com> Gerrit-HasComments: Yes