Tim Armstrong has posted comments on this change. Change subject: IMPALA-3202,IMPALA-2079: rework scratch file I/O ......................................................................
Patch Set 20: (25 comments) Added an end-to-end test. This required adding a flag to allow multiple scratch dirs per device. http://gerrit.cloudera.org:8080/#/c/5141/20/be/src/runtime/buffered-block-mgr.cc File be/src/runtime/buffered-block-mgr.cc: Line 633: // The block's buffer is still in memory with the original data. > ... but we couldn't get additional reservation. However, we can use 'releas Done Line 656: > // FindBufferForBlock() wasn't able to find a buffer so transfer the one fr Done Line 845: if (block->write_handle_ != NULL) { > how could write_handle_ be NULL here? anyway, okay with leaving this if-st Good point. Removed the unnecessary if stmt. http://gerrit.cloudera.org:8080/#/c/5141/20/be/src/runtime/buffered-block-mgr.h File be/src/runtime/buffered-block-mgr.h: Line 363: int64_t GetNumWritesOutstanding(); > since the test class is a friend, can this be made private? Done PS20, Line 403: pinned state > but this method doesn't actually set the pin bit on the block... Went with CancelWrite(), since it also reverses the effects if the write is finished. PS20, Line 458: or > double or Done http://gerrit.cloudera.org:8080/#/c/5141/20/be/src/runtime/disk-io-mgr.cc File be/src/runtime/disk-io-mgr.cc: PS20, Line 1167: with > will Done Line 1173: write_range->file_, errno, GetStrErrMsg()))); > weird line break clang-format gets funny with multi-line strings. I reworded the error message anyway to be more user-centric and it wraps better now. http://gerrit.cloudera.org:8080/#/c/5141/20/be/src/runtime/tmp-file-mgr-internal.h File be/src/runtime/tmp-file-mgr-internal.h: Line 41: /// The physical file is created on the first call to AllocateSpace(). > is that true? Nope, fixed. PS20, Line 43: > on Done Line 93: /// blacklisted, the corresponding device will always be blacklisted. > hmm i don't see how a device is blacklisted. Yeah this comment is stale - the device blacklisting was removed a long time ago. Updated comment. http://gerrit.cloudera.org:8080/#/c/5141/20/be/src/runtime/tmp-file-mgr.cc File be/src/runtime/tmp-file-mgr.cc: Line 404: // Don't hold 'lock_' outside AllocateSpace() to minimise contention. > not sure what this comment is telling me. Removed, I don't think it's needed. PS20, Line 499: Log the error before retrying so that the failure isn't swallowed silently. > is this referring to the old LogError() code, or the scratch_errors_? Yeah it needs updating - "Save and report" i think is more accurate. Line 500: // Make sure we don't allocate new blocks in this file. This is to avoid hitting Removed the second part of the comment - I don't think it added anything that wasn't covered by the long comment further down. Line 507: handle->file_->ReportIOError(write_status.msg()); > this does the file-level blacklist. But i don't see where the device-level We don't do any permanent device-level blacklisting. http://gerrit.cloudera.org:8080/#/c/5141/19/be/src/runtime/tmp-file-mgr.h File be/src/runtime/tmp-file-mgr.h: Line 330: /// so no locking is required. This is a terminal lock and should not be held while > There's nothing really about this class that makes it need to have query sc Yep, it's all gone now - much simpler http://gerrit.cloudera.org:8080/#/c/5141/20/be/src/runtime/tmp-file-mgr.h File be/src/runtime/tmp-file-mgr.h: PS20, Line 40: local > delete, or say (usually local) or similar Done PS20, Line 48: Temporary files are managed via a FileGroup. To write to a temporary file, first a : /// FileGroup is created, then FileGroup::Write() is called to asynchronously write a : /// memory buffer to disk. > This still kinda makes it sound like the user of this class can care about Done PS20, Line 78: It is used as a handle for external classes to identify devices. > if this is only used externally for tests, I think we should say something Reworded to make it clear that it's internal. PS20, Line 167: w > handle->write_in_flight_ ? Done Line 276: void Cancel(); > does this need to be exposed, or could all callers just use CancelWriteAndR It doesn't need to be exposed now that we don't have 'state_' (previously it was called from BufferedBlockMgr to avoid that issue). Might be useful in future to cancel the write while discarding the data, but agree it's easier to keep it private for now to avoid complicating lifecycle. PS20, Line 300: SetWriteComplete > "Set" makes it sound like it just sets state. Maybe just call this WriteCo Done http://gerrit.cloudera.org:8080/#/c/5141/20/be/src/util/buffer-ref.h File be/src/util/buffer-ref.h: PS20, Line 27: Ref > what does "Ref" stand for? Reference? This might get confused with BufferH Changed to MemRange. Yeah I was planning to do a follow-up patch to use this in other places in bufferpool/. Line 29: BufferRef(uint8_t* data, int64_t len) : data_(data), len_(len) {} Added some DCHECKs to the constructor. http://gerrit.cloudera.org:8080/#/c/5141/20/be/src/util/disk-info.h File be/src/util/disk-info.h: Line 91: static int num_datanode_dirs_; > dead code also? Done -- To view, visit http://gerrit.cloudera.org:8080/5141 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I8c9c587df006d2f09d72dd636adafbd295fcdc17 Gerrit-PatchSet: 20 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-HasComments: Yes