Bikramjeet Vig has posted comments on this change. ( http://gerrit.cloudera.org:8080/8707 )
Change subject: IMPALA-4835: Part 2: Allocate scan range buffers upfront ...................................................................... Patch Set 20: (10 comments) http://gerrit.cloudera.org:8080/#/c/8707/20/be/src/runtime/io/disk-io-mgr.h File be/src/runtime/io/disk-io-mgr.h: http://gerrit.cloudera.org:8080/#/c/8707/20/be/src/runtime/io/disk-io-mgr.h@289 PS20, Line 289: AddBuffersToRange nit: AllocateBuffersForRange http://gerrit.cloudera.org:8080/#/c/8707/14/be/src/runtime/io/disk-io-mgr.h File be/src/runtime/io/disk-io-mgr.h: http://gerrit.cloudera.org:8080/#/c/8707/14/be/src/runtime/io/disk-io-mgr.h@259 PS14, Line 259: do nit: does http://gerrit.cloudera.org:8080/#/c/8707/20/be/src/runtime/io/disk-io-mgr.cc File be/src/runtime/io/disk-io-mgr.cc: http://gerrit.cloudera.org:8080/#/c/8707/20/be/src/runtime/io/disk-io-mgr.cc@826 PS20, Line 826: ScheduleMode::IMMEDIATELY please see comment on request-context.cc::179 http://gerrit.cloudera.org:8080/#/c/8707/20/be/src/runtime/io/request-context.cc File be/src/runtime/io/request-context.cc: http://gerrit.cloudera.org:8080/#/c/8707/20/be/src/runtime/io/request-context.cc@174 PS20, Line 174: disk_state->ScheduleContext(lock, this, range->disk_id()); shouldn't this be called only for ScheduleMode::IMMEDIATELY? for ScheduleMode::UPON_GETNEXT: GetNextRange() will be scheduling this context and for ScheduleMode::BY_CALLER : the caller will explicitly call ScheduleContext() http://gerrit.cloudera.org:8080/#/c/8707/20/be/src/runtime/io/request-context.cc@179 PS20, Line 179: DCHECK(schedule_mode == ScheduleMode::IMMEDIATELY) << static_cast<int>(schedule_mode); can you document this in the method comment, that schedule_mode should always be ScheduleMode::IMMEDIATELY for writes. Also, I have some confusion over here too, initially we required schedule_immediately to be always false, yet we always called ScheduleContext() for writes. What should be the desired behavior here? http://gerrit.cloudera.org:8080/#/c/8707/20/be/src/runtime/io/request-ranges.h File be/src/runtime/io/request-ranges.h: http://gerrit.cloudera.org:8080/#/c/8707/20/be/src/runtime/io/request-ranges.h@437 PS20, Line 437: unused_buffers_ should we call this unused_internal_buffers? to make the distinction clear from external buffers http://gerrit.cloudera.org:8080/#/c/8707/20/be/src/runtime/io/request-ranges.h@444 PS20, Line 444: unused_buffer_bytes_returned_ this implicitly signifies the amount of bytes that have been read or are being read, which is why we are using this to check if we need to hold onto buffers. this however is incremented on call to GetUnusedBuffer(), which is also being used in CleanUpUnusedBuffers(). Although this wont make any difference to the correctness of how this variable is being used, since CleanUpUnusedBuffers() is only called when eosr is true or scanner is cancelled, after which this variable will be redundant....still I would prefer this variable be incremented in GetNextBufferForRange() instead and renamed appropriately http://gerrit.cloudera.org:8080/#/c/8707/20/be/src/runtime/io/scan-range.cc File be/src/runtime/io/scan-range.cc: http://gerrit.cloudera.org:8080/#/c/8707/20/be/src/runtime/io/scan-range.cc@139 PS20, Line 139: unused_buffer_bytes_ += buffer->buffer_len(); should we do a check for (unused_buffer_bytes_ >= len_ - unused_buffer_bytes_returned_) after absorbing each buffer. This way we can hold on to only the buffers we need to read the remainder of the file and cleanup the rest of them http://gerrit.cloudera.org:8080/#/c/8707/20/be/src/runtime/io/scan-range.cc@141 PS20, Line 141: if (blocked_on_buffer_) { : // This scan range was blocked and is no longer, schedule it again. : blocked_on_buffer_ = false; : schedule_range = true; : } maybe do this outside the for loop? http://gerrit.cloudera.org:8080/#/c/8707/20/be/src/runtime/io/scan-range.cc@212 PS20, Line 212: boost:: nit: can be removed -- To view, visit http://gerrit.cloudera.org:8080/8707 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia243bf8b62feeea602b122e0503ea4ba7d3ee70f Gerrit-Change-Number: 8707 Gerrit-PatchSet: 20 Gerrit-Owner: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-Reviewer: Bikramjeet Vig <bikramjeet....@cloudera.com> Gerrit-Reviewer: Michael Ho <k...@cloudera.com> Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-Comment-Date: Thu, 01 Feb 2018 01:28:21 +0000 Gerrit-HasComments: Yes