Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/11520 )
Change subject: IMPALA-7543: Enhance scan ranges to support sub-ranges ...................................................................... Patch Set 6: (5 comments) http://gerrit.cloudera.org:8080/#/c/11520/3/be/src/runtime/io/hdfs-file-reader.cc File be/src/runtime/io/hdfs-file-reader.cc: http://gerrit.cloudera.org:8080/#/c/11520/3/be/src/runtime/io/hdfs-file-reader.cc@155 PS3, Line 155: *eof = true; > I'm thinking that eof is the right way to go. The FileReader is a new abstr I agree, it is cleaner to keep FileReader simple and keep higher level logic in ScanRange. http://gerrit.cloudera.org:8080/#/c/11520/5/be/src/runtime/io/request-ranges.h File be/src/runtime/io/request-ranges.h: http://gerrit.cloudera.org:8080/#/c/11520/5/be/src/runtime/io/request-ranges.h@254 PS5, Line 254: bool expected_local, const BufferOpts& buffer_opts, > line too long (93 > 90) Done http://gerrit.cloudera.org:8080/#/c/11520/4/be/src/runtime/io/request-ranges.h File be/src/runtime/io/request-ranges.h: http://gerrit.cloudera.org:8080/#/c/11520/4/be/src/runtime/io/request-ranges.h@251 PS4, Line 251: /// Same as above, but it also adds sub-ranges. No need to merge contiguous sub-ranges : /// in advance, as this method will do the merge. : void Reset(hdfsFS fs, const char* file, int64_t len, int64_t offset, int disk_id, : bool expected_local, const BufferOpts& buffer_opts, : std::vector<SubRange>&& sub_ranges, void* meta_dat > Will we know the SubRanges when we call Reset()? (BaseScalarColumnReader::R Yes, I think we'll always know the SubRanges when we call Reset(). I overloaded Reset() to have another version that takes SubRanges. This way I didn't have to modify the current call sites that pass in metadata. http://gerrit.cloudera.org:8080/#/c/11520/3/be/src/runtime/io/scan-range.cc File be/src/runtime/io/scan-range.cc: http://gerrit.cloudera.org:8080/#/c/11520/3/be/src/runtime/io/scan-range.cc@268 PS3, Line 268: (!read_status.ok()) return read_status; > I think it makes sense for us to view end-of-file or partial read as errors Seems valid, now it returns SCANNER_INCOMPLETE_READ in this case as suggested. http://gerrit.cloudera.org:8080/#/c/11520/4/be/src/runtime/io/scan-range.cc File be/src/runtime/io/scan-range.cc: http://gerrit.cloudera.org:8080/#/c/11520/4/be/src/runtime/io/scan-range.cc@236 PS4, Line 236: buffer_desc->eosr_ = eof || bytes_read_ == bytes_to_ > One thing we should try to do is make it clear what this calculation is doi Introduced a new local boolean variable called 'eof'. Also added a short comment. -- To view, visit http://gerrit.cloudera.org:8080/11520 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iea26ba386713990f7671aab5a372cf449b8d51e4 Gerrit-Change-Number: 11520 Gerrit-PatchSet: 6 Gerrit-Owner: Zoltan Borok-Nagy <borokna...@cloudera.com> Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Joe McDonnell <joemcdonn...@cloudera.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com> Gerrit-Comment-Date: Thu, 11 Oct 2018 09:58:33 +0000 Gerrit-HasComments: Yes