Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/15370 )

Change subject: IMPALA-6636: Use async IO in ORC scanner
......................................................................

Patch Set 23:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/15370/21/be/src/exec/hdfs-orc-scanner.cc
File be/src/exec/hdfs-orc-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/15370/21/be/src/exec/hdfs-orc-scanner.cc@252
PS21, Line 252: col_range_local, split_range->mtime(), BufferOpts(split_range->cache_options()));
              : RETURN_IF_ERROR(
              : context_->AddAndStartStream(scan_range, range.io_reservation, &range.stream_));
              : }
              : return Status::OK();
              : }
              :
> Added one more case in HdfsOrcScanner::ScanRangeInputStream::read.
Done

http://gerrit.cloudera.org:8080/#/c/15370/21/be/src/exec/hdfs-orc-scanner.cc@484
PS21, Line 484: Status HdfsOrcScanner::ProcessFileTail() {
> Ah, yeah, that's a problem. Then we need to make sure 'input_stream_' won't
Done

http://gerrit.cloudera.org:8080/#/c/15370/22/be/src/exec/hdfs-orc-scanner.cc
File be/src/exec/hdfs-orc-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/15370/22/be/src/exec/hdfs-orc-scanner.cc@114
PS22, Line 114: uint64_t
> Could you change this to warning or use VLOG_QUERY? Otherwise it's hard to
Done

http://gerrit.cloudera.org:8080/#/c/15370/22/be/src/exec/hdfs-orc-scanner.cc@128
PS22, Line 128: if (!status.ok()) throw ResourceError(status);
> Can we report the offset and length here?
Done

http://gerrit.cloudera.org:8080/#/c/15370/22/be/src/exec/hdfs-orc-scanner.cc@137
PS22, Line 137: return Status(msg);
              : }
> Not related to this patch, but I think we need to revisit this as IMPALA-68
I suppose we can match it with how we expect locality in the case of ColumnRange::read?

http://gerrit.cloudera.org:8080/#/c/15370/22/be/src/exec/hdfs-orc-scanner.cc@238
PS22, Line 238: artition_id =
> Shouldn't this be "range.offset_ + range.length_"?
Thanks for catching this! Fixed it and moved the logic into an inline function, IsExpectedLocal.

http://gerrit.cloudera.org:8080/#/c/15370/22/be/src/exec/hdfs-orc-scanner.cc@261
PS22, Line 261: string msg = Substitute("ORC read request out of range. offset: $0 length: $1 $2",
              : offset, length, debug());
              : return Status(msg);
              : }
              :
              : DCHECK(offset >= current_position_);
              : S
> I think we can change this to DCHECK(offset >= current_position_) now, sinc
Done

http://gerrit.cloudera.org:8080/#/c/15370/22/be/src/exec/hdfs-orc-scanner.cc@287
PS22, Line 287:
> Could you create a JIRA for this?
I found that ORC-262 already describes this problem. I added a reference to it in the comment.

--
To view, visit http://gerrit.cloudera.org:8080/15370
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I348ad9e55f0cae7dff0d74d941b026dcbf5e4074
Gerrit-Change-Number: 15370
Gerrit-PatchSet: 23
Gerrit-Owner: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kdesc...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <huangquanl...@gmail.com>
Gerrit-Reviewer: Riza Suminto <riza.sumi...@cloudera.com>
Gerrit-Comment-Date: Fri, 21 Jan 2022 17:41:58 +0000
Gerrit-HasComments: Yes