Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/15370 )
Change subject: WIP IMPALA-6636: Use async IO in ORC scanner
......................................................................

Patch Set 2:

Note that this was quite a hacky implementation. The problem is that when the ORC lib reads from the file, it gives us only an offset and a length, so we do not know which column (or stream) it is trying to read. We therefore build a map of ranges beforehand (HdfsOrcScanner::StartColumnReading), try to guess which range to advance during each individual read call, and fall back to sync IO if the read is not what we expected (HdfsOrcScanner::ScanRangeInputStream::read).

This seems to work, but changes in the ORC lib could easily end up "disabling" async scanning by reading in unexpected patterns. The best solution would be to move most of the logic into ORC, so that it would return the ranges to us and identify the given range in every read call.

--
To view, visit http://gerrit.cloudera.org:8080/15370
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I348ad9e55f0cae7dff0d74d941b026dcbf5e4074
Gerrit-Change-Number: 15370
Gerrit-PatchSet: 2
Gerrit-Owner: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Comment-Date: Mon, 09 Aug 2021 14:44:27 +0000
Gerrit-HasComments: No
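
For readers unfamiliar with the patch, the range-guessing idea described in the comment above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in the change: RangeTracker, StreamRange and ReadPath are hypothetical names, and the matching rule (a read must continue exactly where the previous read of the same range stopped) is a guess at what "not what we expected" might mean.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical stand-in for one column stream registered before scanning.
struct StreamRange {
  uint64_t offset;    // start of the stream within the file
  uint64_t length;    // total number of bytes in the stream
  uint64_t consumed;  // bytes already served from the prefetched data
};

enum class ReadPath { kAsync, kSyncFallback };

// Tracks pre-registered ranges and decides, per read call, whether the read
// matches the expected sequential pattern of one of them.
class RangeTracker {
 public:
  // Registers a range up front (assumed analogous to building the range map).
  void AddRange(uint64_t offset, uint64_t length) {
    assert(length > 0);
    ranges_[offset] = StreamRange{offset, length, 0};
  }

  // Given only (offset, length), guesses the range and advances it on a match.
  ReadPath Read(uint64_t offset, uint64_t length) {
    // Find the last range that starts at or before the requested offset.
    auto it = ranges_.upper_bound(offset);
    if (it == ranges_.begin()) return ReadPath::kSyncFallback;
    --it;
    StreamRange& r = it->second;
    // The read must start where the previous read of this range ended...
    const bool sequential = (offset == r.offset + r.consumed);
    // ...and must not run past the end of the range.
    const bool fits = (length <= r.length - r.consumed);
    if (!sequential || !fits) return ReadPath::kSyncFallback;
    r.consumed += length;
    return ReadPath::kAsync;
  }

 private:
  std::map<uint64_t, StreamRange> ranges_;  // keyed by start offset
};
```

With ranges registered at offsets 100 (50 bytes) and 200 (30 bytes), reads at (100, 20) and then (120, 30) would take the async path, while a read at (220, 5) before anything at 200 has been consumed would fall back to sync IO. The sketch also shows the fragility mentioned in the comment: any change in the reader's access pattern silently sends every read down the fallback path.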