Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15370 )

Change subject: WIP IMPALA-6636: Use async IO in ORC scanner
......................................................................


Patch Set 2:

Note that this was a rather hacky implementation. The problem is that when the
ORC lib reads from the file, it only gives us an offset and a length, so we do
not know which column (or stream) it is trying to read. We therefore build a
map of ranges beforehand (HdfsOrcScanner::StartColumnReading), try to guess
which range to advance during every individual read call, and fall back to
sync IO if the read is not what we expected
(HdfsOrcScanner::ScanRangeInputStream::read).
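
To make the guessing step concrete, here is a minimal standalone sketch of the
idea. It is not the code from the patch: RangeTracker, PlannedRange and
ReadPath are hypothetical names, and it assumes that each planned range is
consumed strictly sequentially and that a read never spans two ranges.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical stand-in for a scan range that was registered for async IO.
struct PlannedRange {
  uint64_t end;     // one past the last byte of the range
  uint64_t cursor;  // next offset we expect the ORC lib to request
};

enum class ReadPath { kAsync, kSyncFallback };

// Tracks the ranges planned before reading starts and decides, per read call,
// whether the request matches what we expected.
class RangeTracker {
 public:
  // Register a range up front (similar in spirit to what the comment
  // describes for StartColumnReading; the real signature is not shown here).
  void AddRange(uint64_t offset, uint64_t length) {
    ranges_[offset] = PlannedRange{offset + length, offset};
  }

  // Called for every read(offset, length) issued by the ORC lib.
  ReadPath OnRead(uint64_t offset, uint64_t length) {
    // Find the planned range with the largest start offset <= offset.
    auto it = ranges_.upper_bound(offset);
    if (it == ranges_.begin()) return ReadPath::kSyncFallback;
    --it;
    PlannedRange& range = it->second;
    // Assumption: only strictly sequential reads inside one range are served
    // from the async buffers. Anything else falls back to sync IO.
    if (offset != range.cursor || offset + length > range.end) {
      return ReadPath::kSyncFallback;
    }
    range.cursor += length;
    return ReadPath::kAsync;
  }

 private:
  std::map<uint64_t, PlannedRange> ranges_;  // keyed by start offset
};
```

In this sketch, a read that skips ahead, overruns its range, or lands outside
every planned range is simply routed to the sync path and leaves the cursors
untouched, so later reads that match the plan can still be served
asynchronously.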

This seems to work, but changes in the ORC lib can easily end up "disabling"
async scanning by reading in unexpected patterns. The best solution would be
to move most of the logic into ORC, so that it would return the ranges to us
and identify the given range in every read call.


--
To view, visit http://gerrit.cloudera.org:8080/15370
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I348ad9e55f0cae7dff0d74d941b026dcbf5e4074
Gerrit-Change-Number: 15370
Gerrit-PatchSet: 2
Gerrit-Owner: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Comment-Date: Mon, 09 Aug 2021 14:44:27 +0000
Gerrit-HasComments: No
