[ https://issues.apache.org/jira/browse/IMPALA-9606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sahil Takiar resolved IMPALA-9606.
----------------------------------
    Fix Version/s: Impala 4.0
       Resolution: Fixed

> ABFS reads should use hdfsPreadFully
> ------------------------------------
>
>                 Key: IMPALA-9606
>                 URL: https://issues.apache.org/jira/browse/IMPALA-9606
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>            Reporter: Sahil Takiar
>            Assignee: Sahil Takiar
>            Priority: Major
>             Fix For: Impala 4.0
>
>
> In IMPALA-8525, hdfs preads were enabled by default when reading data from
> S3. IMPALA-8525 deferred enabling preads for ABFS because they didn't
> significantly improve performance. After some more investigation into the
> ABFS input streams, I think it is safe to use {{hdfsPreadFully}} for ABFS
> reads.
>
> The ABFS client uses a different model for fetching data compared to S3A.
> Details are beyond the scope of this JIRA, but it is related to a feature in
> ABFS called "read-aheads". ABFS has logic to pre-fetch data it *thinks* will
> be required by the client. By default, it pre-fetches # cores * 4 MB of data.
> If the requested data exists in the client cache, it is read from the cache.
>
> However, there is no real drawback to using {{hdfsPreadFully}} for ABFS
> reads. It's definitely safer, because while the current implementation of
> ABFS always returns the amount of requested data, only the {{hdfsPreadFully}}
> API makes that guarantee.
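
The difference between a plain positional read and a "read fully" call, which is the safety argument in the description, can be illustrated with a minimal sketch. This is not the libhdfs or ABFS implementation: it uses a local temporary file and Python's `os.pread`, and the names `pread_fully`, `short_pread`, and `demo` are hypothetical helpers introduced only for illustration. It assumes a POSIX system and that a "fully" read either returns exactly the requested number of bytes or fails.

```python
import os
import tempfile


def pread_fully(fd, length, offset, pread=os.pread):
    """Read exactly `length` bytes starting at `offset`, or raise EOFError.

    A plain positional read may return fewer bytes than requested, so this
    loop keeps issuing reads until the request is satisfied.
    """
    chunks = []
    remaining = length
    while remaining > 0:
        chunk = pread(fd, remaining, offset)
        if not chunk:  # zero bytes returned means end of file
            raise EOFError(f"hit EOF with {remaining} bytes still unread")
        chunks.append(chunk)
        offset += len(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def short_pread(fd, length, offset):
    """Hypothetical stream that returns at most 3 bytes per call."""
    return os.pread(fd, min(length, 3), offset)


def demo():
    """Compare a single short read with a 'fully' read of the same range."""
    with tempfile.TemporaryFile() as f:
        f.write(b"0123456789")
        f.flush()
        fd = f.fileno()
        partial = short_pread(fd, 8, 1)                  # one call, may be short
        full = pread_fully(fd, 8, 1, pread=short_pread)  # loops until complete
    return partial, full
```

With a stream that happens to return everything in one call, as the description says ABFS currently does, both paths yield the same bytes; the loop only matters once a read comes back short, and that is the case a caller relying on a plain pread would have to handle itself.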