[
https://issues.apache.org/jira/browse/HADOOP-19101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Steve Loughran resolved HADOOP-19101.
-------------------------------------
Fix Version/s: 3.3.9, 3.4.1
Resolution: Fixed
> Vectored Read into off-heap buffer broken in fallback implementation
> --------------------------------------------------------------------
>
> Key: HADOOP-19101
> URL: https://issues.apache.org/jira/browse/HADOOP-19101
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs, fs/azure
> Affects Versions: 3.4.0, 3.3.6
> Reporter: Steve Loughran
> Assignee: Steve Loughran
> Priority: Blocker
> Fix For: 3.3.9, 3.5.0, 3.4.1
>
>
> {{VectoredReadUtils.readInDirectBuffer()}} always starts off reading at
> position zero even when the range is at a different offset. As a result, the
> read can return incorrect data.
> The fix for this is straightforward: we pass in a FileRange and use its offset
> as the starting position.
> However, this does mean that all shipping releases 3.3.5-3.4.0 cannot safely
> use vectored IO to read into direct buffers through HDFS, ABFS or GCS. Note
> that we have never seen this in production because the parquet and ORC
> libraries both read into on-heap storage.
> Those libraries need to be audited to make sure that they never attempt to
> read into off-heap DirectBuffers. This is a bit trickier than you would think
> because an allocator is passed in. For PARQUET-2171 we will:
> * only invoke the API on streams which explicitly declare their support for
> the API (so fallback in parquet itself)
> * not invoke when direct buffer allocation is in use.
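
A minimal sketch of the offset handling described above, in plain Java with no
Hadoop dependency. It is not the actual patch: the class name, the
PositionedReader interface, the readRange() signature and the chunkSize
parameter are all hypothetical, chosen only to show a chunked read into a
direct buffer that starts at the range's own offset rather than at zero.

```java
import java.nio.ByteBuffer;
import java.util.function.IntFunction;

public class DirectBufferRangeRead {

  /** Hypothetical stand-in for a positioned-read stream; not a Hadoop interface. */
  public interface PositionedReader {
    void readFully(long position, byte[] buffer, int offset, int length);
  }

  /**
   * Read rangeLength bytes starting at rangeOffset into a buffer obtained
   * from the caller-supplied allocator, copying through a small on-heap
   * array so that the target may be an off-heap (direct) buffer.
   */
  public static ByteBuffer readRange(PositionedReader in,
                                     long rangeOffset,
                                     int rangeLength,
                                     IntFunction<ByteBuffer> allocate,
                                     int chunkSize) {
    ByteBuffer target = allocate.apply(rangeLength);
    byte[] tmp = new byte[Math.min(chunkSize, Math.max(rangeLength, 1))];
    int readBytes = 0;
    // The bug described above amounts to starting this at 0;
    // the read has to begin at the offset of the requested range.
    long position = rangeOffset;
    while (readBytes < rangeLength) {
      int n = Math.min(tmp.length, rangeLength - readBytes);
      in.readFully(position, tmp, 0, n);
      target.put(tmp, 0, n);
      position += n;
      readBytes += n;
    }
    target.flip(); // make the bytes just written readable by the caller
    return target;
  }

  public static void main(String[] args) {
    byte[] file = new byte[32];
    for (int i = 0; i < file.length; i++) {
      file[i] = (byte) i;
    }
    // In-memory "stream" used only for this demonstration.
    PositionedReader in =
        (pos, buf, off, len) -> System.arraycopy(file, (int) pos, buf, off, len);
    ByteBuffer result = readRange(in, 10, 8, ByteBuffer::allocateDirect, 3);
    System.out.println("direct=" + result.isDirect()
        + " first=" + result.get(0)
        + " last=" + result.get(result.remaining() - 1));
  }
}
```

Because the allocator is passed in, a caller that wants to avoid the affected
code path can check ByteBuffer.isDirect() on a buffer obtained from that
allocator before deciding whether to use the vectored read API.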