[ https://issues.apache.org/jira/browse/HADOOP-19101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Steve Loughran resolved HADOOP-19101.
-------------------------------------
    Fix Version/s: 3.3.9
                   3.4.1
       Resolution: Fixed

> Vectored Read into off-heap buffer broken in fallback implementation
> --------------------------------------------------------------------
>
>                 Key: HADOOP-19101
>                 URL: https://issues.apache.org/jira/browse/HADOOP-19101
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs, fs/azure
>    Affects Versions: 3.4.0, 3.3.6
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>            Priority: Blocker
>             Fix For: 3.3.9, 3.5.0, 3.4.1
>
>
> {{VectoredReadUtils.readInDirectBuffer()}} always starts reading at
> position zero, even when the range is at a different offset. As a result,
> a read can return data from the wrong part of the file.
> The fix for this is straightforward: we pass in a FileRange and use its offset
> as the starting position.
> However, this does mean that all shipping releases 3.3.5-3.4.0 cannot safely
> use vectored IO to read into direct buffers through HDFS, ABFS or GCS. Note
> that we have never seen this in production because the parquet and ORC
> libraries both read into on-heap storage.
> Those libraries need to be audited to make sure that they never attempt to
> read into off-heap DirectBuffers. This is a bit trickier than you would think
> because an allocator is passed in. For PARQUET-2171 we will
> * only invoke the API on streams which explicitly declare their support for
> the API (so fallback in parquet itself)
> * not invoke when direct buffer allocation is in use.
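
The fix described in the issue can be illustrated with a minimal, self-contained sketch. It is not the actual {{VectoredReadUtils}} code: the `PositionedReader` interface, the method signature, the in-memory `over()` helper and the 64 KiB chunk size are all assumptions made for illustration, standing in for the real Hadoop stream and FileRange types. The only point it shows is that the copy loop must start at the range's offset rather than at position zero.

```java
import java.io.IOException;
import java.nio.ByteBuffer;

/** Hypothetical stand-in for a positioned-read stream; not a Hadoop interface. */
interface PositionedReader {
    /** Reads exactly length bytes starting at file position into dest. */
    void readFully(long position, byte[] dest, int destOffset, int length)
            throws IOException;
}

final class DirectBufferReadSketch {
    /** Assumed upper bound on the temporary on-heap copy buffer. */
    private static final int TMP_BUFFER_MAX_SIZE = 64 * 1024;

    private DirectBufferReadSketch() {
    }

    /**
     * Copies the range [rangeOffset, rangeOffset + rangeLength) into buffer
     * via a temporary on-heap array, since a direct buffer has no backing array.
     */
    static void readInDirectBuffer(PositionedReader in,
                                   long rangeOffset,
                                   int rangeLength,
                                   ByteBuffer buffer) throws IOException {
        if (rangeLength == 0) {
            return;
        }
        // The behaviour the issue describes as the fix: start at the range's
        // offset. Initialising this to 0 reproduces the reported bug.
        long position = rangeOffset;
        int readBytes = 0;
        byte[] tmp = new byte[Math.min(rangeLength, TMP_BUFFER_MAX_SIZE)];
        while (readBytes < rangeLength) {
            int chunk = Math.min(tmp.length, rangeLength - readBytes);
            in.readFully(position, tmp, 0, chunk);
            buffer.put(tmp, 0, chunk);
            position += chunk;
            readBytes += chunk;
        }
    }

    /** In-memory reader over a byte array, used only to exercise the sketch. */
    static PositionedReader over(byte[] data) {
        return (position, dest, destOffset, length) ->
                System.arraycopy(data, Math.toIntExact(position),
                        dest, destOffset, length);
    }
}
```

With a source array where byte i holds the value (byte) i, reading the range at offset 100 into `ByteBuffer.allocateDirect(16)` should yield a buffer whose first byte is 100. If `position` were initialised to zero instead, the first byte would be 0, which is the incorrect result the issue describes.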