[ 
https://issues.apache.org/jira/browse/HADOOP-18521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17629339#comment-17629339
 ] 

Steve Loughran commented on HADOOP-18521:
-----------------------------------------

This bug can be fixed by deleting one line from 
{{ReadBufferManager.purgeBuffersForStream()}}.
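
For anyone who wants to see the failure mode from the issue description without 
setting up a full reproduction, here is a toy model of it. It is only a sketch 
under assumptions: the class {{BufferReuseDemo}}, its methods and the 
two-buffer pool are all hypothetical and are not the ABFS code. It assumes the 
problem is that closing a stream hands a buffer back to the free pool while a 
GET is still writing into it.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy model only: not the real ReadBufferManager. All names are hypothetical. */
public class BufferReuseDemo {

    /** Simulates an HTTP GET completing: the response bytes land in the buffer. */
    public static void completeGet(byte[] buffer, String payload) {
        byte[] data = payload.getBytes(StandardCharsets.US_ASCII);
        System.arraycopy(data, 0, buffer, 0, data.length);
    }

    /**
     * Runs one interleaving of two streams sharing a buffer pool.
     *
     * @param releaseInProgressOnClose true models the assumed buggy purge, where
     *        close() returns a buffer that an active GET is still using
     * @return the data stream B finds in its prefetch buffer
     */
    public static String run(boolean releaseInProgressOnClose) {
        Deque<byte[]> freePool = new ArrayDeque<>();
        freePool.push(new byte[4]);
        freePool.push(new byte[4]);

        // Stream A starts a prefetch; its GET is still in flight.
        byte[] bufferA = freePool.pop();

        // Stream A is closed while that GET is active.
        if (releaseInProgressOnClose) {
            freePool.push(bufferA); // buffer handed back too early
        }

        // Stream B starts a prefetch and its GET completes with B's data.
        byte[] bufferB = freePool.pop();
        completeGet(bufferB, "BBBB");

        // Stream A's GET finally completes and writes into the buffer it holds.
        completeGet(bufferA, "AAAA");

        // Stream B's caller now reads what it believes is B's prefetched data.
        return new String(bufferB, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        System.out.println("release in-progress on close: " + run(true));
        System.out.println("keep in-progress until done:  " + run(false));
    }
}
```

In this model, {{run(true)}} returns stream A's bytes to stream B's caller, 
while {{run(false)}} keeps the in-progress buffer out of the pool and stream B 
sees its own data.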

I am not going to provide a test for this, as you need a multi-GB CSV file and a 
build of Spark configured to use your Hadoop distribution.

The latest build of cloudstore (https://github.com/steveloughran/cloudstore) 
has a command {{mkcsv}} which can create the file; the man page includes the 
Spark binding info: 
https://github.com/steveloughran/cloudstore/blob/trunk/src/main/site/mkcsv.md

Along with the fix, I am going to include a stream capability which the FS and 
stream can be probed for to declare that the fix is in. This allows for 
programmatic verification of the safety of releases, including with the 
cloudstore {{pathcapabilities}} command.

> ABFS ReadBufferManager buffer sharing across concurrent HTTP requests
> ---------------------------------------------------------------------
>
>                 Key: HADOOP-18521
>                 URL: https://issues.apache.org/jira/browse/HADOOP-18521
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs/azure
>    Affects Versions: 3.3.2, 3.3.3, 3.3.4
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>            Priority: Critical
>
> AbfsInputStream.close() can trigger the return of buffers used for active 
> prefetch GET requests into the ReadBufferManager free buffer pool.
> A subsequent prefetch by a different stream in the same process may acquire 
> this same buffer. This can lead to risk of corruption of its own prefetched 
> data, data which may then be returned to that other thread.
> On releases without the fix for this (3.3.2 to 3.3.4), the bug can be avoided 
> by disabling all prefetching: 
> {code}
> fs.azure.readaheadqueue.depth = 0
> {code}


