[ https://issues.apache.org/jira/browse/HADOOP-18521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17633807#comment-17633807 ]
ASF GitHub Bot commented on HADOOP-18521:
-----------------------------------------

snvijaya commented on PR #5117:
URL: https://github.com/apache/hadoop/pull/5117#issuecomment-1313734822

Hi @steveloughran,

I wanted to get your opinion on the change below as a possible replacement for this one:
https://github.com/apache/hadoop/pull/5133

A ReadBuffer with a valid buffer assigned to it can be in one of the following states when the stream is closed. With the above change, I am trying to address each state as follows:

1. In QueueReadAheadList: no change; the earlier purge takes care of it.
2. In CompletedList: no change again; the earlier purge takes care of it.
3. In InProgressList but yet to make the network call: if the stream is closed, skip the network call and move the ReadBuffer into the completed list as a failure.
4. In InProgressList, having just finished the network call: if the stream is closed, move the ReadBuffer into the completed list as a failure, whether or not the network call succeeded.

In states 3 and 4, the purge method might not pick up the ReadBuffer, because the purge may already have run. To prioritize these ReadBuffers for eviction in that case, I have added a stream-closed check to the eviction code as well.

Please let me know if you see value in this fix. If so, I can pursue further changes to add validation code at queuing time and when getBlock finds a hit in the completed list, and I will also add related test code.
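To make the four states above concrete, here is a minimal, single-file sketch of the proposed handling. It is not the code from either PR and does not use the real ABFS classes: `ClosedStreamSketch`, `Stream`, `Buffer`, `Manager`, `RemoteRead`, and every method name are hypothetical stand-ins, and the lists are simplified models of the read-ahead queue, in-progress list, and completed list.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical, simplified model of the idea; not the real ABFS classes. */
class ClosedStreamSketch {

    enum Status { QUEUED, IN_PROGRESS, AVAILABLE, READ_FAILED }

    /** Stand-in for the input stream; only tracks whether close() was called. */
    static final class Stream {
        private volatile boolean closed;
        void close() { closed = true; }
        boolean isClosed() { return closed; }
    }

    /** Stand-in for a ReadBuffer owned by one stream. */
    static final class Buffer {
        final Stream stream;
        Status status = Status.QUEUED;
        Buffer(Stream stream) { this.stream = stream; }
    }

    /** Stand-in for the network GET; returns true on success. */
    interface RemoteRead { boolean read(Buffer b); }

    static final class Manager {
        final Deque<Buffer> readAheadQueue = new ArrayDeque<>();
        final List<Buffer> inProgress = new ArrayList<>();
        final List<Buffer> completed = new ArrayList<>();
        final AtomicInteger remoteCalls = new AtomicInteger();

        synchronized void queue(Buffer b) { readAheadQueue.add(b); }

        /** States 1 and 2: drop the stream's queued and completed buffers. */
        synchronized void purge(Stream s) {
            readAheadQueue.removeIf(b -> b.stream == s);
            completed.removeIf(b -> b.stream == s);
        }

        /** A worker takes the next queued buffer and marks it in progress. */
        synchronized Buffer startNext() {
            Buffer b = readAheadQueue.poll();
            if (b != null) {
                b.status = Status.IN_PROGRESS;
                inProgress.add(b);
            }
            return b;
        }

        /** Worker body covering states 3 and 4. */
        void process(Buffer b, RemoteRead remote) {
            boolean ok = false;
            if (!b.stream.isClosed()) {      // state 3: skip the call if closed
                remoteCalls.incrementAndGet();
                ok = remote.read(b);
            }
            synchronized (this) {
                inProgress.remove(b);
                // State 4: a closed stream means failure, whatever the read result.
                b.status = (ok && !b.stream.isClosed())
                        ? Status.AVAILABLE : Status.READ_FAILED;
                completed.add(b);
            }
        }

        /** Eviction prefers completed buffers whose stream is closed. */
        synchronized Buffer evict() {
            for (Iterator<Buffer> it = completed.iterator(); it.hasNext();) {
                Buffer b = it.next();
                if (b.stream.isClosed()) {
                    it.remove();
                    return b;
                }
            }
            return completed.isEmpty() ? null : completed.remove(0);
        }
    }
}
```

In this model, a buffer that was already in progress when `purge` ran still ends up in the completed list as `READ_FAILED`, and `evict` returns it before any buffer that belongs to an open stream.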
> ABFS ReadBufferManager buffer sharing across concurrent HTTP requests
> ---------------------------------------------------------------------
>
>                 Key: HADOOP-18521
>                 URL: https://issues.apache.org/jira/browse/HADOOP-18521
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs/azure
>    Affects Versions: 3.3.2, 3.3.3, 3.3.4
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>            Priority: Critical
>              Labels: pull-request-available
>
> AbfsInputStream.close() can trigger the return of buffers used for active
> prefetch GET requests into the ReadBufferManager free buffer pool.
> A subsequent prefetch by a different stream in the same process may acquire
> this same buffer. This can lead to risk of corruption of its own prefetched
> data, data which may then be returned to that other thread.
> On releases without the fix for this (3.3.2 to 3.3.4), the bug can be avoided
> by disabling all prefetching
> {code}
> fs.azure.readaheadqueue.depth = 0
> {code}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org