Weijie Guo created FLINK-28925:
----------------------------------

             Summary: Fix the concurrency problem in hybrid shuffle
                 Key: FLINK-28925
                 URL: https://issues.apache.org/jira/browse/FLINK-28925
             Project: Flink
          Issue Type: Bug
          Components: Runtime / Network
    Affects Versions: 1.16.0
            Reporter: Weijie Guo
             Fix For: 1.16.0


Through tpc-ds testing and code analysis, I found some thread unsafe problems 
in hybrid shuffle:
 # HsSubpartitionMemeoryDataManager#consumeBuffer should return a readOnlySlice 
buffer to downstream instead of original buffer: If the spilling thread is 
processing while  downstream task is consuming the same buffer, the amount of 
data written to the disk will be smaller than the actual value. To solve this, 
we should let the consuming thread and the spilling thread share the same data 
but not index.
 # HsSubpartitionMemoryDataManager#releaseSubpartitionBuffers should ignore the 
release decision if the buffer already removed from bufferIndexToContexts 
instead of throw an exception.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to