[ 
https://issues.apache.org/jira/browse/FLINK-31036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17689614#comment-17689614
 ] 

Rui Fan commented on FLINK-31036:
---------------------------------

I see that StateCheckpointedITCase fails when this commit[1] is merged.

I have an idea: Based on this commit, use a script to run 
StateCheckpointedITCase(with LOG enabled) repeatedly to see if it can be 
reproduced. I'll update here when I come to any conclusions.

[1]https://github.com/flink-ci/flink-mirror/commit/331abad6dd2d6ec79ecde7116f3b48f2a249b9ef

> StateCheckpointedITCase timed out due to deadlock
> -------------------------------------------------
>
>                 Key: FLINK-31036
>                 URL: https://issues.apache.org/jira/browse/FLINK-31036
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Checkpointing
>    Affects Versions: 1.17.0
>            Reporter: Matthias Pohl
>            Assignee: Rui Fan
>            Priority: Blocker
>              Labels: test-stability
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=46023&view=logs&j=39d5b1d5-3b41-54dc-6458-1e2ddd1cdcf3&t=0c010d0c-3dec-5bf1-d408-7b18988b1b2b&l=10608
> {code}
> "Legacy Source Thread - Source: Custom Source -> Filter (6/12)#69980" 
> #13718026 prio=5 os_prio=0 tid=0x00007f05f44f0800 nid=0x128157 waiting on 
> condition [0x00007f059feef000]
>    java.lang.Thread.State: WAITING (parking)
>       at sun.misc.Unsafe.park(Native Method)
>       - parking to wait for  <0x00000000f0a974e8> (a 
> java.util.concurrent.CompletableFuture$Signaller)
>       at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>       at 
> java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1707)
>       at 
> java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
>       at 
> java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1742)
>       at 
> java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
>       at 
> org.apache.flink.runtime.io.network.buffer.LocalBufferPool.requestMemorySegmentBlocking(LocalBufferPool.java:384)
>       at 
> org.apache.flink.runtime.io.network.buffer.LocalBufferPool.requestBufferBuilderBlocking(LocalBufferPool.java:356)
>       at 
> org.apache.flink.runtime.io.network.partition.BufferWritingResultPartition.requestNewBufferBuilderFromPool(BufferWritingResultPartition.java:414)
>       at 
> org.apache.flink.runtime.io.network.partition.BufferWritingResultPartition.requestNewUnicastBufferBuilder(BufferWritingResultPartition.java:390)
>       at 
> org.apache.flink.runtime.io.network.partition.BufferWritingResultPartition.appendUnicastDataForRecordContinuation(BufferWritingResultPartition.java:328)
>       at 
> org.apache.flink.runtime.io.network.partition.BufferWritingResultPartition.emitRecord(BufferWritingResultPartition.java:161)
>       at 
> org.apache.flink.runtime.io.network.api.writer.RecordWriter.emit(RecordWriter.java:107)
>       at 
> org.apache.flink.runtime.io.network.api.writer.ChannelSelectorRecordWriter.emit(ChannelSelectorRecordWriter.java:55)
>       at 
> org.apache.flink.streaming.runtime.io.RecordWriterOutput.pushToRecordWriter(RecordWriterOutput.java:105)
>       at 
> org.apache.flink.streaming.runtime.io.RecordWriterOutput.collect(RecordWriterOutput.java:91)
>       at 
> org.apache.flink.streaming.runtime.io.RecordWriterOutput.collect(RecordWriterOutput.java:45)
>       at 
> org.apache.flink.streaming.api.operators.CountingOutput.collect(CountingOutput.java:59)
>       at 
> org.apache.flink.streaming.api.operators.CountingOutput.collect(CountingOutput.java:31)
>       at 
> org.apache.flink.streaming.api.operators.StreamFilter.processElement(StreamFilter.java:39)
>       at 
> org.apache.flink.streaming.runtime.io.RecordProcessorUtils$$Lambda$1311/1256184070.accept(Unknown
>  Source)
>       at 
> org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.pushToOperator(CopyingChainingOutput.java:75)
>       at 
> org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.collect(CopyingChainingOutput.java:50)
>       at 
> org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.collect(CopyingChainingOutput.java:29)
>       at 
> org.apache.flink.streaming.api.operators.StreamSourceContexts$ManualWatermarkContext.processAndCollect(StreamSourceContexts.java:418)
>       at 
> org.apache.flink.streaming.api.operators.StreamSourceContexts$WatermarkContext.collect(StreamSourceContexts.java:513)
>       - locked <0x00000000d55035c0> (a java.lang.Object)
>       at 
> org.apache.flink.streaming.api.operators.StreamSourceContexts$SwitchingOnClose.collect(StreamSourceContexts.java:103)
>       at 
> org.apache.flink.test.checkpointing.StateCheckpointedITCase$StringGeneratingSourceFunction.run(StateCheckpointedITCase.java:178)
>       - locked <0x00000000d55035c0> (a java.lang.Object)
>       at 
> org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:110)
>       at 
> org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:67)
>       at 
> org.apache.flink.streaming.runtime.tasks.SourceStreamTask$LegacySourceFunctionThread.run(SourceStreamTask.java:333)
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to