[ 
https://issues.apache.org/jira/browse/FLINK-34424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17819495#comment-17819495
 ] 

Yunfeng Zhou commented on FLINK-34424:
--------------------------------------

Hi [~mapohl] and [~pnowojski]. [~tanyuxin] and I have not yet been able to find 
the cause of this problem. From what I have investigated so far, a Java thread 
can be blocked without an explicit "waiting to lock ..." hint in the thread 
dump if it is blocked in JNI calls like MonitorEnter/MonitorExit, on internal 
locks like Semaphore and CountDownLatch, or on low-level synchronization 
primitives like LockSupport.park(). However, I did not find any of these 
patterns in the blocked thread's stack. Besides, it seems that Java's GC might 
also cause a thread to appear blocked, so we are not even sure that there is a 
blocking or deadlock issue to resolve.
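
To illustrate the second case, here is a minimal sketch that is not taken from 
the Flink code base; the class name ParkedThreadDemo and the method 
observeParkedThread are hypothetical. It parks a thread on a CountDownLatch and 
inspects it through the standard ThreadMXBean API. The thread is WAITING, but no 
thread owns what it waits on, which is why a dump of such a thread would show a 
"parking to wait for" entry rather than "waiting to lock".

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.util.concurrent.CountDownLatch;

public class ParkedThreadDemo {

    /** Starts a thread that waits on a latch and returns a snapshot taken while it is parked. */
    static ThreadInfo observeParkedThread() throws InterruptedException {
        CountDownLatch latch = new CountDownLatch(1);
        Thread waiter = new Thread(() -> {
            try {
                latch.await(); // parks via LockSupport.park(); no object monitor is involved
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "latch-waiter");
        waiter.setDaemon(true);
        waiter.start();

        // Poll until the waiter has actually parked on the latch.
        while (waiter.getState() != Thread.State.WAITING) {
            Thread.sleep(10);
        }
        ThreadInfo info = ManagementFactory.getThreadMXBean().getThreadInfo(waiter.getId());

        // Release the waiter so the demo does not leak a parked thread.
        latch.countDown();
        waiter.join();
        return info;
    }

    public static void main(String[] args) throws InterruptedException {
        ThreadInfo info = observeParkedThread();
        System.out.println("state      = " + info.getThreadState());
        System.out.println("waiting on = " + info.getLockName());
        System.out.println("lock owner = " + info.getLockOwnerId());
    }
}
```

In this sketch getLockOwnerId() is expected to return -1 because the latch has 
no owning thread, so a deadlock detector based on lock ownership would not 
flag a thread stuck like this.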

Given that we could not reproduce this error in our local environment and have 
not found a similar error in the Flink CI history, it seems to be a 
low-probability issue that can be deprioritized for now. Could we mark this 
issue as low priority and revisit it when we have more data on failures like 
this?

> BoundedBlockingSubpartitionWriteReadTest#testRead10ConsumersConcurrent times 
> out
> --------------------------------------------------------------------------------
>
>                 Key: FLINK-34424
>                 URL: https://issues.apache.org/jira/browse/FLINK-34424
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Network
>    Affects Versions: 1.19.0, 1.20.0
>            Reporter: Matthias Pohl
>            Assignee: Yunfeng Zhou
>            Priority: Critical
>              Labels: test-stability
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57446&view=logs&j=0da23115-68bb-5dcd-192c-bd4c8adebde1&t=24c3384f-1bcb-57b3-224f-51bf973bbee8&l=9151
> {code}
> Feb 11 13:55:29 "ForkJoinPool-50-worker-25" #414 daemon prio=5 os_prio=0 tid=0x00007f19503af800 nid=0x284c in Object.wait() [0x00007f191b6db000]
> Feb 11 13:55:29    java.lang.Thread.State: WAITING (on object monitor)
> Feb 11 13:55:29       at java.lang.Object.wait(Native Method)
> Feb 11 13:55:29       at java.lang.Thread.join(Thread.java:1252)
> Feb 11 13:55:29       - locked <0x00000000e2e019a8> (a org.apache.flink.runtime.io.network.partition.BoundedBlockingSubpartitionWriteReadTest$LongReader)
> Feb 11 13:55:29       at org.apache.flink.core.testutils.CheckedThread.trySync(CheckedThread.java:104)
> Feb 11 13:55:29       at org.apache.flink.core.testutils.CheckedThread.sync(CheckedThread.java:92)
> Feb 11 13:55:29       at org.apache.flink.core.testutils.CheckedThread.sync(CheckedThread.java:81)
> Feb 11 13:55:29       at org.apache.flink.runtime.io.network.partition.BoundedBlockingSubpartitionWriteReadTest.testRead10ConsumersConcurrent(BoundedBlockingSubpartitionWriteReadTest.java:177)
> Feb 11 13:55:29       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> [...]
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
