[ https://issues.apache.org/jira/browse/FLINK-34424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17819495#comment-17819495 ]
Yunfeng Zhou commented on FLINK-34424:
--------------------------------------

Hi [~mapohl] and [~pnowojski],

[~tanyuxin] and I have not been able to find the cause of this problem so far.

As far as I have investigated, a Java thread can be stuck without an explicit "waiting to lock ..." hint in the thread dump if it involves JNI calls like MonitorEnter/MonitorExit, java.util.concurrent synchronizers like Semaphore and CountDownLatch, or low-level synchronization primitives like LockSupport.park(). However, I did not find any of these patterns in the blocked thread's stack. Besides, it seems that a Java GC pause might also make a thread appear blocked, so we are not even sure that there is a blocking or deadlock issue to resolve.

Given that we could not reproduce this error in our local environment, and have not found a similar error in the Flink CI history, this looks like a low-probability issue that can be ignored for the time being. Could we mark this issue as low priority for now and revisit it when we have more input on failures like this?
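To make the thread-state distinction above concrete, here is a minimal sketch. It is not taken from the Flink code base; the class ThreadStateDemo and all of its method and thread names are hypothetical. It only relies on the documented meaning of java.lang.Thread.State: a thread inside Thread.join() or CountDownLatch.await() reports WAITING, and only a thread contending for a monitor held by another thread reports BLOCKED.

```java
import java.util.concurrent.CountDownLatch;

// Hypothetical demo class for illustration only; not part of Flink.
class ThreadStateDemo {

    /** Spins until the thread reaches the expected state or 5 s pass, then returns its state. */
    static Thread.State awaitState(Thread t, Thread.State expected) {
        long deadline = System.nanoTime() + 5_000_000_000L;
        while (t.getState() != expected && System.nanoTime() < deadline) {
            Thread.yield();
        }
        return t.getState();
    }

    static void joinQuietly(Thread t) {
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** State of a thread stuck in Thread.join(), as in the reported dump. */
    static Thread.State stateWhileJoining() {
        CountDownLatch release = new CountDownLatch(1);
        Thread worker = new Thread(() -> awaitQuietly(release), "worker");
        Thread joiner = new Thread(() -> joinQuietly(worker), "joiner");
        worker.start();
        joiner.start();
        Thread.State observed = awaitState(joiner, Thread.State.WAITING);
        release.countDown(); // let the worker finish so the joiner can return
        joinQuietly(joiner);
        return observed;
    }

    /** State of a thread parked inside CountDownLatch.await(). */
    static Thread.State stateWhileParked() {
        CountDownLatch release = new CountDownLatch(1);
        Thread waiter = new Thread(() -> awaitQuietly(release), "waiter");
        waiter.start();
        Thread.State observed = awaitState(waiter, Thread.State.WAITING);
        release.countDown();
        joinQuietly(waiter);
        return observed;
    }

    /** State of a thread trying to enter a monitor that another thread holds. */
    static Thread.State stateWhileContending() {
        Object lock = new Object();
        Thread contender = new Thread(() -> {
            synchronized (lock) {
                // nothing to do; acquiring the monitor is the point
            }
        }, "contender");
        Thread.State observed;
        synchronized (lock) {
            contender.start();
            observed = awaitState(contender, Thread.State.BLOCKED);
        }
        joinQuietly(contender);
        return observed;
    }

    public static void main(String[] args) {
        System.out.println("join:      " + stateWhileJoining());
        System.out.println("latch:     " + stateWhileParked());
        System.out.println("monitor:   " + stateWhileContending());
    }
}
```

In the first two cases the stuck thread is waiting for another thread to make progress rather than for a lock, so the interesting stack is usually that of the thread being waited for.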
> BoundedBlockingSubpartitionWriteReadTest#testRead10ConsumersConcurrent times out
> --------------------------------------------------------------------------------
>
> Key: FLINK-34424
> URL: https://issues.apache.org/jira/browse/FLINK-34424
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Network
> Affects Versions: 1.19.0, 1.20.0
> Reporter: Matthias Pohl
> Assignee: Yunfeng Zhou
> Priority: Critical
> Labels: test-stability
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57446&view=logs&j=0da23115-68bb-5dcd-192c-bd4c8adebde1&t=24c3384f-1bcb-57b3-224f-51bf973bbee8&l=9151
> {code}
> Feb 11 13:55:29 "ForkJoinPool-50-worker-25" #414 daemon prio=5 os_prio=0 tid=0x00007f19503af800 nid=0x284c in Object.wait() [0x00007f191b6db000]
> Feb 11 13:55:29    java.lang.Thread.State: WAITING (on object monitor)
> Feb 11 13:55:29     at java.lang.Object.wait(Native Method)
> Feb 11 13:55:29     at java.lang.Thread.join(Thread.java:1252)
> Feb 11 13:55:29     - locked <0x00000000e2e019a8> (a org.apache.flink.runtime.io.network.partition.BoundedBlockingSubpartitionWriteReadTest$LongReader)
> Feb 11 13:55:29     at org.apache.flink.core.testutils.CheckedThread.trySync(CheckedThread.java:104)
> Feb 11 13:55:29     at org.apache.flink.core.testutils.CheckedThread.sync(CheckedThread.java:92)
> Feb 11 13:55:29     at org.apache.flink.core.testutils.CheckedThread.sync(CheckedThread.java:81)
> Feb 11 13:55:29     at org.apache.flink.runtime.io.network.partition.BoundedBlockingSubpartitionWriteReadTest.testRead10ConsumersConcurrent(BoundedBlockingSubpartitionWriteReadTest.java:177)
> Feb 11 13:55:29     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> [...]
> {code}