[jira] [Updated] (FLINK-25316) BlobServer can get stuck during shutdown

2022-09-28 Thread Martijn Visser (Jira)


[ https://issues.apache.org/jira/browse/FLINK-25316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Martijn Visser updated FLINK-25316:
---
Fix Version/s: (was: 1.16.0)

> BlobServer can get stuck during shutdown
>
> Key: FLINK-25316
> URL: https://issues.apache.org/jira/browse/FLINK-25316
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Coordination
> Affects Versions: 1.15.0
> Reporter: Robert Metzger
> Priority: Minor
>
> The cluster shutdown can get stuck
> {code}
> "AkkaRpcService-Supervisor-Termination-Future-Executor-thread-1" #89 daemon prio=5 os_prio=0 tid=0x004017d7 nid=0x2ec in Object.wait() [0x00402a9b5000]
>    java.lang.Thread.State: WAITING (on object monitor)
>         at java.lang.Object.wait(Native Method)
>         - waiting on <0xd6c48368> (a org.apache.flink.runtime.blob.BlobServer)
>         at java.lang.Thread.join(Thread.java:1252)
>         - locked <0xd6c48368> (a org.apache.flink.runtime.blob.BlobServer)
>         at java.lang.Thread.join(Thread.java:1326)
>         at org.apache.flink.runtime.blob.BlobServer.close(BlobServer.java:319)
>         at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.stopClusterServices(ClusterEntrypoint.java:406)
>         - locked <0xd5d27350> (a java.lang.Object)
>         at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.lambda$shutDownAsync$4(ClusterEntrypoint.java:505)
> {code}
> because the BlobServer.run() method ignores interrupts:
> {code}
> "BLOB Server listener at 6124" #30 daemon prio=5 os_prio=0 tid=0x00401c929800 nid=0x2b4 runnable [0x0040263f9000]
>    java.lang.Thread.State: RUNNABLE
>         at java.net.PlainSocketImpl.socketAccept(Native Method)
>         at java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:409)
>         at java.net.ServerSocket.implAccept(ServerSocket.java:560)
>         at java.net.ServerSocket.accept(ServerSocket.java:528)
>         at org.apache.flink.util.NetUtils.acceptWithoutTimeout(NetUtils.java:143)
>         at org.apache.flink.runtime.blob.BlobServer.run(BlobServer.java:268)
> {code}
> This issue was introduced in FLINK-24156 and first mentioned in https://issues.apache.org/jira/browse/FLINK-24113?focusedCommentId=17459414&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17459414
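The second thread dump shows the listener thread RUNNABLE inside ServerSocket.accept(), while BlobServer.close() waits for that thread in an untimed Thread.join(). The following Java sketch, which is not Flink code, illustrates why interrupting does not help, under stated assumptions: it uses a platform thread and the classic blocking java.net.ServerSocket (as in the PlainSocketImpl frames above); virtual threads on newer JDKs behave differently. The class name AcceptInterruptDemo, its probe() method, and the 200 ms startup delay are hypothetical choices for illustration. Thread.interrupt() leaves the thread blocked, while closing the ServerSocket makes accept() fail with a SocketException.

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.net.SocketException;

public class AcceptInterruptDemo {

    /**
     * Starts a platform thread blocked in ServerSocket.accept() and reports
     * whether it is still alive (a) after Thread.interrupt() and (b) after the
     * server socket is closed. Returns {aliveAfterInterrupt, aliveAfterClose}.
     */
    static boolean[] probe() throws IOException, InterruptedException {
        // Port 0: let the OS pick a free ephemeral port.
        ServerSocket serverSocket = new ServerSocket(0);

        Thread listener = new Thread(() -> {
            try {
                // Blocks in a native accept call, like the listener thread in the dump.
                serverSocket.accept().close();
            } catch (SocketException e) {
                // Expected: another thread closed serverSocket while we were blocked.
            } catch (IOException e) {
                // Other I/O failures are ignored in this sketch.
            }
        }, "demo-listener");
        listener.setDaemon(true); // never keep the JVM alive if the demo misbehaves
        listener.start();
        Thread.sleep(200); // assumption: enough time for the listener to reach accept()

        // (a) Interrupt only sets the flag; classic blocking accept() does not check it.
        listener.interrupt();
        listener.join(500); // bounded wait, unlike the untimed join() in the dump
        boolean aliveAfterInterrupt = listener.isAlive();

        // (b) Closing the socket makes the blocked accept() throw SocketException.
        serverSocket.close();
        listener.join(5_000);
        boolean aliveAfterClose = listener.isAlive();

        return new boolean[] {aliveAfterInterrupt, aliveAfterClose};
    }

    public static void main(String[] args) throws Exception {
        boolean[] result = probe();
        System.out.println("alive after interrupt: " + result[0]);
        System.out.println("alive after close:     " + result[1]);
    }
}
```

On a JDK where these assumptions hold, main should report the listener still alive after the interrupt and no longer alive after the socket is closed.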



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-25316) BlobServer can get stuck during shutdown

2022-04-12 Thread Yun Gao (Jira)



Yun Gao updated FLINK-25316:
---
Fix Version/s: 1.16.0

> BlobServer can get stuck during shutdown
>
> Key: FLINK-25316
> URL: https://issues.apache.org/jira/browse/FLINK-25316
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Coordination
> Affects Versions: 1.15.0
> Reporter: Robert Metzger
> Priority: Minor
> Fix For: 1.15.0, 1.16.0
>





[jira] [Updated] (FLINK-25316) BlobServer can get stuck during shutdown

2021-12-15 Thread Robert Metzger (Jira)



Robert Metzger updated FLINK-25316:
---
Priority: Minor  (was: Critical)

> BlobServer can get stuck during shutdown
>
> Key: FLINK-25316
> URL: https://issues.apache.org/jira/browse/FLINK-25316
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Coordination
> Affects Versions: 1.15.0
> Reporter: Robert Metzger
> Priority: Minor
> Fix For: 1.15.0
>





[jira] [Updated] (FLINK-25316) BlobServer can get stuck during shutdown

2021-12-14 Thread Robert Metzger (Jira)



Robert Metzger updated FLINK-25316:
---
Description: 
The cluster shutdown can get stuck
{code}
"AkkaRpcService-Supervisor-Termination-Future-Executor-thread-1" #89 daemon prio=5 os_prio=0 tid=0x004017d7 nid=0x2ec in Object.wait() [0x00402a9b5000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0xd6c48368> (a org.apache.flink.runtime.blob.BlobServer)
        at java.lang.Thread.join(Thread.java:1252)
        - locked <0xd6c48368> (a org.apache.flink.runtime.blob.BlobServer)
        at java.lang.Thread.join(Thread.java:1326)
        at org.apache.flink.runtime.blob.BlobServer.close(BlobServer.java:319)
        at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.stopClusterServices(ClusterEntrypoint.java:406)
        - locked <0xd5d27350> (a java.lang.Object)
        at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.lambda$shutDownAsync$4(ClusterEntrypoint.java:505)
{code}

because the BlobServer.run() method ignores interrupts:
{code}
"BLOB Server listener at 6124" #30 daemon prio=5 os_prio=0 tid=0x00401c929800 nid=0x2b4 runnable [0x0040263f9000]
   java.lang.Thread.State: RUNNABLE
        at java.net.PlainSocketImpl.socketAccept(Native Method)
        at java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:409)
        at java.net.ServerSocket.implAccept(ServerSocket.java:560)
        at java.net.ServerSocket.accept(ServerSocket.java:528)
        at org.apache.flink.util.NetUtils.acceptWithoutTimeout(NetUtils.java:143)
        at org.apache.flink.runtime.blob.BlobServer.run(BlobServer.java:268)
{code}

This issue was introduced in FLINK-24156 and first mentioned in https://issues.apache.org/jira/browse/FLINK-24113?focusedCommentId=17459414&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17459414


  was:
The cluster shutdown can get stuck
{code}
"AkkaRpcService-Supervisor-Termination-Future-Executor-thread-1" #89 daemon prio=5 os_prio=0 tid=0x004017d7 nid=0x2ec in Object.wait() [0x00402a9b5000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0xd6c48368> (a org.apache.flink.runtime.blob.BlobServer)
        at java.lang.Thread.join(Thread.java:1252)
        - locked <0xd6c48368> (a org.apache.flink.runtime.blob.BlobServer)
        at java.lang.Thread.join(Thread.java:1326)
        at org.apache.flink.runtime.blob.BlobServer.close(BlobServer.java:319)
        at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.stopClusterServices(ClusterEntrypoint.java:406)
        - locked <0xd5d27350> (a java.lang.Object)
        at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.lambda$shutDownAsync$4(ClusterEntrypoint.java:505)
{code}

because the BlobServer.run() method ignores interrupts:
{code}
"BLOB Server listener at 6124" #30 daemon prio=5 os_prio=0 tid=0x00401c929800 nid=0x2b4 runnable [0x0040263f9000]
   java.lang.Thread.State: RUNNABLE
        at java.net.PlainSocketImpl.socketAccept(Native Method)
        at java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:409)
        at java.net.ServerSocket.implAccept(ServerSocket.java:560)
        at java.net.ServerSocket.accept(ServerSocket.java:528)
        at org.apache.flink.util.NetUtils.acceptWithoutTimeout(NetUtils.java:143)
        at org.apache.flink.runtime.blob.BlobServer.run(BlobServer.java:268)
{code}

This issue was introduced in FLINK-24156.
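The description above traces the hang to close() joining a thread that stays blocked in accept(). One common way to avoid this kind of hang is to have shutdown close the listening socket before joining the thread, and to bound the join. The Java sketch below is a hypothetical ListenerThread, not Flink's BlobServer and not necessarily how FLINK-25316 was resolved; the shutdownRequested flag and the 10-second join timeout are illustrative assumptions.

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketException;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical accept-loop thread for illustration; not Flink's BlobServer. */
public class ListenerThread extends Thread implements AutoCloseable {
    private final ServerSocket serverSocket;
    private final AtomicBoolean shutdownRequested = new AtomicBoolean(false);

    public ListenerThread(ServerSocket serverSocket) {
        super("hypothetical-listener");
        this.serverSocket = serverSocket;
        setDaemon(true);
    }

    @Override
    public void run() {
        while (!shutdownRequested.get()) {
            try (Socket connection = serverSocket.accept()) {
                // A real server would hand the connection to a worker here;
                // this sketch simply closes it via try-with-resources.
            } catch (SocketException e) {
                // accept() aborts with SocketException once the socket is closed.
                if (shutdownRequested.get() || serverSocket.isClosed()) {
                    return;
                }
            } catch (IOException e) {
                // Other accept failures are ignored in this sketch.
            }
        }
    }

    /**
     * Requests shutdown, closes the server socket so that a blocked accept()
     * returns, and only then waits for the thread, with a bounded timeout.
     */
    @Override
    public void close() throws IOException, InterruptedException {
        shutdownRequested.set(true);
        serverSocket.close();
        join(10_000); // assumption: 10 s is an acceptable upper bound
    }

    public static void main(String[] args) throws Exception {
        ListenerThread listener = new ListenerThread(new ServerSocket(0));
        listener.start();
        Thread.sleep(200);
        listener.close();
        System.out.println("listener alive after close: " + listener.isAlive());
    }
}
```

Closing the socket is what actually unblocks accept(); the flag only lets run() tell a requested shutdown apart from an unexpected socket failure, and the timed join keeps a misbehaving thread from blocking the caller indefinitely.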



> BlobServer can get stuck during shutdown
>
> Key: FLINK-25316
> URL: https://issues.apache.org/jira/browse/FLINK-25316
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Coordination
> Affects Versions: 1.15.0
> Reporter: Robert Metzger
> Priority: Critical
> Fix For: 1.15.0
>