[ https://issues.apache.org/jira/browse/HADOOP-15538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16579999#comment-16579999 ]
Daryn Sharp commented on HADOOP-15538: -------------------------------------- Did you ever look for an hs_err file per my earlier comment: "The main thread has no stack trace. Did you clip it out? If no, that's indicative of the jvm trying to shutdown. It may have detected a thread death or other nasty issue but is stuck waiting for all the non-daemon threads to exit. Look for a hs_err_pid file." > Possible RPC deadlock in Client > ------------------------------- > > Key: HADOOP-15538 > URL: https://issues.apache.org/jira/browse/HADOOP-15538 > Project: Hadoop Common > Issue Type: Bug > Components: common > Reporter: Yongjun Zhang > Assignee: Yongjun Zhang > Priority: Major > Attachments: t1+13min.jstack, t1.jstack > > > We have a jstack collection that spans 13 minutes. One frame per ~1.5 > minutes. And for each of the frame, I observed the following: > {code:java} > Found one Java-level deadlock: > ============================= > "IPC Parameter Sending Thread #294": > waiting to lock monitor 0x00007f68f21f3188 (object 0x0000000621745390, a > java.lang.Object), > which is held by UNKNOWN_owner_addr=0x00007f68332e2800 > Java stack information for the threads listed above: > =================================================== > "IPC Parameter Sending Thread #294": > at > sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:268) > - waiting to lock <0x0000000621745390> (a java.lang.Object) > at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:461) > - locked <0x0000000621745380> (a java.lang.Object) > at > org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:63) > at > org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142) > at > org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:159) > at > org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:117) > at > java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82) > at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140) > - locked <0x0000000621749850> (a java.io.BufferedOutputStream) > at java.io.DataOutputStream.flush(DataOutputStream.java:123) > at org.apache.hadoop.ipc.Client$Connection$3.run(Client.java:1072) > - locked <0x000000062174b878> (a java.io.DataOutputStream) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > Found one Java-level deadlock: > ============================= > "IPC Client (297602875) connection to x.y.z.p:8020 from impala": > waiting to lock monitor 0x00007f68f21f3188 (object 0x0000000621745390, a > java.lang.Object), > which is held by UNKNOWN_owner_addr=0x00007f68332e2800 > Java stack information for the threads listed above: > =================================================== > "IPC Client (297602875) connection to x.y.z.p:8020 from impala": > at > sun.nio.ch.SocketChannelImpl.readerCleanup(SocketChannelImpl.java:279) > - waiting to lock <0x0000000621745390> (a java.lang.Object) > at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:390) > - locked <0x0000000621745370> (a java.lang.Object) > at > org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:57) > at > org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142) > at > org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161) > at > org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131) > at java.io.FilterInputStream.read(FilterInputStream.java:133) > at java.io.FilterInputStream.read(FilterInputStream.java:133) > at > org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(Client.java:553) > at java.io.BufferedInputStream.fill(BufferedInputStream.java:246) > at java.io.BufferedInputStream.read(BufferedInputStream.java:265) > - locked <0x00000006217476f0> (a java.io.BufferedInputStream) > at java.io.DataInputStream.readInt(DataInputStream.java:387) > at > org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:1113) > at org.apache.hadoop.ipc.Client$Connection.run(Client.java:1006) > Found 2 deadlocks. > {code} > This happens with jdk1.8.0_162 on 2.6.32-696.18.7.el6.x86_64. > The code appears to match > [https://insight.io/github.com/AdoptOpenJDK/openjdk-jdk8u/tree/dev/jdk/src/share/classes/sun/nio/ch/SocketChannelImpl.java]. > The first thread is blocked at: > [https://insight.io/github.com/AdoptOpenJDK/openjdk-jdk8u/blob/dev/jdk/src/share/classes/sun/nio/ch/SocketChannelImpl.java?line=268] > The second thread is blocked at: > > [https://insight.io/github.com/AdoptOpenJDK/openjdk-jdk8u/blob/dev/jdk/src/share/classes/sun/nio/ch/SocketChannelImpl.java?line=279] > There are two issues here: > # There seems to be a real deadlock because the stacks remain the same even > if the first an last jstack frames captured is 13 minutes apart. > # Java deadlock report seems to be problematic, two threads that have > deadlock should not be blocked on the same lock, but they appear to be in > this case: the same SocketChannelImpl's stateLock. > I found a relevant jdk jira > [https://bugs.openjdk.java.net/browse/JDK-8007476], it explains where two > deadlocks are reported and they are really for the same deadlock. > I don't see a similar report about this issue in jdk jira database, and I'm > thinking about filing a jdk jira for that, but would like to throw some > discussion here before that. > Issue#1 is important, because the client is hanging, which indicate a real > problem; however, without a correct report before issue#2 is fixed, it's not > clear how the deadlock really looks like. > Thanks. -- This message was sent by Atlassian JIRA (v7.6.3#76005) --------------------------------------------------------------------- To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org