[jira] [Updated] (HADOOP-15359) IPC client hang in kerberized cluster due to JDK deadlock

2018-04-03 Thread Xiao Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Chen updated HADOOP-15359:
---
Priority: Major  (was: Critical)

> IPC client hang in kerberized cluster due to JDK deadlock
> -
>
> Key: HADOOP-15359
> URL: https://issues.apache.org/jira/browse/HADOOP-15359
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: ipc
>Affects Versions: 2.6.0, 2.8.0, 3.0.0
>Reporter: Xiao Chen
>Priority: Major
> Attachments: 1.jstack, 2.jstack
>
>
> In recent internal testing, we found a DFS client hang. Further 
> inspection of the jstack output shows the following:
> {noformat}
> "IPC Client (552936351) connection to HOSTNAME:8020 from PRINCIPAL" #7468 
> daemon prio=5 os_prio=0 tid=0x7f6bb306c000 nid=0x1c76e waiting for 
> monitor entry [0x7f6bc2bd6000]
>java.lang.Thread.State: BLOCKED (on object monitor)
> at java.security.Provider.getService(Provider.java:1035)
> - waiting to lock <0x80277040> (a sun.security.provider.Sun)
> at 
> sun.security.jca.ProviderList$ServiceList.tryGet(ProviderList.java:444)
> at 
> sun.security.jca.ProviderList$ServiceList.access$200(ProviderList.java:376)
> at 
> sun.security.jca.ProviderList$ServiceList$1.hasNext(ProviderList.java:486)
> at javax.crypto.Cipher.getInstance(Cipher.java:513)
> at 
> sun.security.krb5.internal.crypto.dk.Des3DkCrypto.getCipher(Des3DkCrypto.java:202)
> at sun.security.krb5.internal.crypto.dk.DkCrypto.dr(DkCrypto.java:484)
> at sun.security.krb5.internal.crypto.dk.DkCrypto.dk(DkCrypto.java:447)
> at 
> sun.security.krb5.internal.crypto.dk.DkCrypto.calculateChecksum(DkCrypto.java:413)
> at 
> sun.security.krb5.internal.crypto.Des3.calculateChecksum(Des3.java:59)
> at 
> sun.security.jgss.krb5.CipherHelper.calculateChecksum(CipherHelper.java:231)
> at 
> sun.security.jgss.krb5.MessageToken.getChecksum(MessageToken.java:466)
> at 
> sun.security.jgss.krb5.MessageToken.verifySignAndSeqNumber(MessageToken.java:374)
> at 
> sun.security.jgss.krb5.WrapToken.getDataFromBuffer(WrapToken.java:284)
> at sun.security.jgss.krb5.WrapToken.getData(WrapToken.java:209)
> at sun.security.jgss.krb5.WrapToken.getData(WrapToken.java:182)
> at sun.security.jgss.krb5.Krb5Context.unwrap(Krb5Context.java:1053)
> at sun.security.jgss.GSSContextImpl.unwrap(GSSContextImpl.java:403)
> at 
> com.sun.security.sasl.gsskerb.GssKrb5Base.unwrap(GssKrb5Base.java:77)
> at 
> org.apache.hadoop.security.SaslRpcClient$WrappedInputStream.readNextRpcPacket(SaslRpcClient.java:617)
> at 
> org.apache.hadoop.security.SaslRpcClient$WrappedInputStream.read(SaslRpcClient.java:583)
> - locked <0x83444878> (a java.nio.HeapByteBuffer)
> at java.io.FilterInputStream.read(FilterInputStream.java:133)
> at 
> org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(Client.java:553)
> at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
> at java.io.BufferedInputStream.read(BufferedInputStream.java:265)
> - locked <0x834448c0> (a java.io.BufferedInputStream)
> at java.io.DataInputStream.readInt(DataInputStream.java:387)
> at 
> org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:1113)
> at org.apache.hadoop.ipc.Client$Connection.run(Client.java:1006)
> {noformat}
> and at the end of jstack:
> {noformat}
> Found one Java-level deadlock:
> =
> "IPC Parameter Sending Thread #29":
>   waiting to lock monitor 0x17ff49f8 (object 0x80277040, a 
> sun.security.provider.Sun),
>   which is held by UNKNOWN_owner_addr=0x50607000
> Java stack information for the threads listed above:
> ===
> "IPC Parameter Sending Thread #29":
> at java.security.Provider.getService(Provider.java:1035)
> - waiting to lock <0x80277040> (a sun.security.provider.Sun)
> at 
> sun.security.jca.ProviderList$ServiceList.tryGet(ProviderList.java:437)
> at 
> sun.security.jca.ProviderList$ServiceList.access$200(ProviderList.java:376)
> at 
> sun.security.jca.ProviderList$ServiceList$1.hasNext(ProviderList.java:486)
> at javax.crypto.SecretKeyFactory.nextSpi(SecretKeyFactory.java:293)
> - locked <0x834386b8> (a java.lang.Object)
> at javax.crypto.SecretKeyFactory.<init>(SecretKeyFactory.java:121)
> at 
> javax.crypto.SecretKeyFactory.getInstance(SecretKeyFactory.java:160)
> at 
> sun.security.krb5.internal.crypto.dk.Des3DkCrypto.getCipher(Des3DkCrypto.java:187)
> at sun.security.krb5.interna
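The "Found one Java-level deadlock" summary quoted above comes from jstack. A similar check can be approximated from inside the JVM with the standard `java.lang.management.ThreadMXBean` API. The following is only a minimal sketch: the class name `DeadlockProbe` and the method `describeDeadlocks` are hypothetical, and the report does not establish whether this API would flag the case here, where the monitor owner is shown as `UNKNOWN_owner_addr`.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical helper, not part of Hadoop or the JDK.
class DeadlockProbe {

    /** Returns one line per monitor-deadlocked thread, or "" if none are found. */
    static String describeDeadlocks() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        // Returns null when no threads are deadlocked on object monitors.
        long[] ids = mx.findMonitorDeadlockedThreads();
        if (ids == null) {
            return "";
        }
        StringBuilder sb = new StringBuilder();
        for (ThreadInfo info : mx.getThreadInfo(ids, Integer.MAX_VALUE)) {
            if (info == null) {
                continue; // the thread exited between the two calls
            }
            sb.append('"').append(info.getThreadName()).append('"')
              .append(" waiting to lock ").append(info.getLockName())
              .append(", held by ").append(info.getLockOwnerName())
              .append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String report = describeDeadlocks();
        System.out.println(report.isEmpty() ? "No monitor deadlock found" : report);
    }
}
```

A long-running client could call such a probe periodically and log the result, which may help when attaching jstack to the process is not practical.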

[jira] [Updated] (HADOOP-15359) IPC client hang in kerberized cluster due to JDK deadlock

2018-04-03 Thread Xiao Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Chen updated HADOOP-15359:
---
Description: 
In recent internal testing, we found a DFS client hang. Further 
inspection of the jstack output shows the following:

{noformat}
"IPC Client (552936351) connection to HOSTNAME:8020 from PRINCIPAL" #7468 daemon 
prio=5 os_prio=0 tid=0x7f6bb306c000 nid=0x1c76e waiting for monitor entry 
[0x7f6bc2bd6000]
   java.lang.Thread.State: BLOCKED (on object monitor)
at java.security.Provider.getService(Provider.java:1035)
- waiting to lock <0x80277040> (a sun.security.provider.Sun)
at 
sun.security.jca.ProviderList$ServiceList.tryGet(ProviderList.java:444)
at 
sun.security.jca.ProviderList$ServiceList.access$200(ProviderList.java:376)
at 
sun.security.jca.ProviderList$ServiceList$1.hasNext(ProviderList.java:486)
at javax.crypto.Cipher.getInstance(Cipher.java:513)
at 
sun.security.krb5.internal.crypto.dk.Des3DkCrypto.getCipher(Des3DkCrypto.java:202)
at sun.security.krb5.internal.crypto.dk.DkCrypto.dr(DkCrypto.java:484)
at sun.security.krb5.internal.crypto.dk.DkCrypto.dk(DkCrypto.java:447)
at 
sun.security.krb5.internal.crypto.dk.DkCrypto.calculateChecksum(DkCrypto.java:413)
at 
sun.security.krb5.internal.crypto.Des3.calculateChecksum(Des3.java:59)
at 
sun.security.jgss.krb5.CipherHelper.calculateChecksum(CipherHelper.java:231)
at 
sun.security.jgss.krb5.MessageToken.getChecksum(MessageToken.java:466)
at 
sun.security.jgss.krb5.MessageToken.verifySignAndSeqNumber(MessageToken.java:374)
at 
sun.security.jgss.krb5.WrapToken.getDataFromBuffer(WrapToken.java:284)
at sun.security.jgss.krb5.WrapToken.getData(WrapToken.java:209)
at sun.security.jgss.krb5.WrapToken.getData(WrapToken.java:182)
at sun.security.jgss.krb5.Krb5Context.unwrap(Krb5Context.java:1053)
at sun.security.jgss.GSSContextImpl.unwrap(GSSContextImpl.java:403)
at com.sun.security.sasl.gsskerb.GssKrb5Base.unwrap(GssKrb5Base.java:77)
at 
org.apache.hadoop.security.SaslRpcClient$WrappedInputStream.readNextRpcPacket(SaslRpcClient.java:617)
at 
org.apache.hadoop.security.SaslRpcClient$WrappedInputStream.read(SaslRpcClient.java:583)
- locked <0x83444878> (a java.nio.HeapByteBuffer)
at java.io.FilterInputStream.read(FilterInputStream.java:133)
at 
org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(Client.java:553)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
at java.io.BufferedInputStream.read(BufferedInputStream.java:265)
- locked <0x834448c0> (a java.io.BufferedInputStream)
at java.io.DataInputStream.readInt(DataInputStream.java:387)
at 
org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:1113)
at org.apache.hadoop.ipc.Client$Connection.run(Client.java:1006)
{noformat}

and at the end of jstack:
{noformat}
Found one Java-level deadlock:
=
"IPC Parameter Sending Thread #29":
  waiting to lock monitor 0x17ff49f8 (object 0x80277040, a 
sun.security.provider.Sun),
  which is held by UNKNOWN_owner_addr=0x50607000

Java stack information for the threads listed above:
===
"IPC Parameter Sending Thread #29":
at java.security.Provider.getService(Provider.java:1035)
- waiting to lock <0x80277040> (a sun.security.provider.Sun)
at 
sun.security.jca.ProviderList$ServiceList.tryGet(ProviderList.java:437)
at 
sun.security.jca.ProviderList$ServiceList.access$200(ProviderList.java:376)
at 
sun.security.jca.ProviderList$ServiceList$1.hasNext(ProviderList.java:486)
at javax.crypto.SecretKeyFactory.nextSpi(SecretKeyFactory.java:293)
- locked <0x834386b8> (a java.lang.Object)
at javax.crypto.SecretKeyFactory.<init>(SecretKeyFactory.java:121)
at javax.crypto.SecretKeyFactory.getInstance(SecretKeyFactory.java:160)
at 
sun.security.krb5.internal.crypto.dk.Des3DkCrypto.getCipher(Des3DkCrypto.java:187)
at sun.security.krb5.internal.crypto.dk.DkCrypto.dr(DkCrypto.java:484)
at sun.security.krb5.internal.crypto.dk.DkCrypto.dk(DkCrypto.java:447)
at 
sun.security.krb5.internal.crypto.dk.DkCrypto.calculateChecksum(DkCrypto.java:413)
at 
sun.security.krb5.internal.crypto.Des3.calculateChecksum(Des3.java:59)
at 
sun.security.jgss.krb5.CipherHelper.calculateChecksum(CipherHelper.java:231)
at 
sun.security.jgss.krb5.MessageToken.getChecksum(MessageToken.java:466)
at 
sun.security.jgss.krb5.MessageToken.genSignAndSeqNumber(MessageToken.java:315)
at sun.security.jgss.krb5.WrapToken.<init>(WrapToken.java:422)
at sun.security.jgs
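Both stacks above reach `java.security.Provider.getService` through JCA lookups made from `Des3DkCrypto.getCipher`: one via `Cipher.getInstance`, the other via `SecretKeyFactory.getInstance`. The sketch below issues the same kind of lookups from two threads to show where that contention originates. It is an illustration only and is not expected to reproduce the hang. The class name `ProviderLookupSketch`, the thread names, and the algorithm strings `"DESede/CBC/NoPadding"` and `"DESede"` are assumptions, not taken from the report.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;
import javax.crypto.Cipher;
import javax.crypto.SecretKeyFactory;

// Hypothetical illustration, not part of Hadoop or the JDK.
class ProviderLookupSketch {

    /** Runs the two JCA lookups on separate threads; returns how many completed. */
    static int runLookups() {
        AtomicInteger completed = new AtomicInteger();
        CountDownLatch start = new CountDownLatch(1);

        // Mirrors the receiving side: a Cipher lookup (algorithm string assumed).
        Thread reader = new Thread(() -> {
            try {
                start.await();
                Cipher.getInstance("DESede/CBC/NoPadding");
                completed.incrementAndGet();
            } catch (Exception e) {
                e.printStackTrace();
            }
        }, "reader-sketch");

        // Mirrors the sending side: a SecretKeyFactory lookup (algorithm assumed).
        Thread sender = new Thread(() -> {
            try {
                start.await();
                SecretKeyFactory.getInstance("DESede");
                completed.incrementAndGet();
            } catch (Exception e) {
                e.printStackTrace();
            }
        }, "sender-sketch");

        reader.start();
        sender.start();
        start.countDown(); // release both threads at roughly the same time
        try {
            reader.join(10_000);
            sender.join(10_000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return completed.get();
    }

    public static void main(String[] args) {
        System.out.println("lookups completed: " + runLookups());
    }
}
```

In a healthy JVM both lookups return quickly. In the reported hang, both threads stay blocked on the monitor of the `sun.security.provider.Sun` provider object.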

[jira] [Updated] (HADOOP-15359) IPC client hang in kerberized cluster due to JDK deadlock

2018-04-03 Thread Xiao Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Chen updated HADOOP-15359:
---
Attachment: 2.jstack
1.jstack

> IPC client hang in kerberized cluster due to JDK deadlock
> -
>
> Key: HADOOP-15359
> URL: https://issues.apache.org/jira/browse/HADOOP-15359
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: ipc
>Affects Versions: 2.6.0, 2.8.0, 3.0.0
>Reporter: Xiao Chen
>Priority: Critical
> Attachments: 1.jstack, 2.jstack
>
>

[jira] [Updated] (HADOOP-15359) IPC client hang in kerberized cluster due to JDK deadlock

2018-04-03 Thread Xiao Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Chen updated HADOOP-15359:
---
Affects Version/s: 2.6.0

> IPC client hang in kerberized cluster due to JDK deadlock
> -
>
> Key: HADOOP-15359
> URL: https://issues.apache.org/jira/browse/HADOOP-15359
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: ipc
>Affects Versions: 2.6.0, 2.8.0, 3.0.0
>Reporter: Xiao Chen
>Priority: Critical
>

[jira] [Updated] (HADOOP-15359) IPC client hang in kerberized cluster due to JDK deadlock

2018-04-03 Thread Xiao Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Chen updated HADOOP-15359:
---
Priority: Critical  (was: Major)

> IPC client hang in kerberized cluster due to JDK deadlock
> -
>
> Key: HADOOP-15359
> URL: https://issues.apache.org/jira/browse/HADOOP-15359
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: ipc
>Affects Versions: 2.6.0, 2.8.0, 3.0.0
>Reporter: Xiao Chen
>Priority: Critical
>

[jira] [Updated] (HADOOP-15359) IPC client hang in kerberized cluster due to JDK deadlock

2018-04-03 Thread Xiao Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Chen updated HADOOP-15359:
---
Summary: IPC client hang in kerberized cluster due to JDK deadlock  (was: 
IPC client could run into JDK deadlock)

> IPC client hang in kerberized cluster due to JDK deadlock
> -
>
> Key: HADOOP-15359
> URL: https://issues.apache.org/jira/browse/HADOOP-15359
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: ipc
>Affects Versions: 2.8.0, 3.0.0
>Reporter: Xiao Chen
>Priority: Major
>