[jira] [Commented] (SPARK-21827) Task fail due to executor exception when enable Sasl Encryption

2017-12-05 Thread Yishan Jiang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16279779#comment-16279779
 ] 

Yishan Jiang commented on SPARK-21827:
--

Yes, I am using HDFS.
Cores per executor is mostly the default; I tried other values like 2, 3, etc., 
and hit the same issue.

> Task fail due to executor exception when enable Sasl Encryption
> ---
>
> Key: SPARK-21827
> URL: https://issues.apache.org/jira/browse/SPARK-21827
> Project: Spark
>  Issue Type: Bug
>  Components: Shuffle, Spark Core
>Affects Versions: 1.6.1, 2.1.1, 2.2.0
> Environment: OS: RedHat 7.1 64bit
>Reporter: Yishan Jiang
>
> We hit this with authentication and SASL encryption enabled on many 
> versions; the 1.6.1 configuration, for example, is:
> spark.local.dir /tmp/test-161
> spark.shuffle.service.enabled true
> *spark.authenticate true*
> *spark.authenticate.enableSaslEncryption true*
> *spark.network.sasl.serverAlwaysEncrypt true*
> spark.authenticate.secret e25d4369-bec3-4266-8fc5-fb6d4fcee66f
> spark.history.ui.port 18089
> spark.shuffle.service.port 7347
> spark.master.rest.port 6076
> spark.deploy.recoveryMode NONE
> spark.ssl.enabled true
> spark.executor.extraJavaOptions -Djava.security.egd=file:/dev/./urandom
> We run a Spark example and the task fails with these exception messages:
> 17/08/22 03:56:52 INFO BlockManager: external shuffle service port = 7347
> 17/08/22 03:56:52 INFO BlockManagerMaster: Trying to register BlockManager
> 17/08/22 03:56:52 INFO sasl: DIGEST41:Unmatched MACs
> 17/08/22 03:56:52 WARN TransportChannelHandler: Exception in connection from 
> cws57n6.ma.platformlab.ibm.com/172.29.8.66:49394
> java.lang.IllegalArgumentException: Frame length should be positive: 
> -5594407078713290673   
> at 
> org.spark-project.guava.base.Preconditions.checkArgument(Preconditions.java:119)
> at 
> org.apache.spark.network.util.TransportFrameDecoder.decodeNext(TransportFrameDecoder.java:135)
> at 
> org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:82)
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
> at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
> at 
> io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:846)
> at 
> io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
> at 
> io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
> at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
> at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
> at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
> at 
> io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
> at java.lang.Thread.run(Thread.java:785)
> 17/08/22 03:56:52 ERROR TransportResponseHandler: Still have 1 requests 
> outstanding when connection from 
> cws57n6.ma.platformlab.ibm.com/172.29.8.66:49394 is closed
> 17/08/22 03:56:52 WARN NettyRpcEndpointRef: Error sending message [message = 
> RegisterBlockManager(BlockManagerId(fe9d31da-f70c-40a2-9032-05a5af4ba4c5, 
> cws58n1.ma.platformlab.ibm.com, 45852),2985295872,NettyRpcEndpointRef(null))] in 1 attempts
> java.lang.IllegalArgumentException: Frame length should be positive: 
> -5594407078713290673
> at 
> org.spark-project.guava.base.Preconditions.checkArgument(Preconditions.java:119)
> at 
> org.apache.spark.network.util.TransportFrameDecoder.decodeNext(TransportFrameDecoder.java:135)
> at 
> org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:82)
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
> at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
> at 
> io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:846)
> at 
> io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
> at 
> io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
> at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
> at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
> at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
> at 
> io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
> at 

[jira] [Commented] (SPARK-21495) DIGEST-MD5: Out of order sequencing of messages from server

2017-08-24 Thread Yishan Jiang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16139834#comment-16139834
 ] 

Yishan Jiang commented on SPARK-21495:
--

Hi Sean,

I saw you resolved this issue as "Not a Problem"; does this mean Spark only 
supports a simple secret like "aaa"?

> DIGEST-MD5: Out of order sequencing of messages from server
> ---
>
> Key: SPARK-21495
> URL: https://issues.apache.org/jira/browse/SPARK-21495
> Project: Spark
>  Issue Type: Bug
>  Components: Shuffle, Spark Core
>Affects Versions: 1.6.1
> Environment: OS: RedHat 7.1 64bit
> Spark: 1.6.1
>Reporter: Xin Yu Pan
>
> We hit an issue when enabling authentication and SASL encryption; see the 
> bold entries in the following parameter list.
> spark.local.dir /tmp/xpan-spark-161
> spark.eventLog.dir file:///home/xpan/spark-conf/event
> spark.eventLog.enabled true
> spark.history.fs.logDirectory file:/home/xpan/spark-conf/event
> spark.history.ui.port 18085
> spark.history.fs.cleaner.enabled true
> spark.history.fs.cleaner.interval 1d
> spark.history.fs.cleaner.maxAge 14d
> spark.dynamicAllocation.enabled false
> spark.shuffle.service.enabled false
> spark.shuffle.service.port 7448
> spark.shuffle.reduceLocality.enabled false
> spark.master.port 7087
> spark.master.rest.port 6077
> spark.executor.extraJavaOptions -Djava.security.egd=file:/dev/./urandom
> *spark.authenticate true
> spark.authenticate.secret 5828d44b-f9b9-4033-b1f5-21d1e3273ec2
> spark.authenticate.enableSaslEncryption false
> spark.network.sasl.serverAlwaysEncrypt false*
> We run the simple SparkPi example and there are exception messages even 
> though the application completes.
> # cat 
> spark-1.6.1-bin-hadoop2.6/logs/spark-xpan-org.apache.spark.deploy.ExternalShuffleService-1-cws-75.out.1
> ... ...
> 17/07/20 02:57:30 INFO spark.SecurityManager: SecurityManager: authentication 
> enabled; ui acls disabled; users with view permissions: Set(xpan); users with 
> modify permissions: Set(xpan)
> 17/07/20 02:57:31 INFO deploy.ExternalShuffleService: Starting shuffle 
> service on port 7448 with useSasl = true
> 17/07/20 02:58:04 INFO shuffle.ExternalShuffleBlockResolver: Registered 
> executor AppExecId{appId=app-20170720025800-, execId=0} with 
> ExecutorShuffleInfo{localDirs=[/tmp/xpan-spark-161/spark-8e4885a3-c463-4dfb-a396-04e16b65fd1e/executor-be15fcd0-c946-4c83-ba25-3b20bbce5b0e/blockmgr-0fd2658a-ce15-4d56-901c-4c746161bbe0],
>  subDirsPerLocalDir=64, 
> shuffleManager=org.apache.spark.shuffle.sort.SortShuffleManager}
> 17/07/20 02:58:11 INFO security.sasl: DIGEST41:Unmatched MACs
> 17/07/20 02:58:11 WARN server.TransportChannelHandler: Exception in 
> connection from /172.29.10.77:50616
> io.netty.handler.codec.DecoderException: javax.security.sasl.SaslException: 
> DIGEST-MD5: Out of order sequencing of messages from server. Got: 125 
> Expected: 123
>   at 
> io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:99)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
>   at 
> org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:86)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
>   at 
> io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:846)
>   at 
> io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
>   at 
> io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
>   at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
>   at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
>   at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
>   at 
> io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
>   at java.lang.Thread.run(Thread.java:785)
> Caused by: javax.security.sasl.SaslException: DIGEST-MD5: Out of order 
> sequencing of messages from server. Got: 125 Expected: 123
>   at 
> com.ibm.security.sasl.digest.DigestMD5Base$DigestPrivacy.unwrap(DigestMD5Base.java:1535)
>   at 
> com.ibm.security.sasl.digest.DigestMD5Base.unwrap(DigestMD5Base.java:231)
>   at 
> org.apache.spark.network.sasl.SparkSaslServer.unwrap(SparkSaslServer.java:149)
>   at 
> org.apache.spark.network.sasl.SaslEncryption$DecryptionHandler.decode(SaslEncryption.java:127)
>   
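
The DIGEST-MD5 error above is the privacy layer's sequence-number check firing: each wrapped message carries a monotonically increasing sequence number, and the receiver rejects anything out of order. A minimal Python sketch of that pattern (hypothetical; the real check lives in the JDK's DigestMD5Base, not in Spark):

```python
class SaslPrivacyLayer:
    """Hypothetical sketch of DIGEST-MD5 privacy-layer sequencing."""

    def __init__(self):
        self.expected = 0  # next sequence number this side will accept

    def unwrap(self, seq: int, payload: bytes) -> bytes:
        # A skipped, replayed, or corrupted message shows up as a sequence
        # mismatch, surfacing as "Out of order sequencing of messages".
        if seq != self.expected:
            raise ValueError(
                f"Out of order sequencing of messages from server. "
                f"Got: {seq} Expected: {self.expected}")
        self.expected += 1
        return payload
```

If the two sides ever disagree about how many messages were wrapped, the counters drift apart and stay apart, which matches the fixed "Got: 125 Expected: 123" gap in the log.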

[jira] [Updated] (SPARK-21827) Task fail due to executor exception when enable Sasl Encryption

2017-08-24 Thread Yishan Jiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yishan Jiang updated SPARK-21827:
-
Description: 
We hit this with authentication and SASL encryption enabled on many 
versions; the 1.6.1 configuration, for example, is:

spark.local.dir /tmp/test-161
spark.shuffle.service.enabled true
*spark.authenticate true*
*spark.authenticate.enableSaslEncryption true*
*spark.network.sasl.serverAlwaysEncrypt true*
spark.authenticate.secret e25d4369-bec3-4266-8fc5-fb6d4fcee66f
spark.history.ui.port 18089
spark.shuffle.service.port 7347
spark.master.rest.port 6076
spark.deploy.recoveryMode NONE
spark.ssl.enabled true
spark.executor.extraJavaOptions -Djava.security.egd=file:/dev/./urandom

We run a Spark example and the task fails with these exception messages:

17/08/22 03:56:52 INFO BlockManager: external shuffle service port = 7347
17/08/22 03:56:52 INFO BlockManagerMaster: Trying to register BlockManager
17/08/22 03:56:52 INFO sasl: DIGEST41:Unmatched MACs
17/08/22 03:56:52 WARN TransportChannelHandler: Exception in connection from 
cws57n6.ma.platformlab.ibm.com/172.29.8.66:49394
java.lang.IllegalArgumentException: Frame length should be positive: 
-5594407078713290673   
at 
org.spark-project.guava.base.Preconditions.checkArgument(Preconditions.java:119)
at 
org.apache.spark.network.util.TransportFrameDecoder.decodeNext(TransportFrameDecoder.java:135)
at 
org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:82)
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
at 
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
at 
io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:846)
at 
io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
at 
io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
at 
io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
at 
io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
at 
io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
at java.lang.Thread.run(Thread.java:785)
17/08/22 03:56:52 ERROR TransportResponseHandler: Still have 1 requests 
outstanding when connection from 
cws57n6.ma.platformlab.ibm.com/172.29.8.66:49394 is closed
17/08/22 03:56:52 WARN NettyRpcEndpointRef: Error sending message [message = 
RegisterBlockManager(BlockManagerId(fe9d31da-f70c-40a2-9032-05a5af4ba4c5, 
cws58n1.ma.platformlab.ibm.com, 45852),2985295872,NettyRpcEndpointRef(null))] in 1 attempts
java.lang.IllegalArgumentException: Frame length should be positive: 
-5594407078713290673
at 
org.spark-project.guava.base.Preconditions.checkArgument(Preconditions.java:119)
at 
org.apache.spark.network.util.TransportFrameDecoder.decodeNext(TransportFrameDecoder.java:135)
at 
org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:82)
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
at 
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
at 
io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:846)
at 
io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
at 
io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
at 
io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
at 
io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
at 
io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
at java.lang.Thread.run(Thread.java:785)
17/08/22 03:56:55 ERROR TransportClient: Failed to send RPC 9091046580632843491 
to cws57n6.ma.platformlab.ibm.com/172.29.8.66:49394: 
java.nio.channels.ClosedChannelException
java.nio.channels.ClosedChannelException
17/08/22 03:56:55 WARN NettyRpcEndpointRef: Error sending message [message = 
RegisterBlockManager(BlockManagerId(fe9d31da-f70c-40a2-9032-05a5af4ba4c5, 
cws58n1.ma.platformlab.ibm.com, 45852),2985295872,NettyRpcEndpointRef(null))] 
in 2 attempts
java.io.IOException: Failed to send RPC 9091046580632843491 to 
cws57n6.ma.platformlab.ibm.com/172.29.8.66:49394: 
java.nio.channels.ClosedChannelException
at 
org.apache.spark.network.client.TransportClient$3.operationComplete(TransportClient.java:239)
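
The huge negative value in "Frame length should be positive" is characteristic of one side reading SASL-encrypted bytes as if they were a plaintext frame header: the first 8 bytes of a transport frame are a big-endian signed long length, and ciphertext interpreted that way decodes to an arbitrary long, negative roughly half the time. A minimal sketch of that decoder check (an illustration of the pattern, not Spark's actual TransportFrameDecoder code):

```python
import struct

def read_frame_length(header: bytes) -> int:
    # First 8 bytes of a frame: big-endian signed long length, which a
    # length-prefixed decoder must reject when non-positive.
    (length,) = struct.unpack(">q", header[:8])
    if length <= 0:
        raise ValueError(f"Frame length should be positive: {length}")
    return length

# A plaintext header for a 16-byte frame decodes cleanly.
plain = struct.pack(">q", 16)

# Encrypted bytes misread as a header decode to garbage -- here the exact
# value from the logs above.
garbled = struct.pack(">q", -5594407078713290673)
```

This is consistent with a handshake mismatch (e.g. the "Unmatched MACs" line just before): once one peer sends encrypted frames that the other decodes as plaintext, the very first length field is garbage and the connection is torn down.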

[jira] [Created] (SPARK-21827) Task fail due to executor exception when enable Sasl Encryption

2017-08-24 Thread Yishan Jiang (JIRA)
Yishan Jiang created SPARK-21827:


 Summary: Task fail due to executor exception when enable Sasl 
Encryption
 Key: SPARK-21827
 URL: https://issues.apache.org/jira/browse/SPARK-21827
 Project: Spark
  Issue Type: Bug
  Components: Shuffle, Spark Core
Affects Versions: 2.2.0, 2.1.1, 1.6.1
 Environment: linux x86_64 


Reporter: Yishan Jiang


We hit this with authentication and SASL encryption enabled on many 
versions; the 1.6.1 configuration, for example, is:

spark.local.dir /tmp/test-161
spark.shuffle.service.enabled true
*spark.authenticate true*
*spark.authenticate.enableSaslEncryption true*
*spark.network.sasl.serverAlwaysEncrypt true*
spark.history.ui.port 18089
spark.shuffle.service.port 7347
spark.master.rest.port 6076
spark.deploy.recoveryMode NONE
spark.ssl.enabled true
spark.executor.extraJavaOptions -Djava.security.egd=file:/dev/./urandom


[jira] [Updated] (SPARK-21827) Task fail due to executor exception when enable Sasl Encryption

2017-08-24 Thread Yishan Jiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yishan Jiang updated SPARK-21827:
-
Environment: 
OS: RedHat 7.1 64bit



  was:
linux x86_64 




> Task fail due to executor exception when enable Sasl Encryption
> ---
>
> Key: SPARK-21827
> URL: https://issues.apache.org/jira/browse/SPARK-21827
> Project: Spark
>  Issue Type: Bug
>  Components: Shuffle, Spark Core
>Affects Versions: 1.6.1, 2.1.1, 2.2.0
> Environment: OS: RedHat 7.1 64bit
>Reporter: Yishan Jiang
>
> We hit this with authentication and SASL encryption enabled on many 
> versions; the 1.6.1 configuration, for example, is:
> spark.local.dir /tmp/test-161
> spark.shuffle.service.enabled true
> *spark.authenticate true*
> *spark.authenticate.enableSaslEncryption true*
> *spark.network.sasl.serverAlwaysEncrypt true*
> spark.history.ui.port 18089
> spark.shuffle.service.port 7347
> spark.master.rest.port 6076
> spark.deploy.recoveryMode NONE
> spark.ssl.enabled true
> spark.executor.extraJavaOptions -Djava.security.egd=file:/dev/./urandom

[jira] [Commented] (SPARK-21495) DIGEST-MD5: Out of order sequencing of messages from server

2017-08-24 Thread Yishan Jiang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16139813#comment-16139813
 ] 

Yishan Jiang commented on SPARK-21495:
--

I met the same issue. I tried changing spark.authenticate.secret to something 
as simple as "aaa" and it works well, so most likely the authentication cannot 
handle a complicated secret. Changing to a simple secret is a workaround if 
this blocks you.
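
One way to apply this workaround systematically is to screen candidate secrets before putting them in spark.authenticate.secret. A hypothetical helper (the exact failing character set is not confirmed by the report; only that a plain alphanumeric secret like "aaa" worked while UUID-style secrets did not):

```python
import re

def is_simple_secret(secret: str) -> bool:
    # Hypothetical guard: accept only non-empty plain alphanumeric secrets,
    # since UUID-style secrets with hyphens triggered the SASL failures above.
    return re.fullmatch(r"[A-Za-z0-9]+", secret) is not None
```

Under this rule, "aaa" passes while the reported secret "5828d44b-f9b9-4033-b1f5-21d1e3273ec2" is rejected because of its hyphens.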

> DIGEST-MD5: Out of order sequencing of messages from server
> ---
>
> Key: SPARK-21495
> URL: https://issues.apache.org/jira/browse/SPARK-21495
> Project: Spark
>  Issue Type: Bug
>  Components: Shuffle, Spark Core
>Affects Versions: 1.6.1
> Environment: OS: RedHat 7.1 64bit
> Spark: 1.6.1
>Reporter: Xin Yu Pan
>
> We hit an issue when enabling authentication and SASL encryption; see the 
> bold entries in the following parameter list.
> spark.local.dir /tmp/xpan-spark-161
> spark.eventLog.dir file:///home/xpan/spark-conf/event
> spark.eventLog.enabled true
> spark.history.fs.logDirectory file:/home/xpan/spark-conf/event
> spark.history.ui.port 18085
> spark.history.fs.cleaner.enabled true
> spark.history.fs.cleaner.interval 1d
> spark.history.fs.cleaner.maxAge 14d
> spark.dynamicAllocation.enabled false
> spark.shuffle.service.enabled false
> spark.shuffle.service.port 7448
> spark.shuffle.reduceLocality.enabled false
> spark.master.port 7087
> spark.master.rest.port 6077
> spark.executor.extraJavaOptions -Djava.security.egd=file:/dev/./urandom
> *spark.authenticate true
> spark.authenticate.secret 5828d44b-f9b9-4033-b1f5-21d1e3273ec2
> spark.authenticate.enableSaslEncryption false
> spark.network.sasl.serverAlwaysEncrypt false*
> We run the simple SparkPi example and there are exception messages even 
> though the application completes.
> # cat 
> spark-1.6.1-bin-hadoop2.6/logs/spark-xpan-org.apache.spark.deploy.ExternalShuffleService-1-cws-75.out.1
> ... ...
> 17/07/20 02:57:30 INFO spark.SecurityManager: SecurityManager: authentication 
> enabled; ui acls disabled; users with view permissions: Set(xpan); users with 
> modify permissions: Set(xpan)
> 17/07/20 02:57:31 INFO deploy.ExternalShuffleService: Starting shuffle 
> service on port 7448 with useSasl = true
> 17/07/20 02:58:04 INFO shuffle.ExternalShuffleBlockResolver: Registered 
> executor AppExecId{appId=app-20170720025800-, execId=0} with 
> ExecutorShuffleInfo{localDirs=[/tmp/xpan-spark-161/spark-8e4885a3-c463-4dfb-a396-04e16b65fd1e/executor-be15fcd0-c946-4c83-ba25-3b20bbce5b0e/blockmgr-0fd2658a-ce15-4d56-901c-4c746161bbe0],
>  subDirsPerLocalDir=64, 
> shuffleManager=org.apache.spark.shuffle.sort.SortShuffleManager}
> 17/07/20 02:58:11 INFO security.sasl: DIGEST41:Unmatched MACs
> 17/07/20 02:58:11 WARN server.TransportChannelHandler: Exception in 
> connection from /172.29.10.77:50616
> io.netty.handler.codec.DecoderException: javax.security.sasl.SaslException: 
> DIGEST-MD5: Out of order sequencing of messages from server. Got: 125 
> Expected: 123
>   at 
> io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:99)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
>   at 
> org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:86)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
>   at 
> io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:846)
>   at 
> io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
>   at 
> io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
>   at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
>   at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
>   at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
>   at 
> io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
>   at java.lang.Thread.run(Thread.java:785)
> Caused by: javax.security.sasl.SaslException: DIGEST-MD5: Out of order 
> sequencing of messages from server. Got: 125 Expected: 123
>   at 
> com.ibm.security.sasl.digest.DigestMD5Base$DigestPrivacy.unwrap(DigestMD5Base.java:1535)
>   at 
> com.ibm.security.sasl.digest.DigestMD5Base.unwrap(DigestMD5Base.java:231)
>   at 
>