[jira] [Updated] (SPARK-24493) Kerberos Ticket Renewal is failing in long running Spark job

2018-06-08 Thread Saisai Shao (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao updated SPARK-24493:

Component/s: Spark Core

> Kerberos Ticket Renewal is failing in long running Spark job
> 
>
> Key: SPARK-24493
> URL: https://issues.apache.org/jira/browse/SPARK-24493
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, YARN
>Affects Versions: 2.3.0
>Reporter: Asif M
>Priority: Major
>
> Kerberos Ticket Renewal is failing on long running spark job. I have added 
> below 2 kerberos properties in the HDFS configuration and ran a spark 
> streaming job 
> ([hdfs_wordcount.py|https://github.com/apache/spark/blob/master/examples/src/main/python/streaming/hdfs_wordcount.py])
> {noformat}
> dfs.namenode.delegation.token.max-lifetime=180 (30min)
> dfs.namenode.delegation.token.renew-interval=90 (15min)
> {noformat}
>  
> Spark Job failed at 15min with below error:
> {noformat}
> 18/06/04 18:56:51 INFO DAGScheduler: ShuffleMapStage 10896 (call at 
> /usr/hdp/current/spark2-client/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py:2381)
>  failed in 0.218 s due to Job aborted due to stage failure: Task 0 in stage 
> 10896.0 failed 4 times, most recent failure: Lost task 0.3 in stage 10896.0 
> (TID 7290, , executor 1): 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
>  token (token for abcd: HDFS_DELEGATION_TOKEN owner=a...@example.com, 
> renewer=yarn, realUser=, issueDate=1528136773875, maxDate=1528138573875, 
> sequenceNumber=38, masterKeyId=6) is expired, current time: 2018-06-04 
> 18:56:51,276+ expected renewal time: 2018-06-04 18:56:13,875+
> at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1499)
> at org.apache.hadoop.ipc.Client.call(Client.java:1445)
> at org.apache.hadoop.ipc.Client.call(Client.java:1355)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
> at com.sun.proxy.$Proxy18.getBlockLocations(Unknown Source)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getBlockLocations(ClientNamenodeProtocolTranslatorPB.java:317)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
> at com.sun.proxy.$Proxy19.getBlockLocations(Unknown Source)
> at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:856)
> at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:845)
> at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:834)
> at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:998)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:326)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:322)
> at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:334)
> at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:950)
> at 
> org.apache.hadoop.mapreduce.lib.input.LineRecordReader.initialize(LineRecordReader.java:86)
> at 
> org.apache.spark.rdd.NewHadoopRDD$$anon$1.liftedTree1$1(NewHadoopRDD.scala:189)
> at org.apache.spark.rdd.NewHadoopRDD$$anon$1.(NewHadoopRDD.scala:186)
> at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:141)
> at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:70)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
> at org.apache.spark.rdd.UnionRDD.compute(UnionRDD.scala:105)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
> at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
> at 

[jira] [Updated] (SPARK-24493) Kerberos Ticket Renewal is failing in long running Spark job

2018-06-08 Thread Saisai Shao (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao updated SPARK-24493:

Component/s: (was: Spark Core)
 YARN

> Kerberos Ticket Renewal is failing in long running Spark job
> 
>
> Key: SPARK-24493
> URL: https://issues.apache.org/jira/browse/SPARK-24493
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 2.3.0
>Reporter: Asif M
>Priority: Major
>
> Kerberos Ticket Renewal is failing on long running spark job. I have added 
> below 2 kerberos properties in the HDFS configuration and ran a spark 
> streaming job 
> ([hdfs_wordcount.py|https://github.com/apache/spark/blob/master/examples/src/main/python/streaming/hdfs_wordcount.py])
> {noformat}
> dfs.namenode.delegation.token.max-lifetime=180 (30min)
> dfs.namenode.delegation.token.renew-interval=90 (15min)
> {noformat}
>  
> Spark Job failed at 15min with below error:
> {noformat}
> 18/06/04 18:56:51 INFO DAGScheduler: ShuffleMapStage 10896 (call at 
> /usr/hdp/current/spark2-client/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py:2381)
>  failed in 0.218 s due to Job aborted due to stage failure: Task 0 in stage 
> 10896.0 failed 4 times, most recent failure: Lost task 0.3 in stage 10896.0 
> (TID 7290, , executor 1): 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
>  token (token for abcd: HDFS_DELEGATION_TOKEN owner=a...@example.com, 
> renewer=yarn, realUser=, issueDate=1528136773875, maxDate=1528138573875, 
> sequenceNumber=38, masterKeyId=6) is expired, current time: 2018-06-04 
> 18:56:51,276+ expected renewal time: 2018-06-04 18:56:13,875+
> at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1499)
> at org.apache.hadoop.ipc.Client.call(Client.java:1445)
> at org.apache.hadoop.ipc.Client.call(Client.java:1355)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
> at com.sun.proxy.$Proxy18.getBlockLocations(Unknown Source)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getBlockLocations(ClientNamenodeProtocolTranslatorPB.java:317)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
> at com.sun.proxy.$Proxy19.getBlockLocations(Unknown Source)
> at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:856)
> at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:845)
> at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:834)
> at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:998)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:326)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:322)
> at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:334)
> at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:950)
> at 
> org.apache.hadoop.mapreduce.lib.input.LineRecordReader.initialize(LineRecordReader.java:86)
> at 
> org.apache.spark.rdd.NewHadoopRDD$$anon$1.liftedTree1$1(NewHadoopRDD.scala:189)
> at org.apache.spark.rdd.NewHadoopRDD$$anon$1.(NewHadoopRDD.scala:186)
> at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:141)
> at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:70)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
> at org.apache.spark.rdd.UnionRDD.compute(UnionRDD.scala:105)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
> at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)

[jira] [Updated] (SPARK-24493) Kerberos Ticket Renewal is failing in long running Spark job

2018-06-08 Thread Saisai Shao (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao updated SPARK-24493:

Priority: Major  (was: Blocker)

> Kerberos Ticket Renewal is failing in long running Spark job
> 
>
> Key: SPARK-24493
> URL: https://issues.apache.org/jira/browse/SPARK-24493
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 2.3.0
>Reporter: Asif M
>Priority: Major
>
> Kerberos Ticket Renewal is failing on long running spark job. I have added 
> below 2 kerberos properties in the HDFS configuration and ran a spark 
> streaming job 
> ([hdfs_wordcount.py|https://github.com/apache/spark/blob/master/examples/src/main/python/streaming/hdfs_wordcount.py])
> {noformat}
> dfs.namenode.delegation.token.max-lifetime=180 (30min)
> dfs.namenode.delegation.token.renew-interval=90 (15min)
> {noformat}
>  
> Spark Job failed at 15min with below error:
> {noformat}
> 18/06/04 18:56:51 INFO DAGScheduler: ShuffleMapStage 10896 (call at 
> /usr/hdp/current/spark2-client/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py:2381)
>  failed in 0.218 s due to Job aborted due to stage failure: Task 0 in stage 
> 10896.0 failed 4 times, most recent failure: Lost task 0.3 in stage 10896.0 
> (TID 7290, , executor 1): 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
>  token (token for abcd: HDFS_DELEGATION_TOKEN owner=a...@example.com, 
> renewer=yarn, realUser=, issueDate=1528136773875, maxDate=1528138573875, 
> sequenceNumber=38, masterKeyId=6) is expired, current time: 2018-06-04 
> 18:56:51,276+ expected renewal time: 2018-06-04 18:56:13,875+
> at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1499)
> at org.apache.hadoop.ipc.Client.call(Client.java:1445)
> at org.apache.hadoop.ipc.Client.call(Client.java:1355)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
> at com.sun.proxy.$Proxy18.getBlockLocations(Unknown Source)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getBlockLocations(ClientNamenodeProtocolTranslatorPB.java:317)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
> at com.sun.proxy.$Proxy19.getBlockLocations(Unknown Source)
> at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:856)
> at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:845)
> at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:834)
> at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:998)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:326)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:322)
> at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:334)
> at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:950)
> at 
> org.apache.hadoop.mapreduce.lib.input.LineRecordReader.initialize(LineRecordReader.java:86)
> at 
> org.apache.spark.rdd.NewHadoopRDD$$anon$1.liftedTree1$1(NewHadoopRDD.scala:189)
> at org.apache.spark.rdd.NewHadoopRDD$$anon$1.(NewHadoopRDD.scala:186)
> at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:141)
> at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:70)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
> at org.apache.spark.rdd.UnionRDD.compute(UnionRDD.scala:105)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
> at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
> at