[jira] [Commented] (SPARK-25869) Spark on YARN: the original diagnostics is missing when job failed maxAppAttempts times

2018-12-18 Thread ASF GitHub Bot (JIRA)


[ https://issues.apache.org/jira/browse/SPARK-25869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16724453#comment-16724453 ]

ASF GitHub Bot commented on SPARK-25869:


vanzin closed pull request #22876: [SPARK-25869] [YARN] the original 
diagnostics is missing when job failed ma…
URL: https://github.com/apache/spark/pull/22876
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

diff --git a/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala b/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala
index 8f94e3f731007..57f0a7f05b2e5 100644
--- a/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala
+++ b/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala
@@ -293,6 +293,9 @@ private[spark] class ApplicationMaster(args: ApplicationMasterArguments) extends
     }
 
     if (!unregistered) {
+      logInfo("Waiting for " + sparkConf.get("spark.yarn.report.interval", "1000").toInt + "ms to unregister am," +
+        " so the client can have the right diagnostics msg!")
+      Thread.sleep(sparkConf.get("spark.yarn.report.interval", "1000").toInt)
       // we only want to unregister if we don't want the RM to retry
       if (finalStatus == FinalApplicationStatus.SUCCEEDED || isLastAttempt) {
         unregister(finalStatus, finalMsg)


 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Spark on YARN: the original diagnostics is missing when job failed
> maxAppAttempts times
> ------------------------------------------------------------------
>
>                 Key: SPARK-25869
>                 URL: https://issues.apache.org/jira/browse/SPARK-25869
>             Project: Spark
>          Issue Type: Bug
>          Components: YARN
>    Affects Versions: 2.1.1
>            Reporter: Yeliang Cang
>            Priority: Major
>
> When configuring Spark on YARN, I submit a job with the command below:
> {code}
> spark-submit --class org.apache.spark.examples.SparkPi --master yarn \
>   --deploy-mode cluster --driver-memory 127m --driver-cores 1 \
>   --executor-memory 2048m --executor-cores 1 --num-executors 10 \
>   --queue root.mr --conf spark.testing.reservedMemory=1048576 \
>   --conf spark.yarn.executor.memoryOverhead=50 \
>   --conf spark.yarn.driver.memoryOverhead=50 \
>   /opt/ZDH/parcels/lib/spark/examples/jars/spark-examples* 1
> {code}
> Apparently, the driver memory is not enough, but this cannot be seen in the
> Spark client log:
> {code}
> 2018-10-29 19:28:34,658 INFO org.apache.spark.deploy.yarn.Client: Application report for application_1540536615315_0013 (state: ACCEPTED)
> 2018-10-29 19:28:35,660 INFO org.apache.spark.deploy.yarn.Client: Application report for application_1540536615315_0013 (state: RUNNING)
> 2018-10-29 19:28:35,660 INFO org.apache.spark.deploy.yarn.Client:
>  client token: N/A
>  diagnostics: N/A
>  ApplicationMaster host: 10.43.183.143
>  ApplicationMaster RPC port: 0
>  queue: root.mr
>  start time: 1540812501560
>  final status: UNDEFINED
>  tracking URL: http://zdh141:8088/proxy/application_1540536615315_0013/
>  user: mr
> 2018-10-29 19:28:36,663 INFO org.apache.spark.deploy.yarn.Client: Application report for application_1540536615315_0013 (state: FINISHED)
> 2018-10-29 19:28:36,663 INFO org.apache.spark.deploy.yarn.Client:
>  client token: N/A
>  diagnostics: Shutdown hook called before final status was reported.
>  ApplicationMaster host: 10.43.183.143
>  ApplicationMaster RPC port: 0
>  queue: root.mr
>  start time: 1540812501560
>  final status: FAILED
>  tracking URL: http://zdh141:8088/proxy/application_1540536615315_0013/
>  user: mr
> Exception in thread "main" org.apache.spark.SparkException: Application application_1540536615315_0013 finished with failed status
>  at org.apache.spark.deploy.yarn.Client.run(Client.scala:1137)
>  at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1183)
>  at org.apache.spark.deploy.yarn.Client.main(Client.scala)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498)
>  at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:775)
>  at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
>  at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
>  at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
>  at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> 2018-10-29 19:28:36,694 INFO org.apache.spark.util.ShutdownHookManager: Shutdown hook called
> 2018-10-29 19:28:36,695 INFO org.apache.spark.util.ShutdownHookManager: Deleting directory /tmp/spark-96077be5-0dfa-496d-a6a0-96e83393a8d9
> {code}
>
> Solution: after applying the patch, the Spark client log shows:
> {code}
> 2018-10-29 19:27:32,962 INFO org.apache.spark.deploy.yarn.Client: Application report for application_1540536615315_0012 (state: RUNNING)
> 2018-10-29 19:27:32,962 INFO org.apache.spark.deploy.yarn.Client:
>  client token: N/A
>  diagnostics: N/A
>  ApplicationMaster host: 10.43.183.143
>  ApplicationMaster RPC port: 0
>  queue: root.mr
>  start time: 1540812436656
>  final status: UNDEFINED
>  tracking URL: http://zdh141:8088/proxy/application_1540536615315_0012/
>  user: mr
> 2018-10-29 19:27:33,964 INFO org.apache.spark.deploy.yarn.Client: Application report for application_1540536615315_0012 (state: FAILED)
> 2018-10-29 19:27:33,964 INFO org.apache.spark.deploy.yarn.Client:
>  client token: N/A
>  diagnostics: Application application_1540536615315_0012 failed 2 times due to AM Container for appattempt_1540536615315_0012_02 exited with exitCode: -104
> For more detailed output, check application tracking page:http://zdh141:8088/cluster/app/application_1540536615315_0012Then, 

[jira] [Commented] (SPARK-25869) Spark on YARN: the original diagnostics is missing when job failed maxAppAttempts times

2018-12-10 Thread ASF GitHub Bot (JIRA)


[ https://issues.apache.org/jira/browse/SPARK-25869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16715709#comment-16715709 ]

ASF GitHub Bot commented on SPARK-25869:


vanzin commented on a change in pull request #22876: [SPARK-25869] [YARN] the 
original diagnostics is missing when job failed ma…
URL: https://github.com/apache/spark/pull/22876#discussion_r240412098
 
 

 ##
 File path: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala
 ##
 @@ -293,6 +293,9 @@ private[spark] class ApplicationMaster(args: ApplicationMasterArguments) extends
     }
 
     if (!unregistered) {
+      logInfo("Waiting for " + sparkConf.get("spark.yarn.report.interval", "1000").toInt + "ms to unregister am," +
 
 Review comment:
   This should also be a config constant. Instead of sleeping, it might be better to join `userClassThread` or `reporterThread`, since they may exit more quickly than the configured wait.
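
   For context, a minimal sketch of that suggestion (the object and method
   names are illustrative, not the actual ApplicationMaster members): read
   the wait once from a config constant, then join the threads that may exit
   sooner instead of sleeping unconditionally.

{code}
// Hypothetical sketch, not the merged patch: bound the wait by a configured
// timeout, but return as soon as the user class and reporter threads exit.
object UnregisterWaitSketch {
  def awaitBeforeUnregister(
      userClassThread: Thread,
      reporterThread: Thread,
      waitMs: Long): Unit = {
    val deadline = System.currentTimeMillis() + waitMs
    // Thread.join(timeout) returns when the thread dies or the timeout
    // elapses, so the AM can unregister earlier than a fixed sleep would
    // allow whenever the threads finish quickly.
    Seq(userClassThread, reporterThread).foreach { t =>
      val remaining = deadline - System.currentTimeMillis()
      if (remaining > 0 && t.isAlive) {
        t.join(remaining)
      }
    }
  }
}
{code}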


[jira] [Commented] (SPARK-25869) Spark on YARN: the original diagnostics is missing when job failed maxAppAttempts times

2018-12-10 Thread ASF GitHub Bot (JIRA)


[ https://issues.apache.org/jira/browse/SPARK-25869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16715714#comment-16715714 ]

ASF GitHub Bot commented on SPARK-25869:


vanzin commented on issue #22876: [SPARK-25869] [YARN] the original diagnostics 
is missing when job failed ma…
URL: https://github.com/apache/spark/pull/22876#issuecomment-446005351
 
 
   Also, PR title and summary should explain the fix, not the problem.



[jira] [Commented] (SPARK-25869) Spark on YARN: the original diagnostics is missing when job failed maxAppAttempts times

2018-10-29 Thread Apache Spark (JIRA)


[ https://issues.apache.org/jira/browse/SPARK-25869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16667101#comment-16667101 ]

Apache Spark commented on SPARK-25869:
--------------------------------------

User 'Cangyl' has created a pull request for this issue:
https://github.com/apache/spark/pull/22876


[jira] [Commented] (SPARK-25869) Spark on YARN: the original diagnostics is missing when job failed maxAppAttempts times

2018-10-29 Thread Yeliang Cang (JIRA)


[ https://issues.apache.org/jira/browse/SPARK-25869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16667072#comment-16667072 ]

Yeliang Cang commented on SPARK-25869:
--------------------------------------

The root cause is that when the ApplicationMaster unregisters itself, the
diagnostics message "Shutdown hook called before final status was reported."
is sent to the YARN ResourceManager before the AM container's
CONTAINER_FINISHED event arrives. So we can delay the unregister by the
"spark.yarn.report.interval" time, so that the client gets the original
diagnostics message carried by the CONTAINER_FINISHED event.
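
In code, the described ordering amounts to the sketch below (sparkConf,
logInfo, unregister, finalStatus, and finalMsg are the ApplicationMaster's
own, as in the PR diff above; only the sleep-then-unregister ordering is
the point):

{code}
// Sketch of the delayed unregister: give the RM time to receive the AM
// container's CONTAINER_FINISHED diagnostics before unregistering, so the
// client sees the container's exit reason rather than the generic
// "Shutdown hook called before final status was reported." message.
val reportIntervalMs = sparkConf.get("spark.yarn.report.interval", "1000").toInt
logInfo(s"Waiting ${reportIntervalMs}ms before unregistering the AM")
Thread.sleep(reportIntervalMs)
unregister(finalStatus, finalMsg)
{code}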
