[jira] [Assigned] (SPARK-25869) Spark on YARN: the original diagnostics is missing when job failed maxAppAttempts times

2019-02-12 Thread Marcelo Vanzin (JIRA)


[ https://issues.apache.org/jira/browse/SPARK-25869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marcelo Vanzin reassigned SPARK-25869:
--------------------------------------

Assignee: (was: Marcelo Vanzin)

> Spark on YARN: the original diagnostics is missing when job failed maxAppAttempts times
> ----------------------------------------------------------------------------------------
>
>              Key: SPARK-25869
>              URL: https://issues.apache.org/jira/browse/SPARK-25869
>          Project: Spark
>       Issue Type: Bug
>       Components: YARN
> Affects Versions: 2.1.1
>         Reporter: Yeliang Cang
>         Priority: Major
>
> When configuring Spark on YARN, I submit a job with the command below:
> {code}
> spark-submit --class org.apache.spark.examples.SparkPi \
>   --master yarn --deploy-mode cluster \
>   --driver-memory 127m --driver-cores 1 \
>   --executor-memory 2048m --executor-cores 1 --num-executors 10 \
>   --queue root.mr \
>   --conf spark.testing.reservedMemory=1048576 \
>   --conf spark.yarn.executor.memoryOverhead=50 \
>   --conf spark.yarn.driver.memoryOverhead=50 \
>   /opt/ZDH/parcels/lib/spark/examples/jars/spark-examples* 1
> {code}
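> As a back-of-the-envelope check (my own sketch, not from this issue's patch; the overhead default comes from the Spark on YARN docs), the AM container these settings request is tiny:
> {code}
> // Hypothetical sizing check, mirroring the driverMemory + memoryOverhead
> // sum that Spark's YARN client requests for the AM container in cluster mode.
> object AmContainerSize {
>   def main(args: Array[String]): Unit = {
>     val driverMemoryMb = 127   // --driver-memory 127m
>     val overheadMb     = 50    // spark.yarn.driver.memoryOverhead=50
>     // Without the explicit override, Spark 2.1 would use
>     // max(384 MB, 10% of driver memory) = 384 MB of overhead.
>     val defaultOverheadMb = math.max(384, (0.10 * driverMemoryMb).toInt)
>     println(s"requested AM container: ${driverMemoryMb + overheadMb} MB") // 177 MB
>     println(s"overhead without override: $defaultOverheadMb MB")          // 384 MB
>   }
> }
> {code}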
> Clearly the driver memory is not enough, but the cause cannot be seen in the Spark client log:
> {code}
> 2018-10-29 19:28:34,658 INFO org.apache.spark.deploy.yarn.Client: Application report for application_1540536615315_0013 (state: ACCEPTED)
> 2018-10-29 19:28:35,660 INFO org.apache.spark.deploy.yarn.Client: Application report for application_1540536615315_0013 (state: RUNNING)
> 2018-10-29 19:28:35,660 INFO org.apache.spark.deploy.yarn.Client:
>  client token: N/A
>  diagnostics: N/A
>  ApplicationMaster host: 10.43.183.143
>  ApplicationMaster RPC port: 0
>  queue: root.mr
>  start time: 1540812501560
>  final status: UNDEFINED
>  tracking URL: http://zdh141:8088/proxy/application_1540536615315_0013/
>  user: mr
> 2018-10-29 19:28:36,663 INFO org.apache.spark.deploy.yarn.Client: Application report for application_1540536615315_0013 (state: FINISHED)
> 2018-10-29 19:28:36,663 INFO org.apache.spark.deploy.yarn.Client:
>  client token: N/A
>  diagnostics: Shutdown hook called before final status was reported.
>  ApplicationMaster host: 10.43.183.143
>  ApplicationMaster RPC port: 0
>  queue: root.mr
>  start time: 1540812501560
>  final status: FAILED
>  tracking URL: http://zdh141:8088/proxy/application_1540536615315_0013/
>  user: mr
> Exception in thread "main" org.apache.spark.SparkException: Application application_1540536615315_0013 finished with failed status
>  at org.apache.spark.deploy.yarn.Client.run(Client.scala:1137)
>  at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1183)
>  at org.apache.spark.deploy.yarn.Client.main(Client.scala)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498)
>  at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:775)
>  at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
>  at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
>  at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
>  at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> 2018-10-29 19:28:36,694 INFO org.apache.spark.util.ShutdownHookManager: Shutdown hook called
> 2018-10-29 19:28:36,695 INFO org.apache.spark.util.ShutdownHookManager: Deleting directory /tmp/spark-96077be5-0dfa-496d-a6a0-96e83393a8d9
> {code}
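> Until the patch lands, the lost diagnostics can be fetched by hand from the ResourceManager. A minimal sketch of a workaround (my own, assuming the stock Hadoop YarnClient API; this is not the patch itself):
> {code}
> import org.apache.hadoop.yarn.api.records.FinalApplicationStatus
> import org.apache.hadoop.yarn.client.api.YarnClient
> import org.apache.hadoop.yarn.conf.YarnConfiguration
> import org.apache.hadoop.yarn.util.ConverterUtils
>
> // Hypothetical helper (not part of Spark): read the final application
> // report and print YARN's own diagnostics, which keep the per-attempt
> // exit code even when the AM died before reporting a final status.
> object DiagnosticsProbe {
>   def main(args: Array[String]): Unit = {
>     val appId = ConverterUtils.toApplicationId(args(0)) // e.g. application_1540536615315_0013
>     val yarn = YarnClient.createYarnClient()
>     yarn.init(new YarnConfiguration())
>     yarn.start()
>     try {
>       val report = yarn.getApplicationReport(appId)
>       if (report.getFinalApplicationStatus == FinalApplicationStatus.FAILED) {
>         println(s"YARN diagnostics:\n${report.getDiagnostics}")
>       }
>     } finally {
>       yarn.stop()
>     }
>   }
> }
> {code}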
>
> Solution: after applying the patch, the original YARN diagnostics appear in the Spark client log:
> {code}
> 2018-10-29 19:27:32,962 INFO org.apache.spark.deploy.yarn.Client: Application report for application_1540536615315_0012 (state: RUNNING)
> 2018-10-29 19:27:32,962 INFO org.apache.spark.deploy.yarn.Client:
>  client token: N/A
>  diagnostics: N/A
>  ApplicationMaster host: 10.43.183.143
>  ApplicationMaster RPC port: 0
>  queue: root.mr
>  start time: 1540812436656
>  final status: UNDEFINED
>  tracking URL: http://zdh141:8088/proxy/application_1540536615315_0012/
>  user: mr
> 2018-10-29 19:27:33,964 INFO org.apache.spark.deploy.yarn.Client: Application report for application_1540536615315_0012 (state: FAILED)
> 2018-10-29 19:27:33,964 INFO org.apache.spark.deploy.yarn.Client:
>  client token: N/A
>  diagnostics: Application application_1540536615315_0012 failed 2 times due to AM Container for appattempt_1540536615315_0012_02 exited with exitCode: -104
> For more detailed output, check application tracking page:http://zdh141:8088/cluster/app/application_1540536615315_0012Then, click on links to logs of each attempt.
> Diagnostics: virtual memory used. Killing container.
> {code}

[jira] [Assigned] (SPARK-25869) Spark on YARN: the original diagnostics is missing when job failed maxAppAttempts times

2019-02-12 Thread Marcelo Vanzin (JIRA)


[ https://issues.apache.org/jira/browse/SPARK-25869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marcelo Vanzin reassigned SPARK-25869:
--------------------------------------

Assignee: Marcelo Vanzin


[jira] [Assigned] (SPARK-25869) Spark on YARN: the original diagnostics is missing when job failed maxAppAttempts times

2018-12-18 Thread Marcelo Vanzin (JIRA)


[ https://issues.apache.org/jira/browse/SPARK-25869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marcelo Vanzin reassigned SPARK-25869:
--------------------------------------

Assignee: (was: Marcelo Vanzin)


[jira] [Assigned] (SPARK-25869) Spark on YARN: the original diagnostics is missing when job failed maxAppAttempts times

2018-12-18 Thread Marcelo Vanzin (JIRA)


[ https://issues.apache.org/jira/browse/SPARK-25869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marcelo Vanzin reassigned SPARK-25869:
--------------------------------------

Assignee: Marcelo Vanzin


[jira] [Assigned] (SPARK-25869) Spark on YARN: the original diagnostics is missing when job failed maxAppAttempts times

2018-10-29 Thread Apache Spark (JIRA)


[ https://issues.apache.org/jira/browse/SPARK-25869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-25869:


Assignee: (was: Apache Spark)


[jira] [Assigned] (SPARK-25869) Spark on YARN: the original diagnostics is missing when job failed maxAppAttempts times

2018-10-29 Thread Apache Spark (JIRA)


[ https://issues.apache.org/jira/browse/SPARK-25869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-25869:


Assignee: Apache Spark
