[jira] [Resolved] (SPARK-6449) Driver OOM results in reported application result SUCCESS

2015-03-24 Thread Sean Owen (JIRA)

 [ https://issues.apache.org/jira/browse/SPARK-6449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen resolved SPARK-6449.
--
   Resolution: Duplicate
Fix Version/s: (was: 1.3.0)

 Driver OOM results in reported application result SUCCESS
 ---------------------------------------------------------

 Key: SPARK-6449
 URL: https://issues.apache.org/jira/browse/SPARK-6449
 Project: Spark
  Issue Type: Bug
  Components: YARN
Affects Versions: 1.3.0
Reporter: Ryan Williams

 I ran a job yesterday that, according to the History Server and the YARN RM, finished with status {{SUCCESS}}.
 Clicking around the History Server UI, I saw that too few stages had run, and I couldn't figure out why.
 Finally, inspecting the end of the driver's logs, I saw:
 {code}
 15/03/20 15:08:13 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
 15/03/20 15:08:13 INFO remote.RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
 15/03/20 15:08:13 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
 15/03/20 15:08:13 INFO spark.SparkContext: Successfully stopped SparkContext
 Exception in thread "Driver" scala.MatchError: java.lang.OutOfMemoryError: GC overhead limit exceeded (of class java.lang.OutOfMemoryError)
 	at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:485)
 15/03/20 15:08:13 INFO yarn.ApplicationMaster: Final app status: SUCCEEDED, exitCode: 0, (reason: Shutdown hook called before final status was reported.)
 15/03/20 15:08:13 INFO yarn.ApplicationMaster: Unregistering ApplicationMaster with SUCCEEDED (diag message: Shutdown hook called before final status was reported.)
 15/03/20 15:08:13 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remoting shut down.
 15/03/20 15:08:13 INFO impl.AMRMClientImpl: Waiting for application to be successfully unregistered.
 15/03/20 15:08:13 INFO yarn.ApplicationMaster: Deleting staging directory .sparkStaging/application_1426705269584_0055
 {code}
 The driver OOM'd, [the {{catch}} block that presumably should have caught it|https://github.com/apache/spark/blob/b6090f902e6ec24923b4dde4aabc9076956521c1/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala#L484] threw a {{MatchError}} instead, and then {{SUCCESS}} was returned to YARN and written to the event log.
 This should be logged as a failed job and reported as such to YARN.
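 For context, here is a minimal Scala sketch of the failure mode, paraphrased from the linked {{ApplicationMaster}} handler rather than copied from it; the names {{finish}}, {{handleBuggy}}, and {{handleFixed}} are illustrative, not the real ones. The user-class thread wraps failures in an {{InvocationTargetException}} and pattern-matches the cause, but the match covers only {{InterruptedException}} and {{Exception}}; an {{OutOfMemoryError}} is an {{Error}}, so it matches no case, the match itself throws {{scala.MatchError}}, and the FAILED report is skipped:
 {code}
 import java.lang.reflect.InvocationTargetException
 
 object AmSketch {
   // Stand-in for the AM's final-status reporting (illustrative only).
   def finish(status: String, diag: String): Unit =
     println(s"Final app status: $status ($diag)")
 
   // Paraphrase of the buggy handler: only Exception causes are matched,
   // so an OutOfMemoryError cause (an Error) falls through every case and
   // the match expression itself throws scala.MatchError.
   def handleBuggy(e: InvocationTargetException): Unit =
     e.getCause match {
       case _: InterruptedException =>
         // reporter thread interrupted the user class; not a failure
       case cause: Exception =>
         finish("FAILED", "User class threw exception: " + cause.getMessage)
     }
 
   // One possible fix: match all Throwables so Errors are reported too.
   def handleFixed(e: InvocationTargetException): Unit =
     e.getCause match {
       case _: InterruptedException => // not a failure
       case cause: Throwable =>        // an OutOfMemoryError lands here
         finish("FAILED", "User class threw: " + cause)
     }
 
   def main(args: Array[String]): Unit = {
     val oom = new InvocationTargetException(
       new OutOfMemoryError("GC overhead limit exceeded"))
     handleFixed(oom)  // prints: Final app status: FAILED (...)
     handleBuggy(oom)  // throws scala.MatchError, as in the log above
   }
 }
 {code}
 Matching on {{Throwable}} (or on {{scala.util.control.NonFatal}} plus an explicit {{Throwable}} case) closes the hole, so the AM can report FAILED to YARN before the JVM dies instead of letting the shutdown hook default to SUCCEEDED.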



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-6449) Driver OOM results in reported application result SUCCESS

2015-03-23 Thread Ryan Williams (JIRA)

 [ https://issues.apache.org/jira/browse/SPARK-6449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ryan Williams resolved SPARK-6449.
--
   Resolution: Implemented
Fix Version/s: 1.3.0

 Driver OOM results in reported application result SUCCESS
 ---------------------------------------------------------

 Key: SPARK-6449
 URL: https://issues.apache.org/jira/browse/SPARK-6449
 Project: Spark
  Issue Type: Bug
  Components: YARN
Affects Versions: 1.3.0
Reporter: Ryan Williams
 Fix For: 1.3.0

