[GitHub] spark pull request: [SPARK-3627] - [yarn] - fix exit code and fina...

2015-05-21 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/2577#discussion_r30820880
  
--- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala ---
@@ -383,40 +406,81 @@ private[spark] class ApplicationMaster(args: ApplicationMasterArguments,
     }
   }
 
+  /**
+   * This system security manager applies to the entire process.
+   * Its main purpose is to handle the case if the user code does a System.exit.
+   * This allows us to catch that and properly set the YARN application status and
+   * cleanup if needed.
+   */
+  private def setupSystemSecurityManager(): Unit = {
+    try {
+      var stopped = false
+      System.setSecurityManager(new java.lang.SecurityManager() {
+        override def checkExit(paramInt: Int) {
+          if (!stopped) {
+            logInfo("In securityManager checkExit, exit code: " + paramInt)
+            if (paramInt == 0) {
+              finish(FinalApplicationStatus.SUCCEEDED, ApplicationMaster.EXIT_SUCCESS)
+            } else {
+              finish(FinalApplicationStatus.FAILED,
+                paramInt,
+                "User class exited with non-zero exit code")
+            }
+            stopped = true
+          }
+        }
+        // required for the checkExit to work properly
+        override def checkPermission(perm: java.security.Permission): Unit = {}
+      })
+    }
+    catch {
+      case e: SecurityException =>
+        finish(FinalApplicationStatus.FAILED,
+          ApplicationMaster.EXIT_SECURITY,
+          "Error in setSecurityManager")
+        logError("Error in setSecurityManager:", e)
+    }
+  }
+
+  /**
+   * Start the user class, which contains the spark driver, in a separate Thread.
+   * If the main routine exits cleanly or exits with System.exit(0) we
+   * assume it was successful, for all other cases we assume failure.
+   *
+   * Returns the user thread that was started.
+   */
   private def startUserClass(): Thread = {
     logInfo("Starting the user JAR in a separate Thread")
     System.setProperty("spark.executor.instances", args.numExecutors.toString)
     val mainMethod = Class.forName(args.userClass, false,
       Thread.currentThread.getContextClassLoader).getMethod("main", classOf[Array[String]])
 
-    userClassThread = new Thread {
+    val userThread = new Thread {
       override def run() {
-        var status = FinalApplicationStatus.FAILED
         try {
-          // Copy
           val mainArgs = new Array[String](args.userArgs.size)
           args.userArgs.copyToArray(mainArgs, 0, args.userArgs.size)
           mainMethod.invoke(null, mainArgs)
-          // Some apps have "System.exit(0)" at the end.  The user thread will stop here unless
-          // it has an uncaught exception thrown out.  It needs a shutdown hook to set SUCCEEDED.
-          status = FinalApplicationStatus.SUCCEEDED
+          finish(FinalApplicationStatus.SUCCEEDED, ApplicationMaster.EXIT_SUCCESS)
+          logDebug("Done running users class")
        } catch {
           case e: InvocationTargetException =>
             e.getCause match {
               case _: InterruptedException =>
                 // Reporter thread can interrupt to stop user class
-
-              case e => throw e
+              case e: Exception =>
--- End diff --

This was changed in a subsequent PR. Check the current code.
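For readers skimming the archive, here is a minimal, standalone sketch of the technique the diff above relies on: installing a process-wide `SecurityManager` whose `checkExit` observes the status code passed to `System.exit` before the JVM shuts down. The object name and the `report` callback are illustrative, not Spark's code, and this reflects the JDK 7/8-era `SecurityManager` API this PR targets (it is deprecated for removal in recent JDKs).

```scala
object ExitCodeInterceptor {
  // Installs a process-wide SecurityManager so the status passed to
  // System.exit can be recorded (e.g. reported to YARN) before shutdown.
  def install(report: Int => Unit): Unit = {
    System.setSecurityManager(new SecurityManager {
      @volatile private var seen = false
      override def checkExit(status: Int): Unit = {
        if (!seen) {
          seen = true
          report(status)
        }
      }
      // No-op grant: overriding checkPermission keeps the manager from
      // vetoing unrelated operations, which is what lets checkExit run.
      override def checkPermission(perm: java.security.Permission): Unit = ()
    })
  }
}

// Usage (hypothetical): ExitCodeInterceptor.install(code => println(s"exit code: $code"))
```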





[GitHub] spark pull request: [SPARK-3627] - [yarn] - fix exit code and fina...

2015-05-20 Thread javabrett
Github user javabrett commented on a diff in the pull request:

https://github.com/apache/spark/pull/2577#discussion_r30769030
  
--- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala ---
@@ -383,40 +406,81 @@ private[spark] class ApplicationMaster(args: ApplicationMasterArguments,
     }
   }
 
+  /**
+   * This system security manager applies to the entire process.
+   * Its main purpose is to handle the case if the user code does a System.exit.
+   * This allows us to catch that and properly set the YARN application status and
+   * cleanup if needed.
+   */
+  private def setupSystemSecurityManager(): Unit = {
+    try {
+      var stopped = false
+      System.setSecurityManager(new java.lang.SecurityManager() {
+        override def checkExit(paramInt: Int) {
+          if (!stopped) {
+            logInfo("In securityManager checkExit, exit code: " + paramInt)
+            if (paramInt == 0) {
+              finish(FinalApplicationStatus.SUCCEEDED, ApplicationMaster.EXIT_SUCCESS)
+            } else {
+              finish(FinalApplicationStatus.FAILED,
+                paramInt,
+                "User class exited with non-zero exit code")
+            }
+            stopped = true
+          }
+        }
+        // required for the checkExit to work properly
+        override def checkPermission(perm: java.security.Permission): Unit = {}
+      })
+    }
+    catch {
+      case e: SecurityException =>
+        finish(FinalApplicationStatus.FAILED,
+          ApplicationMaster.EXIT_SECURITY,
+          "Error in setSecurityManager")
+        logError("Error in setSecurityManager:", e)
+    }
+  }
+
+  /**
+   * Start the user class, which contains the spark driver, in a separate Thread.
+   * If the main routine exits cleanly or exits with System.exit(0) we
+   * assume it was successful, for all other cases we assume failure.
+   *
+   * Returns the user thread that was started.
+   */
   private def startUserClass(): Thread = {
     logInfo("Starting the user JAR in a separate Thread")
     System.setProperty("spark.executor.instances", args.numExecutors.toString)
     val mainMethod = Class.forName(args.userClass, false,
       Thread.currentThread.getContextClassLoader).getMethod("main", classOf[Array[String]])
 
-    userClassThread = new Thread {
+    val userThread = new Thread {
       override def run() {
-        var status = FinalApplicationStatus.FAILED
         try {
-          // Copy
           val mainArgs = new Array[String](args.userArgs.size)
           args.userArgs.copyToArray(mainArgs, 0, args.userArgs.size)
           mainMethod.invoke(null, mainArgs)
-          // Some apps have "System.exit(0)" at the end.  The user thread will stop here unless
-          // it has an uncaught exception thrown out.  It needs a shutdown hook to set SUCCEEDED.
-          status = FinalApplicationStatus.SUCCEEDED
+          finish(FinalApplicationStatus.SUCCEEDED, ApplicationMaster.EXIT_SUCCESS)
+          logDebug("Done running users class")
        } catch {
           case e: InvocationTargetException =>
             e.getCause match {
               case _: InterruptedException =>
                 // Reporter thread can interrupt to stop user class
-
-              case e => throw e
+              case e: Exception =>
--- End diff --

I'm curious, should this be Throwable? If my application throws an uncaught Error, shouldn't that also result in FAILED, and would it (still) do so with this change? P.S. my Scala is not that strong.
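A quick standalone illustration of the distinction behind this question (not the PR's code): `java.lang.Error` is not a subclass of `Exception`, so a `case e: Exception` handler never sees an uncaught `Error`, while `Throwable` covers both; Scala's `NonFatal` extractor sits in between.

```scala
import scala.util.control.NonFatal

object CatchDemo extends App {
  def classify(t: Throwable): String = t match {
    case e: Exception => s"Exception: $e"            // misses Error subclasses
    case e: Throwable => s"non-Exception Throwable: $e"
  }

  println(classify(new RuntimeException("app bug"))) // Exception: ...
  println(classify(new OutOfMemoryError("heap")))    // non-Exception Throwable: ...
  println(NonFatal(new RuntimeException))            // true
  println(NonFatal(new OutOfMemoryError))            // false: treated as fatal
}
```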





[GitHub] spark pull request: [SPARK-3627] - [yarn] - fix exit code and fina...

2014-10-07 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/2577





[GitHub] spark pull request: [SPARK-3627] - [yarn] - fix exit code and fina...

2014-10-07 Thread tgravescs
Github user tgravescs commented on the pull request:

https://github.com/apache/spark/pull/2577#issuecomment-58197006
  
Thanks @andrewor14. I've merged this into 1.2.





[GitHub] spark pull request: [SPARK-3627] - [yarn] - fix exit code and fina...

2014-10-06 Thread andrewor14
Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/2577#issuecomment-58136343
  
LGTM, feel free to merge it.





[GitHub] spark pull request: [SPARK-3627] - [yarn] - fix exit code and fina...

2014-10-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/2577#issuecomment-58037937
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21330/





[GitHub] spark pull request: [SPARK-3627] - [yarn] - fix exit code and fina...

2014-10-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2577#issuecomment-58037924
  
[QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21330/consoleFull) for PR 2577 at commit [`9c2efbf`](https://github.com/apache/spark/commit/9c2efbfd8d199bf89f911e44c7b07c6afe6b15bd).
* This patch **passes** unit tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `case class CacheTableCommand(tableName: String, plan: Option[LogicalPlan], isLazy: Boolean)`
  * `case class UncacheTableCommand(tableName: String) extends Command`
  * `case class CacheTableCommand(`
  * `case class UncacheTableCommand(tableName: String) extends LeafNode with Command`
  * `case class DescribeCommand(child: SparkPlan, output: Seq[Attribute])(`






[GitHub] spark pull request: [SPARK-3627] - [yarn] - fix exit code and fina...

2014-10-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2577#issuecomment-58026499
  
[QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21330/consoleFull) for PR 2577 at commit [`9c2efbf`](https://github.com/apache/spark/commit/9c2efbfd8d199bf89f911e44c7b07c6afe6b15bd).
* This patch merges cleanly.





[GitHub] spark pull request: [SPARK-3627] - [yarn] - fix exit code and fina...

2014-10-06 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/2577#discussion_r18459205
  
--- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala ---
@@ -383,40 +405,82 @@ private[spark] class ApplicationMaster(args: ApplicationMasterArguments,
     }
   }
 
+  /**
+   * This system security manager applies to the entire process.
+   * Its main purpose is to handle the case if the user code does a System.exit.
+   * This allows us to catch that and properly set the YARN application status and
+   * cleanup if needed.
+   */
+  private def setupSystemSecurityManager() = {
+    try {
+      var stopped = false
+      System.setSecurityManager(new java.lang.SecurityManager() {
+        override def checkExit(paramInt: Int) {
+          if (!stopped) {
+            logInfo("In securityManager checkExit, exit code: " + paramInt)
+            if (paramInt == 0) {
+              finish(FinalApplicationStatus.SUCCEEDED, ApplicationMaster.EXIT_SUCCESS)
+            } else {
+              finish(FinalApplicationStatus.FAILED,
+                paramInt,
+                "User class exited with non-zero exit code")
+            }
+            stopped = true
+          }
+        }
+        // required for the checkExit to work properly
+        override def checkPermission(perm: java.security.Permission): Unit = {
+        }
--- End diff --

In the future please clarify what you want bumped up; you said this before and I thought you meant removing the extra space between 430 and 431. I assume you actually mean the `}`.





[GitHub] spark pull request: [SPARK-3627] - [yarn] - fix exit code and fina...

2014-10-06 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/2577#discussion_r18459132
  
--- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala ---
@@ -383,40 +405,82 @@ private[spark] class ApplicationMaster(args: ApplicationMasterArguments,
     }
   }
 
+  /**
+   * This system security manager applies to the entire process.
+   * Its main purpose is to handle the case if the user code does a System.exit.
+   * This allows us to catch that and properly set the YARN application status and
+   * cleanup if needed.
+   */
+  private def setupSystemSecurityManager() = {
+    try {
+      var stopped = false
+      System.setSecurityManager(new java.lang.SecurityManager() {
+        override def checkExit(paramInt: Int) {
+          if (!stopped) {
+            logInfo("In securityManager checkExit, exit code: " + paramInt)
+            if (paramInt == 0) {
+              finish(FinalApplicationStatus.SUCCEEDED, ApplicationMaster.EXIT_SUCCESS)
+            } else {
+              finish(FinalApplicationStatus.FAILED,
+                paramInt,
+                "User class exited with non-zero exit code")
+            }
+            stopped = true
+          }
+        }
+        // required for the checkExit to work properly
+        override def checkPermission(perm: java.security.Permission): Unit = {
+        }
+      })
+    }
+    catch {
+      case e: SecurityException =>
+        finish(FinalApplicationStatus.FAILED,
+          ApplicationMaster.EXIT_SECURITY,
+          "Error in setSecurityManager")
+        logError("Error in setSecurityManager:", e)
+    }
+  }
+
+  /**
+   * Start the user class, which contains the spark driver, in a separate Thread.
+   * If the main routine exits cleanly or exits with System.exit(0) we
+   * assume it was successful, for all other cases we assume failure.
+   *
+   * Returns the user thread that was started.
+   */
   private def startUserClass(): Thread = {
     logInfo("Starting the user JAR in a separate Thread")
     System.setProperty("spark.executor.instances", args.numExecutors.toString)
     val mainMethod = Class.forName(args.userClass, false,
       Thread.currentThread.getContextClassLoader).getMethod("main", classOf[Array[String]])
 
-    userClassThread = new Thread {
+    val userThread = new Thread {
       override def run() {
-        var status = FinalApplicationStatus.FAILED
         try {
-          // Copy
           val mainArgs = new Array[String](args.userArgs.size)
           args.userArgs.copyToArray(mainArgs, 0, args.userArgs.size)
           mainMethod.invoke(null, mainArgs)
-          // Some apps have "System.exit(0)" at the end.  The user thread will stop here unless
-          // it has an uncaught exception thrown out.  It needs a shutdown hook to set SUCCEEDED.
-          status = FinalApplicationStatus.SUCCEEDED
+          finish(FinalApplicationStatus.SUCCEEDED, ApplicationMaster.EXIT_SUCCESS)
+          logDebug("Done running users class")
        } catch {
           case e: InvocationTargetException =>
             e.getCause match {
               case _: InterruptedException =>
                 // Reporter thread can interrupt to stop user class
-
-              case e => throw e
+              case e: Throwable =>
--- End diff --

That is fine, but note you didn't comment on this one earlier; you commented somewhere else in the code. This one we end up re-throwing, so I wasn't as concerned with it. I can change it.





[GitHub] spark pull request: [SPARK-3627] - [yarn] - fix exit code and fina...

2014-10-03 Thread andrewor14
Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/2577#issuecomment-57857279
  
Hey @tgravescs this LGTM pending a few minor comments.





[GitHub] spark pull request: [SPARK-3627] - [yarn] - fix exit code and fina...

2014-10-03 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/2577#discussion_r18418623
  
--- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala ---
@@ -383,40 +405,82 @@ private[spark] class ApplicationMaster(args: ApplicationMasterArguments,
     }
   }
 
+  /**
+   * This system security manager applies to the entire process.
+   * Its main purpose is to handle the case if the user code does a System.exit.
+   * This allows us to catch that and properly set the YARN application status and
+   * cleanup if needed.
+   */
+  private def setupSystemSecurityManager() = {
+    try {
+      var stopped = false
+      System.setSecurityManager(new java.lang.SecurityManager() {
+        override def checkExit(paramInt: Int) {
+          if (!stopped) {
+            logInfo("In securityManager checkExit, exit code: " + paramInt)
+            if (paramInt == 0) {
+              finish(FinalApplicationStatus.SUCCEEDED, ApplicationMaster.EXIT_SUCCESS)
+            } else {
+              finish(FinalApplicationStatus.FAILED,
+                paramInt,
+                "User class exited with non-zero exit code")
+            }
+            stopped = true
+          }
+        }
+        // required for the checkExit to work properly
+        override def checkPermission(perm: java.security.Permission): Unit = {
+        }
--- End diff --

can you bump this up one line





[GitHub] spark pull request: [SPARK-3627] - [yarn] - fix exit code and fina...

2014-10-03 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/2577#discussion_r18418613
  
--- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala ---
@@ -383,40 +405,82 @@ private[spark] class ApplicationMaster(args: ApplicationMasterArguments,
     }
   }
 
+  /**
+   * This system security manager applies to the entire process.
+   * Its main purpose is to handle the case if the user code does a System.exit.
+   * This allows us to catch that and properly set the YARN application status and
+   * cleanup if needed.
+   */
+  private def setupSystemSecurityManager() = {
--- End diff --

can you add `: Unit`





[GitHub] spark pull request: [SPARK-3627] - [yarn] - fix exit code and fina...

2014-10-03 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/2577#discussion_r18418362
  
--- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala ---
@@ -383,40 +405,82 @@ private[spark] class ApplicationMaster(args: ApplicationMasterArguments,
     }
   }
 
+  /**
+   * This system security manager applies to the entire process.
+   * Its main purpose is to handle the case if the user code does a System.exit.
+   * This allows us to catch that and properly set the YARN application status and
+   * cleanup if needed.
+   */
+  private def setupSystemSecurityManager() = {
+    try {
+      var stopped = false
+      System.setSecurityManager(new java.lang.SecurityManager() {
+        override def checkExit(paramInt: Int) {
+          if (!stopped) {
+            logInfo("In securityManager checkExit, exit code: " + paramInt)
+            if (paramInt == 0) {
+              finish(FinalApplicationStatus.SUCCEEDED, ApplicationMaster.EXIT_SUCCESS)
+            } else {
+              finish(FinalApplicationStatus.FAILED,
+                paramInt,
+                "User class exited with non-zero exit code")
+            }
+            stopped = true
+          }
+        }
+        // required for the checkExit to work properly
+        override def checkPermission(perm: java.security.Permission): Unit = {
+        }
+      })
+    }
+    catch {
+      case e: SecurityException =>
+        finish(FinalApplicationStatus.FAILED,
+          ApplicationMaster.EXIT_SECURITY,
+          "Error in setSecurityManager")
+        logError("Error in setSecurityManager:", e)
+    }
+  }
+
+  /**
+   * Start the user class, which contains the spark driver, in a separate Thread.
+   * If the main routine exits cleanly or exits with System.exit(0) we
+   * assume it was successful, for all other cases we assume failure.
+   *
+   * Returns the user thread that was started.
+   */
   private def startUserClass(): Thread = {
     logInfo("Starting the user JAR in a separate Thread")
     System.setProperty("spark.executor.instances", args.numExecutors.toString)
     val mainMethod = Class.forName(args.userClass, false,
       Thread.currentThread.getContextClassLoader).getMethod("main", classOf[Array[String]])
 
-    userClassThread = new Thread {
+    val userThread = new Thread {
      override def run() {
-        var status = FinalApplicationStatus.FAILED
         try {
-          // Copy
           val mainArgs = new Array[String](args.userArgs.size)
           args.userArgs.copyToArray(mainArgs, 0, args.userArgs.size)
           mainMethod.invoke(null, mainArgs)
-          // Some apps have "System.exit(0)" at the end.  The user thread will stop here unless
-          // it has an uncaught exception thrown out.  It needs a shutdown hook to set SUCCEEDED.
-          status = FinalApplicationStatus.SUCCEEDED
+          finish(FinalApplicationStatus.SUCCEEDED, ApplicationMaster.EXIT_SUCCESS)
+          logDebug("Done running users class")
        } catch {
           case e: InvocationTargetException =>
             e.getCause match {
               case _: InterruptedException =>
                 // Reporter thread can interrupt to stop user class
-
-              case e => throw e
+              case e: Throwable =>
--- End diff --

I still think we should catch only Exception here. All we ever do in `finish` is to kill the threads and log the exit code, and if we get a really bad `Throwable` that kills the JVM then these threads won't survive anyway. It's just that the JVM is not guaranteed to do whatever `finish` does properly.
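A hedged sketch of one middle ground between these two positions (illustrative stubs, not the code that was merged): record the failure for YARN first, then rethrow anything `NonFatal` classifies as fatal so the JVM's default handling still applies.

```scala
import scala.util.control.NonFatal

object FinishThenRethrow extends App {
  // Stubs standing in for the ApplicationMaster pieces (hypothetical).
  def finish(status: String, exitCode: Int, msg: String): Unit =
    println(s"finish($status, $exitCode, $msg)")

  def runUserClass(): Unit = throw new RuntimeException("user class failed")

  try {
    runUserClass()
  } catch {
    case e: Throwable =>
      // Record the outcome for YARN first...
      finish("FAILED", 15, s"User class threw exception: $e")
      // ...then keep default JVM behavior for truly fatal errors.
      if (!NonFatal(e)) throw e
  }
}
```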





[GitHub] spark pull request: [SPARK-3627] - [yarn] - fix exit code and fina...

2014-10-03 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/2577#discussion_r18418213
  
--- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala ---
@@ -328,10 +349,18 @@ private[spark] class ApplicationMaster(args: ApplicationMasterArguments,
   private def waitForSparkDriver(): ActorRef = {
     logInfo("Waiting for Spark driver to be reachable.")
     var driverUp = false
+    var count = 0
     val hostport = args.userArgs(0)
     val (driverHost, driverPort) = Utils.parseHostPort(hostport)
-    while (!driverUp) {
+
+    // spark driver should already be up since it launched us, but we don't want to
+    // wait forever, so wait 100 seconds max to match the cluster mode setting.
+    // Leave this config unpublished for now.
--- End diff --

minor, but can you add `SPARK-3779` to the comment so others know we're tracking this issue?





[GitHub] spark pull request: [SPARK-3627] - [yarn] - fix exit code and fina...

2014-10-03 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/2577#discussion_r18418163
  
--- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala ---
@@ -231,33 +259,26 @@ private[spark] class ApplicationMaster(args: ApplicationMasterArguments,
     val t = new Thread {
       override def run() {
         var failureCount = 0
-
         while (!finished) {
           try {
-            checkNumExecutorsFailed()
-            if (!finished) {
+            if (allocator.getNumExecutorsFailed >= maxNumExecutorFailures) {
+              finish(FinalApplicationStatus.FAILED,
+                ApplicationMaster.EXIT_MAX_EXECUTOR_FAILURES,
+                "Max number of executor failures reached")
+            } else {
               logDebug("Sending progress")
               allocator.allocateResources()
             }
             failureCount = 0
           } catch {
+            case i: InterruptedException =>
             case e: Throwable => {
               failureCount += 1
               if (!NonFatal(e) || failureCount >= reporterMaxFailures) {
--- End diff --

Since we're catching `InterruptedException` here it's always gonna be `NonFatal` right? I think this check is now outdated.
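For context: `scala.util.control.NonFatal` classifies `InterruptedException` as fatal, along with `VirtualMachineError`, `ThreadDeath`, `LinkageError`, and `ControlThrowable`, so peeling `InterruptedException` off first does change what reaches the check. A quick demonstration:

```scala
import scala.util.control.NonFatal

object NonFatalDemo extends App {
  println(NonFatal(new InterruptedException)) // false: NonFatal treats it as fatal
  println(NonFatal(new RuntimeException))     // true
  println(NonFatal(new StackOverflowError))   // false: VirtualMachineError subclass
}
```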





[GitHub] spark pull request: [SPARK-3627] - [yarn] - fix exit code and fina...

2014-10-03 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2577#issuecomment-57818934
  
[QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21250/consoleFull) for PR 2577 at commit [`e8cc261`](https://github.com/apache/spark/commit/e8cc261ba8e8b639d2fd375638ae5bb0925c1411).
* This patch **passes** unit tests.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] spark pull request: [SPARK-3627] - [yarn] - fix exit code and fina...

2014-10-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/2577#issuecomment-57818940
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21250/





[GitHub] spark pull request: [SPARK-3627] - [yarn] - fix exit code and fina...

2014-10-03 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2577#issuecomment-57809815
  
[QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21250/consoleFull) for PR 2577 at commit [`e8cc261`](https://github.com/apache/spark/commit/e8cc261ba8e8b639d2fd375638ae5bb0925c1411).
* This patch merges cleanly.





[GitHub] spark pull request: [SPARK-3627] - [yarn] - fix exit code and fina...

2014-10-03 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2577#issuecomment-57808148
  
[QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21248/consoleFull) for PR 2577 at commit [`24c98e3`](https://github.com/apache/spark/commit/24c98e3154ee2ee93dc4c958ac982b534a798972).
* This patch **fails** unit tests.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] spark pull request: [SPARK-3627] - [yarn] - fix exit code and fina...

2014-10-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/2577#issuecomment-57808155
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21248/





[GitHub] spark pull request: [SPARK-3627] - [yarn] - fix exit code and fina...

2014-10-03 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2577#issuecomment-57807655
  
[QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21248/consoleFull) for PR 2577 at commit [`24c98e3`](https://github.com/apache/spark/commit/24c98e3154ee2ee93dc4c958ac982b534a798972).
* This patch merges cleanly.





[GitHub] spark pull request: [SPARK-3627] - [yarn] - fix exit code and fina...

2014-10-03 Thread tgravescs
Github user tgravescs commented on the pull request:

https://github.com/apache/spark/pull/2577#issuecomment-57807019
  
Addressed all the review comments.





[GitHub] spark pull request: [SPARK-3627] - [yarn] - fix exit code and fina...

2014-10-03 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/2577#discussion_r18395494
  
--- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala ---
@@ -328,10 +348,18 @@ private[spark] class ApplicationMaster(args: ApplicationMasterArguments,
   private def waitForSparkDriver(): ActorRef = {
     logInfo("Waiting for Spark driver to be reachable.")
     var driverUp = false
+    var count = 0
     val hostport = args.userArgs(0)
     val (driverHost, driverPort) = Utils.parseHostPort(hostport)
-    while (!driverUp) {
+
+    // spark driver should already be up since it launched us, but we don't want to
+    // wait forever, so wait 100 seconds max to match the cluster mode setting.
+    // Leave this config unpublished for now.
+    val numTries = sparkConf.getInt("spark.yarn.ApplicationMaster.client.waitTries", 1000)
--- End diff --

https://issues.apache.org/jira/browse/SPARK-3779 filed





[GitHub] spark pull request: [SPARK-3627] - [yarn] - fix exit code and fina...

2014-10-03 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/2577#discussion_r18395371
  
--- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala ---
@@ -328,10 +348,18 @@ private[spark] class ApplicationMaster(args: ApplicationMasterArguments,
   private def waitForSparkDriver(): ActorRef = {
     logInfo("Waiting for Spark driver to be reachable.")
     var driverUp = false
+    var count = 0
     val hostport = args.userArgs(0)
     val (driverHost, driverPort) = Utils.parseHostPort(hostport)
-    while (!driverUp) {
+
+    // spark driver should already be up since it launched us, but we don't want to
+    // wait forever, so wait 100 seconds max to match the cluster mode setting.
+    // Leave this config unpublished for now.
+    val numTries = sparkConf.getInt("spark.yarn.ApplicationMaster.client.waitTries", 1000)
--- End diff --

OK, for this PR I'll leave it `applicationMaster.waitTries` to match cluster mode, and I'll file a separate JIRA to clean it up. The documentation doesn't state how long each loop is, for example. I think these would be better changed to a wait time rather than a number of tries; then they could be used for both modes.
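A sketch of the "wait time instead of number of tries" idea (a hypothetical helper, not Spark code): polling against a wall-clock deadline keeps the timeout meaningful regardless of how long each loop iteration takes.

```scala
object PollUntil {
  // Returns true if `condition` became true before `timeoutMs` elapsed.
  def waitFor(condition: () => Boolean, timeoutMs: Long, intervalMs: Long = 100L): Boolean = {
    val deadline = System.currentTimeMillis() + timeoutMs
    var ok = condition()
    while (!ok && System.currentTimeMillis() < deadline) {
      Thread.sleep(intervalMs)
      ok = condition()
    }
    ok
  }
}

// e.g. PollUntil.waitFor(() => driverUp, timeoutMs = 100000L) instead of counting tries
```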





[GitHub] spark pull request: [SPARK-3627] - [yarn] - fix exit code and fina...

2014-09-30 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/2577#discussion_r18251599
  
--- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala ---
@@ -383,40 +404,80 @@ private[spark] class ApplicationMaster(args: ApplicationMasterArguments,
     }
   }
 
+  /**
+   * Start the user class, which contains the spark driver.
+   * If the main routine exits cleanly or exits with System.exit(0) we
+   * assume it was successful, for all other cases we assume failure.
+   */
   private def startUserClass(): Thread = {
     logInfo("Starting the user JAR in a separate Thread")
     System.setProperty("spark.executor.instances", args.numExecutors.toString)
+    var stopped = false
     val mainMethod = Class.forName(args.userClass, false,
       Thread.currentThread.getContextClassLoader).getMethod("main", classOf[Array[String]])
 
-    userClassThread = new Thread {
+    val userThread = new Thread {
       override def run() {
-        var status = FinalApplicationStatus.FAILED
+
+        try {
+          // Note this security manager applies to the entire process, not
+          // just this thread. It's here to handle the case if the user code
+          // does System.exit
+          System.setSecurityManager(new java.lang.SecurityManager() {
+            override def checkExit(paramInt: Int) {
+              if (!stopped) {
+                logInfo("In securityManager checkExit, exit code: " + paramInt)
+                if (paramInt == 0) {
+                  finish(FinalApplicationStatus.SUCCEEDED, ApplicationMaster.EXIT_SUCCESS)
+                } else {
+                  finish(FinalApplicationStatus.FAILED,
+                    paramInt,
+                    "User class exited with non-zero exit code")
+                }
+                stopped = true
+              }
+            }
+
+            // required for the checkExit to work properly
+            override def checkPermission(perm: java.security.Permission): Unit = {
+            }
+          })
+        }
+        catch {
+          case e: SecurityException => {
+            finish(FinalApplicationStatus.FAILED,
+              ApplicationMaster.EXIT_SECURITY,
+              "Error in setSecurityManager")
+            logError("Error in setSecurityManager:", e)
+          }
+        }
+
--- End diff --

Sounds good, I'll separate it out.





[GitHub] spark pull request: [SPARK-3627] - [yarn] - fix exit code and fina...

2014-09-30 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/2577#discussion_r18249218
  
--- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala ---
@@ -383,40 +404,80 @@ private[spark] class ApplicationMaster(args: ApplicationMasterArguments,
     }
   }
 
+  /**
+   * Start the user class, which contains the spark driver.
+   * If the main routine exits cleanly or exits with System.exit(0) we
+   * assume it was successful, for all other cases we assume failure.
+   */
   private def startUserClass(): Thread = {
     logInfo("Starting the user JAR in a separate Thread")
     System.setProperty("spark.executor.instances", args.numExecutors.toString)
+    var stopped = false
     val mainMethod = Class.forName(args.userClass, false,
       Thread.currentThread.getContextClassLoader).getMethod("main", classOf[Array[String]])
 
-    userClassThread = new Thread {
+    val userThread = new Thread {
       override def run() {
-        var status = FinalApplicationStatus.FAILED
+
+        try {
+          // Note this security manager applies to the entire process, not
+          // just this thread. It's here to handle the case if the user code
+          // does System.exit
+          System.setSecurityManager(new java.lang.SecurityManager() {
+            override def checkExit(paramInt: Int) {
+              if (!stopped) {
+                logInfo("In securityManager checkExit, exit code: " + paramInt)
+                if (paramInt == 0) {
+                  finish(FinalApplicationStatus.SUCCEEDED, ApplicationMaster.EXIT_SUCCESS)
+                } else {
+                  finish(FinalApplicationStatus.FAILED,
+                    paramInt,
+                    "User class exited with non-zero exit code")
+                }
+                stopped = true
+              }
+            }
+
+            // required for the checkExit to work properly
+            override def checkPermission(perm: java.security.Permission): Unit = {
+            }
+          })
+        }
+        catch {
+          case e: SecurityException => {
+            finish(FinalApplicationStatus.FAILED,
+              ApplicationMaster.EXIT_SECURITY,
+              "Error in setSecurityManager")
+            logError("Error in setSecurityManager:", e)
+          }
+        }
+
--- End diff --

Not a big deal, but I think it'll make the content of this thread easier to read by minimizing the logic we put in it.





[GitHub] spark pull request: [SPARK-3627] - [yarn] - fix exit code and fina...

2014-09-30 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/2577#discussion_r18249133
  
--- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala ---
@@ -328,10 +348,18 @@ private[spark] class ApplicationMaster(args: ApplicationMasterArguments,
   private def waitForSparkDriver(): ActorRef = {
     logInfo("Waiting for Spark driver to be reachable.")
     var driverUp = false
+    var count = 0
     val hostport = args.userArgs(0)
     val (driverHost, driverPort) = Utils.parseHostPort(hostport)
-    while (!driverUp) {
+
+    // spark driver should already be up since it launched us, but we don't want to
+    // wait forever, so wait 100 seconds max to match the cluster mode setting.
+    // Leave this config unpublished for now.
+    val numTries = sparkConf.getInt("spark.yarn.ApplicationMaster.client.waitTries", 1000)
--- End diff --

It's kind of inconsistent to use `applicationMaster.client.waitTries` for client mode but `applicationMaster.waitTries` for cluster mode, and the existing documentation for the latter makes no mention of cluster mode even though it's only used there. It's fine to keep the `client` config here but we should make the other one `applicationMaster.cluster.waitTries` in a future JIRA and deprecate the less specific one.





[GitHub] spark pull request: [SPARK-3627] - [yarn] - fix exit code and fina...

2014-09-30 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/2577#discussion_r18248626
  
--- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala ---
@@ -71,80 +74,108 @@ private[spark] class ApplicationMaster(args: ApplicationMasterArguments,
   private val sparkContextRef = new AtomicReference[SparkContext](null)
 
   final def run(): Int = {
-    val appAttemptId = client.getAttemptId()
+    try {
+      val appAttemptId = client.getAttemptId()
 
-    if (isDriver) {
-      // Set the web ui port to be ephemeral for yarn so we don't conflict with
-      // other spark processes running on the same box
-      System.setProperty("spark.ui.port", "0")
+      if (isDriver) {
+        // Set the web ui port to be ephemeral for yarn so we don't conflict with
+        // other spark processes running on the same box
+        System.setProperty("spark.ui.port", "0")
 
-      // Set the master property to match the requested mode.
-      System.setProperty("spark.master", "yarn-cluster")
+        // Set the master property to match the requested mode.
+        System.setProperty("spark.master", "yarn-cluster")
 
-      // Propagate the application ID so that YarnClusterSchedulerBackend can pick it up.
-      System.setProperty("spark.yarn.app.id", appAttemptId.getApplicationId().toString())
-    }
+        // Propagate the application ID so that YarnClusterSchedulerBackend can pick it up.
+        System.setProperty("spark.yarn.app.id", appAttemptId.getApplicationId().toString())
+      }
 
-    logInfo("ApplicationAttemptId: " + appAttemptId)
+      logInfo("ApplicationAttemptId: " + appAttemptId)
 
-    val cleanupHook = new Runnable {
-      override def run() {
-        // If the SparkContext is still registered, shut it down as a best case effort in case
-        // users do not call sc.stop or do System.exit().
-        val sc = sparkContextRef.get()
-        if (sc != null) {
-          logInfo("Invoking sc stop from shutdown hook")
-          sc.stop()
-          finish(FinalApplicationStatus.SUCCEEDED)
-        }
+      val cleanupHook = new Runnable {
+        override def run() {
+          // If the SparkContext is still registered, shut it down as a best case effort in case
+          // users do not call sc.stop or do System.exit().
+          val sc = sparkContextRef.get()
+          if (sc != null) {
+            logInfo("Invoking sc stop from shutdown hook")
+            sc.stop()
+          }
+          val maxAppAttempts = client.getMaxRegAttempts(yarnConf)
+          val isLastAttempt = client.getAttemptId().getAttemptId() >= maxAppAttempts
+
+          if (!finished) {
+            // this shouldn't ever happen, but if it does assume weird failure
+            finish(FinalApplicationStatus.FAILED,
+              ApplicationMaster.EXIT_UNCAUGHT_EXCEPTION,
+              "shutdown hook called without cleanly finishing")
+          }
 
-        // Cleanup the staging dir after the app is finished, or if it's the last attempt at
-        // running the AM.
-        val maxAppAttempts = client.getMaxRegAttempts(yarnConf)
-        val isLastAttempt = client.getAttemptId().getAttemptId() >= maxAppAttempts
-        if (finished || isLastAttempt) {
-          cleanupStagingDir()
+          if (!unregistered) {
+            // we only want to unregister if we don't want the RM to retry
+            if (finalStatus == FinalApplicationStatus.SUCCEEDED || isLastAttempt) {
+              unregister(finalStatus, finalMsg)
+              cleanupStagingDir()
+            }
+          }
         }
       }
-    }
 
-    // Use higher priority than FileSystem.
-    assert(ApplicationMaster.SHUTDOWN_HOOK_PRIORITY > FileSystem.SHUTDOWN_HOOK_PRIORITY)
-    ShutdownHookManager
-      .get().addShutdownHook(cleanupHook, ApplicationMaster.SHUTDOWN_HOOK_PRIORITY)
+      // Use higher priority than FileSystem.
+      assert(ApplicationMaster.SHUTDOWN_HOOK_PRIORITY > FileSystem.SHUTDOWN_HOOK_PRIORITY)
+      ShutdownHookManager
+        .get().addShutdownHook(cleanupHook, ApplicationMaster.SHUTDOWN_HOOK_PRIORITY)
 
-    // Call this to force generation of secret so it gets populated into the
-    // Hadoop UGI. This has to happen before the startUserClass which does a
-    // doAs in order for the credentials to be passed on to the executor containers.
-    val securityMgr = new SecurityManager(sparkConf)
+      // Call this to force generation of secret so it gets populated into the
+      // Hadoop UGI. This has to happen before the startUserClass which does a
+      // doAs in order for the credentials to be passed on to the executor containers.

[GitHub] spark pull request: [SPARK-3627] - [yarn] - fix exit code and fina...

2014-09-30 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/2577#discussion_r18243719
  
--- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala ---
@@ -328,10 +348,18 @@ private[spark] class ApplicationMaster(args: ApplicationMasterArguments,
   private def waitForSparkDriver(): ActorRef = {
     logInfo("Waiting for Spark driver to be reachable.")
     var driverUp = false
+    var count = 0
     val hostport = args.userArgs(0)
     val (driverHost, driverPort) = Utils.parseHostPort(hostport)
-    while (!driverUp) {
+
+    // spark driver should already be up since it launched us, but we don't want to
+    // wait forever, so wait 100 seconds max to match the cluster mode setting.
+    // Leave this config unpublished for now.
+    val numTries = sparkConf.getInt("spark.yarn.ApplicationMaster.client.waitTries", 1000)
--- End diff --

Yes, the `client` was tacked on to mean it is used in client mode, because the timing of the loops is different between the modes. It's an internal config right now, so users shouldn't be setting it. The timing is different because in client mode the driver is already up when this is launched, versus in cluster mode we are launching the user code, which takes some time (tens of seconds).

I'll file a separate JIRA to fix up the mismatch in doc/config.





[GitHub] spark pull request: [SPARK-3627] - [yarn] - fix exit code and fina...

2014-09-30 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/2577#discussion_r18243513
  
--- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala ---
@@ -71,80 +74,108 @@ private[spark] class ApplicationMaster(args: ApplicationMasterArguments,
   private val sparkContextRef = new AtomicReference[SparkContext](null)
 
   final def run(): Int = {
-    val appAttemptId = client.getAttemptId()
+    try {
+      val appAttemptId = client.getAttemptId()
 
-    if (isDriver) {
-      // Set the web ui port to be ephemeral for yarn so we don't conflict with
-      // other spark processes running on the same box
-      System.setProperty("spark.ui.port", "0")
+      if (isDriver) {
+        // Set the web ui port to be ephemeral for yarn so we don't conflict with
+        // other spark processes running on the same box
+        System.setProperty("spark.ui.port", "0")
 
-      // Set the master property to match the requested mode.
-      System.setProperty("spark.master", "yarn-cluster")
+        // Set the master property to match the requested mode.
+        System.setProperty("spark.master", "yarn-cluster")
 
-      // Propagate the application ID so that YarnClusterSchedulerBackend can pick it up.
-      System.setProperty("spark.yarn.app.id", appAttemptId.getApplicationId().toString())
-    }
+        // Propagate the application ID so that YarnClusterSchedulerBackend can pick it up.
+        System.setProperty("spark.yarn.app.id", appAttemptId.getApplicationId().toString())
+      }
 
-    logInfo("ApplicationAttemptId: " + appAttemptId)
+      logInfo("ApplicationAttemptId: " + appAttemptId)
 
-    val cleanupHook = new Runnable {
-      override def run() {
-        // If the SparkContext is still registered, shut it down as a best case effort in case
-        // users do not call sc.stop or do System.exit().
-        val sc = sparkContextRef.get()
-        if (sc != null) {
-          logInfo("Invoking sc stop from shutdown hook")
-          sc.stop()
-          finish(FinalApplicationStatus.SUCCEEDED)
-        }
+      val cleanupHook = new Runnable {
+        override def run() {
+          // If the SparkContext is still registered, shut it down as a best case effort in case
+          // users do not call sc.stop or do System.exit().
+          val sc = sparkContextRef.get()
+          if (sc != null) {
+            logInfo("Invoking sc stop from shutdown hook")
+            sc.stop()
+          }
+          val maxAppAttempts = client.getMaxRegAttempts(yarnConf)
+          val isLastAttempt = client.getAttemptId().getAttemptId() >= maxAppAttempts
+
+          if (!finished) {
+            // this shouldn't ever happen, but if it does assume weird failure
+            finish(FinalApplicationStatus.FAILED,
+              ApplicationMaster.EXIT_UNCAUGHT_EXCEPTION,
+              "shutdown hook called without cleanly finishing")
+          }
 
-        // Cleanup the staging dir after the app is finished, or if it's the last attempt at
-        // running the AM.
-        val maxAppAttempts = client.getMaxRegAttempts(yarnConf)
-        val isLastAttempt = client.getAttemptId().getAttemptId() >= maxAppAttempts
-        if (finished || isLastAttempt) {
-          cleanupStagingDir()
+          if (!unregistered) {
+            // we only want to unregister if we don't want the RM to retry
+            if (finalStatus == FinalApplicationStatus.SUCCEEDED || isLastAttempt) {
+              unregister(finalStatus, finalMsg)
+              cleanupStagingDir()
+            }
+          }
         }
       }
-    }
 
-    // Use higher priority than FileSystem.
-    assert(ApplicationMaster.SHUTDOWN_HOOK_PRIORITY > FileSystem.SHUTDOWN_HOOK_PRIORITY)
-    ShutdownHookManager
-      .get().addShutdownHook(cleanupHook, ApplicationMaster.SHUTDOWN_HOOK_PRIORITY)
+      // Use higher priority than FileSystem.
+      assert(ApplicationMaster.SHUTDOWN_HOOK_PRIORITY > FileSystem.SHUTDOWN_HOOK_PRIORITY)
+      ShutdownHookManager
+        .get().addShutdownHook(cleanupHook, ApplicationMaster.SHUTDOWN_HOOK_PRIORITY)
 
-    // Call this to force generation of secret so it gets populated into the
-    // Hadoop UGI. This has to happen before the startUserClass which does a
-    // doAs in order for the credentials to be passed on to the executor containers.
-    val securityMgr = new SecurityManager(sparkConf)
+      // Call this to force generation of secret so it gets populated into the
+      // Hadoop UGI. This has to happen before the startUserClass which does a
+      // doAs in order for the credentials to be passed on to the executor containers.

[GitHub] spark pull request: [SPARK-3627] - [yarn] - fix exit code and fina...

2014-09-30 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/2577#discussion_r18240426
  
--- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala ---
@@ -231,33 +258,26 @@ private[spark] class ApplicationMaster(args: ApplicationMasterArguments,
     val t = new Thread {
       override def run() {
         var failureCount = 0
-
-        while (!finished) {
+        while (!finished && !Thread.currentThread().isInterrupted()) {
--- End diff --

Is the second check needed? If the thread is interrupted won't this already have exited?
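Background for this question (a standalone sketch, not the PR's loop): interruption only surfaces as an `InterruptedException` from blocking calls such as `Thread.sleep`; a loop doing non-blocking work keeps running after `interrupt()` unless it polls the flag, which is one reason to check it explicitly.

```scala
object InterruptFlagDemo extends App {
  val worker = new Thread {
    override def run(): Unit = {
      // Nothing in this loop blocks, so no InterruptedException is ever
      // thrown; only the explicit flag check notices interrupt().
      while (!Thread.currentThread().isInterrupted) {
        // non-blocking work
      }
      println("observed interrupt, exiting loop")
    }
  }
  worker.start()
  worker.interrupt()
  worker.join()
}
```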





[GitHub] spark pull request: [SPARK-3627] - [yarn] - fix exit code and fina...

2014-09-30 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/2577#discussion_r18240375
  
--- Diff: 
yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala 
---
@@ -343,6 +371,11 @@ private[spark] class ApplicationMaster(args: 
ApplicationMasterArguments,
   Thread.sleep(100)
   }
 }
+
+if (!driverUp) {
+  throw new Exception("Failed to connect to driver!")
--- End diff --

Can you throw `SparkException` here?
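
For reference, the requested change would presumably look like the following sketch (assuming the spark-core dependency, where org.apache.spark.SparkException lives, is on the classpath):

    import org.apache.spark.SparkException

    if (!driverUp) {
      throw new SparkException("Failed to connect to driver!")
    }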





[GitHub] spark pull request: [SPARK-3627] - [yarn] - fix exit code and fina...

2014-09-30 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/2577#discussion_r18240353
  
--- Diff: 
yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala 
---
@@ -328,10 +348,18 @@ private[spark] class ApplicationMaster(args: 
ApplicationMasterArguments,
   private def waitForSparkDriver(): ActorRef = {
 logInfo("Waiting for Spark driver to be reachable.")
 var driverUp = false
+var count = 0
 val hostport = args.userArgs(0)
 val (driverHost, driverPort) = Utils.parseHostPort(hostport)
-while (!driverUp) {
+
+// spark driver should already be up since it launched us, but we 
don't want to
+// wait forever, so wait 100 seconds max to match the cluster mode 
setting.
+// Leave this config unpublished for now.
+val numTries = 
sparkConf.getInt("spark.yarn.ApplicationMaster.client.waitTries", 1000)
--- End diff --

This config should use camel case for `applicationMaster`. Also, there's 
already a `spark.yarn.applicationMaster.waitTries`. Does the extra `client` 
mean it's for client mode? Do we want separate settings for client and 
cluster deploy modes here?

By the way, there is a mismatch between what is already in the code, 
`spark.yarn.ApplicationMaster.waitTries`, and what we document, 
`spark.yarn.applicationMaster.waitTries`. I think this is a bug that we can fix 
separately.
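
Concretely, the camel-cased read would look like the sketch below; the exact key name (including whether `client` stays in it) and the default are still open questions in this thread, so treat both as placeholders:

    // hypothetical camel-cased key; default of 1000 copied from the quoted diff
    val numTries =
      sparkConf.getInt("spark.yarn.applicationMaster.client.waitTries", 1000)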





[GitHub] spark pull request: [SPARK-3627] - [yarn] - fix exit code and fina...

2014-09-30 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/2577#discussion_r18240121
  
--- Diff: 
yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala 
---
@@ -383,40 +404,80 @@ private[spark] class ApplicationMaster(args: 
ApplicationMasterArguments,
 }
   }
 
+  /**
+   * Start the user class, which contains the spark driver.
--- End diff --

Thanks for documenting this. Can you add that this is started in a separate 
thread and this method returns that thread?
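
A doc comment along the lines being requested might read as follows (a sketch of the wording, not the merged text):

    /**
     * Start the user class, which contains the spark driver,
     * in a separate thread, and return that thread to the caller.
     * If the main routine exits cleanly or exits with System.exit(0) we
     * assume it was successful, for all other cases we assume failure.
     */
    private def startUserClass(): Thread = ???  // body unchanged from the diff above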





[GitHub] spark pull request: [SPARK-3627] - [yarn] - fix exit code and fina...

2014-09-30 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/2577#discussion_r18240067
  
--- Diff: 
yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala 
---
@@ -383,40 +404,80 @@ private[spark] class ApplicationMaster(args: 
ApplicationMasterArguments,
 }
   }
 
+  /**
+   * Start the user class, which contains the spark driver.
+   * If the main routine exits cleanly or exits with System.exit(0) we
+   * assume it was successful, for all other cases we assume failure.
+   */
   private def startUserClass(): Thread = {
 logInfo("Starting the user JAR in a separate Thread")
 System.setProperty("spark.executor.instances", 
args.numExecutors.toString)
+var stopped = false
 val mainMethod = Class.forName(args.userClass, false,
   Thread.currentThread.getContextClassLoader).getMethod("main", 
classOf[Array[String]])
 
-userClassThread = new Thread {
+val userThread = new Thread {
   override def run() {
-var status = FinalApplicationStatus.FAILED
+
+try {
+  // Note this security manager applies to the entire process, not
+  // just this thread. It's here to handle the case if the user 
code
+  // does System.exit
+  System.setSecurityManager(new java.lang.SecurityManager() {
+override def checkExit(paramInt: Int) {
+  if (!stopped) {
+logInfo("In securityManager checkExit, exit code: " + 
paramInt)
+if (paramInt == 0) {
+  finish(FinalApplicationStatus.SUCCEEDED, 
ApplicationMaster.EXIT_SUCCESS)
+} else {
+  finish(FinalApplicationStatus.FAILED,
+paramInt,
+"User class exited with non-zero exit code")
+}
+stopped = true
+  }
+}
+
+// required for the checkExit to work properly
+override def checkPermission(perm: java.security.Permission): 
Unit = {
+}
--- End diff --

can you bump this up one line
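
That is, collapsing the empty override onto a single line:

    // required for the checkExit to work properly
    override def checkPermission(perm: java.security.Permission): Unit = {}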





[GitHub] spark pull request: [SPARK-3627] - [yarn] - fix exit code and fina...

2014-09-30 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/2577#discussion_r18239755
  
--- Diff: 
yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala 
---
@@ -383,40 +404,80 @@ private[spark] class ApplicationMaster(args: 
ApplicationMasterArguments,
 }
   }
 
+  /**
+   * Start the user class, which contains the spark driver.
+   * If the main routine exits cleanly or exits with System.exit(0) we
+   * assume it was successful, for all other cases we assume failure.
+   */
   private def startUserClass(): Thread = {
 logInfo("Starting the user JAR in a separate Thread")
 System.setProperty("spark.executor.instances", 
args.numExecutors.toString)
+var stopped = false
 val mainMethod = Class.forName(args.userClass, false,
   Thread.currentThread.getContextClassLoader).getMethod("main", 
classOf[Array[String]])
 
-userClassThread = new Thread {
+val userThread = new Thread {
   override def run() {
-var status = FinalApplicationStatus.FAILED
+
+try {
+  // Note this security manager applies to the entire process, not
+  // just this thread. It's here to handle the case if the user 
code
+  // does System.exit
+  System.setSecurityManager(new java.lang.SecurityManager() {
+override def checkExit(paramInt: Int) {
+  if (!stopped) {
+logInfo("In securityManager checkExit, exit code: " + 
paramInt)
+if (paramInt == 0) {
+  finish(FinalApplicationStatus.SUCCEEDED, 
ApplicationMaster.EXIT_SUCCESS)
+} else {
+  finish(FinalApplicationStatus.FAILED,
+paramInt,
+"User class exited with non-zero exit code")
+}
+stopped = true
+  }
+}
+
+// required for the checkExit to work properly
+override def checkPermission(perm: java.security.Permission): 
Unit = {
+}
+  })
+}
+catch {
+  case e: SecurityException => {
+finish(FinalApplicationStatus.FAILED,
+  ApplicationMaster.EXIT_SECURITY,
+  "Error in setSecurityManager")
+logError("Error in setSecurityManager:", e)
+  }
+}
+
--- End diff --

No, it could be pulled out, but it's also not needed for yarn-client mode.
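
To make the "pulled out" option concrete, here is a rough sketch (helper name and shape are assumptions, not the PR's code) of hoisting the process-wide setup into a helper that can simply be skipped in yarn-client mode:

    import java.security.Permission

    object ExitInterceptorSketch {
      // System.setSecurityManager may itself throw SecurityException; the
      // caller is expected to catch that and record a FAILED final status.
      def install(onExit: Int => Unit): Unit = {
        System.setSecurityManager(new SecurityManager() {
          @volatile private var stopped = false
          override def checkExit(code: Int): Unit = {
            if (!stopped) {
              stopped = true
              onExit(code)  // e.g. 0 -> SUCCEEDED, anything else -> FAILED
            }
          }
          // must stay a no-op, otherwise checkExit itself would be denied
          override def checkPermission(perm: Permission): Unit = {}
        })
      }
    }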





[GitHub] spark pull request: [SPARK-3627] - [yarn] - fix exit code and fina...

2014-09-30 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/2577#discussion_r18239726
  
--- Diff: 
yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala 
---
@@ -71,80 +74,108 @@ private[spark] class ApplicationMaster(args: 
ApplicationMasterArguments,
   private val sparkContextRef = new AtomicReference[SparkContext](null)
 
   final def run(): Int = {
-val appAttemptId = client.getAttemptId()
+try {
+  val appAttemptId = client.getAttemptId()
 
-if (isDriver) {
-  // Set the web ui port to be ephemeral for yarn so we don't conflict 
with
-  // other spark processes running on the same box
-  System.setProperty("spark.ui.port", "0")
+  if (isDriver) {
+// Set the web ui port to be ephemeral for yarn so we don't 
conflict with
+// other spark processes running on the same box
+System.setProperty("spark.ui.port", "0")
 
-  // Set the master property to match the requested mode.
-  System.setProperty("spark.master", "yarn-cluster")
+// Set the master property to match the requested mode.
+System.setProperty("spark.master", "yarn-cluster")
 
-  // Propagate the application ID so that YarnClusterSchedulerBackend 
can pick it up.
-  System.setProperty("spark.yarn.app.id", 
appAttemptId.getApplicationId().toString())
-}
+// Propagate the application ID so that 
YarnClusterSchedulerBackend can pick it up.
+System.setProperty("spark.yarn.app.id", 
appAttemptId.getApplicationId().toString())
+  }
 
-logInfo("ApplicationAttemptId: " + appAttemptId)
+  logInfo("ApplicationAttemptId: " + appAttemptId)
 
-val cleanupHook = new Runnable {
-  override def run() {
-// If the SparkContext is still registered, shut it down as a best 
case effort in case
-// users do not call sc.stop or do System.exit().
-val sc = sparkContextRef.get()
-if (sc != null) {
-  logInfo("Invoking sc stop from shutdown hook")
-  sc.stop()
-  finish(FinalApplicationStatus.SUCCEEDED)
-}
+  val cleanupHook = new Runnable {
+override def run() {
+  // If the SparkContext is still registered, shut it down as a 
best case effort in case
+  // users do not call sc.stop or do System.exit().
+  val sc = sparkContextRef.get()
+  if (sc != null) {
+logInfo("Invoking sc stop from shutdown hook")
+sc.stop()
+  }
+  val maxAppAttempts = client.getMaxRegAttempts(yarnConf)
+  val isLastAttempt = client.getAttemptId().getAttemptId() >= 
maxAppAttempts
+
+  if (!finished) {
+// this shouldn't ever happen, but if it does assume weird 
failure
+finish(FinalApplicationStatus.FAILED,
+  ApplicationMaster.EXIT_UNCAUGHT_EXCEPTION,
+  "shutdown hook called without cleanly finishing")
+  }
 
-// Cleanup the staging dir after the app is finished, or if it's 
the last attempt at
-// running the AM.
-val maxAppAttempts = client.getMaxRegAttempts(yarnConf)
-val isLastAttempt = client.getAttemptId().getAttemptId() >= 
maxAppAttempts
-if (finished || isLastAttempt) {
-  cleanupStagingDir()
+  if (!unregistered) {
+// we only want to unregister if we don't want the RM to retry
+if (finalStatus == FinalApplicationStatus.SUCCEEDED || 
isLastAttempt) {
+  unregister(finalStatus, finalMsg)
+  cleanupStagingDir()
+}
+  }
 }
   }
-}
 
-// Use higher priority than FileSystem.
-assert(ApplicationMaster.SHUTDOWN_HOOK_PRIORITY > 
FileSystem.SHUTDOWN_HOOK_PRIORITY)
-ShutdownHookManager
-  .get().addShutdownHook(cleanupHook, 
ApplicationMaster.SHUTDOWN_HOOK_PRIORITY)
+  // Use higher priority than FileSystem.
+  assert(ApplicationMaster.SHUTDOWN_HOOK_PRIORITY > 
FileSystem.SHUTDOWN_HOOK_PRIORITY)
+  ShutdownHookManager
+.get().addShutdownHook(cleanupHook, 
ApplicationMaster.SHUTDOWN_HOOK_PRIORITY)
 
-// Call this to force generation of secret so it gets populated into 
the
-// Hadoop UGI. This has to happen before the startUserClass which does 
a
-// doAs in order for the credentials to be passed on to the executor 
containers.
-val securityMgr = new SecurityManager(sparkConf)
+  // Call this to force generation of secret so it gets populated into the
+  // Hadoop UGI. This has to happen before the startUserClass which does a
+  // doAs in order for the credentials to be passed on to the executor containers.
+  val securityMgr = new SecurityManager(sparkConf)

[GitHub] spark pull request: [SPARK-3627] - [yarn] - fix exit code and fina...

2014-09-30 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/2577#discussion_r18239465
  
--- Diff: 
yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala 
---
@@ -383,40 +404,80 @@ private[spark] class ApplicationMaster(args: 
ApplicationMasterArguments,
 }
   }
 
+  /**
+   * Start the user class, which contains the spark driver.
+   * If the main routine exits cleanly or exits with System.exit(0) we
+   * assume it was successful, for all other cases we assume failure.
+   */
   private def startUserClass(): Thread = {
 logInfo("Starting the user JAR in a separate Thread")
 System.setProperty("spark.executor.instances", 
args.numExecutors.toString)
+var stopped = false
 val mainMethod = Class.forName(args.userClass, false,
   Thread.currentThread.getContextClassLoader).getMethod("main", 
classOf[Array[String]])
 
-userClassThread = new Thread {
+val userThread = new Thread {
   override def run() {
-var status = FinalApplicationStatus.FAILED
+
+try {
+  // Note this security manager applies to the entire process, not
+  // just this thread. It's here to handle the case if the user 
code
+  // does System.exit
+  System.setSecurityManager(new java.lang.SecurityManager() {
+override def checkExit(paramInt: Int) {
+  if (!stopped) {
+logInfo("In securityManager checkExit, exit code: " + 
paramInt)
+if (paramInt == 0) {
+  finish(FinalApplicationStatus.SUCCEEDED, 
ApplicationMaster.EXIT_SUCCESS)
+} else {
+  finish(FinalApplicationStatus.FAILED,
+paramInt,
+"User class exited with non-zero exit code")
+}
+stopped = true
+  }
+}
+
+// required for the checkExit to work properly
+override def checkPermission(perm: java.security.Permission): 
Unit = {
+}
+  })
+}
+catch {
+  case e: SecurityException => {
+finish(FinalApplicationStatus.FAILED,
+  ApplicationMaster.EXIT_SECURITY,
+  "Error in setSecurityManager")
+logError("Error in setSecurityManager:", e)
+  }
+}
+
--- End diff --

Does this block have to execute inside the thread, since it's a system-wide 
setting?
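
Sketching the call-site ordering the question implies (reusing the hypothetical ExitInterceptorSketch helper from the note further up, not PR code):

    // installed once, before the user-class thread starts, and only in
    // cluster mode, since yarn-client mode doesn't want a process-wide manager
    if (isDriver) {
      ExitInterceptorSketch.install(code =>
        logInfo("user class called System.exit(" + code + ")"))
    }
    userClassThread = startUserClass()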





[GitHub] spark pull request: [SPARK-3627] - [yarn] - fix exit code and fina...

2014-09-30 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/2577#discussion_r18239431
  
--- Diff: 
yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala 
---
@@ -383,40 +404,80 @@ private[spark] class ApplicationMaster(args: 
ApplicationMasterArguments,
 }
   }
 
+  /**
+   * Start the user class, which contains the spark driver.
+   * If the main routine exits cleanly or exits with System.exit(0) we
+   * assume it was successful, for all other cases we assume failure.
+   */
   private def startUserClass(): Thread = {
 logInfo("Starting the user JAR in a separate Thread")
 System.setProperty("spark.executor.instances", 
args.numExecutors.toString)
+var stopped = false
 val mainMethod = Class.forName(args.userClass, false,
   Thread.currentThread.getContextClassLoader).getMethod("main", 
classOf[Array[String]])
 
-userClassThread = new Thread {
+val userThread = new Thread {
   override def run() {
-var status = FinalApplicationStatus.FAILED
+
+try {
+  // Note this security manager applies to the entire process, not
+  // just this thread. It's here to handle the case if the user 
code
+  // does System.exit
+  System.setSecurityManager(new java.lang.SecurityManager() {
+override def checkExit(paramInt: Int) {
+  if (!stopped) {
+logInfo("In securityManager checkExit, exit code: " + 
paramInt)
+if (paramInt == 0) {
+  finish(FinalApplicationStatus.SUCCEEDED, 
ApplicationMaster.EXIT_SUCCESS)
+} else {
+  finish(FinalApplicationStatus.FAILED,
+paramInt,
+"User class exited with non-zero exit code")
+}
+stopped = true
+  }
+}
+
+// required for the checkExit to work properly
+override def checkPermission(perm: java.security.Permission): 
Unit = {
+}
+  })
+}
+catch {
+  case e: SecurityException => {
--- End diff --

nit: no need for `{` here
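
Applying the nit, the handler without the extra braces:

    catch {
      case e: SecurityException =>
        finish(FinalApplicationStatus.FAILED,
          ApplicationMaster.EXIT_SECURITY,
          "Error in setSecurityManager")
        logError("Error in setSecurityManager:", e)
    }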





[GitHub] spark pull request: [SPARK-3627] - [yarn] - fix exit code and fina...

2014-09-30 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/2577#discussion_r18238861
  
--- Diff: 
yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala 
---
@@ -450,6 +511,15 @@ object ApplicationMaster extends Logging {
 
   val SHUTDOWN_HOOK_PRIORITY: Int = 30
 
+  // exit codes for different causes, no reason behind the values
+  val EXIT_SUCCESS = 0
+  val EXIT_UNCAUGHT_EXCEPTION = 10
+  val EXIT_MAX_EXECUTOR_FAILURES = 11
+  val EXIT_REPORTER_FAILURE = 12
+  val EXIT_SC_NOT_INITED = 13
+  val EXIT_SECURITY = 14
+  val EXIT_EXCEPTION_USER_CLASS = 15
+
--- End diff --

These should all be private
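
Applied literally, the suggestion gives:

    // exit codes for different causes, no reason behind the values
    private val EXIT_SUCCESS = 0
    private val EXIT_UNCAUGHT_EXCEPTION = 10
    private val EXIT_MAX_EXECUTOR_FAILURES = 11
    private val EXIT_REPORTER_FAILURE = 12
    private val EXIT_SC_NOT_INITED = 13
    private val EXIT_SECURITY = 14
    private val EXIT_EXCEPTION_USER_CLASS = 15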





[GitHub] spark pull request: [SPARK-3627] - [yarn] - fix exit code and fina...

2014-09-30 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/2577#discussion_r18238788
  
--- Diff: 
yarn/alpha/src/main/scala/org/apache/spark/deploy/yarn/YarnRMClientImpl.scala 
---
@@ -66,14 +70,16 @@ private class YarnRMClientImpl(args: 
ApplicationMasterArguments) extends YarnRMC
 appAttemptId
   }
 
-  override def shutdown(status: FinalApplicationStatus, diagnostics: 
String = "") = {
-val finishReq = 
Records.newRecord(classOf[FinishApplicationMasterRequest])
-  .asInstanceOf[FinishApplicationMasterRequest]
-finishReq.setAppAttemptId(getAttemptId())
-finishReq.setFinishApplicationStatus(status)
-finishReq.setDiagnostics(diagnostics)
-finishReq.setTrackingUrl(uiHistoryAddress)
-resourceManager.finishApplicationMaster(finishReq)
+  override def unregister(status: FinalApplicationStatus, diagnostics: 
String = "") = synchronized {
+if (registered) {
+  val finishReq = 
Records.newRecord(classOf[FinishApplicationMasterRequest])
+.asInstanceOf[FinishApplicationMasterRequest]
--- End diff --

This PR didn't change this code, other than wrapping it with an if. It's 
also going to be deprecated soon, so I don't see a reason to fix it.





[GitHub] spark pull request: [SPARK-3627] - [yarn] - fix exit code and fina...

2014-09-30 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/2577#discussion_r18238499
  
--- Diff: 
yarn/alpha/src/main/scala/org/apache/spark/deploy/yarn/YarnRMClientImpl.scala 
---
@@ -66,14 +70,16 @@ private class YarnRMClientImpl(args: 
ApplicationMasterArguments) extends YarnRMC
 appAttemptId
   }
 
-  override def shutdown(status: FinalApplicationStatus, diagnostics: 
String = "") = {
-val finishReq = 
Records.newRecord(classOf[FinishApplicationMasterRequest])
-  .asInstanceOf[FinishApplicationMasterRequest]
-finishReq.setAppAttemptId(getAttemptId())
-finishReq.setFinishApplicationStatus(status)
-finishReq.setDiagnostics(diagnostics)
-finishReq.setTrackingUrl(uiHistoryAddress)
-resourceManager.finishApplicationMaster(finishReq)
+  override def unregister(status: FinalApplicationStatus, diagnostics: 
String = "") = synchronized {
+if (registered) {
+  val finishReq = 
Records.newRecord(classOf[FinishApplicationMasterRequest])
+.asInstanceOf[FinishApplicationMasterRequest]
--- End diff --

You probably don't need this cast
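
Since Records.newRecord(classOf[T]) is already generic in the Hadoop API and returns T, the cast should indeed be droppable:

    val finishReq = Records.newRecord(classOf[FinishApplicationMasterRequest])
    finishReq.setAppAttemptId(getAttemptId())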





[GitHub] spark pull request: [SPARK-3627] - [yarn] - fix exit code and fina...

2014-09-30 Thread vanzin
Github user vanzin commented on the pull request:

https://github.com/apache/spark/pull/2577#issuecomment-57347769
  
LGTM. Thanks!





[GitHub] spark pull request: [SPARK-3627] - [yarn] - fix exit code and fina...

2014-09-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/2577#issuecomment-57325372
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21040/





[GitHub] spark pull request: [SPARK-3627] - [yarn] - fix exit code and fina...

2014-09-30 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2577#issuecomment-57325361
  
  [QA tests have 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21040/consoleFull)
 for PR 2577 at commit 
[`fab166d`](https://github.com/apache/spark/commit/fab166dba852f732049dda112daab17a491dd94c).
 * This patch **passes** unit tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-3627] - [yarn] - fix exit code and fina...

2014-09-30 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2577#issuecomment-57314584
  
  [QA tests have 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21040/consoleFull)
 for PR 2577 at commit 
[`fab166d`](https://github.com/apache/spark/commit/fab166dba852f732049dda112daab17a491dd94c).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-3627] - [yarn] - fix exit code and fina...

2014-09-30 Thread tgravescs
Github user tgravescs commented on the pull request:

https://github.com/apache/spark/pull/2577#issuecomment-57313993
  
Thanks for the review, @vanzin. I've updated it.





[GitHub] spark pull request: [SPARK-3627] - [yarn] - fix exit code and fina...

2014-09-29 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/2577#discussion_r18183146
  
--- Diff: 
yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala 
---
@@ -71,80 +74,134 @@ private[spark] class ApplicationMaster(args: 
ApplicationMasterArguments,
   private val sparkContextRef = new AtomicReference[SparkContext](null)
 
   final def run(): Int = {
-val appAttemptId = client.getAttemptId()
+try {
+  val appAttemptId = client.getAttemptId()
 
-if (isDriver) {
-  // Set the web ui port to be ephemeral for yarn so we don't conflict 
with
-  // other spark processes running on the same box
-  System.setProperty("spark.ui.port", "0")
+  if (isDriver) {
+// Set the web ui port to be ephemeral for yarn so we don't 
conflict with
+// other spark processes running on the same box
+System.setProperty("spark.ui.port", "0")
 
-  // Set the master property to match the requested mode.
-  System.setProperty("spark.master", "yarn-cluster")
+// Set the master property to match the requested mode.
+System.setProperty("spark.master", "yarn-cluster")
 
-  // Propagate the application ID so that YarnClusterSchedulerBackend 
can pick it up.
-  System.setProperty("spark.yarn.app.id", 
appAttemptId.getApplicationId().toString())
-}
+// Propagate the application ID so that 
YarnClusterSchedulerBackend can pick it up.
+System.setProperty("spark.yarn.app.id", 
appAttemptId.getApplicationId().toString())
+  }
 
-logInfo("ApplicationAttemptId: " + appAttemptId)
+  logInfo("ApplicationAttemptId: " + appAttemptId)
 
-val cleanupHook = new Runnable {
-  override def run() {
-// If the SparkContext is still registered, shut it down as a best 
case effort in case
-// users do not call sc.stop or do System.exit().
-val sc = sparkContextRef.get()
-if (sc != null) {
-  logInfo("Invoking sc stop from shutdown hook")
-  sc.stop()
-  finish(FinalApplicationStatus.SUCCEEDED)
-}
+  val cleanupHook = new Runnable {
+override def run() {
+  // If the SparkContext is still registered, shut it down as a 
best case effort in case
+  // users do not call sc.stop or do System.exit().
+  val sc = sparkContextRef.get()
+  if (sc != null) {
+logInfo("Invoking sc stop from shutdown hook")
+sc.stop()
+  }
+  val maxAppAttempts = client.getMaxRegAttempts(yarnConf)
+  val isLastAttempt = client.getAttemptId().getAttemptId() >= 
maxAppAttempts
+
+  if (!finished) {
+// this shouldn't ever happen, but if it does assume weird 
failure
+finish(FinalApplicationStatus.FAILED,
+  ApplicationMaster.EXIT_UNCAUGHT_EXCEPTION,
+  "shutdown hook called without cleanly finishing")
+  }
 
-// Cleanup the staging dir after the app is finished, or if it's 
the last attempt at
-// running the AM.
-val maxAppAttempts = client.getMaxRegAttempts(yarnConf)
-val isLastAttempt = client.getAttemptId().getAttemptId() >= 
maxAppAttempts
-if (finished || isLastAttempt) {
-  cleanupStagingDir()
+  if (!unregistered) {
+// we only want to unregister if we don't want the RM to retry
+if (finalStatus == FinalApplicationStatus.SUCCEEDED || 
isLastAttempt) {
+  unregister(finalStatus, finalMsg)
+  cleanupStagingDir()
+}
+  }
 }
   }
-}
 
-// Use higher priority than FileSystem.
-assert(ApplicationMaster.SHUTDOWN_HOOK_PRIORITY > 
FileSystem.SHUTDOWN_HOOK_PRIORITY)
-ShutdownHookManager
-  .get().addShutdownHook(cleanupHook, 
ApplicationMaster.SHUTDOWN_HOOK_PRIORITY)
+  // Use higher priority than FileSystem.
+  assert(ApplicationMaster.SHUTDOWN_HOOK_PRIORITY > 
FileSystem.SHUTDOWN_HOOK_PRIORITY)
+  ShutdownHookManager
+.get().addShutdownHook(cleanupHook, 
ApplicationMaster.SHUTDOWN_HOOK_PRIORITY)
 
-// Call this to force generation of secret so it gets populated into 
the
-// Hadoop UGI. This has to happen before the startUserClass which does 
a
-// doAs in order for the credentials to be passed on to the executor 
containers.
-val securityMgr = new SecurityManager(sparkConf)
+  // Call this to force generation of secret so it gets populated into the
+  // Hadoop UGI. This has to happen before the startUserClass which does a
+  // doAs in order for the credentials to be passed on to the executor containers.
+  val securityMgr = new SecurityManager(sparkConf)

[GitHub] spark pull request: [SPARK-3627] - [yarn] - fix exit code and fina...

2014-09-29 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/2577#discussion_r18182939
  
--- Diff: 
yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala 
---
@@ -71,80 +74,134 @@ private[spark] class ApplicationMaster(args: ApplicationMasterArguments,
[same run()/cleanup-hook hunk as quoted in full earlier in this thread; the archived message is cut off before the reviewer's comment]

[GitHub] spark pull request: [SPARK-3627] - [yarn] - fix exit code and fina...

2014-09-29 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/2577#discussion_r18181156
  
--- Diff: 
yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala 
---
@@ -71,80 +74,134 @@ private[spark] class ApplicationMaster(args: ApplicationMasterArguments,
[same run()/cleanup-hook hunk as quoted in full earlier in this thread; the archived message is cut off before the reviewer's comment]

[GitHub] spark pull request: [SPARK-3627] - [yarn] - fix exit code and fina...

2014-09-29 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/2577#discussion_r18180991
  
--- Diff: 
yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala 
---
@@ -71,80 +74,134 @@ private[spark] class ApplicationMaster(args: ApplicationMasterArguments,
[same run()/cleanup-hook hunk as quoted in full earlier in this thread; the archived message is cut off before the reviewer's comment]

[GitHub] spark pull request: [SPARK-3627] - [yarn] - fix exit code and fina...

2014-09-29 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/2577#discussion_r18180627
  
--- Diff: 
yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala 
---
@@ -71,80 +74,134 @@ private[spark] class ApplicationMaster(args: ApplicationMasterArguments,
[same run()/cleanup-hook hunk as quoted in full earlier in this thread; the archived message is cut off before the reviewer's comment]

[GitHub] spark pull request: [SPARK-3627] - [yarn] - fix exit code and fina...

2014-09-29 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/2577#discussion_r18179848
  
--- Diff: 
yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala 
---
@@ -71,80 +74,134 @@ private[spark] class ApplicationMaster(args: ApplicationMasterArguments,
[same run()/cleanup-hook hunk as quoted in full earlier in this thread; the archived message is cut off before the reviewer's comment]

[GitHub] spark pull request: [SPARK-3627] - [yarn] - fix exit code and fina...

2014-09-29 Thread vanzin
Github user vanzin commented on the pull request:

https://github.com/apache/spark/pull/2577#issuecomment-57209937
  
Looks ok to me, although the exception handling does feel a little 
paranoid. :-) Just had a few nits.





[GitHub] spark pull request: [SPARK-3627] - [yarn] - fix exit code and fina...

2014-09-29 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/2577#discussion_r18174887
  
--- Diff: 
yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala 
---
@@ -383,40 +432,80 @@ private[spark] class ApplicationMaster(args: 
ApplicationMasterArguments,
 }
   }
 
+  /**
+   * Start the user class, which contains the spark driver.
+   * If the main routine exits cleanly or exits with System.exit(0) we
+   * assume it was successful, for all other cases we assume failure.
+   */
   private def startUserClass(): Thread = {
 logInfo("Starting the user JAR in a separate Thread")
 System.setProperty("spark.executor.instances", 
args.numExecutors.toString)
+var stopped = false
 val mainMethod = Class.forName(args.userClass, false,
   Thread.currentThread.getContextClassLoader).getMethod("main", 
classOf[Array[String]])
 
-userClassThread = new Thread {
+val userThread = new Thread {
   override def run() {
-var status = FinalApplicationStatus.FAILED
+
+try {
+  // Note this security manager applies to the entire process, not
+  // just this thread. Its here to handle the case if the user code
--- End diff --

nit: It's





[GitHub] spark pull request: [SPARK-3627] - [yarn] - fix exit code and fina...

2014-09-29 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/2577#discussion_r18174148
  
--- Diff: 
yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala 
---
@@ -71,80 +74,134 @@ private[spark] class ApplicationMaster(args: 
ApplicationMasterArguments,
   private val sparkContextRef = new AtomicReference[SparkContext](null)
 
   final def run(): Int = {
-val appAttemptId = client.getAttemptId()
+try {
+  val appAttemptId = client.getAttemptId()
 
-if (isDriver) {
-  // Set the web ui port to be ephemeral for yarn so we don't conflict 
with
-  // other spark processes running on the same box
-  System.setProperty("spark.ui.port", "0")
+  if (isDriver) {
+// Set the web ui port to be ephemeral for yarn so we don't 
conflict with
+// other spark processes running on the same box
+System.setProperty("spark.ui.port", "0")
 
-  // Set the master property to match the requested mode.
-  System.setProperty("spark.master", "yarn-cluster")
+// Set the master property to match the requested mode.
+System.setProperty("spark.master", "yarn-cluster")
 
-  // Propagate the application ID so that YarnClusterSchedulerBackend 
can pick it up.
-  System.setProperty("spark.yarn.app.id", 
appAttemptId.getApplicationId().toString())
-}
+// Propagate the application ID so that 
YarnClusterSchedulerBackend can pick it up.
+System.setProperty("spark.yarn.app.id", 
appAttemptId.getApplicationId().toString())
+  }
 
-logInfo("ApplicationAttemptId: " + appAttemptId)
+  logInfo("ApplicationAttemptId: " + appAttemptId)
 
-val cleanupHook = new Runnable {
-  override def run() {
-// If the SparkContext is still registered, shut it down as a best 
case effort in case
-// users do not call sc.stop or do System.exit().
-val sc = sparkContextRef.get()
-if (sc != null) {
-  logInfo("Invoking sc stop from shutdown hook")
-  sc.stop()
-  finish(FinalApplicationStatus.SUCCEEDED)
-}
+  val cleanupHook = new Runnable {
+override def run() {
+  // If the SparkContext is still registered, shut it down as a 
best case effort in case
+  // users do not call sc.stop or do System.exit().
+  val sc = sparkContextRef.get()
+  if (sc != null) {
+logInfo("Invoking sc stop from shutdown hook")
+sc.stop()
+  }
+  val maxAppAttempts = client.getMaxRegAttempts(yarnConf)
+  val isLastAttempt = client.getAttemptId().getAttemptId() >= 
maxAppAttempts
+
+  if (!finished) {
+// this shouldn't ever happen, but if it does assume weird 
failure
+finish(FinalApplicationStatus.FAILED,
+  ApplicationMaster.EXIT_UNCAUGHT_EXCEPTION,
+  "shutdown hook called without cleanly finishing")
+  }
 
-// Cleanup the staging dir after the app is finished, or if it's 
the last attempt at
-// running the AM.
-val maxAppAttempts = client.getMaxRegAttempts(yarnConf)
-val isLastAttempt = client.getAttemptId().getAttemptId() >= 
maxAppAttempts
-if (finished || isLastAttempt) {
-  cleanupStagingDir()
+  if (!unregistered) {
+// we only want to unregister if we don't want the RM to retry
+if (finalStatus == FinalApplicationStatus.SUCCEEDED || 
isLastAttempt) {
+  unregister(finalStatus, finalMsg)
+  cleanupStagingDir()
+}
+  }
 }
   }
-}
 
-// Use higher priority than FileSystem.
-assert(ApplicationMaster.SHUTDOWN_HOOK_PRIORITY > 
FileSystem.SHUTDOWN_HOOK_PRIORITY)
-ShutdownHookManager
-  .get().addShutdownHook(cleanupHook, 
ApplicationMaster.SHUTDOWN_HOOK_PRIORITY)
+  // Use higher priority than FileSystem.
+  assert(ApplicationMaster.SHUTDOWN_HOOK_PRIORITY > 
FileSystem.SHUTDOWN_HOOK_PRIORITY)
+  ShutdownHookManager
+.get().addShutdownHook(cleanupHook, 
ApplicationMaster.SHUTDOWN_HOOK_PRIORITY)
 
-// Call this to force generation of secret so it gets populated into 
the
-// Hadoop UGI. This has to happen before the startUserClass which does 
a
-// doAs in order for the credentials to be passed on to the executor 
containers.
-val securityMgr = new SecurityManager(sparkConf)
+  // Call this to force generation of secret so it gets populated into the
+  // Hadoop UGI. This has to happen before the startUserClass which does a
+  // doAs in order for the credentials to be passed on to the executor containers.
+  val securityMgr = new SecurityManager(sparkConf)
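
The interesting decision in this hunk is when the shutdown hook unregisters
from the RM: unregistering makes the attempt final, so it should only happen
on success or on the last allowed attempt. A hedged sketch of just that
predicate, where attemptId, maxAttempts and succeeded stand in for the
client/yarnConf lookups in the real code:

    object UnregisterDecisionDemo {
      // True when the AM should unregister from the RM (and clean the staging
      // dir); false leaves the registration alive so YARN can retry the AM.
      def shouldUnregister(attemptId: Int, maxAttempts: Int, succeeded: Boolean): Boolean = {
        val isLastAttempt = attemptId >= maxAttempts
        succeeded || isLastAttempt
      }

      def main(args: Array[String]): Unit = {
        // A failed first attempt out of two keeps the registration for a retry.
        assert(!shouldUnregister(attemptId = 1, maxAttempts = 2, succeeded = false))
        // A failure on the final attempt must unregister and clean up.
        assert(shouldUnregister(attemptId = 2, maxAttempts = 2, succeeded = false))
      }
    }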

[GitHub] spark pull request: [SPARK-3627] - [yarn] - fix exit code and fina...

2014-09-29 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/2577#discussion_r18174477
  
--- Diff: 
yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala 
---

[GitHub] spark pull request: [SPARK-3627] - [yarn] - fix exit code and fina...

2014-09-29 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/2577#discussion_r18174440
  
--- Diff: 
yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala 
---
@@ -232,32 +285,27 @@ private[spark] class ApplicationMaster(args: 
ApplicationMasterArguments,
   override def run() {
 var failureCount = 0
 
-while (!finished) {
+while (!finished && !Thread.currentThread().isInterrupted()) {
   try {
-checkNumExecutorsFailed()
-if (!finished) {
+
--- End diff --

nit: blank line not needed
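
The loop condition being reviewed here, !finished &&
!Thread.currentThread().isInterrupted(), is the standard shape for a poller
that can be stopped from outside. A small generic sketch of that shape
(ReporterLoopDemo is a made-up name, not Spark's reporter thread):

    object ReporterLoopDemo {
      @volatile var finished = false

      // Polls until either finished flips or the thread is interrupted.
      def loop(intervalMs: Long)(poll: () => Unit): Unit =
        while (!finished && !Thread.currentThread().isInterrupted) {
          poll()
          try Thread.sleep(intervalMs)
          catch {
            case _: InterruptedException =>
              // sleep() clears the interrupt status when it throws; restore
              // it so the while condition observes the interrupt and exits.
              Thread.currentThread().interrupt()
          }
        }
    }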


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3627] - [yarn] - fix exit code and fina...

2014-09-29 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/2577#discussion_r18174328
  
--- Diff: 
yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala 
---

[GitHub] spark pull request: [SPARK-3627] - [yarn] - fix exit code and fina...

2014-09-29 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/2577#discussion_r18174075
  
--- Diff: 
yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala 
---

[GitHub] spark pull request: [SPARK-3627] - [yarn] - fix exit code and fina...

2014-09-29 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/2577#discussion_r18173150
  
--- Diff: 
yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala 
---
@@ -450,6 +539,15 @@ object ApplicationMaster extends Logging {
 
   val SHUTDOWN_HOOK_PRIORITY: Int = 30
 
+  // exit codes for different causes, no reason behind the values
--- End diff --

The ApplicationMaster is not an executor, so I chose not to use it. It also 
doesn't have the same exit reasons; AM-specific codes could be useful if the 
user has an exit code and wants to know what it matches up to.
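
A hedged sketch of that design: a dedicated object of AM exit codes with a
reverse lookup, so a numeric code in a container log can be decoded. The
constant names below match ones quoted elsewhere in this thread; the non-zero
values are placeholders, not necessarily the ones merged.

    object AMExitCodesDemo {
      val EXIT_SUCCESS = 0
      val EXIT_UNCAUGHT_EXCEPTION = 10  // placeholder value
      val EXIT_SECURITY = 14            // placeholder value

      // What a user would consult when all they have is the exit code.
      def explain(code: Int): String = code match {
        case EXIT_SUCCESS => "user class finished successfully"
        case EXIT_UNCAUGHT_EXCEPTION => "uncaught exception in the AM"
        case EXIT_SECURITY => "failed to install the security manager"
        case other => s"unrecognized AM exit code: $other"
      }
    }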


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3627] - [yarn] - fix exit code and fina...

2014-09-29 Thread witgo
Github user witgo commented on a diff in the pull request:

https://github.com/apache/spark/pull/2577#discussion_r18170801
  
--- Diff: 
yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala 
---
@@ -450,6 +539,15 @@ object ApplicationMaster extends Logging {
 
   val SHUTDOWN_HOOK_PRIORITY: Int = 30
 
+  // exit codes for different causes, no reason behind the values
--- End diff --

We can use this class?

[ExecutorExitCode](https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/executor/ExecutorExitCode.scala)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3627] - [yarn] - fix exit code and fina...

2014-09-29 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2577#issuecomment-57187866
  
  [QA tests have 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20975/consoleFull)
 for   PR 2577 at commit 
[`32f4dfa`](https://github.com/apache/spark/commit/32f4dfa5d0778e3e7bd7c5da27c71f8c3f5930f6).
 * This patch **passes** unit tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3627] - [yarn] - fix exit code and fina...

2014-09-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/2577#issuecomment-57187881
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20975/


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3627] - [yarn] - fix exit code and fina...

2014-09-29 Thread tgravescs
Github user tgravescs commented on the pull request:

https://github.com/apache/spark/pull/2577#issuecomment-57185274
  
Also note this does change everything to allow YARN to retry. Previously, when 
it hit the maximum number of executor failures it didn't retry the AM. I 
waffled back and forth on this one. At first the thought was that if that many 
executors are dying it's probably an issue with the user code, but then again, 
with a really long-running job I can think of situations where you'd want it 
to retry. Anyone have a strong opinion on that?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3627] - [yarn] - fix exit code and fina...

2014-09-29 Thread tgravescs
Github user tgravescs commented on the pull request:

https://github.com/apache/spark/pull/2577#issuecomment-57176392
  
@witgo, can you verify this covers https://github.com/apache/spark/pull/2311 ?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3627] - [yarn] - fix exit code and fina...

2014-09-29 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2577#issuecomment-57176963
  
  [QA tests have 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20975/consoleFull)
 for   PR 2577 at commit 
[`32f4dfa`](https://github.com/apache/spark/commit/32f4dfa5d0778e3e7bd7c5da27c71f8c3f5930f6).
 * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3627] - [yarn] - fix exit code and fina...

2014-09-29 Thread tgravescs
GitHub user tgravescs opened a pull request:

https://github.com/apache/spark/pull/2577

[SPARK-3627] - [yarn] - fix exit code and final status reporting to RM

See the description and what's handled in the JIRA comment: 
https://issues.apache.org/jira/browse/SPARK-3627?focusedCommentId=14150013&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14150013

This does not handle yarn-client mode reporting of the driver to the AM. 
I think that should be handled when we make it an unmanaged AM.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tgravescs/spark SPARK-3627

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/2577.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2577


commit d3cc80050dfb472c426d1c1399edd9134e53a563
Author: Thomas Graves 
Date:   2014-09-29T13:38:43Z

SPARK-3627 - yarn - fix exit code and final status reporting to RM

commit f0b65199f50bae99aa89ea3a5915b610d7134392
Author: Thomas Graves 
Date:   2014-09-29T15:05:29Z

change order of cleanup staging dir

commit 32f4dfa5d0778e3e7bd7c5da27c71f8c3f5930f6
Author: Thomas Graves 
Date:   2014-09-29T15:10:09Z

switch back




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org