[GitHub] spark pull request: [SPARK-3293] yarn's web show SUCCEEDED when ...

2014-09-29 Thread witgo
Github user witgo closed the pull request at:

https://github.com/apache/spark/pull/2311


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3293] yarn's web show SUCCEEDED when ...

2014-09-25 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2311#issuecomment-56821516
  
  [QA tests have 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20808/consoleFull)
 for   PR 2311 at commit 
[`9655aa8`](https://github.com/apache/spark/commit/9655aa8c8e99f49e4598d5f37548afeb6f1d9d6a).
 * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3293] yarn's web show SUCCEEDED when ...

2014-09-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/2311#issuecomment-56829283
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20808/


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3293] yarn's web show SUCCEEDED when ...

2014-09-25 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2311#issuecomment-56829268
  
  [QA tests have 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20808/consoleFull)
 for   PR 2311 at commit 
[`9655aa8`](https://github.com/apache/spark/commit/9655aa8c8e99f49e4598d5f37548afeb6f1d9d6a).
 * This patch **fails** unit tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3293] yarn's web show SUCCEEDED when ...

2014-09-25 Thread tgravescs
Github user tgravescs commented on the pull request:

https://github.com/apache/spark/pull/2311#issuecomment-56910981
  
@witgo There are cases not covered by this pr that I would like to fix up 
dealing with exit codes and final status.  Would you mind if I just pull these 
changes into mine?  I want to try to fix it up across the board.  I hope to 
have something up tomorrow if all the testing goes well.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3293] yarn's web show SUCCEEDED when ...

2014-09-25 Thread witgo
Github user witgo commented on the pull request:

https://github.com/apache/spark/pull/2311#issuecomment-56912003
  
Ok, no problem.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3293] yarn's web show SUCCEEDED when ...

2014-09-25 Thread tgravescs
Github user tgravescs commented on the pull request:

https://github.com/apache/spark/pull/2311#issuecomment-56912131
  
also maybe I'm missing something but I don't see how the changes help in 
yarn-client mode? The changes you made were on the application master running 
on a separate host then the driver itself, startUserClass is only used in 
cluster mode, and when the disconnect happens in client mode we always report 
success.  


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3293] yarn's web show SUCCEEDED when ...

2014-09-25 Thread witgo
Github user witgo commented on the pull request:

https://github.com/apache/spark/pull/2311#issuecomment-56914209
  
Yes, this does not involve `org.apache.spark.deploy.yarn.Client` class, 
which run outside the cluster.
We should call `YarnClient.killApplication` when an uncaught exception is 
thrown.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3293] yarn's web show SUCCEEDED when ...

2014-09-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2311#issuecomment-56525799
  
  [QA tests have 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20707/consoleFull)
 for   PR 2311 at commit 
[`617efef`](https://github.com/apache/spark/commit/617efeff1a6bec059fd24652fd783f54dbb9915c).
 * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3293] yarn's web show SUCCEEDED when ...

2014-09-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2311#issuecomment-56524702
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20706/


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3293] yarn's web show SUCCEEDED when ...

2014-09-23 Thread witgo
Github user witgo commented on the pull request:

https://github.com/apache/spark/pull/2311#issuecomment-56525297
  
Jenkins, retest this please.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3293] yarn's web show SUCCEEDED when ...

2014-09-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2311#issuecomment-56537068
  
  [QA tests have 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20707/consoleFull)
 for   PR 2311 at commit 
[`617efef`](https://github.com/apache/spark/commit/617efeff1a6bec059fd24652fd783f54dbb9915c).
 * This patch **passes** unit tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3293] yarn's web show SUCCEEDED when ...

2014-09-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2311#issuecomment-56537077
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20707/


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3293] yarn's web show SUCCEEDED when ...

2014-09-23 Thread mateiz
Github user mateiz commented on a diff in the pull request:

https://github.com/apache/spark/pull/2311#discussion_r17930410
  
--- Diff: 
yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala 
---
@@ -91,7 +94,11 @@ private[spark] class ApplicationMaster(args: 
ApplicationMasterArguments,
 if (sc != null) {
   logInfo(Invoking sc stop from shutdown hook)
   sc.stop()
-  finish(FinalApplicationStatus.SUCCEEDED)
+}
+
+// Shuts down the AM.
+if (!finished) {
--- End diff --

Just to understand, is the YARN retry logic there for resubmitting the AM? 
Most Spark applications don't support that, since their driver contains a lot 
of state. (Though I guess applications submitted in yarn-client mode might).

In general, if the user's driver is running within the AM, I would say it's 
successful if the driver's main() returns 0 or calls System.exit(0). If the 
driver is running remotely, it should tell the AM when to shut down cleanly 
when it closes its SparkContext or exits.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3293] yarn's web show SUCCEEDED when ...

2014-09-23 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/2311#discussion_r17940217
  
--- Diff: 
yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala 
---
@@ -91,7 +94,11 @@ private[spark] class ApplicationMaster(args: 
ApplicationMasterArguments,
 if (sc != null) {
   logInfo(Invoking sc stop from shutdown hook)
   sc.stop()
-  finish(FinalApplicationStatus.SUCCEEDED)
+}
+
+// Shuts down the AM.
+if (!finished) {
--- End diff --

Yes the retry logic is for resubmitting the AM in the case it was on 
machine that went down, slow, network loss, etc..

Ok so we'll go with if it System.exit(0) or returns 0 then it succeeded and 
everything else for now is failure and will retry if configured to retry.  If 
we don't want it to retry in certain cases we would need more information from 
the driver.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3293] yarn's web show SUCCEEDED when ...

2014-09-23 Thread mateiz
Github user mateiz commented on a diff in the pull request:

https://github.com/apache/spark/pull/2311#discussion_r17946079
  
--- Diff: 
yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala 
---
@@ -91,7 +94,11 @@ private[spark] class ApplicationMaster(args: 
ApplicationMasterArguments,
 if (sc != null) {
   logInfo(Invoking sc stop from shutdown hook)
   sc.stop()
-  finish(FinalApplicationStatus.SUCCEEDED)
+}
+
+// Shuts down the AM.
+if (!finished) {
--- End diff --

Well my feeling is that if the driver program crashed, it's going to crash 
again the next time you try. Is that not the case?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3293] yarn's web show SUCCEEDED when ...

2014-09-23 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/2311#discussion_r17946116
  
--- Diff: 
yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala 
---
@@ -91,7 +94,11 @@ private[spark] class ApplicationMaster(args: 
ApplicationMasterArguments,
 if (sc != null) {
   logInfo(Invoking sc stop from shutdown hook)
   sc.stop()
-  finish(FinalApplicationStatus.SUCCEEDED)
+}
+
+// Shuts down the AM.
+if (!finished) {
--- End diff --

Well, that depends a lot on the crash. If the host where the AM was running 
just dies, running it on a different host might succeed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3293] yarn's web show SUCCEEDED when ...

2014-09-23 Thread mateiz
Github user mateiz commented on a diff in the pull request:

https://github.com/apache/spark/pull/2311#discussion_r17946154
  
--- Diff: 
yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala 
---
@@ -91,7 +94,11 @@ private[spark] class ApplicationMaster(args: 
ApplicationMasterArguments,
 if (sc != null) {
   logInfo(Invoking sc stop from shutdown hook)
   sc.stop()
-  finish(FinalApplicationStatus.SUCCEEDED)
+}
+
+// Shuts down the AM.
+if (!finished) {
--- End diff --

Basically there are a few cases:
- If the machine with the AM dies, then sure, YARN could relaunch it 
(hopefully that is user-configurable in case their code has weird side effects)
- If the driver returns 0, it's finished
- If the driver crashes for another reason, it could be a bug in the 
driver; I'm not sure we should just resubmit in that case


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3293] yarn's web show SUCCEEDED when ...

2014-09-23 Thread mateiz
Github user mateiz commented on a diff in the pull request:

https://github.com/apache/spark/pull/2311#discussion_r17948054
  
--- Diff: 
yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala 
---
@@ -91,7 +94,11 @@ private[spark] class ApplicationMaster(args: 
ApplicationMasterArguments,
 if (sc != null) {
   logInfo(Invoking sc stop from shutdown hook)
   sc.stop()
-  finish(FinalApplicationStatus.SUCCEEDED)
+}
+
+// Shuts down the AM.
+if (!finished) {
--- End diff --

@vanzin But this code is *in* the AM, isn't it? That's what I'm saying 
above.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3293] yarn's web show SUCCEEDED when ...

2014-09-23 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/2311#discussion_r17951431
  
--- Diff: 
yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala 
---
@@ -91,7 +94,11 @@ private[spark] class ApplicationMaster(args: 
ApplicationMasterArguments,
 if (sc != null) {
   logInfo(Invoking sc stop from shutdown hook)
   sc.stop()
-  finish(FinalApplicationStatus.SUCCEEDED)
+}
+
+// Shuts down the AM.
+if (!finished) {
--- End diff --

Yeah, unless we want to add more logic about trying to figure out what 
shouldn't be retried, the easy thing is to just retry on any failure.  
Obviously if the machine dies this code won't be running, its more that 
something weird happens causing it to crash or exit badly.  

There are actually some potential issues with rerunning the AM though.  One 
is what we refer to as split brain (one AM losing connection from RM but still 
running so it starts a second AM) and both write to the same output dir and 
cause issues with the output data.  I filed a jira for this to try to handle in 
Spark AM.

The second occurs if the fist run had committed its output and we rerun it 
when shouldn't. 
 The reason we don't want that to happen is to prevent data corruption. 
Many times in MR one job will start once anothers output is committed, so if it 
was to get changed out from under them by a rerun of the AM it could lose data. 
 I'm not sure that same kind of check is as easy with Spark.  

MR handles both of those cases. Obviously if your MR job is writing to some 
other service or using custom fileoutput or has some other side effects its up 
to the user to guarantee that it can be rerun.  

I'm assuming its the users responsibility with Spark since spark can rerun 
tasks/stages on failure. Any input on that Matei?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3293] yarn's web show SUCCEEDED when ...

2014-09-23 Thread mateiz
Github user mateiz commented on a diff in the pull request:

https://github.com/apache/spark/pull/2311#discussion_r17951694
  
--- Diff: 
yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala 
---
@@ -91,7 +94,11 @@ private[spark] class ApplicationMaster(args: 
ApplicationMasterArguments,
 if (sc != null) {
   logInfo(Invoking sc stop from shutdown hook)
   sc.stop()
-  finish(FinalApplicationStatus.SUCCEEDED)
+}
+
+// Shuts down the AM.
+if (!finished) {
--- End diff --

Our `saveAs*File` operations in Spark also avoid running if the output 
directly already exists, so there is that support in there. But for Spark I'd 
make this AM rerun configurable. The reason is that some Spark apps might be 
accessing weird data sources, serving a UI to outside users, or just doing 
fairly complicate logic inside that is hard to retry. These are things that MR 
jobs don't do as commonly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3293] yarn's web show SUCCEEDED when ...

2014-09-22 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2311#issuecomment-56352566
  
  [QA tests have 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20653/consoleFull)
 for   PR 2311 at commit 
[`5f0883d`](https://github.com/apache/spark/commit/5f0883d8ae00fe632ff237965fb5bc73c026c7c2).
 * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3293] yarn's web show SUCCEEDED when ...

2014-09-22 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2311#issuecomment-56358732
  
  [QA tests have 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20653/consoleFull)
 for   PR 2311 at commit 
[`5f0883d`](https://github.com/apache/spark/commit/5f0883d8ae00fe632ff237965fb5bc73c026c7c2).
 * This patch **passes** unit tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3293] yarn's web show SUCCEEDED when ...

2014-09-22 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/2311#discussion_r17863114
  
--- Diff: 
yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala 
---
@@ -119,13 +126,7 @@ private[spark] class ApplicationMaster(args: 
ApplicationMasterArguments,
 } else {
   runExecutorLauncher(securityMgr)
 }
-
-if (finalStatus != FinalApplicationStatus.UNDEFINED) {
--- End diff --

Since you're changing this logic, you need to modify `runExecutorLauncher` 
so that it correctly reports an error on failure. Right now it will always exit 
with 0 and will not report the final status to the RM.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3293] yarn's web show SUCCEEDED when ...

2014-09-22 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/2311#discussion_r17863181
  
--- Diff: 
yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala 
---
@@ -119,13 +126,7 @@ private[spark] class ApplicationMaster(args: 
ApplicationMasterArguments,
 } else {
   runExecutorLauncher(securityMgr)
 }
-
-if (finalStatus != FinalApplicationStatus.UNDEFINED) {
-  finish(finalStatus)
-  0
-} else {
-  1
-}
+exitCode
--- End diff --

Feels like with your changes there's a possibility that `finish()` might 
never be called. Might be better to add a call to that method here before 
returning.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3293] yarn's web show SUCCEEDED when ...

2014-09-22 Thread tgravescs
Github user tgravescs commented on the pull request:

https://github.com/apache/spark/pull/2311#issuecomment-56408312
  
I've been having a multiple different issues with the success/failure 
reporting (SPARK-3627).  Does this handle all of those?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3293] yarn's web show SUCCEEDED when ...

2014-09-22 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/2311#discussion_r17864288
  
--- Diff: 
yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala 
---
@@ -119,13 +126,7 @@ private[spark] class ApplicationMaster(args: 
ApplicationMasterArguments,
 } else {
   runExecutorLauncher(securityMgr)
 }
-
-if (finalStatus != FinalApplicationStatus.UNDEFINED) {
-  finish(finalStatus)
-  0
-} else {
-  1
-}
+exitCode
--- End diff --

Nevermind. I see you added the call to the shutdown hook. Still, the 
`finalStatus` needs to be properly set in `runExecutorLauncher`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3293] yarn's web show SUCCEEDED when ...

2014-09-22 Thread vanzin
Github user vanzin commented on the pull request:

https://github.com/apache/spark/pull/2311#issuecomment-56410208
  
@tgravescs this change should handle uncaught exceptions and explicit 
`System.exit` in the driver code. Not sure if that covers all the issues you're 
seeing.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3293] yarn's web show SUCCEEDED when ...

2014-09-22 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/2311#discussion_r17868237
  
--- Diff: 
yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala 
---
@@ -29,6 +29,7 @@ import org.apache.hadoop.yarn.api._
 import org.apache.hadoop.yarn.api.records._
 import org.apache.hadoop.yarn.conf.YarnConfiguration
 
+import org.apache.spark.executor.{ExecutorUncaughtExceptionHandler, 
ExecutorExitCode}
--- End diff --

I don't see these used anywhere?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3293] yarn's web show SUCCEEDED when ...

2014-09-22 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/2311#discussion_r17882325
  
--- Diff: 
yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala 
---
@@ -91,7 +94,11 @@ private[spark] class ApplicationMaster(args: 
ApplicationMasterArguments,
 if (sc != null) {
   logInfo(Invoking sc stop from shutdown hook)
   sc.stop()
-  finish(FinalApplicationStatus.SUCCEEDED)
+}
+
+// Shuts down the AM.
+if (!finished) {
--- End diff --

I think this could cause problems with YARN retry logic.  We shouldn't be 
finishing (unregistering with RM) unless we've explicitly succeeded or failed 
and don't want a retry to happen in those weird cases.  Since this is in 
shutdown hook I'm not sure we explicitly know that. 

i guess it comes down to us defining when the spark app is cleanly exiting. 
 This could be failed, killed, or succeeded.  Unfortunately I'm not sure spark 
really has this final reporting. 




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3293] yarn's web show SUCCEEDED when ...

2014-09-22 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/2311#discussion_r17884602
  
--- Diff: 
yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala 
---
@@ -91,7 +94,11 @@ private[spark] class ApplicationMaster(args: 
ApplicationMasterArguments,
 if (sc != null) {
   logInfo(Invoking sc stop from shutdown hook)
   sc.stop()
-  finish(FinalApplicationStatus.SUCCEEDED)
+}
+
+// Shuts down the AM.
+if (!finished) {
--- End diff --

@mateiz @pwendell  is there anything in spark that we can/should use  where 
we consider the application in a final state (either success or failure) such 
that we wouldn't want to retry it?On MR there is explicit states for 
finishing and it also has checks for the committed output.   Spark I don't 
think is quite as straight forward.  Do we have any guidance on what an 
application should do on error and success?

ie can we say if sc.stop() is called then it finished cleanly, or perhaps 
if System.exit(0) is called. Should we be peaking at the metrics or something 
to see if any jobs failed with the application. thoughts?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3293] yarn's web show SUCCEEDED when ...

2014-09-21 Thread tgravescs
Github user tgravescs commented on the pull request:

https://github.com/apache/spark/pull/2311#issuecomment-56308622
  
Jenkins, retest this please.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3293] yarn's web show SUCCEEDED when ...

2014-09-21 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2311#issuecomment-56308708
  
  [QA tests have 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20631/consoleFull)
 for   PR 2311 at commit 
[`7efcca0`](https://github.com/apache/spark/commit/7efcca0297802c6cfe685b61ad5c79f639cc5b0d).
 * This patch **does not** merge cleanly!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3293] yarn's web show SUCCEEDED when ...

2014-09-21 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2311#issuecomment-56311778
  
  [QA tests have 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20631/consoleFull)
 for   PR 2311 at commit 
[`7efcca0`](https://github.com/apache/spark/commit/7efcca0297802c6cfe685b61ad5c79f639cc5b0d).
 * This patch **fails** unit tests.
 * This patch **does not** merge cleanly!



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3293] yarn's web show SUCCEEDED when ...

2014-09-09 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2311#issuecomment-54990398
  
  [QA tests have 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20042/consoleFull)
 for   PR 2311 at commit 
[`7efcca0`](https://github.com/apache/spark/commit/7efcca0297802c6cfe685b61ad5c79f639cc5b0d).
 * This patch **fails** unit tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3293] yarn's web show SUCCEEDED when ...

2014-09-08 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2311#issuecomment-54802000
  
  [QA tests have 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19974/consoleFull)
 for   PR 2311 at commit 
[`2ebe62e`](https://github.com/apache/spark/commit/2ebe62edd3056f842e16643c50cd77b04a6cbce5).
 * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3293] yarn's web show SUCCEEDED when ...

2014-09-08 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2311#issuecomment-54806033
  
  [QA tests have 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19974/consoleFull)
 for   PR 2311 at commit 
[`2ebe62e`](https://github.com/apache/spark/commit/2ebe62edd3056f842e16643c50cd77b04a6cbce5).
 * This patch **fails** unit tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class NoExitsException(exitCode: Int) extends 
SecurityException`



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3293] yarn's web show SUCCEEDED when ...

2014-09-08 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2311#issuecomment-54819294
  
  [QA tests have 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19979/consoleFull)
 for   PR 2311 at commit 
[`2ebe62e`](https://github.com/apache/spark/commit/2ebe62edd3056f842e16643c50cd77b04a6cbce5).
 * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3293] yarn's web show SUCCEEDED when ...

2014-09-08 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2311#issuecomment-54830677
  
  [QA tests have 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19979/consoleFull)
 for   PR 2311 at commit 
[`2ebe62e`](https://github.com/apache/spark/commit/2ebe62edd3056f842e16643c50cd77b04a6cbce5).
 * This patch **passes** unit tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class NoExitsException(exitCode: Int) extends 
SecurityException`



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3293] yarn's web show SUCCEEDED when ...

2014-09-08 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/2311#discussion_r17251037
  
--- Diff: 
yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala 
---
@@ -355,11 +355,20 @@ private[spark] class ApplicationMaster(args: 
ApplicationMasterArguments,
   private def startUserClass(): Thread = {
 logInfo(Starting the user JAR in a separate Thread)
 System.setProperty(spark.executor.instances, 
args.numExecutors.toString)
+var stopped = false
+case class NoExitsException(exitCode: Int) extends SecurityException
 val mainMethod = Class.forName(args.userClass, false,
   Thread.currentThread.getContextClassLoader).getMethod(main, 
classOf[Array[String]])
 
 val t = new Thread {
   override def run() {
+System.setSecurityManager(new java.lang.SecurityManager() {
+  override def checkExit(paramInt: Int) {
+if (!stopped) {
+  throw new NoExitsException(paramInt)
--- End diff --

Instead of throwing the exception, why not just set `exitStatus` directly 
and avoid having a custom exception? You can also use that as a trigger to call 
finish() if it hasn't been called yet, with a nice error message about the 
app having called `System.exit`.

That avoids messing with the app's exception handling, since very few 
people realize `System.exit` can throw and thus don't really plan for it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3293] yarn's web show SUCCEEDED when ...

2014-09-08 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/2311#discussion_r17251254
  
--- Diff: 
yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala 
---
@@ -369,7 +378,17 @@ private[spark] class ApplicationMaster(args: 
ApplicationMasterArguments,
   // Some apps have System.exit(0) at the end.  The user thread 
will stop here unless
   // it has an uncaught exception thrown out.  It needs a shutdown 
hook to set SUCCEEDED.
   status = FinalApplicationStatus.SUCCEEDED
-} finally {
+} catch {
+  case NoExitsException(exitCode) =
+exitStatus = exitCode
+if (exitStatus == 0) {
+  status = FinalApplicationStatus.SUCCEEDED
+}
+  case e: Throwable =
--- End diff --

Better not to catch `Throwable`, and don't set `exitCode` to -1 (in the 
shell, that translates to an exit code of 255, which is generally interpreted 
to mean different things than just process exited [1].)

[1] Some discussion: http://en.wikipedia.org/wiki/Exit_status



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3293] yarn's web show SUCCEEDED when ...

2014-09-07 Thread witgo
GitHub user witgo opened a pull request:

https://github.com/apache/spark/pull/2311

[SPARK-3293] yarn's web show SUCCEEDED when the driver throw a exception 
in yarn-client



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/witgo/spark SPARK-3293

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/2311.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2311


commit 3828707a9b67f0b54a8f6a0a9b36307c2ae14429
Author: GuoQiang Li wi...@qq.com
Date:   2014-09-03T06:00:52Z

yarn check exit code




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3293] yarn's web show SUCCEEDED when ...

2014-09-07 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2311#issuecomment-54747134
  
  [QA tests have 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19948/consoleFull)
 for   PR 2311 at commit 
[`3828707`](https://github.com/apache/spark/commit/3828707a9b67f0b54a8f6a0a9b36307c2ae14429).
 * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3293] yarn's web show SUCCEEDED when ...

2014-09-07 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2311#issuecomment-54748722
  
  [QA tests have 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19948/consoleFull)
 for   PR 2311 at commit 
[`3828707`](https://github.com/apache/spark/commit/3828707a9b67f0b54a8f6a0a9b36307c2ae14429).
 * This patch **fails** unit tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class NoExitsException(exitCode: Int) extends 
SecurityException`



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org