[GitHub] spark pull request: [YARN][SPARK-4929] Bug fix: fix the yarn-clien...

2015-01-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3771#issuecomment-68997280
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25154/


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [YARN][SPARK-4929] Bug fix: fix the yarn-clien...

2015-01-07 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3771#issuecomment-68997276
  
  [Test build #25154 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25154/consoleFull) for PR 3771 at commit [`c02bfcc`](https://github.com/apache/spark/commit/c02bfcca0eb73246dc11a9b2a0ef80053d85a44b).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [YARN][SPARK-4929] Bug fix: fix the yarn-clien...

2015-01-07 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3771#issuecomment-68990989
  
  [Test build #25154 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25154/consoleFull) for PR 3771 at commit [`c02bfcc`](https://github.com/apache/spark/commit/c02bfcca0eb73246dc11a9b2a0ef80053d85a44b).
 * This patch merges cleanly.





[GitHub] spark pull request: [YARN][SPARK-4929] Bug fix: fix the yarn-clien...

2015-01-07 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/3771





[GitHub] spark pull request: [YARN][SPARK-4929] Bug fix: fix the yarn-clien...

2015-01-07 Thread tgravescs
Github user tgravescs commented on the pull request:

https://github.com/apache/spark/pull/3771#issuecomment-69025846
  
Thanks @SaintBacchus, the changes look good.





[GitHub] spark pull request: [YARN][SPARK-4929] Bug fix: fix the yarn-clien...

2015-01-03 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3771#issuecomment-68617626
  
  [Test build #25021 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25021/consoleFull) for PR 3771 at commit [`c02bfcc`](https://github.com/apache/spark/commit/c02bfcca0eb73246dc11a9b2a0ef80053d85a44b).
 * This patch merges cleanly.





[GitHub] spark pull request: [YARN][SPARK-4929] Bug fix: fix the yarn-clien...

2015-01-03 Thread SaintBacchus
Github user SaintBacchus commented on the pull request:

https://github.com/apache/spark/pull/3771#issuecomment-68617680
  
@tgravescs your comment is much clearer than what I wrote, so I have used it 
instead of mine. Thx.

When a YARN HA event happens, the previous ApplicationMaster will throw a
```
java.io.IOException: Failed on local exception: java.io.EOFException
``` 
So in yarn-cluster mode the exception is caught and the final status is changed.

But in yarn-client mode it goes directly into the shutdown hook, which causes the 
problem.
I don't think it has reached the `DisassociatedEvent` yet, because the 
driver is still alive.
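
As a rough illustration of the two paths described above, here is a minimal, self-contained Scala sketch. It is not Spark's code: `FinalStatus` is a local stand-in for YARN's `FinalApplicationStatus`, `statusSeenByHook` and `userCodeRunsInAm` are hypothetical names, and marking the status as `Failed` in the cluster-mode branch is an assumption (the comment only says the status is changed).

```scala
import java.io.{EOFException, IOException}

object ShutdownPathSketch {
  // Local stand-in for YARN's FinalApplicationStatus, so the sketch
  // compiles without Hadoop on the classpath.
  sealed trait FinalStatus
  case object Succeeded extends FinalStatus
  case object Undefined extends FinalStatus
  case object Failed extends FinalStatus

  // Returns the status a shutdown hook would observe after an RM failover.
  def statusSeenByHook(default: FinalStatus, userCodeRunsInAm: Boolean): FinalStatus = {
    var finalStatus: FinalStatus = default
    if (userCodeRunsInAm) {
      // Cluster-mode path: the exception surfaces inside the AM process,
      // is caught there, and the status is updated (assumed: to Failed).
      try {
        throw new IOException(
          "Failed on local exception: java.io.EOFException", new EOFException())
      } catch {
        case _: IOException => finalStatus = Failed
      }
    }
    // Client-mode path: nothing in the AM touches finalStatus,
    // so the hook still sees the default.
    finalStatus
  }
}
```

With a default of `Succeeded`, the client-mode path (`userCodeRunsInAm = false`) leaves the hook looking at `Succeeded`, which is the situation this pull request is trying to avoid.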





[GitHub] spark pull request: [YARN][SPARK-4929] Bug fix: fix the yarn-clien...

2015-01-03 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3771#issuecomment-68619465
  
  [Test build #25021 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25021/consoleFull) for PR 3771 at commit [`c02bfcc`](https://github.com/apache/spark/commit/c02bfcca0eb73246dc11a9b2a0ef80053d85a44b).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [YARN][SPARK-4929] Bug fix: fix the yarn-clien...

2015-01-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3771#issuecomment-68619468
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25021/





[GitHub] spark pull request: [YARN][SPARK-4929] Bug fix: fix the yarn-clien...

2014-12-30 Thread tgravescs
Github user tgravescs commented on the pull request:

https://github.com/apache/spark/pull/3771#issuecomment-68367226
  
@SaintBacchus so I'm still a bit unclear on the exact scenario. I just want 
to make sure we are handling everything properly, so I want to make sure I 
understand fully.

So this is when the RM goes down and is being brought back up or fails over 
to a standby. At that point it restarts the applications to start a new 
attempt. The shutdown hook is run and the code you mention above runs and 
unregisters. I understand client mode can't set it because the SparkContext is not 
in the same process. The thing that is unclear to me is how cluster mode is 
setting the finalStatus to something other than succeeded. Is the SparkContext 
being signalled and then throwing an exception so that startUserClass catches it 
and marks it as failed?






[GitHub] spark pull request: [YARN][SPARK-4929] Bug fix: fix the yarn-clien...

2014-12-30 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/3771#discussion_r22352924
  
--- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala ---
@@ -153,6 +153,19 @@ private[spark] class ApplicationMaster(args: ApplicationMasterArguments,
   }
 
   /**
+   * we should distinct the default final status between client and cluster,
+   * because the SUCCEEDED status may cause the HA failed in client mode and
+   * UNDEFINED may cause the error reporter in cluster when using sys.exit.
+   */
+  final def getDefaultFinalStatus() = {
--- End diff --

I assume we are hitting the logic on line 108 above in if (!finished) {... 
I think that comment and code are based on the final status defaulting to 
success. At the very least we should update that comment to explain what is 
going to happen in client vs cluster mode. Since the DisassociatedEvent exits 
with success for client mode, I think making the default undefined for client 
mode is fine.





[GitHub] spark pull request: [YARN][SPARK-4929] Bug fix: fix the yarn-clien...

2014-12-30 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/3771#discussion_r22353308
  
--- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala ---
@@ -153,6 +153,19 @@ private[spark] class ApplicationMaster(args: ApplicationMasterArguments,
   }
 
   /**
+   * we should distinct the default final status between client and cluster,
--- End diff --

Can we clarify this comment a little? Perhaps something more like the below 
(feel free to reword):

Set the default final application status for client mode to UNDEFINED to 
handle the case where YARN HA restarts the application, so that it properly 
retries. Set the final status to SUCCEEDED in cluster mode to handle the case 
where the user calls System.exit from the application code.
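
For reference, a method carrying a comment along these lines might look roughly like the sketch below. This is only an illustration under stated assumptions, not the code in the PR: `FinalStatus` is a local stand-in for YARN's `FinalApplicationStatus`, and `defaultFinalStatus` and its `isClusterMode` parameter are hypothetical names.

```scala
object DefaultStatusSketch {
  // Local stand-in for YARN's FinalApplicationStatus.
  sealed trait FinalStatus
  case object Succeeded extends FinalStatus
  case object Undefined extends FinalStatus

  /**
   * Default final application status, chosen by deploy mode:
   *  - client mode: Undefined, so that the application is retried properly
   *    if YARN HA restarts it;
   *  - cluster mode: Succeeded, so that a System.exit call from the
   *    application code is not reported as an error.
   */
  def defaultFinalStatus(isClusterMode: Boolean): FinalStatus =
    if (isClusterMode) Succeeded else Undefined
}
```

Keeping the choice in one small method means the rationale for each mode lives next to the value it explains.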






[GitHub] spark pull request: [YARN][SPARK-4929] Bug fix: fix the yarn-clien...

2014-12-29 Thread tgravescs
Github user tgravescs commented on the pull request:

https://github.com/apache/spark/pull/3771#issuecomment-68260017
  
Can you please be a bit more specific and describe exactly what happens 
here? Are you referring to when the RM has to fail over, or to a rolling upgrade? 
Is the container brought down and then back up again? Please just describe 
the scenario and what exactly is happening.





[GitHub] spark pull request: [YARN][SPARK-4929] Bug fix: fix the yarn-clien...

2014-12-29 Thread SaintBacchus
Github user SaintBacchus commented on the pull request:

https://github.com/apache/spark/pull/3771#issuecomment-68321575
  
What @tgravescs says is close to the scenario, but it happens while the 
RM is recovering after going down.
```scala
if (finalStatus == FinalApplicationStatus.SUCCEEDED || isLastAttempt) {
  unregister(finalStatus, finalMsg)
  cleanupStagingDir(fs)
}
```
In this code, `isLastAttempt` is never checked if `finalStatus` is 
`FinalApplicationStatus.SUCCEEDED`, because `||` short-circuits. 
When RM recovery happens, `isLastAttempt` is therefore not checked, 
since in yarn-client mode nothing had a chance to change the value of `finalStatus`. 
The AM goes on to `unregister`, and the application can't recover itself.
So yarn-client mode can't support RM HA at the moment (yarn-cluster is OK).
Splitting the default `finalStatus` by mode is an easy way to avoid this 
problem, and it stays compatible with the previous design.
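
To make the short-circuit concrete, here is a small standalone sketch of the same condition. It is an illustration only: `FinalStatus` is a local stand-in for `FinalApplicationStatus`, and `shouldUnregister` is a hypothetical helper that just wraps the `if` condition quoted above.

```scala
object UnregisterSketch {
  // Local stand-in for YARN's FinalApplicationStatus.
  sealed trait FinalStatus
  case object Succeeded extends FinalStatus
  case object Undefined extends FinalStatus

  // Same shape as the quoted condition: isLastAttempt is only consulted
  // when the status is not Succeeded.
  def shouldUnregister(finalStatus: FinalStatus, isLastAttempt: Boolean): Boolean =
    finalStatus == Succeeded || isLastAttempt

  def main(args: Array[String]): Unit = {
    // RM restarts on a non-final attempt, status still at its default.
    println(shouldUnregister(Succeeded, isLastAttempt = false)) // true: unregisters, no retry
    println(shouldUnregister(Undefined, isLastAttempt = false)) // false: stays registered
  }
}
```

With a default of `Succeeded` the application unregisters even though attempts remain; with `Undefined` it only unregisters on the last attempt.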





[GitHub] spark pull request: [YARN][SPARK-4929] Bug fix: fix the yarn-clien...

2014-12-24 Thread SaintBacchus
Github user SaintBacchus commented on the pull request:

https://github.com/apache/spark/pull/3771#issuecomment-68092406
  
@tgravescs can you have a look at this problem?





[GitHub] spark pull request: [YARN][SPARK-4929] Bug fix: fix the yarn-clien...

2014-12-23 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/3771#issuecomment-68006295
  
Also, /cc @tgravescs, another one of our YARN maintainers.





[GitHub] spark pull request: [YARN][SPARK-4929] Bug fix: fix the yarn-clien...

2014-12-22 Thread SaintBacchus
Github user SaintBacchus commented on the pull request:

https://github.com/apache/spark/pull/3771#issuecomment-67925042
  
@andrewor14 can you go through this problem? Thx.

