[GitHub] spark pull request: [YARN][SPARK-4929] Bug fix: fix the yarn-clien...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3771#issuecomment-68997280 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25154/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [YARN][SPARK-4929] Bug fix: fix the yarn-clien...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3771#issuecomment-68997276 [Test build #25154 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25154/consoleFull) for PR 3771 at commit [`c02bfcc`](https://github.com/apache/spark/commit/c02bfcca0eb73246dc11a9b2a0ef80053d85a44b). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [YARN][SPARK-4929] Bug fix: fix the yarn-clien...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3771#issuecomment-68990989 [Test build #25154 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25154/consoleFull) for PR 3771 at commit [`c02bfcc`](https://github.com/apache/spark/commit/c02bfcca0eb73246dc11a9b2a0ef80053d85a44b). * This patch merges cleanly.
[GitHub] spark pull request: [YARN][SPARK-4929] Bug fix: fix the yarn-clien...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/3771
[GitHub] spark pull request: [YARN][SPARK-4929] Bug fix: fix the yarn-clien...
Github user tgravescs commented on the pull request: https://github.com/apache/spark/pull/3771#issuecomment-69025846 thanks @SaintBacchus the changes look good.
[GitHub] spark pull request: [YARN][SPARK-4929] Bug fix: fix the yarn-clien...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3771#issuecomment-68617626 [Test build #25021 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25021/consoleFull) for PR 3771 at commit [`c02bfcc`](https://github.com/apache/spark/commit/c02bfcca0eb73246dc11a9b2a0ef80053d85a44b). * This patch merges cleanly.
[GitHub] spark pull request: [YARN][SPARK-4929] Bug fix: fix the yarn-clien...
Github user SaintBacchus commented on the pull request: https://github.com/apache/spark/pull/3771#issuecomment-68617680

@tgravescs your comment is much clearer than what I wrote, so I have used it instead of mine. Thanks.

When a YARN HA event happens, the previous ApplicationMaster throws:

```
java.io.IOException: Failed on local exception: java.io.EOFException
```

In yarn-cluster mode the exception is caught and the final status is changed. In yarn-client mode, however, the ApplicationMaster goes directly into the shutdown hook, which causes the problem. I think it has not reached the `DisassociatedEvent` yet, because the driver is still alive.
[GitHub] spark pull request: [YARN][SPARK-4929] Bug fix: fix the yarn-clien...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3771#issuecomment-68619465 [Test build #25021 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25021/consoleFull) for PR 3771 at commit [`c02bfcc`](https://github.com/apache/spark/commit/c02bfcca0eb73246dc11a9b2a0ef80053d85a44b). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [YARN][SPARK-4929] Bug fix: fix the yarn-clien...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3771#issuecomment-68619468 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25021/ Test PASSed.
[GitHub] spark pull request: [YARN][SPARK-4929] Bug fix: fix the yarn-clien...
Github user tgravescs commented on the pull request: https://github.com/apache/spark/pull/3771#issuecomment-68367226

@SaintBacchus so I'm still a bit unclear on the exact scenario. I want to make sure we are handling everything properly, so I want to understand it fully.

So this is when the RM goes down and is being brought back up, or fails over to a standby. At that point it restarts the applications to start a new attempt. The shutdown hook is run, and the code you mention above runs and unregisters. I understand client mode can't set it because the SparkContext is not in the same process.

The thing that is unclear to me is how cluster mode sets the finalStatus to something other than succeeded. Is the SparkContext being signalled and then throwing an exception, so that startUserClass catches it and marks it as failed?
[GitHub] spark pull request: [YARN][SPARK-4929] Bug fix: fix the yarn-clien...
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/3771#discussion_r22352924

--- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala ---
@@ -153,6 +153,19 @@ private[spark] class ApplicationMaster(args: ApplicationMasterArguments,
   }
   /**
+   * we should distinct the default final status between client and cluster,
+   * because the SUCCEEDED status may cause the HA failed in client mode and
+   * UNDEFINED may cause the error reporter in cluster when using sys.exit.
+   */
+  final def getDefaultFinalStatus() = {
--- End diff --

I assume we are hitting the logic on line 108 above, in `if (!finished) {...`. I think that comment and code are based on the final status defaulting to success. At the very least we should update that comment to explain what is going to happen in client vs. cluster mode. Since the DisassociatedEvent exits with success for client mode, I think making the default undefined for client mode is fine.
[GitHub] spark pull request: [YARN][SPARK-4929] Bug fix: fix the yarn-clien...
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/3771#discussion_r22353308

--- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala ---
@@ -153,6 +153,19 @@ private[spark] class ApplicationMaster(args: ApplicationMasterArguments,
   }
   /**
+   * we should distinct the default final status between client and cluster,
--- End diff --

Can we clarify this comment a little? Perhaps something more like the text below (feel free to reword):

Set the default final application status for client mode to UNDEFINED to handle if YARN HA restarts the application so that it properly retries. Set the final status to SUCCEEDED in cluster mode to handle if the user calls System.exit from the application code.
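
The per-mode default described in this comment can be sketched in isolation. This is only an illustration of the idea, not the code in the PR: `FinalStatus`, `DefaultStatusSketch`, and the `isClusterMode` flag are hypothetical stand-ins, and the real ApplicationMaster works with YARN's `FinalApplicationStatus`.

```scala
// Hypothetical stand-in for YARN's FinalApplicationStatus (assumption, not the Hadoop enum).
sealed trait FinalStatus
case object Succeeded extends FinalStatus
case object Undefined extends FinalStatus

object DefaultStatusSketch {
  // isClusterMode is an assumed flag: true when the user code runs inside the AM process.
  def defaultFinalStatus(isClusterMode: Boolean): FinalStatus =
    if (isClusterMode) {
      // Cluster mode: user code may call System.exit, so default to SUCCEEDED.
      Succeeded
    } else {
      // Client mode: default to UNDEFINED so a YARN HA restart can retry the application.
      Undefined
    }
}
```

With this split, only cluster mode starts from a status that would count as a successful finish if nothing else updates it.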
[GitHub] spark pull request: [YARN][SPARK-4929] Bug fix: fix the yarn-clien...
Github user tgravescs commented on the pull request: https://github.com/apache/spark/pull/3771#issuecomment-68260017 Can you please be a bit more specific and detail exactly what happens here? Are you referring to when the RM has to fail over, or to a rolling upgrade? Is the container brought down and then back up again... please just describe the scenario and what exactly is happening.
[GitHub] spark pull request: [YARN][SPARK-4929] Bug fix: fix the yarn-clien...
Github user SaintBacchus commented on the pull request: https://github.com/apache/spark/pull/3771#issuecomment-68321575

What @tgravescs describes is close to the scenario, but it happens while the RM recovers after going down.

```scala
if (finalStatus == FinalApplicationStatus.SUCCEEDED || isLastAttempt) {
  unregister(finalStatus, finalMsg)
  cleanupStagingDir(fs)
}
```

In this code, `isLastAttempt` is not checked if `finalStatus` is `FinalApplicationStatus.SUCCEEDED`. When RM recovery happens, `isLastAttempt` is therefore never checked, because yarn-client mode had no chance to change the value of `finalStatus`. The code goes on to `unregister`, and the application can't recover itself. So yarn-client mode can't support RM HA at the moment (yarn-cluster is OK).

Splitting the default `finalStatus` into two cases is an easy way to avoid this problem, and it stays compatible with the previous design.
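
The short-circuit in the condition quoted above can be modeled on its own to show why the default status matters. This is a minimal sketch under stated assumptions: `UnregisterSketch`, `FinalStatus`, and `shouldUnregister` are hypothetical names introduced here, and only the boolean condition itself is taken from the quoted snippet.

```scala
object UnregisterSketch {
  // Hypothetical stand-in for YARN's FinalApplicationStatus (assumption).
  sealed trait FinalStatus
  case object Succeeded extends FinalStatus
  case object Undefined extends FinalStatus

  // Same shape as the quoted condition: a SUCCEEDED status short-circuits
  // the check, so isLastAttempt is never consulted in that case.
  def shouldUnregister(finalStatus: FinalStatus, isLastAttempt: Boolean): Boolean =
    finalStatus == Succeeded || isLastAttempt
}
```

In this model, a status left at `Succeeded` leads to unregistering even when more attempts remain, while `Undefined` defers the decision to `isLastAttempt`, which is the behavior the comment asks for in yarn-client mode.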
[GitHub] spark pull request: [YARN][SPARK-4929] Bug fix: fix the yarn-clien...
Github user SaintBacchus commented on the pull request: https://github.com/apache/spark/pull/3771#issuecomment-68092406 @tgravescs can you have a look at this problem?
[GitHub] spark pull request: [YARN][SPARK-4929] Bug fix: fix the yarn-clien...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/3771#issuecomment-68006295 Also, /cc @tgravescs, another one of our YARN maintainers.
[GitHub] spark pull request: [YARN][SPARK-4929] Bug fix: fix the yarn-clien...
Github user SaintBacchus commented on the pull request: https://github.com/apache/spark/pull/3771#issuecomment-67925042 @andrewor14 can you go through this problem? Thx.