[GitHub] spark issue #13326: [SPARK-15560] [Mesos] Queued/Supervise drivers waiting f...
Github user devaraj-kavali commented on the issue: https://github.com/apache/spark/pull/13326 @tnachen Can you check this? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #13077: [SPARK-10748] [Mesos] Log error instead of crashing Spar...
Github user devaraj-kavali commented on the issue: https://github.com/apache/spark/pull/13077 @srowen / @tnachen Can you check this?
[GitHub] spark issue #13072: [SPARK-15288] [Mesos] Mesos dispatcher should handle gra...
Github user devaraj-kavali commented on the issue: https://github.com/apache/spark/pull/13072 @srowen Can you check this?
[GitHub] spark issue #13143: [SPARK-15359] [Mesos] Mesos dispatcher should handle DRI...
Github user devaraj-kavali commented on the issue: https://github.com/apache/spark/pull/13143 @tnachen Can you check this?
[GitHub] spark pull request #16801: [SPARK-13619] [WEBUI] [CORE] Jobs page UI shows w...
GitHub user devaraj-kavali opened a pull request: https://github.com/apache/spark/pull/16801 [SPARK-13619] [WEBUI] [CORE] Jobs page UI shows wrong number of failed tasks

## What changes were proposed in this pull request?

When Failed/Killed Task End events arrive after the Job End event, they are silently ignored and never reflected in JobUIData. This happens because the jobId is deleted from stageIdToActiveJobIds during the Job End event, so a later Task End event cannot find the job information to update.

## How was this patch tested?

### Current behaviour of the Spark Jobs page for a running application and of the History page

Completed Jobs (1)

| Job Id | Description | Submitted | Duration | Stages: Succeeded/Total | Tasks (for all stages): Succeeded/Total |
| --- | --- | --- | --- | --- | --- |
| 0 | saveAsTextFile at JavaWordCountWithSlowTask.java:49 | 2017/01/25 09:03:14 | 1.4 min | 2/2 | 400/400 (17 killed) |

Completed Stages (2)

| Stage Id | Description | Submitted | Duration | Tasks: Succeeded/Total | Input | Output | Shuffle Read | Shuffle Write |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | saveAsTextFile at JavaWordCountWithSlowTask.java:49 +details | 2017/01/25 09:04:34 | 5 s | 200/200 (2 failed) (1 killed) | | 6.8 KB | 2.3 MB | |
| 0 | mapToPair at JavaWordCountWithSlowTask.java:33 +details | 2017/01/25 09:03:15 | 1.3 min | 200/200 (16 killed) | 1915.5 MB | | | 2.3 MB |

### Behaviour of the web pages after applying the patch

Completed Jobs (1)

| Job Id | Description | Submitted | Duration | Stages: Succeeded/Total | Tasks (for all stages): Succeeded/Total |
| --- | --- | --- | --- | --- | --- |
| 0 | saveAsTextFile at JavaWordCountWithSlowTask.java:49 | 2017/01/25 09:03:14 | 1.4 min | 2/2 | 400/400 (2 failed) (17 killed) |

Completed Stages (2)

| Stage Id | Description | Submitted | Duration | Tasks: Succeeded/Total | Input | Output | Shuffle Read | Shuffle Write |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | saveAsTextFile at JavaWordCountWithSlowTask.java:49 +details | 2017/01/25 09:04:34 | 5 s | 200/200 (2 failed) (1 killed) | | 6.8 KB | 2.3 MB | |
| 0 | mapToPair at JavaWordCountWithSlowTask.java:33 +details | 2017/01/25 09:03:15 | 1.3 min | 200/200 (16 killed) | 1915.5 MB | | | 2.3 MB |

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/devaraj-kavali/spark SPARK-13619

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16801.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #16801

commit 3b51ef0e5ddd58e0bd8f90a52ca08145e5cdef4d
Author: Devaraj K
Date: 2017-02-04T01:40:35Z

[SPARK-13619] [WEBUI] [CORE] Jobs page UI shows wrong number of failed tasks
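The race described in the PR can be reduced to a small sketch. Everything here (class and field names such as `LateTaskEnds`) is hypothetical and only illustrates why late events get dropped once the stage-to-job index is cleared; it is not Spark's actual listener code:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the bug: task-end events that arrive after the
// job-end event are dropped because the stage -> job index was already
// cleared. The fix is to keep enough mapping around (or update the
// completed-job data directly) so that late events still count.
public class LateTaskEnds {
    Map<Integer, Integer> stageToJob = new HashMap<>();   // cleared on job end
    Map<Integer, Integer> failedTasksPerJob = new HashMap<>();

    void onTaskFailed(int stageId) {
        Integer jobId = stageToJob.get(stageId);
        if (jobId == null) return;                        // the bug: event ignored
        failedTasksPerJob.merge(jobId, 1, Integer::sum);
    }

    public static void main(String[] args) {
        LateTaskEnds ui = new LateTaskEnds();
        ui.stageToJob.put(1, 0);
        ui.onTaskFailed(1);       // counted while the job is active
        ui.stageToJob.remove(1);  // job-end cleanup deletes the mapping
        ui.onTaskFailed(1);       // silently dropped: count stays at 1
        System.out.println(ui.failedTasksPerJob.get(0));
    }
}
```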
[GitHub] spark issue #13072: [SPARK-15288] [Mesos] Mesos dispatcher should handle gra...
Github user devaraj-kavali commented on the issue: https://github.com/apache/spark/pull/13072 MesosClusterDispatcher, like the Executor, also runs multiple threads; when any one of those threads terminates due to some error/exception, the MesosClusterDispatcher process keeps running without that thread's functionality. I think we need to handle those uncaught exceptions from the MesosClusterDispatcher process threads using an UncaughtExceptionHandler and take action, instead of letting MesosClusterDispatcher keep running without that functionality and without notifying the user.
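The idea above — fail loudly when a daemon's worker thread dies instead of limping along silently — rests on the standard JVM `Thread.setUncaughtExceptionHandler` hook. A minimal hypothetical sketch (thread and message names are made up, and this is not the actual Spark patch):

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch: install an UncaughtExceptionHandler so that a worker
// thread dying from an unexpected exception is noticed and acted upon,
// instead of the process continuing without that thread's functionality.
public class UncaughtHandlerDemo {
    // Runs a worker that dies with an uncaught exception and returns what the
    // installed handler recorded.
    static String runWorker() {
        AtomicReference<String> fatal = new AtomicReference<>("none");
        Thread worker = new Thread(() -> {
            throw new IllegalStateException("offer-processing failed");
        }, "dispatcher-worker");
        worker.setUncaughtExceptionHandler((t, e) ->
            fatal.set(t.getName() + ": " + e.getMessage()));
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
        }
        // A real dispatcher would log this and likely shut down or restart.
        return fatal.get();
    }

    public static void main(String[] args) {
        System.out.println(runWorker());
    }
}
```

Without the handler, the stack trace goes to stderr and the process keeps running minus the dead thread, which is exactly the silent-degradation mode the comment describes.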
[GitHub] spark pull request #16725: [SPARK-19377] [WEBUI] [CORE] Killed tasks should ...
GitHub user devaraj-kavali opened a pull request: https://github.com/apache/spark/pull/16725 [SPARK-19377] [WEBUI] [CORE] Killed tasks should have the status as KILLED

## What changes were proposed in this pull request?

The killed status was not copied when building the newTaskInfo object, which drops unnecessary details to reduce memory usage. This patch copies the killed status into the newTaskInfo object, so the Web UI displays the correct KILLED status instead of a wrong one.

## How was this patch tested?

Current behaviour of displaying tasks in the stage UI page:

| Index | ID | Attempt | Status | Locality Level | Executor ID / Host | Launch Time | Duration | GC Time | Input Size / Records | Write Time | Shuffle Write Size / Records | Errors |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 143 | 10 | 0 | SUCCESS | NODE_LOCAL | 6 / x.xx.x.x stdout stderr | 2017/01/25 07:49:27 | 0 ms | | 0.0 B / 0 | | 0.0 B / 0 | TaskKilled (killed intentionally) |
| 156 | 11 | 0 | SUCCESS | NODE_LOCAL | 5 / x.xx.x.x stdout stderr | 2017/01/25 07:49:27 | 0 ms | | 0.0 B / 0 | | 0.0 B / 0 | TaskKilled (killed intentionally) |

Web UI display after applying the patch:

| Index | ID | Attempt | Status | Locality Level | Executor ID / Host | Launch Time | Duration | GC Time | Input Size / Records | Write Time | Shuffle Write Size / Records | Errors |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 143 | 10 | 0 | KILLED | NODE_LOCAL | 6 / x.xx.x.x stdout stderr | 2017/01/25 07:49:27 | 0 ms | | 0.0 B / 0 | | 0.0 B / 0 | TaskKilled (killed intentionally) |
| 156 | 11 | 0 | KILLED | NODE_LOCAL | 5 / x.xx.x.x stdout stderr | 2017/01/25 07:49:27 | 0 ms | | 0.0 B / 0 | | 0.0 B / 0 | TaskKilled (killed intentionally) |

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/devaraj-kavali/spark SPARK-19377

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16725.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #16725

commit 6206d109b646e55223a4b162a37e70f42f4570a1
Author: Devaraj K
Date: 2017-01-28T05:53:21Z

[SPARK-19377] [WEBUI] [CORE] Killed tasks should have the status as KILLED
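The class of bug fixed here — a slimmed-down copy of an object dropping a field it should have kept — can be sketched generically. The `TaskInfo` below is a made-up stand-in, not Spark's class:

```java
// Hypothetical sketch: building a slimmed-down copy of a task record to save
// memory, where the copy must carry over every status flag. Forgetting the
// `killed` flag is exactly the shape of the bug described in the PR.
public class TaskInfoCopy {
    static class TaskInfo {
        String id;
        boolean failed;
        boolean killed;          // the flag the buggy copy omitted
        String[] accumulables;   // heavy detail intentionally dropped in the slim copy

        TaskInfo(String id, boolean failed, boolean killed, String[] accumulables) {
            this.id = id; this.failed = failed; this.killed = killed;
            this.accumulables = accumulables;
        }

        // Slim copy for the UI: drops heavy details but keeps all status flags.
        TaskInfo slimCopy() {
            return new TaskInfo(id, failed, killed, null);
        }
    }

    public static void main(String[] args) {
        TaskInfo full = new TaskInfo("task-10.0", false, true, new String[1000]);
        TaskInfo slim = full.slimCopy();
        System.out.println(slim.killed ? "KILLED" : "SUCCESS");
    }
}
```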
[GitHub] spark pull request #16705: [SPARK-19354] [Core] Killed tasks are getting mar...
GitHub user devaraj-kavali opened a pull request: https://github.com/apache/spark/pull/16705 [SPARK-19354] [Core] Killed tasks are getting marked as FAILED

## What changes were proposed in this pull request?

Handle the exception which occurs during the kill and log it, instead of re-throwing the exception, which caused the task to be marked as FAILED.

## How was this patch tested?

I verified this manually by running multiple applications; with the patch changes, when any exception occurs during the kill, it logs the exception and continues with the kill process. The task is then shown/considered as KILLED in the Web UI sections 'Details for Job' and 'Aggregated Metrics by Executor'.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/devaraj-kavali/spark SPARK-19354

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16705.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #16705

commit d472245bb7392db4dc1b260eeafba1470448ef03
Author: Devaraj K
Date: 2017-01-25T21:33:09Z

[SPARK-19354] [Core] Killed tasks are getting marked as FAILED
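The fix's shape — catch and log a failure inside the kill path so the task is still reported as KILLED rather than FAILED — can be sketched like this (all names hypothetical; this is not the actual patch):

```java
// Hypothetical sketch: if the kill action itself throws, log and swallow the
// error so the task still ends up recorded as KILLED, not FAILED.
public class KillNotFailed {
    enum State { RUNNING, KILLED, FAILED }

    static State killTask(Runnable killAction) {
        try {
            killAction.run();   // e.g. interrupt the task thread, free resources
        } catch (RuntimeException e) {
            // Before the fix this exception propagated upward and the task was
            // recorded as FAILED; now we log it and continue the kill.
            System.err.println("Error during kill (ignored): " + e.getMessage());
        }
        return State.KILLED;
    }

    public static void main(String[] args) {
        State s = killTask(() -> { throw new RuntimeException("interrupt failed"); });
        System.out.println(s);  // the kill failure did not change the outcome
    }
}
```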
[GitHub] spark pull request #13077: [SPARK-10748] [Mesos] Log error instead of crashi...
Github user devaraj-kavali commented on a diff in the pull request: https://github.com/apache/spark/pull/13077#discussion_r94205390 --- Diff: resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterScheduler.scala --- @@ -559,15 +560,29 @@ private[spark] class MesosClusterScheduler( } else { val offer = offerOption.get val queuedTasks = tasks.getOrElseUpdate(offer.offerId, new ArrayBuffer[TaskInfo]) -val task = createTaskInfo(submission, offer) -queuedTasks += task -logTrace(s"Using offer ${offer.offerId.getValue} to launch driver " + - submission.submissionId) -val newState = new MesosClusterSubmissionState(submission, task.getTaskId, offer.slaveId, - None, new Date(), None, getDriverFrameworkID(submission)) -launchedDrivers(submission.submissionId) = newState -launchedDriversState.persist(submission.submissionId, newState) -afterLaunchCallback(submission.submissionId) +breakable { --- End diff -- Here it needs to continue in the for loop from the catch block with the next set of drivers. It cannot return from the exception handler since it needs to launch the other candidates; I can take the other suggestion, i.e. moving the following code into the try clause. I will update the PR by moving the code into the try block. Please let me know if it doesn't make sense.
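The structure being discussed — wrap the per-driver launch code in try/catch inside the loop so one bad submission is skipped and the remaining candidates still launch — looks roughly like this in Java, where falling through to the next iteration replaces Scala's `breakable` (all names here are hypothetical):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: the per-item launch sits inside try/catch within the
// loop, so a failure for one submission logs an error and the loop continues
// with the remaining candidates instead of aborting the whole scheduling pass.
public class LaunchLoop {
    static List<String> launchAll(List<String> submissions) {
        List<String> launched = new ArrayList<>();
        for (String s : submissions) {
            try {
                if (s.startsWith("bad")) {
                    throw new IllegalArgumentException("cannot build task for " + s);
                }
                launched.add(s);   // stand-in for createTaskInfo + state bookkeeping
            } catch (IllegalArgumentException e) {
                System.err.println("Skipping " + s + ": " + e.getMessage());
                // fall through to the next iteration instead of returning
            }
        }
        return launched;
    }

    public static void main(String[] args) {
        // prints [driver-1, driver-3]: the bad submission is skipped, not fatal
        System.out.println(launchAll(List.of("driver-1", "bad-2", "driver-3")));
    }
}
```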
[GitHub] spark issue #13077: [SPARK-10748] [Mesos] Log error instead of crashing Spar...
Github user devaraj-kavali commented on the issue: https://github.com/apache/spark/pull/13077 @tnachen, sorry for the delay, I will update the patch. Thanks
[GitHub] spark issue #12753: [SPARK-3767] [CORE] Support wildcard in Spark properties
Github user devaraj-kavali commented on the issue: https://github.com/apache/spark/pull/12753 I will update this PR with the ConfigReader and reopen the JIRA.
[GitHub] spark issue #12753: [SPARK-3767] [CORE] Support wildcard in Spark properties
Github user devaraj-kavali commented on the issue: https://github.com/apache/spark/pull/12753 @vanzin, SPARK-3767 was resolved as 'Won't Fix' by @srowen. I was under the assumption that SPARK-16671 covers this as well.
[GitHub] spark issue #12753: [SPARK-3767] [CORE] Support wildcard in Spark properties
Github user devaraj-kavali commented on the issue: https://github.com/apache/spark/pull/12753 @vanzin Thanks for looking into this, I have resolved the conflicts.
[GitHub] spark issue #13143: [SPARK-15359] [Mesos] Mesos dispatcher should handle DRI...
Github user devaraj-kavali commented on the issue: https://github.com/apache/spark/pull/13143 MesosDriver doesn't throw any exception; it just returns with the value Status.DRIVER_ABORTED.

```
registerLatch.await()

// propagate any error to the calling thread. This ensures that SparkContext creation fails
// without leaving a broken context that won't be able to schedule any tasks
error.foreach(throw _)
```

This code handles exceptions and throws if it gets Status.DRIVER_ABORTED during registration, but once the registration completes there is no code that handles the status; it is skipped and the thread dies.
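The gap being described — a blocking driver loop whose abnormal return status is checked only during registration — can be sketched abstractly. The `run` method and status values below are made-up stand-ins, not the actual Mesos driver API:

```java
// Hypothetical sketch: a driver whose blocking run() signals failure via its
// return value rather than an exception. The caller must inspect that value
// after run() returns; otherwise the abort is silently ignored and the
// calling thread simply dies.
public class DriverStatusCheck {
    enum Status { DRIVER_STOPPED, DRIVER_ABORTED }

    static Status run(boolean abort) {   // stand-in for a blocking driver loop
        return abort ? Status.DRIVER_ABORTED : Status.DRIVER_STOPPED;
    }

    static String superviseDriver(boolean abort) {
        Status status = run(abort);
        if (status == Status.DRIVER_ABORTED) {
            // The missing handling: surface the abort instead of dropping it.
            return "error: driver aborted";
        }
        return "stopped cleanly";
    }

    public static void main(String[] args) {
        System.out.println(superviseDriver(true));
        System.out.println(superviseDriver(false));
    }
}
```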
[GitHub] spark issue #13077: [SPARK-10748] [Mesos] Log error instead of crashing Spar...
Github user devaraj-kavali commented on the issue: https://github.com/apache/spark/pull/13077 Thanks @tnachen for looking into this, I will update this with the changes.
[GitHub] spark issue #11996: [SPARK-10530] [CORE] Kill other task attempts when one t...
Github user devaraj-kavali commented on the issue: https://github.com/apache/spark/pull/11996 @lw-lin I think it will release the resources and then it throws TaskKilledException at [Executor.scala#L307](https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/executor/Executor.scala#L307). If you are facing the issue then please file a separate ticket with the details, we can discuss there.
[GitHub] spark issue #13323: [SPARK-15555] [Mesos] Driver with --supervise option can...
Github user devaraj-kavali commented on the issue: https://github.com/apache/spark/pull/13323 @tnachen Thanks for your review, I have added a test for this, can you have a look into it?
[GitHub] spark issue #13326: [SPARK-15560] [Mesos] Queued/Supervise drivers waiting f...
Github user devaraj-kavali commented on the issue: https://github.com/apache/spark/pull/13326 https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59989/testReport/ `org.apache.spark.scheduler.BlacklistIntegrationSuite.Bad node with multiple executors, job will still succeed with the right confs` This test passes in my local environment and also doesn't seem to be related to this change. @tnachen can we retest it?
[GitHub] spark issue #13326: [SPARK-15560] [Mesos] Queued/Supervise drivers waiting f...
Github user devaraj-kavali commented on the issue: https://github.com/apache/spark/pull/13326 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/59989/ Test FAILed.
[GitHub] spark issue #13326: [SPARK-15560] [Mesos] Queued/Supervise drivers waiting f...
Github user devaraj-kavali commented on the issue: https://github.com/apache/spark/pull/13326 Merged build finished. Test FAILed.
[GitHub] spark issue #13326: [SPARK-15560] [Mesos] Queued/Supervise drivers waiting f...
Github user devaraj-kavali commented on the issue: https://github.com/apache/spark/pull/13326 **[Test build #59989 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59989/consoleFull)** for PR 13326 at commit [`7f4f34b`](https://github.com/apache/spark/commit/7f4f34b1dd8ec20297f1295610e11c8fed860652). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #13326: [SPARK-15560] [Mesos] Queued/Supervise drivers waiting f...
Github user devaraj-kavali commented on the issue: https://github.com/apache/spark/pull/13326 **[Test build #59989 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59989/consoleFull)** for PR 13326 at commit [`7f4f34b`](https://github.com/apache/spark/commit/7f4f34b1dd8ec20297f1295610e11c8fed860652).
[GitHub] spark issue #13326: [SPARK-15560] [Mesos] Queued/Supervise drivers waiting f...
Github user devaraj-kavali commented on the issue: https://github.com/apache/spark/pull/13326 ok to test
[GitHub] spark issue #13407: [SPARK-15665] [CORE] spark-submit --kill and --status ar...
Github user devaraj-kavali commented on the issue: https://github.com/apache/spark/pull/13407 Thanks @vanzin for review and merging.
[GitHub] spark pull request #13326: [SPARK-15560] [Mesos] Queued/Supervise drivers wa...
Github user devaraj-kavali commented on a diff in the pull request: https://github.com/apache/spark/pull/13326#discussion_r65748851 --- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterScheduler.scala --- @@ -188,10 +188,10 @@ private[spark] class MesosClusterScheduler( mesosDriver.killTask(task.taskId) k.success = true k.message = "Killing running driver" - } else if (removeFromQueuedDrivers(submissionId)) { --- End diff -- Thanks @tnachen for looking into this, I see it is being used in other places.

```
queuedDrivers
  .filter(d => launchedDrivers.contains(d.submissionId))
  .foreach(d => removeFromQueuedDrivers(d.submissionId))
```

```
// Then we walk through the queued drivers and try to schedule them.
scheduleTasks(
  copyBuffer(queuedDrivers),
  removeFromQueuedDrivers,
  currentOffers,
  tasks)
```
[GitHub] spark issue #13407: [SPARK-15665] [CORE] spark-submit --kill and --status ar...
Github user devaraj-kavali commented on the issue: https://github.com/apache/spark/pull/13407 Thanks @vanzin and @andrewor14 for looking into this, sorry for the delay. > If SparkSubmit can still process --kill and --status with those, then that's fine too (just use SparkLauncher.NO_RESOURCE). I tried this but it doesn't work, with the below error:

```
[devaraj@server2 spark-master]$ ./bin/spark-submit --kill driver-20160531171222-
Error: Cannot load main class from JAR spark-internal with URI null. Please specify a class through --class.
Run with --help for usage help or --verbose for debug output
```

I have renamed the printInfo flag to isAppResourceReq and used the same for the kill and status cases also. Please review and let me know your feedback.
[GitHub] spark pull request: [SPARK-15665] [CORE] spark-submit --kill and -...
GitHub user devaraj-kavali opened a pull request: https://github.com/apache/spark/pull/13407 [SPARK-15665] [CORE] spark-submit --kill and --status are not working

## What changes were proposed in this pull request?

--kill and --status were not handled in OptionParser, and because of that they were failing. Now the --kill and --status options are handled as part of OptionParser.handle.

## How was this patch tested?

I have verified these manually by running the --kill and --status commands.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/devaraj-kavali/spark SPARK-15665

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/13407.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #13407

commit 62ebc7a22fde3f0974f381a89c48c8e5d43a1ce4
Author: Devaraj K
Date: 2016-05-31T09:38:50Z

[SPARK-15665] [CORE] spark-submit --kill and --status are not working
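The shape of the fix — an option parser whose per-option handler must recognize action options like --kill and --status, not just resource options — can be sketched generically (class and field names below are hypothetical, not SparkSubmit's actual parser):

```java
// Hypothetical sketch: handle() must return true for every supported option,
// including action options such as --kill and --status; an unrecognized
// option makes the parser report an error, which is how the original bug
// manifested before these two cases were added.
public class OptionParserSketch {
    String submissionToKill;
    String submissionToRequestStatusFor;

    boolean handle(String opt, String value) {
        switch (opt) {
            case "--class":
                return true;                               // ...other options elided
            case "--kill":                                 // newly handled case
                submissionToKill = value;
                return true;
            case "--status":                               // newly handled case
                submissionToRequestStatusFor = value;
                return true;
            default:
                return false;                              // unknown option -> error
        }
    }

    public static void main(String[] args) {
        OptionParserSketch p = new OptionParserSketch();
        System.out.println(p.handle("--kill", "driver-1"));
    }
}
```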
[GitHub] spark pull request: [SPARK-10530] [CORE] Kill other task attempts ...
Github user devaraj-kavali commented on the pull request: https://github.com/apache/spark/pull/11996#issuecomment-222589679 Thanks @kayousterhout.
[GitHub] spark pull request: [SPARK-10530] [CORE] Kill other task attempts ...
Github user devaraj-kavali commented on the pull request: https://github.com/apache/spark/pull/11996#issuecomment-222104830 @kayousterhout, I have added inline comments and the build is also fine now, please have a look into it. Thanks
[GitHub] spark pull request: [SPARK-10530] [CORE] Kill other task attempts ...
Github user devaraj-kavali commented on the pull request: https://github.com/apache/spark/pull/11996#issuecomment-221968586 @kayousterhout Thanks a lot for your review and comments. I have fixed them, please have a look into this.
[GitHub] spark pull request: [SPARK-15560] [Mesos] Queued/Supervise drivers...
GitHub user devaraj-kavali opened a pull request: https://github.com/apache/spark/pull/13326 [SPARK-15560] [Mesos] Queued/Supervise drivers waiting for retry drivers disappear for kill command in Mesos mode ## What changes were proposed in this pull request? With this patch, drivers are moved from the Queued Drivers/Supervise drivers waiting for retry sections to the Finished Drivers section when they get killed. ## How was this patch tested? I have verified it manually by checking the Mesos Dispatcher UI while simulating this scenario. You can merge this pull request into a Git repository by running: $ git pull https://github.com/devaraj-kavali/spark SPARK-15560 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/13326.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #13326 commit 7f4f34b1dd8ec20297f1295610e11c8fed860652 Author: Devaraj K Date: 2016-05-26T10:12:34Z [SPARK-15560] [Mesos] Queued/Supervise drivers waiting for retry drivers disappear for kill command in Mesos mode
[GitHub] spark pull request: [SPARK-15555] [Mesos] Driver with --supervise ...
GitHub user devaraj-kavali opened a pull request: https://github.com/apache/spark/pull/13323 [SPARK-15555] [Mesos] Driver with --supervise option cannot be killed in Mesos mode ## What changes were proposed in this pull request? Killed applications are no longer added for retry. ## How was this patch tested? I have verified manually in the Mesos cluster; with the changes, the killed applications move to the Finished Drivers section and will not retry. You can merge this pull request into a Git repository by running: $ git pull https://github.com/devaraj-kavali/spark SPARK-15555 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/13323.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #13323 commit 2e5664c63416fed1a9954fd2ed6c71773eed34ed Author: Devaraj K Date: 2016-05-26T07:48:00Z [SPARK-15555] [Mesos] Driver with --supervise option cannot be killed in Mesos mode
[GitHub] spark pull request: [SPARK-10530] [CORE] Kill other task attempts ...
Github user devaraj-kavali commented on the pull request: https://github.com/apache/spark/pull/11996#issuecomment-221531103 @kayousterhout, can you have a look into this?
[GitHub] spark pull request: [SPARK-10530] [CORE] Kill other task attempts ...
Github user devaraj-kavali commented on a diff in the pull request: https://github.com/apache/spark/pull/11996#discussion_r63738214 --- Diff: core/src/test/scala/org/apache/spark/scheduler/TaskSetManagerSuite.scala --- @@ -789,6 +791,51 @@ class TaskSetManagerSuite extends SparkFunSuite with LocalSparkContext with Logg assert(TaskLocation("executor_host1_3") === ExecutorCacheTaskLocation("host1", "3")) } + test("Kill other task attempts when one attempt belonging to the same task succeeds") { +sc = new SparkContext("local", "test") +val sched = new FakeTaskScheduler(sc, ("exec1", "host1"), ("exec2", "host2")) +val taskSet = FakeTask.createTaskSet(4) +val manager = new TaskSetManager(sched, taskSet, MAX_TASK_FAILURES) +val accumUpdatesByTask: Array[Seq[AccumulableInfo]] = taskSet.tasks.map { task => + task.initialAccumulators.map { a => a.toInfo(Some(0L), None) } +} +// Offer resources for 4 tasks to start +for ((k, v) <- List( +"exec1" -> "host1", +"exec1" -> "host1", +"exec2" -> "host2", +"exec2" -> "host2")) { + val taskOption = manager.resourceOffer(k, v, NO_PREF) + assert(taskOption.isDefined) + val task = taskOption.get + assert(task.executorId === k) +} +assert(sched.startedTasks.toSet === Set(0, 1, 2, 3)) +// Complete the 3 tasks and leave 1 task in running +for (id <- Set(0, 1, 2)) { + manager.handleSuccessfulTask(id, createTaskResult(id, accumUpdatesByTask(id))) + assert(sched.endedTasks(id) === Success) +} + +// Wait for the threshold time to start speculative attempt for the running task +Thread.sleep(100) --- End diff -- I feel adding an argument to **checkSpeculatableTasks()** would require changing the signature of the method in the Schedulable interface and, correspondingly, all of its implementations. I am thinking of moving the code in **TaskSetManager.checkSpeculatableTasks()** to another method which takes an argument (i.e. minTimeToSpeculation: Int); the same method can then be used in the test. Please give your opinion on this.
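The refactoring proposed in the comment above could be sketched roughly as follows. This is an illustrative skeleton, not Spark's actual code: the trait body and the placeholder return value are simplified assumptions; only the method name `checkSpeculatableTasks` comes from the discussion.

```scala
// Sketch of the proposal: the Schedulable interface keeps its zero-argument
// method, while TaskSetManager moves the real logic into an overload that
// takes the threshold, so a test can pass 0 and drop the Thread.sleep(100).
trait Schedulable {
  def checkSpeculatableTasks(): Boolean // interface signature stays unchanged
}

class TaskSetManagerSketch extends Schedulable {
  // Production entry point supplies the 100 ms threshold from the quoted code.
  override def checkSpeculatableTasks(): Boolean =
    checkSpeculatableTasks(minTimeToSpeculation = 100)

  // Extracted overload: tests can call this directly with 0 instead of sleeping.
  def checkSpeculatableTasks(minTimeToSpeculation: Int): Boolean = {
    // ... the real speculation logic would use minTimeToSpeculation here ...
    true // placeholder
  }
}
```

This keeps every other `Schedulable` implementation untouched while making the threshold injectable from the test.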
[GitHub] spark pull request: [SPARK-10530] [CORE] Kill other task attempts ...
Github user devaraj-kavali commented on a diff in the pull request: https://github.com/apache/spark/pull/11996#discussion_r63736986 --- Diff: core/src/test/scala/org/apache/spark/scheduler/TaskSetManagerSuite.scala --- @@ -789,6 +791,51 @@ class TaskSetManagerSuite extends SparkFunSuite with LocalSparkContext with Logg assert(TaskLocation("executor_host1_3") === ExecutorCacheTaskLocation("host1", "3")) } + test("Kill other task attempts when one attempt belonging to the same task succeeds") { +sc = new SparkContext("local", "test") +val sched = new FakeTaskScheduler(sc, ("exec1", "host1"), ("exec2", "host2")) +val taskSet = FakeTask.createTaskSet(4) +val manager = new TaskSetManager(sched, taskSet, MAX_TASK_FAILURES) +val accumUpdatesByTask: Array[Seq[AccumulableInfo]] = taskSet.tasks.map { task => + task.initialAccumulators.map { a => a.toInfo(Some(0L), None) } +} +// Offer resources for 4 tasks to start +for ((k, v) <- List( +"exec1" -> "host1", +"exec1" -> "host1", +"exec2" -> "host2", +"exec2" -> "host2")) { + val taskOption = manager.resourceOffer(k, v, NO_PREF) + assert(taskOption.isDefined) + val task = taskOption.get + assert(task.executorId === k) +} +assert(sched.startedTasks.toSet === Set(0, 1, 2, 3)) +// Complete the 3 tasks and leave 1 task in running +for (id <- Set(0, 1, 2)) { + manager.handleSuccessfulTask(id, createTaskResult(id, accumUpdatesByTask(id))) + assert(sched.endedTasks(id) === Success) +} + +// Wait for the threshold time to start speculative attempt for the running task +Thread.sleep(100) +val speculation = manager.checkSpeculatableTasks +assert(speculation === true) +// Offer resource to start the speculative attempt for the running task +val taskOption5 = manager.resourceOffer("exec1", "host1", NO_PREF) +assert(taskOption5.isDefined) +val task5 = taskOption5.get +assert(task5.taskId === 4) +assert(task5.executorId === "exec1") +assert(task5.attemptNumber === 1) +sched.backend = mock(classOf[SchedulerBackend]) +// Complete the 
speculative attempt for the running task +manager.handleSuccessfulTask(4, createTaskResult(3, accumUpdatesByTask(3))) +assert(sched.endedTasks(3) === Success) --- End diff -- Here **sched.backend** is **mock(classOf[SchedulerBackend])** and, as part of **manager.handleSuccessfulTask()**, it issues **sched.backend.killTask()** for any other attempts. Since it is a mock invocation, it only ensures that the kill invocation for the other attempts is happening. I have added the same in the comment.
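The mock-based check described in the comment can be sketched as below. This is a fragment continuing the quoted test, and the explicit `verify` call with its argument values is an illustrative addition, not part of the actual test:

```scala
import org.mockito.Mockito.{mock, verify}

// Sketch: after the speculative copy (task id 4) of task 3 succeeds, the
// manager is expected to ask the backend to kill the still-running
// original attempt. The mock records the call without a real backend.
sched.backend = mock(classOf[SchedulerBackend])
manager.handleSuccessfulTask(4, createTaskResult(3, accumUpdatesByTask(3)))
// Hypothetical explicit verification (taskId, executorId, interruptThread);
// the exact arguments depend on which attempt was still running.
verify(sched.backend).killTask(3, "exec2", true)
```

An explicit `verify` would make the test fail loudly if the kill were ever dropped, rather than relying on the mock merely absorbing the call.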
[GitHub] spark pull request: [SPARK-10530] [CORE] Kill other task attempts ...
Github user devaraj-kavali commented on the pull request: https://github.com/apache/spark/pull/11996#issuecomment-220082195 Thanks a lot @kayousterhout for the review.
[GitHub] spark pull request: [SPARK-15359] [Mesos] Mesos dispatcher should ...
GitHub user devaraj-kavali opened a pull request: https://github.com/apache/spark/pull/13143 [SPARK-15359] [Mesos] Mesos dispatcher should handle DRIVER_ABORTED status from mesosDriver.run() ## What changes were proposed in this pull request? When mesosDriver.run() returns with the DRIVER_ABORTED status, an exception is now thrown, which SparkUncaughtExceptionHandler can handle to shut down the dispatcher. ## How was this patch tested? I verified it manually; the driver thread throws an exception when mesosDriver.run() returns with the DRIVER_ABORTED status. You can merge this pull request into a Git repository by running: $ git pull https://github.com/devaraj-kavali/spark SPARK-15359 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/13143.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #13143 commit c16fb5ff62a943d2c17524f6e8a328acfc8dfd82 Author: Devaraj K Date: 2016-05-17T08:32:13Z [SPARK-15359] [Mesos] Mesos dispatcher should handle DRIVER_ABORTED status from mesosDriver.run()
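The behavior described above can be sketched as follows. This is a minimal sketch, assuming the Mesos Java API's `Status` enum; the surrounding thread body and the exception message are illustrative, not the PR's actual code:

```scala
import org.apache.mesos.Protos.Status
import org.apache.spark.SparkException

// Inside the dispatcher's scheduler-driver thread: if run() comes back
// DRIVER_ABORTED, throw instead of returning quietly, so the process-wide
// uncaught-exception handler can shut the dispatcher down rather than
// leaving it running with a dead Mesos driver.
val ret: Status = mesosDriver.run()
if (ret == Status.DRIVER_ABORTED) {
  throw new SparkException(s"Error: mesosDriver.run() returned status $ret")
}
```

Throwing from the thread is what lets `SparkUncaughtExceptionHandler` take over, since `run()` returning normally would otherwise be silently ignored.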
[GitHub] spark pull request: [SPARK-10748] [Mesos] Log error instead of cra...
GitHub user devaraj-kavali opened a pull request: https://github.com/apache/spark/pull/13077 [SPARK-10748] [Mesos] Log error instead of crashing Spark Mesos dispatcher when a job is misconfigured ## What changes were proposed in this pull request? The Spark exception thrown for an invalid job configuration is now handled by marking that job as failed and continuing to launch the other drivers, instead of propagating the exception. ## How was this patch tested? I verified manually; now the misconfigured jobs move to the Finished Drivers section in the UI and the other jobs continue to launch. You can merge this pull request into a Git repository by running: $ git pull https://github.com/devaraj-kavali/spark SPARK-10748 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/13077.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #13077 commit ae85f8154e506c223a018d9c04c58967a82fa580 Author: Devaraj K Date: 2016-05-12T10:10:50Z [SPARK-10748] [Mesos] Log error instead of crashing Spark Mesos dispatcher when a job is misconfigured
[GitHub] spark pull request: [SPARK-15288] [SQL] Support old table schema c...
Github user devaraj-kavali commented on the pull request: https://github.com/apache/spark/pull/13073#issuecomment-218703662 @clockfly, it seems the JIRA number mentioned in the title is wrong; I think it should be SPARK-15253.
[GitHub] spark pull request: [SPARK-10530] [CORE] Kill other task attempts ...
Github user devaraj-kavali commented on the pull request: https://github.com/apache/spark/pull/11996#issuecomment-218671842 @kayousterhout, @markhamstra any comments plz?
[GitHub] spark pull request: [SPARK-15288] [Mesos] Mesos dispatcher should ...
GitHub user devaraj-kavali opened a pull request: https://github.com/apache/spark/pull/13072 [SPARK-15288] [Mesos] Mesos dispatcher should handle gracefully when any thread gets UncaughtException ## What changes were proposed in this pull request? Adds the default UncaughtExceptionHandler to the MesosClusterDispatcher. ## How was this patch tested? I verified it manually; when any of the dispatcher threads gets an uncaught exception, the default UncaughtExceptionHandler handles it. You can merge this pull request into a Git repository by running: $ git pull https://github.com/devaraj-kavali/spark SPARK-15288 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/13072.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #13072 commit 27de4fb65b7400d7d2e76843cb9eb1c55c9d69d4 Author: Devaraj K Date: 2016-05-12T06:25:38Z [SPARK-15288] [Mesos] Mesos dispatcher should handle gracefully when any thread gets UncaughtException
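Installing a default handler in the dispatcher's entry point could look roughly like this. A sketch only: the import path for `SparkUncaughtExceptionHandler` is an assumption, and the main-method body is illustrative:

```scala
import org.apache.spark.util.SparkUncaughtExceptionHandler // assumed location

object MesosClusterDispatcherSketch {
  def main(args: Array[String]): Unit = {
    // Register a process-wide default handler so an uncaught exception in
    // any dispatcher thread is logged and the process exits deliberately,
    // instead of the thread dying silently and leaving a half-alive daemon.
    Thread.setDefaultUncaughtExceptionHandler(SparkUncaughtExceptionHandler)
    // ... start the dispatcher as before ...
  }
}
```

`Thread.setDefaultUncaughtExceptionHandler` applies to every thread that has no handler of its own, which is what makes a single registration in `main` sufficient.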
[GitHub] spark pull request: [SPARK-14234] [CORE] Executor crashes for Task...
Github user devaraj-kavali commented on the pull request: https://github.com/apache/spark/pull/12031#issuecomment-216774818 Thanks a lot @zsxwing for pushing this.
[GitHub] spark pull request: [SPARK-3767] [CORE] Support wildcard in Spark ...
Github user devaraj-kavali commented on the pull request: https://github.com/apache/spark/pull/12753#issuecomment-216760992 @rxin, please have a look into this and let me know if anything needs to be done here. About @: M/R also uses @ for the task-id wildcard in java opts, and there is no problem with @ on Windows or anywhere else.
[GitHub] spark pull request: [SPARK-3767] [CORE] Support wildcard in Spark ...
Github user devaraj-kavali commented on a diff in the pull request: https://github.com/apache/spark/pull/12753#discussion_r61995355 --- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/CoarseMesosSchedulerBackend.scala --- @@ -166,14 +166,15 @@ private[spark] class CoarseMesosSchedulerBackend( environment.addVariables( Environment.Variable.newBuilder().setName("SPARK_CLASSPATH").setValue(cp).build()) } -val extraJavaOpts = conf.get("spark.executor.extraJavaOptions", "") +var extraJavaOpts = conf.get("spark.executor.extraJavaOptions", "") --- End diff -- Thanks @BryanCutler for the suggestion, I have addressed it in the latest.
[GitHub] spark pull request: [SPARK-1989] [CORE] Exit executors faster if t...
Github user devaraj-kavali closed the pull request at: https://github.com/apache/spark/pull/12571
[GitHub] spark pull request: [SPARK-3767] [CORE] Support wildcard in Spark ...
Github user devaraj-kavali commented on the pull request: https://github.com/apache/spark/pull/12753#issuecomment-215619844 Thanks @rxin for checking this; I don't think @ is used anywhere. Here we replace only the occurrences of @execid@ in the 'spark.executor.extraJavaOptions' value; any other @ symbols are left as they are, so I don't think this causes any problem.
[GitHub] spark pull request: [SPARK-3767] [CORE] Support wildcard in Spark ...
GitHub user devaraj-kavali opened a pull request: https://github.com/apache/spark/pull/12753 [SPARK-3767] [CORE] Support wildcard in Spark properties ## What changes were proposed in this pull request? Added provision to specify the 'spark.executor.extraJavaOptions' value in terms of the Executor Id (i.e. @execid@). @execid@ will be replaced with the Executor Id while starting the executor. ## How was this patch tested? I have verified this by checking the executor process command and gc logs. I verified the same in different deployment modes (Standalone, YARN, Mesos). You can merge this pull request into a Git repository by running: $ git pull https://github.com/devaraj-kavali/spark SPARK-3767 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/12753.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #12753 commit fc3bbd0d72d9319885b877490f57ed4f1b870fa2 Author: Devaraj K Date: 2016-04-28T10:46:51Z [SPARK-3767] [CORE] Support wildcard in Spark properties
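The wildcard substitution itself is simple string replacement; a standalone sketch of the idea (function name and example option are illustrative, not Spark's actual code):

```scala
// Replace only the @execid@ wildcard with the executor's id; any other
// '@' characters in the options string are left untouched.
def expandJavaOpts(opts: String, executorId: String): String =
  opts.replace("@execid@", executorId)

// e.g. expandJavaOpts("-Xloggc:/tmp/gc-@execid@.log", "7")
//      yields "-Xloggc:/tmp/gc-7.log"
```

This is what makes per-executor GC log paths possible without each backend needing its own templating scheme.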
[GitHub] spark pull request: [SPARK-13965] [CORE] TaskSetManager should kil...
Github user devaraj-kavali closed the pull request at: https://github.com/apache/spark/pull/11778
[GitHub] spark pull request: [SPARK-1989] [CORE] Exit executors faster if t...
Github user devaraj-kavali commented on a diff in the pull request: https://github.com/apache/spark/pull/12571#discussion_r61209327 --- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/SparkDeploySchedulerBackend.scala --- @@ -66,12 +66,20 @@ private[spark] class SparkDeploySchedulerBackend( "--cores", "{{CORES}}", "--app-id", "{{APP_ID}}", "--worker-url", "{{WORKER_URL}}") -val extraJavaOpts = sc.conf.getOption("spark.executor.extraJavaOptions") +var extraJavaOpts = sc.conf.getOption("spark.executor.extraJavaOptions") .map(Utils.splitCommandString).getOrElse(Seq.empty) val classPathEntries = sc.conf.getOption("spark.executor.extraClassPath") .map(_.split(java.io.File.pathSeparator).toSeq).getOrElse(Nil) val libraryPathEntries = sc.conf.getOption("spark.executor.extraLibraryPath") .map(_.split(java.io.File.pathSeparator).toSeq).getOrElse(Nil) +// Add GC Limit options if they are not present +val extraJavaOptsAsStr = extraJavaOpts.mkString(" ") +if (!extraJavaOptsAsStr.contains("-XX:GCTimeLimit")) { + extraJavaOpts :+= Utils.getGCTimeLimitOption --- End diff -- Thanks @vanzin for your comments, I will update with the comments fix and also will verify with non-Oracle JVMs.
[GitHub] spark pull request: [SPARK-1989] [CORE] Exit executors faster if t...
Github user devaraj-kavali commented on the pull request: https://github.com/apache/spark/pull/12571#issuecomment-214982150 Thanks @tgravescs for the comment; users can still specify these GC params as part of the java opts. Only if the user doesn't specify these GC params do we add default values for executors, instead of relying on the JVM default values.
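The "add defaults only when the user didn't set them" rule described above can be sketched like this; `userJavaOpts` and the flag values are placeholders, not the values the patch actually uses:

```scala
// Sketch: append GC limit flags only when the user's extraJavaOptions
// doesn't already carry them, so explicit user settings always win.
val userJavaOpts: Seq[String] = Seq("-XX:+UseG1GC") // hypothetical user opts
var extraJavaOpts: Seq[String] = userJavaOpts
if (!extraJavaOpts.exists(_.contains("-XX:GCTimeLimit"))) {
  extraJavaOpts :+= "-XX:GCTimeLimit=98"       // placeholder value
}
if (!extraJavaOpts.exists(_.contains("-XX:GCHeapFreeLimit"))) {
  extraJavaOpts :+= "-XX:GCHeapFreeLimit=2"    // placeholder value
}
```

The `exists`/`contains` check is what distinguishes "supply a default" from "override the user", which was the concern raised in the review.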
[GitHub] spark pull request: [SPARK-1989] [CORE] Exit executors faster if t...
Github user devaraj-kavali commented on the pull request: https://github.com/apache/spark/pull/12571#issuecomment-213522339 @srowen I have made the changes; please have a look into this. Thanks
[GitHub] spark pull request: [SPARK-1989] [CORE] Exit executors faster if t...
Github user devaraj-kavali commented on the pull request: https://github.com/apache/spark/pull/12571#issuecomment-212841070 Thanks @srowen for checking this immediately, I will make the changes as per your explanation.
[GitHub] spark pull request: [SPARK-14234] [CORE] Executor crashes for Task...
Github user devaraj-kavali commented on the pull request: https://github.com/apache/spark/pull/12031#issuecomment-212826864 ping @andrewor14, @zsxwing
[GitHub] spark pull request: [SPARK-1989] [CORE] Exit executors faster if t...
GitHub user devaraj-kavali opened a pull request: https://github.com/apache/spark/pull/12571 [SPARK-1989] [CORE] Exit executors faster if they get into a cycle of heavy GC ## What changes were proposed in this pull request? Added the spark.executor.gcTimeLimit config for the value of the GC option -XX:GCTimeLimit, and the spark.executor.gcHeapFreeLimit config for the value of the GC option -XX:GCHeapFreeLimit. The GC time limit and heap free limit options now need to be set using these configs and are not allowed as part of spark.executor.extraJavaOptions. ## How was this patch tested? I have verified this by checking the executor process command when I ran different spark applications. I verified the same in different deployment modes (Standalone, YARN, Mesos). You can merge this pull request into a Git repository by running: $ git pull https://github.com/devaraj-kavali/spark SPARK-1989 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/12571.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #12571 commit 4f715623487f95a71b6c38c2e50c5e1b6ec7a1b3 Author: Devaraj K Date: 2016-04-21T09:15:07Z [SPARK-1989] [CORE] Exit executors faster if they get into a cycle of heavy GC
[GitHub] spark pull request: [SPARK-14234] [CORE] Executor crashes for Task...
Github user devaraj-kavali commented on the pull request: https://github.com/apache/spark/pull/12031#issuecomment-208725033 @andrewor14, Can you have a look into this when you find some time? Thanks
[GitHub] spark pull request: [SPARK-14234] [CORE] Executor crashes for Task...
Github user devaraj-kavali commented on the pull request: https://github.com/apache/spark/pull/12031#issuecomment-206215752 Thanks @zsxwing for your comments. I have addressed them; please have a look into this.
[GitHub] spark pull request: [SPARK-13063] [YARN] Make the SPARK YARN STAGI...
Github user devaraj-kavali commented on a diff in the pull request: https://github.com/apache/spark/pull/12082#discussion_r58590060 --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala --- @@ -1444,4 +1444,19 @@ object Client extends Logging { uri.startsWith(s"$LOCAL_SCHEME:") } + /** + * Returns the app staging dir. + */ + private def getAppStagingDirPath( + conf: SparkConf, + fs: FileSystem, + appStagingDir: String): Path = { +val stagingRootDir = conf.get(STAGING_DIR).orNull --- End diff -- Thanks @tgravescs for the suggestion. I have addressed it in the latest.
[GitHub] spark pull request: [SPARK-10530] [CORE] Kill other task attempts ...
Github user devaraj-kavali commented on a diff in the pull request: https://github.com/apache/spark/pull/11996#discussion_r58563479 --- Diff: core/src/test/scala/org/apache/spark/scheduler/TaskSetManagerSuite.scala --- @@ -789,6 +791,51 @@ class TaskSetManagerSuite extends SparkFunSuite with LocalSparkContext with Logg assert(TaskLocation("executor_host1_3") === ExecutorCacheTaskLocation("host1", "3")) } + test("Kill other task attempts when one attempt belonging to the same task succeeds") { +sc = new SparkContext("local", "test") +val sched = new FakeTaskScheduler(sc, ("exec1", "host1"), ("exec2", "host2")) +val taskSet = FakeTask.createTaskSet(4) +val manager = new TaskSetManager(sched, taskSet, MAX_TASK_FAILURES) +val accumUpdatesByTask: Array[Seq[AccumulableInfo]] = taskSet.tasks.map { task => + task.initialAccumulators.map { a => a.toInfo(Some(0L), None) } +} +// Offer resources for 4 tasks to start +for ((k, v) <- List( +"exec1" -> "host1", +"exec1" -> "host1", +"exec2" -> "host2", +"exec2" -> "host2")) { + val taskOption = manager.resourceOffer(k, v, NO_PREF) + assert(taskOption.isDefined) + val task = taskOption.get + assert(task.executorId === k) +} +assert(sched.startedTasks.toSet === Set(0, 1, 2, 3)) +// Complete the 3 tasks and leave 1 task in running +for (id <- Set(0, 1, 2)) { + manager.handleSuccessfulTask(id, createTaskResult(id, accumUpdatesByTask(id))) + assert(sched.endedTasks(id) === Success) +} + +// Wait for the threshold time to start speculative attempt for the running task +Thread.sleep(100) --- End diff -- Thanks @tgravescs for your quick response. Here Thread.sleep(100) matches the threshold value mentioned in TaskSetManager.checkSpeculatableTasks(): it is the minimum time a task needs to run before becoming eligible for a speculative attempt. I don't see any way to change this default value. 
> val medianDuration = durations(min((0.5 * tasksSuccessful).round.toInt, durations.length - 1)) > val threshold = max(SPECULATION_MULTIPLIER * medianDuration, 100) > I don't think this threshold value is related to the config 'spark.speculation.interval' here.
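The quoted threshold logic can be sketched standalone, outside Spark. This is a minimal sketch; the multiplier of 1.5 is an assumption (Spark's documented default for `spark.speculation.multiplier`), and 100 is the hard floor in milliseconds from the quoted code:

```scala
// Standalone sketch of the threshold computed in TaskSetManager.checkSpeculatableTasks.
// SpeculationMultiplier = 1.5 is assumed (default of spark.speculation.multiplier);
// the 100 ms floor comes straight from the quoted formula.
object SpeculationThreshold {
  val SpeculationMultiplier = 1.5

  def threshold(durations: Array[Long], tasksSuccessful: Int): Double = {
    val sorted = durations.sorted
    // index mirrors the quoted code: min((0.5 * tasksSuccessful).round, length - 1)
    val medianDuration = sorted(math.min((0.5 * tasksSuccessful).round.toInt, sorted.length - 1))
    math.max(SpeculationMultiplier * medianDuration, 100)
  }
}
```

With three successful tasks of 40, 50 and 60 ms, 1.5 times the picked duration is below 100, so the 100 ms floor wins. That is why the test above has to let the remaining task run for at least 100 ms before a speculative attempt becomes eligible.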
[GitHub] spark pull request: [SPARK-14234] [CORE] Executor crashes for Task...
Github user devaraj-kavali commented on a diff in the pull request: https://github.com/apache/spark/pull/12031#discussion_r58342616 --- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala --- @@ -319,10 +319,14 @@ private[spark] class Executor( case _: TaskKilledException | _: InterruptedException if task.killed => logInfo(s"Executor killed $taskName (TID $taskId)") + // Reset the interrupted status of the thread to update the status + Thread.interrupted() execBackend.statusUpdate(taskId, TaskState.KILLED, ser.serialize(TaskKilled)) --- End diff -- Thanks @zsxwing for looking into the patch. > What will happen if the thread is interrupted when execBackend.statusUpdate is running? I think the executor will still crash. I do think this is a problem. I have handled it in the latest; can you look into the changes?
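The fix relies on a standard JVM behaviour: `Thread.interrupted()` both returns and *clears* the calling thread's interrupt status, so a later blocking call (such as the status update to the driver) no longer throws `InterruptedException`. A minimal standalone demonstration, independent of the Spark code above:

```scala
// Demonstrates that Thread.interrupted() clears the interrupt flag,
// while Thread.currentThread().isInterrupted only reads it.
object InterruptFlagDemo {
  def demonstrate(): Seq[Boolean] = {
    Thread.currentThread().interrupt()                 // set the flag, as a task kill would
    val setBefore = Thread.currentThread().isInterrupted // true: reading does not clear it
    val cleared   = Thread.interrupted()                 // true, and CLEARS the flag
    val setAfter  = Thread.currentThread().isInterrupted // false: later blocking calls won't throw
    Seq(setBefore, cleared, setAfter)
  }
}
```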
[GitHub] spark pull request: [SPARK-10530] [CORE] Kill other task attempts ...
Github user devaraj-kavali commented on the pull request: https://github.com/apache/spark/pull/11996#issuecomment-204313542 Thanks @tgravescs for checking this; I will add a test for these changes.
[GitHub] spark pull request: [SPARK-13063] [YARN] Make the SPARK YARN STAGI...
Github user devaraj-kavali commented on a diff in the pull request: https://github.com/apache/spark/pull/12082#discussion_r58177914 --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala --- @@ -1444,4 +1444,19 @@ object Client extends Logging { uri.startsWith(s"$LOCAL_SCHEME:") } + /** + * Returns the app staging dir. + */ + private def getAppStagingDirPath( + conf: SparkConf, + fs: FileSystem, + appStagingDir: String): Path = { +val stagingRootDir = conf.get(STAGING_DIR).orNull --- End diff -- `conf.get(STAGING_DIR).orElse(fs.getHomeDirectory)` gives type mismatch compilation error since the `fs.getHomeDirectory` return type is Path and it expects to be String as per `conf.get(STAGING_DIR)`.
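The mismatch comes from mixing an `Option[String]` with a `Path` default; mapping the option to `Path` first lines the types up. A self-contained sketch with a stand-in `Path` case class (the class and the `resolve` helper are hypothetical, not the actual Client.scala code):

```scala
// Stand-in for org.apache.hadoop.fs.Path, only to make the types visible.
case class Path(uri: String)

object StagingDirSketch {
  // configured is what conf.get(STAGING_DIR) would yield: Option[String].
  // configured.orElse(homeDirectory) would not compile (Option[String] vs Path);
  // mapping to Path first makes both branches the same type.
  def resolve(configured: Option[String], homeDirectory: Path): Path =
    configured.map(Path(_)).getOrElse(homeDirectory)
}
```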
[GitHub] spark pull request: [SPARK-13063] [YARN] Make the SPARK YARN STAGI...
Github user devaraj-kavali commented on the pull request: https://github.com/apache/spark/pull/12082#issuecomment-204309628 Thanks @tgravescs for looking into the patch.
[GitHub] spark pull request: [SPARK-13063] [YARN] Make the SPARK YARN STAGI...
GitHub user devaraj-kavali opened a pull request: https://github.com/apache/spark/pull/12082 [SPARK-13063] [YARN] Make the SPARK YARN STAGING DIR as configurable ## What changes were proposed in this pull request? Made the SPARK YARN STAGING DIR configurable via the configuration 'spark.yarn.staging-dir'. ## How was this patch tested? I have verified it manually by running applications on YARN. If 'spark.yarn.staging-dir' is configured then its value is used as the staging directory; otherwise the default, i.e. the file system's home directory for the user, is used. You can merge this pull request into a Git repository by running: $ git pull https://github.com/devaraj-kavali/spark SPARK-13063 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/12082.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #12082 commit c3f02fdbdeb9c9dbe3d2a7361414005eed987509 Author: Devaraj K Date: 2016-03-31T09:41:22Z [SPARK-13063] [YARN] Make the SPARK YARN STAGING DIR as configurable
[GitHub] spark pull request: [SPARK-14234] [CORE] Executor crashes for Task...
GitHub user devaraj-kavali opened a pull request: https://github.com/apache/spark/pull/12031 [SPARK-14234] [CORE] Executor crashes for TaskRunner thread interruption ## What changes were proposed in this pull request? Resetting the task interruption status before updating the task status. ## How was this patch tested? I have verified it manually by running multiple applications; with the patch changes the Executor doesn't crash and updates the status to the driver without any exceptions. You can merge this pull request into a Git repository by running: $ git pull https://github.com/devaraj-kavali/spark SPARK-14234 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/12031.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #12031 commit 94b31fa52ce283c2c3de838bfd32dd2cc918c50d Author: Devaraj K Date: 2016-03-29T08:23:59Z [SPARK-14234] [CORE] Executor crashes for TaskRunner thread interruption
[GitHub] spark pull request: [SPARK-13343] [CORE] speculative tasks that di...
Github user devaraj-kavali closed the pull request at: https://github.com/apache/spark/pull/11916
[GitHub] spark pull request: [SPARK-13343] [CORE] speculative tasks that di...
Github user devaraj-kavali commented on the pull request: https://github.com/apache/spark/pull/11916#issuecomment-202318900 I have moved these changes to the PR https://github.com/apache/spark/pull/11996 for SPARK-10530. @tgravescs, please have a look into https://github.com/apache/spark/pull/11996 when you have some time. Thanks
[GitHub] spark pull request: [SPARK-10530] [CORE] Kill other task attempts ...
GitHub user devaraj-kavali opened a pull request: https://github.com/apache/spark/pull/11996 [SPARK-10530] [CORE] Kill other task attempts when one task attempt belonging to the same task succeeds in speculation ## What changes were proposed in this pull request? With this patch, TaskSetManager kills the other running attempts when any one of the attempts succeeds for the same task. Killed tasks are no longer counted as failed tasks; they are listed separately in the UI with the task state KILLED instead of FAILED. ## How was this patch tested? core\src\test\scala\org\apache\spark\ui\jobs\JobProgressListenerSuite.scala core\src\test\scala\org\apache\spark\util\JsonProtocolSuite.scala I have verified this patch manually by enabling spark.speculation: when any attempt succeeds, the other running attempts for the same task are killed and pending tasks are assigned in their place. Also, when an attempt is killed it is counted as KILLED and not as FAILED. Please find the attached screenshots for reference. 
![stage-tasks-table](https://cloud.githubusercontent.com/assets/3174804/14075132/394c6a12-f4f4-11e5-8638-20ff7b8cc9bc.png) ![stages-table](https://cloud.githubusercontent.com/assets/3174804/14075134/3b60f412-f4f4-11e5-9ea6-dd0dcc86eb03.png) Ref : https://github.com/apache/spark/pull/11916 You can merge this pull request into a Git repository by running: $ git pull https://github.com/devaraj-kavali/spark SPARK-10530 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/11996.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #11996 commit 1a9e36516e9016f43a605abce0ee49e1262363a6 Author: Devaraj K Date: 2016-03-28T09:03:07Z [SPARK-10530] [CORE] Kill other task attempts when one taskattempt belonging the same task is succeeded in speculation
[GitHub] spark pull request: [SPARK-13343] [CORE] speculative tasks that di...
Github user devaraj-kavali commented on a diff in the pull request: https://github.com/apache/spark/pull/11916#discussion_r57459040 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala --- @@ -620,6 +620,14 @@ private[spark] class TaskSetManager( // Note: "result.value()" only deserializes the value when it's called at the first time, so // here "result.value()" just returns the value and won't block other threads. sched.dagScheduler.taskEnded(tasks(index), Success, result.value(), result.accumUpdates, info) +// Kill other task attempts if any as the one attempt succeeded +for (attemptInfo <- taskAttempts(index) if attemptInfo.attemptNumber != info.attemptNumber --- End diff -- Thanks @tgravescs. I would be happy to fix the issue of more than one attempt succeeding, as you explained, as part of this PR, but I think it would be good to handle it separately without mixing it with the current PR changes. I will move the current changes to a PR for [SPARK-10530](https://issues.apache.org/jira/browse/SPARK-10530) and we can continue to fix the multiple-attempts-success issue as part of [SPARK-13343](https://issues.apache.org/jira/browse/SPARK-13343). Please let me know if it doesn't make sense.
[GitHub] spark pull request: [SPARK-13343] [CORE] speculative tasks that di...
Github user devaraj-kavali commented on a diff in the pull request: https://github.com/apache/spark/pull/11916#discussion_r57349258 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala --- @@ -620,6 +620,14 @@ private[spark] class TaskSetManager( // Note: "result.value()" only deserializes the value when it's called at the first time, so // here "result.value()" just returns the value and won't block other threads. sched.dagScheduler.taskEnded(tasks(index), Success, result.value(), result.accumUpdates, info) +// Kill other task attempts if any as the one attempt succeeded +for (attemptInfo <- taskAttempts(index) if attemptInfo.attemptNumber != info.attemptNumber --- End diff -- I can see that during the map phase (which doesn't write to Hadoop) there is a chance of two attempts succeeding, as you explained. But for final-phase tasks (which write to Hadoop), if two attempts try to rename taskAttemptPath to committedTaskPath during commitTask(), only one attempt will succeed and the other will fail with a rename failure.
[GitHub] spark pull request: [SPARK-13343] [CORE] speculative tasks that di...
Github user devaraj-kavali commented on a diff in the pull request: https://github.com/apache/spark/pull/11916#discussion_r57340394 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala --- @@ -620,6 +620,14 @@ private[spark] class TaskSetManager( // Note: "result.value()" only deserializes the value when it's called at the first time, so // here "result.value()" just returns the value and won't block other threads. sched.dagScheduler.taskEnded(tasks(index), Success, result.value(), result.accumUpdates, info) +// Kill other task attempts if any as the one attempt succeeded +for (attemptInfo <- taskAttempts(index) if attemptInfo.attemptNumber != info.attemptNumber --- End diff -- Thanks @tgravescs for the comment. If any one attempt has actually completed (succeeded) but its success event has not reached here yet, and during that time another attempt tries to commit the output, then SparkHadoopMapRedUtil.commitTask would prevent it from doing so. The other case is that if a task attempt completes in the Executor before getting the kill signal from TaskSetManager.handleSuccessfulTask, then the Executor ignores the kill request and there is no problem. I don't see a case where two attempts become successful when the task attempts use commit coordination; please help me understand if there are any. The major issue here is that other task attempts keep running and do not release the executor threads even after one attempt has already succeeded for the same task; sometimes these unnecessary task attempts keep running until the job/application completes (if the worker nodes running them are very slow), which makes the application performance worse.
[GitHub] spark pull request: [SPARK-13343] [CORE] speculative tasks that di...
Github user devaraj-kavali commented on the pull request: https://github.com/apache/spark/pull/11916#issuecomment-200740334 Thanks @rxin and @andrewor14 for looking into the patch. These failed tests in the latest build are not related to this patch and they have been failing in the previous builds as well.
[GitHub] spark pull request: [SPARK-13343] [CORE] speculative tasks that di...
GitHub user devaraj-kavali opened a pull request: https://github.com/apache/spark/pull/11916 [SPARK-13343] [CORE] speculative tasks that didn't commit shouldn't be marked as success ## What changes were proposed in this pull request? With this patch, killed tasks are no longer counted as failed tasks; they are listed separately in the UI with the task state KILLED instead of FAILED. ## How was this patch tested? I have verified this patch manually: when any attempt is killed it is counted as KILLED and not as FAILED. Please find the attached screenshots for reference. [SPARK-13965](https://issues.apache.org/jira/browse/SPARK-13965)/https://github.com/apache/spark/pull/11778 kills the running task attempts immediately when any one of the tasks succeeds, and this patch will consider and show them as KILLED. ![stage-tasks-table](https://cloud.githubusercontent.com/assets/3174804/13984882/1e8deb66-f11f-11e5-9a89-e571dc5f1eef.png) ![stages-table](https://cloud.githubusercontent.com/assets/3174804/13984881/1e8d8216-f11f-11e5-9d29-22a7aca94938.png) You can merge this pull request into a Git repository by running: $ git pull https://github.com/devaraj-kavali/spark SPARK-13343 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/11916.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #11916 commit 7c033b6d6dd7eb1d9296d82a965facec95dd6757 Author: Devaraj K Date: 2016-03-23T12:11:30Z [SPARK-13343] [CORE] speculative tasks that didn't commit shouldn't be marked as success
[GitHub] spark pull request: [SPARK-913] [CORE] log the size of each shuffl...
GitHub user devaraj-kavali opened a pull request: https://github.com/apache/spark/pull/11819 [SPARK-913] [CORE] log the size of each shuffle block in block manager ## What changes were proposed in this pull request? Added a log message which shows the size of the block. ## How was this patch tested? Verified manually that this log appears in the executor log. You can merge this pull request into a Git repository by running: $ git pull https://github.com/devaraj-kavali/spark SPARK-913 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/11819.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #11819 commit e738937d557230e5dacee7f7f913e37e54255a8e Author: Devaraj K Date: 2016-03-18T10:17:09Z [SPARK-913] [CORE] log the size of each shuffle block in block manager
[GitHub] spark pull request: [SPARK-13965] [CORE] Driver should kill the ot...
GitHub user devaraj-kavali opened a pull request: https://github.com/apache/spark/pull/11778 [SPARK-13965] [CORE] Driver should kill the other running task attempts if any one task attempt succeeds for the same task ## What changes were proposed in this pull request? core\src\main\scala\org\apache\spark\scheduler\TaskSetManager.scala TaskSetManager kills the other running attempts when any one attempt succeeds for the same task. ## How was this patch tested? I have verified this patch manually by enabling spark.speculation: when any attempt succeeds, the other running attempts for the same task are killed and pending tasks are assigned in their place. You can merge this pull request into a Git repository by running: $ git pull https://github.com/devaraj-kavali/spark SPARK-13965 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/11778.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #11778 commit 10cf58e10ffb93961db41d3269fd48dab8ecf711 Author: Devaraj K Date: 2016-03-17T09:10:14Z [SPARK-13965] [CORE] Driver should kill the other running task attempts if any one task attempt succeeds for the same task
[GitHub] spark pull request: [SPARK-913] [CORE] log the size of each shuffl...
Github user devaraj-kavali closed the pull request at: https://github.com/apache/spark/pull/11819
[GitHub] spark pull request: [SPARK-913] [CORE] log the size of each shuffl...
Github user devaraj-kavali commented on the pull request: https://github.com/apache/spark/pull/11819#issuecomment-198434010 Thanks @srowen and @JoshRosen for the details. I am closing this since the BlockManager no longer handles shuffle blocks.
[GitHub] spark pull request: [SPARK-13117] [Web UI] WebUI should use the lo...
Github user devaraj-kavali commented on the pull request: https://github.com/apache/spark/pull/11490#issuecomment-193282292 Sounds fine @srowen; I will update with the change.
[GitHub] spark pull request: [SPARK-13117] [Web UI] WebUI should use the lo...
Github user devaraj-kavali commented on a diff in the pull request: https://github.com/apache/spark/pull/11490#discussion_r5519 --- Diff: core/src/main/scala/org/apache/spark/ui/WebUI.scala --- @@ -134,7 +134,8 @@ private[spark] abstract class WebUI( def bind() { assert(!serverInfo.isDefined, "Attempted to bind %s more than once!".format(className)) try { - serverInfo = Some(startJettyServer("0.0.0.0", port, sslOptions, handlers, conf, name)) + var host = Option(conf.getenv("SPARK_LOCAL_IP")).getOrElse("0.0.0.0") + serverInfo = Some(startJettyServer(host, port, sslOptions, handlers, conf, name)) logInfo("Started %s at http://%s:%d".format(className, publicHostName, boundPort)) --- End diff -- I am under the assumption that we need to consider the SPARK_PUBLIC_DNS value for showing the url in the log, as per our previous conversation. Don't we need to consider SPARK_PUBLIC_DNS while logging here?
[GitHub] spark pull request: [SPARK-13117] [Web UI] WebUI should use the lo...
Github user devaraj-kavali commented on the pull request: https://github.com/apache/spark/pull/11490#issuecomment-192878456 Thanks @srowen and @zsxwing for the confirmation. I have updated the description and fixed the review comment.
[GitHub] spark pull request: [SPARK-13117] [Web UI] WebUI should use the lo...
Github user devaraj-kavali commented on the pull request: https://github.com/apache/spark/pull/11490#issuecomment-192118319 I agree @srowen, I see that SPARK_PUBLIC_DNS is not for binding purposes. I have changed the env var to SPARK_LOCAL_IP.
[GitHub] spark pull request: [SPARK-13117] [Web UI] WebUI should use the lo...
Github user devaraj-kavali commented on the pull request: https://github.com/apache/spark/pull/11490#issuecomment-191869171 I had overlooked this and it was my mistake; I think we need to consider both env variables, something like:
```
serverInfo = Some(startJettyServer(Option(conf.getenv("SPARK_PUBLIC_DNS"))
  .getOrElse(Option(conf.getenv("SPARK_LOCAL_IP"))
  .getOrElse("0.0.0.0")), port, sslOptions, handlers, conf, name))
```
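The nested `getOrElse` calls can also be written with `orElse`, which chains fallbacks left to right. A sketch over a plain lookup function rather than the actual `WebUI` code (`bindHost` and the `env` parameter are illustrative names, standing in for `conf.getenv`):

```scala
object BindHostSketch {
  // env stands in for conf.getenv wrapped in Option; in WebUI it would read
  // real environment variables. Fallback order: public DNS, then local IP,
  // then the wildcard address.
  def bindHost(env: String => Option[String]): String =
    env("SPARK_PUBLIC_DNS")
      .orElse(env("SPARK_LOCAL_IP"))
      .getOrElse("0.0.0.0")
}
```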
[GitHub] spark pull request: [SPARK-13117] [Web UI] WebUI should use the lo...
GitHub user devaraj-kavali opened a pull request: https://github.com/apache/spark/pull/11490 [SPARK-13117] [Web UI] WebUI should use the local ip not 0.0.0.0

## What changes were proposed in this pull request?

In WebUI, the Jetty server now starts with the SPARK_PUBLIC_DNS config value if it is configured; otherwise it starts with the default value '0.0.0.0'. This is a continuation of the closed PR https://github.com/apache/spark/pull/11133 for the JIRA SPARK-13117 and the discussion in SPARK-13117.

## How was this patch tested?

This has been verified using the command `netstat -tnlp | grep <pid>` to check which IP/hostname each process binds to. In the results below, the PID mentioned in the command is the corresponding process id.

Without the patch changes, the Web UI (Jetty server) does not take the value configured for SPARK_PUBLIC_DNS and listens on all interfaces.

## Master
```
[devaraj@stobdtserver2 sbin]$ netstat -tnlp | grep 3930
tcp6  0  0  :::8080  :::*  LISTEN  3930/java
```
## Worker
```
[devaraj@stobdtserver2 sbin]$ netstat -tnlp | grep 4090
tcp6  0  0  :::8081  :::*  LISTEN  4090/java
```
## History Server
```
[devaraj@stobdtserver2 sbin]$ netstat -tnlp | grep 2471
tcp6  0  0  :::18080  :::*  LISTEN  2471/java
```
## Driver
```
[devaraj@stobdtserver2 spark-master]$ netstat -tnlp | grep 6556
tcp6  0  0  :::4040  :::*  LISTEN  6556/java
```

With the patch changes:

# i. With SPARK_PUBLIC_DNS configured

If SPARK_PUBLIC_DNS is configured, every process's Web UI (Jetty server) binds to the configured value.

## Master
```
[devaraj@stobdtserver2 sbin]$ netstat -tnlp | grep 1561
tcp6  0  0  x.x.x.x:8080  :::*  LISTEN  1561/java
```
## Worker
```
[devaraj@stobdtserver2 sbin]$ netstat -tnlp | grep 2229
tcp6  0  0  x.x.x.x:8081  :::*  LISTEN  2229/java
```
## History Server
```
[devaraj@stobdtserver2 sbin]$ netstat -tnlp | grep 3747
tcp6  0  0  x.x.x.x:18080  :::*  LISTEN  3747/java
```
## Driver
```
[devaraj@stobdtserver2 spark-master]$ netstat -tnlp | grep 6013
tcp6  0  0  x.x.x.x:4040  :::*  LISTEN  6013/java
```

# ii. Without SPARK_PUBLIC_DNS configured

If SPARK_PUBLIC_DNS is not configured, every process's Web UI (Jetty server) starts with the default value '0.0.0.0'.

## Master
```
[devaraj@stobdtserver2 sbin]$ netstat -tnlp | grep 4573
tcp6  0  0  :::8080  :::*  LISTEN  4573/java
```
## Worker
```
[devaraj@stobdtserver2 sbin]$ netstat -tnlp | grep 4703
tcp6  0  0  :::8081  :::*  LISTEN  4703/java
```
## History Server
```
[devaraj@stobdtserver2 sbin]$ netstat -tnlp | grep 4846
tcp6  0  0  :::18080  :::*  LISTEN  4846/java
```
## Driver
```
[devaraj@stobdtserver2 sbin]$ netstat -tnlp | grep 5437
tcp6  0  0  :::4040  :::*  LISTEN  5437/java
```

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/devaraj-kavali/spark SPARK-13117-v1

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/11490.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #11490

commit 1d736ff7053b0df7b42b34ae738b7a2873e718a7
Author: Devaraj K
Date: 2016-03-03T09:00:24Z

[SPARK-13117] [Web UI] WebUI should use the local ip not 0.0.0.0

In WebUI, the Jetty server now starts with the SPARK_PUBLIC_DNS config value if it is configured; otherwise it starts with the default value '0.0.0.0'.
[GitHub] spark pull request: [SPARK-13621] [CORE] TestExecutor.scala needs ...
GitHub user devaraj-kavali opened a pull request: https://github.com/apache/spark/pull/11474 [SPARK-13621] [CORE] TestExecutor.scala needs to be moved to test package

Moved TestExecutor.scala from src to the test package and removed the unused file TestClient.scala.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/devaraj-kavali/spark SPARK-13621

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/11474.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #11474

commit 894638b208cfbec911161093f14f2e05ed31c2a9
Author: Devaraj K
Date: 2016-03-02T17:37:03Z

[SPARK-13621] [CORE] TestExecutor.scala needs to be moved to test package

Moved TestExecutor.scala from src to the test package and removed the unused file TestClient.scala.
[GitHub] spark pull request: [SPARK-13117][Web UI] WebUI should use the loc...
Github user devaraj-kavali commented on the pull request: https://github.com/apache/spark/pull/11133#issuecomment-189320534 @srowen Would it be OK if we start the Jetty server with the default value "0.0.0.0" instead of the local host name, and let it pick up the value configured for SPARK_PUBLIC_DNS if one is set? This would change only the Web UI and doesn't impact anything else.
[GitHub] spark pull request: [SPARK-13117][Web UI] WebUI should use the loc...
Github user devaraj-kavali commented on the pull request: https://github.com/apache/spark/pull/11133#issuecomment-188678997 Earlier there was no problem in the test because the Jetty server was being started with "0.0.0.0" and did not take effect of the value configured for SPARK_PUBLIC_DNS; the test assertions check the host name of the URLs, and those URLs are derived from SPARK_PUBLIC_DNS.
[GitHub] spark pull request: [SPARK-13012] [Documentation] Replace example ...
Github user devaraj-kavali commented on the pull request: https://github.com/apache/spark/pull/11053#issuecomment-188062361 @yinxusen I will look into SPARK-13462. Thanks for creating it.
[GitHub] spark pull request: [SPARK-13117][Web UI] WebUI should use the loc...
Github user devaraj-kavali commented on the pull request: https://github.com/apache/spark/pull/11133#issuecomment-187520503 @srowen, I have fixed the test failure. Can you have a look at this? Thanks
[GitHub] spark pull request: [SPARK-13117][Web UI] WebUI should use the loc...
Github user devaraj-kavali commented on the pull request: https://github.com/apache/spark/pull/11133#issuecomment-187098980 It does not give clear details about the failure, and the exit with code 1 is because of ***System.exit(1)***. I think we can skip this ***System.exit(1)*** while running tests, to avoid terminating the JVM for these kinds of exceptions and to show them as test failures instead.
```scala
try {
  serverInfo = Some(startJettyServer(publicHostName, port, sslOptions, handlers, conf, name))
  logInfo("Started %s at http://%s:%d".format(className, publicHostName, boundPort))
} catch {
  case e: Exception =>
    logError("Failed to bind %s".format(className), e)
    System.exit(1)
}
```
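One way such a skip could work (a minimal sketch, not the actual Spark change): gate the exit behind a testing flag and rethrow instead, so the test framework records a failure rather than the JVM dying. The `bindOrExit` helper and `isTesting` parameter here are hypothetical names; Spark has a similar notion via the `spark.testing` property.

```java
// Hypothetical sketch: in tests, rethrow bind failures so they surface
// as test failures; in production, keep the existing System.exit(1).
public class SafeBind {
  public static void bindOrExit(Runnable bind, boolean isTesting) {
    try {
      bind.run();
    } catch (RuntimeException e) {
      if (isTesting) {
        throw e;        // let the test framework report the failure
      }
      System.exit(1);   // production behavior: terminate the JVM
    }
  }
}
```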
[GitHub] spark pull request: [SPARK-13117][Web UI] WebUI should use the loc...
Github user devaraj-kavali commented on the pull request: https://github.com/apache/spark/pull/11133#issuecomment-187096991 I see the test/Jenkins failure is due to the PR change. Here org.apache.spark.deploy.LogUrlsStandaloneSuite is failing because of the below exception:
```
16/02/22 17:38:32.257 dispatcher-event-loop-5 ERROR Worker: Connection to master failed! Waiting for master to reconnect...
16/02/22 17:38:42.441 ScalaTest-main-running-LogUrlsStandaloneSuite ERROR SparkUI: Failed to bind SparkUI
java.net.SocketException: Unresolved address
    at sun.nio.ch.Net.translateToSocketException(Net.java:157)
    at sun.nio.ch.Net.translateException(Net.java:183)
    at sun.nio.ch.Net.translateException(Net.java:189)
    at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:76)
    at org.eclipse.jetty.server.nio.SelectChannelConnector.open(SelectChannelConnector.java:187)
    at org.eclipse.jetty.server.AbstractConnector.doStart(AbstractConnector.java:316)
    at org.eclipse.jetty.server.nio.SelectChannelConnector.doStart(SelectChannelConnector.java:265)
    at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:64)
    at org.eclipse.jetty.server.Server.doStart(Server.java:293)
    at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:64)
    at org.apache.spark.ui.JettyUtils$.org$apache$spark$ui$JettyUtils$$connect$1(JettyUtils.scala:283)
    at org.apache.spark.ui.JettyUtils$$anonfun$5.apply(JettyUtils.scala:293)
    at org.apache.spark.ui.JettyUtils$$anonfun$5.apply(JettyUtils.scala:293)
    at org.apache.spark.util.Utils$$anonfun$startServiceOnPort$1.apply$mcVI$sp(Utils.scala:1973)
    at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:166)
    at org.apache.spark.util.Utils$.startServiceOnPort(Utils.scala:1964)
    at org.apache.spark.ui.JettyUtils$.startJettyServer(JettyUtils.scala:293)
    at org.apache.spark.ui.WebUI.bind(WebUI.scala:137)
    at org.apache.spark.SparkContext$$anonfun$13.apply(SparkContext.scala:458)
    at org.apache.spark.SparkContext$$anonfun$13.apply(SparkContext.scala:458)
    at scala.Option.foreach(Option.scala:257)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:458)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:133)
    at org.apache.spark.deploy.LogUrlsStandaloneSuite$$anonfun$2.apply$mcV$sp(LogUrlsStandaloneSuite.scala:59)
    at org.apache.spark.deploy.LogUrlsStandaloneSuite$$anonfun$2.apply(LogUrlsStandaloneSuite.scala:55)
    at org.apache.spark.deploy.LogUrlsStandaloneSuite$$anonfun$2.apply(LogUrlsStandaloneSuite.scala:55)
    at org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22)
```
In LogUrlsStandaloneSuite.scala, SPARK_PUBLIC_DNS is set to "public_dns", and the same value is used while starting the Jetty server in WebUI.scala, which fails because "public_dns" cannot be resolved.
```scala
test("verify that log urls reflect SPARK_PUBLIC_DNS (SPARK-6175)") {
  val SPARK_PUBLIC_DNS = "public_dns"
  val conf = new SparkConfWithEnv(Map("SPARK_PUBLIC_DNS" -> SPARK_PUBLIC_DNS)).set(
    "spark.extraListeners", classOf[SaveExecutorInfo].getName)
  sc = new SparkContext("local-cluster[2,1,1024]", "test", conf)
```
```scala
protected val publicHostName = Option(conf.getenv("SPARK_PUBLIC_DNS")).getOrElse(localHostName)

def bind() {
  assert(!serverInfo.isDefined, "Attempted to bind %s more than once!".format(className))
  try {
    serverInfo = Some(startJettyServer(publicHostName, port, sslOptions, handlers, conf, name))
    logInfo("Started %s at http://%s:%d".format(className, publicHostName, boundPort))
```
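The immediate precondition the suite violates is that a host name can only be bound once it resolves to an address; "public_dns" does not resolve, hence the `Unresolved address` SocketException. A small, hypothetical sketch of that check (the `resolvable` helper is an assumed name, not Spark code):

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical helper: returns true only when the given host name
// resolves to an address, i.e. when a server socket could bind to it.
public class HostCheck {
  public static boolean resolvable(String host) {
    try {
      InetAddress.getByName(host);
      return true;
    } catch (UnknownHostException e) {
      return false;
    }
  }
}
```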
[GitHub] spark pull request: [SPARK-13012] [Documentation] Replace example ...
Github user devaraj-kavali commented on the pull request: https://github.com/apache/spark/pull/11053#issuecomment-186855380 Thanks @yinxusen for the good suggestion; I have addressed it.

> ModelSelectionViaTrainValidationSplitExample and JavaModelSelectionViaTrainValidationSplitExample still have a problem of Vector serialization. But I think we can add a follow-up JIRA to locate the bug and fix it.

Yes, we can create another follow-up JIRA to fix the problem. Thank you.
[GitHub] spark pull request: [SPARK-13117][Web UI] WebUI should use the loc...
Github user devaraj-kavali commented on the pull request: https://github.com/apache/spark/pull/11133#issuecomment-186185243 Thanks @srowen for triggering the Jenkins test to check this.
```
[info] - verify that correct log urls get propagated from workers (2 seconds, 508 milliseconds)
Exception in thread "Thread-46" Exception in thread "Thread-53" java.net.SocketException: Connection reset
    at java.net.SocketInputStream.read(SocketInputStream.java:196)
    at java.net.SocketInputStream.read(SocketInputStream.java:122)
    at java.net.SocketInputStream.read(SocketInputStream.java:210)
    at java.io.ObjectInputStream$PeekInputStream.peek(ObjectInputStream.java:2293)
    at java.io.ObjectInputStream$BlockDataInputStream.peek(ObjectInputStream.java:2586)
    at java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2596)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1318)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
    at sbt.React.react(ForkTests.scala:114)
    at sbt.ForkTests$$anonfun$mainTestTask$1$Acceptor$2$.run(ForkTests.scala:74)
    at java.lang.Thread.run(Thread.java:745)
java.io.EOFException
    at java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2598)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1318)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
    at org.scalatest.tools.Framework$ScalaTestRunner$Skeleton$1$React.react(Framework.scala:945)
    at org.scalatest.tools.Framework$ScalaTestRunner$Skeleton$1.run(Framework.scala:934)
    at java.lang.Thread.run(Thread.java:745)
[info] ScalaTest
[info] Run completed in 10 minutes, 41 seconds.
[info] Total number of tests run: 1378
[info] Suites: completed 152, aborted 0
[info] Tests: succeeded 1378, failed 0, canceled 0, ignored 5, pending 0
[info] All tests passed.
[error] Error: Total 0, Failed 0, Errors 0, Passed 0
[error] Error during tests:
[error] Running java with options -classpath /home/jenkins/workspace/SparkPullRequestBuilder/core/target/scala-2.11/test-classes:/home/jenkins/workspace/SparkPullRequestBuilder/core/target/scala-2.11/classes:/home/jenkins/workspace/SparkPullRequestBuilder/launcher/target/scala-2.11/classes:/home/jenkins/workspace/SparkPullRequestBuilder/network/common/target/scala-2.11/classes:/home/jenkins/workspace/SparkPullRequestBuilder/network/shuffle/target/scala-2.11/classes:/home/jenkins/workspace/SparkPullRequestBuilder/unsafe/target/scala-2.11/classes:/home/jenkins/wor:/home/sparkivy/per-executor-caches/7/.sbt/boot/scala-2.10.5/org.scala-sbt/sbt/0.13.9/test-agent-0.13.9.jar:/home/sparkivy/per-executor-caches/7/.sbt/boot/scala-2.10.5/org.scala-sbt/sbt/0.13.9/test-interface-1.0.jar sbt.ForkMain 55745 failed with exit code 1
[info] MQTTStreamSuite:
```
```
[info] Passed: Total 975, Failed 0, Errors 0, Passed 975, Ignored 12
[error] (core/test:test) sbt.TestsFailedException: Tests unsuccessful
[error] Total time: 4484 s, completed Feb 17, 2016 5:23:51 AM
[error] running /home/jenkins/workspace/SparkPullRequestBuilder/build/sbt -Pyarn -Phadoop-2.3 -Phive -Pkinesis-asl -Phive-thriftserver -Dtest.exclude.tags=org.apache.spark.tags.ExtendedHiveTest,org.apache.spark.tags.ExtendedYarnTest test ; received return code 1
```
I think Jenkins is showing test failures because of these; I don't see any test case failure here. Can anyone help me with some information about how to check these test errors?
[GitHub] spark pull request: [SPARK-13012] [Documentation] Replace example ...
Github user devaraj-kavali commented on the pull request: https://github.com/apache/spark/pull/11053#issuecomment-186082803 Thanks again @yinxusen for the review, I have addressed the comments.
[GitHub] spark pull request: [SPARK-13016] [Documentation] Replace example ...
Github user devaraj-kavali commented on the pull request: https://github.com/apache/spark/pull/11132#issuecomment-186070514 Thanks again @yinxusen for the review, I have addressed them.
[GitHub] spark pull request: [SPARK-13016] [Documentation] Replace example ...
Github user devaraj-kavali commented on the pull request: https://github.com/apache/spark/pull/11132#issuecomment-185629805 Thanks @yinxusen for the review, I have addressed them.
[GitHub] spark pull request: [SPARK-13012] [Documentation] Replace example ...
Github user devaraj-kavali commented on the pull request: https://github.com/apache/spark/pull/11053#issuecomment-185611856 Thanks @yinxusen for your detailed review and comments. I have addressed them.
[GitHub] spark pull request: [SPARK-13016] [Documentation] Replace example ...
Github user devaraj-kavali commented on a diff in the pull request: https://github.com/apache/spark/pull/11132#discussion_r53277719

--- Diff: examples/src/main/java/org/apache/spark/examples/mllib/JavaSVDExample.java ---
@@ -0,0 +1,68 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.examples.mllib;
+
+// $example on$
+import java.util.LinkedList;
+
+import org.apache.spark.SparkConf;
+import org.apache.spark.SparkContext;
+import org.apache.spark.api.java.JavaRDD;
+import org.apache.spark.api.java.JavaSparkContext;
+import org.apache.spark.mllib.linalg.Matrix;
+import org.apache.spark.mllib.linalg.SingularValueDecomposition;
+import org.apache.spark.mllib.linalg.Vector;
+import org.apache.spark.mllib.linalg.Vectors;
+import org.apache.spark.mllib.linalg.distributed.RowMatrix;
+// $example off$
+
+/**
+ * Example for SingularValueDecomposition.
+ */
+public class JavaSVDExample {
+  public static void main(String[] args) {
+    SparkConf conf = new SparkConf().setAppName("SVD Example");
+    SparkContext sc = new SparkContext(conf);
+
+    // $example on$
+    double[][] array = { { 1.12, 2.05, 3.12 }, { 5.56, 6.28, 8.94 }, { 10.2, 8.0, 20.5 } };
+    LinkedList<Vector> rowsList = new LinkedList<Vector>();
+    for (int i = 0; i < array.length; i++) {
+      Vector currentRow = Vectors.dense(array[i]);
+      rowsList.add(currentRow);
+    }
+    JavaRDD<Vector> rows = JavaSparkContext.fromSparkContext(sc).parallelize(rowsList);
+
+    // Create a RowMatrix from JavaRDD<Vector>.
+    RowMatrix mat = new RowMatrix(rows.rdd());
+
+    // Compute the top 3 singular values and corresponding singular vectors.
+    SingularValueDecomposition<RowMatrix, Matrix> svd = mat.computeSVD(3, true, 1.0E-9d);
+    RowMatrix U = svd.U();
+    Vector s = svd.s();
+    Matrix V = svd.V();
+    Vector[] collectPartitions = (Vector[]) U.rows().collect();
--- End diff --

It gives a compilation error if we remove the typecast here, since the return type of collect is Object.
[GitHub] spark pull request: [SPARK-13012] [Documentation] Replace example ...
Github user devaraj-kavali commented on the pull request: https://github.com/apache/spark/pull/11053#issuecomment-185280685 Thanks @srowen for the review and comments. I have removed serialVersionUID and the setters in the Java Beans, and also addressed the unnecessary spaces between braces in imports.
[GitHub] spark pull request: [SPARK-13016] [Documentation] Replace example ...
Github user devaraj-kavali commented on the pull request: https://github.com/apache/spark/pull/11132#issuecomment-185064091 @yinxusen Thanks for reviewing. I have addressed the comments; please have a look.
[GitHub] spark pull request: [SPARK-13117][Web UI] WebUI should use the loc...
Github user devaraj-kavali commented on the pull request: https://github.com/apache/spark/pull/11133#issuecomment-185016457 @srowen I am investigating it and will update. Thanks
[GitHub] spark pull request: [SPARK-13012] [Documentation] Replace example ...
Github user devaraj-kavali commented on the pull request: https://github.com/apache/spark/pull/11053#issuecomment-184538336 Thanks for the review @yinxusen. I have configured the code format in my IDE and am using it to format the code. I will fix these comments and update.