[GitHub] spark issue #19267: [WIP][SPARK-20628][CORE] Blacklist nodes when they trans...

2017-11-27 Thread juanrh
Github user juanrh commented on the issue: https://github.com/apache/spark/pull/19267 @tgravescs I was finally able to contribute https://github.com/apache/hadoop/pull/289 which solves [YARN-6483](https://issues.apache.org/jira/browse/YARN-6483). With that patch, and the code

[GitHub] spark issue #19267: [WIP][SPARK-20628][CORE] Blacklist nodes when they trans...

2017-11-07 Thread juanrh
Github user juanrh commented on the issue: https://github.com/apache/spark/pull/19267 @tgravescs I have opened https://github.com/apache/hadoop/pull/289 with the YARN changes to get a notification in the AM when a node transitions to DECOMMISSIONING. This should already be useful

[GitHub] spark issue #19583: [WIP][SPARK-22339] [CORE] [NETWORK-SHUFFLE] Push epoch u...

2017-10-31 Thread juanrh
Github user juanrh commented on the issue: https://github.com/apache/spark/pull/19583 Even though we only wait 5 seconds by default between retries, the retries themselves can take a lot of time. For example, in a simple word count job where a node is lost during stage 1.0 I have seen

[GitHub] spark pull request #19583: [WIP][SPARK-22339] [CORE] [NETWORK-SHUFFLE] Push ...

2017-10-31 Thread juanrh
Github user juanrh commented on a diff in the pull request: https://github.com/apache/spark/pull/19583#discussion_r148079448 --- Diff: core/src/test/scala/org/apache/spark/HeartbeatReceiverSuite.scala --- @@ -225,6 +270,7 @@ class HeartbeatReceiverSuite Matchers.eq

[GitHub] spark pull request #19583: [WIP][SPARK-22339] [CORE] [NETWORK-SHUFFLE] Push ...

2017-10-31 Thread juanrh
Github user juanrh commented on a diff in the pull request: https://github.com/apache/spark/pull/19583#discussion_r148078847 --- Diff: core/src/main/scala/org/apache/spark/HeartbeatReceiver.scala --- @@ -51,7 +51,26 @@ private case class ExecutorRegistered(executorId: String

[GitHub] spark pull request #19583: [WIP][SPARK-22339] [CORE] [NETWORK-SHUFFLE] Push ...

2017-10-31 Thread juanrh
Github user juanrh commented on a diff in the pull request: https://github.com/apache/spark/pull/19583#discussion_r148078337 --- Diff: core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala --- @@ -241,6 +243,21 @@ final class ShuffleBlockFetcherIterator

[GitHub] spark pull request #19583: [WIP][SPARK-22339] [CORE] [NETWORK-SHUFFLE] Push ...

2017-10-31 Thread juanrh
Github user juanrh commented on a diff in the pull request: https://github.com/apache/spark/pull/19583#discussion_r148075569 --- Diff: core/src/main/scala/org/apache/spark/HeartbeatReceiver.scala --- @@ -51,7 +51,26 @@ private case class ExecutorRegistered(executorId: String

[GitHub] spark pull request #19590: [WIP][SPARK-22148][CORE] TaskSetManager.abortIfCo...

2017-10-27 Thread juanrh
GitHub user juanrh opened a pull request: https://github.com/apache/spark/pull/19590 [WIP][SPARK-22148][CORE] TaskSetManager.abortIfCompletelyBlacklisted should not abort when all current executors are blacklisted but dynamic allocation is enabled ## What changes were proposed

[GitHub] spark pull request #19583: [WIP][SPARK-22339] [CORE] [NETWORK-SHUFFLE]

2017-10-26 Thread juanrh
GitHub user juanrh opened a pull request: https://github.com/apache/spark/pull/19583 [WIP][SPARK-22339] [CORE] [NETWORK-SHUFFLE] ## What changes were proposed in this pull request? When a task fails with a fetch error, DAGScheduler unregisters the shuffle

[GitHub] spark issue #19267: [WIP][SPARK-20628][CORE] Blacklist nodes when they trans...

2017-10-26 Thread juanrh
Github user juanrh commented on the issue: https://github.com/apache/spark/pull/19267 Hi @tgravescs, thanks again for your feedback. Regarding concrete use cases, this change might be used to extend the existing graceful decommission mechanism available in AWS EMR from a while ago

[GitHub] spark issue #19267: [WIP][SPARK-20628][CORE] Blacklist nodes when they trans...

2017-10-20 Thread juanrh
Github user juanrh commented on the issue: https://github.com/apache/spark/pull/19267 Hi @vanzin and @tgravescs, do you have any other comments on this proposal? Thanks, Juan --- - To unsubscribe

[GitHub] spark issue #19267: [WIP][SPARK-20628][CORE] Blacklist nodes when they trans...

2017-10-17 Thread juanrh
Github user juanrh commented on the issue: https://github.com/apache/spark/pull/19267 Hi Tom, thanks for your answer. Regarding use cases for the Spark admin command, I think it would be a good fit for cloud environments, where single job clusters are common, because

[GitHub] spark issue #19267: [WIP][SPARK-20628][CORE] Blacklist nodes when they trans...

2017-10-06 Thread juanrh
Github user juanrh commented on the issue: https://github.com/apache/spark/pull/19267 Hi @vanzin, do you have any comments on the design document attached above? Thanks

[GitHub] spark issue #19267: [WIP][SPARK-20628][CORE] Blacklist nodes when they trans...

2017-10-02 Thread juanrh
Github user juanrh commented on the issue: https://github.com/apache/spark/pull/19267 Hi @vanzin, thanks for taking a look. This was part of a discussion with @holdenk about SPARK-20628. I have attached the document [Spark_Blacklisting_on_decommissioning-Scope.pdf](https

[GitHub] spark pull request #19267: [WIP][SPARK-20628][CORE] Blacklist nodes when the...

2017-09-18 Thread juanrh
GitHub user juanrh opened a pull request: https://github.com/apache/spark/pull/19267 [WIP][SPARK-20628][CORE] Blacklist nodes when they transition to DECOMMISSIONING state in YARN ## What changes were proposed in this pull request? Dynamic cluster configurations where cluster

[GitHub] spark pull request #17411: logging improvements

2017-03-24 Thread juanrh
GitHub user juanrh opened a pull request: https://github.com/apache/spark/pull/17411 logging improvements ## What changes were proposed in this pull request? Adding additional information to existing logging messages: - YarnAllocator: log the executor ID together

[GitHub] spark pull request: [SPARK-6714][Streaming][Kafka] additionally ov...

2015-04-23 Thread juanrh
Github user juanrh closed the pull request at: https://github.com/apache/spark/pull/5367 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature

[GitHub] spark pull request: [SPARK-6714][Streaming][Kafka] additionally ov...

2015-04-23 Thread juanrh
Github user juanrh commented on the pull request: https://github.com/apache/spark/pull/5367#issuecomment-95575488 the corresponding issue [SPARK-6714] has been closed as Won't Fix

[GitHub] spark pull request: [SPARK-6714][Streaming][Kafka] additionally ov...

2015-04-05 Thread juanrh
GitHub user juanrh opened a pull request: https://github.com/apache/spark/pull/5367 [SPARK-6714][Streaming][Kafka] additionally overload KafkaUtils.createDirectStream for using a messageHandler without having to specify the offsets You can merge this pull request into a Git