[GitHub] spark issue #12436: [SPARK-14649][CORE] DagScheduler should not run duplicat...

2017-03-14 Thread sitalkedia
Github user sitalkedia commented on the issue: https://github.com/apache/spark/pull/12436 @jisookim0513 - created a new PR - https://github.com/apache/spark/pull/17297

[GitHub] spark issue #12436: [SPARK-14649][CORE] DagScheduler should not run duplicat...

2017-03-13 Thread sitalkedia
Github user sitalkedia commented on the issue: https://github.com/apache/spark/pull/12436 >> Also, separately from what approach is used, how do you deal with the following: suppose map task 1 loses its output (e.g., the reducer where that task is located dies). Now, suppose reduce

[GitHub] spark issue #12436: [SPARK-14649][CORE] DagScheduler should not run duplicat...

2017-02-21 Thread jisookim0513
Github user jisookim0513 commented on the issue: https://github.com/apache/spark/pull/12436 @sitalkedia have you had a chance to work on this issue and open a new PR?

[GitHub] spark issue #12436: [SPARK-14649][CORE] DagScheduler should not run duplicat...

2016-12-21 Thread sitalkedia
Github user sitalkedia commented on the issue: https://github.com/apache/spark/pull/12436 @kayousterhout - Thanks for taking a look at the PR. Currently I don't have time to work on it. I will close this PR and open a new one with the issues addressed.

[GitHub] spark issue #12436: [SPARK-14649][CORE] DagScheduler should not run duplicat...

2016-12-19 Thread kayousterhout
Github user kayousterhout commented on the issue: https://github.com/apache/spark/pull/12436 @sitalkedia this has been inactive for a while and there were a few issues pointed out above that haven't yet been resolved. Do you have time to work on this? Otherwise, can you close the

[GitHub] spark issue #12436: [SPARK-14649][CORE] DagScheduler should not run duplicat...

2016-10-04 Thread kayousterhout
Github user kayousterhout commented on the issue: https://github.com/apache/spark/pull/12436 Yeah @mridulm that also seems like an issue with this approach.

[GitHub] spark issue #12436: [SPARK-14649][CORE] DagScheduler should not run duplicat...

2016-10-04 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/12436 I am curious how this is resilient to epoch changes, which will be triggered by executor loss for a shuffle task when its shuffle map task's executor is gone. Won't it create issues if
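
For readers following the epoch point: Spark's map output tracker bumps an epoch whenever an executor's shuffle outputs are lost, so output locations fetched under an older epoch become suspect. A minimal, self-contained sketch of that bookkeeping (hypothetical names, not Spark's actual MapOutputTracker API):

    object EpochSketch {
      class OutputTracker {
        private var epoch = 0L
        private var outputs = Map.empty[Int, String] // map partition id -> executor id

        def currentEpoch: Long = epoch

        def registerOutput(partition: Int, executorId: String): Unit =
          outputs += partition -> executorId

        // Executor lost: drop its map outputs and bump the epoch so that
        // anything holding locations fetched under an older epoch refetches.
        def removeOutputsOnExecutor(executorId: String): Unit = {
          outputs = outputs.filterNot { case (_, e) => e == executorId }
          epoch += 1
        }

        def isStale(taskEpoch: Long): Boolean = taskEpoch < epoch
      }

      def main(args: Array[String]): Unit = {
        val tracker = new OutputTracker
        val launchEpoch = tracker.currentEpoch    // a reduce task launched now
        tracker.registerOutput(0, "exec-1")
        tracker.removeOutputsOnExecutor("exec-1") // executor loss -> epoch bump
        println(tracker.isStale(launchEpoch))     // true: its locations are suspect
      }
    }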

[GitHub] spark issue #12436: [SPARK-14649][CORE] DagScheduler should not run duplicat...

2016-09-06 Thread kayousterhout
Github user kayousterhout commented on the issue: https://github.com/apache/spark/pull/12436 @sitalkedia I was thinking about this over the weekend and I'm not sure this is the right approach. I suspect it might be better to re-use the same task set manager for the new stage. This
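
To make that alternative concrete: rather than creating a second task set manager for the re-attempted stage (which can end up racing with the first one), the scheduler would look up the live manager for the stage and just add the newly missing partitions to it. A hedged sketch of the idea, with hypothetical types in place of Spark's real TaskSetManager:

    object ReuseManagerSketch {
      import scala.collection.mutable

      class StageTaskManager(val stageId: Int) {
        private val pending = mutable.Set.empty[Int]
        def addPending(partitions: Set[Int]): Unit = pending ++= partitions
        def pendingPartitions: Set[Int] = pending.toSet
      }

      private val managers = mutable.Map.empty[Int, StageTaskManager]

      // Return the live manager for this stage, creating one only on first
      // submission; a re-attempt reuses it instead of spawning a duplicate.
      def managerFor(stageId: Int, missing: Set[Int]): StageTaskManager = {
        val m = managers.getOrElseUpdate(stageId, new StageTaskManager(stageId))
        m.addPending(missing)
        m
      }

      def main(args: Array[String]): Unit = {
        val first  = managerFor(stageId = 4, missing = Set(0, 1, 2))
        val second = managerFor(stageId = 4, missing = Set(1)) // stage re-attempt
        println(first eq second)          // true: the same manager is reused
        println(second.pendingPartitions) // Set(0, 1, 2)
      }
    }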

[GitHub] spark issue #12436: [SPARK-14649][CORE] DagScheduler should not run duplicat...

2016-09-06 Thread sitalkedia
Github user sitalkedia commented on the issue: https://github.com/apache/spark/pull/12436 @davies - Thanks for looking into this. Updated the PR description with details of the change. Let me know if the approach seems reasonable; I will work on rebasing the change against latest

[GitHub] spark issue #12436: [SPARK-14649][CORE] DagScheduler should not run duplicat...

2016-09-02 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/12436 @sitalkedia Had a quick look at this one; the use case sounds good, and we should improve the stability of long-running tasks. Could you explain a bit more how the current patch works? (in the PR

[GitHub] spark issue #12436: [SPARK-14649][CORE] DagScheduler should not run duplicat...

2016-08-19 Thread sitalkedia
Github user sitalkedia commented on the issue: https://github.com/apache/spark/pull/12436 ping.

[GitHub] spark issue #12436: [SPARK-14649][CORE] DagScheduler should not run duplicat...

2016-08-15 Thread markhamstra
Github user markhamstra commented on the issue: https://github.com/apache/spark/pull/12436 See https://issues.apache.org/jira/browse/SPARK-17064

[GitHub] spark issue #12436: [SPARK-14649][CORE] DagScheduler should not run duplicat...

2016-07-18 Thread sitalkedia
Github user sitalkedia commented on the issue: https://github.com/apache/spark/pull/12436 @rxin - The idea is not to rerun or kill already running tasks in case of a fetch failure, because they might finish. If those tasks end up failing later, the DAG scheduler will rerun them.
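
A simplified sketch of the policy described above (assumed per-partition state tracking, not the actual DAGScheduler code): on a fetch failure, only partitions whose tasks have already failed are relaunched, while running tasks are left alone and re-enter the missing set only if they fail later:

    object ResubmitSketch {
      sealed trait TaskState
      case object Running  extends TaskState
      case object Finished extends TaskState
      case object Failed   extends TaskState

      // Partitions to launch when a stage is resubmitted after a fetch failure:
      // finished outputs are kept, running tasks are allowed to finish, and
      // only failed partitions are relaunched now.
      def partitionsToResubmit(states: Map[Int, TaskState]): Seq[Int] =
        states.collect { case (partition, Failed) => partition }.toSeq.sorted

      def main(args: Array[String]): Unit = {
        val states = Map(
          0 -> Finished, // output kept
          1 -> Running,  // not killed; might still succeed
          2 -> Failed    // the only partition relaunched immediately
        )
        println(partitionsToResubmit(states)) // List(2)
      }
    }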

[GitHub] spark issue #12436: [SPARK-14649][CORE] DagScheduler should not run duplicat...

2016-07-14 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/12436 Is the idea here not to rerun tasks that are already running in the case of a fetch failure, because they might finish? What happens after the change if those tasks end up coming back as

[GitHub] spark issue #12436: [SPARK-14649][CORE] DagScheduler should not run duplicat...

2016-07-09 Thread sitalkedia
Github user sitalkedia commented on the issue: https://github.com/apache/spark/pull/12436 ping.

[GitHub] spark issue #12436: [SPARK-14649][CORE] DagScheduler should not run duplicat...

2016-07-01 Thread sitalkedia
Github user sitalkedia commented on the issue: https://github.com/apache/spark/pull/12436 @kayousterhout - Our use case is a very large workload on Spark. We are processing around 100 TB of data in a single Spark job with 100k tasks in it (BTW, the single-threaded DAGScheduler is

[GitHub] spark issue #12436: [SPARK-14649][CORE] DagScheduler should not run duplicat...

2016-06-30 Thread kayousterhout
Github user kayousterhout commented on the issue: https://github.com/apache/spark/pull/12436 @sitalkedia What's the use case for this? In the cases I've seen, if there's one fetch failure, it typically means that a machine that ran a map task has failed / gone down / been revoked by
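
For context on the failure mode being discussed: when a reducer reports a fetch failure, the scheduler treats the map outputs on the failing host as lost and resubmits the map stage for the now-missing partitions. A heavily simplified, hypothetical sketch of that flow (not Spark's actual FetchFailed handler):

    object FetchFailureSketch {
      // mapStatuses(partition) = Some(host holding the output), None if lost
      final case class MapStageState(mapStatuses: Vector[Option[String]]) {
        def missingPartitions: Seq[Int] =
          mapStatuses.zipWithIndex.collect { case (None, p) => p }
      }

      // A fetch failure from one reducer is taken as evidence that every map
      // output on that host is gone, so all of them are invalidated at once.
      def handleFetchFailure(state: MapStageState, failedHost: String): MapStageState =
        state.copy(mapStatuses = state.mapStatuses.map {
          case Some(h) if h == failedHost => None
          case other                      => other
        })

      def main(args: Array[String]): Unit = {
        val before = MapStageState(Vector(Some("hostA"), Some("hostB"), Some("hostA")))
        val after  = handleFetchFailure(before, failedHost = "hostA")
        println(after.missingPartitions) // Vector(0, 2): resubmit these map tasks
      }
    }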