[GitHub] spark issue #17936: [SPARK-20638][Core]Optimize the CartesianRDD to reduce r...
Github user suyanNone commented on the issue: https://github.com/apache/spark/pull/17936 Careless of me not to notice UnsafeCartesianRDD's ExternalAppendOnlyUnsafeRowArray, that is nice. I have not read the whole discussion here... Unifying the solution with what UnsafeCartesianRDD already has would be a big improvement for CartesianRDD, and it seems simpler and easier to understand. (In our internal change, we adopted a memory-and-disk array to store the GraphX Array[EdgeAttr].) I am not sure there is a strong requirement to optimize away the per-task locality... --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #17936: [SPARK-20638][Core]Optimize the CartesianRDD to reduce r...
Github user suyanNone commented on the issue: https://github.com/apache/spark/pull/17936 Maybe create a MemoryAndDiskArray, like ExternalAppendOnlyMap? A MemoryAndDiskArray could be used not only here but also for groupByKey, and its memory could be controlled by the MemoryManager.
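Since the comment above proposes a MemoryAndDiskArray analogous to ExternalAppendOnlyMap, here is a minimal standalone sketch of the idea (a hypothetical class, not Spark's API): elements stay in memory up to a threshold, full batches spill to temp files via Java serialization, and iteration replays spills in insertion order. A real version would account memory through the MemoryManager rather than a fixed element count.

```scala
import java.io.{File, FileInputStream, FileOutputStream, ObjectInputStream, ObjectOutputStream}
import scala.collection.mutable.ArrayBuffer

// Hypothetical sketch of a "MemoryAndDiskArray": append-only, spills to disk
// past `spillThreshold` elements, iterates in insertion order.
class MemoryAndDiskArray[T <: java.io.Serializable](spillThreshold: Int) {
  private val inMemory = new ArrayBuffer[T]
  private val spillFiles = new ArrayBuffer[File]

  def append(elem: T): Unit = {
    inMemory += elem
    if (inMemory.size >= spillThreshold) spill()
  }

  // Write the current in-memory batch to a temp file and clear memory.
  private def spill(): Unit = {
    val file = File.createTempFile("mad-array-", ".spill")
    file.deleteOnExit()
    val out = new ObjectOutputStream(new FileOutputStream(file))
    try {
      out.writeInt(inMemory.size)
      inMemory.foreach(out.writeObject)
    } finally out.close()
    spillFiles += file
    inMemory.clear()
  }

  // Replay spilled batches first (oldest first), then the in-memory tail.
  def iterator: Iterator[T] = {
    val spilled = spillFiles.iterator.flatMap { file =>
      val in = new ObjectInputStream(new FileInputStream(file))
      try {
        val n = in.readInt()
        (0 until n).map(_ => in.readObject().asInstanceOf[T])
      } finally in.close()
    }
    spilled ++ inMemory.iterator
  }
}
```

With a threshold of 4, appending ten elements produces two spill files plus a two-element in-memory tail, and the iterator still yields all ten in order.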
[GitHub] spark issue #12576: [SPARK-14804][Spark][Graphx] Fix Graph vertexRDD/EdgeRDD...
Github user suyanNone commented on the issue: https://github.com/apache/spark/pull/12576 @tdas agree, `isCheckpointed` should be final; in the current code, is `isCheckpointed` exposed as public only for testing? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.
[GitHub] spark issue #14765: [SPARK-15815] Keeping tell yarn the target executors in ...
Github user suyanNone commented on the issue: https://github.com/apache/spark/pull/14765 jenkins retest.
[GitHub] spark pull request #14765: [SPARK-15815] Keeping tell yarn the target executors in ...
GitHub user suyanNone opened a pull request: https://github.com/apache/spark/pull/14765 [SPARK-15815] Keeping tell yarn the target executors in ...

## What changes were proposed in this pull request?

(Please fill in changes proposed in this fix)

## How was this patch tested?

(Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests) (If this patch involves UI changes, please attach a screenshot; otherwise, remove this)

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/suyanNone/spark fix-hang

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14765.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #14765

commit 59de77b5f523340d50836f072b38afee9bf579c3
Author: hushan <hus...@xiaomi.com>
Date: 2016-08-23T03:02:53Z

    Fix hang
[GitHub] spark pull request: [SPARK][SPARK-10842]Eliminate creating duplica...
Github user suyanNone commented on the pull request: https://github.com/apache/spark/pull/8923#issuecomment-218682081 Already merged into https://github.com/apache/spark/pull/12655; marking this closed.
[GitHub] spark pull request: [SPARK][SPARK-10842]Eliminate creating duplica...
Github user suyanNone closed the pull request at: https://github.com/apache/spark/pull/8923
[GitHub] spark pull request: [SPARK-14750][Spark][UI] Support spark on yarn...
Github user suyanNone closed the pull request at: https://github.com/apache/spark/pull/12570
[GitHub] spark pull request: [SPARK-14750][Spark][UI] Support spark on yarn...
Github user suyanNone commented on the pull request: https://github.com/apache/spark/pull/12570#issuecomment-218412962 @ajbozarth oh... I am not familiar with standalone... I had no idea there was a class named `LogPage`; nice of you to tell me about that...
[GitHub] spark pull request: [SPARK-14957][Yarn] Adopt healthy dir to store...
Github user suyanNone commented on the pull request: https://github.com/apache/spark/pull/12735#issuecomment-218389524 @tgravescs, yes, the executor will fail fast... I vaguely remember an application failure caused by an unhealthy dir on the shuffle server, but I did not have time to dig into it at the time... Now I can't reproduce it, so I will close this.
[GitHub] spark pull request: [SPARK-14957][Yarn] Adopt healthy dir to store...
Github user suyanNone closed the pull request at: https://github.com/apache/spark/pull/12735
[GitHub] spark pull request: [SPARK-13902][SCHEDULER] Make DAGScheduler.get...
Github user suyanNone commented on the pull request: https://github.com/apache/spark/pull/12655#issuecomment-215329471 As far as I know, duplicate stages occur like this: stage 2 and stage 3 both depend on stage 1, and stage 4 depends on stage 2 and stage 3. So if we call getAncestorShuffleDependencies(stage 4), it will depth-first traverse stage 2 -> stage 1, then traverse stage 3 -> stage 1... Because we use a stack, we cannot de-duplicate the same stage during this traversal. If we traverse stage 4 with de-duplication successfully, the stage will be in `shuffleIdToMapStage`, which can tell other RDDs which stages have already been traversed. Right?
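The diamond-shaped dependency described above can be sketched with a toy depth-first traversal (illustrative only, not the actual DAGScheduler code): without de-duplication, a stack-based walk of stage 4 reaches stage 1 twice, once via stage 2 and once via stage 3; tracking visited stage ids removes the duplicate, playing the role that `shuffleIdToMapStage` plays in the real scheduler.

```scala
import scala.collection.mutable

// Toy stage graph: each stage knows only its parents.
case class Stage(id: Int, parents: Seq[Stage])

// Stack-based DFS over ancestors, skipping stages we have already seen,
// so a shared ancestor (the "diamond" case) is emitted exactly once.
def ancestorsWithDedup(stage: Stage): Seq[Int] = {
  val visited = mutable.Set[Int]()
  val order = mutable.ArrayBuffer[Int]()
  val stack = mutable.Stack[Stage](stage.parents: _*)
  while (stack.nonEmpty) {
    val s = stack.pop()
    if (!visited.contains(s.id)) {
      visited += s.id
      order += s.id
      s.parents.foreach(stack.push)
    }
  }
  order.toSeq
}
```

For the graph in the comment (stage 2 and stage 3 both on top of stage 1, stage 4 on top of both), stage 1 appears exactly once in the result.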
[GitHub] spark pull request: [SPARK-14957][Yarn] Adopt healthy dir to store...
Github user suyanNone commented on the pull request: https://github.com/apache/spark/pull/12735#issuecomment-215298717 To be honest, I have not walked through the whole Yarn shuffle server process; I just fixed the problem our user reported: they could not connect to the shuffle server because it tried to create the LevelDB in a non-existent folder. I will take some time to reproduce the problem and understand this better. @tgravescs I will look into the getRecoveryPath API...
[GitHub] spark pull request: [SPARK-14957][Yarn] Adopt healthy dir to store...
Github user suyanNone commented on the pull request: https://github.com/apache/spark/pull/12735#issuecomment-215297811 As for why there would be multiple meta files: assume we find one, but its disk has become a read-only filesystem. Do we still use it, or do we choose another healthy dir and create a new one? If the latter, we cannot delete the old one, so we end up with two files...
[GitHub] spark pull request: [SPARK-14957][Yarn] Adopt healthy dir to store...
Github user suyanNone commented on the pull request: https://github.com/apache/spark/pull/12735#issuecomment-215294953

```
registeredExecutorFile = findRegisteredExecutorFile(conf.getTrimmedStrings("yarn.nodemanager.local-dirs"));
```

We get the local dirs from the YARN config; there are many dirs, but currently we always adopt the first one, right? First, assume no meta file exists yet: if the first disk has been removed or has disk errors (read-only filesystem, input/output errors, no space left), ExternalShuffleBlockResolver will try to create a new LevelDB file there, fail, and throw an IOException, and this IOException is swallowed by YarnShuffleService:

```
try {
  blockHandler = new ExternalShuffleBlockHandler(transportConf, registeredExecutorFile);
} catch (Exception e) {
  logger.error("Failed to initialize external shuffle service", e);
}
```

So maybe we can choose a healthier disk dir for storing the meta file, to avoid an unnecessary exception.
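The fix proposed above, picking a healthy dir instead of always taking the first entry of yarn.nodemanager.local-dirs, could look roughly like this (findHealthyDir is a hypothetical helper, not Spark's API): scan the configured dirs and take the first one that exists (or can be created) and is writable.

```scala
import java.io.File

// Hypothetical helper: return the first local dir that is usable for the
// recovery/LevelDB file, skipping dirs on failed or read-only disks.
def findHealthyDir(localDirs: Seq[String]): Option[File] =
  localDirs.iterator
    .map(new File(_))
    .find(dir => (dir.isDirectory || dir.mkdirs()) && dir.canWrite)
```

A dead first disk (here simulated by a path whose parent is a plain file, so it can never be created) is skipped in favor of the next writable dir.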
[GitHub] spark pull request: [SPARK-14957][Spark] Adopt healthy dir to stor...
GitHub user suyanNone opened a pull request: https://github.com/apache/spark/pull/12735 [SPARK-14957][Spark] Adopt healthy dir to store executor meta

## What changes were proposed in this pull request?

Adopt a healthy dir to store executor meta.

## How was this patch tested?

Unit tests

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/suyanNone/spark fix-0dir

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/12735.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #12735

commit 2527391378e3f572663211f9ecc20a3a49c52ebb
Author: hushan <hus...@xiaomi.com>
Date: 2016-04-27T13:24:49Z

    Adopt healthy dir to store executor meta
[GitHub] spark pull request: [SPARK][SPARK-10842]Eliminate creating duplica...
Github user suyanNone commented on the pull request: https://github.com/apache/spark/pull/8923#issuecomment-214655571 Reverted Set to Stack, and added a test case. Reverted back to Stack because we should build map stages from the bottom up (a Stack), not in random order (a Set).
[GitHub] spark pull request: [SPARK-10796][CORE]Resubmit stage while lost t...
Github user suyanNone commented on the pull request: https://github.com/apache/spark/pull/8927#issuecomment-213363612 @squito @srowen
[GitHub] spark pull request: [SPARK-10796][CORE]Resubmit stage while lost t...
Github user suyanNone commented on a diff in the pull request: https://github.com/apache/spark/pull/8927#discussion_r60716314 --- Diff: core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala --- @@ -1083,8 +1085,6 @@ class DAGSchedulerSuite extends SparkFunSuite with LocalSparkContext with Timeou Success, makeMapStatus("hostA", reduceRdd.partitions.size))) assert(shuffleStage.numAvailableOutputs === 2) -assert(mapOutputTracker.getMapSizesByExecutorId(shuffleId, 0).map(_._1).toSet === --- End diff -- For a running stage, an executor lost will not register output locs in this PR
[GitHub] spark pull request: [SPARK-10796][CORE]Resubmit stage while lost t...
Github user suyanNone commented on a diff in the pull request: https://github.com/apache/spark/pull/8927#discussion_r60715521 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -1416,6 +1449,7 @@ class DAGScheduler( outputCommitCoordinator.stageEnd(stage.id) listenerBus.post(SparkListenerStageCompleted(stage.latestInfo)) +taskScheduler.zombieTasks(stage.id) --- End diff -- Once the stage has finished, it should make the previous TaskSets zombie
[GitHub] spark pull request: [SPARK-12524][Core]DagScheduler may submit a t...
Github user suyanNone commented on the pull request: https://github.com/apache/spark/pull/12524#issuecomment-213351412 Can you refer to this and have a look: https://github.com/apache/spark/pull/8927
[GitHub] spark pull request: [Spark][Graphx] Fix Graph vertexRDD/EdgeRDD ch...
GitHub user suyanNone reopened a pull request: https://github.com/apache/spark/pull/12576 [Spark][Graphx] Fix Graph vertexRDD/EdgeRDD checkpoint results ClassCastException

## What changes were proposed in this pull request?

The PR fixes the compute chain from CheckpointRDD <- vertexRDDImpl to CheckpointRDD <- partitionRDD <- vertexRDDImpl.

## How was this patch tested?

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/suyanNone/spark refine-graph

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/12576.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #12576

commit f2ced743b2aa147ed2b43221b4f52d502b85c8c6
Author: hushan <hus...@xiaomi.com>
Date: 2016-04-21T13:43:30Z

    fix

commit a106758dd4492ccae851c48d010e6eb9fb26b849
Author: hushan <hus...@xiaomi.com>
Date: 2016-04-22T02:37:33Z

    Add testcase
[GitHub] spark pull request: [Spark][Graphx] Fix Graph vertexRDD/EdgeRDD ch...
Github user suyanNone closed the pull request at: https://github.com/apache/spark/pull/12576
[GitHub] spark pull request: [Spark][Graphx] Fix Graph vertexRDD/EdgeRDD ch...
Github user suyanNone commented on the pull request: https://github.com/apache/spark/pull/12576#issuecomment-212938601 Opened by mistake; closing for now.
[GitHub] spark pull request: [Spark][Graphx] Fix Graph vertexRDD/EdgeRDD ch...
GitHub user suyanNone opened a pull request: https://github.com/apache/spark/pull/12576 [Spark][Graphx] Fix Graph vertexRDD/EdgeRDD checkpoint results ClassCastException

## What changes were proposed in this pull request?

The PR fixes the compute chain from CheckpointRDD <- vertexRDDImpl to CheckpointRDD <- partitionRDD <- vertexRDDImpl.

## How was this patch tested?

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/suyanNone/spark refine-graph

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/12576.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #12576

commit f2ced743b2aa147ed2b43221b4f52d502b85c8c6
Author: hushan <hus...@xiaomi.com>
Date: 2016-04-21T13:43:30Z

    fix
[GitHub] spark pull request: [SPARK-14750][Spark][UI] Support spark on yarn...
GitHub user suyanNone opened a pull request: https://github.com/apache/spark/pull/12570 [SPARK-14750][Spark][UI] Support spark on yarn user refer finished application log in historyServer

## What changes were proposed in this pull request?

For Spark on YARN, let users refer to application logs after the application has finished.

## How was this patch tested?

manual tests

before:
![Uploading Screenshot from 2016-04-21 16:59:45.png…]()

after:
![Uploading Screenshot from 2016-04-21 17:00:44.png…]()
![Uploading Screenshot from 2016-04-21 17:01:15.png…]()

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/suyanNone/spark refer-hdfs

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/12570.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #12570

commit f37d5c0ad8a0fa5eaa7c305c6ddad9d6702e462c
Author: hushan <hus...@xiaomi.com>
Date: 2016-04-20T08:28:38Z

    [SPARK][UI] Make historyserver refer application log

    Summary: Ref T5959
    Test Plan: Test local
    Reviewers: liushaohui, zouchenjun, wangjiasheng, peng.zhang
    Reviewed By: peng.zhang
    Maniphest Tasks: T5959
    Differential Revision: https://phabricator.d.xiaomi.net/D31886
    Conflicts: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala

commit 80262effb0918616a7d33e81c8e97db5d56d74ab
Author: hushan <hus...@xiaomi.com>
Date: 2016-04-21T03:26:15Z

    Refine
[GitHub] spark pull request: [SPARK-10796][CORE]Resubmit stage while lost t...
Github user suyanNone commented on the pull request: https://github.com/apache/spark/pull/8927#issuecomment-212322683 jenkins retest this please
[GitHub] spark pull request: [SPARK-10796][CORE]Resubmit stage while lost t...
GitHub user suyanNone reopened a pull request: https://github.com/apache/spark/pull/8927 [SPARK-10796][CORE]Resubmit stage while lost task in Zombie TaskSets

We hit this problem in Spark 1.3.0, and I can also reproduce it on the latest version.

Description:
1. A running `ShuffleMapStage` can have multiple `TaskSet`s: one active TaskSet and multiple zombie TaskSets.
2. A running `ShuffleMapStage` succeeds only when all of its partitions have been processed successfully, i.e. each task's MapStatus has been added into `outputLocs`.
3. The MapStatuses of a running `ShuffleMapStage` may be produced by zombie TaskSet 1 / zombie TaskSet 2 / ... / the active TaskSet N, and some MapStatuses may belong to only one TaskSet, possibly a zombie one.
4. If an executor is lost, it can happen that some of the lost executor's MapStatuses were produced by a zombie TaskSet. In the current logic, the fix for a lost MapStatus is that each TaskSet re-runs the tasks that succeeded on the lost executor: re-add them into the TaskSet's pendingTasks, and re-add their partitions into the Stage's pendingPartitions. But this is useless when the lost MapStatus belongs only to a zombie TaskSet: since it is a zombie, its `pendingTasks` will never be scheduled.
5. The only condition for resubmitting a stage is a task throwing `FetchFailedException`, but the lost executor might not remove any MapStatus of a parent stage of one of the running stages, while that running `Stage` happens to lose a MapStatus belonging only to a zombie TaskSet. So once all zombie TaskSets have finished their running tasks and the active TaskSet has finished its pending tasks, they are all removed by `TaskSchedulerImpl`, yet the running stage's pending partitions remain non-empty. It hangs...

Example:
Running stage 0.0, running TaskSet 0.0; finished task 0.0 on ExecA; running task 1.0 on ExecB; waiting task 2.0
---> task 1.0 throws FetchFailedException
---> running resubmitted stage 0.1, running TaskSet 0.1 (which re-runs task 1 and task 2); assume task 1.0 finishes on ExecA
---> ExecA is lost, and it happens that nothing throws FetchFailedException
---> TaskSet 0.1 resubmits task 1, re-adding it into pendingTasks and waiting for TaskSchedulerImpl to schedule it. TaskSet 0.0 also resubmits task 0, re-adding it into pendingTasks, but because it is a zombie, TaskSchedulerImpl skips scheduling TaskSet 0.0.
So once TaskSet 0.0 and TaskSet 0.1 are both (isZombie && runningTasks.empty), TaskSchedulerImpl removes those TaskSets. DAGScheduler still has pendingPartitions because of the task lost in TaskSet 0.0, but its TaskSets have all been removed, so it hangs.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/suyanNone/spark rerun-special

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/8927.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #8927

commit 1be6071956c6a93e2d264fb1c2db92d01c4d3fe4
Author: hushan <hus...@xiaomi.com>
Date: 2016-04-20T08:21:51Z

    Fix zombieTasksets and RemovedTaskset lost output
[GitHub] spark pull request: [SPARK-10796][CORE]Resubmit stage while lost t...
Github user suyanNone closed the pull request at: https://github.com/apache/spark/pull/8927
[GitHub] spark pull request: [SPARK-12009][Yarn]Avoid to re-allocating yarn...
Github user suyanNone commented on a diff in the pull request: https://github.com/apache/spark/pull/9992#discussion_r46235437 --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala --- @@ -228,6 +228,7 @@ private[spark] class ApplicationMaster( } private def sparkContextStopped(sc: SparkContext) = { +sc.requestTotalExecutors(0, 0, Map.empty) --- End diff -- right... I was mixing up the YarnScheduler and YarnSchedulerBackend classes... The stop order is: YarnClusterScheduler.stop -> TaskSchedulerImpl.stop -> CoarseGrainedSchedulerBackend.stop -> async: stopExecutors -> ApplicationMaster.sparkContextStopped(sc). Maybe a good way is to override stop() in YarnSchedulerBackend:

```
YarnSchedulerBackend.stop {
  doRequestTotalExecutors() // this is an askWithRetry...
  super.stop
}
```
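The ordering proposed in the snippet above can be illustrated with a stripped-down standalone sketch (the class names only mirror Spark's; this is not the real code): the subclass drops the executor target to zero before delegating to the superclass stop, so the resource manager stops allocating containers before the asynchronous executor teardown begins.

```scala
import scala.collection.mutable.ArrayBuffer

// Stand-in for CoarseGrainedSchedulerBackend: stop() tears down executors.
class CoarseGrainedBackendSketch {
  val events = ArrayBuffer[String]()
  def stop(): Unit = events += "stopExecutors"
}

// Stand-in for YarnSchedulerBackend: tell the RM first, then delegate.
class YarnBackendSketch extends CoarseGrainedBackendSketch {
  def doRequestTotalExecutors(n: Int): Unit = events += s"requestTotalExecutors($n)"
  override def stop(): Unit = {
    doRequestTotalExecutors(0) // zero the target before executors go away
    super.stop()
  }
}
```

Calling stop() on the subclass records the zero-executor request strictly before the executor teardown, which is the ordering the comment argues for.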
[GitHub] spark pull request: [SPARK-12009][Yarn]Avoid to re-allocating yarn...
Github user suyanNone commented on the pull request: https://github.com/apache/spark/pull/9992#issuecomment-160623236 @lianhuiwang Hi, can you take a look at this?
[GitHub] spark pull request: [SPARK-12009][Yarn]Avoid to re-allocating yarn...
Github user suyanNone commented on the pull request: https://github.com/apache/spark/pull/10031#issuecomment-160623021 @lianhuiwang I prefer calling sc.requestTotalExecutors(0), because we cannot foresee the user's behavior after sc.stop, and I have already refined this in my previous version. Thanks if you can take some time to look at it.
[GitHub] spark pull request: [SPARK-12009][Yarn]Avoid to re-allocating yarn...
Github user suyanNone commented on the pull request: https://github.com/apache/spark/pull/10031#issuecomment-160507811 no... actually, it is not a good idea. An executor may be killed by Yarn for some reason... or just an akka disconnect... I hope you can close this; I will work on finding a more reasonable solution.
[GitHub] spark pull request: [SPARK-12009][Yarn]Avoid to re-allocating yarn...
GitHub user suyanNone opened a pull request: https://github.com/apache/spark/pull/9992 [SPARK-12009][Yarn]Avoid to re-allocating yarn container while driver want to stop all Executors You can merge this pull request into a Git repository by running: $ git pull https://github.com/suyanNone/spark tricky Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/9992.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #9992 commit 771b33c799d30722cd51d24f7aa456719fa2d9d0 Author: hushan <hus...@xiaomi.com> Date: 2015-11-26T07:02:44Z Avoid to re-allocator yarn container while sc.stop
[GitHub] spark pull request: [SPARK-6157][CORE] Unrolling with MEMORY_AND_D...
Github user suyanNone commented on the pull request: https://github.com/apache/spark/pull/4887#issuecomment-159140817 @andrewor14 I have the impression that cogroup will hold two RDDs' iterators. If the first iterator is `MEMORY_ONLY` and unrolling fails, the iterator is based on `Right(vector.iterator ++ values)`; then the second is `MEMORY_AND_DISK`, and if unrolling fails, we prepare to put it onto disk.
[GitHub] spark pull request: Unify dependencies entry
GitHub user suyanNone opened a pull request: https://github.com/apache/spark/pull/9691 Unify dependencies entry A small change. You can merge this pull request into a Git repository by running: $ git pull https://github.com/suyanNone/spark unify-getDependency Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/9691.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #9691 commit 4ac866fa386cbd94b078bdfc4c5c6a07d30b9ea6 Author: hushan <hus...@xiaomi.com> Date: 2015-11-13T12:05:58Z Unify dependencies entry
[GitHub] spark pull request: [SPARK-6157][CORE] Unrolling with MEMORY_AND_D...
Github user suyanNone commented on the pull request: https://github.com/apache/spark/pull/4887#issuecomment-155758546 @andrewor14, I had not made my point clearly. This issue exists because a `MEMORY_AND_DISK`-level block does not release its `unrollMemory` after the block has been put to disk successfully. In the current logic, there is an `unrollMemoryMap` that holds all unroll memory for blocks that are still unrolling or that failed to unroll. Due to SPARK-4777, we added a `pendingUnrollMemoryMap` to reserve unroll memory only for blocks that unrolled successfully; `pendingUnrollMemoryMap` + `unrollMemoryMap` is the total unroll memory in use. Now, to resolve this issue: after Spark fails to unroll a `MEMORY_AND_DISK`-level block, we need to know the specific memory size (say 199MB) of `this` block. **Important**: we can't just call `releaseUnrollMemoryForThisTask` after we have put the block to disk, because `unrollMemoryForThisTask` may contain other blocks' unroll memory — for example two cached RDDs with the same partitioner under a `cogroup` op. Right? So we need to know the specific memory size of the failed `MEMORY_AND_DISK` block, and it should be distinguishable from the rest of `unrollMemoryMap`.
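To illustrate the hazard being described, here is a minimal, hypothetical sketch of per-task unroll accounting (the names mirror the discussion, not the actual `MemoryStore` code): a blanket per-task release frees bytes that still back other blocks' iterators, while a per-block release needs the block's exact size.

```scala
import scala.collection.mutable

// Hypothetical, simplified model of per-task unroll bookkeeping.
object UnrollAccounting {
  // taskId -> total unroll bytes reserved by that task
  val unrollMemoryMap: mutable.Map[Long, Long] =
    mutable.Map[Long, Long]().withDefaultValue(0L)

  def reserve(taskId: Long, bytes: Long): Unit =
    unrollMemoryMap(taskId) += bytes

  // WRONG for this use case: drops *all* of the task's unroll memory,
  // including bytes still backing other blocks' iterators (e.g. a cogroup
  // holding two cached RDDs' partially unrolled partitions).
  def releaseAllForTask(taskId: Long): Unit =
    unrollMemoryMap.remove(taskId)

  // What the fix needs: release only the bytes attributed to the one
  // MEMORY_AND_DISK block that was just spilled to disk — which requires
  // knowing that block's exact size.
  def releaseForBlock(taskId: Long, blockBytes: Long): Unit =
    unrollMemoryMap(taskId) = math.max(0L, unrollMemoryMap(taskId) - blockBytes)
}
```

With two blocks reserved by one task, releasing only the spilled block's bytes leaves the other reservation intact, whereas the blanket release would wrongly zero both.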
[GitHub] spark pull request: [SPARK-6157][CORE] Unrolling with MEMORY_AND_D...
Github user suyanNone commented on the pull request: https://github.com/apache/spark/pull/4887#issuecomment-155985114 eh... the earlier version added a `blockManager.memoryStore.releasePendingUnrollMemoryForThisThread()` at [this line](https://github.com/apache/spark/blob/6cd98c1878a9c5c6475ed5974643021ab27862a7/core/src/main/scala/org/apache/spark/CacheManager.scala#L184). But `pendingUnrollMemory` only stores memory for blocks that unrolled successfully, not for blocks that failed to unroll, so calling `blockManager.memoryStore.releasePendingUnrollMemoryForThisThread()` there is useless — unless we change `pendingUnrollMemory` to also store unroll memory for failed `MEMORY_AND_DISK` blocks alongside successful ones. The memoryStore has two other APIs: (1) `blockManager.memoryStore.releaseUnrollMemoryForThisThread()` and (2) `blockManager.memoryStore.releaseUnrollMemoryForThisThread(someMemorySize)`. `unrollMemoryMap` stores unroll memory for blocks that are unrolling or failed to unroll. If we call (1) after [this line](https://github.com/apache/spark/blob/6cd98c1878a9c5c6475ed5974643021ab27862a7/core/src/main/scala/org/apache/spark/CacheManager.scala#L184), it will also release memory belonging to other failed blocks; if we call (2), we can't get the specific size for that `MEMORY_AND_DISK` block.
[GitHub] spark pull request: [SPARK-6157][CORE] Unrolling with MEMORY_AND_D...
Github user suyanNone commented on the pull request: https://github.com/apache/spark/pull/4887#issuecomment-155260117 = =, I will find time to update today...
[GitHub] spark pull request: [SPARK-8100][UI]Make able to refer lost execut...
Github user suyanNone closed the pull request at: https://github.com/apache/spark/pull/6644
[GitHub] spark pull request: [SPARK][SPARK-10842]Eliminate creating duplica...
Github user suyanNone commented on the pull request: https://github.com/apache/spark/pull/8923#issuecomment-153248553 @markhamstra @squito I need to re-construct and re-run the test case to confirm the problem...
```
R1 --\
      +--> R3 --> R4 --\
R2 --/        \         +--> R6 --> R7
               \-> R5 --/
```
We get the final RDD R7's shuffle deps; the first is R6. Since R6 has not been added into `shuffleToMapStage`, we collect all of its ancestor deps not yet in `shuffleIdToMapStage`: (R4(R3(R1, R2)), R5(R3(R1, R2))) — in the Stack this looks like (R4, R3, R1, R2, R5, R3, R1, R2) — and for each dep a new ShuffleMapStage is created. So the deps R1->R3 and R2->R3 are created twice. eh... @markhamstra, changing the return type from Stack to Set sounds good, but I am not sure the `shuffleDep` object is unique per shuffleId; I need to check that. If I confirm it is unique, I'd like to change `val parents = new Stack[ShuffleDependency[_, _, _]]` to `new Set`.
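The Stack-vs-Set point above can be sketched with a toy model of the ancestor walk (hypothetical names, not the actual DAGScheduler code). Deps are shuffle ids; the DAG mirrors the diagram, so the R3 subtree is reachable via both R4 and R5 and the Stack records it twice:

```scala
import scala.collection.mutable

// Toy DAG: R1,R2 -> R3 -> {R4, R5} -> R6 (ids stand in for shuffle deps).
val parentsOf: Map[Int, List[Int]] =
  Map(6 -> List(4, 5), 4 -> List(3), 5 -> List(3), 3 -> List(1, 2),
      1 -> Nil, 2 -> Nil)

// Stack-style collection: every path is recorded, so shared ancestors
// (R3 and its parents) appear once per path -> duplicate stages.
def ancestorsWithStack(root: Int): List[Int] = {
  val out = mutable.ListBuffer[Int]()
  def visit(id: Int): Unit = parentsOf(id).foreach { p => out += p; visit(p) }
  visit(root)
  out.toList
}

// Set-style collection: each dep is visited at most once, so at most one
// ShuffleMapStage would be created per shuffle dependency.
def ancestorsWithSet(root: Int): Set[Int] = {
  val seen = mutable.Set[Int]()
  def visit(id: Int): Unit = parentsOf(id).foreach { p =>
    if (seen.add(p)) visit(p)
  }
  visit(root)
  seen.toSet
}
```

The Stack walk yields (R4, R3, R1, R2, R5, R3, R1, R2), exactly the duplicated sequence described in the comment, while the Set walk yields each dep once — provided, as the comment notes, that the dep object is unique per shuffleId.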
[GitHub] spark pull request: SPARK-7729:Executor which has been killed shou...
Github user suyanNone commented on the pull request: https://github.com/apache/spark/pull/6263#issuecomment-153225204 yeah, making it configurable looks good
[GitHub] spark pull request: SPARK-7729:Executor which has been killed shou...
Github user suyanNone commented on the pull request: https://github.com/apache/spark/pull/6263#issuecomment-151010358 Hi @archit279thakur, would you mind adding the logic for a time expiry on showing the lost-executor log?
[GitHub] spark pull request: [SPARK][SPARK-10842]Eliminate creating duplica...
Github user suyanNone commented on the pull request: https://github.com/apache/spark/pull/8923#issuecomment-148297636 jenkins retest this please
[GitHub] spark pull request: [SPARK-6157][CORE] Unrolling with MEMORY_AND_D...
Github user suyanNone commented on the pull request: https://github.com/apache/spark/pull/4887#issuecomment-147957284 @andrewor14 We have three situations in which unroll memory is released:
1. Unroll succeeds. We expect to cache this block in `tryToPut`. We do not release and re-acquire memory from the MemoryManager, in order to avoid race conditions where another component steals the memory we are trying to transfer. (SPARK-4777)
2. Unroll fails for a `MEMORY_AND_DISK`-level block. We should release the memory after putting the block into the diskStore, so it can be re-acquired for other purposes. (SPARK-6157)
3. Unroll fails for a `MEMORY_ONLY`-level block. We do not release until the task finishes.
Accordingly, we may hold the reserved unroll memory after unrolling a block. `pendingUnrollMemory` was designed only for successfully unrolled blocks; for situation 2 we also need to hold that memory. We could pass a parameter to `unrollSafely` telling it the block has a disk level, so we hold the memory after a `MEMORY_AND_DISK` block fails to unroll and release it once the block has been put into the diskStore. Another solution, without adding a parameter to `unrollSafely`: for each task unrolling a block, first acquire memory into `pendingUnrollMemoryMap`, release it for situations 1 and 2, and move it into `unrollMemoryMap` for situation 3. The committed version adopts the second solution. Is there a more suitable approach? Looking forward to your suggestions.
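The "second solution" above can be sketched as a small state machine (hypothetical class and method names; a simplified model, not the actual MemoryStore): every unroll first reserves into the pending map, situations 1 and 2 release from it, and situation 3 migrates the reservation into the long-lived map.

```scala
import scala.collection.mutable

// Hypothetical sketch of the proposed pending/long-lived unroll split.
class UnrollBookkeeping {
  val pendingUnrollMemoryMap: mutable.Map[Long, Long] =
    mutable.Map[Long, Long]().withDefaultValue(0L)
  val unrollMemoryMap: mutable.Map[Long, Long] =
    mutable.Map[Long, Long]().withDefaultValue(0L)

  // Every unroll starts by reserving into the pending map.
  def startUnroll(taskId: Long, bytes: Long): Unit =
    pendingUnrollMemoryMap(taskId) += bytes

  // Situations 1 and 2: block cached, or spilled to disk -> give memory back.
  def finishAndRelease(taskId: Long, bytes: Long): Unit =
    pendingUnrollMemoryMap(taskId) -= bytes

  // Situation 3: MEMORY_ONLY unroll failed; the returned iterator still
  // references the partially unrolled data, so keep the reservation in the
  // long-lived map until the task ends.
  def keepUntilTaskEnd(taskId: Long, bytes: Long): Unit = {
    pendingUnrollMemoryMap(taskId) -= bytes
    unrollMemoryMap(taskId) += bytes
  }

  def totalForTask(taskId: Long): Long =
    pendingUnrollMemoryMap(taskId) + unrollMemoryMap(taskId)
}
```

The design point is that only situation 3 needs a reservation to outlive the unroll attempt, so only it migrates bytes out of the pending map.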
[GitHub] spark pull request: [SPARK][SPARK-10842]Eliminate creating duplica...
Github user suyanNone commented on the pull request: https://github.com/apache/spark/pull/8923#issuecomment-147999008 jenkins retest this please
[GitHub] spark pull request: [SPARK][SPARK-10842]Eliminate creating duplica...
Github user suyanNone commented on the pull request: https://github.com/apache/spark/pull/8923#issuecomment-147964293 jenkins retest this please
[GitHub] spark pull request: [SPARK-6157][CORE] Unrolling with MEMORY_AND_D...
Github user suyanNone commented on the pull request: https://github.com/apache/spark/pull/4887#issuecomment-147963975 jenkins retest this please
[GitHub] spark pull request: [SPARK-6157][CORE] Unrolling with MEMORY_AND_D...
Github user suyanNone commented on the pull request: https://github.com/apache/spark/pull/4887#issuecomment-148025758 jenkins retest this please
[GitHub] spark pull request: [SPARK-6157][CORE] Unrolling with MEMORY_AND_D...
Github user suyanNone commented on the pull request: https://github.com/apache/spark/pull/4887#issuecomment-147999540 jenkins retest this please
[GitHub] spark pull request: [SPARK-10796][CORE]Resubmit stage while lost t...
Github user suyanNone commented on the pull request: https://github.com/apache/spark/pull/8927#issuecomment-14378 jenkins retest this please
[GitHub] spark pull request: [SPARK-10796][CORE]Resubmit stage while lost t...
Github user suyanNone commented on the pull request: https://github.com/apache/spark/pull/8927#issuecomment-143698080 Reproduced it, so re-opening
[GitHub] spark pull request: [SPARK-10796][CORE]Resubmit stage while lost t...
GitHub user suyanNone reopened a pull request: https://github.com/apache/spark/pull/8927 [SPARK-10796][CORE]Resubmit stage while lost task in Zombie TaskSets We met this problem in Spark 1.3.0; I also checked the latest Spark code, and I think the problem still exists. Description:
1. A running `ShuffleMapStage` can have multiple `TaskSet`s: one active TaskSet and multiple zombie TaskSets.
2. A running `ShuffleMapStage` succeeds only once all of its partitions have been processed successfully, i.e. every task's MapStatus has been added into `outputLocs`.
3. The MapStatuses of a running `ShuffleMapStage` may be produced by zombie TaskSet 1 / zombie TaskSet 2 / ... / active TaskSet N, and some MapStatus may belong to only one TaskSet — possibly a zombie one.
4. If an executor is lost, it can happen that some of the lost executor's MapStatuses were produced by a zombie TaskSet. In the current logic, the solution to the lost-MapStatus problem is that each TaskSet re-runs the tasks that succeeded on the lost executor: re-adding them into the TaskSet's `pendingTasks` and re-adding their partitions into the Stage's `pendingPartitions`. But this is useless if a lost MapStatus belongs only to a zombie TaskSet: since it is a zombie, its `pendingTasks` will never be scheduled.
5. The only condition for resubmitting a stage is that some task throws `FetchFailedException`, but the lost executor may not hold any MapStatus of a parent stage for any of the running stages, while a running `Stage` happens to have lost a MapStatus belonging only to a zombie TaskSet. So once all zombie TaskSets have finished their running tasks and the active TaskSet has processed all its pending tasks, they are removed by `TaskSchedulerImpl` — yet that running stage's pending partitions are still non-empty. It hangs.
Example:
Running Stage 0.0, running TaskSet 0.0: finished task 0.0 on ExecA, running task 1.0 on ExecB, waiting task 2.0
---> Task 1.0 throws FetchFailedException
---> Resubmitted stage 0.1 runs TaskSet 0.1 (re-running tasks 1 and 2); assume task 1.0 finishes on ExecA
---> ExecA is lost, and it happens that nobody throws a FetchFailedException
---> TaskSet 0.1 resubmits task 1, re-adding it into `pendingTasks` and waiting for TaskSchedulerImpl to schedule it. TaskSet 0.0 also resubmits task 0, re-adding it into its `pendingTasks`, but because it is a zombie, TaskSchedulerImpl skips scheduling TaskSet 0.0.
So once TaskSet 0.0 and TaskSet 0.1 are both (isZombie && runningTasks.empty), TaskSchedulerImpl removes them. The DAGScheduler still has pending partitions due to the task lost from TaskSet 0.0, but all of the stage's TaskSets have been removed, so it hangs.
You can merge this pull request into a Git repository by running: $ git pull https://github.com/suyanNone/spark rerun-special Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/8927.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #8927 commit 554c61f800c6c1b25b1002a7255569a9c38e4154 Author: hushan <hus...@xiaomi.com> Date: 2015-09-24T09:49:22Z rerun-specail commit 3b4a683d23f951082df0b9d29dfa094683d235ea Author: hushan <hus...@xiaomi.com> Date: 2015-09-28T03:09:05Z refine commit f845f33563623a9f3d6858aba893ed8c75453403 Author: hushan <hus...@xiaomi.com> Date: 2015-09-28T03:14:18Z refine commit 301da0a20c94084bc8f783cd0e087e63f07e2124 Author: hushan <hus...@xiaomi.com> Date: 2015-09-28T03:18:46Z refine
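The hang described in this PR can be reduced to a toy model (hypothetical names; a sketch of the described behavior, not the actual scheduler code): a stage's pending partitions only drain when the TaskSet that must rerun the lost task can be scheduled, and zombie TaskSets never are.

```scala
import scala.collection.mutable

// Toy model of the zombie-TaskSet hang.
case class TaskSetModel(id: String, isZombie: Boolean,
                        pendingTasks: mutable.Set[Int])

val stagePendingPartitions = mutable.Set(0, 1)

// After ExecA is lost: partition 0's only output came from zombie
// TaskSet 0.0; partition 1's came from the still-active TaskSet 0.1.
val ts00 = TaskSetModel("0.0", isZombie = true,  mutable.Set(0))
val ts01 = TaskSetModel("0.1", isZombie = false, mutable.Set(1))

// The scheduler only offers resources to non-zombie TaskSets; a rerun
// task that completes removes its partition from the stage's pending set.
def schedule(taskSets: Seq[TaskSetModel]): Unit =
  for (ts <- taskSets if !ts.isZombie; t <- ts.pendingTasks.toList) {
    ts.pendingTasks -= t
    stagePendingPartitions -= t
  }

schedule(Seq(ts00, ts01))
// Partition 1 drains; partition 0 never will, so the stage "hangs" with
// a non-empty pending set once both TaskSets are removed.
```

After scheduling, the pending set still contains partition 0 — the exact state in which the DAGScheduler waits forever because no TaskSet remains to produce it.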
[GitHub] spark pull request: [SPARK][SPARK-10842]Eliminate creating duplica...
Github user suyanNone commented on the pull request: https://github.com/apache/spark/pull/8923#issuecomment-143923739 jenkins retest this please
[GitHub] spark pull request: [SPARK-10796][CORE]Resubmit stage while lost t...
GitHub user suyanNone opened a pull request: https://github.com/apache/spark/pull/8927 [SPARK-10796][CORE]Resubmit stage while lost task in Zombie TaskSets We met this problem in Spark 1.3.0; I also checked the latest Spark code, and I think the problem still exists.
1. A running `ShuffleMapStage` can have multiple `TaskSet`s: one active TaskSet and multiple zombie TaskSets.
2. A running `ShuffleMapStage` succeeds only once all of its partitions have been processed successfully, i.e. every task's MapStatus has been added into `outputLocs`.
3. The MapStatuses of a running `ShuffleMapStage` may be produced by zombie TaskSet 1 / zombie TaskSet 2 / ... / active TaskSet N, and some MapStatus may belong to only one TaskSet — possibly a zombie one.
4. If an executor is lost, it can happen that some of the lost executor's MapStatuses were produced by a zombie TaskSet. In the current logic, the solution to the lost-MapStatus problem is that each TaskSet re-runs the tasks that succeeded on the lost executor: re-adding them into the TaskSet's `pendingTasks` and re-adding their partitions into the Stage's `pendingPartitions`. But this is useless if a lost MapStatus belongs only to a zombie TaskSet: since it is a zombie, its `pendingTasks` will never be scheduled.
5. The only condition for resubmitting a stage is that some task throws `FetchFailedException`, but the lost executor may not hold any MapStatus of a parent stage for any of the running stages, while a running `Stage` happens to have lost a MapStatus belonging only to a zombie TaskSet. So once all zombie TaskSets have finished their running tasks and the active TaskSet has processed all its pending tasks, they are removed by `TaskSchedulerImpl` — yet that running stage's pending partitions are still non-empty. It hangs.
You can merge this pull request into a Git repository by running: $ git pull https://github.com/suyanNone/spark rerun-special Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/8927.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #8927 commit 554c61f800c6c1b25b1002a7255569a9c38e4154 Author: hushan <hus...@xiaomi.com> Date: 2015-09-24T09:49:22Z rerun-specail commit 3b4a683d23f951082df0b9d29dfa094683d235ea Author: hushan <hus...@xiaomi.com> Date: 2015-09-28T03:09:05Z refine commit f845f33563623a9f3d6858aba893ed8c75453403 Author: hushan <hus...@xiaomi.com> Date: 2015-09-28T03:14:18Z refine commit 301da0a20c94084bc8f783cd0e087e63f07e2124 Author: hushan <hus...@xiaomi.com> Date: 2015-09-28T03:18:46Z refine
[GitHub] spark pull request: [SPARK-10796][CORE]Resubmit stage while lost t...
Github user suyanNone closed the pull request at: https://github.com/apache/spark/pull/8927
[GitHub] spark pull request: [SPARK-10796][CORE]Resubmit stage while lost t...
Github user suyanNone commented on the pull request: https://github.com/apache/spark/pull/8927#issuecomment-143632342 I will run a test job on the latest code to confirm whether the problem still exists...
[GitHub] spark pull request: [SPARK][SPARK-10842]Eliminate create duplicate...
GitHub user suyanNone opened a pull request: https://github.com/apache/spark/pull/8923 [SPARK][SPARK-10842]Eliminate create duplicate stage while generate job dag When we traverse the RDD graph to generate the stage DAG, Spark skips checking whether a stage has already been added into `shuffleIdToStage` in one condition: when the shuffleDep comes from `getAncestorShuffleDependencies`. Before: ![before1](https://cloud.githubusercontent.com/assets/9818852/10117739/9e1e8778-6493-11e5-935d-ea32099e94c9.png) ![before2](https://cloud.githubusercontent.com/assets/9818852/10117740/9e20ee46-6493-11e5-95b4-f30df5ac22e1.png) After: ![after](https://cloud.githubusercontent.com/assets/9818852/10117737/9e1864c4-6493-11e5-8541-31ff0ead329a.png) ![after2](https://cloud.githubusercontent.com/assets/9818852/10117738/9e1c8658-6493-11e5-954a-fb00f36f4d3b.png) You can merge this pull request into a Git repository by running: $ git pull https://github.com/suyanNone/spark duplicate Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/8923.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #8923 commit 315887a5267a385d40a47267dcd25664414f1784 Author: hushan <hus...@xiaomi.com> Date: 2015-09-26T13:12:39Z Duplicate stage
[GitHub] spark pull request: [SPARK-6157][CORE]Unroll unsuccessful memory_a...
Github user suyanNone commented on the pull request: https://github.com/apache/spark/pull/4887#issuecomment-138480264 sorry, too late to see that; I will update it today
[GitHub] spark pull request: [SPARK][SPARK-10370]Cancel all running attempt...
GitHub user suyanNone opened a pull request: https://github.com/apache/spark/pull/8550 [SPARK][SPARK-10370]Cancel all running attempts while that stage marked as finished You can merge this pull request into a Git repository by running: $ git pull https://github.com/suyanNone/spark apache-tasksets Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/8550.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #8550 commit 75b97796a2226cbc855d8b6d198a09cce709d698 Author: hushan <hus...@xiaomi.com> Date: 2015-09-01T09:16:15Z Cancel stage task after it already marked finished
[GitHub] spark pull request: [SPARK-6157][CORE]Unroll unsuccessful memory_a...
Github user suyanNone commented on the pull request: https://github.com/apache/spark/pull/4887#issuecomment-136661103 @srowen OK, I checked the current code: the memory threshold changed to per task instead of per thread, so the problem still exists... "In the previous situation, the aim was to handle a MEMORY_AND_DISK level block whose first unroll failed: the unroll memory stayed reserved for the thread, but that unroll memory should be released once the block has been put on disk, because nobody will use the unrolled values afterwards. So there is no need to keep that unroll memory reserved." Do you agree with that description, or should I make it clearer?
[GitHub] spark pull request: [SPARK-6157][CORE]Unroll unsuccessful memory_a...
Github user suyanNone commented on the pull request: https://github.com/apache/spark/pull/4887#issuecomment-136657671 @srowen @nitin2goyal I am not sure this is still needed, because there has been a big change to the per-thread memory threshold. In the previous situation, the aim was to handle a MEMORY_AND_DISK level block whose first unroll failed: the unroll memory stayed reserved for the thread, and that unroll memory should be released once the block has been put on disk.
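The idea in these two comments can be sketched with a toy model. This is a hedged illustration, not Spark's actual MemoryStore API: `UnrollMemoryPool` and `put_memory_and_disk_block` are invented names. The point is that when an unroll fails part-way and the block is spilled to disk, the partially reserved unroll memory is released immediately rather than staying reserved for the task.

```python
class UnrollMemoryPool:
    """Toy per-task unroll-memory accounting (illustrative only)."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.reserved = {}  # task_id -> bytes reserved for unrolling

    def reserve(self, task_id, n_bytes):
        """Reserve up to n_bytes; a partial grant means the unroll failed."""
        used = sum(self.reserved.values())
        granted = min(n_bytes, self.capacity - used)
        if granted > 0:
            self.reserved[task_id] = self.reserved.get(task_id, 0) + granted
        return granted == n_bytes

    def release(self, task_id):
        """Free everything this task reserved; returns the freed amount."""
        return self.reserved.pop(task_id, 0)

def put_memory_and_disk_block(pool, task_id, block_bytes, disk_store):
    """Unroll into memory if possible; otherwise spill to disk and release
    the partial unroll reservation right away (the fix under discussion)."""
    if pool.reserve(task_id, block_bytes):
        return "memory"
    disk_store.append(task_id)  # block goes to disk instead
    pool.release(task_id)       # nobody will read the partial unroll
    return "disk"
```

Without the `release` call, the 40 bytes partially granted to the failed unroll would stay reserved even though the block now lives on disk.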
[GitHub] spark pull request: [SPARK-5259][CORE] don't submit stage until it...
Github user suyanNone commented on the pull request: https://github.com/apache/spark/pull/7699#issuecomment-136639959 @squito SPARK-10370, eh... I think I already resolved it a few months ago in my local environment, but that was based on Spark 1.3.0... Are you already working on it?
[GitHub] spark pull request: [SPARK][SPARK-10370]Cancel all running attempt...
Github user suyanNone closed the pull request at: https://github.com/apache/spark/pull/8550
[GitHub] spark pull request: [SPARK][SPARK-10370]Cancel all running attempt...
Github user suyanNone commented on the pull request: https://github.com/apache/spark/pull/8550#issuecomment-136762126 @squito yeah, I will close it.
[GitHub] spark pull request: [SPARK][SPARK-10370]Cancel all running attempt...
Github user suyanNone commented on the pull request: https://github.com/apache/spark/pull/8550#issuecomment-136766398 @squito I closed the patch because there were some errors in the tests. Calling `taskScheduler.cancelTasks(stage.id, true)` in markStageAsFinished marks all the now-useless task sets related to that stage as zombie; since the stage is already marked as finished, there is no need to handle any running task for it. So, first, this prevents further tasks from being scheduled; second, it may be good to kill the running tasks of an already finished stage, since such tasks may run for a long time.
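The zombie mechanism referenced here can be modeled minimally. This is an illustrative sketch, not Spark's TaskSetManager (class and method names are hypothetical): once a stage is finished, its task sets are flagged as zombie so the scheduler stops offering them new task attempts.

```python
class TaskSet:
    """Toy task set: a zombie task set never schedules anything new."""
    def __init__(self, stage_id, pending):
        self.stage_id = stage_id
        self.pending = list(pending)  # task indices not yet launched
        self.is_zombie = False

    def next_task(self):
        if self.is_zombie or not self.pending:
            return None
        return self.pending.pop(0)

def mark_stage_as_finished(stage_id, task_sets):
    """Flag every task set of the finished stage as zombie."""
    for ts in task_sets:
        if ts.stage_id == stage_id:
            ts.is_zombie = True
```

This captures the first half of the comment (stop scheduling); actually killing already-running tasks would be a separate step on top of the flag.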
[GitHub] spark pull request: [SPARK][SPARK-10370]Cancel all running attempt...
Github user suyanNone commented on the pull request: https://github.com/apache/spark/pull/8550#issuecomment-136767971 Regarding SPARK-2666: that patch aims to cancel all of a stage's tasks as soon as a FetchFailedException is thrown, so I think it is unrelated to this patch, right? This patch only cancels tasks whose stage is already marked as finished.
[GitHub] spark pull request: KafKaDirectDstream should filter empty partiti...
Github user suyanNone closed the pull request at: https://github.com/apache/spark/pull/8237
[GitHub] spark pull request: KafKaDirectDstream should filter empty partiti...
Github user suyanNone commented on the pull request: https://github.com/apache/spark/pull/8237#issuecomment-132427763 @tdas OK, since you already discussed this before, let's keep it as it is.
[GitHub] spark pull request: KafKaDirectDstream should filter empty partiti...
Github user suyanNone commented on the pull request: https://github.com/apache/spark/pull/8237#issuecomment-132054268 @tdas, yeah, I agree it would be semantically confusing, and for a batch streaming system an empty batch will not block the next batch as long as it finishes quickly, i.e. in less than the batch interval. Ah... I am still a little uncomfortable with launching empty tasks on executors. We run Spark on YARN, where resource isolation is soft, so resource contention occurs; sometimes even an empty task's processing time can exceed the batch interval.
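The filtering this PR proposes can be sketched in a few lines. This is a hedged Python analogue, not the actual KafkaDirectDStream code (function names are invented): a Kafka partition contributes zero events exactly when its until-offset equals its from-offset, so such partitions can be dropped before a batch job is submitted.

```python
def nonempty_partitions(offset_ranges):
    """offset_ranges: list of (topic, partition, from_offset, until_offset).
    Keep only ranges that actually contain events."""
    return [r for r in offset_ranges if r[3] > r[2]]

def should_submit_batch(offset_ranges):
    # Skip the whole job when every partition is empty (a 0-event batch).
    return len(nonempty_partitions(offset_ranges)) > 0
```

The trade-off the thread discusses is that skipping empty batches changes the semantics (each batch normally produces an RDD, even an empty one), at the cost of launching no-op tasks.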
[GitHub] spark pull request: KafKaDirectDstream should filter empty partiti...
GitHub user suyanNone opened a pull request: https://github.com/apache/spark/pull/8237 KafKaDirectDstream should filter empty partition task or rdd. This avoids submitting stages and tasks for 0-event batches. You can merge this pull request into a Git repository by running: $ git pull https://github.com/suyanNone/spark spark-10052 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/8237.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #8237 commit 4e4764d0a007160ae93a9c5b5910043ab406c02b Author: hushan hus...@xiaomi.com Date: 2015-08-17T12:22:49Z KafKaDirectDstream should filter empty partition task or rdd
[GitHub] spark pull request: KafKaDirectDstream should filter empty partiti...
GitHub user suyanNone reopened a pull request: https://github.com/apache/spark/pull/8237 KafKaDirectDstream should filter empty partition task or rdd. This avoids submitting stages and tasks for 0-event batches. You can merge this pull request into a Git repository by running: $ git pull https://github.com/suyanNone/spark spark-10052 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/8237.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #8237 commit 4e4764d0a007160ae93a9c5b5910043ab406c02b Author: hushan hus...@xiaomi.com Date: 2015-08-17T12:22:49Z KafKaDirectDstream should filter empty partition task or rdd
[GitHub] spark pull request: KafKaDirectDstream should filter empty partiti...
Github user suyanNone closed the pull request at: https://github.com/apache/spark/pull/8237
[GitHub] spark pull request: [SPARK-8100][UI]Make able to refer lost execut...
Github user suyanNone commented on the pull request: https://github.com/apache/spark/pull/6644#issuecomment-128218507 @andrewor14 @harishreedharan @squito I missed andrewor14's comments about long-running apps again... As harishreedharan says, expiring entries after a time limit is worth considering.
[GitHub] spark pull request: [SPARK-5259][CORE] don't submit stage until it...
Github user suyanNone commented on the pull request: https://github.com/apache/spark/pull/7699#issuecomment-125477126 ... sorry for not updating promptly... I have been busy with offline work. It's nice of you to refine the code for #4055.
[GitHub] spark pull request: [SPARK-8100][UI]Make able to refer lost execut...
Github user suyanNone commented on a diff in the pull request: https://github.com/apache/spark/pull/6644#discussion_r35617005 --- Diff: core/src/main/scala/org/apache/spark/status/api/v1/api.scala --- @@ -60,7 +60,8 @@ class ExecutorSummary private[spark]( val totalShuffleRead: Long, val totalShuffleWrite: Long, val maxMemory: Long, -val executorLogs: Map[String, String]) +val executorLogs: Map[String, String], +val isRemoved: Boolean) --- End diff -- OK, I will refine that; thanks for telling me about MimaExcludes.
[GitHub] spark pull request: [SPARK-8100][UI]Make able to refer lost execut...
Github user suyanNone commented on the pull request: https://github.com/apache/spark/pull/6644#issuecomment-125468342 retest this please.
[GitHub] spark pull request: [SPARK-5259][CORE]Make sure shuffle metadata a...
Github user suyanNone commented on a diff in the pull request: https://github.com/apache/spark/pull/4055#discussion_r35177284 --- Diff: core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala --- @@ -739,6 +742,88 @@ class DAGSchedulerSuite assertDataStructuresEmpty() } + test("verify not submit next stage while not have registered mapStatus") { +val firstRDD = new MyRDD(sc, 3, Nil) +val firstShuffleDep = new ShuffleDependency(firstRDD, null) +val firstShuffleId = firstShuffleDep.shuffleId +val shuffleMapRdd = new MyRDD(sc, 3, List(firstShuffleDep)) +val shuffleDep = new ShuffleDependency(shuffleMapRdd, null) +val reduceRdd = new MyRDD(sc, 1, List(shuffleDep)) +submit(reduceRdd, Array(0)) + +// things start out smoothly, stage 0 completes with no issues +complete(taskSets(0), Seq( + (Success, makeMapStatus("hostB", shuffleMapRdd.partitions.size)), + (Success, makeMapStatus("hostB", shuffleMapRdd.partitions.size)), + (Success, makeMapStatus("hostA", shuffleMapRdd.partitions.size)) +)) + +// then one executor dies, and a task fails in stage 1 +runEvent(ExecutorLost("exec-hostA")) +runEvent(CompletionEvent(taskSets(1).tasks(0), + FetchFailed(null, firstShuffleId, 2, 0, "Fetch failed"), + null, null, createFakeTaskInfo(), null)) + +// so we resubmit stage 0, which completes happily +Thread.sleep(1000) +val stage0Resubmit = taskSets(2) +assert(stage0Resubmit.stageId == 0) +assert(stage0Resubmit.stageAttemptId === 1) +val task = stage0Resubmit.tasks(0) +assert(task.partitionId === 2) +runEvent(CompletionEvent(task, Success, + makeMapStatus("hostC", shuffleMapRdd.partitions.size), null, createFakeTaskInfo(), null)) + +// now here is where things get tricky: we will now have a task set representing +// the second attempt for stage 1, but we *also* have some tasks for the first attempt for +// stage 1 still going +val stage1Resubmit = taskSets(3) +assert(stage1Resubmit.stageId == 1) +assert(stage1Resubmit.stageAttemptId === 1) +assert(stage1Resubmit.tasks.length === 3) + +// we'll have some tasks finish from the first attempt, and some finish from the second attempt, +// so that we actually have all stage outputs, though no attempt has completed all its +// tasks +runEvent(CompletionEvent(taskSets(3).tasks(0), Success, + makeMapStatus("hostC", reduceRdd.partitions.size), null, createFakeTaskInfo(), null)) +runEvent(CompletionEvent(taskSets(3).tasks(1), Success, + makeMapStatus("hostC", reduceRdd.partitions.size), null, createFakeTaskInfo(), null)) +// late task finish from the first attempt +runEvent(CompletionEvent(taskSets(1).tasks(2), Success, + makeMapStatus("hostB", reduceRdd.partitions.size), null, createFakeTaskInfo(), null)) + +// What should happen now is that we submit stage 2. However, we might not see an error +// b/c of DAGScheduler's error handling (it tends to swallow errors and just log them). But +// we can check some conditions. +// Note that the really important thing here is not so much that we submit stage 2 *immediately* +// but that we don't end up with some error from these interleaved completions. It would also +// be OK (though sub-optimal) if stage 2 simply waited until the resubmission of stage 1 had +// all its tasks complete + +// check that we have all the map output for stage 0 (it should have been there even before +// the last round of completions from stage 1, but just to double check it hasn't been messed +// up) +(0 until 3).foreach { reduceIdx => + val arr = mapOutputTracker.getServerStatuses(0, reduceIdx) + assert(arr != null) + assert(arr.nonEmpty) --- End diff -- Maybe the code below would be better? ``` try { mapOutputTracker.getMapSizesByExecutorId(0, reduceIdx) } catch { case e: Exception => fail() } ```
[GitHub] spark pull request: [SPARK-5259][CORE]Make sure shuffle metadata a...
Github user suyanNone commented on a diff in the pull request: https://github.com/apache/spark/pull/4055#discussion_r35064694 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -1193,8 +1193,10 @@ class DAGScheduler( // TODO: This will be really slow if we keep accumulating shuffle map stages for ((shuffleId, stage) <- shuffleToMapStage) { stage.removeOutputsOnExecutor(execId) - val locs = stage.outputLocs.map(list => if (list.isEmpty) null else list.head).toArray - mapOutputTracker.registerMapOutputs(shuffleId, locs, changeEpoch = true) + if (!runningStages.contains(stage)) { --- End diff -- yeah, that's right.
[GitHub] spark pull request: [SPARK-5259][CORE]Make sure shuffle metadata a...
Github user suyanNone commented on the pull request: https://github.com/apache/spark/pull/4055#issuecomment-123142054 @squito @markhamstra hmm, I agree with tracking only the partitionId, which makes this simpler. It's great to see this patch heading toward being closed out. Many thanks!
[GitHub] spark pull request: [SPARK-5259][CORE]Make sure shuffle metadata a...
Github user suyanNone commented on the pull request: https://github.com/apache/spark/pull/4055#issuecomment-118758477 @squito About stage.pendingTasks and shuffleMapStage.isAvailable: using `isAvailable` instead of `stage.pendingTasks` may need more care. There is a comment in the code, `// Some tasks had failed; let's resubmit this shuffleStage`, and I can't reconstruct the situation or context it refers to... If we change to `isAvailable`, the following check `if (shuffleStage.outputLocs.contains(Nil))` could perhaps be removed, and so could all the code related to `stage.pendingTasks`, since it would no longer have any readers. The logic of the DAGScheduler is complicated and always has some confusing code... ``` if (runningStages.contains(shuffleStage) && shuffleStage.pendingTasks.isEmpty) { // clearCacheLocs() if (shuffleStage.outputLocs.contains(Nil)) { // Some tasks had failed; let's resubmit this shuffleStage // TODO: Lower-level scheduler should also deal with this logInfo("Resubmitting " + shuffleStage + " (" + shuffleStage.name + ") because some of its tasks had failed: " + shuffleStage.outputLocs.zipWithIndex.filter(_._1.isEmpty).map(_._2).mkString(", ")) submitStage(shuffleStage) } else { ```
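The caution in this comment comes down to the fact that the two "is this stage done?" signals are not equivalent. A minimal Python model (hypothetical class, not Spark's actual ShuffleMapStage): `pending` tracks tasks that have not reported success, while `is_available` asks whether every partition currently has a registered map output. After an executor loss wipes some outputs, the two can disagree, which is why swapping one for the other needs care.

```python
class ShuffleMapStage:
    """Toy model of a shuffle map stage's two completion signals."""
    def __init__(self, num_partitions):
        self.output_locs = [None] * num_partitions  # map output per partition
        self.pending = set(range(num_partitions))   # tasks not yet succeeded

    def task_succeeded(self, partition, location):
        self.pending.discard(partition)
        self.output_locs[partition] = location

    def executor_lost(self, location):
        # Outputs on a lost executor disappear, but pending is unchanged:
        # the tasks did succeed, their outputs just no longer exist.
        for i, loc in enumerate(self.output_locs):
            if loc == location:
                self.output_locs[i] = None

    @property
    def is_available(self):
        return all(loc is not None for loc in self.output_locs)
```

After an executor loss, `pending` says the stage is finished while `is_available` says it is not, so code guarded by one condition cannot blindly switch to the other.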
[GitHub] spark pull request: [SPARK-5259][CORE]Make sure shuffle metadata a...
Github user suyanNone commented on the pull request: https://github.com/apache/spark/pull/4055#issuecomment-118708012 @squito oh... I had skipped that... 1) A task attempt is now described by `TaskInfo` in Spark's `TaskSetManager`. The `TaskSetManager` is responsible for completing the task attempt's `TaskInfo`, which is identified by `TaskID`, and when the `TaskSetManager` considers an attempt successful, it calls the `dagScheduler` to complete `tasks(index)`, which is a `Task`. So it makes some sense to identify a `Task` by stageId and partitionId?
[GitHub] spark pull request: [SPARK-5259][CORE]Make sure shuffle metadata a...
Github user suyanNone commented on a diff in the pull request: https://github.com/apache/spark/pull/4055#discussion_r33860928 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -1193,8 +1193,10 @@ class DAGScheduler( // TODO: This will be really slow if we keep accumulating shuffle map stages for ((shuffleId, stage) <- shuffleToMapStage) { stage.removeOutputsOnExecutor(execId) - val locs = stage.outputLocs.map(list => if (list.isEmpty) null else list.head).toArray - mapOutputTracker.registerMapOutputs(shuffleId, locs, changeEpoch = true) + if (!runningStages.contains(stage)) { --- End diff -- @squito For this one: when the dagScheduler handles ExecutorLost, it re-registers all of `shuffleToMapStage`, including running map stages that are only partially complete, which causes us to register partial mapStatus entries in the MapOutputTracker.
[GitHub] spark pull request: [SPARK-5259][CORE]Make sure shuffle metadata a...
Github user suyanNone commented on a diff in the pull request: https://github.com/apache/spark/pull/4055#discussion_r33861063 --- Diff: core/src/main/scala/org/apache/spark/MapOutputTracker.scala --- @@ -305,7 +305,7 @@ private[spark] class MapOutputTrackerMaster(conf: SparkConf) if (mapStatuses.contains(shuffleId)) { val statuses = mapStatuses(shuffleId) - if (statuses.nonEmpty) { + if (statuses.nonEmpty && statuses.exists(_ != null)) { --- End diff -- Do I need a test case for this fix? I think it is a bug...
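Why `nonEmpty` alone is not a sufficient guard: the map-status array for a shuffle is pre-allocated with one slot per map task, so it is non-empty even before any output has been registered. A small Python analogue of the diff's added check (illustrative only, not the Scala code):

```python
def has_registered_outputs(statuses):
    """True only if at least one map status has actually been registered.
    len(statuses) > 0 alone is not enough: a freshly allocated array of
    None slots (one per map task) would pass that check."""
    return len(statuses) > 0 and any(s is not None for s in statuses)
```

This mirrors `statuses.nonEmpty && statuses.exists(_ != null)`: the second conjunct is what distinguishes "slots exist" from "outputs exist".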
[GitHub] spark pull request: [SPARK-5259][CORE]Make sure shuffle metadata a...
Github user suyanNone commented on a diff in the pull request: https://github.com/apache/spark/pull/4055#discussion_r33861293 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -1193,8 +1193,10 @@ class DAGScheduler( // TODO: This will be really slow if we keep accumulating shuffle map stages for ((shuffleId, stage) <- shuffleToMapStage) { stage.removeOutputsOnExecutor(execId) - val locs = stage.outputLocs.map(list => if (list.isEmpty) null else list.head).toArray - mapOutputTracker.registerMapOutputs(shuffleId, locs, changeEpoch = true) + if (!runningStages.contains(stage)) { --- End diff -- One more thing to note: if we fix the problem described in that issue, we may never hit the problems caused by these two pieces of code, so this change may not help that issue directly. Should I create a separate issue for these two?
[GitHub] spark pull request: [SPARK-5259][CORE]Make sure shuffle metadata a...
Github user suyanNone commented on the pull request: https://github.com/apache/spark/pull/4055#issuecomment-118257935 @squito you are right, the task set is set to zombie so the remaining tasks are not scheduled. Your test case is good, and I modified just one place: ``` runEvent(ExecutorLost("exec-hostA")) runEvent(CompletionEvent(taskSets(1).tasks(0), FetchFailed(null, firstShuffleId, 2, 0, "Fetch failed"), null, null, createFakeTaskInfo(), null)) // so we resubmit stage 0, which completes happily scheduler.resubmitFailedStages() val stage0Resubmit = taskSets(2) assert(stage0Resubmit.stageId == 0) assert(stage0Resubmit.attempt === 1) ``` to ``` runEvent(ExecutorLost("exec-hostA")) runEvent(CompletionEvent(taskSets(1).tasks(0), FetchFailed(null, firstShuffleId, 2, 0, "Fetch failed"), null, null, createFakeTaskInfo(), null)) // so we resubmit stage 0, which completes happily Thread.sleep(1000) val stage0Resubmit = taskSets(2) assert(stage0Resubmit.stageId == 0) assert(stage0Resubmit.attempt === 1) ``` The reason is that when the dagScheduler handles FetchFailed, it runs ``` messageScheduler.schedule(new Runnable { override def run(): Unit = eventProcessLoop.post(ResubmitFailedStages) }, 200, TimeUnit.MILLISECONDS) ``` and runEvent is asynchronous, so we need to wait some time to make sure that event has been processed.
[GitHub] spark pull request: [SPARK-5259][CORE]Make sure shuffle metadata a...
Github user suyanNone commented on a diff in the pull request: https://github.com/apache/spark/pull/4055#discussion_r33858523 --- Diff: core/src/main/scala/org/apache/spark/MapOutputTracker.scala --- @@ -305,7 +305,7 @@ private[spark] class MapOutputTrackerMaster(conf: SparkConf) if (mapStatuses.contains(shuffleId)) { val statuses = mapStatuses(shuffleId) - if (statuses.nonEmpty) { + if (statuses.nonEmpty && statuses.exists(_ != null)) { --- End diff -- These two correctness fixes came up while I was reverting my change to verify that my test case fails without it; whenever I did that, it ran into some strange problems. When we submitMissingTasks, we call `getPreferredLocs` for each task, and there, for each of the RDD's ShuffleDependencies, it calls `mapOutputTracker.getLocationsWithLargestOutputs(`, which checks `statuses.nonEmpty`. That is always true, so it always goes on to call `val status = statuses(mapIdx); val blockSize = status.getSizeForBlock(reducerId)`
[GitHub] spark pull request: [SPARK-5259][CORE]Make sure shuffle metadata a...
Github user suyanNone commented on the pull request: https://github.com/apache/spark/pull/4055#issuecomment-117419882 @squito I was on vacation for the past 10 days; sorry I saw this so late. I will read your comments more carefully tomorrow, and I'm grateful to you for reviewing.
[GitHub] spark pull request: [SPARK-8044][CORE]Avoid to make directbuffer o...
Github user suyanNone closed the pull request at: https://github.com/apache/spark/pull/6586
[GitHub] spark pull request: [SPARK-8044][CORE]Avoid to make directbuffer o...
Github user suyanNone commented on the pull request: https://github.com/apache/spark/pull/6586#issuecomment-117421446

@srowen I have been on vacation for the past 10 days, sorry for seeing this so late. OK, I will close this patch.
[GitHub] spark pull request: [SPARK-8100][UI]Make able to refer lost execut...
Github user suyanNone commented on the pull request: https://github.com/apache/spark/pull/6644#issuecomment-113865096

retest this please
[GitHub] spark pull request: [SPARK-5259][CORE]Make sure mapStage.pendingta...
Github user suyanNone commented on the pull request: https://github.com/apache/spark/pull/4055#issuecomment-113414080

@squito, could you help review the test again? Thanks.
[GitHub] spark pull request: [SPARK-5259][CORE]Make sure shuffle metadata a...
Github user suyanNone commented on the pull request: https://github.com/apache/spark/pull/4055#issuecomment-113477240

retest this please
[GitHub] spark pull request: [SPARK-5259][CORE]Make sure mapStage.pendingta...
Github user suyanNone commented on a diff in the pull request: https://github.com/apache/spark/pull/4055#discussion_r32811839

--- Diff: core/src/main/scala/org/apache/spark/MapOutputTracker.scala ---
```
@@ -305,7 +305,7 @@ private[spark] class MapOutputTrackerMaster(conf: SparkConf)
     if (mapStatuses.contains(shuffleId)) {
       val statuses = mapStatuses(shuffleId)
-      if (statuses.nonEmpty) {
+      if (statuses.nonEmpty && statuses.exists(_ != null)) {
```
--- End diff --

`nonEmpty` only checks that the array length is not 0, but we register an array of size `partitionSize` for each shuffle when the job is submitted, so `statuses.nonEmpty` is always true.
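To make the point concrete, here is a minimal stand-alone Scala sketch (not Spark code; the statuses array is just `Array[String]` rather than `Array[MapStatus]`) showing that a pre-allocated array passes `nonEmpty` even before any element has been set:

```scala
// Illustration only: MapOutputTrackerMaster pre-allocates one slot per map
// partition, so the statuses array is non-empty from the moment it is registered.
object NonEmptyCheckDemo {
  def main(args: Array[String]): Unit = {
    // Simulate the registered statuses array for a 3-partition shuffle:
    // slots exist, but no map output has been recorded yet (all null).
    val statuses = new Array[String](3)

    // `nonEmpty` only checks length != 0, so it is already true here...
    assert(statuses.nonEmpty)
    // ...while `exists(_ != null)` correctly reports that nothing is registered.
    assert(!statuses.exists(_ != null))

    // Once one map task's output is recorded, both checks pass.
    statuses(0) = "mapStatus-0"
    assert(statuses.nonEmpty && statuses.exists(_ != null))
    println("ok")
  }
}
```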
[GitHub] spark pull request: [SPARK-5259][CORE]Make sure mapStage.pendingta...
Github user suyanNone commented on the pull request: https://github.com/apache/spark/pull/4055#issuecomment-113438222

retest this please
[GitHub] spark pull request: [SPARK-6157][CORE]Unroll unsuccessful memory_a...
Github user suyanNone commented on a diff in the pull request: https://github.com/apache/spark/pull/4887#discussion_r32811415

--- Diff: core/src/main/scala/org/apache/spark/storage/MemoryStore.scala ---
```
@@ -295,9 +296,9 @@ private[spark] class MemoryStore(blockManager: BlockManager, maxMemory: Long)
       // In this case, we should release the memory after we cache the block there.
       // Otherwise, if we return an iterator, we release the memory reserved here
       // later when the task finishes.
-      if (keepUnrolling) {
-        accountingLock.synchronized {
-          val amountToRelease = currentUnrollMemoryForThisThread - previousMemoryReserved
+      accountingLock.synchronized {
```
--- End diff --

@srowen `releasePendingUnrollMemoryForThisThread` means: we keep the unroll memory reserved for the unrolled block, and we need to release that memory once it is no longer used, e.g. when we put a successfully unrolled block into the MemoryStore, when we put an unsuccessfully unrolled block onto disk, or when the task completes.
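As a rough illustration of that reserve-then-release lifecycle, here is a hedged sketch; `pendingUnrollMemory`, `reserve`, and `releasePending` are illustrative names, not the actual MemoryStore fields or API:

```scala
// Sketch of pending-unroll accounting (illustrative, not Spark's MemoryStore):
// memory reserved while unrolling a block stays reserved until the block is
// stored in memory, spilled to disk, or the task finishes.
object UnrollAccountingDemo {
  private var pendingUnrollMemory = 0L
  private val accountingLock = new Object

  // Reserve unroll memory while iterating over the block's values.
  def reserve(bytes: Long): Unit = accountingLock.synchronized {
    pendingUnrollMemory += bytes
  }

  // Called once the unrolled block has been cached, spilled, or the task is
  // done: the whole pending reservation is returned to the pool.
  def releasePending(): Long = accountingLock.synchronized {
    val released = pendingUnrollMemory
    pendingUnrollMemory = 0L
    released
  }

  def main(args: Array[String]): Unit = {
    reserve(1024)
    reserve(2048)
    assert(releasePending() == 3072L) // everything reserved is released at once
    assert(releasePending() == 0L)    // releasing twice is a no-op
    println("ok")
  }
}
```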
[GitHub] spark pull request: [SPARK-5259][CORE]Make sure mapStage.pendingta...
Github user suyanNone commented on a diff in the pull request: https://github.com/apache/spark/pull/4055#discussion_r32811940

--- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala ---
```
@@ -1193,8 +1193,10 @@ class DAGScheduler(
       // TODO: This will be really slow if we keep accumulating shuffle map stages
       for ((shuffleId, stage) <- shuffleToMapStage) {
         stage.removeOutputsOnExecutor(execId)
-        val locs = stage.outputLocs.map(list => if (list.isEmpty) null else list.head).toArray
-        mapOutputTracker.registerMapOutputs(shuffleId, locs, changeEpoch = true)
+        if (!runningStages.contains(stage)) {
```
--- End diff --

We register map statuses when a stage is done, so a running stage is not yet in the MapOutputTracker and there is no need to re-register it.
[GitHub] spark pull request: [SPARK-5259][CORE]Make sure mapStage.pendingta...
Github user suyanNone commented on the pull request: https://github.com/apache/spark/pull/4055#issuecomment-113166545

@andrewor14 @srowen Already refined according to the comments.
[GitHub] spark pull request: [SPARK-5259][CORE]Make sure mapStage.pendingta...
Github user suyanNone commented on a diff in the pull request: https://github.com/apache/spark/pull/4055#discussion_r32799824

--- Diff: core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala ---
```
@@ -598,6 +598,49 @@ class DAGSchedulerSuite
     assertDataStructuresEmpty()
   }

+  test("run with ShuffleMapStage retry") {
+    val firstRDD = new MyRDD(sc, 3, Nil)
+    val firstShuffleDep = new ShuffleDependency(firstRDD, null)
+    val firstShuffleId = firstShuffleDep.shuffleId
+    val shuffleMapRdd = new MyRDD(sc, 3, List(firstShuffleDep))
+    val shuffleDep = new ShuffleDependency(shuffleMapRdd, null)
+    val reduceRdd = new MyRDD(sc, 1, List(shuffleDep))
+    submit(reduceRdd, Array(0))
+
+    complete(taskSets(0), Seq(
+      (Success, makeMapStatus("hostB", shuffleMapRdd.partitions.size)),
+      (Success, makeMapStatus("hostB", shuffleMapRdd.partitions.size)),
+      (Success, makeMapStatus("hostC", shuffleMapRdd.partitions.size))
+    ))
+
+    complete(taskSets(1), Seq(
+      (Success, makeMapStatus("hostA", reduceRdd.partitions.size)),
+      (Success, makeMapStatus("hostB", reduceRdd.partitions.size))
+    ))
+    runEvent(ExecutorLost("exec-hostA"))
+    // Resubmit the task that already succeeded on hostA
+    runEvent(CompletionEvent(taskSets(1).tasks(0), Resubmitted,
+      null, null, createFakeTaskInfo(), null))
+
+    // Causes mapOutputTracker to remove hostA's outputs for taskSets(0).
+    // The resubmitted task will fail to fetch metadata.
+    runEvent(CompletionEvent(taskSets(1).tasks(0),
+      FetchFailed(null, firstShuffleId, -1, 0, "Fetch metadata failed"),
+      null, null, createFakeTaskInfo(), null))
```
--- End diff --

Nice, you are right; I changed the test code to a wrong version.
[GitHub] spark pull request: [SPARK-5259][CORE]Make sure mapStage.pendingta...
Github user suyanNone commented on the pull request: https://github.com/apache/spark/pull/4055#issuecomment-113356815

@andrewor14 The remaining failures are because I changed the `DAGScheduler.handleExecutorLost` logic in DAGScheduler.scala.

Original version: changes the MapOutputTracker epoch for every stage whenever an executor is lost:
```
handleExecutorLost {
  ..
  for ((shuffleId, stage) <- shuffleToMapStage) {
    stage.removeOutputsOnExecutor(execId)
    val locs = stage.outputLocs.map(list => if (list.isEmpty) null else list.head).toArray
    mapOutputTracker.registerMapOutputs(shuffleId, locs, changeEpoch = true)
  }
  ..
}
```

This patch's version: a running stage does not need to be re-registered with the MapOutputTracker, so it does not change the epoch:
```
for ((shuffleId, stage) <- shuffleToMapStage) {
  stage.removeOutputsOnExecutor(execId)
  if (!runningStages.contains(stage)) {
    val locs = stage.outputLocs.map(list => if (list.isEmpty) null else list.head).toArray
    mapOutputTracker.registerMapOutputs(shuffleId, locs, changeEpoch = true)
  }
}
```
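The effect of that `runningStages` guard can be sketched in isolation; `Stage`, `shuffleToMapStage`, and `runningStages` below are simplified stand-ins, not Spark's real types:

```scala
// Sketch of the patched executor-lost loop: only stages that have finished
// (and so are registered in the MapOutputTracker) get re-registered, which is
// what bumps the epoch; running stages are skipped.
object ExecutorLostDemo {
  final case class Stage(id: Int)

  def main(args: Array[String]): Unit = {
    // Shuffle 1's map stage has completed; shuffle 2's map stage is still running.
    val shuffleToMapStage = Map(1 -> Stage(10), 2 -> Stage(20))
    val runningStages = Set(Stage(20))
    var reRegistered = List.empty[Int]

    for ((shuffleId, stage) <- shuffleToMapStage) {
      // In Spark, stage.removeOutputsOnExecutor(execId) would run here for
      // every stage, running or not.
      if (!runningStages.contains(stage)) {
        // Only the finished stage triggers registerMapOutputs(..., changeEpoch = true).
        reRegistered ::= shuffleId
      }
    }

    assert(reRegistered == List(1)) // only the completed stage was re-registered
    println("ok")
  }
}
```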