[GitHub] spark issue #12258: [SPARK-14485][CORE] ignore task finished for executor lo...

2016-06-07 Thread zhonghaihua
Github user zhonghaihua commented on the issue:

https://github.com/apache/spark/pull/12258
  
@vanzin my JIRA username is `iward`. Thanks a lot.





[GitHub] spark issue #12258: [SPARK-14485][CORE] ignore task finished for executor lo...

2016-06-07 Thread zhonghaihua
Github user zhonghaihua commented on the issue:

https://github.com/apache/spark/pull/12258
  
@vanzin retest this please.





[GitHub] spark pull request #12258: [SPARK-14485][CORE] ignore task finished for exec...

2016-06-06 Thread zhonghaihua
Github user zhonghaihua commented on a diff in the pull request:

https://github.com/apache/spark/pull/12258#discussion_r66000194
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala ---
@@ -343,17 +343,31 @@ private[spark] class TaskSchedulerImpl(
         }
         taskIdToTaskSetManager.get(tid) match {
           case Some(taskSet) =>
+            var executorId: String = null
             if (TaskState.isFinished(state)) {
               taskIdToTaskSetManager.remove(tid)
               taskIdToExecutorId.remove(tid).foreach { execId =>
+                executorId = execId
                 if (executorIdToTaskCount.contains(execId)) {
                   executorIdToTaskCount(execId) -= 1
                 }
               }
             }
             if (state == TaskState.FINISHED) {
+              // In some case, executor has already removed by driver for heartbeats timeout, but
+              // at sometime, before executor killed by cluster, the task of running on this
+              // executor is finished and return task success state to driver. However, this kinds
+              // of task should be ignored, because the task on this executor is already re-queued
+              // by driver. For more details, can check in SPARK-14485.
               taskSet.removeRunningTask(tid)
-              taskResultGetter.enqueueSuccessfulTask(taskSet, tid, serializedData)
+              if (executorId != null && !executorIdToTaskCount.contains(executorId)) {
+                logInfo(
+                  ("Ignoring update with state %s for TID %s because its executor has already " +
--- End diff --

Yeah, thanks a lot.





[GitHub] spark pull request #12258: [SPARK-14485][CORE] ignore task finished for exec...

2016-06-06 Thread zhonghaihua
Github user zhonghaihua commented on a diff in the pull request:

https://github.com/apache/spark/pull/12258#discussion_r65997697
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala ---
@@ -343,17 +343,31 @@ private[spark] class TaskSchedulerImpl(
         }
         taskIdToTaskSetManager.get(tid) match {
           case Some(taskSet) =>
+            var executorId: String = null
             if (TaskState.isFinished(state)) {
               taskIdToTaskSetManager.remove(tid)
               taskIdToExecutorId.remove(tid).foreach { execId =>
+                executorId = execId
                 if (executorIdToTaskCount.contains(execId)) {
                   executorIdToTaskCount(execId) -= 1
                 }
               }
             }
             if (state == TaskState.FINISHED) {
+              // In some case, executor has already removed by driver for heartbeats timeout, but
+              // at sometime, before executor killed by cluster, the task of running on this
+              // executor is finished and return task success state to driver. However, this kinds
+              // of task should be ignored, because the task on this executor is already re-queued
+              // by driver. For more details, can check in SPARK-14485.
               taskSet.removeRunningTask(tid)
-              taskResultGetter.enqueueSuccessfulTask(taskSet, tid, serializedData)
+              if (executorId != null && !executorIdToTaskCount.contains(executorId)) {
+                logInfo(
+                  ("Ignoring update with state %s for TID %s because its executor has already " +
--- End diff --

I see, thanks for your comments. @vanzin 

Besides, following your point, I think this log
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala#L371
should also be changed to use interpolation, right?
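
To make the suggestion concrete, here is a minimal runnable sketch contrasting the two logging styles (the message tail is abbreviated from the quoted diff; `state` and `tid` are stand-in values):

```scala
object LogStyleDemo extends App {
  val (state, tid) = ("FINISHED", 12L)

  // Format-string style, as in the quoted diff:
  println(("Ignoring update with state %s for TID %s because its executor " +
    "has already been removed").format(state, tid))

  // String interpolation, as suggested in the review:
  println(s"Ignoring update with state $state for TID $tid because its executor " +
    "has already been removed")
}
```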





[GitHub] spark issue #12258: [SPARK-14485][CORE] ignore task finished for executor lo...

2016-06-06 Thread zhonghaihua
Github user zhonghaihua commented on the issue:

https://github.com/apache/spark/pull/12258
  
@vanzin @andrewor14 Could someone review it? Any thoughts or concerns about this patch? Thanks a lot.





[GitHub] spark issue #12258: [SPARK-14485][CORE] ignore task finished for executor lo...

2016-06-04 Thread zhonghaihua
Github user zhonghaihua commented on the issue:

https://github.com/apache/spark/pull/12258
  
Hi @vanzin, the code has been changed, could you review it, please?





[GitHub] spark pull request #12258: [SPARK-14485][CORE] ignore task finished for exec...

2016-06-03 Thread zhonghaihua
Github user zhonghaihua commented on a diff in the pull request:

https://github.com/apache/spark/pull/12258#discussion_r65797802
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala ---
@@ -343,17 +343,31 @@ private[spark] class TaskSchedulerImpl(
         }
         taskIdToTaskSetManager.get(tid) match {
           case Some(taskSet) =>
+            var executorId: String = null
             if (TaskState.isFinished(state)) {
               taskIdToTaskSetManager.remove(tid)
               taskIdToExecutorId.remove(tid).foreach { execId =>
+                executorId = execId
                 if (executorIdToTaskCount.contains(execId)) {
                   executorIdToTaskCount(execId) -= 1
                 }
               }
             }
             if (state == TaskState.FINISHED) {
-              taskSet.removeRunningTask(tid)
-              taskResultGetter.enqueueSuccessfulTask(taskSet, tid, serializedData)
+              // In some case, executor has already removed by driver for heartbeats timeout, but
+              // at sometime, before executor killed by cluster, the task of running on this
+              // executor is finished and return task success state to driver. However, this kinds
+              // of task should be ignored, because the task on this executor is already re-queued
+              // by driver. For more details, can check in SPARK-14485.
+              if (executorId.ne(null) && !executorIdToTaskCount.contains(executorId)) {
+                taskSet.removeRunningTask(tid)
+                logWarning(
--- End diff --

Yes, you are right. I will change it soon, thanks.





[GitHub] spark pull request #12258: [SPARK-14485][CORE] ignore task finished for exec...

2016-06-03 Thread zhonghaihua
Github user zhonghaihua commented on a diff in the pull request:

https://github.com/apache/spark/pull/12258#discussion_r6579
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala ---
@@ -343,17 +343,31 @@ private[spark] class TaskSchedulerImpl(
         }
         taskIdToTaskSetManager.get(tid) match {
           case Some(taskSet) =>
+            var executorId: String = null
             if (TaskState.isFinished(state)) {
               taskIdToTaskSetManager.remove(tid)
               taskIdToExecutorId.remove(tid).foreach { execId =>
+                executorId = execId
                 if (executorIdToTaskCount.contains(execId)) {
                   executorIdToTaskCount(execId) -= 1
                 }
               }
             }
             if (state == TaskState.FINISHED) {
-              taskSet.removeRunningTask(tid)
-              taskResultGetter.enqueueSuccessfulTask(taskSet, tid, serializedData)
+              // In some case, executor has already removed by driver for heartbeats timeout, but
+              // at sometime, before executor killed by cluster, the task of running on this
+              // executor is finished and return task success state to driver. However, this kinds
+              // of task should be ignored, because the task on this executor is already re-queued
+              // by driver. For more details, can check in SPARK-14485.
+              if (executorId.ne(null) && !executorIdToTaskCount.contains(executorId)) {
+                taskSet.removeRunningTask(tid)
--- End diff --

I see. I will fix it soon.





[GitHub] spark pull request #12258: [SPARK-14485][CORE] ignore task finished for exec...

2016-06-03 Thread zhonghaihua
Github user zhonghaihua commented on a diff in the pull request:

https://github.com/apache/spark/pull/12258#discussion_r65797773
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala ---
@@ -343,17 +343,31 @@ private[spark] class TaskSchedulerImpl(
         }
         taskIdToTaskSetManager.get(tid) match {
           case Some(taskSet) =>
+            var executorId: String = null
             if (TaskState.isFinished(state)) {
               taskIdToTaskSetManager.remove(tid)
               taskIdToExecutorId.remove(tid).foreach { execId =>
+                executorId = execId
                 if (executorIdToTaskCount.contains(execId)) {
                   executorIdToTaskCount(execId) -= 1
                 }
               }
             }
             if (state == TaskState.FINISHED) {
-              taskSet.removeRunningTask(tid)
-              taskResultGetter.enqueueSuccessfulTask(taskSet, tid, serializedData)
+              // In some case, executor has already removed by driver for heartbeats timeout, but
+              // at sometime, before executor killed by cluster, the task of running on this
+              // executor is finished and return task success state to driver. However, this kinds
+              // of task should be ignored, because the task on this executor is already re-queued
+              // by driver. For more details, can check in SPARK-14485.
+              if (executorId.ne(null) && !executorIdToTaskCount.contains(executorId)) {
--- End diff --

Thanks for your comments. I will fix it soon.





[GitHub] spark issue #12258: [SPARK-14485][CORE] ignore task finished for executor lo...

2016-06-03 Thread zhonghaihua
Github user zhonghaihua commented on the issue:

https://github.com/apache/spark/pull/12258
  
@srowen @JoshRosen @andrewor14 Could someone verify this PR? Thanks a lot.





[GitHub] spark pull request: [SPARK-14485][CORE] ignore task finished for e...

2016-04-12 Thread zhonghaihua
Github user zhonghaihua commented on the pull request:

https://github.com/apache/spark/pull/12258#issuecomment-208881577
  
@JoshRosen @andrewor14 Could you verify this PR? Thanks a lot.





[GitHub] spark pull request: [SPARK-14485][CORE] ignore task finished for e...

2016-04-08 Thread zhonghaihua
Github user zhonghaihua commented on the pull request:

https://github.com/apache/spark/pull/12258#issuecomment-207296818
  
cc @andrewor14 





[GitHub] spark pull request: Ignore task finish for executor lost and remov...

2016-04-08 Thread zhonghaihua
GitHub user zhonghaihua opened a pull request:

https://github.com/apache/spark/pull/12258

Ignore task finish for executor lost and removed by driver

## What changes were proposed in this pull request?

Now, when an executor is removed by the driver for heartbeat timeout, the driver will re-queue the tasks on this executor and send a kill command to the cluster to kill this executor.
But in one situation, a task running on this executor finishes and returns its result to the driver before the executor is killed by the driver's kill command. In this situation, the driver accepts the task-finished event and ignores the speculative and re-queued copies of the task.
However, as we know, this executor has been removed by the driver, so the result of this finished task cannot be saved on the driver, because its BlockManagerId has also been removed from BlockManagerMaster. So the result data of this stage is incomplete, which then causes fetch failures. For more details, see [SPARK-14485](https://issues.apache.org/jira/browse/SPARK-14485).
This PR introduces a mechanism to ignore this kind of task-finished event.
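
As an aside, a minimal self-contained sketch of the guard this PR adds (the map is a simplified stand-in for `TaskSchedulerImpl`'s `executorIdToTaskCount` bookkeeping):

```scala
import scala.collection.mutable

object IgnoreFinishedTaskDemo extends App {
  // Simplified stand-in for TaskSchedulerImpl.executorIdToTaskCount.
  val executorIdToTaskCount = mutable.HashMap[String, Int]("exec-1" -> 2)

  // An executor removed for heartbeat timeout no longer appears in the map.
  def shouldIgnoreUpdate(executorId: String): Boolean =
    executorId != null && !executorIdToTaskCount.contains(executorId)

  println(shouldIgnoreUpdate("exec-1")) // false: executor alive, enqueue the result
  println(shouldIgnoreUpdate("exec-2")) // true: executor removed, ignore the update
}
```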


## How was this patch tested?

None.




You can merge this pull request into a Git repository by running:

$ git pull https://github.com/zhonghaihua/spark 
ignoreTaskFinishForExecutorLostAndRemovedByDriver

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/12258.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #12258


commit d491ecd845e41e96aa94f36655860abb0dc70947
Author: zhonghaihua <793507...@qq.com>
Date:   2016-04-08T06:16:05Z

ignore task finished when its executor has already removed by driver

commit c85cf990e84419fa10e3e78e2a32d6e015e46435
Author: zhonghaihua <793507...@qq.com>
Date:   2016-04-08T07:40:07Z

add comment







[GitHub] spark pull request: [SPARK-12864][YARN] initialize executorIdCount...

2016-04-02 Thread zhonghaihua
Github user zhonghaihua commented on the pull request:

https://github.com/apache/spark/pull/10794#issuecomment-204715025
  
Hi @tgravescs, my JIRA id is `Iward`. Thanks a lot.





[GitHub] spark pull request: [SPARK-12864][YARN] initialize executorIdCount...

2016-04-01 Thread zhonghaihua
Github user zhonghaihua commented on the pull request:

https://github.com/apache/spark/pull/10794#issuecomment-204288682
  
@andrewor14 @tgravescs @vanzin The code and the comment have been improved, and the descriptions of this PR and the JIRA have also been updated. Please review it again. Thanks a lot.





[GitHub] spark pull request: [SPARK-12864][YARN] initialize executorIdCount...

2016-03-31 Thread zhonghaihua
Github user zhonghaihua commented on a diff in the pull request:

https://github.com/apache/spark/pull/10794#discussion_r58042774
  
--- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala ---
@@ -81,8 +82,20 @@ private[yarn] class YarnAllocator(
     new ConcurrentHashMap[ContainerId, java.lang.Boolean])
 
   @volatile private var numExecutorsRunning = 0
-  // Used to generate a unique ID per executor
-  private var executorIdCounter = 0
+
+  /**
+   * Used to generate a unique ID per executor
+   *
+   * Init `executorIdCounter`. when AM restart, `executorIdCounter` will reset to 0. Then
+   * the id of new executor will start from 1, this will conflict with the executor has
--- End diff --

@tgravescs Ok, I will do it soon. Thanks a lot.





[GitHub] spark pull request: [SPARK-12864][YARN] initialize executorIdCount...

2016-03-31 Thread zhonghaihua
Github user zhonghaihua commented on a diff in the pull request:

https://github.com/apache/spark/pull/10794#discussion_r58032934
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala ---
@@ -155,6 +158,9 @@ class CoarseGrainedSchedulerBackend(scheduler: TaskSchedulerImpl, val rpcEnv: Rp
       // in this block are read when requesting executors
       CoarseGrainedSchedulerBackend.this.synchronized {
         executorDataMap.put(executorId, data)
+        if (currentExecutorIdCounter < Integer.parseInt(executorId)) {
+          currentExecutorIdCounter = Integer.parseInt(executorId)
+        }
--- End diff --

@vanzin Thanks for your comments. I will optimize it.
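
For what it's worth, a sketch of the cleanup being discussed, assuming the point is simply to parse the executor ID once:

```scala
object ParseOnceDemo extends App {
  var currentExecutorIdCounter = 0
  val executorId = "42" // executor IDs arrive as strings at registration time

  // Parse once instead of calling Integer.parseInt(executorId) twice.
  val executorIdNum = executorId.toInt
  if (currentExecutorIdCounter < executorIdNum) {
    currentExecutorIdCounter = executorIdNum
  }
  println(currentExecutorIdCounter) // 42
}
```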





[GitHub] spark pull request: [SPARK-12864][YARN] initialize executorIdCount...

2016-03-31 Thread zhonghaihua
Github user zhonghaihua commented on a diff in the pull request:

https://github.com/apache/spark/pull/10794#discussion_r58009178
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedClusterMessage.scala ---
@@ -30,6 +30,8 @@ private[spark] object CoarseGrainedClusterMessages {
 
   case object RetrieveSparkProps extends CoarseGrainedClusterMessage
 
+  case object RetrieveMaxExecutorId extends CoarseGrainedClusterMessage
--- End diff --

@tgravescs OK, I see what you mean. Thanks a lot.
Is the name `RetrieveLastAllocatedExecutorId` OK? @vanzin What's your opinion?
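
For reference, a sketch of what the proposed rename would look like (the trait is included only to keep the snippet self-contained):

```scala
sealed trait CoarseGrainedClusterMessage extends Serializable

// Proposed name for the message the AM sends to the driver.
case object RetrieveLastAllocatedExecutorId extends CoarseGrainedClusterMessage
```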





[GitHub] spark pull request: [SPARK-12864][YARN] initialize executorIdCount...

2016-03-29 Thread zhonghaihua
Github user zhonghaihua commented on a diff in the pull request:

https://github.com/apache/spark/pull/10794#discussion_r57838823
  
--- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala ---
@@ -81,8 +82,20 @@ private[yarn] class YarnAllocator(
     new ConcurrentHashMap[ContainerId, java.lang.Boolean])
 
   @volatile private var numExecutorsRunning = 0
-  // Used to generate a unique ID per executor
-  private var executorIdCounter = 0
+
+  /**
+   * Used to generate a unique ID per executor
+   *
+   * Init `executorIdCounter`. when AM restart, `executorIdCounter` will reset to 0. Then
+   * the id of new executor will start from 1, this will conflict with the executor has
--- End diff --

I think we can clarify this in the `SPARK-12864` issue. @andrewor14 What's your opinion?





[GitHub] spark pull request: [SPARK-12864][YARN] initialize executorIdCount...

2016-03-29 Thread zhonghaihua
Github user zhonghaihua commented on a diff in the pull request:

https://github.com/apache/spark/pull/10794#discussion_r57838667
  
--- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala ---
@@ -81,8 +82,20 @@ private[yarn] class YarnAllocator(
     new ConcurrentHashMap[ContainerId, java.lang.Boolean])
 
   @volatile private var numExecutorsRunning = 0
-  // Used to generate a unique ID per executor
-  private var executorIdCounter = 0
+
+  /**
+   * Used to generate a unique ID per executor
+   *
+   * Init `executorIdCounter`. when AM restart, `executorIdCounter` will reset to 0. Then
+   * the id of new executor will start from 1, this will conflict with the executor has
+   * already created before. So, we should initialize the `executorIdCounter` by getting
+   * the max executorId from driver.
+   *
+   * @see SPARK-12864
+   */
+  private var executorIdCounter: Int = {
--- End diff --

@tgravescs Thanks for your comment. Yes, I think in this situation, when the AM comes up, if an executor has already registered, its information must be set in `executorDataMap`. And the executor registration process has a check to make sure we don't allow an executor to register with the same id.
You can see it at
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala#L139
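
A rough paraphrase of that duplicate-ID check, with simplified stand-ins (a sketch, not a verbatim quote of the linked code):

```scala
import scala.collection.mutable

object DuplicateIdCheckDemo extends App {
  case class RegisterExecutorFailed(message: String) // stand-in for the real reply message
  val executorDataMap = mutable.HashMap[String, String]("1" -> "host-a:1234")

  def register(executorId: String, data: String): Either[RegisterExecutorFailed, Unit] =
    if (executorDataMap.contains(executorId)) {
      Left(RegisterExecutorFailed(s"Duplicate executor ID: $executorId"))
    } else {
      executorDataMap.put(executorId, data)
      Right(())
    }

  println(register("1", "host-b:5678")) // Left(...): duplicate ID rejected
  println(register("2", "host-b:5678")) // Right(()): registration accepted
}
```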





[GitHub] spark pull request: [SPARK-12864][YARN] initialize executorIdCount...

2016-03-29 Thread zhonghaihua
Github user zhonghaihua commented on a diff in the pull request:

https://github.com/apache/spark/pull/10794#discussion_r57837944
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedClusterMessage.scala ---
@@ -30,6 +30,8 @@ private[spark] object CoarseGrainedClusterMessages {
 
   case object RetrieveSparkProps extends CoarseGrainedClusterMessage
 
+  case object RetrieveMaxExecutorId extends CoarseGrainedClusterMessage
--- End diff --
--- End diff --

@tgravescs Thanks for reviewing. This variable is the max executorId in the previous AM, and it is only queried when a new AM initializes. Our intention is to get the max executorId across all executors of the previous AM, so I think `maxExecutorId` is OK.
@andrewor14 What's your opinion?





[GitHub] spark pull request: [SPARK-12864][YARN] initialize executorIdCount...

2016-03-26 Thread zhonghaihua
Github user zhonghaihua commented on the pull request:

https://github.com/apache/spark/pull/10794#issuecomment-201880646
  
@andrewor14 @tgravescs @vanzin Could you verify this PR? Any thoughts or concerns about it? Thanks a lot.





[GitHub] spark pull request: [SPARK-12864][YARN] initialize executorIdCount...

2016-03-03 Thread zhonghaihua
Github user zhonghaihua commented on the pull request:

https://github.com/apache/spark/pull/10794#issuecomment-191830328
  
Hi @andrewor14, any thoughts or concerns about this patch?





[GitHub] spark pull request: [SPARK-12864][YARN] initialize executorIdCount...

2016-02-25 Thread zhonghaihua
Github user zhonghaihua commented on the pull request:

https://github.com/apache/spark/pull/10794#issuecomment-189095532
  
Hi @andrewor14, could you review this? Thanks a lot.





[GitHub] spark pull request: [SPARK-12864][YARN] initialize executorIdCount...

2016-02-25 Thread zhonghaihua
Github user zhonghaihua commented on a diff in the pull request:

https://github.com/apache/spark/pull/10794#discussion_r54094456
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala ---
@@ -78,6 +78,9 @@ class CoarseGrainedSchedulerBackend(scheduler: TaskSchedulerImpl, val rpcEnv: Rp
   // Executors that have been lost, but for which we don't yet know the real exit reason.
   protected val executorsPendingLossReason = new HashSet[String]
 
+  // The num of current max ExecutorId used to re-register appMaster
+  var currentExecutorIdCounter = 0
--- End diff --

Hi @jerryshao, you are right. I will fix it now. Thanks a lot.





[GitHub] spark pull request: [SPARK-12864][YARN] initialize executorIdCount...

2016-02-25 Thread zhonghaihua
Github user zhonghaihua commented on the pull request:

https://github.com/apache/spark/pull/10794#issuecomment-188730628
  
retest this please.





[GitHub] spark pull request: [SPARK-12864][YARN] initialize executorIdCount...

2016-02-25 Thread zhonghaihua
Github user zhonghaihua commented on a diff in the pull request:

https://github.com/apache/spark/pull/10794#discussion_r54063643
  
--- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala ---
@@ -81,8 +82,20 @@ private[yarn] class YarnAllocator(
     new ConcurrentHashMap[ContainerId, java.lang.Boolean])
 
   @volatile private var numExecutorsRunning = 0
-  // Used to generate a unique ID per executor
-  private var executorIdCounter = 0
+
+  /**
+   * Used to generate a unique ID per executor
+   *
+   * Init `executorIdCounter`. when AM restart, `executorIdCounter` will reset to 0. Then
+   * the id of new executor will start from 1, this will conflict with the executor has
+   * already created before. So, we should initialize the `executorIdCounter` by getting
+   * the max executorId from driver.
+   *
+   * @see SPARK-12864
+   */
+  private var executorIdCounter: Int = {
+    driverRef.askWithRetry[Int](RetrieveMaxExecutorId) + 1
--- End diff --

I see, thanks a lot.





[GitHub] spark pull request: [SPARK-12864][YARN] initialize executorIdCount...

2016-02-25 Thread zhonghaihua
Github user zhonghaihua commented on a diff in the pull request:

https://github.com/apache/spark/pull/10794#discussion_r54062244
  
--- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala ---
@@ -81,8 +82,20 @@ private[yarn] class YarnAllocator(
     new ConcurrentHashMap[ContainerId, java.lang.Boolean])
 
   @volatile private var numExecutorsRunning = 0
-  // Used to generate a unique ID per executor
-  private var executorIdCounter = 0
+
+  /**
+   * Used to generate a unique ID per executor
+   *
+   * Init `executorIdCounter`. when AM restart, `executorIdCounter` will reset to 0. Then
+   * the id of new executor will start from 1, this will conflict with the executor has
+   * already created before. So, we should initialize the `executorIdCounter` by getting
+   * the max executorId from driver.
+   *
+   * @see SPARK-12864
+   */
+  private var executorIdCounter: Int = {
+    driverRef.askWithRetry[Int](RetrieveMaxExecutorId) + 1
--- End diff --

Hi @jerryshao, I made a test just now; it works in yarn-cluster mode. From my understanding, `driverRef` is created by `ApplicationMaster`, so I think `driverEndpoint` has been started by this time. What's your opinion? Thanks a lot.





[GitHub] spark pull request: [SPARK-12864][YARN] initialize executorIdCount...

2016-02-25 Thread zhonghaihua
Github user zhonghaihua commented on a diff in the pull request:

https://github.com/apache/spark/pull/10794#discussion_r54060168
  
--- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala ---
@@ -81,8 +82,20 @@ private[yarn] class YarnAllocator(
     new ConcurrentHashMap[ContainerId, java.lang.Boolean])
 
   @volatile private var numExecutorsRunning = 0
-  // Used to generate a unique ID per executor
-  private var executorIdCounter = 0
+
+  /**
+   * Used to generate a unique ID per executor
+   *
+   * Init `executorIdCounter`. when AM restart, `executorIdCounter` will reset to 0. Then
+   * the id of new executor will start from 1, this will conflict with the executor has
+   * already created before. So, we should initialize the `executorIdCounter` by getting
+   * the max executorId from driver.
+   *
+   * @see SPARK-12864
+   */
+  private var executorIdCounter: Int = {
+    driverRef.askWithRetry[Int](RetrieveMaxExecutorId) + 1
--- End diff --

Yes, but just in yarn-client mode. Is it necessary to test in yarn-cluster mode? @jerryshao 





[GitHub] spark pull request: [SPARK-12864][YARN] initialize executorIdCount...

2016-02-24 Thread zhonghaihua
Github user zhonghaihua commented on a diff in the pull request:

https://github.com/apache/spark/pull/10794#discussion_r54058736
  
--- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala ---
@@ -81,8 +82,20 @@ private[yarn] class YarnAllocator(
     new ConcurrentHashMap[ContainerId, java.lang.Boolean])
 
   @volatile private var numExecutorsRunning = 0
-  // Used to generate a unique ID per executor
-  private var executorIdCounter = 0
+
+  /**
+   * Used to generate a unique ID per executor
+   *
+   * Init `executorIdCounter`. when AM restart, `executorIdCounter` will reset to 0. Then
+   * the id of new executor will start from 1, this will conflict with the executor has
+   * already created before. So, we should initialize the `executorIdCounter` by getting
+   * the max executorId from driver.
+   *
+   * @see SPARK-12864
+   */
+  private var executorIdCounter: Int = {
+    driverRef.askWithRetry[Int](RetrieveMaxExecutorId) + 1
--- End diff --

Hi @jerryshao, thanks for reviewing. Allocating a new executor won't execute that code; it only requests the executor id when the `YarnAllocator` is being created.





[GitHub] spark pull request: [SPARK-12864][YARN] initialize executorIdCount...

2016-02-24 Thread zhonghaihua
Github user zhonghaihua commented on a diff in the pull request:

https://github.com/apache/spark/pull/10794#discussion_r54058368
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala ---
@@ -78,6 +78,9 @@ class CoarseGrainedSchedulerBackend(scheduler: TaskSchedulerImpl, val rpcEnv: Rp
   // Executors that have been lost, but for which we don't yet know the real exit reason.
   protected val executorsPendingLossReason = new HashSet[String]
 
+  // The num of current max ExecutorId used to re-register appMaster
+  var currentExecutorIdCounter = 0
--- End diff --

Hi @jerryshao, thanks for your comments. The master branch is different from branch-1.5.x. In master, `CoarseGrainedSchedulerBackend` belongs to the `core` module and `YarnSchedulerBackend` belongs to the `yarn` module, while in branch-1.5.x they belong to the same package. So, from my understanding, `protected` is unsuitable here, right?





[GitHub] spark pull request: [SPARK-12864][YARN] initialize executorIdCount...

2016-02-24 Thread zhonghaihua
Github user zhonghaihua commented on a diff in the pull request:

https://github.com/apache/spark/pull/10794#discussion_r53919614
  
--- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala ---
@@ -169,6 +172,24 @@ private[yarn] class YarnAllocator(
   }
 
   /**
+   * Init `executorIdCounter`
+   */
+  def initExecutorIdCounter(): Unit = {
+    val port = sparkConf.getInt("spark.yarn.am.port", 0)
+    SparkHadoopUtil.get.runAsSparkUser { () =>
+      val init = RpcEnv.create(
+        "executorIdCounterInit",
+        Utils.localHostName,
+        port,
+        sparkConf,
+        new SecurityManager(sparkConf))
+      val driver = init.setupEndpointRefByURI(driverUrl)
--- End diff --

Hi @jerryshao , thanks for your comments. I see what you mean, I will fix 
it soon. Thanks a lot.





[GitHub] spark pull request: [SPARK-12864][YARN] initialize executorIdCount...

2016-02-21 Thread zhonghaihua
Github user zhonghaihua commented on the pull request:

https://github.com/apache/spark/pull/10794#issuecomment-186993839
  
Hi @andrewor14, the test failure seems to be caused by a `GitException`. Could you retest it? Thanks a lot.





[GitHub] spark pull request: [SPARK-12864][YARN] initialize executorIdCount...

2016-02-21 Thread zhonghaihua
Github user zhonghaihua commented on a diff in the pull request:

https://github.com/apache/spark/pull/10794#discussion_r53579600
  
--- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala ---
@@ -169,6 +172,24 @@ private[yarn] class YarnAllocator(
   }
 
   /**
+   * Init `executorIdCounter`
+   */
+  def initExecutorIdCounter(): Unit = {
+    val port = sparkConf.getInt("spark.yarn.am.port", 0)
+    SparkHadoopUtil.get.runAsSparkUser { () =>
+      val init = RpcEnv.create(
+        "executorIdCounterInit",
+        Utils.localHostName,
+        port,
+        sparkConf,
+        new SecurityManager(sparkConf))
+      val driver = init.setupEndpointRefByURI(driverUrl)
--- End diff --

Hi @andrewor14, `driverRef` doesn't work in this case. From my understanding, `driverRef`, whose endpoint name is `YarnScheduler`, sends messages to (or gets messages from) `YarnSchedulerEndpoint`, while we need to get the max executorId from `CoarseGrainedSchedulerBackend.DriverEndpoint`, whose endpoint name is `CoarseGrainedScheduler`.

So, I think we need a method to initialize `executorIdCounter`. And as you said, we should add a huge comment related to SPARK-12864 to explain why we need to do this in this method. What's your opinion?





[GitHub] spark pull request: [SPARK-12864][YARN] initialize executorIdCount...

2016-02-20 Thread zhonghaihua
Github user zhonghaihua commented on a diff in the pull request:

https://github.com/apache/spark/pull/10794#discussion_r53562669
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala ---
@@ -155,6 +158,9 @@ class CoarseGrainedSchedulerBackend(scheduler: TaskSchedulerImpl, val rpcEnv: Rp
       // in this block are read when requesting executors
      CoarseGrainedSchedulerBackend.this.synchronized {
         executorDataMap.put(executorId, data)
+        if (currentExecutorIdCounter < Integer.parseInt(executorId)) {
+          currentExecutorIdCounter = Integer.parseInt(executorId)
+        }
--- End diff --

Thanks for reviewing it. From my understanding, I don't think we can get the max executor ID from `executorDataMap`. When the AM fails, all the executors are disconnected and removed; at that point, as the code in `CoarseGrainedSchedulerBackend.removeExecutor` shows, the executor information in `executorDataMap` is removed as well.
So, I think the executor information in `executorDataMap` is not complete. What do you think?





[GitHub] spark pull request: [SPARK-12864][YARN] initialize executorIdCount...

2016-02-20 Thread zhonghaihua
Github user zhonghaihua commented on a diff in the pull request:

https://github.com/apache/spark/pull/10794#discussion_r53562543
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedClusterMessage.scala ---
@@ -30,6 +30,8 @@ private[spark] object CoarseGrainedClusterMessages {
 
   case object RetrieveSparkProps extends CoarseGrainedClusterMessage
 
+  case object RetrieveCurrentExecutorIdCounter extends CoarseGrainedClusterMessage
--- End diff --

@andrewor14 Yeah, you are right. I will fix it soon.





[GitHub] spark pull request: [SPARK-12864][YARN] initialize executorIdCount...

2016-02-18 Thread zhonghaihua
Github user zhonghaihua commented on the pull request:

https://github.com/apache/spark/pull/10794#issuecomment-186016863
  
Hi @andrewor14, I agree with @jerryshao; I think that is not related to it.





[GitHub] spark pull request: [SPARK-12864][YARN] initialize executorIdCount...

2016-02-18 Thread zhonghaihua
Github user zhonghaihua commented on the pull request:

https://github.com/apache/spark/pull/10794#issuecomment-185625228
  
@andrewor14 @marmbrus @rxin, any thoughts or concerns about this patch?





[GitHub] spark pull request: [SPARK-12864][YARN] initialize executorIdCount...

2016-02-04 Thread zhonghaihua
Github user zhonghaihua commented on the pull request:

https://github.com/apache/spark/pull/10794#issuecomment-179924619
  
@andrewor14 Thanks for reviewing it. Could this patch be merged to master?





[GitHub] spark pull request: [SPARK-12864][YARN] initialize executorIdCount...

2016-01-25 Thread zhonghaihua
Github user zhonghaihua commented on the pull request:

https://github.com/apache/spark/pull/10794#issuecomment-174830247
  
@marmbrus @liancheng @yhuai Could you verify this patch?





[GitHub] spark pull request: initialize executorIdCounter after Application...

2016-01-17 Thread zhonghaihua
GitHub user zhonghaihua opened a pull request:

https://github.com/apache/spark/pull/10794

initialize executorIdCounter after ApplicationMaster killed for max n…

Currently, when the number of executor failures reaches `maxNumExecutorFailures`, the `ApplicationMaster` is killed and another one re-registers. At that point a new instance of `YarnAllocator` is created.
But the value of the property `executorIdCounter` in `YarnAllocator` resets to `0`, so the IDs of new executors start from `1` again. They conflict with the executors created before, which causes FetchFailedException.
This PR introduces a mechanism to initialize `executorIdCounter` after the `ApplicationMaster` is killed.
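
A minimal self-contained sketch of the idea, with plain Scala stand-ins for the RPC message and the driver-side counter (the real change goes through Spark's `CoarseGrainedClusterMessages` and the driver endpoint):

```scala
object ExecutorIdCounterDemo extends App {
  // Stand-in for the message the restarted AM sends to the driver.
  case object RetrieveMaxExecutorId

  // Stand-in for the driver, which remembers the highest executor ID seen so far.
  object Driver {
    val currentExecutorIdCounter = 7 // executors 1..7 registered before the AM died
    def ask(msg: RetrieveMaxExecutorId.type): Int = currentExecutorIdCounter
  }

  // On restart, YarnAllocator seeds its counter from the driver instead of from 0.
  val executorIdCounter: Int = Driver.ask(RetrieveMaxExecutorId) + 1
  println(s"next executor id: $executorIdCounter") // 8: no collision with 1..7
}
```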

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/zhonghaihua/spark 
initExecutorIdCounterAfterAMKilled

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/10794.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #10794


commit 30048ac7ac9fc95edc1b936076415dea335848ef
Author: zhonghaihua <793507...@qq.com>
Date:   2016-01-17T12:46:44Z

initialize executorIdCounter after ApplicationMaster killed for max number 
of executor failures reached







[GitHub] spark pull request: [SPARK-12864][YARN] initialize executorIdCount...

2016-01-17 Thread zhonghaihua
Github user zhonghaihua commented on the pull request:

https://github.com/apache/spark/pull/10794#issuecomment-172323185
  
cc @rxin @marmbrus @chenghao-intel @jeanlyn could you give some advice?





[GitHub] spark pull request: [SPARK-12125][SQL] pull out nondeterministic e...

2016-01-05 Thread zhonghaihua
Github user zhonghaihua commented on the pull request:

https://github.com/apache/spark/pull/10128#issuecomment-169231327
  
@marmbrus Thanks for your suggestions. I think your idea can solve the problem simply, but in some situations it seems not very suitable.
For example:
`Join(testRelation, testRelation2, Inner, Some(And(EqualTo(a, b), EqualTo(Rand(33) * c, d))))`. If `c` is an attribute belonging to `testRelation2`, I think `Rand(33)` is more appropriately pulled out into the right child tree of the `Join`; otherwise, if it belongs to `testRelation`, it is more appropriately pulled out into the left child tree.

When nondeterministic expressions are used with table attributes, I think pulling them out should depend on the attribute. What do you think?





[GitHub] spark pull request: [SPARK-12125][SQL] pull out nondeterministic e...

2016-01-04 Thread zhonghaihua
Github user zhonghaihua commented on the pull request:

https://github.com/apache/spark/pull/10128#issuecomment-168892937
  
@rxin @yhuai Could you verify this patch?





[GitHub] spark pull request: [SPARK-12125][SQL] pull out nondeterministic e...

2015-12-04 Thread zhonghaihua
Github user zhonghaihua commented on the pull request:

https://github.com/apache/spark/pull/10128#issuecomment-161969421
  
@cloud-fan Thanks for your advice. But, as @jeanlyn said, `Repartition` deals with all the data, while this PR deals with join keys that cause data skew.
Because in some situations we use this operator to avoid data skew in SQL, I think maybe we should support this. What do you think?





[GitHub] spark pull request: pull out nondeterministic expressions from Joi...

2015-12-03 Thread zhonghaihua
GitHub user zhonghaihua opened a pull request:

https://github.com/apache/spark/pull/10128

pull out nondeterministic expressions from Join

Currently, `nondeterministic expressions` are only allowed in `Project` or `Filter`, and only nondeterministic expressions used in a `UnaryNode` can be pulled out.

But in many cases we use nondeterministic expressions to process `join keys` to avoid data skew. For example:
```
select *
from tableA a
join
(select * from tableB) b
on upper((case when (a.brand_code is null or a.brand_code = '' ) then cast( (-rand() * 1000 ) as string ) else a.brand_code end )) = b.brand_code
```

This PR introduces a mechanism to pull out nondeterministic expressions from `Join`, so we can use nondeterministic expressions in `Join` appropriately.
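
For illustration, here is roughly what the pulled-out plan is equivalent to at the DataFrame level: the nondeterministic salting expression is materialized in a projection on the side that owns the attribute before the join. This sketch reuses the example's column names and the modern `SparkSession` API; it is not the Catalyst rule itself:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{rand, upper, when}

object SaltedJoinDemo extends App {
  val spark = SparkSession.builder().master("local[*]").appName("salted-join").getOrCreate()
  import spark.implicits._

  val tableA = Seq(("b1", 1), (null, 2)).toDF("brand_code", "a_val")
  val tableB = Seq(("B1", 10)).toDF("brand_code", "b_val")

  // Materialize the nondeterministic join key in a Project below the Join,
  // which is what pulling the expression out of the join condition amounts to.
  val salted = tableA.withColumn(
    "join_key",
    upper(when($"brand_code".isNull || $"brand_code" === "",
      (rand() * -1000).cast("string")).otherwise($"brand_code")))

  salted.join(tableB, salted("join_key") === tableB("brand_code")).show()
  spark.stop()
}
```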

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/zhonghaihua/spark pulloutJoinNondeterministic

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/10128.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #10128


commit 6e166578a5c1a1faf260389509663ac8c71ec015
Author: zhonghaihua <793507...@qq.com>
Date:   2015-11-30T07:44:49Z

pull out nondeterministic expressions from Join








[GitHub] spark pull request: pull out nondeterministic expressions from Joi...

2015-12-03 Thread zhonghaihua
Github user zhonghaihua commented on the pull request:

https://github.com/apache/spark/pull/10128#issuecomment-161590938
  
 @rxin @cloud-fan @chenghao-intel @jeanlyn Could you give some suggestions 
on this PR?





[GitHub] spark pull request: Join nondeterministic

2015-12-02 Thread zhonghaihua
GitHub user zhonghaihua opened a pull request:

https://github.com/apache/spark/pull/10122

Join nondeterministic



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/zhonghaihua/spark join_nondeterministic

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/10122.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #10122


commit 6d8ebc801799714d297c83be6935b37e26dc2df7
Author: Xiangrui Meng <m...@databricks.com>
Date:   2015-08-26T05:35:49Z

[SPARK-10243] [MLLIB] update since versions in mllib.tree

Same as #8421 but for `mllib.tree`.

cc jkbradley

Author: Xiangrui Meng <m...@databricks.com>

Closes #8442 from mengxr/SPARK-10236.

(cherry picked from commit fb7e12fe2e14af8de4c206ca8096b2e8113bfddc)
Signed-off-by: Xiangrui Meng <m...@databricks.com>

commit 08d390f457f80ffdc2dfce61ea579d9026047f12
Author: Xiangrui Meng <m...@databricks.com>
Date:   2015-08-26T05:49:33Z

[SPARK-10235] [MLLIB] update since versions in mllib.regression

Same as #8421 but for `mllib.regression`.

cc freeman-lab dbtsai

Author: Xiangrui Meng <m...@databricks.com>

Closes #8426 from mengxr/SPARK-10235 and squashes the following commits:

6cd28e4 [Xiangrui Meng] update since versions in mllib.regression

(cherry picked from commit 4657fa1f37d41dd4c7240a960342b68c7c591f48)
Signed-off-by: DB Tsai <d...@netflix.com>

commit 21a10a86d20ec1a6fea42286b4d2aae9ce7e848d
Author: Xiangrui Meng <m...@databricks.com>
Date:   2015-08-26T06:45:41Z

[SPARK-10236] [MLLIB] update since versions in mllib.feature

Same as #8421 but for `mllib.feature`.

cc dbtsai

Author: Xiangrui Meng <m...@databricks.com>

Closes #8449 from mengxr/SPARK-10236.feature and squashes the following 
commits:

0e8d658 [Xiangrui Meng] remove unnecessary comment
ad70b03 [Xiangrui Meng] update since versions in mllib.feature

(cherry picked from commit 321d7759691bed9867b1f0470f12eab2faa50aff)
Signed-off-by: DB Tsai <d...@netflix.com>

commit 5220db9e352b5d5eae59cead9478ca0a9f73f16b
Author: felixcheung <felixcheun...@hotmail.com>
Date:   2015-08-26T06:48:16Z

[SPARK-9316] [SPARKR] Add support for filtering using `[` (synonym for 
filter / select)

Add support for
```
   df[df$name == "Smith", c(1,2)]
   df[df$age %in% c(19, 30), 1:2]
```

shivaram

Author: felixcheung <felixcheun...@hotmail.com>

Closes #8394 from felixcheung/rsubset.

(cherry picked from commit 75d4773aa50e24972c533e8b48697fde586429eb)
Signed-off-by: Shivaram Venkataraman <shiva...@cs.berkeley.edu>

commit b0dde36009ce371824ce3e47e60fa0711d7733bb
Author: Xiangrui Meng <m...@databricks.com>
Date:   2015-08-26T18:47:05Z

[SPARK-9665] [MLLIB] audit MLlib API annotations

I only found `ml.NaiveBayes` missing `Experimental` annotation. This PR 
doesn't cover Python APIs.

cc jkbradley

Author: Xiangrui Meng <m...@databricks.com>

Closes #8452 from mengxr/SPARK-9665.

(cherry picked from commit 6519fd06cc8175c9182ef16cf8a37d7f255eb846)
Signed-off-by: Joseph K. Bradley <jos...@databricks.com>

commit efbd7af44e855efcbb1fa224e80db24947e2b153
Author: Xiangrui Meng <m...@databricks.com>
Date:   2015-08-26T21:02:19Z

[SPARK-10241] [MLLIB] update since versions in mllib.recommendation

Same as #8421 but for `mllib.recommendation`.

cc srowen coderxiang

Author: Xiangrui Meng <m...@databricks.com>

Closes #8432 from mengxr/SPARK-10241.

(cherry picked from commit 086d4681df3ebfccfc04188262c10482f44553b0)
Signed-off-by: Xiangrui Meng <m...@databricks.com>

commit 0bdb800575ae2872e2655983a1be94dcf2e8c36b
Author: Davies Liu <dav...@databricks.com>
Date:   2015-08-26T23:04:44Z

[SPARK-10305] [SQL] fix create DataFrame from Python class

cc jkbradley

Author: Davies Liu <dav...@databricks.com>

Closes #8470 from davies/fix_create_df.

(cherry picked from commit d41d6c48207159490c1e1d9cc54015725cfa41b2)
Signed-off-by: Davies Liu <davies@gmail.com>

commit cef707d2185ca7e0c5635fabe709d5e26915b5bb
Author: Shivaram Venkataraman <shiva...@cs.berkeley.edu>
Date:   2015-08-27T01:13:07Z

[SPARK-10308] [SPARKR] Add %in% to the exported namespace

I also checked all the other functions defined in column.R, functions.R and 
DataFrame.R and everything else looked fine.

cc yu-iskw

Author: Shivaram Venkataraman <shiva...@cs.berkeley.edu>

Closes #8473 from 

[GitHub] spark pull request: Join nondeterministic

2015-12-02 Thread zhonghaihua
Github user zhonghaihua commented on the pull request:

https://github.com/apache/spark/pull/10122#issuecomment-161533958
  
I am so sorry for creating this pull request; it is not on the right branch. I will close it right now.
This was my mistake; sorry for any trouble caused.





[GitHub] spark pull request: Join nondeterministic

2015-12-02 Thread zhonghaihua
Github user zhonghaihua closed the pull request at:

https://github.com/apache/spark/pull/10122





[GitHub] spark pull request: [SPARK-11517][SQL]Calc partitions in parallel ...

2015-11-15 Thread zhonghaihua
Github user zhonghaihua commented on the pull request:

https://github.com/apache/spark/pull/9483#issuecomment-156810307
  
Hi @zhichao-li, thanks for doing this. I had a problem with partitions being scanned slowly, and I applied this patch to my Spark version. In my case:
* Before applying this patch, scanning the partitions took at least 3 or 4 minutes.
* After applying this patch, this stage takes only about 20 seconds.

I am happy to see it take effect in my case; it solved my problem. Also, would it be better to add a conf to control whether this feature is used? A sketch of what I mean is shown below.
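For illustration, a minimal sketch of such a switch; the configuration key and the `scanPartition` helper below are invented for this example and are not real Spark names:

```
import org.apache.spark.SparkConf

object ParallelScanDemo extends App {
  // Hypothetical flag; this key does not exist in Spark, it is invented
  // here only to illustrate the suggestion.
  val conf = new SparkConf()
  val parallelScan =
    conf.getBoolean("spark.sql.hive.scanPartitionsInParallel", defaultValue = true)

  // Stand-in for the real per-partition listing work.
  def scanPartition(path: String): Seq[String] = Seq(s"$path/part-00000")

  val partitionPaths = Seq("dt=2015-11-01", "dt=2015-11-02", "dt=2015-11-03")

  // Scan in parallel only when the flag is on; otherwise fall back to the
  // old sequential behavior.
  val files =
    if (parallelScan) partitionPaths.par.flatMap(scanPartition).seq
    else partitionPaths.flatMap(scanPartition)

  files.foreach(println)
}
```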




