[jira] [Commented] (SPARK-8119) HeartbeatReceiver should not adjust application executor resources

2016-02-17 Thread Sean Owen (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-8119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15150128#comment-15150128 ]

Sean Owen commented on SPARK-8119:
--

I don't think there are any more 1.4.x releases to come. Still, if you open a 
clean back-port against the branch, I'll look at merging it.

> HeartbeatReceiver should not adjust application executor resources
> --
>
> Key: SPARK-8119
> URL: https://issues.apache.org/jira/browse/SPARK-8119
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.4.0
>Reporter: SaintBacchus
>Assignee: Andrew Or
>Priority: Critical
> Fix For: 1.5.0
>
>
> Dynamic allocation lowers the total executor count when it wants to kill 
> some executors.
> But Spark also adjusts this total in the non-dynamic-allocation scenario.
> This causes the following problem: when an executor dies, Spark never 
> brings up a replacement executor.
> === EDIT by andrewor14 ===
> The issue is that the AM forgets about the original number of executors it 
> wants after calling sc.killExecutor. Even if dynamic allocation is not 
> enabled, this is still possible because of heartbeat timeouts.
> I think the problem is that sc.killExecutor is used incorrectly in 
> HeartbeatReceiver. The intention of the method is to permanently adjust the 
> number of executors the application will get. In HeartbeatReceiver, however, 
> this is used as a best-effort mechanism to ensure that the timed out executor 
> is dead.
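The distinction described above can be illustrated with a toy model (this is not actual Spark code, and the method names are illustrative, not the real Spark API): a backend tracks a target executor count, a permanent-downscale kill lowers that target, and a best-effort kill leaves it alone so the cluster manager replaces the lost executor.

```python
# Toy model of the bug described above -- NOT actual Spark code.
# A backend tracks how many executors the application wants in total.
# A permanent kill (the sc.killExecutor semantics) lowers that target,
# which is right for dynamic allocation but wrong for HeartbeatReceiver,
# which only wants a best-effort kill of a timed-out executor.

class ToyBackend:
    def __init__(self, initial_executors):
        self.target = initial_executors            # executors the app wants
        self.alive = set(range(initial_executors))

    def kill_executor(self, exec_id):
        """Permanent downscale: removes the executor AND lowers the target."""
        self.alive.discard(exec_id)
        self.target -= 1

    def kill_and_replace_executor(self, exec_id):
        """Best-effort kill: the target is untouched, so reconciliation
        will request a replacement (the behavior HeartbeatReceiver needs)."""
        self.alive.discard(exec_id)

    def reconcile(self):
        """Cluster-manager loop: launch executors until the target is met."""
        next_id = max(self.alive, default=-1) + 1
        while len(self.alive) < self.target:
            self.alive.add(next_id)
            next_id += 1

# Buggy path: a heartbeat timeout handled with the permanent-downscale call.
buggy = ToyBackend(3)
buggy.kill_executor(0)               # executor 0 timed out
buggy.reconcile()                    # target dropped to 2 -> no replacement

# Fixed path: a best-effort kill keeps the target, so a replacement comes up.
fixed = ToyBackend(3)
fixed.kill_and_replace_executor(0)
fixed.reconcile()                    # target still 3 -> executor 3 launched
```

In the buggy path the application silently runs with one fewer executor forever; in the fixed path the lost executor is replaced on the next reconciliation pass, which is the behavior the patch restores for heartbeat timeouts.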



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-8119) HeartbeatReceiver should not adjust application executor resources

2016-02-16 Thread Zhen Peng (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-8119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15150055#comment-15150055 ]

Zhen Peng commented on SPARK-8119:
--

Hi [~srowen], I think this is really a serious bug. Is there any reason not to 
back-port it to 1.4.x?




[jira] [Commented] (SPARK-8119) HeartbeatReceiver should not adjust application executor resources

2015-09-10 Thread Dan Shechter (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-8119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14738995#comment-14738995 ]

Dan Shechter commented on SPARK-8119:
-

Why was the target version moved to 1.5.1?
Wasn't this already marked as fixed for 1.5.0?
Has it now been pushed back?




[jira] [Commented] (SPARK-8119) HeartbeatReceiver should not adjust application executor resources

2015-09-10 Thread Sean Owen (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-8119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14739030#comment-14739030 ]

Sean Owen commented on SPARK-8119:
--

No, it's marked as Fixed for 1.5.0, which remains true. I did a bulk change of 
Target=1.5.0 to Target=1.5.1 that changed this one too, but then I noticed 
that didn't make sense; it only remains to be integrated into 1.4.2, so I 
restored that.




[jira] [Commented] (SPARK-8119) HeartbeatReceiver should not adjust application executor resources

2015-08-12 Thread Sean Owen (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-8119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14693121#comment-14693121 ]

Sean Owen commented on SPARK-8119:
--

Yes, committed for 1.5.0. I don't know whether it will actually go back into 
1.4.x, since it depends on other changes that aren't in 1.4.x.




[jira] [Commented] (SPARK-8119) HeartbeatReceiver should not adjust application executor resources

2015-08-12 Thread Dan Shechter (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-8119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14693113#comment-14693113 ]

Dan Shechter commented on SPARK-8119:
-

Does this mean it's already fixed for the upcoming 1.5.0, and the only 
outstanding issue is the 1.4.2 back-port?




[jira] [Commented] (SPARK-8119) HeartbeatReceiver should not adjust application executor resources

2015-08-12 Thread Andrew Or (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-8119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14693738#comment-14693738 ]

Andrew Or commented on SPARK-8119:
--

It will be in 1.4.2. I just need to backport it.




[jira] [Commented] (SPARK-8119) HeartbeatReceiver should not adjust application executor resources

2015-08-02 Thread Sean Owen (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-8119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14650657#comment-14650657 ]

Sean Owen commented on SPARK-8119:
--

I attempted a back-port, but this depends on SPARK-7835 and possibly other 
prior changes, which I'm not so familiar with.




[jira] [Commented] (SPARK-8119) HeartbeatReceiver should not adjust application executor resources

2015-07-16 Thread Shay Rojansky (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-8119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14630799#comment-14630799 ]

Shay Rojansky commented on SPARK-8119:
--

Thanks Andrew!




[jira] [Commented] (SPARK-8119) HeartbeatReceiver should not adjust application executor resources

2015-07-16 Thread Shay Rojansky (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-8119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14630380#comment-14630380 ]

Shay Rojansky commented on SPARK-8119:
--

Will this really not be fixed before 1.5? This issue makes Spark 1.4 unusable 
in a YARN environment where preemption may happen.




[jira] [Commented] (SPARK-8119) HeartbeatReceiver should not adjust application executor resources

2015-07-16 Thread Andrew Or (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-8119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14630545#comment-14630545 ]

Andrew Or commented on SPARK-8119:
--

Hi [~roji], yes, we should fix it for 1.4.2 as well. Thanks for bringing that 
up.




[jira] [Commented] (SPARK-8119) HeartbeatReceiver should not adjust application executor resources

2015-06-29 Thread Apache Spark (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-8119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14606853#comment-14606853 ]

Apache Spark commented on SPARK-8119:
-

User 'andrewor14' has created a pull request for this issue:
https://github.com/apache/spark/pull/7107
