[jira] [Commented] (HELIX-654) Rebalance running task

2017-05-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HELIX-654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16014966#comment-16014966
 ] 

ASF GitHub Bot commented on HELIX-654:
--

Github user jiajunwang commented on a diff in the pull request:

https://github.com/apache/helix/pull/88#discussion_r117135837
  
--- Diff: helix-core/src/main/java/org/apache/helix/task/JobRebalancer.java ---
@@ -420,6 +411,14 @@ private ResourceAssignment computeResourceMapping(String jobResource,
         workflowConfig, workflowCtx, allPartitions, cache.getIdealStates());
     for (Map.Entry<String, SortedSet<Integer>> entry : taskAssignments.entrySet()) {
       String instance = entry.getKey();
+
+      if (!isGenericTaskJob(jobCfg) || jobCfg.isRebalanceRunningTask()) {
--- End diff --

Why is this logic in the for loop? Do we need to execute it for each entry?
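
A minimal sketch of the hoisting this question seems to point at, assuming the check depends only on jobCfg and not on the loop entry (identifiers are taken from the quoted diff; the surrounding loop body is elided):

    // Evaluate the job-level condition once, before iterating over instances,
    // since it does not depend on the entry being processed.
    boolean mayRebalanceRunning = !isGenericTaskJob(jobCfg) || jobCfg.isRebalanceRunningTask();

    for (Map.Entry<String, SortedSet<Integer>> entry : taskAssignments.entrySet()) {
      String instance = entry.getKey();
      if (mayRebalanceRunning) {
        // ... handle tasks on this instance that may be dropped and reassigned
      }
    }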


> Rebalance running task
> --
>
> Key: HELIX-654
> URL: https://issues.apache.org/jira/browse/HELIX-654
> Project: Apache Helix
>  Issue Type: New Feature
>  Components: helix-core
>Reporter: Weihan Kong
>
> h3. Feature summary
> Helix Task Framework empowers users to run tasks on instances managed by 
> Helix. There are two types of tasks: generic tasks and fixed-target tasks. A 
> fixed-target task always follows its target partition and is rebalanced 
> whenever that partition is rebalanced. For generic tasks, Helix lets the user 
> choose whether or not running tasks are rebalanced when the cluster topology 
> changes.
> For most users it is better to leave this feature disabled (the default), 
> since there is no need to re-run a task every time a new node is added. For 
> users with long-running tasks, enabling it can be very useful: when a new 
> node is added, the task load is better balanced across the cluster.
> h3. Defined system behavior
> h4. When a node fails,
> h6. Feature disabled:
> * Running tasks on the failed node will be rebalanced to a live node, since 
> those tasks no longer exist and failed along with the node.
> h6. Feature enabled:
> * Same.
> h4. When a new node is added,
> h6. Feature disabled:
> * Running tasks will continue to run on the current instance.
> * If a running task fails after a while, it might be rebalanced and run on 
> other instances, according to the new rebalance assignment under the new 
> cluster topology.
> h6. Feature enabled:
> * Running tasks might be cancelled and rebalanced immediately, according to 
> the new rebalance assignment under the new cluster topology.
> h3. Configuration
> A job-level config field (RebalanceRunningTask) in JobConfig enables or 
> disables this feature. By default it is false.
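
A minimal sketch of how a job might opt in to this behavior, assuming JobConfig.Builder exposes a setter named setRebalanceRunningTask for the RebalanceRunningTask field (the command name and task count below are placeholders):

    import org.apache.helix.task.JobConfig;

    // Build a generic (non-targeted) job that allows its running tasks to be
    // cancelled and rebalanced when the cluster topology changes.
    JobConfig.Builder jobBuilder = new JobConfig.Builder()
        .setCommand("MyTaskCommand")
        .setNumberOfTasks(4)
        .setRebalanceRunningTask(true);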



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HELIX-654) Rebalance running task

2017-05-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HELIX-654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16014967#comment-16014967
 ] 

ASF GitHub Bot commented on HELIX-654:
--

Github user jiajunwang commented on a diff in the pull request:

https://github.com/apache/helix/pull/88#discussion_r117137720
  
--- Diff: helix-core/src/main/java/org/apache/helix/task/JobRebalancer.java ---
@@ -455,6 +454,44 @@ private ResourceAssignment computeResourceMapping(String jobResource,
     return ra;
   }
 
+  /**
+   * If the assignment differs from the previous assignment, drop the old running task if it is
+   * no longer assigned to the same instance, but do not remove it from excludeSet, because the
+   * same task should not be assigned to the new instance right away.
+   */
+  private void dropRebalancedRunningTasks(Map<String, SortedSet<Integer>> newAssignment,
+      Map<String, SortedSet<Integer>> oldAssignment, Map<Integer, PartitionAssignment> paMap,
+      JobContext jobContext) {
+    for (String instance : oldAssignment.keySet()) {
+      for (Integer pId : oldAssignment.get(instance)) {
+        if (jobContext.getPartitionState(pId) == TaskPartitionState.RUNNING
+            && !newAssignment.get(instance).contains(pId)) {
+          paMap.put(pId, new PartitionAssignment(instance, TaskPartitionState.DROPPED.name()));
+          jobContext.setPartitionState(pId, TaskPartitionState.DROPPED);
--- End diff --

Do we need to set DROPPED here?
The new status will be updated by updateJobContextAndGetTaskCurrentState() in 
the next round, right?

One problem with setting DROPPED here is that if the participant cannot cancel 
the job in a short time, its status will still be RUNNING. The controller then 
sets it to DROPPED in the first round, and in the second round it is changed 
back to RUNNING. Although the state will eventually be correct, it is confusing 
during that period.
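
A minimal sketch of the alternative this comment appears to suggest, assuming the controller can rely on updateJobContextAndGetTaskCurrentState() to refresh the context from the participant's reported state in the next round (identifiers are taken from the quoted diff):

    // Record the drop intent in the partition assignment map only; leave the job
    // context untouched so the next round reflects the participant's actual state.
    paMap.put(pId, new PartitionAssignment(instance, TaskPartitionState.DROPPED.name()));
    // jobContext.setPartitionState(pId, TaskPartitionState.DROPPED);  // omitted per this comment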






[jira] [Created] (HELIX-657) Fix TestRebalancerPersistAssignments

2017-05-17 Thread Jiajun Wang (JIRA)
Jiajun Wang created HELIX-657:
-

 Summary: Fix TestRebalancerPersistAssignments
 Key: HELIX-657
 URL: https://issues.apache.org/jira/browse/HELIX-657
 Project: Apache Helix
  Issue Type: Bug
  Components: helix-core
Reporter: Jiajun Wang


Fix the unstable test case TestRebalancerPersistAssignments.


