[jira] [Commented] (AIRAVATA-2742) Helix Controller throws an Exception when the participant is killed

2018-04-11 Thread Dimuthu Upeksha (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRAVATA-2742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16434115#comment-16434115
 ] 

Dimuthu Upeksha commented on AIRAVATA-2742:
---

The Helix team has identified this as a bug and will fix it in a future release.

https://issues.apache.org/jira/browse/HELIX-693

Helix Dev discussion - Subject: Sporadic issue when restarting a Participant

> Helix Controller throws an Exception when the participant is killed
> ---
>
> Key: AIRAVATA-2742
> URL: https://issues.apache.org/jira/browse/AIRAVATA-2742
> Project: Airavata
> Issue Type: Bug
> Components: helix implementation
> Affects Versions: 0.18
> Reporter: Dimuthu Upeksha
> Assignee: Dimuthu Upeksha
> Priority: Major
>
> This was a sporadic issue that occurred only once in the test setup. There 
> were 5 - 10 tasks running in the Participant when it was externally 
> killed with a SIGTERM (kill . Once the Participant was started 
> again, it did not pick up the tasks that it had been running at the time it 
> was killed. Surprisingly, the respective workflows still showed the 
> IN_PROGRESS status. The Helix Controller log showed the following error for 
> each workflow. This seems to be a bug in Helix, and I posted the issue to the 
> Helix mailing list (Subject: Sporadic issue when restarting a Participant). 
>  
> 2018-04-06 15:10:57,766 [Thread-3] ERROR o.a.h.c.s.BestPossibleStateCalcStage - Error computing assignment for resource Workflow_of_process_PROCESS_7f6c8a54-b50f-4bdb-aafd-59ce87276527-POST-b5e39e07-2d8e-4309-be5a-f5b6067f9a24_TASK_cc8039e5-f054-4dea-8c7f-07c98077b117. Skipping.
> java.lang.NullPointerException: Name is null
>         at java.lang.Enum.valueOf(Enum.java:236)
>         at org.apache.helix.task.TaskPartitionState.valueOf(TaskPartitionState.java:25)
>         at org.apache.helix.task.JobRebalancer.computeResourceMapping(JobRebalancer.java:272)
>         at org.apache.helix.task.JobRebalancer.computeBestPossiblePartitionState(JobRebalancer.java:140)
>         at org.apache.helix.controller.stages.BestPossibleStateCalcStage.compute(BestPossibleStateCalcStage.java:171)
>         at org.apache.helix.controller.stages.BestPossibleStateCalcStage.process(BestPossibleStateCalcStage.java:66)
>         at org.apache.helix.controller.pipeline.Pipeline.handle(Pipeline.java:48)
>         at org.apache.helix.controller.GenericHelixController.handleEvent(GenericHelixController.java:295)
>         at org.apache.helix.controller.GenericHelixController$ClusterEventProcessor.run(GenericHelixController.java:595)
> 2018-04-06 15:11:00,385 [Thread-3] ERROR o.a.h.c.s.BestPossibleStateCalcStage - Error computing assignment for resource Workflow_of_process_PROCESS_2b69b499-c527-4c9d-8b2b-db17366f5f81-POST-c67607ae-9177-4a02-af8a-8b3751eea4ff_TASK_1ea6876d-f2ec-4139-a15d-0e64a80a3025. Skipping.
>  
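
The top frames of the trace show that the exception comes from java.lang.Enum.valueOf being handed a null name. The sketch below reproduces only that failure mode in isolation. It does not use Helix: PartitionState is a hypothetical stand-in for TaskPartitionState, and describeFailure and parseState are illustrative helpers, not Helix APIs. How the state string came to be null in the controller is not established here; that is what HELIX-693 tracks.

```java
import java.util.Optional;

class NullStateDemo {
    // Hypothetical stand-in for Helix's TaskPartitionState; the values are illustrative.
    enum PartitionState { INIT, RUNNING, COMPLETED }

    // Same call shape as the failing frame: valueOf rejects a null name
    // with a NullPointerException whose message is "Name is null".
    static String describeFailure(String rawState) {
        try {
            return PartitionState.valueOf(rawState).name();
        } catch (NullPointerException e) {
            return "NPE: " + e.getMessage();
        }
    }

    // Defensive variant (an assumption, not the Helix fix): treat a missing
    // or unrecognized state string as "no state recorded" instead of throwing.
    static Optional<PartitionState> parseState(String rawState) {
        if (rawState == null) {
            return Optional.empty();
        }
        try {
            return Optional.of(PartitionState.valueOf(rawState));
        } catch (IllegalArgumentException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(describeFailure(null));          // NPE: Name is null
        System.out.println(parseState(null).isPresent());   // false
        System.out.println(parseState("RUNNING").get());    // RUNNING
    }
}
```

A caller using the defensive variant would still have to decide what an absent state means for the task (for example, rescheduling it), which is a design choice this sketch leaves open.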



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (AIRAVATA-2742) Helix Controller throws an Exception when the participant is killed

2018-04-09 Thread Dimuthu Upeksha (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRAVATA-2742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16430643#comment-16430643
 ] 

Dimuthu Upeksha commented on AIRAVATA-2742:
---

Tested this locally with both SIGKILL and SIGTERM but couldn't 
reproduce it. As a safety step, I'm updating the Helix core version from 0.6.7 
to 0.8.0. I would still suggest extensively inspecting participant restarts and 
the consistency of workflow executions in future testing iterations. In 
particular, observe the Helix Controller log.

https://github.com/apache/airavata/commit/01e0e70605ea9937304458651335166e52c51d60



