[
https://issues.apache.org/jira/browse/AIRAVATA-2742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16430643#comment-16430643
]
Dimuthu Upeksha commented on AIRAVATA-2742:
---
Tested this locally with both SIGKILL and SIGTERM but couldn't
reproduce it. As a safety step, I'm updating the Helix core version from 0.6.7 to
0.8.0. However, I suggest closely inspecting participant restarts and the
consistency of workflow executions in future testing iterations. In particular,
observe the Helix Controller log.
https://github.com/apache/airavata/commit/01e0e70605ea9937304458651335166e52c51d60
> Helix Controller throws an Exception when the participant is killed
> ---
>
> Key: AIRAVATA-2742
> URL: https://issues.apache.org/jira/browse/AIRAVATA-2742
> Project: Airavata
> Issue Type: Bug
> Components: helix implementation
> Affects Versions: 0.18
> Reporter: Dimuthu Upeksha
> Assignee: Dimuthu Upeksha
> Priority: Major
>
> This was a sporadic issue and occurred only once in the test setup. There
> were 5 to 10 tasks running in the Participant when the Participant was killed
> externally by a SIGTERM command (kill . Once the Participant was started
> again, it did not pick up the tasks that it was running at the time it was
> killed. Surprisingly, the respective workflows still had the status
> IN_PROGRESS. The Helix Controller log showed the following error for each
> Workflow. This seems like a bug in Helix, and I posted the issue to the Helix
> mailing list (Subject: Sporadic issue when restarting a Participant).
>
> 2018-04-06 15:10:57,766 [Thread-3] ERROR o.a.h.c.s.BestPossibleStateCalcStage - Error computing assignment for resource Workflow_of_process_PROCESS_7f6c8a54-b50f-4bdb-aafd-59ce87276527-POST-b5e39e07-2d8e-4309-be5a-f5b6067f9a24_TASK_cc8039e5-f054-4dea-8c7f-07c98077b117. Skipping.
> java.lang.NullPointerException: Name is null
>     at java.lang.Enum.valueOf(Enum.java:236)
>     at org.apache.helix.task.TaskPartitionState.valueOf(TaskPartitionState.java:25)
>     at org.apache.helix.task.JobRebalancer.computeResourceMapping(JobRebalancer.java:272)
>     at org.apache.helix.task.JobRebalancer.computeBestPossiblePartitionState(JobRebalancer.java:140)
>     at org.apache.helix.controller.stages.BestPossibleStateCalcStage.compute(BestPossibleStateCalcStage.java:171)
>     at org.apache.helix.controller.stages.BestPossibleStateCalcStage.process(BestPossibleStateCalcStage.java:66)
>     at org.apache.helix.controller.pipeline.Pipeline.handle(Pipeline.java:48)
>     at org.apache.helix.controller.GenericHelixController.handleEvent(GenericHelixController.java:295)
>     at org.apache.helix.controller.GenericHelixController$ClusterEventProcessor.run(GenericHelixController.java:595)
> 2018-04-06 15:11:00,385 [Thread-3] ERROR o.a.h.c.s.BestPossibleStateCalcStage - Error computing assignment for resource Workflow_of_process_PROCESS_2b69b499-c527-4c9d-8b2b-db17366f5f81-POST-c67607ae-9177-4a02-af8a-8b3751eea4ff_TASK_1ea6876d-f2ec-4139-a15d-0e64a80a3025. Skipping.
>
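The top two frames of the trace above show that a null name reached Enum.valueOf, which is what produces "NullPointerException: Name is null". The trace does not show why the name was null. The sketch below only reproduces that failure mode in isolation and shows one possible null guard. DemoState is a hypothetical stand-in for TaskPartitionState with made-up constants, and NullStateDemo, describeFailure, and parse are illustrative names, not Helix code.

```java
import java.util.Optional;

public class NullStateDemo {
    // Hypothetical stand-in for org.apache.helix.task.TaskPartitionState;
    // the constants are for illustration only.
    public enum DemoState { INIT, RUNNING, COMPLETED }

    // Unguarded lookup: reports what the compiler-generated valueOf does
    // for a given name, including the null case seen in the trace.
    public static String describeFailure(String name) {
        try {
            DemoState.valueOf(name);
            return "ok";
        } catch (NullPointerException e) {
            // Enum.valueOf rejects a null name before looking anything up.
            return "NullPointerException: " + e.getMessage();
        } catch (IllegalArgumentException e) {
            // Non-null name that matches no constant.
            return "IllegalArgumentException";
        }
    }

    // Guarded lookup (an assumption about how a caller could defend itself):
    // a missing or unknown state becomes an empty Optional instead of an exception.
    public static Optional<DemoState> parse(String name) {
        if (name == null) {
            return Optional.empty();
        }
        try {
            return Optional.of(DemoState.valueOf(name));
        } catch (IllegalArgumentException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(describeFailure(null));
        System.out.println(parse(null));
        System.out.println(parse("RUNNING"));
    }
}
```

Whether a guard like this would be the right fix in JobRebalancer depends on why the stored state was missing after the restart, which this issue does not establish.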
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)