Hunter L created HELIX-778:
------------------------------

             Summary: TASK: Fix a race condition in updatePreviousAssignedTasksStatus
                 Key: HELIX-778
                 URL: https://issues.apache.org/jira/browse/HELIX-778
             Project: Apache Helix
          Issue Type: Improvement
            Reporter: Hunter L
            Assignee: Hunter L


TestUnregisteredCommand was observed to be flaky. The cause was identified as 
a race condition: when a task fails, a pending message for that task (from 
INIT to RUNNING) is sometimes not cleaned up in time. AbstractTaskDispatcher's 
updatePreviousAssignedTasksStatus then tries to process that message and skips 
the status update for that task, so its status and NUM_ATTEMPTS field in 
JobContext are not updated.

A short-term fix is to call markPartitionError() before checking for the 
pending message. In the long run, the design of the task status update should 
be revisited to avoid this type of race condition.

Changelist:
1. Move markPartitionError() up before checking for a pending message on the 
task
2. Fix TestUnregisteredCommand's instability
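
The reordering in item 1 can be illustrated with a minimal sketch. This is 
not the Helix code: TaskStatusSketch, JobContextSketch, updateBeforeFix and 
updateAfterFix are hypothetical names, the markPartitionError() signature is 
simplified, and the sketch assumes that marking the error is what updates 
both the state and the attempt count.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of the ordering problem; these are NOT the
// real Helix classes or signatures.
class TaskStatusSketch {
    enum State { INIT, RUNNING, TASK_ERROR }

    // Stand-in for JobContext: per-partition state and attempt count.
    static class JobContextSketch {
        final Map<Integer, State> states = new HashMap<>();
        final Map<Integer, Integer> numAttempts = new HashMap<>();

        // Assumption: marking an error sets the state and bumps the attempts.
        void markPartitionError(int partition) {
            states.put(partition, State.TASK_ERROR);
            numAttempts.merge(partition, 1, Integer::sum);
        }
    }

    // Old order: a stale pending message triggers an early return, so the
    // failure is not recorded in this pass.
    static void updateBeforeFix(JobContextSketch ctx, int partition,
                                boolean taskFailed, boolean hasPendingMessage) {
        if (hasPendingMessage) {
            return;
        }
        if (taskFailed) {
            ctx.markPartitionError(partition);
        }
    }

    // New order: record the failure first, then look at the pending message.
    static void updateAfterFix(JobContextSketch ctx, int partition,
                               boolean taskFailed, boolean hasPendingMessage) {
        if (taskFailed) {
            ctx.markPartitionError(partition);
        }
        if (hasPendingMessage) {
            return;
        }
        // Remaining per-task processing would go here.
    }

    public static void main(String[] args) {
        JobContextSketch before = new JobContextSketch();
        updateBeforeFix(before, 0, true, true);
        System.out.println("before fix: state=" + before.states.get(0));

        JobContextSketch after = new JobContextSketch();
        updateAfterFix(after, 0, true, true);
        System.out.println("after fix: state=" + after.states.get(0)
                + ", attempts=" + after.numAttempts.get(0));
    }
}
```

In this model the failed task with a stale pending message is left unrecorded 
by the old order and is marked as an error by the new order. The sketch only 
shows the ordering change; it does not address the underlying race.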



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
