[
https://issues.apache.org/jira/browse/OOZIE-3717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
chenhaodan updated OOZIE-3717:
------------------------------
Description:
Fork actions are submitted in parallel, so a ForkedActionStartXCommand is queued
for each of them. Meanwhile, RecoveryService checks for pending actions and may
queue an ActionStartXCommand for the same action. If a ForkedActionStartXCommand
is enqueued while an ActionStartXCommand for the same action is already in the
queue, the ForkedActionStartXCommand is lost. The thread that submits the
actions in parallel blocks in CallableQueueService.blockingWait(), waiting for
the ForkedActionStartXCommand to finish; because that command was lost, the
wait never returns and the result is a deadlock.
{code:java}
Thread 1                          Thread 2
(ForkedActionStartXCommand)       (ActionStartXCommand)
+----------------------------+    +---------+
| removeFromUniqueCallables  |    |  .....  |
+----------------------------+    +---------+
|           ......           |    |  queue  |
+----------------------------+    +---------+
|           queue            |    enqueue succeeded, in uniqueCallables
+----------------------------+
| wrapper.filterDuplicates() |
+----------------------------+
Because ForkedActionStartXCommand and ActionStartXCommand have the same name,
ForkedActionStartXCommand is lost, and the submitting thread blocks at
CallableQueueService.blockingWait(). {code}
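The interleaving above can be replayed in a small, self-contained model. This is only a sketch under stated assumptions: the class DuplicateKeyLossSketch, its methods, and the key strings are hypothetical and are not Oozie's CallableQueueService. The set of keys stands in for uniqueCallables, and the timed wait stands in for the blocked submitting thread (the timeout exists only so the demo terminates).

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical, simplified model of a queue that drops duplicate keys.
// It is NOT Oozie's CallableQueueService; all names here are illustrative.
class DuplicateKeyLossSketch {
    private final Set<String> uniqueKeys = new HashSet<>();

    // Accepts a command unless its key is already registered.
    synchronized boolean queue(String key) {
        return uniqueKeys.add(key); // false means "filtered as a duplicate"
    }

    synchronized void removeFromUniqueKeys(String key) {
        uniqueKeys.remove(key);
    }

    // Replays the interleaving from the diagram on one thread so it is deterministic.
    // Returns true if the waiter is released before the timeout.
    static boolean forkedCommandCompletes(String forkedKey, String recoveryKey, long timeoutMillis) {
        DuplicateKeyLossSketch q = new DuplicateKeyLossSketch();
        CountDownLatch forkedDone = new CountDownLatch(1);

        q.queue(forkedKey);                    // Thread 1: forked command is queued
        q.removeFromUniqueKeys(forkedKey);     // Thread 1: its key is removed
        q.queue(recoveryKey);                  // Thread 2: recovery command is queued
        boolean accepted = q.queue(forkedKey); // Thread 1: queues again, may be filtered

        if (accepted) {
            forkedDone.countDown(); // only an accepted command ever runs and signals completion
        }
        try {
            // Stand-in for the submitting thread's wait.
            return forkedDone.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("same key completes: "
                + forkedCommandCompletes("action-1", "action-1", 200));
        System.out.println("distinct keys complete: "
                + forkedCommandCompletes("forked:action-1", "action-1", 200));
    }
}
```

With the same key for both commands, the second queue call from Thread 1 is filtered and the waiter times out; with distinct keys in this toy model, it is released. Whether distinct keys are the appropriate fix for Oozie is not something this report states.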
> Fork actions submitted in parallel: because ForkedActionStartXCommand and
> ActionStartXCommand have the same name, ForkedActionStartXCommand is lost,
> causing a deadlock
> -------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: OOZIE-3717
> URL: https://issues.apache.org/jira/browse/OOZIE-3717
> Project: Oozie
> Issue Type: Bug
> Components: action
> Affects Versions: 5.2.1
> Reporter: chenhaodan
> Assignee: chenhaodan
> Priority: Major
> Fix For: trunk
>
>
> Fork actions are submitted in parallel, so a ForkedActionStartXCommand is
> queued for each of them. Meanwhile, RecoveryService checks for pending actions
> and may queue an ActionStartXCommand for the same action. If a
> ForkedActionStartXCommand is enqueued while an ActionStartXCommand for the
> same action is already in the queue, the ForkedActionStartXCommand is lost.
> The thread that submits the actions in parallel blocks in
> CallableQueueService.blockingWait(), waiting for the
> ForkedActionStartXCommand to finish; because that command was lost, the wait
> never returns and the result is a deadlock.
> {code:java}
> Thread 1                          Thread 2
> (ForkedActionStartXCommand)       (ActionStartXCommand)
> +----------------------------+    +---------+
> | removeFromUniqueCallables  |    |  .....  |
> +----------------------------+    +---------+
> |           ......           |    |  queue  |
> +----------------------------+    +---------+
> |           queue            |    enqueue succeeded, in uniqueCallables
> +----------------------------+
> | wrapper.filterDuplicates() |
> +----------------------------+
> Because ForkedActionStartXCommand and ActionStartXCommand have the same name,
> ForkedActionStartXCommand is lost, and the submitting thread blocks at
> CallableQueueService.blockingWait(). {code}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)