[
https://issues.apache.org/jira/browse/OOZIE-1533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13990407#comment-13990407
]
Virag Kothari commented on OOZIE-1533:
--------------------------------------
bq. The de-duping for queue is on action id for CoordActionInputCheckXCommand.
So, 10,000 actions will be in the queue even with coord job lock.
You are right. But with the coord job lock, only one command per job can execute
at a time. With action locks, the handler threads may stay busy working on
commands that all belong to a single job if that job has a lot of actions. This
would be unfair to other jobs with fewer actions.
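The fairness argument can be illustrated with a toy model of one dispatch round. This is a sketch under stated assumptions, not Oozie's CallableQueueService: the queue de-dups on action ID (as quoted above), a fixed pool of 4 handler threads picks commands in queue order, and under job-level locking a command whose job lock is already taken is simply left for a later round. `LockFairnessSketch`, `dedupQueue` and `dispatchRound` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical model of one dispatch round; not Oozie code. */
final class LockFairnessSketch {

    /** A queued command: the coord job and the action it belongs to. */
    record Command(String jobId, String actionId) {}

    /** Queue that de-dups on action ID: a later command with the same action ID is dropped. */
    static Map<String, Command> dedupQueue(List<Command> submitted) {
        Map<String, Command> queue = new LinkedHashMap<>();
        for (Command c : submitted) {
            queue.putIfAbsent(c.actionId(), c);
        }
        return queue;
    }

    /**
     * Fills up to `threads` handler slots in queue order. The lock key is the job ID
     * (job-level locking) or the action ID (action-level locking); a command whose
     * lock is already held in this round is skipped and stays queued.
     */
    static List<String> dispatchRound(Map<String, Command> queue, int threads, boolean jobLevelLock) {
        Set<String> heldLocks = new HashSet<>();
        List<String> running = new ArrayList<>();
        for (Command c : queue.values()) {
            if (running.size() == threads) {
                break;
            }
            String lockKey = jobLevelLock ? c.jobId() : c.actionId();
            if (heldLocks.add(lockKey)) { // add() returns false if the lock is already held
                running.add(c.actionId());
            }
        }
        return running;
    }

    public static void main(String[] args) {
        List<Command> submitted = new ArrayList<>();
        for (int i = 1; i <= 10; i++) {
            submitted.add(new Command("A", "A@" + i)); // job A has many actions
        }
        submitted.add(new Command("A", "A@1")); // duplicate, removed by de-duping
        submitted.add(new Command("B", "B@1"));
        submitted.add(new Command("C", "C@1"));

        Map<String, Command> queue = dedupQueue(submitted);
        System.out.println("queued:       " + queue.size());
        System.out.println("action locks: " + dispatchRound(queue, 4, false));
        System.out.println("job locks:    " + dispatchRound(queue, 4, true));
    }
}
```

In this model, action locks hand all four threads to job A (`[A@1, A@2, A@3, A@4]`), while job locks run one command each for A, B and C (`[A@1, B@1, C@1]`), at the cost of leaving a thread idle.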
The more acute problem is system correctness, as commands like kill and
suspend update both the job and its actions during their lifecycle. For example, if
a command such as kill acquires the lock on the coord job and terminates all actions
by moving them to KILLED, an InputCheckX holding a lock on the actionId can execute
simultaneously and may move the action to READY. The CoordReady command may
eventually fail because the job is in KILLED state, but the action is left
inadvertently in READY state.
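The kill/InputCheckX race can be replayed deterministically in a single thread. This is a minimal sketch under stated assumptions: `LockRegistry` is a hypothetical non-reentrant lock table, the keys `job-1` and `job-1@1` and the status strings are illustrative, and Oozie's real lock service and status transitions are not modelled. Kill takes the job lock, the input check reads the action as WAITING, kill marks everything KILLED, and the input check then acts on its stale read.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical replay of the kill vs. input-check interleaving; nothing here is Oozie API. */
final class KillRaceSketch {

    /** Non-reentrant lock table: a key can be held by only one command at a time. */
    static final class LockRegistry {
        private final Set<String> held = new HashSet<>();
        boolean tryLock(String key) { return held.add(key); }
        void unlock(String key) { held.remove(key); }
    }

    /** Replays one interleaving and returns "jobStatus/actionStatus". */
    static String replay(boolean inputCheckUsesJobLock) {
        Map<String, String> status = new HashMap<>();
        status.put("job-1", "RUNNING");
        status.put("job-1@1", "WAITING");
        LockRegistry locks = new LockRegistry();
        String checkKey = inputCheckUsesJobLock ? "job-1" : "job-1@1";

        locks.tryLock("job-1");                          // kill acquires the coord job lock
        boolean checkRunsNow = locks.tryLock(checkKey);  // input check tries to start concurrently
        boolean sawWaiting = checkRunsNow && status.get("job-1@1").equals("WAITING");

        status.put("job-1@1", "KILLED");                 // kill terminates every action...
        status.put("job-1", "KILLED");                   // ...and then the job
        locks.unlock("job-1");

        if (checkRunsNow) {
            // The input check acts on what it read before kill ran: a stale decision.
            if (sawWaiting) {
                status.put("job-1@1", "READY");
            }
            locks.unlock(checkKey);
        } else if (locks.tryLock(checkKey)) {
            // Serialized behind kill: it re-reads the status and finds nothing to do.
            if (status.get("job-1@1").equals("WAITING")) {
                status.put("job-1@1", "READY");
            }
            locks.unlock(checkKey);
        }
        return status.get("job-1") + "/" + status.get("job-1@1");
    }

    public static void main(String[] args) {
        System.out.println("input check locks action: " + replay(false)); // → KILLED/READY
        System.out.println("input check locks job:    " + replay(true));  // → KILLED/KILLED
    }
}
```

Because the two commands lock different keys, nothing stops them from overlapping; once the input check also takes the job lock, it is serialized behind kill and sees the KILLED action.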
Ideally, every command should execute within milliseconds, so even though it runs
under a job lock it should be very fast. Are you sure the slowness is due to
job locks? At Y!, we have seen DB issues cause more slowness, usually masking
the delay caused by job locks.
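One way to answer the question above is to time lock acquisition separately from command execution. This is a minimal sketch, assuming `java.util.concurrent.locks.ReentrantLock` as a stand-in for Oozie's lock service. `LockTimingSketch`, `runTimed`, and the 50 ms / 20 ms delays (the 20 ms playing the role of a slow DB call) are made up for illustration.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical wrapper splitting latency into lock wait vs. execution; not an Oozie class. */
final class LockTimingSketch {

    /** Nanoseconds spent waiting for the lock, and running the body while holding it. */
    record Timing(long lockWaitNanos, long execNanos) {}

    static Timing runTimed(ReentrantLock lock, Runnable body) {
        long start = System.nanoTime();
        lock.lock();
        try {
            long acquired = System.nanoTime();
            body.run(); // e.g. the command's DB reads and writes
            long done = System.nanoTime();
            return new Timing(acquired - start, done - acquired);
        } finally {
            lock.unlock();
        }
    }

    static void sleepMillis(long ms) {
        try {
            Thread.sleep(ms);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        ReentrantLock jobLock = new ReentrantLock(); // stands in for one coord job's lock
        Thread holder = new Thread(() -> runTimed(jobLock, () -> sleepMillis(50)));
        holder.start();
        sleepMillis(5); // give the holder a head start (timing-dependent)
        Timing t = runTimed(jobLock, () -> sleepMillis(20));
        holder.join();
        System.out.printf("lock wait: %d ms, exec: %d ms%n",
                TimeUnit.NANOSECONDS.toMillis(t.lockWaitNanos()),
                TimeUnit.NANOSECONDS.toMillis(t.execNanos()));
    }
}
```

If logged per command, a consistently large execution share with small lock waits would point at the DB rather than at job-level locking.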
> Coordinator action materialization is too slow due to coarse job level locks
> ----------------------------------------------------------------------------
>
> Key: OOZIE-1533
> URL: https://issues.apache.org/jira/browse/OOZIE-1533
> Project: Oozie
> Issue Type: Improvement
> Reporter: Srikanth Sundarrajan
> Assignee: Srikanth Sundarrajan
> Labels: locking
> Attachments: OOZIE-1533.patch
>
>
> Coord job level lock introduces high contention. Instead, introduce coord
> action level locking whenever appropriate.
--
This message was sent by Atlassian JIRA
(v6.2#6252)