[
https://issues.apache.org/jira/browse/OOZIE-1533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13932353#comment-13932353
]
Virag Kothari commented on OOZIE-1533:
--------------------------------------
[~sriksun],
In addition to some of Rohini's comments:
Coord job locks also give fairness across coord jobs. For example, suppose one
job has 10,000 actions, another job has only 1 action, and the command queue
size is 10,000. With a coord job lock, only one command of any coordinator job
can reside in the queue. With a coord action lock, each of the 10,000 actions
of the first job might have its own command in the command queue, which
prevents the job with a single action from getting into the queue. This issue
becomes very prominent for CoordActionInputCheckX, as that command occupies the
majority of the queue. But I understand that ideally locks should only be for
correctness and should not be used to provide fairness to jobs or to reduce
pressure on external entities like the NN/DB. As you suggest, we should look at
other options for improving throughput and scalability while providing fairness
and keeping the system stable. Btw, a similar problem also exists for
workflows, as all wf action related commands acquire the wf job lock before
executing.
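The queue arithmetic above can be sketched as a toy model. This is not Oozie code: `BoundedUniqueQueue`, `smallJobAdmitted`, and the key strings are hypothetical names, and the model assumes only that the queue is bounded and holds at most one command per lock key.

```java
import java.util.LinkedHashSet;
import java.util.Set;

public class QueueFairnessSketch {

    /** Hypothetical bounded queue that admits at most one command per key. */
    static final class BoundedUniqueQueue {
        private final int capacity;
        private final Set<String> keys = new LinkedHashSet<>();

        BoundedUniqueQueue(int capacity) {
            this.capacity = capacity;
        }

        /** Rejects the command if the queue is full or the key is already queued. */
        boolean offer(String key) {
            if (keys.size() >= capacity || keys.contains(key)) {
                return false;
            }
            keys.add(key);
            return true;
        }
    }

    /**
     * Job A queues one command per action, then job B (a single action) tries
     * to queue its command. Returns true if job B gets a slot.
     */
    public static boolean smallJobAdmitted(boolean actionLevelKeys, int capacity, int bigJobActions) {
        BoundedUniqueQueue queue = new BoundedUniqueQueue(capacity);
        for (int i = 1; i <= bigJobActions; i++) {
            // Job-level keys collapse all of job A's commands onto one key.
            queue.offer(actionLevelKeys ? "jobA@" + i : "jobA");
        }
        return queue.offer(actionLevelKeys ? "jobB@1" : "jobB");
    }

    public static void main(String[] args) {
        System.out.println("job-level keys:    " + smallJobAdmitted(false, 10_000, 10_000));    // true
        System.out.println("action-level keys: " + smallJobAdmitted(true, 10_000, 10_000));     // false
    }
}
```

With job-level keys, job A occupies one slot and job B is admitted. With action-level keys, job A fills all 10,000 slots and job B is rejected.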
bq. Unless all coord actions are done, status transit service shouldn't be
updating the coord job. Correct?
The status transit service looks at child jobs and updates the parent based on
them. For example, a user issues a suspend on a coordinator and its status
changes to SUSPENDED, but for some reason one of the underlying workflows gets
killed. The status transit service would then move the status to
SUSPENDEDWITHERROR.
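As a rough illustration of that roll-up, here is a minimal sketch. It is not the actual status transit service: the `Status` enum is reduced to a few values, `rollUp` is a hypothetical helper, and only the one transition described above is modelled.

```java
import java.util.Arrays;
import java.util.List;

public class StatusTransitSketch {

    /** Reduced, illustrative set of statuses; the real set is larger. */
    enum Status { RUNNING, SUSPENDED, SUSPENDEDWITHERROR, SUCCEEDED, KILLED }

    /**
     * Derives the parent status from its children. Only the case from the
     * comment is handled: a SUSPENDED parent with a KILLED child becomes
     * SUSPENDEDWITHERROR. Every other case leaves the parent unchanged.
     */
    public static Status rollUp(Status parent, List<Status> children) {
        boolean anyChildKilled = children.contains(Status.KILLED);
        if (parent == Status.SUSPENDED && anyChildKilled) {
            return Status.SUSPENDEDWITHERROR;
        }
        return parent;
    }

    public static void main(String[] args) {
        List<Status> children = Arrays.asList(Status.SUSPENDED, Status.KILLED);
        System.out.println(rollUp(Status.SUSPENDED, children)); // SUSPENDEDWITHERROR
    }
}
```

The point of the sketch is that the parent can change while some children are still not done, which is why the service does not wait for all coord actions to finish.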
> Coordinator action materialization is too slow due to coarse job level locks
> ----------------------------------------------------------------------------
>
> Key: OOZIE-1533
> URL: https://issues.apache.org/jira/browse/OOZIE-1533
> Project: Oozie
> Issue Type: Improvement
> Reporter: Srikanth Sundarrajan
> Assignee: Srikanth Sundarrajan
> Labels: locking
> Attachments: OOZIE-1533.patch
>
>
> Coord job level lock introduces high contention. Instead introduce coord
> action level locking whenever appropriate