[ 
https://issues.apache.org/jira/browse/OOZIE-1533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13932353#comment-13932353
 ] 

Virag Kothari commented on OOZIE-1533:
--------------------------------------

[~sriksun],
In addition to the comments by Rohini:
Having coord job locks also gives fairness across coord jobs. For example, 
suppose one job has 10,000 actions, another job has only 1 action, and the 
command queue size is 10,000. With a coord job lock, only one command of any 
coordinator job can reside in the queue. With a coord action lock, each of the 
10,000 actions of the large job may have its own command in the command queue, 
preventing the job with a single action from getting into the queue. This 
issue becomes very prominent for CoordActionInputCheckX, as that command 
occupies the majority of the queue. But I understand that, ideally, locks 
should exist only for correctness and should not be used to provide fairness 
to jobs or to reduce pressure on external entities like the NN/DB. As you 
suggest, we should look at other options for improving throughput and 
scalability while providing fairness and keeping the system stable. By the 
way, a similar problem also exists for workflows, as all wf action related 
commands acquire the wf job lock before executing.
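
To make the fairness argument concrete, here is a toy model of the scenario 
above. It is only a sketch and not Oozie's actual queue implementation: the 
function fill_queue, the job names "A" and "B", and the rule "at most one 
queued command per lock key" are assumptions chosen to mirror the description.

```python
from collections import deque


def fill_queue(pending, capacity, lock_key):
    """Admit commands into a bounded queue, keeping at most one queued
    command per lock key (hypothetical model of lock-based admission)."""
    queue = deque()
    held = set()
    for job_id, action_id in pending:
        if len(queue) >= capacity:
            break  # queue is full, remaining commands are shut out
        key = lock_key(job_id, action_id)
        if key in held:
            continue  # a command with this lock key is already queued
        held.add(key)
        queue.append((job_id, action_id))
    return queue


# Hypothetical workload: job "A" has 10,000 actions, job "B" has 1 action.
pending = [("A", i) for i in range(10_000)] + [("B", 0)]
capacity = 10_000

# Job-level lock: the key is the job id, so each job holds one queue slot.
by_job = fill_queue(pending, capacity, lambda job, action: job)

# Action-level lock: the key is (job, action), so job "A" can fill the queue.
by_action = fill_queue(pending, capacity, lambda job, action: (job, action))

jobs_by_job = {job for job, _ in by_job}
jobs_by_action = {job for job, _ in by_action}
```

In this model the job-level key leaves both jobs represented in the queue, 
while the action-level key lets job "A" take every slot before job "B" is 
considered.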

bq. Unless all coord actions are done, status transit service shouldn't be 
updating the coord job. Correct?

The status transit service looks at the child jobs and updates the parent 
based on them. For example, a user issues a suspend on a coordinator and its 
status changes to SUSPENDED, but for some reason one of the underlying 
workflows gets killed. The status transit service would then move the status 
to SUSPENDEDWITHERROR.
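
As a rough illustration of that one transition, a minimal sketch follows. 
The function derive_coord_status and its single rule are hypothetical; the 
real service handles many more states and transitions than the one case 
described here.

```python
def derive_coord_status(current_status, child_statuses):
    """Return the parent coordinator status implied by its children.

    Hypothetical sketch: models only the SUSPENDED -> SUSPENDEDWITHERROR
    case described above, triggered by a killed child workflow.
    """
    if current_status == "SUSPENDED" and "KILLED" in child_statuses:
        return "SUSPENDEDWITHERROR"
    return current_status


# One child workflow was killed while the coordinator was suspended.
with_error = derive_coord_status("SUSPENDED", ["SUCCEEDED", "KILLED"])

# No child was killed, so the parent status is left unchanged.
unchanged = derive_coord_status("SUSPENDED", ["SUCCEEDED", "SUSPENDED"])
```

The point of the sketch is that the parent status can change before all 
coord actions are done, because it is recomputed from the children.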




> Coordinator action materialization is too slow due to coarse job level locks
> ----------------------------------------------------------------------------
>
>                 Key: OOZIE-1533
>                 URL: https://issues.apache.org/jira/browse/OOZIE-1533
>             Project: Oozie
>          Issue Type: Improvement
>            Reporter: Srikanth Sundarrajan
>            Assignee: Srikanth Sundarrajan
>              Labels: locking
>         Attachments: OOZIE-1533.patch
>
>
> Coord job level lock introduces high contention. Instead introduce coord 
> action level locking whenever appropriate



--
This message was sent by Atlassian JIRA
(v6.2#6252)
