[ 
https://issues.apache.org/jira/browse/UIMA-3657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13915034#comment-13915034
 ] 

Lou DeGenaro commented on UIMA-3657:
------------------------------------

Sample "blocked" report:

27 Feb 2014 15:18:56,098  INFO OR.TrackSync - blocked     N/A target: 
DuccWorkMap requester: OrchestratorCommonArea.getCheckpointable time: 16001 
blocking: OrchestratorComponent.getState

The above or.log entry says that OrchestratorComponent.getState is blocked by 
OrchestratorCommonArea.getCheckpointable, which has held the DuccWorkMap 
synchronization lock for about 16 seconds.

=====

Sample "overtime" report:

27 Feb 2014 15:18:56,098  INFO OR.TrackSync - overtime     N/A target: 
DuccWorkMap requester: OrchestratorCommonArea.getCheckpointable wait: 1 held: 
21877
27 Feb 2014 15:18:56,098  INFO OR.TrackSync - report     N/A target: 
DuccWorkMap requester: OrchestratorComponent.getState  pending: 1

The above or.log entries say that OrchestratorCommonArea.getCheckpointable 
waited 1 millisecond to obtain the synchronization lock on DuccWorkMap, then 
held the lock for nearly 22 seconds, and in doing so blocked 1 instance of 
OrchestratorComponent.getState from getting the lock.
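
For readers wondering how "wait", "held", "time" and "pending" figures like 
those in the reports above could be produced, here is a minimal sketch in Java. 
It is not the DUCC code: SyncTracker, its method names, the injected clock and 
the simplified message layout are all assumptions made for illustration. Only 
the 10 second threshold and the field names come from this issue.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.LongSupplier;

// Hypothetical sketch: SyncTracker is an illustrative name, not the DUCC
// class behind the OR.TrackSync log entries.
class SyncTracker {
    // One outstanding lock request: who asked, and when (milliseconds).
    private static final class Request {
        final String requester;
        final long since;
        Request(String requester, long since) {
            this.requester = requester;
            this.since = since;
        }
    }

    private final String target;          // name of the guarded object
    private final long thresholdMillis;   // e.g. 10_000 per the issue text
    private final LongSupplier clock;     // injected so tests can fake time
    private final Map<Long, Request> pending = new LinkedHashMap<>();
    private final List<String> log = new ArrayList<>();
    private long nextTicket = 0;
    private String holder = null;
    private long holderWait = 0;
    private long heldSince = 0;

    SyncTracker(String target, long thresholdMillis, LongSupplier clock) {
        this.target = target;
        this.thresholdMillis = thresholdMillis;
        this.clock = clock;
    }

    // Call just before trying to enter the synchronized block.
    synchronized long request(String requester) {
        long now = clock.getAsLong();
        if (holder != null && now - heldSince > thresholdMillis) {
            log.add("blocked target: " + target + " requester: " + holder
                    + " time: " + (now - heldSince) + " blocking: " + requester);
        }
        long ticket = nextTicket++;
        pending.put(ticket, new Request(requester, now));
        return ticket;
    }

    // Call as the first statement inside the synchronized block.
    synchronized void acquired(long ticket) {
        long now = clock.getAsLong();
        Request r = pending.remove(ticket);
        holder = r.requester;
        holderWait = now - r.since;
        heldSince = now;
    }

    // Call as the last statement inside the synchronized block.
    synchronized void release() {
        long held = clock.getAsLong() - heldSince;
        if (held > thresholdMillis) {
            log.add("overtime target: " + target + " requester: " + holder
                    + " wait: " + holderWait + " held: " + held);
            // Count the still-waiting requests per requester.
            Map<String, Integer> counts = new LinkedHashMap<>();
            for (Request r : pending.values()) {
                counts.merge(r.requester, 1, Integer::sum);
            }
            for (Map.Entry<String, Integer> e : counts.entrySet()) {
                log.add("report target: " + target + " requester: " + e.getKey()
                        + " pending: " + e.getValue());
            }
        }
        holder = null;
    }

    // Convenience wrapper showing the intended call order around a lock.
    void run(String requester, Object lock, Runnable body) {
        long ticket = request(requester);
        synchronized (lock) {
            acquired(ticket);
            try {
                body.run();
            } finally {
                release();
            }
        }
    }

    synchronized List<String> messages() {
        return new ArrayList<>(log);
    }
}
```

In this sketch a caller would wrap each access as 
tracker.run("SomeClass.someMethod", workMap, () -> { ... }), so that the time 
spent waiting for the lock and the time spent holding it are measured 
separately. The tracker's own monitor is never held while waiting for the 
guarded lock, so the bookkeeping itself should not add a new blocking point.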


> DUCC Orchestrator (OR) improved synchronization tracking
> --------------------------------------------------------
>
>                 Key: UIMA-3657
>                 URL: https://issues.apache.org/jira/browse/UIMA-3657
>             Project: UIMA
>          Issue Type: Improvement
>          Components: DUCC
>    Affects Versions: 1.0-Ducc
>            Reporter: Lou DeGenaro
>            Assignee: Lou DeGenaro
>
> The orchestrator currently records to its log some limited and incomplete 
> information about synchronization.  This improvement:
> 1. Instruments all WorkMap synchronizations in the OR
> 2. Accounts for time blocked and time held
> 3. Records all new requests for synchronization when current holder exceeds 
> 10 seconds
> 4. Records all pending requests when current holder releases having held 
> synchronization for > 10 seconds
> This is to address the situation, for example, where OR is running albeit 
> slowly.  Newly added log messages will hopefully shed light on where the 
> bottlenecks may be.
> One theory is that a normally fast resource, such as the filesystem, becomes 
> very slow and bogs down OR while it's trying to write its checkpoint dataset.  
> In this case, we'd expect to see the synchronization lock held for a long 
> time by the OR's checkpoint module.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
