[ https://issues.apache.org/jira/browse/UIMA-3657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13915034#comment-13915034 ]
Lou DeGenaro commented on UIMA-3657:
------------------------------------
Sample "blocked" report:
27 Feb 2014 15:18:56,098 INFO OR.TrackSync - blocked N/A target: DuccWorkMap requester: OrchestratorCommonArea.getCheckpointable time: 16001 blocking: OrchestratorComponent.getState
The above or.log entry says that OrchestratorComponent.getState is blocked by
OrchestratorCommonArea.getCheckpointable, which has held the DuccWorkMap
synchronization lock for about 16 seconds (time: 16001 ms).
=====
Sample "overtime" report:
27 Feb 2014 15:18:56,098 INFO OR.TrackSync - overtime N/A target: DuccWorkMap requester: OrchestratorCommonArea.getCheckpointable wait: 1 held: 21877
27 Feb 2014 15:18:56,098 INFO OR.TrackSync - report N/A target: DuccWorkMap requester: OrchestratorComponent.getState pending: 1
The above or.log entries say that OrchestratorCommonArea.getCheckpointable
waited 1 millisecond to obtain the synchronization lock for DuccWorkMap, then
held the lock for nearly 22 seconds (held: 21877 ms), and in doing so blocked
1 instance of OrchestratorComponent.getState from getting the lock.
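To make the field meanings concrete, here is a small Java sketch that splits a TrackSync entry into its report kind and its "key: value" fields. The class TrackSyncEntry is hypothetical and not part of DUCC. It assumes that values contain no spaces (true of the samples above), that the "N/A" column can be skipped, and that time, wait, and held are in milliseconds, as the "16001 is about 16 seconds" reading implies.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical helper (not part of DUCC): splits an OR.TrackSync log entry
 * into its report kind ("blocked", "overtime", "report") and its fields.
 */
public class TrackSyncEntry {
    private static final String MARKER = "OR.TrackSync - ";

    final String kind;                 // first word after the marker
    final Map<String, String> fields;  // "key:" token -> the token that follows it

    private TrackSyncEntry(String kind, Map<String, String> fields) {
        this.kind = kind;
        this.fields = fields;
    }

    static TrackSyncEntry parse(String line) {
        int start = line.indexOf(MARKER);
        if (start < 0) {
            throw new IllegalArgumentException("not a TrackSync entry: " + line);
        }
        String[] tokens = line.substring(start + MARKER.length()).trim().split("\\s+");
        Map<String, String> fields = new LinkedHashMap<>();
        // tokens[0] is the kind; tokens[1] is the "N/A" column, which is skipped.
        for (int i = 2; i + 1 < tokens.length; i++) {
            if (tokens[i].endsWith(":")) {
                fields.put(tokens[i].substring(0, tokens[i].length() - 1), tokens[i + 1]);
                i++; // step over the value just consumed
            }
        }
        return new TrackSyncEntry(tokens[0], fields);
    }

    /** Converts a millisecond-valued field (time, wait, held) to seconds. */
    double seconds(String key) {
        return Long.parseLong(fields.get(key)) / 1000.0;
    }

    public static void main(String[] args) {
        TrackSyncEntry blocked = parse("27 Feb 2014 15:18:56,098 INFO OR.TrackSync - blocked N/A"
                + " target: DuccWorkMap requester: OrchestratorCommonArea.getCheckpointable"
                + " time: 16001 blocking: OrchestratorComponent.getState");
        TrackSyncEntry overtime = parse("27 Feb 2014 15:18:56,098 INFO OR.TrackSync - overtime N/A"
                + " target: DuccWorkMap requester: OrchestratorCommonArea.getCheckpointable"
                + " wait: 1 held: 21877");
        System.out.println(blocked.fields.get("blocking") + " blocked by "
                + blocked.fields.get("requester") + " after " + blocked.seconds("time") + " s");
        System.out.println(overtime.fields.get("requester") + " waited " + overtime.fields.get("wait")
                + " ms, held " + overtime.seconds("held") + " s");
    }
}
```

Note that in a "blocked" entry the requester field names the current holder and the blocking field names the request being held up, so a reader scanning or.log should not confuse the two.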
> DUCC Orchestrator (OR) improved synchronization tracking
> --------------------------------------------------------
>
> Key: UIMA-3657
> URL: https://issues.apache.org/jira/browse/UIMA-3657
> Project: UIMA
> Issue Type: Improvement
> Components: DUCC
> Affects Versions: 1.0-Ducc
> Reporter: Lou DeGenaro
> Assignee: Lou DeGenaro
>
> The orchestrator currently records to its log some limited and incomplete
> information about synchronization. This improvement:
> 1. Instruments all WorkMap synchronizations in the OR
> 2. Accounts for time blocked and time held
> 3. Records all new requests for synchronization when the current holder has
> held it for more than 10 seconds
> 4. Records all pending requests when the current holder releases after having
> held synchronization for more than 10 seconds
> This is to address the situation, for example, where the OR is still running,
> albeit slowly. The newly added log messages will hopefully shed light on where
> the bottlenecks may be.
> One theory is that a normally fast resource, such as the filesystem, becomes
> very slow and bogs down the OR while it's trying to write its checkpoint dataset.
> In this case, we'd expect to see the synchronization lock held for a long
> time by the OR's checkpoint module.
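The four points in the description can be sketched as a lock wrapper. This is a minimal, hypothetical Java illustration, not the OR's actual instrumentation: the class name TrackedLock, the use of java.util.concurrent.locks.ReentrantLock in place of whatever synchronization the OR really performs on DuccWorkMap, the injected millisecond clock, and the log sink are all assumptions. It emits lines shaped like the sample reports (without the timestamp and level prefix), copying the "N/A" column verbatim because its meaning is not given here, and it assumes each requester acquires the lock non-reentrantly.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Consumer;
import java.util.function.LongSupplier;

/** Hypothetical lock wrapper that accounts for time blocked and time held (not DUCC code). */
public class TrackedLock {
    private final ReentrantLock lock = new ReentrantLock();
    private final Object books = new Object();            // guards the bookkeeping below
    private final Map<String, Integer> pending = new LinkedHashMap<>(); // requester -> waiting count
    private final String target;
    private final long thresholdMs;
    private final LongSupplier clockMs;
    private final Consumer<String> log;
    private String holder;                                 // requester currently holding the lock
    private long heldSince;                                // clock reading when holder acquired it
    private long holderWaitMs;                             // how long the holder waited to acquire

    public TrackedLock(String target, long thresholdMs, LongSupplier clockMs, Consumer<String> log) {
        this.target = target;
        this.thresholdMs = thresholdMs;
        this.clockMs = clockMs;
        this.log = log;
    }

    /** Acquires the lock for a named requester; assumes non-reentrant use. */
    public void acquire(String requester) {
        long requested = clockMs.getAsLong();
        synchronized (books) {
            pending.merge(requester, 1, Integer::sum);
            // Point 3: a new request arrives while the holder is already over the threshold.
            if (holder != null && requested - heldSince > thresholdMs) {
                log.accept("blocked N/A target: " + target + " requester: " + holder
                        + " time: " + (requested - heldSince) + " blocking: " + requester);
            }
        }
        lock.lock();                                       // may block behind the current holder
        long acquired = clockMs.getAsLong();
        synchronized (books) {
            pending.computeIfPresent(requester, (r, n) -> n == 1 ? null : n - 1);
            holder = requester;
            heldSince = acquired;
            holderWaitMs = acquired - requested;           // point 2: time blocked
        }
    }

    /** Releases the lock, reporting if it was held longer than the threshold. */
    public void release() {
        if (!lock.isHeldByCurrentThread()) {
            throw new IllegalMonitorStateException("release() without acquire()");
        }
        long now = clockMs.getAsLong();
        synchronized (books) {
            long held = now - heldSince;                   // point 2: time held
            if (held > thresholdMs) {
                log.accept("overtime N/A target: " + target + " requester: " + holder
                        + " wait: " + holderWaitMs + " held: " + held);
                // Point 4: list every request still waiting at release time.
                pending.forEach((r, n) -> log.accept("report N/A target: " + target
                        + " requester: " + r + " pending: " + n));
            }
            holder = null;
        }
        lock.unlock();
    }

    /** Replays the sample scenario with a fake clock and returns the emitted lines. */
    static List<String> demo() throws InterruptedException {
        AtomicLong now = new AtomicLong();
        List<String> lines = Collections.synchronizedList(new ArrayList<>());
        TrackedLock workMap = new TrackedLock("DuccWorkMap", 10_000, now::get, lines::add);
        workMap.acquire("OrchestratorCommonArea.getCheckpointable");
        now.addAndGet(16_001);                             // holder has held it for ~16 s
        Thread reader = new Thread(() -> {
            workMap.acquire("OrchestratorComponent.getState");
            workMap.release();
        });
        reader.start();
        while (!workMap.lock.hasQueuedThreads()) {
            Thread.sleep(1);                               // wait until the reader is queued
        }
        now.addAndGet(5_876);                              // total hold time: 21877 ms
        workMap.release();
        reader.join();
        return lines;
    }

    public static void main(String[] args) throws InterruptedException {
        demo().forEach(System.out::println);
    }
}
```

Injecting the clock lets demo() replay the sample scenario deterministically: it yields a "blocked" line with time: 16001, an "overtime" line with held: 21877 (wait is 0 rather than 1 because the fake clock does not advance during the first acquisition), and a "report" line with pending: 1. The reader's own hold is instantaneous, so its release logs nothing.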
--
This message was sent by Atlassian JIRA
(v6.1.5#6160)