[jira] [Commented] (SLING-4061) Deadlock involving discovery services at startup with Oak

2015-10-06 Thread Robert Munteanu (JIRA)


[ 
https://issues.apache.org/jira/browse/SLING-4061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14945661#comment-14945661
 ] 

Robert Munteanu commented on SLING-4061:


[~bdelacretaz] - have you seen this issue recently?

> Deadlock involving discovery services at startup with Oak
> -
>
> Key: SLING-4061
> URL: https://issues.apache.org/jira/browse/SLING-4061
> Project: Sling
>  Issue Type: Bug
>  Components: Extensions
>Reporter: Bertrand Delacretaz
> Attachments: discovery-deadlock.txt
>
>
> I just got a deadlock at startup when starting the launchpad integration 
> tests instance on sling trunk revision 1632058 (so starting with Oak):
> {code}
> export DBG="-Xmx1G  -XX:MaxPermSize=256m 
> -agentlib:jdwp=transport=dt_socket,address=30303,server=y,suspend=n"
> export MAVEN_OPTS="-Xmx1G  -XX:MaxPermSize=256m $DBG -Dsling.run.modes=oak"
> cd launchpad/testing
> mvn launchpad:run
> {code}
> I'll attach the stack trace. The discovery HeartbeatHandler, and 
> DiscoveryServiceImpl classes are involved.
> The deadlock happens often on my box (macosx 10.9.5, java version 
> "1.7.0_45"), with the same deadlock pattern AFAICS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (SLING-4061) Deadlock involving discovery services at startup with Oak

2014-10-15 Thread Robert Munteanu (JIRA)


[ 
https://issues.apache.org/jira/browse/SLING-4061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14172469#comment-14172469
 ] 

Robert Munteanu commented on SLING-4061:


There is an issue in the {{DiscoveryServiceImpl}} {{activate()}} method, but I'm 
not sure it's the root cause. The class leaks a reference to itself before the 
{{activate()}} method completes:

{code:java}
// make sure the first heartbeat is issued as soon as possible - which
// is right after this service starts. since the two (discoveryservice
// and heartbeatHandler need to know each other, the discoveryservice
// is passed on to the heartbeatHandler in this initialize call).
heartbeatHandler.initialize(this,
        clusterViewService.getIsolatedClusterViewId());

final TopologyEventListener[] registeredServices;
synchronized (lock) {
    registeredServices = this.eventListeners;
    doUpdateProperties();

    TopologyViewImpl newView = (TopologyViewImpl) getTopology();
    TopologyEvent event = new TopologyEvent(Type.TOPOLOGY_INIT, null,
            newView);
    for (final TopologyEventListener da : registeredServices) {
        sendTopologyEvent(da, event);
    }
    activated = true;
    oldView = newView;
}
{code}

The deadlock itself is a lock ordering issue:

- in thread pool-5-thread-1 the HeartbeatHandler wants to issue an update; the 
thread holds the DiscoveryServiceImpl.lock lock but can't acquire the 
SegmentNodeStoreService lock
- in thread CM Event Dispatcher... the SegmentNodeStoreService holds its own 
lock and the call stack ends up trying to invoke 
DiscoveryServiceImpl.bindTopologyEventListener, which needs the 
DiscoveryServiceImpl.lock
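
The inversion described in the two bullets can be sketched in isolation. This is only an illustration, not the Sling or Oak code: {{LockOrderDemo}}, {{serviceLock}} and {{storeLock}} are hypothetical names standing in for the DiscoveryServiceImpl.lock and the SegmentNodeStoreService lock, and a timed {{tryLock}} replaces the blocking acquisition so the demo terminates instead of hanging.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical demo class; the two locks only stand in for
// DiscoveryServiceImpl.lock and the SegmentNodeStoreService lock.
class LockOrderDemo {

    // Takes 'first', waits until the other thread holds its own first lock,
    // then tries 'second' with a timeout instead of blocking forever.
    static boolean tryInOrder(ReentrantLock first, ReentrantLock second,
            CountDownLatch firstHeld, CountDownLatch attempted)
            throws InterruptedException {
        first.lock();
        try {
            firstHeld.countDown();
            firstHeld.await();
            boolean got = second.tryLock(100, TimeUnit.MILLISECONDS);
            if (got) {
                second.unlock();
            }
            // Keep 'first' until both threads have finished their attempt.
            attempted.countDown();
            attempted.await();
            return got;
        } finally {
            first.unlock();
        }
    }

    // Returns {heartbeatGotStoreLock, dispatcherGotServiceLock}.
    static boolean[] run() {
        final ReentrantLock serviceLock = new ReentrantLock();
        final ReentrantLock storeLock = new ReentrantLock();
        final CountDownLatch firstHeld = new CountDownLatch(2);
        final CountDownLatch attempted = new CountDownLatch(2);
        // One task per thread, acquiring the same two locks in opposite order.
        Callable<Boolean> heartbeatTask =
                () -> tryInOrder(serviceLock, storeLock, firstHeld, attempted);
        Callable<Boolean> dispatcherTask =
                () -> tryInOrder(storeLock, serviceLock, firstHeld, attempted);
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            Future<Boolean> heartbeat = pool.submit(heartbeatTask);
            Future<Boolean> dispatcher = pool.submit(dispatcherTask);
            return new boolean[] { heartbeat.get(), dispatcher.get() };
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Neither thread gets its second lock here. With plain {{synchronized}} blocks in place of the timed {{tryLock}}, both threads would wait on each other indefinitely, which is the pattern in the attached stack trace as far as I can tell.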

I wonder whether we need more fine-grained locking in the DiscoveryServiceImpl. 
A single lock object seems too coarse-grained, especially since a lot seems to 
happen during calls like updateProperties(), including invocation of foreign 
code (notifying event listeners), which is a bit worrisome: invoking foreign 
code with locks held is prone to deadlocks.
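
One way to avoid calling foreign code with the lock held is to copy the listeners under the lock and notify them after releasing it. A minimal sketch of that idea, assuming a hypothetical {{Listener}} interface and {{ListenerRegistry}} class (neither is part of the Sling discovery API):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a topology event listener; not the Sling API.
interface Listener {
    void onEvent(String event);
}

// Hypothetical registry: the lock guards only the listener list,
// never the listener callbacks.
class ListenerRegistry {
    private final Object lock = new Object();
    private final List<Listener> listeners = new ArrayList<>();

    void bind(Listener listener) {
        synchronized (lock) {
            listeners.add(listener);
        }
    }

    void unbind(Listener listener) {
        synchronized (lock) {
            listeners.remove(listener);
        }
    }

    // Notifies all listeners bound at the time of the call and
    // returns how many were notified.
    int fire(String event) {
        final List<Listener> snapshot;
        synchronized (lock) {
            // Only the copy happens under the lock.
            snapshot = new ArrayList<>(listeners);
        }
        // Foreign code runs with the lock released, so a listener that
        // blocks on some other lock does not keep this one held.
        for (Listener listener : snapshot) {
            listener.onEvent(event);
        }
        return snapshot.size();
    }
}
```

The trade-off is that a listener may still receive an event shortly after it has been unbound, since it can be part of a snapshot taken earlier.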

Another alternative is to make use of concurrent collections for e.g. 
event listeners, but I'm not sure we won't get bitten by the fact that they are 
weakly consistent.
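
To make the concern concrete, here is a small sketch using {{java.util.concurrent.CopyOnWriteArrayList}}, whose iterators work on a snapshot of the list. {{CowListeners}} is a hypothetical class and {{Runnable}} stands in for the listener type; this is not how DiscoveryServiceImpl is written.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical holder for listeners; no explicit lock is needed
// because the list itself is thread-safe.
class CowListeners {
    private final List<Runnable> listeners = new CopyOnWriteArrayList<>();

    void bind(Runnable listener) {
        listeners.add(listener);
    }

    // Runs every listener visible to the iterator and returns the count.
    int fire() {
        int notified = 0;
        // The iterator sees the list as it was when iteration started;
        // listeners added during the loop are not visited.
        for (Runnable listener : listeners) {
            listener.run();
            notified++;
        }
        return notified;
    }
}
```

A listener bound while {{fire()}} is in progress misses the event being dispatched, so something like an initial TOPOLOGY_INIT would have to be delivered to late binders separately.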


