Re: Server Hang in a Cluster, might be a deadlock

Ian Boston Tue, 15 May 2007 01:42:21 -0700


Dominique,

I have tried with Ctrl+/, with kill -3, and with a JMX deadlock-monitor class that looks for deadlocked threads every 500 ms, but it doesn't find any, which makes me think that it's not a deadlock. JProfiler also has a deadlock detector (which might be more reliable than my code :) ) and so far it hasn't found any either, so perhaps it's not a deadlock. I'm not certain what else it could be, as there is no CPU activity.
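
For reference, a JMX deadlock check of the kind described above can be sketched roughly as follows. This is a minimal sketch, not the detector actually used here: the class name DeadlockPoller is hypothetical, and it assumes a Java 6 or later JVM for findDeadlockedThreads().

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

// Hypothetical class name; a sketch, not the detector used in this thread.
class DeadlockPoller {

    /** Returns one line per thread caught in a lock cycle; empty if none. */
    static List<String> findDeadlocks() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        // findDeadlockedThreads() also covers java.util.concurrent locks;
        // fall back to the monitor-only check where that is unsupported.
        long[] ids = mx.isSynchronizerUsageSupported()
                ? mx.findDeadlockedThreads()
                : mx.findMonitorDeadlockedThreads();
        List<String> report = new ArrayList<String>();
        if (ids == null) {
            return report; // both calls return null when no cycle exists
        }
        for (ThreadInfo info : mx.getThreadInfo(ids, Integer.MAX_VALUE)) {
            if (info == null) {
                continue; // thread ended between the check and the lookup
            }
            report.add(info.getThreadName() + " waits for " + info.getLockName()
                    + " held by " + info.getLockOwnerName());
        }
        return report;
    }

    public static void main(String[] args) throws InterruptedException {
        // Poll a few times at the 500 ms interval mentioned above.
        for (int i = 0; i < 3; i++) {
            List<String> found = findDeadlocks();
            System.out.println(found.isEmpty() ? "no deadlock detected" : found.toString());
            Thread.sleep(500);
        }
    }
}
```

Both ThreadMXBean calls only report cycles of threads blocked on object monitors (or ownable synchronizers). A thread parked in Object.wait(), blocked on socket I/O, or waiting on a database lock is not part of such a cycle, so an empty result does not by itself rule out a hang.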

I'm going to go back to basics, instrument the code, and produce some big log files. (Just hoping that doesn't generate a workaround!)


Ian



Dominique Pfister wrote:

Hi Ian,

Have you been able to generate a thread dump of the stalled node at
the moment it stops responding? That might help...
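
If sending a signal to the process is awkward, a dump can also be produced from inside the JVM. The following is a minimal sketch, assuming code can be deployed to the stalled node and that some thread there is still able to run it. ThreadDumper is a hypothetical name, and the output omits the lock details that kill -3 prints.

```java
import java.util.Map;

// Hypothetical helper; prints name, state and stack of every live thread.
class ThreadDumper {

    static String dump() {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> e
                : Thread.getAllStackTraces().entrySet()) {
            Thread t = e.getKey();
            out.append('"').append(t.getName()).append("\" ")
               .append(t.getState()).append('\n');
            for (StackTraceElement frame : e.getValue()) {
                out.append("\tat ").append(frame).append('\n');
            }
            out.append('\n'); // blank line between threads
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(dump());
    }
}
```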

Kind regards
Dominique

On 5/15/07, Ian Boston <[EMAIL PROTECTED]> wrote:

Hi,

I've been doing some testing of a 2-node Jackrabbit cluster using 1.3
(with the JCR-915 patch), but I am getting some strange behaviour.

I use OSX Finder to mount a DAV service from each node and then upload
lots of files to each DAV mount at the same time. All goes OK for the
first few thousand files, and then one of the nodes stops responding to
that session. The other node continues and finishes.

Eventually OSX disconnects the stalled node.

When I try the port of the apparently stalled cluster node, it still
responds, though with some strange behaviour.

A remount attempt responds with a 401 and forces basic login, but stalls
after that point. (The URL points to the base of a workspace.)

If I open Firefox and access the DAV servlet, I can navigate down the
directory tree, but if I try to refresh any JCR folder or JCR file that
I have already visited (since the cluster node has been up), Firefox
spins forever.

I have put a deadlock detector class into both nodes (a Java class that
looks for deadlocks through JMX), but it doesn't detect anything.

I have also used JProfiler connected to one node, but it never detects a
deadlock.

I have tried all of this in single node mode, with no Journal or
ClusterNode and not been able to re-create the problem (yet).

The one thing that I have seen in JProfiler is threads blocked waiting
for an ItemState monitor inside Jackrabbit, but never for more than 500 ms.
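
The same kind of information can be collected without a profiler. The following is a minimal sketch that lists threads currently in the BLOCKED state together with the monitor they want and the thread that owns it. BlockedThreadReport is a hypothetical name, and nothing in it is specific to Jackrabbit.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; reports threads waiting to enter a monitor.
class BlockedThreadReport {

    static List<String> blockedThreads() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<String> report = new ArrayList<String>();
        // A depth of 1 keeps only the top stack frame of each thread.
        for (ThreadInfo info : mx.getThreadInfo(mx.getAllThreadIds(), 1)) {
            if (info == null || info.getThreadState() != Thread.State.BLOCKED) {
                continue; // thread has ended, or is not waiting on a monitor
            }
            StackTraceElement[] stack = info.getStackTrace();
            String where = stack.length > 0 ? stack[0].toString() : "unknown";
            report.add(info.getThreadName() + " blocked on " + info.getLockName()
                    + " owned by " + info.getLockOwnerName() + " at " + where);
        }
        return report;
    }

    public static void main(String[] args) {
        for (String line : blockedThreads()) {
            System.out.println(line);
        }
    }
}
```

Logging this output periodically on the stalled node would show whether the same monitor and owner keep reappearing, which a single profiler snapshot can miss.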


I am using the standard DatabaseJournal and the
SimpleDbPersistenceManager; however, I see the same happening with the
FileJournal.

Any ideas? I might put some very simple debug output in near that monitor
that was blocking for 500 ms.

I did search JIRA but couldn't find anything that was a close match.


Ian
