Re: Frozen Again

Farzad Valad Thu, 07 Jul 2011 14:38:48 -0700

I don't see an attachment in the email.

On 7/7/2011 1:20 PM, Karl Wright wrote:

Attached please find an instrumented
framework\pull-agent\src\main\java\org\apache\manifoldcf\crawler\system\ResetManager.java
class.  Please rebuild with this class, cause the hang, and capture
standard out so I can see it.


Thanks!
Karl


On Thu, Jul 7, 2011 at 2:12 PM, Karl Wright <daddy...@gmail.com> wrote:

Thanks.  I may be able to send you an instrumented ResetManager class later
today, if you are in a position to rebuild MCF and try this again.

Karl

On Thu, Jul 7, 2011 at 2:06 PM, Farzad Valad <ho...@farzad.net> wrote:

I'm attaching the current thread dump file that goes with the log file.  It
is easy to recreate: just cause an insert failure due to a size mismatch between
the column and the value, where the value can't fit.  More than happy to test
and help out.

On 7/6/2011 2:44 PM, Farzad Valad wrote:

You are right, it was a db error.  In this case I tried to insert a value
larger than the column size and the insert failed.  I'll grab the log next
time too, but unfortunately I deleted it and am now running another test with a
larger column.  As soon as it finishes or errors, I'll reproduce this one again
and send you the stack trace.

On 7/6/2011 2:36 PM, Karl Wright wrote:

I have seen this before.  The critical traceback, which you see for
ALL the worker threads, is:

"Worker thread '36'" daemon prio=6 tid=0x00000000077ed000 nid=0xa98 in Object.wait() [0x000000000b1af000]
    java.lang.Thread.State: WAITING (on object monitor)
         at java.lang.Object.wait(Native Method)
         at java.lang.Object.wait(Object.java:485)
         at org.apache.manifoldcf.crawler.system.ResetManager.waitForReset(ResetManager.java:107)
         - locked <0x00000000e0005528> (a org.apache.manifoldcf.crawler.system.WorkerResetManager)
         at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:110)


ManifoldCF has code in it for dealing with database errors that
requires all worker threads to be brought into the same state.  This
code has never worked properly, and I've never been able to figure out
why.  But the underlying problem is that you've had a database error
of some kind which requires a reset.  This is usually a connection
error.
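
For readers unfamiliar with this kind of mechanism, here is a minimal sketch of a reset barrier built on Object.wait() and notifyAll(), the same primitives that appear in the traceback above. It is an illustration only, not ManifoldCF's ResetManager: the class name ResetBarrierSketch, the noteResetRequired() and runDemo() methods, and the generation counter are all assumptions made for this example. The idea is that each worker checks in between units of work, and the last worker to arrive runs the reset and releases the rest.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of a "bring all workers to the same state" reset barrier.
// This is NOT ManifoldCF's ResetManager; every name here is illustrative.
final class ResetBarrierSketch {
    private final int workerCount;      // number of threads that must arrive
    private final Runnable resetAction; // work done once all have arrived
    private boolean resetRequired = false;
    private int waiting = 0;
    private long generation = 0;        // bumped after each completed reset

    ResetBarrierSketch(int workerCount, Runnable resetAction) {
        this.workerCount = workerCount;
        this.resetAction = resetAction;
    }

    // Called by whichever thread detects an error that needs a reset.
    synchronized void noteResetRequired() {
        resetRequired = true;
    }

    // Called by every worker between units of work.
    synchronized void waitForReset() throws InterruptedException {
        if (!resetRequired) {
            return; // nothing pending
        }
        long myGeneration = generation;
        waiting++;
        if (waiting == workerCount) {
            // Last arrival: all workers are idle, so the reset can run.
            try {
                resetAction.run();
            } finally {
                resetRequired = false;
                waiting = 0;
                generation++;
                notifyAll(); // release the workers blocked in the loop below
            }
            return;
        }
        // Looping on the generation guards against spurious wakeups.
        while (generation == myGeneration) {
            wait();
        }
    }

    // Runs one reset cycle and returns how many times the reset action ran.
    static int runDemo(int workers) {
        AtomicInteger resets = new AtomicInteger();
        ResetBarrierSketch barrier =
            new ResetBarrierSketch(workers, resets::incrementAndGet);
        barrier.noteResetRequired(); // set before any worker checks the flag
        Thread[] threads = new Thread[workers];
        for (int i = 0; i < workers; i++) {
            threads[i] = new Thread(() -> {
                try {
                    barrier.waitForReset();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            threads[i].start();
        }
        try {
            for (Thread t : threads) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return resets.get();
    }

    public static void main(String[] args) {
        System.out.println("resets performed: " + runDemo(4));
    }
}
```

In a scheme like this, progress depends on every counted worker eventually calling waitForReset(); if the count and the set of arriving threads ever disagree, the arrivals stay parked in wait() indefinitely. Whether anything similar applies to the real ResetManager is not established here.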

Can you look at manifoldcf.log and send the last stack trace in it?
It could be too short a connection lifetime in either the manifoldcf
configuration or in the postgresql configuration.

Karl


On Wed, Jul 6, 2011 at 3:27 PM, Farzad Valad <ho...@farzad.net> wrote:

So this time I went through the thread dump and don't see any socket
waits.  Any thoughts why it is stuck this time?

Thanks,
Farzad.
