New worker threads are never spawned. That is why losing all the threads is fatal.

Karl
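The behavior Karl describes further down the thread (WorkerThread catches all Throwables and loops; only InterruptedException ends the thread) can be sketched roughly like this. This is a minimal illustration of the pattern, not the actual ManifoldCF code; the class and method names here are invented for the example:

```java
// Sketch of the worker-loop pattern described in this thread: every
// Throwable is caught, logged, and the loop continues, so the only way
// the thread exits is an InterruptedException (a shutdown request).
// WorkerLoop and fetchAndProcessBatch are illustrative names, not the
// real ManifoldCF classes.
public class WorkerLoop implements Runnable {
    public void run() {
        while (true) {
            try {
                fetchAndProcessBatch();
            } catch (InterruptedException e) {
                // Shutdown: the one exception that ends the thread.
                // If this happens unexpectedly, the thread is gone for
                // good, since workers are never respawned.
                return;
            } catch (Throwable t) {
                // Any other failure (including connector bugs) is logged
                // and the loop keeps going; the thread does not die.
                System.err.println("Worker error: " + t);
            }
        }
    }

    protected void fetchAndProcessBatch() throws InterruptedException {
        // Placeholder: in the real agents process this would pull queued
        // documents and hand them to the connector.
        Thread.sleep(100);
    }
}
```

The consequence for debugging is the one stated above: if the worker threads are absent from a thread dump, something delivered an interrupt (or the thread never started), because no other exception path exits the loop.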
On Tue, Oct 9, 2012 at 10:28 AM, Maciej Liżewski <maciej.lizew...@gmail.com> wrote:
> One more thing:
>
> now, running another job connected with the same repository also hangs
> after seeding, and no worker threads are spawned...
>
>
> 2012/10/9 Maciej Liżewski <maciej.lizew...@gmail.com>:
>> 2012/10/9 Karl Wright <daddy...@gmail.com>:
>>> "- all worker threads are gone," ???
>>>
>>> Really??
>>
>> yes... really... this is why I am also writing that this is strange...
>> this is the list of currently active threads:
>>
>> system
>>   Reference Handler                Waiting
>>   Finalizer                        Waiting
>>   Signal Dispatcher                Running
>>   Attach Listener                  Running
>> main
>>   main                             Waiting
>>   Timer-0                          Evaluating...
>>   HashSessionScavenger-0           Waiting
>>   HashSessionScavenger-1           Evaluating...
>>   HashSessionScavenger-2           Waiting
>>   qtp1792381350-157                Evaluating...
>>   qtp1792381350-158                Waiting
>>   qtp1792381350-159                Evaluating...
>>   qtp1792381350-161                Waiting
>>   Job start thread                 Waiting
>>   Startup thread                   Waiting
>>   Delete startup thread            Waiting
>>   Finisher thread                  Waiting
>>   Job notification thread          Waiting
>>   Job delete thread                Waiting
>>   Stuffer thread                   Waiting
>>   Expire stuffer thread            Waiting
>>   Set priority thread              Waiting
>>   Expiration thread '0'            Waiting
>>   Expiration thread '1'            Waiting
>>   Expiration thread '2'            Waiting
>>   Expiration thread '3'            Waiting
>>   Expiration thread '4'            Waiting
>>   Expiration thread '5'            Evaluating...
>>   Expiration thread '6'            Waiting
>>   Expiration thread '7'            Waiting
>>   Expiration thread '8'            Waiting
>>   Expiration thread '9'            Evaluating...
>>   Document cleanup stuffer thread  Waiting
>>   Document cleanup thread '0'      Waiting
>>   Document cleanup thread '1'      Waiting
>>   Document cleanup thread '2'      Waiting
>>   Document cleanup thread '3'      Evaluating...
>>   Document cleanup thread '4'      Waiting
>>   Document cleanup thread '5'      Waiting
>>   Document cleanup thread '6'      Evaluating...
>>   Document cleanup thread '7'      Waiting
>>   Document cleanup thread '8'      Evaluating...
>>   Document cleanup thread '9'      Waiting
>>   Document delete stuffer thread   Waiting
>>   Document delete thread '0'       Waiting
>>   Document delete thread '1'       Waiting
>>   Document delete thread '2'       Evaluating...
>>   Document delete thread '3'       Waiting
>>   Document delete thread '4'       Evaluating...
>>   Document delete thread '5'       Waiting
>>   Document delete thread '6'       Waiting
>>   Document delete thread '7'       Waiting
>>   Document delete thread '8'       Waiting
>>   Document delete thread '9'       Evaluating...
>>   Job reset thread                 Waiting
>>   Seeding thread                   Waiting
>>   Idle cleanup thread              Evaluating...
>>   Connection pool reaper           Sleeping
>>   qtp1792381350-160                Running
>>   qtp1792381350-162                Running
>>   qtp1792381350-163                Running
>>   qtp1792381350-156                Evaluating...
>> derby.daemons
>>   derby.rawStoreDaemon             Waiting
>>
>> but still the UI shows:
>>
>> Name    Status   Start Time                     End Time  Documents  Active  Processed
>> msol-x  Running  Tue Oct 09 15:15:29 CEST 2012            5689       10      5689
>>
>> (10 documents left active to process and status = "running")
>>
>> I checked the code of the connector, and only "ManifoldCFException"s
>> are thrown...
>>
>>
>>> I can think of no scenario where the worker threads disappear except
>>> if shutdown of the agents process is attempted and fails.
>>> WorkerThread catches all Throwables, logs them, and repeats in a
>>> loop. The only kind of exception that can cause the thread to exit is
>>> an InterruptedException.
>>>
>>> This isn't making a heck of a lot of sense...
>>>
>>> Karl
>>>
>>>
>>> On Tue, Oct 9, 2012 at 9:58 AM, Maciej Liżewski
>>> <maciej.lizew...@gmail.com> wrote:
>>>> Just looking at threads... but nothing special:
>>>> - all worker threads are gone,
>>>> - the stuffer thread runs in a loop but finds nothing to do,
>>>> - other threads just wait on 'sleep' commands.
>>>>
>>>> Is there any particular thread I should look at?
>>>>
>>>> I could guess that there was some exception (maybe in my connector,
>>>> but I could not reproduce it) that was not handled and some worker
>>>> thread just disappeared...
>>>>
>>>> How do I enable debug logs so I can see verbose output from the core
>>>> functions?
>>>>
>>>>
>>>> 2012/10/9 Karl Wright <daddy...@gmail.com>:
>>>>> FWIW, getting thread dumps from the process running the agents
>>>>> when it is "hung" may (or may not) help determine the underlying
>>>>> cause.
>>>>>
>>>>> Karl
>>>>>
>>>>> On Tue, Oct 9, 2012 at 9:21 AM, Karl Wright <daddy...@gmail.com> wrote:
>>>>>> What is your deployment model? Is this a multiprocess deployment?
>>>>>> What database are you using?
>>>>>>
>>>>>> There are various load tests for each database, which do far more
>>>>>> than 7000 documents. I am concerned that you are seeing this because
>>>>>> of some kind of cross-process synchronization issue, which might
>>>>>> occur (for instance) if you are using a multiprocess environment
>>>>>> with a single-process properties.xml file.
>>>>>>
>>>>>> Karl
>>>>>>
>>>>>> On Tue, Oct 9, 2012 at 9:12 AM, Maciej Liżewski
>>>>>> <maciej.lizew...@gmail.com> wrote:
>>>>>>> Ok... it is not a getMaxDocumentRequest issue, because I was able
>>>>>>> to reproduce it even with getMaxDocumentRequest=1. It seems to
>>>>>>> occur when indexing large sets of documents (in my case ~7000). It
>>>>>>> also happened once with the CIFS connector (against a samba
>>>>>>> share)...
>>>>>>>
>>>>>>> the result is like this:
>>>>>>>
>>>>>>> Name    Status   Start Time                     End Time  Documents  Active  Processed
>>>>>>> Mantis  Running  Tue Oct 09 13:56:59 CEST 2012            5689       1600    4400
>>>>>>>
>>>>>>> the "active" documents count has been 1600 for about an hour now,
>>>>>>> but there is no server load and nothing changes... it seems to be
>>>>>>> hanging somewhere inside the manifold core.
>>>>>>>
>>>>>>> also - when hitting abort, nothing happens (the job remains in the
>>>>>>> "aborting" state)...
>>>>>>>
>>>>>>> The problem is that it happens irregularly (sometimes 10 documents,
>>>>>>> sometimes 1600, and sometimes all documents are indexed). I tried
>>>>>>> to check it locally, but on the first pass everything went ok...
>>>>>>> really strange...
>>>>>>>
>>>>>>>
>>>>>>> 2012/10/3 Karl Wright <daddy...@gmail.com>:
>>>>>>>> Hi Maciej,
>>>>>>>>
>>>>>>>> It sounds like your loop condition must be somehow incorrect. You
>>>>>>>> may not receive the full number of documents specified by
>>>>>>>> getMaxDocumentRequest(), but rather a number less than that.
>>>>>>>>
>>>>>>>> We have a number of connectors that use document batches > 1,
>>>>>>>> e.g. the LiveLink connector, so this is likely not the problem.
>>>>>>>>
>>>>>>>> I'd recommend adding System.out.println() diagnostics to see
>>>>>>>> exactly what is happening inside both getDocumentVersions() and
>>>>>>>> processDocuments().
>>>>>>>>
>>>>>>>> Karl
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, Oct 3, 2012 at 4:30 PM, Maciej Liżewski
>>>>>>>> <maciej.lizew...@gmail.com> wrote:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I have noticed a strange problem with a connector (a new one I am
>>>>>>>>> developing right now) and the getMaxDocumentRequest parameter.
>>>>>>>>> When it returns 1 (the default), everything seems ok, but when I
>>>>>>>>> set it to anything higher (5, 10, 20), the indexing job does not
>>>>>>>>> end but hangs when there are only getMaxDocumentRequest documents
>>>>>>>>> left (when it should process 5 documents in a row, 5 documents
>>>>>>>>> stay "active").
>>>>>>>>> All document-related functions seem to be written ok (they all
>>>>>>>>> iterate through the passed arrays), and there are no exceptions
>>>>>>>>> thrown (at least I do not see any in the console).
>>>>>>>>>
>>>>>>>>> What can be wrong, and what should I look at? Any ideas?
>>>>>>>>>
>>>>>>>>> By the way - the new connector is for the Mantis bug tracker, to
>>>>>>>>> index issues.
>>>>>>>>>
>>>>>>>>> TIA
>>>>>>>>> Redguy
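Karl's point about the loop condition is the usual pitfall with batched processing: the framework may hand processDocuments() fewer identifiers than getMaxDocumentRequest(), so the loop must be bounded by the length of the array actually received, never by the configured maximum. A minimal sketch of both the correct and the buggy bound (the names and method shape are illustrative only, not the exact ManifoldCF connector API):

```java
// Sketch of the batch-iteration pitfall discussed in this thread.
// The final batch of a crawl is typically SHORTER than
// getMaxDocumentRequest(), so a loop bound on the configured maximum
// either reads past the end of the array or mishandles the tail.
// BatchSketch and its methods are invented for the example.
public class BatchSketch {
    static final int MAX_DOCUMENT_REQUEST = 5;

    // Correct: bounded by the batch actually delivered.
    static int processDocuments(String[] documentIdentifiers) {
        int processed = 0;
        for (int i = 0; i < documentIdentifiers.length; i++) {
            // ... fetch and index documentIdentifiers[i] here ...
            processed++;
        }
        return processed;
    }

    // Buggy variant: bounded by the configured maximum. On a final
    // batch of 3 documents this throws ArrayIndexOutOfBoundsException;
    // a wrong bound in the other direction instead leaves tail
    // documents permanently "active", matching the symptom above.
    static int processDocumentsBuggy(String[] documentIdentifiers) {
        int processed = 0;
        for (int i = 0; i < MAX_DOCUMENT_REQUEST; i++) {
            String id = documentIdentifiers[i]; // out of bounds on short batches
            processed++;
        }
        return processed;
    }
}
```

This matches the reported symptom (exactly getMaxDocumentRequest documents left "active"): the last, short batch is the first one where the delivered array length and the configured maximum differ.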