When something goes wrong during document processing, the document is
usually pushed back onto the document queue with a processing time some
distance in the future.  How far in the future depends on the details of
exception handling within the connector.  But a FATAL Error or
RuntimeException, in this case a linkage error (which is what you have
here - the right version of a jar that you need is missing), will not
always recover properly.  I will need to look at the code to see exactly
what happens with non-exception Throwables in the worker thread, but the
problem with errors like that is that they are a broad class of problem:
they may mean that the database is not even working properly, or that you
cannot reach it, etc.  We probably are not going to be able to make
ManifoldCF robust against such errors, out-of-memory conditions, etc., and
it would probably not be a realistic goal either.
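
To illustrate why an Error behaves differently from an ordinary exception,
here is a minimal Java sketch.  It is not ManifoldCF code: the class
LinkageErrorDemo and the methods runLikeConnector and runLikeWorker are
made up for the example, and the "retry-later" handling is only an
assumption about what connector-level exception handling might look like.
The only facts relied on are from the JDK: NoSuchMethodError is a
LinkageError, which is an Error and not a RuntimeException.

```java
// Hypothetical demo class; none of these names come from ManifoldCF.
class LinkageErrorDemo {

    // Assumed connector-level handling: only RuntimeException is treated
    // as a recoverable failure that can be retried later.
    static String runLikeConnector(Runnable task) {
        try {
            task.run();
            return "ok";
        } catch (RuntimeException e) {
            return "retry-later";
        }
    }

    // Assumed outermost worker loop: anything that escapes the
    // connector-level handler is caught as a Throwable and only reported.
    static String runLikeWorker(Runnable task) {
        try {
            return runLikeConnector(task);
        } catch (Throwable t) {
            return "fatal: " + t.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        Runnable badState = () -> { throw new IllegalStateException("transient"); };
        Runnable badJar = () -> { throw new NoSuchMethodError("missing method"); };

        System.out.println(runLikeWorker(badState));
        System.out.println(runLikeWorker(badJar));
        // NoSuchMethodError sits under LinkageError in the JDK hierarchy.
        System.out.println(new NoSuchMethodError() instanceof LinkageError);
    }
}
```

In this sketch the IllegalStateException takes the "retry-later" path,
while the NoSuchMethodError bypasses it and is only seen by the outer
catch (Throwable), so any requeue logic tied to the inner handler never
runs for it.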

Karl


On Wed, Mar 23, 2022 at 12:10 PM Julien Massiera <
julien.massi...@francelabs.com> wrote:

> Yes, sorry, I described 1/ incorrectly: the runtime exception happens in
> processDocuments, on the first and only document found in the seeding
> phase.
>
> Here is the stack trace:
>
> WARN 2022-03-22T13:07:03,761 (Worker thread '13') -
> MCF|MCF-agent|apache.manifoldcf.connectors|JCIFS: Possibly transient
> exception detected on attempt 1 while checking if file
> smb://localhost/test/ exists: null
> jcifs.smb.SmbException: null
>         at jcifs.smb.SmbTreeImpl.waitForState(SmbTreeImpl.java:774)
> ~[jcifs-ng-2.1.7.jar:?]
>         at jcifs.smb.SmbTreeImpl.treeConnect(SmbTreeImpl.java:540)
> ~[jcifs-ng-2.1.7.jar:?]
>         at
> jcifs.smb.SmbTreeConnection.connectTree(SmbTreeConnection.java:614)
> ~[jcifs-ng-2.1.7.jar:?]
>         at
> jcifs.smb.SmbTreeConnection.connectHost(SmbTreeConnection.java:568)
> ~[jcifs-ng-2.1.7.jar:?]
>         at
> jcifs.smb.SmbTreeConnection.connectHost(SmbTreeConnection.java:489)
> ~[jcifs-ng-2.1.7.jar:?]
>         at jcifs.smb.SmbTreeConnection.connect(SmbTreeConnection.java:465)
> ~[jcifs-ng-2.1.7.jar:?]
>         at
> jcifs.smb.SmbTreeConnection.connectWrapException(SmbTreeConnection.java:426)
> ~[jcifs-ng-2.1.7.jar:?]
>         at jcifs.smb.SmbFile.ensureTreeConnected(SmbFile.java:559)
> ~[jcifs-ng-2.1.7.jar:?]
>         at jcifs.smb.SmbFile.exists(SmbFile.java:859)
> ~[jcifs-ng-2.1.7.jar:?]
>         at
> com.francelabs.datafari.connectors.share.SharedDriveConnector.fileExists(SharedDriveConnector.java:2129)
> [datafari-share-connector-6.0-dev-Community.jar:?]
>         at
> com.francelabs.datafari.connectors.share.SharedDriveConnector.processDocuments(SharedDriveConnector.java:677)
> [datafari-share-connector-6.0-dev-Community.jar:?]
>         at
> org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399)
> [mcf-pull-agent.jar:?]
> Caused by: java.lang.InterruptedException
>         at java.lang.Object.wait(Native Method) ~[?:?]
>         at java.lang.Object.wait(Object.java:328) ~[?:?]
>         at jcifs.smb.SmbTreeImpl.waitForState(SmbTreeImpl.java:771)
> ~[jcifs-ng-2.1.7.jar:?]
>         ... 11 more
> FATAL 2022-03-22T13:07:03,777 (Worker thread '13') -
> MCF|MCF-agent|apache.manifoldcf.crawlerthreads|Error tossed: 'boolean
> org.bouncycastle.asn1.ASN1ObjectIdentifier.equals(org.bouncycastle.asn1.ASN1Primitive)'
> java.lang.NoSuchMethodError: 'boolean
> org.bouncycastle.asn1.ASN1ObjectIdentifier.equals(org.bouncycastle.asn1.ASN1Primitive)'
>         at jcifs.spnego.NegTokenInit.parse(NegTokenInit.java:167) ~[?:?]
>         at jcifs.spnego.NegTokenInit.<init>(NegTokenInit.java:66) ~[?:?]
>         at
> jcifs.smb.NtlmPasswordAuthenticator.createContext(NtlmPasswordAuthenticator.java:243)
> ~[?:?]
>         at jcifs.smb.SmbSessionImpl.createContext(SmbSessionImpl.java:706)
> ~[?:?]
>         at
> jcifs.smb.SmbSessionImpl.sessionSetupSMB2(SmbSessionImpl.java:544) ~[?:?]
>         at jcifs.smb.SmbSessionImpl.sessionSetup(SmbSessionImpl.java:491)
> ~[?:?]
>         at jcifs.smb.SmbSessionImpl.send(SmbSessionImpl.java:369) ~[?:?]
>         at jcifs.smb.SmbSessionImpl.send(SmbSessionImpl.java:347) ~[?:?]
>         at jcifs.smb.SmbTreeImpl.treeConnect(SmbTreeImpl.java:611) ~[?:?]
>         at
> jcifs.smb.SmbTreeConnection.connectTree(SmbTreeConnection.java:614) ~[?:?]
>         at
> jcifs.smb.SmbTreeConnection.connectHost(SmbTreeConnection.java:568) ~[?:?]
>         at
> jcifs.smb.SmbTreeConnection.connectHost(SmbTreeConnection.java:489) ~[?:?]
>         at jcifs.smb.SmbTreeConnection.connect(SmbTreeConnection.java:465)
> ~[?:?]
>         at
> jcifs.smb.SmbTreeConnection.connectWrapException(SmbTreeConnection.java:426)
> ~[?:?]
>         at jcifs.smb.SmbFile.ensureTreeConnected(SmbFile.java:559) ~[?:?]
>         at jcifs.smb.SmbFile.exists(SmbFile.java:859) ~[?:?]
>         at
> com.francelabs.datafari.connectors.share.SharedDriveConnector.fileExists(SharedDriveConnector.java:2129)
> ~[?:?]
>         at
> com.francelabs.datafari.connectors.share.SharedDriveConnector.processDocuments(SharedDriveConnector.java:677)
> ~[?:?]
>         at
> org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399)
> [mcf-pull-agent.jar:?]
>
>
> So after that FATAL exception, we end up in the 'wait' state of the
> documentQueue
>
> Julien
>
> -----Original Message-----
> From: Karl Wright <daddy...@gmail.com>
> Sent: Wednesday, March 23, 2022 16:32
> To: dev <dev@manifoldcf.apache.org>
> Subject: Re: WorkerThread runtime exceptions
>
> Specifically, the stuffer thread is responsible for finding documents to
> process and getting them to the worker threads via the internal queue that
> the worker threads wait on.  The stuffer thread uses a query to do this.
> Either the query is not finding any documents, or the stuffer thread is
> down.  Probably it is the former, and the reason it is not finding any
> documents is because the job is in the wrong state due to that runtime
> exception.
>
> Can you describe what code is throwing that runtime exception?  It would
> be very helpful if you could provide a stack trace for it from the log.
>
> Karl
>
>
> On Wed, Mar 23, 2022 at 11:27 AM Karl Wright <daddy...@gmail.com> wrote:
>
> > ' 1/ On the first and only one document of the seeding phase
> > encountered, a runtime exception is triggered'
> >
> > The worker threads do not handle seeding.  If a runtime exception
> > takes place during seeding, no documents will be queued, and that is
> > the problem.  The state of the job must be incorrectly updated even
> > though the seeding failed.  OR the job's state is properly updated but
> > the corresponding thread that is supposed to know when the job is
> > completed (by looking at the job queue) doesn't properly trigger.
> >
> > The architecture of ManifoldCF has many threads that are individually
> > responsible for transitioning the job state based on the jobqueue.  If
> > somehow the jobstate winds up not in the right state then those
> > threads will not do the right thing.
> >
> > Karl
> >
> >
> > On Wed, Mar 23, 2022 at 11:08 AM Julien Massiera <
> > julien.massi...@francelabs.com> wrote:
> >
> >> Hi Karl,
> >>
> >> I had some time to investigate the problem I exposed in my first
> >> mail, and here is the behavior I observed:
> >>
> >> 1/ On the first and only one document of the seeding phase
> >>    encountered, a runtime exception is triggered.
> >> 2/ The runtime exception is caught by the WorkerThread, logged, and
> >>    the WorkerThread stays alive (line 856 of the WorkerThread class).
> >> 3/ The WorkerThread calls the getDocument method of its documentQueue
> >>    (line 121 of the WorkerThread class).
> >> 4/ The documentQueue ends up in an infinite 'wait' state because the
> >>    queue size is 0 and the resetFlag is false (lines 109 and 110 of
> >>    the DocumentQueue class).
> >> 5/ Because of the infinite 'wait' state of the documentQueue, the job
> >>    stays frozen in the 'running' state and it is impossible to stop
> >>    it until the Agent is restarted.
> >>
> >> I don't know much about the WorkerThread and the DocumentQueue logic,
> >> so from there, I really need your help to understand this behavior
> >> and to figure out what can be done to prevent the job from hanging in
> >> that case, which, I assume, can happen in other circumstances with
> >> other repository connectors
> >>
> >> Regards,
> >> Julien
> >>
> >> -----Original Message-----
> >> From: Julien Massiera <julien.massi...@francelabs.com>
> >> Sent: Thursday, February 24, 2022 15:08
> >> To: dev@manifoldcf.apache.org
> >> Subject: RE: WorkerThread runtime exceptions
> >>
> >> Yes I understand
> >>
> >> -----Original Message-----
> >> From: Karl Wright <daddy...@gmail.com>
> >> Sent: Thursday, February 24, 2022 14:59
> >> To: dev <dev@manifoldcf.apache.org>
> >> Subject: Re: WorkerThread runtime exceptions
> >>
> >> I'm currently completely consumed with upgrading dependencies for
> >> Tika and CXF.  This is a massive job and won't be done for probably
> >> another week or two.  Once that is done I can try to look into your
> >> concern.
> >>
> >> Karl
> >>
> >>
> >> On Thu, Feb 24, 2022 at 8:47 AM Julien Massiera <
> >> julien.massi...@francelabs.com> wrote:
> >>
> >> > Hi,
> >> >
> >> >
> >> >
> >> > I have faced a situation where the MCF agent was still up but was
> >> > not doing anything after a runtime exception.
> >> >
> >> >
> >> >
> >> > My use case was the following:
> >> > I updated the libs used by a repository connector but forgot one.
> >> > During doc processing, a runtime exception
> >> > 'java.lang.NoSuchMethodError' was thrown because the sub-dependency
> >> > lib was not up to date and thus the method called was missing. The
> >> > exception was caught by the WorkerThread and displayed as
> >> > 'Error tossed: ..', but then nothing happened: the job stayed in
> >> > running status and I was not able to abort it until I killed and
> >> > restarted the agent.
> >> >
> >> >
> >> >
> >> > The catch clause is located in the WorkerThread class at lines
> >> > 853-857. I know this is a particular case, but I am not sure that
> >> > the agent hanging after this exception is normal behavior, and
> >> > furthermore I can imagine that it can happen with other unknown
> >> > runtime exceptions. Is there something we can do to prevent the
> >> > agent from hanging in those cases?
> >> >
> >> >
> >> >
> >> > Regards,
> >> >
> >> > Julien
> >> >
> >> >
> >> >
> >> >
> >>
> >>
> >>
>
>
