[ https://issues.apache.org/jira/browse/CONNECTORS-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Karl Wright resolved CONNECTORS-1724.
-------------------------------------
    Fix Version/s: ManifoldCF next
       Resolution: Fixed

r1903673

> When the REST API cannot be connected, job using the Generic Repository Connector would be freezed.
> ---------------------------------------------------------------------------------------------------
>
>                 Key: CONNECTORS-1724
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-1724
>             Project: ManifoldCF
>          Issue Type: Bug
>            Reporter: Nguyen Huu Nhat
>            Assignee: Karl Wright
>            Priority: Major
>             Fix For: ManifoldCF next
>
>
> Hi there,
> There is an issue that occurs in practice and is still not handled, so I would like to suggest the following fix for the source code of the Generic Repository Connector.
> For details about this issue, please refer to the information below:
> h3. +*1. Connector name*+
> Generic Repository Connector
> h3. +*2. Issue*+
> When the Generic Repository Connector calls the REST API with _action=seed_ and an error occurs, no error handling is executed. As a result, the ManifoldCF crawling job stays frozen at status *Starting up* and no error message is output.
> * When this issue happens in the Generic Repository Connector, the seed phase of jobs in other repositories also freezes (perhaps because the seed thread is also frozen).
> * Even after ManifoldCF is restarted, the same issue happens again, because jobs are executed automatically.
> * A temporary workaround is to abort the job and recheck the connection.
> h3. +*3. Reproduction*+
> h4. *Reproduction method:*
> * When configuring the Generic Repository connection, set a non-existent entry point (e.g. [http://localhost/no*exist/]). Then define a job that uses that entry point and run it.
> * Even 10 minutes or more after the job starts, its status is still *Starting up*; the job never ends abnormally with a connection error or time-out.
> h4. *Reproduction steps:*
> * Create a Generic Repository connection with the following settings:
> ** On the *Entry Point* tab, set a non-existent entry point (e.g. [http://localhost/no*exist/])
> * Create a job using the above Generic Repository connection
> * Start the created job and keep track of its status
> ** The job freezes with the following information:
> *** Status: Starting up
> *** Start Time: Not started
> *** Documents: 0
> ** No new events appear in *Document Status*
> ** No errors are logged in manifoldcf.log
> h3. +*4. Cause*+
> In the *GenericConnector$ExecuteSeedingThread* class, the *seedBuffer.signalDone()* method is only called when the returned HTTP status code is 200.
> * When the connector cannot reach the REST API, the returned HTTP status code is not 200, so *seedBuffer.signalDone()* is never called.
> ** As a result, the *complete* flag is never set to _true_.
> ** Because the *complete* flag stays _false_ and *buffer.size()* is 0, the job is stuck in the *wait()* call inside the while loop of the *XThreadStringBuffer#fetch()* method.
> ([https://github.com/apache/manifoldcf/blob/release-2.22.1/framework/connector-common/src/main/java/org/apache/manifoldcf/connectorcommon/common/XThreadStringBuffer.java#L78])
> {code:java}
> while (buffer.size() == 0 && !complete)
>   wait();
> {code}
> ⇒ This is why the job is frozen at status *Starting up*.
> h3. +*5. Solution*+
> To resolve this issue, we suggest the following:
> * The *seedBuffer.signalDone()* method should be called regardless of the HTTP response status.
> * Moreover, when the HTTP status code is not 200, a ManifoldCFException is thrown. The *finishUp()* method of the *GenericConnector$ExecuteSeedingThread* class has no branch that handles ManifoldCFException, so handling for this exception should be added.
> h3. +*6. Suggested source code (based on release 2.22.1)*+
> [https://github.com/apache/manifoldcf/blob/release-2.22.1/connectors/generic/connector/src/main/java/org/apache/manifoldcf/crawler/connectors/generic/GenericConnector.java#L1151]
> {code:java}
> -      seedBuffer.signalDone();
>      } finally {
>        EntityUtils.consume(response.getEntity());
>        method.releaseConnection();
> +      seedBuffer.signalDone();
>      }
> {code}
> [https://github.com/apache/manifoldcf/blob/release-2.22.1/connectors/generic/connector/src/main/java/org/apache/manifoldcf/crawler/connectors/generic/GenericConnector.java#L1120]
> {code:java}
>    if (thr instanceof RuntimeException) {
>      throw (RuntimeException) thr;
>    } else if (thr instanceof Error) {
>      throw (Error) thr;
> +  } else if (thr instanceof ManifoldCFException) {
> +    throw (ManifoldCFException) thr;
>    } else {
>      throw new RuntimeException("Unhandled exception of type: " + thr.getClass().getName(), thr);
>    }
> {code}

--
This message was sent by Atlassian Jira (v8.20.10#820010)
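The hang described under "4. Cause" can be reproduced outside ManifoldCF with a small sketch. It rests on two assumptions. First, SimpleSeedBuffer is a hypothetical class that models only the fetch()/signalDone() handshake quoted from XThreadStringBuffer; it is not the ManifoldCF class. Second, the producer's "HTTP call" is simulated and always yields status 404 instead of issuing a real REST request. The consumer stays blocked in wait() when signalDone() runs only on the success path. It finishes once signalDone() moves into the finally block, as proposed in section 6.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical, simplified stand-in for XThreadStringBuffer: only the
// fetch()/signalDone() handshake quoted in the issue is modeled.
final class SimpleSeedBuffer {
  private final Deque<String> buffer = new ArrayDeque<>();
  private boolean complete = false;

  synchronized void add(String id) {
    buffer.addLast(id);
    notifyAll();
  }

  // Returns the next seed, or null once the producer has signalled completion.
  synchronized String fetch() throws InterruptedException {
    // Same condition as the quoted loop: with an empty buffer and no
    // signalDone(), the consumer waits forever.
    while (buffer.size() == 0 && !complete)
      wait();
    return buffer.pollFirst();
  }

  synchronized void signalDone() {
    complete = true;
    notifyAll();
  }
}

final class SeedingDemo {
  // Simulated producer; the status code is hard-wired (hypothetical).
  static void produce(SimpleSeedBuffer seedBuffer, boolean signalInFinally) throws Exception {
    try {
      int statusCode = 404; // pretend the REST API answered with an error
      if (statusCode != 200) {
        throw new Exception("Unexpected HTTP status: " + statusCode);
      }
      seedBuffer.add("doc-1");
      if (!signalInFinally) {
        seedBuffer.signalDone(); // original placement: skipped on error
      }
    } finally {
      if (signalInFinally) {
        seedBuffer.signalDone(); // suggested placement: runs on every path
      }
    }
  }

  // Returns true if the consumer drained the buffer, false if it hung.
  static boolean consumerFinishes(boolean signalInFinally) throws InterruptedException {
    SimpleSeedBuffer seedBuffer = new SimpleSeedBuffer();
    Thread producer = new Thread(() -> {
      try {
        produce(seedBuffer, signalInFinally);
      } catch (Exception e) {
        // The real connector would report this from finishUp(); ignored here.
      }
    });
    Thread consumer = new Thread(() -> {
      try {
        while (seedBuffer.fetch() != null) {
          // Drain seeds; none arrive because the simulated request failed.
        }
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });
    producer.start();
    consumer.start();
    producer.join();
    consumer.join(2000); // give the consumer up to two seconds
    boolean finished = !consumer.isAlive();
    if (!finished) {
      consumer.interrupt(); // demo-only escape hatch from the stuck wait()
      consumer.join();
    }
    return finished;
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println("signalDone() only on success: finished = " + consumerFinishes(false));
    System.out.println("signalDone() in finally:      finished = " + consumerFinishes(true));
  }
}
```

Running main() should print finished = false for the first variant and finished = true for the second. The timeout and interrupt exist only so the demo can terminate.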
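The second change in section 6 concerns how finishUp() rethrows the exception recorded by the seeding thread. The sketch below mirrors that if/else chain under stated assumptions. SeedingException is a hypothetical checked exception standing in for ManifoldCFException. SeedingWorker is a hypothetical thread that throws whatever failure it is given instead of calling the REST API. With the added branch, the checked exception reaches the caller unchanged; any other checked exception still falls through to the RuntimeException wrapper.

```java
// Hypothetical checked exception standing in for ManifoldCFException.
final class SeedingException extends Exception {
  SeedingException(String message) {
    super(message);
  }
}

// Hypothetical worker modeled on the pattern in the issue: run() records
// any Throwable, and finishUp() joins the thread and rethrows it.
final class SeedingWorker extends Thread {
  private final Throwable failure; // what the simulated run() will throw
  private volatile Throwable exception = null;

  SeedingWorker(Throwable failure) {
    this.failure = failure;
  }

  @Override
  public void run() {
    try {
      if (failure != null) {
        throw failure;
      }
    } catch (Throwable e) {
      exception = e; // remembered for finishUp()
    }
  }

  void finishUp() throws InterruptedException, SeedingException {
    join();
    Throwable thr = exception;
    if (thr != null) {
      if (thr instanceof RuntimeException) {
        throw (RuntimeException) thr;
      } else if (thr instanceof Error) {
        throw (Error) thr;
      } else if (thr instanceof SeedingException) {
        // The proposed branch: surface the checked exception as-is.
        throw (SeedingException) thr;
      } else {
        throw new RuntimeException("Unhandled exception of type: " + thr.getClass().getName(), thr);
      }
    }
  }
}

final class FinishUpDemo {
  // Describes what finishUp() throws for a given simulated failure.
  static String outcome(Throwable failure) throws InterruptedException {
    SeedingWorker worker = new SeedingWorker(failure);
    worker.start();
    try {
      worker.finishUp();
      return "none";
    } catch (SeedingException e) {
      return "SeedingException: " + e.getMessage();
    } catch (RuntimeException e) {
      return "RuntimeException";
    }
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println(outcome(null));
    System.out.println(outcome(new SeedingException("HTTP status 404")));
    System.out.println(outcome(new java.io.IOException("other")));
  }
}
```

Here outcome(new SeedingException("HTTP status 404")) returns "SeedingException: HTTP status 404". Without the SeedingException branch, the same call would return "RuntimeException", because the exception would fall through to the "Unhandled exception of type" wrapper.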