Nguyen Huu Nhat created CONNECTORS-1724:
-------------------------------------------
             Summary: When the REST API cannot be connected, job using the Generic Repository Connector would be freezed.
                 Key: CONNECTORS-1724
                 URL: https://issues.apache.org/jira/browse/CONNECTORS-1724
             Project: ManifoldCF
          Issue Type: Bug
            Reporter: Nguyen Huu Nhat

Hi there,

While using the Generic Repository Connector, we found an error case that is not handled, so I would like to suggest the following fix to its source code. Details are given below.

h3. +*1. Connector name*+

Generic Repository Connector

h3. +*2. Issue*+

When the Generic Repository Connector calls the REST API with _action=seed_ and an error occurs, the corresponding error handling is not executed. As a result, the ManifoldCF crawling job freezes at status *Starting up* and no error message is output.
 * When this issue happens in the Generic Repository Connector, the seed phase of jobs in other repositories also freezes (perhaps the seed thread itself is frozen).
 * Even after ManifoldCF is restarted, the same issue happens again because jobs are executed automatically.
 * A temporary workaround is to abort the job and recheck the connection.

h3. +*3. Reproduction*+

h4. *Reproduction method:*
 * When configuring the Generic repository connection, set a non-existent entry point (e.g. [http://localhost/no*exist/]). Then define a job that uses that connection and run it.
 * 10 minutes or more after the job is started, its status is still *Starting up*; the job does not end abnormally with a connection error or a time-out.

h4. *Reproduction steps:*
 * Create a Generic repository connection with the following settings:
 ** On the *Entry Point* tab, set a non-existent entry point (e.g. [http://localhost/no*exist/])
 * Create a job using the above Generic repository connection
 * Start the created job and keep track of its status
 ** The job freezes with the following information:
 *** Status: Starting up
 *** Start Time: Not started
 *** Documents: 0
 ** No new events appear in *Document Status*
 ** No errors are logged in manifoldcf.log

h3. +*4. Cause*+

In the *GenericConnector$ExecuteSeedingThread* class, the *seedBuffer.signalDone()* method is called only when the returned HTTP status code is 200.
 * When the connector is not able to connect to the REST API, which means that the returned HTTP status code is not 200, *seedBuffer.signalDone()* is not called.
 ** As a result, the *complete* flag is never set to _true_.
 ** Because *complete* is never set to _true_ and *buffer.size()* is 0, the job is stuck in *wait()* inside the while loop of the *XThreadStringBuffer#fetch()* method.
([https://github.com/apache/manifoldcf/blob/release-2.22.1/framework/connector-common/src/main/java/org/apache/manifoldcf/connectorcommon/common/XThreadStringBuffer.java#L78])
{code:java}
while (buffer.size() == 0 && !complete)
  wait();
{code}
⇒ This is why the job is frozen at status *Starting up*.

h3. +*5. Solution*+

To resolve this issue, we suggest the following:
 * *seedBuffer.signalDone()* should be called for every HTTP response status.
 * Moreover, when the HTTP status code is not 200, a ManifoldCFException is thrown. The *finishUp()* method of the *GenericConnector$ExecuteSeedingThread* class does not handle ManifoldCFException, so handling for this exception should be added.

h3. +*6. Suggested source code (based on release 2.22.1)*+

[https://github.com/apache/manifoldcf/blob/release-2.22.1/connectors/generic/connector/src/main/java/org/apache/manifoldcf/crawler/connectors/generic/GenericConnector.java#L1151]
{code:java}
-            seedBuffer.signalDone();
           } finally {
             EntityUtils.consume(response.getEntity());
             method.releaseConnection();
+            seedBuffer.signalDone();
           }
{code}
[https://github.com/apache/manifoldcf/blob/release-2.22.1/connectors/generic/connector/src/main/java/org/apache/manifoldcf/crawler/connectors/generic/GenericConnector.java#L1120]
{code:java}
         if (thr instanceof RuntimeException) {
           throw (RuntimeException) thr;
         } else if (thr instanceof Error) {
           throw (Error) thr;
+        } else if (thr instanceof ManifoldCFException) {
+          throw (ManifoldCFException) thr;
         } else {
           throw new RuntimeException("Unhandled exception of type: " + thr.getClass().getName(), thr);
         }
{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
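For readers who want to see the mechanism from sections 4 and 5 in isolation, the following is a minimal, self-contained sketch and not the ManifoldCF code. The class {{SeedBufferSketch}}, the method {{seedAndCount}}, and the document ids are hypothetical; the buffer only imitates the wait loop quoted above, and {{IllegalStateException}} stands in for ManifoldCFException. It assumes that a non-200 status makes the seeding thread fail before any seed is added, and shows that calling {{signalDone()}} in a {{finally}} block lets the consumer return instead of waiting forever.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.atomic.AtomicReference;

/** Simplified stand-in for a cross-thread seed buffer (hypothetical, not the ManifoldCF class). */
final class SeedBufferSketch {
    private final Queue<String> buffer = new ArrayDeque<>();
    private boolean complete = false;

    synchronized void add(String id) {
        buffer.add(id);
        notifyAll();
    }

    /** Marks the end of seeding and wakes any thread blocked in fetch(). */
    synchronized void signalDone() {
        complete = true;
        notifyAll();
    }

    /** Returns the next seed, or null once the producer is done and the buffer is empty. */
    synchronized String fetch() throws InterruptedException {
        // Same loop shape as quoted in section 4: without signalDone() this never exits.
        while (buffer.size() == 0 && !complete)
            wait();
        return buffer.poll();
    }

    /** Runs a simulated seeding thread for the given HTTP status and counts the fetched seeds. */
    static int seedAndCount(int httpStatus, AtomicReference<Throwable> failure) {
        SeedBufferSketch seedBuffer = new SeedBufferSketch();
        Thread seeder = new Thread(() -> {
            try {
                if (httpStatus != 200)
                    throw new IllegalStateException("Unexpected HTTP status " + httpStatus);
                seedBuffer.add("doc-1");
                seedBuffer.add("doc-2");
            } catch (RuntimeException e) {
                failure.set(e); // keep the error so the caller can report it
            } finally {
                seedBuffer.signalDone(); // called for every outcome, as proposed in section 5
            }
        });
        seeder.start();
        int count = 0;
        try {
            while (seedBuffer.fetch() != null)
                count++;
            seeder.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while fetching seeds", e);
        }
        return count;
    }

    public static void main(String[] args) {
        AtomicReference<Throwable> failure = new AtomicReference<>();
        System.out.println(seedAndCount(200, failure) + " seeds, failure=" + failure.get());
        System.out.println(seedAndCount(404, failure) + " seeds, failure=" + failure.get());
    }
}
```

If the {{signalDone()}} call is moved back inside the {{try}} block after the status check, the second call in {{main}} blocks indefinitely in {{fetch()}}, which corresponds to the *Starting up* freeze described in this issue.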