[
https://issues.apache.org/jira/browse/CONNECTORS-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Karl Wright reassigned CONNECTORS-1724:
---------------------------------------
Assignee: Karl Wright
> When the REST API cannot be reached, a job using the Generic Repository
> Connector freezes.
> ---------------------------------------------------------------------------------------------------
>
> Key: CONNECTORS-1724
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1724
> Project: ManifoldCF
> Issue Type: Bug
> Reporter: Nguyen Huu Nhat
> Assignee: Karl Wright
> Priority: Major
>
> Hi there,
> An error case that is still not handled occurs in practice, so I would like to
> suggest the following fix to the source code of the Generic Repository Connector.
> For details about this issue, please refer to the information below:
> h3. +*1. Connector name*+
> Generic Repository Connector
> h3. +*2. Issue*+
> When the Generic Repository Connector calls the REST API with _action=seed_ and
> an error occurs, the corresponding error handling is not executed. As a result,
> the ManifoldCF crawling job freezes at status *Starting up* and no error
> message is output.
> * When this issue happens in the Generic Repository Connector, the seed phase
> of jobs in other repositories also freezes (perhaps the seeding thread is also
> frozen).
> * Even after ManifoldCF is restarted, the same issue happens again because
> jobs are executed automatically.
> * A temporary workaround is to abort the job and recheck the connection.
> h3. +*3. Reproduction*+
> h4. *Reproduction method:*
> * When configuring the Generic repository connection, set a non-existent
> entry point (e.g. [http://localhost/no*exist/]). Then define a job that uses
> that entry point and run that job.
> * 10 minutes or more after the job is started, its status is still
> *Starting up*; the job does not end abnormally with a connection error or
> time-out.
> h4. *Reproduction steps:*
> * Create a Generic repository connection with the following settings:
> ** On the *Entry Point* tab, set a non-existent entry point (e.g.
> [http://localhost/no*exist/])
> * Create a job using the above Generic repository connection
> * Start the created job and keep track of its status
> ** The job freezes with the following information:
> *** Status: Starting up
> *** Start Time: Not started
> *** Documents: 0
> ** No new events appear in *Document Status*
> ** No errors get logged in manifoldcf.log
> h3. +*4. Cause*+
> In the *GenericConnector$ExecuteSeedingThread* class, the
> *seedBuffer.signalDone()* method is called only when the returned HTTP status
> code is 200.
> * When the connector cannot reach the REST API (that is, the returned HTTP
> status code is not 200), the *seedBuffer.signalDone()* method is not called.
> ** As a result, the *complete* flag is never set to _true_.
> ** Because the *complete* flag is not _true_ and *buffer.size()* is 0, the
> job is stuck in *wait()* inside the while loop of the
> *XThreadStringBuffer#fetch()* method.
> ([https://github.com/apache/manifoldcf/blob/release-2.22.1/framework/connector-common/src/main/java/org/apache/manifoldcf/connectorcommon/common/XThreadStringBuffer.java#L78])
> {code:java}
> while (buffer.size() == 0 && !complete)
>   wait();
> {code}
> ⇒ This is why the job is frozen at status *Starting up*.
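
The blocking behaviour described in section 4 can be reproduced outside ManifoldCF with a small sketch. `SimpleSeedBuffer` below is a hypothetical stand-in, not the real `XThreadStringBuffer`; only the `while (buffer.size() == 0 && !complete) wait();` loop is taken from the issue text, and the rest (method bodies, the simulated failure, the timeout helper) is assumed for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class SeedBufferDemo {

  /** Hypothetical stand-in for XThreadStringBuffer; only the wait loop comes from the issue. */
  static class SimpleSeedBuffer {
    private final List<String> buffer = new ArrayList<>();
    private boolean complete = false;

    synchronized void add(String s) {
      buffer.add(s);
      notifyAll();
    }

    synchronized void signalDone() {
      complete = true;
      notifyAll();
    }

    /** Returns the next string, or null once the producer is done and the buffer is empty. */
    synchronized String fetch() throws InterruptedException {
      while (buffer.size() == 0 && !complete)
        wait();
      if (buffer.size() == 0)
        return null;
      return buffer.remove(0);
    }
  }

  /**
   * Runs a producer that fails before adding any seed.
   * Returns true if the consumer leaves fetch() within timeoutMs.
   */
  static boolean consumerFinishes(boolean signalOnFailure, long timeoutMs)
      throws InterruptedException {
    SimpleSeedBuffer seedBuffer = new SimpleSeedBuffer();
    Thread producer = new Thread(() -> {
      try {
        // Simulated failure, standing in for a non-200 response.
        throw new IllegalStateException("simulated non-200 response");
      } catch (IllegalStateException e) {
        // Swallowed here; this demo only looks at the buffer signalling.
      } finally {
        if (signalOnFailure)
          seedBuffer.signalDone();
      }
    });
    Thread consumer = new Thread(() -> {
      try {
        while (seedBuffer.fetch() != null) {
          // A real consumer would register the seed here.
        }
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });
    consumer.setDaemon(true);
    producer.start();
    consumer.start();
    consumer.join(timeoutMs);
    boolean finished = !consumer.isAlive();
    consumer.interrupt(); // release the consumer if it is still waiting
    return finished;
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println("signalDone skipped on failure: finished="
        + consumerFinishes(false, 500));
    System.out.println("signalDone in finally:         finished="
        + consumerFinishes(true, 500));
  }
}
```

In this sketch the consumer only leaves `fetch()` when the producer calls `signalDone()` on the failure path as well, which is the same condition the issue identifies for the seeding thread.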
> h3. +*5. Solution*+
> To resolve this issue, we suggest the following:
> * The *seedBuffer.signalDone()* method should be called for every HTTP
> response status.
> * Moreover, when the HTTP status code is not 200, a ManifoldCFException is
> thrown. The *finishUp()* method of the *GenericConnector$ExecuteSeedingThread*
> class does not handle ManifoldCFException, so handling for this exception
> should be added.
> h3. +*6. Suggested source code (based on release 2.22.1)*+
> [https://github.com/apache/manifoldcf/blob/release-2.22.1/connectors/generic/connector/src/main/java/org/apache/manifoldcf/crawler/connectors/generic/GenericConnector.java#L1151]
> {code:java}
> - seedBuffer.signalDone();
> } finally {
> EntityUtils.consume(response.getEntity());
> method.releaseConnection();
> + seedBuffer.signalDone();
> }
> {code}
> [https://github.com/apache/manifoldcf/blob/release-2.22.1/connectors/generic/connector/src/main/java/org/apache/manifoldcf/crawler/connectors/generic/GenericConnector.java#L1120]
> {code:java}
> if (thr instanceof RuntimeException) {
> throw (RuntimeException) thr;
> } else if (thr instanceof Error) {
> throw (Error) thr;
> + } else if (thr instanceof ManifoldCFException) {
> + throw (ManifoldCFException) thr;
> } else {
> throw new RuntimeException("Unhandled exception of type: " +
> thr.getClass().getName(), thr);
> }
> {code}
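
To illustrate why the extra `instanceof` branch matters, here is a minimal, self-contained sketch of the capture-and-rethrow pattern that the second snippet modifies. It is not the connector's code: `ConnectorException` is a hypothetical stand-in for `ManifoldCFException`, and `SeedingThread`, its `exception` field, and `runOnce()` are assumed names used only for this example.

```java
public class FinishUpDemo {

  /** Hypothetical stand-in for ManifoldCFException (a checked exception). */
  static class ConnectorException extends Exception {
    ConnectorException(String message) {
      super(message);
    }
  }

  /** Worker thread that records a failure instead of letting it escape run(). */
  static class SeedingThread extends Thread {
    private volatile Throwable exception;

    @Override
    public void run() {
      try {
        // Simulated failure, standing in for a non-200 response.
        throw new ConnectorException("simulated HTTP 404");
      } catch (Throwable t) {
        exception = t;
      }
    }

    /** Waits for the thread and rethrows the recorded failure with its original type. */
    void finishUp() throws InterruptedException, ConnectorException {
      join();
      Throwable thr = exception;
      if (thr == null) {
        return;
      }
      if (thr instanceof RuntimeException) {
        throw (RuntimeException) thr;
      } else if (thr instanceof Error) {
        throw (Error) thr;
      } else if (thr instanceof ConnectorException) {
        // Without this branch the checked exception would fall through to
        // the generic RuntimeException wrapper below.
        throw (ConnectorException) thr;
      } else {
        throw new RuntimeException("Unhandled exception of type: "
            + thr.getClass().getName(), thr);
      }
    }
  }

  /** Starts the worker and reports how its failure surfaces to the caller. */
  static String runOnce() throws InterruptedException {
    SeedingThread t = new SeedingThread();
    t.start();
    try {
      t.finishUp();
      return "ok";
    } catch (ConnectorException e) {
      return "ConnectorException: " + e.getMessage();
    }
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println(runOnce());
  }
}
```

With the typed branch in place, the caller in this sketch can catch the connector-specific exception directly rather than unwrapping a generic `RuntimeException`.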
--
This message was sent by Atlassian Jira
(v8.20.10#820010)