[ 
https://issues.apache.org/jira/browse/CONNECTORS-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karl Wright resolved CONNECTORS-1724.
-------------------------------------
    Fix Version/s: ManifoldCF next
       Resolution: Fixed

r1903673

> When the REST API cannot be reached, jobs using the Generic Repository 
> Connector freeze.
> ---------------------------------------------------------------------------------------------------
>
>                 Key: CONNECTORS-1724
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-1724
>             Project: ManifoldCF
>          Issue Type: Bug
>            Reporter: Nguyen Huu Nhat
>            Assignee: Karl Wright
>            Priority: Major
>             Fix For: ManifoldCF next
>
>
> Hi there,
> We have run into an issue in the Generic Repository Connector that is still 
> unhandled, and I would like to suggest a fix to its source code.
> Details are below:
> h3. +*1. Connector name*+
> Generic Repository Connector
> h3. +*2. Issue*+
> When the Generic Repository Connector calls the REST API with _action=seed_ 
> and an error occurs, the corresponding error handling is not executed. As a 
> result, the ManifoldCF crawling job freezes at status *Starting up* and no 
> error message is output.
>  * When this issue happens in the Generic Repository Connector, the seed 
> phase of jobs using other repositories also freezes (perhaps the seed thread 
> itself is frozen).
>  * Even after ManifoldCF is restarted, the same issue happens again because 
> jobs are executed automatically.
>  * A temporary workaround is to abort the job and recheck the connection.
> h3. +*3. Reproduction*+
> h4. *Reproduction method:*
>  * When configuring the Generic repository connection, set a non-existent 
> entry point (e.g. [http://localhost/no*exist/]). Then define a job that uses 
> that connection and run it.
>  * 10 minutes or more after the job is started, its status is still 
> *Starting up*; the job does not end abnormally with a connection error or 
> time-out.
> h4. *Reproduction steps:*
>  * Create a Generic repository connection with the following settings:
>  ** On the *Entry Point* tab, set a non-existent entry point (e.g. 
> [http://localhost/no*exist/])
>  * Create a job using the above Generic repository connection
>  * Start the created job and keep track of its status
>  ** The job freezes with the following information:
>  *** Status: Starting up
>  *** Start Time: Not started
>  *** Documents: 0
>  ** No new events appear in *Document Status*
>  ** No errors get logged in manifoldcf.log
> h3. +*4. Cause*+
> In the *GenericConnector$ExecuteSeedingThread* class, the 
> *seedBuffer.signalDone()* method is only called when the returned HTTP 
> status code is 200.
>  * When the connector is not able to connect to the REST API, which means 
> that the returned HTTP status code is not 200, *seedBuffer.signalDone()* is 
> not called.
>  ** As a result, the *complete* flag is never set to _true_.
>  ** Because *complete* is still _false_ and *buffer.size()* is 0, the job 
> is stuck in *wait()* inside the while loop of the 
> *XThreadStringBuffer#fetch()* method.
> ([https://github.com/apache/manifoldcf/blob/release-2.22.1/framework/connector-common/src/main/java/org/apache/manifoldcf/connectorcommon/common/XThreadStringBuffer.java#L78])
> {code:java}
>     while (buffer.size() == 0 && !complete)
>       wait();
> {code}
> ⇒ This is why the job freezes at status *Starting up*.
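
The wait condition quoted above can be reproduced in isolation. The following is a minimal sketch, assuming a simplified stand-in buffer: `SeedBufferSketch` and its `fetchReturns` helper are hypothetical names for illustration and are not the ManifoldCF `XThreadStringBuffer` class. It only shows that a consumer blocked in `fetch()` is released once `signalDone()` is called, and stays blocked otherwise.

```java
import java.util.ArrayDeque;
import java.util.Queue;

/** Simplified stand-in for a cross-thread buffer; NOT the ManifoldCF class. */
final class SeedBufferSketch {
  private final Queue<String> buffer = new ArrayDeque<>();
  private boolean complete = false;

  /** Producer side: add one seed identifier and wake any waiting consumer. */
  synchronized void add(String id) {
    buffer.add(id);
    notifyAll();
  }

  /** Producer side: mark the stream finished and wake any waiting consumer. */
  synchronized void signalDone() {
    complete = true;
    notifyAll();
  }

  /** Consumer side: next identifier, or null once done and empty. */
  synchronized String fetch() throws InterruptedException {
    while (buffer.size() == 0 && !complete)
      wait();
    return buffer.poll();
  }

  /**
   * Hypothetical helper: starts a consumer on an empty buffer and reports
   * whether its fetch() call returned within timeoutMs.
   */
  static boolean fetchReturns(boolean callSignalDone, long timeoutMs) {
    SeedBufferSketch sb = new SeedBufferSketch();
    Thread consumer = new Thread(() -> {
      try {
        sb.fetch();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });
    consumer.setDaemon(true);
    consumer.start();
    if (callSignalDone) {
      sb.signalDone();
    }
    try {
      consumer.join(timeoutMs);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    boolean finished = !consumer.isAlive();
    consumer.interrupt(); // release the consumer if it is still waiting
    return finished;
  }

  public static void main(String[] args) {
    System.out.println("with signalDone: " + fetchReturns(true, 2000));
    System.out.println("without signalDone: " + fetchReturns(false, 500));
  }
}
```

Without the `signalDone()` call the consumer never leaves the loop on its own, which matches the frozen *Starting up* state described in the report.
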
> h3. +*5. Solution*+
> To resolve this issue, we suggest the following:
>  * *seedBuffer.signalDone()* should be called regardless of the HTTP 
> response status.
>  * Moreover, when the HTTP status code is not 200, a ManifoldCFException is 
> thrown. The *finishUp()* method of *GenericConnector$ExecuteSeedingThread* 
> does not handle ManifoldCFException, so handling for this exception 
> should be added.
> h3. +*6. Suggested source code (based on release 2.22.1)*+
> [https://github.com/apache/manifoldcf/blob/release-2.22.1/connectors/generic/connector/src/main/java/org/apache/manifoldcf/crawler/connectors/generic/GenericConnector.java#L1151]
> {code:java}
> -         seedBuffer.signalDone();
>         } finally {
>           EntityUtils.consume(response.getEntity());
>           method.releaseConnection();
> +         seedBuffer.signalDone();
>         }
> {code}
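
The effect of moving the call into the `finally` block can be sketched in isolation. This is a simplified model under stated assumptions, not the connector code: `fetchSeeds` is a hypothetical stand-in for the seeding request, and a `CountDownLatch` stands in for `seedBuffer.signalDone()`.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

final class SeedingFinallySketch {

  /** Hypothetical stand-in for the seeding request; fails on any non-200 status. */
  static void fetchSeeds(int status) throws Exception {
    if (status != 200) {
      throw new Exception("Unexpected HTTP status: " + status);
    }
  }

  /**
   * Runs a seeding thread and reports whether the "done" signal arrived
   * within 500 ms. signalInFinally selects between the two placements.
   */
  static boolean consumerReleased(int status, boolean signalInFinally) {
    CountDownLatch done = new CountDownLatch(1); // stands in for signalDone()
    Thread seeder = new Thread(() -> {
      try {
        fetchSeeds(status);
        if (!signalInFinally) {
          done.countDown(); // success-path-only placement
        }
      } catch (Exception e) {
        // A real seeding thread would record the exception for later
        // rethrow; this sketch ignores it.
      } finally {
        if (signalInFinally) {
          done.countDown(); // runs on success and on failure
        }
      }
    });
    seeder.setDaemon(true);
    seeder.start();
    try {
      return done.await(500, TimeUnit.MILLISECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
  }

  public static void main(String[] args) {
    System.out.println("404, success-path only: " + consumerReleased(404, false));
    System.out.println("404, in finally:        " + consumerReleased(404, true));
  }
}
```

With the signal on the success path only, a failing request leaves the waiting side blocked; with the signal in `finally`, it is released in both cases.
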
> [https://github.com/apache/manifoldcf/blob/release-2.22.1/connectors/generic/connector/src/main/java/org/apache/manifoldcf/crawler/connectors/generic/GenericConnector.java#L1120]
> {code:java}
>         if (thr instanceof RuntimeException) {
>           throw (RuntimeException) thr;
>         } else if (thr instanceof Error) {
>           throw (Error) thr;
> +       } else if (thr instanceof ManifoldCFException) {
> +         throw (ManifoldCFException) thr;
>         } else {
>           throw new RuntimeException("Unhandled exception of type: " + thr.getClass().getName(), thr);
>         }
> {code}
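
The rethrow-by-type pattern in the second diff can be tried out on its own. The sketch below is illustrative only: `CrawlException` is a hypothetical stand-in for ManifoldCFException, and `rethrow` and `describe` are assumed helper names, not methods of the connector.

```java
final class RethrowSketch {

  /** Hypothetical checked exception standing in for ManifoldCFException. */
  static final class CrawlException extends Exception {
    CrawlException(String message) {
      super(message);
    }
  }

  /** Rethrows a throwable captured on another thread, preserving its type where possible. */
  static void rethrow(Throwable thr) throws CrawlException {
    if (thr == null) {
      return; // nothing was captured
    }
    if (thr instanceof RuntimeException) {
      throw (RuntimeException) thr;
    } else if (thr instanceof Error) {
      throw (Error) thr;
    } else if (thr instanceof CrawlException) {
      throw (CrawlException) thr; // the branch the suggested fix adds
    } else {
      throw new RuntimeException("Unhandled exception of type: " + thr.getClass().getName(), thr);
    }
  }

  /** Hypothetical helper: reports what the caller of rethrow() would observe. */
  static String describe(Throwable thr) {
    try {
      rethrow(thr);
      return "none";
    } catch (CrawlException e) {
      return "CrawlException: " + e.getMessage();
    } catch (RuntimeException e) {
      return "RuntimeException: " + e.getMessage();
    }
  }

  public static void main(String[] args) {
    System.out.println(describe(new CrawlException("HTTP 404")));
    System.out.println(describe(new java.io.IOException("boom")));
  }
}
```

With the extra branch, the caller sees the checked exception under its own type and can report it as a job error, instead of receiving it wrapped in a generic RuntimeException.
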



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
