[jira] [Resolved] (CONNECTORS-1724) When the REST API cannot be connected, job using the Generic Repository Connector would be freezed.

2022-08-25 Thread Karl Wright (Jira)


 [ 
https://issues.apache.org/jira/browse/CONNECTORS-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karl Wright resolved CONNECTORS-1724.
-
Fix Version/s: ManifoldCF next
   Resolution: Fixed

r1903673

> When the REST API cannot be connected, job using the Generic Repository 
> Connector would be freezed.
> ---
>
> Key: CONNECTORS-1724
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1724
> Project: ManifoldCF
> Issue Type: Bug
> Reporter: Nguyen Huu Nhat
> Assignee: Karl Wright
> Priority: Major
> Fix For: ManifoldCF next
>
>
> Hi there,
> We have run into an issue in use that is still not handled, so I would like 
> to suggest the following fix to the source code of the Generic Repository 
> Connector.
> For details about this issue, please refer to the information below:
> h3. +*1. Connector name*+
> Generic Repository Connector
> h3. +*2. Issue*+
> When the Generic Repository Connector calls the REST API with _action=seed_ 
> and an error occurs, the corresponding error handling is not executed. As a 
> result, the ManifoldCF crawling job is frozen at status *Starting up* and no 
> error message is output.
>  * When this issue happens in the Generic Repository Connector, the seed 
> phase of jobs in other repositories also freezes (perhaps the seed thread is 
> also frozen).
>  * Even after ManifoldCF is restarted, the same issue happens again because 
> jobs are executed automatically.
>  * A temporary workaround is to abort the job and recheck the connection.
> h3. +*3. Reproduction*+
> h4. *Reproduction method:*
>  * When setting up the Generic repository connection, set a non-existent 
> entry point (e.g. [http://localhost/no*exist/]). Then define a job that uses 
> that connection and run it.
>  * 10 minutes or more after the job is started, its status is still 
> *Starting up*; the job does not end abnormally with a connection error or 
> time-out.
> h4. *Reproduction steps:*
>  * Create a Generic repository connection with the following settings:
>  ** On the *Entry Point* tab, set a non-existent entry point (e.g. 
> [http://localhost/no*exist/])
>  * Create a job using the above Generic repository connection
>  * Start the created job and keep track of its status
>  ** The job freezes with the following information:
>  *** Status: Starting up
>  *** Start Time: Not started
>  *** Documents: 0
>  ** No new events appear in *Document Status*
>  ** No errors get logged in manifoldcf.log
> h3. +*4. Cause*+
> In the *GenericConnector$ExecuteSeedingThread* class, the 
> *seedBuffer.signalDone()* method is only called when the returned HTTP status 
> code is 200.
>  * When the connector is not able to connect to the REST API, which means 
> that the returned HTTP status code is not 200, *seedBuffer.signalDone()* is 
> not called.
>  ** As a result, the *complete* flag is never set to _true_.
>  ** Because the *complete* flag is not _true_ and *buffer.size()* is 0, the 
> job is stuck in *wait()*, inside the while loop of the 
> *XThreadStringBuffer#fetch()* method.
> ([https://github.com/apache/manifoldcf/blob/release-2.22.1/framework/connector-common/src/main/java/org/apache/manifoldcf/connectorcommon/common/XThreadStringBuffer.java#L78])
> {code:java}
> while (buffer.size() == 0 && !complete)
>   wait();
> {code}
> ⇒ This is why the job is frozen at status *Starting up*.
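
The blocking behaviour described in the cause section can be reproduced outside ManifoldCF with a small sketch. The classes below (`SketchSeedBuffer`, `SeedBufferDemo`) are hypothetical stand-ins written for illustration, not the real `XThreadStringBuffer` or connector code; they only assume the same wait loop quoted above. A consumer is run against a producer that fails before adding any seed, once without and once with the done signal.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a cross-thread seed buffer; not the ManifoldCF class. */
class SketchSeedBuffer {
  private final List<String> buffer = new ArrayList<>();
  private boolean complete = false;

  public synchronized void add(String id) {
    buffer.add(id);
    notifyAll();
  }

  public synchronized void signalDone() {
    complete = true;
    notifyAll();
  }

  /** Returns the next id, or null once the producer is done and the buffer is empty. */
  public synchronized String fetch() throws InterruptedException {
    // Same shape as the loop quoted in the report: blocks until data or completion.
    while (buffer.size() == 0 && !complete)
      wait();
    if (buffer.size() == 0)
      return null;
    return buffer.remove(0);
  }
}

class SeedBufferDemo {
  /** Returns true if the consumer finished within timeoutMs after the producer failed. */
  static boolean consumerFinishes(boolean signalOnFailure, long timeoutMs) {
    SketchSeedBuffer seedBuffer = new SketchSeedBuffer();
    Thread producer = new Thread(() -> {
      try {
        // Simulated failure before any seed is added (assumption for the sketch).
        throw new IllegalStateException("simulated non-200 response");
      } catch (IllegalStateException e) {
        // Swallowed only to keep the sketch short.
      } finally {
        if (signalOnFailure)
          seedBuffer.signalDone();
      }
    });
    Thread consumer = new Thread(() -> {
      try {
        while (seedBuffer.fetch() != null) {
          // A real consumer would record the seed here.
        }
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });
    consumer.setDaemon(true);
    producer.start();
    consumer.start();
    try {
      consumer.join(timeoutMs);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while waiting", e);
    }
    boolean finished = !consumer.isAlive();
    consumer.interrupt(); // release a consumer that is still stuck in wait()
    return finished;
  }

  public static void main(String[] args) {
    System.out.println("without signalDone on failure: " + consumerFinishes(false, 500));
    System.out.println("with signalDone on failure: " + consumerFinishes(true, 500));
  }
}
```

In this sketch the consumer never returns on its own when the done signal is skipped, which matches the frozen *Starting up* status reported above.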
> h3. +*5. Solution*+
> To resolve this issue, we suggest the following:
>  * *seedBuffer.signalDone()* should be called for every HTTP response 
> status, not only 200.
>  * Moreover, when the HTTP status code is not 200, a ManifoldCFException is 
> thrown. The *finishUp()* method of *GenericConnector$ExecuteSeedingThread* 
> does not handle ManifoldCFException, so handling for this exception should 
> be added.
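
The two suggestions can be illustrated together with a minimal sketch. It is not the ManifoldCF implementation: `SketchConnectorException` stands in for ManifoldCFException, `SketchSeedingThread` is a hypothetical worker, the `doneSignalled` flag stands in for the call to `seedBuffer.signalDone()`, and the status code is passed in directly instead of coming from an HTTP call.

```java
/** Hypothetical checked exception standing in for ManifoldCFException. */
class SketchConnectorException extends Exception {
  SketchConnectorException(String message) {
    super(message);
  }
}

/** Hypothetical worker loosely modelled on a seeding thread; not the ManifoldCF code. */
class SketchSeedingThread extends Thread {
  private final int statusCode;
  private volatile Throwable exception;
  private volatile boolean doneSignalled = false;

  SketchSeedingThread(int statusCode) {
    this.statusCode = statusCode;
  }

  @Override
  public void run() {
    try {
      if (statusCode != 200)
        throw new SketchConnectorException("Unexpected HTTP status: " + statusCode);
      // A real worker would parse the response and add seeds here.
    } catch (Throwable e) {
      exception = e; // remembered so that finishUp() can rethrow it
    } finally {
      doneSignalled = true; // stands in for seedBuffer.signalDone(), run on every path
    }
  }

  boolean isDoneSignalled() {
    return doneSignalled;
  }

  /** Waits for the worker and rethrows whatever it recorded, keeping the original type. */
  void finishUp() throws InterruptedException, SketchConnectorException {
    join();
    Throwable thr = exception;
    if (thr == null)
      return;
    if (thr instanceof RuntimeException)
      throw (RuntimeException) thr;
    else if (thr instanceof Error)
      throw (Error) thr;
    else if (thr instanceof SketchConnectorException)
      throw (SketchConnectorException) thr;
    else
      throw new RuntimeException("Unexpected throwable in worker", thr);
  }
}

class FinishUpDemo {
  /** Runs one worker and reports how it ended. */
  static String run(int statusCode) {
    SketchSeedingThread t = new SketchSeedingThread(statusCode);
    t.start();
    try {
      t.finishUp();
      return "ok";
    } catch (SketchConnectorException e) {
      return "failed: " + e.getMessage() + " (done signalled: " + t.isDoneSignalled() + ")";
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while waiting", e);
    }
  }

  public static void main(String[] args) {
    System.out.println(run(200));
    System.out.println(run(404));
  }
}
```

Putting the done signal in `finally` means a waiting consumer is always released, and rethrowing the checked exception from `finishUp()` lets the caller report the failure instead of seeing a silent stall.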
> h3. +*6. Suggested source code (based on release 2.22.1)*+
> [https://github.com/apache/manifoldcf/blob/release-2.22.1/connectors/generic/connector/src/main/java/org/apache/manifoldcf/crawler/connectors/generic/GenericConnector.java#L1151]
> {code:java}
> -   seedBuffer.signalDone();
> } finally {
>   EntityUtils.consume(response.getEntity());
>   method.releaseConnection();
> +   seedBuffer.signalDone();
> }
> {code}
> [https://github.com/apache/manifoldcf/blob/release-2.22.1/connectors/generic/connector/src/main/java/org/apache/manifoldcf/crawler/connectors/generic/GenericConnector.java#L1120]
> {code:java}
> if (thr instanceof RuntimeException) {
>   throw (RuntimeException) thr;
> } else if (thr instanceof Error) {
>   throw (Error) thr;
> + } else if (thr instanceof ManifoldCFException) {
> +   throw (ManifoldCFException) thr;
> } else {
>   throw new RuntimeException("Unhandled

[jira] [Assigned] (CONNECTORS-1724) When the REST API cannot be connected, job using the Generic Repository Connector would be freezed.

2022-08-25 Thread Karl Wright (Jira)


 [ 
https://issues.apache.org/jira/browse/CONNECTORS-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karl Wright reassigned CONNECTORS-1724:
---

Assignee: Karl Wright
