[jira] [Created] (CONNECTORS-1724) When the REST API cannot be connected, job using the Generic Repository Connector would be freezed.

2022-08-24 Thread Nguyen Huu Nhat (Jira)
Nguyen Huu Nhat created CONNECTORS-1724:
---

 Summary: When the REST API cannot be connected, job using the 
Generic Repository Connector would be freezed.
 Key: CONNECTORS-1724
 URL: https://issues.apache.org/jira/browse/CONNECTORS-1724
 Project: ManifoldCF
  Issue Type: Bug
Reporter: Nguyen Huu Nhat


Hi there,

I would like to report an issue that occurs in use and is still not handled, 
and to suggest the following fix to the source code of the Generic Repository 
Connector. Details are given below:

h3. +*1. Connector name*+

Generic Repository Connector

h3. +*2. Issue*+

When the Generic Repository Connector calls the REST API with _action=seed_ 
and an error occurs, the corresponding error handling is not executed. As a 
result, the ManifoldCF crawling job freezes at status *Starting up* and no 
error message is output.
 * When this issue happens in the Generic Repository Connector, the seed phase 
of jobs in other repositories also freezes (perhaps the seed thread is also 
frozen).
 * Even after ManifoldCF is restarted, the same issue happens again because 
jobs are executed automatically.
 * A temporary workaround is to abort the job and recheck the connection.

h3. +*3. Reproduction*+

h4. *Reproduction method:*
 * In the settings for the Generic repository connection, set a non-existent 
entry point (e.g. [http://localhost/no*exist/]). Then define a job that uses 
that entry point and run it.
 * 10 minutes or more after the job is started, its status is still *Starting 
up*; the job does not end abnormally with a connection error or time-out.

h4. *Reproduction steps:*
 * Create a Generic repository connection with the following settings:
 ** On the *Entry Point* tab, set a non-existent entry point (e.g. 
[http://localhost/no*exist/])
 * Create a job using the above Generic repository connection
 * Start the created job and keep track of its status
 ** The job freezes with the following information:
 *** Status: Starting up
 *** Start Time: Not started
 *** Documents: 0
 ** No new events appear in *Document Status*
 ** No errors get logged in manifoldcf.log

h3. +*4. Cause*+

In the *GenericConnector$ExecuteSeedingThread* class, the 
*seedBuffer.signalDone()* method is only called when the returned HTTP status 
code is 200.
 * When the connector cannot connect to the REST API (that is, no response 
with HTTP status code 200 is obtained), *seedBuffer.signalDone()* is not 
called.
 ** As a result, the *complete* flag is never set to _true_.
 ** Because the *complete* flag stays _false_ and *buffer.size()* is 0, the 
job is stuck in *wait()* inside the while loop of the 
*XThreadStringBuffer#fetch()* method.

([https://github.com/apache/manifoldcf/blob/release-2.22.1/framework/connector-common/src/main/java/org/apache/manifoldcf/connectorcommon/common/XThreadStringBuffer.java#L78])
{code:java}
while (buffer.size() == 0 && !complete)
  wait();
{code}

⇒ This is why the job is frozen at status *Starting up*.
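
The blocking behaviour described in this section can be reproduced in isolation. The following is only a minimal sketch: *SeedBufferSketch* is a hypothetical, simplified stand-in written for illustration, not the real *XThreadStringBuffer*, and the simulated failure replaces the actual HTTP call. It shows that a consumer waiting in *fetch()* is released once the producer calls *signalDone()* from a finally block, even when the producer fails.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical, simplified stand-in for a cross-thread string buffer.
// It is NOT the ManifoldCF class; it only mirrors the wait loop quoted above.
public class SeedBufferSketch {
  private final Queue<String> buffer = new ArrayDeque<>();
  private boolean complete = false;

  public synchronized void add(String value) {
    buffer.add(value);
    notifyAll();
  }

  // Marks the producer as finished and wakes any waiting consumer.
  public synchronized void signalDone() {
    complete = true;
    notifyAll();
  }

  // Blocks until a value is available or the producer has finished.
  // Returns null when the buffer is empty and the producer is done.
  public synchronized String fetch() throws InterruptedException {
    while (buffer.size() == 0 && !complete)
      wait();
    return buffer.poll();
  }

  // Simulates a seeding thread that fails before producing anything.
  // Returns true if the consumer was released with "no more data".
  public static boolean consumerReleasedAfterFailure() {
    SeedBufferSketch seedBuffer = new SeedBufferSketch();
    Thread producer = new Thread(() -> {
      try {
        // Assumption: stands in for a failed or non-200 REST call.
        throw new IllegalStateException("simulated seeding failure");
      } catch (IllegalStateException e) {
        // A real thread would record the exception for later rethrow.
      } finally {
        // Without this call the consumer below would wait forever.
        seedBuffer.signalDone();
      }
    });
    producer.start();
    try {
      String result = seedBuffer.fetch();
      producer.join();
      return result == null;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
  }

  public static void main(String[] args) {
    System.out.println(consumerReleasedAfterFailure());
  }
}
```

If the *signalDone()* call is moved out of the finally block and onto the success path only, the *fetch()* call in this sketch never returns, which matches the frozen *Starting up* state reported here.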

h3. +*5. Solution*+

To resolve this issue, we suggest the following:
 * The *seedBuffer.signalDone()* method should be called for every HTTP 
response status.
 * Moreover, when the HTTP status code is not 200, a ManifoldCFException is 
thrown. The *finishUp()* method of the *GenericConnector$ExecuteSeedingThread* 
class does not handle ManifoldCFException, so handling for this exception 
should be added.

h3. +*6. Suggested source code (based on release 2.22.1)*+

[https://github.com/apache/manifoldcf/blob/release-2.22.1/connectors/generic/connector/src/main/java/org/apache/manifoldcf/crawler/connectors/generic/GenericConnector.java#L1151]
{code:java}
- seedBuffer.signalDone();
} finally {
  EntityUtils.consume(response.getEntity());
  method.releaseConnection();
+ seedBuffer.signalDone();
}
{code}

[https://github.com/apache/manifoldcf/blob/release-2.22.1/connectors/generic/connector/src/main/java/org/apache/manifoldcf/crawler/connectors/generic/GenericConnector.java#L1120]
{code:java}
if (thr instanceof RuntimeException) {
  throw (RuntimeException) thr;
} else if (thr instanceof Error) {
  throw (Error) thr;
+ } else if (thr instanceof ManifoldCFException) {
+   throw (ManifoldCFException) thr;
} else {
  throw new RuntimeException("Unhandled exception of type: " + thr.getClass().getName(), thr);
}
{code}
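
To illustrate why the extra branch matters, here is a small self-contained sketch of the rethrow pattern. *FinishUpSketch* and *SketchCheckedException* are hypothetical names used only for illustration; *SketchCheckedException* stands in for ManifoldCFException, and the sketch assumes the enclosing method is allowed to declare the checked exception. With the dedicated branch, the recorded exception reaches the caller with its original type instead of being wrapped in a RuntimeException.

```java
import java.io.IOException;

public class FinishUpSketch {
  // Hypothetical stand-in for a checked exception such as ManifoldCFException.
  public static class SketchCheckedException extends Exception {
    public SketchCheckedException(String message) {
      super(message);
    }
  }

  // Rethrows an exception recorded by a worker thread, preserving its type
  // where possible. Unknown checked exceptions are wrapped as a fallback.
  public static void rethrow(Throwable thr) throws SketchCheckedException {
    if (thr == null) {
      return;
    }
    if (thr instanceof RuntimeException) {
      throw (RuntimeException) thr;
    } else if (thr instanceof Error) {
      throw (Error) thr;
    } else if (thr instanceof SketchCheckedException) {
      // The added branch: the caller sees the original checked exception.
      throw (SketchCheckedException) thr;
    } else {
      throw new RuntimeException("Unhandled exception of type: "
          + thr.getClass().getName(), thr);
    }
  }

  // Reports how a recorded exception surfaces to the caller.
  public static String describe(Throwable recorded) {
    try {
      rethrow(recorded);
      return "no exception";
    } catch (SketchCheckedException e) {
      return "checked: " + e.getMessage();
    } catch (RuntimeException e) {
      return "runtime: " + e.getMessage();
    }
  }

  public static void main(String[] args) {
    System.out.println(describe(new SketchCheckedException("seeding failed")));
    System.out.println(describe(new IOException("other failure")));
  }
}
```

In this sketch the first call reports the checked exception directly, while the second falls through to the generic wrapper, which is how a ManifoldCFException would be handled without the suggested branch.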





[VOTE] Release Apache ManifoldCF 2.23, RC0

2022-08-24 Thread Karl Wright
Please vote on whether to release Apache ManifoldCF 2.23, RC0.  The
artifact can be found at
https://svn.apache.org/repos/asf/manifoldcf/branches/release-2.23-branch .
There is also a release tag at
https://svn.apache.org/repos/asf/manifoldcf/tags/release-2.23-RC0 .

This release contains few changes from 2.22.1 but does include many minor
dependency-related fixes, such as:


r1902902 | kwright | 2022-07-21 04:27:31 -0400 (Thu, 21 Jul 2022) | 1 line

CONNECTORS-1541, revisited

r1902854 | jmssiera | 2022-07-19 09:27:31 -0400 (Tue, 19 Jul 2022) | 1 line

CONNECTORS-1721: Confluence v6 does not distinguish 404 errors

r1902717 | kwright | 2022-07-14 06:57:25 -0400 (Thu, 14 Jul 2022) | 1 line

Armor code against null bin names

r1902684 | kwright | 2022-07-12 16:29:10 -0400 (Tue, 12 Jul 2022) | 1 line

Fix for CONNECTORS-1720.

r1902537 | jmssiera | 2022-07-07 13:02:53 -0400 (Thu, 07 Jul 2022) | 1 line

CONNECTORS-1719: Handle MariaDB in JDBC connector

r1901792 | kwright | 2022-06-09 17:13:01 -0400 (Thu, 09 Jun 2022) | 1 line

Update xerces version

r1901783 | kwright | 2022-06-09 10:12:30 -0400 (Thu, 09 Jun 2022) | 1 line

CONNECTORS-1717: Update log4j dependency

r1901778 | kwright | 2022-06-09 08:30:52 -0400 (Thu, 09 Jun 2022) | 1 line

More fixes related to CONNECTORS-1714

r1901777 | kwright | 2022-06-09 07:25:31 -0400 (Thu, 09 Jun 2022) | 1 line

CONNECTORS-1714: Update commons-beanutils to latest version

r1901774 | kwright | 2022-06-09 05:58:24 -0400 (Thu, 09 Jun 2022) | 1 line

CONNECTORS-1716: Use SSL for downloading nuxeo

r1900835 | kwright | 2022-05-12 07:45:26 -0400 (Thu, 12 May 2022) | 1 line

Karl