[jira] [Commented] (CONNECTORS-1738) Suggestion for adding function that allows setting timeout values for Elasticsearch Output Connector

2022-10-19 Thread Nguyen Huu Nhat (Jira)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17620720#comment-17620720
 ] 

Nguyen Huu Nhat commented on CONNECTORS-1738:
-

Hi [~kwri...@metacarta.com] ,
I understand that you are busy, but could you please have a look at this 
suggestion?
Thank you so much!

> Suggestion for adding function that allows setting timeout values for 
> Elasticsearch Output Connector
> 
>
> Key: CONNECTORS-1738
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1738
> Project: ManifoldCF
>  Issue Type: Improvement
>Reporter: Nguyen Huu Nhat
>Priority: Major
> Attachments: EditConnection.PNG, ViewConnection.PNG, patch.txt
>
>
> Hi there,
> For the Elasticsearch Output Connector, I have experienced cases that 
> required the values of *socketTimeout* and *connectionTimeout* to be 
> increased.
> However, as those values are hardcoded in the source code as 
> 90(ms) and 6(ms) respectively, it is quite troublesome to update them 
> in such cases.
> For this reason, I think it would be better if the values of 
> *socketTimeout* and *connectionTimeout* could be edited through the 
> WebUI, on the connection setting screen, instead of being hardcoded.
> In ManifoldCF, a few other connectors already support setting 
> *socketTimeout* and *connectionTimeout*, such as Generic, Confluence, etc.
> Therefore, I would like to suggest modifying the ElasticSearch Output 
> Connector's source code to allow setting the *socketTimeout* and 
> *connectionTimeout* values when needed.
> h3. +*1. Connector Name*+
> ElasticSearch Output Connector
> h3. +*2. Improvement Detail*+
> On the connection setting screen (WebUI), add handling that enables setting 
> values for *socketTimeout* and *connectionTimeout*.
> ※The default values for *socketTimeout* and *connectionTimeout* remain 
> 90 and 6 (ms), as they are now.
> The connection setting screen will look like below:
> !EditConnection.PNG!
> h3. +*3. Suggested source code (based on release 2.22.1)*+
> Because many files are edited and the number of changed lines is quite 
> large, I have attached the changes as a patch file. Please check it.
> [^patch.txt]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (CONNECTORS-1738) Suggestion for adding function that allows setting timeout values for Elasticsearch Output Connector

2022-10-11 Thread Nguyen Huu Nhat (Jira)
Nguyen Huu Nhat created CONNECTORS-1738:
---

 Summary: Suggestion for adding function that allows setting 
timeout values for Elasticsearch Output Connector
 Key: CONNECTORS-1738
 URL: https://issues.apache.org/jira/browse/CONNECTORS-1738
 Project: ManifoldCF
  Issue Type: Improvement
Reporter: Nguyen Huu Nhat
 Attachments: EditConnection.PNG, ViewConnection.PNG, patch.txt

Hi there,

For the Elasticsearch Output Connector, I have experienced cases that 
required the values of *socketTimeout* and *connectionTimeout* to be increased.
However, as those values are hardcoded in the source code as 
90(ms) and 6(ms) respectively, it is quite troublesome to update them 
in such cases.

For this reason, I think it would be better if the values of 
*socketTimeout* and *connectionTimeout* could be edited through the WebUI, 
on the connection setting screen, instead of being hardcoded.

In ManifoldCF, a few other connectors already support setting 
*socketTimeout* and *connectionTimeout*, such as Generic, Confluence, etc.
Therefore, I would like to suggest modifying the ElasticSearch Output 
Connector's source code to allow setting the *socketTimeout* and 
*connectionTimeout* values when needed.
h3. +*1. Connector Name*+

ElasticSearch Output Connector
h3. +*2. Improvement Detail*+

On the connection setting screen (WebUI), add handling that enables setting 
values for *socketTimeout* and *connectionTimeout*.
※The default values for *socketTimeout* and *connectionTimeout* remain 90 
and 6 (ms), as they are now.

The connection setting screen will look like below:
!EditConnection.PNG!
h3. +*3. Suggested source code (based on release 2.22.1)*+

Because many files are edited and the number of changed lines is quite large,
I have attached the changes as a patch file. Please check it.
[^patch.txt]
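
As a rough illustration of the idea described above, the sketch below reads two timeout values from a map of connection parameters and falls back to caller-supplied defaults when a field is missing or invalid. The class `TimeoutSettings`, the parameter keys and the helper methods are hypothetical and are not taken from the attached patch; the real connector reads its settings through ManifoldCF's own configuration classes.

```java
import java.util.Map;

/** Hypothetical holder for the two timeouts; not part of ManifoldCF. */
final class TimeoutSettings {
  // Assumed parameter keys, chosen for illustration only.
  static final String SOCKET_TIMEOUT_KEY = "socketTimeout";
  static final String CONNECTION_TIMEOUT_KEY = "connectionTimeout";

  final int socketTimeoutMs;
  final int connectionTimeoutMs;

  TimeoutSettings(int socketTimeoutMs, int connectionTimeoutMs) {
    this.socketTimeoutMs = socketTimeoutMs;
    this.connectionTimeoutMs = connectionTimeoutMs;
  }

  /** Parses a positive millisecond value; returns the default if absent or invalid. */
  static int parseTimeout(String raw, int defaultMs) {
    if (raw == null || raw.trim().isEmpty()) {
      return defaultMs;
    }
    try {
      int value = Integer.parseInt(raw.trim());
      return value > 0 ? value : defaultMs;
    } catch (NumberFormatException e) {
      return defaultMs;
    }
  }

  /** Builds the settings from UI-supplied parameters, keeping defaults where needed. */
  static TimeoutSettings fromParameters(Map<String, String> params,
      int defaultSocketMs, int defaultConnectionMs) {
    return new TimeoutSettings(
        parseTimeout(params.get(SOCKET_TIMEOUT_KEY), defaultSocketMs),
        parseTimeout(params.get(CONNECTION_TIMEOUT_KEY), defaultConnectionMs));
  }
}
```

Passing the defaults in as arguments keeps the currently hardcoded values in one place, so existing connections that never set the new fields would behave as before.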





[jira] [Updated] (CONNECTORS-1737) Suggestion for adding function for proxy configuration for connector of Confluence-V6

2022-10-04 Thread Nguyen Huu Nhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CONNECTORS-1737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nguyen Huu Nhat updated CONNECTORS-1737:

Description: 
Hi there,

Currently, I am having a problem regarding the need for proxy information to 
connect to Confluence Cloud (SaaS) from the Confluence connector.
As you know, we can use Confluence Server (self-built environment) or use 
Confluence Cloud (SaaS).
Using Confluence Server, you can set up a server (for example, located in the 
same place where ManifoldCF is running) so you may not need to use proxy 
information for the connection.
However, for Confluence Cloud (SaaS), there will be cases where you need to set 
proxy information to be able to connect.

I checked other ManifoldCF connectors, and a few of them support setting 
proxy information for the connection, for example SharePoint, Jira, and Slack.
Therefore, I would like to suggest editing the Confluence-v6 source code 
(both the Authority connector and the Repository connector) so that we have an 
option to set proxy information for connecting to Confluence when needed.
h3. +*Connector Name*+
 - Confluence-v6 (including Authority connector and Repository connector)

h3. +*Improvement Detail*+
 - On the Confluence connection setting screen, add fields that allow the user 
to set proxy information (e.g. proxyProtocol, proxyHost, proxyPort).
The proxy setting screen will look like this:
!EditConnection.png!


 - When connecting (i.e. when the `connect()` method is called), use the values 
of these fields as the proxy information.

※ This improvement applies to both the Authority Connector and the Repository 
Connector side.
h3. +*Suggested source code (based on release 2.22.1)*+

Because many files are edited and the number of changed lines is quite 
large, I have attached the changes as a patch file. Please check it.
[^patch.txt]

  was:
Hi there,

Currently, I am having a problem regarding the need for proxy information to 
connect to Confluence Cloud (SaaS) from the Confluence connector.
As you know, we can use Confluence Server (self-built environment) or use 
Confluence Cloud (SaaS).
Using Confluence Server, you can set up a server (for example, located in the 
same place where ManifoldCF is running) so you may not need to use proxy 
information for the connection.
However, for Confluence Cloud (SaaS), there will be cases where you need to set 
proxy information to be able to connect.

I checked other ManifoldCF connectors, and a few of them support setting 
proxy information for the connection, for example SharePoint, Jira, and Slack.
Therefore, I would like to suggest editing the Confluence-v6 source code 
(both the Authority connector and the Repository connector) so that we have an 
option to set proxy information for connecting to Confluence when needed.
h3. +*Connector Name*+
 - Confluence-v6 (including Authority connector and Repository connector)

h3. +*Improvement Detail*+
 - On the Confluence connection setting screen, add fields that allow the user 
to set proxy information (e.g. proxyProtocol, proxyHost, proxyPort).
The proxy setting screen will look like this:
!EditConnection.png!


 - When connecting (i.e. when the `connect()` method is called), use the values 
of these fields as the proxy information.

※ This improvement applies to both the Authority Connector and the Repository 
Connector side.
h3. +*Source Code Modification*+

Because many files are edited and the number of changed lines is quite 
large, I have attached the changes as a patch file. Please check it.
[^patch.txt]


> Suggestion for adding function for proxy configuration for connector of 
> Confluence-V6
> -
>
> Key: CONNECTORS-1737
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1737
> Project: ManifoldCF
>  Issue Type: Improvement
>Reporter: Nguyen Huu Nhat
>Priority: Major
> Attachments: EditConnection.png, ViewConnection.png, patch.txt
>
>
> Hi there,
> Currently, I am having a problem regarding the need for proxy information to 
> connect to Confluence Cloud (SaaS) from the Confluence connector.
> As you know, we can use Confluence Server (self-built environment) or use 
> Confluence Cloud (SaaS).
> Using Confluence Server, you can set up a server (for example, located in the 
> same place where ManifoldCF is running) so you may not need to use proxy 
> information for the connection.
> However, for Confluence Cloud (SaaS), there will be cases where you need to 
> set proxy information to be able to connect.
> I checked other ManifoldCF connectors, and a few of them support setting 
> proxy information for the connection, for example SharePoint, Jira, and 
> Slack.
> Therefore, I would like to suggest editing 

[jira] [Created] (CONNECTORS-1737) Suggestion for adding function for proxy configuration for connector of Confluence-V6

2022-10-04 Thread Nguyen Huu Nhat (Jira)
Nguyen Huu Nhat created CONNECTORS-1737:
---

 Summary: Suggestion for adding function for proxy configuration 
for connector of Confluence-V6
 Key: CONNECTORS-1737
 URL: https://issues.apache.org/jira/browse/CONNECTORS-1737
 Project: ManifoldCF
  Issue Type: Improvement
Reporter: Nguyen Huu Nhat
 Attachments: EditConnection.png, ViewConnection.png, patch.txt

Hi there,

Currently, I am having a problem regarding the need for proxy information to 
connect to Confluence Cloud (SaaS) from the Confluence connector.
As you know, we can use Confluence Server (self-built environment) or use 
Confluence Cloud (SaaS).
Using Confluence Server, you can set up a server (for example, located in the 
same place where ManifoldCF is running) so you may not need to use proxy 
information for the connection.
However, for Confluence Cloud (SaaS), there will be cases where you need to set 
proxy information to be able to connect.

I checked other ManifoldCF connectors, and a few of them support setting 
proxy information for the connection, for example SharePoint, Jira, and Slack.
Therefore, I would like to suggest editing the Confluence-v6 source code 
(both the Authority connector and the Repository connector) so that we have an 
option to set proxy information for connecting to Confluence when needed.
h3. +*Connector Name*+
 - Confluence-v6 (including Authority connector and Repository connector)

h3. +*Improvement Detail*+
 - On the Confluence connection setting screen, add fields that allow the user 
to set proxy information (e.g. proxyProtocol, proxyHost, proxyPort).
The proxy setting screen will look like this:
!EditConnection.png!


 - When connecting (i.e. when the `connect()` method is called), use the values 
of these fields as the proxy information.

※ This improvement applies to both the Authority Connector and the Repository 
Connector side.
h3. +*Source Code Modification*+

Because many files are edited and the number of changed lines is quite 
large, I have attached the changes as a patch file. Please check it.
[^patch.txt]
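
To make the "use the field values as proxy information" step more concrete, here is a minimal sketch that turns optional `proxyHost` and `proxyPort` strings into a JDK `java.net.Proxy`. It is an assumption-laden illustration, not the code from the attached patch: the class `ProxySettings` is hypothetical, the proxy type is fixed to HTTP, and the actual connector uses its own HTTP client classes rather than `java.net.Proxy`.

```java
import java.net.InetSocketAddress;
import java.net.Proxy;

/** Hypothetical helper; the parameter names mirror the proposal's proxyHost/proxyPort. */
final class ProxySettings {
  /**
   * Returns Proxy.NO_PROXY when no host is configured, otherwise an HTTP proxy.
   * The address is created unresolved, so no DNS lookup happens here.
   */
  static Proxy toProxy(String proxyHost, String proxyPort) {
    if (proxyHost == null || proxyHost.trim().isEmpty()) {
      return Proxy.NO_PROXY;
    }
    if (proxyPort == null) {
      throw new IllegalArgumentException("Proxy port is required when a proxy host is set");
    }
    int port;
    try {
      port = Integer.parseInt(proxyPort.trim());
    } catch (NumberFormatException e) {
      throw new IllegalArgumentException("Invalid proxy port: " + proxyPort, e);
    }
    if (port < 1 || port > 65535) {
      throw new IllegalArgumentException("Proxy port out of range: " + port);
    }
    return new Proxy(Proxy.Type.HTTP,
        InetSocketAddress.createUnresolved(proxyHost.trim(), port));
  }
}
```

Returning `Proxy.NO_PROXY` for an empty host keeps the proxy optional, which matches the case of a Confluence Server that is reachable directly.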





[jira] [Updated] (CONNECTORS-1731) Suggestion for adding handling process for ManifoldCFException in Generic Repository Connector

2022-09-21 Thread Nguyen Huu Nhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CONNECTORS-1731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nguyen Huu Nhat updated CONNECTORS-1731:

Description: 
Hi there,

I would like to suggest the following retry-related improvements:
h3. +*1. Connector name*+

Generic Repository Connector
h3. +*2. Reasons for improvement*+

In ExecuteSeedingThread, DocumentVersionThread and ExecuteProcessThread, 
when HttpClient.execute is called, the connection may be interrupted and a 
connection error may occur (HTTP status code <> 200).
When the HTTP status code is <> 200, a ManifoldCFException is thrown. However, 
nothing handles this exception, so the job is aborted.

As connection-related errors (HTTP status code <> 200) can resolve on their 
own, a retry should be added for such cases.
h3. +*3. Improvements*+

The improvement includes the following:
 * Adding a method that handles retries for exceptions of type ManifoldCFException
 * Calling that method to handle the ManifoldCFException raised in the 
threads ExecuteSeedingThread, DocumentVersionThread and 
ExecuteProcessThread

h3. +*4. Suggested source code (based on release 2.22.1)*+
 * Adding a method that handles retries for exceptions of type ManifoldCFException
[https://github.com/apache/manifoldcf/blob/release-2.22.1/connectors/generic/connector/src/main/java/org/apache/manifoldcf/crawler/connectors/generic/GenericConnector.java#L1026]
{code:java}
  /**
   * Function for handling ManifoldCFException exception caused by connection error.
   * In case of connection error, ServiceInterruption exception is thrown to perform retry.
   *
   * @param e ManifoldCFException
   * @throws ServiceInterruption
   */
  protected static void handleManifoldCFException(ManifoldCFException e)
      throws ServiceInterruption {
    long currentTime = System.currentTimeMillis();
    throw new ServiceInterruption("Connection error: " + e.getMessage(), e,
        currentTime + 30L,
        currentTime + 3 * 60 * 6L, -1, false);
  }
{code}

 * Calling that method to handle the ManifoldCFException raised in the 
threads ExecuteSeedingThread, DocumentVersionThread and 
ExecuteProcessThread
 ** ExecuteSeedingThread
[https://github.com/apache/manifoldcf/blob/release-2.22.1/connectors/generic/connector/src/main/java/org/apache/manifoldcf/crawler/connectors/generic/GenericConnector.java#L256]
{code:java}
    } catch (InterruptedException e) {
      t.interrupt();
      throw new ManifoldCFException("Interrupted: " + e.getMessage(), e,
          ManifoldCFException.INTERRUPTED);
+   } catch (ManifoldCFException e) {
+     handleManifoldCFException(e);
    }
    return new Long(seedTime).toString();
{code}

 ** DocumentVersionThread
[https://github.com/apache/manifoldcf/blob/release-2.22.1/connectors/generic/connector/src/main/java/org/apache/manifoldcf/crawler/connectors/generic/GenericConnector.java#L304]
{code:java}
    try {
      versions = versioningThread.finishUp();
    } catch (IOException ex) {
      handleIOException((IOException)ex);
+   } catch (ManifoldCFException ex) {
+     handleManifoldCFException(ex);
    } catch (InterruptedException ex) {
      throw new ManifoldCFException(ex.getMessage(), ex,
          ManifoldCFException.INTERRUPTED);
    }
{code}

 ** ExecuteProcessThread
[https://github.com/apache/manifoldcf/blob/release-2.22.1/connectors/generic/connector/src/main/java/org/apache/manifoldcf/crawler/connectors/generic/GenericConnector.java#L445]
{code:java}
    } catch (InterruptedIOException e) {
      t.interrupt();
      throw new ManifoldCFException("Interrupted: " + e.getMessage(),
          e, ManifoldCFException.INTERRUPTED);
    } catch (IOException e) {
      handleIOException(e);
+   } catch (ManifoldCFException e) {
+     handleManifoldCFException(e);
    }
{code}
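
The pattern in the snippets above, turning a failure into a "retry later" signal that carries a next-retry time and a give-up time, can be sketched without ManifoldCF on the classpath. In the sketch below, `RetryLaterException` is a hypothetical stand-in for `ServiceInterruption`, `RetryHandling` is a made-up helper, and the interval values are supplied by the caller as placeholders; none of these are the values or classes used in the patch.

```java
/** Hypothetical stand-in for ManifoldCF's ServiceInterruption; illustration only. */
final class RetryLaterException extends Exception {
  final long retryTime; // earliest time (ms since epoch) at which to try again
  final long failTime;  // time (ms since epoch) after which retries should stop

  RetryLaterException(String message, Throwable cause, long retryTime, long failTime) {
    super(message, cause);
    this.retryTime = retryTime;
    this.failTime = failTime;
  }
}

final class RetryHandling {
  /**
   * Wraps a connection failure so that the caller can reschedule the work
   * instead of aborting. This method always throws.
   */
  static void handleConnectionFailure(Exception e, long now,
      long retryIntervalMs, long giveUpAfterMs) throws RetryLaterException {
    throw new RetryLaterException("Connection error: " + e.getMessage(), e,
        now + retryIntervalMs, now + giveUpAfterMs);
  }
}
```

Taking `now` as a parameter, rather than calling `System.currentTimeMillis()` inside the helper, is only a choice made here so the computed times are easy to check.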

  was:
Hi there,

I would like to suggest the following retry-related improvements:

h3. +*1. Connector name*+

Generic Repository Connector

h3. +*2. Reasons for improvement*+

In ExecuteSeedingThread, DocumentVersionThread and ExecuteProcessThread, 
when HttpClient.execute is called, the connection may be interrupted and a 
connection error may occur (HTTP status code <> 200).
When the HTTP status code is <> 200, a ManifoldCFException is thrown. However, 
nothing handles this exception, so the job is aborted.

As connection-related errors (HTTP status code <> 200) can resolve on their 
own, a retry should be added for such cases.

h3. +*3. Improvements*+

The improvement includes the following:
 * Adding a method that handles retries for exceptions of type ManifoldCFException
 * Calling that method to handle the ManifoldCFException raised in the 
threads ExecuteSeedingThread, 

[jira] [Updated] (CONNECTORS-1731) Suggestion for adding handling process for ManifoldCFException in Generic Repository Connector

2022-09-21 Thread Nguyen Huu Nhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CONNECTORS-1731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nguyen Huu Nhat updated CONNECTORS-1731:

Attachment: patch_update.txt

> Suggestion for adding handling process for ManifoldCFException in Generic 
> Repository Connector
> --
>
> Key: CONNECTORS-1731
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1731
> Project: ManifoldCF
>  Issue Type: Improvement
>Reporter: Nguyen Huu Nhat
>Assignee: Karl Wright
>Priority: Major
> Attachments: patch.txt, patch_update.txt
>
>
> Hi there,
> I would like to suggest the following retry-related improvements:
> h3. +*1. Connector name*+
> Generic Repository Connector
> h3. +*2. Reasons for improvement*+
> In ExecuteSeedingThread, DocumentVersionThread and ExecuteProcessThread, 
> when HttpClient.execute is called, the connection may be interrupted and a 
> connection error may occur (HTTP status code <> 200).
> When the HTTP status code is <> 200, a ManifoldCFException is thrown. 
> However, nothing handles this exception, so the job is aborted.
> As connection-related errors (HTTP status code <> 200) can resolve on their 
> own, a retry should be added for such cases.
> h3. +*3. Improvements*+
> The improvement includes the following:
>  * Adding a method that handles retries for exceptions of type ManifoldCFException
>  * Calling that method to handle the ManifoldCFException raised in the 
> threads ExecuteSeedingThread, DocumentVersionThread and 
> ExecuteProcessThread
> h3. +*4. Suggested source code (based on release 2.22.1)*+
>  * Adding a method that handles retries for exceptions of type ManifoldCFException
> [https://github.com/apache/manifoldcf/blob/release-2.22.1/connectors/generic/connector/src/main/java/org/apache/manifoldcf/crawler/connectors/generic/GenericConnector.java#L1026]
> {code:java}
>   /**
>    * Function for handling ManifoldCFException exception caused by connection error.
>    * In case of connection error, ServiceInterruption exception is thrown to perform retry.
>    *
>    * @param e ManifoldCFException
>    * @throws ServiceInterruption
>    */
>   protected static void handleManifoldCFException(ManifoldCFException e)
>       throws ServiceInterruption {
>     long currentTime = System.currentTimeMillis();
>     throw new ServiceInterruption("Connection error: " + e.getMessage(), e,
>         currentTime + 30L,
>         currentTime + timeToFail * 3 * 60 * 6L, -1, false);
>   }
> {code}
>  * Calling that method to handle the ManifoldCFException raised in the 
> threads ExecuteSeedingThread, DocumentVersionThread and 
> ExecuteProcessThread
>  ** ExecuteSeedingThread
> [https://github.com/apache/manifoldcf/blob/release-2.22.1/connectors/generic/connector/src/main/java/org/apache/manifoldcf/crawler/connectors/generic/GenericConnector.java#L256]
> {code:java}
>     } catch (InterruptedException e) {
>       t.interrupt();
>       throw new ManifoldCFException("Interrupted: " + e.getMessage(), e,
>           ManifoldCFException.INTERRUPTED);
> +   } catch (ManifoldCFException e) {
> +     handleManifoldCFException(e);
>     }
>     return new Long(seedTime).toString();
> {code}
>  ** DocumentVersionThread
> [https://github.com/apache/manifoldcf/blob/release-2.22.1/connectors/generic/connector/src/main/java/org/apache/manifoldcf/crawler/connectors/generic/GenericConnector.java#L304]
> {code:java}
>     try {
>       versions = versioningThread.finishUp();
>     } catch (IOException ex) {
>       handleIOException((IOException)ex);
> +   } catch (ManifoldCFException ex) {
> +     handleManifoldCFException(ex);
>     } catch (InterruptedException ex) {
>       throw new ManifoldCFException(ex.getMessage(), ex,
>           ManifoldCFException.INTERRUPTED);
>     }
> {code}
>  ** ExecuteProcessThread
> [https://github.com/apache/manifoldcf/blob/release-2.22.1/connectors/generic/connector/src/main/java/org/apache/manifoldcf/crawler/connectors/generic/GenericConnector.java#L445]
> {code:java}
>     } catch (InterruptedIOException e) {
>       t.interrupt();
>       throw new ManifoldCFException("Interrupted: " + e.getMessage(),
>           e, ManifoldCFException.INTERRUPTED);
>     } catch (IOException e) {
>       handleIOException(e);
> +   } catch (ManifoldCFException e) {
> +     handleManifoldCFException(e);
>     }
> {code}





[jira] [Commented] (CONNECTORS-1731) Suggestion for adding handling process for ManifoldCFException in Generic Repository Connector

2022-09-21 Thread Nguyen Huu Nhat (Jira)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17607630#comment-17607630
 ] 

Nguyen Huu Nhat commented on CONNECTORS-1731:
-

Sorry, the patch file contained a small mistake in the modified source 
code. I have fixed that part in the attached file.
Please check again using the file below:
[^patch_update.txt]

> Suggestion for adding handling process for ManifoldCFException in Generic 
> Repository Connector
> --
>
> Key: CONNECTORS-1731
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1731
> Project: ManifoldCF
>  Issue Type: Improvement
>Reporter: Nguyen Huu Nhat
>Assignee: Karl Wright
>Priority: Major
> Attachments: patch.txt, patch_update.txt
>
>





[jira] [Updated] (CONNECTORS-1731) Suggestion for adding handling process for ManifoldCFException in Generic Repository Connector

2022-09-20 Thread Nguyen Huu Nhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CONNECTORS-1731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nguyen Huu Nhat updated CONNECTORS-1731:

Attachment: patch.txt

> Suggestion for adding handling process for ManifoldCFException in Generic 
> Repository Connector
> --
>
> Key: CONNECTORS-1731
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1731
> Project: ManifoldCF
>  Issue Type: Improvement
>Reporter: Nguyen Huu Nhat
>Assignee: Karl Wright
>Priority: Major
> Attachments: patch.txt
>
>





[jira] [Commented] (CONNECTORS-1731) Suggestion for adding handling process for ManifoldCFException in Generic Repository Connector

2022-09-20 Thread Nguyen Huu Nhat (Jira)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17607485#comment-17607485
 ] 

Nguyen Huu Nhat commented on CONNECTORS-1731:
-

I executed the above command.
Here is the resulting patch with my suggested changes.
[^patch.txt]
Please check. Thanks!

> Suggestion for adding handling process for ManifoldCFException in Generic 
> Repository Connector
> --
>
> Key: CONNECTORS-1731
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1731
> Project: ManifoldCF
>  Issue Type: Improvement
>Reporter: Nguyen Huu Nhat
>Assignee: Karl Wright
>Priority: Major
> Attachments: patch.txt
>
>
> Hi there,
> I would like to suggest the following retry-related improvements:
> h3. +*1. Connector name*+
> Generic Repository Connector
> h3. +*2. Reasons for improvement*+
> In ExecuteSeedingThread, DocumentVersionThread and ExecuteProcessThread, 
> when HttpClient.execute is called, the connection may be interrupted and a 
> connection error may occur (HTTP status code <> 200).
> When the HTTP status code is <> 200, a ManifoldCFException is thrown. 
> However, nothing handles this exception, so the job is aborted.
> As connection-related errors (HTTP status code <> 200) can resolve on their 
> own, a retry should be added for such cases.
> h3. +*3. Improvements*+
> The improvement includes the following:
>  * Adding a method that handles retries for exceptions of type ManifoldCFException
>  * Calling that method to handle the ManifoldCFException raised in the 
> threads ExecuteSeedingThread, DocumentVersionThread and 
> ExecuteProcessThread
> h3. +*4. Suggested source code (based on release 2.22.1)*+
>  * Adding method to handle retry for exception of type ManifoldCFException
> [https://github.com/apache/manifoldcf/blob/release-2.22.1/connectors/generic/connector/src/main/java/org/apache/manifoldcf/crawler/connectors/generic/GenericConnector.java#L1026]
> {code:java}
>   /**
>* Function for handling ManifoldCFException exception caused by connection 
> error.
>* In case of connection error, ServiceInterruption exception is thrown to 
> perform retry.
>* 
>* @param e ManifoldCFException
>* @throws ServiceInterruption
>*/
>   protected static void handleManifoldCFException(ManifoldCFException e)
> throws ServiceInterruption {
> long currentTime = System.currentTimeMillis();
> throw new ServiceInterruption("Connection error: " + e.getMessage(), e, 
> currentTime + 30L,
>   currentTime + timeToFail * 3 * 60 * 6L, -1, false);
>   }
> {code}
>  * Calling method to handle  ManifoldCFException exception generated when 
> executing in threads: ExecuteSeedingThread, DocumentVersionThread, 
> ExecuteProcessThread
>  ** ExecuteSeedingThread
> [https://github.com/apache/manifoldcf/blob/release-2.22.1/connectors/generic/connector/src/main/java/org/apache/manifoldcf/crawler/connectors/generic/GenericConnector.java#L256]
> {code:java}
> } catch (InterruptedException e) {
>   t.interrupt();
>   throw new ManifoldCFException("Interrupted: " + e.getMessage(), e,
> ManifoldCFException.INTERRUPTED);
> +   } catch (ManifoldCFException e) {
> + handleManifoldCFException(e);
> }
> return new Long(seedTime).toString();
> {code}
>  ** DocumentVersionThread
> [https://github.com/apache/manifoldcf/blob/release-2.22.1/connectors/generic/connector/src/main/java/org/apache/manifoldcf/crawler/connectors/generic/GenericConnector.java#L304]
> {code:java}
>   try {
> versions = versioningThread.finishUp();
>   } catch (IOException ex) {
> handleIOException((IOException)ex);
> + } catch (ManifoldCFException ex) {
> +   handleManifoldCFException(ex);
>   } catch (InterruptedException ex) {
> throw new ManifoldCFException(ex.getMessage(), ex, 
> ManifoldCFException.INTERRUPTED);
>   }
> {code}
>  ** ExecuteProcessThread
> [https://github.com/apache/manifoldcf/blob/release-2.22.1/connectors/generic/connector/src/main/java/org/apache/manifoldcf/crawler/connectors/generic/GenericConnector.java#L445]
> {code:java}
> } catch (InterruptedIOException e) {
>   t.interrupt();
>   throw new ManifoldCFException("Interrupted: " + e.getMessage(), 
> e, ManifoldCFException.INTERRUPTED);
> } catch (IOException e) {
>   handleIOException(e);
> +   } catch (ManifoldCFException e) {
> + handleManifoldCFException(e);
> }
> {code}





[jira] [Created] (CONNECTORS-1731) Suggestion for adding handling process for ManifoldCFException in Generic Repository Connector

2022-09-20 Thread Nguyen Huu Nhat (Jira)
Nguyen Huu Nhat created CONNECTORS-1731:
---

 Summary: Suggestion for adding handling process for 
ManifoldCFException in Generic Repository Connector
 Key: CONNECTORS-1731
 URL: https://issues.apache.org/jira/browse/CONNECTORS-1731
 Project: ManifoldCF
  Issue Type: Improvement
Reporter: Nguyen Huu Nhat


Hi there,

I would like to suggest the following retry-related improvements:

h3. +*1. Connector name*+

Generic Repository Connector

h3. +*2. Reasons for improvement*+

While ExecuteSeedingThread, DocumentVersionThread and ExecuteProcessThread are 
running, the call to HttpClient.execute can be interrupted or can fail with a 
connection error (HTTP status code other than 200).
When the HTTP status code is not 200, a ManifoldCFException is thrown. However, 
nothing handles this exception, so the job is aborted.

Because connection errors of this kind (HTTP status code other than 200) can 
resolve on their own, a retry should be added for these cases.

h3. +*3. Improvements*+

The improvement consists of the following:
 * Add a method that handles retry for exceptions of type ManifoldCFException
 * Call this method to handle the ManifoldCFException raised while the following 
threads run: ExecuteSeedingThread, DocumentVersionThread, 
ExecuteProcessThread

h3. +*4. Suggested source code (based on release 2.22.1)*+

 * Add a method that handles retry for exceptions of type ManifoldCFException
[https://github.com/apache/manifoldcf/blob/release-2.22.1/connectors/generic/connector/src/main/java/org/apache/manifoldcf/crawler/connectors/generic/GenericConnector.java#L1026]
{code:java}
  /**
   * Function for handling ManifoldCFException exception caused by connection error.
   * In case of connection error, ServiceInterruption exception is thrown to perform retry.
   *
   * @param e ManifoldCFException
   * @throws ServiceInterruption
   */
  protected static void handleManifoldCFException(ManifoldCFException e)
    throws ServiceInterruption {
    long currentTime = System.currentTimeMillis();
    throw new ServiceInterruption("Connection error: " + e.getMessage(), e, currentTime + 30L,
      currentTime + timeToFail * 3 * 60 * 6L, -1, false);
  }
{code}

 * Call the method to handle the ManifoldCFException raised while the following 
threads run: ExecuteSeedingThread, DocumentVersionThread, 
ExecuteProcessThread
 ** ExecuteSeedingThread
[https://github.com/apache/manifoldcf/blob/release-2.22.1/connectors/generic/connector/src/main/java/org/apache/manifoldcf/crawler/connectors/generic/GenericConnector.java#L256]
{code:java}
    } catch (InterruptedException e) {
      t.interrupt();
      throw new ManifoldCFException("Interrupted: " + e.getMessage(), e,
        ManifoldCFException.INTERRUPTED);
+   } catch (ManifoldCFException e) {
+     handleManifoldCFException(e);
    }
    return new Long(seedTime).toString();
{code}
 ** DocumentVersionThread
[https://github.com/apache/manifoldcf/blob/release-2.22.1/connectors/generic/connector/src/main/java/org/apache/manifoldcf/crawler/connectors/generic/GenericConnector.java#L304]
{code:java}
  try {
    versions = versioningThread.finishUp();
  } catch (IOException ex) {
    handleIOException((IOException)ex);
+ } catch (ManifoldCFException ex) {
+   handleManifoldCFException(ex);
  } catch (InterruptedException ex) {
    throw new ManifoldCFException(ex.getMessage(), ex, ManifoldCFException.INTERRUPTED);
  }
{code}
 ** ExecuteProcessThread
[https://github.com/apache/manifoldcf/blob/release-2.22.1/connectors/generic/connector/src/main/java/org/apache/manifoldcf/crawler/connectors/generic/GenericConnector.java#L445]
{code:java}
    } catch (InterruptedIOException e) {
      t.interrupt();
      throw new ManifoldCFException("Interrupted: " + e.getMessage(), e, ManifoldCFException.INTERRUPTED);
    } catch (IOException e) {
      handleIOException(e);
+   } catch (ManifoldCFException e) {
+     handleManifoldCFException(e);
    }
{code}
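
As a rough, self-contained illustration of the pattern suggested above (catching the fatal exception and rethrowing it as something the framework can retry), the following sketch may help. The classes `FatalException` and `RetryableInterruption`, the method names, and the two delay constants are all hypothetical stand-ins invented for this example; they are not the ManifoldCF types, and the delays are assumed values.

```java
// Hypothetical sketch: these classes only imitate the pattern described above
// and are not the ManifoldCF types.
final class RetryOnConnectionErrorSketch {

  /** Stand-in for the fatal exception raised when the HTTP status is not 200. */
  static final class FatalException extends Exception {
    FatalException(String message) { super(message); }
  }

  /** Stand-in for a retryable interruption carrying a retry time and a give-up time. */
  static final class RetryableInterruption extends Exception {
    final long retryTime; // earliest time (ms since epoch) for the next attempt
    final long failTime;  // time (ms since epoch) after which retrying stops

    RetryableInterruption(String message, Throwable cause, long retryTime, long failTime) {
      super(message, cause);
      this.retryTime = retryTime;
      this.failTime = failTime;
    }
  }

  // Assumed values, chosen only for illustration.
  static final long RETRY_DELAY_MS = 5L * 60L * 1000L;
  static final long GIVE_UP_AFTER_MS = 3L * 60L * 60L * 1000L;

  /** Wraps a fatal connection error in a retryable interruption. */
  static RetryableInterruption asRetryable(FatalException e, long now) {
    return new RetryableInterruption("Connection error: " + e.getMessage(), e,
        now + RETRY_DELAY_MS, now + GIVE_UP_AFTER_MS);
  }

  /** Simulated remote call that always fails with a non-200 status. */
  static void callRemote() throws FatalException {
    throw new FatalException("interface returned incorrect return code: HTTP/1.1 503");
  }

  /** The catch clause converts the fatal error into a retryable one instead of aborting. */
  static void runStep(long now) throws RetryableInterruption {
    try {
      callRemote();
    } catch (FatalException e) {
      throw asRetryable(e, now);
    }
  }

  public static void main(String[] args) {
    long now = System.currentTimeMillis();
    try {
      runStep(now);
    } catch (RetryableInterruption r) {
      System.out.println(r.getMessage());
      System.out.println("retry in ms: " + (r.retryTime - now));
    }
  }
}
```

The point of the design is that the caller decides when to retry and when to give up from the two timestamps, so the failing step itself does not need a loop.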





[jira] [Commented] (CONNECTORS-1730) Improvement suggestion for retry function in SharedDriveConnector

2022-09-19 Thread Nguyen Huu Nhat (Jira)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17606832#comment-17606832
 ] 

Nguyen Huu Nhat commented on CONNECTORS-1730:
-

Thank you for your response.
After carefully checking the source code again, I can see that the possible 
problems are already handled by retrying via ServiceInterruption.
For that reason, I am cancelling this ticket.

> Improvement suggestion for retry function in SharedDriveConnector
> -
>
> Key: CONNECTORS-1730
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1730
> Project: ManifoldCF
>  Issue Type: Improvement
>Reporter: Nguyen Huu Nhat
>Assignee: Karl Wright
>Priority: Major
>
> Hi there,
> I would like to suggest the following retry-related improvements:
> h3. +*1. Connector name*+
> SharedDriveConnector
> h3. +*2. Overview*+
>  * When a connection to the SMB server cannot be established, the JCIFS connector 
> fails to connect and an exception is raised. The connector retries, and aborts 
> after a certain number of attempts.
>  * The number of retries is currently controlled by the following parameters:
>  ** *retriesRemaining* (hardcoded: 3): The number of occurrences of the same 
> exception allowed for a file or method. If a different exception occurs, this 
> value is reset to 3.
>  ** *totalTries* (hardcoded: 5): The total number of occurrences of exceptions 
> for a file or method.
> If *retriesRemaining* reaches 0 or *totalTries* reaches 5, the job is 
> aborted.
> h3. +*3. Reasons for improvement*+
> Currently the maximum numbers of retries are hardcoded at 3 and 5, 
> respectively.
> To avoid aborting jobs when the connection to the file server is unstable, I 
> would like to suggest making these values configurable.
> For the implementation, I would like to suggest the following options:
>  * 1/ Setting retry values in *properties.xml*
>  * 2/ Setting retry values on the WebUI of the repository connection
> Of the two options above, I suggest the first for the 
> following reasons:
>  * The first option is easier to implement
>  * Although the second option is more user-friendly, it has several issues:
>  ** The configuration data from the screen would have to be stored in the 
> database (PostgreSQL), increasing the number of fields.
>  ** Consequently, a DB migration might be needed if the setting fields 
> change further.
> ※ For the reasons above, I will proceed with the first option, 
> 'Setting retry values in properties.xml', in the rest of this suggestion.
> h3. +*4. Improvement*+
> Change the source code to read the maximum number of retries from *properties.xml*
> Declare two properties in *properties.xml* to set the maximum number of retries:
>  * `org.apache.manifoldcf.crawler.connectors.sharedrive.consecutivesmbexceptionretrycount`
> ⇒ Sets `consecutiveSMBExceptionRetryCount` 
>  * `org.apache.manifoldcf.crawler.connectors.sharedrive.totalsmbretrycount`
> ⇒ Sets `totalSMBRetryCount`
> E.g.:
> {code:xml}
>   <property name="org.apache.manifoldcf.crawler.connectors.sharedrive.consecutivesmbexceptionretrycount" value="3"/>
>   <property name="org.apache.manifoldcf.crawler.connectors.sharedrive.totalsmbretrycount" value="5"/>
> {code}
> SharedDriveConnector will load these values from the file and assign them to two 
> variables within the source code.
> ※ If these values cannot be found in the file or are set to an invalid 
> value, default values will be used instead.
> h3. +*5. Suggested source code (based on release 2.22.1)*+
> Target class: 
> org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector
>  * Declare two class variables to store the configured values as follows:
> [https://github.com/apache/manifoldcf/blob/release-2.22.1/connectors/jcifs/connector/src/main/java/org/apache/manifoldcf/crawler/connectors/sharedrive/SharedDriveConnector.java#L103]
>  
> {code:java}
>   private final static int consecutiveSMBExceptionRetryCount;
>   private final static int totalSMBRetryCount;
> {code}
>   
>  * Initialize the two variables above with the following steps:
>  ** Assign the values configured in 'properties.xml' to the two variables above
>  ** If these values are not configured or are invalid, set them to the default 
> values of 3 and 5, respectively.
> [https://github.com/apache/manifoldcf/blob/release-2.22.1/connectors/jcifs/connector/src/main/java/org/apache/manifoldcf/crawler/connectors/sharedrive/SharedDriveConnector.java#L106]
> {code:java}
>   // Static initialization of various system properties.  This hopefully 
> takes place
>   // before jcifs is loaded.
>   static
>   {
>   ...
>   int tempConsecutiveSMBExceptionRetryCount = 3;
>   int tempTotalSMBRetryCount = 5;
>   try {
>   

[jira] [Updated] (CONNECTORS-1730) Improvement suggestion for retry function in SharedDriveConnector

2022-09-07 Thread Nguyen Huu Nhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CONNECTORS-1730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nguyen Huu Nhat updated CONNECTORS-1730:

Description: 
Hi there,

I would like to suggest the following retry-related improvements:

h3. +*1. Connector name*+

SharedDriveConnector

h3. +*2. Overview*+

 * When a connection to the SMB server cannot be established, the JCIFS connector 
fails to connect and an exception is raised. The connector retries, and aborts 
after a certain number of attempts.

 * The number of retries is currently controlled by the following parameters:
 ** *retriesRemaining* (hardcoded: 3): The number of occurrences of the same 
exception allowed for a file or method. If a different exception occurs, this 
value is reset to 3.
 ** *totalTries* (hardcoded: 5): The total number of occurrences of exceptions 
for a file or method.

If *retriesRemaining* reaches 0 or *totalTries* reaches 5, the job is 
aborted.

h3. +*3. Reasons for improvement*+

Currently the maximum numbers of retries are hardcoded at 3 and 5, 
respectively.
To avoid aborting jobs when the connection to the file server is unstable, I 
would like to suggest making these values configurable.

For the implementation, I would like to suggest the following options:
 * 1/ Setting retry values in *properties.xml*
 * 2/ Setting retry values on the WebUI of the repository connection

Of the two options above, I suggest the first for the following 
reasons:
 * The first option is easier to implement
 * Although the second option is more user-friendly, it has several issues:
 ** The configuration data from the screen would have to be stored in the 
database (PostgreSQL), increasing the number of fields.
 ** Consequently, a DB migration might be needed if the setting fields 
change further.

※ For the reasons above, I will proceed with the first option, 'Setting 
retry values in properties.xml', in the rest of this suggestion.

h3. +*4. Improvement*+

Change the source code to read the maximum number of retries from *properties.xml*

Declare two properties in *properties.xml* to set the maximum number of retries:
 * `org.apache.manifoldcf.crawler.connectors.sharedrive.consecutivesmbexceptionretrycount`
⇒ Sets `consecutiveSMBExceptionRetryCount` 
 * `org.apache.manifoldcf.crawler.connectors.sharedrive.totalsmbretrycount`
⇒ Sets `totalSMBRetryCount`
E.g.:
{code:xml}
  <property name="org.apache.manifoldcf.crawler.connectors.sharedrive.consecutivesmbexceptionretrycount" value="3"/>
  <property name="org.apache.manifoldcf.crawler.connectors.sharedrive.totalsmbretrycount" value="5"/>
{code}
SharedDriveConnector will load these values from the file and assign them to two 
variables within the source code.
※ If these values cannot be found in the file or are set to an invalid value, 
default values will be used instead.
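
The fallback behaviour described above (use the configured value, otherwise fall back to the default) could be sketched roughly as follows. This is only an illustration under stated assumptions: it reads from a plain `java.util.Properties` object rather than from ManifoldCF's own property lookup, the helper name `intOrDefault` is made up for this example, and "invalid" is assumed to mean "not parseable as an integer".

```java
import java.util.Properties;

// Hypothetical sketch: java.util.Properties stands in for the real
// properties.xml lookup, and intOrDefault is an invented helper name.
final class RetryConfigSketch {
  static final String CONSECUTIVE_KEY =
      "org.apache.manifoldcf.crawler.connectors.sharedrive.consecutivesmbexceptionretrycount";
  static final String TOTAL_KEY =
      "org.apache.manifoldcf.crawler.connectors.sharedrive.totalsmbretrycount";

  /** Returns the integer stored under key, or defaultValue if it is missing or not an integer. */
  static int intOrDefault(Properties props, String key, int defaultValue) {
    String raw = props.getProperty(key);
    if (raw == null) {
      return defaultValue; // property not configured
    }
    try {
      return Integer.parseInt(raw.trim());
    } catch (NumberFormatException e) {
      System.err.println("Invalid property value for " + key
          + ", must be integer. Setting to default: " + defaultValue);
      return defaultValue; // property configured but not an integer
    }
  }

  public static void main(String[] args) {
    Properties props = new Properties();
    props.setProperty(CONSECUTIVE_KEY, "7");
    props.setProperty(TOTAL_KEY, "many");
    System.out.println(intOrDefault(props, CONSECUTIVE_KEY, 3));
    System.out.println(intOrDefault(props, TOTAL_KEY, 5));
  }
}
```

Keeping the defaults as arguments means the existing values of 3 and 5 stay in one place and remain the behaviour for installations that never set the new properties.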

h3. +*5. Suggested source code (based on release 2.22.1)*+

Target class: 
org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector

 * Declare two class variables to store the configured values as follows:
[https://github.com/apache/manifoldcf/blob/release-2.22.1/connectors/jcifs/connector/src/main/java/org/apache/manifoldcf/crawler/connectors/sharedrive/SharedDriveConnector.java#L103]
 
{code:java}
  private final static int consecutiveSMBExceptionRetryCount;
  private final static int totalSMBRetryCount;
{code}

 * Initialize the two variables above with the following steps:
 ** Assign the values configured in 'properties.xml' to the two variables above
 ** If these values are not configured or are invalid, set them to the default 
values of 3 and 5, respectively.
[https://github.com/apache/manifoldcf/blob/release-2.22.1/connectors/jcifs/connector/src/main/java/org/apache/manifoldcf/crawler/connectors/sharedrive/SharedDriveConnector.java#L106]
{code:java}
  // Static initialization of various system properties.  This hopefully takes place
  // before jcifs is loaded.
  static
  {
    ...
    int tempConsecutiveSMBExceptionRetryCount = 3;
    int tempTotalSMBRetryCount = 5;

    try {
      tempConsecutiveSMBExceptionRetryCount = ManifoldCF.getIntProperty(
        "org.apache.manifoldcf.crawler.connectors.sharedrive.consecutivesmbexceptionretrycount",
        tempConsecutiveSMBExceptionRetryCount);
    } catch (ManifoldCFException e) {
      Logging.connectors.warn("Invalid property value for " + 
        "org.apache.manifoldcf.crawler.connectors.sharedrive.consecutivesmbexceptionretrycount, must be integer. Setting to default: " + 
        Integer.toString(tempConsecutiveSMBExceptionRetryCount));
    }
    consecutiveSMBExceptionRetryCount = tempConsecutiveSMBExceptionRetryCount;
    try {
      tempTotalSMBRetryCount = ManifoldCF.getIntProperty(
        "org.apache.manifoldcf.crawler.connectors.sharedrive.totalsmbretrycount",
        tempTotalSMBRetryCount);
    } catch (ManifoldCFException e) {
      Logging.connectors.warn("Invalid property value for " + 
        "org.apache.manifoldcf.crawler.connectors.sharedrive.totalsmbretrycount, must be integer. Setting to default: " + 

[jira] [Created] (CONNECTORS-1730) Improvement suggestion for retry function in SharedDriveConnector

2022-09-07 Thread Nguyen Huu Nhat (Jira)
Nguyen Huu Nhat created CONNECTORS-1730:
---

 Summary: Improvement suggestion for retry function in 
SharedDriveConnector
 Key: CONNECTORS-1730
 URL: https://issues.apache.org/jira/browse/CONNECTORS-1730
 Project: ManifoldCF
  Issue Type: Improvement
Reporter: Nguyen Huu Nhat


Hi there,

I would like to suggest the following retry-related improvements:

h3. +*1. Connector name*+

SharedDriveConnector

h3. +*2. Preface*+

 * When a connection to the SMB server cannot be established, the JCIFS connector 
fails to connect and an exception is raised. The connector retries, and aborts 
after a certain number of attempts.

 * The number of retries is currently controlled by the following parameters:
 ** *retriesRemaining* (hardcoded: 3): The number of occurrences of the same 
exception allowed for a file or method. If a different exception occurs, this 
value is reset to 3.
 ** *totalTries* (hardcoded: 5): The total number of occurrences of exceptions 
for a file or method.

If *retriesRemaining* reaches 0 or *totalTries* reaches 5, the job is 
aborted.
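
To make the interaction of the two counters concrete, here is a minimal sketch of the behaviour as described above. It is not the connector's code: the class `SmbRetryBudget`, the method `recordFailure`, and the use of a string key to tell exceptions apart are all assumptions made for this illustration.

```java
// Hypothetical sketch of the two-counter retry rule described above;
// it is not taken from SharedDriveConnector.
final class SmbRetryBudget {
  private final int maxConsecutive; // e.g. 3: allowed repeats of the same exception
  private final int maxTotal;       // e.g. 5: allowed exceptions overall
  private int retriesRemaining;
  private int totalTries = 0;
  private String lastErrorKey = null;

  SmbRetryBudget(int maxConsecutive, int maxTotal) {
    this.maxConsecutive = maxConsecutive;
    this.maxTotal = maxTotal;
    this.retriesRemaining = maxConsecutive;
  }

  /**
   * Records one failure identified by errorKey.
   * Returns true if another attempt is allowed, false if the job should abort.
   */
  boolean recordFailure(String errorKey) {
    if (lastErrorKey != null && !lastErrorKey.equals(errorKey)) {
      retriesRemaining = maxConsecutive; // a different exception resets the counter
    }
    lastErrorKey = errorKey;
    retriesRemaining--;
    totalTries++;
    return retriesRemaining > 0 && totalTries < maxTotal;
  }

  public static void main(String[] args) {
    SmbRetryBudget budget = new SmbRetryBudget(3, 5);
    String[] failures = {"timeout", "timeout", "timeout"};
    for (String f : failures) {
      System.out.println(f + " -> retry allowed: " + budget.recordFailure(f));
    }
  }
}
```

With the values 3 and 5, three identical failures in a row exhaust the first counter, while alternating failures keep resetting it and are stopped by the second counter on the fifth failure. Making both limits constructor arguments is what the configurable properties would feed into.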

h3. +*3. Reasons for improvement*+

Currently the maximum numbers of retries are hardcoded at 3 and 5, 
respectively.
To avoid aborting jobs when the connection to the file server is unstable, I 
would like to suggest making these values configurable.

For the implementation, I would like to suggest the following options:
 * 1/ Setting retry values in *properties.xml*
 * 2/ Setting retry values on the WebUI of the repository connection

Of the two options above, I suggest the first for the following 
reasons:
 * The first option is easier to implement
 * Although the second option is more user-friendly, it has several issues:
 ** The configuration data from the screen would have to be stored in the 
database (PostgreSQL), increasing the number of fields.
 ** Consequently, a DB migration might be needed if the setting fields 
change further.

※ For the reasons above, I will proceed with the first option, 'Setting 
retry values in properties.xml', in the rest of this suggestion.

h3. +*4. Improvement*+

Change the source code to read the maximum number of retries from *properties.xml*

Declare two properties in *properties.xml* to set the maximum number of retries:
 * `org.apache.manifoldcf.crawler.connectors.sharedrive.consecutivesmbexceptionretrycount`
⇒ Sets `consecutiveSMBExceptionRetryCount` 
 * `org.apache.manifoldcf.crawler.connectors.sharedrive.totalsmbretrycount`
⇒ Sets `totalSMBRetryCount`
E.g.:
{code:xml}
  <property name="org.apache.manifoldcf.crawler.connectors.sharedrive.consecutivesmbexceptionretrycount" value="3"/>
  <property name="org.apache.manifoldcf.crawler.connectors.sharedrive.totalsmbretrycount" value="5"/>
{code}
SharedDriveConnector will load these values from the file and assign them to two 
variables within the source code.
※ If these values cannot be found in the file or are set to an invalid value, 
default values will be used instead.

h3. +*5. Suggested source code (based on release 2.22.1)*+

Target class: 
org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector

[https://github.com/apache/manifoldcf/blob/release-2.22.1/connectors/jcifs/connector/src/main/java/org/apache/manifoldcf/crawler/connectors/sharedrive/SharedDriveConnector.java#L103]

 * Declare two class variables to store the configured values as follows:   

{code:java}
  private final static int consecutiveSMBExceptionRetryCount;
  private final static int totalSMBRetryCount;
{code}

 * Initialize the two variables above with the following steps:
 ** Assign the values configured in 'properties.xml' to the two variables above
 ** If these values are not configured or are invalid, set them to the default 
values of 3 and 5, respectively.
[https://github.com/apache/manifoldcf/blob/release-2.22.1/connectors/jcifs/connector/src/main/java/org/apache/manifoldcf/crawler/connectors/sharedrive/SharedDriveConnector.java#L106]
{code:java}
  // Static initialization of various system properties.  This hopefully takes place
  // before jcifs is loaded.
  static
  {
    ...
    int tempConsecutiveSMBExceptionRetryCount = 3;
    int tempTotalSMBRetryCount = 5;

    try {
      tempConsecutiveSMBExceptionRetryCount = ManifoldCF.getIntProperty(
        "org.apache.manifoldcf.crawler.connectors.sharedrive.consecutivesmbexceptionretrycount",
        tempConsecutiveSMBExceptionRetryCount);
    } catch (ManifoldCFException e) {
      Logging.connectors.warn("Invalid property value for " + 
        "org.apache.manifoldcf.crawler.connectors.sharedrive.consecutivesmbexceptionretrycount, must be integer. Setting to default: " + 
        Integer.toString(tempConsecutiveSMBExceptionRetryCount));
    }
    consecutiveSMBExceptionRetryCount = tempConsecutiveSMBExceptionRetryCount;
    try {
      tempTotalSMBRetryCount = ManifoldCF.getIntProperty(
        "org.apache.manifoldcf.crawler.connectors.sharedrive.totalsmbretrycount",
        tempTotalSMBRetryCount);
    } catch (ManifoldCFException e) {
  

[jira] [Created] (CONNECTORS-1729) The Confluence-v6 Repository Connector's attachment logic is incorrect

2022-08-29 Thread Nguyen Huu Nhat (Jira)
Nguyen Huu Nhat created CONNECTORS-1729:
---

 Summary: The Confluence-v6 Repository Connector's attachment logic 
is incorrect
 Key: CONNECTORS-1729
 URL: https://issues.apache.org/jira/browse/CONNECTORS-1729
 Project: ManifoldCF
  Issue Type: Bug
Reporter: Nguyen Huu Nhat


Hi there,

As there is an issue that still occurs in use and has not been handled, I would 
like to suggest the following fix to the source code of the Confluence Repository 
Connector.
For details about this issue, please refer to the information below:

h3. +*1. Connector Name*+

confluence-v6 \ Confluence Repository Connector

h3. +*2. Overview*+

 * In the Confluence Repository Connector, there is an error in the logic that 
determines whether a document has attachments.
 * This incorrect logic causes attachments not to be crawled.

※ This error only occurs when crawling documents from Confluence server; 
crawling documents from Confluence Cloud (SaaS) still works normally.
 * The formats of the document ID when a file is attached are as follows:
 ** Crawled from Confluence server: *-*
 ** Crawled from Confluence cloud (SaaS): *att-*

h3. +*3. Reproduction*+

 * On Confluence server:
 ** Create a blog.
 ** Add attachments to the newly created blog.
 * On ManifoldCF:
 ** Create a Confluence Repository Connector with the aforementioned Confluence 
server information.
 ** Create a job using the connector created above with the following details:
 *** On the [Page] tab:
 **** Process Attachments: (Check).
 **** Type Specification: Blog.
 ** Start job.
 ** Check [Simple History Report].

h3. +*4. Cause*+

 * The logic that decides whether a document has a file attachment checks whether 
the document ID begins with *att*; if it does, the document is treated as having 
a file attachment.
 * However, the ID of a document crawled from the Confluence server is not 
prefixed with *att* when a file is attached (see the formats mentioned in 
item 2).

h3. +*5. Solution*+

My observations are as follows:
 * If a document has a file attachment, the ID of that document is a string of 
characters joined by the *-* character.
 * If a document does not have a file attachment, the ID of that document does 
not contain the *-* character.

Therefore, it is possible to judge whether a file is attached by 
checking whether the ID contains the *-* character.

h3. +*6. Suggested source code (based on release 2.22.1)*+

***Class: 
org.apache.manifoldcf.crawler.connectors.confluence.v6.util.ConfluenceUtil***

[https://github.com/apache/manifoldcf/blob/release-2.22.1/connectors/confluence-v6/connector/src/main/java/org/apache/manifoldcf/crawler/connectors/confluence/v6/util/ConfluenceUtil.java#L28]
{code:java}
-  private static final String ATTACHMENT_ID_PREFIX = "att";
+  private static final String ATTACHMENT_ID_CHARACTER = "-";
{code}

[https://github.com/apache/manifoldcf/blob/release-2.22.1/connectors/confluence-v6/connector/src/main/java/org/apache/manifoldcf/crawler/connectors/confluence/v6/util/ConfluenceUtil.java#L47]
{code:java}
   public static Boolean isAttachment(String id) {
-    return id.startsWith(ATTACHMENT_ID_PREFIX);
+    return id.contains(ATTACHMENT_ID_CHARACTER);
   }
{code}
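
The difference between the two checks can be seen with a small stand-alone comparison. This is only a sketch: the class and method names are invented for the example, and the sample IDs are made up to mimic the shapes described in this issue (a hyphen-joined ID, an ID starting with att, and a plain ID); they are not real Confluence IDs.

```java
// Hypothetical comparison of the current and the suggested check.
// The sample IDs below are invented for illustration only.
final class AttachmentIdSketch {

  /** Current behaviour: an ID counts as an attachment only if it starts with "att". */
  static boolean isAttachmentByPrefix(String id) {
    return id.startsWith("att");
  }

  /** Suggested behaviour: an ID counts as an attachment if it contains "-". */
  static boolean isAttachmentByHyphen(String id) {
    return id.contains("-");
  }

  public static void main(String[] args) {
    String[] sampleIds = {"98765-12345", "att98765-12345", "98765"};
    for (String id : sampleIds) {
      System.out.println(id + " prefix=" + isAttachmentByPrefix(id)
          + " hyphen=" + isAttachmentByHyphen(id));
    }
  }
}
```

Under these assumed ID shapes, the prefix check misses the hyphen-joined ID without the att prefix, which matches the symptom reported for Confluence server, while the hyphen check accepts both attachment shapes and still rejects the plain ID.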





[jira] [Created] (CONNECTORS-1728) Fix error message of Generic Repository Connector

2022-08-29 Thread Nguyen Huu Nhat (Jira)
Nguyen Huu Nhat created CONNECTORS-1728:
---

 Summary: Fix error message of Generic Repository Connector
 Key: CONNECTORS-1728
 URL: https://issues.apache.org/jira/browse/CONNECTORS-1728
 Project: ManifoldCF
  Issue Type: Bug
Reporter: Nguyen Huu Nhat


Hi there,

As there is a problem that occurs during use and has not yet been addressed, I 
would like to suggest the following correction to the source code of the Generic 
Repository Connector.
For additional details, please see below:

h3. +*1. Connector name*+

Generic Repository Connector

h3. +*2. Issue*+

In the *run()* method of the *GenericConnector$DocumentVersionThread* class, if 
the connector cannot connect to the REST API (HTTP status code != 200), the 
following error message is written to the log file:
[ *addSeedDocuments error* - interface returned incorrect return code for: ... ]

However, this is the *DocumentVersionThread* thread, not the 
*ExecuteSeedingThread* thread, so the *addSeedDocuments error* prefix is not 
suitable for it.
I think the prefix should be *getDocumentVersions error*.

h3. +*3. Cause*+

This may be a copy/paste mistake:
[https://github.com/apache/manifoldcf/blob/release-2.22.1/connectors/generic/connector/src/main/java/org/apache/manifoldcf/crawler/connectors/generic/GenericConnector.java#L1207]
{code:java}
  if (response.getStatusLine().getStatusCode() != HttpStatus.SC_OK) {
    exception = new ManifoldCFException("addSeedDocuments error - interface returned incorrect return code for: " + url + " - " + response.getStatusLine().toString());
    return;
  }
{code}

h3. +*4. Solution*+

Update the prefix of this error message from [addSeedDocuments error] to 
[getDocumentVersions error].

h3. +*5. Suggested source code (based on release 2.22.1)*+

[https://github.com/apache/manifoldcf/blob/release-2.22.1/connectors/generic/connector/src/main/java/org/apache/manifoldcf/crawler/connectors/generic/GenericConnector.java#L1207]
{code:java}
  if (response.getStatusLine().getStatusCode() != HttpStatus.SC_OK) {
-   exception = new ManifoldCFException("addSeedDocuments error - interface returned incorrect return code for: " + url + " - " + response.getStatusLine().toString());
+   exception = new ManifoldCFException("getDocumentVersions error - interface returned incorrect return code for: " + url + " - " + response.getStatusLine().toString());
    return;
  }
{code}





[jira] [Created] (CONNECTORS-1727) Timeout values for Genreric Authority is not updated after setting

2022-08-29 Thread Nguyen Huu Nhat (Jira)
Nguyen Huu Nhat created CONNECTORS-1727:
---

 Summary: Timeout values for Genreric Authority is not updated 
after setting
 Key: CONNECTORS-1727
 URL: https://issues.apache.org/jira/browse/CONNECTORS-1727
 Project: ManifoldCF
  Issue Type: Bug
Reporter: Nguyen Huu Nhat


Hi there,

As there is a problem that occurs during use and has not yet been addressed, I 
would like to suggest the following correction to the source code of the Generic 
Authority Connector.
※ This is the same issue as the one in the Generic Repository Connector, which was 
resolved in 
[CONNECTORS-1726|https://issues.apache.org/jira/browse/CONNECTORS-1726]
For additional details, please see below:

h3. +*1. Connector name*+

Generic Authority Connector

h3. +*2. Issue*+

When I create or edit a Generic authority connection, I cannot update the value 
in the following fields:
 * Connection timeout (milis)
 * Socket timeout (milis)

h3. +*3. Reproduction*+

 * Create a Generic authority connection
 ** On *Entry point* tab, edit the values of *Connection timeout (milis)* and 
*Socket timeout (milis)* fields
 ** Click on *Save* button
 * On *View Authority Connection Status - Generic* screen, it can be seen that 
the values of the 2 above fields are not updated.

h3. +*4. Cause*+

The names of the textboxes for the 2 fields are as follows:
 * genericConTimeout
 * genericSoTimeout

However, the names used inside the source code are as 
follows:
 * genericConnectionTimeout
 * genericSocketTimeout

As a result, the new values cannot be obtained, and the values of the 2 
fields cannot be updated.
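
The effect of such a name mismatch can be reproduced in isolation. The sketch below is an assumption-laden simplification: plain maps stand in for the posted form data and the stored connection parameters, the `copyParam` helper here is a hypothetical three-argument version written for this example (not the connector's own method), and the timeout values are invented.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: maps stand in for the posted form and the stored
// parameters; this copyParam is a simplified helper written for illustration.
final class ParamNameMismatchSketch {

  /** Copies the posted value for name into stored; returns false if nothing was posted under that name. */
  static boolean copyParam(Map<String, String> posted, Map<String, String> stored, String name) {
    String value = posted.get(name);
    if (value == null) {
      return false;
    }
    stored.put(name, value);
    return true;
  }

  /** Simulates saving the form with the given field name, then reading the name the code actually uses. */
  static String saveAndRead(String formFieldName, String newValue) {
    Map<String, String> stored = new HashMap<>();
    stored.put("genericConnectionTimeout", "60000"); // assumed existing value

    Map<String, String> posted = new HashMap<>();
    posted.put(formFieldName, newValue);
    copyParam(posted, stored, formFieldName);

    return stored.get("genericConnectionTimeout");
  }

  public static void main(String[] args) {
    System.out.println("before fix: " + saveAndRead("genericConTimeout", "120000"));
    System.out.println("after fix:  " + saveAndRead("genericConnectionTimeout", "120000"));
  }
}
```

In the first call the new value is stored under a name nobody reads, so the old timeout is still in effect; once the field name matches the name the code reads, the update takes effect.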

h3. +*5. Solution*+

Update the parameter names for Connection Timeout and Socket Timeout to the names 
that are stored in the database:
 * genericConTimeout ➞ genericConnectionTimeout
 * genericSoTimeout ➞ genericSocketTimeout

h3. +*6. Suggested source code (based on release 2.22.1)*+

[https://github.com/apache/manifoldcf/blob/release-2.22.1/connectors/generic/connector/src/main/java/org/apache/manifoldcf/authorities/authorities/generic/GenericAuthority.java#L400]
{code:java}
+ " \n"
+ "  " + Messages.getBodyString(locale, 
"generic.ConnectionTimeoutColon") + "\n"
-   + "  \n"
+   + "  \n"
+ " \n"
+ " \n"
+ "  " + Messages.getBodyString(locale, 
"generic.SocketTimeoutColon") + "\n"
-   + "  \n"
+   + "  \n"
+ " \n"
{code}

[https://github.com/apache/manifoldcf/blob/release-2.22.1/connectors/generic/connector/src/main/java/org/apache/manifoldcf/authorities/authorities/generic/GenericAuthority.java#L415]
{code:java}
- out.print("\n");
- out.print("\n");
+ out.print("\n");
+ out.print("\n");
{code}

[https://github.com/apache/manifoldcf/blob/release-2.22.1/connectors/generic/connector/src/main/java/org/apache/manifoldcf/authorities/authorities/generic/GenericAuthority.java#L428]
{code:java}
-   copyParam(variableContext, parameters, "genericConTimeout");
-   copyParam(variableContext, parameters, "genericSoTimeout");
+   copyParam(variableContext, parameters, "genericConnectionTimeout");
+   copyParam(variableContext, parameters, "genericSocketTimeout");
{code}





[jira] [Created] (CONNECTORS-1726) Timeout values for Genreric Repository is not updated after setting

2022-08-28 Thread Nguyen Huu Nhat (Jira)
Nguyen Huu Nhat created CONNECTORS-1726:
---

 Summary: Timeout values for Genreric Repository is not updated 
after setting
 Key: CONNECTORS-1726
 URL: https://issues.apache.org/jira/browse/CONNECTORS-1726
 Project: ManifoldCF
  Issue Type: Bug
Reporter: Nguyen Huu Nhat


Hi there,

As there is a problem that occurs during use and has not yet been addressed, I 
would like to suggest the following correction to the source code of the Generic 
Repository Connector.
For additional details, please see below:

h3. +*1. Connector name*+

Generic Repository Connector

h3. +*2. Issue*+

When I create or edit a Generic repository connection, I cannot update the 
value in the following fields:
 * Connection timeout (milis)
 * Socket timeout (milis)

h3. +*3. Reproduction*+

 * Create a Generic repository connection
 ** On the *Entry point* tab, edit the values of the *Connection timeout (milis)* 
and *Socket timeout (milis)* fields
 ** Click the *Save* button
 * On the *View Repository Connection Status - Generic* screen, the values of the 
two fields above are unchanged.

h3. +*4. Cause*+

The textboxes for the two fields are named as follows:
 * genericConTimeout
 * genericSoTimeout

However, the names used inside the source code are as follows:
 * genericConnectionTimeout
 * genericSocketTimeout

As a result, the new values cannot be retrieved, so the values of the two 
fields cannot be updated.

h3. +*5. Solution*+

Rename the Connection Timeout and Socket Timeout parameters to the names 
that are stored in the database:
 * genericConTimeout ➞ genericConnectionTimeout
 * genericSoTimeout ➞ genericSocketTimeout
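
The effect of such a name mismatch can be sketched in isolation. The example below is only an illustration under simplifying assumptions: the maps stand in for the posted form and the stored connection parameters, the *copyParam* helper here is a simplified stand-in and not the connector's real method signature, and the default value 60000 is made up for the demonstration.

```java
import java.util.HashMap;
import java.util.Map;

class TimeoutParamDemo {
    // Illustrative default only; not taken from the connector source.
    static final String DEFAULT_TIMEOUT = "60000";

    // Simplified stand-in: copies one posted form field into the stored parameters.
    static void copyParam(Map<String, String> form, Map<String, String> stored, String name) {
        String value = form.get(name);
        if (value != null) {
            stored.put(name, value);
        }
    }

    // Simplified stand-in for the code that later reads the timeout back.
    static String readConnectionTimeout(Map<String, String> stored) {
        return stored.getOrDefault("genericConnectionTimeout", DEFAULT_TIMEOUT);
    }

    // Simulates saving a form whose textbox is called fieldName, then reading the value.
    static String saveAndRead(String fieldName, String typedValue) {
        Map<String, String> form = new HashMap<>();
        form.put(fieldName, typedValue);
        Map<String, String> stored = new HashMap<>();
        copyParam(form, stored, fieldName);
        return readConnectionTimeout(stored);
    }

    public static void main(String[] args) {
        // Mismatched name: the typed value is stored under a key nobody reads.
        System.out.println(saveAndRead("genericConTimeout", "5000"));        // prints 60000
        // Matching name: the typed value is found again.
        System.out.println(saveAndRead("genericConnectionTimeout", "5000")); // prints 5000
    }
}
```

With the old field name the typed value is saved but never read, which matches the symptom that the displayed timeout does not change after saving.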

h3. +*6. Suggested source code (based on release 2.22.1)*+

[https://github.com/apache/manifoldcf/blob/release-2.22.1/connectors/generic/connector/src/main/java/org/apache/manifoldcf/crawler/connectors/generic/GenericConnector.java#L510]
{code:java}
+ " \n"
+ "  " + Messages.getBodyString(locale, 
"generic.ConnectionTimeoutColon") + "\n"
-   + "  \n"
+   + "  \n"
+ " \n"
+ " \n"
+ "  " + Messages.getBodyString(locale, 
"generic.SocketTimeoutColon") + "\n"
-   + "  \n"
+   + "  \n"
+ " \n"
{code}

[https://github.com/apache/manifoldcf/blob/release-2.22.1/connectors/generic/connector/src/main/java/org/apache/manifoldcf/crawler/connectors/generic/GenericConnector.java#L523]
{code:java}
- out.print("\n");
- out.print("\n");
+ out.print("\n");
+ out.print("\n");
{code}

[https://github.com/apache/manifoldcf/blob/release-2.22.1/connectors/generic/connector/src/main/java/org/apache/manifoldcf/crawler/connectors/generic/GenericConnector.java#L535]
{code:java}
-   copyParam(variableContext, parameters, "genericConTimeout");
-   copyParam(variableContext, parameters, "genericSoTimeout");
+   copyParam(variableContext, parameters, "genericConnectionTimeout");
+   copyParam(variableContext, parameters, "genericSocketTimeout");
{code}





[jira] [Created] (CONNECTORS-1725) MissingResourceException exception occurs at Generic Repository Connector

2022-08-28 Thread Nguyen Huu Nhat (Jira)
Nguyen Huu Nhat created CONNECTORS-1725:
---

 Summary: MissingResourceException exception occurs at Generic 
Repository Connector
 Key: CONNECTORS-1725
 URL: https://issues.apache.org/jira/browse/CONNECTORS-1725
 Project: ManifoldCF
  Issue Type: Bug
Reporter: Nguyen Huu Nhat


Hi there,

I ran into a problem that has not been addressed yet, so I would like to 
suggest the following correction to the source code of the Generic Repository 
Connector.
Details are given below:

h3. +*1. Connector name*+

Generic Repository Connector

h3. +*2. Issue*+

When I create or edit a job that uses the Generic Repository Connector and add a 
parameter while leaving its *Parameter name* field blank, an alert showing the 
raw key *generic.TypeInParamName* appears.

The following error is logged:
{noformat}
ERROR 2022-08-04T15:45:43,443 (qtp10405169-442) - Missing resource 'generic.TypeInParamName' in bundle 'org.apache.manifoldcf.crawler.connectors.generic.common' for locale 'en'
java.util.MissingResourceException: Can't find resource for bundle java.util.PropertyResourceBundle, key generic.TypeInParamName
    at java.util.ResourceBundle.getObject(ResourceBundle.java:450) ~[?:1.8.0_211]
    at java.util.ResourceBundle.getString(ResourceBundle.java:407) ~[?:1.8.0_211]
    at org.apache.manifoldcf.core.i18n.Messages.getMessage(Messages.java:195) ~[mcf-core.jar:?]
    at org.apache.manifoldcf.core.i18n.Messages.getMessage(Messages.java:184) ~[mcf-core.jar:?]
    at org.apache.manifoldcf.core.i18n.Messages.getString(Messages.java:218) ~[mcf-core.jar:?]
    at org.apache.manifoldcf.ui.i18n.Messages.getBodyJavascriptString(Messages.java:343) ~[mcf-ui-core.jar:?]
    at org.apache.manifoldcf.crawler.connectors.generic.Messages.getBodyJavascriptString(Messages.java:95) ~[?:?]
    at org.apache.manifoldcf.crawler.connectors.generic.Messages.getBodyJavascriptString(Messages.java:54) ~[?:?]
    at org.apache.manifoldcf.crawler.connectors.generic.GenericConnector.outputSpecificationHeader(GenericConnector.java:610) ~[?:?]
{noformat}

h3. +*3. Reproduction*+

 * Create a Generic repository connection
 * Create a job using the connection created above with the following details:
 ** On the Parameters tab, add the following parameters:
 *** Parameter name: blank
 *** Parameter value: 
 *** Click [Add]

h3. +*4. Cause*+

The key *generic.TypeInParamName* is used in the code but is not defined in the 
native2ascii _*.properties_ files.
It was probably confused with *generic.TypeInParameterName*, which is not used 
anywhere.

h3. +*5. Solution*+

Update the property key used in the Java classes so that it matches the one 
defined in the _*.properties_ files:
*generic.TypeInParamName* ➞ *generic.TypeInParameterName*
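
The lookup failure itself can be reproduced with the standard *java.util.ResourceBundle* API. This is a minimal sketch, not ManifoldCF code: the in-memory properties text and its message are invented for the example, and the fallback of returning the key when the lookup fails is an assumption that mirrors the reported symptom (the alert shows the key).

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.MissingResourceException;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;

class MissingKeyDemo {
    // In-memory stand-in for a *.properties file; the message text is invented.
    static final String PROPERTIES =
        "generic.TypeInParameterName=Type in a parameter name\n";

    // Looks up a key; falls back to the key itself when the bundle does not define it.
    static String lookup(String key) {
        try {
            ResourceBundle bundle = new PropertyResourceBundle(new StringReader(PROPERTIES));
            return bundle.getString(key);
        } catch (MissingResourceException e) {
            // Same exception type as in the logged stack trace.
            return key;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(lookup("generic.TypeInParamName"));     // undefined key
        System.out.println(lookup("generic.TypeInParameterName")); // defined key
    }
}
```

Only the second call returns a human-readable message, which is why switching the Java side to the defined key is enough and no properties file needs to change.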

h3. +*6. Suggested source code (based on release 2.22.1)*+

[https://github.com/apache/manifoldcf/blob/release-2.22.1/connectors/generic/connector/src/main/java/org/apache/manifoldcf/crawler/connectors/generic/GenericConnector.java#L610]
{code:java}
  + "function "+seqPrefix+"SpecAddParam(anchorvalue) {\n"
  + "  if (editjob."+seqPrefix+"specparamname.value == \"\")\n"
  + "  {\n"
- + "alert(\"" + Messages.getBodyJavascriptString(locale, "generic.TypeInParamName") + "\");\n"
+ + "alert(\"" + Messages.getBodyJavascriptString(locale, "generic.TypeInParameterName") + "\");\n"
  + "editjob."+seqPrefix+"specparamname.focus();\n"
  + "return;\n"
  + "  }\n"
  + "  "+seqPrefix+"SpecOp(\""+seqPrefix+"paramop\",\"Add\",anchorvalue);\n"
  + "}\n"
{code}





[jira] [Created] (CONNECTORS-1724) When the REST API cannot be connected, job using the Generic Repository Connector would be frozen.

2022-08-24 Thread Nguyen Huu Nhat (Jira)
Nguyen Huu Nhat created CONNECTORS-1724:
---

 Summary: When the REST API cannot be connected, job using the 
Generic Repository Connector would be frozen.
 Key: CONNECTORS-1724
 URL: https://issues.apache.org/jira/browse/CONNECTORS-1724
 Project: ManifoldCF
  Issue Type: Bug
Reporter: Nguyen Huu Nhat


Hi there,

I ran into an issue that has not been handled yet, so I would like to 
suggest the following fix to the source code of the Generic Repository Connector.
Details are given below:

h3. +*1. Connector name*+

Generic Repository Connector

h3. +*2. Issue*+

When the Generic Repository Connector calls the REST API with _action=seed_ and 
an error occurs, the error is not handled. As a result, the ManifoldCF crawling 
job freezes at status *Starting up* and no error message is output.
 * When this issue occurs in the Generic Repository, the seeding phase of jobs in 
other repositories also freezes (perhaps the seeding thread itself is frozen).
 * Even after ManifoldCF is restarted, the same issue recurs because jobs are 
started automatically.
 * A temporary workaround is to abort the job and recheck the connection.

h3. +*3. Reproduction*+

h4. *Reproduction method:*
 * When configuring the Generic repository connection, set a non-existent entry 
point (e.g. [http://localhost/no*exist/]). Then define a job that uses that 
connection and run it.
 * Ten minutes or more after the job is started, its status is still *Starting 
up*; the job does not end abnormally with a connection error or a timeout.

h4. *Reproduction steps:*
 * Create a Generic repository connection with the following settings:
 ** On the *Entry Point* tab, set a non-existent entry point (e.g. 
[http://localhost/no*exist/])
 * Create a job using the above Generic repository connection
 * Start the created job and keep track of its status
 ** The job freezes with the following information:
 *** Status: Starting up
 *** Start Time: Not started
 *** Documents: 0
 ** No new events appear in *Document Status*
 ** No errors are logged in manifoldcf.log

h3. +*4. Cause*+

In the *GenericConnector$ExecuteSeedingThread* class, *seedBuffer.signalDone()* 
is called only when the returned HTTP status code is 200.
 * When the REST API request fails (that is, the returned HTTP status code is 
not 200), *seedBuffer.signalDone()* is not called.
 ** As a result, the *complete* flag is never set to _true_.
 ** Because *complete* stays _false_ and *buffer.size()* is 0, the job remains 
blocked in *wait()* inside the while loop of the 
*XThreadStringBuffer#fetch()* method.

([https://github.com/apache/manifoldcf/blob/release-2.22.1/framework/connector-common/src/main/java/org/apache/manifoldcf/connectorcommon/common/XThreadStringBuffer.java#L78])
{code:java}
while (buffer.size() == 0 && !complete)
  wait();
{code}

⇒ This is why the job freezes at status *Starting up*.

h3. +*5. Solution*+

To resolve this issue, we suggest the following:
 * *seedBuffer.signalDone()* should be called regardless of the HTTP response 
status.
 * In addition, a ManifoldCFException is thrown when the HTTP status code is not 
200, but the *finishUp()* method of *GenericConnector$ExecuteSeedingThread* 
does not handle ManifoldCFException, so handling for this exception should be 
added.
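
The first point can be illustrated with a small stand-alone sketch. The *SeedBuffer* class below is a simplified, hypothetical stand-in for *XThreadStringBuffer* that keeps only the wait condition quoted above; the producer thread, the *httpOk* flag and the exception type are assumptions made for the demonstration and are not taken from the connector source.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class SeedBufferDemo {
    // Simplified stand-in for XThreadStringBuffer: same wait condition, nothing else.
    static class SeedBuffer {
        private final Deque<String> buffer = new ArrayDeque<>();
        private boolean complete = false;

        synchronized void add(String s) { buffer.add(s); notifyAll(); }

        synchronized void signalDone() { complete = true; notifyAll(); }

        synchronized String fetch() throws InterruptedException {
            while (buffer.size() == 0 && !complete)
                wait();
            return buffer.poll(); // null once the producer is done and nothing is left
        }
    }

    // Runs a producer that either seeds one document or fails, and returns
    // what the consumer fetches first.
    static String consume(boolean httpOk) {
        SeedBuffer seedBuffer = new SeedBuffer();
        Thread producer = new Thread(() -> {
            try {
                if (!httpOk) {
                    throw new IllegalStateException("simulated non-200 response");
                }
                seedBuffer.add("doc-1");
            } catch (IllegalStateException e) {
                // A real connector would record the error for the caller.
            } finally {
                // Called on every path, so the consumer is always released.
                seedBuffer.signalDone();
            }
        });
        producer.start();
        try {
            String first = seedBuffer.fetch();
            producer.join();
            return first;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(consume(true));  // prints doc-1
        System.out.println(consume(false)); // prints null instead of blocking
    }
}
```

If the call in the finally block is moved back into the success path only, the second call never returns, which corresponds to the job staying at *Starting up*.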

h3. +*6. Suggested source code (based on release 2.22.1)*+

[https://github.com/apache/manifoldcf/blob/release-2.22.1/connectors/generic/connector/src/main/java/org/apache/manifoldcf/crawler/connectors/generic/GenericConnector.java#L1151]
{code:java}
-   seedBuffer.signalDone();
  } finally {
    EntityUtils.consume(response.getEntity());
    method.releaseConnection();
+   seedBuffer.signalDone();
  }
{code}

[https://github.com/apache/manifoldcf/blob/release-2.22.1/connectors/generic/connector/src/main/java/org/apache/manifoldcf/crawler/connectors/generic/GenericConnector.java#L1120]
{code:java}
  if (thr instanceof RuntimeException) {
    throw (RuntimeException) thr;
  } else if (thr instanceof Error) {
    throw (Error) thr;
+ } else if (thr instanceof ManifoldCFException) {
+   throw (ManifoldCFException) thr;
  } else {
    throw new RuntimeException("Unhandled exception of type: " + thr.getClass().getName(), thr);
  }
{code}
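
The rethrow pattern in *finishUp()* can also be tried out on its own. The sketch below is an assumption-laden illustration: *ConnectorException* is a hypothetical checked exception standing in for ManifoldCFException, and *SeedingThread* is a reduced stand-in for the connector's seeding thread that always fails.

```java
class FinishUpDemo {
    // Hypothetical checked exception standing in for ManifoldCFException.
    static class ConnectorException extends Exception {
        ConnectorException(String message) { super(message); }
    }

    // Worker that remembers any failure so that the caller can rethrow it.
    static class SeedingThread extends Thread {
        private volatile Throwable exception;

        @Override
        public void run() {
            try {
                throw new ConnectorException("simulated non-200 response");
            } catch (Throwable t) {
                exception = t;
            }
        }

        // Waits for the worker and rethrows its failure with the original type.
        void finishUp() throws InterruptedException, ConnectorException {
            join();
            Throwable thr = exception;
            if (thr == null) {
                return;
            }
            if (thr instanceof RuntimeException) {
                throw (RuntimeException) thr;
            } else if (thr instanceof Error) {
                throw (Error) thr;
            } else if (thr instanceof ConnectorException) {
                throw (ConnectorException) thr;
            } else {
                throw new RuntimeException("Unhandled exception of type: "
                    + thr.getClass().getName(), thr);
            }
        }
    }

    // Reports which exception type reaches the caller.
    static String runAndReport() {
        SeedingThread thread = new SeedingThread();
        thread.start();
        try {
            thread.finishUp();
            return "no error";
        } catch (ConnectorException e) {
            return "ConnectorException: " + e.getMessage();
        } catch (RuntimeException e) {
            return "RuntimeException: " + e.getMessage();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "interrupted";
        }
    }

    public static void main(String[] args) {
        System.out.println(runAndReport());
    }
}
```

Without the dedicated branch the same failure would reach the caller wrapped in a generic RuntimeException, so the caller could no longer react to the connector-specific exception type.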


