[jira] [Commented] (CONNECTORS-1738) Suggestion for adding function that allows setting timeout values for Elasticsearch Output Connector
[ https://issues.apache.org/jira/browse/CONNECTORS-1738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17620720#comment-17620720 ]

Nguyen Huu Nhat commented on CONNECTORS-1738:

Hi [~kwri...@metacarta.com],
I understand that you are busy, but could you please have a look at this suggestion?
Thank you so much!

> Suggestion for adding function that allows setting timeout values for Elasticsearch Output Connector
>
> Key: CONNECTORS-1738
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1738
> Project: ManifoldCF
> Issue Type: Improvement
> Reporter: Nguyen Huu Nhat
> Priority: Major
> Attachments: EditConnection.PNG, ViewConnection.PNG, patch.txt
>
> Hi there,
> While using the Elasticsearch Output Connector, I have experienced cases that required the values of *socketTimeout* and *connectionTimeout* to be increased.
> However, because those values are hardcoded in the source code as 90 (ms) and 6 (ms) respectively, it is quite troublesome to update them in such cases.
> For this reason, instead of hardcoding them, I think it would be better if the values of *socketTimeout* and *connectionTimeout* could be edited through the WebUI, on the connection setting screen.
> In ManifoldCF, a few other connectors already support setting *socketTimeout* and *connectionTimeout*, such as Generic, Confluence, etc.
> Therefore, I would like to suggest modifying the ElasticSearch Output Connector's source code to allow setting the *socketTimeout* and *connectionTimeout* values when needed.
> h3. +*1. Connector Name*+
> ElasticSearch Output Connector
> h3. +*2. Improvement Detail*+
> On the connection setting screen (WebUI), add handling that enables setting values for *socketTimeout* and *connectionTimeout*.
> ※ The default values for *socketTimeout* and *connectionTimeout* remain 90 and 6 (ms), as they are now.
> The connection setting screen will look like below:
> !EditConnection.PNG!
> h3. +*3. Suggested source code (based on release 2.22.1)*+
> Because many files are edited and the number of changed lines is quite large, I have attached the patch file here; please check it.
> [^patch.txt]

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Created] (CONNECTORS-1738) Suggestion for adding function that allows setting timeout values for Elasticsearch Output Connector
Nguyen Huu Nhat created CONNECTORS-1738:

Summary: Suggestion for adding function that allows setting timeout values for Elasticsearch Output Connector
Key: CONNECTORS-1738
URL: https://issues.apache.org/jira/browse/CONNECTORS-1738
Project: ManifoldCF
Issue Type: Improvement
Reporter: Nguyen Huu Nhat
Attachments: EditConnection.PNG, ViewConnection.PNG, patch.txt

Hi there,
While using the Elasticsearch Output Connector, I have experienced cases that required the values of *socketTimeout* and *connectionTimeout* to be increased.
However, because those values are hardcoded in the source code as 90 (ms) and 6 (ms) respectively, it is quite troublesome to update them in such cases.
For this reason, instead of hardcoding them, I think it would be better if the values of *socketTimeout* and *connectionTimeout* could be edited through the WebUI, on the connection setting screen.
In ManifoldCF, a few other connectors already support setting *socketTimeout* and *connectionTimeout*, such as Generic, Confluence, etc.
Therefore, I would like to suggest modifying the ElasticSearch Output Connector's source code to allow setting the *socketTimeout* and *connectionTimeout* values when needed.
h3. +*1. Connector Name*+
ElasticSearch Output Connector
h3. +*2. Improvement Detail*+
On the connection setting screen (WebUI), add handling that enables setting values for *socketTimeout* and *connectionTimeout*.
※ The default values for *socketTimeout* and *connectionTimeout* remain 90 and 6 (ms), as they are now.
The connection setting screen will look like below:
!EditConnection.PNG!
h3. +*3. Suggested source code (based on release 2.22.1)*+
Because many files are edited and the number of changed lines is quite large, I have attached the patch file here; please check it.
[^patch.txt]
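
For illustration only, the idea in CONNECTORS-1738 (take the timeout from the connection settings when the user supplies one, otherwise keep the existing default) could look roughly like the sketch below. This is not the content of patch.txt: the class `TimeoutSettings`, its method names and the parameter keys are hypothetical, and the defaults are passed in by the caller rather than fixed in the helper.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative only: resolves timeout values from connection parameters. */
final class TimeoutSettings {
  // Parameter keys are assumptions for this sketch, not names taken from patch.txt.
  static final String SOCKET_TIMEOUT_KEY = "socketTimeout";
  static final String CONNECTION_TIMEOUT_KEY = "connectionTimeout";

  final int socketTimeoutMs;
  final int connectionTimeoutMs;

  private TimeoutSettings(int socketTimeoutMs, int connectionTimeoutMs) {
    this.socketTimeoutMs = socketTimeoutMs;
    this.connectionTimeoutMs = connectionTimeoutMs;
  }

  /** Uses the configured value when it is a positive integer, otherwise the default. */
  static TimeoutSettings fromParameters(Map<String, String> params,
      int defaultSocketMs, int defaultConnectionMs) {
    return new TimeoutSettings(
        parseOrDefault(params.get(SOCKET_TIMEOUT_KEY), defaultSocketMs),
        parseOrDefault(params.get(CONNECTION_TIMEOUT_KEY), defaultConnectionMs));
  }

  private static int parseOrDefault(String raw, int fallback) {
    if (raw == null || raw.trim().isEmpty()) {
      return fallback; // field left empty in the UI
    }
    try {
      int value = Integer.parseInt(raw.trim());
      return value > 0 ? value : fallback; // reject zero and negative values
    } catch (NumberFormatException e) {
      return fallback; // not a number: keep the default
    }
  }

  public static void main(String[] args) {
    Map<String, String> params = new HashMap<>();
    params.put(SOCKET_TIMEOUT_KEY, "120000"); // value entered by the user
    // 90 and 6 are the default figures quoted in the issue text.
    TimeoutSettings settings = fromParameters(params, 90, 6);
    System.out.println(settings.socketTimeoutMs + " / " + settings.connectionTimeoutMs);
  }
}
```

Falling back to the default on empty or invalid input keeps existing connections working unchanged, which matches the statement in the issue that the default values stay as they are.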
[jira] [Updated] (CONNECTORS-1737) Suggestion for adding function for proxy configuration for connector of Confluence-V6
[ https://issues.apache.org/jira/browse/CONNECTORS-1737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nguyen Huu Nhat updated CONNECTORS-1737:

Description:
Hi there,
Currently, I am having a problem: proxy information is needed to connect to Confluence Cloud (SaaS) from the Confluence connector.
As you know, we can use either Confluence Server (a self-hosted environment) or Confluence Cloud (SaaS).
With Confluence Server, you can set up the server yourself (for example, in the same location where ManifoldCF is running), so you may not need proxy information for the connection.
However, with Confluence Cloud (SaaS), there will be cases where you need to set proxy information to be able to connect.
I checked other ManifoldCF connectors, and a few of them support setting proxy information for the connection, for example: SharePoint, Jira, Slack, ...
Therefore, I would like to suggest editing the Confluence-v6 source code (including the Authority connector and the Repository connector) so that we have an option to set proxy information for the connection to Confluence when needed.
h3. +*Connector Name*+
- Confluence-v6 (including Authority connector and Repository connector)
h3. +*Improvement Detail*+
- On the Confluence connection setting screen, add fields that allow the user to set proxy information (e.g. proxyProtocol, proxyHost, proxyPort).
The proxy setting screen will look like below:
!EditConnection.png!
- When connecting (calling the `connect()` method), use the values of those fields as proxy information.
※ This improvement applies to both the Authority Connector and the Repository Connector.
h3. +*Suggested source code (based on release 2.22.1)*+
Because many files are edited and the number of changed lines is quite large, I have attached the patch file here; please check it.
[^patch.txt]

was:
Hi there,
Currently, I am having a problem: proxy information is needed to connect to Confluence Cloud (SaaS) from the Confluence connector.
As you know, we can use either Confluence Server (a self-hosted environment) or Confluence Cloud (SaaS).
With Confluence Server, you can set up the server yourself (for example, in the same location where ManifoldCF is running), so you may not need proxy information for the connection.
However, with Confluence Cloud (SaaS), there will be cases where you need to set proxy information to be able to connect.
I checked other ManifoldCF connectors, and a few of them support setting proxy information for the connection, for example: SharePoint, Jira, Slack, ...
Therefore, I would like to suggest editing the Confluence-v6 source code (including the Authority connector and the Repository connector) so that we have an option to set proxy information for the connection to Confluence when needed.
h3. +*Connector Name*+
- Confluence-v6 (including Authority connector and Repository connector)
h3. +*Improvement Detail*+
- On the Confluence connection setting screen, add fields that allow the user to set proxy information (e.g. proxyProtocol, proxyHost, proxyPort).
The proxy setting screen will look like below:
!EditConnection.png!
- When connecting (calling the `connect()` method), use the values of those fields as proxy information.
※ This improvement applies to both the Authority Connector and the Repository Connector.
h3. +*Source Code Modification*+
Because many files are edited and the number of changed lines is quite large, I have attached the patch file here; please check it.
[^patch.txt]

> Suggestion for adding function for proxy configuration for connector of Confluence-V6
>
> Key: CONNECTORS-1737
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1737
> Project: ManifoldCF
> Issue Type: Improvement
> Reporter: Nguyen Huu Nhat
> Priority: Major
> Attachments: EditConnection.png, ViewConnection.png, patch.txt
>
> Hi there,
> Currently, I am having a problem: proxy information is needed to connect to Confluence Cloud (SaaS) from the Confluence connector.
> As you know, we can use either Confluence Server (a self-hosted environment) or Confluence Cloud (SaaS).
> With Confluence Server, you can set up the server yourself (for example, in the same location where ManifoldCF is running), so you may not need proxy information for the connection.
> However, with Confluence Cloud (SaaS), there will be cases where you need to set proxy information to be able to connect.
> I checked other ManifoldCF connectors, and a few of them support setting proxy information for the connection, for example: SharePoint, Jira, Slack, ...
> Therefore, I would like to suggest editing
[jira] [Created] (CONNECTORS-1737) Suggestion for adding function for proxy configuration for connector of Confluence-V6
Nguyen Huu Nhat created CONNECTORS-1737:

Summary: Suggestion for adding function for proxy configuration for connector of Confluence-V6
Key: CONNECTORS-1737
URL: https://issues.apache.org/jira/browse/CONNECTORS-1737
Project: ManifoldCF
Issue Type: Improvement
Reporter: Nguyen Huu Nhat
Attachments: EditConnection.png, ViewConnection.png, patch.txt

Hi there,
Currently, I am having a problem: proxy information is needed to connect to Confluence Cloud (SaaS) from the Confluence connector.
As you know, we can use either Confluence Server (a self-hosted environment) or Confluence Cloud (SaaS).
With Confluence Server, you can set up the server yourself (for example, in the same location where ManifoldCF is running), so you may not need proxy information for the connection.
However, with Confluence Cloud (SaaS), there will be cases where you need to set proxy information to be able to connect.
I checked other ManifoldCF connectors, and a few of them support setting proxy information for the connection, for example: SharePoint, Jira, Slack, ...
Therefore, I would like to suggest editing the Confluence-v6 source code (including the Authority connector and the Repository connector) so that we have an option to set proxy information for the connection to Confluence when needed.
h3. +*Connector Name*+
- Confluence-v6 (including Authority connector and Repository connector)
h3. +*Improvement Detail*+
- On the Confluence connection setting screen, add fields that allow the user to set proxy information (e.g. proxyProtocol, proxyHost, proxyPort).
The proxy setting screen will look like below:
!EditConnection.png!
- When connecting (calling the `connect()` method), use the values of those fields as proxy information.
※ This improvement applies to both the Authority Connector and the Repository Connector.
h3. +*Source Code Modification*+
Because many files are edited and the number of changed lines is quite large, I have attached the patch file here; please check it.
[^patch.txt]
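
As a rough illustration of the second point in CONNECTORS-1737 (use the configured field values as proxy information when connecting), the sketch below turns `proxyHost` and `proxyPort` strings into a `java.net.Proxy`. It is an assumption-laden sketch, not the attached patch: the class `ProxySettings` and method `toProxy` are hypothetical, only JDK classes are used rather than the HTTP client library the connector itself relies on, and `proxyProtocol` is left out because `java.net.Proxy` has no equivalent setting.

```java
import java.net.InetSocketAddress;
import java.net.Proxy;

/** Illustrative only: builds a proxy object from connection-screen field values. */
final class ProxySettings {

  /**
   * Returns Proxy.NO_PROXY when no host is configured, so connections that
   * leave the proxy fields empty behave as before.
   */
  static Proxy toProxy(String proxyHost, String proxyPort) {
    if (proxyHost == null || proxyHost.trim().isEmpty()) {
      return Proxy.NO_PROXY;
    }
    if (proxyPort == null) {
      throw new IllegalArgumentException("proxyPort is required when proxyHost is set");
    }
    int port;
    try {
      port = Integer.parseInt(proxyPort.trim());
    } catch (NumberFormatException e) {
      throw new IllegalArgumentException("proxyPort is not a number: " + proxyPort, e);
    }
    if (port < 1 || port > 65535) {
      throw new IllegalArgumentException("proxyPort out of range: " + port);
    }
    // createUnresolved avoids a DNS lookup at configuration time.
    return new Proxy(Proxy.Type.HTTP,
        InetSocketAddress.createUnresolved(proxyHost.trim(), port));
  }

  public static void main(String[] args) {
    // proxy.example.com is a placeholder host name.
    System.out.println(toProxy("proxy.example.com", "3128"));
    System.out.println(toProxy("", ""));
  }
}
```

Because the Authority and Repository connectors would both need the same conversion, keeping it in one shared helper is one way to apply the change to both sides consistently.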
[jira] [Updated] (CONNECTORS-1731) Suggestion for adding handling process for ManifoldCFException in Generic Repository Connector
[ https://issues.apache.org/jira/browse/CONNECTORS-1731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nguyen Huu Nhat updated CONNECTORS-1731:

Description:
Hi there,
I would like to suggest the following retry-related improvements:
h3. +*1. Connector name*+
Generic Repository Connector
h3. +*2. Reasons for improvement*+
In ExecuteSeedingThread, DocumentVersionThread and ExecuteProcessThread, when HttpClient.execute is called, there is a chance that the connection gets interrupted and a connection error occurs (HTTP status code other than 200).
When the HTTP status code is not 200, an exception of type ManifoldCFException is thrown. However, nothing handles this exception, so the job is aborted.
Because connection-related errors (HTTP status code other than 200) can resolve on their own, a retry should be added for cases like this.
h3. +*3. Improvements*+
The improvement includes the following:
* Adding a method that handles retry for exceptions of type ManifoldCFException
* Calling that method to handle the ManifoldCFException raised while executing in the threads ExecuteSeedingThread, DocumentVersionThread and ExecuteProcessThread
h3. +*4. Suggested source code (based on release 2.22.1)*+
* Adding a method that handles retry for exceptions of type ManifoldCFException
[https://github.com/apache/manifoldcf/blob/release-2.22.1/connectors/generic/connector/src/main/java/org/apache/manifoldcf/crawler/connectors/generic/GenericConnector.java#L1026]
{code:java}
  /**
   * Function for handling ManifoldCFException exception caused by connection error.
   * In case of connection error, ServiceInterruption exception is thrown to perform retry.
   *
   * @param e ManifoldCFException
   * @throws ServiceInterruption
   */
  protected static void handleManifoldCFException(ManifoldCFException e)
    throws ServiceInterruption {
    long currentTime = System.currentTimeMillis();
    throw new ServiceInterruption("Connection error: " + e.getMessage(), e,
      currentTime + 30L,
      currentTime + 3 * 60 * 6L, -1, false);
  }
{code}
* Calling that method to handle the ManifoldCFException raised while executing in the threads ExecuteSeedingThread, DocumentVersionThread and ExecuteProcessThread
** ExecuteSeedingThread
[https://github.com/apache/manifoldcf/blob/release-2.22.1/connectors/generic/connector/src/main/java/org/apache/manifoldcf/crawler/connectors/generic/GenericConnector.java#L256]
{code:java}
      } catch (InterruptedException e) {
        t.interrupt();
        throw new ManifoldCFException("Interrupted: " + e.getMessage(), e, ManifoldCFException.INTERRUPTED);
+     } catch (ManifoldCFException e) {
+       handleManifoldCFException(e);
      }
      return new Long(seedTime).toString();
{code}
** DocumentVersionThread
[https://github.com/apache/manifoldcf/blob/release-2.22.1/connectors/generic/connector/src/main/java/org/apache/manifoldcf/crawler/connectors/generic/GenericConnector.java#L304]
{code:java}
      try {
        versions = versioningThread.finishUp();
      } catch (IOException ex) {
        handleIOException((IOException)ex);
+     } catch (ManifoldCFException ex) {
+       handleManifoldCFException(ex);
      } catch (InterruptedException ex) {
        throw new ManifoldCFException(ex.getMessage(), ex, ManifoldCFException.INTERRUPTED);
      }
{code}
** ExecuteProcessThread
[https://github.com/apache/manifoldcf/blob/release-2.22.1/connectors/generic/connector/src/main/java/org/apache/manifoldcf/crawler/connectors/generic/GenericConnector.java#L445]
{code:java}
      } catch (InterruptedIOException e) {
        t.interrupt();
        throw new ManifoldCFException("Interrupted: " + e.getMessage(), e, ManifoldCFException.INTERRUPTED);
      } catch (IOException e) {
        handleIOException(e);
+     } catch (ManifoldCFException e) {
+       handleManifoldCFException(e);
      }
{code}

was:
Hi there,
I would like to suggest the following retry-related improvements:
h3. +*1. Connector name*+
Generic Repository Connector
h3. +*2. Reasons for improvement*+
In ExecuteSeedingThread, DocumentVersionThread and ExecuteProcessThread, when HttpClient.execute is called, there is a chance that the connection gets interrupted and a connection error occurs (HTTP status code other than 200).
When the HTTP status code is not 200, an exception of type ManifoldCFException is thrown. However, nothing handles this exception, so the job is aborted.
Because connection-related errors (HTTP status code other than 200) can resolve on their own, a retry should be added for cases like this.
h3. +*3. Improvements*+
The improvement includes the following:
* Adding a method that handles retry for exceptions of type ManifoldCFException
* Calling that method to handle the ManifoldCFException raised while executing in the threads ExecuteSeedingThread,
[jira] [Updated] (CONNECTORS-1731) Suggestion for adding handling process for ManifoldCFException in Generic Repository Connector
[ https://issues.apache.org/jira/browse/CONNECTORS-1731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nguyen Huu Nhat updated CONNECTORS-1731:

Attachment: patch_update.txt

> Suggestion for adding handling process for ManifoldCFException in Generic Repository Connector
>
> Key: CONNECTORS-1731
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1731
> Project: ManifoldCF
> Issue Type: Improvement
> Reporter: Nguyen Huu Nhat
> Assignee: Karl Wright
> Priority: Major
> Attachments: patch.txt, patch_update.txt
>
> Hi there,
> I would like to suggest the following retry-related improvements:
> h3. +*1. Connector name*+
> Generic Repository Connector
> h3. +*2. Reasons for improvement*+
> In ExecuteSeedingThread, DocumentVersionThread and ExecuteProcessThread, when HttpClient.execute is called, there is a chance that the connection gets interrupted and a connection error occurs (HTTP status code other than 200).
> When the HTTP status code is not 200, an exception of type ManifoldCFException is thrown. However, nothing handles this exception, so the job is aborted.
> Because connection-related errors (HTTP status code other than 200) can resolve on their own, a retry should be added for cases like this.
> h3. +*3. Improvements*+
> The improvement includes the following:
> * Adding a method that handles retry for exceptions of type ManifoldCFException
> * Calling that method to handle the ManifoldCFException raised while executing in the threads ExecuteSeedingThread, DocumentVersionThread and ExecuteProcessThread
> h3. +*4. Suggested source code (based on release 2.22.1)*+
> * Adding a method that handles retry for exceptions of type ManifoldCFException
> [https://github.com/apache/manifoldcf/blob/release-2.22.1/connectors/generic/connector/src/main/java/org/apache/manifoldcf/crawler/connectors/generic/GenericConnector.java#L1026]
> {code:java}
>   /**
>    * Function for handling ManifoldCFException exception caused by connection error.
>    * In case of connection error, ServiceInterruption exception is thrown to perform retry.
>    *
>    * @param e ManifoldCFException
>    * @throws ServiceInterruption
>    */
>   protected static void handleManifoldCFException(ManifoldCFException e)
>     throws ServiceInterruption {
>     long currentTime = System.currentTimeMillis();
>     throw new ServiceInterruption("Connection error: " + e.getMessage(), e,
>       currentTime + 30L,
>       currentTime + timeToFail * 3 * 60 * 6L, -1, false);
>   }
> {code}
> * Calling that method to handle the ManifoldCFException raised while executing in the threads ExecuteSeedingThread, DocumentVersionThread and ExecuteProcessThread
> ** ExecuteSeedingThread
> [https://github.com/apache/manifoldcf/blob/release-2.22.1/connectors/generic/connector/src/main/java/org/apache/manifoldcf/crawler/connectors/generic/GenericConnector.java#L256]
> {code:java}
>       } catch (InterruptedException e) {
>         t.interrupt();
>         throw new ManifoldCFException("Interrupted: " + e.getMessage(), e, ManifoldCFException.INTERRUPTED);
> +     } catch (ManifoldCFException e) {
> +       handleManifoldCFException(e);
>       }
>       return new Long(seedTime).toString();
> {code}
> ** DocumentVersionThread
> [https://github.com/apache/manifoldcf/blob/release-2.22.1/connectors/generic/connector/src/main/java/org/apache/manifoldcf/crawler/connectors/generic/GenericConnector.java#L304]
> {code:java}
>       try {
>         versions = versioningThread.finishUp();
>       } catch (IOException ex) {
>         handleIOException((IOException)ex);
> +     } catch (ManifoldCFException ex) {
> +       handleManifoldCFException(ex);
>       } catch (InterruptedException ex) {
>         throw new ManifoldCFException(ex.getMessage(), ex, ManifoldCFException.INTERRUPTED);
>       }
> {code}
> ** ExecuteProcessThread
> [https://github.com/apache/manifoldcf/blob/release-2.22.1/connectors/generic/connector/src/main/java/org/apache/manifoldcf/crawler/connectors/generic/GenericConnector.java#L445]
> {code:java}
>       } catch (InterruptedIOException e) {
>         t.interrupt();
>         throw new ManifoldCFException("Interrupted: " + e.getMessage(), e, ManifoldCFException.INTERRUPTED);
>       } catch (IOException e) {
>         handleIOException(e);
> +     } catch (ManifoldCFException e) {
> +       handleManifoldCFException(e);
>       }
> {code}
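
The retry pattern proposed in CONNECTORS-1731 (catch the fatal exception and rethrow it as a retryable one that carries a "retry at" time and a "give up at" time) can be sketched in isolation as below. This is a self-contained approximation under stated assumptions: `RetryLater` is a hypothetical stand-in for ManifoldCF's `ServiceInterruption`, `RetrySketch`, `asRetryable` and `shouldRetry` are invented for the example, and the delays are parameters rather than the literal values from the suggested patch.

```java
/** Hypothetical stand-in for a retryable exception such as ServiceInterruption. */
final class RetryLater extends Exception {
  final long retryTimeMs; // earliest time at which the operation should be retried
  final long failTimeMs;  // time after which retrying stops and the error is final

  RetryLater(String message, Throwable cause, long retryTimeMs, long failTimeMs) {
    super(message, cause);
    this.retryTimeMs = retryTimeMs;
    this.failTimeMs = failTimeMs;
  }
}

/** Illustrative only: converts a connection failure into a retryable condition. */
final class RetrySketch {

  /** Wraps the original exception and computes the retry window relative to nowMs. */
  static RetryLater asRetryable(Exception e, long nowMs, long retryDelayMs, long giveUpAfterMs) {
    return new RetryLater("Connection error: " + e.getMessage(), e,
        nowMs + retryDelayMs, nowMs + giveUpAfterMs);
  }

  /** A scheduler would keep retrying only while the fail time has not been reached. */
  static boolean shouldRetry(RetryLater r, long nowMs) {
    return nowMs < r.failTimeMs;
  }

  public static void main(String[] args) {
    long now = System.currentTimeMillis();
    // The delay values here are arbitrary example inputs.
    RetryLater r = asRetryable(new Exception("HTTP status 503"), now, 5_000L, 60_000L);
    System.out.println(r.getMessage() + ", retry allowed: " + shouldRetry(r, now));
  }
}
```

Keeping the original exception as the cause preserves the HTTP failure details in the job log while still letting the framework reschedule the work instead of aborting the job.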
[jira] [Commented] (CONNECTORS-1731) Suggestion for adding handling process for ManifoldCFException in Generic Repository Connector
[ https://issues.apache.org/jira/browse/CONNECTORS-1731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17607630#comment-17607630 ]

Nguyen Huu Nhat commented on CONNECTORS-1731:

Sorry, the patch file contained a small mistake in the modified source code. I have corrected that part in the attached file.
Please check again using the file below:
[^patch_update.txt]
[jira] [Updated] (CONNECTORS-1731) Suggestion for adding handling process for ManifoldCFException in Generic Repository Connector
[ https://issues.apache.org/jira/browse/CONNECTORS-1731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nguyen Huu Nhat updated CONNECTORS-1731:

Attachment: patch.txt
[jira] [Commented] (CONNECTORS-1731) Suggestion for adding handling process for ManifoldCFException in Generic Repository Connector
[ https://issues.apache.org/jira/browse/CONNECTORS-1731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17607485#comment-17607485 ]

Nguyen Huu Nhat commented on CONNECTORS-1731:

I executed the above command. Here is the resulting patch for my suggested changes:
[^patch.txt]
Please check. Thanks!
[jira] [Created] (CONNECTORS-1731) Suggestion for adding handling process for ManifoldCFException in Generic Repository Connector
Nguyen Huu Nhat created CONNECTORS-1731:
---
Summary: Suggestion for adding handling process for ManifoldCFException in Generic Repository Connector
Key: CONNECTORS-1731
URL: https://issues.apache.org/jira/browse/CONNECTORS-1731
Project: ManifoldCF
Issue Type: Improvement
Reporter: Nguyen Huu Nhat

Hi there,
I would like to suggest the following retry-related improvements:

h3. +*1. Connector name*+
Generic Repository Connector

h3. +*2. Reasons for improvement*+
In ExecuteSeedingThread, DocumentVersionThread and ExecuteProcessThread, the connection can be interrupted while HttpClient.execute is running, which results in a connection error (HTTP status code <> 200).
When the HTTP status code is <> 200, an exception of type ManifoldCFException is thrown. However, nothing handles this exception, so the job is aborted.
As connection-related errors (HTTP status code <> 200) can be resolved automatically, a retry should be added for cases like this.

h3. +*3. Improvements*+
The improvement includes the following:
* Adding a method that handles retries for exceptions of type ManifoldCFException
* Calling that method to handle the ManifoldCFException thrown in these threads: ExecuteSeedingThread, DocumentVersionThread, ExecuteProcessThread

h3. +*4. Suggested source code (based on release 2.22.1)*+
* Adding a method that handles retries for exceptions of type ManifoldCFException
[https://github.com/apache/manifoldcf/blob/release-2.22.1/connectors/generic/connector/src/main/java/org/apache/manifoldcf/crawler/connectors/generic/GenericConnector.java#L1026]
{code:java}
  /**
   * Function for handling ManifoldCFException exception caused by connection error.
   * In case of connection error, ServiceInterruption exception is thrown to perform retry.
   *
   * @param e ManifoldCFException
   * @throws ServiceInterruption
   */
  protected static void handleManifoldCFException(ManifoldCFException e) throws ServiceInterruption {
    long currentTime = System.currentTimeMillis();
    throw new ServiceInterruption("Connection error: " + e.getMessage(), e, currentTime + 30L, currentTime + timeToFail * 3 * 60 * 6L, -1, false);
  }
{code}
* Calling that method to handle the ManifoldCFException thrown in these threads: ExecuteSeedingThread, DocumentVersionThread, ExecuteProcessThread
** ExecuteSeedingThread
[https://github.com/apache/manifoldcf/blob/release-2.22.1/connectors/generic/connector/src/main/java/org/apache/manifoldcf/crawler/connectors/generic/GenericConnector.java#L256]
{code:java}
    } catch (InterruptedException e) {
      t.interrupt();
      throw new ManifoldCFException("Interrupted: " + e.getMessage(), e, ManifoldCFException.INTERRUPTED);
+   } catch (ManifoldCFException e) {
+     handleManifoldCFException(e);
    }
    return new Long(seedTime).toString();
{code}
** DocumentVersionThread
[https://github.com/apache/manifoldcf/blob/release-2.22.1/connectors/generic/connector/src/main/java/org/apache/manifoldcf/crawler/connectors/generic/GenericConnector.java#L304]
{code:java}
    try {
      versions = versioningThread.finishUp();
    } catch (IOException ex) {
      handleIOException((IOException)ex);
+   } catch (ManifoldCFException ex) {
+     handleManifoldCFException(ex);
    } catch (InterruptedException ex) {
      throw new ManifoldCFException(ex.getMessage(), ex, ManifoldCFException.INTERRUPTED);
    }
{code}
** ExecuteProcessThread
[https://github.com/apache/manifoldcf/blob/release-2.22.1/connectors/generic/connector/src/main/java/org/apache/manifoldcf/crawler/connectors/generic/GenericConnector.java#L445]
{code:java}
    } catch (InterruptedIOException e) {
      t.interrupt();
      throw new ManifoldCFException("Interrupted: " + e.getMessage(), e, ManifoldCFException.INTERRUPTED);
    } catch (IOException e) {
      handleIOException(e);
+   } catch (ManifoldCFException e) {
+     handleManifoldCFException(e);
    }
{code}

--
This message was sent by Atlassian Jira (v8.20.10#820010)
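For illustration, the conversion of a connection error into a retryable interruption can be sketched without ManifoldCF on the classpath. This is only a sketch under stated assumptions: `RetryableInterruption` is a hypothetical stand-in for `ServiceInterruption`, and the 5-minute retry delay and 3-hour give-up window are values chosen for the example, not the ones used in the patch.

```java
import java.io.IOException;

/** Hypothetical stand-in for ManifoldCF's ServiceInterruption; not the real class. */
final class RetryableInterruption extends Exception {
    final long retryTime; // earliest time (epoch ms) at which to retry
    final long failTime;  // time (epoch ms) after which the caller should give up

    RetryableInterruption(String message, Throwable cause, long retryTime, long failTime) {
        super(message, cause);
        this.retryTime = retryTime;
        this.failTime = failTime;
    }
}

final class ConnectionErrorRetrySketch {
    // Assumed values for the example only.
    static final long RETRY_DELAY_MS = 5L * 60L * 1000L;
    static final long GIVE_UP_AFTER_MS = 3L * 60L * 60L * 1000L;

    /** Wraps a connection failure in a retryable interruption computed relative to 'now'. */
    static RetryableInterruption toRetry(Exception connectionError, long now) {
        return new RetryableInterruption(
            "Connection error: " + connectionError.getMessage(),
            connectionError,
            now + RETRY_DELAY_MS,
            now + GIVE_UP_AFTER_MS);
    }

    public static void main(String[] args) {
        RetryableInterruption r = toRetry(new IOException("HTTP 503"), System.currentTimeMillis());
        System.out.println(r.getMessage());
    }
}
```

Passing `now` in as a parameter keeps the time arithmetic easy to check in isolation; a caller would normally supply `System.currentTimeMillis()`.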
[jira] [Commented] (CONNECTORS-1730) Improvement suggestion for retry function in SharedDriveConnector
[ https://issues.apache.org/jira/browse/CONNECTORS-1730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17606832#comment-17606832 ]

Nguyen Huu Nhat commented on CONNECTORS-1730:
-
Thank you for your response.
After carefully checking the source code again, I can see that the possible problems have already been handled by ServiceInterruption, which triggers a retry.
For that reason, I am cancelling this ticket.

> Improvement suggestion for retry function in SharedDriveConnector
> -
>
> Key: CONNECTORS-1730
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1730
> Project: ManifoldCF
> Issue Type: Improvement
> Reporter: Nguyen Huu Nhat
> Assignee: Karl Wright
> Priority: Major
>
> Hi there,
> I would like to suggest the following retry-related improvements:
> h3. +*1. Connector name*+
> SharedDriveConnector
> h3. +*2. Overview*+
> * When a connection to SMB cannot be established, the JCIFS connector fails to connect and an exception occurs. JCIFS retries, and aborts after a certain number of attempts.
> * The number of retries is currently controlled by the following parameters:
> ** *retriesRemaining* (hardcoded: 3): The number of occurrences of the same exception for a file or method. If a different exception occurs, this value is reset to 3.
> ** *totalTries* (hardcoded: 5): The total number of occurrences of an exception for a file or method.
> For the two variables above, if *retriesRemaining* becomes 0 or *totalTries* becomes 5, the job is aborted.
> h3. +*3. Reasons for improvement*+
> Currently the maximum numbers of retries are hardcoded at 3 and 5, respectively.
> In case the connection to the file server is unstable, I would like to suggest making these values customizable to avoid aborting.
> For implementation, I would like to suggest the following methods: > * 1/ Setting retry values in *properties.xml* > * 2/ Setting retry values on WebUI of repository connection > Between the two methods above, I suggest the first method because of > following reasons: > * The first method is easier to implement > * Although the second method is more user-friendly, there are several issues: > ** The config data from the screen will have to be stored in the database > (PostgreSQL), resulting in an increased number of fields. > ** Consequently, there might be a need to perform DB Migration in case > further changes to setting field are needed. > ※ According to the above reasons, I will proceed with the first method > 'Setting retry values in properties.xml' for the next part of this suggestion. > h3. +*4. Improvement*+ > Changing source code to read maximum number of retries from *properties.xml* > Declare two variables in *properties.xml* to set the maximum number of retry: > * > `org.apache.manifoldcf.crawler.connectors.sharedrive.consecutivesmbexceptionretrycount` > ⇒ Set to `consecutiveSMBExceptionRetryCount` > * `org.apache.manifoldcf.crawler.connectors.sharedrive.totalsmbretrycount` > ⇒ Set to `totalSMBRetryCount` > E.g: > {code:xml} >name="org.apache.manifoldcf.crawler.connectors.sharedrive.consecutivesmbexceptionretrycount" > value="3"/> >name="org.apache.manifoldcf.crawler.connectors.sharedrive.totalsmbretrycount" > value="5"/> > {code} > SharedDriveConnector will load these values from the file and set to two > variables within the source code. > ※In case these values can't be found from the file or set to an invalid > value, default values will be used instead. > h3. +*5. 
Suggested source code (based on release 2.22.1)*+ > Target class: > org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector > * Declare two class variables to store the configured values as follows: > [https://github.com/apache/manifoldcf/blob/release-2.22.1/connectors/jcifs/connector/src/main/java/org/apache/manifoldcf/crawler/connectors/sharedrive/SharedDriveConnector.java#L103] > > {code:java} > private final static int consecutiveSMBExceptionRetryCount; > private final static int totalSMBRetryCount; > {code} > > * Initialize the two variables above with following steps: > ** Set the values configured in 'properties.xml' to the two variables above > ** If these values weren't configured or invalid, set them to default values > of 3 and 5, respectively. > [https://github.com/apache/manifoldcf/blob/release-2.22.1/connectors/jcifs/connector/src/main/java/org/apache/manifoldcf/crawler/connectors/sharedrive/SharedDriveConnector.java#L106] > {code:java} > // Static initialization of various system properties. This hopefully > takes place > // before jcifs is loaded. > static > { > ... > int tempConsecutiveSMBExceptionRetryCount = 3; > int tempTotalSMBRetryCount = 5; > try { >
[jira] [Updated] (CONNECTORS-1730) Improvement suggestion for retry function in SharedDriveConnector
[ https://issues.apache.org/jira/browse/CONNECTORS-1730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nguyen Huu Nhat updated CONNECTORS-1730:

Description:
Hi there,
I would like to suggest the following retry-related improvements:

h3. +*1. Connector name*+
SharedDriveConnector

h3. +*2. Overview*+
* When a connection to SMB cannot be established, the JCIFS connector fails to connect and an exception occurs. JCIFS retries, and aborts after a certain number of attempts.
* The number of retries is currently controlled by the following parameters:
** *retriesRemaining* (hardcoded: 3): The number of occurrences of the same exception for a file or method. If a different exception occurs, this value is reset to 3.
** *totalTries* (hardcoded: 5): The total number of occurrences of an exception for a file or method.

For the two variables above, if *retriesRemaining* becomes 0 or *totalTries* becomes 5, the job is aborted.

h3. +*3. Reasons for improvement*+
Currently the maximum numbers of retries are hardcoded at 3 and 5, respectively.
In case the connection to the file server is unstable, I would like to suggest making these values customizable to avoid aborting.
For the implementation, I would like to suggest the following methods:
* 1/ Setting retry values in *properties.xml*
* 2/ Setting retry values on the WebUI of the repository connection

Between the two methods above, I suggest the first method for the following reasons:
* The first method is easier to implement
* Although the second method is more user-friendly, there are several issues:
** The config data from the screen would have to be stored in the database (PostgreSQL), resulting in an increased number of fields.
** Consequently, a DB migration might be needed if the setting fields change further.

※ For the above reasons, I will proceed with the first method, 'Setting retry values in properties.xml', for the rest of this suggestion.

h3. +*4. Improvement*+
Change the source code to read the maximum number of retries from *properties.xml*.
Declare two properties in *properties.xml* to set the maximum number of retries:
* `org.apache.manifoldcf.crawler.connectors.sharedrive.consecutivesmbexceptionretrycount` ⇒ Set to `consecutiveSMBExceptionRetryCount`
* `org.apache.manifoldcf.crawler.connectors.sharedrive.totalsmbretrycount` ⇒ Set to `totalSMBRetryCount`

E.g:
{code:xml}
<property name="org.apache.manifoldcf.crawler.connectors.sharedrive.consecutivesmbexceptionretrycount" value="3"/>
<property name="org.apache.manifoldcf.crawler.connectors.sharedrive.totalsmbretrycount" value="5"/>
{code}
SharedDriveConnector will load these values from the file and assign them to two variables in the source code.
※ If these values cannot be found in the file, or are set to an invalid value, the default values are used instead.

h3. +*5. Suggested source code (based on release 2.22.1)*+
Target class: org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector
* Declare two class variables to store the configured values as follows:
[https://github.com/apache/manifoldcf/blob/release-2.22.1/connectors/jcifs/connector/src/main/java/org/apache/manifoldcf/crawler/connectors/sharedrive/SharedDriveConnector.java#L103]
{code:java}
  private final static int consecutiveSMBExceptionRetryCount;
  private final static int totalSMBRetryCount;
{code}
* Initialize the two variables above with the following steps:
** Set the values configured in 'properties.xml' to the two variables above
** If these values are not configured or are invalid, set them to the default values of 3 and 5, respectively.
[https://github.com/apache/manifoldcf/blob/release-2.22.1/connectors/jcifs/connector/src/main/java/org/apache/manifoldcf/crawler/connectors/sharedrive/SharedDriveConnector.java#L106]
{code:java}
  // Static initialization of various system properties. This hopefully takes place
  // before jcifs is loaded.
  static
  {
    ...
    int tempConsecutiveSMBExceptionRetryCount = 3;
    int tempTotalSMBRetryCount = 5;
    try {
      tempConsecutiveSMBExceptionRetryCount = ManifoldCF.getIntProperty("org.apache.manifoldcf.crawler.connectors.sharedrive.consecutivesmbexceptionretrycount", tempConsecutiveSMBExceptionRetryCount);
    } catch (ManifoldCFException e) {
      Logging.connectors.warn("Invalid property value for " + "org.apache.manifoldcf.crawler.connectors.sharedrive.consecutivesmbexceptionretrycount, must be integer. Setting to default: " + Integer.toString(tempConsecutiveSMBExceptionRetryCount));
    }
    consecutiveSMBExceptionRetryCount = tempConsecutiveSMBExceptionRetryCount;
    try {
      tempTotalSMBRetryCount = ManifoldCF.getIntProperty("org.apache.manifoldcf.crawler.connectors.sharedrive.totalsmbretrycount", tempTotalSMBRetryCount);
    } catch (ManifoldCFException e) {
      Logging.connectors.warn("Invalid property value for " + "org.apache.manifoldcf.crawler.connectors.sharedrive.totalsmbretrycount, must be integer. Setting to default: " +
[jira] [Created] (CONNECTORS-1730) Improvement suggestion for retry function in SharedDriveConnector
Nguyen Huu Nhat created CONNECTORS-1730:
---
Summary: Improvement suggestion for retry function in SharedDriveConnector
Key: CONNECTORS-1730
URL: https://issues.apache.org/jira/browse/CONNECTORS-1730
Project: ManifoldCF
Issue Type: Improvement
Reporter: Nguyen Huu Nhat

Hi there,
I would like to suggest the following retry-related improvements:

h3. +*1. Connector name*+
SharedDriveConnector

h3. +*2. Preface*+
* When a connection to SMB cannot be established, the JCIFS connector fails to connect and an exception occurs. JCIFS retries, and aborts after a certain number of attempts.
* The number of retries is currently controlled by the following parameters:
** *retriesRemaining* (hardcoded: 3): The number of occurrences of the same exception for a file or method. If a different exception occurs, this value is reset to 3.
** *totalTries* (hardcoded: 5): The total number of occurrences of an exception for a file or method.

For the two variables above, if *retriesRemaining* becomes 0 or *totalTries* becomes 5, the job is aborted.

h3. +*3. Reasons for improvement*+
Currently the maximum numbers of retries are hardcoded at 3 and 5, respectively.
In case the connection to the file server is unstable, I would like to suggest making these values customizable to avoid aborting.
For the implementation, I would like to suggest the following methods:
* 1/ Setting retry values in *properties.xml*
* 2/ Setting retry values on the WebUI of the repository connection

Between the two methods above, I suggest the first method for the following reasons:
* The first method is easier to implement
* Although the second method is more user-friendly, there are several issues:
** The config data from the screen would have to be stored in the database (PostgreSQL), resulting in an increased number of fields.
** Consequently, a DB migration might be needed if the setting fields change further.
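The retry accounting described in section 2 can be sketched as a small counter class. This is a simplified illustration written from the description in this ticket, not the connector's actual code: the class name `SmbRetryBudgetSketch` and the use of the exception message to decide whether two failures are "the same" are assumptions, and the two limits are constructor arguments so that configured values could be passed in instead of the hardcoded 3 and 5.

```java
final class SmbRetryBudgetSketch {
    private final int consecutiveLimit; // plays the role of the hardcoded 3
    private final int totalLimit;       // plays the role of the hardcoded 5
    private int retriesRemaining;
    private int totalTries = 0;
    private String lastError = null;

    SmbRetryBudgetSketch(int consecutiveLimit, int totalLimit) {
        this.consecutiveLimit = consecutiveLimit;
        this.totalLimit = totalLimit;
        this.retriesRemaining = consecutiveLimit;
    }

    /**
     * Records one failure.
     * Returns true if another attempt is allowed, false if the job should abort.
     */
    boolean recordFailure(String errorMessage) {
        if (lastError != null && !lastError.equals(errorMessage)) {
            // A different exception resets the consecutive budget.
            retriesRemaining = consecutiveLimit;
        }
        lastError = errorMessage;
        retriesRemaining--;
        totalTries++;
        return retriesRemaining > 0 && totalTries < totalLimit;
    }

    public static void main(String[] args) {
        SmbRetryBudgetSketch budget = new SmbRetryBudgetSketch(3, 5);
        int attempt = 1;
        while (budget.recordFailure("connection reset")) {
            attempt++;
        }
        System.out.println("aborted after attempt " + attempt);
    }
}
```

With the default limits, a run of identical failures exhausts the consecutive budget first, while alternating failures are stopped by the total limit.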
※ According to the above reasons, I will proceed with the first method 'Setting retry values in properties.xml' for the next part of this suggestion. h3. +*4. Improvement*+ Changing source code to read maximum number of retries from *properties.xml* Declare two variables in *properties.xml* to set the maximum number of retry: * `org.apache.manifoldcf.crawler.connectors.sharedrive.consecutivesmbexceptionretrycount` ⇒ Set to `consecutiveSMBExceptionRetryCount` * `org.apache.manifoldcf.crawler.connectors.sharedrive.totalsmbretrycount` ⇒ Set to `totalSMBRetryCount` E.g: {code:xml} {code} SharedDriveConnector will load these values from the file and set to two variables within the source code. ※In case these values can't be found from the file or set to an invalid value, default values will be used instead. h3. +*5. Suggested source code (based on release 2.22.1)*+ Target class: org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector [https://github.com/apache/manifoldcf/blob/release-2.22.1/connectors/jcifs/connector/src/main/java/org/apache/manifoldcf/crawler/connectors/sharedrive/SharedDriveConnector.java#L103] * Declare two class variables to store the configured values as follows: {code:java} private final static int consecutiveSMBExceptionRetryCount; private final static int totalSMBRetryCount; {code} * Initialize the two variables above with following steps: ** Set the values configured in 'properties.xml' to the two variables above ** If these values weren't configured or invalid, set them to default values of 3 and 5, respectively. [https://github.com/apache/manifoldcf/blob/release-2.22.1/connectors/jcifs/connector/src/main/java/org/apache/manifoldcf/crawler/connectors/sharedrive/SharedDriveConnector.java#L106] {code:java} // Static initialization of various system properties. This hopefully takes place // before jcifs is loaded. static { ... 
int tempConsecutiveSMBExceptionRetryCount = 3; int tempTotalSMBRetryCount = 5; try { tempConsecutiveSMBExceptionRetryCount = ManifoldCF.getIntProperty("org.apache.manifoldcf.crawler.connectors.sharedrive.consecutivesmbexceptionretrycount", tempConsecutiveSMBExceptionRetryCount); } catch (ManifoldCFException e) { Logging.connectors.warn("Invalid property value for " + "org.apache.manifoldcf.crawler.connectors.sharedrive.consecutivesmbexceptionretrycount, must be integer. Setting to default: " + Integer.toString(tempConsecutiveSMBExceptionRetryCount)); } consecutiveSMBExceptionRetryCount = tempConsecutiveSMBExceptionRetryCount; try { tempTotalSMBRetryCount = ManifoldCF.getIntProperty("org.apache.manifoldcf.crawler.connectors.sharedrive.totalsmbretrycount", tempTotalSMBRetryCount); } catch (ManifoldCFException e) {
[jira] [Created] (CONNECTORS-1729) The Confluence-v6 Repository Connector's attachment logic is incorrect
Nguyen Huu Nhat created CONNECTORS-1729:
---
Summary: The Confluence-v6 Repository Connector's attachment logic is incorrect
Key: CONNECTORS-1729
URL: https://issues.apache.org/jira/browse/CONNECTORS-1729
Project: ManifoldCF
Issue Type: Bug
Reporter: Nguyen Huu Nhat

Hi there,
An issue that occurs in use has not yet been handled, so I would like to suggest the following fix for the source code of the Confluence Repository Connector.
For details about this issue, please refer to the information below:

h3. +*1. Connector Name*+
confluence-v6 \ Confluence Repository Connector

h3. +*2. Overview*+
* In the Confluence Repository Connector, there is an error in the logic that determines whether the document has attachments or not.
* The wrong logic leads to attachments not being crawled.
※ This error only occurs when crawling documents from Confluence server; crawling documents from Confluence Cloud (SaaS) still works normally.
* Formats of the document's ID when there is a file attached are as below:
** Crawled from Confluence server: *-*
** Crawled from Confluence cloud (SaaS): *att-*

h3. +*3. Reproduction*+
* On Confluence server:
** Create a blog.
** Add attachments to the newly created blog.
* On ManifoldCF:
** Create a Confluence Repository Connector with the aforementioned Confluence server information.
** Create a job using the connector created above with the following details:
*** On the [Page] tab: Process Attachments: (Check). Type Specification: Blog.
** Start job.
** Check [Simple History Report].

h3. +*4. Cause*+
* The logic that judges whether a document has a file attachment treats any document whose ID begins with *att* as having one.
* However, the ID of a document crawled from the Confluence server is not prefixed with *att* when a file is attached (see the format in item 2).

h3. +*5. Solution*+
My observation is as below:
* If a document has a file attachment, the ID of that document is a string of characters connected by the *-* character.
* If a document does not have a file attachment, the ID of that document does not contain the *-* character.

Therefore, it is possible to judge whether a file is attached or not by checking if the ID contains the *-* character.

h3. +*6. Suggested source code (based on release 2.22.1)*+
***Class: org.apache.manifoldcf.crawler.connectors.confluence.v6.util.ConfluenceUtil***
[https://github.com/apache/manifoldcf/blob/release-2.22.1/connectors/confluence-v6/connector/src/main/java/org/apache/manifoldcf/crawler/connectors/confluence/v6/util/ConfluenceUtil.java#L28]
{code:java}
- private static final String ATTACHMENT_ID_PREFIX = "att";
+ private static final String ATTACHMENT_ID_CHARACTER = "-";
{code}
[https://github.com/apache/manifoldcf/blob/release-2.22.1/connectors/confluence-v6/connector/src/main/java/org/apache/manifoldcf/crawler/connectors/confluence/v6/util/ConfluenceUtil.java#L47]
{code:java}
  public static Boolean isAttachment(String id) {
-   return id.startsWith(ATTACHMENT_ID_PREFIX);
+   return id.contains(ATTACHMENT_ID_CHARACTER);
  }
{code}

--
This message was sent by Atlassian Jira (v8.20.10#820010)
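To make the difference between the two checks concrete, here is a small standalone sketch that puts the current prefix test next to the proposed one. The class name `AttachmentIdSketch` and the sample IDs are made up for the example; they only follow the formats described in item 2, and the null check is an addition that is not part of the suggested patch.

```java
final class AttachmentIdSketch {
    private static final String ATTACHMENT_ID_PREFIX = "att";
    private static final String ATTACHMENT_ID_CHARACTER = "-";

    /** Current behaviour: only IDs starting with "att" count as attachments. */
    static boolean isAttachmentByPrefix(String id) {
        return id != null && id.startsWith(ATTACHMENT_ID_PREFIX);
    }

    /** Proposed behaviour: any ID containing "-" counts as an attachment. */
    static boolean isAttachmentByDash(String id) {
        return id != null && id.contains(ATTACHMENT_ID_CHARACTER);
    }

    public static void main(String[] args) {
        // Hypothetical IDs: server-style attachment, cloud-style attachment, plain page.
        String[] ids = { "98765-12345", "att-12345", "98765" };
        for (String id : ids) {
            System.out.println(id + " prefix=" + isAttachmentByPrefix(id)
                + " dash=" + isAttachmentByDash(id));
        }
    }
}
```

Because the cloud-style ID also contains *-*, the proposed check would still recognise it, provided plain page IDs never contain that character, as observed in item 5.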
[jira] [Created] (CONNECTORS-1728) Fix error message of Generic Repository Connector
Nguyen Huu Nhat created CONNECTORS-1728:
---
Summary: Fix error message of Generic Repository Connector
Key: CONNECTORS-1728
URL: https://issues.apache.org/jira/browse/CONNECTORS-1728
Project: ManifoldCF
Issue Type: Bug
Reporter: Nguyen Huu Nhat

Hi there,
A problem that occurs during use has not yet been addressed, so I would like to suggest the following correction for the source code of the Generic Repository Connector.
For additional details, please see below:

h3. +*1. Connector name*+
Generic Repository Connector

h3. +*2. Issue*+
In the *run()* method of the *GenericConnector$DocumentVersionThread* class, if the connector cannot connect to the REST API (HTTP status code != 200), the following error message is written to the log file:
[ *addSeedDocuments error* - interface returned incorrect return code for: ... ]
However, this is the *DocumentVersionThread* thread, not the *ExecuteSeedingThread* thread.
The *addSeedDocuments error* prefix is not suitable for this thread. I think it should be the *getDocumentVersions error* prefix.

h3. +*3. Cause*+
This may be a copy/paste mistake:
[https://github.com/apache/manifoldcf/blob/release-2.22.1/connectors/generic/connector/src/main/java/org/apache/manifoldcf/crawler/connectors/generic/GenericConnector.java#L1207]
{code:java}
  if (response.getStatusLine().getStatusCode() != HttpStatus.SC_OK) {
    exception = new ManifoldCFException("addSeedDocuments error - interface returned incorrect return code for: " + url + " - " + response.getStatusLine().toString());
    return;
  }
{code}

h3. +*4. Solution*+
Update the content of this error message from [addSeedDocuments error] to [getDocumentVersions error].

h3. +*5. Suggested source code (based on release 2.22.1)*+
[https://github.com/apache/manifoldcf/blob/release-2.22.1/connectors/generic/connector/src/main/java/org/apache/manifoldcf/crawler/connectors/generic/GenericConnector.java#L1207]
{code:java}
  if (response.getStatusLine().getStatusCode() != HttpStatus.SC_OK) {
-   exception = new ManifoldCFException("addSeedDocuments error - interface returned incorrect return code for: " + url + " - " + response.getStatusLine().toString());
+   exception = new ManifoldCFException("getDocumentVersions error - interface returned incorrect return code for: " + url + " - " + response.getStatusLine().toString());
    return;
  }
{code}

--
This message was sent by Atlassian Jira (v8.20.10#820010)
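One possible way to make this kind of copy/paste slip less likely, shown only as a sketch and not as part of the suggested patch, is to build the message in a single helper that takes the operation name as an argument. The class and method names below are hypothetical, and the URL and status line in `main` are sample values.

```java
final class InterfaceErrorMessageSketch {
    /** Builds the "incorrect return code" message for the given operation. */
    static String incorrectReturnCode(String operation, String url, String statusLine) {
        return operation + " error - interface returned incorrect return code for: "
            + url + " - " + statusLine;
    }

    public static void main(String[] args) {
        // Sample values for illustration only.
        System.out.println(incorrectReturnCode(
            "getDocumentVersions", "http://localhost/api", "HTTP/1.1 500 Internal Server Error"));
    }
}
```

Each thread would then pass its own operation name, so the prefix in the log always matches the thread that produced it.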
[jira] [Created] (CONNECTORS-1727) Timeout values for Genreric Authority is not updated after setting
Nguyen Huu Nhat created CONNECTORS-1727:
---
Summary: Timeout values for Genreric Authority is not updated after setting
Key: CONNECTORS-1727
URL: https://issues.apache.org/jira/browse/CONNECTORS-1727
Project: ManifoldCF
Issue Type: Bug
Reporter: Nguyen Huu Nhat

Hi there,
A problem that occurs during use has not yet been addressed, so I would like to suggest the following correction for the source code of the Generic Authority Connector.
※ This is the same issue as the one in the Generic Repository Connector, which was resolved in [CONNECTORS-1726|https://issues.apache.org/jira/browse/CONNECTORS-1726]
For additional details, please see below:

h3. +*1. Connector name*+
Generic Authority Connector

h3. +*2. Issue*+
When I create or edit a Generic authority connection, I cannot update the values in the following fields:
* Connection timeout (milis)
* Socket timeout (milis)

h3. +*3. Reproduction*+
* Create a Generic authority connection
** On the *Entry point* tab, edit the values of the *Connection timeout (milis)* and *Socket timeout (milis)* fields
** Click on the *Save* button
* On the *View Authority Connection Status - Generic* screen, it can be seen that the values of the 2 fields above are not updated.

h3. +*4. Cause*+
The names of the textboxes for the 2 fields are as follows:
* genericConTimeout
* genericSoTimeout

However, the names used inside the source code are as follows:
* genericConnectionTimeout
* genericSocketTimeout

As a result, the new values cannot be obtained, so the values of the 2 fields cannot be updated.

h3. +*5. Solution*+
Update the parameter names for Connection Timeout and Socket Timeout to the names that are stored in the database:
* genericConTimeout ➞ genericConnectionTimeout
* genericSoTimeout ➞ genericSocketTimeout

h3. +*6. 
Suggested source code (based on release 2.22.1)*+ [https://github.com/apache/manifoldcf/blob/release-2.22.1/connectors/generic/connector/src/main/java/org/apache/manifoldcf/authorities/authorities/generic/GenericAuthority.java#L400] {code:java} + " \n" + " " + Messages.getBodyString(locale, "generic.ConnectionTimeoutColon") + "\n" - + " \n" + + " \n" + " \n" + " \n" + " " + Messages.getBodyString(locale, "generic.SocketTimeoutColon") + "\n" - + " \n" + + " \n" + " \n" {code} [https://github.com/apache/manifoldcf/blob/release-2.22.1/connectors/generic/connector/src/main/java/org/apache/manifoldcf/authorities/authorities/generic/GenericAuthority.java#L415] {code:java} - out.print("\n"); - out.print("\n"); + out.print("\n"); + out.print("\n"); {code} [https://github.com/apache/manifoldcf/blob/release-2.22.1/connectors/generic/connector/src/main/java/org/apache/manifoldcf/authorities/authorities/generic/GenericAuthority.java#L428] {code:java} - copyParam(variableContext, parameters, "genericConTimeout"); - copyParam(variableContext, parameters, "genericSoTimeout"); + copyParam(variableContext, parameters, "genericConnectionTimeout"); + copyParam(variableContext, parameters, "genericSocketTimeout"); {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (CONNECTORS-1726) Timeout values for Genreric Repository is not updated after setting
Nguyen Huu Nhat created CONNECTORS-1726:
---
Summary: Timeout values for Genreric Repository is not updated after setting
Key: CONNECTORS-1726
URL: https://issues.apache.org/jira/browse/CONNECTORS-1726
Project: ManifoldCF
Issue Type: Bug
Reporter: Nguyen Huu Nhat

Hi there,
A problem that occurs during use has not yet been addressed, so I would like to suggest the following correction for the source code of the Generic Repository Connector.
For additional details, please see below:

h3. +*1. Connector name*+
Generic Repository Connector

h3. +*2. Issue*+
When I create or edit a Generic repository connection, I cannot update the values in the following fields:
* Connection timeout (milis)
* Socket timeout (milis)

h3. +*3. Reproduction*+
* Create a Generic repository connection
** On the *Entry point* tab, edit the values of the *Connection timeout (milis)* and *Socket timeout (milis)* fields
** Click on the *Save* button
* On the *View Repository Connection Status - Generic* screen, it can be seen that the values of the 2 fields above are not updated.

h3. +*4. Cause*+
The names of the textboxes for the 2 fields are as follows:
* genericConTimeout
* genericSoTimeout

However, the names used inside the source code are as follows:
* genericConnectionTimeout
* genericSocketTimeout

As a result, the new values cannot be obtained, so the values of the 2 fields cannot be updated.

h3. +*5. Solution*+
Update the parameter names for Connection Timeout and Socket Timeout to the names that are stored in the database:
* genericConTimeout ➞ genericConnectionTimeout
* genericSoTimeout ➞ genericSocketTimeout

h3. +*6. 
Suggested source code (based on release 2.22.1)*+ [https://github.com/apache/manifoldcf/blob/release-2.22.1/connectors/generic/connector/src/main/java/org/apache/manifoldcf/crawler/connectors/generic/GenericConnector.java#L510] {code:java} + " \n" + " " + Messages.getBodyString(locale, "generic.ConnectionTimeoutColon") + "\n" - + " \n" + + " \n" + " \n" + " \n" + " " + Messages.getBodyString(locale, "generic.SocketTimeoutColon") + "\n" - + " \n" + + " \n" + " \n" {code} [https://github.com/apache/manifoldcf/blob/release-2.22.1/connectors/generic/connector/src/main/java/org/apache/manifoldcf/crawler/connectors/generic/GenericConnector.java#L523] {code:java} - out.print("\n"); - out.print("\n"); + out.print("\n"); + out.print("\n"); {code} [https://github.com/apache/manifoldcf/blob/release-2.22.1/connectors/generic/connector/src/main/java/org/apache/manifoldcf/crawler/connectors/generic/GenericConnector.java#L535] {code:java} - copyParam(variableContext, parameters, "genericConTimeout"); - copyParam(variableContext, parameters, "genericSoTimeout"); + copyParam(variableContext, parameters, "genericConnectionTimeout"); + copyParam(variableContext, parameters, "genericSocketTimeout"); {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
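The effect of the name mismatch can be reproduced in a few lines. The sketch below is a simplified model, not ManifoldCF code: plain maps stand in for the posted form variables and the stored connection parameters, and this `copyParam` is a hypothetical stand-in that only mirrors the idea of copying a posted value by name. The timeout values are sample numbers.

```java
import java.util.HashMap;
import java.util.Map;

final class TimeoutParamNamesSketch {
    static final String CONNECTION_TIMEOUT = "genericConnectionTimeout";
    static final String SOCKET_TIMEOUT = "genericSocketTimeout";

    /** Copies a posted form value into the stored parameters, if the field was submitted. */
    static void copyParam(Map<String, String> posted, Map<String, String> stored, String name) {
        String value = posted.get(name);
        if (value != null) {
            stored.put(name, value);
        }
    }

    public static void main(String[] args) {
        Map<String, String> stored = new HashMap<>();
        stored.put(CONNECTION_TIMEOUT, "60000");

        // Mismatch: the form posts under one name, the code reads another.
        Map<String, String> posted = new HashMap<>();
        posted.put("genericConTimeout", "120000");
        copyParam(posted, stored, CONNECTION_TIMEOUT);
        System.out.println("after mismatched post: " + stored.get(CONNECTION_TIMEOUT));

        // Match: the same constant names the form field and the stored parameter.
        posted.put(CONNECTION_TIMEOUT, "120000");
        copyParam(posted, stored, CONNECTION_TIMEOUT);
        System.out.println("after matching post: " + stored.get(CONNECTION_TIMEOUT));
    }
}
```

Using one shared constant for rendering the field, reading the post, and storing the parameter is a simple way to keep the three places in step.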
[jira] [Created] (CONNECTORS-1725) MissingResourceException exception occurs at Generic Repository Connector
Nguyen Huu Nhat created CONNECTORS-1725:
---

Summary: MissingResourceException exception occurs at Generic Repository Connector
Key: CONNECTORS-1725
URL: https://issues.apache.org/jira/browse/CONNECTORS-1725
Project: ManifoldCF
Issue Type: Bug
Reporter: Nguyen Huu Nhat

Hi there,

While using the Generic Repository Connector I ran into a problem that has not been addressed yet, so I would like to suggest the following correction to its source code. Details are below.

h3. +*1. Connector name*+

Generic Repository Connector

h3. +*2. Issue*+

When I create or edit a job that uses the Generic Repository Connector and add a parameter without filling in its *Parameter name* field, the alert shows the raw key *generic.TypeInParamName* as its message.

The following error is logged:

{noformat}
ERROR 2022-08-04T15:45:43,443 (qtp10405169-442) - Missing resource 'generic.TypeInParamName' in bundle 'org.apache.manifoldcf.crawler.connectors.generic.common' for locale 'en'
java.util.MissingResourceException: Can't find resource for bundle java.util.PropertyResourceBundle, key generic.TypeInParamName
        at java.util.ResourceBundle.getObject(ResourceBundle.java:450) ~[?:1.8.0_211]
        at java.util.ResourceBundle.getString(ResourceBundle.java:407) ~[?:1.8.0_211]
        at org.apache.manifoldcf.core.i18n.Messages.getMessage(Messages.java:195) ~[mcf-core.jar:?]
        at org.apache.manifoldcf.core.i18n.Messages.getMessage(Messages.java:184) ~[mcf-core.jar:?]
        at org.apache.manifoldcf.core.i18n.Messages.getString(Messages.java:218) ~[mcf-core.jar:?]
        at org.apache.manifoldcf.ui.i18n.Messages.getBodyJavascriptString(Messages.java:343) ~[mcf-ui-core.jar:?]
        at org.apache.manifoldcf.crawler.connectors.generic.Messages.getBodyJavascriptString(Messages.java:95) ~[?:?]
        at org.apache.manifoldcf.crawler.connectors.generic.Messages.getBodyJavascriptString(Messages.java:54) ~[?:?]
        at org.apache.manifoldcf.crawler.connectors.generic.GenericConnector.outputSpecificationHeader(GenericConnector.java:610) ~[?:?]
{noformat}

h3. +*3. Reproduction*+

* Create a Generic Repository Connector
* Create a job using the connector created above with the following details:
** On the Parameters tab, add a parameter as follows:
*** Parameter name: blank
*** Parameter value:
*** Click [Add]

h3. +*4. Cause*+

The key *generic.TypeInParamName* is used in the code but is not present in the native2ascii _*.properties_ files. It was probably confused with *generic.TypeInParameterName*, which is defined in those files but is not used anywhere.

h3. +*5. Solution*+

Update the property key used in the Java classes so that it matches the one defined in the _*.properties_ files:

*generic.TypeInParamName* ➞ *generic.TypeInParameterName*

h3. +*6. Suggested source code (based on release 2.22.1)*+

[https://github.com/apache/manifoldcf/blob/release-2.22.1/connectors/generic/connector/src/main/java/org/apache/manifoldcf/crawler/connectors/generic/GenericConnector.java#L610]

{code:java}
  + "function "+seqPrefix+"SpecAddParam(anchorvalue) {\n"
  + " if (editjob."+seqPrefix+"specparamname.value == \"\")\n"
  + " {\n"
- + "alert(\"" + Messages.getBodyJavascriptString(locale, "generic.TypeInParamName") + "\");\n"
+ + "alert(\"" + Messages.getBodyJavascriptString(locale, "generic.TypeInParameterName") + "\");\n"
  + "editjob."+seqPrefix+"specparamname.focus();\n"
  + "return;\n"
  + " }\n"
  + " "+seqPrefix+"SpecOp(\""+seqPrefix+"paramop\",\"Add\",anchorvalue);\n"
  + "}\n"
{code}

--
This message was sent by Atlassian Jira (v8.20.10#820010)
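For readers of CONNECTORS-1725 who want to see the failure in isolation, here is a minimal sketch of a bundle lookup with a mismatched key. It does not use ManifoldCF code: the class names `BundleKeyDemo` and `DemoBundle`, the message text, and the fall-back-to-key behaviour are all assumptions made for illustration (the fallback is only inferred from the alert text described in the report).

```java
import java.util.ListResourceBundle;
import java.util.MissingResourceException;
import java.util.ResourceBundle;

class BundleKeyDemo {
  // Hypothetical bundle standing in for the connector's *.properties files.
  private static final ResourceBundle BUNDLE = new DemoBundle();

  // Returns the message for the key, or the key itself when it is undefined.
  // The fallback is an assumption, chosen to match the alert text in the report.
  static String lookup(String key) {
    try {
      return BUNDLE.getString(key);
    } catch (MissingResourceException e) {
      return key;
    }
  }

  public static void main(String[] args) {
    // Key used by the code before the fix: not defined, so the raw key comes back.
    System.out.println(lookup("generic.TypeInParamName"));
    // Key defined in the bundle: the message text comes back.
    System.out.println(lookup("generic.TypeInParameterName"));
  }
}

// Only the longer key is defined, mirroring the situation described in the issue.
class DemoBundle extends ListResourceBundle {
  @Override
  protected Object[][] getContents() {
    return new Object[][] {
      // Hypothetical message text, not copied from ManifoldCF.
      {"generic.TypeInParameterName", "Type in a parameter name"}
    };
  }
}
```

Running `main` prints `generic.TypeInParamName` followed by `Type in a parameter name`, which is the difference the one-line key change is meant to produce.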
[jira] [Created] (CONNECTORS-1724) When the REST API cannot be connected, job using the Generic Repository Connector would be freezed.
Nguyen Huu Nhat created CONNECTORS-1724:
---

Summary: When the REST API cannot be connected, job using the Generic Repository Connector would be freezed.
Key: CONNECTORS-1724
URL: https://issues.apache.org/jira/browse/CONNECTORS-1724
Project: ManifoldCF
Issue Type: Bug
Reporter: Nguyen Huu Nhat

Hi there,

While using the Generic Repository Connector I ran into an issue that is not handled yet, so I would like to suggest the following fix to its source code. Details are below.

h3. +*1. Connector name*+

Generic Repository Connector

h3. +*2. Issue*+

When the Generic Repository Connector calls the REST API with _action=seed_ and an error occurs, the corresponding error handling is not executed. As a result, the ManifoldCF crawling job freezes in the *Starting up* status and no error message is output.

* When this happens for a Generic Repository job, the seed phase of jobs on other repositories also freezes (the seeding thread itself is probably blocked).
* Restarting ManifoldCF does not help: jobs are resumed automatically, so the same issue occurs again.
* A temporary workaround is to abort the job and recheck the connection.

h3. +*3. Reproduction*+

h4. *Reproduction method:*

* When configuring the Generic repository connection, set a non-existent entry point (e.g. [http://localhost/no*exist/]). Then define a job that uses that connection and run it.
* 10 minutes or more after the job is started, its status is still *Starting up*; the job does not terminate abnormally with a connection error or a time-out.

h4. *Reproduction steps:*

* Create a Generic repository connection with the following settings:
** On the *Entry Point* tab, set a non-existent entry point (e.g. [http://localhost/no*exist/])
* Create a job using the above Generic repository connection
* Start the created job and keep track of its status
** The job freezes with the following information:
*** Status: Starting up
*** Start Time: Not started
*** Documents: 0
** No new events appear in *Document Status*
** No errors are logged in manifoldcf.log

h3. +*4. Cause*+

In the *GenericConnector$ExecuteSeedingThread* class, *seedBuffer.signalDone()* is called only when the returned HTTP status code is 200.

* When the connector cannot connect to the REST API, so that the returned HTTP status code is not 200, *seedBuffer.signalDone()* is never called.
** As a result, the *complete* flag is never set to _true_.
** Because *complete* stays _false_ and *buffer.size()* is 0, the job is stuck in *wait()* inside the while loop of the *XThreadStringBuffer#fetch()* method.
([https://github.com/apache/manifoldcf/blob/release-2.22.1/framework/connector-common/src/main/java/org/apache/manifoldcf/connectorcommon/common/XThreadStringBuffer.java#L78])

{code:java}
while (buffer.size() == 0 && !complete)
  wait();
{code}

⇒ This is why the job freezes in the *Starting up* status.

h3. +*5. Solution*+

To resolve this issue, we suggest the following changes:

* *seedBuffer.signalDone()* should be called for every HTTP response status, not only for 200.
* When the HTTP status code is not 200, a ManifoldCFException is thrown, but the *finishUp()* method of the *GenericConnector$ExecuteSeedingThread* class has no handling for ManifoldCFException. Handling for this exception should be added.

h3. +*6. Suggested source code (based on release 2.22.1)*+

[https://github.com/apache/manifoldcf/blob/release-2.22.1/connectors/generic/connector/src/main/java/org/apache/manifoldcf/crawler/connectors/generic/GenericConnector.java#L1151]

{code:java}
-   seedBuffer.signalDone();
  } finally {
    EntityUtils.consume(response.getEntity());
    method.releaseConnection();
+   seedBuffer.signalDone();
  }
{code}

[https://github.com/apache/manifoldcf/blob/release-2.22.1/connectors/generic/connector/src/main/java/org/apache/manifoldcf/crawler/connectors/generic/GenericConnector.java#L1120]

{code:java}
  if (thr instanceof RuntimeException) {
    throw (RuntimeException) thr;
  } else if (thr instanceof Error) {
    throw (Error) thr;
+ } else if (thr instanceof ManifoldCFException) {
+   throw (ManifoldCFException) thr;
  } else {
    throw new RuntimeException("Unhandled exception of type: " + thr.getClass().getName(), thr);
  }
{code}

--
This message was sent by Atlassian Jira (v8.20.10#820010)
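The hang described in CONNECTORS-1724 comes down to a consumer waiting on a flag that the producer sets only on its success path. The following is a rough sketch of that pattern, with the completion signal moved into `finally` as the issue proposes. `SeedBufferSketch` is a simplified, hypothetical stand-in for `XThreadStringBuffer`, and `SeedingDemo`, `fetchFirst`, and the `doc-1` id are invented for illustration; none of this is the actual ManifoldCF code.

```java
import java.util.ArrayDeque;
import java.util.Queue;

class SeedingDemo {
  // Runs a fake seed request on a producer thread and returns the first
  // seeded id, or null if the request failed before producing anything.
  static String fetchFirst(int httpStatus) {
    SeedBufferSketch seedBuffer = new SeedBufferSketch();
    Thread producer = new Thread(() -> {
      try {
        if (httpStatus != 200) {
          throw new IllegalStateException("Unexpected HTTP status " + httpStatus);
        }
        seedBuffer.add("doc-1");
      } catch (IllegalStateException e) {
        // A real connector would record the error so the caller can rethrow it.
      } finally {
        // Always release the consumer, whether or not the request succeeded.
        seedBuffer.signalDone();
      }
    });
    producer.start();
    try {
      String first = seedBuffer.fetch();
      producer.join();
      return first;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("Interrupted while waiting for seeds", e);
    }
  }

  public static void main(String[] args) {
    System.out.println(fetchFirst(200));
    System.out.println(fetchFirst(503));
  }
}

// Simplified, hypothetical stand-in for XThreadStringBuffer.
class SeedBufferSketch {
  private final Queue<String> buffer = new ArrayDeque<>();
  private boolean complete = false;

  synchronized void add(String id) {
    buffer.add(id);
    notifyAll();
  }

  synchronized void signalDone() {
    complete = true;
    notifyAll();
  }

  // Blocks until an id is available or the producer has signalled completion.
  synchronized String fetch() throws InterruptedException {
    while (buffer.size() == 0 && !complete) {
      wait();
    }
    return buffer.poll();
  }
}
```

Running `main` prints `doc-1` and then `null`. If `signalDone()` were called only on the success path, the second call would block in `fetch()` indefinitely, which corresponds to the job staying in *Starting up*.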