[jira] [Commented] (CONNECTORS-1562) Documents unreachable due to hopcount are not considered unreachable on cleanup pass

2019-01-09 Thread Donald Van den Driessche (JIRA)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16738287#comment-16738287
 ] 

Donald Van den Driessche commented on CONNECTORS-1562:
--

Thanks!

I asked the question at the company that provides the sitemap.

I much appreciate all your effort!

> Documents unreachable due to hopcount are not considered unreachable on 
> cleanup pass
> 
>
> Key: CONNECTORS-1562
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1562
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: Elastic Search connector, Web connector
>Affects Versions: ManifoldCF 2.11
> Environment: Manifoldcf 2.11
> Elasticsearch 6.3.2
> Web inputconnector
> elastic outputconnecotr
> Job crawls website input and outputs content to elastic
>Reporter: Tim Steenbeke
>Assignee: Karl Wright
>Priority: Critical
>  Labels: starter
> Fix For: ManifoldCF 2.12
>
> Attachments: Screenshot from 2018-12-31 11-17-29.png, 
> image-2019-01-09-14-20-50-616.png, manifoldcf.log.cleanup, 
> manifoldcf.log.init, manifoldcf.log.reduced
>
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> My documents aren't removed from ElasticSearch index after rerunning the 
> changed seeds
> I update my job to change the seedmap and rerun it or use the schedualer to 
> keep it runneng even after updating it.
> After the rerun the unreachable documents don't get deleted.
> It only adds doucments when they can be reached.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (CONNECTORS-1562) Documents unreachable due to hopcount are not considered unreachable on cleanup pass

2019-01-09 Thread Karl Wright (JIRA)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16738271#comment-16738271
 ] 

Karl Wright commented on CONNECTORS-1562:
-

The "Stream has been closed" issue is occurring because it is simply taking too 
long to read all the data from the sitemap page, and the webserver is closing 
the connection before it's complete.  Alternatively, it might be because the 
server is configured to cut pages off after a certain number of bytes.  I don't 
know which one it is.  You will need to do some research to figure out what 
your server's rules look like.  The preferred solution would be to simply relax 
the rules for that one page.

However, if that's not possible, the best alternative would be to break the 
sitemap page up into pieces.  If each piece was, say 1/4 the size, it might be 
small enough to get past your current rules.


> Documents unreachable due to hopcount are not considered unreachable on 
> cleanup pass
> 
>
> Key: CONNECTORS-1562
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1562
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: Elastic Search connector, Web connector
>Affects Versions: ManifoldCF 2.11
> Environment: Manifoldcf 2.11
> Elasticsearch 6.3.2
> Web inputconnector
> elastic outputconnecotr
> Job crawls website input and outputs content to elastic
>Reporter: Tim Steenbeke
>Assignee: Karl Wright
>Priority: Critical
>  Labels: starter
> Fix For: ManifoldCF 2.12
>
> Attachments: Screenshot from 2018-12-31 11-17-29.png, 
> image-2019-01-09-14-20-50-616.png, manifoldcf.log.cleanup, 
> manifoldcf.log.init, manifoldcf.log.reduced
>
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> My documents aren't removed from ElasticSearch index after rerunning the 
> changed seeds
> I update my job to change the seedmap and rerun it or use the schedualer to 
> keep it runneng even after updating it.
> After the rerun the unreachable documents don't get deleted.
> It only adds doucments when they can be reached.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (CONNECTORS-1562) Documents unreachable due to hopcount are not considered unreachable on cleanup pass

2019-01-09 Thread Karl Wright (JIRA)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16738232#comment-16738232
 ] 

Karl Wright commented on CONNECTORS-1562:
-

We already discussed the IOEXCEPTION issue; that's because of throttling and 
the connection closing is likely occurring on the server side.

For the NULLPOINTEREXCEPTION, there is a stack trace dumped to the ManifoldCF 
log.  Can you find it and create a ticket with it?  Thanks!


> Documents unreachable due to hopcount are not considered unreachable on 
> cleanup pass
> 
>
> Key: CONNECTORS-1562
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1562
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: Elastic Search connector, Web connector
>Affects Versions: ManifoldCF 2.11
> Environment: Manifoldcf 2.11
> Elasticsearch 6.3.2
> Web inputconnector
> elastic outputconnecotr
> Job crawls website input and outputs content to elastic
>Reporter: Tim Steenbeke
>Assignee: Karl Wright
>Priority: Critical
>  Labels: starter
> Fix For: ManifoldCF 2.12
>
> Attachments: Screenshot from 2018-12-31 11-17-29.png, 
> image-2019-01-09-14-20-50-616.png, manifoldcf.log.cleanup, 
> manifoldcf.log.init, manifoldcf.log.reduced
>
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> My documents aren't removed from ElasticSearch index after rerunning the 
> changed seeds
> I update my job to change the seedmap and rerun it or use the schedualer to 
> keep it runneng even after updating it.
> After the rerun the unreachable documents don't get deleted.
> It only adds doucments when they can be reached.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (CONNECTORS-1562) Documents unreachable due to hopcount are not considered unreachable on cleanup pass

2019-01-09 Thread Donald Van den Driessche (JIRA)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16738231#comment-16738231
 ] 

Donald Van den Driessche commented on CONNECTORS-1562:
--

Kar

After resolving the issues with the API creation repository connections, I 
retested our crawling locally. With a docker which contains a ManifoldCF and an 
Elasticsearch container.

I used No Bandwith throttles and a max connection count of 25.

This on the seedmap existing of 1 page, our whitelist: 
[https://www.uantwerpen.be/admin/system/sitemap/sitemap.aspx?lang=nl=true]
I still get the Stream Closed I/O exception:.

Do you have any more ideas on how to keep the connection open, so that the 
whole whitelist can be processed?

 

Printscreen Simple Report

!image-2019-01-09-14-20-50-616.png!

Stacktrace
{code:java}
ERROR 2019-01-09T13:08:37,876 (Worker thread '22') - Exception tossed: Repeated 
service interruptions - failure processing document: Stream Closed 
org.apache.manifoldcf.core.interfaces.ManifoldCFException: Repeated service 
interruptions - failure processing document: Stream Closed at 
org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:489) 
[mcf-pull-agent.jar:?] Caused by: java.io.IOException: Stream Closed at 
java.io.FileInputStream.readBytes(Native Method) ~[?:1.8.0_191] at 
java.io.FileInputStream.read(FileInputStream.java:255) ~[?:1.8.0_191] at 
sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284) ~[?:1.8.0_191] at 
sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326) ~[?:1.8.0_191] at 
sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178) ~[?:1.8.0_191] at 
java.io.InputStreamReader.read(InputStreamReader.java:184) ~[?:1.8.0_191] at 
org.apache.manifoldcf.agents.output.elasticsearch.ElasticSearchIndex$IndexRequestEntity.writeTo(ElasticSearchIndex.java:221)
 ~[?:?] at 
org.apache.http.impl.execchain.RequestEntityProxy.writeTo(RequestEntityProxy.java:121)
 ~[httpclient-4.5.6.jar:4.5.6] at 
org.apache.http.impl.DefaultBHttpClientConnection.sendRequestEntity(DefaultBHttpClientConnection.java:156)
 ~[httpcore-4.4.10.jar:4.4.10] at 
org.apache.http.impl.conn.CPoolProxy.sendRequestEntity(CPoolProxy.java:160) 
~[httpclient-4.5.6.jar:4.5.6] at 
org.apache.http.protocol.HttpRequestExecutor.doSendRequest(HttpRequestExecutor.java:238)
 ~[httpcore-4.4.10.jar:4.4.10] at 
org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:123)
 ~[httpcore-4.4.10.jar:4.4.10] at 
org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:272) 
~[httpclient-4.5.6.jar:4.5.6] at 
org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:185) 
~[httpclient-4.5.6.jar:4.5.6] at 
org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110) 
~[httpclient-4.5.6.jar:4.5.6] at 
org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
 ~[httpclient-4.5.6.jar:4.5.6] at 
org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)
 ~[httpclient-4.5.6.jar:4.5.6] at 
org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:108)
 ~[httpclient-4.5.6.jar:4.5.6] at 
org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56)
 ~[httpclient-4.5.6.jar:4.5.6] at 
org.apache.manifoldcf.agents.output.elasticsearch.ElasticSearchConnection$CallThread.run(ElasticSearchConnection.java:133)
 ~[?:?] ERROR 2019-01-09T13:08:37,883 (Worker thread '7') - Exception tossed: 
Repeated service interruptions - failure processing document: Stream Closed 
org.apache.manifoldcf.core.interfaces.ManifoldCFException: Repeated service 
interruptions - failure processing document: Stream Closed at 
org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:489) 
[mcf-pull-agent.jar:?] Caused by: java.io.IOException: Stream Closed at 
java.io.FileInputStream.readBytes(Native Method) ~[?:1.8.0_191] at 
java.io.FileInputStream.read(FileInputStream.java:255) ~[?:1.8.0_191] at 
sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284) ~[?:1.8.0_191] at 
sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326) ~[?:1.8.0_191] at 
sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178) ~[?:1.8.0_191] at 
java.io.InputStreamReader.read(InputStreamReader.java:184) ~[?:1.8.0_191] at 
org.apache.manifoldcf.agents.output.elasticsearch.ElasticSearchIndex$IndexRequestEntity.writeTo(ElasticSearchIndex.java:221)
 ~[?:?] at 
org.apache.http.impl.execchain.RequestEntityProxy.writeTo(RequestEntityProxy.java:121)
 ~[httpclient-4.5.6.jar:4.5.6] at 
org.apache.http.impl.DefaultBHttpClientConnection.sendRequestEntity(DefaultBHttpClientConnection.java:156)
 ~[httpcore-4.4.10.jar:4.4.10] at 
org.apache.http.impl.conn.CPoolProxy.sendRequestEntity(CPoolProxy.java:160) 
~[httpclient-4.5.6.jar:4.5.6] at 
org.apache.http.protocol.HttpRequestExecutor.doSendRequest(HttpRequestExecutor.java:238)
 

[jira] [Updated] (CONNECTORS-1562) Documents unreachable due to hopcount are not considered unreachable on cleanup pass

2019-01-09 Thread Donald Van den Driessche (JIRA)


 [ 
https://issues.apache.org/jira/browse/CONNECTORS-1562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Donald Van den Driessche updated CONNECTORS-1562:
-
Attachment: image-2019-01-09-14-20-50-616.png

> Documents unreachable due to hopcount are not considered unreachable on 
> cleanup pass
> 
>
> Key: CONNECTORS-1562
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1562
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: Elastic Search connector, Web connector
>Affects Versions: ManifoldCF 2.11
> Environment: Manifoldcf 2.11
> Elasticsearch 6.3.2
> Web inputconnector
> elastic outputconnecotr
> Job crawls website input and outputs content to elastic
>Reporter: Tim Steenbeke
>Assignee: Karl Wright
>Priority: Critical
>  Labels: starter
> Fix For: ManifoldCF 2.12
>
> Attachments: Screenshot from 2018-12-31 11-17-29.png, 
> image-2019-01-09-14-20-50-616.png, manifoldcf.log.cleanup, 
> manifoldcf.log.init, manifoldcf.log.reduced
>
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> My documents aren't removed from ElasticSearch index after rerunning the 
> changed seeds
> I update my job to change the seedmap and rerun it or use the schedualer to 
> keep it runneng even after updating it.
> After the rerun the unreachable documents don't get deleted.
> It only adds doucments when they can be reached.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (CONNECTORS-1568) UI error imported web connection

2019-01-09 Thread Karl Wright (JIRA)


 [ 
https://issues.apache.org/jira/browse/CONNECTORS-1568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karl Wright resolved CONNECTORS-1568.
-
Resolution: Fixed

A minor fix has been committed that makes the UI robust against a missing 
truststore in the connection definition.  Otherwise, it sounds like the user 
found an error in their process and that resolved the issue for them.


> UI error imported web connection
> 
>
> Key: CONNECTORS-1568
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1568
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: Web connector
>Affects Versions: ManifoldCF 2.11, ManifoldCF 2.12
>Reporter: Tim Steenbeke
>Assignee: Karl Wright
>Priority: Major
> Fix For: ManifoldCF 2.13
>
>
> Using the ManifoldCF API, we export a web repository connector, with basic 
> settings.
>  Than we importing the web connector using the manifoldcf API.
>  The connector get's imported and can be used in a job.
>  When trying to view or edit the connector in the UI following error pops up.
> (connected to issue: 
> [CONNECTORS-1567)|https://issues.apache.org/jira/projects/CONNECTORS/issues/CONNECTORS-1567]
> *HTTP ERROR 500*
>  Problem accessing /mcf-crawler-ui/editconnection.jsp. Reason:
>      Server Error
> *Caused by:*
> {code:java}
> org.apache.jasper.JasperException: An exception occurred processing JSP page 
> /editconnection.jsp at line 564
> 561:
> 562: if (className.length() > 0)
> 563: {
> 564:   
> RepositoryConnectorFactory.outputConfigurationBody(threadContext,className,new
>  
> org.apache.manifoldcf.ui.jsp.JspWrapper(out,adminprofile),pageContext.getRequest().getLocale(),parameters,tabName);
> 565: }
> 566: %>
> 567:
> Stacktrace:
>     at 
> org.apache.jasper.servlet.JspServletWrapper.handleJspException(JspServletWrapper.java:521)
>     at 
> org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:430)
>     at 
> org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:313)
>     at org.apache.jasper.servlet.JspServlet.service(JspServlet.java:260)
>     at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
>     at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:769)
>     at 
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
>     at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
>     at 
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
>     at 
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
>     at 
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1125)
>     at 
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
>     at 
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
>     at 
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1059)
>     at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
>     at 
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
>     at 
> org.eclipse.jetty.server.handler.HandlerList.handle(HandlerList.java:52)
>     at 
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
>     at org.eclipse.jetty.server.Server.handle(Server.java:497)
>     at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:311)
>     at 
> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:248)
>     at 
> org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
>     at 
> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:610)
>     at 
> org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:539)
>     at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.NullPointerException
>     at org.apache.manifoldcf.core.common.Base64.decodeString(Base64.java:164)
>     at 
> org.apache.manifoldcf.connectorcommon.keystore.KeystoreManager.(KeystoreManager.java:86)
>     at 
> org.apache.manifoldcf.connectorcommon.interfaces.KeystoreManagerFactory.make(KeystoreManagerFactory.java:47)
>     at 
> org.apache.manifoldcf.crawler.connectors.webcrawler.WebcrawlerConnector.fillInCertificatesTab(WebcrawlerConnector.java:1701)
>     at 
> org.apache.manifoldcf.crawler.connectors.webcrawler.WebcrawlerConnector.outputConfigurationBody(WebcrawlerConnector.java:1866)
>     at 
> org.apache.manifoldcf.core.interfaces.ConnectorFactory.outputThisConfigurationBody(ConnectorFactory.java:83)
>     at 
> org.apache.manifoldcf.crawler.interfaces.RepositoryConnectorFactory.outputConfigurationBody(RepositoryConnectorFactory.java:155)
>     at 
> 

[jira] [Commented] (CONNECTORS-1568) UI error imported web connection

2019-01-09 Thread Tim Steenbeke (JIRA)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16738153#comment-16738153
 ] 

Tim Steenbeke commented on CONNECTORS-1568:
---

When debugging the project we found a missing JSON object for trust and this 
created an issue for the UI but the connector still worked.
Now we fixed the bug. so thank you for the help.

> UI error imported web connection
> 
>
> Key: CONNECTORS-1568
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1568
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: Web connector
>Affects Versions: ManifoldCF 2.11, ManifoldCF 2.12
>Reporter: Tim Steenbeke
>Assignee: Karl Wright
>Priority: Major
> Fix For: ManifoldCF 2.13
>
>
> Using the ManifoldCF API, we export a web repository connector, with basic 
> settings.
>  Than we importing the web connector using the manifoldcf API.
>  The connector get's imported and can be used in a job.
>  When trying to view or edit the connector in the UI following error pops up.
> (connected to issue: 
> [CONNECTORS-1567)|https://issues.apache.org/jira/projects/CONNECTORS/issues/CONNECTORS-1567]
> *HTTP ERROR 500*
>  Problem accessing /mcf-crawler-ui/editconnection.jsp. Reason:
>      Server Error
> *Caused by:*
> {code:java}
> org.apache.jasper.JasperException: An exception occurred processing JSP page 
> /editconnection.jsp at line 564
> 561:
> 562: if (className.length() > 0)
> 563: {
> 564:   
> RepositoryConnectorFactory.outputConfigurationBody(threadContext,className,new
>  
> org.apache.manifoldcf.ui.jsp.JspWrapper(out,adminprofile),pageContext.getRequest().getLocale(),parameters,tabName);
> 565: }
> 566: %>
> 567:
> Stacktrace:
>     at 
> org.apache.jasper.servlet.JspServletWrapper.handleJspException(JspServletWrapper.java:521)
>     at 
> org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:430)
>     at 
> org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:313)
>     at org.apache.jasper.servlet.JspServlet.service(JspServlet.java:260)
>     at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
>     at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:769)
>     at 
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
>     at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
>     at 
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
>     at 
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
>     at 
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1125)
>     at 
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
>     at 
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
>     at 
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1059)
>     at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
>     at 
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
>     at 
> org.eclipse.jetty.server.handler.HandlerList.handle(HandlerList.java:52)
>     at 
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
>     at org.eclipse.jetty.server.Server.handle(Server.java:497)
>     at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:311)
>     at 
> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:248)
>     at 
> org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
>     at 
> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:610)
>     at 
> org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:539)
>     at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.NullPointerException
>     at org.apache.manifoldcf.core.common.Base64.decodeString(Base64.java:164)
>     at 
> org.apache.manifoldcf.connectorcommon.keystore.KeystoreManager.(KeystoreManager.java:86)
>     at 
> org.apache.manifoldcf.connectorcommon.interfaces.KeystoreManagerFactory.make(KeystoreManagerFactory.java:47)
>     at 
> org.apache.manifoldcf.crawler.connectors.webcrawler.WebcrawlerConnector.fillInCertificatesTab(WebcrawlerConnector.java:1701)
>     at 
> org.apache.manifoldcf.crawler.connectors.webcrawler.WebcrawlerConnector.outputConfigurationBody(WebcrawlerConnector.java:1866)
>     at 
> org.apache.manifoldcf.core.interfaces.ConnectorFactory.outputThisConfigurationBody(ConnectorFactory.java:83)
>     at 
> org.apache.manifoldcf.crawler.interfaces.RepositoryConnectorFactory.outputConfigurationBody(RepositoryConnectorFactory.java:155)
>     at 
> 

[jira] [Commented] (CONNECTORS-1567) export of web connection bandwidth throttling

2019-01-09 Thread Tim Steenbeke (JIRA)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16738159#comment-16738159
 ] 

Tim Steenbeke commented on CONNECTORS-1567:
---

Same problem as CONNECTORS-1568, we found the issue after debugging and fixed 
it.
Thank you for the help.

> export of web connection bandwidth throttling
> -
>
> Key: CONNECTORS-1567
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1567
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: Web connector
>Affects Versions: ManifoldCF 2.11, ManifoldCF 2.12
>Reporter: Tim Steenbeke
>Assignee: Karl Wright
>Priority: Major
> Fix For: ManifoldCF 2.13
>
> Attachments: bandwidth_test_abc.png
>
>
> When exporting the web connector using the API, it doesn't export the 
> bandwidth throttling.
>  Than when importing this connector to a clean manifoldcf it creates the 
> connector with basic bandwidth.
>  When using the connector in a job it works properly.
> The issue here is that the connector isn't created with correct bandwidth 
> throttling.
>  And the connector gives issues in the UI when trying to view or edit.
> (related to issue: 
> [CONNECTORS-1568|https://issues.apache.org/jira/projects/CONNECTORS/issues/CONNECTORS-1568])
> e.g.:
> {code:java}
> {
>   "name": "test_web",
>   "configuration": null,
> "_PARAMETER_": [
>   {
> "_attribute_name": "Email address",
> "_value_": "tim.steenbeke@formica.digital"
>   },
>   {
> "_attribute_name": "Robots usage",
> "_value_": "all"
>   },
>   {
> "_attribute_name": "Meta robots tags usage",
> "_value_": "all"
>   },
>   {
> "_attribute_name": "Proxy host",
> "_value_": ""
>   },
>   {
> "_attribute_name": "Proxy port",
> "_value_": ""
>   },
>   {
> "_attribute_name": "Proxy authentication domain",
> "_value_": ""
>   },
>   {
> "_attribute_name": "Proxy authentication user name",
> "_value_": ""
>   },
>   {
> "_attribute_name": "Proxy authentication password",
> "_value_": ""
>   }
> ]
>   },
>   "description": "Website repository standard settup",
>   "throttle": null,
>   "max_connections": 10,
>   "class_name": 
> "org.apache.manifoldcf.crawler.connectors.webcrawler.WebcrawlerConnector",
>   "acl_authority": null
> }{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (CONNECTORS-1567) export of web connection bandwidth throttling

2019-01-09 Thread Tim Steenbeke (JIRA)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16738090#comment-16738090
 ] 

Tim Steenbeke commented on CONNECTORS-1567:
---

When putting _"binddesc"_ to null it doesn't seem to help

The process you describe is how we do it.
 # Make the connector in the UI
 # test connector
 # extract connector
 #  clean manifold
 # import connector
 # test connector

So the output should than be new format because we use 2.11.

> export of web connection bandwidth throttling
> -
>
> Key: CONNECTORS-1567
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1567
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: Web connector
>Affects Versions: ManifoldCF 2.11, ManifoldCF 2.12
>Reporter: Tim Steenbeke
>Assignee: Karl Wright
>Priority: Major
> Fix For: ManifoldCF 2.13
>
> Attachments: bandwidth_test_abc.png
>
>
> When exporting the web connector using the API, it doesn't export the 
> bandwidth throttling.
>  Than when importing this connector to a clean manifoldcf it creates the 
> connector with basic bandwidth.
>  When using the connector in a job it works properly.
> The issue here is that the connector isn't created with correct bandwidth 
> throttling.
>  And the connector gives issues in the UI when trying to view or edit.
> (related to issue: 
> [CONNECTORS-1568|https://issues.apache.org/jira/projects/CONNECTORS/issues/CONNECTORS-1568])
> e.g.:
> {code:java}
> {
>   "name": "test_web",
>   "configuration": null,
> "_PARAMETER_": [
>   {
> "_attribute_name": "Email address",
> "_value_": "tim.steenbeke@formica.digital"
>   },
>   {
> "_attribute_name": "Robots usage",
> "_value_": "all"
>   },
>   {
> "_attribute_name": "Meta robots tags usage",
> "_value_": "all"
>   },
>   {
> "_attribute_name": "Proxy host",
> "_value_": ""
>   },
>   {
> "_attribute_name": "Proxy port",
> "_value_": ""
>   },
>   {
> "_attribute_name": "Proxy authentication domain",
> "_value_": ""
>   },
>   {
> "_attribute_name": "Proxy authentication user name",
> "_value_": ""
>   },
>   {
> "_attribute_name": "Proxy authentication password",
> "_value_": ""
>   }
> ]
>   },
>   "description": "Website repository standard settup",
>   "throttle": null,
>   "max_connections": 10,
>   "class_name": 
> "org.apache.manifoldcf.crawler.connectors.webcrawler.WebcrawlerConnector",
>   "acl_authority": null
> }{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (CONNECTORS-1567) export of web connection bandwidth throttling

2019-01-09 Thread Tim Steenbeke (JIRA)


 [ 
https://issues.apache.org/jira/browse/CONNECTORS-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Steenbeke updated CONNECTORS-1567:
--
Attachment: (was: bandwidth.png)

> export of web connection bandwidth throttling
> -
>
> Key: CONNECTORS-1567
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1567
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: Web connector
>Affects Versions: ManifoldCF 2.11, ManifoldCF 2.12
>Reporter: Tim Steenbeke
>Assignee: Karl Wright
>Priority: Major
> Fix For: ManifoldCF 2.13
>
> Attachments: bandwidth_test_abc.png
>
>
> When exporting the web connector using the API, it doesn't export the 
> bandwidth throttling.
>  Than when importing this connector to a clean manifoldcf it creates the 
> connector with basic bandwidth.
>  When using the connector in a job it works properly.
> The issue here is that the connector isn't created with correct bandwidth 
> throttling.
>  And the connector gives issues in the UI when trying to view or edit.
> (related to issue: 
> [CONNECTORS-1568|https://issues.apache.org/jira/projects/CONNECTORS/issues/CONNECTORS-1568])
> e.g.:
> {code:java}
> {
>   "name": "test_web",
>   "configuration": null,
> "_PARAMETER_": [
>   {
> "_attribute_name": "Email address",
> "_value_": "tim.steenbeke@formica.digital"
>   },
>   {
> "_attribute_name": "Robots usage",
> "_value_": "all"
>   },
>   {
> "_attribute_name": "Meta robots tags usage",
> "_value_": "all"
>   },
>   {
> "_attribute_name": "Proxy host",
> "_value_": ""
>   },
>   {
> "_attribute_name": "Proxy port",
> "_value_": ""
>   },
>   {
> "_attribute_name": "Proxy authentication domain",
> "_value_": ""
>   },
>   {
> "_attribute_name": "Proxy authentication user name",
> "_value_": ""
>   },
>   {
> "_attribute_name": "Proxy authentication password",
> "_value_": ""
>   }
> ]
>   },
>   "description": "Website repository standard settup",
>   "throttle": null,
>   "max_connections": 10,
>   "class_name": 
> "org.apache.manifoldcf.crawler.connectors.webcrawler.WebcrawlerConnector",
>   "acl_authority": null
> }{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (CONNECTORS-1567) export of web connection bandwidth throttling

2019-01-09 Thread Karl Wright (JIRA)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16737950#comment-16737950
 ] 

Karl Wright commented on CONNECTORS-1567:
-

The best way to construct an API request for any connection or job is to create 
it in the UI and then export it.  The documentation is correct but it is hard 
to pick through all the details, and the UI is easier.  So that is what I would 
do if I were trying to verify everything worked.

Unfortunately, because ManifoldCF was forced to remove a JSON jar we depended 
on due to a legal ruling by the Board, we had to retrofit a different (and not 
as good) JSON jar in place a few years back, and that had all sorts of 
downstream effects on the API JSON format.  We did not need to change the 
specification, but we did need to change how we output certain constructs to 
JSON to not use the syntactic sugar we earlier could use.  I fixed a bug in 
this area in either MCF 2.10 or 2.11, so anything output before that time might 
not have reimported faithfully.  Hope that helps with the explanation.


> export of web connection bandwidth throttling
> -
>
> Key: CONNECTORS-1567
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1567
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: Web connector
>Affects Versions: ManifoldCF 2.11, ManifoldCF 2.12
>Reporter: Tim Steenbeke
>Assignee: Karl Wright
>Priority: Major
> Fix For: ManifoldCF 2.13
>
> Attachments: bandwidth.png, bandwidth_test_abc.png
>
>
> When exporting the web connector using the API, it doesn't export the 
> bandwidth throttling.
>  Than when importing this connector to a clean manifoldcf it creates the 
> connector with basic bandwidth.
>  When using the connector in a job it works properly.
> The issue here is that the connector isn't created with correct bandwidth 
> throttling.
>  And the connector gives issues in the UI when trying to view or edit.
> (related to issue: 
> [CONNECTORS-1568|https://issues.apache.org/jira/projects/CONNECTORS/issues/CONNECTORS-1568])
> e.g.:
> {code:java}
> {
>   "name": "test_web",
>   "configuration": null,
> "_PARAMETER_": [
>   {
> "_attribute_name": "Email address",
> "_value_": "tim.steenbeke@formica.digital"
>   },
>   {
> "_attribute_name": "Robots usage",
> "_value_": "all"
>   },
>   {
> "_attribute_name": "Meta robots tags usage",
> "_value_": "all"
>   },
>   {
> "_attribute_name": "Proxy host",
> "_value_": ""
>   },
>   {
> "_attribute_name": "Proxy port",
> "_value_": ""
>   },
>   {
> "_attribute_name": "Proxy authentication domain",
> "_value_": ""
>   },
>   {
> "_attribute_name": "Proxy authentication user name",
> "_value_": ""
>   },
>   {
> "_attribute_name": "Proxy authentication password",
> "_value_": ""
>   }
> ]
>   },
>   "description": "Website repository standard settup",
>   "throttle": null,
>   "max_connections": 10,
>   "class_name": 
> "org.apache.manifoldcf.crawler.connectors.webcrawler.WebcrawlerConnector",
>   "acl_authority": null
> }{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)