[jira] [Commented] (CONNECTORS-1562) Documents unreachable due to hopcount are not considered unreachable on cleanup pass
[ https://issues.apache.org/jira/browse/CONNECTORS-1562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16738287#comment-16738287 ]

Donald Van den Driessche commented on CONNECTORS-1562:
------------------------------------------------------

Thanks! I have asked the question at the company that provides the sitemap. I much appreciate all your effort!

> Documents unreachable due to hopcount are not considered unreachable on cleanup pass
> ------------------------------------------------------------------------------------
>
> Key: CONNECTORS-1562
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1562
> Project: ManifoldCF
> Issue Type: Bug
> Components: Elastic Search connector, Web connector
> Affects Versions: ManifoldCF 2.11
> Environment: ManifoldCF 2.11
> Elasticsearch 6.3.2
> Web input connector
> Elastic output connector
> Job crawls website input and outputs content to Elasticsearch
> Reporter: Tim Steenbeke
> Assignee: Karl Wright
> Priority: Critical
> Labels: starter
> Fix For: ManifoldCF 2.12
>
> Attachments: Screenshot from 2018-12-31 11-17-29.png, image-2019-01-09-14-20-50-616.png, manifoldcf.log.cleanup, manifoldcf.log.init, manifoldcf.log.reduced
>
> Original Estimate: 4h
> Remaining Estimate: 4h
>
> My documents aren't removed from the Elasticsearch index after rerunning the changed seeds.
> I update my job to change the seed map and rerun it, or use the scheduler to keep it running even after updating it.
> After the rerun, the unreachable documents don't get deleted.
> It only adds documents when they can be reached.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Commented] (CONNECTORS-1562) Documents unreachable due to hopcount are not considered unreachable on cleanup pass
[ https://issues.apache.org/jira/browse/CONNECTORS-1562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16738271#comment-16738271 ]

Karl Wright commented on CONNECTORS-1562:
-----------------------------------------

The "Stream has been closed" issue is occurring because it is simply taking too long to read all the data from the sitemap page, and the web server is closing the connection before the read is complete. Alternatively, it might be because the server is configured to cut pages off after a certain number of bytes. I don't know which one it is; you will need to do some research to figure out what your server's rules look like.

The preferred solution would be to simply relax the rules for that one page. If that's not possible, the best alternative would be to break the sitemap page up into pieces. If each piece were, say, 1/4 the size, it might be small enough to get past your current rules.
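The suggestion above to break the sitemap up into pieces can be sketched as follows. This is a minimal illustration only, assuming a flat <urlset> sitemap and illustrative piece file names; the actual sitemap.aspx output and hosting layout may differ.

```python
# Sketch: split a large <urlset> sitemap into smaller pieces plus a
# <sitemapindex> pointing at them. Piece URLs/names are hypothetical.
import xml.etree.ElementTree as ET

NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def split_sitemap(xml_text, chunk_size, base_url):
    """Return (index_xml, [piece_xml, ...]) for a flat urlset sitemap."""
    root = ET.fromstring(xml_text)
    urls = list(root)  # each child is a <url> element
    pieces = []
    for i in range(0, len(urls), chunk_size):
        urlset = ET.Element("{%s}urlset" % NS)
        urlset.extend(urls[i:i + chunk_size])
        pieces.append(ET.tostring(urlset, encoding="unicode"))
    # Build a <sitemapindex> that points at each piece file.
    index = ET.Element("{%s}sitemapindex" % NS)
    for n in range(len(pieces)):
        sm = ET.SubElement(index, "{%s}sitemap" % NS)
        loc = ET.SubElement(sm, "{%s}loc" % NS)
        loc.text = "%s/sitemap-%d.xml" % (base_url, n + 1)
    return ET.tostring(index, encoding="unicode"), pieces
```

Each resulting piece downloads in a fraction of the time, which may keep it under whatever timeout or byte cap the server enforces.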
[jira] [Commented] (CONNECTORS-1562) Documents unreachable due to hopcount are not considered unreachable on cleanup pass
[ https://issues.apache.org/jira/browse/CONNECTORS-1562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16738232#comment-16738232 ]

Karl Wright commented on CONNECTORS-1562:
-----------------------------------------

We already discussed the IOException issue; that's because of throttling, and the connection closing is likely occurring on the server side. For the NullPointerException, there is a stack trace dumped to the ManifoldCF log. Can you find it and create a ticket with it? Thanks!
[jira] [Commented] (CONNECTORS-1562) Documents unreachable due to hopcount are not considered unreachable on cleanup pass
[ https://issues.apache.org/jira/browse/CONNECTORS-1562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16738231#comment-16738231 ]

Donald Van den Driessche commented on CONNECTORS-1562:
------------------------------------------------------

Karl,

After resolving the issues with the API creation of repository connections, I retested our crawling locally, with a Docker setup that contains a ManifoldCF and an Elasticsearch container. I used no bandwidth throttles and a max connection count of 25, on a seed map consisting of 1 page, our whitelist: [https://www.uantwerpen.be/admin/system/sitemap/sitemap.aspx?lang=nl=true]

I still get the Stream Closed I/O exception. Do you have any more ideas on how to keep the connection open, so that the whole whitelist can be processed?

Printscreen Simple Report:
!image-2019-01-09-14-20-50-616.png!

Stacktrace:
{code:java}
ERROR 2019-01-09T13:08:37,876 (Worker thread '22') - Exception tossed: Repeated service interruptions - failure processing document: Stream Closed
org.apache.manifoldcf.core.interfaces.ManifoldCFException: Repeated service interruptions - failure processing document: Stream Closed
 at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:489) [mcf-pull-agent.jar:?]
Caused by: java.io.IOException: Stream Closed
 at java.io.FileInputStream.readBytes(Native Method) ~[?:1.8.0_191]
 at java.io.FileInputStream.read(FileInputStream.java:255) ~[?:1.8.0_191]
 at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284) ~[?:1.8.0_191]
 at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326) ~[?:1.8.0_191]
 at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178) ~[?:1.8.0_191]
 at java.io.InputStreamReader.read(InputStreamReader.java:184) ~[?:1.8.0_191]
 at org.apache.manifoldcf.agents.output.elasticsearch.ElasticSearchIndex$IndexRequestEntity.writeTo(ElasticSearchIndex.java:221) ~[?:?]
 at org.apache.http.impl.execchain.RequestEntityProxy.writeTo(RequestEntityProxy.java:121) ~[httpclient-4.5.6.jar:4.5.6]
 at org.apache.http.impl.DefaultBHttpClientConnection.sendRequestEntity(DefaultBHttpClientConnection.java:156) ~[httpcore-4.4.10.jar:4.4.10]
 at org.apache.http.impl.conn.CPoolProxy.sendRequestEntity(CPoolProxy.java:160) ~[httpclient-4.5.6.jar:4.5.6]
 at org.apache.http.protocol.HttpRequestExecutor.doSendRequest(HttpRequestExecutor.java:238) ~[httpcore-4.4.10.jar:4.4.10]
 at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:123) ~[httpcore-4.4.10.jar:4.4.10]
 at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:272) ~[httpclient-4.5.6.jar:4.5.6]
 at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:185) ~[httpclient-4.5.6.jar:4.5.6]
 at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110) ~[httpclient-4.5.6.jar:4.5.6]
 at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185) ~[httpclient-4.5.6.jar:4.5.6]
 at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83) ~[httpclient-4.5.6.jar:4.5.6]
 at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:108) ~[httpclient-4.5.6.jar:4.5.6]
 at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56) ~[httpclient-4.5.6.jar:4.5.6]
 at org.apache.manifoldcf.agents.output.elasticsearch.ElasticSearchConnection$CallThread.run(ElasticSearchConnection.java:133) ~[?:?]
ERROR 2019-01-09T13:08:37,883 (Worker thread '7') - Exception tossed: Repeated service interruptions - failure processing document: Stream Closed
org.apache.manifoldcf.core.interfaces.ManifoldCFException: Repeated service interruptions - failure processing document: Stream Closed
 at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:489) [mcf-pull-agent.jar:?]
{code}
[jira] [Updated] (CONNECTORS-1562) Documents unreachable due to hopcount are not considered unreachable on cleanup pass
[ https://issues.apache.org/jira/browse/CONNECTORS-1562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Donald Van den Driessche updated CONNECTORS-1562:
-------------------------------------------------
    Attachment: image-2019-01-09-14-20-50-616.png
[jira] [Resolved] (CONNECTORS-1568) UI error imported web connection
[ https://issues.apache.org/jira/browse/CONNECTORS-1568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Karl Wright resolved CONNECTORS-1568.
-------------------------------------
    Resolution: Fixed

A minor fix has been committed that makes the UI robust against a missing truststore in the connection definition. Otherwise, it sounds like the user found an error in their process, and that resolved the issue for them.

> UI error imported web connection
> --------------------------------
>
> Key: CONNECTORS-1568
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1568
> Project: ManifoldCF
> Issue Type: Bug
> Components: Web connector
> Affects Versions: ManifoldCF 2.11, ManifoldCF 2.12
> Reporter: Tim Steenbeke
> Assignee: Karl Wright
> Priority: Major
> Fix For: ManifoldCF 2.13
>
> Using the ManifoldCF API, we export a web repository connector with basic settings.
> Then we import the web connector using the ManifoldCF API.
> The connector gets imported and can be used in a job.
> When trying to view or edit the connector in the UI, the following error pops up
> (connected to issue: [CONNECTORS-1567|https://issues.apache.org/jira/projects/CONNECTORS/issues/CONNECTORS-1567]):
> *HTTP ERROR 500*
> Problem accessing /mcf-crawler-ui/editconnection.jsp.
> Reason: Server Error
> *Caused by:*
> {code:java}
> org.apache.jasper.JasperException: An exception occurred processing JSP page /editconnection.jsp at line 564
> 561:
> 562: if (className.length() > 0)
> 563: {
> 564: RepositoryConnectorFactory.outputConfigurationBody(threadContext,className,new org.apache.manifoldcf.ui.jsp.JspWrapper(out,adminprofile),pageContext.getRequest().getLocale(),parameters,tabName);
> 565: }
> 566: %>
> 567:
> Stacktrace:
> at org.apache.jasper.servlet.JspServletWrapper.handleJspException(JspServletWrapper.java:521)
> at org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:430)
> at org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:313)
> at org.apache.jasper.servlet.JspServlet.service(JspServlet.java:260)
> at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
> at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:769)
> at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
> at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
> at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
> at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
> at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1125)
> at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
> at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
> at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1059)
> at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
> at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
> at org.eclipse.jetty.server.handler.HandlerList.handle(HandlerList.java:52)
> at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
> at org.eclipse.jetty.server.Server.handle(Server.java:497)
> at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:311)
> at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:248)
> at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
> at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:610)
> at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:539)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.NullPointerException
> at org.apache.manifoldcf.core.common.Base64.decodeString(Base64.java:164)
> at org.apache.manifoldcf.connectorcommon.keystore.KeystoreManager.<init>(KeystoreManager.java:86)
> at org.apache.manifoldcf.connectorcommon.interfaces.KeystoreManagerFactory.make(KeystoreManagerFactory.java:47)
> at org.apache.manifoldcf.crawler.connectors.webcrawler.WebcrawlerConnector.fillInCertificatesTab(WebcrawlerConnector.java:1701)
> at org.apache.manifoldcf.crawler.connectors.webcrawler.WebcrawlerConnector.outputConfigurationBody(WebcrawlerConnector.java:1866)
> at org.apache.manifoldcf.core.interfaces.ConnectorFactory.outputThisConfigurationBody(ConnectorFactory.java:83)
> at org.apache.manifoldcf.crawler.interfaces.RepositoryConnectorFactory.outputConfigurationBody(RepositoryConnectorFactory.java:155)
> {code}
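The NullPointerException above comes from the UI handing a missing (null) truststore value straight to Base64.decodeString. The defensive pattern behind the committed fix can be illustrated as follows; this is a Python sketch with a hypothetical parameter name, not ManifoldCF's actual Java code.

```python
# Sketch: treat a missing/null truststore value as "no certificates"
# instead of decoding it blindly (illustration only; the "truststore"
# key name is hypothetical).
import base64

def load_truststore(connection_params):
    """Return decoded keystore bytes, or None when no truststore is stored."""
    encoded = connection_params.get("truststore")  # may be absent or None
    if encoded is None:
        return None  # caller then renders an empty certificate list
    return base64.b64decode(encoded)
```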
[jira] [Commented] (CONNECTORS-1568) UI error imported web connection
[ https://issues.apache.org/jira/browse/CONNECTORS-1568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16738153#comment-16738153 ]

Tim Steenbeke commented on CONNECTORS-1568:
-------------------------------------------

When debugging the project we found a missing JSON object for trust; this created an issue for the UI, but the connector still worked. We have now fixed the bug, so thank you for the help.
[jira] [Commented] (CONNECTORS-1567) export of web connection bandwidth throttling
[ https://issues.apache.org/jira/browse/CONNECTORS-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16738159#comment-16738159 ]

Tim Steenbeke commented on CONNECTORS-1567:
-------------------------------------------

Same problem as CONNECTORS-1568; we found the issue after debugging and fixed it. Thank you for the help.

> export of web connection bandwidth throttling
> ---------------------------------------------
>
> Key: CONNECTORS-1567
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1567
> Project: ManifoldCF
> Issue Type: Bug
> Components: Web connector
> Affects Versions: ManifoldCF 2.11, ManifoldCF 2.12
> Reporter: Tim Steenbeke
> Assignee: Karl Wright
> Priority: Major
> Fix For: ManifoldCF 2.13
>
> Attachments: bandwidth_test_abc.png
>
> When exporting the web connector using the API, it doesn't export the bandwidth throttling.
> Then, when importing this connector into a clean ManifoldCF, it creates the connector with default bandwidth settings.
> When using the connector in a job, it works properly.
> The issue here is that the connector isn't created with the correct bandwidth throttling,
> and the connector causes errors in the UI when trying to view or edit it
> (related to issue: [CONNECTORS-1568|https://issues.apache.org/jira/projects/CONNECTORS/issues/CONNECTORS-1568]).
> e.g.:
> {code:java}
> {
>   "name": "test_web",
>   "configuration": {
>     "_PARAMETER_": [
>       { "_attribute_name": "Email address", "_value_": "tim.steenbeke@formica.digital" },
>       { "_attribute_name": "Robots usage", "_value_": "all" },
>       { "_attribute_name": "Meta robots tags usage", "_value_": "all" },
>       { "_attribute_name": "Proxy host", "_value_": "" },
>       { "_attribute_name": "Proxy port", "_value_": "" },
>       { "_attribute_name": "Proxy authentication domain", "_value_": "" },
>       { "_attribute_name": "Proxy authentication user name", "_value_": "" },
>       { "_attribute_name": "Proxy authentication password", "_value_": "" }
>     ]
>   },
>   "description": "Website repository standard settup",
>   "throttle": null,
>   "max_connections": 10,
>   "class_name": "org.apache.manifoldcf.crawler.connectors.webcrawler.WebcrawlerConnector",
>   "acl_authority": null
> }
> {code}
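The reported symptom, a "throttle": null field in the export, can be caught before re-importing into a clean instance. A minimal sketch of such a pre-import sanity check follows; the field names come from the export example above, while the helper itself is hypothetical.

```python
# Sketch: sanity-check an exported repository-connection JSON before
# re-import, flagging fields that exported as null and need manual review.
# The default field list is an assumption based on the bug report above.
import json

def check_export(raw_json, expected=("throttle",)):
    """Return the names of expected fields that came out as null/absent."""
    doc = json.loads(raw_json)
    return [field for field in expected if doc.get(field) is None]
```

Running this over the export above would flag "throttle", signalling that the throttling settings were dropped and must be re-entered by hand after import.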
[jira] [Commented] (CONNECTORS-1567) export of web connection bandwidth throttling
[ https://issues.apache.org/jira/browse/CONNECTORS-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16738090#comment-16738090 ]

Tim Steenbeke commented on CONNECTORS-1567:
-------------------------------------------

Setting _"binddesc"_ to null doesn't seem to help.

The process you describe is how we do it:
# Make the connector in the UI
# Test the connector
# Extract the connector
# Clean ManifoldCF
# Import the connector
# Test the connector

So the output should be in the new format, because we use 2.11.
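The extract/import steps above can be sketched against the ManifoldCF JSON API. This is an illustration only: the base URL, the repositoryconnections endpoint path, and the payload shape are assumptions for a default single-process deployment, so adjust them to your installation.

```python
# Sketch of the export / re-import round trip via the ManifoldCF JSON API.
# Base URL and endpoint layout are assumed defaults, not verified here.
import json
import urllib.parse
import urllib.request

BASE = "http://localhost:8345/mcf-api-service/json"  # assumed default

def connection_url(name):
    # URL for one repository connection (assumed endpoint layout).
    return "%s/repositoryconnections/%s" % (BASE, urllib.parse.quote(name))

def export_connection(name):
    # Step "extract the connector": GET the connection definition as JSON.
    with urllib.request.urlopen(connection_url(name)) as resp:
        return json.load(resp)

def import_connection(name, doc):
    # Step "import the connector": PUT the same JSON into a clean instance.
    req = urllib.request.Request(
        connection_url(name),
        data=json.dumps(doc).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

Diffing the JSON returned by export_connection before and after the round trip is a quick way to spot fields (like the throttling settings) that did not survive.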
[jira] [Updated] (CONNECTORS-1567) export of web connection bandwidth throttling
[ https://issues.apache.org/jira/browse/CONNECTORS-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Steenbeke updated CONNECTORS-1567:
--------------------------------------
    Attachment: (was: bandwidth.png)
[jira] [Commented] (CONNECTORS-1567) export of web connection bandwidth throttling
[ https://issues.apache.org/jira/browse/CONNECTORS-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16737950#comment-16737950 ]

Karl Wright commented on CONNECTORS-1567:
-----------------------------------------

The best way to construct an API request for any connection or job is to create it in the UI and then export it. The documentation is correct, but it is hard to pick through all the details, and the UI is easier. So that is what I would do if I were trying to verify that everything worked.

Unfortunately, because ManifoldCF was forced to remove a JSON jar we depended on due to a legal ruling by the Board, we had to retrofit a different (and not as good) JSON jar in its place a few years back, and that had all sorts of downstream effects on the API JSON format. We did not need to change the specification, but we did need to change how we output certain constructs to JSON, to avoid the syntactic sugar we could use earlier. I fixed a bug in this area in either MCF 2.10 or 2.11, so anything output before that time might not have reimported faithfully.

Hope that helps with the explanation.