[ https://issues.apache.org/jira/browse/CONNECTORS-1562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16731315#comment-16731315 ]
Tim Steenbeke commented on CONNECTORS-1562:
-------------------------------------------

Is this the error?

{code:java}
WARN 2018-12-31T08:24:46,453 (Worker thread '32') - Service interruption reported for job 1546241012417 connection 'repo_website-en': IO exception: Stream Closed
WARN 2018-12-31T08:28:52,471 (Worker thread '6') - Service interruption reported for job 1546241012417 connection 'repo_website-en': IO exception: Stream Closed
WARN 2018-12-31T08:32:10,699 (Worker thread '13') - Service interruption reported for job 1546241012417 connection 'repo_website-en': IO exception: Stream Closed
ERROR 2018-12-31T08:32:10,750 (Worker thread '13') - Exception tossed: Repeated service interruptions - failure processing document: Stream Closed
org.apache.manifoldcf.core.interfaces.ManifoldCFException: Repeated service interruptions - failure processing document: Stream Closed
    at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:489) [mcf-pull-agent.jar:?]
Caused by: java.io.IOException: Stream Closed
    at java.io.FileInputStream.readBytes(Native Method) ~[?:1.8.0_191]
    at java.io.FileInputStream.read(FileInputStream.java:255) ~[?:1.8.0_191]
    at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284) ~[?:1.8.0_191]
    at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326) ~[?:1.8.0_191]
    at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178) ~[?:1.8.0_191]
    at java.io.InputStreamReader.read(InputStreamReader.java:184) ~[?:1.8.0_191]
    at org.apache.manifoldcf.agents.output.elasticsearch.ElasticSearchIndex$IndexRequestEntity.writeTo(ElasticSearchIndex.java:221) ~[?:?]
    at org.apache.http.impl.execchain.RequestEntityProxy.writeTo(RequestEntityProxy.java:121) ~[httpclient-4.5.6.jar:4.5.6]
    at org.apache.http.impl.DefaultBHttpClientConnection.sendRequestEntity(DefaultBHttpClientConnection.java:156) ~[httpcore-4.4.10.jar:4.4.10]
    at org.apache.http.impl.conn.CPoolProxy.sendRequestEntity(CPoolProxy.java:160) ~[httpclient-4.5.6.jar:4.5.6]
    at org.apache.http.protocol.HttpRequestExecutor.doSendRequest(HttpRequestExecutor.java:238) ~[httpcore-4.4.10.jar:4.4.10]
    at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:123) ~[httpcore-4.4.10.jar:4.4.10]
    at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:272) ~[httpclient-4.5.6.jar:4.5.6]
    at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:185) ~[httpclient-4.5.6.jar:4.5.6]
    at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110) ~[httpclient-4.5.6.jar:4.5.6]
    at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185) ~[httpclient-4.5.6.jar:4.5.6]
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83) ~[httpclient-4.5.6.jar:4.5.6]
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:108) ~[httpclient-4.5.6.jar:4.5.6]
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56) ~[httpclient-4.5.6.jar:4.5.6]
    at org.apache.manifoldcf.agents.output.elasticsearch.ElasticSearchConnection$CallThread.run(ElasticSearchConnection.java:133) ~[?:?]
WARN 2018-12-31T08:33:35,958 (Job notification thread) - ES: Commit failed: {"error":"Incorrect HTTP method for uri [/website-en/_optimize] and method [GET], allowed: [POST]","status":405}
WARN 2018-12-31T08:34:46,024 (Job notification thread) - ES: Commit failed: {"error":"Incorrect HTTP method for uri [/pintra/_optimize] and method [GET], allowed: [POST]","status":405}
{code}

The timestamps are off by 1h; it's running in a Docker container that currently has a different timezone.

> Documents unreachable due to hopcount are not considered unreachable on cleanup pass
> ------------------------------------------------------------------------------------
>
> Key: CONNECTORS-1562
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1562
> Project: ManifoldCF
> Issue Type: Bug
> Components: Elastic Search connector, Web connector
> Affects Versions: ManifoldCF 2.11
> Environment: ManifoldCF 2.11
>   Elasticsearch 6.3.2
>   Web input connector
>   Elastic output connector
>   Job crawls website input and outputs content to Elastic
> Reporter: Tim Steenbeke
> Assignee: Karl Wright
> Priority: Critical
> Labels: starter
> Fix For: ManifoldCF 2.12
>
> Attachments: Screenshot from 2018-12-31 11-17-29.png, manifoldcf.log.cleanup, manifoldcf.log.init, manifoldcf.log.reduced
>
> Original Estimate: 4h
> Remaining Estimate: 4h
>
> My documents aren't removed from the ElasticSearch index after rerunning the changed seeds.
> I update my job to change the seedmap and rerun it, or use the scheduler to keep it running even after updating it.
> After the rerun the unreachable documents don't get deleted.
> It only adds documents when they can be reached.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)