[ 
https://issues.apache.org/jira/browse/CONNECTORS-1562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16731315#comment-16731315
 ] 

Tim Steenbeke commented on CONNECTORS-1562:
-------------------------------------------

Is this the error?
{code:java}
 WARN 2018-12-31T08:24:46,453 (Worker thread '32') - Service interruption reported for job 1546241012417 connection 'repo_website-en': IO exception: Stream Closed
 WARN 2018-12-31T08:28:52,471 (Worker thread '6') - Service interruption reported for job 1546241012417 connection 'repo_website-en': IO exception: Stream Closed
 WARN 2018-12-31T08:32:10,699 (Worker thread '13') - Service interruption reported for job 1546241012417 connection 'repo_website-en': IO exception: Stream Closed
ERROR 2018-12-31T08:32:10,750 (Worker thread '13') - Exception tossed: Repeated service interruptions - failure processing document: Stream Closed
org.apache.manifoldcf.core.interfaces.ManifoldCFException: Repeated service interruptions - failure processing document: Stream Closed
        at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:489) [mcf-pull-agent.jar:?]
Caused by: java.io.IOException: Stream Closed
        at java.io.FileInputStream.readBytes(Native Method) ~[?:1.8.0_191]
        at java.io.FileInputStream.read(FileInputStream.java:255) ~[?:1.8.0_191]
        at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284) ~[?:1.8.0_191]
        at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326) ~[?:1.8.0_191]
        at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178) ~[?:1.8.0_191]
        at java.io.InputStreamReader.read(InputStreamReader.java:184) ~[?:1.8.0_191]
        at org.apache.manifoldcf.agents.output.elasticsearch.ElasticSearchIndex$IndexRequestEntity.writeTo(ElasticSearchIndex.java:221) ~[?:?]
        at org.apache.http.impl.execchain.RequestEntityProxy.writeTo(RequestEntityProxy.java:121) ~[httpclient-4.5.6.jar:4.5.6]
        at org.apache.http.impl.DefaultBHttpClientConnection.sendRequestEntity(DefaultBHttpClientConnection.java:156) ~[httpcore-4.4.10.jar:4.4.10]
        at org.apache.http.impl.conn.CPoolProxy.sendRequestEntity(CPoolProxy.java:160) ~[httpclient-4.5.6.jar:4.5.6]
        at org.apache.http.protocol.HttpRequestExecutor.doSendRequest(HttpRequestExecutor.java:238) ~[httpcore-4.4.10.jar:4.4.10]
        at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:123) ~[httpcore-4.4.10.jar:4.4.10]
        at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:272) ~[httpclient-4.5.6.jar:4.5.6]
        at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:185) ~[httpclient-4.5.6.jar:4.5.6]
        at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110) ~[httpclient-4.5.6.jar:4.5.6]
        at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185) ~[httpclient-4.5.6.jar:4.5.6]
        at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83) ~[httpclient-4.5.6.jar:4.5.6]
        at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:108) ~[httpclient-4.5.6.jar:4.5.6]
        at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56) ~[httpclient-4.5.6.jar:4.5.6]
        at org.apache.manifoldcf.agents.output.elasticsearch.ElasticSearchConnection$CallThread.run(ElasticSearchConnection.java:133) ~[?:?]
 WARN 2018-12-31T08:33:35,958 (Job notification thread) - ES: Commit failed: {"error":"Incorrect HTTP method for uri [/website-en/_optimize] and method [GET], allowed: [POST]","status":405}
 WARN 2018-12-31T08:34:46,024 (Job notification thread) - ES: Commit failed: {"error":"Incorrect HTTP method for uri [/pintra/_optimize] and method [GET], allowed: [POST]","status":405}
{code}
The timestamps are off by one hour because it's running in a Docker container 
that is currently set to a different timezone.
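
For reference, a minimal sketch of how a "Stream Closed" IOException like the 
one in the trace can arise in plain Java: a FileInputStream is closed while a 
Reader wrapping it is still being read. This only illustrates the JDK 
behaviour; it is not taken from the ManifoldCF source, and the class and 
method names (StreamClosedDemo, readAfterClose) are made up for the example. 
It does not show which code path in the connector actually closes the stream.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

public class StreamClosedDemo {
    // Hypothetical helper: returns the exception message produced when a
    // Reader is read after its underlying FileInputStream was closed.
    static String readAfterClose() throws IOException {
        File tmp = File.createTempFile("stream-closed-demo", ".txt");
        try {
            Files.write(tmp.toPath(), "hello".getBytes(StandardCharsets.UTF_8));
            FileInputStream in = new FileInputStream(tmp);
            Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8);
            // Close the underlying stream directly; the reader itself is
            // still open and will try to pull bytes from the closed stream.
            in.close();
            try {
                reader.read(new char[16]);
                return null; // not expected: the read should fail
            } catch (IOException e) {
                return e.getMessage();
            }
        } finally {
            tmp.delete();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(readAfterClose());
    }
}
```

The failing read goes through InputStreamReader and StreamDecoder into 
FileInputStream, the same frames that appear under "Caused by" in the log.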

> Documents unreachable due to hopcount are not considered unreachable on 
> cleanup pass
> ------------------------------------------------------------------------------------
>
>                 Key: CONNECTORS-1562
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-1562
>             Project: ManifoldCF
>          Issue Type: Bug
>          Components: Elastic Search connector, Web connector
>    Affects Versions: ManifoldCF 2.11
>         Environment: ManifoldCF 2.11
> Elasticsearch 6.3.2
> Web input connector
> Elasticsearch output connector
> Job crawls website input and outputs content to Elasticsearch
>            Reporter: Tim Steenbeke
>            Assignee: Karl Wright
>            Priority: Critical
>              Labels: starter
>             Fix For: ManifoldCF 2.12
>
>         Attachments: Screenshot from 2018-12-31 11-17-29.png, 
> manifoldcf.log.cleanup, manifoldcf.log.init, manifoldcf.log.reduced
>
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> My documents aren't removed from the Elasticsearch index after rerunning the 
> job with changed seeds.
> I update my job to change the seed map and rerun it, or use the scheduler to 
> keep it running even after updating it.
> After the rerun, the unreachable documents don't get deleted.
> It only adds documents when they can be reached.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
