[ https://issues.apache.org/jira/browse/CONNECTORS-897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13906775#comment-13906775 ]

Karl Wright commented on CONNECTORS-897:
----------------------------------------

All I can tell from this is that the error comes from HttpClient while it is
dealing with chunked I/O.  HttpClient is called by the SolrJ library, and SolrJ
is called by the Solr output connector, so the failure is pretty far down the
chain of what is going on.
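To make the "missing CR" message concrete: in HTTP/1.1 chunked transfer coding (RFC 7230, section 4.1), each chunk is a hexadecimal size line ending in CRLF, followed by exactly that many data bytes and then another CRLF. A zero-size chunk ends the body. The innermost frames of the trace are in the JDK's own sun.net.www.http.ChunkedInputStream, which raises "missing CR" when the byte after a chunk's data is not CR. The class below, ChunkedDecoder, is a hypothetical, simplified sketch. It is not the JDK's or HttpClient's code: it applies no size limits and discards trailers. It reuses the same message text so the failure mode is easy to recognize.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical, simplified decoder for HTTP/1.1 chunked transfer coding.
public final class ChunkedDecoder {

    private ChunkedDecoder() {
    }

    // Reads one CRLF-terminated line and returns it without the terminator.
    private static String readLine(InputStream in) throws IOException {
        StringBuilder line = new StringBuilder();
        int b;
        while ((b = in.read()) != -1) {
            if (b == '\r') {
                if (in.read() != '\n') {
                    throw new IOException("missing LF");
                }
                return line.toString();
            }
            line.append((char) b);
        }
        throw new IOException("unexpected end of stream");
    }

    // Every chunk's data must be followed by CRLF; anything else is a framing error.
    private static void expectCrlf(InputStream in) throws IOException {
        if (in.read() != '\r') {
            throw new IOException("missing CR");
        }
        if (in.read() != '\n') {
            throw new IOException("missing LF");
        }
    }

    // Decodes a complete chunked body and returns the reassembled payload.
    public static byte[] decode(InputStream in) throws IOException {
        ByteArrayOutputStream payload = new ByteArrayOutputStream();
        while (true) {
            String sizeLine = readLine(in);
            // Drop any chunk extension (";name=value") before parsing the hex size.
            int semi = sizeLine.indexOf(';');
            String hex = (semi >= 0 ? sizeLine.substring(0, semi) : sizeLine).trim();
            int size;
            try {
                size = Integer.parseInt(hex, 16);
            } catch (NumberFormatException e) {
                throw new IOException("bad chunk size: " + hex);
            }
            if (size < 0) {
                throw new IOException("bad chunk size: " + hex);
            }
            if (size == 0) {
                // Last chunk: skip trailer fields up to the terminating empty line.
                String trailer;
                do {
                    trailer = readLine(in);
                } while (!trailer.isEmpty());
                return payload.toByteArray();
            }
            byte[] chunk = new byte[size];
            int off = 0;
            while (off < size) {
                int n = in.read(chunk, off, size - off);
                if (n == -1) {
                    throw new IOException("unexpected end of stream");
                }
                off += n;
            }
            payload.write(chunk, 0, size);
            expectCrlf(in);
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] good = "4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n".getBytes(StandardCharsets.US_ASCII);
        System.out.println(new String(decode(new ByteArrayInputStream(good)), StandardCharsets.US_ASCII));

        // The header promises 4 bytes but 5 follow, so the byte after the chunk is 'a', not CR.
        byte[] bad = "4\r\nWikia\r\n0\r\n\r\n".getBytes(StandardCharsets.US_ASCII);
        try {
            decode(new ByteArrayInputStream(bad));
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Running main prints "Wikipedia" and then "missing CR". Any stream whose framing disagrees with its declared chunk sizes fails this way. That includes a truncated or padded chunk, or a body that is not chunked at all.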

My suspicion, since nobody else has reported this, is that your Solr instance
(or the app server it runs under) is configured to reject posts that are too
large, and that it answers them with some bit of HTML that HttpClient, of
course, does not recognize as a valid response to a chunked request.  (I believe
this is actually the default configuration for later versions of Solr.)  But
the only way to really debug it is to turn on HttpClient wire debugging and
crawl one of the affected documents.  You do this by editing logging.ini and
adding lines for the HttpClient loggers.  Googling "HttpClient wire debugging"
should turn up the documentation.
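Here is a sketch of the logging.ini lines described above. It assumes that ManifoldCF 1.4.1's logging.ini uses log4j 1.x properties syntax and that HttpClient's commons-logging output is routed to log4j. org.apache.http.headers and org.apache.http.wire are the logger names that HttpClient 4.x uses for header and wire logging. The innermost frames of the trace, however, belong to the JDK's own HttpURLConnection (sun.net.www...). These loggers would not capture that side of the exchange.

```properties
# Assumed additions to logging.ini (log4j 1.x properties syntax).
# Request and response headers exchanged with Solr:
log4j.logger.org.apache.http.headers=DEBUG
# Every byte on the wire (very verbose; enable only while crawling one affected document):
log4j.logger.org.apache.http.wire=DEBUG
```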

If it is not obvious from the logs what is going on, please let us know.

> IO exception during indexing: missing CR
> ----------------------------------------
>
>                 Key: CONNECTORS-897
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-897
>             Project: ManifoldCF
>          Issue Type: Bug
>          Components: CMIS connector
>    Affects Versions: ManifoldCF 1.4.1
>         Environment: Windows 7 with bundled mcf 1.4.1, solr 4.6
>            Reporter: lalit
>
> Hi,
> I downloaded the MCF (ManifoldCF) 1.4.1 source from the tag site and built it
> as per the instructions. Now I am using the CMIS connector to connect to an
> Alfresco repository, with Solr as the output channel.
> When I crawl the Alfresco repository for indexing into Solr, I get this error
> in the MCF logs whenever MCF crawls any media, such as an image or a video. I
> have also added adm4.1.jar and xmpcore.jar to ..\contrib\extraction\lib.
> ERROR 2014-02-20 12:50:45,251 (Worker thread '3') - Exception tossed: Repeated service interruptions - failure processing document: missing CR
> org.apache.manifoldcf.core.interfaces.ManifoldCFException: Repeated service interruptions - failure processing document: missing CR
>       at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:586)
> Caused by: java.io.IOException: missing CR
>       at sun.net.www.http.ChunkedInputStream.processRaw(Unknown Source)
>       at sun.net.www.http.ChunkedInputStream.readAheadBlocking(Unknown Source)
>       at sun.net.www.http.ChunkedInputStream.readAhead(Unknown Source)
>       at sun.net.www.http.ChunkedInputStream.read(Unknown Source)
>       at java.io.FilterInputStream.read(Unknown Source)
>       at sun.net.www.protocol.http.HttpURLConnection$HttpInputStream.read(Unknown Source)
>       at sun.net.www.protocol.http.HttpURLConnection$HttpInputStream.read(Unknown Source)
>       at org.apache.http.entity.mime.content.InputStreamBody.writeTo(InputStreamBody.java:69)
>       at org.apache.manifoldcf.agents.output.solr.ModifiedHttpMultipart.doWriteTo(ModifiedHttpMultipart.java:211)
>       at org.apache.manifoldcf.agents.output.solr.ModifiedHttpMultipart.writeTo(ModifiedHttpMultipart.java:229)
>       at org.apache.manifoldcf.agents.output.solr.ModifiedMultipartEntity.writeTo(ModifiedMultipartEntity.java:186)
>       at org.apache.http.entity.HttpEntityWrapper.writeTo(HttpEntityWrapper.java:98)
>       at org.apache.http.impl.client.EntityEnclosingRequestWrapper$EntityWrapper.writeTo(EntityEnclosingRequestWrapper.java:108)
>       at org.apache.http.impl.entity.EntitySerializer.serialize(EntitySerializer.java:122)
>       at org.apache.http.impl.AbstractHttpClientConnection.sendRequestEntity(AbstractHttpClientConnection.java:271)
>       at org.apache.http.impl.conn.ManagedClientConnectionImpl.sendRequestEntity(ManagedClientConnectionImpl.java:197)
>       at org.apache.http.protocol.HttpRequestExecutor.doSendRequest(HttpRequestExecutor.java:257)
>       at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125)
>       at org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:715)
>       at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:520)
>       at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906)
>       at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805)
>       at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:784)
>       at org.apache.manifoldcf.agents.output.solr.ModifiedHttpSolrServer.request(ModifiedHttpSolrServer.java:291)
>       at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:180)
>       at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117)
>       at org.apache.manifoldcf.agents.output.solr.HttpPoster$IngestThread.run(HttpPoster.java:919)
> Regards,
> Lalit.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)