Hi Ahmet,

The exception that seems to be causing the abort is a socket exception
coming from a socket write:

> Caused by: java.net.SocketException: Broken pipe

This makes sense in light of the HTTP status code returned from Solr, which
was 413 (Request Entity Too Large): http://www.checkupdown.com/status/E413.html .

So there is nothing actually *wrong* with the .aspx documents, but
they are just way too big, and Solr is rejecting them for that reason.

Clearly, though, the Solr connector should recognize this code as
meaning "never retry", so instead of killing the job, it should just
skip the document.  I'll open a ticket for that now.
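
As a rough sketch of that idea (this is not the actual connector code: the
class, enum, and method names below are made up for illustration, and
treating every other non-2xx status as retryable is an assumption, not a
decision):

```java
// Hypothetical sketch: classify an HTTP status returned by Solr into
// "accept", "skip this document", or "retry later".
// None of these names come from ManifoldCF or SolrJ.
class SolrStatusPolicy {

    /** Illustrative outcome categories. */
    enum Action { ACCEPT, SKIP_DOCUMENT, RETRY_LATER }

    static Action classify(int httpStatus) {
        if (httpStatus >= 200 && httpStatus < 300) {
            return Action.ACCEPT;
        }
        if (httpStatus == 413) {
            // Resending the same oversized request cannot succeed,
            // so the document is skipped instead of being retried.
            return Action.SKIP_DOCUMENT;
        }
        // Assumption for this sketch only: everything else is treated
        // as a transient service interruption.
        return Action.RETRY_LATER;
    }

    public static void main(String[] args) {
        for (int code : new int[] {200, 413, 503}) {
            System.out.println(code + " -> " + classify(code));
        }
    }
}
```

The point is only that 413 gets its own branch, so that one rejected
document no longer aborts the whole job.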

Karl


On Mon, Jan 14, 2013 at 8:22 AM, Ahmet Arslan <iori...@yahoo.com> wrote:
> Hello,
>
> I am indexing a SharePoint 2010 instance using mcf-trunk (At revision 1432907)
>
> There is no problem with a Document library that contains Word, Excel, etc. files.
>
> However, I receive the following errors with a Document library that has 
> *.aspx files in it.
>
> Status of Jobs => Error: Repeated service interruptions - failure processing document: null
>
> WARN 2013-01-14 15:00:12,720 (Worker thread '13') - Service interruption reported for job 1358009105156 connection 'iknow': IO exception during indexing: null
> ERROR 2013-01-14 15:00:12,763 (Worker thread '13') - Exception tossed: Repeated service interruptions - failure processing document: null
> org.apache.manifoldcf.core.interfaces.ManifoldCFException: Repeated service interruptions - failure processing document: null
>         at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:585)
> Caused by: org.apache.http.client.ClientProtocolException
>         at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:909)
>         at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805)
>         at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:784)
>         at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:352)
>         at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
>         at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117)
>         at org.apache.manifoldcf.agents.output.solr.HttpPoster$IngestThread.run(HttpPoster.java:768)
> Caused by: org.apache.http.client.NonRepeatableRequestException: Cannot retry request with a non-repeatable request entity.  The cause lists the reason the original request failed.
>         at org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:692)
>         at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:523)
>         at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906)
>         ... 6 more
> Caused by: java.net.SocketException: Broken pipe
>         at java.net.SocketOutputStream.socketWrite0(Native Method)
>         at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
>         at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
>         at org.apache.http.impl.io.AbstractSessionOutputBuffer.write(AbstractSessionOutputBuffer.java:169)
>         at org.apache.http.impl.io.ChunkedOutputStream.flushCacheWithAppend(ChunkedOutputStream.java:110)
>         at org.apache.http.impl.io.ChunkedOutputStream.write(ChunkedOutputStream.java:165)
>         at org.apache.http.entity.InputStreamEntity.writeTo(InputStreamEntity.java:92)
>         at org.apache.http.entity.HttpEntityWrapper.writeTo(HttpEntityWrapper.java:98)
>         at org.apache.http.impl.client.EntityEnclosingRequestWrapper$EntityWrapper.writeTo(EntityEnclosingRequestWrapper.java:108)
>         at org.apache.http.impl.entity.EntitySerializer.serialize(EntitySerializer.java:122)
>         at org.apache.http.impl.AbstractHttpClientConnection.sendRequestEntity(AbstractHttpClientConnection.java:271)
>         at org.apache.http.impl.conn.ManagedClientConnectionImpl.sendRequestEntity(ManagedClientConnectionImpl.java:197)
>         at org.apache.http.protocol.HttpRequestExecutor.doSendRequest(HttpRequestExecutor.java:257)
>         at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125)
>         at org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:718)
>         ... 8 more
>
> Status of Jobs => Error: Unhandled Solr exception during indexing (0): Server at http://localhost:8983/solr/all returned non ok status:413, message:FULL head
>
> ERROR 2013-01-14 15:10:42,074 (Worker thread '15') - Exception tossed: Unhandled Solr exception during indexing (0): Server at http://localhost:8983/solr/all returned non ok status:413, message:FULL head
> org.apache.manifoldcf.core.interfaces.ManifoldCFException: Unhandled Solr exception during indexing (0): Server at http://localhost:8983/solr/all returned non ok status:413, message:FULL head
>         at org.apache.manifoldcf.agents.output.solr.HttpPoster.handleSolrException(HttpPoster.java:360)
>         at org.apache.manifoldcf.agents.output.solr.HttpPoster.indexPost(HttpPoster.java:477)
>         at org.apache.manifoldcf.agents.output.solr.SolrConnector.addOrReplaceDocument(SolrConnector.java:594)
>         at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.addOrReplaceDocument(IncrementalIngester.java:1579)
>         at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.performIngestion(IncrementalIngester.java:504)
>         at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentIngest(IncrementalIngester.java:370)
>         at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocument(WorkerThread.java:1652)
>         at org.apache.manifoldcf.crawler.connectors.sharepoint.SharePointRepository.processDocuments(SharePointRepository.java:1559)
>         at org.apache.manifoldcf.crawler.connectors.BaseRepositoryConnector.processDocuments(BaseRepositoryConnector.java:423)
>         at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:551)
>
> On the Solr side I see:
>
> INFO: Creating new http client, config:maxConnections=200&maxConnectionsPerHost=8
> 2013-01-14 15:18:21.775:WARN:oejh.HttpParser:Full [671412972,-1,m=5,g=6144,p=6144,c=6144]={2F736F6C722F616 ...long long chars ... 2B656B6970{}
>
> Thanks,
> Ahmet
