[ https://issues.apache.org/jira/browse/CONNECTORS-576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13509893#comment-13509893 ]

David Morana commented on CONNECTORS-576:
-----------------------------------------

I agree. I checked the Simple History report and saw only OK and 200 results, no errors.
Would there be any value in having ManifoldCF work around these documents?
For example, it could finish everything else and then go back and retry these later.
Or it could collect these documents in a report: documents that were crawled successfully but
that Solr couldn't accept.
Let me know.
Thanks,
BTW, I really like ManifoldCF. I looked at a lot of other crawlers, and this is the
only one that does what I need it to do.
Thanks again,


                
> Manifold gets repeated service interruptions and stops
> ------------------------------------------------------
>
>                 Key: CONNECTORS-576
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-576
>             Project: ManifoldCF
>          Issue Type: Bug
>    Affects Versions: ManifoldCF next
>         Environment: solr 4.0 manifoldcf v1.1-dev on windows 7
>            Reporter: David Morana
>             Fix For: ManifoldCF 1.1
>
>
> Manifold gets repeated service interruptions and stops.
> Is there a way to get more detailed error information, such as the name, URL,
> or location of the document it's having a problem with?
> In v.5.1 these errors would appear at the very end (the last 130 to 184
> documents), and then the crawl would stop.
> The Solr logs always reported vague Tika errors, so I'm unsure where the
> problems lie.
> Here's the ManifoldCF log:
>  WARN 2012-12-04 10:27:40,722 (Worker thread '0') - Service interruption reported for job 1343845636068 connection 'LISA-DEV': Error 500 from ingestion request; ingestion will be retried again later
> ERROR 2012-12-04 10:27:40,754 (Worker thread '0') - Exception tossed: Repeated service interruptions - failure processing document: Ingestion HTTP error code 500
> org.apache.manifoldcf.core.interfaces.ManifoldCFException: Repeated service interruptions - failure processing document: Ingestion HTTP error code 500
>       at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:585)
> Caused by: org.apache.manifoldcf.core.interfaces.ManifoldCFException: Ingestion HTTP error code 500
>       at org.apache.manifoldcf.agents.output.solr.HttpPoster$IngestThread.run(HttpPoster.java:1386)
>  WARN 2012-12-04 10:27:40,847 (Worker thread '24') - Service interruption reported for job 1343845636068 connection 'LISA-DEV': Job no longer active
> And here's the Solr log, if it helps:
> org.apache.solr.common.SolrException: org.apache.tika.exception.TikaException: XML parse error
>       at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:215)
>       at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
>       at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
>       at org.apache.solr.core.SolrCore.execute(SolrCore.java:1561)
>       at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:442)
>       at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:263)
>       at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1337)
>       at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:484)
>       at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:119)
>       at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:524)
>       at ...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
