[ 
https://issues.apache.org/jira/browse/CONNECTORS-576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13509907#comment-13509907
 ] 

David Morana commented on CONNECTORS-576:
-----------------------------------------

All I see in the log are vague errors.
Is there any way to see exactly which docs are not getting into Solr?
I can't seem to find that "ignore Tika errors" setting. If you happen to find it, 
send it my way please...
I'm almost certain that the Tika error is a problem with PPT docs.
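
As a rough aid for narrowing this down, here is a minimal sketch that tallies the "Service interruption" warnings in a ManifoldCF log by job, connection, and reason. It assumes each entry sits on a single line in the actual log file, in the same layout as the lines quoted in the issue below; the function name summarize_interruptions and the file name in the usage note are made up for illustration. It cannot report document names, because the quoted log lines do not contain them.

```python
import re
from collections import Counter

# Start of a log entry, in the layout seen in the quoted ManifoldCF log, e.g.
# " WARN 2012-12-04 10:27:40,722 (Worker thread '0') - Service interruption ..."
ENTRY = re.compile(
    r"^\s*(?P<level>WARN|ERROR)\s+"
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+"
    r"\((?P<thread>[^)]*)\)\s+-\s+(?P<msg>.*)$"
)

# The service-interruption message, with job id, connection name, and reason.
INTERRUPTION = re.compile(
    r"Service interruption reported for job (?P<job>\d+) "
    r"connection '(?P<conn>[^']*)': (?P<reason>.*)"
)


def summarize_interruptions(lines):
    """Count interruptions per (job, connection, reason). Hypothetical helper."""
    counts = Counter()
    for line in lines:
        entry = ENTRY.match(line)
        if not entry:
            continue  # stack-trace lines and anything else are skipped
        hit = INTERRUPTION.search(entry.group("msg"))
        if hit:
            key = (hit.group("job"), hit.group("conn"), hit.group("reason"))
            counts[key] += 1
    return counts
```

To use it, pass the open log file (for example open("manifoldcf.log"), a placeholder path) to summarize_interruptions and print the resulting counts.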
                
> Manifold gets repeated service interruptions and stops
> ------------------------------------------------------
>
>                 Key: CONNECTORS-576
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-576
>             Project: ManifoldCF
>          Issue Type: Bug
>    Affects Versions: ManifoldCF next
>         Environment: solr 4.0 manifoldcf v1.1-dev on windows 7
>            Reporter: David Morana
>             Fix For: ManifoldCF 1.1
>
>
> Manifold gets repeated service interruptions and stops.
> Is there a way to get more detailed error information,
> such as the document name/URL/location that it's having a problem with?
> In v.5.1 these errors would appear at the very end (the last 130 to 184 
> documents) and then it would stop.
> The Solr logs always reported vague Tika errors.
> I'm unsure where the problems lie.
> Here's the ManifoldCF log:
>  WARN 2012-12-04 10:27:40,722 (Worker thread '0') - Service interruption reported for job 1343845636068 connection 'LISA-DEV': Error 500 from ingestion request; ingestion will be retried again later
> ERROR 2012-12-04 10:27:40,754 (Worker thread '0') - Exception tossed: Repeated service interruptions - failure processing document: Ingestion HTTP error code 500
> org.apache.manifoldcf.core.interfaces.ManifoldCFException: Repeated service interruptions - failure processing document: Ingestion HTTP error code 500
>       at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:585)
> Caused by: org.apache.manifoldcf.core.interfaces.ManifoldCFException: Ingestion HTTP error code 500
>       at org.apache.manifoldcf.agents.output.solr.HttpPoster$IngestThread.run(HttpPoster.java:1386)
>  WARN 2012-12-04 10:27:40,847 (Worker thread '24') - Service interruption reported for job 1343845636068 connection 'LISA-DEV': Job no longer active
> And here's the solr log if it helps:
> org.apache.solr.common.SolrException: org.apache.tika.exception.TikaException: XML parse error
>  at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:215)
>  at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
>  at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
>  at org.apache.solr.core.SolrCore.execute(SolrCore.java:1561)
>  at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:442)
>  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:263)
>  at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1337)
>  at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:484)
>  at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:119)
>  at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:524)
>  at 

