[ https://issues.apache.org/jira/browse/CONNECTORS-576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13509907#comment-13509907 ]
David Morana commented on CONNECTORS-576:
-----------------------------------------
All I see in the log are vague errors.
Is there any way to see exactly which docs are not getting into Solr?
I can't seem to find that "ignore Tika errors" setting. If you happen to find it,
please send it my way...
I'm almost certain that the Tika error is a problem with PPT docs.
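
For narrowing this down from the ManifoldCF side, a rough sketch of a log filter might help. It only tallies the "Service interruption" warnings per job, connection, and reason; it cannot name the failing document, because the log lines quoted below do not include one. The regular expression is inferred solely from those quoted lines and assumes each entry sits on a single line in the real log file (the wrapping here comes from the email). The function name `summarize_interruptions` is made up for illustration.

```python
import re
from collections import Counter

# Pattern inferred from the log excerpt in this issue (an assumption, not an
# official ManifoldCF log format specification).
INTERRUPTION = re.compile(
    r"^(?P<level>WARN|ERROR)\s+"
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+"
    r"\((?P<thread>[^)]*)\)\s+-\s+"
    r"Service interruption reported for job (?P<job>\d+) "
    r"connection '(?P<conn>[^']*)':\s*(?P<reason>.*)$"
)


def summarize_interruptions(lines):
    """Count service-interruption entries by (job id, connection, reason)."""
    counts = Counter()
    for line in lines:
        match = INTERRUPTION.match(line.strip())
        if match:
            key = (match["job"], match["conn"], match["reason"])
            counts[key] += 1
    return counts
```

Feeding it the lines of the ManifoldCF log (for example via `open(path)`, with whatever path the local install uses) gives a quick view of which job and output connection the interruptions cluster on, and whether they are all "Error 500" responses.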
> Manifold gets repeated service interruptions and stops
> ------------------------------------------------------
>
> Key: CONNECTORS-576
> URL: https://issues.apache.org/jira/browse/CONNECTORS-576
> Project: ManifoldCF
> Issue Type: Bug
> Affects Versions: ManifoldCF next
> Environment: solr 4.0 manifoldcf v1.1-dev on windows 7
> Reporter: David Morana
> Fix For: ManifoldCF 1.1
>
>
> ManifoldCF gets repeated service interruptions and stops.
> Is there a way to get more detailed error information,
> such as the document name/URL/location that it's having a problem with?
> In v.5.1 these errors would appear at the very end (the last 130 to 184
> documents) and then the job would stop.
> The Solr logs always reported vague Tika errors.
> I'm unsure where the problems lie.
> Here's the ManifoldCF log:
> WARN 2012-12-04 10:27:40,722 (Worker thread '0') - Service interruption reported for job 1343845636068 connection 'LISA-DEV': Error 500 from ingestion request; ingestion will be retried again later
> ERROR 2012-12-04 10:27:40,754 (Worker thread '0') - Exception tossed: Repeated service interruptions - failure processing document: Ingestion HTTP error code 500
> org.apache.manifoldcf.core.interfaces.ManifoldCFException: Repeated service interruptions - failure processing document: Ingestion HTTP error code 500
>     at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:585)
> Caused by: org.apache.manifoldcf.core.interfaces.ManifoldCFException: Ingestion HTTP error code 500
>     at org.apache.manifoldcf.agents.output.solr.HttpPoster$IngestThread.run(HttpPoster.java:1386)
> WARN 2012-12-04 10:27:40,847 (Worker thread '24') - Service interruption reported for job 1343845636068 connection 'LISA-DEV': Job no longer active
> And here's the Solr log, if it helps:
> org.apache.solr.common.SolrException: org.apache.tika.exception.TikaException: XML parse error
>     at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:215)
>     at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
>     at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
>     at org.apache.solr.core.SolrCore.execute(SolrCore.java:1561)
>     at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:442)
>     at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:263)
>     at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1337)
>     at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:484)
>     at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:119)
>     at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:524)
>     at