[ https://issues.apache.org/jira/browse/CONNECTORS-1518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16559269#comment-16559269 ]

Karl Wright commented on CONNECTORS-1518:
-----------------------------------------

[~svanschalkwyk], we don't control how much memory Tika uses for its content
extraction. All we can guarantee is that we feed the content to Tika in
streamed form. Depending on the document, Tika may still use more memory than
that suggests, and in some cases it may need to load the entire document into
memory.
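Tika reports extracted text through a chain of SAX content handlers. The traces quoted below pass through XHTMLContentHandler, SafeContentHandler, SecureContentHandler and BoilerpipeContentHandler, and the OutOfMemoryError frames sit inside boilerpipe's handler while it grows a BitSet and StringBuilder buffers. Streaming the input only keeps memory flat if every handler in that chain passes text along instead of keeping it. The JDK-only sketch below contrasts the two behaviors; both handler classes are hypothetical illustrations, not Tika or ManifoldCF classes.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Illustrative only: hypothetical handlers receiving the same characters() events. */
final class SaxMemoryDemo {

    /** Passes text on immediately; only a counter survives between events. */
    static final class ForwardingHandler extends DefaultHandler {
        long charsSeen = 0;

        @Override
        public void characters(char[] ch, int start, int length) {
            // A real handler would write ch[start..start+length) downstream here.
            charsSeen += length;
        }
    }

    /** Keeps every character until the end, as a whole-document analyzer must. */
    static final class AccumulatingHandler extends DefaultHandler {
        final StringBuilder retained = new StringBuilder();

        @Override
        public void characters(char[] ch, int start, int length) {
            retained.append(ch, start, length); // grows with the size of the document
        }
    }

    /** Runs a streaming SAX parse of the given XML into the handler. */
    static void parse(String xml, DefaultHandler handler) {
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    public static void main(String[] args) {
        // A synthetic sheet of many small cells, loosely like a spreadsheet's text events.
        StringBuilder doc = new StringBuilder("<sheet>");
        for (int i = 0; i < 100_000; i++) {
            doc.append("<cell>value ").append(i).append("</cell>");
        }
        doc.append("</sheet>");

        ForwardingHandler forwarding = new ForwardingHandler();
        AccumulatingHandler accumulating = new AccumulatingHandler();
        parse(doc.toString(), forwarding);
        parse(doc.toString(), accumulating);

        // Both saw the same text, but only the accumulating handler still holds it.
        System.out.println("forwarded chars: " + forwarding.charsSeen);
        System.out.println("retained chars:  " + accumulating.retained.length());
    }
}
```

The input was streamed in both runs; the difference in retained memory comes entirely from what the handler chooses to keep.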

When Tika is involved, the heap you give MCF should therefore be roughly the
size of your largest document (ideally bounded by Allowed Documents filtering)
times the number of worker threads you have allocated, plus a constant amount
for overhead.
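The sizing rule above can be written out directly, along with its inverse (the largest document a given heap can absorb). This is a minimal sketch; the class and method names are hypothetical, and the overhead figure is an assumption you would have to measure for your own deployment.

```java
/**
 * The rule of thumb from the comment above:
 *   heap ~= largestDocument * workerThreads + overhead
 * Names are hypothetical; overhead must come from your own measurements.
 */
final class McfHeapSizing {

    /** Heap needed if every worker thread holds the largest document at once. */
    static long requiredHeapBytes(long largestDocBytes, int workerThreads, long overheadBytes) {
        if (largestDocBytes < 0 || workerThreads < 1 || overheadBytes < 0) {
            throw new IllegalArgumentException("sizes must be non-negative and threads >= 1");
        }
        // multiplyExact/addExact throw on overflow instead of wrapping silently
        return Math.addExact(Math.multiplyExact(largestDocBytes, (long) workerThreads), overheadBytes);
    }

    /** Inverse: the largest document a given max heap can absorb under the same rule. */
    static long maxDocBytesForHeap(long maxHeapBytes, int workerThreads, long overheadBytes) {
        if (maxHeapBytes < 0 || workerThreads < 1 || overheadBytes < 0) {
            throw new IllegalArgumentException("sizes must be non-negative and threads >= 1");
        }
        return Math.max(0L, (maxHeapBytes - overheadBytes) / workerThreads);
    }

    public static void main(String[] args) {
        final long mib = 1024L * 1024L;
        long maxHeap = Runtime.getRuntime().maxMemory(); // roughly the effective -Xmx
        int threads = 50;                                // hypothetical worker thread count
        long overhead = 1024L * mib;                     // hypothetical fixed overhead
        System.out.printf("max heap: %d MiB%n", maxHeap / mib);
        System.out.printf("largest document under this rule: ~%d MiB%n",
                maxDocBytesForHeap(maxHeap, threads, overhead) / mib);
    }
}
```

As an illustration only: with the -Xmx8g from the reporter's start options, an assumed 1 GiB of overhead, and 50 worker threads (the logs mention worker thread '49', which suggests at least 50), the rule leaves (8192 - 1024) / 50, or about 143 MiB, per document.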

You may be able to confirm this more convincingly by setting up a standalone
Tika service and using the external Tika transformer instead.
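Running Tika as a separate service moves extraction into its own JVM with its own heap, so a document that exhausts memory there should not take down the agents process. The sketch below is not the ManifoldCF external Tika transformer; it is a hypothetical stand-alone client that PUTs one file to Tika Server's `/tika` endpoint and asks for plain text, which can help when checking a single suspect file (such as the .xls files behind the ExcelExtractor traces) against the service. It assumes a Tika Server listening on `localhost:9998`, its usual default port.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.Supplier;

/** Hypothetical client for a standalone Tika Server (not part of ManifoldCF). */
final class TikaServerClient {
    private final HttpClient http = HttpClient.newHttpClient();
    private final URI tikaEndpoint;

    TikaServerClient(URI baseUri) {
        this.tikaEndpoint = baseUri.resolve("/tika");
    }

    /** Builds the request; the body is streamed from the supplier, not buffered here. */
    HttpRequest buildRequest(Supplier<InputStream> body) {
        return HttpRequest.newBuilder(tikaEndpoint)
                .header("Accept", "text/plain")
                .PUT(HttpRequest.BodyPublishers.ofInputStream(body))
                .build();
    }

    /** Sends one file and returns the text Tika Server extracted from it. */
    String extractText(Path file) throws IOException, InterruptedException {
        HttpRequest request = buildRequest(() -> {
            try {
                return Files.newInputStream(file);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
        HttpResponse<String> response = http.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("Tika Server returned HTTP " + response.statusCode());
        }
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        TikaServerClient client = new TikaServerClient(URI.create("http://localhost:9998"));
        System.out.println(client.extractText(Path.of(args[0])));
    }
}
```

If a file that crashes the agents process also drives the Tika Server JVM out of memory, that points at extraction itself rather than at how MCF hands the content over.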


> MCF shutting down when Tika is used
> -----------------------------------
>
>                 Key: CONNECTORS-1518
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-1518
>             Project: ManifoldCF
>          Issue Type: Bug
>          Components: Tika extractor
>    Affects Versions: ManifoldCF 2.10
>         Environment: Centos 7
> Prior to crash:
> $ free -h
>              total        used        free      shared  buff/cache   available
> Mem:           15G        1.8G         12G         98M        1.1G         13G
> Swap:         2.0G          0B        2.0G
> After crash:
> $ free -h
>              total        used        free      shared  buff/cache   available
> Mem:           15G         10G        4.0G         98M        1.1G        4.4G
> Swap:         2.0G          0B        2.0G
>  
> {{start-options.env.unix :}}
> {{-Xss500m}}
> {{-Xms1g}}
> {{-Xmx8g}}
> {{-Dorg.apache.manifoldcf.configfile=./properties.xml}}
> {{-Dorg.apache.manifoldcf.jettyshutdowntoken=secret_token}}
> {{-cp}}
> {{.:./lib/mcf-core.jar:./lib/mcf-agents.jar:./lib/mcf-pull-agent.jar:./lib/mcf-ui-core.jar:./lib/mcf-jetty-runner.jar:./lib/jetty-continuation-9.2.3.v20140905.jar:./lib/jetty-http-9.2.3.v20140905.jar:./lib/jetty-io-9.2.3.v20140905.jar:./lib/jetty-jndi-9.2.3.v20140905.jar:./lib/jetty-jsp-jdt-2.3.3.jar:./lib/jetty-plus-9.2.3.v20140905.jar:./lib/jetty-schemas-3.1.M0.jar:./lib/jetty-security-9.2.3.v20140905.jar:./lib/jetty-server-9.2.3.v20140905.jar:./lib/jetty-servlet-9.2.3.v20140905.jar:./lib/jetty-util-9.2.3.v20140905.jar:./lib/jetty-webapp-9.2.3.v20140905.jar:./lib/jetty-xml-9.2.3.v20140905.jar:./lib/hsqldb-2.3.2.jar:./lib/postgresql-42.1.3.jar:./lib/commons-codec-1.10.jar:./lib/commons-collections-3.2.1.jar:./lib/commons-collections4-4.1.jar:./lib/commons-discovery-0.5.jar:./lib/commons-el-1.0.jar:./lib/commons-exec-1.3.jar:./lib/commons-fileupload-1.2.2.jar:./lib/commons-io-2.5.jar:./lib/commons-lang-2.6.jar:./lib/commons-lang3-3.6.jar:./lib/commons-logging-1.2.jar:./lib/ecj-4.3.1.jar:./lib/gson-2.8.0.jar:./lib/guava-21.0.jar:./lib/httpclient-4.5.3.jar:./lib/httpcore-4.4.6.jar:./lib/jasper-6.0.35.jar:./lib/jasper-el-6.0.35.jar:./lib/javax.servlet-api-3.1.0.jar:./lib/jna-4.1.0.jar:./lib/jna-platform-4.1.0.jar:./lib/json-simple-1.1.1.jar:./lib/jsp-api-2.1-glassfish-2.1.v20091210.jar:./lib/juli-6.0.35.jar:./lib/log4j-1.2-api-2.4.1.jar:./lib/log4j-api-2.4.1.jar:./lib/log4j-core-2.4.1.jar:./lib/mail-1.4.5.jar:./lib/serializer-2.7.1.jar:./lib/slf4j-api-1.7.24.jar:./lib/slf4j-simple-1.7.24.jar:./lib/velocity-1.7.jar:./lib/xalan-2.7.1.jar:./lib/xercesImpl-2.10.0.jar:./lib/xml-apis-1.4.01.jar:./lib/zookeeper-3.4.10.jar:}}
>            Reporter: Steph van Schalkwyk
>            Assignee: Karl Wright
>            Priority: Major
>             Fix For: ManifoldCF 2.11
>
>         Attachments: CONNECTORS-1518.patch
>
>
> ```
> Jul 26, 2018 1:21:51 PM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
> WARNING: org.xerial's sqlite-jdbc is not loaded.
> Please provide the jar on your classpath to parse sqlite files.
> See tika-parsers/pom.xml for the correct version.
> agents process ran out of memory - shutting down
> java.lang.OutOfMemoryError: Java heap space
>     at java.base/java.util.Arrays.copyOf(Arrays.java:3816)
>     at java.base/java.util.BitSet.ensureCapacity(BitSet.java:338)
>     at java.base/java.util.BitSet.expandTo(BitSet.java:353)
>     at java.base/java.util.BitSet.set(BitSet.java:448)
>     at de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler.characters(BoilerpipeHTMLContentHandler.java:267)
>     at org.apache.tika.parser.html.BoilerpipeContentHandler.characters(BoilerpipeContentHandler.java:155)
>     at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>     at org.apache.tika.sax.SecureContentHandler.characters(SecureContentHandler.java:270)
>     at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>     at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>     at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>     at org.apache.tika.sax.SafeContentHandler.access$001(SafeContentHandler.java:46)
>     at org.apache.tika.sax.SafeContentHandler$1.write(SafeContentHandler.java:82)
>     at org.apache.tika.sax.SafeContentHandler.filter(SafeContentHandler.java:140)
>     at org.apache.tika.sax.SafeContentHandler.characters(SafeContentHandler.java:287)
>     at org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:279)
>     at org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:306)
>     at org.apache.tika.parser.microsoft.TextCell.render(TextCell.java:34)
>     at org.apache.tika.parser.microsoft.ExcelExtractor$TikaHSSFListener.processSheet(ExcelExtractor.java:609)
>     at org.apache.tika.parser.microsoft.ExcelExtractor$TikaHSSFListener.internalProcessRecord(ExcelExtractor.java:392)
>     at org.apache.tika.parser.microsoft.ExcelExtractor$TikaHSSFListener.processRecord(ExcelExtractor.java:343)
>     at org.apache.poi.hssf.eventusermodel.FormatTrackingHSSFListener.processRecord(FormatTrackingHSSFListener.java:92)
>     at org.apache.poi.hssf.eventusermodel.HSSFRequest.processRecord(HSSFRequest.java:109)
>     at org.apache.poi.hssf.eventusermodel.HSSFEventFactory.genericProcessEvents(HSSFEventFactory.java:179)
>     at org.apache.poi.hssf.eventusermodel.HSSFEventFactory.processEvents(HSSFEventFactory.java:136)
>     at org.apache.tika.parser.microsoft.ExcelExtractor$TikaHSSFListener.processFile(ExcelExtractor.java:319)
>     at org.apache.tika.parser.microsoft.ExcelExtractor.parse(ExcelExtractor.java:170)
>     at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:184)
>     at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:132)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>     at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
> [Thread-475] INFO org.eclipse.jetty.server.ServerConnector - Stopped ServerConnector@37095ded{HTTP/1.1}{0.0.0.0:8345}
> [Thread-475] INFO org.eclipse.jetty.server.handler.ContextHandler - Stopped o.e.j.w.WebAppContext@5a6d5a8f{/mcf-api-service,file:/tmp/jetty-0.0.0.0-8345-mcf-api-service.war-_mcf-api-service-any-14189461872304124764.dir/webapp/,UNAVAILABLE}{/opt/manifoldcf/manifoldcf_single/././web/war/mcf-api-service.war}
> [Thread-475] INFO org.eclipse.jetty.server.handler.ContextHandler - Stopped o.e.j.w.WebAppContext@6979efad{/mcf-authority-service,file:/tmp/jetty-0.0.0.0-8345-mcf-authority-service.war-_mcf-authority-service-any-11619445383548662284.dir/webapp/,UNAVAILABLE}{/opt/manifoldcf/manifoldcf_single/././web/war/mcf-authority-service.war}
> 2018-07-26 13:22:47,170 qtp2061226112-492 FATAL Unable to register shutdown hook because JVM is shutting down. java.lang.IllegalStateException: Cannot add new shutdown hook as this is not started. Current state: STOPPED
>     at org.apache.logging.log4j.core.util.DefaultShutdownCallbackRegistry.addShutdownCallback(DefaultShutdownCallbackRegistry.java:113)
>     at org.apache.logging.log4j.core.impl.Log4jContextFactory.addShutdownCallback(Log4jContextFactory.java:271)
>     at org.apache.logging.log4j.core.LoggerContext.setUpShutdownHook(LoggerContext.java:256)
>     at org.apache.logging.log4j.core.LoggerContext.start(LoggerContext.java:216)
>     at org.apache.logging.log4j.core.impl.Log4jContextFactory.getContext(Log4jContextFactory.java:146)
>     at org.apache.logging.log4j.core.impl.Log4jContextFactory.getContext(Log4jContextFactory.java:41)
>     at org.apache.logging.log4j.LogManager.getContext(LogManager.java:270)
>     at org.apache.log4j.Logger$PrivateManager.getContext(Logger.java:59)
>     at org.apache.log4j.Logger.getLogger(Logger.java:37)
>     at org.apache.velocity.runtime.log.Log4JLogChute.init(Log4JLogChute.java:72)
>     at org.apache.velocity.runtime.log.LogManager.createLogChute(LogManager.java:157)
>     at org.apache.velocity.runtime.log.LogManager.updateLog(LogManager.java:269)
>     at org.apache.velocity.runtime.RuntimeInstance.initializeLog(RuntimeInstance.java:871)
>     at org.apache.velocity.runtime.RuntimeInstance.init(RuntimeInstance.java:262)
>     at org.apache.velocity.runtime.RuntimeInstance.requireInitialization(RuntimeInstance.java:302)
>     at org.apache.velocity.runtime.RuntimeInstance.getTemplate(RuntimeInstance.java:1531)
>     at org.apache.velocity.app.VelocityEngine.mergeTemplate(VelocityEngine.java:343)
>     at org.apache.manifoldcf.ui.i18n.Messages.outputResourceWithVelocity(Messages.java:159)
>     at org.apache.manifoldcf.agents.transformation.tika.Messages.outputResourceWithVelocity(Messages.java:136)
>     at org.apache.manifoldcf.agents.transformation.tika.TikaExtractor.outputSpecificationBody(TikaExtractor.java:544)
>     at org.apache.jsp.editjob_jsp._jspService(editjob_jsp.java:3002)
>     at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:70)
>     at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
>     at org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:388)
>     at org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:313)
>     at org.apache.jasper.servlet.JspServlet.service(JspServlet.java:260)
>     at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
>     at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:769)
>     at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
>     at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
>     at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
>     at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
>     at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1125)
>     at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
>     at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
>     at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1059)
>     at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
>     at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
>     at org.eclipse.jetty.server.handler.HandlerList.handle(HandlerList.java:52)
>     at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
>     at org.eclipse.jetty.server.Server.handle(Server.java:497)
>     at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:311)
>     at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:248)
>     at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
>     at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:610)
>     at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:539)
>     at java.base/java.lang.Thread.run(Thread.java:844)
> [Worker thread '35'] WARN org.apache.tika.parser.microsoft.AbstractPOIFSExtractor - Ignoring unexpected exception while parsing summary entry SummaryInformation
> java.lang.RuntimeException: java.nio.channels.ClosedByInterruptException
>     at org.apache.poi.poifs.filesystem.NPOIFSStream$StreamBlockByteBufferIterator.<init>(NPOIFSStream.java:151)
>     at org.apache.poi.poifs.filesystem.NPOIFSStream.getBlockIterator(NPOIFSStream.java:95)
>     at org.apache.poi.poifs.filesystem.NPOIFSDocument.getBlockIterator(NPOIFSDocument.java:179)
>     at org.apache.poi.poifs.filesystem.NDocumentInputStream.<init>(NDocumentInputStream.java:82)
>     at org.apache.poi.poifs.filesystem.DocumentInputStream.<init>(DocumentInputStream.java:65)
>     at org.apache.tika.parser.microsoft.SummaryExtractor.parseSummaryEntryIfExists(SummaryExtractor.java:83)
>     at org.apache.tika.parser.microsoft.SummaryExtractor.parseSummaries(SummaryExtractor.java:73)
>     at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:156)
>     at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:132)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>     at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
>     at org.apache.manifoldcf.agents.transformation.tika.TikaParser.parse(TikaParser.java:74)
>     at org.apache.manifoldcf.agents.transformation.tika.TikaExtractor.addOrReplaceDocumentWithException(TikaExtractor.java:235)
>     at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddEntryPoint.addOrReplaceDocumentWithException(IncrementalIngester.java:3226)
>     at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddFanout.sendDocument(IncrementalIngester.java:3077)
>     at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineObjectWithVersions.addOrReplaceDocumentWithException(IncrementalIngester.java:2708)
>     at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentIngest(IncrementalIngester.java:756)
>     at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1583)
>     at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1548)
>     at org.apache.manifoldcf.crawler.connectors.filesystem.FileConnector.processDocuments(FileConnector.java:448)
>     at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399)
> Caused by: java.nio.channels.ClosedByInterruptException
>     at java.base/java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:199)
>     at java.base/sun.nio.ch.FileChannelImpl.size(FileChannelImpl.java:388)
>     at org.apache.poi.poifs.nio.FileBackedDataSource.size(FileBackedDataSource.java:137)
>     at org.apache.poi.poifs.filesystem.NPOIFSFileSystem.getChainLoopDetector(NPOIFSFileSystem.java:627)
>     at org.apache.poi.poifs.filesystem.NPOIFSStream$StreamBlockByteBufferIterator.<init>(NPOIFSStream.java:149)
>     ... 21 more
> [Worker thread '35'] WARN org.apache.tika.parser.microsoft.AbstractPOIFSExtractor - Ignoring unexpected exception while parsing summary entry DocumentSummaryInformation
> java.lang.RuntimeException: java.nio.channels.ClosedChannelException
>     at org.apache.poi.poifs.filesystem.NPOIFSStream$StreamBlockByteBufferIterator.<init>(NPOIFSStream.java:151)
>     at org.apache.poi.poifs.filesystem.NPOIFSStream.getBlockIterator(NPOIFSStream.java:95)
>     at org.apache.poi.poifs.filesystem.NPOIFSMiniStore.getBlockAt(NPOIFSMiniStore.java:67)
>     at org.apache.poi.poifs.filesystem.NPOIFSStream$StreamBlockByteBufferIterator.next(NPOIFSStream.java:169)
>     at org.apache.poi.poifs.filesystem.NPOIFSStream$StreamBlockByteBufferIterator.next(NPOIFSStream.java:142)
>     at org.apache.poi.poifs.filesystem.NDocumentInputStream.readFully(NDocumentInputStream.java:264)
>     at org.apache.poi.poifs.filesystem.NDocumentInputStream.read(NDocumentInputStream.java:162)
>     at org.apache.poi.poifs.filesystem.DocumentInputStream.read(DocumentInputStream.java:127)
>     at org.apache.poi.util.BoundedInputStream.read(BoundedInputStream.java:121)
>     at org.apache.poi.util.BoundedInputStream.read(BoundedInputStream.java:103)
>     at org.apache.poi.util.IOUtils.copy(IOUtils.java:312)
>     at org.apache.poi.util.IOUtils.peekFirstNBytes(IOUtils.java:70)
>     at org.apache.poi.hpsf.PropertySet.isPropertySetStream(PropertySet.java:393)
>     at org.apache.poi.hpsf.PropertySet.<init>(PropertySet.java:191)
>     at org.apache.tika.parser.microsoft.SummaryExtractor.parseSummaryEntryIfExists(SummaryExtractor.java:83)
>     at org.apache.tika.parser.microsoft.SummaryExtractor.parseSummaries(SummaryExtractor.java:74)
>     at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:156)
>     at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:132)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>     at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
>     at org.apache.manifoldcf.agents.transformation.tika.TikaParser.parse(TikaParser.java:74)
>     at org.apache.manifoldcf.agents.transformation.tika.TikaExtractor.addOrReplaceDocumentWithException(TikaExtractor.java:235)
>     at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddEntryPoint.addOrReplaceDocumentWithException(IncrementalIngester.java:3226)
>     at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddFanout.sendDocument(IncrementalIngester.java:3077)
>     at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineObjectWithVersions.addOrReplaceDocumentWithException(IncrementalIngester.java:2708)
>     at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentIngest(IncrementalIngester.java:756)
>     at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1583)
>     at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1548)
>     at org.apache.manifoldcf.crawler.connectors.filesystem.FileConnector.processDocuments(FileConnector.java:448)
>     at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399)
> Caused by: java.nio.channels.ClosedChannelException
>     at java.base/sun.nio.ch.FileChannelImpl.ensureOpen(FileChannelImpl.java:158)
>     at java.base/sun.nio.ch.FileChannelImpl.size(FileChannelImpl.java:373)
>     at org.apache.poi.poifs.nio.FileBackedDataSource.size(FileBackedDataSource.java:137)
>     at org.apache.poi.poifs.filesystem.NPOIFSFileSystem.getChainLoopDetector(NPOIFSFileSystem.java:627)
>     at org.apache.poi.poifs.filesystem.NPOIFSStream$StreamBlockByteBufferIterator.<init>(NPOIFSStream.java:149)
>     ... 30 more
> ```
> Following up: when these exceptions occur, the heap runs out:
> 13:39:39.856 [Worker thread '49'] WARN org.apache.manifoldcf.jobs - Service interruption reported for job 1532551209410 connection 'file': IO exception: null
> 13:39:39.970 [Worker thread '43'] WARN org.apache.manifoldcf.jobs - Service interruption reported for job 1532551209410 connection 'file': IO exception: null
> 13:39:40.415 [Worker thread '34'] WARN org.apache.manifoldcf.jobs - Service interruption reported for job 1532551209410 connection 'file': IO exception: null
> 13:39:40.469 [Worker thread '1'] WARN org.apache.manifoldcf.jobs - Service interruption reported for job 1532551209410 connection 'file': IO exception: null
> 13:39:43.739 [Worker thread '32'] WARN org.apache.manifoldcf.jobs - Service interruption reported for job 1532551209410 connection 'file': IO exception: null
> 13:39:44.697 [Worker thread '43'] WARN org.apache.manifoldcf.jobs - Service interruption reported for job 1532551209410 connection 'file': IO exception: null
> 13:39:45.756 [Worker thread '33'] WARN org.apache.manifoldcf.jobs - Service interruption reported for job 1532551209410 connection 'file': IO exception: null
> 13:39:45.775 [Worker thread '36'] WARN org.apache.manifoldcf.jobs - Service interruption reported for job 1532551209410 connection 'file': IO exception: null
> 13:39:46.751 [Worker thread '35'] WARN org.apache.manifoldcf.jobs - Service interruption reported for job 1532551209410 connection 'file': IO exception: null
> 13:39:46.753 [Worker thread '40'] WARN org.apache.manifoldcf.jobs - Service interruption reported for job 1532551209410 connection 'file': IO exception: null
> 13:39:47.536 [Worker thread '45'] WARN org.apache.manifoldcf.jobs - Service interruption reported for job 1532551209410 connection 'file': IO exception: null
> 13:39:48.734 [Worker thread '44'] WARN org.apache.manifoldcf.jobs - Service interruption reported for job 1532551209410 connection 'file': IO exception: null
> 13:39:50.922 [Worker thread '30'] WARN org.apache.manifoldcf.jobs - Service interruption reported for job 1532551209410 connection 'file': IO exception: null
> 13:39:54.930 [Worker thread '28'] WARN org.apache.manifoldcf.jobs - Service interruption reported for job 1532551209410 connection 'file': IO exception: null
> 13:40:33.660 [Worker thread '29'] WARN org.apache.manifoldcf.jobs - Service interruption reported for job 1532551209410 connection 'file': IO exception: null
> agents process ran out of memory - shutting down
> java.lang.OutOfMemoryError: Java heap space
>     at java.base/java.lang.StringLatin1.newString(StringLatin1.java:549)
>     at java.base/java.lang.StringBuilder.toString(StringBuilder.java:415)
>     at de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler.flushBlock(BoilerpipeHTMLContentHandler.java:341)
>     at de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler.characters(BoilerpipeHTMLContentHandler.java:198)
>     at org.apache.tika.parser.html.BoilerpipeContentHandler.characters(BoilerpipeContentHandler.java:155)
>     at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>     at org.apache.tika.sax.SecureContentHandler.characters(SecureContentHandler.java:270)
>     at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>     at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>     at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>     at org.apache.tika.sax.SafeContentHandler.access$001(SafeContentHandler.java:46)
>     at org.apache.tika.sax.SafeContentHandler$1.write(SafeContentHandler.java:82)
>     at org.apache.tika.sax.SafeContentHandler.filter(SafeContentHandler.java:140)
>     at org.apache.tika.sax.SafeContentHandler.characters(SafeContentHandler.java:287)
>     at org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:279)
>     at org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:306)
>     at org.apache.tika.parser.microsoft.TextCell.render(TextCell.java:34)
>     at org.apache.tika.parser.microsoft.ExcelExtractor$TikaHSSFListener.processSheet(ExcelExtractor.java:609)
>     at org.apache.tika.parser.microsoft.ExcelExtractor$TikaHSSFListener.internalProcessRecord(ExcelExtractor.java:392)
>     at org.apache.tika.parser.microsoft.ExcelExtractor$TikaHSSFListener.processRecord(ExcelExtractor.java:343)
>     at org.apache.poi.hssf.eventusermodel.FormatTrackingHSSFListener.processRecord(FormatTrackingHSSFListener.java:92)
>     at org.apache.poi.hssf.eventusermodel.HSSFRequest.processRecord(HSSFRequest.java:109)
>     at org.apache.poi.hssf.eventusermodel.HSSFEventFactory.genericProcessEvents(HSSFEventFactory.java:179)
>     at org.apache.poi.hssf.eventusermodel.HSSFEventFactory.processEvents(HSSFEventFactory.java:136)
>     at org.apache.tika.parser.microsoft.ExcelExtractor$TikaHSSFListener.processFile(ExcelExtractor.java:319)
>     at org.apache.tika.parser.microsoft.ExcelExtractor.parse(ExcelExtractor.java:170)
>     at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:184)
>     at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:132)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>     at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
>     at org.apache.manifoldcf.agents.transformation.tika.TikaParser.parse(TikaParser.java:74)
> agents process ran out of memory - shutting down
> java.lang.OutOfMemoryError: Java heap space
>     at java.base/java.util.Arrays.copyOf(Arrays.java:3744)
>     at java.base/java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:146)
>     at java.base/java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:531)
>     at java.base/java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:550)
>     at java.base/java.lang.StringBuilder.append(StringBuilder.java:171)
>     at java.base/java.util.regex.Matcher.appendReplacement(Matcher.java:1002)
>     at java.base/java.util.regex.Matcher.replaceAll(Matcher.java:1181)
>     at de.l3s.boilerpipe.util.UnicodeTokenizer.tokenize(UnicodeTokenizer.java:40)
>     at de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler.flushBlock(BoilerpipeHTMLContentHandler.java:296)
>     at de.l3s.boilerpipe.sax.CommonTagActions$3.end(CommonTagActions.java:143)
>     at de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler.endElement(BoilerpipeHTMLContentHandler.java:183)
>     at org.apache.tika.parser.html.BoilerpipeContentHandler.endElement(BoilerpipeContentHandler.java:175)
>     at org.apache.tika.sax.ContentHandlerDecorator.endElement(ContentHandlerDecorator.java:136)
>     at org.apache.tika.sax.SecureContentHandler.endElement(SecureContentHandler.java:256)
>     at org.apache.tika.sax.ContentHandlerDecorator.endElement(ContentHandlerDecorator.java:136)
>     at org.apache.tika.sax.ContentHandlerDecorator.endElement(ContentHandlerDecorator.java:136)
>     at org.apache.tika.sax.ContentHandlerDecorator.endElement(ContentHandlerDecorator.java:136)
>     at org.apache.tika.sax.SafeContentHandler.endElement(SafeContentHandler.java:273)
>     at org.apache.tika.sax.XHTMLContentHandler.endDocument(XHTMLContentHandler.java:224)
>     at org.apache.tika.parser.txt.TXTParser.parse(TXTParser.java:109)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>     at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
>     at org.apache.manifoldcf.agents.transformation.tika.TikaParser.parse(TikaParser.java:74)
>     at org.apache.manifoldcf.agents.transformation.tika.TikaExtractor.addOrReplaceDocumentWithException(TikaExtractor.java:235)
>     at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddEntryPoint.addOrReplaceDocumentWithException(IncrementalIngester.java:3226)
>     at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddFanout.sendDocument(IncrementalIngester.java:3077)
>     at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineObjectWithVersions.addOrReplaceDocumentWithException(IncrementalIngester.java:2708)
>     at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentIngest(IncrementalIngester.java:756)
>     at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1583)
>     at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1548)
>     at org.apache.manifoldcf.crawler.connectors.filesystem.FileConnector.processDocuments(FileConnector.java:448)
> 13:40:33.995 [Worker thread '42'] WARN org.apache.manifoldcf.jobs - Service interruption reported for job 1532551209410 connection 'file': IO exception: null
> [Thread-475] INFO org.eclipse.jetty.server.ServerConnector - Stopped ServerConnector@5d235104{HTTP/1.1}{0.0.0.0:8345}
> [Thread-475] INFO org.eclipse.jetty.server.handler.ContextHandler - Stopped o.e.j.w.WebAppContext@6105f8a3{/mcf-api-service,file:/tmp/jetty-0.0.0.0-8345-mcf-api-service.war-_mcf-api-service-any-9896962439762567079.dir/webapp/,UNAVAILABLE}{/opt/manifoldcf/manifoldcf_single/././web/war/mcf-api-service.war}
> [Thread-475] INFO org.eclipse.jetty.server.handler.ContextHandler - Stopped o.e.j.w.WebAppContext@12365c88{/mcf-authority-service,file:/tmp/jetty-0.0.0.0-8345-mcf-authority-service.war-_mcf-authority-service-any-3954308360064638561.dir/webapp/,UNAVAILABLE}{/opt/manifoldcf/manifoldcf_single/././web/war/mcf-authority-service.war}
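The two POIFS warnings above (a ClosedByInterruptException, then a ClosedChannelException on the same worker thread) follow the JDK's rule for interruptible channels: if a thread using a FileChannel is interrupted, the channel is closed, the interrupted call throws ClosedByInterruptException, and every later call on that channel throws ClosedChannelException. That is consistent with the warnings appearing after the shutdown messages, though the report does not confirm what interrupted worker thread '35'. The JDK-only sketch below (hypothetical class name; no POI or Tika involved) reproduces the sequence on a temporary file.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.ClosedChannelException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

/** Shows how a FileChannel reacts to an interrupt; not POI or Tika code. */
final class InterruptedChannelDemo {

    /** Returns the simple names of the exceptions thrown by two size() calls. */
    static List<String> run() {
        List<String> seen = new ArrayList<>();
        try {
            Path tmp = Files.createTempFile("interrupt-demo", ".bin");
            try (FileChannel ch = FileChannel.open(tmp, StandardOpenOption.READ)) {
                // Simulate the worker thread being interrupted while a parser still holds the channel.
                Thread.currentThread().interrupt();
                try {
                    ch.size(); // interruptible I/O: the pending interrupt closes the channel
                } catch (ClosedChannelException e) {
                    seen.add(e.getClass().getSimpleName()); // expected: ClosedByInterruptException
                } finally {
                    Thread.interrupted(); // clear the flag so it does not leak into later code
                }
                try {
                    ch.size(); // any later use of the same channel fails
                } catch (ClosedChannelException e) {
                    seen.add(e.getClass().getSimpleName()); // expected: ClosedChannelException
                }
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

ClosedByInterruptException is itself a subclass of ClosedChannelException, which is why a single catch clause handles both calls.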
> null
>  13:40:33.660 [Worker thread '29'] WARN org.apache.manifoldcf.jobs - Service 
> interruption reported for job 1532551209410 connection 'file': IO exception: 
> null
>  agents process ran out of memory - shutting down
>  java.lang.OutOfMemoryError: Java heap space
>  at java.base/java.lang.StringLatin1.newString(StringLatin1.java:549)
>  at java.base/java.lang.StringBuilder.toString(StringBuilder.java:415)
>  at de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler.flushBlock(BoilerpipeHTMLContentHandler.java:341)
>  at de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler.characters(BoilerpipeHTMLContentHandler.java:198)
>  at org.apache.tika.parser.html.BoilerpipeContentHandler.characters(BoilerpipeContentHandler.java:155)
>  at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>  at org.apache.tika.sax.SecureContentHandler.characters(SecureContentHandler.java:270)
>  at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>  at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>  at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>  at org.apache.tika.sax.SafeContentHandler.access$001(SafeContentHandler.java:46)
>  at org.apache.tika.sax.SafeContentHandler$1.write(SafeContentHandler.java:82)
>  at org.apache.tika.sax.SafeContentHandler.filter(SafeContentHandler.java:140)
>  at org.apache.tika.sax.SafeContentHandler.characters(SafeContentHandler.java:287)
>  at org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:279)
>  at org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:306)
>  at org.apache.tika.parser.microsoft.TextCell.render(TextCell.java:34)
>  at org.apache.tika.parser.microsoft.ExcelExtractor$TikaHSSFListener.processSheet(ExcelExtractor.java:609)
>  at org.apache.tika.parser.microsoft.ExcelExtractor$TikaHSSFListener.internalProcessRecord(ExcelExtractor.java:392)
>  at org.apache.tika.parser.microsoft.ExcelExtractor$TikaHSSFListener.processRecord(ExcelExtractor.java:343)
>  at org.apache.poi.hssf.eventusermodel.FormatTrackingHSSFListener.processRecord(FormatTrackingHSSFListener.java:92)
>  at org.apache.poi.hssf.eventusermodel.HSSFRequest.processRecord(HSSFRequest.java:109)
>  at org.apache.poi.hssf.eventusermodel.HSSFEventFactory.genericProcessEvents(HSSFEventFactory.java:179)
>  at org.apache.poi.hssf.eventusermodel.HSSFEventFactory.processEvents(HSSFEventFactory.java:136)
>  at org.apache.tika.parser.microsoft.ExcelExtractor$TikaHSSFListener.processFile(ExcelExtractor.java:319)
>  at org.apache.tika.parser.microsoft.ExcelExtractor.parse(ExcelExtractor.java:170)
>  at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:184)
>  at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:132)
>  at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>  at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>  at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
>  at org.apache.manifoldcf.agents.transformation.tika.TikaParser.parse(TikaParser.java:74)
>  agents process ran out of memory - shutting down
>  java.lang.OutOfMemoryError: Java heap space
>  at java.base/java.util.Arrays.copyOf(Arrays.java:3744)
>  at java.base/java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:146)
>  at java.base/java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:531)
>  at java.base/java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:550)
>  at java.base/java.lang.StringBuilder.append(StringBuilder.java:171)
>  at java.base/java.util.regex.Matcher.appendReplacement(Matcher.java:1002)
>  at java.base/java.util.regex.Matcher.replaceAll(Matcher.java:1181)
>  at de.l3s.boilerpipe.util.UnicodeTokenizer.tokenize(UnicodeTokenizer.java:40)
>  at de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler.flushBlock(BoilerpipeHTMLContentHandler.java:296)
>  at de.l3s.boilerpipe.sax.CommonTagActions$3.end(CommonTagActions.java:143)
>  at de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler.endElement(BoilerpipeHTMLContentHandler.java:183)
>  at org.apache.tika.parser.html.BoilerpipeContentHandler.endElement(BoilerpipeContentHandler.java:175)
>  at org.apache.tika.sax.ContentHandlerDecorator.endElement(ContentHandlerDecorator.java:136)
>  at org.apache.tika.sax.SecureContentHandler.endElement(SecureContentHandler.java:256)
>  at org.apache.tika.sax.ContentHandlerDecorator.endElement(ContentHandlerDecorator.java:136)
>  at org.apache.tika.sax.ContentHandlerDecorator.endElement(ContentHandlerDecorator.java:136)
>  at org.apache.tika.sax.ContentHandlerDecorator.endElement(ContentHandlerDecorator.java:136)
>  at org.apache.tika.sax.SafeContentHandler.endElement(SafeContentHandler.java:273)
>  at org.apache.tika.sax.XHTMLContentHandler.endDocument(XHTMLContentHandler.java:224)
>  at org.apache.tika.parser.txt.TXTParser.parse(TXTParser.java:109)
>  at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>  at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>  at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
>  at org.apache.manifoldcf.agents.transformation.tika.TikaParser.parse(TikaParser.java:74)
>  at org.apache.manifoldcf.agents.transformation.tika.TikaExtractor.addOrReplaceDocumentWithException(TikaExtractor.java:235)
>  at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddEntryPoint.addOrReplaceDocumentWithException(IncrementalIngester.java:3226)
>  at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddFanout.sendDocument(IncrementalIngester.java:3077)
>  at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineObjectWithVersions.addOrReplaceDocumentWithException(IncrementalIngester.java:2708)
>  at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentIngest(IncrementalIngester.java:756)
>  at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1583)
>  at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1548)
>  at org.apache.manifoldcf.crawler.connectors.filesystem.FileConnector.processDocuments(FileConnector.java:448)
>  13:40:33.995 [Worker thread '42'] WARN org.apache.manifoldcf.jobs - Service 
> interruption reported for job 1532551209410 connection 'file': IO exception: 
> null
>  [Thread-475] INFO org.eclipse.jetty.server.ServerConnector - Stopped 
> ServerConnector@5d235104{HTTP/1.1}{0.0.0.0:8345}
>  [Thread-475] INFO org.eclipse.jetty.server.handler.ContextHandler - Stopped 
> o.e.j.w.WebAppContext@6105f8a3{/mcf-api-service,[file:/tmp/jetty-0.0.0.0-8345-mcf-api-service.war-_mcf-api-service-any-9896962439762567079.dir/webapp/,UNAVAILABLE|file:///tmp/jetty-0.0.0.0-8345-mcf-api-service.war-_mcf-api-service-any-9896962439762567079.dir/webapp/,UNAVAILABLE]}{/opt/manifoldcf/manifoldcf_single/././web/war/mcf-api-service.war}
>  [Thread-475] INFO org.eclipse.jetty.server.handler.ContextHandler - Stopped 
> o.e.j.w.WebAppContext@12365c88{/mcf-authority-service,[file:/tmp/jetty-0.0.0.0-8345-mcf-authority-service.war-_mcf-authority-service-any-3954308360064638561.dir/webapp/,UNAVAILABLE|file:///tmp/jetty-0.0.0.0-8345-mcf-authority-service.war-_mcf-authority-service-any-3954308360064638561.dir/webapp/,UNAVAILABLE]}{/opt/manifoldcf/manifoldcf_single/././web/war/mcf-authority-service.war}
>  This occurs when the ES (Elasticsearch) connector reports this error:
> |07-26-2018 19:34:25.356|Indexation (ES)|file:/var/manifoldcf/corpus/000640.html|CLIENTPROTOCOLEXCEPTION|46190|9|



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
