[ 
https://issues.apache.org/jira/browse/CONNECTORS-1518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16559082#comment-16559082
 ] 

Karl Wright commented on CONNECTORS-1518:
-----------------------------------------

Hi [~svanschalkwyk], the memory usage for the ElasticSearch connector seemingly 
depends on whether the mapper attachment is used or not.  Here's the code.  
Note that everything is properly streamed when the mapper attachment is used, 
but is dumped into a string buffer when not:

{code}
        if (useMapperAttachments && inputStream != null) {
          if(needComma){
            pw.print(",");
          }
          // I'm told this is not necessary: see CONNECTORS-690
          //pw.print("\"type\" : \"attachment\",");
          pw.print("\"file\" : {");
          String contentType = document.getMimeType();
          if (contentType != null)
            pw.print("\"_content_type\" : "+jsonStringEscape(contentType)+",");
          String fileName = document.getFileName();
          if (fileName != null)
            pw.print("\"_name\" : "+jsonStringEscape(fileName)+",");
          // Since ES 1.0
          pw.print(" \"_content\" : \"");
          Base64 base64 = new Base64();
          base64.encodeStream(inputStream, pw);
          pw.print("\"}");
        }
        
        if (!useMapperAttachments && inputStream != null) {
          if (contentAttributeName != null)
          {
            Reader r = new InputStreamReader(inputStream, Consts.UTF_8);
            StringBuilder sb = new StringBuilder((int)document.getBinaryLength());
            char[] buffer = new char[65536];
            while (true)
            {
              int amt = r.read(buffer,0,buffer.length);
              if (amt == -1)
                break;
              sb.append(buffer,0,amt);
            }
            needComma = writeField(pw, needComma, contentAttributeName, new String[]{sb.toString()});
          }
        }
{code}

The second clause therefore needs to be reworked to stream the content 
directly to the output rather than buffering all of it in the StringBuilder.
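
To illustrate the idea, here is a minimal sketch of streaming a field value 
as a JSON string, escaping each chunk as it is read.  It is not the actual 
connector change: the class name StreamingJsonField and both method 
signatures are hypothetical, and the real writeField in the connector takes a 
String[] rather than a Reader.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.Writer;

// Hypothetical helper, not part of ManifoldCF: shows chunked JSON string output.
class StreamingJsonField {

  // Copies the reader to the writer as one quoted JSON string value.
  // Only one 64K chunk is held in memory at a time.
  static void writeJsonString(Reader in, Writer out) throws IOException {
    char[] buffer = new char[65536];
    out.write('"');
    int amt;
    while ((amt = in.read(buffer, 0, buffer.length)) != -1) {
      for (int i = 0; i < amt; i++) {
        char c = buffer[i];
        switch (c) {
          case '"':  out.write("\\\""); break;
          case '\\': out.write("\\\\"); break;
          case '\n': out.write("\\n"); break;
          case '\r': out.write("\\r"); break;
          case '\t': out.write("\\t"); break;
          default:
            if (c < 0x20) {
              // Remaining control characters need a four-digit hex escape.
              out.write(String.format("\\u%04x", (int) c));
            } else {
              out.write(c);
            }
        }
      }
    }
    out.write('"');
  }

  // Writes ,"name":"value" (comma only if needed), streaming the value.
  // Always returns true, meaning a later field needs a leading comma.
  static boolean writeField(Writer out, boolean needComma, String name, Reader value)
      throws IOException {
    if (needComma) {
      out.write(',');
    }
    writeJsonString(new StringReader(name), out);
    out.write(':');
    writeJsonString(value, out);
    return true;
  }
}
```

In the second clause this would be called with the existing 
InputStreamReader and the PrintWriter, so the StringBuilder and the 
sb.toString() copy go away entirely.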

So is it correct to assume you're not using the mapper attachment?


> MCF shutting down when Tika is used
> -----------------------------------
>
>                 Key: CONNECTORS-1518
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-1518
>             Project: ManifoldCF
>          Issue Type: Bug
>          Components: Tika extractor
>    Affects Versions: ManifoldCF 2.10
>         Environment: Centos 7
> Prior to crash:
> $ free -h
>         total   used   free   shared   buff/cache   available
> Mem:      15G   1.8G    12G      98M         1.1G         13G
> Swap:    2.0G     0B   2.0G
> After crash:
> $ free -h
>         total   used   free   shared   buff/cache   available
> Mem:      15G    10G   4.0G      98M         1.1G        4.4G
> Swap:    2.0G     0B   2.0G
>  
> {{start-options.env.unix :}}
> {{-Xss500m}}
> {{-Xms1g}}
> {{-Xmx8g}}
> {{-Dorg.apache.manifoldcf.configfile=./properties.xml}}
> {{-Dorg.apache.manifoldcf.jettyshutdowntoken=secret_token}}
> {{-cp}}
> {{.:./lib/mcf-core.jar:./lib/mcf-agents.jar:./lib/mcf-pull-agent.jar:./lib/mcf-ui-core.jar:./lib/mcf-jetty-runner.jar:./lib/jetty-continuation-9.2.3.v20140905.jar:./lib/jetty-http-9.2.3.v20140905.jar:./lib/jetty-io-9.2.3.v20140905.jar:./lib/jetty-jndi-9.2.3.v20140905.jar:./lib/jetty-jsp-jdt-2.3.3.jar:./lib/jetty-plus-9.2.3.v20140905.jar:./lib/jetty-schemas-3.1.M0.jar:./lib/jetty-security-9.2.3.v20140905.jar:./lib/jetty-server-9.2.3.v20140905.jar:./lib/jetty-servlet-9.2.3.v20140905.jar:./lib/jetty-util-9.2.3.v20140905.jar:./lib/jetty-webapp-9.2.3.v20140905.jar:./lib/jetty-xml-9.2.3.v20140905.jar:./lib/hsqldb-2.3.2.jar:./lib/postgresql-42.1.3.jar:./lib/commons-codec-1.10.jar:./lib/commons-collections-3.2.1.jar:./lib/commons-collections4-4.1.jar:./lib/commons-discovery-0.5.jar:./lib/commons-el-1.0.jar:./lib/commons-exec-1.3.jar:./lib/commons-fileupload-1.2.2.jar:./lib/commons-io-2.5.jar:./lib/commons-lang-2.6.jar:./lib/commons-lang3-3.6.jar:./lib/commons-logging-1.2.jar:./lib/ecj-4.3.1.jar:./lib/gson-2.8.0.jar:./lib/guava-21.0.jar:./lib/httpclient-4.5.3.jar:./lib/httpcore-4.4.6.jar:./lib/jasper-6.0.35.jar:./lib/jasper-el-6.0.35.jar:./lib/javax.servlet-api-3.1.0.jar:./lib/jna-4.1.0.jar:./lib/jna-platform-4.1.0.jar:./lib/json-simple-1.1.1.jar:./lib/jsp-api-2.1-glassfish-2.1.v20091210.jar:./lib/juli-6.0.35.jar:./lib/log4j-1.2-api-2.4.1.jar:./lib/log4j-api-2.4.1.jar:./lib/log4j-core-2.4.1.jar:./lib/mail-1.4.5.jar:./lib/serializer-2.7.1.jar:./lib/slf4j-api-1.7.24.jar:./lib/slf4j-simple-1.7.24.jar:./lib/velocity-1.7.jar:./lib/xalan-2.7.1.jar:./lib/xercesImpl-2.10.0.jar:./lib/xml-apis-1.4.01.jar:./lib/zookeeper-3.4.10.jar:}}
>            Reporter: Steph van Schalkwyk
>            Assignee: Karl Wright
>            Priority: Major
>             Fix For: ManifoldCF 2.11
>
>
>   ```
>  Jul 26, 2018 1:21:51 PM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
>  WARNING: org.xerial's sqlite-jdbc is not loaded.
>  Please provide the jar on your classpath to parse sqlite files.
>  See tika-parsers/pom.xml for the correct version.
>  agents process ran out of memory - shutting down
>  java.lang.OutOfMemoryError: Java heap space
>      at java.base/java.util.Arrays.copyOf(Arrays.java:3816)
>      at java.base/java.util.BitSet.ensureCapacity(BitSet.java:338)
>      at java.base/java.util.BitSet.expandTo(BitSet.java:353)
>      at java.base/java.util.BitSet.set(BitSet.java:448)
>      at de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler.characters(BoilerpipeHTMLContentHandler.java:267)
>      at org.apache.tika.parser.html.BoilerpipeContentHandler.characters(BoilerpipeContentHandler.java:155)
>      at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>      at org.apache.tika.sax.SecureContentHandler.characters(SecureContentHandler.java:270)
>      at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>      at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>      at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>      at org.apache.tika.sax.SafeContentHandler.access$001(SafeContentHandler.java:46)
>      at org.apache.tika.sax.SafeContentHandler$1.write(SafeContentHandler.java:82)
>      at org.apache.tika.sax.SafeContentHandler.filter(SafeContentHandler.java:140)
>      at org.apache.tika.sax.SafeContentHandler.characters(SafeContentHandler.java:287)
>      at org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:279)
>      at org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:306)
>      at org.apache.tika.parser.microsoft.TextCell.render(TextCell.java:34)
>      at org.apache.tika.parser.microsoft.ExcelExtractor$TikaHSSFListener.processSheet(ExcelExtractor.java:609)
>      at org.apache.tika.parser.microsoft.ExcelExtractor$TikaHSSFListener.internalProcessRecord(ExcelExtractor.java:392)
>      at org.apache.tika.parser.microsoft.ExcelExtractor$TikaHSSFListener.processRecord(ExcelExtractor.java:343)
>      at org.apache.poi.hssf.eventusermodel.FormatTrackingHSSFListener.processRecord(FormatTrackingHSSFListener.java:92)
>      at org.apache.poi.hssf.eventusermodel.HSSFRequest.processRecord(HSSFRequest.java:109)
>      at org.apache.poi.hssf.eventusermodel.HSSFEventFactory.genericProcessEvents(HSSFEventFactory.java:179)
>      at org.apache.poi.hssf.eventusermodel.HSSFEventFactory.processEvents(HSSFEventFactory.java:136)
>      at org.apache.tika.parser.microsoft.ExcelExtractor$TikaHSSFListener.processFile(ExcelExtractor.java:319)
>      at org.apache.tika.parser.microsoft.ExcelExtractor.parse(ExcelExtractor.java:170)
>      at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:184)
>      at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:132)
>      at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>      at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>      at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
>  [Thread-475] INFO org.eclipse.jetty.server.ServerConnector - Stopped ServerConnector@37095ded{HTTP/1.1}{0.0.0.0:8345}
>  [Thread-475] INFO org.eclipse.jetty.server.handler.ContextHandler - Stopped o.e.j.w.WebAppContext@5a6d5a8f{/mcf-api-service,[file:/tmp/jetty-0.0.0.0-8345-mcf-api-service.war-_mcf-api-service-any-14189461872304124764.dir/webapp/,UNAVAILABLE|file:///tmp/jetty-0.0.0.0-8345-mcf-api-service.war-_mcf-api-service-any-14189461872304124764.dir/webapp/,UNAVAILABLE]}{/opt/manifoldcf/manifoldcf_single/././web/war/mcf-api-service.war}
>  [Thread-475] INFO org.eclipse.jetty.server.handler.ContextHandler - Stopped o.e.j.w.WebAppContext@6979efad{/mcf-authority-service,[file:/tmp/jetty-0.0.0.0-8345-mcf-authority-service.war-_mcf-authority-service-any-11619445383548662284.dir/webapp/,UNAVAILABLE|file:///tmp/jetty-0.0.0.0-8345-mcf-authority-service.war-_mcf-authority-service-any-11619445383548662284.dir/webapp/,UNAVAILABLE]}{/opt/manifoldcf/manifoldcf_single/././web/war/mcf-authority-service.war}
>  2018-07-26 13:22:47,170 qtp2061226112-492 FATAL Unable to register shutdown hook because JVM is shutting down. java.lang.IllegalStateException: Cannot add new shutdown hook as this is not started. Current state: STOPPED
>      at org.apache.logging.log4j.core.util.DefaultShutdownCallbackRegistry.addShutdownCallback(DefaultShutdownCallbackRegistry.java:113)
>      at org.apache.logging.log4j.core.impl.Log4jContextFactory.addShutdownCallback(Log4jContextFactory.java:271)
>      at org.apache.logging.log4j.core.LoggerContext.setUpShutdownHook(LoggerContext.java:256)
>      at org.apache.logging.log4j.core.LoggerContext.start(LoggerContext.java:216)
>      at org.apache.logging.log4j.core.impl.Log4jContextFactory.getContext(Log4jContextFactory.java:146)
>      at org.apache.logging.log4j.core.impl.Log4jContextFactory.getContext(Log4jContextFactory.java:41)
>      at org.apache.logging.log4j.LogManager.getContext(LogManager.java:270)
>      at org.apache.log4j.Logger$PrivateManager.getContext(Logger.java:59)
>      at org.apache.log4j.Logger.getLogger(Logger.java:37)
>      at org.apache.velocity.runtime.log.Log4JLogChute.init(Log4JLogChute.java:72)
>      at org.apache.velocity.runtime.log.LogManager.createLogChute(LogManager.java:157)
>      at org.apache.velocity.runtime.log.LogManager.updateLog(LogManager.java:269)
>      at org.apache.velocity.runtime.RuntimeInstance.initializeLog(RuntimeInstance.java:871)
>      at org.apache.velocity.runtime.RuntimeInstance.init(RuntimeInstance.java:262)
>      at org.apache.velocity.runtime.RuntimeInstance.requireInitialization(RuntimeInstance.java:302)
>      at org.apache.velocity.runtime.RuntimeInstance.getTemplate(RuntimeInstance.java:1531)
>      at org.apache.velocity.app.VelocityEngine.mergeTemplate(VelocityEngine.java:343)
>      at org.apache.manifoldcf.ui.i18n.Messages.outputResourceWithVelocity(Messages.java:159)
>      at org.apache.manifoldcf.agents.transformation.tika.Messages.outputResourceWithVelocity(Messages.java:136)
>      at org.apache.manifoldcf.agents.transformation.tika.TikaExtractor.outputSpecificationBody(TikaExtractor.java:544)
>      at org.apache.jsp.editjob_jsp._jspService(editjob_jsp.java:3002)
>      at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:70)
>      at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
>      at org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:388)
>      at org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:313)
>      at org.apache.jasper.servlet.JspServlet.service(JspServlet.java:260)
>      at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
>      at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:769)
>      at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
>      at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
>      at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
>      at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
>      at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1125)
>      at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
>      at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
>      at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1059)
>      at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
>      at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
>      at org.eclipse.jetty.server.handler.HandlerList.handle(HandlerList.java:52)
>      at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
>      at org.eclipse.jetty.server.Server.handle(Server.java:497)
>      at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:311)
>      at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:248)
>      at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
>      at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:610)
>      at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:539)
>      at java.base/java.lang.Thread.run(Thread.java:844)
>  [Worker thread '35'] WARN org.apache.tika.parser.microsoft.AbstractPOIFSExtractor - Ignoring unexpected exception while parsing summary entry SummaryInformation
>  java.lang.RuntimeException: java.nio.channels.ClosedByInterruptException
>      at org.apache.poi.poifs.filesystem.NPOIFSStream$StreamBlockByteBufferIterator.<init>(NPOIFSStream.java:151)
>      at org.apache.poi.poifs.filesystem.NPOIFSStream.getBlockIterator(NPOIFSStream.java:95)
>      at org.apache.poi.poifs.filesystem.NPOIFSDocument.getBlockIterator(NPOIFSDocument.java:179)
>      at org.apache.poi.poifs.filesystem.NDocumentInputStream.<init>(NDocumentInputStream.java:82)
>      at org.apache.poi.poifs.filesystem.DocumentInputStream.<init>(DocumentInputStream.java:65)
>      at org.apache.tika.parser.microsoft.SummaryExtractor.parseSummaryEntryIfExists(SummaryExtractor.java:83)
>      at org.apache.tika.parser.microsoft.SummaryExtractor.parseSummaries(SummaryExtractor.java:73)
>      at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:156)
>      at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:132)
>      at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>      at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>      at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
>      at org.apache.manifoldcf.agents.transformation.tika.TikaParser.parse(TikaParser.java:74)
>      at org.apache.manifoldcf.agents.transformation.tika.TikaExtractor.addOrReplaceDocumentWithException(TikaExtractor.java:235)
>      at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddEntryPoint.addOrReplaceDocumentWithException(IncrementalIngester.java:3226)
>      at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddFanout.sendDocument(IncrementalIngester.java:3077)
>      at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineObjectWithVersions.addOrReplaceDocumentWithException(IncrementalIngester.java:2708)
>      at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentIngest(IncrementalIngester.java:756)
>      at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1583)
>      at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1548)
>      at org.apache.manifoldcf.crawler.connectors.filesystem.FileConnector.processDocuments(FileConnector.java:448)
>      at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399)
>  Caused by: java.nio.channels.ClosedByInterruptException
>      at java.base/java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:199)
>      at java.base/sun.nio.ch.FileChannelImpl.size(FileChannelImpl.java:388)
>      at org.apache.poi.poifs.nio.FileBackedDataSource.size(FileBackedDataSource.java:137)
>      at org.apache.poi.poifs.filesystem.NPOIFSFileSystem.getChainLoopDetector(NPOIFSFileSystem.java:627)
>      at org.apache.poi.poifs.filesystem.NPOIFSStream$StreamBlockByteBufferIterator.<init>(NPOIFSStream.java:149)
>      ... 21 more
>  [Worker thread '35'] WARN org.apache.tika.parser.microsoft.AbstractPOIFSExtractor - Ignoring unexpected exception while parsing summary entry DocumentSummaryInformation
>  java.lang.RuntimeException: java.nio.channels.ClosedChannelException
>      at org.apache.poi.poifs.filesystem.NPOIFSStream$StreamBlockByteBufferIterator.<init>(NPOIFSStream.java:151)
>      at org.apache.poi.poifs.filesystem.NPOIFSStream.getBlockIterator(NPOIFSStream.java:95)
>      at org.apache.poi.poifs.filesystem.NPOIFSMiniStore.getBlockAt(NPOIFSMiniStore.java:67)
>      at org.apache.poi.poifs.filesystem.NPOIFSStream$StreamBlockByteBufferIterator.next(NPOIFSStream.java:169)
>      at org.apache.poi.poifs.filesystem.NPOIFSStream$StreamBlockByteBufferIterator.next(NPOIFSStream.java:142)
>      at org.apache.poi.poifs.filesystem.NDocumentInputStream.readFully(NDocumentInputStream.java:264)
>      at org.apache.poi.poifs.filesystem.NDocumentInputStream.read(NDocumentInputStream.java:162)
>      at org.apache.poi.poifs.filesystem.DocumentInputStream.read(DocumentInputStream.java:127)
>      at org.apache.poi.util.BoundedInputStream.read(BoundedInputStream.java:121)
>      at org.apache.poi.util.BoundedInputStream.read(BoundedInputStream.java:103)
>      at org.apache.poi.util.IOUtils.copy(IOUtils.java:312)
>      at org.apache.poi.util.IOUtils.peekFirstNBytes(IOUtils.java:70)
>      at org.apache.poi.hpsf.PropertySet.isPropertySetStream(PropertySet.java:393)
>      at org.apache.poi.hpsf.PropertySet.<init>(PropertySet.java:191)
>      at org.apache.tika.parser.microsoft.SummaryExtractor.parseSummaryEntryIfExists(SummaryExtractor.java:83)
>      at org.apache.tika.parser.microsoft.SummaryExtractor.parseSummaries(SummaryExtractor.java:74)
>      at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:156)
>      at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:132)
>      at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>      at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>      at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
>      at org.apache.manifoldcf.agents.transformation.tika.TikaParser.parse(TikaParser.java:74)
>      at org.apache.manifoldcf.agents.transformation.tika.TikaExtractor.addOrReplaceDocumentWithException(TikaExtractor.java:235)
>      at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddEntryPoint.addOrReplaceDocumentWithException(IncrementalIngester.java:3226)
>      at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddFanout.sendDocument(IncrementalIngester.java:3077)
>      at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineObjectWithVersions.addOrReplaceDocumentWithException(IncrementalIngester.java:2708)
>      at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentIngest(IncrementalIngester.java:756)
>      at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1583)
>      at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1548)
>      at org.apache.manifoldcf.crawler.connectors.filesystem.FileConnector.processDocuments(FileConnector.java:448)
>      at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399)
>  Caused by: java.nio.channels.ClosedChannelException
>      at java.base/sun.nio.ch.FileChannelImpl.ensureOpen(FileChannelImpl.java:158)
>      at java.base/sun.nio.ch.FileChannelImpl.size(FileChannelImpl.java:373)
>      at org.apache.poi.poifs.nio.FileBackedDataSource.size(FileBackedDataSource.java:137)
>      at org.apache.poi.poifs.filesystem.NPOIFSFileSystem.getChainLoopDetector(NPOIFSFileSystem.java:627)
>      at org.apache.poi.poifs.filesystem.NPOIFSStream$StreamBlockByteBufferIterator.<init>(NPOIFSStream.java:149)
>      ... 30 more
>  ```
>  Following up: When these exceptions occur, the heap runs out:
>  13:39:39.856 [Worker thread '49'] WARN org.apache.manifoldcf.jobs - Service interruption reported for job 1532551209410 connection 'file': IO exception: null
>  13:39:39.970 [Worker thread '43'] WARN org.apache.manifoldcf.jobs - Service interruption reported for job 1532551209410 connection 'file': IO exception: null
>  13:39:40.415 [Worker thread '34'] WARN org.apache.manifoldcf.jobs - Service interruption reported for job 1532551209410 connection 'file': IO exception: null
>  13:39:40.469 [Worker thread '1'] WARN org.apache.manifoldcf.jobs - Service interruption reported for job 1532551209410 connection 'file': IO exception: null
>  13:39:43.739 [Worker thread '32'] WARN org.apache.manifoldcf.jobs - Service interruption reported for job 1532551209410 connection 'file': IO exception: null
>  13:39:44.697 [Worker thread '43'] WARN org.apache.manifoldcf.jobs - Service interruption reported for job 1532551209410 connection 'file': IO exception: null
>  13:39:45.756 [Worker thread '33'] WARN org.apache.manifoldcf.jobs - Service interruption reported for job 1532551209410 connection 'file': IO exception: null
>  13:39:45.775 [Worker thread '36'] WARN org.apache.manifoldcf.jobs - Service interruption reported for job 1532551209410 connection 'file': IO exception: null
>  13:39:46.751 [Worker thread '35'] WARN org.apache.manifoldcf.jobs - Service interruption reported for job 1532551209410 connection 'file': IO exception: null
>  13:39:46.753 [Worker thread '40'] WARN org.apache.manifoldcf.jobs - Service interruption reported for job 1532551209410 connection 'file': IO exception: null
>  13:39:47.536 [Worker thread '45'] WARN org.apache.manifoldcf.jobs - Service interruption reported for job 1532551209410 connection 'file': IO exception: null
>  13:39:48.734 [Worker thread '44'] WARN org.apache.manifoldcf.jobs - Service interruption reported for job 1532551209410 connection 'file': IO exception: null
>  13:39:50.922 [Worker thread '30'] WARN org.apache.manifoldcf.jobs - Service interruption reported for job 1532551209410 connection 'file': IO exception: null
>  13:39:54.930 [Worker thread '28'] WARN org.apache.manifoldcf.jobs - Service interruption reported for job 1532551209410 connection 'file': IO exception: null
>  13:40:33.660 [Worker thread '29'] WARN org.apache.manifoldcf.jobs - Service interruption reported for job 1532551209410 connection 'file': IO exception: null
>  agents process ran out of memory - shutting down
>  java.lang.OutOfMemoryError: Java heap space
>      at java.base/java.lang.StringLatin1.newString(StringLatin1.java:549)
>      at java.base/java.lang.StringBuilder.toString(StringBuilder.java:415)
>      at de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler.flushBlock(BoilerpipeHTMLContentHandler.java:341)
>      at de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler.characters(BoilerpipeHTMLContentHandler.java:198)
>      at org.apache.tika.parser.html.BoilerpipeContentHandler.characters(BoilerpipeContentHandler.java:155)
>      at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>      at org.apache.tika.sax.SecureContentHandler.characters(SecureContentHandler.java:270)
>      at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>      at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>      at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>      at org.apache.tika.sax.SafeContentHandler.access$001(SafeContentHandler.java:46)
>      at org.apache.tika.sax.SafeContentHandler$1.write(SafeContentHandler.java:82)
>      at org.apache.tika.sax.SafeContentHandler.filter(SafeContentHandler.java:140)
>      at org.apache.tika.sax.SafeContentHandler.characters(SafeContentHandler.java:287)
>      at org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:279)
>      at org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:306)
>      at org.apache.tika.parser.microsoft.TextCell.render(TextCell.java:34)
>      at org.apache.tika.parser.microsoft.ExcelExtractor$TikaHSSFListener.processSheet(ExcelExtractor.java:609)
>      at org.apache.tika.parser.microsoft.ExcelExtractor$TikaHSSFListener.internalProcessRecord(ExcelExtractor.java:392)
>      at org.apache.tika.parser.microsoft.ExcelExtractor$TikaHSSFListener.processRecord(ExcelExtractor.java:343)
>      at org.apache.poi.hssf.eventusermodel.FormatTrackingHSSFListener.processRecord(FormatTrackingHSSFListener.java:92)
>      at org.apache.poi.hssf.eventusermodel.HSSFRequest.processRecord(HSSFRequest.java:109)
>      at org.apache.poi.hssf.eventusermodel.HSSFEventFactory.genericProcessEvents(HSSFEventFactory.java:179)
>      at org.apache.poi.hssf.eventusermodel.HSSFEventFactory.processEvents(HSSFEventFactory.java:136)
>      at org.apache.tika.parser.microsoft.ExcelExtractor$TikaHSSFListener.processFile(ExcelExtractor.java:319)
>      at org.apache.tika.parser.microsoft.ExcelExtractor.parse(ExcelExtractor.java:170)
>      at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:184)
>      at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:132)
>      at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>      at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>      at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
>      at org.apache.manifoldcf.agents.transformation.tika.TikaParser.parse(TikaParser.java:74)
>  agents process ran out of memory - shutting down
>  java.lang.OutOfMemoryError: Java heap space
>      at java.base/java.util.Arrays.copyOf(Arrays.java:3744)
>      at java.base/java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:146)
>      at java.base/java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:531)
>      at java.base/java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:550)
>      at java.base/java.lang.StringBuilder.append(StringBuilder.java:171)
>      at java.base/java.util.regex.Matcher.appendReplacement(Matcher.java:1002)
>      at java.base/java.util.regex.Matcher.replaceAll(Matcher.java:1181)
>      at de.l3s.boilerpipe.util.UnicodeTokenizer.tokenize(UnicodeTokenizer.java:40)
>      at de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler.flushBlock(BoilerpipeHTMLContentHandler.java:296)
>      at de.l3s.boilerpipe.sax.CommonTagActions$3.end(CommonTagActions.java:143)
>      at de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler.endElement(BoilerpipeHTMLContentHandler.java:183)
>      at org.apache.tika.parser.html.BoilerpipeContentHandler.endElement(BoilerpipeContentHandler.java:175)
>      at org.apache.tika.sax.ContentHandlerDecorator.endElement(ContentHandlerDecorator.java:136)
>      at org.apache.tika.sax.SecureContentHandler.endElement(SecureContentHandler.java:256)
>      at org.apache.tika.sax.ContentHandlerDecorator.endElement(ContentHandlerDecorator.java:136)
>      at org.apache.tika.sax.ContentHandlerDecorator.endElement(ContentHandlerDecorator.java:136)
>      at org.apache.tika.sax.ContentHandlerDecorator.endElement(ContentHandlerDecorator.java:136)
>      at org.apache.tika.sax.SafeContentHandler.endElement(SafeContentHandler.java:273)
>      at org.apache.tika.sax.XHTMLContentHandler.endDocument(XHTMLContentHandler.java:224)
>      at org.apache.tika.parser.txt.TXTParser.parse(TXTParser.java:109)
>      at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>      at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>      at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
>      at org.apache.manifoldcf.agents.transformation.tika.TikaParser.parse(TikaParser.java:74)
>      at org.apache.manifoldcf.agents.transformation.tika.TikaExtractor.addOrReplaceDocumentWithException(TikaExtractor.java:235)
>      at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddEntryPoint.addOrReplaceDocumentWithException(IncrementalIngester.java:3226)
>      at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddFanout.sendDocument(IncrementalIngester.java:3077)
>      at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineObjectWithVersions.addOrReplaceDocumentWithException(IncrementalIngester.java:2708)
>      at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentIngest(IncrementalIngester.java:756)
>      at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1583)
>      at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1548)
>      at org.apache.manifoldcf.crawler.connectors.filesystem.FileConnector.processDocuments(FileConnector.java:448)
>  13:40:33.995 [Worker thread '42'] WARN org.apache.manifoldcf.jobs - Service interruption reported for job 1532551209410 connection 'file': IO exception: null
>  [Thread-475] INFO org.eclipse.jetty.server.ServerConnector - Stopped ServerConnector@5d235104{HTTP/1.1}{0.0.0.0:8345}
>  [Thread-475] INFO org.eclipse.jetty.server.handler.ContextHandler - Stopped o.e.j.w.WebAppContext@6105f8a3{/mcf-api-service,[file:/tmp/jetty-0.0.0.0-8345-mcf-api-service.war-_mcf-api-service-any-9896962439762567079.dir/webapp/,UNAVAILABLE|file:///tmp/jetty-0.0.0.0-8345-mcf-api-service.war-_mcf-api-service-any-9896962439762567079.dir/webapp/,UNAVAILABLE]}{/opt/manifoldcf/manifoldcf_single/././web/war/mcf-api-service.war}
>  [Thread-475] INFO org.eclipse.jetty.server.handler.ContextHandler - Stopped o.e.j.w.WebAppContext@12365c88{/mcf-authority-service,[file:/tmp/jetty-0.0.0.0-8345-mcf-authority-service.war-_mcf-authority-service-any-3954308360064638561.dir/webapp/,UNAVAILABLE|file:///tmp/jetty-0.0.0.0-8345-mcf-authority-service.war-_mcf-authority-service-any-3954308360064638561.dir/webapp/,UNAVAILABLE]}{/opt/manifoldcf/manifoldcf_single/././web/war/mcf-authority-service.war}
>  
>   
>  
>  Follow-up: When these issues occur, the JVM runs out of heap space:
>  
>  13:39:39.856 [Worker thread '49'] WARN org.apache.manifoldcf.jobs - Service 
> interruption reported for job 1532551209410 connection 'file': IO exception: 
> null
>  13:39:39.970 [Worker thread '43'] WARN org.apache.manifoldcf.jobs - Service 
> interruption reported for job 1532551209410 connection 'file': IO exception: 
> null
>  13:39:40.415 [Worker thread '34'] WARN org.apache.manifoldcf.jobs - Service 
> interruption reported for job 1532551209410 connection 'file': IO exception: 
> null
>  13:39:40.469 [Worker thread '1'] WARN org.apache.manifoldcf.jobs - Service 
> interruption reported for job 1532551209410 connection 'file': IO exception: 
> null
>  13:39:43.739 [Worker thread '32'] WARN org.apache.manifoldcf.jobs - Service 
> interruption reported for job 1532551209410 connection 'file': IO exception: 
> null
>  13:39:44.697 [Worker thread '43'] WARN org.apache.manifoldcf.jobs - Service 
> interruption reported for job 1532551209410 connection 'file': IO exception: 
> null
>  13:39:45.756 [Worker thread '33'] WARN org.apache.manifoldcf.jobs - Service 
> interruption reported for job 1532551209410 connection 'file': IO exception: 
> null
>  13:39:45.775 [Worker thread '36'] WARN org.apache.manifoldcf.jobs - Service 
> interruption reported for job 1532551209410 connection 'file': IO exception: 
> null
>  13:39:46.751 [Worker thread '35'] WARN org.apache.manifoldcf.jobs - Service 
> interruption reported for job 1532551209410 connection 'file': IO exception: 
> null
>  13:39:46.753 [Worker thread '40'] WARN org.apache.manifoldcf.jobs - Service 
> interruption reported for job 1532551209410 connection 'file': IO exception: 
> null
>  13:39:47.536 [Worker thread '45'] WARN org.apache.manifoldcf.jobs - Service 
> interruption reported for job 1532551209410 connection 'file': IO exception: 
> null
>  13:39:48.734 [Worker thread '44'] WARN org.apache.manifoldcf.jobs - Service 
> interruption reported for job 1532551209410 connection 'file': IO exception: 
> null
>  13:39:50.922 [Worker thread '30'] WARN org.apache.manifoldcf.jobs - Service 
> interruption reported for job 1532551209410 connection 'file': IO exception: 
> null
>  13:39:54.930 [Worker thread '28'] WARN org.apache.manifoldcf.jobs - Service 
> interruption reported for job 1532551209410 connection 'file': IO exception: 
> null
>  13:40:33.660 [Worker thread '29'] WARN org.apache.manifoldcf.jobs - Service 
> interruption reported for job 1532551209410 connection 'file': IO exception: 
> null
>  agents process ran out of memory - shutting down
>  java.lang.OutOfMemoryError: Java heap space
>  at java.base/java.lang.StringLatin1.newString(StringLatin1.java:549)
>  at java.base/java.lang.StringBuilder.toString(StringBuilder.java:415)
>  at 
> de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler.flushBlock(BoilerpipeHTMLContentHandler.java:341)
>  at 
> de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler.characters(BoilerpipeHTMLContentHandler.java:198)
>  at 
> org.apache.tika.parser.html.BoilerpipeContentHandler.characters(BoilerpipeContentHandler.java:155)
>  at 
> org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>  at 
> org.apache.tika.sax.SecureContentHandler.characters(SecureContentHandler.java:270)
>  at 
> org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>  at 
> org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>  at 
> org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>  at 
> org.apache.tika.sax.SafeContentHandler.access$001(SafeContentHandler.java:46)
>  at org.apache.tika.sax.SafeContentHandler$1.write(SafeContentHandler.java:82)
>  at org.apache.tika.sax.SafeContentHandler.filter(SafeContentHandler.java:140)
>  at 
> org.apache.tika.sax.SafeContentHandler.characters(SafeContentHandler.java:287)
>  at 
> org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:279)
>  at 
> org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:306)
>  at org.apache.tika.parser.microsoft.TextCell.render(TextCell.java:34)
>  at 
> org.apache.tika.parser.microsoft.ExcelExtractor$TikaHSSFListener.processSheet(ExcelExtractor.java:609)
>  at 
> org.apache.tika.parser.microsoft.ExcelExtractor$TikaHSSFListener.internalProcessRecord(ExcelExtractor.java:392)
>  at 
> org.apache.tika.parser.microsoft.ExcelExtractor$TikaHSSFListener.processRecord(ExcelExtractor.java:343)
>  at 
> org.apache.poi.hssf.eventusermodel.FormatTrackingHSSFListener.processRecord(FormatTrackingHSSFListener.java:92)
>  at 
> org.apache.poi.hssf.eventusermodel.HSSFRequest.processRecord(HSSFRequest.java:109)
>  at 
> org.apache.poi.hssf.eventusermodel.HSSFEventFactory.genericProcessEvents(HSSFEventFactory.java:179)
>  at 
> org.apache.poi.hssf.eventusermodel.HSSFEventFactory.processEvents(HSSFEventFactory.java:136)
>  at 
> org.apache.tika.parser.microsoft.ExcelExtractor$TikaHSSFListener.processFile(ExcelExtractor.java:319)
>  at 
> org.apache.tika.parser.microsoft.ExcelExtractor.parse(ExcelExtractor.java:170)
>  at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:184)
>  at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:132)
>  at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>  at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>  at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
>  at 
> org.apache.manifoldcf.agents.transformation.tika.TikaParser.parse(TikaParser.java:74)
>  agents process ran out of memory - shutting down
>  java.lang.OutOfMemoryError: Java heap space
>  at java.base/java.util.Arrays.copyOf(Arrays.java:3744)
>  at 
> java.base/java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:146)
>  at 
> java.base/java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:531)
>  at 
> java.base/java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:550)
>  at java.base/java.lang.StringBuilder.append(StringBuilder.java:171)
>  at java.base/java.util.regex.Matcher.appendReplacement(Matcher.java:1002)
>  at java.base/java.util.regex.Matcher.replaceAll(Matcher.java:1181)
>  at de.l3s.boilerpipe.util.UnicodeTokenizer.tokenize(UnicodeTokenizer.java:40)
>  at 
> de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler.flushBlock(BoilerpipeHTMLContentHandler.java:296)
>  at de.l3s.boilerpipe.sax.CommonTagActions$3.end(CommonTagActions.java:143)
>  at 
> de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler.endElement(BoilerpipeHTMLContentHandler.java:183)
>  at 
> org.apache.tika.parser.html.BoilerpipeContentHandler.endElement(BoilerpipeContentHandler.java:175)
>  at 
> org.apache.tika.sax.ContentHandlerDecorator.endElement(ContentHandlerDecorator.java:136)
>  at 
> org.apache.tika.sax.SecureContentHandler.endElement(SecureContentHandler.java:256)
>  at 
> org.apache.tika.sax.ContentHandlerDecorator.endElement(ContentHandlerDecorator.java:136)
>  at 
> org.apache.tika.sax.ContentHandlerDecorator.endElement(ContentHandlerDecorator.java:136)
>  at 
> org.apache.tika.sax.ContentHandlerDecorator.endElement(ContentHandlerDecorator.java:136)
>  at 
> org.apache.tika.sax.SafeContentHandler.endElement(SafeContentHandler.java:273)
>  at 
> org.apache.tika.sax.XHTMLContentHandler.endDocument(XHTMLContentHandler.java:224)
>  at org.apache.tika.parser.txt.TXTParser.parse(TXTParser.java:109)
>  at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>  at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>  at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
>  at 
> org.apache.manifoldcf.agents.transformation.tika.TikaParser.parse(TikaParser.java:74)
>  at 
> org.apache.manifoldcf.agents.transformation.tika.TikaExtractor.addOrReplaceDocumentWithException(TikaExtractor.java:235)
>  at 
> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddEntryPoint.addOrReplaceDocumentWithException(IncrementalIngester.java:3226)
>  at 
> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddFanout.sendDocument(IncrementalIngester.java:3077)
>  at 
> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineObjectWithVersions.addOrReplaceDocumentWithException(IncrementalIngester.java:2708)
>  at 
> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentIngest(IncrementalIngester.java:756)
>  at 
> org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1583)
>  at 
> org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1548)
>  at 
> org.apache.manifoldcf.crawler.connectors.filesystem.FileConnector.processDocuments(FileConnector.java:448)
>  13:40:33.995 [Worker thread '42'] WARN org.apache.manifoldcf.jobs - Service 
> interruption reported for job 1532551209410 connection 'file': IO exception: 
> null
>  [Thread-475] INFO org.eclipse.jetty.server.ServerConnector - Stopped 
> ServerConnector@5d235104{HTTP/1.1}{0.0.0.0:8345}
>  [Thread-475] INFO org.eclipse.jetty.server.handler.ContextHandler - Stopped 
> o.e.j.w.WebAppContext@6105f8a3{/mcf-api-service,[file:/tmp/jetty-0.0.0.0-8345-mcf-api-service.war-_mcf-api-service-any-9896962439762567079.dir/webapp/,UNAVAILABLE|file:///tmp/jetty-0.0.0.0-8345-mcf-api-service.war-_mcf-api-service-any-9896962439762567079.dir/webapp/,UNAVAILABLE]}{/opt/manifoldcf/manifoldcf_single/././web/war/mcf-api-service.war}
>  [Thread-475] INFO org.eclipse.jetty.server.handler.ContextHandler - Stopped 
> o.e.j.w.WebAppContext@12365c88{/mcf-authority-service,[file:/tmp/jetty-0.0.0.0-8345-mcf-authority-service.war-_mcf-authority-service-any-3954308360064638561.dir/webapp/,UNAVAILABLE|file:///tmp/jetty-0.0.0.0-8345-mcf-authority-service.war-_mcf-authority-service-any-3954308360064638561.dir/webapp/,UNAVAILABLE]}
> {/opt/manifoldcf/manifoldcf_single/././web/war/mcf-authority-service.war}
>  This occurs when the ES connector reports this error:
> |07-26-2018 19:34:25.356|Indexation 
> (ES)|file:/var/manifoldcf/corpus/000640.html|CLIENTPROTOCOLEXCEPTION|46190|9|



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
