[ https://issues.apache.org/jira/browse/CONNECTORS-1593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Karl Wright resolved CONNECTORS-1593.
-------------------------------------
    Resolution: Not A Problem

Wasn't a ManifoldCF problem, but rather a corrupt document being constructed by the source repository

> Memory issue on org.apache.fontbox.ttf.GlyphSubstitutionTable.readLangSysTable(GlyphSubstitutionTable.java:147)
> ---------------------------------------------------------------------------------------------------------------
>
>                 Key: CONNECTORS-1593
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-1593
>             Project: ManifoldCF
>          Issue Type: Bug
>          Components: Tika extractor
>    Affects Versions: ManifoldCF 2.12
>            Reporter: Donald Van den Driessche
>            Assignee: Karl Wright
>            Priority: Major
>         Attachments: image-2019-03-22-08-57-53-887.png
>
>
> I have created an Issue with fontbox too:
>
> When using the internal Tika extractor in a Manifold Job on certain occasions I get an Out of Memory Error.
> {code:java}
> Mar 16 14:20:06 manifold01 manifoldcf[15747]: agents process ran out of memory - shutting down
> Mar 16 14:20:06 manifold01 manifoldcf[15747]: java.lang.OutOfMemoryError: Java heap space
> Mar 16 14:20:06 manifold01 manifoldcf[15747]: at org.apache.fontbox.ttf.GlyphSubstitutionTable.readLangSysTable(GlyphSubstitutionTable.java:147)
> Mar 16 14:20:06 manifold01 manifoldcf[15747]: at org.apache.fontbox.ttf.GlyphSubstitutionTable.readScriptTable(GlyphSubstitutionTable.java:129)
> Mar 16 14:20:06 manifold01 manifoldcf[15747]: at org.apache.fontbox.ttf.GlyphSubstitutionTable.readScriptList(GlyphSubstitutionTable.java:98)
> Mar 16 14:20:06 manifold01 manifoldcf[15747]: at org.apache.fontbox.ttf.GlyphSubstitutionTable.read(GlyphSubstitutionTable.java:78)
> Mar 16 14:20:06 manifold01 manifoldcf[15747]: at org.apache.fontbox.ttf.TrueTypeFont.readTable(TrueTypeFont.java:349)
> Mar 16 14:20:06 manifold01 manifoldcf[15747]: at org.apache.fontbox.ttf.TTFParser.parseTables(TTFParser.java:173)
> Mar 16 14:20:06 manifold01 manifoldcf[15747]: at org.apache.fontbox.ttf.TTFParser.parse(TTFParser.java:150)
> Mar 16 14:20:06 manifold01 manifoldcf[15747]: at org.apache.fontbox.ttf.TTFParser.parse(TTFParser.java:106)
> Mar 16 14:20:06 manifold01 manifoldcf[15747]: at org.apache.pdfbox.pdmodel.font.PDTrueTypeFont.<init>(PDTrueTypeFont.java:199)
> Mar 16 14:20:06 manifold01 manifoldcf[15747]: at org.apache.pdfbox.pdmodel.font.PDFontFactory.createFont(PDFontFactory.java:75)
> Mar 16 14:20:06 manifold01 manifoldcf[15747]: at org.apache.pdfbox.pdmodel.PDResources.getFont(PDResources.java:146)
> Mar 16 14:20:06 manifold01 manifoldcf[15747]: at org.apache.pdfbox.contentstream.operator.text.SetFontAndSize.process(SetFontAndSize.java:60)
> Mar 16 14:20:06 manifold01 manifoldcf[15747]: at org.apache.pdfbox.contentstream.PDFStreamEngine.processOperator(PDFStreamEngine.java:848)
> Mar 16 14:20:06 manifold01 manifoldcf[15747]: at org.apache.pdfbox.contentstream.PDFStreamEngine.processStreamOperators(PDFStreamEngine.java:503)
> Mar 16 14:20:06 manifold01 manifoldcf[15747]: at org.apache.pdfbox.contentstream.PDFStreamEngine.processStream(PDFStreamEngine.java:477)
> Mar 16 14:20:06 manifold01 manifoldcf[15747]: at org.apache.pdfbox.contentstream.PDFStreamEngine.processPage(PDFStreamEngine.java:150)
> Mar 16 14:20:06 manifold01 manifoldcf[15747]: at org.apache.pdfbox.text.LegacyPDFStreamEngine.processPage(LegacyPDFStreamEngine.java:139)
> Mar 16 14:20:06 manifold01 manifoldcf[15747]: at org.apache.pdfbox.text.PDFTextStripper.processPage(PDFTextStripper.java:391)
> Mar 16 14:20:06 manifold01 manifoldcf[15747]: at org.apache.tika.parser.pdf.PDF2XHTML.processPage(PDF2XHTML.java:147)
> Mar 16 14:20:06 manifold01 manifoldcf[15747]: at org.apache.pdfbox.text.PDFTextStripper.processPages(PDFTextStripper.java:319)
> Mar 16 14:20:06 manifold01 manifoldcf[15747]: at org.apache.pdfbox.text.PDFTextStripper.writeText(PDFTextStripper.java:266)
> Mar 16 14:20:06 manifold01 manifoldcf[15747]: at org.apache.tika.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:117)
> Mar 16 14:20:06 manifold01 manifoldcf[15747]: at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:172)
> Mar 16 14:20:06 manifold01 manifoldcf[15747]: at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
> Mar 16 14:20:06 manifold01 manifoldcf[15747]: at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
> Mar 16 14:20:06 manifold01 manifoldcf[15747]: at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
> Mar 16 14:20:06 manifold01 manifoldcf[15747]: at org.apache.manifoldcf.agents.transformation.tika.TikaParser.parse(TikaParser.java:74)
> Mar 16 14:20:06 manifold01 manifoldcf[15747]: at org.apache.manifoldcf.agents.transformation.tika.TikaExtractor.addOrReplaceDocumentWithException(TikaExtractor.java:235)
> Mar 16 14:20:06 manifold01 manifoldcf[15747]: at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddEntryPoint.addOrReplaceDocumentWithException(IncrementalIngester.java:3226)
> Mar 16 14:20:06 manifold01 manifoldcf[15747]: at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddFanout.sendDocument(IncrementalIngester.java:3077)
> Mar 16 14:20:06 manifold01 manifoldcf[15747]: at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$MonitoredAddActivityWrapper.sendDocument(IncrementalIngester.java:3471)
> Mar 16 14:20:06 manifold01 manifoldcf[15747]: at digital.formica.manifold.connector.transformation.fetchwebresource.WebresourceFetchTransformationConnector.addOrReplaceDocumentWithException(WebresourceFetchTransformationConnector.java:118)
> {code}
> I've allocated 8g of heap size, Installed the latest version of Tika (1.20) and PDFBOX (2.0.14).
> But no solutions found.
> After a heap dump and analyzing this dump, I notice that it is the Integer class that takes about 2.6g of memory.
> Any suggestions?

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)