[ https://issues.apache.org/jira/browse/TIKA-4208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17827512#comment-17827512 ]
Gregory Lepore commented on TIKA-4208:
--------------------------------------

Hmm, here's what I get:

java -Xmx6g -jar ../tika.jar -J -t table23.sas7bdat
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at java.base/java.util.Arrays.copyOfRange(Arrays.java:4030)
    at java.base/java.lang.StringLatin1.newString(StringLatin1.java:715)
    at java.base/java.lang.StringLatin1.trim(StringLatin1.java:541)
    at java.base/java.lang.String.trim(String.java:2644)
    at org.apache.tika.sax.RecursiveParserWrapperHandler.addContent(RecursiveParserWrapperHandler.java:148)
    at org.apache.tika.sax.RecursiveParserWrapperHandler.endDocument(RecursiveParserWrapperHandler.java:120)
    at org.apache.tika.parser.RecursiveParserWrapper.parse(RecursiveParserWrapper.java:180)
    at org.apache.tika.cli.TikaCLI.handleRecursiveJson(TikaCLI.java:518)
    at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:489)
    at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:256)

java -Xmx6g -jar ../tika.jar -J -r table23.sas7bdat
Exception in thread "main" java.lang.OutOfMemoryError: Requested array size exceeds VM limit
    at java.base/java.util.Arrays.copyOf(Arrays.java:3745)
    at java.base/java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:172)
    at java.base/java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:686)
    at java.base/java.lang.StringBuffer.append(StringBuffer.java:414)
    at java.base/java.io.StringWriter.write(StringWriter.java:99)
    at org.apache.tika.sax.ToTextContentHandler.characters(ToTextContentHandler.java:96)
    at org.apache.tika.sax.ToXMLContentHandler.write(ToXMLContentHandler.java:181)
    at org.apache.tika.sax.ToXMLContentHandler.endElement(ToXMLContentHandler.java:140)
    at org.apache.tika.parser.RecursiveParserWrapper$RecursivelySecureContentHandler.endElement(RecursiveParserWrapper.java:360)
    at org.apache.tika.sax.ContentHandlerDecorator.endElement(ContentHandlerDecorator.java:134)
    at org.apache.tika.sax.SecureContentHandler.endElement(SecureContentHandler.java:241)
    at org.apache.tika.sax.ContentHandlerDecorator.endElement(ContentHandlerDecorator.java:134)
    at org.apache.tika.sax.ContentHandlerDecorator.endElement(ContentHandlerDecorator.java:134)
    at org.apache.tika.sax.ContentHandlerDecorator.endElement(ContentHandlerDecorator.java:134)
    at org.apache.tika.sax.SafeContentHandler.endElement(SafeContentHandler.java:201)
    at org.apache.tika.sax.XHTMLContentHandler.endElement(XHTMLContentHandler.java:257)
    at org.apache.tika.sax.XHTMLContentHandler.endElement(XHTMLContentHandler.java:290)
    at org.apache.tika.parser.sas.SAS7BDATParser.parse(SAS7BDATParser.java:147)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:298)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:298)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:203)
    at org.apache.tika.parser.RecursiveParserWrapper.parse(RecursiveParserWrapper.java:164)
    at org.apache.tika.cli.TikaCLI.handleRecursiveJson(TikaCLI.java:518)
    at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:489)
    at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:256)

Differences in operating systems? I'm on Linux.

> OOM error in SAS7BDATParser
> ---------------------------
>
> Key: TIKA-4208
> URL: https://issues.apache.org/jira/browse/TIKA-4208
> Project: Tika
> Issue Type: Bug
> Affects Versions: 3.0.0-BETA
> Reporter: Gregory Lepore
> Priority: Minor
> Attachments: table23.sas7bdat.zip
>
>
> For this ARC file:
> [https://eotarchive.s3.amazonaws.com/crawl-data/EOT-2004/segments/NARA-000/warc/NARA-PEOT-2004-20041019023240-02598-crawling008-c_NARA-PEOT-2004-20041019053819-01693-crawling007.archive.org.arc.gz]
> I'm getting an OOM error:
> Exception in thread "main" java.lang.OutOfMemoryError: Requested array size exceeds VM limit
>     at java.base/java.util.Arrays.copyOf(Arrays.java:3537)
>     at java.base/java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:228)
>     at java.base/java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:740)
>     at java.base/java.lang.StringBuffer.append(StringBuffer.java:410)
>     at java.base/java.io.StringWriter.write(StringWriter.java:99)
>     at org.apache.tika.sax.ToTextContentHandler.characters(ToTextContentHandler.java:96)
>     at org.apache.tika.sax.ToXMLContentHandler.writeEscaped(ToXMLContentHandler.java:229)
>     at org.apache.tika.sax.ToXMLContentHandler.characters(ToXMLContentHandler.java:154)
>     at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:143)
>     at org.apache.tika.sax.SecureContentHandler.characters(SecureContentHandler.java:253)
>     at org.apache.tika.parser.RecursiveParserWrapper$RecursivelySecureContentHandler.characters(RecursiveParserWrapper.java:370)
>     at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:143)
>     at org.apache.tika.sax.SecureContentHandler.characters(SecureContentHandler.java:253)
>     at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:143)
>     at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:143)
>     at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:143)
>     at org.apache.tika.sax.SafeContentHandler.access$101(SafeContentHandler.java:47)
>     at org.apache.tika.sax.SafeContentHandler.lambda$new$0(SafeContentHandler.java:57)
>     at org.apache.tika.sax.SafeContentHandler$$Lambda$327/0x00007f94a022d1a8.write(Unknown Source)
>     at org.apache.tika.sax.SafeContentHandler.filter(SafeContentHandler.java:106)
>     at org.apache.tika.sax.SafeContentHandler.characters(SafeContentHandler.java:250)
>     at org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:270)
>     at org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:295)
>     at org.apache.tika.parser.sas.SAS7BDATParser.parse(SAS7BDATParser.java:146)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:298)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:298)
>     at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:203)
>     at org.apache.tika.parser.ParserDecorator.parse(ParserDecorator.java:153)
>     at org.apache.tika.parser.RecursiveParserWrapper$EmbeddedParserDecorator.parse(RecursiveParserWrapper.java:259)
>     at org.apache.tika.parser.DelegatingParser.parse(DelegatingParser.java:71)
>     at org.apache.tika.extractor.ParsingEmbeddedDocumentExtractor.parseEmbedded(ParsingEmbeddedDocumentExtractor.java:109)
>     at org.apache.tika.parser.pkg.PackageParser.parseEntry(PackageParser.java:455)
> when extracting JSON with both the app and server version of 3.0.0 BETA.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)