[ 
https://issues.apache.org/jira/browse/TIKA-4208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17824879#comment-17824879
 ] 

Gregory Lepore commented on TIKA-4208:
--------------------------------------

java -Xmx4G -Xms4G -jar ../tika.jar file.arc.gz 
 
works, but
 
java -Xmx4G -Xms4G -jar ../tika.jar -J file.arc.gz
 
throws the error. So do all higher values for Xmx and Xms (up to 32GB each) 
when used in conjunction with JSON output.
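
A possible reading of why raising Xmx makes no difference: the message in the 
trace is "Requested array size exceeds VM limit", which HotSpot reports when a 
single array would exceed the JVM's maximum array length, not when the heap is 
full. The trace shows a StringWriter growing inside 
AbstractStringBuilder.ensureCapacityInternal, so one buffer seems to be 
collecting all of the extracted text. Below is a minimal sketch of the general 
idea of capping such a buffer. BoundedWriter and its methods are hypothetical 
names made up for illustration. This is not Tika code and not a proposed patch.

```java
import java.io.Writer;

/**
 * Hypothetical helper (not part of Tika): a Writer that keeps at most
 * maxChars characters and silently drops everything past that cap, so the
 * backing buffer can never grow toward the JVM's array length limit.
 */
final class BoundedWriter extends Writer {
    private final StringBuilder buffer = new StringBuilder();
    private final int maxChars;
    private boolean truncated = false;

    BoundedWriter(int maxChars) {
        if (maxChars < 0) {
            throw new IllegalArgumentException("maxChars must be >= 0");
        }
        this.maxChars = maxChars;
    }

    @Override
    public void write(char[] cbuf, int off, int len) {
        int remaining = maxChars - buffer.length();
        if (len > remaining) {
            // Record that input was dropped, then keep only what still fits.
            truncated = true;
            len = remaining;
        }
        if (len > 0) {
            buffer.append(cbuf, off, len);
        }
    }

    @Override
    public void flush() {
        // Nothing to flush: all text is held in memory.
    }

    @Override
    public void close() {
        // Nothing to release.
    }

    /** True once any input has been dropped because the cap was reached. */
    boolean isTruncated() {
        return truncated;
    }

    @Override
    public String toString() {
        return buffer.toString();
    }
}
```

With a cap of 5, writing "hello world" leaves "hello" in the buffer and sets 
the truncated flag. Whether the -J code path can be given such a limit is a 
separate question that this sketch does not answer.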

> OOM error in SAS7BDATParser
> ---------------------------
>
>                 Key: TIKA-4208
>                 URL: https://issues.apache.org/jira/browse/TIKA-4208
>             Project: Tika
>          Issue Type: Bug
>    Affects Versions: 3.0.0-BETA
>            Reporter: Gregory Lepore
>            Priority: Minor
>
> For this ARC file:
> [https://eotarchive.s3.amazonaws.com/crawl-data/EOT-2004/segments/NARA-000/warc/NARA-PEOT-2004-20041019023240-02598-crawling008-c_NARA-PEOT-2004-20041019053819-01693-crawling007.archive.org.arc.gz]
> I'm getting an OOM error:
> Exception in thread "main" java.lang.OutOfMemoryError: Requested array size exceeds VM limit
>        at java.base/java.util.Arrays.copyOf(Arrays.java:3537)
>        at java.base/java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:228)
>        at java.base/java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:740)
>        at java.base/java.lang.StringBuffer.append(StringBuffer.java:410)
>        at java.base/java.io.StringWriter.write(StringWriter.java:99)
>        at org.apache.tika.sax.ToTextContentHandler.characters(ToTextContentHandler.java:96)
>        at org.apache.tika.sax.ToXMLContentHandler.writeEscaped(ToXMLContentHandler.java:229)
>        at org.apache.tika.sax.ToXMLContentHandler.characters(ToXMLContentHandler.java:154)
>        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:143)
>        at org.apache.tika.sax.SecureContentHandler.characters(SecureContentHandler.java:253)
>        at org.apache.tika.parser.RecursiveParserWrapper$RecursivelySecureContentHandler.characters(RecursiveParserWrapper.java:370)
>        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:143)
>        at org.apache.tika.sax.SecureContentHandler.characters(SecureContentHandler.java:253)
>        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:143)
>        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:143)
>        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:143)
>        at org.apache.tika.sax.SafeContentHandler.access$101(SafeContentHandler.java:47)
>        at org.apache.tika.sax.SafeContentHandler.lambda$new$0(SafeContentHandler.java:57)
>        at org.apache.tika.sax.SafeContentHandler$$Lambda$327/0x00007f94a022d1a8.write(Unknown Source)
>        at org.apache.tika.sax.SafeContentHandler.filter(SafeContentHandler.java:106)
>        at org.apache.tika.sax.SafeContentHandler.characters(SafeContentHandler.java:250)
>        at org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:270)
>        at org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:295)
>        at org.apache.tika.parser.sas.SAS7BDATParser.parse(SAS7BDATParser.java:146)
>        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:298)
>        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:298)
>        at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:203)
>        at org.apache.tika.parser.ParserDecorator.parse(ParserDecorator.java:153)
>        at org.apache.tika.parser.RecursiveParserWrapper$EmbeddedParserDecorator.parse(RecursiveParserWrapper.java:259)
>        at org.apache.tika.parser.DelegatingParser.parse(DelegatingParser.java:71)
>        at org.apache.tika.extractor.ParsingEmbeddedDocumentExtractor.parseEmbedded(ParsingEmbeddedDocumentExtractor.java:109)
>        at org.apache.tika.parser.pkg.PackageParser.parseEntry(PackageParser.java:455)
> when extracting JSON with both the app and server versions of 3.0.0 BETA.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
