https://bz.apache.org/bugzilla/show_bug.cgi?id=66197

--- Comment #3 from earl <[email protected]> ---
The error above occurred during command-line execution. We actually use the
Tika parser (OfficeParser in this case) to parse documents. While parsing a
.doc file of about 460 MB with a heap size of about 1024 MB, an
OutOfMemoryError occurred. I am including that stack trace as well.
Stacktrace:
  at java.lang.OutOfMemoryError.<init>()V (OutOfMemoryError.java:48)
  at java.util.Arrays.copyOf([BI)[B (Arrays.java:3236)
  at java.io.ByteArrayOutputStream.toByteArray()[B (ByteArrayOutputStream.java:191)
  at org.apache.poi.util.IOUtils.toByteArray(Ljava/io/InputStream;JI)[B (IOUtils.java:199)
  at org.apache.poi.util.IOUtils.toByteArray(Ljava/io/InputStream;I)[B (IOUtils.java:149)
  at org.apache.poi.hwpf.HWPFDocumentCore.getDocumentEntryBytes(Ljava/lang/String;II)[B (HWPFDocumentCore.java:331)
  at org.apache.poi.hwpf.HWPFDocumentCore.<init>(Lorg/apache/poi/poifs/filesystem/DirectoryNode;)V (HWPFDocumentCore.java:169)
  at org.apache.poi.hwpf.HWPFDocument.<init>(Lorg/apache/poi/poifs/filesystem/DirectoryNode;)V (HWPFDocument.java:193)
  at org.apache.tika.parser.microsoft.WordExtractor.parse(Lorg/apache/poi/poifs/filesystem/DirectoryNode;Lorg/apache/tika/sax/XHTMLContentHandler;)V (WordExtractor.java:152)
  at org.apache.tika.parser.microsoft.OfficeParser.parse(Lorg/apache/poi/poifs/filesystem/DirectoryNode;Lorg/apache/tika/parser/ParseContext;Lorg/apache/tika/metadata/Metadata;Lorg/apache/tika/sax/XHTMLContentHandler;)V (OfficeParser.java:216)
  at org.apache.tika.parser.microsoft.OfficeParser.parse(Ljava/io/InputStream;Lorg/xml/sax/ContentHandler;Lorg/apache/tika/metadata/Metadata;Lorg/apache/tika/parser/ParseContext;)V (OfficeParser.java:173)
  at org.apache.tika.parser.CompositeParser.parse(Ljava/io/InputStream;Lorg/xml/sax/ContentHandler;Lorg/apache/tika/metadata/Metadata;Lorg/apache/tika/parser/ParseContext;)V (CompositeParser.java:289)
  at org.apache.tika.parser.CompositeParser.parse(Ljava/io/InputStream;Lorg/xml/sax/ContentHandler;Lorg/apache/tika/metadata/Metadata;Lorg/apache/tika/parser/ParseContext;)V (CompositeParser.java:289)
  at org.apache.tika.parser.AutoDetectParser.parse(Ljava/io/InputStream;Lorg/xml/sax/ContentHandler;Lorg/apache/tika/metadata/Metadata;Lorg/apache/tika/parser/ParseContext;)V (AutoDetectParser.java:150)

In the dominator tree, the thread that occupies the most memory holds a byte
array (size=173960244) with the following data:
.............................s!...bjbjS)S)......................4l^.1C.g1C.g.k!.......................................................................................................................................................................8...L...,...x0..................R...4.......4.......4.......4.......4.......#.......#.......#...........................................................$...Y...........X...................................#.......................#.......#.......#.......#.......................................4...............4...............C.......C.......C.......#...............4...............4.......................C.......................................................#.......................C.......C...........t...........................................................................d.......4..................FQ...................3.......T...............r...........0...........\.......g.......C.......g.......d.....................................................................
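
For context on why the top frames matter: ByteArrayOutputStream.toByteArray()
returns a copy of its internal buffer (the Arrays.copyOf frame in the trace),
so at that moment both the buffer and the copy are live. The sketch below is
only an illustration using the JDK. It assumes, without confirmation from the
heap dump, that the 173960244-byte array is the payload being copied. The
class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

public class ToByteArrayCopyDemo {

    /**
     * Lower bound on the bytes live when toByteArray() returns:
     * the stream's internal buffer (at least payloadBytes) plus the copy.
     * The internal buffer may be larger than the payload, so the real
     * figure can be higher.
     */
    public static long minimumPeakBytes(long payloadBytes) {
        return 2L * payloadBytes;
    }

    /** Checks on a tiny payload that toByteArray() hands back a fresh array each time. */
    public static boolean returnsIndependentCopy() {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(new byte[] {1, 2, 3}, 0, 3);
        byte[] first = out.toByteArray();
        byte[] second = out.toByteArray();
        first[0] = 42; // mutating one copy must not affect the other
        return first != second && Arrays.equals(second, new byte[] {1, 2, 3});
    }

    public static void main(String[] args) {
        long payload = 173_960_244L; // array size reported in the dominator tree
        long peak = minimumPeakBytes(payload);
        System.out.printf("payload: %d bytes, transient minimum: %d bytes (%.1f MiB)%n",
                payload, peak, peak / (1024.0 * 1024.0));
        System.out.println("toByteArray() returns a copy: " + returnsIndependentCopy());
    }
}
```

Under that assumption, this one call needs roughly twice the array size on
top of whatever else the parser already holds in the 1024 MB heap.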

I'm sorry I didn't ask the question clearly initially.
