[jira] [Commented] (TIKA-3809) OutOfMemoryError occurs while reading doc file

2022-07-05 Thread Nick Burch (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17562484#comment-17562484
 ] 

Nick Burch commented on TIKA-3809:
--

If the uncompressed XML is 250 MB, then you're going to need a heap a lot 
bigger than 750 MB (only 3x the uncompressed size) if you want to use the 
DOM-based parsers. I'd try about 3 GB (a bit over 10x) and be prepared to go 
up to about 20x the uncompressed size for your heap.
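
To make the arithmetic concrete, here is a minimal sketch of those multipliers applied to the 250 MB figure from the report. The class and method names are made up for illustration, and the multipliers are rules of thumb, not measured requirements.

```java
public class HeapEstimate {
    /** Suggested heap in MB: uncompressed XML size times a rule-of-thumb multiplier. */
    static long heapMb(long uncompressedMb, int multiplier) {
        return uncompressedMb * multiplier;
    }

    public static void main(String[] args) {
        long uncompressedMb = 250;       // uncompressed size reported in the issue
        int[] multipliers = {3, 12, 20}; // observed usage, first attempt, upper bound
        for (int m : multipliers) {
            System.out.println(m + "x -> " + heapMb(uncompressedMb, m) + " MB");
        }
    }
}
```

The maximum heap itself is set with the standard JVM option, for example `java -Xmx3g -jar tika-app-1.23.jar ...` for the 3 GB starting point.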

> OutOfMemoryError occurs while reading doc file
> --
>
> Key: TIKA-3809
> URL: https://issues.apache.org/jira/browse/TIKA-3809
> Project: Tika
> Issue Type: Bug
> Components: app
> Affects Versions: 1.23
> Reporter: earl
> Priority: Blocker
>
> An OutOfMemoryError occurs while parsing a docx file of size 8 MB 
> (uncompressed size 250 MB). Analyzing the heap dump (.hprof) shows that the 
> thread that parses the file consumes about 750 MB of heap. Looking at the 
> dominator tree, 
> {code:java}
> org.apache.xmlbeans.impl.store.Xobj$ElementXobj
> {code}
>  This object has been created many times!
> I've also attached the stack trace:
> {code:java}
>   at org.apache.xmlbeans.impl.store.Cur.createElementXobj(Lorg/apache/xmlbeans/impl/store/Locale;Ljavax/xml/namespace/QName;Ljavax/xml/namespace/QName;)Lorg/apache/xmlbeans/impl/store/Xobj; (Cur.java:260)
>   at org.apache.xmlbeans.impl.store.Cur$CurLoadContext.startElement(Ljavax/xml/namespace/QName;)V (Cur.java:2997)
>   at org.apache.xmlbeans.impl.store.Locale$SaxHandler.startElement(Ljava/lang/String;Ljava/lang/String;Ljava/lang/String;Lorg/xml/sax/Attributes;)V (Locale.java:3164)
>   at org.apache.xerces.parsers.AbstractSAXParser.startElement(Lorg/apache/xerces/xni/QName;Lorg/apache/xerces/xni/XMLAttributes;Lorg/apache/xerces/xni/Augmentations;)V (Unknown Source)
>   at org.apache.xerces.parsers.AbstractXMLDocumentParser.emptyElement(Lorg/apache/xerces/xni/QName;Lorg/apache/xerces/xni/XMLAttributes;Lorg/apache/xerces/xni/Augmentations;)V (Unknown Source)
>   at org.apache.xerces.impl.XMLNSDocumentScannerImpl.scanStartElement()Z (Unknown Source)
>   at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Z)Z (Unknown Source)
>   at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Z)Z (Unknown Source)
>   at org.apache.xerces.parsers.XML11Configuration.parse(Z)Z (Unknown Source)
>   at org.apache.xerces.parsers.XML11Configuration.parse(Lorg/apache/xerces/xni/parser/XMLInputSource;)V (Unknown Source)
>   at org.apache.xerces.parsers.XMLParser.parse(Lorg/apache/xerces/xni/parser/XMLInputSource;)V (Unknown Source)
>   at org.apache.xerces.parsers.AbstractSAXParser.parse(Lorg/xml/sax/InputSource;)V (Unknown Source)
>   at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Lorg/xml/sax/InputSource;)V (Unknown Source)
>   at org.apache.xmlbeans.impl.store.Locale$SaxLoader.load(Lorg/apache/xmlbeans/impl/store/Locale;Lorg/xml/sax/InputSource;Lorg/apache/xmlbeans/XmlOptions;)Lorg/apache/xmlbeans/impl/store/Cur; (Locale.java:3422)
>   at org.apache.xmlbeans.impl.store.Locale.parseToXmlObject(Ljava/io/InputStream;Lorg/apache/xmlbeans/SchemaType;Lorg/apache/xmlbeans/XmlOptions;)Lorg/apache/xmlbeans/XmlObject; (Locale.java:1272)
>   at org.apache.xmlbeans.impl.store.Locale.parseToXmlObject(Lorg/apache/xmlbeans/SchemaTypeLoader;Ljava/io/InputStream;Lorg/apache/xmlbeans/SchemaType;Lorg/apache/xmlbeans/XmlOptions;)Lorg/apache/xmlbeans/XmlObject; (Locale.java:1259)
>   at org.apache.xmlbeans.impl.schema.SchemaTypeLoaderBase.parse(Ljava/io/InputStream;Lorg/apache/xmlbeans/SchemaType;Lorg/apache/xmlbeans/XmlOptions;)Lorg/apache/xmlbeans/XmlObject; (SchemaTypeLoaderBase.java:345)
>   at org.openxmlformats.schemas.wordprocessingml.x2006.main.DocumentDocument$Factory.parse(Ljava/io/InputStream;Lorg/apache/xmlbeans/XmlOptions;)Lorg/openxmlformats/schemas/wordprocessingml/x2006/main/DocumentDocument; (Unknown Source)
>   at org.apache.poi.xwpf.usermodel.XWPFDocument.onDocumentRead()V (XWPFDocument.java:178)
>   at org.apache.poi.ooxml.POIXMLDocument.load(Lorg/apache/poi/ooxml/POIXMLFactory;)V (POIXMLDocument.java:184)
>   at org.apache.poi.xwpf.usermodel.XWPFDocument.<init>(Lorg/apache/poi/openxml4j/opc/OPCPackage;)V (XWPFDocument.java:138)
>   at org.apache.poi.xwpf.extractor.XWPFWordExtractor.<init>(Lorg/apache/poi/openxml4j/opc/OPCPackage;)V (XWPFWordExtractor.java:60)
>   at org.apache.poi.ooxml.extractor.ExtractorFactory.createExtractor(Lorg/apache/poi/openxml4j/opc/OPCPackage;)Lorg/apache/poi/extractor/POITextExtractor; (ExtractorFactory.java:224)
>   at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(Ljava/io/InputStream;Lorg/xml/sax/ContentHandler;Lorg/apache/tika/metadata/Metadata;Lorg/apache/tika/parser/ParseContext;)V (OOXMLExtractorFactory.java:170)
>   at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(Ljava/io/Inp

[jira] [Commented] (TIKA-3809) OutOfMemoryError occurs while reading doc file

2022-07-05 Thread earl (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17562594#comment-17562594
 ] 

earl commented on TIKA-3809:


I've been using tika-app-1.23 for parsing. Does this version of Tika use the 
DOM parser? I saw that this was already fixed in an earlier version, in 
TIKA-2109?


[jira] [Commented] (TIKA-3809) OutOfMemoryError occurs while reading doc file

2022-07-06 Thread Tim Allison (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17563219#comment-17563219
 ] 

Tim Allison commented on TIKA-3809:
---

By default, Tika uses the DOM parser in all versions. You can select the 
beta-grade SAX parser instead: 
https://cwiki.apache.org/confluence/display/TIKA/MSOfficeParsers
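
For tika-app, one way to select it is through a Tika config file. The fragment below is a sketch written from memory of that wiki page, so treat the parameter name `useSAXDocxExtractor`, its `bool` type, and the file name `tika-config.xml` as assumptions to check against the page for your version.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- tika-config.xml (hypothetical file name): enable the SAX-based docx extractor -->
<properties>
  <parsers>
    <!-- Keep the default parsers, but take OOXMLParser out so it can be configured below -->
    <parser class="org.apache.tika.parser.DefaultParser">
      <parser-exclude class="org.apache.tika.parser.microsoft.ooxml.OOXMLParser"/>
    </parser>
    <parser class="org.apache.tika.parser.microsoft.ooxml.OOXMLParser">
      <params>
        <!-- Assumed parameter name; streams document.xml instead of building a DOM -->
        <param name="useSAXDocxExtractor" type="bool">true</param>
      </params>
    </parser>
  </parsers>
</properties>
```

The file would then be passed to tika-app with its `--config=<file>` option. When calling Tika from Java, the equivalent should be `OfficeParserConfig.setUseSAXDocxExtractor(true)` set on the `ParseContext`.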


[jira] [Commented] (TIKA-3809) OutOfMemoryError occurs while reading doc file

2022-07-26 Thread earl (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17571282#comment-17571282
 ] 

earl commented on TIKA-3809:


Thanks a lot! It now reads the doc file without throwing an OOM. Sorry for the 
delayed response.
