[ https://issues.apache.org/jira/browse/TIKA-2200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Seva Alekseyev updated TIKA-2200:
---------------------------------
    Attachment: MK2048_FROM_ISENTRIS.docx

> XML schema mismatch error on a valid Word document
> --------------------------------------------------
>
>                 Key: TIKA-2200
>                 URL: https://issues.apache.org/jira/browse/TIKA-2200
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.14
>         Environment: Windows 7 x64, JVM 1.8.0_101
>            Reporter: Seva Alekseyev
>         Attachments: MK2048_FROM_ISENTRIS.docx
>
>
> The attached document, which opens in Word, errors out in Tika:
>
> org.apache.poi.POIXMLException: org.apache.xmlbeans.XmlException: error: The document is not a document@http://schemas.openxmlformats.org/wordprocessingml/2006/main: document element local name mismatch expected document got wordDocument
>     at org.apache.poi.xwpf.usermodel.XWPFDocument.onDocumentRead:241
>     at org.apache.poi.POIXMLDocument.load:190
>     at org.apache.poi.xwpf.usermodel.XWPFDocument.<init>:124
>     at org.apache.poi.xwpf.extractor.XWPFWordExtractor.<init>:58
>     at org.apache.poi.extractor.ExtractorFactory.createExtractor:232
>     at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse:86
>     at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse:87
> Caused by: org.apache.xmlbeans.XmlException: error: The document is not a document@http://schemas.openxmlformats.org/wordprocessingml/2006/main: document element local name mismatch expected document got wordDocument
>     at org.apache.xmlbeans.impl.store.Locale.verifyDocumentType:459
>     at org.apache.xmlbeans.impl.store.Locale.autoTypeDocument:364
>     at org.apache.xmlbeans.impl.store.Locale.parseToXmlObject:1391
>     at org.apache.xmlbeans.impl.store.Locale.parseToXmlObject:1370
>     at org.apache.xmlbeans.impl.schema.SchemaTypeLoaderBase.parse:370
>     at org.apache.poi.POIXMLTypeLoader.parse:116
>     at org.openxmlformats.schemas.wordprocessingml.x2006.main.DocumentDocument$Factory.parse:-1
>     at org.apache.poi.xwpf.usermodel.XWPFDocument.onDocumentRead:164
>     at org.apache.poi.POIXMLDocument.load:190
>     at org.apache.poi.xwpf.usermodel.XWPFDocument.<init>:124
>     at org.apache.poi.xwpf.extractor.XWPFWordExtractor.<init>:58
>     at org.apache.poi.extractor.ExtractorFactory.createExtractor:232
>     at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse:86
>     at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse:87

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
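To check what the attached package actually contains, here is a minimal diagnostic sketch using only the JDK (the class RootElementCheck is hypothetical, not part of Tika or POI). It assumes the main part sits at the conventional path word/document.xml; a conformant package declares the real location in _rels/.rels, so adjust MAIN_PART if the file differs. It prints the namespace and local name of that part's root element, which is what XmlBeans rejects in Locale.verifyDocumentType.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import javax.xml.namespace.QName;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class RootElementCheck {
    // Assumption: the main document part is at the conventional path;
    // a conformant package names it via _rels/.rels instead.
    static final String MAIN_PART = "word/document.xml";

    /** Returns the qualified name of the first element in the given XML stream. */
    static QName rootElement(InputStream xml) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Don't process DTDs or external entities from an untrusted file.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);
        XMLStreamReader reader = factory.createXMLStreamReader(xml);
        try {
            // Skip the XML declaration, comments and processing instructions.
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    return reader.getName();
                }
            }
            throw new XMLStreamException("no root element found");
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws IOException, XMLStreamException {
        try (ZipFile zip = new ZipFile(args[0])) {
            ZipEntry entry = zip.getEntry(MAIN_PART);
            if (entry == null) {
                System.out.println("no " + MAIN_PART + " in package");
                return;
            }
            try (InputStream in = zip.getInputStream(entry)) {
                QName root = rootElement(in);
                // XWPFDocument expects local name "document" in the 2006 main namespace.
                System.out.println("namespace:  " + root.getNamespaceURI());
                System.out.println("local name: " + root.getLocalPart());
            }
        }
    }
}
```

Usage: java RootElementCheck MK2048_FROM_ISENTRIS.docx. XWPFDocument expects the local name document in the http://schemas.openxmlformats.org/wordprocessingml/2006/main namespace; the error above indicates this file's root element is wordDocument instead.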