[ https://issues.apache.org/jira/browse/TIKA-2179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Story updated TIKA-2179:
-----------------------------
    Attachment: File5.xml

> WordMLParser fails to parse a word xml file
> -------------------------------------------
>
>                 Key: TIKA-2179
>                 URL: https://issues.apache.org/jira/browse/TIKA-2179
>             Project: Tika
>          Issue Type: Bug
>    Affects Versions: 1.14
>         Environment: OSX, java 8
>            Reporter: Sean Story
>            Priority: Minor
>         Attachments: File5.xml
>
>
> h3. Problem
> I have a sample word.xml file that neither OOXMLParser nor OfficeParser can 
> parse. OOXMLParser yields an exception {{Caused by: 
> org.apache.poi.openxml4j.exceptions.NotOfficeXmlFileException: The supplied 
> data appears to be a raw XML file. Formats such as Office 2003 XML are not 
> supported}}, and OfficeParser yields a similar one: 
> {{org.apache.poi.poifs.filesystem.NotOLE2FileException: The supplied data 
> appears to be a raw XML file. Formats such as Office 2003 XML are not 
> supported}}.
> I found TIKA-1958, which mentions the new WordMLParser, so I downloaded the 
> source, built it, and updated my Tika version to 1.14. However, when parsing 
> with WordMLParser, the text content I get is the empty string {{""}}, 
> but I'm expecting something more like:
> {noformat}
> It means that the guy that you are trading with was reported for a scam 
> attempt. As the others mentioned, some of these BOFA could be false.
> What's important is the current trade that you are doing.
> If everything seems to be in order then there is nothing wrong with going 
> through with the trade.
> Auti, Sneha (QAPM)
> {noformat}
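
One way to check, independently of Tika, whether a file like this holds extractable text is to walk the XML directly. The following is a minimal sketch, not how WordMLParser works: it assumes the file uses the Word 2003 XML namespace http://schemas.microsoft.com/office/word/2003/wordml and keeps run text in w:t elements inside w:p paragraphs. The attachment File5.xml was not inspected, and the function name and sample document are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: the document uses the Word 2003 XML (WordprocessingML) namespace.
W_NS = "http://schemas.microsoft.com/office/word/2003/wordml"


def wordml_text(xml_source):
    """Return the text of every w:p paragraph, one paragraph per line."""
    root = ET.fromstring(xml_source)
    paragraphs = []
    for p in root.iter(f"{{{W_NS}}}p"):
        # Concatenate the text of all w:t runs inside this paragraph.
        runs = [t.text or "" for t in p.iter(f"{{{W_NS}}}t")]
        paragraphs.append("".join(runs))
    return "\n".join(paragraphs)


# Hypothetical stand-in for the real attachment.
SAMPLE = """<?xml version="1.0"?>
<w:wordDocument xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml">
  <w:body>
    <w:p><w:r><w:t>Hello, </w:t></w:r><w:r><w:t>world</w:t></w:r></w:p>
    <w:p><w:r><w:t>Second paragraph</w:t></w:r></w:p>
  </w:body>
</w:wordDocument>"""

print(wordml_text(SAMPLE))
```

If a sketch like this returns text for the attachment, the content is present in the file. If it returns nothing, the file may use a different namespace or element layout than assumed here.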



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
