[ https://issues.apache.org/jira/browse/TIKA-826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

John Mastarone updated TIKA-826:
--------------------------------

    Attachment: TIKA-826.patch

Patch for the OfficeParser and OOXMLParser classes so that .xlsb files are
handled by the latter. This solves one problem and exposes another.
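A minimal sketch of the routing idea behind the patch, assuming a simple
MIME-type-to-parser lookup. The registry maps, the route() helper and the
"none" fallback are hypothetical illustrations, not Tika's actual
SUPPORTED_TYPES mechanism or the contents of TIKA-826.patch.

```java
import java.util.Map;

// Hypothetical sketch: parser selection keyed on MIME type, showing where
// .xlsb is routed before and after the change. Parser names are plain strings.
public class ParserRouting {

    // Registered MIME type for Excel binary workbooks (.xlsb).
    static final String XLSB = "application/vnd.ms-excel.sheet.binary.macroenabled.12";
    static final String XLS = "application/vnd.ms-excel";
    static final String XLSX = "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet";

    // Before: .xlsb is claimed by the OLE2-based parser.
    static final Map<String, String> BEFORE = Map.of(
            XLS, "OfficeParser",
            XLSB, "OfficeParser",
            XLSX, "OOXMLParser");

    // After: .xlsb is claimed by the ZIP/OOXML-based parser.
    static final Map<String, String> AFTER = Map.of(
            XLS, "OfficeParser",
            XLSB, "OOXMLParser",
            XLSX, "OOXMLParser");

    // Look up the parser for a MIME type; "none" is a hypothetical fallback.
    static String route(Map<String, String> registry, String mimeType) {
        return registry.getOrDefault(mimeType, "none");
    }

    public static void main(String[] args) {
        System.out.println("before: " + route(BEFORE, XLSB)); // OfficeParser
        System.out.println("after:  " + route(AFTER, XLSB));  // OOXMLParser
    }
}
```

Routing alone only changes which parser sees the file; whether that parser
can actually extract .xlsb content is a separate question.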
                
> TikaException / OfficeXmlFileException with .xlsb files
> -------------------------------------------------------
>
>                 Key: TIKA-826
>                 URL: https://issues.apache.org/jira/browse/TIKA-826
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.1
>         Environment: Windows 7
>            Reporter: John Mastarone
>         Attachments: TIKA-826.patch
>
>
> The file testEXCEL.xlsb in the tika-parsers test-documents folder causes a
> POI OfficeXmlFileException when one tries to open it with TikaGUI or TikaCLI
> using the latest build. The reason: Tika is configured to open it with the
> OfficeParser class rather than the OOXMLParser class; it is an Office 2007
> file and should be opened with the OOXMLParser class. Neither the
> ExcelParserTest class nor the OOXMLParserTest class has anything related to
> .xlsb files. Once changes are made to these two parsers so that the
> OOXMLParser is used (I'll submit a patch shortly for these), the
> OfficeXmlFileException goes away, and a new POI exception
> (IllegalArgumentException in the ExtractorFactory class) arises in its place,
> somewhat related to unsolved POI bug 51921; the reporter of that bug mentions
> a .xlsb file among others. This exception appears to occur because POI does
> not support .xlsb files at all: a cursory search of the POI source for "xlsb"
> or its MIME type yields nothing relevant, and the POI project has no .xlsb
> test files that I can see.
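The OfficeXmlFileException arises because an .xlsb workbook, like .xlsx, is a
ZIP package, while POI's OLE2 reader (used behind OfficeParser) expects the
legacy OLE2 compound-document signature and rejects ZIP data. A minimal sketch
of telling the two containers apart by their leading bytes; the class, enum
and method names are hypothetical, and this is not Tika's detection code.

```java
import java.util.Arrays;

// Hypothetical sketch: classify an Office file by its first bytes.
// OLE2 (legacy .xls/.doc) starts with D0 CF 11 E0 A1 B1 1A E1;
// OOXML packages (.xlsx, .xlsb, ...) are ZIP archives starting with "PK\3\4".
public class ContainerSniffer {

    enum Container { OLE2, OOXML_ZIP, UNKNOWN }

    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };
    private static final byte[] ZIP_MAGIC = { 'P', 'K', 0x03, 0x04 };

    // 'head' is assumed to hold the first bytes of the file (8 is enough).
    static Container sniff(byte[] head) {
        if (startsWith(head, OLE2_MAGIC)) return Container.OLE2;
        if (startsWith(head, ZIP_MAGIC)) return Container.OOXML_ZIP;
        return Container.UNKNOWN;
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        return data.length >= prefix.length
            && Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }

    public static void main(String[] args) {
        byte[] zipHead = { 'P', 'K', 0x03, 0x04, 0x14, 0x00, 0x06, 0x00 };
        System.out.println(sniff(zipHead)); // OOXML_ZIP
    }
}
```

The signature only identifies the container family. Distinguishing .xlsb from
.xlsx requires looking inside the package (for example at its content types),
which is where the remaining ExtractorFactory failure comes in.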

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
