TikaException / OfficeXmlFileException with .xlsb files
-------------------------------------------------------

                 Key: TIKA-826
                 URL: https://issues.apache.org/jira/browse/TIKA-826
             Project: Tika
          Issue Type: Bug
          Components: parser
    Affects Versions: 1.1
         Environment: Windows 7
            Reporter: John Mastarone


The file testEXCEL.xlsb in the tika-parsers test-documents folder causes a POI 
OfficeXmlFileException when one tries to open it with TikaGUI or TikaCLI, using 
the latest build.  The reason: Tika is configured to open it with the 
OfficeParser class rather than the OOXMLParser class; it is an Office 2007 
file and should be opened with the OOXMLParser class.  Neither the 
ExcelParserTest class nor the OOXMLParserTest class has anything related to 
.xlsb files.  Once changes are made to these two parsers so that the 
OOXMLParser is used (I'll submit a patch shortly for these), the 
OfficeXmlFileException goes away, and a new POI exception 
(IllegalArgumentException in the ExtractorFactory class) arises in its place, 
somewhat related to unresolved POI bug 51921; the reporter of that bug mentions 
a .xlsb file among others.  This exception appears to occur because POI cannot 
handle .xlsb files at all.  A cursory search of the POI source for "xlsb" or 
its MIME type yields nothing relevant, and the POI project has no .xlsb test 
files that I can see.
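
To illustrate why the choice of parser matters, here is a minimal sketch that 
tells the two container formats apart by their leading bytes.  It assumes the 
.xlsb file is a ZIP-based package like other Office 2007 formats, whereas the 
legacy formats handled by OfficeParser are OLE2 compound documents.  The 
ContainerSniffer class and its sniff method are hypothetical names used only 
for this illustration; they are not part of Tika or POI, and this is not how 
either library performs detection.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper for illustration only; not part of Tika or POI.
class ContainerSniffer {
    // Signature at the start of an OLE2 compound document (legacy .xls, .doc, .ppt).
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // ZIP local file header signature, found at the start of Office 2007 packages.
    private static final byte[] ZIP_MAGIC = { 'P', 'K', 0x03, 0x04 };

    // Classifies the leading bytes of a file as "OLE2", "ZIP", or "UNKNOWN".
    static String sniff(byte[] header) {
        if (startsWith(header, OLE2_MAGIC)) {
            return "OLE2";
        }
        if (startsWith(header, ZIP_MAGIC)) {
            return "ZIP";
        }
        return "UNKNOWN";
    }

    // True when data begins with exactly the bytes in prefix.
    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        return Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java ContainerSniffer <file>");
            return;
        }
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            byte[] buffer = new byte[8];
            int read = in.read(buffer);
            // Only pass along the bytes that were actually read.
            byte[] header = Arrays.copyOf(buffer, Math.max(read, 0));
            System.out.println(args[0] + ": " + sniff(header));
        }
    }
}
```

Running it against testEXCEL.xlsb and against a legacy .xls test document 
should show which container each one uses, and therefore which of the two 
parsers can be expected to open it.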


