[ https://issues.apache.org/jira/browse/TIKA-826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13178632#comment-13178632 ]
Nick Burch commented on TIKA-826:
---------------------------------

Should be fixed in r1226651. Neither parser now claims the format, and if it gets to the OOXML one on the basis of the parent type, it's declined. Tests have also been added for these cases.

> TikaException / OfficeXmlFileException with .xlsb files
> -------------------------------------------------------
>
>                 Key: TIKA-826
>                 URL: https://issues.apache.org/jira/browse/TIKA-826
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.1
>         Environment: Windows 7
>            Reporter: John Mastarone
>             Fix For: 1.1
>
>         Attachments: TIKA-826.patch
>
>
> The file testEXCEL.xlsb in the tika-parsers test-documents folder causes a
> POI OfficeXmlFileException when one tries to open it with TikaGUI or TikaCLI
> using the latest build. The reason: Tika has it configured to be opened with
> the OfficeParser class rather than the OOXMLParser class; it is an Office
> 2007 file, and should be opened with the OOXMLParser class. Neither the
> ExcelParserTest class nor the OOXMLParserTest class has anything related to
> .xlsb files. Once changes are made to these two parsers so that the
> OOXMLParser is used (I'll submit a patch for these shortly), the
> OfficeXmlFileException goes away, and a new POI exception
> (IllegalArgumentException in the ExtractorFactory class) arises in its place,
> which appears related to the unresolved POI bug 51921; the reporter of that
> bug mentions a .xlsb file among others. This exception appears to occur
> because POI does not seem to be able to handle .xlsb files at all. A cursory
> search of the POI source for "xlsb" or its MIME type yields nothing relevant,
> and the POI project has no .xlsb test files that I can see.
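The fix described in the comment has two parts: no parser registers the .xlsb type directly, and the OOXML parser, which can still be reached through the type's parent, declines it. The Java code here is a hypothetical, simplified model of that selection logic, not Tika's actual implementation; the class and method names, the parent-type mapping, and the MIME type strings in the usage note are all assumptions for illustration.

```java
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical, simplified model of MIME-type-based parser selection with
// parent-type fallback. None of these class or method names are Tika APIs.
final class ParserSelectionSketch {

    // A stand-in for a parser that handles a family of formats but
    // explicitly declines specific subtypes it cannot process.
    static final class SketchParser {
        final String name;
        final Set<String> declined;

        SketchParser(String name, Set<String> declined) {
            this.name = name;
            this.declined = declined;
        }

        boolean accepts(String type) {
            return !declined.contains(type);
        }
    }

    private final Map<String, SketchParser> parsersByType; // types each parser claims
    private final Map<String, String> parentOf;             // child type -> parent type

    ParserSelectionSketch(Map<String, SketchParser> parsersByType,
                          Map<String, String> parentOf) {
        this.parsersByType = parsersByType;
        this.parentOf = parentOf;
    }

    // Starting at the detected type, walk up the (assumed acyclic) type
    // hierarchy until some parser claims a type. That parser then has the
    // final say: if it declines the original type, nothing is selected.
    Optional<String> select(String detectedType) {
        String current = detectedType;
        while (current != null) {
            SketchParser parser = parsersByType.get(current);
            if (parser != null) {
                return parser.accepts(detectedType)
                        ? Optional.of(parser.name)
                        : Optional.empty();
            }
            current = parentOf.get(current);
        }
        return Optional.empty();
    }
}
```

For example, if the OOXML stand-in is registered only for a generic parent type such as `application/x-tika-ooxml` (assumed here) and declines `application/vnd.ms-excel.sheet.binary.macroenabled.12`, then `select` returns an empty result for .xlsb while still returning the OOXML parser for .xlsx, which falls back to the same parent.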