[ https://issues.apache.org/jira/browse/TIKA-826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13174803#comment-13174803 ]
John Mastarone edited comment on TIKA-826 at 12/22/11 1:35 PM:
---------------------------------------------------------------

After reading a little more on this, I see that POI doesn't plan on supporting xlsb files anytime soon (I think), so should Tika really try to handle them at all?

  was (Author: jfm.apache):
    After reading a little more on this, I see that POI doesn't plan on supporting xlsb files anytime soon (?), so, should Tika really try to handle them at all?

> TikaException / OfficeXmlFileException with .xlsb files
> -------------------------------------------------------
>
>                 Key: TIKA-826
>                 URL: https://issues.apache.org/jira/browse/TIKA-826
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.1
>         Environment: Windows 7
>            Reporter: John Mastarone
>         Attachments: TIKA-826.patch
>
>
> The file testEXCEL.xlsb in the tika-parsers test-documents folder causes a
> POI OfficeXmlFileException when one tries to open it with TikaGUI or TikaCLI,
> using the latest build. The reason: Tika is configured to open it with the
> OfficeParser class rather than the OOXMLParser class, but it is an Office
> 2007 file and should be opened with OOXMLParser. Neither the
> ExcelParserTest class nor the OOXMLParserTest class has anything related to
> .xlsb files. Once changes are made to these two parsers so that the
> OOXMLParser is used (I'll submit a patch shortly for these), the
> OfficeXmlFileException goes away, and a new POI exception
> (IllegalArgumentException in the ExtractorFactory class) arises in its place.
> This appears related to the unresolved POI bug 51921, whose reporter mentions
> a .xlsb file among others. This exception appears to occur because POI does
> not handle .xlsb files at all: a cursory search of the POI source for "xlsb"
> or its MIME type yields nothing relevant, and the project has no .xlsb test
> files that I can see.

--
This message is automatically generated by JIRA.
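The OfficeXmlFileException described above comes from POI's OLE2 code being handed a file that is not an OLE2 compound document. A quick way to see which container a file actually uses is to look at its first bytes: OLE2 files (.xls, .doc, .ppt) start with the compound-file signature `D0 CF 11 E0 A1 B1 1A E1`, while ZIP-packaged Office 2007 files (.xlsx, and also .xlsb) start with the ZIP local-header signature `PK\3\4`. The class below is a minimal, hypothetical sketch of that check (`ContainerSniffer`, `Kind`, and `classify` are illustrative names, not Tika or POI API); Tika's real type detection is more involved.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

/** Hypothetical helper (not part of Tika or POI): classifies a file by its container signature. */
public class ContainerSniffer {

    enum Kind { OLE2, ZIP, UNKNOWN }

    // OLE2 Compound File Binary header signature (.xls, .doc, .ppt).
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, (byte) 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, (byte) 0x1A, (byte) 0xE1
    };

    // ZIP local file header signature "PK\3\4" (OOXML packages such as .xlsx and .xlsb).
    private static final byte[] ZIP_MAGIC = { 0x50, 0x4B, 0x03, 0x04 };

    /** Returns the container kind implied by the leading bytes of a file. */
    static Kind classify(byte[] header) {
        if (startsWith(header, OLE2_MAGIC)) return Kind.OLE2;
        if (startsWith(header, ZIP_MAGIC)) return Kind.ZIP;
        return Kind.UNKNOWN;
    }

    // True if data begins with every byte of prefix.
    private static boolean startsWith(byte[] data, byte[] prefix) {
        return data.length >= prefix.length
            && Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }

    public static void main(String[] args) throws IOException {
        // Assumed default path; pass the file to inspect as the first argument.
        Path path = Paths.get(args.length > 0 ? args[0] : "testEXCEL.xlsb");
        byte[] header = new byte[8];
        int n;
        try (InputStream in = Files.newInputStream(path)) {
            n = in.readNBytes(header, 0, header.length); // Java 9+
        }
        System.out.println(path + ": " + classify(Arrays.copyOf(header, n)));
    }
}
```

Run against an .xlsb workbook, the check should report a ZIP container, which is consistent with the report's point that the file belongs with OOXMLParser rather than the OLE2-based OfficeParser. Note that a ZIP signature alone does not say which OOXML variant the package is; that requires inspecting the package contents.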