Seva Alekseyev created TIKA-2205:
------------------------------------

             Summary: IllegalArgumentException on a valid Excel file
                 Key: TIKA-2205
                 URL: https://issues.apache.org/jira/browse/TIKA-2205
             Project: Tika
          Issue Type: Bug
          Components: parser
    Affects Versions: 1.14
         Environment: Windows 7 x64, JVM 1.8.0_101
            Reporter: Seva Alekseyev

The attached file, which opens in Excel, errors out in Tika:

java.lang.IllegalArgumentException: Cannot format given Object as a Number
    at java.text.DecimalFormat.format:-1
    at java.text.Format.format:-1
    at org.apache.poi.ss.usermodel.DataFormatter.performDateFormatting:736
    at org.apache.poi.ss.usermodel.DataFormatter.formatRawCellContents:804
    at org.apache.poi.ss.usermodel.DataFormatter.formatRawCellContents:785
    at org.apache.poi.hssf.eventusermodel.FormatTrackingHSSFListener.formatNumberDateCell:143
    at org.apache.tika.parser.microsoft.ExcelExtractor$TikaHSSFListener$TikaFormatTrackingHSSFListener.formatNumberDateCell:633
    at org.apache.tika.parser.microsoft.ExcelExtractor$TikaHSSFListener.internalProcessRecord:432
    at org.apache.tika.parser.microsoft.ExcelExtractor$TikaHSSFListener.processRecord:336
    at org.apache.poi.hssf.eventusermodel.FormatTrackingHSSFListener.processRecord:92
    at org.apache.poi.hssf.eventusermodel.HSSFRequest.processRecord:109
    at org.apache.poi.hssf.eventusermodel.HSSFEventFactory.genericProcessEvents:179
    at org.apache.poi.hssf.eventusermodel.HSSFEventFactory.processEvents:136
    at org.apache.tika.parser.microsoft.ExcelExtractor$TikaHSSFListener.processFile:312
    at org.apache.tika.parser.microsoft.ExcelExtractor.parse:169
    at org.apache.tika.parser.microsoft.OfficeParser.parse:177
    at org.apache.tika.parser.microsoft.OfficeParser.parse:130
    at gov.nih.niaid.fscanner.Extract.ExtractContents:69

org.apache.tika.exception.TikaException for 63269/<\\ai-storm\FScan\Scan_2016-12-11_11-14-13\Folders\51541330\engelAPBD copy.pptx>: "Error creating OOXML extractor"
org.apache.tika.exception.TikaException: Error creating OOXML extractor
    at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse:120
    at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse:87
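
For anyone triaging: the top frames of the first trace (DataFormatter.performDateFormatting calling into java.text.DecimalFormat.format) are consistent with a number format being handed a non-numeric value such as a date, although that is only a guess without the attached file. The following is a minimal JDK-only sketch of how that exact message can arise. It does not use POI or Tika; the class name DecimalFormatDateDemo and the helper tryFormat are hypothetical names made up for this illustration.

```java
import java.text.DecimalFormat;
import java.text.Format;
import java.util.Date;

public class DecimalFormatDateDemo {

    // Hypothetical helper: returns the exception message if formatting
    // fails with IllegalArgumentException, or null if it succeeds.
    static String tryFormat(Format format, Object value) {
        try {
            format.format(value);
            return null;
        } catch (IllegalArgumentException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Accessed through the generic Format.format(Object) entry point,
        // as in the "java.text.Format.format" frame of the trace.
        Format numberFormat = new DecimalFormat("0.00");

        // A boxed Double is a Number, so this formats without error.
        System.out.println("Double: " + tryFormat(numberFormat, 42.5));

        // A java.util.Date is not a Number, so DecimalFormat rejects it.
        System.out.println("Date:   " + tryFormat(numberFormat, new Date()));
    }
}
```

If this is what happens inside POI for the attached file, the underlying question is why a cell that is treated as a date ends up paired with a numeric format; the sketch only shows the JDK side of the failure.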