[ https://issues.apache.org/jira/browse/TIKA-1947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15237029#comment-15237029 ]
Tim Allison commented on TIKA-1947:
-----------------------------------

I'm not sure this is even a bug. This is the output of the POILogger, not of, e.g., printStackTrace(). I _think_ that with the right configuration this should be handled properly. However, I'm not able to get, e.g., {{-Dorg.apache.poi.util.POILogger=org.apache.poi.util.NullLogger}} or the option recommended [here|http://javaevangelist.blogspot.com/2010/10/disabling-apache-poi-logging.html] to work... Perhaps post a question to the POI users list? Before doing that, though, try configuring log4j.

> IllegalArgumentException stacktrace in output since POI update
> --------------------------------------------------------------
>
>                 Key: TIKA-1947
>                 URL: https://issues.apache.org/jira/browse/TIKA-1947
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.13
>            Reporter: Sam H
>            Priority: Minor
>         Attachments: iae.xlsx
>
>
> I tried parsing an Excel document and noticed an IllegalArgumentException stacktrace in the output.
> I've traced this back to https://github.com/apache/tika/commit/25cee54499126de2b90f6bd5bde8de470b422349 (TIKA-1799: upgrade to POI 3.14-beta1).
> I am unable to reopen that issue, so I'm creating a new one.
> Attached you can find my test file: iae.xlsx
> This is the output when running 1.13-SNAPSHOT as a jar:
> {code}
> java -jar tika-app-1.13-SNAPSHOT.jar iae.xlsx
> apr 11, 2016 3:56:26 PM org.apache.poi.ss.format.CellFormat <init>
> WARNING: Invalid format: "_([$Ç-2]\ * #,##0.00_);"
> java.lang.IllegalArgumentException: Unsupported [] format block '[' in '_([$Ç-2]\ * #,##0.00_)'
>     at org.apache.poi.ss.format.CellFormatPart.formatType(CellFormatPart.java:362)
>     at org.apache.poi.ss.format.CellFormatPart.getCellFormatType(CellFormatPart.java:276)
>     at org.apache.poi.ss.format.CellFormatPart.<init>(CellFormatPart.java:180)
>     at org.apache.poi.ss.format.CellFormat.<init>(CellFormat.java:167)
>     at org.apache.poi.ss.format.CellFormat.getInstance(CellFormat.java:143)
>     at org.apache.poi.ss.usermodel.DataFormatter.getFormat(DataFormatter.java:314)
>     at org.apache.poi.ss.usermodel.DataFormatter.formatRawCellContents(DataFormatter.java:797)
>     at org.apache.poi.ss.usermodel.DataFormatter.formatRawCellContents(DataFormatter.java:769)
>     at org.apache.poi.xssf.eventusermodel.XSSFSheetXMLHandler.endElement(XSSFSheetXMLHandler.java:354)
>     at org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator$XSSFSheetInterestingPartsCapturer.endElement(XSSFExcelExtractorDecorator.java:361)
>     at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.endElement(Unknown Source)
>     at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanEndElement(Unknown Source)
>     at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(Unknown Source)
>     at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(Unknown Source)
>     at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
>     at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
>     at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
>     at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(Unknown Source)
>     at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(Unknown Source)
>     at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
>     at org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator.processSheet(XSSFExcelExtractorDecorator.java:197)
>     at org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator.buildXHTML(XSSFExcelExtractorDecorator.java:138)
>     at org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.getXHTML(AbstractOOXMLExtractor.java:110)
>     at org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator.getXHTML(XSSFExcelExtractorDecorator.java:97)
>     at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:112)
>     at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:87)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>     at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
>     at org.apache.tika.cli.TikaCLI$OutputType.process(TikaCLI.java:190)
>     at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:491)
>     at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:144)
> apr 11, 2016 3:56:26 PM org.apache.poi.ss.format.CellFormat <init>
> WARNING: Invalid format: "_([$Ç-2]\ * \(#,##0.00\);"
> java.lang.IllegalArgumentException: Unsupported [] format block '[' in '_([$Ç-2]\ * \(#,##0.00\)'
>     at org.apache.poi.ss.format.CellFormatPart.formatType(CellFormatPart.java:362)
>     at org.apache.poi.ss.format.CellFormatPart.getCellFormatType(CellFormatPart.java:276)
>     at org.apache.poi.ss.format.CellFormatPart.<init>(CellFormatPart.java:180)
>     at org.apache.poi.ss.format.CellFormat.<init>(CellFormat.java:167)
>     at org.apache.poi.ss.format.CellFormat.getInstance(CellFormat.java:143)
>     at org.apache.poi.ss.usermodel.DataFormatter.getFormat(DataFormatter.java:314)
>     at org.apache.poi.ss.usermodel.DataFormatter.formatRawCellContents(DataFormatter.java:797)
>     at org.apache.poi.ss.usermodel.DataFormatter.formatRawCellContents(DataFormatter.java:769)
>     at org.apache.poi.xssf.eventusermodel.XSSFSheetXMLHandler.endElement(XSSFSheetXMLHandler.java:354)
>     at org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator$XSSFSheetInterestingPartsCapturer.endElement(XSSFExcelExtractorDecorator.java:361)
>     at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.endElement(Unknown Source)
>     at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanEndElement(Unknown Source)
>     at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(Unknown Source)
>     at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(Unknown Source)
>     at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
>     at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
>     at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
>     at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(Unknown Source)
>     at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(Unknown Source)
>     at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
>     at org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator.processSheet(XSSFExcelExtractorDecorator.java:197)
>     at org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator.buildXHTML(XSSFExcelExtractorDecorator.java:138)
>     at org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.getXHTML(AbstractOOXMLExtractor.java:110)
>     at org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator.getXHTML(XSSFExcelExtractorDecorator.java:97)
>     at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:112)
>     at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:87)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>     at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
>     at org.apache.tika.cli.TikaCLI$OutputType.process(TikaCLI.java:190)
>     at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:491)
>     at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:144)
> apr 11, 2016 3:56:26 PM org.apache.poi.ss.format.CellFormat <init>
> WARNING: Invalid format: "_([$Ç-2]\ * "-"??_);"
> java.lang.IllegalArgumentException: Unsupported [] format block '[' in '_([$Ç-2]\ * "-"??_)'
>     at org.apache.poi.ss.format.CellFormatPart.formatType(CellFormatPart.java:362)
>     at org.apache.poi.ss.format.CellFormatPart.getCellFormatType(CellFormatPart.java:276)
>     at org.apache.poi.ss.format.CellFormatPart.<init>(CellFormatPart.java:180)
>     at org.apache.poi.ss.format.CellFormat.<init>(CellFormat.java:167)
>     at org.apache.poi.ss.format.CellFormat.getInstance(CellFormat.java:143)
>     at org.apache.poi.ss.usermodel.DataFormatter.getFormat(DataFormatter.java:314)
>     at org.apache.poi.ss.usermodel.DataFormatter.formatRawCellContents(DataFormatter.java:797)
>     at org.apache.poi.ss.usermodel.DataFormatter.formatRawCellContents(DataFormatter.java:769)
>     at org.apache.poi.xssf.eventusermodel.XSSFSheetXMLHandler.endElement(XSSFSheetXMLHandler.java:354)
>     at org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator$XSSFSheetInterestingPartsCapturer.endElement(XSSFExcelExtractorDecorator.java:361)
>     at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.endElement(Unknown Source)
>     at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanEndElement(Unknown Source)
>     at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(Unknown Source)
>     at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(Unknown Source)
>     at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
>     at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
>     at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
>     at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(Unknown Source)
>     at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(Unknown Source)
>     at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
>     at org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator.processSheet(XSSFExcelExtractorDecorator.java:197)
>     at org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator.buildXHTML(XSSFExcelExtractorDecorator.java:138)
>     at org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.getXHTML(AbstractOOXMLExtractor.java:110)
>     at org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator.getXHTML(XSSFExcelExtractorDecorator.java:97)
>     at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:112)
>     at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:87)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>     at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
>     at org.apache.tika.cli.TikaCLI$OutputType.process(TikaCLI.java:190)
>     at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:491)
>     at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:144)
> <?xml version="1.0" encoding="UTF-8"?><html xmlns="http://www.w3.org/1999/xhtml">
> <head>
> <meta name="date" content="2016-04-11T13:45:08Z"/>
> <meta name="extended-properties:AppVersion" content="15.0300"/>
> <meta name="dc:creator" content="nick"/>
> <meta name="extended-properties:Company" content=""/>
> <meta name="dcterms:created" content="2016-01-05T14:53:37Z"/>
> <meta name="Last-Modified" content="2016-04-11T13:45:08Z"/>
> <meta name="dcterms:modified" content="2016-04-11T13:45:08Z"/>
> <meta name="Last-Save-Date" content="2016-04-11T13:45:08Z"/>
> <meta name="protected" content="false"/>
> <meta name="meta:save-date" content="2016-04-11T13:45:08Z"/>
> <meta name="Application-Name" content="Microsoft Excel"/>
> <meta name="modified" content="2016-04-11T13:45:08Z"/>
> <meta name="Content-Length" content="9119"/>
> <meta name="Content-Type" content="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"/>
> <meta name="X-Parsed-By" content="org.apache.tika.parser.DefaultParser"/>
> <meta name="X-Parsed-By" content="org.apache.tika.parser.microsoft.ooxml.OOXMLParser"/>
> <meta name="creator" content="nick"/>
> <meta name="meta:author" content="nick"/>
> <meta name="meta:creation-date" content="2016-01-05T14:53:37Z"/>
> <meta name="extended-properties:Application" content="Microsoft Excel"/>
> <meta name="meta:last-author" content="Sam"/>
> <meta name="Creation-Date" content="2016-01-05T14:53:37Z"/>
> <meta name="resourceName" content="iae.xlsx"/>
> <meta name="Last-Author" content="Sam"/>
> <meta name="Application-Version" content="15.0300"/>
> <meta name="Author" content="nick"/>
> <meta name="publisher" content=""/>
> <meta name="dc:publisher" content=""/>
> <title/>
> </head>
> <body><div><h1>Sheet1</h1>
> <table><tbody><tr> <td>69.99</td></tr>
> </tbody></table>
> </div>
> </body></html>
> {code}
> The actual extracted content is what I would expect (and matches the output from version 1.12).
> I would expect this exception to be handled some other way, and not to show up (as text) in my parsed output.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
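A minimal sketch of one configuration route for the comment above, offered as an assumption rather than a confirmed fix. The two-line layout of each warning in the log (`apr 11, 2016 3:56:26 PM org.apache.poi.ss.format.CellFormat <init>` followed by `WARNING: Invalid format: ...`) matches the default layout of `java.util.logging.SimpleFormatter`, which suggests, but does not prove, that this POI version reports invalid cell formats through `java.util.logging` rather than through POILogger; if so, that could explain why the POILogger system property had no effect. The first log line shows the calling class and method, not necessarily the logger's name, so the sketch raises the level on the whole `org.apache.poi.ss.format` namespace, which any logger named after a class in that package inherits. The class `QuietPoiCellFormat` is hypothetical.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical helper. Assumes (unconfirmed) that POI's "Invalid format" warnings
// are emitted through java.util.logging under the org.apache.poi.ss.format namespace.
final class QuietPoiCellFormat {
    // LogManager holds loggers only weakly, so keep a strong reference;
    // otherwise the level set below could be lost after garbage collection.
    private static final Logger POI_CELL_FORMAT =
            Logger.getLogger("org.apache.poi.ss.format");

    private QuietPoiCellFormat() {}

    /** Drops WARNING-level records from the package; SEVERE records still pass. */
    static void silence() {
        // Descendant loggers without their own level (e.g. ...format.CellFormat)
        // inherit this effective level.
        POI_CELL_FORMAT.setLevel(Level.SEVERE);
    }

    public static void main(String[] args) {
        silence(); // call once at startup, before parsing anything with Tika
        Logger child = Logger.getLogger("org.apache.poi.ss.format.CellFormat");
        System.out.println("WARNING loggable: " + child.isLoggable(Level.WARNING)); // prints "WARNING loggable: false"
    }
}
```

Setting the level on the package rather than on one class is deliberate, since the exact logger name is not visible in the output. A `logging.properties` entry `org.apache.poi.ss.format.level = SEVERE`, loaded with `-Djava.util.logging.config.file=<path>`, should achieve something similar without code. Separately, the default `java.util.logging` console handler writes to `System.err`, so, assuming tika-app writes its XHTML to standard output, redirecting stdout (e.g. `java -jar tika-app-1.13-SNAPSHOT.jar iae.xlsx > out.html`) should keep these warnings out of the saved output.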