[ https://issues.apache.org/jira/browse/TIKA-1947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15237029#comment-15237029 ]

Tim Allison commented on TIKA-1947:
-----------------------------------

I'm not sure this is even a bug.  This is the output of the POILogger, not,
e.g., printStackTrace().  I _think_ that with the right configuration this
should be handled properly.  However, I'm not able to get either
{{-Dorg.apache.poi.util.POILogger=org.apache.poi.util.NullLogger}} or the option
recommended
[here|http://javaevangelist.blogspot.com/2010/10/disabling-apache-poi-logging.html]
to work.  Perhaps drop a question to the POI users list?  Before doing that,
though, try configuring log4j.
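For applications that embed Tika rather than running tika-app, the `-D` flag mentioned in the comment can also be set from code. This is a minimal sketch, assuming POI reads `org.apache.poi.util.POILogger` when it first creates a logger, so the call has to happen before any POI or Tika parser class is used. The class and method names are hypothetical. Because the flag itself did not seem to help here, treat this as something to try, not a confirmed fix.

```java
/**
 * Hypothetical helper for applications that embed Tika and cannot pass
 * -D flags on the command line. It sets the same system property as
 * -Dorg.apache.poi.util.POILogger=org.apache.poi.util.NullLogger.
 *
 * Assumption: POI reads this property when it first creates a logger,
 * so this must run before any POI (or Tika parser) class is used.
 */
public final class PoiLoggerConfig {
    static final String POI_LOGGER_PROPERTY = "org.apache.poi.util.POILogger";
    static final String NULL_LOGGER = "org.apache.poi.util.NullLogger";

    private PoiLoggerConfig() {}

    /** Selects NullLogger unless the user already chose a logger via -D. */
    public static void useNullPoiLoggerByDefault() {
        if (System.getProperty(POI_LOGGER_PROPERTY) == null) {
            System.setProperty(POI_LOGGER_PROPERTY, NULL_LOGGER);
        }
    }

    public static void main(String[] args) {
        useNullPoiLoggerByDefault();
        // ... construct the Tika parser and parse documents after this point ...
        System.out.println(POI_LOGGER_PROPERTY + "=" + System.getProperty(POI_LOGGER_PROPERTY));
    }
}
```

Leaving an existing value alone means a user can still override the choice with `-D` on the command line.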

> IllegalArgumentException stacktrace in output since POI update
> --------------------------------------------------------------
>
>                 Key: TIKA-1947
>                 URL: https://issues.apache.org/jira/browse/TIKA-1947
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.13
>            Reporter: Sam H
>            Priority: Minor
>         Attachments: iae.xlsx
>
>
> I tried parsing an Excel document, and noticed there was an 
> IllegalArgumentException stacktrace in the output.
> I've traced this back to 
> https://github.com/apache/tika/commit/25cee54499126de2b90f6bd5bde8de470b422349
> (TIKA-1799: upgrade to POI 3.14-beta1) 
> I am unable to reopen that issue, so I'm creating a new one.
> Attached is my test file: iae.xlsx
> This is the output when running the 1.13-SNAPSHOT jar:
> {code}
> java -jar tika-app-1.13-SNAPSHOT.jar iae.xlsx
> apr 11, 2016 3:56:26 PM org.apache.poi.ss.format.CellFormat <init>
> WARNING: Invalid format: "_([$Ç-2]\ * #,##0.00_);"
> java.lang.IllegalArgumentException: Unsupported [] format block '[' in '_([$Ç-2]\ * #,##0.00_)'
>         at org.apache.poi.ss.format.CellFormatPart.formatType(CellFormatPart.java:362)
>         at org.apache.poi.ss.format.CellFormatPart.getCellFormatType(CellFormatPart.java:276)
>         at org.apache.poi.ss.format.CellFormatPart.<init>(CellFormatPart.java:180)
>         at org.apache.poi.ss.format.CellFormat.<init>(CellFormat.java:167)
>         at org.apache.poi.ss.format.CellFormat.getInstance(CellFormat.java:143)
>         at org.apache.poi.ss.usermodel.DataFormatter.getFormat(DataFormatter.java:314)
>         at org.apache.poi.ss.usermodel.DataFormatter.formatRawCellContents(DataFormatter.java:797)
>         at org.apache.poi.ss.usermodel.DataFormatter.formatRawCellContents(DataFormatter.java:769)
>         at org.apache.poi.xssf.eventusermodel.XSSFSheetXMLHandler.endElement(XSSFSheetXMLHandler.java:354)
>         at org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator$XSSFSheetInterestingPartsCapturer.endElement(XSSFExcelExtractorDecorator.java:361)
>         at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.endElement(Unknown Source)
>         at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanEndElement(Unknown Source)
>         at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(Unknown Source)
>         at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(Unknown Source)
>         at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
>         at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
>         at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
>         at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(Unknown Source)
>         at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(Unknown Source)
>         at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
>         at org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator.processSheet(XSSFExcelExtractorDecorator.java:197)
>         at org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator.buildXHTML(XSSFExcelExtractorDecorator.java:138)
>         at org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.getXHTML(AbstractOOXMLExtractor.java:110)
>         at org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator.getXHTML(XSSFExcelExtractorDecorator.java:97)
>         at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:112)
>         at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:87)
>         at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>         at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>         at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
>         at org.apache.tika.cli.TikaCLI$OutputType.process(TikaCLI.java:190)
>         at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:491)
>         at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:144)
> apr 11, 2016 3:56:26 PM org.apache.poi.ss.format.CellFormat <init>
> WARNING: Invalid format: "_([$Ç-2]\ * \(#,##0.00\);"
> java.lang.IllegalArgumentException: Unsupported [] format block '[' in '_([$Ç-2]\ * \(#,##0.00\)'
>         at org.apache.poi.ss.format.CellFormatPart.formatType(CellFormatPart.java:362)
>         at org.apache.poi.ss.format.CellFormatPart.getCellFormatType(CellFormatPart.java:276)
>         at org.apache.poi.ss.format.CellFormatPart.<init>(CellFormatPart.java:180)
>         at org.apache.poi.ss.format.CellFormat.<init>(CellFormat.java:167)
>         at org.apache.poi.ss.format.CellFormat.getInstance(CellFormat.java:143)
>         at org.apache.poi.ss.usermodel.DataFormatter.getFormat(DataFormatter.java:314)
>         at org.apache.poi.ss.usermodel.DataFormatter.formatRawCellContents(DataFormatter.java:797)
>         at org.apache.poi.ss.usermodel.DataFormatter.formatRawCellContents(DataFormatter.java:769)
>         at org.apache.poi.xssf.eventusermodel.XSSFSheetXMLHandler.endElement(XSSFSheetXMLHandler.java:354)
>         at org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator$XSSFSheetInterestingPartsCapturer.endElement(XSSFExcelExtractorDecorator.java:361)
>         at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.endElement(Unknown Source)
>         at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanEndElement(Unknown Source)
>         at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(Unknown Source)
>         at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(Unknown Source)
>         at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
>         at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
>         at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
>         at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(Unknown Source)
>         at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(Unknown Source)
>         at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
>         at org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator.processSheet(XSSFExcelExtractorDecorator.java:197)
>         at org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator.buildXHTML(XSSFExcelExtractorDecorator.java:138)
>         at org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.getXHTML(AbstractOOXMLExtractor.java:110)
>         at org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator.getXHTML(XSSFExcelExtractorDecorator.java:97)
>         at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:112)
>         at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:87)
>         at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>         at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>         at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
>         at org.apache.tika.cli.TikaCLI$OutputType.process(TikaCLI.java:190)
>         at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:491)
>         at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:144)
> apr 11, 2016 3:56:26 PM org.apache.poi.ss.format.CellFormat <init>
> WARNING: Invalid format: "_([$Ç-2]\ * "-"??_);"
> java.lang.IllegalArgumentException: Unsupported [] format block '[' in '_([$Ç-2]\ * "-"??_)'
>         at org.apache.poi.ss.format.CellFormatPart.formatType(CellFormatPart.java:362)
>         at org.apache.poi.ss.format.CellFormatPart.getCellFormatType(CellFormatPart.java:276)
>         at org.apache.poi.ss.format.CellFormatPart.<init>(CellFormatPart.java:180)
>         at org.apache.poi.ss.format.CellFormat.<init>(CellFormat.java:167)
>         at org.apache.poi.ss.format.CellFormat.getInstance(CellFormat.java:143)
>         at org.apache.poi.ss.usermodel.DataFormatter.getFormat(DataFormatter.java:314)
>         at org.apache.poi.ss.usermodel.DataFormatter.formatRawCellContents(DataFormatter.java:797)
>         at org.apache.poi.ss.usermodel.DataFormatter.formatRawCellContents(DataFormatter.java:769)
>         at org.apache.poi.xssf.eventusermodel.XSSFSheetXMLHandler.endElement(XSSFSheetXMLHandler.java:354)
>         at org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator$XSSFSheetInterestingPartsCapturer.endElement(XSSFExcelExtractorDecorator.java:361)
>         at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.endElement(Unknown Source)
>         at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanEndElement(Unknown Source)
>         at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(Unknown Source)
>         at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(Unknown Source)
>         at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
>         at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
>         at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
>         at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(Unknown Source)
>         at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(Unknown Source)
>         at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
>         at org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator.processSheet(XSSFExcelExtractorDecorator.java:197)
>         at org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator.buildXHTML(XSSFExcelExtractorDecorator.java:138)
>         at org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.getXHTML(AbstractOOXMLExtractor.java:110)
>         at org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator.getXHTML(XSSFExcelExtractorDecorator.java:97)
>         at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:112)
>         at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:87)
>         at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>         at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>         at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
>         at org.apache.tika.cli.TikaCLI$OutputType.process(TikaCLI.java:190)
>         at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:491)
>         at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:144)
> <?xml version="1.0" encoding="UTF-8"?><html xmlns="http://www.w3.org/1999/xhtml">
> <head>
> <meta name="date" content="2016-04-11T13:45:08Z"/>
> <meta name="extended-properties:AppVersion" content="15.0300"/>
> <meta name="dc:creator" content="nick"/>
> <meta name="extended-properties:Company" content=""/>
> <meta name="dcterms:created" content="2016-01-05T14:53:37Z"/>
> <meta name="Last-Modified" content="2016-04-11T13:45:08Z"/>
> <meta name="dcterms:modified" content="2016-04-11T13:45:08Z"/>
> <meta name="Last-Save-Date" content="2016-04-11T13:45:08Z"/>
> <meta name="protected" content="false"/>
> <meta name="meta:save-date" content="2016-04-11T13:45:08Z"/>
> <meta name="Application-Name" content="Microsoft Excel"/>
> <meta name="modified" content="2016-04-11T13:45:08Z"/>
> <meta name="Content-Length" content="9119"/>
> <meta name="Content-Type" content="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"/>
> <meta name="X-Parsed-By" content="org.apache.tika.parser.DefaultParser"/>
> <meta name="X-Parsed-By" content="org.apache.tika.parser.microsoft.ooxml.OOXMLParser"/>
> <meta name="creator" content="nick"/>
> <meta name="meta:author" content="nick"/>
> <meta name="meta:creation-date" content="2016-01-05T14:53:37Z"/>
> <meta name="extended-properties:Application" content="Microsoft Excel"/>
> <meta name="meta:last-author" content="Sam"/>
> <meta name="Creation-Date" content="2016-01-05T14:53:37Z"/>
> <meta name="resourceName" content="iae.xlsx"/>
> <meta name="Last-Author" content="Sam"/>
> <meta name="Application-Version" content="15.0300"/>
> <meta name="Author" content="nick"/>
> <meta name="publisher" content=""/>
> <meta name="dc:publisher" content=""/>
> <title/>
> </head>
> <body><div><h1>Sheet1</h1>
> <table><tbody><tr>      <td>69.99</td></tr>
> </tbody></table>
> </div>
> </body></html>
> {code}
> The actual output is what I would expect (and matches the output from
> version 1.12).
> I would expect this exception to be handled some other way, rather than
> showing up (as text) in my parsed output.
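The warning lines have the layout of the default `java.util.logging` console format (date and time plus class and method on one line, then `WARNING:` and the message). That suggests, without confirming it, that `CellFormat` in this POI version logs through `java.util.logging` rather than through POILogger, which could explain why the POILogger setting has no effect. It would also mean the trace goes to standard error, since the default `ConsoleHandler` writes there, so it should not end up in XHTML redirected from standard output. If that reading is right, one way to handle the warning differently is to give that logger its own handler that prints a single line without the stack trace. This is a minimal sketch, assuming the logger name is the class name shown in the log (`org.apache.poi.ss.format.CellFormat`); `CellFormatWarningHandler`, `install` and `summarize` are hypothetical names.

```java
import java.util.logging.Handler;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

/**
 * Hypothetical handler: prints POI CellFormat warnings as one line on stderr
 * instead of the full stack trace. Assumes the warnings are emitted through
 * java.util.logging under the logger name "org.apache.poi.ss.format.CellFormat"
 * (inferred from the log layout, not confirmed).
 */
public final class CellFormatWarningHandler extends Handler {
    static final String CELL_FORMAT_LOGGER = "org.apache.poi.ss.format.CellFormat";

    // Keep a strong reference: JUL only holds loggers weakly, so a logger
    // that gets garbage-collected would lose the configuration set below.
    private static final Logger CONFIGURED = Logger.getLogger(CELL_FORMAT_LOGGER);

    /** Call once at startup, before parsing. */
    public static void install() {
        // Stop records from reaching the root ConsoleHandler, which prints the trace.
        CONFIGURED.setUseParentHandlers(false);
        CONFIGURED.addHandler(new CellFormatWarningHandler());
    }

    /** Builds "LEVEL: message (ExceptionType: detail)" without the stack trace. */
    static String summarize(LogRecord record) {
        Throwable t = record.getThrown();
        String cause = (t == null) ? "" : " (" + t.getClass().getSimpleName() + ": " + t.getMessage() + ")";
        return record.getLevel() + ": " + record.getMessage() + cause;
    }

    @Override
    public void publish(LogRecord record) {
        // isLoggable applies this handler's level and filter, if any.
        if (isLoggable(record)) {
            System.err.println(summarize(record));
        }
    }

    @Override
    public void flush() {
        System.err.flush();
    }

    @Override
    public void close() {
        // Nothing to release: this handler only writes to System.err.
    }
}
```

Raising the level on the same logger instead (for example `setLevel(Level.SEVERE)`) would drop these warnings entirely rather than shortening them.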



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
