Hi, I've added a test for this case at r1626706. TIKA-1421 is currently open and blocks the release.
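To make the failure mode concrete, here is a minimal sketch of the pattern quoted further down in this thread. It does not use Tika itself: `Source`, `streamOnly`, `selectZipFile` and `entryCount` are hypothetical names made up for illustration. The null check is just one way to guard the dereference, not necessarily how TIKA-1412 was fixed or what the test at r1626706 covers.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.ZipFile;

class ZipFileSelectionSketch {

    /** Hypothetical stand-in for the parser input; this is not the Tika API. */
    interface Source {
        Object getOpenContainer(); // may be null
        boolean hasFile();
        File getFile();
    }

    /** A source backed only by a stream: no open container and no file. */
    static Source streamOnly() {
        return new Source() {
            public Object getOpenContainer() { return null; }
            public boolean hasFile() { return false; }
            public File getFile() { throw new IllegalStateException("no backing file"); }
        };
    }

    /** Same shape as the quoted block: stays null when neither branch applies. */
    static ZipFile selectZipFile(Source source) {
        Object container = source.getOpenContainer();
        ZipFile zipFile = null;
        if (container instanceof ZipFile) {
            zipFile = (ZipFile) container;
        } else if (source.hasFile()) {
            try {
                zipFile = new ZipFile(source.getFile());
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return zipFile;
    }

    /** Guarded use: check for null instead of dereferencing unconditionally. */
    static int entryCount(Source source) {
        ZipFile zipFile = selectZipFile(source);
        if (zipFile == null) {
            return -1; // the caller would need another way to read the data
        }
        return zipFile.size(); // left open here for brevity
    }

    public static void main(String[] args) {
        Source source = streamOnly();
        System.out.println(selectZipFile(source) == null);
        System.out.println(entryCount(source));
    }
}
```

With a stream-only source neither branch runs, so any unconditional use of `zipFile` afterwards would throw a NullPointerException, which matches the symptom reported in the thread.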
Hong-Thai

-----Original Message-----
From: Ken Krugler [mailto:kkrugler_li...@transpac.com]
Sent: Thursday, September 11, 2014 23:07
To: dev@tika.apache.org
Subject: RE: NPE on all *.odt, odp, .ods documents

> From: Hong-Thai Nguyen
> Sent: September 11, 2014 1:40:08pm PDT
> To: dev@tika.apache.org
> Subject: Re: NPE on all *.odt, odp, .ods documents
>
> I was wrong when I said that all OpenDocument files fail. Some files
> pass, but a lot of them fail with an NPE in OpenDocumentParser line 161.

OK, thanks for clarifying.

So I assume we now have a unit test that would fail without the fix, yes?

Thanks,

-- Ken

>
> I'm looking at OpenDocumentParser.java in 1.6. The bug comes from the
> block at lines 126-130 when the input is a TikaInputStream (our case):
>     if (container instanceof ZipFile) {
>         zipFile = (ZipFile) container;
>     } else if (tis.hasFile()) {
>         zipFile = new ZipFile(tis.getFile());
>     }
>
> When neither condition holds, zipFile is never created.
>
>
> For information, this bug is indeed fixed in 1.7-SNAPSHOT. Here is a
> comparison of the two versions on the same corpus:
> 1.6:
> 14-09-09 16:17:43 INFO (DocumentConversionErrorPlugin.java : 115)
> [pool-2 -thread-2] Summary of document conversion errors:
> - pdf (7)
> - pptx (10)
> - doc (6)
> - ppt (14)
> - xls (9)
> - dwg (4)
> - odp (495)
> - odt (839)
> - pps (2)
> - ods (1)
>
> 1.7-SNAPSHOT:
> - pdf (7)
> - pptx (10)
> - doc (6)
> - ppt (14)
> - xls (9)
> - dwg (4)
> - odp (2)
> - pps (2)
>
>
> On Thu, Sep 11, 2014 at 8:55 PM, Ken Krugler
> <kkrugler_li...@transpac.com>
> wrote:
>
>>
>>> From: Hong-Thai Nguyen
>>> Sent: September 11, 2014 5:21:41am PDT
>>> To: dev@tika.apache.org
>>> Subject: NPE on all *.odt, odp, .ods documents
>>>
>>> Hi all,
>>>
>>> I've tested conversion with Tika 1.6 on our corpus; all OpenOffice
>>> document types fail with an NPE. A fix has been made in
>>> https://issues.apache.org/jira/browse/TIKA-1412, but it is only available from 1.7.
>>> That's a fatal error for me.
>>
>> I'm curious - don't we have unit tests for OpenOffice document types?
>>
>> If so, then why are they passing, but all docs tried by Hong-Thai fail?
>>
>> -- Ken
>>
>>>
>>> Should we release a 1.6.1 with the fix for TIKA-1412?
>>>
>>> Stack trace:
>>> Caused by: com.polyspot.document.converter.ConversionException: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.ParserDecorator$1@318e5904
>>>     at com.polyspot.document.converter.DocumentConverter.realizeTikaConversion(DocumentConverter.java:233)
>>>     at com.polyspot.document.converter.DocumentConverter.convert(DocumentConverter.java:127)
>>>     at com.polyspot.wscrawlers.PsDocConverter.getConvertedDocument(PsDocConverter.java:83)
>>>     ... 22 more
>>> Caused by: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.ParserDecorator$1@318e5904
>>>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:246)
>>>     at com.polyspot.document.converter.DocumentConverter.realizeTikaConversion(DocumentConverter.java:225)
>>>     ... 24 more
>>> Caused by: java.lang.NullPointerException
>>>     at org.apache.tika.parser.odf.OpenDocumentParser.parse(OpenDocumentParser.java:161)
>>>     at org.apache.tika.parser.ParserDecorator.parse(ParserDecorator.java:91)
>>>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:244)
>>>     ... 25 more
>>>
>>> --
>>> --------------
>>> Hong-Thai

--------------------------
Ken Krugler
+1 530-210-6378
http://www.scaleunlimited.com
custom big data solutions & training
Hadoop, Cascading, Cassandra & Solr