Hi,

I've added a test for this case in r1626706.
We still have TIKA-1421, which is blocking the release.

Hong-Thai

-----Original Message-----
From: Ken Krugler [mailto:kkrugler_li...@transpac.com]
Sent: Thursday, 11 September 2014 23:07
To: dev@tika.apache.org
Subject: RE: NPE on all *.odt, odp, .ods documents


> From: Hong-Thai Nguyen
> Sent: September 11, 2014 1:40:08pm PDT
> To: dev@tika.apache.org
> Subject: Re: NPE on all *.odt, odp, .ods documents
> 
> I was wrong when I said that all OpenDocument files failed. Some files
> passed, but a lot of them failed with an NPE at OpenDocumentParser line 161.

OK, thanks for clarifying.

So I assume we now have a unit test that would fail without the fix, yes?

Thanks,

-- Ken

> 
> I'm looking at OpenDocumentParser.java in 1.6. The bug comes from the
> block at lines 126-130 when the input is a TikaInputStream (our case):
> if (container instanceof ZipFile) {
>     zipFile = (ZipFile) container;
> } else if (tis.hasFile()) {
>     zipFile = new ZipFile(tis.getFile());
> }
> 
> If the container is not a ZipFile and the stream is not backed by a file,
> neither branch runs, so zipFile is never assigned.
> 
> 
> For information, this bug is indeed fixed in 1.7-SNAPSHOT. Here is a
> comparison of the two versions on the same corpus:
> 1.6:
> 14-09-09 16:17:43 INFO  (DocumentConversionErrorPlugin.java : 115) [pool-2-thread-2] Summary of document conversion errors:
> - pdf (7)
> - pptx (10)
> - doc (6)
> - ppt (14)
> - xls (9)
> - dwg (4)
> - odp (495)
> - odt (839)
> - pps (2)
> - ods (1)
> 
> 1.7-SNAPSHOT:
> - pdf (7)
> - pptx (10)
> - doc (6)
> - ppt (14)
> - xls (9)
> - dwg (4)
> - odp (2)
> - pps (2)
> 
> 
> On Thu, Sep 11, 2014 at 8:55 PM, Ken Krugler <kkrugler_li...@transpac.com> wrote:
> 
>> 
>>> From: Hong-Thai Nguyen
>>> Sent: September 11, 2014 5:21:41am PDT
>>> To: dev@tika.apache.org
>>> Subject: NPE on all *.odt, odp, .ods documents
>>> 
>>> Hi all,
>>> 
>>> I've tested conversion with Tika 1.6 on our corpus, and all OpenOffice
>>> document types fail with an NPE. A fix has been made in
>>> https://issues.apache.org/jira/browse/TIKA-1412, but it is only available from 1.7.
>>> That's a fatal error for me.
>> 
>> I'm curious - don't we have unit tests for OpenOffice document types?
>> 
>> If so, why are they passing when every doc Hong-Thai tried fails?
>> 
>> -- Ken
>> 
>>> 
>>> Should we release a 1.6.1 with the fix for TIKA-1412?
>>> 
>>> Stack trace:
>>> Caused by: com.polyspot.document.converter.ConversionException: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.ParserDecorator$1@318e5904
>>>     at com.polyspot.document.converter.DocumentConverter.realizeTikaConversion(DocumentConverter.java:233)
>>>     at com.polyspot.document.converter.DocumentConverter.convert(DocumentConverter.java:127)
>>>     at com.polyspot.wscrawlers.PsDocConverter.getConvertedDocument(PsDocConverter.java:83)
>>>     ... 22 more
>>> Caused by: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.ParserDecorator$1@318e5904
>>>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:246)
>>>     at com.polyspot.document.converter.DocumentConverter.realizeTikaConversion(DocumentConverter.java:225)
>>>     ... 24 more
>>> Caused by: java.lang.NullPointerException
>>>     at org.apache.tika.parser.odf.OpenDocumentParser.parse(OpenDocumentParser.java:161)
>>>     at org.apache.tika.parser.ParserDecorator.parse(ParserDecorator.java:91)
>>>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:244)
>>>     ... 25 more
>>> 
>>> --
>>> --------------
>>> Hong-Thai




--------------------------
Ken Krugler
+1 530-210-6378
http://www.scaleunlimited.com
custom big data solutions & training
Hadoop, Cascading, Cassandra & Solr
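For illustration only: a minimal sketch of how code shaped like the 1.6 block quoted in this thread (OpenDocumentParser.java lines 126-130) could avoid using an unassigned zipFile. When no ZipFile is available (the container is not a ZipFile and the stream is not file-backed), it falls back to reading the package sequentially with the JDK's java.util.zip.ZipInputStream. This is not the actual TIKA-1412 change; the class OdfEntryLister and its methods are hypothetical, and the sketch uses only JDK classes, not Tika's.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical sketch, not Tika code: list the entries of an ODF package
// without assuming that a ZipFile was obtained.
public class OdfEntryLister {

    // zipFile may be null, as in the 1.6 case where neither branch assigned it.
    static List<String> listEntries(ZipFile zipFile, InputStream stream) throws IOException {
        List<String> names = new ArrayList<>();
        if (zipFile != null) {
            // Random access: the container was a ZipFile or the stream was file-backed.
            Enumeration<? extends ZipEntry> entries = zipFile.entries();
            while (entries.hasMoreElements()) {
                names.add(entries.nextElement().getName());
            }
        } else {
            // Streaming fallback: read the ZIP sequentially instead of dereferencing null.
            // Not closed here, because closing it would also close the caller's stream.
            ZipInputStream zis = new ZipInputStream(stream);
            ZipEntry entry;
            while ((entry = zis.getNextEntry()) != null) {
                names.add(entry.getName());
            }
        }
        return names;
    }

    // Builds a tiny in-memory ZIP with two ODF-style entry names, for demonstration only.
    static byte[] sampleOdfBytes() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bytes)) {
            zos.putNextEntry(new ZipEntry("mimetype"));
            zos.write("application/vnd.oasis.opendocument.text".getBytes(StandardCharsets.US_ASCII));
            zos.closeEntry();
            zos.putNextEntry(new ZipEntry("content.xml"));
            zos.write("<office:document-content/>".getBytes(StandardCharsets.UTF_8));
            zos.closeEntry();
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Simulate input that is neither a ZipFile container nor file-backed.
        InputStream in = new ByteArrayInputStream(sampleOdfBytes());
        System.out.println(listEntries(null, in)); // prints [mimetype, content.xml]
    }
}
```

With file-backed input, passing an open ZipFile takes the random-access branch instead. The point of the design is that a missing ZipFile selects a different code path rather than being dereferenced later.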




