I was wrong when I said that all OpenDocument files fail: some files
passed, but a lot of them failed with an NPE at OpenDocumentParser line 161.

I'm looking at OpenDocumentParser.java in 1.6. The bug comes from the block
at lines 126-130 when the input is a TikaInputStream (our case):
    if (container instanceof ZipFile) {
        zipFile = (ZipFile) container;
    } else if (tis.hasFile()) {
        zipFile = new ZipFile(tis.getFile());
    }
    // no else branch: if neither condition holds, zipFile stays null

When the container is not a ZipFile and the TikaInputStream is not backed
by a file, neither branch runs, so zipFile is never created.
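A minimal sketch of the missing fallback, written against plain java.util.zip rather than Tika's API. The class and method names (ZipFallbackSketch, listEntries, buildSampleZip) are hypothetical, and a null File stands in for tis.hasFile() returning false; this is not the TIKA-1412 patch. The idea is only that every path should end with either a ZipFile or a ZipInputStream, never neither.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class ZipFallbackSketch {

    // Hypothetical helper, not Tika code. 'file' plays the role of tis.getFile():
    // it is null when the input is not file-backed (tis.hasFile() == false).
    static List<String> listEntries(File file, InputStream stream) throws IOException {
        List<String> names = new ArrayList<>();
        if (file != null) {
            // File-backed input: random access through ZipFile.
            try (ZipFile zipFile = new ZipFile(file)) {
                Enumeration<? extends ZipEntry> entries = zipFile.entries();
                while (entries.hasMoreElements()) {
                    names.add(entries.nextElement().getName());
                }
            }
        } else {
            // Stream-only input: fall back to sequential reading instead of
            // leaving the zip handle null. The caller still owns 'stream'.
            ZipInputStream zipStream = new ZipInputStream(stream);
            ZipEntry entry;
            while ((entry = zipStream.getNextEntry()) != null) {
                names.add(entry.getName());
            }
        }
        return names;
    }

    // Builds a tiny two-entry zip in memory, standing in for an .odt package.
    static byte[] buildSampleZip() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream out = new ZipOutputStream(bytes)) {
            out.putNextEntry(new ZipEntry("mimetype"));
            out.write("application/vnd.oasis.opendocument.text".getBytes(StandardCharsets.US_ASCII));
            out.closeEntry();
            out.putNextEntry(new ZipEntry("content.xml"));
            out.write("<office:document-content/>".getBytes(StandardCharsets.UTF_8));
            out.closeEntry();
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // No backing file: the case that left zipFile null in 1.6.
        List<String> names = listEntries(null, new ByteArrayInputStream(buildSampleZip()));
        System.out.println(names); // prints [mimetype, content.xml]
    }
}
```

Reading sequentially through ZipInputStream gives up random access, but it needs nothing beyond the stream itself, which is exactly what is available when there is neither an open ZipFile container nor a backing file.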


For information, this bug is indeed fixed in 1.7-SNAPSHOT. Here is a
comparison of the two versions on the same corpus:
1.6:
14-09-09 16:17:43 INFO  (DocumentConversionErrorPlugin.java : 115) [pool-2-thread-2] Summary of document conversion errors:
- pdf (7)
- pptx (10)
- doc (6)
- ppt (14)
- xls (9)
- dwg (4)
- odp (495)
- odt (839)
- pps (2)
- ods (1)

1.7-SNAPSHOT:
- pdf (7)
- pptx (10)
- doc (6)
- ppt (14)
- xls (9)
- dwg (4)
- odp (2)
- pps (2)
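Totalling the two summaries makes the difference concrete. Below is a small Java sketch (the class and method names are illustrative only) that sums the per-extension counts copied from the logs above. It shows OpenDocument failures (odt, odp, ods) dropping from 1335 to 2, while the remaining 52 errors are identical in both versions.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class OdfErrorTally {

    // OpenDocument extensions reported in the summaries.
    static final String[] ODF = {"odt", "odp", "ods"};

    // Sums the error counts for the OpenDocument extensions only.
    static int odfErrors(Map<String, Integer> summary) {
        int total = 0;
        for (String ext : ODF) {
            total += summary.getOrDefault(ext, 0);
        }
        return total;
    }

    // Sums the error counts for every extension in the summary.
    static int allErrors(Map<String, Integer> summary) {
        int total = 0;
        for (int n : summary.values()) {
            total += n;
        }
        return total;
    }

    // Builds a map from alternating extension/count pairs.
    static Map<String, Integer> summary(Object... pairs) {
        Map<String, Integer> m = new LinkedHashMap<>();
        for (int i = 0; i < pairs.length; i += 2) {
            m.put((String) pairs[i], (Integer) pairs[i + 1]);
        }
        return m;
    }

    static final Map<String, Integer> V16 = summary(
        "pdf", 7, "pptx", 10, "doc", 6, "ppt", 14, "xls", 9,
        "dwg", 4, "odp", 495, "odt", 839, "pps", 2, "ods", 1);

    static final Map<String, Integer> V17 = summary(
        "pdf", 7, "pptx", 10, "doc", 6, "ppt", 14, "xls", 9,
        "dwg", 4, "odp", 2, "pps", 2);

    public static void main(String[] args) {
        // prints "1.6: 1335 ODF of 1387 total"
        System.out.println("1.6: " + odfErrors(V16) + " ODF of " + allErrors(V16) + " total");
        // prints "1.7-SNAPSHOT: 2 ODF of 54 total"
        System.out.println("1.7-SNAPSHOT: " + odfErrors(V17) + " ODF of " + allErrors(V17) + " total");
    }
}
```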


On Thu, Sep 11, 2014 at 8:55 PM, Ken Krugler <kkrugler_li...@transpac.com>
wrote:

>
> > From: Hong-Thai Nguyen
> > Sent: September 11, 2014 5:21:41am PDT
> > To: dev@tika.apache.org
> > Subject: NPE on all *.odt, odp, .ods documents
> >
> > Hi all,
> >
> > I've tested conversion with Tika 1.6 on our corpus: all OpenOffice
> > document types fail with an NPE. A fix has been made in
> > https://issues.apache.org/jira/browse/TIKA-1412, but it is only available
> > from 1.7. That's a fatal error for me.
>
> I'm curious - don't we have unit tests for OpenOffice document types?
>
> If so, why are they passing when all docs tried by Hong-Thai fail?
>
> -- Ken
>
> >
> > Should we release a 1.6.1 with the fix for TIKA-1412?
> >
> > Stack trace:
> > Caused by: com.polyspot.document.converter.ConversionException: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.ParserDecorator$1@318e5904
> >     at com.polyspot.document.converter.DocumentConverter.realizeTikaConversion(DocumentConverter.java:233)
> >     at com.polyspot.document.converter.DocumentConverter.convert(DocumentConverter.java:127)
> >     at com.polyspot.wscrawlers.PsDocConverter.getConvertedDocument(PsDocConverter.java:83)
> >     ... 22 more
> > Caused by: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.ParserDecorator$1@318e5904
> >     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:246)
> >     at com.polyspot.document.converter.DocumentConverter.realizeTikaConversion(DocumentConverter.java:225)
> >     ... 24 more
> > Caused by: java.lang.NullPointerException
> >     at org.apache.tika.parser.odf.OpenDocumentParser.parse(OpenDocumentParser.java:161)
> >     at org.apache.tika.parser.ParserDecorator.parse(ParserDecorator.java:91)
> >     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:244)
> >     ... 25 more
> >
> > --
> > --------------
> > Hong-Thai
>
>
> --------------------------
> Ken Krugler
> +1 530-210-6378
> http://www.scaleunlimited.com
> custom big data solutions & training
> Hadoop, Cascading, Cassandra & Solr


-- 
--------------
Hong-Thai
