Hello!

On 05.09.2011, at 16:23, Jukka Zitting wrote:

> That was me in revision 1164578 for TIKA-704. :-(
> 
>> -            if (root.hasEntry("CONTENTS")) {
>> -                stream = TikaInputStream.get(
>> -                        fs.createDocumentInputStream("CONTENTS"));
> 
> This was my attempt at properly handling the embedded PDF in
> TestWithPdf.docx. It was included in an OLE object with the PDF
> document as its "CONTENTS" entry. I restored this functionality with
> some more specific checks in revision 1165259, and the resulting code
> should now work correctly with all the test documents we have.

Hm, that is strange: the current version of
OfficeParser.POIFSDocumentType.detectType() treats a "CONTENTS" entry as
a sign that the POI filesystem is an MS Works document. Maybe that is not right.

Please add a unit test with that TestWithPdf.docx.

best wishes, Max
