Hello!

On 05.09.2011, at 16:23, Jukka Zitting wrote:
> That was me in revision 1164578 for TIKA-704. :-(
>
>> - if (root.hasEntry("CONTENTS")) {
>> -     stream = TikaInputStream.get(
>> -             fs.createDocumentInputStream("CONTENTS"));
>
> This was my attempt at properly handling the embedded PDF in
> TestWithPdf.docx. It was included in an OLE object with the PDF
> document as its "CONTENTS" entry. I restored this functionality with
> some more specific checks in revision 1165259, and the resulting code
> should now work correctly with all the test documents we have.

Hm, that is strange: the current version of
OfficeParser.POIFSDocumentType.detectType() treats a "CONTENTS" entry
as a sign that the POI filesystem is an MS Works document. Maybe this
is not right.

Please add a unit test with that TestWithPdf.docx.

Best wishes,
Max