[ https://issues.apache.org/jira/browse/TIKA-873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13226499#comment-13226499 ]
Albert L. commented on TIKA-873:
--------------------------------

Hi Nick,

"testWORD_embeded.doc" is working. I get the following:

C:\code\temp>java -jar c:\code\tika-app-1.0.jar -z testWORD_embeded.doc
Extracting 'image1' (image/unknown)
Extracting 'image4.png' (image/png)
Extracting 'image5.jpg' (image/jpeg)
Extracting 'image6.png' (image/png)
Extracting 'image2' (image/unknown)
Extracting 'image3' (image/unknown)
Extracting 'file0.docx' (application/vnd.openxmlformats-officedocument.wordprocessingml.document)
Extracting '_1345471035.ppt' (application/vnd.ms-powerpoint)
Extracting '_1345470949.xls' (application/vnd.ms-excel)

Albert

> Tika --extract fails for DOC
> ----------------------------
>
>                 Key: TIKA-873
>                 URL: https://issues.apache.org/jira/browse/TIKA-873
>             Project: Tika
>          Issue Type: Bug
>          Components: general
>    Affects Versions: 1.0
>         Environment: Windows 7 + Java v1.6
>            Reporter: Albert L.
>             Fix For: 1.2
>
>         Attachments: embedded.doc
>
>
> A file that is embedded in a DOC file doesn't get extracted to disk.
> To "embed" a file into a DOC, simply drag and drop it into a DOC document
> when using MS Word 2010. Word then creates an EMF of the embedded file's
> preview.
> See the attached file "embedded.doc" for an example input file that fails
> with Tika v1.0.