https://issues.apache.org/bugzilla/show_bug.cgi?id=52863
--- Comment #4 from HarrySimons <[email protected]> 2012-03-10 01:33:39 UTC ---
> Do you know the origin of these failing
> docs? Were they created by MS Word or
> by OpenOffice or by what ?

They were created by a post-2003 and pre-2007 version of MS Word.

> Without a sample file we can't do much.

The very name of the document is 'Business Intelligence', so you can imagine my difficulty. The other failing documents are sensitive as well.

I thought I would be able to remove the sensitive parts of this document and then upload it for the Tika/POI developers. But merely re-saving the document in Word 2007 (i.e., without any new edits whatsoever) makes the problem mostly go away. I say 'mostly' because, while Tika/POI are then able to extract the text, they also append text like this to the output:

_-1388201556/ole-[42, 4D, 0E, 0A, 00, 00, 00, 00]
_-1388203796/ole-[42, 4D, 2E, 0A, 00, 00, 00, 00]
_-1388843352/ole-[42, 4D, 2E, 0A, 00, 00, 00, 00]
_-1388845272/ole-[42, 4D, BA, 09, 00, 00, 00, 00]
_-1388297360/ole-[42, 4D, BA, 09, 00, 00, 00, 00]
_-1388297680/ole-[42, 4D, D6, 09, 00, 00, 00, 00]
_-1388296720/ole-[42, 4D, BA, 09, 00, 00, 00, 00]
_-1388203476/ole-[42, 4D, 66, 09, 00, 00, 00, 00]
_-1382869532/ole-[42, 4D, 36, 0C, 00, 00, 00, 00]
_-1388200596/ole-[42, 4D, 2E, 0A, 00, 00, 00, 00]
_-1388200916/ole-[42, 4D, BA, 09, 00, 00, 00, 00]
_-1383036196/ole-[42, 4D, 12, 09, 00, 00, 00, 00]
_-1382867932/ole-[42, 4D, 86, 0A, 00, 00, 00, 00]
_-1382868252/ole-[42, 4D, 2E, 0A, 00, 00, 00, 00]
_-1380808936/ole-[42, 4D, 2E, 0A, 00, 00, 00, 00]

Being a developer myself, I am fully aware of how hard it can be to fix (certain) bugs without appropriate test input. I will watch out for newer releases.
