https://issues.apache.org/bugzilla/show_bug.cgi?id=55732
--- Comment #4 from Marcel Pokrandt <[email protected]> ---
I can confirm this bug with an old '97 PPT of my own, which contains nothing
more than an empty text area.

Caused by: java.lang.ArrayIndexOutOfBoundsException: 20
    at org.apache.poi.util.LittleEndian.getInt(LittleEndian.java:161)
    at org.apache.poi.hslf.record.StyleTextProp9Atom.<init>(StyleTextProp9Atom.java:70)
    ... 65 more

I have attached a small test case and a suggested fix, in the form of a patch
to the class org.apache.poi.hslf.record.StyleTextProp9Atom. Before reading the
unused fields textCfException9 and textSiException, the patch checks whether
the offset is already past the end of the array:

    if (i >= data.length) { break; }

Since neither field is used anywhere, I think it should be safe to skip reading
them in this case. With my patch, two of my files that failed with this error
now parse successfully, and I could extract their text.

I would really appreciate it if you could integrate this patch, because I am
using POI/Tika to index a large number of Office files, and many of them seem
to fail with the same error.
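For readers without the attachment, the idea behind the guard can be sketched roughly as follows. This is not the POI source and not the attached patch: the record layout (a 4-byte mask followed by two unused 4-byte fields) and the names TrailingFieldSketch, readMasks, unusedA and unusedB are assumptions made up for illustration. Only the check on i against data.length is taken from the comment above.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Illustrative sketch only, NOT the POI implementation. The layout and all
 * names in this class are hypothetical.
 */
final class TrailingFieldSketch {

    /** Reads a 32-bit little-endian int; throws if fewer than 4 bytes remain. */
    static int getInt(byte[] data, int offset) {
        return (data[offset] & 0xFF)
             | ((data[offset + 1] & 0xFF) << 8)
             | ((data[offset + 2] & 0xFF) << 16)
             | ((data[offset + 3] & 0xFF) << 24);
    }

    /**
     * Walks a buffer of entries. Each entry is assumed to be a 4-byte mask,
     * normally followed by two 4-byte fields that are read but never used.
     */
    static List<Integer> readMasks(byte[] data) {
        List<Integer> masks = new ArrayList<>();
        int i = 0;
        while (i < data.length) {
            masks.add(getInt(data, i));
            i += 4;

            // Guard from the comment: if the buffer ends right after the
            // mask, stop instead of reading the unused fields out of bounds.
            if (i >= data.length) {
                break;
            }

            int unusedA = getInt(data, i); // value is discarded
            i += 4;
            int unusedB = getInt(data, i); // value is discarded
            i += 4;
        }
        return masks;
    }
}
```

With a 16-byte buffer holding one complete entry followed by a lone mask, readMasks returns both masks; without the guard, the third read would start at offset 16 and throw ArrayIndexOutOfBoundsException. Note that the guard only covers a buffer that ends exactly at a field boundary; a buffer cut off in the middle of the unused fields would still fail.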
