https://issues.apache.org/bugzilla/show_bug.cgi?id=54213
--- Comment #4 from Yegor Kozlov <[email protected]> ---
I don't know of an easy way to tell an MSGraph.Chart from a real Excel file.

For embedded documents, Tika should always check the ProgID; this property is stored in the host container. In this particular case you are reading embedded data from a .ppt file, so you should check OLEShape#getProgID(). For Excel it should return "Worksheet", for Word - "Document", for MSGraph - "MSGraph.Chart", etc.

One problem is that the ProgID can contain a suffix, e.g. "MSGraph.Chart.8", so the check should be a regex match or use "startsWith" logic.

(In reply to comment #3)
> Interesting, all news to me!
>
> Is there an easy way that you know to tell if a file containing a Workbook
> entry is really an Excel file, or instead a MSGraph.Chart? We'll need that
> logic for Tika
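
The prefix check on the ProgID suggested in the comment could be sketched roughly as follows. This is only an illustration: the class name `ProgIdClassifier`, the enum `EmbeddedKind`, and the method `classify` are hypothetical and not part of Tika or POI, and the ProgID strings are simply the ones quoted in the comment. The sketch takes the ProgID as a plain string, so the caller is assumed to have obtained it already (for a .ppt file, from OLEShape#getProgID() as described above).

```java
public class ProgIdClassifier {

    // Hypothetical categories for embedded OLE documents (not a Tika or POI type).
    enum EmbeddedKind { EXCEL, WORD, MSGRAPH_CHART, UNKNOWN }

    // Classifies an embedded document by its ProgID using prefix matching,
    // so that versioned values such as "MSGraph.Chart.8" are still recognised.
    // The prefixes are the values quoted in the comment, taken as assumptions.
    static EmbeddedKind classify(String progId) {
        if (progId == null) {
            return EmbeddedKind.UNKNOWN;
        }
        String id = progId.trim();
        if (id.startsWith("MSGraph.Chart")) {
            return EmbeddedKind.MSGRAPH_CHART;
        }
        if (id.startsWith("Worksheet")) {
            return EmbeddedKind.EXCEL;
        }
        if (id.startsWith("Document")) {
            return EmbeddedKind.WORD;
        }
        return EmbeddedKind.UNKNOWN;
    }

    public static void main(String[] args) {
        // Example inputs; the ".8" suffix is the case mentioned in the comment.
        System.out.println(classify("MSGraph.Chart.8"));
        System.out.println(classify("Worksheet"));
        System.out.println(classify(null));
    }
}
```

Prefix matching is used here instead of a regex because the only variation mentioned is a trailing suffix; a regex would be the alternative if the ProgID values turn out to vary in other ways.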
