https://issues.apache.org/bugzilla/show_bug.cgi?id=54213
--- Comment #2 from Yegor Kozlov <[email protected]> ---

The raw object is an MSGraph.Chart, not an Excel workbook. Don't be misled by the stream name "Workbook": it is just a format convention.

The MSGraph.Chart format is a derivative of BIFF8. The content stream consists of records, but the structure and length of those records *CAN* be totally different from their analogues in the binary .xls format.

For example, the POI-HSSF parser identifies a record with sid=0x3d as WindowOneRecord and expects it to be 18 bytes long (nine fields of two bytes each). The MSGraph.Chart format is different: depending on the position of the WindowOne record in the stream, it can be either 18 bytes (nine two-byte fields) or 10 bytes (five two-byte fields); see section 2.4.104 in [MS-OGRAPH].pdf.

I found similar discrepancies for SelectionRecord (0x001D) and LinkedDataRecord (0x1051).

All this means that using HSSF to parse MSGraph.Chart is not quite correct. It is a special case, and it needs a special parser.

What information do you need to extract from embedded charts? Series text and data labels? What else?

I'm thinking of a special record factory and an event-driven parser that will read only specific bits of data. We may need to extend the current API to support it.

Regards,
Yegor
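
To illustrate the event-driven idea from the comment, here is a minimal sketch of a scanner that trusts each record's declared length instead of a fixed layout per sid. It is not POI code: GraphRecordScanner, RawRecord, RecordListener and readShorts are hypothetical names made up for this example. It also assumes that the MSGraph.Chart stream uses the usual BIFF record header (a two-byte sid followed by a two-byte payload length, both little-endian), and it does not interpret the fields it reads.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical sketch, not part of the POI API. */
final class GraphRecordScanner {

    /** sid that HSSF maps to WindowOneRecord. */
    static final int SID_WINDOW1 = 0x003D;

    /** One raw record: the sid plus its undecoded payload. */
    static final class RawRecord {
        final int sid;
        final byte[] data;

        RawRecord(int sid, byte[] data) {
            this.sid = sid;
            this.data = data;
        }
    }

    /** Callback invoked once per record, in stream order. */
    interface RecordListener {
        void onRecord(RawRecord record);
    }

    /**
     * Walks the stream record by record. The payload size comes from the
     * record header (assumed layout: 2-byte sid, 2-byte length), never from
     * a per-sid expectation, so an 18-byte and a 10-byte record with the
     * same sid are both accepted.
     */
    static void scan(byte[] stream, RecordListener listener) {
        ByteBuffer buf = ByteBuffer.wrap(stream).order(ByteOrder.LITTLE_ENDIAN);
        while (buf.remaining() >= 4) {
            int sid = buf.getShort() & 0xFFFF;
            int length = buf.getShort() & 0xFFFF;
            if (length > buf.remaining()) {
                throw new IllegalArgumentException(
                        "Truncated record, sid=0x" + Integer.toHexString(sid));
            }
            byte[] data = new byte[length];
            buf.get(data);
            listener.onRecord(new RawRecord(sid, data));
        }
    }

    /** Splits a payload into unsigned two-byte fields without interpreting them. */
    static int[] readShorts(byte[] data) {
        if (data.length % 2 != 0) {
            throw new IllegalArgumentException("Odd payload length: " + data.length);
        }
        ByteBuffer buf = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN);
        int[] fields = new int[data.length / 2];
        for (int i = 0; i < fields.length; i++) {
            fields[i] = buf.getShort() & 0xFFFF;
        }
        return fields;
    }
}
```

A listener could then check for sid == GraphRecordScanner.SID_WINDOW1 and branch on whether readShorts returns nine or five fields, ignoring every record type it does not care about.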
