https://bz.apache.org/bugzilla/show_bug.cgi?id=59021
Bug ID: 59021
Summary: XSSFSheetXMLHandler is using qName instead of
localName and missing cells/rows
Product: POI
Version: unspecified
Hardware: PC
Status: NEW
Severity: critical
Priority: P2
Component: XSSF
Assignee: [email protected]
Reporter: [email protected]
On TIKA-1859, Movses reported that he can extract content from a specific xlsx
file with POI but not with Tika.
I confirmed that the content is extractable with XSSFWorkbook.
However, Tika does a streaming read with XSSFSheetXMLHandler.
XSSFSheetXMLHandler relies on qName to find "row" and "c". In the submitted
problematic file, the qName includes the namespace prefix (i.e. "x:row",
"x:c"), so the sheet handler skips that content entirely.
When I switched the string processing in startElement and endElement in
XSSFSheetXMLHandler to rely on localName, instead of qName, content was
correctly extracted.
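
The qName/localName difference can be reproduced outside POI with a plain SAX
handler. The following is only a minimal sketch, not the actual patch to
XSSFSheetXMLHandler: the class QNameVsLocalName, its CountingHandler, and the
sample XML strings are hypothetical and were written for illustration (the
fragment is not taken from the file attached to TIKA-1859). It assumes a
namespace-aware parser, since localName is only reliably populated in that
case.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class QNameVsLocalName {

    /** Hypothetical handler: counts "row" and "c" elements by qName and by localName. */
    static class CountingHandler extends DefaultHandler {
        int byQName = 0;
        int byLocalName = 0;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attrs) {
            // Matching on qName misses prefixed elements such as "x:row".
            if ("row".equals(qName) || "c".equals(qName)) {
                byQName++;
            }
            // Matching on localName ignores whatever prefix the file uses.
            if ("row".equals(localName) || "c".equals(localName)) {
                byLocalName++;
            }
        }
    }

    /** Returns { matches by qName, matches by localName } for the given XML. */
    public static int[] count(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // needed so that localName is populated
        CountingHandler handler = new CountingHandler();
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        return new int[] { handler.byQName, handler.byLocalName };
    }

    public static void main(String[] args) throws Exception {
        String ns = "http://schemas.openxmlformats.org/spreadsheetml/2006/main";
        // Same content twice: once with an "x:" prefix, once with a default namespace.
        String prefixed = "<x:worksheet xmlns:x=\"" + ns + "\"><x:sheetData>"
                + "<x:row r=\"1\"><x:c r=\"A1\"><x:v>1</x:v></x:c></x:row>"
                + "</x:sheetData></x:worksheet>";
        String unprefixed = "<worksheet xmlns=\"" + ns + "\"><sheetData>"
                + "<row r=\"1\"><c r=\"A1\"><v>1</v></c></row>"
                + "</sheetData></worksheet>";

        int[] p = count(prefixed);
        int[] u = count(unprefixed);
        System.out.println("prefixed:   qName=" + p[0] + " localName=" + p[1]);
        System.out.println("unprefixed: qName=" + u[0] + " localName=" + u[1]);
    }
}
```

With the prefixed input, the qName comparison finds no rows or cells while the
localName comparison finds both; with the unprefixed input the two agree. A
stricter variant could also compare the namespace URI passed to startElement,
but that goes beyond the change described here.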
Movses ranked this a blocker on Tika. It would be great if we could get the
fix in before we cut 3.14. I should have time tonight to make the fix in
trunk.