Clark Perkins created TIKA-3030:
-----------------------------------
Summary: XLS files with a root node named WORKBOOK don't get parsed
Key: TIKA-3030
URL: https://issues.apache.org/jira/browse/TIKA-3030
Project: Tika
Issue Type: Bug
Components: parser
Affects Versions: 1.23
Reporter: Clark Perkins
I have an XLS file whose root node contains two top-level entries: "WORKBOOK"
and " SummaryInformation".
The type gets properly detected as "application/vnd.ms-excel", because the
detector does a check like so:
{noformat}
if (names.contains("Workbook") || names.contains("WORKBOOK")) {
    ...
}
{noformat}
However, the ExcelExtractor silently rejects the file because the root node
doesn't contain a top-level node named "Workbook".
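A minimal sketch of the mismatch, in plain Java with no Tika or POI dependencies. The class and method names (WorkbookEntryCheck, detectorAccepts, strictExtractorAccepts, findWorkbookEntry) are hypothetical and only model the behaviour described in this report; they are not the actual Tika or POI code. The last method shows one possible lenient lookup that accepts the same spellings as the detector.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Optional;
import java.util.Set;

public class WorkbookEntryCheck {

    // Mirrors the detector check quoted in this report: either spelling is accepted.
    static boolean detectorAccepts(Set<String> names) {
        return names.contains("Workbook") || names.contains("WORKBOOK");
    }

    // Models the stricter behaviour described for the extractor (an assumption
    // based on this report): only the exact name "Workbook" is accepted.
    static boolean strictExtractorAccepts(Set<String> names) {
        return names.contains("Workbook");
    }

    // Hypothetical lenient lookup: returns whichever accepted spelling is present.
    static Optional<String> findWorkbookEntry(Set<String> names) {
        for (String candidate : new String[] {"Workbook", "WORKBOOK"}) {
            if (names.contains(candidate)) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Entry names as written in this report.
        Set<String> names = new HashSet<>(Arrays.asList("WORKBOOK", " SummaryInformation"));

        System.out.println("detector accepts:  " + detectorAccepts(names));
        System.out.println("extractor accepts: " + strictExtractorAccepts(names));
        System.out.println("lenient lookup:    " + findWorkbookEntry(names).orElse("<none>"));
    }
}
```

With the entry names from this report, the detector check passes while the strict check fails, which matches the observed behaviour: the file is typed as "application/vnd.ms-excel" but is not parsed.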