Clark Perkins created TIKA-3030:
-----------------------------------

             Summary: XLS files with a root node named WORKBOOK don't get parsed
                 Key: TIKA-3030
                 URL: https://issues.apache.org/jira/browse/TIKA-3030
             Project: Tika
          Issue Type: Bug
          Components: parser
    Affects Versions: 1.23
            Reporter: Clark Perkins


I have an XLS file whose root node contains two top-level entries: "WORKBOOK"
and " SummaryInformation".

The type gets properly detected as "application/vnd.ms-excel", because the 
detector does a check like so:
{noformat}
if (names.contains("Workbook") || names.contains("WORKBOOK")) {
    ...
}{noformat}
However, the ExcelExtractor silently rejects the file because the root node
doesn't contain a top-level node named "Workbook"; unlike the detector, it does
not also accept "WORKBOOK".
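To make the mismatch concrete, here is a minimal, self-contained Java sketch. It models the root node as a plain set of entry names. In the real code these names would come from the OLE2 root directory. The class and method names (WorkbookEntryLookup, detectorAccepts, extractorAcceptsToday, findWorkbookEntry) are hypothetical and are not part of Tika's or POI's API. The sketch contrasts the detector check quoted above with the extractor behaviour described in this report. It also shows one possible fix: the extractor opens whichever of the two names the detector accepted.

```java
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Hypothetical helper for illustration only; not Tika's actual implementation.
final class WorkbookEntryLookup {

    // Entry names the detector treats as an Excel workbook stream,
    // in the order an extractor could try them.
    static final List<String> WORKBOOK_NAMES = List.of("Workbook", "WORKBOOK");

    private WorkbookEntryLookup() {}

    // Mirrors the detector check quoted in the issue.
    static boolean detectorAccepts(Set<String> names) {
        return names.contains("Workbook") || names.contains("WORKBOOK");
    }

    // Mirrors the extractor behaviour described in the issue:
    // only the mixed-case "Workbook" entry is accepted.
    static boolean extractorAcceptsToday(Set<String> names) {
        return names.contains("Workbook");
    }

    // Possible fix (assumption): return the first name the detector would
    // also accept, so the extractor opens the same stream the detector saw.
    static Optional<String> findWorkbookEntry(Set<String> names) {
        for (String candidate : WORKBOOK_NAMES) {
            if (names.contains(candidate)) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Root entries from the file in this report.
        Set<String> rootNames = Set.of("WORKBOOK", " SummaryInformation");
        System.out.println("detector accepts:  " + detectorAccepts(rootNames));
        System.out.println("extractor accepts: " + extractorAcceptsToday(rootNames));
        System.out.println("entry to open:     " + findWorkbookEntry(rootNames).orElse("<none>"));
    }
}
```

Running main prints true for the detector, false for the current extractor check, and WORKBOOK as the entry to open. Keeping a single list of accepted names shared by both code paths would stop the detector and the extractor from drifting apart again.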



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
