Nick Burch created TIKA-1490:
--------------------------------

             Summary: Basic parser for old Excel files (eg Excel 4)
                 Key: TIKA-1490
                 URL: https://issues.apache.org/jira/browse/TIKA-1490
             Project: Tika
          Issue Type: Improvement
          Components: parser
    Affects Versions: 1.6
            Reporter: Nick Burch


In TIKA-1487, we added mime magic for the pre-OLE2 Excel file formats. Based on 
a reading of the OpenOffice.org documentation of the Excel file format, it looks 
like it should be possible to produce a basic parser that extracts key bits of 
information (e.g. strings) from these older file formats. 

This would most likely be done with a custom record iterator for the older 
formats, which would pass the handful of "interesting" records to POI's record 
classes (perhaps with some tweaks for the older formats) to have the binary 
data parsed, with the results then returned by the parser.
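The record-iterator idea above could be sketched roughly as follows. This is a 
minimal, hypothetical sketch: OldExcelRecordIterator, RawRecord, labelText and 
extractLabels are illustrative names, not existing Tika or POI APIs. It assumes 
that each BIFF2-BIFF4 record is a 2-byte little-endian record type, a 2-byte 
little-endian data length, then the data, and it takes the LABEL record offsets 
from a reading of the OpenOffice.org documentation (worth re-checking before 
relying on them). Strings are decoded as ISO-8859-1 purely for simplicity.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical iterator over pre-OLE2 (BIFF2-BIFF4) Excel records. */
public class OldExcelRecordIterator {
    /** One raw record: its type (sid) and its undecoded payload. */
    public static final class RawRecord {
        public final int sid;
        public final byte[] data;
        RawRecord(int sid, byte[] data) { this.sid = sid; this.data = data; }
    }

    // Assumed LABEL record types: 0x0004 in BIFF2, 0x0204 in BIFF3/BIFF4.
    static final int LABEL_BIFF2 = 0x0004;
    static final int LABEL_BIFF3_4 = 0x0204;

    private final DataInputStream in;

    public OldExcelRecordIterator(InputStream in) { this.in = new DataInputStream(in); }

    /** Reads an unsigned 16-bit little-endian value; -1 on a clean EOF if allowed. */
    private int readUShortLE(boolean eofAllowed) throws IOException {
        int lo = in.read();
        if (lo == -1 && eofAllowed) return -1;
        int hi = in.read();
        if (lo == -1 || hi == -1) throw new EOFException("Truncated record header");
        return lo | (hi << 8);
    }

    /** Returns the next record, or null once the stream is exhausted. */
    public RawRecord next() throws IOException {
        int sid = readUShortLE(true);
        if (sid == -1) return null;
        int length = readUShortLE(false);
        byte[] data = new byte[length];
        in.readFully(data); // throws EOFException if the record body is truncated
        return new RawRecord(sid, data);
    }

    /** Returns the cell text of a LABEL record, or null for any other record. */
    public static String labelText(RawRecord r) {
        if (r.sid == LABEL_BIFF2) {
            // row(2), column(2), cell attributes(3), then an 8-bit string length
            int len = r.data[7] & 0xFF;
            return new String(r.data, 8, len, StandardCharsets.ISO_8859_1);
        }
        if (r.sid == LABEL_BIFF3_4) {
            // row(2), column(2), XF index(2), then a 16-bit string length
            int len = (r.data[6] & 0xFF) | ((r.data[7] & 0xFF) << 8);
            return new String(r.data, 8, len, StandardCharsets.ISO_8859_1);
        }
        return null;
    }

    /** Walks every record and collects the text of the LABEL ("interesting") ones. */
    public static List<String> extractLabels(InputStream stream) throws IOException {
        OldExcelRecordIterator it = new OldExcelRecordIterator(stream);
        List<String> out = new ArrayList<>();
        for (RawRecord r = it.next(); r != null; r = it.next()) {
            String text = labelText(r);
            if (text != null) out.add(text);
        }
        return out;
    }
}
```

In a real parser, labelText would be where the interesting records get handed 
on to POI's record classes, as the issue suggests, and the file's CODEPAGE 
record would decide the string encoding; neither step is shown here.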



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
