Jitin Jindal created TIKA-3544:
----------------------------------

             Summary: Extraction of long sequences of digits from Excel 
spreadsheets using Tika 1.20 doesn’t yield the expected results
                 Key: TIKA-3544
                 URL: https://issues.apache.org/jira/browse/TIKA-3544
             Project: Tika
          Issue Type: Bug
          Components: parser
    Affects Versions: 1.20
            Reporter: Jitin Jindal


If an Excel spreadsheet contains a long sequence of digits, such as a credit 
card number, Tika 1.20 will emit that sequence in scientific notation.

For example, the credit card number “6011799905775830” is extracted from the 
attached spreadsheet as 6.480195344642784E15, which clearly is not the desired 
output.
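
The symptom can be reproduced outside of Tika with plain Java. The sketch 
below is only an illustration and rests on two assumptions: that the cell 
value reaches the extractor as a double (Excel stores numeric cells as 
floating-point numbers), and that it is then rendered with Java's default 
double-to-string conversion. It does not show Tika's actual code path, and 
the class and method names are hypothetical.

```java
import java.math.BigDecimal;

public class LongDigitDemo {

    // Default conversion: Java switches to scientific notation
    // for doubles with magnitude >= 10^7.
    static String defaultRendering(double value) {
        return Double.toString(value);
    }

    // Plain conversion: BigDecimal holds the exact value of the double,
    // and toPlainString() never uses an exponent.
    static String plainRendering(double value) {
        return new BigDecimal(value).toPlainString();
    }

    public static void main(String[] args) {
        // Hypothetical cell value, taken from the example in this report.
        double cellValue = 6011799905775830d;

        System.out.println("default: " + defaultRendering(cellValue));
        System.out.println("plain:   " + plainRendering(cellValue));
    }
}
```

Under these assumptions the first line is printed with an exponent, while 
the second is printed as the plain digit sequence 6011799905775830. Whether 
a plain rendering is appropriate for every numeric cell is a separate 
question, since a cell's number format may call for something else.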

I think the impact of this issue is significant. Many kinds of information 
can no longer be reliably extracted from spreadsheets: credit card numbers, 
telephone numbers and product identifiers, to name a few.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
