Olof Jonasson created TIKA-1054:
-----------------------------------

             Summary: Problem with parsing excel date formats
                 Key: TIKA-1054
                 URL: https://issues.apache.org/jira/browse/TIKA-1054
             Project: Tika
          Issue Type: Bug
    Affects Versions: 1.2
            Reporter: Olof Jonasson


I'm using Solr 4.0 and Tika 1.2 and am having problems indexing Excel files
containing date formats. I've read TIKA-103 and TIKA-360, and from those I got the
impression that the date formatting problem had been solved (at least for some cases).


I've used testEXCEL-formats.xls from TIKA-103, and I also re-saved it as .xlsx and
tested that as well. The default locale on my computer is Swedish. This is what I
get (sorry for the occasional Swedish: "fm"/"em" mean AM/PM, "maj" is May, and
"torsdag" is Thursday):

Expected content of testEXCEL-formats.xlsx and testEXCEL-formats.xls:
Number #,##0.00 1 599,99 -1 599,99
Currency $#,##0.00;[Red]($#,##0.00) $1 599,99 ($1 599,99)
Scientific 0.00E+00 1,98E+08 -1,98E+08
Percentage (0.025) 3% 2,50%
Fraction (2.5) 2 1/2
Time Format: h:mm AM/PM 6:15 AM 6:15 PM
Time Format: h:mm 06:15 18:15
Date Format: m/d/yy 2009-10-03
Date Format: d-mmm-yy 17-maj-07
Date/Time Format 2008-01-19 04:35
Custom Number: 19 dollars and ,99 cents
Custom Date: At 4:20 AM on torsdag maj 17, 2007

What the Tika 1.2 parser returns for the .xlsx file (and what Solr indexes):
Number #,##0.00 1 599,99 -1 599,99
Currency $#,##0.00;[Red]($#,##0.00) $1 599,99 ($1 599,99)
Scientific 0.00E+00 1,98E+08 -1,98E+08
Percentage (0.025) 3% 2,50%
Fraction (2.5) 2 1/2
Time Format: h:mm AM/PM 6:15 fm 6:15 em
Time Format: h:mm 6:15 18:15
Date Format: m/d/yy 2009/10/03
Date Format: d-mmm-yy 17-maj-07
Date/Time Format 1/19/08 4:35
Custom Number: 19,99 dollars and cents
Custom Date: 39219.18056369212 

What the Tika 1.2 parser returns for the .xls file (and what Solr indexes):
Number #,##0.00 1 599,99 -1 599,99
Currency $#,##0.00;[Red]($#,##0.00) $1 599,99 ($1 599,99)
Scientific 0.00E+00 1,98E+08 -1,98E+08
Percentage (0.025) 3% 2,50%
Fraction (2.5) 2 1/2
Time Format: h:mm AM/PM 6:15 fm 6:15 em
Time Format: h:mm 6:15 18:15
Date Format: m/d/yy 10/3/09
Date Format: d-mmm-yy 17-maj-07
Date/Time Format 1/19/08 4:35
Custom Number: 19,99 dollars and cents
Custom Date: 39219.18056369212
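A possible explanation for the .xls output of the m/d/yy row (10/3/09) and the Date/Time row in both files (1/19/08 4:35), offered as an assumption rather than something confirmed here: Excel renders its built-in date formats using the system's regional short-date setting, which on a Swedish system gives ISO-style dates, while the parser output looks like the format string applied literally with US conventions. The sketch below uses only java.time to reproduce both renderings of the same values; the class name DatePatternDemo and the chosen patterns are hypothetical and only illustrate the difference.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical demo class (not Tika or POI code): formats the same cell values
// two ways to compare a literal US-style pattern with the Swedish display.
public class DatePatternDemo {
    // Assumed literal reading of the cell format m/d/yy (plus time for the Date/Time row).
    static final DateTimeFormatter US_DATE = DateTimeFormatter.ofPattern("M/d/yy");
    static final DateTimeFormatter US_DATE_TIME = DateTimeFormatter.ofPattern("M/d/yy H:mm");

    // ISO-style patterns matching what Excel displayed on the Swedish system.
    static final DateTimeFormatter SV_DATE = DateTimeFormatter.ofPattern("yyyy-MM-dd");
    static final DateTimeFormatter SV_DATE_TIME = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm");

    public static void main(String[] args) {
        LocalDate date = LocalDate.of(2009, 10, 3);
        LocalDateTime dateTime = LocalDateTime.of(2008, 1, 19, 4, 35);

        System.out.println(US_DATE.format(date));          // 10/3/09
        System.out.println(SV_DATE.format(date));          // 2009-10-03
        System.out.println(US_DATE_TIME.format(dateTime)); // 1/19/08 4:35
        System.out.println(SV_DATE_TIME.format(dateTime)); // 2008-01-19 04:35
    }
}
```

The literal patterns reproduce the parser's strings exactly, and the ISO patterns reproduce what Excel showed, which is consistent with (but does not prove) a locale-handling difference.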

--- 

These rows come out in an unexpected format (expected values shown):
Date Format: m/d/yy 2009-10-03
Date/Time Format 2008-01-19 04:35
Custom Date: At 4:20 AM on torsdag maj 17, 2007
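For the Custom Date row, both files return the raw Excel serial number 39219.18056369212 instead of a formatted date. As a sanity check, here is a minimal java.time sketch (an illustration, not Tika or POI code; the class ExcelSerialDate and method fromSerial are hypothetical) that decodes a serial in Excel's default 1900 date system. It assumes the serial is at least 61 (1 March 1900 or later), so counting whole days from 30 December 1899 is valid; the fractional part is the time of day.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.temporal.ChronoUnit;

// Hypothetical helper: converts an Excel 1900-date-system serial number to a
// LocalDateTime. Assumes serial >= 61 (1 March 1900 onward), where day 0
// effectively corresponds to 30 December 1899.
public class ExcelSerialDate {
    private static final LocalDate EPOCH = LocalDate.of(1899, 12, 30);
    private static final long MILLIS_PER_DAY = 24L * 60 * 60 * 1000;

    static LocalDateTime fromSerial(double serial) {
        long days = (long) Math.floor(serial);
        // Fractional part = time of day, rounded to the nearest millisecond.
        long millis = Math.round((serial - days) * MILLIS_PER_DAY);
        return EPOCH.plusDays(days).atStartOfDay().plus(millis, ChronoUnit.MILLIS);
    }

    public static void main(String[] args) {
        LocalDateTime dt = fromSerial(39219.18056369212);
        DateTimeFormatter fmt = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm");
        System.out.println(fmt.format(dt));    // 2007-05-17 04:20
        System.out.println(dt.getDayOfWeek()); // THURSDAY
    }
}
```

The decoded value is Thursday ("torsdag") 17 May 2007 at 4:20 AM, matching the expected cell content, so the underlying value is read correctly and only the custom format is not applied.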
