[
https://issues.apache.org/jira/browse/TIKA-3028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17024895#comment-17024895
]
Nick Burch commented on TIKA-3028:
----------------------------------
The formatting of the raw values into nice strings is handled for Tika by the
underlying {{com.epam.parso}} library, so the bug might be in there.
Any chance you could write a three-line program based on the *To convert the data
of the file into CSV format, use:* section of [https://github.com/epam/parso],
using the latest version of the library, and see if that makes the same
mistake? If so, that ought to be enough to report a bug to epam.
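
Alongside the parso check, it may help to see how the JVM itself abbreviates
month names on the affected machine. The sketch below uses only the JDK, not
parso. The class name {{MonthLocaleCheck}} and the pattern string are
hypothetical, chosen only to resemble the expected value in the failing
assertion. Whether parso actually formats months through the default locale
is an assumption that this program does not confirm.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

public class MonthLocaleCheck {
    // Hypothetical pattern, picked to resemble "01Jan1960:00:00" from the test.
    // It is NOT taken from parso or from Tika.
    static final String PATTERN = "ddMMMyyyy:HH:mm";

    // Formats the given date-time with an explicit locale, so the result
    // does not depend on the machine's default locale.
    static String format(LocalDateTime value, Locale locale) {
        return DateTimeFormatter.ofPattern(PATTERN, locale).format(value);
    }

    public static void main(String[] args) {
        LocalDateTime epoch = LocalDateTime.of(1960, 1, 1, 0, 0);

        // Whatever locale the JVM picked up from the operating system.
        Locale defaultLocale = Locale.getDefault();
        System.out.println("default (" + defaultLocale + "): "
                + format(epoch, defaultLocale));

        // Fixed locales for comparison.
        System.out.println("ENGLISH: " + format(epoch, Locale.ENGLISH));
        System.out.println("GERMAN:  " + format(epoch, Locale.GERMAN));
    }
}
```

If the default-locale line differs from the ENGLISH line, the locale of the
machine is a plausible factor, and it would be worth mentioning in any report
to epam together with the JDK version used.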
> Failed test at SAS7BDATParserTest:112
> -------------------------------------
>
> Key: TIKA-3028
> URL: https://issues.apache.org/jira/browse/TIKA-3028
> Project: Tika
> Issue Type: Bug
> Components: parser
> Affects Versions: 1.23
> Reporter: Wknds
> Priority: Blocker
> Attachments: Bildschirmfoto 2020-01-24 um 23.12.20.png
>
>
> The test fails at
> SAS7BDATParserTest.testMultiColumns:112->TikaTest.assertContains:107.
> The expected date is _01Jan1960:00:00_,
> while on my system the month names in the dates from the (untouched) test
> file are abbreviated with a '.' (please refer to the terminal output below).
> {code:java}
> [ERROR] Failures:
> [ERROR]
> SAS7BDATParserTest.testMultiColumns:112->TikaTest.assertContains:107
> 01Jan1960:00:00 not found in:
> TESTING Record Number Square of the Record Number Description of
> the Row Percent Done Percent Increment date datetime time
> 0 0 This is row 0 of 10 0%
> 01-01-1960 01Jan.1960:00:00:01.00 00:00:01 1 1 This
> is row 1 of 10 10% 0.0% 02-01-1960
> 01Jan.1960:00:00:10.00 00:00:03 2 4 This is row
> 2 of 10 20% 50.0% 17-01-1960
> 01Jan.1960:00:01:40.00 00:00:09 3 9 This is row
> 3 of 10 30% 66.7% 22-03-1960
> 01Jan.1960:00:16:40.00 00:00:27 4 16 This is row
> 4 of 10 40% 75.0% 13-09-1960
> 01Jan.1960:02:46:40.00 00:01:21 5 25 This is row
> 5 of 10 50% 80.0% 17-09-1961
> 02Jan.1960:03:46:40.00 00:04:03 6 36 This is row
> 6 of 10 60% 83.3% 20-07-1963
> 12Jan.1960:13:46:40.00 00:12:09 7 49 This is row
> 7 of 10 70% 85.7% 29-07-1966
> 25Apr.1960:17:46:40.00 00:36:27 8 64 This is row
> 8 of 10 80% 87.5% 20-03-1971
> 03März1963:09:46:40.00 01:49:21 9 81 This is row
> 9 of 10 90% 88.9% 18-12-1977
> 09Sep.1991:01:46:40.00 05:28:03 10 100 This is row
> 10 of 10 100% 90.0% 19-05-1987
> 19Nov.2276:17:46:40.00 16:24:09
> {code}
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)