[ https://issues.apache.org/jira/browse/NIFI-14106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Daniel Stieglitz updated NIFI-14106: ------------------------------------ Status: Open (was: Patch Available) > Loss of dates and times precision when running Excel with Dates through > SplitExcel > ---------------------------------------------------------------------------------- > > Key: NIFI-14106 > URL: https://issues.apache.org/jira/browse/NIFI-14106 > Project: Apache NiFi > Issue Type: Bug > Reporter: Daniel Stieglitz > Assignee: Daniel Stieglitz > Priority: Major > Attachments: fileWithIssue.xlsx > > Time Spent: 2.5h > Remaining Estimate: 0h > > The SplitExcel processor seems to be dropping date format information needed > in order to correctly determine dates and times. I set up two flows sending > the attached file to both: > +Flow without SplitExcel+ > GetFile -> ConvertRecord (configured with ExcelReader with Schema Access > Strategy 'Use Starting Row' and CSVRecordSetWriter (configured with Schema > Access Strategy 'Inherit Record Schema', Date Format MM/dd/yyyy, Time Format > HH:mm:ss and Timestamp Format MM/dd/yyyy'T'hh:mm:ss) > +Flow with SplitExcel+ > GetFile -> SplitExcel -> ConvertRecord (configured with ExcelReader with > Schema Access Strategy 'Use Starting Row' and CSVRecordSetWriter (configured > with Schema Access Strategy 'Inherit Record Schema', Date Format MM/dd/yyyy, > Time Format HH:mm:ss and Timestamp Format MM/dd/yyyy'T'hh:mm:ss) > The contents from the flow without SplitExcel correctly outputs the dates and > times > {code:java} > transaction_id,transaction_date,transaction_time > 75,01/01/2023,09:53:44 > 78,01/01/2023,09:55:16 > 80,01/01/2023,10:00:39 > 81,01/01/2023,10:03:55 > 82,01/01/2023,10:14:49{code} > and its schema is > {code:java} > {"type":"record","name":"nifiRecord","namespace":"org.apache.nifi","fields":[{"name":"transaction_id","type":["long","null"]},{"name":"transaction_date","type":[{"type":"int","logicalType":"date"},"null"]},{"name":"transaction_time","type":[{"type":"int","logicalType":"time-millis"},"null"]}]}{code} > while the contents from the flow with SplitExcel does not correctly output > the dates and times > {code:java} > transaction_id,transaction_date,transaction_time > 75,44927,-1 > 78,44927,-1 > 80,44927,-1 > 81,44927,-1 > 82,44927,-1{code} > and its schema is > {code:java} > {"type":"record","name":"nifiRecord","namespace":"org.apache.nifi","fields":[{"name":"transaction_id","type":["long","null"]},{"name":"transaction_date","type":["long","null"]},{"name":"transaction_time","type":["long","null"]}]}{code} > Even if the schema for the SplitExcel flow is specified as the one produced > by flow without the SplitExcel processor and the header from the attached > file is removed, the results are not the correct date and times > {code:java} > 75,01/01/1970,23:59:59 > 78,01/01/1970,23:59:59 > 80,01/01/1970,23:59:59 > 81,01/01/1970,23:59:59 > 82,01/01/1970,23:59:59{code} > It would seem when copying the contents from the original tab into a new > Excel spreadsheet (with one tab) that for all dates/times/timestamps the > correct epoch must be copied over. -- This message was sent by Atlassian Jira (v8.20.10#820010)