[ 
https://issues.apache.org/jira/browse/NIFI-14106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Stieglitz updated NIFI-14106:
------------------------------------
    Description: 
The SplitExcel processor seems to be dropping date format information needed in 
order to correctly determine dates and times. I set up two flows sending the 
attached file to both:

+Flow without SplitExcel+

GetFile -> ConvertRecord (configured with ExcelReader with Schema Access 
Strategy 'Use Starting Row' and CSVRecordSetWriter (configured with Schema 
Access Strategy 'Inherit Record Schema', Date Format MM/dd/yyyy, Time Format 
HH:mm:ss and Timestamp Format MM/dd/yyyy'T'hh:mm:ss)

+Flow with SplitExcel+

GetFile -> SplitExcel -> ConvertRecord (configured with ExcelReader with Schema 
Access Strategy 'Use Starting Row' and CSVRecordSetWriter (configured with 
Schema Access Strategy 'Inherit Record Schema', Date Format MM/dd/yyyy, Time 
Format HH:mm:ss and Timestamp Format MM/dd/yyyy'T'hh:mm:ss)

The contents from the flow without SplitExcel correctly outputs the dates and 
times
{code:java}
transaction_id,transaction_date,transaction_time
75,01/01/2023,09:53:44
78,01/01/2023,09:55:16
80,01/01/2023,10:00:39
81,01/01/2023,10:03:55
82,01/01/2023,10:14:49{code}
and its schema is
{code:java}
{"type":"record","name":"nifiRecord","namespace":"org.apache.nifi","fields":[{"name":"transaction_id","type":["long","null"]},{"name":"transaction_date","type":[{"type":"int","logicalType":"date"},"null"]},{"name":"transaction_time","type":[{"type":"int","logicalType":"time-millis"},"null"]}]}{code}
while the contents from the flow with SplitExcel does not correctly output the 
dates and times
{code:java}
transaction_id,transaction_date,transaction_time
75,44927,-1
78,44927,-1
80,44927,-1
81,44927,-1
82,44927,-1{code}
and its schema is
{code:java}
{"type":"record","name":"nifiRecord","namespace":"org.apache.nifi","fields":[{"name":"transaction_id","type":["long","null"]},{"name":"transaction_date","type":["long","null"]},{"name":"transaction_time","type":["long","null"]}]}{code}
Even if the schema for the SplitExcel flow is specified as the one produced by 
flow without the SplitExcel processor and the header from the attached file is 
removed, the results are not the correct date and times
{code:java}
75,01/01/1970,23:59:59
78,01/01/1970,23:59:59
80,01/01/1970,23:59:59
81,01/01/1970,23:59:59
82,01/01/1970,23:59:59{code}
It would seem when copying the contents from the original tab into a new Excel 
spreadsheet (with one tab) that for all dates/times/timestamps the correct 
epoch must be copied over.

  was:
The SplitExcel processor seems to be dropping date format information needed in 
order to correctly determine dates and times. I set up two flows sending the 
attached file to both:

+Flow without SplitExcel+

GetFile -> ConvertRecord (configured with ExcelReader with Schema Access 
Strategy 'Use Starting Row' and CSVRecordSetWriter (configured with Schema 
Access Strategy 'Inherit Record Schema', Date Format MM/dd/yyyy, Time Format 
HH:mm:ss and Timestamp Format MM/dd/yyyy'T'hh:mm:ss)

+Flow with SplitExcel+

GetFile -> SplitExcel -> ConvertRecord (configured with ExcelReader with Schema 
Access Strategy 'Use Starting Row' and CSVRecordSetWriter (configured with 
Schema Access Strategy 'Inherit Record Schema', Date Format MM/dd/yyyy, Time 
Format HH:mm:ss and Timestamp Format MM/dd/yyyy'T'hh:mm:ss)

The contents from the flow without SplitExcel correctly outputs the dates and 
times
{code:java}
transaction_id,transaction_date,transaction_time
75,01/01/2023,09:53:44
78,01/01/2023,09:55:16
80,01/01/2023,10:00:39
81,01/01/2023,10:03:55
82,01/01/2023,10:14:49{code}
and its schema is
{code:java}
{"type":"record","name":"nifiRecord","namespace":"org.apache.nifi","fields":[{"name":"transaction_id","type":["long","null"]},{"name":"transaction_date","type":[{"type":"int","logicalType":"date"},"null"]},{"name":"transaction_time","type":[{"type":"int","logicalType":"time-millis"},"null"]}]}{code}
while the contents from the flow with SplitExcel does not correctly output the 
dates and times
{code:java}
transaction_id,transaction_date,transaction_time
75,44927,-1
78,44927,-1
80,44927,-1
81,44927,-1
82,44927,-1{code}
and its schema is
{code:java}
{"type":"record","name":"nifiRecord","namespace":"org.apache.nifi","fields":[{"name":"transaction_id","type":["long","null"]},{"name":"transaction_date","type":["long","null"]},{"name":"transaction_time","type":["long","null"]}]}{code}
Even if the schema for the SplitExcel flow is specified as the one produced by 
flow without the SplitExcel processor and the header from the attached file is 
removed, the results are not the correct date and times
{code:java}
75,01/01/1970,23:59:59
78,01/01/1970,23:59:59
80,01/01/1970,23:59:59
81,01/01/1970,23:59:59
82,01/01/1970,23:59:59{code}
 

It would seem when copying the contents from the original tab into a new Excel 
spreadsheet with one tab for all dates the correct epoch must be copied over.


> Loss of dates and times precision when running Excel with Dates through 
> SplitExcel
> ----------------------------------------------------------------------------------
>
>                 Key: NIFI-14106
>                 URL: https://issues.apache.org/jira/browse/NIFI-14106
>             Project: Apache NiFi
>          Issue Type: Bug
>            Reporter: Daniel Stieglitz
>            Priority: Major
>         Attachments: fileWithIssue.xlsx
>
>
> The SplitExcel processor seems to be dropping date format information needed 
> in order to correctly determine dates and times. I set up two flows sending 
> the attached file to both:
> +Flow without SplitExcel+
> GetFile -> ConvertRecord (configured with ExcelReader with Schema Access 
> Strategy 'Use Starting Row' and CSVRecordSetWriter (configured with Schema 
> Access Strategy 'Inherit Record Schema', Date Format MM/dd/yyyy, Time Format 
> HH:mm:ss and Timestamp Format MM/dd/yyyy'T'hh:mm:ss)
> +Flow with SplitExcel+
> GetFile -> SplitExcel -> ConvertRecord (configured with ExcelReader with 
> Schema Access Strategy 'Use Starting Row' and CSVRecordSetWriter (configured 
> with Schema Access Strategy 'Inherit Record Schema', Date Format MM/dd/yyyy, 
> Time Format HH:mm:ss and Timestamp Format MM/dd/yyyy'T'hh:mm:ss)
> The contents from the flow without SplitExcel correctly outputs the dates and 
> times
> {code:java}
> transaction_id,transaction_date,transaction_time
> 75,01/01/2023,09:53:44
> 78,01/01/2023,09:55:16
> 80,01/01/2023,10:00:39
> 81,01/01/2023,10:03:55
> 82,01/01/2023,10:14:49{code}
> and its schema is
> {code:java}
> {"type":"record","name":"nifiRecord","namespace":"org.apache.nifi","fields":[{"name":"transaction_id","type":["long","null"]},{"name":"transaction_date","type":[{"type":"int","logicalType":"date"},"null"]},{"name":"transaction_time","type":[{"type":"int","logicalType":"time-millis"},"null"]}]}{code}
> while the contents from the flow with SplitExcel does not correctly output 
> the dates and times
> {code:java}
> transaction_id,transaction_date,transaction_time
> 75,44927,-1
> 78,44927,-1
> 80,44927,-1
> 81,44927,-1
> 82,44927,-1{code}
> and its schema is
> {code:java}
> {"type":"record","name":"nifiRecord","namespace":"org.apache.nifi","fields":[{"name":"transaction_id","type":["long","null"]},{"name":"transaction_date","type":["long","null"]},{"name":"transaction_time","type":["long","null"]}]}{code}
> Even if the schema for the SplitExcel flow is specified as the one produced 
> by flow without the SplitExcel processor and the header from the attached 
> file is removed, the results are not the correct date and times
> {code:java}
> 75,01/01/1970,23:59:59
> 78,01/01/1970,23:59:59
> 80,01/01/1970,23:59:59
> 81,01/01/1970,23:59:59
> 82,01/01/1970,23:59:59{code}
> It would seem when copying the contents from the original tab into a new 
> Excel spreadsheet (with one tab) that for all dates/times/timestamps the 
> correct epoch must be copied over.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to