[
https://issues.apache.org/jira/browse/NIFI-14106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Daniel Stieglitz updated NIFI-14106:
------------------------------------
Description:
The SplitExcel processor seems to be dropping date format information needed in
order to correctly determine dates and times. I set up two flows sending the
attached file to both:
+Flow without SplitExcel+
GetFile -> ConvertRecord (configured with ExcelReader with Schema Access
Strategy 'Use Starting Row' and CSVRecordSetWriter (configured with Schema
Access Strategy 'Inherit Record Schema', Date Format MM/dd/yyyy, Time Format
HH:mm:ss and Timestamp Format MM/dd/yyyy'T'hh:mm:ss)
+Flow with SplitExcel+
GetFile -> SplitExcel -> ConvertRecord (configured with ExcelReader with Schema
Access Strategy 'Use Starting Row' and CSVRecordSetWriter (configured with
Schema Access Strategy 'Inherit Record Schema', Date Format MM/dd/yyyy, Time
Format HH:mm:ss and Timestamp Format MM/dd/yyyy'T'hh:mm:ss)
The contents from the flow without SplitExcel correctly outputs the dates and
times
{code:java}
transaction_id,transaction_date,transaction_time
75,01/01/2023,09:53:44
78,01/01/2023,09:55:16
80,01/01/2023,10:00:39
81,01/01/2023,10:03:55
82,01/01/2023,10:14:49{code}
and its schema is
{code:java}
{"type":"record","name":"nifiRecord","namespace":"org.apache.nifi","fields":[{"name":"transaction_id","type":["long","null"]},{"name":"transaction_date","type":[{"type":"int","logicalType":"date"},"null"]},{"name":"transaction_time","type":[{"type":"int","logicalType":"time-millis"},"null"]}]}{code}
while the contents from the flow with SplitExcel does not correctly output the
dates and times
{code:java}
transaction_id,transaction_date,transaction_time
75,44927,-1
78,44927,-1
80,44927,-1
81,44927,-1
82,44927,-1{code}
and its schema is
{code:java}
{"type":"record","name":"nifiRecord","namespace":"org.apache.nifi","fields":[{"name":"transaction_id","type":["long","null"]},{"name":"transaction_date","type":["long","null"]},{"name":"transaction_time","type":["long","null"]}]}{code}
Even if the schema for the SplitExcel flow is specified as the one produced by
flow without the SplitExcel processor and the header from the attached file is
removed, the results are not the correct date and times
{code:java}
75,01/01/1970,23:59:59
78,01/01/1970,23:59:59
80,01/01/1970,23:59:59
81,01/01/1970,23:59:59
82,01/01/1970,23:59:59{code}
It would seem when copying the contents from the original tab into a new Excel
spreadsheet (with one tab) that for all dates/times/timestamps the correct
epoch must be copied over.
was:
The SplitExcel processor seems to be dropping date format information needed in
order to correctly determine dates and times. I set up two flows sending the
attached file to both:
+Flow without SplitExcel+
GetFile -> ConvertRecord (configured with ExcelReader with Schema Access
Strategy 'Use Starting Row' and CSVRecordSetWriter (configured with Schema
Access Strategy 'Inherit Record Schema', Date Format MM/dd/yyyy, Time Format
HH:mm:ss and Timestamp Format MM/dd/yyyy'T'hh:mm:ss)
+Flow with SplitExcel+
GetFile -> SplitExcel -> ConvertRecord (configured with ExcelReader with Schema
Access Strategy 'Use Starting Row' and CSVRecordSetWriter (configured with
Schema Access Strategy 'Inherit Record Schema', Date Format MM/dd/yyyy, Time
Format HH:mm:ss and Timestamp Format MM/dd/yyyy'T'hh:mm:ss)
The contents from the flow without SplitExcel correctly outputs the dates and
times
{code:java}
transaction_id,transaction_date,transaction_time
75,01/01/2023,09:53:44
78,01/01/2023,09:55:16
80,01/01/2023,10:00:39
81,01/01/2023,10:03:55
82,01/01/2023,10:14:49{code}
and its schema is
{code:java}
{"type":"record","name":"nifiRecord","namespace":"org.apache.nifi","fields":[{"name":"transaction_id","type":["long","null"]},{"name":"transaction_date","type":[{"type":"int","logicalType":"date"},"null"]},{"name":"transaction_time","type":[{"type":"int","logicalType":"time-millis"},"null"]}]}{code}
while the contents from the flow with SplitExcel does not correctly output the
dates and times
{code:java}
transaction_id,transaction_date,transaction_time
75,44927,-1
78,44927,-1
80,44927,-1
81,44927,-1
82,44927,-1{code}
and its schema is
{code:java}
{"type":"record","name":"nifiRecord","namespace":"org.apache.nifi","fields":[{"name":"transaction_id","type":["long","null"]},{"name":"transaction_date","type":["long","null"]},{"name":"transaction_time","type":["long","null"]}]}{code}
Even if the schema for the SplitExcel flow is specified as the one produced by
flow without the SplitExcel processor and the header from the attached file is
removed, the results are not the correct date and times
{code:java}
75,01/01/1970,23:59:59
78,01/01/1970,23:59:59
80,01/01/1970,23:59:59
81,01/01/1970,23:59:59
82,01/01/1970,23:59:59{code}
It would seem when copying the contents from the original tab into a new Excel
spreadsheet with one tab for all dates the correct epoch must be copied over.
> Loss of dates and times precision when running Excel with Dates through
> SplitExcel
> ----------------------------------------------------------------------------------
>
> Key: NIFI-14106
> URL: https://issues.apache.org/jira/browse/NIFI-14106
> Project: Apache NiFi
> Issue Type: Bug
> Reporter: Daniel Stieglitz
> Priority: Major
> Attachments: fileWithIssue.xlsx
>
>
> The SplitExcel processor seems to be dropping date format information needed
> in order to correctly determine dates and times. I set up two flows sending
> the attached file to both:
> +Flow without SplitExcel+
> GetFile -> ConvertRecord (configured with ExcelReader with Schema Access
> Strategy 'Use Starting Row' and CSVRecordSetWriter (configured with Schema
> Access Strategy 'Inherit Record Schema', Date Format MM/dd/yyyy, Time Format
> HH:mm:ss and Timestamp Format MM/dd/yyyy'T'hh:mm:ss)
> +Flow with SplitExcel+
> GetFile -> SplitExcel -> ConvertRecord (configured with ExcelReader with
> Schema Access Strategy 'Use Starting Row' and CSVRecordSetWriter (configured
> with Schema Access Strategy 'Inherit Record Schema', Date Format MM/dd/yyyy,
> Time Format HH:mm:ss and Timestamp Format MM/dd/yyyy'T'hh:mm:ss)
> The contents from the flow without SplitExcel correctly outputs the dates and
> times
> {code:java}
> transaction_id,transaction_date,transaction_time
> 75,01/01/2023,09:53:44
> 78,01/01/2023,09:55:16
> 80,01/01/2023,10:00:39
> 81,01/01/2023,10:03:55
> 82,01/01/2023,10:14:49{code}
> and its schema is
> {code:java}
> {"type":"record","name":"nifiRecord","namespace":"org.apache.nifi","fields":[{"name":"transaction_id","type":["long","null"]},{"name":"transaction_date","type":[{"type":"int","logicalType":"date"},"null"]},{"name":"transaction_time","type":[{"type":"int","logicalType":"time-millis"},"null"]}]}{code}
> while the contents from the flow with SplitExcel does not correctly output
> the dates and times
> {code:java}
> transaction_id,transaction_date,transaction_time
> 75,44927,-1
> 78,44927,-1
> 80,44927,-1
> 81,44927,-1
> 82,44927,-1{code}
> and its schema is
> {code:java}
> {"type":"record","name":"nifiRecord","namespace":"org.apache.nifi","fields":[{"name":"transaction_id","type":["long","null"]},{"name":"transaction_date","type":["long","null"]},{"name":"transaction_time","type":["long","null"]}]}{code}
> Even if the schema for the SplitExcel flow is specified as the one produced
> by flow without the SplitExcel processor and the header from the attached
> file is removed, the results are not the correct date and times
> {code:java}
> 75,01/01/1970,23:59:59
> 78,01/01/1970,23:59:59
> 80,01/01/1970,23:59:59
> 81,01/01/1970,23:59:59
> 82,01/01/1970,23:59:59{code}
> It would seem when copying the contents from the original tab into a new
> Excel spreadsheet (with one tab) that for all dates/times/timestamps the
> correct epoch must be copied over.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)