[ 
https://issues.apache.org/jira/browse/NIFI-14106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Stieglitz updated NIFI-14106:
------------------------------------
    Description: 
The SplitExcel processor seems to be dropping date format information in order 
to correctly determine dates and times. I set up two flows sending the attached 
file to both:

Flow without SplitExcel

GetFile -> ConvertRecord (configured with ExcelReader with Schema Access 
Strategy 'Use Starting Row' and CSVRecordSetWriter (configured with Schema 
Access Strategy 'Inherit Record Schema', Date Format MM/dd/yyyy, Time Format 
HH:mm:ss and Timestamp Format MM/dd/yyyy'T'hh:mm:ss)

 

Flow with SplitExcel

GetFile -> SplitExcel -> ConvertRecord (configured with ExcelReader with Schema 
Access Strategy 'Use Starting Row' and CSVRecordSetWriter (configured with 
Schema Access Strategy 'Inherit Record Schema', Date Format MM/dd/yyyy, Time 
Format HH:mm:ss and Timestamp Format MM/dd/yyyy'T'hh:mm:ss)

 

The contents from the flow without SplitExcel is
{code:java}
transaction_id,transaction_date,transaction_time
75,01/01/2023,09:53:44
78,01/01/2023,09:55:16
80,01/01/2023,10:00:39
81,01/01/2023,10:03:55
82,01/01/2023,10:14:49{code}
and its schema is
{code:java}
{"type":"record","name":"nifiRecord","namespace":"org.apache.nifi","fields":[{"name":"transaction_id","type":["long","null"]},{"name":"transaction_date","type":[{"type":"int","logicalType":"date"},"null"]},{"name":"transaction_time","type":[{"type":"int","logicalType":"time-millis"},"null"]}]}{code}
while the contents from the flow with SplitExcel is
{code:java}
transaction_id,transaction_date,transaction_time
75,44927,-1
78,44927,-1
80,44927,-1
81,44927,-1
82,44927,-1{code}
and its schema is
{code:java}
{"type":"record","name":"nifiRecord","namespace":"org.apache.nifi","fields":[{"name":"transaction_id","type":["long","null"]},{"name":"transaction_date","type":["long","null"]},{"name":"transaction_time","type":["long","null"]}]}{code}

  was:
The SplitExcel processor seems to be dropping date format information in order 
to correctly determine dates and times.

I set up two flows sending the attached file to both:

Flow without SplitExcel

GetFile -> ConvertRecord (configured with ExcelReader with Schema Access 
Strategy 'Use Starting Row' and CSVRecordSetWriter (configured with Schema 
Access Strategy 'Inherit Record Schema', Date Format MM/dd/yyyy, Time Format 
HH:mm:ss and Timestamp Format MM/dd/yyyy'T'hh:mm:ss)

 

Flow with SplitExcel

GetFile -> SplitExcel -> ConvertRecord (configured with ExcelReader with Schema 
Access Strategy 'Use Starting Row' and CSVRecordSetWriter (configured with 
Schema Access Strategy 'Inherit Record Schema', Date Format MM/dd/yyyy, Time 
Format HH:mm:ss and Timestamp Format MM/dd/yyyy'T'hh:mm:ss)

 

The contents from the flow without SplitExcel is
{code:java}
transaction_id,transaction_date,transaction_time
75,01/01/2023,09:53:44
78,01/01/2023,09:55:16
80,01/01/2023,10:00:39
81,01/01/2023,10:03:55
82,01/01/2023,10:14:49{code}
 

while the contents from the flow with SplitExcel is
{code:java}
transaction_id,transaction_date,transaction_time
75,44927,-1
78,44927,-1
80,44927,-1
81,44927,-1
82,44927,-1{code}
 


> Loss of precision when running Excel with Dates in SplitExcel
> -------------------------------------------------------------
>
>                 Key: NIFI-14106
>                 URL: https://issues.apache.org/jira/browse/NIFI-14106
>             Project: Apache NiFi
>          Issue Type: Bug
>            Reporter: Daniel Stieglitz
>            Priority: Major
>         Attachments: fileWithIssue.xlsx
>
>
> The SplitExcel processor seems to be dropping date format information in 
> order to correctly determine dates and times. I set up two flows sending the 
> attached file to both:
> Flow without SplitExcel
> GetFile -> ConvertRecord (configured with ExcelReader with Schema Access 
> Strategy 'Use Starting Row' and CSVRecordSetWriter (configured with Schema 
> Access Strategy 'Inherit Record Schema', Date Format MM/dd/yyyy, Time Format 
> HH:mm:ss and Timestamp Format MM/dd/yyyy'T'hh:mm:ss)
>  
> Flow with SplitExcel
> GetFile -> SplitExcel -> ConvertRecord (configured with ExcelReader with 
> Schema Access Strategy 'Use Starting Row' and CSVRecordSetWriter (configured 
> with Schema Access Strategy 'Inherit Record Schema', Date Format MM/dd/yyyy, 
> Time Format HH:mm:ss and Timestamp Format MM/dd/yyyy'T'hh:mm:ss)
>  
> The contents from the flow without SplitExcel is
> {code:java}
> transaction_id,transaction_date,transaction_time
> 75,01/01/2023,09:53:44
> 78,01/01/2023,09:55:16
> 80,01/01/2023,10:00:39
> 81,01/01/2023,10:03:55
> 82,01/01/2023,10:14:49{code}
> and its schema is
> {code:java}
> {"type":"record","name":"nifiRecord","namespace":"org.apache.nifi","fields":[{"name":"transaction_id","type":["long","null"]},{"name":"transaction_date","type":[{"type":"int","logicalType":"date"},"null"]},{"name":"transaction_time","type":[{"type":"int","logicalType":"time-millis"},"null"]}]}{code}
> while the contents from the flow with SplitExcel is
> {code:java}
> transaction_id,transaction_date,transaction_time
> 75,44927,-1
> 78,44927,-1
> 80,44927,-1
> 81,44927,-1
> 82,44927,-1{code}
> and its schema is
> {code:java}
> {"type":"record","name":"nifiRecord","namespace":"org.apache.nifi","fields":[{"name":"transaction_id","type":["long","null"]},{"name":"transaction_date","type":["long","null"]},{"name":"transaction_time","type":["long","null"]}]}{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to