[jira] [Commented] (NIFI-8932) Add feature to CSVReader to skip N lines at top of the file

2024-01-31 Thread Mark Payne (Jira)


[ 
https://issues.apache.org/jira/browse/NIFI-8932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17813009#comment-17813009
 ] 

Mark Payne commented on NIFI-8932:
--

I think I'm a -1 on this Jira. The CSV Reader should be given valid CSV, rather 
than skipping over an arbitrary number of lines.

[~iiojj2] to strip out the first line of a file, you should not use the complex 
flow above but rather just use RouteText. Add a property with a value of 
${lineNo:gt(1)} and auto-terminated unmatched.

> Add feature to CSVReader to skip N lines at top of the file
> ---
>
> Key: NIFI-8932
> URL: https://issues.apache.org/jira/browse/NIFI-8932
> Project: Apache NiFi
>  Issue Type: Improvement
>Reporter: Philipp Korniets
>Assignee: Matt Burgess
>Priority: Minor
>  Labels: backport-needed
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> We have a lot of CSV files where provider add custom header/footer to valid 
> CSV content.
>  CSV header is actually second row. 
> To remove unnecessary data we can use
>  * ReplaceText 
>  * splitText->RouteOnAttribute -> MergeContent
> It would be great to have an option in CSVReader controller to skip N rows 
> from top/bottom in order to get5 clean data.
>  * skip N from the top
>  * skip M from the bottom
>  Similar request was developed in FLINK 
> https://issues.apache.org/jira/browse/FLINK-1002
>  
> Data Example:
> {code}
> 7/20/21 2:48:47 AM GMT-04:00  ABB: Blended Rate Calc (X),,,
> distribution_id,Distribution 
> Id,settle_date,group_code,company_name,currency_code,common_account_name,business_date,prod_code,security,class,asset_type
> -1,all,20210719,Repo 21025226,qwerty                                    
> ,EUR,TPSL_21025226   ,19-Jul-21,BRM96ST7   ,ABC 
> 14/09/24,NR,BOND  
> -1,all,20210719,Repo 21025226,qwerty                                    
> ,GBP,RPSS_21025226   ,19-Jul-21,,Total @ -0.11,,
> {code}
> |7/20/21 2:48:47 AM GMT-04:00  ABB: Blended Rate Calc (X)|  |  |  |  |  |  |  
> |  |  |  |  |  
> |distribution_id|Distribution 
> Id|settle_date|group_code|company_name|currency_code|common_account_name|business_date|prod_code|security|class|asset_type|
> |-1|all|20210719|Repo 21025226|qwerty                                    
> |EUR|TPSL_21025226   |19-Jul-21|BRM96ST7   |ABC 
> 14/09/24|NR|BOND  |
> |-1|all|20210719|Repo 21025226|qwerty                                    
> |GBP|RPSS_21025226   |19-Jul-21| |Total @ -0.11| | |



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (NIFI-8932) Add feature to CSVReader to skip N lines at top of the file

2023-11-17 Thread Philipp Korniets (Jira)


[ 
https://issues.apache.org/jira/browse/NIFI-8932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17787214#comment-17787214
 ] 

Philipp Korniets commented on NIFI-8932:


Thanks Matt, it would be nice if this new Property of CSVReader will allow 
Expression Language, and scope will be file attribute. This will allow to use 
same service with multiple parameters. IF new property is not provided - 
default to 0

> Add feature to CSVReader to skip N lines at top of the file
> ---
>
> Key: NIFI-8932
> URL: https://issues.apache.org/jira/browse/NIFI-8932
> Project: Apache NiFi
>  Issue Type: Improvement
>Reporter: Philipp Korniets
>Assignee: Matt Burgess
>Priority: Minor
> Fix For: 1.latest, 2.latest
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> We have a lot of CSV files where provider add custom header/footer to valid 
> CSV content.
>  CSV header is actually second row. 
> To remove unnecessary data we can use
>  * ReplaceText 
>  * splitText->RouteOnAttribute -> MergeContent
> It would be great to have an option in CSVReader controller to skip N rows 
> from top/bottom in order to get5 clean data.
>  * skip N from the top
>  * skip M from the bottom
>  Similar request was developed in FLINK 
> https://issues.apache.org/jira/browse/FLINK-1002
>  
> Data Example:
> {code}
> 7/20/21 2:48:47 AM GMT-04:00  ABB: Blended Rate Calc (X),,,
> distribution_id,Distribution 
> Id,settle_date,group_code,company_name,currency_code,common_account_name,business_date,prod_code,security,class,asset_type
> -1,all,20210719,Repo 21025226,qwerty                                    
> ,EUR,TPSL_21025226   ,19-Jul-21,BRM96ST7   ,ABC 
> 14/09/24,NR,BOND  
> -1,all,20210719,Repo 21025226,qwerty                                    
> ,GBP,RPSS_21025226   ,19-Jul-21,,Total @ -0.11,,
> {code}
> |7/20/21 2:48:47 AM GMT-04:00  ABB: Blended Rate Calc (X)|  |  |  |  |  |  |  
> |  |  |  |  |  
> |distribution_id|Distribution 
> Id|settle_date|group_code|company_name|currency_code|common_account_name|business_date|prod_code|security|class|asset_type|
> |-1|all|20210719|Repo 21025226|qwerty                                    
> |EUR|TPSL_21025226   |19-Jul-21|BRM96ST7   |ABC 
> 14/09/24|NR|BOND  |
> |-1|all|20210719|Repo 21025226|qwerty                                    
> |GBP|RPSS_21025226   |19-Jul-21| |Total @ -0.11| | |



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (NIFI-8932) Add feature to CSVReader to skip N lines at top of the file

2023-10-28 Thread Matt Burgess (Jira)


[ 
https://issues.apache.org/jira/browse/NIFI-8932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17780663#comment-17780663
 ] 

Matt Burgess commented on NIFI-8932:


This Jira will be to add skipping top rows. Skipping bottom rows involves a 
different technique to read in N rows before returning any records and keeping 
track of them using a queue to feed more records into the bottom and clear the 
queue if no more records are available. Please feel free to file a follow-on 
Jira to handle skipping the last N rows.

> Add feature to CSVReader to skip N lines at top of the file
> ---
>
> Key: NIFI-8932
> URL: https://issues.apache.org/jira/browse/NIFI-8932
> Project: Apache NiFi
>  Issue Type: Improvement
>Reporter: Philipp Korniets
>Assignee: Matt Burgess
>Priority: Minor
> Fix For: 1.latest, 2.latest
>
>
> We have a lot of CSV files where provider add custom header/footer to valid 
> CSV content.
>  CSV header is actually second row. 
> To remove unnecessary data we can use
>  * ReplaceText 
>  * splitText->RouteOnAttribute -> MergeContent
> It would be great to have an option in CSVReader controller to skip N rows 
> from top/bottom in order to get5 clean data.
>  * skip N from the top
>  * skip M from the bottom
>  Similar request was developed in FLINK 
> https://issues.apache.org/jira/browse/FLINK-1002
>  
> Data Example:
> {code}
> 7/20/21 2:48:47 AM GMT-04:00  ABB: Blended Rate Calc (X),,,
> distribution_id,Distribution 
> Id,settle_date,group_code,company_name,currency_code,common_account_name,business_date,prod_code,security,class,asset_type
> -1,all,20210719,Repo 21025226,qwerty                                    
> ,EUR,TPSL_21025226   ,19-Jul-21,BRM96ST7   ,ABC 
> 14/09/24,NR,BOND  
> -1,all,20210719,Repo 21025226,qwerty                                    
> ,GBP,RPSS_21025226   ,19-Jul-21,,Total @ -0.11,,
> {code}
> |7/20/21 2:48:47 AM GMT-04:00  ABB: Blended Rate Calc (X)|  |  |  |  |  |  |  
> |  |  |  |  |  
> |distribution_id|Distribution 
> Id|settle_date|group_code|company_name|currency_code|common_account_name|business_date|prod_code|security|class|asset_type|
> |-1|all|20210719|Repo 21025226|qwerty                                    
> |EUR|TPSL_21025226   |19-Jul-21|BRM96ST7   |ABC 
> 14/09/24|NR|BOND  |
> |-1|all|20210719|Repo 21025226|qwerty                                    
> |GBP|RPSS_21025226   |19-Jul-21| |Total @ -0.11| | |



--
This message was sent by Atlassian Jira
(v8.20.10#820010)