[ https://issues.apache.org/jira/browse/NIFI-8932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
David Handermann updated NIFI-8932:
-----------------------------------
    Labels: backport-needed  (was: )

> Add feature to CSVReader to skip N lines at top of the file
> -----------------------------------------------------------
>
>                 Key: NIFI-8932
>                 URL: https://issues.apache.org/jira/browse/NIFI-8932
>             Project: Apache NiFi
>          Issue Type: Improvement
>            Reporter: Philipp Korniets
>            Assignee: Matt Burgess
>            Priority: Minor
>              Labels: backport-needed
>          Time Spent: 3h
>  Remaining Estimate: 0h
>
> We have a lot of CSV files where the provider adds a custom header/footer
> to otherwise valid CSV content, so the real CSV header is actually the
> second row.
> To remove the unnecessary data we currently use either:
> * ReplaceText
> * SplitText -> RouteOnAttribute -> MergeContent
> It would be great to have an option in the CSVReader controller service to
> skip rows at the top and bottom of the file in order to get clean data:
> * skip N rows from the top
> * skip M rows from the bottom
> A similar feature was developed in Flink:
> https://issues.apache.org/jira/browse/FLINK-1002
>
> Data example:
> {code}
> 7/20/21 2:48:47 AM GMT-04:00 ABB: Blended Rate Calc (X),,,,,,,,,,,
> distribution_id,Distribution Id,settle_date,group_code,company_name,currency_code,common_account_name,business_date,prod_code,security,class,asset_type
> -1,all,20210719,Repo 21025226,qwerty ,EUR,TPSL_21025226 ,19-Jul-21,BRM96ST7 ,ABC 14/09/24,NR,BOND
> -1,all,20210719,Repo 21025226,qwerty ,GBP,RPSS_21025226 ,19-Jul-21,,Total @ -0.11,,
> {code}
> |7/20/21 2:48:47 AM GMT-04:00 ABB: Blended Rate Calc (X)| | | | | | | | | | | |
> |distribution_id|Distribution Id|settle_date|group_code|company_name|currency_code|common_account_name|business_date|prod_code|security|class|asset_type|
> |-1|all|20210719|Repo 21025226|qwerty |EUR|TPSL_21025226 |19-Jul-21|BRM96ST7 |ABC 14/09/24|NR|BOND |
> |-1|all|20210719|Repo 21025226|qwerty |GBP|RPSS_21025226 |19-Jul-21| |Total @ -0.11| | |

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
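The behaviour requested in NIFI-8932 (drop N lines at the top, drop M lines at the bottom, then parse what remains as CSV with the first kept line as the header) can be illustrated outside NiFi. The following is a minimal Python sketch, not NiFi's CSVReader code: the functions `trim_lines` and `read_trimmed_csv` and the parameters `skip_top`/`skip_bottom` are hypothetical stand-ins for the proposed N and M, and the sketch assumes skipping counts physical lines rather than parsed CSV records.

```python
import csv
import io
from collections import deque
from itertools import islice
from typing import Iterable, Iterator


def trim_lines(lines: Iterable[str], skip_top: int, skip_bottom: int) -> Iterator[str]:
    """Yield `lines` without the first `skip_top` and the last `skip_bottom` lines.

    Hypothetical helper (not part of NiFi). It counts physical lines, not CSV
    records, and holds at most `skip_bottom` + 1 lines in memory at a time.
    """
    if skip_top < 0 or skip_bottom < 0:
        raise ValueError("skip counts must be non-negative")
    remaining = islice(lines, skip_top, None)  # lazily drop the top N lines
    if skip_bottom == 0:
        yield from remaining
        return
    pending = deque()
    for line in remaining:
        pending.append(line)
        # A line is safe to emit only once skip_bottom newer lines follow it.
        if len(pending) > skip_bottom:
            yield pending.popleft()
    # Whatever is still in `pending` is the bottom M lines; it is discarded.


def read_trimmed_csv(text: str, skip_top: int = 0, skip_bottom: int = 0) -> list[dict[str, str]]:
    """Hypothetical wrapper: trim the text, then parse it with the first kept line as header."""
    lines = io.StringIO(text, newline="")  # newline="" as the csv module recommends
    return list(csv.DictReader(trim_lines(lines, skip_top, skip_bottom)))
```

With the data example in the issue, `read_trimmed_csv(text, skip_top=1)` drops the "Blended Rate Calc" title line so that the `distribution_id,...` row becomes the header. Dropping lines from the bottom needs a small buffer because the end of the stream is not known in advance; the deque keeps only M lines pending, so the input can still be streamed. Because lines are counted before CSV parsing, a quoted field containing an embedded newline inside the skipped region would throw the count off.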