[ https://issues.apache.org/jira/browse/NIFI-1280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15325013#comment-15325013 ]

Mark Payne commented on NIFI-1280:
----------------------------------

[~Toivo Adams] correct - the data would only be read multiple times if necessary but this won't normally happen. I spent some time looking at this a few days ago, actually, looking for a way to refactor it so that we can easily enable multi-pass reading. Unfortunately, though, the only solutions that I came up with are either very hack-y or would require some changes to the NiFi API in order to allow us to obtain an InputStream and return it outside of a ProcessSession callback, which I'm not wild about. Planned to revisit again next week, but just trying to figure out a good way to make this feasible.

> Create FilterCSVColumns Processor
> ---------------------------------
>
>                 Key: NIFI-1280
>                 URL: https://issues.apache.org/jira/browse/NIFI-1280
>             Project: Apache NiFi
>          Issue Type: Task
>          Components: Extensions
>            Reporter: Mark Payne
>            Assignee: Toivo Adams
>
> We should have a Processor that allows users to easily filter out specific
> columns from CSV data. For instance, a user would configure two different
> properties: "Columns of Interest" (a comma-separated list of column indexes)
> and "Filtering Strategy" (Keep Only These Columns, Remove Only These Columns).
> We can do this today with ReplaceText, but it is far more difficult than it
> would be with this Processor, as the user has to use Regular Expressions,
> etc. with ReplaceText.
> Eventually a Custom UI could even be built that allows a user to upload a
> Sample CSV and choose which columns from there, similar to the way that Excel
> works when importing CSV by dragging and selecting the desired columns? That
> would certainly be a larger undertaking and would not need to be done for an
> initial implementation.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
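
For illustration only, the filtering behaviour requested in the issue description can be sketched in a few lines of Python. This is not the NiFi processor and does not use the NiFi API: the function name `filter_csv_columns` is hypothetical, and treating the "Columns of Interest" indexes as zero-based is an assumption, since the issue does not say. The two strategy strings are taken from the issue text.

```python
import csv
import io

# Strategy names as written in the issue description.
KEEP_ONLY = "Keep Only These Columns"
REMOVE_ONLY = "Remove Only These Columns"


def filter_csv_columns(text, columns_of_interest, strategy):
    """Return CSV text with columns kept or removed (hypothetical helper).

    columns_of_interest is a comma-separated list of column indexes,
    assumed here to be zero-based.
    """
    if strategy not in (KEEP_ONLY, REMOVE_ONLY):
        raise ValueError("unknown filtering strategy: " + strategy)

    # Parse "0, 2" into {0, 2}; empty entries are ignored.
    selected = {int(part) for part in columns_of_interest.split(",") if part.strip()}
    keep = strategy == KEEP_ONLY

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    # csv.reader handles quoted fields, so embedded commas are not split.
    for row in csv.reader(io.StringIO(text)):
        # Keep a value when its membership in `selected` matches the strategy.
        writer.writerow([value for i, value in enumerate(row) if (i in selected) == keep])
    return out.getvalue()
```

Calling `filter_csv_columns(data, "0,2", KEEP_ONLY)` on three-column input would keep the first and third columns, while `REMOVE_ONLY` with the same indexes would leave only the middle one. Using a CSV parser rather than a regular expression means quoted fields that contain commas stay intact, which is part of what makes the ReplaceText approach awkward.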