[ 
https://issues.apache.org/jira/browse/NIFI-1280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15549852#comment-15549852
 ] 

Mark Payne commented on NIFI-1280:
----------------------------------

[~Toivo Adams] I'm not sure what you mean by "indexing FlowFile content." In 
general, the content won't need to be read more than once. Even if the content 
has to be consumed multiple times, modern OSes have phenomenal disk caching 
mechanisms. If you push heavy volumes of data through NiFi and run "iostat -xmh 
5" on a Linux system, for instance, you'll see that the number of disk reads is 
almost 0 because the disk caching is so efficient. So I'm not too concerned 
about rereading the data.

I did make some modifications to the processor: I allowed the Enumerator to be 
created multiple times and refactored so that the data format can be abstracted 
away from the core logic. At this point, the processor still only accepts CSV 
data but I'm hoping to open it up to more data formats than that. I will 
hopefully put a PR up soon that has the updated code, though. Would love for 
you to check it out and see if you've got any feedback.

> Create FilterCSVColumns Processor
> ---------------------------------
>
>                 Key: NIFI-1280
>                 URL: https://issues.apache.org/jira/browse/NIFI-1280
>             Project: Apache NiFi
>          Issue Type: Task
>          Components: Extensions
>            Reporter: Mark Payne
>            Assignee: Toivo Adams
>
> We should have a Processor that allows users to easily filter out specific 
> columns from CSV data. For instance, a user would configure two different 
> properties: "Columns of Interest" (a comma-separated list of column indexes) 
> and "Filtering Strategy" (Keep Only These Columns, Remove Only These Columns).
> We can do this today with ReplaceText, but it is far more difficult than it 
> would be with this Processor, as the user has to use Regular Expressions, 
> etc. with ReplaceText.
> Eventually a Custom UI could even be built that allows a user to upload a 
> Sample CSV and choose the desired columns from there, similar to the way that 
> Excel works when importing CSV by dragging and selecting the desired columns. 
> That would certainly be a larger undertaking and would not need to be done 
> for an initial implementation.
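
The filtering behavior described in the issue above can be sketched in a few
lines. This is only an illustration of the idea, not NiFi code and not the
processor's actual implementation: the function name filter_csv_columns and its
parameters are hypothetical, the column indexes are assumed to be zero-based
(the issue does not say), and the keep flag stands in for the two "Filtering
Strategy" options.

```python
import csv
import io


def filter_csv_columns(text, columns_of_interest, keep=True):
    """Keep or remove the given column indexes from CSV text.

    Hypothetical helper: indexes are assumed to be zero-based.
    keep=True  mirrors "Keep Only These Columns".
    keep=False mirrors "Remove Only These Columns".
    """
    selected = set(columns_of_interest)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in csv.reader(io.StringIO(text)):
        # A column survives when its membership in the selection
        # matches the chosen strategy.
        writer.writerow(
            [value for index, value in enumerate(row)
             if (index in selected) == keep]
        )
    return out.getvalue()
```

For example, filter_csv_columns(data, [0, 2]) would keep only the first and
third columns, while filter_csv_columns(data, [1], keep=False) would drop the
second column. Using a CSV parser rather than a regular expression means quoted
fields that contain commas are handled correctly, which is the difficulty with
ReplaceText that the issue points out.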



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
