[ 
https://issues.apache.org/jira/browse/NIFI-1280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15552595#comment-15552595
 ] 

Toivo Adams commented on NIFI-1280:
-----------------------------------

@markap14

Roughly the same way databases use indexes.
For example:
select * from emp join emp as mgr on emp.mgr = mgr.id where emp.salary > mgr.salary

We could create an index on the salary column.
Of course, indexing is itself an expensive operation,
but maybe we can find a way to build the index reasonably cheaply.
I have a crazy idea: build the index in the previous step, while writing to the FlowFile.
Of course, creating an index should be optional and should be used with care,
but sometimes it might improve performance considerably.
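
To make the idea a bit more concrete, here is a rough sketch in Python (not NiFi code). It only assumes that the step writing the CSV can collect the values of one column as it goes. The function names write_csv_with_index and rows_greater_than, and the sample data, are hypothetical and exist only for illustration; a sorted list of (value, row number) pairs stands in for a real index structure.

```python
import bisect
import csv
import io


def write_csv_with_index(rows, header, index_column):
    """Write rows as CSV and build a sorted index on one column in the same pass.

    Hypothetical helper: returns the CSV text plus two parallel lists,
    the sorted column values and the row numbers they came from.
    """
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    col = header.index(index_column)

    entries = []
    for row_number, row in enumerate(rows):
        writer.writerow(row)
        # Collect the indexed value while the row is being written anyway.
        entries.append((float(row[col]), row_number))

    # Sorting is the extra cost paid up front for cheaper lookups later.
    entries.sort()
    keys = [value for value, _ in entries]
    row_numbers = [row_number for _, row_number in entries]
    return out.getvalue(), keys, row_numbers


def rows_greater_than(keys, row_numbers, threshold):
    """Return the row numbers whose indexed value is strictly above threshold."""
    # Binary search finds the first key above the threshold without a full scan.
    start = bisect.bisect_right(keys, threshold)
    return sorted(row_numbers[start:])


header = ["id", "name", "salary", "mgr"]
rows = [
    [1, "Ann", 5000, ""],
    [2, "Bob", 6000, 1],
    [3, "Cy", 4000, 1],
]
csv_text, keys, row_numbers = write_csv_with_index(rows, header, "salary")
high_earners = rows_greater_than(keys, row_numbers, 5000)
```

The lookup only helps a predicate that compares the column against a known value. The self-join above compares each employee with their own manager, so it would still need one lookup per manager salary, and the sort at write time is the cost to weigh against that.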

Maybe indexing is not worth the trouble.

Glad you have the new version almost ready.
I am certainly interested to see it.
I have a feeling it will greatly improve the user experience.

Thanks
Toivo

> Create FilterCSVColumns Processor
> ---------------------------------
>
>                 Key: NIFI-1280
>                 URL: https://issues.apache.org/jira/browse/NIFI-1280
>             Project: Apache NiFi
>          Issue Type: Task
>          Components: Extensions
>            Reporter: Mark Payne
>            Assignee: Toivo Adams
>
> We should have a Processor that allows users to easily filter out specific 
> columns from CSV data. For instance, a user would configure two different 
> properties: "Columns of Interest" (a comma-separated list of column indexes) 
> and "Filtering Strategy" (Keep Only These Columns, Remove Only These Columns).
> We can do this today with ReplaceText, but it is far more difficult than it 
> would be with this Processor, as the user has to use Regular Expressions, 
> etc. with ReplaceText.
> Eventually a Custom UI could even be built that allows a user to upload a 
> Sample CSV and choose which columns from there, similar to the way that Excel 
> works when importing CSV by dragging and selecting the desired columns? That 
> would certainly be a larger undertaking and would not need to be done for an 
> initial implementation.



