[ 
https://issues.apache.org/jira/browse/NIFI-739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14606203#comment-14606203
 ] 

Ryan Blue commented on NIFI-739:
--------------------------------

I've posted a work-in-progress branch. It adds caches for datasets and writers, 
and detects when a dataset's schema changes so it can close and reopen the writer. 
It can also close writers periodically to release the files that are in 
progress, although this will eventually be handled by the writers internally.
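
For illustration only, the caching and schema-change behavior described above could be sketched roughly as follows. This is not code from the branch: {{WriterCacheSketch}}, {{SketchWriter}}, the string-valued schema, and the {{opener}} function are all hypothetical stand-ins, and the real Kite writer API is deliberately not used here.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

public class WriterCacheSketch {
    /** Hypothetical stand-in for a dataset writer; not the Kite API. */
    public interface SketchWriter {
        void write(String record);
        void close();
    }

    /** Pairs an open writer with the schema it was opened against. */
    private static final class Entry {
        final String schema;
        final SketchWriter writer;

        Entry(String schema, SketchWriter writer) {
            this.schema = schema;
            this.writer = writer;
        }
    }

    private final Map<String, Entry> cache = new HashMap<>();
    private final Function<String, SketchWriter> opener; // assumed factory: URI -> new writer

    public WriterCacheSketch(Function<String, SketchWriter> opener) {
        this.opener = opener;
    }

    /** Returns the cached writer, closing and reopening it if the schema changed. */
    public SketchWriter writerFor(String datasetUri, String currentSchema) {
        Entry entry = cache.get(datasetUri);
        if (entry != null && !entry.schema.equals(currentSchema)) {
            entry.writer.close(); // stale schema: release the old writer
            entry = null;
        }
        if (entry == null) {
            entry = new Entry(currentSchema, opener.apply(datasetUri));
            cache.put(datasetUri, entry);
        }
        return entry.writer;
    }

    /** Closes every cached writer, e.g. from a periodic timer, to release in-progress files. */
    public void closeAll() {
        for (Entry entry : cache.values()) {
            entry.writer.close();
        }
        cache.clear();
    }
}
```

The sketch is single-threaded; a processor that runs concurrent tasks would need synchronization or per-task caches, which is left out here.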

Only writers that support sync are cached, in order to guarantee durability. 
Writers that don't support sync, like Parquet, are closed before the 
{{onTrigger}} method completes. Because of this limitation, I've added internal 
batching so that the processor will write as much as it can before closing the 
Parquet writers.
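
As a rough sketch of the durability rule above (again with hypothetical names, not the branch's code or the Kite interfaces): a batch is written in full, then the writer is either synced and kept for reuse, or closed if it cannot sync.

```java
import java.util.List;

public class DurabilitySketch {
    /** Hypothetical stand-in for a dataset writer; not the Kite API. */
    public interface SketchWriter {
        void write(String record);
        void close();
    }

    /** Hypothetical marker for writers that can make data durable without closing. */
    public interface SyncableWriter extends SketchWriter {
        void sync();
    }

    /**
     * Writes the whole batch, then makes it durable before returning.
     * Returns true if the writer was synced and is still open (safe to cache),
     * or false if it had to be closed (as with a Parquet-style writer).
     */
    public static boolean writeBatch(SketchWriter writer, List<String> batch) {
        for (String record : batch) {
            writer.write(record);
        }
        if (writer instanceof SyncableWriter) {
            ((SyncableWriter) writer).sync();
            return true;
        }
        writer.close();
        return false;
    }
}
```

Writing the full batch before the close is what keeps the non-syncable case from producing one small file per record.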

[~fwiffo], could you take a look when you have a chance?

> Update Kite storage processor to cache writers
> ----------------------------------------------
>
>                 Key: NIFI-739
>                 URL: https://issues.apache.org/jira/browse/NIFI-739
>             Project: Apache NiFi
>          Issue Type: New Feature
>          Components: Extensions
>    Affects Versions: 0.1.0
>            Reporter: Ryan Blue
>            Assignee: Ryan Blue
>
> The StoreInKiteProcessor currently opens and closes a new dataset writer for 
> each file, creating lots of small files. Instead, the processor should keep 
> writers open and periodically refresh them if data should be made available. 
> This will make it much easier to delegate file management to Kite in the 
> future.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
