[ 
https://issues.apache.org/jira/browse/NIFI-4775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16905542#comment-16905542
 ] 

Joseph Witt commented on NIFI-4775:
-----------------------------------

I believe no LICENSE or NOTICE changes are required for our source or 
convenience binary distribution. We're taking this under its ALv2 terms, I can 
see no copyright or other legally required notices to carry forward from their 
project, and there appear to be zero transitive deps. So L&N check out as far 
as I can tell.

To summarize my remaining concerns:
* The title of the JIRA needs to be changed to reflect what this actually is.
* The bug fix needs to be pulled out of this and turned into its own JIRA/PR 
with proper tracking.  That looks really important.
* This should be optionally included in a resulting build, as we cannot afford 
to keep pulling in large deps (rocksdbjni is 12MB) until we break apart the 
project.  We're at the cap of the ASF server limit for our convenience binary.  
There are other examples of what I'm describing, such as for Atlas and Hive, 
where we avoid making the default build larger.
* Usage documentation should be included with the change (like what you have 
on the JIRA itself).

With the above items fixed I am +1.

> Allow FlowFile Repository to optionally perform fsync when writing CREATE 
> events but not other events
> -----------------------------------------------------------------------------------------------------
>
>                 Key: NIFI-4775
>                 URL: https://issues.apache.org/jira/browse/NIFI-4775
>             Project: Apache NiFi
>          Issue Type: Improvement
>          Components: Core Framework
>            Reporter: Mark Payne
>            Assignee: Brandon DeVries
>            Priority: Major
>             Fix For: 1.10.0
>
>         Attachments: RocksDBFlowFileRepo.html, rocksdb-flowfile-repo.adoc
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Currently, when a FlowFile is written to the FlowFile Repository, the repo 
> can either fsync or not, depending on nifi.properties. We should allow a 
> third option, of fsync only for CREATE events. In this case, if we receive 
> new data from a source we can fsync the update to the FlowFile Repository 
> before ACK'ing the data from the source. This allows us to guarantee data 
> persistence without the overhead of an fsync for every FlowFile Repository 
> update.
> It may make sense, though, to be a bit more selective about when we do this. 
> For example, if the source is a system that does not allow us to acknowledge 
> the receipt of data, such as a ListenUDP processor, this doesn't really buy us 
> much. In such a case, we could be smart about avoiding the high cost of an 
> fsync. However, for something like GetSFTP where we have to remove the file 
> in order to 'acknowledge receipt' we can ensure that we wait for the fsync 
> before proceeding.
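
For readers following along, the "fsync only for CREATE events" idea in the 
issue description can be sketched roughly as below. This is a minimal 
illustration, not NiFi's actual repository code: the class 
SelectiveSyncJournal, the SyncPolicy and EventType enums, and the syncCount 
counter are all hypothetical names made up for this example. The only real API 
it relies on is java.nio's FileChannel, whose force(false) call asks the OS to 
flush file content to the storage device.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical append-only journal; an illustration, not a NiFi class. */
final class SelectiveSyncJournal implements AutoCloseable {
    enum EventType { CREATE, UPDATE, DELETE }

    /** Assumed three-way setting: never sync, always sync, or sync CREATE only. */
    enum SyncPolicy { NEVER, ALWAYS, CREATE_ONLY }

    private final FileChannel channel;
    private final SyncPolicy policy;
    private long syncCount = 0; // number of fsyncs performed, for illustration

    SelectiveSyncJournal(Path file, SyncPolicy policy) throws IOException {
        this.channel = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.APPEND);
        this.policy = policy;
    }

    /** Decides whether a write of the given event type must be fsync'ed. */
    static boolean shouldSync(SyncPolicy policy, EventType type) {
        switch (policy) {
            case ALWAYS:
                return true;
            case CREATE_ONLY:
                return type == EventType.CREATE;
            default:
                return false;
        }
    }

    /** Appends one record and fsyncs before returning if the policy requires it. */
    synchronized void append(EventType type, String flowFileId) throws IOException {
        byte[] bytes = (type + " " + flowFileId + "\n").getBytes(StandardCharsets.UTF_8);
        ByteBuffer record = ByteBuffer.wrap(bytes);
        while (record.hasRemaining()) {
            channel.write(record);
        }
        if (shouldSync(policy, type)) {
            channel.force(false); // flush file content (not metadata) to the device
            syncCount++;
        }
    }

    long syncCount() {
        return syncCount;
    }

    @Override
    public void close() throws IOException {
        channel.close();
    }
}
```

Under this sketch, a source that can be acknowledged (the GetSFTP case above) 
would only ACK or delete the remote file after append(EventType.CREATE, ...) 
has returned, while later UPDATE and DELETE events skip the fsync cost. A 
per-source opt-out, as suggested for ListenUDP, is not modeled here.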



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
