[jira] [Commented] (NIFI-4775) Allow FlowFile Repository to optionally perform fsync when writing CREATE events but not other events

2019-08-12 Thread Joseph Witt (JIRA)


[ 
https://issues.apache.org/jira/browse/NIFI-4775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16905542#comment-16905542
 ] 

Joseph Witt commented on NIFI-4775:
---

I believe no LICENSE or NOTICE changes are required for our source or 
convenience binary distribution.  We're taking this under its ALv2 terms, 
there is no copyright or legally required data to carry forward that I can see 
on their stuff, and there appear to be zero transitive deps.  So L&N check out 
as far as I can tell.

To summarize my remaining concerns:
* Title of the JIRA needs to be fixed to reflect what this is.
* The bug fix needs to be pulled out of this and turned into its own JIRA/PR 
with proper tracking.  That looks really important.
* This should be optionally included in a resulting build as we cannot afford 
to keep pulling in large deps (rocksdbjni is 12MB) until we break apart the 
project.  We're at the cap of the ASF server limit for our convenience binary.  
There are other examples of what I'm describing, such as for Atlas and Hive, so 
we avoid making the default build larger.
* Documentation for usage of this should be included with this (like you have 
on the JIRA itself).

With the above items fixed I am +1.

> Allow FlowFile Repository to optionally perform fsync when writing CREATE 
> events but not other events
> -
>
> Key: NIFI-4775
> URL: https://issues.apache.org/jira/browse/NIFI-4775
> Project: Apache NiFi
> Issue Type: Improvement
> Components: Core Framework
> Reporter: Mark Payne
> Assignee: Brandon DeVries
> Priority: Major
> Fix For: 1.10.0
>
> Attachments: RocksDBFlowFileRepo.html, rocksdb-flowfile-repo.adoc
>
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> Currently, when a FlowFile is written to the FlowFile Repository, the repo 
> can either fsync or not, depending on nifi.properties. We should allow a 
> third option, of fsync only for CREATE events. In this case, if we receive 
> new data from a source we can fsync the update to the FlowFile Repository 
> before ACK'ing the data from the source. This allows us to guarantee data 
> persistence without the overhead of an fsync for every FlowFile Repository 
> update.
> It may make sense, though, to be a bit more selective about when we do this. For 
> example, if the source is a system that does not allow us to acknowledge the 
> receipt of data, such as a ListenUDP processor, this doesn't really buy us 
> much. In such a case, we could be smart about avoiding the high cost of an 
> fsync. However, for something like GetSFTP where we have to remove the file 
> in order to 'acknowledge receipt' we can ensure that we wait for the fsync 
> before proceeding.
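
The three-way choice described in the issue can be sketched in Java roughly as 
follows.  This is only an illustration under stated assumptions, not NiFi's 
actual code: SyncMode, EventType, and shouldSync are hypothetical names, and 
the sourceAwaitsAck flag is an assumed way to model the ListenUDP vs. GetSFTP 
distinction.

```java
import java.util.EnumSet;
import java.util.Set;

final class SyncPolicySketch {

    /** Repository event kinds; only CREATE is named in the issue, the others are placeholders. */
    enum EventType { CREATE, UPDATE, DELETE }

    /** Hypothetical setting: the two existing behaviors plus the proposed third option. */
    enum SyncMode { ALWAYS, NEVER, CREATE_ONLY }

    /**
     * Decides whether a batch of repository events should be fsync'ed before
     * the caller proceeds (for example, before ACK'ing the source).
     *
     * sourceAwaitsAck is an assumption: true for a source that waits on an
     * acknowledgement (GetSFTP removing the file), false for one that cannot
     * be acknowledged at all (ListenUDP).
     */
    static boolean shouldSync(SyncMode mode, Set<EventType> batch, boolean sourceAwaitsAck) {
        switch (mode) {
            case ALWAYS:
                return true;
            case NEVER:
                return false;
            case CREATE_ONLY:
                // Pay for the fsync only when new data arrived and the source can use the guarantee.
                return sourceAwaitsAck && batch.contains(EventType.CREATE);
            default:
                throw new IllegalArgumentException("Unknown mode: " + mode);
        }
    }

    public static void main(String[] args) {
        Set<EventType> received = EnumSet.of(EventType.CREATE);
        Set<EventType> routed = EnumSet.of(EventType.UPDATE);

        System.out.println("CREATE from acking source: " + shouldSync(SyncMode.CREATE_ONLY, received, true));
        System.out.println("CREATE from non-acking source: " + shouldSync(SyncMode.CREATE_ONLY, received, false));
        System.out.println("UPDATE only: " + shouldSync(SyncMode.CREATE_ONLY, routed, true));
    }
}
```

In this sketch the caller would fsync and only then acknowledge the source 
whenever shouldSync returns true; every other update skips the fsync.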



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (NIFI-4775) Allow FlowFile Repository to optionally perform fsync when writing CREATE events but not other events

2019-08-12 Thread Joseph Witt (JIRA)


[ 
https://issues.apache.org/jira/browse/NIFI-4775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16905535#comment-16905535
 ] 

Joseph Witt commented on NIFI-4775:
---

Regarding my concern that this wasn't reviewed by a committer and has already 
been committed...we've never really formalized the view.  But in general the 
opinion seems to be this is allowed.

https://lists.apache.org/thread.html/8562e182af3673bee0dfb567b2436693d9e3a00f98346575d8def4a6@%3Cdev.nifi.apache.org%3E

So please ignore that concern for now.  I am focused on the others, though.

> Allow FlowFile Repository to optionally perform fsync when writing CREATE 
> events but not other events
> -
>
> Key: NIFI-4775
> URL: https://issues.apache.org/jira/browse/NIFI-4775
> Project: Apache NiFi
> Issue Type: Improvement
> Components: Core Framework
> Reporter: Mark Payne
> Assignee: Brandon DeVries
> Priority: Major
> Fix For: 1.10.0
>
> Attachments: RocksDBFlowFileRepo.html, rocksdb-flowfile-repo.adoc
>
> Time Spent: 0.5h
> Remaining Estimate: 0h
>


[jira] [Commented] (NIFI-4775) Allow FlowFile Repository to optionally perform fsync when writing CREATE events but not other events

2019-08-12 Thread Joseph Witt (JIRA)


[ 
https://issues.apache.org/jira/browse/NIFI-4775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16905531#comment-16905531
 ] 

Joseph Witt commented on NIFI-4775:
---

Additionally the rocksdb library is 12MB in size.  We're already right at the 
limit and therefore we cannot really keep including new libraries at this stage 
until we break extensions away from the core.  I would strongly recommend this 
only gets bundled by someone activating it in a profile for their build rather 
than our default distribution.

> Allow FlowFile Repository to optionally perform fsync when writing CREATE 
> events but not other events
> -
>
> Key: NIFI-4775
> URL: https://issues.apache.org/jira/browse/NIFI-4775
> Project: Apache NiFi
> Issue Type: Improvement
> Components: Core Framework
> Reporter: Mark Payne
> Assignee: Brandon DeVries
> Priority: Major
> Fix For: 1.10.0
>
> Attachments: RocksDBFlowFileRepo.html, rocksdb-flowfile-repo.adoc
>
> Time Spent: 0.5h
> Remaining Estimate: 0h
>


[jira] [Commented] (NIFI-4775) Allow FlowFile Repository to optionally perform fsync when writing CREATE events but not other events

2019-08-12 Thread Joseph Witt (JIRA)


[ 
https://issues.apache.org/jira/browse/NIFI-4775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16905523#comment-16905523
 ] 

Joseph Witt commented on NIFI-4775:
---

[~devriesb] 

* The title of this JIRA does not help describe the work done at all.
* The great docs on this JIRA are not present in the commit, so nobody 
would know how to use it.  This should happen with the commit or be really darn 
close behind.
* The community follows an RTC model.  I do not believe this has been reviewed 
by a committer (though the non committer input is helpful and encouraged).  
That said, in reading our language around commits in our e-mail history and on 
our wiki/contributor guide we don't really clarify.  I could see this being 
worth debating but in any event see my other concerns.
* I am surprised there are zero LICENSE and NOTICE impacts to bringing this 
library in.  There is a comment about being unsure but nothing indicating this 
was verified.
* The change to the WriteAheadFlowFileRepository appears to have nothing to do 
with your commit and possibly is related to a (serious) bug being found.  If 
so, this is super helpful and important, but it should definitely not be in this 
JIRA/feature; it should have its own JIRA/PR for resolution and tracking.

Can you please fix the above things and do the diligence necessary to ensure 
the LICENSE/NOTICE work is done?  I am re-opening as we would not want to 
create an RC with this in this state.


> Allow FlowFile Repository to optionally perform fsync when writing CREATE 
> events but not other events
> -
>
> Key: NIFI-4775
> URL: https://issues.apache.org/jira/browse/NIFI-4775
> Project: Apache NiFi
> Issue Type: Improvement
> Components: Core Framework
> Reporter: Mark Payne
> Assignee: Brandon DeVries
> Priority: Major
> Fix For: 1.10.0
>
> Attachments: RocksDBFlowFileRepo.html, rocksdb-flowfile-repo.adoc
>
> Time Spent: 0.5h
> Remaining Estimate: 0h
>


[jira] [Commented] (NIFI-4775) Allow FlowFile Repository to optionally perform fsync when writing CREATE events but not other events

2019-08-12 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/NIFI-4775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16905498#comment-16905498
 ] 

ASF subversion and git services commented on NIFI-4775:
---

Commit 7d77b464ccfbd86ff1f2057c44bc35580d7f9fe2 in nifi's branch 
refs/heads/master from Brandon Rhys DeVries
[ https://gitbox.apache.org/repos/asf?p=nifi.git;h=7d77b46 ]

NIFI-4775: FlowFile Repository implementation based on RocksDB

+1 from markobean.

This closes #3638.

Signed-off-by: Brandon 


> Allow FlowFile Repository to optionally perform fsync when writing CREATE 
> events but not other events
> -
>
> Key: NIFI-4775
> URL: https://issues.apache.org/jira/browse/NIFI-4775
> Project: Apache NiFi
> Issue Type: Improvement
> Components: Core Framework
> Reporter: Mark Payne
> Assignee: Brandon DeVries
> Priority: Major
> Attachments: RocksDBFlowFileRepo.html, rocksdb-flowfile-repo.adoc
>
> Time Spent: 20m
> Remaining Estimate: 0h
>


[jira] [Commented] (NIFI-4775) Allow FlowFile Repository to optionally perform fsync when writing CREATE events but not other events

2019-08-07 Thread Brandon DeVries (JIRA)


[ 
https://issues.apache.org/jira/browse/NIFI-4775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16902210#comment-16902210
 ] 

Brandon DeVries commented on NIFI-4775:
---

Submitted patch, and attached documentation as adoc and html.  The 
documentation should possibly be incorporated into the existing documentation, 
but I wanted to allow for general review first.  Also, I'm reasonably sure 
there are no licensing issues, but I would appreciate anyone else checking as 
well.

> Allow FlowFile Repository to optionally perform fsync when writing CREATE 
> events but not other events
> -
>
> Key: NIFI-4775
> URL: https://issues.apache.org/jira/browse/NIFI-4775
> Project: Apache NiFi
> Issue Type: Improvement
> Components: Core Framework
> Reporter: Mark Payne
> Assignee: Brandon DeVries
> Priority: Major
> Attachments: RocksDBFlowFileRepo.html, rocksdb-flowfile-repo.adoc
>
> Time Spent: 10m
> Remaining Estimate: 0h
>


[jira] [Commented] (NIFI-4775) Allow FlowFile Repository to optionally perform fsync when writing CREATE events but not other events

2018-02-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-4775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16371874#comment-16371874
 ] 

ASF GitHub Bot commented on NIFI-4775:
--

Github user devriesb commented on the issue:

https://github.com/apache/nifi/pull/2416
  
I'll grant NIFI-4775 may raise issues with my proposed solution. However, 
there is a problem right now.  My proposed solution addresses the problem right 
now.  Future modification may require adjustments to previous assumptions.  
That, however, is a problem for the future.  

In any case, after doing some experimentation, I'm not sure the current 
version of NIFI-4775 is the correct approach.  And whatever the eventual 
approach is, it may more appropriately be a new implementation (as discussed 
above).  I don't think we should put off correcting current bugs because they 
may complicate potential future features.


> Allow FlowFile Repository to optionally perform fsync when writing CREATE 
> events but not other events
> -
>
> Key: NIFI-4775
> URL: https://issues.apache.org/jira/browse/NIFI-4775
> Project: Apache NiFi
> Issue Type: Improvement
> Components: Core Framework
> Reporter: Mark Payne
> Priority: Major
>



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (NIFI-4775) Allow FlowFile Repository to optionally perform fsync when writing CREATE events but not other events

2018-02-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-4775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16371852#comment-16371852
 ] 

ASF GitHub Bot commented on NIFI-4775:
--

Github user markap14 commented on the issue:

https://github.com/apache/nifi/pull/2416
  
I would not consider that circumstance to be unusual, but rather a common 
scenario if power is lost, after NIFI-4775 has been implemented. Given that 
NIFI-4775 was created and that there were no objections, I took that as 
confirmation that it is intended to be implemented in the future. Once this is 
done, it will guarantee no loss of data (though it would allow loss of 
processing). The proposed solution, however, still results in data loss if 
power is lost. It also prevents us from implementing NIFI-4775 effectively: 
once implemented, NIFI-4775 would provide no real benefit alongside such a 
solution, because the solution would still throw out those fsync'ed CREATE 
events if another partition was not also fsync'ed.


> Allow FlowFile Repository to optionally perform fsync when writing CREATE 
> events but not other events
> -
>
> Key: NIFI-4775
> URL: https://issues.apache.org/jira/browse/NIFI-4775
> Project: Apache NiFi
> Issue Type: Improvement
> Components: Core Framework
> Reporter: Mark Payne
> Priority: Major
>