[ https://issues.apache.org/jira/browse/NIFI-744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14706243#comment-14706243 ]

Joseph Witt commented on NIFI-744:
----------------------------------

+1.

Bottom line: amazing. This is a hugely helpful change, and it is frankly funny
that we didn't arrive at this design sooner. Perhaps we had to grow to this
point, since it would have been very hard to do from the get-go. In any event,
here are some comments and questions.

- nifi-api: Very nice job of providing quality Javadocs!

- nifi-api/FileSystemSwapManager: Lines 300-310 retain backward compatibility!  
Awesome.

- nifi-framework-core/FlowController: It is clear that we should refactor the
provenance-based content retrieval out.

- nifi-framework-core/FlowController:3325 Why is this necessary? Why not simply
avoid calling that method? This has a code smell to it.

- nifi-api/ContentRepository: The change in definition is totally fair game.
The content repository is not a published/supported extension point at this
time, though eventually it probably should be. There are now a couple of
deprecated methods. Given the signature change on the other method, and given
that this class is not presently subject to the extension point guarantees
we're making, why not just remove them now?

- nifi-framework-core/FileSystemRepository:97-99 These are all magic numbers.
It would really be a good idea to add Javadoc to them explaining the reasoning.

- nifi-framework-core/FileSystemRepository:237-244 Are we sure this isn't a 
thread safety issue?  Could anything else be writing to those streams when this 
is called?

- nifi-framework-core/FileSystemRepository:479 Just to be clear: we are
essentially introducing a need for shared state and concurrency protection as
a tradeoff for avoiding the disk thrashing and throughput issues otherwise
seen? I am cool with this, but want to make sure I understood it. That said, I
don't quite follow why we need it. Can you explain this a bit more?

- nifi-framework-core/StandardProcessSession: Looking at how the process
session works, and thinking through how it might behave when a single session
writes lots of objects, could a sequence of writes end up going to different
ResourceClaims? I am concerned about the implications of this for sequential
reads: chances are that content written sequentially in a given flow will
benefit from being read sequentially. Do we support that here or not? I
suppose that is the point of the queue.

- nifi-framework-core/TestFileSystemSwapManager:19 Extraneous import - same 
package.

- nifi-framework-core/Test* - These are really a very welcome addition.
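
One way to picture the tradeoff asked about in the FileSystemRepository:479
and StandardProcessSession comments is the minimal Java sketch below. It is not
the code in the patch: WritableClaimPool, ResourceFile, Slice,
MAX_APPENDABLE_LENGTH and the queue capacity of 100 are hypothetical names and
values chosen for illustration. It assumes a session borrows a partially filled
file from a shared queue, appends every write to that file, and hands it back
on commit.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for one on-disk content file (a "resource claim").
final class ResourceFile {
    final long id;
    // Bytes appended so far. Only the session currently holding this file
    // touches it, so it needs no lock of its own.
    long length;

    ResourceFile(long id) {
        this.id = id;
    }
}

// Hypothetical slice of a ResourceFile holding one FlowFile's content.
record Slice(long resourceId, long offset, long length) {}

final class WritableClaimPool {
    // Hypothetical threshold: stop sharing a file once it reaches 1 MiB.
    static final long MAX_APPENDABLE_LENGTH = 1024 * 1024;

    // The shared state: files that still have room. The BlockingQueue does the
    // locking, so two sessions can never hold the same file at once.
    private final BlockingQueue<ResourceFile> writable = new LinkedBlockingQueue<>(100);
    private final AtomicLong nextId = new AtomicLong();

    // A session starts writing: reuse a partially filled file if one is waiting.
    ResourceFile borrow() {
        ResourceFile file = writable.poll();
        return file != null ? file : new ResourceFile(nextId.getAndIncrement());
    }

    // Append one FlowFile's content. Every write a session makes goes to the
    // file it borrowed, so those writes sit back to back in that file.
    static Slice append(ResourceFile file, long bytes) {
        Slice slice = new Slice(file.id, file.length, bytes);
        file.length += bytes;
        return slice;
    }

    // On commit: hand the file back for the next session unless it is full or
    // the queue is at capacity (offer() returns false rather than blocking).
    // A file that is not re-queued is never appended to again in this sketch.
    void release(ResourceFile file) {
        if (file.length < MAX_APPENDABLE_LENGTH) {
            writable.offer(file);
        }
    }

    // Number of distinct files created so far (ids are handed out from 0).
    long filesCreated() {
        return nextId.get();
    }
}
```

For example, 1,000 sessions committed one after another, each writing a single
100-byte FlowFile, end up sharing one file in this sketch instead of creating
1,000. The price is the queue itself: it is shared across threads, and a
session's writes only stay contiguous because the session holds its file
exclusively until it commits.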

Great work.

Thanks
Joe

> Allow FileSystemRepository to write to the same file for multiple 
> (non-parallel) sessions
> -----------------------------------------------------------------------------------------
>
>                 Key: NIFI-744
>                 URL: https://issues.apache.org/jira/browse/NIFI-744
>             Project: Apache NiFi
>          Issue Type: Bug
>          Components: Core Framework
>            Reporter: Mark Payne
>            Assignee: Mark Payne
>             Fix For: 0.3.0
>
>         Attachments: 
> 0001-NIFI-744-Refactored-ContentClaim-into-ContentClaim-a.patch
>
>
> Currently, when a ProcessSession is committed, the Content Claim that was
> being written to is considered "finished" and will never be written to again.
> When a flow has processors that generate many, many FlowFiles, each in its
> own session, this means that we also have many, many files on disk in the
> Content Repository. Writing to these files generally hasn't been a problem.
> However, archiving or destroying them is very taxing and can cause erratic
> performance.
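
A rough sketch of why sharing files helps with archiving and destroying,
assuming (hypothetically; ContentSlice and SharedFileTracker are illustrative
names, not from the patch) that each FlowFile's content becomes an
offset/length slice of a shared file, and that a shared file may only be
archived or destroyed once no live FlowFile still points into it:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical: a FlowFile's content as a slice of a shared on-disk file.
record ContentSlice(String fileName, long offset, long length) {}

// Hypothetical bookkeeping for shared files. Not thread-safe; a real
// repository would need to synchronize these updates.
final class SharedFileTracker {
    // Number of live FlowFiles that still reference each shared file.
    private final Map<String, Integer> claimants = new HashMap<>();

    void addClaimant(ContentSlice slice) {
        claimants.merge(slice.fileName(), 1, Integer::sum);
    }

    // Returns true when the last FlowFile referencing the file is gone, meaning
    // the whole file can now be archived or destroyed in one operation.
    boolean removeClaimant(ContentSlice slice) {
        Integer count = claimants.get(slice.fileName());
        if (count == null) {
            throw new IllegalStateException("No claimants for " + slice.fileName());
        }
        if (count == 1) {
            claimants.remove(slice.fileName());
            return true;
        }
        claimants.put(slice.fileName(), count - 1);
        return false;
    }
}
```

With 1,000 small FlowFiles packed into one file, the archive or destroy step in
this sketch runs once, after the 1,000th removal, instead of once per FlowFile.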



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
