[ 
https://issues.apache.org/jira/browse/NIFI-1018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15141265#comment-15141265
 ] 

Brandon DeVries commented on NIFI-1018:
---------------------------------------

My original thought on this was to have the dataset written to disk as a 
FlowFile, and to modify the controller service accordingly.  The writing to and 
reading from disk of the FlowFile would be analogous to a send / receive over 
the network.  A controller service would then RECEIVE the dataset when it reads 
it, and DROP it when it stops using it.  That way the provenance chain for the 
dataset is complete from beginning to end.
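
As a rough sketch of that RECEIVE / DROP lifecycle (not a proposed patch): 
the DatasetProvenance interface, RecordingProvenance, and DatasetBackedService 
below are hypothetical names invented for illustration and are not part of the 
NiFi API.  The recorder only keeps events in memory so the example runs without 
NiFi.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the reporting hook discussed in this issue.
// This is NOT NiFi's ProvenanceReporter; names and signatures are assumptions.
interface DatasetProvenance {
    void receive(String datasetId, String sourceUri);
    void drop(String datasetId, String reason);
}

// Records events in memory so the sketch can run without NiFi.
final class RecordingProvenance implements DatasetProvenance {
    final List<String> events = new ArrayList<>();

    @Override
    public void receive(String datasetId, String sourceUri) {
        events.add("RECEIVE " + datasetId + " from " + sourceUri);
    }

    @Override
    public void drop(String datasetId, String reason) {
        events.add("DROP " + datasetId + " (" + reason + ")");
    }
}

// Hypothetical controller service that reports each dataset transition.
final class DatasetBackedService {
    private final DatasetProvenance provenance;
    private String currentDatasetId; // null until a dataset is loaded

    DatasetBackedService(DatasetProvenance provenance) {
        this.provenance = provenance;
    }

    // Called when the service reads a dataset that a flow wrote to disk.
    void load(String datasetId, String sourceUri) {
        if (currentDatasetId != null) {
            // The previous dataset is no longer in use, so close out its chain.
            provenance.drop(currentDatasetId, "replaced by " + datasetId);
        }
        provenance.receive(datasetId, sourceUri);
        currentDatasetId = datasetId;
    }

    // Called when the service stops using the dataset (e.g. it is disabled).
    void unload() {
        if (currentDatasetId != null) {
            provenance.drop(currentDatasetId, "service stopped using dataset");
            currentDatasetId = null;
        }
    }
}
```

Loading a second dataset drops the first before receiving the new one, and 
calling unload() twice reports only one DROP, so each dataset's chain has 
exactly one start and one end.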

Additionally, a controller service can (and probably should) expose the details 
of the dataset to the processor calling it, allowing the reporting of an ENRICH 
/ CONTENT_MODIFIED / ATTRIBUTES_MODIFIED / whatever event.  Those details could 
simply be included in the returned "lookup" object.  But that enhancement to 
the *enriched* FlowFile's provenance chain is an issue separate from (but 
complementary to) maintaining the dataset's provenance chain.
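
To illustrate what "included in the returned lookup object" might look like, 
here is a minimal sketch.  DatasetLookupResult and EnrichmentLookupService are 
hypothetical names, and the choice of fields (a dataset id and a load time) is 
an assumption; nothing here is an existing NiFi type.

```java
import java.time.Instant;
import java.util.Map;
import java.util.Optional;

// Hypothetical result type: the looked-up value plus details of the dataset
// it came from, so the calling processor can describe the enrichment.
final class DatasetLookupResult {
    final String value;
    final String datasetId;
    final Instant datasetLoadedAt;

    DatasetLookupResult(String value, String datasetId, Instant datasetLoadedAt) {
        this.value = value;
        this.datasetId = datasetId;
        this.datasetLoadedAt = datasetLoadedAt;
    }

    // Text a calling processor could attach to whatever event it reports
    // for the enriched FlowFile.
    String provenanceDetails() {
        return "enriched from dataset " + datasetId + " loaded at " + datasetLoadedAt;
    }
}

// Hypothetical lookup-style controller service backed by one loaded dataset.
final class EnrichmentLookupService {
    private final String datasetId;
    private final Instant loadedAt;
    private final Map<String, String> entries;

    EnrichmentLookupService(String datasetId, Instant loadedAt, Map<String, String> entries) {
        this.datasetId = datasetId;
        this.loadedAt = loadedAt;
        this.entries = entries;
    }

    // Returns the value together with the dataset details, or empty on a miss.
    Optional<DatasetLookupResult> lookup(String key) {
        String value = entries.get(key);
        if (value == null) {
            return Optional.empty();
        }
        return Optional.of(new DatasetLookupResult(value, datasetId, loadedAt));
    }
}
```

The service itself reports nothing here; it only hands the dataset details 
back, and the processor decides which event type to report for the enriched 
FlowFile.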


> Allow ControllerServices access to ProvenanceReporter
> -----------------------------------------------------
>
>                 Key: NIFI-1018
>                 URL: https://issues.apache.org/jira/browse/NIFI-1018
>             Project: Apache NiFi
>          Issue Type: Improvement
>            Reporter: Brandon DeVries
>            Assignee: Michael Moser
>
> Currently we maintain a provenance trail for all files flowing through NiFi 
> Processors.  However, if a ControllerService uses some data set it generally 
> just loads it from disk after it is fetched using a normal NiFi flow.  
> However, this breaks the provenance trail for the data set... there is no way 
> (in provenance terms) of knowing what data set the ControllerService is using 
> or when it was loaded.  By giving ControllerServices access to the 
> ProvenanceReporter, they can acknowledge "receipt" of a data set, so the 
> provenance trail from pull to use is maintained.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
