[jira] [Commented] (NIFI-1018) Allow ControllerServices access to ProvenanceReporter

2016-02-10 Thread Brandon DeVries (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-1018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15141268#comment-15141268
 ] 

Brandon DeVries commented on NIFI-1018:
---

I think there's room for both solutions.  A Dataset Registry sounds awesome, but 
it is going to take a while and is worth getting right.  Exposing the 
ProvenanceReporter to controller services should be pretty easy, and would fix 
a deficiency right now.  If / when a better solution becomes available, that 
can be used instead... but in the meantime (6+ months?) a simple change will 
add a lot of value.



> Allow ControllerServices access to ProvenanceReporter
> -----------------------------------------------------
>
> Key: NIFI-1018
> URL: https://issues.apache.org/jira/browse/NIFI-1018
> Project: Apache NiFi
> Issue Type: Improvement
> Reporter: Brandon DeVries
> Assignee: Michael Moser
>
> Currently we maintain a provenance trail for all files flowing through NiFi 
> Processors.  However, if a ControllerService uses some data set, it generally 
> just loads it from disk after the data set has been fetched by a normal NiFi 
> flow.  This breaks the provenance trail for the data set... there is no way 
> (in provenance terms) of knowing what data set the ControllerService is using 
> or when it was loaded.  By giving ControllerServices access to the 
> ProvenanceReporter, they can acknowledge "receipt" of a data set, so the 
> provenance trail from pull to use is maintained.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (NIFI-1018) Allow ControllerServices access to ProvenanceReporter

2016-02-10 Thread Joseph Witt (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-1018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15141271#comment-15141271
 ] 

Joseph Witt commented on NIFI-1018:
---

I agree there is probably a good short-term option, and what I was proposing is 
most likely a ways out.



[jira] [Commented] (NIFI-1018) Allow ControllerServices access to ProvenanceReporter

2016-02-10 Thread Brandon DeVries (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-1018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15141265#comment-15141265
 ] 

Brandon DeVries commented on NIFI-1018:
---

My original thought on this was to have the dataset written to disk as a 
FlowFile, and to modify the controller service accordingly.  The writing to and 
reading from disk of the FlowFile would be analogous to a send / receive over 
the network.  A controller service would then RECEIVE the dataset when it reads 
it, and DROP it when it stops using it.  That way the provenance chain for the 
dataset is complete from beginning to end.
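
The RECEIVE-on-read / DROP-on-release idea could be sketched roughly as 
follows.  This is only an illustration: DatasetEventType, DatasetEvent, 
DatasetEventLog and DatasetBackedService are hypothetical stand-ins invented 
for this example, not the NiFi ProvenanceReporter API, and the in-memory list 
merely takes the place of a real provenance repository.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for illustration only; not the real NiFi API.
enum DatasetEventType { RECEIVE, DROP }

final class DatasetEvent {
    final DatasetEventType type;
    final String datasetId;
    final String transitUri;

    DatasetEvent(DatasetEventType type, String datasetId, String transitUri) {
        this.type = type;
        this.datasetId = datasetId;
        this.transitUri = transitUri;
    }
}

// Collects events in memory, standing in for a provenance repository.
final class DatasetEventLog {
    private final List<DatasetEvent> events = new ArrayList<>();

    void record(DatasetEvent event) {
        events.add(event);
    }

    List<DatasetEvent> events() {
        return events;
    }
}

// A service that records RECEIVE when it reads a dataset from disk
// and DROP when it stops using that dataset.
final class DatasetBackedService {
    private final DatasetEventLog log;
    private String currentDatasetId;
    private String currentUri;

    DatasetBackedService(DatasetEventLog log) {
        this.log = log;
    }

    void load(String datasetId, String fileUri) {
        // Swapping datasets first drops the one currently in use.
        release();
        currentDatasetId = datasetId;
        currentUri = fileUri;
        log.record(new DatasetEvent(DatasetEventType.RECEIVE, datasetId, fileUri));
    }

    void release() {
        if (currentDatasetId == null) {
            return; // nothing loaded, nothing to drop
        }
        log.record(new DatasetEvent(DatasetEventType.DROP, currentDatasetId, currentUri));
        currentDatasetId = null;
        currentUri = null;
    }
}
```

Loading a second dataset while one is in use produces a DROP for the old one 
followed by a RECEIVE for the new one, so each dataset's chain has a clear 
beginning and end.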

Additionally, a controller service can (and probably should) expose the details 
of the dataset to the processor calling it, allowing the reporting of an ENRICH 
/ CONTENT_MODIFIED / ATTRIBUTES_MODIFIED / whatever event.  Those details could 
simply be included in the returned "lookup" object.  But that enhancement to 
the *enriched* FlowFile's provenance chain is an issue separate from (but 
complementary to) maintaining the dataset's provenance chain.
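
As a rough sketch of the "lookup object" carrying dataset details back to the 
calling processor, something like the following could work.  LookupResult and 
EnrichmentLookupService are hypothetical names made up for this example, and 
the dataset id / version fields are assumptions about what "details" might 
include; how the processor turns them into a provenance event is left out.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical result type: the looked-up value plus details of the
// dataset it came from, so the caller can describe the enrichment.
final class LookupResult {
    final String value;          // null when the key is not in the dataset
    final String datasetId;
    final String datasetVersion;

    LookupResult(String value, String datasetId, String datasetVersion) {
        this.value = value;
        this.datasetId = datasetId;
        this.datasetVersion = datasetVersion;
    }

    // Text a processor could attach to whatever event it reports
    // for the enriched FlowFile.
    String provenanceDetails() {
        return "enriched from dataset " + datasetId + " (version " + datasetVersion + ")";
    }
}

// Hypothetical controller-service-like class backed by an in-memory map.
final class EnrichmentLookupService {
    private final String datasetId;
    private final String datasetVersion;
    private final Map<String, String> entries;

    EnrichmentLookupService(String datasetId, String datasetVersion, Map<String, String> entries) {
        this.datasetId = datasetId;
        this.datasetVersion = datasetVersion;
        this.entries = new HashMap<>(entries); // defensive copy
    }

    LookupResult lookup(String key) {
        return new LookupResult(entries.get(key), datasetId, datasetVersion);
    }
}
```

The service itself reports nothing here; it only hands the dataset details to 
the processor, which keeps this separate from the dataset's own chain.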




[jira] [Commented] (NIFI-1018) Allow ControllerServices access to ProvenanceReporter

2016-02-04 Thread Michael Moser (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-1018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15133091#comment-15133091
 ] 

Michael Moser commented on NIFI-1018:
-

As a really crazy idea, I wonder if it would be useful for NiFi to have a new 
Configuration Repository?  It would consist of FlowFiles that are more-or-less 
permanent.  They could be stored in the Configuration Repository with a unique 
key, so it would be easy to replace one with an updated file.  Provenance 
events could be recorded when FlowFiles are added to, updated in, or deleted 
from the Configuration Repository.  Processors/Services/Tasks could reference 
the FlowFiles in the 
Configuration Repository by this unique key, and could record a provenance 
event when they use this reference to read a FlowFile.
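
To make the idea a bit more concrete, here is a minimal sketch of such a 
keyed store.  ConfigRepositorySketch, its Action enum and its Event class are 
all hypothetical names for this example (nothing like this exists in NiFi as 
far as this thread goes), and plain byte arrays stand in for FlowFile content.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical keyed repository that records an event for every
// add, update, delete, and read of an entry.
final class ConfigRepositorySketch {
    enum Action { ADD, UPDATE, DELETE, READ }

    static final class Event {
        final Action action;
        final String key;
        final String actor; // component that triggered the event

        Event(Action action, String key, String actor) {
            this.action = action;
            this.key = key;
            this.actor = actor;
        }
    }

    private final Map<String, byte[]> entries = new HashMap<>();
    private final List<Event> events = new ArrayList<>();

    // Storing under an existing key replaces the old content (UPDATE).
    void put(String key, byte[] content) {
        Action action = entries.containsKey(key) ? Action.UPDATE : Action.ADD;
        entries.put(key, content.clone());
        events.add(new Event(action, key, "repository"));
    }

    boolean delete(String key) {
        if (entries.remove(key) == null) {
            return false; // unknown key, no event recorded
        }
        events.add(new Event(Action.DELETE, key, "repository"));
        return true;
    }

    // A processor/service/task reads by key and the read is recorded.
    byte[] read(String key, String componentId) {
        byte[] content = entries.get(key);
        if (content == null) {
            return null;
        }
        events.add(new Event(Action.READ, key, componentId));
        return content.clone();
    }

    List<Event> events() {
        return events;
    }
}
```

Because consumers only hold the key, replacing the content under that key is 
all it takes to roll out an updated file, and the event list shows which 
component read which entry.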



[jira] [Commented] (NIFI-1018) Allow ControllerServices access to ProvenanceReporter

2016-02-04 Thread Michael Moser (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-1018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15133083#comment-15133083
 ] 

Michael Moser commented on NIFI-1018:
-

I wonder if the new StateManager could be leveraged in any way?  Can Controller 
Services access the StateManager?  Can FlowFile references be stored in the 
StateManager in such a way that it could be used in a ProvenanceReporter?



[jira] [Commented] (NIFI-1018) Allow ControllerServices access to ProvenanceReporter

2016-02-04 Thread Joseph Witt (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-1018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15133163#comment-15133163
 ] 

Joseph Witt commented on NIFI-1018:
---

Not crazy at all.  I love this idea.  I have often felt like we needed a 
'DataSet Registry' for these sorts of datasets that get reused (configuration, 
enrichment, dictionaries, etc...)



[jira] [Commented] (NIFI-1018) Allow ControllerServices access to ProvenanceReporter

2016-02-04 Thread Brandon DeVries (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-1018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15133174#comment-15133174
 ] 

Brandon DeVries commented on NIFI-1018:
---

That could also potentially help distribute data sets across nodes, and
make sure data sets are in sync. That would be a lot better than our
current ad hoc solutions...





[jira] [Commented] (NIFI-1018) Allow ControllerServices access to ProvenanceReporter

2016-02-04 Thread Joseph Witt (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-1018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15133298#comment-15133298
 ] 

Joseph Witt commented on NIFI-1018:
---

Entirely agreed.  Basically, one can register/publish new versions of datasets 
and then subscribe to them for auto-updates, etc.  Far superior to our current 
model.  We have a lot of very big-ticket items on our plate already, but maybe 
this is an effort we could gel around later in the year.
