Adam,
I believe there is a need for more detailed ProvenanceEvents.
A use case would be a customer trying to track data passed between two 
NiFi instances by matching up SENDs and RECEIVEs.

So a flowfile that has a SEND event on the first NiFi should have a RECEIVE 
event on the second NiFi.
But a flowfile that was PULLed by the second NiFi (from the first NiFi) will 
not necessarily have any provenance event generated by the first NiFi.
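
A minimal sketch of this kind of matching, in Java. The classes below are 
simplified, hypothetical stand-ins and not NiFi API classes, and the 
correlationId field is an assumption: it stands for whatever key both 
instances happen to share for the same flowfile.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified stand-ins; these are not NiFi API classes.
class ProvenanceMatchSketch {

    enum Type { SEND, RECEIVE }

    static final class Event {
        final Type type;
        final String correlationId; // assumed key shared by both instances

        Event(Type type, String correlationId) {
            this.type = type;
            this.correlationId = correlationId;
        }
    }

    // Returns the correlation ids that have a RECEIVE on the second instance
    // but no SEND on the first instance.
    static List<String> receivesWithoutSend(List<Event> first, List<Event> second) {
        Set<String> sent = new HashSet<>();
        for (Event e : first) {
            if (e.type == Type.SEND) {
                sent.add(e.correlationId);
            }
        }
        List<String> unmatched = new ArrayList<>();
        for (Event e : second) {
            if (e.type == Type.RECEIVE && !sent.contains(e.correlationId)) {
                unmatched.add(e.correlationId);
            }
        }
        return unmatched;
    }
}
```

In this sketch, a flowfile pulled by the second instance would show up in 
the returned list, since the first instance recorded nothing for it.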

(I realize that FETCH is already a "reserved word" in the current 
ProvenanceEvents setup, so I was hoping PULL could be used instead.)
There is another Provenance Event, ACKNOWLEDGE, which would occasionally 
fit this model as well (an example would be the HandleHttpResponse 
processor, which could send this instead of SEND when responding to an HTTP 
request).
That being said, you made an excellent point when you said:
"However even more important to realize,
this change would affect many other downstream consumers of provenance data
which aren't necessarily in the stock NiFi distribution."
Thanks,
Nissim

    On Friday, October 11, 2019, 11:30:19 AM EDT, Nissim Shiman 
<nshi...@yahoo.com.invalid> wrote:  
 
  Adam,
"Yes" to your first question and the four processor examples you listed.

I will need to get back to you regarding your other points.

Thanks,
Nissim

    On Thursday, October 10, 2019, 7:05:57 PM EDT, Adam Taft 
<a...@adamtaft.com> wrote:  
 
 Nissim,

Just to be clear, you are trying to distinguish between processors which
are actively "pulling" data (GetXYZ) vs. processors which just "listen" for
data (ListenXYZ)?  Is that your basic vision?

GetFile => PULL
GetHTTP => PULL
ListenHTTP => RECEIVE
ListenTCP => RECEIVE

Could you clarify what advantages this would have in terms of data
provenance?  What would you use this new event type for specifically?  What
are you missing now? Do you have a use case that needs this, or are you
just generally trying to round out the provenance event types for the sake of
completeness?  I honestly don't know a use case where you care whether you
polled for the data or listened for it.  The provenance model today just
cares that you received the data, not so much how you received it.

You're right that this proposal will affect many processors and the
internal visualization tools, etc.  However even more important to realize,
this change would affect many other downstream consumers of provenance data
which aren't necessarily in the stock NiFi distribution.  For example, any
third-party/custom ReportingTask that handles provenance data would need to
be updated with this change.  There's probably a need for a strong vision to
help demonstrate the value for this vs. the cost of the cascading effects
related to this change.
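
To make the downstream concern concrete, here is a minimal sketch in Java of 
a consumer written against a fixed set of event type names. The class is 
hypothetical and is not a real ReportingTask, and the KNOWN set is only a 
subset chosen for brevity.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical downstream consumer; not a real NiFi ReportingTask.
class DownstreamConsumerSketch {

    // Event type names this imaginary consumer was written against
    // (a subset, for brevity).
    static final Set<String> KNOWN =
            new HashSet<>(Arrays.asList("RECEIVE", "SEND", "FETCH"));

    // Counts events per known type; any name the consumer does not
    // recognize is counted under "UNHANDLED".
    static Map<String, Integer> tally(List<String> eventTypeNames) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String name : eventTypeNames) {
            String key = KNOWN.contains(name) ? name : "UNHANDLED";
            counts.merge(key, 1, Integer::sum);
        }
        return counts;
    }
}
```

Events that used to be counted as RECEIVE would land in the UNHANDLED bucket 
once they were emitted as PULL, until the consumer itself was updated.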

Thanks,
Adam


On Thu, Oct 10, 2019 at 4:02 PM Nissim Shiman <nshi...@yahoo.com.invalid>
wrote:

> Hello Team,
>
> The ProvenanceEventType class does a good job capturing possible events,
> but the PULL event doesn't seem to fall nicely into any of the existing
> types.
>
> https://gitbox.apache.org/repos/asf?p=nifi.git;a=blob;f=nifi-api/src/main/java/org/apache/nifi/provenance/ProvenanceEventType.java
> RECEIVE is the closest, but RECEIVE is passive and doesn't capture the
> active action of a PULL.
>
> And... maybe it would fall into FETCH, but FETCH is more focused on
> contents of an existing flow file being overwritten.
>
> What does the community think about a new PULL event type, or about
> using FETCH for PULL and having what FETCH does now be a new event such
> as REUSE?
>
> NOTE: a new PULL event would have a cascading effect: many processors
> that currently emit RECEIVE events would be modified to emit PULL
> (e.g. GetFile would no longer emit a RECEIVE, but rather a PULL). It
> would, however, more accurately capture the event.
>
> Thanks,
> Nissim Shiman
>
>
    
