I like the idea of creating PULL as a type. In fact, I'd propose that there are three scenarios here:
RECEIVE - Passively acquire in a sort of hand-off situation. Ex: Kafka subscription
PULL - Direct operations to seek out and fetch something in a targeted fashion. Ex: GetHTTP
QUERY - Go looking for the data and take what matches your search. Ex: JsonQueryElasticsearch, GetMongo, any SQL query processor, etc.

On Wed, Oct 30, 2019 at 1:31 PM Nissim Shiman <nshi...@yahoo.com.invalid> wrote:
> Joe,
>
> It is hard to say how much value the transit URI would bring to clarifying a RECEIVE.
> For example, a RECEIVE with a transit URI of https:<etc.> could come from either a GetHTTP (i.e. active) or a ListenHTTP (i.e. passive).
>
> But your idea of "a metadata item specifying active vs passive" is a very clever way to make this work with minimal disruption.
>
> My understanding is that the current receive() calls in ProvenanceReporter [1] will remain the same, but new ones will be added with a boolean parameter reflecting whether the receive is active or passive.
> This will allow the current list of Provenance Events [2] to remain the same, so third-party/custom processors can continue working as is.
>
> Does this sound like what you are thinking?
>
> [1]
> https://github.com/apache/nifi/blob/master/nifi-api/src/main/java/org/apache/nifi/provenance/ProvenanceReporter.java#L46
>
> [2]
> https://github.com/apache/nifi/blob/master/nifi-api/src/main/java/org/apache/nifi/provenance/ProvenanceEventType.java
>
> Thanks,
> Nissim
> On Tuesday, October 29, 2019, 12:47:40 PM EDT, Joe Witt <joe.w...@gmail.com> wrote:
>
> Nissim
>
> I like the idea to introduce a more refined type of event for how data is brought into NiFi (active - PULL, passive - RECEIVE).
>
> That said, it might be sufficient to simply have this distinction be on the "RECEIVE" event as a metadata item specifying active vs passive. The protocol utilized, as mentioned in the transport URI, should clarify this though.
>
> In short, I think there is a way here that is all opt-in for existing users and components.
>
> Thanks
>
> On Tue, Oct 29, 2019 at 9:41 AM Nissim Shiman <nshi...@yahoo.com.invalid> wrote:
> > Adam,
> > good points...
> > I missed a step in explaining the use case where Provenance Events are incomplete...
> > Where the second nifi does a GetSFTP from the *filesystem* that the first nifi is located on
> > So the second nifi currently emits a RECEIVE event, but there is no corresponding SEND event from the first nifi (nor should there be)
> > If the second nifi emitted a PULL event, it would be easier for a system overseer to know that there should be no corresponding SEND event
> >
> > Currently send/receive works well when nifi 1 does a PostHTTP and nifi 2 does a ListenHTTP, but not in the case above.
> >
> > The ERROR case you mention is a nice point as well, although not my specific issue at the moment.
> > Thanks,
> > Nissim
> >
> > On Monday, October 28, 2019, 11:52:57 PM EDT, Adam Taft <a...@adamtaft.com> wrote:
> >
> > > But a flowfile that was PULLed by the second nifi (from the first nifi) will not necessarily have any provenance event generated by the first nifi.
> >
> > Isn't this the fault of the first NiFi for failing to emit a SEND event in response to the second NiFi's request? In this scenario, shouldn't the send/receive pair be:
> > NiFi-1 [SEND] :: NiFi-2 [RECEIVE]?
> >
> > What you describe is an odd use case for NiFi. NiFi is usually not in the business of acting as a file server daemon in order to "send" flowfiles to other systems. As you mention, HandleHttpResponse may be a lone wolf example processor which generates a SEND event whose input originates from a "listener". [1] The other ListenXYZ processors generally issue RECEIVE events because they are receiving bytes, not generating them.
> >
> > Are there other processors in question? Something custom?
> > Or is this related to site-to-site transfers?
> >
> > I still kind of question the motive of a provenance event pair that is trying to establish "who called who first". Honestly, I'm just trying to understand the use case where a matching SEND/RECEIVE pair doesn't give you what you need.
> >
> > The only thing I could see would be a processor that asks for data, but then doesn't receive it due to some error condition. In this case, adding some sort of ERROR event might be useful: "I attempted to retrieve data from ${uri}, but the transfer failed because of ${error condition}". That way, GetXYZ processors could report an error to provenance instead of as a bulletin.
> >
> > If the problem is related to a processor or the framework itself not generating an event, can we just fix that function to emit SEND in the scenario that you describe? Changing the provenance model itself (beyond possibly adding an ERROR event) feels like it would be the last scenario to consider.
> >
> > Thanks,
> > Adam
> >
> > [1]
> > https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/HandleHttpResponse.java#L191
> >
> > On Mon, Oct 28, 2019 at 4:47 PM Nissim Shiman <nshi...@yahoo.com.invalid> wrote:
> >
> > > Adam,
> > > I believe there is a need for more detailed ProvenanceEvents.
> > > A use case would be a customer that is trying to track data passed between two NiFi instances and trying to match up SENDs and RECEIVEs.
> > >
> > > So a flowfile that has a SEND event on the first nifi should have a RECEIVE event on the second nifi.
> > > But a flowfile that was PULLed by the second nifi (from the first nifi) will not necessarily have any provenance event generated by the first nifi.
> > >
> > > (I realize that FETCH is already a "reserved word" in the current ProvenanceEvents setup, so I was hoping PULL could be used instead.)
> > > There is another Provenance Event, ACKNOWLEDGE, which would also occasionally fit this model (an example would be the HandleHttpResponse processor, which could emit this instead of SEND when responding to an HTTP request).
> > > This being said, you make an excellent point when you said
> > > "However even more important to realize, this change would affect many other downstream consumers of provenance data which aren't necessarily in the stock NiFi distribution."
> > > Thanks,
> > > Nissim
> > >
> > > On Friday, October 11, 2019, 11:30:19 AM EDT, Nissim Shiman <nshi...@yahoo.com.invalid> wrote:
> > >
> > > Adam,
> > > "Yes" to your first question and the four processor examples you listed.
> > >
> > > I will need to get back to you regarding your other points.
> > >
> > > Thanks,
> > > Nissim
> > >
> > > On Thursday, October 10, 2019, 7:05:57 PM EDT, Adam Taft <a...@adamtaft.com> wrote:
> > >
> > > Nissim,
> > >
> > > Just to be clear, you are trying to distinguish between processors which are actively "pulling" data (GetXYZ) vs. processors which just "listen" for data (ListenXYZ)? Is that your basic vision?
> > >
> > > GetFile => PULL
> > > GetHTTP => PULL
> > > ListenHTTP => RECEIVE
> > > ListenTCP => RECEIVE
> > >
> > > Could you clarify what advantages this would have in terms of data provenance? What would you use this new event type for specifically? What are you missing now? Do you have a use case that needs this, or are you just generally trying to round out the provenance event types for the sake of completeness? I honestly don't know a use case where you care whether you polled for the data or listened for it.
> > > The provenance model today just cares that you received the data, not so much how you received it.
> > >
> > > You're right that this proposal will affect many processors and the internal visualization tools, etc. However even more important to realize, this change would affect many other downstream consumers of provenance data which aren't necessarily in the stock NiFi distribution. For example, any third-party/custom ReportingTask that handles provenance data would need to be updated with this change. There's probably a need for a strong vision to help demonstrate the value of this vs. the cost of the cascading effects related to this change.
> > >
> > > Thanks,
> > > Adam
> > >
> > > On Thu, Oct 10, 2019 at 4:02 PM Nissim Shiman <nshi...@yahoo.com.invalid> wrote:
> > >
> > > > Hello Team,
> > > >
> > > > The ProvenanceEventType class does a good job capturing possible events, but the PULL event doesn't seem to fall nicely into any of the existing types.
> > > >
> > > > https://gitbox.apache.org/repos/asf?p=nifi.git;a=blob;f=nifi-api/src/main/java/org/apache/nifi/provenance/ProvenanceEventType.java
> > > > RECEIVE is the closest, but RECEIVE is passive and doesn't capture the active action of a PULL.
> > > >
> > > > And... maybe it would fall into FETCH, but FETCH is more focused on the contents of an existing flow file being overwritten.
> > > >
> > > > What does the community think about a new PULL event type, or using FETCH for PULL and having what FETCH does now be a new event such as REUSE?
> > > >
> > > > NOTE: a new PULL event would have a cascading effect: many processors that currently emit RECEIVE would be modified to emit PULL (e.g. GetFile would no longer emit a RECEIVE, but rather a PULL), but it would more accurately capture the event.
> > > >
> > > > Thanks,
> > > > Nissim Shiman
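One rough way to picture the RECEIVE / PULL / QUERY split proposed at the top of this message is a small Java enum plus a lookup of the processor examples named in the thread. This is only a sketch: `ProposedAcquisitionTypes`, `Mode`, `modeFor` and `isActive` are hypothetical names, not part of the NiFi API, and the mapping just restates the examples from the discussion (GetFile and GetHTTP as PULL, ListenHTTP and ListenTCP as RECEIVE, JsonQueryElasticsearch and GetMongo as QUERY).

```java
import java.util.Map;
import java.util.Optional;

class ProposedAcquisitionTypes {

    /** Hypothetical three-way split proposed in this thread; not part of the NiFi API. */
    enum Mode {
        /** Passively acquire data in a hand-off situation (e.g. a Kafka subscription). */
        RECEIVE,
        /** Actively seek out and fetch a targeted item (e.g. GetHTTP). */
        PULL,
        /** Go looking for data and take whatever matches a search (e.g. GetMongo). */
        QUERY
    }

    // Processor examples named in the thread; the mapping is illustrative, not authoritative.
    private static final Map<String, Mode> EXAMPLES = Map.of(
            "ListenHTTP", Mode.RECEIVE,
            "ListenTCP", Mode.RECEIVE,
            "GetFile", Mode.PULL,
            "GetHTTP", Mode.PULL,
            "JsonQueryElasticsearch", Mode.QUERY,
            "GetMongo", Mode.QUERY);

    /** Looks up the proposed mode for a processor type, if it is one of the examples above. */
    static Optional<Mode> modeFor(String processorType) {
        return Optional.ofNullable(EXAMPLES.get(processorType));
    }

    /** PULL and QUERY are both initiated by NiFi itself; RECEIVE is a hand-off. */
    static boolean isActive(Mode mode) {
        return mode != Mode.RECEIVE;
    }

    public static void main(String[] args) {
        for (String type : new String[] {"ListenTCP", "GetHTTP", "GetMongo", "PutFile"}) {
            String mode = modeFor(type).map(Mode::name).orElse("not one of the examples");
            System.out.println(type + " -> " + mode);
        }
    }
}
```

Running `main` prints one line per processor type; `PutFile` is included only to show that types outside the examples come back empty rather than being forced into a category.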
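Nissim's use case (matching SENDs on one NiFi instance against RECEIVEs on another) can be illustrated with a minimal reconciliation sketch. It assumes, purely for illustration, that each event has been reduced to a hypothetical `Event` with a type and a shared `correlationKey` (for example a filename visible to both sides); real provenance events would need some agreed-upon way to correlate the two instances, which the thread does not define. What it shows is the point argued in the thread: a RECEIVE with no matching SEND is flagged as a gap, while a PULL, such as a GetSFTP from the first instance's filesystem, is not expected to have a counterpart.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class SendReceiveReconciler {

    /** Hypothetical event types: the existing SEND and RECEIVE plus the proposed PULL. */
    enum Type { SEND, RECEIVE, PULL }

    /** Hypothetical, simplified event; correlationKey is an assumed identifier shared by both sides. */
    static final class Event {
        final Type type;
        final String correlationKey;

        Event(Type type, String correlationKey) {
            this.type = type;
            this.correlationKey = correlationKey;
        }
    }

    /**
     * Returns the keys of downstream RECEIVE events that have no matching upstream SEND.
     * PULL events are skipped because the upstream side is not expected to record anything.
     */
    static List<String> unmatchedReceives(List<Event> upstream, List<Event> downstream) {
        Set<String> sent = new HashSet<>();
        for (Event e : upstream) {
            if (e.type == Type.SEND) {
                sent.add(e.correlationKey);
            }
        }
        List<String> gaps = new ArrayList<>();
        for (Event e : downstream) {
            if (e.type == Type.RECEIVE && !sent.contains(e.correlationKey)) {
                gaps.add(e.correlationKey);
            }
        }
        return gaps;
    }

    public static void main(String[] args) {
        List<Event> nifi1 = List.of(new Event(Type.SEND, "a.csv"));
        List<Event> nifi2 = List.of(
                new Event(Type.RECEIVE, "a.csv"),   // PostHTTP -> ListenHTTP: matched
                new Event(Type.PULL, "b.csv"),      // GetSFTP from nifi1's filesystem: no SEND expected
                new Event(Type.RECEIVE, "c.csv"));  // a genuine gap
        System.out.println(unmatchedReceives(nifi1, nifi2)); // prints [c.csv]
    }
}
```

Under today's model the `b.csv` acquisition would be recorded as a RECEIVE and would show up as a false gap, which is the ambiguity the PULL proposal is meant to remove.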
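Joe's suggestion, as Nissim restates it, keeps the list of event types unchanged and adds a new receive() overload carrying an active/passive flag. The sketch shows that shape on a deliberately simplified, hypothetical `SimpleReporter` interface (not NiFi's real `ProvenanceReporter`, whose receive() methods take a `FlowFile`); the `acquisition` metadata key is likewise invented for illustration. Declaring the new overload as a Java default method is one assumed way to keep existing implementations compiling, since they inherit a fallback that simply drops the flag.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ReceiveMetadataSketch {

    /** Hypothetical, heavily simplified reporter; NOT org.apache.nifi.provenance.ProvenanceReporter. */
    interface SimpleReporter {
        /** Existing-style call: unchanged, so current callers keep compiling and behaving as before. */
        void receive(String transitUri);

        /** New opt-in overload; the default lets older implementations ignore the flag. */
        default void receive(String transitUri, boolean active) {
            receive(transitUri);
        }
    }

    /** An updated implementation that records the flag as metadata on an ordinary RECEIVE event. */
    static class RecordingReporter implements SimpleReporter {
        final List<Map<String, String>> events = new ArrayList<>();

        @Override
        public void receive(String transitUri) {
            events.add(newReceive(transitUri)); // no acquisition metadata: mode unknown
        }

        @Override
        public void receive(String transitUri, boolean active) {
            Map<String, String> event = newReceive(transitUri);
            event.put("acquisition", active ? "active" : "passive"); // hypothetical metadata key
            events.add(event);
        }

        private static Map<String, String> newReceive(String transitUri) {
            Map<String, String> event = new HashMap<>();
            event.put("type", "RECEIVE"); // the event type itself stays the same
            event.put("transitUri", transitUri);
            return event;
        }
    }

    public static void main(String[] args) {
        RecordingReporter reporter = new RecordingReporter();
        reporter.receive("https://example.com/data");        // legacy caller, untouched
        reporter.receive("https://example.com/data", true);  // e.g. an updated GetHTTP (active)
        reporter.receive("https://example.com/data", false); // e.g. an updated ListenHTTP (passive)
        reporter.events.forEach(System.out::println);
    }
}
```

Legacy callers produce RECEIVE events with no `acquisition` entry, so a downstream consumer such as a custom ReportingTask could treat a missing value as "unknown" instead of breaking, which matches the opt-in property Joe describes.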