Re: PULL ProvenanceEvent

2019-11-06 Thread Adam Taft
+1 Joe - this is a good compromise to keep the original API undisturbed.


On Wed, Nov 6, 2019 at 11:05 AM Joe Witt wrote:

> Nissim
>
> Notionally I am saying that session.getProvenanceReporter().receive(...)
> should have an option to call
> session.getProvenanceReporter().receive(...,ACTIVE|PASSIVE) and if not
> specified it would be UNSPECIFIED.
>
> I don't think this needs to be on the flowfile attribute - it would go
> straight to the provenance event itself which is generated by the session.
>
> Thanks
> Joe
>

Re: PULL ProvenanceEvent

2019-11-06 Thread Nissim Shiman
 Joe,

Very nice...


Thanks!
Nissim
On Wednesday, November 6, 2019, 1:05:09 PM EST, Joe Witt wrote:
 
 Nissim

Notionally I am saying that session.getProvenanceReporter().receive(...)
should have an option to call
session.getProvenanceReporter().receive(...,ACTIVE|PASSIVE) and if not
specified it would be UNSPECIFIED.

I don't think this needs to be on the flowfile attribute - it would go
straight to the provenance event itself which is generated by the session.

Thanks
Joe


Re: PULL ProvenanceEvent

2019-11-06 Thread Joe Witt
Nissim

Notionally, I am saying that session.getProvenanceReporter().receive(...)
should have an overload that can be called as
session.getProvenanceReporter().receive(..., ACTIVE|PASSIVE); if the mode
is not specified, it would be UNSPECIFIED.

I don't think this needs to be a flowfile attribute - it would go straight
onto the provenance event itself, which is generated by the session.
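
To make the idea above concrete, here is a minimal Java sketch of what such an opt-in overload might look like. Everything in it is hypothetical: ReceiveModeSketch, ReceiveMode, Event and Reporter are stand-ins invented for illustration, not the real NiFi ProvenanceReporter API, and the real receive() calls take a flowfile as well as a transit URI. The only point is that the existing call keeps its signature and simply defaults to UNSPECIFIED.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins only; none of these types are the real NiFi API.
final class ReceiveModeSketch {

    // How the data was acquired. UNSPECIFIED is what legacy callers get.
    enum ReceiveMode { ACTIVE, PASSIVE, UNSPECIFIED }

    // Pared-down provenance event carrying the extra field.
    static final class Event {
        final String transitUri;
        final ReceiveMode mode;

        Event(String transitUri, ReceiveMode mode) {
            this.transitUri = transitUri;
            this.mode = mode;
        }

        @Override
        public String toString() {
            return "RECEIVE " + transitUri + " mode=" + mode;
        }
    }

    // Pared-down reporter that records events in memory.
    static final class Reporter {
        private final List<Event> events = new ArrayList<>();

        // Existing-style call: signature unchanged, mode defaults to UNSPECIFIED.
        void receive(String transitUri) {
            receive(transitUri, ReceiveMode.UNSPECIFIED);
        }

        // New opt-in overload: the mode is stored on the event, not on the flowfile.
        void receive(String transitUri, ReceiveMode mode) {
            events.add(new Event(transitUri, mode));
        }

        List<Event> events() {
            return List.copyOf(events);
        }
    }

    public static void main(String[] args) {
        Reporter reporter = new Reporter();
        reporter.receive("sftp://host.example/in/a.txt");                     // legacy caller
        reporter.receive("sftp://host.example/in/b.txt", ReceiveMode.ACTIVE); // opted in
        reporter.events().forEach(System.out::println);
    }
}
```

Under these assumptions, existing third-party processors would compile and behave as before, and only processors that care about the distinction would call the two-argument form.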

Thanks
Joe

On Wed, Nov 6, 2019 at 11:32 AM Nissim Shiman 
wrote:

>  Joe,
>
> Just to verify what you mean,
>
> You are saying that the line:
> flowfile = session.putAttribute(flowfile, "receiveType", "active")
>
> could be added before
> session.getProvenanceReporter().receive(...)
>
>
> to indicate a PULL.  Is this correct?
>
> Thanks,
>
> Nissim

Re: PULL ProvenanceEvent

2019-11-06 Thread Nissim Shiman
 Joe,

Just to verify what you mean,

You are saying that the line:
flowfile = session.putAttribute(flowfile, "receiveType", "active")

could be added before
session.getProvenanceReporter().receive(...)


to indicate a PULL.  Is this correct?

Thanks,

Nissim






On Monday, November 4, 2019, 12:50:11 PM EST, Nissim Shiman wrote:

Having an attribute that indicates passive/active/query for RECEIVE and
FETCH will work,

but NiFi attributes are stateful (i.e. they will still be on the flowfile
as metadata a couple of processor steps down the flow).

Maybe an option is to expand the API for RECEIVE and FETCH with a new
parameter for passive/active/query?
(i.e. the existing method signatures, such as [1], will remain the same,
but new ones will be added to handle this new parameter)

[1]
https://github.com/apache/nifi/blob/master/nifi-api/src/main/java/org/apache/nifi/provenance/ProvenanceReporter.java#L46



Re: PULL ProvenanceEvent

2019-11-04 Thread Nissim Shiman
Having an attribute that indicates passive/active/query for RECEIVE and
FETCH will work,

but NiFi attributes are stateful (i.e. they will still be on the flowfile
as metadata a couple of processor steps down the flow).

Maybe an option is to expand the API for RECEIVE and FETCH with a new
parameter for passive/active/query?
(i.e. the existing method signatures, such as [1], will remain the same,
but new ones will be added to handle this new parameter)

[1] 
https://github.com/apache/nifi/blob/master/nifi-api/src/main/java/org/apache/nifi/provenance/ProvenanceReporter.java#L46
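
The concern about attributes lingering on the flowfile can be illustrated with a small self-contained Java sketch. It does not use the NiFi API: FlowFile here is a hypothetical immutable stand-in with an attribute map, "receiveType" is the attribute name floated elsewhere in this thread, and downstreamStep and receiveEvent are invented for illustration. It only shows that a tag put on the flowfile is still there after later steps, whereas a value passed as a parameter can live on the event alone.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins only; this is not the NiFi FlowFile or session API.
final class AttributeVsEventSketch {

    // Immutable flowfile stand-in: every change returns a new copy.
    static final class FlowFile {
        final Map<String, String> attributes;

        FlowFile(Map<String, String> attributes) {
            this.attributes = Map.copyOf(attributes);
        }

        FlowFile with(String key, String value) {
            Map<String, String> copy = new HashMap<>(attributes);
            copy.put(key, value);
            return new FlowFile(copy);
        }
    }

    // Option A: record the mode as an attribute on the flowfile itself.
    static FlowFile ingestWithAttribute(FlowFile flowFile) {
        return flowFile.with("receiveType", "active");
    }

    // A later, unrelated processor step that only adds its own attribute.
    static FlowFile downstreamStep(FlowFile flowFile) {
        return flowFile.with("filename", "report.csv");
    }

    // Option B: the mode is a parameter and ends up only in the event text.
    static String receiveEvent(String mode) {
        return "RECEIVE[mode=" + mode + "]";
    }

    public static void main(String[] args) {
        FlowFile tagged = downstreamStep(ingestWithAttribute(new FlowFile(Map.of())));
        // The tag added at ingest is still present after the downstream step.
        System.out.println("attribute still present: "
                + tagged.attributes.containsKey("receiveType"));

        FlowFile untagged = downstreamStep(new FlowFile(Map.of()));
        System.out.println(receiveEvent("ACTIVE") + ", flowfile attributes: "
                + untagged.attributes.keySet());
    }
}
```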


On Thursday, October 31, 2019, 10:10:40 PM EDT, Joe Witt wrote:

These distinctions may be meaningful.  Adding them as an attribute conveys
the meaning without introducing complexity for the majority case, in which
the distinction isn't key.

thanks


Re: PULL ProvenanceEvent

2019-10-31 Thread Joe Witt
These distinctions may be meaningful.  Adding them as an attribute conveys
the meaning without introducing complexity for the majority case, in which
the distinction isn't key.

thanks

On Thu, Oct 31, 2019 at 4:05 PM Nissim Shiman 
wrote:

>  Mike,
> I like the QUERY type as well.  Basically a more refined PULL.  Very nice.
>
>
> Part of the challenge of adding PULL as a type is that there are currently
> two flavors of RECEIVEs.
> RECEIVE and FETCH [1]
>
> So any addition of a PULL would need a second flavor of PULL to match the
> case where a flowfile's contents are being overwritten as well (i.e. as
> FETCH is currently doing)
>
>
> [1]
> https://github.com/apache/nifi/blob/master/nifi-api/src/main/java/org/apache/nifi/provenance/ProvenanceEventType.java#L42
>
>
> Thanks,
> Nissim
>
>
> > >On Monday, October 28, 2019, 11:52:57 PM EDT, Adam Taft <
> > > a...@adamtaft.com> wrote:
> > >
> > >  > But a flowfile that was PULLed by the second nifi (from the first
> > nifi)
> > > will not necessarily have any provenance event generated by the first
> > nifi.
> > >
> > > Isn't this the fault of the first NiFi to fail to emit a SEND event in
> > > response to the second NiFi's request?  In this scenario, shouldn't the
> > > send/receive pair be:
> > > NiFi-1 [SEND] :: NIFI-2 [RECEIVE]?
> > >
> > > What you describe is an odd use case for NiFi.  NiFi is usually not in
> > the
> > > business of acting as a file server daemon in order to "send" flowfiles
> > to
> > > other systems.  As you mention, HandleHttpResponse may be a lone wolf
> > > example processor which generates a SEND event whose input originates
> > from
> > > a "listener". [1]  The other ListenXYZ processors generally issue
> RECEIVE
> > > events because they are receiving bytes, not generating them.
> > >
> > > Are there other processors in question? Something cus

Re: PULL ProvenanceEvent

2019-10-31 Thread Nissim Shiman
 Mike,
I like the QUERY type as well.  Basically a more refined PULL.  Very nice.


Part of the challenge of adding PULL as a type is that there are currently two 
flavors of RECEIVEs.  
RECEIVE and FETCH [1]

So any addition of a PULL would need a second flavor of PULL to match the case 
where a flowfile's contents are being overwritten as well (i.e. as FETCH is 
currently doing)


[1] 
https://github.com/apache/nifi/blob/master/nifi-api/src/main/java/org/apache/nifi/provenance/ProvenanceEventType.java#L42


Thanks,
Nissim


On Wednesday, October 30, 2019, 6:41:04 PM EDT, Mike Thomsen wrote:

I like the idea of creating PULL as a type. In fact, I'd propose that there
are three scenarios here:

RECEIVE - Passively acquire in a sort of hand-off situation. Ex: Kafka
subscription
PULL - Direct operations to seek out and fetch something in a targeted
fashion. Ex: GetHTTP
QUERY - Go looking for the data and take what matches your search. Ex:
JsonQueryElasticsearch, GetMongo, any SQL query processor, etc.
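
As a rough illustration of the three scenarios listed above, the Java sketch below models them as an enum and maps the examples from the list onto it. AcquisitionKindSketch, AcquisitionKind, EXAMPLES and isActive are hypothetical names, not part of ProvenanceEventType or any other NiFi class, and treating QUERY as "active" alongside PULL is an assumption based on the description of QUERY elsewhere in this thread as a more refined PULL.

```java
import java.util.Map;

// Hypothetical sketch; these names do not exist in the NiFi API.
final class AcquisitionKindSketch {

    // The three proposed ways data can enter a flow.
    enum AcquisitionKind { RECEIVE, PULL, QUERY }

    // The examples given in the list above, keyed by source.
    static final Map<String, AcquisitionKind> EXAMPLES = Map.of(
            "Kafka subscription", AcquisitionKind.RECEIVE,
            "GetHTTP", AcquisitionKind.PULL,
            "JsonQueryElasticsearch", AcquisitionKind.QUERY,
            "GetMongo", AcquisitionKind.QUERY);

    // Assumption: PULL and QUERY are both initiated by the flow itself,
    // so both count as "active"; RECEIVE is the passive hand-off case.
    static boolean isActive(AcquisitionKind kind) {
        return kind != AcquisitionKind.RECEIVE;
    }

    public static void main(String[] args) {
        for (AcquisitionKind kind : AcquisitionKind.values()) {
            System.out.println(kind + " active=" + isActive(kind));
        }
    }
}
```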



On Wed, Oct 30, 2019 at 1:31 PM Nissim Shiman 
wrote:

>  Joe,
>
>
> It is hard to say how much value transit URI would bring to clarify a
> RECEIVE.
> For example a RECEIVE with transit URI of https: could be either a
> GetHTTP (i.e. active) or ListenHTTP (i.e. passive)
>
> but your idea of "a metadata item specifying active vs passive" is a very
> clever way to make this work with minimal disruption.
>
> My understanding of this is that the current receive() calls in
> ProvenanceReporter [1] will remain the same, but new ones will be added
> with a boolean parameter reflecting whether the receive is active or
> passive. This will allow the current list of Provenance Events [2] to
> remain the same, so third-party/custom processors can continue working as is.
>
> Does this sound like what you are thinking?
>
>
> [1]
> https://github.com/apache/nifi/blob/master/nifi-api/src/main/java/org/apache/nifi/provenance/ProvenanceReporter.java#L46
>
> [2]
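> https://github.com/apache/nifi/blob/master/nifi-api/src/main/java/org/apache/nifi/provenance/ProvenanceEventType.java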
>
>
> Thanks,
>
> Nissim
>    On Tuesday, October 29, 2019, 12:47:40 PM EDT, Joe Witt <
> joe.w...@gmail.com> wrote:
>
>  Nissim
>
> I like the idea to introduce a more refined type of event for how data is
> brought into nifi (active - PULL, passive - RECEIVE).
>
> That said it might be sufficient to simply have this distinction be on the
> "RECEIVE" event as a metadata item specifying active vs passive.  The
> protocol utilized as mentioned in the transport URI should clarify this
> though.
>
> In short - i think there is a way here that is all opt-in for existing
> users and components.
>
> Thanks
>
> On Tue, Oct 29, 2019 at 9:41 AM Nissim Shiman 
> wrote:
>
> >  Adam,
> > good points...
> > I missed a step in explaining the use case where Provenance Events is
> > incomplete...
> > Where the second nifi does a GetSFTP from the *filesystem* that the first
> > nifi is located on
> > So the second nifi currently sends a RECEIVE event, but there is no
> > corresponding SEND event from the first nifi (nor should there be)
> > If the second nifi sent a PULL event, it would be easier for a system
> > overseer to know that there should be no corresponding SEND event
> >
> > Currently send/receive works well when nifi 1 does a PostHTTP and nifi 2
> > does a ListenHTTP, but not in the case above.
> >
> > The ERROR case you mention is a nice point as well, although not my
> > specific issue at the moment.
> > Thanks,
> > Nissim
> >
> >
> >
> >
> >
> >    On Monday, October 28, 2019, 11:52:57 PM EDT, Adam Taft <
> > a...@adamtaft.com> wrote:
> >
> >  > But a flowfile that was PULLed by the second nifi (from the first
> nifi)
> > will not necessarily have any provenance event generated by the first
> nifi.
> >
> > Isn't this the fault of the first NiFi to fail to emit a SEND event in
> > response to the second NiFi's request?  In this scenario, shouldn't the
> > send/receive pair be:
> > NiFi-1 [SEND] :: NIFI-2 [RECEIVE]?
> >
> > What you describe is an odd use case for NiFi.  NiFi is usually not in
> the
> > business of acting as a file server daemon in order to "send" flowfiles
> to
> > other systems.  As you mention, HandleHttpResponse may be a lone wolf
> > example processor which generates a SEND event whose input originates
> from
> > a "listener". [1]  The other ListenXYZ processors generally issue RECEIVE
> > events because they are receiving bytes, not generating them.
> >
> > Are there other processors in question? Something custom? Or is this
> > related to site-to-site transfers?
> >
> > I still kind of question the motive of a provenance event pair that is
> > trying to establish "who called who first".  Honestly just trying to
> > understand the use case where a matching SEND/RECEIVE pair doesn't give
> you
> > what you need.
> >
> > The only thing I could see would be a processor that asks for data, but
> > then doesn't receive it due to some error condition.  In this case,
> adding
> > some sort of ERROR event might be useful.  "I attempted 

Re: PULL ProvenanceEvent

2019-10-30 Thread Mike Thomsen
I like the idea of creating PULL as a type. In fact, I'd propose that there
are three scenarios here:

RECEIVE - Passively acquire in a sort of hand-off situation. Ex: Kafka
subscription
PULL - Direct operations to seek out and fetch something in a targeted
fashion. Ex. GetHttp
QUERY - Go looking for the data and take what matches your search. Ex.
JsonQueryElasticsearch, GetMongo, any SQL query processor, etc.
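
The three scenarios above could be written down as a small lookup. This is only a sketch: `AcquisitionStyle`, `AcquisitionStyles`, and `styleOf` are hypothetical names used for illustration and are not part of the NiFi API, and the processor-to-style table is just one reading of the examples in this thread. Unknown processors fall back to RECEIVE on the assumption that this mirrors what such processors report today.

```java
import java.util.Map;

// Hypothetical names: nothing in this block exists in the NiFi API.
enum AcquisitionStyle {
    RECEIVE, // passive hand-off, e.g. a Kafka subscription or ListenHTTP
    PULL,    // targeted fetch of a known resource, e.g. GetHTTP
    QUERY    // search and take whatever matches, e.g. GetMongo
}

final class AcquisitionStyles {
    // Illustrative table built from the processors named in the thread.
    private static final Map<String, AcquisitionStyle> BY_PROCESSOR = Map.of(
        "ListenHTTP", AcquisitionStyle.RECEIVE,
        "ConsumeKafka", AcquisitionStyle.RECEIVE,
        "GetHTTP", AcquisitionStyle.PULL,
        "GetSFTP", AcquisitionStyle.PULL,
        "GetMongo", AcquisitionStyle.QUERY,
        "JsonQueryElasticsearch", AcquisitionStyle.QUERY);

    // Assumption: anything not listed keeps reporting RECEIVE.
    static AcquisitionStyle styleOf(String processorType) {
        return BY_PROCESSOR.getOrDefault(processorType, AcquisitionStyle.RECEIVE);
    }
}
```

Keeping RECEIVE as the fallback means a processor that never opts in is classified exactly as before.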



On Wed, Oct 30, 2019 at 1:31 PM Nissim Shiman 
wrote:

>  Joe,
>
>
> It is hard to say how much value transit URI would bring to clarify a
> RECEIVE.
> For example a RECEIVE with transit URI of https: could be either a
> GetHTTP (i.e. active) or ListenHTTP (i.e. passive)
>
> but your idea of "a metadata item specifying active vs passive" is a very
> clever way to make this work with minimal disruptions.
>
> My understanding of this is that the current receive() calls in
> ProvenanceReporter [1] will remain the same, but new ones will be added
> with a boolean parameter reflecting if the receive is active or passive.
> This will allow the current list of Provenance Events [2] to remain the
> same.  So third party/custom processors can continue working as is
>
> Does this sound like what you are thinking?
>
>
> [1]
> https://github.com/apache/nifi/blob/master/nifi-api/src/main/java/org/apache/nifi/provenance/ProvenanceReporter.java#L46
>
> [2]
> https://github.com/apache/nifi/blob/master/nifi-api/src/main/java/org/apache/nifi/provenance/ProvenanceEventType.java
>
>
> Thanks,
>
> Nissim
> On Tuesday, October 29, 2019, 12:47:40 PM EDT, Joe Witt <
> joe.w...@gmail.com> wrote:
>
>  Nissim
>
> I like the idea to introduce a more refined type of event for how data is
> brought into nifi (active - PULL, passive - RECEIVE).
>
> That said it might be sufficient to simply have this distinction be on the
> "RECEIVE" event as a metadata item specifying active vs passive.  The
> protocol utilized as mentioned in the transport URI should clarify this
> though.
>
> In short - i think there is a way here that is all opt-in for existing
> users and components.
>
> Thanks
>
> On Tue, Oct 29, 2019 at 9:41 AM Nissim Shiman 
> wrote:
>
> >  Adam,
> > good points...
> > I missed a step in explaining the use case where Provenance Events is
> > incomplete...
> > Where the second nifi does a GetSFTP from the *filesystem* that the first
> > nifi is located on
> > So the second nifi currently sends a RECEIVE event, but there is no
> > corresponding SEND event from the first nifi (nor should there be)
> > If the second nifi sent a PULL event, it would be easier for a system
> > overseer to know that there should be no corresponding SEND event
> >
> > Currently send/receive works well when nifi 1 does a PostHTTP and nifi 2
> > does a ListenHTTP, but not in the case above.
> >
> > The ERROR case you mention is a nice point as well, although not my
> > specific issue at the moment.
> > Thanks,
> > Nissim
> >
> >
> >
> >
> >
> >On Monday, October 28, 2019, 11:52:57 PM EDT, Adam Taft <
> > a...@adamtaft.com> wrote:
> >
> >  > But a flowfile that was PULLed by the second nifi (from the first
> nifi)
> > will not necessarily have any provenance event generated by the first
> nifi.
> >
> > Isn't this the fault of the first NiFi to fail to emit a SEND event in
> > response to the second NiFi's request?  In this scenario, shouldn't the
> > send/receive pair be:
> > NiFi-1 [SEND] :: NIFI-2 [RECEIVE]?
> >
> > What you describe is an odd use case for NiFi.  NiFi is usually not in
> the
> > business of acting as a file server daemon in order to "send" flowfiles
> to
> > other systems.  As you mention, HandleHttpResponse may be a lone wolf
> > example processor which generates a SEND event whose input originates
> from
> > a "listener". [1]  The other ListenXYZ processors generally issue RECEIVE
> > events because they are receiving bytes, not generating them.
> >
> > Are there other processors in question? Something custom? Or is this
> > related to site-to-site transfers?
> >
> > I still kind of question the motive of a provenance event pair that is
> > trying to establish "who called who first".  Honestly just trying to
> > understand the use case where a matching SEND/RECEIVE pair doesn't give
> you
> > what you need.
> >
> > The only thing I could see would be a processor that asks for data, but
> > then doesn't receive it due to some error condition.  In this case,
> adding
> > some sort of ERROR event might be useful.  "I attempted to retrieve data
> > from ${uri}, but the transfer failed because of ${error condition}".
> That
> > way, GetXYZ processors could report an error to provenance instead of as
> a
> > bulletin.
> >
> > If the problem is related to a processor or the framework itself not
> > generating an event, can we just fix that function to emit SEND in the
> > scenario that you describe?  Changing the provenance model itself (beyond
> > possibly adding an ERROR event) feels like it would be the last scenario
> to

Re: PULL ProvenanceEvent

2019-10-30 Thread Nissim Shiman
 Joe, 


It is hard to say how much value transit URI would bring to clarify a RECEIVE.
For example a RECEIVE with transit URI of https: could be either a 
GetHTTP (i.e. active) or ListenHTTP (i.e. passive)

but your idea of "a metadata item specifying active vs passive" is a very 
clever way to make this work with minimal disruptions.

My understanding of this is that the current receive() calls in
ProvenanceReporter [1] will remain the same, but new ones will be added with a
boolean parameter reflecting if the receive is active or passive.
This will allow the current list of Provenance Events [2] to remain the same,
so third party/custom processors can continue working as is.

Does this sound like what you are thinking?


[1] 
https://github.com/apache/nifi/blob/master/nifi-api/src/main/java/org/apache/nifi/provenance/ProvenanceReporter.java#L46

[2] 
https://github.com/apache/nifi/blob/master/nifi-api/src/main/java/org/apache/nifi/provenance/ProvenanceEventType.java
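
A rough sketch of the overload idea described above, under stated assumptions: `SketchReporter` and `ReceiveEvent` are simplified stand-ins invented for illustration, not the real ProvenanceReporter interface, and the flowfile is reduced to a plain string id. A boolean cannot express "not specified", so the sketch treats the existing two-argument call as passive; that default is an assumption, not something settled in this thread.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-in for a provenance reporter; NOT the real NiFi interface.
final class SketchReporter {
    // One recorded RECEIVE event; 'active' is the hypothetical new metadata item.
    static final class ReceiveEvent {
        final String flowFileId;
        final String transitUri;
        final boolean active;

        ReceiveEvent(String flowFileId, String transitUri, boolean active) {
            this.flowFileId = flowFileId;
            this.transitUri = transitUri;
            this.active = active;
        }
    }

    private final List<ReceiveEvent> events = new ArrayList<>();

    // Existing-style signature stays as is, so current callers keep compiling.
    // Assumption: an unqualified receive is recorded as passive.
    void receive(String flowFileId, String transitUri) {
        receive(flowFileId, transitUri, false);
    }

    // Added overload: the caller states whether it actively pulled the data.
    void receive(String flowFileId, String transitUri, boolean active) {
        events.add(new ReceiveEvent(flowFileId, transitUri, active));
    }

    // Read-only snapshot of what has been recorded so far.
    List<ReceiveEvent> events() {
        return List.copyOf(events);
    }
}
```

With this shape the event type stays RECEIVE in both cases, so the list of provenance event types does not change and only callers that opt in to the new overload need to be touched.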


Thanks,

Nissim
On Tuesday, October 29, 2019, 12:47:40 PM EDT, Joe Witt 
 wrote:  
 
 Nissim

I like the idea to introduce a more refined type of event for how data is
brought into nifi (active - PULL, passive - RECEIVE).

That said it might be sufficient to simply have this distinction be on the
"RECEIVE" event as a metadata item specifying active vs passive.  The
protocol utilized as mentioned in the transport URI should clarify this
though.

In short - i think there is a way here that is all opt-in for existing
users and components.

Thanks

On Tue, Oct 29, 2019 at 9:41 AM Nissim Shiman 
wrote:

>  Adam,
> good points...
> I missed a step in explaining the use case where Provenance Events is
> incomplete...
> Where the second nifi does a GetSFTP from the *filesystem* that the first
> nifi is located on
> So the second nifi currently sends a RECEIVE event, but there is no
> corresponding SEND event from the first nifi (nor should there be)
> If the second nifi sent a PULL event, it would be easier for a system
> overseer to know that there should be no corresponding SEND event
>
> Currently send/receive works well when nifi 1 does a PostHTTP and nifi 2
> does a ListenHTTP, but not in the case above.
>
> The ERROR case you mention is a nice point as well, although not my
> specific issue at the moment.
> Thanks,
> Nissim
>
>
>
>
>
>    On Monday, October 28, 2019, 11:52:57 PM EDT, Adam Taft <
> a...@adamtaft.com> wrote:
>
>  > But a flowfile that was PULLed by the second nifi (from the first nifi)
> will not necessarily have any provenance event generated by the first nifi.
>
> Isn't this the fault of the first NiFi to fail to emit a SEND event in
> response to the second NiFi's request?  In this scenario, shouldn't the
> send/receive pair be:
> NiFi-1 [SEND] :: NIFI-2 [RECEIVE]?
>
> What you describe is an odd use case for NiFi.  NiFi is usually not in the
> business of acting as a file server daemon in order to "send" flowfiles to
> other systems.  As you mention, HandleHttpResponse may be a lone wolf
> example processor which generates a SEND event whose input originates from
> a "listener". [1]  The other ListenXYZ processors generally issue RECEIVE
> events because they are receiving bytes, not generating them.
>
> Are there other processors in question? Something custom? Or is this
> related to site-to-site transfers?
>
> I still kind of question the motive of a provenance event pair that is
> trying to establish "who called who first".  Honestly just trying to
> understand the use case where a matching SEND/RECEIVE pair doesn't give you
> what you need.
>
> The only thing I could see would be a processor that asks for data, but
> then doesn't receive it due to some error condition.  In this case, adding
> some sort of ERROR event might be useful.  "I attempted to retrieve data
> from ${uri}, but the transfer failed because of ${error condition}".  That
> way, GetXYZ processors could report an error to provenance instead of as a
> bulletin.
>
> If the problem is related to a processor or the framework itself not
> generating an event, can we just fix that function to emit SEND in the
> scenario that you describe?  Changing the provenance model itself (beyond
> possibly adding an ERROR event) feels like it would be the last scenario to
> consider.
>
> Thanks,
> Adam
>
> [1]
>
> https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/HandleHttpResponse.java#L191
>
>
>
>
> On Mon, Oct 28, 2019 at 4:47 PM Nissim Shiman 
> wrote:
>
> >  Adam,
> > I believe there is a need for more detailed ProvenanceEvents.
> > A use case would be a customer that is trying to track data passed
> between
> > two nifi's and trying to match up SENDs and RECEIVEs
> >
> > So a flowfile that has a SEND event on the first nifi should have a
> > RECEIVE event on the second nifi.
> > But a flowfile that was PULLed by the second nifi (from the first nifi)
> >

Re: PULL ProvenanceEvent

2019-10-29 Thread Joe Witt
Nissim

I like the idea to introduce a more refined type of event for how data is
brought into nifi (active - PULL, passive - RECEIVE).

That said it might be sufficient to simply have this distinction be on the
"RECEIVE" event as a metadata item specifying active vs passive.  The
protocol utilized as mentioned in the transport URI should clarify this
though.

In short - i think there is a way here that is all opt-in for existing
users and components.

Thanks

On Tue, Oct 29, 2019 at 9:41 AM Nissim Shiman 
wrote:

>  Adam,
> good points...
> I missed a step in explaining the use case where Provenance Events is
> incomplete...
> Where the second nifi does a GetSFTP from the *filesystem* that the first
> nifi is located on
> So the second nifi currently sends a RECEIVE event, but there is no
> corresponding SEND event from the first nifi (nor should there be)
> If the second nifi sent a PULL event, it would be easier for a system
> overseer to know that there should be no corresponding SEND event
>
> Currently send/receive works well when nifi 1 does a PostHTTP and nifi 2
> does a ListenHTTP, but not in the case above.
>
> The ERROR case you mention is a nice point as well, although not my
> specific issue at the moment.
> Thanks,
> Nissim
>
>
>
>
>
> On Monday, October 28, 2019, 11:52:57 PM EDT, Adam Taft <
> a...@adamtaft.com> wrote:
>
>  > But a flowfile that was PULLed by the second nifi (from the first nifi)
> will not necessarily have any provenance event generated by the first nifi.
>
> Isn't this the fault of the first NiFi to fail to emit a SEND event in
> response to the second NiFi's request?  In this scenario, shouldn't the
> send/receive pair be:
> NiFi-1 [SEND] :: NIFI-2 [RECEIVE]?
>
> What you describe is an odd use case for NiFi.  NiFi is usually not in the
> business of acting as a file server daemon in order to "send" flowfiles to
> other systems.  As you mention, HandleHttpResponse may be a lone wolf
> example processor which generates a SEND event whose input originates from
> a "listener". [1]  The other ListenXYZ processors generally issue RECEIVE
> events because they are receiving bytes, not generating them.
>
> Are there other processors in question? Something custom? Or is this
> related to site-to-site transfers?
>
> I still kind of question the motive of a provenance event pair that is
> trying to establish "who called who first".  Honestly just trying to
> understand the use case where a matching SEND/RECEIVE pair doesn't give you
> what you need.
>
> The only thing I could see would be a processor that asks for data, but
> then doesn't receive it due to some error condition.  In this case, adding
> some sort of ERROR event might be useful.  "I attempted to retrieve data
> from ${uri}, but the transfer failed because of ${error condition}".  That
> way, GetXYZ processors could report an error to provenance instead of as a
> bulletin.
>
> If the problem is related to a processor or the framework itself not
> generating an event, can we just fix that function to emit SEND in the
> scenario that you describe?  Changing the provenance model itself (beyond
> possibly adding an ERROR event) feels like it would be the last scenario to
> consider.
>
> Thanks,
> Adam
>
> [1]
>
> https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/HandleHttpResponse.java#L191
>
>
>
>
> On Mon, Oct 28, 2019 at 4:47 PM Nissim Shiman 
> wrote:
>
> >  Adam,
> > I believe there is a need for more detailed ProvenanceEvents.
> > A use case would be a customer that is trying to track data passed
> between
> > two nifi's and trying to match up SENDs and RECEIVEs
> >
> > So a flowfile that has a SEND event on the first nifi should have a
> > RECEIVE event on the second nifi.
> > But a flowfile that was PULLed by the second nifi (from the first nifi)
> > will not necessarily have any provenance event generated by the first
> nifi.
> >
> > (I realize that FETCH is already a "reserved word" in the current
> > ProvenanceEvents setup, so I was hoping PULL could be used instead.)
> > There is another Provenance Event, ACKNOWLEDGE, which would also fit
> > occasionally to this model as well (an example would be
> HandleHttpResponse
> > processor which could send this instead of SEND when responding to a HTTP
> > request)
> > This being said, you make an excellent point when you said
> > "However even more important to realize,
> > this change would affect many other downstream consumers of provenance
> data
> > which aren't necessarily in the stock NiFi distribution."
> > Thanks,
> > Nissim
> >
> >On Friday, October 11, 2019, 11:30:19 AM EDT, Nissim Shiman
> >  wrote:
> >
> >  Adam,
> > "Yes" to your first question and the four processor examples you listed.
> >
> > I will need to get back to you regarding your other points.
> >
> > Thanks,
> > Nissim
> >
> >On Thursday, October 10, 2019, 7:05:57 PM EDT, Adam 

Re: PULL ProvenanceEvent

2019-10-29 Thread Nissim Shiman
 Adam,
good points...
I missed a step in explaining the use case where the Provenance Events are
incomplete: the second nifi does a GetSFTP from the *filesystem* that the
first nifi is located on.
So the second nifi currently emits a RECEIVE event, but there is no
corresponding SEND event from the first nifi (nor should there be).
If the second nifi emitted a PULL event, it would be easier for a system
overseer to know that there should be no corresponding SEND event.

Currently send/receive works well when nifi 1 does a PostHTTP and nifi 2 does a 
ListenHTTP, but not in the case above.
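
To make the overseer's problem concrete, here is a minimal sketch of the matching step. `SendReceiveAudit`, `Event`, and `dataId` are hypothetical names, and it assumes the overseer already has some key that correlates data across the two instances. It lists RECEIVE events on the second nifi that have no SEND on the first, and it leaves events of the proposed PULL type out of the check.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical audit helper; the Event class and its fields are illustrative only.
final class SendReceiveAudit {
    static final class Event {
        final String type;   // "SEND", "RECEIVE", or the proposed "PULL"
        final String dataId; // assumed key for correlating data across instances

        Event(String type, String dataId) {
            this.type = type;
            this.dataId = dataId;
        }
    }

    // Returns the ids of RECEIVE events on the second instance
    // that have no SEND event on the first instance.
    static List<String> unmatchedReceives(List<Event> first, List<Event> second) {
        Set<String> sent = new HashSet<>();
        for (Event e : first) {
            if ("SEND".equals(e.type)) {
                sent.add(e.dataId);
            }
        }
        List<String> unmatched = new ArrayList<>();
        for (Event e : second) {
            // PULL events are skipped: no matching SEND is expected for them.
            if ("RECEIVE".equals(e.type) && !sent.contains(e.dataId)) {
                unmatched.add(e.dataId);
            }
        }
        return unmatched;
    }
}
```

Today a GetSFTP pull shows up in the unmatched list because it is reported as RECEIVE; if it were reported as PULL (or as a RECEIVE marked active), the same check would no longer flag it.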

The ERROR case you mention is a nice point as well, although not my specific 
issue at the moment.
Thanks,
Nissim





On Monday, October 28, 2019, 11:52:57 PM EDT, Adam Taft  
wrote:  
 
 > But a flowfile that was PULLed by the second nifi (from the first nifi)
will not necessarily have any provenance event generated by the first nifi.

Isn't this the fault of the first NiFi to fail to emit a SEND event in
response to the second NiFi's request?  In this scenario, shouldn't the
send/receive pair be:
NiFi-1 [SEND] :: NIFI-2 [RECEIVE]?

What you describe is an odd use case for NiFi.  NiFi is usually not in the
business of acting as a file server daemon in order to "send" flowfiles to
other systems.  As you mention, HandleHttpResponse may be a lone wolf
example processor which generates a SEND event whose input originates from
a "listener". [1]  The other ListenXYZ processors generally issue RECEIVE
events because they are receiving bytes, not generating them.

Are there other processors in question? Something custom? Or is this
related to site-to-site transfers?

I still kind of question the motive of a provenance event pair that is
trying to establish "who called who first".  Honestly just trying to
understand the use case where a matching SEND/RECEIVE pair doesn't give you
what you need.

The only thing I could see would be a processor that asks for data, but
then doesn't receive it due to some error condition.  In this case, adding
some sort of ERROR event might be useful.  "I attempted to retrieve data
from ${uri}, but the transfer failed because of ${error condition}".  That
way, GetXYZ processors could report an error to provenance instead of as a
bulletin.

If the problem is related to a processor or the framework itself not
generating an event, can we just fix that function to emit SEND in the
scenario that you describe?  Changing the provenance model itself (beyond
possibly adding an ERROR event) feels like it would be the last scenario to
consider.

Thanks,
Adam

[1]
https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/HandleHttpResponse.java#L191




On Mon, Oct 28, 2019 at 4:47 PM Nissim Shiman 
wrote:

>  Adam,
> I believe there is a need for more detailed ProvenanceEvents.
> A use case would be a customer that is trying to track data passed between
> two nifi's and trying to match up SENDs and RECEIVEs
>
> So a flowfile that has a SEND event on the first nifi should have a
> RECEIVE event on the second nifi.
> But a flowfile that was PULLed by the second nifi (from the first nifi)
> will not necessarily have any provenance event generated by the first nifi.
>
> (I realize that FETCH is already a "reserved word" in the current
> ProvenanceEvents setup, so I was hoping PULL could be used instead.)
> There is another Provenance Event, ACKNOWLEDGE, which would also fit
> occasionally to this model as well (an example would be HandleHttpResponse
> processor which could send this instead of SEND when responding to a HTTP
> request)
> This being said, you make an excellent point when you said
> "However even more important to realize,
> this change would affect many other downstream consumers of provenance data
> which aren't necessarily in the stock NiFi distribution."
> Thanks,
> Nissim
>
>    On Friday, October 11, 2019, 11:30:19 AM EDT, Nissim Shiman
>  wrote:
>
>  Adam,
> "Yes" to your first question and the four processor examples you listed.
>
> I will need to get back to you regarding your other points.
>
> Thanks,
> Nissim
>
>    On Thursday, October 10, 2019, 7:05:57 PM EDT, Adam Taft <
> a...@adamtaft.com> wrote:
>
>  Nissim,
>
> Just to be clear, you are trying to distinguish between processors which
> are actively "pulling" data (GetXYZ) vs. processors which just "listen" for
> data (ListenXYZ)?  Is that your basic vision?
>
> GetFile => PULL
> GetHTTP => PULL
> ListenHTTP => RECEIVE
> ListenTCP => RECEIVE
>
> Could you clarify what advantages this would have in terms of data
> provenance?  What would you use this new event type for specifically?  What
> are you missing now? Do you have a use case that needs this, or are you
> just generally trying to round out the provenance event types for sake of
> completeness?  I honestly don't know a use case where you care whether you
> polled for the data or 

Re: PULL ProvenanceEvent

2019-10-28 Thread Adam Taft
> But a flowfile that was PULLed by the second nifi (from the first nifi)
will not necessarily have any provenance event generated by the first nifi.

Isn't this the fault of the first NiFi to fail to emit a SEND event in
response to the second NiFi's request?  In this scenario, shouldn't the
send/receive pair be:
NiFi-1 [SEND] :: NIFI-2 [RECEIVE]?

What you describe is an odd use case for NiFi.  NiFi is usually not in the
business of acting as a file server daemon in order to "send" flowfiles to
other systems.  As you mention, HandleHttpResponse may be a lone wolf
example processor which generates a SEND event whose input originates from
a "listener". [1]  The other ListenXYZ processors generally issue RECEIVE
events because they are receiving bytes, not generating them.

Are there other processors in question? Something custom? Or is this
related to site-to-site transfers?

I still kind of question the motive of a provenance event pair that is
trying to establish "who called who first".  Honestly just trying to
understand the use case where a matching SEND/RECEIVE pair doesn't give you
what you need.

The only thing I could see would be a processor that asks for data, but
then doesn't receive it due to some error condition.  In this case, adding
some sort of ERROR event might be useful.  "I attempted to retrieve data
from ${uri}, but the transfer failed because of ${error condition}".  That
way, GetXYZ processors could report an error to provenance instead of as a
bulletin.

If the problem is related to a processor or the framework itself not
generating an event, can we just fix that function to emit SEND in the
scenario that you describe?  Changing the provenance model itself (beyond
possibly adding an ERROR event) feels like it would be the last scenario to
consider.

Thanks,
Adam

[1]
https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/HandleHttpResponse.java#L191




On Mon, Oct 28, 2019 at 4:47 PM Nissim Shiman 
wrote:

>  Adam,
> I believe there is a need for more detailed ProvenanceEvents.
> A use case would be a customer that is trying to track data passed between
> two nifi's and trying to match up SENDs and RECEIVEs
>
> So a flowfile that has a SEND event on the first nifi should have a
> RECEIVE event on the second nifi.
> But a flowfile that was PULLed by the second nifi (from the first nifi)
> will not necessarily have any provenance event generated by the first nifi.
>
> (I realize that FETCH is already a "reserved word" in the current
> ProvenanceEvents setup, so I was hoping PULL could be used instead.)
> There is another Provenance Event, ACKNOWLEDGE, which would also fit
> occasionally to this model as well (an example would be HandleHttpResponse
> processor which could send this instead of SEND when responding to a HTTP
> request)
> This being said, you make an excellent point when you said
> "However even more important to realize,
> this change would affect many other downstream consumers of provenance data
> which aren't necessarily in the stock NiFi distribution."
> Thanks,
> Nissim
>
> On Friday, October 11, 2019, 11:30:19 AM EDT, Nissim Shiman
>  wrote:
>
>   Adam,
> "Yes" to your first question and the four processor examples you listed.
>
> I will need to get back to you regarding your other points.
>
> Thanks,
> Nissim
>
> On Thursday, October 10, 2019, 7:05:57 PM EDT, Adam Taft <
> a...@adamtaft.com> wrote:
>
>  Nissim,
>
> Just to be clear, you are trying to distinguish between processors which
> are actively "pulling" data (GetXYZ) vs. processors which just "listen" for
> data (ListenXYZ)?  Is that your basic vision?
>
> GetFile => PULL
> GetHTTP => PULL
> ListenHTTP => RECEIVE
> ListenTCP => RECEIVE
>
> Could you clarify what advantages this would have in terms of data
> provenance?  What would you use this new event type for specifically?  What
> are you missing now? Do you have a use case that needs this, or are you
> just generally trying to round out the provenance event types for sake of
> completeness?  I honestly don't know a use case where you care whether you
> polled for the data or listened for it.  The provenance model today just
> cares that you received the data, not so much how you received it.
>
> You're right that this proposal will affect many processors and the
> internal visualization tools, etc.  However even more important to realize,
> this change would affect many other downstream consumers of provenance data
> which aren't necessarily in the stock NiFi distribution.  For example, any
> third-party/custom ReportingTask that handles provenance data would need to
> be updated with this change.  There's probably need for a strong vision to
> help demonstrate the value for this vs. the cost of the cascading effects
> related to this change.
>
> Thanks,
> Adam
>
>
> On Thu, Oct 10, 2019 at 4:02 PM Nissim Shiman 
> wrote:
>
> > Hello Team,
> >
>

Re: PULL ProvenanceEvent

2019-10-28 Thread Nissim Shiman
 Adam,
I believe there is a need for more detailed ProvenanceEvents.
A use case would be a customer that is trying to track data passed between two 
nifi's and trying to match up SENDs and RECEIVEs

So a flowfile that has a SEND event on the first nifi should have a RECEIVE 
event on the second nifi.
But a flowfile that was PULLed by the second nifi (from the first nifi) will 
not necessarily have any provenance event generated by the first nifi.

(I realize that FETCH is already a "reserved word" in the current 
ProvenanceEvents setup, so I was hoping PULL could be used instead.)
There is another Provenance Event, ACKNOWLEDGE, which would also fit 
occasionally to this model as well (an example would be HandleHttpResponse 
processor which could send this instead of SEND when responding to a HTTP 
request)
This being said, you make an excellent point when you said
"However even more important to realize,
this change would affect many other downstream consumers of provenance data
which aren't necessarily in the stock NiFi distribution."
Thanks,
Nissim

On Friday, October 11, 2019, 11:30:19 AM EDT, Nissim Shiman 
 wrote:  
 
  Adam,
"Yes" to your first question and the four processor examples you listed.

I will need to get back to you regarding your other points.

Thanks,
Nissim

    On Thursday, October 10, 2019, 7:05:57 PM EDT, Adam Taft 
 wrote:  
 
 Nissim,

Just to be clear, you are trying to distinguish between processors which
are actively "pulling" data (GetXYZ) vs. processors which just "listen" for
data (ListenXYZ)?  Is that your basic vision?

GetFile => PULL
GetHTTP => PULL
ListenHTTP => RECEIVE
ListenTCP => RECEIVE

Could you clarify what advantages this would have in terms of data
provenance?  What would you use this new event type for specifically?  What
are you missing now? Do you have a use case that needs this, or are you
just generally trying to round out the provenance event types for sake of
completeness?  I honestly don't know a use case where you care whether you
polled for the data or listened for it.  The provenance model today just
cares that you received the data, not so much how you received it.

You're right that this proposal will affect many processors and the
internal visualization tools, etc.  However even more important to realize,
this change would affect many other downstream consumers of provenance data
which aren't necessarily in the stock NiFi distribution.  For example, any
third-party/custom ReportingTask that handles provenance data would need to
be updated with this change.  There's probably need for a strong vision to
help demonstrate the value for this vs. the cost of the cascading effects
related to this change.

Thanks,
Adam



Re: PULL ProvenanceEvent

2019-10-11 Thread Nissim Shiman
Adam,

"Yes" to your first question and the four processor examples you listed.

I will need to get back to you regarding your other points.

Thanks,
Nissim


Re: PULL ProvenanceEvent

2019-10-10 Thread Adam Taft
Nissim,

Just to be clear, you are trying to distinguish between processors which
are actively "pulling" data (GetXYZ) vs. processors which just "listen" for
data (ListenXYZ)?  Is that your basic vision?

GetFile => PULL
GetHTTP => PULL
ListenHTTP => RECEIVE
ListenTCP => RECEIVE

Could you clarify what advantages this would have in terms of data
provenance?  What would you use this new event type for specifically?  What
are you missing now? Do you have a use case that needs this, or are you
just generally trying to round out the provenance event types for the sake of
completeness?  I honestly don't know a use case where you care whether you
polled for the data or listened for it.  The provenance model today just
cares that you received the data, not so much how you received it.

You're right that this proposal will affect many processors and the
internal visualization tools, etc.  However, even more important to realize,
this change would affect many other downstream consumers of provenance data
which aren't necessarily in the stock NiFi distribution.  For example, any
third-party/custom ReportingTask that handles provenance data would need to
be updated with this change.  There's probably a need for a strong vision to
help demonstrate the value of this vs. the cost of the cascading effects
related to this change.
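
To make the downstream concern concrete, here is a minimal Java sketch. It
does not use the NiFi API: EventType is a hypothetical stand-in for
ProvenanceEventType that mirrors only a few constants, PULL is the proposed
addition rather than an existing type, and ReceiveCounter is an invented
consumer. It only illustrates how code written against today's types could
silently skip events once processors start emitting PULL.

```java
import java.util.List;

// Hypothetical stand-in for NiFi's ProvenanceEventType. Only a few constants
// are mirrored, and PULL is the proposed addition, not an existing type.
enum EventType { RECEIVE, FETCH, SEND, PULL }

// Invented downstream consumer, written before the proposal existed.
class ReceiveCounter {
    // Counts the events this consumer recognizes as "data was received".
    static int countReceived(List<EventType> events) {
        int count = 0;
        for (EventType event : events) {
            switch (event) {
                case RECEIVE:
                    count++;
                    break;
                default:
                    // PULL falls through to here until the consumer is updated.
                    break;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Today: GetFile and ListenHTTP would both report RECEIVE.
        System.out.println(countReceived(List.of(EventType.RECEIVE, EventType.RECEIVE)));
        // Under the proposal: GetFile reports PULL and is no longer counted.
        System.out.println(countReceived(List.of(EventType.PULL, EventType.RECEIVE)));
    }
}
```

Nothing fails at compile time in this sketch; the count simply changes, which
is the kind of quiet breakage a custom ReportingTask could run into.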

Thanks,
Adam




PULL ProvenanceEvent

2019-10-10 Thread Nissim Shiman
Hello Team,

The ProvenanceEventType class does a good job capturing possible events, but
the PULL event doesn't seem to fall nicely into any of the existing types.

https://gitbox.apache.org/repos/asf?p=nifi.git;a=blob;f=nifi-api/src/main/java/org/apache/nifi/provenance/ProvenanceEventType.java

RECEIVE is the closest, but RECEIVE is passive and doesn't capture the active
action of a PULL.

And... maybe it would fall into FETCH, but FETCH is more focused on the
contents of an existing flow file being overwritten.

What does the community think about a new PULL event type, or using FETCH for
PULL and having what FETCH does now become a new event such as REUSE?

NOTE: a new PULL event would have the cascading effect of many processors that
currently emit RECEIVE events being modified to emit PULL (e.g., GetFile would
no longer emit a RECEIVE, but rather a PULL), but it would more accurately
capture the event.
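
As a rough sketch of the split being proposed, the Java below assigns an event
type from a processor's name. Everything in it is an assumption made for
illustration: ProposedEventType and ProposedMapping are invented names, PULL
does not exist in ProvenanceEventType today, and keying off the Get/Listen
prefix is only a shorthand for "actively pulls" vs. "passively listens", not
how NiFi processors actually choose what to report.

```java
// Invented enum for illustration; PULL is the proposed type, not an existing one.
enum ProposedEventType { RECEIVE, PULL }

class ProposedMapping {
    // Maps a processor name to the event it would emit under the proposal.
    // The name-prefix rule is a simplification used only for this sketch.
    static ProposedEventType eventFor(String processorName) {
        if (processorName.startsWith("Get")) {
            return ProposedEventType.PULL;      // processor actively pulls data
        }
        if (processorName.startsWith("Listen")) {
            return ProposedEventType.RECEIVE;   // processor passively listens
        }
        throw new IllegalArgumentException("No mapping sketched for " + processorName);
    }

    public static void main(String[] args) {
        for (String name : new String[] {"GetFile", "ListenHTTP"}) {
            System.out.println(name + " => " + eventFor(name));
        }
    }
}
```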

Thanks,
Nissim Shiman