Re: Use of attribute uuid and other "native" attributes

2023-07-18 Thread Russell Bateman
Of course, a custom processor can create any attribute, including an 
"external id field." I don't think it can "lose" the original uuid 
since, if it attempts to reset it, the action will be quietly ignored 
(Mark).


Note that uuid figures prominently in the display of provenance, which is, 
to my mind, the crux of my question. [1]


My question was about the "sanctified" state (or not) of uuid and Matt 
and Mark gave succinct and useful answers that I will explore. I was 
unaware of the suggested "best practice" of considering losing any and 
all previously established attributes before sending flowfiles on. I 
have long done this explicitly in the case of attributes I create, but 
will now contemplate doing it for other attributes I did not create and 
therefore have respected "religiously."


Russ

[1] 
https://www.tutorialspoint.com/apache_nifi/apache_nifi_data_provenance.htm


On 7/18/23 14:07, Edward Armes wrote:

Hmm,

I've seen this come up a few times now. I wonder whether there is a need to
rename the uuid field and create an external id field?

Edward

On Tue, 18 Jul 2023, 20:53 Lucas Ottersbach wrote:


Hey Matt,

you wrote that both `Session.create` and `Session.clone` set a new FlowFile
UUID to the resulting FlowFile. This somewhat sounds like there is an
alternative way where the UUID is not controlled by the framework itself?

I've got a different use case than Russell, but was wondering whether it is
even possible to control the FlowFile UUID as a Processor developer? I've
got a processor pair for inter-cluster transfer of FlowFiles (where
Site-to-Site is not applicable). As of now, the UUID on the receiving side
differs from the original on the origin cluster, because I'm using
`Session.create`.
Is there a way to control the UUID of new FlowFiles?


Best regards,

Lucas

Matt Burgess wrote on Tue, 18 July 2023, 20:23:


In general I recommend only sending on those attributes that will be
used at some point downstream (unless you have an "original"
relationship that should maintain the original state with respect to
provenance). If you don't know that ahead of time you'll probably need
to send all/most of the attributes just in case.

Are you using session.create() or session.clone()? They both set a new
"uuid" attribute on the created FlowFile, with at least the latter
setting some other attributes as well (see the Developer Guide [1] for
more details).

Regards,
Matt

[1]https://nifi.apache.org/docs/nifi-docs/html/developer-guide.html

On Tue, Jul 18, 2023 at 12:25 PM Russell Bateman
wrote:

I have a custom processor, /SplitHl7v4Resources/, that splits out
individual FHIR resources (Patients, Observations, Encounters, etc.)
from great Bundle flowfiles. So, for a given flowfile, it's split into
hundreds of smaller ones.

When I do this, I leave the existing NiFi attributes as they were on the
original flowfile.

As I contemplate the uuid attribute, it occurs to me that I should find
out what its *significance is for provenance and other potential
debugging/tracing concerns*. I never really look at it, but, if there
were some kind of melt-down in a production environment, would I care
that it multiplied across hundreds of flowfiles besides the original one?

Also these two other NiFi attributes remain unchanged:

 filename
 path


I do garnish each flowfile with many pointed/significant new attributes
like resource.type that are my own. In my processing, I don't care about
NiFi's original attributes, but should I?

Thanks,
Russ


Re: Use of attribute uuid and other "native" attributes

2023-07-18 Thread Edward Armes
Hmm,

I've seen this come up a few times now. I wonder whether there is a need to
rename the uuid field and create an external id field?

Edward

On Tue, 18 Jul 2023, 20:53 Lucas Ottersbach wrote:

> Hey Matt,
>
> you wrote that both `Session.create` and `Session.clone` set a new FlowFile
> UUID to the resulting FlowFile. This somewhat sounds like there is an
> alternative way where the UUID is not controlled by the framework itself?
>
> I've got a different use case than Russell, but was wondering whether it is
> even possible to control the FlowFile UUID as a Processor developer? I've
> got a processor pair for inter-cluster transfer of FlowFiles (where
> Site-to-Site is not applicable). As of now, the UUID on the receiving side
> differs from the original on the origin cluster, because I'm using
> `Session.create`.
> Is there a way to control the UUID of new FlowFiles?
>
>
> Best regards,
>
> Lucas
>
> Matt Burgess wrote on Tue, 18 July 2023, 20:23:
>
> > In general I recommend only sending on those attributes that will be
> > used at some point downstream (unless you have an "original"
> > relationship that should maintain the original state with respect to
> > provenance). If you don't know that ahead of time you'll probably need
> > to send all/most of the attributes just in case.
> >
> > Are you using session.create() or session.clone()? They both set a new
> > "uuid" attribute on the created FlowFile, with at least the latter
> > setting some other attributes as well (see the Developer Guide [1] for
> > more details).
> >
> > Regards,
> > Matt
> >
> > [1] https://nifi.apache.org/docs/nifi-docs/html/developer-guide.html
> >
> > On Tue, Jul 18, 2023 at 12:25 PM Russell Bateman 
> > wrote:
> > >
> > > I have a custom processor, /SplitHl7v4Resources/, that splits out
> > > individual FHIR resources (Patients, Observations, Encounters, etc.)
> > > from great Bundle flowfiles. So, for a given flowfile, it's split into
> > > hundreds of smaller ones.
> > >
> > > When I do this, I leave the existing NiFi attributes as they were on the
> > > original flowfile.
> > >
> > > As I contemplate the uuid attribute, it occurs to me that I should find
> > > out what its *significance is for provenance and other potential
> > > debugging/tracing concerns*. I never really look at it, but, if there
> > > were some kind of melt-down in a production environment, would I care
> > > that it multiplied across hundreds of flowfiles besides the original one?
> > >
> > > Also these two other NiFi attributes remain unchanged:
> > >
> > > filename
> > > path
> > >
> > >
> > > I do garnish each flowfile with many pointed/significant new attributes
> > > like resource.type that are my own. In my processing, I don't care about
> > > NiFi's original attributes, but should I?
> > >
> > > Thanks,
> > > Russ
> >
>


Re: Use of attribute uuid and other "native" attributes

2023-07-18 Thread Lucas Ottersbach
That was my impression as well. Thank you for the quick response and the
clarification.


Best regards

Lucas

Mark Payne wrote on Tue, 18 July 2023, 21:56:

> Lucas,
>
> You cannot control the UUID. It’s automatically generated by the
> framework. If you attempt to use ProcessSession.putAllAttributes or
> ProcessSession.putAttribute, it’ll ignore the “uuid” key.
>
> Thanks
> -Mark
>
>
> > On Jul 18, 2023, at 3:51 PM, Lucas Ottersbach <lucas.ottersb...@gmail.com> wrote:
> >
> > Hey Matt,
> >
> > you wrote that both `Session.create` and `Session.clone` set a new FlowFile
> > UUID to the resulting FlowFile. This somewhat sounds like there is an
> > alternative way where the UUID is not controlled by the framework itself?
> >
> > I've got a different use case than Russell, but was wondering whether it is
> > even possible to control the FlowFile UUID as a Processor developer? I've
> > got a processor pair for inter-cluster transfer of FlowFiles (where
> > Site-to-Site is not applicable). As of now, the UUID on the receiving side
> > differs from the original on the origin cluster, because I'm using
> > `Session.create`.
> > Is there a way to control the UUID of new FlowFiles?
> >
> >
> > Best regards,
> >
> > Lucas
> >
> > Matt Burgess wrote on Tue, 18 July 2023, 20:23:
> >
> >> In general I recommend only sending on those attributes that will be
> >> used at some point downstream (unless you have an "original"
> >> relationship that should maintain the original state with respect to
> >> provenance). If you don't know that ahead of time you'll probably need
> >> to send all/most of the attributes just in case.
> >>
> >> Are you using session.create() or session.clone()? They both set a new
> >> "uuid" attribute on the created FlowFile, with at least the latter
> >> setting some other attributes as well (see the Developer Guide [1] for
> >> more details).
> >>
> >> Regards,
> >> Matt
> >>
> >> [1] https://nifi.apache.org/docs/nifi-docs/html/developer-guide.html
> >>
> >> On Tue, Jul 18, 2023 at 12:25 PM Russell Bateman
> >> wrote:
> >>>
> >>> I have a custom processor, /SplitHl7v4Resources/, that splits out
> >>> individual FHIR resources (Patients, Observations, Encounters, etc.)
> >>> from great Bundle flowfiles. So, for a given flowfile, it's split into
> >>> hundreds of smaller ones.
> >>>
> >>> When I do this, I leave the existing NiFi attributes as they were on the
> >>> original flowfile.
> >>>
> >>> As I contemplate the uuid attribute, it occurs to me that I should find
> >>> out what its *significance is for provenance and other potential
> >>> debugging/tracing concerns*. I never really look at it, but, if there
> >>> were some kind of melt-down in a production environment, would I care
> >>> that it multiplied across hundreds of flowfiles besides the original one?
> >>>
> >>> Also these two other NiFi attributes remain unchanged:
> >>>
> >>>    filename
> >>>    path
> >>>
> >>>
> >>> I do garnish each flowfile with many pointed/significant new attributes
> >>> like resource.type that are my own. In my processing, I don't care about
> >>> NiFi's original attributes, but should I?
> >>>
> >>> Thanks,
> >>> Russ
> >>
>
>


Re: Use of attribute uuid and other "native" attributes

2023-07-18 Thread Mark Payne
Lucas,

You cannot control the UUID. It’s automatically generated by the framework. If 
you attempt to use ProcessSession.putAllAttributes or 
ProcessSession.putAttribute, it’ll ignore the “uuid” key.

Thanks
-Mark
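
To make the consequence concrete for the inter-cluster case, here is a rough sketch. It is a toy model using only the Java standard library, not the NiFi API: `ToyFlowFile`, `receive`, and the attribute name `origin.uuid` are all hypothetical names chosen for illustration. It only mimics the behaviour described above (a put on the "uuid" key is ignored) and shows one possible workaround, which is to carry the sender's UUID in a separate custom attribute.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

public class OriginUuidSketch {

    /** Toy stand-in for a FlowFile; this is NOT the NiFi API. */
    public static final class ToyFlowFile {
        private final Map<String, String> attributes = new HashMap<>();

        public ToyFlowFile() {
            // The "framework" generates the uuid when the FlowFile is created.
            attributes.put("uuid", UUID.randomUUID().toString());
        }

        /** Mimics the described behaviour: a put on the "uuid" key is silently ignored. */
        public void putAttribute(String key, String value) {
            if ("uuid".equals(key)) {
                return;
            }
            attributes.put(key, value);
        }

        public String getAttribute(String key) {
            return attributes.get(key);
        }
    }

    /** Hypothetical receiving side of an inter-cluster transfer. */
    public static ToyFlowFile receive(String originUuid) {
        ToyFlowFile received = new ToyFlowFile();
        received.putAttribute("uuid", originUuid);        // ignored by the toy framework
        received.putAttribute("origin.uuid", originUuid); // hypothetical custom attribute, kept
        return received;
    }

    public static void main(String[] args) {
        String originUuid = UUID.randomUUID().toString();
        ToyFlowFile received = receive(originUuid);
        System.out.println("uuid        = " + received.getAttribute("uuid"));
        System.out.println("origin.uuid = " + received.getAttribute("origin.uuid"));
    }
}
```

In this model the received FlowFile keeps its own framework-generated uuid, while the sender's identifier stays available downstream under the custom key.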


> On Jul 18, 2023, at 3:51 PM, Lucas Ottersbach  
> wrote:
> 
> Hey Matt,
> 
> you wrote that both `Session.create` and `Session.clone` set a new FlowFile
> UUID to the resulting FlowFile. This somewhat sounds like there is an
> alternative way where the UUID is not controlled by the framework itself?
> 
> I've got a different use case than Russell, but was wondering whether it is
> even possible to control the FlowFile UUID as a Processor developer? I've
> got a processor pair for inter-cluster transfer of FlowFiles (where
> Site-to-Site is not applicable). As of now, the UUID on the receiving side
> differs from the original on the origin cluster, because I'm using
> `Session.create`.
> Is there a way to control the UUID of new FlowFiles?
> 
> 
> Best regards,
> 
> Lucas
> 
> Matt Burgess wrote on Tue, 18 July 2023, 20:23:
> 
>> In general I recommend only sending on those attributes that will be
>> used at some point downstream (unless you have an "original"
>> relationship that should maintain the original state with respect to
>> provenance). If you don't know that ahead of time you'll probably need
>> to send all/most of the attributes just in case.
>> 
>> Are you using session.create() or session.clone()? They both set a new
>> "uuid" attribute on the created FlowFile, with at least the latter
>> setting some other attributes as well (see the Developer Guide [1] for
>> more details).
>> 
>> Regards,
>> Matt
>> 
>> [1] https://nifi.apache.org/docs/nifi-docs/html/developer-guide.html
>> 
>> On Tue, Jul 18, 2023 at 12:25 PM Russell Bateman 
>> wrote:
>>> 
>>> I have a custom processor, /SplitHl7v4Resources/, that splits out
>>> individual FHIR resources (Patients, Observations, Encounters, etc.)
>>> from great Bundle flowfiles. So, for a given flowfile, it's split into
>>> hundreds of smaller ones.
>>> 
>>> When I do this, I leave the existing NiFi attributes as they were on the
>>> original flowfile.
>>> 
>>> As I contemplate the uuid attribute, it occurs to me that I should find
>>> out what its *significance is for provenance and other potential
>>> debugging/tracing concerns*. I never really look at it, but, if there
>>> were some kind of melt-down in a production environment, would I care
>>> that it multiplied across hundreds of flowfiles besides the original one?
>>> 
>>> Also these two other NiFi attributes remain unchanged:
>>> 
>>>    filename
>>>    path
>>> 
>>> 
>>> I do garnish each flowfile with many pointed/significant new attributes
>>> like resource.type that are my own. In my processing, I don't care about
>>> NiFi's original attributes, but should I?
>>> 
>>> Thanks,
>>> Russ
>> 



Re: Use of attribute uuid and other "native" attributes

2023-07-18 Thread Lucas Ottersbach
Hey Matt,

you wrote that both `Session.create` and `Session.clone` set a new FlowFile
UUID to the resulting FlowFile. This somewhat sounds like there is an
alternative way where the UUID is not controlled by the framework itself?

I've got a different use case than Russell, but was wondering whether it is
even possible to control the FlowFile UUID as a Processor developer? I've
got a processor pair for inter-cluster transfer of FlowFiles (where
Site-to-Site is not applicable). As of now, the UUID on the receiving side
differs from the original on the origin cluster, because I'm using
`Session.create`.
Is there a way to control the UUID of new FlowFiles?


Best regards,

Lucas

Matt Burgess wrote on Tue, 18 July 2023, 20:23:

> In general I recommend only sending on those attributes that will be
> used at some point downstream (unless you have an "original"
> relationship that should maintain the original state with respect to
> provenance). If you don't know that ahead of time you'll probably need
> to send all/most of the attributes just in case.
>
> Are you using session.create() or session.clone()? They both set a new
> "uuid" attribute on the created FlowFile, with at least the latter
> setting some other attributes as well (see the Developer Guide [1] for
> more details).
>
> Regards,
> Matt
>
> [1] https://nifi.apache.org/docs/nifi-docs/html/developer-guide.html
>
> On Tue, Jul 18, 2023 at 12:25 PM Russell Bateman 
> wrote:
> >
> > I have a custom processor, /SplitHl7v4Resources/, that splits out
> > individual FHIR resources (Patients, Observations, Encounters, etc.)
> > from great Bundle flowfiles. So, for a given flowfile, it's split into
> > hundreds of smaller ones.
> >
> > When I do this, I leave the existing NiFi attributes as they were on the
> > original flowfile.
> >
> > As I contemplate the uuid attribute, it occurs to me that I should find
> > out what its *significance is for provenance and other potential
> > debugging/tracing concerns*. I never really look at it, but, if there
> > were some kind of melt-down in a production environment, would I care
> > that it multiplied across hundreds of flowfiles besides the original one?
> >
> > Also these two other NiFi attributes remain unchanged:
> >
> > filename
> > path
> >
> >
> > I do garnish each flowfile with many pointed/significant new attributes
> > like resource.type that are my own. In my processing, I don't care about
> > NiFi's original attributes, but should I?
> >
> > Thanks,
> > Russ
>


Re: Use of attribute uuid and other "native" attributes

2023-07-18 Thread Matt Burgess
In general I recommend only sending on those attributes that will be
used at some point downstream (unless you have an "original"
relationship that should maintain the original state with respect to
provenance). If you don't know that ahead of time you'll probably need
to send all/most of the attributes just in case.

Are you using session.create() or session.clone()? They both set a new
"uuid" attribute on the created FlowFile, with at least the latter
setting some other attributes as well (see the Developer Guide [1] for
more details).

Regards,
Matt

[1] https://nifi.apache.org/docs/nifi-docs/html/developer-guide.html
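
The "only send on what is used downstream" idea can be sketched as an allow-list. The following is a standalone illustration using plain Java collections, not NiFi code: the class name, `keysToDrop`, and the `scratch.counter` attribute are hypothetical, and skipping "uuid" is a choice made for this sketch on the assumption that the framework owns that attribute. In a real processor the resulting key set would presumably be handed to the session's attribute-removal call before transferring the FlowFile.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class AttributeAllowListSketch {

    /**
     * Returns the attribute keys that are not on the allow-list.
     * "uuid" is never proposed for removal (assumption: the framework owns it).
     */
    public static Set<String> keysToDrop(Map<String, String> attributes, Set<String> keep) {
        Set<String> drop = new TreeSet<>(); // sorted, so the result is deterministic
        for (String key : attributes.keySet()) {
            if (!keep.contains(key) && !"uuid".equals(key)) {
                drop.add(key);
            }
        }
        return drop;
    }

    public static void main(String[] args) {
        Map<String, String> attributes = new LinkedHashMap<>();
        attributes.put("uuid", "framework-generated");
        attributes.put("filename", "bundle.json");
        attributes.put("path", "./");
        attributes.put("resource.type", "Patient");
        attributes.put("scratch.counter", "17"); // hypothetical leftover attribute

        // Attributes known to be needed downstream.
        Set<String> keep = new HashSet<>(Arrays.asList("filename", "resource.type"));

        System.out.println(keysToDrop(attributes, keep)); // prints [path, scratch.counter]
    }
}
```

If the downstream needs are not known ahead of time, the allow-list simply grows to cover most attributes, which matches the "just in case" fallback above.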

On Tue, Jul 18, 2023 at 12:25 PM Russell Bateman  wrote:
>
> I have a custom processor, /SplitHl7v4Resources/, that splits out
> individual FHIR resources (Patients, Observations, Encounters, etc.)
> from great Bundle flowfiles. So, for a given flowfile, it's split into
> hundreds of smaller ones.
>
> When I do this, I leave the existing NiFi attributes as they were on the
> original flowfile.
>
> As I contemplate the uuid attribute, it occurs to me that I should find
> out what its *significance is for provenance and other potential
> debugging/tracing concerns*. I never really look at it, but, if there
> were some kind of melt-down in a production environment, would I care
> that it multiplied across hundreds of flowfiles besides the original one?
>
> Also these two other NiFi attributes remain unchanged:
>
> filename
> path
>
>
> I do garnish each flowfile with many pointed/significant new attributes
> like resource.type that are my own. In my processing, I don't care about
> NiFi's original attributes, but should I?
>
> Thanks,
> Russ


Use of attribute uuid and other "native" attributes

2023-07-18 Thread Russell Bateman
I have a custom processor, /SplitHl7v4Resources/, that splits out 
individual FHIR resources (Patients, Observations, Encounters, etc.) 
from great Bundle flowfiles. So, for a given flowfile, it's split into 
hundreds of smaller ones.


When I do this, I leave the existing NiFi attributes as they were on the 
original flowfile.


As I contemplate the uuid attribute, it occurs to me that I should find 
out what its *significance is for provenance and other potential 
debugging/tracing concerns*. I never really look at it, but, if there 
were some kind of melt-down in a production environment, would I care 
that it multiplied across hundreds of flowfiles besides the original one?


Also these two other NiFi attributes remain unchanged:

   filename
   path


I do garnish each flowfile with many pointed/significant new attributes 
like resource.type that are my own. In my processing, I don't care about 
NiFi's original attributes, but should I?


Thanks,
Russ