Re: Use of attribute uuid and other "native" attributes
Of course, a custom processor can create any attribute, including an "external id field." I don't think it can "lose" the original uuid, since any attempt to reset it is quietly ignored (per Mark). Note that uuid figures prominently in the display of provenance [1], which in my mind is the crux of my question.

My question was about the "sanctified" state (or not) of uuid, and Matt and Mark gave succinct and useful answers that I will explore.

I was unaware of the suggested "best practice" of considering dropping any and all previously established attributes before sending flowfiles on. I have long done this explicitly for attributes I create, but will now contemplate doing it for attributes I did not create and have therefore respected "religiously."

Russ

[1] https://www.tutorialspoint.com/apache_nifi/apache_nifi_data_provenance.htm

On 7/18/23 14:07, Edward Armes wrote:
> Hmm, I've seen this come up a few times now. I wonder whether there is
> a need to rename the uuid field and create an external id field?
>
> Edward
Re: Use of attribute uuid and other "native" attributes
Hmm, I've seen this come up a few times now. I wonder whether there is a need to rename the uuid field and create an external id field?

Edward

On Tue, 18 Jul 2023, 20:53 Lucas Ottersbach wrote:
> Hey Matt,
>
> you wrote that both `Session.create` and `Session.clone` set a new
> FlowFile UUID on the resulting FlowFile. This sounds somewhat as if
> there is an alternative way where the UUID is not controlled by the
> framework itself?
>
> I've got a different use case than Russell, but was wondering whether
> it is even possible to control the FlowFile UUID as a Processor
> developer? I've got a processor pair for inter-cluster transfer of
> FlowFiles (where Site-to-Site is not applicable). As of now, the UUID
> on the receiving side differs from the original on the origin cluster,
> because I'm using `Session.create`.
> Is there a way to control the UUID of new FlowFiles?
>
> Best regards,
>
> Lucas
Re: Use of attribute uuid and other "native" attributes
That was my impression as well. Thank you for the quick response and the clarification.

Best regards,
Lucas

Mark Payne wrote on Tue, 18 Jul 2023, 21:56:
> Lucas,
>
> You cannot control the UUID. It’s automatically generated by the
> framework. If you attempt to set it with ProcessSession.putAllAttributes
> or ProcessSession.putAttribute, the “uuid” key will be ignored.
>
> Thanks
> -Mark
Re: Use of attribute uuid and other "native" attributes
Lucas,

You cannot control the UUID. It’s automatically generated by the framework. If you attempt to set it with ProcessSession.putAllAttributes or ProcessSession.putAttribute, the “uuid” key will be ignored.

Thanks
-Mark

On Jul 18, 2023, at 3:51 PM, Lucas Ottersbach wrote:
> Hey Matt,
>
> you wrote that both `Session.create` and `Session.clone` set a new
> FlowFile UUID on the resulting FlowFile. This sounds somewhat as if
> there is an alternative way where the UUID is not controlled by the
> framework itself?
>
> I've got a different use case than Russell, but was wondering whether
> it is even possible to control the FlowFile UUID as a Processor
> developer? I've got a processor pair for inter-cluster transfer of
> FlowFiles (where Site-to-Site is not applicable). As of now, the UUID
> on the receiving side differs from the original on the origin cluster,
> because I'm using `Session.create`.
> Is there a way to control the UUID of new FlowFiles?
>
> Best regards,
>
> Lucas
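Mark's point that the "uuid" key is ignored suggests carrying the sender's identifier under a different attribute name, along the lines of the "external id field" mentioned elsewhere in this thread. The following is only a minimal sketch in plain Java with no NiFi dependency; the class name, the method name, and the `origin.uuid` attribute key are hypothetical and chosen purely for illustration.

```java
import java.util.HashMap;
import java.util.Map;

final class OriginIdSketch {
    // "uuid" is the framework-owned attribute key discussed in the thread.
    static final String UUID_KEY = "uuid";
    // Hypothetical attribute name for this sketch; not a NiFi convention.
    static final String ORIGIN_UUID_KEY = "origin.uuid";

    /**
     * Returns a copy of the sender's attributes in which the sender's
     * "uuid" value has been moved to ORIGIN_UUID_KEY. The input map is
     * left unmodified.
     */
    static Map<String, String> attributesForReceiver(Map<String, String> senderAttributes) {
        Map<String, String> result = new HashMap<>(senderAttributes);
        String originUuid = result.remove(UUID_KEY);
        if (originUuid != null) {
            result.put(ORIGIN_UUID_KEY, originUuid);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> sender = new HashMap<>();
        sender.put("uuid", "11111111-2222-3333-4444-555555555555");
        sender.put("filename", "bundle.json");
        // The printed map holds "origin.uuid" and "filename", but no "uuid".
        System.out.println(attributesForReceiver(sender));
    }
}
```

In a receiving processor, the returned map could then be applied to the newly created FlowFile with ProcessSession.putAllAttributes; per Mark's note, a "uuid" key would be skipped there in any case, so moving the value to another key is what keeps it visible downstream.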
Re: Use of attribute uuid and other "native" attributes
Hey Matt,

you wrote that both `Session.create` and `Session.clone` set a new FlowFile UUID on the resulting FlowFile. This sounds somewhat as if there is an alternative way where the UUID is not controlled by the framework itself?

I've got a different use case than Russell, but was wondering whether it is even possible to control the FlowFile UUID as a Processor developer? I've got a processor pair for inter-cluster transfer of FlowFiles (where Site-to-Site is not applicable). As of now, the UUID on the receiving side differs from the original on the origin cluster, because I'm using `Session.create`.
Is there a way to control the UUID of new FlowFiles?

Best regards,

Lucas

Matt Burgess wrote on Tue, 18 Jul 2023, 20:23:
> In general I recommend only sending on those attributes that will be
> used at some point downstream (unless you have an "original"
> relationship that should maintain the original state with respect to
> provenance). If you don't know that ahead of time you'll probably need
> to send all/most of the attributes just in case.
>
> Are you using session.create() or session.clone()? They both set a new
> "uuid" attribute on the created FlowFile, with at least the latter
> setting some other attributes as well (see the Developer Guide [1] for
> more details).
>
> Regards,
> Matt
>
> [1] https://nifi.apache.org/docs/nifi-docs/html/developer-guide.html
Re: Use of attribute uuid and other "native" attributes
In general I recommend only sending on those attributes that will be used at some point downstream (unless you have an "original" relationship that should maintain the original state with respect to provenance). If you don't know that ahead of time you'll probably need to send all/most of the attributes just in case.

Are you using session.create() or session.clone()? They both set a new "uuid" attribute on the created FlowFile, with at least the latter setting some other attributes as well (see the Developer Guide [1] for more details).

Regards,
Matt

[1] https://nifi.apache.org/docs/nifi-docs/html/developer-guide.html

On Tue, Jul 18, 2023 at 12:25 PM Russell Bateman wrote:
>
> I have a custom processor, /SplitHl7v4Resources/, that splits out
> individual FHIR resources (Patients, Observations, Encounters, etc.)
> from great Bundle flowfiles. So, a given flowfile is split into
> hundreds of smaller ones.
>
> When I do this, I leave the existing NiFi attributes as they were on
> the original flowfile.
>
> As I contemplate the uuid attribute, it occurs to me that I should find
> out what its *significance is for provenance and other potential
> debugging/tracing concerns*. I never really look at it, but, if there
> were some kind of melt-down in a production environment, would I care
> that it multiplied across hundreds of flowfiles besides the original
> one?
>
> Also, these two other NiFi attributes remain unchanged:
>
>     filename
>     path
>
> I do garnish each flowfile with many pointed/significant new attributes
> like resource.type that are my own. In my processing, I don't care
> about NiFi's original attributes, but should I?
>
> Thanks,
> Russ
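The recommendation to send on only the attributes needed downstream can be sketched as a small allow-list helper. This is a minimal illustration in plain Java with no NiFi dependency; the class name, the method name, and the `scratch.note` attribute are hypothetical, and the choice to leave "uuid" alone is an assumption of this sketch based on the thread's point that the framework owns that attribute.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class AttributePruningSketch {
    /**
     * Returns, in sorted order, the attribute keys that are present on the
     * flowfile but not needed downstream. The framework-owned "uuid" key is
     * never included in the result.
     */
    static Set<String> keysToDrop(Map<String, String> attributes, Set<String> neededDownstream) {
        Set<String> drop = new TreeSet<>(attributes.keySet());
        drop.removeAll(neededDownstream);
        drop.remove("uuid"); // assumption of this sketch: leave the framework's key alone
        return drop;
    }

    public static void main(String[] args) {
        Map<String, String> attributes = new HashMap<>();
        attributes.put("uuid", "11111111-2222-3333-4444-555555555555");
        attributes.put("filename", "bundle.json");
        attributes.put("path", "./");
        attributes.put("resource.type", "Patient");
        attributes.put("scratch.note", "temporary"); // hypothetical attribute

        Set<String> needed = new HashSet<>(Arrays.asList("filename", "resource.type"));
        System.out.println(keysToDrop(attributes, needed)); // prints [path, scratch.note]
    }
}
```

Inside a processor, a set like this could be handed to ProcessSession.removeAllAttributes before the flowfile is transferred; whether "filename" and "path" belong on the allow-list depends on what the downstream processors actually read.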
Use of attribute uuid and other "native" attributes
I have a custom processor, /SplitHl7v4Resources/, that splits out individual FHIR resources (Patients, Observations, Encounters, etc.) from great Bundle flowfiles. So, a given flowfile is split into hundreds of smaller ones.

When I do this, I leave the existing NiFi attributes as they were on the original flowfile.

As I contemplate the uuid attribute, it occurs to me that I should find out what its *significance is for provenance and other potential debugging/tracing concerns*. I never really look at it, but, if there were some kind of melt-down in a production environment, would I care that it multiplied across hundreds of flowfiles besides the original one?

Also, these two other NiFi attributes remain unchanged:

    filename
    path

I do garnish each flowfile with many pointed/significant new attributes like resource.type that are my own. In my processing, I don't care about NiFi's original attributes, but should I?

Thanks,
Russ