[ 
https://issues.apache.org/jira/browse/NIFI-11464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17713651#comment-17713651
 ] 

Patrick A. Mol edited comment on NIFI-11464 at 4/18/23 4:33 PM:
----------------------------------------------------------------

The core of the issue is {color:#ffab00}highlighted{color} below.

My parent PG is called *reusables* and has a controller service 
_same_name_avro_reader_ with identifier {*}R123{*}.
Within this parent I create the flow {*}reusable_flow_R{*}, which uses the 
_same_name_avro_reader_ controller service. 
I commit the flow, and it is stored with external controller service 
information: identifier "{*}R123{*}", name {_}same_name_avro_reader{_}.

I create another parent PG called *tenant_flow_T* and create a controller 
service _same_name_avro_reader_ that has identifier {*}T123{*}.
I then import *reusable_flow_R* into {*}tenant_flow_T{*}.
The _same_name_avro_reader_ reference gets resolved to the controller service 
defined in *tenant_flow_T* with identifier {*}T123{*}. 
The name-based resolution works as intended.  =====>>  +that is not the issue+ 
<<=====
I then proceed to store *tenant_flow_T* in version control. 
The stored version of *tenant_flow_T* includes, as far as I know:
 * the controller service same_name_avro_reader with identifier *T123*
 * a reference to where the nested versioned flow can be found in the registry.
{color:#ffab00}The information that *reusable_flow_R* was configured to use the 
controller service with identifier *T123* is lost at this point, i.e. it is not 
stored in the registry.{color}

Now I want to run *tenant_flow_T* in production, so I import the flow from 
version control.
The flow gets imported, including the controller service definition for 
_same_name_avro_reader_ with identifier {*}T123{*}.
{color:#ffab00}The nested versioned flow *reusable_flow_R* is pulled from the 
registry as part of the checkout
and the controller services property is set to the identifiers it was stored 
with in the registry, i.e. {*}R123{*},
which is of course invalid because *tenant_flow_T* does not know about a 
controller service with identifier {*}R123{*}. {color}
{color:#ffab00}Unlike when you import a versioned flow in the UI, the 
name-based resolution switching *R123* to *T123* does not occur this 
time.{color}
{color:#ffab00}In theory this could be fixed by applying the same logic, but 
you would be confined to using controller services with the same name.{color}
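
To make the name-based resolution concrete, here is a minimal sketch in Python of the logic I have in mind. It is an illustration only and not NiFi's actual implementation; the function name, the argument shapes, and the property name "record-reader" are assumptions made up for this example.

```python
from typing import Dict, List, Tuple


def resolve_by_name(
    stored_refs: Dict[str, str],   # stored identifier -> service name (kept with the versioned flow)
    available: Dict[str, str],     # identifier -> name of services visible in the importing hierarchy
    properties: Dict[str, str],    # property name -> value (may hold a stored identifier)
) -> Tuple[Dict[str, str], List[str]]:
    """Swap stored controller service identifiers for same-name services.

    Hypothetical helper: returns the updated properties plus the names of
    properties whose reference could not be resolved unambiguously.
    """
    # Index the services of the importing hierarchy by name.
    by_name: Dict[str, List[str]] = {}
    for ident, name in available.items():
        by_name.setdefault(name, []).append(ident)

    resolved = dict(properties)
    unresolved: List[str] = []
    for prop, value in properties.items():
        if value not in stored_refs:
            continue  # not a reference to an external controller service
        if value in available:
            continue  # identifier is already valid in this hierarchy
        candidates = by_name.get(stored_refs[value], [])
        if len(candidates) == 1:
            resolved[prop] = candidates[0]
        else:
            unresolved.append(prop)  # no match, or more than one service with that name
    return resolved, unresolved


stored = {"R123": "same_name_avro_reader"}
tenant = {"T123": "same_name_avro_reader"}
props = {"record-reader": "R123"}
print(resolve_by_name(stored, tenant, props))
# ({'record-reader': 'T123'}, [])
```

The sketch also shows the limitation: when no service with the stored name exists, or more than one does, the reference stays on *R123* and the component remains invalid.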

{color:#ffab00}*While a flow is valid in development, it might not be valid 
after doing a fresh import. It is not guaranteed.*{color}

Important side-effect.
+One would not be able to use the flow with the ExecuteStateless processor if a 
fresh checkout results in a flow with invalid components.+

Workarounds
 * Change the references after import, 
but I would need to repeat this every time I check out a new instance of the 
flow.
 * Develop and commit *reusable_flow_R* in the confines of {*}tenant_flow_T{*}, 
so the version of *reusable_flow_R* is stored with *T123* instead of {*}R123{*}.
{color:#ffab00}This means I am no longer developing reusables independently or 
separately.{color}

Implications for development.
When you import a reusable for the first time, you see a UUID, not a name, when 
the controller service cannot be resolved.
You would need to enforce strict naming policies across the board, 
or find out which names are used before setting up controller services, for the 
name-based resolution to work.
I think this is a huge burden.
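
As an illustration of the "find out what names are used" step, the following Python sketch reads an exported flow version and lists the external controller service names it expects. It assumes the exported JSON has the top-level "externalControllerServices" layout shown in the issue description; the helper names are hypothetical and this is not a NiFi or NiFi Registry API.

```python
import json
from typing import Dict, Iterable, List


def expected_service_names(flow_version_json: str) -> Dict[str, str]:
    """Map stored identifier -> name for each external controller service."""
    doc = json.loads(flow_version_json)
    external = doc.get("externalControllerServices") or {}
    return {ident: entry.get("name") for ident, entry in external.items()}


def missing_names(flow_version_json: str, available_names: Iterable[str]) -> List[str]:
    """Names the flow expects that the importing hierarchy does not provide."""
    expected = set(expected_service_names(flow_version_json).values())
    return sorted(expected - set(available_names))


# Sample trimmed from the exported version 2 of reusable_flow_Q in the issue.
sample = """
{
  "externalControllerServices": {
    "dc884171-4d75-3854-8604-afab91bd0e60": {
      "identifier": "dc884171-4d75-3854-8604-afab91bd0e60",
      "name": "reusables_avro_reader"
    },
    "b512b238-cdee-3642-b5cb-0c98d30dd133": {
      "identifier": "b512b238-cdee-3642-b5cb-0c98d30dd133",
      "name": "reusables_avro_writer"
    }
  }
}
"""

print(missing_names(sample, ["reusables_avro_reader"]))
# ['reusables_avro_writer']
```

Having to run a check like this for every reusable, before creating matching controller services by hand, is the kind of overhead I mean.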


> importing a versioned flow with a nested versioned flow shows nested 
> versioned flow with invalid controller services.
> ---------------------------------------------------------------------------------------------------------------------
>
>                 Key: NIFI-11464
>                 URL: https://issues.apache.org/jira/browse/NIFI-11464
>             Project: Apache NiFi
>          Issue Type: Bug
>          Components: Flow Versioning
>    Affects Versions: 1.21.0
>         Environment: nifi 1.21.0 and nifi registry 1.21.0 (on ubuntu 20.04)
>            Reporter: Patrick A. Mol
>            Priority: Major
>         Attachments: exported_flow_versions_pretty.zip, 
> image-2023-04-17-17-34-08-898.png, image-2023-04-17-18-06-16-102.png, 
> nested_versioned_flow_issue.xml, screenprints_reproduction_steps.zip
>
>
> When a flow (reusable_flow_Q) has controller services inherited from the 
> hierarchy (process group reusables) and a version of the flow is stored, the 
> flow version contains the references to these external controller services 
> (as seen in an exported flow version [see below]).
> When this versioned flow is imported in another flow (tenant_flows) the 
> controller services need to be reset to the controller services in the new 
> hierarchy.
> When we have a working flow with the nested versioned flow ready in 
> development we can check this flow into version control.
> When we then deploy the flow in production, the nested versioned flow shows 
> up with invalid components. It shows the external controller service 
> identifiers as stored in the flow version.
> When we then go back to the development version of tenant_flows, make a minor 
> change to the nested versioned flow reusable_flow_Q, and commit this change to 
> version control, we also need to commit the changes for the tenant_flows 
> process group because of this version change.
> When we now go back to production, and import this new version of 
> tenant_flows, the nested versioned flow reusable_flow_Q does not have invalid 
> controller services.
> If you have several flows under development using the same reusable 
> components, 
> you will likely end up with invalid components after import.
> Depending on the number of versioned flows used, it could be a lot of work.
> It could also lead to issues when using the ExecuteStateless processor.
> Please see the attached template nested_versioned_flow_issue.xml for a starting 
> point to reproduce the issue. It contains the steps.
> Screenprints showing the process and diagnosis are attached in a zip file.
> Controller service identifiers in version 2.
> {code:java}
> $ fgrep -C 4 reusables_avro reusable_flow_Q.json.pretty 
>     "controllerServices": [
>       {
>         "identifier": "dc884171-4d75-3854-8604-afab91bd0e60",
>         "instanceIdentifier": "8f647d06-0187-1000-4be9-14a61f55d904",
>         "name": "reusables_avro_reader",
>         "comments": "",
>         "type": "org.apache.nifi.avro.AvroReader",
>         "bundle": {
>           "group": "org.apache.nifi",
> --
>       },
>       {
>         "identifier": "b512b238-cdee-3642-b5cb-0c98d30dd133",
>         "instanceIdentifier": "8f64f2c6-0187-1000-7557-ca63c88054dd",
>         "name": "reusables_avro_writer",
>         "comments": "",
>         "type": "org.apache.nifi.avro.AvroRecordSetWriter",
>         "bundle": {
>           "group": "org.apache.nifi",
> $ head -15 reusable_flow_Q-version-2.json.pretty 
> {
>   "externalControllerServices": {
>     "dc884171-4d75-3854-8604-afab91bd0e60": {
>       "identifier": "dc884171-4d75-3854-8604-afab91bd0e60",
>       "name": "reusables_avro_reader"
>     },
>     "b512b238-cdee-3642-b5cb-0c98d30dd133": {
>       "identifier": "b512b238-cdee-3642-b5cb-0c98d30dd133",
>       "name": "reusables_avro_writer"
>     }
>   },
>   "flowContents": {
>     "comments": "used to perform Q ...",
>     "componentType": "PROCESS_GROUP",
>     "connections": [
>  {code}
> Controller service identifiers with version 3 committed in process group 
> "tenant_flows".
> {code:java}
> pmo@hpmo:~/Documents.local/nested_versioned_flows_controller_issue$ fgrep -C 4 tenant_flow_avro tenant_flows-version-1.json.pretty 
>         ],
>         "groupIdentifier": "a984831b-8587-3e17-bbbc-ef4b85c3898d",
>         "identifier": "5d9df37d-2a52-3f6e-8cd3-3d3ea9550d22",
>         "instanceIdentifier": "8f6cb319-0187-1000-b7fa-83340f7055f7",
>         "name": "tenant_flow_avro_writer",
>         "properties": {
>           "compression-format": "NONE",
>           "Schema Write Strategy": "avro-embedded",
>           "schema-name": "${schema.name}",
> --
>         ],
>         "groupIdentifier": "a984831b-8587-3e17-bbbc-ef4b85c3898d",
>         "identifier": "8ff96d88-3dc8-30ed-aeb8-757c26a7b807",
>         "instanceIdentifier": "8f6c8a94-0187-1000-af54-2fee12838618",
>         "name": "tenant_flow_avro_reader",
>         "properties": {
>           "schema-name": "${schema.name}",
>           "cache-size": "1000",
>           "schema-access-strategy": "embedded-avro-schema",
> pmo@hpmo:~/Documents.local/nested_versioned_flows_controller_issue$ head -15 reusable_flow_Q-version-3.json.pretty 
> {
>   "externalControllerServices": {
>     "8ff96d88-3dc8-30ed-aeb8-757c26a7b807": {
>       "identifier": "8ff96d88-3dc8-30ed-aeb8-757c26a7b807",
>       "name": "tenant_flow_avro_reader"
>     },
>     "5d9df37d-2a52-3f6e-8cd3-3d3ea9550d22": {
>       "identifier": "5d9df37d-2a52-3f6e-8cd3-3d3ea9550d22",
>       "name": "tenant_flow_avro_writer"
>     }
>   },
>   "flowContents": {
>     "comments": "used to perform Q ...",
>     "componentType": "PROCESS_GROUP",
>     "connections": [
>  {code}


