[jira] [Updated] (NIFI-13077) On-demand Extension Provider

2024-06-03 Thread James Guzman (Medel) (Jira)


 [ 
https://issues.apache.org/jira/browse/NIFI-13077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Guzman (Medel) updated NIFI-13077:

Description: 
We currently have the concept of *ExternalResourceProvider* with two 
implementations (HDFS and NiFi Registry) that can be configured to list and 
download all NARs made available in those locations. Those implementations, if 
configured, start when NiFi starts and download ALL of the available NARs, and 
a background thread then checks every five minutes for new NARs to download.

The proposal here is to have a similar concept that would focus on extensions / 
components. Instead of having a background thread and downloading all of the 
components, the approach would be to plug this into the *ExtensionBuilder*. 
When a component cannot be instantiated (when loading a flow definition) with 
locally available components, then, instead of creating a ghost component, the 
Extension Providers would be queried with specific coordinates. If a provider 
makes the component available, the NAR would be downloaded (alongside required 
dependencies if the NAR depends on another NAR).

This approach already exists in the *Kafka Connect NiFi plugin* with the class 
{*}ExtensionClientDefinition{*}. By adopting this approach in NiFi, it’d be 
much easier to ship a much *smaller version of NiFi* and have NiFi download the 
required components based on flows that are being instantiated / deployed.

The operation of downloading the NAR would not be blocking, meaning that we 
would still create a ghost component but after completion of the NAR(s) 
download and the loading of the components, the flows would be fully 
operational.
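
As a rough sketch of the non-blocking behaviour described above (not NiFi 
code): the component starts as a ghost and changes state once the download task 
finishes. The names ComponentState, OnDemandComponent and loadAsync are 
hypothetical, and the download itself is passed in as a plain Runnable.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical states for a component whose NAR is fetched on demand.
enum ComponentState { GHOST, LOADED, DOWNLOAD_FAILED }

// Hypothetical holder; real NiFi component nodes are not modelled here.
class OnDemandComponent {
    private final AtomicReference<ComponentState> state =
            new AtomicReference<>(ComponentState.GHOST);

    ComponentState getState() {
        return state.get();
    }

    // Starts the download on another thread and returns immediately.
    // The component stays a ghost until the download task completes.
    CompletableFuture<ComponentState> loadAsync(Runnable downloadNar) {
        return CompletableFuture.runAsync(downloadNar)
                .handle((ignored, error) -> {
                    ComponentState next = (error == null)
                            ? ComponentState.LOADED
                            : ComponentState.DOWNLOAD_FAILED;
                    state.set(next);
                    return next;
                });
    }
}
```

The caller is never blocked: it can keep the returned future to update the UI 
when the state changes, or ignore it and poll getState().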

It might be possible to show something similar to what we do for the Python 
extensions, where we show that the component is still in the process of 
downloading third-party dependencies.

While this is a great opportunity to reduce the size of the NiFi binary (and 
associated container image), it would not be great from a user perspective when 
designing flows because all of the NARs removed from the default image would no 
longer be visible in the list of available components when adding, for example, 
a processor to the canvas.

Longer term we could imagine that the extension providers can also implement a 
listing API so that when showing the list of available components, we would 
show the list of the components available locally as well as the components 
available through the extension providers. The listing of components could add 
another column to indicate the source of the component.

This is something that is exposed for the Extension Bundles in the NiFi 
Registry (we also have the information about the NiFi API version that has been 
used for building the components so we could use this information to only list 
components that should be compatible from an API standpoint - same major 
version but lower or equal API version).
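
The compatibility rule in parentheses could be sketched as follows. This is 
only an illustration: ApiCompatibility and its methods are hypothetical names, 
and versions are assumed to be plain numeric "major.minor.patch" strings with 
no qualifier such as -SNAPSHOT.

```java
// Hypothetical helper; versions are assumed to look like "2.1.0".
final class ApiCompatibility {
    private ApiCompatibility() { }

    // Splits a version into up to three numeric parts; missing parts are 0.
    static int[] parse(String version) {
        String[] parts = version.split("\\.");
        int[] numbers = new int[3];
        for (int i = 0; i < numbers.length && i < parts.length; i++) {
            numbers[i] = Integer.parseInt(parts[i]);
        }
        return numbers;
    }

    // True when the component was built against the same major API version
    // as the running instance, and not against a newer minor/patch version.
    static boolean isCompatible(String componentApi, String runtimeApi) {
        int[] built = parse(componentApi);
        int[] running = parse(runtimeApi);
        if (built[0] != running[0]) {
            return false;
        }
        if (built[1] != running[1]) {
            return built[1] < running[1];
        }
        return built[2] <= running[2];
    }
}
```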

The immediate goal though would be to introduce the concept of 
ExtensionProvider with the following APIs:
{code:java}
boolean isAvailableExtension(Coordinates)
void downloadExtension(Coordinates)
{code}
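
To make the two methods concrete, here is a minimal sketch of what the 
interface and a trivial implementation might look like. Everything beyond the 
two method names is an assumption: the shape of Coordinates (group, artifact, 
version), the parameter names, and the InMemoryExtensionProvider class, which 
only records what it was asked to download.

```java
import java.util.HashSet;
import java.util.Set;

// Assumed shape: group / artifact / version identifying a NAR.
record Coordinates(String group, String artifact, String version) { }

// The two methods named in the proposal, with a parameter name added.
interface ExtensionProvider {
    boolean isAvailableExtension(Coordinates coordinates);
    void downloadExtension(Coordinates coordinates);
}

// Hypothetical provider backed by an in-memory catalog, for illustration only.
class InMemoryExtensionProvider implements ExtensionProvider {
    private final Set<Coordinates> catalog;
    private final Set<Coordinates> downloaded = new HashSet<>();

    InMemoryExtensionProvider(Set<Coordinates> catalog) {
        this.catalog = Set.copyOf(catalog);
    }

    @Override
    public boolean isAvailableExtension(Coordinates coordinates) {
        return catalog.contains(coordinates);
    }

    @Override
    public void downloadExtension(Coordinates coordinates) {
        if (!isAvailableExtension(coordinates)) {
            throw new IllegalArgumentException("Not available: " + coordinates);
        }
        // A real provider would fetch the NAR file here; this one only
        // remembers that the download was requested.
        downloaded.add(coordinates);
    }

    Set<Coordinates> downloaded() {
        return Set.copyOf(downloaded);
    }
}
```
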
Longer term we could also consider something like:
{code:java}
List listExtensions()
{code}
But we would need to figure out how a NAR can provide the information about the 
components that are inside of it. The NiFi Registry provides this information, 
but that would not be the case for a Maven-based implementation, for example.

In nifi.properties we would have something looking like:
{code:java}
nifi.nar.extension.provider..
{code}
And we would loop through all the configured providers to find the appropriate 
NAR to download based on provided coordinates in the flow definition that is 
being instantiated (either from flow.json.gz, or an uploaded JSON flow 
definition, or when checking out a flow from a registry client).
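
The lookup loop could be sketched like this, assuming providers are consulted 
in their configured order and the first one that has the NAR wins. The 
ExtensionResolver class, the Coordinates shape and the parameter names are 
hypothetical; only the two provider methods come from the proposal.

```java
import java.util.List;
import java.util.Optional;

// Assumed shape: group / artifact / version identifying a NAR.
record Coordinates(String group, String artifact, String version) { }

interface ExtensionProvider {
    boolean isAvailableExtension(Coordinates coordinates);
    void downloadExtension(Coordinates coordinates);
}

// Hypothetical resolver: asks each configured provider in order and stops
// at the first one that reports the extension as available.
class ExtensionResolver {
    private final List<ExtensionProvider> providers;

    ExtensionResolver(List<ExtensionProvider> providers) {
        this.providers = List.copyOf(providers);
    }

    // Returns the provider that served the NAR, or empty if no provider has
    // it (in which case the component would remain a ghost).
    Optional<ExtensionProvider> resolve(Coordinates coordinates) {
        for (ExtensionProvider provider : providers) {
            if (provider.isAvailableExtension(coordinates)) {
                provider.downloadExtension(coordinates);
                return Optional.of(provider);
            }
        }
        return Optional.empty();
    }
}
```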

  was:
We currently have the concept of ExternalResourceProvider with two 
implementations (HDFS and NiFi Registry) that can be configured to list and 
download all NARs made available in those locations. Those implementations, if 
configured, would get started when NiFi starts and would download ALL of the 
available NARs, plus a background thread would check every five minutes for new 
NARs to be available and downloaded.

The proposal here is to have a similar concept that would focus on extensions / 
components but instead of having a background thread and instead of having all 
of the components downloaded, the approach would be to plug this into the 
ExtensionBuilder and when a component cannot be instantiated (when loading a 
flow definition) with locally available components, then, instead of creating a 
ghost component, the Extension Providers would be queried with specific 
coordinates and if the provider makes the component available, then the NAR 

[jira] [Updated] (NIFI-13077) On-demand Extension Provider

2024-04-22 Thread Pierre Villard (Jira)


 [ 
https://issues.apache.org/jira/browse/NIFI-13077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pierre Villard updated NIFI-13077:
--
Description: 
We currently have the concept of ExternalResourceProvider with two 
implementations (HDFS and NiFi Registry) that can be configured to list and 
download all NARs made available in those locations. Those implementations, if 
configured, start when NiFi starts and download ALL of the available NARs, and 
a background thread then checks every five minutes for new NARs to download.

The proposal here is to have a similar concept that would focus on extensions / 
components. Instead of having a background thread and downloading all of the 
components, the approach would be to plug this into the ExtensionBuilder. 
When a component cannot be instantiated (when loading a flow definition) with 
locally available components, then, instead of creating a ghost component, the 
Extension Providers would be queried with specific coordinates. If a provider 
makes the component available, the NAR would be downloaded (alongside required 
dependencies if the NAR depends on another NAR).

This approach already exists in the Kafka Connect NiFi plugin with the class 
ExtensionClientDefinition. By adopting this approach in NiFi, it’d be much 
easier to ship a much smaller version of NiFi and have NiFi download the 
required components based on flows that are being instantiated / deployed.

The operation of downloading the NAR would not be blocking, meaning that we 
would still create a ghost component but after completion of the NAR(s) 
download and the loading of the components, the flows would be fully 
operational.

It might be possible to show something similar to what we do for the Python 
extensions, where we show that the component is still in the process of 
downloading third-party dependencies.

While this is a great opportunity to reduce the size of the NiFi binary (and 
associated container image), it would not be great from a user perspective when 
designing flows because all of the NARs removed from the default image would no 
longer be visible in the list of available components when adding, for example, 
a processor to the canvas.

Longer term we could imagine that the extension providers can also implement a 
listing API so that when showing the list of available components, we would 
show the list of the components available locally as well as the components 
available through the extension providers. The listing of components could add 
another column to indicate the source of the component.

This is something that is exposed for the Extension Bundles in the NiFi 
Registry (we also have the information about the NiFi API version that has been 
used for building the components so we could use this information to only list 
components that should be compatible from an API standpoint - same major 
version but lower or equal API version).

The immediate goal though would be to introduce the concept of 
ExtensionProvider with the following APIs:
{code:java}
boolean isAvailableExtension(Coordinates)
void downloadExtension(Coordinates)
{code}
Longer term we could also consider something like:
{code:java}
List listExtensions()
{code}
But we would need to figure out how a NAR can provide the information about the 
components that are inside of it. The NiFi Registry provides this information, 
but that would not be the case for a Maven-based implementation, for example.

In nifi.properties we would have something looking like:
{code:java}
nifi.nar.extension.provider..
{code}
And we would loop through all the configured providers to find the appropriate 
NAR to download based on provided coordinates in the flow definition that is 
being instantiated (either from flow.json.gz, or an uploaded JSON flow 
definition, or when checking out a flow from a registry client).

  was:
We currently have the concept of ExternalResourceProvider with two 
implementations (HDFS and NiFi Registry) that can be configured to list and 
download all NARs made available in those locations. Those implementations, if 
configured, would get started when NiFi starts and would download ALL of the 
available NARs, plus a background thread would check every five minutes for new 
NARs to be available and downloaded.

The proposal here is to have a similar concept that would focus on extensions / 
components but instead of having a background thread and instead of having all 
of the components downloaded, the approach would be to plug this into the 
ExtensionBuilder and when a component cannot be instantiated (when loading a 
flow definition) with locally available components, then, instead of creating a 
ghost component, the Extension Providers would be queried with specific 
coordinates and if the provider makes the component available, then the NAR 
would be downloaded