[jira] [Commented] (NIFI-13077) On-demand Extension Provider

2024-06-04 Thread Pierre Villard (Jira)


[ 
https://issues.apache.org/jira/browse/NIFI-13077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17851973#comment-17851973
 ] 

Pierre Villard commented on NIFI-13077:
---

As a side note, I think it is good to also keep a close eye on what [~bbende] 
is doing in NIFI-13343.

I think we all want to go in the same direction. Let's not duplicate efforts :)

> On-demand Extension Provider
> 
>
> Key: NIFI-13077
> URL: https://issues.apache.org/jira/browse/NIFI-13077
> Project: Apache NiFi
> Issue Type: Epic
> Components: Core Framework
> Reporter: Pierre Villard
> Priority: Major
>
> We currently have the concept of *ExternalResourceProvider* with two 
> implementations (HDFS and NiFi Registry) that can be configured to list and 
> download all NARs made available in those locations. Those implementations, 
> if configured, would get started when NiFi starts and would download ALL of 
> the available NARs, plus a background thread would check every five minutes 
> for new NARs to be available and downloaded.
> The proposal here is to have a similar concept that would focus on extensions 
> / components but instead of having a background thread and instead of having 
> all of the components downloaded, the approach would be to plug this into the 
> *ExtensionBuilder* and when a component cannot be instantiated (when loading 
> a flow definition) with locally available components, then, instead of 
> creating a ghost component, the Extension Providers would be queried with 
> specific coordinates and if the provider makes the component available, then 
> the NAR would be downloaded (alongside required dependencies if the NAR 
> depends on another NAR).
> This approach already exists in the *Kafka Connect NiFi plugin* with the 
> class *ExtensionClientDefinition*. By adopting this approach in NiFi, 
> it’d be much easier to ship a much *smaller version of NiFi* and have NiFi 
> download the required components based on flows that are being instantiated / 
> deployed.
> The operation of downloading the NAR would not be blocking, meaning that we 
> would still create a ghost component but after completion of the NAR(s) 
> download and the loading of the components, the flows would be fully 
> operational.
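
The non-blocking behaviour described above could look roughly like the following. This is only a sketch, assuming the download runs on a background thread and the ghost component is updated once it completes. `NonBlockingLoadSketch`, `ComponentHandle`, `State` and `loadInBackground` are hypothetical names for illustration, not existing NiFi classes.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicReference;

final class NonBlockingLoadSketch {

    /** Hypothetical component states: a ghost placeholder or a fully loaded component. */
    enum State { GHOST, LOADED }

    /** Hypothetical stand-in for a component referenced by a flow definition. */
    static final class ComponentHandle {
        private final AtomicReference<State> state = new AtomicReference<>(State.GHOST);

        State state() {
            return state.get();
        }

        void markLoaded() {
            state.set(State.LOADED);
        }
    }

    /**
     * Returns immediately; the handle stays a ghost until the download task
     * finishes. If the task throws, the returned future completes
     * exceptionally and the handle remains a ghost.
     */
    static CompletableFuture<ComponentHandle> loadInBackground(ComponentHandle ghost, Runnable downloadNar) {
        return CompletableFuture.runAsync(downloadNar)
                .thenApply(ignored -> {
                    ghost.markLoaded();
                    return ghost;
                });
    }
}
```

The caller keeps working with the ghost handle and can use the returned future to report progress, which is where a "still downloading" indicator could hook in.
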
> It might be possible to show something similar to what we do for the Python 
> extensions, where we show that the component is still in the process of 
> downloading third-party dependencies.
> While this is a great opportunity to reduce the size of the NiFi binary (and 
> associated container image), it would not be great from a user perspective 
> when designing flows because all of the NARs removed from the default image 
> would no longer be visible in the list of available components when adding, 
> for example, a processor to the canvas.
> Longer term, we could imagine that the extension providers also implement a 
> listing API so that, when showing the list of available components, we would 
> show the components available locally as well as the components available 
> through the extension providers. The listing of components could add another 
> column to indicate the source of the component.
> This is something that is exposed for the Extension Bundles in the NiFi 
> Registry (we also have the information about the NiFi API version that has 
> been used for building the components so we could use this information to 
> only list components that should be compatible from an API standpoint - same 
> major version but lower or equal API version).
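
One possible reading of the compatibility rule above ("same major version but lower or equal API version") is sketched below. The class and method names are hypothetical. The sketch assumes plain `major.minor.patch` version strings and ignores any qualifier after a hyphen (for example `-SNAPSHOT`), which may not match how NiFi would actually compare versions.

```java
final class ApiCompatibilitySketch {

    /** Parses "major.minor.patch" into three ints, dropping any "-qualifier" suffix. */
    static int[] parse(String version) {
        String[] parts = version.split("-", 2)[0].split("\\.");
        int[] numbers = new int[3];
        for (int i = 0; i < numbers.length && i < parts.length; i++) {
            numbers[i] = Integer.parseInt(parts[i]);
        }
        return numbers;
    }

    /**
     * True when the component was built against the same major API version
     * as the running instance, and against an API version that is lower
     * than or equal to the running one.
     */
    static boolean isCompatible(String builtAgainstApi, String runningApi) {
        int[] built = parse(builtAgainstApi);
        int[] running = parse(runningApi);
        if (built[0] != running[0]) {
            return false; // different major version
        }
        for (int i = 1; i < 3; i++) {
            if (built[i] != running[i]) {
                return built[i] < running[i];
            }
        }
        return true; // identical versions
    }
}
```
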
> The immediate goal though would be to introduce the concept of 
> ExtensionProvider with the following APIs:
> {code:java}
> boolean isAvailableExtension(Coordinates)
> void downloadExtension(Coordinates)
> {code}
> Longer term we could also consider something like:
> {code:java}
> List listExtensions()
> {code}
> But we would need to figure out how a NAR can provide the information about 
> the components that are inside of it. The NiFi Registry provides this 
> information, but that would not be the case for a Maven based implementation 
> for example.
> In nifi.properties we would have something looking like:
> {code:java}
> nifi.nar.extension.provider..
> {code}
> And we would loop through all the configured providers to find the 
> appropriate NAR to download based on provided coordinates in the flow 
> definition that is being instantiated (either from flow.json.gz, or an 
> uploaded JSON flow definition, or when checking out a flow from a registry 
> client).
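
To make the proposal above a bit more concrete, here is a rough sketch of the two proposed methods and of the loop over configured providers. Only the method names `isAvailableExtension` and `downloadExtension` come from the ticket; the shape of `Coordinates`, the `InMemoryProvider` test double and the `resolve` helper are assumptions for illustration. Downloading dependent NARs is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.Set;

final class ExtensionProviderSketch {

    /** Assumed shape of bundle coordinates: group, artifact id, version. */
    record Coordinates(String group, String id, String version) {}

    /** The two methods named in the ticket; the parameter type is assumed. */
    interface ExtensionProvider {
        boolean isAvailableExtension(Coordinates coordinates);

        void downloadExtension(Coordinates coordinates);
    }

    /** Hypothetical provider backed by a fixed set, used only to exercise the loop. */
    static final class InMemoryProvider implements ExtensionProvider {
        private final Set<Coordinates> available;
        final List<Coordinates> downloaded = new ArrayList<>();

        InMemoryProvider(Set<Coordinates> available) {
            this.available = available;
        }

        @Override
        public boolean isAvailableExtension(Coordinates coordinates) {
            return available.contains(coordinates);
        }

        @Override
        public void downloadExtension(Coordinates coordinates) {
            downloaded.add(coordinates); // a real provider would fetch the NAR here
        }
    }

    /**
     * Queries the providers in configuration order and downloads from the
     * first one that has the requested coordinates. An empty result means
     * the caller would keep the ghost component.
     */
    static Optional<ExtensionProvider> resolve(List<? extends ExtensionProvider> providers,
                                               Coordinates coordinates) {
        for (ExtensionProvider provider : providers) {
            if (provider.isAvailableExtension(coordinates)) {
                provider.downloadExtension(coordinates);
                return Optional.of(provider);
            }
        }
        return Optional.empty();
    }
}
```

Stopping at the first match keeps provider ordering meaningful: whichever provider is configured first wins when several offer the same coordinates.
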



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (NIFI-13077) On-demand Extension Provider

2024-06-03 Thread James Guzman (Medel) (Jira)


[ 
https://issues.apache.org/jira/browse/NIFI-13077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17851852#comment-17851852
 ] 

James Guzman (Medel) commented on NIFI-13077:
-

Thanks [~pvillard] for sharing this Jira ticket in the NIFI-13301 ticket. Yes, 
there are many great benefits to having an On-Demand Extension Provider. One 
of the benefits that caught my attention as I was reading your description was 
the ability to create smaller versions of Apache NiFi. When building both NiFi 
and MiNiFi CPP from source, I have several times had the build fail because 
some external libraries failed to download or failed to build. As I start 
working on NIFI-13301, I will keep your NIFI-13077 in mind too. I really 
appreciate your extra input on this area. I was talking about this area with 
[~bbende] and [~joewitt]. We'll most likely need to break this feature into 
multiple sub-tasks.
