markap14 commented on pull request #5059:
URL: https://github.com/apache/nifi/pull/5059#issuecomment-838659499
Thanks for the contribution @simonbence! Looking through this i think ideas
are sound. However, this is updating the `nifi-api` module, and once it's been
released it's permanent. So we need to be extremely careful about anything that
we put in there. Naming, Exceptions that should be thrown, and the correct
level of abstraction. So I have some thoughts around this.
- What this 'extension' is really doing has nothing to do with Auto-Loading.
It has to do with sourcing the NAR's from some other location. And given the
interface provided in the PR, things like `start()` and `stop()` don't make a
lot of sense to me, to be honest. When I see that, I feel like the component
should be starting some background thread or something maybe so that it can
poll? But then the background thread is created elsewhere. I think what would
be a more appropriate level of abstraction would be:
```
public interface NarProvider {
/**
* Initializes the NAR Provider based on the given set of properties
*/
void initialize(NarProviderInitializationContext context);
/**
* Performs a listing of all NAR's that are available.
*/
Collection listNars() throws IOException;
/**
* Fetches the NAR at the given location. The location should be one of
the values returned by listNars().
*/
InputStream fetchNarContents(String location) throws IOException;
}
```
And here, `NarProviderInitializlationContext` would perhaps just have a
single method:
```
Map getProperties();
```
But this abstraction gives us the freedom to update it in the future.
I also think it's important to consider how those properties get provided to
the implementation. The way that we typically handle something like this within
NiFi would be to have the framework look at `nifi.properties` and pull out the
properties appropriate for the instance. So, perhaps we would look for any
property that has the format:
```
nifi.library.nar.autoload..implementation
nifi.library.nar.autoload..XYZ
```
And then build up a Map where the key is the property name
after and the value is the property value. For example, if we have in
nifi.properties:
```
nifi.library.nar.autoload.hdfs.implementation=org.apache.nifi.hdfs.FetchNarHdfs
nifi.library.nar.autoload.hdfs.kerberos=username
nifi.library.nar.autoload.hdfs.hello=world
```
This would produce a Map that could be provided to the `initialize` method
(via the context):
kerberos => username
hello => world
(No need to include `implementation` necessarily because it's used by the
framework to know the name of the class to use but doesn't really need to be
passed to the class itself).
A few additional thoughts on the implementation:
- With this approach, we only need to have extensions for determining what's
available and downloading it. The Auto-Loader should not be pluggable, and it
should be responsible for calling out to each NarProvider to get a listing, and
then triggering the download of each NAR.
- It makes sense to support multiple Nar Providers, even multiple of the
same type. It's very reasonable to imagine an organization in which there are
multiple teams hosting nars in different locations, and a user may want to
fetch from multiple directories, etc. This would all work nicely with the above
property naming scheme.
- We should always include debug-level logging indicating which NAR's were
returned when performing the listing, and this should indicate how long it took
to perform that listing.
- Should log at an INFO level each time a NAR is downloaded and how long
that took
- NAR's should be written with a dot-name (`.my-bundle.nar`) and then
renamed so that the Extension Loader is able to detect it only when it's done.
It probably would make sense to download into a sub-directory of the
`extensions` directory. I.e., if the nar loader is configured to monitor the
`./extensions` directory we could create a directory `./extensions/hdfs` and
then update the Nar Loader to also scan that directory. This would make it
easier to understand where each of the NAR's came from if we have multiple
NarProviders.
- It's possible that over time users will continually add new versions of a
NAR to the same directory. And over time this will result in loading massive
amounts of NAR's into NiFi. Those that get removed will remain in NiFi. We will
need some way to age off old NARs. Perhaps after performing a listing, we
should detect any nar's that were previously downloaded from that NarProvider
and not present in the Listing. Then, if any of those is not in use (i.e., no
extensions are in the flow that use that bundle) then we can remove the bundle
and its NarLoader.