[GitHub] [nifi] markap14 commented on pull request #5059: NIFI-8519 Adding HDFS support for NAR autoload

2021-05-14 Thread GitBox


markap14 commented on pull request #5059:
URL: https://github.com/apache/nifi/pull/5059#issuecomment-841386401


   Thanks @simonbence ! All looks good now. Fixed one checkstyle violation but 
all was good otherwise. +1 merged to main!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [nifi] markap14 commented on pull request #5059: NIFI-8519 Adding HDFS support for NAR autoload

2021-05-13 Thread GitBox


markap14 commented on pull request #5059:
URL: https://github.com/apache/nifi/pull/5059#issuecomment-840846222


   One last thing that I think is necessary: please make sure that the admin 
guide is updated to describe the new properties.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [nifi] markap14 commented on pull request #5059: NIFI-8519 Adding HDFS support for NAR autoload

2021-05-13 Thread GitBox


markap14 commented on pull request #5059:
URL: https://github.com/apache/nifi/pull/5059#issuecomment-840845848


   
https://issues.apache.org/jira/secure/attachment/13025442/0001-NIFI-8519-Support-RequiresInstanceClassLoading-annot.patch
 is the patch that I attached.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [nifi] markap14 commented on pull request #5059: NIFI-8519 Adding HDFS support for NAR autoload

2021-05-13 Thread GitBox


markap14 commented on pull request #5059:
URL: https://github.com/apache/nifi/pull/5059#issuecomment-840691381


   There is no write up. Just a general observation: it doesn't really make 
sense IMO to allow for the extensions in nifi to live in either local 
filesystem or HDFS. What does make sense is local filesystem or some other 
pluggable location. HDFS will be the only additional location at the point this 
PR is merged in but it makes sense that it should lay the groundwork for it to 
be pluggable is all that I was trying to convey.
   
   Thanks @simonbence will take a look. And I think that's fair enough to push 
that to NIFI-8602.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [nifi] markap14 commented on pull request #5059: NIFI-8519 Adding HDFS support for NAR autoload

2021-05-12 Thread GitBox


markap14 commented on pull request #5059:
URL: https://github.com/apache/nifi/pull/5059#issuecomment-839940334


   @ottobackwards not sure VFS would provide what we need here. Not sure how it 
deals with all of the authentication etc. But most importantly, it doesn't 
really provide us the pluggability that we'd like to have here. We don't want 
to say "you can put extensions in your local file system or in HDFS." We want 
this to allow pluggability for any number of locations. We would also want to 
account for the fact that some systems may be offline, etc. and so we want to 
download the files locally, not just pull them on-demand each time. And we'd 
also have to consider all of the concerns with classloader isolation. The NAR 
Loader stuff at present is in the root lib. We certainly cannot put any sort of 
Hadoop libs, etc. at the root, so the logic needs to be isolated into NARs.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [nifi] markap14 commented on pull request #5059: NIFI-8519 Adding HDFS support for NAR autoload

2021-05-11 Thread GitBox


markap14 commented on pull request #5059:
URL: https://github.com/apache/nifi/pull/5059#issuecomment-838659499


   Thanks for the contribution @simonbence! Looking through this i think ideas 
are sound. However, this is updating the `nifi-api` module, and once it's been 
released it's permanent. So we need to be extremely careful about anything that 
we put in there. Naming, Exceptions that should be thrown, and the correct 
level of abstraction. So I have some thoughts around this.
   
   - What this 'extension' is really doing has nothing to do with Auto-Loading. 
It has to do with sourcing the NAR's from some other location. And given the 
interface provided in the PR, things like `start()` and `stop()` don't make a 
lot of sense to me, to be honest. When I see that, I feel like the component 
should be starting some background thread or something maybe so that it can 
poll? But then the background thread is created elsewhere. I think what would 
be a more appropriate level of abstraction would be:
   ```
   public interface NarProvider {
   /**
* Initializes the NAR Provider based on the given set of properties
*/
   void initialize(NarProviderInitializationContext context);
   
   /**
* Performs a listing of all NAR's that are available.
*/
   Collection listNars() throws IOException;
   
   /**
* Fetches the NAR at the given location. The location should be one of 
the values returned by listNars().
*/
   InputStream fetchNarContents(String location) throws IOException;
   }
   ```
   And here, `NarProviderInitializlationContext` would perhaps just have a 
single method:
   ```
   Map getProperties();
   ```
   But this abstraction gives us the freedom to update it in the future.
   
   I also think it's important to consider how those properties get provided to 
the implementation. The way that we typically handle something like this within 
NiFi would be to have the framework look at `nifi.properties` and pull out the 
properties appropriate for the instance. So, perhaps we would look for any 
property that has the format:
   ```
   nifi.library.nar.autoload..implementation
   nifi.library.nar.autoload..XYZ
   ```
   And then build up a Map where the key is the property name 
after  and the value is the property value. For example, if we have in 
nifi.properties:
   ```
   
nifi.library.nar.autoload.hdfs.implementation=org.apache.nifi.hdfs.FetchNarHdfs
   nifi.library.nar.autoload.hdfs.kerberos=username
   nifi.library.nar.autoload.hdfs.hello=world
   ```
   This would produce a Map that could be provided to the `initialize` method 
(via the context):
   kerberos => username
   hello => world
   (No need to include `implementation` necessarily because it's used by the 
framework to know the name of the class to use but doesn't really need to be 
passed to the class itself).
   
   A few additional thoughts on the implementation:
   - With this approach, we only need to have extensions for determining what's 
available and downloading it. The Auto-Loader should not be pluggable, and it 
should be responsible for calling out to each NarProvider to get a listing, and 
then triggering the download of each NAR.
   - It makes sense to support multiple Nar Providers, even multiple of the 
same type. It's very reasonable to imagine an organization in which there are 
multiple teams hosting nars in different locations, and a user may want to 
fetch from multiple directories, etc. This would all work nicely with the above 
property naming scheme.
   - We should always include debug-level logging indicating which NAR's were 
returned when performing the listing, and this should indicate how long it took 
to perform that listing.
   - Should log at an INFO level each time a NAR is downloaded and how long 
that took
   - NAR's should be written with a dot-name (`.my-bundle.nar`) and then 
renamed so that the Extension Loader is able to detect it only when it's done. 
It probably would make sense to download into a sub-directory of the 
`extensions` directory. I.e., if the nar loader is configured to monitor the 
`./extensions` directory we could create a directory `./extensions/hdfs` and 
then update the Nar Loader to also scan that directory. This would make it 
easier to understand where each of the NAR's came from if we have multiple 
NarProviders.
   - It's possible that over time users will continually add new versions of a 
NAR to the same directory. And over time this will result in loading massive 
amounts of NAR's into NiFi. Those that get removed will remain in NiFi. We will 
need some way to age off old NARs. Perhaps after performing a listing, we 
should detect any nar's that were previously downloaded from that NarProvider 
and not present in the Listing. Then, if any of those is not in use (i.e., no 
extensions are in the flow that use that bundle) then we can remove the bundle 
and its NarLoader.