ddanielr opened a new issue, #27: URL: https://github.com/apache/accumulo-classloaders/issues/27
This ticket contains a potential solution for https://github.com/NationalSecurityAgency/datawave-accumulo-plugins/issues/2. I did not add these details to that issue directly, to avoid conflating specific implementation requirements with the currently known requirements.

## Stage 1. Define possible components ##

- Top-level SimpleHDFSClassLoaderFactory class
- HDFS file fetcher (pluggable for testing)
- ContextPath structure
- Manifest file structure
- Context cleanup thread

#### Context path structure ####

The context path should be similar to `hdfs://test:8020/contexts/contextA/manifest.json`.

The manifest file format should be machine readable (JSON is not required, but it is used in this example).

#### Directory and Manifest file structure ####

The directory should contain a manifest file and the jars:

```
/tmp/local-contexts/contextA/manifest.json
/tmp/local-contexts/contextA/Iterators.jar
/tmp/local-contexts/contextA/IteratorsV2.jar
```

The manifest file should list each jar's name and checksum value:

```
{
  "context": "contextA",
  "jars": [
    {
      "name": "Iterators.jar",
      "checksum": "f2ca1bb6c7e907d06dafe4687e579fce76b37e4e93b7605022da52e6ccc26fd2"
    },
    {
      "name": "IteratorsV2.jar",
      "checksum": "934ee77f70dc82403618c154aa63af6f4ebbe3ac1eaf22df21e7e64f0fb6643d"
    }
  ]
}
```

## Stage 2. Create Factory ##

Create a SimpleHDFSClassLoaderFactory that implements the [ContextClassLoaderFactory](https://github.com/apache/accumulo/blob/b2c5fc4800272c9d669ad53d749602583b2ae724/core/src/main/java/org/apache/accumulo/core/spi/common/ContextClassLoaderFactory.java#L48) interface.

This class should use a cache that quickly returns classloaders for context names that have already been defined. Each cache entry should store the classloader and the local directory used as the file cache for that contextPath. The class should perform a property lookup to get the contextPath that corresponds to a given context name.
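A minimal Java sketch of the cache-plus-lookup path described above, under stated assumptions: `ContextCacheSketch`, `CachedContext`, and `ContextCacheDemo` are hypothetical names; the property lookup and the step that builds a classloader from a contextPath are passed in as functions, because the issue does not name the property or pin down those details; and the class only mirrors the shape of the interface's `getClassLoader(String)` method rather than implementing the real Accumulo interface, so it compiles without Accumulo on the classpath.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical cache entry pairing a context's classloader with its local file cache directory. */
final class CachedContext {
  final ClassLoader classLoader;
  final Path localDir;

  CachedContext(ClassLoader classLoader, Path localDir) {
    this.classLoader = classLoader;
    this.localDir = localDir;
  }
}

/**
 * Hypothetical sketch of the factory's lookup path. It mirrors getClassLoader(String)
 * but deliberately does not implement the real Accumulo interface.
 */
class ContextCacheSketch {
  private final Function<String, String> contextPathLookup; // assumed: context name -> contextPath
  private final Function<String, CachedContext> builder;    // assumed: contextPath -> new cache entry
  private final Map<String, CachedContext> cache = new ConcurrentHashMap<>();

  ContextCacheSketch(Function<String, String> contextPathLookup,
      Function<String, CachedContext> builder) {
    this.contextPathLookup = contextPathLookup;
    this.builder = builder;
  }

  ClassLoader getClassLoader(String contextName) {
    // Already-defined contexts are served straight from the cache.
    // If the lookup or builder throws, nothing is cached for that name.
    return cache.computeIfAbsent(contextName, name -> {
      String contextPath = contextPathLookup.apply(name);
      if (contextPath == null) {
        throw new IllegalArgumentException("No contextPath defined for context " + name);
      }
      return builder.apply(contextPath);
    }).classLoader;
  }

  /** Local directory recorded for a cached context, or null if the context is not cached. */
  Path localDir(String contextName) {
    CachedContext entry = cache.get(contextName);
    return entry == null ? null : entry.localDir;
  }
}

class ContextCacheDemo {
  public static void main(String[] args) {
    Map<String, String> props =
        Map.of("contextA", "hdfs://test:8020/contexts/contextA/manifest.json");
    ContextCacheSketch factory = new ContextCacheSketch(props::get,
        path -> new CachedContext(new URLClassLoader(new URL[0]),
            Paths.get("/tmp/local-contexts/contextA")));
    ClassLoader first = factory.getClassLoader("contextA");
    System.out.println(first == factory.getClassLoader("contextA")); // true: served from cache
    System.out.println(factory.localDir("contextA"));
  }
}
```

`ConcurrentHashMap.computeIfAbsent` applies the builder at most once per key, so concurrent first requests for the same context end up sharing one classloader. Its documentation also asks that the mapping function be short and simple, so a real implementation that copies jars from HDFS inside the builder might prefer a per-context lock or future instead.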
For reference, see the existing [ContextManager](https://github.com/apache/accumulo/blob/b3ec422ba764643c5e5da819ed2d5106dcf7490c/start/src/main/java/org/apache/accumulo/start/classloader/vfs/ContextManager.java#L34) class.

The factory should resolve contextPaths to local directories and attempt to load classes from there.

#### Local File Cache Directory Resolution ####

This class should resolve each context path to a local directory named after the immediate parent directory of the manifest file. That local directory should live under a user-defined base directory (similar to the [VFS_CACHE_DIR](https://github.com/apache/accumulo/blob/2e31448aabd7012aef86fa21f88d8c94f185ebfc/core/src/main/java/org/apache/accumulo/core/conf/Property.java#L1464) property). For example:

- ContextPath: `hdfs://test:8020/contexts/contextA/manifest.json`
- User-defined dir: `/tmp/local-contexts`
- Resolved dir: `/tmp/local-contexts/contextA`

The class should throw an error if that directory does not exist. Once the directory is confirmed to exist, the class should use the manifest file to validate the jars and generate a new list of jar URLs. This list will then be used to create a new URLClassLoader. The new classloader should be cached, along with the file cache directory, and then returned to the ClassLoaderUtil.

Write a test that stages a context directory containing a jar file, then successfully loads a class from that jar using the SimpleHDFSClassLoaderFactory.

## Stage 3. Fetch Files from HDFS ##

Create a class that performs the following steps when given a manifest file location:

1. Create a lock file in the user-defined directory: `/tmp/local-contexts/contextA.lock`
2. Create a unique temp directory for the context: `/tmp/local-contexts/tmp-contextA-<uuid>`
3. Download the manifest file to the temp directory and use its contents to copy and validate the defined jars from the source HDFS location to the temp directory.
4. Perform a rename operation on the temp directory to promote it to the context name: `/tmp/local-contexts/tmp-contextA-<uuid>` -> `/tmp/local-contexts/contextA`
5. Delete the lock file.

Modify the SimpleHDFSClassLoaderFactory to use this class when neither the local context directory nor the lock file exists.

Write an IT (integration test) that loads classes from HDFS using the SimpleHDFSClassLoaderFactory in a single tserver.

## Stage 4. Support multiple processes ##

Modify the SimpleHDFSClassLoaderFactory to do the following:

1. If the lock file exists, wait to load classes until a user-defined period of time has passed since the lock file was last modified.
2. Once that period has passed, touch the lock file to reset its modification time and proceed with fetching files from HDFS.

## Stage 5. Clean up old contexts ##

Start a thread that checks the property definitions every minute and removes any contexts that are no longer defined from the cache.
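The context cleanup thread could look roughly like the following Java sketch. It assumes a hypothetical `Supplier` that returns the context names currently defined in the configuration (the issue does not say how those are read), and it only evicts cache entries, since the issue does not say whether local directories or classloaders should also be deleted or closed. `ContextCleanupSketch` and `ContextCleanupDemo` are illustrative names, not existing classes.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Hypothetical sketch of the context cleanup thread. */
class ContextCleanupSketch {
  private final ConcurrentMap<String, ClassLoader> cache;
  // Assumed: returns the context names currently defined in the configuration.
  private final Supplier<Set<String>> definedContexts;

  ContextCleanupSketch(ConcurrentMap<String, ClassLoader> cache,
      Supplier<Set<String>> definedContexts) {
    this.cache = cache;
    this.definedContexts = definedContexts;
  }

  /** One pass: evicts cached contexts that are no longer defined and returns their names. */
  Set<String> cleanupOnce() {
    Set<String> defined = definedContexts.get();
    Set<String> removed = new HashSet<>();
    cache.keySet().removeIf(name -> {
      boolean stale = !defined.contains(name);
      if (stale) {
        removed.add(name);
      }
      return stale;
    });
    return removed;
  }

  /** Runs cleanupOnce() on a daemon thread, starting one minute after each pass finishes. */
  ScheduledExecutorService start() {
    ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor(r -> {
      Thread t = new Thread(r, "context-cleanup");
      t.setDaemon(true);
      return t;
    });
    // A task that throws would suppress all later runs, so failures are caught here.
    executor.scheduleWithFixedDelay(() -> {
      try {
        cleanupOnce();
      } catch (RuntimeException e) {
        System.err.println("context cleanup failed: " + e);
      }
    }, 1, 1, TimeUnit.MINUTES);
    return executor;
  }
}

class ContextCleanupDemo {
  public static void main(String[] args) {
    ConcurrentMap<String, ClassLoader> cache = new ConcurrentHashMap<>();
    cache.put("contextA", ClassLoader.getSystemClassLoader());
    cache.put("contextB", ClassLoader.getSystemClassLoader());
    ContextCleanupSketch cleanup = new ContextCleanupSketch(cache, () -> Set.of("contextA"));
    System.out.println(cleanup.cleanupOnce()); // [contextB]
    System.out.println(cache.keySet());        // [contextA]
  }
}
```

`scheduleWithFixedDelay` measures the one-minute delay from the end of each pass, so a slow pass delays the next one rather than overlapping it.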
