I've finally hit a roadblock that I can't figure out a way around. There is a call to ReflectionUtils in HiveUtils.getAuthorizerFactory that changes the current thread's class loader to one belonging to a completely different thread. I can make SessionState set it back, but I have no idea why it's changing. I'm guessing it's something in the Java internals. Whatever is going on, it completely breaks UDF class loading in certain scenarios.
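For illustration, "setting it back" could look roughly like the sketch below. It uses only the JDK; ContextLoaderGuard and callPreservingLoader are hypothetical names, not Hive code, and the lambda in main is just a stand-in for whatever call swaps the loader as a side effect.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.function.Supplier;

// Hypothetical helper, not part of Hive: runs a task and then restores the
// thread context class loader, whatever the task did to it.
final class ContextLoaderGuard {

    static <T> T callPreservingLoader(Supplier<T> task) {
        Thread current = Thread.currentThread();
        ClassLoader saved = current.getContextClassLoader();
        try {
            return task.get();
        } finally {
            // Undo any change the task made to this thread's loader.
            current.setContextClassLoader(saved);
        }
    }

    public static void main(String[] args) {
        ClassLoader before = Thread.currentThread().getContextClassLoader();
        String result = callPreservingLoader(() -> {
            // Stand-in for a call that replaces the loader as a side effect.
            Thread.currentThread().setContextClassLoader(
                    new URLClassLoader(new URL[0], before));
            return "done";
        });
        ClassLoader after = Thread.currentThread().getContextClassLoader();
        System.out.println(result + ", loader restored: " + (before == after));
    }
}
```

This only hides the symptom. It does not explain why the loader changes in the first place.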
Thanks

-----Original Message-----
From: Shawn Weeks <swe...@weeksconsulting.us>
Sent: Sunday, March 10, 2019 12:42 PM
To: dev@hive.apache.org
Subject: RE: Custom UDF Loses Depenencies

This rabbit hole is getting a little crazy. I can see SessionState start and attach being called, and at those points the current thread class loader and the session state class loader are the same. When add_resources is called, however, they are not the same. The current thread class loader for add_resources is the first class loader created for the first session state, and it never changes.

-----Original Message-----
From: Shawn Weeks <swe...@weeksconsulting.us>
Sent: Saturday, March 9, 2019 11:01 AM
To: dev@hive.apache.org
Subject: RE: Custom UDF Loses Depenencies

OK, never mind, I see where the session state is being attached and the thread context is being switched. So the real question is why the thread context class loader for registerJars is different from the one in the session state it was called with. Currently trying to find out what calls registerJars.

Thanks
Shawn

-----Original Message-----
From: Shawn Weeks <swe...@weeksconsulting.us>
Sent: Saturday, March 9, 2019 10:57 AM
To: dev@hive.apache.org
Subject: RE: Custom UDF Loses Depenencies

The more I look at this, the more confusing it gets. It looks like in some places Hive is using the class loader from SessionState.getConf() and in others the one from the thread context. I'd expect the thread context class loader and the session state one to be the same, as the thread context is set by SessionState. However, I'm looking at the start method on SessionState and it doesn't set the class loader at all, so registerJars may be getting the wrong class loader.

Thanks
Shawn

-----Original Message-----
From: Shawn Weeks <swe...@weeksconsulting.us>
Sent: Saturday, March 9, 2019 10:44 AM
To: dev@hive.apache.org
Subject: RE: Custom UDF Loses Depenencies

I filed HIVE-21409 for this issue.
It looks like the registerJars method in SessionState is still using the thread context class loader instead of grabbing the one from SessionState. I'm not sure how it happens, but it looks like if you have a bunch of parallel sessions from something like Oozie right after Hive starts, the class path can get polluted really quickly. I need to go look at the mechanism behind the "delete jars" command, as running that on a brand new session seems to wipe all the extra stuff out of the class loader.

Thanks

-----Original Message-----
From: Gopal Vijayaraghavan <gop...@apache.org>
Sent: Wednesday, March 6, 2019 3:01 PM
To: dev@hive.apache.org
Subject: Re: Custom UDF Loses Depenencies

> When we register a jar on the Hive console. Hive creates a fresh URL
> classloader which includes the path of the current jar to be registered and
> all the jar paths of the parent classloader. The parent classloader is the
> current ThreadContextClassLoader. Once the URLClassloader is created Hive
> sets ...

That looks like the root cause of a different class of issues.

Since the fresh classloader is picking up the URLs, the URLs keep growing.

Worse, the Hadoop classes loaded from shims are coming off the thread loader & being reloaded a million times.

Cheers,
Gopal
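
The URL growth described in the quoted text can be reproduced with plain JDK classes. The sketch below is not Hive code: LoaderGrowthDemo, addJar and totalUrlEntries are hypothetical names, and the jar paths are placeholders that are never opened. It only mimics the pattern "copy the parent's URLs, add the new jar, chain to the parent".

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration, not Hive code.
final class LoaderGrowthDemo {

    // Builds a fresh loader holding the parent's URLs plus one new jar,
    // with the old loader as its parent.
    static URLClassLoader addJar(ClassLoader parent, String jarUri) {
        List<URL> urls = new ArrayList<>();
        if (parent instanceof URLClassLoader) {
            urls.addAll(Arrays.asList(((URLClassLoader) parent).getURLs()));
        }
        try {
            urls.add(URI.create(jarUri).toURL());
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("bad jar URI: " + jarUri, e);
        }
        return new URLClassLoader(urls.toArray(new URL[0]), parent);
    }

    // Sums the URL entries held by every URLClassLoader in the parent chain.
    static int totalUrlEntries(ClassLoader loader) {
        int total = 0;
        for (ClassLoader l = loader; l != null; l = l.getParent()) {
            if (l instanceof URLClassLoader) {
                total += ((URLClassLoader) l).getURLs().length;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        ClassLoader loader = new URLClassLoader(new URL[0], null);
        for (int i = 1; i <= 3; i++) {
            loader = addJar(loader, "file:/tmp/udf-" + i + ".jar");
            System.out.println("jar " + i + ": newest loader has "
                    + ((URLClassLoader) loader).getURLs().length
                    + " URLs, whole chain holds " + totalUrlEntries(loader));
        }
    }
}
```

Each registration adds another loader to the chain and repeats every earlier URL in it, so the number of entries across the chain grows quadratically with the number of jars registered.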