I've finally hit a roadblock that I can't figure out a way around. There is a 
call to ReflectionUtils in HiveUtils.getAuthorizerFactory that is changing the 
current thread's class loader to one for a completely different thread. I can 
make SessionState set it back, but I have no idea why it's changing it. I'm 
guessing it's something in Java internals. Whatever's going on, it completely 
breaks UDF class loading in certain scenarios.
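
For illustration, the "set it back" workaround can be sketched with plain JDK 
calls. This is a minimal sketch, not Hive code: ContextLoaderGuard and 
callPreservingLoader are hypothetical names, and the lambda in main only stands 
in for a call that swaps the loader as a side effect.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.concurrent.Callable;

// Hypothetical helper, not part of Hive.
public class ContextLoaderGuard {

    /** Runs the task, then restores the context class loader seen on entry. */
    public static <T> T callPreservingLoader(Callable<T> task) throws Exception {
        Thread current = Thread.currentThread();
        ClassLoader saved = current.getContextClassLoader();
        try {
            return task.call();
        } finally {
            if (current.getContextClassLoader() != saved) {
                // Report the swap so the offending call can be tracked down.
                System.err.println("context class loader changed during call; restoring");
                current.setContextClassLoader(saved);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        ClassLoader before = Thread.currentThread().getContextClassLoader();
        callPreservingLoader(() -> {
            // Stand-in for a call that replaces the loader as a side effect.
            Thread.currentThread().setContextClassLoader(
                    new URLClassLoader(new URL[0], null));
            return null;
        });
        // The loader seen on entry is back in place after the call.
        System.out.println(before == Thread.currentThread().getContextClassLoader());
    }
}
```

This only hides the symptom; it does not explain why the loader is being 
replaced in the first place.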

Thanks

-----Original Message-----
From: Shawn Weeks <swe...@weeksconsulting.us> 
Sent: Sunday, March 10, 2019 12:42 PM
To: dev@hive.apache.org
Subject: RE: Custom UDF Loses Depenencies

This rabbit hole is getting a little crazy. I can see SessionState start and 
attach being called, and at those points the current thread class loader and 
the session state class loader are the same, but when add_resources is called 
they are not the same. The current thread class loader for add_resources is the 
first class loader created for the first session state, and it never changes.

-----Original Message-----
From: Shawn Weeks <swe...@weeksconsulting.us> 
Sent: Saturday, March 9, 2019 11:01 AM
To: dev@hive.apache.org
Subject: RE: Custom UDF Loses Depenencies

OK, never mind, I see where the session state is being attached and the thread 
context is being switched. So the real question is why the thread context class 
loader for registerJars is different from the one for the session state it was 
called with. Currently trying to find out what calls registerJars.

Thanks
Shawn

-----Original Message-----
From: Shawn Weeks <swe...@weeksconsulting.us> 
Sent: Saturday, March 9, 2019 10:57 AM
To: dev@hive.apache.org
Subject: RE: Custom UDF Loses Depenencies

The more I look at this, the more confusing it gets. It looks like in some 
places Hive is using the class loader from SessionState.getConf() and in others 
the one from the thread context. I'd expect the thread context class loader and 
the session state one to be the same, as the thread context is set by 
SessionState. However, I'm looking at the start method on SessionState and it 
doesn't set the class loader at all, so registerJars may be getting the wrong 
class loader.
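
A quick way to confirm a mismatch like this is to log both loaders at the point 
of interest. The following is a minimal sketch in plain Java, not Hive code: 
LoaderCheck and describe are hypothetical names. In Hive, the "expected" loader 
would presumably be the one held by the session's configuration (Hadoop's 
Configuration has a getClassLoader() method), but that wiring is an assumption 
and is left out so the example runs on its own.

```java
import java.net.URL;
import java.net.URLClassLoader;

// Hypothetical diagnostic helper, not part of Hive.
public class LoaderCheck {

    /** Compares the thread context class loader with the one the caller expects. */
    public static String describe(ClassLoader expected) {
        ClassLoader actual = Thread.currentThread().getContextClassLoader();
        if (actual == expected) {
            return "match: " + id(actual);
        }
        return "MISMATCH: thread=" + id(actual) + " expected=" + id(expected);
    }

    // Identity hash makes two loaders of the same class distinguishable in logs.
    private static String id(ClassLoader cl) {
        return cl == null
                ? "null"
                : cl.getClass().getSimpleName() + "@"
                        + Integer.toHexString(System.identityHashCode(cl));
    }

    public static void main(String[] args) {
        // Stand-in for the loader a session believes it owns.
        ClassLoader sessionLoader = new URLClassLoader(new URL[0], null);
        System.out.println(describe(sessionLoader)); // thread still has its original loader

        Thread.currentThread().setContextClassLoader(sessionLoader);
        System.out.println(describe(sessionLoader)); // now the two agree
    }
}
```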

Thanks
Shawn

-----Original Message-----
From: Shawn Weeks <swe...@weeksconsulting.us> 
Sent: Saturday, March 9, 2019 10:44 AM
To: dev@hive.apache.org
Subject: RE: Custom UDF Loses Depenencies

I filed HIVE-21409 for this issue. It looks like the registerJars method in 
SessionState is still using the thread context class loader instead of grabbing 
the one from SessionState. I'm not sure how it happens, but it looks like if you 
have a bunch of parallel sessions from something like Oozie right after Hive 
starts, the class path can get polluted really quickly. I need to go look at 
the mechanism behind the "delete jars" command, as running that on a brand new 
session seems to wipe all the extra stuff out of the class loader.

Thanks

-----Original Message-----
From: Gopal Vijayaraghavan <gop...@apache.org> 
Sent: Wednesday, March 6, 2019 3:01 PM
To: dev@hive.apache.org
Subject: Re: Custom UDF Loses Depenencies

>    When we register a jar on the Hive console. Hive creates a fresh URL 
> classloader which includes the path of the current jar to be registered and 
> all the jar paths of the parent classloader. The parent classloader is the 
> current ThreadContextClassLoader. Once the URLClassloader is created Hive 
> sets ...

That looks like the root-cause of a different class of issues.

Since the fresh classloader is picking up the URLs, the URLs keep growing.

Worse, the Hadoop classes loaded from shims are coming off the thread loader & 
being reloaded a million times.
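
The URL growth can be reproduced outside Hive with a few lines of plain Java. 
This is a minimal sketch of the pattern described in the quoted text, not 
Hive's actual implementation: UrlGrowthDemo, addJar, and totalUrlsInChain are 
hypothetical names, and the jar paths are placeholders that are never opened.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo, not part of Hive.
public class UrlGrowthDemo {

    /** Builds a child loader holding the parent's URLs plus one new jar URL. */
    public static URLClassLoader addJar(ClassLoader parent, URL jar) {
        List<URL> urls = new ArrayList<>();
        if (parent instanceof URLClassLoader) {
            urls.addAll(Arrays.asList(((URLClassLoader) parent).getURLs()));
        }
        urls.add(jar);
        return new URLClassLoader(urls.toArray(new URL[0]), parent);
    }

    /** Sums the URLs held by every URLClassLoader in the parent chain. */
    public static int totalUrlsInChain(ClassLoader loader) {
        int total = 0;
        for (ClassLoader cl = loader; cl != null; cl = cl.getParent()) {
            if (cl instanceof URLClassLoader) {
                total += ((URLClassLoader) cl).getURLs().length;
            }
        }
        return total;
    }

    public static void main(String[] args) throws MalformedURLException {
        ClassLoader loader = new URLClassLoader(new URL[0], null);
        for (int i = 1; i <= 3; i++) {
            // Placeholder paths; nothing is loaded from them.
            URL jar = URI.create("file:/tmp/udf-" + i + ".jar").toURL();
            loader = addJar(loader, jar);
            System.out.println("after jar " + i + ": newest loader has "
                    + ((URLClassLoader) loader).getURLs().length
                    + " URLs, chain holds " + totalUrlsInChain(loader));
        }
    }
}
```

Each new loader repeats every URL its parent already has, so the count on the 
newest loader grows by one per jar while the total across the chain grows much 
faster.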

Cheers,
Gopal
 

