[ 
https://issues.apache.org/jira/browse/HIVE-11878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14824525#comment-14824525
 ] 

Ratandeep Ratti commented on HIVE-11878:
----------------------------------------

I have a few approaches in mind for solving this problem.

1. Do not use the current classloader as the parent when creating a new 
URLClassLoader in the {{Utilities.addToClassPath}} method. Every URLClassLoader 
constructed would then have the system classloader as its parent. Because 
{{addToClassPath}} already copies the jar paths of the current classloader, 
each new URLClassLoader would still contain all previously registered jars plus 
the jar being registered.

Revisiting the scenario from the issue description: classloader *u2* will have 
the system classloader as its parent and will contain the jar paths for both 
*j1* and *j2*. When *c1* is loaded through *u2*, the parent classloader does 
not have the jar, so *u2* itself loads and defines *c1*. Class *c2*, which *c1* 
requires, is then also found in *u2* and is correctly loaded and defined by 
*u2*.

Note that {{Utilities.removeFromClassPath}} likewise creates new 
URLClassLoaders without passing the current classloader as their parent.

2. Write a new classloader that extends URLClassLoader and follows the servlet 
spec's loading order: instead of delegating to the parent first, it first tries 
to find the class on its own classpath.

3. Write a new classloader that extends URLClassLoader and widens the 
visibility of the {{addURL}} method from protected to public. With this custom 
classloader we would not have to create a fresh classloader for every jar being 
registered. We would still have to work out how deleting/removing jars will be 
implemented.
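
A rough Java sketch of the three approaches, for discussion only. The names 
{{ClassLoaderApproaches}}, {{addToClassPathSystemParent}}, 
{{ChildFirstClassLoader}} and {{MutableURLClassLoader}} are hypothetical, not 
existing Hive APIs, and the sketch assumes registered jars arrive as 
{{java.net.URL}}s. The child-first loader still sends {{java.*}} classes to the 
parent, since the JVM refuses to define them in a user classloader.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketches of the three approaches; names are illustrative, not Hive APIs. */
public final class ClassLoaderApproaches {
    private ClassLoaderApproaches() {}

    // Approach 1: copy the current loader's jar URLs plus the new jars into a fresh
    // URLClassLoader whose parent is the system classloader, not the current loader.
    public static URLClassLoader addToClassPathSystemParent(ClassLoader current, URL... newJars) {
        List<URL> urls = new ArrayList<>();
        if (current instanceof URLClassLoader) {
            urls.addAll(Arrays.asList(((URLClassLoader) current).getURLs()));
        }
        urls.addAll(Arrays.asList(newJars));
        return new URLClassLoader(urls.toArray(new URL[0]), ClassLoader.getSystemClassLoader());
    }

    // Approach 2: child-first ("servlet-style") loader that searches its own jars
    // before delegating to the parent. java.* classes always go to the parent.
    public static final class ChildFirstClassLoader extends URLClassLoader {
        public ChildFirstClassLoader(URL[] urls, ClassLoader parent) {
            super(urls, parent);
        }

        @Override
        protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
            synchronized (getClassLoadingLock(name)) {
                Class<?> c = findLoadedClass(name);
                if (c == null && !name.startsWith("java.")) {
                    try {
                        c = findClass(name); // look in our own jars first
                    } catch (ClassNotFoundException ignored) {
                        // not in our jars; fall back to the parent below
                    }
                }
                if (c == null) {
                    c = super.loadClass(name, false); // standard parent-first lookup
                }
                if (resolve) {
                    resolveClass(c);
                }
                return c;
            }
        }
    }

    // Approach 3: one long-lived loader whose addURL is widened from protected
    // to public, so registering a jar appends to it in place.
    public static final class MutableURLClassLoader extends URLClassLoader {
        public MutableURLClassLoader(URL[] urls, ClassLoader parent) {
            super(urls, parent);
        }

        @Override
        public void addURL(URL url) {
            super.addURL(url);
        }
    }
}
```

With approach 3, registering a jar would become a single {{addURL}} call on the 
long-lived loader. URLClassLoader offers no way to drop a URL, though, so 
removing a jar would presumably still mean building a new loader.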


> ClassNotFoundException can possibly occur if multiple jars are registered in Hive
> ----------------------------------------------------------------------------------
>
>                 Key: HIVE-11878
>                 URL: https://issues.apache.org/jira/browse/HIVE-11878
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>    Affects Versions: 1.2.1
>            Reporter: Ratandeep Ratti
>            Assignee: Ratandeep Ratti
>              Labels: URLClassLoader
>         Attachments: HIVE-11878_qtest.patch
>
>
> When we register a jar on the Hive console, Hive creates a fresh URLClassLoader 
> that includes the path of the jar being registered and all the jar paths of 
> the parent classloader. The parent classloader is the current thread context 
> classloader. Once the URLClassLoader is created, Hive sets it as the current 
> thread context classloader.
> So if we register multiple jars in Hive, multiple URLClassLoaders are 
> created, each including the jars of its parent plus the one extra jar being 
> registered. The last URLClassLoader created ends up as the current thread 
> context classloader. (For details, see 
> org.apache.hadoop.hive.ql.exec.Utilities#addToClassPath.)
> Here is an example in which this strategy can lead to a ClassNotFoundException.
> We register two jars, *j1* and *j2*, on the Hive console. *j1* contains the 
> UDF class *c1*, which internally relies on class *c2* in jar *j2*. We register 
> *j1* first: URLClassLoader *u1* is created and set as the thread context 
> classloader. We register *j2* next: the new URLClassLoader *u2* is created 
> with *u1* as its parent, and *u2* becomes the new thread context classloader. 
> Note that *u2* includes paths to both *j1* and *j2*, whereas *u1* only has 
> the path to *j1* (for details, see 
> org.apache.hadoop.hive.ql.exec.Utilities#addToClassPath).
> Now when we register class *c1* as a temporary function in Hive, we load the 
> class using {code} Class.forName("c1", true, 
> Thread.currentThread().getContextClassLoader()) {code}. The current thread 
> context classloader is *u2*, which does have the path to class *c1*, but 
> classloaders delegate to their parent first. Class *c1* is therefore found 
> and *defined* by classloader *u1*.
> *c1* from jar *j1* now has *u1* as its defining classloader. If a method of 
> *c1* (say, initialize) references class *c2*, *c2* will not be found: the JVM 
> resolves *c2* through the defining classloader of the referencing class, 
> which is *u1*, and *u1* has no path to *j2*.
> I've added a qtest that demonstrates the problem; please see the attached patch.
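
To make the delegation problem concrete, here is a small toy model in Java. It 
does not load real classes: each hypothetical {{Loader}} is just a named set of 
class names with an optional parent ({{null}} standing in for the system 
classloader), and {{load}} mimics parent-first delegation by returning the 
loader that would define the class. Resolving *c2* through the defining loader 
of *c1* reproduces the failure described above, while flattening *u2* onto the 
system loader (approach 1 in the comment) makes it succeed.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Toy model of parent-first delegation; hypothetical, not real class loading. */
public final class DelegationModel {
    static final class Loader {
        final String name;
        final Loader parent;       // null stands in for the system classloader
        final Set<String> classes; // class names reachable through this loader's jars

        Loader(String name, Loader parent, String... classes) {
            this.name = name;
            this.parent = parent;
            this.classes = new HashSet<>(Arrays.asList(classes));
        }

        /** Parent-first: ask the parent chain, then look locally. Returns the defining loader or null. */
        Loader load(String cls) {
            if (parent != null) {
                Loader fromParent = parent.load(cls);
                if (fromParent != null) {
                    return fromParent;
                }
            }
            return classes.contains(cls) ? this : null;
        }
    }

    /** Loads 'udf' via 'start', then resolves 'dependency' through the UDF's defining loader. */
    static boolean dependencyResolves(Loader start, String udf, String dependency) {
        Loader definer = start.load(udf);
        return definer != null && definer.load(dependency) != null;
    }

    public static void main(String[] args) {
        Loader u1 = new Loader("u1", null, "c1");
        Loader u2 = new Loader("u2", u1, "c1", "c2");   // current chained behaviour
        System.out.println(u2.load("c1").name);                  // u1
        System.out.println(dependencyResolves(u2, "c1", "c2"));  // false

        Loader u2Flat = new Loader("u2", null, "c1", "c2"); // approach 1
        System.out.println(u2Flat.load("c1").name);                 // u2
        System.out.println(dependencyResolves(u2Flat, "c1", "c2")); // true
    }
}
```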



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to