[ 
https://issues.apache.org/jira/browse/OOZIE-2714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15589989#comment-15589989
 ] 

Robert Kanter commented on OOZIE-2714:
--------------------------------------

Thanks for the interesting idea.  I think it could help a lot in debugging 
classpath issues.  However, I imagine this would add considerable overhead, 
right?  We can double check, but if that's the case, I think it should be 
disabled by default, and users can enable it when they run into 
classpath-related errors to get more info.  The reason I think there'd be a lot 
of overhead is that I assume the default classloader simply goes with the 
first matching class it finds (which is why the ordering matters) and then 
stops looking.  The approach here would require scanning every jar on the 
classpath at every call to {{loadClass()}}.
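
To illustrate the difference, here is a minimal sketch (the class and method 
names are hypothetical, not Oozie code).  {{getResource()}} returns only the 
first match, while {{getResources()}} enumerates every copy visible to the 
loader, which is roughly the extra work a conflict check would do per class:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

// Hypothetical helper for illustration only; not part of Oozie.
class DuplicateClassFinder {

    // "org.example.Foo" -> "org/example/Foo.class"
    static String toResourceName(String className) {
        return className.replace('.', '/') + ".class";
    }

    // What a plain lookup effectively sees: the first match only (or null).
    static URL firstMatch(ClassLoader loader, String className) {
        return loader.getResource(toResourceName(className));
    }

    // What a conflict check needs: every copy visible to the loader.
    static List<URL> allMatches(ClassLoader loader, String className) {
        try {
            return Collections.list(loader.getResources(toResourceName(className)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

If {{allMatches()}} returns more than one URL for a class, that class is a 
candidate conflict.  How expensive this is in practice is something we'd still 
need to measure.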

I'd vote for option 2.  We can check all possible class matches and only 
complain if they don't all match.  The downside to this is that if the classes 
are different in some way, but still compatible enough to work correctly, we'd 
end up still throwing an error.  Though if we disable this by default, that's 
probably fine.
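
A rough sketch of the byte comparison behind option 2 (hypothetical names, 
and assuming the duplicate copies have already been located as URLs):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only; not part of Oozie.
class ResourceComparer {

    // Reads one copy of a resource (e.g. a .class file inside a jar) into memory.
    static byte[] readFully(URL url) throws IOException {
        try (InputStream in = url.openStream();
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        }
    }

    // True if every copy is byte-for-byte equal to the first one.
    // An empty or single-element list counts as "no conflict".
    static boolean allIdentical(List<byte[]> copies) {
        for (int i = 1; i < copies.size(); i++) {
            if (!Arrays.equals(copies.get(0), copies.get(i))) {
                return false;
            }
        }
        return true;
    }
}
```

This is a strict equality check, so it has the downside mentioned above: two 
builds of the same class that differ only trivially would still be reported.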

Another concern, though I'm not sure how much of a problem it is: what if the 
user code also tries to use a custom classloader?  Would that interfere with 
ours?  For example, what if HiveCLI has its own custom classloader?

> Detect conflicting resources during class loading
> -------------------------------------------------
>
>                 Key: OOZIE-2714
>                 URL: https://issues.apache.org/jira/browse/OOZIE-2714
>             Project: Oozie
>          Issue Type: New Feature
>          Components: core
>            Reporter: Peter Bacsko
>
> There are a bunch of issues in Oozie which are related to class loading. 
> The main problem is that the classpath is constructed in a way which is very 
> specific to Oozie:
> - Hadoop lib jars
> - Sharelib jars
> - User-defined jars
> Sometimes there is a conflict between the sharelib and Hadoop lib versions. 
> Also, users can add their own jars, which sometimes contain a different 
> version of popular libraries such as Guava, Apache Commons, etc.
> We should be able to detect these conflicts and print an exact error message 
> so that Oozie users can take appropriate actions to resolve the problem.
> A possible approach is the following:
> * start the execution of an action on a different thread
> * replace the thread's context classloader with a classloader which can 
> detect conflicts
> * when the JVM invokes the {{loadClass()}} method of the classloader, it  
> scans through the jars (which are available as {{URLClassPath}} objects). If 
> it finds the given resource in at least two jars, it can do different things 
> depending on the setup:
> ** throws an error immediately, mentioning the conflicting jars (this is 
> probably too strict - but still an option)
> ** loads the two resources into byte arrays and compares them - it only 
> throws an error if there is a difference
> ** compares the jars but only emits an error message if there is a conflict
> ** something else (user defined action?)
> Implementing such a classloader is not difficult and would greatly enhance 
> the supportability of Oozie. It could work in multiple modes depending on the 
> setup - perhaps being able to control it from a workflow config is desirable. 
> If there's any problem, we should be able to turn it off completely, too.
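
For reference, the approach described above could be sketched roughly as 
follows.  All names are hypothetical and this is not Oozie code.  It only 
records conflicts instead of throwing, and it only sees classes that are 
actually requested through this loader (for example, by code that consults the 
thread's context classloader):

```java
import java.io.IOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: records classes that appear more than once on the
// parent's classpath, then delegates loading as usual.
class ConflictDetectingClassLoader extends ClassLoader {
    private final Map<String, List<URL>> conflicts = new ConcurrentHashMap<>();

    ConflictDetectingClassLoader(ClassLoader parent) {
        super(parent);
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        String resource = name.replace('.', '/') + ".class";
        try {
            List<URL> copies = Collections.list(getParent().getResources(resource));
            if (copies.size() > 1) {
                conflicts.put(name, copies); // report-only mode
            }
        } catch (IOException e) {
            // Detection is best-effort; fall through to normal loading.
        }
        return super.loadClass(name, resolve);
    }

    Map<String, List<URL>> conflicts() {
        return conflicts;
    }
}

// Hypothetical runner: executes the action on a separate thread whose
// context classloader is the detecting loader.
class ActionRunner {
    static ConflictDetectingClassLoader runWithDetection(Runnable action) {
        ClassLoader current = Thread.currentThread().getContextClassLoader();
        ClassLoader parent = (current != null) ? current : ClassLoader.getSystemClassLoader();
        ConflictDetectingClassLoader detector = new ConflictDetectingClassLoader(parent);

        Thread worker = new Thread(action, "action-with-conflict-detection");
        worker.setContextClassLoader(detector);
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
        }
        return detector;
    }
}
```

After the action finishes, {{conflicts()}} lists each duplicated class with 
the locations that provide it.  Stricter modes (throwing immediately, or 
comparing bytes first) could hook in at the point where the conflict is 
recorded.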



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
