[ 
https://issues.apache.org/jira/browse/OOZIE-2714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko updated OOZIE-2714:
--------------------------------
    Description: 
There are a bunch of issues in Oozie which are related to class loading. 

The main problem is that the classpath is constructed in a way which is very 
specific to Oozie:
- Hadoop lib jars
- Sharelib jars
- User-defined jars

Sometimes there is a conflict between the sharelib and Hadoop lib versions of 
a library. Also, users might add their own jars that contain a different 
version of popular libraries such as Guava, Apache Commons, etc.

We should be able to detect these conflicts and print an exact error message so 
that Oozie users can take appropriate action to resolve the problem.

A possible approach is the following:
* start the execution of an action on a different thread
* replace the thread's context classloader with a classloader which can detect 
conflicts
* when the JVM invokes the {{loadClass()}} method of the classloader, it scans 
through the jars (which are available as {{URLClassPath}} objects). If it finds 
the given resource in at least two jars, it can do different things depending 
on the setup:
** throw an error immediately, mentioning the conflicting jars (this is 
probably too strict - but still an option)
** load the two resources into byte arrays and compare them, throwing an error 
only if they differ
** compare the jars but only emit an error message, without failing, if there 
is a conflict
** something else (user-defined action?)
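
The approach above can be sketched roughly as follows. This is only an 
illustration, not existing Oozie code: the class name 
ConflictDetectingClassLoader, the Mode values and the helper methods are all 
hypothetical, and the sketch walks the public getURLs() list of a 
URLClassLoader instead of touching URLClassPath internals. It assumes every 
jar of interest is passed to this loader directly.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of a conflict-detecting classloader; not part of Oozie. */
class ConflictDetectingClassLoader extends URLClassLoader {

    /** Hypothetical modes mirroring the options listed above. */
    enum Mode { FAIL_ALWAYS, FAIL_IF_DIFFERENT, WARN_IF_DIFFERENT, OFF }

    private final Mode mode;

    ConflictDetectingClassLoader(URL[] urls, ClassLoader parent, Mode mode) {
        super(urls, parent);
        this.mode = mode;
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        if (mode != Mode.OFF) {
            check(name.replace('.', '/') + ".class");
        }
        return super.loadClass(name, resolve); // normal parent-first delegation
    }

    /** Applies the configured policy when the resource exists in two or more entries. */
    private void check(String resource) {
        List<URL> hits = findAll(getURLs(), resource);
        if (hits.size() < 2) {
            return;
        }
        String msg = "Resource " + resource + " found in multiple locations: " + hits;
        if (mode == Mode.FAIL_ALWAYS) {
            throw new LinkageError(msg);
        }
        if (!allIdentical(hits)) {
            if (mode == Mode.FAIL_IF_DIFFERENT) {
                throw new LinkageError(msg);
            }
            System.err.println("WARN: " + msg); // WARN_IF_DIFFERENT: report only
        }
    }

    /** Returns the location of the resource in every classpath entry that contains it. */
    static List<URL> findAll(URL[] entries, String resource) {
        List<URL> hits = new ArrayList<>();
        for (URL entry : entries) {
            try {
                // Entries ending in "/" are directories; anything else is treated as a jar.
                URL candidate = entry.getPath().endsWith("/")
                        ? new URL(entry, resource)
                        : new URL("jar:" + entry + "!/" + resource);
                try (InputStream in = candidate.openStream()) {
                    hits.add(candidate);
                }
            } catch (IOException absent) {
                // The resource is not present in this entry.
            }
        }
        return hits;
    }

    /** Compares the raw bytes of all copies; unreadable copies count as different. */
    static boolean allIdentical(List<URL> locations) {
        try {
            byte[] first = read(locations.get(0));
            for (URL other : locations.subList(1, locations.size())) {
                if (!Arrays.equals(first, read(other))) {
                    return false;
                }
            }
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    static byte[] read(URL url) throws IOException {
        try (InputStream in = url.openStream();
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            byte[] buf = new byte[8192];
            for (int n; (n = in.read(buf)) != -1; ) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        }
    }

    /** Runs the action on a separate thread whose context classloader is the given loader. */
    static void runIsolated(ClassLoader loader, Runnable action) throws InterruptedException {
        Thread t = new Thread(action, "action-runner");
        t.setContextClassLoader(loader);
        t.start();
        t.join();
    }
}
```

Note that setting the context classloader only affects code that explicitly 
asks for it. For every class of an action to pass through the check, the 
action's entry class would itself have to be loaded by this loader.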

Implementing such a classloader is not difficult and would greatly enhance the 
supportability of Oozie. It could work in multiple modes depending on the setup 
- perhaps being able to control it from a workflow config is desirable. If 
there's any problem, we should be able to turn it off completely, too.
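
As a rough illustration of the configurable modes, the sketch below reads a 
mode from a set of properties and falls back to turning the check off. The 
property name, the class ConflictCheckConfig and the mode names are all 
made up for this example and are not existing Oozie settings. Plain 
java.util.Properties stands in for whatever workflow configuration object 
would really be used.

```java
import java.util.Locale;
import java.util.Properties;

/** Hypothetical sketch of reading the conflict-check mode; not part of Oozie. */
class ConflictCheckConfig {

    /** Hypothetical modes; OFF disables the check completely. */
    enum Mode { OFF, WARN, FAIL_IF_DIFFERENT, FAIL_ALWAYS }

    /** Hypothetical property name, chosen only for this example. */
    static final String MODE_KEY = "oozie.action.classloader.conflict.mode";

    /** Returns the configured mode, or OFF when the value is missing or unknown. */
    static Mode fromProperties(Properties props) {
        String raw = props.getProperty(MODE_KEY);
        if (raw == null || raw.trim().isEmpty()) {
            return Mode.OFF;
        }
        try {
            // Accept values such as "fail-if-different" or "FAIL_IF_DIFFERENT".
            return Mode.valueOf(raw.trim().toUpperCase(Locale.ROOT).replace('-', '_'));
        } catch (IllegalArgumentException unknown) {
            // An unknown value disables the check rather than breaking the action.
            return Mode.OFF;
        }
    }
}
```

Defaulting to OFF keeps existing workflows unaffected unless the check is 
explicitly requested.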

  was:
There are a bunch of issues in Oozie which are related to class loading. 

The main problem is that the classpath is constructed in a way which is very 
specific to Oozie:
- Hadoop lib jars
- Sharelib jars
- User-defined jars

Sometimes there is a conflict between sharelib and hadoop lib version. Also, 
users might add their own jars which contains a different version of popular 
libraries such as Guava, Apache commons, etc.

We should be able to detect these conflicts and print exact error message so 
that Oozie user can take appropriate actions to resolve the problem.

A possible approach is the following:
* start the execution of an action on a different thread
* replace the thread's context classloader with a classloader which can detect 
conflicts
* when the JVM invokes the {{loadClass()}} method of the classloader, it  scans 
through the jars (which are available as {{URLClassPath}} objects). If it finds 
the given resource in at least two jars, it can do different things depending 
on the setup:
** throws an error immediately, mentioning the conflicting jars (this is 
probably too strict - but still an option)
** loads the two resource into a byte array and compares them - it only throws 
an error if there is difference
** compares the jars but only emits an error message if there is a conflict
** something else (user defined action?)

Implementing such a classloader is not difficult and would greatly enhance the 
supportability of Oozie. It could work in multiple modes depending on the setup 
- perhaps being able to control it from a workflow config is desirable. If 
there's any problem, we should be able to turn it off completely, too.


> Detect conflicting resources during class loading
> -------------------------------------------------
>
>                 Key: OOZIE-2714
>                 URL: https://issues.apache.org/jira/browse/OOZIE-2714
>             Project: Oozie
>          Issue Type: Improvement
>          Components: core
>            Reporter: Peter Bacsko
>             Fix For: oya
>
>
> There are a bunch of issues in Oozie which are related to class loading. 
> The main problem is that the classpath is constructed in a way which is very 
> specific to Oozie:
> - Hadoop lib jars
> - Sharelib jars
> - User-defined jars
> Sometimes there is a conflict between the sharelib and Hadoop lib versions of 
> a library. Also, users might add their own jars that contain a different 
> version of popular libraries such as Guava, Apache Commons, etc.
> We should be able to detect these conflicts and print an exact error message 
> so that Oozie users can take appropriate action to resolve the problem.
> A possible approach is the following:
> * start the execution of an action on a different thread
> * replace the thread's context classloader with a classloader which can 
> detect conflicts
> * when the JVM invokes the {{loadClass()}} method of the classloader, it 
> scans through the jars (which are available as {{URLClassPath}} objects). If 
> it finds the given resource in at least two jars, it can do different things 
> depending on the setup:
> ** throw an error immediately, mentioning the conflicting jars (this is 
> probably too strict - but still an option)
> ** load the two resources into byte arrays and compare them, throwing an 
> error only if they differ
> ** compare the jars but only emit an error message, without failing, if 
> there is a conflict
> ** something else (user-defined action?)
> Implementing such a classloader is not difficult and would greatly enhance 
> the supportability of Oozie. It could work in multiple modes depending on the 
> setup - perhaps being able to control it from a workflow config is desirable. 
> If there's any problem, we should be able to turn it off completely, too.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
