[ https://issues.apache.org/jira/browse/HADOOP-11127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14148082#comment-14148082 ]

Colin Patrick McCabe commented on HADOOP-11127:
-----------------------------------------------

[~aw]: on a more serious note, what advantages do you see to splitting 
{{libhadoop.so}}?  I can't think of anything off the top of my head, but maybe 
I'm missing something.  I've never liked the wide range of configurations we 
have to support in Hadoop; it seems like splitting libhadoop into N pieces 
would give us 2^N configurations to support, one for each combination of 
pieces being present or absent (3 pieces, for example, would mean 8 possible 
configurations).  More configurations mean less testing for each 
config.  We've seen this with lzo and the other compression libraries... it 
just creates a flood of user questions when a library is optional.  If I were 
starting Hadoop today from scratch, I might make libhadoop.so mandatory just 
because users accidentally forgetting to install or configure it has created so 
much grief over the years.  But of course that's not an option for backwards 
compatibility reasons.

[~cnauroth]: I admit that I think solution #2 is reasonable, since it seems to 
correspond to the way that most users I know use Hadoop.  They don't generally 
run clients of version N unless they have installed servers of version N first. 
 They may run clients of version N-1 against servers of version N, but that 
would work with solution #2.  I can see how it's not entirely in keeping with 
the ideal YARN deployment model, though.

Solution #3 is actually really interesting because it would solve the CLASSPATH 
/ LD_LIBRARY_PATH issue forever.  We wouldn't have to worry about incorrect 
configurations as long as the users had the jars with the native shared 
libraries inside them.  I still get questions about why HDFS short-circuit 
reads don't work for one client or another, and 99% of the time the answer is 
that the path to libhadoop.so has not been configured for that client.  Bundling 
the library in the jar would solve that.  Since most clients that use the native 
libraries already install them from platform-specific RPMs or debs, I think 
switching to this scheme wouldn't be too difficult from a user's point of view.

There's some interesting discussion of loading a native library from a jar 
here: http://frommyplayground.com/how-to-load-native-jni-library-from-jar/  
Jars are essentially zip archives, so we should be able to just inject the 
shared library into the jar somewhere and extract it to a temporary file at 
startup before calling System.load.  This may add some startup-time overhead, 
but I don't think it will be large, since libhadoop is less than a meg...
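
To make the idea concrete, here is a rough sketch of the extract-and-load trick from that article.  The {{NativeLoader}} class name and the {{/native/libhadoop.so}} resource path are made up for illustration; this is not code that exists in Hadoop today.

{code:java}
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.StandardCopyOption;

public class NativeLoader {

  /**
   * Copies the bundled shared library out of the jar into a temporary file
   * and loads it.  System.load() needs a real filesystem path, so the
   * library cannot be loaded directly from inside the jar.
   */
  public static void loadFromJar(String resourcePath) throws IOException {
    try (InputStream in = NativeLoader.class.getResourceAsStream(resourcePath)) {
      if (in == null) {
        throw new IOException("Native library not found on classpath: " + resourcePath);
      }
      File tmp = File.createTempFile("libhadoop", ".so");
      tmp.deleteOnExit();
      Files.copy(in, tmp.toPath(), StandardCopyOption.REPLACE_EXISTING);
      System.load(tmp.getAbsolutePath());
    }
  }

  public static void main(String[] args) throws IOException {
    // Hypothetical location of the bundled library inside the jar.
    loadFromJar("/native/libhadoop.so");
  }
}
{code}

The temporary-file copy is what would account for the startup cost mentioned above; for a library under a megabyte it should be negligible.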

> Improve versioning and compatibility support in native library for downstream 
> hadoop-common users.
> --------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-11127
>                 URL: https://issues.apache.org/jira/browse/HADOOP-11127
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: native
>            Reporter: Chris Nauroth
>
> There is no compatibility policy enforced on the JNI function signatures 
> implemented in the native library.  This library typically is deployed to all 
> nodes in a cluster, built from a specific source code version.  However, 
> downstream applications that want to run in that cluster might choose to 
> bundle a hadoop-common jar at a different version.  Since there is no 
> compatibility policy, this can cause link errors at runtime when the native 
> function signatures expected by hadoop-common.jar do not exist in 
> libhadoop.so/hadoop.dll.
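
(To make the failure mode in the summary concrete: the snippet below is a toy example, not Hadoop code.  Any {{native}} method whose JNI symbol is absent from the deployed libhadoop.so fails in the same way.)

{code:java}
public class LinkErrorDemo {
  static {
    // Maps to libhadoop.so on Linux / hadoop.dll on Windows.
    System.loadLibrary("hadoop");
  }

  // Hypothetical native method that a newer hadoop-common jar might expect.
  // If the installed native library was built from older sources and does not
  // export the matching symbol, the call below compiles fine but throws
  // java.lang.UnsatisfiedLinkError at runtime.
  private static native void someNewNativeCall();

  public static void main(String[] args) {
    someNewNativeCall();
  }
}
{code}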



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
