[ https://issues.apache.org/jira/browse/HADOOP-11127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14148082#comment-14148082 ]
Colin Patrick McCabe commented on HADOOP-11127:
-----------------------------------------------

[~aw]: on a more serious note, what advantages do you see to splitting {{libhadoop.so}}? I can't think of anything off the top of my head, but maybe I'm missing something. I've never liked the wide range of configurations we have to support in Hadoop; splitting libhadoop into N pieces would add 2^N configurations (one for each combination of pieces being present or absent). More configurations means less testing for each config. We've seen this with lzo and the other compression libraries: making a library optional just creates a flood of user questions. If I were starting Hadoop from scratch today, I might make libhadoop.so mandatory, simply because users accidentally forgetting to install or configure it has created so much grief over the years. But of course that's not an option for backwards compatibility reasons.

[~cnauroth]: I admit that I think solution #2 is reasonable, since it seems to correspond to the way that most users I know use Hadoop. They don't generally run clients of version N unless they have installed servers of version N first. They may run clients of version N-1 against servers of version N, but that would work with solution #2. I can see how it's not entirely in keeping with the ideal YARN deployment model, though.

Solution #3 is actually really interesting because it would solve the CLASSPATH / LD_LIBRARY_PATH issue forever. We wouldn't have to worry about incorrect configurations as long as the users had the jars with the native shared libraries inside them. I still get questions about why HDFS short-circuit reads don't work for one client or another, and 99% of the time the answer is that the client hasn't configured the path to libhadoop.so. This would solve that.
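For solution #3, a minimal sketch of loading a bundled native library from the jar might look like the following. This is only an illustration of the general technique (extract the .so from the classpath to a temp file, then {{System.load()}} it); the class, method, and resource names here are hypothetical and not actual Hadoop code.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical sketch: extract a native library packed inside the jar
// to a temporary file, then load it with System.load().
public class NativeLibLoader {

  // Copy the given stream to a temp file and return the file.
  static File extractToTemp(InputStream in, String prefix, String suffix)
      throws IOException {
    File temp = File.createTempFile(prefix, suffix);
    temp.deleteOnExit();
    try (OutputStream out = new FileOutputStream(temp)) {
      byte[] buf = new byte[8192];
      int n;
      while ((n = in.read(buf)) > 0) {
        out.write(buf, 0, n);
      }
    } finally {
      in.close();
    }
    return temp;
  }

  // Look up the library as a classpath resource (e.g. "/libhadoop.so"
  // injected at the jar root) and load it from the extracted copy.
  static void loadFromJar(String resourceName) throws IOException {
    InputStream in = NativeLibLoader.class.getResourceAsStream(resourceName);
    if (in == null) {
      throw new IOException("resource not found: " + resourceName);
    }
    File lib = extractToTemp(in, "libhadoop", ".so");
    // System.load() requires an absolute path, unlike System.loadLibrary().
    System.load(lib.getAbsolutePath());
  }
}
```

The extraction cost is a one-time copy of the library at startup, which is small given the size of libhadoop.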
Since most clients who use the native libraries install RPMs or debs specific to their platforms, I think switching to this system wouldn't be too difficult from a user's point of view. There's some interesting discussion of loading a native library from a jar here: http://frommyplayground.com/how-to-load-native-jni-library-from-jar/ As I understand it, jars are pretty similar to tar files, so we should be able to just inject this file into the jar somewhere. This may add some startup time overhead, but I don't think it will be that large, since libhadoop is less than a megabyte.

> Improve versioning and compatibility support in native library for downstream hadoop-common users.
> --------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-11127
>                 URL: https://issues.apache.org/jira/browse/HADOOP-11127
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: native
>            Reporter: Chris Nauroth
>
> There is no compatibility policy enforced on the JNI function signatures implemented in the native library. This library is typically deployed to all nodes in a cluster, built from a specific source code version. However, downstream applications that want to run in that cluster might choose to bundle a hadoop-common jar at a different version. Since there is no compatibility policy, this can cause link errors at runtime when the native function signatures expected by hadoop-common.jar do not exist in libhadoop.so/hadoop.dll.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)