"Why not just use the hadoop classpath generated by running `hadoop classpath`"
I like it! +1

On Tue, Jul 9, 2013 at 11:33 PM, Jonathan Hsieh <j...@cloudera.com> wrote:
> tl;dr
> Ideally the generation of hadoop+accumulo's classpath should be done in
> only one place. For every version of hadoop I've seen in the last five
> years, there has been one place to get hadoop's classpath: the
> `hadoop classpath` command. Why not use it?
>
> ----
>
> I've found that the hadoop2 packages from Bigtop deploy to different
> locations than the stock hadoop2 tarballs, which requires yet another
> round of path wrangling across hadoop1, hadoop2, and the different
> deployment mechanisms. It's kind of a mess:
>
> hadoop1:
> $HADOOP_HOME/*.jar
> $HADOOP_HOME/lib/*.jar
>
> hadoop2 tarball:
> $HADOOP_HOME/lib/share/hadoop/common/*.jar
> $HADOOP_HOME/lib/share/hadoop/common/lib/*.jar
> $HADOOP_HOME/lib/share/hadoop/hdfs/*.jar
> $HADOOP_HOME/lib/share/hadoop/hdfs/lib/*.jar
> $HADOOP_HOME/lib/share/hadoop/mapreduce/*.jar
> $HADOOP_HOME/lib/share/hadoop/mapreduce/lib/*.jar
>
> hadoop2 bigtop rpm:
> /usr/lib/hadoop-mapreduce ...
> /usr/lib/hadoop-hdfs ...
> /usr/lib/hadoop-yarn ...
>
> There is already a script in place that generates hadoop classpaths
> consistently across multiple versions and deployments: the
> `hadoop classpath` command. Why not just use the classpath generated by
> running `hadoop classpath`, instead of constructing the classpath in
> java/python/xml code and having to modify it for each kind of hadoop?
>
> Does this seem reasonable? Where would there be trouble spots?
>
> Semi-related: I've found that if $HADOOP_HOME/lib doesn't contain
> certain jars that accumulo depends on, the Platform or Main wrapper
> programs can fail.
>
> A follow-on would be to consolidate the 3 accumulo+hadoop classpath
> generation locations into one, or at least to factor out the base
> hadoop+accumulo parts.
>
> The accumulo+hadoop classpath generation currently happens in 3 places:
> 1) hard-coded with hadoop1 locations in java in the AccumuloClassLoader,
> 2) as comments in the accumulo-site.xml.example file,
> 3) in the soon-to-be-deprecated TestUtils.py (in the system/auto test
> suite).
>
> I don't quite understand all of the classloader magic yet, but I'd
> wager it is used for extensions like iterators. Could we have just one
> initial point of accumulo+hadoop classpath generation, and then use the
> xml config to add more jars for the nested classloaders that handle the
> magic of dfs jar loading and iterator extensions?
>
> Thanks,
> Jon.
>
> --
> // Jonathan Hsieh (shay)
> // Software Engineer, Cloudera
> // j...@cloudera.com
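The approach proposed in the thread can be sketched as a small shell helper. The `build_classpath` function, the `/opt/accumulo` layout, and the launcher class name below are hypothetical, for illustration only; the key idea from the thread is that the hadoop half of the path comes from the `hadoop classpath` command rather than from per-layout jar globs.

```shell
# Sketch of a launcher fragment that delegates hadoop jar discovery to
# the `hadoop classpath` command instead of hard-coding hadoop1/hadoop2/
# bigtop glob patterns. build_classpath is a hypothetical helper.

build_classpath() {
    accumulo_home="$1"
    hadoop_cp="$2"   # in a real launcher: hadoop_cp="$(hadoop classpath)"
    # Accumulo's own jars first, then whatever hadoop reports for itself.
    printf '%s/lib/*:%s' "$accumulo_home" "$hadoop_cp"
}

# Real usage (requires hadoop on PATH):
#   CLASSPATH="$(build_classpath "$ACCUMULO_HOME" "$(hadoop classpath)")"
#   exec java -cp "$CLASSPATH" org.apache.accumulo.start.Main "$@"

# Illustration with a fixed hadoop classpath string:
cp="$(build_classpath /opt/accumulo '/usr/lib/hadoop/*:/usr/lib/hadoop/lib/*')"
echo "$cp"
```

Because the hadoop-specific portion is produced by hadoop itself, the same fragment works unchanged across hadoop1, hadoop2 tarball, and Bigtop package layouts.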