> On 3 Mar 2015, at 21:14, Keith Turner <[email protected]> wrote:
>
> But that is not what prompted this discussion. Fluo depends on Accumulo
> and Hadoop. Currently Fluo uses maven to build its complete runtime
> classpath (w/ maven its easy to exclude things like log4j). This is
> problematic in the case where the user builds Fluo with version X of hadoop
> and has version Y running on their cluster. I am looking into making the
> fluo scripts build the runtime classpath using the installed software, with
> something like the following.
>
> FLUO_CLASSPATH=$FLUO_HOME/lib/*:$ACCUMULO_HOME/lib/*:`hadoop classpath`
>
> Using this method `hadoop classpath` brings in log4j and slf4j-log4j which
> makes slf4j unhappy, because twill brings in logback slf4j bindings.

The trend in YARN apps is to distribute their entire set of dependencies, pulling in only the Hadoop conf dirs to their classpath. The benefits are mixed.

good:
- isolation of dependencies
- 100% confidence your Hadoop APIs are in sync
- works in clusters where the nodes do rolling upgrades and different parts of the cluster can be running different versions of Hadoop at the same time

bad:
- more stuff to upload to the distributed cache
- your binaries aren't the same as the cluster's, especially if the cluster is not running ASF releases but things built by other people. You can fix that by using different mvn repos at build time, but then you have to build things yourself
- even with different classpaths, you all share the same native binaries. Try to run a 2.6 app on a 2.4 cluster and things that call native code may have link problems.
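
For what it's worth, the "ship everything, take only the conf dir" approach could look roughly like the sketch below. It is only an illustration: the build_classpath helper and the default paths are made up for the example, and it assumes $FLUO_HOME/lib already contains every jar the app needs, including the Hadoop and Accumulo client jars.

```shell
#!/usr/bin/env bash
# Sketch only: build a runtime classpath from the app's own bundled jars
# plus the Hadoop configuration directory, with no jars from the cluster.
# build_classpath is a hypothetical helper, not part of Fluo or Hadoop.

build_classpath() {
  local app_home="$1"   # where the app and its bundled jars are installed
  local conf_dir="$2"   # directory holding core-site.xml, yarn-site.xml, ...
  # The wildcard stays quoted so the JVM, not the shell, expands lib/*.
  printf '%s:%s' "${app_home}/lib/*" "${conf_dir}"
}

# Hypothetical defaults; a real install would set these in its env script.
FLUO_HOME="${FLUO_HOME:-/opt/fluo}"
HADOOP_CONF_DIR="${HADOOP_CONF_DIR:-/etc/hadoop/conf}"

FLUO_CLASSPATH="$(build_classpath "$FLUO_HOME" "$HADOOP_CONF_DIR")"
export FLUO_CLASSPATH
echo "$FLUO_CLASSPATH"
```

Since no cluster jars end up on the classpath, the log4j/slf4j-log4j binding that `hadoop classpath` drags in never gets there either. The cost is the one listed above: everything under lib/ has to be shipped with the app.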
