Hey Matthias,

I cc'd everyone else on here, but since this was your module, I thought it best to solicit your opinion before refactoring it.

We never managed to get crunch-archetypes working with Hadoop 2.x, which is apparently deprecating the lib/* trick for including client dependencies in favor of the -libjars option (see http://blog.cloudera.com/blog/2011/01/how-to-include-third-party-libraries-in-your-map-reduce-job/ and http://architects.dzone.com/articles/using-libjars-option-hadoop ).

The way that I have found to do this in Maven is to use the copy-dependencies goal of the maven-dependency-plugin and include a shell script in a bin/ directory that knows how to set up the HADOOP_CLASSPATH and -libjars arguments for use with hadoop jar. Although this approach is more complex than the lib/* trick, it will be able to support Hadoop 1.x as well as Hadoop 2.x.

Do you have any objections to me taking this on, and/or any other landmines I should keep an eye out for?

Thanks!
Josh

-- 
Director of Data Science
Cloudera <http://www.cloudera.com>
Twitter: @josh_wills <http://twitter.com/josh_wills>
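
P.S. For concreteness, here is a minimal sketch of what such a bin/ launcher might look like. The function names (join_jars, run_job), the argument order, and the idea of a single directory holding the copied dependency jars are illustrative assumptions, not anything that exists in crunch-archetypes today. It also assumes the job's main class runs through ToolRunner, since -libjars is a generic option handled by GenericOptionsParser and has to come before the job's own arguments.

```shell
#!/bin/sh
# Hypothetical launcher sketch; names and layout are assumptions for illustration.

# Join every *.jar found in directory $1 using separator $2.
# Prints an empty string if the directory contains no jars.
join_jars() {
  dir=$1
  sep=$2
  out=""
  for jar in "$dir"/*.jar; do
    [ -e "$jar" ] || continue          # skip the unexpanded glob when no jars exist
    if [ -z "$out" ]; then
      out=$jar
    else
      out="$out$sep$jar"
    fi
  done
  printf '%s' "$out"
}

# Usage: run_job <app-jar> <main-class> <lib-dir> [job args...]
# Assumes <lib-dir> holds at least one dependency jar.
run_job() {
  app_jar=$1
  main_class=$2
  lib_dir=$3
  shift 3

  # Client-side classpath: colon-separated, prepended to any existing value.
  HADOOP_CLASSPATH="$(join_jars "$lib_dir" ":")${HADOOP_CLASSPATH:+:$HADOOP_CLASSPATH}"
  export HADOOP_CLASSPATH

  # Task-side classpath: comma-separated list passed via -libjars.
  hadoop jar "$app_jar" "$main_class" -libjars "$(join_jars "$lib_dir" ",")" "$@"
}
```

A wrapper script in bin/ would then call something like run_job with the application jar, the main class, and the directory that copy-dependencies writes into, followed by the job's own arguments.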
