Hi Shawn,
thanks a lot for your mail and your patch! My answers are below.
On Tuesday, 2012-08-14, Shawn Smith wrote:
[...]
> 3. EMR Hadoop 1.0.3 includes Avro 1.5.3 which apparently takes precedence
> over Crunch's Avro 1.7.0. I didn't mess around with trying to get my classes
> in the class path first… Instead I used the maven-shade-plugin in my job's
> build to shade Avro 1.7.0 from "org.apache.avro.*" to
> "shaded.org.apache.avro.*" so it wouldn't conflict with the EMR version of
> Avro. Example exception (you can see the Avro source code line numbers
> correspond to version 1.5.3):
[...]
Hadoop provides no classloader isolation; I've been bitten by this several
times, too. There's a crude workaround you can try:
export HADOOP_USER_CLASSPATH_FIRST=true
You have to set it before running the hadoop script. Until Hadoop itself is
fixed, I don't see any other options.
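A minimal sketch of how the workaround might be used, assuming the Hadoop 1.x `bin/hadoop` launcher. As I understand that script, `HADOOP_USER_CLASSPATH_FIRST` only changes where the jars listed in `HADOOP_CLASSPATH` go (ahead of Hadoop's own jars instead of after them), so `HADOOP_CLASSPATH` has to be set as well. The jar path, job jar and class name below are hypothetical.

```shell
# Assumption: Hadoop 1.x bin/hadoop launcher. HADOOP_CLASSPATH lists the jars
# that should win; HADOOP_USER_CLASSPATH_FIRST asks the script to put them
# ahead of its bundled jars. The path below is hypothetical.
export HADOOP_CLASSPATH=/home/hadoop/myjob/lib/avro-1.7.0.jar
export HADOOP_USER_CLASSPATH_FIRST=true
# Then launch from this same shell, e.g. (hypothetical jar and class):
#   hadoop jar myjob.jar com.example.MyPipeline
```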
[...]
> 4. EMR Hadoop 1.0.3 includes two different versions of SLF4J in the class
> path: 1.4.3 and 1.6.4. As a result, jobs that use SLF4J will fail
> non-deterministically when a particular run uses slf4j-api-1.4.3.jar with
> slf4j-log4j12-1.6.4.jar, as described in the SLF4J FAQ
> (http://www.slf4j.org/faq.html#IllegalAccessError). It looks like you can
> work around the problem by using shaded SLF4J jars and not relying on the ones
> provided by the Hadoop distribution. The stack trace looks something like
> this:
That's a nasty bug in EMR's Hadoop. We've recently downgraded our dependency
to SLF4J 1.4.3, and I'm going to try to fix some more of these problems in
CRUNCH-16.
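To check whether a given classpath mixes SLF4J versions the way described above, something like the following could help. `slf4j_versions` is a hypothetical helper, not part of Hadoop or SLF4J: it splits a colon-separated classpath and prints each distinct version number found in `slf4j-*-<version>.jar` entries. It assumes jars follow that naming scheme and does not expand directory or wildcard entries.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the distinct SLF4J versions found on a
# colon-separated classpath, one per line. More than one line means
# slf4j-api and its binding may come from different releases.
slf4j_versions() {
  printf '%s\n' "$1" | tr ':' '\n' \
    | sed -n 's/.*slf4j-[a-z0-9]*-\([0-9][0-9.]*[0-9]\)\.jar$/\1/p' \
    | sort -u
}

# Example with made-up paths mirroring the EMR situation:
slf4j_versions "/opt/lib/slf4j-api-1.4.3.jar:/opt/lib/slf4j-log4j12-1.6.4.jar"
```

For the example classpath this lists both 1.4.3 and 1.6.4, which is exactly the mismatch the SLF4J FAQ warns about.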
Please let us know if you experience any more problems!
Regards,
Matthias