Thanks for your reply, Aniket. OK, I've done this and I'm still confused. Output from running locally shows:

file:/home/tom/spark-avro/target/scala-2.10/simpleapp.jar
file:/home/tom/spark-1.4.0-bin-hadoop2.4/conf/
file:/home/tom/spark-1.4.0-bin-hadoop2.4/lib/spark-assembly-1.4.0-hadoop2.4.0.jar
file:/home/tom/spark-1.4.0-bin-hadoop2.4/lib/datanucleus-core-3.2.10.jar
file:/home/tom/spark-1.4.0-bin-hadoop2.4/lib/datanucleus-api-jdo-3.2.6.jar
file:/home/tom/spark-1.4.0-bin-hadoop2.4/lib/datanucleus-rdbms-3.2.9.jar
file:/usr/lib/jvm/java-7-oracle/jre/lib/ext/sunjce_provider.jar
file:/usr/lib/jvm/java-7-oracle/jre/lib/ext/zipfs.jar
file:/usr/lib/jvm/java-7-oracle/jre/lib/ext/localedata.jar
file:/usr/lib/jvm/java-7-oracle/jre/lib/ext/dnsns.jar
file:/usr/lib/jvm/java-7-oracle/jre/lib/ext/sunec.jar
file:/usr/lib/jvm/java-7-oracle/jre/lib/ext/sunpkcs11.jar
saving text file... done!

In yarn-client mode:

file:/home/hadoop/simpleapp.jar
file:/usr/lib/hadoop/hadoop-auth-2.6.0-amzn-0.jar
...
*file:/usr/lib/hadoop-mapreduce/avro-1.7.4.jar*
...

And in yarn-cluster mode:

file:/mnt/yarn/usercache/hadoop/appcache/application_1441787021820_0004/container_1441787021820_0004_01_000001/__app__.jar
...
*file:/usr/lib/hadoop/lib/avro-1.7.4.jar*
...
saving text file... done!

In yarn-cluster mode it doesn't appear to have sight of the fat jar (simpleapp), but can see avro-1.7.4, but runs fine!

Thanks,
Tom

On Wed, Sep 9, 2015 at 9:49 AM Aniket Bhatnagar <aniket.bhatna...@gmail.com> wrote:

> Hi Tom
>
> There has to be a difference in classpaths in yarn-client and yarn-cluster
> mode. Perhaps a good starting point would be to print classpath as a first
> thing in SimpleApp.main. It should give clues around why it works in
> yarn-cluster mode.
>
> Thanks,
> Aniket
>
> On Wed, Sep 9, 2015, 2:11 PM Tom Seddon <mr.tom.sed...@gmail.com> wrote:
>
>> Hi,
>>
>> I have a problem trying to get a fairly simple app working which makes
>> use of native avro libraries. The app runs fine on my local machine and in
>> yarn-cluster mode, but when I try to run it on EMR yarn-client mode I get
>> the error below. I'm aware this is a version problem, as EMR runs an
>> earlier version of avro, and I am trying to use avro-1.7.7.
>>
>> What's confusing me a great deal is the fact that this runs fine in
>> yarn-cluster mode.
>>
>> What is it about yarn-cluster mode that means the application has access
>> to the correct version of the avro library? I need to run in yarn-client
>> mode as I will be caching data to the driver machine in between batches. I
>> think in yarn-cluster mode the driver can run on any machine in the cluster
>> so this would not work.
>>
>> Grateful for any advice as I'm really stuck on this. AWS support are
>> trying but they don't seem to know why this is happening either!
>>
>> Just to note, I'm aware of Databricks spark-avro project and have used
>> it. This is an investigation to see if I can use RDDs instead of
>> dataframes.
>>
>> java.lang.NoSuchMethodError: org.apache.avro.Schema$Parser.parse(Ljava/lang/String;[Ljava/lang/String;)Lorg/apache/avro/Schema;
>> at ophan.thrift.event.Event.<clinit>(Event.java:10)
>> at SimpleApp$.main(SimpleApp.scala:25)
>> at SimpleApp.main(SimpleApp.scala)
>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>> at java.lang.reflect.Method.invoke(Method.java:606)
>> at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:665)
>> at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:170)
>> at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:193)
>> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:112)
>> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>>
>> Thanks,
>>
>> Tom
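
For anyone following the thread, here is a minimal sketch of the check suggested above (print the classpath first thing in SimpleApp.main), together with a lookup of which jar a given class was actually loaded from. The object name ClasspathDebug and both helper names are made up for illustration and are not taken from the original app. It assumes the class loaders in the chain are URLClassLoader instances, which is the case on Java 7 as used here but not for the default application loader on Java 9 and later, where the first helper may return nothing.

```scala
import java.net.URLClassLoader

// Hypothetical helper, not part of the original SimpleApp.
object ClasspathDebug {

  // Collect the URLs exposed by every URLClassLoader in the parent chain,
  // starting at `start`. Loaders of any other type contribute nothing.
  def classpathUrls(start: ClassLoader): List[String] =
    Iterator.iterate(start)(_.getParent)
      .takeWhile(_ != null)
      .flatMap {
        case u: URLClassLoader => u.getURLs.iterator.map(_.toString)
        case _                 => Iterator.empty
      }
      .toList

  // Location (usually a jar) that a class was loaded from.
  // Returns None when the JVM reports no code source, e.g. for JDK classes.
  def jarOf(cls: Class[_]): Option[String] =
    for {
      domain   <- Option(cls.getProtectionDomain)
      source   <- Option(domain.getCodeSource)
      location <- Option(source.getLocation)
    } yield location.toString

  def main(args: Array[String]): Unit = {
    classpathUrls(Thread.currentThread.getContextClassLoader).foreach(println)
    println("this class loaded from: " + jarOf(getClass).getOrElse("<unknown>"))
  }
}
```

Calling ClasspathDebug.jarOf(classOf[org.apache.avro.Schema]) from SimpleApp.main in each mode should show whether avro-1.7.4 or the copy bundled in the fat jar is the one the driver picks up. That call needs avro on the classpath, so it is left out of the block above.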