Hi,

I have a problem getting a fairly simple app working that makes use of the
native Avro libraries.  The app runs fine on my local machine and in
yarn-cluster mode, but when I try to run it on EMR in yarn-client mode I get
the error below.  I'm aware this is a version problem, as EMR ships an
earlier version of Avro and I am trying to use avro-1.7.7.
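
For context, the relevant part of my build looks roughly like this (just a
sketch; the Avro coordinates are the standard ones, everything else in my
build is omitted):

    // build.sbt (sketch) -- pin Avro 1.7.7 into the application jar
    libraryDependencies ++= Seq(
      "org.apache.avro" % "avro" % "1.7.7"
    )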

What's confusing me a great deal is the fact that this runs fine in
yarn-cluster mode.

What is it about yarn-cluster mode that means the application has access to
the correct version of the Avro library?  I need to run in yarn-client mode
as I will be caching data on the driver machine in between batches.  I
think that in yarn-cluster mode the driver can run on any machine in the
cluster, so this would not work.

Grateful for any advice as I'm really stuck on this.  AWS support are
trying to help, but they don't seem to know why this is happening either!

Just to note, I'm aware of the Databricks spark-avro project and have used
it.  This is an investigation into whether I can use RDDs instead of
DataFrames.
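
For reference, the RDD-based read I have in mind is roughly the following
(a sketch only; the path and the mapping at the end are placeholders, and it
assumes the standard avro-mapred AvroKeyInputFormat rather than anything
EMR-specific):

    import org.apache.avro.generic.GenericRecord
    import org.apache.avro.mapred.AvroKey
    import org.apache.avro.mapreduce.AvroKeyInputFormat
    import org.apache.hadoop.io.NullWritable
    import org.apache.spark.{SparkConf, SparkContext}

    object AvroRddSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("SimpleApp"))

        // Read Avro container files into an RDD of GenericRecord.
        // Note: the input format reuses record objects, so map to something
        // of your own (or copy the record) before caching the RDD.
        val records = sc
          .newAPIHadoopFile[AvroKey[GenericRecord], NullWritable, AvroKeyInputFormat[GenericRecord]](
            "s3://my-bucket/events/*.avro") // placeholder path
          .map { case (key, _) => key.datum().toString }

        println(records.count())
        sc.stop()
      }
    }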

java.lang.NoSuchMethodError: org.apache.avro.Schema$Parser.parse(Ljava/lang/String;[Ljava/lang/String;)Lorg/apache/avro/Schema;
    at ophan.thrift.event.Event.<clinit>(Event.java:10)
    at SimpleApp$.main(SimpleApp.scala:25)
    at SimpleApp.main(SimpleApp.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:665)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:170)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:193)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:112)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

Thanks,

Tom