Sandy Ryza created SPARK-1851:
---------------------------------

             Summary: Upgrade Avro dependency to 1.7.6 so Spark can read Avro files
                 Key: SPARK-1851
                 URL: https://issues.apache.org/jira/browse/SPARK-1851
             Project: Spark
          Issue Type: Improvement
          Components: Spark Core
            Reporter: Sandy Ryza
            Priority: Critical


I tried to set up a basic example in which a Spark job reads an Avro container 
file using Avro specific records (generated classes).  This fails with a 
ClassNotFoundException: can't convert GenericData.Record to 
com.cloudera.sparkavro.User.

The reason is:
* When creating records, Avro decides whether to build a specific or a generic 
record by trying to load a class with the name specified in the schema.
* Initially, executors just have the system jars (which include Avro), and load 
the app jars dynamically with a URLClassLoader that's set as the context 
classloader for the task threads.
* Avro tries to load the generated classes with 
SpecificData.class.getClassLoader(), which sidesteps this URLClassLoader and 
goes up to the AppClassLoader.

Avro 1.7.6 includes a change (AVRO-987) that falls back to the thread's context 
classloader when loading through SpecificData.class.getClassLoader() fails.  I 
tested with Avro 1.7.6 and did not observe the problem.
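
The failure mode and the fallback described above can be sketched in plain 
Java.  This is only an illustration, not Avro's or Spark's actual code: the 
class ContextClassLoaderFallback and its methods load() and demo() are 
hypothetical names.  A URLClassLoader with no URLs and a null parent stands in 
for the loader that cannot see the application classes, and the thread's 
context classloader is set to a loader that can.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;

// Hypothetical illustration; not taken from Avro or Spark.
final class ContextClassLoaderFallback {

    // Try the primary loader first. If it cannot find the class and the
    // fallback is enabled, try the calling thread's context classloader.
    // Returns null when the class cannot be found.
    static Class<?> load(String name, ClassLoader primary, boolean fallBackToContext) {
        try {
            return Class.forName(name, false, primary);
        } catch (ClassNotFoundException e) {
            if (!fallBackToContext) {
                return null;
            }
        }
        ClassLoader context = Thread.currentThread().getContextClassLoader();
        if (context == null) {
            return null;
        }
        try {
            return Class.forName(name, false, context);
        } catch (ClassNotFoundException e) {
            return null;
        }
    }

    // Returns {found without fallback, found with fallback} for a class that
    // only the context classloader can see.
    static boolean[] demo() {
        String appClassName = ContextClassLoaderFallback.class.getName();
        Thread thread = Thread.currentThread();
        ClassLoader saved = thread.getContextClassLoader();
        // Null parent: this loader delegates only to the bootstrap loader,
        // so it cannot see application classes.
        try (URLClassLoader systemOnly = new URLClassLoader(new URL[0], null)) {
            // Stand-in for the executor setting the app-jar loader as the
            // context classloader of the task thread.
            thread.setContextClassLoader(ContextClassLoaderFallback.class.getClassLoader());
            boolean foundWithoutFallback = load(appClassName, systemOnly, false) != null;
            boolean foundWithFallback = load(appClassName, systemOnly, true) != null;
            return new boolean[] {foundWithoutFallback, foundWithFallback};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            thread.setContextClassLoader(saved);
        }
    }

    public static void main(String[] args) {
        boolean[] result = demo();
        System.out.println("primary loader only:   found=" + result[0]);
        System.out.println("with context fallback: found=" + result[1]);
    }
}
```

Without the fallback the lookup fails, which corresponds to Avro producing a 
GenericData.Record instead of the generated class.  With the fallback the class 
is found through the context classloader.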


