[ https://issues.apache.org/jira/browse/SPARK-19656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15896194#comment-15896194 ]
Eric Maynard commented on SPARK-19656:
--------------------------------------

Here is a complete working example in Java:
{code:title=AvroTest.java|borderStyle=solid}
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

import java.util.ArrayList;

public class AvroTest {
    public static void main(String[] args) {
        // Build the Spark session:
        System.setProperty("hadoop.home.dir", "C:\\Hadoop"); // Windows hack
        SparkSession spark = SparkSession.builder().master("local").appName("Avro Test")
                .config("spark.sql.warehouse.dir", "file:///c:/tmp/spark-warehouse") // another Windows hack
                .getOrCreate();

        // Create data:
        ArrayList<CustomClass> list = new ArrayList<>();
        CustomClass cc = new CustomClass();
        cc.setValue(5);
        list.add(cc);
        spark.createDataFrame(list, CustomClass.class)
                .write().format("com.databricks.spark.avro").save("C:\\tmp\\file.avro");

        // Read the data back:
        Row row = spark.read().format("com.databricks.spark.avro").load("C:\\tmp\\file.avro").head();
        System.out.println("Success =\t" + ((Integer) row.get(0) == 5));
    }
}
{code}
With a simple custom class:
{code:title=CustomClass.java|borderStyle=solid}
import java.io.Serializable;

public class CustomClass implements Serializable {
    public int value;
    public void setValue(int value) { this.value = value; }
    public int getValue() { return this.value; }
}
{code}
Everything looks OK to me, and the main method prints "Success = true". In the future, please make sure the issue isn't in your own application before opening a JIRA. As an aside, I really recommend picking up some Scala; IMO the Scala API is much friendlier, especially around the edges for things like the Avro library.
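One detail worth flagging in the snippet above: declaring the list with a raw type (e.g. {{ArrayList list = new ArrayList<CustomClass>();}}) compiles, but silently disables element type checking, so objects of any type can be added. A minimal standalone sketch of the difference, using plain JDK collections and a hypothetical {{Widget}} class as a stand-in for a bean like {{CustomClass}}:

```java
import java.util.ArrayList;
import java.util.List;

public class RawTypeDemo {
    // Hypothetical stand-in for a simple bean such as CustomClass.
    static class Widget {
        int value;
        Widget(int value) { this.value = value; }
    }

    @SuppressWarnings({"rawtypes", "unchecked"})
    public static void main(String[] args) {
        // Raw type: the compiler no longer checks element types.
        ArrayList raw = new ArrayList<Widget>();
        raw.add("not a widget"); // compiles, but poisons the list

        // Parameterized type: the same mistake is a compile-time error.
        List<Widget> typed = new ArrayList<>();
        typed.add(new Widget(5));
        // typed.add("not a widget"); // does not compile

        System.out.println(((Widget) typed.get(0)).value);          // prints 5
        System.out.println(raw.get(0).getClass().getSimpleName());  // prints String
    }
}
```

In the example above this happens to be harmless because only {{CustomClass}} instances are added, but the parameterized form costs nothing and catches the mistake at compile time.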
> Can't load custom type from avro file to RDD with newAPIHadoopFile
> ------------------------------------------------------------------
>
>                 Key: SPARK-19656
>                 URL: https://issues.apache.org/jira/browse/SPARK-19656
>             Project: Spark
>          Issue Type: Question
>          Components: Java API
>    Affects Versions: 2.0.2
>            Reporter: Nira Amit
>
> If I understand correctly, in Scala it's possible to load custom objects from
> avro files to RDDs this way:
> {code}
> ctx.hadoopFile("/path/to/the/avro/file.avro",
>   classOf[AvroInputFormat[MyClassInAvroFile]],
>   classOf[AvroWrapper[MyClassInAvroFile]],
>   classOf[NullWritable])
> {code}
> I'm not a Scala developer, so I tried to "translate" this to Java as best I
> could. I created classes that extend AvroKey and FileInputFormat:
> {code}
> public static class MyCustomAvroKey extends AvroKey<MyCustomClass>{};
>
> public static class MyCustomAvroReader extends
>         AvroRecordReaderBase<MyCustomAvroKey, NullWritable, MyCustomClass> {
>     // with my custom schema and all the required methods...
> }
>
> public static class MyCustomInputFormat extends
>         FileInputFormat<MyCustomAvroKey, NullWritable> {
>     @Override
>     public RecordReader<MyCustomAvroKey, NullWritable>
>             createRecordReader(InputSplit inputSplit, TaskAttemptContext
>             taskAttemptContext) throws IOException, InterruptedException {
>         return new MyCustomAvroReader();
>     }
> }
> ...
> JavaPairRDD<MyCustomAvroKey, NullWritable> records =
>     sc.newAPIHadoopFile("file:/path/to/datafile.avro",
>         MyCustomInputFormat.class, MyCustomAvroKey.class,
>         NullWritable.class,
>         sc.hadoopConfiguration());
> MyCustomClass first = records.first()._1.datum();
> System.out.println("Got a result, some custom field: " +
>     first.getSomeCustomField());
> {code}
> This compiles fine, but using a debugger I can see that `first._1.datum()`
> actually returns a `GenericData$Record` at runtime, not a `MyCustomClass`
> instance.
> And indeed, when the following line executes:
> {code}
> MyCustomClass first = records.first()._1.datum();
> {code}
> I get an exception:
> {code}
> java.lang.ClassCastException: org.apache.avro.generic.GenericData$Record
> cannot be cast to my.package.containing.MyCustomClass
> {code}
> Am I doing it wrong? Or is this not possible in Java?

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
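As to the quoted question itself: a `GenericData$Record` typically shows up when the record reader falls back to a generic datum reader instead of a specific one, which is easy to trigger with a hand-rolled input format. One common approach with the stock avro-mapred classes is to use `AvroKeyInputFormat` and register the specific reader schema through `AvroJob`. A rough, untested sketch of that approach, assuming `MyCustomClass` is an Avro-generated specific record on the classpath, `sc` is an existing `JavaSparkContext`, and the path is a placeholder (the raw class literals produce an unchecked-assignment warning):
{code:title=sketch|borderStyle=solid}
import org.apache.avro.mapred.AvroKey;
import org.apache.avro.mapreduce.AvroJob;
import org.apache.avro.mapreduce.AvroKeyInputFormat;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.spark.api.java.JavaPairRDD;

// Register the specific reader schema so records deserialize as
// MyCustomClass instances rather than GenericData$Record:
Job job = Job.getInstance(sc.hadoopConfiguration());
AvroJob.setInputKeySchema(job, MyCustomClass.getClassSchema());

JavaPairRDD<AvroKey<MyCustomClass>, NullWritable> records =
    sc.newAPIHadoopFile("file:/path/to/datafile.avro",
        AvroKeyInputFormat.class, AvroKey.class, NullWritable.class,
        job.getConfiguration());

MyCustomClass first = records.first()._1().datum();
{code}
If the generated specific class is not on the executor classpath, Avro silently falls back to generic records even with the schema set, so that is worth checking first.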