[ https://issues.apache.org/jira/browse/SPARK-19656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15873090#comment-15873090 ]
Nira Amit commented on SPARK-19656:
-----------------------------------

I also tried to do this without writing my own `AvroKey` and `AvroKeyInputFormat`:
{code}
JavaPairRDD<AvroKey<MyCustomClass>, NullWritable> records =
        sc.newAPIHadoopFile("file:/path/to/file.avro",
                new AvroKeyInputFormat<MyCustomClass>().getClass(),
                new AvroKey<MyCustomClass>().getClass(),
                NullWritable.class,
                sc.hadoopConfiguration());
{code}
I would have expected this to work, but instead it fails with a compilation error:
{code}
Error:(263, 36) java: incompatible types: inferred type does not conform to equality constraint(s)
    inferred: org.apache.avro.mapred.AvroKey<my.package.containing.MyCustomClass>
    equality constraints(s): org.apache.avro.mapred.AvroKey<my.package.containing.MyCustomClass>,capture#1 of ? extends org.apache.avro.mapred.AvroKey
{code}

> Can't load custom type from avro file to RDD with newAPIHadoopFile
> ------------------------------------------------------------------
>
>                 Key: SPARK-19656
>                 URL: https://issues.apache.org/jira/browse/SPARK-19656
>             Project: Spark
>          Issue Type: Question
>          Components: Java API
>    Affects Versions: 2.0.2
>            Reporter: Nira Amit
>
> If I understand correctly, in Scala it's possible to load custom objects from Avro files into RDDs this way:
> {code}
> ctx.hadoopFile("/path/to/the/avro/file.avro",
>     classOf[AvroInputFormat[MyClassInAvroFile]],
>     classOf[AvroWrapper[MyClassInAvroFile]],
>     classOf[NullWritable])
> {code}
> I'm not a Scala developer, so I tried to "translate" this to Java as best I could. I created classes that extend AvroKey and FileInputFormat:
> {code}
> public static class MyCustomAvroKey extends AvroKey<MyCustomClass> {}
>
> public static class MyCustomAvroReader extends
>         AvroRecordReaderBase<MyCustomAvroKey, NullWritable, MyCustomClass> {
>     // with my custom schema and all the required methods...
> }
>
> public static class MyCustomInputFormat extends
>         FileInputFormat<MyCustomAvroKey, NullWritable> {
>     @Override
>     public RecordReader<MyCustomAvroKey, NullWritable> createRecordReader(
>             InputSplit inputSplit, TaskAttemptContext taskAttemptContext)
>             throws IOException, InterruptedException {
>         return new MyCustomAvroReader();
>     }
> }
> ...
> JavaPairRDD<MyCustomAvroKey, NullWritable> records =
>         sc.newAPIHadoopFile("file:/path/to/datafile.avro",
>                 MyCustomInputFormat.class, MyCustomAvroKey.class,
>                 NullWritable.class,
>                 sc.hadoopConfiguration());
> MyCustomClass first = records.first()._1.datum();
> System.out.println("Got a result, some custom field: " +
>         first.getSomeCustomField());
> {code}
> This compiles fine, but with a debugger I can see that `records.first()._1.datum()` actually returns a `GenericData$Record` at runtime, not a `MyCustomClass` instance.
> And indeed, when the following line executes:
> {code}
> MyCustomClass first = records.first()._1.datum();
> {code}
> I get an exception:
> {code}
> java.lang.ClassCastException: org.apache.avro.generic.GenericData$Record cannot be cast to my.package.containing.MyCustomClass
> {code}
> Am I doing it wrong? Or is this not possible in Java?
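
For what it's worth, the compilation error in the comment above is a Java generics artifact rather than a Spark problem: `getClass()` loses the type argument (its declared return type is `Class<? extends AvroKeyInputFormat>`), so the compiler cannot pin `K` to `AvroKey<MyCustomClass>` in `newAPIHadoopFile`'s signature. A minimal sketch of a common workaround is to build explicitly parameterized `Class` objects with unchecked casts; `MyCustomClass` and `sc` here stand for the class and `JavaSparkContext` from the snippets above:
{code}
import org.apache.avro.mapred.AvroKey;
import org.apache.avro.mapreduce.AvroKeyInputFormat;
import org.apache.hadoop.io.NullWritable;
import org.apache.spark.api.java.JavaPairRDD;

// Cast the raw class literals to the parameterized Class types instead of
// calling getClass() on instances. The casts are unchecked but harmless:
// type arguments are erased at runtime anyway.
@SuppressWarnings("unchecked")
Class<AvroKeyInputFormat<MyCustomClass>> inputFormatClass =
        (Class<AvroKeyInputFormat<MyCustomClass>>) (Class<?>) AvroKeyInputFormat.class;

@SuppressWarnings("unchecked")
Class<AvroKey<MyCustomClass>> keyClass =
        (Class<AvroKey<MyCustomClass>>) (Class<?>) AvroKey.class;

JavaPairRDD<AvroKey<MyCustomClass>, NullWritable> records =
        sc.newAPIHadoopFile("file:/path/to/file.avro",
                inputFormatClass, keyClass, NullWritable.class,
                sc.hadoopConfiguration());
{code}
With the typed class objects in hand, inference succeeds and this compiles against the stock `AvroKeyInputFormat`, with no custom key or input-format classes needed.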
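The `ClassCastException` in the issue itself is consistent with reading without a reader schema: the stock `AvroKeyInputFormat` looks the reader schema up from the job configuration (via `AvroJob.getInputKeySchema`), and when none is set, or when the schema cannot be resolved to a generated class on the classpath, deserialization falls back to `GenericData.Record`. A sketch of the usual fix, assuming `MyCustomClass` is generated by the Avro compiler (so it implements `SpecificRecord` and exposes a static `getClassSchema()`), reusing `inputFormatClass` and `keyClass` from the previous sketch:
{code}
import org.apache.avro.mapreduce.AvroJob;
import org.apache.hadoop.mapreduce.Job;

// Register the reader schema so the record reader deserializes into
// MyCustomClass instances rather than GenericData$Record.
Job job = Job.getInstance(sc.hadoopConfiguration());
AvroJob.setInputKeySchema(job, MyCustomClass.getClassSchema());

JavaPairRDD<AvroKey<MyCustomClass>, NullWritable> records =
        sc.newAPIHadoopFile("file:/path/to/file.avro",
                inputFormatClass, keyClass, NullWritable.class,
                job.getConfiguration());

// datum() should now return a MyCustomClass, so this cast no longer throws.
MyCustomClass first = records.first()._1.datum();
{code}
Note that the generated class also has to be on the executors' classpath, and Hadoop record readers reuse key objects, so it is safest to copy the datums out (e.g. in a `map`) before caching or collecting more than the first record.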