[jira] [Resolved] (SPARK-19656) Can't load custom type from avro file to RDD with newAPIHadoopFile

2017-03-05 Sean Owen (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-19656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen resolved SPARK-19656.
---
Resolution: Not A Problem

> Can't load custom type from avro file to RDD with newAPIHadoopFile
> --
>
> Key: SPARK-19656
> URL: https://issues.apache.org/jira/browse/SPARK-19656
> Project: Spark
>  Issue Type: Question
>  Components: Java API
> Affects Versions: 2.0.2
> Reporter: Nira Amit
>
> If I understand correctly, in Scala it's possible to load custom objects from 
> Avro files into RDDs this way:
> {code}
> ctx.hadoopFile("/path/to/the/avro/file.avro",
>   classOf[AvroInputFormat[MyClassInAvroFile]],
>   classOf[AvroWrapper[MyClassInAvroFile]],
>   classOf[NullWritable])
> {code}
> I'm not a Scala developer, so I tried to "translate" this to Java as best I 
> could. I created classes that extend AvroKey and FileInputFormat:
> {code}
> public static class MyCustomAvroKey extends AvroKey<MyCustomClass> {}
>
> public static class MyCustomAvroReader extends
>         AvroRecordReaderBase<MyCustomAvroKey, NullWritable, MyCustomClass> {
>     // with my custom schema and all the required methods...
>     // (a fleshed-out sketch follows this code block)
> }
>
> public static class MyCustomInputFormat extends
>         FileInputFormat<MyCustomAvroKey, NullWritable> {
>     @Override
>     public RecordReader<MyCustomAvroKey, NullWritable> createRecordReader(
>             InputSplit inputSplit, TaskAttemptContext taskAttemptContext)
>             throws IOException, InterruptedException {
>         return new MyCustomAvroReader();
>     }
> }
> ...
> JavaPairRDD<MyCustomAvroKey, NullWritable> records =
>         sc.newAPIHadoopFile("file:/path/to/datafile.avro",
>                 MyCustomInputFormat.class, MyCustomAvroKey.class,
>                 NullWritable.class,
>                 sc.hadoopConfiguration());
> MyCustomClass first = records.first()._1().datum();
> System.out.println("Got a result, some custom field: "
>         + first.getSomeCustomField());
> {code}
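> For concreteness, a fleshed-out version of the reader might look roughly like 
> this (a hypothetical sketch, assuming MyCustomClass was generated by the Avro 
> compiler and so exposes a static getClassSchema()):
> {code}
> public static class MyCustomAvroReader extends
>         AvroRecordReaderBase<MyCustomAvroKey, NullWritable, MyCustomClass> {
>     private final MyCustomAvroKey currentKey = new MyCustomAvroKey();
>
>     public MyCustomAvroReader() {
>         // AvroRecordReaderBase builds its datum reader from this schema; if
>         // the schema's full name does not resolve to a compiled class on the
>         // classpath, decoding falls back to GenericData.Record.
>         super(MyCustomClass.getClassSchema());
>     }
>
>     @Override
>     public MyCustomAvroKey getCurrentKey() {
>         currentKey.datum(getCurrentRecord());  // wrap the decoded record
>         return currentKey;
>     }
>
>     @Override
>     public NullWritable getCurrentValue() {
>         return NullWritable.get();  // keys only, no value payload
>     }
> }
> {code}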
> The code above compiles fine, but using a debugger I can see that 
> `records.first()._1().datum()` actually returns a `GenericData$Record` at 
> runtime, not a `MyCustomClass` instance.
> And indeed, when the following line executes:
> {code}
> MyCustomClass first = records.first()._1().datum();
> {code}
> I get an exception:
> {code}
> java.lang.ClassCastException: org.apache.avro.generic.GenericData$Record 
> cannot be cast to my.package.containing.MyCustomClass
> {code}
> Am I doing it wrong? Or is this not possible in Java?
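
A likely explanation, for context: Avro's SpecificDatumReader falls back to GenericData.Record whenever the reader schema's full name (namespace plus name) does not resolve to a generated class on the classpath, which is consistent with the ClassCastException above. The stock avro-mapred route avoids a hand-rolled input format entirely; the following is a sketch under the assumption that MyCustomClass is an Avro-generated class, not code from this report:

{code}
import org.apache.avro.mapred.AvroKey;
import org.apache.avro.mapreduce.AvroJob;
import org.apache.avro.mapreduce.AvroKeyInputFormat;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.spark.api.java.JavaPairRDD;

// Publish the reader schema so that AvroKeyInputFormat decodes specific
// records instead of GenericData.Record.
Job job = Job.getInstance(sc.hadoopConfiguration());
AvroJob.setInputKeySchema(job, MyCustomClass.getClassSchema());

// Java cannot express classOf[AvroKeyInputFormat[MyCustomClass]], hence the
// unchecked double cast on the result.
@SuppressWarnings("unchecked")
JavaPairRDD<AvroKey<MyCustomClass>, NullWritable> records =
        (JavaPairRDD<AvroKey<MyCustomClass>, NullWritable>) (JavaPairRDD<?, ?>)
        sc.newAPIHadoopFile("file:/path/to/datafile.avro",
                AvroKeyInputFormat.class, AvroKey.class, NullWritable.class,
                job.getConfiguration());

MyCustomClass first = records.first()._1().datum();
{code}

If the generic fallback persists, checking whether SpecificData.get().getClass(MyCustomClass.getClassSchema()) returns null is a quick way to confirm a schema-namespace/package mismatch.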






[jira] [Resolved] (SPARK-19656) Can't load custom type from avro file to RDD with newAPIHadoopFile

2017-03-05 Sean Owen (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-19656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen resolved SPARK-19656.
---
Resolution: Fixed

I do not see anything surprising given your description. Please don't reopen 
this.




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


