What is the right way to save a PairRDD in Avro output format? GraphArray extends SpecificRecord etc.
I have the following Java RDD:
JavaPairRDD<GraphArray, NullWritable> pairRDD = ...
and want to save it in Avro format:
org.apache.hadoop.mapred.JobConf jc = new org.apache.hadoop.mapred.JobConf();
org.apache.avro.mapred.AvroJob.setOutputSchema(jc, GraphArray.getClassSchema());
org.apache.avro.mapred.AvroOutputFormat.setOutputPath(jc, new Path(outURI));
pairRDD.saveAsHadoopDataset(jc);

The code above throws:
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task not serializable: java.io.NotSerializableException: org.apache.hadoop.io.NullWritable
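
For context, java.io.NotSerializableException is raised by plain Java serialization when an object's class does not implement java.io.Serializable. The sketch below reproduces that behaviour without Spark, Hadoop, or Avro. PlainMarker and SerializableMarker are hypothetical stand-in classes invented for illustration, not library types, and the sketch does not show which part of the job above triggers the serialization.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class SerializationCheck {
    // Hypothetical stand-in for a type that does not implement Serializable.
    static final class PlainMarker {
    }

    // Hypothetical stand-in for a type that opts in to Java serialization.
    static final class SerializableMarker implements Serializable {
        private static final long serialVersionUID = 1L;
    }

    // Returns true if Java serialization accepts the object,
    // false if writeObject throws NotSerializableException.
    static boolean isJavaSerializable(Object value) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(value);
            return true;
        } catch (NotSerializableException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("PlainMarker: " + isJavaSerializable(new PlainMarker()));
        System.out.println("SerializableMarker: " + isJavaSerializable(new SerializableMarker()));
    }
}
```

The check only exercises java.io.ObjectOutputStream. Whether a given record type passes it depends on that type's own class hierarchy.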

I also tried wrapping the keys and values in the AvroKey and AvroValue classes, respectively.

What am I doing wrong? Should I use a JavaRDD (list) instead and try a custom serializer?

Thanks,


