Re: java.io.NotSerializableException: org.apache.avro.mapred.AvroKey using spark with avro

2014-12-17 Thread touchdown
Yeah, I have the same problem with 1.1.0, but not 1.0.0.

-- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/java-io-NotSerializableException-org-apache-avro-mapred-AvroKey-using-spark-with-avro-tp15165p20752.html Sent from the Apache Spark User List mailing list archive at Nabble.com.
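A workaround commonly suggested for this exception (AvroKey does not implement java.io.Serializable, so Spark's default Java serializer rejects it during a shuffle) is to switch the serializer to Kryo, which does not require Serializable. A minimal sketch, assuming Spark 1.x; the app name is illustrative:

```scala
// Minimal sketch: enable Kryo serialization so non-Serializable classes
// such as org.apache.avro.mapred.AvroKey can cross shuffle boundaries.
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("avro-read")  // illustrative name
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
val sc = new SparkContext(conf)
```

Another option is to map each `AvroKey[GenericRecord]` down to plain serializable fields (for example `k.datum().get("field").toString`) immediately after reading, before any shuffle happens.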

Re: Unit Testing (JUnit) with Spark

2014-10-29 Thread touchdown
Add this to your dependencies: `"io.netty" % "netty" % "3.6.6.Final"`, and append `exclude("io.netty", "netty-all")` to the end of the Spark and Hadoop dependencies. Reference: https://spark-project.atlassian.net/browse/SPARK-1138. I am using Spark 1.1, so the Akka issue is already fixed.
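In build.sbt form, the suggested change looks roughly like the following. The netty coordinates and the SPARK-1138 exclusion come from the post; the Spark and Hadoop artifact versions shown are illustrative assumptions, not from the thread:

```scala
// Sketch of the suggested build.sbt changes: pin netty 3.6.6.Final and
// exclude the conflicting netty-all from the Spark and Hadoop dependencies.
libraryDependencies ++= Seq(
  "io.netty" % "netty" % "3.6.6.Final",
  ("org.apache.spark" %% "spark-core" % "1.1.0" % "provided")
    .exclude("io.netty", "netty-all"),
  ("org.apache.hadoop" % "hadoop-client" % "2.4.0")
    .exclude("io.netty", "netty-all")
)
```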

Re: Is there a way to write spark RDD to Avro files

2014-08-02 Thread touchdown
YES! This worked! Thanks!

Re: Is there a way to write spark RDD to Avro files

2014-08-01 Thread touchdown
Hi, I am facing a similar dilemma: I am trying to aggregate a bunch of small Avro files into one Avro file. I read them in with `sc.newAPIHadoopFile[AvroKey[GenericRecord], NullWritable, AvroKeyInputFormat[GenericRecord]](path)`, but I can't find saveAsHadoopFile or saveAsNewAPIHadoopFile. Can you
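For reference, the read-then-write round trip being attempted can be sketched as follows. This is a sketch under the assumption of Spark 1.x with avro-mapred on the classpath; `saveAsNewAPIHadoopFile` lives on pair RDDs, so the `SparkContext._` import is needed to bring it into scope, and the `aggregateAvro` helper name is hypothetical:

```scala
// Sketch: read many small Avro files and write them back out through the
// new Hadoop API, coalescing to one partition to get a single output file.
import org.apache.avro.Schema
import org.apache.avro.generic.GenericRecord
import org.apache.avro.mapred.AvroKey
import org.apache.avro.mapreduce.{AvroJob, AvroKeyInputFormat, AvroKeyOutputFormat}
import org.apache.hadoop.io.NullWritable
import org.apache.hadoop.mapreduce.Job
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._  // pair-RDD functions, incl. saveAsNewAPIHadoopFile

def aggregateAvro(sc: SparkContext, in: String, out: String, schema: Schema): Unit = {
  val records = sc.newAPIHadoopFile(
    in,
    classOf[AvroKeyInputFormat[GenericRecord]],
    classOf[AvroKey[GenericRecord]],
    classOf[NullWritable])

  val job = Job.getInstance(sc.hadoopConfiguration)
  AvroJob.setOutputKeySchema(job, schema)  // output format requires the writer schema

  records
    .coalesce(1)  // merge into a single output file
    .saveAsNewAPIHadoopFile(
      out,
      classOf[AvroKey[GenericRecord]],
      classOf[NullWritable],
      classOf[AvroKeyOutputFormat[GenericRecord]],
      job.getConfiguration)
}
```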

Re: Is there a way to write spark RDD to Avro files

2014-08-01 Thread touchdown
Yes, I saw that after I looked at it more closely. Thanks! But I am running into a schema-not-set error: "Writer schema for output key was not set. Use AvroJob.setOutputKeySchema()". I am in the process of figuring out how to set the schema for an AvroJob from an HDFS file, but any pointer is much appreciated!
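One way to get the schema from an existing Avro file on HDFS is to open it with Avro's `FsInput` and `DataFileReader` and read the embedded writer schema, then pass that to `AvroJob.setOutputKeySchema`. A sketch, assuming Hadoop 2 (`Job.getInstance`) and Avro's mapred/mapreduce modules on the classpath; the `schemaFromHdfs` helper and the sample path are hypothetical:

```scala
// Sketch: pull the writer schema out of an existing Avro file on HDFS,
// then register it on the Job so AvroKeyOutputFormat can write.
import org.apache.avro.Schema
import org.apache.avro.file.DataFileReader
import org.apache.avro.generic.{GenericDatumReader, GenericRecord}
import org.apache.avro.mapred.FsInput
import org.apache.avro.mapreduce.AvroJob
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.hadoop.mapreduce.Job

def schemaFromHdfs(conf: Configuration, path: String): Schema = {
  val input = new FsInput(new Path(path), conf)  // seekable HDFS input for Avro
  try {
    val reader = DataFileReader.openReader(input, new GenericDatumReader[GenericRecord]())
    try reader.getSchema finally reader.close()  // schema is stored in the file header
  } finally input.close()
}

val conf = new Configuration()
val job = Job.getInstance(conf)
AvroJob.setOutputKeySchema(job, schemaFromHdfs(conf, "hdfs:///data/sample.avro"))
```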