You can use this Maven dependency:

<dependency>
    <groupId>com.twitter</groupId>
    <artifactId>chill-avro</artifactId>
    <version>0.4.0</version>
</dependency>
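A minimal sketch of wiring this together, per the thread's advice (assumes Spark 1.x and chill 0.4.0 on the classpath; the registrator class name and the commented-out record registration are illustrative, not from the thread):

```scala
import com.esotericsoftware.kryo.Kryo
import com.twitter.chill.AllScalaRegistrar
import org.apache.spark.SparkConf
import org.apache.spark.serializer.KryoRegistrator

// Hypothetical registrator: runs chill-scala's AllScalaRegistrar so that
// standard Scala collection wrappers (including MapWrapper) are registered.
class MyRegistrator extends KryoRegistrator {
  override def registerClasses(kryo: Kryo): Unit = {
    new AllScalaRegistrar()(kryo)
    // Register your own Avro record classes here as well, e.g. with
    // chill-avro's serializers:
    // kryo.register(classOf[MyRecord],
    //   com.twitter.chill.avro.AvroSerializer.SpecificRecordSerializer[MyRecord])
  }
}

val conf = new SparkConf()
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .set("spark.kryo.registrator", "MyRegistrator")
```

This is configuration rather than a complete program; the registrator class must be on the executors' classpath for the `spark.kryo.registrator` setting to take effect.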
Simone Franzini, PhD
http://www.linkedin.com/in/simonefranzini

On Tue, Dec 9, 2014 at 9:53 AM, Cristovao Jose Domingues Cordeiro <cristovao.corde...@cern.ch> wrote:

> Thanks for the reply!
>
> I have in fact tried your code, but I lack the Twitter chill package and I
> cannot find it online. So I am now trying
> http://spark.apache.org/docs/latest/tuning.html#data-serialization. But
> in case I can't do it, could you tell me where to get that Twitter package
> you used?
>
> Thanks
>
> Cumprimentos / Best regards,
> Cristóvão José Domingues Cordeiro
> IT Department - 28/R-018
> CERN
> ------------------------------
> *From:* Simone Franzini [captainfr...@gmail.com]
> *Sent:* 09 December 2014 16:42
> *To:* Cristovao Jose Domingues Cordeiro; user
> *Subject:* Re: NullPointerException When Reading Avro Sequence Files
>
> Hi Cristovao,
>
> I have seen a very similar issue that I have posted about in this thread:
> http://apache-spark-user-list.1001560.n3.nabble.com/Kryo-NPE-with-Array-td19797.html
>
> I think your main issue here is somewhat similar, in that the MapWrapper
> Scala class is not registered. It gets registered by the Twitter
> chill-scala AllScalaRegistrar class, which you are currently not using.
>
> As far as I understand, in order to use Avro with Spark, you also have
> to use Kryo. This means you have to use the Spark KryoSerializer, which in
> turn uses Twitter chill. I posted the basic code that I am using here:
> http://apache-spark-user-list.1001560.n3.nabble.com/How-can-I-read-this-avro-file-using-spark-amp-scala-td19400.html#a19491
>
> Maybe there is a simpler solution to your problem, but I am not that much
> of an expert yet. I hope this helps.
>
> Simone Franzini, PhD
> http://www.linkedin.com/in/simonefranzini
>
> On Tue, Dec 9, 2014 at 8:50 AM, Cristovao Jose Domingues Cordeiro <cristovao.corde...@cern.ch> wrote:
>
>> Hi Simone,
>>
>> thanks, but I don't think that's it.
>> I've tried several libraries with the --jars argument.
>> Some do give what you said, but other times (when I put in the right
>> version, I guess) I get the following:
>>
>> 14/12/09 15:45:54 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
>> java.io.NotSerializableException: scala.collection.convert.Wrappers$MapWrapper
>>         at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1183)
>>         at java.io.ObjectOutputStream.writeArray(ObjectOutputStream.java:1377)
>>
>> Which is odd, since I am reading an Avro file I wrote... with the same piece
>> of code: https://gist.github.com/MLnick/5864741781b9340cb211
>>
>> Cumprimentos / Best regards,
>> Cristóvão José Domingues Cordeiro
>> IT Department - 28/R-018
>> CERN
>> ------------------------------
>> *From:* Simone Franzini [captainfr...@gmail.com]
>> *Sent:* 06 December 2014 15:48
>> *To:* Cristovao Jose Domingues Cordeiro
>> *Subject:* Re: NullPointerException When Reading Avro Sequence Files
>>
>> java.lang.IncompatibleClassChangeError: Found interface
>> org.apache.hadoop.mapreduce.TaskAttemptContext, but class was expected
>>
>> That is a sign that you are mixing up versions of Hadoop. This is
>> particularly an issue when dealing with Avro. If you are using Hadoop 2,
>> you will need the Hadoop 2 version of avro-mapred. In Maven you
>> would do this with the <classifier>hadoop2</classifier> tag.
>>
>> Simone Franzini, PhD
>> http://www.linkedin.com/in/simonefranzini
>>
>> On Fri, Dec 5, 2014 at 3:52 AM, cjdc <cristovao.corde...@cern.ch> wrote:
>>
>>> Hi all,
>>>
>>> I've tried the above example on Gist, but it doesn't work (at least for
>>> me).
>>> Did anyone get this:
>>>
>>> 14/12/05 10:44:40 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
>>> java.lang.IncompatibleClassChangeError: Found interface
>>> org.apache.hadoop.mapreduce.TaskAttemptContext, but class was expected
>>>         at org.apache.avro.mapreduce.AvroKeyInputFormat.createRecordReader(AvroKeyInputFormat.java:47)
>>>         at org.apache.spark.rdd.NewHadoopRDD$$anon$1.<init>(NewHadoopRDD.scala:115)
>>>         at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:103)
>>>         at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:65)
>>>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
>>>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
>>>         at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
>>>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
>>>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
>>>         at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
>>>         at org.apache.spark.scheduler.Task.run(Task.scala:54)
>>>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
>>>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>         at java.lang.Thread.run(Thread.java:745)
>>> 14/12/05 10:44:40 ERROR ExecutorUncaughtExceptionHandler: Uncaught exception
>>> in thread Thread[Executor task launch worker-0,5,main]
>>> java.lang.IncompatibleClassChangeError: Found interface
>>> org.apache.hadoop.mapreduce.TaskAttemptContext, but class was expected
>>>         at org.apache.avro.mapreduce.AvroKeyInputFormat.createRecordReader(AvroKeyInputFormat.java:47)
>>>         at org.apache.spark.rdd.NewHadoopRDD$$anon$1.<init>(NewHadoopRDD.scala:115)
>>>         at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:103)
>>>         at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:65)
>>>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
>>>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
>>>         at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
>>>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
>>>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
>>>         at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
>>>         at org.apache.spark.scheduler.Task.run(Task.scala:54)
>>>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
>>>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>         at java.lang.Thread.run(Thread.java:745)
>>> 14/12/05 10:44:40 ERROR TaskSetManager: Task 0 in stage 0.0 failed 1 times;
>>> aborting job
>>>
>>> Thanks
>>>
>>> --
>>> View this message in context:
>>> http://apache-spark-user-list.1001560.n3.nabble.com/NullPointerException-when-reading-Avro-Sequence-files-tp10201p20456.html
>>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>>> For additional commands, e-mail: user-h...@spark.apache.org
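For reference, the <classifier> fix Simone describes earlier in the thread would look like this in a Maven POM (the version shown is illustrative; match it to the Avro release you actually use):

```xml
<dependency>
    <groupId>org.apache.avro</groupId>
    <artifactId>avro-mapred</artifactId>
    <version>1.7.7</version>
    <classifier>hadoop2</classifier>
</dependency>
```

Without the hadoop2 classifier, avro-mapred is compiled against Hadoop 1, where TaskAttemptContext is a class rather than an interface, which is what produces the IncompatibleClassChangeError above on a Hadoop 2 cluster.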