You can use this Maven dependency:

<dependency>
    <groupId>com.twitter</groupId>
    <artifactId>chill-avro</artifactId>
    <version>0.4.0</version>
</dependency>
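A minimal sketch of wiring this together, per the thread's advice (assumes Spark 1.x and chill 0.4.0 on the classpath; the registrator class name and the commented-out record registration are illustrative, not from the thread):

```scala
import com.esotericsoftware.kryo.Kryo
import com.twitter.chill.AllScalaRegistrar
import org.apache.spark.SparkConf
import org.apache.spark.serializer.KryoRegistrator

// Hypothetical registrator: runs chill-scala's AllScalaRegistrar so that
// standard Scala collection wrappers (including MapWrapper) are registered.
class MyRegistrator extends KryoRegistrator {
  override def registerClasses(kryo: Kryo): Unit = {
    new AllScalaRegistrar()(kryo)
    // Register your own Avro record classes here as well, e.g. with
    // chill-avro's serializers:
    // kryo.register(classOf[MyRecord],
    //   com.twitter.chill.avro.AvroSerializer.SpecificRecordSerializer[MyRecord])
  }
}

val conf = new SparkConf()
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .set("spark.kryo.registrator", "MyRegistrator")
```

This is configuration rather than a complete program; the registrator class must be on the executors' classpath for the `spark.kryo.registrator` setting to take effect.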
Simone Franzini, PhD
http://www.linkedin.com/in/simonefranzini

On Tue, Dec 9, 2014 at 9:53 AM, Cristovao Jose Domingues Cordeiro <cristovao.corde...@cern.ch> wrote:

> Thanks for the reply!
>
> I have in fact tried your code, but I lack the Twitter chill package and I
> cannot find it online. So I am now trying
> http://spark.apache.org/docs/latest/tuning.html#data-serialization. But
> in case I can't do it, could you tell me where to get that Twitter package
> you used?
>
> Thanks
>
> Cumprimentos / Best regards,
> Cristóvão José Domingues Cordeiro
> IT Department - 28/R-018
> CERN
> ------------------------------
> *From:* Simone Franzini [captainfr...@gmail.com]
> *Sent:* 09 December 2014 16:42
> *To:* Cristovao Jose Domingues Cordeiro; user
> *Subject:* Re: NullPointerException When Reading Avro Sequence Files
>
> Hi Cristovao,
>
> I have seen a very similar issue that I have posted about in this thread:
> http://apache-spark-user-list.1001560.n3.nabble.com/Kryo-NPE-with-Array-td19797.html
>
> I think your main issue here is somewhat similar, in that the MapWrapper
> Scala class is not registered. It gets registered by the Twitter
> chill-scala AllScalaRegistrar class, which you are currently not using.
>
> As far as I understand, in order to use Avro with Spark, you also have
> to use Kryo. This means you have to use the Spark KryoSerializer, which in
> turn uses Twitter chill. I posted the basic code that I am using here:
> http://apache-spark-user-list.1001560.n3.nabble.com/How-can-I-read-this-avro-file-using-spark-amp-scala-td19400.html#a19491
>
> Maybe there is a simpler solution to your problem, but I am not that much
> of an expert yet. I hope this helps.
>
> Simone Franzini, PhD
> http://www.linkedin.com/in/simonefranzini
>
> On Tue, Dec 9, 2014 at 8:50 AM, Cristovao Jose Domingues Cordeiro <cristovao.corde...@cern.ch> wrote:
>
>> Hi Simone,
>>
>> thanks, but I don't think that's it.
>> I've tried several libraries with the --jars argument.
>> Some do give what you said, but other times (when I put in the right
>> version, I guess) I get the following:
>>
>> 14/12/09 15:45:54 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
>> java.io.NotSerializableException: scala.collection.convert.Wrappers$MapWrapper
>>         at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1183)
>>         at java.io.ObjectOutputStream.writeArray(ObjectOutputStream.java:1377)
>>
>> Which is odd, since I am reading an Avro file I wrote... with the same piece
>> of code: https://gist.github.com/MLnick/5864741781b9340cb211
>>
>> Cumprimentos / Best regards,
>> Cristóvão José Domingues Cordeiro
>> IT Department - 28/R-018
>> CERN
>> ------------------------------
>> *From:* Simone Franzini [captainfr...@gmail.com]
>> *Sent:* 06 December 2014 15:48
>> *To:* Cristovao Jose Domingues Cordeiro
>> *Subject:* Re: NullPointerException When Reading Avro Sequence Files
>>
>> java.lang.IncompatibleClassChangeError: Found interface
>> org.apache.hadoop.mapreduce.TaskAttemptContext, but class was expected
>>
>> That is a sign that you are mixing up versions of Hadoop. This is
>> particularly an issue when dealing with Avro. If you are using Hadoop 2,
>> you will need the Hadoop 2 version of avro-mapred. In Maven you
>> would do this with the <classifier>hadoop2</classifier> tag.
>>
>> Simone Franzini, PhD
>> http://www.linkedin.com/in/simonefranzini
>>
>> On Fri, Dec 5, 2014 at 3:52 AM, cjdc <cristovao.corde...@cern.ch> wrote:
>>
>>> Hi all,
>>>
>>> I've tried the above example on Gist, but it doesn't work (at least for
>>> me).
>>> Did anyone get this:
>>>
>>> 14/12/05 10:44:40 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
>>> java.lang.IncompatibleClassChangeError: Found interface
>>> org.apache.hadoop.mapreduce.TaskAttemptContext, but class was expected
>>>         at org.apache.avro.mapreduce.AvroKeyInputFormat.createRecordReader(AvroKeyInputFormat.java:47)
>>>         at org.apache.spark.rdd.NewHadoopRDD$$anon$1.<init>(NewHadoopRDD.scala:115)
>>>         at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:103)
>>>         at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:65)
>>>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
>>>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
>>>         at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
>>>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
>>>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
>>>         at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
>>>         at org.apache.spark.scheduler.Task.run(Task.scala:54)
>>>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
>>>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>         at java.lang.Thread.run(Thread.java:745)
>>> 14/12/05 10:44:40 ERROR ExecutorUncaughtExceptionHandler: Uncaught exception
>>> in thread Thread[Executor task launch worker-0,5,main]
>>> java.lang.IncompatibleClassChangeError: Found interface
>>> org.apache.hadoop.mapreduce.TaskAttemptContext, but class was expected
>>>         at org.apache.avro.mapreduce.AvroKeyInputFormat.createRecordReader(AvroKeyInputFormat.java:47)
>>>         at org.apache.spark.rdd.NewHadoopRDD$$anon$1.<init>(NewHadoopRDD.scala:115)
>>>         at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:103)
>>>         at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:65)
>>>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
>>>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
>>>         at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
>>>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
>>>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
>>>         at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
>>>         at org.apache.spark.scheduler.Task.run(Task.scala:54)
>>>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
>>>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>         at java.lang.Thread.run(Thread.java:745)
>>> 14/12/05 10:44:40 ERROR TaskSetManager: Task 0 in stage 0.0 failed 1 times;
>>> aborting job
>>>
>>> Thanks
>>>
>>> --
>>> View this message in context:
>>> http://apache-spark-user-list.1001560.n3.nabble.com/NullPointerException-when-reading-Avro-Sequence-files-tp10201p20456.html
>>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>>> For additional commands, e-mail: user-h...@spark.apache.org
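For reference, the <classifier> fix Simone describes earlier in the thread would look like this in a Maven POM (the version shown is illustrative; match it to the Avro release you actually use):

```xml
<dependency>
    <groupId>org.apache.avro</groupId>
    <artifactId>avro-mapred</artifactId>
    <version>1.7.7</version>
    <classifier>hadoop2</classifier>
</dependency>
```

Without the hadoop2 classifier, avro-mapred is compiled against Hadoop 1, where TaskAttemptContext is a class rather than an interface, which is what produces the IncompatibleClassChangeError above on a Hadoop 2 cluster.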