[ 
https://issues.apache.org/jira/browse/PIG-5134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15880270#comment-15880270
 ] 

Nandor Kollar commented on PIG-5134:
------------------------------------

[~kellyzly] yes, I executed this test on the spark branch without your merge 
commit, and all tests in TestAvroStorage passed. After the merge, only one 
failed, and it was a new test added to trunk since the last rebase. 
It looks like we have three options to fix this; I'll attach a patch for one soon:
- use Kryo serialization instead of Spark 1.6.1's default Java serialization
- upgrade to Spark 2.0
- ask Avro to make this class Serializable (least preferred option)
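For reference, here is a minimal sketch of the first option, assuming it is done via spark-defaults.conf (Pig on Spark could equally set the same property programmatically on its SparkConf); the property name is from the Spark 1.6 configuration documentation:

```
# spark-defaults.conf: switch Spark's serializer from the default
# JavaSerializer to Kryo, which does not require java.io.Serializable
spark.serializer    org.apache.spark.serializer.KryoSerializer

# Optional while testing: fail fast when an unregistered class is serialized
# spark.kryo.registrationRequired    true
```

Kryo instantiates classes without relying on the Serializable contract, which is why it can handle types like GenericData$Record that Java serialization rejects.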

I'd vote for the first option for now; later, once we make the spark branch 
stable, we should upgrade to Spark 2.0, which uses Kryo serialization by 
default. Kryo is also reported to be up to 10x faster than the default Java 
serialization, which I guess is why Spark moved from Java serialization to Kryo.
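For context, the root cause (a task result type that java.io.ObjectOutputStream rejects) can be reproduced with a minimal, self-contained sketch; FakeAvroRecord below is a hypothetical stand-in for org.apache.avro.generic.GenericData$Record, which likewise does not implement Serializable:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;

public class SerializationDemo {

    // Hypothetical stand-in for GenericData$Record: plain class, no Serializable
    static class FakeAvroRecord {
        String key = "stuff in closet";
    }

    // Returns true if Java serialization rejects the object, mirroring
    // the "had a not serializable result" failure Spark reports
    static boolean failsJavaSerialization(Object o) throws IOException {
        try (ObjectOutputStream out =
                 new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return false;
        } catch (NotSerializableException e) {
            return true;
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(failsJavaSerialization(new FakeAvroRecord())); // prints true
    }
}
```

With spark.serializer set to KryoSerializer this check no longer applies, since Kryo does not go through ObjectOutputStream at all.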

> fix  TestAvroStorage unit test failures after PIG-5132
> ------------------------------------------------------
>
>                 Key: PIG-5134
>                 URL: https://issues.apache.org/jira/browse/PIG-5134
>             Project: Pig
>          Issue Type: Sub-task
>          Components: spark
>            Reporter: liyunzhang_intel
>            Assignee: Nandor Kollar
>             Fix For: spark-branch
>
>
> It seems that the test fails because Avro's GenericData#Record doesn't 
> implement the Serializable interface:
> {code}
> 2017-02-23 09:14:41,887 ERROR [main] spark.JobGraphBuilder 
> (JobGraphBuilder.java:sparkOperToRDD(183)) - throw exception in 
> sparkOperToRDD: 
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 0.0 
> in stage 9.0 (TID 9) had a not serializable result: 
> org.apache.avro.generic.GenericData$Record
> Serialization stack:
>       - object not serializable (class: 
> org.apache.avro.generic.GenericData$Record, value: {"key": "stuff in closet", 
> "value1": {"thing": "hat", "count": 7}, "value2": {"thing": "coat", "count": 
> 2}})
>       - field (class: org.apache.pig.impl.util.avro.AvroTupleWrapper, name: 
> avroObject, type: interface org.apache.avro.generic.IndexedRecord)
>       - object (class org.apache.pig.impl.util.avro.AvroTupleWrapper, 
> org.apache.pig.impl.util.avro.AvroTupleWrapper@3d3a58c1)
>       at 
> org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1431)
>       at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1419)
> {code}
> The failing test is a new test introduced by merging trunk into the spark 
> branch; that's why we didn't see this error before.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
