[ https://issues.apache.org/jira/browse/PIG-5134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16009606#comment-16009606 ]
Rohini Palaniswamy edited comment on PIG-5134 at 5/14/17 5:48 AM:
------------------------------------------------------------------
bq. I suggest to exclude TestAvroStorage from the unit tests and not fix it in the first release of Pig on Spark

Let's go with Nandor's second patch. Folks using custom tuples are a minority, but many use AvroStorage, so it would be good to have it fixed. [~nkollar], can you revert the wildcard imports in PIG-5134_2.patch? They are generally not recommended. After fixing that, we can have this patch committed. Please also create a new JIRA for tracking the current issue and exploring Kryo serialization, which we can look at in a later release.
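As a side note on the Kryo exploration mentioned above: in Spark terms this would amount to switching the task-result serializer away from Java serialization. A hedged sketch of the relevant `spark-defaults.conf` settings follows; the registrator class name is hypothetical (Pig does not ship one), and this is illustration only, not the patch under review.

{code}
# Use Kryo instead of Java serialization for shuffled/returned objects
spark.serializer                 org.apache.spark.serializer.KryoSerializer
# Optionally register Avro record classes via a custom KryoRegistrator
# (the class name below is hypothetical, for illustration only)
spark.kryo.registrator           com.example.AvroKryoRegistrator
# Leave relaxed while experimenting; set true to fail fast on unregistered classes
spark.kryo.registrationRequired  false
{code}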
> Fix TestAvroStorage unit test in Spark mode
> -------------------------------------------
>
>                 Key: PIG-5134
>                 URL: https://issues.apache.org/jira/browse/PIG-5134
>             Project: Pig
>          Issue Type: Bug
>          Components: spark
>            Reporter: liyunzhang_intel
>            Assignee: Nandor Kollar
>             Fix For: spark-branch
>
>         Attachments: PIG-5134_2.patch, PIG-5134.patch
>
>
> The test seems to fail because Avro's GenericData#Record does not implement the Serializable interface:
> {code}
> 2017-02-23 09:14:41,887 ERROR [main] spark.JobGraphBuilder (JobGraphBuilder.java:sparkOperToRDD(183)) - throw exception in sparkOperToRDD:
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 0.0 in stage 9.0 (TID 9) had a not serializable result: org.apache.avro.generic.GenericData$Record
> Serialization stack:
> 	- object not serializable (class: org.apache.avro.generic.GenericData$Record, value: {"key": "stuff in closet", "value1": {"thing": "hat", "count": 7}, "value2": {"thing": "coat", "count": 2}})
> 	- field (class: org.apache.pig.impl.util.avro.AvroTupleWrapper, name: avroObject, type: interface org.apache.avro.generic.IndexedRecord)
> 	- object (class org.apache.pig.impl.util.avro.AvroTupleWrapper, org.apache.pig.impl.util.avro.AvroTupleWrapper@3d3a58c1)
> 	at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1431)
> 	at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1419)
> {code}
> The failing test is a new test introduced when merging trunk into the spark branch, which is why we didn't see this error before.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
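The serialization stack above shows the standard Java failure mode: a wrapper object is itself Serializable-eligible, but a field it holds (GenericData$Record via IndexedRecord) is not. A minimal, Avro-free sketch of one way such a wrapper can be fixed, using plain Java custom serialization (transient field plus writeObject/readObject). The Record class here is a hypothetical stand-in for GenericData$Record; a real fix in AvroTupleWrapper would re-encode the record with Avro's own binary encoding rather than field by field. This is illustration only, not the contents of PIG-5134_2.patch.

```java
import java.io.*;

// Hypothetical stand-in for a non-serializable payload like GenericData$Record.
class Record {
    final String key;
    final int count;
    Record(String key, int count) { this.key = key; this.count = count; }
}

// Sketch of a serializable wrapper: the raw field is transient and is
// re-encoded by hand, the way a real fix could write Avro binary.
class RecordWrapper implements Serializable {
    private transient Record record;

    RecordWrapper(Record record) { this.record = record; }

    Record get() { return record; }

    private void writeObject(ObjectOutputStream out) throws IOException {
        out.defaultWriteObject();
        // Encode the payload field by field (Avro would use a DatumWriter).
        out.writeUTF(record.key);
        out.writeInt(record.count);
    }

    private void readObject(ObjectInputStream in)
            throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        // Rebuild the transient payload from the hand-written encoding.
        record = new Record(in.readUTF(), in.readInt());
    }
}

public class Main {
    public static void main(String[] args) throws Exception {
        // Round-trip through Java serialization, as Spark does for task results.
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        ObjectOutputStream oos = new ObjectOutputStream(bos);
        oos.writeObject(new RecordWrapper(new Record("stuff in closet", 7)));
        oos.flush();
        RecordWrapper copy = (RecordWrapper) new ObjectInputStream(
                new ByteArrayInputStream(bos.toByteArray())).readObject();
        System.out.println(copy.get().key + ":" + copy.get().count);
    }
}
```

Without the transient marker and the custom write/read pair, the same round trip would throw java.io.NotSerializableException, which is the root of the Spark failure above.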