[jira] [Commented] (PIG-5134) Fix TestAvroStorage unit test in Spark mode

Rohini Palaniswamy (JIRA) Fri, 24 Mar 2017 12:17:02 -0700

    [ 
https://issues.apache.org/jira/browse/PIG-5134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15940970#comment-15940970
 ]


Rohini Palaniswamy commented on PIG-5134:
-----------------------------------------

bq. Using Kryo would be better choice I think. 
  Agree with this. Changing the code of a LoadFunc or StoreFunc to get it 
working with Spark is not a good idea. We might fix the builtin ones, but custom 
ones from users will still break. My only concern with Kryo is incompatibility. 
Kryo and Guava are the most problematic dependencies, and people generally 
resort to shading them. For example: HIVE-5915, 
https://issues.cloudera.org/browse/LIVY-109. But Spark does not seem to use a 
shaded Kryo version. Could you investigate whether that could lead to any issues?

bq. name="minlog" rev="1.3"
  Add minlog.version to libraries.properties instead of hardcoding the revision.

> Fix TestAvroStorage unit test in Spark mode
> -------------------------------------------
>
>                 Key: PIG-5134
>                 URL: https://issues.apache.org/jira/browse/PIG-5134
>             Project: Pig
>          Issue Type: Bug
>          Components: spark
>            Reporter: liyunzhang_intel
>            Assignee: Nandor Kollar
>             Fix For: spark-branch
>
>         Attachments: PIG-5134_2.patch, PIG-5134.patch
>
>
> It seems that the test fails because Avro's GenericData#Record doesn't 
> implement the Serializable interface:
> {code}
> 2017-02-23 09:14:41,887 ERROR [main] spark.JobGraphBuilder 
> (JobGraphBuilder.java:sparkOperToRDD(183)) - throw exception in 
> sparkOperToRDD: 
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 0.0 
> in stage 9.0 (TID 9) had a not serializable result: 
> org.apache.avro.generic.GenericData$Record
> Serialization stack:
>       - object not serializable (class: 
> org.apache.avro.generic.GenericData$Record, value: {"key": "stuff in closet", 
> "value1": {"thing": "hat", "count": 7}, "value2": {"thing": "coat", "count": 
> 2}})
>       - field (class: org.apache.pig.impl.util.avro.AvroTupleWrapper, name: 
> avroObject, type: interface org.apache.avro.generic.IndexedRecord)
>       - object (class org.apache.pig.impl.util.avro.AvroTupleWrapper, 
> org.apache.pig.impl.util.avro.AvroTupleWrapper@3d3a58c1)
>       at 
> org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1431)
>       at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1419)
> {code}
> The failing test is a new test introduced when trunk was merged into the 
> spark branch, which is why we didn't see this error before.
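
The failure mode in the stack trace above can be reproduced without Spark or Avro. The following is a minimal sketch using only plain Java serialization; `PlainRecord` and `Wrapper` are hypothetical stand-ins for `GenericData$Record` and `AvroTupleWrapper`, not the real classes. It only illustrates why a Serializable wrapper still fails when one of its fields is not serializable.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class NotSerializableDemo {
    // Hypothetical stand-in for GenericData$Record: does NOT implement Serializable.
    static class PlainRecord {
        final String key;
        PlainRecord(String key) { this.key = key; }
    }

    // Hypothetical stand-in for AvroTupleWrapper: Serializable itself,
    // but it holds a field whose runtime type is not serializable.
    static class Wrapper implements Serializable {
        private static final long serialVersionUID = 1L;
        final PlainRecord avroObject;
        Wrapper(PlainRecord record) { this.avroObject = record; }
    }

    // Returns the name of the class that Java serialization rejected,
    // or null if the object was serialized successfully.
    static String trySerialize(Object o) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return null;
        } catch (NotSerializableException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) throws IOException {
        // The wrapper fails because of its non-serializable field.
        System.out.println(trySerialize(new Wrapper(new PlainRecord("stuff in closet"))));
        // A String serializes without problems.
        System.out.println(trySerialize("a plain string"));
    }
}
```

Java serialization walks the object graph, so the wrapper is rejected as soon as it reaches the non-serializable field. This is the same shape as the "Serialization stack" reported by Spark above.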



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
