[ 
https://issues.apache.org/jira/browse/PIG-5318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16274385#comment-16274385
 ] 

Nandor Kollar commented on PIG-5318:
------------------------------------

Attached PIG-5318_2.patch; I addressed Rohini's comments there.

As for the {{TestStoreInstances}} failure, it looks like Spark (unlike Tez and 
MapReduce) creates multiple instances of {{PigOutputFormat}} while setting up 
the output committers: 
[setupCommitter|https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/internal/io/HadoopMapReduceCommitProtocol.scala#L74]
 is called from both 
[setupJob|https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/internal/io/HadoopMapReduceCommitProtocol.scala#L138]
 and 
[setupTask|https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/internal/io/HadoopMapReduceCommitProtocol.scala#L165],
 and {{setupCommitter}} creates a new {{PigOutputFormat}} each time, saving it 
in a private variable. In addition, when Spark writes to files, a new 
{{PigOutputFormat}} is [created 
there|https://github.com/apache/spark/blob/branch-2.2/core/src/main/scala/org/apache/spark/internal/io/SparkHadoopMapReduceWriter.scala#L75]
 too. Since POStores are serialized into and deserialized from the 
configuration, but the StoreFuncInterface inside each store is 
[transient|https://github.com/apache/pig/blob/trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POStore.java#L53],
 a new instance of {{STFuncCheckInstances}} is created each time, so 
{{putNext}} and {{commitTask}} will use different array instances. I'm not 
sure whether this is a bug in Pig or in Spark: should Spark consistently use 
the same OutputFormat instance in this case?
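To illustrate the mechanism (a minimal sketch with hypothetical class names, not the actual Pig classes): when a field is marked {{transient}}, Java serialization skips it, so every deserialized copy of the store re-creates its own store-func instance and any state accumulated in one copy is invisible to the others.

{code}
import java.io.*;

// Hypothetical stand-in for POStore: the store-func field is transient,
// as in POStore, so it is not written during serialization.
class Store implements Serializable {
    private static final long serialVersionUID = 1L;
    transient Object storeFunc = new Object(); // skipped by serialization

    Object getStoreFunc() {
        if (storeFunc == null) {
            storeFunc = new Object(); // re-created lazily in each deserialized copy
        }
        return storeFunc;
    }
}

public class TransientDemo {
    // Serialize and immediately deserialize, mimicking a POStore round trip
    // through the job configuration.
    static Store roundTrip(Store s) throws Exception {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        new ObjectOutputStream(bos).writeObject(s);
        ObjectInputStream in =
            new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray()));
        return (Store) in.readObject();
    }

    public static void main(String[] args) throws Exception {
        Store original = new Store();
        Store copy1 = roundTrip(original);
        Store copy2 = roundTrip(original);
        // Each copy holds a distinct re-created instance, so state written
        // via one copy (as in putNext) is not seen by the other (commitTask).
        System.out.println(copy1.getStoreFunc() == copy2.getStoreFunc()); // false
    }
}
{code}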

Making {{reduceStores}}, {{mapStores}} and {{currentConf}} static inside 
{{TestStoreInstances}} would solve the problem. [~rohini], [~kellyzly], what 
do you think about this solution?
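The idea behind the static-field workaround, sketched with a hypothetical class (not the real test code): static fields live on the class, so every instance Spark creates sees the same collections, and the commit-side instance observes what the write-side instance recorded.

{code}
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the proposed fix: shared static state survives
// Spark instantiating the store machinery more than once.
class SharedStateStore {
    static final List<String> records = new ArrayList<>(); // one list per JVM, not per instance

    void putNext(String tuple) {
        records.add(tuple);
    }

    int recordCountAtCommit() {
        return records.size();
    }
}

public class StaticFieldDemo {
    public static void main(String[] args) {
        SharedStateStore writer = new SharedStateStore();    // instance used during putNext
        SharedStateStore committer = new SharedStateStore(); // separate instance at commitTask
        writer.putNext("a");
        writer.putNext("b");
        // The committer is a different instance, yet it sees both records.
        System.out.println(committer.recordCountAtCommit()); // 2
    }
}
{code}

The trade-off is the usual one with static test state: it only works because each test run is a single JVM, and the fields must be reset between tests.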

> Unit test failures on Pig on Spark with Spark 2.2
> -------------------------------------------------
>
>                 Key: PIG-5318
>                 URL: https://issues.apache.org/jira/browse/PIG-5318
>             Project: Pig
>          Issue Type: Bug
>          Components: spark
>            Reporter: Nandor Kollar
>            Assignee: Nandor Kollar
>         Attachments: PIG-5318_1.patch, PIG-5318_2.patch
>
>
> There are several failing cases when executing the unit tests with Spark 2.2:
> {code}
>  org.apache.pig.test.TestAssert#testNegativeWithoutFetch
>  org.apache.pig.test.TestAssert#testNegative
>  org.apache.pig.test.TestEvalPipeline2#testNonStandardDataWithoutFetch
>  org.apache.pig.test.TestScalarAliases#testScalarErrMultipleRowsInInput
>  org.apache.pig.test.TestStore#testCleanupOnFailureMultiStore
>  org.apache.pig.test.TestStoreInstances#testBackendStoreCommunication
>  org.apache.pig.test.TestStoreLocal#testCleanupOnFailureMultiStore
> {code}
> All of these are related to fixes/changes in Spark.
> The TestAssert, TestScalarAliases and TestEvalPipeline2 failures could be 
> fixed by asserting on the message of the exception's root cause; it looks 
> like on Spark 2.2 the exception is wrapped in an additional layer.
> The TestStore and TestStoreLocal failures are also test-related problems: it 
> looks like SPARK-7953 is fixed in Spark 2.2.
> The root cause of the TestStoreInstances failure is yet to be found.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)