[ https://issues.apache.org/jira/browse/PIG-5318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16274385#comment-16274385 ]
Nandor Kollar commented on PIG-5318:
------------------------------------

Attached PIG-5318_2.patch, addressing Rohini's comments.

As for the {{TestStoreInstances}} failure, it looks like Spark (unlike Tez and MapReduce) creates multiple instances of {{PigOutputFormat}} while setting up the output committers: [setupCommitter|https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/internal/io/HadoopMapReduceCommitProtocol.scala#L74] is called from both [setupJob|https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/internal/io/HadoopMapReduceCommitProtocol.scala#L138] and [setupTask|https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/internal/io/HadoopMapReduceCommitProtocol.scala#L165], and {{setupCommitter}} creates a new {{PigOutputFormat}} each time, saving it in a private variable.

In addition, when Spark writes to files, a new {{PigOutputFormat}} is [created|https://github.com/apache/spark/blob/branch-2.2/core/src/main/scala/org/apache/spark/internal/io/SparkHadoopMapReduceWriter.scala#L75] as well. Since the POStores are serialized into the configuration and deserialized from it, but the StoreFuncInterface inside each store is [transient|https://github.com/apache/pig/blob/trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POStore.java#L53], a new instance of {{STFuncCheckInstances}} is created each time, so {{putNext}} and {{commitTask}} end up using different array instances.

I'm not sure whether this is a bug in Pig or in Spark: should Spark consistently use the same OutputFormat instance in this case? Making {{reduceStores}}, {{mapStores}} and {{currentConf}} static inside {{TestStoreInstances}} would solve the problem. [~rohini], [~kellyzly], what do you think about this solution?
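To illustrate the mechanism: a minimal, self-contained sketch (a hypothetical {{Store}} class standing in for Pig's actual POStore, not Pig code) of why a transient field does not survive the serialize/deserialize round-trip a store goes through when it travels via the job configuration:

```java
import java.io.*;

public class TransientDemo {
    // Hypothetical stand-in for POStore: one normal field, one transient one.
    static class Store implements Serializable {
        private static final long serialVersionUID = 1L;
        String location = "out";                    // serialized normally
        transient Object storeFunc = new Object();  // NOT serialized

        // Serialize to bytes and read back, as happens when a store is
        // shipped through the configuration to another PigOutputFormat.
        Store roundTrip() throws IOException, ClassNotFoundException {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            new ObjectOutputStream(bos).writeObject(this);
            ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bos.toByteArray()));
            return (Store) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Store copy = new Store().roundTrip();
        System.out.println("location survives: " + copy.location);
        // The transient field comes back null, so every consumer of the
        // deserialized store must construct a fresh StoreFuncInterface.
        System.out.println("storeFunc after deserialization: " + copy.storeFunc);
    }
}
```

Because each {{PigOutputFormat}} instance deserializes its own copy of the stores, each one re-creates the store function, which is why {{putNext}} and {{commitTask}} can see different instances when Spark does not reuse a single OutputFormat.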
> Unit test failures on Pig on Spark with Spark 2.2
> -------------------------------------------------
>
>                 Key: PIG-5318
>                 URL: https://issues.apache.org/jira/browse/PIG-5318
>             Project: Pig
>          Issue Type: Bug
>          Components: spark
>            Reporter: Nandor Kollar
>            Assignee: Nandor Kollar
>         Attachments: PIG-5318_1.patch, PIG-5318_2.patch
>
>
> There are several failing cases when executing the unit tests with Spark 2.2:
> {code}
> org.apache.pig.test.TestAssert#testNegativeWithoutFetch
> org.apache.pig.test.TestAssert#testNegative
> org.apache.pig.test.TestEvalPipeline2#testNonStandardDataWithoutFetch
> org.apache.pig.test.TestScalarAliases#testScalarErrMultipleRowsInInput
> org.apache.pig.test.TestStore#testCleanupOnFailureMultiStore
> org.apache.pig.test.TestStoreInstances#testBackendStoreCommunication
> org.apache.pig.test.TestStoreLocal#testCleanupOnFailureMultiStore
> {code}
> All of these are related to fixes/changes in Spark.
> The TestAssert, TestScalarAliases and TestEvalPipeline2 failures could be fixed by asserting on the message of the exception's root cause; it looks like on Spark 2.2 the exception is wrapped in an additional layer.
> The TestStore and TestStoreLocal failures are also test-related problems: it looks like SPARK-7953 is fixed in Spark 2.2.
> The root cause of the TestStoreInstances failure is yet to be found.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)