[
https://issues.apache.org/jira/browse/PIG-5319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17363875#comment-17363875
]
Koji Noguchi commented on PIG-5319:
-----------------------------------
I do see the OutputFormat being created twice (marked *** below).
Using Spark 2.4:
{code:scala|title=SparkHadoopWriter.scala}
117 committer.setupTask(taskContext) ***
118
119 // Initiate the writer.
120 config.initWriter(taskContext, sparkPartitionId) ***
{code}
Within setupTask and initWriter, each path creates its own OutputFormat instance.
Trace for each:
{noformat}
SparkHadoopWriter.scala:117 committer.setupTask(taskContext)
--> HadoopMapReduceCommitProtocol.scala:217 setupCommitter(taskContext)
--> --> HadoopMapReduceCommitProtocol.scala:94 val format = context.getOutputFormatClass.newInstance()
{noformat}
and
{noformat}
SparkHadoopWriter.scala:120 config.initWriter(taskContext, sparkPartitionId)
--> SparkHadoopWriter.scala:343 val taskFormat = getOutputFormat()
--> --> SparkHadoopWriter.scala:384 outputFormat.newInstance()
{noformat}
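Condensed, the two paths amount to something like this (a simplified sketch based on the traces above, not Spark's verbatim source; "context" stands for the Hadoop TaskAttemptContext):
{code:scala}
import org.apache.hadoop.mapreduce.TaskAttemptContext

def sketch(context: TaskAttemptContext): Unit = {
  // Path 1: HadoopMapReduceCommitProtocol.setupCommitter creates a first
  // OutputFormat instance, solely to obtain its OutputCommitter.
  val format = context.getOutputFormatClass.newInstance()
  val committer = format.getOutputCommitter(context)
  committer.setupTask(context)

  // Path 2: initWriter/getOutputFormat creates a second, independent
  // OutputFormat instance, which produces the actual RecordWriter.
  val taskFormat = context.getOutputFormatClass.newInstance()
  val writer = taskFormat.getRecordWriter(context)

  // "committer" and "writer" now come from two different OutputFormat
  // instances, so any per-instance state is not shared between them.
}
{code}
So the OutputCommitter and the RecordWriter are produced by two different PigOutputFormat instances, and any per-instance state is lost between them.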
> Investigate why TestStoreInstances fails with Spark 2.2
> -------------------------------------------------------
>
> Key: PIG-5319
> URL: https://issues.apache.org/jira/browse/PIG-5319
> Project: Pig
> Issue Type: Bug
> Components: spark
> Reporter: Nándor Kollár
> Priority: Major
>
> TestStoreInstances unit test fails with Spark 2.2.x. It seems the job and task
> commit logic changed a lot since Spark 2.1.x: now it looks like Spark uses one
> PigOutputFormat instance when writing to files, and a different one when
> getting the OutputCommitter.
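For illustration, here is a minimal, hypothetical OutputFormat (not Pig's actual PigOutputFormat; class and field names are made up) that shares per-instance state between its writer and its committer, which is the pattern that breaks once the two come from different instances:
{code:scala}
import org.apache.hadoop.mapreduce.{OutputCommitter, RecordWriter, TaskAttemptContext}
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat

// Hypothetical example of an OutputFormat whose committer depends on
// state accumulated by its writer on the same instance.
class StatefulOutputFormat extends TextOutputFormat[String, String] {
  private var recordsWritten = 0L // per-instance state

  override def getRecordWriter(ctx: TaskAttemptContext): RecordWriter[String, String] = {
    val inner = super.getRecordWriter(ctx)
    new RecordWriter[String, String] {
      override def write(k: String, v: String): Unit = {
        recordsWritten += 1 // updates *this* instance only
        inner.write(k, v)
      }
      override def close(c: TaskAttemptContext): Unit = inner.close(c)
    }
  }

  override def getOutputCommitter(ctx: TaskAttemptContext): OutputCommitter = {
    // A committer that expects to observe recordsWritten at commit time.
    // Under Spark 2.2+ it is obtained from a *different* instance than the
    // one doing the writing, so it always sees recordsWritten == 0.
    super.getOutputCommitter(ctx)
  }
}
{code}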