[
https://issues.apache.org/jira/browse/PIG-5171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15967849#comment-15967849
]
Nandor Kollar commented on PIG-5171:
------------------------------------
I think the problem is with Spark and not with MR. I attached a minimal unit
test showing that the results differ even for 2 rows. When I turn off the
secondary key optimizer for Spark, the test passes. Although I named it
Union_3, I believe the problem is the same for all three test cases I
mentioned above, and if we make the attached unit test pass, all three E2E
tests will pass too.
As for the root cause, it seems that after secondary key optimization in MR we
set the secondary sort order on the MapReduce operator:
{code}
mr.setSecondarySortOrder(info.getSecondarySortOrder());
{code}
This is missing from Spark, so a sort that alternates desc and asc fields
fails the test: on Spark we don't take into account which field should be
sorted in which order.
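To illustrate why dropping the sort order matters, here is a minimal sketch
(not Pig code). It assumes the secondary sort order is a per-field boolean[]
(true = ascending), which is how the value passed to setSecondarySortOrder is
modelled here. The class and method names (SecondarySortOrderDemo, orderAware,
orderBlind) are hypothetical. With a DESC/ASC key, two rows are already enough
to get different output depending on whether the flags are honored:

{code}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class SecondarySortOrderDemo {

    // Hypothetical comparator that honors a per-field sort order
    // (true = ascending, false = descending), modelling what MR gets
    // from the secondary key optimizer. NOT Pig's actual implementation.
    static Comparator<int[]> orderAware(boolean[] ascending) {
        return (a, b) -> {
            for (int i = 0; i < a.length; i++) {
                int c = Integer.compare(a[i], b[i]);
                if (c != 0) {
                    return ascending[i] ? c : -c;
                }
            }
            return 0;
        };
    }

    // Comparator that ignores the sort order and sorts every field ascending,
    // modelling the behavior described above for the Spark path.
    static Comparator<int[]> orderBlind() {
        return (a, b) -> {
            for (int i = 0; i < a.length; i++) {
                int c = Integer.compare(a[i], b[i]);
                if (c != 0) {
                    return c;
                }
            }
            return 0;
        };
    }

    // Returns a sorted copy so the input list is left untouched.
    static List<int[]> sorted(List<int[]> rows, Comparator<int[]> cmp) {
        List<int[]> copy = new ArrayList<>(rows);
        copy.sort(cmp);
        return copy;
    }

    // Concatenates rows as "[x, y][x, y]..." for easy comparison.
    static String render(List<int[]> rows) {
        StringBuilder sb = new StringBuilder();
        for (int[] r : rows) {
            sb.append(Arrays.toString(r));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Two secondary-key fields: first DESC, second ASC (alternating order).
        boolean[] order = {false, true};
        List<int[]> rows = Arrays.asList(new int[]{1, 2}, new int[]{2, 1});
        // prints "order-aware: [2, 1][1, 2]"
        System.out.println("order-aware: " + render(sorted(rows, orderAware(order))));
        // prints "order-blind: [1, 2][2, 1]"
        System.out.println("order-blind: " + render(sorted(rows, orderBlind())));
    }
}
{code}

The only difference between the two comparators is whether the per-field flag
flips the comparison result. That is why the fix presumably needs to propagate
the sort order into the Spark plan, as MR already does.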
> SecondarySort_7 is failing with spark exec type
> -----------------------------------------------
>
> Key: PIG-5171
> URL: https://issues.apache.org/jira/browse/PIG-5171
> Project: Pig
> Issue Type: Sub-task
> Components: spark
> Reporter: Nandor Kollar
> Fix For: spark-branch
>
> Attachments: TestUnion_3.java
>
>
> different output produced
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)