[ 
https://issues.apache.org/jira/browse/PIG-5171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15967849#comment-15967849
 ] 

Nandor Kollar commented on PIG-5171:
------------------------------------

I think the problem is with Spark and not with MR. I attached a minimal unit 
test to show that the results are different even for 2 rows. When I turn off 
the secondary key optimizer for Spark, the test passes. Although I named it 
Union_3, my sense is that the problem is the same for all three test cases I 
mentioned above, and if we make the attached unit test pass, all three E2E 
tests will pass too.
As for the root cause, it seems that after secondary key optimization in MR we 
set the sort order on the mapreduce operation:
{code}
mr.setSecondarySortOrder(info.getSecondarySortOrder());
{code}
This call is missing from Spark, so if you do an alternating desc and asc sort, 
the test will fail, because for Spark we don't take into account which field 
should be sorted in which order.
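
To illustrate why a missing per-field sort order changes the result even for 2 
rows, here is a minimal standalone sketch. It is not Pig or Spark code: the 
class SecondarySortOrderSketch, the methods withSortOrder and sorted, and the 
boolean[] flags (true = asc, false = desc) are hypothetical names introduced 
only for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration only; none of these names exist in Pig.
class SecondarySortOrderSketch {

    // Compares rows field by field; ascending[i] == false reverses field i.
    static Comparator<int[]> withSortOrder(boolean[] ascending) {
        return (a, b) -> {
            for (int i = 0; i < ascending.length; i++) {
                int c = Integer.compare(a[i], b[i]);
                if (c != 0) {
                    return ascending[i] ? c : -c;
                }
            }
            return 0;
        };
    }

    // Returns a sorted copy so the input list is left untouched.
    static List<int[]> sorted(List<int[]> rows, boolean[] ascending) {
        List<int[]> copy = new ArrayList<>(rows);
        copy.sort(withSortOrder(ascending));
        return copy;
    }

    public static void main(String[] args) {
        List<int[]> rows = Arrays.asList(new int[]{1, 2}, new int[]{2, 1});

        // Requested order: first field desc, second field asc.
        boolean[] requested = {false, true};
        // What happens if the requested order is dropped: everything asc.
        boolean[] dropped = {true, true};

        // prints [2, 1]
        System.out.println(Arrays.toString(sorted(rows, requested).get(0)));
        // prints [1, 2]
        System.out.println(Arrays.toString(sorted(rows, dropped).get(0)));
    }
}
```

With the flags honoured the row (2, 1) comes first; with the flags dropped the 
row (1, 2) comes first, which is the kind of difference the attached test 
shows.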

> SecondarySort_7 is failing with spark exec type
> -----------------------------------------------
>
>                 Key: PIG-5171
>                 URL: https://issues.apache.org/jira/browse/PIG-5171
>             Project: Pig
>          Issue Type: Sub-task
>          Components: spark
>            Reporter: Nandor Kollar
>             Fix For: spark-branch
>
>         Attachments: TestUnion_3.java
>
>
> different output produced



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
