[ https://issues.apache.org/jira/browse/PIG-5171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15967849#comment-15967849 ]
Nandor Kollar commented on PIG-5171:
------------------------------------

I think the problem is with Spark and not with MR. I have attached a minimal unit test showing that the results differ even for 2 rows. When I turn off the secondary key optimizer for Spark, the test passes. Although I named it Union_3, my sense is that the problem is the same for all three test cases I mentioned above, and if we make the attached unit test pass, all three E2E tests will pass too.

As for the root cause, it seems that after secondary key optimization in MR we set the sort order on the MapReduce operation:
{code}
mr.setSecondarySortOrder(info.getSecondarySortOrder());
{code}
This call is missing from the Spark path, so a sort that alternates desc and asc fields will fail the test, because for Spark we don't take into account which field should be sorted in which order.

> SecondarySort_7 is failing with spark exec type
> -----------------------------------------------
>
> Key: PIG-5171
> URL: https://issues.apache.org/jira/browse/PIG-5171
> Project: Pig
> Issue Type: Sub-task
> Components: spark
> Reporter: Nandor Kollar
> Fix For: spark-branch
>
> Attachments: TestUnion_3.java
>
>
> different output produced
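
To make the effect of the missing sort order concrete, here is a minimal standalone sketch. It is not Pig or Spark code: the class SortOrderSketch, its methods, and the sample rows are hypothetical, and it assumes the sort order can be represented as one boolean per field (true for ascending). It only shows how the same rows come out differently when the per-field order is honored versus when every field is treated as ascending.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration only; none of these names exist in Pig or Spark.
final class SortOrderSketch {

    // Builds a comparator that compares rows field by field.
    // ascending[i] == true sorts field i ascending, false sorts it descending.
    static Comparator<List<Integer>> byFields(boolean[] ascending) {
        return (a, b) -> {
            for (int i = 0; i < ascending.length; i++) {
                int c = Integer.compare(a.get(i), b.get(i));
                if (c != 0) {
                    return ascending[i] ? c : -c;
                }
            }
            return 0;
        };
    }

    // Returns a sorted copy so the input rows are left untouched.
    static List<List<Integer>> sorted(List<List<Integer>> rows, boolean[] ascending) {
        List<List<Integer>> copy = new ArrayList<>(rows);
        copy.sort(byFields(ascending));
        return copy;
    }

    public static void main(String[] args) {
        List<List<Integer>> rows = Arrays.asList(
            Arrays.asList(1, 10),
            Arrays.asList(2, 20),
            Arrays.asList(2, 30));

        // Per-field order honored: first field desc, second field asc.
        System.out.println(sorted(rows, new boolean[] {false, true}));

        // Per-field order ignored: every field treated as ascending.
        System.out.println(sorted(rows, new boolean[] {true, true}));
    }
}
```

The two printed lists differ, which is the kind of mismatch one would expect if the order flags never reach the component that does the sorting.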