[jira] [Commented] (PIG-5228) Orc_2 is failing with spark exec type

Adam Szita (JIRA) Tue, 25 Apr 2017 08:56:26 -0700

    [ 
https://issues.apache.org/jira/browse/PIG-5228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15983102#comment-15983102
 ]


Adam Szita commented on PIG-5228:
---------------------------------

This is because 'name' and 'age' both map to the same bucket index in the 
table of Java's HashMap (for a table of the default length, 16, they both end 
up at index=7). Hence when we put these into a new map (at either loading or 
storing data) in the order 'age', 'name' they will come out swapped, and vice 
versa. My wild guess is that either MR or Spark populates a map one extra 
time somewhere under the hood and that's where the difference comes from.
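
To make the collision concrete, here is a minimal sketch that recomputes the 
bucket index by hand. It assumes the JDK 7-era supplemental hash of 
java.util.HashMap with the default hash seed of 0; the class and method names 
are hypothetical and not part of Pig or the JDK. JDK 8 and later spread the 
hash differently, so the two keys need not collide there.

```java
// Sketch only: re-implements the JDK 7-era HashMap index computation.
// BucketIndexSketch, spread and bucketIndex are hypothetical names.
class BucketIndexSketch {

    // Supplemental hash as used by java.util.HashMap in JDK 7
    // (assumption: default configuration, i.e. hash seed 0).
    static int spread(int h) {
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    // Bucket index for a table whose length is a power of two.
    static int bucketIndex(Object key, int tableLength) {
        return spread(key.hashCode()) & (tableLength - 1);
    }

    public static void main(String[] args) {
        System.out.println("name -> " + bucketIndex("name", 16)); // name -> 7
        System.out.println("age  -> " + bucketIndex("age", 16));  // age  -> 7
    }
}
```

With both keys in the same bucket, their relative iteration order depends on 
the order of insertion, which is consistent with the swap seen in the test.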

Since we're talking about map entries, any order should be fine and the test 
should not depend on it. In my fix I created an extra Orc test for Spark where 
we project from the map field for every key. See [^PIG-5228.0.patch]. This 
should keep the purpose of the test while making sure we don't depend on the 
map's entry order at result comparison time.
[~kellyzly] can you please take a look?
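
The idea of projecting every key can be sketched outside Pig as well. The 
following Java snippet is only an illustration, not the code in the patch: 
MapProjectionSketch and project are hypothetical names, and LinkedHashMap is 
used purely so that the two insertion orders are visible deterministically.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch only: shows that projecting keys in a fixed order removes any
// dependence on the map's own entry order.
class MapProjectionSketch {

    // Returns the values for the given keys, in the order the keys are listed.
    static List<String> project(Map<String, String> map, String... keys) {
        List<String> values = new ArrayList<>();
        for (String key : keys) {
            values.add(map.get(key));
        }
        return values;
    }

    public static void main(String[] args) {
        // Same entries, inserted in opposite orders.
        Map<String, String> first = new LinkedHashMap<>();
        first.put("name", "alice");
        first.put("age", "18");

        Map<String, String> second = new LinkedHashMap<>();
        second.put("age", "18");
        second.put("name", "alice");

        System.out.println(first);  // {name=alice, age=18}
        System.out.println(second); // {age=18, name=alice}

        // The projections are identical even though the entry order differs.
        System.out.println(
            project(first, "name", "age").equals(project(second, "name", "age"))); // true
    }
}
```

Comparing the projected values rather than the printed map keeps the check 
stable regardless of how the underlying map orders its entries.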

> Orc_2 is failing with spark exec type
> -------------------------------------
>
>                 Key: PIG-5228
>                 URL: https://issues.apache.org/jira/browse/PIG-5228
>             Project: Pig
>          Issue Type: Sub-task
>          Components: spark
>            Reporter: Adam Szita
>            Assignee: Adam Szita
>             Fix For: spark-branch
>
>         Attachments: PIG-5228.0.patch
>
>
> This test is failing due to a mismatch between the actual and expected 
> results. The difference is only in the order of entries in Pig maps, such as:
> Actual:
> {code}
> [name#alice, age#18]...
> {code}
> Expected:
> {code}
> [age#18, name#alice]...
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
