[ 
https://issues.apache.org/jira/browse/ARROW-640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16397116#comment-16397116
 ] 

Antoine Pitrou commented on ARROW-640:
--------------------------------------

[~alexhagerman], take care that hashing stays consistent with Python 
scalars. In other words, for every hashable x and y where {{x == y}}, 
{{hash(x) == hash(y)}} must also be true.

The simplest way to do that is probably to convert the Arrow value to a Python 
scalar, though that may not be the fastest:
{code:python}
def __hash__(self):
    return hash(self.as_py())
{code}
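
For the {{set(arr)}} case in the issue to deduplicate, {{__hash__}} also needs a value-based {{__eq__}} next to it. The following is only a minimal sketch of that pairing: {{ScalarSketch}} is a hypothetical stand-in written for illustration, not the pyarrow scalar class, and it assumes {{as_py()}} returns the equivalent plain Python object.

```python
class ScalarSketch:
    """Hypothetical stand-in for an Arrow scalar (not the pyarrow class)."""

    def __init__(self, value):
        self._value = value

    def as_py(self):
        # Return the equivalent plain Python object.
        return self._value

    def __eq__(self, other):
        # Compare by value, unwrapping another ScalarSketch if needed.
        if isinstance(other, ScalarSketch):
            other = other.as_py()
        return self.as_py() == other

    def __hash__(self):
        # Delegate to Python so that equal values always hash equally.
        return hash(self.as_py())


scalars = [ScalarSketch(v) for v in [1, 1, 1, 2]]
unique = set(scalars)  # duplicates collapse because __eq__ and __hash__ agree
```

With both methods delegating to the Python value, a wrapped 1 and a plain 1 compare equal and land in the same hash bucket, so the two kinds of object can be mixed in sets and dict keys.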

Otherwise you'll need to reproduce the exact hashing algorithm that Python uses.
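
As a rough idea of what reproducing the algorithm involves, here is a sketch for integers only, based on the "Hashing of numeric types" rules in the Python documentation as implemented by CPython. The function name {{int_hash_sketch}} is made up for illustration. Other types are harder: floats must hash like the equal integer, and {{str}}/{{bytes}} hashes are salted per process by default, so those would need separate handling.

```python
import sys


def int_hash_sketch(n: int) -> int:
    """Sketch of how CPython hashes an int (illustrative helper)."""
    # Ints are reduced modulo a fixed prime (2**61 - 1 on 64-bit builds).
    modulus = sys.hash_info.modulus
    h = abs(n) % modulus
    if n < 0:
        h = -h
    # -1 is reserved as an error marker in the C API, so it maps to -2.
    return -2 if h == -1 else h


samples = [0, 1, -1, 2**61 - 1, 2**61, -(2**64), 10**30]
matches = all(int_hash_sketch(n) == hash(n) for n in samples)
```

Any reimplementation along these lines would have to be kept in step with the interpreter for every supported type, which is why delegating to {{hash(self.as_py())}} is the safer default.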

> [Python] Arrow scalar values should have a sensible __hash__ and comparison
> ---------------------------------------------------------------------------
>
>                 Key: ARROW-640
>                 URL: https://issues.apache.org/jira/browse/ARROW-640
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>            Reporter: Miki Tebeka
>            Assignee: Alex Hagerman
>            Priority: Major
>             Fix For: 0.10.0
>
>
> {noformat}
> In [86]: arr = pa.from_pylist([1, 1, 1, 2])
> In [87]: set(arr)
> Out[87]: {1, 2, 1, 1}
> In [88]: arr[0] == arr[1]
> Out[88]: False
> In [89]: arr
> Out[89]: 
> <pyarrow.array.Int64Array object at 0x7f8c8c739e08>
> [
>   1,
>   1,
>   1,
>   2
> ]
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
