[ 
https://issues.apache.org/jira/browse/ARROW-640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16404189#comment-16404189
 ] 

Alex Hagerman commented on ARROW-640:
-------------------------------------

I've added __hash__ for ints and opened a PR. __eq__ was already in place, 
implemented with as_py(), which covers the comparison part of the original 
ticket. I'm happy to look into the other types, explore different ways of 
hashing them, and make any extension to as_py that might be needed, if someone 
can provide some direction or new tickets. Otherwise I'll look at what else is 
open that I might be able to help with.
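
For anyone following along, the general idea (equality and hashing both derived 
from the same Python value) can be sketched in plain Python. This is only an 
illustration and not the code in the PR: IntScalarSketch is a hypothetical 
stand-in class, and the only thing it borrows from pyarrow is the as_py() 
method name.

```python
class IntScalarSketch:
    """Hypothetical stand-in for an Arrow integer scalar (not pyarrow's class)."""

    def __init__(self, value):
        self._value = value

    def as_py(self):
        # Convert to the equivalent built-in Python object.
        return self._value

    def __eq__(self, other):
        # Compare by Python value; accept another sketch scalar or a plain int.
        if isinstance(other, IntScalarSketch):
            return self.as_py() == other.as_py()
        return self.as_py() == other

    def __hash__(self):
        # Equal objects must hash equal, so hash the value that __eq__ compares.
        return hash(self.as_py())


scalars = [IntScalarSketch(v) for v in [1, 1, 1, 2]]
unique_values = sorted(s.as_py() for s in set(scalars))
print(unique_values)             # [1, 2]
print(scalars[0] == scalars[1])  # True
```

Note that in Python 3 a class that defines __eq__ without also defining 
__hash__ becomes unhashable, so the two methods need to be kept consistent 
with each other.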

Timing for __hash__ on an int scalar, measured with IPython's %timeit, is below.

import pyarrow as pa
arr = pa.array([1,1,2,1])
a = arr[0]
%timeit a.__hash__()
265 ns ± 1.72 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

> [Python] Arrow scalar values should have a sensible __hash__ and comparison
> ---------------------------------------------------------------------------
>
>                 Key: ARROW-640
>                 URL: https://issues.apache.org/jira/browse/ARROW-640
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>            Reporter: Miki Tebeka
>            Assignee: Alex Hagerman
>            Priority: Major
>             Fix For: 0.10.0
>
>
> {noformat}
> In [86]: arr = pa.from_pylist([1, 1, 1, 2])
> In [87]: set(arr)
> Out[87]: {1, 2, 1, 1}
> In [88]: arr[0] == arr[1]
> Out[88]: False
> In [89]: arr
> Out[89]: 
> <pyarrow.array.Int64Array object at 0x7f8c8c739e08>
> [
>   1,
>   1,
>   1,
>   2
> ]
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)