[ https://issues.apache.org/jira/browse/ARROW-640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16404189#comment-16404189 ]
Alex Hagerman commented on ARROW-640:
-------------------------------------

I've added the __hash__ for ints and opened a PR. __eq__ was already in place using as_py() in relation to the original ticket. Happy to look into the other types and explore different ways to handle hashing them as well as any extension of as_py that might be needed if some direction or new tickets could be provided. Otherwise I'll look at what else is open that I might be able to help with. Timing information is below.

import pyarrow as pa
arr = pa.array([1,1,2,1])
a = arr[0]
%timeit a.__hash__()
265 ns ± 1.72 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

> [Python] Arrow scalar values should have a sensible __hash__ and comparison
> ---------------------------------------------------------------------------
>
>                 Key: ARROW-640
>                 URL: https://issues.apache.org/jira/browse/ARROW-640
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>            Reporter: Miki Tebeka
>            Assignee: Alex Hagerman
>            Priority: Major
>             Fix For: 0.10.0
>
>
> {noformat}
> In [86]: arr = pa.from_pylist([1, 1, 1, 2])
> In [87]: set(arr)
> Out[87]: {1, 2, 1, 1}
> In [88]: arr[0] == arr[1]
> Out[88]: False
> In [89]: arr
> Out[89]:
> <pyarrow.array.Int64Array object at 0x7f8c8c739e08>
> [
>   1,
>   1,
>   1,
>   2
> ]
> {noformat}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
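
For readers following the ticket, the behavior it asks for (equal scalars compare equal and collapse in a set) can be illustrated with a minimal sketch. This is not the code from the PR and not pyarrow's implementation: `IntScalar` is a hypothetical pure-Python stand-in, and the only thing taken from the comment above is the idea of delegating `__eq__` and `__hash__` to `as_py()`.

```python
class IntScalar:
    """Hypothetical stand-in for an Arrow integer scalar (not a pyarrow class)."""

    def __init__(self, value):
        self._value = value

    def as_py(self):
        # Return the plain Python value wrapped by this scalar.
        return self._value

    def __eq__(self, other):
        # Compare by underlying Python value, as the comment describes for __eq__.
        if isinstance(other, IntScalar):
            return self.as_py() == other.as_py()
        return self.as_py() == other

    def __hash__(self):
        # Hash the same value that __eq__ compares, so equal scalars hash equally.
        return hash(self.as_py())

    def __repr__(self):
        return repr(self.as_py())


# Same values as the ticket's example: [1, 1, 1, 2]
scalars = [IntScalar(v) for v in [1, 1, 1, 2]]
unique = set(scalars)
```

With both methods defined on the same underlying value, `scalars[0] == scalars[1]` holds and `unique` contains two elements rather than four, which is the outcome the ticket's `set(arr)` example expects.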