[ https://issues.apache.org/jira/browse/ARROW-640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16394627#comment-16394627 ]
Alex Hagerman edited comment on ARROW-640 at 3/11/18 9:01 PM:
--------------------------------------------------------------

I think this has changed since the original ticket. The comparison appears to be working; I tested it with strings and numbers. I'm also getting an error on set() now. I'm going to continue looking into this, but if anybody has thoughts on it I'd be happy to hear them.

Also, from_pylist appears to have been removed, but searching the changelog on GitHub I only found its addition in 0.3, not its removal. I'm going to look at the history of __eq__ on ScalarValue and as_py, then work on what would make sense for __hash__.

{code:python}
%load_ext Cython
import pyarrow as pa

pylist = [1,1,1,2]
arr = pa.array(pylist)
arr
<pyarrow.lib.Int64Array object at 0x7fbad56e4c28>
[
  1,
  1,
  1,
  2
]
arr[0] == arr[1]
True
arr[0] == arr[3]
False
word_list = ['test', 'not the same', 'test', 'nope']
word_list[0] == word_list[2]
True
word_list[0] == word_list[1]
False
pa.array.__eq__
<method-wrapper '__eq__' of builtin_function_or_method object at 0x7fbaab609990>
set(arr)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-9-ba21c71e79f9> in <module>()
----> 1 set(arr)

TypeError: unhashable type: 'pyarrow.lib.Int64Value'
arr_list = pa.from_pylist([1, 1, 1, 2])
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-10-30966022c9ed> in <module>()
----> 1 arr_list = pa.from_pylist([1, 1, 1, 2])

AttributeError: module 'pyarrow' has no attribute 'from_pylist'
{code}

> [Python] Arrow scalar values should have a sensible __hash__ and comparison
> ---------------------------------------------------------------------------
>
>                 Key: ARROW-640
>                 URL: https://issues.apache.org/jira/browse/ARROW-640
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>            Reporter: Miki Tebeka
>            Assignee: Alex Hagerman
>            Priority: Major
>             Fix For: 0.10.0
>
>
> {noformat}
> In [86]: arr = pa.from_pylist([1, 1, 1, 2])
> In [87]: set(arr)
> Out[87]: {1, 2, 1, 1}
> In [88]: arr[0] == arr[1]
> Out[88]: False
> In [89]: arr
> Out[89]:
> <pyarrow.array.Int64Array object at 0x7f8c8c739e08>
> [
>   1,
>   1,
>   1,
>   2
> ]
> {noformat}
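The ticket asks for scalars whose comparison and hashing agree, so that set(arr) collapses [1, 1, 1, 2] to two values. As a minimal pure-Python sketch of that contract (not pyarrow's implementation), the class below makes __eq__ and __hash__ use exactly the same fields. ScalarSketch, its type_name field, and the choice to compare the type as well as the value are hypothetical illustrations; only the idea of as_py() comes from the discussion above.

```python
from typing import Any


class ScalarSketch:
    """Hypothetical stand-in for an Arrow scalar value (not pyarrow's class).

    __eq__ and __hash__ both use the same (type_name, value) pair, so any
    two objects that compare equal also hash equally, which is the
    invariant set() and dict rely on.
    """

    __slots__ = ("_type_name", "_value")

    def __init__(self, type_name: str, value: Any) -> None:
        self._type_name = type_name  # illustrative label such as "int64"
        self._value = value          # the plain Python value

    def as_py(self) -> Any:
        # Mirrors the idea of converting a scalar to a Python object.
        return self._value

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, ScalarSketch):
            # Let Python try the reflected comparison, then fall back to identity.
            return NotImplemented
        return (self._type_name, self.as_py()) == (other._type_name, other.as_py())

    def __hash__(self) -> int:
        # Must agree with __eq__: hash exactly the fields that __eq__ compares.
        return hash((self._type_name, self.as_py()))

    def __repr__(self) -> str:
        return f"ScalarSketch({self._type_name!r}, {self._value!r})"


# Usage mirroring the ticket: [1, 1, 1, 2] should dedupe to two values.
values = [ScalarSketch("int64", v) for v in [1, 1, 1, 2]]
print(len(set(values)))        # 2
print(values[0] == values[1])  # True
print(values[0] == values[3])  # False
```

Because __eq__ returns NotImplemented for other operand types, comparing a ScalarSketch with a plain int falls back to identity and yields False; a real implementation might instead choose to compare against as_py(). Values whose as_py() result is itself unhashable (a list, for example) would make __hash__ raise TypeError and would need converting first.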