[ https://issues.apache.org/jira/browse/ARROW-2806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16535872#comment-16535872 ]

Wes McKinney commented on ARROW-2806:
-------------------------------------

Oof, I actually think {{pa.array([1, NaN])}} should either raise an exception or return a DoubleArray with a NaN, unless {{from_pandas=True}}.

> [Python] Inconsistent handling of np.nan
> ----------------------------------------
>
>                 Key: ARROW-2806
>                 URL: https://issues.apache.org/jira/browse/ARROW-2806
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.9.0
>            Reporter: Uwe L. Korn
>            Priority: Major
>             Fix For: 0.10.0
>
>
> Currently we handle {{np.nan}} differently between having a list or a numpy
> array as an input to {{pa.array()}}:
> {code}
> >>> pa.array(np.array([1, np.nan]))
> <pyarrow.lib.DoubleArray object at 0x11680bea8>
> [
>   1.0,
>   nan
> ]
> >>> pa.array([1., np.nan])
> Out[9]:
> <pyarrow.lib.DoubleArray object at 0x10bdacbd8>
> [
>   1.0,
>   NA
> ]
> {code}
> I would actually think the last one is the correct one. Especially once one
> casts this to an integer column. There the first one produces a column with
> INT_MIN and the second one produces a real null.
> But, in {{test_array_conversions_no_sentinel_values}} we check that
> {{np.nan}} does not produce a Null.
> Even weirder:
> {code}
> >>> df = pd.DataFrame({'a': [1., None]})
> >>> df
>      a
> 0  1.0
> 1  NaN
> >>> pa.Table.from_pandas(df).column(0)
> <Column name='a' type=DataType(double)>
> chunk 0: <pyarrow.lib.DoubleArray object at 0x104bbf958>
> [
>   1.0,
>   NA
> ]
> {code}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
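
To make the distinction in the comment above concrete, here is a minimal pure-Python sketch of the semantics being proposed: NaN stays an ordinary floating-point value unless the caller opts into pandas-style handling. The helper {{to_double_column}} and its (data, validity) return shape are hypothetical and exist only for illustration; this is not pyarrow's API or implementation, just one way the rule could look.

```python
import math


def to_double_column(values, from_pandas=False):
    """Hypothetical helper (not part of pyarrow).

    Returns (data, validity), where validity[i] is False for a null slot.
    None is always null; NaN is null only when from_pandas=True.
    """
    data = []
    validity = []
    for v in values:
        if v is None:
            is_null = True
        elif from_pandas and isinstance(v, float) and math.isnan(v):
            # pandas uses NaN as its missing-value sentinel
            is_null = True
        else:
            is_null = False
        # Null slots get a placeholder value; readers must consult validity
        data.append(0.0 if is_null else float(v))
        validity.append(not is_null)
    return data, validity


# Default: NaN is kept as a value, so both slots are valid
data, validity = to_double_column([1, math.nan])

# Opt-in pandas semantics: NaN becomes a null
data_pd, validity_pd = to_double_column([1, math.nan], from_pandas=True)
```

Under this sketch, the list and numpy-array inputs from the issue would behave the same way, because the decision depends only on the flag and not on the container type.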