Matthew Gilbert created ARROW-2135:
--------------------------------------

             Summary: from_pandas improperly casting NaNs
                 Key: ARROW-2135
                 URL: https://issues.apache.org/jira/browse/ARROW-2135
             Project: Apache Arrow
          Issue Type: Bug
          Components: Python
    Affects Versions: 0.8.0
            Reporter: Matthew Gilbert

If you create a {{Table}} from a {{DataFrame}} whose integer column contains a NaN, the NaN is improperly cast. pandas stores such a column as floats, and when the frame is converted to a table with an integer schema, the NaN is interpreted as an integer instead of a null. This seems like a bug, since a known limitation in pandas (the inability to hold null-valued integer data) is taking precedence over Arrow's ability to store these values as an IntArray with nulls.

{code}
import numpy as np
import pandas as pd
import pyarrow as pa

# pandas upcasts the column to float64 because of the NaN
df = pd.DataFrame({"a": [1, 2, np.nan]})
# request a nullable int64 column on the Arrow side
schema = pa.schema([pa.field("a", pa.int64(), nullable=True)])
table = pa.Table.from_pandas(df, schema=schema)
table[0]

<pyarrow.lib.Column object at 0x7f2151d19c90>
chunk 0: <pyarrow.lib.Int64Array object at 0x7f213bf356d8>
[
  1,
  2,
  -9223372036854775808
]
{code}
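
To make the expected behaviour concrete, here is a minimal pure-Python sketch of the conversion the report asks for: NaN slots become nulls, tracked in a separate validity list, rather than being cast to an integer. The helper {{floats_to_nullable_int64}} is hypothetical and is not part of pyarrow. The (data, validity) pair is only a simplified stand-in for an integer array with nulls, not Arrow's actual memory layout. The value in the output above equals -2**63, the smallest int64.

```python
import math

# Smallest int64 value; the same number that appears in the reported output
# in place of the NaN.
INT64_MIN = -2**63
INT64_MAX = 2**63 - 1


def floats_to_nullable_int64(values):
    """Split float values into (data, validity), treating NaN as null.

    Hypothetical helper for illustration only; it is not a pyarrow API.
    """
    data, validity = [], []
    for v in values:
        if math.isnan(v):
            # Null slot: store a placeholder and mark the slot as invalid.
            data.append(0)
            validity.append(False)
            continue
        # Reject infinities, fractional values, and out-of-range values
        # instead of silently casting them.
        if math.isinf(v) or v != int(v) or not INT64_MIN <= int(v) <= INT64_MAX:
            raise ValueError(f"{v!r} is not representable as int64")
        data.append(int(v))
        validity.append(True)
    return data, validity


data, validity = floats_to_nullable_int64([1.0, 2.0, float("nan")])
print(data)      # [1, 2, 0]
print(validity)  # [True, True, False]
```

Under this sketch the third slot would be reported as null, and the placeholder stored in {{data}} would never be exposed as a value.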