[ https://issues.apache.org/jira/browse/ARROW-7002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16960902#comment-16960902 ]
Joris Van den Bossche edited comment on ARROW-7002 at 10/28/19 10:18 AM:
-------------------------------------------------------------------------

Writing is already supported with pandas master and the latest Arrow release (0.15), so it is waiting on the next pandas release to be available in a stable version.

{code}
In [1]: from pyarrow import feather
   ...: import pandas as pd
   ...:
   ...: col1 = pd.Series([0, None, 1, 23]).astype('Int64')
   ...: col2 = pd.Series([1, 3, 2, 1]).astype('Int64')
   ...:
   ...: df = pd.DataFrame({'a': col1, 'b': col2})
   ...:
   ...: feather.write_feather(df, '/tmp/foo')
   ...:

In [2]: pd.read_feather('/tmp/foo')
Out[2]:
      a  b
0   0.0  1
1   NaN  3
2   1.0  2
3  23.0  1
{code}

So converting to R should work properly, since column a is now written as an Arrow integer column containing a null instead of raising an error. Reading it back in with Python, however, will still give you a float array if there are missing values, as that is the default conversion of an Arrow integer column with nulls to pandas. There is work going on to also preserve those specific pandas types in that case (see ARROW-2428).
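To make the round trip above a bit more concrete, here is a small pure-Python model. It is only a sketch: the helper names are hypothetical and are NOT part of the pandas or pyarrow API. It assumes pandas' Int64 column is stored as integer data plus a boolean mask (True = missing), and that Arrow stores the same column as integer values plus a validity bitmap (bit set = valid, least-significant bit first). The last helper mimics why column a comes back as floats while column b stays integer.

```python
import math

# Hypothetical model for illustration only: these helpers are NOT part of the
# pandas or pyarrow API. They mimic, in plain Python, how a nullable integer
# column is represented on each side of the conversion.


def mask_to_validity_bitmap(mask):
    """Turn a pandas-style mask (True = missing) into an Arrow-style
    validity bitmap (bit set = valid), least-significant bit first."""
    bitmap = bytearray((len(mask) + 7) // 8)
    for i, missing in enumerate(mask):
        if not missing:
            bitmap[i // 8] |= 1 << (i % 8)
    return bytes(bitmap)


def is_valid(bitmap, i):
    """Return True if slot i is marked valid (non-null) in the bitmap."""
    return bool((bitmap[i // 8] >> (i % 8)) & 1)


def default_to_pandas(values, bitmap):
    """Mimic the default Arrow integer -> pandas conversion: NumPy int64
    has no missing-value marker, so a column with any null becomes float
    with NaN in the null slots; a column without nulls stays integer."""
    if all(is_valid(bitmap, i) for i in range(len(values))):
        return [int(v) for v in values]
    return [float(v) if is_valid(bitmap, i) else math.nan
            for i, v in enumerate(values)]


# Column 'a' from the example, [0, None, 1, 23] as Int64. The data slot
# under the missing value is arbitrary; 0 is used here.
a_bitmap = mask_to_validity_bitmap([False, True, False, False])
print(bin(a_bitmap[0]))                            # 0b1101
print(default_to_pandas([0, 0, 1, 23], a_bitmap))  # [0.0, nan, 1.0, 23.0]

# Column 'b' has no missing values, so it stays integer.
b_bitmap = mask_to_validity_bitmap([False] * 4)
print(default_to_pandas([1, 3, 2, 1], b_bitmap))   # [1, 3, 2, 1]
```

This mirrors the `pd.read_feather` output above: a comes back as floats with NaN, b keeps its integer values.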
> Support pandas nullable integer type Int64
> ------------------------------------------
>
>                 Key: ARROW-7002
>                 URL: https://issues.apache.org/jira/browse/ARROW-7002
>             Project: Apache Arrow
>          Issue Type: New Feature
>            Reporter: Christian Roth
>            Priority: Major
>
> Pandas has a nullable integer type Int64 which does not seem to be supported
> by feather yet.
> {code:python}
> from pyarrow import feather
> import pandas as pd
> col1 = pd.Series([0, None, 1, 23]).astype('Int64')
> col2 = pd.Series([1, 3, 2, 1]).astype('Int64')
> df = pd.DataFrame({'a': col1, 'b': col2})
> feather.write_feather(df, '/tmp/foo')
> {code}
> This gives the following error message:
> {code:java}
> ---------------------------------------------------------------------------
> ArrowTypeError                            Traceback (most recent call last)
> <ipython-input-107-8cc611a30355> in <module>
> ----> 1 feather.write_feather(df, '/tmp/foo')
>
> ~/miniconda3/envs/sci36/lib/python3.6/site-packages/pyarrow/feather.py in write_feather(df, dest)
>     181     writer = FeatherWriter(dest)
>     182     try:
> --> 183         writer.write(df)
>     184     except Exception:
>     185         # Try to make sure the resource is closed
>
> ~/miniconda3/envs/sci36/lib/python3.6/site-packages/pyarrow/feather.py in write(self, df)
>      92         # TODO(wesm): Remove this length check, see ARROW-1732
>      93         if len(df.columns) > 0:
> ---> 94             table = Table.from_pandas(df, preserve_index=False)
>      95             for i, name in enumerate(table.schema.names):
>      96                 col = table[i]
>
> ~/miniconda3/envs/sci36/lib/python3.6/site-packages/pyarrow/table.pxi in pyarrow.lib.Table.from_pandas()
>
> ~/miniconda3/envs/sci36/lib/python3.6/site-packages/pyarrow/pandas_compat.py in dataframe_to_arrays(df, schema, preserve_index, nthreads, columns, safe)
>     551     if nthreads == 1:
>     552         arrays = [convert_column(c, f)
> --> 553                   for c, f in zip(columns_to_convert, convert_fields)]
>     554     else:
>     555         from concurrent import futures
>
> ~/miniconda3/envs/sci36/lib/python3.6/site-packages/pyarrow/pandas_compat.py in <listcomp>(.0)
>     551     if nthreads == 1:
>     552         arrays = [convert_column(c, f)
> --> 553                   for c, f in zip(columns_to_convert, convert_fields)]
>     554     else:
>     555         from concurrent import futures
>
> ~/miniconda3/envs/sci36/lib/python3.6/site-packages/pyarrow/pandas_compat.py in convert_column(col, field)
>     542             e.args += ("Conversion failed for column {0!s} with type {1!s}"
>     543                        .format(col.name, col.dtype),)
> --> 544             raise e
>     545         if not field_nullable and result.null_count > 0:
>     546             raise ValueError("Field {} was non-nullable but pandas column "
>
> ~/miniconda3/envs/sci36/lib/python3.6/site-packages/pyarrow/pandas_compat.py in convert_column(col, field)
>     536
>     537         try:
> --> 538             result = pa.array(col, type=type_, from_pandas=True, safe=safe)
>     539         except (pa.ArrowInvalid,
>     540                 pa.ArrowNotImplementedError,
>
> ArrowTypeError: ('Did not pass numpy.dtype object', 'Conversion failed for column a with type Int64')
> {code}
> xref: [https://stackoverflow.com/questions/58571419/exporting-dataframe-with-null-able-int64-from-pandas-to-r]

--
This message was sent by Atlassian Jira
(v8.3.4#803005)