Hello! Apologies if this has been brought up before. I'd like to get the devs' thoughts on a potential inconsistency between pandas and pyarrow in which Python object represents a null value.
This is demonstrated by the following example:

(1) pandas seems to use "np.NaN" to represent a missing value (with pandas 1.2.4):

    In [32]: df
    Out[32]:
               value
    key
    1    some_strign

    In [33]: df2
    Out[33]:
                    value2
    key
    2    some_other_string

    In [34]: df.join(df2)
    Out[34]:
               value  value2
    key
    1    some_strign     NaN

(2) pyarrow seems to use "None" to represent a missing value (with pyarrow 4.0.1):

    >>> s = pd.Series(["some_string", np.NaN])
    >>> s
    0    some_string
    1            NaN
    dtype: object
    >>> pa.Array.from_pandas(s).to_pandas()
    0    some_string
    1           None
    dtype: object

I have looked through the pyarrow docs and didn't find an option to make to_pandas use np.NaN for null values, so it's a bit hard to get round-trip consistency.

I'd appreciate any thoughts on how to achieve consistency here.

Thanks!
Li