Hi everybody,
There is an issue with pyarrow (version 4.0.0) when writing a Parquet file
through the pandas pyarrow engine. Writing a pandas DataFrame to Parquet fails
when it includes non-str columns (datetime64, float64, int64, ...).
Example:
import pandas as pd
df = pd.DataFrame({'A': [1, 2, 3], 'B': ['a', 'b', 'c']})
df.to_parquet('example.parquet', engine='pyarrow')  # Not working, raises:
# ArrowTypeError: ('Did not pass numpy.dtype object', 'Conversion failed for
# column InternalId with type float64')
# (Note: InternalId is not a column of this minimal example.)
df['A'] = df['A'].astype(str)  # Workaround: cast the numeric column to str
df.to_parquet('example.parquet', engine='pyarrow')  # Working
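To help others reproduce this, it may be useful to report the installed versions of the packages involved. The following is a minimal sketch using only the standard library. The helper name installed_versions is hypothetical, and the choice of packages to check (pandas, pyarrow, numpy) is an assumption about which versions matter here, not a confirmed diagnosis.

```python
from importlib import metadata


def installed_versions(packages=("pandas", "pyarrow", "numpy")):
    """Return {package: version string, or None if it is not installed}.

    Hypothetical helper for illustration; the default package list is an
    assumption about what is relevant to the error reported above.
    """
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not installed in this environment.
            versions[name] = None
    return versions


# Print one line per package so the output can be pasted into a report.
for name, version in installed_versions().items():
    print(f"{name}: {version or 'not installed'}")
```

Including this output alongside the failing example would let readers check whether the same combination of versions fails for them.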
Best!
Jorge Alarcon
Senior Data Analytics Specialist
Mail: [email protected]
Telf: +34 683541389
28020 Madrid