Jacqueline Nolis created ARROW-10133:
----------------------------------------
             Summary: parquet Int64 col cast to float64 on load in pandas
                 Key: ARROW-10133
                 URL: https://issues.apache.org/jira/browse/ARROW-10133
             Project: Apache Arrow
          Issue Type: Bug
          Components: Python
    Affects Versions: 0.17.1
            Reporter: Jacqueline Nolis
         Attachments: example-failed-int64.parquet

Under certain conditions, a saved Parquet table that has an Int64 column whose values are all NA seems to be cast to float64 (all NaN) on load. The desired behavior is for the column to stay Int64.

Attached is a table where the issue occurs: the second column (cost) should be int64 but is loaded as float64 in pandas. Interestingly, R seems to interpret the column correctly as Int64, so perhaps it's only a pandas issue.

    import io

    import boto3
    import pandas as pd
    import pyarrow.parquet as pq

    # Bucket and key elided; the file is attached to this ticket
    obj = boto3.client('s3').get_object(Bucket="...", Key='...')
    x = pq.read_table(io.BytesIO(obj['Body'].read()))
    y = x.to_pandas()  # this is where the undesired int64 -> float64 cast occurs

    # >>> x
    # pyarrow.Table
    # product_id: string
    # cost: int64
    # name: string
    # >>> y.dtypes
    # product_id     object
    # cost          float64
    # name           object
    # dtype: object