[ https://issues.apache.org/jira/browse/ARROW-12762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Joris Van den Bossche updated ARROW-12762:
------------------------------------------
    Summary: [Python] ListType doesn't preserve field name after pickle and unpickle  (was: [Python] pyarrow.lib.Schema equality fails after pickle and unpickle)

> [Python] ListType doesn't preserve field name after pickle and unpickle
> -----------------------------------------------------------------------
>
>                 Key: ARROW-12762
>                 URL: https://issues.apache.org/jira/browse/ARROW-12762
>             Project: Apache Arrow
>          Issue Type: Bug
>    Affects Versions: 4.0.0
>            Reporter: Juan Galvez
>            Priority: Major
>
> Here is a small reproducer:
> {code:python}
> import pandas as pd
> from pyspark.sql import SparkSession
> import pyarrow.parquet as pq
> import pickle
> df = pd.DataFrame(
>     {
>         "A": [
>             ["aa", "bb "],
>             ["c"],
>             ["d", "ee", "", "f"],
>             ["ggg", "H"],
>             [""],
>         ]
>     }
> )
> spark = SparkSession.builder.appName("GenSparkData").getOrCreate()
> spark_df = spark.createDataFrame(df)
> spark_df.write.parquet("list_str.pq", "overwrite")
> ds = pq.ParquetDataset("list_str.pq")
> assert pickle.loads(pickle.dumps(ds.schema)) == ds.schema  # PASSES
> assert pickle.loads(pickle.dumps(ds.schema.to_arrow_schema())) == ds.schema.to_arrow_schema()  # FAILS
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
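
The updated summary says the list type's child field name is lost in a pickle round trip. The sketch below models that kind of loss using only the standard library. It is not pyarrow's implementation: `Field`, `ToyListType` and `make_list` are hypothetical stand-ins, and the assumption that the pickling hook forwards only the value type (so the name falls back to a default of "item") is a guess at the mechanism, not something stated in the issue. The non-default name "element" is likewise an assumed example of what a Spark-written Parquet file might carry.

```python
import pickle


class Field:
    """Hypothetical stand-in for a named child field (not pyarrow.Field)."""

    def __init__(self, name, type_):
        self.name = name
        self.type = type_

    def __eq__(self, other):
        return (
            isinstance(other, Field)
            and (self.name, self.type) == (other.name, other.type)
        )


def make_list(value_type):
    """Hypothetical factory: builds a list type with the default child name."""
    return ToyListType(Field("item", value_type))


class ToyListType:
    """Toy list type whose pickling support forwards only the value type."""

    def __init__(self, value_field):
        self.value_field = value_field

    def __eq__(self, other):
        return (
            isinstance(other, ToyListType)
            and self.value_field == other.value_field
        )

    def __reduce__(self):
        # Assumed bug pattern: the child field's name is not passed along,
        # so unpickling rebuilds the type with the default name "item".
        return make_list, (self.value_field.type,)


# A list type whose child field has a non-default name.
original = ToyListType(Field("element", "string"))
restored = pickle.loads(pickle.dumps(original))

print(original.value_field.name)  # element
print(restored.value_field.name)  # item
print(restored == original)       # False
```

In this toy model, passing the whole `value_field` through `__reduce__` instead of only its type would make the round trip preserve the name and the equality check pass.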