Micah Williamson created ARROW-3728:
---------------------------------------

             Summary: Merging Parquet Files - Pandas Meta in Schema Mismatch
                 Key: ARROW-3728
                 URL: https://issues.apache.org/jira/browse/ARROW-3728
             Project: Apache Arrow
          Issue Type: Bug
          Components: Python
    Affects Versions: 0.11.1, 0.11.0, 0.10.0
         Environment: Python 3.6.3, OSX 10.14
            Reporter: Micah Williamson


From: https://stackoverflow.com/questions/53214288/merging-parquet-files-pandas-meta-in-schema-mismatch

I am trying to merge multiple parquet files into one. Their schemas are identical field-wise, but my {{ParquetWriter}} complains that they are not. After some investigation I found that the pandas metadata in the schemas differs, which causes this error.

Sample:

{code:python}
import pyarrow.parquet as pq

# files, MESS_DIR and COMPRESSED_FILE are defined elsewhere in the script
pq_tables = []
writer = None
for file_ in files:
    pq_table = pq.read_table(f'{MESS_DIR}/{file_}')
    pq_tables.append(pq_table)
    # Create the writer lazily, using the schema of the first table read
    if writer is None:
        writer = pq.ParquetWriter(COMPRESSED_FILE, schema=pq_table.schema, use_deprecated_int96_timestamps=True)
    writer.write_table(table=pq_table)
{code}

The error:

{code}
Traceback (most recent call last):
  File "{PATH_TO}/main.py", line 68, in lambda_handler
    writer.write_table(table=pq_table)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pyarrow/parquet.py", line 335, in write_table
    raise ValueError(msg)
ValueError: Table schema does not match schema used to create file:
{code}