Micah Williamson created ARROW-3728:
---------------------------------------

             Summary: Merging Parquet Files - Pandas Meta in Schema Mismatch
                 Key: ARROW-3728
                 URL: https://issues.apache.org/jira/browse/ARROW-3728
             Project: Apache Arrow
          Issue Type: Bug
          Components: Python
    Affects Versions: 0.11.1, 0.11.0, 0.10.0
         Environment: Python 3.6.3
OSX 10.14
            Reporter: Micah Williamson


From: 
https://stackoverflow.com/questions/53214288/merging-parquet-files-pandas-meta-in-schema-mismatch
 
I am trying to merge multiple Parquet files into one. Their schemas are 
identical field-wise, but my {{ParquetWriter}} complains that they are not. 
After some investigation I found that the pandas metadata stored in the 
schemas differs between files, which causes this error.
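
One way to see which part of the pandas metadata differs is to compare the 
JSON stored under the {{b'pandas'}} key of each schema's key/value metadata. 
The helper below is only a sketch: the name {{diff_pandas_metadata}} is 
hypothetical, and it assumes the metadata has the dict-of-bytes shape that 
{{pq_table.schema.metadata}} returns (or {{None}} when there is none).

```python
import json


def diff_pandas_metadata(meta_a, meta_b):
    """Return {key: (value_a, value_b)} for top-level pandas metadata keys that differ.

    Each argument is a schema's key/value metadata: a dict mapping bytes to
    bytes, or None. Only the JSON document under b"pandas" is inspected.
    """
    def decode(meta):
        raw = (meta or {}).get(b"pandas")
        # A missing or empty entry is treated as an empty JSON object.
        return json.loads(raw.decode("utf-8")) if raw else {}

    a, b = decode(meta_a), decode(meta_b)
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(set(a) | set(b))
        if a.get(key) != b.get(key)
    }
```

For two tables read from the files above, 
{{diff_pandas_metadata(first.schema.metadata, second.schema.metadata)}} should 
return an empty dict when the pandas metadata matches, and otherwise name the 
top-level entries that differ.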
 
Sample:

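{code:python}
import pyarrow.parquet as pq

# files, MESS_DIR and COMPRESSED_FILE are assumed to be defined earlier
pq_tables = []
writer = None
for file_ in files:
    pq_table = pq.read_table(f'{MESS_DIR}/{file_}')
    pq_tables.append(pq_table)
    if writer is None:
        # The first file's schema (pandas metadata included) becomes the writer's schema
        writer = pq.ParquetWriter(COMPRESSED_FILE, schema=pq_table.schema,
                                  use_deprecated_int96_timestamps=True)
    # Raises ValueError as soon as a later table's schema differs from the writer's
    writer.write_table(table=pq_table)
if writer is not None:
    writer.close()
{code}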

The error:

{code}
Traceback (most recent call last):
  File "{PATH_TO}/main.py", line 68, in lambda_handler
    writer.write_table(table=pq_table)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pyarrow/parquet.py", line 335, in write_table
    raise ValueError(msg)
ValueError: Table schema does not match schema used to create file:
{code}
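
A possible workaround, sketched under assumptions rather than confirmed as 
the intended fix: give every table the first table's key/value metadata 
before writing, so that only the fields take part in the schema comparison. 
The sketch assumes {{Table.replace_schema_metadata}} is available in the 
installed pyarrow version, that the fields really are identical, and that 
discarding each file's own pandas metadata is acceptable. {{write_all}} and 
{{make_writer}} are hypothetical names.

```python
def write_all(tables, make_writer):
    """Write every table through a single writer using one shared schema metadata.

    `tables` is an iterable of pyarrow.Table objects. `make_writer` is a
    hypothetical callable that builds a ParquetWriter from a schema.
    """
    writer = None
    reference_metadata = None
    try:
        for table in tables:
            if writer is None:
                # The first table decides the writer's schema and metadata.
                reference_metadata = table.schema.metadata
                writer = make_writer(table.schema)
            # Swap this table's metadata (including b"pandas") for the first
            # table's, so a metadata difference no longer fails the check.
            writer.write_table(table.replace_schema_metadata(reference_metadata))
    finally:
        if writer is not None:
            writer.close()
    return writer
```

With the names from the sample, this could be called as 
{{write_all(pq_tables, lambda schema: pq.ParquetWriter(COMPRESSED_FILE, schema=schema, use_deprecated_int96_timestamps=True))}}. 
A genuine field-level mismatch would still raise the same {{ValueError}}.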




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
