David Lee created ARROW-3956:
--------------------------------

             Summary: [Python] ParquetWriter.write_table isn't working
                 Key: ARROW-3956
                 URL: https://issues.apache.org/jira/browse/ARROW-3956
             Project: Apache Arrow
          Issue Type: Bug
    Affects Versions: 0.11.1
            Reporter: David Lee

ParquetWriter.write_table raises a ValueError saying that the table schema doesn't match the file schema, even though the column names and types are identical. The only difference visible in the error output is the pandas metadata attached to the table schema.

Error:
{code:java}
>>> writer.write_table(arrow_table)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "../lib/python3.6/site-packages/pyarrow/parquet.py", line 374, in write_table
    raise ValueError(msg)
ValueError: Table schema does not match schema used to create file:
table:
col1: int64
col2: int64
metadata
--------
{b'pandas': b'{"index_columns": [], "column_indexes": [], "columns": [{"name":'
            b' "col1", "field_name": "col1", "pandas_type": "int64", "numpy_ty'
            b'pe": "int64", "metadata": null}, {"name": "col2", "field_name": '
            b'"col2", "pandas_type": "int64", "numpy_type": "int64", "metadata'
            b'": null}], "pandas_version": "0.23.4"}'} vs.
file:
col1: int64
col2: int64
{code}

Test Script:
{code:java}
import pyarrow as pa
import pyarrow.parquet as pq
import pandas as pd

d = {'col1': [1, 2], 'col2': [3, 4]}
df = pd.DataFrame(data=d)
arrow_table = pa.Table.from_pandas(df, preserve_index=False)
arrow_table

# Writing the whole table in one call works
pq.write_table(arrow_table, "test.parquet")

# Hand-built schema with the same column names and types
test_schema = pa.schema([
    pa.field('col1', pa.int64()),
    pa.field('col2', pa.int64())
])

# Writing through ParquetWriter raises the ValueError shown above
writer = pq.ParquetWriter("test2.parquet", use_dictionary=True, schema=test_schema, compression='snappy')
writer.write_table(arrow_table)
writer.close()
{code}

pq.write_table() works, but ParquetWriter.write_table() does not. I think something is wrong with the schema object.