[ https://issues.apache.org/jira/browse/ARROW-16431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated ARROW-16431:
-----------------------------------
    Labels: pull-request-available  (was: )

> [C++][Parquet] Improve error message in append_row_groups() when appending disjoint metadata
> --------------------------------------------------------------------------------------------
>
>                 Key: ARROW-16431
>                 URL: https://issues.apache.org/jira/browse/ARROW-16431
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++, Parquet
>            Reporter: Michael Milton
>            Assignee: Miles Granger
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently if you try to append together metadata from row groups with different schemas (?), you get the following error:
> {code:java}
>   File "/home/mmilton/.conda/envs/mmilton/envs/driverpipe/lib/python3.9/site-packages/dask/dataframe/io/parquet/arrow.py", line 52, in _append_row_groups
>     metadata.append_row_groups(md)
>   File "pyarrow/_parquet.pyx", line 628, in pyarrow._parquet.FileMetaData.append_row_groups
>     self._metadata.AppendRowGroups(deref(c_metadata))
> RuntimeError: AppendRowGroups requires equal schemas.
> {code}
> What would be useful here is to actually pass the schema difference in the error object in terms of which columns disagree. This information should _also_ be in the error message.
> For example if it said:
> {code:java}
> RuntimeError: AppendRowGroups requires equal schemas. Column "foo" was previously an int32 but the latest row group is storing it as an int64
> {code}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
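
The request in the issue (carry the per-column disagreement on the error object and repeat it in the message) could be sketched roughly as follows. This is only an illustration, not the Arrow implementation: schemas are modeled as plain dicts mapping column name to type name, and `describe_schema_diff`, `SchemaMismatchError`, and `check_equal_schemas` are hypothetical names that do not exist in pyarrow or Arrow C++.

```python
from typing import Dict, List


def describe_schema_diff(previous: Dict[str, str], latest: Dict[str, str]) -> List[str]:
    """Return one readable line per column on which the two schemas disagree.

    Hypothetical helper: schemas are plain {column name: type name} dicts,
    not real Parquet schema objects.
    """
    differences = []
    for name in sorted(previous.keys() | latest.keys()):
        old, new = previous.get(name), latest.get(name)
        if old == new:
            continue
        if old is None:
            differences.append(f'Column "{name}" is missing from the previous row groups')
        elif new is None:
            differences.append(f'Column "{name}" is missing from the latest row group')
        else:
            differences.append(
                f'Column "{name}" was previously {old} '
                f"but the latest row group is storing it as {new}"
            )
    return differences


class SchemaMismatchError(RuntimeError):
    """Hypothetical error type that keeps the differences as structured data."""

    def __init__(self, differences: List[str]):
        self.differences = differences
        # The same details are also placed in the human-readable message.
        super().__init__(
            "AppendRowGroups requires equal schemas. " + "; ".join(differences)
        )


def check_equal_schemas(previous: Dict[str, str], latest: Dict[str, str]) -> None:
    """Raise SchemaMismatchError if the two schema dicts differ; otherwise do nothing."""
    differences = describe_schema_diff(previous, latest)
    if differences:
        raise SchemaMismatchError(differences)
```

With something along these lines, a caller could catch the error and inspect `err.differences` programmatically, while the traceback alone would still name the offending column.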