Re: [PR] GH-47201: [Python][Parquet] Extending the schema and writing it back does not update Spark schema metadata [arrow]

via GitHub Fri, 17 Oct 2025 12:39:49 -0700


AlenkaF commented on PR #47253:
URL: https://github.com/apache/arrow/pull/47253#issuecomment-3416863223


   I had another look at it today. I also found a part of the documentation 
that should fit this use case:
   
https://arrow.apache.org/docs/python/parquet.html#writing-metadata-and-common-metadata-files
   
   Unfortunately I wasn't able to make it work with the example from the 
reported issue.
   
   I am still not comfortable adding Spark row group metadata changes to 
PyArrow, though I agree we should make such cases easier to handle in PyArrow.
   
   Would you be willing to also try the proposed `_metadata` or 
`_common_metadata` files from the linked documentation and see whether they 
are something we could use to potentially simplify this?
   
   @pitrou, what do you think about PyArrow updating Spark row group metadata 
in cases where PyArrow is used for data manipulation with schema changes 
between Spark workloads?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
