AlenkaF commented on PR #47253: URL: https://github.com/apache/arrow/pull/47253#issuecomment-3416863223
I had another look at this today. I also found a part of the documentation that should fit this use case: https://arrow.apache.org/docs/python/parquet.html#writing-metadata-and-common-metadata-files

Unfortunately, I wasn't able to make it work with the example from the reported issue. I am still not comfortable adding Spark row group metadata changes to PyArrow, though I agree we should make it easier to handle such cases in PyArrow.

Would you be willing to also try the `_metadata` or `_common_metadata` files proposed in the linked documentation, and see whether they are something we can use and potentially simplify?

@pitrou what do you think about PyArrow updating Spark row group metadata in cases where PyArrow is used for data manipulation, with schema changes, in between Spark workloads?
