AlenkaF commented on issue #34212: URL: https://github.com/apache/arrow/issues/34212#issuecomment-4143443039
Thank you for the ping @thisisnic, this issue slipped under our radar.

I think adding a `partition` property to `FileFragment` would be a nice improvement. For the time being, there is a possible workaround. `ParquetFileFragment` inherits the [repr method from `FileFragment`](https://github.com/apache/arrow/blob/2a526c1e623df0ce8b9b76b03d4d9c617d21fda1/python/pyarrow/_dataset.pyx#L1982-L1995), and that method uses [`get_partition_keys`](https://github.com/apache/arrow/blob/80db1020881f461af3a300653cfd2333ef10a45e/python/pyarrow/_dataset.pyx#L4043). One could do the same with `ParquetFileFragment.partition_expression`, which is inherited from the `Fragment` class: https://github.com/apache/arrow/blob/2a526c1e623df0ce8b9b76b03d4d9c617d21fda1/python/pyarrow/_dataset.pyx#L1480-L1484

This would look similar to:

```python
get_partition_keys(ParquetFileFragment.partition_expression)
```

As for the ordering, it looks like it is non-deterministic (the test checks both possible orderings): https://github.com/apache/arrow/blob/2a526c1e623df0ce8b9b76b03d4d9c617d21fda1/python/pyarrow/tests/test_dataset.py#L1934-L1940

Summing up, things that could be done here:

- add a `partition` property
- better document `get_partition_keys` (add info to the User Guide, not only the API docs)
- maybe document the non-deterministic ordering in the repr
