AudriusButkevicius commented on issue #41719:
URL: https://github.com/apache/arrow/issues/41719#issuecomment-2119338677
It seems you can rebuild the dataset from what `parquet_dataset` returned:
```python
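import pyarrow.dataset as ds
from pyarrow import fs

# `dataset` is what ds.parquet_dataset(...) returned and `pformat` is a
# ds.ParquetFileFormat; both are assumed to come from earlier in the thread.
filesystem = fs.LocalFileSystem()
remade_dataset = ds.FileSystemDataset(
    [
        pformat.make_fragment(
            fragment.path,
            filesystem,
            fragment.partition_expression,
            [rg.id for rg in fragment.row_groups],  # keep the same row groups
        )
        for fragment in dataset.get_fragments()
    ],
    dataset.schema,
    pformat,
)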
print(remade_dataset.to_table())
```
but I assume this re-fetches the metadata (instead of using it from the
`_metadata` file), defeating the purpose of having the `_metadata` file in the
first place.