AudriusButkevicius commented on issue #41719:
URL: https://github.com/apache/arrow/issues/41719#issuecomment-2119338677

   It seems you can rebuild the dataset from what `parquet_dataset` returned:
   ```python
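   import pyarrow.dataset as ds
   from pyarrow import fs

   # `dataset` is what ds.parquet_dataset(...) returned; `pformat` is the Parquet
   # file format object. Both are assumed to be defined earlier.
   filesystem = fs.LocalFileSystem()
   remade_dataset = ds.FileSystemDataset(
       [
           # Recreate each fragment with the same partition expression and row group ids.
           pformat.make_fragment(
               fragment.path,
               filesystem,
               fragment.partition_expression,
               [rg.id for rg in fragment.row_groups],
           )
           for fragment in dataset.get_fragments()
       ],
       dataset.schema,
       pformat,
   )
   print(remade_dataset.to_table())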
   ```
   
   But I assume this re-fetches the metadata (instead of using what is in the
   `_metadata` file), which defeats the purpose of having the `_metadata` file
   in the first place.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

