rjzamora commented on pull request #10991: URL: https://github.com/apache/arrow/pull/10991#issuecomment-956288941
Thanks for all the great work here @jorisvandenbossche!

In order to use the Dataset API for `read_orc` in Dask, we will need an API to split file-level fragments into stripe-level fragments. For example, Parquet file fragments have a `split_by_row_group` method. We also want to be able to select a subset of stripes from a file fragment to produce a new dataset fragment. For example, for Parquet datasets we can do `old_frag.format.make_fragment(..., row_groups=selected_row_group_indices)`.

Does it make sense for me to raise separate Jira issues for these features? Or is this functionality already available?
