rjzamora edited a comment on pull request #10991: URL: https://github.com/apache/arrow/pull/10991#issuecomment-956288941
Thanks for all the great work here @jorisvandenbossche!

To use the Dataset API for `read_orc` in Dask, we will need an API to split file-level fragments into stripe-level fragments. For Parquet datasets, for example, there is a `split_by_row_group` method.

We also want to be able to select a subset of stripes from a file fragment to produce a new dataset fragment. For Parquet datasets, for example, we can do `old_frag.format.make_fragment(..., row_groups=selected_row_group_indices)`.

Does it make sense for me to raise separate Jira issues for these features? Or is this functionality already available?