rjzamora edited a comment on pull request #10991:
URL: https://github.com/apache/arrow/pull/10991#issuecomment-956288941


   Thanks for all the great work here @jorisvandenbossche!
   
   In order to use the Dataset API for `read_orc` in Dask, we will need an 
API to split file-level fragments into stripe-level fragments. For example, 
Parquet file fragments provide a `split_by_row_group` method.
   
   We also want to be able to select a subset of stripes from a file fragment 
to produce a new dataset fragment. For example, for Parquet datasets we can do 
`old_frag.format.make_fragment(..., row_groups=selected_row_group_indices)`.
   
   Does it make sense for me to raise separate JIRA issues for these features? 
Or is this functionality already available?

