thisisnic edited a comment on issue #83: URL: https://github.com/apache/arrow-cookbook/issues/83#issuecomment-937927421
Thanks for opening this issue @GShotwell! I agree that this would be really useful, both a) to be able to do in arrow and b) to have a recipe for in this cookbook.

There's currently no optimised way of doing this in arrow, but I've opened a request for it to be implemented at the C++ level; once that is done, we can look at implementing it in R.

https://issues.apache.org/jira/browse/ARROW-14254

In the short term, the snippet below shows how you can sample rows from existing files without writing any new data to them. I don't think we want to add a recipe for this yet, as it's apparently a bit slow. I'd like to wait until we have a proper way of doing it that uses the underlying C++ functionality before adding it to the cookbook.

```
library(arrow)
library(dplyr)

tf <- tempfile()
dir.create(tf)

# I've added the grouping just so it's written as a partitioned dataset
iris %>%
  group_by(Species) %>%
  write_dataset(tf)

iris_dataset <- open_dataset(tf)

# sample 10 row indices; change 10 to however many rows are needed
rows <- sample(seq_len(nrow(iris_dataset)), 10)

# return the sampled rows as a data frame
collect(iris_dataset[rows, ])
```

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
