[ https://issues.apache.org/jira/browse/ARROW-8813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17569154#comment-17569154 ]
Nigel McKernan commented on ARROW-8813:
---------------------------------------

The issue [~domiden] references was included in {{tidyr}} 1.1.0, released in May 2020, more than two years ago, as you can see [here|https://github.com/tidyverse/tidyr/releases#:~:text=pivot_longer()%20and%20pivot_wider()%20are%20now%20generic%20so%20implementations%0Acan%20be%20provided%20for%20objects%20other%20than%20data%20frames].

Would it now be possible to implement {{arrow}} methods for the {{tidyr}} functions that have been converted to generics?

> [R] Implementing tidyr interface
> --------------------------------
>
>                 Key: ARROW-8813
>                 URL: https://issues.apache.org/jira/browse/ARROW-8813
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: R
>            Reporter: Dominic Dennenmoser
>            Priority: Major
>              Labels: extension, feature, improvement
>
> I think it would be reasonable to implement an interface to the {{tidyr}}
> package. Such an interface would allow Arrow Tables to be processed lazily
> before they are pulled into memory. Currently, however, you need to collect
> the table first before applying {{tidyr}} methods. The following code chunk
> shows an example routine:
> {code:r}
> library(magrittr)
>
> # Open the Feather file as an Arrow Table rather than an R data frame
> arrow_table <- arrow::read_feather("table.feather", as_data_frame = FALSE)
>
> nested_df <-
>   arrow_table %>%
>   dplyr::select(ID, 4:7, Value) %>%
>   dplyr::filter(Value >= 5) %>%
>   dplyr::group_by(ID) %>%
>   # tidyr verbs do not work on Arrow objects, so materialize a data frame first
>   dplyr::collect() %>%
>   tidyr::nest()
> {code}
> The main focus might be the following three methods:
> * {{tidyr::[un]nest()}},
> * {{tidyr::pivot_[longer|wider]()}}, and
> * {{tidyr::separate()}}.
>
> I suppose the last two could be implemented fairly quickly, but
> {{tidyr::nest()}} and {{tidyr::unnest()}} cannot be implemented until
> conversion to List<Struct> is available.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)