[ https://issues.apache.org/jira/browse/ARROW-8813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17569154#comment-17569154 ]
Nigel McKernan edited comment on ARROW-8813 at 7/20/22 7:42 PM:
----------------------------------------------------------------

The change [~domiden] references landed in {{tidyr}} 1.1.0 back in May 2020, more than two years ago: as you can see [here|https://github.com/tidyverse/tidyr/releases#:~:text=pivot_longer()%20and%20pivot_wider()%20are%20now%20generic%20so%20implementations%0Acan%20be%20provided%20for%20objects%20other%20than%20data%20frames], {{pivot_longer()}} and {{pivot_wider()}} are now generics, so implementations can be provided for objects other than data frames. Would it now be possible to implement {{arrow}} methods for some of the {{tidyr}} functions that have been converted to generics?

EDIT: In addition, the {{nest()}} generic now [avoids computing on .data|https://github.com/tidyverse/tidyr/releases#:~:text=The%20nest()%20generic%20now%20avoids%20computing%20on%20.data%2C%20making%20it%20more%0Acompatible%20with%20lazy%20tibbles], which makes it more compatible with lazy tibbles and should make remote operations easier.

> [R] Implementing tidyr interface
> --------------------------------
>
> Key: ARROW-8813
> URL: https://issues.apache.org/jira/browse/ARROW-8813
> Project: Apache Arrow
> Issue Type: Improvement
> Components: R
> Reporter: Dominic Dennenmoser
> Priority: Major
> Labels: extension, feature, improvement
>
> I think it would be reasonable to implement an interface to the {{tidyr}}
> package. Such an interface would allow Arrow Tables to be processed lazily
> before they are pulled back into memory. Currently, however, you need to
> collect the table before applying {{tidyr}} methods.
> The following code chunk shows an example routine:
> {code:r}
> library(magrittr)
> # Read the file as an Arrow Table instead of an R data frame
> arrow_table <- arrow::read_feather("table.feather", as_data_frame = FALSE)
> nested_df <-
>   arrow_table %>%
>   # dplyr verbs on an Arrow Table are evaluated lazily by Arrow
>   dplyr::select(ID, 4:7, Value) %>%
>   dplyr::filter(Value >= 5) %>%
>   dplyr::group_by(ID) %>%
>   # tidyr verbs currently require pulling the result into R memory first
>   dplyr::collect() %>%
>   tidyr::nest()
> {code}
> The main focus might be the following three methods:
> * {{tidyr::[un]nest()}},
> * {{tidyr::pivot_[longer|wider]()}}, and
> * {{tidyr::separate()}}.
>
> I suppose the last two could be implemented fairly quickly, but
> {{tidyr::nest()}} and {{tidyr::unnest()}} cannot be implemented until
> conversion to List<Struct> is available.
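Because {{pivot_longer()}} has been an S3 generic since {{tidyr}} 1.1.0, a method can be registered for Arrow's table class and {{tidyr}} will dispatch to it. Below is a minimal sketch of that mechanism, assuming that Arrow Tables in the {{arrow}} R package inherit from the {{ArrowTabular}} class; the method {{pivot_longer.ArrowTabular}} is hypothetical and not part of {{arrow}}'s API. It only illustrates the dispatch: it still materialises the data in R memory, so it does not provide the lazy evaluation this issue asks for.

{code:r}
library(arrow)
library(tidyr)

# Hypothetical method, NOT part of arrow's API. Since tidyr::pivot_longer()
# is an S3 generic, defining pivot_longer.ArrowTabular makes tidyr dispatch
# Arrow Tables to this function.
pivot_longer.ArrowTabular <- function(data, ...) {
  # Naive fallback: convert the Arrow Table into an R data frame...
  df <- as.data.frame(data)
  # ...reshape it with tidyr's data-frame method...
  long_df <- tidyr::pivot_longer(df, ...)
  # ...and convert the result back into an Arrow Table.
  arrow::arrow_table(long_df)
}

# Usage: a small in-memory Arrow Table
tbl <- arrow::arrow_table(data.frame(ID = 1:2, a = c(5, 6), b = c(7, 8)))
long <- tidyr::pivot_longer(tbl, cols = c(a, b),
                            names_to = "name", values_to = "value")
{code}

A lazy implementation would instead need to express the reshape in Arrow's own operations, so that nothing is pulled into R until {{collect()}} is called.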