[ 
https://issues.apache.org/jira/browse/ARROW-8813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17569154#comment-17569154
 ] 

Nigel McKernan commented on ARROW-8813:
---------------------------------------

The issue [~domiden] references landed in {{tidyr}} 1.1.0 back in May 2020, 
more than 2 years ago, as you can see 
[here|https://github.com/tidyverse/tidyr/releases#:~:text=pivot_longer()%20and%20pivot_wider()%20are%20now%20generic%20so%20implementations%0Acan%20be%20provided%20for%20objects%20other%20than%20data%20frames].

 

Would it now be possible to implement {{arrow}} methods for the {{tidyr}} 
functions that have been converted to generics?

> [R] Implementing tidyr interface
> --------------------------------
>
>                 Key: ARROW-8813
>                 URL: https://issues.apache.org/jira/browse/ARROW-8813
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: R
>            Reporter: Dominic Dennenmoser
>            Priority: Major
>              Labels: extension, feature, improvement
>
> I think it would be reasonable to implement an interface to the {{tidyr}} 
> package. Such an implementation would allow Arrow Tables to be processed 
> lazily before they are pulled into memory. Currently, however, you need to 
> collect the table first before applying tidyr methods. The following code 
> chunk shows an example routine:
> {code:r}
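> library(magrittr)
> # Read the Feather file as an Arrow Table instead of a data frame
> arrow_table <- arrow::read_feather("table.feather", as_data_frame = FALSE)
> nested_df <-
>    arrow_table %>%
>    dplyr::select(ID, 4:7, Value) %>%
>    dplyr::filter(Value >= 5) %>%
>    dplyr::group_by(ID) %>%
>    # collect() pulls the result into R memory; nest() then runs on a tibble
>    dplyr::collect() %>%
>    tidyr::nest(){code}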
> The main focus might be the following three methods:
>  * {{tidyr::[un]nest()}},
>  * {{tidyr::pivot_[longer|wider]()}}, and
>  * {{tidyr::separate()}}.
> I suppose the last two can be implemented fairly quickly, but 
> {{tidyr::nest()}} and {{tidyr::unnest()}} cannot be implemented until 
> conversion to List<Struct> is available.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
