[ 
https://issues.apache.org/jira/browse/ARROW-13766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17461577#comment-17461577
 ] 

Dewey Dunnington commented on ARROW-13766:
------------------------------------------

Without ties this isn't bad:

{code:R}
library(arrow, warn.conflicts = FALSE)
library(dplyr, warn.conflicts = FALSE)

df <- tibble(a = rep(letters, 10), b = 1:260, c = 260:1)

# slice_*() without ties is easier
record_batch(df) %>% 
  arrange(c) %>% head(5) %>%
  collect()
#> # A tibble: 5 × 3
#>   a         b     c
#>   <chr> <int> <int>
#> 1 z       260     1
#> 2 y       259     2
#> 3 x       258     3
#> 4 w       257     4
#> 5 v       256     5

record_batch(df) %>% 
  arrange(desc(c)) %>% head(5) %>%
  collect()
#> # A tibble: 5 × 3
#>   a         b     c
#>   <chr> <int> <int>
#> 1 a         1   260
#> 2 b         2   259
#> 3 c         3   258
#> 4 d         4   257
#> 5 e         5   256
{code}

With ties it isn't too bad either (it just needs a join):

{code:R}
library(arrow, warn.conflicts = FALSE)
library(dplyr, warn.conflicts = FALSE)

df <- tibble(a = rep(letters, 10), b = 1:260, c = 260:1)

# slice_*() with ties needs a join
rb <- record_batch(df)
rb %>%
  arrange(a) %>%
  select(a) %>%
  head(5) %>%
  distinct() %>%
  left_join(rb) %>%
  collect()
#> # A tibble: 10 × 3
#>    a         b     c
#>    <chr> <int> <int>
#>  1 a         1   260
#>  2 a        27   234
#>  3 a        53   208
#>  4 a        79   182
#>  5 a       105   156
#>  6 a       131   130
#>  7 a       157   104
#>  8 a       183    78
#>  9 a       209    52
#> 10 a       235    26
{code}
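
For comparison, here is a minimal sketch of the in-memory dplyr behaviour that the two workarounds above approximate. It assumes dplyr >= 1.0.0 and runs on a plain tibble rather than an Arrow object, so it only illustrates the target semantics of {{slice_min()}} and {{slice_max()}}, not how Arrow methods would be implemented. The result names ({{min_c}}, {{max_c}}, {{min_a}}) are illustrative only.

```r
library(dplyr, warn.conflicts = FALSE)

# Same example data as above, kept as a plain in-memory tibble
df <- tibble(a = rep(letters, 10), b = 1:260, c = 260:1)

# No ties in `c`: same rows as arrange(c) %>% head(5)
min_c <- df %>% slice_min(c, n = 5, with_ties = FALSE)

# No ties in `c`: same rows as arrange(desc(c)) %>% head(5)
max_c <- df %>% slice_max(c, n = 5, with_ties = FALSE)

# Ties in `a`: with_ties = TRUE (the default) keeps every row tied with the
# selected values, so all 10 rows where a == "a" are returned, not just 5
min_a <- df %>% slice_min(a, n = 5, with_ties = TRUE)
```

The third call is the case the join emulates: the five smallest values of {{a}} are all "a", so every row with that value is kept.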



> [R] Add Arrow methods slice_min(), slice_max()
> ----------------------------------------------
>
>                 Key: ARROW-13766
>                 URL: https://issues.apache.org/jira/browse/ARROW-13766
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: R
>            Reporter: Ian Cook
>            Priority: Major
>              Labels: query-engine
>             Fix For: 7.0.0
>
>
> Implement [{{slice_min()}} and 
> {{slice_max()}}|https://dplyr.tidyverse.org/reference/slice.html] methods for 
> {{ArrowTabular}}, {{Dataset}}, and {{arrow_dplyr_query}} objects.
> These dplyr functions supersede the older dplyr function 
> [{{top_n()}}|https://dplyr.tidyverse.org/reference/top_n.html] which I 
> suppose we should also consider implementing a method for.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
