[ https://issues.apache.org/jira/browse/ARROW-12693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Weston Pace updated ARROW-12693:
--------------------------------
    Summary: [R] Usage of computer function - Use case of unique function  (was: Usage of computer function - Use case of unique function)

> [R] Usage of computer function - Use case of unique function
> ------------------------------------------------------------
>
>                 Key: ARROW-12693
>                 URL: https://issues.apache.org/jira/browse/ARROW-12693
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: R
>            Reporter: Sam Albers
>            Priority: Minor
>
> I am trying to see if I can leverage `unique` on a Dataset object. Imagining
> a much bigger dataset, I am trying to get away from this expensive pattern:
> {code:java}
> Dataset %>%
>   pull(col) %>%
>   unique()
> {code}
> However, when I try the options below, they do not work quite how I'd expect.
> I'm actually not able to get any of them working (e.g. `arrow_mean`), so maybe
> I am misunderstanding how these are meant to work.
> {code:java}
> library(arrow, warn.conflicts = FALSE)
> library(dplyr, warn.conflicts = FALSE)
> dir.create("iris")
> iris %>%
>   group_by(Species) %>%
>   write_dataset("iris")
> ds <- open_dataset("iris")
> ds %>%
>   mutate(unique = arrow_unique(Species)) %>%
>   collect()
> #> Error: Invalid: ExecuteScalarExpression cannot Execute non-scalar expression unique("setosa")
> ds %>%
>   mutate(unique = arrow_unique(Petal.Width)) %>%
>   collect()
> #> Error: Invalid: ExecuteScalarExpression cannot Execute non-scalar expression {Sepal.Length=Sepal.Length, Sepal.Width=Sepal.Width, Petal.Length=Petal.Length, Petal.Width=Petal.Width, Species="setosa", unique=unique(Petal.Width)}
> call_function("unique", ds, "Species")
> #> Error: Argument 1 is of class FileSystemDataset but it must be one of "Array", "ChunkedArray", "RecordBatch", "Table", or "Scalar"
> call_function("unique", ds, "Petal.Width")
> #> Error: Argument 1 is of class FileSystemDataset but it must be one of "Array", "ChunkedArray", "RecordBatch", "Table", or "Scalar"
> call_function("mean", ds, "Petal.Width")
> #> Error: Argument 1 is of class FileSystemDataset but it must be one of "Array", "ChunkedArray", "RecordBatch", "Table", or "Scalar"
> sessioninfo::session_info()
> #> - Session info ---------------------------------------------------------------
> #>  setting  value
> #>  version  R version 4.0.5 (2021-03-31)
> #>  os       Windows 10 x64
> #>  system   x86_64, mingw32
> #>  ui       RTerm
> #>  language (EN)
> #>  collate  English_Canada.1252
> #>  ctype    English_Canada.1252
> #>  tz       America/Los_Angeles
> #>  date     2021-05-07
> #>
> #> - Packages -------------------------------------------------------------------
> #>  package     * version date       lib source
> #>  arrow       * 4.0.0   2021-04-27 [1] CRAN (R 4.0.5)
> #>  assertthat    0.2.1   2019-03-21 [1] CRAN (R 4.0.0)
> #>  backports     1.2.1   2020-12-09 [1] CRAN (R 4.0.3)
> #>  bit           4.0.4   2020-08-04 [1] CRAN (R 4.0.2)
> #>  bit64         4.0.5   2020-08-30 [1] CRAN (R 4.0.2)
> #>  cli           2.5.0   2021-04-26 [1] CRAN (R 4.0.5)
> #>  crayon        1.4.1   2021-02-08 [1] CRAN (R 4.0.3)
> #>  DBI           1.1.1   2021-01-15 [1] CRAN (R 4.0.3)
> #>  digest        0.6.27  2020-10-24 [1] CRAN (R 4.0.3)
> #>  dplyr       * 1.0.5   2021-03-05 [1] CRAN (R 4.0.5)
> #>  ellipsis      0.3.2   2021-04-29 [1] CRAN (R 4.0.5)
> #>  evaluate      0.14    2019-05-28 [1] CRAN (R 4.0.0)
> #>  fansi         0.4.2   2021-01-15 [1] CRAN (R 4.0.3)
> #>  fs            1.5.0   2020-07-31 [1] CRAN (R 4.0.2)
> #>  generics      0.1.0   2020-10-31 [1] CRAN (R 4.0.3)
> #>  glue          1.4.2   2020-08-27 [1] CRAN (R 4.0.2)
> #>  highr         0.9     2021-04-16 [1] CRAN (R 4.0.4)
> #>  htmltools     0.5.1.1 2021-01-22 [1] CRAN (R 4.0.3)
> #>  knitr         1.33    2021-04-24 [1] CRAN (R 4.0.5)
> #>  lifecycle     1.0.0   2021-02-15 [1] CRAN (R 4.0.4)
> #>  magrittr      2.0.1   2020-11-17 [1] CRAN (R 4.0.3)
> #>  pillar        1.6.0   2021-04-13 [1] CRAN (R 4.0.5)
> #>  pkgconfig     2.0.3   2019-09-22 [1] CRAN (R 4.0.0)
> #>  purrr         0.3.4   2020-04-17 [1] CRAN (R 4.0.0)
> #>  R.cache       0.15.0  2021-04-30 [1] CRAN (R 4.0.5)
> #>  R.methodsS3   1.8.1   2020-08-26 [1] CRAN (R 4.0.2)
> #>  R.oo          1.24.0  2020-08-26 [1] CRAN (R 4.0.2)
> #>  R.utils       2.10.1  2020-08-26 [1] CRAN (R 4.0.2)
> #>  R6            2.5.0   2020-10-28 [1] CRAN (R 4.0.3)
> #>  reprex        2.0.0   2021-04-02 [1] CRAN (R 4.0.5)
> #>  rlang         0.4.10  2020-12-30 [1] CRAN (R 4.0.3)
> #>  rmarkdown     2.7     2021-02-19 [1] CRAN (R 4.0.4)
> #>  sessioninfo   1.1.1   2018-11-05 [1] CRAN (R 4.0.0)
> #>  stringi       1.5.3   2020-09-09 [1] CRAN (R 4.0.2)
> #>  stringr       1.4.0   2019-02-10 [1] CRAN (R 4.0.2)
> #>  styler        1.4.1   2021-03-30 [1] CRAN (R 4.0.4)
> #>  tibble        3.1.1   2021-04-18 [1] CRAN (R 4.1.0)
> #>  tidyselect    1.1.1   2021-04-30 [1] CRAN (R 4.0.5)
> #>  utf8          1.2.1   2021-03-12 [1] CRAN (R 4.0.5)
> #>  vctrs         0.3.8   2021-04-29 [1] CRAN (R 4.0.5)
> #>  withr         2.4.2   2021-04-18 [1] CRAN (R 4.0.4)
> #>  xfun          0.22    2021-03-11 [1] CRAN (R 4.0.4)
> #>  yaml          2.2.1   2020-02-01 [1] CRAN (R 4.0.0)
> #>
> #> [1] C:/Users/salbers/R/win-library/4.0
> #> [2] C:/Program Files/R/R-4.0.5/library
> {code}
> I am opening this a) because others may have run into the same issue and
> b) just in case this is actually a bug. Feel free to close immediately if
> this isn't the way these are supposed to work.

--
This message was sent by Atlassian Jira (v8.3.4#803005)