[ https://issues.apache.org/jira/browse/ARROW-13472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jonathan Keane resolved ARROW-13472.
------------------------------------
    Resolution: Fixed

Issue resolved by pull request 11307
[https://github.com/apache/arrow/pull/11307]

> [R] Remove .engine = "duckdb" argument
> --------------------------------------
>
>                 Key: ARROW-13472
>                 URL: https://issues.apache.org/jira/browse/ARROW-13472
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: R
>            Reporter: Ian Cook
>            Assignee: Jonathan Keane
>            Priority: Critical
>              Labels: good-first-issue, pull-request-available
>             Fix For: 6.0.0
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> ARROW-12688 added:
> * A new function {{to_duckdb()}} which registers an Arrow Dataset with
> DuckDB and returns a dbplyr object that can be used in dplyr pipelines
> * An {{.engine = "duckdb"}} option in the {{summarise()}} function which
> calls {{to_duckdb()}} inside {{summarise()}}
>
> At the moment, the latter is very convenient because {{summarise()}} is not
> yet natively supported for Arrow Datasets.
>
> However, this {{.engine = "duckdb"}} option is probably not such a great
> design for how users should interact with the arrow package in the longer
> term after native {{summarise()}} support is added. At that point, it will
> seem strange that this one particular dplyr verb has an {{.engine}} option
> while the others do not. Adding the option to all the other dplyr verbs also
> seems like a poor UX design.
>
> Consider whether we should ultimately have users choose whether to use the
> Arrow C++ engine or the DuckDB engine by passing an {{.engine}} argument to
> the {{collect()}} or {{compute()}} function, as [~jonkeane] suggested in
> these comments. {{collect()}} would return a tibble whereas {{compute()}}
> would return an Arrow Table.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)