[jira] [Commented] (ARROW-16701) [R] Can we execute SQL in a dplyr pipeline?
[ https://issues.apache.org/jira/browse/ARROW-16701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17544618#comment-17544618 ] Jonathan Keane commented on ARROW-16701: Yes, sorry that conflation was unintentional. We can do this today with duckdb, so we should try that — but in principle we should be able to use it with any backend that accepts sql + could speak arrow > [R] Can we execute SQL in a dplyr pipeline? > --- > > Key: ARROW-16701 > URL: https://issues.apache.org/jira/browse/ARROW-16701 > Project: Apache Arrow > Issue Type: New Feature > Components: R >Reporter: Jonathan Keane >Priority: Major > > Now that we have {{to_duckdb()}} and {{to_arrow()}} is it possible to wrap > those and allow someone to insert arbitrary SQL into a dplyr query? > Something like: > {code:r} > sql <- function(data, sql) { >tbl <- to_duckdb(data) >res <- DBI::dbSendQuery(dbplyr::remote_con(.data), sql, arrow = TRUE) > duckdb::duckdb_fetch_record_batch(res) > } > ds %>% > filter(year > 2020) %>% > sql("SELECT tip_amount, fare_amount, total_amount FROM ") %>% > compute() > {code} > This won't work totally, but is vaguely what we're looking for. > One part that we need to think about is how to deal with the {{from}} clause, > a few possibilities: > * ibis does this by making you "name" the table before doing sql so you can > FROM explicitly > * though maybe you could get away with FROM . like it is a magrittr thing and > sub that > * empty string, and we add it in based on the lazy_tbl object > Possibly related prior art: > https://dbplyr.tidyverse.org/reference/build_sql.html (though the name isn't > perfect IMO, and I think this is more geared towards package developers than > end users?) -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Commented] (ARROW-16701) [R] Can we execute SQL in a dplyr pipeline?
[ https://issues.apache.org/jira/browse/ARROW-16701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17544617#comment-17544617 ] Ian Cook commented on ARROW-16701: -- IMO the description above conflates {{to_*()}} functions (which control which backend should execute the next verbs in the chain) with a hypothetical {{sql()}} function (which I strongly believe should operate like a pure dplyr verb, and could at least hypothetically work with any backend). > [R] Can we execute SQL in a dplyr pipeline? > --- > > Key: ARROW-16701 > URL: https://issues.apache.org/jira/browse/ARROW-16701 > Project: Apache Arrow > Issue Type: New Feature > Components: R >Reporter: Jonathan Keane >Priority: Major > > Now that we have {{to_duckdb()}} and {{to_arrow()}} is it possible to wrap > those and allow someone to insert arbitrary SQL into a dplyr query? > Something like: > {code:r} > sql <- function(data, sql) { >tbl <- to_duckdb(data) >res <- DBI::dbSendQuery(dbplyr::remote_con(.data), sql, arrow = TRUE) > duckdb::duckdb_fetch_record_batch(res) > } > ds %>% > filter(year > 2020) %>% > sql("SELECT tip_amount, fare_amount, total_amount FROM ") %>% > compute() > {code} > This won't work totally, but is vaguely what we're looking for. > One part that we need to think about is how to deal with the {{from}} clause, > a few possibilities: > * ibis does this by making you "name" the table before doing sql so you can > FROM explicitly > * though maybe you could get away with FROM . like it is a magrittr thing and > sub that > * empty string, and we add it in based on the lazy_tbl object > Possibly related prior art: > https://dbplyr.tidyverse.org/reference/build_sql.html (though the name isn't > perfect IMO, and I think this is more geared towards package developers than > end users?) -- This message was sent by Atlassian Jira (v8.20.7#820007)