[jira] [Commented] (ARROW-16701) [R] Can we execute SQL in a dplyr pipeline?

2022-05-31 Thread Jonathan Keane (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-16701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17544618#comment-17544618
 ] 

Jonathan Keane commented on ARROW-16701:


Yes, sorry that conflation was unintentional. We can do this today with duckdb, 
so we should try that — but in principle we should be able to use it with any 
backend that accepts sql + could speak arrow

> [R] Can we execute SQL in a dplyr pipeline?
> ---
>
> Key: ARROW-16701
> URL: https://issues.apache.org/jira/browse/ARROW-16701
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: R
>Reporter: Jonathan Keane
>Priority: Major
>
> Now that we have {{to_duckdb()}} and {{to_arrow()}} is it possible to wrap 
> those and allow someone to insert arbitrary SQL into a dplyr query?
> Something like:
> {code:r}
> sql <- function(data, sql) {
>tbl <- to_duckdb(data)
>res <- DBI::dbSendQuery(dbplyr::remote_con(.data), sql, arrow = TRUE)
>   duckdb::duckdb_fetch_record_batch(res)
> }
> ds %>%
>   filter(year > 2020) %>% 
>   sql("SELECT tip_amount, fare_amount, total_amount FROM ") %>%
>   compute()
> {code}
> This won't work totally, but is vaguely what we're looking for.
> One part that we need to think about is how to deal with the {{from}} clause, 
> a few possibilities:
> * ibis does this by making you "name" the table before doing sql so you can 
> FROM explicitly
> * though maybe you could get away with FROM . like it is a magrittr thing and 
> sub that
> * empty string, and we add it in based on the lazy_tbl object
> Possibly related prior art: 
> https://dbplyr.tidyverse.org/reference/build_sql.html (though the name isn't 
> perfect IMO, and I think this is more geared towards package developers than 
> end users?)



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (ARROW-16701) [R] Can we execute SQL in a dplyr pipeline?

2022-05-31 Thread Ian Cook (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-16701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17544617#comment-17544617
 ] 

Ian Cook commented on ARROW-16701:
--

IMO the description above conflates {{to_*()}} functions (which control which 
backend should execute the next verbs in the chain) with a hypothetical 
{{sql()}} function (which I strongly believe should operate like a pure dplyr 
verb, and could at least hypothetically work with any backend).

> [R] Can we execute SQL in a dplyr pipeline?
> ---
>
> Key: ARROW-16701
> URL: https://issues.apache.org/jira/browse/ARROW-16701
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: R
>Reporter: Jonathan Keane
>Priority: Major
>
> Now that we have {{to_duckdb()}} and {{to_arrow()}} is it possible to wrap 
> those and allow someone to insert arbitrary SQL into a dplyr query?
> Something like:
> {code:r}
> sql <- function(data, sql) {
>tbl <- to_duckdb(data)
>res <- DBI::dbSendQuery(dbplyr::remote_con(.data), sql, arrow = TRUE)
>   duckdb::duckdb_fetch_record_batch(res)
> }
> ds %>%
>   filter(year > 2020) %>% 
>   sql("SELECT tip_amount, fare_amount, total_amount FROM ") %>%
>   compute()
> {code}
> This won't work totally, but is vaguely what we're looking for.
> One part that we need to think about is how to deal with the {{from}} clause, 
> a few possibilities:
> * ibis does this by making you "name" the table before doing sql so you can 
> FROM explicitly
> * though maybe you could get away with FROM . like it is a magrittr thing and 
> sub that
> * empty string, and we add it in based on the lazy_tbl object
> Possibly related prior art: 
> https://dbplyr.tidyverse.org/reference/build_sql.html (though the name isn't 
> perfect IMO, and I think this is more geared towards package developers than 
> end users?)



--
This message was sent by Atlassian Jira
(v8.20.7#820007)