nevi-me commented on issue #1323:
URL: https://github.com/apache/arrow-rs/issues/1323#issuecomment-1042111353


   @e-dard there was a discussion a while ago about RDBMS query passthrough in 
Datafusion. I tried it out using @alamb's excellent guide with the TopK node. 
My approach was to create a custom optimizer that understands the source of 
`LogicalPlan::TableScan`'s source (e.g. if we've registered a Postgres schema 
in the catalog).
   That way, if a plan includes a join of 2 tables from the same DB, they could 
be rewritten into a SQL join query. Or when we have something like:
   
   ```sql
   select sum(a), avg(b), max(c), d from postgres.my_table
   where <filter expressions>
   group by d
   ```
   
   For the above query, I've previously been able to convert the logical plan:
   
   ```
   |_ Aggregate (sum(a), avg(b), min(c)) by d
      |_ Filter (<filter expr>)
         |_ Project (a, b, c, d)
            |_ TableScan (select * from postgres.my_table)
   ```
   
   Into the below by implementing `TableScan` and converting filter-pushdown 
predicates to SQL expressions
   
   ```
   |_ Aggregate (sum(a), avg(b), min(c)) by d
      |_ TableScan (select a, b, c, d from postgres.my_table where <filter 
expr>)
   ```
   
   With aggregate, join, union etc pushdown missing.
   
   Through trying to implement custom nodes & planners for RDBMS, I've also 
found that it's quite possible to translate much of the `LogicalPlan` types 
into a SQL query, though the code I was generating by hand would get very 
nested, and could benefit from some further optimization ala Apache Calcite.
   
   I appreciate that I'm very inactive of late, but I'm just mentioning it out 
of interest, because I also thought of Flight SQL being the standard interface 
to connect to various DBs when I saw the PR a few weeks ago. The alternative of 
supporting N interfaces can get daunting. `connector-x` would help with that to 
some extent, but I would prefer if DB implementations end up supporting Flight 
SQL natively.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to