timsaucer commented on issue #513:
URL: 
https://github.com/apache/datafusion-python/issues/513#issuecomment-3361133274

   I've been using Claude to assist me in trying to understand the conventions, 
so take this with a grain of salt.
   
   - DuckDB: `result = duckdb.sql("SELECT * FROM df", df=pandas_df)` (but we've 
seen they can also pull directly from scope)
   - Spark: `spark.sql("SELECT * FROM users WHERE age > :age", age=25)`
   - Daft: `daft.sql("SELECT * FROM df", catalog={"df": df_customer})`
   - Pandas: `query = "SELECT * FROM df_customer"` (direct injection, it looks 
like)
   - PostgreSQL: `cursor.execute("SELECT * FROM users WHERE age > %s", (25,))` 
or with named parameters `cursor.execute("SELECT * FROM users WHERE age > 
%(age)s", {"age": 25})`
   
   One potential problem with the proposed `ctx.sql("select c_custkey, c_name 
from {df}", df=df_customer)` is that if the user also uses f-string replacement 
it gets messy. For example suppose they did `ctx.sql(f"select 
{key_of_interest}, c_name from {df}", df=df_customer)` then I expect this would 
go very poorly. It would try to coerce `df` to a string because of the `f" "`.
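   
   To make that failure mode concrete, here is a minimal sketch. It does not 
use the real `datafusion` API: `FakeDataFrame` and `fake_sql` are hypothetical 
stand-ins, and the local variable is named `df` so that the f-string has 
something to interpolate. It only shows that Python fills in `{df}` before the 
callee ever sees the query text.

```python
import string


class FakeDataFrame:
    """Hypothetical stand-in for a DataFrame; not the datafusion class."""

    def __repr__(self):
        return "<FakeDataFrame at 0x1>"


def fake_sql(query, **named):
    """Hypothetical stand-in for ctx.sql.

    Returns the query text it received and the names of any {placeholders}
    that are still present in that text.
    """
    fields = [name for _, name, _, _ in string.Formatter().parse(query) if name]
    return query, fields


df = FakeDataFrame()
key_of_interest = "c_custkey"

# Plain string: the placeholder survives, so the callee could substitute it.
plain_query, plain_fields = fake_sql(
    "select c_custkey, c_name from {df}", df=df
)

# f-string: Python formats `df` into the text before the call happens,
# so the callee receives no placeholder at all.
fstring_query, fstring_fields = fake_sql(
    f"select {key_of_interest}, c_name from {df}", df=df
)

print(plain_fields)
print(fstring_fields)
print(fstring_query)
```

   If no variable named `df` were in scope, the f-string version would instead 
raise a `NameError` before `fake_sql` is called, which is arguably the better 
of the two outcomes.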
   
   I'm a bit torn on the PostgreSQL approach. On the one hand, `datafusion` 
upstream tries to stick closely to PostgreSQL. On the other hand, I find the 
non-named parameters nasty. They remind me of old `printf` statements where 
you had to closely watch your parameter ordering.
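   
   As a rough illustration of the ordering concern, the sketch below uses the 
standard-library `sqlite3` module purely as a stand-in. It is neither Spark nor 
a PostgreSQL driver, and the `users` table and its rows are made up for the 
example. `sqlite3` accepts both positional `?` and named `:name` placeholders, 
so it can show the two styles side by side.

```python
import sqlite3

# In-memory database with made-up rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("ann", 31), ("bob", 19), ("cy", 45)],
)

positional_sql = "SELECT name FROM users WHERE age > ? AND age < ? ORDER BY name"
named_sql = (
    "SELECT name FROM users "
    "WHERE age > :min_age AND age < :max_age ORDER BY name"
)

# Positional: correctness depends on the tuple order matching the query.
positional = conn.execute(positional_sql, (25, 40)).fetchall()

# Same values in the wrong order: no error, just a silently different result.
swapped = conn.execute(positional_sql, (40, 25)).fetchall()

# Named: the order in which the values are supplied does not matter.
named = conn.execute(named_sql, {"max_age": 40, "min_age": 25}).fetchall()

print(positional, swapped, named)
conn.close()
```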
   
   From this preliminary look it doesn't appear that there is a strong 
consensus in approach.
   
   If I had to pick one of these, I would probably lean towards the Spark 
approach. I think the f-string replacement concern is a very valid one, and 
the `{df}` style would just lead to headaches down the road for our users. 




---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Reply via email to