timsaucer commented on issue #513: URL: https://github.com/apache/datafusion-python/issues/513#issuecomment-3361133274
I've been using Claude to help me understand the conventions, so take this with a grain of salt.

- DuckDB: `result = duckdb.sql("SELECT * FROM df", df=pandas_df)` (but we've seen they can also pull directly from scope)
- Spark: `spark.sql("SELECT * FROM users WHERE age > :age", age=25)`
- Daft: `daft.sql("SELECT * FROM df", catalog={"df": df_customer})`
- Pandas: `query = "SELECT * FROM df_customer"` (direct injection, it looks like)
- PostgreSQL: `cursor.execute("SELECT * FROM users WHERE age > %s", (25,))` or, with named parameters, `cursor.execute("SELECT * FROM users WHERE age > %(age)s", {"age": 25})`

One potential problem with the proposed `ctx.sql("select c_custkey, c_name from {df}", df=df_customer)` is that it gets messy if the user also uses f-string replacement. For example, suppose they wrote `ctx.sql(f"select {key_of_interest}, c_name from {df}", df=df_customer)`. I expect this would go very poorly: because of the `f" "` prefix, Python would try to coerce `df` to a string before `ctx.sql` ever sees the placeholder.

I'm a bit torn on the PostgreSQL approach. On the one hand, `datafusion` upstream tries to stick closely to PostgreSQL. On the other hand, I find the non-named parameters nasty. They remind me of old `fprintf` statements where you had to closely watch your parameter ordering.

From this preliminary look, there doesn't appear to be a strong consensus on the approach. If I had to pick one of these, I would probably lean towards the Spark approach. I think the f-string replacement argument is a very valid one, and the `{df}` style would just lead to headaches down the road for our users.
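To make the f-string concern concrete, here is a minimal sketch in plain Python. `FakeDataFrame` and `render` are hypothetical stand-ins that I made up for illustration; they are not datafusion APIs, and `render` only assumes that `{name}` placeholders would be swapped for some internal table name.

```python
class FakeDataFrame:
    """Hypothetical stand-in for a DataFrame (not the datafusion class)."""

    def __repr__(self) -> str:
        return "<FakeDataFrame>"


def render(query: str, **named: object) -> str:
    """Hypothetical helper: replace each {name} placeholder with a table name.

    Assumption: a real implementation would register each object as a table;
    here we only substitute a made-up name so the effect is visible.
    """
    tables = {name: f"__tbl_{name}" for name in named}
    return query.format(**tables)


df = FakeDataFrame()
df_customer = FakeDataFrame()
key_of_interest = "c_custkey"

# Plain string: the {df} placeholder survives until render() sees it.
plain = render("select c_custkey, c_name from {df}", df=df_customer)

# f-string: Python formats {df} immediately using the local variable `df`,
# so render() receives text with no placeholder left to replace.
mixed = render(f"select {key_of_interest}, c_name from {df}", df=df_customer)

# Doubling the braces keeps the placeholder, but it is easy to forget.
escaped = render(f"select {key_of_interest}, c_name from {{df}}", df=df_customer)

print(plain)    # select c_custkey, c_name from __tbl_df
print(mixed)    # select c_custkey, c_name from <FakeDataFrame>
print(escaped)  # select c_custkey, c_name from __tbl_df
```

Note that the f-string version only runs at all because a local `df` happens to exist; without it, Python raises `NameError` before the query function is called. A `:name` style marker avoids the clash because f-strings leave it untouched.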
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org