alamb commented on issue #12357:
URL: https://github.com/apache/datafusion/issues/12357#issuecomment-2345936261

   > DataFusion has a lot of really excellent foundational engineering. How 
it's used by so many downstream DB engines attests strongly to that. I think 
it's a real shame that it isn't quite as suitable for the role that 
pandas/dask/polars/duckdb currently occupies. This isn't due to anything 
lacking in the query engine, but the overall user experience for a direct user 
isn't quite as solid (as opposed to someone using it as a library).
   
   Thank you @kszlim -- this is well stated, and I think this is one of the 
core tensions that has existed in the project since the early days.
   
   One approach, as you suggest, is to try to make datafusion the superset of 
all that is good about polars (python dataframes) and duckdb (sql). I worry 
that this will result in an even larger library that isn't as good as either.
   
   Another potential way is to keep the core focused on fundamentals and work 
to provide open source alternatives to those other libraries *built on* 
datafusion. It is my not-so-secret goal with the following discussions:
   * `polars`: https://github.com/apache/datafusion-python/issues/440 (🙌 
@timsaucer )
   * `duckdb`: https://github.com/apache/datafusion/issues/11979 (🙌 
@matthewmturner )
   
   I am hoping to see datafusion-python (or maybe a library built on 
datafusion-python) and `dft` evolve into delightful end user experiences.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
