magarick commented on issue #462: URL: https://github.com/apache/arrow-datafusion-python/issues/462#issuecomment-1696476891
Hi Cody! Thanks for your interest in this. I've seen a little bit of Ibis and it looks interesting. I'm also not sure that improving Ibis support and making a better "native" API are conflicting goals.

> My worry here is that you and the DataFusion community are going to go through the struggles that every new Python dataframe library does, and start facing the same type of questions -- is it `groupby` or `group_by`? `to_csv` or `write_csv`? The list goes on. I've seen the fragmentation of the Python data community over the years and would be far more excited to work on a standard API that supports many backends (Ibis) than bringing another Python dataframe library to the table.

These differences, at least as you've described them here, seem more like a mild annoyance than a struggle to me. As long as there's reasonable documentation, I've never found slightly different names to be nearly as big a barrier as identically or similarly named things behaving differently, or as differing capabilities across libraries.

> We'd love to have more collaboration on Ibis for the DataFusion backend if that'd be an interesting direction to you and others. Ibis was created by Wes McKinney (creator of pandas) and has taken an opinionated stance on most issues I suspect you'll face with a new dataframe library. Plus, it takes heavy inspiration from R and other previous tools! Let us know if this would be interesting to you.

I'm not opposed to this at all, especially if Ibis can provide a consistent API while still exposing the full power of each underlying library. At some point, though, it seems likely you'll encounter differences that preclude a uniform interface, or that require a specialized API for a unique feature Ibis doesn't support. Still, I can see the appeal if you have people who occasionally use a large number of backends or who are trying to build something that interacts with multiple systems.
So I'd be surprised if there weren't value in both a native interface that exposes all of a tool's power and a universal interface, since they seem to be solving different problems. If I'm wrong about Ibis' goals and capabilities, please do correct me, though.

> It's already integrated with visualization frameworks (Altair, Plotly, Streamlit -- any that support the `__dataframe__` protocol natively, and any others through `to_pandas()`) and ML frameworks (scikit-learn, XGBoost, more in this area coming soon).

Glad that you brought this up. What's the relationship between Ibis and the Python dataframe standards protocol?

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
