kevinjqliu commented on issue #2921: URL: https://github.com/apache/datafusion-comet/issues/2921#issuecomment-3687869903
Thanks for starting this discussion, Andy! A few initial thoughts:

### Implementation # 1 - Integrate with Iceberg Java library

> There have been numerous challenges due to the fact that Comet and Iceberg depend on different versions of various libraries (including Parquet) and this has led to some workarounds involving shading and redesigning the API between Iceberg and Comet.

From what I've gathered, https://github.com/apache/iceberg/pull/13786 might help resolve this issue. I'll take a look at the PR.

> This also adds a circular dependency between the Comet and Iceberg projects, where we need to release new Comet versions before we can update Iceberg to use that version.

This is an interesting problem. @pvary has been working on modularizing the different readers in the Iceberg Java repo with the File Format API. Perhaps that can help untangle the dependency issue.

> There have also been challenges in getting PRs merged.

Similar to the above, I'm happy to help here. If it's useful to coordinate, perhaps we can start a #comet channel on the Iceberg Slack.

### Implementation # 2 - Integrate with Iceberg Rust library

> As of version 0.12.0, Comet now includes an integration with the iceberg-rust crate (which uses the same arrow-rs Parquet reader that Comet already uses).

Thanks to the awesome work from the Comet community (@mbutrovich and perhaps others). [Matt reported](https://lists.apache.org/thread/9d6cg0xpwc5cc1sz3xg0qx5zspqt705c) that he was able to run all of Iceberg Java's Spark tests successfully using the new iceberg-rust 0.8.0 release candidate, an amazing milestone!

### Ecosystem

Overall, I feel that iceberg-rust is the ideal place for integration, especially with the DataFusion ecosystem. There are already a number of efforts to improve the iceberg-rust <> DataFusion integration, which benefit Comet, PyIceberg, and likely other projects. Improvements to the iceberg-rust repo can help the broader data community as a whole.
