kevinjqliu commented on issue #2921:
URL: 
https://github.com/apache/datafusion-comet/issues/2921#issuecomment-3687869903

   Thanks for starting this discussion, Andy!
   
   A few initial thoughts:
   
   ### Implementation # 1 - Integrate with Iceberg Java library
   
   > There have been numerous challenges due to the fact that Comet and Iceberg 
depend on different versions of various libraries (including Parquet) and this 
has led to some workarounds involving shading and redesigning the API between 
Iceberg and Comet. 
   
   From what I’ve gathered, https://github.com/apache/iceberg/pull/13786 might 
help resolve this issue. I’ll take a look at the PR.  
   
   > This also adds a circular dependency between the Comet and Iceberg 
projects, where we need to release new Comet versions before we can update 
Iceberg to use that version. 
   
   This is an interesting problem. @pvary has been working on modularizing the 
different readers in the Iceberg Java repo with the File Format API. Perhaps 
that can help untangle the dependency issue. 
   
   > There have also been challenges in getting PRs merged.
   
   Similar to above, happy to help here. If it’s useful to coordinate, perhaps 
we can start a #comet channel on the Iceberg Slack.
   
   
   ### Implementation # 2 - Integrate with Iceberg Rust library
   
   > As of version 0.12.0, Comet now includes an integration with the 
iceberg-rust crate (which uses the same arrow-rs Parquet reader that Comet 
already uses). This
   
   Thanks to the awesome work from the Comet community (@mbutrovich and perhaps 
others). [Matt 
reported](https://lists.apache.org/thread/9d6cg0xpwc5cc1sz3xg0qx5zspqt705c) 
that he was able to run all of Iceberg Java’s Spark tests successfully using 
the new iceberg-rust 0.8.0 release candidate, an amazing milestone!
   
   ### Ecosystem
   Overall, I feel like iceberg-rust is the ideal place for integration, 
especially with the DataFusion ecosystem. There are already a number of efforts 
to improve the iceberg-rust <> DataFusion integration, which benefits Comet, 
PyIceberg, and likely other projects. Improvements to the iceberg-rust repo can 
help the broader data community as a whole.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

