Maybe you can give more background about Gravitino.

On 2024/08/30 07:50:31 Minghuang Li wrote:
> Hello Hudi Devs,
>
> First and foremost, I would like to express my admiration for the Apache Hudi
> project. The innovation and robust features you've brought to data lake
> technology management are truly impressive and are greatly valued by the
> developer community.
>
> I'm currently integrating Apache Hudi into Apache Gravitino[1] project to
> more efficiently manage data lake metadata. We plan to implement a Hudi
> catalog[2] in Gravitino and I am reaching out for advice to ensure we align
> with Hudi's best practices and future direction.
>
> Through my research into the Hudi project, I have noted the current state of
> metadata management (please correct me if I am wrong):
>
> 1. Hudi does not currently offer a unified catalog interface
>    specification (for instance, a unified interface for Table metadata. The
>    existing HoodieTable seems designed for table data read/write, not metadata).
> 2. Hudi provides various sync tools that can sync metadata to an
>    external catalog post-data write. Although they implement the
>    HoodieMetaSyncOperations interface, it does not offer Hudi database and table
>    abstractions, and seems unable to guarantee consistency (e.g., data write
>    succeeds but metadata sync fails).
>
> Based on these observations, a couple of things I’m hoping to get your
> insights on:
>
> Catalog Interface: Is there a stable and unified catalog interface in Hudi
> that we can use to ensure compatibility across different Hudi versions? If
> such an interface exists, could you point me towards some documentation or
> examples? If not, what approach would you recommend for unifying access to
> Hudi metadata?
>
> Future Developments: Are there any plans for official catalog management
> features in Hudi? We want to ensure our implementation is future-proof and
> would appreciate any details on upcoming enhancements that might impact
> catalog management.
>
> Engine Support: Gravitino supports Spark versions 3.3, 3.4, and 3.5.
> Currently, only the latest version of Hudi (0.15) supports Spark 3.5. I am
> concerned that developing on this version might introduce stability and
> compatibility issues. Additionally, Gravitino's Spark plugin is based on the
> Spark v2 interface, while Hudi's Spark support uses the v1 interface. I've
> seen plans in the community about supporting Spark v2; could you provide a
> timeline for this? This will also determine how Gravitino's Spark plugin will
> implement Hudi querying moving forward.
>
> I would greatly appreciate any guidance and support the Hudi community can
> offer. Your insights would be invaluable in ensuring the successful
> integration of Hudi into our project. Thank you very much for your time and
> assistance!
>
> Best regards,
> Minghuang Li
>
> [1] https://github.com/apache/gravitino
> [2] https://lists.apache.org/thread/bmz4xsv2ogpccy5wtopyy9hp1cot317b