[I] Hudi integration with project nessie [hudi]

via GitHub Sat, 29 Nov 2025 19:38:42 -0800


hudi-bot opened a new issue, #14715:
URL: https://github.com/apache/hudi/issues/14715


   [https://github.com/apache/hudi/issues/2330#issuecomment-743423398] 
   
   Follow up from this. 
   
   ## JIRA info
   
   - Link: https://issues.apache.org/jira/browse/HUDI-1455
   - Type: New Feature
   
   
   ---
   
   
   ## Comments
   
   14/Dec/20 14:13;rymurr;Thanks [~vinoth] for opening this. We would like to 
investigate how we could provide support for cross-table transactions and the 
git-like features from Nessie in Hudi. 
   
   For other projects (Iceberg/Delta) there are interfaces specifically 
designed for new/extended catalog implementations. Where do you think it's best 
to start looking for a similar integration point in Hudi?;;;
   
   ---
   
   14/Dec/20 18:49;vinoth;Yes. Historically, we did not design Hudi that way, 
as a pure metadata layer. Is that what you are referring to? 
   
   If you could point me to these implementations, I can take a look and come 
back with a proposal for any changes we need in Hudi.;;;
   
   ---
   
   14/Dec/20 19:12;rymurr;The Catalog interface in Iceberg [1] and the LogStore 
interface in Delta [3] both abstract away the file operations needed to commit a 
transaction. Typically, for filesystems with atomic rename (e.g. HDFS), this just 
delegates to the HDFS libraries. For S3, Iceberg delegates the locking to Hive, and 
indications are that proprietary Delta delegates to an internal Databricks API 
(guessing from the code in the OSS repo and the docs). Nessie fits into Iceberg 
[2] and Delta [4] by implementing those interfaces and performing the 
(optimistic) locking through Nessie. Because it hooks in at this layer, it is used 
as the locking mechanism (which is what allows for many simultaneous 
readers and writers) and is also able to capture the info required to maintain the 
git-like history of branches and tags.
   
   My (admittedly not extensive) research into Hudi suggests there is indeed 
no layer for those types of operations and everything is handled by the IO 
itself. I am not sure how easy it is to slide something like Nessie in at that 
level or if it requires something like implementing a Hadoop FileSystem 
interface. What do you think? 
   
   [1] http://iceberg.apache.org/custom-catalog/
   [2] 
https://github.com/apache/iceberg/tree/master/nessie/src/main/java/org/apache/iceberg/nessie
   [3] 
https://github.com/delta-io/delta/blob/master/src/main/scala/org/apache/spark/sql/delta/storage/LogStore.scala
   [4] 
https://github.com/projectnessie/nessie/blob/main/clients/deltalake/core/src/main/scala/com/dremio/nessie/deltalake/NessieLogStore.scala;;;
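
   The commit abstraction described in the comment above (a pluggable layer 
that swaps a table's metadata pointer atomically under optimistic locking) can 
be sketched roughly as follows. This is only an illustration: 
`InMemoryRefStore`, `compare_and_swap`, and `commit_with_retry` are hypothetical 
names, not the API of Nessie, Iceberg, Delta, or Hudi, and the in-memory store 
merely stands in for a remote catalog service.

```python
import threading
from dataclasses import dataclass, field
from typing import Callable, Dict, List


class CommitConflict(Exception):
    """Raised when the branch head moved after the writer read it."""


@dataclass
class InMemoryRefStore:
    """Hypothetical stand-in for a catalog service that tracks branch heads."""
    heads: Dict[str, int] = field(default_factory=dict)          # branch -> version
    history: Dict[str, List[str]] = field(default_factory=dict)  # branch -> pointers
    _lock: threading.Lock = field(default_factory=threading.Lock, repr=False)

    def head(self, branch: str) -> int:
        """Return the current version of a branch (0 if it has no commits)."""
        return self.heads.get(branch, 0)

    def compare_and_swap(self, branch: str, expected: int, pointer: str) -> int:
        """Advance the branch only if its head still equals `expected`."""
        with self._lock:
            current = self.heads.get(branch, 0)
            if current != expected:
                raise CommitConflict(
                    f"{branch}: expected version {expected}, found {current}")
            # Recording every pointer is what preserves a git-like history.
            self.history.setdefault(branch, []).append(pointer)
            self.heads[branch] = current + 1
            return current + 1


def commit_with_retry(store: InMemoryRefStore, branch: str,
                      make_metadata: Callable[[int], str],
                      max_attempts: int = 3) -> int:
    """Optimistic commit: read the head, prepare metadata, swap, retry on conflict."""
    for _ in range(max_attempts):
        base = store.head(branch)
        pointer = make_metadata(base)  # e.g. write a new metadata file
        try:
            return store.compare_and_swap(branch, base, pointer)
        except CommitConflict:
            continue  # another writer won the race; rebase and try again
    raise CommitConflict(f"{branch}: gave up after {max_attempts} attempts")
```

   In this sketch a writer never holds a lock while preparing its metadata; 
it only loses the race at swap time and retries, which is the property that 
lets many readers and writers proceed concurrently.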
   
   ---
   
   17/Dec/20 23:21;vinoth;Thanks for the information, Ryan. Let me process this 
and come back here with a concrete proposal.;;;
   
   ---
   
   18/Dec/20 15:20;rymurr;Appreciate it [~vinoth], shout if I can answer 
anything else!;;;
   
   ---
   
   07/Mar/22 01:54;melin;Will this issue continue to advance? [~vinoth] ;;;
   
   ---
   
   01/Apr/24 23:12;wenruimeng;Is there any plan for this issue? ;;;

