hudi-bot opened a new issue, #14715: URL: https://github.com/apache/hudi/issues/14715
[https://github.com/apache/hudi/issues/2330#issuecomment-743423398] Follow-up from this.

## JIRA info

- Link: https://issues.apache.org/jira/browse/HUDI-1455
- Type: New Feature

---

## Comments

14/Dec/20 14:13;rymurr;Thanks [~vinoth] for opening this. We would like to investigate how we could provide support for cross-table transactions and the git-like features from Nessie in Hudi. For other projects (Iceberg/Delta) there are interfaces specifically designed for new or extended catalog implementations. Where do you think it's best to start for a similar integration point in Hudi?;;;

---

14/Dec/20 18:49;vinoth;Yes. Historically, we did not design Hudi that way, as a pure metadata layer. Is that what you are referring to? If you could point me to these implementations, I can take a look and come back with a proposal for any changes we need in Hudi.;;;

---

14/Dec/20 19:12;rymurr;The catalog interface in Iceberg [1] and the LogStore interface in Delta [3] both abstract away the file operations needed to commit a transaction. Typically, for filesystems with atomic rename (e.g. HDFS), this just delegates to the HDFS libraries. For S3, Iceberg delegates the locking to Hive, and indications are that proprietary Delta delegates to an internal Databricks API (guessing from the code in the OSS repo and the docs).

Nessie fits into Iceberg [2] and Delta [4] by implementing those interfaces and performing the (optimistic) locking through Nessie. Because it hooks in at this layer, it serves as the locking mechanism (which is what allows for many simultaneous readers and writers) and is also able to capture the info required to maintain the git-like history of branches and tags.

My (admittedly not extensive) research into Hudi suggests there is indeed no layer for those types of operations and everything is handled by the IO itself. I am not sure how easy it is to slide something like Nessie in at that level, or whether it requires something like implementing a Hadoop filesystem interface.

What do you think?

[1] http://iceberg.apache.org/custom-catalog/
[2] https://github.com/apache/iceberg/tree/master/nessie/src/main/java/org/apache/iceberg/nessie
[3] https://github.com/delta-io/delta/blob/master/src/main/scala/org/apache/spark/sql/delta/storage/LogStore.scala
[4] https://github.com/projectnessie/nessie/blob/main/clients/deltalake/core/src/main/scala/com/dremio/nessie/deltalake/NessieLogStore.scala;;;

---

17/Dec/20 23:21;vinoth;Thanks for the information, Ryan. Let me process this and come back here with a concrete proposal.;;;

---

18/Dec/20 15:20;rymurr;Appreciate it [~vinoth], shout if I can answer anything else!;;;

---

07/Mar/22 01:54;melin;Will this issue continue to advance? [~vinoth];;;

---

01/Apr/24 23:12;wenruimeng;Is there any plan for this issue?;;;
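For readers unfamiliar with the pattern described in the 14/Dec/20 19:12 comment, here is a minimal sketch of a commit layer that does optimistic locking through a pluggable store. It is only an illustration under stated assumptions: `CommitStore`, `InMemoryCommitStore`, and `OptimisticCommitter` are hypothetical names, not interfaces from Hudi, Iceberg, Delta, or Nessie, and the in-memory map merely stands in for an external service.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical commit abstraction (assumption: not an existing Hudi/Iceberg/Delta API). */
interface CommitStore {
    /** Returns the current metadata pointer for a table, if one has been committed. */
    Optional<String> currentPointer(String table);

    /**
     * Atomically moves the table pointer from {@code expected} to {@code next}.
     * Returns false if the pointer no longer matches {@code expected}.
     */
    boolean compareAndSwap(String table, Optional<String> expected, String next);
}

/** In-memory stand-in for an external catalog or lock service. */
final class InMemoryCommitStore implements CommitStore {
    private final Map<String, String> pointers = new ConcurrentHashMap<>();

    @Override
    public Optional<String> currentPointer(String table) {
        return Optional.ofNullable(pointers.get(table));
    }

    @Override
    public boolean compareAndSwap(String table, Optional<String> expected, String next) {
        if (expected.isPresent()) {
            // Atomic on ConcurrentHashMap: replaces only if the current value equals expected.
            return pointers.replace(table, expected.get(), next);
        }
        // First commit: succeeds only if no pointer exists yet.
        return pointers.putIfAbsent(table, next) == null;
    }
}

/** Writer-side retry loop: read the base, build the next pointer, try to swap. */
final class OptimisticCommitter {
    private final CommitStore store;

    OptimisticCommitter(CommitStore store) {
        this.store = store;
    }

    boolean commit(String table, Function<Optional<String>, String> buildNext, int maxAttempts) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            Optional<String> base = store.currentPointer(table);
            String next = buildNext.apply(base);
            if (store.compareAndSwap(table, base, next)) {
                return true; // no concurrent writer moved the pointer
            }
            // Another writer committed first; re-read the base and try again.
        }
        return false;
    }
}
```

Under this sketch, a writer never takes a lock up front; it only learns at swap time whether someone else committed first. A real integration would presumably also need conflict checks before retrying, which are omitted here.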
