This is an automated email from the ASF dual-hosted git repository.

danny0405 pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/asf-site by this push:
     new 20b733b5151 [DOCS] Update syncing DataHub docs (#12504)
20b733b5151 is described below

commit 20b733b5151d6a31abac33d71ff5bb5260ad9e1b
Author: Sergio Gómez Villamor <[email protected]>
AuthorDate: Wed Dec 18 05:19:47 2024 +0100

    [DOCS] Update syncing DataHub docs (#12504)
---
 website/docs/syncing_datahub.md | 27 +++++++++++++++++++++++++--
 1 file changed, 25 insertions(+), 2 deletions(-)

diff --git a/website/docs/syncing_datahub.md b/website/docs/syncing_datahub.md
index 89cf9bf8799..55c0c0be601 100644
--- a/website/docs/syncing_datahub.md
+++ b/website/docs/syncing_datahub.md
@@ -9,8 +9,26 @@ observability, federated governance, etc.
 Since Hudi 0.11.0, you can now sync to a DataHub instance by setting `DataHubSyncTool` as one of the sync tool classes
 for `HoodieStreamer`.
 
-The target Hudi table will be sync'ed to DataHub as a `Dataset`. The Hudi table's avro schema will be sync'ed, along
-with the commit timestamp when running the sync.
+The target Hudi table will be sync'ed to DataHub as a `Dataset`, which will be created with the following properties:
+
+* Hudi table properties and partitioning information
+* Spark-related properties
+* User-defined properties
+* The last commit and the last commit completion timestamps
+
+Additionally, the `Dataset` object will include the following metadata:
+
+* sub-type as `Table`
+* browse path
+* parent container
+* Avro schema
+* optionally, an attached `Domain` object
+
+Also, the parent database will be sync'ed to DataHub as a `Container`, including the following metadata:
+
+* sub-type as `Database`
+* browse paths
+* optionally, an attached `Domain` object
 
 ### Configurations
 
@@ -27,6 +45,11 @@ By default, the sync config's database name and table name will be used to make
 Subclass `HoodieDataHubDatasetIdentifier` and set it to `hoodie.meta.sync.datahub.dataset.identifier.class` to customize
 the URN creation.
 
+Optionally, sync'ed `Dataset` and `Container` objects can be attached to a `Domain` object. To do this, set
+`hoodie.meta.sync.datahub.domain.name` to a valid `Domain` URN. Also, a sync'ed `Dataset` can carry
+user-defined properties. To do this, set `hoodie.meta.sync.datahub.table.properties` to a comma-separated key-value
+string (e.g., `key1=val1,key2=val2`).
+
 ### Example
 
 The following shows an example configuration to run `HoodieStreamer` with `DataHubSyncTool`.
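
As a rough illustration of the two options this commit documents, the sketch below builds the `hoodie.meta.sync.datahub.domain.name` and `hoodie.meta.sync.datahub.table.properties` entries and shows how a `key1=val1,key2=val2` string breaks down into pairs. The helper names, the example domain URN, and the property keys are hypothetical, and the parsing is an assumption about the documented format, not Hudi's actual implementation.

```python
def parse_table_properties(raw: str) -> dict[str, str]:
    """Split a 'key1=val1,key2=val2' string into a dict.

    Hypothetical helper: it mirrors the documented format only and is
    not the parser Hudi itself uses.
    """
    props: dict[str, str] = {}
    for pair in raw.split(","):
        pair = pair.strip()
        if not pair:
            continue  # tolerate empty segments such as a trailing comma
        key, sep, value = pair.partition("=")
        if not sep or not key.strip():
            raise ValueError(f"malformed key-value pair: {pair!r}")
        props[key.strip()] = value.strip()
    return props


def build_sync_config(domain_urn: str, table_properties: dict[str, str]) -> dict[str, str]:
    """Assemble the two DataHub sync options named in the docs change."""
    serialized = ",".join(f"{k}={v}" for k, v in table_properties.items())
    return {
        "hoodie.meta.sync.datahub.domain.name": domain_urn,
        "hoodie.meta.sync.datahub.table.properties": serialized,
    }


# Assumed example values: the domain URN and property keys are made up.
config = build_sync_config(
    "urn:li:domain:engineering",
    {"team": "data-platform", "tier": "gold"},
)
for key, value in config.items():
    print(f"{key}={value}")
```

Each printed `key=value` line could then be placed in a properties file or passed as a Hudi config override, depending on how the streamer job is launched.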
