This is an automated email from the ASF dual-hosted git repository.

vinish pushed a commit to branch catalog-sync-docs
in repository https://gitbox.apache.org/repos/asf/incubator-xtable.git

commit 77c2cf04a9b172c5bef89edaf594e0e501adab5f
Author: Vinish Reddy <[email protected]>
AuthorDate: Tue Apr 1 19:35:22 2025 -0700

    Add how-to docs for catalog sync
---
 website/README.md                        |   4 +
 website/docs/features-and-limitations.md |   8 ++
 website/docs/how-to-catalog-sync.md      | 205 +++++++++++++++++++++++++++++++
 website/docs/how-to.md                   |   4 +-
 website/sidebars.js                      |   3 +-
 5 files changed, 221 insertions(+), 3 deletions(-)

diff --git a/website/README.md b/website/README.md
index 441bedbb..09feb496 100644
--- a/website/README.md
+++ b/website/README.md
@@ -113,5 +113,9 @@ npm run serve
 1. Create a `.md` file with all the content for Community page.
 2. Add community page to website homepage. 
 
+## Add how-to-use docs for catalog sync.
+1. Create a `.md` file with all the content for how-to-use catalog sync 
feature.
+2. Add how-to-use catalog sync to website.
+
 ## Maintainers
 [Apache XTable™ (Incubating) 
Community](https://incubator.apache.org/projects/xtable.html)
diff --git a/website/docs/features-and-limitations.md 
b/website/docs/features-and-limitations.md
index 1c7105db..9e9070d5 100644
--- a/website/docs/features-and-limitations.md
+++ b/website/docs/features-and-limitations.md
@@ -8,6 +8,8 @@ import TabItem from '@theme/TabItem';
 
 # Features and Limitations
 ## Features
+
+### Synchronizing table format metadata (TableFormatSync)
 Apache XTable™ (Incubating) provides users with the ability to translate 
metadata from one table format to another.  
 
 Apache XTable™ (Incubating) provides two sync modes, "incremental" and "full." 
The incremental mode is more lightweight and has better performance, especially 
on large tables. If there is anything that prevents the incremental mode from 
working properly, the tool will fall back to the full sync mode.
@@ -20,6 +22,12 @@ This sync provides users with the following:
    * For Iceberg, snapshots will be 
[expired](https://iceberg.apache.org/docs/latest/maintenance/#expire-snapshots) 
after a configured amount of time.
    * For Delta, the transaction log will be 
[retained](https://docs.databricks.com/en/sql/language-manual/delta-vacuum.html)
 for a configured amount of time.
 
+### Synchronizing table format metadata in external catalogs (CatalogSync)
+In addition to synchronizing table format metadata, Apache XTable™ 
(Incubating) now allows users to synchronize metadata for tables across 
multiple external catalogs continuously and incrementally.
+This reduces friction by eliminating the manual step of registering tables in 
multiple catalogs and enhances flexibility by avoiding catalog lock-in.
+HMS and AWS Glue are the two catalogs supported right now, support for other 
catalogs (Unity, Apache Polaris Apache Gravitino, DataHub) coming soon. 
+
+
 ## Limitations and Compatibility Notes
 ### General
 - Only Copy-on-Write or Read-Optimized views of tables are currently 
supported. This means that only the underlying parquet files are synced but log 
files from Hudi and [delete 
vectors](https://docs.delta.io/latest/delta-deletion-vectors.html#:~:text=Deletion%20vectors%20indicate%20changes%20to,is%20run%20on%20the%20table.)
 from Delta and Iceberg are not captured by the sync.
diff --git a/website/docs/how-to-catalog-sync.md 
b/website/docs/how-to-catalog-sync.md
new file mode 100644
index 00000000..05bf2116
--- /dev/null
+++ b/website/docs/how-to-catalog-sync.md
@@ -0,0 +1,205 @@
+---
+sidebar_position: 1
+title: "Registering your interoperable tables across multiple catalogs"
+---
+
+import Tabs from '@theme/Tabs';
+import TabItem from '@theme/TabItem';
+
+# Registering your interoperable tables across multiple catalogs
+
+:::danger Important
+Using Apache XTable™ (Incubating) to sync your source tables across multiple 
target catalogs involves running catalog sync on your
+current dataset using a bundled jar. You can create this bundled jar by 
following the instructions
+on the [Installation page](/docs/setup). Read through Apache XTable™'s
+[GitHub 
page](https://github.com/apache/incubator-xtable#building-the-project-and-running-tests)
 for more information.
+:::
+
+In this tutorial, we’ll show you how to use Apache XTable™ (Incubating) to 
enable interoperability between catalogs. 
+For example, you can expose a Hudi, Iceberg, or Delta table in Hive Metastore 
(HMS) and make it available in the AWS Glue Data Catalog—without manually 
registering each table. 
+Additionally, Apache XTable™ (Incubating) allows you to convert the table 
format metadata. For instance, a Delta table in HMS can be exposed as an 
Iceberg table in Glue.
+
+
+## Pre-requisites
+1. Source table(s) (Hudi/Delta/Iceberg) already written to your local storage 
or external storage locations like S3/GCS/ADLS. 
+   If you don't have the source table written in place already, you can follow 
the steps in this [tutorial](/docs/how-to#create-dataset) to set it up.
+2. Clone the Apache XTable™ (Incubating) 
[repository](https://github.com/apache/incubator-xtable) and create the
+   `xtable-utilities_2.12-0.2.0-SNAPSHOT-bundled.jar` by following the steps 
on the [Installation page](/docs/setup)
+3. Hive Metastore is configured and running—either locally or on platforms 
like EMR, Dataproc, or HDInsight.
+4. Setup access to interact with AWS APIs from the command line.
+   If you haven’t installed AWSCLIv2, you do so by following the steps 
outlined in
+   [AWS 
docs](https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html)
 and
+   also set up access credentials by following the steps
+   
[here](https://docs.aws.amazon.com/cli/latest/userguide/getting-started-quickstart.html)
+
+In the next steps, we’ll walk through how to run Apache XTable™ (Incubating) 
locally to sync your tables from a source catalog to multiple target catalogs.
+
+## Steps
+
+### Running catalog sync
+
+Create `my_config_catalog.yaml` in the cloned xtable directory.
+<Tabs
+groupId="source-catalog"
+defaultValue="hms"
+values={[
+{ label: 'sourceCatalog: HMS', value: 'hms', },
+{ label: 'sourceCatalog: GLUE', value: 'glue', },
+{ label: 'sourceCatalog: STORAGE', value: 'storage', },
+]}
+>
+<TabItem value="hms">
+
+```yaml md title="yaml"
+sourceCatalog:
+   catalogId: "source-catalog-id"
+   catalogType: "HMS"
+   catalogProperties:
+      # Ex: thrift://localhost:9083
+      # Checkout org.apache.xtable.hms.HMSCatalogConfig for advanced configs.
+      externalCatalog.hms.serverUrl: "hms-server-url"
+
+targetCatalogs:
+   - catalogId: "target-catalog-id-glue"
+     catalogSyncClientImpl: "org.apache.xtable.glue.GlueCatalogSyncClient"
+     catalogProperties:
+        # Checkout org.apache.xtable.glue.GlueCatalogConfig for advanced 
configs. 
+        externalCatalog.glue.region: "aws-region"
+
+datasets:
+   - sourceCatalogTableIdentifier:
+        tableIdentifier:
+           hierarchicalId: "db.hudi_table"
+           # you only need to specify partitionSpec for HUDI sourceFormat
+           partitionSpec: "cs_sold_date_sk:VALUE"
+     targetCatalogTableIdentifiers:
+        - catalogId: "target-catalog-id-hms"
+          tableFormat: "ICEBERG"
+          tableIdentifier:
+             hierarchicalId: "db.iceberg_table"
+        - catalogId: "target-catalog-id-hms"
+          tableFormat: "DELTA"
+          tableIdentifier:
+             hierarchicalId: "db.delta_table"
+```
+
+</TabItem>
+<TabItem value="glue">
+
+```yaml md title="yaml"
+sourceCatalog:
+  catalogId: "source-catalog-id"
+  catalogType: "GLUE"
+  catalogProperties:
+    # Checkout org.apache.xtable.glue.GlueCatalogConfig for advanced configs.
+    externalCatalog.glue.region: "aws-region"
+
+targetCatalogs:
+  - catalogId: "target-catalog-id-hms"
+    catalogSyncClientImpl: "org.apache.xtable.hms.HMSCatalogSyncClient"
+    catalogProperties:
+      # Ex: thrift://localhost:9083
+      # Checkout org.apache.xtable.hms.HMSCatalogConfig for advanced configs.
+      externalCatalog.hms.serverUrl: "hms-server-url"
+
+datasets:
+  - sourceCatalogTableIdentifier:
+      tableIdentifier:
+        hierarchicalId: "db.iceberg_table"
+    targetCatalogTableIdentifiers:
+      - catalogId: "target-catalog-id-hms"
+        tableFormat: "DELTA"
+        tableIdentifier:
+          hierarchicalId: "db.delta_table"
+      - catalogId: "target-catalog-id-hms"
+        tableFormat: "HUDI"
+        tableIdentifier:
+          hierarchicalId: "db.hudi_table"
+```
+
+</TabItem>
+<TabItem value="storage">
+
+```yaml md title="yaml"
+sourceCatalog:
+   catalogId: "source-catalog-id"
+   catalogType: "STORAGE"
+   catalogProperties: {}
+
+targetCatalogs:
+   - catalogId: "target-catalog-id-glue"
+     catalogSyncClientImpl: "org.apache.xtable.glue.GlueCatalogSyncClient"
+     catalogProperties:
+        # Checkout org.apache.xtable.glue.GlueCatalogConfig for advanced 
configs.
+        externalCatalog.glue.region: "aws-region"
+
+   - catalogId: "target-catalog-id-hms"
+     catalogSyncClientImpl: "org.apache.xtable.hms.HMSCatalogSyncClient"
+     catalogProperties:
+        # Ex: thrift://localhost:9083
+        # Checkout org.apache.xtable.hms.HMSCatalogConfig for advanced configs.
+        externalCatalog.hms.serverUrl: "hms-server-url" 
+
+datasets:
+   - sourceCatalogTableIdentifier:
+        storageIdentifier:
+           tableBasePath: file:///path/to/hudi/source/data
+           tableName: table_name
+           # you only need to specify partitionSpec for HUDI sourceFormat
+           partitionSpec: partitionpath:VALUE
+           tableFormat: "HUDI"
+
+     targetCatalogTableIdentifiers:
+        - catalogId: "target-catalog-id-glue"
+          tableFormat: "DELTA"
+          tableIdentifier:
+             hierarchicalId: "db.delta_table"
+
+        - catalogId: "target-catalog-id-hms"
+          tableFormat: "DELTA"
+          tableIdentifier:
+             hierarchicalId: "db.delta_table"
+
+        - catalogId: "target-catalog-id-glue"
+          tableFormat: "ICEBERG"
+          tableIdentifier:
+             hierarchicalId: "db.iceberg_table"
+
+        - catalogId: "target-catalog-id-hms"
+          tableFormat: "ICEBERG"
+          tableIdentifier:
+             hierarchicalId: "db.iceberg_table"
+```
+
+</TabItem>
+</Tabs>
+
+:::note Note:
+1. `catalogId` is a user defined unique identifier for each catalog, useful if 
you want to sync a table to multiple glue/hms catalogs. 
+2. Replace with appropriate values for `hierarchicalId`, a 2-part or 3-part 
tableIdentifier. 
+3. For storage catalog, replace `file:///path/to/source/data` to appropriate 
`tableBasePath`
+   if you have your source table in S3/GCS/ADLS i.e.
+   * S3 - `s3://path/to/source/data`
+   * GCS - `gs://path/to/source/data` or
+   * ADLS - 
`abfss://<container-name>@<storage-account-name>.dfs.core.windows.net/<path-to-data>`
+4. For advanced configurations checkout java docs for 
[**GlueCatalogConfig**](https://github.com/apache/incubator-xtable/blob/main/xtable-aws/src/main/java/org/apache/xtable/glue/GlueCatalogConfig.java)
 and 
[**HMSCatalogConfig**](https://github.com/apache/incubator-xtable/blob/main/xtable-hive-metastore/src/main/java/org/apache/xtable/hms/HMSCatalogConfig.java).
+:::
+
+:::note Note:
+Authentication for AWS is done with 
`com.amazonaws.auth.DefaultAWSCredentialsProviderChain`.
+To override this setting, specify a different implementation with the 
`--awsCredentialsProvider` option.
+:::
+
+In your terminal under the cloned Apache XTable™ (Incubating) directory, run 
the below command.
+
+```shell md title="shell"
+java -cp 
xtable-utilities/target/xtable-utilities_2.12-0.2.0-SNAPSHOT-bundled.jar 
org.apache.xtable.utilities.RunCatalogSync --catalogSyncConfig 
my_config_catalog.yaml
+```
+
+**Optional:**
+Now, if you check your target catalog, you'll see the external tables have 
been created and are ready to be queried.
+
+
+## Conclusion
+In this tutorial, we explored how to set up Apache XTable™ (Incubating) to 
automatically read tables from a source catalog and create interoperable 
external tables in target catalogs.
+Once synced, these tables are immediately queryable from the target catalog—no 
manual registration needed.
\ No newline at end of file
diff --git a/website/docs/how-to.md b/website/docs/how-to.md
index a60f223f..31057ec5 100644
--- a/website/docs/how-to.md
+++ b/website/docs/how-to.md
@@ -363,5 +363,5 @@ In this tutorial, we saw how to create a source table and 
use Apache XTable™ (
 that can be used to query the source table in different target table formats.
 
 ## Next steps
-Go through the [Catalog Integration guides](/docs/catalogs-index) to register 
the Apache XTable™ (Incubating) synced tables
-in different data catalogs.
+1. Check out [Registering your interoperable tables across multiple 
catalogs](/docs/how-to-catalog-sync) to learn how to automatically register 
target tables in multiple catalogs using Apache XTable™ (Incubating).
+2. Explore the [Catalog Integration guides](/docs/catalogs-index) to register 
the Apache XTable™ (Incubating) synced tables in different data catalogs 
manually.
diff --git a/website/sidebars.js b/website/sidebars.js
index 5dd5816a..9f0b5594 100644
--- a/website/sidebars.js
+++ b/website/sidebars.js
@@ -18,7 +18,8 @@ module.exports = {
             label: 'Quick Start',
             collapsed: false,
             items: [
-                'how-to'
+                'how-to',
+                'how-to-catalog-sync'
             ],
         },
         {

Reply via email to