This is an automated email from the ASF dual-hosted git repository.

timbrown pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/incubator-xtable.git
The following commit(s) were added to refs/heads/main by this push:
     new df515157 adding info related to oss unity catalog to the docs
df515157 is described below

commit df515157e9b2a483542f0ff67507a1e03aeafc61
Author: Sagar Lakshmipathy <18vidhyasa...@gmail.com>
AuthorDate: Fri Jun 14 17:32:13 2024 -0700

    adding info related to oss unity catalog to the docs
---
 website/docs/unity-catalog.md | 34 ++++++++++++++++++++++++++++------
 1 file changed, 28 insertions(+), 6 deletions(-)

diff --git a/website/docs/unity-catalog.md b/website/docs/unity-catalog.md
index 25dca5c8..2467a321 100644
--- a/website/docs/unity-catalog.md
+++ b/website/docs/unity-catalog.md
@@ -4,9 +4,9 @@ title: "Unity Catalog"
 ---
 
 # Syncing to Unity Catalog
-This document walks through the steps to register an Apache XTable™ (Incubating) synced Delta table in Unity Catalog on Databricks.
+This document walks through the steps to register an Apache XTable™ (Incubating) synced Delta table in Unity Catalog on Databricks and open-source Unity Catalog.
 
-## Pre-requisites
+## Pre-requisites (for Databricks Unity Catalog)
 1. Source table(s) (Hudi/Iceberg) already written to external storage locations like S3/GCS/ADLS.
    If you don't have a source table written in S3/GCS/ADLS, you can follow the steps in [this](/docs/hms) tutorial to set it up.
 
@@ -19,6 +19,12 @@ This document walks through the steps to register an Apache XTable™ (Incubatin
 5. Clone the Apache XTable™ (Incubating) [repository](https://github.com/apache/incubator-xtable) and create the `xtable-utilities-0.1.0-SNAPSHOT-bundled.jar` by following the steps on the [Installation page](/docs/setup)
 
+## Pre-requisites (for open-source Unity Catalog)
+1. Source table(s) (Hudi/Iceberg) already written to external storage locations like S3/GCS/ADLS or local.
+   In this guide, we will use the local file system.  
+   But for S3/GCS/ADLS, you must add additional properties related to the respective cloud object storage system you're working with as mentioned [here](https://github.com/unitycatalog/unitycatalog/blob/main/docs/server.md)
+2. Clone the Unity Catalog repository from [here](https://github.com/unitycatalog/unitycatalog) and build the project by following the steps outlined [here](https://github.com/unitycatalog/unitycatalog?tab=readme-ov-file#prerequisites)
+
 ## Steps
 ### Running sync
 Create `my_config.yaml` in the cloned Apache XTable™ (Incubating) directory.
@@ -50,8 +56,8 @@ At this point, if you check your bucket path, you will be able to see `_delta_lo
 00000000000000000000.json which contains the logs that helps query engines to interpret the source table as a Delta table.
 :::
 
-### Register the target table in Unity Catalog
-In your Databricks workspace, under SQL editor, run the following queries.
+### Register the target table in Databricks Unity Catalog
+(After making sure you complete the pre-requisites mentioned for Databricks Unity Catalog above) In your Databricks workspace, under SQL editor, run the following queries.
 
 ```sql md title="SQL"
 CREATE CATALOG xtable;
@@ -75,8 +81,24 @@ You can now see the created delta table in **Unity Catalog** under **Catalog** a
 SELECT * FROM xtable.synced_delta_schema.<table_name>;
 ```
 
+### Register the target table in open-source Unity Catalog using the CLI
+(After making sure you complete the pre-requisites mentioned for open-source Unity Catalog above) In your terminal start the UC server by following the steps outlined [here](https://github.com/unitycatalog/unitycatalog/tree/main?tab=readme-ov-file#quickstart---hello-uc)
+
+In a different terminal, run the following commands to register the target table in Unity Catalog.  
+
+```shell md title="shell"
+bin/uc table create --full_name unity.default.people --columns "id INT, name STRING, age INT, city STRING, create_ts STRING" --storage_location /tmp/delta-dataset/people
+```
+
+### Validating the results
+You can now read the table registered in Unity Catalog using the below command.
+
+```shell md title="shell"
+bin/uc table read --full_name unity.default.people
+```
+
 ## Conclusion
 In this guide we saw how to,
 1. sync a source table to create metadata for the desired target table formats using Apache XTable™ (Incubating)
-2. catalog the data in Delta format in Unity Catalog on Databricks
-3. query the Delta table using Databricks SQL editor
+2. catalog the data in Delta format in Unity Catalog on Databricks, and also open-source Unity Catalog
+3. query the Delta table using Databricks SQL editor, and open-source Unity Catalog CLI.
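
Editorial sketch, not part of the commit: the two open-source Unity Catalog CLI steps added by this change (`table create`, then `table read`) could be wrapped in one small shell helper. The function names `run` and `register_and_read`, and the `UC_BIN` and `DRY_RUN` variables, are hypothetical and introduced only for illustration; the `bin/uc` subcommands and flags are the ones shown in the diff. The `_delta_log` check assumes the XTable sync has already written Delta metadata under the storage location, as the guide describes.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the two `bin/uc` commands from the diff.
# UC_BIN and DRY_RUN are illustrative names, not options of the UC CLI.
UC_BIN="${UC_BIN:-bin/uc}"

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Usage: register_and_read <full_name> <columns> <storage_location>
register_and_read() {
  # Refuse to register a location that has no Delta metadata yet.
  if [ ! -d "$3/_delta_log" ]; then
    echo "no _delta_log under $3; run the XTable sync first" >&2
    return 1
  fi
  # Register the synced Delta table, then read it back to validate.
  run "$UC_BIN" table create --full_name "$1" --columns "$2" --storage_location "$3" || return 1
  run "$UC_BIN" table read --full_name "$1"
}
```

With the UC server running in another terminal, this would be called from the cloned Unity Catalog directory as `register_and_read unity.default.people "id INT, name STRING, age INT, city STRING, create_ts STRING" /tmp/delta-dataset/people`. Prefixing the call with `DRY_RUN=1` only prints the two commands.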