This is an automated email from the ASF dual-hosted git repository.

ashvin pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/incubator-xtable.git
commit c07eef19dac205ff4904dd75fdfd45144eba2c98 Author: Kyle Weller <[email protected]> AuthorDate: Thu Mar 7 23:19:43 2024 -0800 made sure (incubating) clarification shows up --- website/README.md | 4 ++-- website/docs/athena.md | 6 +++--- website/docs/biglake-metastore.md | 6 +++--- website/docs/bigquery.md | 6 +++--- website/docs/demo/docker.md | 2 +- website/docs/fabric.md | 8 ++++---- website/docs/features-and-limitations.md | 2 +- website/docs/glue-catalog.md | 6 +++--- website/docs/hms.md | 2 +- website/docs/how-to.md | 4 ++-- website/docs/presto.md | 12 ++++++------ website/docs/query-engines-index.md | 2 +- website/docs/redshift.md | 4 ++-- website/docs/trino.md | 8 ++++---- website/docs/unity-catalog.md | 2 +- website/docusaurus.config.js | 2 +- website/static/index.html | 12 ++++++------ 17 files changed, 44 insertions(+), 44 deletions(-) diff --git a/website/README.md b/website/README.md index 74803fb6..0e45f99d 100644 --- a/website/README.md +++ b/website/README.md @@ -1,6 +1,6 @@ # Apache XTable™ (Incubating) Website Source Code -This repo hosts the source code of [Apache XTable™](https://github.com/apache/incubator-xtable) +This repo hosts the source code of [Apache XTable™ (Incubating)](https://github.com/apache/incubator-xtable) ## Prerequisite @@ -63,4 +63,4 @@ npm run serve ## Maintainers -[Apache XTable™ Community](https://incubator.apache.org/projects/xtable.html) +[Apache XTable™ (Incubating) Community](https://incubator.apache.org/projects/xtable.html) diff --git a/website/docs/athena.md b/website/docs/athena.md index 79b1992d..8ee8eab0 100644 --- a/website/docs/athena.md +++ b/website/docs/athena.md @@ -4,7 +4,7 @@ title: "Amazon Athena" --- # Querying from Amazon Athena -To read an Apache XTable™ synced target table (regardless of the table format) in Amazon Athena, +To read an Apache XTable™ (Incubating) synced target table (regardless of the table format) in Amazon Athena, you can create the table either by: * Using a DDL statement as 
mentioned in the following AWS docs: * [Example](https://docs.aws.amazon.com/athena/latest/ug/querying-hudi.html#querying-hudi-in-athena-creating-hudi-tables) for Hudi @@ -12,8 +12,8 @@ you can create the table either by: * [Example](https://docs.aws.amazon.com/athena/latest/ug/querying-iceberg-creating-tables.html#querying-iceberg-creating-tables-query-editor) for Iceberg * Or maintain the tables in Glue Data Catalog -For an end to end tutorial that walks through S3, Glue Data Catalog and Athena to query an Apache XTable™ synced table, -you can refer to the Apache XTable™ [Glue Data Catalog Guide](/docs/glue-catalog). +For an end to end tutorial that walks through S3, Glue Data Catalog and Athena to query an Apache XTable™ (Incubating) synced table, +you can refer to the Apache XTable™ (Incubating) [Glue Data Catalog Guide](/docs/glue-catalog). :::danger LIMITATION for Hudi target format: To validate the Hudi targetFormat table results, you need to ensure that the query engine that you're using diff --git a/website/docs/biglake-metastore.md b/website/docs/biglake-metastore.md index 6f095682..bceaa6f4 100644 --- a/website/docs/biglake-metastore.md +++ b/website/docs/biglake-metastore.md @@ -13,10 +13,10 @@ This document walks through the steps to register an Apache XTable™ (Incubatin 1. Source (Hudi/Delta) table(s) already written to Google Cloud Storage. If you don't have the source table written in GCS, you can follow the steps in [this](/docs/how-to#create-dataset) tutorial to set it up. -2. To ensure that the BigLake API's caller (your service account used by XTable) has the +2. To ensure that the BigLake API's caller (your service account used by Apache XTable™ (Incubating)) has the necessary permissions to create a BigLake table, ask your administrator to grant [BigLake Admin](https://cloud.google.com/iam/docs/understanding-roles#biglake.admin) (roles/bigquery.admin) access to the service account. -3. 
To ensure that the Storage Account API's caller (your service account used by XTable) has the +3. To ensure that the Storage Account API's caller (your service account used by Apache XTable™ (Incubating)) has the necessary permissions to write log/metadata files in GCS, ask your administrator to grant [Storage Object User](https://cloud.google.com/storage/docs/access-control/iam-roles) (roles/storage.objectUser) access to the service account. 4. If you're running Apache XTable™ (Incubating) outside GCP, you need to provide the machine access to interact with BigLake and GCS. @@ -137,6 +137,6 @@ projects/<yourProjectName>/locations/us-west1/catalogs/onetable/databases/onetab ## Conclusion In this guide we saw how to, -1. sync a source table to create Iceberg metadata with XTable +1. sync a source table to create Iceberg metadata with Apache XTable™ (Incubating) 2. catalog the data as an Iceberg table in BigLake Metastore 3. validate the table creation using `projects.locations.catalogs.databases.tables.get` method diff --git a/website/docs/bigquery.md b/website/docs/bigquery.md index 73a8e3b1..934f86e1 100644 --- a/website/docs/bigquery.md +++ b/website/docs/bigquery.md @@ -10,7 +10,7 @@ To read an Apache XTable™ (Incubating) synced [Iceberg table from BigQuery](ht you have two options: #### [Using Iceberg JSON metadata file to create the Iceberg BigLake tables](https://cloud.google.com/bigquery/docs/iceberg-tables#create-using-metadata-file): -Apache XTable™ outputs metadata files for Iceberg target format syncs which can be used by BigQuery +Apache XTable™ (Incubating) outputs metadata files for Iceberg target format syncs which can be used by BigQuery to read the BigLake tables. 
```sql md title="sql" @@ -48,8 +48,8 @@ If you are not planning on using Iceberg, then you do not need to add these to y #### [Using BigLake Metastore to create the Iceberg BigLake tables](https://cloud.google.com/bigquery/docs/iceberg-tables#create-using-biglake-metastore): -You can use two options to register Apache XTable™ synced Iceberg tables to BigLake Metastore: -* To directly register the Apache XTable™ synced Iceberg table to BigLake Metastore, +You can use two options to register Apache XTable™ (Incubating) synced Iceberg tables to BigLake Metastore: +* To directly register the Apache XTable™ (Incubating) synced Iceberg table to BigLake Metastore, follow the [Apache XTable™ guide to integrate with BigLake Metastore](/docs/biglake-metastore) * Use [stored procedures for Spark](https://cloud.google.com/bigquery/docs/spark-procedures) on BigQuery to register the table in BigLake Metastore and query the tables from BigQuery. diff --git a/website/docs/demo/docker.md b/website/docs/demo/docker.md index eaf4d17b..1ee4754c 100644 --- a/website/docs/demo/docker.md +++ b/website/docs/demo/docker.md @@ -10,7 +10,7 @@ For this purpose, a self-contained data infrastructure is brought up as Docker c ## Pre-requisites * Install Docker in your local machine -* Clone [Apache XTable™ GitHub repository](https://github.com/apache/incubator-xtable) +* Clone [Apache XTable™ (Incubating) GitHub repository](https://github.com/apache/incubator-xtable) :::note NOTE: This demo was tested in both x86-64 and AArch64 based macOS operating systems diff --git a/website/docs/fabric.md b/website/docs/fabric.md index f9ceee6f..3067bf56 100644 --- a/website/docs/fabric.md +++ b/website/docs/fabric.md @@ -9,7 +9,7 @@ import TabItem from '@theme/TabItem'; # Querying from Microsoft Fabric This guide offers a short tutorial on how to query Apache Iceberg and Apache Hudi tables in Microsoft Fabric utilizing the translation capabilities of Apache XTable™ (Incubating). 
This tutorial is intended solely for demonstration and to verify the -compatibility of Apache XTable™'s output with Fabric. The tutorial leverages the currently[^1] available features in Fabric, like +compatibility of Apache XTable™ (Incubating) output with Fabric. The tutorial leverages the currently[^1] available features in Fabric, like `Shortcuts`. @@ -26,7 +26,7 @@ to data in other file systems. ## Tutorial The objective of the following tutorial is to translate an Iceberg or Hudi table in ADLS storage account into Delta Lake -format using Apache XTable™. After translation, this table will be accessible for querying from various Fabric engines, +format using Apache XTable™ (Incubating). After translation, this table will be accessible for querying from various Fabric engines, including T-SQL, Spark, and Power BI. ### Pre-requisites @@ -51,8 +51,8 @@ spark.hadoop.fs.azure.account.oauth2.client.id=<client-id> spark.hadoop.fs.azure.account.oauth2.client.secret=<client-secret> ``` -### Step 2. Translate source table to Delta Lake format using Apache XTable™ -This step translates the table `people` originally in Iceberg or Hudi format to Delta Lake format using Apache XTable™. +### Step 2. Translate source table to Delta Lake format using Apache XTable™ (Incubating) +This step translates the table `people` originally in Iceberg or Hudi format to Delta Lake format using Apache XTable™ (Incubating). The primary actions for the translation are documented in [Creating your first interoperable table - Running Sync](/docs/how-to#running-sync) tutorial section. However, since the table is in ADLS, you need to update datasets path and hadoop configurations. 
diff --git a/website/docs/features-and-limitations.md b/website/docs/features-and-limitations.md index b5a5c8cb..c6e33312 100644 --- a/website/docs/features-and-limitations.md +++ b/website/docs/features-and-limitations.md @@ -10,7 +10,7 @@ import TabItem from '@theme/TabItem'; ## Features Apache XTable™ (Incubating) provides users with the ability to translate metadata from one table format to another. -Apache XTable™ provides two sync modes, "incremental" and "full." The incremental mode is more lightweight and has better performance, especially on large tables. If there is anything that prevents the incremental mode from working properly, the tool will fall back to the full sync mode. +Apache XTable™ (Incubating) provides two sync modes, "incremental" and "full." The incremental mode is more lightweight and has better performance, especially on large tables. If there is anything that prevents the incremental mode from working properly, the tool will fall back to the full sync mode. This sync provides users with the following: 1. Syncing of data files along with their column level statistics and partition metadata diff --git a/website/docs/glue-catalog.md b/website/docs/glue-catalog.md index ebc6e687..1d8f970e 100644 --- a/website/docs/glue-catalog.md +++ b/website/docs/glue-catalog.md @@ -18,12 +18,12 @@ This document walks through the steps to register an Apache XTable™ (Incubatin [AWS docs](https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html) and also set up access credentials by following the steps [here](https://docs.aws.amazon.com/cli/latest/userguide/getting-started-quickstart.html) -3. Clone the Apache XTable™ [repository](https://github.com/apache/incubator-xtable) and create the +3. 
Clone the Apache XTable™ (Incubating) [repository](https://github.com/apache/incubator-xtable) and create the `utilities-0.1.0-SNAPSHOT-bundled.jar` by following the steps on the [Installation page](/docs/setup) ## Steps ### Running sync -Create `my_config.yaml` in the cloned Apache XTable™ directory. +Create `my_config.yaml` in the cloned Apache XTable™ (Incubating) directory. <Tabs groupId="table-format" @@ -193,6 +193,6 @@ SELECT * FROM onetable_synced_db.<table_name>; ## Conclusion In this guide we saw how to, -1. sync a source table to create metadata for the desired target table formats using Apache XTable™ +1. sync a source table to create metadata for the desired target table formats using Apache XTable™ (Incubating) 2. catalog the data in the target table format in Glue Data Catalog 3. query the target table using Amazon Athena diff --git a/website/docs/hms.md b/website/docs/hms.md index 7efbb4d7..21f3820c 100644 --- a/website/docs/hms.md +++ b/website/docs/hms.md @@ -223,6 +223,6 @@ SELECT * FROM iceberg_db.<table_name>; ## Conclusion In this guide we saw how to, -1. sync a source table to create metadata for the desired target table formats using XTable +1. sync a source table to create metadata for the desired target table formats using Apache XTable™ (Incubating) 2. catalog the data in the target table format in Hive Metastore 3. query the target table using Spark diff --git a/website/docs/how-to.md b/website/docs/how-to.md index d9902139..fcfc4784 100644 --- a/website/docs/how-to.md +++ b/website/docs/how-to.md @@ -15,7 +15,7 @@ on the [Installation page](/docs/setup). Read through Apache XTable™'s [GitHub page](https://github.com/apache/incubator-xtable#building-the-project-and-running-tests) for more information. ::: -In this tutorial we will look at how to use Apache XTable™ to add interoperability between table formats. +In this tutorial we will look at how to use Apache XTable™ (Incubating) to add interoperability between table formats. 
For example, you can expose a table ingested with Hudi as an Iceberg and/or Delta Lake table without copying or moving the underlying data files used for that table while maintaining a similar commit history to enable proper point in time queries. @@ -23,7 +23,7 @@ history to enable proper point in time queries. ## Pre-requisites 1. A compute instance where you can run Apache Spark. This can be your local machine, docker, or a distributed service like Amazon EMR, Google Cloud's Dataproc, Azure HDInsight etc -2. Clone the Apache XTable™ [repository](https://github.com/apache/incubator-xtable) and create the +2. Clone the Apache XTable™ (Incubating) [repository](https://github.com/apache/incubator-xtable) and create the `utilities-0.1.0-SNAPSHOT-bundled.jar` by following the steps on the [Installation page](/docs/setup) 3. Optional: Setup access to write to and/or read from distributed storage services like: * Amazon S3 by following the steps diff --git a/website/docs/presto.md b/website/docs/presto.md index 6f49da84..c91c40cf 100644 --- a/website/docs/presto.md +++ b/website/docs/presto.md @@ -19,12 +19,12 @@ For more information and required configurations refer to: :::danger Delta Lake: Delta Lake supports [generated columns](https://docs.databricks.com/en/delta/generated-columns.html) which are a special type of column whose values are automatically generated based on a user-specified function -over other columns in the Delta table. During sync, Apache XTable™ uses the same logic to generate partition columns wherever required. -Currently, the generated columns from Apache XTable™ sync shows `NULL` when queried from Presto CLI. +over other columns in the Delta table. During sync, Apache XTable™ (Incubating) uses the same logic to generate partition columns wherever required. +Currently, the generated columns from Apache XTable™ (Incubating) sync show `NULL` when queried from Presto CLI. 
::: For hands on experimentation, please follow [Creating your first interoperable table](/docs/how-to) tutorial -to create Apache XTable™ synced tables followed by [Hive Metastore](/docs/hms) tutorial to register the target table +to create Apache XTable™ (Incubating) synced tables followed by [Hive Metastore](/docs/hms) tutorial to register the target table in Hive Metastore. Once done, follow the below high level steps: 1. If you are working with a self-managed Presto service, from the presto-server directory run `./bin/launcher run` 2. From the directory where you have installed presto-cli: login to presto-cli by running `./presto-cli` @@ -42,7 +42,7 @@ values={[ <TabItem value="hudi"> :::tip Note: -If you are following the example from [Hive Metastore](/docs/hms), you can query the Apache XTable™ synced Hudi table +If you are following the example from [Hive Metastore](/docs/hms), you can query the Apache XTable™ (Incubating) synced Hudi table from Presto using the below query. ```sql md title="sql" SELECT * FROM hudi.hudi_db.<table_name>; @@ -53,7 +53,7 @@ SELECT * FROM hudi.hudi_db.<table_name>; <TabItem value="delta"> :::tip Note: -If you are following the example from [Hive Metastore](/docs/hms), you can query the Apache XTable™ synced Delta table +If you are following the example from [Hive Metastore](/docs/hms), you can query the Apache XTable™ (Incubating) synced Delta table from Presto using the below query. ```sql md title="sql" SELECT * FROM delta.delta_db.<table_name>; @@ -64,7 +64,7 @@ SELECT * FROM delta.delta_db.<table_name>; <TabItem value="iceberg"> :::tip Note: -If you are following the example from [Hive Metastore](/docs/hms), you can query the Apache XTable™ synced Iceberg table +If you are following the example from [Hive Metastore](/docs/hms), you can query the Apache XTable™ (Incubating) synced Iceberg table from Presto using the below query. 
```sql md title="sql" SELECT * FROM iceberg.iceberg_db.<table_name>; diff --git a/website/docs/query-engines-index.md b/website/docs/query-engines-index.md index f852c523..f5a10859 100644 --- a/website/docs/query-engines-index.md +++ b/website/docs/query-engines-index.md @@ -1,7 +1,7 @@ # Query Engines Apache XTable™ (Incubating) synced tables behave the similarly to native tables which means you do not need any additional configurations -on query engines' side to work with tables synced by Apache XTable™. This guide will delve into the details of working +on query engines' side to work with tables synced by Apache XTable™ (Incubating). This guide will delve into the details of working with various query engines. For more information on how to sync a source format table to create necessary log files to be inferred as a different format table, refer to [Creating your first interoperable table guide](/docs/how-to) diff --git a/website/docs/redshift.md b/website/docs/redshift.md index 64c2f4b4..ed62ba79 100644 --- a/website/docs/redshift.md +++ b/website/docs/redshift.md @@ -4,7 +4,7 @@ title: "Amazon Redshift Spectrum" --- # Querying from Redshift Spectrum -To read a Apache XTable™ synced target table (regardless of the table format) in Amazon Redshift, +To read an Apache XTable™ (Incubating) synced target table (regardless of the table format) in Amazon Redshift, users have to create an external schema and refer to the external data catalog that contains the table. Redshift infers the table's schema and format from the external catalog/database directly. For more information on creating external schemas, refer to @@ -41,7 +41,7 @@ You have two options to create and query Delta tables in Redshift Spectrum: 1. 
Follow the steps in [this](https://docs.delta.io/latest/redshift-spectrum-integration.html#set-up-a-redshift-spectrum-to-delta-lake-integration-and-query-delta-tables) article to set up a Redshift Spectrum to Delta Lake integration and query Delta tables directly from Amazon S3. -2. While creating the Glue Crawler to crawl the Apache XTable™ synced Delta table, choose the `Create Symlink tables` +2. While creating the Glue Crawler to crawl the Apache XTable™ (Incubating) synced Delta table, choose the `Create Symlink tables` option in `Add data source` pop-up window. This will add `_symlink_format_manifest` folder with manifest files in the table root path. diff --git a/website/docs/trino.md b/website/docs/trino.md index b870b713..b3a6d3f5 100644 --- a/website/docs/trino.md +++ b/website/docs/trino.md @@ -17,7 +17,7 @@ For more information and required configurations refer to: * [Iceberg Connector](https://trino.io/docs/current/connector/iceberg.html) For hands on experimentation, please follow [Creating your first interoperable table](/docs/how-to#create-dataset) -to create Apache XTable™ synced tables followed by [Hive Metastore](/docs/hms) to register the target table +to create Apache XTable™ (Incubating) synced tables followed by [Hive Metastore](/docs/hms) to register the target table in Hive Metastore. Once done, please follow the below high level steps: 1. Start the Trino server manually if you are working with a non-managed Trino service: from the trino-server directory run `./bin/launcher run` @@ -36,7 +36,7 @@ values={[ <TabItem value="hudi"> :::tip Note: -If you are following the example from [Hive Metastore](/docs/hms), you can query the Apache XTable™ synced Hudi table +If you are following the example from [Hive Metastore](/docs/hms), you can query the Apache XTable™ (Incubating) synced Hudi table from Trino using the below query. 
```sql md title="sql" SELECT * FROM hudi.hudi_db.<table_name>; @@ -47,7 +47,7 @@ SELECT * FROM hudi.hudi_db.<table_name>; <TabItem value="delta"> :::tip Note: -If you are following the example from [Hive Metastore](/docs/hms), you can query the Apache XTable™ synced Delta table +If you are following the example from [Hive Metastore](/docs/hms), you can query the Apache XTable™ (Incubating) synced Delta table from Trino using the below query. ```sql md title="sql" SELECT * FROM delta.delta_db.<table_name>; @@ -58,7 +58,7 @@ SELECT * FROM delta.delta_db.<table_name>; <TabItem value="iceberg"> :::tip Note: -If you are following the example from [Hive Metastore](/docs/hms), you can query the Apache XTable™ synced Iceberg table +If you are following the example from [Hive Metastore](/docs/hms), you can query the Apache XTable™ (Incubating) synced Iceberg table from Trino using the below query. ```sql md title="sql" SELECT * FROM iceberg.iceberg_db.<table_name>; diff --git a/website/docs/unity-catalog.md b/website/docs/unity-catalog.md index d2de83d9..0b8819ff 100644 --- a/website/docs/unity-catalog.md +++ b/website/docs/unity-catalog.md @@ -77,6 +77,6 @@ SELECT * FROM onetable.synced_delta_schema.<table_name>; ## Conclusion In this guide we saw how to, -1. sync a source table to create metadata for the desired target table formats using XTable +1. sync a source table to create metadata for the desired target table formats using Apache XTable™ (Incubating) 2. catalog the data in Delta format in Unity Catalog on Databricks 3. 
query the Delta table using Databricks SQL editor diff --git a/website/docusaurus.config.js b/website/docusaurus.config.js index 5426f033..8edfec1e 100644 --- a/website/docusaurus.config.js +++ b/website/docusaurus.config.js @@ -6,7 +6,7 @@ const darkCodeTheme = require('prism-react-renderer/themes/dracula'); /** @type {import('@docusaurus/types').Config} */ const config = { - title: 'Apache XTable™', + title: 'Apache XTable™ (Incubating)', favicon: 'images/xtable-favicon.png', url: 'https://onetable.dev', baseUrl: '/', diff --git a/website/static/index.html b/website/static/index.html index 7f89c3af..cdf744f3 100644 --- a/website/static/index.html +++ b/website/static/index.html @@ -3,13 +3,13 @@ <html data-wf-page="65402b66d39d6454e51fabf1" data-wf-site="65402b66d39d6454e51fabed"> <head> <meta charset="utf-8"> - <title>Apache XTable™</title> - <meta content="Apache XTable™ is a cross-table interop of lakehouse table formats Apache Hudi, Apache Iceberg, and Delta Lake. Apache XTable™ is NOT a new or separate format, Apache XTable™ provides abstractions and tools for the translation of lakehouse table format metadata." name="description"> - <meta content="Apache XTable™" property="og:title"> - <meta content="Apache XTable™ is a cross-table interop of lakehouse table formats Apache Hudi, Apache Iceberg, and Delta Lake. Apache XTable™ is NOT a new or separate format, Apache XTable™ provides abstractions and tools for the translation of lakehouse table format metadata." property="og:description"> + <title>Apache XTable™ (Incubating)</title> + <meta content="Apache XTable™ (Incubating) is a cross-table interop of lakehouse table formats Apache Hudi, Apache Iceberg, and Delta Lake. Apache XTable™ is NOT a new or separate format, Apache XTable™ provides abstractions and tools for the translation of lakehouse table format metadata." 
name="description"> + <meta content="Apache XTable™ (Incubating)" property="og:title"> + <meta content="Apache XTable™ (Incubating) is a cross-table interop of lakehouse table formats Apache Hudi, Apache Iceberg, and Delta Lake. Apache XTable™ is NOT a new or separate format, Apache XTable™ provides abstractions and tools for the translation of lakehouse table format metadata." property="og:description"> <meta content="https://uploads-ssl.webflow.com/65402b66d39d6454e51fabed/654c643bf9cebe95d6c3dd50_Group%201562%20(1).png" property="og:image"> - <meta content="Apache XTable™" property="twitter:title"> - <meta content="Apache XTable™ is a cross-table interop of lakehouse table formats Apache Hudi, Apache Iceberg, and Delta Lake. Apache XTable™ is NOT a new or separate format, Apache XTable™ provides abstractions and tools for the translation of lakehouse table format metadata." property="twitter:description"> + <meta content="Apache XTable™ (Incubating)" property="twitter:title"> + <meta content="Apache XTable™ (Incubating) is a cross-table interop of lakehouse table formats Apache Hudi, Apache Iceberg, and Delta Lake. Apache XTable™ is NOT a new or separate format, Apache XTable™ provides abstractions and tools for the translation of lakehouse table format metadata." property="twitter:description"> <meta content="https://uploads-ssl.webflow.com/65402b66d39d6454e51fabed/654c643bf9cebe95d6c3dd50_Group%201562%20(1).png" property="twitter:image"> <meta property="og:type" content="website"> <meta content="summary_large_image" name="twitter:card">
