This is an automated email from the ASF dual-hosted git repository.
yufei pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/polaris.git
The following commit(s) were added to refs/heads/main by this push:
new 97f9d0db1 Site: Use the overview page as the home page (#1605)
97f9d0db1 is described below
commit 97f9d0db1410a5c0228c7a155c4c6fa0b1570ee1
Author: Yufei Gu <[email protected]>
AuthorDate: Sun May 18 08:47:27 2025 -0700
Site: Use the overview page as the home page (#1605)
---
site/content/in-dev/unreleased/_index.md | 146 ++++++++++++++++++++++++-
site/content/in-dev/unreleased/overview.md | 166 -----------------------------
2 files changed, 144 insertions(+), 168 deletions(-)
diff --git a/site/content/in-dev/unreleased/_index.md
b/site/content/in-dev/unreleased/_index.md
index b4e147ecd..28ef3e255 100644
--- a/site/content/in-dev/unreleased/_index.md
+++ b/site/content/in-dev/unreleased/_index.md
@@ -18,7 +18,7 @@
# under the License.
#
linkTitle: 'In Development'
-title: 'Unreleased - current state of the main branch'
+title: 'Overview'
type: docs
weight: 200
params:
@@ -37,7 +37,149 @@ These pages refer to the current state of the main branch,
which is still under
Functionalities can be changed, removed or added without prior notice.
{{< /alert >}}
-Check out the [Quickstart]({{% ref "getting-started" %}}) page to get started.
+Apache Polaris (Incubating) is a catalog implementation for Apache
Iceberg™ tables and is built on the open source Apache Iceberg™
REST protocol.
+
+With Polaris, you can provide centralized, secure read and write access to
your Iceberg tables across different REST-compatible query engines.
+
+ overview")
+
+## Key concepts
+
+This section introduces key concepts associated with using Apache Polaris
(Incubating).
+
+In the following diagram, a sample [Apache Polaris (Incubating)
structure](#catalog) with nested [namespaces](#namespace) is shown for
Catalog1. No tables
+or namespaces have been created yet for Catalog2 or Catalog3.
+
+ structure")
+
+### Catalog
+
+In Polaris, you can create one or more catalog resources to organize Iceberg
tables.
+
+Configure your catalog by setting values in the storage configuration for S3,
Azure, or Google Cloud Storage. An Iceberg catalog enables a
+query engine to manage and organize tables. The catalog forms the first
architectural layer in the [Apache Iceberg™ table
specification](https://iceberg.apache.org/spec/#overview) and must support the
following tasks:
+
+- Storing the current metadata pointer for one or more Iceberg tables. A
metadata pointer maps a table name to the location of that table's
+ current metadata file.
+
+- Performing atomic operations so that you can update the current metadata
pointer for a table to the metadata pointer of a new version of
+ the table.
+
+To learn more about Iceberg catalogs, see the [Apache Iceberg™
documentation](https://iceberg.apache.org/concepts/catalog/).
+
+#### Catalog types
+
+A catalog can be one of the following two types:
+
+- Internal: The catalog is managed by Polaris. Tables from this catalog can be
read and written in Polaris.
+
+- External: The catalog is externally managed by another Iceberg catalog
provider (for example, Snowflake, Glue, Dremio Arctic). Tables from
+ this catalog are synced to Polaris. These tables are read-only in Polaris.
+
+A catalog is configured with a storage configuration that can point to S3,
Azure storage, or GCS.
+
+### Namespace
+
+You create *namespaces* to logically group Iceberg tables within a catalog. A
catalog can have multiple namespaces. You can also create
+nested namespaces. Iceberg tables belong to namespaces.
+
+> **Important**
+>
+> For the access privileges defined for a catalog to be enforced correctly,
the following conditions must be met:
+>
+> - The directory only contains the data files that belong to a single table.
+> - The directory hierarchy matches the namespace hierarchy for the catalog.
+>
+> For example, if a catalog includes the following items:
+>
+> - Top-level namespace namespace1
+> - Nested namespace namespace1a
+> - A customers table, which is grouped under nested namespace namespace1a
+> - An orders table, which is grouped under nested namespace namespace1a
+>
+> The directory hierarchy for the catalog must follow this structure:
+>
+> - /namespace1/namespace1a/customers/<files for the customers table *only*>
+> - /namespace1/namespace1a/orders/<files for the orders table *only*>
+
+### Storage configuration
+
+A storage configuration stores a generated identity and access management
(IAM) entity for your cloud storage and is created
+when you create a catalog. The storage configuration is used to set the values
to connect Polaris to your cloud storage. During the
+catalog creation process, an IAM entity is generated and used to create a
trust relationship between the cloud storage provider and Polaris
+Catalog.
+
+When you create a catalog, you supply the following information about your
cloud storage:
+
+| Cloud storage provider | Information |
+| -----------------------| ----------- |
+| Amazon S3 | <ul><li>Default base location for your Amazon S3
bucket</li><li>Locations for your Amazon S3 bucket</li><li>S3 role
ARN</li><li>External ID (optional)</li></ul> |
+| Google Cloud Storage (GCS) | <ul><li>Default base location for your GCS
bucket</li><li>Locations for your GCS bucket</li></ul> |
+| Azure | <ul><li>Default base location for your Microsoft Azure
container</li><li>Locations for your Microsoft Azure container</li><li>Azure
tenant ID</li></ul> |
+
+## Example workflow
+
+In the following example workflow, Bob creates an Apache Iceberg™ table
named Table1 and Alice reads data from Table1.
+
+1. Bob uses Apache Spark™ to create the Table1 table under the
+ Namespace1 namespace in the Catalog1 catalog and insert values into
+ Table1.
+
+ Bob can create Table1 and insert data into it because he is using a
+ service connection with a service principal that has
+ the privileges to perform these actions.
+
+2. Alice uses Snowflake to read data from Table1.
+
+ Alice can read data from Table1 because she is using a service
+ connection with a service principal with a catalog integration that
+ has the privileges to perform this action. Alice
+ creates an unmanaged table in Snowflake to read data from Table1.
+
+")
+
+## Security and access control
+
+### Credential vending
+
+To secure interactions with service connections, Polaris vends temporary
storage credentials to the query engine during query
+execution. These credentials allow the query engine to run the query without
requiring access to your cloud storage for
+Iceberg tables. This process is called credential vending.
+
+As of now, the following limitation is known regarding Apache Iceberg support:
+
+- **remove_orphan_files:** Apache Spark can't use credential vending
+ for this due to a known issue. See
[apache/iceberg#7914](https://github.com/apache/iceberg/pull/7914) for details.
+
+### Identity and access management (IAM)
+
+Polaris uses the identity and access management (IAM) entity to securely
connect to your storage for accessing table data, Iceberg
+metadata, and manifest files that store the table schema, partitions, and
other metadata. Polaris retains the IAM entity for your
+storage location.
+
+### Access control
+
+Polaris enforces the access control that you configure across all tables
registered with the service and governs security for all
+queries from query engines in a consistent manner.
+
+Polaris uses a role-based access control (RBAC) model that lets you centrally
configure access for Polaris service principals to catalogs,
+namespaces, and tables.
+
+Polaris RBAC uses two different role types to delegate privileges:
+
+- **Principal roles:** Granted to Polaris service principals and
+ analogous to roles in other access control systems that you grant to
+ service principals.
+
+- **Catalog roles:** Configured with certain privileges on Polaris
+ catalog resources and granted to principal roles.
+
+For more information, see [Access control]({{% ref "access-control" %}}).
+
+## Legal Notices
+
+Apache®, Apache Iceberg™, Apache Spark™, Apache Flink®,
and Flink® are either registered trademarks or trademarks of the Apache
Software Foundation in the United States and/or other countries.
+
<!--
Testing the `releaseVersion` shortcode here: version is: {{< releaseVersion >}}
diff --git a/site/content/in-dev/unreleased/overview.md
b/site/content/in-dev/unreleased/overview.md
deleted file mode 100644
index 3c5713a15..000000000
--- a/site/content/in-dev/unreleased/overview.md
+++ /dev/null
@@ -1,166 +0,0 @@
----
-#
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements. See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership. The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License. You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing,
-# software distributed under the License is distributed on an
-# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-# KIND, either express or implied. See the License for the
-# specific language governing permissions and limitations
-# under the License.
-#
-Title: Overview
-type: docs
-weight: 200
----
-
-Apache Polaris (Incubating) is a catalog implementation for Apache
Iceberg™ tables and is built on the open source Apache Iceberg™
REST protocol.
-
-With Polaris, you can provide centralized, secure read and write access to
your Iceberg tables across different REST-compatible query engines.
-
- overview")
-
-## Key concepts
-
-This section introduces key concepts associated with using Apache Polaris
(Incubating).
-
-In the following diagram, a sample [Apache Polaris (Incubating)
structure](#catalog) with nested [namespaces](#namespace) is shown for
Catalog1. No tables
-or namespaces have been created yet for Catalog2 or Catalog3.
-
- structure")
-
-### Catalog
-
-In Polaris, you can create one or more catalog resources to organize Iceberg
tables.
-
-Configure your catalog by setting values in the storage configuration for S3,
Azure, or Google Cloud Storage. An Iceberg catalog enables a
-query engine to manage and organize tables. The catalog forms the first
architectural layer in the [Apache Iceberg™ table
specification](https://iceberg.apache.org/spec/#overview) and must support the
following tasks:
-
-- Storing the current metadata pointer for one or more Iceberg tables. A
metadata pointer maps a table name to the location of that table's
- current metadata file.
-
-- Performing atomic operations so that you can update the current metadata
pointer for a table to the metadata pointer of a new version of
- the table.
-
-To learn more about Iceberg catalogs, see the [Apache Iceberg™
documentation](https://iceberg.apache.org/concepts/catalog/).
-
-#### Catalog types
-
-A catalog can be one of the following two types:
-
-- Internal: The catalog is managed by Polaris. Tables from this catalog can be
read and written in Polaris.
-
-- External: The catalog is externally managed by another Iceberg catalog
provider (for example, Snowflake, Glue, Dremio Arctic). Tables from
- this catalog are synced to Polaris. These tables are read-only in Polaris.
-
-A catalog is configured with a storage configuration that can point to S3,
Azure storage, or GCS.
-
-### Namespace
-
-You create *namespaces* to logically group Iceberg tables within a catalog. A
catalog can have multiple namespaces. You can also create
-nested namespaces. Iceberg tables belong to namespaces.
-
-> **Important**
->
-> For the access privileges defined for a catalog to be enforced correctly,
the following conditions must be met:
->
-> - The directory only contains the data files that belong to a single table.
-> - The directory hierarchy matches the namespace hierarchy for the catalog.
->
-> For example, if a catalog includes the following items:
->
-> - Top-level namespace namespace1
-> - Nested namespace namespace1a
-> - A customers table, which is grouped under nested namespace namespace1a
-> - An orders table, which is grouped under nested namespace namespace1a
->
-> The directory hierarchy for the catalog must follow this structure:
->
-> - /namespace1/namespace1a/customers/<files for the customers table *only*>
-> - /namespace1/namespace1a/orders/<files for the orders table *only*>
-
-### Storage configuration
-
-A storage configuration stores a generated identity and access management
(IAM) entity for your cloud storage and is created
-when you create a catalog. The storage configuration is used to set the values
to connect Polaris to your cloud storage. During the
-catalog creation process, an IAM entity is generated and used to create a
trust relationship between the cloud storage provider and Polaris
-Catalog.
-
-When you create a catalog, you supply the following information about your
cloud storage:
-
-| Cloud storage provider | Information |
-| -----------------------| ----------- |
-| Amazon S3 | <ul><li>Default base location for your Amazon S3
bucket</li><li>Locations for your Amazon S3 bucket</li><li>S3 role
ARN</li><li>External ID (optional)</li></ul> |
-| Google Cloud Storage (GCS) | <ul><li>Default base location for your GCS
bucket</li><li>Locations for your GCS bucket</li></ul> |
-| Azure | <ul><li>Default base location for your Microsoft Azure
container</li><li>Locations for your Microsoft Azure container</li><li>Azure
tenant ID</li></ul> |
-
-## Example workflow
-
-In the following example workflow, Bob creates an Apache Iceberg™ table
named Table1 and Alice reads data from Table1.
-
-1. Bob uses Apache Spark™ to create the Table1 table under the
- Namespace1 namespace in the Catalog1 catalog and insert values into
- Table1.
-
- Bob can create Table1 and insert data into it because he is using a
- service connection with a service principal that has
- the privileges to perform these actions.
-
-2. Alice uses Snowflake to read data from Table1.
-
- Alice can read data from Table1 because she is using a service
- connection with a service principal with a catalog integration that
- has the privileges to perform this action. Alice
- creates an unmanaged table in Snowflake to read data from Table1.
-
-")
-
-## Security and access control
-
-### Credential vending
-
-To secure interactions with service connections, Polaris vends temporary
storage credentials to the query engine during query
-execution. These credentials allow the query engine to run the query without
requiring access to your cloud storage for
-Iceberg tables. This process is called credential vending.
-
-As of now, the following limitation is known regarding Apache Iceberg support:
-
-- **remove_orphan_files:** Apache Spark can't use credential vending
- for this due to a known issue. See
[apache/iceberg#7914](https://github.com/apache/iceberg/pull/7914) for details.
-
-### Identity and access management (IAM)
-
-Polaris uses the identity and access management (IAM) entity to securely
connect to your storage for accessing table data, Iceberg
-metadata, and manifest files that store the table schema, partitions, and
other metadata. Polaris retains the IAM entity for your
-storage location.
-
-### Access control
-
-Polaris enforces the access control that you configure across all tables
registered with the service and governs security for all
-queries from query engines in a consistent manner.
-
-Polaris uses a role-based access control (RBAC) model that lets you centrally
configure access for Polaris service principals to catalogs,
-namespaces, and tables.
-
-Polaris RBAC uses two different role types to delegate privileges:
-
-- **Principal roles:** Granted to Polaris service principals and
- analogous to roles in other access control systems that you grant to
- service principals.
-
-- **Catalog roles:** Configured with certain privileges on Polaris
- catalog resources and granted to principal roles.
-
-For more information, see [Access control]({{% ref "access-control" %}}).
-
-## Legal Notices
-
-Apache®, Apache Iceberg™, Apache Spark™, Apache Flink®,
and Flink® are either registered trademarks or trademarks of the Apache
Software Foundation in the United States and/or other countries.