snazy commented on code in PR #3022:
URL: https://github.com/apache/polaris/pull/3022#discussion_r2517114660
##########
getting-started/ceph/README.md:
##########
@@ -0,0 +1,147 @@
+<!--
+  Licensed to the Apache Software Foundation (ASF) under one
+  or more contributor license agreements.  See the NOTICE file
+  distributed with this work for additional information
+  regarding copyright ownership.  The ASF licenses this file
+  to you under the Apache License, Version 2.0 (the
+  "License"); you may not use this file except in compliance
+  with the License.  You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing,
+  software distributed under the License is distributed on an
+  "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  KIND, either express or implied.  See the License for the
+  specific language governing permissions and limitations
+  under the License.
+-->
+
+# Getting Started with Apache Polaris and Ceph
+
+## Overview
+
+This guide describes how to spin up a **single-node Ceph cluster** with **RADOS Gateway (RGW)** for S3-compatible storage and configure it for use by **Polaris**.
+
+This example cluster is configured for basic access key authentication only.
+It does not include STS (Security Token Service) or temporary credentials.
+All access to the Ceph RGW (RADOS Gateway) and Polaris integration uses static S3-style credentials (as configured via radosgw-admin user create).
+
+Spark is used as a query engine. This example assumes a local Spark installation.
+See the [Spark Notebooks Example](../spark/README.md) for a more advanced Spark setup.
+
+## Starting the Example
+
+Before starting the Ceph + Polaris stack, you’ll need to configure environment variables that define network settings, credentials, and cluster IDs.
+
+The services are started **in sequence**:
+1. Monitor + Manager
+2. OSD
+3. RGW
+4. Polaris
+
+Note: this example pulls the `apache/polaris:latest` image, but assumes the image is `1.2.0-incubating` or later.
+
+### 1. Copy the example environment file
+```shell
+cp .env.example .env
+```
+
+### 2. Start monitor and manager
+```shell
+docker compose up -d mon1 mgr
+```
+
+### 3. Start OSD
+```shell
+docker compose up -d osd1
+```
+
+### 4. Start RGW
+```shell
+docker compose up -d rgw1
+```
+#### Check status
+```shell
+docker exec --interactive --tty ceph-mon1-1 ceph -s
+```
+You should see something like:
+```yaml
+cluster:
+  id: b2f59c4b-5f14-4f8c-a9b7-3b7998c76a0e
+  health: HEALTH_WARN
+          mon is allowing insecure global_id reclaim
+          1 monitors have not enabled msgr2
+          6 pool(s) have no replicas configured
+
+services:
+  mon: 1 daemons, quorum mon1 (age 49m)
+  mgr: mgr(active, since 94m)
+  osd: 1 osds: 1 up (since 36m), 1 in (since 93m)
+  rgw: 1 daemon active (1 hosts, 1 zones)
+```
+
+### 5. Create bucket for Polaris storage
+```shell
+docker compose up -d setup_bucket
+```
+
+### 6. Run Polaris service
+```shell
+docker compose up -d polaris
+```
+
+### 7. Setup polaris catalog
+```shell
+docker compose up -d polaris-setup
+```
+
+## Connecting From Spark
+
+```shell
+bin/spark-sql \
+  --packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.9.0,org.apache.iceberg:iceberg-aws-bundle:1.9.0 \
+  --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
+  --conf spark.sql.catalog.polaris=org.apache.iceberg.spark.SparkCatalog \
+  --conf spark.sql.catalog.polaris.type=rest \
+  --conf spark.sql.catalog.polaris.io-impl="org.apache.iceberg.aws.s3.S3FileIO" \
+  --conf spark.sql.catalog.polaris.uri=http://polaris:8181/api/catalog \
+  --conf spark.sql.catalog.polaris.token-refresh-enabled=true \
+  --conf spark.sql.catalog.polaris.warehouse=quickstart_catalog \
+  --conf spark.sql.catalog.polaris.scope=PRINCIPAL_ROLE:ALL \
+  --conf spark.sql.catalog.polaris.credential=root:s3cr3t \
+  --conf spark.sql.catalog.polaris.client.region=irrelevant \
+  --conf spark.sql.catalog.polaris.s3.access-key-id=$RGW_ACCESS_KEY \
+  --conf spark.sql.catalog.polaris.s3.secret-access-key=$RGW_SECRET_KEY

Review Comment:
   The keys would be empty, because both variables aren't available in the shell. Better replace with the actual values for simplicity.
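   For example, the two lines could read as follows (an illustrative sketch, not tested here; the key values are the ones exported later in this thread, and the assumption is that they match the RGW user created by this compose setup):
   ```shell
     --conf spark.sql.catalog.polaris.s3.access-key-id=POLARIS123ACCESS \
     --conf spark.sql.catalog.polaris.s3.secret-access-key=POLARIS456SECRET
   ```
   Alternatively, the README could tell the reader to export both variables first (or `source .env`, assuming the `.env` file defines them) before launching `spark-sql`.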

##########
getting-started/ceph/README.md:
##########
@@ -0,0 +1,147 @@
+## Connecting From Spark
+
+```shell
+bin/spark-sql \
+  --packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.9.0,org.apache.iceberg:iceberg-aws-bundle:1.9.0 \
+  --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
+  --conf spark.sql.catalog.polaris=org.apache.iceberg.spark.SparkCatalog \
+  --conf spark.sql.catalog.polaris.type=rest \
+  --conf spark.sql.catalog.polaris.io-impl="org.apache.iceberg.aws.s3.S3FileIO" \
+  --conf spark.sql.catalog.polaris.uri=http://polaris:8181/api/catalog \

Review Comment:
   The host cannot be resolved. Should be
   ```suggestion
   --conf spark.sql.catalog.polaris.uri=http://localhost:8181/api/catalog \
   ```
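   A quick sanity check from the host (assuming the compose file publishes Polaris on host port 8181) is to hit the Iceberg REST config endpoint; any HTTP response, even a 401, shows the port mapping works, whereas the `polaris` hostname only resolves inside the compose network:
   ```shell
   curl -i "http://localhost:8181/api/catalog/v1/config?warehouse=quickstart_catalog"
   ```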

##########
getting-started/ceph/README.md:
##########
@@ -0,0 +1,145 @@
+## Starting the Example
+
+Before starting the Ceph + Polaris stack, you’ll need to configure environment variables that define network settings, credentials, and cluster IDs.
+
+Copy the example environment file:
+```shell
+mv getting-started/ceph/.env.example getting-started/ceph/.env
+```
+
+The services are started **in sequence**:
+1. Monitor + Manager
+2. OSD
+3. RGW
+4. Polaris
+
+Note: this example pulls the `apache/polaris:latest` image, but assumes the image is `1.2.0-incubating` or later.
+
+
+### 1. Start monitor and manager
+```shell
+docker compose up -d mon1 mgr

Review Comment:
   The Docker/Podman part LGTM now.

##########
getting-started/ceph/README.md:
##########
@@ -0,0 +1,152 @@
+## Starting the Example
+
+Before starting the Ceph + Polaris stack, you’ll need to configure environment variables that define network settings, credentials, and cluster IDs.
+
+The services are started **in sequence**:
+1. Monitor + Manager
+2. OSD
+3. RGW
+4. Polaris
+
+Note: this example pulls the `apache/polaris:latest` image, but assumes the image is `1.2.0-incubating` or later.
+
+### 1. Copy the example environment file
+```shell
+cp .env.example .env
+```
+
+### 2. Prepare Network
+```shell
+# Optional: force runtime (docker or podman)
+export RUNTIME=docker
+
+./getting-started/ceph/prepare-network.sh
+```
+
+### 3. Start monitor and manager
+```shell
+docker compose up -d mon1 mgr
+```
+
+### 4. Start OSD
+```shell
+docker compose up -d osd1
+```
+
+### 5. Start RGW
+```shell
+docker compose up -d rgw1
+```
+#### Check status
+```shell
+docker exec --interactive --tty ceph-mon1-1 ceph -s
+```
+You should see something like:
+```yaml
+cluster:
+  id: b2f59c4b-5f14-4f8c-a9b7-3b7998c76a0e
+  health: HEALTH_WARN
+          mon is allowing insecure global_id reclaim
+          1 monitors have not enabled msgr2
+          6 pool(s) have no replicas configured
+
+services:
+  mon: 1 daemons, quorum mon1 (age 49m)
+  mgr: mgr(active, since 94m)
+  osd: 1 osds: 1 up (since 36m), 1 in (since 93m)
+  rgw: 1 daemon active (1 hosts, 1 zones)
+```
+
+### 6. Create bucket for Polaris storage
+```shell
+docker compose up -d setup_bucket
+```
+
+### 7. Run Polaris service
+```shell
+docker compose up -d polaris
+```
+
+### 8. Setup polaris catalog
+```shell
+docker compose up -d polaris-setup
+```
+
+## Connecting From Spark
+
+```shell
+bin/spark-sql \
+  --packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.9.0,org.apache.iceberg:iceberg-aws-bundle:1.9.0 \
+  --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
+  --conf spark.sql.catalog.polaris=org.apache.iceberg.spark.SparkCatalog \
+  --conf spark.sql.catalog.polaris.type=rest \
+  --conf spark.sql.catalog.polaris.uri=http://localhost:8181/api/catalog \
+  --conf spark.sql.catalog.polaris.token-refresh-enabled=false \
+  --conf spark.sql.catalog.polaris.warehouse=quickstart_catalog \
+  --conf spark.sql.catalog.polaris.scope=PRINCIPAL_ROLE:ALL \
+  --conf spark.sql.catalog.polaris.credential=root:s3cr3t \
+  --conf spark.sql.catalog.polaris.client.region=irrelevant
+```
+
+Note: `s3cr3t` is defined as the password for the `root` user in the `docker-compose.yml` file.
+
+Note: The `client.region` configuration is required for the AWS S3 client to work, but it is not used in this example
+since Ceph does not require a specific region.
+
+## Running Queries
+
+Run inside the Spark SQL shell:
+
+```
+spark-sql (default)> use polaris;
+Time taken: 0.837 seconds
+
+spark-sql ()> create namespace ns;
+Time taken: 0.374 seconds
+
+spark-sql ()> create table ns.t1 as select 'abc';

Review Comment:
   Still not working for me:
   ```
   $ export RGW_ACCESS_KEY=POLARIS123ACCESS # Access key for Polaris S3 user
   $ export RGW_SECRET_KEY=POLARIS456SECRET # Secret key for Polaris S3 user
   $ spark-sql \
       --packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.9.0,org.apache.iceberg:iceberg-aws-bundle:1.9.0 \
       --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
       --conf spark.sql.catalog.polaris=org.apache.iceberg.spark.SparkCatalog \
       --conf spark.sql.catalog.polaris.type=rest \
       --conf spark.sql.catalog.polaris.io-impl="org.apache.iceberg.aws.s3.S3FileIO" \
       --conf spark.sql.catalog.polaris.uri=http://localhost:8181/api/catalog \
       --conf spark.sql.catalog.polaris.token-refresh-enabled=true \
       --conf spark.sql.catalog.polaris.warehouse=quickstart_catalog \
       --conf spark.sql.catalog.polaris.scope=PRINCIPAL_ROLE:ALL \
       --conf spark.sql.catalog.polaris.credential=root:s3cr3t \
       --conf spark.sql.catalog.polaris.client.region=irrelevant \
       --conf spark.sql.catalog.polaris.s3.access-key-id=$RGW_ACCESS_KEY \
       --conf spark.sql.catalog.polaris.s3.secret-access-key=$RGW_SECRET_KEY
   25/11/12 07:50:56 WARN Utils: Your hostname, shark resolves to a loopback address: 127.0.1.1; using 192.168.x.x instead (on interface enp14s0)
   25/11/12 07:50:56 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
   :: loading settings :: url = jar:file:/home/snazy/.sdkman/candidates/spark/3.5.3/jars/ivy-2.5.1.jar!/org/apache/ivy/core/settings/ivysettings.xml
   Ivy Default Cache set to: /home/snazy/.ivy2/cache
   The jars for the packages stored in: /home/snazy/.ivy2/jars
   org.apache.iceberg#iceberg-spark-runtime-3.5_2.12 added as a dependency
   org.apache.iceberg#iceberg-aws-bundle added as a dependency
   :: resolving dependencies :: org.apache.spark#spark-submit-parent-2fdcef36-748e-42b7-815e-6aac08972a3c;1.0
       confs: [default]
       found org.apache.iceberg#iceberg-spark-runtime-3.5_2.12;1.9.0 in central
       found org.apache.iceberg#iceberg-aws-bundle;1.9.0 in central
   :: resolution report :: resolve 56ms :: artifacts dl 1ms
       :: modules in use:
       org.apache.iceberg#iceberg-aws-bundle;1.9.0 from central in [default]
       org.apache.iceberg#iceberg-spark-runtime-3.5_2.12;1.9.0 from central in [default]
   ---------------------------------------------------------------------
   |                  |            modules            ||   artifacts   |
   |       conf       | number| search|dwnlded|evicted|| number|dwnlded|
   ---------------------------------------------------------------------
   |      default     |   2   |   0   |   0   |   0   ||   2   |   0   |
   ---------------------------------------------------------------------
   :: retrieving :: org.apache.spark#spark-submit-parent-2fdcef36-748e-42b7-815e-6aac08972a3c
       confs: [default]
       0 artifacts copied, 2 already retrieved (0kB/3ms)
   25/11/12 07:50:56 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
   Setting default log level to "WARN".
   To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
   25/11/12 07:50:57 WARN HiveConf: HiveConf of name hive.stats.jdbc.timeout does not exist
   25/11/12 07:50:57 WARN HiveConf: HiveConf of name hive.stats.retries.wait does not exist
   25/11/12 07:50:58 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 2.3.0
   25/11/12 07:50:58 WARN ObjectStore: setMetaStoreSchemaVersion called but recording version is disabled: version = 2.3.0, comment = Set by MetaStore [email protected]
   Spark Web UI available at http://x.x.x.x:4040
   Spark master: local[*], Application Id: local-1762930257016
   spark-sql (default)> use polaris;
   25/11/12 07:51:01 WARN AuthManagers: Inferring rest.auth.type=oauth2 since property credential was provided. Please explicitly set rest.auth.type to avoid this warning.
   25/11/12 07:51:01 WARN OAuth2Manager: Iceberg REST client is missing the OAuth2 server URI configuration and defaults to http://localhost:8181/api/catalog/v1/oauth/tokens. This automatic fallback will be removed in a future Iceberg release. It is recommended to configure the OAuth2 endpoint using the 'oauth2-server-uri' property to be prepared. This warning will disappear if the OAuth2 endpoint is explicitly configured. See https://github.com/apache/iceberg/issues/10537
   25/11/12 07:51:01 WARN ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
   Time taken: 0.566 seconds
   spark-sql ()> create namespace ns;
   [SCHEMA_ALREADY_EXISTS] Cannot create schema `ns` because it already exists. Choose a different name, drop the existing schema, or add the IF NOT EXISTS clause to tolerate pre-existing schema.
   spark-sql ()> create table ns.t1 as select 'abc';
   25/11/12 07:51:06 ERROR SparkSQLDriver: Failed in [create table ns.t1 as select 'abc']
   java.lang.IllegalArgumentException: Credential vending was requested for table ns.t1, but no credentials are available
       at org.apache.iceberg.rest.ErrorHandlers$DefaultErrorHandler.accept(ErrorHandlers.java:230)
       at org.apache.iceberg.rest.ErrorHandlers$TableErrorHandler.accept(ErrorHandlers.java:123)
       at org.apache.iceberg.rest.ErrorHandlers$TableErrorHandler.accept(ErrorHandlers.java:107)
       at org.apache.iceberg.rest.HTTPClient.throwFailure(HTTPClient.java:215)
       at org.apache.iceberg.rest.HTTPClient.execute(HTTPClient.java:299)
       at org.apache.iceberg.rest.BaseHTTPClient.post(BaseHTTPClient.java:88)
       at org.apache.iceberg.rest.RESTSessionCatalog$Builder.stageCreate(RESTSessionCatalog.java:921)
       at org.apache.iceberg.rest.RESTSessionCatalog$Builder.createTransaction(RESTSessionCatalog.java:799)
       at org.apache.iceberg.CachingCatalog$CachingTableBuilder.createTransaction(CachingCatalog.java:282)
       at org.apache.iceberg.spark.SparkCatalog.stageCreate(SparkCatalog.java:265)
       at org.apache.spark.sql.connector.catalog.StagingTableCatalog.stageCreate(StagingTableCatalog.java:94)
       at org.apache.spark.sql.execution.datasources.v2.AtomicCreateTableAsSelectExec.run(WriteToDataSourceV2Exec.scala:121)
       at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result$lzycompute(V2CommandExec.scala:43)
       at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result(V2CommandExec.scala:43)
       at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.executeCollect(V2CommandExec.scala:49)
       at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:107)
       at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:125)
       at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:201)
       at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:108)
       at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:900)
       at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:66)
       at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:107)
       at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:98)
       at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:461)
       at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(origin.scala:76)
       at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:461)
       at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:32)
       at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
       at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263)
       at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:32)
       at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:32)
       at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:437)
       at org.apache.spark.sql.execution.QueryExecution.eagerlyExecuteCommands(QueryExecution.scala:98)
       at org.apache.spark.sql.execution.QueryExecution.commandExecuted$lzycompute(QueryExecution.scala:85)
       at org.apache.spark.sql.execution.QueryExecution.commandExecuted(QueryExecution.scala:83)
       at org.apache.spark.sql.Dataset.<init>(Dataset.scala:220)
       at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:100)
       at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:900)
       at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:97)
       at org.apache.spark.sql.SparkSession.$anonfun$sql$4(SparkSession.scala:691)
       at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:900)
       at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:682)
       at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:713)
       at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:744)
       at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:651)
       at org.apache.spark.sql.hive.thriftserver.SparkSQLDriver.run(SparkSQLDriver.scala:68)
       at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processCmd(SparkSQLCLIDriver.scala:501)
       at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.$anonfun$processLine$1(SparkSQLCLIDriver.scala:619)
       at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.$anonfun$processLine$1$adapted(SparkSQLCLIDriver.scala:613)
       at scala.collection.Iterator.foreach(Iterator.scala:943)
       at scala.collection.Iterator.foreach$(Iterator.scala:943)
       at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
       at scala.collection.IterableLike.foreach(IterableLike.scala:74)
       at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
       at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
       at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processLine(SparkSQLCLIDriver.scala:613)
       at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:310)
       at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
       at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
       at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:75)
       at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:52)
       at java.base/java.lang.reflect.Method.invoke(Method.java:580)
       at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
       at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:1029)
       at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:194)
       at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:217)
       at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:91)
       at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1120)
       at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1129)
       at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
   ```
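   Reading the trace: the failure is returned by Polaris itself during `stageCreate`, so the client-side `s3.access-key-id`/`s3.secret-access-key` settings never enter the picture. The Iceberg REST client asked the server for vended credentials (the `X-Iceberg-Access-Delegation` request header), and this static-key catalog has nothing to vend. One untested idea, purely a sketch: stop requesting delegation and hand S3FileIO the RGW endpoint and keys directly. The header override relies on Iceberg's generic `header.` catalog-property prefix, and the endpoint/port is an assumption about this compose setup:
   ```shell
   # Sketch: additional lines for the spark-sql invocation above (untested).
   # - Empty X-Iceberg-Access-Delegation: assumes an empty value stops the
   #   vended-credentials request; Iceberg's `header.` prefix sets raw headers.
   # - Endpoint/port and path-style access: assumptions about this RGW setup.
     --conf spark.sql.catalog.polaris.header.X-Iceberg-Access-Delegation= \
     --conf spark.sql.catalog.polaris.s3.endpoint=http://localhost:8080 \
     --conf spark.sql.catalog.polaris.s3.path-style-access=true \
   ```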

##########
getting-started/ceph/README.md:
##########
@@ -0,0 +1,147 @@
+### 7. Setup polaris catalog
+```shell
+docker compose up -d polaris-setup
+```
+
+## Connecting From Spark

Review Comment:
   ```suggestion
   ## 8. Connecting From Spark
   ```

##########
getting-started/ceph/README.md:
##########
@@ -0,0 +1,147 @@
+Note: `s3cr3t` is defined as the password for the `root` user in the `docker-compose.yml` file.
+
+Note: The `client.region` configuration is required for the AWS S3 client to work, but it is not used in this example
+since Ceph does not require a specific region.
+
+## Running Queries

Review Comment:
   ```suggestion
   ## 9. Running Queries
   ```

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
