snazy commented on code in PR #3022: URL: https://github.com/apache/polaris/pull/3022#discussion_r2522984628
##########
getting-started/ceph/README.md:
##########
@@ -0,0 +1,152 @@
+<!--
+  Licensed to the Apache Software Foundation (ASF) under one
+  or more contributor license agreements.  See the NOTICE file
+  distributed with this work for additional information
+  regarding copyright ownership.  The ASF licenses this file
+  to you under the Apache License, Version 2.0 (the
+  "License"); you may not use this file except in compliance
+  with the License.  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing,
+  software distributed under the License is distributed on an
+  "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  KIND, either express or implied.  See the License for the
+  specific language governing permissions and limitations
+  under the License.
+-->
+
+# Getting Started with Apache Polaris and Ceph
+
+## Overview
+
+This guide describes how to spin up a **single-node Ceph cluster** with **RADOS Gateway (RGW)** for S3-compatible storage and configure it for use by **Polaris**.
+
+This example cluster is configured for basic access key authentication only.
+It does not include STS (Security Token Service) or temporary credentials.
+All access to the Ceph RGW (RADOS Gateway) and Polaris integration uses static S3-style credentials (as configured via radosgw-admin user create).
+
+Spark is used as a query engine. This example assumes a local Spark installation.
+See the [Spark Notebooks Example](../spark/README.md) for a more advanced Spark setup.
+
+## Starting the Example
+
+Before starting the Ceph + Polaris stack, you’ll need to configure environment variables that define network settings, credentials, and cluster IDs.
+
+The services are started **in sequence**:
+1. Monitor + Manager
+2. OSD
+3. RGW
+4. Polaris
+
+Note: this example pulls the `apache/polaris:latest` image, but assumes the image is `1.2.0-incubating` or later.
+
+### 1. Copy the example environment file
+```shell
+cp .env.example .env
+```
+
+### 2. Prepare Network
+```shell
+# Optional: force runtime (docker or podman)
+export RUNTIME=docker
+
+./getting-started/ceph/prepare-network.sh
+```
+
+### 3. Start monitor and manager
+```shell
+docker compose up -d mon1 mgr
+```
+
+### 4. Start OSD
+```shell
+docker compose up -d osd1
+```
+
+### 5. Start RGW
+```shell
+docker compose up -d rgw1
+```
+#### Check status
+```shell
+docker exec --interactive --tty ceph-mon1-1 ceph -s
+```
+You should see something like:
+```yaml
+cluster:
+  id: b2f59c4b-5f14-4f8c-a9b7-3b7998c76a0e
+  health: HEALTH_WARN
+          mon is allowing insecure global_id reclaim
+          1 monitors have not enabled msgr2
+          6 pool(s) have no replicas configured
+
+services:
+  mon: 1 daemons, quorum mon1 (age 49m)
+  mgr: mgr(active, since 94m)
+  osd: 1 osds: 1 up (since 36m), 1 in (since 93m)
+  rgw: 1 daemon active (1 hosts, 1 zones)
+```
+
+### 6. Create bucket for Polaris storage
+```shell
+docker compose up -d setup_bucket
+```
+
+### 7. Run Polaris service
+```shell
+docker compose up -d polaris
+```
+
+### 8. Setup polaris catalog
+```shell
+docker compose up -d polaris-setup
+```
+
+## Connecting From Spark
+
+```shell
+bin/spark-sql \
+    --packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.9.0,org.apache.iceberg:iceberg-aws-bundle:1.9.0 \
+    --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
+    --conf spark.sql.catalog.polaris=org.apache.iceberg.spark.SparkCatalog \
+    --conf spark.sql.catalog.polaris.type=rest \
+    --conf spark.sql.catalog.polaris.uri=http://localhost:8181/api/catalog \
+    --conf spark.sql.catalog.polaris.token-refresh-enabled=false \
+    --conf spark.sql.catalog.polaris.warehouse=quickstart_catalog \
+    --conf spark.sql.catalog.polaris.scope=PRINCIPAL_ROLE:ALL \
+    --conf spark.sql.catalog.polaris.credential=root:s3cr3t \
+    --conf spark.sql.catalog.polaris.client.region=irrelevant
+```
+
+Note: `s3cr3t` is defined as the password for the `root` user in the `docker-compose.yml` file.
+
+Note: The `client.region` configuration is required for the AWS S3 client to work, but it is not used in this example
+since Ceph does not require a specific region.
+
+## Running Queries
+
+Run inside the Spark SQL shell:
+
+```
+spark-sql (default)> use polaris;
+Time taken: 0.837 seconds
+
+spark-sql ()> create namespace ns;
+Time taken: 0.374 seconds
+
+spark-sql ()> create table ns.t1 as select 'abc';

Review Comment:
```
$ docker container logs ceph-polaris-setup-1
fetch https://dl-cdn.alpinelinux.org/alpine/v3.22/main/x86_64/APKINDEX.tar.gz
fetch https://dl-cdn.alpinelinux.org/alpine/v3.22/community/x86_64/APKINDEX.tar.gz
(1/2) Installing oniguruma (6.9.10-r0)
(2/2) Installing jq (1.8.0-r0)
Executing busybox-1.37.0-r19.trigger
OK: 13 MiB in 27 packages
Creating catalog...
fetch https://dl-cdn.alpinelinux.org/alpine/v3.22/main/x86_64/APKINDEX.tar.gz
fetch https://dl-cdn.alpinelinux.org/alpine/v3.22/community/x86_64/APKINDEX.tar.gz
OK: 13 MiB in 27 packages
Obtained access token: eyJhbG...
STORAGE_LOCATION is set to 's3://polaris-storage'
Using StorageType: S3
Creating a catalog named quickstart_catalog in realm POLARIS...
{
  "catalog": {
    "name": "quickstart_catalog",
    "type": "INTERNAL",
    "readOnly": false,
    "properties": {
      "default-base-location": "s3://polaris-storage"
    },
    "storageConfigInfo": {"storageType":"S3", "endpoint":"http://rgw1:7480", "stsUnavailable":"true", "pathStyleAccess":true}
  }
}
* Host polaris:8181 was resolved.
* IPv6: (none)
* IPv4: 10.89.0.7
*   Trying 10.89.0.7:8181...
* Connected to polaris (10.89.0.7) port 8181
* using HTTP/1.x
> POST /api/management/v1/catalogs HTTP/1.1
> Host: polaris:8181
> User-Agent: curl/8.14.1
> Authorization: Bearer eyJhbGciO...
> Accept: application/json
> Content-Type: application/json
> Polaris-Realm: POLARIS
> Content-Length: 326
>
} [326 bytes data]
* upload completely sent off: 326 bytes
< HTTP/1.1 201 Created
< Content-Type: application/json;charset=UTF-8
< content-length: 343
< Polaris-Request-Id: b5d3ec71-48f7-4e21-905f-2e3f32109485_0000000000000000002
<
{ [343 bytes data]
* Connection #0 to host polaris left intact
{"type":"INTERNAL","name":"quickstart_catalog","properties":{"default-base-location":"s3://polaris-storage"},"createTimestamp":1763031212314,"lastUpdateTimestamp":0,"entityVersion":1,"storageConfigInfo":{"endpoint":"http://rgw1:7480","stsUnavailable":true,"pathStyleAccess":true,"storageType":"S3","allowedLocations":["s3://polaris-storage"]}}
Done.
Extra grants...
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100    56    0     0  100    56      0   2524 --:--:-- --:--:-- --:--:--  2545
Done.
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   370  100   370    0     0  46464      0 --:--:-- --:--:-- --:--:-- 52857
{"catalogs":[{"type":"INTERNAL","name":"quickstart_catalog","properties":{"default-base-location":"s3://polaris-storage"},"createTimestamp":1763031212314,"lastUpdateTimestamp":1763031212314,"entityVersion":1,"storageConfigInfo":{"endpoint":"http://rgw1:7480","stsUnavailable":true,"pathStyleAccess":true,"storageType":"S3","allowedLocations":["s3://polaris-storage"]}}]}
```

```
$ docker container logs ceph-polaris-1
INFO exec -a "java" java -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=*:5005 -XX:MaxRAMPercentage=80.0 -XX:+UseParallelGC -XX:MinHeapFreeRatio=10 -XX:MaxHeapFreeRatio=20 -XX:GCTimeRatio=4 -XX:AdaptiveSizePolicyWeight=90 -XX:+ExitOnOutOfMemoryError -cp "." -jar /deployments/quarkus-run.jar
INFO running in /deployments
Listening for transport dt_socket at address: 5005
...
Powered by Quarkus 3.28.2
2025-11-13 10:53:26,710 WARN  [io.qua.config] [,] [,,,] (main) The "quarkus.log.file.enable" config property is deprecated and should not be used anymore.
2025-11-13 10:53:26,711 WARN  [io.qua.config] [,] [,,,] (main) The "quarkus.log.console.enable" config property is deprecated and should not be used anymore.
2025-11-13 10:53:28,092 INFO  [org.apa.pol.ser.con.ServiceProducers] [,] [,,,] (main) Bootstrapping realm(s) 'POLARIS', if necessary, from root credentials set provided via the environment variable POLARIS_BOOTSTRAP_CREDENTIALS or Java system property polaris.bootstrap.credentials ...
2025-11-13 10:53:28,171 INFO  [org.apa.pol.ser.con.ServiceProducers] [,] [,,,] (main) Realm 'POLARIS' automatically bootstrapped, credentials taken from root credentials set provided via the environment variable POLARIS_BOOTSTRAP_CREDENTIALS or Java system property polaris.bootstrap.credentials, not printed to stdout.
2025-11-13 10:53:28,183 WARN  [org.apa.pol.ser.con.ProductionReadinessChecks] [,] [,,,] (main) ⚠️ Production readiness checks failed! Check the warnings below.
2025-11-13 10:53:28,183 WARN  [org.apa.pol.ser.con.ProductionReadinessChecks] [,] [,,,] (main) - ⚠️ The current metastore is intended for tests only. Offending configuration option: 'polaris.persistence.type'.
2025-11-13 10:53:28,183 WARN  [org.apa.pol.ser.con.ProductionReadinessChecks] [,] [,,,] (main) - ⚠️ A public key file wasn't provided and will be generated. Offending configuration option: 'polaris.authentication.token-broker.rsa-key-pair.public-key-file'.
2025-11-13 10:53:28,184 WARN  [org.apa.pol.ser.con.ProductionReadinessChecks] [,] [,,,] (main) - ⚠️ A private key file wasn't provided and will be generated. Offending configuration option: 'polaris.authentication.token-broker.rsa-key-pair.private-key-file'.
2025-11-13 10:53:28,184 WARN  [org.apa.pol.ser.con.ProductionReadinessChecks] [,] [,,,] (main) - ⚠️ The realm context resolver is configured to map requests without a realm header to the default realm. Offending configuration option: 'polaris.realm-context.require-header'.
2025-11-13 10:53:28,184 WARN  [org.apa.pol.ser.con.ProductionReadinessChecks] [,] [,,,] (main) Refer to https://polaris.apache.org/in-dev/unreleased/configuring-polaris-for-production for more information.
2025-11-13 10:53:28,237 INFO  [io.quarkus] [,] [,,,] (main) Apache Polaris Server (incubating) 1.2.0-incubating on JVM (powered by Quarkus 3.28.2) started in 2.803s. Listening on: http://0.0.0.0:8181. Management interface listening on http://0.0.0.0:8182.
2025-11-13 10:53:28,238 INFO  [io.quarkus] [,] [,,,] (main) Profile prod activated.
2025-11-13 10:53:28,238 INFO  [io.quarkus] [,] [,,,] (main) Installed features: [agroal, amazon-sdk-rds, cdi, hibernate-validator, jdbc-postgresql, micrometer, narayana-jta, oidc, opentelemetry, reactive-routes, rest, rest-jackson, security, smallrye-context-propagation, smallrye-fault-tolerance, smallrye-health, vertx]
2025-11-13 10:53:31,860 INFO  [org.apa.pol.ser.con.PolarisIcebergObjectMapperCustomizer] [b5d3ec71-48f7-4e21-905f-2e3f32109485_0000000000000000001,POLARIS] [,,,] (executor-thread-1) Limiting request body size to 10485760 bytes
2025-11-13 10:53:31,883 INFO  [io.qua.htt.access-log] [b5d3ec71-48f7-4e21-905f-2e3f32109485_0000000000000000001,POLARIS] [,,,] (executor-thread-1) 10.89.0.8 - - [13/Nov/2025:10:53:31 +0000] "POST /api/catalog/v1/oauth/tokens HTTP/1.1" 200 757
2025-11-13 10:53:32,318 INFO  [org.apa.pol.ser.adm.PolarisServiceImpl] [b5d3ec71-48f7-4e21-905f-2e3f32109485_0000000000000000002,POLARIS] [,,,] (executor-thread-1) Created new catalog class PolarisCatalog {
    class Catalog {
        type: INTERNAL
        name: quickstart_catalog
        properties: class CatalogProperties {
            {default-base-location=s3://polaris-storage}
            defaultBaseLocation: s3://polaris-storage
        }
        createTimestamp: 1763031212314
        lastUpdateTimestamp: 0
        entityVersion: 1
        storageConfigInfo: class AwsStorageConfigInfo {
            class StorageConfigInfo {
                storageType: S3
                allowedLocations: [s3://polaris-storage]
            }
            roleArn: null
            externalId: null
            userArn: null
            region: null
            endpoint: http://rgw1:7480
            stsEndpoint: null
            stsUnavailable: true
            endpointInternal: null
            pathStyleAccess: true
        }
    }
}
2025-11-13 10:53:32,322 INFO  [io.qua.htt.access-log] [b5d3ec71-48f7-4e21-905f-2e3f32109485_0000000000000000002,POLARIS] [,,,] (executor-thread-1) 10.89.0.8 - root [13/Nov/2025:10:53:32 +0000] "POST /api/management/v1/catalogs HTTP/1.1" 201 343
2025-11-13 10:53:32,334 INFO  [org.apa.pol.ser.adm.PolarisServiceImpl] [b5d3ec71-48f7-4e21-905f-2e3f32109485_0000000000000000003,POLARIS] [,,,] (executor-thread-1) Adding grant class AddGrantRequest {
    grant: class CatalogGrant {
        class GrantResource {
            type: catalog
        }
        privilege: CATALOG_MANAGE_CONTENT
    }
} to catalogRole catalog_admin in catalog quickstart_catalog
2025-11-13 10:53:32,347 INFO  [io.qua.htt.access-log] [b5d3ec71-48f7-4e21-905f-2e3f32109485_0000000000000000003,POLARIS] [,,,] (executor-thread-1) 10.89.0.8 - root [13/Nov/2025:10:53:32 +0000] "PUT /api/management/v1/catalogs/quickstart_catalog/catalog-roles/catalog_admin/grants HTTP/1.1" 201 -
2025-11-13 10:53:32,358 INFO  [io.qua.htt.access-log] [b5d3ec71-48f7-4e21-905f-2e3f32109485_0000000000000000004,POLARIS] [,,,] (executor-thread-1) 10.89.0.8 - root [13/Nov/2025:10:53:32 +0000] "GET /api/management/v1/catalogs HTTP/1.1" 200 370
2025-11-13 10:53:55,820 INFO  [io.qua.htt.access-log] [b5d3ec71-48f7-4e21-905f-2e3f32109485_0000000000000000005,POLARIS] [,,,] (executor-thread-1) 10.89.0.7 - - [13/Nov/2025:10:53:55 +0000] "POST /api/catalog/v1/oauth/tokens HTTP/1.1" 200 757
2025-11-13 10:53:55,903 INFO  [io.qua.htt.access-log] [b5d3ec71-48f7-4e21-905f-2e3f32109485_0000000000000000006,POLARIS] [,,,] (executor-thread-1) 10.89.0.7 - root [13/Nov/2025:10:53:55 +0000] "GET /api/catalog/v1/config?warehouse=quickstart_catalog HTTP/1.1" 200 2128
2025-11-13 10:54:01,849 INFO  [org.apa.pol.ser.exc.IcebergExceptionMapper] [b5d3ec71-48f7-4e21-905f-2e3f32109485_0000000000000000007,POLARIS] [,,,] (executor-thread-1) Handling runtimeException Namespace does not exist: ns
2025-11-13 10:54:01,859 INFO  [io.qua.htt.access-log] [b5d3ec71-48f7-4e21-905f-2e3f32109485_0000000000000000007,POLARIS] [,,,] (executor-thread-1) 10.89.0.7 - root [13/Nov/2025:10:54:01 +0000] "GET /api/catalog/v1/quickstart_catalog/namespaces/ns HTTP/1.1" 404 97
2025-11-13 10:54:01,891 INFO  [org.apa.pol.ser.cat.ice.IcebergCatalogHandler] [b5d3ec71-48f7-4e21-905f-2e3f32109485_0000000000000000008,POLARIS] [,,,] (executor-thread-1) Initializing non-federated catalog
2025-11-13 10:54:01,913 INFO  [io.qua.htt.access-log] [b5d3ec71-48f7-4e21-905f-2e3f32109485_0000000000000000008,POLARIS] [,,,] (executor-thread-1) 10.89.0.7 - root [13/Nov/2025:10:54:01 +0000] "POST /api/catalog/v1/quickstart_catalog/namespaces HTTP/1.1" 200 89
2025-11-13 10:54:07,044 INFO  [org.apa.pol.ser.exc.IcebergExceptionMapper] [b5d3ec71-48f7-4e21-905f-2e3f32109485_0000000000000000009,POLARIS] [,,,] (executor-thread-1) Handling runtimeException Table does not exist: ns.t1
2025-11-13 10:54:07,045 INFO  [io.qua.htt.access-log] [b5d3ec71-48f7-4e21-905f-2e3f32109485_0000000000000000009,POLARIS] [,,,] (executor-thread-1) 10.89.0.7 - root [13/Nov/2025:10:54:07 +0000] "GET /api/catalog/v1/quickstart_catalog/namespaces/ns/tables/t1?snapshots=all HTTP/1.1" 404 92
2025-11-13 10:54:07,079 INFO  [org.apa.pol.ser.cat.ice.IcebergCatalogHandler] [b5d3ec71-48f7-4e21-905f-2e3f32109485_0000000000000000010,POLARIS] [,,,] (executor-thread-1) Initializing non-federated catalog
2025-11-13 10:54:07,083 INFO  [org.apa.ice.BaseMetastoreCatalog] [b5d3ec71-48f7-4e21-905f-2e3f32109485_0000000000000000010,POLARIS] [,,,] (executor-thread-1) Table properties set at catalog level through catalog properties: {}
2025-11-13 10:54:07,084 INFO  [org.apa.ice.BaseMetastoreCatalog] [b5d3ec71-48f7-4e21-905f-2e3f32109485_0000000000000000010,POLARIS] [,,,] (executor-thread-1) Table properties enforced at catalog level through catalog properties: {}
2025-11-13 10:54:07,114 INFO  [org.apa.pol.ser.exc.IcebergExceptionMapper] [b5d3ec71-48f7-4e21-905f-2e3f32109485_0000000000000000010,POLARIS] [,,,] (executor-thread-1) Handling runtimeException Credential vending was requested for table ns.t1, but no credentials are available
2025-11-13 10:54:07,115 INFO  [io.qua.htt.access-log] [b5d3ec71-48f7-4e21-905f-2e3f32109485_0000000000000000010,POLARIS] [,,,] (executor-thread-1) 10.89.0.7 - root [13/Nov/2025:10:54:07 +0000] "POST /api/catalog/v1/quickstart_catalog/namespaces/ns/tables HTTP/1.1" 400 151
```

```
Credential vending was requested for table ns.t1, but no credentials are available
java.lang.IllegalArgumentException: Credential vending was requested for table ns.t1, but no credentials are available
	at org.apache.iceberg.rest.ErrorHandlers$DefaultErrorHandler.accept(ErrorHandlers.java:230)
```

(same error as mentioned below)

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
