mengw15 commented on code in PR #4274:
URL: https://github.com/apache/texera/pull/4274#discussion_r3114584063
##########
bin/single-node/docker-compose.yml:
##########
@@ -75,6 +105,132 @@ services:
timeout: 5s
retries: 10
+ # Lakekeeper migration init container
+ # This runs once to migrate the database before the lakekeeper server starts
+ lakekeeper-migrate:
+ image: vakamo/lakekeeper:v0.11.0
+ container_name: texera-lakekeeper-migrate
+ depends_on:
+ postgres:
+ condition: service_healthy
+ env_file:
+ - .env
+ restart: "no"
+ entrypoint: ["/home/nonroot/lakekeeper"]
+ command: ["migrate"]
+
+ # Lakekeeper is the Iceberg REST catalog service
+ lakekeeper:
+ image: vakamo/lakekeeper:v0.11.0
+ container_name: texera-lakekeeper
+ restart: always
+ depends_on:
+ postgres:
+ condition: service_healthy
+ minio:
+ condition: service_started
+ lakekeeper-migrate:
+ condition: service_completed_successfully
+ env_file:
+ - .env
+ entrypoint: ["/home/nonroot/lakekeeper"]
+ command: ["serve"]
+ healthcheck:
+ test: ["CMD", "/home/nonroot/lakekeeper", "healthcheck"]
+ interval: 10s
+ timeout: 5s
+ retries: 10
+ start_period: 10s
+
+ # One-shot init container that creates the Lakekeeper default project and
+ # the Iceberg warehouse pointing at the MinIO bucket prepared by minio-init.
+ lakekeeper-init:
+ image: alpine:3.19
+ container_name: texera-lakekeeper-init
+ depends_on:
+ lakekeeper:
+ condition: service_healthy
+ minio-init:
+ condition: service_completed_successfully
+ env_file:
+ - .env
+ restart: "no"
+ entrypoint: [ "/bin/sh", "-c" ]
+ command:
+ - |
+ set -e
+
+ echo "Installing dependencies..."
+ apk add --no-cache curl ca-certificates
+
+ check_status() {
+ if [ "$$1" -ge 200 ] && [ "$$1" -lt 300 ]; then
+ echo "Created $$2 successfully (HTTP $$1)."
+ elif [ "$$1" -eq 409 ]; then
+ echo "$$2 already exists (HTTP 409). Treating as success."
+ else
+ echo "Failed to create $$2. HTTP Code: $$1"
+ echo "ERROR RESPONSE:"
+ if [ -f /tmp/response.txt ]; then cat /tmp/response.txt; fi
+ echo ""
+ exit 1
+ fi
+ }
+
+ echo "Step 1: Initializing Default Project..."
+ PROJECT_PAYLOAD='{"project-id": "00000000-0000-0000-0000-000000000000", "project-name": "default"}'
+
+ PROJECT_CODE=$$(curl -s -o /tmp/response.txt -w "%{http_code}" \
+ -X POST \
+ -H "Content-Type: application/json" \
+ -d "$$PROJECT_PAYLOAD" \
+ "$$LAKEKEEPER_BASE_URI/management/v1/project" || echo "000")
+
+ check_status "$$PROJECT_CODE" "Default Project"
+
+
+ echo "Step 2: Initializing Warehouse '$$STORAGE_ICEBERG_CATALOG_REST_WAREHOUSE_NAME'..."
+ CREATE_PAYLOAD=$$(cat <<EOF
+ {
+ "warehouse-name": "$$STORAGE_ICEBERG_CATALOG_REST_WAREHOUSE_NAME",
+ "project-id": "00000000-0000-0000-0000-000000000000",
+ "storage-profile": {
+ "type": "s3",
+ "bucket": "$$STORAGE_ICEBERG_CATALOG_REST_S3_BUCKET",
+ "region": "$$STORAGE_S3_REGION",
+ "endpoint": "$$STORAGE_S3_ENDPOINT",
+ "flavor": "s3-compat",
+ "path-style-access": true,
+ "sts-enabled": false
+ },
+ "storage-credential": {
+ "type": "s3",
+ "credential-type": "access-key",
+ "aws-access-key-id": "$$STORAGE_S3_AUTH_USERNAME",
+ "aws-secret-access-key": "$$STORAGE_S3_AUTH_PASSWORD"
+ }
+ }
+ EOF
+ )
+
+ WAREHOUSE_CODE=$$(curl -s -o /tmp/response.txt -w "%{http_code}" \
+ -X POST \
+ -H "Content-Type: application/json" \
+ -d "$$CREATE_PAYLOAD" \
+ "$$LAKEKEEPER_BASE_URI/management/v1/warehouse" || echo "000")
+
+ # Lakekeeper returns 400 CreateWarehouseStorageProfileOverlap when a
Review Comment:
The 400 CreateWarehouseStorageProfileOverlap branch just below this line
already gives us the "skip if exists" behavior: if the warehouse is already
there, Lakekeeper returns 400 with that specific error code and we treat it as
success, so re-runs are idempotent.
I also looked into the explicit pre-check option. Lakekeeper does not seem to
provide a lookup-by-name API, so we would need to list warehouses and match on
the name in the JSON response, which adds script complexity for the same
outcome. That said, if we feel an explicit pre-check would be better, I am
happy to change it.
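
For reference, the idea behind that branch can be sketched roughly as follows.
This is a minimal illustration, not the exact code in the PR: the function
name `warehouse_status` and its `created` / `exists` / `failed` labels are
hypothetical, and it assumes the error code appears verbatim somewhere in the
saved response body, so a plain `grep` is enough and no JSON parsing is needed.

```shell
#!/bin/sh
# Sketch only: classify the result of the warehouse-creation POST.
#   $1 = HTTP status code reported by curl
#   $2 = path to the file holding the response body
# The function name and the printed labels are hypothetical.
warehouse_status() {
  code="$1"
  body_file="$2"
  if [ "$code" -ge 200 ] && [ "$code" -lt 300 ]; then
    # Any 2xx means the warehouse was created by this run.
    echo "created"
  elif [ "$code" -eq 400 ] \
    && grep -q "CreateWarehouseStorageProfileOverlap" "$body_file" 2>/dev/null; then
    # Assumption: the overlap error code is present verbatim in the body.
    # A 400 carrying it is treated as "already exists", which keeps re-runs idempotent.
    echo "exists"
  else
    # Every other outcome, including curl's "000" fallback, is a real failure.
    echo "failed"
  fi
}
```

Inside docker-compose.yml each `$` would have to be written as `$$`, as in the
rest of the init script. The caller would only `exit 1` on `failed`.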