This is an automated email from the ASF dual-hosted git repository.
mengw15 pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/texera.git
The following commit(s) were added to refs/heads/main by this push:
new dec8c4a3b9 feat(docker-compose): use Lakekeeper as the Iceberg Catalog
when launching via `docker compose` (#4274)
dec8c4a3b9 is described below
commit dec8c4a3b97986e55ad64cd9b3a0069182d32109
Author: Meng Wang <[email protected]>
AuthorDate: Tue Apr 21 18:15:19 2026 -0700
feat(docker-compose): use Lakekeeper as the Iceberg Catalog when launching
via `docker compose` (#4274)
### What changes were proposed in this PR?
This PR updates the single-node Docker Compose deployment to integrate
[Lakekeeper](https://github.com/lakekeeper/lakekeeper) as the Iceberg
REST catalog.
**Changes to `bin/single-node/docker-compose.yml`:**
- Added `lakekeeper-migrate` init container: runs the database migration
before Lakekeeper starts
- Added `lakekeeper` service (`vakamo/lakekeeper:v0.11.0`): REST catalog
server with health checks
- Added `minio-init` container: creates the MinIO bucket used by the
Iceberg warehouse
- Added `lakekeeper-init` container: bootstraps the default project and
configures the warehouse
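
The check-then-create flow that `lakekeeper-init` uses for the default project can be sketched as a small shell helper. This is a minimal sketch, not the script from this commit: the function names `classify_project_status` and `is_success_status` are hypothetical, and the mapping (200 means the project exists, 404 means it must be created, anything else aborts) mirrors the branches in the init script of `docker-compose.yml`.

```shell
#!/bin/sh
# Hypothetical helper, not part of the commit: maps the HTTP status of
# GET $LAKEKEEPER_BASE_URI/management/v1/project to the action the init
# script takes for that status.
classify_project_status() {
  case "$1" in
    200) echo "exists" ;;   # project already there, skip creation
    404) echo "missing" ;;  # project absent, POST to create it
    *)   echo "error" ;;    # anything else (including the 000 fallback) aborts
  esac
}

# Separate check applied after each POST: only a 2xx code counts as success.
is_success_status() {
  [ "$1" -ge 200 ] && [ "$1" -lt 300 ]
}

# Demo: show the decision for a few representative status codes.
for code in 200 404 500 000; do
  echo "$code -> $(classify_project_status "$code")"
done
```

Keeping the GET before the POST is what makes the init container safe to re-run: a second `docker compose up` finds the project and skips creation.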
**Changes to `bin/single-node/.env`:**
- Added Lakekeeper config: database URLs, encryption key, base URI
- Added REST catalog config: warehouse name, S3 bucket, region, catalog
URI
- Set `STORAGE_ICEBERG_CATALOG_TYPE=rest`
- Replaced `MINIO_ROOT_USER` / `MINIO_ROOT_PASSWORD` with
`STORAGE_S3_AUTH_USERNAME` / `STORAGE_S3_AUTH_PASSWORD`, added
`STORAGE_S3_REGION`, and grouped the variables under section comments
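
How the `STORAGE_ICEBERG_CATALOG_TYPE` selector relates to the other catalog settings can be sketched in shell. This is an illustration under stated assumptions, not Texera's actual configuration loader: the function `catalog_location` is hypothetical, the two accepted values (`rest`, `postgres`) come from the comment in `.env`, and the fallback values are copied from the `.env` in this commit.

```shell
#!/bin/sh
# Hypothetical helper, not part of the commit: prints the catalog location
# for the selected STORAGE_ICEBERG_CATALOG_TYPE. Fallback values are copied
# from bin/single-node/.env in this commit.
catalog_location() {
  # Explicit argument wins, then the environment, then the new default "rest".
  catalog_type="${1:-${STORAGE_ICEBERG_CATALOG_TYPE:-rest}}"
  case "$catalog_type" in
    rest)
      echo "${STORAGE_ICEBERG_CATALOG_REST_URI:-http://texera-lakekeeper:8181/catalog}"
      ;;
    postgres)
      echo "${STORAGE_ICEBERG_CATALOG_POSTGRES_URI_WITHOUT_SCHEME:-texera-postgres:5432/texera_iceberg_catalog}"
      ;;
    *)
      echo "unsupported catalog type: $catalog_type" >&2
      return 1
      ;;
  esac
}

# Demo: print the location for both supported catalog types.
catalog_location rest
catalog_location postgres
```

Rejecting unknown values early, as the last branch does, is one way to surface a typo in `.env` before any service tries to reach a catalog.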
### Any related issues, documentation, discussions?
Closes #4370
### How was this PR tested?
Manually tested
### Was this PR authored or co-authored using generative AI tooling?
co-authored with Claude
---------
Co-authored-by: Chen Li <[email protected]>
---
bin/single-node/.env | 40 ++++++--
bin/single-node/docker-compose.yml | 193 +++++++++++++++++++++++++++++++++++++
2 files changed, 224 insertions(+), 9 deletions(-)
diff --git a/bin/single-node/.env b/bin/single-node/.env
index 2c949de0fb..dafec10ccc 100644
--- a/bin/single-node/.env
+++ b/bin/single-node/.env
@@ -15,6 +15,7 @@
# specific language governing permissions and limitations
# under the License.
+# Public host and ports exposed by the deployment
TEXERA_HOST=http://localhost
TEXERA_PORT=8080
MINIO_PORT=9000
@@ -27,37 +28,58 @@ TEXERA_SERVICE_LOG_LEVEL=INFO
IMAGE_REGISTRY=ghcr.io/apache
IMAGE_TAG=latest
+# Admin credentials for Texera
+USER_SYS_ADMIN_USERNAME=texera
+USER_SYS_ADMIN_PASSWORD=texera
+
+# Postgres root credentials
POSTGRES_USER=texera
POSTGRES_PASSWORD=password
-MINIO_ROOT_USER=texera_minio
-MINIO_ROOT_PASSWORD=password
+# S3 (MinIO) credentials
+STORAGE_S3_AUTH_USERNAME=texera_minio
+STORAGE_S3_AUTH_PASSWORD=password
+# LakeFS server configuration
LAKEFS_INSTALLATION_USER_NAME=texera-admin
LAKEFS_INSTALLATION_ACCESS_KEY_ID=AKIAIOSFOLKFSSAMPLES
LAKEFS_INSTALLATION_SECRET_ACCESS_KEY=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
LAKEFS_BLOCKSTORE_TYPE=s3
LAKEFS_BLOCKSTORE_S3_FORCE_PATH_STYLE=true
LAKEFS_BLOCKSTORE_S3_ENDPOINT=http://texera-minio:9000
-LAKEFS_BLOCKSTORE_S3_CREDENTIALS_ACCESS_KEY_ID=texera_minio
-LAKEFS_BLOCKSTORE_S3_CREDENTIALS_SECRET_ACCESS_KEY=password
LAKEFS_AUTH_ENCRYPT_SECRET_KEY=random_string_for_lakefs
LAKEFS_LOGGING_LEVEL=INFO
LAKEFS_STATS_ENABLED=1
LAKEFS_DATABASE_TYPE=postgres
LAKEFS_DATABASE_POSTGRES_CONNECTION_STRING=postgres://texera:password@texera-postgres:5432/texera_lakefs?sslmode=disable
+# Lakekeeper server configuration
+LAKEKEEPER__PG_DATABASE_URL_READ=postgres://texera:password@texera-postgres:5432/texera_lakekeeper
+LAKEKEEPER__PG_DATABASE_URL_WRITE=postgres://texera:password@texera-postgres:5432/texera_lakekeeper
+LAKEKEEPER__PG_ENCRYPTION_KEY=texera_key
+LAKEKEEPER_BASE_URI=http://texera-lakekeeper:8181
+
+# Texera storage endpoints
STORAGE_S3_ENDPOINT=http://texera-minio:9000
+STORAGE_S3_REGION=us-west-2
STORAGE_LAKEFS_ENDPOINT=http://texera-lakefs:8000/api/v1
STORAGE_JDBC_URL=jdbc:postgresql://texera-postgres:5432/texera_db?currentSchema=texera_db,public
STORAGE_JDBC_USERNAME=texera
STORAGE_JDBC_PASSWORD=password
-FILE_SERVICE_GET_PRESIGNED_URL_ENDPOINT=http://file-service:9092/api/dataset/presign-download
-FILE_SERVICE_UPLOAD_ONE_FILE_TO_DATASET_ENDPOINT=http://file-service:9092/api/dataset/did/upload
+
+# Iceberg catalog selector (valid values: rest, postgres)
+STORAGE_ICEBERG_CATALOG_TYPE=rest
+
+# Iceberg REST catalog client configuration
+STORAGE_ICEBERG_CATALOG_REST_URI=http://texera-lakekeeper:8181/catalog
+STORAGE_ICEBERG_CATALOG_REST_WAREHOUSE_NAME=texera
+STORAGE_ICEBERG_CATALOG_REST_S3_BUCKET=texera-iceberg
+
+# Postgres-backed Iceberg catalog
STORAGE_ICEBERG_CATALOG_POSTGRES_URI_WITHOUT_SCHEME=texera-postgres:5432/texera_iceberg_catalog
STORAGE_ICEBERG_CATALOG_POSTGRES_USERNAME=texera
STORAGE_ICEBERG_CATALOG_POSTGRES_PASSWORD=password
-# Admin credentials for Texera (used for login and example data loading)
-USER_SYS_ADMIN_USERNAME=texera
-USER_SYS_ADMIN_PASSWORD=texera
\ No newline at end of file
+# File service endpoints
+FILE_SERVICE_GET_PRESIGNED_URL_ENDPOINT=http://file-service:9092/api/dataset/presign-download
+FILE_SERVICE_UPLOAD_ONE_FILE_TO_DATASET_ENDPOINT=http://file-service:9092/api/dataset/did/upload
diff --git a/bin/single-node/docker-compose.yml b/bin/single-node/docker-compose.yml
index be9b3e30bd..4329bb3e71 100644
--- a/bin/single-node/docker-compose.yml
+++ b/bin/single-node/docker-compose.yml
@@ -26,9 +26,36 @@ services:
- "${MINIO_PORT:-9000}:9000"
env_file:
- .env
+ environment:
+ - MINIO_ROOT_USER=${STORAGE_S3_AUTH_USERNAME}
+ - MINIO_ROOT_PASSWORD=${STORAGE_S3_AUTH_PASSWORD}
volumes:
- minio_data:/data
command: server --console-address ":9001" /data
+ healthcheck:
+ test: ["CMD", "curl", "-sf", "http://localhost:9000/minio/health/live"]
+ interval: 5s
+ timeout: 3s
+ retries: 10
+
+ # This job creates the S3 bucket used by the Iceberg warehouse that the
+ # Lakekeeper service manages.
+ minio-init:
+ image: minio/mc:RELEASE.2025-05-21T01-59-54Z
+ container_name: texera-minio-init
+ depends_on:
+ minio:
+ condition: service_healthy
+ env_file:
+ - .env
+ restart: "no"
+ entrypoint: ["/bin/sh", "-c"]
+ command:
+ - |
+ set -e
+ mc alias set local "$$STORAGE_S3_ENDPOINT" "$$STORAGE_S3_AUTH_USERNAME" "$$STORAGE_S3_AUTH_PASSWORD"
+ mc mb --ignore-existing "local/$$STORAGE_ICEBERG_CATALOG_REST_S3_BUCKET"
+ echo "MinIO bucket '$$STORAGE_ICEBERG_CATALOG_REST_S3_BUCKET' is ready."
# PostgreSQL with PGroonga extension for full-text search.
# Used by lakeFS and Texera's metadata storage.
@@ -63,6 +90,8 @@ services:
environment:
# This port also need to be changed if the port of MinIO service is changed
- LAKEFS_BLOCKSTORE_S3_PRE_SIGNED_ENDPOINT=${TEXERA_HOST}:${MINIO_PORT:-9000}
+ - LAKEFS_BLOCKSTORE_S3_CREDENTIALS_ACCESS_KEY_ID=${STORAGE_S3_AUTH_USERNAME}
+ - LAKEFS_BLOCKSTORE_S3_CREDENTIALS_SECRET_ACCESS_KEY=${STORAGE_S3_AUTH_PASSWORD}
entrypoint: ["/bin/sh", "-c"]
command:
- |
@@ -75,6 +104,166 @@ services:
timeout: 5s
retries: 10
+ # This job applies Lakekeeper's database schema to the Postgres instance —
+ # creating or upgrading the tables that Lakekeeper uses to track its catalog metadata.
+ lakekeeper-migrate:
+ image: vakamo/lakekeeper:v0.11.0
+ container_name: texera-lakekeeper-migrate
+ depends_on:
+ postgres:
+ condition: service_healthy
+ env_file:
+ - .env
+ restart: "no"
+ entrypoint: ["/home/nonroot/lakekeeper"]
+ command: ["migrate"]
+
+ # Lakekeeper is the Iceberg REST catalog service
+ lakekeeper:
+ image: vakamo/lakekeeper:v0.11.0
+ container_name: texera-lakekeeper
+ restart: always
+ depends_on:
+ postgres:
+ condition: service_healthy
+ minio:
+ condition: service_started
+ lakekeeper-migrate:
+ condition: service_completed_successfully
+ env_file:
+ - .env
+ entrypoint: ["/home/nonroot/lakekeeper"]
+ command: ["serve"]
+ healthcheck:
+ test: ["CMD", "/home/nonroot/lakekeeper", "healthcheck"]
+ interval: 10s
+ timeout: 5s
+ retries: 10
+ start_period: 10s
+
+ # One-shot init container that creates the Lakekeeper default project and
+ # the Iceberg warehouse pointing at the MinIO bucket prepared by minio-init.
+ lakekeeper-init:
+ image: alpine:3.19
+ container_name: texera-lakekeeper-init
+ depends_on:
+ lakekeeper:
+ condition: service_healthy
+ minio-init:
+ condition: service_completed_successfully
+ env_file:
+ - .env
+ restart: "no"
+ entrypoint: [ "/bin/sh", "-c" ]
+ command:
+ - |
+ set -e
+
+ echo "Installing curl (to call Lakekeeper's management API) and jq (to parse its list responses)..."
+ apk add --no-cache curl jq
+
+ # Lakekeeper organizes warehouses under "projects" (its top-level tenant
+ # abstraction). Texera is single-tenant, so we use the NIL UUID (all
+ # zeros) as the project ID — with
LAKEKEEPER__ENABLE_DEFAULT_PROJECT=true
+ # (Lakekeeper's default), this is the project that any client request
+ # without a project-id is routed to.
+ echo "Step 1: Ensuring Lakekeeper's default project exists (top-level tenant that will hold the Iceberg warehouse)..."
+
+ echo "Checking whether Lakekeeper's default project already exists..."
+ PROJECT_GET_CODE=$$(curl -s -o /tmp/project.txt -w "%{http_code}" \
+ "$$LAKEKEEPER_BASE_URI/management/v1/project" || echo "000")
+
+ if [ "$$PROJECT_GET_CODE" = "200" ]; then
+ echo "Lakekeeper's default project already exists. Skipping creation."
+ elif [ "$$PROJECT_GET_CODE" = "404" ]; then
+ echo "Default project not found. Creating..."
+ PROJECT_PAYLOAD='{"project-id": "00000000-0000-0000-0000-000000000000", "project-name": "default"}'
+
+ PROJECT_CODE=$$(curl -s -o /tmp/response.txt -w "%{http_code}" \
+ -X POST \
+ -H "Content-Type: application/json" \
+ -d "$$PROJECT_PAYLOAD" \
+ "$$LAKEKEEPER_BASE_URI/management/v1/project" || echo "000")
+
+ if [ "$$PROJECT_CODE" -lt 200 ] || [ "$$PROJECT_CODE" -ge 300 ]; then
+ echo "Failed to create Lakekeeper's default project. HTTP Code: $$PROJECT_CODE"
+ echo "ERROR RESPONSE:"
+ if [ -f /tmp/response.txt ]; then cat /tmp/response.txt; fi
+ echo ""
+ exit 1
+ fi
+ echo "Created Lakekeeper's default project successfully (HTTP $$PROJECT_CODE)."
+ else
+ echo "Failed to check Lakekeeper's default project. HTTP Code: $$PROJECT_GET_CODE"
+ echo "ERROR RESPONSE:"
+ if [ -f /tmp/project.txt ]; then cat /tmp/project.txt; fi
+ echo ""
+ exit 1
+ fi
+
+
+ echo "Step 2: Ensuring Warehouse '$$STORAGE_ICEBERG_CATALOG_REST_WAREHOUSE_NAME' exists..."
+
+ # Detect existing warehouse first so we only POST when it's actually
+ # missing — keeps this init idempotent across re-runs.
+ echo "Checking whether Lakekeeper Warehouse '$$STORAGE_ICEBERG_CATALOG_REST_WAREHOUSE_NAME' already exists..."
+ WAREHOUSE_LIST_CODE=$$(curl -s -o /tmp/warehouses.txt -w "%{http_code}" \
+ "$$LAKEKEEPER_BASE_URI/management/v1/warehouse" || echo "000")
+
+ if [ "$$WAREHOUSE_LIST_CODE" -lt 200 ] || [ "$$WAREHOUSE_LIST_CODE" -ge 300 ]; then
+ echo "Failed to list Lakekeeper warehouses. HTTP Code: $$WAREHOUSE_LIST_CODE"
+ echo "ERROR RESPONSE:"
+ if [ -f /tmp/warehouses.txt ]; then cat /tmp/warehouses.txt; fi
+ echo ""
+ exit 1
+ fi
+
+ if jq -e --arg name "$$STORAGE_ICEBERG_CATALOG_REST_WAREHOUSE_NAME" '.warehouses[]? | select(.name == $$name)' /tmp/warehouses.txt >/dev/null; then
+ echo "Lakekeeper Warehouse '$$STORAGE_ICEBERG_CATALOG_REST_WAREHOUSE_NAME' already exists. Skipping creation."
+ else
+ echo "Warehouse not found. Creating '$$STORAGE_ICEBERG_CATALOG_REST_WAREHOUSE_NAME'..."
+ CREATE_PAYLOAD=$$(cat <<EOF
+ {
+ "warehouse-name": "$$STORAGE_ICEBERG_CATALOG_REST_WAREHOUSE_NAME",
+ "project-id": "00000000-0000-0000-0000-000000000000",
+ "storage-profile": {
+ "type": "s3",
+ "bucket": "$$STORAGE_ICEBERG_CATALOG_REST_S3_BUCKET",
+ "region": "$$STORAGE_S3_REGION",
+ "endpoint": "$$STORAGE_S3_ENDPOINT",
+ "flavor": "s3-compat",
+ "path-style-access": true,
+ "sts-enabled": false
+ },
+ "storage-credential": {
+ "type": "s3",
+ "credential-type": "access-key",
+ "aws-access-key-id": "$$STORAGE_S3_AUTH_USERNAME",
+ "aws-secret-access-key": "$$STORAGE_S3_AUTH_PASSWORD"
+ }
+ }
+ EOF
+ )
+
+ WAREHOUSE_CODE=$$(curl -s -o /tmp/response.txt -w "%{http_code}" \
+ -X POST \
+ -H "Content-Type: application/json" \
+ -d "$$CREATE_PAYLOAD" \
+ "$$LAKEKEEPER_BASE_URI/management/v1/warehouse" || echo "000")
+
+ if [ "$$WAREHOUSE_CODE" -lt 200 ] || [ "$$WAREHOUSE_CODE" -ge 300 ]; then
+ echo "Failed to create Lakekeeper Warehouse. HTTP Code: $$WAREHOUSE_CODE"
+ echo "ERROR RESPONSE:"
+ if [ -f /tmp/response.txt ]; then cat /tmp/response.txt; fi
+ echo ""
+ exit 1
+ fi
+ echo "Created Lakekeeper Warehouse successfully (HTTP $$WAREHOUSE_CODE)."
+ fi
+
+ echo "Initialization sequence completed successfully!"
+
+
# Part2: Specification of Texera's micro-services
# FileService provides endpoints for Texera's dataset management
file-service:
@@ -166,6 +355,10 @@ services:
depends_on:
workflow-compiling-service:
condition: service_started
+ lakekeeper:
+ condition: service_healthy
+ lakekeeper-init:
+ condition: service_completed_successfully
env_file:
- .env
volumes: