[I] Remove Postgres credentials from CU Master and CU Worker [texera]

via GitHub Sun, 10 May 2026 10:55:49 -0700


bobbai00 opened a new issue, #5011:
URL: https://github.com/apache/texera/issues/5011


   ### Task Summary
   
   ## Motivation
   
   CU Master and CU Worker run user-supplied UDF code. Today the CU pod
   ships with the database credentials (`STORAGE_JDBC_URL`,
   `STORAGE_JDBC_USERNAME`, `STORAGE_JDBC_PASSWORD`) in its environment so
   the engine can read and write the metadata tables directly. Anything
   that escapes the UDF sandbox can read those env vars and run arbitrary
   SQL against the shared Postgres instance: reading other users'
   workflows, modifying execution rows, or dropping data.
   
   Removing the credentials from the executor closes that exposure. The
   web-app should remain the only writer to the metadata DB; the executor
   should hold no credentials.
   
   ## Current Usage
   
   On `main`, `ComputingUnitMaster.run` opens a JDBC pool at startup:
   
   ```scala
   SqlServer.initConnection(
     StorageConfig.jdbcUrl,
     StorageConfig.jdbcUsername,
     StorageConfig.jdbcPassword
   )
   ```
   
   Once the pool is open, several engine and service code paths reach
   Postgres directly via `SqlServer`:
   
   | Area | File | What it does |
   |---|---|---|
   | Execution lifecycle | `web/service/WorkflowService.scala` | `ExecutionsMetadataPersistService.insertNewExecution` (INSERT new row), `tryUpdateExistingExecution` (UPDATE status, log_location, runtime_stats_uri, result, etc.). |
   | State transitions | `web/storage/ExecutionStateStore.scala` (`updateWorkflowState`) | Persists every workflow state change (READY/RUNNING/COMPLETED/FAILED/…). Called from many sites in `WorkflowService` / `WorkflowExecutionService`. |
   | Operator/port URI registry | `web/resource/.../WorkflowExecutionsResource.scala` | `insertOperatorPortResultUri`, `insertOperatorConsoleUri`, `getResultUriByLogicalPortId`, etc. Called by engine code (e.g. `RegionExecutionCoordinator`) and by `SyncExecutionResource`. |
   | Result/log cleanup | `web/ComputingUnitMaster.scala` (`cleanExecutions`, `recurringCheckExpiredResults`) | On startup and on a recurring schedule, queries `workflow_executions` for expired rows and updates their status. |
   | Cost-based scheduling | `engine/architecture/scheduling/CostEstimator.scala` | `getOperatorExecutionTimeInSeconds` reads the latest successful `workflow_executions.runtime_stats_uri` for a `wid`. |
   | Dataset path resolution | `common/workflow-core/.../FileResolver.scala` | `datasetResolveFunc` joins `USER × DATASET × DATASET_VERSION` to translate `/owner/dataset/version/file` into a `dataset:///<repo>/<hash>/<file>` URI. Hit during workflow compile. |
   
   `ComputingUnitWorker.scala` itself is trivial (only calls
   `AmberRuntime.startActorWorker`), but a Worker process shares the engine
   code with the Master, so any of the engine-side call sites above
   (notably `RegionExecutionCoordinator` and `CostEstimator`) execute
   inside the Worker process when the corresponding actor is hosted there.
   That is why Worker pods are also deployed with `STORAGE_JDBC_*` today.
   
   ## Proposed Design
   
   Move every direct DB access reachable from CU Master / CU Worker behind
   an HTTP service that owns the credentials. The executor holds no JDBC
   config and authenticates each call by forwarding the originating user's
   JWT.
   
   ```
                        ┌─ web-app ──────────────┐
   CU Master/Worker ──▶ │  (execution metadata)  │ ──▶ Postgres
      (JWT only)        │  file-service (datasets)│
                        └─────────────────────────┘
   ```
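   
   As a rough illustration of the JWT-forwarding idea, a call site that
   persists a state change today could instead build an authenticated HTTP
   request. This is only a sketch: the object name, the endpoint path
   `/api/executions/<id>/state`, and the JSON payload shape are all
   hypothetical and not part of any existing Texera API. It uses the JDK's
   `java.net.http` types (Java 11+).
   
   ```scala
   import java.net.URI
   import java.net.http.HttpRequest
   
   // Hypothetical helper: not an existing Texera class.
   object MetadataClientSketch {
   
     // Builds a request that asks the owning service to record a state
     // change, authenticated with the originating user's JWT instead of
     // database credentials. The path and body format are assumptions.
     def buildStateUpdateRequest(
         baseUrl: String,
         executionId: Long,
         state: String,
         userJwt: String
     ): HttpRequest =
       HttpRequest
         .newBuilder()
         .uri(URI.create(s"$baseUrl/api/executions/$executionId/state"))
         .header("Authorization", s"Bearer $userJwt")
         .header("Content-Type", "application/json")
         .PUT(HttpRequest.BodyPublishers.ofString(s"""{"state":"$state"}"""))
         .build()
   }
   ```
   
   The request would then be sent with a `java.net.http.HttpClient`; the
   executor process itself never sees a JDBC URL, username, or password.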
   
   ## Roadmap
   
   1. Inventory every `SqlServer` call site reachable from CU Master /
      Worker (the table above is the starting point; double-check by
      grepping the `WorkflowExecutionService` runtime classpath).
   2. For each call site, define the HTTP contract that replaces it on
      the appropriate owning service (web-app for execution metadata,
      file-service for dataset resolution).
   3. Establish the JWT-forwarding plumbing from each entry point on CU
      Master through to the call site.
   4. Migrate call sites one by one; each migration is independently
      reviewable and can sit behind a feature toggle if needed.
   5. Remove `SqlServer.initConnection` from `ComputingUnitMaster.run`.
   6. Drop `STORAGE_JDBC_*` from CU pod templates
      (`bin/k8s/templates/workflow-computing-unit-*.yaml`,
      `bin/single-node/docker-compose.yml`) and from any forwarding logic
      in `ComputingUnitManagingResource` that pushes those vars into
      spawned pods.
   7. Add a smoke test that boots CU Master with `unset STORAGE_JDBC_*`
      and runs a workflow end-to-end, locking the contract in CI.
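   
   One possible building block for the smoke test in step 7 is a guard
   that fails fast when any `STORAGE_JDBC_*` variable is still visible to
   the process. The sketch below is an assumption about how such a check
   might look; `CredentialGuardSketch` and `leakedJdbcVars` are
   hypothetical names, and the real test would also need to run a
   workflow end-to-end.
   
   ```scala
   // Hypothetical guard: not an existing Texera class.
   object CredentialGuardSketch {
   
     // Prefix shared by the three credentials named in the Motivation section.
     val ForbiddenPrefix = "STORAGE_JDBC_"
   
     // Returns the names of all forbidden variables present in `env`.
     def leakedJdbcVars(env: Map[String, String]): Set[String] =
       env.keySet.filter(_.startsWith(ForbiddenPrefix))
   
     // Exits with a non-zero status if the current process environment
     // still carries database credentials.
     def main(args: Array[String]): Unit = {
       val leaked = leakedJdbcVars(sys.env)
       if (leaked.nonEmpty) {
         System.err.println(
           s"CU process must not see: ${leaked.toSeq.sorted.mkString(", ")}"
         )
         sys.exit(1)
       }
     }
   }
   ```
   
   Taking the environment as a parameter keeps the check testable without
   mutating the real process environment.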
   
   ### Task Type
   
   - [x] Refactor / Cleanup
   - [ ] DevOps / Deployment / CI
   - [ ] Testing / QA
   - [ ] Documentation
   - [ ] Performance
   - [ ] Other
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
