This is an automated email from the ASF dual-hosted git repository.
Yicong-Huang pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/texera.git
The following commit(s) were added to refs/heads/main by this push:
new 540785920d docs: rewrite AGENTS.md as agent-facing index (#4825)
540785920d is described below
commit 540785920d233f2656e1ea398c12bd114cd02860
Author: Yicong Huang <[email protected]>
AuthorDate: Sat May 2 22:00:29 2026 -0700
docs: rewrite AGENTS.md as agent-facing index (#4825)
## What changes were proposed in this PR?
Rewrite `AGENTS.md` so it works as a slim, index-style guide for coding
agents instead of duplicating contributor docs.
- Architecture map: services + Amber breakdown.
- Topic redirects: contribution / lint / format / testing / license
header now point to `CONTRIBUTING.md`, `SECURITY.md`,
`.github/PULL_REQUEST_TEMPLATE`, the issue templates, `.licenserc.yaml`,
and per-module READMEs.
- Agent-specific rules consolidated: scope/safety, worktree-based dev,
short Conventional-Commits branch and commit naming, issue-first
workflow, brevity for issues and PRs (incl. required frontend
before/after screenshots), test-driven development, and CI label gating
(label → stacks via `LABEL_STACKS` in `required-checks.yml`).
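
The branch and commit naming rule summarized above can be sketched as a small checker. This is a minimal illustration, not part of the repository: the `TYPES` list, the hard 60-character cap (the guide says "≤ ~60 chars"), and the helper names `check_subject` / `check_branch` are all assumptions made for the example.

```python
import re

# Assumption: allowed types are inferred from the examples in AGENTS.md
# plus this commit's own "docs:" prefix; the repo may accept others.
TYPES = ("feat", "fix", "test", "chore", "ci", "docs", "refactor")

# Assumption: the guide's approximate limit is treated as a hard cap here.
MAX_LEN = 60

# type(scope): summary  (scope is optional)
SUBJECT_RE = re.compile(r"^(?P<type>[a-z]+)(\((?P<scope>[a-z0-9-]+)\))?: \S.*$")
# type/short-slug
BRANCH_RE = re.compile(r"^(?P<type>[a-z]+)/[a-z0-9]+(-[a-z0-9]+)*$")


def check_subject(subject: str) -> bool:
    """Return True if a commit subject looks like a short Conventional Commit."""
    match = SUBJECT_RE.match(subject)
    return bool(match) and match.group("type") in TYPES and len(subject) <= MAX_LEN


def check_branch(branch: str) -> bool:
    """Return True if a branch name has the same type/slug shape."""
    match = BRANCH_RE.match(branch)
    return bool(match) and match.group("type") in TYPES and len(branch) <= MAX_LEN
```

For instance, `check_subject("feat(agent-service): enable workflow edit")` and `check_branch("fix/marker-replay")` both pass, while a prose title such as `"Fix marker replay"` does not.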
## Any related issues, documentation, discussions?
Closes #4823
## How was this PR tested?
Documentation-only change. Reviewed the rendered Markdown locally;
verified all in-repo links resolve.
## Was this PR authored or co-authored using generative AI tooling?
Generated-by: Claude Opus 4.7
---------
Co-authored-by: Claude Opus 4.7 (1M context) <[email protected]>
---
AGENTS.md | 362 +++++++++++++++++++++++++++-----------------------------------
1 file changed, 155 insertions(+), 207 deletions(-)
diff --git a/AGENTS.md b/AGENTS.md
index adf05218ce..cb1cd0fdb3 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -1,237 +1,185 @@
# AGENTS.md
-Guidance for coding agents working in the Apache Texera repository.
-
-## Project Overview
-
-Apache Texera is a collaborative data science and AI/ML workflow system. The
-repo is a multi-language monorepo with Scala/sbt backend services, Python worker
-runtime code, an Angular frontend, and a TypeScript/Bun agent service.
-
-Major areas:
-
-- `amber/`: workflow execution engine, Scala tests, Python worker runtime, and
- Python operator dependencies.
-- `common/`: shared Scala modules for auth, config, DAO, workflow core,
- workflow operators, and Python-template building.
-- `config-service/`, `access-control-service/`, `file-service/`,
- `computing-unit-managing-service/`, `workflow-compiling-service/`: backend
- services wired through `build.sbt`.
-- `frontend/`: Angular application. Uses Yarn 4.14.1 via Corepack for local
- development, Node >= 24.0.0, Nx, Prettier, ESLint, Karma/Jasmine, and
- ng-zorro. Note: `bin/texera-web-application.dockerfile` currently pins
- Yarn 4.5.1 in its frontend build stage.
-- `agent-service/`: TypeScript Elysia service for Texera LLM agents. CI uses
- Bun 1.3.3.
-- `pyright-language-service/`: TypeScript service for Python language support.
-- `sql/`: database DDL used by local runs and CI.
-- `bin/`: shell scripts and Dockerfiles for services, local deployment, and
- generated protobuf assets.
-
-## Ground Rules
-
-- Keep changes narrowly scoped. Do not rewrite unrelated files or move code
- between services unless the task explicitly requires it.
-- Preserve local user changes. Check `git status --short` before editing and do
- not revert unrelated dirty files.
-- Follow existing module boundaries and naming patterns. Prefer local helpers
- and service abstractions over introducing new framework-level utilities.
-- Add or update tests when behavior changes. For small UI-only fixes where unit
- tests are not practical, document the manual test steps.
-- Never commit secrets, local config, generated build output, caches, or binary
- artifacts. Examples to avoid include `python_udf.conf`, `.env` files, `target/`,
- `dist/`, `.pytest_cache/`, `.ruff_cache/`, and local logs.
-
-## Licensing
-
-- New source/config files should include the Apache 2.0 ASF license header unless
- `.licenserc.yaml` excludes that file type or path.
-- Markdown files are excluded from the license-header check.
-- Keep third-party/vendored-code attribution intact. `common/workflow-operator`
- has special license handling in `project/AddMetaInfLicenseFiles.scala`.
-- GitHub Actions in ASF repositories should use approved actions and preferably
- pinned SHAs, matching the existing workflow style.
-
-## Scala / Backend
-
-- Scala version: 2.13.18.
-- Java in CI: Temurin JDK 11.
-- Formatting: `.scalafmt.conf` uses scalafmt 2.6.4 with `maxColumn = 100`.
-- Lint rules live in `.scalafix.conf` and include `ProcedureSyntax` and
- `RemoveUnused`.
-
-Useful root commands:
+## Architecture Map
-```bash
-sbt scalafmtCheckAll
-sbt scalafmtAll
-sbt "scalafixAll --check"
-sbt scalafixAll
-sbt clean package
-sbt test
-```
+Apache Texera: Scala/sbt backend services + the Amber workflow execution
+engine, an Angular UI, and the agent service. JVM modules wired in
+[`build.sbt`](build.sbt).
-Targeted tests are preferred while iterating. Examples:
+| Area | Path | Detail |
+| --- | --- | --- |
+| Workflow execution engine (Amber) | `amber/` | [amber/README.md](amber/README.md) |
+| Backend services | `config-service/`, `access-control-service/`, `file-service/`, `computing-unit-managing-service/`, `workflow-compiling-service/` | `build.sbt` |
+| Shared Scala libs | `common/` (`auth`, `config`, `dao`, `workflow-core`, `workflow-operator`, `pybuilder`) | `build.sbt` |
+| Frontend (Angular) | `frontend/` | [frontend/README.md](frontend/README.md) |
+| Agent service (Bun/TS, LLM agents) | `agent-service/` | `agent-service/package.json` |
+| Pyright language service | `pyright-language-service/` | [pyright-language-service/README.md](pyright-language-service/README.md) |
+| Deploy scripts / Dockerfiles | `bin/` | [README](bin/README.md) / [k8s](bin/k8s/README.md) / [single-node](bin/single-node/README.md) |
+| DDL, sbt plugins | `sql/`, `project/` | files therein |
-```bash
-sbt "WorkflowExecutionService/testOnly org.apache.texera.amber.engine.e2e.ReconfigurationSpec"
-sbt "WorkflowCompilingService/testOnly *SomeSpec"
-```
+### Amber breakdown
-CI creates PostgreSQL databases from:
+| Path | Role |
+| --- | --- |
+| `amber/src/main/scala` | Pekko actors, scheduler, reconfiguration, fault tolerance, gRPC/proto |
+| `amber/src/main/python/pyamber` | Python engine (`pyamber`) — bridge to the Scala engine |
+| `amber/src/main/python/pytexera` | Python operator SDK exposed to UDFs |
-```bash
-psql -h localhost -U postgres -f sql/texera_ddl.sql
-psql -h localhost -U postgres -f sql/iceberg_postgres_catalog.sql
-psql -h localhost -U postgres -f sql/texera_lakefs.sql
-psql -h localhost -U postgres -v DB_NAME=texera_db_for_test_cases -f sql/texera_ddl.sql
+## Where Things Live
+
+| Topic | Source of truth |
+| --- | --- |
+| Contribution / PR / lint / format / testing / license header | [CONTRIBUTING.md](CONTRIBUTING.md) |
+| Reporting security issues | [SECURITY.md](SECURITY.md) |
+| PR template | [.github/PULL_REQUEST_TEMPLATE](.github/PULL_REQUEST_TEMPLATE) |
+| Issue templates | [bug](.github/ISSUE_TEMPLATE/bug-template.yaml) / [task](.github/ISSUE_TEMPLATE/task-template.yaml) / [feature](.github/ISSUE_TEMPLATE/feature-template.yaml) |
+| License-header coverage; vendored `workflow-operator` | [.licenserc.yaml](.licenserc.yaml); [project/AddMetaInfLicenseFiles.scala](project/AddMetaInfLicenseFiles.scala) |
+| Local single-node / k8s deploy | [single-node](bin/single-node/README.md), [k8s](bin/k8s/README.md) |
+
+If a topic is above, **read that file** instead of asking here.
+
+## Agent-Specific Rules
+
+### Scope and safety
+
+- Narrowly scoped changes. No unrelated rewrites or cross-service moves.
+- `git status --short` before editing; don't revert unrelated dirty files.
+- Never commit secrets / local config / build output / caches / binaries
+ (`python_udf.conf`, `.env`, `target/`, `dist/`, `.pytest_cache/`,
+ `.ruff_cache/`, logs).
+
+### Develop in a worktree
+
+Leave `texera/` on `main`. One worktree per PR, branched off a freshly
+fetched `upstream/main`.
+
+```
+texera/ # stays on main, never dirty
+texera-worktrees/<branch>/ # one worktree per PR
```
-## Python Runtime
+Reset to `upstream/main` at start; `git log upstream/main..HEAD` should
+contain only this PR's commits before pushing; remove the worktree after
+merge.
-Python worker code lives primarily under `amber/src/main/python`.
+### Environment
-- Supported CI Python versions: 3.10, 3.11, 3.12, 3.13.
-- Ruff config is in `amber/src/main/python/pyproject.toml`.
-- Ruff line length is 88 and target version is `py310`.
-- Generated protobuf code under `amber/src/main/python/proto` is excluded from
- Ruff.
+| Component | Version |
+| --- | --- |
+| Java | JDK 11 |
+| Scala | 2.13 |
+| Python | 3.12 |
+| Node | 24 |
-Useful commands:
+One Python venv shared across worktrees, sibling of the texera checkout:
+
+```
+<workspace>/
+├── texera/ # main checkout
+├── texera-worktrees/<br>/ # per-PR worktrees
+└── venv312/ # shared Python 3.12 venv
+```
```bash
-cd amber/src/main/python
-ruff check .
-ruff format --check .
-pytest -sv
-python -m pytest core/runnables/test_main_loop.py -v
+python3.12 -m venv ../venv312 && source ../venv312/bin/activate
+pip install -r amber/requirements.txt -r amber/operator-requirements.txt
```
-Install dependencies from `amber/requirements.txt` and
-`amber/operator-requirements.txt` when running the Python runtime or tests
-outside CI.
+Tests that spawn Python workers need an interpreter path. Edit `python.path`
+in [`udf.conf`](common/config/src/main/resources/udf.conf) or
+`export UDF_PYTHON_PATH="$(pwd)/../venv312/bin/python"` (env var overrides).
+Without it, `sbt` Python-integration tests fail to launch a worker.
-## Frontend
+### Branch and commit naming
-The Angular frontend lives in `frontend/`.
+Short, **Conventional Commits**, same shape for branch and commit subject.
-- Node engine: `>=24.0.0`.
-- Package manager: Yarn 4.14.1 via Corepack.
-- Formatting is Prettier plus prettier-eslint. Prettier uses 2 spaces,
- semicolons, double quotes, `printWidth: 120`, and LF endings.
-- Unit tests are Karma/Jasmine. Specs should live next to frontend code as
- `.spec.ts` files.
+| Kind | Branch | Commit |
+| --- | --- | --- |
+| Feature | `feat/agent-workflow-edit` | `feat(agent-service): enable workflow edit` |
+| Bug fix | `fix/marker-replay` | `fix(amber): marker replay during reconfiguration` |
+| Tests | `test/pyamber-handlers` | `test(pyamber): add handler unit tests` |
+| Chore | `chore/angular-21` | `chore(deps): upgrade frontend to Angular 21` |
+| CI | `ci/cache-action-bump` | `ci: bump coursier/cache-action to v8.1.0` |
-Useful commands:
+Both ≤ ~60 chars. For code changes, if you use a scope, use the module name
+(`amber`, `pyamber`, `frontend`, `agent-service`, `file-service`, …) — not
+`amber-python`. Use `chore(deps): ...` for dependency-only updates, and
+`ci: ...` for CI-only changes. No `Co-authored-by:` trailer for the repo
+owner.
-```bash
-cd frontend
-corepack enable
-corepack prepare yarn@4.14.1 --activate
-yarn install --immutable --inline-builds --network-timeout=100000
-yarn format:ci
-yarn format:fix
-yarn lint
-yarn test --watch=false
-yarn test:ci
-yarn build:ci
-yarn start
+### Issues and PRs
+
+Issue-first; both stay short.
+
+```
+issue (template + Type) -> PR (Closes #N, template) -> review -> merge
```
-For UI changes, include screenshots/GIFs or clear manual verification steps in
-the PR description when the behavior is visual or interactive.
+- Every change starts as an issue (minor typo / docs excepted). File against
+ `apache/texera`, never a fork.
+- Pick the right template **and** set the GitHub Issue **Type** explicitly
+ (`Bug` / `Task` / `Feature`); the template's `type:` frontmatter doesn't
+ always apply on creation.
+- Reference the issue: `Closes #N` (or `Fixes` / `Resolves`, or "related to").
+- Issue titles are **plain prose**; never use the Conventional Commits
+ format (`type(scope): ...`) — that prefix is for commit and PR titles only.
+- Task issues match `task-template.yaml` exactly.
+- Prefer **tables** and small **ASCII diagrams** over long bullets. Don't
+ restate the diff or the template.
+- For bugs, lead with **root cause** and a **before -> after** sketch:
+ ```
+ Before: reconfiguration -> replay marker -> worker hangs
+ After: reconfiguration -> replay marker -> resume from checkpoint
+ ```
+- **Frontend PRs**: any visible UI change requires screenshots / GIF,
+ **before / after** side by side. For purely visual fixes that's the
+ primary verification under "How was this PR tested?"; interactive flows
+ also list manual steps (click path, browser, viewport).
+
+### Tests come first
+
+TDD. Write the test before the source change.
-## Agent Service
+```
+write/adjust test (red) -> edit source (green) -> refactor
+```
-The standalone LLM agent service lives in `agent-service/`.
+| Situation | Order |
+| --- | --- |
+| New feature / behavior change | Failing test, then implement. |
+| Bug fix | Regression test reproducing the bug, then fix. |
+| Code with **no tests** | **Characterization tests** pin current behavior first; only then change source. |
+| Refactor (no behavior change) | Tests stay green throughout — no assertion edits. |
-- Runtime/package tool in CI: Bun 1.3.3.
-- Source is TypeScript ESM.
+Every test must cover:
-Useful commands:
+- **Both directions**: positive (valid → expected) **and** negative (invalid
+ / error → specific failure mode).
+- **Edge cases**: empty / null / zero / max / boundary, unicode,
+ concurrency/order, missing or malformed config.
+- **Don't assume valid.** External input (user / API / file / message) must
+ be tested with bad input.
-```bash
-cd agent-service
-bun install --frozen-lockfile
-bun run format:check
-bun run typecheck
-bun test
-bun run dev
+Don't claim "tested" without commands. Paste the exact `sbt testOnly` /
+`pytest` / `yarn test:ci` / `bun test` invocation under "How was this PR
+tested?".
+
+### CI labels & gating
+
+CI runs are **selected by PR labels**, not by file diff.
+
+```
+diff -> pr-labeler -> labels on PR -> required-checks maps labels to stacks -> CI runs
```
-## GitHub PR Writing
-
-Texera requires Conventional Commit PR titles and commit messages. Closed PRs
-commonly use titles like:
-
-- `feat(agent-service): enable Texera Agent to do workflow editing and execution`
-- `fix(amber): Python internal marker replay during reconfiguration`
-- `fix(frontend): version history timestamp display`
-- `test(amber-python): add unit tests for evaluate-expression and retry-current-tuple handlers`
-- `chore(deps): upgrade frontend to Angular 21`
-- `ci: bump coursier/cache-action to v8.1.0`
-
-Use the existing `.github/PULL_REQUEST_TEMPLATE` sections:
-
-- `What changes were proposed in this PR?`
-- `Any related issues, documentation, discussions?`
-- `How was this PR tested?`
-- `Was this PR authored or co-authored using generative AI tooling?`
-
-PR description conventions from recent closed PRs:
-
-- Start with the reason for the change, not just the files touched.
-- For bugs, state the root cause and the before/after behavior.
-- For features, describe the user-facing capability and key implementation
- pieces.
-- Link issues with `Closes #1234`, `Fixes #1234`, or `Resolves #1234` when the
- PR should close the issue.
-- Include exact test commands and, when useful, the specific test names or pass
- counts.
-- For UI work, add screenshots/GIFs or explicit manual verification notes.
-- If no automated tests were added, explain why and list manual tests.
-- Answer the AI tooling question explicitly. If AI was used, use the
- `Generated-by: <tool and version>` wording from the template or a similarly
- explicit disclosure. If not, write `No`.
-
-## GitHub Issue Writing
-
-Use the issue templates in `.github/ISSUE_TEMPLATE`.
-
-Bug reports should include:
-
-- What happened and what was expected.
-- Reproduction steps that another contributor can run.
-- Texera version, usually `1.1.0-incubating (Pre-release/Master)` for current
- main.
-- Commit hash when known.
-- Browser information for frontend bugs.
-- Relevant logs or stack traces in fenced code blocks.
-
-Task and feature issues should include:
-
-- A concise task/feature summary.
-- Motivation or user impact.
-- Proposed action or scope, ideally as concrete bullets.
-- Priority (`P0` through `P3`) and task type.
-- File paths, classes, or modules when the work is already localized.
-
-Recent closed issues are usually specific and actionable: they name the failing
-test, exact command, affected files/classes, observable symptoms, and expected
-fix direction. Preserve that style for future issues.
-
-## Before Opening a PR
-
-Run the narrowest checks that cover the change, then broaden when touching shared
-behavior:
-
-- Scala/backend: targeted `testOnly`, then `sbt scalafmtCheckAll`,
- `sbt "scalafixAll --check"`, and `sbt test` as appropriate.
-- Python runtime: `ruff check .`, `ruff format --check .`, and targeted/full
- `pytest` from `amber/src/main/python`.
-- Frontend: `yarn format:ci`, targeted/full `yarn test:ci`, and
- `yarn build:ci`.
-- Agent service: `bun run format:check`, `bun run typecheck`, and `bun test`.
-
-If a full check is too expensive or cannot run locally, state exactly what was
-run and why the omitted check was skipped.
+- Path → label rules: [`.github/labeler.yml`](.github/labeler.yml)
+- Label → stacks (`LABEL_STACKS`, source of truth):
+ [`.github/workflows/required-checks.yml`](.github/workflows/required-checks.yml).
+ Read it directly; don't duplicate the mapping here.
+- Need extra coverage the diff doesn't imply (e.g. a `common/` change you
+ suspect breaks the frontend)? **Add the relevant label manually**.
+- Empty stack union (docs-only / dev-only / `dependencies` / `feature` /
+ `fix` / `refactor` / `release/*` only) skips every build stack on purpose.
+- `release/*` labels select backport targets; removing one cancels that
+ backport.
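
The label gating described in the final section of the diff (labels map to stacks, and an empty union skips every build stack) can be sketched as follows. This is only an illustration of the union logic, not the workflow's actual code: the stack names and most label names in `LABEL_STACKS` below are hypothetical, and the real mapping lives in `.github/workflows/required-checks.yml`. Treating an unknown label as contributing no stacks is also an assumption.

```python
# Hypothetical label -> stacks table; the real source of truth is
# LABEL_STACKS in .github/workflows/required-checks.yml.
LABEL_STACKS = {
    "frontend": {"frontend"},          # hypothetical label and stack name
    "common": {"scala"},               # hypothetical label and stack name
    "amber": {"scala", "python"},      # hypothetical label and stack names
    "dependencies": set(),             # listed in AGENTS.md as selecting no stack
    "fix": set(),                      # listed in AGENTS.md as selecting no stack
}


def stacks_for(labels):
    """Return the union of stacks selected by the PR's labels."""
    stacks = set()
    for label in labels:
        # Assumption: labels missing from the table select nothing.
        stacks |= LABEL_STACKS.get(label, set())
    return stacks


def should_skip_builds(labels):
    """An empty stack union means every build stack is skipped on purpose."""
    return not stacks_for(labels)
```

Under these assumed values, a PR labeled only `fix` and `dependencies` skips all build stacks, while manually adding `frontend` to a `common` change widens the union to both stacks.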