aglinxinyuan commented on code in PR #4825:
URL: https://github.com/apache/texera/pull/4825#discussion_r3177641376

##########
AGENTS.md:
##########
@@ -1,235 +1,187 @@
 # AGENTS.md
-Guidance for coding agents working in the Apache Texera repository.
-
-## Project Overview
-
-Apache Texera is a collaborative data science and AI/ML workflow system. The
-repo is a multi-language monorepo with Scala/sbt backend services, Python worker
-runtime code, an Angular frontend, and a TypeScript/Bun agent service.
-
-Major areas:
-
-- `amber/`: workflow execution engine, Scala tests, Python worker runtime, and
-  Python operator dependencies.
-- `common/`: shared Scala modules for auth, config, DAO, workflow core,
-  workflow operators, and Python-template building.
-- `config-service/`, `access-control-service/`, `file-service/`,
-  `computing-unit-managing-service/`, `workflow-compiling-service/`: backend
-  services wired through `build.sbt`.
-- `frontend/`: Angular application. Uses Yarn 4.14.1, Node >= 20.19.0, Nx,
-  Prettier, ESLint, Karma/Jasmine, and ng-zorro.
-- `agent-service/`: TypeScript Elysia service for Texera LLM agents. CI uses
-  Bun 1.3.3.
-- `pyright-language-service/`: TypeScript service for Python language support.
-- `sql/`: database DDL used by local runs and CI.
-- `bin/`: shell scripts and Dockerfiles for services, local deployment, and
-  generated protobuf assets.
-
-## Ground Rules
-
-- Keep changes narrowly scoped. Do not rewrite unrelated files or move code
-  between services unless the task explicitly requires it.
-- Preserve local user changes. Check `git status --short` before editing and do
-  not revert unrelated dirty files.
-- Follow existing module boundaries and naming patterns. Prefer local helpers
-  and service abstractions over introducing new framework-level utilities.
-- Add or update tests when behavior changes. For small UI-only fixes where unit
-  tests are not practical, document the manual test steps.
-- Never commit secrets, local config, generated build output, caches, or binary
-  artifacts. Examples to avoid include `python_udf.conf`, `.env` files, `target/`,
-  `dist/`, `.pytest_cache/`, `.ruff_cache/`, and local logs.
-
-## Licensing
-
-- New source/config files should include the Apache 2.0 ASF license header unless
-  `.licenserc.yaml` excludes that file type or path.
-- Markdown files are excluded from the license-header check.
-- Keep third-party/vendored-code attribution intact. `common/workflow-operator`
-  has special license handling in `project/AddMetaInfLicenseFiles.scala`.
-- GitHub Actions in ASF repositories should use approved actions and preferably
-  pinned SHAs, matching the existing workflow style.
-
-## Scala / Backend
-
-- Scala version: 2.13.18.
-- Java in CI: Temurin JDK 11.
-- Formatting: `.scalafmt.conf` uses scalafmt 2.6.4 with `maxColumn = 100`.
-- Lint rules live in `.scalafix.conf` and include `ProcedureSyntax` and
-  `RemoveUnused`.
-
-Useful root commands:
+## Architecture Map
-```bash
-sbt scalafmtCheckAll
-sbt scalafmtAll
-sbt "scalafixAll --check"
-sbt scalafixAll
-sbt clean package
-sbt test
-```
+Apache Texera: Scala/sbt backend services + the Amber workflow execution
+engine, an Angular UI, and helper TypeScript services. JVM modules wired in
+[`build.sbt`](build.sbt).
-Targeted tests are preferred while iterating. Examples:
+| Area | Path | Detail |
+| --- | --- | --- |
+| Workflow execution engine (Amber) | `amber/` | [amber/README.md](amber/README.md) |
+| Backend services | `config-service/`, `access-control-service/`, `file-service/`, `computing-unit-managing-service/`, `workflow-compiling-service/` | `build.sbt` |
+| Shared Scala libs | `common/` (`auth`, `config`, `dao`, `workflow-core`, `workflow-operator`, `pybuilder`) | `build.sbt` |
+| Frontend (Angular) | `frontend/` | [frontend/README.md](frontend/README.md) |
+| Agent service (Bun/TS, LLM agents) | `agent-service/` | `agent-service/package.json` |
+| Pyright language service | `pyright-language-service/` | [pyright-language-service/README.md](pyright-language-service/README.md) |
+| Deploy scripts / Dockerfiles | `bin/` | [README](bin/README.md) / [k8s](bin/k8s/README.md) / [single-node](bin/single-node/README.md) |
+| DDL, sbt plugins | `sql/`, `project/` | files therein |
-```bash
-sbt "WorkflowExecutionService/testOnly org.apache.texera.amber.engine.e2e.ReconfigurationSpec"
-sbt "WorkflowCompilingService/testOnly *SomeSpec"
-```
+### Amber breakdown
-CI creates PostgreSQL databases from:
+| Path | Role |
+| --- | --- |
+| `amber/src/main/scala` | Pekko actors, scheduler, reconfiguration, fault tolerance, gRPC/proto |
+| `amber/src/main/python/pyamber` | Python engine (`pyamber`) — bridge to the Scala engine |
+| `amber/src/main/python/pytexera` | Python operator SDK exposed to UDFs |
-```bash
-psql -h localhost -U postgres -f sql/texera_ddl.sql
-psql -h localhost -U postgres -f sql/iceberg_postgres_catalog.sql
-psql -h localhost -U postgres -f sql/texera_lakefs.sql
-psql -h localhost -U postgres -v DB_NAME=texera_db_for_test_cases -f sql/texera_ddl.sql
+## Where Things Live
+
+| Topic | Source of truth |
+| --- | --- |
+| Contribution / PR / lint / format / testing / license header | [CONTRIBUTING.md](CONTRIBUTING.md) |
+| Reporting security issues | [SECURITY.md](SECURITY.md) |
+| PR template | [.github/PULL_REQUEST_TEMPLATE](.github/PULL_REQUEST_TEMPLATE) |
+| Issue templates | [bug](.github/ISSUE_TEMPLATE/bug-template.yaml) / [task](.github/ISSUE_TEMPLATE/task-template.yaml) / [feature](.github/ISSUE_TEMPLATE/feature-template.yaml) |
+| License-header coverage; vendored `workflow-operator` | [.licenserc.yaml](.licenserc.yaml); [project/AddMetaInfLicenseFiles.scala](project/AddMetaInfLicenseFiles.scala) |
+| Local single-node / k8s deploy | [single-node](bin/single-node/README.md), [k8s](bin/k8s/README.md) |
+
+If a topic is above, **read that file** instead of asking here.
+
+## Agent-Specific Rules
+
+### Scope and safety
+
+- Narrowly scoped changes. No unrelated rewrites or cross-service moves.
+- `git status --short` before editing; don't revert unrelated dirty files.
+- Never commit secrets / local config / build output / caches / binaries
+  (`python_udf.conf`, `.env`, `target/`, `dist/`, `.pytest_cache/`,
+  `.ruff_cache/`, logs).
+
+### Develop in a worktree
+
+Leave `texera/` on `main`. One worktree per PR, branched off a freshly
+fetched `upstream/main`.
+
+```
+texera/                      # stays on main, never dirty
+texera-worktrees/<branch>/   # one worktree per PR
 ```
-## Python Runtime
+Reset to `upstream/main` at start; `git log upstream/main..HEAD` should
+contain only this PR's commits before pushing; remove the worktree after
+merge.
-Python worker code lives primarily under `amber/src/main/python`.
+### Environment
-- Supported CI Python versions: 3.10, 3.11, 3.12, 3.13.
-- Ruff config is in `amber/src/main/python/pyproject.toml`.
-- Ruff line length is 88 and target version is `py310`.
-- Generated protobuf code under `amber/src/main/python/proto` is excluded from
-  Ruff.
+| Component | Version |
+| --- | --- |
+| Java | JDK 11 |
+| Scala | 2.13 |
+| Python | 3.12 |
+| Node | 24 |

Review Comment:
   Upgraded to 24 in #4658

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
