GitHub user aicam created a discussion: Architecture discussion: how should the
frontend and agent-service share backend API contracts? (REST contract drift)
## Summary — the contract drift problem
Texera now has **two independent TypeScript clients of the same backend REST
APIs**: the Angular `frontend` and the `agent-service` (the LLM agent that
operates Texera on a user's behalf). Both call the same Scala backend endpoints
(file-service, dashboard-service, workflow-compiling-service,
workflow-execution-service), and both **hand-redeclare the same contract** —
URL paths, request shapes, and response DTOs.
That means each endpoint's contract exists in **three places** that must be
kept in sync by hand:
1. The Scala backend (the actual source of truth — JAX-RS resources)
2. `frontend` (Angular `HttpClient`, types in `common/type/*`, URL constants in
the dataset service)
3. `agent-service` (`fetch`-based clients in `agent-service/src/api/*`, with
its own duplicated interfaces)
**Concrete example — listing dataset versions:**
- Backend: a JAX-RS resource in file-service
- Frontend: `DATASET_VERSION_RETRIEVE_LIST_URL` + a `DatasetVersion` type
- Agent: `listDatasetVersions(...)` building `` `/${did}/version/list` `` + its
own `DatasetVersion` interface in `agent-service/src/api/dataset-api.ts`
The path string and the DTO are duplicated across the frontend and the agent,
and both can silently drift from the Scala backend. When the backend renames a
field or changes a route, nothing fails at build time — the two TS clients just
quietly go stale until something breaks at runtime. As the agent-service grows
to cover more of Texera's surface (datasets, workflow CRUD, execution, operator
metadata), this overlap with the frontend keeps growing.
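A minimal sketch of why this drift is silent, assuming a hypothetical `DatasetVersion` shape (the `dvid` and `name` fields and the rename to `versionName` are illustrative, not taken from the real DTO). The hand-declared interface keeps compiling after the backend changes, and the mismatch only shows up as `undefined` at runtime:

```typescript
// Hypothetical DTO as a client might hand-declare it; field names are assumptions.
interface DatasetVersion {
  dvid: number;
  name: string;
}

// Simulated backend payload after a hypothetical rename of `name` to `versionName`.
const responseBody = '[{"dvid": 1, "versionName": "v1"}]';

// JSON.parse returns `any`, so this assignment type-checks whatever the payload contains.
const versions: DatasetVersion[] = JSON.parse(responseBody);

// Compiles cleanly, yet every entry is undefined at runtime because the field is gone.
const names: string[] = versions.map((v) => v.name);
console.log(names);
```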
We'd like to align on a standard for how endpoints/contracts are shared across
clients before the duplication spreads further.
## Important: what does NOT solve this (and why MCP is orthogonal)
This is closely related to **#5610** (inline agent tools vs. a dedicated
TexeraMCP server), so it's worth being explicit about the relationship, because
the two are easy to conflate.
**#5610 is about a different axis.** It asks how the *agent* (an LLM) should
consume Texera's capabilities — inline function-calling tools vs. MCP. That is
an **LLM-facing delivery** question. The drift problem here is about how **two
programmatic HTTP clients** avoid re-declaring the backend's contract. That is
a **client-contract** question. They are orthogonal layers.
**Adopting MCP does not remove this redundancy.** If Texera's actions move
behind a TexeraMCP server:
- The agent-service stops hand-writing tool wrappers — but the endpoint+DTO
knowledge just **relocates into the MCP server**, which still has to call the
REST endpoints. The contract copy moves; it doesn't disappear.
- The **frontend is unaffected**. The Angular UI is a human-facing client and
will never be an MCP client (MCP is a protocol for LLMs/agents). Its duplicate
REST contract survives untouched.
- If the TexeraMCP server is hand-written, it becomes a **fourth**
hand-maintained representation of the same endpoints — i.e. MCP can make drift
*worse*, not better.
So MCP is a choice about how the agent talks to capabilities; it is not a
contract-sharing strategy. The drift has to be solved one layer below MCP.
**Note:** the `@AgentTool` / `GET /api/agent-tools` manifest idea proposed in
#5610 *is* actually a drift solution in disguise — it's a "backend declares the
contract once, clients derive from it" approach (the same family as Option C
below). That's the connection worth pursuing: solve the contract layer first,
and *then* the inline-vs-MCP decision in #5610 becomes a cheap downstream
choice, because every client (frontend, agent, and any future MCP server)
derives from one definition.
## A guiding principle
Both the frontend and the agent are **clients of the same backend**, which is
the single source of truth. So the goal is not "share code between the frontend
and the agent" — it's **single-source the API contract and let every client
derive from it.**
And a key distinction: **share the contract (types + endpoint paths), never the
transport (the HTTP-calling code).** The frontend uses Angular
`HttpClient`/Observables; the agent uses `fetch`/Promises. The transport
legitimately differs per client; only the request/response shapes and URL paths
are identical and worth deduplicating.
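The split can be sketched roughly as follows. This is an illustration under assumptions, not the existing code: the `DatasetVersion` fields, the `FetchLike` type, and the `baseUrl` parameter are hypothetical, and only the `/${did}/version/list` path comes from the example above. The transport is injected so the sketch runs without a server.

```typescript
// --- Contract (shared by every client): types and paths only, no HTTP code ---
interface DatasetVersion {
  dvid: number; // hypothetical field
  name: string; // hypothetical field
}

const datasetVersionListPath = (did: number): string => `/${did}/version/list`;

// --- Transport (owned by each client) ---
// Minimal structural type that the global `fetch` satisfies; hypothetical helper.
type FetchLike = (url: string) => Promise<{
  ok: boolean;
  status: number;
  json(): Promise<unknown>;
}>;

// Agent-style transport: fetch + Promise, reusing the shared path and DTO.
async function listDatasetVersions(
  baseUrl: string,
  did: number,
  fetchImpl: FetchLike
): Promise<DatasetVersion[]> {
  const res = await fetchImpl(baseUrl + datasetVersionListPath(did));
  if (!res.ok) {
    throw new Error(`Request failed with HTTP ${res.status}`);
  }
  return (await res.json()) as DatasetVersion[];
}

// Frontend-style transport would reuse the same two contract pieces, e.g.
//   this.http.get<DatasetVersion[]>(baseUrl + datasetVersionListPath(did))
// and return an Observable instead of a Promise.
```

Only the top half would be shared; each client keeps its own equivalent of `listDatasetVersions`.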
## Possible solutions
### Option A — Status quo: hand-maintained copies
Keep duplicating manually.
- **Pros:** zero new tooling; each client fully independent.
- **Cons:** guaranteed drift; runtime-only failures; cost grows with every
shared endpoint.
### Option B — Shared TypeScript contract package
A small workspace package (e.g. `@texera/api-contracts`) holding only DTO
interfaces + endpoint path builders (`datasetVersionListPath(did)`), consumed
by both TS projects. No HTTP logic.
- **Pros:** collapses the two client copies into one (frontend + agent share one
definition), leaving two representations instead of three; low setup cost; can
pilot on one API.
- **Cons:** still hand-maintained against the Scala source (backend can still
drift from the package); requires wiring a shared module into the Angular build
(the main integration risk).
- **Variant (B'):** define each contract once as a **zod** schema, derive the
TS type, and validate responses at runtime on both sides. agent-service already
uses zod. Adds runtime safety on top of B.
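As a rough sketch of what such a package could export for the dataset-version endpoint: the DTO fields and the function names other than `datasetVersionListPath` are hypothetical. To stay dependency-free, the runtime check below is a hand-written type guard standing in for the zod schema of variant B'. With zod, the schema would be declared via `z.object(...)` and the type derived with `z.infer`.

```typescript
// Hypothetical contents of a shared contract module. No HTTP logic lives here.
interface DatasetVersion {
  dvid: number; // assumed field
  name: string; // assumed field
}

// Path builder shared by both clients.
function datasetVersionListPath(did: number): string {
  return `/${did}/version/list`;
}

// Hand-written stand-in for a zod schema: checks the shape at runtime.
function isDatasetVersion(value: unknown): value is DatasetVersion {
  if (typeof value !== "object" || value === null) {
    return false;
  }
  const v = value as Record<string, unknown>;
  return typeof v["dvid"] === "number" && typeof v["name"] === "string";
}

// Validate a decoded response body before handing it to client code.
function parseDatasetVersionList(payload: unknown): DatasetVersion[] {
  if (!Array.isArray(payload) || !payload.every(isDatasetVersion)) {
    throw new Error("Response does not match the DatasetVersion[] contract");
  }
  return payload;
}
```

With a check like this at the boundary, a renamed backend field fails loudly at the call site instead of surfacing later as `undefined`.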
### Option C — Contract-first codegen from the backend (the industry standard)
The Scala backend emits a machine-readable contract (OpenAPI), and we
**generate** typed TS clients for both consumers (`openapi-typescript` /
`openapi-generator` / `orval`). The backend is JAX-RS (Dropwizard/Jersey), and
JAX-RS resources can be annotated with Swagger (swagger-core) annotations to
emit an OpenAPI spec.
- **Pros:** removes both hand-maintained client copies. The spec is generated
from the server, so once the clients are regenerated, a backend change breaks
both TS clients at build time until they are updated;
this is the standard answer for "same endpoints, multiple clients"; the same
spec can later feed a TexeraMCP server so it isn't a 4th copy.
- **Cons:** highest setup cost (add OpenAPI emission to the Dropwizard services
+ wire codegen into both builds); we currently emit no spec.
- **Connection to #5610:** this is the generalization of the `@AgentTool`
manifest idea — one backend-declared contract that the frontend, the agent,
*and* any MCP server all derive from.
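To make the build-time guarantee concrete, here is a hand-written stand-in for what generated types might look like. It is a sketch under assumptions: `ApiPaths`, `ResponseOf`, and `buildPath` are hypothetical names, the DTO fields are illustrative, and the output of a real generator such as `openapi-typescript` has a different shape.

```typescript
// Hand-written stand-in for generated output; DTO fields are assumptions.
interface DatasetVersion {
  dvid: number;
  name: string;
}

// One entry per endpoint, keyed by the path template from the spec.
interface ApiPaths {
  "/{did}/version/list": {
    params: { did: number };
    response: DatasetVersion[];
  };
}

// Response type for a given path template.
type ResponseOf<P extends keyof ApiPaths> = ApiPaths[P]["response"];
type VersionList = ResponseOf<"/{did}/version/list">; // DatasetVersion[]

// Fill `{name}` placeholders in a path template from the typed params object.
function buildPath<P extends keyof ApiPaths>(
  template: P,
  params: ApiPaths[P]["params"]
): string {
  const values = params as unknown as Record<string, string | number>;
  return template.replace(/\{(\w+)\}/g, (_match: string, key: string) =>
    encodeURIComponent(String(values[key]))
  );
}

const path = buildPath("/{did}/version/list", { did: 7 });
console.log(path);

// If the backend renamed the route and the types were regenerated, a stale
// call would stop compiling, for example:
//   buildPath("/{did}/versions", { did: 7 }); // error: not a key of ApiPaths
```

Because both clients would import the same generated types, a route or field change on the backend would surface as a compile error in each of them after regeneration.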
## Suggested direction (for discussion)
A pragmatic two-phase path:
1. **Now:** Option B/B' — extract a shared `api-contracts` module for one API
(e.g. datasets) as a pilot, to prove out the Angular-build integration on a
small surface.
2. **Later, if overlap keeps growing:** Option C — add OpenAPI emission to the
backend and replace the hand-maintained package with generated clients, making
the **Scala backend the enforced single source** for all clients (and feeding
#5610's tool layer from the same source).
## Affected Area
- Workflow UI (frontend)
- agent-service
- Storage / Metadata (file-service / dashboard-service contracts)
- Deployment / Infrastructure (build + codegen tooling)
## Questions for the group
1. Do we want to commit to backend-as-single-source (Option C) as the eventual
target, or is a shared TS package (Option B) sufficient given the actual
overlap?
2. Is there appetite to add OpenAPI emission to the Dropwizard/JAX-RS services?
Any prior art or blockers?
3. Should the shared contract layer be designed so it can also feed the agent
tool layer from #5610 (one source → frontend client + agent tools + optional
MCP)?
4. Where do we draw the line on what's worth sharing — just DTOs + paths, or
also validation (zod)?
Related: #5610 (inline agent tools vs. TexeraMCP).
GitHub link: https://github.com/apache/texera/discussions/5918