GitHub user aicam created a discussion: Architecture discussion: how should the 
frontend and agent-service share backend API contracts? (REST contract drift)

## Summary — the contract drift problem

Texera now has **two independent TypeScript clients of the same backend REST 
APIs**: the Angular `frontend` and the `agent-service` (the LLM agent that 
operates Texera on a user's behalf). Both call the same Scala backend endpoints 
(file-service, dashboard-service, workflow-compiling-service, 
workflow-execution-service), and both **hand-redeclare the same contract** — 
URL paths, request shapes, and response DTOs.

That means each endpoint's contract exists in **three places** that must be 
kept in sync by hand:

1. The Scala backend (the actual source of truth — JAX-RS resources)
2. `frontend` (Angular `HttpClient`, types in `common/type/*`, URL constants in 
the dataset service)
3. `agent-service` (`fetch`-based clients in `agent-service/src/api/*`, with 
its own duplicated interfaces)

**Concrete example — listing dataset versions:**

- Backend: a JAX-RS resource in file-service
- Frontend: `DATASET_VERSION_RETRIEVE_LIST_URL` + a `DatasetVersion` type
- Agent: `listDatasetVersions(...)` building `` `/${did}/version/list` `` + its 
own `DatasetVersion` interface in `agent-service/src/api/dataset-api.ts`

The path string and the DTO are duplicated across the frontend and the agent, 
and both can silently drift from the Scala backend. When the backend renames a 
field or changes a route, nothing fails at build time — the two TS clients just 
quietly go stale until something breaks at runtime. As the agent-service grows 
to cover more of Texera's surface (datasets, workflow CRUD, execution, operator 
metadata), this overlap with the frontend keeps growing.
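
To make the "nothing fails at build time" point concrete, here is a minimal sketch of how a hand-copied interface goes stale. The field names (`dvid`, `name`, `versionName`) and the rename itself are assumptions made up for illustration, not Texera's actual DTO.

```typescript
// Hypothetical payload: what the backend returns after renaming `name`
// to `versionName`. The field names are illustrative assumptions.
const backendResponseJson = '[{"dvid": 7, "versionName": "v1"}]';

// A client-side interface that was hand-copied before the rename.
interface DatasetVersion {
  dvid: number;
  name: string;
}

// JSON.parse returns `any`, so this cast type-checks no matter what the
// backend actually sent. The compiler cannot see the mismatch.
function parseVersions(json: string): DatasetVersion[] {
  return JSON.parse(json) as DatasetVersion[];
}

const versions = parseVersions(backendResponseJson);

// Statically a string[], but the only element is undefined at runtime.
const names: string[] = versions.map((v) => v.name);
console.log(typeof names[0]); // prints "undefined"
```

The code compiles cleanly in both clients; the mismatch only shows up when something downstream reads the missing field.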

We'd like to align on a standard for how endpoints/contracts are shared across 
clients before the duplication spreads further.

## Important: what does NOT solve this (and why MCP is orthogonal)

This is closely related to **#5610** (inline agent tools vs. a dedicated 
TexeraMCP server), so it's worth being explicit about the relationship, because 
the two are easy to conflate.

**#5610 is about a different axis.** It asks how the *agent* (an LLM) should 
consume Texera's capabilities — inline function-calling tools vs. MCP. That is 
an **LLM-facing delivery** question. The drift problem here is about how **two 
programmatic HTTP clients** avoid re-declaring the backend's contract. That is 
a **client-contract** question. They are orthogonal layers.

**Adopting MCP does not remove this redundancy.** If Texera's actions move 
behind a TexeraMCP server:

- The agent-service stops hand-writing tool wrappers — but the endpoint+DTO 
knowledge just **relocates into the MCP server**, which still has to call the 
REST endpoints. The contract copy moves; it doesn't disappear.
- The **frontend is unaffected**. The Angular UI is a human-facing client and 
will never be an MCP client (MCP is a protocol for LLMs/agents). Its duplicate 
REST contract survives untouched.
- If the TexeraMCP server is hand-written, it becomes a **fourth** 
hand-maintained representation of the same endpoints — i.e. MCP can make drift 
*worse*, not better.

So MCP is a choice about how the agent talks to capabilities; it is not a 
contract-sharing strategy. The drift has to be solved one layer below MCP.

**Note:** the `@AgentTool` / `GET /api/agent-tools` manifest idea proposed in 
#5610 *is* actually a drift solution in disguise — it's a "backend declares the 
contract once, clients derive from it" approach (the same family as Option C 
below). That's the connection worth pursuing: solve the contract layer first, 
and *then* the inline-vs-MCP decision in #5610 becomes a cheap downstream 
choice, because every client (frontend, agent, and any future MCP server) 
derives from one definition.

## A guiding principle

Both the frontend and the agent are **clients of the same backend**, which is 
the single source of truth. So the goal is not "share code between the frontend 
and the agent" — it's **single-source the API contract and let every client 
derive from it.**

And a key distinction: **share the contract (types + endpoint paths), never the 
transport (the HTTP-calling code).** The frontend uses Angular 
`HttpClient`/Observables; the agent uses `fetch`/Promises. The transport 
legitimately differs per client; only the request/response shapes and URL paths 
are identical and worth deduplicating.
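
A rough sketch of that split, under some assumptions: the `DatasetVersion` fields are hypothetical, the path is only the suffix quoted earlier, and the two transports are stubs standing in for a `fetch` wrapper and Angular `HttpClient` (the `Subscribable` interface is a stand-in for an Observable so the sketch has no dependencies).

```typescript
// Shared contract: types and path builders only, no HTTP code.
interface DatasetVersion {
  dvid: number; // hypothetical field
  name: string; // hypothetical field
}

function datasetVersionListPath(did: number): string {
  return `/${did}/version/list`;
}

// Client 1: Promise-style transport (the agent would wrap fetch here).
type PromiseGet = <T>(path: string) => Promise<T>;

function listDatasetVersions(get: PromiseGet, did: number): Promise<DatasetVersion[]> {
  return get<DatasetVersion[]>(datasetVersionListPath(did));
}

// Client 2: subscription-style transport (the frontend would use HttpClient here).
interface Subscribable<T> {
  subscribe(next: (value: T) => void): void;
}
type ObservableGet = <T>(path: string) => Subscribable<T>;

function retrieveDatasetVersionList(get: ObservableGet, did: number): Subscribable<DatasetVersion[]> {
  return get<DatasetVersion[]>(datasetVersionListPath(did));
}

// Stub transports that record the requested path instead of doing I/O.
const requested: string[] = [];
const canned: unknown = [{ dvid: 1, name: "v1" }];

const promiseGet: PromiseGet = <T>(path: string) => {
  requested.push(path);
  return Promise.resolve(canned as T);
};

const observableGet: ObservableGet = <T>(path: string) => {
  requested.push(path);
  return { subscribe: (next: (value: T) => void) => next(canned as T) };
};

void listDatasetVersions(promiseGet, 42);
retrieveDatasetVersionList(observableGet, 42).subscribe(() => {});
```

Both clients end up requesting the same path and typing the result with the same interface, while each keeps its own calling convention.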

## Possible solutions

### Option A — Status quo: hand-maintained copies
Keep duplicating manually.
- **Pros:** zero new tooling; each client fully independent.
- **Cons:** guaranteed drift; runtime-only failures; cost grows with every 
shared endpoint.

### Option B — Shared TypeScript contract package
A small workspace package (e.g. `@texera/api-contracts`) holding only DTO 
interfaces + endpoint path builders (`datasetVersionListPath(did)`), consumed 
by both TS projects. No HTTP logic.
- **Pros:** collapses the two TS copies into one, leaving two definitions 
(Scala backend + shared package) instead of three; low setup cost; can pilot 
on one API.
- **Cons:** still hand-maintained against the Scala source (backend can still 
drift from the package); requires wiring a shared module into the Angular build 
(the main integration risk).
- **Variant (B'):** define each contract once as a **zod** schema, derive the 
TS type, and validate responses at runtime on both sides. agent-service already 
uses zod. Adds runtime safety on top of B.
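
As a sketch of what B' buys at runtime: the guard below is hand-written so the example has no dependencies, and it stands in for a zod schema (with zod, the static type would be derived from the schema via `z.infer` instead of being declared separately). The `DatasetVersion` fields and the drifted payload are assumptions for illustration.

```typescript
// Hypothetical DTO; with zod this type would be inferred from the schema.
interface DatasetVersion {
  dvid: number;
  name: string;
}

function isRecord(v: unknown): v is Record<string, unknown> {
  return typeof v === "object" && v !== null;
}

// Runtime check mirroring the static type above.
function isDatasetVersion(v: unknown): v is DatasetVersion {
  return isRecord(v) && typeof v["dvid"] === "number" && typeof v["name"] === "string";
}

// Fail loudly at the API boundary instead of letting a stale shape leak in.
function parseDatasetVersionList(body: unknown): DatasetVersion[] {
  if (!Array.isArray(body) || !body.every(isDatasetVersion)) {
    throw new Error("response does not match the DatasetVersion[] contract");
  }
  return body as DatasetVersion[];
}

// A payload that matches the contract passes through unchanged.
const ok = parseDatasetVersionList(JSON.parse('[{"dvid": 1, "name": "v1"}]'));

// A payload from a backend that renamed `name` is rejected immediately.
let driftDetected = false;
try {
  parseDatasetVersionList(JSON.parse('[{"dvid": 1, "versionName": "v1"}]'));
} catch {
  driftDetected = true;
}
```

This does not prevent drift from the Scala source, but it turns a silent stale field into an immediate, localized error on whichever client sees it first.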

### Option C — Contract-first codegen from the backend (the industry standard)
The Scala backend emits a machine-readable contract (OpenAPI), and we 
**generate** typed TS clients for both consumers (`openapi-typescript` / 
`openapi-generator` / `orval`). The backend is JAX-RS (Dropwizard/Jersey), 
which has first-class OpenAPI support via Swagger annotations.
- **Pros:** removes both hand-maintained client copies, leaving the backend as 
the only definition. The spec is generated from the server, so a backend change 
surfaces as a build error in both TS clients as soon as they are regenerated; 
this is the standard answer for "same endpoints, multiple clients"; the same 
spec can later feed a TexeraMCP server so it isn't a 4th copy.
- **Cons:** highest setup cost (add OpenAPI emission to the Dropwizard services 
+ wire codegen into both builds); we currently emit no spec.
- **Connection to #5610:** this is the generalization of the `@AgentTool` 
manifest idea — one backend-declared contract that the frontend, the agent, 
*and* any MCP server all derive from.
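
To illustrate the build-time guarantee, here is a simplified sketch of the kind of types a generator could emit and how a client might consume them. The `paths` interface below is hand-written and deliberately flattened (real generator output is more deeply nested), and the route, field names, and `buildPath` helper are all hypothetical.

```typescript
// Hypothetical excerpt of generated types. In practice this file would be
// produced from the backend's OpenAPI spec, never edited by hand.
interface paths {
  "/dataset/{did}/version/list": {
    get: {
      parameters: { path: { did: number } };
      response: { dvid: number; name: string }[];
    };
  };
}

type Path = keyof paths;
type PathParams<P extends Path> = paths[P]["get"]["parameters"]["path"];
type ResponseOf<P extends Path> = paths[P]["get"]["response"];

// Fill `{param}` placeholders; both the template and the param names are
// checked by the compiler against the generated `paths` type.
function buildPath<P extends Path>(template: P, params: PathParams<P>): string {
  const values = params as unknown as Record<string, string | number>;
  return template.replace(/\{(\w+)\}/g, (_match, key: string) =>
    encodeURIComponent(String(values[key]))
  );
}

const url = buildPath("/dataset/{did}/version/list", { did: 42 });
console.log(url); // prints "/dataset/42/version/list"

// Response shapes come from the same generated source.
const sample: ResponseOf<"/dataset/{did}/version/list"> = [{ dvid: 1, name: "v1" }];

// If the backend renamed the route or the path parameter, regenerating
// `paths` would turn calls like these into compile errors:
//   buildPath("/dataset/{did}/versions", { did: 42 });
//   buildPath("/dataset/{did}/version/list", { datasetId: 42 });
```

Each client would still wrap these types in its own transport; only the generated definitions are shared.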

## Suggested direction (for discussion)

A pragmatic two-phase path:
1. **Now:** Option B/B' — extract a shared `api-contracts` module for one API 
(e.g. datasets) as a pilot, to prove out the Angular-build integration on a 
small surface.
2. **Later, if overlap keeps growing:** Option C — add OpenAPI emission to the 
backend and replace the hand-maintained package with generated clients, making 
the **Scala backend the enforced single source** for all clients (and feeding 
#5610's tool layer from the same source).

## Affected Area
- Workflow UI (frontend)
- agent-service
- Storage / Metadata (file-service / dashboard-service contracts)
- Deployment / Infrastructure (build + codegen tooling)

## Questions for the group
1. Do we want to commit to backend-as-single-source (Option C) as the eventual 
target, or is a shared TS package (Option B) sufficient given the actual 
overlap?
2. Is there appetite to add OpenAPI emission to the Dropwizard/JAX-RS services? 
Any prior art or blockers?
3. Should the shared contract layer be designed so it can also feed the agent 
tool layer from #5610 (one source → frontend client + agent tools + optional 
MCP)?
4. Where do we draw the line on what's worth sharing — just DTOs + paths, or 
also validation (zod)?

Related: #5610 (inline agent tools vs. TexeraMCP).

GitHub link: https://github.com/apache/texera/discussions/5918
