GitHub user bobbai00 created a discussion: Improvement: Treat TexeraAgents as 
access-controlled Resources, just like other resources (workflows, datasets and 
computing units)

# Persist TexeraAgent As An Access-Controlled Resource

## Background

#4495 introduces a new microservice, `agent-service`, to manage 
`TexeraAgent` instances, which help users do data science using natural language 
and workflows. After this change, the architecture looks like this:

<img width="4052" height="1100" alt="Agent service architecture" 
src="https://github.com/user-attachments/assets/02081a32-571a-4fea-bc2a-f96e8f1350c6"
 />

The diagram below shows the service traffic when users perform CRUD operations 
on `TexeraAgent` resources:

<img width="4032" height="1100" alt="Agent CRUD traffic" 
src="https://github.com/user-attachments/assets/64abd20f-29b1-4cb6-bba1-b571a4cac3af"
 />

The diagram below shows the service traffic when users collaborate with a 
`TexeraAgent` and the agent runs its ReAct loop:

<img width="4032" height="1340" alt="Agent ReAct traffic" 
src="https://github.com/user-attachments/assets/73e9ac5a-f3ce-4aac-940c-3513b0ab2ac9"
 />

## Problem Of Current Design

`agent-service` currently manages `TexeraAgent` instances as in-memory objects 
with no access control. This introduces two major issues:

1. Limited scalability: agent traces are ephemeral and live only in the memory 
of one service instance. Whenever `agent-service` goes down, the traces are lost.
2. No per-user isolation: all users can see each other's agents. The 
conversation between a user and an agent may contain confidential or sensitive 
information about workflow executions and the user's requests.

## Proposal: Treat TexeraAgent As A Resource

Texera already manages several kinds of resources: users, workflows, datasets, 
computing units, and workflow executions. To solve the problems above, 
`TexeraAgent` should be treated as another resource type. Agents are persisted 
in `texera_db`, managed by Postgres, and are owned by a user.


### Relational DB Schema Change

<img width="2172" height="1008" alt="Agent DB schema" 
src="https://github.com/user-attachments/assets/f5b9f064-2c87-4bd5-88fd-eb0b9463e908"
 />

A new entity, `agent`, is added. One user can own multiple agents through 
`agent.uid -> user.uid`.

The schema of the `agent` entity is:

```sql
CREATE TABLE IF NOT EXISTS agent
(
    aid           UUID PRIMARY KEY,
    uid           INT NOT NULL,
    name          VARCHAR(128) NOT NULL,
    model_type    VARCHAR(256) NOT NULL,
    config        JSONB NOT NULL DEFAULT '{}'::jsonb,
    react_steps   JSONB NOT NULL DEFAULT '[]'::jsonb,
    creation_time TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    FOREIGN KEY (uid) REFERENCES "user"(uid) ON DELETE CASCADE
);
```

Column responsibilities:

| Column | Type | Description |
| --- | --- | --- |
| `aid` | `UUID` | Stable agent ID exposed by the agent API. |
| `uid` | `INT` | Owner user ID. This is the access-control boundary for the current PR. |
| `name` | `VARCHAR(128)` | User-visible agent name. |
| `model_type` | `VARCHAR(256)` | Model selected for the agent, for example `gpt-4o-mini`. |
| `config` | `JSONB` | Persistent agent configuration, including system prompt, tool metadata, and runtime settings. |
| `react_steps` | `JSONB` | Persistent ReAct trace as an array of JSON objects. |
| `creation_time` | `TIMESTAMP` | Creation timestamp used for ordering and display. |

`config` contains durable agent settings only. It should include data such as:

```json
{
  "systemPrompt": "...",
  "tools": [
    {
      "name": "addOperator",
      "description": "...",
      "inputSchema": {},
      "enabled": true
    }
  ],
  "settings": {
    "maxOperatorResultCharLimit": 2000,
    "maxOperatorResultCellCharLimit": 2000,
    "operatorResultSerializationMode": "tsv",
    "toolTimeoutSeconds": 240,
    "executionTimeoutMinutes": 4,
    "disabledTools": [],
    "maxSteps": 100,
    "allowedOperatorTypes": ["CSVFileScan", "Filter"]
  }
}
```

`react_steps` stores the durable ReAct state as a JSON array. Each element 
represents one user or agent step and may include nested tool calls, tool 
results, token usage, and workflow snapshots:

```json
[
  {
    "id": "step-...",
    "messageId": "msg-...",
    "stepId": 0,
    "timestamp": 1780217116417,
    "role": "user",
    "content": "Analyze this workflow",
    "isBegin": true,
    "isEnd": true,
    "messageSource": "chat"
  }
]
```

The table intentionally does not store workflow ID, computing unit ID, workflow 
name, or user JWT. Those values are request-scoped and are supplied when the 
user sends an agent task. This avoids stale workflow bindings, stale computing 
unit bindings, and persisted credentials.
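One way to picture the split between durable and request-scoped data is a pair of record types. The sketch below is illustrative only: `AgentRow` mirrors the columns of the proposed table, while `TaskContext`, its field names, and its field types are hypothetical stand-ins for whatever the task message actually carries.

```python
from dataclasses import dataclass, field, fields
from typing import Any, Dict, List


@dataclass
class AgentRow:
    """Durable state: one field per column of the proposed `agent` table
    (creation_time is omitted here for brevity)."""
    aid: str
    uid: int
    name: str
    model_type: str
    config: Dict[str, Any] = field(default_factory=dict)
    react_steps: List[Dict[str, Any]] = field(default_factory=list)


@dataclass(frozen=True)
class TaskContext:
    """Request-scoped state sent with each task and never written to the table.
    Field names and types are assumptions made for this sketch."""
    workflow_id: int
    computing_unit_id: int
    user_token: str


# The two sets of field names should never overlap: nothing request-scoped
# (in particular the JWT) ends up in a persisted row.
persisted = {f.name for f in fields(AgentRow)}
request_scoped = {f.name for f in fields(TaskContext)}
assert persisted.isdisjoint(request_scoped)
```

Keeping the two types separate makes it harder to persist a token or a stale workflow binding by accident, since the row type has no field to hold them.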

### Service Traffic Change

The traffic when users CRUD agents becomes:

<img width="4068" height="1680" alt="Agent CRUD traffic with persistence" 
src="https://github.com/user-attachments/assets/e99a6d82-0e60-4094-8bb5-b18d461a806a"
 />

The traffic when users collaborate with agents becomes:

<img width="4068" height="2100" alt="Agent ReAct traffic with persistence" 
src="https://github.com/user-attachments/assets/24b6e265-a885-4e15-870b-879ee4e20c8e"
 />

### Access Control

All agent APIs require authentication. The agent service validates JWTs using 
the same secret as the rest of Texera, loaded from:

```text
common/config/src/main/resources/auth.conf
```
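For readers unfamiliar with shared-secret JWT validation, the following is a minimal standard-library sketch. It assumes the tokens are signed with HS256 and that expiry is carried in the standard `exp` claim; the proposal does not state the algorithm or the claim layout, so both are assumptions, as is the `uid` claim used in the usage note. A real service would more likely rely on a vetted JWT library than on hand-rolled verification.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url_encode(raw: bytes) -> str:
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def _b64url_decode(segment: str) -> bytes:
    # JWT segments drop base64 padding; restore it before decoding.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def _signature(signing_input: str, secret: bytes) -> bytes:
    return hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()


def sign_hs256(claims: dict, secret: bytes) -> str:
    """Helper for the example: build a compact HS256 token."""
    header = _b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode("utf-8"))
    payload = _b64url_encode(json.dumps(claims).encode("utf-8"))
    sig = _b64url_encode(_signature(f"{header}.{payload}", secret))
    return f"{header}.{payload}.{sig}"


def verify_hs256(token: str, secret: bytes, now: Optional[float] = None) -> Optional[dict]:
    """Return the claims if the signature and expiry check out, else None."""
    try:
        header_b64, payload_b64, sig_b64 = token.split(".")
        header = json.loads(_b64url_decode(header_b64))
        # Pin the algorithm so a token cannot choose a weaker one.
        if not isinstance(header, dict) or header.get("alg") != "HS256":
            return None
        expected = _signature(f"{header_b64}.{payload_b64}", secret)
        # Constant-time comparison avoids leaking signature bytes via timing.
        if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
            return None
        claims = json.loads(_b64url_decode(payload_b64))
    except ValueError:  # malformed token, bad base64, or bad JSON
        return None
    if not isinstance(claims, dict):
        return None
    exp = claims.get("exp")
    current = time.time() if now is None else now
    if isinstance(exp, (int, float)) and current >= exp:
        return None
    return claims
```

A caller would treat a `None` result as an unauthenticated request and otherwise read the caller's identity from the returned claims, for example `claims["uid"]` if that is the claim name in use.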

Access rules:

| Operation | Rule |
| --- | --- |
| Create agent | Caller must have a valid JWT. The inserted row stores the caller's `uid`. |
| List agents | Caller only sees agents where `agent.uid = caller.uid`. |
| Read/update/delete/control agent | Caller must own the target agent. |
| WebSocket connect | Caller must own the target agent. |
| WebSocket task message | Caller must own the agent and provide valid task context. |

Requests without a valid JWT return `401`. An authenticated user who tries to 
access another user's agent receives `403`.
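The ownership rules above reduce to a small decision function. The sketch below is a minimal illustration, not the service's actual code: `authorize` and `list_agents` are hypothetical names, `caller_uid` is assumed to be `None` when JWT validation fails, and agents are represented as plain dictionaries with a `uid` key.

```python
from typing import Any, Dict, List, Optional


def authorize(caller_uid: Optional[int], owner_uid: int) -> int:
    """Return the HTTP status for an operation on a single agent."""
    if caller_uid is None:       # no valid JWT
        return 401
    if caller_uid != owner_uid:  # authenticated, but not the owner
        return 403
    return 200


def list_agents(caller_uid: int, agents: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep only the agents owned by the caller (agent.uid = caller.uid)."""
    return [agent for agent in agents if agent["uid"] == caller_uid]
```

Applying the same check to REST calls and to WebSocket connect and task messages keeps the access rules in one place.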

### Persistence Flow

```text
create agent
  -> validate JWT
  -> create runtime TexeraAgent
  -> insert agent row with owner uid, config, and empty react_steps

send task
  -> validate WebSocket access
  -> validate task context
  -> retrieve workflow using userToken + workflowId
  -> run TexeraAgent
  -> persist ReAct steps after step updates and completion
```
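The flow above can be sketched with an in-memory stand-in for the `agent` table. Everything here is an assumption made for illustration: `InMemoryAgentStore`, `send_task`, the `workflowId` and `userToken` context keys, and the `run_agent` callback (which stands in for retrieving the workflow and running the `TexeraAgent`) are hypothetical names, and a real implementation would issue SQL against `texera_db` instead of updating a dictionary.

```python
import uuid
from typing import Any, Callable, Dict, Iterable, List

Step = Dict[str, Any]


class InMemoryAgentStore:
    """Stand-in for the `agent` table, keyed by aid."""

    def __init__(self) -> None:
        self.rows: Dict[str, Dict[str, Any]] = {}

    def insert(self, uid: int, name: str, model_type: str, config: Dict[str, Any]) -> str:
        # New agents start with the owner's uid, their config, and no steps.
        aid = str(uuid.uuid4())
        self.rows[aid] = {"aid": aid, "uid": uid, "name": name,
                          "model_type": model_type, "config": config,
                          "react_steps": []}
        return aid

    def save_steps(self, aid: str, steps: List[Step]) -> None:
        self.rows[aid]["react_steps"] = list(steps)


def send_task(store: InMemoryAgentStore, caller_uid: int, aid: str,
              context: Dict[str, Any],
              run_agent: Callable[[Dict[str, Any]], Iterable[Step]]) -> List[Step]:
    # 1. Validate access: the caller must own the target agent.
    row = store.rows.get(aid)
    if row is None or row["uid"] != caller_uid:
        raise PermissionError("caller does not own this agent")
    # 2. Validate the request-scoped task context.
    if not context.get("workflowId") or not context.get("userToken"):
        raise ValueError("task context needs workflowId and userToken")
    # 3. Run the agent and persist the trace after every step update.
    steps = list(row["react_steps"])
    for step in run_agent(context):
        steps.append(step)
        store.save_steps(aid, steps)
    return steps
```

Persisting after each step rather than only at completion means a crash in the middle of a task loses at most the step in flight.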



GitHub link: https://github.com/apache/texera/discussions/5302
