kz930 opened a new pull request, #5095:
URL: https://github.com/apache/texera/pull/5095
## Summary
Adds an in-workspace **AI Workflow Copilot** that helps biomedical researchers
generate Texera workflows through a guided 4-step wizard backed by Claude. The
copilot is embedded directly in the Angular workspace (no separate app) and
follows a strict **review-before-apply** flow: workflows are generated to a
review panel where the user inspects each operator's properties and `why`
explanation before clicking Apply to land them on the canvas.
### What's in this PR
- **4-step wizard** (`ai-wizard-panel.component`):
  1. Analysis goal (EDA / Prediction / Cleaning / NLP / Custom)
  2. Data source: **Existing Dataset** (your uploaded Texera CSV) or
     **dkNET Dataset** (curated biomedical schemas)
  3. Scientific framework (CRISP-DM / SEMMA / KDD / Custom): editable template
     injected as soft prompt guidance
  4. Guardrails configuration (train/test split, no leakage, mandatory eval,
     no synthetic data), toggleable with rationale shown
- **Schema-aware generation**: `data-profiler.service` reads the chosen CSV and
  injects real column names, dtypes, null rates, and sample values into the
  prompt so the LLM doesn't hallucinate columns
- **White-box validator** (`workflow-validator.service`): checks JSON schema,
  operator existence, DAG well-formedness, and guardrail compliance, with a
  3-retry repair loop on failure
- **Review-before-apply panel**: per-operator inspection of properties,
  missing-required highlighting, and editable JSON values before commit
- **Audit manifest**: every generation logs wizard inputs, prompt version,
  LLM response, validator retries, and final workflow JSON for reproducibility
- **Design doc** (`design-doc.md`): full architecture, principles, and scope
  decisions
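
For reviewers who want a concrete picture of the validator's DAG check and repair loop, here is a minimal sketch. It is not the code in `workflow-validator.service`: the `Workflow`, `Operator`, and `Link` shapes and the `validateWorkflow` and `generateWithRepair` names are hypothetical, operator ids are assumed unique, and the JSON-schema and guardrail checks are left out. It also assumes "3-retry" means one initial attempt plus up to three repair attempts, and uses a synchronous `generate` callback where a real LLM call would be asynchronous.

```typescript
// Hypothetical shapes; the PR's actual workflow JSON schema is not shown here.
interface Operator { id: string; type: string; }
interface Link { from: string; to: string; }
interface Workflow { operators: Operator[]; links: Link[]; }
interface Outcome { workflow: Workflow | null; attempts: number; errors: string[]; }

// Returns human-readable problems; an empty array means every check passed.
function validateWorkflow(wf: Workflow, knownTypes: readonly string[]): string[] {
  const errors: string[] = [];
  const ids = wf.operators.map(op => op.id);

  // Operator existence: every type must be one the platform knows about.
  for (const op of wf.operators) {
    if (knownTypes.indexOf(op.type) === -1) {
      errors.push(`unknown operator type: ${op.type}`);
    }
  }

  // Report links that point at missing operators and keep only the valid ones.
  const links = wf.links.filter(l => {
    const ok = ids.indexOf(l.from) !== -1 && ids.indexOf(l.to) !== -1;
    if (!ok) errors.push(`dangling link: ${l.from} -> ${l.to}`);
    return ok;
  });

  // Kahn's algorithm: if some operator is never reached, the graph has a cycle.
  const indegree = ids.map(id => links.filter(l => l.to === id).length);
  const queue = ids.filter((_, i) => indegree[i] === 0);
  let visited = 0;
  while (queue.length > 0) {
    const id = queue.shift() as string;
    visited++;
    for (const l of links) {
      if (l.from !== id) continue;
      const i = ids.indexOf(l.to);
      indegree[i]--;
      if (indegree[i] === 0) queue.push(l.to);
    }
  }
  if (visited < ids.length) errors.push("workflow graph contains a cycle");
  return errors;
}

// Calls `generate` again with the previous errors until validation passes
// or the retry budget is spent.
function generateWithRepair(
  generate: (previousErrors: string[]) => Workflow,
  knownTypes: readonly string[],
  maxRetries: number = 3,
): Outcome {
  let errors: string[] = [];
  for (let attempt = 1; attempt <= maxRetries + 1; attempt++) {
    const wf = generate(errors);
    errors = validateWorkflow(wf, knownTypes);
    if (errors.length === 0) return { workflow: wf, attempts: attempt, errors };
  }
  return { workflow: null, attempts: maxRetries + 1, errors };
}
```

Passing the previous errors back into `generate` is what would let a repair prompt name the specific problems to fix; when the budget runs out, the sketch returns `workflow: null` so that nothing invalid reaches the review panel.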
## Demo Video
[Watch the demo on YouTube](https://youtu.be/bCFnP6Ty4b8)
## Test plan
- `yarn build` from `frontend/` completes without errors
- Open workspace → wizard panel renders; can step through all 4 steps
- Step 2 → **Existing Dataset** path: dataset picker opens, selecting a CSV
resolves a valid `/<owner>/<dataset>/v<n>/<file>.csv` backend path
- Step 2 → **dkNET Dataset** path: schema preview shows on selection
- Step 3 → editing the framework template persists and is sent in the prompt
- Step 4 → toggling guardrails is reflected in the final validator report
- Generate → review panel appears (workflow does **not** auto-land on canvas)
- Edit a missing-required field in the review panel → Apply → workflow lands
  with the edited value
- Apply → operators visible on the canvas, `why` explanations attached
- Guardrails report and manifest ID visible after Apply
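
The Step 2 path check could be automated with something like the sketch below. The `parseDatasetCsvPath` helper and `DatasetPath` type are hypothetical and not part of this PR; the pattern assumes each of `<owner>`, `<dataset>`, and `<file>` is a single path segment without `/`, and that `<n>` is a non-negative integer.

```typescript
// Hypothetical helper for the `/<owner>/<dataset>/v<n>/<file>.csv` shape
// described in the test plan; not code from this PR.
interface DatasetPath {
  owner: string;
  dataset: string;
  version: number;
  file: string;
}

// Assumption: no nested directories below the version segment.
const DATASET_CSV_PATH = /^\/([^/]+)\/([^/]+)\/v(\d+)\/([^/]+\.csv)$/;

function parseDatasetCsvPath(path: string): DatasetPath | null {
  const m = DATASET_CSV_PATH.exec(path);
  if (m === null) return null;
  return { owner: m[1], dataset: m[2], version: Number(m[3]), file: m[4] };
}

// Example paths are made up for illustration.
console.log(parseDatasetCsvPath("/alice/diabetes/v2/patients.csv"));
console.log(parseDatasetCsvPath("/alice/diabetes/patients.csv")); // null (no version segment)
```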