kz930 opened a new pull request, #5095:
URL: https://github.com/apache/texera/pull/5095

## Summary

Adds an in-workspace **AI Workflow Copilot** that helps biomedical researchers generate Texera workflows through a guided 4-step wizard backed by Claude. The copilot is embedded directly in the Angular workspace (no separate app) and follows a strict **review-before-apply** flow: generated workflows go to a review panel where the user inspects each operator's properties and `why` explanation before clicking Apply to place them on the canvas.
     
### What's in this PR

- **4-step wizard** (`ai-wizard-panel.component`):
  1. Analysis goal (EDA / Prediction / Cleaning / NLP / Custom)
  2. Data source: **Existing Dataset** (your uploaded Texera CSV) or **dkNET Dataset** (curated biomedical schemas)
  3. Scientific framework (CRISP-DM / SEMMA / KDD / Custom): an editable template injected into the prompt as soft guidance
  4. Guardrails configuration (train/test split, no leakage, mandatory evaluation, no synthetic data), each toggleable, with its rationale shown
- **Schema-aware generation**: `data-profiler.service` reads the chosen CSV and injects its real column names, dtypes, null rates, and sample values into the prompt so the LLM doesn't hallucinate columns
- **White-box validator** (`workflow-validator.service`): checks JSON schema, operator existence, DAG well-formedness, and guardrail compliance, with a repair loop of up to 3 retries on failure
- **Review-before-apply panel**: per-operator inspection of properties, highlighting of missing required fields, and editable JSON values before commit
- **Audit manifest**: every generation logs the wizard inputs, prompt version, LLM response, validator retries, and final workflow JSON for reproducibility
- **Design doc** (`design-doc.md`): full architecture, principles, and scope decisions
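The schema-aware generation step can be illustrated with a short sketch. This is not the PR's `data-profiler.service`: the `ColumnProfile` shape, the hypothetical `profileCsv` and `schemaToPrompt` helpers, the coarse dtype rules, and the naive comma splitting (no quoted fields containing commas or newlines) are all assumptions made for the example.

```typescript
// Hypothetical per-column profile; the real data-profiler.service may use a different shape.
interface ColumnProfile {
  name: string;
  dtype: "integer" | "float" | "boolean" | "string";
  nullRate: number; // fraction of data rows whose cell is empty, 0..1
  samples: string[]; // first few distinct non-empty values
}

// Infer a coarse dtype from the non-empty values of one column (assumed rules).
function inferDtype(values: string[]): ColumnProfile["dtype"] {
  if (values.length === 0) return "string";
  if (values.every((v) => /^-?\d+$/.test(v))) return "integer";
  if (values.every((v) => !Number.isNaN(Number(v)))) return "float";
  if (values.every((v) => /^(true|false)$/i.test(v))) return "boolean";
  return "string";
}

// Naive CSV profiling: assumes a header row and plain comma separation with no quoting.
function profileCsv(csvText: string, maxSamples = 3): ColumnProfile[] {
  const lines = csvText.split(/\r?\n/).filter((l) => l.length > 0);
  if (lines.length === 0) return [];
  const header = lines[0].split(",").map((h) => h.trim());
  const rows = lines.slice(1).map((l) => l.split(","));
  return header.map((name, i) => {
    // Missing trailing cells are treated as empty, i.e. null.
    const cells = rows.map((r) => (r[i] ?? "").trim());
    const present = cells.filter((c) => c !== "");
    const nullRate = cells.length === 0 ? 0 : (cells.length - present.length) / cells.length;
    const samples = Array.from(new Set(present)).slice(0, maxSamples);
    return { name, dtype: inferDtype(present), nullRate, samples };
  });
}

// Render the profile as plain-text lines that could be injected into the LLM prompt.
function schemaToPrompt(profile: ColumnProfile[]): string {
  return profile
    .map((c) => `- ${c.name} (${c.dtype}, ${(c.nullRate * 100).toFixed(1)}% null), e.g. ${c.samples.join(", ")}`)
    .join("\n");
}
```

Grounding the prompt in observed column names and sample values gives the model concrete identifiers to copy, which is the mechanism the bullet above relies on to avoid hallucinated columns. A production profiler would use a real CSV parser and likely sample rows rather than read the whole file.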
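To make the validator's DAG check and repair loop concrete, here is a minimal sketch under stated assumptions: the simplified `Workflow` shape, the hypothetical `validateWorkflow` and `generateWithRepair` functions, and the use of Kahn's algorithm for cycle detection are illustrative choices, not taken from `workflow-validator.service`. "3 retries" is read here as one initial generation plus up to three repair attempts, each given the previous attempt's errors.

```typescript
// Hypothetical, simplified workflow shape; Texera's real workflow JSON carries more fields.
interface Operator { id: string; type: string; }
interface Link { source: string; target: string; }
interface Workflow { operators: Operator[]; links: Link[]; }

// Returns human-readable errors; an empty array means the workflow passed these checks.
function validateWorkflow(wf: Workflow, knownTypes: Set<string>): string[] {
  const errors: string[] = [];
  const ids = new Set<string>();
  for (const op of wf.operators) {
    if (ids.has(op.id)) errors.push(`duplicate operator id ${op.id}`);
    ids.add(op.id);
    if (!knownTypes.has(op.type)) errors.push(`unknown operator type ${op.type}`);
  }
  for (const l of wf.links) {
    if (!ids.has(l.source) || !ids.has(l.target)) errors.push(`dangling link ${l.source}->${l.target}`);
  }
  // Kahn's algorithm: if some operators can never reach in-degree 0, the graph has a cycle.
  const indeg = new Map<string, number>();
  const out = new Map<string, string[]>();
  ids.forEach((id) => { indeg.set(id, 0); out.set(id, []); });
  for (const l of wf.links) {
    if (ids.has(l.source) && ids.has(l.target)) {
      out.get(l.source)!.push(l.target);
      indeg.set(l.target, indeg.get(l.target)! + 1);
    }
  }
  const queue = Array.from(ids).filter((id) => indeg.get(id) === 0);
  let visited = 0;
  while (queue.length > 0) {
    const id = queue.shift()!;
    visited++;
    for (const next of out.get(id)!) {
      indeg.set(next, indeg.get(next)! - 1);
      if (indeg.get(next) === 0) queue.push(next);
    }
  }
  if (visited < ids.size) errors.push("workflow graph contains a cycle");
  return errors;
}

// Generate, validate, and feed errors back; gives up after maxRetries repair attempts.
async function generateWithRepair(
  generate: (feedback: string[]) => Promise<Workflow>,
  knownTypes: Set<string>,
  maxRetries = 3,
): Promise<{ workflow: Workflow; retries: number; errors: string[] }> {
  let feedback: string[] = [];
  for (let attempt = 0; ; attempt++) {
    const workflow = await generate(feedback);
    const errors = validateWorkflow(workflow, knownTypes);
    // Stop on success, or return the last attempt with its remaining errors.
    if (errors.length === 0 || attempt >= maxRetries) return { workflow, retries: attempt, errors };
    feedback = errors;
  }
}
```

Collecting all errors instead of throwing on the first one lets a repair prompt describe every problem at once, and the returned `retries` count is the kind of value an audit manifest could record.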
     
## Demo Video

[Watch the demo on YouTube](https://youtu.be/bCFnP6Ty4b8)
     
## Test plan

- `yarn build` from `frontend/` completes without errors
- Open workspace → wizard panel renders; can step through all 4 steps
- Step 2 → **Existing Dataset** path: dataset picker opens, and selecting a CSV resolves a valid `/<owner>/<dataset>/v<n>/<file>.csv` backend path
- Step 2 → **dkNET Dataset** path: schema preview shows on selection
- Step 3 → edits to the framework template persist and are sent in the prompt
- Step 4 → toggling guardrails is reflected in the final validator report
- Generate → review panel appears (workflow does **not** auto-land on the canvas)
- Edit a missing-required field in the review panel → Apply → workflow lands with the edited value
- Apply → operators visible on the canvas, with `why` explanations attached
- Guardrails report and manifest ID visible after Apply
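The Existing Dataset check expects a backend path of the form `/<owner>/<dataset>/v<n>/<file>.csv`. A minimal sketch of how a test could assert that shape, assuming hypothetical `parseDatasetPath` and `toDatasetPath` helpers (not part of this PR) and assuming the file name itself contains no slashes:

```typescript
// Hypothetical parsed form of /<owner>/<dataset>/v<n>/<file>.csv
interface DatasetFilePath {
  owner: string;
  dataset: string;
  version: number;
  file: string;
}

// Returns null unless the path has exactly four segments, a numeric "v<n>" version,
// and a file name ending in ".csv" (assumes no nested directories inside the dataset).
function parseDatasetPath(path: string): DatasetFilePath | null {
  const m = /^\/([^/]+)\/([^/]+)\/v(\d+)\/([^/]+\.csv)$/.exec(path);
  if (!m) return null;
  return { owner: m[1], dataset: m[2], version: Number(m[3]), file: m[4] };
}

// Inverse of parseDatasetPath, useful for round-trip assertions in tests.
function toDatasetPath(p: DatasetFilePath): string {
  return `/${p.owner}/${p.dataset}/v${p.version}/${p.file}`;
}
```

Parsing into fields rather than matching a single regex in the test makes failures easier to diagnose, since the assertion can report which component (owner, dataset, version, or file) was wrong.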