EmilySun621 opened a new pull request, #5097:
URL: https://github.com/apache/texera/pull/5097

   The Story
   A biomedical researcher wants to study diabetes. She opens Texera and 
thinks: "Where do I even find the right dataset?" She opens a new tab, googles 
around, downloads a CSV from UCI, uploads it to Texera, configures the file 
path manually. Twenty minutes gone before she's even started analyzing.
   After she builds her workflow and runs it, the AI agent gives her a detailed 
model comparison: accuracy, F1 scores, key insights. But it's all buried in a 
long chat message she has to scroll through. She can't easily reference it, 
copy it, or share it with her advisor.
   We fixed both problems.
   What We Built
   1. Dataset Bank: Browse and Import Public Datasets Without Leaving Texera
   A new page in the sidebar where users browse a curated catalog of public 
datasets from UCI, Kaggle, and dkNET. The catalog is searchable, categorized, 
and importable with one click.
   How it works:
   
   Open "Dataset Bank" in the sidebar → see a grid of dataset cards
   Search by name, description, or tag (e.g., "diabetes", "classification", 
"healthcare")
   Filter by category: Biomedical, NLP, Computer Vision, Finance, Social 
Science, Time Series, Tabular
   Every card shows: name, source badge (UCI/Kaggle/dkNET), description, 
row/column counts, file size, tags
   Three actions per card:
   
   🔗 View on source: opens the original dataset page so users can verify 
before importing
   ↓ Download: saves the file locally
   ☁ Import: imports directly into Texera's dataset system. One click, no 
manual upload, no file path configuration. The dataset immediately appears in 
"Your Datasets" and is ready for any workflow.
   
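   The search and category filter described above could be sketched roughly as follows. This is only an illustration: the `DatasetEntry` shape, its field names, and the `filterDatasets` helper are assumptions made for the example, not the code in this PR.

```typescript
// Hypothetical shape of one Dataset Bank catalog entry (field names are assumptions).
interface DatasetEntry {
  name: string;
  source: "UCI" | "Kaggle" | "dkNET";
  description: string;
  category: string;
  tags: string[];
}

// Case-insensitive substring match against name, description, and tags,
// optionally narrowed to a single category. An empty query matches everything.
function filterDatasets(
  entries: DatasetEntry[],
  query: string,
  category?: string
): DatasetEntry[] {
  const q = query.trim().toLowerCase();
  return entries.filter(entry => {
    if (category !== undefined && entry.category !== category) return false;
    if (q === "") return true;
    const haystack = [entry.name, entry.description, ...entry.tags]
      .join(" ")
      .toLowerCase();
    return haystack.includes(q);
  });
}

// Illustrative sample data only; not the seed file shipped in the PR.
const sampleCatalog: DatasetEntry[] = [
  {
    name: "Pima Indians Diabetes",
    source: "UCI",
    description: "Diagnostic measurements",
    category: "Biomedical",
    tags: ["diabetes", "classification"],
  },
  {
    name: "Iris",
    source: "UCI",
    description: "Flower measurements",
    category: "Tabular",
    tags: ["classification"],
  },
];

const biomedicalDiabetes = filterDatasets(sampleCatalog, "diabetes", "Biomedical");
console.log(biomedicalDiabetes.map(entry => entry.name));
```

   Matching on one joined, lowercased string keeps the filter simple for a seed of a few dozen entries; a larger catalog would likely want an index instead.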
   
   
   Backend: A server-side proxy (/api/dataset-bank/import-from-url) fetches the 
file and uploads it through Texera's existing dataset pipeline, bypassing 
browser CORS restrictions.
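   Because a proxy like this fetches a caller-supplied URL on the server, it usually helps to validate the URL before fetching. The sketch below shows one way that check might look. The helper names and the fallback file name are assumptions for illustration; the PR description does not say what validation the endpoint performs.

```typescript
// Parse a caller-supplied URL and accept only http(s).
// Hypothetical helper; not taken from dataset-import-api.ts.
function validateImportUrl(raw: string): URL {
  let parsed: URL;
  try {
    parsed = new URL(raw);
  } catch {
    throw new Error(`Invalid URL: ${raw}`);
  }
  if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
    throw new Error(`Unsupported protocol: ${parsed.protocol}`);
  }
  return parsed;
}

// Derive a file name from the last non-empty path segment,
// falling back to a default (an assumed value) when the path is empty.
function fileNameFromUrl(url: URL, fallback: string = "dataset.csv"): string {
  const segments = url.pathname.split("/").filter(s => s.length > 0);
  const last = segments.pop();
  return last !== undefined ? last : fallback;
}

const importUrl = validateImportUrl("https://example.org/data/iris.csv");
console.log(fileNameFromUrl(importUrl));
```

   A production proxy would probably also want a size limit and a host allowlist, but those policies are outside what the PR description covers.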
   2. Dataset Search Agent Tool: "Find Me a Diabetes Dataset"
   The AI agent can now search for datasets on your behalf during a 
conversation.
   
   User asks: "find me a diabetes dataset" → agent calls the search_datasets tool
   The tool searches dkNET, UCI, and Kaggle in parallel and returns the top 
results
   The agent also knows your existing Texera datasets (injected into the system 
prompt as a "Your Datasets" section)
   User says "use my iris dataset" → agent knows the exact file path and 
configures the CSV Source automatically
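   One way to fan a query out to several sources in parallel, while tolerating a source that fails, is sketched below. The `DatasetHit` and `SourceSearch` types, the per-source cap, and the stub sources are all assumptions for illustration; the real search_datasets tool may rank and merge results differently.

```typescript
// Hypothetical result shape and source signature (assumptions for this sketch).
interface DatasetHit {
  title: string;
  source: string;
  url: string;
}
type SourceSearch = (query: string) => Promise<DatasetHit[]>;

// Query every source concurrently. A source that rejects contributes no hits
// instead of failing the whole search. Each source is capped at `perSource` hits.
async function searchAllSources(
  query: string,
  sources: SourceSearch[],
  perSource: number = 3
): Promise<DatasetHit[]> {
  const perSourceHits = await Promise.all(
    sources.map(search => search(query).catch((): DatasetHit[] => []))
  );
  const merged: DatasetHit[] = [];
  for (const hits of perSourceHits) {
    merged.push(...hits.slice(0, perSource));
  }
  return merged;
}

// Stub sources standing in for real dkNET / UCI / Kaggle clients.
const stubUci: SourceSearch = async query => [
  { title: `${query} (UCI stub)`, source: "UCI", url: "https://example.org/uci" },
];
const stubUnavailable: SourceSearch = async () => {
  throw new Error("source unavailable");
};

searchAllSources("diabetes", [stubUci, stubUnavailable]).then(hits => {
  console.log(hits.map(hit => `${hit.source}: ${hit.title}`));
});
```

   Catching per source means one slow or broken upstream cannot cause the agent to return nothing, which matters when the tool is called mid-conversation.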
   
   3. Results Dashboard: Analysis Reports Outside the Chat
   When the AI agent produces a workflow analysis (model comparison, metrics, 
key findings), it now appears in a dedicated Results Dashboard panel instead of 
being buried in chat.
   
   Agent wraps analysis in report markers → chat shows a compact card: "📊 
Results ready · View Report →"
   Clicking the card opens a floating Results Dashboard panel alongside the 
canvas
   Dashboard renders formatted markdown: tables, headers, bold metrics, key 
insights
   Copy button copies the report to the clipboard; Export button downloads it 
as markdown
   Timestamped so users know when the analysis was generated
   Auto-updates when the agent sends a new analysis
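   The marker convention could be handled along the lines of the sketch below, which splits an agent message into the chat text and the report body. The marker strings `[[REPORT]]` and `[[/REPORT]]` are placeholders invented for this example; the actual markers are defined in prompts.ts and are not given in this description.

```typescript
// Placeholder marker strings (assumptions; the real convention lives in prompts.ts).
const REPORT_START = "[[REPORT]]";
const REPORT_END = "[[/REPORT]]";

interface SplitMessage {
  chatText: string; // what remains in the chat bubble
  report: string | null; // markdown for the Results Dashboard, if any
}

// Extract the first marked report from a message. If either marker is missing,
// the message is returned unchanged and no report is produced.
function splitReport(message: string): SplitMessage {
  const start = message.indexOf(REPORT_START);
  const end =
    start === -1 ? -1 : message.indexOf(REPORT_END, start + REPORT_START.length);
  if (start === -1 || end === -1) {
    return { chatText: message, report: null };
  }
  const report = message.slice(start + REPORT_START.length, end).trim();
  const chatText = (
    message.slice(0, start) + message.slice(end + REPORT_END.length)
  ).trim();
  return { chatText, report };
}

const exampleMessage =
  "Done.\n[[REPORT]]\n# Comparison\n| Model | F1 |\n[[/REPORT]]";
console.log(splitReport(exampleMessage));
```

   Falling back to the plain message when a marker is missing keeps a malformed agent reply readable in chat rather than silently dropping it.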
   
   The experience: Canvas on the left showing your DAG, Results Dashboard on 
the right showing your analysis, chat in between for interaction. Everything 
visible at once: no tab switching, no scrolling through messages.
   Demo Scenario
   
   Open Dataset Bank → search "diabetes" → filter "Biomedical" → see Pima 
Indians Diabetes dataset
   Click "🔗 View on UCI" to verify → click "☁ Import" → dataset appears in Your 
Datasets
   Open a workflow → ask the Diabetes Agent: "Build a classification workflow 
using my diabetes dataset"
   Agent generates the workflow on canvas
   Ask: "Run it and give me a comparison report"
   Agent runs workflow, produces analysis → "📊 Results ready · View Report →"
   Click → Results Dashboard opens with formatted model comparison table, 
winner, key insights
   Copy the report to share with advisor
   
   Files Changed
   Dataset Bank (Frontend)
   
   dashboard/component/user/dataset-bank/: DatasetBankComponent (page, search, 
categories, cards)
   dashboard/component/user/dataset-bank/dataset-bank.seed.ts: Curated seed of 
20+ popular datasets
   dashboard/service/dataset-bank/dataset-bank.service.ts: Fetch, filter, 
import logic
   
   Dataset Search (Agent Service)
   
   agent-service/src/agent/tools/dataset-search-tool.ts: search_datasets tool 
(dkNET + UCI + Kaggle)
   agent-service/src/api/user-datasets-api.ts: Fetches user's existing 
datasets for prompt injection
   agent-service/src/agent/prompts.ts: "Your Datasets" section in system prompt
   
   Dataset Import Proxy (Agent Service)
   
   agent-service/src/api/dataset-import-api.ts: Server-side fetch + Texera 
dataset upload pipeline
   agent-service/src/server.ts: /api/dataset-bank router mount
   
   Results Dashboard (Frontend + Agent Service)
   
   workspace/component/results-dashboard-panel/: Floating panel with markdown 
rendering, copy, export
   workspace/service/agent-report/agent-report.service.ts: Report pub/sub 
between chat and panel
   agent-service/src/agent/prompts.ts: Report marker convention instructions
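   For readers unfamiliar with the pub/sub pattern mentioned for agent-report.service.ts, here is a minimal hand-rolled sketch of a channel that timestamps each report and replays the latest one to late subscribers. The `ReportChannel` class and `AgentReport` shape are assumptions for illustration; the actual service may well be built on RxJS or another mechanism.

```typescript
// Hypothetical report payload: the markdown plus the time it was generated.
interface AgentReport {
  markdown: string;
  generatedAt: Date;
}
type ReportListener = (report: AgentReport) => void;

// Minimal publish/subscribe channel between the chat (publisher)
// and the dashboard panel (subscriber).
class ReportChannel {
  private latest: AgentReport | null = null;
  private listeners = new Set<ReportListener>();

  // Store the report with a timestamp and notify every current subscriber.
  publish(markdown: string, now: Date = new Date()): void {
    const report: AgentReport = { markdown, generatedAt: now };
    this.latest = report;
    this.listeners.forEach(listener => listener(report));
  }

  // Register a listener; it immediately receives the latest report, if any,
  // so a panel opened after the report arrived still shows it.
  // Returns a function that removes the listener.
  subscribe(listener: ReportListener): () => void {
    this.listeners.add(listener);
    if (this.latest !== null) listener(this.latest);
    return () => {
      this.listeners.delete(listener);
    };
  }
}

const demoChannel = new ReportChannel();
demoChannel.publish("# First analysis");
const stopListening = demoChannel.subscribe(report =>
  console.log(report.generatedAt.toISOString(), report.markdown)
);
demoChannel.publish("# Updated analysis");
stopListening();
```

   Replaying the latest value on subscribe is what lets the panel auto-update and also open correctly after the fact.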
   
   Configuration
   
   proxy.config.json: Dev proxy for /api/dataset-bank → agent-service
   agent-service/src/config/env.ts: TEXERA_FILE_SERVICE_ENDPOINT for dataset 
operations
   
   Testing
   
   Angular build: clean ✅
   agent-service typecheck: clean ✅
   Dataset Import: tested with UCI Iris dataset, end-to-end success ✅
   Results Dashboard: tested with agent-generated report, renders correctly ✅
   Dataset search tool: registered and callable by agent ✅


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
