EmilySun621 opened a new pull request, #5097: URL: https://github.com/apache/texera/pull/5097
The Story

A biomedical researcher wants to study diabetes. She opens Texera and thinks: "Where do I even find the right dataset?" She opens a new tab, googles around, downloads a CSV from UCI, uploads it to Texera, and configures the file path manually. Twenty minutes gone before she's even started analyzing.

After she builds her workflow and runs it, the AI agent gives her a detailed model comparison: accuracy, F1 scores, key insights. But it's all buried in a long chat message she has to scroll through. She can't easily reference it, copy it, or share it with her advisor.

We fixed both problems.

What We Built

1. Dataset Bank: Browse and Import Public Datasets Without Leaving Texera

A new page in the sidebar where users browse a curated catalog of public datasets from UCI, Kaggle, and dkNET: searchable, categorized, and importable with one click.

How it works:
- Open "Dataset Bank" in the sidebar -> see a grid of dataset cards
- Search by name, description, or tag (e.g., "diabetes", "classification", "healthcare")
- Filter by category: Biomedical, NLP, Computer Vision, Finance, Social Science, Time Series, Tabular
- Every card shows: name, source badge (UCI/Kaggle/dkNET), description, row/column counts, file size, tags

Three actions per card:
- View on source: opens the original dataset page so users can verify before importing
- Download: saves the file locally
- Import: imports directly into Texera's dataset system. One click, no manual upload, no file path configuration. The dataset immediately appears in "Your Datasets" and is ready for any workflow.

Backend: A server-side proxy (/api/dataset-bank/import-from-url) fetches the file and uploads it through Texera's existing dataset pipeline, bypassing browser CORS restrictions.

2. Dataset Search Agent Tool: "Find Me a Diabetes Dataset"

The AI agent can now search for datasets on your behalf during a conversation.
- User asks: "find me a diabetes dataset" -> agent calls the search_datasets tool
- Searches dkNET, UCI, and Kaggle in parallel and returns the top results
- Agent also knows your existing Texera datasets (injected into the system prompt as a "Your Datasets" section)
- User says "use my iris dataset" -> agent knows the exact file path and configures the CSV Source automatically

3. Results Dashboard: Analysis Reports Outside the Chat

When the AI agent produces a workflow analysis (model comparison, metrics, key findings), it now appears in a dedicated Results Dashboard panel instead of being buried in chat.

- Agent wraps analysis in report markers -> chat shows a compact card: "Results ready · View Report ->"
- Clicking opens a floating Results Dashboard panel alongside the canvas
- Dashboard renders formatted markdown: tables, headers, bold metrics, key insights
- Copy button to copy the report to the clipboard, Export button to download it as markdown
- Timestamped so users know when the analysis was generated
- Auto-updates when the agent sends a new analysis

The experience: Canvas on the left showing your DAG, Results Dashboard on the right showing your analysis, chat in between for interaction. Everything is visible at once: no tab switching, no scrolling through messages.
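
The report-marker step above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the marker strings, the `extractReport` function, the `ParsedMessage` type, and the placeholder text are all hypothetical, since the description does not spell out the actual convention defined in prompts.ts.

```typescript
// Hypothetical marker strings; the real convention lives in the agent prompt
// and is not stated in this PR description.
const REPORT_START = "<!--REPORT_START-->";
const REPORT_END = "<!--REPORT_END-->";

interface ParsedMessage {
  // Chat text with the report body replaced by a compact placeholder.
  chatText: string;
  // Markdown found between the markers, or null when no report is present.
  report: string | null;
}

// Split an agent message into the part shown in chat and the report body
// that a dashboard panel could render separately.
function extractReport(message: string): ParsedMessage {
  const start = message.indexOf(REPORT_START);
  const end =
    start === -1 ? -1 : message.indexOf(REPORT_END, start + REPORT_START.length);

  // No complete marker pair: leave the message untouched.
  if (start === -1 || end === -1) {
    return { chatText: message, report: null };
  }

  const report = message.slice(start + REPORT_START.length, end).trim();
  const before = message.slice(0, start).trimEnd();
  const after = message.slice(end + REPORT_END.length).trimStart();

  // Keep surrounding chat text and drop in a one-line card placeholder.
  const chatText = [before, "[Results ready: View Report]", after]
    .filter(part => part.length > 0)
    .join("\n");

  return { chatText, report };
}
```

In a design like this, the chat component would render `chatText` and publish `report` to the panel through a service such as the agent-report service listed under Files Changed. An unterminated marker falls back to showing the raw message, so a truncated stream never hides content.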

Demo Scenario

1. Open Dataset Bank -> search "diabetes" -> filter "Biomedical" -> see the Pima Indians Diabetes dataset
2. Click "View on UCI" to verify -> click "Import" -> dataset appears in Your Datasets
3. Open a workflow -> ask the Diabetes Agent: "Build a classification workflow using my diabetes dataset"
4. Agent generates the workflow on canvas
5. Ask: "Run it and give me a comparison report"
6. Agent runs the workflow, produces analysis -> "Results ready · View Report ->"
7. Click -> Results Dashboard opens with a formatted model comparison table, winner, and key insights
8. Copy the report to share with advisor

Files Changed

Dataset Bank (Frontend)
- dashboard/component/user/dataset-bank/: DatasetBankComponent (page, search, categories, cards)
- dashboard/component/user/dataset-bank/dataset-bank.seed.ts: Curated seed of 20+ popular datasets
- dashboard/service/dataset-bank/dataset-bank.service.ts: Fetch, filter, import logic

Dataset Search (Agent Service)
- agent-service/src/agent/tools/dataset-search-tool.ts: search_datasets tool (dkNET + UCI + Kaggle)
- agent-service/src/api/user-datasets-api.ts: Fetches user's existing datasets for prompt injection
- agent-service/src/agent/prompts.ts: "Your Datasets" section in system prompt

Dataset Import Proxy (Agent Service)
- agent-service/src/api/dataset-import-api.ts: Server-side fetch + Texera dataset upload pipeline
- agent-service/src/server.ts: /api/dataset-bank router mount

Results Dashboard (Frontend + Agent Service)
- workspace/component/results-dashboard-panel/: Floating panel with markdown rendering, copy, export
- workspace/service/agent-report/agent-report.service.ts: Report pub/sub between chat and panel
- agent-service/src/agent/prompts.ts: Report marker convention instructions

Configuration
- proxy.config.json: Dev proxy for /api/dataset-bank -> agent-service
- agent-service/src/config/env.ts: TEXERA_FILE_SERVICE_ENDPOINT for dataset operations

Testing
- Angular build: clean
- agent-service typecheck: clean
- Dataset Import: tested with UCI Iris dataset, end-to-end success
- Results Dashboard: tested with agent-generated report, renders correctly
- Dataset search tool: registered and callable by agent

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
