[PR] [Hackathon] feat: Multi-Source Data Import — URL, Local File, SQLite, REST API [texera]

via GitHub Sat, 16 May 2026 19:33:12 -0700


EmilySun621 opened a new pull request, #5119:
URL: https://github.com/apache/texera/pull/5119


   > Paste a URL. Drop a file. Open a SQLite database. Ask the AI agent. **Four 
new import paths, zero manual download.**
   
   ---
   
   ## What's New
   
   **🔗 URL Import** — Paste any CSV/JSON URL on the Datasets page, click 
Import. Server-side fetch, auto-format detection.
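
One way the auto-format detection could work is sketched below. This is a hypothetical helper, not the code in `data-source-api.ts`: the name `detectFormat`, the three-step order (Content-Type header, then URL extension, then a look at the body), and the `"unknown"` fallback are all assumptions made for illustration.

```typescript
// Hypothetical sketch; the PR's actual detection logic may differ.
type DetectedFormat = "csv" | "tsv" | "json" | "unknown";

function detectFormat(url: string, contentType: string | null, body: string): DetectedFormat {
  // 1. Trust the Content-Type header when it names a known format.
  const type = ((contentType ?? "").split(";")[0] ?? "").trim().toLowerCase();
  if (type === "application/json") return "json";
  if (type === "text/csv") return "csv";
  if (type === "text/tab-separated-values") return "tsv";

  // 2. Fall back to the file extension, ignoring any query string or fragment.
  const path = (url.split(/[?#]/)[0] ?? "").toLowerCase();
  if (path.endsWith(".json")) return "json";
  if (path.endsWith(".csv")) return "csv";
  if (path.endsWith(".tsv")) return "tsv";

  // 3. Sniff the body: valid JSON first, then the delimiter on the first line.
  const trimmed = body.trim();
  const first = trimmed.charAt(0);
  if (first === "{" || first === "[") {
    try {
      JSON.parse(trimmed);
      return "json";
    } catch (_err) {
      // Not valid JSON; keep going and try the delimiter checks.
    }
  }
  const firstLine = trimmed.split(/\r?\n/)[0] ?? "";
  if (firstLine.indexOf("\t") !== -1) return "tsv";
  if (firstLine.indexOf(",") !== -1) return "csv";
  return "unknown";
}
```

Under these assumptions, a URL such as the UCI `iris.data` file (served without a CSV content type or extension) would only be recognized as CSV by the third step.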
   
   **📁 Local File Drop** — Drag and drop CSV, JSON, XLSX, TSV, or SQLite files 
directly onto the Datasets page.
   
   **🗄️ SQLite Import** — Drop a .sqlite file → pick tables from a list → each 
table becomes a dataset. Uses Bun's built-in `bun:sqlite`, no external 
dependencies.
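
To illustrate the "each table becomes a dataset" step, here is a minimal sketch of turning table rows into CSV text. It assumes rows arrive as plain objects keyed by column name, which is the shape `bun:sqlite` returns from `db.query(sql).all()`. The helpers `csvField`, `rowsToCsv`, and `quoteIdentifier` are hypothetical names, not functions from this PR, and the BLOB placeholder is an arbitrary choice.

```typescript
// Hypothetical helpers; not taken from the PR's export endpoint.
type Cell = string | number | boolean | null | Uint8Array;
type Row = Record<string, Cell>;

// Escape one value as a CSV field (quotes doubled, field wrapped when needed).
function csvField(value: Cell): string {
  if (value === null) return "";
  // Assumption: BLOB columns are replaced by a short placeholder.
  if (value instanceof Uint8Array) return `<blob ${value.byteLength} bytes>`;
  const text = String(value);
  return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

// Serialize a header line plus one line per row, in the given column order.
function rowsToCsv(columns: string[], rows: Row[]): string {
  const lines = [columns.map(csvField).join(",")];
  for (const row of rows) {
    lines.push(columns.map((c) => csvField(row[c] ?? null)).join(","));
  }
  return lines.join("\n") + "\n";
}

// Table names cannot be bound as SQL parameters, so quote them before
// building a statement such as `SELECT * FROM ${quoteIdentifier(name)}`.
function quoteIdentifier(name: string): string {
  return `"${name.replace(/"/g, '""')}"`;
}
```

With Bun, the table list itself could come from `SELECT name FROM sqlite_master WHERE type = 'table'`, with each selected name passed through `quoteIdentifier` before export.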
   
   **⚡ REST API Agent Tool** — `fetch_api_data` tool lets the AI agent fetch 
from any API endpoint. Auto-flattens nested JSON to tabular format.
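
The flattening step can be pictured roughly as follows. This is a sketch under stated assumptions, not the `fetch_api_data` implementation: nested objects are expanded into dot-separated column names, arrays inside a record are kept as JSON strings, and a bare top-level value lands in a column called `value`. The names `flattenRecord` and `toTable` are hypothetical.

```typescript
// Hypothetical sketch of nested-JSON flattening; the PR's tool may differ.
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };
type FlatRow = Record<string, string | number | boolean | null>;

function flattenRecord(value: Json, prefix: string = "", out: FlatRow = {}): FlatRow {
  if (Array.isArray(value)) {
    // Assumption: arrays inside a record stay in one cell as JSON text.
    out[prefix || "value"] = JSON.stringify(value);
  } else if (value !== null && typeof value === "object") {
    for (const key of Object.keys(value)) {
      flattenRecord(value[key] as Json, prefix ? `${prefix}.${key}` : key, out);
    }
  } else {
    out[prefix || "value"] = value;
  }
  return out;
}

// Treat a top-level array as a list of records; wrap anything else in one row.
function toTable(payload: Json): { columns: string[]; rows: FlatRow[] } {
  const records = Array.isArray(payload) ? payload : [payload];
  const rows = records.map((record) => flattenRecord(record));
  const columns: string[] = [];
  for (const row of rows) {
    for (const key of Object.keys(row)) {
      if (columns.indexOf(key) === -1) columns.push(key); // first-seen order
    }
  }
  return { columns, rows };
}
```

For example, `toTable([{ id: 1, user: { name: "a" } }])` yields the columns `id` and `user.name`. Records that lack a column simply omit that key, so a consumer would need to treat missing keys as empty cells.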
   
   ---
   
   ## How It Works
   
   ```
   Frontend (Datasets page)          Agent Service (port 3001)         Texera
   ┌─────────────────────┐          ┌─────────────────────────┐      ┌──────────┐
   │ URL input ──────────┼────→     │ POST /fetch-url         │─────→│ Dataset  │
   │ File drop zone ─────┼────→     │ POST /sqlite-tables     │      │ Creation │
   │ Agent chat ─────────┼────→     │ POST /sqlite-export     │      │ API      │
   └─────────────────────┘          │ Tool: fetch_api_data    │      └──────────┘
                                    └─────────────────────────┘
   ```
   
   ---
   
   ## Verified
   
   ```bash
   $ curl -X POST localhost:3001/api/data-source/fetch-url \
       -H "Content-Type: application/json" \
       -d '{"url":"https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"}'
   
   → {"rows":150, "columns":["5.1","3.5","1.4","0.2","Iris-setosa"], "format":"csv"} ✅
   ```
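
For callers that prefer TypeScript over curl, the same request could be built as sketched below. The endpoint path, the `{"url": ...}` body, and the `rows` / `columns` / `format` response fields come from the curl example above. The helper names `buildFetchUrlRequest` and `isFetchUrlResponse`, and the default base URL, are assumptions for illustration.

```typescript
// Hypothetical client-side helpers; not part of the PR.
interface FetchUrlResponse {
  rows: number;
  columns: string[];
  format: string;
}

// Build the endpoint and fetch() options without sending anything.
function buildFetchUrlRequest(sourceUrl: string, baseUrl: string = "http://localhost:3001") {
  return {
    endpoint: `${baseUrl}/api/data-source/fetch-url`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ url: sourceUrl }),
    },
  };
}

// Runtime check that a parsed response has the shape shown in the curl output.
function isFetchUrlResponse(value: unknown): value is FetchUrlResponse {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  const columns = v["columns"];
  return (
    typeof v["rows"] === "number" &&
    Array.isArray(columns) &&
    columns.every((c) => typeof c === "string") &&
    typeof v["format"] === "string"
  );
}
```

A caller would then run `const { endpoint, init } = buildFetchUrlRequest(url); const res = await fetch(endpoint, init);` and pass `await res.json()` through `isFetchUrlResponse` before using it.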
   
   ---
   
   ## Files Changed
   
   **New:** `agent-service/src/api/data-source-api.ts` (3 endpoints), 
`data-source-tools.ts` (agent tool)
   
   **Modified:** `user-dataset.component.*` (URL input + drop zone), 
`dataset.service.ts` (fetch methods), `proxy.config.json`, 
`DatasetSearchQueryBuilder.scala` (fix: new datasets now appear in list 
immediately)


