tanishqgandhi1908 opened a new pull request, #5094:
URL: https://github.com/apache/texera/pull/5094
### What changes were proposed in this PR?
This PR improves the end-to-end data experience for hackathon workflows by
making ingestion smarter, image workflows first-class, and visual outputs
easier to understand.
**Motivation**
Before:
| User task | Current friction |
| --- | --- |
| Load a dataset | Users must choose the right source operator before they know the file format |
| Read a folder | Folder-backed datasets are awkward to use and lose file provenance |
| Work with images | Image bytes appear as opaque binary previews instead of usable visual data |
| Understand a visual result | Users can see the final output, but not how it was produced |
After:
| User task | New experience |
| --- | --- |
| Load a dataset | `Smart Source` auto-detects file type, dialect, and schema |
| Read a folder | The same source can read a folder of similar files and preserve source-file lineage |
| Work with images | Image folders become structured rows with real image previews |
| Understand a visual result | Clicking a visual result can open a `Visual Journey` side panel |
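
To illustrate what "auto-detects ... dialect" can mean for delimited text, here is a minimal sketch of delimiter sniffing. It is not the `CSVDialectSniffer` from this PR: the names `CANDIDATE_DELIMITERS`, `countOutsideQuotes`, and `sniffDelimiter` are hypothetical, and the heuristic (a delimiter must appear the same non-zero number of times on every sampled line) is an assumption chosen for brevity.

```typescript
// Hypothetical sketch; the PR's actual sniffer may use a different heuristic.
const CANDIDATE_DELIMITERS: readonly string[] = [",", "\t", ";", "|"];

// Count occurrences of `delimiter` that fall outside double-quoted fields.
function countOutsideQuotes(line: string, delimiter: string): number {
  let count = 0;
  let inQuotes = false;
  for (let i = 0; i < line.length; i++) {
    const ch = line.charAt(i);
    if (ch === '"') {
      inQuotes = !inQuotes;
    } else if (ch === delimiter && !inQuotes) {
      count++;
    }
  }
  return count;
}

// Return the candidate that splits every sampled line into the same number
// of fields, preferring the one that yields the most fields; null if none fits.
function sniffDelimiter(sample: string, maxLines: number = 20): string | null {
  const lines = sample
    .split(/\r?\n/)
    .filter(line => line.length > 0)
    .slice(0, maxLines);
  if (lines.length === 0) {
    return null;
  }
  let best: string | null = null;
  let bestCount = 0;
  for (const delimiter of CANDIDATE_DELIMITERS) {
    const counts = lines.map(line => countOutsideQuotes(line, delimiter));
    const first = counts[0] ?? 0;
    const consistent = first > 0 && counts.every(c => c === first);
    if (consistent && first > bestCount) {
      best = delimiter;
      bestCount = first;
    }
  }
  return best;
}
```

Under these assumptions, `sniffDelimiter("a,b,c\n1,2,3")` picks the comma, while a sample with no consistent candidate yields `null`, which a caller could treat as "fall back to plain text".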
**Main changes**
1. Add `Smart Source` (`SmartFileScan`) with support for CSV, TSV, JSON,
JSONL, Arrow, Parquet, Excel, images, and plain text.
2. Add backend file inference plus frontend inference summaries so the
property panel can show detected format, delimiter, header status, sheet,
schema size, and folder counts.
3. Extend folder support across dataset selection and file scanning:
- folders can be selected from the dataset picker
- `FileScan` can read folders while preserving relative file names
- new `File Split` operator routes rows from the same source file to the
same output port
4. Make image workflows more natural:
- image folders produce rows containing image bytes plus format and
dimensions
- recognized image binaries are serialized as image data URLs
- result tables render image thumbnails instead of raw binary text
5. Teach the agent service about `SmartFileScan` and include operator
display names in the prompt so the agent can reason about user-facing operator
names such as `Smart Source`.
6. Add a reusable `Visual Journey` side panel:
- visualizers can emit rich trace payloads
- ordinary image clicks fall back to a structural upstream workflow trace
- iframe-origin clicks are handled correctly so visualizer interactions
open the side panel reliably
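
The image handling in item 4 (recognizing image binaries and serializing them as data URLs) can be sketched as follows. This is an illustration under stated assumptions, not the code in this PR: `detectImageMime`, `toBase64`, and `toImageDataUrl` are hypothetical names, only PNG, JPEG, and GIF signatures are checked, and Base64 is encoded by hand so the sketch does not depend on a particular runtime.

```typescript
// Hypothetical helpers; the PR's serialization code may differ.
type ImageMime = "image/png" | "image/jpeg" | "image/gif";

// Leading "magic" bytes that identify each supported image format.
const SIGNATURES: ReadonlyArray<{ mime: ImageMime; magic: number[] }> = [
  { mime: "image/png", magic: [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a] },
  { mime: "image/jpeg", magic: [0xff, 0xd8, 0xff] },
  { mime: "image/gif", magic: [0x47, 0x49, 0x46, 0x38] },
];

// Return the MIME type if the bytes start with a known signature, else null.
function detectImageMime(bytes: Uint8Array): ImageMime | null {
  for (const { mime, magic } of SIGNATURES) {
    if (bytes.length >= magic.length && magic.every((b, i) => bytes[i] === b)) {
      return mime;
    }
  }
  return null;
}

const BASE64_ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Standard Base64 encoding with "=" padding, three input bytes at a time.
function toBase64(bytes: Uint8Array): string {
  let out = "";
  for (let i = 0; i < bytes.length; i += 3) {
    const b0 = bytes[i] ?? 0;
    const b1 = bytes[i + 1] ?? 0;
    const b2 = bytes[i + 2] ?? 0;
    const triple = (b0 << 16) | (b1 << 8) | b2;
    out += BASE64_ALPHABET.charAt((triple >> 18) & 63);
    out += BASE64_ALPHABET.charAt((triple >> 12) & 63);
    out += i + 1 < bytes.length ? BASE64_ALPHABET.charAt((triple >> 6) & 63) : "=";
    out += i + 2 < bytes.length ? BASE64_ALPHABET.charAt(triple & 63) : "=";
  }
  return out;
}

// Serialize recognized image bytes as a data URL; leave other binaries alone.
function toImageDataUrl(bytes: Uint8Array): string | null {
  const mime = detectImageMime(bytes);
  return mime === null ? null : `data:${mime};base64,${toBase64(bytes)}`;
}
```

A result-table cell renderer could then use the returned string as an `<img>` `src` when it is non-null and fall back to the existing binary preview otherwise.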
### Any related issues, documentation, discussions?
- Related to hackathon discussion apache/texera#5059.
### How was this PR tested?
```bash
# Agent service unit tests
PATH="/Users/tanishqgandhi/.bun/bin:$PATH" bun test \
  agent-service/src/agent/prompts.test.ts \
  agent-service/src/types/agent.test.ts

# Backend specs
JAVA_HOME=$(/usr/libexec/java_home -v 17) sbt "testOnly \
  org.apache.texera.amber.operator.source.scan.smart.CSVDialectSnifferSpec \
  org.apache.texera.amber.operator.source.scan.smart.FormatDetectorSpec \
  org.apache.texera.amber.operator.source.scan.smart.SmartFileSourceOpDescSpec \
  org.apache.texera.amber.operator.source.scan.smart.SmartFileSourceOpExecSpec \
  org.apache.texera.amber.operator.fileSplit.FileSplitOpDescSpec \
  org.apache.texera.amber.operator.fileSplit.FileSplitOpExecSpec \
  org.apache.texera.amber.operator.source.scan.file.FileScanSourceOpDescSpec \
  org.apache.texera.web.service.ExecutionResultServiceSpec"

# Frontend specs
PATH="/Users/tanishqgandhi/.nvm/versions/node/v24.15.0/bin:$PATH" yarn ng test --watch=false \
  --include='src/app/workspace/service/visual-trace/visual-trace.utils.spec.ts' \
  --include='src/app/workspace/component/visual-trace-panel/visual-trace-panel.component.spec.ts' \
  --include='src/app/workspace/component/result-panel/result-table-frame/result-table-cell.utils.spec.ts'
```
Manual verification:
1. Loaded folder-backed CSV datasets through `Smart Source`.
2. Loaded an image folder and confirmed result cells render image thumbnails.
3. Opened an HTML visualizer, clicked a winner card, and confirmed the
`Visual Journey` panel opens from iframe-origin clicks.
### Was this PR authored or co-authored using generative AI tooling?
Generated-by: Codex (GPT-5)