[I] Schema preview and starter pipelines in CSV File Scan property pane [texera]

via GitHub Sat, 06 Jun 2026 20:37:21 -0700


yangzhang75 opened a new issue, #5445:
URL: https://github.com/apache/texera/issues/5445


   ### Feature Summary
   
   When a user adds a CSV File Scan operator and selects a dataset file, the 
Property pane shows only file metadata (path, encoding, limit). The user has no 
quick way to see what's in the dataset and no guidance on which operators to 
add next, even though this is the moment when help is most valuable.
   
   This issue proposes a Schema Preview section inside the CSV File Scan 
Property pane, in two parts:
   
   - **Part A — Schema preview:** show the column list with inferred types for 
the selected file, inline in the property pane.
   - **Part B — Starter pipelines:** show 2–3 suggested operator chains (e.g. 
classification, regression, exploration) derived from deterministic schema 
heuristics. Each has an Apply button that adds the whole chain to the canvas, 
fully linked, as a single undo step. No LLM. No automatic execution — the user 
configures parameters and runs the workflow themselves.
   
   The section is purely additive. The user can ignore it; nothing happens 
unless they click Apply.
   
   This builds on my BioFlow Genesis hackathon prototype (#5122). That 
prototype generated and ran entire workflows automatically, which was too 
aggressive. This proposal keeps the underlying insight (help is most valuable 
at the moment a user picks a dataset) but limits it to suggestions only, with 
no auto-execution, scoped to one operator's pane.
   
   Related to #4085 (onboarding) and #5099 / #5394 (making Texera more 
self-explanatory).
   
   ### Proposed Solution or Design
   
   ### Part A — Schema preview
   
   Column names and types are already inferred at compile time via 
`CSVScanSourceOpDesc.sourceSchema()` (header + ~100 sample rows), and the 
frontend already receives them via 
`WorkflowCompilingService.getOperatorOutputSchemaMap(operatorId)`. The preview 
reads from this service and refreshes on schema changes. No new backend 
endpoint required.
   
   The pane section follows the existing pattern of 
`TypeCastingDisplayComponent`: a standalone component rendered in 
`operator-property-edit-frame` via an operator-type `*ngIf`, taking 
`currentOperatorId` as input.
   
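
As a rough illustration of the data handling behind Part A, the sketch below turns an inferred schema into display rows and checks whether a refreshed schema actually differs. The `SchemaAttribute` field names, `PreviewRow`, `toPreviewRows`, and `sameSchema` are assumptions made for this example, not the actual Texera types; the real component would read its input from `WorkflowCompilingService`.

```typescript
// Hypothetical shape of one inferred column; field names are assumed for illustration.
interface SchemaAttribute {
  attributeName: string;
  attributeType: string; // e.g. "string", "integer", "double", "boolean"
}

// One row of the preview list shown in the property pane (hypothetical).
interface PreviewRow {
  index: number;
  name: string;
  type: string;
}

// Converts an inferred schema into rows to render.
// An undefined schema (e.g. no file selected yet) yields an empty list.
function toPreviewRows(schema: ReadonlyArray<SchemaAttribute> | undefined): PreviewRow[] {
  if (schema === undefined) {
    return [];
  }
  return schema.map((attr, i) => ({
    index: i + 1,
    name: attr.attributeName,
    type: attr.attributeType,
  }));
}

// Compares two schemas by their rendered rows, so the preview can skip
// re-rendering when a refresh delivers an identical schema.
// Note: undefined and an empty schema compare equal, since both render nothing.
function sameSchema(
  a: ReadonlyArray<SchemaAttribute> | undefined,
  b: ReadonlyArray<SchemaAttribute> | undefined
): boolean {
  return JSON.stringify(toPreviewRows(a)) === JSON.stringify(toPreviewRows(b));
}
```

Keeping this logic in pure functions would let PR 1 unit-test the preview without mounting the Angular component.
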
   ### Part B — Starter pipelines
   
   Deterministic heuristics over the inferred schema map to small operator 
recipes:
   
   - binary target → classification: Split + SklearnLogisticRegression + 
SklearnPrediction
   - continuous target → regression: Split + SklearnLinearRegression + 
SklearnPrediction
   - numeric-heavy, no clear target → exploration: Aggregate + a chart operator 
(e.g. BarChart)
   - many columns → Projection only
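
The mapping above could be sketched as a pure function over column names and types. Everything specific here is an assumption for illustration: the issue does not say how a target column is identified, so this sketch uses hypothetical name hints; it treats a boolean column as a stand-in for "binary target" and a double column for "continuous target"; and the "many columns" and "numeric-heavy" thresholds are placeholders.

```typescript
type ColumnType = "string" | "integer" | "long" | "double" | "boolean" | "timestamp" | "binary";

interface Column {
  name: string;
  type: ColumnType;
}

type RecipeKind = "classification" | "regression" | "exploration" | "projection";

// Hypothetical constants: none of these values come from the proposal.
const TARGET_NAME_HINTS: ReadonlyArray<string> = ["target", "label", "class", "y"];
const MANY_COLUMNS = 20;
const NUMERIC_TYPES: ReadonlyArray<ColumnType> = ["integer", "long", "double"];

function isNumeric(c: Column): boolean {
  return NUMERIC_TYPES.indexOf(c.type) !== -1;
}

// Deterministic: the same schema always yields the same suggestions, in the same order.
function suggestRecipes(columns: ReadonlyArray<Column>): RecipeKind[] {
  const recipes: RecipeKind[] = [];

  // Assumed target detection: first column whose name matches a hint.
  const target = columns.filter(c => TARGET_NAME_HINTS.indexOf(c.name.toLowerCase()) !== -1)[0];

  if (target !== undefined && target.type === "boolean") {
    recipes.push("classification");
  } else if (target !== undefined && target.type === "double") {
    recipes.push("regression");
  } else if (columns.length > 0 && columns.filter(isNumeric).length * 2 >= columns.length) {
    // No clear target and at least half of the columns are numeric.
    recipes.push("exploration");
  }

  if (columns.length > MANY_COLUMNS) {
    recipes.push("projection");
  }
  return recipes;
}
```

For example, a schema with an integer `age` column and a boolean `label` column yields `["classification"]`. A real implementation may need sampled values rather than types alone to decide whether a target is binary.
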
   
   These ML recipes are small DAGs, not linear chains — Split fans into the 
trainer's training/testing ports, and the trainer's model output plus test data 
feed into SklearnPrediction. Correctly wiring these dependent multi-input ports 
is the main complexity in PR 2.
   
   Apply uses the existing `WorkflowActionService.addOperatorsAndLinks(...)` as 
a single undo step. Operators are added with default parameters; the user 
configures and runs.
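
To make the DAG shape concrete, here is a minimal sketch of the classification recipe as plain data, under one plausible reading of the wiring described above. The `RecipeGraph` types, operator ids, and port ids such as `output-0` are hypothetical; the sketch deliberately does not call `WorkflowActionService.addOperatorsAndLinks(...)`, since its signature is not shown in this issue.

```typescript
interface OperatorNode {
  id: string;
  operatorType: string;
}

interface PortRef {
  operatorId: string;
  portId: string;
}

interface Link {
  from: PortRef;
  to: PortRef;
}

interface RecipeGraph {
  operators: OperatorNode[];
  links: Link[];
}

function link(fromOp: string, fromPort: string, toOp: string, toPort: string): Link {
  return { from: { operatorId: fromOp, portId: fromPort }, to: { operatorId: toOp, portId: toPort } };
}

// Builds the classification recipe downstream of an existing CSV File Scan operator.
// Port ids are assumptions; the real ids depend on each operator's definition.
function buildClassificationRecipe(scanId: string): RecipeGraph {
  const split: OperatorNode = { id: "split-1", operatorType: "Split" };
  const trainer: OperatorNode = { id: "trainer-1", operatorType: "SklearnLogisticRegression" };
  const predict: OperatorNode = { id: "predict-1", operatorType: "SklearnPrediction" };
  return {
    operators: [split, trainer, predict],
    links: [
      link(scanId, "output-0", split.id, "input-0"),       // scan -> split
      link(split.id, "output-0", trainer.id, "input-0"),   // training data -> trainer
      link(trainer.id, "output-0", predict.id, "input-0"), // model -> prediction
      link(split.id, "output-1", predict.id, "input-1"),   // test data -> prediction
    ],
  };
}

// Returns links whose endpoints reference an operator that is neither in the
// recipe nor in the list of operators already on the canvas.
function danglingLinks(recipe: RecipeGraph, existingIds: ReadonlyArray<string>): Link[] {
  const known = existingIds.concat(recipe.operators.map(op => op.id));
  return recipe.links.filter(
    l => known.indexOf(l.from.operatorId) === -1 || known.indexOf(l.to.operatorId) === -1
  );
}
```

Validating the graph with something like `danglingLinks` before Apply would help ensure the whole recipe is added, or nothing is, which fits the single-undo-step requirement.
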
   
   ### PR split
   
   1. **PR 1 — Part A** (schema preview): frontend-only, self-contained.
   2. **PR 2 — Part B** (starter pipelines + Apply): separate PR, after 
discussion.
   
   ### Affected Area
   
   Workflow UI


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
