yangzhang75 opened a new issue, #5445: URL: https://github.com/apache/texera/issues/5445
### Feature Summary

When a user adds a CSV File Scan operator and selects a dataset file, the Property pane shows only file metadata (path, encoding, limit). The user has no quick way to see what's in the dataset, and no guidance on what operators to add next, at exactly the moment when help is most valuable.

This proposes a Schema Preview section inside the CSV File Scan Property pane, in two parts:

- **Part A: Schema preview.** Show the column list with inferred types for the selected file, inline in the property pane.
- **Part B: Starter pipelines.** Show 2–3 suggested operator chains (e.g. classification, regression, exploration) derived from deterministic schema heuristics. Each has an Apply button that adds the whole chain to the canvas, fully linked, as a single undo step. No LLM. No automatic execution: the user configures parameters and runs the workflow themselves.

The section is purely additive. The user can ignore it; nothing happens unless they click Apply.

This builds on my BioFlow Genesis hackathon prototype (#5122). That prototype generated and ran entire workflows automatically, which was too aggressive. This proposal keeps the underlying insight (the moment a user picks a dataset is when help is most valuable) but constrains it to suggestion-only, no auto-execution, scoped to one operator's pane.

Related to #4085 (onboarding) and #5099 / #5394 (making Texera more self-explanatory).

### Proposed Solution or Design

### Part A: Schema preview

Column names and types are already inferred at compile time via `CSVScanSourceOpDesc.sourceSchema()` (header + ~100 sample rows), and the frontend already receives them via `WorkflowCompilingService.getOperatorOutputSchemaMap(operatorId)`. The preview reads from this service and refreshes on schema changes. No new backend endpoint required.
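As a rough illustration of the Part A data flow, the preview could reduce whatever the compiling service returns into a flat list of rows for display. The sketch below is only that: the `SchemaAttribute`, `OutputSchemaMap`, and `PreviewRow` shapes and the `toPreviewRows` helper are hypothetical names introduced here, and the real return type of `getOperatorOutputSchemaMap` may differ.

```typescript
// Hypothetical shapes for illustration; not necessarily Texera's actual interfaces.
interface SchemaAttribute {
  attributeName: string;
  attributeType: string; // e.g. "string", "integer", "double"
}

// Assumption: port id -> attribute list, or undefined while no schema is available yet.
type OutputSchemaMap = Record<string, ReadonlyArray<SchemaAttribute> | undefined>;

interface PreviewRow {
  index: number; // 1-based position of the column in the file
  name: string;
  type: string;
}

// Turn a schema map into display rows. Returns an empty list when the
// schema has not been inferred yet, so the pane can simply show nothing.
function toPreviewRows(schemaMap: OutputSchemaMap | undefined): PreviewRow[] {
  if (!schemaMap) {
    return [];
  }
  // A scan operator is assumed to have one output port; use the first port with a schema.
  for (const portId of Object.keys(schemaMap)) {
    const attributes = schemaMap[portId];
    if (attributes !== undefined && attributes.length > 0) {
      return attributes.map((attr, i) => ({
        index: i + 1,
        name: attr.attributeName,
        type: attr.attributeType,
      }));
    }
  }
  return [];
}

// Usage with made-up columns.
const exampleRows = toPreviewRows({
  "0": [
    { attributeName: "age", attributeType: "integer" },
    { attributeName: "income", attributeType: "double" },
  ],
});
console.log(exampleRows.map(r => `${r.index}. ${r.name}: ${r.type}`).join("\n"));
```

Keeping this step a pure function means the component only has to call it again whenever the schema changes, and it can be unit-tested without the Angular runtime.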
The pane section follows the existing pattern of `TypeCastingDisplayComponent`: a standalone component rendered in `operator-property-edit-frame` via an operator-type `*ngIf`, taking `currentOperatorId` as input.

### Part B: Starter pipelines

Deterministic heuristics over the inferred schema map to small operator recipes:

- binary target → classification: Split + SklearnLogisticRegression + SklearnPrediction
- continuous target → regression: Split + SklearnLinearRegression + SklearnPrediction
- numeric-heavy, no clear target → exploration: Aggregate + a chart operator (e.g. BarChart)
- many columns → Projection only

These ML recipes are small DAGs, not linear chains: Split fans into the trainer's training/testing ports, and the trainer's model output plus test data feed into SklearnPrediction. Correctly wiring these dependent multi-input ports is the main complexity in PR 2.

Apply uses the existing `WorkflowActionService.addOperatorsAndLinks(...)` as a single undo step. Operators are added with default parameters; the user configures and runs.

### PR split

1. **PR 1: Part A** (schema preview): frontend-only, self-contained.
2. **PR 2: Part B** (starter pipelines + Apply): separate PR, after discussion.

### Affected Area

Workflow UI
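For discussion, the Part B mapping from schema to starter recipes described earlier could look roughly like the sketch below. The detection rules are assumptions made only for this example (a boolean column stands in for a "binary target", a double column for a "continuous target"), the two thresholds are hypothetical, and `Column`, `Recipe`, and `suggestRecipes` are illustrative names. Port wiring between operators is deliberately left out.

```typescript
// Illustrative types; names and fields are hypothetical.
interface Column {
  name: string;
  type: string; // assumed values such as "string", "integer", "long", "double", "boolean"
}

type RecipeKind = "classification" | "regression" | "exploration" | "projection";

interface Recipe {
  kind: RecipeKind;
  operators: string[]; // operator names as listed in the proposal; links omitted
}

const NUMERIC_TYPES = ["integer", "long", "double"];
const MANY_COLUMNS = 20; // hypothetical threshold for "many columns"
const NUMERIC_HEAVY_SHARE = 0.5; // hypothetical threshold for "numeric-heavy"

function isNumeric(column: Column): boolean {
  return NUMERIC_TYPES.indexOf(column.type) >= 0;
}

// Deterministic: the same schema always yields the same suggestions, in the same order.
function suggestRecipes(columns: ReadonlyArray<Column>): Recipe[] {
  const recipes: Recipe[] = [];
  // Assumption: a boolean column is treated as a candidate binary target.
  const hasBinaryTarget = columns.some(c => c.type === "boolean");
  // Assumption: a double column is treated as a candidate continuous target.
  const hasContinuousTarget = columns.some(c => c.type === "double");
  const numericShare =
    columns.length === 0 ? 0 : columns.filter(isNumeric).length / columns.length;

  if (hasBinaryTarget) {
    recipes.push({
      kind: "classification",
      operators: ["Split", "SklearnLogisticRegression", "SklearnPrediction"],
    });
  }
  if (hasContinuousTarget) {
    recipes.push({
      kind: "regression",
      operators: ["Split", "SklearnLinearRegression", "SklearnPrediction"],
    });
  }
  if (!hasBinaryTarget && !hasContinuousTarget && numericShare >= NUMERIC_HEAVY_SHARE) {
    recipes.push({ kind: "exploration", operators: ["Aggregate", "BarChart"] });
  }
  if (columns.length > MANY_COLUMNS) {
    recipes.push({ kind: "projection", operators: ["Projection"] });
  }
  return recipes.slice(0, 3); // the pane shows at most three suggestions
}

// Usage with made-up columns.
const exampleSuggestions = suggestRecipes([
  { name: "age", type: "integer" },
  { name: "income", type: "double" },
  { name: "churned", type: "boolean" },
]);
console.log(exampleSuggestions.map(r => r.kind).join(", ")); // classification, regression
```

Because the function only inspects column types, it needs no sample data and no model call, which matches the "No LLM" constraint. Real heuristics would likely need more signal than types alone to pick a target column.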
