Dracylfrr opened a new issue, #5693: URL: https://github.com/apache/texera/issues/5693
### Feature Summary

Texera workflows often involve exploring a dataset before applying cleaning, visualization, or analysis operators. A basic column-level summary operator would help users quickly understand the shape and quality of an input table.

This issue proposes adding a Column Summary Statistics workflow operator that takes one input table and outputs one summary row per input column.

Initial output fields:

* columnName
* dataType
* rowCount
* nullCount
* nonNullCount
* minValue
* maxValue
* meanValue

### Proposed Solution or Design

For the first version:

* Numeric columns should report min, max, and mean.
* Non-numeric columns should report row/null/non-null counts and leave numeric summary fields null.
* The operator should follow existing Texera native operator patterns.
* Unit tests should cover numeric columns, non-numeric columns, null values, mixed columns, and empty input.

This is intended as a focused workflow operator for basic per-column summary statistics.

### Affected Area

Workflow Engine (Amber)
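As a rough illustration of the intended semantics, the per-column logic could be sketched as below. This is plain Python and not Texera operator code: the function name `summarize_columns`, the `schema` argument, and the type labels in `NUMERIC_TYPES` are assumptions made for the example, and rows are modeled as dictionaries with `None` standing in for null. A numeric column with no non-null values is assumed to leave min, max, and mean null.

```python
from typing import Any, Dict, List

# Assumed labels for numeric column types (hypothetical, for illustration only).
NUMERIC_TYPES = {"integer", "long", "double"}


def summarize_columns(
    schema: Dict[str, str], rows: List[Dict[str, Any]]
) -> List[Dict[str, Any]]:
    """Return one summary row per column declared in `schema`."""
    summaries = []
    for name, data_type in schema.items():
        values = [row.get(name) for row in rows]
        non_null = [v for v in values if v is not None]
        # Min/max/mean are only reported for numeric columns that have
        # at least one non-null value; otherwise they stay null (None).
        has_stats = data_type in NUMERIC_TYPES and len(non_null) > 0
        summaries.append({
            "columnName": name,
            "dataType": data_type,
            "rowCount": len(values),
            "nullCount": len(values) - len(non_null),
            "nonNullCount": len(non_null),
            "minValue": min(non_null) if has_stats else None,
            "maxValue": max(non_null) if has_stats else None,
            "meanValue": sum(non_null) / len(non_null) if has_stats else None,
        })
    return summaries


# Example input: one numeric column and one non-numeric column, with nulls.
example_schema = {"age": "integer", "name": "string"}
example_rows = [
    {"age": 30, "name": "a"},
    {"age": None, "name": "b"},
    {"age": 40, "name": None},
]
example_summary = summarize_columns(example_schema, example_rows)
```

With the example input, the `age` row reports 3 rows, 1 null, and min/max/mean of 30, 40, and 35.0, while the `name` row reports the same counts and leaves the numeric fields null. Empty input yields a `rowCount` of 0 and null numeric fields for every column.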
