Dracylfrr opened a new pull request, #5625: URL: https://github.com/apache/texera/pull/5625
### What changes were proposed in this PR?

This PR adds a new **Column Summary Statistics** workflow operator. The operator takes one input table and outputs one summary row per input column.

The output includes:

* `columnName`
* `dataType`
* `rowCount`
* `nullCount`
* `nonNullCount`
* `minValue`
* `maxValue`
* `meanValue`

For numeric columns, the operator computes `minValue`, `maxValue`, and `meanValue` in addition to the row, null, and non-null counts. For non-numeric columns, the operator reports the row, null, and non-null counts and leaves the numeric summary fields as `null`.

This PR includes:

* A new `ColumnSummaryStatisticsOpDesc`
* A new `ColumnSummaryStatisticsOpExec`
* A new `ColumnSummaryStatisticsOpExecConfig`
* Operator registration in `LogicalOp`
* Unit tests covering numeric, string, null, mixed-column, and empty-input behavior

The operator is intentionally scoped as a workflow operator for basic per-column summary statistics.

### Any related issues, documentation, discussions?

Related to #____

### How was this PR tested?

Added unit tests in:

`common/workflow-operator/src/test/scala/org/apache/texera/amber/operator/statistics/columnsummary/ColumnSummaryStatisticsOpExecSpec.scala`

The tests cover:

* Computing min, max, mean, row count, null count, and non-null count for an integer column
* Computing numeric statistics for numeric columns while leaving those fields as `null` for non-numeric columns
* Returning one summary row for each input column
* Returning no rows when no input tuples are processed

Test command run locally:

`sbt "WorkflowOperator / testOnly org.apache.texera.amber.operator.statistics.columnsummary.ColumnSummaryStatisticsOpExecSpec"`

Result:

`Tests: succeeded 4, failed 0`
`All tests passed.`

### Was this PR authored or co-authored using generative AI tooling?

Generated-by: ChatGPT (GPT-5.5 Thinking)
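
For reviewers, the per-column semantics described under "What changes were proposed in this PR?" can be sketched roughly as follows. This is a minimal standalone illustration, not the code in `ColumnSummaryStatisticsOpExec`. The names `ColumnSummary`, `ColumnSummarySketch`, and `summarize`, the string-valued data types, the use of `Option[Double]` in place of nullable fields, and the choice to treat a missing key as a null value are all assumptions made for the example; it does not use Texera's tuple or schema classes.

```scala
// Hypothetical output row; the field names follow the PR description,
// but the types (Long counts, Option[Double] statistics) are assumptions.
final case class ColumnSummary(
    columnName: String,
    dataType: String,
    rowCount: Long,
    nullCount: Long,
    nonNullCount: Long,
    minValue: Option[Double],
    maxValue: Option[Double],
    meanValue: Option[Double]
)

object ColumnSummarySketch {

  // columns:      (columnName, dataType) pairs, in schema order
  // numericTypes: which dataType strings count as numeric (assumed encoding)
  // rows:         one Map per input tuple; a null or missing value counts as null
  def summarize(
      columns: Seq[(String, String)],
      numericTypes: Set[String],
      rows: Seq[Map[String, Any]]
  ): Seq[ColumnSummary] = {
    // Mirrors the tested behavior: no output rows when no tuples were processed.
    if (rows.isEmpty) Seq.empty
    else
      columns.map { case (name, dataType) =>
        val values = rows.map(_.getOrElse(name, null))
        val nonNull = values.filter(_ != null)

        // Only numeric columns contribute to min/max/mean.
        val numbers: Seq[Double] =
          if (numericTypes.contains(dataType))
            nonNull.collect { case n: Number => n.doubleValue() }
          else Seq.empty

        ColumnSummary(
          columnName = name,
          dataType = dataType,
          rowCount = values.size.toLong,
          nullCount = (values.size - nonNull.size).toLong,
          nonNullCount = nonNull.size.toLong,
          minValue = numbers.reduceOption(_ min _),
          maxValue = numbers.reduceOption(_ max _),
          meanValue =
            if (numbers.isEmpty) None else Some(numbers.sum / numbers.size)
        )
      }
  }
}
```

In this sketch a numeric column whose values are all null ends up with empty `minValue`, `maxValue`, and `meanValue`, the same as a non-numeric column. Whether the operator in this PR behaves that way is not stated in the description.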
