aglinxinyuan opened a new issue, #5777:
URL: https://github.com/apache/texera/issues/5777
### Task Summary
Add a dedicated `DistributedAggregationSpec.scala` that pins the
four-function distributed-aggregation contract end-to-end using a
representative aggregation (e.g. average): local partials computed per "node"
and then merged must equal a single-node fold.
## Background
`DistributedAggregation[P <: AnyRef]`
(`operator/aggregate/DistributedAggregation.scala`) is the case class that
defines how an aggregate is computed in a data-parallel engine, via four
functions (pattern from the SOSP'09 *Distributed Aggregation* paper). It
currently has no dedicated unit spec.
```scala
case class DistributedAggregation[P <: AnyRef](
init: () => P, // initial partial
iterate: (P, Tuple) => P, // accumulate one input tuple
merge: (P, P) => P, // combine two partials
finalAgg: (P) => Object // partial -> final value
)
```
## Behavior to pin
Define a representative average aggregation, `DistributedAggregation[(Double, Long)]`,
over a single numeric column and assert:
| Step | Contract |
| --- | --- |
| `init()` | returns the identity partial `(0.0, 0L)` |
| `iterate` | folds a tuple's value in: `(sum + v, count + 1)` |
| `merge` | combines two partials additively; commutative/associative |
| `finalAgg` | `(sum, count) => sum / count` |
| distributed == single-node | split the input tuples across two partitions, `iterate` each locally from `init()`, `merge` the partials, `finalAgg` → equals the average from folding all tuples in one partition |
| empty partition | a partition with no tuples contributes `init()` and leaves the merged result unchanged |
Build `Tuple`s with the `Schema` / `Attribute` / `Tuple` helpers — see
`AggregateOpSpec` in the same package for the pattern.
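The contract in the table can be sketched roughly as follows. This is a minimal, self-contained illustration and not the proposed spec: `Row` is a hypothetical stand-in for Texera's `Tuple`, the case class is re-declared locally with `Row` in place of `Tuple` so the snippet compiles on its own, and `DistributedAverageSketch`, `average`, and `foldPartition` are names invented here. The real spec would build `Tuple`s with the project helpers and use its test framework's matchers instead of plain `assert`.

```scala
// Hypothetical stand-in for Texera's Tuple: one numeric column only.
final case class Row(value: Double)

// Local copy of the four-function shape from the issue, using Row instead of Tuple.
final case class DistributedAggregation[P <: AnyRef](
    init: () => P,          // initial partial
    iterate: (P, Row) => P, // accumulate one input row
    merge: (P, P) => P,     // combine two partials
    finalAgg: P => Object   // partial -> final value
)

object DistributedAverageSketch {
  // Partial state for an average: (running sum, running count).
  type Partial = (Double, Long)

  val average: DistributedAggregation[Partial] = DistributedAggregation[Partial](
    init = () => (0.0, 0L),
    iterate = (p, row) => (p._1 + row.value, p._2 + 1L),
    merge = (a, b) => (a._1 + b._1, a._2 + b._2),
    finalAgg = p => java.lang.Double.valueOf(p._1 / p._2)
  )

  // Fold one partition locally, starting from the identity partial.
  def foldPartition(rows: Seq[Row]): Partial =
    rows.foldLeft(average.init())(average.iterate)

  def main(args: Array[String]): Unit = {
    val all = Seq(Row(1.0), Row(2.0), Row(3.0), Row(6.0))
    val (left, right) = all.splitAt(2)

    // distributed == single-node
    val merged = average.merge(foldPartition(left), foldPartition(right))
    val single = foldPartition(all)
    assert(average.finalAgg(merged) == average.finalAgg(single))

    // merge is commutative on these partials
    assert(average.merge(foldPartition(right), foldPartition(left)) == merged)

    // an empty partition contributes init() and leaves the result unchanged
    val empty = foldPartition(Seq.empty)
    assert(empty == average.init())
    assert(average.merge(merged, empty) == merged)

    println(s"distributed average = ${average.finalAgg(merged)}")
  }
}
```

The inputs are chosen so that every sum is exactly representable as a `Double`, which allows exact equality checks; with arbitrary values, a tolerance-based comparison would likely be safer because floating-point addition is not strictly associative.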
## Scope
- New spec: `DistributedAggregationSpec.scala` under
`common/workflow-operator/src/test/scala/org/apache/texera/amber/operator/aggregate/`.
- The spec supplies its own representative aggregation functions — the goal
is to pin the case class's contract/wiring, not any specific operator.
- No production-code changes.
### Task Type
- [ ] Refactor / Cleanup
- [ ] DevOps / Deployment / CI
- [x] Testing / QA
- [ ] Documentation
- [ ] Performance
- [ ] Other