aglinxinyuan opened a new issue, #5734:
URL: https://github.com/apache/texera/issues/5734
### Task Summary
Add dedicated unit-specs for three `LogicalOp` descriptors in the set
operator family (`union`, `distinct`, `difference`). Pin the descriptor →
`PhysicalOp` translation (operator class name, input/output port shape,
partitioning requirements) so that a refactor which changes any of these is
caught immediately.
## Background
Three concrete `LogicalOp` descriptors in
`common/workflow-operator/operator/` currently lack a dedicated unit-spec. Each
describes a set-style operator (union / distinct / set-difference) and wires its
physical-op class name + port shape + partition requirements through
`getPhysicalOp`:
| Source class | Package | What's wired |
| --- | --- | --- |
| `UnionOpDesc` | `operator.union` | `OpExecWithClassName("…operator.union.UnionOpExec")`; one input port, one output port; no partition requirement |
| `DistinctOpDesc` | `operator.distinct` | `OpExecWithClassName("…operator.distinct.DistinctOpExec")`; `HashPartition` input requirement and derived partition; blocking output |
| `DifferenceOpDesc` | `operator.difference` | `OpExecWithClassName("…operator.difference.DifferenceOpExec")`; two input ports (`left`, `right`) with `HashPartition`; blocking output; schema propagation requires both inputs to share one schema |
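As a rough illustration of what the table pins, the wiring can be modelled with stand-in types. Everything in this sketch (`DescriptorSketch`, `PhysicalOpSketch`, `Requirement`, `matches`) is hypothetical and is not Texera's API. The executor class names keep the `…` prefix exactly as elided above, so only the suffix is compared. Port counts and blocking flags that the table does not state are assumptions and are marked as such in the comments.

```scala
object DescriptorSketch {
  // Stand-in types for illustration only; NOT Texera's real classes.
  sealed trait Requirement
  case object NoRequirement extends Requirement
  case object HashRequirement extends Requirement

  final case class PhysicalOpSketch(
      execClassName: String,    // package prefix is elided ("…") in the issue text
      inputPorts: List[String], // display names; "" where the issue gives none
      outputPortCount: Int,
      requirement: Requirement,
      blockingOutput: Boolean
  )

  // Union is not listed as blocking in the table, so `false` is an assumption.
  val union: PhysicalOpSketch =
    PhysicalOpSketch("…operator.union.UnionOpExec", List(""), 1, NoRequirement, blockingOutput = false)

  // Port counts for Distinct are assumed; the table does not state them.
  val distinct: PhysicalOpSketch =
    PhysicalOpSketch("…operator.distinct.DistinctOpExec", List(""), 1, HashRequirement, blockingOutput = true)

  // A single output port for Difference is assumed.
  val difference: PhysicalOpSketch =
    PhysicalOpSketch("…operator.difference.DifferenceOpExec", List("left", "right"), 1, HashRequirement, blockingOutput = true)

  // True when the sketch matches the pinned class-name suffix, port shape, and requirement.
  def matches(
      op: PhysicalOpSketch,
      classSuffix: String,
      inputs: List[String],
      outputs: Int,
      req: Requirement
  ): Boolean =
    op.execClassName.endsWith(classSuffix) &&
      op.inputPorts == inputs &&
      op.outputPortCount == outputs &&
      op.requirement == req
}
```

A real spec would call `getPhysicalOp` on the actual descriptor and compare against the production types; the point here is only that each row reduces to a handful of equality checks.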
## Behavior to pin
For each descriptor:
| Surface | Contract |
| --- | --- |
| `getPhysicalOp(workflowId, executionId)` | constructs a `PhysicalOp` referencing the correct executor class name |
| Input ports / output ports | counts and (for `Difference`) display names match `operatorInfo` |
| `operatorInfo` | name, description, group constant |
| Partition requirement (for `Distinct` / `Difference`) | `HashPartition` |
| `derivePartition` (for `Distinct` / `Difference`) | returns `HashPartition` regardless of input |
| `Difference` schema propagation | accepts a single shared input schema; throws `IllegalArgumentException` when input schemas diverge |
| `OperatorGroupConstants.SET_GROUP` / `OperatorGroupConstants.CLEANING_GROUP` | match the production constants |
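The two behavioral rows (`derivePartition` and `Difference` schema propagation) can be sketched with stand-in functions to show what the specs would assert. The names below (`ContractSketch`, the local `Partition` hierarchy, `Schema`, `differenceOutputSchema`) are hypothetical and do not reflect the real signatures in the codebase. The one library fact relied on is that Scala's `require` throws `IllegalArgumentException` when its condition is false.

```scala
object ContractSketch {
  // Stand-in types; hypothetical, not Texera's real API.
  sealed trait Partition
  case object HashPartition extends Partition
  case object RangePartition extends Partition
  case object SinglePartition extends Partition

  // A schema reduced to (attribute name, type name) pairs for this sketch.
  type Schema = List[(String, String)]

  // Distinct / Difference: the derived partition ignores whatever the inputs carry.
  def derivePartition(inputs: List[Partition]): Partition = HashPartition

  // Difference: propagate the single shared input schema; reject divergent schemas.
  def differenceOutputSchema(inputSchemas: List[Schema]): Schema = {
    require(inputSchemas.nonEmpty, "at least one input schema is required")
    require(inputSchemas.distinct.size == 1, "input schemas must be identical")
    inputSchemas.head
  }
}
```

A spec for the real descriptors would then check three things: the derived partition is `HashPartition` for every input combination, identical schemas pass through unchanged, and diverging schemas raise `IllegalArgumentException`.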
## Scope
- New spec files (one per source class per the spec-filename convention):
  - `UnionOpDescSpec.scala`
  - `DistinctOpDescSpec.scala`
  - `DifferenceOpDescSpec.scala`
- No production-code changes.
### Task Type
- [ ] Refactor / Cleanup
- [ ] DevOps / Deployment / CI
- [x] Testing / QA
- [ ] Documentation
- [ ] Performance
- [ ] Other