aglinxinyuan opened a new pull request, #5798: URL: https://github.com/apache/texera/pull/5798
### What changes were proposed in this PR?

Pin behavior of three small utility classes/objects in `common/workflow-operator/`. Each one is too thin to justify its own PR but cohesive as a bundle (utility surface). No production-code changes.

| Spec | Source class | Tests |
| --- | --- | --- |
| `OperatorDescriptorUtilsSpec` | `OperatorDescriptorUtils` (object) | 8 |
| `DistributedAggregationSpec` | `DistributedAggregation` (case class) | 9 |
| `URLFetchUtilSpec` | `URLFetchUtil` (object) | 6 |

All three spec files follow the `<srcClassName>Spec.scala` one-to-one convention.

**Behavior pinned: `OperatorDescriptorUtils`**

| Surface | Contract |
| --- | --- |
| `equallyPartitionGoal` size | result has exactly `totalNumWorkers` slots |
| Sum invariant | slots sum back to `goal` across `goal ∈ [0..20]` × `workers ∈ [1..5]` |
| Even partition | when `goal % workers == 0`, every slot is `goal / workers` |
| Remainder placement | the first `goal % workers` slots get `+1` (in order) |
| `goal < workers` edge case | first `goal` slots get `1`, the rest get `0` |
| `toImmutableMap` empty | empty `java.util.Map` → empty Scala `Map` |
| `toImmutableMap` preserves entries | round-trip preserves every key/value pair |
| Return type | static type is `scala.collection.immutable.Map` (compile-time enforced) |

**Behavior pinned: `DistributedAggregation`**

| Surface | Contract |
| --- | --- |
| Case-class shape | all four function members reachable; equality on identical function refs |
| `init` | produces the zero partial `(0L, 0L)` |
| `iterate` | folds one tuple in: `sum += value`, `count += 1` |
| `merge` | adds two partials componentwise |
| `finalAgg` divides | `(15L, 5L) → 3.0d` |
| `finalAgg` zero guard | `(0L, 0L) → 0.0d` (no divide-by-zero) |
| End-to-end single-node | average of `1..5` via fold-left == `3.0` |
| End-to-end with merge | same answer via two partial nodes + `merge` |

**Behavior pinned: `URLFetchUtil`**

| Surface | Contract |
| --- | --- |
| Success path | `getInputStreamFromURL(file:tempFile)` returns `Some(stream)` carrying the file's exact bytes |
| Success with explicit retries | same, with `retries = 3` |
| Failure path (default retries) | non-existent `file:` URL returns `None` |
| Failure path (`retries = 0`) | loop iterates zero times → `None` immediately |
| Failure path (`retries = 2`) | persistent failure exhausts retries → `None` |
| Default arg value | `getInputStreamFromURL$default$2 == 5`, verified via Scala's synthetic default-accessor |

The URLFetchUtil specs use the JVM's built-in `file:` URL handler against temporary files (success path) and non-existent paths (failure path), so they make no external network calls and exec no processes.

### Any related issues, documentation, discussions?

Closes #5795.

### How was this PR tested?

Pure unit-test additions; verified locally with:

- `sbt "WorkflowOperator/testOnly org.apache.texera.amber.operator.util.OperatorDescriptorUtilsSpec org.apache.texera.amber.operator.aggregate.DistributedAggregationSpec org.apache.texera.amber.operator.source.fetcher.URLFetchUtilSpec"`: 23 tests, all green
- `sbt "WorkflowOperator/Test/scalafmtCheck"`: clean
- CI to confirm

### Was this PR authored or co-authored using generative AI tooling?

Generated-by: Claude Code (Opus 4.7 [1M context])
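As a reading aid for the `equallyPartitionGoal` contract pinned above, here is a minimal sketch of a function that satisfies the listed rows (size, sum invariant, even partition, remainder placement, and the `goal < workers` edge case). It is a hypothetical stand-in written for this note, not the Texera source: the object name `PartitionSketch`, the `List[Int]` return type, and the `require` guards are all assumptions.

```scala
// Hypothetical stand-in for illustration only; not the Texera implementation.
object PartitionSketch {

  // Splits `goal` into `totalNumWorkers` slots. The first `goal % totalNumWorkers`
  // slots receive one extra unit, so the slots always sum back to `goal`.
  def equallyPartitionGoal(goal: Int, totalNumWorkers: Int): List[Int] = {
    require(goal >= 0, "goal must be non-negative")               // assumed guard
    require(totalNumWorkers > 0, "totalNumWorkers must be positive") // assumed guard
    val base = goal / totalNumWorkers
    val remainder = goal % totalNumWorkers
    List.tabulate(totalNumWorkers)(i => if (i < remainder) base + 1 else base)
  }

  def main(args: Array[String]): Unit = {
    println(equallyPartitionGoal(7, 3)) // remainder placement: List(3, 2, 2)
    println(equallyPartitionGoal(2, 5)) // goal < workers:      List(1, 1, 0, 0, 0)

    // Sum invariant over the same grid the spec describes.
    for (goal <- 0 to 20; workers <- 1 to 5)
      assert(equallyPartitionGoal(goal, workers).sum == goal)
  }
}
```

Front-loading the remainder keeps the result deterministic, which is what lets a spec assert on exact slot values rather than only on the sum.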

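The `init` / `iterate` / `merge` / `finalAgg` contract listed for `DistributedAggregation` can be illustrated with a simplified average aggregation. The case class below is a hypothetical, generically typed stand-in named `Aggregation`; the real `DistributedAggregation` may use different type parameters and operate on Texera tuples rather than plain `Long` values, so treat this as a sketch of the pattern only.

```scala
// Hypothetical, simplified stand-in; not the Texera DistributedAggregation class.
object AverageSketch {

  final case class Aggregation[T, P, R](
      init: () => P,        // zero partial
      iterate: (P, T) => P, // fold one value into a partial
      merge: (P, P) => P,   // combine partials from two nodes
      finalAgg: P => R      // turn the final partial into a result
  )

  // Average as a (sum, count) partial, mirroring the contract rows above.
  val average: Aggregation[Long, (Long, Long), Double] =
    Aggregation[Long, (Long, Long), Double](
      init = () => (0L, 0L),
      iterate = (partial, value) => (partial._1 + value, partial._2 + 1L),
      merge = (a, b) => (a._1 + b._1, a._2 + b._2),
      finalAgg = partial =>
        if (partial._2 == 0L) 0.0 else partial._1.toDouble / partial._2
    )

  // Runs init + iterate over one node's values and returns its partial.
  def partialOf(values: Seq[Long]): (Long, Long) =
    values.foldLeft(average.init())(average.iterate)

  def main(args: Array[String]): Unit = {
    // Single node: fold all values, then finalize.
    val single = average.finalAgg(partialOf(Seq(1L, 2L, 3L, 4L, 5L)))

    // Two nodes: fold each half separately, merge the partials, then finalize.
    val merged = average.finalAgg(
      average.merge(partialOf(Seq(1L, 2L)), partialOf(Seq(3L, 4L, 5L)))
    )

    println(single) // 3.0
    println(merged) // 3.0
  }
}
```

Carrying `(sum, count)` instead of a running mean is what makes `merge` a plain componentwise addition, and the zero guard in `finalAgg` covers the case where no tuples were seen.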