aglinxinyuan opened a new pull request, #5798:
URL: https://github.com/apache/texera/pull/5798

   ### What changes were proposed in this PR?
   
   Pin the behavior of three small utility classes/objects in `common/workflow-operator/` with new unit specs. Each one is too thin to justify its own PR, but together they form a cohesive bundle (the utility surface). No production-code changes.
   
   | Spec | Source class | Tests |
   | --- | --- | --- |
   | `OperatorDescriptorUtilsSpec` | `OperatorDescriptorUtils` (object) | 8 |
   | `DistributedAggregationSpec` | `DistributedAggregation` (case class) | 9 |
   | `URLFetchUtilSpec` | `URLFetchUtil` (object) | 6 |
   
   All three spec files follow the `<srcClassName>Spec.scala` one-to-one 
convention.
   
   **Behavior pinned — `OperatorDescriptorUtils`**
   
   | Surface | Contract |
   | --- | --- |
   | `equallyPartitionGoal` size | result has exactly `totalNumWorkers` slots |
   | Sum invariant | slots sum back to `goal` across `goal ∈ [0..20]` × `workers ∈ [1..5]` |
   | Even partition | when `goal % workers == 0`, every slot is `goal / workers` |
   | Remainder placement | the first `goal % workers` slots get `+1` (in order) |
   | `goal < workers` edge case | first `goal` slots get `1`, the rest get `0` |
   | `toImmutableMap` empty | empty `java.util.Map` → empty Scala `Map` |
   | `toImmutableMap` preserves entries | round-trip preserves every key/value pair |
   | Return type | static type is `scala.collection.immutable.Map` (compile-time enforced) |
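
The partitioning contract in the table above can be illustrated with a minimal sketch. `PartitionSketch` and `partitionGoal` are hypothetical names introduced here for illustration; this is not the `OperatorDescriptorUtils` source, and the real signature and return type may differ.

```scala
// Hypothetical stand-in for the partitioning contract; not the Texera source.
object PartitionSketch {
  // Splits `goal` units across `workers` slots as evenly as possible.
  def partitionGoal(goal: Int, workers: Int): Seq[Int] = {
    require(goal >= 0 && workers > 0, "goal must be non-negative and workers positive")
    val base = goal / workers      // every slot gets at least this much
    val remainder = goal % workers // leftover units to hand out
    // The first `remainder` slots receive one extra unit, in order.
    Seq.tabulate(workers)(i => if (i < remainder) base + 1 else base)
  }
}
```

Under these assumptions, `partitionGoal(7, 3)` yields `Seq(3, 2, 2)` and `partitionGoal(2, 5)` yields `Seq(1, 1, 0, 0, 0)`, which matches the remainder-placement and `goal < workers` rows.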
   
   **Behavior pinned — `DistributedAggregation`**
   
   | Surface | Contract |
   | --- | --- |
   | Case-class shape | all four function members reachable; equality on identical function refs |
   | `init` | produces the zero partial `(0L, 0L)` |
   | `iterate` | folds one tuple in: `sum += value`, `count += 1` |
   | `merge` | adds two partials componentwise |
   | `finalAgg` divides | `(15L, 5L) → 3.0d` |
   | `finalAgg` zero guard | `(0L, 0L) → 0.0d` (no divide-by-zero) |
   | End-to-end single-node | average of `1..5` via fold-left == `3.0` |
   | End-to-end with merge | same answer via two partial nodes + `merge` |
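
As a rough illustration of the four-function shape pinned above, the sketch below builds an average aggregation from `init`, `iterate`, `merge`, and `finalAgg`. `AggSketch` and `AverageSketch` are hypothetical names, and the type parameters are an assumption; the real `DistributedAggregation` case class operates on Texera tuples and may be typed differently.

```scala
// Hypothetical four-function aggregation shape; not the Texera case class.
final case class AggSketch[T, P, R](
    init: () => P,        // produces the zero partial
    iterate: (P, T) => P, // folds one input value into a partial
    merge: (P, P) => P,   // combines partials from two nodes
    finalAgg: P => R      // turns the final partial into the result
)

object AverageSketch {
  // The partial is a (sum, count) pair, mirroring the (0L, 0L) zero in the table.
  val average: AggSketch[Long, (Long, Long), Double] =
    AggSketch[Long, (Long, Long), Double](
      init = () => (0L, 0L),
      iterate = (p, v) => (p._1 + v, p._2 + 1L),
      merge = (a, b) => (a._1 + b._1, a._2 + b._2),
      // Guard against an empty partial to avoid dividing by zero.
      finalAgg = p => if (p._2 == 0L) 0.0 else p._1.toDouble / p._2
    )

  // Single-node path: fold every value into one partial, then finalize.
  def runSingle(values: Seq[Long]): Double =
    average.finalAgg(values.foldLeft(average.init())(average.iterate))

  // Two-node path: fold each half separately, merge, then finalize.
  def runSplit(left: Seq[Long], right: Seq[Long]): Double = {
    val a = left.foldLeft(average.init())(average.iterate)
    val b = right.foldLeft(average.init())(average.iterate)
    average.finalAgg(average.merge(a, b))
  }
}
```

Both paths should agree on the same input, which is the property the two end-to-end rows check.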
   
   **Behavior pinned — `URLFetchUtil`**
   
   | Surface | Contract |
   | --- | --- |
   | Success path | `getInputStreamFromURL(file:tempFile)` returns `Some(stream)` carrying the file's exact bytes |
   | Success with explicit retries | same, with `retries = 3` |
   | Failure path (default retries) | non-existent `file:` URL returns `None` |
   | Failure path (`retries = 0`) | loop iterates zero times → `None` immediately |
   | Failure path (`retries = 2`) | persistent failure exhausts retries → `None` |
   | Default arg value | `getInputStreamFromURL$default$2 == 5`, verified via Scala's synthetic default-accessor |
   
   The URLFetchUtil specs use the JVM's built-in `file:` URL handler against 
temporary files (success path) and non-existent paths (failure path) — no 
external network calls or process exec.
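
The retry contract and the default-accessor check can be sketched as follows. `FetchSketch`, `openWithRetries`, and `defaultRetries` are hypothetical names; the retry loop is an assumption about the general shape, not the `URLFetchUtil` implementation, and the default of `5` is taken from the table above.

```scala
import java.io.InputStream
import java.net.URL
import scala.util.Try

// Hypothetical retry-based fetch; not the Texera URLFetchUtil source.
object FetchSketch {
  // Tries to open the URL up to `retries` times and returns the first stream
  // that opens. With retries <= 0 the iterator is empty, so the result is None.
  def openWithRetries(url: URL, retries: Int = 5): Option[InputStream] =
    Iterator
      .fill(retries)(Try(url.openStream()).toOption) // each attempt is evaluated lazily
      .collectFirst { case Some(stream) => stream }  // stop at the first success

  // Scala compiles a default argument into a synthetic method named
  // <method>$default$<position>; reflection can read its value.
  def defaultRetries: Int =
    FetchSketch.getClass
      .getMethod("openWithRetries$default$2")
      .invoke(FetchSketch)
      .asInstanceOf[Int]
}
```

Calling `openWithRetries` with a `file:` URL for a temporary file exercises the success path without touching the network, and a `file:` URL for a path that does not exist exercises the failure path.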
   
   ### Any related issues, documentation, discussions?
   
   Closes #5795.
   
   ### How was this PR tested?
   
   Pure unit-test additions; verified locally with:
   
   - `sbt "WorkflowOperator/testOnly org.apache.texera.amber.operator.util.OperatorDescriptorUtilsSpec org.apache.texera.amber.operator.aggregate.DistributedAggregationSpec org.apache.texera.amber.operator.source.fetcher.URLFetchUtilSpec"`: 23 tests, all green
   - `sbt "WorkflowOperator/Test/scalafmtCheck"`: clean
   - CI to confirm
   
   ### Was this PR authored or co-authored using generative AI tooling?
   
   Generated-by: Claude Code (Opus 4.7 [1M context])


