aruraghuwanshi opened a new pull request, #15: URL: https://github.com/apache/druid-operator/pull/15
### Summary `getNodeSpecsByOrder` groups node specs by `NodeType` but previously appended them from `for key, nodeSpec := range m.Spec.Nodes`. In Go, map iteration order is not stable, so the **relative order of multiple specs with the same `NodeType` could change between reconciles**. With **`rollingDeploy: true`**, the handler walks that ordered list and may return early while a workload is still rolling. If the intra–`NodeType` order flips between calls, the operator can effectively **start or advance rollouts for more than one StatefulSet/Deployment of the same `NodeType` at a time**, instead of finishing one before the next. This PR **sorts specs by their map key (ascending) within each `NodeType`** before building the final list, while keeping the existing cross–`NodeType` order from `druidServicesOrder` unchanged. --- ### What changed - **`getNodeSpecsByOrder`** (`controllers/druid/ordering.go`): `sort.Slice` on each per–`NodeType` slice by `ServiceGroup.key`. - **Unit tests** (`controllers/druid/ordering_test.go`): Ginkgo test data uses multiple historical tiers (`historicalstier1`–`3`); added `testing.T` tests that would **fail on the pre-fix map-only ordering** and pass with stable sorting, plus a guard for cross–`NodeType` order. - **E2E** (`e2e/configs/druid-rolling-deploy-cr.yaml`, `e2e/test-rolling-deploy-ordering.sh`, wired from `e2e/e2e.sh`): two historical tiers (`historicalstier1` / `historicalstier2`) with `rollingDeploy: true`, patch to trigger a rollout, and checks that **only one** of the two historical StatefulSets is mid-update at a time, with **lexicographically first tier finishing before the second starts** (when transitions are observable at the poll interval). --- ### Testing - [ ] Tested on a real Kubernetes cluster: **create** a new Druid cluster (as required by the checklist — run full `e2e/e2e.sh` or your environment’s equivalent). - [ ] Tested for **backward compatibility** on an existing cluster (no API/schema change; behavior change is **ordering only**, which is the intended fix for `rollingDeploy` with multiple nodes per `NodeType`). - [x] **Comments** in `getNodeSpecsByOrder` document why sorting is required (kept short). - [ ] **User-facing docs** (e.g. operator docs) — not added; **release note** text below is suitable for release notes if the project uses them. --- ### Release note (suggested) **Druid Operator:** When `rollingDeploy` is enabled, rollout order for multiple StatefulSets/Deployments that share the same `NodeType` (e.g. `historicalstier1` and `historicalstier2`) is now **stable** (sorted by node spec key). That avoids **concurrent rollouts within the same `NodeType`** caused by non-deterministic map iteration. --- ### Key changed/added files - `controllers/druid/ordering.go` — sort within each `NodeType` by spec key - `controllers/druid/ordering_test.go` — Ginkgo + `testing.T` coverage - `controllers/druid/testdata/ordering.yaml` — fixture with multiple historical tiers - `e2e/configs/druid-rolling-deploy-cr.yaml` — rolling-deploy E2E CR - `e2e/test-rolling-deploy-ordering.sh` — E2E script - `e2e/e2e.sh` — invoke the new E2E test --- <!-- If this fixes a tracked issue, use: Fixes #1234. Otherwise remove the next line. --> Fixes #XXXX. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
