aruraghuwanshi opened a new pull request, #15:
URL: https://github.com/apache/druid-operator/pull/15

   ### Summary
   
   `getNodeSpecsByOrder` groups node specs by `NodeType` but previously 
appended them from `for key, nodeSpec := range m.Spec.Nodes`. In Go, map 
iteration order is not stable, so the **relative order of multiple specs with 
the same `NodeType` could change between reconciles**.
   
   With **`rollingDeploy: true`**, the handler walks that ordered list and may 
return early while a workload is still rolling. If the intra–`NodeType` order 
flips between calls, the operator can effectively **start or advance rollouts 
for more than one StatefulSet/Deployment of the same `NodeType` at a time**, 
instead of finishing one before the next.
   
   This PR **sorts specs by their map key (ascending) within each `NodeType`** 
before building the final list, while keeping the existing cross–`NodeType` 
order from `druidServicesOrder` unchanged.
   
   ---
   
   ### What changed
   
   - **`getNodeSpecsByOrder`** (`controllers/druid/ordering.go`): `sort.Slice` 
on each per–`NodeType` slice by `ServiceGroup.key`.
   - **Unit tests** (`controllers/druid/ordering_test.go`): Ginkgo test data 
uses multiple historical tiers (`historicalstier1`–`3`); added `testing.T` 
tests that would **fail on the pre-fix map-only ordering** and pass with stable 
sorting, plus a guard for cross–`NodeType` order.
   - **E2E** (`e2e/configs/druid-rolling-deploy-cr.yaml`, 
`e2e/test-rolling-deploy-ordering.sh`, wired from `e2e/e2e.sh`): two historical 
tiers (`historicalstier1` / `historicalstier2`) with `rollingDeploy: true`, 
patch to trigger a rollout, and checks that **only one** of the two historical 
StatefulSets is mid-update at a time, with **lexicographically first tier 
finishing before the second starts** (when transitions are observable at the 
poll interval).
   
   ---
   
   ### Testing
   
   - [ ] Tested on a real Kubernetes cluster: **create** a new Druid cluster 
(as required by the checklist — run full `e2e/e2e.sh` or your environment’s 
equivalent).
   - [ ] Tested for **backward compatibility** on an existing cluster (no 
API/schema change; behavior change is **ordering only**, which is the intended 
fix for `rollingDeploy` with multiple nodes per `NodeType`).
   - [x] **Comments** in `getNodeSpecsByOrder` document why sorting is required 
(kept short).
   - [ ] **User-facing docs** (e.g. operator docs) — not added; **release 
note** text below is suitable for release notes if the project uses them.
   
   ---
   
   ### Release note (suggested)
   
   **Druid Operator:** When `rollingDeploy` is enabled, rollout order for 
multiple StatefulSets/Deployments that share the same `NodeType` (e.g. 
`historicalstier1` and `historicalstier2`) is now **stable** (sorted by node 
spec key). That avoids **concurrent rollouts within the same `NodeType`** 
caused by non-deterministic map iteration.
   
   ---
   
   ### Key changed/added files
   
   - `controllers/druid/ordering.go` — sort within each `NodeType` by spec key
   - `controllers/druid/ordering_test.go` — Ginkgo + `testing.T` coverage
   - `controllers/druid/testdata/ordering.yaml` — fixture with multiple 
historical tiers
   - `e2e/configs/druid-rolling-deploy-cr.yaml` — rolling-deploy E2E CR
   - `e2e/test-rolling-deploy-ordering.sh` — E2E script
   - `e2e/e2e.sh` — invoke the new E2E test
   
   ---
   
   <!-- If this fixes a tracked issue, use: Fixes #1234. Otherwise remove the 
next line. -->
   Fixes #XXXX.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to