razinbouzar commented on PR #15:
URL: https://github.com/apache/druid-operator/pull/15#issuecomment-4306997113

   @aruraghuwanshi I believe the failing `900s` timeout is caused by the test’s 
rollout trigger, not by the deterministic ordering change itself.
   
   `e2e/test-rolling-deploy-ordering.sh` currently patches 
`spec.nodes.historicalstier{1,2}.workloadAnnotations` and assumes that this 
forces both StatefulSets to roll. In practice, that only changes the metadata of 
the StatefulSet object itself, not its pod template (`spec.template`), so the 
StatefulSet `updateRevision` never changes and the script polls until it times out.
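
   To make the difference easy to check, here is a minimal sketch of a 
trigger probe. It assumes the CR can be patched as `kubectl patch druid`; 
`NAMESPACE`, `CR_NAME`, the StatefulSet name, and the `e2e/rollout-trigger` 
annotation key are placeholders of mine, not values from the PR:

```shell
#!/usr/bin/env bash
# Sketch only: NAMESPACE, CR_NAME and the annotation key are hypothetical.
NAMESPACE="${NAMESPACE:-druid-e2e}"
CR_NAME="${CR_NAME:-tiny-cluster}"

# Print the revision the StatefulSet controller is rolling towards.
update_revision() {
  kubectl get statefulset "$1" -n "$NAMESPACE" \
    -o jsonpath='{.status.updateRevision}'
}

# Merge-patch one annotation map on a node spec of the Druid CR.
# $1 = node key, $2 = annotation field, $3 = annotation value.
patch_annotation() {
  kubectl patch druid "$CR_NAME" -n "$NAMESPACE" --type merge -p \
    "{\"spec\":{\"nodes\":{\"$1\":{\"$2\":{\"e2e/rollout-trigger\":\"$3\"}}}}}"
}

# Return 0 if the patch yields a new updateRevision within the timeout,
# 1 otherwise, so a no-op trigger fails fast instead of after 900s.
# $1 = node key, $2 = annotation field, $3 = StatefulSet, $4 = timeout (s).
trigger_changes_revision() {
  local node="$1" field="$2" sts="$3" timeout="${4:-60}" before after deadline
  before="$(update_revision "$sts")"
  patch_annotation "$node" "$field" "$(date +%s)"
  deadline=$(( $(date +%s) + timeout ))
  while [ "$(date +%s)" -lt "$deadline" ]; do
    after="$(update_revision "$sts")"
    if [ -n "$after" ] && [ "$after" != "$before" ]; then
      return 0
    fi
    sleep "${POLL_INTERVAL:-2}"
  done
  return 1
}
```

   Calling `trigger_changes_revision historicalstier1 workloadAnnotations 
<statefulset-name>` should return non-zero if the analysis above is right, 
which is a quick way to confirm the diagnosis without waiting for the full 
timeout.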
   
   I’d recommend:
     - patch `podAnnotations` instead of `workloadAnnotations` to force a real 
StatefulSet revision change
     - fail early if tier1 never picks up a new revision
     - assert the ordering directly via revision progression:
       - tier1 revision changes first
       - tier2 revision is still unchanged at that moment
       - tier1 rollout completes
       - tier2 revision changes only afterwards
     - use `trap`-based cleanup so failed runs do not leak test resources
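
   The steps above can be sketched roughly as follows. This is not the 
script from the PR: `NAMESPACE`, the timeout variables, and the choice to 
clean up by deleting the namespace are my assumptions, and treating 
`currentRevision == updateRevision` as "rollout complete" is a 
simplification:

```shell
#!/usr/bin/env bash
# Sketch only: NAMESPACE and the StatefulSet names passed in are hypothetical.
NAMESPACE="${NAMESPACE:-druid-e2e}"
POLL_INTERVAL="${POLL_INTERVAL:-2}"

# Print one field of a StatefulSet's status (updateRevision, currentRevision).
sts_status() {
  kubectl get statefulset "$1" -n "$NAMESPACE" -o jsonpath="{.status.$2}"
}

# True once the StatefulSet reports a non-empty updateRevision other than $2.
revision_changed() {
  local now
  now="$(sts_status "$1" updateRevision)"
  [ -n "$now" ] && [ "$now" != "$2" ]
}

# Simplified completion check: current and target revision are the same.
rollout_done() {
  [ "$(sts_status "$1" currentRevision)" = "$(sts_status "$1" updateRevision)" ]
}

# $1/$2 = tier1/tier2 StatefulSet names, $3/$4 = their pre-patch revisions.
assert_ordering() {
  local t1="$1" t2="$2" t1_old="$3" t2_old="$4" t2_now deadline

  # 1. Fail early if tier1 never picks up a new revision.
  deadline=$(( $(date +%s) + ${TRIGGER_TIMEOUT:-120} ))
  until revision_changed "$t1" "$t1_old"; do
    [ "$(date +%s)" -lt "$deadline" ] || { echo "tier1 revision never changed" >&2; return 1; }
    sleep "$POLL_INTERVAL"
  done

  # 2. tier2 must stay on its old revision until tier1 has finished.
  #    tier2 is read before tier1 so a later tier2 change is not misreported.
  deadline=$(( $(date +%s) + ${ROLLOUT_TIMEOUT:-900} ))
  while :; do
    t2_now="$(sts_status "$t2" updateRevision)"
    if rollout_done "$t1"; then break; fi
    [ "$t2_now" = "$t2_old" ] || { echo "tier2 changed before tier1 finished" >&2; return 1; }
    [ "$(date +%s)" -lt "$deadline" ] || { echo "tier1 rollout timed out" >&2; return 1; }
    sleep "$POLL_INTERVAL"
  done

  # 3. Only now should tier2 pick up its new revision.
  until revision_changed "$t2" "$t2_old"; do
    [ "$(date +%s)" -lt "$deadline" ] || { echo "tier2 revision never changed" >&2; return 1; }
    sleep "$POLL_INTERVAL"
  done
}

# Remove test resources; meant to be registered as an EXIT trap.
cleanup() {
  kubectl delete namespace "$NAMESPACE" --ignore-not-found --wait=false
}
```

   In the test itself I would register `trap cleanup EXIT` before creating 
any resources, record both `updateRevision` values, apply the 
`podAnnotations` patch, and then call `assert_ordering` with the two 
StatefulSet names and the recorded revisions.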
   
   I tested this locally and the revised flow behaves as expected: the rollout 
is triggered, tier1 updates first, tier2 waits, and the test completes 
successfully without hitting the `900s` timeout.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


