fapifta commented on PR #4681: URL: https://github.com/apache/ozone/pull/4681#issuecomment-1558900443
I am not sure how to approach the following tradeoff here:

- On one hand, if we have a secure test environment where certificates are rotated often, then even if the test becomes flaky, that is a result for us, and we possibly find bugs in the rotation mechanism. However, it is hard to debug, and most likely it won't happen all the time (knowing and extrapolating from my attitude towards flaky tests). There is one interesting cause of flaky behaviour here: when the box that runs the test runs out of entropy, that introduces a higher delay, which might lead to timeouts. (I am not sure whether that was the case with the first attempt on the fork; I did not find logs to confirm.)
- On the other hand, if we separate the rotation-related testing, we do not need to do crazy things like that `-c gazillion` in the current version of the patch, but we lose a broader correctness check of the rotation itself.

As the test runs for roughly an hour or a bit more (1 hour in the successful case, over 1.5 hours in the failed case), I think we should temper the situation by allowing a 10 minute timeout for the newly added rotation test and working with a 5 minute certificate duration. As I understand it (and @Galsza, correct me if I am wrong), the newly added test checks every 5 seconds whether the DN has a new certificate, and with a 5 minute duration it would wait 10 minutes at most for a rotation to happen. If everything goes well, the rotation should happen in about 4.5 minutes in the worst case, with a grace duration of 30 seconds. With these settings, counting 9 services running for an hour and some, we are at about 120-140 certificates once the test is finished, so we can set a reasonable count of around 300 as a large safety net, and perhaps we end up somewhere in the middle of the two extremes.
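To make the 120-140 estimate above easier to check, here is a rough back-of-the-envelope sketch. It assumes each service rotates once every (certificate duration minus grace duration) minutes and ignores the initial certificates issued at startup. The function name and the 70 minute figure for "an hour and some" are my own illustrative assumptions, not something taken from the patch.

```python
def estimate_certificates(services, test_minutes, cert_minutes, grace_minutes):
    """Rough number of certificates issued by rotation during one test run.

    Assumption: every service renews once per (cert_minutes - grace_minutes),
    and certificates issued at startup are not counted.
    """
    rotation_interval = cert_minutes - grace_minutes  # 5 - 0.5 = 4.5 minutes
    return round(services * test_minutes / rotation_interval)


# 9 services, 5 minute certificate duration, 30 second grace duration
one_hour = estimate_certificates(9, 60, 5, 0.5)      # successful run, about 1 hour
hour_and_some = estimate_certificates(9, 70, 5, 0.5)  # assumed 70 minute run
failed_run = estimate_certificates(9, 90, 5, 0.5)     # failed run, about 1.5 hours

print(one_hour, hour_and_some, failed_run)  # prints: 120 140 180
```

Under these assumptions even the 1.5 hour run stays well below the proposed limit of around 300.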
Later on, as part of HDDS-7401, we should revisit this and see whether we can define enough tests around the rotation (I believe we will) to move this change out of here into a separate test set concentrating on rotation and on functionality during and after rotation.

What do you think, @Galsza and @adoroszlai, about the proposed defaults, and about the general idea of tempering the number of certificates via their lifetime to avoid requesting potentially thousands of certificates in other tests?
