fapifta commented on PR #4681:
URL: https://github.com/apache/ozone/pull/4681#issuecomment-1558900443

   I am not sure how to approach the following tradeoff here:
   - on one hand, if we have a secure test environment where certificates are 
rotated often, then even a flaky test is a useful result for us, as it may 
reveal bugs in the rotation mechanism. However, such failures are hard to 
debug, and most likely they won't happen every time (knowing and extrapolating 
from my attitude towards flaky tests). One interesting cause of flaky 
behaviour here is when the box that runs the test runs out of entropy, which 
introduces a higher delay that might lead to timeouts. (I am not sure whether 
that was the case with the first attempt on the fork, I did not find logs to 
confirm it.)
   - on the other hand, if we separate the rotation related testing, we do not 
need to do crazy things like that -c gazillion in the current version of the 
patch, but we lose a broader correctness check of the rotation itself.
   
   As the test runs for roughly an hour or a bit more (in the success case it 
was 1 hour, in the fail case it was over 1.5 hours), I think we should temper 
the situation by allowing a 10 minute timeout for the newly added rotation 
test, and working with a 5 minute certificate duration. As I understand (and 
@Galsza correct me if I am wrong), the newly added test checks every 5 
seconds whether the DN has a new certificate, and with a 5 minute duration it 
would wait 10 minutes at most for a rotation to happen. If everything goes 
well, the rotation should happen in about 4.5 minutes in the worst case, given 
a grace duration of 30 seconds.
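
   The timing above can be checked with a small sketch. It assumes that 
rotation is triggered once the remaining lifetime drops to the grace duration; 
the function and parameter names are hypothetical and are not Ozone 
configuration keys.

```python
from datetime import timedelta


def rotation_window(cert_lifetime, grace, poll_interval, test_timeout):
    """Return (expected time until rotation, number of polls before timeout).

    Assumption: rotation starts when the remaining certificate lifetime
    reaches the grace duration, i.e. at lifetime - grace.
    """
    expected_rotation = cert_lifetime - grace
    # timedelta // timedelta yields an int: how many polls fit in the timeout.
    max_polls = test_timeout // poll_interval
    return expected_rotation, max_polls


expected, polls = rotation_window(
    cert_lifetime=timedelta(minutes=5),
    grace=timedelta(seconds=30),
    poll_interval=timedelta(seconds=5),
    test_timeout=timedelta(minutes=10),
)
print(expected.total_seconds() / 60)  # 4.5 (minutes until rotation)
print(polls)                          # 120 (polls before the test times out)
```

   Under these assumptions the 10 minute timeout leaves a bit more than twice 
the expected rotation time as headroom.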
   
   With these settings, counting 9 services running for an hour and some, we 
end up with about 120-140 certificates by the time the test finishes. For that 
we can set a count of around 300 as a large safety net, and perhaps that puts 
us somewhere in the middle of the two extremes.
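
   A back-of-the-envelope version of this estimate, as a sketch only: it 
assumes every service requests one initial certificate plus one more per 
completed rotation interval of 270 seconds (5 minute lifetime minus 30 second 
grace). The helper name is hypothetical.

```python
def estimate_certificates(services, run_seconds, rotation_seconds):
    """Rough count of certificates requested during one test run.

    Assumption: one initial certificate per service, plus one per
    completed rotation interval.
    """
    rotations = run_seconds // rotation_seconds
    return services * (1 + rotations)


ROTATION_SECONDS = 5 * 60 - 30  # 5 minute lifetime minus 30 second grace

print(estimate_certificates(9, 60 * 60, ROTATION_SECONDS))  # 126 (1 hour run)
print(estimate_certificates(9, 70 * 60, ROTATION_SECONDS))  # 144 (70 minute run)
print(estimate_certificates(9, 90 * 60, ROTATION_SECONDS))  # 189 (1.5 hour run)
```

   The first two results land in the same ballpark as the 120-140 figure, and 
even a 1.5 hour run stays below a count of 300 under these assumptions.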
   
   
   Later on, as part of HDDS-7401, we should revisit this and see if we can 
define enough tests around the rotation (I believe we will) to move this 
change out of here into a separate test set concentrating on rotation and 
on functionality during and after rotation.
   
   What do you think, @Galsza and @adoroszlai, about the proposed defaults, and 
about the general idea of limiting the number of certificates via their 
lifetime, to avoid requesting potentially thousands of certificates in other 
tests?
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

