Date: Sun, 20 Sep 2020 04:02:45 +0100
From: Roy Marples <r...@marples.name>
Message-ID: <51d2f8dc-d059-5eae-9899-5c91539d1...@marples.name>
| The test case just needed fixing.  That is not uncommon after changes
| elsewhere.  The ping to an invalid address caused the ARP entry to enter
| INCOMPLETE -> WAITDELETE state and this hung over into the next test,
| causing this entry to take too long to validly resolve.

Why?  If a failed ARP (or ND) causes problems for a later request
(including one for the same addr) which should work (that is, any
problems at all, including delays) then I'd consider the implementation
broken (not the test).

| The solution is after a deliberate fail

And if it wasn't a deliberate fail?  Perhaps being just a fraction of a
second too quick, and attempting a ping (or ssh, or something) just
before the destination becomes reachable (either because it was down,
unconfigured, or the net link between them wasn't functional), and

| to remove the ARP entry for the address

if the user doing this isn't root, and cannot just remove ARP entries?

Maybe I'm misunderstanding the actual scenario, but it seems to me that
things aren't working as well now as they were before (the timing in the
qemu tests hasn't changed recently - not since the nvmm version started
being used - but before the arp implementation change, it used to work
reliably).

| This fixes all the test case fallout from the ARP -> ND merge and has now
| survived several test runs.

Yes, I have been watching, and I saw that.

| The ND cache expiration test which intermittently fails is based on exact
| timings.  A future patch will add jitter to NS, which will cause this test
| to fail more.  Ideas on how to solve it welcome.

Some of the tests make unsupportable assumptions, which just happen to
work when initially created.
That one might be one of those, in which case we need to look and see
what assertions can be made about the state at various times, and make
sure that the test only attempts to verify things that ought be true.

Cache expiration is one of the harder ones to deal with, as generally
that just happens whenever the kernel (or whatever is holding the cache
- the kernel here) decides that now would be a useful time.  Sometimes
the right way is not to test whether the entry has gone from the cache,
but whether it is either gone, or in a state where it could vanish at
any time (eg: lifetime has decremented to 0 or whatever).

Not always possible - some things that should happen just aren't
possible to reliably test in an automated framework like this.  Some
cache entries just "go away eventually", but the test cannot just wait
for "eventually" to occur, especially since, as is often the case, how
long that takes depends naturally upon other activity, and in the test,
there is none.

kre
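ps: one way to express the "gone, or could vanish any time" check is to
poll with a bounded deadline rather than assert at one exact instant.  A
rough sketch (the wait_for helper, the one second poll interval, and the
ndp invocation in the comment are only illustrative, not from any
existing test):

```shell
# Poll a condition command until it succeeds or a deadline passes.
# Usage: wait_for <seconds> <command ...>
# Returns 0 as soon as the command succeeds, 1 if the deadline expires.
wait_for() {
    deadline=$(( $(date +%s) + $1 ))
    shift
    while ! "$@"; do
        if [ "$(date +%s)" -ge "$deadline" ]; then
            return 1
        fi
        sleep 1
    done
    return 0
}

# Example: pass once the ND entry for $ip has left the cache, instead
# of asserting its absence at one fixed moment:
#   wait_for 30 sh -c "! ndp -n $ip >/dev/null 2>&1"
```

That still needs a defensible upper bound on "eventually", which is the
hard part when the expiry time depends on other activity.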