On Thu, 27 Jul 2023 at 20:08, Cleber Rosa <cr...@redhat.com> wrote:
>
> On Thu, Jul 27, 2023 at 11:50 AM Peter Maydell <peter.mayd...@linaro.org>
> wrote:
> >
> > Ah, so the problem is that we are trying to download the asset
> > file, and the remote server is stalling so it doesn't actually
> > download the file in 90s, and Avocado doesn't distinguish
> > "hit the timeout while trying to download assets" from
> > "hit the timeout running the actual test" ?
> >
>
> Yes, exactly. Once the test starts, that's the only timeout being
> enforced. The fetch_asset() (and all the download code path) is
> simply part of the test and thus under the test timeout. Also, right
> now, avocado.Test.fetch_asset() doesn't provide a timeout parameter
> (but the underlying avocado.utils.asset.Asset.fetch() does).
>
> > This sounds to me like the ideal would be that there is a separate
> > timeout for file downloads (which could then be a lot shorter than
> > the overall test timeout), and "timeout during asset download"
> > would be detected separately from "timeout while actually running
> > test". But maybe the separation-of-phases in newer Avocado achieves
> > that already ?
> >
>
> The mechanism in newer Avocado will simply never attempt to run tests
> that don't have the stated requirements fulfilled. With regards to
> timeouts, each of the different kinds of requirement implementations
> (file downloads and cache, A.K.A. "assets", packages installation,
> ansible module execution, etc) are supposed to provide their own
> features, including timeouts.
>
> Anyways, I'll look into, and report back on:
>
> 1. expanding avocado.Test.fetch_asset() with a timeout parameter

If newer-avocado does all this stuff differently it might not be
worth the extra effort on something we're going to move away from.

> 2. making sure the newer implementation for the requirement types used
> by QEMU respect a timeout (they don't need to be smaller than the
> test, because they run completely outside of the test).

The main thing I think is that timeouts on asset fetch should
result in a SKIP or CANCEL status, not INTERRUPTED, because the
CI treats INTERRUPTED as a failure, whereas SKIP and CANCEL are OK.

> For now, are you OK with re-running those jobs if the servers stall
> the transfers? Or would you rather see a patch that changes the
> find_only parameter to True, so that if the pre-test attempt to
> download the asset fails, the transfer is never attempted during the
> test?

I think for the moment we're OK retrying (or more usually,
saying "this job is failing today, ignore it") -- usually this
kind of thing is "somebody's server is having troubles" and
it goes away after a day or so.

thanks
-- PMM