On Thu, 27 Jul 2023 at 20:08, Cleber Rosa <cr...@redhat.com> wrote:
>
> On Thu, Jul 27, 2023 at 11:50 AM Peter Maydell <peter.mayd...@linaro.org> 
> wrote:
> >
> > Ah, so the problem is that we are trying to download the asset
> > file, and the remote server is stalling so it doesn't actually
> > download the file in 90s, and Avocado doesn't distinguish
> > "hit the timeout while trying to download assets" from
> > "hit the timeout running the actual test" ?
> >
>
> Yes, exactly.  Once the test starts, that's the only timeout being
> enforced.  The fetch_asset() (and all the download code path) is
> simply part of the test and thus under the test timeout.  Also, right
> now, avocado.Test.fetch_asset() doesn't provide a timeout parameter
> (but the underlying avocado.utils.asset.Asset.fetch() does).
>
> > This sounds to me like the ideal would be that there is a separate
> > timeout for file downloads (which could then be a lot shorter than
> > the overall test timeout), and "timeout during asset download"
> > would be detected separately from "timeout while actually running
> > test".  But maybe the separation-of-phases in newer Avocado achieves
> > that already ?
> >
>
> The mechanism in newer Avocado will simply never attempt to run tests
> that don't have the stated requirements fulfilled.  With regard to
> timeouts, each of the different kinds of requirement implementation
> (file downloads and cache, a.k.a. "assets", package installation,
> Ansible module execution, etc.) is supposed to provide its own
> features, including timeouts.
>
> Anyways, I'll look into, and report back on:
>
> 1. expanding avocado.Test.fetch_asset() with a timeout parameter

If newer Avocado handles all of this differently, it might not
be worth the extra effort on something we're going to move away from.

> 2. making sure the newer implementations for the requirement types used
> by QEMU respect a timeout (it doesn't need to be smaller than the
> test timeout, because they run completely outside of the test).

The main thing I think is that timeouts on asset fetch should
result in a SKIP or CANCEL status, not INTERRUPTED, because
the CI treats INTERRUPTED as a failure, whereas SKIP and CANCEL
are OK.
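
To illustrate the behaviour I mean, here is a rough sketch in plain
Python. It is not Avocado's API: AssetFetchTimeout, fetch_with_deadline
and run_test are names made up for this example, and the status strings
only mirror the ones the CI reports. The download gets its own, shorter
deadline, and hitting it is reported as CANCEL instead of falling
through to the overall test timeout.

```python
import time


class AssetFetchTimeout(Exception):
    """Hypothetical error: the download phase exceeded its own deadline."""


def fetch_with_deadline(fetch_chunk, deadline_s, clock=time.monotonic):
    """Collect chunks until fetch_chunk() returns None.

    The deadline is only checked between chunks, so a single call that
    blocks forever would still need a socket-level timeout underneath.
    """
    start = clock()
    chunks = []
    while True:
        if clock() - start > deadline_s:
            raise AssetFetchTimeout(f"asset download exceeded {deadline_s}s")
        chunk = fetch_chunk()
        if chunk is None:
            return b"".join(chunks)
        chunks.append(chunk)


def run_test(fetch_chunk, test_body, download_deadline_s=30,
             clock=time.monotonic):
    """Return (status, message); a slow download is CANCEL, not a failure."""
    try:
        asset = fetch_with_deadline(fetch_chunk, download_deadline_s, clock)
    except AssetFetchTimeout as exc:
        # Assumption: the harness treats CANCEL as "not a failure".
        return ("CANCEL", str(exc))
    test_body(asset)
    return ("PASS", "")
```

In a real Avocado test the except branch would presumably call
self.cancel() with the message, but I haven't checked how that
interacts with the runner-enforced test timeout.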

> For now, are you OK with re-running those jobs if the servers stall
> the transfers? Or would you rather see a patch that changes the
> find_only parameter to True, so that if the pre-test attempt to
> download the asset fails, the transfer is never attempted during the
> test?

I think for the moment we're OK retrying (or more usually, saying
"this job is failing today, ignore it") -- usually this kind
of thing is "somebody's server is having troubles" and it goes
away after a day or so.

thanks
-- PMM
