functional: enable pre-emptive caching of assets

Daniel P. Berrangé Tue, 26 Nov 2024 09:57:08 -0800

On Tue, Nov 26, 2024 at 06:52:57PM +0100, Thomas Huth wrote:
> On 26/11/2024 18.46, Peter Maydell wrote:
> > On Tue, 26 Nov 2024 at 17:31, Daniel P. Berrangé <[email protected]> wrote:
> > > 
> > > On Tue, Nov 26, 2024 at 05:44:29PM +0100, Philippe Mathieu-Daudé wrote:
> > > > Hi,
> > > > 
> > > > On 4/9/24 12:38, Thomas Huth wrote:
> > > > fetch() can fail [*] (see previous patch, various Exceptions returned).
> > > > 
> > > > What should we do in this case? If we ignore a missing artifact,
> > > > the tests will eventually fail. Better bail out early and save
> > > > credit minutes?
> > > 
> > > We already do what you describe - 'fetch' will raise an exception
> > > which causes the precache task to fail, and the CI job gets marked
> > > as failed. We don't attempt to run tests if assets are missing.
> > > 
> > > 
> > > > > @@ -58,6 +59,12 @@ def tearDown(self):
> > > > >        def main():
> > > > >            path = os.path.basename(sys.argv[0])[:-3]
> > > > > +
> > > > > +        cache = os.environ.get("QEMU_TEST_PRECACHE", None)
> > > > > +        if cache is not None:
> > > > > +            Asset.precache_suites(path, cache)
> > > > > +            return
> > > > > +
> > > > >            tr = pycotap.TAPTestRunner(message_log = pycotap.LogMode.LogToError,
> > > > >                                       test_output_log = pycotap.LogMode.LogToError)
> > > > >            unittest.main(module = None, testRunner = tr, argv=["__dummy__", path])
> > > > 
> > > > [*] Peter reported the following CI failure:
> > > > 
> > > >    https://gitlab.com/qemu-project/qemu/-/jobs/8474928266
> > > > 
> > > > 2024-11-26 14:58:53,170 - qemu-test - ERROR - Unable to download 
> > > > https://apt.armbian.com/pool/main/l/linux-6.6.16/linux-image-current-sunxi_24.2.1_armhf__6.6.16-Seb3e-D6b4a-P2359-Ce96bHfe66-HK01ba-V014b-B067e-R448a.deb:
> > > 
> > > This looks to be working as intended. We failed to cache
> > > the asset, and so we stopped the job, without trying to
> > > run the tests.
> > 
> > The job ended up in state "failed", with a red X mark in
> > the gitlab UI. If we intend that not being able to fetch
> > the assets doesn't count as a test failure, that didn't
> > work here. If we do intend that fetch failures should be
> > CI failures, we need to make our process of fetching and
> > caching the images more robust, because otherwise the result
> > is flaky CI jobs.
> 
> I think we want to continue to mark failing downloads as test failures,
> otherwise we'll never notice when an asset is not available from the
> internet anymore (since SKIPs just get ignored).
> 
> What we really need is a working cache for the private CI runners to ease
> the pain when the host just has a networking hiccup.


Right, if the cache were working, then once it is primed the only
time we would see a failure is when a commit introduces a /new/ URL
that is genuinely invalid.
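
As a rough illustration of the behaviour described above (a sketch only,
not the actual tests/functional code; fetch_cached, its parameters and the
download callable are all hypothetical names), a cache-first fetch touches
the network only on a cache miss, retries transient errors, and still
raises if the asset really cannot be obtained:

```python
import hashlib
import os
import time


def _sha256(path):
    """Return the hex SHA-256 digest of the file at path."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def fetch_cached(url, sha256, cache_dir, download, retries=3, delay=0.0):
    """Return the cached path for an asset, downloading only on a miss.

    Hypothetical helper: download(url, dest) is any callable that writes
    the asset to dest or raises OSError on failure.
    """
    os.makedirs(cache_dir, exist_ok=True)
    path = os.path.join(cache_dir, sha256)

    # Cache hit: a primed cache needs no network access at all.
    if os.path.exists(path) and _sha256(path) == sha256:
        return path

    last_error = None
    tmp = path + ".download"
    for attempt in range(retries):
        try:
            download(url, tmp)
            if _sha256(tmp) != sha256:
                raise ValueError("hash mismatch for %s" % url)
            # Atomic rename so a partial download never looks like a hit.
            os.replace(tmp, path)
            return path
        except (OSError, ValueError) as exc:
            last_error = exc
            if os.path.exists(tmp):
                os.remove(tmp)
            if attempt < retries - 1:
                time.sleep(delay)

    # Still fail loudly, so a genuinely dead URL is noticed.
    raise RuntimeError("unable to fetch %s" % url) from last_error
```

With something of this shape, a networking hiccup on the runner costs a
retry at most, and an already cached asset never depends on the remote
server being reachable.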

We absolutely need the caching for runners to be fixed as a high priority
task. The broken cache also prevents us from using ccache, which means our
pipelines are slower than they need to be.

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

Re: [PULL 15/42] tests/functional: enable pre-emptive caching of assets