On Tue, Nov 26, 2024 at 06:52:57PM +0100, Thomas Huth wrote:
> On 26/11/2024 18.46, Peter Maydell wrote:
> > On Tue, 26 Nov 2024 at 17:31, Daniel P. Berrangé <[email protected]> wrote:
> > >
> > > On Tue, Nov 26, 2024 at 05:44:29PM +0100, Philippe Mathieu-Daudé wrote:
> > > > Hi,
> > > >
> > > > On 4/9/24 12:38, Thomas Huth wrote:
> > > > fetch() can fail [*] (see previous patch, various Exceptions returned).
> > > >
> > > > What should we do in this case? If we ignore a missing artifact,
> > > > the tests will eventually fail. Better bail out early and save
> > > > credit minutes?
> > > >
> > > We already do what you describe - 'fetch' will raise an exception
> > > which causes the precache task to fail, and the CI job gets marked
> > > as failed. We don't attempt to run tests if assets are missing.
> > >
> > >
> > > > > @@ -58,6 +59,12 @@ def tearDown(self):
> > > > >  def main():
> > > > >      path = os.path.basename(sys.argv[0])[:-3]
> > > > > +
> > > > > +    cache = os.environ.get("QEMU_TEST_PRECACHE", None)
> > > > > +    if cache is not None:
> > > > > +        Asset.precache_suites(path, cache)
> > > > > +        return
> > > > > +
> > > > >      tr = pycotap.TAPTestRunner(message_log =
> > > > >                                 pycotap.LogMode.LogToError,
> > > > >                                 test_output_log =
> > > > >                                 pycotap.LogMode.LogToError)
> > > > >      unittest.main(module = None, testRunner = tr,
> > > > >                    argv=["__dummy__", path])
> > > >
> > > > [*] Peter reported the following CI failure:
> > > >
> > > > https://gitlab.com/qemu-project/qemu/-/jobs/8474928266
> > > >
> > > > 2024-11-26 14:58:53,170 - qemu-test - ERROR - Unable to download
> > > > https://apt.armbian.com/pool/main/l/linux-6.6.16/linux-image-current-sunxi_24.2.1_armhf__6.6.16-Seb3e-D6b4a-P2359-Ce96bHfe66-HK01ba-V014b-B067e-R448a.deb:
> > >
> > > This looks to be working as intended. We failed to cache
> > > the asset, and so we stopped the job, without trying to
> > > run the tests.
> >
> > The job ended up in state "failed", with a red X mark in
> > the gitlab UI. If we intend that not being able to fetch
> > the assets doesn't count as a test failure, that didn't
> > work here. If we do intend that fetch failures should be
> > CI failures, we need to make our process of fetching and
> > caching the images more robust, because otherwise the result
> > is flaky CI jobs.
>
> I think we want to continue to mark failing downloads as test failures,
> otherwise we'll never notice when an asset is not available from the
> internet anymore (since SKIPs just get ignored).
>
> What we really need is a working cache for the private CI runners to ease
> the pain when the host just has a networking hiccup.

Right, if the cache were working then, once it is primed, the only
time we would see a failure is when a commit introduces a /new/ URL
that is genuinely invalid.

We absolutely need the caching for runners to be fixed as a high
priority task. The broken caching also prevents us from using ccache,
which means our pipelines are needlessly slower than they should be.

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
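
To make the cache-first behaviour discussed above concrete, here is a
minimal sketch. It is not the code from the QEMU functional test
framework: the function name fetch_cached, the one-file-per-URL cache
layout, and the retries/delay/opener parameters are all hypothetical and
exist only for illustration. The point it tries to show is that a primed
cache never touches the network, a transient hiccup is retried, and a
download that keeps failing still raises, so the job fails instead of
being skipped.

```python
import hashlib
import time
import urllib.request
from pathlib import Path


def fetch_cached(url, cache_dir, retries=3, delay=1.0,
                 opener=urllib.request.urlopen):
    """Return the cached file for url, downloading only on a cache miss.

    Hypothetical helper for illustration; not QEMU's Asset.fetch().
    """
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)

    # Assumed layout: one file per URL, named by the SHA-256 of the URL.
    target = cache_dir / hashlib.sha256(url.encode()).hexdigest()
    if target.exists():
        # Primed cache: no network access, so a hiccup cannot fail the job.
        return target

    last_error = None
    for attempt in range(retries):
        try:
            with opener(url) as resp:
                data = resp.read()
            # Write to a temporary name and rename, so that an interrupted
            # download never leaves a partial file in the cache.
            tmp = target.with_suffix(".part")
            tmp.write_bytes(data)
            tmp.replace(target)
            return target
        except OSError as exc:
            # urllib's URLError and HTTPError are OSError subclasses.
            last_error = exc
            if attempt + 1 < retries:
                time.sleep(delay)

    # Still a hard failure: a genuinely dead URL must not go unnoticed.
    raise RuntimeError(f"Unable to download {url}") from last_error
```

With something along these lines, the only way to hit the RuntimeError
on a runner with a persistent cache directory would be a URL that has
never been fetched successfully on that runner, which matches the "new
URL that is genuinely invalid" case.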
