On Tue, 26 Nov 2024 at 17:56, Daniel P. Berrangé <[email protected]> wrote:
>
> On Tue, Nov 26, 2024 at 06:52:57PM +0100, Thomas Huth wrote:
> > On 26/11/2024 18.46, Peter Maydell wrote:
> > > On Tue, 26 Nov 2024 at 17:31, Daniel P. Berrangé <[email protected]>
> > > wrote:
> > > >
> > > > On Tue, Nov 26, 2024 at 05:44:29PM +0100, Philippe Mathieu-Daudé wrote:
> > > > > Hi,
> > > > >
> > > > > On 4/9/24 12:38, Thomas Huth wrote:
> > > > > fetch() can fail [*] (see previous patch, various Exceptions
> > > > > returned).
> > > > >
> > > > > What should we do in this case? If we ignore a missing artifact,
> > > > > the tests will eventually fail. Better bail out early and save
> > > > > credit minutes?
> > > >
> > > > We already do what you describe - 'fetch' will raise an exception
> > > > which causes the precache task to fail, and the CI job gets marked
> > > > as failed. We don't attempt to run tests if assets are missing.
> > > >
> > > >
> > > > > > @@ -58,6 +59,12 @@ def tearDown(self):
> > > > > >  def main():
> > > > > >      path = os.path.basename(sys.argv[0])[:-3]
> > > > > > +
> > > > > > +    cache = os.environ.get("QEMU_TEST_PRECACHE", None)
> > > > > > +    if cache is not None:
> > > > > > +        Asset.precache_suites(path, cache)
> > > > > > +        return
> > > > > > +
> > > > > >      tr = pycotap.TAPTestRunner(message_log =
> > > > > >                                 pycotap.LogMode.LogToError,
> > > > > >                                 test_output_log =
> > > > > >                                 pycotap.LogMode.LogToError)
> > > > > >      unittest.main(module = None, testRunner = tr,
> > > > > >                    argv=["__dummy__", path])
> > > > >
> > > > > [*] Peter reported the following CI failure:
> > > > >
> > > > > https://gitlab.com/qemu-project/qemu/-/jobs/8474928266
> > > > >
> > > > > 2024-11-26 14:58:53,170 - qemu-test - ERROR - Unable to download
> > > > > https://apt.armbian.com/pool/main/l/linux-6.6.16/linux-image-current-sunxi_24.2.1_armhf__6.6.16-Seb3e-D6b4a-P2359-Ce96bHfe66-HK01ba-V014b-B067e-R448a.deb:
> > > >
> > > > This looks to be working as intended. We failed to cache
> > > > the asset, and so we stopped the job, without trying to
> > > > run the tests.
> > >
> > > The job ended up in state "failed", with a red X mark in
> > > the gitlab UI. If we intend that not being able to fetch
> > > the assets doesn't count as a test failure, that didn't
> > > work here. If we do intend that fetch failures should be
> > > CI failures, we need to make our process of fetching and
> > > caching the images more robust, because otherwise the result
> > > is flaky CI jobs.
> >
> > I think we want to continue to mark failing downloads as test failures,
> > otherwise we'll never notice when an asset is not available from the
> > internet anymore (since SKIPs just get ignored).
> >
> > What we really need is a working cache for the private CI runners to ease
> > the pain when the host just has a networking hiccup.
>
> Right, if the cache was working, once the cache is primed, then the only
> time we would see a fail is if the commit introduces a /new/ URL that is
> genuinely invalid.
>
> We absolutely need the caching for runners to be fixed as a high priority
> task. It also breaks our ability to use ccache, which means our pipelines
> are needlessly slower than they should be.

The other awkward part of the current setup, incidentally, is that
if we fail to download one image file, we immediately stop the
whole CI job, so we don't get any information about whether the
other tests the job would have run would have passed or not.
In situations like the current one where I'm basically ignoring
this job as temporarily broken, that means we get less coverage
than we might have had.

-- PMM
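
PS: to illustrate the "don't stop at the first failed download" idea,
here is a rough sketch only. The names precache_all() and demo_fetch()
are made up for this mail and are not the existing Asset API; the
"fetch" callable is assumed to raise an exception on failure, as
described earlier in the thread. It only shows the "attempt every
asset, report all failures at the end" pattern, and says nothing about
how the tests themselves would be run afterwards.

```python
from typing import Callable, Iterable, List, Tuple


def precache_all(urls: Iterable[str],
                 fetch: Callable[[str], None]) -> List[Tuple[str, Exception]]:
    """Try to fetch every URL; return (url, error) for each one that failed.

    'fetch' is a hypothetical callable assumed to raise on failure.
    """
    failures = []
    for url in urls:
        try:
            fetch(url)
        except Exception as exc:
            # Record the failure and keep going, so that the remaining
            # assets are still attempted.
            failures.append((url, exc))
    return failures


def demo_fetch(url: str) -> None:
    """Stand-in for a real download; fails for URLs containing 'missing'."""
    if "missing" in url:
        raise OSError("Unable to download " + url)
```

A caller could then print every entry of the returned list and still
exit non-zero if it is non-empty, so a broken URL stays a hard failure
but the log shows the full set of missing assets rather than only the
first one.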
