On 21/8/21 2:38 am, Kinsey Moore wrote:
> On 8/19/2021 18:03, Chris Johns wrote:
>> On 20/8/21 4:55 am, Kinsey Moore wrote:
>>> On 8/19/2021 13:32, Gedare Bloom wrote:
>>>> On Thu, Aug 19, 2021 at 11:43 AM Kinsey Moore <kinsey.mo...@oarcorp.com> wrote:
>>>>> I've seen these failures on my local system, in our CI, and on a build
>>>>> server that I sometimes use for development/testing, so if it's a
>>>>> configuration issue we're being pretty consistent about
>>>>> misconfiguration across some pretty different environments (docker,
>>>>> bare-metal, VM, different OSs, different QEMU versions). I've seen
>>>>> enough of the spintrcritical tests fail sporadically on QEMU to lump
>>>>> them all into this category. These are also tests that I have seen
>>>>> behave badly on ARMv7 QEMU on my local system (which doesn't rule out
>>>>> misconfiguration, but it's another data point).
>>>>>
>>>> Yes, for example, it may be a matter of qemu process counts spawned by
>>>> rtems-test, and the order in which tests get invoked could be a cause
>>>> for which ones don't work. I could easily see this happening, since
>>>> each test runtime will be fairly consistent, so you'll often see the
>>>> same tests running concurrently with each other. But if you change the
>>>> order (e.g., by adding new tests), then we may see a new set of
>>>> sporadically failing test cases. Will we just add those, or do we need
>>>> to re-examine this indeterminate set periodically? Who will maintain
>>>> this list? That's kind of the root of my concern here.
>>> I understand your concern about maintenance of the failure list and I
>>> don't have a good answer for you. I imagine going forward it would be a
>>> combination of the current stake-holders for a given BSP and anyone who
>>> watches the automated build output from Joel's runs for these kinds of
>>> issues.
>>>
>>> On the other hand, if we don't mark those tests, people will get
>>> fatigued looking at the spurious failures and assume any new ones just
>>> fall into the same category as the others. At that point is it even
>>> worth running the automated tests for that platform?
>>>
>>>>> As far as your worry about marking these indeterminate, they're only
>>>>> being marked as such for QEMU BSPs. The ZynqMP hardware BSP doesn't
>>>>> have these testing carve-outs and runs all these tests flawlessly.
>>
>> Great, this is important.
>>
>>>>> These failures become much more common when there is otherwise load
>>>>> on the system, and a lot of them disappear when you limit the tester
>>>>> to a single QEMU instance at a time.
>>>>>
>>>> I'm wondering if we should sacrifice testing speed for
>>>> coverage/quality. If throttling rtems-test leads to more reliable test
>>>> results, then it may be a better option than basically ignoring a
>>>> swath of our testsuite.
>>> That would certainly mitigate some of the failures, but you'd also have
>>> to guarantee nothing else is running on the system which could cause
>>> the same problem. I know at least some of the current automated runs
>>> operate on a shared system which can and does often have other
>>> intensive processes running on it. There are also the tests that are
>>> sporadic on QEMU even without additional load.
>>
>> What is it in these tests, when combined with qemu, that causes the
>> tests to fail? Is there some relation to a real clock, some shared host
>> resource, or a bug in qemu?
>> I am concerned a simulator can vary like this based on the host's load,
>> and it makes me wonder how people use it on machines that host a number
>> of VMs.
> I experienced very similar results on an ARMv7 BSP (not Zynq) and assumed
> that this was a known/accepted problem with QEMU when the same issues
> popped up on AArch64.
I think we have just ignored this issue. I know I have ignored it because
of the rabbit hole it is.

> My local system under no other load produces these failures for the Zynq
> A9 QEMU BSP:
>
>   "failed": [
>       "spcpucounter01.exe",
>       "psxtimes01.exe",
>       "sp69.exe",
>       "psx12.exe",
>       "minimum.exe",
>       "dl06.exe",
>       "sptimecounter02.exe"
>   ],
>
> minimum.exe

We have discussed this test in the past, and I think the end result from
Joel was that an exit code of 0 meant it had passed, but I am not sure the
exit code is printed because the test is minimal. Maybe it should be
changed to be a `no-run` type test?

> and dl06.exe are probably unrelated,

Yeap, and that is one I should fix when I can find the time.

> but the remainder are in my problem set for AArch64 on QEMU.

OK.

> A run of the AArch64 ZynqMP ILP32 BSP produced these failures under the
> same conditions with all the test carve-outs removed:
>
>   "failed": [
>       "psx12.exe",
>       "spcpucounter01.exe",
>       "sptimecounter01.exe",
>       "sptimecounter02.exe",
>       "sp04.exe"
>   ],
>
> Because of my experience with the aforementioned ARMv7 BSP and the lack
> of failures on hardware, I chose not to weed out the root cause of the
> failures under QEMU.

Sure. It however leaves open the underlying problem of why these fail with
QEMU, and so we are caught either way.

> This patch is documentation of our observations across multiple
> architectures and BSPs running on QEMU more than anything else.

And it also affects the results.

>> I feel with this volume of tests being tagged this way we should have a
>> better understanding of the problem and so a means to track, or not
>> track, how to resolve it. As Gedare has kindly stated, once pushed this
>> change disappears into a dark corner and we have no means to track it.
>>
>> The other solution is to set `jobs` to `1` in this BSP's tester config,
>> again something Gedare has raised. It means we get better or even valid
>> results. What is more important, valid results or running the testsuite
>> as fast as possible?
> I fully support dropping the number of jobs to "half" or 1 for better
> results on QEMU runs that display these problems.

OK, then maybe this is the way to go.

> My comment in that regard was that other system loading (or multiple
> simultaneous test runs) can also cause the same problem and so this is
> only a partial solution. Barring a fix for RTEMS or QEMU for these
> load-dependent and sporadic failures, this at least still needs to be
> documented in some form.

Yes, and the failures should highlight an issue on the host that needs to
be looked into.

Chris
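P.S. For anyone wanting to try a throttled run, something along these lines
is what I have in mind (the BSP name and the path to the built testsuites
here are only an example, adjust them for your set up):

  rtems-test --rtems-bsp=xilinx_zynq_a9_qemu --jobs=1 \
      path/to/build/arm/xilinx_zynq_a9_qemu/testsuites

or set the equivalent `jobs` value in that BSP's tester config as discussed
above.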