[PATCH] aarch64: Add tests that are failing intermittently

2021-08-19 Thread Ryan Long
- Change status of all spintrcritical tests to indeterminate, expanded upon
  comments.
- Add indeterminate tests to xilinx-versal
---
 spec/build/bsps/aarch64/a53/tsta53.yml| 40 ++---
 spec/build/bsps/aarch64/xilinx-versal/tstqemu.yml | 54 ++-
 spec/build/bsps/aarch64/xilinx-zynqmp/tstqemu.yml | 40 ++---
 3 files changed, 120 insertions(+), 14 deletions(-)

diff --git a/spec/build/bsps/aarch64/a53/tsta53.yml b/spec/build/bsps/aarch64/a53/tsta53.yml
index f263557..6e8f348 100644
--- a/spec/build/bsps/aarch64/a53/tsta53.yml
+++ b/spec/build/bsps/aarch64/a53/tsta53.yml
@@ -1,20 +1,26 @@
 SPDX-License-Identifier: CC-BY-SA-4.0 OR BSD-2-Clause
 actions:
 - set-test-state:
-# expected to fail, don't compile these
+# This test fails when run through rtems-tester because it does not
+# produce any output.
 minimum: exclude
 
-# don't compile due to toolchain issues
+# These tests do not compile due to an issue with the GNU Assembler.
+# The issue has been filed (https://devel.rtems.org/ticket/4218).
+# Once the issue has been fixed, these tests will be turned back on.
 spconfig01: exclude
 spmisc01: exclude
 
-# tests that are passing intermittently
+# These tests may or may not fail; however, they do pass on real hardware.
+# It seems to be an issue with QEMU.
 spcpucounter01: indeterminate
+sptimecounter01: indeterminate
 rtmonuse: indeterminate
-sp68: indeterminate
 sp04: indeterminate
 sp20: indeterminate
+sp68: indeterminate
 sp69: indeterminate
+sp71: indeterminate
 rtmonusxtimes01: indeterminate
 spedfsched02: indeterminate
 spedfsched04: indeterminate
@@ -24,12 +30,34 @@ actions:
 sptimecounter04: indeterminate
 ttest02: indeterminate
 
-# tests that pass nominally, but fail under Qemu when the host is under
-# heavy load
+# These tests may or may not fail; however, they do pass on real hardware.
+# It seems to be an issue with QEMU that only occurs when the
+# host machine is under heavy load.
 psx12: indeterminate
+spintrcritical01: indeterminate
+spintrcritical02: indeterminate
 spintrcritical03: indeterminate
 spintrcritical04: indeterminate
 spintrcritical05: indeterminate
+spintrcritical06: indeterminate
+spintrcritical07: indeterminate
+spintrcritical08: indeterminate
+spintrcritical09: indeterminate
+spintrcritical10: indeterminate
+spintrcritical11: indeterminate
+spintrcritical12: indeterminate
+spintrcritical13: indeterminate
+spintrcritical14: indeterminate
+spintrcritical15: indeterminate
+spintrcritical16: indeterminate
+spintrcritical17: indeterminate
+spintrcritical18: indeterminate
+spintrcritical19: indeterminate
+spintrcritical20: indeterminate
+spintrcritical21: indeterminate
+spintrcritical22: indeterminate
+spintrcritical23: indeterminate
+spintrcritical24: indeterminate
 build-type: option
 copyrights:
 - Copyright (C) 2020 On-Line Applications Research (OAR)
diff --git a/spec/build/bsps/aarch64/xilinx-versal/tstqemu.yml b/spec/build/bsps/aarch64/xilinx-versal/tstqemu.yml
index 43f6b2e..884effc 100644
--- a/spec/build/bsps/aarch64/xilinx-versal/tstqemu.yml
+++ b/spec/build/bsps/aarch64/xilinx-versal/tstqemu.yml
@@ -1,13 +1,63 @@
 SPDX-License-Identifier: CC-BY-SA-4.0 OR BSD-2-Clause
 actions:
 - set-test-state:
-# expected to fail
+# This test fails when run through rtems-tester because it does not
+# produce any output.
 minimum: exclude
 
-# don't compile due to toolchain issues, see RTEMS issue #4218
+# These tests do not compile due to an issue with the GNU Assembler.
+# The issue has been filed (https://devel.rtems.org/ticket/4218).
+# Once the issue has been fixed, these tests will be turned back on.
 spconfig01: exclude
 spmisc01: exclude
 
+# These tests may or may not fail; however, they do pass on real hardware.
+# It seems to be an issue with QEMU.
+spcpucounter01: indeterminate
+sptimecounter01: indeterminate
+rtmonuse: indeterminate
+sp04: indeterminate
+sp20: indeterminate
+sp68: indeterminate
+sp69: indeterminate
+sp71: indeterminate
+rtmonusxtimes01: indeterminate
+spedfsched02: indeterminate
+spedfsched04: indeterminate
+psxtimes01: indeterminate
+sprmsched01: indeterminate
+sptimecounter02: indeterminate
+sptimecounter04: indeterminate
+ttest02: indeterminate
+
+# These tests may or may not fail; however, they do pass on real hardware.
+# It seems to be an issue with QEMU that only occurs when the
+# host machine is under heavy load.
+psx12: indeterminate
+spintrcritical01: indeterminate
+spintrcritical02: indeterminate
+spintrcritical03: indeterminate
+spintrcritical04: indeterminate
+spintrcritical05: indeterminate
+

Re: [PATCH] aarch64: Add tests that are failing intermittently

2021-08-19 Thread Gedare Bloom
Can you explain the process for generating the lists of indeterminate
test results?

I hate to circle this subject so many times, but is labeling sporadic
simulator failures as indeterminate results really the right thing to
do? Are these indeterminate tests reproducible on different
systems/qemus/loads? Or is it just what you observe locally when
running rtems-test on one specific system? I don't think I see nearly
so many spurious failures when I run rtems-test for example. I really
need to believe we're not just hiding a system configuration problem.
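
One way to make the process behind such a list explicit is to repeat the
run several times and classify each test by how often it fails. The sketch
below is illustrative only: the `classify` helper, the data layout, and the
sample results are assumptions, not part of rtems-test.

```python
from collections import Counter


def classify(runs):
    """Classify tests from repeated runs.

    `runs` is a list of sets; each set holds the names of the tests that
    failed in one complete run of the testsuite (hypothetical layout).
    """
    total = len(runs)
    failures = Counter(name for failed in runs for name in failed)
    result = {}
    for name, count in failures.items():
        # Failing in every run suggests a real failure; failing in only
        # some runs makes the test a candidate for the indeterminate list.
        result[name] = "fails" if count == total else "intermittent"
    return result


# Made-up results from three runs on one host.
runs = [
    {"psx12", "sp04"},
    {"psx12"},
    {"psx12", "spintrcritical03"},
]
print(classify(runs))
```

Repeating the same classification on a second host or under a different
load would show whether the intermittent set is stable across systems.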

I know I OK'd looking at the versal, but on second thought, I'd rather
leave the xilinx-versal/tstqemu.yml alone until the BSP is finished,
so revert that part of your patch. Sorry about that.

Gedare


Re: [PATCH] aarch64: Add tests that are failing intermittently

2021-08-19 Thread Kinsey Moore
I've seen these failures on my local system, in our CI, and on a build
server that I sometimes use for development/testing so if it's a
configuration issue we're being pretty consistent about misconfiguration
across some pretty different environments (docker, bare-metal, VM,
different OSs, different QEMU versions). I've seen enough of the
spintrcritical tests fail sporadically on QEMU to lump them all into this
category. These are also tests that I have seen behave badly on ARMv7 QEMU
on my local system (which doesn't rule out misconfiguration, but it's
another data point).

As far as your worry about marking these indeterminate, they're only being
marked as such for QEMU BSPs. The ZynqMP hardware BSP doesn't have these
testing carve-outs and runs all these tests flawlessly.

These failures become much more common when there is otherwise load on the
system and a lot of them disappear when you limit the tester to a single
QEMU instance at a time.

Kinsey



Re: [PATCH] aarch64: Add tests that are failing intermittently

2021-08-19 Thread Gedare Bloom
On Thu, Aug 19, 2021 at 11:43 AM Kinsey Moore wrote:
>
> I've seen these failures on my local system, in our CI, and on a build
> server that I sometimes
> use for development/testing so if it's a configuration issue we're being
> pretty consistent about
> misconfiguration across some pretty different environments (docker,
> bare-metal, VM, different
> OSs, different QEMU versions). I've seen enough of the spintrcritical
> tests fail sporadically on
> QEMU to lump them all into this category. These are also tests that I
> have seen behave badly
> on ARMv7 QEMU on my local system (which doesn't rule out
> misconfiguration, but it's another
> data point).
>
Yes, for example, it may be a matter of qemu process counts spawned by
rtems-test, and the order in which tests get invoked could be a cause
for which ones don't work. I could easily see this happening, since
each test runtime will be fairly consistent, so you'll often see the
same tests running concurrently with each other. But, if you change
the order (e.g., by adding new tests), then we may see a new set of
sporadically failing test cases. Will we just add those, or do we need
to re-examine this indeterminate set periodically? Who will maintain
this list? That's kind of the root of my concern here.

> As far as your worry about marking these indeterminate, they're only
> being marked as such for
> QEMU BSPs. The ZynqMP hardware BSP doesn't have these testing carve-outs
> and runs all
> these tests flawlessly.
>
> These failures become much more common when there is otherwise load on
> the system and a
> lot of them disappear when you limit the tester to a single QEMU
> instance at a time.
>
I'm wondering if we should sacrifice testing speed for
coverage/quality. If throttling rtems-test leads to more reliable test
results, then it may be a better option than basically ignoring a
swath of our testsuite.
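
The speed versus reliability trade-off can be illustrated with a small
throttled runner. This is only a sketch under stated assumptions: the
`run_throttled` helper is hypothetical, and the "tests" are stand-ins that
sleep instead of launching a simulator.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def run_throttled(tests, jobs):
    """Run callables with at most `jobs` in flight; return peak concurrency."""
    lock = threading.Lock()
    active = 0
    peak = 0

    def wrapper(test):
        nonlocal active, peak
        with lock:
            active += 1
            peak = max(peak, active)
        try:
            test()
        finally:
            with lock:
                active -= 1

    with ThreadPoolExecutor(max_workers=jobs) as pool:
        # list() forces completion and re-raises any exception from a test.
        list(pool.map(wrapper, tests))
    return peak


# Stand-ins for simulator invocations (hypothetical): each just sleeps.
tests = [lambda: time.sleep(0.01) for _ in range(8)]
print(run_throttled(tests, jobs=1))  # one test at a time
print(run_throttled(tests, jobs=4))  # up to four at once
```

With `jobs=1` the tests never overlap, so they cannot compete with each
other for host CPU time, at the cost of a longer total run.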


Re: [PATCH] aarch64: Add tests that are failing intermittently

2021-08-19 Thread Kinsey Moore

On 8/19/2021 13:32, Gedare Bloom wrote:
> On Thu, Aug 19, 2021 at 11:43 AM Kinsey Moore wrote:
>> I've seen these failures on my local system, in our CI, and on a build
>> server that I sometimes use for development/testing so if it's a
>> configuration issue we're being pretty consistent about misconfiguration
>> across some pretty different environments (docker, bare-metal, VM,
>> different OSs, different QEMU versions). I've seen enough of the
>> spintrcritical tests fail sporadically on QEMU to lump them all into this
>> category. These are also tests that I have seen behave badly on ARMv7 QEMU
>> on my local system (which doesn't rule out misconfiguration, but it's
>> another data point).
>
> Yes, for example, it may be a matter of qemu process counts spawned by
> rtems-test, and the order in which tests get invoked could be a cause
> for which ones don't work. I could easily see this happening, since
> each test runtime will be fairly consistent, so you'll often see the
> same tests running concurrently with each other. But, if you change
> the order (e.g., by adding new tests), then we may see a new set of
> sporadically failing test cases. Will we just add those, or do we need
> to re-examine this indeterminate set periodically? Who will maintain
> this list? That's kind of the root of my concern here.

I understand your concern about maintenance of the failure list and I don't
have a good answer for you. I imagine going forward it would be a combination
of the current stake-holders for a given BSP and anyone who watches the
automated build output from Joel's runs for these kinds of issues.

On the other hand if we don't mark those tests, people will get fatigued
looking at the spurious failures and assume any new ones just fall into the
same category as others. At that point is it even worth running the
automated tests for that platform?
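
The point about fatigue can be sketched as a filter that hides failures
already carrying a test state, so that only new ones stand out. The
`unexpected_failures` helper and the failure list are hypothetical; the
states are a hand-copied subset of the patch, for illustration only.

```python
# Test states as they might be read from a set-test-state block
# (a hand-copied subset, for illustration only).
TEST_STATES = {
    "minimum": "exclude",
    "psx12": "indeterminate",
    "sp04": "indeterminate",
    "spcpucounter01": "indeterminate",
}


def unexpected_failures(failed, test_states):
    """Return the failed tests that carry no state in `test_states`."""
    unexpected = []
    for exe in failed:
        # Report entries name the executable; drop the ".exe" suffix.
        name = exe[:-4] if exe.endswith(".exe") else exe
        if name not in test_states:
            unexpected.append(name)
    return sorted(unexpected)


# Made-up failure list from one run.
failed = ["psx12.exe", "sp04.exe", "dl06.exe"]
print(unexpected_failures(failed, TEST_STATES))
```

Only the unmarked failure is reported here, which is the kind of signal
that gets lost when every run shows a page of known sporadic failures.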




>> As far as your worry about marking these indeterminate, they're only being
>> marked as such for QEMU BSPs. The ZynqMP hardware BSP doesn't have these
>> testing carve-outs and runs all these tests flawlessly.
>>
>> These failures become much more common when there is otherwise load on the
>> system and a lot of them disappear when you limit the tester to a single
>> QEMU instance at a time.
>
> I'm wondering if we should sacrifice testing speed for
> coverage/quality. If throttling rtems-test leads to more reliable test
> results, then it may be a better option than basically ignoring a
> swath of our testsuite.

That would certainly mitigate some of the failures, but you'd also have to
guarantee nothing else is running on the system which could cause the same
problem. I know at least some of the current automated runs operate on a
shared system which can and does often have other intensive processes
running on it. There are also the tests that are sporadic on QEMU even
without additional load.




Kinsey



Re: [PATCH] aarch64: Add tests that are failing intermittently

2021-08-19 Thread Chris Johns
On 20/8/21 2:58 am, Gedare Bloom wrote:
> I know I OK'd looking at the versal, but on second thought, I'd rather
> leave the xilinx-versal/tstqemu.yml alone until the BSP is finished,
> so revert that part of your patch. Sorry about that.

Agreed, please leave the Versal as is.

Chris
___
devel mailing list
devel@rtems.org
http://lists.rtems.org/mailman/listinfo/devel


Re: [PATCH] aarch64: Add tests that are failing intermittently

2021-08-19 Thread Chris Johns
On 20/8/21 4:55 am, Kinsey Moore wrote:
> On 8/19/2021 13:32, Gedare Bloom wrote:
>> On Thu, Aug 19, 2021 at 11:43 AM Kinsey Moore wrote:
>>> I've seen these failures on my local system, in our CI, and on a build
>>> server that I sometimes
>>> use for development/testing so if it's a configuration issue we're being
>>> pretty consistent about
>>> misconfiguration across some pretty different environments (docker,
>>> bare-metal, VM, different
>>> OSs, different QEMU versions). I've seen enough of the spintrcritical
>>> tests fail sporadically on
>>> QEMU to lump them all into this category. These are also tests that I
>>> have seen behave badly
>>> on ARMv7 QEMU on my local system (which doesn't rule out
>>> misconfiguration, but it's another
>>> data point).
>>>
>> Yes, for example, it may be a matter of qemu process counts spawned by
>> rtems-test, and the order in which tests get invoked could be a cause
>> for which ones don't work. I could easily see this happening, since
>> each test runtime will be fairly consistent, so you'll often see the
>> same tests running concurrently with each other. But, if you change
>> the order (e.g., by adding new tests), then we may see a new set of
>> sporadically failing test cases. Will we just add those, or do we need
>> to re-examine this indeterminate set periodically? Who will maintain
>> this list? That's kind of the root of my concern here.
> I understand your concern about maintenance of the failure list and I don't
> have a good answer for you. I imagine going forward it would be a combination
> of the current stake-holders for a given BSP and anyone who watches the
> automated build output from Joel's runs for these kinds of issues.
> 
> On the other hand if we don't mark those tests, people will get fatigued
> looking at the spurious failures and assume any new ones just fall into the
> same category as others. At that point is it even worth running the
> automated tests for that platform?
> 
>>
>>> As far as your worry about marking these indeterminate, they're only
>>> being marked as such for
>>> QEMU BSPs. The ZynqMP hardware BSP doesn't have these testing carve-outs
>>> and runs all these tests flawlessly.

Great, this is important.

>>> These failures become much more common when there is otherwise load on
>>> the system and a
>>> lot of them disappear when you limit the tester to a single QEMU
>>> instance at a time.
>>>
>> I'm wondering if we should sacrifice testing speed for
>> coverage/quality. If throttling rtems-test leads to more reliable test
>> results, then it may be a better option than basically ignoring a
>> swath of our testsuite.
> That would certainly mitigate some of the failures, but you'd also have to
> guarantee nothing else is running on the system which could cause the same
> problem. I know at least some of the current automated runs operate on a
> shared system which can and does often have other intensive processes
> running on it. There are also the tests that are sporadic on QEMU even
> without additional load.

What is it in these tests, when combined with qemu, that causes them to fail?
Is there some relation to a real clock, some shared host resource, or a bug in
qemu? I am concerned that a simulator can vary like this based on the host's
load, and it makes me wonder how people use it on machines that host a number
of VMs.

I feel that with this volume of tests being tagged this way we should have a
better understanding of the problem, and so a means to track (or not track) how
to resolve it. As Gedare has kindly stated, once pushed this change disappears
into a dark corner and we have no means to track it.

The other solution is to set `jobs` to `1` in this BSP's tester config, again
something Gedare has raised. It means we get better or even valid results. What
is more important, valid results or running the testsuite as fast as possible?

Chris


Re: [PATCH] aarch64: Add tests that are failing intermittently

2021-08-20 Thread Kinsey Moore


On 8/19/2021 18:03, Chris Johns wrote:

> What is it in these tests, when combined with qemu, that causes them to
> fail? Is there some relation to a real clock, some shared host resource, or
> a bug in qemu? I am concerned that a simulator can vary like this based on
> the host's load, and it makes me wonder how people use it on machines that
> host a number of VMs.

I experienced very similar results on an ARMv7 BSP (not Zynq) and assumed that
this was a known/accepted problem with QEMU when the same issues popped up on
AArch64. My local system under no other load produces these failures for the
Zynq A9 QEMU BSP:

    "failed": [
    "spcpucounter01.exe",
    "psxtimes01.exe",
    "sp69.exe",
    "psx12.exe",
    "minimum.exe",
    "dl06.exe",
    "sptimecounter02.exe"
    ],

minimum.exe and dl06.exe are probably unrelated, but the remainder are in my
problem set for AArch64 on QEMU.

A run of the AArch64 ZynqMP ILP32 BSP produced these failures under the same
conditions with all the test carve-outs removed:

    "failed": [
    "psx12.exe",
    "spcpucounter01.exe",
    "sptimecounter01.exe",
    "sptimecounter02.exe",
    "sp04.exe"
    ],

Because of my experience with the aforementioned ARMv7 BSP and the lack of
failures on hardware, I chose not to weed out the root cause of the failures
under QEMU. This patch is documentation of our observations across multiple
architectures and BSPs running on QEMU more than anything else.
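
For readers comparing the two failure lists quoted above, a small Python
sketch (not part of the original thread; the variable names are made up for
illustration) works out which failures the Zynq A9 QEMU run and the AArch64
ZynqMP ILP32 QEMU run have in common:

```python
# Failure lists copied from the two QEMU runs quoted in this message.
zynq_a9_qemu = {
    "spcpucounter01.exe", "psxtimes01.exe", "sp69.exe", "psx12.exe",
    "minimum.exe", "dl06.exe", "sptimecounter02.exe",
}
zynqmp_ilp32_qemu = {
    "psx12.exe", "spcpucounter01.exe", "sptimecounter01.exe",
    "sptimecounter02.exe", "sp04.exe",
}

# Tests that failed on both architectures under QEMU.
common = sorted(zynq_a9_qemu & zynqmp_ilp32_qemu)
# Tests that failed in only one of the two runs.
only_a9 = sorted(zynq_a9_qemu - zynqmp_ilp32_qemu)
only_ilp32 = sorted(zynqmp_ilp32_qemu - zynq_a9_qemu)

print("failed in both runs:", common)
print("only Zynq A9:", only_a9)
print("only ZynqMP ILP32:", only_ilp32)
```

This only compares a single run of each BSP. Since the failures are sporadic,
a test missing from one list may well show up in another run.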

I feel that with this volume of tests being tagged this way we should have a
better understanding of the problem, and so a means to track or not track how
to resolve it. As Gedare has kindly stated, once pushed this change disappears
into a dark corner and we have no means to track it.

The other solution is to set `jobs` to `1` in this BSP's tester config, again
something Gedare has raised. It means we get better or even valid results. What
is more important, valid results or running the testsuite as fast as possible?
I fully support dropping the numbe

Re: [PATCH] aarch64: Add tests that are failing intermittently

2021-08-20 Thread Chris Johns
On 21/8/21 2:38 am, Kinsey Moore wrote:
> On 8/19/2021 18:03, Chris Johns wrote:
>> On 20/8/21 4:55 am, Kinsey Moore wrote:
>>> On 8/19/2021 13:32, Gedare Bloom wrote:
>>>> On Thu, Aug 19, 2021 at 11:43 AM Kinsey Moore wrote:
>>>>> I've seen these failures on my local system, in our CI, and on a build
>>>>> server that I sometimes use for development/testing so if it's a
>>>>> configuration issue we're being pretty consistent about misconfiguration
>>>>> across some pretty different environments (docker, bare-metal, VM,
>>>>> different OSs, different QEMU versions). I've seen enough of the
>>>>> spintrcritical tests fail sporadically on QEMU to lump them all into this
>>>>> category. These are also tests that I have seen behave badly on ARMv7 QEMU
>>>>> on my local system (which doesn't rule out misconfiguration, but it's
>>>>> another data point).
>>>>>
>>>> Yes, for example, it may be a matter of qemu process counts spawned by
>>>> rtems-test, and the order in which tests get invoked could be a cause
>>>> for which ones don't work. I could easily see this happening, since
>>>> each test runtime will be fairly consistent, so you'll often see the
>>>> same tests running concurrently with each other. But, if you change
>>>> the order (e.g., by adding new tests), then we may see a new set of
>>>> sporadically failing testcases, will we just add those, or do we need
>>>> to re-examine this indeterminate set periodically? Who will maintain
>>>> this list? That's kind of the root of my concern here.
>>> I understand your concern about maintenance of the failure list and I don't
>>> have a good answer for you. I imagine going forward it would be a
>>> combination of the current stake-holders for a given BSP and anyone who
>>> watches the automated build output from Joel's runs for these kinds of
>>> issues.
>>>
>>> On the other hand if we don't mark those tests, people will get fatigued
>>> looking at the spurious failures and assume any new ones just fall into the
>>> same category as others. At that point is it even worth running the
>>> automated tests for that platform?
>>>
>>>>> As far as your worry about marking these indeterminate, they're only
>>>>> being marked as such for QEMU BSPs. The ZynqMP hardware BSP doesn't have
>>>>> these testing carve-outs and runs all these tests flawlessly.
>> Great, this is important.
>>
>>>>> These failures become much more common when there is otherwise load on
>>>>> the system and a lot of them disappear when you limit the tester to a
>>>>> single QEMU instance at a time.
>>>>>
>>>> I'm wondering if we should sacrifice testing speed for
>>>> coverage/quality. If throttling rtems-test leads to more reliable test
>>>> results, then it may be a better option than basically ignoring a
>>>> swath of our testsuite.
>>> That would certainly mitigate some of the failures, but you'd also have to
>>> guarantee nothing else is running on the system which could cause the same
>>> problem. I know at least some of the current automated runs operate on a
>>> shared system which can and does often have other intensive processes
>>> running on it. There are also the tests that are sporadic on QEMU even
>>> without additional load.
>> What is it in these tests when combined with qemu that causes the tests to
>> fail? Is there some relation to a real clock, some shared host resource or a
>> bug in qemu? I am concerned a simulator can vary like this based on the
>> host's load and it makes me wonder how people use it on machines to host a
>> number of VMs.
> I experienced very similar results on an ARMv7 BSP (not Zynq) and assumed
> that this was a known/accepted problem with QEMU when the same issues popped
> up on AArch64.

I think we have just ignored the issue. I know I have ignored it because of the
rabbit hole it is.

> My local system under no other load produces these failures for the
> Zynq A9 QEMU BSP:
> 
>     "failed": [
>     "spcpucounter01.exe",
>     "psxtimes01.exe",
>     "sp69.exe",
>     "psx12.exe",
>     "minimum.exe",
>     "dl06.exe",
>     "sptimecounter02.exe"
>     ],
> 
> minimum.exe 

We have discussed this test in the past and I think the end result from Joel was
an exit code of 0 meant it had passed but I am not sure the exit code is printed
because it is minimal. Maybe it should be changed to be a `no-run` type test?

> and dl06.exe are probably unrelated,

Yeap and that is one I should fix when I can find the time.

> but the remainder are in my problem set for AArch64 on QEMU.

OK.

> A run of the AArch64 ZynqMP ILP32 BSP produced these failures under the same
> conditions with all the test carve-outs removed:
> 
>     "failed": [
>     "psx12.exe",
>     "spcpucounter01.exe",
>     "sptimecounter01.exe",
>     "sptimecounter02.exe",
>     "sp04.exe"
>     ],
> 
> Because of my experience with the afor

Re: [PATCH] aarch64: Add tests that are failing intermittently

2021-08-26 Thread Kinsey Moore

On 8/20/2021 22:06, Chris Johns wrote:

On 21/8/21 2:38 am, Kinsey Moore wrote:

On 8/19/2021 18:03, Chris Johns wrote:

On 20/8/21 4:55 am, Kinsey Moore wrote:

On 8/19/2021 13:32, Gedare Bloom wrote:

On Thu, Aug 19, 2021 at 11:43 AM Kinsey Moore  wrote:

I've seen these failures on my local system, in our CI, and on a build
server that I sometimes use for development/testing so if it's a
configuration issue we're being pretty consistent about misconfiguration
across some pretty different environments (docker, bare-metal, VM,
different OSs, different QEMU versions). I've seen enough of the
spintrcritical tests fail sporadically on QEMU to lump them all into this
category. These are also tests that I have seen behave badly on ARMv7 QEMU
on my local system (which doesn't rule out misconfiguration, but it's
another data point).


Yes, for example, it may be a matter of qemu process counts spawned by
rtems-test, and the order in which tests get invoked could be a cause
for which ones don't work. I could easily see this happening, since
each test runtime will be fairly consistent, so you'll often see the
same tests running concurrently with each other. But, if you change
the order (e.g., by adding new tests), then we may see a new set of
sporadically failing testcases, will we just add those, or do we need
to re-examine this indeterminate set periodically? Who will maintain
this list? That's kind of the root of my concern here.

I understand your concern about maintenance of the failure list and I don't
have a good answer for you. I imagine going forward it would be a combination
of the current stake-holders for a given BSP and anyone who watches the
automated build output from Joel's runs for these kinds of issues.

On the other hand if we don't mark those tests, people will get fatigued
looking at the spurious failures and assume any new ones just fall into the
same category as others. At that point is it even worth running the
automated tests for that platform?


As far as your worry about marking these indeterminate, they're only
being marked as such for QEMU BSPs. The ZynqMP hardware BSP doesn't have
these testing carve-outs and runs all these tests flawlessly.

Great, this is important.


These failures become much more common when there is otherwise load on
the system and a lot of them disappear when you limit the tester to a
single QEMU instance at a time.


I'm wondering if we should sacrifice testing speed for
coverage/quality. If throttling rtems-test leads to more reliable test
results, then it may be a better option than basically ignoring a
swath of our testsuite.

That would certainly mitigate some of the failures, but you'd also have to
guarantee nothing else is running on the system which could cause the same
problem. I know at least some of the current automated runs operate on a
shared system which can and does often have other intensive processes
running on it. There are also the tests that are sporadic on QEMU even
without additional load.

What is it in these tests when combined with qemu that causes the tests to fail?
Is there some relation to a real clock, some shared host resource or a bug in
qemu? I am concerned a simulator can vary like this based on the host's load and
it makes me wonder how people use it on machines to host a number of VMs.

I experienced very similar results on an ARMv7 BSP (not Zynq) and assumed that
this was a known/accepted problem with QEMU when the same issues popped up on
AArch64.

I think we have just ignored the issue. I know I have ignored it because of the
rabbit hole it is.


My local system under no other load produces these failures for the
Zynq A9 QEMU BSP:

     "failed": [
     "spcpucounter01.exe",
     "psxtimes01.exe",
     "sp69.exe",
     "psx12.exe",
     "minimum.exe",
     "dl06.exe",
     "sptimecounter02.exe"
     ],

minimum.exe

We have discussed this test in the past and I think the end result from Joel was
an exit code of 0 meant it had passed but I am not sure the exit code is printed
because it is minimal. Maybe it should be changed to be a `no-run` type test?


and dl06.exe are probably unrelated,

Yeap and that is one I should fix when I can find the time.


but the remainder are in my problem set for AArch64 on QEMU.

OK.


A run of the AArch64 ZynqMP ILP32 BSP produced these failures under the same
conditions with all the test carve-outs removed:

     "failed": [
     "psx12.exe",
     "spcpucounter01.exe",
     "sptimecounter01.exe",
     "sptimecounter02.exe",
     "sp04.exe"
     ],

Because of my experience with the aforementioned ARMv7 BSP and the lack of
failures on hardware, I chose not to weed out the root cause of the failures
under QEMU.

Sure. It however leaves open the underlying question of why these fail with
QEMU, and so we are caught either way.


This patch is documentation of our observations across multi

Re: [PATCH] aarch64: Add tests that are failing intermittently

2021-08-27 Thread Chris Johns
On 27/8/21 9:36 am, Kinsey Moore wrote:
> On 8/20/2021 22:06, Chris Johns wrote:
>> On 21/8/21 2:38 am, Kinsey Moore wrote:
>>> On 8/19/2021 18:03, Chris Johns wrote:
>>> My comment in that regard was that other system loading (or multiple
>>> simultaneous test runs) can also cause the same problem and so this is
>>> only a partial solution. Barring a fix for RTEMS or QEMU for these
>>> load-dependent and sporadic failures, this at least still needs to be
>>> documented in some form.
>> Yes and the failures should highlight an issue on the host that needs to be
>> looked into.
> 
> Since I'm working on SMP and I've had some of those tests failing sporadically
> as well, I took a dive into smpschededf01.exe on AArch64 and the issue that
> particular test seems to be encountering is a mismatch between the busy wait
> delay using rtems_test_busy_cpu_usage() and the number of kernel ticks that
> have been experienced. My hypothesis is that QEMU is prone to dumping a pile
> of timer ticks into the virtual CPU all at once to catch up to wall time after
> returning from a context switch on the host OS. This would support the
> observation that failures are sporadic and increase under system load. I
> instrumented the code and can see that the loop in rtems_test_busy_cpu_usage()
> isn't running substantially between these tick interrupts if at all.

Oh that would confuse things.

> I guess my next step is seeing if QEMU has an option to run its timers closer
> to the illusion of metal instead of being based on the wall clock.

QEMU would need to handle instruction or a CPU timer to manage this.

Chris
___
devel mailing list
devel@rtems.org
http://lists.rtems.org/mailman/listinfo/devel

Re: [PATCH] aarch64: Add tests that are failing intermittently

2021-08-27 Thread Kinsey Moore

On 8/27/2021 19:01, Chris Johns wrote:

On 27/8/21 9:36 am, Kinsey Moore wrote:
Since I'm working on SMP and I've had some of those tests failing sporadically
as well, I took a dive into smpschededf01.exe on AArch64 and the issue that
particular test seems to be encountering is a mismatch between the busy wait
delay using rtems_test_busy_cpu_usage() and the number of kernel ticks that have
been experienced. My hypothesis is that QEMU is prone to dumping a pile of timer
ticks into the virtual CPU all at once to catch up to wall time after returning
from a context switch on the host OS. This would support the observation that
failures are sporadic and increase under system load.  I instrumented the code
and can see that the loop in rtems_test_busy_cpu_usage() isn't running
substantially between these tick interrupts if at all.
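
The hypothesis above can be illustrated with a toy model in Python. This is
only a sketch of the suspected mechanism, not QEMU's timer code and not the
implementation of rtems_test_busy_cpu_usage(). The function name, the
per-period slot model and the figure of 1000 loop iterations per tick are all
assumptions made up for illustration. Each slot is one tick period of wall
time. While the host has the virtual CPU descheduled, ticks are owed, and on
resume they are assumed to arrive back to back before the busy loop makes any
progress.

```python
def iterations_between_ticks(host_slots, iters_per_tick=1000):
    """Return the busy-loop iterations completed before each delivered tick.

    host_slots holds one boolean per tick period of wall time: True when the
    host is running the virtual CPU, False when it is descheduled.
    """
    gaps = []
    owed = 0  # ticks that came due while the vCPU was not running
    for running in host_slots:
        if not running:
            owed += 1
            continue
        # Assumed catch-up behaviour: owed ticks are delivered back to back
        # on resume, so the busy loop does no work between them.
        gaps.extend([0] * owed)
        owed = 0
        # Normal period: the loop runs for a full tick, then one tick fires.
        gaps.append(iters_per_tick)
    return gaps


# The host runs the vCPU for two periods, stalls it for three, then resumes.
slots = [True, True, False, False, False, True, True]
gaps = iterations_between_ticks(slots)
ticks_seen = len(gaps)
work_done = sum(gaps)
print("iterations before each tick:", gaps)
print("ticks seen:", ticks_seen, "work done:", work_done,
      "work expected:", ticks_seen * 1000)
```

In this model the guest sees every tick but completes less busy-loop work than
a delay calibrated in iterations per tick would expect, which is the kind of
mismatch described above. A longer or more frequent stall, as on a loaded
host, widens the gap.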

Oh that would confuse things.

I bumped RSB qemu locally from 5.2-rc1 to the 5.2.0 release and the behavior
got better, but it's still not great and will cause a failure rate of
approximately 30% with my stripped down and instrumented test. At least
it's better than the 90+% failure rate of 4.1.0 or 5.2-rc1. I previously had
QEMU 3.1.0 installed from the Debian buster package repo and it behaved
even better than the 5.2.0 release, so there was definitely some kind of
regression in the interim that got partially fixed.
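
A failure rate like the ones above can be estimated by running the same test
many times and counting the runs that do not finish cleanly. The Python
sketch below is a hypothetical harness, not the instrumented test or the
procedure actually used for these numbers. It assumes a passing run prints an
"*** END OF TEST" marker on stdout, and the command to run is left to the
caller. The demo command is a stand-in so the sketch runs without QEMU.

```python
import subprocess
import sys

# Assumption: a test that ran to completion prints this marker on stdout.
END_MARKER = "*** END OF TEST"


def failure_rate(cmd, runs, timeout=60):
    """Run cmd `runs` times and return the fraction of runs without the marker."""
    failures = 0
    for _ in range(runs):
        try:
            result = subprocess.run(
                cmd, capture_output=True, text=True, timeout=timeout
            )
            output = result.stdout
        except subprocess.TimeoutExpired:
            # A hung run counts as a failure.
            output = ""
        if END_MARKER not in output:
            failures += 1
    return failures / runs


if __name__ == "__main__":
    # Stand-in command; replace with the real QEMU invocation for the test.
    demo = [sys.executable, "-c", "print('*** END OF TEST DEMO ***')"]
    print("failure rate:", failure_rate(demo, runs=5))
```

Comparing QEMU versions this way needs enough runs per version for the rate to
settle, and the same host load for each, since load changes the result.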

I guess my next step is seeing if QEMU has an option to run its timers closer to
the illusion of metal instead of being based on the wall clock.

QEMU would need to handle instruction or a CPU timer to manage this.


There don't seem to be any options to manipulate this that I've found, 
but there are a couple of internal timer types. It looks like the QEMU 
virtual timers fall back to a QEMU realtime timer if the virtual timer 
hooks aren't available. I didn't see many of the virtual timer hooks 
defined in the QEMU codebase, so I assume that's what's happening since 
the timer definitions in QEMU for the ARM Generic Timers are of the 
virtual variety.


I'm not sure what can be done from this point beyond updating RSB QEMU 
to 5.2.0 release from 5.2-rc1 barring inordinate time spent in the 
bowels of QEMU.



Kinsey
