Re: [PATCH v3 0/7] Switch iotests to using Async QMP
On Wed, Oct 13, 2021 at 10:49 AM Hanna Reitz wrote: > On 13.10.21 16:00, John Snow wrote: > > > > > > On Wed, Oct 13, 2021 at 8:51 AM John Snow wrote: > > > > > > > > On Wed, Oct 13, 2021 at 4:45 AM Hanna Reitz > wrote: > > > > On 13.10.21 00:34, John Snow wrote: > > > Based-on: <20211012214152.802483-1-js...@redhat.com> > > >[PULL 00/10] Python patches > > > GitLab: > > > https://gitlab.com/jsnow/qemu/-/commits/python-aqmp-iotest-wrapper > > > CI: https://gitlab.com/jsnow/qemu/-/pipelines/387210591 > > > > > > Hiya, > > > > > > This series continues where the last two AQMP series left > > off and adds a > > > synchronous 'legacy' wrapper around the new AQMP interface, > > then drops > > > it straight into iotests to prove that AQMP is functional > > and totally > > > cool and fine. The disruption and churn to iotests is pretty > > minimal. > > > > > > In the event that a regression happens and I am not > > physically proximate > > > to inflict damage upon, one may set the > > QEMU_PYTHON_LEGACY_QMP variable > > > to any non-empty string as it pleases you to engage the QMP > > machinery > > > you are used to. > > > > > > I'd like to try and get this committed early in the 6.2 > > development > > > cycle to give ample time to smooth over any possible > > regressions. I've > > > tested it locally and via gitlab CI, across Python versions > > 3.6 through > > > 3.10, and "worksforme". If something bad happens, we can > > revert the > > > actual switch-flip very trivially. > > > > So running iotests locally, I got one failure: > > > > $ TEST_DIR=/tmp/vdi-tests ./check -c writethrough -vdi 300 > > [...] > > 300 fail [10:28:06] [10:28:11] > > 5.1s output mismatch (see 300.out.bad) > > --- /home/maxx/projects/qemu/tests/qemu-iotests/300.out > > +++ 300.out.bad > > @@ -1,4 +1,5 @@ > > -... > > > +..ERROR:qemu.aqmp.qmp_client.qemu-b-222963:Task.Reader: > > > > ConnectionResetError: [Errno 104] Connection reset by peer > > +. > > >-- > > Ran 39 tests > > [...] > > > > > > Oh, unfortunate. > > > > > > I’m afraid I can’t really give a reproducer or anything. It > > feels like > > > > > > Thank you for the report! > > > > just some random spurious timing-related error. Although then > > again, > > 300 does have an `except machine.AbnormalShutdown` clause at one > > point... So perhaps that’s the culprit, and we need to > > disable logging > > there. > > > > > > I'll investigate! > > > > > > Unfortunately, even in a loop some 150 times I couldn't reproduce this > > one. As you point out, it appears to be just a failure caused by > > logging. The test logic itself completes as expected. > > > > Still, I would expect, on a "clean" shutdown of the destination host > > (where the destination process fails to load the migration stream and > > voluntarily exits with an error code) to end with a FIN/ACK for TCP or > > ... uh, whatever happens for a UNIX socket. Where's the Connection > > Reset coming from? Did the destination VM process *crash*? > > > > I'm not so sure that I *should* silence this error, but I also can't > > reproduce it at all to answer these questions, so uh. uhhh. I guess I > > will just hammer it on a loop a few hundred times more and see if I > > get lucky. > > I could reproduce it, by running 20 instances concurrently. (Needs a > change to testrunner.py, so that the reference outputs don’t collide: > > diff --git a/tests/qemu-iotests/testrunner.py > b/tests/qemu-iotests/testrunner.py > index a56b6da396..fd0a3a1eeb 100644 > --- a/tests/qemu-iotests/testrunner.py > +++ b/tests/qemu-iotests/testrunner.py > @@ -221,7 +221,7 @@ def find_reference(self, test: str) -> str: > > def do_run_test(self, test: str) -> TestResult: > f_test = Path(test) > -f_bad = Path(f_test.name + '.out.bad') > +f_bad = Path(f'{os.getpid()}-{f_test.name}.out.bad') > f_notrun = Path(f_test.name + '.notrun') > f_casenotrun = Path(f_test.name + '.casenotrun') > f_reference = Path(self.find_reference(test)) > > ) > > And then: > > $ while TEST_DIR=/tmp/vdi-$$ ./check -vdi 300; do; done > > Which pretty quickly shows the error in at least one of those loops > (under a minute). > > As far as I can tell, changing the log level in 300 does indeed fix it: > > diff --git a/tests/qemu-iotests/300 b/tests/qemu-iotests/300 > in
Re: [PATCH v3 0/7] Switch iotests to using Async QMP
On 13.10.21 16:00, John Snow wrote: On Wed, Oct 13, 2021 at 8:51 AM John Snow wrote: On Wed, Oct 13, 2021 at 4:45 AM Hanna Reitz wrote: On 13.10.21 00:34, John Snow wrote: > Based-on: <20211012214152.802483-1-js...@redhat.com> > [PULL 00/10] Python patches > GitLab: https://gitlab.com/jsnow/qemu/-/commits/python-aqmp-iotest-wrapper > CI: https://gitlab.com/jsnow/qemu/-/pipelines/387210591 > > Hiya, > > This series continues where the last two AQMP series left off and adds a > synchronous 'legacy' wrapper around the new AQMP interface, then drops > it straight into iotests to prove that AQMP is functional and totally > cool and fine. The disruption and churn to iotests is pretty minimal. > > In the event that a regression happens and I am not physically proximate > to inflict damage upon, one may set the QEMU_PYTHON_LEGACY_QMP variable > to any non-empty string as it pleases you to engage the QMP machinery > you are used to. > > I'd like to try and get this committed early in the 6.2 development > cycle to give ample time to smooth over any possible regressions. I've > tested it locally and via gitlab CI, across Python versions 3.6 through > 3.10, and "worksforme". If something bad happens, we can revert the > actual switch-flip very trivially. So running iotests locally, I got one failure: $ TEST_DIR=/tmp/vdi-tests ./check -c writethrough -vdi 300 [...] 300 fail [10:28:06] [10:28:11] 5.1s output mismatch (see 300.out.bad) --- /home/maxx/projects/qemu/tests/qemu-iotests/300.out +++ 300.out.bad @@ -1,4 +1,5 @@ -... +..ERROR:qemu.aqmp.qmp_client.qemu-b-222963:Task.Reader: ConnectionResetError: [Errno 104] Connection reset by peer +. -- Ran 39 tests [...] Oh, unfortunate. I’m afraid I can’t really give a reproducer or anything. It feels like Thank you for the report! just some random spurious timing-related error. Although then again, 300 does have an `except machine.AbnormalShutdown` clause at one point... So perhaps that’s the culprit, and we need to disable logging there. I'll investigate! Unfortunately, even in a loop some 150 times I couldn't reproduce this one. As you point out, it appears to be just a failure caused by logging. The test logic itself completes as expected. Still, I would expect, on a "clean" shutdown of the destination host (where the destination process fails to load the migration stream and voluntarily exits with an error code) to end with a FIN/ACK for TCP or ... uh, whatever happens for a UNIX socket. Where's the Connection Reset coming from? Did the destination VM process *crash*? I'm not so sure that I *should* silence this error, but I also can't reproduce it at all to answer these questions, so uh. uhhh. I guess I will just hammer it on a loop a few hundred times more and see if I get lucky. I could reproduce it, by running 20 instances concurrently. (Needs a change to testrunner.py, so that the reference outputs don’t collide: diff --git a/tests/qemu-iotests/testrunner.py b/tests/qemu-iotests/testrunner.py index a56b6da396..fd0a3a1eeb 100644 --- a/tests/qemu-iotests/testrunner.py +++ b/tests/qemu-iotests/testrunner.py @@ -221,7 +221,7 @@ def find_reference(self, test: str) -> str: def do_run_test(self, test: str) -> TestResult: f_test = Path(test) - f_bad = Path(f_test.name + '.out.bad') + f_bad = Path(f'{os.getpid()}-{f_test.name}.out.bad') f_notrun = Path(f_test.name + '.notrun') f_casenotrun = Path(f_test.name + '.casenotrun') f_reference = Path(self.find_reference(test)) ) And then: $ while TEST_DIR=/tmp/vdi-$$ ./check -vdi 300; do; done Which pretty quickly shows the error in at least one of those loops (under a minute). As far as I can tell, changing the log level in 300 does indeed fix it: diff --git a/tests/qemu-iotests/300 b/tests/qemu-iotests/300 index 10f9f2a8da..096f5dabf0 100755 --- a/tests/qemu-iotests/300 +++ b/tests/qemu-iotests/300 @@ -27,6 +27,7 @@ from typing import Dict, List, Optional from qemu.machine import machine import iotests +from iotests import change_log_level BlockBitmapMapping = List[Dict[str, object]] @@ -464,7 +465,8 @@ class TestBlockBitmapMappingErrors(TestDirtyBitmapMigration): # Expect abnormal shutdown of the destination VM because of
Re: [PATCH v3 0/7] Switch iotests to using Async QMP
On Wed, Oct 13, 2021 at 8:51 AM John Snow wrote: > > > On Wed, Oct 13, 2021 at 4:45 AM Hanna Reitz wrote: > >> On 13.10.21 00:34, John Snow wrote: >> > Based-on: <20211012214152.802483-1-js...@redhat.com> >> >[PULL 00/10] Python patches >> > GitLab: >> https://gitlab.com/jsnow/qemu/-/commits/python-aqmp-iotest-wrapper >> > CI: https://gitlab.com/jsnow/qemu/-/pipelines/387210591 >> > >> > Hiya, >> > >> > This series continues where the last two AQMP series left off and adds a >> > synchronous 'legacy' wrapper around the new AQMP interface, then drops >> > it straight into iotests to prove that AQMP is functional and totally >> > cool and fine. The disruption and churn to iotests is pretty minimal. >> > >> > In the event that a regression happens and I am not physically proximate >> > to inflict damage upon, one may set the QEMU_PYTHON_LEGACY_QMP variable >> > to any non-empty string as it pleases you to engage the QMP machinery >> > you are used to. >> > >> > I'd like to try and get this committed early in the 6.2 development >> > cycle to give ample time to smooth over any possible regressions. I've >> > tested it locally and via gitlab CI, across Python versions 3.6 through >> > 3.10, and "worksforme". If something bad happens, we can revert the >> > actual switch-flip very trivially. >> >> So running iotests locally, I got one failure: >> >> $ TEST_DIR=/tmp/vdi-tests ./check -c writethrough -vdi 300 >> [...] >> 300 fail [10:28:06] [10:28:11] >> 5.1s output mismatch (see 300.out.bad) >> --- /home/maxx/projects/qemu/tests/qemu-iotests/300.out >> +++ 300.out.bad >> @@ -1,4 +1,5 @@ >> -... >> +..ERROR:qemu.aqmp.qmp_client.qemu-b-222963:Task.Reader: >> ConnectionResetError: [Errno 104] Connection reset by peer >> +. >> -- >> Ran 39 tests >> [...] >> >> > Oh, unfortunate. > > >> >> I’m afraid I can’t really give a reproducer or anything. It feels like >> > > Thank you for the report! > > >> just some random spurious timing-related error. Although then again, >> 300 does have an `except machine.AbnormalShutdown` clause at one >> point... So perhaps that’s the culprit, and we need to disable logging >> there. >> >> > I'll investigate! > Unfortunately, even in a loop some 150 times I couldn't reproduce this one. As you point out, it appears to be just a failure caused by logging. The test logic itself completes as expected. Still, I would expect, on a "clean" shutdown of the destination host (where the destination process fails to load the migration stream and voluntarily exits with an error code) to end with a FIN/ACK for TCP or ... uh, whatever happens for a UNIX socket. Where's the Connection Reset coming from? Did the destination VM process *crash*? I'm not so sure that I *should* silence this error, but I also can't reproduce it at all to answer these questions, so uh. uhhh. I guess I will just hammer it on a loop a few hundred times more and see if I get lucky.
Re: [PATCH v3 0/7] Switch iotests to using Async QMP
On Wed, Oct 13, 2021 at 4:45 AM Hanna Reitz wrote: > On 13.10.21 00:34, John Snow wrote: > > Based-on: <20211012214152.802483-1-js...@redhat.com> > >[PULL 00/10] Python patches > > GitLab: > https://gitlab.com/jsnow/qemu/-/commits/python-aqmp-iotest-wrapper > > CI: https://gitlab.com/jsnow/qemu/-/pipelines/387210591 > > > > Hiya, > > > > This series continues where the last two AQMP series left off and adds a > > synchronous 'legacy' wrapper around the new AQMP interface, then drops > > it straight into iotests to prove that AQMP is functional and totally > > cool and fine. The disruption and churn to iotests is pretty minimal. > > > > In the event that a regression happens and I am not physically proximate > > to inflict damage upon, one may set the QEMU_PYTHON_LEGACY_QMP variable > > to any non-empty string as it pleases you to engage the QMP machinery > > you are used to. > > > > I'd like to try and get this committed early in the 6.2 development > > cycle to give ample time to smooth over any possible regressions. I've > > tested it locally and via gitlab CI, across Python versions 3.6 through > > 3.10, and "worksforme". If something bad happens, we can revert the > > actual switch-flip very trivially. > > So running iotests locally, I got one failure: > > $ TEST_DIR=/tmp/vdi-tests ./check -c writethrough -vdi 300 > [...] > 300 fail [10:28:06] [10:28:11] > 5.1s output mismatch (see 300.out.bad) > --- /home/maxx/projects/qemu/tests/qemu-iotests/300.out > +++ 300.out.bad > @@ -1,4 +1,5 @@ > -... > +..ERROR:qemu.aqmp.qmp_client.qemu-b-222963:Task.Reader: > ConnectionResetError: [Errno 104] Connection reset by peer > +. > -- > Ran 39 tests > [...] > > Oh, unfortunate. > > I’m afraid I can’t really give a reproducer or anything. It feels like > Thank you for the report! > just some random spurious timing-related error. Although then again, > 300 does have an `except machine.AbnormalShutdown` clause at one > point... So perhaps that’s the culprit, and we need to disable logging > there. > > I'll investigate!
Re: [PATCH v3 0/7] Switch iotests to using Async QMP
On 13.10.21 00:34, John Snow wrote: Based-on: <20211012214152.802483-1-js...@redhat.com> [PULL 00/10] Python patches GitLab: https://gitlab.com/jsnow/qemu/-/commits/python-aqmp-iotest-wrapper CI: https://gitlab.com/jsnow/qemu/-/pipelines/387210591 Hiya, This series continues where the last two AQMP series left off and adds a synchronous 'legacy' wrapper around the new AQMP interface, then drops it straight into iotests to prove that AQMP is functional and totally cool and fine. The disruption and churn to iotests is pretty minimal. In the event that a regression happens and I am not physically proximate to inflict damage upon, one may set the QEMU_PYTHON_LEGACY_QMP variable to any non-empty string as it pleases you to engage the QMP machinery you are used to. I'd like to try and get this committed early in the 6.2 development cycle to give ample time to smooth over any possible regressions. I've tested it locally and via gitlab CI, across Python versions 3.6 through 3.10, and "worksforme". If something bad happens, we can revert the actual switch-flip very trivially. So running iotests locally, I got one failure: $ TEST_DIR=/tmp/vdi-tests ./check -c writethrough -vdi 300 [...] 300 fail [10:28:06] [10:28:11] 5.1s output mismatch (see 300.out.bad) --- /home/maxx/projects/qemu/tests/qemu-iotests/300.out +++ 300.out.bad @@ -1,4 +1,5 @@ -... +..ERROR:qemu.aqmp.qmp_client.qemu-b-222963:Task.Reader: ConnectionResetError: [Errno 104] Connection reset by peer +. -- Ran 39 tests [...] I’m afraid I can’t really give a reproducer or anything. It feels like just some random spurious timing-related error. Although then again, 300 does have an `except machine.AbnormalShutdown` clause at one point... So perhaps that’s the culprit, and we need to disable logging there. Hanna