Hi Daniel,
Thanks a lot for review and the suggestions.
On 9/8/25 08:49, Daniel P. Berrangé wrote:
On Thu, Sep 04, 2025 at 03:46:35PM +0000, Gustavo Romero wrote:
In this series, we leveraged the run-test.py script used in the
check-tcg tests, making it a GDB runner capable of calling a test script
without spawning any VMs. In this configuration, the test scripts can
manage the VM and also import gdb, making the GDB Python API inside the
functional test scripts.
A --quiet option has been added to run-test.py so it doesn't print the
command line used to execute GDB to the stdout. This ensures that users
don't get confused about how to re-run the tests. One can re-run the
test simply by copying and pasting the command line shown by Meson when
V=1 is passed:
$ make -j check-functional V=1
or, alternatively, once the test run completes, the exact command found
in the 'command:' field of the build/meson-logs/testlog-thorough.txt
file generated by Meson. Both methods provide the correct environment
variables required to run the test, such as the proper $PYTHONPATH.
While I like the conceptual idea of just sending human GDB commands,
instead of working with GDB protocol packets, I really dislike the
effect this has on the execution / startup of the functional tests
via use of the custom runner for a number of reasons
* The command line for launching the test outside of meson is very
complicated, so not memorable
Why very complicated? It calls a simple runner instead of calling the
test script directly, but it doesn't change the way to re-run a single
test. One just have to pass V=1 to see make's command line and copy
and paste the full command line to re-run the test. I mentioned
inspecting 'testlog-thorough.txt' just for completeness.
* It makes the meson.build rules much more complicated
Do we want to never augment functional tests' meson.build? Nothing
complicated is being added. Basically, just a new variable suffixed with
'_with_runner' which holds a tuple (test, runner) that tell the test
to be executed, following the same logic we already have for all the other
variables that specify the tests per arch/mode/speed.
Another option would be to select a runner based on a suffix in the test
name, for instance, 'reverse_debug_with_runner.py'.
* Running standalone there is no TAP output available making the
test hard to debug on failure or timeout
This is because of an unfortunate GDB Python API issue, please see my
reply in your comment on patch 5/5. This can be solved but needs more
investigation on GDB side.
I understand the need to spawn the test via gdb, in order to be able
to import the 'gdb' python module. Looking at what reverse_debugging.py
does, however, makes me question whether we actually need to directly
use the 'gdb' python module.
The only APIs we use are 'gdb.execute' and 'gdb.parse_and_eval'.
The latter is only used once as
gdb.parse_and_eval("$pc")
and I believe that can be changed to
gdb.execute("printf \"0x%x\", $pc", to_string=True)
IOW, all we need is 'gdb.execute("....", to_string=True)'
Yes, I do want to directly use the 'gdb' python module directly in the
tests. We shouldn't look at a solution only for reverse_debug.py but also
think of any future tests that will require the GDB Python API, so I don't
want to specialize here and reduce the API to a single method.
With a little extra helper proxy script, we can achieve this without
changing the way scripts are launched.
The script needs to listen on a UNIX socket path. When a client
connects, it should read lines of data from the client and pass
them to 'gdb.execute(..., to_string=True)' and whatever data
gdb returns should be written back to the client.
A very very crude example with no error handling would be:
#!/usr/bin/python3
import gdb
import os
import socket
sock = os.environ.get("QEMU_PROXY", "/tmp/qemu.gdb.proxy")
try:
os.unlink(sock)
except:
pass
with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
s.bind(sock)
s.listen()
conn, addr = s.accept()
fh = conn.makefile('rw')
with conn:
while True:
line = fh.readline()
if not line:
break
data = gdb.execute(line, to_string=True)
fh.write(data)
fh.flush()
In the functional test suite, we should have a helper file
tests/functional/qemu_test/gdb.py that provides an API for
launching GDB to execute this proxy script, and an API to
execute commands by talking over this UNIX socket path.
With this, we will need no changes in the way we execute the
reverse debugging script from a test runner POV, thus avoiding
all the downsides of use of the run-test.py script. IOW, the
first 4 patches in this series go away completely. Instead we
need a patch to create the proxy script and a patch to create
the helper APIs in tests/functional/qemu_test/gdb.py, whereupon
the last patch can replace
try:
import gdb
except ModuleNotFoundError:
from sys import exit
exit("This script must be launched via tests/guest-debug/run-test.py!")
with
from qemu_test import gdb
and the earlier mentioned replacement of parse_and_eval()
For the sake of not adding a few lines into meson.build, we are going
to design a new ad-hoc API for the functional tests on top of the GDB
Python API, which will communicate with the test script via a socket
and will _still require a runner anyway_ (just now hidden under a
module/API)? This is far more complicated than having a simple runner
to call GDB and pass the test script.
In fact, I think that if the test script had any clue in its name,
like the "_with_runner" suffix I mentioned above, maybe Meson's could
take care of calling GDB itself without calling any runner. Would that
address your first comment in the bullets (and maybe the second one too,
but not sure without trying it) and get this series accepted by you,
since the third one, about the exitcode, is related to GDB's odd behavior?
Cheers,
Gustavo