On Tue, Jun 10, 2025 at 10:08:49AM -0500, Jaehoon Kim wrote:
> When the source VM attempts to connect to the destination VM's Unix
> domain socket (cpr.sock) during a cpr-transfer test, race conditions can
> occur if the socket file isn't ready. This can lead to connection
> failures when running tests.
>
> This patch creates and listens on the socket in advance, and passes the
> pre-listened FD directly. This avoids timing issues and improves the
> reliability of CPR tests.
>
> Reviewed-by: Jason J. Herne <[email protected]>
> Signed-off-by: Jaehoon Kim <[email protected]>
>
> ---
> Changes since v1:
> - In v1, the patch added a wait loop to poll the existence of the socket
> file (cpr_validate_socket_path()).
>
> - This version instead creates the socket beforehand and passes its FD
> to the destination QEMU, eliminating the race condition entirely.
>
> - Commit title and message changed accordingly.
> ---
> migration/cpr-transfer.c | 3 +-
> tests/qtest/migration/cpr-tests.c | 72 ++++++++++++++++++++++++++++++-
> 2 files changed, 72 insertions(+), 3 deletions(-)
> diff --git a/tests/qtest/migration/cpr-tests.c
> b/tests/qtest/migration/cpr-tests.c
> index 5536e14610..6f90160e21 100644
> --- a/tests/qtest/migration/cpr-tests.c
> +++ b/tests/qtest/migration/cpr-tests.c
> @@ -50,6 +50,51 @@ static void *test_mode_transfer_start(QTestState *from,
> QTestState *to)
> return NULL;
> }
>
> +/*
> + * Create a pre-listened UNIX domain socket at the specified path.
> + *
> + * This is used to eliminate a race condition that can occur
> + * intermittently in qtest during CPR tests. By pre-creating and
> + * listening on the socket, we avoid timing-related issues.
> + */
> +static int setup_socket_listener(const char *path)
> +{
> + struct sockaddr_un un;
> + size_t pathlen;
> + int sock_fd;
> +
> + sock_fd = socket(PF_UNIX, SOCK_STREAM, 0);
> + if (sock_fd < 0) {
> + g_test_message("Failed to create Unix socket");
> + return -1;
> + }
> +
> + pathlen = strlen(path);
> + if (pathlen >= sizeof(un.sun_path)) {
> + g_test_message("UNIX socket path '%s' is too long", path);
> + close(sock_fd);
> + return -1;
> + }
> +
> + memset(&un, 0, sizeof(un));
> + un.sun_family = AF_UNIX;
> + strncpy(un.sun_path, path, sizeof(un.sun_path) - 1);
> +
> + if (bind(sock_fd, (struct sockaddr *)&un, sizeof(un)) < 0) {
> + g_test_message("Failed to bind socket to %s", path);
> + close(sock_fd);
> + return -1;
> + }
> +
> + if (listen(sock_fd, 1) < 0) {
> + g_test_message("Failed to listen on socket %s", path);
> + close(sock_fd);
> + return -1;
> + }
> +
> + return sock_fd;
> +}
This is effectively re-implementing 'unix_listen', so just use
that function.
> @@ -75,6 +120,29 @@ static void test_mode_transfer_common(bool incoming_defer)
> " 'path': '%s' } } ]",
> mig_path);
>
> + /*
> + * Determine socket address type and value.
> + * If socket creation fails, provide the socket path to the target,
> + * so it can create the Unix domain socket itself.
> + * Otherwise, use the pre-listened socket file descriptor directly.
> + */
> + int cpr_sockfd = setup_socket_listener(cpr_path);
> +
> + if (cpr_sockfd < 0) {
A failure of this function (or in future 'unix_listen') shouldn't
trigger any fallback logic - we should report it and fail thue
test.
> + addr_type = g_strdup("unix");
> + addr_key = g_strdup("path");
> + addr_value = g_strdup(cpr_path);
> + } else {
> + addr_type = g_strdup("fd");
> + addr_key = g_strdup("str");
> + addr_value = g_strdup_printf("%d", cpr_sockfd);
> + }
> +
> + opts_target = g_strdup_printf("-incoming cpr,addr.transport=socket,"
> + "addr.type=%s,addr.%s=%s %s",
> + addr_type, addr_key, addr_value, opts);
> +
> +
> MigrateCommon args = {
> .start.opts_source = opts,
> .start.opts_target = opts_target,
> --
> 2.49.0
>
>
With regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|