On 6/10/2025 12:09 PM, Daniel P. Berrangé wrote:
On Tue, Jun 10, 2025 at 10:08:49AM -0500, Jaehoon Kim wrote:
When the source VM attempts to connect to the destination VM's Unix
domain socket (cpr.sock) during a cpr-transfer test, race conditions can
occur if the socket file isn't ready. This can lead to connection
failures when running tests.

This patch creates and listens on the socket in advance, and passes the
pre-listened FD directly. This avoids timing issues and improves the
reliability of CPR tests.

Reviewed-by: Jason J. Herne <jjhe...@linux.ibm.com>
Signed-off-by: Jaehoon Kim <jh...@linux.ibm.com>

---
Changes since v1:
- In v1, the patch added a wait loop to poll the existence of the socket
   file (cpr_validate_socket_path()).

- This version instead creates the socket beforehand and passes its FD
   to the destination QEMU, eliminating the race condition entirely.

- Commit title and message changed accordingly.
---
  migration/cpr-transfer.c          |  3 +-
  tests/qtest/migration/cpr-tests.c | 72 ++++++++++++++++++++++++++++++-
  2 files changed, 72 insertions(+), 3 deletions(-)
diff --git a/tests/qtest/migration/cpr-tests.c 
b/tests/qtest/migration/cpr-tests.c
index 5536e14610..6f90160e21 100644
--- a/tests/qtest/migration/cpr-tests.c
+++ b/tests/qtest/migration/cpr-tests.c
@@ -50,6 +50,51 @@ static void *test_mode_transfer_start(QTestState *from, 
QTestState *to)
      return NULL;
  }
+/*
+ * Create a pre-listened UNIX domain socket at the specified path.
+ *
+ * This is used to eliminate a race condition that can occur
+ * intermittently in qtest during CPR tests. By pre-creating and
+ * listening on the socket, we avoid timing-related issues.
+ */
+static int setup_socket_listener(const char *path)
+{
+    struct sockaddr_un un;
+    size_t pathlen;
+    int sock_fd;
+
+    sock_fd = socket(PF_UNIX, SOCK_STREAM, 0);
+    if (sock_fd < 0) {
+        g_test_message("Failed to create Unix socket");
+        return -1;
+    }
+
+    pathlen = strlen(path);
+    if (pathlen >= sizeof(un.sun_path)) {
+        g_test_message("UNIX socket path '%s' is too long", path);
+        close(sock_fd);
+        return -1;
+    }
+
+    memset(&un, 0, sizeof(un));
+    un.sun_family = AF_UNIX;
+    strncpy(un.sun_path, path, sizeof(un.sun_path) - 1);
+
+    if (bind(sock_fd, (struct sockaddr *)&un, sizeof(un)) < 0) {
+        g_test_message("Failed to bind socket to %s", path);
+        close(sock_fd);
+        return -1;
+    }
+
+    if (listen(sock_fd, 1) < 0) {
+        g_test_message("Failed to listen on socket %s", path);
+        close(sock_fd);
+        return -1;
+    }
+
+    return sock_fd;
+}
This is effectively re-implementing 'unix_listen', so just use
that function.
I'll revise the patch to use the already defined function instead of re-implementing it.
Thanks for pointing it out.
@@ -75,6 +120,29 @@ static void test_mode_transfer_common(bool incoming_defer)
          "              'path': '%s' } } ]",
          mig_path);
+ /*
+     * Determine socket address type and value.
+     * If socket creation fails, provide the socket path to the target,
+     * so it can create the Unix domain socket itself.
+     * Otherwise, use the pre-listened socket file descriptor directly.
+     */
+    int cpr_sockfd = setup_socket_listener(cpr_path);
+
+    if (cpr_sockfd < 0) {
A failure of this function (or in future 'unix_listen') shouldn't
trigger any fallback logic - we should report it and fail thue
test.
I was considering both options, but I agree with you.
It's better to fail the test and report the error rather than trigger fallback logic.
I'll update the patch accordingly.

- Jaehoon Kim.
+        addr_type = g_strdup("unix");
+        addr_key = g_strdup("path");
+        addr_value = g_strdup(cpr_path);
+    } else {
+        addr_type = g_strdup("fd");
+        addr_key = g_strdup("str");
+        addr_value = g_strdup_printf("%d", cpr_sockfd);
+    }
+
+    opts_target = g_strdup_printf("-incoming cpr,addr.transport=socket,"
+                                  "addr.type=%s,addr.%s=%s %s",
+                                  addr_type, addr_key, addr_value, opts);
+
+
      MigrateCommon args = {
          .start.opts_source = opts,
          .start.opts_target = opts_target,
--
2.49.0


With regards,
Daniel

Reply via email to