[PATCH net-next v4] vsock/test: Add test for null ptr deref when transport changes

Luigi Leonardi Tue, 24 Jun 2025 08:43:27 -0700

Add a new test to ensure that a null pointer dereference does not occur
when the transport changes. The bug was reported upstream [1] and fixed
with commit 2cb7c756f605 ("vsock/virtio: discard packets if the
transport changes").


KASAN: null-ptr-deref in range [0x0000000000000060-0x0000000000000067]
CPU: 2 UID: 0 PID: 463 Comm: kworker/2:3 Not tainted
Workqueue: vsock-loopback vsock_loopback_work
RIP: 0010:vsock_stream_has_data+0x44/0x70
Call Trace:
 virtio_transport_do_close+0x68/0x1a0
 virtio_transport_recv_pkt+0x1045/0x2ae4
 vsock_loopback_work+0x27d/0x3f0
 process_one_work+0x846/0x1420
 worker_thread+0x5b3/0xf80
 kthread+0x35a/0x700
 ret_from_fork+0x2d/0x70
 ret_from_fork_asm+0x1a/0x30

Note that this test may not fail in a kernel without the fix, but it may
hang on the client side if it triggers a kernel oops.

This works by creating a socket, trying to connect to a server, and then
executing a second connect operation on the same socket but to a
different CID (0), which triggers a transport change. If the first
connect operation is interrupted by a signal, this can cause a
null-ptr-deref.

Since this bug is non-deterministic, the test retries in a loop until
TRANSPORT_CHANGE_TIMEOUT (2 seconds) expires. It is reasonable to assume
that the bug will show up within that period.

If there is a G2H transport loaded in the system, the bug is not
triggered and this test will always pass.
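The client detects this case with a bitmask check on the value returned by get_transports(). Below is a minimal sketch of that logic. The bit values are hypothetical placeholders and are not necessarily the suite's real TRANSPORT_* definitions. Only the masking mirrors the patch: loopback is ignored, and anything other than "nothing" or "vhost only" is treated as a possible G2H transport. This is best effort, because VMCI can act as either G2H or H2G.

```c
#include <stdbool.h>

/* Hypothetical bit assignments; the real values live in the test suite. */
enum {
	SKETCH_TRANSPORT_LOOPBACK = 1u << 0,
	SKETCH_TRANSPORT_VIRTIO   = 1u << 1,
	SKETCH_TRANSPORT_VHOST    = 1u << 2,
	SKETCH_TRANSPORT_VMCI     = 1u << 3,
	SKETCH_TRANSPORT_HYPERV   = 1u << 4,
};

/* Returns true when a G2H transport may be loaded, in which case the
 * transport change test cannot trigger the bug and will simply pass. */
static bool g2h_maybe_loaded(unsigned int transports)
{
	/* Loopback does not interfere with the bug, so ignore it. */
	transports &= ~SKETCH_TRANSPORT_LOOPBACK;
	return transports != 0 && transports != SKETCH_TRANSPORT_VHOST;
}
```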

[1]https://lore.kernel.org/netdev/Z2LvdTTQR7dBmPb5@v4bel-B760M-AORUS-ELITE-AX/

Suggested-by: Hyunwoo Kim <[email protected]>
Suggested-by: Michal Luczaj <[email protected]>
Signed-off-by: Luigi Leonardi <[email protected]>
---
This series introduces a new test that checks for a null pointer 
dereference that may happen when there is a transport change[1]. This 
bug was fixed in [2].

Note that this test *cannot* fail; it hangs if it triggers a kernel
oops. The intended use case is to run it and then check dmesg for any
oops.

This test is based on Hyunwoo Kim's [3] and Michal's [4] Python
reproducers.

[1]https://lore.kernel.org/netdev/Z2LvdTTQR7dBmPb5@v4bel-B760M-AORUS-ELITE-AX/
[2]https://lore.kernel.org/netdev/[email protected]/
[3]https://lore.kernel.org/netdev/Z2LvdTTQR7dBmPb5@v4bel-B760M-AORUS-ELITE-AX/#t
[4]https://lore.kernel.org/netdev/[email protected]/
---
Changes in v4:
- Addressed Stefano's comments:
    - Minor style changes
    - Use `get_transports()` to print a warning when a G2H transport is
      loaded
    - Removed check on second connect: because the first connect is
      interrupted, the socket is in an unspecified state (see man connect).
      This can cause strange and unexpected behaviors (connect returning
      success on a non-existing CID).

- Link to v3: https://lore.kernel.org/r/[email protected]

Sorry, this took waaay longer than expected.

Changes in v3:
- Addressed Stefano's and Michal's comments:
    - Added the splat text to the commit message.
    - Added the hash of the commit that fixes the bug.
    - Not using perror anymore on pthread_* functions.
    - The listener is now created only once.

- Link to v2: https://lore.kernel.org/r/[email protected]

Changes in v2:
- Addressed Stefano's comments:
    - Timeout is now using current_nsec()
    - Check for return values
    - Style issues
- Added Hyunwoo Kim to Suggested-by
- Link to v1: https://lore.kernel.org/r/[email protected]
---
 tools/testing/vsock/Makefile     |   1 +
 tools/testing/vsock/vsock_test.c | 178 +++++++++++++++++++++++++++++++++++++++
 2 files changed, 179 insertions(+)

diff --git a/tools/testing/vsock/Makefile b/tools/testing/vsock/Makefile
index 6e0b4e95e230500f99bb9c74350701a037ecd198..88211fd132d23ecdfd56ab0815580a237889e7f2 100644
--- a/tools/testing/vsock/Makefile
+++ b/tools/testing/vsock/Makefile
@@ -5,6 +5,7 @@ vsock_test: vsock_test.o vsock_test_zerocopy.o timeout.o control.o util.o msg_ze
 vsock_diag_test: vsock_diag_test.o timeout.o control.o util.o
 vsock_perf: vsock_perf.o msg_zerocopy_common.o
 
+vsock_test: LDLIBS = -lpthread
 vsock_uring_test: LDLIBS = -luring
 vsock_uring_test: control.o util.o vsock_uring_test.o timeout.o msg_zerocopy_common.o
 
diff --git a/tools/testing/vsock/vsock_test.c b/tools/testing/vsock/vsock_test.c
index eb6f54378667ac7ed324f4823e988ec9846e41a3..095705c7b53b7ad38ab3b8bc3cbe54a9eeb76d5c 100644
--- a/tools/testing/vsock/vsock_test.c
+++ b/tools/testing/vsock/vsock_test.c
@@ -22,6 +22,8 @@
 #include <signal.h>
 #include <sys/ioctl.h>
 #include <linux/time64.h>
+#include <pthread.h>
+#include <fcntl.h>
 
 #include "vsock_test_zerocopy.h"
 #include "timeout.h"
@@ -1867,6 +1869,177 @@ static void test_stream_connect_retry_server(const struct test_opts *opts)
        close(fd);
 }
 
+#define TRANSPORT_CHANGE_TIMEOUT 2 /* seconds */
+
+static void *test_stream_transport_change_thread(void *vargp)
+{
+       pid_t *pid = (pid_t *)vargp;
+       int ret;
+
+       /* We want this thread to terminate as soon as possible */
+       ret = pthread_setcanceltype(PTHREAD_CANCEL_ASYNCHRONOUS, NULL);
+       if (ret) {
+               fprintf(stderr, "pthread_setcanceltype: %d\n", ret);
+               exit(EXIT_FAILURE);
+       }
+
+       while (true) {
+               if (kill(*pid, SIGUSR1) < 0) {
+                       perror("kill");
+                       exit(EXIT_FAILURE);
+               }
+       }
+       return NULL;
+}
+
+static void test_transport_change_signal_handler(int signal)
+{
+       /* We need a custom handler for SIGUSR1 as the default one terminates the process. */
+}
+
+static void test_stream_transport_change_client(const struct test_opts *opts)
+{
+       __sighandler_t old_handler;
+       pid_t pid = getpid();
+       pthread_t thread_id;
+       time_t tout;
+       int ret, tr;
+
+       tr = get_transports();
+       /* Print a warning if there is a G2H transport loaded.
+        * This is on a best effort basis because VMCI can be either G2H or H2G, and there is
+        * no easy way to tell which.
+        * The bug is present in the loopback transport. However, it does not interfere
+        * if it is loaded.
+        * The bug we are testing only appears when G2H transports are not loaded.
+        */
+
+       tr &= ~TRANSPORT_LOOPBACK;
+       if (tr != 0 && tr != TRANSPORT_VHOST)
+               fprintf(stderr, "G2H Transport detected. This test will not fail.\n");
+
+       old_handler = signal(SIGUSR1, test_transport_change_signal_handler);
+       if (old_handler == SIG_ERR) {
+               perror("signal");
+               exit(EXIT_FAILURE);
+       }
+
+       ret = pthread_create(&thread_id, NULL, test_stream_transport_change_thread, &pid);
+       if (ret) {
+               fprintf(stderr, "pthread_create: %d\n", ret);
+               exit(EXIT_FAILURE);
+       }
+
+       control_expectln("LISTENING");
+
+       tout = current_nsec() + TRANSPORT_CHANGE_TIMEOUT * NSEC_PER_SEC;
+       do {
+               struct sockaddr_vm sa = {
+                       .svm_family = AF_VSOCK,
+                       .svm_cid = opts->peer_cid,
+                       .svm_port = opts->peer_port,
+               };
+               int s;
+
+               s = socket(AF_VSOCK, SOCK_STREAM, 0);
+               if (s < 0) {
+                       perror("socket");
+                       exit(EXIT_FAILURE);
+               }
+
+               ret = connect(s, (struct sockaddr *)&sa, sizeof(sa));
+               /* The connect can fail due to signals coming from the thread,
+                * or because the receiver connection queue is full.
+                * The latter case is also ignored because there is no way
+                * of synchronizing the client's connect and the server's accept
+                * when connects are constantly being interrupted by signals.
+                */
+               if (ret == -1 && (errno != EINTR && errno != ECONNRESET)) {
+                       perror("connect");
+                       exit(EXIT_FAILURE);
+               }
+
+               /* Setting the CID to 0 causes a transport change. */
+               sa.svm_cid = 0;
+               /* Here we ignore the connect return value because we cannot
+                * safely assume that it will *always* fail.
+                * This is because the previous connect was interrupted
+                * during the connection process. The socket state, as stated
+                * in `man connect`, is unspecified and can result in strange
+                * behaviors.
+                */
+               connect(s, (struct sockaddr *)&sa, sizeof(sa));
+
+               close(s);
+
+               control_writeulong(CONTROL_CONTINUE);
+
+       } while (current_nsec() < tout);
+
+       control_writeulong(CONTROL_DONE);
+
+       ret = pthread_cancel(thread_id);
+       if (ret) {
+               fprintf(stderr, "pthread_cancel: %d\n", ret);
+               exit(EXIT_FAILURE);
+       }
+
+       /* Wait for the thread to terminate */
+       ret = pthread_join(thread_id, NULL);
+       if (ret) {
+               fprintf(stderr, "pthread_join: %d\n", ret);
+               exit(EXIT_FAILURE);
+       }
+
+       /* Restore the old handler */
+       if (signal(SIGUSR1, old_handler) == SIG_ERR) {
+               perror("signal");
+               exit(EXIT_FAILURE);
+       }
+}
+
+static void test_stream_transport_change_server(const struct test_opts *opts)
+{
+       int s = vsock_stream_listen(VMADDR_CID_ANY, opts->peer_port);
+
+       /* Set the socket to be nonblocking because connects that have been interrupted
+        * (EINTR) can fill the receiver's accept queue anyway, leading to connect failure.
+        * As of today (6.15), in such a situation there is no way to tell, from the
+        * client side, whether the connection has been queued in the server or not.
+        */
+       if (fcntl(s, F_SETFL, fcntl(s, F_GETFL, 0) | O_NONBLOCK) < 0) {
+               perror("fcntl");
+               exit(EXIT_FAILURE);
+       }
+       control_writeln("LISTENING");
+
+       while (control_readulong() == CONTROL_CONTINUE) {
+               struct sockaddr_vm sa_client;
+               socklen_t socklen_client = sizeof(sa_client);
+
+               /* Must accept the connection, otherwise the `listen`
+                * queue will fill up and new connections will fail.
+                * There can be more than one queued connection,
+                * clear them all.
+                */
+               while (true) {
+                       int client = accept(s, (struct sockaddr *)&sa_client, &socklen_client);
+
+                       if (client < 0) {
+                               if (errno == EAGAIN)
+                                       break;
+
+                               perror("accept");
+                               exit(EXIT_FAILURE);
+                       }
+
+                       close(client);
+               }
+       }
+
+       close(s);
+}
+
 static void test_stream_linger_client(const struct test_opts *opts)
 {
        int fd;
@@ -2106,6 +2279,11 @@ static struct test_case test_cases[] = {
                .run_client = test_stream_nolinger_client,
                .run_server = test_stream_nolinger_server,
        },
+       {
+               .name = "SOCK_STREAM transport change null-ptr-deref",
+               .run_client = test_stream_transport_change_client,
+               .run_server = test_stream_transport_change_server,
+       },
        {},
 };
 

---
base-commit: 68d019aa14d97f8d57b0f8d203fd3b44db2ba0c7
change-id: 20250306-test_vsock-3e77a9c7a245

Best regards,
-- 
Luigi Leonardi <[email protected]>
