Re: [PATCH v4] Prevent vhost-user-blk-test hang

2021-09-23 Thread ebl...@redhat.com
On Thu, Sep 23, 2021 at 03:16:17PM +, Raphael Norwitz wrote:
> In the vhost-user-blk-test, as of now there is nothing stopping
> vhost-user-blk in QEMU from writing to the socket right after forking off the
> storage daemon before it has a chance to come up properly, leaving the
> test hanging forever. This intermittently hanging test has caused QEMU
> automation failures reported multiple times on the mailing list [1].
> 
> This change makes the storage-daemon notify the vhost-user-blk-test
> that it is fully initialized and ready to handle client connections by
> creating a pidfile on initialization. This ensures that the storage-daemon
> backend won't miss vhost-user messages and thereby resolves the hang.
> 
> [1] https://lore.kernel.org/qemu-devel/CAFEAcA8kYpz9LiPNxnWJAPSjc=nv532bedyfynabemeohqb...@mail.gmail.com/
> 
> Signed-off-by: Raphael Norwitz 
> ---
> Accidentally left a raw free() in v3. Converted it to g_free() here.
> 
>  tests/qtest/vhost-user-blk-test.c | 27 ++-
>  1 file changed, 26 insertions(+), 1 deletion(-)
> 
> diff --git a/tests/qtest/vhost-user-blk-test.c b/tests/qtest/vhost-user-blk-test.c
> index 6f108a1b62..6898f55f11 100644
> --- a/tests/qtest/vhost-user-blk-test.c
> +++ b/tests/qtest/vhost-user-blk-test.c
> @@ -24,6 +24,7 @@
>  #define TEST_IMAGE_SIZE (64 * 1024 * 1024)
>  #define QVIRTIO_BLK_TIMEOUT_US  (30 * 1000 * 1000)
>  #define PCI_SLOT_HP 0x06
> +#define PIDFILE_RETRIES 5
>  
>  typedef struct {
>  pid_t pid;
> @@ -885,7 +886,8 @@ static void start_vhost_user_blk(GString *cmd_line, int vus_instances,
>   int num_queues)
>  {
>  const char *vhost_user_blk_bin = qtest_qemu_storage_daemon_binary();
> -int i;
> +int i, retries;
> +char *daemon_pidfile_path;
>  gchar *img_path;
>  GString *storage_daemon_command = g_string_new(NULL);
>  QemuStorageDaemonState *qsd;
> @@ -898,6 +900,9 @@ static void start_vhost_user_blk(GString *cmd_line, int vus_instances,
>  " -object memory-backend-memfd,id=mem,size=256M,share=on "
>  " -M memory-backend=mem -m 256M ");
>  
> +daemon_pidfile_path = g_strdup_printf("/tmp/daemon-%d", getpid());
> +g_assert_cmpint((uintptr_t) daemon_pidfile_path, !=, (uintptr_t) NULL);

Feels verbose compared to:
g_assert(daemon_pidfile_path)

For that matter, can g_strdup_printf() ever return NULL?  On memory
error, it exit()s rather than returning.  So this assert feels like
overkill.

Otherwise,

Reviewed-by: Eric Blake 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org




Re: [PATCH v2] Prevent vhost-user-blk-test hang

2021-09-07 Thread ebl...@redhat.com
On Mon, Sep 06, 2021 at 01:25:20PM +, Raphael Norwitz wrote:
> In the vhost-user-blk-test, as of now there is nothing stopping
> vhost-user-blk in QEMU from writing to the socket right after forking off the
> storage daemon before it has a chance to come up properly, leaving the
> test hanging forever. This intermittently hanging test has caused QEMU
> automation failures reported multiple times on the mailing list [1].
> 
> This change makes the storage-daemon notify the vhost-user-blk-test
> that it is fully initialized and ready to handle client connections by
> creating a pidfile on initialization. This ensures that the storage-daemon
> backend won't miss vhost-user messages and thereby resolves the hang.
> 
> [1] https://lore.kernel.org/qemu-devel/CAFEAcA8kYpz9LiPNxnWJAPSjc=nv532bedyfynabemeohqb...@mail.gmail.com/
> 
> Signed-off-by: Raphael Norwitz 
> ---
>  tests/qtest/vhost-user-blk-test.c | 33 ++-
>  1 file changed, 32 insertions(+), 1 deletion(-)
> 
> diff --git a/tests/qtest/vhost-user-blk-test.c b/tests/qtest/vhost-user-blk-test.c
> index 6f108a1b62..78140e6f28 100644
> --- a/tests/qtest/vhost-user-blk-test.c
> +++ b/tests/qtest/vhost-user-blk-test.c
> @@ -24,6 +24,9 @@
>  #define TEST_IMAGE_SIZE (64 * 1024 * 1024)
>  #define QVIRTIO_BLK_TIMEOUT_US  (30 * 1000 * 1000)
>  #define PCI_SLOT_HP 0x06
> +#define PIDFILE_RETRIES 5
> +
> +const char *pidfile_format = "/tmp/daemon-%d";

Why is this not static?  In fact...

>  
>  typedef struct {
>  pid_t pid;
> @@ -885,7 +888,8 @@ static void start_vhost_user_blk(GString *cmd_line, int vus_instances,
>   int num_queues)
>  {
>  const char *vhost_user_blk_bin = qtest_qemu_storage_daemon_binary();
> -int i;
> +int i, err, retries;
> +char *daemon_pidfile_path;
>  gchar *img_path;
>  GString *storage_daemon_command = g_string_new(NULL);
>  QemuStorageDaemonState *qsd;
> @@ -898,6 +902,12 @@ static void start_vhost_user_blk(GString *cmd_line, int vus_instances,
>  " -object memory-backend-memfd,id=mem,size=256M,share=on "
>  " -M memory-backend=mem -m 256M ");
>  
> +err = asprintf(&daemon_pidfile_path, pidfile_format, getpid());

...action at a distance makes it harder for gcc to warn about bad
formats.  I'd just inline "/tmp/daemon-%d" here in the asprintf call,
and drop pidfile_format altogether.

Why are we using bare asprintf instead of glib's g_strdup_printf?

> +if (err == -1) {
> +fprintf(stderr, "Failed to format storage-daemon pidfile name %m");

%m in printf is a glibc-ism; not portable to non-Linux.  Do we care?

> +abort();

Rather than directly abort, since this is a glib test runner, is there
a glib function we should be using?  For example, using
g_assert_cmpint(err, !=, -1)?

> +}
> +
>  for (i = 0; i < vus_instances; i++) {
>  int fd;
>  char *sock_path = create_listen_socket();
> @@ -914,6 +924,9 @@ static void start_vhost_user_blk(GString *cmd_line, int vus_instances,
> i + 1, sock_path);
>  }
>  
> +g_string_append_printf(storage_daemon_command, "--pidfile %s",

Missing a space on the tail end if there are more arguments to append
to the command line.

> +   daemon_pidfile_path);
> +
>  g_test_message("starting vhost-user backend: %s",
> storage_daemon_command->str);
>  pid_t pid = fork();
> @@ -930,7 +943,25 @@ static void start_vhost_user_blk(GString *cmd_line, int vus_instances,
>  execlp("/bin/sh", "sh", "-c", storage_daemon_command->str, NULL);
>  exit(1);
>  }
> +
> +/*
> + * Ensure the storage-daemon has come up properly before allowing the
> + * test to proceed.
> + */
> +retries = 0;
> +while (access(daemon_pidfile_path, F_OK) != 0) {
> +if (retries > PIDFILE_RETRIES) {
> +fprintf(stderr, "The storage-daemon failed to come up after %d "
> +"seconds - killing the test", PIDFILE_RETRIES);
> +abort();

Again, would some form of g_assert*() be better than bare abort?

> +}
> +
> +retries++;
> +usleep(1000);

We're inconsistent on whether qtest files use bare usleep or g_usleep.

> +}
> +
>  g_string_free(storage_daemon_command, true);
> +free(daemon_pidfile_path);

Do you want to unlink() the file (if it exists)?

>  
>  qsd = g_new(QemuStorageDaemonState, 1);
>  qsd->pid = pid;
> -- 
> 2.20.1
> 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org




Re: [PATCH 1/2] storage-daemon: add opt to print when initialized

2021-08-30 Thread ebl...@redhat.com
On Mon, Aug 30, 2021 at 03:56:16PM +, Raphael Norwitz wrote:
> On Fri, Aug 27, 2021 at 01:51:48PM -0500, ebl...@redhat.com wrote:
> > On Fri, Aug 27, 2021 at 04:50:35PM +, Raphael Norwitz wrote:
> > > This change adds a command line option to print a line to standard out
> > > when the storage daemon has completed initialization and is ready to
> > > serve client connections.
> > > 
> > > This option will be used to resolve a hang in the vhost-user-blk-test.
> >  
> > Doesn't the existing --pidfile already serve the same job?  That is,
> > why not fix vhost-user-blk-test to take advantage of the pid-file
> > creation rather than output to stdout as evidence of when the storage
> > daemon is up and running?
> > 
> > Therefore, I don't think we need this patch.
> >
> 
> Sure - that makes sense. I didn't use the pid-file because I didn't want to
> risk leaving junk on the filesystem if the storage-daemon crashed.

Ideally, storage-daemon doesn't crash during the test.  But even if it
does, we should still be able to register which files will be cleaned
up while exiting the test (if they exist), regardless of whether the
test succeeded or failed, because we have control over the pidfile
name before starting storage-daemon.  Put another way, the task of
cleaning up a pidfile during a test should not be a show-stopper.

[Side note: A long time ago, there were patches submitted to make the
iotests ./check engine run EVERY test in its own subdirectory, so that
cleaning up all files created by the test was trivial: nuke the
directory.  It also has the benefit that for debugging a failing test,
you merely pass an option to ./check that says to not nuke the
directory.  But it did not get applied at the time, and we have had
enough changes in the meantime that reinstating such a useful patch
would basically be work from scratch at this point]

> 
> I'll send a V2 using pid-file without this change.

Thanks, looking forward to it.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org




Re: [PATCH 2/2] Prevent vhost-user-blk-test hang

2021-08-27 Thread ebl...@redhat.com
On Fri, Aug 27, 2021 at 04:50:47PM +, Raphael Norwitz wrote:
> In the vhost-user-blk-test, as of now there is nothing stoping

stopping

> vhost-user-blk in QEMU writing to the socket right after forking off the
> storage daemon before it has a chance to come up properly, leaving the
> test hanging forever. This intermittently hanging test has caused QEMU
> automation failures reported multiple times on the mailing list [1].
> 
> This change makes the storage-daemon notify the vhost-user-blk-test
> that it is fully initialized and ready to handle client connections via
> a pipefd before allowing the test to proceed. This ensures that the
> storage-daemon backend won't miss vhost-user messages and thereby
> resolves the hang.

As I said on patch 1, I think the proper fix here is to utilize the
--pidfile option.

> 
> [1] https://lore.kernel.org/qemu-devel/CAFEAcA8kYpz9LiPNxnWJAPSjc=nv532bedyfynabemeohqb...@mail.gmail.com/
> 
> Signed-off-by: Raphael Norwitz 
> ---
>  tests/qtest/vhost-user-blk-test.c | 33 ---
>  1 file changed, 30 insertions(+), 3 deletions(-)
> 
> diff --git a/tests/qtest/vhost-user-blk-test.c b/tests/qtest/vhost-user-blk-test.c
> index 6f108a1b62..b62af449df 100644
> --- a/tests/qtest/vhost-user-blk-test.c
> +++ b/tests/qtest/vhost-user-blk-test.c
> @@ -21,6 +21,8 @@
>  #include "libqos/vhost-user-blk.h"
>  #include "libqos/libqos-pc.h"
>  
> +const char *daemon_msg = "Block exports setup\n";
> +
>  #define TEST_IMAGE_SIZE (64 * 1024 * 1024)
>  #define QVIRTIO_BLK_TIMEOUT_US  (30 * 1000 * 1000)
>  #define PCI_SLOT_HP 0x06
> @@ -885,7 +887,8 @@ static void start_vhost_user_blk(GString *cmd_line, int vus_instances,
>   int num_queues)
>  {
>  const char *vhost_user_blk_bin = qtest_qemu_storage_daemon_binary();
> -int i;
> +int i, err, pipe_fds[2];
> +char buf[32] = {0};
>  gchar *img_path;
>  GString *storage_daemon_command = g_string_new(NULL);
>  QemuStorageDaemonState *qsd;
> @@ -898,6 +901,12 @@ static void start_vhost_user_blk(GString *cmd_line, int vus_instances,
>  " -object memory-backend-memfd,id=mem,size=256M,share=on "
>  " -M memory-backend=mem -m 256M ");
>  
> +err = pipe(pipe_fds);
> +if (err != 0) {
> +fprintf(stderr, "start_vhost_user_blk: pipe() failed %m\n");
> +abort();
> +}

Instead of setting up a pipe()...

> +
>  for (i = 0; i < vus_instances; i++) {
>  int fd;
>  char *sock_path = create_listen_socket();
> @@ -914,22 +923,40 @@ static void start_vhost_user_blk(GString *cmd_line, int vus_instances,
> i + 1, sock_path);
>  }
>  
> +g_string_append_printf(storage_daemon_command, "--printset");

...change this to request the --pidfile option...

> +
>  g_test_message("starting vhost-user backend: %s",
> storage_daemon_command->str);
> +
>  pid_t pid = fork();
>  if (pid == 0) {
> +close(pipe_fds[0]);
> +
>  /*
>   * Close standard file descriptors so tap-driver.pl pipe detects when
>   * our parent terminates.
>   */
>  close(0);
> -close(1);
>  open("/dev/null", O_RDONLY);
> -open("/dev/null", O_WRONLY);
> +close(1);
> +dup2(pipe_fds[1], 1);
>  
>  execlp("/bin/sh", "sh", "-c", storage_daemon_command->str, NULL);
>  exit(1);
>  }
> +
> +close(pipe_fds[1]);
> +
> +err = read(pipe_fds[0], buf, 20);
> +if (err < 0) {
> +fprintf(stderr, "Failed to read from storage-daemon pipe %m\n");
> +abort();
> +} else if (strcmp(buf, daemon_msg) != 0) {
> +fprintf(stderr, "qemu-storage-daemon did not write expected messaage 
> "
> +"to the pipe. Total bytes read: %d. Got: %s\n", err, buf);
> +abort();
> +}

...and instead of trying to read() from a pipe, you instead wait until
the pid file exists.

> +
>  g_string_free(storage_daemon_command, true);
>  
>  qsd = g_new(QemuStorageDaemonState, 1);
> -- 
> 2.20.1
> 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org




Re: [PATCH 1/2] storage-daemon: add opt to print when initialized

2021-08-27 Thread ebl...@redhat.com
On Fri, Aug 27, 2021 at 04:50:35PM +, Raphael Norwitz wrote:
> This change adds a command line option to print a line to standard out
> when the storage daemon has completed initialization and is ready to
> serve client connections.
> 
> This option will be used to resolve a hang in the vhost-user-blk-test.

Doesn't the existing --pidfile already serve the same job?  That is,
why not fix vhost-user-blk-test to take advantage of the pid-file
creation rather than output to stdout as evidence of when the storage
daemon is up and running?

Therefore, I don't think we need this patch.

> 
> Signed-off-by: Raphael Norwitz 
> ---
>  storage-daemon/qemu-storage-daemon.c | 21 +
>  1 file changed, 21 insertions(+)

Missing a corresponding change to the man page
(docs/tools/qemu-storage-daemon.rst), if we decide to go with this
after all.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org