Re: [RFC V1 1/6] migration: SCM_RIGHTS for QEMUFile

2024-08-16 Thread Peter Xu
On Fri, Aug 16, 2024 at 11:13:20AM -0400, Steven Sistare wrote:
> On 8/15/2024 4:58 PM, Peter Xu wrote:
> > On Sun, Jun 30, 2024 at 12:44:03PM -0700, Steve Sistare wrote:
> > > Define functions to put/get file descriptors to/from a QEMUFile, for qio
> > > channels that support SCM_RIGHTS.  Maintain ordering such that
> > >put(A), put(fd), put(B)
> > > followed by
> > >get(A), get(fd), get(B)
> > > always succeeds.  Other get orderings may succeed but are not guaranteed.
> > > 
> > > Signed-off-by: Steve Sistare 
> > 
> > It's a slight pity that we'll extend fd to qemufiles, rather than changing
> > qemufiles back to iochannels.. for the long term we want to remove qemufile.
> 
> Thanks, I did not know that removing QEMUFile is a goal.
> 
> Currently QEMUFile buffers I/O in userland to reduce system calls.  Do you
> plan to eliminate that buffering, or move it to QIOChannel, perhaps in a
> new QIOChannelBuffered class?
> 
> If you eliminate the buffering, then migration might take longer.
> If you keep the buffering, then this patch's logic will still be needed
> in the QIOChannelBuffered code.

+Dan.

Yes probably the buffering is still needed ultimately.  It's just that it
currently involves qemufile which is (hopefully..) destined to be either
removed or updated.

For this cpr states, I'm not sure the buffered iochannel is a must, e.g. I
wonder what happens if we start with using iochannels directly without
buffering, then after replacing that with buffered iochannels, it'll simply
improve on top without changing much of the code - I mean, buffered
iochannel should still be able to be casted into an iochannel anyway.

> 
> > Would you think we can start to introduce iochannel-compatible vmstate
> > loader from cpr-[exec/transfer] here?  The idea is that we'd want
> > vmstate_load_iochannel() then take that from an iochannel and start getting
> > rid of qemufile API.  It'll already bring two benefits:
> > 
> >- We don't need this patch then I assume, but stick with iochannel API
> > 
> >- We can have Error** as last parameter of vmstate_load_iochannel(), then
> >  as we discussed in the other thread cpr_state_load() can fail with
> >  explicit errors without error_report()s (and as you pointed out, the
> >  load side of Error** support is yet missing)
> > 
> > There's a 3rd potential benefit, and will come to play when we want to
> > allow multifd threads to load device states / VMSDs at some point, as
> > multifd doesn't maintain qemufiles, but only iochannels.
> > 
> > I'm not sure whether it adds too much to you yet, but I'm curious how you
> > think about it.
> 
> A decent idea, but the task quickly mushrooms.  All of the VMSTATE macros used
> in cpr.c would need to be converted, and that stack is deep. eg:
> 
>   VMSTATE_INT32
> vmstate_info_int32
>   put_int32
> qemu_put_sbe32s
>   qemu_put_byte
> add_buf_to_iovec
>   qemu_fflush
>qio_channel_writev_all

Right, right after I sent the email I noticed this too..

The chance is the new iochannel API only resolves whatever needed for cpr
early states, currently:

static const VMStateDescription vmstate_cpr_state = {
.name = CPR_STATE,
.version_id = 1,
.minimum_version_id = 1,
.pre_save = cpr_state_presave,
.fields = (VMStateField[]) {
VMSTATE_UINT32(mode, CprState),
VMSTATE_QLIST_V(fds, CprState, 1, vmstate_cpr_fd, CprFd, next),
VMSTATE_END_OF_LIST()
}
};

static const VMStateDescription vmstate_cpr_fd = {
.name = "cpr fd",
.version_id = 1,
.minimum_version_id = 1,
.fields = (VMStateField[]) {
VMSTATE_UINT32(namelen, CprFd),
VMSTATE_VBUFFER_ALLOC_UINT32(name, CprFd, 0, NULL, namelen),
VMSTATE_INT32(id, CprFd),
VMSTATE_FD(fd, CprFd),
VMSTATE_END_OF_LIST()
}
};

IIUC, a summary of a few things only so far: UINT32/INT32, VBUFFER, FD.
Here FD is the new one, which we'll need to support that anyway, either
with the qemufile API or iochannel API.  Then the rest looks pretty
limited.  IOW, the new iochannel-based vmstate_load() doesn't yet need to
know anything else but these types of fields.  While I don't think I have a
full grasp yet on everything; that just sounds like the right direction to
go for the longer term.

Said that, as you said there can still be plenty of work there.  Steve,
feel free to think about it and take whatever approach you like.

Thanks,

-- 
Peter Xu




Re: [RFC V1 1/6] migration: SCM_RIGHTS for QEMUFile

2024-08-16 Thread Steven Sistare

On 8/15/2024 4:58 PM, Peter Xu wrote:

On Sun, Jun 30, 2024 at 12:44:03PM -0700, Steve Sistare wrote:

Define functions to put/get file descriptors to/from a QEMUFile, for qio
channels that support SCM_RIGHTS.  Maintain ordering such that
   put(A), put(fd), put(B)
followed by
   get(A), get(fd), get(B)
always succeeds.  Other get orderings may succeed but are not guaranteed.

Signed-off-by: Steve Sistare 


It's a slight pity that we'll extend fd to qemufiles, rather than changing
qemufiles back to iochannels.. for the long term we want to remove qemufile.


Thanks, I did not know that removing QEMUFile is a goal.

Currently QEMUFile buffers I/O in userland to reduce system calls.  Do you
plan to eliminate that buffering, or move it to QIOChannel, perhaps in a
new QIOChannelBuffered class?

If you eliminate the buffering, then migration might take longer.
If you keep the buffering, then this patch's logic will still be needed
in the QIOChannelBuffered code.


Would you think we can start to introduce iochannel-compatible vmstate
loader from cpr-[exec/transfer] here?  The idea is that we'd want
vmstate_load_iochannel() then take that from an iochannel and start getting
rid of qemufile API.  It'll already bring two benefits:

   - We don't need this patch then I assume, but stick with iochannel API

   - We can have Error** as last parameter of vmstate_load_iochannel(), then
 as we discussed in the other thread cpr_state_load() can fail with
 explicit errors without error_report()s (and as you pointed out, the
 load side of Error** support is yet missing)

There's a 3rd potential benefit, and will come to play when we want to
allow multifd threads to load device states / VMSDs at some point, as
multifd doesn't maintain qemufiles, but only iochannels.

I'm not sure whether it adds too much to you yet, but I'm curious how you
think about it.


A decent idea, but the task quickly mushrooms.  All of the VMSTATE macros used
in cpr.c would need to be converted, and that stack is deep. eg:

  VMSTATE_INT32
vmstate_info_int32
  put_int32
qemu_put_sbe32s
  qemu_put_byte
add_buf_to_iovec
  qemu_fflush
   qio_channel_writev_all

- Steve


---
  migration/qemu-file.c  | 83 +++---
  migration/qemu-file.h  |  2 ++
  migration/trace-events |  2 ++
  3 files changed, 83 insertions(+), 4 deletions(-)

diff --git a/migration/qemu-file.c b/migration/qemu-file.c
index b6d2f58..424c27d 100644
--- a/migration/qemu-file.c
+++ b/migration/qemu-file.c
@@ -37,6 +37,11 @@
  #define IO_BUF_SIZE 32768
  #define MAX_IOV_SIZE MIN_CONST(IOV_MAX, 64)
  
+typedef struct FdEntry {

+QTAILQ_ENTRY(FdEntry) entry;
+int fd;
+} FdEntry;
+
  struct QEMUFile {
  QIOChannel *ioc;
  bool is_writable;
@@ -51,6 +56,9 @@ struct QEMUFile {
  
  int last_error;

  Error *last_error_obj;
+
+bool fd_pass;
+QTAILQ_HEAD(, FdEntry) fds;
  };
  
  /*

@@ -109,6 +117,8 @@ static QEMUFile *qemu_file_new_impl(QIOChannel *ioc, bool 
is_writable)
  object_ref(ioc);
  f->ioc = ioc;
  f->is_writable = is_writable;
+f->fd_pass = qio_channel_has_feature(ioc, QIO_CHANNEL_FEATURE_FD_PASS);
+QTAILQ_INIT(&f->fds);
  
  return f;

  }
@@ -310,6 +320,10 @@ static ssize_t coroutine_mixed_fn 
qemu_fill_buffer(QEMUFile *f)
  int len;
  int pending;
  Error *local_error = NULL;
+g_autofree int *fds = NULL;
+size_t nfd = 0;
+int **pfds = f->fd_pass ? &fds : NULL;
+size_t *pnfd = f->fd_pass ? &nfd : NULL;
  
  assert(!qemu_file_is_writable(f));
  
@@ -325,10 +339,9 @@ static ssize_t coroutine_mixed_fn qemu_fill_buffer(QEMUFile *f)

  }
  
  do {

-len = qio_channel_read(f->ioc,
-   (char *)f->buf + pending,
-   IO_BUF_SIZE - pending,
-   &local_error);
+struct iovec iov = { f->buf + pending, IO_BUF_SIZE - pending };
+len = qio_channel_readv_full(f->ioc, &iov, 1, pfds, pnfd, 0,
+ &local_error);
  if (len == QIO_CHANNEL_ERR_BLOCK) {
  if (qemu_in_coroutine()) {
  qio_channel_yield(f->ioc, G_IO_IN);
@@ -348,9 +361,65 @@ static ssize_t coroutine_mixed_fn 
qemu_fill_buffer(QEMUFile *f)
  qemu_file_set_error_obj(f, len, local_error);
  }
  
+for (int i = 0; i < nfd; i++) {

+FdEntry *fde = g_new0(FdEntry, 1);
+fde->fd = fds[i];
+QTAILQ_INSERT_TAIL(&f->fds, fde, entry);
+}
+
  return len;
  }
  
+int qemu_file_put_fd(QEMUFile *f, int fd)

+{
+int ret = 0;
+QIOChannel *ioc = qemu_file_get_ioc(f);
+Error *err = NULL;
+struct iovec iov = { (void *)" ", 1 };
+
+/*
+ * Send a dummy byte so qemu_fill_buffer on the receiving side does not
+ * fail with a len=0 error.  Flush first to maintain ordering wrt other
+ * data.
+ */
+

Re: [RFC V1 1/6] migration: SCM_RIGHTS for QEMUFile

2024-08-15 Thread Peter Xu
On Sun, Jun 30, 2024 at 12:44:03PM -0700, Steve Sistare wrote:
> Define functions to put/get file descriptors to/from a QEMUFile, for qio
> channels that support SCM_RIGHTS.  Maintain ordering such that
>   put(A), put(fd), put(B)
> followed by
>   get(A), get(fd), get(B)
> always succeeds.  Other get orderings may succeed but are not guaranteed.
> 
> Signed-off-by: Steve Sistare 

It's a slight pity that we'll extend fd to qemufiles, rather than changing
qemufiles back to iochannels.. for the long term we want to remove qemufile.

Would you think we can start to introduce iochannel-compatible vmstate
loader from cpr-[exec/transfer] here?  The idea is that we'd want
vmstate_load_iochannel() then take that from an iochannel and start getting
rid of qemufile API.  It'll already bring two benefits:

  - We don't need this patch then I assume, but stick with iochannel API

  - We can have Error** as last parameter of vmstate_load_iochannel(), then
as we discussed in the other thread cpr_state_load() can fail with
explicit errors without error_report()s (and as you pointed out, the
load side of Error** support is yet missing)

There's a 3rd potential benefit, and will come to play when we want to
allow multifd threads to load device states / VMSDs at some point, as
multifd doesn't maintain qemufiles, but only iochannels.

I'm not sure whether it adds too much to you yet, but I'm curious how you
think about it.

Thanks,

> ---
>  migration/qemu-file.c  | 83 
> +++---
>  migration/qemu-file.h  |  2 ++
>  migration/trace-events |  2 ++
>  3 files changed, 83 insertions(+), 4 deletions(-)
> 
> diff --git a/migration/qemu-file.c b/migration/qemu-file.c
> index b6d2f58..424c27d 100644
> --- a/migration/qemu-file.c
> +++ b/migration/qemu-file.c
> @@ -37,6 +37,11 @@
>  #define IO_BUF_SIZE 32768
>  #define MAX_IOV_SIZE MIN_CONST(IOV_MAX, 64)
>  
> +typedef struct FdEntry {
> +QTAILQ_ENTRY(FdEntry) entry;
> +int fd;
> +} FdEntry;
> +
>  struct QEMUFile {
>  QIOChannel *ioc;
>  bool is_writable;
> @@ -51,6 +56,9 @@ struct QEMUFile {
>  
>  int last_error;
>  Error *last_error_obj;
> +
> +bool fd_pass;
> +QTAILQ_HEAD(, FdEntry) fds;
>  };
>  
>  /*
> @@ -109,6 +117,8 @@ static QEMUFile *qemu_file_new_impl(QIOChannel *ioc, bool 
> is_writable)
>  object_ref(ioc);
>  f->ioc = ioc;
>  f->is_writable = is_writable;
> +f->fd_pass = qio_channel_has_feature(ioc, QIO_CHANNEL_FEATURE_FD_PASS);
> +QTAILQ_INIT(&f->fds);
>  
>  return f;
>  }
> @@ -310,6 +320,10 @@ static ssize_t coroutine_mixed_fn 
> qemu_fill_buffer(QEMUFile *f)
>  int len;
>  int pending;
>  Error *local_error = NULL;
> +g_autofree int *fds = NULL;
> +size_t nfd = 0;
> +int **pfds = f->fd_pass ? &fds : NULL;
> +size_t *pnfd = f->fd_pass ? &nfd : NULL;
>  
>  assert(!qemu_file_is_writable(f));
>  
> @@ -325,10 +339,9 @@ static ssize_t coroutine_mixed_fn 
> qemu_fill_buffer(QEMUFile *f)
>  }
>  
>  do {
> -len = qio_channel_read(f->ioc,
> -   (char *)f->buf + pending,
> -   IO_BUF_SIZE - pending,
> -   &local_error);
> +struct iovec iov = { f->buf + pending, IO_BUF_SIZE - pending };
> +len = qio_channel_readv_full(f->ioc, &iov, 1, pfds, pnfd, 0,
> + &local_error);
>  if (len == QIO_CHANNEL_ERR_BLOCK) {
>  if (qemu_in_coroutine()) {
>  qio_channel_yield(f->ioc, G_IO_IN);
> @@ -348,9 +361,65 @@ static ssize_t coroutine_mixed_fn 
> qemu_fill_buffer(QEMUFile *f)
>  qemu_file_set_error_obj(f, len, local_error);
>  }
>  
> +for (int i = 0; i < nfd; i++) {
> +FdEntry *fde = g_new0(FdEntry, 1);
> +fde->fd = fds[i];
> +QTAILQ_INSERT_TAIL(&f->fds, fde, entry);
> +}
> +
>  return len;
>  }
>  
> +int qemu_file_put_fd(QEMUFile *f, int fd)
> +{
> +int ret = 0;
> +QIOChannel *ioc = qemu_file_get_ioc(f);
> +Error *err = NULL;
> +struct iovec iov = { (void *)" ", 1 };
> +
> +/*
> + * Send a dummy byte so qemu_fill_buffer on the receiving side does not
> + * fail with a len=0 error.  Flush first to maintain ordering wrt other
> + * data.
> + */
> +
> +qemu_fflush(f);
> +if (qio_channel_writev_full(ioc, &iov, 1, &fd, 1, 0, &err) < 1) {
> +error_report_err(error_copy(err));
> +qemu_file_set_error_obj(f, -EIO, err);
> +ret = -1;
> +}
> +trace_qemu_file_put_fd(f->ioc->name, fd, ret);
> +return 0;
> +}
> +
> +int qemu_file_get_fd(QEMUFile *f)
> +{
> +int fd = -1;
> +FdEntry *fde;
> +
> +if (!f->fd_pass) {
> +Error *err = NULL;
> +error_setg(&err, "%s does not support fd passing", f->ioc->name);
> +error_report_err(error_copy(err));
> +qemu_file_set_error_obj(f, -EIO, err);
> +goto out;

Re: [RFC V1 1/6] migration: SCM_RIGHTS for QEMUFile

2024-08-05 Thread Steven Sistare

On 8/2/2024 4:20 AM, Euan Turner wrote:

On 30/06/2024 20:44, Steve Sistare wrote:

Define functions to put/get file descriptors to/from a QEMUFile, for qio
channels that support SCM_RIGHTS.  Maintain ordering such that
   put(A), put(fd), put(B)
followed by
   get(A), get(fd), get(B)
always succeeds.  Other get orderings may succeed but are not guaranteed.

Signed-off-by: Steve Sistare 
---
  migration/qemu-file.c  | 83 +++---
  migration/qemu-file.h  |  2 ++
  migration/trace-events |  2 ++
  3 files changed, 83 insertions(+), 4 deletions(-)

diff --git a/migration/qemu-file.c b/migration/qemu-file.c
index b6d2f58..424c27d 100644
--- a/migration/qemu-file.c
+++ b/migration/qemu-file.c
@@ -37,6 +37,11 @@
  #define IO_BUF_SIZE 32768
  #define MAX_IOV_SIZE MIN_CONST(IOV_MAX, 64)
+typedef struct FdEntry {
+    QTAILQ_ENTRY(FdEntry) entry;
+    int fd;
+} FdEntry;
+
  struct QEMUFile {
  QIOChannel *ioc;
  bool is_writable;
@@ -51,6 +56,9 @@ struct QEMUFile {
  int last_error;
  Error *last_error_obj;
+
+    bool fd_pass;
+    QTAILQ_HEAD(, FdEntry) fds;
  };
  /*
@@ -109,6 +117,8 @@ static QEMUFile *qemu_file_new_impl(QIOChannel *ioc, bool 
is_writable)
  object_ref(ioc);
  f->ioc = ioc;
  f->is_writable = is_writable;
+    f->fd_pass = qio_channel_has_feature(ioc, QIO_CHANNEL_FEATURE_FD_PASS);
+    QTAILQ_INIT(&f->fds);
  return f;
  }
@@ -310,6 +320,10 @@ static ssize_t coroutine_mixed_fn 
qemu_fill_buffer(QEMUFile *f)
  int len;
  int pending;
  Error *local_error = NULL;
+    g_autofree int *fds = NULL;
+    size_t nfd = 0;
+    int **pfds = f->fd_pass ? &fds : NULL;
+    size_t *pnfd = f->fd_pass ? &nfd : NULL;
  assert(!qemu_file_is_writable(f));
@@ -325,10 +339,9 @@ static ssize_t coroutine_mixed_fn 
qemu_fill_buffer(QEMUFile *f)
  }
  do {
-    len = qio_channel_read(f->ioc,
-   (char *)f->buf + pending,
-   IO_BUF_SIZE - pending,
-   &local_error);
+    struct iovec iov = { f->buf + pending, IO_BUF_SIZE - pending };
+    len = qio_channel_readv_full(f->ioc, &iov, 1, pfds, pnfd, 0,
+ &local_error);
  if (len == QIO_CHANNEL_ERR_BLOCK) {
  if (qemu_in_coroutine()) {
  qio_channel_yield(f->ioc, G_IO_IN);
@@ -348,9 +361,65 @@ static ssize_t coroutine_mixed_fn 
qemu_fill_buffer(QEMUFile *f)
  qemu_file_set_error_obj(f, len, local_error);
  }
+    for (int i = 0; i < nfd; i++) {
+    FdEntry *fde = g_new0(FdEntry, 1);
+    fde->fd = fds[i];
+    QTAILQ_INSERT_TAIL(&f->fds, fde, entry);
+    }
+
  return len;
  }
+int qemu_file_put_fd(QEMUFile *f, int fd)
+{
+    int ret = 0;
+    QIOChannel *ioc = qemu_file_get_ioc(f);
+    Error *err = NULL;
+    struct iovec iov = { (void *)" ", 1 };
+
+    /*
+ * Send a dummy byte so qemu_fill_buffer on the receiving side does not
+ * fail with a len=0 error.  Flush first to maintain ordering wrt other
+ * data.
+ */
+
+    qemu_fflush(f);
+    if (qio_channel_writev_full(ioc, &iov, 1, &fd, 1, 0, &err) < 1) {
+    error_report_err(error_copy(err));
+    qemu_file_set_error_obj(f, -EIO, err);
+    ret = -1;
+    }
+    trace_qemu_file_put_fd(f->ioc->name, fd, ret);
+    return 0;

I think you intended return ret?


Yes, thanks - steve



Re: [RFC V1 1/6] migration: SCM_RIGHTS for QEMUFile

2024-08-02 Thread Euan Turner

On 30/06/2024 20:44, Steve Sistare wrote:

Define functions to put/get file descriptors to/from a QEMUFile, for qio
channels that support SCM_RIGHTS.  Maintain ordering such that
   put(A), put(fd), put(B)
followed by
   get(A), get(fd), get(B)
always succeeds.  Other get orderings may succeed but are not guaranteed.

Signed-off-by: Steve Sistare 
---
  migration/qemu-file.c  | 83 +++---
  migration/qemu-file.h  |  2 ++
  migration/trace-events |  2 ++
  3 files changed, 83 insertions(+), 4 deletions(-)

diff --git a/migration/qemu-file.c b/migration/qemu-file.c
index b6d2f58..424c27d 100644
--- a/migration/qemu-file.c
+++ b/migration/qemu-file.c
@@ -37,6 +37,11 @@
  #define IO_BUF_SIZE 32768
  #define MAX_IOV_SIZE MIN_CONST(IOV_MAX, 64)
  
+typedef struct FdEntry {

+QTAILQ_ENTRY(FdEntry) entry;
+int fd;
+} FdEntry;
+
  struct QEMUFile {
  QIOChannel *ioc;
  bool is_writable;
@@ -51,6 +56,9 @@ struct QEMUFile {
  
  int last_error;

  Error *last_error_obj;
+
+bool fd_pass;
+QTAILQ_HEAD(, FdEntry) fds;
  };
  
  /*

@@ -109,6 +117,8 @@ static QEMUFile *qemu_file_new_impl(QIOChannel *ioc, bool 
is_writable)
  object_ref(ioc);
  f->ioc = ioc;
  f->is_writable = is_writable;
+f->fd_pass = qio_channel_has_feature(ioc, QIO_CHANNEL_FEATURE_FD_PASS);
+QTAILQ_INIT(&f->fds);
  
  return f;

  }
@@ -310,6 +320,10 @@ static ssize_t coroutine_mixed_fn 
qemu_fill_buffer(QEMUFile *f)
  int len;
  int pending;
  Error *local_error = NULL;
+g_autofree int *fds = NULL;
+size_t nfd = 0;
+int **pfds = f->fd_pass ? &fds : NULL;
+size_t *pnfd = f->fd_pass ? &nfd : NULL;
  
  assert(!qemu_file_is_writable(f));
  
@@ -325,10 +339,9 @@ static ssize_t coroutine_mixed_fn qemu_fill_buffer(QEMUFile *f)

  }
  
  do {

-len = qio_channel_read(f->ioc,
-   (char *)f->buf + pending,
-   IO_BUF_SIZE - pending,
-   &local_error);
+struct iovec iov = { f->buf + pending, IO_BUF_SIZE - pending };
+len = qio_channel_readv_full(f->ioc, &iov, 1, pfds, pnfd, 0,
+ &local_error);
  if (len == QIO_CHANNEL_ERR_BLOCK) {
  if (qemu_in_coroutine()) {
  qio_channel_yield(f->ioc, G_IO_IN);
@@ -348,9 +361,65 @@ static ssize_t coroutine_mixed_fn 
qemu_fill_buffer(QEMUFile *f)
  qemu_file_set_error_obj(f, len, local_error);
  }
  
+for (int i = 0; i < nfd; i++) {

+FdEntry *fde = g_new0(FdEntry, 1);
+fde->fd = fds[i];
+QTAILQ_INSERT_TAIL(&f->fds, fde, entry);
+}
+
  return len;
  }
  
+int qemu_file_put_fd(QEMUFile *f, int fd)

+{
+int ret = 0;
+QIOChannel *ioc = qemu_file_get_ioc(f);
+Error *err = NULL;
+struct iovec iov = { (void *)" ", 1 };
+
+/*
+ * Send a dummy byte so qemu_fill_buffer on the receiving side does not
+ * fail with a len=0 error.  Flush first to maintain ordering wrt other
+ * data.
+ */
+
+qemu_fflush(f);
+if (qio_channel_writev_full(ioc, &iov, 1, &fd, 1, 0, &err) < 1) {
+error_report_err(error_copy(err));
+qemu_file_set_error_obj(f, -EIO, err);
+ret = -1;
+}
+trace_qemu_file_put_fd(f->ioc->name, fd, ret);
+return 0;

I think you intended return ret?

+}
+
+int qemu_file_get_fd(QEMUFile *f)
+{
+int fd = -1;
+FdEntry *fde;
+
+if (!f->fd_pass) {
+Error *err = NULL;
+error_setg(&err, "%s does not support fd passing", f->ioc->name);
+error_report_err(error_copy(err));
+qemu_file_set_error_obj(f, -EIO, err);
+goto out;
+}
+
+/* Force the dummy byte and its fd passenger to appear. */
+qemu_peek_byte(f, 0);
+
+fde = QTAILQ_FIRST(&f->fds);
+if (fde) {
+qemu_get_byte(f);   /* Drop the dummy byte */
+fd = fde->fd;
+QTAILQ_REMOVE(&f->fds, fde, entry);
+}
+out:
+trace_qemu_file_get_fd(f->ioc->name, fd);
+return fd;
+}
+
  /** Closes the file
   *
   * Returns negative error value if any error happened on previous operations 
or
@@ -361,11 +430,17 @@ static ssize_t coroutine_mixed_fn 
qemu_fill_buffer(QEMUFile *f)
   */
  int qemu_fclose(QEMUFile *f)
  {
+FdEntry *fde, *next;
  int ret = qemu_fflush(f);
  int ret2 = qio_channel_close(f->ioc, NULL);
  if (ret >= 0) {
  ret = ret2;
  }
+QTAILQ_FOREACH_SAFE(fde, &f->fds, entry, next) {
+warn_report("qemu_fclose: received fd %d was never claimed", fde->fd);
+close(fde->fd);
+g_free(fde);
+}
  g_clear_pointer(&f->ioc, object_unref);
  error_free(f->last_error_obj);
  g_free(f);
diff --git a/migration/qemu-file.h b/migration/qemu-file.h
index 11c2120..3e47a20 100644
--- a/migration/qemu-file.h
+++ b/migration/qemu-file.h
@@ -79,5 +79,7 @@ size_t qemu_get_buffer_at(QEMUFile *f, const u

[RFC V1 1/6] migration: SCM_RIGHTS for QEMUFile

2024-06-30 Thread Steve Sistare
Define functions to put/get file descriptors to/from a QEMUFile, for qio
channels that support SCM_RIGHTS.  Maintain ordering such that
  put(A), put(fd), put(B)
followed by
  get(A), get(fd), get(B)
always succeeds.  Other get orderings may succeed but are not guaranteed.

Signed-off-by: Steve Sistare 
---
 migration/qemu-file.c  | 83 +++---
 migration/qemu-file.h  |  2 ++
 migration/trace-events |  2 ++
 3 files changed, 83 insertions(+), 4 deletions(-)

diff --git a/migration/qemu-file.c b/migration/qemu-file.c
index b6d2f58..424c27d 100644
--- a/migration/qemu-file.c
+++ b/migration/qemu-file.c
@@ -37,6 +37,11 @@
 #define IO_BUF_SIZE 32768
 #define MAX_IOV_SIZE MIN_CONST(IOV_MAX, 64)
 
+typedef struct FdEntry {
+QTAILQ_ENTRY(FdEntry) entry;
+int fd;
+} FdEntry;
+
 struct QEMUFile {
 QIOChannel *ioc;
 bool is_writable;
@@ -51,6 +56,9 @@ struct QEMUFile {
 
 int last_error;
 Error *last_error_obj;
+
+bool fd_pass;
+QTAILQ_HEAD(, FdEntry) fds;
 };
 
 /*
@@ -109,6 +117,8 @@ static QEMUFile *qemu_file_new_impl(QIOChannel *ioc, bool 
is_writable)
 object_ref(ioc);
 f->ioc = ioc;
 f->is_writable = is_writable;
+f->fd_pass = qio_channel_has_feature(ioc, QIO_CHANNEL_FEATURE_FD_PASS);
+QTAILQ_INIT(&f->fds);
 
 return f;
 }
@@ -310,6 +320,10 @@ static ssize_t coroutine_mixed_fn 
qemu_fill_buffer(QEMUFile *f)
 int len;
 int pending;
 Error *local_error = NULL;
+g_autofree int *fds = NULL;
+size_t nfd = 0;
+int **pfds = f->fd_pass ? &fds : NULL;
+size_t *pnfd = f->fd_pass ? &nfd : NULL;
 
 assert(!qemu_file_is_writable(f));
 
@@ -325,10 +339,9 @@ static ssize_t coroutine_mixed_fn 
qemu_fill_buffer(QEMUFile *f)
 }
 
 do {
-len = qio_channel_read(f->ioc,
-   (char *)f->buf + pending,
-   IO_BUF_SIZE - pending,
-   &local_error);
+struct iovec iov = { f->buf + pending, IO_BUF_SIZE - pending };
+len = qio_channel_readv_full(f->ioc, &iov, 1, pfds, pnfd, 0,
+ &local_error);
 if (len == QIO_CHANNEL_ERR_BLOCK) {
 if (qemu_in_coroutine()) {
 qio_channel_yield(f->ioc, G_IO_IN);
@@ -348,9 +361,65 @@ static ssize_t coroutine_mixed_fn 
qemu_fill_buffer(QEMUFile *f)
 qemu_file_set_error_obj(f, len, local_error);
 }
 
+for (int i = 0; i < nfd; i++) {
+FdEntry *fde = g_new0(FdEntry, 1);
+fde->fd = fds[i];
+QTAILQ_INSERT_TAIL(&f->fds, fde, entry);
+}
+
 return len;
 }
 
+int qemu_file_put_fd(QEMUFile *f, int fd)
+{
+int ret = 0;
+QIOChannel *ioc = qemu_file_get_ioc(f);
+Error *err = NULL;
+struct iovec iov = { (void *)" ", 1 };
+
+/*
+ * Send a dummy byte so qemu_fill_buffer on the receiving side does not
+ * fail with a len=0 error.  Flush first to maintain ordering wrt other
+ * data.
+ */
+
+qemu_fflush(f);
+if (qio_channel_writev_full(ioc, &iov, 1, &fd, 1, 0, &err) < 1) {
+error_report_err(error_copy(err));
+qemu_file_set_error_obj(f, -EIO, err);
+ret = -1;
+}
+trace_qemu_file_put_fd(f->ioc->name, fd, ret);
+return 0;
+}
+
+int qemu_file_get_fd(QEMUFile *f)
+{
+int fd = -1;
+FdEntry *fde;
+
+if (!f->fd_pass) {
+Error *err = NULL;
+error_setg(&err, "%s does not support fd passing", f->ioc->name);
+error_report_err(error_copy(err));
+qemu_file_set_error_obj(f, -EIO, err);
+goto out;
+}
+
+/* Force the dummy byte and its fd passenger to appear. */
+qemu_peek_byte(f, 0);
+
+fde = QTAILQ_FIRST(&f->fds);
+if (fde) {
+qemu_get_byte(f);   /* Drop the dummy byte */
+fd = fde->fd;
+QTAILQ_REMOVE(&f->fds, fde, entry);
+}
+out:
+trace_qemu_file_get_fd(f->ioc->name, fd);
+return fd;
+}
+
 /** Closes the file
  *
  * Returns negative error value if any error happened on previous operations or
@@ -361,11 +430,17 @@ static ssize_t coroutine_mixed_fn 
qemu_fill_buffer(QEMUFile *f)
  */
 int qemu_fclose(QEMUFile *f)
 {
+FdEntry *fde, *next;
 int ret = qemu_fflush(f);
 int ret2 = qio_channel_close(f->ioc, NULL);
 if (ret >= 0) {
 ret = ret2;
 }
+QTAILQ_FOREACH_SAFE(fde, &f->fds, entry, next) {
+warn_report("qemu_fclose: received fd %d was never claimed", fde->fd);
+close(fde->fd);
+g_free(fde);
+}
 g_clear_pointer(&f->ioc, object_unref);
 error_free(f->last_error_obj);
 g_free(f);
diff --git a/migration/qemu-file.h b/migration/qemu-file.h
index 11c2120..3e47a20 100644
--- a/migration/qemu-file.h
+++ b/migration/qemu-file.h
@@ -79,5 +79,7 @@ size_t qemu_get_buffer_at(QEMUFile *f, const uint8_t *buf, 
size_t buflen,
   off_t pos);
 
 QIOChannel *qemu_file_get_ioc(QEMUFile *file);
+int qemu_file_put_fd(QEM