Re: [RFC PATCH v3 15/30] io: Add a pwritev/preadv version that takes a discontiguous iovec

2024-01-18 Thread Peter Xu
On Thu, Jan 18, 2024 at 09:47:18AM -0300, Fabiano Rosas wrote:
> Peter Xu  writes:
> 
> > On Wed, Jan 17, 2024 at 03:06:15PM -0300, Fabiano Rosas wrote:
> >> Oh no, you're right. Because of p->pending_job. And thinking about
> >> p->pending_job, wouldn't a trylock do the same job while being more
> >> explicit?
> >> 
> >>     next_channel %= migrate_multifd_channels();
> >>     for (i = next_channel;; i = (i + 1) % migrate_multifd_channels()) {
> >>         p = &multifd_send_state->params[i];
> >>
> >>         if (qemu_mutex_trylock(&p->mutex)) {
> >>             if (p->quit) {
> >>                 error_report("%s: channel %d has already quit!", __func__, i);
> >>                 qemu_mutex_unlock(&p->mutex);
> >>                 return -1;
> >>             }
> >>             next_channel = (i + 1) % migrate_multifd_channels();
> >>             break;
> >>         } else {
> >>             /* channel still busy, try the next one */
> >>         }
> >>     }
> >>     multifd_send_state->pages = p->pages;
> >>     p->pages = pages;
> >>     qemu_mutex_unlock(&p->mutex);
> >
> > We probably can't for now; multifd_send_thread() will unlock the mutex
> > before the iochannel write()s, while the write()s will need those fields.
> 
> Right, but we'd change that code to do the IO with the lock held. If no
> one is blocking, it should be ok to hold the lock. Anyway, food for
> thought.

I see what you meant.  Sounds possible.

-- 
Peter Xu




Re: [RFC PATCH v3 15/30] io: Add a pwritev/preadv version that takes a discontiguous iovec

2024-01-18 Thread Fabiano Rosas
Peter Xu  writes:

> On Wed, Jan 17, 2024 at 03:06:15PM -0300, Fabiano Rosas wrote:
>> Oh no, you're right. Because of p->pending_job. And thinking about
>> p->pending_job, wouldn't a trylock do the same job while being more
>> explicit?
>> 
>>     next_channel %= migrate_multifd_channels();
>>     for (i = next_channel;; i = (i + 1) % migrate_multifd_channels()) {
>>         p = &multifd_send_state->params[i];
>>
>>         if (qemu_mutex_trylock(&p->mutex)) {
>>             if (p->quit) {
>>                 error_report("%s: channel %d has already quit!", __func__, i);
>>                 qemu_mutex_unlock(&p->mutex);
>>                 return -1;
>>             }
>>             next_channel = (i + 1) % migrate_multifd_channels();
>>             break;
>>         } else {
>>             /* channel still busy, try the next one */
>>         }
>>     }
>>     multifd_send_state->pages = p->pages;
>>     p->pages = pages;
>>     qemu_mutex_unlock(&p->mutex);
>
> We probably can't for now; multifd_send_thread() will unlock the mutex
> before the iochannel write()s, while the write()s will need those fields.

Right, but we'd change that code to do the IO with the lock held. If no
one is blocking, it should be ok to hold the lock. Anyway, food for
thought.




Re: [RFC PATCH v3 15/30] io: Add a pwritev/preadv version that takes a discontiguous iovec

2024-01-17 Thread Peter Xu
On Wed, Jan 17, 2024 at 03:06:15PM -0300, Fabiano Rosas wrote:
> Oh no, you're right. Because of p->pending_job. And thinking about
> p->pending_job, wouldn't a trylock do the same job while being more
> explicit?
> 
>     next_channel %= migrate_multifd_channels();
>     for (i = next_channel;; i = (i + 1) % migrate_multifd_channels()) {
>         p = &multifd_send_state->params[i];
>
>         if (qemu_mutex_trylock(&p->mutex)) {
>             if (p->quit) {
>                 error_report("%s: channel %d has already quit!", __func__, i);
>                 qemu_mutex_unlock(&p->mutex);
>                 return -1;
>             }
>             next_channel = (i + 1) % migrate_multifd_channels();
>             break;
>         } else {
>             /* channel still busy, try the next one */
>         }
>     }
>     multifd_send_state->pages = p->pages;
>     p->pages = pages;
>     qemu_mutex_unlock(&p->mutex);

We probably can't for now; multifd_send_thread() will unlock the mutex
before the iochannel write()s, while the write()s will need those fields.
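
For illustration only, the trylock idea can be sketched with plain pthreads.
This is not QEMU code: Channel and claim_idle_channel() are hypothetical
names, and the sketch assumes the worker keeps the mutex held for the whole
write, which is exactly what multifd_send_thread() does not do today. Note
that pthread_mutex_trylock() returns 0 on success, so the "claimed" branch is
the zero case.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one send channel; not QEMU's MultiFDSendParams. */
typedef struct {
    pthread_mutex_t mutex;
    bool quit;
    void *pages;
} Channel;

/*
 * Round-robin over the channels, starting at *next, and claim the first
 * one whose mutex is free. Returns the channel index with its mutex HELD,
 * or -1 if the claimed channel has already quit (mutex released again).
 * Spins until some channel becomes free.
 */
static int claim_idle_channel(Channel *chans, size_t n, size_t *next)
{
    size_t i = *next % n;

    for (;; i = (i + 1) % n) {
        Channel *c = &chans[i];

        /* Non-zero means the lock is taken: channel still busy. */
        if (pthread_mutex_trylock(&c->mutex) != 0) {
            continue;
        }
        if (c->quit) {
            pthread_mutex_unlock(&c->mutex);
            return -1;
        }
        *next = (i + 1) % n;
        return (int)i;
    }
}
```

The caller would swap the pages pointer and then unlock the returned channel.
If the worker drops the mutex before doing its I/O, a successful trylock no
longer means "idle", which is the objection raised above.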

> Ok, then I can take block->pages_offset and block->host from the
> ramblock. I think I prefer something like this, that way we can be
> explicit about the migration assumptions.

I'm glad we reached an initial consensus.  Yes, let's put that in
migration/; I don't expect this code to be used by other iochannel users.
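
A minimal sketch of what being explicit about those assumptions could look
like, in standalone C rather than QEMU code. The names
for_each_contiguous_run(), run_fn, RunLog and record_run() are hypothetical;
host and pages_offset stand in for block->host and block->pages_offset. The
file offset of each run is derived from the position inside the block, not
from the raw pointer value.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Hypothetical callback: handles one run of contiguous iovec entries. */
typedef int (*run_fn)(const struct iovec *iov, size_t niov,
                      off_t file_offset, void *opaque);

/*
 * Group adjacent entries of @iov that are contiguous in memory and call
 * @fn once per group. The file offset of a group is
 *     pages_offset + (address of the group's first byte - host).
 * Returns 0 on success, -1 if a group starts below @host or @fn fails.
 */
static int for_each_contiguous_run(const struct iovec *iov, size_t niov,
                                   const uint8_t *host, off_t pages_offset,
                                   run_fn fn, void *opaque)
{
    size_t start = 0;

    for (size_t i = 0; i < niov; i++) {
        uintptr_t end = (uintptr_t)iov[i].iov_base + iov[i].iov_len;

        /* Extend the run while the next entry starts where this one ends. */
        if (i + 1 < niov && end == (uintptr_t)iov[i + 1].iov_base) {
            continue;
        }

        uintptr_t first = (uintptr_t)iov[start].iov_base;
        if (first < (uintptr_t)host) {
            return -1;
        }
        off_t file_offset = pages_offset + (off_t)(first - (uintptr_t)host);

        if (fn(&iov[start], i - start + 1, file_offset, opaque) < 0) {
            return -1;
        }
        start = i + 1;
    }
    return 0;
}

/* Demo callback: records each run instead of doing any I/O. */
typedef struct {
    size_t count;
    size_t niov[8];
    off_t offset[8];
} RunLog;

static int record_run(const struct iovec *iov, size_t niov,
                      off_t file_offset, void *opaque)
{
    RunLog *log = opaque;

    (void)iov;
    if (log->count >= 8) {
        return -1;
    }
    log->niov[log->count] = niov;
    log->offset[log->count] = file_offset;
    log->count++;
    return 0;
}
```

In real code the callback would issue the pwritev/preadv for the run;
record_run() only exists so the grouping can be inspected.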

-- 
Peter Xu




Re: [RFC PATCH v3 15/30] io: Add a pwritev/preadv version that takes a discontiguous iovec

2024-01-17 Thread Fabiano Rosas
Daniel P. Berrangé  writes:

> On Wed, Jan 17, 2024 at 12:39:26PM +, Daniel P. Berrangé wrote:
>> On Mon, Nov 27, 2023 at 05:25:57PM -0300, Fabiano Rosas wrote:
>> > For the upcoming support to fixed-ram migration with multifd, we need
>> > to be able to accept an iovec array with non-contiguous data.
>> > 
>> > Add a pwritev and preadv version that splits the array into contiguous
>> > segments before writing. With that we can have the ram code continue
>> > to add pages in any order and the multifd code continue to send large
>> > arrays for reading and writing.
>> > 
>> > Signed-off-by: Fabiano Rosas 
>> > ---
>> > - split the API that was merged into a single function
>> > - use uintptr_t for compatibility with 32-bit
>> > ---
>> >  include/io/channel.h | 26 
>> >  io/channel.c | 70 
>> >  2 files changed, 96 insertions(+)
>> > 
>> > diff --git a/include/io/channel.h b/include/io/channel.h
>> > index 7986c49c71..25383db5aa 100644
>> > --- a/include/io/channel.h
>> > +++ b/include/io/channel.h
>> > @@ -559,6 +559,19 @@ int qio_channel_close(QIOChannel *ioc,
>> >  ssize_t qio_channel_pwritev(QIOChannel *ioc, const struct iovec *iov,
>> >  size_t niov, off_t offset, Error **errp);
>> >  
>> > +/**
>> > + * qio_channel_pwritev_all:
>> > + * @ioc: the channel object
>> > + * @iov: the array of memory regions to write data from
>> > + * @niov: the length of the @iov array
>> > + * @offset: the iovec offset in the file where to write the data
>> > + * @errp: pointer to a NULL-initialized error object
>> > + *
>> > + * Returns: 0 if all bytes were written, or -1 on error
>> > + */
>> > +int qio_channel_pwritev_all(QIOChannel *ioc, const struct iovec *iov,
>> > +size_t niov, off_t offset, Error **errp);
>> > +
>> >  /**
>> >   * qio_channel_pwrite
>> >   * @ioc: the channel object
>> > @@ -595,6 +608,19 @@ ssize_t qio_channel_pwrite(QIOChannel *ioc, char 
>> > *buf, size_t buflen,
>> >  ssize_t qio_channel_preadv(QIOChannel *ioc, const struct iovec *iov,
>> > size_t niov, off_t offset, Error **errp);
>> >  
>> > +/**
>> > + * qio_channel_preadv_all:
>> > + * @ioc: the channel object
>> > + * @iov: the array of memory regions to read data to
>> > + * @niov: the length of the @iov array
>> > + * @offset: the iovec offset in the file from where to read the data
>> > + * @errp: pointer to a NULL-initialized error object
>> > + *
>> > + * Returns: 0 if all bytes were read, or -1 on error
>> > + */
>> > +int qio_channel_preadv_all(QIOChannel *ioc, const struct iovec *iov,
>> > +   size_t niov, off_t offset, Error **errp);
>> > +
>> >  /**
>> >   * qio_channel_pread
>> >   * @ioc: the channel object
>> > diff --git a/io/channel.c b/io/channel.c
>> > index a1f12f8e90..2f1745d052 100644
>> > --- a/io/channel.c
>> > +++ b/io/channel.c
>> > @@ -472,6 +472,69 @@ ssize_t qio_channel_pwritev(QIOChannel *ioc, const 
>> > struct iovec *iov,
>> >  return klass->io_pwritev(ioc, iov, niov, offset, errp);
>> >  }
>> >  
>> > +static int qio_channel_preadv_pwritev_contiguous(QIOChannel *ioc,
>> > + const struct iovec *iov,
>> > + size_t niov, off_t 
>> > offset,
>> > + bool is_write, Error 
>> > **errp)
>> > +{
>> > +ssize_t ret = -1;
>> > +int i, slice_idx, slice_num;
>> > +uintptr_t base, next, file_offset;
>> > +size_t len;
>> > +
>> > +slice_idx = 0;
>> > +slice_num = 1;
>> > +
>> > +/*
>> > + * If the iov array doesn't have contiguous elements, we need to
>> > + * split it in slices because we only have one (file) 'offset' for
>> > + * the whole iov. Do this here so callers don't need to break the
>> > + * iov array themselves.
>> > + */
>> > +for (i = 0; i < niov; i++, slice_num++) {
>> > +base = (uintptr_t) iov[i].iov_base;
>> > +
>> > +if (i != niov - 1) {
>> > +len = iov[i].iov_len;
>> > +next = (uintptr_t) iov[i + 1].iov_base;
>> > +
>> > +if (base + len == next) {
>> > +continue;
>> > +}
>> > +}
>> > +
>> > +/*
>> > + * Use the offset of the first element of the segment that
>> > + * we're sending.
>> > + */
>> > +file_offset = offset + (uintptr_t) iov[slice_idx].iov_base;
>> > +
>> > +if (is_write) {
>> > +ret = qio_channel_pwritev(ioc, &iov[slice_idx], slice_num,
>> > +  file_offset, errp);
>> > +} else {
>> > +ret = qio_channel_preadv(ioc, &iov[slice_idx], slice_num,
>> > + file_offset, errp);
>> > +}
>> 
>> iov_base is a pointer into RAM, so it could potentially be any
>> 64-bit value.
>> 
>

Re: [RFC PATCH v3 15/30] io: Add a pwritev/preadv version that takes a discontiguous iovec

2024-01-17 Thread Fabiano Rosas
Peter Xu  writes:

> On Tue, Jan 16, 2024 at 03:15:50PM -0300, Fabiano Rosas wrote:
>> Peter Xu  writes:
>> 
>> > On Mon, Nov 27, 2023 at 05:25:57PM -0300, Fabiano Rosas wrote:
>> >> For the upcoming support to fixed-ram migration with multifd, we need
>> >> to be able to accept an iovec array with non-contiguous data.
>> >> 
>> >> Add a pwritev and preadv version that splits the array into contiguous
>> >> segments before writing. With that we can have the ram code continue
>> >> to add pages in any order and the multifd code continue to send large
>> >> arrays for reading and writing.
>> >> 
>> >> Signed-off-by: Fabiano Rosas 
>> >> ---
>> >> - split the API that was merged into a single function
>> >> - use uintptr_t for compatibility with 32-bit
>> >> ---
>> >>  include/io/channel.h | 26 
>> >>  io/channel.c | 70 
>> >>  2 files changed, 96 insertions(+)
>> >> 
>> >> diff --git a/include/io/channel.h b/include/io/channel.h
>> >> index 7986c49c71..25383db5aa 100644
>> >> --- a/include/io/channel.h
>> >> +++ b/include/io/channel.h
>> >> @@ -559,6 +559,19 @@ int qio_channel_close(QIOChannel *ioc,
>> >>  ssize_t qio_channel_pwritev(QIOChannel *ioc, const struct iovec *iov,
>> >>  size_t niov, off_t offset, Error **errp);
>> >>  
>> >> +/**
>> >> + * qio_channel_pwritev_all:
>> >> + * @ioc: the channel object
>> >> + * @iov: the array of memory regions to write data from
>> >> + * @niov: the length of the @iov array
>> >> + * @offset: the iovec offset in the file where to write the data
>> >> + * @errp: pointer to a NULL-initialized error object
>> >> + *
>> >> + * Returns: 0 if all bytes were written, or -1 on error
>> >> + */
>> >> +int qio_channel_pwritev_all(QIOChannel *ioc, const struct iovec *iov,
>> >> +size_t niov, off_t offset, Error **errp);
>> >> +
>> >>  /**
>> >>   * qio_channel_pwrite
>> >>   * @ioc: the channel object
>> >> @@ -595,6 +608,19 @@ ssize_t qio_channel_pwrite(QIOChannel *ioc, char 
>> >> *buf, size_t buflen,
>> >>  ssize_t qio_channel_preadv(QIOChannel *ioc, const struct iovec *iov,
>> >> size_t niov, off_t offset, Error **errp);
>> >>  
>> >> +/**
>> >> + * qio_channel_preadv_all:
>> >> + * @ioc: the channel object
>> >> + * @iov: the array of memory regions to read data to
>> >> + * @niov: the length of the @iov array
>> >> + * @offset: the iovec offset in the file from where to read the data
>> >> + * @errp: pointer to a NULL-initialized error object
>> >> + *
>> >> + * Returns: 0 if all bytes were read, or -1 on error
>> >> + */
>> >> +int qio_channel_preadv_all(QIOChannel *ioc, const struct iovec *iov,
>> >> +   size_t niov, off_t offset, Error **errp);
>> >> +
>> >>  /**
>> >>   * qio_channel_pread
>> >>   * @ioc: the channel object
>> >> diff --git a/io/channel.c b/io/channel.c
>> >> index a1f12f8e90..2f1745d052 100644
>> >> --- a/io/channel.c
>> >> +++ b/io/channel.c
>> >> @@ -472,6 +472,69 @@ ssize_t qio_channel_pwritev(QIOChannel *ioc, const 
>> >> struct iovec *iov,
>> >>  return klass->io_pwritev(ioc, iov, niov, offset, errp);
>> >>  }
>> >>  
>> >> +static int qio_channel_preadv_pwritev_contiguous(QIOChannel *ioc,
>> >> + const struct iovec *iov,
>> >> + size_t niov, off_t 
>> >> offset,
>> >> + bool is_write, Error 
>> >> **errp)
>> >> +{
>> >> +ssize_t ret = -1;
>> >> +int i, slice_idx, slice_num;
>> >> +uintptr_t base, next, file_offset;
>> >> +size_t len;
>> >> +
>> >> +slice_idx = 0;
>> >> +slice_num = 1;
>> >> +
>> >> +/*
>> >> + * If the iov array doesn't have contiguous elements, we need to
>> >> + * split it in slices because we only have one (file) 'offset' for
>> >> + * the whole iov. Do this here so callers don't need to break the
>> >> + * iov array themselves.
>> >> + */
>> >> +for (i = 0; i < niov; i++, slice_num++) {
>> >> +base = (uintptr_t) iov[i].iov_base;
>> >> +
>> >> +if (i != niov - 1) {
>> >> +len = iov[i].iov_len;
>> >> +next = (uintptr_t) iov[i + 1].iov_base;
>> >> +
>> >> +if (base + len == next) {
>> >> +continue;
>> >> +}
>> >> +}
>> >> +
>> >> +/*
>> >> + * Use the offset of the first element of the segment that
>> >> + * we're sending.
>> >> + */
>> >> +file_offset = offset + (uintptr_t) iov[slice_idx].iov_base;
>> >> +
>> >> +if (is_write) {
>> >> +ret = qio_channel_pwritev(ioc, &iov[slice_idx], slice_num,
>> >> +  file_offset, errp);
>> >> +} else {
>> >> +ret = qio_channel_preadv(ioc, &iov[slice_idx], slice_num,
>> >> + file_offse

Re: [RFC PATCH v3 15/30] io: Add a pwritev/preadv version that takes a discontiguous iovec

2024-01-17 Thread Daniel P . Berrangé
On Wed, Jan 17, 2024 at 12:39:26PM +, Daniel P. Berrangé wrote:
> On Mon, Nov 27, 2023 at 05:25:57PM -0300, Fabiano Rosas wrote:
> > For the upcoming support to fixed-ram migration with multifd, we need
> > to be able to accept an iovec array with non-contiguous data.
> > 
> > Add a pwritev and preadv version that splits the array into contiguous
> > segments before writing. With that we can have the ram code continue
> > to add pages in any order and the multifd code continue to send large
> > arrays for reading and writing.
> > 
> > Signed-off-by: Fabiano Rosas 
> > ---
> > - split the API that was merged into a single function
> > - use uintptr_t for compatibility with 32-bit
> > ---
> >  include/io/channel.h | 26 
> >  io/channel.c | 70 
> >  2 files changed, 96 insertions(+)
> > 
> > diff --git a/include/io/channel.h b/include/io/channel.h
> > index 7986c49c71..25383db5aa 100644
> > --- a/include/io/channel.h
> > +++ b/include/io/channel.h
> > @@ -559,6 +559,19 @@ int qio_channel_close(QIOChannel *ioc,
> >  ssize_t qio_channel_pwritev(QIOChannel *ioc, const struct iovec *iov,
> >  size_t niov, off_t offset, Error **errp);
> >  
> > +/**
> > + * qio_channel_pwritev_all:
> > + * @ioc: the channel object
> > + * @iov: the array of memory regions to write data from
> > + * @niov: the length of the @iov array
> > + * @offset: the iovec offset in the file where to write the data
> > + * @errp: pointer to a NULL-initialized error object
> > + *
> > + * Returns: 0 if all bytes were written, or -1 on error
> > + */
> > +int qio_channel_pwritev_all(QIOChannel *ioc, const struct iovec *iov,
> > +size_t niov, off_t offset, Error **errp);
> > +
> >  /**
> >   * qio_channel_pwrite
> >   * @ioc: the channel object
> > @@ -595,6 +608,19 @@ ssize_t qio_channel_pwrite(QIOChannel *ioc, char *buf, 
> > size_t buflen,
> >  ssize_t qio_channel_preadv(QIOChannel *ioc, const struct iovec *iov,
> > size_t niov, off_t offset, Error **errp);
> >  
> > +/**
> > + * qio_channel_preadv_all:
> > + * @ioc: the channel object
> > + * @iov: the array of memory regions to read data to
> > + * @niov: the length of the @iov array
> > + * @offset: the iovec offset in the file from where to read the data
> > + * @errp: pointer to a NULL-initialized error object
> > + *
> > + * Returns: 0 if all bytes were read, or -1 on error
> > + */
> > +int qio_channel_preadv_all(QIOChannel *ioc, const struct iovec *iov,
> > +   size_t niov, off_t offset, Error **errp);
> > +
> >  /**
> >   * qio_channel_pread
> >   * @ioc: the channel object
> > diff --git a/io/channel.c b/io/channel.c
> > index a1f12f8e90..2f1745d052 100644
> > --- a/io/channel.c
> > +++ b/io/channel.c
> > @@ -472,6 +472,69 @@ ssize_t qio_channel_pwritev(QIOChannel *ioc, const 
> > struct iovec *iov,
> >  return klass->io_pwritev(ioc, iov, niov, offset, errp);
> >  }
> >  
> > +static int qio_channel_preadv_pwritev_contiguous(QIOChannel *ioc,
> > + const struct iovec *iov,
> > + size_t niov, off_t offset,
> > + bool is_write, Error 
> > **errp)
> > +{
> > +ssize_t ret = -1;
> > +int i, slice_idx, slice_num;
> > +uintptr_t base, next, file_offset;
> > +size_t len;
> > +
> > +slice_idx = 0;
> > +slice_num = 1;
> > +
> > +/*
> > + * If the iov array doesn't have contiguous elements, we need to
> > + * split it in slices because we only have one (file) 'offset' for
> > + * the whole iov. Do this here so callers don't need to break the
> > + * iov array themselves.
> > + */
> > +for (i = 0; i < niov; i++, slice_num++) {
> > +base = (uintptr_t) iov[i].iov_base;
> > +
> > +if (i != niov - 1) {
> > +len = iov[i].iov_len;
> > +next = (uintptr_t) iov[i + 1].iov_base;
> > +
> > +if (base + len == next) {
> > +continue;
> > +}
> > +}
> > +
> > +/*
> > + * Use the offset of the first element of the segment that
> > + * we're sending.
> > + */
> > +file_offset = offset + (uintptr_t) iov[slice_idx].iov_base;
> > +
> > +if (is_write) {
> > +ret = qio_channel_pwritev(ioc, &iov[slice_idx], slice_num,
> > +  file_offset, errp);
> > +} else {
> > +ret = qio_channel_preadv(ioc, &iov[slice_idx], slice_num,
> > + file_offset, errp);
> > +}
> 
> iov_base is a pointer into RAM, so it could potentially be any
> 64-bit value.
> 
> We're setting file_offset to this pointer address plus a
> user-supplied offset, and then using it as an offset on disk.
> First this could result in 64-b

Re: [RFC PATCH v3 15/30] io: Add a pwritev/preadv version that takes a discontiguous iovec

2024-01-17 Thread Daniel P . Berrangé
On Mon, Nov 27, 2023 at 05:25:57PM -0300, Fabiano Rosas wrote:
> For the upcoming support to fixed-ram migration with multifd, we need
> to be able to accept an iovec array with non-contiguous data.
> 
> Add a pwritev and preadv version that splits the array into contiguous
> segments before writing. With that we can have the ram code continue
> to add pages in any order and the multifd code continue to send large
> arrays for reading and writing.
> 
> Signed-off-by: Fabiano Rosas 
> ---
> - split the API that was merged into a single function
> - use uintptr_t for compatibility with 32-bit
> ---
>  include/io/channel.h | 26 
>  io/channel.c | 70 
>  2 files changed, 96 insertions(+)
> 
> diff --git a/include/io/channel.h b/include/io/channel.h
> index 7986c49c71..25383db5aa 100644
> --- a/include/io/channel.h
> +++ b/include/io/channel.h
> @@ -559,6 +559,19 @@ int qio_channel_close(QIOChannel *ioc,
>  ssize_t qio_channel_pwritev(QIOChannel *ioc, const struct iovec *iov,
>  size_t niov, off_t offset, Error **errp);
>  
> +/**
> + * qio_channel_pwritev_all:
> + * @ioc: the channel object
> + * @iov: the array of memory regions to write data from
> + * @niov: the length of the @iov array
> + * @offset: the iovec offset in the file where to write the data
> + * @errp: pointer to a NULL-initialized error object
> + *
> + * Returns: 0 if all bytes were written, or -1 on error
> + */
> +int qio_channel_pwritev_all(QIOChannel *ioc, const struct iovec *iov,
> +size_t niov, off_t offset, Error **errp);
> +
>  /**
>   * qio_channel_pwrite
>   * @ioc: the channel object
> @@ -595,6 +608,19 @@ ssize_t qio_channel_pwrite(QIOChannel *ioc, char *buf, 
> size_t buflen,
>  ssize_t qio_channel_preadv(QIOChannel *ioc, const struct iovec *iov,
> size_t niov, off_t offset, Error **errp);
>  
> +/**
> + * qio_channel_preadv_all:
> + * @ioc: the channel object
> + * @iov: the array of memory regions to read data to
> + * @niov: the length of the @iov array
> + * @offset: the iovec offset in the file from where to read the data
> + * @errp: pointer to a NULL-initialized error object
> + *
> + * Returns: 0 if all bytes were read, or -1 on error
> + */
> +int qio_channel_preadv_all(QIOChannel *ioc, const struct iovec *iov,
> +   size_t niov, off_t offset, Error **errp);
> +
>  /**
>   * qio_channel_pread
>   * @ioc: the channel object
> diff --git a/io/channel.c b/io/channel.c
> index a1f12f8e90..2f1745d052 100644
> --- a/io/channel.c
> +++ b/io/channel.c
> @@ -472,6 +472,69 @@ ssize_t qio_channel_pwritev(QIOChannel *ioc, const 
> struct iovec *iov,
>  return klass->io_pwritev(ioc, iov, niov, offset, errp);
>  }
>  
> +static int qio_channel_preadv_pwritev_contiguous(QIOChannel *ioc,
> +                                                 const struct iovec *iov,
> +                                                 size_t niov, off_t offset,
> +                                                 bool is_write, Error **errp)
> +{
> +    ssize_t ret = -1;
> +    int i, slice_idx, slice_num;
> +    uintptr_t base, next, file_offset;
> +    size_t len;
> +
> +    slice_idx = 0;
> +    slice_num = 1;
> +
> +    /*
> +     * If the iov array doesn't have contiguous elements, we need to
> +     * split it in slices because we only have one (file) 'offset' for
> +     * the whole iov. Do this here so callers don't need to break the
> +     * iov array themselves.
> +     */
> +    for (i = 0; i < niov; i++, slice_num++) {
> +        base = (uintptr_t) iov[i].iov_base;
> +
> +        if (i != niov - 1) {
> +            len = iov[i].iov_len;
> +            next = (uintptr_t) iov[i + 1].iov_base;
> +
> +            if (base + len == next) {
> +                continue;
> +            }
> +        }
> +
> +        /*
> +         * Use the offset of the first element of the segment that
> +         * we're sending.
> +         */
> +        file_offset = offset + (uintptr_t) iov[slice_idx].iov_base;
> +
> +        if (is_write) {
> +            ret = qio_channel_pwritev(ioc, &iov[slice_idx], slice_num,
> +                                      file_offset, errp);
> +        } else {
> +            ret = qio_channel_preadv(ioc, &iov[slice_idx], slice_num,
> +                                     file_offset, errp);
> +        }

iov_base is a pointer into RAM, so it could potentially be any
64-bit value.

We're setting file_offset to this pointer address plus a
user-supplied offset, and then using it as an offset on disk.
First, this could result in 64-bit overflow when 'offset' is
added to 'iov_base'; second, it could result in a file
that's 16 exabytes in size (with holes, of course).

I don't get how this is supposed to work, or be used ?
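
To make the concern concrete: with an assumed, typical x86-64 userspace
address such as 0x7f0000000000, using the pointer value as an offset would
already place the data about 127 TiB into the file. A checked addition along
these lines would at least reject the overflow case. This is only a sketch,
file_offset_add() is a hypothetical helper and int64_t stands in for a 64-bit
off_t.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Add a non-negative delta to a non-negative file offset.
 * Stores the sum in *out and returns true; returns false (leaving *out
 * untouched) if either input is negative or the sum would not fit in
 * int64_t.
 */
static bool file_offset_add(int64_t offset, int64_t delta, int64_t *out)
{
    if (offset < 0 || delta < 0 || offset > INT64_MAX - delta) {
        return false;
    }
    *out = offset + delta;
    return true;
}
```

A check like this does not make a raw pointer meaningful as a file position;
it only turns silent wraparound into an error the caller can report.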

> +
> +if (ret < 0) {
> +break;
> +}
> +
> +slice_idx += slice_num;
> +slice_n

Re: [RFC PATCH v3 15/30] io: Add a pwritev/preadv version that takes a discontiguous iovec

2024-01-17 Thread Peter Xu
On Tue, Jan 16, 2024 at 03:15:50PM -0300, Fabiano Rosas wrote:
> Peter Xu  writes:
> 
> > On Mon, Nov 27, 2023 at 05:25:57PM -0300, Fabiano Rosas wrote:
> >> For the upcoming support to fixed-ram migration with multifd, we need
> >> to be able to accept an iovec array with non-contiguous data.
> >> 
> >> Add a pwritev and preadv version that splits the array into contiguous
> >> segments before writing. With that we can have the ram code continue
> >> to add pages in any order and the multifd code continue to send large
> >> arrays for reading and writing.
> >> 
> >> Signed-off-by: Fabiano Rosas 
> >> ---
> >> - split the API that was merged into a single function
> >> - use uintptr_t for compatibility with 32-bit
> >> ---
> >>  include/io/channel.h | 26 
> >>  io/channel.c | 70 
> >>  2 files changed, 96 insertions(+)
> >> 
> >> diff --git a/include/io/channel.h b/include/io/channel.h
> >> index 7986c49c71..25383db5aa 100644
> >> --- a/include/io/channel.h
> >> +++ b/include/io/channel.h
> >> @@ -559,6 +559,19 @@ int qio_channel_close(QIOChannel *ioc,
> >>  ssize_t qio_channel_pwritev(QIOChannel *ioc, const struct iovec *iov,
> >>  size_t niov, off_t offset, Error **errp);
> >>  
> >> +/**
> >> + * qio_channel_pwritev_all:
> >> + * @ioc: the channel object
> >> + * @iov: the array of memory regions to write data from
> >> + * @niov: the length of the @iov array
> >> + * @offset: the iovec offset in the file where to write the data
> >> + * @errp: pointer to a NULL-initialized error object
> >> + *
> >> + * Returns: 0 if all bytes were written, or -1 on error
> >> + */
> >> +int qio_channel_pwritev_all(QIOChannel *ioc, const struct iovec *iov,
> >> +size_t niov, off_t offset, Error **errp);
> >> +
> >>  /**
> >>   * qio_channel_pwrite
> >>   * @ioc: the channel object
> >> @@ -595,6 +608,19 @@ ssize_t qio_channel_pwrite(QIOChannel *ioc, char 
> >> *buf, size_t buflen,
> >>  ssize_t qio_channel_preadv(QIOChannel *ioc, const struct iovec *iov,
> >> size_t niov, off_t offset, Error **errp);
> >>  
> >> +/**
> >> + * qio_channel_preadv_all:
> >> + * @ioc: the channel object
> >> + * @iov: the array of memory regions to read data to
> >> + * @niov: the length of the @iov array
> >> + * @offset: the iovec offset in the file from where to read the data
> >> + * @errp: pointer to a NULL-initialized error object
> >> + *
> >> + * Returns: 0 if all bytes were read, or -1 on error
> >> + */
> >> +int qio_channel_preadv_all(QIOChannel *ioc, const struct iovec *iov,
> >> +   size_t niov, off_t offset, Error **errp);
> >> +
> >>  /**
> >>   * qio_channel_pread
> >>   * @ioc: the channel object
> >> diff --git a/io/channel.c b/io/channel.c
> >> index a1f12f8e90..2f1745d052 100644
> >> --- a/io/channel.c
> >> +++ b/io/channel.c
> >> @@ -472,6 +472,69 @@ ssize_t qio_channel_pwritev(QIOChannel *ioc, const 
> >> struct iovec *iov,
> >>  return klass->io_pwritev(ioc, iov, niov, offset, errp);
> >>  }
> >>  
> >> +static int qio_channel_preadv_pwritev_contiguous(QIOChannel *ioc,
> >> + const struct iovec *iov,
> >> + size_t niov, off_t 
> >> offset,
> >> + bool is_write, Error 
> >> **errp)
> >> +{
> >> +ssize_t ret = -1;
> >> +int i, slice_idx, slice_num;
> >> +uintptr_t base, next, file_offset;
> >> +size_t len;
> >> +
> >> +slice_idx = 0;
> >> +slice_num = 1;
> >> +
> >> +/*
> >> + * If the iov array doesn't have contiguous elements, we need to
> >> + * split it in slices because we only have one (file) 'offset' for
> >> + * the whole iov. Do this here so callers don't need to break the
> >> + * iov array themselves.
> >> + */
> >> +for (i = 0; i < niov; i++, slice_num++) {
> >> +base = (uintptr_t) iov[i].iov_base;
> >> +
> >> +if (i != niov - 1) {
> >> +len = iov[i].iov_len;
> >> +next = (uintptr_t) iov[i + 1].iov_base;
> >> +
> >> +if (base + len == next) {
> >> +continue;
> >> +}
> >> +}
> >> +
> >> +/*
> >> + * Use the offset of the first element of the segment that
> >> + * we're sending.
> >> + */
> >> +file_offset = offset + (uintptr_t) iov[slice_idx].iov_base;
> >> +
> >> +if (is_write) {
> >> +ret = qio_channel_pwritev(ioc, &iov[slice_idx], slice_num,
> >> +  file_offset, errp);
> >> +} else {
> >> +ret = qio_channel_preadv(ioc, &iov[slice_idx], slice_num,
> >> + file_offset, errp);
> >> +}
> >> +
> >> +if (ret < 0) {
> >> +break;
> >> +}
> >> +
> >> +slice_idx += sli

Re: [RFC PATCH v3 15/30] io: Add a pwritev/preadv version that takes a discontiguous iovec

2024-01-16 Thread Fabiano Rosas
Peter Xu  writes:

> On Mon, Nov 27, 2023 at 05:25:57PM -0300, Fabiano Rosas wrote:
>> For the upcoming support to fixed-ram migration with multifd, we need
>> to be able to accept an iovec array with non-contiguous data.
>> 
>> Add a pwritev and preadv version that splits the array into contiguous
>> segments before writing. With that we can have the ram code continue
>> to add pages in any order and the multifd code continue to send large
>> arrays for reading and writing.
>> 
>> Signed-off-by: Fabiano Rosas 
>> ---
>> - split the API that was merged into a single function
>> - use uintptr_t for compatibility with 32-bit
>> ---
>>  include/io/channel.h | 26 
>>  io/channel.c | 70 
>>  2 files changed, 96 insertions(+)
>> 
>> diff --git a/include/io/channel.h b/include/io/channel.h
>> index 7986c49c71..25383db5aa 100644
>> --- a/include/io/channel.h
>> +++ b/include/io/channel.h
>> @@ -559,6 +559,19 @@ int qio_channel_close(QIOChannel *ioc,
>>  ssize_t qio_channel_pwritev(QIOChannel *ioc, const struct iovec *iov,
>>  size_t niov, off_t offset, Error **errp);
>>  
>> +/**
>> + * qio_channel_pwritev_all:
>> + * @ioc: the channel object
>> + * @iov: the array of memory regions to write data from
>> + * @niov: the length of the @iov array
>> + * @offset: the iovec offset in the file where to write the data
>> + * @errp: pointer to a NULL-initialized error object
>> + *
>> + * Returns: 0 if all bytes were written, or -1 on error
>> + */
>> +int qio_channel_pwritev_all(QIOChannel *ioc, const struct iovec *iov,
>> +size_t niov, off_t offset, Error **errp);
>> +
>>  /**
>>   * qio_channel_pwrite
>>   * @ioc: the channel object
>> @@ -595,6 +608,19 @@ ssize_t qio_channel_pwrite(QIOChannel *ioc, char *buf, 
>> size_t buflen,
>>  ssize_t qio_channel_preadv(QIOChannel *ioc, const struct iovec *iov,
>> size_t niov, off_t offset, Error **errp);
>>  
>> +/**
>> + * qio_channel_preadv_all:
>> + * @ioc: the channel object
>> + * @iov: the array of memory regions to read data to
>> + * @niov: the length of the @iov array
>> + * @offset: the iovec offset in the file from where to read the data
>> + * @errp: pointer to a NULL-initialized error object
>> + *
>> + * Returns: 0 if all bytes were read, or -1 on error
>> + */
>> +int qio_channel_preadv_all(QIOChannel *ioc, const struct iovec *iov,
>> +   size_t niov, off_t offset, Error **errp);
>> +
>>  /**
>>   * qio_channel_pread
>>   * @ioc: the channel object
>> diff --git a/io/channel.c b/io/channel.c
>> index a1f12f8e90..2f1745d052 100644
>> --- a/io/channel.c
>> +++ b/io/channel.c
>> @@ -472,6 +472,69 @@ ssize_t qio_channel_pwritev(QIOChannel *ioc, const 
>> struct iovec *iov,
>>  return klass->io_pwritev(ioc, iov, niov, offset, errp);
>>  }
>>  
>> +static int qio_channel_preadv_pwritev_contiguous(QIOChannel *ioc,
>> + const struct iovec *iov,
>> + size_t niov, off_t offset,
>> + bool is_write, Error 
>> **errp)
>> +{
>> +ssize_t ret = -1;
>> +int i, slice_idx, slice_num;
>> +uintptr_t base, next, file_offset;
>> +size_t len;
>> +
>> +slice_idx = 0;
>> +slice_num = 1;
>> +
>> +/*
>> + * If the iov array doesn't have contiguous elements, we need to
>> + * split it in slices because we only have one (file) 'offset' for
>> + * the whole iov. Do this here so callers don't need to break the
>> + * iov array themselves.
>> + */
>> +for (i = 0; i < niov; i++, slice_num++) {
>> +base = (uintptr_t) iov[i].iov_base;
>> +
>> +if (i != niov - 1) {
>> +len = iov[i].iov_len;
>> +next = (uintptr_t) iov[i + 1].iov_base;
>> +
>> +if (base + len == next) {
>> +continue;
>> +}
>> +}
>> +
>> +/*
>> + * Use the offset of the first element of the segment that
>> + * we're sending.
>> + */
>> +file_offset = offset + (uintptr_t) iov[slice_idx].iov_base;
>> +
>> +if (is_write) {
>> +ret = qio_channel_pwritev(ioc, &iov[slice_idx], slice_num,
>> +  file_offset, errp);
>> +} else {
>> +ret = qio_channel_preadv(ioc, &iov[slice_idx], slice_num,
>> + file_offset, errp);
>> +}
>> +
>> +if (ret < 0) {
>> +break;
>> +}
>> +
>> +slice_idx += slice_num;
>> +slice_num = 0;
>> +}
>> +
>> +return (ret < 0) ? -1 : 0;
>> +}
>> +
>> +int qio_channel_pwritev_all(QIOChannel *ioc, const struct iovec *iov,
>> +size_t niov, off_t offset, Error **errp)
>> +{
>> +return qio_channel_preadv_pwritev_contiguous(ioc, iov, niov,
>

Re: [RFC PATCH v3 15/30] io: Add a pwritev/preadv version that takes a discontiguous iovec

2024-01-15 Thread Peter Xu
On Mon, Nov 27, 2023 at 05:25:57PM -0300, Fabiano Rosas wrote:
> For the upcoming support to fixed-ram migration with multifd, we need
> to be able to accept an iovec array with non-contiguous data.
> 
> Add a pwritev and preadv version that splits the array into contiguous
> segments before writing. With that we can have the ram code continue
> to add pages in any order and the multifd code continue to send large
> arrays for reading and writing.
> 
> Signed-off-by: Fabiano Rosas 
> ---
> - split the API that was merged into a single function
> - use uintptr_t for compatibility with 32-bit
> ---
>  include/io/channel.h | 26 
>  io/channel.c | 70 
>  2 files changed, 96 insertions(+)
> 
> diff --git a/include/io/channel.h b/include/io/channel.h
> index 7986c49c71..25383db5aa 100644
> --- a/include/io/channel.h
> +++ b/include/io/channel.h
> @@ -559,6 +559,19 @@ int qio_channel_close(QIOChannel *ioc,
>  ssize_t qio_channel_pwritev(QIOChannel *ioc, const struct iovec *iov,
>  size_t niov, off_t offset, Error **errp);
>  
> +/**
> + * qio_channel_pwritev_all:
> + * @ioc: the channel object
> + * @iov: the array of memory regions to write data from
> + * @niov: the length of the @iov array
> + * @offset: the iovec offset in the file where to write the data
> + * @errp: pointer to a NULL-initialized error object
> + *
> + * Returns: 0 if all bytes were written, or -1 on error
> + */
> +int qio_channel_pwritev_all(QIOChannel *ioc, const struct iovec *iov,
> +size_t niov, off_t offset, Error **errp);
> +
>  /**
>   * qio_channel_pwrite
>   * @ioc: the channel object
> @@ -595,6 +608,19 @@ ssize_t qio_channel_pwrite(QIOChannel *ioc, char *buf, size_t buflen,
>  ssize_t qio_channel_preadv(QIOChannel *ioc, const struct iovec *iov,
>                             size_t niov, off_t offset, Error **errp);
>  
> +/**
> + * qio_channel_preadv_all:
> + * @ioc: the channel object
> + * @iov: the array of memory regions to read data to
> + * @niov: the length of the @iov array
> + * @offset: the iovec offset in the file from where to read the data
> + * @errp: pointer to a NULL-initialized error object
> + *
> + * Returns: 0 if all bytes were read, or -1 on error
> + */
> +int qio_channel_preadv_all(QIOChannel *ioc, const struct iovec *iov,
> +   size_t niov, off_t offset, Error **errp);
> +
>  /**
>   * qio_channel_pread
>   * @ioc: the channel object
> diff --git a/io/channel.c b/io/channel.c
> index a1f12f8e90..2f1745d052 100644
> --- a/io/channel.c
> +++ b/io/channel.c
> @@ -472,6 +472,69 @@ ssize_t qio_channel_pwritev(QIOChannel *ioc, const struct iovec *iov,
>      return klass->io_pwritev(ioc, iov, niov, offset, errp);
>  }
>  
> +static int qio_channel_preadv_pwritev_contiguous(QIOChannel *ioc,
> +                                                 const struct iovec *iov,
> +                                                 size_t niov, off_t offset,
> +                                                 bool is_write, Error **errp)
> +{
> +    ssize_t ret = -1;
> +    int i, slice_idx, slice_num;
> +    uintptr_t base, next, file_offset;
> +    size_t len;
> +
> +    slice_idx = 0;
> +    slice_num = 1;
> +
> +    /*
> +     * If the iov array doesn't have contiguous elements, we need to
> +     * split it in slices because we only have one (file) 'offset' for
> +     * the whole iov. Do this here so callers don't need to break the
> +     * iov array themselves.
> +     */
> +    for (i = 0; i < niov; i++, slice_num++) {
> +        base = (uintptr_t) iov[i].iov_base;
> +
> +        if (i != niov - 1) {
> +            len = iov[i].iov_len;
> +            next = (uintptr_t) iov[i + 1].iov_base;
> +
> +            if (base + len == next) {
> +                continue;
> +            }
> +        }
> +
> +        /*
> +         * Use the offset of the first element of the segment that
> +         * we're sending.
> +         */
> +        file_offset = offset + (uintptr_t) iov[slice_idx].iov_base;
> +
> +        if (is_write) {
> +            ret = qio_channel_pwritev(ioc, &iov[slice_idx], slice_num,
> +                                      file_offset, errp);
> +        } else {
> +            ret = qio_channel_preadv(ioc, &iov[slice_idx], slice_num,
> +                                     file_offset, errp);
> +        }
> +
> +        if (ret < 0) {
> +            break;
> +        }
> +
> +        slice_idx += slice_num;
> +        slice_num = 0;
> +    }
> +
> +    return (ret < 0) ? -1 : 0;
> +}
> +
> +int qio_channel_pwritev_all(QIOChannel *ioc, const struct iovec *iov,
> +                            size_t niov, off_t offset, Error **errp)
> +{
> +    return qio_channel_preadv_pwritev_contiguous(ioc, iov, niov,
> +                                                 offset, true, errp);
> +}

I'm not sure what Dan thinks about this, but I don't think this is pretty.

Wi

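For reference, the slicing that the quoted loop performs can be looked at
in isolation. The following is a minimal standalone sketch, not the
patch's code: it only finds the runs of elements whose buffers are
adjacent in memory and issues no I/O. struct iov_run and
iov_split_contiguous() are hypothetical names made up for this
illustration; only struct iovec from <sys/uio.h> is real.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/uio.h>

/* Hypothetical type: one run of iovec elements adjacent in memory. */
struct iov_run {
    size_t first;   /* index of the first element in the run */
    size_t count;   /* number of elements in the run */
};

/*
 * Fill 'runs' with the contiguous runs found in iov[0..niov) and return
 * how many runs were found. 'runs' must have room for niov entries.
 */
static size_t iov_split_contiguous(const struct iovec *iov, size_t niov,
                                   struct iov_run *runs)
{
    size_t nruns = 0;
    size_t first = 0;

    assert(iov != NULL || niov == 0);

    for (size_t i = 0; i < niov; i++) {
        uintptr_t end = (uintptr_t)iov[i].iov_base + iov[i].iov_len;

        /* Extend the run while the next buffer starts where this ends. */
        if (i + 1 < niov && end == (uintptr_t)iov[i + 1].iov_base) {
            continue;
        }

        /* Element i closes the current run; the next one starts at i + 1. */
        runs[nruns].first = first;
        runs[nruns].count = i - first + 1;
        nruns++;
        first = i + 1;
    }

    return nruns;
}
```

Each run could then be handed to one positioned vectored call, in the way
the patch passes &iov[slice_idx] and slice_num to qio_channel_pwritev().
Using size_t for the index also avoids the signed/unsigned comparison
between 'int i' and 'size_t niov' in the quoted loop.
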
[RFC PATCH v3 15/30] io: Add a pwritev/preadv version that takes a discontiguous iovec

2023-11-27 Thread Fabiano Rosas
For the upcoming support for fixed-ram migration with multifd, we need
to be able to accept an iovec array with non-contiguous data.

Add a pwritev and preadv version that splits the array into contiguous
segments before writing. With that we can have the ram code continue
to add pages in any order and the multifd code continue to send large
arrays for reading and writing.

Signed-off-by: Fabiano Rosas 
---
- split the API that was merged into a single function
- use uintptr_t for compatibility with 32-bit
---
 include/io/channel.h | 26 
 io/channel.c | 70 
 2 files changed, 96 insertions(+)

diff --git a/include/io/channel.h b/include/io/channel.h
index 7986c49c71..25383db5aa 100644
--- a/include/io/channel.h
+++ b/include/io/channel.h
@@ -559,6 +559,19 @@ int qio_channel_close(QIOChannel *ioc,
 ssize_t qio_channel_pwritev(QIOChannel *ioc, const struct iovec *iov,
 size_t niov, off_t offset, Error **errp);
 
+/**
+ * qio_channel_pwritev_all:
+ * @ioc: the channel object
+ * @iov: the array of memory regions to write data from
+ * @niov: the length of the @iov array
+ * @offset: the iovec offset in the file where to write the data
+ * @errp: pointer to a NULL-initialized error object
+ *
+ * Returns: 0 if all bytes were written, or -1 on error
+ */
+int qio_channel_pwritev_all(QIOChannel *ioc, const struct iovec *iov,
+size_t niov, off_t offset, Error **errp);
+
 /**
  * qio_channel_pwrite
  * @ioc: the channel object
@@ -595,6 +608,19 @@ ssize_t qio_channel_pwrite(QIOChannel *ioc, char *buf, size_t buflen,
 ssize_t qio_channel_preadv(QIOChannel *ioc, const struct iovec *iov,
                            size_t niov, off_t offset, Error **errp);
 
+/**
+ * qio_channel_preadv_all:
+ * @ioc: the channel object
+ * @iov: the array of memory regions to read data to
+ * @niov: the length of the @iov array
+ * @offset: the iovec offset in the file from where to read the data
+ * @errp: pointer to a NULL-initialized error object
+ *
+ * Returns: 0 if all bytes were read, or -1 on error
+ */
+int qio_channel_preadv_all(QIOChannel *ioc, const struct iovec *iov,
+   size_t niov, off_t offset, Error **errp);
+
 /**
  * qio_channel_pread
  * @ioc: the channel object
diff --git a/io/channel.c b/io/channel.c
index a1f12f8e90..2f1745d052 100644
--- a/io/channel.c
+++ b/io/channel.c
@@ -472,6 +472,69 @@ ssize_t qio_channel_pwritev(QIOChannel *ioc, const struct iovec *iov,
     return klass->io_pwritev(ioc, iov, niov, offset, errp);
 }
 
+static int qio_channel_preadv_pwritev_contiguous(QIOChannel *ioc,
+                                                 const struct iovec *iov,
+                                                 size_t niov, off_t offset,
+                                                 bool is_write, Error **errp)
+{
+    ssize_t ret = -1;
+    int i, slice_idx, slice_num;
+    uintptr_t base, next, file_offset;
+    size_t len;
+
+    slice_idx = 0;
+    slice_num = 1;
+
+    /*
+     * If the iov array doesn't have contiguous elements, we need to
+     * split it in slices because we only have one (file) 'offset' for
+     * the whole iov. Do this here so callers don't need to break the
+     * iov array themselves.
+     */
+    for (i = 0; i < niov; i++, slice_num++) {
+        base = (uintptr_t) iov[i].iov_base;
+
+        if (i != niov - 1) {
+            len = iov[i].iov_len;
+            next = (uintptr_t) iov[i + 1].iov_base;
+
+            if (base + len == next) {
+                continue;
+            }
+        }
+
+        /*
+         * Use the offset of the first element of the segment that
+         * we're sending.
+         */
+        file_offset = offset + (uintptr_t) iov[slice_idx].iov_base;
+
+        if (is_write) {
+            ret = qio_channel_pwritev(ioc, &iov[slice_idx], slice_num,
+                                      file_offset, errp);
+        } else {
+            ret = qio_channel_preadv(ioc, &iov[slice_idx], slice_num,
+                                     file_offset, errp);
+        }
+
+        if (ret < 0) {
+            break;
+        }
+
+        slice_idx += slice_num;
+        slice_num = 0;
+    }
+
+    return (ret < 0) ? -1 : 0;
+}
+
+int qio_channel_pwritev_all(QIOChannel *ioc, const struct iovec *iov,
+                            size_t niov, off_t offset, Error **errp)
+{
+    return qio_channel_preadv_pwritev_contiguous(ioc, iov, niov,
+                                                 offset, true, errp);
+}
+
 ssize_t qio_channel_pwrite(QIOChannel *ioc, char *buf, size_t buflen,
                            off_t offset, Error **errp)
 {
@@ -501,6 +564,13 @@ ssize_t qio_channel_preadv(QIOChannel *ioc, const struct iovec *iov,
     return klass->io_preadv(ioc, iov, niov, offset, errp);
 }
 
+int qio_channel_preadv_all(QIOChannel *ioc, const struct iovec *iov,
+   size_t niov, off_t off