On Fri, Aug 04, 2023 at 12:14:32PM +0100, Richard W.M. Jones wrote:
> On Fri, Aug 04, 2023 at 11:49:18AM +0100, Richard W.M. Jones wrote:
> > On Wed, Aug 02, 2023 at 08:50:21PM -0500, Eric Blake wrote:
> > > Previously, we had not been doing any validation of server extent
> > > responses, which means a client query at an offset near the end of the
> > > export can result in a buggy server sending a response longer than the
> > > export length and potentially confusing the client.  The NBD spec also
> > > says that an extent length should be non-zero so that a successful
> > > block status call makes progress.  It is easy enough to track that the
> > > server has not overflowed the export size, and that we ensure an error
> > > on no progress even when the buggy server claims success.  Since the
> > > spec says a client should be prepared for a block status result to be
> > > truncated, the client should not care whether the truncation happened
> > > at the server or at libnbd after validating the server's response.
> > > 
> > > In the process, this patch reorganizes some of the code so that early
> > > exits are obvious, leading for less indentation in the success path.
> > > 
> > > Adding this sanity checking now makes it easier for future patches to
> > > do orthogonal support for a server's 32- or 64-bit reply, vs. a
> > > client's 32- or 64-bit API call.  Once 64-bit replies are in play, we
> > > will additionally have to worry about a 64-bit reply that overflows a
> > > 32-bit API callback without exceeding the exportsize.  Similarly,
> > > since nbd_get_size() already caps export size at 63 bits (based on
> > > off_t limitations), we have guaranteed that a 64-bit API callback will
> > > never see an extent length that could appear negative in a 64-bit
> > > signed type (at least OCaml benefits from that guarantee, since its
> > > only native 64-bit integer type is signed).
> > > 
> > > Signed-off-by: Eric Blake <ebl...@redhat.com>
> > > ---
> > > 
> > > v4: new patch
> > > ---
> > >  generator/states-reply-chunk.c | 78 +++++++++++++++++++++++++---------
> > >  1 file changed, 58 insertions(+), 20 deletions(-)
> > > 
> > > diff --git a/generator/states-reply-chunk.c 
> > > b/generator/states-reply-chunk.c
> > > index 17bb5149..735f9456 100644
> > > --- a/generator/states-reply-chunk.c
> > > +++ b/generator/states-reply-chunk.c
> > > @@ -461,6 +461,11 @@  REPLY.CHUNK_REPLY.RECV_BS_ENTRIES:
> > >    struct command *cmd = h->reply_cmd;
> > >    size_t i;
> > >    uint32_t context_id;
> > > +  int error;
> > > +  const char *name;
> > > +  uint32_t orig_len, len, flags;
> > > +  uint64_t total, cap;
> > > +  bool stop;
> > > 
> > >    switch (recv_into_rbuf (h)) {
> > >    case -1: SET_NEXT_STATE (%.DEAD); return 0;
> > > @@ -481,30 +486,63 @@  REPLY.CHUNK_REPLY.RECV_BS_ENTRIES:
> > >        if (context_id == h->meta_contexts.ptr[i].context_id)
> > >          break;
> > > 
> > > -    if (i < h->meta_contexts.len) {
> > > -      int error = cmd->error;
> > > -      const char *name = h->meta_contexts.ptr[i].name;
> > > -
> > > -      /* Need to byte-swap the entries returned, but apart from that
> > > -       * we don't validate them.  Yes, our 32-bit public API foolishly
> > > -       * tracks the number of uint32_t instead of block descriptors;
> > > -       * see _block_desc_is_multiple_of_bs_entry above.
> > > -       */
> > > -      for (i = 0; i < h->bs_count * 2; ++i)
> > > -        h->bs_entries[i] = be32toh (h->bs_entries[i]);
> > > -
> > > -      /* Call the caller's extent function.  */
> > > -      if (CALL_CALLBACK (cmd->cb.fn.extent, name, cmd->offset,
> > > -                         h->bs_entries, h->bs_count * 2, &error) == -1)
> > > -        if (cmd->error == 0)
> > > -          cmd->error = error ? error : EPROTO;
> > > -    }
> > > -    else
> > > +    SET_NEXT_STATE (%FINISH);
> > > +    if (i == h->meta_contexts.len) {
> > >        /* Emit a debug message, but ignore it. */
> > >        debug (h, "server sent unexpected meta context ID %" PRIu32,
> > >               context_id);
> > > +      break;
> > > +    }
> > > 
> > > -    SET_NEXT_STATE (%FINISH);
> > > +    name = h->meta_contexts.ptr[i].name;
> > > +    total = 0;
> > > +    cap = h->exportsize - cmd->offset;
> > > +    assert (cap <= h->exportsize);
> > > +
> > > +    /* Need to byte-swap the entries returned.  The NBD protocol
> > > +     * allows truncation as long as progress is made; the client
> > > +     * cannot tell the difference between a server's truncation or if
> > > +     * we truncate on a length we don't like.  We stop iterating on a
> > > +     * zero-length extent (error only if it is the first extent), and
> > > +     * on an extent beyond the exportsize (unconditional error after
> > > +     * truncating to exportsize); but do not diagnose issues with the
> > > +     * server's length alignments, flag values, nor compliance with
> > > +     * the REQ_ONE command flag.
> > > +     */
> > > +    for (i = 0, stop = false; i < h->bs_count && !stop; ++i) {
> > > +      orig_len = len = be32toh (h->bs_entries[i * 2]);
> > > +      flags = be32toh (h->bs_entries[i * 2 + 1]);
> > > +      total += len;
> > > +      if (len == 0) {
> > > +        stop = true;
> > > +        if (i > 0)
> > > +          break; /* Skip this and later extents; we already made 
> > > progress */
> > > +        /* Expose this extent as an error; we made no progress */
> > > +        cmd->error = cmd->error ? : EPROTO;
> > > +      }
> > > +      else if (total > cap) {
> > > +        /* Expose this extent as an error, after truncating to make 
> > > progress */
> > > +        stop = true;
> > > +        cmd->error = cmd->error ? : EPROTO;
> > > +        len -= total - cap;
> > > +      }
> > > +      h->bs_entries[i * 2] = len;
> > > +      h->bs_entries[i * 2 + 1] = flags;
> > > +    }
> > > +
> > > +    /* Call the caller's extent function.  Yes, our 32-bit public API
> > > +     * foolishly tracks the number of uint32_t instead of block
> > > +     * descriptors; see _block_desc_is_multiple_of_bs_entry above.
> > > +     */
> > > +    if (stop)
> > > +      debug (h, "truncating server's response at unexpected extent 
> > > length %"
> > > +             PRIu32 " and total %" PRIu64 " near extent %zu",
> > > +             orig_len, total, i);
> > > +    error = cmd->error;
> > > +    if (CALL_CALLBACK (cmd->cb.fn.extent, name, cmd->offset,
> > > +                       h->bs_entries, i * 2, &error) == -1)
> > > +      if (cmd->error == 0)
> > > +        cmd->error = error ? error : EPROTO;
> > >    }
> > >    return 0;
> > > 
> > 
> > Acked-by: Richard W.M. Jones <rjo...@redhat.com>
> > 
> > ... although I wonder if this might break something.  I think it's
> > possible for an nbdkit plugin to return a zero length extent, for
> > example if it has a simplistic internal model of regions of the disk.
> > Since the client can still make progress if at least the total length
> > of extents returned is > 0 I'm fairly sure this would work right now.
> 
> Like this:
> 
> $ cat extents.sh
> case "$1" in
>     get_size) echo 10M ;;
>     pread) dd if=/dev/zero count=$3 iflag=count_bytes ;;
>     can_extents) exit 0 ;;
>     extents)
>         echo "0 1M"
>         echo "1M 0 hole,zero"
>         echo "1M 9M"
>         exit 0 ;;
>     *) exit 2 ;;
> esac
> 
> $ nbdinfo --map [ nbdkit sh ./extents.sh ] 
>          0    10485760    0  data
> 
> I wonder if this breaks after this patch?

I looked at the code & it turns out that nbdkit ignores zero-length
extents, and merges extents with the same type, so I think the above
is actually sending a single 10M extent.

https://gitlab.com/nbdkit/nbdkit/-/blob/e94597fb99a86a38fcc0aea32c6102565c9bfd9d/server/extents.c#L158

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-p2v converts physical machines to virtual machines.  Boot with a
live CD or over the network (PXE) and turn machines into KVM guests.
http://libguestfs.org/virt-v2v
_______________________________________________
Libguestfs mailing list
Libguestfs@redhat.com
https://listman.redhat.com/mailman/listinfo/libguestfs

Reply via email to