Hi Michael,

Thanks for your response!
> -----Original Message-----
> From: snabb-de...@googlegroups.com [mailto:snabb-de...@googlegroups.com] On Behalf Of Michael S. Tsirkin
> Sent: Thursday, August 27, 2015 9:05 PM
> To: Ouyang, Changchun
> Cc: snabb-de...@googlegroups.com; qemu-devel@nongnu.org; thibaut.col...@6wind.com;
> n.nikol...@virtualopensystems.com; l...@snabb.co; Long, Thomas; Liu, Yuanhan
> Subject: Re: [snabb-devel] Re: [PATCH v6 1/2] vhost-user: add multi queue support
>
> On Tue, Aug 25, 2015 at 03:25:54AM +0000, Ouyang, Changchun wrote:
> > Hi Michael,
> >
> > > -----Original Message-----
> > > From: snabb-de...@googlegroups.com [mailto:snabb-de...@googlegroups.com] On Behalf Of Michael S. Tsirkin
> > > Sent: Thursday, August 13, 2015 5:19 PM
> > > To: Ouyang, Changchun
> > > Cc: qemu-devel@nongnu.org; snabb-de...@googlegroups.com; thibaut.col...@6wind.com;
> > > n.nikol...@virtualopensystems.com; l...@snabb.co; Long, Thomas
> > > Subject: [snabb-devel] Re: [PATCH v6 1/2] vhost-user: add multi queue support
> > >
> > > On Wed, Aug 12, 2015 at 02:25:41PM +0800, Ouyang Changchun wrote:
> > > > Based on patch by Nikolay Nikolaev:
> > > > Vhost-user will implement the multi queue support in a similar way
> > > > to what vhost already has - a separate thread for each queue.
> > > > To enable the multi queue functionality - a new command line
> > > > parameter "queues" is introduced for the vhost-user netdev.
> > > >
> > > > The RESET_OWNER change is based on commit:
> > > > 294ce717e0f212ed0763307f3eab72b4a1bdf4d0
> > > > If it is reverted, the patch need update for it accordingly.
> > > >
> > > > Signed-off-by: Nikolay Nikolaev <n.nikol...@virtualopensystems.com>
> > > > Signed-off-by: Changchun Ouyang <changchun.ouy...@intel.com>
> > > > ---
> > > > Changes since v5:
> > > >  - fix the message descption for VHOST_RESET_OWNER in vhost-user txt
> > > >
> > > > Changes since v4:
> > > >  - remove the unnecessary trailing '\n'
> > > >
> > > > Changes since v3:
> > > >  - fix one typo and wrap one long line
> > > >
> > > > Changes since v2:
> > > >  - fix vq index issue for set_vring_call
> > > >    When it is the case of VHOST_SET_VRING_CALL, The vq_index is not
> > > >    initialized before it is used, thus it could be a random value.
> > > >    The random value leads to crash in vhost after passing down to
> > > >    vhost, as vhost use this random value to index an array index.
> > > >  - fix the typo in the doc and description
> > > >  - address vq index for reset_owner
> > > >
> > > > Changes since v1:
> > > >  - use s->nc.info_str when bringing up/down the backend
> > > >
> > > >  docs/specs/vhost-user.txt |  7 ++++++-
> > > >  hw/net/vhost_net.c        |  3 ++-
> > > >  hw/virtio/vhost-user.c    | 11 ++++++++++-
> > > >  net/vhost-user.c          | 37 ++++++++++++++++++++++++-------------
> > > >  qapi-schema.json          |  6 +++++-
> > > >  qemu-options.hx           |  5 +++--
> > > >  6 files changed, 50 insertions(+), 19 deletions(-)
> > > >
> > > > diff --git a/docs/specs/vhost-user.txt b/docs/specs/vhost-user.txt
> > > > index 70da3b1..9390f89 100644
> > > > --- a/docs/specs/vhost-user.txt
> > > > +++ b/docs/specs/vhost-user.txt
> > > > @@ -135,6 +135,11 @@ As older slaves don't support negotiating protocol features,
> > > >  a feature bit was dedicated for this purpose:
> > > >  #define VHOST_USER_F_PROTOCOL_FEATURES 30
> > > >
> > > > +Multi queue support
> > > > +-------------------
> > > > +The protocol supports multiple queues by setting all index fields
> > > > +in the sent messages to a properly calculated value.
> > > > +
> > > >  Message types
> > > >  -------------
> > > >
> > > > @@ -198,7 +203,7 @@ Message types
> > > >
> > > >       Id: 4
> > > >       Equivalent ioctl: VHOST_RESET_OWNER
> > > > -      Master payload: N/A
> > > > +      Master payload: vring state description
> > > >
> > > >       Issued when a new connection is about to be closed. The Master will no
> > > >       longer own this connection (and will usually close it).
> > >
> > > This is an interface change, isn't it?
> > > We can't make it unconditionally, need to make it dependent on a
> > > protocol flag.
>
> Pls remember to fix this one.
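To make sure I understand the fix you are asking for: the vring state payload for
VHOST_USER_RESET_OWNER would only be sent once a dedicated protocol feature bit has
been negotiated, otherwise the old payload-less layout is kept. Roughly like the
sketch below -- the VHOST_USER_PROTOCOL_F_RESET_PAYLOAD name is just a placeholder I
made up here, and I am assuming the negotiated bits are visible as
dev->protocol_features:

    /* Hypothetical protocol feature bit guarding the new RESET_OWNER payload. */
    #define VHOST_USER_PROTOCOL_F_RESET_PAYLOAD 1

    case VHOST_USER_RESET_OWNER:
        if (dev->protocol_features &
            (1ULL << VHOST_USER_PROTOCOL_F_RESET_PAYLOAD)) {
            /* New layout: tell the slave which vring the reset refers to. */
            memcpy(&msg.state, arg, sizeof(struct vhost_vring_state));
            msg.state.index += dev->vq_index;
            msg.size = sizeof(m.state);
        }
        /* Old slaves keep receiving the message without any payload. */
        break;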
> > > > diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
> > > > index 1f25cb3..9cd6c05 100644
> > > > --- a/hw/net/vhost_net.c
> > > > +++ b/hw/net/vhost_net.c
> > > > @@ -159,6 +159,7 @@ struct vhost_net *vhost_net_init(VhostNetOptions *options)
> > > >
> > > >      net->dev.nvqs = 2;
> > > >      net->dev.vqs = net->vqs;
> > > > +    net->dev.vq_index = net->nc->queue_index;
> > > >
> > > >      r = vhost_dev_init(&net->dev, options->opaque,
> > > >                         options->backend_type, options->force);
> > > > @@ -269,7 +270,7 @@ static void vhost_net_stop_one(struct vhost_net *net,
> > > >      for (file.index = 0; file.index < net->dev.nvqs; ++file.index) {
> > > >          const VhostOps *vhost_ops = net->dev.vhost_ops;
> > > >          int r = vhost_ops->vhost_call(&net->dev, VHOST_RESET_OWNER,
> > > > -                                      NULL);
> > > > +                                      &file);
> > > >          assert(r >= 0);
> > > >      }
> > > > }
> > > > diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> > > > index 27ba035..fb11d4c 100644
> > > > --- a/hw/virtio/vhost-user.c
> > > > +++ b/hw/virtio/vhost-user.c
> > > > @@ -219,7 +219,12 @@ static int vhost_user_call(struct vhost_dev *dev, unsigned long int request,
> > > >          break;
> > > >
> > > >      case VHOST_USER_SET_OWNER:
> > > > +        break;
> > > > +
> > > >      case VHOST_USER_RESET_OWNER:
> > > > +        memcpy(&msg.state, arg, sizeof(struct vhost_vring_state));
> > > > +        msg.state.index += dev->vq_index;
> > > > +        msg.size = sizeof(m.state);
> > > >          break;
> > > >
> > > >      case VHOST_USER_SET_MEM_TABLE:
> > > > @@ -262,17 +267,20 @@ static int vhost_user_call(struct vhost_dev *dev, unsigned long int request,
> > > >      case VHOST_USER_SET_VRING_NUM:
> > > >      case VHOST_USER_SET_VRING_BASE:
> > > >          memcpy(&msg.state, arg, sizeof(struct vhost_vring_state));
> > > > +        msg.state.index += dev->vq_index;
> > > >          msg.size = sizeof(m.state);
> > > >          break;
> > > >
> > > >      case VHOST_USER_GET_VRING_BASE:
> > > >          memcpy(&msg.state, arg, sizeof(struct vhost_vring_state));
> > > > +        msg.state.index += dev->vq_index;
> > > >          msg.size = sizeof(m.state);
> > > >          need_reply = 1;
> > > >          break;
> > > >
> > > >      case VHOST_USER_SET_VRING_ADDR:
> > > >          memcpy(&msg.addr, arg, sizeof(struct vhost_vring_addr));
> > > > +        msg.addr.index += dev->vq_index;
> > > >          msg.size = sizeof(m.addr);
> > > >          break;
> > > >
> > > > @@ -280,7 +288,7 @@ static int vhost_user_call(struct vhost_dev *dev, unsigned long int request,
> > > >      case VHOST_USER_SET_VRING_CALL:
> > > >      case VHOST_USER_SET_VRING_ERR:
> > > >          file = arg;
> > > > -        msg.u64 = file->index & VHOST_USER_VRING_IDX_MASK;
> > > > +        msg.u64 = (file->index + dev->vq_index) & VHOST_USER_VRING_IDX_MASK;
> > > >          msg.size = sizeof(m.u64);
> > > >          if (ioeventfd_enabled() && file->fd > 0) {
> > > >              fds[fd_num++] = file->fd;
> > > > @@ -322,6 +330,7 @@ static int vhost_user_call(struct vhost_dev *dev, unsigned long int request,
> > > >              error_report("Received bad msg size.");
> > > >              return -1;
> > > >          }
> > > > +        msg.state.index -= dev->vq_index;
> > > >          memcpy(arg, &msg.state, sizeof(struct vhost_vring_state));
> > > >          break;
> > > >      default:
> > > > diff --git a/net/vhost-user.c b/net/vhost-user.c
> > > > index 1d86a2b..904d8af 100644
> > > > --- a/net/vhost-user.c
> > > > +++ b/net/vhost-user.c
> > > > @@ -121,35 +121,39 @@ static void net_vhost_user_event(void *opaque, int event)
> > > >      case CHR_EVENT_OPENED:
> > > >          vhost_user_start(s);
> > > >          net_vhost_link_down(s, false);
> > > > -        error_report("chardev \"%s\" went up", s->chr->label);
> > > > +        error_report("chardev \"%s\" went up", s->nc.info_str);
> > > >          break;
> > > >      case CHR_EVENT_CLOSED:
> > > >          net_vhost_link_down(s, true);
> > > >          vhost_user_stop(s);
> > > > -        error_report("chardev \"%s\" went down", s->chr->label);
> > > > +        error_report("chardev \"%s\" went down", s->nc.info_str);
> > > >          break;
> > > >      }
> > > > }
>
> BTW this seems pretty hacky: you get multiple messages when one client
> connects. Why add multiple event listeners to the same chat device?
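Right, with this patch every queue registers its own handler on the same chardev, so
net_vhost_user_event() runs once per queue for a single connect or disconnect. One
way I can think of to avoid that is to register the handler only for the first queue.
This is just a rough, untested sketch against the loop added in this patch; the event
handler would then have to start/stop all queues from that single callback:

    /* inside the for (i = 0; i < queues; i++) loop of net_vhost_user_init() */
    s->nc.queue_index = i;

    if (i == 0) {
        /* Register the chardev event handler only once per socket, so
         * CHR_EVENT_OPENED/CLOSED is not delivered once per queue. */
        qemu_chr_add_handlers(s->chr, NULL, NULL, net_vhost_user_event, s);
    }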
> > > >
> > > >  static int net_vhost_user_init(NetClientState *peer, const char *device,
> > > > -                               const char *name, CharDriverState *chr)
> > > > +                               const char *name, CharDriverState *chr,
> > > > +                               uint32_t queues)
> > > >  {
> > > >      NetClientState *nc;
> > > >      VhostUserState *s;
> > > > +    int i;
> > > >
> > > > -    nc = qemu_new_net_client(&net_vhost_user_info, peer, device, name);
> > > > +    for (i = 0; i < queues; i++) {
> > > > +        nc = qemu_new_net_client(&net_vhost_user_info, peer,
> > > > +                                 device, name);
> > > >
> > > > -    snprintf(nc->info_str, sizeof(nc->info_str), "vhost-user to %s",
> > > > -             chr->label);
> > > > +        snprintf(nc->info_str, sizeof(nc->info_str), "vhost-user%d to %s",
> > > > +                 i, chr->label);
> > > >
> > > > -    s = DO_UPCAST(VhostUserState, nc, nc);
> > > > +        s = DO_UPCAST(VhostUserState, nc, nc);
> > > >
> > > > -    /* We don't provide a receive callback */
> > > > -    s->nc.receive_disabled = 1;
> > > > -    s->chr = chr;
> > > > -
> > > > -    qemu_chr_add_handlers(s->chr, NULL, NULL, net_vhost_user_event, s);
> > > > +        /* We don't provide a receive callback */
> > > > +        s->nc.receive_disabled = 1;
> > > > +        s->chr = chr;
> > > > +        s->nc.queue_index = i;
> > > > +        qemu_chr_add_handlers(s->chr, NULL, NULL,
> > > > +                              net_vhost_user_event, s);
> > > > +    }
> > > >      return 0;
> > > >  }
> > > >
> > > > @@ -225,6 +229,7 @@ static int net_vhost_check_net(QemuOpts *opts, void *opaque)
> > >
> > > There are two problems here:
> > >
> > > 1. we don't really know that the backend
> > > is able to support the requested number of queues.
> > > If not, everything will fail, silently.
> > > A new message to query the # of queues could help, though
> > > I'm not sure what can be done on failure. Fail connection?
> > >
> > > 2. each message (e.g. set memory table) is sent multiple times,
> > > on the same socket.
> >
> > I think it is tough to resolve these 2 comments, as the current
> > message is either vhost-dev based or virt-queue based. The multiple
> > queues (pair) feature uses multiple vhost-devs to implement itself.
> >
> > For #1
> > So the queue number is something that should be seen at the upper level,
> > above the vhost-dev, rather than inside the vhost-dev.
> > For each vhost-net, there are 2 virt-queues, one is for Rx, the other is for Tx.
> > Introduce the virt-queue pair number into the vhost-dev? But I don't
> > think it is good, as for each vhost-dev, there is only one virt-queue pair.
> >
> > Where should I put the virt-queue pair number? I don't have the perfect
> > answer yet. Any suggestion is welcome.
> >
> > Could we assume the vhost backend has the ability to create enough
> > virt-queue pairs (e.g. 0x8000 is the max) if qemu requires the vhost backend
> > to do it? If that is correct, we don't need to get the virt-queue pair number
> > from the vhost backend, as the vhost backend can create all virt-queue pairs
> > required by qemu.
> > The virtio frontend (in the guest) has the flexibility to enable whichever
> > virt-queues match its own capability, and qemu can do it by using the
> > set_vring_flag message to notify the vhost backend.
>
> I'm reluctant to agree to this. Implementations tend to get this wrong, e.g.
> they would only test with 2 queues and assume everything is OK.
> With an explicit message, this seems more robust.
>
> Why is it so hard to implement? User specifies queues=X.
> Can't we simply validate that backend supports this # of queues?

Yes, exactly. In QEMU we use queues=X to specify the number of virt-queue pairs.
In my previous comments I was talking about how to resolve the following scenario:
the vhost backend supports queues=Y while QEMU asks for queues=X (Y != X, e.g. X = 8,
Y = 4). How does the vhost backend tell QEMU that it can handle at most 4 virt-queue
pairs, so that QEMU can derive its queue number from both its own parameter and the
maximum reported by the backend? Do you think this scenario makes sense? If yes, we
should consider extending the protocol with a new message that lets the vhost backend
report its virt-queue pair number to QEMU. Beyond that, we would either need to
change the code so that each net_client can own more than one virt-queue pair (a pair
being two virt-queues, one for Rx and one for Tx), or find a better solution.
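To make that concrete, I imagine something along these lines in net_init_vhost_user()
once the "queues" option has been parsed. This is only a sketch:
vhost_user_get_max_queues() is a hypothetical helper that would send the new (not yet
specified) query message and return the slave's answer:

    uint64_t max_queues;

    /* Ask the backend how many virt-queue pairs it really supports. */
    if (vhost_user_get_max_queues(chr, &max_queues) < 0) {
        error_report("vhost-user backend did not report its queue count");
        return -1;
    }

    if (queues > max_queues) {
        /* Fail the connection instead of failing silently later on. */
        error_report("queues=%u requested, but backend supports only %" PRIu64,
                     queues, max_queues);
        return -1;
    }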
> >
> > For #2
> > The memory table message is also vhost-dev based, so it wouldn't hurt to
> > send it a few times. The vhost backend could keep it vhost-dev based too,
> > or keep it once (accept it the first time and ignore it in the rest of the
> > messages from the same connected fd). Any other good suggestion is welcome
> > too :-)
>
> Add code in vhost-user to skip sending the useless messages? Yes they
> seem harmless but implementations tend to develop dependencies on such
> bugs, then we get to maintain them forever.

Since we use multiple vhost_net instances (multiple net_clients) to implement the
multiple virt-queue pairs, only the first virt-queue pair would send the
set-memory-table message and the others would skip it. Is that OK? It does mean the
net clients differ in their message sequences: the first net client sends the set
memory table message and the others do not.
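Concretely, I was thinking of a guard as simple as this at the top of the
VHOST_USER_SET_MEM_TABLE case in vhost_user_call(), so that only the vhost_dev owning
the first queue pair emits the message (rough sketch, not tested):

    case VHOST_USER_SET_MEM_TABLE:
        if (dev->vq_index != 0) {
            /* The memory table describes the whole guest, not one queue pair;
             * it has already been sent by the first device on this socket. */
            return 0;
        }
        /* ... existing code that fills in msg.memory ... */
        break;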
Thanks
Changchun

> > > >
> > > >  int net_init_vhost_user(const NetClientOptions *opts, const char *name,
> > > >                          NetClientState *peer)
> > > >  {
> > > > +    uint32_t queues;
> > > >      const NetdevVhostUserOptions *vhost_user_opts;
> > > >      CharDriverState *chr;
> > > >
> > > > @@ -243,6 +248,12 @@ int net_init_vhost_user(const NetClientOptions *opts, const char *name,
> > > >          return -1;
> > > >      }
> > > >
> > > > +    /* number of queues for multiqueue */
> > > > +    if (vhost_user_opts->has_queues) {
> > > > +        queues = vhost_user_opts->queues;
> > > > +    } else {
> > > > +        queues = 1;
> > > > +    }
> > > >
> > > > -    return net_vhost_user_init(peer, "vhost_user", name, chr);
> > > > +    return net_vhost_user_init(peer, "vhost_user", name, chr, queues);
> > > >  }
> > > > diff --git a/qapi-schema.json b/qapi-schema.json
> > > > index f97ffa1..51e40ce 100644
> > > > --- a/qapi-schema.json
> > > > +++ b/qapi-schema.json
> > > > @@ -2444,12 +2444,16 @@
> > > >  #
> > > >  # @vhostforce: #optional vhost on for non-MSIX virtio guests (default: false).
> > > >  #
> > > > +# @queues: #optional number of queues to be created for multiqueue vhost-user
> > > > +#          (default: 1) (Since 2.5)
> > > > +#
> > > >  # Since 2.1
> > > >  ##
> > > >  { 'struct': 'NetdevVhostUserOptions',
> > > >    'data': {
> > > >      'chardev': 'str',
> > > > -    '*vhostforce': 'bool' } }
> > > > +    '*vhostforce': 'bool',
> > > > +    '*queues': 'uint32' } }
> > > >
> > > >  ##
> > > >  # @NetClientOptions
> > > > diff --git a/qemu-options.hx b/qemu-options.hx
> > > > index ec356f6..dad035e 100644
> > > > --- a/qemu-options.hx
> > > > +++ b/qemu-options.hx
> > > > @@ -1942,13 +1942,14 @@ The hubport netdev lets you connect a NIC to a QEMU "vlan"
> > > >  instead of a single netdev.  @code{-net} and @code{-device} with parameter
> > > >  @option{vlan} create the required hub automatically.
> > > >
> > > > -@item -netdev vhost-user,chardev=@var{id}[,vhostforce=on|off]
> > > > +@item -netdev vhost-user,chardev=@var{id}[,vhostforce=on|off][,queues=n]
> > > >
> > > >  Establish a vhost-user netdev, backed by a chardev @var{id}. The
> > > >  chardev should be a unix domain socket backed one. The vhost-user
> > > >  uses a specifically defined protocol to pass vhost ioctl
> > > >  replacement messages to an application on the other end of the
> > > >  socket. On non-MSIX guests, the feature can be forced with
> > > > -@var{vhostforce}.
> > > > +@var{vhostforce}. Use 'queues=@var{n}' to specify the number of
> > > > +queues to be created for multiqueue vhost-user.
> > > >
> > > >  Example:
> > > > @example
> > > > --
> > > > 1.8.4.2