Re: [Qemu-devel] [RFC] docs: vhost-user: add in-band kick/call messages

2019-09-12 Thread Johannes Berg
On Thu, 2019-09-12 at 14:22 +0200, Stefan Hajnoczi wrote:
> 
> The vhost-user spec is unclear and inconsistent.  Patches are welcome.

:)

> A footnote describing the old terminology would be necessary so that
> existing documentation, code, etc. can still be deciphered when the spec
> changes the terminology :).

Yeah.

But we (you) would have to decide first what the terminology actually
*should* be :-)

I'd have said something like "host" and "device", but then "host" can
get confusing in the context of qemu: is it the host that the VM is
running on? It's the VM that's hosting the device, but that's a bit
confusing in this context.

Client/server might work, but it's not very obvious either (**), and
quite a bit of the text actually gets it wrong right now, so it'd be
very confusing to swap that and have a footnote or similar indicate that
it was described the other way around previously ...

Calling it master/slave as the text does now is a bit confusing to me
because it has DMA (of sorts) and that's called "bus mastering" in most
other contexts, but perhaps that's what would work best nonetheless,
though I'm not really a fan of that terminology.

Perhaps master/device would be nicer; getting the term "device" in there
somewhere would seem appropriate, even if it's not actually a
device, but from the view inside the "master" VM it is a device...


(**) Btw, is it really true that there's a way to have the device make
the connection to a listening unix domain socket, as implied by the
spec? I cannot find evidence for such a mode in any code...

johannes




Re: [Qemu-devel] [RFC] docs: vhost-user: add in-band kick/call messages

2019-09-12 Thread Stefan Hajnoczi
On Wed, Sep 11, 2019 at 05:36:32PM +0200, Johannes Berg wrote:
> On Wed, 2019-09-11 at 17:31 +0200, Stefan Hajnoczi wrote:
> 
> > > +``VHOST_USER_VQ_CALL``
> > 
> > "call" should be "kick".  "kick" is the driver->device request
> > submission notification and "call" is the device->driver request
> > completion notification.
> 
> Ahrg, yes. I confuse this all the time, sorry, will fix.
> 
> Btw, speaking of confusion ... this document may need some cleanup wrt.
> the "client" language since it states that both sides might be a socket
> server or client, but then goes on to talk about the "client" as though
> it was equivalent to "slave", where normally it's actually the other way
> around (the device is the one listening on the socket, i.e. it's the
> server). This is most pronounced in the section "Starting and stopping
> rings".
> 
> IMHO, it'd be good to talk more about "device" and "host" instead of
> "slave" and "master" too since that's clearer, and just replace *all*
> that language accordingly, but YMMV.

The vhost-user spec is unclear and inconsistent.  Patches are welcome.
A footnote describing the old terminology would be necessary so that
existing documentation, code, etc. can still be deciphered when the spec
changes the terminology :).




Re: [Qemu-devel] [RFC] docs: vhost-user: add in-band kick/call messages

2019-09-11 Thread Johannes Berg
On Wed, 2019-09-11 at 17:36 +0200, Johannes Berg wrote:
> On Wed, 2019-09-11 at 17:31 +0200, Stefan Hajnoczi wrote:
> 
> > > +``VHOST_USER_VQ_CALL``

It should also be VRING, not VQ, but I did indeed fix that in v2 already
along with the CALL<->KICK inversion.

So I guess I just have to include the part about how it's different from
existing methods, and solve the discussion with Michael about polling
(and possibly use another flag there for message-based mode).

johannes




Re: [Qemu-devel] [RFC] docs: vhost-user: add in-band kick/call messages

2019-09-11 Thread Johannes Berg
On Wed, 2019-09-11 at 17:31 +0200, Stefan Hajnoczi wrote:

> > +``VHOST_USER_VQ_CALL``
> 
> "call" should be "kick".  "kick" is the driver->device request
> submission notification and "call" is the device->driver request
> completion notification.

Ahrg, yes. I confuse this all the time, sorry, will fix.

Btw, speaking of confusion ... this document may need some cleanup wrt.
the "client" language since it states that both sides might be a socket
server or client, but then goes on to talk about the "client" as though
it was equivalent to "slave", where normally it's actually the other way
around (the device is the one listening on the socket, i.e. it's the
server). This is most pronounced in the section "Starting and stopping
rings".

IMHO, it'd be good to talk more about "device" and "host" instead of
"slave" and "master" too since that's clearer, and just replace *all*
that language accordingly, but YMMV.

> > +  :id: 34
> > +  :equivalent ioctl: N/A
> > +  :slave payload: vring state description
> > +  :master payload: N/A
> > +
> > +  When the ``VHOST_USER_PROTOCOL_F_KICK_CALL_MSGS`` protocol feature has
> > +  been successfully negotiated, this message may be submitted by the master
> > +  to indicate that a buffer was added to the vring instead of signalling it
> > +  using the vring's event FD or having the slave rely on polling.
> > +
> > +  The state.num field is currently reserved and must be set to 0.
> 
> Please include an explanation of how this message is different from
> the existing methods.  Maybe something like:
> 
>   Unlike eventfd or polling, this message allows the master to control
> precisely when virtqueue processing happens.  When the master uses
> ``need_reply`` to receive a reply, it knows the device has become
> aware of the virtqueue activity.

Sure, thanks, I'll incorporate that.

> >  Slave message types
> >  ---
> > 
> > @@ -1246,6 +1265,19 @@ Slave message types
> >``VHOST_USER_PROTOCOL_F_HOST_NOTIFIER`` protocol feature has been
> >successfully negotiated.
> > 
> > +``VHOST_USER_VQ_KICK``
> 
> s/KICK/CALL/

Indeed. Wait, that should be VHOST_USER_SLAVE_VQ_CALL, actually. Maybe I
already did fix that in v2?

johannes




Re: [Qemu-devel] [RFC] docs: vhost-user: add in-band kick/call messages

2019-09-11 Thread Stefan Hajnoczi
On Mon, Sep 2, 2019 at 2:13 PM Johannes Berg  wrote:
>
> From: Johannes Berg 
>
> For good reason, vhost-user is currently built asynchronously, that
> way better performance can be obtained. However, for certain use
> cases such as simulation, this is problematic.
>
> Consider an event-based simulation in which both the device and CPU
> are scheduled according to a simulation "calendar". Now, for
> example, consider the CPU sending a command to the device, over a
> vring in the vhost-user protocol. In this case, the CPU must wait
> for the vring update to have been processed, so that the device has
> time to put an entry onto the simulation calendar to obtain time to
> handle the interrupt, before the CPU goes back to scheduling and
> thus updates the simulation calendar or even has the simulation
> move time forward.
>
> This cannot easily be achieved with the eventfd-based vring kick/call
> design.
>
> Extend the protocol slightly, so that a message can be used for kick
> and call instead, if the new VHOST_USER_PROTOCOL_F_KICK_CALL_MSGS is
> negotiated. This in itself doesn't guarantee synchronisation, but both
> sides can also negotiate VHOST_USER_PROTOCOL_F_REPLY_ACK and thus get
> a reply to this message by setting the need_reply flag, and ensure
> synchronisation this way.
>
> Signed-off-by: Johannes Berg 
> ---
>  docs/interop/vhost-user.rst | 36 ++--
>  1 file changed, 34 insertions(+), 2 deletions(-)
>
> diff --git a/docs/interop/vhost-user.rst b/docs/interop/vhost-user.rst
> index 7827b710aa0a..1270885fecf2 100644
> --- a/docs/interop/vhost-user.rst
> +++ b/docs/interop/vhost-user.rst
> @@ -2,6 +2,7 @@
>  Vhost-user Protocol
>  ===
>  :Copyright: 2014 Virtual Open Systems Sarl.
> +:Copyright: 2019 Intel Corporation
>  :Licence: This work is licensed under the terms of the GNU GPL,
>version 2 or later. See the COPYING file in the top-level
>directory.
> @@ -785,6 +786,7 @@ Protocol features
>#define VHOST_USER_PROTOCOL_F_SLAVE_SEND_FD  10
>#define VHOST_USER_PROTOCOL_F_HOST_NOTIFIER  11
>#define VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD 12
> +  #define VHOST_USER_PROTOCOL_F_KICK_CALL_MSGS 13
>
>  Master message types
>  
> @@ -946,7 +948,9 @@ Master message types
>Bits (0-7) of the payload contain the vring index. Bit 8 is the
>invalid FD flag. This flag is set when there is no file descriptor
>in the ancillary data. This signals that polling should be used
> -  instead of waiting for a kick.
> +  instead of waiting for the call. however, if the protocol feature
> +  ``VHOST_USER_PROTOCOL_F_KICK_CALL_MSGS`` has been negotiated it instead
> +  means the updates should be done using the messages.
>
>  ``VHOST_USER_SET_VRING_CALL``
>:id: 13
> @@ -959,7 +963,9 @@ Master message types
>Bits (0-7) of the payload contain the vring index. Bit 8 is the
>invalid FD flag. This flag is set when there is no file descriptor
>in the ancillary data. This signals that polling will be used
> -  instead of waiting for the call.
> +  instead of waiting for the call; however, if the protocol feature
> +  ``VHOST_USER_PROTOCOL_F_KICK_CALL_MSGS`` has been negotiated it instead
> +  means the updates should be done using the messages.
>
>  ``VHOST_USER_SET_VRING_ERR``
>:id: 14
> @@ -1190,6 +1196,19 @@ Master message types
>ancillary data. The GPU protocol is used to inform the master of
>rendering state and updates. See vhost-user-gpu.rst for details.
>
> +``VHOST_USER_VQ_CALL``

"call" should be "kick".  "kick" is the driver->device request
submission notification and "call" is the device->driver request
completion notification.

> +  :id: 34
> +  :equivalent ioctl: N/A
> +  :slave payload: vring state description
> +  :master payload: N/A
> +
> +  When the ``VHOST_USER_PROTOCOL_F_KICK_CALL_MSGS`` protocol feature has
> +  been successfully negotiated, this message may be submitted by the master
> +  to indicate that a buffer was added to the vring instead of signalling it
> +  using the vring's event FD or having the slave rely on polling.
> +
> +  The state.num field is currently reserved and must be set to 0.

Please include an explanation of how this message is different from
the existing methods.  Maybe something like:

  Unlike eventfd or polling, this message allows the master to control
precisely when virtqueue processing happens.  When the master uses
``need_reply`` to receive a reply, it knows the device has become
aware of the virtqueue activity.

>  Slave message types
>  ---
>
> @@ -1246,6 +1265,19 @@ Slave message types
>``VHOST_USER_PROTOCOL_F_HOST_NOTIFIER`` protocol feature has been
>successfully negotiated.
>
> +``VHOST_USER_VQ_KICK``

s/KICK/CALL/



Re: [Qemu-devel] [RFC] docs: vhost-user: add in-band kick/call messages

2019-09-11 Thread Stefan Hajnoczi
On Wed, Sep 11, 2019 at 11:13 AM Johannes Berg
 wrote:
> On Wed, 2019-09-11 at 09:35 +0200, Stefan Hajnoczi wrote:
> > On Tue, Sep 10, 2019 at 05:14:36PM +0200, Johannes Berg wrote:
> > > On Tue, 2019-09-10 at 17:03 +0200, Stefan Hajnoczi wrote:
> > I'm suggesting this because it seems like a cleaner approach than
> > exposing the calendar concept to the vhost-user devices.
>
> As I argue above, for all intents and purposes that only works for
> extremely trivial devices, to the extent that I'd rather exclude them
> than risk unexpected behaviour.

That makes sense, I think a vhost-user protocol extension is necessary
here.  Thanks for explaining.

Stefan



Re: [Qemu-devel] [RFC] docs: vhost-user: add in-band kick/call messages

2019-09-11 Thread Johannes Berg
On Wed, 2019-09-11 at 09:35 +0200, Stefan Hajnoczi wrote:
> On Tue, Sep 10, 2019 at 05:14:36PM +0200, Johannes Berg wrote:
> > On Tue, 2019-09-10 at 17:03 +0200, Stefan Hajnoczi wrote:
> > > > Now, this means that the CPU (that's part of the simulation) has to
> > > > *wait* for the device to add an entry to the simulation calendar in
> > > > response to the kick... That means that it really has to look like
> > > > 
> > > > CPU   device   calendar
> > > >  ---[kick]-->
> > > >  ---[add entry]-->
> > > >  <---[return]-
> > > 
> > > What are the semantics of returning from the calendar?  Does it mean
> > > "it's now your turn to run?", "your entry has been added and you'll be
> > > notified later when it's time to run?", or something else?
> > 
> > The latter - the entry was added, and you'll be notified when it's time
> > to run; but we need to have that state on the calendar so the CPU won't
> > converse with the calendar before that state is committed.
> 
> Is the device only adding a calendar entry and not doing any actual
> device emulation at this stage?

Correct, yes.

With one exception: in the case of the simtime (calendar) device, the
actual processing *does* happen at this stage, of course - the calendar
entry has to have been added before we return.

> If yes, then this suggests the system could be structured more cleanly.
> The vhost-user device process should focus on device emulation.  It
> should not be aware of the calendar.

Decoupling the device entirely from the simulation doesn't work, at
least it depends on what you want to emulate. If you don't care that
everything in the device happens immediately (in simulation time), then
this might be OK - but in most cases you do want to model some latency,
processing time or similar in the device, and that means the device has
to request more calendar entries for its further processing.

Take a network device for example that wants to model a 50ms latency. It
has to first have a calendar event to take the packet from the driver
onto the wire, and then have another calendar event to push the packet
from the wire out some other driver. The first of those events could be
modelled by what you suggest below, the second cannot.

> The vhost-user protocol also shouldn't require modifications.
> 
> Instead, Linux arch/um code would add the entry to the calendar when the
> CPU wants to kick a vhost-user device.  I assume the CPU is suspended
> until arch/um code completes adding the entry to the calendar.

Right, OK, so far I'm following, and it seems perfectly reasonable.

Though as I said above (the simtime exception) - suspending the CPU
while adding an entry to the calendar actually also relies on the
KICK/ACK message semantics right now. This could easily be implemented
differently in this particular device though, e.g. by waiting for an ACK
message on a response virtqueue after sending the "add-entry" request on
the command virtqueue.

> When the calendar decides to run the device entry it signals the
> vhost-user kick eventfd.

Now you have to send those FDs also to the calendar, but I guess the
calendar is a vhost-user device too anyway, so we can send it the FD
along with the request to add the calendar entry, i.e. instead of adding
a calendar entry "please tell me" you can add a calendar entry with
"please kick this FD". Seems reasonable, though it requires much deeper
integration of the virtio implementation with the calendar than I was
planning, though possibly a bit less such integration on the device
side.

> The vhost-user device processes the virtqueue
> as if it had been directly signalled by the CPU, totally unaware that
> it's running within a simulation system.

As I said above, it cannot be entirely unaware unless it's a very
trivial device emulation. That *might* be something you actually
want/don't care, for example for a "control network" within the
simulation where you don't need to model any latencies, however it also
very easily introduces issues, say if the vhost-user device emulation,
focusing completely on emulation, starts doing 'sleep()' or similar OS
calls; they really should be going to the simulation instead.


However, really the place where this breaks down is that you don't know
when the device has finished processing.

As a totally made-up example, say you're emulating a strange /dev/null
device, but as a special quirk it can only ever consume an even number
of buffers. You give it a buffer on a virtqueue and kick it, nothing
happens; you give it another one and kick it, it'll consume both and
free (call) up the two entries, doing nothing else.

In the simulation of this device, it has to essentially behave like
this:
On kick, it schedules a calendar entry to process the interrupt. Once
that entry is signalled, the interrupt processing code runs and checks
the state of the virtqueue; if there's an even number of buffers it
releases 

Re: [Qemu-devel] [RFC] docs: vhost-user: add in-band kick/call messages

2019-09-11 Thread Stefan Hajnoczi
On Tue, Sep 10, 2019 at 05:14:36PM +0200, Johannes Berg wrote:
> On Tue, 2019-09-10 at 17:03 +0200, Stefan Hajnoczi wrote:
> > 
> > > Now, this means that the CPU (that's part of the simulation) has to
> > > *wait* for the device to add an entry to the simulation calendar in
> > > response to the kick... That means that it really has to look like
> > > 
> > > CPU   device   calendar
> > >  ---[kick]-->
> > >  ---[add entry]-->
> > >  <---[return]-
> > 
> > What are the semantics of returning from the calendar?  Does it mean
> > "it's now your turn to run?", "your entry has been added and you'll be
> > notified later when it's time to run?", or something else?
> 
> The latter - the entry was added, and you'll be notified when it's time
> to run; but we need to have that state on the calendar so the CPU won't
> converse with the calendar before that state is committed.

Is the device only adding a calendar entry and not doing any actual
device emulation at this stage?

If yes, then this suggests the system could be structured more cleanly.
The vhost-user device process should focus on device emulation.  It
should not be aware of the calendar.  The vhost-user protocol also
shouldn't require modifications.

Instead, Linux arch/um code would add the entry to the calendar when the
CPU wants to kick a vhost-user device.  I assume the CPU is suspended
until arch/um code completes adding the entry to the calendar.

When the calendar decides to run the device entry it signals the
vhost-user kick eventfd.  The vhost-user device processes the virtqueue
as if it had been directly signalled by the CPU, totally unaware that
it's running within a simulation system.

The irq path is similar: the device signals the callfd and the calendar
adds an entry to notify UML that the request has completed.

Some pseudo-code:

arch/um/drivers/.../vhost-user.c:

  void um_vu_kick(struct um_vu_vq *vq)
  {
  if (simulation_mode) {
  calendar_add_entry({
  .type = CAL_ENTRY_TYPE_VHOST_USER_KICK,
  .device = vq->dev,
  .vq_idx = vq->idx,
  });
  return;
  }

  /* The normal code path: signal the kickfd */
  uint64_t val = 1;
  write(vq->kickfd, &val, sizeof(val));
  }

I'm suggesting this because it seems like a cleaner approach than
exposing the calendar concept to the vhost-user devices.  I'm not 100%
sure it offers the semantics you need to make everything deterministic
though.
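The irq path could presumably be handled symmetrically; continuing Stefan's pseudo-code above with the same caveats (hypothetical names, sketch only):

  void um_vu_call(struct um_vu_vq *vq)
  {
          if (simulation_mode) {
                  /* The device completed some buffers; let the calendar
                   * decide when the CPU gets to see the interrupt. */
                  calendar_add_entry({
                          .type = CAL_ENTRY_TYPE_VHOST_USER_CALL,
                          .device = vq->dev,
                          .vq_idx = vq->idx,
                  });
                  return;
          }

          /* The normal code path: signal the callfd */
          uint64_t val = 1;
          write(vq->callfd, &val, sizeof(val));
  }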

A different topic: vhost-user does not have a 1:1 vq buffer:kick
relationship.  It's possible to add multiple buffers and kick only once.
It is also possible for the device to complete multiple buffers and only
call once.

This could pose a problem for simulation because it allows a degree of
non-determinism.  But as long as the both the CPU and the I/O completion
of the device happen on a strict schedule this isn't a problem.
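In device code this means the virtqueue handler has to drain everything available per notification rather than assume one buffer per kick, roughly along these lines (sketch; virtqueue_pop and the other names are stand-ins for whatever the implementation provides):

  /* One kick may announce many buffers: pop until the ring is empty,
   * and likewise batch completions before sending a single call. */
  while ((elem = virtqueue_pop(vq)) != NULL)
          process_buffer(vq, elem);
  send_call(vq);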

Stefan




Re: [Qemu-devel] [RFC] docs: vhost-user: add in-band kick/call messages

2019-09-11 Thread Stefan Hajnoczi
On Tue, Sep 10, 2019 at 05:34:57PM +0200, Johannes Berg wrote:
> On Tue, 2019-09-10 at 11:33 -0400, Michael S. Tsirkin wrote:
> > On Tue, Sep 10, 2019 at 05:14:36PM +0200, Johannes Berg wrote:
> > > Is any of you familiar with the process of getting a virtio device ID
> > > assigned, and if so, do you think it'd be feasible? Without that, it'd
> > > probably be difficult to upstream the patch to support this protocol to
> > > user-mode Linux.
> > 
> > Sure, subscribe then send a patch to virtio-comm...@lists.oasis-open.org
> 
> Ok, great.
> 
> > We do expect people to eventually get around to documenting the device
> > and upstreaming it though. If there's no plan to do it at all, you might
> > still be able to reuse the virtio code, in that case let's talk.
> 
> Right, no, I do want to and am working on the code now, but it's a bit
> of a chicken & egg - without an ID I can't really send any code upstream
> :-)
> 
> I can accompany the request for a new ID with working patches.
> 
> What kind of documentation beyond the header file should be added, and
> where?

You can reserve the device ID without any header files or documentation.
Just a patch like this one will suffice:

  
https://github.com/oasis-tcs/virtio-spec/commit/9454b568c29baab7f3e4b1a384627d0061f71eba

I have checked that device ID 29 appears to be free so you could use it.

For the actual VIRTIO device specification, please follow the same
format as the other devices.  Here is the virtio-net section in the
VIRTIO spec:

  
https://docs.oasis-open.org/virtio/virtio/v1.1/cs01/virtio-v1.1-cs01.html#x1-1940001

It documents the virtqueues, configuration space layout, theory of
operation, and also includes normative statements that compliant drivers
and devices are expected to follow.

The goal of the spec is to provide all the information needed by driver
and device emulation authors to create an implementation from scratch
(without studying existing code in Linux, QEMU, etc).

The VIRTIO spec contains pseudo-C struct and constant definitions, but
not a real header file.  The header file for a Linux driver would live
in include/uapi/linux/virtio_foo.h (see existing devices for examples).
This would be part of your Linux patches and separate from the virtio
spec.

Stefan




Re: [Qemu-devel] [RFC] docs: vhost-user: add in-band kick/call messages

2019-09-10 Thread Johannes Berg
On Tue, 2019-09-10 at 11:33 -0400, Michael S. Tsirkin wrote:
> On Tue, Sep 10, 2019 at 05:14:36PM +0200, Johannes Berg wrote:
> > Is any of you familiar with the process of getting a virtio device ID
> > assigned, and if so, do you think it'd be feasible? Without that, it'd
> > probably be difficult to upstream the patch to support this protocol to
> > user-mode Linux.
> 
> Sure, subscribe then send a patch to virtio-comm...@lists.oasis-open.org

Ok, great.

> We do expect people to eventually get around to documenting the device
> and upstreaming it though. If there's no plan to do it at all, you might
> still be able to reuse the virtio code, in that case let's talk.

Right, no, I do want to and am working on the code now, but it's a bit
of a chicken & egg - without an ID I can't really send any code upstream
:-)

I can accompany the request for a new ID with working patches.

What kind of documentation beyond the header file should be added, and
where?

johannes




Re: [Qemu-devel] [RFC] docs: vhost-user: add in-band kick/call messages

2019-09-10 Thread Michael S. Tsirkin
On Tue, Sep 10, 2019 at 05:14:36PM +0200, Johannes Berg wrote:
> Is any of you familiar with the process of getting a virtio device ID
> assigned, and if so, do you think it'd be feasible? Without that, it'd
> probably be difficult to upstream the patch to support this protocol to
> user-mode Linux.

Sure, subscribe then send a patch to virtio-comm...@lists.oasis-open.org

We do expect people to eventually get around to documenting the device
and upstreaming it though. If there's no plan to do it at all, you might
still be able to reuse the virtio code, in that case let's talk.

-- 
MST



Re: [Qemu-devel] [RFC] docs: vhost-user: add in-band kick/call messages

2019-09-10 Thread Johannes Berg
On Tue, 2019-09-10 at 17:03 +0200, Stefan Hajnoczi wrote:
> 
> > Now, this means that the CPU (that's part of the simulation) has to
> > *wait* for the device to add an entry to the simulation calendar in
> > response to the kick... That means that it really has to look like
> > 
> > CPU   device   calendar
> >  ---[kick]-->
> >  ---[add entry]-->
> >  <---[return]-
> 
> What are the semantics of returning from the calendar?  Does it mean
> "it's now your turn to run?", "your entry has been added and you'll be
> notified later when it's time to run?", or something else?

The latter - the entry was added, and you'll be notified when it's time
to run; but we need to have that state on the calendar so the CPU won't
converse with the calendar before that state is committed.

> >  <-[return]--
> 
> What are the semantics of returning to the CPU?  Does it mean the
> submitted I/O request is now complete?

No, it just means that the interrupt was triggered; it will be handled
once the calendar decide that it's the device's turn to run its
interrupt processing event.

The I/O request completion will be done later in a very similar fashion
(just inverting the participants, and replacing "kick" by "call").

> Sorry for these questions.  If this becomes tedious for you I will look
> into the paper you linked.

No no, don't worry, it's fine. Also, the paper doesn't really concern
itself with issues such as this, they just assume everything is
synchronous and make their code that way - they have a simulated device
in qemu that they wrote themselves, and it just happens to be
synchronous by way of their implementation...

What I'm trying to do here now is not really replicate that paper or
anything, but - because I'm working on similar things and simulation -
get some pieces into the upstreams (qemu for vhost-user, Linux for user-
mode-Linux running in a simulation) so (a) I don't have to maintain it
out-of-tree, and (b) it's available for others for use without a monster
patch to an ancient version of the software stack :-)

Happy to answer any questions.

Aside from the interrupt processing semantics in UML which I'll work on
next I do have the whole simulation calendar working etc., i.e. I have
full UML "VM" controlled entirely by the simulation calendar, also over
virtio.



Btw, that reminds me ... there's yet another process to add a virtio
device ID. I'm currently defining a virtio simulation calendar protocol
like this:

+/*
+ * A message passed on either of the VQs (input and output).
+ */
+struct virtio_simtime_msg {
+   __u64 op;   /* see below */
+   __u64 time; /* time in nanoseconds */
+};
+
+/**
+ * enum virtio_simtime_ops - Operation codes
+ */
+enum virtio_simtime_ops {
+   /**
+* @VIRTIO_SIMTIME_REQUEST: request from device to run at the given time
+*/
+   VIRTIO_SIMTIME_REQUEST  = 1,
+
+   /**
+* @VIRTIO_SIMTIME_WAIT: wait for the requested runtime, new requests
+*  may be made while waiting (due to interrupts); the time field
+*  is ignored
+*/
+   VIRTIO_SIMTIME_WAIT = 2,
+
+   /**
+* @VIRTIO_SIMTIME_GET: ask device to fill the buffer with the current
+*  simulation time
+*/
+   VIRTIO_SIMTIME_GET  = 3,
+
+   /**
+* @VIRTIO_SIMTIME_UPDATE: time update to/from the control device
+*/
+   VIRTIO_SIMTIME_UPDATE   = 4,
+
+   /**
+* @VIRTIO_SIMTIME_RUN: run time request granted, current time is in
+*  the time field
+*/
+   VIRTIO_SIMTIME_RUN  = 5,
+
+   /**
+* @VIRTIO_SIMTIME_FREE_UNTIL: enable free-running until the given time,
+*  this is a message from the device telling the user that it can do
+*  its own scheduling for anything before the given time
+*/
+   VIRTIO_SIMTIME_FREE_UNTIL   = 6,
+};


Above, I've basically only described _REQUEST, _WAIT and _RUN, I'm
pretty sure I need _GET for interrupt handling (or maybe more
efficiently a _REQUEST_NOW, rather than using _GET && _REQUEST); I think
I need _UPDATE before a "kick" to the device if I have _FREE_UNTIL which
is an optimization to not have to talk to the calendar all the time.


Is any of you familiar with the process of getting a virtio device ID
assigned, and if so, do you think it'd be feasible? Without that, it'd
probably be difficult to upstream the patch to support this protocol to
user-mode Linux.

Thanks,
johannes




Re: [Qemu-devel] [RFC] docs: vhost-user: add in-band kick/call messages

2019-09-10 Thread Stefan Hajnoczi
On Mon, Sep 09, 2019 at 07:34:24PM +0200, Johannes Berg wrote:
> On Mon, 2019-09-09 at 18:00 +0200, Stefan Hajnoczi wrote:
> 
> > Is this really necessary?  
> 
> Yes* :)
> 
> > Can the simulation interpose between the
> > call/kick eventfds in order to control when events happen?
> > 
> >   CPU --cpu_kickfd--> Simulation --vhost_kickfd--> vhost-user device
> > 
> > and:
> > 
> >   vhost-user device --vhost_callfd--> Simulation -->cpu_callfd-> CPU
> > 
> The simulation controls when the CPU's kick is seen by the device and
> > also when the call is seen by the CPU.
> 
> The point isn't to let the simulation know about anything that happens.
> The CPU and the device are *part* of the simulation.

I was thinking more of letting the simulation decide when to signal the
second eventfd in each pair, but now I understand you are trying to make
everything synchronous and ioeventfd is async by definition.

> 
> > I don't understand why new vhost-user protocol messages are required.
> 
> I guess I haven't really explained it well then :-)
> 
> So let's say, WLOG, I have a simulated network and a bunch of Linux
> machines that are running on simulation time. Today I can do that only
> with user-mode Linux, but we'll see.
> 
> Now in order to run everything on simulation time, *everything* that
> happens in the simulation needs to request a simulation calendar entry,
> and gets told when that entry is scheduled.
> 
> So think, for example, you have
> 
> CPU ---[kick]---> device
> 
> Now, this is essentially triggering an interrupt in the device. However,
> the simulation code has to ensure that the simulated device's interrupt
> handling only happens at a scheduler entry. Fundamentally, the
> simulation serializes all processing, contrary to what you want in a
> real system.
> 
> Now, this means that the CPU (that's part of the simulation) has to
> *wait* for the device to add an entry to the simulation calendar in
> response to the kick... That means that it really has to look like
> 
> CPU   device   calendar
>  ---[kick]-->
>  ---[add entry]-->
>  <---[return]-

What are the semantics of returning from the calendar?  Does it mean
"it's now your turn to run?", "your entry has been added and you'll be
notified later when it's time to run?", or something else?

>  <-[return]--

What are the semantics of returning to the CPU?  Does it mean the
submitted I/O request is now complete?

> 
> so that the CPU won't get to idle or some other processing where it asks
> the simulation calendar for its own next entry...
>
> Yes, like I said before, I realize that all of this is completely
> opposed to what you want in a real system, but then in a real system you
> also have real timeouts, and don't just skip time forward when the
> simulation calendar says so ...

Sorry for these questions.  If this becomes tedious for you I will look
into the paper you linked.

Stefan




Re: [Qemu-devel] [RFC] docs: vhost-user: add in-band kick/call messages

2019-09-09 Thread Johannes Berg
On Mon, 2019-09-09 at 18:00 +0200, Stefan Hajnoczi wrote:

> Is this really necessary?  

Yes* :)

> Can the simulation interpose between the
> call/kick eventfds in order to control when events happen?
> 
>   CPU --cpu_kickfd--> Simulation --vhost_kickfd--> vhost-user device
> 
> and:
> 
>   vhost-user device --vhost_callfd--> Simulation -->cpu_callfd-> CPU
> 
The simulation controls when the CPU's kick is seen by the device and
> also when the call is seen by the CPU.

The point isn't to let the simulation know about anything that happens.
The CPU and the device are *part* of the simulation.

> I don't understand why new vhost-user protocol messages are required.

I guess I haven't really explained it well then :-)

So let's say, WLOG, I have a simulated network and a bunch of Linux
machines that are running on simulation time. Today I can do that only
with user-mode Linux, but we'll see.

Now in order to run everything on simulation time, *everything* that
happens in the simulation needs to request a simulation calendar entry,
and gets told when that entry is scheduled.

So think, for example, you have

CPU ---[kick]---> device

Now, this is essentially triggering an interrupt in the device. However,
the simulation code has to ensure that the simulated device's interrupt
handling only happens at a scheduler entry. Fundamentally, the
simulation serializes all processing, contrary to what you want in a
real system.

Now, this means that the CPU (that's part of the simulation) has to
*wait* for the device to add an entry to the simulation calendar in
response to the kick... That means that it really has to look like

CPU   device   calendar
 ---[kick]-->
 ---[add entry]-->
 <---[return]-
 <-[return]--

so that the CPU won't get to idle or some other processing where it asks
the simulation calendar for its own next entry...

Yes, like I said before, I realize that all of this is completely
opposed to what you want in a real system, but then in a real system you
also have real timeouts, and don't just skip time forward when the
simulation calendar says so ...
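
To make the handshake above concrete, here's a toy discrete-event sketch (not
the actual simulation code; the delays and names are made up). The key point
is that kick() returns only after the device's calendar entry exists, so the
CPU can't run the calendar past the pending interrupt:

```python
import heapq

class Calendar:
    """Minimal simulation calendar: a priority queue of (time, callback)."""
    def __init__(self):
        self.now = 0
        self._events = []
        self._seq = 0  # tie-breaker so heapq never compares callbacks

    def add_entry(self, delay, callback):
        self._seq += 1
        heapq.heappush(self._events, (self.now + delay, self._seq, callback))

    def run(self):
        while self._events:
            self.now, _, cb = heapq.heappop(self._events)
            cb()

class Device:
    def __init__(self, calendar):
        self.calendar = calendar
        self.handled_at = None

    def kick(self):
        # The kick only *schedules* interrupt handling; the device must
        # grab a calendar entry before control returns to the CPU,
        # otherwise the calendar could skip time past the interrupt.
        self.calendar.add_entry(5, self.handle_interrupt)

    def handle_interrupt(self):
        self.handled_at = self.calendar.now

cal = Calendar()
dev = Device(cal)

def cpu_idle():
    pass

def cpu_sends_command():
    dev.kick()                    # synchronous: returns once the entry exists
    cal.add_entry(100, cpu_idle)  # only now may the CPU schedule itself again

cal.add_entry(0, cpu_sends_command)
cal.run()
print(dev.handled_at)  # -> 5, not silently deferred past time 100
```

With an asynchronous (eventfd) kick, cpu_sends_command could reach its own
add_entry before the device's entry exists, and the calendar would jump
straight to time 100.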


* Now, of course I lied, this is software after all. The *concept* is
necessary, but it's not strictly necessary to do this in-band in the
vhost-user protocol.
We could do an out-of-band simulation socket for the kick signal and
just pretend we're using polling mode as far as the vhost-user protocol
is concerned, but it'd probably be harder to implement, and we couldn't
do it in a way that we could actually contribute anything upstream.
There are quite a few papers proposing such simulation systems, I only
found the VMSimInt one publishing their code, but even that is some
hacks on top of qemu 1.6.0...

johannes




Re: [Qemu-devel] [RFC] docs: vhost-user: add in-band kick/call messages

2019-09-09 Thread Stefan Hajnoczi
On Thu, Sep 05, 2019 at 10:28:33PM +0200, Johannes Berg wrote:
>  
> > +``VHOST_USER_VQ_CALL``
> > +  :id: 34
> > +  :equivalent ioctl: N/A
> > +  :slave payload: vring state description
> > +  :master payload: N/A
> 
> Oops. This message should be called VHOST_USER_VRING_KICK.
> 
> This file doesn't talk about virtqueues, just vrings, and I inverted the
> call/kick.
> 
> [...]
>  
> > +``VHOST_USER_VQ_KICK``
> > +  :id: 4
> > +  :equivalent ioctl: N/A
> > +  :slave payload: vring state description
> > +  :master payload: N/A
> 
> Similarly, that should be called VHOST_USER_SLAVE_VRING_CALL.
> 
> Anyway, some comments would be appreciated. I'm working on an
> implementation now for my simulation environment, and I guess I can keep
> that private etc. but if there's interest I can submit an (optional)
> implementation of this for libvhost-user too, I think.

Is this really necessary?  Can the simulation interpose between the
call/kick eventfds in order to control when events happen?

  CPU --cpu_kickfd--> Simulation --vhost_kickfd--> vhost-user device

and:

  vhost-user device --vhost_callfd--> Simulation -->cpu_callfd-> CPU

The simulation controls when the CPU's kick is seen by the device and
also when the call is seen by the CPU.
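
Roughly this kind of interposition, I suppose — a sketch with pipes standing
in for the kick eventfds (names and buffering scheme are illustrative only,
not an existing API):

```python
import os

class Interposer:
    """Buffers CPU kicks and forwards them to the device only when the
    simulation decides the device may observe them.  Pipes stand in for
    the kick eventfds here."""
    def __init__(self):
        self.cpu_kick_r, self.cpu_kick_w = os.pipe()      # CPU -> interposer
        self.vhost_kick_r, self.vhost_kick_w = os.pipe()  # interposer -> device
        self.pending = 0

    def collect(self):
        # Drain the CPU-side fd; remember how many kicks are pending.
        self.pending += len(os.read(self.cpu_kick_r, 64))

    def release(self):
        # The simulation says the device may now see one kick.
        if self.pending:
            self.pending -= 1
            os.write(self.vhost_kick_w, b"\x01")

ip = Interposer()
os.write(ip.cpu_kick_w, b"\x01")    # CPU kicks
ip.collect()                        # simulation intercepts the kick
ip.release()                        # ...and forwards it when scheduled
print(os.read(ip.vhost_kick_r, 1))  # device finally sees it: b'\x01'
```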

I don't understand why new vhost-user protocol messages are required.

Stefan




Re: [Qemu-devel] [RFC] docs: vhost-user: add in-band kick/call messages

2019-09-05 Thread Johannes Berg
 
> +``VHOST_USER_VQ_CALL``
> +  :id: 34
> +  :equivalent ioctl: N/A
> +  :slave payload: vring state description
> +  :master payload: N/A

Oops. This message should be called VHOST_USER_VRING_KICK.

This file doesn't talk about virtqueues, just vrings, and I inverted the
call/kick.

[...]
 
> +``VHOST_USER_VQ_KICK``
> +  :id: 4
> +  :equivalent ioctl: N/A
> +  :slave payload: vring state description
> +  :master payload: N/A

Similarly, that should be called VHOST_USER_SLAVE_VRING_CALL.

Anyway, some comments would be appreciated. I'm working on an
implementation now for my simulation environment, and I guess I can keep
that private etc. but if there's interest I can submit an (optional)
implementation of this for libvhost-user too, I think.

johannes




[Qemu-devel] [RFC] docs: vhost-user: add in-band kick/call messages

2019-09-02 Thread Johannes Berg
From: Johannes Berg 

For good reason, vhost-user is currently built asynchronously; that
way, better performance can be obtained. However, for certain use
cases such as simulation, this is problematic.

Consider an event-based simulation in which both the device and CPU
are scheduled according to a simulation "calendar". Now, for
example, consider the CPU sending a command to the device, over a
vring in the vhost-user protocol. In this case, the CPU must wait
for the vring update to have been processed, so that the device has
time to put an entry onto the simulation calendar to obtain time to
handle the interrupt, before the CPU goes back to scheduling and
thus updates the simulation calendar or even has the simulation
move time forward.

This cannot easily be achieved with the eventfd-based vring kick/call
design.

Extend the protocol slightly, so that a message can be used for kick
and call instead, if the new VHOST_USER_PROTOCOL_F_KICK_CALL_MSGS is
negotiated. This in itself doesn't guarantee synchronisation, but both
sides can also negotiate VHOST_USER_PROTOCOL_F_REPLY_ACK and thus get
a reply to this message by setting the need_reply flag, and ensure
synchronisation this way.
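
As an illustration only: assuming the standard vhost-user framing (u32
request, u32 flags carrying the version and the need_reply bit, u32 size,
then the payload) and the message ID 34 proposed by this RFC, a synchronous
kick/call message could be built like this:

```python
import struct

# Message ID proposed by this RFC (not yet in the spec)
VHOST_USER_VQ_CALL = 34

VHOST_USER_VERSION    = 0x1     # protocol version, flags bits 0-1
VHOST_USER_NEED_REPLY = 1 << 3  # valid once F_REPLY_ACK is negotiated

def vq_call_message(vring_index, need_reply=True):
    """Build a VQ_CALL message: a 12-byte header (request, flags, size)
    followed by a vring state description (index, num); num is reserved
    and must be 0.  Setting need_reply makes the peer send an ack, which
    is what provides the synchronisation."""
    payload = struct.pack("<II", vring_index, 0)
    flags = VHOST_USER_VERSION
    if need_reply:
        flags |= VHOST_USER_NEED_REPLY
    header = struct.pack("<III", VHOST_USER_VQ_CALL, flags, len(payload))
    return header + payload

msg = vq_call_message(vring_index=2)
# 12-byte header + 8-byte vring state payload = 20 bytes on the wire
```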

Signed-off-by: Johannes Berg 
---
 docs/interop/vhost-user.rst | 36 ++--
 1 file changed, 34 insertions(+), 2 deletions(-)

diff --git a/docs/interop/vhost-user.rst b/docs/interop/vhost-user.rst
index 7827b710aa0a..1270885fecf2 100644
--- a/docs/interop/vhost-user.rst
+++ b/docs/interop/vhost-user.rst
@@ -2,6 +2,7 @@
 Vhost-user Protocol
 ===
 :Copyright: 2014 Virtual Open Systems Sarl.
+:Copyright: 2019 Intel Corporation
 :Licence: This work is licensed under the terms of the GNU GPL,
   version 2 or later. See the COPYING file in the top-level
   directory.
@@ -785,6 +786,7 @@ Protocol features
   #define VHOST_USER_PROTOCOL_F_SLAVE_SEND_FD  10
   #define VHOST_USER_PROTOCOL_F_HOST_NOTIFIER  11
   #define VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD 12
+  #define VHOST_USER_PROTOCOL_F_KICK_CALL_MSGS 13
 
 Master message types
 
@@ -946,7 +948,9 @@ Master message types
   Bits (0-7) of the payload contain the vring index. Bit 8 is the
   invalid FD flag. This flag is set when there is no file descriptor
   in the ancillary data. This signals that polling should be used
-  instead of waiting for a kick.
+  instead of waiting for a kick; however, if the protocol feature
+  ``VHOST_USER_PROTOCOL_F_KICK_CALL_MSGS`` has been negotiated, it instead
+  means that the updates should be done using the messages.
 
 ``VHOST_USER_SET_VRING_CALL``
   :id: 13
@@ -959,7 +963,9 @@ Master message types
   Bits (0-7) of the payload contain the vring index. Bit 8 is the
   invalid FD flag. This flag is set when there is no file descriptor
   in the ancillary data. This signals that polling will be used
-  instead of waiting for the call.
+  instead of waiting for the call; however, if the protocol feature
+  ``VHOST_USER_PROTOCOL_F_KICK_CALL_MSGS`` has been negotiated, it instead
+  means that the updates should be done using the messages.
 
 ``VHOST_USER_SET_VRING_ERR``
   :id: 14
@@ -1190,6 +1196,19 @@ Master message types
   ancillary data. The GPU protocol is used to inform the master of
   rendering state and updates. See vhost-user-gpu.rst for details.
 
+``VHOST_USER_VQ_CALL``
+  :id: 34
+  :equivalent ioctl: N/A
+  :slave payload: vring state description
+  :master payload: N/A
+
+  When the ``VHOST_USER_PROTOCOL_F_KICK_CALL_MSGS`` protocol feature has
+  been successfully negotiated, this message may be submitted by the master
+  to indicate that a buffer was added to the vring instead of signalling it
+  using the vring's event FD or having the slave rely on polling.
+
+  The state.num field is currently reserved and must be set to 0.
+
 Slave message types
 ---
 
@@ -1246,6 +1265,19 @@ Slave message types
   ``VHOST_USER_PROTOCOL_F_HOST_NOTIFIER`` protocol feature has been
   successfully negotiated.
 
+``VHOST_USER_VQ_KICK``
+  :id: 4
+  :equivalent ioctl: N/A
+  :slave payload: vring state description
+  :master payload: N/A
+
+  When the ``VHOST_USER_PROTOCOL_F_KICK_CALL_MSGS`` protocol feature has
+  been successfully negotiated, this message may be submitted by the slave
+  to indicate that a buffer was used from the vring instead of signalling it
+  using the vring's kick FD or having the master rely on polling.
+
+  The state.num field is currently reserved and must be set to 0.
+
 .. _reply_ack:
 
 VHOST_USER_PROTOCOL_F_REPLY_ACK
-- 
2.20.1