Re: [Qemu-devel] [PATCH 0/2] virtio len fixes for qemu.
"Michael S. Tsirkin" writes: > On Mon, Mar 16, 2015 at 01:44:22PM +1030, Rusty Russell wrote: >> diff --git a/content.tex b/content.tex >> index 6ba079d..2c946a5 100644 >> --- a/content.tex >> +++ b/content.tex >> @@ -600,10 +600,19 @@ them: it is only written to by the device, and read by >> the driver. >> Each entry in the ring is a pair: \field{id} indicates the head entry of the >> descriptor chain describing the buffer (this matches an entry >> placed in the available ring by the guest earlier), and \field{len} the >> total >> -of bytes written into the buffer. The latter is extremely useful >> -for drivers using untrusted buffers: if you do not know exactly >> -how much has been written by the device, you usually have to zero >> -the buffer to ensure no data leakage occurs. >> +of bytes written into the buffer. >> + >> +\begin{note} >> +\field{len} is useful >> +for drivers using untrusted buffers: if a driver does not know exactly >> +how much has been written by the device, the driver would have to zero >> +the buffer in advance to ensure no data leakage occurs. >> + >> +For example, a network driver may hand a received buffer directly to >> +an unprivileged userspace application. If the network device has not >> +overwritten the bytes which were in that buffer, this may leak the >> +contents of freed memory from other processes to the application. >> +\end{note} >> >> \begin{note} >> The legacy \hyperref[intro:Virtio PCI Draft]{[Virtio PCI Draft]} >> @@ -612,6 +621,28 @@ the constant as VRING_USED_F_NO_NOTIFY, but the layout >> and value were >> identical. >> \end{note} >> >> +\devicenormative{\subsubsection}{Virtqueue Notification Suppression}{Basic >> Facilities of a Virtio Device / Virtqueues / The Virtqueue Used Ring} >> + >> +The device MUST set \field{len} prior to updating the used \field{idx}. >> + >> +The device MUST write at least \field{len} bytes to descriptor, >> +beginning at the first device-writable buffer, >> +prior to updating the used \field{idx}. >> + >> +The device MAY write more than \field{len} bytes to descriptor. >> + >> +\begin{note} >> +There are potential error cases where a device might not know what >> +parts of the buffers have been written. This is why \field{len} may >> +be an underestimate, but that's preferable to the driver believing >> +that uninitialized memory has been overwritten when it has not. >> +\end{note} >> + >> +\drivernormative{\subsubsection}{Virtqueue Notification Suppression}{Basic >> Facilities of a Virtio Device / Virtqueues / The Virtqueue Used Ring} >> + >> +The driver MUST NOT make assumptions about data in device-writable buffers >> +beyond the first \field{len} bytes, and SHOULD ignore it. > > it -> this data. > > Otherwise on first reading I thought "it" refers to len field. Thanks, fixed. >> + >> \subsection{Virtqueue Notification Suppression}\label{sec:Basic Facilities >> of a Virtio Device / Virtqueues / Virtqueue Notification Suppression} >> >> The device can suppress notifications in a manner analogous to the way > > Sounds good, let's move discussion to virtio/virtio-dev now? > I think it's 1.1 material - agree? Moving. My intent was not to change the spec, but to clarify it. It's a bug in the spec that we didn't spell out what len means, but left it by implication. I'm OK with leaving it for 1.1 if you think this fix is too much, as long as implementations don't leave fixing it until 1.1 :) Thanks, Rusty.
Re: [Qemu-devel] virtio fixes pull for 4.0?
"Michael S. Tsirkin" writes: > On Mon, Mar 09, 2015 at 05:43:19PM +1030, Rusty Russell wrote: >> > I think it's a good idea to merge these patches (maybe except the >> > !TASK_RUNNING thing) sooner rather than later, to make sure people have >> > the time to test the fixes properly. Would you like me to pack up (some >> > of them) them up and do a pull request? >> >> I'm waiting a bit longer, since they seem to still be tricking in. > > I don't think there was a pull request since rc1 which > makes me a bit anxious. Any outstanding issues? Well, I was hoping for an ack on your mmio patches. But the weekend is over, so I've pushed and sent Linus a pull request. Cheers, Rusty.
Re: [Qemu-devel] [PATCH 0/2] virtio len fixes for qemu.
"Michael S. Tsirkin" writes: > On Fri, Mar 13, 2015 at 11:47:18AM +1030, Rusty Russell wrote: >> Here's my proposed spec patch, which spells this out: >> >> diff --git a/content.tex b/content.tex >> index 6ba079d..b6345a8 100644 >> --- a/content.tex >> +++ b/content.tex >> @@ -600,10 +600,19 @@ them: it is only written to by the device, and read by >> the driver. >> Each entry in the ring is a pair: \field{id} indicates the head entry of the >> descriptor chain describing the buffer (this matches an entry >> placed in the available ring by the guest earlier), and \field{len} the >> total >> -of bytes written into the buffer. The latter is extremely useful >> +of bytes written into the buffer. >> + >> +\begin{note} >> +\field{len} is extremely useful > > just "useful" maybe? OK. >> for drivers using untrusted buffers: if you do not know exactly > > replace "you" with "driver" here? Yep. >> -how much has been written by the device, you usually have to zero >> -the buffer to ensure no data leakage occurs. >> +how much has been written by the device, a driver would have to zero >> +the buffer in advance to ensure no data leakage occurs. >> + >> +For example, a network driver > > any driver really, right? Well, the block device has an explicit status byte, and an fixed length. But there's a subtler detail I was considering when I designed this. Imagine a Xen-style "driver domain" which is actually your device; it's *untrusted*. This is possible if the (trusted) host does that actual data transfer, *and* reports the length; and such a mechanism is generic, so the host doesn't need to whether this is a block, net, or other device. (Imagine the device-guest has R/O mapping of the avail ring and descriptor table. Ignoring indirect descriptors you only need a "copy this data to/from this avail entry" helper to make this work). > How about something like this: > > +The device MUST write at least \field{len} bytes to descriptor, > +beginning at the first device-writable buffer, > +prior to updating the used index field. > +The device MAY write more than \field{len} bytes to descriptor. > +The driver MUST NOT make assumptions about data in the buffer pointed to > +by the descriptor with WRITE flag > +beyond the first \field{len} bytes: the data > +might be unchanged by the device, or it might be > +overwritten by the device. > +The driver SHOULD ignore data beyond the first \field{len} bytes. I like these, as long as we note that this MAY is to allow error cases, otherwise people might think they should just set len to zero. Here it is, using the device-writable terminology, and explicitly requiring that the device must set len (otherwise the requirements about the device obeying len makes it look like it's set by the driver): diff --git a/content.tex b/content.tex index 6ba079d..2c946a5 100644 --- a/content.tex +++ b/content.tex @@ -600,10 +600,19 @@ them: it is only written to by the device, and read by the driver. Each entry in the ring is a pair: \field{id} indicates the head entry of the descriptor chain describing the buffer (this matches an entry placed in the available ring by the guest earlier), and \field{len} the total -of bytes written into the buffer. The latter is extremely useful -for drivers using untrusted buffers: if you do not know exactly -how much has been written by the device, you usually have to zero -the buffer to ensure no data leakage occurs. +of bytes written into the buffer. + +\begin{note} +\field{len} is useful +for drivers using untrusted buffers: if a driver does not know exactly +how much has been written by the device, the driver would have to zero +the buffer in advance to ensure no data leakage occurs. + +For example, a network driver may hand a received buffer directly to +an unprivileged userspace application. If the network device has not +overwritten the bytes which were in that buffer, this may leak the +contents of freed memory from other processes to the application. +\end{note} \begin{note} The legacy \hyperref[intro:Virtio PCI Draft]{[Virtio PCI Draft]} @@ -612,6 +621,28 @@ the constant as VRING_USED_F_NO_NOTIFY, but the layout and value were identical. \end{note} +\devicenormative{\subsubsection}{Virtqueue Notification Suppression}{Basic Facilities of a Virtio Device / Virtqueues / The Virtqueue Used Ring} + +The device MUST set \field{len} prior to updating the used \field{idx}. + +The device MUST write at least \field{len} bytes to descriptor, +beginning at the first device-writable buffer, +prior to updating the used \field{idx}. + +The device MAY write more than \field{len} bytes to descriptor. + +\begin{note} +There are potential error case
Re: [Qemu-devel] [PATCH] uapi/virtio_scsi: allow overriding CDB/SENSE size
"Michael S. Tsirkin" writes: > On Wed, Mar 11, 2015 at 02:19:03PM +0100, Michael S. Tsirkin wrote: >> QEMU wants to use virtio scsi structures with >> a different VIRTIO_SCSI_CDB_SIZE/VIRTIO_SCSI_SENSE_SIZE, >> let's add ifdefs to allow overriding them. >> >> Keep the old defines under new names: >> VIRTIO_SCSI_CDB_DEFAULT_SIZE/VIRTIO_SCSI_SENSE_DEFAULT_SIZE, >> since that's what these values really are: >> defaults for cdb/sense size fields. >> >> Suggested-by: Paolo Bonzini >> Signed-off-by: Michael S. Tsirkin >> --- > > Rusty, pls note, if possible I'd like this one > merged for 4.0 because it's important for QEMU > (which now imports linux headers). OK, applied. Thanks, Rusty. > > >> include/uapi/linux/virtio_scsi.h | 12 ++-- >> 1 file changed, 10 insertions(+), 2 deletions(-) >> >> diff --git a/include/uapi/linux/virtio_scsi.h >> b/include/uapi/linux/virtio_scsi.h >> index 42b9370..cc18ef8 100644 >> --- a/include/uapi/linux/virtio_scsi.h >> +++ b/include/uapi/linux/virtio_scsi.h >> @@ -29,8 +29,16 @@ >> >> #include >> >> -#define VIRTIO_SCSI_CDB_SIZE 32 >> -#define VIRTIO_SCSI_SENSE_SIZE 96 >> +/* Default values of the CDB and sense data size configuration fields */ >> +#define VIRTIO_SCSI_CDB_DEFAULT_SIZE 32 >> +#define VIRTIO_SCSI_SENSE_DEFAULT_SIZE 96 >> + >> +#ifndef VIRTIO_SCSI_CDB_SIZE >> +#define VIRTIO_SCSI_CDB_SIZE VIRTIO_SCSI_CDB_DEFAULT_SIZE >> +#endif >> +#ifndef VIRTIO_SCSI_SENSE_SIZE >> +#define VIRTIO_SCSI_SENSE_SIZE VIRTIO_SCSI_SENSE_DEFAULT_SIZE >> +#endif >> >> /* SCSI command request, followed by data-out */ >> struct virtio_scsi_cmd_req { >> -- >> MST
Re: [Qemu-devel] [PATCH 0/2] virtio len fixes for qemu.
"Michael S. Tsirkin" writes: > On Thu, Mar 12, 2015 at 11:34:35AM +1030, Rusty Russell wrote: >> "Michael S. Tsirkin" writes: >> > On Wed, Mar 11, 2015 at 10:06:40PM +1030, Rusty Russell wrote: >> >> Each entry in the ring is a pair: \field{id} indicates the head >> >> entry of the descriptor chain describing the buffer (this >> >> matches an entry placed in the available ring by the guest >> >> earlier), and \field{len} the total of bytes written into the >> >> buffer. The latter is extremely useful for drivers using >> >> untrusted buffers: if you do not know exactly how much has been >> >> written by the device, you usually have to zero the buffer to >> >> ensure no data leakage occurs. >> > >> > Right so what does this "if you do not know exactly how much has been >> > written by the device" mean? >> >> It means "without this feature, you would not know how much has been >> written by the device"... > > So imagine a situation where device does not know for sure > how much was written, like here. > Should it set len to value that was written for sure? > Or to value that was possibly written? In this particular case, it doesn't matter since the failure is marked. In general, as the stated purpose of 'len' is to avoid guest receive-buffer zeroing, it is implied that it must not overestimate. Imagine the case of a guest user process receiving network packets. If the net device says it's written 1000 bytes (but it hasn't) we will hand 1000 bytes of uninitialized kernel memory to that process. Here's my proposed spec patch, which spells this out: diff --git a/content.tex b/content.tex index 6ba079d..b6345a8 100644 --- a/content.tex +++ b/content.tex @@ -600,10 +600,19 @@ them: it is only written to by the device, and read by the driver. Each entry in the ring is a pair: \field{id} indicates the head entry of the descriptor chain describing the buffer (this matches an entry placed in the available ring by the guest earlier), and \field{len} the total -of bytes written into the buffer. The latter is extremely useful +of bytes written into the buffer. + +\begin{note} +\field{len} is extremely useful for drivers using untrusted buffers: if you do not know exactly -how much has been written by the device, you usually have to zero -the buffer to ensure no data leakage occurs. +how much has been written by the device, a driver would have to zero +the buffer in advance to ensure no data leakage occurs. + +For example, a network driver may hand a received buffer directly to +an unprivileged userspace application. If the network device has not +overwritten the bytes which were in that buffer, this may leak the +contents of freed memory from other processes to the application. +\end{note} \begin{note} The legacy \hyperref[intro:Virtio PCI Draft]{[Virtio PCI Draft]} @@ -612,6 +621,19 @@ the constant as VRING_USED_F_NO_NOTIFY, but the layout and value were identical. \end{note} +\devicenormative{\subsubsection}{Virtqueue Notification Suppression}{Basic Facilities of a Virtio Device / Virtqueues / The Virtqueue Used Ring} + +The device MUST set \field{len} to the number of bytes known to be +written to the descriptor, beginning at the first device-writable +buffer. + +\begin{note} +There are potential error cases where a device might not know what +parts of the buffers have been written. In this case \field{len} may +be an underestimate, but that's preferable to the driver believing +that uninitialized memory has been overwritten when it has not/ +\end{note} + \subsection{Virtqueue Notification Suppression}\label{sec:Basic Facilities of a Virtio Device / Virtqueues / Virtqueue Notification Suppression} The device can suppress notifications in a manner analogous to the way
Re: [Qemu-devel] [PATCH 0/2] virtio len fixes for qemu.
"Michael S. Tsirkin" writes: > On Wed, Mar 11, 2015 at 10:06:40PM +1030, Rusty Russell wrote: >> Each entry in the ring is a pair: \field{id} indicates the head >> entry of the descriptor chain describing the buffer (this >> matches an entry placed in the available ring by the guest >> earlier), and \field{len} the total of bytes written into the >> buffer. The latter is extremely useful for drivers using >> untrusted buffers: if you do not know exactly how much has been >> written by the device, you usually have to zero the buffer to >> ensure no data leakage occurs. > > Right so what does this "if you do not know exactly how much has been > written by the device" mean? It means "without this feature, you would not know how much has been written by the device"... Should probably become a Note: Cheers, Rusty.
Re: [Qemu-devel] [PATCH] virtio_rpmsg: set DRIVER_OK before using device
Ohad Ben-Cohen writes: > On Mon, Mar 9, 2015 at 10:41 AM, Michael S. Tsirkin wrote: >> On Sat, Mar 07, 2015 at 08:06:56PM +0100, Michael S. Tsirkin wrote: >>> virtio spec requires that all drivers set DRIVER_OK >>> before using devices. While rpmsg isn't yet >>> included in the virtio 1 spec, previous spec versions >>> also required this. >>> >>> virtio rpmsg violates this rule: is calls kick >>> before setting DRIVER_OK. >>> >>> The fix isn't trivial since simply calling virtio_device_ready earlier >>> would mean we might get an interrupt in parallel with adding buffers. >>> >>> Instead, split kick out to prepare+notify calls. prepare before >>> virtio_device_ready - when we know we won't get interrupts. notify right >>> afterwards. >>> >>> Signed-off-by: Michael S. Tsirkin >>> --- >> >> Ohad, can you review and ack pls? > > Sure, > > Acked-by: Ohad Ben-Cohen Applied. Thanks, Rusty.
Re: [Qemu-devel] [PATCH 2/2] virtio-blk: fix length calculations for write operations.
"Michael S. Tsirkin" writes: > On Wed, Mar 11, 2015 at 04:29:32PM +1030, Rusty Russell wrote: >> We only fill in the 'req->qiov.size' bytes on a (successful) read, >> not on a write. >> >> Signed-off-by: Rusty Russell >> --- >> hw/block/virtio-blk.c | 10 +- >> 1 file changed, 9 insertions(+), 1 deletion(-) >> >> diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c >> index 258bb4c..98d87a9 100644 >> --- a/hw/block/virtio-blk.c >> +++ b/hw/block/virtio-blk.c >> @@ -50,11 +50,19 @@ static void virtio_blk_complete_request(VirtIOBlockReq >> *req, >> { >> VirtIOBlock *s = req->dev; >> VirtIODevice *vdev = VIRTIO_DEVICE(s); >> +int type = virtio_ldl_p(VIRTIO_DEVICE(req->dev), &req->out.type); >> >> trace_virtio_blk_req_complete(req, status); >> >> stb_p(&req->in->status, status); >> -virtqueue_push(s->vq, &req->elem, req->qiov.size + sizeof(*req->in)); >> + >> +/* If we didn't succeed, we *may* have written more, but don't >> + * count on it. */ > > I wonder about this. > So length as you specify it is <= actually written length. > What are the advantages of this approach? > How about we do the reverse, specify that the length in descriptor > is >= the size actually written? > > If we do this, all these buggy hosts suddenly become correct, > which seems better. The point of telling the guest the amount written is that they don't have to zero the receive buffer beforehand. Cheers, Rusty.
Re: [Qemu-devel] [PATCH 0/2] virtio len fixes for qemu.
"Michael S. Tsirkin" writes: > On Wed, Mar 11, 2015 at 02:47:47PM +0800, Fam Zheng wrote: >> On Wed, 03/11 07:19, Michael S. Tsirkin wrote: >> > On Wed, Mar 11, 2015 at 04:29:30PM +1030, Rusty Russell wrote: >> > > The virtio 'used' ring describes descriptors which have been used. It >> > > also says how many bytes have been written to the ring. For some cases, >> > > this value is ignored by Linux guests, thus errors have not been noticed. >> > > I was working on increasing the checking in Linux when I noticed this >> > > behaviour. >> > > >> > > The first patch changes the 'len' formal parameter name to 'len_written' >> > > to >> > > make the API clearer, and adds an assert(). The second fixes block >> > > writes. >> > > >> > > Cheers, >> > > Rusty. >> > > PS. It's based on MST's virtio-1.0 tree, but should be easily ported. >> > >> > Thanks, this applies to current master without issues. >> > However, I think it's best to apply patch 2, then patch 1, >> > to avoid triggering errors when bisecting. >> >> I'm seeing a make check failure. If this is a false alarm, the test should be >> fixed too. > > Yea, I'm also now thinking we need a spec clarification on this one, and > some testing with non linux drivers before jumping to changing hosts and > guests. The spec is very clear. The implementation is crap; let's fix it before 1.0. Quote: Each entry in the ring is a pair: \field{id} indicates the head entry of the descriptor chain describing the buffer (this matches an entry placed in the available ring by the guest earlier), and \field{len} the total of bytes written into the buffer. The latter is extremely useful for drivers using untrusted buffers: if you do not know exactly how much has been written by the device, you usually have to zero the buffer to ensure no data leakage occurs. I have a patch for the Linux side, too, which warns once per device and fixes it up. I will make the warning conditional on v1.0. Cheers, Rusty.
[Qemu-devel] [PATCH 2/2] virtio-blk: fix length calculations for write operations.
We only fill in the 'req->qiov.size' bytes on a (successful) read, not on a write. Signed-off-by: Rusty Russell --- hw/block/virtio-blk.c | 10 +- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c index 258bb4c..98d87a9 100644 --- a/hw/block/virtio-blk.c +++ b/hw/block/virtio-blk.c @@ -50,11 +50,19 @@ static void virtio_blk_complete_request(VirtIOBlockReq *req, { VirtIOBlock *s = req->dev; VirtIODevice *vdev = VIRTIO_DEVICE(s); +int type = virtio_ldl_p(VIRTIO_DEVICE(req->dev), &req->out.type); trace_virtio_blk_req_complete(req, status); stb_p(&req->in->status, status); -virtqueue_push(s->vq, &req->elem, req->qiov.size + sizeof(*req->in)); + +/* If we didn't succeed, we *may* have written more, but don't + * count on it. */ +if (type == VIRTIO_BLK_T_IN && status == VIRTIO_BLK_S_OK) { +virtqueue_push(s->vq, &req->elem, req->qiov.size + sizeof(*req->in)); +} else { +virtqueue_push(s->vq, &req->elem, sizeof(*req->in)); +} virtio_notify(vdev, s->vq); } -- 2.1.0
[Qemu-devel] [PATCH 0/2] virtio len fixes for qemu.
The virtio 'used' ring describes descriptors which have been used. It also says how many bytes have been written to the ring. For some cases, this value is ignored by Linux guests, thus errors have not been noticed. I was working on increasing the checking in Linux when I noticed this behaviour. The first patch changes the 'len' formal parameter name to 'len_written' to make the API clearer, and adds an assert(). The second fixes block writes. Cheers, Rusty. PS. It's based on MST's virtio-1.0 tree, but should be easily ported. Rusty Russell (2): virtio: make it clear that "len" for a used descriptor is len written. virtio-blk: fix length calculations for write operations. hw/block/virtio-blk.c | 9 - hw/virtio/virtio.c | 19 --- include/hw/virtio/virtio.h | 4 ++-- 3 files changed, 22 insertions(+), 10 deletions(-) -- 2.1.0
[Qemu-devel] [PATCH 1/2] virtio: make it clear that "len" for a used descriptor is len written.
And enforce this with a check that it's <= the writable length. Signed-off-by: Rusty Russell --- hw/virtio/virtio.c | 19 --- include/hw/virtio/virtio.h | 4 ++-- 2 files changed, 14 insertions(+), 9 deletions(-) diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c index 882a31b..c944113 100644 --- a/hw/virtio/virtio.c +++ b/hw/virtio/virtio.c @@ -243,16 +243,21 @@ int virtio_queue_empty(VirtQueue *vq) } void virtqueue_fill(VirtQueue *vq, const VirtQueueElement *elem, -unsigned int len, unsigned int idx) +unsigned int len_written, unsigned int idx) { -unsigned int offset; +unsigned int offset, tot_wlen; int i; -trace_virtqueue_fill(vq, elem, len, idx); +trace_virtqueue_fill(vq, elem, len_written, idx); + +for (tot_wlen = i = 0; i < elem->in_num; i++) { +tot_wlen += elem->in_sg[i].iov_len; +} +assert(len_written <= tot_wlen); offset = 0; for (i = 0; i < elem->in_num; i++) { -size_t size = MIN(len - offset, elem->in_sg[i].iov_len); +size_t size = MIN(len_written - offset, elem->in_sg[i].iov_len); cpu_physical_memory_unmap(elem->in_sg[i].iov_base, elem->in_sg[i].iov_len, @@ -270,7 +275,7 @@ void virtqueue_fill(VirtQueue *vq, const VirtQueueElement *elem, /* Get a pointer to the next entry in the used ring. */ vring_used_ring_id(vq, idx, elem->index); -vring_used_ring_len(vq, idx, len); +vring_used_ring_len(vq, idx, len_written); } void virtqueue_flush(VirtQueue *vq, unsigned int count) @@ -288,9 +293,9 @@ void virtqueue_flush(VirtQueue *vq, unsigned int count) } void virtqueue_push(VirtQueue *vq, const VirtQueueElement *elem, -unsigned int len) +unsigned int len_written) { -virtqueue_fill(vq, elem, len, 0); +virtqueue_fill(vq, elem, len_written, 0); virtqueue_flush(vq, 1); } diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h index df09993..153374f 100644 --- a/include/hw/virtio/virtio.h +++ b/include/hw/virtio/virtio.h @@ -191,10 +191,10 @@ VirtQueue *virtio_add_queue(VirtIODevice *vdev, int queue_size, void virtio_del_queue(VirtIODevice *vdev, int n); void virtqueue_push(VirtQueue *vq, const VirtQueueElement *elem, -unsigned int len); +unsigned int len_written); void virtqueue_flush(VirtQueue *vq, unsigned int count); void virtqueue_fill(VirtQueue *vq, const VirtQueueElement *elem, -unsigned int len, unsigned int idx); +unsigned int len_written, unsigned int idx); void virtqueue_map_sg(struct iovec *sg, hwaddr *addr, size_t num_sg, int is_write); -- 2.1.0
Re: [Qemu-devel] virtio fixes pull for 4.0?
"Michael S. Tsirkin" writes: > Hi Rusty! > There are a bunch of (mostly virtio 1.0 related) fixes for virtio > that need to go into 4.0 I think. > virtio_blk: typo fix > virtio_blk: fix comment for virtio 1.0 OK, I've added these two. I tend to be overcautious after the merge window. > virtio_console: init work unconditionally > virtio_console: avoid config access from irq > virtio_balloon: set DRIVER_OK before using device > > seem ready? These are in my virtio-next tree already. > virtio_mmio: generation support > virtio_mmio: fix endian-ness for mmio these two are waiting for ack by > Pawel > > These two fix bugs in virtio 1.0 code for mmio. > Host code for that was AFAIK not posted, so I can't test properly. > Pawel? I'm waiting on Acks for these two. > virtio-balloon: do not call blocking ops when !TASK_RUNNING > > Rusty, it seems you prefer a different fix for this issue, > while Cornelia prefers mine. Maybe both me and Cornelia misunderstand the > issue? I know you dealt with a ton of similar issues recently > so you are more likely to be right, but I'd like to understand > what's going on better all the same. Could you help please? In the longer run, we should handle failures from these callbacks. But we don't need to do that right now. So we want the minimal fix. And an annotation is the minimal fix. The bug has been there for ages; it's just the warning that is new (if we *always* slept, we would spin, but in practice we'll rarely sleep). > virtio_rpmsg: set DRIVER_OK before using device > > Just posted this, but seems pretty obvious. Yep, I've applied this too. Thanks! > I think it's a good idea to merge these patches (maybe except the > !TASK_RUNNING thing) sooner rather than later, to make sure people have > the time to test the fixes properly. Would you like me to pack up (some > of them) them up and do a pull request? I'm waiting a bit longer, since they seem to still be tricking in. I'm still chasing a QEMU bug, where it seems to fill in a number too large in the 'len' field for a block device. It should be 1 byte for a block device write, for example. See patch which causes assert() in qemu, but I had to stop at that point (should get back tomorrow I hope). Thanks, Rusty. diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c index 882a31b..98e99b8 100644 --- a/hw/virtio/virtio.c +++ b/hw/virtio/virtio.c @@ -243,16 +243,21 @@ int virtio_queue_empty(VirtQueue *vq) } void virtqueue_fill(VirtQueue *vq, const VirtQueueElement *elem, -unsigned int len, unsigned int idx) +unsigned int len_written, unsigned int idx) { -unsigned int offset; +unsigned int offset, tot_wlen; int i; -trace_virtqueue_fill(vq, elem, len, idx); +trace_virtqueue_fill(vq, elem, len_written, idx); + +for (tot_wlen = i = 0; i < elem->out_num; i++) { +tot_wlen += elem->out_sg[i].iov_len; +} +assert(len_written <= tot_wlen); offset = 0; for (i = 0; i < elem->in_num; i++) { -size_t size = MIN(len - offset, elem->in_sg[i].iov_len); +size_t size = MIN(len_written - offset, elem->in_sg[i].iov_len); cpu_physical_memory_unmap(elem->in_sg[i].iov_base, elem->in_sg[i].iov_len, @@ -270,7 +275,7 @@ void virtqueue_fill(VirtQueue *vq, const VirtQueueElement *elem, /* Get a pointer to the next entry in the used ring. */ vring_used_ring_id(vq, idx, elem->index); -vring_used_ring_len(vq, idx, len); +vring_used_ring_len(vq, idx, len_written); } void virtqueue_flush(VirtQueue *vq, unsigned int count) @@ -288,9 +293,9 @@ void virtqueue_flush(VirtQueue *vq, unsigned int count) } void virtqueue_push(VirtQueue *vq, const VirtQueueElement *elem, -unsigned int len) +unsigned int len_written) { -virtqueue_fill(vq, elem, len, 0); +virtqueue_fill(vq, elem, len_written, 0); virtqueue_flush(vq, 1); } diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h index df09993..153374f 100644 --- a/include/hw/virtio/virtio.h +++ b/include/hw/virtio/virtio.h @@ -191,10 +191,10 @@ VirtQueue *virtio_add_queue(VirtIODevice *vdev, int queue_size, void virtio_del_queue(VirtIODevice *vdev, int n); void virtqueue_push(VirtQueue *vq, const VirtQueueElement *elem, -unsigned int len); +unsigned int len_written); void virtqueue_flush(VirtQueue *vq, unsigned int count); void virtqueue_fill(VirtQueue *vq, const VirtQueueElement *elem, -unsigned int len, unsigned int idx); +unsigned int len_written, unsigned int idx); void virtqueue_map_sg(struct iovec *sg, hwaddr *addr, size_t num_sg, int is_write);
Re: [Qemu-devel] [PATCH] virtio_rpmsg: set DRIVER_OK before using device
"Michael S. Tsirkin" writes: > virtio spec requires that all drivers set DRIVER_OK > before using devices. While rpmsg isn't yet > included in the virtio 1 spec, previous spec versions > also required this. > > virtio rpmsg violates this rule: is calls kick > before setting DRIVER_OK. > > The fix isn't trivial since simply calling virtio_device_ready earlier > would mean we might get an interrupt in parallel with adding buffers. > > Instead, split kick out to prepare+notify calls. prepare before > virtio_device_ready - when we know we won't get interrupts. notify right > afterwards. > > Signed-off-by: Michael S. Tsirkin Applied. I'll wait for Ohad to ack before sending to Linus. BTW I assume you have a version of qemu which warns on these kind of failures? That'd be nice to have! Thanks, Rusty. > --- > > Note: compile-tested only. > > drivers/rpmsg/virtio_rpmsg_bus.c | 17 - > 1 file changed, 16 insertions(+), 1 deletion(-) > > diff --git a/drivers/rpmsg/virtio_rpmsg_bus.c > b/drivers/rpmsg/virtio_rpmsg_bus.c > index 92f6af6..73354ee 100644 > --- a/drivers/rpmsg/virtio_rpmsg_bus.c > +++ b/drivers/rpmsg/virtio_rpmsg_bus.c > @@ -951,6 +951,7 @@ static int rpmsg_probe(struct virtio_device *vdev) > void *bufs_va; > int err = 0, i; > size_t total_buf_space; > + bool notify; > > vrp = kzalloc(sizeof(*vrp), GFP_KERNEL); > if (!vrp) > @@ -1030,8 +1031,22 @@ static int rpmsg_probe(struct virtio_device *vdev) > } > } > > + /* > + * Prepare to kick but don't notify yet - we can't do this before > + * device is ready. > + */ > + notify = virtqueue_kick_prepare(vrp->rvq); > + > + /* From this point on, we can notify and get callbacks. */ > + virtio_device_ready(vdev); > + > /* tell the remote processor it can start sending messages */ > - virtqueue_kick(vrp->rvq); > + /* > + * this might be concurrent with callbacks, but we are only > + * doing notify, not a full kick here, so that's ok. > + */ > + if (notify) > + virtqueue_notify(vrp->rvq); > > dev_info(&vdev->dev, "rpmsg host is online\n"); > > -- > MST
Re: [Qemu-devel] [PATCH 2/2] virtio-pci: switch to modern accessors for 1.0
"Michael S. Tsirkin" writes: > On Wed, Mar 04, 2015 at 11:06:08AM +1030, Rusty Russell wrote: >> "Michael S. Tsirkin" writes: >> > virtio 1.0 config space is in LE format for all >> > devices, use modern wrappers when accessed through >> > the 1.0 BAR. >> >> Hmm, I'm not so sure about these patches. It's easy to miss the >> existence of the _modern variants, and they're 90% the same as the >> legacy variants. >> >> But as long as it's fixed... >> >> Thanks, >> Rusty. >> PS. rng, block, net and 9p virtio 1.0 seem to work OK on BE guests (LE >> host). > > Hmm good point. What if I'll rework this to get bool legacy parameter? The functions have access to vdev, so they can determine that already. Simply renaming the old version to _legacy would be enough. Thanks, Rusty.
Re: [Qemu-devel] [PATCH 2/2] virtio-pci: switch to modern accessors for 1.0
"Michael S. Tsirkin" writes: > virtio 1.0 config space is in LE format for all > devices, use modern wrappers when accessed through > the 1.0 BAR. Hmm, I'm not so sure about these patches. It's easy to miss the existence of the _modern variants, and they're 90% the same as the legacy variants. But as long as it's fixed... Thanks, Rusty. PS. rng, block, net and 9p virtio 1.0 seem to work OK on BE guests (LE host).
Re: [Qemu-devel] [PATCH 2/2] virtio_blk: fix comment for virtio 1.0
"Michael S. Tsirkin" writes: > Fix up comment to match virtio 1.0 logic: > virtio_blk_outhdr isn't the first elements anymore, > the only requirement is that it comes first in > the s/g list. > > Signed-off-by: Michael S. Tsirkin Thanks, both applied. Cheers, Rusty.
[Qemu-devel] Qemu and virtio 1.0
OK, I am trying to experiment with virtio 1.0 support using the latest kernel and MST's qemu tree: https://git.kernel.org/cgit/virt/kvm/mst/qemu.git/?h=virtio-1.0 The first issue is that the device config endian was wrong (see attached patch). I'm now setting up a BE guest on my x86 laptop, and a BE and LE guest on a BE powerpc machine, to check that all combinations work correctly. If others test too, that would be appreciated! Cheers, Rusty. >From 95ac91554ed602f856a2a5fcc25eaffcad1b1c8d Mon Sep 17 00:00:00 2001 From: Rusty Russell Date: Tue, 24 Feb 2015 14:47:44 +1030 Subject: [PATCH] virtio_config_write*/virtio_config_read*: Don't endian swap for virtio 1.0. Signed-off-by: Rusty Russell diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c index 079944c..882a31b 100644 --- a/hw/virtio/virtio.c +++ b/hw/virtio/virtio.c @@ -662,7 +662,12 @@ uint32_t virtio_config_readw(VirtIODevice *vdev, uint32_t addr) k->get_config(vdev, vdev->config); -val = lduw_p(vdev->config + addr); +/* Virtio 1.0 is always LE */ +if (virtio_has_feature(vdev, VIRTIO_F_VERSION_1)) { +val = lduw_le_p(vdev->config + addr); +} else { +val = lduw_p(vdev->config + addr); +} return val; } @@ -677,7 +682,12 @@ uint32_t virtio_config_readl(VirtIODevice *vdev, uint32_t addr) k->get_config(vdev, vdev->config); -val = ldl_p(vdev->config + addr); +/* Virtio 1.0 is always LE */ +if (virtio_has_feature(vdev, VIRTIO_F_VERSION_1)) { +val = ldl_le_p(vdev->config + addr); +} else { +val = ldl_p(vdev->config + addr); +} return val; } @@ -706,7 +716,12 @@ void virtio_config_writew(VirtIODevice *vdev, uint32_t addr, uint32_t data) return; } -stw_p(vdev->config + addr, val); +/* Virtio 1.0 is always LE */ +if (virtio_has_feature(vdev, VIRTIO_F_VERSION_1)) { +stw_le_p(vdev->config + addr, val); +} else { +stw_p(vdev->config + addr, val); +} if (k->set_config) { k->set_config(vdev, vdev->config); @@ -722,7 +737,12 @@ void virtio_config_writel(VirtIODevice *vdev, uint32_t addr, uint32_t data) return; } -stl_p(vdev->config + addr, val); +/* Virtio 1.0 is always LE */ +if (virtio_has_feature(vdev, VIRTIO_F_VERSION_1)) { +stl_le_p(vdev->config + addr, val); +} else { +stl_p(vdev->config + addr, val); +} if (k->set_config) { k->set_config(vdev, vdev->config);
Re: [Qemu-devel] [PATCH RFC 08/11] virtio_blk: use virtio v1.0 endian
Cornelia Huck writes: > Note that we care only about the fields still in use for virtio v1.0. > > Reviewed-by: Thomas Huth > Reviewed-by: David Hildenbrand > Signed-off-by: Cornelia Huck Hi Cornelia, These patches all look good; I'm a bit nervous about our testing missing some conversion, so we'll need qemu patches for PCI so we can test on other platforms too. Thanks, Rusty.
Re: [Qemu-devel] [PATCH RFC 03/11] virtio: support more feature bits
Cornelia Huck writes: > With virtio-1, we support more than 32 feature bits. Let's make > vdev->guest_features depend on the number of supported feature bits, > allowing us to grow the feature bits automatically. It's a judgement call, but I would say that simply using uint64_t will be sufficient for quite a while. Cheers, Rusty.
Re: [Qemu-devel] Using virtio for inter-VM communication
Jan Kiszka writes: > On 2014-06-12 04:27, Rusty Russell wrote: >> Henning Schild writes: >> It was also never implemented, and remains a thought experiment. >> However, implementing it in lguest should be fairly easy. > > The reason why a trusted helper, i.e. additional logic in the > hypervisor, is not our favorite solution is that we'd like to keep the > hypervisor as small as possible. I wouldn't exclude such an approach > categorically, but we have to weigh the costs (lines of code, additional > hypervisor interface) carefully against the gain (existing > specifications and guest driver infrastructure). Reasonable, but I think you'll find it is about the minimal implementation in practice. Unfortunately, I don't have time during the next 6 months to implement it myself :( > Back to VIRTIO_F_RING_SHMEM_ADDR (which you once brought up in an MCA > working group discussion): What speaks against introducing an > alternative encoding of addresses inside virtio data structures? The > idea of this flag was to replace guest-physical addresses with offsets > into a shared memory region associated with or part of a virtio > device. We would also need a way of defining the shared memory region. But that's not the problem. If such a feature is not accepted by the guest? How to you fall back? We don't add features which unmake the standard. > That would preserve zero-copy capabilities (as long as you can work > against the shared mem directly, e.g. doing DMA from a physical NIC or > storage device into it) and keep the hypervisor out of the loop. This seems ill thought out. How will you program a NIC via the virtio protocol without a hypervisor? And how will you make it safe? You'll need an IOMMU. But if you have an IOMMU you don't need shared memory. > Is it > too invasive to existing infrastructure or does it have some other pitfalls? You'll have to convince every vendor to implement your addition to the standard. Which is easier than inventing a completely new system, but it's not quite virtio. Cheers, Rusty.
Re: [Qemu-devel] Using virtio for inter-VM communication
Henning Schild writes: > Hi, > > i am working on the jailhouse[1] project and am currently looking at > inter-VM communication. We want to connect guests directly with virtual > consoles based on shared memory. The code complexity in the hypervisor > should be minimal, it should just make the shared memory discoverable > and provide a signaling mechanism. Hi Henning, The virtio assumption was that the host can see all of guest memory. This simplifies things significantly, and makes it efficient. If you don't have this, *someone* needs to do a copy. Usually the guest OS does a bounce buffer into your shared region. Goodbye performance. Or you can play remapping tricks. Goodbye performance again. My preferred model is to have a trusted helper (ie. host) which understands how to copy between virtio rings. The backend guest (to steal Xen vocab) R/O maps the descriptor, avail ring and used rings in the guest. It then asks the trusted helper to do various operation (copy into writable descriptor, copy out of readable descriptor, mark used). The virtio ring itself acts as a grant table. Note: that helper mechanism is completely protocol agnostic. It was also explicitly designed into the virtio mechanism (with its 4k boundaries for data structures and its 'len' field to indicate how much was written into the descriptor). It was also never implemented, and remains a thought experiment. However, implementing it in lguest should be fairly easy. Cheers, Rusty.
Re: [Qemu-devel] [PATCH v3 3/4] target-ppc: ppc can be either endian
Alexander Graf writes: > On 05/07/2014 12:19 PM, Greg Kurz wrote: > The uglyness about the current_cpu bit is that devices are usually not > supposed to know about the cpu accesses come from usually. But then > again devices shouldn't know about the endianness of a cpu either so I > guess it's ok to breach layers here. > > Rusty, do you have strong feelings either way? My only strong feeling is that qemu is horrible to get patches into. At this point, I really don't care :( Rusty.
Re: [Qemu-devel] [PATCH] virtio-rng: support multiple virtio-rng devices
Amos Kong writes: > Current hwrng core supports to register multiple hwrng devices, > and there is only one device really works in the same time. > QEMU alsu supports to have multiple virtio-rng backends. > > This patch changes virtio-rng driver to support multiple > virtio-rng devices. I never thought anyone would want multiple RNG devices, as it doesn't seem to make sense. But why not? Applied, Rusty. > > ]# cat /sys/class/misc/hw_random/rng_available > virtio_rng.0 virtio_rng.1 > ]# cat /sys/class/misc/hw_random/rng_current > virtio_rng.0 > ]# echo -n virtio_rng.1 > /sys/class/misc/hw_random/rng_current > ]# dd if=/dev/hwrng of=/dev/null > > Signed-off-by: Amos Kong > --- > drivers/char/hw_random/virtio-rng.c | 102 > ++-- > include/linux/hw_random.h | 2 +- > 2 files changed, 64 insertions(+), 40 deletions(-) > > diff --git a/drivers/char/hw_random/virtio-rng.c > b/drivers/char/hw_random/virtio-rng.c > index 2ce0e22..12e242b 100644 > --- a/drivers/char/hw_random/virtio-rng.c > +++ b/drivers/char/hw_random/virtio-rng.c > @@ -25,88 +25,108 @@ > #include > #include > > -static struct virtqueue *vq; > -static unsigned int data_avail; > -static DECLARE_COMPLETION(have_data); > -static bool busy; > + > +struct virtrng_info { > + struct virtio_device *vdev; > + struct hwrng hwrng; > + struct virtqueue *vq; > + unsigned int data_avail; > + struct completion have_data; > + bool busy; > +}; > > static void random_recv_done(struct virtqueue *vq) > { > + struct virtrng_info *vi = vq->vdev->priv; > + > /* We can get spurious callbacks, e.g. shared IRQs + virtio_pci. */ > - if (!virtqueue_get_buf(vq, &data_avail)) > + if (!virtqueue_get_buf(vi->vq, &vi->data_avail)) > return; > > - complete(&have_data); > + complete(&vi->have_data); > } > > /* The host will fill any buffer we give it with sweet, sweet randomness. */ > -static void register_buffer(u8 *buf, size_t size) > +static void register_buffer(struct virtrng_info *vi, u8 *buf, size_t size) > { > struct scatterlist sg; > > sg_init_one(&sg, buf, size); > > /* There should always be room for one buffer. */ > - virtqueue_add_inbuf(vq, &sg, 1, buf, GFP_KERNEL); > + virtqueue_add_inbuf(vi->vq, &sg, 1, buf, GFP_KERNEL); > > - virtqueue_kick(vq); > + virtqueue_kick(vi->vq); > } > > static int virtio_read(struct hwrng *rng, void *buf, size_t size, bool wait) > { > int ret; > + struct virtrng_info *vi = (struct virtrng_info *)rng->priv; > > - if (!busy) { > - busy = true; > - init_completion(&have_data); > - register_buffer(buf, size); > + if (!vi->busy) { > + vi->busy = true; > + init_completion(&vi->have_data); > + register_buffer(vi, buf, size); > } > > if (!wait) > return 0; > > - ret = wait_for_completion_killable(&have_data); > + ret = wait_for_completion_killable(&vi->have_data); > if (ret < 0) > return ret; > > - busy = false; > + vi->busy = false; > > - return data_avail; > + return vi->data_avail; > } > > static void virtio_cleanup(struct hwrng *rng) > { > - if (busy) > - wait_for_completion(&have_data); > -} > - > + struct virtrng_info *vi = (struct virtrng_info *)rng->priv; > > -static struct hwrng virtio_hwrng = { > - .name = "virtio", > - .cleanup= virtio_cleanup, > - .read = virtio_read, > -}; > + if (vi->busy) > + wait_for_completion(&vi->have_data); > +} > > static int probe_common(struct virtio_device *vdev) > { > - int err; > + int err, i; > + struct virtrng_info *vi = NULL; > + > + vi = kmalloc(sizeof(struct virtrng_info), GFP_KERNEL); > + vi->hwrng.name = kmalloc(40, GFP_KERNEL); > + init_completion(&vi->have_data); > + > + vi->hwrng.read = virtio_read; > + vi->hwrng.cleanup = virtio_cleanup; > + vi->hwrng.priv = (unsigned long)vi; > + vdev->priv = vi; > > - if (vq) { > - /* We only support one device for now */ > - return -EBUSY; > - } > /* We expect a single virtqueue. */ > - vq = virtio_find_single_vq(vdev, random_recv_done, "input"); > - if (IS_ERR(vq)) { > - err = PTR_ERR(vq); > - vq = NULL; > + vi->vq = virtio_find_single_vq(vdev, random_recv_done, "input"); > + if (IS_ERR(vi->vq)) { > + err = PTR_ERR(vi->vq); > + kfree(vi->hwrng.name); > + vi->vq = NULL; > + kfree(vi); > + vi = NULL; > return err; > } > > - err = hwrng_register(&virtio_hwrng); > + i = 0; > + do { > + sprintf(vi->hwrng.name, "virtio_rng.%d", i++); > + err = hwrng_register(&vi->hwrng); > + } while (err == -EEXIST)
Re: [Qemu-devel] virtio device error reporting best practice?
Markus Armbruster writes: > Rusty Russell writes: >> The litmus test: does *your* guest handle failures other than by giving >> up on the device? If so, sure, you need to have a sane error-reporting >> strategy. > > Err, isn't this a circular argument? No need for QEMU to report the > failure, because the guest won't handle it; no need to handle the > failure, because QEMU won't report it. > > What about this: would you make your guest handle failures if they were > reported? Perhaps I was unclear, that's what I meant. >>> The main reason I'm considering this stuff is for security reasons if >>> the guest asks for something really illegal or crazy what should the >>> expected behaviour of the host be? (at least secure I know that). >> >> If the guest userspace can do it, don't exit. If the kernel only, and >> it's should have known better, abort is OK. >> >> Sure that doesn't help much! > > Immediate exit() or abort() denies the guest the ability to degrade > service gracefully (disable the device, cry for help and try to hobble > on), or report its brokenness ungracefully (kernel panic, crash dump). > I doubt denying that is okay unless the device is so important that > without it you can't even hope to panic. Oh yes, I completely agree with you! But QEMU practice doesn't :) Cheers, Rusty.
Re: [Qemu-devel] virtio device error reporting best practice?
Dave Airlie writes: > So I'm looking at how best to do virtio gpu device error reporting, > and how to deal with illegal stuff, > > I've two levels of errors I want to support, > > a) unrecoverable or bad guest kernel programming errors, The QEMU standard approach is to exit at this point. No, really. > b) per 3D context errors from the renderer backend, > > (b) I can easily report in an event queue and the guest kernel can in > theory blow away the offenders, this is how GL works with some > extensions, That's probably sanest. > For (a) I can expect a response from every command I put into the main > GPU control queue, the response should always be no error, but in some > cases it will be because the guest hit some host resource error, or > asked for something insane, (guest kernel drivers would be broken in > most of these cases). > > Alternately I can use the separate event queue to send async errors > when the guest does something bad, > > I'm also considering adding some sort of flag in config space saying > the device needs a reset before it will continue doing anything, I generally dislike error codes which Never Happen; it's like making every void function return int just in case: the caller has no idea what to do if it fails. The litmus test: does *your* guest handle failures other than by giving up on the device? If so, sure, you need to have a sane error-reporting strategy. > The main reason I'm considering this stuff is for security reasons if > the guest asks for something really illegal or crazy what should the > expected behaviour of the host be? (at least secure I know that). If the guest userspace can do it, don't exit. If the kernel only, and it's should have known better, abort is OK. Sure that doesn't help much! Rusty.
Re: [Qemu-devel] [PATCH 1/8] virtio_get_byteswap: function for endian-ambivalent targets using virtio.
Alexander Graf writes: > On 02/18/2014 05:17 PM, Cornelia Huck wrote: >> Hm. So whatever_le for 1.0 devices, and virtio_whatever (checking the >> byteswap value) for legacy devices? The device implementation will be >> aware of the virtio version anyway. > > Yeah, but I would hope we want to share as much code as possible here, > so that config accessors can be shared between legacy virtio and 1.0 > virtio. And in that case we want to have a generic helper to read/write > pieces of that config space - which this patch introduces for us :). > >> (Btw., can some of those architectures supporting both le/be run with >> mixed le/be cpus? That would be a mess for legacy devices.) > > Yes, they can. No, we don't care :). There's per-cpu (what endian) and per-device (legacy or no). To do this Right, we'd need to consult both, but we don't always have that information. So this the minimal viable solution. We'll need a per-device legacy flag eventually (there are other changes than just endian). But getting the wrappers in place is a good first step. Cheers, Rusty.
Re: [Qemu-devel] [PATCH 1/8] virtio_get_byteswap: function for endian-ambivalent targets using virtio.
Greg Kurz writes: > On Tue, 18 Feb 2014 20:25:15 +0100 > Andreas Färber wrote: >> Am 18.02.2014 13:38, schrieb Greg Kurz: >> > diff --git a/include/hw/virtio/virtio-access.h >> > b/include/hw/virtio/virtio-access.h new file mode 100644 >> > index 000..2e22a47 >> > --- /dev/null >> > +++ b/include/hw/virtio/virtio-access.h >> > @@ -0,0 +1,132 @@ >> > +/* >> > + * Virtio Accessor Support: In case your target can change endian. >> > + * >> > + * Copyright IBM, Corp. 2013 >> > + * >> > + * Authors: >> > + * Rusty Russell >> > + * >> > + * This work is licensed under the terms of the GNU GPL, version 2. >> > See >> > + * the COPYING file in the top-level directory. >> [snip] >> >> I notice that this has been GPL-2.0 from Rusty's first series on. Is >> there a reason not to make the new file GPL-2.0+? >> >> Cf. http://wiki.qemu.org/Relicensing >> >> Thanks, >> Andreas >> > > Rusty ? It is your call. :) I cut & paste that header. I always prefer 2+ to 2 anyway. Please fix: * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation, either version 2 of the License, or * (at your option) any later version. Thanks, Rusty.
Re: [Qemu-devel] [PATCH 0/7] virtio endian-ambivalent target fixes.
Thomas Huth writes: > On Thu, 17 Oct 2013 14:23:35 +1030 > Rusty Russell wrote: > >> This is a re-transmit of the core of the virtio endian code. Since >> there seems to be some interest in ARM BE virtio, I've separated this from >> the direct problem I was solving: PowerPC LE. >> >> Please apply! >> Rusty. >> >> Rusty Russell (7): >> virtio_get_byteswap: function for endian-ambivalent targets using >> virtio. >> virtio: allow byte swapping for vring and config access >> hw/net/virtio-net: use virtio wrappers to access headers. >> hw/net/virtio-balloon: use virtio wrappers to access page frame >> numbers. >> hw/block/virtio-blk: use virtio wrappers to access headers. >> hw/scsi/virtio-scsi: use virtio wrappers to access headers. >> hw/char/virtio-serial-bus: use virtio wrappers to access headers. > > Hi Rusty! > > May I ask what's the current status of your virtio endian patches? We > likely need something similar when we enable Virtio v1.0 for S390 > virtio-ccw since we then have to byteswap the virtio stuff there, too. > So I recently started to have a look at this... However, in your > patches, the byteswapping seems to be activated/disabled globally, with > the "virtio_byteswap" variable. But with Virtio v1.0, the guest can > decide on a per-device basis whether it wants to drive the device in > v1.0 mode (--> byteswap on S390) or in v0.9 legacy mode (--> no > byteswap), depending on whether it sets the VIRTIO_F_VERSION_1 feature > bit or not. I guess other architectures will have the same problem with > Virtio 1.0, too, when the guests are not running in little endian mode. > So I wonder whether it would it be feasible to change the code so that > the decision of byteswapping or not is done on a per-device basis > instead? What do you think? Hi Thomas, That is definitely the end-goal: these patches are simply to enable current legacy virtio devices. Since we missed 1.3, we're supposed to be in 2.0. Cheers, Rusty.
Re: [Qemu-devel] QueuePFN peculiarity in virtio-mmio
Laszlo Ersek writes: > Hi, > > "Appendix X: virtio-mmio" in the virtio spec says Hi Laszlo, You're in luck! We're currently revising the virtio spec under the OASIS banner. I'd really like you to post your suggestion to their mailing list (yes, you will have to subscribe to it, for IP reasons: virtio-comment-subscr...@lists.oasis-open.org. Thanks, Rusty.
[Qemu-devel] [PATCH 0/7] virtio endian-ambivalent target fixes.
This is a re-transmit of the core of the virtio endian code. Since there seems to be some interest in ARM BE virtio, I've separated this from the direct problem I was solving: PowerPC LE. Please apply! Rusty. Rusty Russell (7): virtio_get_byteswap: function for endian-ambivalent targets using virtio. virtio: allow byte swapping for vring and config access hw/net/virtio-net: use virtio wrappers to access headers. hw/net/virtio-balloon: use virtio wrappers to access page frame numbers. hw/block/virtio-blk: use virtio wrappers to access headers. hw/scsi/virtio-scsi: use virtio wrappers to access headers. hw/char/virtio-serial-bus: use virtio wrappers to access headers. hw/block/virtio-blk.c | 35 +- hw/char/virtio-serial-bus.c | 34 +- hw/net/virtio-net.c | 15 +++-- hw/scsi/virtio-scsi.c | 33 +- hw/virtio/virtio-balloon.c| 3 +- hw/virtio/virtio.c| 34 ++ include/hw/virtio/virtio-access.h | 133 ++ include/hw/virtio/virtio.h| 2 + stubs/Makefile.objs | 1 + stubs/virtio_get_byteswap.c | 6 ++ 10 files changed, 225 insertions(+), 71 deletions(-) create mode 100644 include/hw/virtio/virtio-access.h create mode 100644 stubs/virtio_get_byteswap.c -- 1.8.1.2
[Qemu-devel] [PATCH 3/7] hw/net/virtio-net: use virtio wrappers to access headers.
Signed-off-by: Rusty Russell Reviewed-by: Anthony Liguori --- hw/net/virtio-net.c | 15 --- 1 file changed, 8 insertions(+), 7 deletions(-) diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c index 22dbd05..431a4b6 100644 --- a/hw/net/virtio-net.c +++ b/hw/net/virtio-net.c @@ -23,6 +23,7 @@ #include "hw/virtio/virtio-bus.h" #include "qapi/qmp/qjson.h" #include "monitor/monitor.h" +#include "hw/virtio/virtio-access.h" #define VIRTIO_NET_VM_VERSION11 @@ -72,8 +73,8 @@ static void virtio_net_get_config(VirtIODevice *vdev, uint8_t *config) VirtIONet *n = VIRTIO_NET(vdev); struct virtio_net_config netcfg; -stw_p(&netcfg.status, n->status); -stw_p(&netcfg.max_virtqueue_pairs, n->max_queues); +virtio_stw_p(&netcfg.status, n->status); +virtio_stw_p(&netcfg.max_virtqueue_pairs, n->max_queues); memcpy(netcfg.mac, n->mac, ETH_ALEN); memcpy(config, &netcfg, n->config_size); } @@ -618,7 +619,7 @@ static int virtio_net_handle_mac(VirtIONet *n, uint8_t cmd, s = iov_to_buf(iov, iov_cnt, 0, &mac_data.entries, sizeof(mac_data.entries)); -mac_data.entries = ldl_p(&mac_data.entries); +mac_data.entries = virtio_ldl_p(&mac_data.entries); if (s != sizeof(mac_data.entries)) { goto error; } @@ -645,7 +646,7 @@ static int virtio_net_handle_mac(VirtIONet *n, uint8_t cmd, s = iov_to_buf(iov, iov_cnt, 0, &mac_data.entries, sizeof(mac_data.entries)); -mac_data.entries = ldl_p(&mac_data.entries); +mac_data.entries = virtio_ldl_p(&mac_data.entries); if (s != sizeof(mac_data.entries)) { goto error; } @@ -684,7 +685,7 @@ static int virtio_net_handle_vlan_table(VirtIONet *n, uint8_t cmd, NetClientState *nc = qemu_get_queue(n->nic); s = iov_to_buf(iov, iov_cnt, 0, &vid, sizeof(vid)); -vid = lduw_p(&vid); +vid = virtio_lduw_p(&vid); if (s != sizeof(vid)) { return VIRTIO_NET_ERR; } @@ -721,7 +722,7 @@ static int virtio_net_handle_mq(VirtIONet *n, uint8_t cmd, return VIRTIO_NET_ERR; } -queues = lduw_p(&mq.virtqueue_pairs); +queues = virtio_lduw_p(&mq.virtqueue_pairs); if (queues < VIRTIO_NET_CTRL_MQ_VQ_PAIRS_MIN || queues > VIRTIO_NET_CTRL_MQ_VQ_PAIRS_MAX || @@ -1020,7 +1021,7 @@ static ssize_t virtio_net_receive(NetClientState *nc, const uint8_t *buf, size_t } if (mhdr_cnt) { -stw_p(&mhdr.num_buffers, i); +virtio_stw_p(&mhdr.num_buffers, i); iov_from_buf(mhdr_sg, mhdr_cnt, 0, &mhdr.num_buffers, sizeof mhdr.num_buffers); -- 1.8.1.2
[Qemu-devel] [PATCH 6/7] hw/scsi/virtio-scsi: use virtio wrappers to access headers.
Note that st*_raw and ld*_raw are effectively replaced by st*_p and ld*_p. Signed-off-by: Rusty Russell Reviewed-by: Anthony Liguori --- hw/scsi/virtio-scsi.c | 33 + 1 file changed, 17 insertions(+), 16 deletions(-) diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c index 26d95a1..e581dd0 100644 --- a/hw/scsi/virtio-scsi.c +++ b/hw/scsi/virtio-scsi.c @@ -18,6 +18,7 @@ #include #include #include +#include "hw/virtio/virtio-access.h" typedef struct VirtIOSCSIReq { VirtIOSCSI *dev; @@ -309,12 +310,12 @@ static void virtio_scsi_command_complete(SCSIRequest *r, uint32_t status, req->resp.cmd->response = VIRTIO_SCSI_S_OK; req->resp.cmd->status = status; if (req->resp.cmd->status == GOOD) { -req->resp.cmd->resid = tswap32(resid); +req->resp.cmd->resid = virtio_tswap32(resid); } else { req->resp.cmd->resid = 0; sense_len = scsi_req_get_sense(r, req->resp.cmd->sense, VIRTIO_SCSI_SENSE_SIZE); -req->resp.cmd->sense_len = tswap32(sense_len); +req->resp.cmd->sense_len = virtio_tswap32(sense_len); } virtio_scsi_complete_req(req); } @@ -410,16 +411,16 @@ static void virtio_scsi_get_config(VirtIODevice *vdev, VirtIOSCSIConfig *scsiconf = (VirtIOSCSIConfig *)config; VirtIOSCSICommon *s = VIRTIO_SCSI_COMMON(vdev); -stl_raw(&scsiconf->num_queues, s->conf.num_queues); -stl_raw(&scsiconf->seg_max, 128 - 2); -stl_raw(&scsiconf->max_sectors, s->conf.max_sectors); -stl_raw(&scsiconf->cmd_per_lun, s->conf.cmd_per_lun); -stl_raw(&scsiconf->event_info_size, sizeof(VirtIOSCSIEvent)); -stl_raw(&scsiconf->sense_size, s->sense_size); -stl_raw(&scsiconf->cdb_size, s->cdb_size); -stw_raw(&scsiconf->max_channel, VIRTIO_SCSI_MAX_CHANNEL); -stw_raw(&scsiconf->max_target, VIRTIO_SCSI_MAX_TARGET); -stl_raw(&scsiconf->max_lun, VIRTIO_SCSI_MAX_LUN); +virtio_stl_p(&scsiconf->num_queues, s->conf.num_queues); +virtio_stl_p(&scsiconf->seg_max, 128 - 2); +virtio_stl_p(&scsiconf->max_sectors, s->conf.max_sectors); +virtio_stl_p(&scsiconf->cmd_per_lun, s->conf.cmd_per_lun); +virtio_stl_p(&scsiconf->event_info_size, sizeof(VirtIOSCSIEvent)); +virtio_stl_p(&scsiconf->sense_size, s->sense_size); +virtio_stl_p(&scsiconf->cdb_size, s->cdb_size); +virtio_stw_p(&scsiconf->max_channel, VIRTIO_SCSI_MAX_CHANNEL); +virtio_stw_p(&scsiconf->max_target, VIRTIO_SCSI_MAX_TARGET); +virtio_stl_p(&scsiconf->max_lun, VIRTIO_SCSI_MAX_LUN); } static void virtio_scsi_set_config(VirtIODevice *vdev, @@ -428,14 +429,14 @@ static void virtio_scsi_set_config(VirtIODevice *vdev, VirtIOSCSIConfig *scsiconf = (VirtIOSCSIConfig *)config; VirtIOSCSICommon *vs = VIRTIO_SCSI_COMMON(vdev); -if ((uint32_t) ldl_raw(&scsiconf->sense_size) >= 65536 || -(uint32_t) ldl_raw(&scsiconf->cdb_size) >= 256) { +if ((uint32_t) virtio_ldl_p(&scsiconf->sense_size) >= 65536 || +(uint32_t) virtio_ldl_p(&scsiconf->cdb_size) >= 256) { error_report("bad data written to virtio-scsi configuration space"); exit(1); } -vs->sense_size = ldl_raw(&scsiconf->sense_size); -vs->cdb_size = ldl_raw(&scsiconf->cdb_size); +vs->sense_size = virtio_ldl_p(&scsiconf->sense_size); +vs->cdb_size = virtio_ldl_p(&scsiconf->cdb_size); } static uint32_t virtio_scsi_get_features(VirtIODevice *vdev, -- 1.8.1.2
[Qemu-devel] [PATCH 4/7] hw/net/virtio-balloon: use virtio wrappers to access page frame numbers.
Signed-off-by: Rusty Russell Reviewed-by: Anthony Liguori --- hw/virtio/virtio-balloon.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c index 9504877..97c4ac5 100644 --- a/hw/virtio/virtio-balloon.c +++ b/hw/virtio/virtio-balloon.c @@ -30,6 +30,7 @@ #endif #include "hw/virtio/virtio-bus.h" +#include "hw/virtio/virtio-access.h" static void balloon_page(void *addr, int deflate) { @@ -192,7 +193,7 @@ static void virtio_balloon_handle_output(VirtIODevice *vdev, VirtQueue *vq) ram_addr_t pa; ram_addr_t addr; -pa = (ram_addr_t)ldl_p(&pfn) << VIRTIO_BALLOON_PFN_SHIFT; +pa = (ram_addr_t)virtio_ldl_p(&pfn) << VIRTIO_BALLOON_PFN_SHIFT; offset += 4; /* FIXME: remove get_system_memory(), but how? */ -- 1.8.1.2
[Qemu-devel] [PATCH 2/7] virtio: allow byte swapping for vring and config access
This is based on a simpler patch by Anthony Liguouri, which only handled the vring accesses. We also need some drivers to access these helpers, eg. for data which contains headers. Signed-off-by: Rusty Russell --- hw/virtio/virtio.c | 28 ++-- 1 file changed, 14 insertions(+), 14 deletions(-) diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c index 7730780..bbf9f15 100644 --- a/hw/virtio/virtio.c +++ b/hw/virtio/virtio.c @@ -107,49 +107,49 @@ static inline uint64_t vring_desc_addr(hwaddr desc_pa, int i) { hwaddr pa; pa = desc_pa + sizeof(VRingDesc) * i + offsetof(VRingDesc, addr); -return ldq_phys(pa); +return virtio_ldq_phys(pa); } static inline uint32_t vring_desc_len(hwaddr desc_pa, int i) { hwaddr pa; pa = desc_pa + sizeof(VRingDesc) * i + offsetof(VRingDesc, len); -return ldl_phys(pa); +return virtio_ldl_phys(pa); } static inline uint16_t vring_desc_flags(hwaddr desc_pa, int i) { hwaddr pa; pa = desc_pa + sizeof(VRingDesc) * i + offsetof(VRingDesc, flags); -return lduw_phys(pa); +return virtio_lduw_phys(pa); } static inline uint16_t vring_desc_next(hwaddr desc_pa, int i) { hwaddr pa; pa = desc_pa + sizeof(VRingDesc) * i + offsetof(VRingDesc, next); -return lduw_phys(pa); +return virtio_lduw_phys(pa); } static inline uint16_t vring_avail_flags(VirtQueue *vq) { hwaddr pa; pa = vq->vring.avail + offsetof(VRingAvail, flags); -return lduw_phys(pa); +return virtio_lduw_phys(pa); } static inline uint16_t vring_avail_idx(VirtQueue *vq) { hwaddr pa; pa = vq->vring.avail + offsetof(VRingAvail, idx); -return lduw_phys(pa); +return virtio_lduw_phys(pa); } static inline uint16_t vring_avail_ring(VirtQueue *vq, int i) { hwaddr pa; pa = vq->vring.avail + offsetof(VRingAvail, ring[i]); -return lduw_phys(pa); +return virtio_lduw_phys(pa); } static inline uint16_t vring_used_event(VirtQueue *vq) @@ -161,42 +161,42 @@ static inline void vring_used_ring_id(VirtQueue *vq, int i, uint32_t val) { hwaddr pa; pa = vq->vring.used + offsetof(VRingUsed, ring[i].id); -stl_phys(pa, val); +virtio_stl_phys(pa, val); } static inline void vring_used_ring_len(VirtQueue *vq, int i, uint32_t val) { hwaddr pa; pa = vq->vring.used + offsetof(VRingUsed, ring[i].len); -stl_phys(pa, val); +virtio_stl_phys(pa, val); } static uint16_t vring_used_idx(VirtQueue *vq) { hwaddr pa; pa = vq->vring.used + offsetof(VRingUsed, idx); -return lduw_phys(pa); +return virtio_lduw_phys(pa); } static inline void vring_used_idx_set(VirtQueue *vq, uint16_t val) { hwaddr pa; pa = vq->vring.used + offsetof(VRingUsed, idx); -stw_phys(pa, val); +virtio_stw_phys(pa, val); } static inline void vring_used_flags_set_bit(VirtQueue *vq, int mask) { hwaddr pa; pa = vq->vring.used + offsetof(VRingUsed, flags); -stw_phys(pa, lduw_phys(pa) | mask); +virtio_stw_phys(pa, virtio_lduw_phys(pa) | mask); } static inline void vring_used_flags_unset_bit(VirtQueue *vq, int mask) { hwaddr pa; pa = vq->vring.used + offsetof(VRingUsed, flags); -stw_phys(pa, lduw_phys(pa) & ~mask); +virtio_stw_phys(pa, virtio_lduw_phys(pa) & ~mask); } static inline void vring_avail_event(VirtQueue *vq, uint16_t val) @@ -206,7 +206,7 @@ static inline void vring_avail_event(VirtQueue *vq, uint16_t val) return; } pa = vq->vring.used + offsetof(VRingUsed, ring[vq->vring.num]); -stw_phys(pa, val); +virtio_stw_phys(pa, val); } void virtio_queue_set_notification(VirtQueue *vq, int enable) -- 1.8.1.2
[Qemu-devel] [PATCH 5/7] hw/block/virtio-blk: use virtio wrappers to access headers.
Note that st*_raw and ld*_raw are effectively replaced by st*_p and ld*_p. Signed-off-by: Rusty Russell Reviewed-by: Anthony Liguori --- hw/block/virtio-blk.c | 35 ++- 1 file changed, 18 insertions(+), 17 deletions(-) diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c index 49a23c3..111ad4b 100644 --- a/hw/block/virtio-blk.c +++ b/hw/block/virtio-blk.c @@ -26,6 +26,7 @@ # include #endif #include "hw/virtio/virtio-bus.h" +#include "hw/virtio/virtio-access.h" typedef struct VirtIOBlockReq { @@ -77,7 +78,7 @@ static void virtio_blk_rw_complete(void *opaque, int ret) trace_virtio_blk_rw_complete(req, ret); if (ret) { -bool is_read = !(ldl_p(&req->out->type) & VIRTIO_BLK_T_OUT); +bool is_read = !(virtio_ldl_p(&req->out->type) & VIRTIO_BLK_T_OUT); if (virtio_blk_handle_rw_error(req, -ret, is_read)) return; } @@ -224,12 +225,12 @@ static void virtio_blk_handle_scsi(VirtIOBlockReq *req) hdr.status = CHECK_CONDITION; } -stl_p(&req->scsi->errors, - hdr.status | (hdr.msg_status << 8) | - (hdr.host_status << 16) | (hdr.driver_status << 24)); -stl_p(&req->scsi->residual, hdr.resid); -stl_p(&req->scsi->sense_len, hdr.sb_len_wr); -stl_p(&req->scsi->data_len, hdr.dxfer_len); +virtio_stl_p(&req->scsi->errors, + hdr.status | (hdr.msg_status << 8) | + (hdr.host_status << 16) | (hdr.driver_status << 24)); +virtio_stl_p(&req->scsi->residual, hdr.resid); +virtio_stl_p(&req->scsi->sense_len, hdr.sb_len_wr); +virtio_stl_p(&req->scsi->data_len, hdr.dxfer_len); virtio_blk_req_complete(req, status); g_free(req); @@ -240,7 +241,7 @@ static void virtio_blk_handle_scsi(VirtIOBlockReq *req) fail: /* Just put anything nonzero so that the ioctl fails in the guest. */ -stl_p(&req->scsi->errors, 255); +virtio_stl_p(&req->scsi->errors, 255); virtio_blk_req_complete(req, status); g_free(req); } @@ -286,7 +287,7 @@ static void virtio_blk_handle_write(VirtIOBlockReq *req, MultiReqBuffer *mrb) BlockRequest *blkreq; uint64_t sector; -sector = ldq_p(&req->out->sector); +sector = virtio_ldq_p(&req->out->sector); bdrv_acct_start(req->dev->bs, &req->acct, req->qiov.size, BDRV_ACCT_WRITE); @@ -320,7 +321,7 @@ static void virtio_blk_handle_read(VirtIOBlockReq *req) { uint64_t sector; -sector = ldq_p(&req->out->sector); +sector = virtio_ldq_p(&req->out->sector); bdrv_acct_start(req->dev->bs, &req->acct, req->qiov.size, BDRV_ACCT_READ); @@ -358,7 +359,7 @@ static void virtio_blk_handle_request(VirtIOBlockReq *req, req->out = (void *)req->elem.out_sg[0].iov_base; req->in = (void *)req->elem.in_sg[req->elem.in_num - 1].iov_base; -type = ldl_p(&req->out->type); +type = virtio_ldl_p(&req->out->type); if (type & VIRTIO_BLK_T_FLUSH) { virtio_blk_handle_flush(req, mrb); @@ -487,12 +488,12 @@ static void virtio_blk_update_config(VirtIODevice *vdev, uint8_t *config) bdrv_get_geometry(s->bs, &capacity); memset(&blkcfg, 0, sizeof(blkcfg)); -stq_raw(&blkcfg.capacity, capacity); -stl_raw(&blkcfg.seg_max, 128 - 2); -stw_raw(&blkcfg.cylinders, s->conf->cyls); -stl_raw(&blkcfg.blk_size, blk_size); -stw_raw(&blkcfg.min_io_size, s->conf->min_io_size / blk_size); -stw_raw(&blkcfg.opt_io_size, s->conf->opt_io_size / blk_size); +virtio_stq_p(&blkcfg.capacity, capacity); +virtio_stl_p(&blkcfg.seg_max, 128 - 2); +virtio_stw_p(&blkcfg.cylinders, s->conf->cyls); +virtio_stl_p(&blkcfg.blk_size, blk_size); +virtio_stw_p(&blkcfg.min_io_size, s->conf->min_io_size / blk_size); +virtio_stw_p(&blkcfg.opt_io_size, s->conf->opt_io_size / blk_size); blkcfg.heads = s->conf->heads; /* * We must ensure that the block device capacity is a multiple of -- 1.8.1.2
[Qemu-devel] [PATCH 1/7] virtio_get_byteswap: function for endian-ambivalent targets using virtio.
virtio data structures are defined as "target endian", which assumes that's a fixed value. In fact, that actually means it's platform-specific. The OASIS virtio 1.0 spec will fix this. Meanwhile, create a hook for little endian ppc (and potentially ARM). This is called at device reset time (which is done before any driver is loaded) since it may involve a system call to get the status when running under kvm. Signed-off-by: Rusty Russell --- hw/virtio/virtio.c| 6 ++ include/hw/virtio/virtio-access.h | 133 ++ include/hw/virtio/virtio.h| 2 + stubs/Makefile.objs | 1 + stubs/virtio_get_byteswap.c | 6 ++ 5 files changed, 148 insertions(+) create mode 100644 include/hw/virtio/virtio-access.h create mode 100644 stubs/virtio_get_byteswap.c diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c index 2f1e73b..7730780 100644 --- a/hw/virtio/virtio.c +++ b/hw/virtio/virtio.c @@ -18,6 +18,9 @@ #include "hw/virtio/virtio.h" #include "qemu/atomic.h" #include "hw/virtio/virtio-bus.h" +#include "hw/virtio/virtio-access.h" + +bool virtio_byteswap = false; /* * The alignment to use between consumer and producer parts of vring. @@ -543,6 +546,9 @@ void virtio_reset(void *opaque) virtio_set_status(vdev, 0); +/* We assume all devices are the same endian. */ +virtio_byteswap = virtio_get_byteswap(); + if (k->reset) { k->reset(vdev); } diff --git a/include/hw/virtio/virtio-access.h b/include/hw/virtio/virtio-access.h new file mode 100644 index 000..2263c9c --- /dev/null +++ b/include/hw/virtio/virtio-access.h @@ -0,0 +1,133 @@ +/* + * Virtio Accessor Support: In case your target can change endian. + * + * Copyright IBM, Corp. 2013 + * + * Authors: + * Rusty Russell + * + * This work is licensed under the terms of the GNU GPL, version 2. See + * the COPYING file in the top-level directory. + * + */ +#ifndef _QEMU_VIRTIO_ACCESS_H +#define _QEMU_VIRTIO_ACCESS_H +#include "hw/virtio/virtio.h" + +/* Initialized by virtio_get_byteswap() at any virtio device reset. */ +extern bool virtio_byteswap; + +static inline uint16_t virtio_lduw_phys(hwaddr pa) +{ +if (virtio_byteswap) { +return bswap16(lduw_phys(pa)); +} +return lduw_phys(pa); + +} + +static inline uint32_t virtio_ldl_phys(hwaddr pa) +{ +if (virtio_byteswap) { +return bswap32(ldl_phys(pa)); +} +return ldl_phys(pa); +} + +static inline uint64_t virtio_ldq_phys(hwaddr pa) +{ +if (virtio_byteswap) { +return bswap64(ldq_phys(pa)); +} +return ldq_phys(pa); +} + +static inline void virtio_stw_phys(hwaddr pa, uint16_t value) +{ +if (virtio_byteswap) { +stw_phys(pa, bswap16(value)); +} else { +stw_phys(pa, value); +} +} + +static inline void virtio_stl_phys(hwaddr pa, uint32_t value) +{ +if (virtio_byteswap) { +stl_phys(pa, bswap32(value)); +} else { +stl_phys(pa, value); +} +} + +static inline void virtio_stw_p(void *ptr, uint16_t v) +{ +if (virtio_byteswap) { +stw_p(ptr, bswap16(v)); +} else { +stw_p(ptr, v); +} +} + +static inline void virtio_stl_p(void *ptr, uint32_t v) +{ +if (virtio_byteswap) { +stl_p(ptr, bswap32(v)); +} else { +stl_p(ptr, v); +} +} + +static inline void virtio_stq_p(void *ptr, uint64_t v) +{ +if (virtio_byteswap) { +stq_p(ptr, bswap64(v)); +} else { +stq_p(ptr, v); +} +} + +static inline int virtio_lduw_p(const void *ptr) +{ +if (virtio_byteswap) { +return bswap16(lduw_p(ptr)); +} else { +return lduw_p(ptr); +} +} + +static inline int virtio_ldl_p(const void *ptr) +{ +if (virtio_byteswap) { +return bswap32(ldl_p(ptr)); +} else { +return ldl_p(ptr); +} +} + +static inline uint64_t virtio_ldq_p(const void *ptr) +{ +if (virtio_byteswap) { +return bswap64(ldq_p(ptr)); +} else { +return ldq_p(ptr); +} +} + +static inline uint32_t virtio_tswap32(uint32_t s) +{ +if (virtio_byteswap) { +return bswap32(tswap32(s)); +} else { +return tswap32(s); +} +} + +static inline void virtio_tswap32s(uint32_t *s) +{ +tswap32s(s); +if (virtio_byteswap) { +*s = bswap32(*s); +} +} +#endif /* _QEMU_VIRTIO_ACCESS_H */ diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h index a90522d..065e1d9 100644 --- a/include/hw/virtio/virtio.h +++ b/include/hw/virtio/virtio.h @@ -249,4 +249,6 @@ void virtio_queue_set_host_notifier_fd_handler(VirtQueue *vq, bool assign, bool set_handler); void virtio_queue_notify_vq(VirtQueue *vq); void virtio_irq(VirtQueue *vq); + +extern bool virtio_get_byteswap(void); #endif diff --git a/stubs/Makefile.objs b/stubs/Makefile.objs index
[Qemu-devel] [PATCH 7/7] hw/char/virtio-serial-bus: use virtio wrappers to access headers.
Signed-off-by: Rusty Russell Reviewed-by: Anthony Liguori --- hw/char/virtio-serial-bus.c | 34 ++ 1 file changed, 18 insertions(+), 16 deletions(-) diff --git a/hw/char/virtio-serial-bus.c b/hw/char/virtio-serial-bus.c index 703f026..2dc0ccf 100644 --- a/hw/char/virtio-serial-bus.c +++ b/hw/char/virtio-serial-bus.c @@ -24,6 +24,7 @@ #include "hw/sysbus.h" #include "trace.h" #include "hw/virtio/virtio-serial.h" +#include "hw/virtio/virtio-access.h" static VirtIOSerialPort *find_port_by_id(VirtIOSerial *vser, uint32_t id) { @@ -185,9 +186,9 @@ static size_t send_control_event(VirtIOSerial *vser, uint32_t port_id, { struct virtio_console_control cpkt; -stl_p(&cpkt.id, port_id); -stw_p(&cpkt.event, event); -stw_p(&cpkt.value, value); +virtio_stl_p(&cpkt.id, port_id); +virtio_stw_p(&cpkt.event, event); +virtio_stw_p(&cpkt.value, value); trace_virtio_serial_send_control_event(port_id, event, value); return send_control_msg(vser, &cpkt, sizeof(cpkt)); @@ -291,8 +292,8 @@ static void handle_control_message(VirtIOSerial *vser, void *buf, size_t len) return; } -cpkt.event = lduw_p(&gcpkt->event); -cpkt.value = lduw_p(&gcpkt->value); +cpkt.event = virtio_lduw_p(&gcpkt->event); +cpkt.value = virtio_lduw_p(&gcpkt->value); trace_virtio_serial_handle_control_message(cpkt.event, cpkt.value); @@ -312,10 +313,10 @@ static void handle_control_message(VirtIOSerial *vser, void *buf, size_t len) return; } -port = find_port_by_id(vser, ldl_p(&gcpkt->id)); +port = find_port_by_id(vser, virtio_ldl_p(&gcpkt->id)); if (!port) { error_report("virtio-serial-bus: Unexpected port id %u for device %s", - ldl_p(&gcpkt->id), vser->bus.qbus.name); + virtio_ldl_p(&gcpkt->id), vser->bus.qbus.name); return; } @@ -342,9 +343,9 @@ static void handle_control_message(VirtIOSerial *vser, void *buf, size_t len) } if (port->name) { -stl_p(&cpkt.id, port->id); -stw_p(&cpkt.event, VIRTIO_CONSOLE_PORT_NAME); -stw_p(&cpkt.value, 1); +virtio_stl_p(&cpkt.id, port->id); +virtio_stw_p(&cpkt.event, VIRTIO_CONSOLE_PORT_NAME); +virtio_stw_p(&cpkt.value, 1); buffer_len = sizeof(cpkt) + strlen(port->name) + 1; buffer = g_malloc(buffer_len); @@ -536,7 +537,7 @@ static void virtio_serial_save(QEMUFile *f, void *opaque) qemu_put_be32s(f, &s->config.max_nr_ports); /* The ports map */ -max_nr_ports = tswap32(s->config.max_nr_ports); +max_nr_ports = virtio_tswap32(s->config.max_nr_ports); for (i = 0; i < (max_nr_ports + 31) / 32; i++) { qemu_put_be32s(f, &s->ports_map[i]); } @@ -690,8 +691,8 @@ static int virtio_serial_load(QEMUFile *f, void *opaque, int version_id) qemu_get_be16s(f, &s->config.rows); qemu_get_be32s(f, &max_nr_ports); -tswap32s(&max_nr_ports); -if (max_nr_ports > tswap32(s->config.max_nr_ports)) { +virtio_tswap32s(&max_nr_ports); +if (max_nr_ports > virtio_tswap32(s->config.max_nr_ports)) { /* Source could have had more ports than us. Fail migration. */ return -EINVAL; } @@ -760,7 +761,7 @@ static uint32_t find_free_port_id(VirtIOSerial *vser) { unsigned int i, max_nr_ports; -max_nr_ports = tswap32(vser->config.max_nr_ports); +max_nr_ports = virtio_tswap32(vser->config.max_nr_ports); for (i = 0; i < (max_nr_ports + 31) / 32; i++) { uint32_t map, bit; @@ -846,7 +847,7 @@ static int virtser_port_qdev_init(DeviceState *qdev) } } -max_nr_ports = tswap32(port->vser->config.max_nr_ports); +max_nr_ports = virtio_tswap32(port->vser->config.max_nr_ports); if (port->id >= max_nr_ports) { error_report("virtio-serial-bus: Out-of-range port id specified, max. allowed: %u", max_nr_ports - 1); @@ -946,7 +947,8 @@ static int virtio_serial_device_init(VirtIODevice *vdev) vser->ovqs[i] = virtio_add_queue(vdev, 128, handle_output); } -vser->config.max_nr_ports = tswap32(vser->serial.max_virtserial_ports); +vser->config.max_nr_ports = +virtio_tswap32(vser->serial.max_virtserial_ports); vser->ports_map = g_malloc0(((vser->serial.max_virtserial_ports + 31) / 32) * sizeof(vser->ports_map[0])); /* -- 1.8.1.2
Re: [Qemu-devel] [PATCH 2/2] virtio: refresh registers at reset time
Greg Kurz writes: > We need to support the guest endianness as soon as a virtio device shows > up. Alex suggested this can achieved by calling cpu_synchronize_state(). > > To have it working on PowerPC, we need to add LPCR in the sync register > functions. > > Signed-off-by: Greg Kurz Excellent! Alex, if you take this, I'll be happy to rebase and re-test the virtio endianness patches on top. Cheers, Rusty.
Re: [Qemu-devel] [PATCH] hw/9pfs/virtio_9p_device: use virtio wrappers to access headers.
Greg Kurz writes: > Follow-up to Rusty's virtio endianness serie: enough to get a working > virtfs mount. > > Note that st*_raw and ld*_raw are effectively replaced by st*_p and ld*_p. > > Signed-off-by: Greg Kurz Thanks! I've reworked my patches in line with the anticipated KVM_GET_ONE_REG from Mikey, and put this into the series. Mikey, here's the template I assumed (needs CONFIG_KVM implementation of kvmppc_update_spr_lpcr). Cheers, Rusty. FIXME: Implement for KVM using KVM_GET_ONE_REG! diff --git a/target-ppc/kvm_ppc.h b/target-ppc/kvm_ppc.h index 771cfbe..30d8af6 100644 --- a/target-ppc/kvm_ppc.h +++ b/target-ppc/kvm_ppc.h @@ -29,6 +29,7 @@ int kvmppc_clear_tsr_bits(PowerPCCPU *cpu, uint32_t tsr_bits); int kvmppc_or_tsr_bits(PowerPCCPU *cpu, uint32_t tsr_bits); int kvmppc_set_tcr(PowerPCCPU *cpu); int kvmppc_booke_watchdog_enable(PowerPCCPU *cpu); +void kvmppc_update_spr_lpcr(PowerPCCPU *cpu); #ifndef CONFIG_USER_ONLY off_t kvmppc_alloc_rma(const char *name, MemoryRegion *sysmem); void *kvmppc_create_spapr_tce(uint32_t liobn, uint32_t window_size, int *pfd); @@ -159,6 +160,10 @@ static inline bool kvmppc_has_cap_epr(void) { return false; } + +static inline void kvmppc_update_spr_lpcr(PowerPCCPU *cpu) +{ +} #endif #ifndef CONFIG_KVM
Re: [Qemu-devel] [PATCH 1/8] virtio_get_byteswap: function for endian-ambivalent targets using virtio.
Benjamin Herrenschmidt writes: > On Mon, 2013-08-12 at 17:29 +0930, Rusty Russell wrote: >> virtio data structures are defined as "target endian", which assumes >> that's a fixed value. In fact, that actually means it's >> platform-specific. >> >> Hopefully the OASIS virtio 1.0 spec will fix this. Meanwhile, create >> a hook for little endian ppc. > > Ok, sorry if I missed a previous debate on that one but why do you do a > call-out to architecture specific stuff (that is not even inline) on > every access ? > > If we consider that virtio byte order is global, can't you make it a > global that is *set* by the architecture rather than *polled* by > virtio ? OK, so after some more offline discussion it turns out these patches won't really work reliably, since powerpc kvm doesn't reflect the register into qemu. Mikey N has promised me a KVM_GET_ONE_REG to get the register, and I'll rework on top of that: we will query that whenever a device is reset (which Linux does on every device init, so it captures the kexec case too). Cheers, Rusty.
Re: [Qemu-devel] Would virtio support 64 bit address for vring virtqueue?
"Xie, Huawei" writes: > If this is the case, one possible fix would be: > Write two continuous 32bit DWORD to combine a 64bit address > Use the upper 12 bits of PFN val to indicate if it is combined write > In this way, we wouldn't break other virtio driver, register layout > and only need a few lines of modification. Note that (with OASIS) we are currently working on a v1.0 of the virtio spec. I recommend the OASIS virtio-dev mailing list ( https://www.oasis-open.org/mlmanage/ ) and in particular the thread on Michael Tsirkin's proposal on using PCI capabilities: https://lists.oasis-open.org/archives/virtio-comment/201308/msg00087.html Cheers, Rusty.
Re: [Qemu-devel] [PATCH 1/8] virtio_get_byteswap: function for endian-ambivalent targets using virtio.
Benjamin Herrenschmidt writes: > On Tue, 2013-08-13 at 13:50 +0930, Rusty Russell wrote: >> We can have it call once (eg. when the first and storing the status >> word) and store the result. > > And fail with kexec of a different endian kernel :-) Let's not bother > yet. Merge it and then we see if we can optimize. Yeah, my code actually did it every status bye write, which unintentionally solved this. But agreed, let's let these patches stand... Cheers, Rusty.
Re: [Qemu-devel] [PATCH 1/8] virtio_get_byteswap: function for endian-ambivalent targets using virtio.
Anthony Liguori writes: > Benjamin Herrenschmidt writes: > >> On Mon, 2013-08-12 at 17:29 +0930, Rusty Russell wrote: >>> virtio data structures are defined as "target endian", which assumes >>> that's a fixed value. In fact, that actually means it's >>> platform-specific. >>> >>> Hopefully the OASIS virtio 1.0 spec will fix this. Meanwhile, create >>> a hook for little endian ppc. >> >> Ok, sorry if I missed a previous debate on that one but why do you do a >> call-out to architecture specific stuff (that is not even inline) on >> every access ? Anthony said he wanted it that way: my initial patch was more optimized. > Let's focus on getting something merged. Then we can muck around with > it down the road. > > Having target-ppc call into virtio is a layering violation. This > approach keeps the dependencies cleaner. We can have it call once (eg. when the first and storing the status word) and store the result. Cheers, Rusty.
[Qemu-devel] [PATCH 2/8] target-ppc: ppc64 target's virtio can be either endian.
We base it on the OS endian, as reflected by the endianness of the interrupt vectors. Suggested-by: Benjamin Herrenschmidt Signed-off-by: Rusty Russell --- target-ppc/misc_helper.c | 9 + 1 file changed, 9 insertions(+) diff --git a/target-ppc/misc_helper.c b/target-ppc/misc_helper.c index 616aab6..6c97c81 100644 --- a/target-ppc/misc_helper.c +++ b/target-ppc/misc_helper.c @@ -20,6 +20,7 @@ #include "helper.h" #include "helper_regs.h" +#include "hw/virtio/virtio.h" /*/ /* SPR accesses */ @@ -116,3 +117,11 @@ void ppc_store_msr(CPUPPCState *env, target_ulong value) { hreg_store_msr(env, value, 0); } + +bool virtio_get_byteswap(void) +{ +PowerPCCPU *cp = POWERPC_CPU(first_cpu); +CPUPPCState *env = &cp->env; + +return env->spr[SPR_LPCR] & LPCR_ILE; +} -- 1.8.1.2
[Qemu-devel] [PATCH 1/8] virtio_get_byteswap: function for endian-ambivalent targets using virtio.
virtio data structures are defined as "target endian", which assumes that's a fixed value. In fact, that actually means it's platform-specific. Hopefully the OASIS virtio 1.0 spec will fix this. Meanwhile, create a hook for little endian ppc. Signed-off-by: Rusty Russell --- include/hw/virtio/virtio-access.h | 130 ++ include/hw/virtio/virtio.h| 2 + stubs/Makefile.objs | 1 + stubs/virtio_get_byteswap.c | 6 ++ 4 files changed, 139 insertions(+) create mode 100644 include/hw/virtio/virtio-access.h create mode 100644 stubs/virtio_get_byteswap.c diff --git a/include/hw/virtio/virtio-access.h b/include/hw/virtio/virtio-access.h new file mode 100644 index 000..01d7dd6 --- /dev/null +++ b/include/hw/virtio/virtio-access.h @@ -0,0 +1,130 @@ +/* + * Virtio Accessor Support: In case your target can change endian. + * + * Copyright IBM, Corp. 2013 + * + * Authors: + * Rusty Russell + * + * This work is licensed under the terms of the GNU GPL, version 2. See + * the COPYING file in the top-level directory. + * + */ +#ifndef _QEMU_VIRTIO_ACCESS_H +#define _QEMU_VIRTIO_ACCESS_H +#include "hw/virtio/virtio.h" + +static inline uint16_t virtio_lduw_phys(hwaddr pa) +{ +if (virtio_get_byteswap()) { +return bswap16(lduw_phys(pa)); +} +return lduw_phys(pa); + +} + +static inline uint32_t virtio_ldl_phys(hwaddr pa) +{ +if (virtio_get_byteswap()) { +return bswap32(ldl_phys(pa)); +} +return ldl_phys(pa); +} + +static inline uint64_t virtio_ldq_phys(hwaddr pa) +{ +if (virtio_get_byteswap()) { +return bswap64(ldq_phys(pa)); +} +return ldq_phys(pa); +} + +static inline void virtio_stw_phys(hwaddr pa, uint16_t value) +{ +if (virtio_get_byteswap()) { +stw_phys(pa, bswap16(value)); +} else { +stw_phys(pa, value); +} +} + +static inline void virtio_stl_phys(hwaddr pa, uint32_t value) +{ +if (virtio_get_byteswap()) { +stl_phys(pa, bswap32(value)); +} else { +stl_phys(pa, value); +} +} + +static inline void virtio_stw_p(void *ptr, uint16_t v) +{ +if (virtio_get_byteswap()) { +stw_p(ptr, bswap16(v)); +} else { +stw_p(ptr, v); +} +} + +static inline void virtio_stl_p(void *ptr, uint32_t v) +{ +if (virtio_get_byteswap()) { +stl_p(ptr, bswap32(v)); +} else { +stl_p(ptr, v); +} +} + +static inline void virtio_stq_p(void *ptr, uint64_t v) +{ +if (virtio_get_byteswap()) { +stq_p(ptr, bswap64(v)); +} else { +stq_p(ptr, v); +} +} + +static inline int virtio_lduw_p(const void *ptr) +{ +if (virtio_get_byteswap()) { +return bswap16(lduw_p(ptr)); +} else { +return lduw_p(ptr); +} +} + +static inline int virtio_ldl_p(const void *ptr) +{ +if (virtio_get_byteswap()) { +return bswap32(ldl_p(ptr)); +} else { +return ldl_p(ptr); +} +} + +static inline uint64_t virtio_ldq_p(const void *ptr) +{ +if (virtio_get_byteswap()) { +return bswap64(ldq_p(ptr)); +} else { +return ldq_p(ptr); +} +} + +static inline uint32_t virtio_tswap32(uint32_t s) +{ +if (virtio_get_byteswap()) { +return bswap32(tswap32(s)); +} else { +return tswap32(s); +} +} + +static inline void virtio_tswap32s(uint32_t *s) +{ +tswap32s(s); +if (virtio_get_byteswap()) { +*s = bswap32(*s); +} +} +#endif /* _QEMU_VIRTIO_ACCESS_H */ diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h index d7e9e0f..18ca725 100644 --- a/include/hw/virtio/virtio.h +++ b/include/hw/virtio/virtio.h @@ -248,4 +248,6 @@ void virtio_queue_set_host_notifier_fd_handler(VirtQueue *vq, bool assign, bool set_handler); void virtio_queue_notify_vq(VirtQueue *vq); void virtio_irq(VirtQueue *vq); + +extern bool virtio_get_byteswap(void); #endif diff --git a/stubs/Makefile.objs b/stubs/Makefile.objs index f306cba..05f7bab 100644 --- a/stubs/Makefile.objs +++ b/stubs/Makefile.objs @@ -26,3 +26,4 @@ stub-obj-y += vm-stop.o stub-obj-y += vmstate.o stub-obj-$(CONFIG_WIN32) += fd-register.o stub-obj-y += cpus.o +stub-obj-y += virtio_get_byteswap.o diff --git a/stubs/virtio_get_byteswap.c b/stubs/virtio_get_byteswap.c new file mode 100644 index 000..7cf764d --- /dev/null +++ b/stubs/virtio_get_byteswap.c @@ -0,0 +1,6 @@ +#include "hw/virtio/virtio.h" + +bool virtio_get_byteswap(void) +{ +return false; +} -- 1.8.1.2
[Qemu-devel] [PATCH 0/8] virtio for endian curious guests Take #2
The driver parts (patches 3-8) are unchanged, so didn't repost. 1) Rebased onto more recent git tree. 2) New stub is virtio_get_byteswap() 3) PPC64 uses endianness of first CPU interrupt vectors, not CPU endiannes. Hope this one is closer... Rusty. Rusty Russell (8): virtio_get_byteswap: function for endian-ambivalent targets using virtio. target-ppc: ppc64 target's virtio can be either endian. virtio: allow byte swapping for vring and config access hw/net/virtio-net: use virtio wrappers to access headers. hw/net/virtio-balloon: use virtio wrappers to access page frame numbers. hw/block/virtio-blk: use virtio wrappers to access headers. hw/scsi/virtio-scsi: use virtio wrappers to access headers. hw/char/virtio-serial-bus: use virtio wrappers to access headers. hw/block/virtio-blk.c | 35 +- hw/char/virtio-serial-bus.c | 34 +- hw/net/virtio-net.c | 15 +++-- hw/scsi/virtio-scsi.c | 33 +- hw/virtio/virtio-balloon.c| 3 +- hw/virtio/virtio.c| 29 + include/hw/virtio/virtio-access.h | 130 ++ include/hw/virtio/virtio.h| 2 + stubs/Makefile.objs | 1 + stubs/virtio_get_byteswap.c | 6 ++ target-ppc/misc_helper.c | 9 +++ 11 files changed, 226 insertions(+), 71 deletions(-) create mode 100644 include/hw/virtio/virtio-access.h create mode 100644 stubs/virtio_get_byteswap.c -- 1.8.1.2
Re: [Qemu-devel] [PATCH 1/7] virtio: allow byte swapping for vring and config access
Peter Maydell writes: > On 9 August 2013 08:35, Rusty Russell wrote: >> That's a lot of replumbing and indirect function calls for a fairly >> obscure case. We certainly don't have a nice CPUState lying around in >> virtio at the moment, for example. > > Actually if you're in an IO routine you do: it will be in the > global variable 'current_cpu'. Most IO functions don't need this, > but it's what we use for things like "this device behaviour varies > depending on which CPU accesses it". Hmm, that was NULL for me when called from virtio. I have stuck with first_cpu in the ppc64 case. Patches coming, Rusty.
[Qemu-devel] [PATCH 3/8] virtio: allow byte swapping for vring and config access
This is based on a simpler patch by Anthony Liguouri, which only handled the vring accesses. We also need some drivers to access these helpers, eg. for data which contains headers. Signed-off-by: Rusty Russell --- hw/virtio/virtio.c | 29 +++-- 1 file changed, 15 insertions(+), 14 deletions(-) diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c index 09f62c6..178647b 100644 --- a/hw/virtio/virtio.c +++ b/hw/virtio/virtio.c @@ -18,6 +18,7 @@ #include "hw/virtio/virtio.h" #include "qemu/atomic.h" #include "hw/virtio/virtio-bus.h" +#include "hw/virtio/virtio-access.h" /* * The alignment to use between consumer and producer parts of vring. @@ -104,49 +105,49 @@ static inline uint64_t vring_desc_addr(hwaddr desc_pa, int i) { hwaddr pa; pa = desc_pa + sizeof(VRingDesc) * i + offsetof(VRingDesc, addr); -return ldq_phys(pa); +return virtio_ldq_phys(pa); } static inline uint32_t vring_desc_len(hwaddr desc_pa, int i) { hwaddr pa; pa = desc_pa + sizeof(VRingDesc) * i + offsetof(VRingDesc, len); -return ldl_phys(pa); +return virtio_ldl_phys(pa); } static inline uint16_t vring_desc_flags(hwaddr desc_pa, int i) { hwaddr pa; pa = desc_pa + sizeof(VRingDesc) * i + offsetof(VRingDesc, flags); -return lduw_phys(pa); +return virtio_lduw_phys(pa); } static inline uint16_t vring_desc_next(hwaddr desc_pa, int i) { hwaddr pa; pa = desc_pa + sizeof(VRingDesc) * i + offsetof(VRingDesc, next); -return lduw_phys(pa); +return virtio_lduw_phys(pa); } static inline uint16_t vring_avail_flags(VirtQueue *vq) { hwaddr pa; pa = vq->vring.avail + offsetof(VRingAvail, flags); -return lduw_phys(pa); +return virtio_lduw_phys(pa); } static inline uint16_t vring_avail_idx(VirtQueue *vq) { hwaddr pa; pa = vq->vring.avail + offsetof(VRingAvail, idx); -return lduw_phys(pa); +return virtio_lduw_phys(pa); } static inline uint16_t vring_avail_ring(VirtQueue *vq, int i) { hwaddr pa; pa = vq->vring.avail + offsetof(VRingAvail, ring[i]); -return lduw_phys(pa); +return virtio_lduw_phys(pa); } static inline uint16_t vring_used_event(VirtQueue *vq) @@ -158,42 +159,42 @@ static inline void vring_used_ring_id(VirtQueue *vq, int i, uint32_t val) { hwaddr pa; pa = vq->vring.used + offsetof(VRingUsed, ring[i].id); -stl_phys(pa, val); +virtio_stl_phys(pa, val); } static inline void vring_used_ring_len(VirtQueue *vq, int i, uint32_t val) { hwaddr pa; pa = vq->vring.used + offsetof(VRingUsed, ring[i].len); -stl_phys(pa, val); +virtio_stl_phys(pa, val); } static uint16_t vring_used_idx(VirtQueue *vq) { hwaddr pa; pa = vq->vring.used + offsetof(VRingUsed, idx); -return lduw_phys(pa); +return virtio_lduw_phys(pa); } static inline void vring_used_idx_set(VirtQueue *vq, uint16_t val) { hwaddr pa; pa = vq->vring.used + offsetof(VRingUsed, idx); -stw_phys(pa, val); +virtio_stw_phys(pa, val); } static inline void vring_used_flags_set_bit(VirtQueue *vq, int mask) { hwaddr pa; pa = vq->vring.used + offsetof(VRingUsed, flags); -stw_phys(pa, lduw_phys(pa) | mask); +virtio_stw_phys(pa, virtio_lduw_phys(pa) | mask); } static inline void vring_used_flags_unset_bit(VirtQueue *vq, int mask) { hwaddr pa; pa = vq->vring.used + offsetof(VRingUsed, flags); -stw_phys(pa, lduw_phys(pa) & ~mask); +virtio_stw_phys(pa, virtio_lduw_phys(pa) & ~mask); } static inline void vring_avail_event(VirtQueue *vq, uint16_t val) @@ -203,7 +204,7 @@ static inline void vring_avail_event(VirtQueue *vq, uint16_t val) return; } pa = vq->vring.used + offsetof(VRingUsed, ring[vq->vring.num]); -stw_phys(pa, val); +virtio_stw_phys(pa, val); } void virtio_queue_set_notification(VirtQueue *vq, int enable) -- 1.8.1.2
Re: [Qemu-devel] [PATCH 1/7] virtio: allow byte swapping for vring and config access
Benjamin Herrenschmidt writes: > This whole exercise should have nothing to do with the current endian > mode of the CPU. If for example you are running lx86 (the x86 emulator > IBM provides) which exploits MSR:LE on POWER7 to run x86 binaries in > userspace, you don't want virtio to suddenly change endian ! > > The information we care about is the endianness of the operating system. Which is why my original patches nabbed the endianness when the target updated the virtio device status. You're making an assumption about the nature of the guest, that they don't pass the virtio device directly through to userspace. I don't care, though. The point is to make something which works, until the Real Fix (LE virtio). > The most logical way to infer it is a different bit, which used to be > MSR:ILE and is now in LPCR for guests and controlled via a hypercall on > pseries, which indicates what is the endianness of interrupt vectors. > > IE. It indicates how the cpu should set MSR:LE when taking an interrupt, > regardless of what the current MSR:LE value is at any given point in > time. > > So what should be done in fact is whenever *that* bit is changed > (currently via hcall, maybe via MSR:ILE if we emulate that on older > models or LPCR when we emulate that), then the qemu cpu model can "call > out" to change the "OS endianness" which we can propagate to virtio. > > Anything trying to do stuff based on the "current" endianness in the MSR > sounds like a cesspit to me. OK. What should Anton's gdb stub do then? Cheers, Rusty.
Re: [Qemu-devel] [PATCH 1/7] virtio: allow byte swapping for vring and config access
Anthony Liguori writes: > Rusty Russell writes: >> (Qemu run under eatmydata to eliminate syncs) > > FYI, cache=unsafe is equivalent to using eatmydata. Ah, thanks! > I can reproduce this although I also see a larger standard deviation. > > BEFORE: > MIN: 496 > MAX: 1055 > AVG: 873.22 > STDEV: 136.88 > > AFTER: > MIN: 494 > MAX: 1456 > AVG: 947.77 > STDEV: 150.89 BTW, how did you generate these stats? Consider this my plug for my little stats filter: https://github.com/rustyrussell/stats > > In my datasets, the stdev is higher in the after case implying that > there is more variation. Indeed, the MIN is pretty much the same. > > GCC is inlining the functions, I'm still surprised that it's measurable > at all. GCC won't inline across compilation units without -flto though, so the stub call won't be inlined, right? Cheers, Rusty.
Re: [Qemu-devel] [PATCH 1/7] virtio: allow byte swapping for vring and config access
Andreas Färber writes: > Am 08.08.2013 17:40, schrieb Anthony Liguori: >> Andreas Färber writes: >>> Am 08.08.2013 15:31, schrieb Anthony Liguori: We have a mechanism to do weak functions via stubs/. I think it would be better to do cpu_get_byteswap() as a stub function and then overload it in the ppc64 code. >>> >>> If this as your name indicates is a per-CPU function then it should go >>> into CPUClass. Interesting question is, what is virtio supposed to do if >>> we have two ppc CPUs, one is Big Endian, the other is Little Endian. >> >> PPC64 is big endian. AFAIK, there is no such thing as a little endian >> PPC64 processor. >> >> This is just a processor mode where loads/stores are byte lane swapped. >> Hence the name 'cpu_get_byteswap'. It's just asking whether the >> load/stores are being swapped or not. > > Exactly, just read it as "is in ... Endian mode". On the CPUs I am more > familiar with (e.g., 970), this used to be controlled via an MSR bit, > which as CPUPPCState::msr exists per CPUState. I did not check on real > hardware, but from the QEMU code this would allow for the mixed-endian > scenario described above. > >> At least for PPC64, it's not possible to enable/disable byte lane >> swapping for individual CPUs. It's done through a system-wide hcall. > > What is offending me is only the following: If we name it > cpu_get_byteswap() as proposed by you, then its first argument should be > a CPUState *cpu. Its value would be read from the derived type's state, > such as the MSR bit in the code path that you wanted duplicated. The > function implementing that register-reading would be a hook in CPUClass, > with a default implementation in qom/cpu.c rather than a fallback in > stubs/. To access CPUClass, CPUState cannot be NULL, as brought up by > Stefano for cpu_do_unassigned_access(); not following that pattern > prevents mixing CPU architectures, which my large refactorings have > partially been about. Cf. my guest-memory-dump refactoring. > > If it is just some random global value, then please don't call it > cpu_*(). Since sPAPR is not a target of its own, I don't see how/where > you want to implement that hcall query as per-target function either, > that might rather call for a QEMUMachine hook? > > I don't care or argue about byte lanes here, I am just trying to keep > API design consistent and not regressing on the way to heterogeneous > emulation. That's a lot of replumbing and indirect function calls for a fairly obscure case. We certainly don't have a nice CPUState lying around in virtio at the moment, for example. I can try to plumb this in if there's consensus, but I suspect it's making the job 10x harder. (The next logical step would be for st* and ld* to take the cpu to query its endianness, Anthony's weird ideas notwithstanding). Cheers, Rusty.
Re: [Qemu-devel] [PATCH 1/7] virtio: allow byte swapping for vring and config access
Andreas Färber writes: > Am 08.08.2013 15:31, schrieb Anthony Liguori: >> Rusty Russell writes: >> We have a mechanism to do weak functions via stubs/. I think it would >> be better to do cpu_get_byteswap() as a stub function and then overload >> it in the ppc64 code. > > If this as your name indicates is a per-CPU function then it should go > into CPUClass. Interesting question is, what is virtio supposed to do if > we have two ppc CPUs, one is Big Endian, the other is Little Endian. > We'd need to check current_cpu then, which for Xen is always NULL. Below is the minimal solution, which is sufficient for virtio. If Anton wants per-cpu endianness for gdb, he'll need something more sophisticated. Feedback welcome! Rusty. Subject: cpu_get_byteswap: function for endian-ambivalent targets. Signed-off-by: Rusty Russell diff --git a/include/qom/cpu.h b/include/qom/cpu.h index a5bb515..ed84267 100644 --- a/include/qom/cpu.h +++ b/include/qom/cpu.h @@ -357,4 +357,13 @@ void cpu_reset_interrupt(CPUState *cpu, int mask); */ void cpu_resume(CPUState *cpu); +/** + * cpu_get_byteswap: + * + * Is (any) CPU running in byteswapped mode: normally false. This + * doesn't take a cpu argument, because we don't support heterogeneous + * endianness. + */ +bool cpu_get_byteswap(void); + #endif diff --git a/stubs/Makefile.objs b/stubs/Makefile.objs index 9b701b4..d4af94a 100644 --- a/stubs/Makefile.objs +++ b/stubs/Makefile.objs @@ -25,3 +25,4 @@ stub-obj-y += vm-stop.o stub-obj-y += vmstate.o stub-obj-$(CONFIG_WIN32) += fd-register.o stub-obj-y += cpus.o +stub-obj-y += cpu_byteswap.o diff --git a/stubs/cpu_byteswap.c b/stubs/cpu_byteswap.c new file mode 100644 index 000..b3b669f --- /dev/null +++ b/stubs/cpu_byteswap.c @@ -0,0 +1,6 @@ +#include "qom/cpu.h" + +bool cpu_get_byteswap(void) +{ +return false; +} Subject: target-ppc: ppc64 targets can be either endian. In this case, we just query the first cpu. Signed-off-by: Rusty Russell diff --git a/target-ppc/misc_helper.c b/target-ppc/misc_helper.c index 616aab6..0a508eb 100644 --- a/target-ppc/misc_helper.c +++ b/target-ppc/misc_helper.c @@ -116,3 +116,8 @@ void ppc_store_msr(CPUPPCState *env, target_ulong value) { hreg_store_msr(env, value, 0); } + +bool cpu_get_byteswap(void) +{ +return first_cpu->hflags & (1 << MSR_LE); +}
Re: [Qemu-devel] [PATCH 1/7] virtio: allow byte swapping for vring and config access
Anthony Liguori writes: > I suspect this is a premature optimization. With a weak function called > directly in the accessors below, I suspect you would see no measurable > performance overhead compared to this approach. > > It's all very predictable so the CPU should do a decent job optimizing > the if () away. Perhaps. I was leery of introducing performance regressions, but the actual I/O tends to dominate anyway. So I tested this, by adding the patch (below) and benchmarking qemu-system-i386 on my laptop before and after. Setup: Intel(R) Core(TM) i5 CPU M 560 @ 2.67GHz (Performance cpu governer enabled) Guest: virtio user net, virtio block on raw file, 1 CPU, 512MB RAM. (Qemu run under eatmydata to eliminate syncs) First test: ping -f -c 1 -q 10.0.2.0 (100 times) (Ping chosen since packets stay in qemu's user net code) BEFORE: MIN: 824ms MAX: 914ms AVG: 876.95ms STDDEV: 16ms AFTER: MIN: 872ms MAX: 933ms AVG: 904.35ms STDDEV: 15ms Second test: dd if=/dev/vda iflag=direct count=1 of=/dev/null (100 times) BEFORE: MIN: 0.927994sec MAX: 1.051640sec AVG: 0.99733sec STDDEV: 0.028sec AFTER: MIN: 0.941706sec MAX: 1.034810sec AVG: 0.988692sec STDDEV: 0.021sec So, we can notice performance on ping, but anything which does actual IO is a wash. Cheers, Rusty. diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c index 2887f17..df8733b 100644 --- a/hw/virtio/virtio.c +++ b/hw/virtio/virtio.c @@ -85,20 +85,6 @@ struct VirtQueue EventNotifier host_notifier; }; -#ifdef TARGET_VIRTIO_SWAPENDIAN -bool virtio_byteswap; - -/* Ask target code if we should swap endian for all vring and config access. */ -static void mark_endian(void) -{ -virtio_byteswap = virtio_swap_endian(); -} -#else -static void mark_endian(void) -{ -} -#endif - /* virt queue functions */ static void virtqueue_init(VirtQueue *vq) { @@ -540,9 +526,6 @@ void virtio_set_status(VirtIODevice *vdev, uint8_t val) VirtioDeviceClass *k = VIRTIO_DEVICE_GET_CLASS(vdev); trace_virtio_set_status(vdev, val); -/* If guest virtio endian is uncertain, set it now. */ -mark_endian(); - if (k->set_status) { k->set_status(vdev, val); } diff --git a/include/hw/virtio/virtio-access.h b/include/hw/virtio/virtio-access.h index b1d531e..ea4166a 100644 --- a/include/hw/virtio/virtio-access.h +++ b/include/hw/virtio/virtio-access.h @@ -13,18 +13,9 @@ #ifndef _QEMU_VIRTIO_ACCESS_H #define _QEMU_VIRTIO_ACCESS_H -#ifdef TARGET_VIRTIO_SWAPENDIAN -/* Architectures which need biendian define this function. */ -extern bool virtio_swap_endian(void); - -extern bool virtio_byteswap; -#else -#define virtio_byteswap false -#endif - static inline uint16_t virtio_lduw_phys(hwaddr pa) { -if (virtio_byteswap) { +if (cpu_get_byteswap()) { return bswap16(lduw_phys(pa)); } return lduw_phys(pa); @@ -33,7 +24,7 @@ static inline uint16_t virtio_lduw_phys(hwaddr pa) static inline uint32_t virtio_ldl_phys(hwaddr pa) { -if (virtio_byteswap) { +if (cpu_get_byteswap()) { return bswap32(ldl_phys(pa)); } return ldl_phys(pa); @@ -41,7 +32,7 @@ static inline uint32_t virtio_ldl_phys(hwaddr pa) static inline uint64_t virtio_ldq_phys(hwaddr pa) { -if (virtio_byteswap) { +if (cpu_get_byteswap()) { return bswap64(ldq_phys(pa)); } return ldq_phys(pa); @@ -49,7 +40,7 @@ static inline uint64_t virtio_ldq_phys(hwaddr pa) static inline void virtio_stw_phys(hwaddr pa, uint16_t value) { -if (virtio_byteswap) { +if (cpu_get_byteswap()) { stw_phys(pa, bswap16(value)); } else { stw_phys(pa, value); @@ -58,7 +49,7 @@ static inline void virtio_stw_phys(hwaddr pa, uint16_t value) static inline void virtio_stl_phys(hwaddr pa, uint32_t value) { -if (virtio_byteswap) { +if (cpu_get_byteswap()) { stl_phys(pa, bswap32(value)); } else { stl_phys(pa, value); @@ -67,7 +58,7 @@ static inline void virtio_stl_phys(hwaddr pa, uint32_t value) static inline void virtio_stw_p(void *ptr, uint16_t v) { -if (virtio_byteswap) { +if (cpu_get_byteswap()) { stw_p(ptr, bswap16(v)); } else { stw_p(ptr, v); @@ -76,7 +67,7 @@ static inline void virtio_stw_p(void *ptr, uint16_t v) static inline void virtio_stl_p(void *ptr, uint32_t v) { -if (virtio_byteswap) { +if (cpu_get_byteswap()) { stl_p(ptr, bswap32(v)); } else { stl_p(ptr, v); @@ -85,7 +76,7 @@ static inline void virtio_stl_p(void *ptr, uint32_t v) static inline void virtio_stq_p(void *ptr, uint64_t v) { -if (virtio_byteswap) { +if (cpu_get_byteswap()) { stq_p(ptr, bswap64(v)); } else { stq_p(ptr, v); @@ -94,7 +85,7 @@ static inline void virtio_stq_p(void *ptr, uint64_t v) static inline int virtio_lduw_p(const void *ptr) { -
Re: [Qemu-devel] [PATCH 1/7] virtio: allow byte swapping for vring and config access
Anthony Liguori writes: > "Daniel P. Berrange" writes: > >> On Thu, Aug 08, 2013 at 10:40:28AM -0500, Anthony Liguori wrote: >>> Andreas Färber writes: >>> >> We have a mechanism to do weak functions via stubs/. I think it would >>> >> be better to do cpu_get_byteswap() as a stub function and then overload >>> >> it in the ppc64 code. >>> > >>> > If this as your name indicates is a per-CPU function then it should go >>> > into CPUClass. Interesting question is, what is virtio supposed to do if >>> > we have two ppc CPUs, one is Big Endian, the other is Little Endian. >>> >>> PPC64 is big endian. AFAIK, there is no such thing as a little endian >>> PPC64 processor. >> >> Unless I'm misunderstanding, this thread seems to suggest otherwise: >> >> "[Qemu-devel] [PATCH 0/5] 64bit PowerPC little endian support" >> >> https://lists.nongnu.org/archive/html/qemu-devel/2013-08/msg00813.html > > Yeah, it's confusing. It feels like little endian to most software but > the distinction in hardware (and therefore QEMU) is important. > > It's the same processor. It still starts executing big endian > instructions. A magic register value is tweaked and loads/stores are > swapped. CPU data structures are still read as big endian though. It's > really just load/stores that are affected. > > The distinction is important in QEMU. ppc64 is still > TARGET_WORDS_BIGENDIAN. We still want most stl_phys to treat integers > as big endian. There's just this extra concept that CPU loads/stores > are sometimes byte swapped. That affects virtio but not a lot else. You've redefined endian here; please don't do that. Endian is the order in memory which a CPU does loads and stores. From any reasonable definition, PPC is bi-endian. It's actually a weird thing for the qemu core to know at all: almost everything which cares is in target-specific code. The exceptions are gdb stubs and virtio, both of which are "native endian" (and that weird code in exec.c: what is notdirty_mem_write?). Your argument that we shouldn't fix stl_* might be justifiable (ie. just hack virtio and gdb as one-offs), but it's neither clear nor "least surprise". Chers, Rusty.
Re: [Qemu-devel] [PATCH 1/7] virtio: allow byte swapping for vring and config access
Andreas Färber writes: > Am 08.08.2013 15:31, schrieb Anthony Liguori: >> Rusty Russell writes: >> >>> Virtio is currently defined to work as "guest endian", but this is a >>> problem if the guest can change endian. As most targets can't change >>> endian, we make it a per-target option to avoid pessimising. >>> >>> This is based on a simpler patch by Anthony Liguouri, which only handled >>> the vring accesses. We also need some drivers to access these helpers, >>> eg. for data which contains headers. >>> >>> Signed-off-by: Rusty Russell >>> --- >>> hw/virtio/virtio.c| 46 + >>> include/hw/virtio/virtio-access.h | 138 >>> ++ >>> 2 files changed, 170 insertions(+), 14 deletions(-) >>> create mode 100644 include/hw/virtio/virtio-access.h >>> >>> diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c >>> index 8176c14..2887f17 100644 >>> --- a/hw/virtio/virtio.c >>> +++ b/hw/virtio/virtio.c >>> @@ -18,6 +18,7 @@ >>> #include "hw/virtio/virtio.h" >>> #include "qemu/atomic.h" >>> #include "hw/virtio/virtio-bus.h" >>> +#include "hw/virtio/virtio-access.h" >>> >>> /* The alignment to use between consumer and producer parts of vring. >>> * x86 pagesize again. */ >>> @@ -84,6 +85,20 @@ struct VirtQueue >>> EventNotifier host_notifier; >>> }; >>> >>> +#ifdef TARGET_VIRTIO_SWAPENDIAN >>> +bool virtio_byteswap; >>> + >>> +/* Ask target code if we should swap endian for all vring and config >>> access. */ >>> +static void mark_endian(void) >>> +{ >>> +virtio_byteswap = virtio_swap_endian(); >>> +} >>> +#else >>> +static void mark_endian(void) >>> +{ >>> +} >>> +#endif >>> + >> >> It would be very good to avoid a target specific define here. We would >> like to move to only building a single copy of the virtio code. > > +1 > >> We have a mechanism to do weak functions via stubs/. I think it would >> be better to do cpu_get_byteswap() as a stub function and then overload >> it in the ppc64 code. > > If this as your name indicates is a per-CPU function then it should go > into CPUClass. Interesting question is, what is virtio supposed to do if > we have two ppc CPUs, one is Big Endian, the other is Little Endian. > We'd need to check current_cpu then, which for Xen is always NULL. This is why the check is performed on a random CPU when they first acknowledge the device. It's a hacky assumption, but that's why there's a proposal to nail virtio to LE for the 1.0 OASIS standard. You can't actually change endian of a virtio device in flight: it doesn't make sense since there's readable state there already. Cheers, Rusty.
[Qemu-devel] [PATCH 5/7] hw/block/virtio-blk: use virtio wrappers to access headers.
Note that st*_raw and ld*_raw are effectively replaced by st*_p and ld*_p. Signed-off-by: Rusty Russell --- hw/block/virtio-blk.c | 35 ++- 1 file changed, 18 insertions(+), 17 deletions(-) diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c index cf12469..9b897fa 100644 --- a/hw/block/virtio-blk.c +++ b/hw/block/virtio-blk.c @@ -25,6 +25,7 @@ # include #endif #include "hw/virtio/virtio-bus.h" +#include "hw/virtio/virtio-access.h" typedef struct VirtIOBlockReq { @@ -76,7 +77,7 @@ static void virtio_blk_rw_complete(void *opaque, int ret) trace_virtio_blk_rw_complete(req, ret); if (ret) { -bool is_read = !(ldl_p(&req->out->type) & VIRTIO_BLK_T_OUT); +bool is_read = !(virtio_ldl_p(&req->out->type) & VIRTIO_BLK_T_OUT); if (virtio_blk_handle_rw_error(req, -ret, is_read)) return; } @@ -223,12 +224,12 @@ static void virtio_blk_handle_scsi(VirtIOBlockReq *req) hdr.status = CHECK_CONDITION; } -stl_p(&req->scsi->errors, - hdr.status | (hdr.msg_status << 8) | - (hdr.host_status << 16) | (hdr.driver_status << 24)); -stl_p(&req->scsi->residual, hdr.resid); -stl_p(&req->scsi->sense_len, hdr.sb_len_wr); -stl_p(&req->scsi->data_len, hdr.dxfer_len); +virtio_stl_p(&req->scsi->errors, +hdr.status | (hdr.msg_status << 8) | +(hdr.host_status << 16) | (hdr.driver_status << 24)); +virtio_stl_p(&req->scsi->residual, hdr.resid); +virtio_stl_p(&req->scsi->sense_len, hdr.sb_len_wr); +virtio_stl_p(&req->scsi->data_len, hdr.dxfer_len); virtio_blk_req_complete(req, status); g_free(req); @@ -239,7 +240,7 @@ static void virtio_blk_handle_scsi(VirtIOBlockReq *req) fail: /* Just put anything nonzero so that the ioctl fails in the guest. */ -stl_p(&req->scsi->errors, 255); +virtio_stl_p(&req->scsi->errors, 255); virtio_blk_req_complete(req, status); g_free(req); } @@ -285,7 +286,7 @@ static void virtio_blk_handle_write(VirtIOBlockReq *req, MultiReqBuffer *mrb) BlockRequest *blkreq; uint64_t sector; -sector = ldq_p(&req->out->sector); +sector = virtio_ldq_p(&req->out->sector); bdrv_acct_start(req->dev->bs, &req->acct, req->qiov.size, BDRV_ACCT_WRITE); @@ -319,7 +320,7 @@ static void virtio_blk_handle_read(VirtIOBlockReq *req) { uint64_t sector; -sector = ldq_p(&req->out->sector); +sector = virtio_ldq_p(&req->out->sector); bdrv_acct_start(req->dev->bs, &req->acct, req->qiov.size, BDRV_ACCT_READ); @@ -357,7 +358,7 @@ static void virtio_blk_handle_request(VirtIOBlockReq *req, req->out = (void *)req->elem.out_sg[0].iov_base; req->in = (void *)req->elem.in_sg[req->elem.in_num - 1].iov_base; -type = ldl_p(&req->out->type); +type = virtio_ldl_p(&req->out->type); if (type & VIRTIO_BLK_T_FLUSH) { virtio_blk_handle_flush(req, mrb); @@ -485,12 +486,12 @@ static void virtio_blk_update_config(VirtIODevice *vdev, uint8_t *config) bdrv_get_geometry(s->bs, &capacity); memset(&blkcfg, 0, sizeof(blkcfg)); -stq_raw(&blkcfg.capacity, capacity); -stl_raw(&blkcfg.seg_max, 128 - 2); -stw_raw(&blkcfg.cylinders, s->conf->cyls); -stl_raw(&blkcfg.blk_size, blk_size); -stw_raw(&blkcfg.min_io_size, s->conf->min_io_size / blk_size); -stw_raw(&blkcfg.opt_io_size, s->conf->opt_io_size / blk_size); +virtio_stq_p(&blkcfg.capacity, capacity); +virtio_stl_p(&blkcfg.seg_max, 128 - 2); +virtio_stw_p(&blkcfg.cylinders, s->conf->cyls); +virtio_stl_p(&blkcfg.blk_size, blk_size); +virtio_stw_p(&blkcfg.min_io_size, s->conf->min_io_size / blk_size); +virtio_stw_p(&blkcfg.opt_io_size, s->conf->opt_io_size / blk_size); blkcfg.heads = s->conf->heads; /* * We must ensure that the block device capacity is a multiple of -- 1.8.1.2
[Qemu-devel] [PATCH 1/7] virtio: allow byte swapping for vring and config access
Virtio is currently defined to work as "guest endian", but this is a problem if the guest can change endian. As most targets can't change endian, we make it a per-target option to avoid pessimising. This is based on a simpler patch by Anthony Liguouri, which only handled the vring accesses. We also need some drivers to access these helpers, eg. for data which contains headers. Signed-off-by: Rusty Russell --- hw/virtio/virtio.c| 46 + include/hw/virtio/virtio-access.h | 138 ++ 2 files changed, 170 insertions(+), 14 deletions(-) create mode 100644 include/hw/virtio/virtio-access.h diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c index 8176c14..2887f17 100644 --- a/hw/virtio/virtio.c +++ b/hw/virtio/virtio.c @@ -18,6 +18,7 @@ #include "hw/virtio/virtio.h" #include "qemu/atomic.h" #include "hw/virtio/virtio-bus.h" +#include "hw/virtio/virtio-access.h" /* The alignment to use between consumer and producer parts of vring. * x86 pagesize again. */ @@ -84,6 +85,20 @@ struct VirtQueue EventNotifier host_notifier; }; +#ifdef TARGET_VIRTIO_SWAPENDIAN +bool virtio_byteswap; + +/* Ask target code if we should swap endian for all vring and config access. */ +static void mark_endian(void) +{ +virtio_byteswap = virtio_swap_endian(); +} +#else +static void mark_endian(void) +{ +} +#endif + /* virt queue functions */ static void virtqueue_init(VirtQueue *vq) { @@ -100,49 +115,49 @@ static inline uint64_t vring_desc_addr(hwaddr desc_pa, int i) { hwaddr pa; pa = desc_pa + sizeof(VRingDesc) * i + offsetof(VRingDesc, addr); -return ldq_phys(pa); +return virtio_ldq_phys(pa); } static inline uint32_t vring_desc_len(hwaddr desc_pa, int i) { hwaddr pa; pa = desc_pa + sizeof(VRingDesc) * i + offsetof(VRingDesc, len); -return ldl_phys(pa); +return virtio_ldl_phys(pa); } static inline uint16_t vring_desc_flags(hwaddr desc_pa, int i) { hwaddr pa; pa = desc_pa + sizeof(VRingDesc) * i + offsetof(VRingDesc, flags); -return lduw_phys(pa); +return virtio_lduw_phys(pa); } static inline uint16_t vring_desc_next(hwaddr desc_pa, int i) { hwaddr pa; pa = desc_pa + sizeof(VRingDesc) * i + offsetof(VRingDesc, next); -return lduw_phys(pa); +return virtio_lduw_phys(pa); } static inline uint16_t vring_avail_flags(VirtQueue *vq) { hwaddr pa; pa = vq->vring.avail + offsetof(VRingAvail, flags); -return lduw_phys(pa); +return virtio_lduw_phys(pa); } static inline uint16_t vring_avail_idx(VirtQueue *vq) { hwaddr pa; pa = vq->vring.avail + offsetof(VRingAvail, idx); -return lduw_phys(pa); +return virtio_lduw_phys(pa); } static inline uint16_t vring_avail_ring(VirtQueue *vq, int i) { hwaddr pa; pa = vq->vring.avail + offsetof(VRingAvail, ring[i]); -return lduw_phys(pa); +return virtio_lduw_phys(pa); } static inline uint16_t vring_used_event(VirtQueue *vq) @@ -154,42 +169,42 @@ static inline void vring_used_ring_id(VirtQueue *vq, int i, uint32_t val) { hwaddr pa; pa = vq->vring.used + offsetof(VRingUsed, ring[i].id); -stl_phys(pa, val); +virtio_stl_phys(pa, val); } static inline void vring_used_ring_len(VirtQueue *vq, int i, uint32_t val) { hwaddr pa; pa = vq->vring.used + offsetof(VRingUsed, ring[i].len); -stl_phys(pa, val); +virtio_stl_phys(pa, val); } static uint16_t vring_used_idx(VirtQueue *vq) { hwaddr pa; pa = vq->vring.used + offsetof(VRingUsed, idx); -return lduw_phys(pa); +return virtio_lduw_phys(pa); } static inline void vring_used_idx_set(VirtQueue *vq, uint16_t val) { hwaddr pa; pa = vq->vring.used + offsetof(VRingUsed, idx); -stw_phys(pa, val); +virtio_stw_phys(pa, val); } static inline void vring_used_flags_set_bit(VirtQueue *vq, int mask) { hwaddr pa; pa = vq->vring.used + offsetof(VRingUsed, flags); -stw_phys(pa, lduw_phys(pa) | mask); +virtio_stw_phys(pa, virtio_lduw_phys(pa) | mask); } static inline void vring_used_flags_unset_bit(VirtQueue *vq, int mask) { hwaddr pa; pa = vq->vring.used + offsetof(VRingUsed, flags); -stw_phys(pa, lduw_phys(pa) & ~mask); +virtio_stw_phys(pa, virtio_lduw_phys(pa) & ~mask); } static inline void vring_avail_event(VirtQueue *vq, uint16_t val) @@ -199,7 +214,7 @@ static inline void vring_avail_event(VirtQueue *vq, uint16_t val) return; } pa = vq->vring.used + offsetof(VRingUsed, ring[vq->vring.num]); -stw_phys(pa, val); +virtio_stw_phys(pa, val); } void virtio_queue_set_notification(VirtQueue *vq, int enable) @@ -525,6 +540,9 @@ void virtio_set_status(VirtIODevice *vdev, uint8_t val) VirtioDeviceClass *k = VIRTIO_DEVICE_GET_CLASS(vdev); trace_virtio_set_status(vdev, val)
[Qemu-devel] [PATCH 7/7] patch virtio-serial-biendian.patch
--- hw/char/virtio-serial-bus.c | 34 ++ 1 file changed, 18 insertions(+), 16 deletions(-) diff --git a/hw/char/virtio-serial-bus.c b/hw/char/virtio-serial-bus.c index cc3d1dd..0421725 100644 --- a/hw/char/virtio-serial-bus.c +++ b/hw/char/virtio-serial-bus.c @@ -24,6 +24,7 @@ #include "hw/sysbus.h" #include "trace.h" #include "hw/virtio/virtio-serial.h" +#include "hw/virtio/virtio-access.h" static VirtIOSerialPort *find_port_by_id(VirtIOSerial *vser, uint32_t id) { @@ -185,9 +186,9 @@ static size_t send_control_event(VirtIOSerial *vser, uint32_t port_id, { struct virtio_console_control cpkt; -stl_p(&cpkt.id, port_id); -stw_p(&cpkt.event, event); -stw_p(&cpkt.value, value); +virtio_stl_p(&cpkt.id, port_id); +virtio_stw_p(&cpkt.event, event); +virtio_stw_p(&cpkt.value, value); trace_virtio_serial_send_control_event(port_id, event, value); return send_control_msg(vser, &cpkt, sizeof(cpkt)); @@ -291,8 +292,8 @@ static void handle_control_message(VirtIOSerial *vser, void *buf, size_t len) return; } -cpkt.event = lduw_p(&gcpkt->event); -cpkt.value = lduw_p(&gcpkt->value); +cpkt.event = virtio_lduw_p(&gcpkt->event); +cpkt.value = virtio_lduw_p(&gcpkt->value); trace_virtio_serial_handle_control_message(cpkt.event, cpkt.value); @@ -312,10 +313,10 @@ static void handle_control_message(VirtIOSerial *vser, void *buf, size_t len) return; } -port = find_port_by_id(vser, ldl_p(&gcpkt->id)); +port = find_port_by_id(vser, virtio_ldl_p(&gcpkt->id)); if (!port) { error_report("virtio-serial-bus: Unexpected port id %u for device %s", - ldl_p(&gcpkt->id), vser->bus.qbus.name); + virtio_ldl_p(&gcpkt->id), vser->bus.qbus.name); return; } @@ -342,9 +343,9 @@ static void handle_control_message(VirtIOSerial *vser, void *buf, size_t len) } if (port->name) { -stl_p(&cpkt.id, port->id); -stw_p(&cpkt.event, VIRTIO_CONSOLE_PORT_NAME); -stw_p(&cpkt.value, 1); +virtio_stl_p(&cpkt.id, port->id); +virtio_stw_p(&cpkt.event, VIRTIO_CONSOLE_PORT_NAME); +virtio_stw_p(&cpkt.value, 1); buffer_len = sizeof(cpkt) + strlen(port->name) + 1; buffer = g_malloc(buffer_len); @@ -536,7 +537,7 @@ static void virtio_serial_save(QEMUFile *f, void *opaque) qemu_put_be32s(f, &s->config.max_nr_ports); /* The ports map */ -max_nr_ports = tswap32(s->config.max_nr_ports); +max_nr_ports = virtio_tswap32(s->config.max_nr_ports); for (i = 0; i < (max_nr_ports + 31) / 32; i++) { qemu_put_be32s(f, &s->ports_map[i]); } @@ -690,8 +691,8 @@ static int virtio_serial_load(QEMUFile *f, void *opaque, int version_id) qemu_get_be16s(f, &s->config.rows); qemu_get_be32s(f, &max_nr_ports); -tswap32s(&max_nr_ports); -if (max_nr_ports > tswap32(s->config.max_nr_ports)) { +virtio_tswap32s(&max_nr_ports); +if (max_nr_ports > virtio_tswap32(s->config.max_nr_ports)) { /* Source could have had more ports than us. Fail migration. */ return -EINVAL; } @@ -760,7 +761,7 @@ static uint32_t find_free_port_id(VirtIOSerial *vser) { unsigned int i, max_nr_ports; -max_nr_ports = tswap32(vser->config.max_nr_ports); +max_nr_ports = virtio_tswap32(vser->config.max_nr_ports); for (i = 0; i < (max_nr_ports + 31) / 32; i++) { uint32_t map, bit; @@ -846,7 +847,7 @@ static int virtser_port_qdev_init(DeviceState *qdev) } } -max_nr_ports = tswap32(port->vser->config.max_nr_ports); +max_nr_ports = virtio_tswap32(port->vser->config.max_nr_ports); if (port->id >= max_nr_ports) { error_report("virtio-serial-bus: Out-of-range port id specified, max. allowed: %u", max_nr_ports - 1); @@ -946,7 +947,8 @@ static int virtio_serial_device_init(VirtIODevice *vdev) vser->ovqs[i] = virtio_add_queue(vdev, 128, handle_output); } -vser->config.max_nr_ports = tswap32(vser->serial.max_virtserial_ports); +vser->config.max_nr_ports = + virtio_tswap32(vser->serial.max_virtserial_ports); vser->ports_map = g_malloc0(((vser->serial.max_virtserial_ports + 31) / 32) * sizeof(vser->ports_map[0])); /* -- 1.8.1.2
[Qemu-devel] [PATCH 4/7] hw/net/virtio-balloon: use virtio wrappers to access page frame numbers.
Signed-off-by: Rusty Russell --- hw/virtio/virtio-balloon.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c index d669756..c0b4f6e 100644 --- a/hw/virtio/virtio-balloon.c +++ b/hw/virtio/virtio-balloon.c @@ -30,6 +30,7 @@ #endif #include "hw/virtio/virtio-bus.h" +#include "hw/virtio/virtio-access.h" static void balloon_page(void *addr, int deflate) { @@ -192,7 +193,7 @@ static void virtio_balloon_handle_output(VirtIODevice *vdev, VirtQueue *vq) ram_addr_t pa; ram_addr_t addr; -pa = (ram_addr_t)ldl_p(&pfn) << VIRTIO_BALLOON_PFN_SHIFT; +pa = (ram_addr_t)virtio_ldl_p(&pfn) << VIRTIO_BALLOON_PFN_SHIFT; offset += 4; /* FIXME: remove get_system_memory(), but how? */ -- 1.8.1.2
[Qemu-devel] [PATCH 2/7] target-ppc: ppc64 targets can be either endian.
Signed-off-by: Rusty Russell --- configure| 1 + target-ppc/misc_helper.c | 8 2 files changed, 9 insertions(+) diff --git a/configure b/configure index ad32f87..cee32af 100755 --- a/configure +++ b/configure @@ -4217,6 +4217,7 @@ case "$target_name" in ppc64) TARGET_BASE_ARCH=ppc TARGET_ABI_DIR=ppc +echo "TARGET_VIRTIO_SWAPENDIAN=y" >> $config_target_mak gdb_xml_files="power64-core.xml power-fpu.xml power-altivec.xml power-spe.xml" ;; ppc64abi32) diff --git a/target-ppc/misc_helper.c b/target-ppc/misc_helper.c index 616aab6..b031586 100644 --- a/target-ppc/misc_helper.c +++ b/target-ppc/misc_helper.c @@ -20,6 +20,7 @@ #include "helper.h" #include "helper_regs.h" +#include "hw/virtio/virtio-access.h" /*/ /* SPR accesses */ @@ -116,3 +117,10 @@ void ppc_store_msr(CPUPPCState *env, target_ulong value) { hreg_store_msr(env, value, 0); } + +/* Our virtio accesses are LE if the first CPU is LE when they touch + * it. We assume endian doesn't change after that! */ +bool virtio_swap_endian(void) +{ +return first_cpu->hflags & (1 << MSR_LE); +} -- 1.8.1.2
[Qemu-devel] [PATCH 6/7] hw/scsi/virtio-scsi: use virtio wrappers to access headers.
Note that st*_raw and ld*_raw are effectively replaced by st*_p and ld*_p. Signed-off-by: Rusty Russell --- hw/scsi/virtio-scsi.c | 33 + 1 file changed, 17 insertions(+), 16 deletions(-) diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c index 08dd3f3..c417087 100644 --- a/hw/scsi/virtio-scsi.c +++ b/hw/scsi/virtio-scsi.c @@ -18,6 +18,7 @@ #include #include #include +#include "hw/virtio/virtio-access.h" typedef struct VirtIOSCSIReq { VirtIOSCSI *dev; @@ -307,12 +308,12 @@ static void virtio_scsi_command_complete(SCSIRequest *r, uint32_t status, req->resp.cmd->response = VIRTIO_SCSI_S_OK; req->resp.cmd->status = status; if (req->resp.cmd->status == GOOD) { -req->resp.cmd->resid = tswap32(resid); +req->resp.cmd->resid = virtio_tswap32(resid); } else { req->resp.cmd->resid = 0; sense_len = scsi_req_get_sense(r, req->resp.cmd->sense, VIRTIO_SCSI_SENSE_SIZE); -req->resp.cmd->sense_len = tswap32(sense_len); +req->resp.cmd->sense_len = virtio_tswap32(sense_len); } virtio_scsi_complete_req(req); } @@ -408,16 +409,16 @@ static void virtio_scsi_get_config(VirtIODevice *vdev, VirtIOSCSIConfig *scsiconf = (VirtIOSCSIConfig *)config; VirtIOSCSICommon *s = VIRTIO_SCSI_COMMON(vdev); -stl_raw(&scsiconf->num_queues, s->conf.num_queues); -stl_raw(&scsiconf->seg_max, 128 - 2); -stl_raw(&scsiconf->max_sectors, s->conf.max_sectors); -stl_raw(&scsiconf->cmd_per_lun, s->conf.cmd_per_lun); -stl_raw(&scsiconf->event_info_size, sizeof(VirtIOSCSIEvent)); -stl_raw(&scsiconf->sense_size, s->sense_size); -stl_raw(&scsiconf->cdb_size, s->cdb_size); -stw_raw(&scsiconf->max_channel, VIRTIO_SCSI_MAX_CHANNEL); -stw_raw(&scsiconf->max_target, VIRTIO_SCSI_MAX_TARGET); -stl_raw(&scsiconf->max_lun, VIRTIO_SCSI_MAX_LUN); +virtio_stl_p(&scsiconf->num_queues, s->conf.num_queues); +virtio_stl_p(&scsiconf->seg_max, 128 - 2); +virtio_stl_p(&scsiconf->max_sectors, s->conf.max_sectors); +virtio_stl_p(&scsiconf->cmd_per_lun, s->conf.cmd_per_lun); +virtio_stl_p(&scsiconf->event_info_size, sizeof(VirtIOSCSIEvent)); +virtio_stl_p(&scsiconf->sense_size, s->sense_size); +virtio_stl_p(&scsiconf->cdb_size, s->cdb_size); +virtio_stw_p(&scsiconf->max_channel, VIRTIO_SCSI_MAX_CHANNEL); +virtio_stw_p(&scsiconf->max_target, VIRTIO_SCSI_MAX_TARGET); +virtio_stl_p(&scsiconf->max_lun, VIRTIO_SCSI_MAX_LUN); } static void virtio_scsi_set_config(VirtIODevice *vdev, @@ -426,14 +427,14 @@ static void virtio_scsi_set_config(VirtIODevice *vdev, VirtIOSCSIConfig *scsiconf = (VirtIOSCSIConfig *)config; VirtIOSCSICommon *vs = VIRTIO_SCSI_COMMON(vdev); -if ((uint32_t) ldl_raw(&scsiconf->sense_size) >= 65536 || -(uint32_t) ldl_raw(&scsiconf->cdb_size) >= 256) { +if ((uint32_t) virtio_ldl_p(&scsiconf->sense_size) >= 65536 || +(uint32_t) virtio_ldl_p(&scsiconf->cdb_size) >= 256) { error_report("bad data written to virtio-scsi configuration space"); exit(1); } -vs->sense_size = ldl_raw(&scsiconf->sense_size); -vs->cdb_size = ldl_raw(&scsiconf->cdb_size); +vs->sense_size = virtio_ldl_p(&scsiconf->sense_size); +vs->cdb_size = virtio_ldl_p(&scsiconf->cdb_size); } static uint32_t virtio_scsi_get_features(VirtIODevice *vdev, -- 1.8.1.2
[Qemu-devel] [PATCH 0/7] Virtio support for endian-curious guests.
Virtio is currently defined as "guest-endian", but that's a slippery concept when the target can change endian. In particular, virtio devices fail on little-endian powerpc 64. Feedback welcome! Rusty. Rusty Russell (7): virtio: allow byte swapping for vring and config access target-ppc: ppc64 targets can be either endian. hw/net/virtio-net: use virtio wrappers to access headers. hw/net/virtio-balloon: use virtio wrappers to access page frame numbers. hw/block/virtio-blk: use virtio wrappers to access headers. hw/scsi/virtio-scsi: use virtio wrappers to access headers. hw/char/virtio-serial-bus: use virtio wrappers to access headers. configure | 1 + hw/block/virtio-blk.c | 35 +- hw/char/virtio-serial-bus.c | 34 +- hw/net/virtio-net.c | 15 +++-- hw/scsi/virtio-scsi.c | 33 - hw/virtio/virtio-balloon.c| 3 +- hw/virtio/virtio.c| 46 + include/hw/virtio/virtio-access.h | 138 ++ target-ppc/misc_helper.c | 8 +++ 9 files changed, 242 insertions(+), 71 deletions(-) create mode 100644 include/hw/virtio/virtio-access.h -- 1.8.1.2
[Qemu-devel] [PATCH 7/7] hw/char/virtio-serial-bus: use virtio wrappers to access headers.
Signed-off-by: Rusty Russell --- hw/char/virtio-serial-bus.c | 34 ++ 1 file changed, 18 insertions(+), 16 deletions(-) diff --git a/hw/char/virtio-serial-bus.c b/hw/char/virtio-serial-bus.c index cc3d1dd..0421725 100644 --- a/hw/char/virtio-serial-bus.c +++ b/hw/char/virtio-serial-bus.c @@ -24,6 +24,7 @@ #include "hw/sysbus.h" #include "trace.h" #include "hw/virtio/virtio-serial.h" +#include "hw/virtio/virtio-access.h" static VirtIOSerialPort *find_port_by_id(VirtIOSerial *vser, uint32_t id) { @@ -185,9 +186,9 @@ static size_t send_control_event(VirtIOSerial *vser, uint32_t port_id, { struct virtio_console_control cpkt; -stl_p(&cpkt.id, port_id); -stw_p(&cpkt.event, event); -stw_p(&cpkt.value, value); +virtio_stl_p(&cpkt.id, port_id); +virtio_stw_p(&cpkt.event, event); +virtio_stw_p(&cpkt.value, value); trace_virtio_serial_send_control_event(port_id, event, value); return send_control_msg(vser, &cpkt, sizeof(cpkt)); @@ -291,8 +292,8 @@ static void handle_control_message(VirtIOSerial *vser, void *buf, size_t len) return; } -cpkt.event = lduw_p(&gcpkt->event); -cpkt.value = lduw_p(&gcpkt->value); +cpkt.event = virtio_lduw_p(&gcpkt->event); +cpkt.value = virtio_lduw_p(&gcpkt->value); trace_virtio_serial_handle_control_message(cpkt.event, cpkt.value); @@ -312,10 +313,10 @@ static void handle_control_message(VirtIOSerial *vser, void *buf, size_t len) return; } -port = find_port_by_id(vser, ldl_p(&gcpkt->id)); +port = find_port_by_id(vser, virtio_ldl_p(&gcpkt->id)); if (!port) { error_report("virtio-serial-bus: Unexpected port id %u for device %s", - ldl_p(&gcpkt->id), vser->bus.qbus.name); + virtio_ldl_p(&gcpkt->id), vser->bus.qbus.name); return; } @@ -342,9 +343,9 @@ static void handle_control_message(VirtIOSerial *vser, void *buf, size_t len) } if (port->name) { -stl_p(&cpkt.id, port->id); -stw_p(&cpkt.event, VIRTIO_CONSOLE_PORT_NAME); -stw_p(&cpkt.value, 1); +virtio_stl_p(&cpkt.id, port->id); +virtio_stw_p(&cpkt.event, VIRTIO_CONSOLE_PORT_NAME); +virtio_stw_p(&cpkt.value, 1); buffer_len = sizeof(cpkt) + strlen(port->name) + 1; buffer = g_malloc(buffer_len); @@ -536,7 +537,7 @@ static void virtio_serial_save(QEMUFile *f, void *opaque) qemu_put_be32s(f, &s->config.max_nr_ports); /* The ports map */ -max_nr_ports = tswap32(s->config.max_nr_ports); +max_nr_ports = virtio_tswap32(s->config.max_nr_ports); for (i = 0; i < (max_nr_ports + 31) / 32; i++) { qemu_put_be32s(f, &s->ports_map[i]); } @@ -690,8 +691,8 @@ static int virtio_serial_load(QEMUFile *f, void *opaque, int version_id) qemu_get_be16s(f, &s->config.rows); qemu_get_be32s(f, &max_nr_ports); -tswap32s(&max_nr_ports); -if (max_nr_ports > tswap32(s->config.max_nr_ports)) { +virtio_tswap32s(&max_nr_ports); +if (max_nr_ports > virtio_tswap32(s->config.max_nr_ports)) { /* Source could have had more ports than us. Fail migration. */ return -EINVAL; } @@ -760,7 +761,7 @@ static uint32_t find_free_port_id(VirtIOSerial *vser) { unsigned int i, max_nr_ports; -max_nr_ports = tswap32(vser->config.max_nr_ports); +max_nr_ports = virtio_tswap32(vser->config.max_nr_ports); for (i = 0; i < (max_nr_ports + 31) / 32; i++) { uint32_t map, bit; @@ -846,7 +847,7 @@ static int virtser_port_qdev_init(DeviceState *qdev) } } -max_nr_ports = tswap32(port->vser->config.max_nr_ports); +max_nr_ports = virtio_tswap32(port->vser->config.max_nr_ports); if (port->id >= max_nr_ports) { error_report("virtio-serial-bus: Out-of-range port id specified, max. allowed: %u", max_nr_ports - 1); @@ -946,7 +947,8 @@ static int virtio_serial_device_init(VirtIODevice *vdev) vser->ovqs[i] = virtio_add_queue(vdev, 128, handle_output); } -vser->config.max_nr_ports = tswap32(vser->serial.max_virtserial_ports); +vser->config.max_nr_ports = + virtio_tswap32(vser->serial.max_virtserial_ports); vser->ports_map = g_malloc0(((vser->serial.max_virtserial_ports + 31) / 32) * sizeof(vser->ports_map[0])); /* -- 1.8.1.2
[Qemu-devel] [PATCH 3/7] hw/net/virtio-net: use virtio wrappers to access headers.
Signed-off-by: Rusty Russell --- hw/net/virtio-net.c | 15 --- 1 file changed, 8 insertions(+), 7 deletions(-) diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c index 1ea9556..e77e28d 100644 --- a/hw/net/virtio-net.c +++ b/hw/net/virtio-net.c @@ -21,6 +21,7 @@ #include "hw/virtio/virtio-net.h" #include "net/vhost_net.h" #include "hw/virtio/virtio-bus.h" +#include "hw/virtio/virtio-access.h" #define VIRTIO_NET_VM_VERSION11 @@ -70,8 +71,8 @@ static void virtio_net_get_config(VirtIODevice *vdev, uint8_t *config) VirtIONet *n = VIRTIO_NET(vdev); struct virtio_net_config netcfg; -stw_p(&netcfg.status, n->status); -stw_p(&netcfg.max_virtqueue_pairs, n->max_queues); +virtio_stw_p(&netcfg.status, n->status); +virtio_stw_p(&netcfg.max_virtqueue_pairs, n->max_queues); memcpy(netcfg.mac, n->mac, ETH_ALEN); memcpy(config, &netcfg, n->config_size); } @@ -510,7 +511,7 @@ static int virtio_net_handle_mac(VirtIONet *n, uint8_t cmd, s = iov_to_buf(iov, iov_cnt, 0, &mac_data.entries, sizeof(mac_data.entries)); -mac_data.entries = ldl_p(&mac_data.entries); +mac_data.entries = virtio_ldl_p(&mac_data.entries); if (s != sizeof(mac_data.entries)) { return VIRTIO_NET_ERR; } @@ -537,7 +538,7 @@ static int virtio_net_handle_mac(VirtIONet *n, uint8_t cmd, s = iov_to_buf(iov, iov_cnt, 0, &mac_data.entries, sizeof(mac_data.entries)); -mac_data.entries = ldl_p(&mac_data.entries); +mac_data.entries = virtio_ldl_p(&mac_data.entries); if (s != sizeof(mac_data.entries)) { return VIRTIO_NET_ERR; } @@ -569,7 +570,7 @@ static int virtio_net_handle_vlan_table(VirtIONet *n, uint8_t cmd, size_t s; s = iov_to_buf(iov, iov_cnt, 0, &vid, sizeof(vid)); -vid = lduw_p(&vid); +vid = virtio_lduw_p(&vid); if (s != sizeof(vid)) { return VIRTIO_NET_ERR; } @@ -604,7 +605,7 @@ static int virtio_net_handle_mq(VirtIONet *n, uint8_t cmd, return VIRTIO_NET_ERR; } -queues = lduw_p(&mq.virtqueue_pairs); +queues = virtio_lduw_p(&mq.virtqueue_pairs); if (queues < VIRTIO_NET_CTRL_MQ_VQ_PAIRS_MIN || queues > VIRTIO_NET_CTRL_MQ_VQ_PAIRS_MAX || @@ -903,7 +904,7 @@ static ssize_t virtio_net_receive(NetClientState *nc, const uint8_t *buf, size_t } if (mhdr_cnt) { -stw_p(&mhdr.num_buffers, i); +virtio_stw_p(&mhdr.num_buffers, i); iov_from_buf(mhdr_sg, mhdr_cnt, 0, &mhdr.num_buffers, sizeof mhdr.num_buffers); -- 1.8.1.2
Re: [Qemu-devel] vhost acceleration broken?
Anthony Liguori writes: > "Michael S. Tsirkin" writes: > >> On Thu, Jul 25, 2013 at 04:56:05PM +0200, Andreas Färber wrote: >>> Am 25.07.2013 16:52, schrieb Michael S. Tsirkin: >>> > On Thu, Jul 25, 2013 at 08:28:00AM -0500, Anthony Liguori wrote: >>> >> We have a pretty awful legacy command line set that comes from years of >>> >> half-baked concepts and the years when too many people just committed >>> >> random shit to the tree. >>> >> >>> >> I think we probably need to start planning for a clean break. Maybe >>> >> that's a good target for a 2.0 version... >>> > >>> > Assuming -netdev supports all required configurations, we should >>> > remove -net from the -help output. >>> >>> Peter had raised the issue of -netdev not working well with boards that >>> already supply a default NIC - was there a solution yet? >>> >>> Andreas >> >> Whoever is removing -net will have to code that up. Want to do this? > > I would not rush to remove things. If we're going to go through a > deprecation process, we should start with a proposal on what things > should be removed and go from there. > > I still don't even think -netdev is the right answer here either. > Wouldn't it make more sense to have something like: > > qemu -vnic tap,script=/foo/myscript > > Or something vaguely understandable by a human? OK. It seems to me that each net device has a host side and a guest side, which you can mix and match. So the commandline should reflect that explicitly: qemu -hostdev net,[tap|user|bridge|socket|vde], -guestdev net, If you have a built-in net device on your emulated board, you've got one implied -guestdev net. And similar principles apply to other things like consoles and disks. Now, guest and host terms may suck. But please pick one terminology and use it *everywhere*. Documentation, code and cmdline. Thanks, Rusty.
Re: [Qemu-devel] vhost acceleration broken?
Anthony Liguori writes: > On Wed, Jul 24, 2013 at 8:55 PM, Rusty Russell wrote: >> Hi all, >> >> Using latest kernel and master qemu, the following doesn't use >> vhost acceleration: >> >> sudo qemu-system-x86_64 -machine pc,accel=kvm $ARGS -m 1024 -net >> tap,script=/home/rusty/bin/kvm-ifup,downscript=no,vhost=on -net >> nic,model=virtio -drive file=$QEMUIMAGE,index=0,media=disk,if=virtio -kernel >> arch/x86/boot/bzImage -append "root=/dev/vda1 $KARGS $*" > > sudo qemu-system-x86_64 -enable-kvm $ARGS -m 1G -netdev > tap,script=/home/rusty/bin/kvm-ifup,vhost=on,id=net0 -device > virtio-net-pci,netdev=net0 -drive file=$QEMUIMAGE,if=virtio -kernel > arch/x86/boot/bzImage -append "root=/dev/vda1 $KARGS $*" > > We really ought to strongly deprecate -net because it's misleading. I > suspect we can reasonably add a warning for model=virtio saying > "please don't use this" and eventually remove it entirely. Thankyou, that works. I'm sure you've thought more deeply about the qemu cmdline than I have, so I won't comment. Cheers, Rusty.
[Qemu-devel] vhost acceleration broken?
Hi all, Using latest kernel and master qemu, the following doesn't use vhost acceleration: sudo qemu-system-x86_64 -machine pc,accel=kvm $ARGS -m 1024 -net tap,script=/home/rusty/bin/kvm-ifup,downscript=no,vhost=on -net nic,model=virtio -drive file=$QEMUIMAGE,index=0,media=disk,if=virtio -kernel arch/x86/boot/bzImage -append "root=/dev/vda1 $KARGS $*" Culprit is here: hw/net/virtio-net.c:virtio_net_vhost_status(): if (nc->peer->info->type != NET_CLIENT_OPTIONS_KIND_TAP) { return; } info->type is NET_CLIENT_OPTIONS_KIND_HUBPORT. At a glance, it seems like vlan is always enabled, and that means a hub, so that change silently disabled vhost acceleration. It's quite possible that I've screwed up qemu's impenetrable command line (-net or -netdev, who knows what's better?). Frustrated, Rusty.
Re: [Qemu-devel] [PATCH] virtio-net: put virtio net header inline with data
"Michael S. Tsirkin" writes: > On Tue, Jul 09, 2013 at 11:46:23AM +0930, Rusty Russell wrote: >> "Michael S. Tsirkin" writes: >> > For small packets we can simplify xmit processing >> > by linearizing buffers with the header: >> > most packets seem to have enough head room >> > we can use for this purpose. >> > Since existing hypervisors require that header >> > is the first s/g element, we need a feature bit >> > for this. >> > >> > Signed-off-by: Michael S. Tsirkin >> > --- >> > >> > Note: this needs to be applied on top of patch >> > defining VIRTIO_F_ANY_LAYOUT - bit to be >> > selected by Rusty. >> > >> > The following patch should work for any definition of >> > VIRTIO_F_ANY_LAYOUT - I used bit 31 for testing. >> > Rusty, could you please pick a valid bit for VIRTIO_F_ANY_LAYOUT >> > and squeeze this patch into 3.11? >> >> Sorry, it's too late. >> >> First, I've been a bit lax in sending patches via DaveM, and this is >> definitely his territory (ie. more net than virtio). >> >> Secondly, it needs baking and testing time. >> >> Cheers, >> Rusty. > > Okay. But we can already commit the qemu change, right? > Also, can you submit the spec update please? The spec has been updated, BTW. Cheers, Rusty.
Re: [Qemu-devel] virtio indirect with lots of descriptors
Dave Airlie writes: > Hi Rusty, > > playing with my virtio gpu, I started hitting the qemu > error_report("Too many read descriptors in indirect table"); > > Now I'm not sure but this doesn't seem to be a virtio limit that the > guest catches from what I can see, since my host dies quite quickly, > when I'm doing transfers in/out of a 5MB object with an sg entry per > page. > > Just wondering if you can confirm if this is only a qemu limitation or > if I should just work around it at a bit of a higher level in my > driver/device? You're not allowed to place more descriptors in a single request than there are elements in the ring *even* if you use indirects, which are seen as an optimization (thus you can always fall back to direct descriptors if OOM). We could change this rule in the 1.0 spec if required, or even make a special rule for your device, but for the moment that's how it is. Hope that helps, Rusty.
Re: [Qemu-devel] [PATCH] virtio-net: put virtio net header inline with data
"Michael S. Tsirkin" writes: > For small packets we can simplify xmit processing by linearizing buffers > with the header: most packets seem to have enough head room we can use > for this purpose. > > Since some older hypervisors (e.g. qemu before version 1.5) > required that header is the first s/g element, > we need a feature bit for this. OK, we know this is horrible. But I will sleep better knowing that we this feature need never make it into a final 1.0 spec, since it can be assumed at that point... > pr_debug("%s: xmit %p %pM\n", vi->dev->name, skb, dest); > + if (vi->mergeable_rx_bufs) > + hdr_len = sizeof hdr->mhdr; > + else > + hdr_len = sizeof hdr->hdr; > + > + can_push = vi->any_header_sg && > + !((unsigned long)skb->data & (__alignof__(*hdr) - 1)) && > + !skb_header_cloned(skb) && skb_headroom(skb) >= hdr_len; Idle thought: how often does this fail? Would it suck if we copied headers which didn't let us prepend data? Or could we bump dev->hard_header_len appropriately? Thanks, Rusty.
Re: [Qemu-devel] [PATCH RFC] virtio-pci: support config layout in BAR1
"Michael S. Tsirkin" writes: > Some setups don't support enabling BAR0 (IO BAR). Reasons range from CPU > limitations (e.g. on some powerpc setups) to architecture limmitations > (e.g. a setup with >15 PCI bridges, with one virtio device behind each, > on x86). > > PCI Express spec made IO optional, so future guests will disable IO for > a device in more and more configurations. > > This patch makes it possible for host to mirror the config in BAR1, such > that these setups can work properly. > > Guests with old drivers can't be fixed, they will continue to work as > well (or as bad) as they did previously. For this reason, changing > revision id appears unnecessary - it would break setups that previously > worked, partially. > > Future work on re-organizing layout won't conflict with this patch - it > can use a different BAR or put config at an offset, or update revision. > > Signed-off-by: Michael S. Tsirkin Since QEMU won't support this, I think we'll have to wait for the PCI capabilities, ie. the virtio spec 1.0. Cheers, Rusty.
Re: [Qemu-devel] updated: kvm networking todo wiki
Anthony Liguori writes: > "Michael S. Tsirkin" writes: > >> On Thu, May 30, 2013 at 08:40:47AM -0500, Anthony Liguori wrote: >>> Stefan Hajnoczi writes: >>> >>> > On Thu, May 30, 2013 at 7:23 AM, Rusty Russell >>> > wrote: >>> >> Anthony Liguori writes: >>> >>> Rusty Russell writes: >>> >>>> On Fri, May 24, 2013 at 08:47:58AM -0500, Anthony Liguori wrote: >>> >>>>> FWIW, I think what's more interesting is using vhost-net as a >>> >>>>> networking >>> >>>>> backend with virtio-net in QEMU being what's guest facing. >>> >>>>> >>> >>>>> In theory, this gives you the best of both worlds: QEMU acts as a >>> >>>>> first >>> >>>>> line of defense against a malicious guest while still getting the >>> >>>>> performance advantages of vhost-net (zero-copy). >>> >>>>> >>> >>>> It would be an interesting idea if we didn't already have the vhost >>> >>>> model where we don't need the userspace bounce. >>> >>> >>> >>> The model is very interesting for QEMU because then we can use vhost as >>> >>> a backend for other types of network adapters (like vmxnet3 or even >>> >>> e1000). >>> >>> >>> >>> It also helps for things like fault tolerance where we need to be able >>> >>> to control packet flow within QEMU. >>> >> >>> >> (CC's reduced, context added, Dmitry Fleytman added for vmxnet3 >>> >> thoughts). >>> >> >>> >> Then I'm really confused as to what this would look like. A zero copy >>> >> sendmsg? We should be able to implement that today. >>> >> >>> >> On the receive side, what can we do better than readv? If we need to >>> >> return to userspace to tell the guest that we've got a new packet, we >>> >> don't win on latency. We might reduce syscall overhead with a >>> >> multi-dimensional readv to read multiple packets at once? >>> > >>> > Sounds like recvmmsg(2). >>> >>> Could we map this to mergable rx buffers though? >>> >>> Regards, >>> >>> Anthony Liguori >> >> Yes because we don't have to complete buffers in order. > > What I meant though was for GRO, we don't know how large the received > packet is going to be. Mergable rx buffers lets us allocate a pool of > data for all incoming packets instead of allocating max packet size * > max packets. > > recvmmsg expects an array of msghdrs and I presume each needs to be > given a fixed size. So this seems incompatible with mergable rx > buffers. Good point. You'd need to build 64k buffers to pass to recvmmsg, then reuse the parts it didn't touch on the next call. This limits us to about a 16th of what we could do with an interface which understood buffer merging, but I don't know how much that would matter in practice. We'd need some benchmarks Cheers, Rusty.
Re: [Qemu-devel] updated: kvm networking todo wiki
Stefan Hajnoczi writes: > On Thu, May 30, 2013 at 7:23 AM, Rusty Russell wrote: >> On the receive side, what can we do better than readv? If we need to >> return to userspace to tell the guest that we've got a new packet, we >> don't win on latency. We might reduce syscall overhead with a >> multi-dimensional readv to read multiple packets at once? > > Sounds like recvmmsg(2). Wow... the future is here, today! Thanks, Rusty.
Re: [Qemu-devel] updated: kvm networking todo wiki
Anthony Liguori writes: > Rusty Russell writes: >> On Fri, May 24, 2013 at 08:47:58AM -0500, Anthony Liguori wrote: >>> FWIW, I think what's more interesting is using vhost-net as a networking >>> backend with virtio-net in QEMU being what's guest facing. >>> >>> In theory, this gives you the best of both worlds: QEMU acts as a first >>> line of defense against a malicious guest while still getting the >>> performance advantages of vhost-net (zero-copy). >>> >> It would be an interesting idea if we didn't already have the vhost >> model where we don't need the userspace bounce. > > The model is very interesting for QEMU because then we can use vhost as > a backend for other types of network adapters (like vmxnet3 or even > e1000). > > It also helps for things like fault tolerance where we need to be able > to control packet flow within QEMU. (CC's reduced, context added, Dmitry Fleytman added for vmxnet3 thoughts). Then I'm really confused as to what this would look like. A zero copy sendmsg? We should be able to implement that today. On the receive side, what can we do better than readv? If we need to return to userspace to tell the guest that we've got a new packet, we don't win on latency. We might reduce syscall overhead with a multi-dimensional readv to read multiple packets at once? Confused, Rusty.
Re: [Qemu-devel] [PATCH v2 00/11] qemu: use virtio linux headers in portable code
Anthony Liguori writes: > The headers say they are BSD licensed... but they include a GPLv2+ > header. Doesn't make a lot of sense, does it? It makes perfect sense: you're overthinking it. It just means that copying the BSD headers outside Linux is encouraged. And it's clearly nonsensical to claim the GPL on kernel headers means that userspace needs to be GPL. So please ignore this licensing red-herring. And we'll bikeshed the headers in the standard when we have to :) They certainly don't need to be cut&paste into the kernel sources. Cheers, Rusty.
Re: [Qemu-devel] [PATCH v2 00/11] qemu: use virtio linux headers in portable code
Anthony Liguori writes: > Rusty Russell writes: > >> Anthony Liguori writes: >>> Paolo Bonzini writes: >>> >>>> Il 26/05/2013 22:02, Michael S. Tsirkin ha scritto: >>>>> > My fault. I should have looked at linux/types.h (actually >>>>> > asm-generic/). >>>>> >>>>> Not really, __uX appear in the headers that were posted. >>> >>> Which is a problem because this is a reserved namespace in C99. >> >> Personally, I find it hard to care. What matters is not what the >> standard has carved out, but whether we have clashes, reserved namespace >> or no. And that won't happen for these. >> >> If someone wants to convert all the kernel headers, I won't NAK it >> though. > > virtio headers are special. Linux headers are normally only consumed in > the kernel or in a userspace application running on Linux. > > virtio headers may be used either in a userspace application running on > !Linux (we need to support QEMU on Windows) or even in a foreign kernel. No. s/virtio/SCSI/. s/virtio/if_eth/. s/virtio/TCPIP/. Cheers, Rusty.
Re: [Qemu-devel] updated: kvm networking todo wiki
"Michael S. Tsirkin" writes: > On Fri, May 24, 2013 at 08:47:58AM -0500, Anthony Liguori wrote: >> "Michael S. Tsirkin" writes: >> >> > On Fri, May 24, 2013 at 05:41:11PM +0800, Jason Wang wrote: >> >> On 05/23/2013 04:50 PM, Michael S. Tsirkin wrote: >> >> > Hey guys, >> >> > I've updated the kvm networking todo wiki with current projects. >> >> > Will try to keep it up to date more often. >> >> > Original announcement below. >> >> >> >> Thanks a lot. I've added the tasks I'm currently working on to the wiki. >> >> >> >> btw. I notice the virtio-net data plane were missed in the wiki. Is the >> >> project still being considered? >> > >> > It might have been interesting several years ago, but now that linux has >> > vhost-net in kernel, the only point seems to be to >> > speed up networking on non-linux hosts. >> >> Data plane just means having a dedicated thread for virtqueue processing >> that doesn't hold qemu_mutex. >> >> Of course we're going to do this in QEMU. It's a no brainer. But not >> as a separate device, just as an improvement to the existing userspace >> virtio-net. >> >> > Since non-linux does not have kvm, I doubt virtio is a bottleneck. >> >> FWIW, I think what's more interesting is using vhost-net as a networking >> backend with virtio-net in QEMU being what's guest facing. >> >> In theory, this gives you the best of both worlds: QEMU acts as a first >> line of defense against a malicious guest while still getting the >> performance advantages of vhost-net (zero-copy). > > Great idea, that sounds very intresting. > > I'll add it to the wiki. > > In fact a bit of complexity in vhost was put there in the vague hope to > support something like this: virtio rings are not translated through > regular memory tables, instead, vhost gets a pointer to ring address. > > This allows qemu acting as a man in the middle, > verifying the descriptors but not touching the > > Anyone interested in working on such a project? It would be an interesting idea if we didn't already have the vhost model where we don't need the userspace bounce. We already have two sets of host side ring code in the kernel (vhost and vringh, though they're being unified). All an accelerator can offer on the tx side is zero copy and direct update of the used ring. On rx userspace could register the buffers and the accelerator could fill them and update the used ring. It still needs to deal with merged buffers, for example. You avoid the address translation in the kernel, but I'm not convinced that's a key problem. Cheers, Rusty.
Re: [Qemu-devel] [PATCH v2 00/11] qemu: use virtio linux headers in portable code
Anthony Liguori writes: > Paolo Bonzini writes: > >> Il 26/05/2013 22:02, Michael S. Tsirkin ha scritto: >>> > My fault. I should have looked at linux/types.h (actually asm-generic/). >>> >>> Not really, __uX appear in the headers that were posted. > > Which is a problem because this is a reserved namespace in C99. Personally, I find it hard to care. What matters is not what the standard has carved out, but whether we have clashes, reserved namespace or no. And that won't happen for these. If someone wants to convert all the kernel headers, I won't NAK it though. > Perhaps it's even worth moving the headers from uapi/linux to > uapi/virtio. Rusty, what do you think? Hmm, #include etc would be worthwhile if that also worked on FreeBSD. Bryan CC'd... Cheers, Rusty.
Re: [Qemu-devel] [PATCH 1/2 V4] virtio-spec: dynamic network offloads configuration
"Michael S. Tsirkin" writes: > On Mon, May 20, 2013 at 03:43:51PM +0200, Paolo Bonzini wrote: >> Is there a story behind skipping virtio-net feature bits 2..4? >> >> Paolo > > Bits 3-4 now :) > I'm curious too. Not a good one :) The year is 2007. virtio_net was the posterchild of free expression and crazy experimentation. We were all so very young... The original host TSO bits in virtio_net were like so: #define VIRTIO_NET_F_TSO4 1 #define VIRTIO_NET_F_UFO 2 #define VIRTIO_NET_F_TSO4_ECN 3 #define VIRTIO_NET_F_TSO6 4 But I decided we might as well offer all or nothing, and if you couldn't handle one, you could just do it in software: #define VIRTIO_NET_F_GSO 6 Reading the git commits, it seems Linux didn't do UFO in software (at the time, at least), so the bits were split out again, with the explicit _HOST_ in the names: #define VIRTIO_NET_F_HOST_TSO4 11 /* Host can handle TSOv4 in. */ #define VIRTIO_NET_F_HOST_TSO6 12 /* Host can handle TSOv6 in. */ #define VIRTIO_NET_F_HOST_ECN 13 /* Host can handle TSO[6] w/ ECN in. */ #define VIRTIO_NET_F_HOST_UFO 14 /* Host can handle UFO in. */ This was all before there was a host implementation, so we were playing. ie. the bits are perfectly fine for reuse. Cheers, Rusty.
Re: [Qemu-devel] [PATCH 0/1 V4] virtio-net: dynamic network offloads configuration
Dmitry Fleytman writes: > Spec patch already inside. > > Sent from my iPad > > On Apr 20, 2013, at 8:04 PM, "Michael S. Tsirkin" wrote: > >> On Fri, Apr 19, 2013 at 10:10:01AM +0300, Dmitry Fleytman wrote: >>> Hello All, >>> >>> Any news regarding this patch? >>> >>> Thanks, >>> Dmitry >> >> Rusty could you comment on the spec change soon please? >> If you pick it up I think we can include the feature in QEMU 1.5. Indeed, spec patch already done. I'm happy... Thanks, Rusty.
Re: [Qemu-devel] [PATCH 1/2 V3] virtio-spec: dynamic network offloads configuration
Dmitry Fleytman writes: > From: Dmitry Fleytman > > Virtio-net driver currently negotiates network offloads > on startup via features mechanism and have no ability to > change offloads state later. > This patch introduced a new control command that allows > to configure device network offloads state dynamically. > The patch also introduces a new feature flag > VIRTIO_NET_F_CTRL_GUEST_OFFLOADS. > > Signed-off-by: Dmitry Fleytman (BTW, I like to be CC'd on these things directly, so I don't miss them) The idea is fine. But I dislike the duplication of constants: let's just use the feature bits directly: #define VIRTIO_NET_F_GUEST_CSUM 1 /* Guest handles pkts w/ partial csum */ #define VIRTIO_NET_F_GUEST_TSO4 7 /* Guest can handle TSOv4 in. */ #define VIRTIO_NET_F_GUEST_TSO6 8 /* Guest can handle TSOv6 in. */ #define VIRTIO_NET_F_GUEST_ECN 9 /* Guest can handle TSO[6] w/ ECN in. */ #define VIRTIO_NET_F_GUEST_UFO 10 /* Guest can handle UFO in. */ You want this, because you have to test against them anyway before trying to re-enable them. And secondly, it'll be much clearer if you don't say "change" but "disable and re-enable", which is what's actually allowed. Thanks, Rusty.
Re: [Qemu-devel] [RFC qemu PATCH] only writing out the last byte of MAC makes it have effect
"Michael S. Tsirkin" writes: > On Thu, Mar 21, 2013 at 02:44:50PM +0800, Amos Kong wrote: >> The lengcy guests don't have mac programming command, we don't know when >> it's safe to use MAC. This patch changed qemu to makes MAC change effect >> when the last byte of MAC is written to config space. >> >> MAC address takes first 6 bytes of config space of virtio-net, the addr >> is 5 when the last byte is written in virtio_config_writeb(). >> >> MAC change will effect when n->mac is updated in virtio_net_set_config(). >> >> Signed-off-by: Amos Kong > > Let's see what Rusty says about the spec change. Implementation notes like this belong as a footnote, eg: For older systems, it is recommended and typical that the device write byte 5 of the mac address last, so devices can use that as a trigger to commit the mac address change. Now, is this a real, or theoretical issue? Have we seen this problem in practice, or should we continue to ignore it? Cheers, Rusty.
Re: [Qemu-devel] virtio-s390: document GPR4/GPR2 cookie values
Cornelia Huck writes: > On Thu, 7 Mar 2013 20:02:21 +0200 > "Michael S. Tsirkin" wrote: > >> virtio-s390 on kvm can use a cookie value passed to guest > > s/virtio-s390/virtio-ccw/ (to avoid confusion with s390-virtio, which > was never specced) > >> to optimize channel/VQ lookups. >> Document this. >> >> Signed-off-by: Michael S. Tsirkin > Otherwise: > Reviewed-by: Cornelia Huck > > Rusty, could you please apply? Modified and committed. Thanks! Rusty.
Re: [Qemu-devel] Headsup: windows virtio networking does not work on current git
"Michael S. Tsirkin" writes: > Maybe we should ask for some centrally assigned vendor id? > We could ask IANA to keep the database. Or we could make it the task of the virtio spec. I don't think it's vital... Cheers, Rusty.
Re: [Qemu-devel] Headsup: windows virtio networking does not work on current git
Vadim Rozenfeld writes: > On Tue, 2013-02-05 at 13:58 +0200, Michael S. Tsirkin wrote: >> On Tue, Feb 05, 2013 at 03:45:38PM +0400, Michael Tokarev wrote: >> Is it really >> > that bad that the config space size changed? Why it has this effect? > Because in this case it's hard to distinguish between resource's > corruption and HW update. But it's also true that if we'd incremented revid you'd have the same failure in this case, right? Cheers, Rusty.
Re: [Qemu-devel] Headsup: windows virtio networking does not work on current git
Anthony Liguori writes: > Rusty Russell writes: >> If I could find a way, I'd like to create some code as an appendix to >> the virtio spec which would torture test each driver and/or device by >> configuring it in strange ways. But that's pure speculation at this >> point. > > It wouldn't be so hard to add a torture parameter to the virtio > implementation in QEMU that would do silly things that were still within > the bounds of the specification. Larger config sizes, advertisement of > unknown feature bits, etc. My thought is that alongside the spec we provide two libraries for each device class: a device-side and a driver-side. Each one gets fed an integer (which controls which craziness it does) and you test against each variant. For testing qemu, we could either put sew this test code into Linux, or hack it into qemu and pretend it was guest-driven (handwave). >> In practice, we'd want to formalize it into a string and a version, and >> hope the version gets bumped appropriately. Because it'll actually be >> used for bug detection. > > Sure. > >> But AFAICT that would be useless in this case. We really want to know about >> the guest before we even start it. > > We can't just prepare to fight yesterday's wars :-) s/just/even/. > We'd want to know the string before we expose host features. That's > easy. Absolutely. But we'd need a virtio2-like spec change, which insisted that the driver write this version number somewhere before inspecting any other field. Or? > How we expose config space in virtio2 is a separate discussion. If we > take a list approach like you've talked about before, it would be much > harder for drivers to assume anything about the BAR size for config > since the size would always be different. I think it's the same discussion. Strings are hard: how about a 16 bit implementation id (don't clash with others) and a 16 bit version number (increment as driver changes). Should we also loosen revid to be a version number for the device? It's only 8 bits though, so perhaps new config fields are better. Cheers, Rusty.
Re: [Qemu-devel] Headsup: windows virtio networking does not work on current git
Anthony Liguori writes: > Michael Tokarev writes: > >> 03.02.2013 17:23, Yan Vugenfirer wrote: >> If it helps, mq changes the config size from 8 to 16 bytes. If the driver was making an assumption about an 8-byte config size, that's likely what the problem is. >>> That's exactly the problem. ... > N.B. this is a pretty nasty bug in the guest driver. Any new virtio-net > feature is going to trigger it (not just multiqueue). So while pc-1.3 > will work forever with this guest image, it's pretty much guaranteed to > break at some point in the future. Wow, yes. I never expected such a bug. I probably don't even want to know how this happened, right? If I could find a way, I'd like to create some code as an appendix to the virtio spec which would torture test each driver and/or device by configuring it in strange ways. But that's pure speculation at this point. >> It's easy to turn off mq by default and turn it on when >> needed... >> >> The problem now is that it isn't obvious why your existing >> setup breaks when you upgrade. While when you especially >> play with mq, you may look at options available around it... > > If we ever to get virtio2, a really handy feature to have would be a > driver identification string. Even if was only informative (and not > authoritative, we've had a lot of issues like this and being able to > make work arounds conditional on the driver identification string would > be nice. In practice, we'd want to formalize it into a string and a version, and hope the version gets bumped appropriately. Because it'll actually be used for bug detection. But AFAICT that would be useless in this case. We really want to know about the guest before we even start it. Cheers, Rusty.
Re: [Qemu-devel] [QEMU PATCH v2] virtio-net: introduce a new macaddr control
ak...@redhat.com writes: > @@ -349,6 +351,14 @@ static int virtio_net_handle_mac(VirtIONet *n, uint8_t > cmd, > { > struct virtio_net_ctrl_mac mac_data; > > +if (cmd == VIRTIO_NET_CTRL_MAC_ADDR_SET && elem->out_num == 2 && > +elem->out_sg[1].iov_len == ETH_ALEN) { > +/* Set MAC address */ > +memcpy(n->mac, elem->out_sg[1].iov_base, elem->out_sg[1].iov_len); > +qemu_format_nic_info_str(&n->nic->nc, n->mac); > +return VIRTIO_NET_OK; > +} Does the rest of the net device still rely on the layout of descriptors? If so, OK, we'll fix them all together. If not, this introduces a new one. Cheers, Rusty.
Re: [Qemu-devel] [RFC PATCH 2/2] virtio-net: introduce a new control to set macaddr
"Michael S. Tsirkin" writes: >> +if (virtio_has_feature(vdev, VIRTIO_NET_F_MAC)) { >> +/* Set MAC address by writing config space */ >> vdev->config->set(vdev, offsetof(struct virtio_net_config, mac), >>dev->dev_addr, dev->addr_len); >> +} >> > > By the way, why don't we fail the command if VIRTIO_NET_F_MAC is off? > Rusty? Looked back through the history for this one. I think the theory is that if the guest doesn't set VIRTIO_NET_F_MAC, it means it doesn't care: it's up to the guest. Cheers, Rusty.
Re: [Qemu-devel] [RFC PATCH 0/2] make mac programming for virtio net more robust
"Michael S. Tsirkin" writes: > On Thu, Jan 10, 2013 at 10:45:39PM +0800, ak...@redhat.com wrote: >> From: Amos Kong >> >> Currenly mac is programmed byte by byte. This means that we >> have an intermediate step where mac is wrong. >> >> Second patch introduced a new vq control command to set mac >> address in one time. > > As you mention we could alternatively do it without > new commands, simply add a feature bit that says that MACs are > in the mac table. > This would be a much bigger patch, and I'm fine with either way. > Rusty what do you think? Hmm, mac filtering and "my mac address" are not quite the same thing. I don't know if it matters for anyone: does it? The mac address is abused for things like identifying machines, etc. If we keep it as a separate concept, Amos' patch seems to make sense. Cheers, Rusty.
Re: [Qemu-devel] [PATCH v6 12/12] virtio-blk: add x-data-plane=on|off performance feature
"Michael S. Tsirkin" writes: > On Tue, Dec 18, 2012 at 03:57:17PM +0100, Stefan Hajnoczi wrote: >> > > @@ -407,6 +409,14 @@ static void virtio_blk_handle_output(VirtIODevice >> > > *vdev, VirtQueue *vq) >> > > .num_writes = 0, >> > > }; >> > > >> > > +/* Some guests kick before setting VIRTIO_CONFIG_S_DRIVER_OK so >> > > start >> > > + * dataplane here instead of waiting for .set_status(). >> > > + */ >> > >> > By the way which guests are these? >> >> I ran a Windows 8 guest today with build 48 virtio-win drivers. It >> notifies before the device gets its .set_status() callback invoked. >> But I could swear I've seen Linux guests do this too. > > > That's very broken. But looking at linux drivers it also > seems linux guests do this even today. > We have: > >err = drv->probe(dev); > if (err) > add_status(dev, VIRTIO_CONFIG_S_FAILED); > else { > add_status(dev, VIRTIO_CONFIG_S_DRIVER_OK); > if (drv->scan) > drv->scan(dev); > } > > this means that unless drivers implement scan() they > will make device active before DRIVER_OK is written > as the result linux can access it and kick. > And almost no drivers implement scan. > Nasty. Yes, that's true. But as long as they have completed feature negotiation, we allow this (that's why we tool feature negotiation out of the drivers). For example, filling an input virtqueue may well mean we kick the vq, and almost every device does this. virtio_block is the worst: add_disk() does partition scanning. > Rusty, what do you think? Worth fixing? If we tried, I'm fairly sure things would slip through. My feeling has been that we should not rely on the status to indicate readiness. > It does mean that for now we are stuck > with a work around, but I think we need it > in virtio core in qemu, it's not dataplane > specific. Yes, it's a general problem. Cheers, Rusty.
Re: [Qemu-devel] [PATCHv2] virtio: verify that all outstanding buffers are flushed
Anthony Liguori writes: > Yes, take a look at the series I sent out that scrubs all of this to > just send the index and the addresses of the element. > > We technically should save the addresses and sizes too. It makes it a > heck of a lot safer then re-reading guest memory since we do some > validation on the size of the sg elements. Not really. The guest puts the descriptors in the ring and leaves them there until the device acks. If it changes them once they're exposed but before they're acked, it can get either before or after version, and always could. Cheers, Rusty.
Re: [Qemu-devel] [PATCH 3/4] virtio: modify savevm to have a stable wire format
Anthony Liguori writes: > Rusty Russell writes: > >> Anthony Liguori writes: >>> We were memcpy()'ing a structure to the wire :-/ Since savevm really >>> only works on x86 today, lets just declare that this element is sent >>> over the wire as a little endian value in order to fix the bitness. >>> >>> Unfortunately, we also send raw pointers and size_t which are going >>> to be different values on a 32-bit vs. 64-bit QEMU so we need to also >>> deal with that case. >>> >>> A lot of values that should have been previously ignored are now sent >>> as 0 and ignored on the receive side too. >> >> Don't we want to transition to vmstate anyway? Can we just do that, and >> relegate the existing slightly broken code, to legacy? > > What worries me is if someone changes VirtQueueElement, then all the > sudden migration breaks. By transitioning to what I've sent, we at > least have a better documented protocol that isn't prone to subtle > breakage anymore. Plus, we resolve the endian issue before it becomes a > bigger problem when David actually gets live migration working reliably > on PPC... My transition was to copy that structure to VirtQueueSavedElement, and it's only used for loading old versions. With the new code we only need the head from that structure. > I'm certainly in favor of cleaning up the savevm format and probably > leaving the existing load/save functions as-is for legacy purposes. > I'll leave that as an exercise for someone else though :-) What is the rule about new versions? Can we introduce a new save version at any time, or only at major qemu version changes? Thanks, Rusty.
Re: [Qemu-devel] [PATCH 3/4] virtio: modify savevm to have a stable wire format
Anthony Liguori writes: > We were memcpy()'ing a structure to the wire :-/ Since savevm really > only works on x86 today, lets just declare that this element is sent > over the wire as a little endian value in order to fix the bitness. > > Unfortunately, we also send raw pointers and size_t which are going > to be different values on a 32-bit vs. 64-bit QEMU so we need to also > deal with that case. > > A lot of values that should have been previously ignored are now sent > as 0 and ignored on the receive side too. Don't we want to transition to vmstate anyway? Can we just do that, and relegate the existing slightly broken code, to legacy? Cheers, Rusty.
Re: [Qemu-devel] [PATCH] virtio: verify that all outstanding buffers are flushed (was Re: vmstate conversion for virtio?)
Anthony Liguori writes: > Rusty Russell writes: > >> "Michael S. Tsirkin" writes: >> >> No, because I don't understand it. Is it true for the case of >> virtio_blk, which has outstanding requests? >> >>>> Currently we dump a massive structure; it's inelegant at the very >>>> least. > > Inelegant is a kind word.. > > There's a couple things to consider though which is why this code hasn't > changed so far. > > 1) We're writing native endian values to the wire. This is seriously >broken. Just imagine trying to migrate from qemu-system-i386 on an >big endian box to a little endian box. > > 2) Fixing (1) either means (a) breaking migration across the board >gracefully or (b) breaking migration on [big|little] endian hosts in >an extremely ungraceful way. > > 3) We send a ton of crap over the wire that is unnecessary, but we need >to maintain it. > > I wrote up a patch series to try to improve the situation that I'll send > out. I haven't gotten around to testing it with an older version of > QEMU yet. > > I went for 2.b and choose to break big endian hosts. Since we only actually want to save the descriptor head, I was planning on a new format version. That will fix both... Look forward to your patch, Rusty.
Re: [Qemu-devel] [PATCH] virtio: verify that all outstanding buffers are flushed (was Re: vmstate conversion for virtio?)
"Michael S. Tsirkin" writes: > On Thu, Dec 06, 2012 at 04:33:06PM +1030, Rusty Russell wrote: >> "Michael S. Tsirkin" writes: >> > Add sanity check to address the following concern: >> > >> > On Wed, Dec 05, 2012 at 09:47:22AM +1030, Rusty Russell wrote: >> >> All we need is the index of the request; the rest can be re-read from >> >> the ring. >> >> The terminology I used here was loose, indeed. >> >> We need the head of the chained descriptor, which we already read from >> the ring when we gathered the request. > > So ack that patch? No, because I don't understand it. Is it true for the case of virtio_blk, which has outstanding requests? >> Currently we dump a massive structure; it's inelegant at the very least. >> >> Cheers, >> Rusty. > > Hmm not sure what you refer to. I see this per ring: > > qemu_put_be32(f, vdev->vq[i].vring.num); > qemu_put_be64(f, vdev->vq[i].pa); > qemu_put_be16s(f, &vdev->vq[i].last_avail_idx); > > Looks like there's no way around savng these fields. Not what I'm referring to. See here: virtio.h defines a 48k structure: #define VIRTQUEUE_MAX_SIZE 1024 typedef struct VirtQueueElement { unsigned int index; unsigned int out_num; unsigned int in_num; hwaddr in_addr[VIRTQUEUE_MAX_SIZE]; hwaddr out_addr[VIRTQUEUE_MAX_SIZE]; struct iovec in_sg[VIRTQUEUE_MAX_SIZE]; struct iovec out_sg[VIRTQUEUE_MAX_SIZE]; } VirtQueueElement; virtio-blk.c uses it in its request struct: typedef struct VirtIOBlockReq { VirtIOBlock *dev; VirtQueueElement elem; struct virtio_blk_inhdr *in; struct virtio_blk_outhdr *out; struct virtio_scsi_inhdr *scsi; QEMUIOVector qiov; struct VirtIOBlockReq *next; BlockAcctCookie acct; } VirtIOBlockReq; ... and saves it in virtio_blk_save: static void virtio_blk_save(QEMUFile *f, void *opaque) { VirtIOBlock *s = opaque; VirtIOBlockReq *req = s->rq; virtio_save(&s->vdev, f); while (req) { qemu_put_sbyte(f, 1); qemu_put_buffer(f, (unsigned char*)&req->elem, sizeof(req->elem)); req = req->next; } qemu_put_sbyte(f, 0); } Cheers, Rusty.
Re: [Qemu-devel] [PATCH] virtio: verify that all outstanding buffers are flushed (was Re: vmstate conversion for virtio?)
"Michael S. Tsirkin" writes: > Add sanity check to address the following concern: > > On Wed, Dec 05, 2012 at 09:47:22AM +1030, Rusty Russell wrote: >> All we need is the index of the request; the rest can be re-read from >> the ring. The terminology I used here was loose, indeed. We need the head of the chained descriptor, which we already read from the ring when we gathered the request. Currently we dump a massive structure; it's inelegant at the very least. Cheers, Rusty.
Re: [Qemu-devel] vmstate conversion for virtio?
Juan Quintela writes: > Rusty Russell wrote: >> Hi all, >> >> I want to rework the qemu virtio subsystem, but various >> structures are currently blatted to disk in save/load. So I looked at >> altering that, only to discover that it needs conversion to vmstate, and >> 2009 patches in patchwork which have never been applied. >> >> Has there been any progress on this? A git tree I should be using? > > My trees are more than 1 year old, and unfinished (basic trouble is how > to express the lists of pending requests for block and now serial). I > haven't yet looked at virtio-scsi. That's the easy part though, and part of what I want to change. All we need is the index of the request; the rest can be re-read from the ring. So create the array in pre_save, and restore on post_load. Is there something more recent than http://repo.or.cz/w/qemu/quintela.git/shortlog/refs/heads/vmstate/virtio or should I cherry-pick from there? Thanks, Rusty.
[Qemu-devel] vmstate conversion for virtio?
Hi all, I want to rework the qemu virtio subsystem, but various structures are currently blatted to disk in save/load. So I looked at altering that, only to discover that it needs conversion to vmstate, and 2009 patches in patchwork which have never been applied. Has there been any progress on this? A git tree I should be using? Thanks, Rusty.