On Mon, Apr 14, 2014 at 03:08:23PM +0200, Alexander Graf wrote: > > On 14.04.14 14:55, Michael S. Tsirkin wrote: > >On Mon, Apr 14, 2014 at 02:40:04PM +0200, Alexander Graf wrote: > >>On 14.04.14 14:37, Michael S. Tsirkin wrote: > >>>On Mon, Apr 14, 2014 at 02:29:20PM +0200, Alexander Graf wrote: > >>>>On 14.04.14 14:24, Michael S. Tsirkin wrote: > >>>>>On Mon, Apr 14, 2014 at 02:16:03PM +0200, Alexander Graf wrote: > >>>>>>On 14.04.14 13:58, Greg Kurz wrote: > >>>>>>>From: Rusty Russell <ru...@rustcorp.com.au> > >>>>>>> > >>>>>>>virtio data structures are defined as "target endian", which assumes > >>>>>>>that's a fixed value. In fact, that actually means it's > >>>>>>>platform-specific. > >>>>>>>The OASIS virtio 1.0 spec will fix this, by making it all little > >>>>>>>endian. > >>>>>>> > >>>>>>>We introduce memory accessors to be used accross the virtio code where > >>>>>>>needed. These accessors should support both legacy and 1.0 devices. > >>>>>>>A good way to do it is to introduce a per-device property to store the > >>>>>>>endianness. We choose to set this flag at device reset time because it > >>>>>>>is reasonnable to assume the endianness won't change unless we reboot > >>>>>>>or > >>>>>>>kexec another kernel. And it is also reasonnable to assume the new > >>>>>>>kernel > >>>>>>>will reset the devices before using them (otherwise it will break). > >>>>>>> > >>>>>>>We reuse the virtio_is_big_endian() helper since it provides the right > >>>>>>>value for legacy devices with most of the targets, that have fixed > >>>>>>>endianness. It can then be overriden to support endian-ambivalent > >>>>>>>targets. > >>>>>>> > >>>>>>>To support migration, we need to set the flag in virtio_load() as well. > >>>>>>> > >>>>>>>(a) One solution would be to add it to the stream, but it have some > >>>>>>> drawbacks: > >>>>>>>- since this only affects a few targets, the field should be put into a > >>>>>>> subsection > >>>>>>>- virtio migration code should be ported to vmstate to be able to > >>>>>>>introduce > >>>>>>> such a subsection > >>>>>>> > >>>>>>>(b) If we assume the following to be true: > >>>>>>>- target endianness falls under some cpu state > >>>>>>>- cpu state is always restored before virtio devices state because they > >>>>>>> get initialized in this order in main(). > >>>>>>>Then an alternative is to rely on virtio_is_big_endian() again at > >>>>>>>load time. No need to mess around with the migration stream in this > >>>>>>>case. > >>>>>>> > >>>>>>>This patch implements (b). > >>>>>>> > >>>>>>>Note that the tswap helpers are implemented in virtio.c so that > >>>>>>>virtio-access.h stays platform independant. Most of the virtio code > >>>>>>>will be buildable under common-obj instead of obj then, and spare > >>>>>>>some cycles when building for multiple targets. > >>>>>>> > >>>>>>>Signed-off-by: Rusty Russell <ru...@rustcorp.com.au> > >>>>>>>[ ldq_phys() API change, > >>>>>>> relicensed virtio-access.h to GPLv2+ on Rusty's request, > >>>>>>> introduce a per-device is_big_endian flag (supersedes > >>>>>>> needs_byteswap global) > >>>>>>> add VirtIODevice * arg to virtio helpers, > >>>>>>> use the existing virtio_is_big_endian() helper, > >>>>>>> virtio-pci: use the device is_big_endian flag, > >>>>>>> introduce virtio tswap16 and tswap64 helpers, > >>>>>>> move calls to tswap* out of virtio-access.h to make it platform > >>>>>>> independant, > >>>>>>> migration support, > >>>>>>> Greg Kurz <gk...@linux.vnet.ibm.com> ] > >>>>>>>Cc: Cédric Le Goater <c...@fr.ibm.com> > >>>>>>>Signed-off-by: Greg Kurz <gk...@linux.vnet.ibm.com> > >>>>>>>--- > >>>>>>> > >>>>>>>Changes since v6: > >>>>>>>- merge the virtio_needs_byteswap() helper from v6 and existing > >>>>>>> virtio_is_big_endian() > >>>>>>>- virtio-pci: now supports endianness changes > >>>>>>>- virtio-access.h fixes (target independant) > >>>>>>> > >>>>>>> exec.c | 2 - > >>>>>>> hw/virtio/Makefile.objs | 2 - > >>>>>>> hw/virtio/virtio-pci.c | 11 +-- > >>>>>>> hw/virtio/virtio.c | 35 +++++++++ > >>>>>>> include/hw/virtio/virtio-access.h | 138 > >>>>>>> +++++++++++++++++++++++++++++++++++++ > >>>>>>> include/hw/virtio/virtio.h | 2 + > >>>>>>> vl.c | 4 + > >>>>>>> 7 files changed, 185 insertions(+), 9 deletions(-) > >>>>>>> create mode 100644 include/hw/virtio/virtio-access.h > >>>>>>> > >>>>>>>diff --git a/exec.c b/exec.c > >>>>>>>index 91513c6..e6777d0 100644 > >>>>>>>--- a/exec.c > >>>>>>>+++ b/exec.c > >>>>>>>@@ -42,6 +42,7 @@ > >>>>>>> #else /* !CONFIG_USER_ONLY */ > >>>>>>> #include "sysemu/xen-mapcache.h" > >>>>>>> #include "trace.h" > >>>>>>>+#include "hw/virtio/virtio.h" > >>>>>>> #endif > >>>>>>> #include "exec/cpu-all.h" > >>>>>>>@@ -2745,7 +2746,6 @@ int cpu_memory_rw_debug(CPUState *cpu, > >>>>>>>target_ulong addr, > >>>>>>> * A helper function for the _utterly broken_ virtio device model to > >>>>>>> find out if > >>>>>>> * it's running on a big endian machine. Don't do this at home kids! > >>>>>>> */ > >>>>>>>-bool virtio_is_big_endian(void); > >>>>>>> bool virtio_is_big_endian(void) > >>>>>>> { > >>>>>>> #if defined(TARGET_WORDS_BIGENDIAN) > >>>>>>>diff --git a/hw/virtio/Makefile.objs b/hw/virtio/Makefile.objs > >>>>>>>index 1ba53d9..68c3064 100644 > >>>>>>>--- a/hw/virtio/Makefile.objs > >>>>>>>+++ b/hw/virtio/Makefile.objs > >>>>>>>@@ -4,5 +4,5 @@ common-obj-y += virtio-bus.o > >>>>>>> common-obj-y += virtio-mmio.o > >>>>>>> common-obj-$(CONFIG_VIRTIO_BLK_DATA_PLANE) += dataplane/ > >>>>>>>-obj-y += virtio.o virtio-balloon.o > >>>>>>>+obj-y += virtio.o virtio-balloon.o > >>>>>>> obj-$(CONFIG_LINUX) += vhost.o > >>>>>>>diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c > >>>>>>>index ce97514..82a1689 100644 > >>>>>>>--- a/hw/virtio/virtio-pci.c > >>>>>>>+++ b/hw/virtio/virtio-pci.c > >>>>>>>@@ -89,9 +89,6 @@ > >>>>>>> /* Flags track per-device state like workarounds for quirks in older > >>>>>>> guests. */ > >>>>>>> #define VIRTIO_PCI_FLAG_BUS_MASTER_BUG (1 << 0) > >>>>>>>-/* HACK for virtio to determine if it's running a big endian guest */ > >>>>>>>-bool virtio_is_big_endian(void); > >>>>>>>- > >>>>>>> static void virtio_pci_bus_new(VirtioBusState *bus, size_t bus_size, > >>>>>>> VirtIOPCIProxy *dev); > >>>>>>>@@ -409,13 +406,13 @@ static uint64_t virtio_pci_config_read(void > >>>>>>>*opaque, hwaddr addr, > >>>>>>> break; > >>>>>>> case 2: > >>>>>>> val = virtio_config_readw(vdev, addr); > >>>>>>>- if (virtio_is_big_endian()) { > >>>>>>>+ if (vdev->is_big_endian) { > >>>>>>> val = bswap16(val); > >>>>>>> } > >>>>>>> break; > >>>>>>> case 4: > >>>>>>> val = virtio_config_readl(vdev, addr); > >>>>>>>- if (virtio_is_big_endian()) { > >>>>>>>+ if (vdev->is_big_endian) { > >>>>>>> val = bswap32(val); > >>>>>>> } > >>>>>>> break; > >>>>>>>@@ -443,13 +440,13 @@ static void virtio_pci_config_write(void > >>>>>>>*opaque, hwaddr addr, > >>>>>>> virtio_config_writeb(vdev, addr, val); > >>>>>>> break; > >>>>>>> case 2: > >>>>>>>- if (virtio_is_big_endian()) { > >>>>>>>+ if (vdev->is_big_endian) { > >>>>>>> val = bswap16(val); > >>>>>>> } > >>>>>>> virtio_config_writew(vdev, addr, val); > >>>>>>> break; > >>>>>>> case 4: > >>>>>>>- if (virtio_is_big_endian()) { > >>>>>>>+ if (vdev->is_big_endian) { > >>>>>>> val = bswap32(val); > >>>>>>> } > >>>>>>> virtio_config_writel(vdev, addr, val); > >>>>>>>diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c > >>>>>>>index aeabf3a..bb646f0 100644 > >>>>>>>--- a/hw/virtio/virtio.c > >>>>>>>+++ b/hw/virtio/virtio.c > >>>>>>>@@ -19,6 +19,7 @@ > >>>>>>> #include "hw/virtio/virtio.h" > >>>>>>> #include "qemu/atomic.h" > >>>>>>> #include "hw/virtio/virtio-bus.h" > >>>>>>>+#include "hw/virtio/virtio-access.h" > >>>>>>> /* > >>>>>>> * The alignment to use between consumer and producer parts of vring. > >>>>>>>@@ -546,6 +547,8 @@ void virtio_reset(void *opaque) > >>>>>>> virtio_set_status(vdev, 0); > >>>>>>>+ vdev->is_big_endian = virtio_is_big_endian(); > >>>>>>>+ > >>>>>>> if (k->reset) { > >>>>>>> k->reset(vdev); > >>>>>>> } > >>>>>>>@@ -897,6 +900,11 @@ int virtio_load(VirtIODevice *vdev, QEMUFile *f) > >>>>>>> BusState *qbus = qdev_get_parent_bus(DEVICE(vdev)); > >>>>>>> VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus); > >>>>>>>+ /* NOTE: we assume that endianness is a cpu state AND > >>>>>>>+ * cpu state is restored before virtio devices. > >>>>>>>+ */ > >>>>>>>+ vdev->is_big_endian = virtio_is_big_endian(); > >>>>>>>+ > >>>>>>> if (k->load_config) { > >>>>>>> ret = k->load_config(qbus->parent, f); > >>>>>>> if (ret) > >>>>>>>@@ -1153,6 +1161,33 @@ void > >>>>>>>virtio_device_set_child_bus_name(VirtIODevice *vdev, char *bus_name) > >>>>>>> } > >>>>>>> } > >>>>>>>+uint16_t virtio_tswap16(VirtIODevice *vdev, uint16_t s) > >>>>>>>+{ > >>>>>>>+ if (vdev->is_big_endian) { > >>>>>>>+ return tswap16(s); > >>>>>>>+ } else { > >>>>>>>+ return bswap16(tswap16(s)); > >>>>>>>+ } > >>>>>>This looks pretty bogus. When virtio wants to do "tswap" what it > >>>>>>means is "give me a host endianness value in virtio endianness". How > >>>>>>about something like > >>>>>> > >>>>>>#ifdef HOST_WORDS_BIGENDIAN > >>>>>> return vdev->is_big_endian ? s : bswap16(s); > >>>>>>#else > >>>>>> return vdev->is_big_endian ? bswap16(s) : s; > >>>>>>#endif > >>>>>> > >>>>>Actually why doesn't this call virtio_is_big_endian? > >>>>>As it is, we get extra branches even if target endian-ness > >>>>>is fixed. > >>>>Because virtio_is_big_endian() returns the default endianness, not > >>>>the runtime endianness of a virtio device. > >>In fact, we should probably rename it accordingly. > >> > >>>>>>That should work pretty well inline as well, so you don't need to > >>>>>>compile virtio.c as target dependent. > >>>>>> > >>>>>> > >>>>>>Alex > >>>>>Yes but we'll still need to build two variants: fixed endian and > >>>>>dynamic endian platforms. > >>>>>Something along the lines of 32/64 bit split that we have? > >>>>Why bother? Always make it dynamic and don't change the per-device > >>>>variable, no? I'd be surprised if the performance difference is > >>>>measurable. The bulk of the data we transfer gets copied raw anyway. > >>>> > >>>> > >>>>Alex > >>>This will have to be measured and proved by whoever's proposing the > >>>patch, not by reviewers. Platforms such as AMD which don't do > >>>prediction well would be especially interesting to test on. > >>Sure, Greg, can you do that? I'm sure Michael has test cases > >>available he can give you to measure performance on this. > >Measuring performance is hard ATM. > > > >Disk io stress when backed by raw ramdisk in host is usually the easiest > >way to stress userspace virtio. > >You need to make sure your baseline is repeateable though: > >pin down everything from cpu to hardware interrupts, > >disable power management, c states etc > >Just taking an average and hoping for the best doesn't cut it. > > > > > >We really should write a benchmark device, > >to measure just the transport overhead. > >For example, start with virtio-net but fuse TX and RX > >VQs together, so you'll get right back whatever you send out. > > > >So really, virtio is ATM target-specific and I think it's > >a good idea to get it working as such first, > >do extra cleanups like getting rid of target specific code > >separately. > > How does it help when we keep it target specific? The only way to > remove any overhead from this refactoring would be to instead of > > if (vdev->is_big_endian) > > write it as > > if (virtio_is_big_endian_device(vdev)) > > which we could define as > > static inline bool virtio_is_big_endian_device(VirtIODevice *vdev) > { > #if defined(TARGET_PPC) > return vdev->is_big_endian > #elif defined(TARGET_WORDS_BIGENDIAN) > return true; > #else > return false; > #endif > } > > If it makes you happy to do it this way first and move virtio to > target-independent files later, we can certainly do this. > > > Alex
Yes, that's enough to make me happy.