On Tue, Apr 15, 2014 at 03:22:56PM +0200, Greg Kurz wrote: > On Tue, 15 Apr 2014 13:35:03 +0200 > Alexander Graf <ag...@suse.de> wrote: > > > On 04/15/2014 10:40 AM, Greg Kurz wrote: > > > On Mon, 14 Apr 2014 15:08:23 +0200 > > > Alexander Graf <ag...@suse.de> wrote: > > > > > >> On 14.04.14 14:55, Michael S. Tsirkin wrote: > > >>> On Mon, Apr 14, 2014 at 02:40:04PM +0200, Alexander Graf wrote: > > >>>> On 14.04.14 14:37, Michael S. Tsirkin wrote: > > >>>>> On Mon, Apr 14, 2014 at 02:29:20PM +0200, Alexander Graf wrote: > > >>>>>> On 14.04.14 14:24, Michael S. Tsirkin wrote: > > >>>>>>> On Mon, Apr 14, 2014 at 02:16:03PM +0200, Alexander Graf wrote: > > >>>>>>>> On 14.04.14 13:58, Greg Kurz wrote: > > >>>>>>>>> From: Rusty Russell <ru...@rustcorp.com.au> > > >>>>>>>>> > > >>>>>>>>> virtio data structures are defined as "target endian", which > > >>>>>>>>> assumes > > >>>>>>>>> that's a fixed value. In fact, that actually means it's > > >>>>>>>>> platform-specific. > > >>>>>>>>> The OASIS virtio 1.0 spec will fix this, by making it all little > > >>>>>>>>> endian. > > >>>>>>>>> > > >>>>>>>>> We introduce memory accessors to be used accross the virtio code > > >>>>>>>>> where > > >>>>>>>>> needed. These accessors should support both legacy and 1.0 > > >>>>>>>>> devices. > > >>>>>>>>> A good way to do it is to introduce a per-device property to > > >>>>>>>>> store the > > >>>>>>>>> endianness. We choose to set this flag at device reset time > > >>>>>>>>> because it > > >>>>>>>>> is reasonnable to assume the endianness won't change unless we > > >>>>>>>>> reboot or > > >>>>>>>>> kexec another kernel. And it is also reasonnable to assume the > > >>>>>>>>> new kernel > > >>>>>>>>> will reset the devices before using them (otherwise it will > > >>>>>>>>> break). > > >>>>>>>>> > > >>>>>>>>> We reuse the virtio_is_big_endian() helper since it provides the > > >>>>>>>>> right > > >>>>>>>>> value for legacy devices with most of the targets, that have fixed > > >>>>>>>>> endianness. It can then be overriden to support endian-ambivalent > > >>>>>>>>> targets. > > >>>>>>>>> > > >>>>>>>>> To support migration, we need to set the flag in virtio_load() as > > >>>>>>>>> well. > > >>>>>>>>> > > >>>>>>>>> (a) One solution would be to add it to the stream, but it have > > >>>>>>>>> some > > >>>>>>>>> drawbacks: > > >>>>>>>>> - since this only affects a few targets, the field should be put > > >>>>>>>>> into a > > >>>>>>>>> subsection > > >>>>>>>>> - virtio migration code should be ported to vmstate to be able to > > >>>>>>>>> introduce > > >>>>>>>>> such a subsection > > >>>>>>>>> > > >>>>>>>>> (b) If we assume the following to be true: > > >>>>>>>>> - target endianness falls under some cpu state > > >>>>>>>>> - cpu state is always restored before virtio devices state > > >>>>>>>>> because they > > >>>>>>>>> get initialized in this order in main(). > > >>>>>>>>> Then an alternative is to rely on virtio_is_big_endian() again at > > >>>>>>>>> load time. No need to mess around with the migration stream in > > >>>>>>>>> this case. > > >>>>>>>>> > > >>>>>>>>> This patch implements (b). > > >>>>>>>>> > > >>>>>>>>> Note that the tswap helpers are implemented in virtio.c so that > > >>>>>>>>> virtio-access.h stays platform independant. Most of the virtio > > >>>>>>>>> code > > >>>>>>>>> will be buildable under common-obj instead of obj then, and spare > > >>>>>>>>> some cycles when building for multiple targets. > > >>>>>>>>> > > >>>>>>>>> Signed-off-by: Rusty Russell <ru...@rustcorp.com.au> > > >>>>>>>>> [ ldq_phys() API change, > > >>>>>>>>> relicensed virtio-access.h to GPLv2+ on Rusty's request, > > >>>>>>>>> introduce a per-device is_big_endian flag (supersedes > > >>>>>>>>> needs_byteswap global) > > >>>>>>>>> add VirtIODevice * arg to virtio helpers, > > >>>>>>>>> use the existing virtio_is_big_endian() helper, > > >>>>>>>>> virtio-pci: use the device is_big_endian flag, > > >>>>>>>>> introduce virtio tswap16 and tswap64 helpers, > > >>>>>>>>> move calls to tswap* out of virtio-access.h to make it > > >>>>>>>>> platform independant, > > >>>>>>>>> migration support, > > >>>>>>>>> Greg Kurz <gk...@linux.vnet.ibm.com> ] > > >>>>>>>>> Cc: Cédric Le Goater <c...@fr.ibm.com> > > >>>>>>>>> Signed-off-by: Greg Kurz <gk...@linux.vnet.ibm.com> > > >>>>>>>>> --- > > >>>>>>>>> > > >>>>>>>>> Changes since v6: > > >>>>>>>>> - merge the virtio_needs_byteswap() helper from v6 and existing > > >>>>>>>>> virtio_is_big_endian() > > >>>>>>>>> - virtio-pci: now supports endianness changes > > >>>>>>>>> - virtio-access.h fixes (target independant) > > >>>>>>>>> > > >>>>>>>>> exec.c | 2 - > > >>>>>>>>> hw/virtio/Makefile.objs | 2 - > > >>>>>>>>> hw/virtio/virtio-pci.c | 11 +-- > > >>>>>>>>> hw/virtio/virtio.c | 35 +++++++++ > > >>>>>>>>> include/hw/virtio/virtio-access.h | 138 > > >>>>>>>>> +++++++++++++++++++++++++++++++++++++ > > >>>>>>>>> include/hw/virtio/virtio.h | 2 + > > >>>>>>>>> vl.c | 4 + > > >>>>>>>>> 7 files changed, 185 insertions(+), 9 deletions(-) > > >>>>>>>>> create mode 100644 include/hw/virtio/virtio-access.h > > >>>>>>>>> > > >>>>>>>>> diff --git a/exec.c b/exec.c > > >>>>>>>>> index 91513c6..e6777d0 100644 > > >>>>>>>>> --- a/exec.c > > >>>>>>>>> +++ b/exec.c > > >>>>>>>>> @@ -42,6 +42,7 @@ > > >>>>>>>>> #else /* !CONFIG_USER_ONLY */ > > >>>>>>>>> #include "sysemu/xen-mapcache.h" > > >>>>>>>>> #include "trace.h" > > >>>>>>>>> +#include "hw/virtio/virtio.h" > > >>>>>>>>> #endif > > >>>>>>>>> #include "exec/cpu-all.h" > > >>>>>>>>> @@ -2745,7 +2746,6 @@ int cpu_memory_rw_debug(CPUState *cpu, > > >>>>>>>>> target_ulong addr, > > >>>>>>>>> * A helper function for the _utterly broken_ virtio device > > >>>>>>>>> model to find out if > > >>>>>>>>> * it's running on a big endian machine. Don't do this at home > > >>>>>>>>> kids! > > >>>>>>>>> */ > > >>>>>>>>> -bool virtio_is_big_endian(void); > > >>>>>>>>> bool virtio_is_big_endian(void) > > >>>>>>>>> { > > >>>>>>>>> #if defined(TARGET_WORDS_BIGENDIAN) > > >>>>>>>>> diff --git a/hw/virtio/Makefile.objs b/hw/virtio/Makefile.objs > > >>>>>>>>> index 1ba53d9..68c3064 100644 > > >>>>>>>>> --- a/hw/virtio/Makefile.objs > > >>>>>>>>> +++ b/hw/virtio/Makefile.objs > > >>>>>>>>> @@ -4,5 +4,5 @@ common-obj-y += virtio-bus.o > > >>>>>>>>> common-obj-y += virtio-mmio.o > > >>>>>>>>> common-obj-$(CONFIG_VIRTIO_BLK_DATA_PLANE) += dataplane/ > > >>>>>>>>> -obj-y += virtio.o virtio-balloon.o > > >>>>>>>>> +obj-y += virtio.o virtio-balloon.o > > >>>>>>>>> obj-$(CONFIG_LINUX) += vhost.o > > >>>>>>>>> diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c > > >>>>>>>>> index ce97514..82a1689 100644 > > >>>>>>>>> --- a/hw/virtio/virtio-pci.c > > >>>>>>>>> +++ b/hw/virtio/virtio-pci.c > > >>>>>>>>> @@ -89,9 +89,6 @@ > > >>>>>>>>> /* Flags track per-device state like workarounds for quirks in > > >>>>>>>>> older guests. */ > > >>>>>>>>> #define VIRTIO_PCI_FLAG_BUS_MASTER_BUG (1 << 0) > > >>>>>>>>> -/* HACK for virtio to determine if it's running a big endian > > >>>>>>>>> guest */ > > >>>>>>>>> -bool virtio_is_big_endian(void); > > >>>>>>>>> - > > >>>>>>>>> static void virtio_pci_bus_new(VirtioBusState *bus, size_t > > >>>>>>>>> bus_size, > > >>>>>>>>> VirtIOPCIProxy *dev); > > >>>>>>>>> @@ -409,13 +406,13 @@ static uint64_t virtio_pci_config_read(void > > >>>>>>>>> *opaque, hwaddr addr, > > >>>>>>>>> break; > > >>>>>>>>> case 2: > > >>>>>>>>> val = virtio_config_readw(vdev, addr); > > >>>>>>>>> - if (virtio_is_big_endian()) { > > >>>>>>>>> + if (vdev->is_big_endian) { > > >>>>>>>>> val = bswap16(val); > > >>>>>>>>> } > > >>>>>>>>> break; > > >>>>>>>>> case 4: > > >>>>>>>>> val = virtio_config_readl(vdev, addr); > > >>>>>>>>> - if (virtio_is_big_endian()) { > > >>>>>>>>> + if (vdev->is_big_endian) { > > >>>>>>>>> val = bswap32(val); > > >>>>>>>>> } > > >>>>>>>>> break; > > >>>>>>>>> @@ -443,13 +440,13 @@ static void virtio_pci_config_write(void > > >>>>>>>>> *opaque, hwaddr addr, > > >>>>>>>>> virtio_config_writeb(vdev, addr, val); > > >>>>>>>>> break; > > >>>>>>>>> case 2: > > >>>>>>>>> - if (virtio_is_big_endian()) { > > >>>>>>>>> + if (vdev->is_big_endian) { > > >>>>>>>>> val = bswap16(val); > > >>>>>>>>> } > > >>>>>>>>> virtio_config_writew(vdev, addr, val); > > >>>>>>>>> break; > > >>>>>>>>> case 4: > > >>>>>>>>> - if (virtio_is_big_endian()) { > > >>>>>>>>> + if (vdev->is_big_endian) { > > >>>>>>>>> val = bswap32(val); > > >>>>>>>>> } > > >>>>>>>>> virtio_config_writel(vdev, addr, val); > > >>>>>>>>> diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c > > >>>>>>>>> index aeabf3a..bb646f0 100644 > > >>>>>>>>> --- a/hw/virtio/virtio.c > > >>>>>>>>> +++ b/hw/virtio/virtio.c > > >>>>>>>>> @@ -19,6 +19,7 @@ > > >>>>>>>>> #include "hw/virtio/virtio.h" > > >>>>>>>>> #include "qemu/atomic.h" > > >>>>>>>>> #include "hw/virtio/virtio-bus.h" > > >>>>>>>>> +#include "hw/virtio/virtio-access.h" > > >>>>>>>>> /* > > >>>>>>>>> * The alignment to use between consumer and producer parts of > > >>>>>>>>> vring. > > >>>>>>>>> @@ -546,6 +547,8 @@ void virtio_reset(void *opaque) > > >>>>>>>>> virtio_set_status(vdev, 0); > > >>>>>>>>> + vdev->is_big_endian = virtio_is_big_endian(); > > >>>>>>>>> + > > >>>>>>>>> if (k->reset) { > > >>>>>>>>> k->reset(vdev); > > >>>>>>>>> } > > >>>>>>>>> @@ -897,6 +900,11 @@ int virtio_load(VirtIODevice *vdev, QEMUFile > > >>>>>>>>> *f) > > >>>>>>>>> BusState *qbus = qdev_get_parent_bus(DEVICE(vdev)); > > >>>>>>>>> VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus); > > >>>>>>>>> + /* NOTE: we assume that endianness is a cpu state AND > > >>>>>>>>> + * cpu state is restored before virtio devices. > > >>>>>>>>> + */ > > >>>>>>>>> + vdev->is_big_endian = virtio_is_big_endian(); > > >>>>>>>>> + > > >>>>>>>>> if (k->load_config) { > > >>>>>>>>> ret = k->load_config(qbus->parent, f); > > >>>>>>>>> if (ret) > > >>>>>>>>> @@ -1153,6 +1161,33 @@ void > > >>>>>>>>> virtio_device_set_child_bus_name(VirtIODevice *vdev, char > > >>>>>>>>> *bus_name) > > >>>>>>>>> } > > >>>>>>>>> } > > >>>>>>>>> +uint16_t virtio_tswap16(VirtIODevice *vdev, uint16_t s) > > >>>>>>>>> +{ > > >>>>>>>>> + if (vdev->is_big_endian) { > > >>>>>>>>> + return tswap16(s); > > >>>>>>>>> + } else { > > >>>>>>>>> + return bswap16(tswap16(s)); > > >>>>>>>>> + } > > >>>>>>>> This looks pretty bogus. When virtio wants to do "tswap" what it > > >>>>>>>> means is "give me a host endianness value in virtio endianness". > > >>>>>>>> How > > >>>>>>>> about something like > > >>>>>>>> > > >>>>>>>> #ifdef HOST_WORDS_BIGENDIAN > > >>>>>>>> return vdev->is_big_endian ? s : bswap16(s); > > >>>>>>>> #else > > >>>>>>>> return vdev->is_big_endian ? bswap16(s) : s; > > >>>>>>>> #endif > > >>>>>>>> > > >>>>>>> Actually why doesn't this call virtio_is_big_endian? > > >>>>>>> As it is, we get extra branches even if target endian-ness > > >>>>>>> is fixed. > > >>>>>> Because virtio_is_big_endian() returns the default endianness, not > > >>>>>> the runtime endianness of a virtio device. > > >>>> In fact, we should probably rename it accordingly. > > >>>> > > >>>>>>>> That should work pretty well inline as well, so you don't need to > > >>>>>>>> compile virtio.c as target dependent. > > >>>>>>>> > > >>>>>>>> > > >>>>>>>> Alex > > >>>>>>> Yes but we'll still need to build two variants: fixed endian and > > >>>>>>> dynamic endian platforms. > > >>>>>>> Something along the lines of 32/64 bit split that we have? > > >>>>>> Why bother? Always make it dynamic and don't change the per-device > > >>>>>> variable, no? I'd be surprised if the performance difference is > > >>>>>> measurable. The bulk of the data we transfer gets copied raw anyway. > > >>>>>> > > >>>>>> > > >>>>>> Alex > > >>>>> This will have to be measured and proved by whoever's proposing the > > >>>>> patch, not by reviewers. Platforms such as AMD which don't do > > >>>>> prediction well would be especially interesting to test on. > > >>>> Sure, Greg, can you do that? I'm sure Michael has test cases > > >>>> available he can give you to measure performance on this. > > >>> Measuring performance is hard ATM. > > >>> > > >>> Disk io stress when backed by raw ramdisk in host is usually the easiest > > >>> way to stress userspace virtio. > > >>> You need to make sure your baseline is repeateable though: > > >>> pin down everything from cpu to hardware interrupts, > > >>> disable power management, c states etc > > >>> Just taking an average and hoping for the best doesn't cut it. > > >>> > > >>> > > >>> We really should write a benchmark device, > > >>> to measure just the transport overhead. > > >>> For example, start with virtio-net but fuse TX and RX > > >>> VQs together, so you'll get right back whatever you send out. > > >>> > > >>> So really, virtio is ATM target-specific and I think it's > > >>> a good idea to get it working as such first, > > >>> do extra cleanups like getting rid of target specific code > > >>> separately. > > >> How does it help when we keep it target specific? The only way to remove > > >> any overhead from this refactoring would be to instead of > > >> > > >> if (vdev->is_big_endian) > > >> > > >> write it as > > >> > > >> if (virtio_is_big_endian_device(vdev)) > > >> > > >> which we could define as > > >> > > >> static inline bool virtio_is_big_endian_device(VirtIODevice *vdev) > > >> { > > >> #if defined(TARGET_PPC) > > >> return vdev->is_big_endian > > >> #elif defined(TARGET_WORDS_BIGENDIAN) > > >> return true; > > >> #else > > >> return false; > > >> #endif > > >> } > > >> > > >> If it makes you happy to do it this way first and move virtio to > > >> target-independent files later, we can certainly do this. > > >> > > >> > > > This would end up defined in all virtio files with the consequence of > > > hitting TARGET_WORDS_BIGENDIAN poison for the target independent ones. > > > Should we kick them out of common-obj variables in makefiles as well ? > > > Or would it be acceptable to have this helper not inlined ? > > > > That would defeat the purpose - the reason to have the helper inlined is > > to remove the conditional branch for x86. > > > > Sure but on the other hand, Peter does not like the idea of moving virtio > to compiled-per-target...
Whenever I do touch hw/virtio/virtio.c I see: CC alpha-softmmu/hw/virtio/virtio.o CC arm-softmmu/hw/virtio/virtio.o LINK alpha-softmmu/qemu-system-alpha LINK arm-softmmu/qemu-system-arm CC aarch64-softmmu/hw/virtio/virtio.o LINK aarch64-softmmu/qemu-system-aarch64 CC i386-softmmu/hw/virtio/virtio.o LINK i386-softmmu/qemu-system-i386 CC m68k-softmmu/hw/virtio/virtio.o LINK m68k-softmmu/qemu-system-m68k CC mips64el-softmmu/hw/virtio/virtio.o LINK mips64el-softmmu/qemu-system-mips64el CC mips64-softmmu/hw/virtio/virtio.o CC mipsel-softmmu/hw/virtio/virtio.o LINK mips64-softmmu/qemu-system-mips64 LINK mipsel-softmmu/qemu-system-mipsel CC mips-softmmu/hw/virtio/virtio.o LINK mips-softmmu/qemu-system-mips CC ppcemb-softmmu/hw/virtio/virtio.o CC ppc64-softmmu/hw/virtio/virtio.o LINK ppcemb-softmmu/qemu-system-ppcemb LINK ppc64-softmmu/qemu-system-ppc64 CC ppc-softmmu/hw/virtio/virtio.o LINK ppc-softmmu/qemu-system-ppc CC s390x-softmmu/hw/virtio/virtio.o LINK s390x-softmmu/qemu-system-s390x CC sh4eb-softmmu/hw/virtio/virtio.o LINK sh4eb-softmmu/qemu-system-sh4eb CC sh4-softmmu/hw/virtio/virtio.o LINK sh4-softmmu/qemu-system-sh4 CC sparc64-softmmu/hw/virtio/virtio.o LINK sparc64-softmmu/qemu-system-sparc64 CC x86_64-softmmu/hw/virtio/virtio.o LINK x86_64-softmmu/qemu-system-x86_64 so it looks like virtio is currently compiled per-target. So why isn't it reasonable to keep it per-target for purpose of this enhancement? What am I missing? > these look like contradictory requests to me... :-\ > Unless I have missed something, we have to settle whether we favor > building/testing > time or performance of non-{powerpc,arm} targets to have legacy virtio > supporting > LE powerpc and BE arm... > > > > > Alex > > > > > > -- > Gregory Kurz kurzg...@fr.ibm.com > gk...@linux.vnet.ibm.com > Software Engineer @ IBM/Meiosys http://www.ibm.com > Tel +33 (0)562 165 496 > > "Anarchy is about taking complete responsibility for yourself." > Alan Moore.