Re: [Qemu-devel] [PATCH qemu 1/5] vfio: Switch from TARGET_PAGE_MASK to qemu_real_host_page_mask
On Fri, Jul 10, 2015 at 08:43:44PM +1000, Alexey Kardashevskiy wrote: > These started switching from TARGET_PAGE_MASK (hardcoded as 4K) to > a real host page size: > 4e51361d7 "cpu-all: complete "real" host page size API" and > f7ceed190 "vfio: cpu: Use "real" page size API" > > This finished the transition by: > - %s/TARGET_PAGE_MASK/qemu_real_host_page_mask/ > - %s/TARGET_PAGE_ALIGN/REAL_HOST_PAGE_ALIGN/ > - removing bitfield length for offsets in VFIOQuirk::data as > qemu_real_host_page_mask is not a macro This does not make much sense to me. f7ceed190 moved to REAL_HOST_PAGE_SIZE because it's back end stuff that really depends only on the host page size. Here you're applying a blanket change to vfio code, in particular to the DMA handling code, and for DMA both the host and target page size can be relevant, depending on the details of the IOMMU implementation. > This keeps using TARGET_PAGE_MASK for IOMMU regions though as it is > the minimum page size which IOMMU regions may be using and at the moment > memory regions do not carry the actual page size. And this exception also doesn't make much sense to me. Partly it's confusing because the listener is doing different things depending on whether we have a guest visible IOMMU or not. In short, there doesn't seem to be a coherent explanation here of where the page size / alignment restriction is coming from, and therefore whether it needs to be a host page alignment, a guest page alignment, or both. > Signed-off-by: Alexey Kardashevskiy > --- > > In reality DMA windows are always a lot bigger than a single 4K page > and aligned to 32/64MB, may be only use here > qemu_real_host_page_mask? I don't understand this question either. -- David Gibson| I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson pgpwNRyv0ggc3.pgp Description: PGP signature
Re: [Qemu-devel] [PATCH v4 4/7] pc: fix QEMU crashing when more than ~50 memory hotplugged
On Fri, Jul 10, 2015 at 12:12:36PM +0200, Igor Mammedov wrote: > On Thu, 9 Jul 2015 16:46:43 +0300 > "Michael S. Tsirkin" wrote: > > > On Thu, Jul 09, 2015 at 03:43:01PM +0200, Paolo Bonzini wrote: > > > > > > > > > On 09/07/2015 15:06, Michael S. Tsirkin wrote: > > > > > QEMU asserts in vhost due to hitting vhost backend limit > > > > > on number of supported memory regions. > > > > > > > > > > Describe all hotplugged memory as one continuos range > > > > > to vhost with linear 1:1 HVA->GPA mapping in backend. > > > > > > > > > > Signed-off-by: Igor Mammedov > > > > > > > > Hmm - a bunch of work here to recombine MRs that memory listener > > > > interface breaks up. In particular KVM could benefit from this too (on > > > > workloads that change the table a lot). Can't we teach memory core to > > > > pass hva range as a single continuous range to memory listeners? > > > > > > Memory listeners are based on memory regions, not HVA ranges. > > > > > > Paolo > > > > Many listeners care about HVA ranges. I know KVM and vhost do. > I'm not sure about KVM, it works just fine with fragmented memory regions, > the same will apply to vhost once module parameter to increase limit > is merged. > > but changing generic memory listener interface to replace HVA mapped > regions with HVA container would lead to a case when listeners > won't see exact layout that they might need. I don't think they care, really. > In addition vhost itself will suffer from working with big HVA > since it allocates log depending on size of memory => bigger log. Not really - it allocates the log depending on the PA range. Leaving unused holes doesn't reduce it's size. > That's one of the reasons that in this patch HVA ranges in > memory map are compacted only for backend consumption, > QEMU's side of vhost uses exact map for internal purposes. > And the other reason is I don't know vhost enough to rewrite it > to use big HVA for everything. > > > I guess we could create dummy MRs to fill in the holes left by > > memory hotplug? > it looks like nice thing from vhost pov but complicates other side, What other side do you have in mind? > hence I dislike an idea inventing dummy MRs for vhost's convenience. > > > > vhost already has logic to recombine > > consequitive chunks created by memory core. > which looks a bit complicated and I was thinking about simplifying > it some time in the future.
Re: [Qemu-devel] [PATCH] virtio-rng: Bump up quota value only when guest requests entropy
Hi Amit, Thanks for the review. > > On (Fri) 10 Jul 2015 [15:04:00], Pankaj Gupta wrote: > > Timer was added in virtio-rng to rate limit the > > entropy. It used to trigger at regular intervals to > > bump up the quota value. The value of quota and timer > > slice is decided based on entropy source rate in host. > > It doesn't necessarily depnd on the source rate in the host - all we > want the quota+timer to do is to limit the amount of data a guest can > take from the host - to ensure one (potentially rogue) guest does not > use up all the entropy from the host. Sorry! for not being clear on this. By rate limit I meant same. I used a broader term. > > > This resulted in triggring of timer even when quota > > is not exhausted at all and resulting in extra processing. > > > > This patch triggers timer only when guest requests for > > entropy. As soon as first request from guest for entropy > > comes we set the timer. Timer bumps up the quota value > > when it gets triggered. > > Can you say how you tested this? > > Mainly interested in seeing the results in these cases: > > * No quota/timer specified on command line Tested this scenario. I am setting timer when first request comes. So, timer gets fired after (1 << 16) ms time. > * Quota+timer specified on command line, and guest keeps asking host > for unlimited entropy, e.g. by doing 'dd if=/dev/hwrng of=/dev/null' > in the guest. I did not do 'dd if=/dev/hwrng of=/dev/null'. Did cat '/dev/hwrng' && '/dev/random' > * Ensure quota restrictions are maintained, and we're not giving more > data than configured. Ensured. We are either giving < = requested data > > For these tests, it's helpful to use the host's /dev/urandom as the > source, since that can give data faster to the guest than the default > /dev/random. (Otherwise, if the host itself blocks on /dev/random, > the guest may not get entropy due to that reason vs it not getting > entropy due to rate-limiting.) Agree. Will test this as well. > > I tested one scenario using the trace events. With some quota and a > timer value specified on the cmdline, before patch, I get tons of > trace events before the guest is even up. After applying the patch, I > don't get any trace events. So that's progress! Thanks. > > I have one question: > > > Signed-off-by: Pankaj Gupta > > --- > > hw/virtio/virtio-rng.c | 15 --- > > include/hw/virtio/virtio-rng.h | 1 + > > 2 files changed, 9 insertions(+), 7 deletions(-) > > > > diff --git a/hw/virtio/virtio-rng.c b/hw/virtio/virtio-rng.c > > index 22b1d87..8774a0c 100644 > > --- a/hw/virtio/virtio-rng.c > > +++ b/hw/virtio/virtio-rng.c > > @@ -78,6 +78,12 @@ static void virtio_rng_process(VirtIORNG *vrng) > > return; > > } > > > > + if (vrng->activate_timer) { > > + timer_mod(vrng->rate_limit_timer, > > + qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) + > > vrng->conf.period_ms); > > + vrng->activate_timer = false; > > + } > > + > > if (vrng->quota_remaining < 0) { > > quota = 0; > > } else { > > @@ -139,8 +145,7 @@ static void check_rate_limit(void *opaque) > > > > vrng->quota_remaining = vrng->conf.max_bytes; > > virtio_rng_process(vrng); > > - timer_mod(vrng->rate_limit_timer, > > - qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) + > > vrng->conf.period_ms); > > + vrng->activate_timer = true; > > } > > We're processing an older request first, and then firing the timer. > What's the use of doing it this way? Why even do this? I also had this query. If we don't call this after resetting 'vrng->quota_remaining' further requests does not work. It looks to me some limitation in earlier code when 'vrng->quota_remaining' goes to < = 0. A self request is needed to reset things. I will try to find the answer. > > I know this is how the code was written originally, but since you've > looked at it, do you know why this is the way it is? No > > Amit > >
Re: [Qemu-devel] [PATCH 5/5] virtio-net: unbreak any layout
On 13/07/2015 07:46, Jason Wang wrote: > -if (out_sg[0].iov_len < n->guest_hdr_len) { > +s = iov_to_buf(out_sg, out_num, 0, &hdr, sizeof(hdr)); > +if (s != sizeof(hdr)) { > error_report("virtio-net header incorrect"); > exit(1); > } > -virtio_net_hdr_swap(vdev, (void *) out_sg[0].iov_base); > +virtio_net_hdr_swap(vdev, (void *) &hdr); > +s = iov_from_buf(out_sg, out_num, 0, &hdr, sizeof(hdr)); > +assert(s == sizeof(hdr)); > } Are the copies necessary in the common case of no swap? In that case you can just use iov_size. Paolo
Re: [Qemu-devel] [PULL 12/62] framebuffer: check memory_region_is_logging
On 13/07/2015 00:02, Peter Maydell wrote: > (I'm wondering if we somehow manage to spend all our time > trying to service the GUI and no time making forward > progress in the guest, though I don't have a clear idea > in mind of why this would be so Perhaps because Xlib calls (and hence GTK+ calls) are blocking? > or why this change would > make things worse, except that presumably we now do more > full-screen-invalidates.) Yes, then it makes sense for this patch to make a difference. VNC does dirty rectangles and has multithreading, but GTK+ doesn't do either. You can just add a memory_region_set_log call for the integratorcp and versatilepb RAM regions. Paolo
Re: [Qemu-devel] [PATCH v3 3/5] Introduce irqchip type specification for KVM
Hello! > Besides we discussed this together before and you know I think we can > manage without this patch file by using kvm_create_device in trial mode > for v2 and v3 - but we did not agree on this -. Because we don't want "some GIC". We want particular GIC version, and this seems to be logical. Kind regards, Pavel Fedin Expert Engineer Samsung Electronics Research center Russia
Re: [Qemu-devel] [PATCH 4/5] Revert "virtio-net: enable virtio 1.0"
On Mon, 13 Jul 2015 13:46:50 +0800 Jason Wang wrote: > This reverts commit df91055db5c9cee93d70ca8c08d72119a240b987. > > This is because: > - vhost support virtio 1.0 now > - transport code (e.g virtio-pci) set this feature when modern is > enabled, setting this unconditionally will break disable-modern=on. *scratches head* Why is transport code now supposed to set VERSION_1? I thought we wanted to have the individual devices offer it, once they are converted. No objection on removing the dependency on !vhost. > > Cc: Cornelia Huck > Signed-off-by: Jason Wang > --- > hw/net/virtio-net.c | 1 - > 1 file changed, 1 deletion(-) > > diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c > index d728233..e3c2db3 100644 > --- a/hw/net/virtio-net.c > +++ b/hw/net/virtio-net.c > @@ -466,7 +466,6 @@ static uint64_t virtio_net_get_features(VirtIODevice > *vdev, uint64_t features) > } > > if (!get_vhost_net(nc->peer)) { > -virtio_add_feature(&features, VIRTIO_F_VERSION_1); > return features; > } > return vhost_net_get_features(get_vhost_net(nc->peer), features);
Re: [Qemu-devel] [PATCH 0/3] hw/net: Drop unnecessary .can_receive functions
On 07/01/2015 03:10 PM, Fam Zheng wrote: > These are all repeating the default, and since we're cleaning up .can_receive, > let's get these out of the way. > > Fam > > > Fam Zheng (3): > musicpal: Drop eth_can_receive > etraxfs_eth: Drop eth_can_receive > lan9118: Drop lan9118_can_receive > > hw/arm/musicpal.c| 6 -- > hw/net/etraxfs_eth.c | 6 -- > hw/net/lan9118.c | 6 -- > 3 files changed, 18 deletions(-) > Reviewed-by: Jason Wang
Re: [Qemu-devel] [PATCH] virtio-rng: Bump up quota value only when guest requests entropy
On (Fri) 10 Jul 2015 [15:04:00], Pankaj Gupta wrote: >Timer was added in virtio-rng to rate limit the > entropy. It used to trigger at regular intervals to > bump up the quota value. The value of quota and timer > slice is decided based on entropy source rate in host. It doesn't necessarily depnd on the source rate in the host - all we want the quota+timer to do is to limit the amount of data a guest can take from the host - to ensure one (potentially rogue) guest does not use up all the entropy from the host. > This resulted in triggring of timer even when quota > is not exhausted at all and resulting in extra processing. > > This patch triggers timer only when guest requests for > entropy. As soon as first request from guest for entropy > comes we set the timer. Timer bumps up the quota value > when it gets triggered. Can you say how you tested this? Mainly interested in seeing the results in these cases: * No quota/timer specified on command line * Quota+timer specified on command line, and guest keeps asking host for unlimited entropy, e.g. by doing 'dd if=/dev/hwrng of=/dev/null' in the guest. * Ensure quota restrictions are maintained, and we're not giving more data than configured. For these tests, it's helpful to use the host's /dev/urandom as the source, since that can give data faster to the guest than the default /dev/random. (Otherwise, if the host itself blocks on /dev/random, the guest may not get entropy due to that reason vs it not getting entropy due to rate-limiting.) I tested one scenario using the trace events. With some quota and a timer value specified on the cmdline, before patch, I get tons of trace events before the guest is even up. After applying the patch, I don't get any trace events. So that's progress! I have one question: > Signed-off-by: Pankaj Gupta > --- > hw/virtio/virtio-rng.c | 15 --- > include/hw/virtio/virtio-rng.h | 1 + > 2 files changed, 9 insertions(+), 7 deletions(-) > > diff --git a/hw/virtio/virtio-rng.c b/hw/virtio/virtio-rng.c > index 22b1d87..8774a0c 100644 > --- a/hw/virtio/virtio-rng.c > +++ b/hw/virtio/virtio-rng.c > @@ -78,6 +78,12 @@ static void virtio_rng_process(VirtIORNG *vrng) > return; > } > > +if (vrng->activate_timer) { > +timer_mod(vrng->rate_limit_timer, > + qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) + > vrng->conf.period_ms); > +vrng->activate_timer = false; > +} > + > if (vrng->quota_remaining < 0) { > quota = 0; > } else { > @@ -139,8 +145,7 @@ static void check_rate_limit(void *opaque) > > vrng->quota_remaining = vrng->conf.max_bytes; > virtio_rng_process(vrng); > -timer_mod(vrng->rate_limit_timer, > - qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) + > vrng->conf.period_ms); > +vrng->activate_timer = true; > } We're processing an older request first, and then firing the timer. What's the use of doing it this way? Why even do this? I know this is how the code was written originally, but since you've looked at it, do you know why this is the way it is? Amit
[Qemu-devel] [PATCH V2 2/2] tests: test rx recovery from cont
Rx should be recovered after cont. Signed-off-by: Jason Wang --- Changes from V1: - query status before sending cont, this makes sure that the packet were queued. --- tests/virtio-net-test.c | 52 + 1 file changed, 52 insertions(+) diff --git a/tests/virtio-net-test.c b/tests/virtio-net-test.c index b7d6f0b..3c412e2 100644 --- a/tests/virtio-net-test.c +++ b/tests/virtio-net-test.c @@ -140,6 +140,49 @@ static void tx_test(const QVirtioBus *bus, QVirtioDevice *dev, g_assert_cmpstr(buffer, ==, "TEST"); } +static void rx_stop_cont_test(const QVirtioBus *bus, QVirtioDevice *dev, + QGuestAllocator *alloc, QVirtQueue *vq, + int socket) +{ +uint64_t req_addr; +uint32_t free_head; +char test[] = "TEST"; +char buffer[64]; +int len = htonl(sizeof(test)); +struct iovec iov[] = { +{ +.iov_base = &len, +.iov_len = sizeof(len), +}, { +.iov_base = test, +.iov_len = sizeof(test), +}, +}; +int ret; + +req_addr = guest_alloc(alloc, 64); + +free_head = qvirtqueue_add(vq, req_addr, 64, true, false); +qvirtqueue_kick(bus, dev, vq, free_head); + +/* We could check the status, but this command is more importantly to + * ensure the packet data gets queued in QEMU, before we do 'cont'. + */ +qmp("{ 'execute' : 'query-status'}"); +qmp("{ 'execute' : 'stop'}"); + +ret = iov_send(socket, iov, 2, 0, sizeof(len) + sizeof(test)); +g_assert_cmpint(ret, ==, sizeof(test) + sizeof(len)); + +qmp("{ 'execute' : 'cont'}"); + +qvirtio_wait_queue_isr(bus, dev, vq, QVIRTIO_NET_TIMEOUT_US); +memread(req_addr + VNET_HDR_SIZE, buffer, sizeof(test)); +g_assert_cmpstr(buffer, ==, "TEST"); + +guest_free(alloc, req_addr); +} + static void send_recv_test(const QVirtioBus *bus, QVirtioDevice *dev, QGuestAllocator *alloc, QVirtQueue *rvq, QVirtQueue *tvq, int socket) @@ -148,6 +191,13 @@ static void send_recv_test(const QVirtioBus *bus, QVirtioDevice *dev, tx_test(bus, dev, alloc, tvq, socket); } +static void stop_cont_test(const QVirtioBus *bus, QVirtioDevice *dev, + QGuestAllocator *alloc, QVirtQueue *rvq, + QVirtQueue *tvq, int socket) +{ +rx_stop_cont_test(bus, dev, alloc, rvq, socket); +} + static void pci_basic(gconstpointer data) { QVirtioPCIDevice *dev; @@ -206,6 +256,8 @@ int main(int argc, char **argv) g_test_init(&argc, &argv, NULL); #ifndef _WIN32 qtest_add_data_func("/virtio/net/pci/basic", send_recv_test, pci_basic); +qtest_add_data_func("/virtio/net/pci/rx_stop_cont", +stop_cont_test, pci_basic); #endif qtest_add_func("/virtio/net/pci/hotplug", hotplug); -- 2.1.4
[Qemu-devel] [PATCH V2 1/2] tests: introduce basic pci test for virtio-net
Signed-off-by: Jason Wang --- Changes from V1: - replace the magic value 12 with a macro --- tests/Makefile | 2 +- tests/virtio-net-test.c | 186 ++-- 2 files changed, 180 insertions(+), 8 deletions(-) diff --git a/tests/Makefile b/tests/Makefile index 2c4b8dc..5805326 100644 --- a/tests/Makefile +++ b/tests/Makefile @@ -381,7 +381,7 @@ tests/wdt_ib700-test$(EXESUF): tests/wdt_ib700-test.o tests/tco-test$(EXESUF): tests/tco-test.o $(libqos-pc-obj-y) tests/virtio-balloon-test$(EXESUF): tests/virtio-balloon-test.o tests/virtio-blk-test$(EXESUF): tests/virtio-blk-test.o $(libqos-virtio-obj-y) -tests/virtio-net-test$(EXESUF): tests/virtio-net-test.o $(libqos-pc-obj-y) +tests/virtio-net-test$(EXESUF): tests/virtio-net-test.o $(libqos-pc-obj-y) $(libqos-virtio-obj-y) tests/virtio-rng-test$(EXESUF): tests/virtio-rng-test.o $(libqos-pc-obj-y) tests/virtio-scsi-test$(EXESUF): tests/virtio-scsi-test.o $(libqos-virtio-obj-y) tests/virtio-9p-test$(EXESUF): tests/virtio-9p-test.o diff --git a/tests/virtio-net-test.c b/tests/virtio-net-test.c index ea7478c..b7d6f0b 100644 --- a/tests/virtio-net-test.c +++ b/tests/virtio-net-test.c @@ -10,20 +10,193 @@ #include #include #include "libqtest.h" +#include "qemu-common.h" +#include "qemu/sockets.h" #include "qemu/osdep.h" -#include "libqos/pci.h" +#include "qemu/iov.h" +#include "libqos/pci-pc.h" +#include "libqos/virtio.h" +#include "libqos/virtio-pci.h" +#include "libqos/malloc.h" +#include "libqos/malloc-pc.h" +#include "libqos/malloc-generic.h" +#include "qemu/bswap.h" +#include "hw/virtio/virtio-net.h" #define PCI_SLOT_HP 0x06 +#define PCI_SLOT0x04 +#define PCI_FN 0x00 -/* Tests only initialization so far. TODO: Replace with functional tests */ -static void pci_nop(void) +#define QVIRTIO_NET_TIMEOUT_US (30 * 1000 * 1000) +#define VNET_HDR_SIZE sizeof(struct virtio_net_hdr_mrg_rxbuf) + +static void test_end(void) +{ +qtest_end(); +} + +#ifndef _WIN32 + +static QVirtioPCIDevice *virtio_net_pci_init(QPCIBus *bus, int slot) +{ +QVirtioPCIDevice *dev; + +dev = qvirtio_pci_device_find(bus, QVIRTIO_NET_DEVICE_ID); +g_assert(dev != NULL); +g_assert_cmphex(dev->vdev.device_type, ==, QVIRTIO_NET_DEVICE_ID); + +qvirtio_pci_device_enable(dev); +qvirtio_reset(&qvirtio_pci, &dev->vdev); +qvirtio_set_acknowledge(&qvirtio_pci, &dev->vdev); +qvirtio_set_driver(&qvirtio_pci, &dev->vdev); + +return dev; +} + +static QPCIBus *pci_test_start(int socket) +{ +char *cmdline; + +cmdline = g_strdup_printf("-netdev socket,fd=%d,id=hs0 -device " + "virtio-net-pci,netdev=hs0", socket); +qtest_start(cmdline); +g_free(cmdline); + +return qpci_init_pc(); +} + +static void driver_init(const QVirtioBus *bus, QVirtioDevice *dev) +{ +uint32_t features; + +features = qvirtio_get_features(bus, dev); +features = features & ~(QVIRTIO_F_BAD_FEATURE | +QVIRTIO_F_RING_INDIRECT_DESC | +QVIRTIO_F_RING_EVENT_IDX); +qvirtio_set_features(bus, dev, features); + +qvirtio_set_driver_ok(bus, dev); +} + +static void rx_test(const QVirtioBus *bus, QVirtioDevice *dev, +QGuestAllocator *alloc, QVirtQueue *vq, +int socket) { +uint64_t req_addr; +uint32_t free_head; +char test[] = "TEST"; +char buffer[64]; +int len = htonl(sizeof(test)); +struct iovec iov[] = { +{ +.iov_base = &len, +.iov_len = sizeof(len), +}, { +.iov_base = test, +.iov_len = sizeof(test), +}, +}; +int ret; + +req_addr = guest_alloc(alloc, 64); + +free_head = qvirtqueue_add(vq, req_addr, 64, true, false); +qvirtqueue_kick(bus, dev, vq, free_head); + +ret = iov_send(socket, iov, 2, 0, sizeof(len) + sizeof(test)); +g_assert_cmpint(ret, ==, sizeof(test) + sizeof(len)); + +qvirtio_wait_queue_isr(bus, dev, vq, QVIRTIO_NET_TIMEOUT_US); +memread(req_addr + VNET_HDR_SIZE, buffer, sizeof(test)); +g_assert_cmpstr(buffer, ==, "TEST"); + +guest_free(alloc, req_addr); } +static void tx_test(const QVirtioBus *bus, QVirtioDevice *dev, +QGuestAllocator *alloc, QVirtQueue *vq, +int socket) +{ +uint64_t req_addr; +uint32_t free_head; +uint32_t len; +char buffer[64]; +int ret; + +req_addr = guest_alloc(alloc, 64); +memwrite(req_addr + VNET_HDR_SIZE, "TEST", 4); + +free_head = qvirtqueue_add(vq, req_addr, 64, false, false); +qvirtqueue_kick(bus, dev, vq, free_head); + +qvirtio_wait_queue_isr(bus, dev, vq, QVIRTIO_NET_TIMEOUT_US); +guest_free(alloc, req_addr); + +ret = qemu_recv(socket, &len, sizeof(len), 0); +g_assert_cmpint(ret, ==, sizeof(len)); +len = ntohl(len); + +ret = qemu_recv(socket, buffer, len, 0); +g_as
[Qemu-devel] [PATCH 4/5] Revert "virtio-net: enable virtio 1.0"
This reverts commit df91055db5c9cee93d70ca8c08d72119a240b987. This is because: - vhost support virtio 1.0 now - transport code (e.g virtio-pci) set this feature when modern is enabled, setting this unconditionally will break disable-modern=on. Cc: Cornelia Huck Signed-off-by: Jason Wang --- hw/net/virtio-net.c | 1 - 1 file changed, 1 deletion(-) diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c index d728233..e3c2db3 100644 --- a/hw/net/virtio-net.c +++ b/hw/net/virtio-net.c @@ -466,7 +466,6 @@ static uint64_t virtio_net_get_features(VirtIODevice *vdev, uint64_t features) } if (!get_vhost_net(nc->peer)) { -virtio_add_feature(&features, VIRTIO_F_VERSION_1); return features; } return vhost_net_get_features(get_vhost_net(nc->peer), features); -- 2.1.4
[Qemu-devel] [PATCH 3/5] virtio-blk: set VIRTIO_F_ANY_LAYOUT when 1.0 is supported
Chapter 6.3 of spec said " Transitional devices MUST offer, and if offered by the device transitional drivers MUST accept the following: VIRTIO_F_ANY_LAYOUT (27) " So this patch sets VIRTIO_F_ANY_LAYOUT when 1.0 is supported. Cc: Stefan Hajnoczi Cc: Kevin Wolf Cc: qemu-bl...@nongnu.org Signed-off-by: Jason Wang --- hw/block/virtio-blk.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c index f30ad25..0f07e25 100644 --- a/hw/block/virtio-blk.c +++ b/hw/block/virtio-blk.c @@ -732,6 +732,8 @@ static uint64_t virtio_blk_get_features(VirtIODevice *vdev, uint64_t features) virtio_add_feature(&features, VIRTIO_BLK_F_BLK_SIZE); if (!__virtio_has_feature(features, VIRTIO_F_VERSION_1)) virtio_add_feature(&features, VIRTIO_BLK_F_SCSI); +else +virtio_add_feature(&features, VIRTIO_F_ANY_LAYOUT); if (s->conf.config_wce) { virtio_add_feature(&features, VIRTIO_BLK_F_CONFIG_WCE); -- 2.1.4
[Qemu-devel] [PATCH 5/5] virtio-net: unbreak any layout
Commit 032a74a1c0fcdd5fd1c69e56126b4c857ee36611 ("virtio-net: byteswap virtio-net header") breaks any layout by requiring out_sg[0].iov_len >= n->guest_hdr_len. Fixing this by copying header to temporary buffer and copying it back after byteswap. Fixes 032a74a1c0fcdd5fd1c69e56126b4c857ee36611 ("virtio-net: byteswap virtio-net header") Cc: qemu-sta...@nongnu.org Cc: c...@fr.ibm.com Signed-off-by: Jason Wang --- hw/net/virtio-net.c | 10 +++--- 1 file changed, 7 insertions(+), 3 deletions(-) diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c index e3c2db3..b42af8f 100644 --- a/hw/net/virtio-net.c +++ b/hw/net/virtio-net.c @@ -1139,10 +1139,11 @@ static int32_t virtio_net_flush_tx(VirtIONetQueue *q) } while (virtqueue_pop(q->tx_vq, &elem)) { -ssize_t ret, len; +ssize_t ret, len, s; unsigned int out_num = elem.out_num; struct iovec *out_sg = &elem.out_sg[0]; struct iovec sg[VIRTQUEUE_MAX_SIZE]; +struct virtio_net_hdr hdr; if (out_num < 1) { error_report("virtio-net header not in first element"); @@ -1150,11 +1151,14 @@ static int32_t virtio_net_flush_tx(VirtIONetQueue *q) } if (n->has_vnet_hdr) { -if (out_sg[0].iov_len < n->guest_hdr_len) { +s = iov_to_buf(out_sg, out_num, 0, &hdr, sizeof(hdr)); +if (s != sizeof(hdr)) { error_report("virtio-net header incorrect"); exit(1); } -virtio_net_hdr_swap(vdev, (void *) out_sg[0].iov_base); +virtio_net_hdr_swap(vdev, (void *) &hdr); +s = iov_from_buf(out_sg, out_num, 0, &hdr, sizeof(hdr)); +assert(s == sizeof(hdr)); } /* -- 2.1.4
[Qemu-devel] [PATCH 2/5] virtio-blk: disable scsi passthrough for 1.0 device
VIRTIO_BLK_F_SCSI was no longer supported in 1.0. So disable it. Cc: Stefan Hajnoczi Cc: Kevin Wolf Cc: qemu-bl...@nongnu.org Signed-off-by: Jason Wang --- hw/block/virtio-blk.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c index 6aefda4..f30ad25 100644 --- a/hw/block/virtio-blk.c +++ b/hw/block/virtio-blk.c @@ -730,7 +730,8 @@ static uint64_t virtio_blk_get_features(VirtIODevice *vdev, uint64_t features) virtio_add_feature(&features, VIRTIO_BLK_F_GEOMETRY); virtio_add_feature(&features, VIRTIO_BLK_F_TOPOLOGY); virtio_add_feature(&features, VIRTIO_BLK_F_BLK_SIZE); -virtio_add_feature(&features, VIRTIO_BLK_F_SCSI); +if (!__virtio_has_feature(features, VIRTIO_F_VERSION_1)) +virtio_add_feature(&features, VIRTIO_BLK_F_SCSI); if (s->conf.config_wce) { virtio_add_feature(&features, VIRTIO_BLK_F_CONFIG_WCE); -- 2.1.4
[Qemu-devel] [PATCH 1/5] virtio-pci: ignore unaligned read/write in virtio_address_space_read()/write()
We abort on unaligned read/write in virtio_address_space_read()/write() but since len in under control of guest so qemu will simply crash when booting a modern guest (guest is try to read when len is zero). Fix this by ignoring unaligned write or read. Fixes 1e40356ce5f6ccfa0bb57104a533c62952c560ce ("virtio fix cfg endian-ness for BE targets") Signed-off-by: Jason Wang --- hw/virtio/virtio-pci.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c index ccca2b6..bed9735 100644 --- a/hw/virtio/virtio-pci.c +++ b/hw/virtio/virtio-pci.c @@ -466,8 +466,8 @@ void virtio_address_space_write(AddressSpace *as, hwaddr addr, */ addr &= ~(len - 1); -/* Make sure caller aligned buf properly */ -assert(!(((uintptr_t)buf) & (len - 1))); +if (!(((uintptr_t)buf) & (len - 1))) +return; switch (len) { case 1: @@ -498,8 +498,8 @@ virtio_address_space_read(AddressSpace *as, hwaddr addr, uint8_t *buf, int len) */ addr &= ~(len - 1); -/* Make sure caller aligned buf properly */ -assert(!(((uintptr_t)buf) & (len - 1))); +if (!(((uintptr_t)buf) & (len - 1))) +return; switch (len) { case 1: -- 2.1.4
Re: [Qemu-devel] [PATCH 0/3] hw/net: Drop unnecessary .can_receive functions
On Wed, 07/01 15:10, Fam Zheng wrote: > These are all repeating the default, and since we're cleaning up .can_receive, > let's get these out of the way. Cc'ing net maintainers. Fam
Re: [Qemu-devel] [PATCH v2] net: Flush queued packets when guest resumes
On 07/10/2015 05:24 PM, Fam Zheng wrote: > On Wed, 07/08 17:40, Jason Wang wrote: >> >> On 07/07/2015 05:03 PM, Fam Zheng wrote: >>> On Tue, 07/07 15:44, Jason Wang wrote: On 07/07/2015 09:21 AM, Fam Zheng wrote: > Since commit 6e99c63 "net/socket: Drop net_socket_can_send" and friends, > net queues need to be explicitly flushed after qemu_can_send_packet() > returns false, because the netdev side will disable the polling of fd. > > This fixes the case of "cont" after "stop" (or migration). > > Signed-off-by: Fam Zheng > > --- > > v2: Unify with VM stop handler. (Stefan) > --- > net/net.c | 19 --- > 1 file changed, 12 insertions(+), 7 deletions(-) > > diff --git a/net/net.c b/net/net.c > index 6ff7fec..28a5597 100644 > --- a/net/net.c > +++ b/net/net.c > @@ -1257,14 +1257,19 @@ void qmp_set_link(const char *name, bool up, > Error **errp) > static void net_vm_change_state_handler(void *opaque, int running, > RunState state) > { > -/* Complete all queued packets, to guarantee we don't modify > - * state later when VM is not running. > - */ > -if (!running) { > -NetClientState *nc; > -NetClientState *tmp; > +NetClientState *nc; > +NetClientState *tmp; > > -QTAILQ_FOREACH_SAFE(nc, &net_clients, next, tmp) { > +QTAILQ_FOREACH_SAFE(nc, &net_clients, next, tmp) { > +if (running) { > +/* Flush queued packets and wake up backends. */ > +if (nc->peer && qemu_can_send_packet(nc)) { > +qemu_flush_queued_packets(nc->peer); > +} > +} else { > +/* Complete all queued packets, to guarantee we don't modify > + * state later when VM is not running. > + */ > qemu_flush_or_purge_queued_packets(nc, true); > } Looks like qemu_can_send_packet() checks both nc->peer and runstate. So probably, we can simplify this to: if (qemu_can_send_packet(nc)) qemu_flush_queued_packets(nc->peer); else qemu_flush_or_purge_queued_packets(nc, true); > } >>> qemu_can_send_packet returns 1 if !nc->peer, so this doesn't work. >>> >>> Fam >> Yes, I was wrong. >> >> Btw, instead of depending on vm handler (which seems racy with other >> state change handler). Can we do this in places like vm_start() and >> vm_stop(). Like we drain and flush block queue during vm stop. >> > Because that's a bit hacky. > > It won't be racy if we replace vdev->vm_running with runstate_is_running() in > virtio-net, and/or don't check it in virtio_net_can_receive(). Is that OK? Ok but not necessary. I believe this patch will be used with the patch that drops virtio_net_can_receive(). So at any order two vmstate handlers were called, the sent_cb will be called and poll will be enabled again. I'm ok with the patch. So Reviewed-by: Jason Wang Thanks > > Fam
Re: [Qemu-devel] [Qemu-block] [PATCH 0/3] mirror: Fix guest responsiveness during bitmap scan
On Fri, 07/10 14:16, Alexandre DERUMIER wrote: > So, I think patch 3/3 is enough. (could be great to have it for qemu 2.4) Stefan, are you happy with this series? Fam
Re: [Qemu-devel] [PATCH] virtio-net: Drop net_virtio_info.can_receive
On 07/08/2015 06:50 PM, Stefan Hajnoczi wrote: > On Tue, Jul 07, 2015 at 04:45:41PM +0800, Jason Wang wrote: >> >> On 07/06/2015 11:21 PM, Stefan Hajnoczi wrote: >>> On Mon, Jul 06, 2015 at 11:32:25AM +0800, Jason Wang wrote: On 07/02/2015 08:46 PM, Stefan Hajnoczi wrote: > On Tue, Jun 30, 2015 at 04:35:24PM +0800, Jason Wang wrote: >> On 06/30/2015 11:06 AM, Fam Zheng wrote: >>> virtio_net_receive still does the check by calling >>> virtio_net_can_receive, if the device or driver is not ready, the packet >>> is dropped. >>> >>> This is necessary because returning false from can_receive complicates >>> things: the peer would disable sending until we explicitly flush the >>> queue. >>> >>> Signed-off-by: Fam Zheng >>> --- >>> hw/net/virtio-net.c | 1 - >>> 1 file changed, 1 deletion(-) >>> >>> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c >>> index d728233..dbef0d0 100644 >>> --- a/hw/net/virtio-net.c >>> +++ b/hw/net/virtio-net.c >>> @@ -1503,7 +1503,6 @@ static int virtio_net_load_device(VirtIODevice >>> *vdev, QEMUFile *f, >>> static NetClientInfo net_virtio_info = { >>> .type = NET_CLIENT_OPTIONS_KIND_NIC, >>> .size = sizeof(NICState), >>> -.can_receive = virtio_net_can_receive, >>> .receive = virtio_net_receive, >>> .link_status_changed = virtio_net_set_link_status, >>> .query_rx_filter = virtio_net_query_rxfilter, >> A side effect of this patch is it will read and then drop packet is >> guest driver is no ok. > I think that the semantics of .can_receive() and .receive() return > values are currently incorrect in many NICs. They have .can_receive() > functions that return false for conditions where .receive() would > discard the packet. So what happens is that packets get queued when > they should actually be discarded. Yes, but they are bugs more or less. > The purpose of the flow control (queuing) mechanism is to tell the > sender to hold off until the receiver has more rx buffers available. > It's a short-term thing that doesn't included link down, rx disable, or > NIC reset states. > > Therefore, I think this patch will not introduce a regression. It is > adjusting the code to stop queuing packets when they should actually be > dropped. > > Thoughts? I agree there's no functional issue. But it cause wasting of cpu cycles (consider guest is being flooded). Sometime it maybe even dangerous. For tap, we're probably ok since we have 756ae78b but for other backend, we don't. >>> If the guest uses iptables rules or other mechanisms to drop bogus >>> packets the cost is even higher than discarding them at the QEMU layer. >> But it was the choice of guest. >> >>> What's more is that if you're using link down as a DoS mitigation >>> strategy then you might as well hot unplug the NIC. >>> >>> Stefan >> I think there're two problems for virtio-net: >> >> 1) mitigation method when guest driver is ok. For tx, we have either >> timer or bh, for rx and only for tap, we have 756ae78b. We probably need >> fixes for other backends. > I agree. It's not directly related to the change Fam is proposing > though. > >> 2) when driver is not ok, the point is we should not poll the backend at >> all (I believe this is one of the main objects of main loop). Something >> like tap_can_send() and the commit that drops tap_can_send() all follow >> this rule. But this patch does not, we end up with: >> >> - driver is not ok or no driver, qemu keep reading and dropping packets. > It's correct from a functionality perspective. When rx is disabled or > the link is down, packets should be dropped. > > From a performance perspective I agree it would be better to drop > packets earlier in the stack. > > If you want tap to drop packets efficiently during link down/receive > disabled/etc: > > What's the best way to disable tap rx so the host kernel drops packets > instead of queuing them? There's no such API currently especially for a unprivileged qemu. But tuntap will drop the packet if it exceeds tx_queue_length. This looks still much better than current behaviour. > >> - driver is ok but not enough rx buffer, qemu will disable tap read poll. > QEMU relies on the kernel's socket receive buffer or netif receive > queue to hold incoming packets. When QEMU enables tap read poll again > it receives the queued packets. > > Queuing is necessary so USB net emulation (the rx buffer is only 1 > packet!) and maybe slirp work. Yes. > > I think real NICs drop packets when the rx ring is exhausted, but we > need to keep the queuing behavior unless someone proposes a way to > eliminate it. Yes, queueing seems useless if backend can do it. So I'm ok with the patch, and we can optimize in the future. Thanks
Re: [Qemu-devel] [PATCH 3/8] disas: m68k: QOMify target specific disas setup
On 12/07/15 12:00, Peter Crosthwaite wrote: > From: Peter Crosthwaite > > Move the target_disas() m68k specifics to the QOM disas_set_info hook > and delete the #ifdef specific code in disas.c. > > Cc: Greg Ungerer I see no problems. Reviewed-by: Greg Ungerer > Cc: Laurent Vivier > Signed-off-by: Peter Crosthwaite > --- > Testing: > I cant find binaries for this arch easily, but I got this from executing > random code: > > $ ./m68k-softmmu/qemu-system-m68k -kernel ./random_code -S -nographic -d > in_asm 2> err > QEMU 2.3.90 monitor - type 'help' for more information > (qemu) xp 0x4000 > 4000: 0x7d413a22 > (qemu) xp/i 0x4000 > 0x4000: mvsw %d1,%d6 > (qemu) xp/i 0x4004 > 0x4004: addqb #2,%a0@(27614) > (qemu) c > (qemu) Aborted (core dumped) > > $ more err > qemu: fatal: Illegal instruction: 7d41 @ 4000 > --- > disas.c | 4 > target-m68k/cpu.c | 7 +++ > 2 files changed, 7 insertions(+), 4 deletions(-) > > diff --git a/disas.c b/disas.c > index 6c86129..91cbb1a 100644 > --- a/disas.c > +++ b/disas.c > @@ -243,8 +243,6 @@ void target_disas(FILE *out, CPUState *cpu, target_ulong > code, > } > s.info.disassembler_options = (char *)"any"; > s.info.print_insn = print_insn_ppc; > -#elif defined(TARGET_M68K) > -s.info.print_insn = print_insn_m68k; > #elif defined(TARGET_MIPS) > #ifdef TARGET_WORDS_BIGENDIAN > s.info.print_insn = print_insn_big_mips; > @@ -463,8 +461,6 @@ void monitor_disas(Monitor *mon, CPUState *cpu, > s.info.endian = BFD_ENDIAN_LITTLE; > } > s.info.print_insn = print_insn_ppc; > -#elif defined(TARGET_M68K) > -s.info.print_insn = print_insn_m68k; > #elif defined(TARGET_MIPS) > #ifdef TARGET_WORDS_BIGENDIAN > s.info.print_insn = print_insn_big_mips; > diff --git a/target-m68k/cpu.c b/target-m68k/cpu.c > index 4f246da..2555755 100644 > --- a/target-m68k/cpu.c > +++ b/target-m68k/cpu.c > @@ -61,6 +61,11 @@ static void m68k_cpu_reset(CPUState *s) > tlb_flush(s, 1); > } > > +static void m68k_cpu_disas_set_info(CPUState *cpu, disassemble_info *info) > +{ > +info->print_insn = print_insn_m68k; > +} > + > /* CPU models */ > > static ObjectClass *m68k_cpu_class_by_name(const char *cpu_model) > @@ -212,6 +217,8 @@ static void m68k_cpu_class_init(ObjectClass *c, void > *data) > dc->vmsd = &vmstate_m68k_cpu; > cc->gdb_num_core_regs = 18; > cc->gdb_core_xml_file = "cf-core.xml"; > + > +cc->disas_set_info = m68k_cpu_disas_set_info; > } > > static void register_cpu_type(const M68kCPUInfo *info) >
Re: [Qemu-devel] [PATCH] target-ppc: Add POWER8E_v2.1 CPU model.
On 07/10/2015 07:19 PM, Andrea Bolognani wrote: On Fri, 2015-07-10 at 15:19 +1000, Alexey Kardashevskiy wrote: If I'm reading the kernel source[1] correctly, there are actually subtle differences other than the number of cores: #define CPU_FTRS_POWER8 (/* Bunch of features here */) #define CPU_FTRS_POWER8E (CPU_FTRS_POWER8 | CPU_FTR_PMAO_BUG) POWER8E was the first family of POWER8. Some revisions of POWER8E have this bug. The bug is that if we migrate from a good POWER8 to POWER8 with this bug, perf counters will stop working as the source kernel did not detect broken CPU and did not enable a workaround. We do not really know how many of this chips are out there (not many I believe) and how many have this bug. And also I guess that more people will be annoyed by inability to migrate rather than by broken perf. I'd leave migration enabled but it is worth mentioning somewhere in user's guide that if the user migrates a guest to POWER8E with the specific PVR, perf might break. David said we might be able to just apply the kernel PMAO workaround unconditionally, which would of course be great. Even if that turns out not to be possible, I agree with you that allowing migration is more important than not breaking performance monitoring, and documenting this properly should be enough. #define CPU_FTRS_POWER8_DD1 (CPU_FTRS_POWER8 & ~CPU_FTR_DBELL) This bug appears in POWER8E and POWER8 1.0 ("DD1"). The bug is that when a CPU thread wakes up after nap because of doorbell message, the doorbell interrupt is not pending. Guests cannot do nap themselves (they use H_CEDE hypercall for this), when a thread wakes up, it wakes in the host context so there is nothing to handle in the guest, it is purely for KVM to workaround. Great news! So the PVR matching, as done currently in QEMU, will include POWER8DD1 but exclude POWER8NVL, which according to to commit ddee09c0 and the code above is absolutely identical to POWER8. Yeah, we have to add 0x004c into the POWER8 family as well, I'll doublecheck. Okay. libvirt will soon advertise just POWER8 instead of specific models, and will consider the POWER8 guest CPU model to be compatible with any host having a POWER8* CPU, using the same PVR values and masks as the kernel. Once QEMU is updated to include POWER8NVL, the behavior will be consistent across the stack. Correct. Just to be sure: the same applies to POWER7 as well, right? It certainly looks so from both the kernel and QEMU code. Right. Thanks a lot for all the precious bits of information you've provided. I've learnt a lot too :) -- Alexey
Re: [Qemu-devel] [PATCH] more check for replaced node
On 07/02/2015 10:48 PM, Stefan Hajnoczi wrote: > On Wed, Jul 01, 2015 at 04:49:40PM +0800, Wen Congyang wrote: >> On 07/01/2015 04:39 PM, Stefan Hajnoczi wrote: >>> On Thu, Jun 25, 2015 at 02:55:10PM +0800, Wen Congyang wrote: Signed-off-by: Wen Congyang --- block.c | 5 +++-- block/mirror.c| 3 ++- blockdev.c| 2 +- include/block/block.h | 3 ++- 4 files changed, 8 insertions(+), 5 deletions(-) >>> >>> This patch is missing a commit description. What is the justification >>> for this change? >> >> Sorry, I forgot to add commit messages.. >> >> Without this patch, the replace node can be any node, and it can be >> top BDS with BB, or another quorum's child. With this patch, the replace node >> must be this quorum's child. > > I think the point of the replace operation was to swap a quorum child > with a new drive. It sounds like this patch will break that use case? > > The idea was that a failed child needs to be replaced. The user adds a > new -drive and then uses the mirror+replace to include it into the > quorum. I think the new child is not be part of the quorum BDS graph > until replacement occurs. bs/s->common.bs is quorum, and to_replace is the broken child. The new child is target_bs. With this patch, we just check if the broken child is part of the quorum BDS. Do I misunderstand something? Thanks Wen Congyang > > Stefan >
Re: [Qemu-devel] [PATCH qemu v10 10/14] spapr_pci: Enable vfio-pci hotplug
On Sun, Jul 12, 2015 at 09:41:27AM -0500, Michael Roth wrote: > Quoting Alexey Kardashevskiy (2015-07-11 23:59:45) > > On 07/11/2015 07:33 AM, Michael Roth wrote: > > > Quoting Alexey Kardashevskiy (2015-07-05 21:11:06) > > >> sPAPR IOMMU is managing two copies of an TCE table: > > >> 1) a guest view of the table - this is what emulated devices use and > > >> this is where H_GET_TCE reads from; > > >> 2) a hardware TCE table - only present if there is at least one vfio-pci > > >> device on a PHB; it is updated via a memory listener on a PHB address > > >> space which forwards map/unmap requests to vfio-pci IOMMU host driver. > > >> > > >> At the moment presence of vfio-pci devices on a bus affect the way > > >> the guest view table is allocated. If there is no vfio-pci on a PHB > > >> and the host kernel supports KVM acceleration of H_PUT_TCE, a table > > >> is allocated in KVM. However, if there is vfio-pci and we do yet not > > >> support KVM acceleration for these, the table has to be allocated > > >> by the userspace. > > >> > > >> When vfio-pci device is hotplugged and there were no vfio-pci devices > > >> already, the guest view table could have been allocated by KVM which > > >> means that H_PUT_TCE is handled by the host kernel and since we > > >> do not support vfio-pci in KVM, the hardware table will not be updated. > > >> > > >> This reallocates the guest view table in QEMU if the first vfio-pci > > >> device has just been plugged. spapr_tce_realloc_userspace() handles this. > > >> > > >> This replays all the mappings to make sure that the tables are in sync. > > >> This will not have a visible effect though as for a new device > > >> the guest kernel will allocate-and-map new addresses and therefore > > >> existing mappings from emulated devices will not be used by vfio-pci > > >> devices. > > >> > > >> This adds calls to spapr_phb_dma_capabilities_update() in PCI hotplug > > >> hooks. > > >> > > >> Signed-off-by: Alexey Kardashevskiy > > >> --- > > >> Changes: > > >> v10: > > >> * removed unnecessary memory_region_del_subregion() and > > >> memory_region_add_subregion() as > > >> "vfio: Unregister IOMMU notifiers when container is destroyed" removes > > >> notifiers in a more correct way > > >> > > >> v9: > > >> * spapr_phb_hotplug_dma_sync() enumerates TCE tables explicitely rather > > >> than > > >> via object_child_foreach() > > >> * spapr_phb_hotplug_dma_sync() does memory_region_del_subregion() + > > >> memory_region_add_subregion() as otherwise vfio_listener_region_del() is > > >> not > > >> called and we end up with vfio_iommu_map_notify registered twice > > >> (comments welcome!) > > >> if we do hotplug+hotunplug+hotplug of the same device. > > >> * moved spapr_phb_hotplug_dma_sync() on unplug event to rcu as before > > >> calling > > >> spapr_phb_hotplug_dma_sync(), we need VFIO to release the container, > > >> otherwise > > >> spapr_phb_dma_capabilities_update() will decide that the PHB still has > > >> VFIO device. > > >> Actual VFIO PCI device release happens from rcu and since we add ours > > >> later, > > >> it gets executed later and we are good. > > >> --- > > >> hw/ppc/spapr_iommu.c| 51 > > >> ++--- > > >> hw/ppc/spapr_pci.c | 47 > > >> + > > >> include/hw/pci-host/spapr.h | 1 + > > >> include/hw/ppc/spapr.h | 2 ++ > > >> trace-events| 2 ++ > > >> 5 files changed, 100 insertions(+), 3 deletions(-) > > >> > > >> diff --git a/hw/ppc/spapr_iommu.c b/hw/ppc/spapr_iommu.c > > >> index 45c00d8..2d99c3b 100644 > > >> --- a/hw/ppc/spapr_iommu.c > > >> +++ b/hw/ppc/spapr_iommu.c > > >> @@ -78,12 +78,13 @@ static uint64_t *spapr_tce_alloc_table(uint32_t > > >> liobn, > > >> uint32_t nb_table, > > >> uint32_t page_shift, > > >> int *fd, > > >> - bool vfio_accel) > > >> + bool vfio_accel, > > >> + bool force_userspace) > > >> { > > >> uint64_t *table = NULL; > > >> uint64_t window_size = (uint64_t)nb_table << page_shift; > > >> > > >> -if (kvm_enabled() && !(window_size >> 32)) { > > >> +if (kvm_enabled() && !force_userspace && !(window_size >> 32)) { > > >> table = kvmppc_create_spapr_tce(liobn, window_size, fd, > > >> vfio_accel); > > >> } > > >> > > >> @@ -222,7 +223,8 @@ static void spapr_tce_table_do_enable(sPAPRTCETable > > >> *tcet, bool vfio_accel) > > >> tcet->nb_table, > > >> tcet->page_shift, > > >> &tcet->fd, > > >> -vfio_accel); > > >> +vfio_accel, > > >> +
Re: [Qemu-devel] [PATCH v4 1/1] s390 pci infrastructure modelling
On 7/9/2015 5:57 PM, Michael S. Tsirkin wrote: On Thu, Jul 09, 2015 at 05:30:08PM +0800, zyimin wrote: On 7/9/2015 3:48 PM, Michael S. Tsirkin wrote: On Wed, Jul 08, 2015 at 01:44:55PM +0800, Yi Min Zhao wrote: @@ -588,9 +606,172 @@ static const TypeInfo s390_pcihost_info = { } }; +static void s390_pci_device_hot_plug(HotplugHandler *hotplug_dev, + DeviceState *dev, Error **errp) +{ +S390PCIBusDevice *zpci = S390_PCI_DEVICE(dev); +S390PCIBusDevice *tmp; +S390PCIFacility *f = S390_PCI_FACILITY( +object_resolve_path(TYPE_S390_PCI_FACILITY, NULL)); + +QTAILQ_FOREACH(tmp, &f->zpci_list, next) { +/* for now, we use fid to sort the list, need to use uid instead + * when uid is ready. What does ready mean in this context? uid is a new feature on s390 arch for pci card management. But support of uid on OS level has not been ready. Does this matter here? I thought it's an arbitrary order - just making it stable is enough? I think there i sno matter here. Because uid is just for zpci management of firmware and both of fid and uid are unique for all zpci devices attached to one guest. FID maybe change after s390 machine restart. If machine reboot, host will scan PCI devices again and will obtain new FID of every zPCI devices. For guest, it is isolated. + */ +if (tmp->fid > zpci->fid) { +break; +} +} + +if (tmp) { +QTAILQ_INSERT_BEFORE(tmp, zpci, next); +} else { +QTAILQ_INSERT_TAIL(&f->zpci_list, zpci, next); +} +f->token_valid = false; +} This still means hotplug will change the index. How about just using an explicit property to set the index? Harder to use but keeps code simple. These code were written by Hong Bo. I think he wants to use zpci->fid to sort pci cards. "index" what you talked is the order of hotplug? right? If it is, I think this code actually can change the index. I don't really understand what you are saying. id is how guest looks up the device, right? it shouldn't change really. For guest, it will use a special instruction to get pci info of all attached pci devices. For example, there are two pci devices which already are attached to the guest. PCI1: fid=1, uid=1 PCI2: fid=3, uid=3 Then, you hotplug a new pci device to the guest: PCI3: fid=2, uid=2 Although PCI2 is at the 3rd position of zpci_list, it will not effect guest looks up the device. Because guest uses fid and fh to get zpci info and do pci operations. I think fid and uid is just the explicit property what you said.
Re: [Qemu-devel] [PATCH v4 1/1] vhost user: add support of live migration
On 2015/7/10 21:05, Paolo Bonzini wrote: On 26/06/2015 11:22, Thibaut Collet wrote: Some vhost client/backend are able to support live migration. To provide this service the following features must be added: 1. Add the VIRTIO_NET_F_GUEST_ANNOUNCE capability to vhost-net when netdev backend is vhost-user. 2. Provide a nop receive callback to vhost-user. This callback is for RARP packets automatically send by qemu_announce_self after a migration. These packets are useless for vhost user and just discarded. When a packet is received by vhost-user, the vhost-user writes the packet in guest memory. QEMU must then copy that page of guest memory from source to destination; it uses a dirty bitmap for this purpose. How does vhost-user do this? I can see this patch providing enough support for *non*live migration. However, it cannot be enough for live migration unless I'm missing something obvious. Paolo Agree. vhost-user should mmap the log memory and mark dirty pages when send or receive packets. Signed-off-by: Thibaut Collet --- hw/net/vhost_net.c |2 ++ net/vhost-user.c | 21 +++-- 2 files changed, 21 insertions(+), 2 deletions(-) diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c index 9bd360b..668c422 100644 --- a/hw/net/vhost_net.c +++ b/hw/net/vhost_net.c @@ -85,6 +85,8 @@ static const int user_feature_bits[] = { VIRTIO_NET_F_CTRL_MAC_ADDR, VIRTIO_NET_F_CTRL_GUEST_OFFLOADS, +VIRTIO_NET_F_GUEST_ANNOUNCE, + VIRTIO_NET_F_MQ, VHOST_INVALID_FEATURE_BIT diff --git a/net/vhost-user.c b/net/vhost-user.c index b51bc04..20778a1 100644 --- a/net/vhost-user.c +++ b/net/vhost-user.c @@ -65,6 +65,24 @@ static void vhost_user_stop(VhostUserState *s) s->vhost_net = 0; } +static ssize_t vhost_user_receive(NetClientState *nc, const uint8_t *buf, + size_t size) +{ +/* A live migration is done. Display an error if the packet is not a RARP. + * RARP are just discarded: guest is already notified of live migration + * by the virtio-net NIC or by the vhost-user backend */ +if (size != 60) { +static int display_trace = 1; + +if (display_trace) { +fprintf(stderr,"Vhost user receives unexpected packets\n"); +fflush(stderr); +display_trace = 0; +} +} +return size; +} + static void vhost_user_cleanup(NetClientState *nc) { VhostUserState *s = DO_UPCAST(VhostUserState, nc, nc); @@ -90,6 +108,7 @@ static bool vhost_user_has_ufo(NetClientState *nc) static NetClientInfo net_vhost_user_info = { .type = NET_CLIENT_OPTIONS_KIND_VHOST_USER, .size = sizeof(VhostUserState), +.receive = vhost_user_receive, .cleanup = vhost_user_cleanup, .has_vnet_hdr = vhost_user_has_vnet_hdr, .has_ufo = vhost_user_has_ufo, @@ -146,8 +165,6 @@ static int net_vhost_user_init(NetClientState *peer, const char *device, s = DO_UPCAST(VhostUserState, nc, nc); -/* We don't provide a receive callback */ -s->nc.receive_disabled = 1; s->chr = chr; s->nc.queue_index = i;
Re: [Qemu-devel] [PATCH v3 0/7] cpu: add i386 cpu hot remove support
On 07/09/2015 10:25 PM, Eduardo Otubo wrote: On Fri, Jun 26, 2015 at 11=37=43AM +0800, Zhu Guihua wrote: Hi, On 06/24/2015 09:28 PM, Eduardo Otubo wrote: Hello Zhu, Are you still working on this feature? Could you provide a rebased version of this series? Sorry for late reply. Yes, we are still working on this feature. I have updated my github, you can get the rebased version from it. https://github.com/zhuguihua/qemu.git cpu-hotplug Hi, thanks a lot for rebasing it. I didn't have time to test it, though. I appologise. Regarding your branch, your plan is to stick with device_del and object_del interface? Yeah, we will only stick with device_del interface. There is no need to use object_del interface for cpu hot remove. Thanks, Zhu
Re: [Qemu-devel] [PULL 12/62] framebuffer: check memory_region_is_logging
On 12 July 2015 at 15:09, Paolo Bonzini wrote: > > > On 10/07/2015 17:44, Peter Maydell wrote: >> I've just noticed that one of my vexpress-a9 test images has >> regressed, and git bisect suggests this commit (d55d42078bfb50) >> is at fault. >> >> Test image (512MB) >> http://people.linaro.org/~peter.maydell/vexpress-3.8.tgz >> Untar anywhere, and run with >> ~/path/to/vexpress-3.8/runme path/to/qemu-system-arm > > I can't reproduce it. I get to the root shell prompt (both on the > framebuffer and on the serial console) and can poweroff successfully. Odd, it was entirely consistent for me. This is with GTK and X11-over-ssh-over-slowish-ADSL, so the actual refresh of the display will be slow, but that ought not to imply anything about how fast the guest can do things, right? (I'm wondering if we somehow manage to spend all our time trying to service the GUI and no time making forward progress in the guest, though I don't have a clear idea in mind of why this would be so or why this change would make things worse, except that presumably we now do more full-screen-invalidates.) -- PMM
[Qemu-devel] [PATCH] PR/66760, ipa-inline-analysis.c compile-time hog
From: bonz...@gnu.org In this PR, a lot of time is spent doing the same ipa_load_from_parm_agg query over and over. Luckily a memoization scheme is already there, it's just not used by ipa-inline-analysis.c. The patch moves the cache struct (struct func_body_info) to ipa-prop.h and modify ipa-inline-analysis.c. On some testcases from PR26854 the "alias stmt walking" timevar goes off the profile while it used to be 30-70%. Bootstrapped (regtest in progress) on x86_64-pc-linux-gnu. Please commit the patch for me if approved, as I don't have anymore the key I used to use for gcc.gnu.org. One of these days I'll send my new SSH public key to the overseers. Paolo * ipa-inline-analysis.c (unmodified_parm_or_parm_agg_item): Accept struct func_body_info* instead of struct ipa_node_params*, expecting fbi->info to be filled in. Replace throughout. Adjust call to ipa_load_from_parm_agg. (set_cond_stmt_execution_predicate): Accept struct func_body_info* instead of struct ipa_node_params*. Adjust calls to other functions so that they pass either fbi or fbi->info. (set_switch_stmt_execution_predicate): Likewise. (will_be_nonconstant_predicate): Likewise. (compute_bb_predicates): Likewise. (estimate_function_body_sizes): Move asserts earlier. Fill in struct func_body_info, replace parms_info with fbi.info. Adjust calls to functions that now accept struct func_body_info. * ipa-prop.c (struct param_aa_status, struct ipa_bb_info, struct func_body_info). Move to ipa-prop.h. (ipa_load_from_parm_agg_1): Rename to ipa_load_from_parm_agg, remove static. Adjust callers. (ipa_load_from_parm_agg): Remove. * ipa-prop.h (struct param_aa_status, struct ipa_bb_info, struct func_body_info). Move from ipa-prop.c. (ipa_load_from_parm_agg): Adjust prototype. diff --git a/gcc/ipa-inline-analysis.c b/gcc/ipa-inline-analysis.c index d5dbfbd..81a6860 100644 --- a/gcc/ipa-inline-analysis.c +++ b/gcc/ipa-inline-analysis.c @@ -1574,7 +1574,7 @@ unmodified_parm (gimple stmt, tree op) loaded. */ static bool -unmodified_parm_or_parm_agg_item (struct ipa_node_params *info, +unmodified_parm_or_parm_agg_item (struct func_body_info *fbi, gimple stmt, tree op, int *index_p, struct agg_position_info *aggpos) { @@ -1583,7 +1583,7 @@ unmodified_parm_or_parm_agg_item (struct ipa_node_params *info, gcc_checking_assert (aggpos); if (res) { - *index_p = ipa_get_param_decl_index (info, res); + *index_p = ipa_get_param_decl_index (fbi->info, res); if (*index_p < 0) return false; aggpos->agg_contents = false; @@ -1599,13 +1599,14 @@ unmodified_parm_or_parm_agg_item (struct ipa_node_params *info, stmt = SSA_NAME_DEF_STMT (op); op = gimple_assign_rhs1 (stmt); if (!REFERENCE_CLASS_P (op)) - return unmodified_parm_or_parm_agg_item (info, stmt, op, index_p, + return unmodified_parm_or_parm_agg_item (fbi, stmt, op, index_p, aggpos); } aggpos->agg_contents = true; - return ipa_load_from_parm_agg (info, stmt, op, index_p, &aggpos->offset, -&aggpos->by_ref); + return ipa_load_from_parm_agg (fbi, fbi->info->descriptors, +stmt, op, index_p, &aggpos->offset, +NULL, &aggpos->by_ref); } /* See if statement might disappear after inlining. @@ -1744,7 +1745,7 @@ eliminated_by_inlining_prob (gimple stmt) predicates to the CFG edges. */ static void -set_cond_stmt_execution_predicate (struct ipa_node_params *info, +set_cond_stmt_execution_predicate (struct func_body_info *fbi, struct inline_summary *summary, basic_block bb) { @@ -1767,7 +1768,7 @@ set_cond_stmt_execution_predicate (struct ipa_node_params *info, /* TODO: handle conditionals like var = op0 < 4; if (var != 0). */ - if (unmodified_parm_or_parm_agg_item (info, last, op, &index, &aggpos)) + if (unmodified_parm_or_parm_agg_item (fbi, last, op, &index, &aggpos)) { code = gimple_cond_code (last); inverted_code = invert_tree_comparison (code, HONOR_NANS (op)); @@ -1810,8 +1811,7 @@ set_cond_stmt_execution_predicate (struct ipa_node_params *info, || gimple_call_num_args (set_stmt) != 1) return; op2 = gimple_call_arg (set_stmt, 0); - if (!unmodified_parm_or_parm_agg_item - (info, set_stmt, op2, &index, &aggpos)) + if (!unmodified_parm_or_parm_agg_item (fbi, set_stmt, op2, &index, &aggpos)) return; FOR_EACH_EDGE (e, ei, bb->succs) if (e->flags & EDGE_FALSE_VALUE) { @@ -1827,7 +1827,7 @@ set_cond_stmt_execution_predicate (struct ipa_node_params *info, predicates to the CFG edges. */ static void -set_switch_stmt_execution_predicate (struct ipa_node_params *info, +set_switch_stmt_execution_predicate (struct func_body_info *fbi,
Re: [Qemu-devel] [PATCH qemu v10 10/14] spapr_pci: Enable vfio-pci hotplug
Quoting Alexey Kardashevskiy (2015-07-11 23:59:45) > On 07/11/2015 07:33 AM, Michael Roth wrote: > > Quoting Alexey Kardashevskiy (2015-07-05 21:11:06) > >> sPAPR IOMMU is managing two copies of an TCE table: > >> 1) a guest view of the table - this is what emulated devices use and > >> this is where H_GET_TCE reads from; > >> 2) a hardware TCE table - only present if there is at least one vfio-pci > >> device on a PHB; it is updated via a memory listener on a PHB address > >> space which forwards map/unmap requests to vfio-pci IOMMU host driver. > >> > >> At the moment presence of vfio-pci devices on a bus affect the way > >> the guest view table is allocated. If there is no vfio-pci on a PHB > >> and the host kernel supports KVM acceleration of H_PUT_TCE, a table > >> is allocated in KVM. However, if there is vfio-pci and we do yet not > >> support KVM acceleration for these, the table has to be allocated > >> by the userspace. > >> > >> When vfio-pci device is hotplugged and there were no vfio-pci devices > >> already, the guest view table could have been allocated by KVM which > >> means that H_PUT_TCE is handled by the host kernel and since we > >> do not support vfio-pci in KVM, the hardware table will not be updated. > >> > >> This reallocates the guest view table in QEMU if the first vfio-pci > >> device has just been plugged. spapr_tce_realloc_userspace() handles this. > >> > >> This replays all the mappings to make sure that the tables are in sync. > >> This will not have a visible effect though as for a new device > >> the guest kernel will allocate-and-map new addresses and therefore > >> existing mappings from emulated devices will not be used by vfio-pci > >> devices. > >> > >> This adds calls to spapr_phb_dma_capabilities_update() in PCI hotplug > >> hooks. > >> > >> Signed-off-by: Alexey Kardashevskiy > >> --- > >> Changes: > >> v10: > >> * removed unnecessary memory_region_del_subregion() and > >> memory_region_add_subregion() as > >> "vfio: Unregister IOMMU notifiers when container is destroyed" removes > >> notifiers in a more correct way > >> > >> v9: > >> * spapr_phb_hotplug_dma_sync() enumerates TCE tables explicitely rather > >> than > >> via object_child_foreach() > >> * spapr_phb_hotplug_dma_sync() does memory_region_del_subregion() + > >> memory_region_add_subregion() as otherwise vfio_listener_region_del() is > >> not > >> called and we end up with vfio_iommu_map_notify registered twice (comments > >> welcome!) > >> if we do hotplug+hotunplug+hotplug of the same device. > >> * moved spapr_phb_hotplug_dma_sync() on unplug event to rcu as before > >> calling > >> spapr_phb_hotplug_dma_sync(), we need VFIO to release the container, > >> otherwise > >> spapr_phb_dma_capabilities_update() will decide that the PHB still has > >> VFIO device. > >> Actual VFIO PCI device release happens from rcu and since we add ours > >> later, > >> it gets executed later and we are good. > >> --- > >> hw/ppc/spapr_iommu.c| 51 > >> ++--- > >> hw/ppc/spapr_pci.c | 47 > >> + > >> include/hw/pci-host/spapr.h | 1 + > >> include/hw/ppc/spapr.h | 2 ++ > >> trace-events| 2 ++ > >> 5 files changed, 100 insertions(+), 3 deletions(-) > >> > >> diff --git a/hw/ppc/spapr_iommu.c b/hw/ppc/spapr_iommu.c > >> index 45c00d8..2d99c3b 100644 > >> --- a/hw/ppc/spapr_iommu.c > >> +++ b/hw/ppc/spapr_iommu.c > >> @@ -78,12 +78,13 @@ static uint64_t *spapr_tce_alloc_table(uint32_t liobn, > >> uint32_t nb_table, > >> uint32_t page_shift, > >> int *fd, > >> - bool vfio_accel) > >> + bool vfio_accel, > >> + bool force_userspace) > >> { > >> uint64_t *table = NULL; > >> uint64_t window_size = (uint64_t)nb_table << page_shift; > >> > >> -if (kvm_enabled() && !(window_size >> 32)) { > >> +if (kvm_enabled() && !force_userspace && !(window_size >> 32)) { > >> table = kvmppc_create_spapr_tce(liobn, window_size, fd, > >> vfio_accel); > >> } > >> > >> @@ -222,7 +223,8 @@ static void spapr_tce_table_do_enable(sPAPRTCETable > >> *tcet, bool vfio_accel) > >> tcet->nb_table, > >> tcet->page_shift, > >> &tcet->fd, > >> -vfio_accel); > >> +vfio_accel, > >> +false); > >> > >> memory_region_set_size(&tcet->iommu, > >> (uint64_t)tcet->nb_table << tcet->page_shift); > >> @@ -495,6 +497,49 @@ int spapr_dma_dt(void *fdt, int node_off, const char > >> *propnam
Re: [Qemu-devel] [PULL 12/62] framebuffer: check memory_region_is_logging
On 10/07/2015 17:44, Peter Maydell wrote: > I've just noticed that one of my vexpress-a9 test images has > regressed, and git bisect suggests this commit (d55d42078bfb50) > is at fault. > > Test image (512MB) > http://people.linaro.org/~peter.maydell/vexpress-3.8.tgz > Untar anywhere, and run with > ~/path/to/vexpress-3.8/runme path/to/qemu-system-arm I can't reproduce it. I get to the root shell prompt (both on the framebuffer and on the serial console) and can poweroff successfully. Paolo
Re: [Qemu-devel] [PATCH 0/8] Disas QOMification, round 2
Am 12.07.2015 um 03:59 schrieb Peter Crosthwaite: > Continue QOMifying target-specific disassembly. Convert all arches except for > X86 and PPC. They will be round 3 and are non-trivial. > > This brings us close to no arch-specific code in disas.c allow conversion to > common-obj and preparing these arches for inclusion in multi-arch. I was about to say that this is for individual target maintainers now, for lack of CPUClass changes, but I guess they'll collide in disas.c... Once it's reviewed I can queue it on qom-cpu-next, unless Peter or someone wants to. Thanks for your efforts, Andreas -- SUSE Linux GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany GF: Felix Imendörffer, Jane Smithard, Dilip Upmanyu, Graham Norton; HRB 21284 (AG Nürnberg)
Re: [Qemu-devel] [PATCH 5/8] disas: lm32: QOMify target specific disas setup
Am 12. Juli 2015 04:00:02 MESZ, schrieb Peter Crosthwaite : >From: Peter Crosthwaite > >Move the target_disas() lm32 specifics to the QOM disas_set_info hook >and delete the #ifdef specific code in disas.c. > >Cc: Michael Walle >Signed-off-by: Peter Crosthwaite Acked-by: Michael Walle
Re: [Qemu-devel] [PATCH 3/8] disas: m68k: QOMify target specific disas setup
Le 12/07/2015 04:00, Peter Crosthwaite a écrit : > From: Peter Crosthwaite > > Move the target_disas() m68k specifics to the QOM disas_set_info hook > and delete the #ifdef specific code in disas.c. > > Cc: Greg Ungerer > Cc: Laurent Vivier > Signed-off-by: Peter Crosthwaite > --- > Testing: > I cant find binaries for this arch easily, but I got this from executing > random code: > > $ ./m68k-softmmu/qemu-system-m68k -kernel ./random_code -S -nographic -d > in_asm 2> err > QEMU 2.3.90 monitor - type 'help' for more information > (qemu) xp 0x4000 > 4000: 0x7d413a22 > (qemu) xp/i 0x4000 > 0x4000: mvsw %d1,%d6 > (qemu) xp/i 0x4004 > 0x4004: addqb #2,%a0@(27614) > (qemu) c > (qemu) Aborted (core dumped) > > $ more err > qemu: fatal: Illegal instruction: 7d41 @ 4000 > --- > disas.c | 4 > target-m68k/cpu.c | 7 +++ > 2 files changed, 7 insertions(+), 4 deletions(-) Reviewed-by: Laurent Vivier