Re: [Qemu-devel] [PATCH 09/12] ring: introduce lockless ring buffer
On Mon, Jun 04, 2018 at 05:55:17PM +0800, guangrong.x...@gmail.com wrote:

[...]

(Some more comments/questions for the MP implementation...)

> +static inline int ring_mp_put(Ring *ring, void *data)
> +{
> +    unsigned int index, in, in_next, out;
> +
> +    do {
> +        in = atomic_read(&ring->in);
> +        out = atomic_read(&ring->out);

[0] Do we need to fetch "out" with load_acquire()?  Otherwise, what does
the store_release() below at [1] pair with?

This barrier exists in the SP-SC case, which makes sense to me; I assume
it's also needed for the MP-SC case, am I right?

> +
> +        if (__ring_is_full(ring, in, out)) {
> +            if (atomic_read(&ring->in) == in &&
> +                atomic_read(&ring->out) == out) {

Why read again?  After all, the ring API seems to be designed as
non-blocking.  E.g., the poll at [2] below makes more sense to me: reaching
[2] means there must be a producer that is _doing_ the queuing, so the
poll is very likely to complete quickly.  Here, however, it seems to be a
pure busy poll without any hint.  So I'm not sure whether we should just
let the caller decide whether it wants to call ring_put() again.

> +                return -ENOBUFS;
> +            }
> +
> +            /* a entry has been fetched out, retry. */
> +            continue;
> +        }
> +
> +        in_next = in + 1;
> +    } while (atomic_cmpxchg(&ring->in, in, in_next) != in);
> +
> +    index = ring_index(ring, in);
> +
> +    /*
> +     * smp_rmb() paired with the memory barrier of (A) in ring_mp_get()
> +     * is implied in atomic_cmpxchg() as we should read ring->out first
> +     * before fetching the entry, otherwise this assert will fail.

Thanks for all these comments!  They are really helpful for reviewers.

However, I'm not sure I understand the MB of (A) in ring_mp_get()
correctly here.  AFAIU it should correspond to an smp_rmb() at [0] above,
when reading the "out" variable, rather than to this assertion; that's why
I thought we should have something like a load_acquire() at [0] (which
contains an rmb()).
Content-wise, I think the code here is correct: atomic_cmpxchg() should
have an implicit smp_mb() after all, so we don't need any further
barriers here.

> +     */
> +    assert(!atomic_read(&ring->data[index]));
> +
> +    /*
> +     * smp_mb() paired with the memory barrier of (B) in ring_mp_get() is
> +     * implied in atomic_cmpxchg(), that is needed here as we should read
> +     * ring->out before updating the entry, it is the same as we did in
> +     * __ring_put().
> +     *
> +     * smp_wmb() paired with the memory barrier of (C) in ring_mp_get()
> +     * is implied in atomic_cmpxchg(), that is needed as we should increase
> +     * ring->in before updating the entry.
> +     */
> +    atomic_set(&ring->data[index], data);
> +
> +    return 0;
> +}
> +
> +static inline void *ring_mp_get(Ring *ring)
> +{
> +    unsigned int index, in;
> +    void *data;
> +
> +    do {
> +        in = atomic_read(&ring->in);
> +
> +        /*
> +         * (C) should read ring->in first to make sure the entry pointed by this
> +         * index is available
> +         */
> +        smp_rmb();
> +
> +        if (!__ring_is_empty(in, ring->out)) {
> +            break;
> +        }
> +
> +        if (atomic_read(&ring->in) == in) {
> +            return NULL;
> +        }
> +        /* new entry has been added in, retry. */
> +    } while (1);
> +
> +    index = ring_index(ring, ring->out);
> +
> +    do {
> +        data = atomic_read(&ring->data[index]);
> +        if (data) {
> +            break;
> +        }
> +        /* the producer is updating the entry, retry */
> +        cpu_relax();

[2]

> +    } while (1);
> +
> +    atomic_set(&ring->data[index], NULL);
> +
> +    /*
> +     * (B) smp_mb() is needed as we should read the entry out before
> +     * updating ring->out as we did in __ring_get().
> +     *
> +     * (A) smp_wmb() is needed as we should make the entry be NULL before
> +     * updating ring->out (which will make the entry be visible and usable).
> +     */
> +    atomic_store_release(&ring->out, ring->out + 1);

[1]

> +
> +    return data;
> +}
> +
> +static inline int ring_put(Ring *ring, void *data)
> +{
> +    if (ring->flags & RING_MULTI_PRODUCER) {
> +        return ring_mp_put(ring, data);
> +    }
> +    return __ring_put(ring, data);
> +}
> +
> +static inline void *ring_get(Ring *ring)
> +{
> +    if (ring->flags & RING_MULTI_PRODUCER) {
> +        return ring_mp_get(ring);
> +    }
> +    return __ring_get(ring);
> +}
> +#endif
> --
> 2.14.4
>

Thanks,

-- 
Peter Xu
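A minimal sketch of the pairing asked about at [0] and [1], written with C11 `<stdatomic.h>` instead of QEMU's `atomic_*` helpers. `mp_ring`, `mp_put()` and `mp_get()` are hypothetical names and this is not the patch's code. Producers read `out` with acquire ordering, which pairs with the consumer's release store to `out`. A full ring returns `-ENOBUFS` straight away, which leaves any retry to the caller.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stddef.h>

#define MP_RING_SIZE 8u                 /* must be a power of 2 */
#define MP_RING_MASK (MP_RING_SIZE - 1)

struct mp_ring {
    _Atomic unsigned int in;            /* next position a producer claims */
    _Atomic unsigned int out;           /* next position the consumer reads */
    void *_Atomic data[MP_RING_SIZE];   /* NULL means "slot empty" */
};

static void mp_ring_init(struct mp_ring *r)
{
    atomic_init(&r->in, 0);
    atomic_init(&r->out, 0);
    for (unsigned int i = 0; i < MP_RING_SIZE; i++) {
        atomic_init(&r->data[i], NULL);
    }
}

/* Any number of producers: claim a position with CAS, then publish it. */
static int mp_put(struct mp_ring *r, void *item)
{
    unsigned int in, out;

    assert(item != NULL);               /* NULL is reserved for "empty" */
    do {
        /* [0] acquire pairs with the release store at [1] in mp_get() */
        out = atomic_load_explicit(&r->out, memory_order_acquire);
        in = atomic_load_explicit(&r->in, memory_order_relaxed);
        if (in - out >= MP_RING_SIZE) {
            return -ENOBUFS;            /* full: the caller decides */
        }
    } while (!atomic_compare_exchange_weak_explicit(&r->in, &in, in + 1,
                                                    memory_order_relaxed,
                                                    memory_order_relaxed));

    /* Position "in" is ours; the consumer's NULL store is visible via [0]. */
    assert(atomic_load_explicit(&r->data[in & MP_RING_MASK],
                                memory_order_relaxed) == NULL);
    /* release: a consumer that sees the pointer also sees *item */
    atomic_store_explicit(&r->data[in & MP_RING_MASK], item,
                          memory_order_release);
    return 0;
}

/* Exactly one consumer thread. */
static void *mp_get(struct mp_ring *r)
{
    unsigned int out = atomic_load_explicit(&r->out, memory_order_relaxed);
    unsigned int in = atomic_load_explicit(&r->in, memory_order_relaxed);
    void *item;

    if (in == out) {
        return NULL;                    /* empty */
    }
    /* [2] claimed but maybe not filled yet; acquire pairs with mp_put() */
    while ((item = atomic_load_explicit(&r->data[out & MP_RING_MASK],
                                        memory_order_acquire)) == NULL) {
        /* spin (the patch calls cpu_relax() here) */
    }
    atomic_store_explicit(&r->data[out & MP_RING_MASK], NULL,
                          memory_order_relaxed);
    /* [1] release: the NULL above is visible before the slot is reused */
    atomic_store_explicit(&r->out, out + 1, memory_order_release);
    return item;
}
```

The producer reads `out` before `in`. This ordering is a choice made for this sketch: it ensures the snapshot never has `out` ahead of `in`, so the unsigned `in - out` cannot wrap into a spurious "full".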
Re: [Qemu-devel] [PATCH v2 3/4] ppc/pnv: introduce Pnv8Chip and Pnv9Chip models
On 06/20/2018 02:56 AM, David Gibson wrote:
> On Tue, Jun 19, 2018 at 07:24:44AM +0200, Cédric Le Goater wrote:
>>>>>  typedef struct PnvChipClass {
>>>>>      /*< private >*/
>>>>> @@ -75,6 +95,7 @@ typedef struct PnvChipClass {
>>>>>  
>>>>>      hwaddr xscom_base;
>>>>>  
>>>>> +    void (*realize)(PnvChip *chip, Error **errp);
>>>>
>>>> This looks the wrong way round from how things are usually done.
>>>> Rather than having the base class realize() call the subclass specific
>>>> realize hook, it's more usual for the subclass to set the
>>>> dc->realize() and have it call a k->parent_realize() to call up the
>>>> chain.  grep for device_class_set_parent_realize() for some more
>>>> examples.
>>>
>>> Ah. That is more to my liking. There are a couple of models following
>>> the wrong object pattern, xics, vio. I will check them.
>>
>> So XICS is causing some head-aches because the ics-kvm model inherits
>> from ics-simple which inherits from ics-base. so we have a grand-parent
>> class to handle.
>
> Ok.  I mean, we should probably switch ics around to use the
> parent_realize model, rather than the backwards one it does now.  But
> it's not immediately obvious to me why having a grandparent class
> breaks things.

If you follow the common realize pattern, you end up with a recursive
loop in one of the realize routines. I didn't dig much into the issue.

>> if we could affiliate ics-kvm directly to ics-base it would make the
>> family affair easier. we need to check migration though.
>
> But that said, I've been thinking for a while that it might make sense
> to fold ics-kvm into ics-base.  It seems very risky to have two
> different object classes that are supposed to have guest-identical
> behaviour.  Certainly adding any migratable state to one not the other
> would be horribly wrong.

Yes, clearly. Something like below would be better:

                 +-------------+
                 |     ICS     |
        +--------+ common/base +--------+
        |        +-------------+        |
        |                               |
        | spapr                   spapr |
        | pnv                           |
   +----v--------+             +--------v+
   |     ICS     |             |   ICS   |
   | simple/QEMU |             |   KVM   |
   +-------------+             +---------+

with only some reset and realize handling in the subclasses.
The only extra field we could add under the KVM class is the KVM XICS
device fd.

Thanks,

C.
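Here is a toy model in plain C of the `parent_realize` chaining David describes, for a three-level hierarchy like ics-base, ics-simple, ics-kvm. Everything in it is hypothetical (no QOM types or calls). The recursion it guards against is only one plausible way a grandparent class can loop: if a middle level chains up through the instance's class rather than its own, an ics-kvm instance finds ics-simple's hook in the saved slot and calls it again.

```c
#include <assert.h>
#include <string.h>

/* Toy stand-ins for QOM; none of these are QEMU's real types. */
typedef struct Device Device;
typedef void (*RealizeFn)(Device *dev);

typedef struct DeviceClass {
    RealizeFn realize;          /* hook the core calls */
    RealizeFn parent_realize;   /* hook inherited from the class above */
} DeviceClass;

struct Device {
    const DeviceClass *klass;   /* most-derived class of the instance */
    char trace[8];              /* order in which realize hooks ran */
};

/* One class object per level, e.g. ics-base, ics-simple, ics-kvm. */
static DeviceClass base_class, simple_class, kvm_class;

static void base_realize(Device *dev)
{
    strcat(dev->trace, "B");
}

/*
 * Each level chains up through ITS OWN class object.  Going through
 * dev->klass instead would break for a kvm instance: there
 * parent_realize is simple_realize, so simple_realize would call itself.
 */
static void simple_realize(Device *dev)
{
    simple_class.parent_realize(dev);   /* -> base_realize */
    strcat(dev->trace, "S");
}

static void kvm_realize(Device *dev)
{
    kvm_class.parent_realize(dev);      /* -> simple_realize */
    strcat(dev->trace, "K");
}

/* Same idea as device_class_set_parent_realize(): save, then override. */
static void set_parent_realize(DeviceClass *dc, RealizeFn fn)
{
    dc->parent_realize = dc->realize;
    dc->realize = fn;
}

static void classes_init(void)
{
    base_class.realize = base_realize;
    simple_class = base_class;          /* a subclass starts as a copy */
    set_parent_realize(&simple_class, simple_realize);
    kvm_class = simple_class;
    set_parent_realize(&kvm_class, kvm_realize);
}

static void device_realize(Device *dev, const DeviceClass *klass)
{
    assert(klass->realize != NULL);
    dev->klass = klass;
    dev->trace[0] = '\0';
    klass->realize(dev);
}
```

Realizing a kvm instance runs base, then simple, then kvm, each exactly once.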
Re: [Qemu-devel] [PATCH v2 3/3] spapr: introduce a fixed IRQ number space
>> diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
>> index e4f5946a2188..c82dc40be0d5 100644
>> --- a/hw/ppc/spapr_events.c
>> +++ b/hw/ppc/spapr_events.c
>> @@ -709,7 +709,11 @@ void spapr_events_init(sPAPRMachineState *spapr)
>>  {
>>      int epow_irq;
>>  
>> -    epow_irq = spapr_irq_findone(spapr, &error_fatal);
>> +    if (spapr->xics_legacy) {
>> +        epow_irq = spapr_irq_findone(spapr, &error_fatal);
>> +    } else {
>> +        epow_irq = SPAPR_IRQ_EPOW;
>
> Can slightly improve brevity by just initializing epow_irq to this,
> then overwriting it in the legacy case.

And I forgot to add this to v3 ... I can add it later on if there are no
other changes requested, or if we move the find routine under the machine
class.

C.
Re: [Qemu-devel] [PATCH 09/12] ring: introduce lockless ring buffer
On Mon, Jun 04, 2018 at 05:55:17PM +0800, guangrong.x...@gmail.com wrote:
> From: Xiao Guangrong
>
> It's the simple lockless ring buffer implement which supports both
> single producer vs. single consumer and multiple producers vs.
> single consumer.
>
> Many lessons were learned from Linux Kernel's kfifo (1) and DPDK's
> rte_ring (2) before i wrote this implement. It corrects some bugs of
> memory barriers in kfifo and it is the simpler lockless version of
> rte_ring as currently multiple access is only allowed for producer.

Could you provide some more information about the kfifo bug?  Any
pointer would be appreciated.

>
> If has single producer vs. single consumer, it is the traditional fifo,
> If has multiple producers, it uses the algorithm as followings:
>
> For the producer, it uses two steps to update the ring:
>    - first step, occupy the entry in the ring:
>
> retry:
>       in = ring->in
>       if (cmpxchg(&ring->in, in, in + 1) != in)
>             goto retry;
>
>      after that the entry pointed by ring->data[in] has been owned by
>      the producer.
>
>      assert(ring->data[in] == NULL);
>
>      Note, no other producer can touch this entry so that this entry
>      should always be the initialized state.
>
>    - second step, write the data to the entry:
>
>      ring->data[in] = data;
>
> For the consumer, it first checks if there is available entry in the
> ring and fetches the entry from the ring:
>
>      if (!ring_is_empty(ring))
>           entry = &ring->data[ring->out];
>
>      Note: the ring->out has not been updated so that the entry pointed
>      by ring->out is completely owned by the consumer.
>
> Then it checks if the data is ready:
>
> retry:
>      if (*entry == NULL)
>             goto retry;
> That means, the producer has updated the index but haven't written any
> data to it.
>
> Finally, it fetches the valid data out, set the entry to the initialized
> state and update ring->out to make the entry be usable to the producer:
>
>       data = *entry;
>       *entry = NULL;
>       ring->out++;
>
> Memory barrier is omitted here, please refer to the comment in the code.
>
> (1) https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/linux/kfifo.h
> (2) http://dpdk.org/doc/api/rte__ring_8h.html
>
> Signed-off-by: Xiao Guangrong
> ---
>  migration/ring.h | 265 +++

If this is a very general implementation, I'm not sure whether we could
move it to the util/ directory, so that it can be used outside the
migration code as well.

>  1 file changed, 265 insertions(+)
>  create mode 100644 migration/ring.h
>
> diff --git a/migration/ring.h b/migration/ring.h
> new file mode 100644
> index 00..da9b8bdcbb
> --- /dev/null
> +++ b/migration/ring.h
> @@ -0,0 +1,265 @@
> +/*
> + * Ring Buffer
> + *
> + * Multiple producers and single consumer are supported with lock free.
> + *
> + * Copyright (c) 2018 Tencent Inc
> + *
> + * Authors:
> + *  Xiao Guangrong
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + */
> +
> +#ifndef _RING__
> +#define _RING__
> +
> +#define CACHE_LINE 64

Is this for x86_64?  Is the cache line size the same on all architectures?

> +#define cache_aligned __attribute__((__aligned__(CACHE_LINE)))
> +
> +#define RING_MULTI_PRODUCER 0x1
> +
> +struct Ring {
> +    unsigned int flags;
> +    unsigned int size;
> +    unsigned int mask;
> +
> +    unsigned int in cache_aligned;
> +
> +    unsigned int out cache_aligned;
> +
> +    void *data[0] cache_aligned;
> +};
> +typedef struct Ring Ring;
> +
> +/*
> + * allocate and initialize the ring
> + *
> + * @size: the number of element, it should be power of 2
> + * @flags: set to RING_MULTI_PRODUCER if the ring has multiple producer,
> + *         otherwise set it to 0, i,e. single producer and single consumer.
> + *
> + * return the ring.
> + */
> +static inline Ring *ring_alloc(unsigned int size, unsigned int flags)
> +{
> +    Ring *ring;
> +
> +    assert(is_power_of_2(size));
> +
> +    ring = g_malloc0(sizeof(*ring) + size * sizeof(void *));
> +    ring->size = size;
> +    ring->mask = ring->size - 1;
> +    ring->flags = flags;
> +    return ring;
> +}
> +
> +static inline void ring_free(Ring *ring)
> +{
> +    g_free(ring);
> +}
> +
> +static inline bool __ring_is_empty(unsigned int in, unsigned int out)
> +{
> +    return in == out;
> +}

(Some of the helpers are a bit confusing to me, like this one; I would
prefer some of them to be squashed directly into the code, but that's
only a personal preference.)

> +
> +static inline bool ring_is_empty(Ring *ring)
> +{
> +    return ring->in == ring->out;
> +}
> +
> +static inline unsigned int ring_len(unsigned int in, unsigned int out)
> +{
> +    return in - out;
> +}

(This too.)

> +
> +static inline bool
> +__ring_is_full(Ring *ring, unsigned int in, unsigned int out)
> +{
> +    return
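The helpers above rely on free-running `in`/`out` counters combined with a power-of-2 mask. The sketch below shows how those pieces fit together. The `RingIdx` type and `ring_idx_*` names are hypothetical. Because the quoted `__ring_is_full()` is cut off, its `>= size` test here is an assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical index-only model of the ring in the quoted ring.h. */
typedef struct {
    unsigned int size;   /* number of slots, a power of 2 */
    unsigned int mask;   /* size - 1 */
    unsigned int in;     /* free-running producer counter */
    unsigned int out;    /* free-running consumer counter */
} RingIdx;

static bool is_pow2(unsigned int n)
{
    return n != 0 && (n & (n - 1)) == 0;
}

static void ring_idx_init(RingIdx *r, unsigned int size, unsigned int start)
{
    assert(is_pow2(size));
    r->size = size;
    r->mask = size - 1;
    r->in = r->out = start;   /* any start value works, even near UINT_MAX */
}

/* Unsigned subtraction is modulo 2^N, so this survives counter wrap. */
static unsigned int ring_idx_len(const RingIdx *r)
{
    return r->in - r->out;
}

static bool ring_idx_is_empty(const RingIdx *r)
{
    return r->in == r->out;
}

/* Assumed completion of the truncated __ring_is_full(). */
static bool ring_idx_is_full(const RingIdx *r)
{
    return ring_idx_len(r) >= r->size;
}

/*
 * pos & mask == pos % size because size is a power of 2; since the counter
 * range 2^N is a multiple of size, slots stay contiguous across the wrap.
 */
static unsigned int ring_idx_slot(const RingIdx *r, unsigned int pos)
{
    return pos & r->mask;
}
```

Starting the counters just below the wrap point shows that length, full and slot computations stay correct when `in` overflows.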
Re: [Qemu-devel] [RFC v2 1/3] pci_expander_bridge: add type TYPE_PXB_PCIE_HOST
On 06/13/2018 11:23 AM, Zihan Yang wrote:
> Michael S. Tsirkin wrote on Tue, Jun 12, 2018 at 9:43 PM:
>
>> On Tue, Jun 12, 2018 at 05:13:22PM +0800, Zihan Yang wrote:
>>> The inner host bridge created by pxb-pcie is TYPE_PXB_PCI_HOST by default,
>>> add a new type TYPE_PXB_PCIE_HOST to better utilize the ECAM of PCIe
>>>
>>> Signed-off-by: Zihan Yang
>>
>> I have a concern that there are lots of new properties added here, I'm
>> not sure how are upper layers supposed to manage them all. E.g. bus_nr
>> supplied in several places, domain_nr for which it's not clear how it is
>> supposed to be allocated, etc.
>
> Indeed they seem to double the properties, but the pxb host is an
> internal structure of pxb-pcie device, created in pxb-pcie's realization
> procedure, and acpi-build queries host bridges instead of pxb-pcie
> devices. This means that users can not directly specify the property of
> pxb host bridge, but must 'inherit' from pxb-pcie devices. I had thought
> about changing the acpi-build process, but that would require more
> modifications.
>
> As for the properties, bus_nr means the start bus number of this host
> bridge. It is used when pxb-pcie is in pci domain 0 with q35 host to
> avoid bus number confliction. When it is placed in a separate pci
> domain, it is not used and should be 0. max_bus means how many buses
> the user desires, EACH bus in PCIe requires 1MB configuration space,
> thus specifying it could reduce the reserved memory in MMCFG as
> suggested by Marcel.

The max_bus property is optional, you set the default to 255.
I am wondering if 255 is too much as a default for an extra bus;
I would use a smaller value, like 10.

> Typically, the user can specify
>
>   -device pxb-pcie,id=br1,bus="pcie.0",sep_domain=on,domain_nr=1,max_bus=130
>
> this will place the buses under this pxb host bridge in pci domain 1,
> and reserve (130 + 1) = 131 buses for it. The start bus number is always
> 0 currently for simplicity.
>
>> Can the management interface be simplified? Ideally we wouldn't have to
>> teach libvirt new tricks, just generalize pxb support slightly.
>
> We can delete 'sep_domain' property, I just find 'domain_nr' already
> indicates domain number.

Agreed, please remove sep_domain property.

Thanks,

> But domain_nr and max_bus seems unremovable, although they look
> 'redundant' because they appear twice. I'm not familiar with libvirt, but
> from the perspective of user, only 2 properties are added (domain_nr and
> max_bus, if we delete sep_domain), though the internal structure actually
> has changed.
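For scale, here is a sketch of the arithmetic behind max_bus. It assumes the standard PCIe ECAM layout and, as in the example above, that buses 0..max_bus are reserved. `pxb_mmcfg_bytes()` is a hypothetical helper, not QEMU code.

```c
#include <assert.h>
#include <stdint.h>

/* PCIe ECAM: 4 KiB of config space per function, 8 functions per device,
 * 32 devices per bus, i.e. 1 MiB per bus. */
#define ECAM_BYTES_PER_FUNC  4096u
#define ECAM_FUNCS_PER_BUS   (32u * 8u)
#define ECAM_BYTES_PER_BUS   ((uint64_t)ECAM_BYTES_PER_FUNC * ECAM_FUNCS_PER_BUS)

/* Hypothetical helper: MMCFG bytes reserved for buses 0..max_bus of one
 * pxb-pcie domain, assuming the start bus is always 0. */
static uint64_t pxb_mmcfg_bytes(unsigned int max_bus)
{
    assert(max_bus <= 255);          /* at most 256 buses per segment */
    return ((uint64_t)max_bus + 1) * ECAM_BYTES_PER_BUS;
}
```

So max_bus=130 reserves 131 MiB of MMCFG. The default of 255 reserves a full 256 MiB, and the suggested default of 10 only 11 MiB.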
[Qemu-devel] [PATCH 0/2] ide/hw/core: fix bug# 1777315, crash on short PRDs
The fix uses QEMU's existing policy on short PRDs, and treats the
transfers that cause the crash as having been generated through short
PRDs. It:

- continues to allow QEMU to support multiple calls to
  prepare_buf/ide_dma_cb,
- so, continues to keep QEMU free from needing the entire sglist in one
  go;
- avoids the crash;
- but treats the affected transfers as short, instead of allowing them
  to continue.

Amol Surati (1):
  ide/hw/core: fix crash on processing a partial-sector-size DMA xfer

John Snow (1):
  tests/ide-test: test case for crash when processing short PRDs

 hw/ide/core.c    |  5 ++++-
 tests/ide-test.c | 28 ++++++++++++++++++++++++++++
 2 files changed, 32 insertions(+), 1 deletion(-)

-- 
2.17.1
[Qemu-devel] [PATCH 1/2] ide/hw/core: fix crash on processing a partial-sector-size DMA xfer
Fixes: https://bugs.launchpad.net/qemu/+bug/1777315

QEMU's short PRD policy applies to a DMA transfer of size < 512 bytes.
But it fails to consider transfers which are >= 512 bytes, but are not a
multiple of 512 bytes. Such transfers are not subject to the short PRD
policy. They end up violating the assumptions about the granularity of
the IO sizes, upon which depend the verification of the completion of
the previous transfer, and the advancement of the offset in preparation
of the next. Those violations result in the crash.

By forcing each transfer to be a multiple of sector size, such transfers
are subjected to the policy, and therefore culled before they cause the
crash.

Signed-off-by: Amol Surati
---
 hw/ide/core.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/hw/ide/core.c b/hw/ide/core.c
index 2c62efc536..14d135224b 100644
--- a/hw/ide/core.c
+++ b/hw/ide/core.c
@@ -836,6 +836,7 @@ static void ide_dma_cb(void *opaque, int ret)
 {
     IDEState *s = opaque;
     int n;
+    int32_t size_prepared;
     int64_t sector_num;
     uint64_t offset;
     bool stay_active = false;
@@ -886,7 +887,9 @@ static void ide_dma_cb(void *opaque, int ret)
     n = s->nsector;
     s->io_buffer_index = 0;
     s->io_buffer_size = n * 512;
-    if (s->bus->dma->ops->prepare_buf(s->bus->dma, s->io_buffer_size) < 512) {
+    size_prepared = s->bus->dma->ops->prepare_buf(s->bus->dma,
+                                                  s->io_buffer_size);
+    if (size_prepared <= 0 || size_prepared % 512) {
         /* The PRDs were too short. Reset the Active bit, but don't raise an
          * interrupt. */
         s->status = READY_STAT | SEEK_STAT;
-- 
2.17.1
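The old and new checks can be compared in isolation. This sketch lifts only the predicate out of `ide_dma_cb()`; the function names are hypothetical. The PRDT in the accompanying test case (patch 2/2) describes 0x200 + 0x44 = 0x244 bytes for a 2-sector read. A transfer of that size passes the old `< 512` test but is rejected by the new one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SECTOR_SIZE 512

/* Check before the patch: only transfers under one sector were "short". */
static bool prd_short_before(int32_t size_prepared)
{
    return size_prepared < SECTOR_SIZE;
}

/* Check after the patch: anything that is not a positive whole number
 * of sectors is treated as a short PRD table. */
static bool prd_short_after(int32_t size_prepared)
{
    return size_prepared <= 0 || size_prepared % SECTOR_SIZE != 0;
}
```

Whole-sector sizes behave the same under both checks. Only partial-sector sizes of at least 512 bytes change outcome.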
[Qemu-devel] [PATCH 2/2] tests/ide-test: test case for crash when processing short PRDs
From: John Snow

Related Bug: https://bugs.launchpad.net/qemu/+bug/1777315

Signed-off-by: Amol Surati
---
 tests/ide-test.c | 28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)

diff --git a/tests/ide-test.c b/tests/ide-test.c
index f39431b1a9..382c29a174 100644
--- a/tests/ide-test.c
+++ b/tests/ide-test.c
@@ -473,6 +473,32 @@ static void test_bmdma_one_sector_short_prdt(void)
     free_pci_device(dev);
 }
 
+static void test_bmdma_partial_sector_short_prdt(void)
+{
+    QPCIDevice *dev;
+    QPCIBar bmdma_bar, ide_bar;
+    uint8_t status;
+
+    /* Read 2 sectors but only give 1 sector in PRDT */
+    PrdtEntry prdt[] = {
+        {
+            .addr = 0,
+            .size = cpu_to_le32(0x200),
+        },
+        {
+            .addr = 512,
+            .size = cpu_to_le32(0x44 | PRDT_EOT),
+        }
+    };
+
+    dev = get_pci_device(&bmdma_bar, &ide_bar);
+    status = send_dma_request(CMD_READ_DMA, 0, 2,
+                              prdt, ARRAY_SIZE(prdt), NULL);
+    g_assert_cmphex(status, ==, 0);
+    assert_bit_clear(qpci_io_readb(dev, ide_bar, reg_status), DF | ERR);
+    free_pci_device(dev);
+}
+
 static void test_bmdma_long_prdt(void)
 {
     QPCIDevice *dev;
@@ -1037,6 +1063,8 @@ int main(int argc, char **argv)
     qtest_add_func("/ide/bmdma/short_prdt", test_bmdma_short_prdt);
     qtest_add_func("/ide/bmdma/one_sector_short_prdt",
                    test_bmdma_one_sector_short_prdt);
+    qtest_add_func("/ide/bmdma/partial_sector_short_prdt",
+                   test_bmdma_partial_sector_short_prdt);
     qtest_add_func("/ide/bmdma/long_prdt", test_bmdma_long_prdt);
     qtest_add_func("/ide/bmdma/no_busmaster", test_bmdma_no_busmaster);
     qtest_add_func("/ide/bmdma/teardown", test_bmdma_teardown);
-- 
2.17.1
Re: [Qemu-devel] [virtio-dev] Re: [v23 1/2] virtio-crypto: Add virtio crypto device specification
On Wed, Jan 10, 2018 at 01:53:09PM +0800, Longpeng (Mike) wrote:
> Hi Halil,
>
> We are fixing the Intel BUG these days, so I will go through your comments
> after we're done. Thanks.

All right, are you guys done with Meltdown/Spectre?  I'd like us to
finally start getting parts of this into the spec.  This is already used
in the field, so let's get whatever is already out there into the spec,
and argue about future enhancements later.

-- 
MST
Re: [Qemu-devel] [PATCH v2 3/8] ppc4xx_i2c: Implement directcntl register
On Wed, 20 Jun 2018, David Gibson wrote:
> On Tue, Jun 19, 2018 at 11:29:09AM +0200, BALATON Zoltan wrote:
>> On Mon, 18 Jun 2018, David Gibson wrote:
>>> On Wed, Jun 13, 2018 at 04:03:18PM +0200, BALATON Zoltan wrote:
>>>> On Wed, 13 Jun 2018, David Gibson wrote:
>>>>> On Wed, Jun 13, 2018 at 10:54:22AM +0200, BALATON Zoltan wrote:
>>>>>> On Wed, 13 Jun 2018, David Gibson wrote:
>>>>>>> On Wed, Jun 06, 2018 at 03:31:48PM +0200, BALATON Zoltan wrote:
>>>>>>>> diff --git a/hw/i2c/ppc4xx_i2c.c b/hw/i2c/ppc4xx_i2c.c
>>>>>>>> index a68b5f7..5806209 100644
>>>>>>>> --- a/hw/i2c/ppc4xx_i2c.c
>>>>>>>> +++ b/hw/i2c/ppc4xx_i2c.c
>>>>>>>> @@ -30,6 +30,7 @@
>>>>>>>>  #include "cpu.h"
>>>>>>>>  #include "hw/hw.h"
>>>>>>>>  #include "hw/i2c/ppc4xx_i2c.h"
>>>>>>>> +#include "bitbang_i2c.h"
>>>>>>>>  
>>>>>>>>  #define PPC4xx_I2C_MEM_SIZE 18
>>>>>>>>  
>>>>>>>> @@ -46,7 +47,13 @@
>>>>>>>>  
>>>>>>>>  #define IIC_XTCNTLSS_SRST (1 << 0)
>>>>>>>>  
>>>>>>>> +#define IIC_DIRECTCNTL_SDAC (1 << 3)
>>>>>>>> +#define IIC_DIRECTCNTL_SCLC (1 << 2)
>>>>>>>> +#define IIC_DIRECTCNTL_MSDA (1 << 1)
>>>>>>>> +#define IIC_DIRECTCNTL_MSCL (1 << 0)
>>>>>>>> +
>>>>>>>>  typedef struct {
>>>>>>>> +    bitbang_i2c_interface *bitbang;
>>>>>>>>      uint8_t mdata;
>>>>>>>>      uint8_t lmadr;
>>>>>>>>      uint8_t hmadr;
>>>>>>>> @@ -308,7 +315,11 @@ static void ppc4xx_i2c_writeb(void *opaque, hwaddr addr, uint64_t value,
>>>>>>>>          i2c->xtcntlss = value;
>>>>>>>>          break;
>>>>>>>>      case 16:
>>>>>>>> -        i2c->directcntl = value & 0x7;
>>>>>>>> +        i2c->directcntl = value & (IIC_DIRECTCNTL_SDAC & IIC_DIRECTCNTL_SCLC);
>>>>>>>> +        i2c->directcntl |= (value & IIC_DIRECTCNTL_SCLC ? 1 : 0);
>>>>>>>> +        bitbang_i2c_set(i2c->bitbang, BITBANG_I2C_SCL, i2c->directcntl & 1);
>>>>>>>
>>>>>>> Shouldn't that use i2c->directcntl & IIC_DIRECTCNTL_MSCL ?
>>>>>>>
>>>>>>>> +        i2c->directcntl |= bitbang_i2c_set(i2c->bitbang, BITBANG_I2C_SDA,
>>>>>>>> +                               (value & IIC_DIRECTCNTL_SDAC) != 0) << 1;
>>>>>>>
>>>>>>> Last expression might be clearer as:
>>>>>>>     value & IIC_DIRECTCNTL_SDAC ? IIC_DIRECTCNTL_MSDA : 0
>>>>>>
>>>>>> I guess this is a matter of taste but to me IIC_DIRECTCNTL_MSDA is a
>>>>>> bit position in the register so I use that when accessing that bit but
>>>>>> when I check for the values of a bit being 0 or 1 I don't use the
>>>>>> define which is for something else, just happens to have value 1 as
>>>>>> well.
>>>>>
>>>>> Hmm.. but the bit is being store in i2c->directcntl, which means it
>>>>> can be read back from the register in that position, no?
>>>>
>>>> Which of the above two do you mean? In the first one I test for the 1/0
>>>> value set by the previous line before the bitbang_i2c_set call. This
>>>> could be accessed as MSCL later but using that here would just make it
>>>> longer and less obvious. If I want to be absolutely precise maybe it
>>>> should be (value & IIC_DIRECTCNTL_SCL ? 1 : 0) in this line too but that
>>>> was just stored in the register one line before so I can reuse that here
>>>> as well. Otherwise I could add another variable just for this bit value
>>>> and use that in both lines but why make it more complicated for a simple
>>>> 1 or 0 value?
>>>
>>> Longer maybe, but I don't know about less obvious.  Actually I think you
>>> should use IIC_DIRECTCNTL_MSCL instead of a bare '1' in both the line
>>> setting i2c->directcntl, then the next line checking that bit to pass it
>>> into bitbang_i2c_set.  The point is you're modifying the effective
>>> register contents, so it makes sense to make it clearer which bit of the
>>> register you're setting.
>>
>> When setting the bit it's the value 1 so that's not the bit position,
>
> Huh??  The constants aren't bit positions either, they're masks.  How is
> IIC_DIRECTCNTL_MSCL wrong here?
>
>> I think 1 : 0 is correct there.
>
> Correct, sure, but less clear than it could be.
>
>> I've changed the next line in v4 I've just sent to the constant when
>> checking the value of the MSCL bit.
>>
>>>> In the second case using MSDA is really not correct because the level to
>>>> set is defined by SDAC bit. The SDAC, SCLC bits are what the program
>>>> sets to tell which states the two i2c lines should be and the MSDA, MSCL
>>>> are read only bits that show what states the lines really are.
>>>
>>> Ok...
>>>
>>>> IIC_DIRECTCNTL_MSDA has value of 1 but it means the second bit in the
>>>> directcntl reg (which could have 0 or 1 value) not 1 value of a bit or
>>>> i2c line.
>>>
>>> Uh.. what?
>>>
>>> AFAICT, based on the result of bitbang_i2c_set() you're updating the
>>> value of the MSDA (== 0x2) bit in i2c->directcntl register state.  Why
>>> doesn't the symbolic name make sense here?
>>
>> Sorry, I may not have been able to clearly say what I mean. I meant that
>> IIC_DIRECTCNTL_MSDA means the bit in position 1 (numbering from LSB being
>> bit number 0) which may have value 1 or 0. In cases I mean the value I use
>> 1 or 0. In case I refer to the bit position I use constants. In the line
>>
>>     bitbang_i2c_set(i2c->bitbang, BITBANG_I2C_SCL, i2c->directcntl & 1);
>>
>> it should be the constant, just used 1 there for brevity because it's
>> obvious from the previous line what's meant.
>
> Maybe, but using the constant is still clearer, and friendly to people
> grepping the source.
>
>> I've changed this now.
>>
>> At other places the values of the bits are written as 1 or 0 so I think
>> for those constants should not be needed.
>
> I have no idea what you mean by this.

OK, I'm lost now. Is v4 acceptable or are there any more changes you
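This sketch follows the style David asks for: every register bit is touched through its named mask. It keeps the control bits (SDAC/SCLC) apart from the read-only monitor bits (MSDA/MSCL), as Zoltan describes them. `FakeBus`, its helpers and `directcntl_write()` are hypothetical stand-ins, not QEMU's `bitbang_i2c_set()`; only the four bit definitions come from the quoted patch. The two control masks are combined with `|`, because `IIC_DIRECTCNTL_SDAC & IIC_DIRECTCNTL_SCLC`, as written in the quoted hunk, evaluates to 0.

```c
#include <assert.h>
#include <stdint.h>

/* Bit masks as defined in the quoted patch. */
#define IIC_DIRECTCNTL_SDAC (1 << 3)   /* SDA level software asks for */
#define IIC_DIRECTCNTL_SCLC (1 << 2)   /* SCL level software asks for */
#define IIC_DIRECTCNTL_MSDA (1 << 1)   /* SDA line as monitored (read-only) */
#define IIC_DIRECTCNTL_MSCL (1 << 0)   /* SCL line as monitored (read-only) */

/* Hypothetical stand-in for the bit-banged bus behind bitbang_i2c_set(). */
typedef struct {
    int scl;            /* last SCL level driven */
    int sda_held_low;   /* nonzero if some device pulls SDA low */
} FakeBus;

static int fake_bus_set_scl(FakeBus *bus, int level)
{
    bus->scl = level;
    return level;
}

/* Returns the resulting SDA line level, the way the hunk uses the
 * return value of bitbang_i2c_set() for MSDA. */
static int fake_bus_set_sda(FakeBus *bus, int level)
{
    return bus->sda_held_low ? 0 : level;
}

/* Hypothetical write path for the DIRECTCNTL register. */
static uint8_t directcntl_write(FakeBus *bus, uint8_t value)
{
    /* keep both control bits */
    uint8_t reg = value & (IIC_DIRECTCNTL_SDAC | IIC_DIRECTCNTL_SCLC);

    if (value & IIC_DIRECTCNTL_SCLC) {
        reg |= IIC_DIRECTCNTL_MSCL;
    }
    fake_bus_set_scl(bus, (reg & IIC_DIRECTCNTL_MSCL) != 0);

    if (fake_bus_set_sda(bus, (value & IIC_DIRECTCNTL_SDAC) != 0)) {
        reg |= IIC_DIRECTCNTL_MSDA;
    }
    return reg;
}
```

With SDA held low by a device, writing SDAC|SCLC reads back with MSCL set but MSDA clear. This is the distinction between requested and actual line state that the thread is about.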
Re: [Qemu-devel] [virtio-dev] [PATCH] Add virtio gpu device specification.
On Tue, May 10, 2016 at 01:25:37PM +0200, Gerd Hoffmann wrote:
>   Hi,
>
> > > > > Rendered versions are available here:
> > > > >
> > > > > https://www.kraxel.org/virtio/virtio-v1.0-cs03-virtio-gpu.pdf
> > > > >
> > > > > https://www.kraxel.org/virtio/virtio-v1.0-cs03-virtio-gpu.html#x1-287
> > >
> > > I guess a non-fenced command only completes when the operation has
> > > finished, too (so that a meaningful success/error value can be
> > > produced)?
> >
> > When stuff is processed asynchronously the command can complete before
> > the operation actually completed.  Current qemu implementation does that
> > only in 3d mode, when offloading stuff to the hardware (and verifies
> > stuff beforehand, so if you try to kick 3d rendering with an invalid
> > context id qemu will throw an error).
> >
> > I'll try to make that more clear in the text.
>
> Updated now.
>
> cheers,
>   Gerd

Is there a chance you could rebase and post?  This is widely used; I
think we should have it in 1.1 if at all possible.
Re: [Qemu-devel] [PATCH v4 6/7] monitor: remove "x-oob", turn oob on by default
On Tue, Jun 19, 2018 at 04:16:49PM +0200, Markus Armbruster wrote:
> Peter Xu writes:
>
> > There was a regression reported by Eric Auger before with OOB:
> >
> >   http://lists.gnu.org/archive/html/qemu-devel/2018-03/msg06231.html
> >
> > It is fixed in 951702f39c ("monitor: bind dispatch bh to iohandler
> > context", 2018-04-10).
> >
> > For the bug, we turned Out-Of-Band feature of monitors off for 2.12
> > release.  Now we turn that on again after the 2.12 release.
>
> Relating what happened in the order it happened could be easier to
> understand.  Perhaps:
>
>   OOB commands were introduced in commit cf869d53172.  Unfortunately, we
>   ran into a regression, and had to disable them by default for 2.12
>   (commit be933ffc23).
>
>   The regression has since been fixed (commit 951702f39c7 "monitor: bind
>   dispatch bh to iohandler context").  Time to re-enable OOB.

This indeed looks much nicer.

> > This patch partly reverts be933ffc23 (monitor: new parameter "x-oob"),
> > meanwhile turn it on again by default for non-MUX QMPs.  Note that we
>
> "by default"?

Did I mis-spell somewhere?

> > can't enable Out-Of-Band for monitors with MUX-typed chardev backends,
> > because not all the chardev frontends can run without main thread, or
> > can run in multiple threads.
> >
> > Some trivial touch-up in the test code is required to make sure qmp-test
> > won't broke.
>
> "won't break"

This one I did.
>
> >
> > Signed-off-by: Peter Xu
> > ---
> >  include/monitor/monitor.h |  1 -
> >  monitor.c                 | 17 +----------------
> >  tests/libqtest.c          |  2 +-
> >  tests/qmp-test.c          |  2 +-
> >  vl.c                      |  5 -----
> >  5 files changed, 3 insertions(+), 24 deletions(-)
> >
> > diff --git a/include/monitor/monitor.h b/include/monitor/monitor.h
> > index d6ab70cae2..0cb0538a31 100644
> > --- a/include/monitor/monitor.h
> > +++ b/include/monitor/monitor.h
> > @@ -13,7 +13,6 @@ extern Monitor *cur_mon;
> >  #define MONITOR_USE_READLINE 0x02
> >  #define MONITOR_USE_CONTROL 0x04
> >  #define MONITOR_USE_PRETTY 0x08
> > -#define MONITOR_USE_OOB 0x10
> >  
> >  bool monitor_cur_is_qmp(void);
> >  
> > diff --git a/monitor.c b/monitor.c
> > index c9a02ee40c..7fbcf84b02 100644
> > --- a/monitor.c
> > +++ b/monitor.c
> > @@ -4587,19 +4587,7 @@ void monitor_init(Chardev *chr, int flags)
> >  {
> >      Monitor *mon = g_malloc(sizeof(*mon));
> >      bool use_readline = flags & MONITOR_USE_READLINE;
> > -    bool use_oob = flags & MONITOR_USE_OOB;
> > -
> > -    if (use_oob) {
> > -        if (CHARDEV_IS_MUX(chr)) {
> > -            error_report("Monitor Out-Of-Band is not supported with "
> > -                         "MUX typed chardev backend");
> > -            exit(1);
> > -        }
> > -        if (use_readline) {
> > -            error_report("Monitor Out-Of-band is only supported by QMP");
> > -            exit(1);
> > -        }
> > -    }
> > +    bool use_oob = (flags & MONITOR_USE_CONTROL) && !CHARDEV_IS_MUX(chr);
>
> A comment explaining (briefly!) why MUX prevents oob would be useful
> here.  Fortunately, you can simply steal from your commit message.

Done.
>
> >  
> >      monitor_data_init(mon, false, use_oob);
> >  
> > @@ -4701,9 +4689,6 @@ QemuOptsList qemu_mon_opts = {
> >          },{
> >              .name = "pretty",
> >              .type = QEMU_OPT_BOOL,
> > -        },{
> > -            .name = "x-oob",
> > -            .type = QEMU_OPT_BOOL,
> >          },
> >          { /* end of list */ }
> >      },
> > diff --git a/tests/libqtest.c b/tests/libqtest.c
> > index 098af6aec4..c5cb3f925c 100644
> > --- a/tests/libqtest.c
> > +++ b/tests/libqtest.c
> > @@ -213,7 +213,7 @@ QTestState *qtest_init_without_qmp_handshake(bool use_oob,
> >                                "-display none "
> >                                "%s", qemu_binary, socket_path,
> >                                getenv("QTEST_LOG") ? "/dev/fd/2" : "/dev/null",
> > -                              qmp_socket_path, use_oob ? ",x-oob=on" : "",
> > +                              qmp_socket_path, "",
> >                                extra_args ?: "");
> >      execlp("/bin/sh", "sh", "-c", command, NULL);
> >      exit(1);
> > diff --git a/tests/qmp-test.c b/tests/qmp-test.c
> > index a49cbc6fde..3747bf7fbb 100644
> > --- a/tests/qmp-test.c
> > +++ b/tests/qmp-test.c
> > @@ -89,7 +89,7 @@ static void test_qmp_protocol(void)
> >      g_assert(q);
> >      test_version(qdict_get(q, "version"));
> >      capabilities = qdict_get_qlist(q, "capabilities");
> > -    g_assert(capabilities && qlist_empty(capabilities));
> > +    g_assert(capabilities);
> >      qobject_unref(resp);
> >  
> >      /* Test valid command before handshake */
> > diff --git a/vl.c b/vl.c
> > index 6e34fb348d..26a0bb3f0f 100644
> > --- a/vl.c
> > +++ b/vl.c
> > @@ -2307,11 +2307,6 @@ static int mon_init_func(void *opaque, QemuOpts *opts,
Re: [Qemu-devel] [PATCH v4 4/7] tests: iotests: drop some stderr line
On Tue, Jun 19, 2018 at 03:57:07PM +0200, Markus Armbruster wrote:
> Peter Xu writes:
>
> > In my Out-Of-Band test, "check -qcow2 060" fail with this (the output is
> > manually changed due to line width requirement):
> >
> > 060 5s ... - output mismatch (see 060.out.bad)
> > --- /home/peterx/git/qemu/tests/qemu-iotests/060.out
> > +++ /home/peterx/git/qemu/bin/tests/qemu-iotests/060.out.bad
> > @@ -427,8 +427,8 @@
> > QMP_VERSION
> > {"return": {}}
> > qcow2: Image is corrupt: L2 table offset 0x2a2a2a00 unaligned (L1
> > index: 0); further non-fatal corruption events will be suppressed
> > -{"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP},
> > - "event": "BLOCK_IMAGE_CORRUPTED", "data": {"device": "", "msg": "L2
> > - table offset 0x2a2a2a0
> > 0 unaligned (L1 index: 0)", "node-name": "drive", "fatal": false}}
> > read failed: Input/output error
> > +{"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP},
> > + "event": "BLOCK_IMAGE_CORRUPTED", "data": {"device": "", "msg": "L2
> > + table offset 0x2a2a2a0
> > 0 unaligned (L1 index: 0)", "node-name": "drive", "fatal": false}}
> > {"return": ""}
> > {"return": {}}
> > {"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP},
> > "event": "SHUTDOWN", "data": {"guest": false}}
>
> Please indent this diff; I'd expect git-am to choke on it.

Do you mean something like pretty-JSON?  How about I remove this chunk
too?  What do you prefer?

> >
> > The order of the event and the in/out error line is swapped. I didn't
> > dig up the reason, but AFAIU what we want to verify is the event rather
> > than stderr. Let's drop the stderr line directly for this test.
> >
> > Signed-off-by: Peter Xu

Regards,

-- 
Peter Xu
Re: [Qemu-devel] [PATCH v4 3/7] monitor: flush qmp responses when CLOSED
On Tue, Jun 19, 2018 at 03:55:12PM +0200, Markus Armbruster wrote:
> Peter Xu writes:
>
> > On Tue, Jun 19, 2018 at 01:34:22PM +0800, Peter Xu wrote:
> >
> > [...]
> >
> >> Fixes: 6d2d563f8c ("qmp: cleanup qmp queues properly", 2018-03-27)
> >> Suggested-by: Markus Armbruster
> >> Signed-off-by: Peter Xu
> >>
> >> Signed-off-by: Peter Xu
> >
> > I am pretty sure this time that this 2nd line is not there in my local
> > tree. :)
> >
> > I think it's a git-format-patch bug, otherwise I must have misused it
> > for a long time. Instead of figuring this out and repost again, I'll
> > see how far the rest of the series can go.
>
> Do you use git-format-patch -s, or have format.signOff set in
> .git/config or ~/.gitconfig?

Ah, it's in my ~/.gitconfig!  Removing that fixes the issue, though I'm
still not sure why the problem doesn't happen with other patches.

After all, due to the line wrapping mess, I still prefer to drop that
chunk from the commit message directly.

Regards,

-- 
Peter Xu
Re: [Qemu-devel] [PATCH v4 3/7] monitor: flush qmp responses when CLOSED
On Tue, Jun 19, 2018 at 03:53:11PM +0200, Markus Armbruster wrote: > Peter Xu writes: > > > Previously we clean up the queues when we got CLOSED event. It was used > > to make sure we won't send leftover replies/events of a old client to a > > new client which makes perfect sense. However this will also drop the > > replies/events even if the output port of the previous chardev backend > > is still open, which can lead to missing of the last replies/events. > > Now this patch does an extra operation to flush the response queue > > before cleaning up. > > > > In most cases, a QMP session will be based on a bidirectional channel (a > > TCP port, for example, we read/write to the same socket handle), so in > > port and out port of the backend chardev are fundamentally the same > > port. In these cases, it does not really matter much on whether we'll > > flush the response queue since flushing will fail anyway. However there > > can be cases where in & out ports of the QMP monitor's backend chardev > > are separated. Here is an example: > > > > cat $QMP_COMMANDS | qemu -qmp stdio ... | filter_commands > > > > In this case, the backend is fd-typed, and it is connected to stdio > > where in port is stdin and out port is stdout. Now if we drop all the > > events on the response queue then filter_command process might miss some > > events that it might expect. The thing is that, when stdin closes, > > stdout might still be there alive! > > > > In practice, I encountered SHUTDOWN event missing when running test with > > iotest 087 with Out-Of-Band enabled. Here is one of the ways that this > > can happen (after "quit" command is executed and QEMU quits the main > > loop): > > > > 1. [main thread] QEMU queues a SHUTDOWN event into response queue. > > > > 2. "cat" terminates (to distinguish it from the animal, I quote it). > > > > 3. [monitor iothread] QEMU's monitor iothread reads EOF from stdin. > > > > 4. 
[monitor iothread] QEMU's monitor iothread calls the CLOSED event > >hook for the monitor, which will destroy the response queue of the > >monitor, then the SHUTDOWN event is dropped. > > > > 5. [main thread] QEMU's main thread cleans up the monitors in > >monitor_cleanup(). When trying to flush pending responses, it sees > >nothing. SHUTDOWN is lost forever. > > > > Note that before the monitor iothread was introduced, step [4]/[5] could > > never happen since the main loop was the only place to detect the EOF > > event of stdin and run the CLOSED event hooks. Now things can happen in > > parallel in the iothread. > > > > Without this patch, iotest 087 will have ~10% chance to miss the > > SHUTDOWN event and fail when with Out-Of-Band enabled (the output is > > manually touched up to suite line width requirement): > > I wouldn't wrap lines when quoting a diff. > > > > > --- /home/peterx/git/qemu/tests/qemu-iotests/087.out > > +++ /home/peterx/git/qemu/bin/tests/qemu-iotests/087.out.bad > > @@ -8,7 +8,6 @@ > > {"return": {}} > > {"error": {"class": "GenericError", "desc": "'node-name' must be > > specified for the root node"}} > > {"return": {}} > > -{"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, > > - "event": "SHUTDOWN", "data": {"guest": false}} > > > > === Duplicate ID === > > @@ -53,7 +52,6 @@ > > {"return": {}} > > {"return": {}} > > {"return": {}} > > > > -{"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, > > - "event": "SHUTDOWN", "data": {"guest": false}} > > Please indent the quoted diff a bit, so make it more obviously not part > of the patch. In fact, git-am chokes on it for me. To make it even simpler, I plan to remove the whole chunk of the diff from the commit message unless you disagree. > > > > > This patch fixes the problem.
> > > > Fixes: 6d2d563f8c ("qmp: cleanup qmp queues properly", 2018-03-27) > > Suggested-by: Markus Armbruster > > Signed-off-by: Peter Xu > > > > Signed-off-by: Peter Xu > > --- > > monitor.c | 33 ++--- > > 1 file changed, 30 insertions(+), 3 deletions(-) > > > > diff --git a/monitor.c b/monitor.c > > index d4a463f707..c9a02ee40c 100644 > > --- a/monitor.c > > +++ b/monitor.c > > @@ -512,6 +512,27 @@ struct QMPResponse { > > }; > > typedef struct QMPResponse QMPResponse; > > > > +static QObject *monitor_qmp_response_pop_one(Monitor *mon) > > +{ > > +QObject *data; > > + > > +qemu_mutex_lock(&mon->qmp.qmp_queue_lock); > > +data = g_queue_pop_head(mon->qmp.qmp_responses); > > +qemu_mutex_unlock(&mon->qmp.qmp_queue_lock); > > + > > +return data; > > +} > > + > > +static void monitor_qmp_response_flush(Monitor *mon) > > +{ > > +QObject *data; > > + > > +while ((data = monitor_qmp_response_pop_one(mon))) { > > +monitor_json_emitter_raw(mon, data); > > +qobject_unref(data); > > +} > > +} > > + > > /* > > * Pop a QMPResponse from any monitor's response queue into @response. > > * Return false if all the queues are empty;
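A minimal, self-contained sketch of the flush-before-cleanup idea described in this patch, for readers without a QEMU tree at hand. It assumes a fixed-size array protected by a pthread mutex in place of the monitor's GQueue and qemu_mutex, and a hypothetical emit callback standing in for monitor_json_emitter_raw(); none of the names below are QEMU API.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define QUEUE_CAP 16

/* Hypothetical stand-in for the monitor's response queue. */
typedef struct {
    pthread_mutex_t lock;
    const char *items[QUEUE_CAP];
    size_t head;
    size_t count;
} ResponseQueue;

static void queue_init(ResponseQueue *q)
{
    pthread_mutex_init(&q->lock, NULL);
    q->head = 0;
    q->count = 0;
}

/* Producer side, e.g. the main thread queueing a SHUTDOWN event. */
static int queue_push(ResponseQueue *q, const char *item)
{
    int ok = 0;

    pthread_mutex_lock(&q->lock);
    if (q->count < QUEUE_CAP) {
        q->items[(q->head + q->count) % QUEUE_CAP] = item;
        q->count++;
        ok = 1;
    }
    pthread_mutex_unlock(&q->lock);
    return ok;
}

/* Pop one entry; the lock is held only while touching the queue. */
static const char *queue_pop_one(ResponseQueue *q)
{
    const char *item = NULL;

    pthread_mutex_lock(&q->lock);
    if (q->count > 0) {
        item = q->items[q->head];
        q->head = (q->head + 1) % QUEUE_CAP;
        q->count--;
    }
    pthread_mutex_unlock(&q->lock);
    return item;
}

/* Write out everything still queued before the queue is thrown away. */
static size_t queue_flush(ResponseQueue *q, void (*emit)(const char *))
{
    const char *item;
    size_t n = 0;

    while ((item = queue_pop_one(q)) != NULL) {
        emit(item);
        n++;
    }
    return n;
}

/* Demo emitter: records what would have been written to the client. */
static size_t emitted_count;
static const char *last_emitted;

static void record_emit(const char *s)
{
    emitted_count++;
    last_emitted = s;
}
```

As in monitor_qmp_response_pop_one(), the lock is held only while popping, so writing a response out never blocks a thread that is queueing a new one.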
Re: [Qemu-devel] [PATCH v3 1/2] kvm: support -dedicated cpu-pm=on|off
On Wed, Jun 20, 2018 at 08:46:10AM +0800, Wanpeng Li wrote: > On Wed, 20 Jun 2018 at 08:07, Michael S. Tsirkin wrote: > > > > On Tue, Jun 19, 2018 at 05:07:46PM -0500, Eric Blake wrote: > > > On 06/19/2018 10:17 AM, Paolo Bonzini wrote: > > > > On 16/06/2018 00:29, Michael S. Tsirkin wrote: > > > > > +static QemuOptsList qemu_dedicated_opts = { > > > > > +.name = "dedicated", > > > > > +.head = QTAILQ_HEAD_INITIALIZER(qemu_dedicated_opts.head), > > > > > +.desc = { > > > > > +{ > > > > > +.name = "mem-lock", > > > > > +.type = QEMU_OPT_BOOL, > > > > > +}, > > > > > +{ > > > > > +.name = "cpu-pm", > > > > > +.type = QEMU_OPT_BOOL, > > > > > +}, > > > > > +{ /* end of list */ } > > > > > +}, > > > > > +}; > > > > > + > > > > > > > > Let the bikeshedding begin! > > > > > > > > 1) Should we deprecate -realtime? > > > > > > > > 2) Maybe -hostresource? > > > > > > What further things might we add in the future? > > > > > > -dedicated sounds wrong (it is an adjective, while most of our options are > > > nouns - thing -machine, -drive, -object, ...) > > > > > > -hostresource at least sounds like a noun, but is long to type. But at > > > least '-hostresource cpu-pm=on' reads reasonably well. > > > > Yes but host resource what? I feel it says nothing at all about what > > one can expect to find in this flag. > > > > > About the only other noun I could think of would be '-feature cpu-pm=on'. > > > > If we have nothing at all to say about what is grouping these things, > > we don't need a new flag. We can make it a machine property. > > > > It's user's hint that some host resource is dedicated to a VM. > > The commit 633711e82 (kvm: rename KVM_HINTS_DEDICATED to > KVM_HINTS_REALTIME) should be reverted according to several threads > discussion I think. > > Regards, > Wanpeng Li IMHO that is unrelated - these KVM hints are hints to *guest*. 
In this thread we are talking about hints to QEMU that are only necessary because QEMU is separate from the host scheduler/memory management. -- MST
Re: [Qemu-devel] [PATCH for-2.11.2] spapr: make pseries-2.11 the default machine type
On Tue, Jun 19, 2018 at 01:11:28PM +0200, Greg Kurz wrote: > On Mon, 18 Jun 2018 21:04:38 -0500 > Michael Roth wrote: > > > Quoting Greg Kurz (2018-05-22 12:17:28) > > > The spapr capability framework was introduced in QEMU 2.12. It allows > > > to have an explicit control on how host features are exposed to the > > > guest. This is especially needed to handle migration between hetero- > > > geneous hosts (eg, POWER8 to POWER9). It is also used to expose fixes/ > > > workarounds against speculative execution vulnerabilities to guests. > > > The framework was hence backported to QEMU 2.11.1, especially these > > > commits: > > > > > > 0fac4aa93074 spapr: Add pseries-2.12 machine type > > > 9070f408f491 spapr: Treat Hardware Transactional Memory (HTM) as an > > > optional capability > > > > > > 0fac4aa93074 has the confusing effect of making pseries-2.12 the default > > > machine type for QEMU 2.11.1, instead of the expected pseries-2.11. This > > > patch changes the default machine back to pseries-2.11. > > > > > > Unfortunately, 9070f408f491 enforces the HTM capability for pseries-2.11. > > > This isn't supported by TCG and breaks 'make check'. So this patch also > > > adds a hack to turn HTM off when using TCG. > > > > I noticed this ends up breaking TCG migration for 2.11.2 -> 2.12, I > > get this on the target side even when specifying -machine > > pseries-2.11,cap-htm=off for both ends: > > > > qemu-system-ppc64: cap-htm higher level (1) in incoming stream than on > > destination (0) > > qemu-system-ppc64: error while loading state for instance 0x0 of device > > 'spapr' > > qemu-system-ppc64: load of migration failed: Invalid argument > > > > I'm not sure we care all that much about it but it's a regression from > > 2.11.1 > > at least. 
The main issue seems to be the default caps for 2.11.2 for TCG are > > now different from 2.11 and 2.12+, but spapr_cap_##cap##_needed still > > assumes > > everything is the same across all these versions and as such opts not to > > migrate cap-htm=off, since that's the default for 2.11.2. This results in > > the > > target assuming the source was using the default, which is cap-htm=on, > > and since that disagrees with the spapr->eff we get a failure. > > > > It seems spapr_cap_##cap##_needed needs to be fixed up to address that, > > but I'm not sure how best to deal with backward compatibility in that case. > > Any ideas? If it ends up being a trade-off I think forward compatibility is > > more important. > > > > Yeah, we shouldn't change the default since it affects the migration logic :-\ > > The motivation behind this hack is to fix TCG based 'make check', because > it doesn't pass cap-htm=off, and thus can't run with pseries-2.11. > > Another possibility is to let the default as is, and to disable HTM after the > default caps have been applied. > > Something like that squashed into this patch: > > diff --git a/hw/ppc/spapr_caps.c b/hw/ppc/spapr_caps.c > index 82043e60e78b..26e6be043b18 100644 > --- a/hw/ppc/spapr_caps.c > +++ b/hw/ppc/spapr_caps.c > @@ -285,11 +285,6 @@ static sPAPRCapabilities > default_caps_with_cpu(sPAPRMachineState *spapr, > > caps = smc->default_caps; > > -/* HACK for 2.11.2: fix make check */ > -if (tcg_enabled()) { > -caps.caps[SPAPR_CAP_HTM] = SPAPR_CAP_OFF; > -} > - > if (!ppc_check_compat(cpu, CPU_POWERPC_LOGICAL_2_07, >0, spapr->max_compat_pvr)) { > caps.caps[SPAPR_CAP_HTM] = SPAPR_CAP_OFF; > @@ -405,6 +400,11 @@ void spapr_caps_reset(sPAPRMachineState *spapr) > } > } > > +/* HACK for 2.11.2: fix make check */ > +if (tcg_enabled()) { > +spapr->eff.caps[SPAPR_CAP_HTM] = SPAPR_CAP_OFF; > +} > + > /* .. then apply those caps to the virtual hardware */ > > for (i = 0; i < SPAPR_CAP_NUM; i++) { > - No! 
The whole point of the caps stuff is to stop changing guest visible behaviours based on host side configuration like the accelerator. So, really, let's not put it back in. The correct fix is to add cap-htm=off to the testcases. Gross, but necessary. > > This allows: > - TCG 'make check' to be happy with pseries-2.11 > - 2.11.2 --> 2.12 migration and backward > > > > > > > Signed-off-by: Greg Kurz > > > --- > > > hw/ppc/spapr.c |4 ++-- > > > hw/ppc/spapr_caps.c |5 + > > > 2 files changed, 7 insertions(+), 2 deletions(-) > > > > > > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c > > > index 1a2dd1f597d9..6499a867520f 100644 > > > --- a/hw/ppc/spapr.c > > > +++ b/hw/ppc/spapr.c > > > @@ -3820,7 +3820,7 @@ static void > > > spapr_machine_2_12_class_options(MachineClass *mc) > > > /* Defaults for the latest behaviour inherited from the base class */ > > > } > > > > > > -DEFINE_SPAPR_MACHINE(2_12, "2.12", true); > > > +DEFINE_SPAPR_MACHINE(2_12, "2.12", false); > > > > > > /* > > > * pseries-2.11 > > > @@ -3842,7 +3842,7 @@ static void
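To make the migration failure above concrete, here is a simplified model of the rule that a cap is only sent when its value differs from the machine's default, with the destination falling back to its own default when nothing is sent. The types and functions are hypothetical, not the spapr_caps.c code. The model reproduces the reported mismatch when the source's default was changed by the TCG hack but the destination's default was not.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { CAP_OFF = 0, CAP_ON = 1 } CapValue;

/* Hypothetical per-machine view of a single cap (HTM). */
typedef struct {
    CapValue default_htm;   /* default for this machine type/build */
    CapValue effective_htm; /* value actually in use */
} MachineCaps;

/* Modelled on the "needed" idea: only migrate values that differ
 * from the source's own default. */
static bool htm_needed(const MachineCaps *src)
{
    return src->effective_htm != src->default_htm;
}

/* Value the destination ends up checking: the migrated one if present,
 * otherwise the destination's own default. */
static CapValue incoming_htm(const MachineCaps *src, const MachineCaps *dst)
{
    return htm_needed(src) ? src->effective_htm : dst->default_htm;
}

/* The destination rejects an incoming level higher than its own. */
static bool migration_ok(const MachineCaps *src, const MachineCaps *dst)
{
    return incoming_htm(src, dst) <= dst->effective_htm;
}
```

With the hack, a 2.11.2 TCG source has default OFF and effective OFF, so nothing is sent; a 2.12 destination running pseries-2.11 with cap-htm=off assumes its default (ON) and fails with "higher level (1) in incoming stream than on destination (0)".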
Re: [Qemu-devel] [PATCH v2 3/8] ppc4xx_i2c: Implement directcntl register
On Tue, Jun 19, 2018 at 11:29:09AM +0200, BALATON Zoltan wrote: > On Mon, 18 Jun 2018, David Gibson wrote: > > On Wed, Jun 13, 2018 at 04:03:18PM +0200, BALATON Zoltan wrote: > > > On Wed, 13 Jun 2018, David Gibson wrote: > > > > On Wed, Jun 13, 2018 at 10:54:22AM +0200, BALATON Zoltan wrote: > > > > > On Wed, 13 Jun 2018, David Gibson wrote: > > > > > > On Wed, Jun 06, 2018 at 03:31:48PM +0200, BALATON Zoltan wrote: > > > > > > > diff --git a/hw/i2c/ppc4xx_i2c.c b/hw/i2c/ppc4xx_i2c.c > > > > > > > index a68b5f7..5806209 100644 > > > > > > > --- a/hw/i2c/ppc4xx_i2c.c > > > > > > > +++ b/hw/i2c/ppc4xx_i2c.c > > > > > > > @@ -30,6 +30,7 @@ > > > > > > > #include "cpu.h" > > > > > > > #include "hw/hw.h" > > > > > > > #include "hw/i2c/ppc4xx_i2c.h" > > > > > > > +#include "bitbang_i2c.h" > > > > > > > > > > > > > > #define PPC4xx_I2C_MEM_SIZE 18 > > > > > > > > > > > > > > @@ -46,7 +47,13 @@ > > > > > > > > > > > > > > #define IIC_XTCNTLSS_SRST (1 << 0) > > > > > > > > > > > > > > +#define IIC_DIRECTCNTL_SDAC (1 << 3) > > > > > > > +#define IIC_DIRECTCNTL_SCLC (1 << 2) > > > > > > > +#define IIC_DIRECTCNTL_MSDA (1 << 1) > > > > > > > +#define IIC_DIRECTCNTL_MSCL (1 << 0) > > > > > > > + > > > > > > > typedef struct { > > > > > > > +bitbang_i2c_interface *bitbang; > > > > > > > uint8_t mdata; > > > > > > > uint8_t lmadr; > > > > > > > uint8_t hmadr; > > > > > > > @@ -308,7 +315,11 @@ static void ppc4xx_i2c_writeb(void *opaque, > > > > > > > hwaddr addr, uint64_t value, > > > > > > > i2c->xtcntlss = value; > > > > > > > break; > > > > > > > case 16: > > > > > > > -i2c->directcntl = value & 0x7; > > > > > > > +i2c->directcntl = value & (IIC_DIRECTCNTL_SDAC & > > > > > > > IIC_DIRECTCNTL_SCLC); > > > > > > > +i2c->directcntl |= (value & IIC_DIRECTCNTL_SCLC ? 1 : 0); > > > > > > > +bitbang_i2c_set(i2c->bitbang, BITBANG_I2C_SCL, > > > > > > > i2c->directcntl & 1); > > > > > > > > > > > > Shouldn't that use i2c->directcntl & IIC_DIRECTCNTL_MSCL ? 
> > > > > > > > > > > > > +i2c->directcntl |= bitbang_i2c_set(i2c->bitbang, > > > > > > > BITBANG_I2C_SDA, > > > > > > > + (value & IIC_DIRECTCNTL_SDAC) != > > > > > > > 0) << 1; > > > > > > > > > > > > Last expression might be clearer as: > > > > > > value & IIC_DIRECTCNTL_SDAC ? IIC_DIRECTCNTL_MSDA : 0 > > > > > > > > > > I guess this is a matter of taste but to me IIC_DIRECTCNTL_MSDA is a > > > > > bit > > > > > position in the register so I use that when accessing that bit but > > > > > when I > > > > > check for the values of a bit being 0 or 1 I don't use the define > > > > > which is > > > > > for something else, just happens to have value 1 as well. > > > > > > > > Hmm.. but the bit is being store in i2c->directcntl, which means it > > > > can be read back from the register in that position, no? > > > > > > Which of the above two do you mean? > > > > > > In the first one I test for the 1/0 value set by the previous line before > > > the bitbang_i2c_set call. This could be accessed as MSCL later but using > > > that here would just make it longer and less obvious. If I want to be > > > absolutely precise maybe it should be (value & IIC_DIRECTCNTL_SCL ? 1 : 0) > > > in this line too but that was just stored in the register one line before > > > so > > > I can reuse that here as well. Otherwise I could add another variable just > > > for this bit value and use that in both lines but why make it more > > > complicated for a simple 1 or 0 value? > > > > Longer maybe, but I don't know about less obvious. Actually I think > > you should use IIC_DIRECTCNTL_MSCL instead of a bare '1' in both the > > line setting i2c->directcntl, then the next line checking that bit to > > pass it into bitbang_i2c_set. The point is you're modifying the > > effective register contents, so it makes sense to make it clearer > > which bit of the register you're setting. > > When setting the bit it's the value 1 so that's not the bit > position, Huh?? 
The constants aren't bit positions either, they're masks. How is IIC_DIRECTCNTL_MSCL wrong here? > I > think 1 : 0 is correct there. Correct, sure, but less clear than it could be. > I've changed the next line in v4 I've just > sent to the constant when checking the value of the MSCL bit. > > > > In the second case using MSDA is really not correct because the level to > > > set > > > is defined by SDAC bit. The SDAC, SCLC bits are what the program sets to > > > tell which states the two i2c lines should be and the MSDA, MSCL are read > > > only bits that show what states the lines really are. > > > > Ok... > > > > > IIC_DIRECTCNTL_MSDA has value of 1 but it means the second bit in the > > > directcntl reg (which could have 0 or 1 value) not 1 value of a bit or i2c > > > line. > > > > Uh.. what? AFAICT, based on the result of bitbang_i2c_set() you're > > updating the value of the MSDA (== 0x2)
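A sketch of the directcntl write path written with the register masks throughout, as suggested in the review. The bit definitions are copied from the patch; fake_bitbang_set_sda() is a hypothetical stub for bitbang_i2c_set() that reports the level SDA actually settles at. Note that the sketch combines the two control masks with `|`, because `IIC_DIRECTCNTL_SDAC & IIC_DIRECTCNTL_SCLC` is 0x8 & 0x4, which is always 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define IIC_DIRECTCNTL_SDAC (1 << 3)
#define IIC_DIRECTCNTL_SCLC (1 << 2)
#define IIC_DIRECTCNTL_MSDA (1 << 1)
#define IIC_DIRECTCNTL_MSCL (1 << 0)

/* Hypothetical stub: when set, a slave device holds SDA low (e.g. ACK). */
static bool sda_held_low;

/* Stand-in for bitbang_i2c_set(..., BITBANG_I2C_SDA, level). */
static int fake_bitbang_set_sda(int level)
{
    return sda_held_low ? 0 : level;
}

/* Compute the new register value for a guest write of @value. */
static uint8_t directcntl_write(uint8_t value)
{
    /* Keep the control bits the guest wrote. */
    uint8_t reg = value & (IIC_DIRECTCNTL_SDAC | IIC_DIRECTCNTL_SCLC);

    /* Monitor bits report the real line state, named by their masks. */
    if (value & IIC_DIRECTCNTL_SCLC) {
        reg |= IIC_DIRECTCNTL_MSCL;
    }
    if (fake_bitbang_set_sda((value & IIC_DIRECTCNTL_SDAC) != 0)) {
        reg |= IIC_DIRECTCNTL_MSDA;
    }
    return reg;
}
```

Written this way, every bit that lands in the register is named by the mask it occupies, and the SDAC/SCLC (requested) versus MSDA/MSCL (observed) distinction stays visible in the code.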
Re: [Qemu-devel] [Qemu-block] [RFC 1/1] ide: bug #1777315: io_buffer_size and sg.size can represent partial sector sizes
On Wed, Jun 20, 2018 at 06:23:19AM +0530, Amol Surati wrote: > On Tue, Jun 19, 2018 at 05:43:52PM -0400, John Snow wrote: > > > > > > On 06/19/2018 05:26 PM, Amol Surati wrote: > > > On Tue, Jun 19, 2018 at 08:04:03PM +0530, Amol Surati wrote: > > >> On Tue, Jun 19, 2018 at 09:45:15AM -0400, John Snow wrote: > > >>> > > >>> > > >>> On 06/19/2018 04:53 AM, Kevin Wolf wrote: > > Am 19.06.2018 um 06:01 hat Amol Surati geschrieben: > > > On Mon, Jun 18, 2018 at 08:14:10PM -0400, John Snow wrote: > > >> > > >> > > >> On 06/18/2018 02:02 PM, Amol Surati wrote: > > >>> On Mon, Jun 18, 2018 at 12:05:15AM +0530, Amol Surati wrote: > > This patch fixes the assumption that io_buffer_size is always a > > perfect > > multiple of the sector size. The assumption is the cause of the > > firing > > of 'assert(n * 512 == s->sg.size);'. > > > > Signed-off-by: Amol Surati > > --- > > >>> > > >>> The repository https://github.com/asurati/1777315 contains a module > > >>> for > > >>> QEMU's 8086:7010 ATA controller, which exercises the code path > > >>> described in [RFC 0/1] of this series. > > >>> > > >> > > >> Thanks, this made it easier to see what was happening. I was able to > > >> write an ide-test test case using this source as a guide, and > > >> reproduce > > >> the error. 
> > >> > > >> static void test_bmdma_partial_sector_short_prdt(void) > > >> { > > >> QPCIDevice *dev; > > >> QPCIBar bmdma_bar, ide_bar; > > >> uint8_t status; > > >> > > >> /* Read 2 sectors but only give 1 sector in PRDT */ > > >> PrdtEntry prdt[] = { > > >> { > > >> .addr = 0, > > >> .size = cpu_to_le32(0x200), > > >> }, > > >> { > > >> .addr = 512, > > >> .size = cpu_to_le32(0x44 | PRDT_EOT), > > >> } > > >> }; > > >> > > >> dev = get_pci_device(&bmdma_bar, &ide_bar); > > >> status = send_dma_request(CMD_READ_DMA, 0, 2, > > >> prdt, ARRAY_SIZE(prdt), NULL); > > >> g_assert_cmphex(status, ==, 0); > > >> assert_bit_clear(qpci_io_readb(dev, ide_bar, reg_status), DF | > > >> ERR); > > >> free_pci_device(dev); > > >> } > > >> > > >>> Loading the module reproduces the bug. Tested on the latest master > > >>> branch. > > >>> > > >>> Steps: > > >>> - Install a Linux distribution as a guest, ensuring that the boot > > >>> disk > > >>> resides on non-IDE controllers (such as virtio) > > >>> - Attach another disk as a master device on the primary > > >>> IDE controller (i.e. attach at -hda.) > > >>> - Blacklist ata_piix, pata_acpi and ata_generic modules, and reboot. > > >>> - Copy the source files into the guest and build the module. > > >>> - Load the module. QEMU process should die with the message: > > >>> qemu-system-x86_64: hw/ide/core.c:871: ide_dma_cb: > > >>> Assertion `n * 512 == s->sg.size' failed. > > >>> > > >>> > > >>> -Amol > > >>> > > >> > > >> I'm less sure of the fix -- certainly the assert is wrong, but just > > >> incrementing 'n' is wrong too -- we didn't copy (n+1) sectors, we > > >> copied > > >> (n) and a few extra bytes. > > > > > > That is true. > > > > > > There are (at least) two fields that represent the total size of a DMA > > > transfer - > > > (1) The size, as requested through the NSECTOR field. > > > (2) The size, as calculated through the length fields of the PRD > > > entries.
> > > > > > It makes sense to consider the most restrictive of the sizes, as the > > > factor > > > which determines both the end of a successful DMA transfer and the > > > condition to assert. > > > > > >> > > >> The sector-based math here would need to be adjusted to be able to > > >> cope > > >> with partial sector reads... or we ought to avoid doing any partial > > >> sector transfers. > > >> > > >> > > >> I'm not sure which is more correct tonight, it depends: > > >> > > >> - If it's OK to transfer partial sectors before reporting overflow, > > >> adjusting the command loop to work with partial sectors is OK. > > >> > > >> - If it's NOT OK to do partial sector transfer, the sglist > > >> preparation > > >> phase needs to produce a truncated SGList that's some multiple of 512 > > >> bytes that leaves the excess bytes in a second sglist that we don't > > >> throw away and can use as a basis for building the next sglist. (Or > > >> the > > >> DMA helpers need to take a max_bytes parameter and return an sglist > > >> representing unused buffer space if the command underflowed.) > > > > > > Support for
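A sketch of the "most restrictive size" idea from the thread: clamp the transfer to the smaller of NSECTOR * 512 and the byte total described by the PRD table, then split that into whole sectors plus a partial-sector tail. The PRD decoding (byte count in bits 15:1, a count of 0 meaning 64 KiB, end-of-table in bit 31) is assumed to follow the Bus Master IDE layout. The struct and function names are hypothetical, and entries are assumed to already be in host byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SECTOR_SIZE 512u
#define PRDT_EOT 0x80000000u /* end-of-table flag in the size dword */

/* Hypothetical host-endian PRD entry. */
typedef struct {
    uint32_t addr;
    uint32_t size; /* byte count in bits 15:1, EOT in bit 31 */
} PrdEntry;

/* Total bytes described by the table, stopping at the EOT entry. */
static uint64_t prdt_bytes(const PrdEntry *prdt, size_t n)
{
    uint64_t total = 0;

    for (size_t i = 0; i < n; i++) {
        uint32_t len = prdt[i].size & 0xfffe;
        total += len ? len : 0x10000; /* a count of 0 means 64 KiB */
        if (prdt[i].size & PRDT_EOT) {
            break;
        }
    }
    return total;
}

typedef struct {
    uint64_t sectors;    /* whole sectors that can be transferred */
    uint32_t tail_bytes; /* leftover bytes of a partial sector */
} DmaLimit;

/* Clamp to the smaller of the requested and the described size. */
static DmaLimit dma_limit(uint32_t nsector, const PrdEntry *prdt, size_t n)
{
    uint64_t requested = (uint64_t)nsector * SECTOR_SIZE;
    uint64_t available = prdt_bytes(prdt, n);
    uint64_t bytes = requested < available ? requested : available;
    DmaLimit r = { bytes / SECTOR_SIZE, (uint32_t)(bytes % SECTOR_SIZE) };

    return r;
}
```

For the ide-test case above (NSECTOR = 2, PRD entries of 0x200 and 0x44 bytes), this yields one whole sector plus a 0x44-byte tail, which is exactly the state the `n * 512 == s->sg.size` assertion does not allow for.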
Re: [Qemu-devel] [PATCH V1 RESEND 4/6] numa: Extend the command-line to provide memory latency and bandwidth information
On 6/19/2018 11:39 PM, Eric Blake wrote: On 06/19/2018 10:20 AM, Liu Jingqi wrote: Add -numa hmat-lb option to provide System Locality Latency and Bandwidth Information. These memory attributes help to build System Locality Latency and Bandwidth Information Structure(s) in ACPI Heterogeneous Memory Attribute Table (HMAT). Signed-off-by: Liu Jingqi --- numa.c | 124 qapi/misc.json | 92 - qemu-options.hx | 28 - 3 files changed, 241 insertions(+), 3 deletions(-) +++ b/qapi/misc.json @@ -2736,10 +2736,12 @@ # # @cpu: property based CPU(s) to node mapping (Since: 2.10) # +# @hmat-lb: memory latency and bandwidth information (Since: 2.13) s/2.13/3.0/ through your series ## +# @HmatLBMemoryHierarchy: +# +# The memory hierarchy in the System Locality Latency +# and Bandwidth Information Structure of HMAT Worth including the expansion of the acronym HMAT for someone not familiar with the term? +# +# @memory: the structure represents the memory performance +# +# @last-level: last level memory of memory side cached memory +# +# @1st-level: first level memory of memory side cached memory +# +# @2nd-level: second level memory of memory side cached memory +# +# @3rd-level: third level memory of memory side cached memory +# +# Since: 2.13 +## +{ 'enum': 'HmatLBMemoryHierarchy', + 'data': [ 'memory', 'last-level', '1st-level', + '2nd-level', '3rd-level' ] } enum values starting with a digit is permitted for legacy reasons, but I'm reluctant to add more without good cause. Can you spell these 'first, second, third' instead of '1st, 2nd, 3rd'? + +## +# @HmatLBDataType: +# +# Data type in the System Locality Latency +# and Bandwidth Information Structure of HMAT +# +# @access-latency: access latency +# +# @read-latency: read latency +# +# @write-latency: write latency +# +# @access-bandwidth: access bandwitch s/witch/width/ Also, in what units are these numbers? Thanks for your review. I will modify them accordingly. Jingqi
Re: [Qemu-devel] [PATCH v2 3/4] ppc/pnv: introduce Pnv8Chip and Pnv9Chip models
On Tue, Jun 19, 2018 at 07:24:44AM +0200, Cédric Le Goater wrote: > > >>> typedef struct PnvChipClass { > >>> /*< private >*/ > >>> @@ -75,6 +95,7 @@ typedef struct PnvChipClass { > >>> > >>> hwaddr xscom_base; > >>> > >>> +void (*realize)(PnvChip *chip, Error **errp); > >> > >> This looks the wrong way round from how things are usually done. > >> Rather than having the base class realize() call the subclass specific > >> realize hook, it's more usual for the subclass to set the > >> dc->realize() and have it call a k->parent_realize() to call up the > >> chain. grep for device_class_set_parent_realize() for some more > >> examples. > > > > Ah. That is more to my liking. There are a couple of models following > > the wrong object pattern, xics, vio. I will check them. > > So XICS is causing some head-aches because the ics-kvm model inherits > from ics-simple which inherits from ics-base. so we have a grand-parent > class to handle. Ok. I mean, we should probably switch ics around to use the parent_realize model, rather than the backwards one it does now. But it's not immediately obvious to me why having a grandparent class breaks things. > if we could affiliate ics-kvm directly to ics-base it would make the > family affair easier. we need to check migration though. But that said, I've been thinking for a while that it might make sense to fold ics-kvm into ics-base. It seems very risky to have two different object classes that are supposed to have guest-identical behaviour. Certainly adding any migratable state to one and not the other would be horribly wrong. -- David Gibson| I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson
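For reference, a stripped-down model of the parent_realize pattern with a grandparent class, written in plain C rather than QOM; every type and function here is hypothetical. Each subclass keeps its saved parent hook in its own class struct, so an ics-kvm -> ics-simple -> ics-base style chain unwinds one level at a time. set_parent_realize() is meant to mimic the argument order of QEMU's device_class_set_parent_realize(dc, realize, &parent_realize).

```c
#include <assert.h>
#include <string.h>

typedef struct Device Device;
typedef void (*RealizeFn)(Device *dev);

/* Each class embeds its parent class as the first member. */
typedef struct { RealizeFn realize; } DeviceClass;
typedef struct { DeviceClass parent_class; RealizeFn parent_realize; } SimpleClass;
typedef struct { SimpleClass parent_class; RealizeFn parent_realize; } KvmClass;

struct Device {
    void *klass;   /* points at the most-derived class struct */
    char trace[8]; /* records the order in which realize steps ran */
    int n;
};

/* Save the current hook in *parent, then install the subclass hook. */
static void set_parent_realize(DeviceClass *dc, RealizeFn fn, RealizeFn *parent)
{
    *parent = dc->realize;
    dc->realize = fn;
}

static void base_realize(Device *dev) { dev->trace[dev->n++] = 'B'; }

static void simple_realize(Device *dev)
{
    /* Also valid when klass is a KvmClass: its first member is a SimpleClass. */
    SimpleClass *k = dev->klass;

    k->parent_realize(dev); /* chain up to the base class first */
    dev->trace[dev->n++] = 'S';
}

static void kvm_realize(Device *dev)
{
    KvmClass *k = dev->klass;

    k->parent_realize(dev); /* calls simple_realize, which calls base */
    dev->trace[dev->n++] = 'K';
}

static void base_class_init(DeviceClass *dc) { dc->realize = base_realize; }

static void simple_class_init(SimpleClass *k)
{
    base_class_init(&k->parent_class);
    set_parent_realize(&k->parent_class, simple_realize, &k->parent_realize);
}

static void kvm_class_init(KvmClass *k)
{
    simple_class_init(&k->parent_class);
    set_parent_realize(&k->parent_class.parent_class, kvm_realize,
                       &k->parent_realize);
}
```

Because each level owns its own parent_realize slot, adding a grandparent only adds one more hop in the chain; the base class never needs to know which subclass hooks exist.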
Re: [Qemu-devel] [Qemu-block] [RFC 1/1] ide: bug #1777315: io_buffer_size and sg.size can represent partial sector sizes
On Tue, Jun 19, 2018 at 05:43:52PM -0400, John Snow wrote: > > > On 06/19/2018 05:26 PM, Amol Surati wrote: > > On Tue, Jun 19, 2018 at 08:04:03PM +0530, Amol Surati wrote: > >> On Tue, Jun 19, 2018 at 09:45:15AM -0400, John Snow wrote: > >>> > >>> > >>> On 06/19/2018 04:53 AM, Kevin Wolf wrote: > Am 19.06.2018 um 06:01 hat Amol Surati geschrieben: > > On Mon, Jun 18, 2018 at 08:14:10PM -0400, John Snow wrote: > >> > >> > >> On 06/18/2018 02:02 PM, Amol Surati wrote: > >>> On Mon, Jun 18, 2018 at 12:05:15AM +0530, Amol Surati wrote: > This patch fixes the assumption that io_buffer_size is always a > perfect > multiple of the sector size. The assumption is the cause of the > firing > of 'assert(n * 512 == s->sg.size);'. > > Signed-off-by: Amol Surati > --- > >>> > >>> The repository https://github.com/asurati/1777315 contains a module > >>> for > >>> QEMU's 8086:7010 ATA controller, which exercises the code path > >>> described in [RFC 0/1] of this series. > >>> > >> > >> Thanks, this made it easier to see what was happening. I was able to > >> write an ide-test test case using this source as a guide, and reproduce > >> the error. > >> > >> static void test_bmdma_partial_sector_short_prdt(void) > >> { > >> QPCIDevice *dev; > >> QPCIBar bmdma_bar, ide_bar; > >> uint8_t status; > >> > >> /* Read 2 sectors but only give 1 sector in PRDT */ > >> PrdtEntry prdt[] = { > >> { > >> .addr = 0, > >> .size = cpu_to_le32(0x200), > >> }, > >> { > >> .addr = 512, > >> .size = cpu_to_le32(0x44 | PRDT_EOT), > >> } > >> }; > >> > >> dev = get_pci_device(&bmdma_bar, &ide_bar); > >> status = send_dma_request(CMD_READ_DMA, 0, 2, > >> prdt, ARRAY_SIZE(prdt), NULL); > >> g_assert_cmphex(status, ==, 0); > >> assert_bit_clear(qpci_io_readb(dev, ide_bar, reg_status), DF | > >> ERR); > >> free_pci_device(dev); > >> } > >> > >>> Loading the module reproduces the bug. Tested on the latest master > >>> branch.
> >>> > >>> Steps: > >>> - Install a Linux distribution as a guest, ensuring that the boot disk > >>> resides on non-IDE controllers (such as virtio) > >>> - Attach another disk as a master device on the primary > >>> IDE controller (i.e. attach at -hda.) > >>> - Blacklist ata_piix, pata_acpi and ata_generic modules, and reboot. > >>> - Copy the source files into the guest and build the module. > >>> - Load the module. QEMU process should die with the message: > >>> qemu-system-x86_64: hw/ide/core.c:871: ide_dma_cb: > >>> Assertion `n * 512 == s->sg.size' failed. > >>> > >>> > >>> -Amol > >>> > >> > >> I'm less sure of the fix -- certainly the assert is wrong, but just > >> incrementing 'n' is wrong too -- we didn't copy (n+1) sectors, we > >> copied > >> (n) and a few extra bytes. > > > > That is true. > > > > There are (at least) two fields that represent the total size of a DMA > > transfer - > > (1) The size, as requested through the NSECTOR field. > > (2) The size, as calculated through the length fields of the PRD > > entries. > > > > It makes sense to consider the most restrictive of the sizes, as the > > factor > > which determines both the end of a successful DMA transfer and the > > condition to assert. > > > >> > >> The sector-based math here would need to be adjusted to be able to cope > >> with partial sector reads... or we ought to avoid doing any partial > >> sector transfers. > >> > >> > >> I'm not sure which is more correct tonight, it depends: > >> > >> - If it's OK to transfer partial sectors before reporting overflow, > >> adjusting the command loop to work with partial sectors is OK. > >> > >> - If it's NOT OK to do partial sector transfer, the sglist preparation > >> phase needs to produce a truncated SGList that's some multiple of 512 > >> bytes that leaves the excess bytes in a second sglist that we don't > >> throw away and can use as a basis for building the next sglist. 
(Or the > >> DMA helpers need to take a max_bytes parameter and return an sglist > >> representing unused buffer space if the command underflowed.) > > > > Support for partial sector transfers is built into the DMA interface's > > PRD > > mechanism itself, because an entry is allowed to transfer in the units > > of > > even number of bytes. > > > > I think the controller's IO process runs in two parts (probably loops > > over > > for a single transfer): > > > > (1) The
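John's second option, truncating the sglist to a multiple of 512 bytes and keeping the excess as the basis for the next sglist, could look roughly like the sketch below. It works on bare segment lengths and a hypothetical cursor type rather than QEMU's QEMUSGList, so treat it as an illustration of the bookkeeping only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SECTOR_SIZE 512u

/* Hypothetical cursor into a list of DMA segment lengths. */
typedef struct {
    size_t seg;   /* index of the first unconsumed segment */
    uint32_t off; /* bytes already consumed inside that segment */
} SgCursor;

/*
 * Consume bytes from @lens starting at @cur, but only up to the largest
 * multiple of SECTOR_SIZE that fits (and at most @max_bytes). The cursor
 * is left on the unused remainder instead of discarding it.
 * Returns the number of bytes consumed.
 */
static uint64_t take_whole_sectors(const uint32_t *lens, size_t n,
                                   SgCursor *cur, uint64_t max_bytes)
{
    uint64_t avail = 0;

    for (size_t i = cur->seg; i < n; i++) {
        avail += lens[i] - (i == cur->seg ? cur->off : 0);
    }

    uint64_t want = avail < max_bytes ? avail : max_bytes;
    want -= want % SECTOR_SIZE; /* round down to whole sectors */

    uint64_t left = want;
    while (left > 0) {
        uint32_t room = lens[cur->seg] - cur->off;
        if (left < room) {
            cur->off += (uint32_t)left; /* stop inside this segment */
            left = 0;
        } else {
            left -= room;               /* segment fully used */
            cur->seg++;
            cur->off = 0;
        }
    }
    return want;
}
```

With segments of 512 and 0x44 bytes and a two-sector request, one sector is taken and the cursor is left on the 0x44-byte segment; whether those bytes are then transferred or reported as an underflow is exactly the open question in the thread.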
Re: [Qemu-devel] [PATCH v2 3/3] spapr: introduce a fixed IRQ number space
On Tue, Jun 19, 2018 at 12:05:21PM +0200, Cédric Le Goater wrote: > On 06/19/2018 03:02 AM, David Gibson wrote: > > On Mon, Jun 18, 2018 at 07:34:02PM +0200, Cédric Le Goater wrote: > >> This proposal introduces a new IRQ number space layout using static > >> numbers for all devices and a bitmap allocator for the MSI numbers > >> which are negotiated by the guest at runtime. > >> > >> The previous layout is kept in machines raising the 'xics_legacy' > >> flag. > >> > >> Signed-off-by: Cédric Le Goater > >> --- > >> include/hw/ppc/spapr.h | 4 > >> include/hw/ppc/spapr_irq.h | 30 + > >> hw/ppc/spapr.c | 31 + > >> hw/ppc/spapr_events.c | 12 -- > >> hw/ppc/spapr_irq.c | 56 > >> ++ > >> hw/ppc/spapr_pci.c | 28 ++- > >> hw/ppc/spapr_vio.c | 19 > >> hw/ppc/Makefile.objs | 2 +- > >> 8 files changed, 169 insertions(+), 13 deletions(-) > >> create mode 100644 include/hw/ppc/spapr_irq.h > >> create mode 100644 hw/ppc/spapr_irq.c > >> > >> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h > >> index 9decc66a1915..4c63b1fac13b 100644 > >> --- a/include/hw/ppc/spapr.h > >> +++ b/include/hw/ppc/spapr.h > >> @@ -7,6 +7,7 @@ > >> #include "hw/ppc/spapr_drc.h" > >> #include "hw/mem/pc-dimm.h" > >> #include "hw/ppc/spapr_ovec.h" > >> +#include "hw/ppc/spapr_irq.h" > >> > >> struct VIOsPAPRBus; > >> struct sPAPRPHBState; > >> @@ -164,6 +165,9 @@ struct sPAPRMachineState { > >> char *kvm_type; > >> > >> const char *icp_type; > >> +bool xics_legacy; > > > > This flag can go in the class, rather than the instance. > > > > And maybe call it 'legacy_irq_allocation'. It assumes XICS, but > > otherwise isn't strongly tied to it. > > Here's another idea. > > Instead of a bool, we could use a find() operation if it is defined > by the spapr_irq backend. So, I don't think find() should go into the irq backend you've been describing elsewhere. I think that should be restricted to the claim() side stuff. 
But you could make it a sPAPRMachineClass method, and use the static allocations if it's NULL. -- David Gibson
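A minimal sketch of the suggestion above: an optional allocator hook on the machine class, with a fixed IRQ number used whenever the hook is NULL. The struct and function names, and the numbers in the usage, are hypothetical and do not reflect the actual sPAPRMachineClass layout.

```c
#include <assert.h>
#include <stddef.h>

typedef struct MachineState MachineState;

/* Hypothetical machine class with an optional legacy allocator. */
typedef struct {
    /* NULL means "use the fixed IRQ number space". */
    int (*irq_find)(MachineState *ms, int count);
} MachineClass;

struct MachineState {
    const MachineClass *mc;
    int next_free; /* state for the legacy first-fit allocator */
};

/* Legacy behaviour: hand out the next free numbers in order. */
static int legacy_find(MachineState *ms, int count)
{
    int irq = ms->next_free;

    ms->next_free += count;
    return irq;
}

/* Newer machine types leave irq_find NULL and use the static number. */
static int irq_for_device(MachineState *ms, int static_irq)
{
    if (ms->mc->irq_find) {
        return ms->mc->irq_find(ms, 1);
    }
    return static_irq;
}
```

Keeping the hook on the class rather than the instance matches the earlier review comment: whether a machine uses the legacy allocation is a property of the machine type, not of a particular running machine.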
Re: [Qemu-devel] [PATCH v4 01/11] ppc4xx_i2c: Remove unimplemented sdata and intr registers
On Tue, Jun 19, 2018 at 10:52:15AM +0200, BALATON Zoltan wrote: > We don't emulate slave mode so related registers are not needed. > [lh]sadr are only retained to avoid too many warnings and simplify > debugging but sdata is not even correct because device has a 4 byte > FIFO instead so just remove this unimplemented register for now. > > The intr register is also not implemented correctly, it is for > diagnostics and normally not even visible on device without explicitly > enabling it. As no guests are known to need this remove it as well. > > Signed-off-by: BALATON Zoltan > --- > v4: Updated commit message Applied to ppc-for-3.0, thanks. > > hw/i2c/ppc4xx_i2c.c | 16 +--- > include/hw/i2c/ppc4xx_i2c.h | 4 +--- > 2 files changed, 2 insertions(+), 18 deletions(-) > > diff --git a/hw/i2c/ppc4xx_i2c.c b/hw/i2c/ppc4xx_i2c.c > index d1936db..4e0aaae 100644 > --- a/hw/i2c/ppc4xx_i2c.c > +++ b/hw/i2c/ppc4xx_i2c.c > @@ -3,7 +3,7 @@ > * > * Copyright (c) 2007 Jocelyn Mayer > * Copyright (c) 2012 François Revol > - * Copyright (c) 2016 BALATON Zoltan > + * Copyright (c) 2016-2018 BALATON Zoltan > * > * Permission is hereby granted, free of charge, to any person obtaining a > copy > * of this software and associated documentation files (the "Software"), to > deal > @@ -63,7 +63,6 @@ static void ppc4xx_i2c_reset(DeviceState *s) > i2c->mdcntl = 0; > i2c->sts = 0; > i2c->extsts = 0x8f; > -i2c->sdata = 0; > i2c->lsadr = 0; > i2c->hsadr = 0; > i2c->clkdiv = 0; > @@ -71,7 +70,6 @@ static void ppc4xx_i2c_reset(DeviceState *s) > i2c->xfrcnt = 0; > i2c->xtcntlss = 0; > i2c->directcntl = 0xf; > -i2c->intr = 0; > } > > static inline bool ppc4xx_i2c_is_master(PPC4xxI2CState *i2c) > @@ -139,9 +137,6 @@ static uint64_t ppc4xx_i2c_readb(void *opaque, hwaddr > addr, unsigned int size) >TYPE_PPC4xx_I2C, __func__); > } > break; > -case 2: > -ret = i2c->sdata; > -break; > case 4: > ret = i2c->lmadr; > break; > @@ -181,9 +176,6 @@ static uint64_t ppc4xx_i2c_readb(void *opaque, hwaddr > addr, 
unsigned int size) > case 16: > ret = i2c->directcntl; > break; > -case 17: > -ret = i2c->intr; > -break; > default: > if (addr < PPC4xx_I2C_MEM_SIZE) { > qemu_log_mask(LOG_UNIMP, "%s: Unimplemented register 0x%" > @@ -229,9 +221,6 @@ static void ppc4xx_i2c_writeb(void *opaque, hwaddr addr, > uint64_t value, > } > } > break; > -case 2: > -i2c->sdata = value; > -break; > case 4: > i2c->lmadr = value; > if (i2c_bus_busy(i2c->bus)) { > @@ -302,9 +291,6 @@ static void ppc4xx_i2c_writeb(void *opaque, hwaddr addr, > uint64_t value, > case 16: > i2c->directcntl = value & 0x7; > break; > -case 17: > -i2c->intr = value; > -break; > default: > if (addr < PPC4xx_I2C_MEM_SIZE) { > qemu_log_mask(LOG_UNIMP, "%s: Unimplemented register 0x%" > diff --git a/include/hw/i2c/ppc4xx_i2c.h b/include/hw/i2c/ppc4xx_i2c.h > index 3c60307..e4b6ded 100644 > --- a/include/hw/i2c/ppc4xx_i2c.h > +++ b/include/hw/i2c/ppc4xx_i2c.h > @@ -3,7 +3,7 @@ > * > * Copyright (c) 2007 Jocelyn Mayer > * Copyright (c) 2012 François Revol > - * Copyright (c) 2016 BALATON Zoltan > + * Copyright (c) 2016-2018 BALATON Zoltan > * > * Permission is hereby granted, free of charge, to any person obtaining a > copy > * of this software and associated documentation files (the "Software"), to > deal > @@ -49,7 +49,6 @@ typedef struct PPC4xxI2CState { > uint8_t mdcntl; > uint8_t sts; > uint8_t extsts; > -uint8_t sdata; > uint8_t lsadr; > uint8_t hsadr; > uint8_t clkdiv; > @@ -57,7 +56,6 @@ typedef struct PPC4xxI2CState { > uint8_t xfrcnt; > uint8_t xtcntlss; > uint8_t directcntl; > -uint8_t intr; > } PPC4xxI2CState; > > #endif /* PPC4XX_I2C_H */ -- David Gibson| I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson
Re: [Qemu-devel] [PATCH v2 3/3] spapr: introduce a fixed IRQ number space
On Tue, Jun 19, 2018 at 07:00:18AM +0200, Cédric Le Goater wrote: > On 06/19/2018 03:02 AM, David Gibson wrote: > > On Mon, Jun 18, 2018 at 07:34:02PM +0200, Cédric Le Goater wrote: > >> This proposal introduces a new IRQ number space layout using static > >> numbers for all devices and a bitmap allocator for the MSI numbers > >> which are negotiated by the guest at runtime. > >> > >> The previous layout is kept in machines raising the 'xics_legacy' > >> flag. > >> > >> Signed-off-by: Cédric Le Goater > >> --- > >> include/hw/ppc/spapr.h | 4 > >> include/hw/ppc/spapr_irq.h | 30 + > >> hw/ppc/spapr.c | 31 + > >> hw/ppc/spapr_events.c | 12 -- > >> hw/ppc/spapr_irq.c | 56 > >> ++ > >> hw/ppc/spapr_pci.c | 28 ++- > >> hw/ppc/spapr_vio.c | 19 > >> hw/ppc/Makefile.objs | 2 +- > >> 8 files changed, 169 insertions(+), 13 deletions(-) > >> create mode 100644 include/hw/ppc/spapr_irq.h > >> create mode 100644 hw/ppc/spapr_irq.c > >> > >> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h > >> index 9decc66a1915..4c63b1fac13b 100644 > >> --- a/include/hw/ppc/spapr.h > >> +++ b/include/hw/ppc/spapr.h > >> @@ -7,6 +7,7 @@ > >> #include "hw/ppc/spapr_drc.h" > >> #include "hw/mem/pc-dimm.h" > >> #include "hw/ppc/spapr_ovec.h" > >> +#include "hw/ppc/spapr_irq.h" > >> > >> struct VIOsPAPRBus; > >> struct sPAPRPHBState; > >> @@ -164,6 +165,9 @@ struct sPAPRMachineState { > >> char *kvm_type; > >> > >> const char *icp_type; > >> +bool xics_legacy; > > > > This flag can go in the class, rather than the instance. > > > > And maybe call it 'legacy_irq_allocation'. It assumes XICS, but > > otherwise isn't strongly tied to it. > > OK. > > >> +int32_t irq_map_nr; > >> +unsigned long *irq_map; > > > > So, I don't love the fact that the new bitmap duplicates information > > that's also in the intc backend (e.g. via ICS_IRQ_FREE()). > > Yes. I agree. new devices using MSI like interrupts will follow the > same pattern for allocation. 
> > we have two layers of IRQ routines, one for the IRQ numbers and one > for the controller backend. Maybe we could call the backend handling > routing from the MSI one? > > > However > > leaving the authoritative info in the backend also causes problems > > when we have dynamic switching. Not entirely sure what to do about > > that. > > Yes, if we put it in the IRQ backend (the current IRQ controller model > in use) we will have to synchronize the number spaces when the machine > switches interrupt mode. > > >> bool cmd_line_caps[SPAPR_CAP_NUM]; > >> sPAPRCapabilities def, eff, mig; > >> diff --git a/include/hw/ppc/spapr_irq.h b/include/hw/ppc/spapr_irq.h > >> new file mode 100644 > >> index ..345a42efd366 > >> --- /dev/null > >> +++ b/include/hw/ppc/spapr_irq.h > >> @@ -0,0 +1,30 @@ > >> +/* > >> + * QEMU PowerPC sPAPR IRQ backend definitions > >> + * > >> + * Copyright (c) 2018, IBM Corporation. > >> + * > >> + * This code is licensed under the GPL version 2 or later. See the > >> + * COPYING file in the top-level directory. > >> + */ > >> + > >> +#ifndef HW_SPAPR_IRQ_H > >> +#define HW_SPAPR_IRQ_H > >> + > >> +/* > >> + * IRQ range offsets per device type > >> + */ > >> +#define SPAPR_IRQ_EPOW 0x1000 /* XICS_IRQ_BASE offset */ > >> +#define SPAPR_IRQ_HOTPLUG 0x1001 > >> +#define SPAPR_IRQ_VIO 0x1100 /* 256 VIO devices */ > >> +#define SPAPR_IRQ_PCI_LSI 0x1200 /* 32+ PHBs devices */ > >> + > >> +#define SPAPR_IRQ_MSI 0x1300 /* Offset of the dynamic range > >> covered > >> + * by the bitmap allocator */ > > > > I'm a little confused by the MSI stuff. It looks like you're going > > for the option of one big pool for all dynamic irqs. Except that I > > thought in our discussion the other day you said each PHB advertised > > its own separate MSI range, so we'd actually need to split this up > > into ranges for each PHB. > > Yes, we can also do that, but we don't really need to, and it might be too > constraining in fact. Ok.
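To make the fixed layout concrete, here is a sketch of how device IRQ numbers could be derived from the offsets in the proposed spapr_irq.h. The two helpers are illustrative guesses, not code from the patch. They assume 4 LSIs per PHB and VIO numbers keyed on the low 8 bits of the device's reg value. The static asserts check that, under those assumptions, the ranges do not overlap.

```c
#include <assert.h>
#include <stdint.h>

/* Offsets as proposed in include/hw/ppc/spapr_irq.h. */
#define SPAPR_IRQ_EPOW     0x1000  /* XICS_IRQ_BASE offset */
#define SPAPR_IRQ_HOTPLUG  0x1001
#define SPAPR_IRQ_VIO      0x1100  /* 256 VIO devices */
#define SPAPR_IRQ_PCI_LSI  0x1200  /* 32+ PHBs */
#define SPAPR_IRQ_MSI      0x1300  /* start of the dynamic range */

#define PCI_NUM_PINS 4             /* INTA..INTD */

/* Assumption: the VIO range holds 256 devices, and 64 PHBs x 4 LSIs fit. */
_Static_assert((SPAPR_IRQ_VIO | 0xff) < SPAPR_IRQ_PCI_LSI,
               "VIO range overlaps the PCI LSI range");
_Static_assert(SPAPR_IRQ_PCI_LSI + 64 * PCI_NUM_PINS <= SPAPR_IRQ_MSI,
               "PCI LSI range overlaps the MSI range");

/* Hypothetical helper: VIO IRQ keyed by the low 8 bits of the reg value. */
static uint32_t spapr_vio_irq(uint32_t reg)
{
    return SPAPR_IRQ_VIO | (reg & 0xff);
}

/* Hypothetical helper: a fixed block of PCI_NUM_PINS LSIs per PHB index. */
static uint32_t spapr_phb_lsi_irq(uint32_t phb_index, uint32_t pin)
{
    assert(pin < PCI_NUM_PINS);
    return SPAPR_IRQ_PCI_LSI + phb_index * PCI_NUM_PINS + pin;
}
```

Because every number is a pure function of the device's identity, source and destination of a migration agree without transferring any allocator state.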
> As the IRQs are allocated dynamically, there is not a strong relation > between the device doing so and the IRQ numbers. The need for a well > defined IRQ number range is weak. We should provision a certain number > of IRQs of course to size our IRQ number space but even that could be > done dynamically. We can resize the bitmap and allocate new source > blocks under the KVM XICS/XIVE device if needed. The resulting code > is quite simple and the IRQ number space is also less fragmented. > > I think we have all the requirements in hand, the current ones and the > new ones for hotplug PHBs, XIVE interrupt model, CAPI (which should be > like the PHBs), XIVE user IRQs (like MSIs). The new ones are all > dynamic IRQ
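The single dynamic pool described in this thread can be sketched as a bitmap with one bit per IRQ number above SPAPR_IRQ_MSI. This is a standalone illustration, not the patch's spapr_irq.c. MsiMap, msi_map_init(), msi_alloc() and msi_free() are hypothetical, and the map has a fixed size (resizing is left out). Blocks are aligned because a multi-MSI device ORs the vector index into the low bits of the message data, so a block of N vectors needs an N-aligned base.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

#define SPAPR_IRQ_MSI 0x1300   /* first dynamic IRQ number */
#define BITS_PER_ULONG (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical allocator state: bit n set means IRQ SPAPR_IRQ_MSI + n is used. */
typedef struct {
    unsigned long *map;
    int nr;                    /* number of IRQs tracked */
} MsiMap;

static bool test_bit_ul(const unsigned long *map, int n)
{
    return (map[n / BITS_PER_ULONG] >> (n % BITS_PER_ULONG)) & 1UL;
}

static void set_bit_ul(unsigned long *map, int n)
{
    map[n / BITS_PER_ULONG] |= 1UL << (n % BITS_PER_ULONG);
}

static void clear_bit_ul(unsigned long *map, int n)
{
    map[n / BITS_PER_ULONG] &= ~(1UL << (n % BITS_PER_ULONG));
}

static int msi_map_init(MsiMap *m, int nr)
{
    size_t words = (nr + BITS_PER_ULONG - 1) / BITS_PER_ULONG;

    m->map = calloc(words, sizeof(unsigned long));
    m->nr = nr;
    return m->map ? 0 : -1;
}

/*
 * Allocate 'num' contiguous IRQs starting at a multiple of 'align'
 * (a power of two). Returns the global IRQ number, or -1 when full.
 */
static int msi_alloc(MsiMap *m, int num, int align)
{
    assert(align > 0 && (align & (align - 1)) == 0);
    for (int start = 0; start + num <= m->nr; start += align) {
        int i = 0;
        while (i < num && !test_bit_ul(m->map, start + i)) {
            i++;               /* extend the free run */
        }
        if (i == num) {
            for (i = 0; i < num; i++) {
                set_bit_ul(m->map, start + i);
            }
            return SPAPR_IRQ_MSI + start;
        }
    }
    return -1;
}

static void msi_free(MsiMap *m, int irq, int num)
{
    for (int i = 0; i < num; i++) {
        clear_bit_ul(m->map, irq - SPAPR_IRQ_MSI + i);
    }
}
```

A first-fit scan like this is O(n) per allocation, which is acceptable for a few thousand numbers allocated at negotiation time.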
Re: [Qemu-devel] [PATCH v3 1/2] kvm: support -dedicated cpu-pm=on|off
On Wed, 20 Jun 2018 at 08:07, Michael S. Tsirkin wrote: > > On Tue, Jun 19, 2018 at 05:07:46PM -0500, Eric Blake wrote: > > On 06/19/2018 10:17 AM, Paolo Bonzini wrote: > > > On 16/06/2018 00:29, Michael S. Tsirkin wrote: > > > > +static QemuOptsList qemu_dedicated_opts = { > > > > +.name = "dedicated", > > > > +.head = QTAILQ_HEAD_INITIALIZER(qemu_dedicated_opts.head), > > > > +.desc = { > > > > +{ > > > > +.name = "mem-lock", > > > > +.type = QEMU_OPT_BOOL, > > > > +}, > > > > +{ > > > > +.name = "cpu-pm", > > > > +.type = QEMU_OPT_BOOL, > > > > +}, > > > > +{ /* end of list */ } > > > > +}, > > > > +}; > > > > + > > > > > > Let the bikeshedding begin! > > > > > > 1) Should we deprecate -realtime? > > > > > > 2) Maybe -hostresource? > > > > What further things might we add in the future? > > > > -dedicated sounds wrong (it is an adjective, while most of our options are > > nouns - think -machine, -drive, -object, ...) > > > > -hostresource at least sounds like a noun, but is long to type. But at > > least '-hostresource cpu-pm=on' reads reasonably well. > > Yes, but host resource what? I feel it says nothing at all about what > one can expect to find in this flag. > > > About the only other noun I could think of would be '-feature cpu-pm=on'. > > If we have nothing at all to say about what is grouping these things, > we don't need a new flag. We can make it a machine property. > > It's the user's hint that some host resource is dedicated to a VM. I think commit 633711e82 (kvm: rename KVM_HINTS_DEDICATED to KVM_HINTS_REALTIME) should be reverted, based on the discussion in several threads. Regards, Wanpeng Li
Re: [Qemu-devel] [PATCH] hmp-commands: use long for begin and length in dump-guest-memory
On Tue, 2018-06-19 at 11:25 +0100, Dr. David Alan Gilbert wrote: > * Suraj Jitindar Singh (sjitindarsi...@gmail.com) wrote: > > The dump-guest-memory command is used to dump an area of guest > > memory > > to a file, the piece of memory is specified by a begin address and > > a length. These parameters are specified as ints and thus have a > > maximum > > value of 4GB. This means you can't dump the guest memory past the > > first > > 4GB and instead get: > > (qemu) dump-guest-memory tmp 0x1 0x1 > > 'dump-guest-memory' has failed: integer is for 32-bit values > > Try "help dump-guest-memory" for more information > > > > This limitation is imposed in monitor_parse_arguments() since they > > are > > both ints. hmp_dump_guest_memory() uses 64 bit quantities to store > > both > > the begin and length values. Thus specify begin and length as long > > so > > that the entire guest memory space can be dumped. > > > > Signed-off-by: Suraj Jitindar Singh > > --- > > hmp-commands.hx | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/hmp-commands.hx b/hmp-commands.hx > > index 0734fea931..3b5c1f65db 100644 > > --- a/hmp-commands.hx > > +++ b/hmp-commands.hx > > @@ -1116,7 +1116,7 @@ ETEXI > > > > { > > .name = "dump-guest-memory", > > -.args_type = "paging:-p,detach:-d,zlib:-z,lzo:-l,snappy:- > > s,filename:F,begin:i?,length:i?", > > +.args_type = "paging:-p,detach:-d,zlib:-z,lzo:-l,snappy:- > > s,filename:F,begin:l?,length:l?", > > .params = "[-p] [-d] [-z|-l|-s] filename [begin > > length]", > > .help = "dump guest memory into file > > 'filename'.\n\t\t\t" > >"-p: do paging to get guest's memory > > mapping.\n\t\t\t" > > OK, so hmp_dump_guest_memory in hmp.c already uses int64_t for both, > as does the qmp_dump_guest_memory it calls; so this looks OK. > > Can you repost this please with the correct sign off that I see you > tried to fix in the following mail; best if we get it in the one > mail. Of course. 
Done :) > > Dave > > > -- > > 2.13.6 > > > > -- > Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
[Qemu-devel] [PATCH] [RESEND] hmp-commands: use long for begin and length in dump-guest-memory
The dump-guest-memory command is used to dump an area of guest memory to a file, the piece of memory is specified by a begin address and a length. These parameters are specified as ints and thus have a maximum value of 4GB. This means you can't dump the guest memory past the first 4GB and instead get: (qemu) dump-guest-memory tmp 0x1 0x1 'dump-guest-memory' has failed: integer is for 32-bit values Try "help dump-guest-memory" for more information This limitation is imposed in monitor_parse_arguments() since they are both ints. hmp_dump_guest_memory() uses 64 bit quantities to store both the begin and length values. Thus specify begin and length as long so that the entire guest memory space can be dumped. Signed-off-by: Suraj Jitindar Singh --- hmp-commands.hx | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/hmp-commands.hx b/hmp-commands.hx index 0734fea931..3b5c1f65db 100644 --- a/hmp-commands.hx +++ b/hmp-commands.hx @@ -1116,7 +1116,7 @@ ETEXI { .name = "dump-guest-memory", -.args_type = "paging:-p,detach:-d,zlib:-z,lzo:-l,snappy:-s,filename:F,begin:i?,length:i?", +.args_type = "paging:-p,detach:-d,zlib:-z,lzo:-l,snappy:-s,filename:F,begin:l?,length:l?", .params = "[-p] [-d] [-z|-l|-s] filename [begin length]", .help = "dump guest memory into file 'filename'.\n\t\t\t" "-p: do paging to get guest's memory mapping.\n\t\t\t" -- 2.13.6
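The failure above comes from the range check the monitor applies to 'i' arguments. Below is a rough, self-contained sketch of the difference between the two types. parse_int_arg() is hypothetical. It uses strtoll() where the monitor evaluates expressions, and it only approximates the check in monitor_parse_arguments(), which rejects an 'i' value when any of its upper 32 bits are set.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper: parse a monitor-style integer argument.
 * is_64bit == false approximates the 'i' type (upper 32 bits must be
 * clear); is_64bit == true approximates 'l' (full 64-bit value).
 */
static bool parse_int_arg(const char *s, bool is_64bit, int64_t *out)
{
    char *end;
    long long v;

    assert(s != NULL && out != NULL);
    errno = 0;
    v = strtoll(s, &end, 0);          /* base 0 accepts 0x... as hex */
    if (errno != 0 || end == s || *end != '\0') {
        return false;                 /* not a number, or overflow */
    }
    if (!is_64bit && (v < 0 || v > (long long)UINT32_MAX)) {
        return false;                 /* "integer is for 32-bit values" */
    }
    *out = v;
    return true;
}
```

When parsed as a 64-bit value, which is the 'l' type the patch switches to, a begin address of 0x100000000 is accepted. When parsed as a 32-bit value, it is rejected, matching the error quoted in the commit message.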
Re: [Qemu-devel] [PATCH v3 1/2] kvm: support -dedicated cpu-pm=on|off
On Tue, Jun 19, 2018 at 05:07:46PM -0500, Eric Blake wrote: > On 06/19/2018 10:17 AM, Paolo Bonzini wrote: > > On 16/06/2018 00:29, Michael S. Tsirkin wrote: > > > +static QemuOptsList qemu_dedicated_opts = { > > > +.name = "dedicated", > > > +.head = QTAILQ_HEAD_INITIALIZER(qemu_dedicated_opts.head), > > > +.desc = { > > > +{ > > > +.name = "mem-lock", > > > +.type = QEMU_OPT_BOOL, > > > +}, > > > +{ > > > +.name = "cpu-pm", > > > +.type = QEMU_OPT_BOOL, > > > +}, > > > +{ /* end of list */ } > > > +}, > > > +}; > > > + > > > > Let the bikeshedding begin! > > > > 1) Should we deprecate -realtime? > > > > 2) Maybe -hostresource? > > What further things might we add in the future? > > -dedicated sounds wrong (it is an adjective, while most of our options are > nouns - think -machine, -drive, -object, ...) > > -hostresource at least sounds like a noun, but is long to type. But at > least '-hostresource cpu-pm=on' reads reasonably well. Yes, but host resource what? I feel it says nothing at all about what one can expect to find in this flag. > About the only other noun I could think of would be '-feature cpu-pm=on'. If we have nothing at all to say about what is grouping these things, we don't need a new flag. We can make it a machine property. It's the user's hint that some host resource is dedicated to a VM. > -- > Eric Blake, Principal Software Engineer > Red Hat, Inc. +1-919-301-3266 > Virtualization: qemu.org | libvirt.org
Re: [Qemu-devel] [PATCH v1 4/6] qga: removing switch statements, adding run_process_child
Hi On Tue, Jun 19, 2018 at 9:38 PM, Daniel Henrique Barboza wrote: > This is a cleanup of the resulting code after detaching > pmutils and Linux sys state file logic: > > - remove the SUSPEND_MODE_* macros and use an enumeration > instead. At the same time, drop the switch statements > at the start of each function and use the enumeration > index to get the right binary/argument; > > - create a new function called run_process_child(). This > function creates a child process and executes a (shell) > command, returning the command return code. This is a common What about using g_spawn_sync() instead? > operation in the pmutils functions and will be used in the > systemd implementation as well, so this function will avoid > code repetition. > > There are more places inside commands-posix.c where this new > run_process_child function can also be used, but one step > at a time. > > Signed-off-by: Daniel Henrique Barboza > --- > qga/commands-posix.c | 190 +-- > 1 file changed, 76 insertions(+), 114 deletions(-) > > diff --git a/qga/commands-posix.c b/qga/commands-posix.c > index a2870f9ab9..d5e3805ce9 100644 > --- a/qga/commands-posix.c > +++ b/qga/commands-posix.c > @@ -1438,152 +1438,122 @@ qmp_guest_fstrim(bool has_minimum, int64_t minimum, > Error **errp) > #define LINUX_SYS_STATE_FILE "/sys/power/state" > #define SUSPEND_SUPPORTED 0 > #define SUSPEND_NOT_SUPPORTED 1 > -#define SUSPEND_MODE_DISK 1 > -#define SUSPEND_MODE_RAM 2 > -#define SUSPEND_MODE_HYBRID 3 > > -static bool pmutils_supports_mode(int suspend_mode, Error **errp) > +typedef enum { > +SUSPEND_MODE_DISK = 0, > +SUSPEND_MODE_RAM = 1, > +SUSPEND_MODE_HYBRID = 2, > +} SuspendMode; > + > +static int run_process_child(const char *command[], Error **errp) > { > Error *local_err = NULL; > -const char *pmutils_arg; > -const char *pmutils_bin = "pm-is-supported"; > -char *pmutils_path; > +char *cmd_path = g_find_program_in_path(command[0]); > pid_t pid; > -int status; > -bool ret = false; > - > -switch 
(suspend_mode) { > - > -case SUSPEND_MODE_DISK: > -pmutils_arg = "--hibernate"; > -break; > -case SUSPEND_MODE_RAM: > -pmutils_arg = "--suspend"; > -break; > -case SUSPEND_MODE_HYBRID: > -pmutils_arg = "--suspend-hybrid"; > -break; > -default: > -return ret; > -} > +int status, ret = -1; > > -pmutils_path = g_find_program_in_path(pmutils_bin); > -if (!pmutils_path) { > +if (!cmd_path) { > return ret; > } > > pid = fork(); > if (!pid) { > setsid(); > -execle(pmutils_path, pmutils_bin, pmutils_arg, NULL, environ); > /* > - * If we get here execle() has failed. > + * execve receives a char* const argv[] as second arg but we're > + * receiving a const char*[]. Since execve does not change the > + * array contents it's tolerable to cast here. > */ > -_exit(SUSPEND_NOT_SUPPORTED); > +execve(cmd_path, (char* const*)command, environ); > +_exit(errno); > } else if (pid < 0) { > error_setg_errno(errp, errno, "failed to create child process"); > +ret = EXIT_FAILURE; > goto out; > } > > ga_wait_child(pid, &status, &local_err); > if (local_err) { > error_propagate(errp, local_err); > +ret = EXIT_FAILURE; > goto out; > } > > -switch (WEXITSTATUS(status)) { > -case SUSPEND_SUPPORTED: > -ret = true; > -goto out; > -case SUSPEND_NOT_SUPPORTED: > -goto out; > -default: > -error_setg(errp, > - "the helper program '%s' returned an unexpected exit > status" > - " code (%d)", pmutils_path, WEXITSTATUS(status)); > -goto out; > -} > +ret = WEXITSTATUS(status); > > out: > -g_free(pmutils_path); > +g_free(cmd_path); > return ret; > } > > -static void pmutils_suspend(int suspend_mode, Error **errp) > +static bool pmutils_supports_mode(SuspendMode mode, Error **errp) > { > Error *local_err = NULL; > -const char *pmutils_bin; > -char *pmutils_path; > -pid_t pid; > +const char *pmutils_args[3] = {"--hibernate", "--suspend", > + "--suspend-hybrid"}; > +const char *cmd[3] = {"pm-is-supported", pmutils_args[mode], NULL}; > int status; > > -switch (suspend_mode) { > - > -case SUSPEND_MODE_DISK: > -pmutils_bin =
"pm-hibernate"; > -break; > -case SUSPEND_MODE_RAM: > -pmutils_bin = "pm-suspend"; > -break; > -case SUSPEND_MODE_HYBRID: > -pmutils_bin = "pm-suspend-hybrid"; > -break; > -default: > -error_setg(errp, "unknown guest suspend mode"); > -return; > -} > +status = run_process_child(cmd, &local_err); > > -pmutils_path =
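For comparison with the run_process_child() helper in the patch, here is a standalone POSIX sketch of the same fork/exec/wait pattern, together with the enum-indexed argument table. It makes several assumptions and is not the patch's code. It uses execvp() for the PATH lookup instead of g_find_program_in_path() and waitpid() instead of ga_wait_child(). It reports no Error object. When exec fails, it exits with 127 rather than errno, so that an exec failure is not reported as an arbitrary errno value that could collide with the helper's own exit codes. The reviewer's g_spawn_sync() suggestion would replace most of this with a single GLib call.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

typedef enum {
    SUSPEND_MODE_DISK = 0,
    SUSPEND_MODE_RAM = 1,
    SUSPEND_MODE_HYBRID = 2,
} SuspendMode;

/* pm-is-supported arguments indexed by SuspendMode. */
static const char *const pmutils_args[] = {
    [SUSPEND_MODE_DISK] = "--hibernate",
    [SUSPEND_MODE_RAM] = "--suspend",
    [SUSPEND_MODE_HYBRID] = "--suspend-hybrid",
};

/*
 * Run argv[0] (looked up in PATH) with argv and wait for it.
 * Returns the child's exit status, or -1 if it could not be run/reaped.
 */
static int run_process_child(const char *const argv[])
{
    int status;
    pid_t pid;

    assert(argv != NULL && argv[0] != NULL);
    pid = fork();
    if (pid == 0) {
        setsid();
        /* execvp() takes char *const[]; it does not modify the array. */
        execvp(argv[0], (char *const *)argv);
        _exit(127);                   /* exec failed, e.g. not found */
    } else if (pid < 0) {
        return -1;
    }

    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR) {
            return -1;
        }
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A caller would build `{"pm-is-supported", pmutils_args[mode], NULL}` and treat 0 as "supported", as the patch does.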
Re: [Qemu-devel] [PATCH v1 3/6] qga: guest_suspend: decoupling pm-utils and sys logic
Hi On Tue, Jun 19, 2018 at 9:38 PM, Daniel Henrique Barboza wrote: > Following the same logic of the previous patch, let's also > decouple the suspend logic from guest_suspend into specialized > functions, one for each strategy we support at this moment. > > Signed-off-by: Daniel Henrique Barboza > --- > qga/commands-posix.c | 170 +++ > 1 file changed, 108 insertions(+), 62 deletions(-) > > diff --git a/qga/commands-posix.c b/qga/commands-posix.c > index 89ffd8dc88..a2870f9ab9 100644 > --- a/qga/commands-posix.c > +++ b/qga/commands-posix.c > @@ -1509,6 +1509,65 @@ out: > return ret; > } > > +static void pmutils_suspend(int suspend_mode, Error **errp) > +{ > +Error *local_err = NULL; > +const char *pmutils_bin; > +char *pmutils_path; > +pid_t pid; > +int status; > + > +switch (suspend_mode) { > + > +case SUSPEND_MODE_DISK: > +pmutils_bin = "pm-hibernate"; > +break; > +case SUSPEND_MODE_RAM: > +pmutils_bin = "pm-suspend"; > +break; > +case SUSPEND_MODE_HYBRID: > +pmutils_bin = "pm-suspend-hybrid"; > +break; > +default: > +error_setg(errp, "unknown guest suspend mode"); > +return; > +} > + > +pmutils_path = g_find_program_in_path(pmutils_bin); > +if (!pmutils_path) { > +error_setg(errp, "the helper program '%s' was not found", > pmutils_bin); > +return; > +} > + > +pid = fork(); > +if (!pid) { > +setsid(); > +execle(pmutils_path, pmutils_bin, NULL, environ); > +/* > + * If we get here execle() has failed. 
> + */ > +_exit(EXIT_FAILURE); > +} else if (pid < 0) { > +error_setg_errno(errp, errno, "failed to create child process"); > +goto out; > +} > + > +ga_wait_child(pid, &status, &local_err); > +if (local_err) { > +error_propagate(errp, local_err); > +goto out; > +} > + > +if (WEXITSTATUS(status)) { > +error_setg(errp, > + "the helper program '%s' returned an unexpected exit > status" > + " code (%d)", pmutils_path, WEXITSTATUS(status)); > +} > + > +out: > +g_free(pmutils_path); > +} > + > static bool linux_sys_state_supports_mode(int suspend_mode, Error **errp) > { > const char *sysfile_str; > @@ -1545,64 +1604,28 @@ static bool linux_sys_state_supports_mode(int > suspend_mode, Error **errp) > return false; > } > > -static void bios_supports_mode(int suspend_mode, Error **errp) > -{ > -Error *local_err = NULL; > -bool ret; > - > -ret = pmutils_supports_mode(suspend_mode, &local_err); > -if (ret) { > -return; > -} > -if (local_err) { > -error_propagate(errp, local_err); > -return; > -} > -ret = linux_sys_state_supports_mode(suspend_mode, errp); > -if (!ret) { > -error_setg(errp, > - "the requested suspend mode is not supported by the > guest"); > -return; > -} > -} > - > -static void guest_suspend(int suspend_mode, Error **errp) > +static void linux_sys_state_suspend(int suspend_mode, Error **errp) > { > Error *local_err = NULL; > -const char *pmutils_bin, *sysfile_str; > -char *pmutils_path; > +const char *sysfile_str; > pid_t pid; > int status; > > -bios_supports_mode(suspend_mode, &local_err); > -if (local_err) { > -error_propagate(errp, local_err); > -return; > -} > - > switch (suspend_mode) { > > case SUSPEND_MODE_DISK: > -pmutils_bin = "pm-hibernate"; > sysfile_str = "disk"; > break; > case SUSPEND_MODE_RAM: > -pmutils_bin = "pm-suspend"; > sysfile_str = "mem"; > break; > -case SUSPEND_MODE_HYBRID: > -pmutils_bin = "pm-suspend-hybrid"; > -sysfile_str = NULL; > -break; > default: > error_setg(errp, "unknown guest suspend mode"); > return; > } > > -pmutils_path =
g_find_program_in_path(pmutils_bin); > - > pid = fork(); > -if (pid == 0) { > +if (!pid) { > /* child */ > int fd; > > @@ -1611,19 +1634,6 @@ static void guest_suspend(int suspend_mode, Error > **errp) > reopen_fd_to_null(1); > reopen_fd_to_null(2); > > -if (pmutils_path) { > -execle(pmutils_path, pmutils_bin, NULL, environ); > -} > - > -/* > - * If we get here either pm-utils is not installed or execle() has > - * failed. Let's try the manual method if the caller wants it. > - */ > - > -if (!sysfile_str) { > -_exit(EXIT_FAILURE); > -} > - > fd = open(LINUX_SYS_STATE_FILE, O_WRONLY); > if (fd < 0) { > _exit(EXIT_FAILURE); > @@ -1636,27 +1646,63 @@ static void guest_suspend(int suspend_mode, Error > **errp) > _exit(EXIT_SUCCESS); > } else if (pid < 0) {
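The sysfs path in linux_sys_state_suspend() comes down to writing a mode string to /sys/power/state. Here is a minimal sketch of that mapping and write, assuming the SuspendMode enum that patch 4/6 introduces. sys_state_str() and sys_state_write() are hypothetical names. Unlike the patch, this sketch does not fork a child or redirect its standard descriptors to /dev/null.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

typedef enum {
    SUSPEND_MODE_DISK = 0,
    SUSPEND_MODE_RAM = 1,
    SUSPEND_MODE_HYBRID = 2,
} SuspendMode;

/* String the kernel expects in /sys/power/state, or NULL when the
 * mode has no sysfs equivalent (hybrid sleep). */
static const char *sys_state_str(SuspendMode mode)
{
    switch (mode) {
    case SUSPEND_MODE_DISK:
        return "disk";
    case SUSPEND_MODE_RAM:
        return "mem";
    default:
        return NULL;
    }
}

/* Write the mode string to 'path' (normally "/sys/power/state").
 * Returns 0 on success, -1 on failure or unsupported mode. */
static int sys_state_write(const char *path, SuspendMode mode)
{
    const char *str = sys_state_str(mode);
    ssize_t len;
    int fd;

    assert(path != NULL);
    if (!str) {
        return -1;
    }
    fd = open(path, O_WRONLY);
    if (fd < 0) {
        return -1;
    }
    len = write(fd, str, strlen(str));
    close(fd);
    return len == (ssize_t)strlen(str) ? 0 : -1;
}
```

Taking the path as a parameter lets the sketch be exercised against /dev/null instead of the real sysfs file.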
Re: [Qemu-devel] [PATCH] [RFC v2] aio: properly bubble up errors from initialization
On 19.06.2018 [15:35:57 -0700], Nishanth Aravamudan wrote: > On 19.06.2018 [13:14:51 -0700], Nishanth Aravamudan wrote: > > On 19.06.2018 [14:35:33 -0500], Eric Blake wrote: > > > On 06/15/2018 12:47 PM, Nishanth Aravamudan via Qemu-devel wrote: > > > > > > > } else if (s->use_linux_aio) { > > > > +int rc; > > > > +rc = aio_setup_linux_aio(bdrv_get_aio_context(bs)); > > > > +if (rc != 0) { > > > > +error_report("Unable to use native AIO, falling back > > > > to " > > > > + "thread pool."); > > > > > > In general, error_report() should not output a trailing '.'. > > > > Will fix. > > > > > > +s->use_linux_aio = 0; > > > > +return rc; > > > > > > Wait - the message claims we are falling back, but the non-zero return > > > code > > > sounds like we are returning an error instead of falling back. (My > > > preference - if the user requested something and we can't do it, it's > > > better > > > to error than to fall back to something that does not match the user's > > > request). > > > > I think that makes sense, I hadn't tested this specific case (in my > > reading of the code, it wasn't clear to me if raw_co_prw() could be > > called before raw_aio_plug() had been called, but I think returning the > > error code up should be handled correctly. What about the cases where > > there is no error handling (the other two changes in the patch)? > > While looking at doing these changes, I realized that I'm not quite sure > what the right approach is here. My original rationale for returning > non-zero was that AIO was requested but could not be completed. I > haven't fully tracked back the calling paths, but I assumed it would get > retried at the top level, and since we indicated to not use AIO on > subsequent calls, it will succeed and use threads then (note, that I do > now realize this means a mismatch between the qemu command-line and the > in-use AIO model). 
> > In practice, with my v2 patch, where I do return a non-zero error-code > from this function, qemu does not exit (nor is any logging other than > that I added emitted on the monitor). If I do not fall back, I imagine we > would just continuously see this error message and IO might not actually > ever occur? Reworking all of the call path to fail on non-zero returns > from raw_co_prw() seems like a fair bit of work, but if that is what is > being requested, I can try that (it will just take a while). > Alternatively, I can produce a v3 quickly that does not bubble the > actual errno all the way up (since it does seem like it is ignored > anyway?). Sorry for the noise, but I had one more thought. Would it be appropriate to push the _setup() call up to where we parse the aio=native argument? E.g., we already check there if cache=directsync is specified and error out if not. We could, in theory, also call laio_init() there (via the new function) and error out to the CLI if that fails. Then the runtime paths could simply use the context that was set up earlier. I would still need to verify that laio_cleanup() happens correctly. Thanks, Nish
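The fail-early alternative raised above might look roughly like the following. Native AIO is set up once, while the open options are parsed. If setup fails, the error goes back to the caller instead of triggering a fallback to the thread pool on the I/O path. RawStateSketch, raw_open_aio() and the injected setup function are hypothetical stand-ins for BDRVRawState and the new aio_setup_linux_aio(). -EAGAIN is only an example error code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical per-image state, loosely modelled on BDRVRawState. */
typedef struct {
    bool use_linux_aio;
} RawStateSketch;

/* Hypothetical setup hook: returns 0 or a negative errno value. */
typedef int (*AioSetupFn)(void);

/*
 * Called once while the open options are parsed. If aio=native was
 * requested but cannot be set up, fail the open instead of silently
 * falling back to the thread pool later on the I/O path.
 */
static int raw_open_aio(RawStateSketch *s, bool aio_native, AioSetupFn setup)
{
    int rc;

    assert(s != NULL);
    s->use_linux_aio = false;
    if (!aio_native) {
        return 0;                     /* thread pool was requested */
    }
    rc = setup();
    if (rc < 0) {
        /* No trailing '.', as requested for error messages in review. */
        fprintf(stderr, "Unable to use native AIO: error %d\n", -rc);
        return rc;
    }
    s->use_linux_aio = true;
    return 0;
}

/* Stub setup functions used to exercise both outcomes. */
static int aio_setup_ok(void) { return 0; }
static int aio_setup_fails(void) { return -EAGAIN; }
```

Failing at open time means the user sees the problem once, at startup, and the configured AIO model always matches the command line.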
Re: [Qemu-devel] [PATCH] [RFC v2] aio: properly bubble up errors from initialization
On 19.06.2018 [13:14:51 -0700], Nishanth Aravamudan wrote: > On 19.06.2018 [14:35:33 -0500], Eric Blake wrote: > > On 06/15/2018 12:47 PM, Nishanth Aravamudan via Qemu-devel wrote: > > > } else if (s->use_linux_aio) { > > > +int rc; > > > +rc = aio_setup_linux_aio(bdrv_get_aio_context(bs)); > > > +if (rc != 0) { > > > +error_report("Unable to use native AIO, falling back to " > > > + "thread pool."); > > > > In general, error_report() should not output a trailing '.'. > > Will fix. > > > > +s->use_linux_aio = 0; > > > +return rc; > > > > Wait - the message claims we are falling back, but the non-zero return code > > sounds like we are returning an error instead of falling back. (My > > preference - if the user requested something and we can't do it, it's better > > to error than to fall back to something that does not match the user's > > request). > > I think that makes sense, I hadn't tested this specific case (in my > reading of the code, it wasn't clear to me if raw_co_prw() could be > called before raw_aio_plug() had been called, but I think returning the > error code up should be handled correctly. What about the cases where > there is no error handling (the other two changes in the patch)? While looking at doing these changes, I realized that I'm not quite sure what the right approach is here. My original rationale for returning non-zero was that AIO was requested but could not be completed. I haven't fully tracked back the calling paths, but I assumed it would get retried at the top level, and since we indicated to not use AIO on subsequent calls, it will succeed and use threads then (note, that I do now realize this means a mismatch between the qemu command-line and the in-use AIO model). In practice, with my v2 patch, where I do return a non-zero error-code from this function, qemu does not exit (nor is any logging other than that I added emitted on the monitor). 
If I do not fall back, I imagine we would just continuously see this error message and IO might not actually ever occur? Reworking all of the call path to fail on non-zero returns from raw_co_prw() seems like a fair bit of work, but if that is what is being requested, I can try that (it will just take a while). Alternatively, I can produce a v3 quickly that does not bubble the actual errno all the way up (since it does seem like it is ignored anyway?). > > > +s->use_linux_aio = 0; > > > > Should s->use_linux_aio be a bool instead of an int? > > It is: > > bool use_linux_aio:1; > > would you prefer I did a preparatory patch that converted users to true/false? Sorry, I misunderstood this -- only my patch does an assignment, so I'll switch to 'false'. Thanks, Nish
[Qemu-devel] [Bug 1776920] Re: qemu-img convert on Mac OSX creates corrupt images
Have I provided all the necessary data and other details? -- You received this bug notification because you are a member of qemu-devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1776920 Title: qemu-img convert on Mac OSX creates corrupt images Status in QEMU: New Bug description: An image created by qemu-img create, then modified by another program, is converted into a bad/corrupt image when using the convert subcommand on Mac OSX. The same conversion works on Linux. The version of qemu-img is 2.12. To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/1776920/+subscriptions
Re: [Qemu-devel] [PATCH 2/6] nbd: allow authorization with nbd-server-start QMP command
On Tue, Jun 19, 2018 at 03:10:12PM -0500, Eric Blake wrote: > On 06/15/2018 10:50 AM, Daniel P. Berrangé wrote: > > From: "Daniel P. Berrange" > > > > As with the previous patch to qemu-nbd, the nbd-server-start QMP command > > also needs to be able to specify authorization when enabling TLS encryption. > > > > First the client must create a QAuthZ object instance using the > > 'object-add' command: > > > > { > > 'execute': 'object-add', > > 'arguments': { > > 'qom-type': 'authz-simple', > > 'id': 'authz0', > > 'parameters': { > > 'policy': 'deny', > > 'rules': [ > > { > > 'match': '*CN=fred', > > 'policy': 'allow' > > } > > ] > > } > > } > > } > > > > They can then reference this in the new 'tls-authz' parameter when > > executing the 'nbd-server-start' command: > > > > { > > 'execute': 'nbd-server-start', > > 'arguments': { > > 'addr': { > > 'type': 'inet', > > 'host': '127.0.0.1', > > 'port': '9000' > > }, > > 'tls-creds': 'tls0', > > 'tls-authz': 'authz0' > > } > > } > > Is it worth using a discriminated union (string vs. QAuthZ) so that one > could specify the authz policy inline rather than as a separate object, for > convenience? But that would be fine as a followup patch, if we even want > it. QAuthZ isn't a QAPI type - it's a QOM object interface, so you'd have to allow the entire object_add arg set inline, and then validate after the fact that the QOM type you received actually implements the interface. Also, for migration at least, it is likely the single authz impl will be shared by both the migration and NBD services. So I think it's cleaner just to keep it separate, to avoid having 2 distinct codepaths for handling the same thing. Regards, Daniel -- |: https://berrange.com -o-https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o-https://fstop138.berrange.com :| |: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|
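As an illustration of the decision the 'authz-simple' object in the example is expected to make, here is a sketch of first-match rule evaluation with glob patterns and a default policy. It assumes glob semantics via POSIX fnmatch(), and it assumes the identity being checked is the client certificate's distinguished name. Rule, Policy and authz_is_allowed() are hypothetical. The real QAuthZ implementation may differ in details such as the supported match formats.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fnmatch.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { POLICY_DENY, POLICY_ALLOW } Policy;

typedef struct {
    const char *match;   /* glob pattern applied to the identity */
    Policy policy;       /* verdict when the pattern matches */
} Rule;

/* First matching rule wins; otherwise the default policy applies. */
static bool authz_is_allowed(const char *identity, Policy default_policy,
                             const Rule *rules, size_t nrules)
{
    assert(identity != NULL);
    for (size_t i = 0; i < nrules; i++) {
        if (fnmatch(rules[i].match, identity, 0) == 0) {
            return rules[i].policy == POLICY_ALLOW;
        }
    }
    return default_policy == POLICY_ALLOW;
}
```

With the rule list from the example (default 'deny', one 'allow' rule matching '*CN=fred'), only identities ending in CN=fred are admitted.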
Re: [Qemu-devel] [PATCH v3 1/2] kvm: support -dedicated cpu-pm=on|off
On 06/19/2018 10:17 AM, Paolo Bonzini wrote:
> On 16/06/2018 00:29, Michael S. Tsirkin wrote:
>> +static QemuOptsList qemu_dedicated_opts = {
>> +    .name = "dedicated",
>> +    .head = QTAILQ_HEAD_INITIALIZER(qemu_dedicated_opts.head),
>> +    .desc = {
>> +        {
>> +            .name = "mem-lock",
>> +            .type = QEMU_OPT_BOOL,
>> +        },
>> +        {
>> +            .name = "cpu-pm",
>> +            .type = QEMU_OPT_BOOL,
>> +        },
>> +        { /* end of list */ }
>> +    },
>> +};
>> +
>
> Let the bikeshedding begin!
>
> 1) Should we deprecate -realtime?
>
> 2) Maybe -hostresource?

What further things might we add in the future?

-dedicated sounds wrong (it is an adjective, while most of our options are
nouns - think -machine, -drive, -object, ...). -hostresource at least sounds
like a noun, but is long to type. But at least '-hostresource cpu-pm=on'
reads reasonably well.

About the only other noun I could think of would be '-feature cpu-pm=on'.

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org
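For readers unfamiliar with QemuOptsList: the table above declares two boolean suboptions, so an option such as `-dedicated cpu-pm=on,mem-lock=off` is a comma-separated list of key=value pairs checked against it. The sketch below is a hypothetical standalone parser for that syntax, not QEMU's qemu_opts_parse(). For simplicity it accepts only "on"/"off".

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical parsed form of "-dedicated mem-lock=on|off,cpu-pm=on|off". */
struct dedicated_opts {
    bool mem_lock;
    bool cpu_pm;
};

/* Accept only "on"/"off" in this sketch. */
static bool parse_onoff(const char *val, bool *out)
{
    if (strcmp(val, "on") == 0) {
        *out = true;
        return true;
    }
    if (strcmp(val, "off") == 0) {
        *out = false;
        return true;
    }
    return false;
}

/* Returns 0 on success, -1 on a malformed pair, unknown key or bad value. */
static int parse_dedicated(const char *arg, struct dedicated_opts *opts)
{
    char buf[128];
    char *saveptr = NULL;

    assert(arg != NULL && opts != NULL);
    if (strlen(arg) >= sizeof(buf)) {
        return -1;
    }
    strcpy(buf, arg);

    for (char *tok = strtok_r(buf, ",", &saveptr); tok != NULL;
         tok = strtok_r(NULL, ",", &saveptr)) {
        char *eq = strchr(tok, '=');
        bool *field;

        if (eq == NULL) {
            return -1;
        }
        *eq = '\0';                       /* split "key=value" in place */
        if (strcmp(tok, "mem-lock") == 0) {
            field = &opts->mem_lock;
        } else if (strcmp(tok, "cpu-pm") == 0) {
            field = &opts->cpu_pm;
        } else {
            return -1;                    /* key not in the desc[] table */
        }
        if (!parse_onoff(eq + 1, field)) {
            return -1;
        }
    }
    return 0;
}
```

Whatever the option ends up being called, only the name in `.name` changes; the suboption table and its parsing stay the same.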
Re: [Qemu-devel] [PATCH v1 0/6] QGA: systemd hibernate/suspend/hybrid-sleep
Hi,

This series seems to have some coding style problems. See output below for
more information:

Type: series
Message-id: 20180619193806.17419-1-danielhb...@gmail.com
Subject: [Qemu-devel] [PATCH v1 0/6] QGA: systemd hibernate/suspend/hybrid-sleep

=== TEST SCRIPT BEGIN ===
#!/bin/bash

BASE=base
n=1
total=$(git log --oneline $BASE.. | wc -l)
failed=0

git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram

commits="$(git log --format=%H --reverse $BASE..)"
for c in $commits; do
    echo "Checking PATCH $n/$total: $(git log -n 1 --format=%s $c)..."
    if ! git show $c --format=email | ./scripts/checkpatch.pl --mailback -; then
        failed=1
        echo
    fi
    n=$((n+1))
done

exit $failed
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
Switched to a new branch 'test'
cc696f1009 qga: removing bios_supports_mode
dc712bbec2 qga: adding systemd hibernate/suspend/hybrid-sleep support
3be456b7f1 qga: removing switch statements, adding run_process_child
31cf0e6b15 qga: guest_suspend: decoupling pm-utils and sys logic
314d3a05ca qga: bios_supports_mode: decoupling pm-utils and sys logic
da135a7c9f qga: refactoring qmp_guest_suspend_* functions

=== OUTPUT BEGIN ===
Checking PATCH 1/6: qga: refactoring qmp_guest_suspend_* functions...
Checking PATCH 2/6: qga: bios_supports_mode: decoupling pm-utils and sys logic...
Checking PATCH 3/6: qga: guest_suspend: decoupling pm-utils and sys logic...
Checking PATCH 4/6: qga: removing switch statements, adding run_process_child...
ERROR: "(foo* const*)" should be "(foo * const*)"
#91: FILE: qga/commands-posix.c:1467:
+execve(cmd_path, (char* const*)command, environ);

ERROR: space required before that '*' (ctx:VxB)
#91: FILE: qga/commands-posix.c:1467:
+execve(cmd_path, (char* const*)command, environ);
                              ^

total: 2 errors, 0 warnings, 295 lines checked

Your patch has style problems, please review.
If any of these errors are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

Checking PATCH 5/6: qga: adding systemd hibernate/suspend/hybrid-sleep support...
Checking PATCH 6/6: qga: removing bios_supports_mode...
=== OUTPUT END ===

Test command exited with code: 1

---
Email generated automatically by Patchew [http://patchew.org/].
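Patch 4/6 fails checkpatch only because of the spacing in the cast. Below is a sketch of a fork/execve/waitpid helper that spells the cast `(char * const *)`, which is intended to satisfy both reported errors. The function body is an assumption, modelled only on the name run_process_child() and the execve() call quoted in the report; it is not the code from the series.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/*
 * Hypothetical helper in the spirit of the run_process_child() added by
 * patch 4/6: run command[0] with the given NULL-terminated argv and the
 * current environment, and return its exit status, or -1 on failure.
 */
static int run_process_child(const char *command[])
{
    pid_t pid;
    int status;

    assert(command != NULL && command[0] != NULL);
    pid = fork();
    if (pid < 0) {
        return -1;
    }
    if (pid == 0) {
        /* Spacing intended to satisfy both checkpatch errors. */
        execve(command[0], (char * const *)command, environ);
        _exit(127);    /* only reached if execve() failed */
    }
    if (waitpid(pid, &status, 0) < 0) {
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```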
Re: [Qemu-devel] [PATCH 00/113] Patch Round-up for stable 2.11.2, freeze on 2018-06-22
>>> On 6/18/2018 at 7:41 PM, Michael Roth wrote:
> Hi everyone,
>
> The following new patches are queued for QEMU stable v2.11.2:
>
>   https://github.com/mdroth/qemu/commits/stable-2.11-staging
>
> The release is planned for 2018-06-22:
>
>   https://wiki.qemu.org/Planning/2.11
>
> Please respond here or CC qemu-sta...@nongnu.org on any patches you
> think should be included in the release.
>

For openSUSE Leap 15's qemu package, based on v2.11.1, we also add these patches:

commit bb223055b9b327ec66e1f6d2fbaebaee0b8f3dbe
Author: Christian Borntraeger
Date:   Mon Dec 11 13:21:46 2017 +0100

    s390-ccw-virtio: allow for systems larger that 7.999TB

commit 05b71fb207ab7f016e067bd2a40fc0804362eb74
Author: Marc-André Lureau
Date:   Mon Jan 29 19:33:04 2018 +0100

    tpm: lookup cancel path under tpm device class

Bruce
Re: [Qemu-devel] [Qemu-block] [RFC 1/1] ide: bug #1777315: io_buffer_size and sg.size can represent partial sector sizes
On 06/19/2018 05:26 PM, Amol Surati wrote: > On Tue, Jun 19, 2018 at 08:04:03PM +0530, Amol Surati wrote: >> On Tue, Jun 19, 2018 at 09:45:15AM -0400, John Snow wrote: >>> >>> >>> On 06/19/2018 04:53 AM, Kevin Wolf wrote: Am 19.06.2018 um 06:01 hat Amol Surati geschrieben: > On Mon, Jun 18, 2018 at 08:14:10PM -0400, John Snow wrote: >> >> >> On 06/18/2018 02:02 PM, Amol Surati wrote: >>> On Mon, Jun 18, 2018 at 12:05:15AM +0530, Amol Surati wrote: This patch fixes the assumption that io_buffer_size is always a perfect multiple of the sector size. The assumption is the cause of the firing of 'assert(n * 512 == s->sg.size);'. Signed-off-by: Amol Surati --- >>> >>> The repository https://github.com/asurati/1777315 contains a module for >>> QEMU's 8086:7010 ATA controller, which exercises the code path >>> described in [RFC 0/1] of this series. >>> >> >> Thanks, this made it easier to see what was happening. I was able to >> write an ide-test test case using this source as a guide, and reproduce >> the error. >> >> static void test_bmdma_partial_sector_short_prdt(void) >> { >> QPCIDevice *dev; >> QPCIBar bmdma_bar, ide_bar; >> uint8_t status; >> >> /* Read 2 sectors but only give 1 sector in PRDT */ >> PrdtEntry prdt[] = { >> { >> .addr = 0, >> .size = cpu_to_le32(0x200), >> }, >> { >> .addr = 512, >> .size = cpu_to_le32(0x44 | PRDT_EOT), >> } >> }; >> >> dev = get_pci_device(_bar, _bar); >> status = send_dma_request(CMD_READ_DMA, 0, 2, >> prdt, ARRAY_SIZE(prdt), NULL); >> g_assert_cmphex(status, ==, 0); >> assert_bit_clear(qpci_io_readb(dev, ide_bar, reg_status), DF | ERR); >> free_pci_device(dev); >> } >> >>> Loading the module reproduces the bug. Tested on the latest master >>> branch. >>> >>> Steps: >>> - Install a Linux distribution as a guest, ensuring that the boot disk >>> resides on non-IDE controllers (such as virtio) >>> - Attach another disk as a master device on the primary >>> IDE controller (i.e. attach at -hda.) 
>>> - Blacklist ata_piix, pata_acpi and ata_generic modules, and reboot. >>> - Copy the source files into the guest and build the module. >>> - Load the module. QEMU process should die with the message: >>> qemu-system-x86_64: hw/ide/core.c:871: ide_dma_cb: >>> Assertion `n * 512 == s->sg.size' failed. >>> >>> >>> -Amol >>> >> >> I'm less sure of the fix -- certainly the assert is wrong, but just >> incrementing 'n' is wrong too -- we didn't copy (n+1) sectors, we copied >> (n) and a few extra bytes. > > That is true. > > There are (at least) two fields that represent the total size of a DMA > transfer - > (1) The size, as requested through the NSECTOR field. > (2) The size, as calculated through the length fields of the PRD entries. > > It makes sense to consider the most restrictive of the sizes, as the > factor > which determines both the end of a successful DMA transfer and the > condition to assert. > >> >> The sector-based math here would need to be adjusted to be able to cope >> with partial sector reads... or we ought to avoid doing any partial >> sector transfers. >> >> >> I'm not sure which is more correct tonight, it depends: >> >> - If it's OK to transfer partial sectors before reporting overflow, >> adjusting the command loop to work with partial sectors is OK. >> >> - If it's NOT OK to do partial sector transfer, the sglist preparation >> phase needs to produce a truncated SGList that's some multiple of 512 >> bytes that leaves the excess bytes in a second sglist that we don't >> throw away and can use as a basis for building the next sglist. (Or the >> DMA helpers need to take a max_bytes parameter and return an sglist >> representing unused buffer space if the command underflowed.) > > Support for partial sector transfers is built into the DMA interface's PRD > mechanism itself, because an entry is allowed to transfer in the units of > even number of bytes. 
> > I think the controller's IO process runs in two parts (probably loops over > for a single transfer): > > (1) The controller's disk interface transfers between its internal buffer > and the disk storage. The transfers are likely to be in the > multiples of a sector. > (2) The controller's DMA interface transfers between its internal buffer > and the system memory. The transfers can be sub-sector in size(, and > are preserving of the areas, of the internal
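The arithmetic behind the assertion, worked through for the PRDT in John's test case (an illustrative helper, not hw/ide code): the table describes 0x200 + 0x44 = 580 bytes while the command requests 2 sectors. 580 bytes is one whole sector plus 68 bytes, so `n * 512 == s->sg.size` cannot hold for any whole-sector count n, and bumping n to 2 would claim that 1024 bytes were transferred.

```c
#include <assert.h>
#include <stdint.h>

#define IDE_SECTOR_SIZE 512u

/* Whole sectors plus a sub-sector tail for a DMA transfer of `bytes`. */
struct sector_split {
    uint32_t sectors;
    uint32_t tail_bytes;
};

/*
 * Illustrative helper: a PRD entry may describe any even byte count, so
 * the scatter-gather size need not be a multiple of 512.
 */
static struct sector_split split_dma_bytes(uint32_t bytes)
{
    struct sector_split s = {
        .sectors = bytes / IDE_SECTOR_SIZE,
        .tail_bytes = bytes % IDE_SECTOR_SIZE,
    };

    assert(s.sectors * IDE_SECTOR_SIZE + s.tail_bytes == bytes);
    return s;
}
```

Either of the options discussed above then amounts to deciding what happens to `tail_bytes`: transfer it as a partial sector, or carry it over into the next sglist.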
[Qemu-devel] [PATCH v16 2/3] i386: Enable TOPOEXT feature on AMD EPYC CPU
Enable TOPOEXT feature on EPYC CPU. This is required to support
hyperthreading on VM guests. Also extend xlevel to 0x8000001E.

Disable topoext on PC_COMPAT_2_12 and keep xlevel 0x8000000a.

Signed-off-by: Babu Moger
---
 include/hw/i386/pc.h |  8
 target/i386/cpu.c    | 10 ++
 2 files changed, 14 insertions(+), 4 deletions(-)

diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
index fc8dedc..d0ebeb9 100644
--- a/include/hw/i386/pc.h
+++ b/include/hw/i386/pc.h
@@ -303,6 +303,14 @@ bool e820_get_entry(int, uint32_t, uint64_t *, uint64_t *);
         .driver   = TYPE_X86_CPU,\
         .property = "legacy-cache",\
         .value    = "on",\
+    },{\
+        .driver   = TYPE_X86_CPU,\
+        .property = "topoext",\
+        .value    = "off",\
+    },{\
+        .driver   = "EPYC-" TYPE_X86_CPU,\
+        .property = "xlevel",\
+        .value    = stringify(0x8000000a),\
     },

 #define PC_COMPAT_2_11 \
diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 130391c..d6ed29b 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -2579,7 +2579,8 @@ static X86CPUDefinition builtin_x86_defs[] = {
         .features[FEAT_8000_0001_ECX] =
             CPUID_EXT3_OSVW | CPUID_EXT3_3DNOWPREFETCH |
             CPUID_EXT3_MISALIGNSSE | CPUID_EXT3_SSE4A | CPUID_EXT3_ABM |
-            CPUID_EXT3_CR8LEG | CPUID_EXT3_SVM | CPUID_EXT3_LAHF_LM,
+            CPUID_EXT3_CR8LEG | CPUID_EXT3_SVM | CPUID_EXT3_LAHF_LM |
+            CPUID_EXT3_TOPOEXT,
         .features[FEAT_7_0_EBX] =
             CPUID_7_0_EBX_FSGSBASE | CPUID_7_0_EBX_BMI1 | CPUID_7_0_EBX_AVX2 |
             CPUID_7_0_EBX_SMEP | CPUID_7_0_EBX_BMI2 | CPUID_7_0_EBX_RDSEED |
@@ -2594,7 +2595,7 @@ static X86CPUDefinition builtin_x86_defs[] = {
             CPUID_XSAVE_XGETBV1,
         .features[FEAT_6_EAX] =
             CPUID_6_EAX_ARAT,
-        .xlevel = 0x8000000A,
+        .xlevel = 0x8000001E,
         .model_id = "AMD EPYC Processor",
         .cache_info = _cache_info,
     },
@@ -2624,7 +2625,8 @@ static X86CPUDefinition builtin_x86_defs[] = {
         .features[FEAT_8000_0001_ECX] =
             CPUID_EXT3_OSVW | CPUID_EXT3_3DNOWPREFETCH |
             CPUID_EXT3_MISALIGNSSE | CPUID_EXT3_SSE4A | CPUID_EXT3_ABM |
-            CPUID_EXT3_CR8LEG | CPUID_EXT3_SVM | CPUID_EXT3_LAHF_LM,
+            CPUID_EXT3_CR8LEG | CPUID_EXT3_SVM | CPUID_EXT3_LAHF_LM |
+            CPUID_EXT3_TOPOEXT,
         .features[FEAT_8000_0008_EBX] =
             CPUID_8000_0008_EBX_IBPB,
         .features[FEAT_7_0_EBX] =
@@ -2641,7 +2643,7 @@ static X86CPUDefinition builtin_x86_defs[] = {
             CPUID_XSAVE_XGETBV1,
         .features[FEAT_6_EAX] =
             CPUID_6_EAX_ARAT,
-        .xlevel = 0x8000000A,
+        .xlevel = 0x8000001E,
         .model_id = "AMD EPYC Processor (with IBPB)",
         .cache_info = _cache_info,
     },
--
1.8.3.1
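A small sketch of why the patch changes two things at once: a guest can only read the topology-extension leaf if the TOPOEXT bit (CPUID 0x80000001 ECX bit 22) is set and the maximum extended leaf, `xlevel`, reaches 0x8000001E. With the old xlevel of 0x8000000A, that leaf would be out of range. The helper and macro names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define EXT3_TOPOEXT    (1u << 22)    /* CPUID 0x80000001 ECX bit 22 */
#define LEAF_8000_001E  0x8000001Eu   /* topology extension leaf */

/* Both conditions must hold for the guest to see usable topology data. */
static bool guest_can_use_topoext(uint32_t xlevel, uint32_t ecx_8000_0001)
{
    return (ecx_8000_0001 & EXT3_TOPOEXT) != 0 && xlevel >= LEAF_8000_001E;
}
```

This is also why the PC_COMPAT_2_12 entries turn off topoext and restore the old xlevel together: older machine types keep exposing the same CPUID as before.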
[Qemu-devel] [PATCH v16 1/3] i386: Fix up the Node id for CPUID_8000_001E
This is part of topoext support. To keep compatibility, it is better that
we support all the combinations of nr_cores and nr_threads currently
supported. By allowing more nr_cores and nr_threads, we might end up with
more nodes than we can actually support with the real hardware. We need to
fix up the node id to make this work. We can achieve this by shifting the
socket_id bits left to address more nodes.

Signed-off-by: Babu Moger
---
 target/i386/cpu.c | 26 +-
 1 file changed, 25 insertions(+), 1 deletion(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 7a4484b..130391c 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -19,6 +19,7 @@

 #include "qemu/osdep.h"
 #include "qemu/cutils.h"
+#include "qemu/bitops.h"

 #include "cpu.h"
 #include "exec/exec-all.h"
@@ -472,6 +473,8 @@ static void encode_topo_cpuid8000001e(CPUState *cs, X86CPU *cpu,
                                        uint32_t *ecx, uint32_t *edx)
 {
     struct core_topology topo = {0};
+    unsigned long nodes;
+    int shift;

     build_core_topology(cs->nr_cores, cpu->core_id, &topo);
     *eax = cpu->apic_id;
@@ -504,7 +507,28 @@ static void encode_topo_cpuid8000001e(CPUState *cs, X86CPU *cpu,
      *   2      Socket id
      * 1:0      Node id
      */
-    *ecx = ((topo.num_nodes - 1) << 8) | (cpu->socket_id << 2) | topo.node_id;
+    if (topo.num_nodes <= 4) {
+        *ecx = ((topo.num_nodes - 1) << 8) | (cpu->socket_id << 2) |
+                topo.node_id;
+    } else {
+        /*
+         * Node id fix up. Actual hardware supports up to 4 nodes. But with
+         * more than 32 cores, we may end up with more than 4 nodes.
+         * Node id is a combination of socket id and node id. Only requirement
+         * here is that this number should be unique accross the system.
+         * Shift the socket id to accommodate more nodes. We dont expect both
+         * socket id and node id to be big number at the same time. This is not
+         * an ideal config but we need to to support it. Max nodes we can have
+         * is 32 (255/8) with 8 cores per node and 255 max cores. We only need
+         * 5 bits for nodes. Find the left most set bit to represent the total
+         * number of nodes. find_last_bit returns last set bit(0 based). Left
+         * shift(+1) the socket id to represent all the nodes.
+         */
+        nodes = topo.num_nodes - 1;
+        shift = find_last_bit(&nodes, 8);
+        *ecx = ((topo.num_nodes - 1) << 8) | (cpu->socket_id << (shift + 1)) |
+                topo.node_id;
+    }
     *edx = 0;
 }

--
1.8.3.1
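A standalone restatement of the ECX encoding in this patch, useful for checking the bit layout by hand. `last_set_bit()` is a hypothetical stand-in for QEMU's find_last_bit() on a single word. With 8 nodes, node ids 0..7 need 3 bits, so the socket id moves from bit 2 to bit 3 and every (socket, node) pair still gets a unique value.

```c
#include <assert.h>
#include <stdint.h>

/* 0-based index of the most significant set bit of a non-zero value;
 * plays the role of find_last_bit(&nodes, 8) in the patch. */
static int last_set_bit(unsigned long v)
{
    int bit = -1;

    assert(v != 0);
    while (v != 0) {
        v >>= 1;
        bit++;
    }
    return bit;
}

/* Same expressions as the patch: (num_nodes - 1) from bit 8 upwards,
 * then the socket id, then the node id in the low bits. */
static uint32_t encode_node_ecx(uint32_t num_nodes, uint32_t socket_id,
                                uint32_t node_id)
{
    if (num_nodes <= 4) {
        return ((num_nodes - 1) << 8) | (socket_id << 2) | node_id;
    }
    /* Reserve enough low bits for node ids 0..num_nodes-1. */
    int shift = last_set_bit(num_nodes - 1);
    return ((num_nodes - 1) << 8) | (socket_id << (shift + 1)) | node_id;
}
```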
[Qemu-devel] [PATCH v16 0/3] i386: Enable TOPOEXT to support hyperthreading on AMD CPU
This series enables the TOPOEXT feature for AMD CPUs. This is required to
support hyperthreading on KVM guests.

This addresses the issues reported in these bugs:
https://bugzilla.redhat.com/show_bug.cgi?id=1481253
https://bugs.launchpad.net/qemu/+bug/1703506

v16:
Patches are based off of Eduardo's git://github.com/ehabkost/qemu.git x86-next.
Some of the patches are queued already. Submitting the remaining series.
I will be on vacation for a couple of weeks and wanted to fix one issue before I go.
1. Fixed the bit-shifting issue in patch #1. Added more comments about the change.

v15:
Patches are based off of Eduardo's git://github.com/ehabkost/qemu.git x86-next.
Some of the patches are queued already. Submitting the remaining series.
Summary of changes:
1. Added changes to support all the currently supported nr_cores and nr_threads
   combinations. Fixed up the node id to support this.
2. Removed the topology_supports_topoext function. It is not required anymore,
   as we allow all the combinations to work now.
3. Fixed other feedback from Eduardo for v14.

v14:
Patches are based off of Eduardo's git://github.com/ehabkost/qemu.git x86-next.
Some of the patches are queued already. Submitting the remaining series.
Summary of changes:
1. Always set the TOPOEXT feature in kvm_arch_get_supported_cpuid.
2. Implemented topology_supports_topoext differently. The reason is that we
   would need to disable this feature before x86_cpu_expand_features, but
   nr_cores and nr_threads are not populated at that time; they are populated
   in qemu_init_vcpus.
3. Removed the auto-topoext feature completely. It can cause lots of
   compatibility issues.

v13:
Patches are based off of Eduardo's git://github.com/ehabkost/qemu.git x86-next.
Some of the patches are queued already. Submitting the remaining series.
Summary of changes:
1. Fixed the error format if the topology cannot be supported.
2. Fixed the compatibility issues with old CPU models and new machine types.
   Here is the discussion thread:
   https://lists.gnu.org/archive/html/qemu-devel/2018-06/msg01239.html
3. I am still testing it, but am sending it to get review feedback.

v12:
Patches are based off of Eduardo's git://github.com/ehabkost/qemu.git x86-next.
Some of the patches are queued already. Submitting the remaining series.
Summary of changes:
1. Added more comments explaining the CPUID_Fn8000001E bit definitions.
2. Split the topology check into a separate patch. Moved the code to
   x86_cpu_realizefn. Display an error if the topoext feature cannot be enabled.
3. A few more text corrections.

v11:
Patches are based off of Eduardo's git://github.com/ehabkost/qemu.git x86-next.
Summary of changes:
1. Added more comments explaining different constants and variables.
2. Removed the NUM_SHARING_CACHE macro and made the code simpler.
3. Renamed num_sharing_l3_cache to cores_in_core_complex. This function
   actually finds the number of cores in a core complex. The purpose is to
   reuse the code in a couple more places.
4. Added a new function nodes_in_socket to find the number of nodes in the
   config. The purpose is to reuse the code.
5. Used DIV_ROUND_UP wherever applicable.
6. Renamed a few constants and functions to generic names.
7. A few more text corrections.

v10:
Based the patches on Eduardo's git://github.com/ehabkost/qemu.git x86-next.
Some of the earlier patches are already queued, so I am submitting the rest
of the series here.

This series adds a complete redesign of the CPU topology. Based on the
user-given parameters, we try to build a topology very close to the
hardware, maintaining symmetry as much as possible. Added a new function
epyc_build_topology to build the topology based on the user-given nr_cores
and nr_threads.
Summary of changes:
1. Build the topology dynamically based on nr_cores and nr_threads.
2. Added new epyc_build_topology to build the new topology.
3. Added new function num_sharing_l3_cache to calculate the L3 sharing.
4. Added a check to verify the topology. Disabled TOPOEXT if the topology
   cannot be built.

v9:
Based the patches on Eduardo's git://github.com/ehabkost/qemu.git x86-next tree.
The following 3 patches from v8 are already queued:
  i386: Add cache information in X86CPUDefinition
  i386: Initialize cache information for EPYC family processors
  i386: Helpers to encode cache information consistently
So I am submitting the rest of the series here.
Changes:
1. Included Eduardo's clean-up patch.
2. Added 2.13 machine types.
3. Disabled topoext for 2.12 and below versions.
4. Added the assert to core_id as discussed.

v8:
Addressed feedback from Eduardo. Thanks Eduardo for being patient with me.
Tested on an AMD EPYC server and also did some basic testing on an Intel box.
Summary of changes:
1. Reverted the L2 cache associativity change. Kept it the same as legacy.
2. Changed the cache_info structure in X86CPUDefinition and CPUX86State to pointers.
3. Added the legacy_cache property in PC_COMPAT_2_12
[Qemu-devel] [PATCH v16 3/3] i386: Remove generic SMT thread check
Remove the generic non-Intel check while validating hyperthreading support.
Certain AMD CPUs can support hyperthreading now: CPU families with the
TOPOEXT feature can support hyperthreading.

Signed-off-by: Babu Moger
Tested-by: Geoffrey McRae
Reviewed-by: Eduardo Habkost
---
 target/i386/cpu.c | 17 +++--
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index d6ed29b..e6c2f8a 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -4985,17 +4985,22 @@ static void x86_cpu_realizefn(DeviceState *dev, Error **errp)

     qemu_init_vcpu(cs);

-    /* Only Intel CPUs support hyperthreading. Even though QEMU fixes this
-     * issue by adjusting CPUID_0000_0001_EBX and CPUID_8000_0008_ECX
-     * based on inputs (sockets,cores,threads), it is still better to gives
+    /*
+     * Most Intel and certain AMD CPUs support hyperthreading. Even though QEMU
+     * fixes this issue by adjusting CPUID_0000_0001_EBX and CPUID_8000_0008_ECX
+     * based on inputs (sockets,cores,threads), it is still better to give
      * users a warning.
      *
      * NOTE: the following code has to follow qemu_init_vcpu(). Otherwise
      * cs->nr_threads hasn't be populated yet and the checking is incorrect.
      */
-    if (!IS_INTEL_CPU(env) && cs->nr_threads > 1 && !ht_warned) {
-        error_report("AMD CPU doesn't support hyperthreading. Please configure"
-                     " -smp options properly.");
+    if (IS_AMD_CPU(env) &&
+        !(env->features[FEAT_8000_0001_ECX] & CPUID_EXT3_TOPOEXT) &&
+        cs->nr_threads > 1 && !ht_warned) {
+            error_report("This family of AMD CPU doesn't support "
+                         "hyperthreading(%d). Please configure -smp "
+                         "options properly or try enabling topoext feature.",
+                         cs->nr_threads);
         ht_warned = true;
     }

--
1.8.3.1
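The new warning condition, isolated as a pure function so the interesting cases can be checked. This is a sketch: the one-shot `ht_warned` flag is omitted, the function name is hypothetical, and the TOPOEXT value is CPUID 0x80000001 ECX bit 22 as used by CPUID_EXT3_TOPOEXT. Unlike the old `!IS_INTEL_CPU()` test, CPUs from vendors other than AMD no longer trigger the warning at all.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define EXT3_TOPOEXT (1u << 22)   /* CPUID 0x80000001 ECX bit 22 */

/* Warn only for AMD models that lack TOPOEXT yet were given SMT. */
static bool should_warn_smt(bool is_amd, uint32_t ecx_8000_0001,
                            int nr_threads)
{
    return is_amd && (ecx_8000_0001 & EXT3_TOPOEXT) == 0 && nr_threads > 1;
}
```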
Re: [Qemu-devel] [Qemu-block] [RFC 1/1] ide: bug #1777315: io_buffer_size and sg.size can represent partial sector sizes
On Tue, Jun 19, 2018 at 08:04:03PM +0530, Amol Surati wrote: > On Tue, Jun 19, 2018 at 09:45:15AM -0400, John Snow wrote: > > > > > > On 06/19/2018 04:53 AM, Kevin Wolf wrote: > > > Am 19.06.2018 um 06:01 hat Amol Surati geschrieben: > > >> On Mon, Jun 18, 2018 at 08:14:10PM -0400, John Snow wrote: > > >>> > > >>> > > >>> On 06/18/2018 02:02 PM, Amol Surati wrote: > > On Mon, Jun 18, 2018 at 12:05:15AM +0530, Amol Surati wrote: > > > This patch fixes the assumption that io_buffer_size is always a > > > perfect > > > multiple of the sector size. The assumption is the cause of the firing > > > of 'assert(n * 512 == s->sg.size);'. > > > > > > Signed-off-by: Amol Surati > > > --- > > > > The repository https://github.com/asurati/1777315 contains a module for > > QEMU's 8086:7010 ATA controller, which exercises the code path > > described in [RFC 0/1] of this series. > > > > >>> > > >>> Thanks, this made it easier to see what was happening. I was able to > > >>> write an ide-test test case using this source as a guide, and reproduce > > >>> the error. > > >>> > > >>> static void test_bmdma_partial_sector_short_prdt(void) > > >>> { > > >>> QPCIDevice *dev; > > >>> QPCIBar bmdma_bar, ide_bar; > > >>> uint8_t status; > > >>> > > >>> /* Read 2 sectors but only give 1 sector in PRDT */ > > >>> PrdtEntry prdt[] = { > > >>> { > > >>> .addr = 0, > > >>> .size = cpu_to_le32(0x200), > > >>> }, > > >>> { > > >>> .addr = 512, > > >>> .size = cpu_to_le32(0x44 | PRDT_EOT), > > >>> } > > >>> }; > > >>> > > >>> dev = get_pci_device(_bar, _bar); > > >>> status = send_dma_request(CMD_READ_DMA, 0, 2, > > >>> prdt, ARRAY_SIZE(prdt), NULL); > > >>> g_assert_cmphex(status, ==, 0); > > >>> assert_bit_clear(qpci_io_readb(dev, ide_bar, reg_status), DF | ERR); > > >>> free_pci_device(dev); > > >>> } > > >>> > > Loading the module reproduces the bug. Tested on the latest master > > branch. 
> > > > Steps: > > - Install a Linux distribution as a guest, ensuring that the boot disk > > resides on non-IDE controllers (such as virtio) > > - Attach another disk as a master device on the primary > > IDE controller (i.e. attach at -hda.) > > - Blacklist ata_piix, pata_acpi and ata_generic modules, and reboot. > > - Copy the source files into the guest and build the module. > > - Load the module. QEMU process should die with the message: > > qemu-system-x86_64: hw/ide/core.c:871: ide_dma_cb: > > Assertion `n * 512 == s->sg.size' failed. > > > > > > -Amol > > > > >>> > > >>> I'm less sure of the fix -- certainly the assert is wrong, but just > > >>> incrementing 'n' is wrong too -- we didn't copy (n+1) sectors, we copied > > >>> (n) and a few extra bytes. > > >> > > >> That is true. > > >> > > >> There are (at least) two fields that represent the total size of a DMA > > >> transfer - > > >> (1) The size, as requested through the NSECTOR field. > > >> (2) The size, as calculated through the length fields of the PRD entries. > > >> > > >> It makes sense to consider the most restrictive of the sizes, as the > > >> factor > > >> which determines both the end of a successful DMA transfer and the > > >> condition to assert. > > >> > > >>> > > >>> The sector-based math here would need to be adjusted to be able to cope > > >>> with partial sector reads... or we ought to avoid doing any partial > > >>> sector transfers. > > >>> > > >>> > > >>> I'm not sure which is more correct tonight, it depends: > > >>> > > >>> - If it's OK to transfer partial sectors before reporting overflow, > > >>> adjusting the command loop to work with partial sectors is OK. > > >>> > > >>> - If it's NOT OK to do partial sector transfer, the sglist preparation > > >>> phase needs to produce a truncated SGList that's some multiple of 512 > > >>> bytes that leaves the excess bytes in a second sglist that we don't > > >>> throw away and can use as a basis for building the next sglist. 
(Or the > > >>> DMA helpers need to take a max_bytes parameter and return an sglist > > >>> representing unused buffer space if the command underflowed.) > > >> > > >> Support for partial sector transfers is built into the DMA interface's > > >> PRD > > >> mechanism itself, because an entry is allowed to transfer in the units of > > >> even number of bytes. > > >> > > >> I think the controller's IO process runs in two parts (probably loops > > >> over > > >> for a single transfer): > > >> > > >> (1) The controller's disk interface transfers between its internal buffer > > >> and the disk storage. The transfers are likely to be in the > > >> multiples of a sector. > > >> (2) The controller's DMA interface transfers between its internal buffer > > >> and the system memory. The transfers
Re: [Qemu-devel] [PATCH] tests: Simplify .gitignore
On 06/19/2018 05:39 PM, Eric Blake wrote: > Commit 0bcc8e5b was yet another instance of 'git status' reporting > dirty files after an in-tree build, thanks to the new binary > tests/check-block-qdict. > > Instead of piecemeal exemptions of each new binary as they are > added, let's use git's negative globbing feature to exempt ALL > files that have a 'test-' or 'check-' prefix, except for the ones > ending in '.c' or '.sh'. We still have a couple of generated > files that then need (re-)exclusion, but the overall list is a > LOT shorter, and less prone to needing future edits. Finally :) > > Signed-off-by: Eric Blake Reviewed-by: Philippe Mathieu-Daudé > --- > tests/.gitignore | 93 > +++- > 1 file changed, 5 insertions(+), 88 deletions(-) > > diff --git a/tests/.gitignore b/tests/.gitignore > index 2bc61a9a58d..08e2df1ce1f 100644 > --- a/tests/.gitignore > +++ b/tests/.gitignore > @@ -2,101 +2,18 @@ atomic_add-bench > benchmark-crypto-cipher > benchmark-crypto-hash > benchmark-crypto-hmac > -check-qdict > -check-qnum > -check-qjson > -check-qlist > -check-qlit > -check-qnull > -check-qobject > -check-qstring > -check-qom-interface > -check-qom-proplist > +check-* > +!check-*.c > +!check-*.sh > qht-bench > rcutorture > -test-aio > -test-aio-multithread > -test-arm-mptimer > -test-base64 > -test-bdrv-drain > -test-bitops > -test-bitcnt > -test-block-backend > -test-blockjob > -test-blockjob-txn > -test-bufferiszero > -test-char > -test-clone-visitor > -test-coroutine > -test-crypto-afsplit > -test-crypto-block > -test-crypto-cipher > -test-crypto-hash > -test-crypto-hmac > -test-crypto-ivgen > -test-crypto-pbkdf > -test-crypto-secret > -test-crypto-tlscredsx509 > -test-crypto-tlscredsx509-work/ > -test-crypto-tlscredsx509-certs/ > -test-crypto-tlssession > -test-crypto-tlssession-work/ > -test-crypto-tlssession-client/ > -test-crypto-tlssession-server/ > -test-crypto-xts > -test-cutils > -test-hbitmap > -test-hmp > -test-int128 > -test-iov > -test-io-channel-buffer > 
-test-io-channel-command > -test-io-channel-command.fifo > -test-io-channel-file > -test-io-channel-file.txt > -test-io-channel-socket > -test-io-channel-tls > -test-io-task > -test-keyval > -test-logging > -test-mul64 > -test-opts-visitor > +test-* > +!test-*.c > test-qapi-commands.[ch] > test-qapi-events.[ch] > test-qapi-types.[ch] > -test-qapi-util > test-qapi-visit.[ch] > -test-qdev-global-props > -test-qemu-opts > -test-qdist > -test-qga > -test-qht > -test-qht-par > -test-qmp-cmds > -test-qmp-event > -test-qobject-input-strict > -test-qobject-input-visitor > test-qapi-introspect.[ch] > -test-qobject-output-visitor > -test-rcu-list > -test-replication > -test-shift128 > -test-string-input-visitor > -test-string-output-visitor > -test-thread-pool > -test-throttle > -test-timed-average > -test-uuid > -test-util-sockets > -test-visitor-serialization > -test-vmstate > -test-write-threshold > -test-x86-cpuid > -test-x86-cpuid-compat > -test-xbzrle > -test-netfilter > -test-filter-mirror > -test-filter-redirector > *-test > qapi-schema/*.test.* > vm/*.img >
Re: [Qemu-devel] [PATCH v3 1/2] kvm: support -dedicated cpu-pm=on|off
On Tue, Jun 19, 2018 at 05:17:45PM +0200, Paolo Bonzini wrote:
> On 16/06/2018 00:29, Michael S. Tsirkin wrote:
> >
> > +static QemuOptsList qemu_dedicated_opts = {
> > +    .name = "dedicated",
> > +    .head = QTAILQ_HEAD_INITIALIZER(qemu_dedicated_opts.head),
> > +    .desc = {
> > +        {
> > +            .name = "mem-lock",
> > +            .type = QEMU_OPT_BOOL,
> > +        },
> > +        {
> > +            .name = "cpu-pm",
> > +            .type = QEMU_OPT_BOOL,
> > +        },
> > +        { /* end of list */ }
> > +    },
> > +};
> > +
>
> Let the bikeshedding begin!
>
> 1) Should we deprecate -realtime?

Can be a patch on top, by whoever cares.

> 2) Maybe -hostresource?
>
> Paolo

Is the ability to cause high latency for other threads really a resource?

The issues in question:
1. A malicious guest can cause high latency for others sharing the host CPU.
2. To the host scheduler, the CPU looks busier than it really is.

Both are avoided if you use a dedicated host CPU, and 2 will help the
scheduler get closer to giving you one.

--
MST
[Qemu-devel] [PATCH] target/arm: Set strict alignment for ARMv6-M load/store
Unlike ARMv7-M, ARMv6-M only supports naturally aligned memory accesses
for 16-bit halfword and 32-bit word accesses using the LDR, LDRH, LDRSH,
STR and STRH instructions.

Signed-off-by: Julia Suvorova
---
 target/arm/translate.c | 18 --
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/target/arm/translate.c b/target/arm/translate.c
index b988d379e7..d923cbe98e 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -1100,7 +1100,14 @@ static inline TCGv gen_aa32_addr(DisasContext *s, TCGv_i32 a32, TCGMemOp op)
 static void gen_aa32_ld_i32(DisasContext *s, TCGv_i32 val, TCGv_i32 a32,
                             int index, TCGMemOp opc)
 {
-    TCGv addr = gen_aa32_addr(s, a32, opc);
+    TCGv addr;
+
+    if (arm_dc_feature(s, ARM_FEATURE_M) &&
+        !arm_dc_feature(s, ARM_FEATURE_V7)) {
+        opc |= MO_ALIGN;
+    }
+
+    addr = gen_aa32_addr(s, a32, opc);
     tcg_gen_qemu_ld_i32(val, addr, index, opc);
     tcg_temp_free(addr);
 }
@@ -1108,7 +1115,14 @@ static void gen_aa32_ld_i32(DisasContext *s, TCGv_i32 val, TCGv_i32 a32,
 static void gen_aa32_st_i32(DisasContext *s, TCGv_i32 val, TCGv_i32 a32,
                             int index, TCGMemOp opc)
 {
-    TCGv addr = gen_aa32_addr(s, a32, opc);
+    TCGv addr;
+
+    if (arm_dc_feature(s, ARM_FEATURE_M) &&
+        !arm_dc_feature(s, ARM_FEATURE_V7)) {
+        opc |= MO_ALIGN;
+    }
+
+    addr = gen_aa32_addr(s, a32, opc);
     tcg_gen_qemu_st_i32(val, addr, index, opc);
     tcg_temp_free(addr);
 }
--
2.17.0
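What setting MO_ALIGN asks TCG to enforce, sketched in plain C (not TCG code, hypothetical helper name): an access of size 2^n must start at an address that is a multiple of 2^n. For example, a word load from 0x20000002 is misaligned and faults on ARMv6-M, while an ARMv7-M core can typically perform it as an unaligned access. Byte accesses are always aligned, so OR-ing MO_ALIGN into every 32-bit load/store op is harmless for LDRB/STRB.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Natural alignment check for a power-of-two access size. */
static bool naturally_aligned(uint32_t addr, uint32_t size)
{
    /* size must be a power of two: 1, 2 or 4 for these instructions */
    assert(size != 0 && (size & (size - 1)) == 0);
    return (addr & (size - 1)) == 0;
}
```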
[Qemu-devel] [PATCH] tests: Simplify .gitignore
Commit 0bcc8e5b was yet another instance of 'git status' reporting dirty
files after an in-tree build, thanks to the new binary
tests/check-block-qdict. Instead of piecemeal exemptions of each new
binary as they are added, let's use git's negative globbing feature to
exempt ALL files that have a 'test-' or 'check-' prefix, except for the
ones ending in '.c' or '.sh'. We still have a couple of generated files
that then need (re-)exclusion, but the overall list is a LOT shorter,
and less prone to needing future edits.

Signed-off-by: Eric Blake
---
 tests/.gitignore | 93 +++-
 1 file changed, 5 insertions(+), 88 deletions(-)

diff --git a/tests/.gitignore b/tests/.gitignore
index 2bc61a9a58d..08e2df1ce1f 100644
--- a/tests/.gitignore
+++ b/tests/.gitignore
@@ -2,101 +2,18 @@ atomic_add-bench
 benchmark-crypto-cipher
 benchmark-crypto-hash
 benchmark-crypto-hmac
-check-qdict
-check-qnum
-check-qjson
-check-qlist
-check-qlit
-check-qnull
-check-qobject
-check-qstring
-check-qom-interface
-check-qom-proplist
+check-*
+!check-*.c
+!check-*.sh
 qht-bench
 rcutorture
-test-aio
-test-aio-multithread
-test-arm-mptimer
-test-base64
-test-bdrv-drain
-test-bitops
-test-bitcnt
-test-block-backend
-test-blockjob
-test-blockjob-txn
-test-bufferiszero
-test-char
-test-clone-visitor
-test-coroutine
-test-crypto-afsplit
-test-crypto-block
-test-crypto-cipher
-test-crypto-hash
-test-crypto-hmac
-test-crypto-ivgen
-test-crypto-pbkdf
-test-crypto-secret
-test-crypto-tlscredsx509
-test-crypto-tlscredsx509-work/
-test-crypto-tlscredsx509-certs/
-test-crypto-tlssession
-test-crypto-tlssession-work/
-test-crypto-tlssession-client/
-test-crypto-tlssession-server/
-test-crypto-xts
-test-cutils
-test-hbitmap
-test-hmp
-test-int128
-test-iov
-test-io-channel-buffer
-test-io-channel-command
-test-io-channel-command.fifo
-test-io-channel-file
-test-io-channel-file.txt
-test-io-channel-socket
-test-io-channel-tls
-test-io-task
-test-keyval
-test-logging
-test-mul64
-test-opts-visitor
+test-*
+!test-*.c
 test-qapi-commands.[ch]
 test-qapi-events.[ch]
 test-qapi-types.[ch]
-test-qapi-util
 test-qapi-visit.[ch]
-test-qdev-global-props
-test-qemu-opts
-test-qdist
-test-qga
-test-qht
-test-qht-par
-test-qmp-cmds
-test-qmp-event
-test-qobject-input-strict
-test-qobject-input-visitor
 test-qapi-introspect.[ch]
-test-qobject-output-visitor
-test-rcu-list
-test-replication
-test-shift128
-test-string-input-visitor
-test-string-output-visitor
-test-thread-pool
-test-throttle
-test-timed-average
-test-uuid
-test-util-sockets
-test-visitor-serialization
-test-vmstate
-test-write-threshold
-test-x86-cpuid
-test-x86-cpuid-compat
-test-xbzrle
-test-netfilter
-test-filter-mirror
-test-filter-redirector
 *-test
 qapi-schema/*.test.*
 vm/*.img
-- 
2.14.4
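The reason the generated test-qapi-*.[ch] files still need their own entries is git's ordering rule: patterns are applied in file order, the last matching pattern wins, and a leading '!' re-includes a path. So "!test-*.c" un-ignores test-qapi-commands.c, and the later "test-qapi-commands.[ch]" line ignores it again. The C sketch below models only that rule for bare file names, using POSIX fnmatch(). It assumes patterns without '/' and ignores git's directory handling; is_ignored() is a hypothetical helper, not part of git, and the pattern list is an excerpt of the new file.

```c
#include <fnmatch.h>
#include <stdbool.h>
#include <stddef.h>

/* Excerpt of the patched tests/.gitignore, in file order. */
static const char *const patterns[] = {
    "check-*",
    "!check-*.c",
    "!check-*.sh",
    "qht-bench",
    "rcutorture",
    "test-*",
    "!test-*.c",
    "test-qapi-commands.[ch]",
    "test-qapi-events.[ch]",
    "test-qapi-types.[ch]",
    "test-qapi-visit.[ch]",
    "test-qapi-introspect.[ch]",
};

/*
 * Simplified model of git's rule for basename-only patterns:
 * every pattern is tried and the last one that matches decides.
 * A leading '!' turns a match into "not ignored".
 */
static bool is_ignored(const char *name)
{
    bool ignored = false;

    for (size_t i = 0; i < sizeof(patterns) / sizeof(patterns[0]); i++) {
        const char *pat = patterns[i];
        bool negate = (pat[0] == '!');

        if (negate) {
            pat++;              /* match against the text after '!' */
        }
        if (fnmatch(pat, name, 0) == 0) {
            ignored = !negate;  /* later matches override earlier ones */
        }
    }
    return ignored;
}
```

With this list, the new binary check-block-qdict is ignored without a dedicated entry, hand-written sources such as check-qdict.c or test-aio.c stay visible, and the generated test-qapi-commands.c is ignored again by its explicit entry.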
Re: [Qemu-devel] [virtio-dev] Re: [PATCH] qemu: Introduce VIRTIO_NET_F_STANDBY feature bit to virtio_net
On Tue, Jun 19, 2018 at 12:54:53PM +0200, Cornelia Huck wrote:
> Sorry about dragging mainframes into this, but this will only work for
> homogenous device coupling, not for heterogenous. Consider my vfio-pci
> + virtio-net-ccw example again: The guest cannot find out that the two
> belong together by checking some group ID, it has to either use the MAC
> or some needs-to-be-architectured property.
>
> Alternatively, we could propose that mechanism as pci-only, which means
> we can rely on mechanisms that won't necessarily work on non-pci
> transports. (FWIW, I don't see a use case for using vfio-ccw to pass
> through a network card anytime in the near future, due to the nature of
> network cards currently in use on s390.)

That's what it boils down to, yes. If there's need to have this for
non-pci devices, then we should put it in config space.
Cornelia, what do you think?

-- 
MST
[Qemu-devel] [PATCH v2] xilinx_spips: Make dma transactions as per dma_burst_size
Qspi dma has a burst length of 64 bytes, so limit transaction length to
64 max.

Signed-off-by: Sai Pavan Boddu
---
 hw/ssi/xilinx_spips.c         | 20 +---
 include/hw/ssi/xilinx_spips.h |  5 -
 2 files changed, 21 insertions(+), 4 deletions(-)

diff --git a/hw/ssi/xilinx_spips.c b/hw/ssi/xilinx_spips.c
index 03f5fae..0ea57d1 100644
--- a/hw/ssi/xilinx_spips.c
+++ b/hw/ssi/xilinx_spips.c
@@ -851,12 +851,17 @@ static void xlnx_zynqmp_qspips_notify(void *opaque)
         {
             size_t ret;
             uint32_t num;
-            const void *rxd = pop_buf(recv_fifo, 4, );
+            const void *rxd;
+            int len;
+
+            len = recv_fifo->num >= rq->dma_burst_size ? rq->dma_burst_size :
+                                                           recv_fifo->num;
+            rxd = pop_buf(recv_fifo, len, );
 
             memcpy(rq->dma_buf, rxd, num);
 
-            ret = stream_push(rq->dma, rq->dma_buf, 4);
-            assert(ret == 4);
+            ret = stream_push(rq->dma, rq->dma_buf, num);
+            assert(ret == num);
             xlnx_zynqmp_qspips_check_flush(rq);
         }
     }
@@ -1337,6 +1342,9 @@ static void xlnx_zynqmp_qspips_realize(DeviceState *dev, Error **errp)
     fifo8_create(>rx_fifo_g, xsc->rx_fifo_size);
     fifo8_create(>tx_fifo_g, xsc->tx_fifo_size);
     fifo32_create(>fifo_g, 32);
+    if (s->dma_burst_size > QSPI_DMA_MAX_BURST_SIZE) {
+        s->dma_burst_size = QSPI_DMA_MAX_BURST_SIZE;
+    }
 }
 
 static void xlnx_zynqmp_qspips_init(Object *obj)
@@ -1411,6 +1419,11 @@ static const VMStateDescription vmstate_xlnx_zynqmp_qspips = {
     }
 };
 
+static Property xilinx_zynqmp_qspips_properties[] = {
+    DEFINE_PROP_UINT32("dma-burst-size", XlnxZynqMPQSPIPS, dma_burst_size, 64),
+    DEFINE_PROP_END_OF_LIST(),
+};
+
 static Property xilinx_qspips_properties[] = {
     /* We had to turn this off for 2.10 as it is not compatible with migration.
      * It can be enabled but will prevent the device to be migrated.
@@ -1463,6 +1476,7 @@ static void xlnx_zynqmp_qspips_class_init(ObjectClass *klass, void * data)
     dc->realize = xlnx_zynqmp_qspips_realize;
     dc->reset = xlnx_zynqmp_qspips_reset;
     dc->vmsd = _xlnx_zynqmp_qspips;
+    dc->props = xilinx_zynqmp_qspips_properties;
     xsc->reg_ops = _zynqmp_qspips_ops;
     xsc->rx_fifo_size = RXFF_A_Q;
     xsc->tx_fifo_size = TXFF_A_Q;
diff --git a/include/hw/ssi/xilinx_spips.h b/include/hw/ssi/xilinx_spips.h
index d398a4e..bc5596a 100644
--- a/include/hw/ssi/xilinx_spips.h
+++ b/include/hw/ssi/xilinx_spips.h
@@ -37,6 +37,8 @@ typedef struct XilinxSPIPS XilinxSPIPS;
 /* Bite off 4k chunks at a time */
 #define LQSPI_CACHE_SIZE 1024
 
+#define QSPI_DMA_MAX_BURST_SIZE 2048
+
 typedef enum {
     READ = 0x3,         READ_4 = 0x13,
     FAST_READ = 0xb,    FAST_READ_4 = 0x0c,
@@ -95,7 +97,8 @@ typedef struct {
     XilinxQSPIPS parent_obj;
 
     StreamSlave *dma;
-    uint8_t dma_buf[4];
+    uint8_t dma_buf[QSPI_DMA_MAX_BURST_SIZE];
+    uint32_t dma_burst_size;
     int gqspi_irqline;
 
     uint32_t regs[XLNX_ZYNQMP_SPIPS_R_MAX];
-- 
2.7.4
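After this patch, each pass of the notify loop moves at most dma_burst_size bytes rather than a fixed 4. The realize hook also caps the property at QSPI_DMA_MAX_BURST_SIZE, so the fixed dma_buf array can never overflow. Below is a standalone sketch of only that length arithmetic. The sketch_* names are hypothetical, and it does not model the FIFO, pop_buf() or stream_push(). Note that the patch pushes whatever count pop_buf() reports (num), not the requested length.

```c
#include <stddef.h>
#include <stdint.h>

/* Mirrors QSPI_DMA_MAX_BURST_SIZE from the patch. */
#define SKETCH_DMA_MAX_BURST_SIZE 2048u

/* Clamp a user-supplied "dma-burst-size" the way the realize hook does. */
static uint32_t sketch_clamp_burst(uint32_t requested)
{
    return requested > SKETCH_DMA_MAX_BURST_SIZE ? SKETCH_DMA_MAX_BURST_SIZE
                                                 : requested;
}

/*
 * Split `avail` queued bytes into transfers of at most `burst` bytes and
 * store each transfer length in out[]. Returns the number of transfers.
 * A zero burst size could never make progress, so it is rejected here.
 */
static size_t sketch_split_bursts(size_t avail, uint32_t burst,
                                  size_t *out, size_t max_out)
{
    size_t n = 0;

    if (burst == 0) {
        return 0;
    }
    while (avail > 0 && n < max_out) {
        /* Same "min(queued, burst)" choice as the patched notify loop. */
        size_t len = avail >= burst ? burst : avail;

        out[n++] = len;
        avail -= len;
    }
    return n;
}
```

For example, 150 queued bytes with the default 64-byte burst give three transfers of 64, 64 and 22 bytes, and a requested burst of 4096 is cut down to 2048.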
Re: [Qemu-devel] [PATCH v7 1/3] qmp: adding 'wakeup-suspend-support' in query-target
Hi,

Sorry for the delay. I'll summarize what I've understood from the
discussion so far:

- query-target is the wrong place for this flag. query-machines is (less)
  wrong because the flag is not a static property of the machine object.

- a new "query-current-machine" can be created to host these dynamic
  properties that belong to the current instance of the VM.

- there are machines in which the suspend support may vary with a
  "-no-acpi" option that would disable both the suspend and wake-up
  support. In this case, I see no problem in taking that option into
  account when computing the flag (assuming it is possible, of course)
  and setting it to "false" if -no-acpi is present (or even making the
  API return "yes", "no" or "acpi", as Markus suggested).

Based on the last email from Eduardo, there are apparently a handful of
other machine properties that could be hosted either in this new
query-current-machine API or in query-machines. I believe that this is
more of a long-term goal, but this new query-current-machine API would be
a good kick-off and we should go for it.

Is this a fair understanding? Did I miss something?

Thanks,

Daniel

On 05/29/2018 11:55 AM, Eduardo Habkost wrote:
On Mon, May 28, 2018 at 09:23:54AM +0200, Markus Armbruster wrote:
Eduardo Habkost writes:
[...]
[1] Doing a:

  $ git grep 'STR.*machine, "'

on libvirt source is enough to find some code demonstrating where
query-machines is already lacking today:
[...]

How can we get from this grep to a list of static or dynamic machine type
capabilities?
Let's look at the code: $ git grep -W 'STR.*machine, "' src/libxl/libxl_capabilities.c=libxlMakeDomainOSCaps(const char *machine, src/libxl/libxl_capabilities.c- virDomainCapsOSPtr os, src/libxl/libxl_capabilities.c- virFirmwarePtr *firmwares, src/libxl/libxl_capabilities.c- size_t nfirmwares) src/libxl/libxl_capabilities.c-{ src/libxl/libxl_capabilities.c-virDomainCapsLoaderPtr capsLoader = >loader; src/libxl/libxl_capabilities.c-size_t i; src/libxl/libxl_capabilities.c- src/libxl/libxl_capabilities.c-os->supported = true; src/libxl/libxl_capabilities.c- src/libxl/libxl_capabilities.c:if (STREQ(machine, "xenpv")) src/libxl/libxl_capabilities.c-return 0; I don't understand why this one is here, but we can find out what we could add to query-machines to make this unnecessary. [...] -- src/libxl/libxl_capabilities.c=libxlMakeDomainCapabilities(virDomainCapsPtr domCaps, src/libxl/libxl_capabilities.c-virFirmwarePtr *firmwares, src/libxl/libxl_capabilities.c-size_t nfirmwares) src/libxl/libxl_capabilities.c-{ src/libxl/libxl_capabilities.c-virDomainCapsOSPtr os = >os; src/libxl/libxl_capabilities.c-virDomainCapsDeviceDiskPtr disk = >disk; src/libxl/libxl_capabilities.c-virDomainCapsDeviceGraphicsPtr graphics = >graphics; src/libxl/libxl_capabilities.c-virDomainCapsDeviceVideoPtr video = >video; src/libxl/libxl_capabilities.c-virDomainCapsDeviceHostdevPtr hostdev = >hostdev; src/libxl/libxl_capabilities.c- src/libxl/libxl_capabilities.c:if (STREQ(domCaps->machine, "xenfv")) src/libxl/libxl_capabilities.c-domCaps->maxvcpus = HVM_MAX_VCPUS; src/libxl/libxl_capabilities.c-else src/libxl/libxl_capabilities.c-domCaps->maxvcpus = PV_MAX_VCPUS; Looks like libvirt isn't using MachineInfo::cpu-max. libvirt bug, or workaround for QEMU limitation? [...] 
-- src/libxl/libxl_driver.c=libxlConnectGetDomainCapabilities(virConnectPtr conn, src/libxl/libxl_driver.c- const char *emulatorbin, src/libxl/libxl_driver.c- const char *arch_str, src/libxl/libxl_driver.c- const char *machine, src/libxl/libxl_driver.c- const char *virttype_str, src/libxl/libxl_driver.c- unsigned int flags) src/libxl/libxl_driver.c-{ [...] src/libxl/libxl_driver.c-if (machine) { src/libxl/libxl_driver.c:if (STRNEQ(machine, "xenpv") && STRNEQ(machine, "xenfv")) { src/libxl/libxl_driver.c-virReportError(VIR_ERR_INVALID_ARG, "%s", src/libxl/libxl_driver.c- _("Xen only supports 'xenpv' and 'xenfv' machines")); Not sure if this should be encoded in QEMU. accel=xen works with other PC machines, doesn't it? [...] -- src/qemu/qemu_capabilities.c=bool virQEMUCapsHasPCIMultiBus(virQEMUCapsPtr qemuCaps, src/qemu/qemu_capabilities.c- const virDomainDef *def) src/qemu/qemu_capabilities.c-{ src/qemu/qemu_capabilities.c-/* x86_64 and i686 support PCI-multibus on all machine types src/qemu/qemu_capabilities.c- * since forever */ src/qemu/qemu_capabilities.c-if (ARCH_IS_X86(def->os.arch)) src/qemu/qemu_capabilities.c-return true; src/qemu/qemu_capabilities.c-
Re: [Qemu-devel] [PATCH] xilinx_spips: Make dma transactions as per dma_burst_size
Hi Edgar,

I got your suggestion below. Will be sending a V2 asap.

Thanks,
Sai Pavan

> -----Original Message-----
> From: Edgar E. Iglesias [mailto:edgar.igles...@xilinx.com]
> Sent: Thursday, June 14, 2018 4:55 PM
> To: Sai Pavan Boddu
> Cc: qemu-devel@nongnu.org; Alistair Francis ; Peter
> Crosthwaite ; Peter Maydell
> ; Francisco Iglesias
> Subject: Re: [PATCH] xilinx_spips: Make dma transactions as per dma_burst_size
>
> On Thu, Jun 14, 2018 at 10:57:04AM +0530, Sai Pavan Boddu wrote:
> > Qspi dma has a burst length of 64 bytes, So limit transaction length
> > to
> > 64 max.
>
> Hi Sai,
>
> Is this a v2 or a resend?
>
> >
> > Signed-off-by: Sai Pavan Boddu
> > ---
> >  hw/ssi/xilinx_spips.c         | 18 +++---
> >  include/hw/ssi/xilinx_spips.h |  3 ++-
> >  2 files changed, 17 insertions(+), 4 deletions(-)
> >
> > diff --git a/hw/ssi/xilinx_spips.c b/hw/ssi/xilinx_spips.c index
> > 03f5fae..ea006c4 100644
> > --- a/hw/ssi/xilinx_spips.c
> > +++ b/hw/ssi/xilinx_spips.c
> > @@ -851,12 +851,17 @@ static void xlnx_zynqmp_qspips_notify(void
> *opaque)
> >          {
> >              size_t ret;
> >              uint32_t num;
> > -            const void *rxd = pop_buf(recv_fifo, 4, );
> > +            const void *rxd;
> > +            int len;
> > +
> > +            len = recv_fifo->num >= rq->dma_burst_size ? rq->dma_burst_size :
> > +                                                           recv_fifo->num;
> > +            rxd = pop_buf(recv_fifo, len, );
> >
> >              memcpy(rq->dma_buf, rxd, num);
> >
> > -            ret = stream_push(rq->dma, rq->dma_buf, 4);
> > -            assert(ret == 4);
> > +            ret = stream_push(rq->dma, rq->dma_buf, num);
> > +            assert(ret == num);
> >              xlnx_zynqmp_qspips_check_flush(rq);
> >          }
> >      }
> > @@ -1337,6 +1342,7 @@ static void xlnx_zynqmp_qspips_realize(DeviceState
> *dev, Error **errp)
> >      fifo8_create(>rx_fifo_g, xsc->rx_fifo_size);
> >      fifo8_create(>tx_fifo_g, xsc->tx_fifo_size);
> >      fifo32_create(>fifo_g, 32);
> > +    s->dma_buf = g_new0(uint8_t, s->dma_burst_size);
>
>
> This would need to be free'd somewhere.
> But I think you should put a reasonably small limit on the burst size that
> users
> can configure and then you can allocate this as an array in XlnxZynqMPQSPIPS.
>
> >
> >  }
> >
> >  static void xlnx_zynqmp_qspips_init(Object *obj) @@ -1411,6 +1417,11
> > @@ static const VMStateDescription vmstate_xlnx_zynqmp_qspips = {
> >      }
> >  };
> >
> > +static Property xilinx_zynqmp_qspips_properties[] = {
> > +    DEFINE_PROP_UINT32("dma-burst-size", XlnxZynqMPQSPIPS,
> > +dma_burst_size, 64),
>
> You need to limit this so users dont pick 4G. Perhaps 2 or 4K max.
>
> Cheers,
> Edgar
>
>
> > +    DEFINE_PROP_END_OF_LIST(),
> > +};
> > +
> >  static Property xilinx_qspips_properties[] = {
> >      /* We had to turn this off for 2.10 as it is not compatible with
> > migration.
> >       * It can be enabled but will prevent the device to be migrated.
> > @@ -1463,6 +1474,7 @@ static void
> xlnx_zynqmp_qspips_class_init(ObjectClass *klass, void * data)
> >      dc->realize = xlnx_zynqmp_qspips_realize;
> >      dc->reset = xlnx_zynqmp_qspips_reset;
> >      dc->vmsd = _xlnx_zynqmp_qspips;
> > +    dc->props = xilinx_zynqmp_qspips_properties;
> >      xsc->reg_ops = _zynqmp_qspips_ops;
> >      xsc->rx_fifo_size = RXFF_A_Q;
> >      xsc->tx_fifo_size = TXFF_A_Q;
> > diff --git a/include/hw/ssi/xilinx_spips.h
> > b/include/hw/ssi/xilinx_spips.h index d398a4e..cca1813 100644
> > --- a/include/hw/ssi/xilinx_spips.h
> > +++ b/include/hw/ssi/xilinx_spips.h
> > @@ -95,7 +95,8 @@ typedef struct {
> >      XilinxQSPIPS parent_obj;
> >
> >      StreamSlave *dma;
> > -    uint8_t dma_buf[4];
> > +    uint8_t *dma_buf;
> > +    uint32_t dma_burst_size;
> >      int gqspi_irqline;
> >
> >      uint32_t regs[XLNX_ZYNQMP_SPIPS_R_MAX];
> > --
> > 2.7.4
> >
Re: [Qemu-devel] [PATCH v5 3/6] nbd/server: add nbd_meta_empty_or_pattern helper
On 06/09/2018 10:17 AM, Vladimir Sementsov-Ogievskiy wrote:
> Add nbd_meta_pattern() and nbd_meta_empty_or_pattern() helpers for
> metadata query parsing. nbd_meta_pattern() will be reused for "qemu"

s/for/for the/

> namespace in following patches.
>
> Signed-off-by: Vladimir Sementsov-Ogievskiy
> ---
>  nbd/server.c | 86 +---
>  1 file changed, 59 insertions(+), 27 deletions(-)

Feels like growth, even though the goal of refactoring is reuse; but the
reuse comes later so I'm okay with it.

> diff --git a/nbd/server.c b/nbd/server.c
> index 567561a77e..2d762d7289 100644
> --- a/nbd/server.c
> +++ b/nbd/server.c
> @@ -733,52 +733,83 @@ static int nbd_negotiate_send_meta_context(NBDClient *client,
>      return qio_channel_writev_all(client->ioc, iov, 2, errp) < 0 ? -EIO : 0;
>  }
> -/* nbd_meta_base_query
> - *
> - * Handle query to 'base' namespace. For now, only base:allocation context is

[1]...

> - * available in it. 'len' is the amount of text remaining to be read from
> - * the current name, after the 'base:' portion has been stripped.
> +/* Read strlen(@pattern) bytes, and set @match to true if they match @pattern.
> + * @match is never set to false.
>   *
>   * Return -errno on I/O error, 0 if option was completely handled by
>   * sending a reply about inconsistent lengths, or 1 on success.
>   *
> - * Note: return code = 1 doesn't mean that we've parsed "base:allocation"
> - * namespace. It only means that there are no errors.*/
> -static int nbd_meta_base_query(NBDClient *client, NBDExportMetaContexts *meta,
> -                               uint32_t len, Error **errp)
> + * Note: return code = 1 doesn't mean that we've read exactly @pattern
> + * It only means that there are no errors. */

Comment tail on its own line (now that we've got a patch pending for
HACKING to document that, I'll start abiding by it...)

> +static int nbd_meta_pattern(NBDClient *client, const char *pattern, bool *match,
> +                            Error **errp)
>  {
>      int ret;
> -    char query[sizeof("allocation") - 1];
> -    size_t alen = strlen("allocation");
> +    char *query;
> +    int len = strlen(pattern);

size_t is better than int for strlen() results.

> -    if (len == 0) {
> -        if (client->opt == NBD_OPT_LIST_META_CONTEXT) {
> -            meta->base_allocation = true;
> -        }
> -        trace_nbd_negotiate_meta_query_parse("base:");
> -        return 1;
> -    }
> -
> -    if (len != alen) {
> -        trace_nbd_negotiate_meta_query_skip("not base:allocation");
> -        return nbd_opt_skip(client, len, errp);
> -    }
> +    assert(len);
> +    query = g_malloc(len);

At first, I wondered if we could just use a pre-allocated stack buffer
larger than any string we ever anticipate. But thinking about it, your
dirty bitmap exports expose a name under user control, which means a user
could (spitefully) pick a name longer than our buffer (well, up to the 4k
name limit imposed by the NBD protocol). So I can live with the malloc.

>      ret = nbd_opt_read(client, query, len, errp);
>      if (ret <= 0) {
> +        g_free(query);
>          return ret;
>      }
> -    if (strncmp(query, "allocation", alen) == 0) {
> -        trace_nbd_negotiate_meta_query_parse("base:allocation");
> -        meta->base_allocation = true;
> +    if (strncmp(query, pattern, len) == 0) {
> +        trace_nbd_negotiate_meta_query_parse(pattern);
> +        *match = true;
>      } else {
> -        trace_nbd_negotiate_meta_query_skip("not base:allocation");
> +        trace_nbd_negotiate_meta_query_skip(pattern);

Would this one read better as "not %s", pattern?

>      }
> +    g_free(query);
>      return 1;
>  }
> +/* Read @len bytes, and set @match to true if they match @pattern, or if @len
> + * is 0 and the client is performing _LIST_. @match is never set to false.
> + *
> + * Return -errno on I/O error, 0 if option was completely handled by
> + * sending a reply about inconsistent lengths, or 1 on success.
> + *
> + * Note: return code = 1 doesn't mean that we've read exactly @pattern
> + * It only means that there are no errors. */

More comment formatting.

> +static int nbd_meta_empty_or_pattern(NBDClient *client, const char *pattern,
> +                                     uint32_t len, bool *match, Error **errp)
> +{
> +    if (len == 0) {
> +        if (client->opt == NBD_OPT_LIST_META_CONTEXT) {
> +            *match = true;
> +        }
> +        trace_nbd_negotiate_meta_query_parse("empty");
> +        return 1;
> +    }
> +
> +    if (len != strlen(pattern)) {
> +        trace_nbd_negotiate_meta_query_skip("different lengths");
> +        return nbd_opt_skip(client, len, errp);
> +    }
> +
> +    return nbd_meta_pattern(client, pattern, match, errp);
> +}
> +
> +/* nbd_meta_base_query
> + *
> + * Handle query to 'base' namespace. For now, only base:allocation context is

Pre-existing (see [1]), but reads better as "Handle queries to the 'base'
namespace"

> + * available in it. 'len' is the amount of text remaining to be read from
> + * the current name,
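The two helpers split the old base:allocation parser into a generic "exact pattern" check and an "empty or pattern" wrapper. An empty leaf name matches only during NBD_OPT_LIST_META_CONTEXT, a length mismatch is skipped, and @match is only ever set, never cleared. The sketch below keeps just that decision logic. It assumes the query bytes are already in memory (no nbd_opt_read()/nbd_opt_skip(), no tracing) and replaces the client option with an is_list flag. The function name is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical in-memory model of nbd_meta_empty_or_pattern():
 * `query` holds the `len` bytes the client sent after the namespace
 * prefix. Returns 1, like the real helper's success path; *match is
 * only ever set to true.
 */
static int sketch_meta_empty_or_pattern(const char *query, size_t len,
                                        const char *pattern, bool is_list,
                                        bool *match)
{
    size_t plen = strlen(pattern);  /* size_t for strlen(), per review */

    if (len == 0) {
        /* Empty leaf name: a wildcard, but only when listing. */
        if (is_list) {
            *match = true;
        }
        return 1;
    }
    if (len != plen) {
        return 1;  /* lengths differ: the real code skips these bytes */
    }
    if (memcmp(query, pattern, plen) == 0) {
        *match = true;
    }
    return 1;
}
```

Because *match is never reset, a caller can run several of these checks against the same flag and keep the result of any earlier match.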
Re: [Qemu-devel] [PATCH] [RFC v2] aio: properly bubble up errors from initialization
On 19.06.2018 [14:35:33 -0500], Eric Blake wrote:
> On 06/15/2018 12:47 PM, Nishanth Aravamudan via Qemu-devel wrote:
> > laio_init() can fail for a couple of reasons, which will lead to a NULL
> > pointer dereference in laio_attach_aio_context().
> >
> > To solve this, add a aio_setup_linux_aio() function which is called
> > before aio_get_linux_aio() where it is called currently, and which
> > propogates setup errors up. The signature of aio_get_linux_aio() was not
>
> s/propogates/propagates/

Thanks!

> > modified, because it seems preferable to return the actual errno from
> > the possible failing initialization calls.
> >
> > With respect to the error-handling in the file-posix.c, we properly
> > bubble any errors up in raw_co_prw and in the case s of
> > raw_aio_{,un}plug, the result is the same as if s->use_linux_aio was not
> > set (but there is no bubbling up). In all three cases, if the setup
> > function fails, we fallback to the thread pool and an error message is
> > emitted.
> >
> > It is trivial to make qemu segfault in my testing. Set
> > /proc/sys/fs/aio-max-nr to 0 and start a guest with
> > aio=native,cache=directsync. With this patch, the guest successfully
> > starts (but obviously isn't using native AIO). Setting aio-max-nr back
> > up to a reasonable value, AIO contexts are consumed normally.
> >
> > Signed-off-by: Nishanth Aravamudan
> >
> > ---
> >
> > Changes from v1 -> v2:
>
> When posting a v2, it's best to post as a new thread, rather than
> in-reply-to the v1 thread, so that automated tooling knows to check the new
> patch. More patch submission tips at
> https://wiki.qemu.org/Contribute/SubmitAPatch

My apologies! I'll fix this in a (future) v3.

> > Rather than affect virtio-scsi/blk at all, make all the changes internal
> > to file-posix.c. Thanks to Kevin Wolf for the suggested change.
> > ---
> >  block/file-posix.c      | 24
> >  block/linux-aio.c       | 15 ++-
> >  include/block/aio.h     |  3 +++
> >  include/block/raw-aio.h |  2 +-
> >  stubs/linux-aio.c       |  2 +-
> >  util/async.c            | 15 ---
> >  6 files changed, 51 insertions(+), 10 deletions(-)
> >
> > diff --git a/block/file-posix.c b/block/file-posix.c
> > index 07bb061fe4..2415d09bf1 100644
> > --- a/block/file-posix.c
> > +++ b/block/file-posix.c
> > @@ -1665,6 +1665,14 @@ static int coroutine_fn raw_co_prw(BlockDriverState
> > *bs, uint64_t offset,
> >          type |= QEMU_AIO_MISALIGNED;
> >  #ifdef CONFIG_LINUX_AIO
> >      } else if (s->use_linux_aio) {
> > +        int rc;
> > +        rc = aio_setup_linux_aio(bdrv_get_aio_context(bs));
> > +        if (rc != 0) {
> > +            error_report("Unable to use native AIO, falling back to "
> > +                         "thread pool.");
>
> In general, error_report() should not output a trailing '.'.

Will fix.

> > +            s->use_linux_aio = 0;
> > +            return rc;
>
> Wait - the message claims we are falling back, but the non-zero return code
> sounds like we are returning an error instead of falling back. (My
> preference - if the user requested something and we can't do it, it's better
> to error than to fall back to something that does not match the user's
> request).

I think that makes sense. I hadn't tested this specific case (in my
reading of the code, it wasn't clear to me whether raw_co_prw() could be
called before raw_aio_plug() had been called), but I think returning the
error code up should be handled correctly.

What about the cases where there is no error handling (the other two
changes in the patch)?

> > +        }
> >          LinuxAioState *aio =
> >              aio_get_linux_aio(bdrv_get_aio_context(bs));
> >          assert(qiov->size == bytes);
> >          return laio_co_submit(bs, aio, s->fd, offset, qiov, type);
> > @@ -1695,6 +1703,14 @@ static void raw_aio_plug(BlockDriverState *bs)
> >  #ifdef CONFIG_LINUX_AIO
> >      BDRVRawState *s = bs->opaque;
> >      if (s->use_linux_aio) {
> > +        int rc;
> > +        rc = aio_setup_linux_aio(bdrv_get_aio_context(bs));
> > +        if (rc != 0) {
> > +            error_report("Unable to use native AIO, falling back to "
> > +                         "thread pool.");
> > +            s->use_linux_aio = 0;
>
> Should s->use_linux_aio be a bool instead of an int?

It is:

    bool use_linux_aio:1;

Would you prefer I did a preparatory patch that converted users to
true/false?

Thanks,
Nish
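The patch follows a familiar pattern. It sets up the per-context Linux AIO state lazily and makes the setup idempotent. It also returns -errno to the caller instead of letting a NULL state reach laio_attach_aio_context(). Below is a minimal sketch of that pattern, with the request path reporting the error as Eric prefers. Everything in it is hypothetical: the sketch_* names, the structs, and the simulated aio-max-nr limit. Real code would go through laio_init() and io_setup(2), which fails with EAGAIN when the requested events exceed /proc/sys/fs/aio-max-nr. The flag is a plain bool.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

typedef struct SketchLinuxAio {
    int max_events;
} SketchLinuxAio;

typedef struct SketchAioContext {
    SketchLinuxAio *linux_aio;   /* NULL until setup succeeds */
} SketchAioContext;

/* Simulated /proc/sys/fs/aio-max-nr (hypothetical knob for the sketch). */
static int sketch_aio_max_nr = 65536;

/* Stand-in for laio_init(): fails like io_setup(2) would. */
static SketchLinuxAio *sketch_laio_init(int *err)
{
    const int nr_events = 128;
    SketchLinuxAio *s;

    if (nr_events > sketch_aio_max_nr) {
        *err = -EAGAIN;
        return NULL;
    }
    s = malloc(sizeof(*s));
    if (!s) {
        *err = -ENOMEM;
        return NULL;
    }
    s->max_events = nr_events;
    return s;
}

/* Idempotent setup: 0 if ready (now or already), -errno on failure. */
static int sketch_setup_linux_aio(SketchAioContext *ctx)
{
    int err = 0;

    if (ctx->linux_aio) {
        return 0;
    }
    ctx->linux_aio = sketch_laio_init(&err);
    return ctx->linux_aio ? 0 : err;
}

/* Request path: disable native AIO and hand the error back. */
static int sketch_co_prw(SketchAioContext *ctx, bool *use_linux_aio)
{
    if (*use_linux_aio) {
        int rc = sketch_setup_linux_aio(ctx);

        if (rc != 0) {
            *use_linux_aio = false;
            return rc;
        }
        /* ... submit the request through ctx->linux_aio ... */
    }
    return 0;
}
```

Because the setup function only allocates when the context has no state yet, calling it from every entry point (request, plug, unplug) is cheap once it has succeeded.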
Re: [Qemu-devel] [PATCH 2/6] nbd: allow authorization with nbd-server-start QMP command
On 06/15/2018 10:50 AM, Daniel P. Berrangé wrote:
> From: "Daniel P. Berrange"
>
> As with the previous patch to qemu-nbd, the nbd-server-start QMP command
> also needs to be able to specify authorization when enabling TLS
> encryption.
>
> First the client must create a QAuthZ object instance using the
> 'object-add' command:
>
>    {
>      'execute': 'object-add',
>      'arguments': {
>        'qom-type': 'authz-simple',
>        'id': 'authz0',
>        'parameters': {
>          'policy': 'deny',
>          'rules': [
>            {
>              'match': '*CN=fred',
>              'policy': 'allow'
>            }
>          ]
>        }
>      }
>    }
>
> They can then reference this in the new 'tls-authz' parameter when
> executing the 'nbd-server-start' command:
>
>    {
>      'execute': 'nbd-server-start',
>      'arguments': {
>        'addr': {
>          'type': 'inet',
>          'host': '127.0.0.1',
>          'port': '9000'
>        },
>        'tls-creds': 'tls0',
>        'tls-authz': 'authz0'
>      }
>    }

Is it worth using a discriminated union (string vs. QAuthZ) so that one
could specify the authz policy inline rather than as a separate object,
for convenience? But that would be fine as a followup patch, if we even
want it.

> Signed-off-by: Daniel P. Berrange
> ---
>  blockdev-nbd.c      | 14 +++---
>  hmp.c               |  2 +-
>  include/block/nbd.h |  2 +-
>  qapi/block.json     |  4 +++-
>  4 files changed, 16 insertions(+), 6 deletions(-)

> @@ -118,6 +121,10 @@ void nbd_server_start(SocketAddress *addr, const char *tls_creds,
>          }
>      }
> +    if (tls_authz) {
> +        nbd_server->tlsauthz = g_strdup(tls_authz);
> +    }

Pointless 'if'; g_strdup() does the right thing.

> +++ b/qapi/block.json
> @@ -197,6 +197,7 @@
>  #
>  # @addr: Address on which to listen.
>  # @tls-creds: (optional) ID of the TLS credentials object. Since 2.6
> +# @tls-authz: (optional) ID of the QAuthZ authorization object. Since 2.13

No need for the string '(optional)' (I thought we killed those uses when
we automated the documentation generation - but obviously a few were left
behind).

s/2.13/3.0/

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3266
Virtualization: qemu.org | libvirt.org
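To make the 'authz0' object from the example concrete, the sketch below shows how such a policy could be evaluated against a client's x509 distinguished name. It assumes three things that the patch text does not spell out: rules are tried in order, the first matching rule decides, and 'match' strings are shell-style globs. Otherwise the object's default 'policy' applies. The types and function are hypothetical and use POSIX fnmatch(). They are not the QAuthZ implementation.

```c
#include <fnmatch.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { SKETCH_DENY, SKETCH_ALLOW } SketchPolicy;

typedef struct {
    const char *match;     /* glob applied to the distinguished name */
    SketchPolicy policy;
} SketchRule;

/* Hypothetical model of the 'authz0' object in the QMP example. */
static const SketchRule authz0_rules[] = {
    { "*CN=fred", SKETCH_ALLOW },
};
static const SketchPolicy authz0_default = SKETCH_DENY;

static bool sketch_is_allowed(const char *dname)
{
    for (size_t i = 0; i < sizeof(authz0_rules) / sizeof(authz0_rules[0]); i++) {
        if (fnmatch(authz0_rules[i].match, dname, 0) == 0) {
            /* Assumed semantics: the first matching rule wins. */
            return authz0_rules[i].policy == SKETCH_ALLOW;
        }
    }
    return authz0_default == SKETCH_ALLOW;
}
```

Under these assumptions a name such as "C=GB,O=Example,CN=fred" is allowed, while "CN=bob" or "CN=fred,O=Example" (which does not end in CN=fred) falls through to the default deny.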
Re: [Qemu-devel] [virtio-dev] Re: [PATCH] qemu: Introduce VIRTIO_NET_F_STANDBY feature bit to virtio_net
On Tue, Jun 19, 2018 at 3:54 AM, Cornelia Huck wrote: > On Fri, 15 Jun 2018 10:06:07 -0700 > Siwei Liu wrote: > >> On Fri, Jun 15, 2018 at 4:48 AM, Cornelia Huck wrote: >> > On Thu, 14 Jun 2018 18:57:11 -0700 >> > Siwei Liu wrote: >> > >> >> Thank you for sharing your thoughts, Cornelia. With questions below, I >> >> think you raised really good points, some of which I don't have answer >> >> yet and would also like to explore here. >> >> >> >> First off, I don't want to push the discussion to the extreme at this >> >> point, or sell anything about having QEMU manage everything >> >> automatically. Don't get me wrong, it's not there yet. Let's don't >> >> assume we are tied to a specific or concerte solution. I think the key >> >> for our discussion might be to define or refine the boundary between >> >> VM and guest, e.g. what each layer is expected to control and manage >> >> exactly. >> >> >> >> In my view, there might be possibly 3 different options to represent >> >> the failover device conceipt to QEMU and libvirt (or any upper layer >> >> software): >> >> >> >> a. Seperate device: in this model, virtio and passthough remains >> >> separate devices just as today. QEMU exposes the standby feature bit >> >> for virtio, and publish status/event around the negotiation process of >> >> this feature bit for libvirt to react upon. Since Libvirt has the >> >> pairing relationship itself, maybe through MAC address or something >> >> else, it can control the presence of primary by hot plugging or >> >> unplugging the passthrough device, although it has to work tightly >> >> with virtio's feature negotation process. Not just for migration but >> >> also various corner scenarios (driver/feature ok, device reset, >> >> reboot, legacy guest etc) along virtio's feature negotiation. >> > >> > Yes, that one has obvious tie-ins to virtio's modus operandi. >> > >> >> >> >> b. 
Coupled device: in this model, virtio and passthough devices are >> >> weakly coupled using some group ID, i.e. QEMU match the passthough >> >> device for a standby virtio instance by comparing the group ID value >> >> present behind each device's bridge. Libvirt provides QEMU the group >> >> ID for both type of devices, and only deals with hot plug for >> >> migration, by checking some migration status exposed (e.g. the feature >> >> negotiation status on the virtio device) by QEMU. QEMU manages the >> >> visibility of the primary in guest along virtio's feature negotiation >> >> process. >> > >> > I'm a bit confused here. What, exactly, ties the two devices together? >> >> The group UUID. Since QEMU VFIO dvice does not have insight of MAC >> address (which it doesn't have to), the association between VFIO >> passthrough and standby must be specificed for QEMU to understand the >> relationship with this model. Note, standby feature is no longer >> required to be exposed under this model. > > Isn't that a bit limiting, though? > > With this model, you can probably tie a vfio-pci device and a > virtio-net-pci device together. But this will fail if you have > different transports: Consider tying together a vfio-pci device and a > virtio-net-ccw device on s390, for example. The standby feature bit is > on the virtio-net level and should not have any dependency on the > transport used. Probably we'd limit the support for grouping to virtio-net-pci device and vfio-pci device only. For virtio-net-pci, as you might see with Venu's patch, we store the group UUID on the config space of virtio-pci, which is only applicable to PCI transport. If virtio-net-ccw needs to support the same, I think similar grouping interface should be defined on the VirtIO CCW transport. 
I think the current implementation of the Linux failover driver assumes that it's SR-IOV VF with same MAC address which the virtio-net-pci needs to pair with, and that the PV path is on same PF without needing to update network of the port-MAC association change. If we need to extend the grouping mechanism to virtio-net-ccw, it has to pass such failover mode to virtio driver specifically through some other option I guess. > >> >> > If libvirt already has the knowledge that it should manage the two as a >> > couple, why do we need the group id (or something else for other >> > architectures)? (Maybe I'm simply missing something because I'm not >> > that familiar with pci.) >> >> The idea is to have QEMU control the visibility and enumeration order >> of the passthrough VFIO for the failover scenario. Hotplug can be one >> way to achieve it, and perhaps there's other way around also. The >> group ID is not just for QEMU to couple devices, it's also helpful to >> guest too as grouping using MAC address is just not safe. > > Sorry about dragging mainframes into this, but this will only work for > homogenous device coupling, not for heterogenous. Consider my vfio-pci > + virtio-net-ccw example again: The guest cannot find out that the two > belong together by checking some group ID, it has to
Re: [Qemu-devel] [PATCH 1/6] qemu-nbd: add support for authorization of TLS clients
On 06/15/2018 10:50 AM, Daniel P. Berrangé wrote:
> From: "Daniel P. Berrange"
>
> Currently any client which can complete the TLS handshake is able to use
> the NBD server. The server admin can turn on the 'verify-peer' option
> for the x509 creds to require the client to provide a x509 certificate.
> This means the client will have to acquire a certificate from the CA
> before they are permitted to use the NBD server. This is still a fairly
> low bar to cross.
>
> This adds a '--tls-authz OBJECT-ID' option to the qemu-nbd command which
> takes the ID of a previously added 'QAuthZ' object instance. This will be
> used to validate the client's x509 distinguished name. Clients failing
> the authorization check will not be permitted to use the NBD server.
>
> For example to setup authorization that only allows connection from a client
> whose x509 certificate distinguished name contains 'CN=fred', you would
> use:
>
>   qemu-nbd -object tls-creds-x509,id=tls0,dir=/home/berrange/qemutls,\
>                    endpoint=server,verify-peer=yes \
>            -object authz-simple,id=authz0,policy=deny,\
>                    rules.0.match=*CN=fred,rules.0.policy=allow \

s/-object/--object/g

>            -tls-creds tls0 \
>            -tls-authz authz0

s/-tls/--tls/g

(qemu-nbd requires double-dash long-opts, -o means --offset except that
'bject' is not an offset; similarly for -t meaning --persistent)

>            other qemu-nbd args...
>
> Signed-off-by: Daniel P. Berrange
> ---
>  include/block/nbd.h |  2 +-
>  nbd/server.c        | 12 +++-
>  qemu-nbd.c          | 13 -
>  qemu-nbd.texi       |  4
>  4 files changed, 24 insertions(+), 7 deletions(-)

> +++ b/nbd/server.c
> @@ -2153,7 +2153,9 @@ void nbd_client_new(NBDExport *exp,
>      if (tlscreds) {
>          object_ref(OBJECT(client->tlscreds));
>      }
> -    client->tlsaclname = g_strdup(tlsaclname);
> +    if (tlsauthz) {
> +        client->tlsauthz = g_strdup(tlsauthz);
> +    }

The 'if' is pointless; g_strdup(NULL) is safe.

> +++ b/qemu-nbd.c
> @@ -533,6 +535,7 @@ int main(int argc, char **argv)
>          { "image-opts", no_argument, NULL, QEMU_NBD_OPT_IMAGE_OPTS },
>          { "trace", required_argument, NULL, 'T' },
>          { "fork", no_argument, NULL, QEMU_NBD_OPT_FORK },
> +        { "tls-authz", no_argument, NULL, QEMU_NBD_OPT_TLSAUTHZ },

Not your fault, but worth sorting these alphabetically?

Bummer that pre-patch, you could use '--tls' as an unambiguous
abbreviation for --tls-creds; now it is an ambiguous prefix (you have to
type --tls-c or --tls-a to get to the point of no ambiguity). If we really
cared, we could add:

    { "t", required_argument, NULL, QEMU_NBD_OPT_TLSCREDS },
    { "tl", required_argument, NULL, QEMU_NBD_OPT_TLSCREDS },
    { "tls", required_argument, NULL, QEMU_NBD_OPT_TLSCREDS },
    { "tls-", required_argument, NULL, QEMU_NBD_OPT_TLSCREDS },

since getopt_long() no longer reports ambiguity if there is an exact match
to what is otherwise the common prefix of two ambiguous options. But I
don't think backwards-compatibility on this front is worth worrying about
(generally, scripts don't rely on getopt_long()'s unambiguous prefix
handling).

> +++ b/qemu-nbd.texi
> @@ -91,6 +91,10 @@ of the TLS credentials object previously created with the --object option.
>  @item --fork
>  Fork off the server process and exit the parent once the server is running.
> +@item --tls-authz=ID
> +Specify the ID of a qauthz object previously created with the

s/qauthz/authz-simple/ ?

> +--object option. This will be used to authorize users who
> +connect against their x509 distinguished name.

Sounds like someone is "connecting against their name", rather than
"authorizing against their name". Better might be:

This will be used to authorize connecting users against their x509
distinguished name.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3266
Virtualization: qemu.org | libvirt.org
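The prefix behaviour described above can be checked with a small table that contains only the two TLS options. Without an exact "tls" entry, "--tls=..." is an ambiguous prefix and getopt_long() returns '?'. With the alias added, the exact match wins. This is a hypothetical standalone sketch, not qemu-nbd's option table. Both options are declared with required_argument, to match the documented --tls-authz=ID form. Setting optind to 0 to restart scanning is the glibc/musl convention.

```c
#include <getopt.h>
#include <stdio.h>

enum { OPT_TLSCREDS = 256, OPT_TLSAUTHZ };

static const struct option opts_plain[] = {
    { "tls-creds", required_argument, NULL, OPT_TLSCREDS },
    { "tls-authz", required_argument, NULL, OPT_TLSAUTHZ },
    { NULL, 0, NULL, 0 },
};

/* Same table plus an exact "tls" alias, as suggested in the review. */
static const struct option opts_alias[] = {
    { "tls", required_argument, NULL, OPT_TLSCREDS },
    { "tls-creds", required_argument, NULL, OPT_TLSCREDS },
    { "tls-authz", required_argument, NULL, OPT_TLSAUTHZ },
    { NULL, 0, NULL, 0 },
};

/* Parse a single argument and return what getopt_long() reports for it. */
static int sketch_parse(const char *arg, const struct option *opts)
{
    char prog[] = "qemu-nbd-sketch";
    char buf[64];
    char *argv[] = { prog, buf, NULL };

    snprintf(buf, sizeof(buf), "%s", arg);
    optind = 0;   /* restart scanning from scratch (glibc/musl) */
    opterr = 0;   /* no diagnostics on stderr; '?' is still returned */
    return getopt_long(2, argv, "", opts, NULL);
}
```

With opts_plain, "--tls=tls0" yields '?' while "--tls-a=authz0" resolves to OPT_TLSAUTHZ. With opts_alias, "--tls=tls0" resolves to OPT_TLSCREDS.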
Re: [Qemu-devel] [PATCH v2] migration: fix crash when incoming client channel setup fails
Daniel P. Berrangé wrote: > The way we determine if we can start the incoming migration was > changed to use migration_has_all_channels() in: > > commit 428d89084c709e568f9cd301c2f6416a54c53d6d > Author: Juan Quintela > Date: Mon Jul 24 13:06:25 2017 +0200 > > migration: Create migration_has_all_channels > > This method in turn calls multifd_recv_all_channels_created() > which is hardcoded to always return 'true' when multifd is > not in use. This is a latent bug... > > ...activated in a following commit where that return result > ends up acting as the flag to indicate whether it is possible > to start processing the migration: > > commit 36c2f8be2c4eb0003ac77a14910842b7ddd7337e > Author: Juan Quintela > Date: Wed Mar 7 08:40:52 2018 +0100 > > migration: Delay start of migration main routines > > This means that if channel initialization fails with normal > migration, it'll never notice and attempt to start the > incoming migration regardless and crash on a NULL pointer. > > This can be seen, for example, if a client connects to a server > requiring TLS, but has an invalid x509 certificate: > > qemu-system-x86_64: The certificate hasn't got a known issuer > qemu-system-x86_64: migration/migration.c:386: process_incoming_migration_co: > Assertion `mis->from_src_file' failed. > > #0 0x7fffebd24f2b in raise () at /lib64/libc.so.6 > #1 0x7fffebd0f561 in abort () at /lib64/libc.so.6 > #2 0x7fffebd0f431 in _nl_load_domain.cold.0 () at /lib64/libc.so.6 > #3 0x7fffebd1d692 in () at /lib64/libc.so.6 > #4 0x55ad027e in process_incoming_migration_co (opaque=<optimized out>) at migration/migration.c:386 > #5 0x55c45e8b in coroutine_trampoline (i0=<optimized out>, > i1=<optimized out>) at util/coroutine-ucontext.c:116 > #6 0x7fffebd3a6a0 in __start_context () at /lib64/libc.so.6 > #7 0x in () > > To handle the non-multifd case, we check whether mis->from_src_file > is non-NULL.
With this in place, the migration server drops the > rejected client and stays around waiting for another, hopefully > valid, client to arrive. > > Signed-off-by: Daniel P. Berrangé Reviewed-by: Juan Quintela
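The fix described above amounts to making the readiness check require the main channel as well as the multifd channels. Below is a minimal sketch with simplified, hypothetical stand-ins for QEMU's types and multifd bookkeeping. To keep it self-contained, the incoming state is passed in explicitly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for QEMU's incoming-migration state. */
typedef struct {
    void *from_src_file;   /* main channel; stays NULL if setup failed */
} MigrationIncomingState;

/* Hypothetical multifd bookkeeping. */
static bool multifd_enabled;
static int multifd_channels_expected;
static int multifd_channels_created;

static bool multifd_recv_all_channels_created(void)
{
    if (!multifd_enabled) {
        return true;       /* says nothing about the main channel */
    }
    return multifd_channels_created == multifd_channels_expected;
}

/* Ready only when the multifd channels AND the main channel exist. */
static bool migration_has_all_channels(const MigrationIncomingState *mis)
{
    bool all_channels = multifd_recv_all_channels_created();

    return all_channels && mis->from_src_file != NULL;
}
```

With this gate, a client rejected during the TLS handshake leaves `from_src_file` NULL, so the incoming coroutine is never started.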
[Qemu-devel] [PATCH] simpletrace: Convert name from mapping record to str
The rest of the code assumes that idtoname is a (int -> str) dictionary, so convert the data accordingly. This is necessary to make the script work with Python 3 (where reads from a binary file return 'bytes' objects, not 'str'). Fixes the following error: $ python3 ./scripts/simpletrace.py trace-events-all trace-27445 b'object_class_dynamic_cast_assert' event is logged but is not \ declared in the trace events file, try using trace-events-all instead. Signed-off-by: Eduardo Habkost --- scripts/simpletrace.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/scripts/simpletrace.py b/scripts/simpletrace.py index d4a50a1e2b..4ad34f90cd 100755 --- a/scripts/simpletrace.py +++ b/scripts/simpletrace.py @@ -70,7 +70,7 @@ def get_record(edict, idtoname, rechdr, fobj): def get_mapping(fobj): (event_id, ) = struct.unpack('=Q', fobj.read(8)) (len, ) = struct.unpack('=L', fobj.read(4)) -name = fobj.read(len) +name = fobj.read(len).decode() return (event_id, name) -- 2.18.0.rc1.1.g3f1ff2140
[Qemu-devel] [PATCH v1 6/6] qga: removing bios_supports_mode
bios_supports_mode checks whether the guest supports a given suspend mode, but it doesn't report which suspend tool provides that support. The caller, guest_suspend, then executes all suspend strategies in order again. After adding systemd suspend support, bios_supports_mode checks for systemd support first, then pmutils, then the Linux sys state file. In the worst case, where neither systemd nor pmutils is supported but Linux sys state is: - bios_supports_mode will check for systemd, then pmutils, then Linux sys state. It will tell guest_suspend that there is support, but not who provides it; - guest_suspend will try to execute (and fail) systemd suspend, then pmutils suspend, and only then use the Linux sys state suspend. The time spent executing systemd and pmutils suspend is wasted and could be avoided: bios_supports_mode already knew they were unsupported but didn't report it back. A quicker approach is to nuke bios_supports_mode and track whether we found support at all with a bool flag inside guest_suspend. guest_suspend will search for suspend support and execute it as soon as it is found. If a given suspend mechanism fails, continue to the next. If no suspend support is found, the "not supported" message is still sent back to the user.
Signed-off-by: Daniel Henrique Barboza --- qga/commands-posix.c | 54 +++- 1 file changed, 18 insertions(+), 36 deletions(-) diff --git a/qga/commands-posix.c b/qga/commands-posix.c index 6a573de86d..79acc28ee7 100644 --- a/qga/commands-posix.c +++ b/qga/commands-posix.c @@ -1681,60 +1681,42 @@ static void linux_sys_state_suspend(SuspendMode mode, Error **errp) } -static void bios_supports_mode(SuspendMode mode, Error **errp) -{ -Error *local_err = NULL; -bool ret; - -ret = systemd_supports_mode(mode, &local_err); -if (ret) { -return; -} -if (local_err) { -error_propagate(errp, local_err); -return; -} -ret = pmutils_supports_mode(mode, &local_err); -if (ret) { -return; -} -if (local_err) { -error_propagate(errp, local_err); -return; -} -ret = linux_sys_state_supports_mode(mode, &local_err); -if (!ret) { -error_setg(errp, - "the requested suspend mode is not supported by the guest"); -} -} - static void guest_suspend(SuspendMode mode, Error **errp) { Error *local_err = NULL; +bool mode_supported = false; -bios_supports_mode(mode, &local_err); -if (local_err) { -error_propagate(errp, local_err); -return; +if (systemd_supports_mode(mode, &local_err)) { +mode_supported = true; +systemd_suspend(mode, &local_err); } -systemd_suspend(mode, &local_err); if (!local_err) { return; } local_err = NULL; -pmutils_suspend(mode, &local_err); +if (pmutils_supports_mode(mode, &local_err)) { +mode_supported = true; +pmutils_suspend(mode, &local_err); +} + if (!local_err) { return; } local_err = NULL; -linux_sys_state_suspend(mode, &local_err); -if (local_err) { +if (linux_sys_state_supports_mode(mode, &local_err)) { +mode_supported = true; +linux_sys_state_suspend(mode, &local_err); +} + +if (!mode_supported) { +error_setg(errp, + "the requested suspend mode is not supported by the guest"); +} else if (local_err) { error_propagate(errp, local_err); } } -- 2.17.1
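The commit message describes a simple control flow. guest_suspend probes each mechanism, runs the first one that is supported, falls through to the next if it fails, and reports "not supported" only if nothing was supported. That flow can be sketched with a table of backends. The descriptor type, the function-pointer interface and the stub backends are hypothetical, not QGA's API. In this sketch, an early return happens only after a backend that was actually attempted has succeeded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical backend descriptor: supports() probes for a mechanism,
 * suspend() runs it and returns 0 on success, nonzero on failure.
 */
typedef struct {
    const char *name;
    bool (*supports)(void);
    int (*suspend)(void);
} SuspendBackend;

enum { SUSPEND_OK = 0, SUSPEND_NOT_SUPPORTED = -1, SUSPEND_FAILED = -2 };

/* Try each backend in order; only supported backends are executed. */
static int guest_suspend_sketch(const SuspendBackend *backends, size_t n,
                                const char **used)
{
    bool mode_supported = false;

    for (size_t i = 0; i < n; i++) {
        if (!backends[i].supports()) {
            continue;                 /* not available: no wasted attempt */
        }
        mode_supported = true;
        if (backends[i].suspend() == 0) {
            *used = backends[i].name; /* first working mechanism wins */
            return SUSPEND_OK;
        }
        /* supported but failed: fall through to the next mechanism */
    }
    return mode_supported ? SUSPEND_FAILED : SUSPEND_NOT_SUPPORTED;
}

/* Example stubs for exercising the sketch; suspend_calls counts attempts. */
static int suspend_calls;
static bool probe_yes(void) { return true; }
static bool probe_no(void) { return false; }
static int act_ok(void) { suspend_calls++; return 0; }
static int act_fail(void) { suspend_calls++; return 1; }
```

In the worst case from the commit message (only the sys state file works), just one suspend attempt is made.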
[Qemu-devel] [PATCH v1 5/6] qga: adding systemd hibernate/suspend/hybrid-sleep support
pmutils isn't supported by newer OSes such as Fedora 27 or Mint. This means that the only suspend option QGA offers for these guests is writing directly into the Linux sys state file. It also means that QGA loses the ability to do hybrid suspend in those guests, since this suspend mode is only available when using pmutils. Newer guests can use systemd facilities to perform all the suspend types QGA supports. The mapping from pmutils to systemd is: - pm-hibernate -> systemctl hibernate - pm-suspend -> systemctl suspend - pm-suspend-hybrid -> systemctl hybrid-sleep To discover whether systemd supports these functions, we inspect the status of the services that implement them. With this patch, we can offer hybrid suspend again for newer guests that no longer have pmutils support. Signed-off-by: Daniel Henrique Barboza --- qga/commands-posix.c | 72 1 file changed, 72 insertions(+) diff --git a/qga/commands-posix.c b/qga/commands-posix.c index d5e3805ce9..6a573de86d 100644 --- a/qga/commands-posix.c +++ b/qga/commands-posix.c @@ -1486,6 +1486,63 @@ out: return ret; } +static bool systemd_supports_mode(SuspendMode mode, Error **errp) +{ +Error *local_err = NULL; +const char *systemctl_args[3] = {"systemd-hibernate", "systemd-suspend", + "systemd-hybrid-sleep"}; +const char *cmd[4] = {"systemctl", "status", systemctl_args[mode], NULL}; +int status; + +status = run_process_child(cmd, &local_err); + +/* + * systemctl status uses LSB return codes so we can expect + * status > 0 and be ok. To assert if the guest has support + * for the selected suspend mode, status should be < 4. 4 is + * the code for unknown service status, the return value when + * the service does not exist. A common value is status = 3 + * (program is not running).
+ */ +if (status > 0 && status < 4) { +return true; +} + +if (local_err) { +error_propagate(errp, local_err); +} + +return false; +} + +static void systemd_suspend(SuspendMode mode, Error **errp) +{ +Error *local_err = NULL; +const char *systemctl_args[3] = {"hibernate", "suspend", "hybrid-sleep"}; +const char *cmd[3] = {"systemctl", systemctl_args[mode], NULL}; +int status; + +status = run_process_child(cmd, &local_err); + +if (status == 0) { +return; +} + +if (status == -1) { +error_setg(errp, "the helper program '%s' was not found", + systemctl_args[mode]); +return; +} + +if (local_err) { +error_propagate(errp, local_err); +} else { +error_setg(errp, "the helper program 'systemctl %s' returned an " + " unexpected exit status code (%d)", + systemctl_args[mode], status); +} +} + static bool pmutils_supports_mode(SuspendMode mode, Error **errp) { Error *local_err = NULL; @@ -1629,6 +1686,14 @@ static void bios_supports_mode(SuspendMode mode, Error **errp) Error *local_err = NULL; bool ret; +ret = systemd_supports_mode(mode, &local_err); +if (ret) { +return; +} +if (local_err) { +error_propagate(errp, local_err); +return; +} ret = pmutils_supports_mode(mode, &local_err); if (ret) { return; @@ -1654,6 +1719,13 @@ static void guest_suspend(SuspendMode mode, Error **errp) return; } +systemd_suspend(mode, &local_err); +if (!local_err) { +return; +} + +local_err = NULL; + pmutils_suspend(mode, &local_err); if (!local_err) { return; -- 2.17.1
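The pmutils-to-systemd mapping in the commit message and the LSB interpretation in the new comment could also be kept in a single table. This is a hypothetical restructuring, not what the patch does: the patch uses per-function arrays indexed by `SuspendMode`. The enum values match the ones introduced in patch 4, and the predicate copies the patch's `status > 0 && status < 4` test.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

typedef enum {
    SUSPEND_MODE_DISK = 0,
    SUSPEND_MODE_RAM = 1,
    SUSPEND_MODE_HYBRID = 2,
} SuspendMode;

/* One row per mode: the pm-utils helper and its systemd equivalents. */
typedef struct {
    const char *pmutils_bin;
    const char *systemctl_verb;
    const char *systemd_unit;
} SuspendCommands;

static const SuspendCommands suspend_commands[] = {
    [SUSPEND_MODE_DISK]   = { "pm-hibernate", "hibernate", "systemd-hibernate" },
    [SUSPEND_MODE_RAM]    = { "pm-suspend", "suspend", "systemd-suspend" },
    [SUSPEND_MODE_HYBRID] = { "pm-suspend-hybrid", "hybrid-sleep",
                              "systemd-hybrid-sleep" },
};

/* LSB codes from "systemctl status", as described in the patch comment. */
enum {
    LSB_STATUS_RUNNING     = 0,
    LSB_STATUS_NOT_RUNNING = 3,
    LSB_STATUS_UNKNOWN     = 4,   /* e.g. the unit does not exist */
};

/* Same predicate as the patch: 1..3 means "unit known, not running". */
static bool systemd_status_means_supported(int status)
{
    return status > LSB_STATUS_RUNNING && status < LSB_STATUS_UNKNOWN;
}
```

Keeping all three columns in one row makes it harder for the per-backend arrays to drift out of order relative to the enum.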
Re: [Qemu-devel] [PATCH] ppc: Include vga cirrus card into the compiling process
Hello David, On 2018-06-19 06:36, David Gibson wrote: Ok. However, your patch doesn't apply against the ppc-for-3.0 tree. It looks like you've made it against a tree including some of BALATON Zoltan's proposed but not yet merged patches. Please make sure your patches are against the current ppc-for-3.0 tree before posting. Okay, I'm sorry for the wrong timing. It is okay to wait for Zoltan's patch queue to be applied before applying this patch, as I don't want to introduce new conflicts in those patches. Bye Sebastian
[Qemu-devel] [PATCH v1 3/6] qga: guest_suspend: decoupling pm-utils and sys logic
Following the same logic as the previous patch, let's also decouple the suspend logic from guest_suspend into specialized functions, one for each strategy we support at this moment. Signed-off-by: Daniel Henrique Barboza --- qga/commands-posix.c | 170 +++ 1 file changed, 108 insertions(+), 62 deletions(-) diff --git a/qga/commands-posix.c b/qga/commands-posix.c index 89ffd8dc88..a2870f9ab9 100644 --- a/qga/commands-posix.c +++ b/qga/commands-posix.c @@ -1509,6 +1509,65 @@ out: return ret; } +static void pmutils_suspend(int suspend_mode, Error **errp) +{ +Error *local_err = NULL; +const char *pmutils_bin; +char *pmutils_path; +pid_t pid; +int status; + +switch (suspend_mode) { + +case SUSPEND_MODE_DISK: +pmutils_bin = "pm-hibernate"; +break; +case SUSPEND_MODE_RAM: +pmutils_bin = "pm-suspend"; +break; +case SUSPEND_MODE_HYBRID: +pmutils_bin = "pm-suspend-hybrid"; +break; +default: +error_setg(errp, "unknown guest suspend mode"); +return; +} + +pmutils_path = g_find_program_in_path(pmutils_bin); +if (!pmutils_path) { +error_setg(errp, "the helper program '%s' was not found", pmutils_bin); +return; +} + +pid = fork(); +if (!pid) { +setsid(); +execle(pmutils_path, pmutils_bin, NULL, environ); +/* + * If we get here execle() has failed.
+ */ +_exit(EXIT_FAILURE); +} else if (pid < 0) { +error_setg_errno(errp, errno, "failed to create child process"); +goto out; +} + +ga_wait_child(pid, &status, &local_err); +if (local_err) { +error_propagate(errp, local_err); +goto out; +} + +if (WEXITSTATUS(status)) { +error_setg(errp, + "the helper program '%s' returned an unexpected exit status" + " code (%d)", pmutils_path, WEXITSTATUS(status)); +} + +out: +g_free(pmutils_path); +} + static bool linux_sys_state_supports_mode(int suspend_mode, Error **errp) { const char *sysfile_str; @@ -1545,64 +1604,28 @@ static bool linux_sys_state_supports_mode(int suspend_mode, Error **errp) return false; } -static void bios_supports_mode(int suspend_mode, Error **errp) -{ -Error *local_err = NULL; -bool ret; - -ret = pmutils_supports_mode(suspend_mode, &local_err); -if (ret) { -return; -} -if (local_err) { -error_propagate(errp, local_err); -return; -} -ret = linux_sys_state_supports_mode(suspend_mode, errp); -if (!ret) { -error_setg(errp, - "the requested suspend mode is not supported by the guest"); -return; -} -} - -static void guest_suspend(int suspend_mode, Error **errp) +static void linux_sys_state_suspend(int suspend_mode, Error **errp) { Error *local_err = NULL; -const char *pmutils_bin, *sysfile_str; -char *pmutils_path; +const char *sysfile_str; pid_t pid; int status; -bios_supports_mode(suspend_mode, &local_err); -if (local_err) { -error_propagate(errp, local_err); -return; -} - switch (suspend_mode) { case SUSPEND_MODE_DISK: -pmutils_bin = "pm-hibernate"; sysfile_str = "disk"; break; case SUSPEND_MODE_RAM: -pmutils_bin = "pm-suspend"; sysfile_str = "mem"; break; -case SUSPEND_MODE_HYBRID: -pmutils_bin = "pm-suspend-hybrid"; -sysfile_str = NULL; -break; default: error_setg(errp, "unknown guest suspend mode"); return; } -pmutils_path = g_find_program_in_path(pmutils_bin); - pid = fork(); -if (pid == 0) { +if (!pid) { /* child */ int fd; @@ -1611,19 +1634,6 @@ static void guest_suspend(int suspend_mode, Error 
**errp)
reopen_fd_to_null(1); reopen_fd_to_null(2); -if (pmutils_path) { -execle(pmutils_path, pmutils_bin, NULL, environ); -} - -/* - * If we get here either pm-utils is not installed or execle() has - * failed. Let's try the manual method if the caller wants it. - */ - -if (!sysfile_str) { -_exit(EXIT_FAILURE); -} - fd = open(LINUX_SYS_STATE_FILE, O_WRONLY); if (fd < 0) { _exit(EXIT_FAILURE); @@ -1636,27 +1646,63 @@ static void guest_suspend(int suspend_mode, Error **errp) _exit(EXIT_SUCCESS); } else if (pid < 0) { error_setg_errno(errp, errno, "failed to create child process"); -goto out; +return; } ga_wait_child(pid, &status, &local_err); if (local_err) { error_propagate(errp, local_err); -goto out; -} - -if (!WIFEXITED(status)) { -error_setg(errp, "child process has terminated abnormally"); -goto out; +return; } if
[Qemu-devel] [PATCH v1 1/6] qga: refactoring qmp_guest_suspend_* functions
To be able to add new suspend mechanisms, we need to detach the existing QMP functions from the current implementation specifics. At this moment we have functions such as qmp_guest_suspend_ram calling bios_supports_mode and guest_suspend, passing the pmutils command and arguments as parameters. This patch removes this logic from the QMP functions, moving it into the respective functions that have to decide which binary to use. Signed-off-by: Daniel Henrique Barboza --- qga/commands-posix.c | 87 1 file changed, 55 insertions(+), 32 deletions(-) diff --git a/qga/commands-posix.c b/qga/commands-posix.c index eae817191b..63c49791a4 100644 --- a/qga/commands-posix.c +++ b/qga/commands-posix.c @@ -1438,15 +1438,38 @@ qmp_guest_fstrim(bool has_minimum, int64_t minimum, Error **errp) #define LINUX_SYS_STATE_FILE "/sys/power/state" #define SUSPEND_SUPPORTED 0 #define SUSPEND_NOT_SUPPORTED 1 +#define SUSPEND_MODE_DISK 1 +#define SUSPEND_MODE_RAM 2 +#define SUSPEND_MODE_HYBRID 3 -static void bios_supports_mode(const char *pmutils_bin, const char *pmutils_arg, - const char *sysfile_str, Error **errp) +static void bios_supports_mode(int suspend_mode, Error **errp) { Error *local_err = NULL; +const char *pmutils_arg, *sysfile_str; +const char *pmutils_bin = "pm-is-supported"; char *pmutils_path; pid_t pid; int status; +switch (suspend_mode) { + +case SUSPEND_MODE_DISK: +pmutils_arg = "--hibernate"; +sysfile_str = "disk"; +break; +case SUSPEND_MODE_RAM: +pmutils_arg = "--suspend"; +sysfile_str = "mem"; +break; +case SUSPEND_MODE_HYBRID: +pmutils_arg = "--suspend-hybrid"; +sysfile_str = NULL; +break; +default: +error_setg(errp, "guest suspend mode not supported"); +return; +} + pmutils_path = g_find_program_in_path(pmutils_bin); pid = fork(); @@ -1523,14 +1546,39 @@ out: g_free(pmutils_path); } -static void guest_suspend(const char *pmutils_bin, const char *sysfile_str, - Error **errp) +static void guest_suspend(int suspend_mode, Error **errp) { Error *local_err = NULL;
+const char *pmutils_bin, *sysfile_str; char *pmutils_path; pid_t pid; int status; +bios_supports_mode(suspend_mode, &local_err); +if (local_err) { +error_propagate(errp, local_err); +return; +} + +switch (suspend_mode) { + +case SUSPEND_MODE_DISK: +pmutils_bin = "pm-hibernate"; +sysfile_str = "disk"; +break; +case SUSPEND_MODE_RAM: +pmutils_bin = "pm-suspend"; +sysfile_str = "mem"; +break; +case SUSPEND_MODE_HYBRID: +pmutils_bin = "pm-suspend-hybrid"; +sysfile_str = NULL; +break; +default: +error_setg(errp, "unknown guest suspend mode"); +return; +} + pmutils_path = g_find_program_in_path(pmutils_bin); pid = fork(); @@ -1593,42 +1641,17 @@ out: void qmp_guest_suspend_disk(Error **errp) { -Error *local_err = NULL; - -bios_supports_mode("pm-is-supported", "--hibernate", "disk", &local_err); -if (local_err) { -error_propagate(errp, local_err); -return; -} - -guest_suspend("pm-hibernate", "disk", errp); +guest_suspend(SUSPEND_MODE_DISK, errp); } void qmp_guest_suspend_ram(Error **errp) { -Error *local_err = NULL; - -bios_supports_mode("pm-is-supported", "--suspend", "mem", &local_err); -if (local_err) { -error_propagate(errp, local_err); -return; -} - -guest_suspend("pm-suspend", "mem", errp); +guest_suspend(SUSPEND_MODE_RAM, errp); } void qmp_guest_suspend_hybrid(Error **errp) { -Error *local_err = NULL; - -bios_supports_mode("pm-is-supported", "--suspend-hybrid", NULL, - &local_err); -if (local_err) { -error_propagate(errp, local_err); -return; -} - -guest_suspend("pm-suspend-hybrid", NULL, errp); +guest_suspend(SUSPEND_MODE_HYBRID, errp); } static GuestNetworkInterfaceList * -- 2.17.1
[Qemu-devel] [PATCH v1 2/6] qga: bios_supports_mode: decoupling pm-utils and sys logic
In bios_supports_mode there is a check of whether the chosen suspend mode is supported by the pmutils tools and, if not, whether the Linux sys state file supports it. Both checks are done in the same function, one after the other, and this works for now. But when adding a new suspend mechanism that will not necessarily follow the same return-0-or-1 logic as pmutils, this code will be hard to deal with. This patch decouples the two existing checks into their own functions, pmutils_supports_mode and linux_sys_state_supports_mode, which in turn are used inside bios_supports_mode. The existing logic is kept, but it is now easier to extend. Signed-off-by: Daniel Henrique Barboza --- qga/commands-posix.c | 116 +-- 1 file changed, 68 insertions(+), 48 deletions(-) diff --git a/qga/commands-posix.c b/qga/commands-posix.c index 63c49791a4..89ffd8dc88 100644 --- a/qga/commands-posix.c +++ b/qga/commands-posix.c @@ -1442,75 +1442,43 @@ qmp_guest_fstrim(bool has_minimum, int64_t minimum, Error **errp) #define SUSPEND_MODE_RAM 2 #define SUSPEND_MODE_HYBRID 3 -static void bios_supports_mode(int suspend_mode, Error **errp) +static bool pmutils_supports_mode(int suspend_mode, Error **errp) { Error *local_err = NULL; -const char *pmutils_arg, *sysfile_str; +const char *pmutils_arg; const char *pmutils_bin = "pm-is-supported"; char *pmutils_path; pid_t pid; int status; +bool ret = false; switch (suspend_mode) { case SUSPEND_MODE_DISK: pmutils_arg = "--hibernate"; -sysfile_str = "disk"; break; case SUSPEND_MODE_RAM: pmutils_arg = "--suspend"; -sysfile_str = "mem"; break; case SUSPEND_MODE_HYBRID: pmutils_arg = "--suspend-hybrid"; -sysfile_str = NULL; break; default: -error_setg(errp, "guest suspend mode not supported"); -return; +return ret; } pmutils_path = g_find_program_in_path(pmutils_bin); +if (!pmutils_path) { +return ret; +} pid = fork(); if (!pid) { -char buf[32]; /* hopefully big enough */ -ssize_t ret; -int fd; - setsid(); -reopen_fd_to_null(0);
-reopen_fd_to_null(1); -reopen_fd_to_null(2); - -if (pmutils_path) { -execle(pmutils_path, pmutils_bin, pmutils_arg, NULL, environ); -} - +execle(pmutils_path, pmutils_bin, pmutils_arg, NULL, environ); /* - * If we get here either pm-utils is not installed or execle() has - * failed. Let's try the manual method if the caller wants it. + * If we get here execle() has failed. */ - -if (!sysfile_str) { -_exit(SUSPEND_NOT_SUPPORTED); -} - -fd = open(LINUX_SYS_STATE_FILE, O_RDONLY); -if (fd < 0) { -_exit(SUSPEND_NOT_SUPPORTED); -} - -ret = read(fd, buf, sizeof(buf)-1); -if (ret <= 0) { -_exit(SUSPEND_NOT_SUPPORTED); -} -buf[ret] = '\0'; - -if (strstr(buf, sysfile_str)) { -_exit(SUSPEND_SUPPORTED); -} - _exit(SUSPEND_NOT_SUPPORTED); } else if (pid < 0) { error_setg_errno(errp, errno, "failed to create child process"); @@ -1523,17 +1491,11 @@ static void bios_supports_mode(int suspend_mode, Error **errp) goto out; } -if (!WIFEXITED(status)) { -error_setg(errp, "child process has terminated abnormally"); -goto out; -} - switch (WEXITSTATUS(status)) { case SUSPEND_SUPPORTED: +ret = true; goto out; case SUSPEND_NOT_SUPPORTED: -error_setg(errp, - "the requested suspend mode is not supported by the guest"); goto out; default: error_setg(errp, @@ -1544,6 +1506,64 @@ static void bios_supports_mode(int suspend_mode, Error **errp) out: g_free(pmutils_path); +return ret; +} + +static bool linux_sys_state_supports_mode(int suspend_mode, Error **errp) +{ +const char *sysfile_str; +char buf[32]; /* hopefully big enough */ +int fd; +ssize_t ret; + +switch (suspend_mode) { + +case SUSPEND_MODE_DISK: +sysfile_str = "disk"; +break; +case SUSPEND_MODE_RAM: +sysfile_str = "mem"; +break; +default: +return false; +} + +fd = open(LINUX_SYS_STATE_FILE, O_RDONLY); +if (fd < 0) { +return false; +} + +ret = read(fd, buf, sizeof(buf) - 1); +if (ret <= 0) { +return false; +} +buf[ret] = '\0'; + +if (strstr(buf, sysfile_str)) { +return true; +} +return false; +} + +static void bios_supports_mode(int 
suspend_mode, Error **errp) +{ +Error *local_err = NULL; +bool ret; + +ret = pmutils_supports_mode(suspend_mode, &local_err); +if (ret) { +return; +} +if
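linux_sys_state_supports_mode() reads /sys/power/state, a whitespace-separated list such as "freeze mem disk", and looks for "mem" or "disk". The sketch below separates parsing from I/O so the parser can be tested without /sys. It uses an exact token match instead of `strstr()`; with the usual contents of that file both give the same answer, so the token match is a stylistic choice. The function names are hypothetical, and the reader closes its descriptor on every path.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <string.h>
#include <unistd.h>

/* True if the whitespace-separated list `contents` has the token `mode`. */
static bool state_list_has(const char *contents, const char *mode)
{
    size_t len = strlen(mode);
    const char *p = contents;

    if (len == 0) {
        return false;
    }
    while (*p) {
        p += strspn(p, " \t\n");           /* skip separators */
        size_t tok = strcspn(p, " \t\n");  /* length of the current token */
        if (tok == len && strncmp(p, mode, len) == 0) {
            return true;
        }
        p += tok;
    }
    return false;
}

/* Read at most size - 1 bytes of `path` into buf and NUL-terminate it. */
static bool read_state_file(const char *path, char *buf, size_t size)
{
    int fd = open(path, O_RDONLY);
    ssize_t n;

    if (fd < 0) {
        return false;
    }
    n = read(fd, buf, size - 1);
    close(fd);                             /* closed on every path */
    if (n <= 0) {
        return false;
    }
    buf[n] = '\0';
    return true;
}
```

A caller would combine the two functions, for example `read_state_file("/sys/power/state", buf, sizeof(buf)) && state_list_has(buf, "mem")`.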
[Qemu-devel] [PATCH v1 4/6] qga: removing switch statements, adding run_process_child
This is a cleanup of the resulting code after detaching pmutils and Linux sys state file logic: - remove the SUSPEND_MODE_* macros and use an enumeration instead. At the same time, drop the switch statements at the start of each function and use the enumeration index to get the right binary/argument; - create a new function called run_process_child(). This function creates a child process and executes a (shell) command, returning the command return code. This is a common operation in the pmutils functions and will be used in the systemd implementation as well, so this function will avoid code repetition. There are more places inside commands-posix.c where this new run_process_child function can also be used, but one step at a time. Signed-off-by: Daniel Henrique Barboza --- qga/commands-posix.c | 190 +-- 1 file changed, 76 insertions(+), 114 deletions(-) diff --git a/qga/commands-posix.c b/qga/commands-posix.c index a2870f9ab9..d5e3805ce9 100644 --- a/qga/commands-posix.c +++ b/qga/commands-posix.c @@ -1438,152 +1438,122 @@ qmp_guest_fstrim(bool has_minimum, int64_t minimum, Error **errp) #define LINUX_SYS_STATE_FILE "/sys/power/state" #define SUSPEND_SUPPORTED 0 #define SUSPEND_NOT_SUPPORTED 1 -#define SUSPEND_MODE_DISK 1 -#define SUSPEND_MODE_RAM 2 -#define SUSPEND_MODE_HYBRID 3 -static bool pmutils_supports_mode(int suspend_mode, Error **errp) +typedef enum { +SUSPEND_MODE_DISK = 0, +SUSPEND_MODE_RAM = 1, +SUSPEND_MODE_HYBRID = 2, +} SuspendMode; + +static int run_process_child(const char *command[], Error **errp) { Error *local_err = NULL; -const char *pmutils_arg; -const char *pmutils_bin = "pm-is-supported"; -char *pmutils_path; +char *cmd_path = g_find_program_in_path(command[0]); pid_t pid; -int status; -bool ret = false; - -switch (suspend_mode) { - -case SUSPEND_MODE_DISK: -pmutils_arg = "--hibernate"; -break; -case SUSPEND_MODE_RAM: -pmutils_arg = "--suspend"; -break; -case SUSPEND_MODE_HYBRID: -pmutils_arg = "--suspend-hybrid"; -break; -default: -return 
ret; -} +int status, ret = -1; -pmutils_path = g_find_program_in_path(pmutils_bin); -if (!pmutils_path) { +if (!cmd_path) { return ret; } pid = fork(); if (!pid) { setsid(); -execle(pmutils_path, pmutils_bin, pmutils_arg, NULL, environ); /* - * If we get here execle() has failed. + * execve receives a char* const argv[] as second arg but we're + * receiving a const char*[]. Since execve does not change the + * array contents it's tolerable to cast here. */ -_exit(SUSPEND_NOT_SUPPORTED); +execve(cmd_path, (char* const*)command, environ); +_exit(errno); } else if (pid < 0) { error_setg_errno(errp, errno, "failed to create child process"); +ret = EXIT_FAILURE; goto out; } ga_wait_child(pid, &status, &local_err); if (local_err) { error_propagate(errp, local_err); +ret = EXIT_FAILURE; goto out; } -switch (WEXITSTATUS(status)) { -case SUSPEND_SUPPORTED: -ret = true; -goto out; -case SUSPEND_NOT_SUPPORTED: -goto out; -default: -error_setg(errp, - "the helper program '%s' returned an unexpected exit status" - " code (%d)", pmutils_path, WEXITSTATUS(status)); -goto out; -} +ret = WEXITSTATUS(status); out: -g_free(pmutils_path); +g_free(cmd_path); return ret; } -static void pmutils_suspend(int suspend_mode, Error **errp) +static bool pmutils_supports_mode(SuspendMode mode, Error **errp) { Error *local_err = NULL; -const char *pmutils_bin; -char *pmutils_path; -pid_t pid; +const char *pmutils_args[3] = {"--hibernate", "--suspend", + "--suspend-hybrid"}; +const char *cmd[3] = {"pm-is-supported", pmutils_args[mode], NULL}; int status; -switch (suspend_mode) { - -case SUSPEND_MODE_DISK: -pmutils_bin = "pm-hibernate"; -break; -case SUSPEND_MODE_RAM: -pmutils_bin = "pm-suspend"; -break; -case SUSPEND_MODE_HYBRID: -pmutils_bin = "pm-suspend-hybrid"; -break; -default: -error_setg(errp, "unknown guest suspend mode"); -return; -} +status = run_process_child(cmd, &local_err); -pmutils_path = g_find_program_in_path(pmutils_bin); -if (!pmutils_path) { -error_setg(errp, "the helper program '%s' was not
found", pmutils_bin); -return; +if (status == SUSPEND_SUPPORTED) { +return true; } -pid = fork(); -if (!pid) { -setsid(); -execle(pmutils_path, pmutils_bin, NULL, environ); -/* - * If we get here execle() has failed. - */ -
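The run_process_child() helper described in this patch follows the usual fork/exec/wait pattern. Below is a minimal standalone sketch of that pattern, not the patch's code. It differs in three ways:
- it uses execvp() for the PATH lookup instead of g_find_program_in_path() plus execve();
- it uses 127 as a hypothetical "could not execute" code instead of `_exit(errno)`;
- it checks WIFEXITED() before trusting WEXITSTATUS().

The argv cast is spelled `(char *const *)`; whether checkpatch.pl accepts that spelling has not been verified here.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Run command[0] (looked up in PATH) with argv `command`, wait for it and
 * return its exit code. Returns 127 if the program could not be executed
 * (shell convention) and -1 if fork()/waitpid() failed or the child did
 * not exit normally (e.g. it was killed by a signal).
 */
static int run_command_sketch(const char *command[])
{
    pid_t pid = fork();
    int status;

    if (pid < 0) {
        return -1;
    }
    if (pid == 0) {
        setsid();
        /* execvp() takes char *const argv[] but does not modify the
         * strings, so dropping const in this cast is tolerable. */
        execvp(command[0], (char *const *)command);
        _exit(127);
    }
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR) {
            return -1;
        }
    }
    if (!WIFEXITED(status)) {
        return -1;       /* abnormal termination is not an exit code */
    }
    return WEXITSTATUS(status);
}
```

Returning a distinct value for abnormal termination keeps a child killed by a signal from being mistaken for one that exited with an ordinary status.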
[Qemu-devel] [PATCH v1 0/6] QGA: systemd hibernate/suspend/hybrid-sleep
This series adds systemd suspend support for QGA. Some newer guests don't have pmutils anymore, leaving us with just the Linux state file mechanism to suspend the guest OS, which does not support hybrid-sleep. With this implementation, QGA is now able to hybrid suspend newer guests again. Most of the patches are cleanups in the existing suspend code, aiming at both simplifying it and making it easier to extend it with systemd. Note: checkpatch.pl complains about patch 4: ERROR: "(foo* const*)" should be "(foo * const*)" #94: FILE: qga/commands-posix.c:1467: +execve(cmd_path, (char* const*)command, environ); ERROR: space required before that '*' (ctx:VxB) #94: FILE: qga/commands-posix.c:1467: +execve(cmd_path, (char* const*)command, environ); Not sure how to make it understand that this is a cast rather than a math operation. Suggestions welcome. Daniel Henrique Barboza (6): qga: refactoring qmp_guest_suspend_* functions qga: bios_supports_mode: decoupling pm-utils and sys logic qga: guest_suspend: decoupling pm-utils and sys logic qga: removing switch statements, adding run_process_child qga: adding systemd hibernate/suspend/hybrid-sleep support qga: removing bios_supports_mode qga/commands-posix.c | 315 --- 1 file changed, 210 insertions(+), 105 deletions(-) -- 2.17.1
Re: [Qemu-devel] [PATCH] [RFC v2] aio: properly bubble up errors from initialization
On 06/15/2018 12:47 PM, Nishanth Aravamudan via Qemu-devel wrote: laio_init() can fail for a couple of reasons, which will lead to a NULL pointer dereference in laio_attach_aio_context(). To solve this, add an aio_setup_linux_aio() function which is called before aio_get_linux_aio() where it is called currently, and which propogates setup errors up. The signature of aio_get_linux_aio() was not s/propogates/propagates/ modified, because it seems preferable to return the actual errno from the possible failing initialization calls. With respect to the error-handling in the file-posix.c, we properly bubble any errors up in raw_co_prw and in the cases of raw_aio_{,un}plug, the result is the same as if s->use_linux_aio was not set (but there is no bubbling up). In all three cases, if the setup function fails, we fall back to the thread pool and an error message is emitted. It is trivial to make qemu segfault in my testing. Set /proc/sys/fs/aio-max-nr to 0 and start a guest with aio=native,cache=directsync. With this patch, the guest successfully starts (but obviously isn't using native AIO). Setting aio-max-nr back up to a reasonable value, AIO contexts are consumed normally. Signed-off-by: Nishanth Aravamudan --- Changes from v1 -> v2: When posting a v2, it's best to post as a new thread, rather than in-reply-to the v1 thread, so that automated tooling knows to check the new patch. More patch submission tips at https://wiki.qemu.org/Contribute/SubmitAPatch Rather than affect virtio-scsi/blk at all, make all the changes internal to file-posix.c. Thanks to Kevin Wolf for the suggested change.
--- block/file-posix.c | 24 block/linux-aio.c | 15 ++- include/block/aio.h | 3 +++ include/block/raw-aio.h | 2 +- stubs/linux-aio.c | 2 +- util/async.c| 15 --- 6 files changed, 51 insertions(+), 10 deletions(-) diff --git a/block/file-posix.c b/block/file-posix.c index 07bb061fe4..2415d09bf1 100644 --- a/block/file-posix.c +++ b/block/file-posix.c @@ -1665,6 +1665,14 @@ static int coroutine_fn raw_co_prw(BlockDriverState *bs, uint64_t offset, type |= QEMU_AIO_MISALIGNED; #ifdef CONFIG_LINUX_AIO } else if (s->use_linux_aio) { +int rc; +rc = aio_setup_linux_aio(bdrv_get_aio_context(bs)); +if (rc != 0) { +error_report("Unable to use native AIO, falling back to " + "thread pool."); In general, error_report() should not output a trailing '.'. +s->use_linux_aio = 0; +return rc; Wait - the message claims we are falling back, but the non-zero return code sounds like we are returning an error instead of falling back. (My preference - if the user requested something and we can't do it, it's better to error than to fall back to something that does not match the user's request). +} LinuxAioState *aio = aio_get_linux_aio(bdrv_get_aio_context(bs)); assert(qiov->size == bytes); return laio_co_submit(bs, aio, s->fd, offset, qiov, type); @@ -1695,6 +1703,14 @@ static void raw_aio_plug(BlockDriverState *bs) #ifdef CONFIG_LINUX_AIO BDRVRawState *s = bs->opaque; if (s->use_linux_aio) { +int rc; +rc = aio_setup_linux_aio(bdrv_get_aio_context(bs)); +if (rc != 0) { +error_report("Unable to use native AIO, falling back to " + "thread pool."); +s->use_linux_aio = 0; Should s->use_linux_aio be a bool instead of an int? -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3266 Virtualization: qemu.org | libvirt.org
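Eric's review point is that the log message and the return value must agree: either really fall back to the thread pool and succeed, or really fail. The sketch below shows both policies. All names are hypothetical stand-ins, not QEMU's file-posix.c code. It uses `bool` for the flag, as suggested, and keeps the messages free of a trailing '.'.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins; not QEMU's real types or functions. */
typedef struct {
    bool use_linux_aio;     /* bool rather than int */
    bool allow_fallback;    /* policy: fall back, or fail the request */
} RawStateSketch;

static int fake_setup_result;   /* what the "native AIO setup" returns */

static int setup_native_aio(void)
{
    return fake_setup_result;   /* 0 on success, negative errno on error */
}

/*
 * Returns 0 when the request can proceed (natively or via the thread
 * pool) and a negative errno when it must fail. The message printed
 * always matches what actually happens.
 */
static int prepare_request(RawStateSketch *s)
{
    int rc;

    if (!s->use_linux_aio) {
        return 0;                       /* thread pool already in use */
    }
    rc = setup_native_aio();
    if (rc == 0) {
        return 0;                       /* native AIO ready */
    }
    if (s->allow_fallback) {
        fprintf(stderr, "Unable to use native AIO, falling back to thread pool\n");
        s->use_linux_aio = false;
        return 0;                       /* really fall back */
    }
    fprintf(stderr, "Unable to use native AIO\n");
    return rc;                          /* really fail */
}
```

Eric states a preference for the failing variant when the user explicitly requested native AIO.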
Re: [Qemu-devel] [PULL 08/26] qobject: Move block-specific qdict code to block-qdict.c
On 06/15/2018 09:20 AM, Kevin Wolf wrote: From: Markus Armbruster Pure code motion, except for two brace placements and a comment tweaked to appease checkpatch. Signed-off-by: Markus Armbruster Reviewed-by: Kevin Wolf Signed-off-by: Kevin Wolf --- qobject/block-qdict.c | 640 qobject/qdict.c | 629 tests/check-block-qdict.c | 655 ++ tests/check-qdict.c | 642 - MAINTAINERS | 2 + qobject/Makefile.objs | 1 + tests/Makefile.include| 4 + 7 files changed, 1302 insertions(+), 1271 deletions(-) create mode 100644 qobject/block-qdict.c create mode 100644 tests/check-block-qdict.c and missing a change to tests/.gitignore, so that tests/check-block-qdict now shows up as an untracked file on an in-tree build. (We really should follow through with our threat of renaming all the tests to use a consistent suffix-based pattern, as it's much easier to gitignore a suffix than a prefix) -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3266 Virtualization: qemu.org | libvirt.org
Re: [Qemu-devel] [virtio-dev] Re: [PATCH 2/3] Add "Group Identifier" support to PCIe bridges.
On Tue, Jun 19, 2018 at 02:02:10PM -0500, Venu Busireddy wrote: > On 2018-06-19 21:53:01 +0300, Michael S. Tsirkin wrote: > > On Tue, Jun 19, 2018 at 01:36:17PM -0500, Venu Busireddy wrote: > > > On 2018-06-19 21:21:23 +0300, Michael S. Tsirkin wrote: > > > > On Tue, Jun 19, 2018 at 01:14:06PM -0500, Venu Busireddy wrote: > > > > > On 2018-06-19 20:24:12 +0300, Michael S. Tsirkin wrote: > > > > > > On Tue, Jun 19, 2018 at 11:32:26AM -0500, Venu Busireddy wrote: > > > > > > > Add a "Vendor-Specific" capability to the PCIe bridge, to contain > > > > > > > the > > > > > > > "Group Identifier" (UUID) that will be used to pair a virtio > > > > > > > device with > > > > > > > the passthrough device attached to that bridge. > > > > > > > > > > > > > > This capability is added to the bridge iff the "uuid" option is > > > > > > > specified > > > > > > > for the bridge device, via the qemu command line. Also, the > > > > > > > bridge's > > > > > > > Device ID is changed to PCI_VENDOR_ID_REDHAT, and Vendor ID is > > > > > > > changed > > > > > > > to PCI_DEVICE_ID_REDHAT_PCIE_BRIDGE (from the default values), > > > > > > > when the > > > > > > > "uuid" option is present. > > > > > > > > > > > > > > Signed-off-by: Venu Busireddy > > > > > > > > > > > > I don't see why we should add it to all bridges. > > > > > > Let's just add it to ones that already have the RH vendor ID? > > > > > > > > > > No. I am not adding the capability to all bridges. > > > > > > > > > > In the earlier discussions, we agreed that the bridge be left as > > > > > Intel bridge if we do not intend to use it for storing the pairing > > > > > information. If we do intend to store the pairing information in the > > > > > bridge, we wanted to change the bridge's Vendor ID to RH Vendor ID to > > > > > avoid confusion. In other words, bridge's with RH Vendor ID come into > > > > > existence only when there is an intent to store the pairing > > > > > information > > > > > in the bridge. 
> > > > > > > > > > Accordingly, if the "uuid" option is specified for the bridge, it > > > > > is assumed that the user intends to use the bridge for storing the > > > > > pairing information, and hence, the capability is added to the bridge, > > > > > and the Vendor ID is changed to RH Vendor ID. If the "uuid" option > > > > > is not specified, the bridge remains as Intel bridge, and without the > > > > > vendor-specific capability. > > > > > > > > > > Venu > > > > > > > > Yes but the way to do it is not to tweak the vendor and device ID, > > > > instead, just add the UUID property to bridges that already have the > > > > correct vendor and device id. > > > > > > I was using ioh3420 as the bridge device, because that is what is > > > recommended here: > > > > > > https://git.qemu.org/?p=qemu.git;a=blob_plain;f=docs/pcie.txt;hb=HEAD > > > > > > ioh3420 defaults to the Intel Vendor ID. Hence the tweak to change the > > > Vendor ID to RH Vendor ID. > > > > > > Is there another bridge device other than ioh3420 that I should use? > > > what device do you suggest? > > > > > > Thanks, > > > > > > Venu > > > > For pci, use hw/pci-bridge/pci_bridge_dev.c > > Maybe allocate a special ID for grouping bridges. > > > > For express, add your own downstream port. > > Specifically, on the command line, what device does the user specify? > For example: > > qemu-system-x86_64 --device ${Bridge_Device},uuid="uuid string", > > What does the user specify for ${Bridge_Device} from the following: > > "i82801b11-bridge", bus PCI > "ioh3420", bus PCI, desc "Intel IOH device id 3420 PCIE Root Port" > "pci-bridge", bus PCI, desc "Standard PCI Bridge" This one. Or add pci-bridge-group. 
> "pci-bridge-seat", bus PCI, desc "Standard PCI Bridge (multiseat)" > "pcie-pci-bridge", bus PCI > "pcie-root-port", bus PCI, desc "PCI Express Root Port" > "pxb", bus PCI, desc "PCI Expander Bridge" > "pxb-pcie", bus PCI, desc "PCI Express Expander Bridge" > "usb-host", bus usb-bus > "usb-hub", bus usb-bus > "vfio-pci-igd-lpc-bridge", bus PCI, desc "VFIO dummy ISA/LPC bridge for IGD > assignment" > "x3130-upstream", bus PCI, desc "TI X3130 Upstream Port of PCI Express Switch" > "xio3130-downstream", bus PCI, desc "TI X3130 Downstream Port of PCI Express > Switch" > > Or, are you suggesting that I add a new type of device? If latter, what > should it be called? > > Thanks, > For express, add pcie-downstream or pcie-downstream-group. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > --- > > > > > > > hw/pci-bridge/ioh3420.c| 2 ++ > > > > > > > hw/pci-bridge/pcie_root_port.c | 7 +++ > > > > > > > hw/pci/pci_bridge.c| 32 > > > > > > > > > > > > > > include/hw/pci/pci.h | 2 ++ > > > > > > > include/hw/pci/pcie.h | 1 + > > > > > > > include/hw/pci/pcie_port.h | 1 + > > > > > > > 6 files changed, 45 insertions(+) > > > > > > > > > > > > > > diff --git
Re: [Qemu-devel] [PATCH v5 2/6] nbd/server: refactor NBDExportMetaContexts
On 06/09/2018 10:17 AM, Vladimir Sementsov-Ogievskiy wrote:
Use NBDExport pointer instead of just export name: there no needs to

s/no needs/is no need/

store duplicated name in the struct, moreover, NBDExport will be used
further.

Signed-off-by: Vladimir Sementsov-Ogievskiy
---
 nbd/server.c | 23 +++
 1 file changed, 11 insertions(+), 12 deletions(-)

@@ -399,10 +399,9 @@ static int nbd_negotiate_handle_list(NBDClient *client, Error **errp)
     return nbd_negotiate_send_rep(client, NBD_REP_ACK, errp);
 }

-static void nbd_check_meta_export_name(NBDClient *client)
+static void nbd_check_meta_export(NBDClient *client)
 {
-    client->export_meta.valid &= !strcmp(client->exp->name,
-                                         client->export_meta.export_name);
+    client->export_meta.valid &= client->exp == client->export_meta.exp;

Changes from string comparison to pointer comparison...

@@ -853,15 +852,15 @@ static int nbd_negotiate_meta_queries(NBDClient *client,
     memset(meta, 0, sizeof(*meta));

-    ret = nbd_opt_read_name(client, meta->export_name, NULL, errp);
+    ret = nbd_opt_read_name(client, export_name, NULL, errp);
     if (ret <= 0) {
         return ret;
     }

-    exp = nbd_export_find(meta->export_name);
-    if (exp == NULL) {
+    meta->exp = nbd_export_find(export_name);
+    if (meta->exp == NULL) {

...by remembering the results of the string comparison performed under the hood.

Looks good.
Reviewed-by: Eric Blake

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org
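A hedged illustration of why the pointer comparison gives the same answer as the string comparison: as long as export names are unique and the export is not removed in between, looking the name up once and keeping the pointer is equivalent to re-running strcmp() later. Export, export_find() and MetaContexts are simplified hypothetical stand-ins, not the real nbd/server.c definitions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for NBDExport; only the name matters here. */
typedef struct Export {
    const char *name;
} Export;

static Export exports[] = { { "disk0" }, { "disk1" } };

/* The string comparison happens once, when the client names the export. */
static Export *export_find(const char *name)
{
    for (size_t i = 0; i < sizeof(exports) / sizeof(exports[0]); i++) {
        if (strcmp(exports[i].name, name) == 0) {
            return &exports[i];
        }
    }
    return NULL;
}

/* Hypothetical stand-in for NBDExportMetaContexts. */
typedef struct {
    Export *exp;   /* remembered lookup result instead of a copied name */
    bool valid;
} MetaContexts;

/* Later validation is a plain pointer comparison. */
static void check_meta_export(MetaContexts *meta, const Export *current)
{
    meta->valid &= current == meta->exp;
}
```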
Re: [Qemu-devel] [PATCH v2 05/11] hw/arm/virt: GICv3 DT node with one or two redistributor regions
On 19 June 2018 at 20:53, Laszlo Ersek wrote: > Hi Eric, > > sorry about the late followup. I have one question (mainly for Ard): > > On 06/15/18 16:28, Eric Auger wrote: >> This patch allows the creation of a GICv3 node with 1 or 2 >> redistributor regions depending on the number of smu_cpus. >> The second redistributor region is located just after the >> existing RAM region, at 256GB and contains up to up to 512 vcpus. >> >> Please refer to kernel documentation for further node details: >> Documentation/devicetree/bindings/interrupt-controller/arm,gic-v3.txt >> >> Signed-off-by: Eric Auger >> Reviewed-by: Andrew Jones >> >> --- >> v1 (virt3.0) -> v2 >> - Added Drew's R-b >> >> v2 -> v3: >> - VIRT_GIC_REDIST2 is now 64MB large, ie. 512 redistributor capacity >> - virt_gicv3_redist_region_count does not test kvm_irqchip_in_kernel >> anymore >> --- >> hw/arm/virt.c | 29 - >> include/hw/arm/virt.h | 14 ++ >> 2 files changed, 38 insertions(+), 5 deletions(-) >> >> diff --git a/hw/arm/virt.c b/hw/arm/virt.c >> index 2885d18..d9f72eb 100644 >> --- a/hw/arm/virt.c >> +++ b/hw/arm/virt.c >> @@ -148,6 +148,8 @@ static const MemMapEntry a15memmap[] = { >> [VIRT_PCIE_PIO] = { 0x3eff, 0x0001 }, >> [VIRT_PCIE_ECAM] = { 0x3f00, 0x0100 }, >> [VIRT_MEM] ={ 0x4000, RAMLIMIT_BYTES }, >> +/* Additional 64 MB redist region (can contain up to 512 >> redistributors) */ >> +[VIRT_GIC_REDIST2] ={ 0x40ULL, 0x400 }, >> /* Second PCIe window, 512GB wide at the 512GB boundary */ >> [VIRT_PCIE_MMIO_HIGH] = { 0x80ULL, 0x80ULL }, >> }; >> @@ -401,13 +403,30 @@ static void fdt_add_gic_node(VirtMachineState *vms) >> qemu_fdt_setprop_cell(vms->fdt, "/intc", "#size-cells", 0x2); >> qemu_fdt_setprop(vms->fdt, "/intc", "ranges", NULL, 0); >> if (vms->gic_version == 3) { >> +int nb_redist_regions = virt_gicv3_redist_region_count(vms); >> + >> qemu_fdt_setprop_string(vms->fdt, "/intc", "compatible", >> "arm,gic-v3"); >> -qemu_fdt_setprop_sized_cells(vms->fdt, "/intc", "reg", >> - 2, 
vms->memmap[VIRT_GIC_DIST].base, >> - 2, vms->memmap[VIRT_GIC_DIST].size, >> - 2, vms->memmap[VIRT_GIC_REDIST].base, >> - 2, vms->memmap[VIRT_GIC_REDIST].size); >> + >> +qemu_fdt_setprop_cell(vms->fdt, "/intc", >> + "#redistributor-regions", nb_redist_regions); >> + >> +if (nb_redist_regions == 1) { >> +qemu_fdt_setprop_sized_cells(vms->fdt, "/intc", "reg", >> + 2, vms->memmap[VIRT_GIC_DIST].base, >> + 2, vms->memmap[VIRT_GIC_DIST].size, >> + 2, >> vms->memmap[VIRT_GIC_REDIST].base, >> + 2, >> vms->memmap[VIRT_GIC_REDIST].size); >> +} else { >> +qemu_fdt_setprop_sized_cells(vms->fdt, "/intc", "reg", >> + 2, vms->memmap[VIRT_GIC_DIST].base, >> + 2, vms->memmap[VIRT_GIC_DIST].size, >> + 2, >> vms->memmap[VIRT_GIC_REDIST].base, >> + 2, >> vms->memmap[VIRT_GIC_REDIST].size, >> + 2, >> vms->memmap[VIRT_GIC_REDIST2].base, >> + 2, >> vms->memmap[VIRT_GIC_REDIST2].size); >> +} >> + >> if (vms->virt) { >> qemu_fdt_setprop_cells(vms->fdt, "/intc", "interrupts", >> GIC_FDT_IRQ_TYPE_PPI, >> ARCH_GICV3_MAINT_IRQ, > > In edk2, we have the following code in > "ArmVirtPkg/Library/ArmVirtGicArchLib/ArmVirtGicArchLib.c": > > switch (GicRevision) { > > case 3: > // > // The GIC v3 DT binding describes a series of at least 3 physical (base > // addresses, size) pairs: the distributor interface (GICD), at least one > // redistributor region (GICR) containing dedicated redistributor > // interfaces for all individual CPUs, and the CPU interface (GICC). > // Under virtualization, we assume that the first redistributor region > // listed covers the boot CPU. Also, our GICv3 driver only supports the > // system register CPU interface, so we can safely ignore the MMIO version > // which is listed after the sequence of redistributor interfaces. > // This means we are only interested in the first two memory regions > // supplied, and ignore everything else. > // > ASSERT (RegSize >= 32); > > // RegProp[0..1] == { GICD base, GICD size } > DistBase = SwapBytes64 (Reg[0]); > ASSERT (DistBase <
Re: [Qemu-devel] [virtio-dev] Re: [PATCH 2/3] Add "Group Identifier" support to PCIe bridges.
On 2018-06-19 21:53:01 +0300, Michael S. Tsirkin wrote: > On Tue, Jun 19, 2018 at 01:36:17PM -0500, Venu Busireddy wrote: > > On 2018-06-19 21:21:23 +0300, Michael S. Tsirkin wrote: > > > On Tue, Jun 19, 2018 at 01:14:06PM -0500, Venu Busireddy wrote: > > > > On 2018-06-19 20:24:12 +0300, Michael S. Tsirkin wrote: > > > > > On Tue, Jun 19, 2018 at 11:32:26AM -0500, Venu Busireddy wrote: > > > > > > Add a "Vendor-Specific" capability to the PCIe bridge, to contain > > > > > > the > > > > > > "Group Identifier" (UUID) that will be used to pair a virtio device > > > > > > with > > > > > > the passthrough device attached to that bridge. > > > > > > > > > > > > This capability is added to the bridge iff the "uuid" option is > > > > > > specified > > > > > > for the bridge device, via the qemu command line. Also, the bridge's > > > > > > Device ID is changed to PCI_VENDOR_ID_REDHAT, and Vendor ID is > > > > > > changed > > > > > > to PCI_DEVICE_ID_REDHAT_PCIE_BRIDGE (from the default values), when > > > > > > the > > > > > > "uuid" option is present. > > > > > > > > > > > > Signed-off-by: Venu Busireddy > > > > > > > > > > I don't see why we should add it to all bridges. > > > > > Let's just add it to ones that already have the RH vendor ID? > > > > > > > > No. I am not adding the capability to all bridges. > > > > > > > > In the earlier discussions, we agreed that the bridge be left as > > > > Intel bridge if we do not intend to use it for storing the pairing > > > > information. If we do intend to store the pairing information in the > > > > bridge, we wanted to change the bridge's Vendor ID to RH Vendor ID to > > > > avoid confusion. In other words, bridge's with RH Vendor ID come into > > > > existence only when there is an intent to store the pairing information > > > > in the bridge. 
> > > > > > > > Accordingly, if the "uuid" option is specified for the bridge, it > > > > is assumed that the user intends to use the bridge for storing the > > > > pairing information, and hence, the capability is added to the bridge, > > > > and the Vendor ID is changed to RH Vendor ID. If the "uuid" option > > > > is not specified, the bridge remains as Intel bridge, and without the > > > > vendor-specific capability. > > > > > > > > Venu > > > > > > Yes but the way to do it is not to tweak the vendor and device ID, > > > instead, just add the UUID property to bridges that already have the > > > correct vendor and device id. > > > > I was using ioh3420 as the bridge device, because that is what is > > recommended here: > > > > https://git.qemu.org/?p=qemu.git;a=blob_plain;f=docs/pcie.txt;hb=HEAD > > > > ioh3420 defaults to the Intel Vendor ID. Hence the tweak to change the > > Vendor ID to RH Vendor ID. > > > > Is there another bridge device other than ioh3420 that I should use? > > what device do you suggest? > > > > Thanks, > > > > Venu > > For pci, use hw/pci-bridge/pci_bridge_dev.c > Maybe allocate a special ID for grouping bridges. > > For express, add your own downstream port. Specifically, on the command line, what device does the user specify? 
For example: qemu-system-x86_64 --device ${Bridge_Device},uuid="uuid string", What does the user specify for ${Bridge_Device} from the following: "i82801b11-bridge", bus PCI "ioh3420", bus PCI, desc "Intel IOH device id 3420 PCIE Root Port" "pci-bridge", bus PCI, desc "Standard PCI Bridge" "pci-bridge-seat", bus PCI, desc "Standard PCI Bridge (multiseat)" "pcie-pci-bridge", bus PCI "pcie-root-port", bus PCI, desc "PCI Express Root Port" "pxb", bus PCI, desc "PCI Expander Bridge" "pxb-pcie", bus PCI, desc "PCI Express Expander Bridge" "usb-host", bus usb-bus "usb-hub", bus usb-bus "vfio-pci-igd-lpc-bridge", bus PCI, desc "VFIO dummy ISA/LPC bridge for IGD assignment" "x3130-upstream", bus PCI, desc "TI X3130 Upstream Port of PCI Express Switch" "xio3130-downstream", bus PCI, desc "TI X3130 Downstream Port of PCI Express Switch" Or, are you suggesting that I add a new type of device? If latter, what should it be called? Thanks, > > > > > > > > > > > > > > > > > > > > > > > > --- > > > > > > hw/pci-bridge/ioh3420.c| 2 ++ > > > > > > hw/pci-bridge/pcie_root_port.c | 7 +++ > > > > > > hw/pci/pci_bridge.c| 32 > > > > > > > > > > > > include/hw/pci/pci.h | 2 ++ > > > > > > include/hw/pci/pcie.h | 1 + > > > > > > include/hw/pci/pcie_port.h | 1 + > > > > > > 6 files changed, 45 insertions(+) > > > > > > > > > > > > diff --git a/hw/pci-bridge/ioh3420.c b/hw/pci-bridge/ioh3420.c > > > > > > index a451d74ee6..b6b9ebc726 100644 > > > > > > --- a/hw/pci-bridge/ioh3420.c > > > > > > +++ b/hw/pci-bridge/ioh3420.c > > > > > > @@ -35,6 +35,7 @@ > > > > > > #define IOH_EP_MSI_SUPPORTED_FLAGS PCI_MSI_FLAGS_MASKBIT > > > > > > #define IOH_EP_MSI_NR_VECTOR2 > > > > > > #define IOH_EP_EXP_OFFSET 0x90 > > > > > > +#define
Re: [Qemu-devel] [PATCH V7 RESEND 12/17] savevm: split the process of different stages for loadvm/savevm
* Zhang Chen (zhangc...@gmail.com) wrote: > On Wed, May 16, 2018 at 2:56 AM, Dr. David Alan Gilbert > wrote: > > > * Zhang Chen (zhangc...@gmail.com) wrote: > > > From: zhanghailiang > > > > > > There are several stages during loadvm/savevm process. In different > > stage, > > > migration incoming processes different types of sections. > > > We want to control these stages more accuracy, it will benefit COLO > > > performance, we don't have to save type of QEMU_VM_SECTION_START > > > sections everytime while do checkpoint, besides, we want to separate > > > the process of saving/loading memory and devices state. > > > > > > So we add three new helper functions: qemu_load_device_state() and > > > qemu_savevm_live_state() to achieve different process during migration. > > > > > > Besides, we make qemu_loadvm_state_main() and qemu_save_device_state() > > > public, and simplify the codes of qemu_save_device_state() by calling the > > > wrapper qemu_savevm_state_header(). > > > > > > Signed-off-by: zhanghailiang > > > Signed-off-by: Li Zhijian > > > Signed-off-by: Zhang Chen > > > Reviewed-by: Dr. 
David Alan Gilbert > > > --- > > > migration/colo.c | 36 > > > migration/savevm.c | 35 --- > > > migration/savevm.h | 4 > > > 3 files changed, 60 insertions(+), 15 deletions(-) > > > > > > diff --git a/migration/colo.c b/migration/colo.c > > > index cdff0a2490..5b055f79f1 100644 > > > --- a/migration/colo.c > > > +++ b/migration/colo.c > > > @@ -30,6 +30,7 @@ > > > #include "block/block.h" > > > #include "qapi/qapi-events-migration.h" > > > #include "qapi/qmp/qerror.h" > > > +#include "sysemu/cpus.h" > > > > > > static bool vmstate_loading; > > > static Notifier packets_compare_notifier; > > > @@ -414,23 +415,30 @@ static int > > > colo_do_checkpoint_transaction(MigrationState > > *s, > > > > > > /* Disable block migration */ > > > migrate_set_block_enabled(false, _err); > > > -qemu_savevm_state_header(fb); > > > -qemu_savevm_state_setup(fb); > > > qemu_mutex_lock_iothread(); > > > replication_do_checkpoint_all(_err); > > > if (local_err) { > > > qemu_mutex_unlock_iothread(); > > > goto out; > > > } > > > -qemu_savevm_state_complete_precopy(fb, false, false); > > > -qemu_mutex_unlock_iothread(); > > > - > > > -qemu_fflush(fb); > > > > > > colo_send_message(s->to_dst_file, COLO_MESSAGE_VMSTATE_SEND, > > _err); > > > if (local_err) { > > > goto out; > > > } > > > +/* > > > + * Only save VM's live state, which not including device state. > > > + * TODO: We may need a timeout mechanism to prevent COLO process > > > + * to be blocked here. > > > + */ > > > > I guess that's the downside to transmitting it directly than into the > > buffer; > > Peter Xu's OOB command system would let you kill the connection - and > > that's something I think COLO should use. > > Still the change saves you having that huge outgoing buffer on the > > source side and lets you start sending the checkpoint sooner, which > > means the pause time should be smaller. > > > > Yes, you are right. > But I think this is a performance optimization, this series focus on > enabling. 
> I will do this job in the future. > > > > > > > +qemu_savevm_live_state(s->to_dst_file); > > > > Does this actually need to be inside of the qemu_mutex_lock_iothread? > > I'm pretty sure the device_state needs to be, but I'm not sure the > > live_state needs to. > > > > I have checked the codes, qemu_savevm_live_state needn't inside of the > qemu_mutex_lock_iothread, > I will move the it out the lock area in next version. > > > > > > > > +/* Note: device state is saved into buffer */ > > > +ret = qemu_save_device_state(fb); > > > + > > > +qemu_mutex_unlock_iothread(); > > > + > > > +qemu_fflush(fb); > > > + > > > /* > > > * We need the size of the VMstate data in Secondary side, > > > * With which we can decide how much data should be read. > > > @@ -643,6 +651,7 @@ void *colo_process_incoming_thread(void *opaque) > > > uint64_t total_size; > > > uint64_t value; > > > Error *local_err = NULL; > > > +int ret; > > > > > > qemu_sem_init(>colo_incoming_sem, 0); > > > > > > @@ -715,6 +724,16 @@ void *colo_process_incoming_thread(void *opaque) > > > goto out; > > > } > > > > > > +qemu_mutex_lock_iothread(); > > > +cpu_synchronize_all_pre_loadvm(); > > > +ret = qemu_loadvm_state_main(mis->from_src_file, mis); > > > +qemu_mutex_unlock_iothread(); > > > + > > > +if (ret < 0) { > > > +error_report("Load VM's live state (ram) error"); > > > +goto out; > > > +} > > > + > > > value = colo_receive_message_value(mis->from_src_file, > > > COLO_MESSAGE_VMSTATE_SIZE, _err); > > >
Re: [Qemu-devel] [virtio-dev] Re: [PATCH 2/3] Add "Group Identifier" support to PCIe bridges.
On Tue, Jun 19, 2018 at 01:36:17PM -0500, Venu Busireddy wrote: > On 2018-06-19 21:21:23 +0300, Michael S. Tsirkin wrote: > > On Tue, Jun 19, 2018 at 01:14:06PM -0500, Venu Busireddy wrote: > > > On 2018-06-19 20:24:12 +0300, Michael S. Tsirkin wrote: > > > > On Tue, Jun 19, 2018 at 11:32:26AM -0500, Venu Busireddy wrote: > > > > > Add a "Vendor-Specific" capability to the PCIe bridge, to contain the > > > > > "Group Identifier" (UUID) that will be used to pair a virtio device > > > > > with > > > > > the passthrough device attached to that bridge. > > > > > > > > > > This capability is added to the bridge iff the "uuid" option is > > > > > specified > > > > > for the bridge device, via the qemu command line. Also, the bridge's > > > > > Device ID is changed to PCI_VENDOR_ID_REDHAT, and Vendor ID is changed > > > > > to PCI_DEVICE_ID_REDHAT_PCIE_BRIDGE (from the default values), when > > > > > the > > > > > "uuid" option is present. > > > > > > > > > > Signed-off-by: Venu Busireddy > > > > > > > > I don't see why we should add it to all bridges. > > > > Let's just add it to ones that already have the RH vendor ID? > > > > > > No. I am not adding the capability to all bridges. > > > > > > In the earlier discussions, we agreed that the bridge be left as > > > Intel bridge if we do not intend to use it for storing the pairing > > > information. If we do intend to store the pairing information in the > > > bridge, we wanted to change the bridge's Vendor ID to RH Vendor ID to > > > avoid confusion. In other words, bridge's with RH Vendor ID come into > > > existence only when there is an intent to store the pairing information > > > in the bridge. > > > > > > Accordingly, if the "uuid" option is specified for the bridge, it > > > is assumed that the user intends to use the bridge for storing the > > > pairing information, and hence, the capability is added to the bridge, > > > and the Vendor ID is changed to RH Vendor ID. 
If the "uuid" option > > > is not specified, the bridge remains as Intel bridge, and without the > > > vendor-specific capability. > > > > > > Venu > > > > Yes but the way to do it is not to tweak the vendor and device ID, > > instead, just add the UUID property to bridges that already have the > > correct vendor and device id. > > I was using ioh3420 as the bridge device, because that is what is > recommended here: > > https://git.qemu.org/?p=qemu.git;a=blob_plain;f=docs/pcie.txt;hb=HEAD > > ioh3420 defaults to the Intel Vendor ID. Hence the tweak to change the > Vendor ID to RH Vendor ID. > > Is there another bridge device other than ioh3420 that I should use? > what device do you suggest? > > Thanks, > > Venu For pci, use hw/pci-bridge/pci_bridge_dev.c Maybe allocate a special ID for grouping bridges. For express, add your own downstream port. > > > > > > > > > > > > > > > > > --- > > > > > hw/pci-bridge/ioh3420.c| 2 ++ > > > > > hw/pci-bridge/pcie_root_port.c | 7 +++ > > > > > hw/pci/pci_bridge.c| 32 > > > > > include/hw/pci/pci.h | 2 ++ > > > > > include/hw/pci/pcie.h | 1 + > > > > > include/hw/pci/pcie_port.h | 1 + > > > > > 6 files changed, 45 insertions(+) > > > > > > > > > > diff --git a/hw/pci-bridge/ioh3420.c b/hw/pci-bridge/ioh3420.c > > > > > index a451d74ee6..b6b9ebc726 100644 > > > > > --- a/hw/pci-bridge/ioh3420.c > > > > > +++ b/hw/pci-bridge/ioh3420.c > > > > > @@ -35,6 +35,7 @@ > > > > > #define IOH_EP_MSI_SUPPORTED_FLAGS PCI_MSI_FLAGS_MASKBIT > > > > > #define IOH_EP_MSI_NR_VECTOR2 > > > > > #define IOH_EP_EXP_OFFSET 0x90 > > > > > +#define IOH_EP_VENDOR_OFFSET0xCC > > > > > #define IOH_EP_AER_OFFSET 0x100 > > > > > > > > > > /* > > > > > @@ -111,6 +112,7 @@ static void ioh3420_class_init(ObjectClass > > > > > *klass, void *data) > > > > > rpc->exp_offset = IOH_EP_EXP_OFFSET; > > > > > rpc->aer_offset = IOH_EP_AER_OFFSET; > > > > > rpc->ssvid_offset = IOH_EP_SSVID_OFFSET; > > > > > +rpc->vendor_offset = IOH_EP_VENDOR_OFFSET; > > > > > rpc->ssid 
= IOH_EP_SSVID_SSID; > > > > > } > > > > > > > > > > diff --git a/hw/pci-bridge/pcie_root_port.c > > > > > b/hw/pci-bridge/pcie_root_port.c > > > > > index 45f9e8cd4a..ba470c7fda 100644 > > > > > --- a/hw/pci-bridge/pcie_root_port.c > > > > > +++ b/hw/pci-bridge/pcie_root_port.c > > > > > @@ -71,6 +71,12 @@ static void rp_realize(PCIDevice *d, Error **errp) > > > > > goto err_bridge; > > > > > } > > > > > > > > > > +rc = pci_bridge_vendor_init(d, rpc->vendor_offset, errp); > > > > > +if (rc < 0) { > > > > > +error_append_hint(errp, "Can't init group ID, error %d\n", > > > > > rc); > > > > > +goto err_bridge; > > > > > +} > > > > > + > > > > > if (rpc->interrupts_init) { > > > > > rc = rpc->interrupts_init(d, errp); > > > > > if (rc < 0) { > > > > > @@
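The "Vendor-Specific" capability carrying the Group Identifier can be pictured with a plain byte array standing in for config space. This is a hedged sketch of generic PCI capability-list handling, not QEMU's pci_add_capability() or the patch's pci_bridge_vendor_init(); the payload layout (a 16-byte UUID right after the 3-byte ID/next/length header) is an assumption, and 0xCC is only the IOH_EP_VENDOR_OFFSET value from the diff.

```c
#include <stdint.h>
#include <string.h>

#define PCI_STATUS          0x06   /* status register, low byte */
#define PCI_STATUS_CAP_LIST 0x10   /* "capabilities list present" bit */
#define PCI_CAPABILITY_LIST 0x34   /* pointer to the first capability */
#define PCI_CAP_ID_VNDR     0x09   /* vendor-specific capability ID */

/* Assumed layout: ID, next pointer, length, then a 16-byte UUID. */
#define GROUP_CAP_LEN (3 + 16)

/* Link the capability in at the head of the list; caller keeps it in bounds. */
static void add_group_cap(uint8_t cfg[256], uint8_t offset, const uint8_t uuid[16])
{
    cfg[offset] = PCI_CAP_ID_VNDR;
    cfg[offset + 1] = cfg[PCI_CAPABILITY_LIST];  /* old head becomes "next" */
    cfg[offset + 2] = GROUP_CAP_LEN;             /* vendor caps carry their length */
    memcpy(&cfg[offset + 3], uuid, 16);
    cfg[PCI_CAPABILITY_LIST] = offset;
    cfg[PCI_STATUS] |= PCI_STATUS_CAP_LIST;
}

/* Walk the list; returns the capability's offset, or -1 if absent. */
static int find_cap(const uint8_t cfg[256], uint8_t cap_id)
{
    uint8_t pos = cfg[PCI_CAPABILITY_LIST] & 0xfc;  /* low 2 bits reserved */
    int budget = 48;   /* (256 - 64) / 4 entries at most; stops on loops */

    while (pos && budget--) {
        if (cfg[pos] == cap_id) {
            return pos;
        }
        pos = cfg[pos + 1] & 0xfc;
    }
    return -1;
}
```

A guest-side tool would look for this capability with the same list walk and then read the UUID at offset + 3 under the assumed layout.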
Re: [Qemu-devel] [PATCH 6/7] block/qcow2-refcount: fix out-of-file L1 entries to be zero
On 06/19/2018 01:34 PM, Vladimir Sementsov-Ogievskiy wrote:
Zero out corrupted L1 table entry, which reference L2 table out of
underlying file.
Zero L1 table entry means that "the L2 table and all clusters described
by this L2 table are unallocated."

Signed-off-by: Vladimir Sementsov-Ogievskiy
---
 block/qcow2-refcount.c | 37 +
 1 file changed, 37 insertions(+)

diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index d993252fb6..3c9e2da39e 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -1641,6 +1641,29 @@ static int fix_l2_entry_to_zero(BlockDriverState *bs, BdrvCheckResult *res,
     return ret;
 }

+/* Zero out L1 entry
+ *
+ * Returns: -errno if overlap check failed
+ *          0 if write failed

If the write failed, wouldn't there be an errno value worth returning?

+ *          1 on success
+ */
+static int fix_l1_entry_to_zero(BlockDriverState *bs, BdrvCheckResult *res,
+                                BdrvCheckMode fix, int64_t l1_offset,
+                                int l1_index, bool active,
+                                const char *fmt, ...)
+{
+    int ret;
+    int ign = active ? QCOW2_OL_ACTIVE_L2 : QCOW2_OL_INACTIVE_L2;
+    va_list args;
+
+    va_start(args, fmt);
+    ret = fix_table_entry(bs, res, fix, "L1", l1_offset, l1_index, 0, ign,
+                          fmt, args);
+    va_end(args);
+
+    return ret;
+}
+
 /*
  * Increases the refcount in the given refcount table for the all clusters
  * referenced in the L2 table. While doing so, performs some checks on L2
@@ -1837,6 +1860,20 @@ static int check_refcounts_l1(BlockDriverState *bs,
         if (l2_offset) {
             /* Mark L2 table as used */
             l2_offset &= L1E_OFFSET_MASK;
+            if (l2_offset >= bdrv_getlength(bs->file->bs)) {

Again, bdrv_getlength() can fail; you want to make sure that you check for failures before using it in comparisons.

+                ret = fix_l1_entry_to_zero(
+                        bs, res, fix, l1_table_offset, i, active,
+                        "l2 table offset out of file: offset 0x%" PRIx64,
+                        l2_offset);
+                if (ret < 0) {
+                    /* Something is seriously wrong, so abort checking
+                     * this L1 table */
+                    goto fail;
+                }
+
+                continue;
+            }
+
             ret = qcow2_inc_refcounts_imrt(bs, res, refcount_table,
                                            refcount_table_size, l2_offset,
                                            s->cluster_size);

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org
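Why the failure check matters, as a self-contained sketch: assuming l2_offset is an unsigned 64-bit value (as a masked L1 entry would be), comparing it against an unchecked negative return converts, say, -EIO into a huge unsigned number, so every offset silently looks in range. fake_getlength() is a hypothetical stand-in for bdrv_getlength(), which returns the length or a negative errno.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for bdrv_getlength(): length, or -errno on failure. */
static int64_t fake_getlength(int64_t simulated)
{
    return simulated;
}

/* Pitfall: an error value turns into ~2^64 and nothing is ever out of file. */
static int out_of_file_unchecked(uint64_t l2_offset, int64_t simulated)
{
    return l2_offset >= (uint64_t)fake_getlength(simulated);
}

/* Check the length first; returns 1 (past EOF), 0 (inside), or -errno. */
static int out_of_file_checked(uint64_t l2_offset, int64_t simulated)
{
    int64_t size = fake_getlength(simulated);

    if (size < 0) {
        return (int)size;   /* propagate, e.g. -EIO */
    }
    return l2_offset >= (uint64_t)size;
}
```

Querying the length once before the L1 loop, rather than per entry, would also keep the error handling in one place.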
Re: [Qemu-devel] [PATCH v2 05/11] hw/arm/virt: GICv3 DT node with one or two redistributor regions
Hi Eric,

sorry about the late followup. I have one question (mainly for Ard):

On 06/15/18 16:28, Eric Auger wrote:
> This patch allows the creation of a GICv3 node with 1 or 2
> redistributor regions depending on the number of smu_cpus.
> The second redistributor region is located just after the
> existing RAM region, at 256GB and contains up to up to 512 vcpus.
>
> Please refer to kernel documentation for further node details:
> Documentation/devicetree/bindings/interrupt-controller/arm,gic-v3.txt
>
> Signed-off-by: Eric Auger
> Reviewed-by: Andrew Jones
>
> ---
> v1 (virt3.0) -> v2
> - Added Drew's R-b
>
> v2 -> v3:
> - VIRT_GIC_REDIST2 is now 64MB large, ie. 512 redistributor capacity
> - virt_gicv3_redist_region_count does not test kvm_irqchip_in_kernel
>   anymore
> ---
>  hw/arm/virt.c         | 29 -
>  include/hw/arm/virt.h | 14 ++
>  2 files changed, 38 insertions(+), 5 deletions(-)
>
> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> index 2885d18..d9f72eb 100644
> --- a/hw/arm/virt.c
> +++ b/hw/arm/virt.c
> @@ -148,6 +148,8 @@ static const MemMapEntry a15memmap[] = {
>      [VIRT_PCIE_PIO] =           { 0x3eff0000, 0x00010000 },
>      [VIRT_PCIE_ECAM] =          { 0x3f000000, 0x01000000 },
>      [VIRT_MEM] =                { 0x40000000, RAMLIMIT_BYTES },
> +    /* Additional 64 MB redist region (can contain up to 512 redistributors) */
> +    [VIRT_GIC_REDIST2] =        { 0x4000000000ULL, 0x4000000 },
>      /* Second PCIe window, 512GB wide at the 512GB boundary */
>      [VIRT_PCIE_MMIO_HIGH] =     { 0x8000000000ULL, 0x8000000000ULL },
>  };
> @@ -401,13 +403,30 @@ static void fdt_add_gic_node(VirtMachineState *vms)
>      qemu_fdt_setprop_cell(vms->fdt, "/intc", "#size-cells", 0x2);
>      qemu_fdt_setprop(vms->fdt, "/intc", "ranges", NULL, 0);
>      if (vms->gic_version == 3) {
> +        int nb_redist_regions = virt_gicv3_redist_region_count(vms);
> +
>          qemu_fdt_setprop_string(vms->fdt, "/intc", "compatible",
>                                  "arm,gic-v3");
> -        qemu_fdt_setprop_sized_cells(vms->fdt, "/intc", "reg",
> -                                     2, vms->memmap[VIRT_GIC_DIST].base,
> -                                     2, vms->memmap[VIRT_GIC_DIST].size,
> -                                     2, vms->memmap[VIRT_GIC_REDIST].base,
> -                                     2,
vms->memmap[VIRT_GIC_REDIST].size); > + > +qemu_fdt_setprop_cell(vms->fdt, "/intc", > + "#redistributor-regions", nb_redist_regions); > + > +if (nb_redist_regions == 1) { > +qemu_fdt_setprop_sized_cells(vms->fdt, "/intc", "reg", > + 2, vms->memmap[VIRT_GIC_DIST].base, > + 2, vms->memmap[VIRT_GIC_DIST].size, > + 2, > vms->memmap[VIRT_GIC_REDIST].base, > + 2, > vms->memmap[VIRT_GIC_REDIST].size); > +} else { > +qemu_fdt_setprop_sized_cells(vms->fdt, "/intc", "reg", > + 2, vms->memmap[VIRT_GIC_DIST].base, > + 2, vms->memmap[VIRT_GIC_DIST].size, > + 2, > vms->memmap[VIRT_GIC_REDIST].base, > + 2, > vms->memmap[VIRT_GIC_REDIST].size, > + 2, > vms->memmap[VIRT_GIC_REDIST2].base, > + 2, > vms->memmap[VIRT_GIC_REDIST2].size); > +} > + > if (vms->virt) { > qemu_fdt_setprop_cells(vms->fdt, "/intc", "interrupts", > GIC_FDT_IRQ_TYPE_PPI, > ARCH_GICV3_MAINT_IRQ, In edk2, we have the following code in "ArmVirtPkg/Library/ArmVirtGicArchLib/ArmVirtGicArchLib.c": switch (GicRevision) { case 3: // // The GIC v3 DT binding describes a series of at least 3 physical (base // addresses, size) pairs: the distributor interface (GICD), at least one // redistributor region (GICR) containing dedicated redistributor // interfaces for all individual CPUs, and the CPU interface (GICC). // Under virtualization, we assume that the first redistributor region // listed covers the boot CPU. Also, our GICv3 driver only supports the // system register CPU interface, so we can safely ignore the MMIO version // which is listed after the sequence of redistributor interfaces. // This means we are only interested in the first two memory regions // supplied, and ignore everything else. // ASSERT (RegSize >= 32); // RegProp[0..1] == { GICD base, GICD size } DistBase = SwapBytes64 (Reg[0]); ASSERT (DistBase < MAX_UINTN); // RegProp[2..3] == { GICR base, GICR size } RedistBase = SwapBytes64 (Reg[2]); ASSERT (RedistBase < MAX_UINTN); PcdStatus = PcdSet64S
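To make the layout question concrete, here is a hedged sketch of the "reg" property the patch builds (#address-cells = #size-cells = 2, so each base and size is one big-endian 64-bit value), which a reader like the edk2 code can index as 64-bit words. Region, build_gicv3_reg() and the helpers are hypothetical; only the second region's 256 GiB base and 64 MiB size come from the patch. Appending a second redistributor region leaves GICD and the first GICR at 64-bit indices 0 and 2, and 64 MiB divided by 128 KiB per redistributor (two 64 KiB frames) gives the 512 quoted in the patch comment. This shows only the byte layout; it does not answer the firmware question itself.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint64_t base;
    uint64_t size;
} Region;

/* FDT cells are big-endian. */
static void put_be64(uint8_t *p, uint64_t v)
{
    for (int i = 7; i >= 0; i--) {
        p[i] = (uint8_t)v;
        v >>= 8;
    }
}

static uint64_t get_be64(const uint8_t *p)
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++) {
        v = (v << 8) | p[i];
    }
    return v;
}

/* GICD pair followed by nb_redist GICR pairs; returns the length in bytes. */
static size_t build_gicv3_reg(uint8_t *buf, Region gicd,
                              const Region *redist, int nb_redist)
{
    size_t off = 0;

    put_be64(buf + off, gicd.base); off += 8;
    put_be64(buf + off, gicd.size); off += 8;
    for (int i = 0; i < nb_redist; i++) {
        put_be64(buf + off, redist[i].base); off += 8;
        put_be64(buf + off, redist[i].size); off += 8;
    }
    return off;
}

/* A GICv3 redistributor spans two 64 KiB frames (RD_base and SGI_base). */
#define GICV3_REDIST_BYTES (2 * 64 * 1024)

static uint64_t redist_capacity(uint64_t region_size)
{
    return region_size / GICV3_REDIST_BYTES;
}
```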
Re: [Qemu-devel] [PATCH 3/7] block/qcow2-refcount: check_refcounts_l2: refactor compressed case
On 06/19/2018 01:34 PM, Vladimir Sementsov-Ogievskiy wrote:
Separate offset and size of compressed cluster.

Signed-off-by: Vladimir Sementsov-Ogievskiy
---
 block/qcow2-refcount.c | 15 ++-
 1 file changed, 10 insertions(+), 5 deletions(-)

Hmm, I wonder if this duplicates my pending patch:
https://lists.gnu.org/archive/html/qemu-devel/2018-04/msg04542.html

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org
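For context on "offset and size of compressed cluster", a hedged sketch that decodes a compressed L2 entry following the descriptor layout in the qcow2 specification (docs/interop/qcow2.txt): with x = 62 - (cluster_bits - 8), bits 0..x-1 hold the host offset and bits x..61 the number of additional 512-byte sectors beyond the one containing the offset. The names are hypothetical and are not the ones used in block/qcow2-refcount.c.

```c
#include <stdint.h>

#define QCOW_OFLAG_COMPRESSED (UINT64_C(1) << 62)

typedef struct {
    uint64_t host_offset;   /* byte offset of the compressed data */
    uint64_t nb_sectors;    /* 512-byte sectors touched, including the first */
} CompressedCluster;

/* Split the descriptor into its offset and sector-count fields. */
static CompressedCluster decode_compressed(uint64_t l2_entry, int cluster_bits)
{
    int x = 62 - (cluster_bits - 8);
    uint64_t offset_mask = (UINT64_C(1) << x) - 1;
    uint64_t sectors_mask = (UINT64_C(1) << (62 - x)) - 1;
    CompressedCluster c;

    c.host_offset = l2_entry & offset_mask;
    c.nb_sectors = ((l2_entry >> x) & sectors_mask) + 1;
    return c;
}

/* Byte range to account for: whole sectors starting at the offset's sector. */
static uint64_t compressed_range_start(CompressedCluster c)
{
    return c.host_offset & ~UINT64_C(511);
}

static uint64_t compressed_range_bytes(CompressedCluster c)
{
    return c.nb_sectors * 512;
}
```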
Re: [Qemu-devel] [PATCH v6 5/6] iotests: Add new test 214 for max compressed cluster offset
On 04/26/2018 07:10 AM, Alberto Garcia wrote: On Thu 26 Apr 2018 04:51:28 AM CEST, Eric Blake wrote: If you have a capable file system (tmpfs is good, ext4 not so much; run ./check with TEST_DIR pointing to a good location so as not to skip the test), it's actually possible to create a qcow2 file that expands to a sparse 512T image with just over 38M of content. The test is not the world's fastest (qemu crawling through 256M bits of refcount table to find the next cluster to allocate takes several seconds, as does qemu-img check reporting millions of leaked clusters); but it DOES catch the problem that the previous patch just fixed where writing a compressed cluster to a full image ended up overwriting the wrong cluster. Suggested-by: Max Reitz Signed-off-by: Eric Blake Nice test :-) Reviewed-by: Alberto Garcia 214 is already in the tree in the meantime; this will need a rebase to pick the next available test number (220 might be claimed, so 222?) -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3266 Virtualization: qemu.org | libvirt.org
Re: [Qemu-devel] [PATCH] hw/pci-host/xilinx-pcie: don't make "io" region be RAM
On Tue, Jun 19, 2018 at 5:07 AM, Peter Maydell wrote: > Currently we use memory_region_init_rom_nomigrate() to create > the "io" memory region to pass to pci_register_root_bus(). > This is a dummy region, because this PCI controller doesn't > support accesses to PCI IO space. > > There is no reason for the dummy region to be a RAM region; > it is only used as a place where PCI BARs can be mapped, > and if you could get a PCI card to do a bus master access > to the IO space it should not get acts-like-RAM behaviour. > Use a simple container memory region instead. (We do have > one PCI card model which can do bus master accesses to IO > space -- the LSI53C895A SCSI adaptor.) > > This avoids the oddity of having a memory region which is > RAM but where the RAM is not migrated. > > Note that the size of the region we use here has no > effect on behaviour. > > Signed-off-by: Peter Maydell Reviewed-by: Alistair Francis Alistair > --- > hw/pci-host/xilinx-pcie.c | 5 ++--- > 1 file changed, 2 insertions(+), 3 deletions(-) > > diff --git a/hw/pci-host/xilinx-pcie.c b/hw/pci-host/xilinx-pcie.c > index 044e312dc18..b0a31b917d8 100644 > --- a/hw/pci-host/xilinx-pcie.c > +++ b/hw/pci-host/xilinx-pcie.c > @@ -120,9 +120,8 @@ static void xilinx_pcie_host_realize(DeviceState *dev, > Error **errp) > memory_region_init(&s->mmio, OBJECT(s), "mmio", UINT64_MAX); > memory_region_set_enabled(&s->mmio, false); > > -/* dummy I/O region */ > -memory_region_init_ram_nomigrate(&s->io, OBJECT(s), "io", 16, NULL); > -memory_region_set_enabled(&s->io, false); > +/* dummy PCI I/O region (not visible to the CPU) */ > +memory_region_init(&s->io, OBJECT(s), "io", 16); > > /* interrupt out */ > qdev_init_gpio_out_named(dev, &s->irq, "interrupt_out", 1); > -- > 2.17.1 > >
Re: [Qemu-devel] [PATCH v6 0/6] minor qcow2 compression improvements
ping On 04/25/2018 09:51 PM, Eric Blake wrote: Even though v5 was posted earlier today, it was worth a respin: - 2/6: add R-b [Berto] - 4/6, 6/6: improve commit messages [Max] - 5/6: new patch, with an iotests proving that 4/6 is a bug fix [Max] The new test is rather slow (nearly 90 seconds for me using tmpfs) unless it skips entirely (such as testing on ext4); ideas for speeding it up are welcome (translation: maybe qemu should optimize the search for the next available cluster to allocate, and/or qemu-img check should be faster at reporting leaked clusters) 001/6:[] [--] 'qcow2: Prefer byte-based calls into bs->file' 002/6:[] [--] 'qcow2: Document some maximum size constraints' 003/6:[] [--] 'qcow2: Reduce REFT_OFFSET_MASK' 004/6:[] [--] 'qcow2: Don't allow overflow during cluster allocation' 005/6:[down] 'iotests: Add new test 214 for max compressed cluster offset' 006/6:[] [--] 'qcow2: Avoid memory over-allocation on compressed images' Eric Blake (6): qcow2: Prefer byte-based calls into bs->file qcow2: Document some maximum size constraints qcow2: Reduce REFT_OFFSET_MASK qcow2: Don't allow overflow during cluster allocation iotests: Add new test 214 for max compressed cluster offset qcow2: Avoid memory over-allocation on compressed images docs/interop/qcow2.txt | 40 +-- block/qcow2.h | 8 +++- block/qcow2-cluster.c | 32 +-- block/qcow2-refcount.c | 27 - block/qcow2.c | 2 +- tests/qemu-iotests/214 | 97 ++ tests/qemu-iotests/214.out | 54 ++ tests/qemu-iotests/group | 1 + 8 files changed, 234 insertions(+), 27 deletions(-) create mode 100755 tests/qemu-iotests/214 create mode 100644 tests/qemu-iotests/214.out -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3266 Virtualization: qemu.org | libvirt.org
Re: [Qemu-devel] [PATCH 2/7] block/qcow2-refcount: avoid eating RAM
On 06/19/2018 01:34 PM, Vladimir Sementsov-Ogievskiy wrote: qcow2_inc_refcounts_imrt() (through realloc_refcount_array()) can eat unpredicted amount of memory on corrupted table entries, which are s/unpredicted/an unpredictable/ referencing regions far beyond the end of file. Prevent this, by skipping such regions from further processing. Signed-off-by: Vladimir Sementsov-Ogievskiy --- block/qcow2-refcount.c | 8 1 file changed, 8 insertions(+) diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c index f9d095aa2d..28d21bedc3 100644 --- a/block/qcow2-refcount.c +++ b/block/qcow2-refcount.c @@ -1505,6 +1505,14 @@ int qcow2_inc_refcounts_imrt(BlockDriverState *bs, BdrvCheckResult *res, return 0; } +if (offset + size - bdrv_getlength(bs->file->bs) > s->cluster_size) { bdrv_getlength() can fail (returning a negative value); this needs to be refactored so that you aren't performing arithmetic comparisons after such a failure (even if that failure is unlikely). +fprintf(stderr, "ERROR: counting reference for region exceeding the " +"end of the file by more than one cluster: offset 0x%" PRIx64 +" size 0x%" PRIx64 "\n", offset, size); Why is this dumping directly to stderr? /me reads the file Oh. We probably ought to fix the code to pass an Error **errp parameter through the callstack, but that's a bigger audit (and not the fault of your patch for copying existing usage). +res->corruptions++; +return 0; +} + start = start_of_cluster(s, offset); last = start_of_cluster(s, offset + size - 1); for(cluster_offset = start; cluster_offset <= last; -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3266 Virtualization: qemu.org | libvirt.org
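One way to follow the review comment about bdrv_getlength() is to check the length before doing any arithmetic. Below is a minimal sketch of that ordering. `region_past_eof` is a hypothetical helper, and `file_len` stands for whatever a bdrv_getlength()-style call returned, which may be a negative errno. It also assumes that offset and size come from masked table entries and so cannot overflow when added.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical helper modelled on the check in the patch.
 * Returns file_len unchanged if it is a negative errno,
 * 1 if [offset, offset + size) ends more than one cluster past EOF,
 * 0 otherwise.
 */
static int64_t region_past_eof(int64_t offset, int64_t size,
                               int64_t file_len, int64_t cluster_size)
{
    if (file_len < 0) {
        return file_len;   /* propagate the error; no arithmetic on it */
    }
    /* offset/size come from masked entries (< 2^56), so no overflow here */
    return offset + size - file_len > cluster_size;
}
```

With this ordering, a caller could propagate the errno instead of counting a bogus corruption.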
Re: [Qemu-devel] [virtio-dev] Re: [PATCH 2/3] Add "Group Identifier" support to PCIe bridges.
On 2018-06-19 21:21:23 +0300, Michael S. Tsirkin wrote: > On Tue, Jun 19, 2018 at 01:14:06PM -0500, Venu Busireddy wrote: > > On 2018-06-19 20:24:12 +0300, Michael S. Tsirkin wrote: > > > On Tue, Jun 19, 2018 at 11:32:26AM -0500, Venu Busireddy wrote: > > > > Add a "Vendor-Specific" capability to the PCIe bridge, to contain the > > > > "Group Identifier" (UUID) that will be used to pair a virtio device with > > > > the passthrough device attached to that bridge. > > > > > > > > This capability is added to the bridge iff the "uuid" option is > > > > specified > > > > for the bridge device, via the qemu command line. Also, the bridge's > > > > Device ID is changed to PCI_VENDOR_ID_REDHAT, and Vendor ID is changed > > > > to PCI_DEVICE_ID_REDHAT_PCIE_BRIDGE (from the default values), when the > > > > "uuid" option is present. > > > > > > > > Signed-off-by: Venu Busireddy > > > > > > I don't see why we should add it to all bridges. > > > Let's just add it to ones that already have the RH vendor ID? > > > > No. I am not adding the capability to all bridges. > > > > In the earlier discussions, we agreed that the bridge be left as > > Intel bridge if we do not intend to use it for storing the pairing > > information. If we do intend to store the pairing information in the > > bridge, we wanted to change the bridge's Vendor ID to RH Vendor ID to > > avoid confusion. In other words, bridge's with RH Vendor ID come into > > existence only when there is an intent to store the pairing information > > in the bridge. > > > > Accordingly, if the "uuid" option is specified for the bridge, it > > is assumed that the user intends to use the bridge for storing the > > pairing information, and hence, the capability is added to the bridge, > > and the Vendor ID is changed to RH Vendor ID. If the "uuid" option > > is not specified, the bridge remains as Intel bridge, and without the > > vendor-specific capability. 
> > > > Venu > > Yes but the way to do it is not to tweak the vendor and device ID, > instead, just add the UUID property to bridges that already have the > correct vendor and device id. I was using ioh3420 as the bridge device, because that is what is recommended here: https://git.qemu.org/?p=qemu.git;a=blob_plain;f=docs/pcie.txt;hb=HEAD ioh3420 defaults to the Intel Vendor ID. Hence the tweak to change the Vendor ID to RH Vendor ID. Is there another bridge device other than ioh3420 that I should use? what device do you suggest? Thanks, Venu > > > > > > > > > > > > --- > > > > hw/pci-bridge/ioh3420.c| 2 ++ > > > > hw/pci-bridge/pcie_root_port.c | 7 +++ > > > > hw/pci/pci_bridge.c| 32 > > > > include/hw/pci/pci.h | 2 ++ > > > > include/hw/pci/pcie.h | 1 + > > > > include/hw/pci/pcie_port.h | 1 + > > > > 6 files changed, 45 insertions(+) > > > > > > > > diff --git a/hw/pci-bridge/ioh3420.c b/hw/pci-bridge/ioh3420.c > > > > index a451d74ee6..b6b9ebc726 100644 > > > > --- a/hw/pci-bridge/ioh3420.c > > > > +++ b/hw/pci-bridge/ioh3420.c > > > > @@ -35,6 +35,7 @@ > > > > #define IOH_EP_MSI_SUPPORTED_FLAGS PCI_MSI_FLAGS_MASKBIT > > > > #define IOH_EP_MSI_NR_VECTOR2 > > > > #define IOH_EP_EXP_OFFSET 0x90 > > > > +#define IOH_EP_VENDOR_OFFSET0xCC > > > > #define IOH_EP_AER_OFFSET 0x100 > > > > > > > > /* > > > > @@ -111,6 +112,7 @@ static void ioh3420_class_init(ObjectClass *klass, > > > > void *data) > > > > rpc->exp_offset = IOH_EP_EXP_OFFSET; > > > > rpc->aer_offset = IOH_EP_AER_OFFSET; > > > > rpc->ssvid_offset = IOH_EP_SSVID_OFFSET; > > > > +rpc->vendor_offset = IOH_EP_VENDOR_OFFSET; > > > > rpc->ssid = IOH_EP_SSVID_SSID; > > > > } > > > > > > > > diff --git a/hw/pci-bridge/pcie_root_port.c > > > > b/hw/pci-bridge/pcie_root_port.c > > > > index 45f9e8cd4a..ba470c7fda 100644 > > > > --- a/hw/pci-bridge/pcie_root_port.c > > > > +++ b/hw/pci-bridge/pcie_root_port.c > > > > @@ -71,6 +71,12 @@ static void rp_realize(PCIDevice *d, Error **errp) > > > > goto err_bridge; > > > 
> } > > > > > > > > +rc = pci_bridge_vendor_init(d, rpc->vendor_offset, errp); > > > > +if (rc < 0) { > > > > +error_append_hint(errp, "Can't init group ID, error %d\n", rc); > > > > +goto err_bridge; > > > > +} > > > > + > > > > if (rpc->interrupts_init) { > > > > rc = rpc->interrupts_init(d, errp); > > > > if (rc < 0) { > > > > @@ -137,6 +143,7 @@ static void rp_exit(PCIDevice *d) > > > > static Property rp_props[] = { > > > > DEFINE_PROP_BIT(COMPAT_PROP_PCP, PCIDevice, cap_present, > > > > QEMU_PCIE_SLTCAP_PCP_BITNR, true), > > > > +DEFINE_PROP_UUID(COMPAT_PROP_UUID, PCIDevice, uuid, false), > > > > DEFINE_PROP_END_OF_LIST() > > > > }; > > > > > > > > diff --git a/hw/pci/pci_bridge.c b/hw/pci/pci_bridge.c > > > > index 40a39f57cb..c63bc439f7
[Qemu-devel] [PATCH 5/7] block/qcow2-refcount: check_refcounts_l2: split fix_l2_entry_to_zero
Split entry repair into a separate function, to be reused later. Note: after this patch, the entry in the in-memory L2 table (a local variable in check_refcounts_l2) is not updated. Signed-off-by: Vladimir Sementsov-Ogievskiy --- block/qcow2-refcount.c | 147 - 1 file changed, 109 insertions(+), 38 deletions(-) diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c index 02583f260b..d993252fb6 100644 --- a/block/qcow2-refcount.c +++ b/block/qcow2-refcount.c @@ -1548,6 +1548,99 @@ enum { CHECK_FRAG_INFO = 0x2, /* update BlockFragInfo counters */ }; +/* Update entry in L1 or L2 table + * + * Returns: -errno if overlap check failed + * 0 if write failed + * 1 on success + */ +static int write_table_entry(BlockDriverState *bs, const char *table_name, + uint64_t table_offset, int entry_index, + uint64_t new_val, int ign) +{ +int ret; +uint64_t entry_offset = +table_offset + (uint64_t)entry_index * sizeof(new_val); + +cpu_to_be64s(&new_val); +ret = qcow2_pre_write_overlap_check(bs, ign, entry_offset, sizeof(new_val)); +if (ret < 0) { +fprintf(stderr, +"ERROR: Can't write %s table entry: overlap check failed: %s\n", +table_name, strerror(-ret)); +return ret; +} + +ret = bdrv_pwrite_sync(bs->file, entry_offset, &new_val, sizeof(new_val)); +if (ret < 0) { +fprintf(stderr, "ERROR: Failed to overwrite %s table entry: %s\n", +table_name, strerror(-ret)); +return 0; +} + +return 1; +} + +/* Try to fix (if allowed) entry in L1 or L2 table. Update @res correspondingly. + * + * Returns: -errno if overlap check failed + * 0 if entry was not updated for other reason + *(fixing disabled or write failed) + * 1 on success + */ +static int fix_table_entry(BlockDriverState *bs, BdrvCheckResult *res, + BdrvCheckMode fix, const char *table_name, + uint64_t table_offset, int entry_index, + uint64_t new_val, int ign, + const char *fmt, va_list args) +{ +int ret; + +fprintf(stderr, fix & BDRV_FIX_ERRORS ? 
"Repairing: " : "ERROR: "); +vfprintf(stderr, fmt, args); +fprintf(stderr, "\n"); + +if (!(fix & BDRV_FIX_ERRORS)) { +res->corruptions++; +return 0; +} + +ret = write_table_entry(bs, table_name, table_offset, entry_index, new_val, +ign); + +if (ret == 1) { +res->corruptions_fixed++; +} else { +res->check_errors++; +} + +return ret; +} + +/* Make L2 entry to be QCOW2_CLUSTER_ZERO_PLAIN + * + * Returns: -errno if overlap check failed + * 0 if write failed + * 1 on success + */ +static int fix_l2_entry_to_zero(BlockDriverState *bs, BdrvCheckResult *res, +BdrvCheckMode fix, int64_t l2_offset, +int l2_index, bool active, +const char *fmt, ...) +{ +int ret; +int ign = active ? QCOW2_OL_ACTIVE_L2 : QCOW2_OL_INACTIVE_L2; +uint64_t l2_entry = QCOW_OFLAG_ZERO; +va_list args; + +va_start(args, fmt); +ret = fix_table_entry(bs, res, fix, "L2", l2_offset, l2_index, l2_entry, + ign, fmt, args); +va_end(args); + +return ret; +} + /* * Increases the refcount in the given refcount table for the all clusters * referenced in the L2 table. While doing so, performs some checks on L2 @@ -1640,46 +1733,24 @@ static int check_refcounts_l2(BlockDriverState *bs, BdrvCheckResult *res, if (qcow2_get_cluster_type(l2_entry) == QCOW2_CLUSTER_ZERO_ALLOC) { -fprintf(stderr, "%s offset=%" PRIx64 ": Preallocated zero " -"cluster is not properly aligned; L2 entry " -"corrupted.\n", -fix & BDRV_FIX_ERRORS ? "Repairing" : "ERROR", +ret = fix_l2_entry_to_zero( +bs, res, fix, l2_offset, i, active, +"offset=%" PRIx64 ": Preallocated zero cluster is " +"not properly aligned; L2 entry corrupted.", offset); -if (fix & BDRV_FIX_ERRORS) { -uint64_t l2e_offset = -l2_offset + (uint64_t)i * sizeof(uint64_t); -int ign = active ? QCOW2_OL_ACTIVE_L2 : - QCOW2_OL_INACTIVE_L2; - -l2_entry = QCOW_OFLAG_ZERO; -l2_table[i] = cpu_to_be64(l2_entry); -ret = qcow2_pre_write_overlap_check(bs, ign, -
[Qemu-devel] [PATCH 3/7] block/qcow2-refcount: check_refcounts_l2: refactor compressed case
Separate offset and size of compressed cluster. Signed-off-by: Vladimir Sementsov-Ogievskiy --- block/qcow2-refcount.c | 15 ++- 1 file changed, 10 insertions(+), 5 deletions(-) diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c index 28d21bedc3..42167b7040 100644 --- a/block/qcow2-refcount.c +++ b/block/qcow2-refcount.c @@ -1564,7 +1564,7 @@ static int check_refcounts_l2(BlockDriverState *bs, BdrvCheckResult *res, BDRVQcow2State *s = bs->opaque; uint64_t *l2_table, l2_entry; uint64_t next_contiguous_offset = 0; -int i, l2_size, nb_csectors, ret; +int i, l2_size, ret; /* Read L2 table from disk */ l2_size = s->l2_size * sizeof(uint64_t); @@ -1583,6 +1583,9 @@ static int check_refcounts_l2(BlockDriverState *bs, BdrvCheckResult *res, switch (qcow2_get_cluster_type(l2_entry)) { case QCOW2_CLUSTER_COMPRESSED: +{ +int64_t csize, coffset; + /* Compressed clusters don't have QCOW_OFLAG_COPIED */ if (l2_entry & QCOW_OFLAG_COPIED) { fprintf(stderr, "ERROR: coffset=0x%" PRIx64 ": " @@ -1593,12 +1596,13 @@ static int check_refcounts_l2(BlockDriverState *bs, BdrvCheckResult *res, } /* Mark cluster as used */ -nb_csectors = ((l2_entry >> s->csize_shift) & - s->csize_mask) + 1; -l2_entry &= s->cluster_offset_mask; +csize = (((l2_entry >> s->csize_shift) & s->csize_mask) + 1) * +BDRV_SECTOR_SIZE; +coffset = l2_entry & s->cluster_offset_mask & + ~(BDRV_SECTOR_SIZE - 1); ret = qcow2_inc_refcounts_imrt(bs, res, refcount_table, refcount_table_size, - l2_entry & ~511, nb_csectors * 512); + coffset, csize); if (ret < 0) { goto fail; } @@ -1615,6 +1619,7 @@ static int check_refcounts_l2(BlockDriverState *bs, BdrvCheckResult *res, res->bfi.fragmented_clusters++; } break; +} case QCOW2_CLUSTER_ZERO_ALLOC: case QCOW2_CLUSTER_NORMAL: -- 2.11.1
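Below is a standalone C sketch of the `csize`/`coffset` computation this patch introduces. It assumes the compressed-cluster descriptor layout from the qcow2 specification: with x = 62 - (cluster_bits - 8), bits 0..x-1 hold the host offset and bits x..61 hold the number of additional 512-byte sectors. `decode_compressed` is a hypothetical name, and `SECTOR_SIZE` stands in for BDRV_SECTOR_SIZE.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SIZE 512  /* stands in for BDRV_SECTOR_SIZE */

/* Decode a compressed-cluster L2 descriptor into host offset and byte size. */
static void decode_compressed(uint64_t l2_entry, int cluster_bits,
                              uint64_t *coffset, uint64_t *csize)
{
    int csize_shift = 62 - (cluster_bits - 8);
    uint64_t csize_mask = (1ULL << (cluster_bits - 8)) - 1;
    uint64_t offset_mask = (1ULL << csize_shift) - 1;  /* excludes flag bits */

    /* stored value is "sectors - 1", so add one before scaling to bytes */
    *csize = (((l2_entry >> csize_shift) & csize_mask) + 1) * SECTOR_SIZE;
    /* the patch rounds the offset down to a sector boundary, as before */
    *coffset = l2_entry & offset_mask & ~(uint64_t)(SECTOR_SIZE - 1);
}
```

With 64 KiB clusters the sector count field is 8 bits wide, so it can encode up to 256 sectors (128 KiB), which is more than one cluster. That is why a later patch in the series rejects `csize > s->cluster_size`.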
Re: [Qemu-devel] [PATCH v5 1/6] nbd/server: fix trace
On 06/09/2018 10:17 AM, Vladimir Sementsov-Ogievskiy wrote: Return code = 1 doesn't mean that we parsed base:allocation. Use correct traces in both -parsed and -skipped cases. Signed-off-by: Vladimir Sementsov-Ogievskiy --- nbd/server.c | 9 +++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/nbd/server.c b/nbd/server.c index 9e1f227178..8e02e077ec 100644 --- a/nbd/server.c +++ b/nbd/server.c @@ -741,7 +741,10 @@ static int nbd_negotiate_send_meta_context(NBDClient *client, * the current name, after the 'base:' portion has been stripped. * * Return -errno on I/O error, 0 if option was completely handled by - * sending a reply about inconsistent lengths, or 1 on success. */ + * sending a reply about inconsistent lengths, or 1 on success. + * + * Note: return code = 1 doesn't mean that we've parsed "base:allocation" + * namespace. It only means that there are no errors.*/ Space before comment tail (actually, the recent conversation on comment style says the tail should be on its own line...) That's something I can tweak on commit. Reviewed-by: Eric Blake -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3266 Virtualization: qemu.org | libvirt.org
[Qemu-devel] [PATCH 7/7] block/qcow2-refcount: fix out-of-file L2 entries to be read-as-zero
Rewrite corrupted L2 table entries that reference space beyond the end of the underlying file, making them read as all zeros without any allocation. Signed-off-by: Vladimir Sementsov-Ogievskiy --- block/qcow2-refcount.c | 32 1 file changed, 32 insertions(+) diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c index 3c9e2da39e..cbad8355f3 100644 --- a/block/qcow2-refcount.c +++ b/block/qcow2-refcount.c @@ -1714,8 +1714,30 @@ static int check_refcounts_l2(BlockDriverState *bs, BdrvCheckResult *res, /* Mark cluster as used */ csize = (((l2_entry >> s->csize_shift) & s->csize_mask) + 1) * BDRV_SECTOR_SIZE; +if (csize > s->cluster_size) { +ret = fix_l2_entry_to_zero( +bs, res, fix, l2_offset, i, active, +"compressed cluster larger than cluster: size 0x%" +PRIx64, csize); +if (ret < 0) { +goto fail; +} +continue; +} + coffset = l2_entry & s->cluster_offset_mask & ~(BDRV_SECTOR_SIZE - 1); +if (coffset >= bdrv_getlength(bs->file->bs)) { +ret = fix_l2_entry_to_zero( +bs, res, fix, l2_offset, i, active, +"compressed cluster out of file: offset 0x%" PRIx64, +coffset); +if (ret < 0) { +goto fail; +} +continue; +} + ret = qcow2_inc_refcounts_imrt(bs, res, refcount_table, refcount_table_size, coffset, csize); @@ -1742,6 +1764,16 @@ static int check_refcounts_l2(BlockDriverState *bs, BdrvCheckResult *res, { uint64_t offset = l2_entry & L2E_OFFSET_MASK; +if (offset >= bdrv_getlength(bs->file->bs)) { +ret = fix_l2_entry_to_zero( +bs, res, fix, l2_offset, i, active, +"cluster out of file: offset 0x%" PRIx64, offset); +if (ret < 0) { +goto fail; +} +continue; +} + if (flags & CHECK_FRAG_INFO) { res->bfi.allocated_clusters++; if (next_contiguous_offset && -- 2.11.1
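The two new checks on compressed descriptors can be read as a single predicate. In this sketch, `compressed_entry_is_corrupt` is hypothetical. Unlike the patch, it takes a file length that has already been validated as non-negative, which sidesteps the bdrv_getlength() failure case raised in the review of patch 2/7.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Should a compressed L2 descriptor be rewritten to read-as-zero?
 * Precondition (assumed): file_len is a valid, already-checked length.
 */
static bool compressed_entry_is_corrupt(uint64_t coffset, uint64_t csize,
                                        uint64_t cluster_size,
                                        uint64_t file_len)
{
    if (csize > cluster_size) {
        return true;              /* compressed data larger than a cluster */
    }
    return coffset >= file_len;   /* data would start past the end of file */
}
```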
[Qemu-devel] [PATCH 1/7] block/qcow2-refcount: fix check_oflag_copied
Increase corruptions_fixed only after successful fix. Signed-off-by: Vladimir Sementsov-Ogievskiy --- block/qcow2-refcount.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c index 18c729aa27..f9d095aa2d 100644 --- a/block/qcow2-refcount.c +++ b/block/qcow2-refcount.c @@ -1816,7 +1816,7 @@ static int check_oflag_copied(BlockDriverState *bs, BdrvCheckResult *res, for (i = 0; i < s->l1_size; i++) { uint64_t l1_entry = s->l1_table[i]; uint64_t l2_offset = l1_entry & L1E_OFFSET_MASK; -bool l2_dirty = false; +int l2_fixed_entries = 0; if (!l2_offset) { continue; @@ -1878,8 +1878,7 @@ static int check_oflag_copied(BlockDriverState *bs, BdrvCheckResult *res, l2_table[j] = cpu_to_be64(refcount == 1 ? l2_entry | QCOW_OFLAG_COPIED : l2_entry & ~QCOW_OFLAG_COPIED); -l2_dirty = true; -res->corruptions_fixed++; +l2_fixed_entries++; } else { res->corruptions++; } @@ -1887,7 +1886,7 @@ static int check_oflag_copied(BlockDriverState *bs, BdrvCheckResult *res, } } -if (l2_dirty) { +if (l2_fixed_entries > 0) { ret = qcow2_pre_write_overlap_check(bs, QCOW2_OL_ACTIVE_L2, l2_offset, s->cluster_size); if (ret < 0) { @@ -1905,6 +1904,7 @@ static int check_oflag_copied(BlockDriverState *bs, BdrvCheckResult *res, res->check_errors++; goto fail; } +res->corruptions_fixed += l2_fixed_entries; } } -- 2.11.1
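In essence, the patch defers the counter update until the L2 table write succeeds. Below is a minimal sketch of that accounting. The `struct check_counters` type stands in for BdrvCheckResult, and `write_ok` stands for the outcome of the overlap check and bdrv_pwrite(); both are hypothetical simplifications.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the relevant BdrvCheckResult counters. */
struct check_counters {
    int corruptions;
    int corruptions_fixed;
    int check_errors;
};

/* Commit the per-table fix count only once the table has been written. */
static void account_l2_fixes(struct check_counters *res, int fixed_entries,
                             bool write_ok)
{
    if (fixed_entries == 0) {
        return;                   /* nothing was modified in this table */
    }
    if (write_ok) {
        res->corruptions_fixed += fixed_entries;
    } else {
        res->check_errors++;      /* the patch also jumps to 'fail' here */
    }
}
```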
[Qemu-devel] [PATCH 6/7] block/qcow2-refcount: fix out-of-file L1 entries to be zero
Zero out corrupted L1 table entries that reference an L2 table beyond the end of the underlying file. A zero L1 table entry means that "the L2 table and all clusters described by this L2 table are unallocated." Signed-off-by: Vladimir Sementsov-Ogievskiy --- block/qcow2-refcount.c | 37 + 1 file changed, 37 insertions(+) diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c index d993252fb6..3c9e2da39e 100644 --- a/block/qcow2-refcount.c +++ b/block/qcow2-refcount.c @@ -1641,6 +1641,29 @@ static int fix_l2_entry_to_zero(BlockDriverState *bs, BdrvCheckResult *res, return ret; } +/* Zero out L1 entry + * + * Returns: -errno if overlap check failed + * 0 if write failed + * 1 on success + */ +static int fix_l1_entry_to_zero(BlockDriverState *bs, BdrvCheckResult *res, +BdrvCheckMode fix, int64_t l1_offset, +int l1_index, bool active, +const char *fmt, ...) +{ +int ret; +int ign = active ? QCOW2_OL_ACTIVE_L2 : QCOW2_OL_INACTIVE_L2; +va_list args; + +va_start(args, fmt); +ret = fix_table_entry(bs, res, fix, "L1", l1_offset, l1_index, 0, ign, + fmt, args); +va_end(args); + +return ret; +} + /* * Increases the refcount in the given refcount table for the all clusters * referenced in the L2 table. While doing so, performs some checks on L2 @@ -1837,6 +1860,20 @@ static int check_refcounts_l1(BlockDriverState *bs, if (l2_offset) { /* Mark L2 table as used */ l2_offset &= L1E_OFFSET_MASK; +if (l2_offset >= bdrv_getlength(bs->file->bs)) { +ret = fix_l1_entry_to_zero( +bs, res, fix, l1_table_offset, i, active, +"l2 table offset out of file: offset 0x%" PRIx64, +l2_offset); +if (ret < 0) { +/* Something is seriously wrong, so abort checking + * this L1 table */ +goto fail; +} + +continue; +} + ret = qcow2_inc_refcounts_imrt(bs, res, refcount_table, refcount_table_size, l2_offset, s->cluster_size); -- 2.11.1
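The new L1 check masks the entry and compares the resulting L2 table offset with the file length. The sketch below assumes L1E_OFFSET_MASK has the value it has in block/qcow2.h (bits 9..55). It also assumes the caller already holds a valid, non-negative file length. `l1_entry_points_past_eof` is hypothetical. A zero offset is treated as valid, because check_refcounts_l1() only processes non-zero entries.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed value of L1E_OFFSET_MASK (bits 9..55 hold the L2 table offset). */
#define L1E_OFFSET_MASK 0x00fffffffffffe00ULL

static bool l1_entry_points_past_eof(uint64_t l1_entry, uint64_t file_len)
{
    uint64_t l2_offset = l1_entry & L1E_OFFSET_MASK;  /* drops flag bits */

    /* offset 0 means "unallocated", which is not a corruption */
    return l2_offset != 0 && l2_offset >= file_len;
}
```

Bit 63 (the copied flag) is stripped by the mask, so it does not affect the comparison.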
[Qemu-devel] [PATCH 4/7] block/qcow2-refcount: check_refcounts_l2: reduce ignored overlaps
Reduce number of structures ignored in overlap check: when checking active table ignore active tables, when checking inactive table ignore inactive ones. Signed-off-by: Vladimir Sementsov-Ogievskiy --- block/qcow2-refcount.c | 16 +--- 1 file changed, 9 insertions(+), 7 deletions(-) diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c index 42167b7040..02583f260b 100644 --- a/block/qcow2-refcount.c +++ b/block/qcow2-refcount.c @@ -1559,7 +1559,7 @@ enum { static int check_refcounts_l2(BlockDriverState *bs, BdrvCheckResult *res, void **refcount_table, int64_t *refcount_table_size, int64_t l2_offset, - int flags, BdrvCheckMode fix) + int flags, BdrvCheckMode fix, bool active) { BDRVQcow2State *s = bs->opaque; uint64_t *l2_table, l2_entry; @@ -1648,11 +1648,12 @@ static int check_refcounts_l2(BlockDriverState *bs, BdrvCheckResult *res, if (fix & BDRV_FIX_ERRORS) { uint64_t l2e_offset = l2_offset + (uint64_t)i * sizeof(uint64_t); +int ign = active ? QCOW2_OL_ACTIVE_L2 : + QCOW2_OL_INACTIVE_L2; l2_entry = QCOW_OFLAG_ZERO; l2_table[i] = cpu_to_be64(l2_entry); -ret = qcow2_pre_write_overlap_check(bs, -QCOW2_OL_ACTIVE_L2 | QCOW2_OL_INACTIVE_L2, +ret = qcow2_pre_write_overlap_check(bs, ign, l2e_offset, sizeof(uint64_t)); if (ret < 0) { fprintf(stderr, "ERROR: Overlap check failed\n"); @@ -1726,7 +1727,7 @@ static int check_refcounts_l1(BlockDriverState *bs, void **refcount_table, int64_t *refcount_table_size, int64_t l1_table_offset, int l1_size, - int flags, BdrvCheckMode fix) + int flags, BdrvCheckMode fix, bool active) { BDRVQcow2State *s = bs->opaque; uint64_t *l1_table = NULL, l2_offset, l1_size2; @@ -1782,7 +1783,7 @@ static int check_refcounts_l1(BlockDriverState *bs, /* Process and check L2 entries */ ret = check_refcounts_l2(bs, res, refcount_table, refcount_table_size, l2_offset, flags, - fix); + fix, active); if (ret < 0) { goto fail; } @@ -2068,7 +2069,7 @@ static int calculate_refcounts(BlockDriverState *bs, BdrvCheckResult *res, /* current L1 table */ 
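This repair path computes two values: the byte offset of entry `i` in an L2 table, and the overlap-ignore mask, which now depends on whether the table is active. The sketch below reproduces both. `l2_entry_offset`, `l2_ignore_mask` and the `OL_*` constants are hypothetical stand-ins, and the bit positions are illustrative rather than copied from block/qcow2.h. Ignoring fewer metadata types makes the overlap check stricter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit values; the real QCOW2_OL_* flags live in block/qcow2.h. */
enum {
    OL_ACTIVE_L2   = 1 << 2,
    OL_INACTIVE_L2 = 1 << 7,
};

/* Byte offset of 64-bit entry 'i' in a table at 'l2_offset'.  Casting
 * before the multiply keeps the arithmetic 64-bit on 32-bit hosts. */
static uint64_t l2_entry_offset(uint64_t l2_offset, int i)
{
    return l2_offset + (uint64_t)i * sizeof(uint64_t);
}

/* Before the patch: every repair ignored both kinds of L2 table. */
static int l2_ignore_mask_before(void)
{
    return OL_ACTIVE_L2 | OL_INACTIVE_L2;
}

/* After the patch: only the kind of table being repaired is ignored. */
static int l2_ignore_mask(bool active)
{
    return active ? OL_ACTIVE_L2 : OL_INACTIVE_L2;
}
```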
ret = check_refcounts_l1(bs, res, refcount_table, nb_clusters, s->l1_table_offset, s->l1_size, CHECK_FRAG_INFO, - fix); + fix, true); if (ret < 0) { return ret; } @@ -2091,7 +2092,8 @@ static int calculate_refcounts(BlockDriverState *bs, BdrvCheckResult *res, continue; } ret = check_refcounts_l1(bs, res, refcount_table, nb_clusters, - sn->l1_table_offset, sn->l1_size, 0, fix); + sn->l1_table_offset, sn->l1_size, 0, fix, + false); if (ret < 0) { return ret; } -- 2.11.1
[Qemu-devel] [PATCH 0/7] qcow2 check improvements
Hi all! We've hit the following problem: after host filesystem corruption, VM images became invalid. Interestingly, running qemu-img check on them allocated all of the host RAM until the OOM killer terminated qemu-img. This was caused by corrupted L2 entries that referenced clusters far beyond the end of the qcow2 file. Patch 02 is a generic fix for the bug, 01 is an unrelated improvement, and 03-07 report more information about such corrupted table entries and fix them. Questions on 02, 06 and 07: 1. Should the restrictions be more or less strict? 2. Are there valid cases where such entries should not be considered corrupted? Vladimir Sementsov-Ogievskiy (7): block/qcow2-refcount: fix check_oflag_copied block/qcow2-refcount: avoid eating RAM block/qcow2-refcount: check_refcounts_l2: refactor compressed case block/qcow2-refcount: check_refcounts_l2: reduce ignored overlaps block/qcow2-refcount: check_refcounts_l2: split fix_l2_entry_to_zero block/qcow2-refcount: fix out-of-file L1 entries to be zero block/qcow2-refcount: fix out-of-file L2 entries to be read-as-zero block/qcow2-refcount.c | 257 +++-- 1 file changed, 206 insertions(+), 51 deletions(-) -- 2.11.1
[Qemu-devel] [PATCH 2/7] block/qcow2-refcount: avoid eating RAM
qcow2_inc_refcounts_imrt() (through realloc_refcount_array()) can consume an unpredictable amount of memory on corrupted table entries that reference regions far beyond the end of the file. Prevent this by skipping such regions during further processing. Signed-off-by: Vladimir Sementsov-Ogievskiy --- block/qcow2-refcount.c | 8 1 file changed, 8 insertions(+) diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c index f9d095aa2d..28d21bedc3 100644 --- a/block/qcow2-refcount.c +++ b/block/qcow2-refcount.c @@ -1505,6 +1505,14 @@ int qcow2_inc_refcounts_imrt(BlockDriverState *bs, BdrvCheckResult *res, return 0; } +if (offset + size - bdrv_getlength(bs->file->bs) > s->cluster_size) { +fprintf(stderr, "ERROR: counting reference for region exceeding the " +"end of the file by more than one cluster: offset 0x%" PRIx64 +" size 0x%" PRIx64 "\n", offset, size); +res->corruptions++; +return 0; +} + start = start_of_cluster(s, offset); last = start_of_cluster(s, offset + size - 1); for(cluster_offset = start; cluster_offset <= last; -- 2.11.1
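The sketch below estimates how large a per-cluster refcount array must become to cover a given host offset. It assumes one refcount of `refcount_bits` per cluster and ignores any rounding that realloc_refcount_array() may do, so treat the result as a rough lower bound. `refcount_array_bytes` is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes needed for one refcount_bits-wide entry per cluster up to max_offset. */
static uint64_t refcount_array_bytes(uint64_t max_offset, int cluster_bits,
                                     int refcount_bits)
{
    uint64_t clusters = (max_offset >> cluster_bits) + 1;
    return clusters * (uint64_t)refcount_bits / 8;
}
```

Consider a single corrupted entry pointing at 1 PiB (2^50) in an image with 64 KiB clusters and 16-bit refcounts. That entry alone implies an array of about 2^35 bytes, or 32 GiB, which is consistent with the OOM kill described for this series.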
Re: [Qemu-devel] [virtio-dev] Re: [PATCH 2/3] Add "Group Identifier" support to PCIe bridges.
On Tue, Jun 19, 2018 at 01:14:06PM -0500, Venu Busireddy wrote: > On 2018-06-19 20:24:12 +0300, Michael S. Tsirkin wrote: > > On Tue, Jun 19, 2018 at 11:32:26AM -0500, Venu Busireddy wrote: > > > Add a "Vendor-Specific" capability to the PCIe bridge, to contain the > > > "Group Identifier" (UUID) that will be used to pair a virtio device with > > > the passthrough device attached to that bridge. > > > > > > This capability is added to the bridge iff the "uuid" option is specified > > > for the bridge device, via the qemu command line. Also, the bridge's > > > Device ID is changed to PCI_VENDOR_ID_REDHAT, and Vendor ID is changed > > > to PCI_DEVICE_ID_REDHAT_PCIE_BRIDGE (from the default values), when the > > > "uuid" option is present. > > > > > > Signed-off-by: Venu Busireddy > > > > I don't see why we should add it to all bridges. > > Let's just add it to ones that already have the RH vendor ID? > > No. I am not adding the capability to all bridges. > > In the earlier discussions, we agreed that the bridge be left as > Intel bridge if we do not intend to use it for storing the pairing > information. If we do intend to store the pairing information in the > bridge, we wanted to change the bridge's Vendor ID to RH Vendor ID to > avoid confusion. In other words, bridge's with RH Vendor ID come into > existence only when there is an intent to store the pairing information > in the bridge. > > Accordingly, if the "uuid" option is specified for the bridge, it > is assumed that the user intends to use the bridge for storing the > pairing information, and hence, the capability is added to the bridge, > and the Vendor ID is changed to RH Vendor ID. If the "uuid" option > is not specified, the bridge remains as Intel bridge, and without the > vendor-specific capability. > > Venu Yes but the way to do it is not to tweak the vendor and device ID, instead, just add the UUID property to bridges that already have the correct vendor and device id. 
> > > > > > > --- > > > hw/pci-bridge/ioh3420.c| 2 ++ > > > hw/pci-bridge/pcie_root_port.c | 7 +++ > > > hw/pci/pci_bridge.c| 32 > > > include/hw/pci/pci.h | 2 ++ > > > include/hw/pci/pcie.h | 1 + > > > include/hw/pci/pcie_port.h | 1 + > > > 6 files changed, 45 insertions(+) > > > > > > diff --git a/hw/pci-bridge/ioh3420.c b/hw/pci-bridge/ioh3420.c > > > index a451d74ee6..b6b9ebc726 100644 > > > --- a/hw/pci-bridge/ioh3420.c > > > +++ b/hw/pci-bridge/ioh3420.c > > > @@ -35,6 +35,7 @@ > > > #define IOH_EP_MSI_SUPPORTED_FLAGS PCI_MSI_FLAGS_MASKBIT > > > #define IOH_EP_MSI_NR_VECTOR2 > > > #define IOH_EP_EXP_OFFSET 0x90 > > > +#define IOH_EP_VENDOR_OFFSET0xCC > > > #define IOH_EP_AER_OFFSET 0x100 > > > > > > /* > > > @@ -111,6 +112,7 @@ static void ioh3420_class_init(ObjectClass *klass, > > > void *data) > > > rpc->exp_offset = IOH_EP_EXP_OFFSET; > > > rpc->aer_offset = IOH_EP_AER_OFFSET; > > > rpc->ssvid_offset = IOH_EP_SSVID_OFFSET; > > > +rpc->vendor_offset = IOH_EP_VENDOR_OFFSET; > > > rpc->ssid = IOH_EP_SSVID_SSID; > > > } > > > > > > diff --git a/hw/pci-bridge/pcie_root_port.c > > > b/hw/pci-bridge/pcie_root_port.c > > > index 45f9e8cd4a..ba470c7fda 100644 > > > --- a/hw/pci-bridge/pcie_root_port.c > > > +++ b/hw/pci-bridge/pcie_root_port.c > > > @@ -71,6 +71,12 @@ static void rp_realize(PCIDevice *d, Error **errp) > > > goto err_bridge; > > > } > > > > > > +rc = pci_bridge_vendor_init(d, rpc->vendor_offset, errp); > > > +if (rc < 0) { > > > +error_append_hint(errp, "Can't init group ID, error %d\n", rc); > > > +goto err_bridge; > > > +} > > > + > > > if (rpc->interrupts_init) { > > > rc = rpc->interrupts_init(d, errp); > > > if (rc < 0) { > > > @@ -137,6 +143,7 @@ static void rp_exit(PCIDevice *d) > > > static Property rp_props[] = { > > > DEFINE_PROP_BIT(COMPAT_PROP_PCP, PCIDevice, cap_present, > > > QEMU_PCIE_SLTCAP_PCP_BITNR, true), > > > +DEFINE_PROP_UUID(COMPAT_PROP_UUID, PCIDevice, uuid, false), > > > DEFINE_PROP_END_OF_LIST() > > > }; > > > > > > diff 
--git a/hw/pci/pci_bridge.c b/hw/pci/pci_bridge.c > > > index 40a39f57cb..c63bc439f7 100644 > > > --- a/hw/pci/pci_bridge.c > > > +++ b/hw/pci/pci_bridge.c > > > @@ -34,12 +34,17 @@ > > > #include "hw/pci/pci_bus.h" > > > #include "qemu/range.h" > > > #include "qapi/error.h" > > > +#include "qemu/uuid.h" > > > > > > /* PCI bridge subsystem vendor ID helper functions */ > > > #define PCI_SSVID_SIZEOF8 > > > #define PCI_SSVID_SVID 4 > > > #define PCI_SSVID_SSID 6 > > > > > > +#define PCI_VENDOR_SIZEOF 20 > > > +#define PCI_VENDOR_CAP_LEN_OFFSET 2 > > > +#define PCI_VENDOR_GROUP_ID_OFFSET 4 > > > + > > > int pci_bridge_ssvid_init(PCIDevice *dev, uint8_t offset, >
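This sketch assumes the capability follows the generic PCI vendor-specific layout: ID 0x09 at byte 0, the next pointer at byte 1, and the capability length at byte 2. Under that assumption, the offsets defined in the patch put the 16-byte UUID at bytes 4..19 of a 20-byte capability. The sketch writes that layout into a plain 256-byte config-space array. `write_group_id_cap` is hypothetical and does not use QEMU's pci_add_capability(). Byte 3 is simply left zero here.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PCI_CAP_ID_VNDR            0x09  /* vendor-specific capability ID */
#define PCI_VENDOR_SIZEOF          20    /* values from the quoted patch */
#define PCI_VENDOR_CAP_LEN_OFFSET  2
#define PCI_VENDOR_GROUP_ID_OFFSET 4

/* Lay out the group-ID capability at 'offset' in a config-space copy. */
static void write_group_id_cap(uint8_t config[256], uint8_t offset,
                               uint8_t next, const uint8_t uuid[16])
{
    uint8_t *cap = config + offset;

    cap[0] = PCI_CAP_ID_VNDR;
    cap[1] = next;                                   /* next capability */
    cap[PCI_VENDOR_CAP_LEN_OFFSET] = PCI_VENDOR_SIZEOF;
    cap[3] = 0;                                      /* unused in this sketch */
    memcpy(cap + PCI_VENDOR_GROUP_ID_OFFSET, uuid, 16);
}
```

With IOH_EP_VENDOR_OFFSET (0xCC) from the patch, the capability occupies 0xCC..0xDF.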
Re: [Qemu-devel] [PATCH v2] migration: fix crash in when incoming client channel setup fails
On 06/19/2018 11:35 AM, Daniel P. Berrangé wrote: The way we determine if we can start the incoming migration was changed to use migration_has_all_channels() in: commit 428d89084c709e568f9cd301c2f6416a54c53d6d Author: Juan Quintela Date: Mon Jul 24 13:06:25 2017 +0200 migration: Create migration_has_all_channels This method in turn calls multifd_recv_all_channels_created() which is hardcoded to always return 'true' when multifd is not in use. This is a latent bug... ...activated in in a following commit where that return result s/in in/in/ -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3266 Virtualization: qemu.org | libvirt.org
Re: [Qemu-devel] [PATCH 00/113] Patch Round-up for stable 2.11.2, freeze on 2018-06-22
On 06/18/2018 09:41 PM, Michael Roth wrote: > Hi everyone, > > The following new patches are queued for QEMU stable v2.11.2: > > https://github.com/mdroth/qemu/commits/stable-2.11-staging > > The release is planned for 2018-06-22: > > https://wiki.qemu.org/Planning/2.11 > > Please respond here or CC qemu-sta...@nongnu.org on any patches you > think should be included in the release. > > Thanks! > Extra patches we are carrying in Fedora 28: commit f7a5376d4b667cf6c83c1d640e32d22456d7b5ee Author: Daniel P. Berrange Date: Tue Jan 16 13:42:10 2018 + qapi: ensure stable sort ordering when checking QAPI entities commit 057ad0b46992e3ec4ce29b9103162aa3c683f347 Author: Daniel P. Berrangé Date: Wed Feb 28 14:04:38 2018 + crypto: ensure we use a predictable TLS priority setting Thanks, Cole
Re: [Qemu-devel] [virtio-dev] Re: [PATCH virtio 1/1] Add "Group Identifier" support to virtio PCI capabilities.
On 2018-06-19 21:12:17 +0300, Michael S. Tsirkin wrote: > On Tue, Jun 19, 2018 at 12:54:06PM -0500, Venu Busireddy wrote: > > On 2018-06-19 20:30:06 +0300, Michael S. Tsirkin wrote: > > > On Tue, Jun 19, 2018 at 11:32:28AM -0500, Venu Busireddy wrote: > > > > Add VIRTIO_PCI_CAP_GROUP_ID_CFG (Group Identifier) capability to the > > > > virtio PCI capabilities to allow for the grouping of devices. > > > > > > > > Signed-off-by: Venu Busireddy > > > > --- > > > > content.tex | 43 +++ > > > > 1 file changed, 43 insertions(+) > > > > > > > > diff --git a/content.tex b/content.tex > > > > index 7a92cb1..7ea6267 100644 > > > > --- a/content.tex > > > > +++ b/content.tex > > > > @@ -599,6 +599,8 @@ The fields are interpreted as follows: > > > > #define VIRTIO_PCI_CAP_DEVICE_CFG 4 > > > > /* PCI configuration access */ > > > > #define VIRTIO_PCI_CAP_PCI_CFG 5 > > > > +/* Group Identifier */ > > > > +#define VIRTIO_PCI_CAP_GROUP_ID_CFG 6 > > > > \end{lstlisting} > > > > > > > > Any other value is reserved for future use. > > > > @@ -997,6 +999,47 @@ address \field{cap.length} bytes within a BAR range > > > > specified by some other Virtio Structure PCI Capability > > > > of type other than \field{VIRTIO_PCI_CAP_PCI_CFG}. > > > > > > > > +\subsubsection{Group Identifier capability}\label{sec:Virtio Transport > > > > Options / Virtio Over PCI Bus / PCI Device Layout / Group Identifier > > > > capability} > > > > + > > > > +The VIRTIO_PCI_CAP_GROUP_ID_CFG capability provides means for grouping > > > > devices together. > > > > + > > > > +The capability is immediately followed by an identifier of arbitrary > > > > size as below: > > > > + > > > > +\begin{lstlisting} > > > > +struct virtio_pci_group_id_cap { > > > > +struct virtio_pci_cap cap; > > > > +u8 group_id[]; /* Group Identifier */ > > > > +}; > > > > +\end{lstlisting} > > > > + > > > > +The fields \field{cap.bar}, \field{cap.length}, \field{cap.offset} > > > > +and \field{group_id} are read-only for the driver.
> > > > + > > > > +The specification does not impose any restrictions on the size or > > > > +structure of group_id[]. > > > > > > I think it must be a multiple of 4 in size, as is > > > standard for all capabilities. > > > > Sure. Would rephrasing it as below suffice? > > > > The specification does not impose any restrictions on the size or > > structure of group_id[], except that the size must be a multiple of 4. > > > > > > > > > > > > Vendors > > Devices Will correct it in the next version. > > > are free to declare this array as > > > > +large as needed, as long as the combined size of all capabilities can > > > > +be accommodated within the PCI configuration space. > > > > + > > > > +If there is enough room in the PCI configuration space to accommodate > > > > +the group identifier, the fields \field{cap.bar}, \field{cap.offset} > > > > +and \field{cap.length} should be set to 0. > > > > + > > > > +If there isn't enough room, some or all of the group identifier can be > > > > +presented in the BAR region, in which case the fields \field{cap.bar}, > > > > +\field{cap.offset} and \field{cap.length} should be set appropriately. > > > > > > And then how do you glue the two pieces? > > > > How the user glues them up is up to the user. The specification should > > not impose rules on that, right? > > We need to define how these are matched. > Let's assume device A has it all in config space, device B > has part in memory. How would we compare them? I will go with your suggestion below, and hence, this becomes obsolete. > > > > > > > > > > + > > > > +In either case, the field \field{cap.cap_len} indicates the length of > > > > +the group identifier information present in the configuration space > > > > +itself. > > > > > > It seems like an overkill to me. Isn't it enough to have it in config > > > space? This would make comparisons easier. 
> > > > I was trying to make the proposal permissive for expansion, in case > > the user needs the size to be larger than what can be accommodated in > > the config space. Would you like me to restrict that the capability be > > entirely present in the config space? I am fine with it. Please confirm, > > and I will change it so. > > I think so, yes. Sure. I will revise the specification as above in the next version. Thanks, Venu > > > > > > > > + > > > > +\devicenormative{\paragraph}{Group Identifier capability}{Virtio > > > > Transport Options / Virtio Over PCI Bus / PCI Device Layout / Group > > > > Identifier capability} > > > > + > > > > +The device MAY present the VIRTIO_PCI_CAP_GROUP_ID_CFG capability. > > > > + > > > > +\drivernormative{\paragraph}{Group Identifier capability}{Virtio > > > > Transport Options / Virtio Over PCI Bus / PCI Device Layout / Group > > > > Identifier capability} > > > > + > > > > +The driver MUST NOT write to group_id[] area or the BAR region. > > > >
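Under the revision agreed in this exchange (the group ID lives entirely in config space and its size is a multiple of 4), a driver could validate the capability roughly as in the minimal C sketch below. The 16-byte header offsets follow `struct virtio_pci_cap` from the virtio 1.0 specification; the function name `parse_group_id_cap()` is hypothetical, and rejecting a nonzero `bar`/`offset`/`length` is my reading of the revised proposal, not settled spec text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PCI_CAP_ID_VNDR             0x09 /* generic PCI vendor-specific cap */
#define VIRTIO_PCI_CAP_GROUP_ID_CFG 6    /* value proposed in this patch */
#define VIRTIO_PCI_CAP_HDR_LEN      16   /* sizeof(struct virtio_pci_cap) */

/* Read a little-endian 32-bit field from raw config-space bytes. */
static uint32_t le32_at(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Hypothetical parser. Header layout: 0 cap_vndr, 1 cap_next, 2 cap_len,
 * 3 cfg_type, 4 bar, 5..7 padding, 8..11 offset, 12..15 length.
 * On success stores where group_id[] starts and how long it is. */
static int parse_group_id_cap(const uint8_t *cap, const uint8_t **id,
                              size_t *id_len)
{
    uint8_t cap_len;

    assert(cap != NULL && id != NULL && id_len != NULL);
    cap_len = cap[2];
    if (cap[0] != PCI_CAP_ID_VNDR || cap[3] != VIRTIO_PCI_CAP_GROUP_ID_CFG) {
        return -1; /* not a group-ID capability */
    }
    if (cap_len <= VIRTIO_PCI_CAP_HDR_LEN || cap_len % 4 != 0) {
        return -1; /* empty identifier, or size not a multiple of 4 */
    }
    if (cap[4] != 0 || le32_at(cap + 8) != 0 || le32_at(cap + 12) != 0) {
        return -1; /* nothing may be placed in a BAR */
    }
    *id = cap + VIRTIO_PCI_CAP_HDR_LEN;
    *id_len = (size_t)cap_len - VIRTIO_PCI_CAP_HDR_LEN;
    return 0;
}
```

Because the header is itself 16 bytes, requiring `cap_len % 4 == 0` is the same as requiring `group_id[]` to be a multiple of 4 bytes.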
Re: [Qemu-devel] [virtio-dev] Re: [PATCH 2/3] Add "Group Identifier" support to PCIe bridges.
On 2018-06-19 20:24:12 +0300, Michael S. Tsirkin wrote: > On Tue, Jun 19, 2018 at 11:32:26AM -0500, Venu Busireddy wrote: > > Add a "Vendor-Specific" capability to the PCIe bridge, to contain the > > "Group Identifier" (UUID) that will be used to pair a virtio device with > > the passthrough device attached to that bridge. > > > > This capability is added to the bridge iff the "uuid" option is specified > > for the bridge device, via the qemu command line. Also, the bridge's > > Device ID is changed to PCI_VENDOR_ID_REDHAT, and Vendor ID is changed > > to PCI_DEVICE_ID_REDHAT_PCIE_BRIDGE (from the default values), when the > > "uuid" option is present. > > > > Signed-off-by: Venu Busireddy > > I don't see why we should add it to all bridges. > Let's just add it to ones that already have the RH vendor ID? No. I am not adding the capability to all bridges. In the earlier discussions, we agreed that the bridge be left as an Intel bridge if we do not intend to use it for storing the pairing information. If we do intend to store the pairing information in the bridge, we wanted to change the bridge's Vendor ID to the RH Vendor ID to avoid confusion. In other words, bridges with the RH Vendor ID come into existence only when there is an intent to store the pairing information in the bridge. Accordingly, if the "uuid" option is specified for the bridge, it is assumed that the user intends to use the bridge for storing the pairing information; hence, the capability is added to the bridge, and the Vendor ID is changed to the RH Vendor ID. If the "uuid" option is not specified, the bridge remains an Intel bridge, without the vendor-specific capability.
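The policy Venu describes (capability plus Red Hat IDs only when "uuid" is given, otherwise an unchanged Intel bridge) can be sketched in plain C as below. This is an illustration, not QEMU code: `BridgeSketch`, `bridge_apply_group_policy()` and `HYPOTHETICAL_RH_BRIDGE_DEVICE_ID` are made up, and the placeholder device ID is not the value used by the patch. 0x8086/0x3420 are the Intel IOH3420 root port IDs and 0x1b36 is Red Hat's PCI vendor ID.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define INTEL_VENDOR_ID                  0x8086 /* Intel */
#define IOH3420_DEVICE_ID                0x3420 /* Intel IOH3420 root port */
#define REDHAT_VENDOR_ID                 0x1b36 /* Red Hat, Inc. */
#define HYPOTHETICAL_RH_BRIDGE_DEVICE_ID 0x00ff /* placeholder, NOT the patch's value */

typedef struct {
    uint16_t vendor_id;
    uint16_t device_id;
    uint8_t uuid[16];   /* all zeroes: "uuid" option not given */
    bool has_group_cap; /* vendor-specific group-ID capability present? */
} BridgeSketch;

/* Same idea as qemu_uuid_is_null(): true if every byte is zero. */
static bool uuid_is_null(const uint8_t uuid[16])
{
    static const uint8_t zero[16];
    return memcmp(uuid, zero, sizeof(zero)) == 0;
}

/* Without a UUID the bridge keeps its Intel identity and gets no
 * capability; with one it is rebranded and carries the group ID. */
static void bridge_apply_group_policy(BridgeSketch *b)
{
    bool paired;

    assert(b != NULL);
    paired = !uuid_is_null(b->uuid);
    b->vendor_id = paired ? REDHAT_VENDOR_ID : INTEL_VENDOR_ID;
    b->device_id = paired ? HYPOTHETICAL_RH_BRIDGE_DEVICE_ID : IOH3420_DEVICE_ID;
    b->has_group_cap = paired;
}
```

Keying both the identity change and the capability off the same UUID check is what guarantees that a Red Hat-branded bridge always carries pairing information, which is the invariant Venu is arguing for.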
Venu > > > > --- > > hw/pci-bridge/ioh3420.c | 2 ++ > > hw/pci-bridge/pcie_root_port.c | 7 +++ > > hw/pci/pci_bridge.c | 32 > > include/hw/pci/pci.h | 2 ++ > > include/hw/pci/pcie.h | 1 + > > include/hw/pci/pcie_port.h | 1 + > > 6 files changed, 45 insertions(+) > > > > diff --git a/hw/pci-bridge/ioh3420.c b/hw/pci-bridge/ioh3420.c > > index a451d74ee6..b6b9ebc726 100644 > > --- a/hw/pci-bridge/ioh3420.c > > +++ b/hw/pci-bridge/ioh3420.c > > @@ -35,6 +35,7 @@ > > #define IOH_EP_MSI_SUPPORTED_FLAGS PCI_MSI_FLAGS_MASKBIT > > #define IOH_EP_MSI_NR_VECTOR 2 > > #define IOH_EP_EXP_OFFSET 0x90 > > +#define IOH_EP_VENDOR_OFFSET 0xCC > > #define IOH_EP_AER_OFFSET 0x100 > > > > /* > > @@ -111,6 +112,7 @@ static void ioh3420_class_init(ObjectClass *klass, void > > *data) > > rpc->exp_offset = IOH_EP_EXP_OFFSET; > > rpc->aer_offset = IOH_EP_AER_OFFSET; > > rpc->ssvid_offset = IOH_EP_SSVID_OFFSET; > > +rpc->vendor_offset = IOH_EP_VENDOR_OFFSET; > > rpc->ssid = IOH_EP_SSVID_SSID; > > } > > > > diff --git a/hw/pci-bridge/pcie_root_port.c b/hw/pci-bridge/pcie_root_port.c > > index 45f9e8cd4a..ba470c7fda 100644 > > --- a/hw/pci-bridge/pcie_root_port.c > > +++ b/hw/pci-bridge/pcie_root_port.c > > @@ -71,6 +71,12 @@ static void rp_realize(PCIDevice *d, Error **errp) > > goto err_bridge; > > } > > > > +rc = pci_bridge_vendor_init(d, rpc->vendor_offset, errp); > > +if (rc < 0) { > > +error_append_hint(errp, "Can't init group ID, error %d\n", rc); > > +goto err_bridge; > > +} > > + > > if (rpc->interrupts_init) { > > rc = rpc->interrupts_init(d, errp); > > if (rc < 0) { > > @@ -137,6 +143,7 @@ static void rp_exit(PCIDevice *d) > > static Property rp_props[] = { > > DEFINE_PROP_BIT(COMPAT_PROP_PCP, PCIDevice, cap_present, > > QEMU_PCIE_SLTCAP_PCP_BITNR, true), > > +DEFINE_PROP_UUID(COMPAT_PROP_UUID, PCIDevice, uuid, false), > > DEFINE_PROP_END_OF_LIST() > > }; > > > > diff --git a/hw/pci/pci_bridge.c b/hw/pci/pci_bridge.c > > index 40a39f57cb..c63bc439f7 100644 > > --- a/hw/pci/pci_bridge.c
> > +++ b/hw/pci/pci_bridge.c > > @@ -34,12 +34,17 @@ > > #include "hw/pci/pci_bus.h" > > #include "qemu/range.h" > > #include "qapi/error.h" > > +#include "qemu/uuid.h" > > > > /* PCI bridge subsystem vendor ID helper functions */ > > #define PCI_SSVID_SIZEOF 8 > > #define PCI_SSVID_SVID 4 > > #define PCI_SSVID_SSID 6 > > > > +#define PCI_VENDOR_SIZEOF 20 > > +#define PCI_VENDOR_CAP_LEN_OFFSET 2 > > +#define PCI_VENDOR_GROUP_ID_OFFSET 4 > > + > > int pci_bridge_ssvid_init(PCIDevice *dev, uint8_t offset, > > uint16_t svid, uint16_t ssid, > > Error **errp) > > @@ -57,6 +62,33 @@ int pci_bridge_ssvid_init(PCIDevice *dev, uint8_t offset, > > return pos; > > } > > > > +int pci_bridge_vendor_init(PCIDevice *d, uint8_t offset, Error **errp) > > +{ > > +int pos; > > +PCIDeviceClass *dc = PCI_DEVICE_GET_CLASS(d); > > + > > +if (qemu_uuid_is_null(&d->uuid)) { > > +return 0; > > +} > > +
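The quoted patch is cut off at this point. Purely as an illustration of what the three `PCI_VENDOR_*` defines imply, and not as the body of `pci_bridge_vendor_init()`, here is a hedged C sketch that writes a 20-byte vendor-specific capability (ID 0x09) into a raw 256-byte config-space image: capability ID at byte 0, next pointer at byte 1, length at byte 2, and the 16-byte UUID at bytes 4..19. The helper name `write_group_id_cap()` is hypothetical, and chaining into the capability list (normally done through QEMU's `pci_add_capability()`) is omitted.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PCI_CONFIG_SPACE_SIZE 256  /* conventional config space */
#define PCI_CAP_ID_VNDR       0x09 /* Vendor-Specific capability ID */

/* The three values below are taken from the quoted patch;
 * note 4 header bytes + 16 UUID bytes == 20. */
#define PCI_VENDOR_SIZEOF          20
#define PCI_VENDOR_CAP_LEN_OFFSET  2
#define PCI_VENDOR_GROUP_ID_OFFSET 4

/* Hypothetical helper: returns 'offset', or -1 if the offset is not
 * dword-aligned, overlaps the 64-byte standard header, or would run
 * past the end of config space. */
static int write_group_id_cap(uint8_t cfg[PCI_CONFIG_SPACE_SIZE],
                              uint8_t offset, const uint8_t uuid[16])
{
    assert(cfg != NULL && uuid != NULL);
    if ((offset & 3) != 0 || offset < 0x40 ||
        offset + PCI_VENDOR_SIZEOF > PCI_CONFIG_SPACE_SIZE) {
        return -1;
    }
    memset(&cfg[offset], 0, PCI_VENDOR_SIZEOF);
    cfg[offset] = PCI_CAP_ID_VNDR; /* byte 0: capability ID */
    /* byte 1 (next pointer) and byte 3 are left zero in this sketch */
    cfg[offset + PCI_VENDOR_CAP_LEN_OFFSET] = PCI_VENDOR_SIZEOF; /* byte 2 */
    memcpy(&cfg[offset + PCI_VENDOR_GROUP_ID_OFFSET], uuid, 16); /* 4..19 */
    return offset;
}
```

With the IOH3420 offset of 0xCC from the quoted diff, the capability occupies 0xCC..0xDF, which stays below the 0x100 limit of conventional config space.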
Re: [Qemu-devel] Design Decision for KVM based anti rootkit
On 19 June 2018 at 19:37, David Vrabel wrote: > It's not clear how this increases security. What threats is this > protecting again? It won't completely prevent rootkits, because rootkits can still edit dynamic kernel data structures, but it will limit what rootkits can damage to dynamic data only. This way, system calls and interrupt tables can't be changed. > As an attacker, modifying the sensitive pages (kernel text?) will > require either: a) altering the existing mappings for these (to make > them read-write or user-writable for example); or b) creating aliased > mappings with suitable permissions. > > If the attacker can modify page tables in this way then it can also > bypass the suggested hypervisor's read-only protection by changing the > mappings to point to a unprotected page. I left this part out, but I meant to say: completely prevent any modification to the protected pages, including the guest-physical to guest-virtual address mappings of those pages. Another tricky solution (something that just popped into my mind; better to say it than to forget it) is to make new memory mappings inherit the same protection as the old ones. I assume the hypervisor can do either. That was also the kind of performance hit I was talking about. I am not sure whether this might break things, but it will certainly limit some functionality heavily, such as hibernating the guest. Those are the kinds of trade-offs I am expecting, at least at the beginning.
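To make David's aliasing point and the proposed answer concrete, here is a deliberately tiny C model, not a KVM design: guest-physical memory is an array of frames, the hypervisor keeps a per-frame write-protect bit, and a single-level page table lives in one guest frame. All names (`GuestSketch`, `hv_guest_write()`, `guest_remap()`) are hypothetical; real x86 guests use multi-level page tables and EPT permissions, all levels of which would need protecting.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define NFRAMES    8  /* toy guest-physical memory: 8 frames */
#define FRAME_SIZE 64 /* bytes per frame */
#define PT_GFN     0  /* frame holding the toy single-level page table */

typedef struct {
    uint8_t frame[NFRAMES][FRAME_SIZE]; /* guest-physical memory */
    bool write_protected[NFRAMES];      /* set by the hypervisor, not the guest */
} GuestSketch;

static void guest_init(GuestSketch *g)
{
    memset(g, 0, sizeof(*g));
}

/* Every guest store goes through this check; a store to a protected
 * frame would fault to the hypervisor and be refused. */
static bool hv_guest_write(GuestSketch *g, unsigned gfn, unsigned off,
                           uint8_t val)
{
    if (gfn >= NFRAMES || off >= FRAME_SIZE || g->write_protected[gfn]) {
        return false;
    }
    g->frame[gfn][off] = val;
    return true;
}

/* Page-table entry 'vpage' holds the guest frame number it maps to. */
static unsigned translate(const GuestSketch *g, unsigned vpage)
{
    assert(vpage < FRAME_SIZE);
    return g->frame[PT_GFN][vpage];
}

/* Remapping is just a store into the page-table frame, so it is subject
 * to the same protection check as any other write. */
static bool guest_remap(GuestSketch *g, unsigned vpage, unsigned new_gfn)
{
    return new_gfn < NFRAMES &&
           hv_guest_write(g, PT_GFN, vpage, (uint8_t)new_gfn);
}
```

In this model, protecting only the kernel-text frame still lets the guest redirect its mapping to an unprotected frame; protecting the page-table frame as well blocks that, but then every legitimate page-table update in that frame traps too, which is one concrete source of the performance cost discussed in the thread.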
Re: [Qemu-devel] [virtio-dev] Re: [PATCH virtio 1/1] Add "Group Identifier" support to virtio PCI capabilities.
On Tue, Jun 19, 2018 at 12:54:06PM -0500, Venu Busireddy wrote: > On 2018-06-19 20:30:06 +0300, Michael S. Tsirkin wrote: > > On Tue, Jun 19, 2018 at 11:32:28AM -0500, Venu Busireddy wrote: > > > Add VIRTIO_PCI_CAP_GROUP_ID_CFG (Group Identifier) capability to the > > > virtio PCI capabilities to allow for the grouping of devices. > > > > > > Signed-off-by: Venu Busireddy > > > --- > > > content.tex | 43 +++ > > > 1 file changed, 43 insertions(+) > > > > > > diff --git a/content.tex b/content.tex > > > index 7a92cb1..7ea6267 100644 > > > --- a/content.tex > > > +++ b/content.tex > > > @@ -599,6 +599,8 @@ The fields are interpreted as follows: > > > #define VIRTIO_PCI_CAP_DEVICE_CFG 4 > > > /* PCI configuration access */ > > > #define VIRTIO_PCI_CAP_PCI_CFG 5 > > > +/* Group Identifier */ > > > +#define VIRTIO_PCI_CAP_GROUP_ID_CFG 6 > > > \end{lstlisting} > > > > > > Any other value is reserved for future use. > > > @@ -997,6 +999,47 @@ address \field{cap.length} bytes within a BAR range > > > specified by some other Virtio Structure PCI Capability > > > of type other than \field{VIRTIO_PCI_CAP_PCI_CFG}. > > > > > > +\subsubsection{Group Identifier capability}\label{sec:Virtio Transport > > > Options / Virtio Over PCI Bus / PCI Device Layout / Group Identifier > > > capability} > > > + > > > +The VIRTIO_PCI_CAP_GROUP_ID_CFG capability provides means for grouping > > > devices together. > > > + > > > +The capability is immediately followed by an identifier of arbitrary > > > size as below: > > > + > > > +\begin{lstlisting} > > > +struct virtio_pci_group_id_cap { > > > +struct virtio_pci_cap cap; > > > +u8 group_id[]; /* Group Identifier */ > > > +}; > > > +\end{lstlisting} > > > + > > > +The fields \field{cap.bar}, \field{cap.length}, \field{cap.offset} > > > +and \field{group_id} are read-only for the driver. > > > + > > > +The specification does not impose any restrictions on the size or > > > +structure of group_id[].
> > > > I think it must be a multiple of 4 in size, as is > > standard for all capabilities. > > Sure. Would rephrasing it as below suffice? > > The specification does not impose any restrictions on the size or > structure of group_id[], except that the size must be a multiple of 4. > > > > > > > > Vendors Devices > are free to declare this array as > > > +large as needed, as long as the combined size of all capabilities can > > > +be accommodated within the PCI configuration space. > > > + > > > +If there is enough room in the PCI configuration space to accommodate > > > +the group identifier, the fields \field{cap.bar}, \field{cap.offset} > > > +and \field{cap.length} should be set to 0. > > > + > > > +If there isn't enough room, some or all of the group identifier can be > > > +presented in the BAR region, in which case the fields \field{cap.bar}, > > > +\field{cap.offset} and \field{cap.length} should be set appropriately. > > > > And then how do you glue the two pieces? > > How the user glues them up is up to the user. The specification should > not impose rules on that, right? We need to define how these are matched. Let's assume device A has it all in config space, device B has part in memory. How would we compare them? > > > > > + > > > +In either case, the field \field{cap.cap_len} indicates the length of > > > +the group identifier information present in the configuration space > > > +itself. > > > > It seems like an overkill to me. Isn't it enough to have it in config > > space? This would make comparisons easier. > > I was trying to make the proposal permissive for expansion, in case > the user needs the size to be larger than what can be accommodated in > the config space. Would you like me to restrict that the capability be > entirely present in the config space? I am fine with it. Please confirm, > and I will change it so. I think so, yes. 
> > > > > + > > > +\devicenormative{\paragraph}{Group Identifier capability}{Virtio > > > Transport Options / Virtio Over PCI Bus / PCI Device Layout / Group > > > Identifier capability} > > > + > > > +The device MAY present the VIRTIO_PCI_CAP_GROUP_ID_CFG capability. > > > + > > > +\drivernormative{\paragraph}{Group Identifier capability}{Virtio > > > Transport Options / Virtio Over PCI Bus / PCI Device Layout / Group > > > Identifier capability} > > > + > > > +The driver MUST NOT write to group_id[] area or the BAR region. > > > + > > > \subsubsection{Legacy Interfaces: A Note on PCI Device > > > Layout}\label{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI > > > Device Layout / Legacy Interfaces: A Note on PCI Device Layout} > > > > > > Transitional devices MUST present part of configuration > > > > - > > To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org > > For additional commands, e-mail:
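Once the identifier lives entirely in config space, Michael's "how would we compare them?" reduces to a byte-for-byte comparison. A hedged C sketch follows; `DeviceSketch`, `group_ids_match()` and `find_group_peer()` are hypothetical names, and treating an empty identifier as "no group" is an assumption rather than anything the draft specifies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef struct {
    const char *name;              /* label for illustration only */
    const unsigned char *group_id; /* bytes of group_id[] from config space */
    size_t group_id_len;
} DeviceSketch;

/* Same group iff both IDs are non-empty, equally long and byte-identical. */
static bool group_ids_match(const unsigned char *a, size_t a_len,
                            const unsigned char *b, size_t b_len)
{
    if (a_len == 0 || a_len != b_len) {
        return false;
    }
    return memcmp(a, b, a_len) == 0;
}

/* Return the index of the first candidate in the same group as 'dev',
 * e.g. the passthrough device to pair with a virtio device, or -1. */
static int find_group_peer(const DeviceSketch *dev,
                           const DeviceSketch *cands, size_t n)
{
    assert(dev != NULL && (n == 0 || cands != NULL));
    for (size_t i = 0; i < n; i++) {
        if (group_ids_match(dev->group_id, dev->group_id_len,
                            cands[i].group_id, cands[i].group_id_len)) {
            return (int)i;
        }
    }
    return -1;
}
```

Under this rule, a 4-byte ID and the same bytes zero-padded to 8 do not match, so if sizes must be a multiple of 4, the spec would also need to say whether padding is significant.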
Re: [Qemu-devel] [PATCH V1 RESEND 0/6] Build ACPI Heterogeneous Memory Attribute Table (HMAT)
Hi, This series failed docker-mingw@fedora build test. Please find the testing commands and their output below. If you have Docker installed, you can probably reproduce it locally. Type: series Message-id: 1529421657-14969-1-git-send-email-jingqi@intel.com Subject: [Qemu-devel] [PATCH V1 RESEND 0/6] Build ACPI Heterogeneous Memory Attribute Table (HMAT) === TEST SCRIPT BEGIN === #!/bin/bash set -e git submodule update --init dtc # Let docker tests dump environment info export SHOW_ENV=1 export J=8 time make docker-test-mingw@fedora === TEST SCRIPT END === Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384 Switched to a new branch 'test' b1e9f8529e hmat acpi: Implement _HMA method to update HMAT at runtime f7eb5356e7 numa: Extend the command-line to provide memory side cache information d89f8eb917 numa: Extend the command-line to provide memory latency and bandwidth information 9376f5703d hmat acpi: Build Memory Side Cache Information Structure(s) in ACPI HMAT 6e1685b947 hmat acpi: Build System Locality Latency and Bandwidth Information Structure(s) in ACPI HMAT 4a72c940eb hmat acpi: Build Memory Subsystem Address Range Structure(s) in ACPI HMAT === OUTPUT BEGIN === Submodule 'dtc' (git://git.qemu-project.org/dtc.git) registered for path 'dtc' Cloning into '/var/tmp/patchew-tester-tmp-2ntfz6_t/src/dtc'... Submodule path 'dtc': checked out 'e54388015af1fb4bf04d0bca99caba1074d9cc42' BUILD fedora make[1]: Entering directory '/var/tmp/patchew-tester-tmp-2ntfz6_t/src' GEN /var/tmp/patchew-tester-tmp-2ntfz6_t/src/docker-src.2018-06-19-14.05.45.15002/qemu.tar Cloning into '/var/tmp/patchew-tester-tmp-2ntfz6_t/src/docker-src.2018-06-19-14.05.45.15002/qemu.tar.vroot'... done. 
Checking out files: 100% (6239/6239), done. Your branch is up-to-date with 'origin/test'. Submodule 'dtc' (git://git.qemu-project.org/dtc.git) registered for path 'dtc' Cloning into '/var/tmp/patchew-tester-tmp-2ntfz6_t/src/docker-src.2018-06-19-14.05.45.15002/qemu.tar.vroot/dtc'... Submodule path 'dtc': checked out 'e54388015af1fb4bf04d0bca99caba1074d9cc42' Submodule 'ui/keycodemapdb' (git://git.qemu.org/keycodemapdb.git) registered for path 'ui/keycodemapdb' Cloning into '/var/tmp/patchew-tester-tmp-2ntfz6_t/src/docker-src.2018-06-19-14.05.45.15002/qemu.tar.vroot/ui/keycodemapdb'... Submodule path 'ui/keycodemapdb': checked out '6b3d716e2b6472eb7189d3220552280ef3d832ce' COPY RUNNER RUN test-mingw in qemu:fedora Packages installed: SDL2-devel-2.0.8-5.fc28.x86_64 bc-1.07.1-5.fc28.x86_64 bison-3.0.4-9.fc28.x86_64 bluez-libs-devel-5.49-3.fc28.x86_64 brlapi-devel-0.6.7-12.fc28.x86_64 bzip2-1.0.6-26.fc28.x86_64 bzip2-devel-1.0.6-26.fc28.x86_64 ccache-3.4.2-2.fc28.x86_64 clang-6.0.0-5.fc28.x86_64