Re: [Qemu-devel] [PATCH 09/12] ring: introduce lockless ring buffer

2018-06-19 Thread Peter Xu
On Mon, Jun 04, 2018 at 05:55:17PM +0800, guangrong.x...@gmail.com wrote:

[...]

(Some more comments/questions for the MP implementation...)

> +static inline int ring_mp_put(Ring *ring, void *data)
> +{
> +unsigned int index, in, in_next, out;
> +
> +do {
> +in = atomic_read(&ring->in);
> +out = atomic_read(&ring->out);

[0]

Do we need to fetch "out" with load_acquire()?  Otherwise, what does
the store_release() at [1] below pair with?

This barrier exists in the SP-SC case, which makes sense to me.  I
assume it is also needed for the MP-SC case, am I right?
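
To make the pairing question concrete, here is a minimal single-producer
sketch written with C11 atomics instead of QEMU's atomic_*() helpers.  The
names spsc_ring, spsc_put() and spsc_get() and the ring size are made up for
illustration and are not part of the patch.  It only shows which acquire
load is meant to pair with which release store.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define SPSC_SLOTS 4u               /* hypothetical size, power of two */

struct spsc_ring {
    atomic_uint in;                 /* written by the producer only */
    atomic_uint out;                /* written by the consumer only */
    void *data[SPSC_SLOTS];
};

/* Producer: returns 0 on success, -1 if the ring is full. */
static int spsc_put(struct spsc_ring *r, void *item)
{
    unsigned int in = atomic_load_explicit(&r->in, memory_order_relaxed);
    /* Acquire: pairs with the release store to "out" in spsc_get(). */
    unsigned int out = atomic_load_explicit(&r->out, memory_order_acquire);

    assert(item != NULL);           /* NULL marks an empty slot */
    if (in - out == SPSC_SLOTS) {
        return -1;
    }
    r->data[in & (SPSC_SLOTS - 1)] = item;
    /* Release: publish the slot before the new "in" becomes visible. */
    atomic_store_explicit(&r->in, in + 1, memory_order_release);
    return 0;
}

/* Consumer: returns the oldest item, or NULL if the ring is empty. */
static void *spsc_get(struct spsc_ring *r)
{
    unsigned int out = atomic_load_explicit(&r->out, memory_order_relaxed);
    /* Acquire: pairs with the release store to "in" in spsc_put(). */
    unsigned int in = atomic_load_explicit(&r->in, memory_order_acquire);
    void *item;

    if (in == out) {
        return NULL;
    }
    item = r->data[out & (SPSC_SLOTS - 1)];
    r->data[out & (SPSC_SLOTS - 1)] = NULL;
    /* Release: the slot is cleared before the producer can see "out" move. */
    atomic_store_explicit(&r->out, out + 1, memory_order_release);
    return item;
}
```

With the acquire load of "out" in spsc_put(), a producer that observes the
new "out" is also guaranteed to observe the slot already reset to NULL.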

> +
> +if (__ring_is_full(ring, in, out)) {
> +if (atomic_read(&ring->in) == in &&
> +atomic_read(&ring->out) == out) {

Why read again?  After all, the ring API seems to be designed as
non-blocking.  E.g., the poll at [2] below makes more sense to me:
reaching [2] means there must be a producer that is _doing_ the
queuing, so the poll is very likely to complete quickly.  Here,
however, it seems to be a pure busy poll without any hint.  I'm not
sure whether we should instead just let the caller decide whether it
wants to call ring_put() again.

> +return -ENOBUFS;
> +}
> +
> +/* a entry has been fetched out, retry. */
> +continue;
> +}
> +
> +in_next = in + 1;
> +} while (atomic_cmpxchg(&ring->in, in, in_next) != in);
> +
> +index = ring_index(ring, in);
> +
> +/*
> + * smp_rmb() paired with the memory barrier of (A) in ring_mp_get()
> + * is implied in atomic_cmpxchg() as we should read ring->out first
> + * before fetching the entry, otherwise this assert will fail.

Thanks for all these comments!  These are really helpful for
reviewers.

However, I'm not sure whether I understand correctly the MB of (A) in
ring_mp_get() - AFAIU it should correspond to an smp_rmb() at [0]
above, when reading the "out" variable, rather than to this assertion.
That's why I thought we should have something like a load_acquire()
at [0] (which contains an rmb()).

Content-wise, I think the code here is correct, since atomic_cmpxchg()
should have an implicit smp_mb() after all, so we don't need any
further barriers here.

> + */
> +assert(!atomic_read(&ring->data[index]));
> +
> +/*
> + * smp_mb() paired with the memory barrier of (B) in ring_mp_get() is
> + * implied in atomic_cmpxchg(), that is needed here as  we should read
> + * ring->out before updating the entry, it is the same as we did in
> + * __ring_put().
> + *
> + * smp_wmb() paired with the memory barrier of (C) in ring_mp_get()
> + * is implied in atomic_cmpxchg(), that is needed as we should increase
> + * ring->in before updating the entry.
> + */
> +atomic_set(&ring->data[index], data);
> +
> +return 0;
> +}
> +
> +static inline void *ring_mp_get(Ring *ring)
> +{
> +unsigned int index, in;
> +void *data;
> +
> +do {
> +in = atomic_read(&ring->in);
> +
> +/*
> + * (C) should read ring->in first to make sure the entry pointed by
> + * this index is available
> + */
> +smp_rmb();
> +
> +if (!__ring_is_empty(in, ring->out)) {
> +break;
> +}
> +
> +if (atomic_read(&ring->in) == in) {
> +return NULL;
> +}
> +/* new entry has been added in, retry. */
> +} while (1);
> +
> +index = ring_index(ring, ring->out);
> +
> +do {
> +data = atomic_read(&ring->data[index]);
> +if (data) {
> +break;
> +}
> +/* the producer is updating the entry, retry */
> +cpu_relax();

[2]

> +} while (1);
> +
> +atomic_set(&ring->data[index], NULL);
> +
> +/*
> + * (B) smp_mb() is needed as we should read the entry out before
> + * updating ring->out as we did in __ring_get().
> + *
> + * (A) smp_wmb() is needed as we should make the entry be NULL before
> + * updating ring->out (which will make the entry be visible and usable).
> + */
> +atomic_store_release(&ring->out, ring->out + 1);

[1]

> +
> +return data;
> +}
> +
> +static inline int ring_put(Ring *ring, void *data)
> +{
> +if (ring->flags & RING_MULTI_PRODUCER) {
> +return ring_mp_put(ring, data);
> +}
> +return __ring_put(ring, data);
> +}
> +
> +static inline void *ring_get(Ring *ring)
> +{
> +if (ring->flags & RING_MULTI_PRODUCER) {
> +return ring_mp_get(ring);
> +}
> +return __ring_get(ring);
> +}
> +#endif
> -- 
> 2.14.4
> 

Thanks,

-- 
Peter Xu



Re: [Qemu-devel] [PATCH v2 3/4] ppc/pnv: introduce Pnv8Chip and Pnv9Chip models

2018-06-19 Thread Cédric Le Goater
On 06/20/2018 02:56 AM, David Gibson wrote:
> On Tue, Jun 19, 2018 at 07:24:44AM +0200, Cédric Le Goater wrote:
>>
>>>>>  typedef struct PnvChipClass {
>>>>>  /*< private >*/
>>>>> @@ -75,6 +95,7 @@ typedef struct PnvChipClass {
>>>>>  
>>>>>  hwaddr   xscom_base;
>>>>>  
>>>>> +void (*realize)(PnvChip *chip, Error **errp);
>>>>
>>>> This looks the wrong way round from how things are usually done.
>>>> Rather than having the base class realize() call the subclass specific
>>>> realize hook, it's more usual for the subclass to set the
>>>> dc->realize() and have it call a k->parent_realize() to call up the
>>>> chain.  grep for device_class_set_parent_realize() for some more
>>>> examples.
>>>
>>> Ah. That is more to my liking. There are a couple of models following
>>> the wrong object pattern, xics, vio. I will check them.
>>
>> So XICS is causing some head-aches because the ics-kvm model inherits
>> from ics-simple which inherits from ics-base. so we have a grand-parent
>> class to handle.
> 
> Ok.  I mean, we should probably switch ics around to use the
> parent_realize model, rather than the backwards one it does now.  But
> it's not immediately obvious to me why having a grandparent class
> breaks things.

If you follow the common realize pattern, you end up with a recursive
loop in one of the realize routines. I didn't dig much into the issue.
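
For reference, the common realize pattern can be sketched outside of QEMU
as below. This is a simplified model, not QEMU code: Device, DeviceClass,
ChipClass and set_parent_realize() are stand-ins invented for the sketch,
the last one mimicking what device_class_set_parent_realize() does (save the
inherited realize, then install the subclass one).

```c
#include <assert.h>
#include <stddef.h>

typedef struct Device Device;
typedef void (*RealizeFn)(Device *dev);

typedef struct DeviceClass {
    RealizeFn realize;          /* entry point called by the framework */
} DeviceClass;

typedef struct ChipClass {
    DeviceClass parent_class;
    RealizeFn parent_realize;   /* saved realize of the base class */
} ChipClass;

struct Device {
    ChipClass *klass;
    int base_ready;
    int chip_ready;
};

static void base_realize(Device *dev)
{
    dev->base_ready = 1;
}

/* Subclass realize: call up the chain first, then do the specific part. */
static void chip8_realize(Device *dev)
{
    dev->klass->parent_realize(dev);
    assert(dev->base_ready);
    dev->chip_ready = 1;
}

/* Stand-in for device_class_set_parent_realize(). */
static void set_parent_realize(DeviceClass *dc, RealizeFn child,
                               RealizeFn *saved)
{
    *saved = dc->realize;
    dc->realize = child;
}

static void chip8_class_init(ChipClass *k)
{
    k->parent_class.realize = base_realize;  /* inherited from the base */
    set_parent_realize(&k->parent_class, chip8_realize, &k->parent_realize);
}
```

Note that in this model the saved pointer is looked up through the class of
the object being realized. With three levels sharing a single
parent_realize field, the middle class would find its own realize there,
which might be the kind of loop mentioned above. That is a guess, not
something verified against the XICS code.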

>> if we could affiliate ics-kvm directly to ics-base it would make the 
>> family affair easier. we need to check migration though.
> 
> But that said, I've been thinking for a while that it might make sense
> to fold ics-kvm into ics-base.  It seems very risky to have two
> different object classes that are supposed to have guest-identical
> behaviour.  Certainly adding any migratable state to one not the other
> would be horribly wrong.

Yes, clearly. Something like below would be better:

                  +---------------+
                  |      ICS      |
          +-------+  common/base  +-------+
          |       +---------------+       |
          |                               |
          | spapr                   spapr |
          |  pnv                          |
  +-------v-------+               +-------v-------+
  |      ICS      |               |      ICS      |
  |  simple/QEMU  |               |      KVM      |
  +---------------+               +---------------+

with only some reset and realize handling in the subclasses. The only
extra field we could add under the KVM class is the KVM XICS device fd.  


Thanks,

C.



Re: [Qemu-devel] [PATCH v2 3/3] spapr: introduce a fixed IRQ number space

2018-06-19 Thread Cédric Le Goater
>> diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
>> index e4f5946a2188..c82dc40be0d5 100644
>> --- a/hw/ppc/spapr_events.c
>> +++ b/hw/ppc/spapr_events.c
>> @@ -709,7 +709,11 @@ void spapr_events_init(sPAPRMachineState *spapr)
>>  {
>>  int epow_irq;
>>  
>> -epow_irq = spapr_irq_findone(spapr, &error_fatal);
>> +if (spapr->xics_legacy) {
>> +epow_irq = spapr_irq_findone(spapr, &error_fatal);
>> +} else {
>> +epow_irq = SPAPR_IRQ_EPOW;
> 
> Can slightly improve brevity by just initializing epow_irq to this,
> then overwriting it in the legacy case.

and I forgot to add this to v3 ... I can add it later on if there
are no other changes requested or if we move the find routine
under the machine class.
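
The suggested shape would be roughly the following. This is a standalone
sketch: fake_irq_findone() is a made-up stand-in for spapr_irq_findone(),
pick_epow_irq() is a hypothetical helper, and the value given to
SPAPR_IRQ_EPOW here is only a placeholder.

```c
#include <assert.h>
#include <stdbool.h>

#define SPAPR_IRQ_EPOW 0x1000       /* placeholder value for this sketch */

/* Hypothetical stand-in for spapr_irq_findone(): returns a dynamic IRQ. */
static int fake_irq_findone(void)
{
    return 42;
}

static int pick_epow_irq(bool xics_legacy)
{
    int epow_irq = SPAPR_IRQ_EPOW;  /* fixed number by default */

    if (xics_legacy) {
        epow_irq = fake_irq_findone();  /* legacy: allocate dynamically */
    }
    assert(epow_irq > 0);
    return epow_irq;
}
```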

C.



Re: [Qemu-devel] [PATCH 09/12] ring: introduce lockless ring buffer

2018-06-19 Thread Peter Xu
On Mon, Jun 04, 2018 at 05:55:17PM +0800, guangrong.x...@gmail.com wrote:
> From: Xiao Guangrong 
> 
> It's the simple lockless ring buffer implement which supports both
> single producer vs. single consumer and multiple producers vs.
> single consumer.
> 
> Many lessons were learned from Linux Kernel's kfifo (1) and DPDK's
> rte_ring (2) before i wrote this implement. It corrects some bugs of
> memory barriers in kfifo and it is the simpler lockless version of
> rte_ring as currently multiple access is only allowed for producer.

Could you provide some more information about the kfifo bug?  Any
pointer would be appreciated.

> 
> If has single producer vs. single consumer, it is the traditional fifo,
> If has multiple producers, it uses the algorithm as followings:
> 
> For the producer, it uses two steps to update the ring:
>- first step, occupy the entry in the ring:
> 
> retry:
>   in = ring->in
>   if (cmpxchg(&ring->in, in, in + 1) != in)
> goto retry;
> 
>  after that the entry pointed by ring->data[in] has been owned by
>  the producer.
> 
>  assert(ring->data[in] == NULL);
> 
>  Note, no other producer can touch this entry so that this entry
>  should always be the initialized state.
> 
>- second step, write the data to the entry:
> 
>  ring->data[in] = data;
> 
> For the consumer, it first checks if there is available entry in the
> ring and fetches the entry from the ring:
> 
>  if (!ring_is_empty(ring))
>   entry = [ring->out];
> 
>  Note: the ring->out has not been updated so that the entry pointed
>  by ring->out is completely owned by the consumer.
> 
> Then it checks if the data is ready:
> 
> retry:
>  if (*entry == NULL)
> goto retry;
> That means, the producer has updated the index but haven't written any
> data to it.
> 
> Finally, it fetches the valid data out, set the entry to the initialized
> state and update ring->out to make the entry be usable to the producer:
> 
>   data = *entry;
>   *entry = NULL;
>   ring->out++;
> 
> Memory barrier is omitted here, please refer to the comment in the code.
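
The two-step scheme described above can be sketched with C11 atomics as
follows. This is not the code from the patch: mp_ring, mp_put() and mp_get()
are illustrative names, the size is arbitrary, and every access uses the
default sequentially consistent ordering, so the sketch deliberately
sidesteps the question of which weaker barriers would be sufficient.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define MP_SLOTS 8u                 /* hypothetical size, power of two */

struct mp_ring {
    atomic_uint in;                 /* advanced by producers via CAS */
    atomic_uint out;                /* advanced by the single consumer */
    _Atomic(void *) data[MP_SLOTS]; /* NULL means "not written yet" */
};

/* Producer: returns 0 on success, -1 if the ring is full. */
static int mp_put(struct mp_ring *r, void *item)
{
    unsigned int in = atomic_load(&r->in);

    assert(item != NULL);
    do {
        if (in - atomic_load(&r->out) == MP_SLOTS) {
            return -1;
        }
        /* Step 1: claim slot "in"; on failure "in" is reloaded for us. */
    } while (!atomic_compare_exchange_weak(&r->in, &in, in + 1));

    /* Step 2: the slot is ours now, publish the data. */
    atomic_store(&r->data[in & (MP_SLOTS - 1)], item);
    return 0;
}

/* Consumer (single thread): returns an item, or NULL if the ring is empty. */
static void *mp_get(struct mp_ring *r)
{
    unsigned int out = atomic_load(&r->out);
    unsigned int index = out & (MP_SLOTS - 1);
    void *item;

    if (atomic_load(&r->in) == out) {
        return NULL;
    }
    /* A producer may have claimed the slot but not stored yet: wait. */
    while ((item = atomic_load(&r->data[index])) == NULL) {
    }
    atomic_store(&r->data[index], NULL);   /* back to the initial state */
    atomic_store(&r->out, out + 1);        /* hand the slot to producers */
    return item;
}
```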
> 
> (1) 
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/linux/kfifo.h
> (2) http://dpdk.org/doc/api/rte__ring_8h.html
> 
> Signed-off-by: Xiao Guangrong 
> ---
>  migration/ring.h | 265 
> +++

If this is meant to be a general implementation, I wonder whether we
could move it to the util/ directory so that it can also be used
outside the migration code.

>  1 file changed, 265 insertions(+)
>  create mode 100644 migration/ring.h
> 
> diff --git a/migration/ring.h b/migration/ring.h
> new file mode 100644
> index 00..da9b8bdcbb
> --- /dev/null
> +++ b/migration/ring.h
> @@ -0,0 +1,265 @@
> +/*
> + * Ring Buffer
> + *
> + * Multiple producers and single consumer are supported with lock free.
> + *
> + * Copyright (c) 2018 Tencent Inc
> + *
> + * Authors:
> + *  Xiao Guangrong 
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + */
> +
> +#ifndef _RING__
> +#define _RING__
> +
> +#define CACHE_LINE  64

Is this for x86_64?  Is the cache line size the same on all architectures?

> +#define cache_aligned __attribute__((__aligned__(CACHE_LINE)))
> +
> +#define RING_MULTI_PRODUCER 0x1
> +
> +struct Ring {
> +unsigned int flags;
> +unsigned int size;
> +unsigned int mask;
> +
> +unsigned int in cache_aligned;
> +
> +unsigned int out cache_aligned;
> +
> +void *data[0] cache_aligned;
> +};
> +typedef struct Ring Ring;
> +
> +/*
> + * allocate and initialize the ring
> + *
> + * @size: the number of element, it should be power of 2
> + * @flags: set to RING_MULTI_PRODUCER if the ring has multiple producer,
> + * otherwise set it to 0, i,e. single producer and single consumer.
> + *
> + * return the ring.
> + */
> +static inline Ring *ring_alloc(unsigned int size, unsigned int flags)
> +{
> +Ring *ring;
> +
> +assert(is_power_of_2(size));
> +
> +ring = g_malloc0(sizeof(*ring) + size * sizeof(void *));
> +ring->size = size;
> +ring->mask = ring->size - 1;
> +ring->flags = flags;
> +return ring;
> +}
> +
> +static inline void ring_free(Ring *ring)
> +{
> +g_free(ring);
> +}
> +
> +static inline bool __ring_is_empty(unsigned int in, unsigned int out)
> +{
> +return in == out;
> +}

(some of the helpers are a bit confusing to me like this one; I would
 prefer some of the helpers be directly squashed into code, but it's a
 personal preference only)

> +
> +static inline bool ring_is_empty(Ring *ring)
> +{
> +return ring->in == ring->out;
> +}
> +
> +static inline unsigned int ring_len(unsigned int in, unsigned int out)
> +{
> +return in - out;
> +}

(this too)

> +
> +static inline bool
> +__ring_is_full(Ring *ring, unsigned int in, unsigned int out)
> +{
> +return 

Re: [Qemu-devel] [RFC v2 1/3] pci_expander_bridge: add type TYPE_PXB_PCIE_HOST

2018-06-19 Thread Marcel Apfelbaum




On 06/13/2018 11:23 AM, Zihan Yang wrote:

Michael S. Tsirkin wrote on Tue, Jun 12, 2018 at 9:43 PM:

On Tue, Jun 12, 2018 at 05:13:22PM +0800, Zihan Yang wrote:

The inner host bridge created by pxb-pcie is TYPE_PXB_PCI_HOST by default,
add a new type TYPE_PXB_PCIE_HOST to better utilize the ECAM of PCIe

Signed-off-by: Zihan Yang 

I have a concern that there are lots of new properties
added here, I'm not sure how are upper layers supposed to
manage them all.

E.g. bus_nr supplied in several places, domain_nr for which
it's not clear how it is supposed to be allocated, etc.

Indeed they seem to double the properties, but the pxb host is
an internal structure of pxb-pcie device, created in pxb-pcie's
realization procedure, and acpi-build queries host bridges instead
of pxb-pcie devices. This means that users can not directly specify
the property of pxb host bridge, but must 'inherit' from pxb-pcie
devices. I had thought about changing the acpi-build process,
but that would require more modifications.

As for the properties, bus_nr is the start bus number of this host
bridge. It is used when pxb-pcie is in PCI domain 0 with the q35 host,
to avoid bus number conflicts. When the device is placed in a separate
PCI domain, it is not used and should be 0.

max_bus means how many buses the user desires, EACH bus in
PCIe requires 1MB configuration space, thus specifying it could
reduce the reserved memory in MMCFG as suggested by Marcel.


The max_bus property is optional, you set the default to 255.
I am wondering if 255 is too much as a default for an extra bus,
I would use a smaller value, like 10.


Typically, the user can specify

-device pxb-pcie,id=br1,bus="pcie.0",sep_domain=on,domain_nr=1,max_bus=130

this will place the buses under this pxb host bridge in pci domain
1, and reserve (130 + 1) = 131 buses for it. The start bus number
is always 0 currently for simplicity.
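
As a worked example of the 1MB-per-bus figure: in ECAM each function has
4096 bytes of configuration space, with 8 functions per device and 32
devices per bus, so one bus takes 32 * 8 * 4096 = 1 MiB. A small sketch of
the resulting reservation follows. The helper names are made up, and
treating max_bus as an inclusive last bus number follows the "(130 + 1)"
reading above.

```c
#include <assert.h>
#include <stdint.h>

enum {
    DEVS_PER_BUS       = 32,
    FUNCS_PER_DEV      = 8,
    CFG_BYTES_PER_FUNC = 4096,
};

/* ECAM bytes needed for a single bus: 32 * 8 * 4096 = 1 MiB. */
static uint64_t ecam_bytes_per_bus(void)
{
    return (uint64_t)DEVS_PER_BUS * FUNCS_PER_DEV * CFG_BYTES_PER_FUNC;
}

/* ECAM bytes to reserve for buses 0..max_bus (inclusive). */
static uint64_t ecam_bytes_for_max_bus(unsigned int max_bus)
{
    assert(max_bus <= 255);
    return (uint64_t)(max_bus + 1) * ecam_bytes_per_bus();
}
```

With max_bus=130 this gives 131 MiB, against 256 MiB for the default of 255.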


Can the management interface be simplified?
Ideally we wouldn't have to teach libvirt new tricks,
just generalize pxb support slightly.

We can delete 'sep_domain' property, I just find 'domain_nr'
already indicates domain number.


Agreed, please remove sep_domain property.

Thanks,



But domain_nr and
max_bus seems unremovable, although they look 'redundant'
because they appear twice.

I'm not familiar with libvirt, but from the perspective of user,
only 2 properties are added(domain_nr and max_bus, if we
delete sep_domain), though the internal structure actually has
changed.





[Qemu-devel] [PATCH 0/2] ide/hw/core: fix bug# 1777315, crash on short PRDs

2018-06-19 Thread Amol Surati
The fix builds on QEMU's existing policy for short PRDs, and treats the
transfers that cause the crash as having been generated through short
PRDs.

It
- continues to allow QEMU to support multiple calls to
  prepare_buf/ide_dma_cb,
- so, continues to keep QEMU free from needing the entire sglist in one
  go;
- avoids the crash;
- but, treats the affected transfers as short, instead of allowing them
  to continue.


Amol Surati (1):
  ide/hw/core: fix crash on processing a partial-sector-size DMA xfer

John Snow (1):
  tests/ide-test: test case for crash when processing short PRDs

 hw/ide/core.c|  5 -
 tests/ide-test.c | 28 
 2 files changed, 32 insertions(+), 1 deletion(-)

-- 
2.17.1




[Qemu-devel] [PATCH 1/2] ide/hw/core: fix crash on processing a partial-sector-size DMA xfer

2018-06-19 Thread Amol Surati
Fixes: https://bugs.launchpad.net/qemu/+bug/1777315

QEMU's short PRD policy applies to a DMA transfer of size < 512 bytes.
But it fails to consider transfers which are >= 512 bytes, but are
not a multiple of 512 bytes.

Such transfers are not subject to the short PRD policy. They end up
violating the assumptions about the granularity of the IO sizes,
upon which depend the verification of the completion of the previous
transfer, and the advancement of the offset in preparation of the next.

Those violations result in the crash.

By forcing each transfer to be a multiple of sector size, such
transfers are subjected to the policy, and therefore culled before they
cause the crash.

Signed-off-by: Amol Surati 
---
 hw/ide/core.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/hw/ide/core.c b/hw/ide/core.c
index 2c62efc536..14d135224b 100644
--- a/hw/ide/core.c
+++ b/hw/ide/core.c
@@ -836,6 +836,7 @@ static void ide_dma_cb(void *opaque, int ret)
 {
 IDEState *s = opaque;
 int n;
+int32_t size_prepared;
 int64_t sector_num;
 uint64_t offset;
 bool stay_active = false;
@@ -886,7 +887,9 @@ static void ide_dma_cb(void *opaque, int ret)
 n = s->nsector;
 s->io_buffer_index = 0;
 s->io_buffer_size = n * 512;
-if (s->bus->dma->ops->prepare_buf(s->bus->dma, s->io_buffer_size) < 512) {
+size_prepared = s->bus->dma->ops->prepare_buf(s->bus->dma,
+  s->io_buffer_size);
+if (size_prepared <= 0 || size_prepared % 512) {
 /* The PRDs were too short. Reset the Active bit, but don't raise an
  * interrupt. */
 s->status = READY_STAT | SEEK_STAT;
-- 
2.17.1
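
The difference between the old and the new condition can be looked at in
isolation with the sketch below. The two helpers are hypothetical and only
mirror the "if" conditions in the hunk above. The 580-byte case is 0x200 +
0x44, the PRD sizes used by the test in patch 2/2. That prepare_buf() would
report exactly that number is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SECTOR_SIZE 512

static_assert(SECTOR_SIZE > 0, "sector size must be positive");

/* Old condition: only transfers smaller than one sector count as short. */
static bool old_check_is_short(int32_t size_prepared)
{
    return size_prepared < SECTOR_SIZE;
}

/* New condition: anything that is not a whole number of sectors is short. */
static bool new_check_is_short(int32_t size_prepared)
{
    return size_prepared <= 0 || (size_prepared % SECTOR_SIZE) != 0;
}
```

A 580-byte transfer passes the old check but is rejected by the new one,
while whole-sector sizes such as 512 or 1024 are accepted by both.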




[Qemu-devel] [PATCH 2/2] tests/ide-test: test case for crash when processing short PRDs

2018-06-19 Thread Amol Surati
From: John Snow 

Related Bug: https://bugs.launchpad.net/qemu/+bug/1777315

Signed-off-by: Amol Surati 
---
 tests/ide-test.c | 28 
 1 file changed, 28 insertions(+)

diff --git a/tests/ide-test.c b/tests/ide-test.c
index f39431b1a9..382c29a174 100644
--- a/tests/ide-test.c
+++ b/tests/ide-test.c
@@ -473,6 +473,32 @@ static void test_bmdma_one_sector_short_prdt(void)
 free_pci_device(dev);
 }
 
+static void test_bmdma_partial_sector_short_prdt(void)
+{
+QPCIDevice *dev;
+QPCIBar bmdma_bar, ide_bar;
+uint8_t status;
+
+/* Read 2 sectors but only give 1 sector in PRDT */
+PrdtEntry prdt[] = {
+{
+.addr = 0,
+.size = cpu_to_le32(0x200),
+},
+{
+.addr = 512,
+.size = cpu_to_le32(0x44 | PRDT_EOT),
+}
+};
+
+dev = get_pci_device(&bmdma_bar, &ide_bar);
+status = send_dma_request(CMD_READ_DMA, 0, 2,
+  prdt, ARRAY_SIZE(prdt), NULL);
+g_assert_cmphex(status, ==, 0);
+assert_bit_clear(qpci_io_readb(dev, ide_bar, reg_status), DF | ERR);
+free_pci_device(dev);
+}
+
 static void test_bmdma_long_prdt(void)
 {
 QPCIDevice *dev;
@@ -1037,6 +1063,8 @@ int main(int argc, char **argv)
 qtest_add_func("/ide/bmdma/short_prdt", test_bmdma_short_prdt);
 qtest_add_func("/ide/bmdma/one_sector_short_prdt",
test_bmdma_one_sector_short_prdt);
+qtest_add_func("/ide/bmdma/partial_sector_short_prdt",
+   test_bmdma_partial_sector_short_prdt);
 qtest_add_func("/ide/bmdma/long_prdt", test_bmdma_long_prdt);
 qtest_add_func("/ide/bmdma/no_busmaster", test_bmdma_no_busmaster);
 qtest_add_func("/ide/bmdma/teardown", test_bmdma_teardown);
-- 
2.17.1




Re: [Qemu-devel] [virtio-dev] Re: [v23 1/2] virtio-crypto: Add virtio crypto device specification

2018-06-19 Thread Michael S. Tsirkin
On Wed, Jan 10, 2018 at 01:53:09PM +0800, Longpeng (Mike) wrote:
> Hi Halil,
> 
> We are fixing the Intel BUG these days, so I will go through your comments
> after we're done. Thanks.

All right - are you guys done with meltdown/spectre? I'd like us
to start finally getting parts of this in the spec.
This is already used in the field - let's get into spec
whatever is already out there.

Argue about future enhancements later.


-- 
MST



Re: [Qemu-devel] [PATCH v2 3/8] ppc4xx_i2c: Implement directcntl register

2018-06-19 Thread BALATON Zoltan

On Wed, 20 Jun 2018, David Gibson wrote:

On Tue, Jun 19, 2018 at 11:29:09AM +0200, BALATON Zoltan wrote:

On Mon, 18 Jun 2018, David Gibson wrote:

On Wed, Jun 13, 2018 at 04:03:18PM +0200, BALATON Zoltan wrote:

On Wed, 13 Jun 2018, David Gibson wrote:

On Wed, Jun 13, 2018 at 10:54:22AM +0200, BALATON Zoltan wrote:

On Wed, 13 Jun 2018, David Gibson wrote:

On Wed, Jun 06, 2018 at 03:31:48PM +0200, BALATON Zoltan wrote:

diff --git a/hw/i2c/ppc4xx_i2c.c b/hw/i2c/ppc4xx_i2c.c
index a68b5f7..5806209 100644
--- a/hw/i2c/ppc4xx_i2c.c
+++ b/hw/i2c/ppc4xx_i2c.c
@@ -30,6 +30,7 @@
 #include "cpu.h"
 #include "hw/hw.h"
 #include "hw/i2c/ppc4xx_i2c.h"
+#include "bitbang_i2c.h"

 #define PPC4xx_I2C_MEM_SIZE 18

@@ -46,7 +47,13 @@

 #define IIC_XTCNTLSS_SRST   (1 << 0)

+#define IIC_DIRECTCNTL_SDAC (1 << 3)
+#define IIC_DIRECTCNTL_SCLC (1 << 2)
+#define IIC_DIRECTCNTL_MSDA (1 << 1)
+#define IIC_DIRECTCNTL_MSCL (1 << 0)
+
 typedef struct {
+bitbang_i2c_interface *bitbang;
 uint8_t mdata;
 uint8_t lmadr;
 uint8_t hmadr;
@@ -308,7 +315,11 @@ static void ppc4xx_i2c_writeb(void *opaque, hwaddr addr, uint64_t value,
 i2c->xtcntlss = value;
 break;
 case 16:
-i2c->directcntl = value & 0x7;
+i2c->directcntl = value & (IIC_DIRECTCNTL_SDAC & IIC_DIRECTCNTL_SCLC);
+i2c->directcntl |= (value & IIC_DIRECTCNTL_SCLC ? 1 : 0);
+bitbang_i2c_set(i2c->bitbang, BITBANG_I2C_SCL, i2c->directcntl & 1);


Shouldn't that use i2c->directcntl & IIC_DIRECTCNTL_MSCL ?


+i2c->directcntl |= bitbang_i2c_set(i2c->bitbang, BITBANG_I2C_SDA,
+   (value & IIC_DIRECTCNTL_SDAC) != 0) << 1;


Last expression might be clearer as:
value & IIC_DIRECTCNTL_SDAC ? IIC_DIRECTCNTL_MSDA : 0


I guess this is a matter of taste but to me IIC_DIRECTCNTL_MSDA is a bit
position in the register so I use that when accessing that bit but when I
check for the values of a bit being 0 or 1 I don't use the define which is
for something else, just happens to have value 1 as well.


Hmm.. but the bit is being stored in i2c->directcntl, which means it
can be read back from the register in that position, no?


Which of the above two do you mean?

In the first one I test for the 1/0 value set by the previous line before
the bitbang_i2c_set call. This could be accessed as MSCL later but using
that here would just make it longer and less obvious. If I want to be
absolutely precise maybe it should be (value & IIC_DIRECTCNTL_SCLC ? 1 : 0)
in this line too but that was just stored in the register one line before so
I can reuse that here as well. Otherwise I could add another variable just
for this bit value and use that in both lines but why make it more
complicated for a simple 1 or 0 value?


Longer maybe, but I don't know about less obvious.  Actually I think
you should use IIC_DIRECTCNTL_MSCL instead of a bare '1' in both the
line setting i2c->directcntl, then the next line checking that bit to
pass it into bitbang_i2c_set.  The point is you're modifying the
effective register contents, so it makes sense to make it clearer
which bit of the register you're setting.
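
A standalone sketch of the register update with the symbolic names used
throughout, for comparison. directcntl_write() is a hypothetical helper and
fake_line_set() a made-up stand-in for bitbang_i2c_set() that simply echoes
the requested level. The sketch also combines the two control masks with
"|" rather than the "&" in the hunk above, which is an assumption about the
intent.

```c
#include <assert.h>
#include <stdint.h>

#define IIC_DIRECTCNTL_SDAC (1 << 3)
#define IIC_DIRECTCNTL_SCLC (1 << 2)
#define IIC_DIRECTCNTL_MSDA (1 << 1)
#define IIC_DIRECTCNTL_MSCL (1 << 0)

/* The read-only monitor bits must not overlap the writable control bits. */
static_assert(((IIC_DIRECTCNTL_MSDA | IIC_DIRECTCNTL_MSCL) &
               (IIC_DIRECTCNTL_SDAC | IIC_DIRECTCNTL_SCLC)) == 0,
              "monitor and control bits overlap");

/* Hypothetical stand-in for bitbang_i2c_set(): echoes the requested level. */
static int fake_line_set(int level)
{
    return level;
}

static uint8_t directcntl_write(uint8_t value)
{
    /* Assumption: "|" keeps both control bits; the patch writes "&". */
    uint8_t reg = value & (IIC_DIRECTCNTL_SDAC | IIC_DIRECTCNTL_SCLC);

    if (value & IIC_DIRECTCNTL_SCLC) {
        reg |= IIC_DIRECTCNTL_MSCL;
    }
    fake_line_set((reg & IIC_DIRECTCNTL_MSCL) != 0);          /* SCL */
    if (fake_line_set((value & IIC_DIRECTCNTL_SDAC) != 0)) {  /* SDA */
        reg |= IIC_DIRECTCNTL_MSDA;
    }
    return reg;
}
```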


When setting the bit it's the value 1 so that's not the bit
position,


Huh??  The constants aren't bit positions either, they're masks.  How
is IIC_DIRECTCNTL_MSCL wrong here?


I
think 1 : 0 is correct there.


Correct, sure, but less clear than it could be.


I've changed the next line in v4 I've just
sent to the constant when checking the value of the MSCL bit.


In the second case using MSDA is really not correct because the level to set
is defined by SDAC bit. The SDAC, SCLC bits are what the program sets to
tell which states the two i2c lines should be and the MSDA, MSCL are read
only bits that show what states the lines really are.


Ok...


IIC_DIRECTCNTL_MSDA has value of 1 but it means the second bit in the
directcntl reg (which could have 0 or 1 value) not 1 value of a bit or i2c
line.


Uh.. what?  AFAICT, based on the result of bitbang_i2c_set() you're
updating the value of the MSDA (== 0x2) bit in i2c->directcntl
register state.  Why doesn't the symbolic name make sense here?


Sorry, I may not have explained clearly what I mean. I meant that
IIC_DIRECTCNTL_MSDA denotes the bit in position 1 (counting from the LSB
as bit number 0), which may have value 1 or 0. Where I mean the value, I
use 1 or 0. Where I refer to the bit position, I use the constants. In
the line

bitbang_i2c_set(i2c->bitbang, BITBANG_I2C_SCL, i2c->directcntl & 1);

it should be the constant, just used 1 there for brevity because it's
obvious from the previous line what's meant.


Maybe, but using the constant is still clearer, and friendly to people
grepping the source.


I've changed this now. At other
places the values of the bits are written as 1 or 0 so I think for those
constants should not be needed.


I have no idea what you mean by this.


OK, I'm lost now. Is v4 acceptable or are there any more changes you 

Re: [Qemu-devel] [virtio-dev] [PATCH] Add virtio gpu device specification.

2018-06-19 Thread Michael S. Tsirkin
On Tue, May 10, 2016 at 01:25:37PM +0200, Gerd Hoffmann wrote:
>   Hi,
> 
> > > > Rendered versions are available here:
> > > >   https://www.kraxel.org/virtio/virtio-v1.0-cs03-virtio-gpu.pdf
> > > >   
> > > > https://www.kraxel.org/virtio/virtio-v1.0-cs03-virtio-gpu.html#x1-287
> 
> > > I guess a non-fenced command only completes when the operation has
> > > finished, too (so that a meaningful success/error value can be
> > > produced)?
> > 
> > When stuff is processed asynchronously the command can complete before
> > the operation actually completed.  Current qemu implementation does that
> > only in 3d mode, when offloading stuff to the hardware (and verifies
> > stuff beforehand, so if you try to kick 3d rendering with an invalid
> > context id qemu will throw an error).
> > 
> > I'll try to make that more clear in the text.
> 
> Updated now.
> 
> cheers,
>   Gerd

Is there a chance you could rebase and post?
This is used widely, I think we should have it in 1.1
if at all possible.





Re: [Qemu-devel] [PATCH v4 6/7] monitor: remove "x-oob", turn oob on by default

2018-06-19 Thread Peter Xu
On Tue, Jun 19, 2018 at 04:16:49PM +0200, Markus Armbruster wrote:
> Peter Xu  writes:
> 
> > There was a regression reported by Eric Auger before with OOB:
> >
> >   http://lists.gnu.org/archive/html/qemu-devel/2018-03/msg06231.html
> >
> > It is fixed in 951702f39c ("monitor: bind dispatch bh to iohandler
> > context", 2018-04-10).
> >
> > For the bug, we turned Out-Of-Band feature of monitors off for 2.12
> > release.  Now we turn that on again after the 2.12 release.
> 
> Relating what happened in the order it happened could be easier to
> understand.  Perhaps:
> 
>   OOB commands were introduced in commit cf869d53172.  Unfortunately, we
>   ran into a regression, and had to disable them by default for 2.12
>   (commit be933ffc23).
> 
>   The regression has since been fixed (commit 951702f39c7 "monitor: bind
>   dispatch bh to iohandler context").  Time to re-enable OOB.

This indeed looks much nicer.

> 
> > This patch partly reverts be933ffc23 (monitor: new parameter "x-oob"),
> > meanwhile turn it on again by default for non-MUX QMPs.  Note that we
> 
> "by default"?

Did I mis-spell somewhere?

> 
> > can't enable Out-Of-Band for monitors with MUX-typed chardev backends,
> > because not all the chardev frontends can run without main thread, or
> > can run in multiple threads.
> >
> > Some trivial touch-up in the test code is required to make sure qmp-test
> > won't broke.
> 
> "won't break"

This one I did.

> 
> >
> > Signed-off-by: Peter Xu 
> > ---
> >  include/monitor/monitor.h |  1 -
> >  monitor.c | 17 +
> >  tests/libqtest.c  |  2 +-
> >  tests/qmp-test.c  |  2 +-
> >  vl.c  |  5 -
> >  5 files changed, 3 insertions(+), 24 deletions(-)
> >
> > diff --git a/include/monitor/monitor.h b/include/monitor/monitor.h
> > index d6ab70cae2..0cb0538a31 100644
> > --- a/include/monitor/monitor.h
> > +++ b/include/monitor/monitor.h
> > @@ -13,7 +13,6 @@ extern Monitor *cur_mon;
> >  #define MONITOR_USE_READLINE  0x02
> >  #define MONITOR_USE_CONTROL   0x04
> >  #define MONITOR_USE_PRETTY0x08
> > -#define MONITOR_USE_OOB   0x10
> >  
> >  bool monitor_cur_is_qmp(void);
> >  
> > diff --git a/monitor.c b/monitor.c
> > index c9a02ee40c..7fbcf84b02 100644
> > --- a/monitor.c
> > +++ b/monitor.c
> > @@ -4587,19 +4587,7 @@ void monitor_init(Chardev *chr, int flags)
> >  {
> >  Monitor *mon = g_malloc(sizeof(*mon));
> >  bool use_readline = flags & MONITOR_USE_READLINE;
> > -bool use_oob = flags & MONITOR_USE_OOB;
> > -
> > -if (use_oob) {
> > -if (CHARDEV_IS_MUX(chr)) {
> > -error_report("Monitor Out-Of-Band is not supported with "
> > - "MUX typed chardev backend");
> > -exit(1);
> > -}
> > -if (use_readline) {
> > -error_report("Monitor Out-Of-band is only supported by QMP");
> > -exit(1);
> > -}
> > -}
> > +bool use_oob = (flags & MONITOR_USE_CONTROL) && !CHARDEV_IS_MUX(chr);
> 
> A comment explaining (briefly!) why MUX prevents oob would be useful
> here.  Fortunately, you can simply steal from your commit message.

Done.
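
Roughly like this, with the condition pulled into a hypothetical helper
monitor_wants_oob() so that the sketch compiles on its own. The flag values
are copied from monitor.h above, and the bool parameter stands in for
CHARDEV_IS_MUX(chr).

```c
#include <assert.h>
#include <stdbool.h>

#define MONITOR_USE_READLINE  0x02
#define MONITOR_USE_CONTROL   0x04

static_assert((MONITOR_USE_READLINE & MONITOR_USE_CONTROL) == 0,
              "monitor flags must be distinct bits");

static bool monitor_wants_oob(int flags, bool chardev_is_mux)
{
    /*
     * Out-Of-Band is a QMP-only feature.  It cannot be enabled for
     * monitors with MUX-typed chardev backends, because not all the
     * chardev frontends can run without the main thread, or can run in
     * multiple threads.
     */
    return (flags & MONITOR_USE_CONTROL) && !chardev_is_mux;
}
```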

> 
> >  
> >  monitor_data_init(mon, false, use_oob);
> >  
> > @@ -4701,9 +4689,6 @@ QemuOptsList qemu_mon_opts = {
> >  },{
> >  .name = "pretty",
> >  .type = QEMU_OPT_BOOL,
> > -},{
> > -.name = "x-oob",
> > -.type = QEMU_OPT_BOOL,
> >  },
> >  { /* end of list */ }
> >  },
> > diff --git a/tests/libqtest.c b/tests/libqtest.c
> > index 098af6aec4..c5cb3f925c 100644
> > --- a/tests/libqtest.c
> > +++ b/tests/libqtest.c
> > @@ -213,7 +213,7 @@ QTestState *qtest_init_without_qmp_handshake(bool 
> > use_oob,
> >"-display none "
> >"%s", qemu_binary, socket_path,
> >getenv("QTEST_LOG") ? "/dev/fd/2" : 
> > "/dev/null",
> > -  qmp_socket_path, use_oob ? ",x-oob=on" : 
> > "",
> > +  qmp_socket_path, "",
> >extra_args ?: "");
> >  execlp("/bin/sh", "sh", "-c", command, NULL);
> >  exit(1);
> > diff --git a/tests/qmp-test.c b/tests/qmp-test.c
> > index a49cbc6fde..3747bf7fbb 100644
> > --- a/tests/qmp-test.c
> > +++ b/tests/qmp-test.c
> > @@ -89,7 +89,7 @@ static void test_qmp_protocol(void)
> >  g_assert(q);
> >  test_version(qdict_get(q, "version"));
> >  capabilities = qdict_get_qlist(q, "capabilities");
> > -g_assert(capabilities && qlist_empty(capabilities));
> > +g_assert(capabilities);
> >  qobject_unref(resp);
> >  
> >  /* Test valid command before handshake */
> > diff --git a/vl.c b/vl.c
> > index 6e34fb348d..26a0bb3f0f 100644
> > --- a/vl.c
> > +++ b/vl.c
> > @@ -2307,11 +2307,6 @@ static int mon_init_func(void *opaque, QemuOpts 
> > *opts, 

Re: [Qemu-devel] [PATCH v4 4/7] tests: iotests: drop some stderr line

2018-06-19 Thread Peter Xu
On Tue, Jun 19, 2018 at 03:57:07PM +0200, Markus Armbruster wrote:
> Peter Xu  writes:
> 
> > In my Out-Of-Band test, "check -qcow2 060" fail with this (the output is
> > manually changed due to line width requirement):
> >
> > 060 5s ... - output mismatch (see 060.out.bad)
> > --- /home/peterx/git/qemu/tests/qemu-iotests/060.out
> > +++ /home/peterx/git/qemu/bin/tests/qemu-iotests/060.out.bad
> > @@ -427,8 +427,8 @@
> >  QMP_VERSION
> >  {"return": {}}
> >  qcow2: Image is corrupt: L2 table offset 0x2a2a2a00 unaligned (L1
> >   index: 0); further non-fatal corruption events will be suppressed
> > -{"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP},
> > - "event": "BLOCK_IMAGE_CORRUPTED", "data": {"device": "", "msg": "L2
> > - table offset 0x2a2a2a0
> > 0 unaligned (L1 index: 0)", "node-name": "drive", "fatal": false}}
> >  read failed: Input/output error
> > +{"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP},
> > + "event": "BLOCK_IMAGE_CORRUPTED", "data": {"device": "", "msg": "L2
> > + table offset 0x2a2a2a0
> > 0 unaligned (L1 index: 0)", "node-name": "drive", "fatal": false}}
> >  {"return": ""}
> >  {"return": {}}
> >  {"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP},
> >   "event": "SHUTDOWN", "data": {"guest": false}}
> 
> Please indent this diff; I'd expect git-am to choke on it.

Do you mean something like pretty-JSON?

How about I remove this chunk too?  What do you prefer?

> 
> >
> > The order of the event and the in/out error line is swapped.  I didn't
> > dig up the reason, but AFAIU what we want to verify is the event rather
> > than stderr.  Let's drop the stderr line directly for this test.
> >
> > Signed-off-by: Peter Xu 

Regards,

-- 
Peter Xu



Re: [Qemu-devel] [PATCH v4 3/7] monitor: flush qmp responses when CLOSED

2018-06-19 Thread Peter Xu
On Tue, Jun 19, 2018 at 03:55:12PM +0200, Markus Armbruster wrote:
> Peter Xu  writes:
> 
> > On Tue, Jun 19, 2018 at 01:34:22PM +0800, Peter Xu wrote:
> >
> > [...]
> >
> >> Fixes: 6d2d563f8c ("qmp: cleanup qmp queues properly", 2018-03-27)
> >> Suggested-by: Markus Armbruster 
> >> Signed-off-by: Peter Xu 
> >> 
> >> Signed-off-by: Peter Xu 
> >
> > I am pretty sure this time that this 2nd line is not there in my local
> > tree. :)
> >
> > I think it's a git-format-patch bug, otherwise I must have misused it
> > for a long time.  Instead of figuring this out and repost again, I'll
> > see how far the rest of the series can go.
> 
> Do you use git-format-patch -s, or have format.signOff set in
> .git/config or ~/.gitconfig?

Ah it's in my ~/.gitconfig!  Removing that fixes the issue.

Though I'm still not sure why the problem doesn't happen with other
patches.  In any case, given the line wrapping mess I still prefer to
drop that chunk from the commit message entirely.

Regards,

-- 
Peter Xu



Re: [Qemu-devel] [PATCH v4 3/7] monitor: flush qmp responses when CLOSED

2018-06-19 Thread Peter Xu
On Tue, Jun 19, 2018 at 03:53:11PM +0200, Markus Armbruster wrote:
> Peter Xu  writes:
> 
> > > Previously we cleaned up the queues when we got a CLOSED event.  That was
> > > done to make sure we won't send leftover replies/events of an old client
> > > to a new client, which makes perfect sense.  However, this also drops the
> > > replies/events even if the output port of the previous chardev backend
> > > is still open, which can lose the last replies/events.
> > > Now this patch does an extra operation to flush the response queue
> > > before cleaning up.
> >
> > In most cases, a QMP session will be based on a bidirectional channel (a
> > TCP port, for example, we read/write to the same socket handle), so in
> > port and out port of the backend chardev are fundamentally the same
> > > port. In these cases, it does not matter much whether we flush
> > > the response queue, since flushing will fail anyway.  However there
> > can be cases where in & out ports of the QMP monitor's backend chardev
> > are separated.  Here is an example:
> >
> >   cat $QMP_COMMANDS | qemu -qmp stdio ... | filter_commands
> >
> > In this case, the backend is fd-typed, and it is connected to stdio
> > where in port is stdin and out port is stdout.  Now if we drop all the
> > > events on the response queue then the filter_commands process might miss
> > > some events that it expects.  The thing is that, when stdin closes,
> > stdout might still be there alive!
> >
> > In practice, I encountered SHUTDOWN event missing when running test with
> > iotest 087 with Out-Of-Band enabled.  Here is one of the ways that this
> > can happen (after "quit" command is executed and QEMU quits the main
> > loop):
> >
> > 1. [main thread] QEMU queues a SHUTDOWN event into response queue.
> >
> > 2. "cat" terminates (to distinguish it from the animal, I quote it).
> >
> > 3. [monitor iothread] QEMU's monitor iothread reads EOF from stdin.
> >
> > > 4. [monitor iothread] QEMU's monitor iothread calls the CLOSED event
> > >    hook for the monitor, which will destroy the response queue of the
> > >    monitor, then the SHUTDOWN event is dropped.
> > >
> > > 5. [main thread] QEMU's main thread cleans up the monitors in
> > >    monitor_cleanup().  When trying to flush pending responses, it sees
> > >    nothing.  SHUTDOWN is lost forever.
> >
> > Note that before the monitor iothread was introduced, step [4]/[5] could
> > never happen since the main loop was the only place to detect the EOF
> > event of stdin and run the CLOSED event hooks.  Now things can happen in
> > parallel in the iothread.
> >
> > > Without this patch, iotest 087 has a ~10% chance of missing the
> > > SHUTDOWN event and failing with Out-Of-Band enabled (the output is
> > > manually touched up to suit the line width requirement):
> 
> I wouldn't wrap lines when quoting a diff.
> 
> >
> > --- /home/peterx/git/qemu/tests/qemu-iotests/087.out
> > +++ /home/peterx/git/qemu/bin/tests/qemu-iotests/087.out.bad
> > @@ -8,7 +8,6 @@
> >  {"return": {}}
> >  {"error": {"class": "GenericError", "desc": "'node-name' must be
> >   specified for the root node"}}
> >  {"return": {}}
> > -{"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP},
> > - "event": "SHUTDOWN", "data": {"guest": false}}
> >
> >  === Duplicate ID ===
> > @@ -53,7 +52,6 @@
> >  {"return": {}}
> >  {"return": {}}
> >  {"return": {}}
> >
> > -{"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP},
> > - "event": "SHUTDOWN", "data": {"guest": false}}
> 
> > Please indent the quoted diff a bit, to make it more obviously not part
> > of the patch.  In fact, git-am chokes on it for me.

To make it even simpler, I plan to remove the whole chunk of the diff
from the commit message if you won't disagree.
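
For anyone following the race in steps 1-5 of the commit message, here
is a minimal sketch of the "flush before cleaning up" ordering.  It is
not the QEMU code: ResponseQueue, queue_push(), queue_pop_one() and
queue_flush() are hypothetical stand-ins built on plain pthreads, and
the emit step is reduced to fputs().  The only point it tries to show
is that each pop takes the lock briefly and the drain loop runs until
the queue is empty, so a later cleanup has nothing left to drop.

```c
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the monitor's response queue (not QEMU types). */
typedef struct Response {
    const char *text;
    struct Response *next;
} Response;

typedef struct ResponseQueue {
    pthread_mutex_t lock;
    Response *head;
    Response *tail;
} ResponseQueue;

static void queue_init(ResponseQueue *q)
{
    pthread_mutex_init(&q->lock, NULL);
    q->head = NULL;
    q->tail = NULL;
}

/* Producer side (e.g. the main thread queueing SHUTDOWN). */
static int queue_push(ResponseQueue *q, const char *text)
{
    Response *r = malloc(sizeof(*r));

    if (!r) {
        return -1;
    }
    r->text = text;
    r->next = NULL;

    pthread_mutex_lock(&q->lock);
    if (q->tail) {
        q->tail->next = r;
    } else {
        q->head = r;
    }
    q->tail = r;
    pthread_mutex_unlock(&q->lock);
    return 0;
}

/* Pop one entry; the lock covers only the pop, not the emit. */
static Response *queue_pop_one(ResponseQueue *q)
{
    Response *r;

    pthread_mutex_lock(&q->lock);
    r = q->head;
    if (r) {
        q->head = r->next;
        if (!q->head) {
            q->tail = NULL;
        }
    }
    pthread_mutex_unlock(&q->lock);
    return r;
}

/* Flush: emit everything still pending, return how many were written. */
static size_t queue_flush(ResponseQueue *q, FILE *out)
{
    Response *r;
    size_t sent = 0;

    while ((r = queue_pop_one(q)) != NULL) {
        fputs(r->text, out);
        free(r);
        sent++;
    }
    return sent;
}
```

A CLOSED handler in this model would call queue_flush() first and only
then tear the queue down; a second flush afterwards finds nothing.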

> 
> >
> > This patch fixes the problem.
> >
> > Fixes: 6d2d563f8c ("qmp: cleanup qmp queues properly", 2018-03-27)
> > Suggested-by: Markus Armbruster 
> > Signed-off-by: Peter Xu 
> >
> > Signed-off-by: Peter Xu 
> > ---
> >  monitor.c | 33 ++---
> >  1 file changed, 30 insertions(+), 3 deletions(-)
> >
> > diff --git a/monitor.c b/monitor.c
> > index d4a463f707..c9a02ee40c 100644
> > --- a/monitor.c
> > +++ b/monitor.c
> > @@ -512,6 +512,27 @@ struct QMPResponse {
> >  };
> >  typedef struct QMPResponse QMPResponse;
> >  
> > > +static QObject *monitor_qmp_response_pop_one(Monitor *mon)
> > > +{
> > > +    QObject *data;
> > > +
> > > +    qemu_mutex_lock(&mon->qmp.qmp_queue_lock);
> > > +    data = g_queue_pop_head(mon->qmp.qmp_responses);
> > > +    qemu_mutex_unlock(&mon->qmp.qmp_queue_lock);
> > > +
> > > +    return data;
> > > +}
> > > +
> > > +static void monitor_qmp_response_flush(Monitor *mon)
> > > +{
> > > +    QObject *data;
> > > +
> > > +    while ((data = monitor_qmp_response_pop_one(mon))) {
> > > +        monitor_json_emitter_raw(mon, data);
> > > +        qobject_unref(data);
> > > +    }
> > > +}
> > +
> >  /*
> >   * Pop a QMPResponse from any monitor's response queue into @response.
> >   * Return false if all the queues are empty; 

Re: [Qemu-devel] [PATCH v3 1/2] kvm: support -dedicated cpu-pm=on|off

2018-06-19 Thread Michael S. Tsirkin
On Wed, Jun 20, 2018 at 08:46:10AM +0800, Wanpeng Li wrote:
> On Wed, 20 Jun 2018 at 08:07, Michael S. Tsirkin  wrote:
> >
> > On Tue, Jun 19, 2018 at 05:07:46PM -0500, Eric Blake wrote:
> > > On 06/19/2018 10:17 AM, Paolo Bonzini wrote:
> > > > On 16/06/2018 00:29, Michael S. Tsirkin wrote:
> > > > > > +static QemuOptsList qemu_dedicated_opts = {
> > > > > > +    .name = "dedicated",
> > > > > > +    .head = QTAILQ_HEAD_INITIALIZER(qemu_dedicated_opts.head),
> > > > > > +    .desc = {
> > > > > > +        {
> > > > > > +            .name = "mem-lock",
> > > > > > +            .type = QEMU_OPT_BOOL,
> > > > > > +        },
> > > > > > +        {
> > > > > > +            .name = "cpu-pm",
> > > > > > +            .type = QEMU_OPT_BOOL,
> > > > > > +        },
> > > > > > +        { /* end of list */ }
> > > > > > +    },
> > > > > > +};
> > > > > +
> > > >
> > > > Let the bikeshedding begin!
> > > >
> > > > 1) Should we deprecate -realtime?
> > > >
> > > > 2) Maybe -hostresource?
> > >
> > > What further things might we add in the future?
> > >
> > > > -dedicated sounds wrong (it is an adjective, while most of our options are
> > > > nouns - think -machine, -drive, -object, ...)
> > >
> > > -hostresource at least sounds like a noun, but is long to type.  But at
> > > least '-hostresource cpu-pm=on' reads reasonably well.
> >
> > Yes but host resource what? I feel it says nothing at all about what
> > one can expect to find in this flag.
> >
> > > About the only other noun I could think of would be '-feature cpu-pm=on'.
> >
> > If we have nothing at all to say about what is grouping these things,
> > we don't need a new flag. We can make it a machine property.
> >
> > It's user's hint that some host resource is dedicated to a VM.
> 
> The commit 633711e82 (kvm: rename KVM_HINTS_DEDICATED to
> KVM_HINTS_REALTIME) should be reverted according to several threads
> discussion I think.
> 
> Regards,
> Wanpeng Li

IMHO that is unrelated - these KVM hints are hints to *guest*.

In this thread we are talking about hints to QEMU that are only
necessary because QEMU is separate from the host scheduler/memory
management.

-- 
MST



Re: [Qemu-devel] [PATCH for-2.11.2] spapr: make pseries-2.11 the default machine type

2018-06-19 Thread David Gibson
On Tue, Jun 19, 2018 at 01:11:28PM +0200, Greg Kurz wrote:
> On Mon, 18 Jun 2018 21:04:38 -0500
> Michael Roth  wrote:
> 
> > Quoting Greg Kurz (2018-05-22 12:17:28)
> > > The spapr capability framework was introduced in QEMU 2.12. It allows
> > > to have an explicit control on how host features are exposed to the
> > > guest. This is especially needed to handle migration between hetero-
> > > geneous hosts (eg, POWER8 to POWER9). It is also used to expose fixes/
> > > workarounds against speculative execution vulnerabilities to guests.
> > > The framework was hence backported to QEMU 2.11.1, especially these
> > > commits:
> > > 
> > > 0fac4aa93074 spapr: Add pseries-2.12 machine type
> > > 9070f408f491 spapr: Treat Hardware Transactional Memory (HTM) as an
> > >  optional capability
> > > 
> > > 0fac4aa93074 has the confusing effect of making pseries-2.12 the default
> > > machine type for QEMU 2.11.1, instead of the expected pseries-2.11. This
> > > patch changes the default machine back to pseries-2.11.
> > > 
> > > Unfortunately, 9070f408f491 enforces the HTM capability for pseries-2.11.
> > > This isn't supported by TCG and breaks 'make check'. So this patch also
> > > adds a hack to turn HTM off when using TCG.  
> > 
> > I noticed this ends up breaking TCG migration for 2.11.2 -> 2.12, I
> > get this on the target side even when specifying -machine
> > pseries-2.11,cap-htm=off for both ends:
> > 
> > >   qemu-system-ppc64: cap-htm higher level (1) in incoming stream than on destination (0)
> > >   qemu-system-ppc64: error while loading state for instance 0x0 of device 'spapr'
> > >   qemu-system-ppc64: load of migration failed: Invalid argument
> > > 
> > > I'm not sure we care all that much about it but it's a regression from 2.11.1
> > > at least. The main issue seems to be the default caps for 2.11.2 for TCG are
> > > now different from 2.11 and 2.12+, but spapr_cap_##cap##_needed still assumes
> > > everything is the same across all these versions and as such opts not to
> > > migrate cap-htm=off, since that's the default for 2.11.2. This results in the
> > > target assuming the source was using the default, which is cap-htm=on,
> > > and since that disagrees with the spapr->eff we get a failure.
> > 
> > It seems spapr_cap_##cap##_needed needs to be fixed up to address that,
> > but I'm not sure how best to deal with backward compatibility in that case.
> > Any ideas? If it ends up being a trade-off I think forward compatibility is
> > more important.
> > 
> 
> Yeah, we shouldn't change the default since it affects the migration logic :-\
> 
> The motivation behind this hack is to fix TCG based 'make check', because
> it doesn't pass cap-htm=off, and thus can't run with pseries-2.11.
> 
> > Another possibility is to leave the default as is, and to disable HTM after the
> > default caps have been applied.
> 
> Something like that squashed into this patch:
> 
> diff --git a/hw/ppc/spapr_caps.c b/hw/ppc/spapr_caps.c
> index 82043e60e78b..26e6be043b18 100644
> --- a/hw/ppc/spapr_caps.c
> +++ b/hw/ppc/spapr_caps.c
> @@ -285,11 +285,6 @@ static sPAPRCapabilities 
> default_caps_with_cpu(sPAPRMachineState *spapr,
>  
>  caps = smc->default_caps;
>  
> -/* HACK for 2.11.2: fix make check */
> -if (tcg_enabled()) {
> -caps.caps[SPAPR_CAP_HTM] = SPAPR_CAP_OFF;
> -}
> -
>  if (!ppc_check_compat(cpu, CPU_POWERPC_LOGICAL_2_07,
>0, spapr->max_compat_pvr)) {
>  caps.caps[SPAPR_CAP_HTM] = SPAPR_CAP_OFF;
> @@ -405,6 +400,11 @@ void spapr_caps_reset(sPAPRMachineState *spapr)
>  }
>  }
>  
> +/* HACK for 2.11.2: fix make check */
> +if (tcg_enabled()) {
> +spapr->eff.caps[SPAPR_CAP_HTM] = SPAPR_CAP_OFF;
> +}
> +
>  /* .. then apply those caps to the virtual hardware */
>  
>  for (i = 0; i < SPAPR_CAP_NUM; i++) {
> -

No!  The whole point of the caps stuff is to stop changing guest
visible behaviours based on host side configuration like the
accelerator.  So, really, let's not put it back in.

The correct fix is to add cap-htm=off to the testcases.  Gross, but
necessary.
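
To make the failure Michael reported concrete, here is a rough model of
the "only migrate a cap when it differs from the default" logic.  This
is a sketch under my own assumptions, not the spapr_caps.c code:
CapState, cap_needed(), cap_incoming() and cap_migration_ok() are
invented names, and the rule "fail when the incoming level is higher
than the destination's" is inferred from the error message quoted
above.

```c
#include <stdbool.h>

enum { CAP_OFF = 0, CAP_ON = 1 };

/* Hypothetical model of one capability on one end of a migration. */
typedef struct CapState {
    int def;  /* default for this machine type on this QEMU build */
    int eff;  /* effective value after command-line overrides */
} CapState;

/* Source: the cap is put on the wire only if it differs from the default. */
static bool cap_needed(const CapState *src)
{
    return src->eff != src->def;
}

/*
 * Destination: if nothing was sent, it assumes the source used the
 * default, and "the default" here is the destination's own idea of it.
 */
static int cap_incoming(const CapState *src, const CapState *dst)
{
    return cap_needed(src) ? src->eff : dst->def;
}

/* Assumed rule: an incoming level above the destination's is an error. */
static bool cap_migration_ok(const CapState *src, const CapState *dst)
{
    return cap_incoming(src, dst) <= dst->eff;
}
```

With a source whose default was silently changed to off (def = off,
eff = off) and a destination with def = on, eff = off, nothing is sent,
the destination assumes "on", and the check fails even though both
ends were started with cap-htm=off.  Keeping the defaults identical on
both sides makes cap_needed() true and the migration passes.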


> 
> This allows:
> - TCG 'make check' to be happy with pseries-2.11
> - 2.11.2 --> 2.12 migration and backward
> 
> > > 
> > > Signed-off-by: Greg Kurz 
> > > ---
> > >  hw/ppc/spapr.c  |4 ++--
> > >  hw/ppc/spapr_caps.c |5 +
> > >  2 files changed, 7 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> > > index 1a2dd1f597d9..6499a867520f 100644
> > > --- a/hw/ppc/spapr.c
> > > +++ b/hw/ppc/spapr.c
> > > @@ -3820,7 +3820,7 @@ static void 
> > > spapr_machine_2_12_class_options(MachineClass *mc)
> > >  /* Defaults for the latest behaviour inherited from the base class */
> > >  }
> > > 
> > > -DEFINE_SPAPR_MACHINE(2_12, "2.12", true);
> > > +DEFINE_SPAPR_MACHINE(2_12, "2.12", false);
> > > 
> > >  /*
> > >   * pseries-2.11
> > > @@ -3842,7 +3842,7 @@ static void 

Re: [Qemu-devel] [PATCH v2 3/8] ppc4xx_i2c: Implement directcntl register

2018-06-19 Thread David Gibson
On Tue, Jun 19, 2018 at 11:29:09AM +0200, BALATON Zoltan wrote:
> On Mon, 18 Jun 2018, David Gibson wrote:
> > On Wed, Jun 13, 2018 at 04:03:18PM +0200, BALATON Zoltan wrote:
> > > On Wed, 13 Jun 2018, David Gibson wrote:
> > > > On Wed, Jun 13, 2018 at 10:54:22AM +0200, BALATON Zoltan wrote:
> > > > > On Wed, 13 Jun 2018, David Gibson wrote:
> > > > > > On Wed, Jun 06, 2018 at 03:31:48PM +0200, BALATON Zoltan wrote:
> > > > > > > diff --git a/hw/i2c/ppc4xx_i2c.c b/hw/i2c/ppc4xx_i2c.c
> > > > > > > index a68b5f7..5806209 100644
> > > > > > > --- a/hw/i2c/ppc4xx_i2c.c
> > > > > > > +++ b/hw/i2c/ppc4xx_i2c.c
> > > > > > > @@ -30,6 +30,7 @@
> > > > > > >  #include "cpu.h"
> > > > > > >  #include "hw/hw.h"
> > > > > > >  #include "hw/i2c/ppc4xx_i2c.h"
> > > > > > > +#include "bitbang_i2c.h"
> > > > > > > 
> > > > > > >  #define PPC4xx_I2C_MEM_SIZE 18
> > > > > > > 
> > > > > > > @@ -46,7 +47,13 @@
> > > > > > > 
> > > > > > >  #define IIC_XTCNTLSS_SRST   (1 << 0)
> > > > > > > 
> > > > > > > +#define IIC_DIRECTCNTL_SDAC (1 << 3)
> > > > > > > +#define IIC_DIRECTCNTL_SCLC (1 << 2)
> > > > > > > +#define IIC_DIRECTCNTL_MSDA (1 << 1)
> > > > > > > +#define IIC_DIRECTCNTL_MSCL (1 << 0)
> > > > > > > +
> > > > > > >  typedef struct {
> > > > > > > +bitbang_i2c_interface *bitbang;
> > > > > > >  uint8_t mdata;
> > > > > > >  uint8_t lmadr;
> > > > > > >  uint8_t hmadr;
> > > > > > > @@ -308,7 +315,11 @@ static void ppc4xx_i2c_writeb(void *opaque, 
> > > > > > > hwaddr addr, uint64_t value,
> > > > > > >  i2c->xtcntlss = value;
> > > > > > >  break;
> > > > > > >  case 16:
> > > > > > > -i2c->directcntl = value & 0x7;
> > > > > > > +i2c->directcntl = value & (IIC_DIRECTCNTL_SDAC & 
> > > > > > > IIC_DIRECTCNTL_SCLC);
> > > > > > > +i2c->directcntl |= (value & IIC_DIRECTCNTL_SCLC ? 1 : 0);
> > > > > > > +bitbang_i2c_set(i2c->bitbang, BITBANG_I2C_SCL, 
> > > > > > > i2c->directcntl & 1);
> > > > > > 
> > > > > > Shouldn't that use i2c->directcntl & IIC_DIRECTCNTL_MSCL ?
> > > > > > 
> > > > > > > > +        i2c->directcntl |= bitbang_i2c_set(i2c->bitbang, BITBANG_I2C_SDA,
> > > > > > > > +                               (value & IIC_DIRECTCNTL_SDAC) != 0) << 1;
> > > > > > 
> > > > > > Last expression might be clearer as:
> > > > > > value & IIC_DIRECTCNTL_SDAC ? IIC_DIRECTCNTL_MSDA : 0
> > > > > 
> > > > > > I guess this is a matter of taste but to me IIC_DIRECTCNTL_MSDA is a bit
> > > > > > position in the register so I use that when accessing that bit but when I
> > > > > > check for the values of a bit being 0 or 1 I don't use the define which is
> > > > > > for something else, just happens to have value 1 as well.
> > > > 
> > > > > Hmm.. but the bit is being stored in i2c->directcntl, which means it
> > > > can be read back from the register in that position, no?
> > > 
> > > Which of the above two do you mean?
> > > 
> > > > In the first one I test for the 1/0 value set by the previous line before
> > > > the bitbang_i2c_set call. This could be accessed as MSCL later but using
> > > > that here would just make it longer and less obvious. If I want to be
> > > > absolutely precise maybe it should be (value & IIC_DIRECTCNTL_SCLC ? 1 : 0)
> > > > in this line too but that was just stored in the register one line before so
> > > > I can reuse that here as well. Otherwise I could add another variable just
> > > > for this bit value and use that in both lines but why make it more
> > > > complicated for a simple 1 or 0 value?
> > 
> > Longer maybe, but I don't know about less obvious.  Actually I think
> > you should use IIC_DIRECTCNTL_MSCL instead of a bare '1' in both the
> > line setting i2c->directcntl, then the next line checking that bit to
> > pass it into bitbang_i2c_set.  The point is you're modifying the
> > effective register contents, so it makes sense to make it clearer
> > which bit of the register you're setting.
> 
> When setting the bit it's the value 1 so that's not the bit
> position,

Huh??  The constants aren't bit positions either, they're masks.  How
is IIC_DIRECTCNTL_MSCL wrong here?

> I
> think 1 : 0 is correct there.

Correct, sure, but less clear than it could be.
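
To show what I mean by using the masks throughout, here is a small
standalone sketch.  The four IIC_DIRECTCNTL_* defines are the ones from
the patch; directcntl_update() and its sda_seen/scl_seen parameters are
made up for illustration (they stand in for whatever bitbang_i2c_set()
reports back).  One assumption to flag: the sketch ORs the two control
masks when filtering the written value, on the reading that both SDAC
and SCLC should be kept, since ANDing two distinct single-bit masks
gives 0.

```c
#include <stdbool.h>
#include <stdint.h>

#define IIC_DIRECTCNTL_SDAC (1 << 3)
#define IIC_DIRECTCNTL_SCLC (1 << 2)
#define IIC_DIRECTCNTL_MSDA (1 << 1)
#define IIC_DIRECTCNTL_MSCL (1 << 0)

/*
 * Hypothetical helper: compute the new register contents from the
 * value written by the guest and the line levels actually observed.
 * Every bit is named by its mask, so no bare 1 or "<< 1" is needed.
 */
static uint8_t directcntl_update(uint8_t value, bool sda_seen, bool scl_seen)
{
    /* Keep only the two writable control bits (assumed intent: OR). */
    uint8_t reg = value & (IIC_DIRECTCNTL_SDAC | IIC_DIRECTCNTL_SCLC);

    /* The monitor bits are read-only and reflect the real line state. */
    if (scl_seen) {
        reg |= IIC_DIRECTCNTL_MSCL;
    }
    if (sda_seen) {
        reg |= IIC_DIRECTCNTL_MSDA;
    }
    return reg;
}
```

Reading the line state back then becomes reg & IIC_DIRECTCNTL_MSCL or
reg & IIC_DIRECTCNTL_MSDA, which says which bit is meant without the
reader having to remember the numeric values.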

> I've changed the next line in v4 I've just
> sent to the constant when checking the value of the MSCL bit.
> 
> > > > In the second case using MSDA is really not correct because the level to set
> > > > is defined by SDAC bit. The SDAC, SCLC bits are what the program sets to
> > > > tell which states the two i2c lines should be and the MSDA, MSCL are read
> > > > only bits that show what states the lines really are.
> > 
> > Ok...
> > 
> > > IIC_DIRECTCNTL_MSDA has value of 1 but it means the second bit in the
> > > directcntl reg (which could have 0 or 1 value) not 1 value of a bit or i2c
> > > line.
> > 
> > Uh.. what?  AFAICT, based on the result of bitbang_i2c_set() you're
> > updating the value of the MSDA (== 0x2) 

Re: [Qemu-devel] [Qemu-block] [RFC 1/1] ide: bug #1777315: io_buffer_size and sg.size can represent partial sector sizes

2018-06-19 Thread Amol Surati
On Wed, Jun 20, 2018 at 06:23:19AM +0530, Amol Surati wrote:
> On Tue, Jun 19, 2018 at 05:43:52PM -0400, John Snow wrote:
> > 
> > 
> > On 06/19/2018 05:26 PM, Amol Surati wrote:
> > > On Tue, Jun 19, 2018 at 08:04:03PM +0530, Amol Surati wrote:
> > >> On Tue, Jun 19, 2018 at 09:45:15AM -0400, John Snow wrote:
> > >>>
> > >>>
> > >>> On 06/19/2018 04:53 AM, Kevin Wolf wrote:
> >  On 19.06.2018 at 06:01, Amol Surati wrote:
> > > On Mon, Jun 18, 2018 at 08:14:10PM -0400, John Snow wrote:
> > >>
> > >>
> > >> On 06/18/2018 02:02 PM, Amol Surati wrote:
> > >>> On Mon, Jun 18, 2018 at 12:05:15AM +0530, Amol Surati wrote:
> >  This patch fixes the assumption that io_buffer_size is always a perfect
> >  multiple of the sector size. The assumption is the cause of the firing
> >  of 'assert(n * 512 == s->sg.size);'.
> > 
> >  Signed-off-by: Amol Surati 
> >  ---
> > >>>
> > > >>> The repository https://github.com/asurati/1777315 contains a module for
> > > >>> QEMU's 8086:7010 ATA controller, which exercises the code path
> > > >>> described in [RFC 0/1] of this series.
> > >>>
> > >>
> > > >> Thanks, this made it easier to see what was happening. I was able to
> > > >> write an ide-test test case using this source as a guide, and reproduce
> > > >> the error.
> > > >>
> > > >> static void test_bmdma_partial_sector_short_prdt(void)
> > > >> {
> > > >>     QPCIDevice *dev;
> > > >>     QPCIBar bmdma_bar, ide_bar;
> > > >>     uint8_t status;
> > > >>
> > > >>     /* Read 2 sectors but only give 1 sector in PRDT */
> > > >>     PrdtEntry prdt[] = {
> > > >>         {
> > > >>             .addr = 0,
> > > >>             .size = cpu_to_le32(0x200),
> > > >>         },
> > > >>         {
> > > >>             .addr = 512,
> > > >>             .size = cpu_to_le32(0x44 | PRDT_EOT),
> > > >>         }
> > > >>     };
> > > >>
> > > >>     dev = get_pci_device(&bmdma_bar, &ide_bar);
> > > >>     status = send_dma_request(CMD_READ_DMA, 0, 2,
> > > >>                               prdt, ARRAY_SIZE(prdt), NULL);
> > > >>     g_assert_cmphex(status, ==, 0);
> > > >>     assert_bit_clear(qpci_io_readb(dev, ide_bar, reg_status), DF | ERR);
> > > >>     free_pci_device(dev);
> > > >> }
> > >>
> > >>> Loading the module reproduces the bug. Tested on the latest master
> > >>> branch.
> > >>>
> > >>> Steps:
> > > >>> - Install a Linux distribution as a guest, ensuring that the boot disk
> > > >>>   resides on non-IDE controllers (such as virtio)
> > >>> - Attach another disk as a master device on the primary
> > >>>   IDE controller (i.e. attach at -hda.)
> > >>> - Blacklist ata_piix, pata_acpi and ata_generic modules, and reboot.
> > >>> - Copy the source files into the guest and build the module.
> > >>> - Load the module. QEMU process should die with the message:
> > >>>   qemu-system-x86_64: hw/ide/core.c:871: ide_dma_cb:
> > >>>   Assertion `n * 512 == s->sg.size' failed.
> > >>>
> > >>>
> > >>> -Amol
> > >>>
> > >>
> > > >> I'm less sure of the fix -- certainly the assert is wrong, but just
> > > >> incrementing 'n' is wrong too -- we didn't copy (n+1) sectors, we copied
> > > >> (n) and a few extra bytes.
> > >
> > > That is true.
> > >
> > > > There are (at least) two fields that represent the total size of a DMA
> > > > transfer -
> > > > (1) The size, as requested through the NSECTOR field.
> > > > (2) The size, as calculated through the length fields of the PRD entries.
> > > >
> > > > It makes sense to consider the most restrictive of the sizes, as the factor
> > > > which determines both the end of a successful DMA transfer and the
> > > > condition to assert.
> > >
> > >>
> > > >> The sector-based math here would need to be adjusted to be able to cope
> > > >> with partial sector reads... or we ought to avoid doing any partial
> > > >> sector transfers.
> > >>
> > >>
> > >> I'm not sure which is more correct tonight, it depends:
> > >>
> > >> - If it's OK to transfer partial sectors before reporting overflow,
> > >> adjusting the command loop to work with partial sectors is OK.
> > >>
> > > >> - If it's NOT OK to do partial sector transfer, the sglist preparation
> > > >> phase needs to produce a truncated SGList that's some multiple of 512
> > > >> bytes that leaves the excess bytes in a second sglist that we don't
> > > >> throw away and can use as a basis for building the next sglist. (Or the
> > > >> DMA helpers need to take a max_bytes parameter and return an sglist
> > > >> representing unused buffer space if the command underflowed.)
> > >
> > > Support for 

Re: [Qemu-devel] [PATCH V1 RESEND 4/6] numa: Extend the command-line to provide memory latency and bandwidth information

2018-06-19 Thread Liu, Jingqi



On 6/19/2018 11:39 PM, Eric Blake wrote:
> On 06/19/2018 10:20 AM, Liu Jingqi wrote:
> > Add -numa hmat-lb option to provide System Locality Latency and
> > Bandwidth Information. These memory attributes help to build
> > System Locality Latency and Bandwidth Information Structure(s)
> > in ACPI Heterogeneous Memory Attribute Table (HMAT).
> >
> > Signed-off-by: Liu Jingqi
> > ---
> >   numa.c  | 124 
> >   qapi/misc.json  |  92 -
> >   qemu-options.hx |  28 -
> >   3 files changed, 241 insertions(+), 3 deletions(-)
> >
>
> > +++ b/qapi/misc.json
> > @@ -2736,10 +2736,12 @@
> >   #
> >   # @cpu: property based CPU(s) to node mapping (Since: 2.10)
> >   #
> > +# @hmat-lb: memory latency and bandwidth information (Since: 2.13)
>
> s/2.13/3.0/ through your series
>
> >   ##
> > +# @HmatLBMemoryHierarchy:
> > +#
> > +# The memory hierarchy in the System Locality Latency
> > +# and Bandwidth Information Structure of HMAT
>
> Worth including the expansion of the acronym HMAT for someone not
> familiar with the term?
>
> > +#
> > +# @memory: the structure represents the memory performance
> > +#
> > +# @last-level: last level memory of memory side cached memory
> > +#
> > +# @1st-level: first level memory of memory side cached memory
> > +#
> > +# @2nd-level: second level memory of memory side cached memory
> > +#
> > +# @3rd-level: third level memory of memory side cached memory
> > +#
> > +# Since: 2.13
> > +##
> > +{ 'enum': 'HmatLBMemoryHierarchy',
> > +  'data': [ 'memory', 'last-level', '1st-level',
> > +            '2nd-level', '3rd-level' ] }
>
> enum values starting with a digit is permitted for legacy reasons, but
> I'm reluctant to add more without good cause.  Can you spell these
> 'first, second, third' instead of '1st, 2nd, 3rd'?
>
> > +
> > +##
> > +# @HmatLBDataType:
> > +#
> > +# Data type in the System Locality Latency
> > +# and Bandwidth Information Structure of HMAT
> > +#
> > +# @access-latency: access latency
> > +#
> > +# @read-latency: read latency
> > +#
> > +# @write-latency: write latency
> > +#
> > +# @access-bandwidth: access bandwitch
>
> s/witch/width/
>
> Also, in what units are these numbers?

Thanks for your review. I will modify them accordingly.
Jingqi



Re: [Qemu-devel] [PATCH v2 3/4] ppc/pnv: introduce Pnv8Chip and Pnv9Chip models

2018-06-19 Thread David Gibson
On Tue, Jun 19, 2018 at 07:24:44AM +0200, Cédric Le Goater wrote:
> 
> >>>  typedef struct PnvChipClass {
> >>>  /*< private >*/
> >>> @@ -75,6 +95,7 @@ typedef struct PnvChipClass {
> >>>  
> >>>  hwaddr   xscom_base;
> >>>  
> >>> +void (*realize)(PnvChip *chip, Error **errp);
> >>
> >> This looks the wrong way round from how things are usually done.
> >> Rather than having the base class realize() call the subclass specific
> >> realize hook, it's more usual for the subclass to set the
> >> dc->realize() and have it call a k->parent_realize() to call up the
> >> chain.  grep for device_class_set_parent_realize() for some more
> >> examples.
> >
> > Ah. That is more to my liking. There are a couple of models following
> > the wrong object pattern, xics, vio. I will check them.
> 
> So XICS is causing some head-aches because the ics-kvm model inherits
> from ics-simple which inherits from ics-base. so we have a grand-parent
> class to handle.

Ok.  I mean, we should probably switch ics around to use the
parent_realize model, rather than the backwards one it does now.  But
it's not immediately obvious to me why having a grandparent class
breaks things.

> if we could affiliate ics-kvm directly to ics-base it would make the 
> family affair easier. we need to check migration though.

But that said, I've been thinking for a while that it might make sense
to fold ics-kvm into ics-base.  It seems very risky to have two
different object classes that are supposed to have guest-identical
behaviour.  Certainly adding any migratable state to one not the other
would be horribly wrong.
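
For reference, the parent_realize model I'm talking about looks
roughly like the following.  This is a toy, self-contained sketch and
not QOM: Device, DeviceClass, ChipClass and set_parent_realize() are
simplified stand-ins (the last one loosely modelled on what
device_class_set_parent_realize() does), and the chip_* functions are
hypothetical.  The subclass installs its own realize, remembers the
one it replaced, and calls up the chain first.

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct Device Device;
typedef bool (*RealizeFn)(Device *dev);

/* Toy base class: just the realize hook. */
typedef struct DeviceClass {
    RealizeFn realize;
} DeviceClass;

/* Toy subclass data: remembers the realize it overrode. */
typedef struct ChipClass {
    DeviceClass parent_class;   /* must stay first so the cast below is valid */
    RealizeFn parent_realize;
} ChipClass;

struct Device {
    DeviceClass *klass;
    int step;        /* running counter, used to record call order */
    int base_step;   /* when the base realize ran */
    int sub_step;    /* when the subclass realize ran */
};

/* Save the current realize into *parent_slot and install the child's. */
static void set_parent_realize(DeviceClass *dc, RealizeFn child,
                               RealizeFn *parent_slot)
{
    *parent_slot = dc->realize;
    dc->realize = child;
}

static bool chip_base_realize(Device *dev)
{
    dev->base_step = ++dev->step;
    return true;
}

static bool chip_p8_realize(Device *dev)
{
    ChipClass *cc = (ChipClass *)dev->klass;

    /* Call up the chain first; bail out if the parent failed. */
    if (!cc->parent_realize(dev)) {
        return false;
    }
    dev->sub_step = ++dev->step;
    return true;
}

/* What the base class's class_init would set up. */
static void chip_base_class_init(ChipClass *cc)
{
    cc->parent_class.realize = chip_base_realize;
    cc->parent_realize = NULL;
}

/* What the subclass's class_init adds on top. */
static void chip_p8_class_init(ChipClass *cc)
{
    set_parent_realize(&cc->parent_class, chip_p8_realize,
                       &cc->parent_realize);
}
```

With more levels, each class saves the realize it found when its own
class_init ran, so in this model a grandparent is just one more link
in the same chain.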

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [Qemu-devel] [Qemu-block] [RFC 1/1] ide: bug #1777315: io_buffer_size and sg.size can represent partial sector sizes

2018-06-19 Thread Amol Surati
On Tue, Jun 19, 2018 at 05:43:52PM -0400, John Snow wrote:
> 
> 
> On 06/19/2018 05:26 PM, Amol Surati wrote:
> > On Tue, Jun 19, 2018 at 08:04:03PM +0530, Amol Surati wrote:
> >> On Tue, Jun 19, 2018 at 09:45:15AM -0400, John Snow wrote:
> >>>
> >>>
> >>> On 06/19/2018 04:53 AM, Kevin Wolf wrote:
>  On 19.06.2018 at 06:01, Amol Surati wrote:
> > On Mon, Jun 18, 2018 at 08:14:10PM -0400, John Snow wrote:
> >>
> >>
> >> On 06/18/2018 02:02 PM, Amol Surati wrote:
> >>> On Mon, Jun 18, 2018 at 12:05:15AM +0530, Amol Surati wrote:
>  This patch fixes the assumption that io_buffer_size is always a perfect
>  multiple of the sector size. The assumption is the cause of the firing
>  of 'assert(n * 512 == s->sg.size);'.
> 
>  Signed-off-by: Amol Surati 
>  ---
> >>>
> >>> The repository https://github.com/asurati/1777315 contains a module for
> >>> QEMU's 8086:7010 ATA controller, which exercises the code path
> >>> described in [RFC 0/1] of this series.
> >>>
> >>
> >> Thanks, this made it easier to see what was happening. I was able to
> >> write an ide-test test case using this source as a guide, and reproduce
> >> the error.
> >>
> >> static void test_bmdma_partial_sector_short_prdt(void)
> >> {
> >>     QPCIDevice *dev;
> >>     QPCIBar bmdma_bar, ide_bar;
> >>     uint8_t status;
> >>
> >>     /* Read 2 sectors but only give 1 sector in PRDT */
> >>     PrdtEntry prdt[] = {
> >>         {
> >>             .addr = 0,
> >>             .size = cpu_to_le32(0x200),
> >>         },
> >>         {
> >>             .addr = 512,
> >>             .size = cpu_to_le32(0x44 | PRDT_EOT),
> >>         }
> >>     };
> >>
> >>     dev = get_pci_device(&bmdma_bar, &ide_bar);
> >>     status = send_dma_request(CMD_READ_DMA, 0, 2,
> >>                               prdt, ARRAY_SIZE(prdt), NULL);
> >>     g_assert_cmphex(status, ==, 0);
> >>     assert_bit_clear(qpci_io_readb(dev, ide_bar, reg_status), DF | ERR);
> >>     free_pci_device(dev);
> >> }
> >>
> >>> Loading the module reproduces the bug. Tested on the latest master
> >>> branch.
> >>>
> >>> Steps:
> >>> - Install a Linux distribution as a guest, ensuring that the boot disk
> >>>   resides on non-IDE controllers (such as virtio)
> >>> - Attach another disk as a master device on the primary
> >>>   IDE controller (i.e. attach at -hda.)
> >>> - Blacklist ata_piix, pata_acpi and ata_generic modules, and reboot.
> >>> - Copy the source files into the guest and build the module.
> >>> - Load the module. QEMU process should die with the message:
> >>>   qemu-system-x86_64: hw/ide/core.c:871: ide_dma_cb:
> >>>   Assertion `n * 512 == s->sg.size' failed.
> >>>
> >>>
> >>> -Amol
> >>>
> >>
> >> I'm less sure of the fix -- certainly the assert is wrong, but just
> >> incrementing 'n' is wrong too -- we didn't copy (n+1) sectors, we copied
> >> (n) and a few extra bytes.
> >
> > That is true.
> >
> > There are (at least) two fields that represent the total size of a DMA
> > transfer -
> > (1) The size, as requested through the NSECTOR field.
> > (2) The size, as calculated through the length fields of the PRD entries.
> >
> > It makes sense to consider the most restrictive of the sizes, as the factor
> > which determines both the end of a successful DMA transfer and the
> > condition to assert.
> >
> >>
> >> The sector-based math here would need to be adjusted to be able to cope
> >> with partial sector reads... or we ought to avoid doing any partial
> >> sector transfers.
> >>
> >>
> >> I'm not sure which is more correct tonight, it depends:
> >>
> >> - If it's OK to transfer partial sectors before reporting overflow,
> >> adjusting the command loop to work with partial sectors is OK.
> >>
> >> - If it's NOT OK to do partial sector transfer, the sglist preparation
> >> phase needs to produce a truncated SGList that's some multiple of 512
> >> bytes that leaves the excess bytes in a second sglist that we don't
> >> throw away and can use as a basis for building the next sglist. (Or the
> >> DMA helpers need to take a max_bytes parameter and return an sglist
> >> representing unused buffer space if the command underflowed.)
> >
> > Support for partial sector transfers is built into the DMA interface's PRD
> > mechanism itself, because an entry is allowed to transfer in the units of
> > even number of bytes.
> >
> > I think the controller's IO process runs in two parts (probably loops over
> > for a single transfer):
> >
> > (1) The 
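
The size mismatch discussed in this thread can be worked through with a
few lines of arithmetic.  The sketch below is only an illustration:
PrdSketch, prdt_total_bytes() and effective_transfer_bytes() are
invented names, and the length decoding (low 16 bits with bit 0
ignored, 0 meaning 64 KiB, top bit marking the last entry) is my
understanding of the PRD format rather than something stated in the
thread.  It takes the two PRD entries from the test case quoted above
and compares the result with the NSECTOR request.

```c
#include <stddef.h>
#include <stdint.h>

#define SECTOR_SIZE 512u
#define PRDT_EOT    0x80000000u

/* Hypothetical, host-endian view of one PRD entry. */
typedef struct PrdSketch {
    uint32_t addr;
    uint32_t size;
} PrdSketch;

/* Sum the byte counts of the entries up to and including the EOT one. */
static uint64_t prdt_total_bytes(const PrdSketch *prdt, size_t n)
{
    uint64_t total = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        uint32_t len = prdt[i].size & 0xfffe;   /* even byte counts only */

        if (len == 0) {
            len = 0x10000;                      /* 0 encodes 64 KiB */
        }
        total += len;
        if (prdt[i].size & PRDT_EOT) {
            break;
        }
    }
    return total;
}

/* The more restrictive of the NSECTOR request and the PRD total. */
static uint64_t effective_transfer_bytes(unsigned nsectors, uint64_t prd_bytes)
{
    uint64_t requested = (uint64_t)nsectors * SECTOR_SIZE;

    return requested < prd_bytes ? requested : prd_bytes;
}
```

For the entries 0x200 and 0x44 | PRDT_EOT the PRD total is 0x244 (580)
bytes: one whole sector plus 0x44 bytes.  With n = 580 / 512 = 1,
n * 512 is 512 and cannot equal 580, which is the condition the
assertion trips on; the NSECTOR request of 2 sectors (1024 bytes) is
the larger of the two sizes.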

Re: [Qemu-devel] [PATCH v2 3/3] spapr: introduce a fixed IRQ number space

2018-06-19 Thread David Gibson
On Tue, Jun 19, 2018 at 12:05:21PM +0200, Cédric Le Goater wrote:
> On 06/19/2018 03:02 AM, David Gibson wrote:
> > On Mon, Jun 18, 2018 at 07:34:02PM +0200, Cédric Le Goater wrote:
> >> This proposal introduces a new IRQ number space layout using static
> >> numbers for all devices and a bitmap allocator for the MSI numbers
> >> which are negotiated by the guest at runtime.
> >>
> >> The previous layout is kept in machines raising the 'xics_legacy'
> >> flag.
> >>
> >> Signed-off-by: Cédric Le Goater 
> >> ---
> >>  include/hw/ppc/spapr.h |  4 
> >>  include/hw/ppc/spapr_irq.h | 30 +
> >>  hw/ppc/spapr.c | 31 +
> >>  hw/ppc/spapr_events.c  | 12 --
> >>  hw/ppc/spapr_irq.c | 56 ++
> >>  hw/ppc/spapr_pci.c | 28 ++-
> >>  hw/ppc/spapr_vio.c | 19 
> >>  hw/ppc/Makefile.objs   |  2 +-
> >>  8 files changed, 169 insertions(+), 13 deletions(-)
> >>  create mode 100644 include/hw/ppc/spapr_irq.h
> >>  create mode 100644 hw/ppc/spapr_irq.c
> >>
> >> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> >> index 9decc66a1915..4c63b1fac13b 100644
> >> --- a/include/hw/ppc/spapr.h
> >> +++ b/include/hw/ppc/spapr.h
> >> @@ -7,6 +7,7 @@
> >>  #include "hw/ppc/spapr_drc.h"
> >>  #include "hw/mem/pc-dimm.h"
> >>  #include "hw/ppc/spapr_ovec.h"
> >> +#include "hw/ppc/spapr_irq.h"
> >>  
> >>  struct VIOsPAPRBus;
> >>  struct sPAPRPHBState;
> >> @@ -164,6 +165,9 @@ struct sPAPRMachineState {
> >>  char *kvm_type;
> >>  
> >>  const char *icp_type;
> >> +bool xics_legacy;
> > 
> > This flag can go in the class, rather than the instance.
> > 
> > And maybe call it 'legacy_irq_allocation'.  It assumes XICS, but
> > otherwise isn't strongly tied to it.
> 
> Here's another idea.
> 
> Instead of a bool, we could use a find() operation if it is defined 
> by the spapr_irq backend.

So, I don't think find() should go into the irq backend you've been
describing elsewhere.  I think that should be restricted to the
claim() side stuff.  But you could make it a sPAPRMachineClass method,
and use the static allocations if it's NULL.
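
To illustrate the suggestion above, here is a rough sketch in C of an
optional class method with a static fallback. MachineClassSketch, irq_find,
legacy_irq_find and vio_irq_for_slot are made-up names for illustration
only; just the 0x1100 base is taken from SPAPR_IRQ_VIO in the patch.

```c
#include <stddef.h>

/* Hypothetical stand-in; not the real sPAPRMachineClass layout. */
typedef struct MachineClassSketch {
    /* Legacy machine types set this; new ones leave it NULL. */
    int (*irq_find)(int device_slot);
} MachineClassSketch;

#define SKETCH_IRQ_VIO_BASE 0x1100  /* SPAPR_IRQ_VIO in the patch */

/* Assumed legacy behaviour: hand out the next free number dynamically. */
static int legacy_next_irq = 0x1000;

static int legacy_irq_find(int device_slot)
{
    (void)device_slot;  /* the legacy scheme ignores the slot */
    return legacy_next_irq++;
}

/* Use find() when the class provides it, else the static layout. */
static int vio_irq_for_slot(const MachineClassSketch *mc, int device_slot)
{
    if (mc->irq_find) {
        return mc->irq_find(device_slot);
    }
    return SKETCH_IRQ_VIO_BASE + device_slot;
}
```

A new machine type would then need no flag at all: leaving the method NULL
selects the static numbers.
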

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [Qemu-devel] [PATCH v4 01/11] ppc4xx_i2c: Remove unimplemented sdata and intr registers

2018-06-19 Thread David Gibson
On Tue, Jun 19, 2018 at 10:52:15AM +0200, BALATON Zoltan wrote:
> We don't emulate slave mode so related registers are not needed.
> [lh]sadr are only retained to avoid too many warnings and simplify
> debugging but sdata is not even correct because device has a 4 byte
> FIFO instead so just remove this unimplemented register for now.
> 
> The intr register is also not implemented correctly, it is for
> diagnostics and normally not even visible on device without explicitly
> enabling it. As no guests are known to need this remove it as well.
> 
> Signed-off-by: BALATON Zoltan 
> ---
> v4: Updated commit message

Applied to ppc-for-3.0, thanks.

> 
>  hw/i2c/ppc4xx_i2c.c | 16 +---
>  include/hw/i2c/ppc4xx_i2c.h |  4 +---
>  2 files changed, 2 insertions(+), 18 deletions(-)
> 
> diff --git a/hw/i2c/ppc4xx_i2c.c b/hw/i2c/ppc4xx_i2c.c
> index d1936db..4e0aaae 100644
> --- a/hw/i2c/ppc4xx_i2c.c
> +++ b/hw/i2c/ppc4xx_i2c.c
> @@ -3,7 +3,7 @@
>   *
>   * Copyright (c) 2007 Jocelyn Mayer
>   * Copyright (c) 2012 François Revol
> - * Copyright (c) 2016 BALATON Zoltan
> + * Copyright (c) 2016-2018 BALATON Zoltan
>   *
>   * Permission is hereby granted, free of charge, to any person obtaining a copy
>   * of this software and associated documentation files (the "Software"), to deal
> @@ -63,7 +63,6 @@ static void ppc4xx_i2c_reset(DeviceState *s)
>  i2c->mdcntl = 0;
>  i2c->sts = 0;
>  i2c->extsts = 0x8f;
> -i2c->sdata = 0;
>  i2c->lsadr = 0;
>  i2c->hsadr = 0;
>  i2c->clkdiv = 0;
> @@ -71,7 +70,6 @@ static void ppc4xx_i2c_reset(DeviceState *s)
>  i2c->xfrcnt = 0;
>  i2c->xtcntlss = 0;
>  i2c->directcntl = 0xf;
> -i2c->intr = 0;
>  }
>  
>  static inline bool ppc4xx_i2c_is_master(PPC4xxI2CState *i2c)
> @@ -139,9 +137,6 @@ static uint64_t ppc4xx_i2c_readb(void *opaque, hwaddr addr, unsigned int size)
>TYPE_PPC4xx_I2C, __func__);
>  }
>  break;
> -case 2:
> -ret = i2c->sdata;
> -break;
>  case 4:
>  ret = i2c->lmadr;
>  break;
> @@ -181,9 +176,6 @@ static uint64_t ppc4xx_i2c_readb(void *opaque, hwaddr addr, unsigned int size)
>  case 16:
>  ret = i2c->directcntl;
>  break;
> -case 17:
> -ret = i2c->intr;
> -break;
>  default:
>  if (addr < PPC4xx_I2C_MEM_SIZE) {
>  qemu_log_mask(LOG_UNIMP, "%s: Unimplemented register 0x%"
> @@ -229,9 +221,6 @@ static void ppc4xx_i2c_writeb(void *opaque, hwaddr addr, uint64_t value,
>  }
>  }
>  break;
> -case 2:
> -i2c->sdata = value;
> -break;
>  case 4:
>  i2c->lmadr = value;
>  if (i2c_bus_busy(i2c->bus)) {
> @@ -302,9 +291,6 @@ static void ppc4xx_i2c_writeb(void *opaque, hwaddr addr, uint64_t value,
>  case 16:
>  i2c->directcntl = value & 0x7;
>  break;
> -case 17:
> -i2c->intr = value;
> -break;
>  default:
>  if (addr < PPC4xx_I2C_MEM_SIZE) {
>  qemu_log_mask(LOG_UNIMP, "%s: Unimplemented register 0x%"
> diff --git a/include/hw/i2c/ppc4xx_i2c.h b/include/hw/i2c/ppc4xx_i2c.h
> index 3c60307..e4b6ded 100644
> --- a/include/hw/i2c/ppc4xx_i2c.h
> +++ b/include/hw/i2c/ppc4xx_i2c.h
> @@ -3,7 +3,7 @@
>   *
>   * Copyright (c) 2007 Jocelyn Mayer
>   * Copyright (c) 2012 François Revol
> - * Copyright (c) 2016 BALATON Zoltan
> + * Copyright (c) 2016-2018 BALATON Zoltan
>   *
>   * Permission is hereby granted, free of charge, to any person obtaining a copy
>   * of this software and associated documentation files (the "Software"), to deal
> @@ -49,7 +49,6 @@ typedef struct PPC4xxI2CState {
>  uint8_t mdcntl;
>  uint8_t sts;
>  uint8_t extsts;
> -uint8_t sdata;
>  uint8_t lsadr;
>  uint8_t hsadr;
>  uint8_t clkdiv;
> @@ -57,7 +56,6 @@ typedef struct PPC4xxI2CState {
>  uint8_t xfrcnt;
>  uint8_t xtcntlss;
>  uint8_t directcntl;
> -uint8_t intr;
>  } PPC4xxI2CState;
>  
>  #endif /* PPC4XX_I2C_H */

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [Qemu-devel] [PATCH v2 3/3] spapr: introduce a fixed IRQ number space

2018-06-19 Thread David Gibson
On Tue, Jun 19, 2018 at 07:00:18AM +0200, Cédric Le Goater wrote:
> On 06/19/2018 03:02 AM, David Gibson wrote:
> > On Mon, Jun 18, 2018 at 07:34:02PM +0200, Cédric Le Goater wrote:
> >> This proposal introduces a new IRQ number space layout using static
> >> numbers for all devices and a bitmap allocator for the MSI numbers
> >> which are negotiated by the guest at runtime.
> >>
> >> The previous layout is kept in machines raising the 'xics_legacy'
> >> flag.
> >>
> >> Signed-off-by: Cédric Le Goater 
> >> ---
> >>  include/hw/ppc/spapr.h |  4 
> >>  include/hw/ppc/spapr_irq.h | 30 +
> >>  hw/ppc/spapr.c | 31 +
> >>  hw/ppc/spapr_events.c  | 12 --
> >>  hw/ppc/spapr_irq.c | 56 ++
> >>  hw/ppc/spapr_pci.c | 28 ++-
> >>  hw/ppc/spapr_vio.c | 19 
> >>  hw/ppc/Makefile.objs   |  2 +-
> >>  8 files changed, 169 insertions(+), 13 deletions(-)
> >>  create mode 100644 include/hw/ppc/spapr_irq.h
> >>  create mode 100644 hw/ppc/spapr_irq.c
> >>
> >> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> >> index 9decc66a1915..4c63b1fac13b 100644
> >> --- a/include/hw/ppc/spapr.h
> >> +++ b/include/hw/ppc/spapr.h
> >> @@ -7,6 +7,7 @@
> >>  #include "hw/ppc/spapr_drc.h"
> >>  #include "hw/mem/pc-dimm.h"
> >>  #include "hw/ppc/spapr_ovec.h"
> >> +#include "hw/ppc/spapr_irq.h"
> >>  
> >>  struct VIOsPAPRBus;
> >>  struct sPAPRPHBState;
> >> @@ -164,6 +165,9 @@ struct sPAPRMachineState {
> >>  char *kvm_type;
> >>  
> >>  const char *icp_type;
> >> +bool xics_legacy;
> > 
> > This flag can go in the class, rather than the instance.
> > 
> > And maybe call it 'legacy_irq_allocation'.  It assumes XICS, but
> > otherwise isn't strongly tied to it.
> 
> OK.
> 
> >> +int32_t irq_map_nr;
> >> +unsigned long *irq_map;
> > 
> > So, I don't love the fact that the new bitmap duplicates information
> > that's also in the intc backend (e.g. via ICS_IRQ_FREE()).  
> 
> Yes. I agree. New devices using MSI-like interrupts will follow the
> same pattern for allocation.
> 
> We have two layers of IRQ routines, one for the IRQ numbers and one
> for the controller backend. Maybe we could call the backend handling
> routine from the MSI one?
> 
> > However
> > leaving the authoritative info in the backend also causes problems
> > when we have dynamic switching.  Not entirely sure what to do about
> > that.
> 
> Yes, if we put it in the IRQ backend (the current IRQ controller model
> in use) we will have to synchronize the number spaces when the machine
> switches interrupt mode.
>  
> >>  bool cmd_line_caps[SPAPR_CAP_NUM];
> >>  sPAPRCapabilities def, eff, mig;
> >> diff --git a/include/hw/ppc/spapr_irq.h b/include/hw/ppc/spapr_irq.h
> >> new file mode 100644
> >> index ..345a42efd366
> >> --- /dev/null
> >> +++ b/include/hw/ppc/spapr_irq.h
> >> @@ -0,0 +1,30 @@
> >> +/*
> >> + * QEMU PowerPC sPAPR IRQ backend definitions
> >> + *
> >> + * Copyright (c) 2018, IBM Corporation.
> >> + *
> >> + * This code is licensed under the GPL version 2 or later. See the
> >> + * COPYING file in the top-level directory.
> >> + */
> >> +
> >> +#ifndef HW_SPAPR_IRQ_H
> >> +#define HW_SPAPR_IRQ_H
> >> +
> >> +/*
> >> + * IRQ range offsets per device type
> >> + */
> >> +#define SPAPR_IRQ_EPOW   0x1000  /* XICS_IRQ_BASE offset */
> >> +#define SPAPR_IRQ_HOTPLUG0x1001
> >> +#define SPAPR_IRQ_VIO0x1100  /* 256 VIO devices */
> >> +#define SPAPR_IRQ_PCI_LSI0x1200  /* 32+ PHBs devices */
> >> +
> >> +#define SPAPR_IRQ_MSI0x1300  /* Offset of the dynamic range covered
> >> +  * by the bitmap allocator */
> > 
> > I'm a little confused by the MSI stuff.  It looks like you're going
> > for the option of one big pool for all dynamic irqs.  Except that I
> > thought in our discussion the other day you said each PHB advertised
> > its own separate MSI range, so we'd actually need to split this up
> > into ranges for each PHB.
> 
> Yes, we could also do that, but we don't really need to, and it might in
> fact be too constraining.

Ok.

> As the IRQs are allocated dynamically, there is not a strong relation 
> between the device doing so and the IRQ numbers. The need for a well
> defined IRQ number range is weak. We should provision a certain number 
> of IRQs of course to size our IRQ number space but even that could be 
> done dynamically. We can resize the bitmap and allocate new source 
> blocks under the KVM XICS/XIVE device if needed. The resulting code 
> is quite simple and the IRQ number space is also less fragmented. 
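
For reference, the kind of bitmap allocator being discussed could look
roughly like the C sketch below. It is not the code from the patch: the pool
size, the helper names and the first-fit policy are assumptions, and only
the 0x1300 base comes from SPAPR_IRQ_MSI.

```c
#include <limits.h>
#include <stddef.h>

#define MSI_BASE      0x1300   /* SPAPR_IRQ_MSI in the patch */
#define MSI_COUNT     1024u    /* assumed size of the dynamic pool */
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* One bit per IRQ number in the dynamic range; a set bit means "in use". */
static unsigned long msi_map[(MSI_COUNT + BITS_PER_WORD - 1) / BITS_PER_WORD];

static int msi_test_bit(unsigned i)
{
    return (msi_map[i / BITS_PER_WORD] >> (i % BITS_PER_WORD)) & 1UL;
}

/* First-fit search for num contiguous free numbers; -1 if none is left. */
static int msi_alloc(unsigned num)
{
    if (num == 0 || num > MSI_COUNT) {
        return -1;
    }
    for (unsigned first = 0; first + num <= MSI_COUNT; first++) {
        unsigned n = 0;

        while (n < num && !msi_test_bit(first + n)) {
            n++;
        }
        if (n == num) {
            for (unsigned i = first; i < first + num; i++) {
                msi_map[i / BITS_PER_WORD] |= 1UL << (i % BITS_PER_WORD);
            }
            return (int)(MSI_BASE + first);
        }
        first += n;  /* bit first+n is busy; the loop's first++ steps past it */
    }
    return -1;
}

/* Release a range previously returned by msi_alloc(). */
static void msi_free(int irq, unsigned num)
{
    if (irq < MSI_BASE) {
        return;
    }
    unsigned first = (unsigned)(irq - MSI_BASE);

    if (first >= MSI_COUNT || num > MSI_COUNT - first) {
        return;
    }
    for (unsigned i = first; i < first + num; i++) {
        msi_map[i / BITS_PER_WORD] &= ~(1UL << (i % BITS_PER_WORD));
    }
}
```

Because the pool is shared, a freed range can be reused by any device, which
is what keeps the number space less fragmented than fixed per-PHB ranges.
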
> 
> I think we have all the requirements in hand, the current ones and the 
> new ones for hotplug PHBs, XIVE interrupt model, CAPI (which should be
> like the PHBs), XIVE user IRQs (like MSIs). The new ones are all 
> dynamic IRQ 

Re: [Qemu-devel] [PATCH v3 1/2] kvm: support -dedicated cpu-pm=on|off

2018-06-19 Thread Wanpeng Li
On Wed, 20 Jun 2018 at 08:07, Michael S. Tsirkin  wrote:
>
> On Tue, Jun 19, 2018 at 05:07:46PM -0500, Eric Blake wrote:
> > On 06/19/2018 10:17 AM, Paolo Bonzini wrote:
> > > On 16/06/2018 00:29, Michael S. Tsirkin wrote:
> > > > +static QemuOptsList qemu_dedicated_opts = {
> > > > +.name = "dedicated",
> > > > +.head = QTAILQ_HEAD_INITIALIZER(qemu_dedicated_opts.head),
> > > > +.desc = {
> > > > +{
> > > > +.name = "mem-lock",
> > > > +.type = QEMU_OPT_BOOL,
> > > > +},
> > > > +{
> > > > +.name = "cpu-pm",
> > > > +.type = QEMU_OPT_BOOL,
> > > > +},
> > > > +{ /* end of list */ }
> > > > +},
> > > > +};
> > > > +
> > >
> > > Let the bikeshedding begin!
> > >
> > > 1) Should we deprecate -realtime?
> > >
> > > 2) Maybe -hostresource?
> >
> > What further things might we add in the future?
> >
> > -dedicated sounds wrong (it is an adjective, while most of our options are
> > nouns - think -machine, -drive, -object, ...)
> >
> > -hostresource at least sounds like a noun, but is long to type.  But at
> > least '-hostresource cpu-pm=on' reads reasonably well.
>
> Yes but host resource what? I feel it says nothing at all about what
> one can expect to find in this flag.
>
> > About the only other noun I could think of would be '-feature cpu-pm=on'.
>
> If we have nothing at all to say about what is grouping these things,
> we don't need a new flag. We can make it a machine property.
>
> It's the user's hint that some host resource is dedicated to a VM.

Commit 633711e82 (kvm: rename KVM_HINTS_DEDICATED to KVM_HINTS_REALTIME)
should be reverted, I think, going by the discussion in several threads.

Regards,
Wanpeng Li



Re: [Qemu-devel] [PATCH] hmp-commands: use long for begin and length in dump-guest-memory

2018-06-19 Thread Suraj Jitindar Singh
On Tue, 2018-06-19 at 11:25 +0100, Dr. David Alan Gilbert wrote:
> * Suraj Jitindar Singh (sjitindarsi...@gmail.com) wrote:
> > The dump-guest-memory command is used to dump an area of guest memory
> > to a file; the piece of memory is specified by a begin address and
> > a length. These parameters are specified as ints and thus have a
> > maximum value of 4GB. This means you can't dump the guest memory past
> > the first 4GB and instead get:
> > (qemu) dump-guest-memory tmp 0x1 0x1
> > 'dump-guest-memory' has failed: integer is for 32-bit values
> > Try "help dump-guest-memory" for more information
> > 
> > This limitation is imposed in monitor_parse_arguments() since they are
> > both ints. hmp_dump_guest_memory() uses 64 bit quantities to store both
> > the begin and length values. Thus specify begin and length as long so
> > that the entire guest memory space can be dumped.
> > 
> > Signed-off-by: Suraj Jitindar Singh 
> > ---
> >  hmp-commands.hx | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/hmp-commands.hx b/hmp-commands.hx
> > index 0734fea931..3b5c1f65db 100644
> > --- a/hmp-commands.hx
> > +++ b/hmp-commands.hx
> > @@ -1116,7 +1116,7 @@ ETEXI
> >  
> >  {
> >  .name   = "dump-guest-memory",
> > -.args_type  = "paging:-p,detach:-d,zlib:-z,lzo:-l,snappy:-s,filename:F,begin:i?,length:i?",
> > +.args_type  = "paging:-p,detach:-d,zlib:-z,lzo:-l,snappy:-s,filename:F,begin:l?,length:l?",
> >  .params = "[-p] [-d] [-z|-l|-s] filename [begin length]",
> >  .help   = "dump guest memory into file 'filename'.\n\t\t\t"
> >"-p: do paging to get guest's memory mapping.\n\t\t\t"
> 
> OK, so hmp_dump_guest_memory in hmp.c already uses int64_t for both,
> as does the qmp_dump_guest_memory it calls; so this looks OK.
> 
> Can you repost this please with the correct sign off that I see you
> tried to fix in the following mail; best if we get it in the one
> mail.

Of course. Done :)

> 
> Dave
> 
> > -- 
> > 2.13.6
> > 
> 
> --
> Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK



[Qemu-devel] [PATCH] [RESEND] hmp-commands: use long for begin and length in dump-guest-memory

2018-06-19 Thread Suraj Jitindar Singh
The dump-guest-memory command is used to dump an area of guest memory
to a file; the piece of memory is specified by a begin address and
a length. These parameters are specified as ints and thus have a maximum
value of 4GB. This means you can't dump the guest memory past the first
4GB and instead get:
(qemu) dump-guest-memory tmp 0x1 0x1
'dump-guest-memory' has failed: integer is for 32-bit values
Try "help dump-guest-memory" for more information

This limitation is imposed in monitor_parse_arguments() since they are
both ints. hmp_dump_guest_memory() uses 64 bit quantities to store both
the begin and length values. Thus specify begin and length as long so
that the entire guest memory space can be dumped.

Signed-off-by: Suraj Jitindar Singh 
---
 hmp-commands.hx | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hmp-commands.hx b/hmp-commands.hx
index 0734fea931..3b5c1f65db 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -1116,7 +1116,7 @@ ETEXI
 
 {
 .name   = "dump-guest-memory",
-.args_type  = "paging:-p,detach:-d,zlib:-z,lzo:-l,snappy:-s,filename:F,begin:i?,length:i?",
+.args_type  = "paging:-p,detach:-d,zlib:-z,lzo:-l,snappy:-s,filename:F,begin:l?,length:l?",
 .params = "[-p] [-d] [-z|-l|-s] filename [begin length]",
 .help   = "dump guest memory into file 'filename'.\n\t\t\t"
   "-p: do paging to get guest's memory mapping.\n\t\t\t"
-- 
2.13.6
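
The limitation described in the commit message can be modelled with a small
C sketch. This is not the monitor's actual parsing code; fits_hmp_int() and
hmp_arg_ok() are hypothetical helpers that only mirror the rule that an 'i'
argument must fit in 32 bits while an 'l' argument may use all 64.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the 'i' check: the upper 32 bits must be clear. */
static bool fits_hmp_int(int64_t val)
{
    return ((uint64_t)val >> 32) == 0;
}

/* Hypothetical model of argument validation by args_type letter. */
static bool hmp_arg_ok(char type, int64_t val)
{
    return type == 'l' || (type == 'i' && fits_hmp_int(val));
}
```

Under this model a begin address of 4GB or more is rejected for 'i' and
accepted for 'l', which matches the behaviour change the patch is after.
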




Re: [Qemu-devel] [PATCH v3 1/2] kvm: support -dedicated cpu-pm=on|off

2018-06-19 Thread Michael S. Tsirkin
On Tue, Jun 19, 2018 at 05:07:46PM -0500, Eric Blake wrote:
> On 06/19/2018 10:17 AM, Paolo Bonzini wrote:
> > On 16/06/2018 00:29, Michael S. Tsirkin wrote:
> > > +static QemuOptsList qemu_dedicated_opts = {
> > > +.name = "dedicated",
> > > +.head = QTAILQ_HEAD_INITIALIZER(qemu_dedicated_opts.head),
> > > +.desc = {
> > > +{
> > > +.name = "mem-lock",
> > > +.type = QEMU_OPT_BOOL,
> > > +},
> > > +{
> > > +.name = "cpu-pm",
> > > +.type = QEMU_OPT_BOOL,
> > > +},
> > > +{ /* end of list */ }
> > > +},
> > > +};
> > > +
> > 
> > Let the bikeshedding begin!
> > 
> > 1) Should we deprecate -realtime?
> > 
> > 2) Maybe -hostresource?
> 
> What further things might we add in the future?
> 
> -dedicated sounds wrong (it is an adjective, while most of our options are
> nouns - think -machine, -drive, -object, ...)
> 
> -hostresource at least sounds like a noun, but is long to type.  But at
> least '-hostresource cpu-pm=on' reads reasonably well.

Yes but host resource what? I feel it says nothing at all about what
one can expect to find in this flag.

> About the only other noun I could think of would be '-feature cpu-pm=on'.

If we have nothing at all to say about what is grouping these things,
we don't need a new flag. We can make it a machine property.

It's the user's hint that some host resource is dedicated to a VM.


> -- 
> Eric Blake, Principal Software Engineer
> Red Hat, Inc.   +1-919-301-3266
> Virtualization:  qemu.org | libvirt.org



Re: [Qemu-devel] [PATCH v1 4/6] qga: removing switch statements, adding run_process_child

2018-06-19 Thread Marc-André Lureau
Hi

On Tue, Jun 19, 2018 at 9:38 PM, Daniel Henrique Barboza
 wrote:
> This is a cleanup of the resulting code after detaching
> pmutils and Linux sys state file logic:
>
> - remove the SUSPEND_MODE_* macros and use an enumeration
> instead. At the same time, drop the switch statements
> at the start of each function and use the enumeration
> index to get the right binary/argument;
>
> - create a new function called run_process_child(). This
> function creates a child process and executes a (shell)
> command, returning the command return code. This is a common

What about using g_spawn_sync() instead?

> operation in the pmutils functions and will be used in the
> systemd implementation as well, so this function will avoid
> code repetition.
>
> There are more places inside commands-posix.c where this new
> run_process_child function can also be used, but one step
> at a time.
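
One thing to watch when replacing the switch statements with an array
lookup: the removed default: branches rejected unknown modes, while a bare
pmutils_args[mode] does not. A sketch in C of a bounds-checked lookup
follows; pmutils_arg_for() is a hypothetical helper, not part of the patch.

```c
#include <stddef.h>

typedef enum {
    SUSPEND_MODE_DISK = 0,
    SUSPEND_MODE_RAM = 1,
    SUSPEND_MODE_HYBRID = 2,
} SuspendMode;

/* Designated initializers keep each string tied to its enum value. */
static const char *const pmutils_args[] = {
    [SUSPEND_MODE_DISK]   = "--hibernate",
    [SUSPEND_MODE_RAM]    = "--suspend",
    [SUSPEND_MODE_HYBRID] = "--suspend-hybrid",
};

/* Returns the pm-is-supported argument, or NULL for an unknown mode. */
static const char *pmutils_arg_for(int mode)
{
    size_t count = sizeof(pmutils_args) / sizeof(pmutils_args[0]);

    if (mode < 0 || (size_t)mode >= count) {
        return NULL;
    }
    return pmutils_args[mode];
}
```

A caller can then report "unknown guest suspend mode" when NULL comes back,
preserving the old error path.
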
>
> Signed-off-by: Daniel Henrique Barboza 
> ---
>  qga/commands-posix.c | 190 +--
>  1 file changed, 76 insertions(+), 114 deletions(-)
>
> diff --git a/qga/commands-posix.c b/qga/commands-posix.c
> index a2870f9ab9..d5e3805ce9 100644
> --- a/qga/commands-posix.c
> +++ b/qga/commands-posix.c
> @@ -1438,152 +1438,122 @@ qmp_guest_fstrim(bool has_minimum, int64_t minimum, Error **errp)
>  #define LINUX_SYS_STATE_FILE "/sys/power/state"
>  #define SUSPEND_SUPPORTED 0
>  #define SUSPEND_NOT_SUPPORTED 1
> -#define SUSPEND_MODE_DISK 1
> -#define SUSPEND_MODE_RAM 2
> -#define SUSPEND_MODE_HYBRID 3
>
> -static bool pmutils_supports_mode(int suspend_mode, Error **errp)
> +typedef enum {
> +SUSPEND_MODE_DISK = 0,
> +SUSPEND_MODE_RAM = 1,
> +SUSPEND_MODE_HYBRID = 2,
> +} SuspendMode;
> +
> +static int run_process_child(const char *command[], Error **errp)
>  {
>  Error *local_err = NULL;
> -const char *pmutils_arg;
> -const char *pmutils_bin = "pm-is-supported";
> -char *pmutils_path;
> +char *cmd_path = g_find_program_in_path(command[0]);
>  pid_t pid;
> -int status;
> -bool ret = false;
> -
> -switch (suspend_mode) {
> -
> -case SUSPEND_MODE_DISK:
> -pmutils_arg = "--hibernate";
> -break;
> -case SUSPEND_MODE_RAM:
> -pmutils_arg = "--suspend";
> -break;
> -case SUSPEND_MODE_HYBRID:
> -pmutils_arg = "--suspend-hybrid";
> -break;
> -default:
> -return ret;
> -}
> +int status, ret = -1;
>
> -pmutils_path = g_find_program_in_path(pmutils_bin);
> -if (!pmutils_path) {
> +if (!cmd_path) {
>  return ret;
>  }
>
>  pid = fork();
>  if (!pid) {
>  setsid();
> -execle(pmutils_path, pmutils_bin, pmutils_arg, NULL, environ);
>  /*
> - * If we get here execle() has failed.
> + * execve receives a char* const argv[] as second arg but we're
> + * receiving a const char*[]. Since execve does not change the
> + * array contents it's tolerable to cast here.
>   */
> -_exit(SUSPEND_NOT_SUPPORTED);
> +execve(cmd_path, (char* const*)command, environ);
> +_exit(errno);
>  } else if (pid < 0) {
>  error_setg_errno(errp, errno, "failed to create child process");
> +ret = EXIT_FAILURE;
>  goto out;
>  }
>
>  ga_wait_child(pid, &status, &local_err);
>  if (local_err) {
>  error_propagate(errp, local_err);
> +ret = EXIT_FAILURE;
>  goto out;
>  }
>
> -switch (WEXITSTATUS(status)) {
> -case SUSPEND_SUPPORTED:
> -ret = true;
> -goto out;
> -case SUSPEND_NOT_SUPPORTED:
> -goto out;
> -default:
> -error_setg(errp,
> -   "the helper program '%s' returned an unexpected exit status"
> -   " code (%d)", pmutils_path, WEXITSTATUS(status));
> -goto out;
> -}
> +ret = WEXITSTATUS(status);
>
>  out:
> -g_free(pmutils_path);
> +g_free(cmd_path);
>  return ret;
>  }
>
> -static void pmutils_suspend(int suspend_mode, Error **errp)
> +static bool pmutils_supports_mode(SuspendMode mode, Error **errp)
>  {
>  Error *local_err = NULL;
> -const char *pmutils_bin;
> -char *pmutils_path;
> -pid_t pid;
> +const char *pmutils_args[3] = {"--hibernate", "--suspend",
> +   "--suspend-hybrid"};
> +const char *cmd[3] = {"pm-is-supported", pmutils_args[mode], NULL};
>  int status;
>
> -switch (suspend_mode) {
> -
> -case SUSPEND_MODE_DISK:
> -pmutils_bin = "pm-hibernate";
> -break;
> -case SUSPEND_MODE_RAM:
> -pmutils_bin = "pm-suspend";
> -break;
> -case SUSPEND_MODE_HYBRID:
> -pmutils_bin = "pm-suspend-hybrid";
> -break;
> -default:
> -error_setg(errp, "unknown guest suspend mode");
> -return;
> -}
> +status = run_process_child(cmd, &local_err);
>
> -pmutils_path = 

Re: [Qemu-devel] [PATCH v1 3/6] qga: guest_suspend: decoupling pm-utils and sys logic

2018-06-19 Thread Marc-André Lureau
Hi

On Tue, Jun 19, 2018 at 9:38 PM, Daniel Henrique Barboza
 wrote:
> Following the same logic of the previous patch, let's also
> decouple the suspend logic from guest_suspend into specialized
> functions, one for each strategy we support at this moment.
>
> Signed-off-by: Daniel Henrique Barboza 
> ---
>  qga/commands-posix.c | 170 +++
>  1 file changed, 108 insertions(+), 62 deletions(-)
>
> diff --git a/qga/commands-posix.c b/qga/commands-posix.c
> index 89ffd8dc88..a2870f9ab9 100644
> --- a/qga/commands-posix.c
> +++ b/qga/commands-posix.c
> @@ -1509,6 +1509,65 @@ out:
>  return ret;
>  }
>
> +static void pmutils_suspend(int suspend_mode, Error **errp)
> +{
> +Error *local_err = NULL;
> +const char *pmutils_bin;
> +char *pmutils_path;
> +pid_t pid;
> +int status;
> +
> +switch (suspend_mode) {
> +
> +case SUSPEND_MODE_DISK:
> +pmutils_bin = "pm-hibernate";
> +break;
> +case SUSPEND_MODE_RAM:
> +pmutils_bin = "pm-suspend";
> +break;
> +case SUSPEND_MODE_HYBRID:
> +pmutils_bin = "pm-suspend-hybrid";
> +break;
> +default:
> +error_setg(errp, "unknown guest suspend mode");
> +return;
> +}
> +
> +pmutils_path = g_find_program_in_path(pmutils_bin);
> +if (!pmutils_path) {
> +error_setg(errp, "the helper program '%s' was not found", pmutils_bin);
> +return;
> +}
> +
> +pid = fork();
> +if (!pid) {
> +setsid();
> +execle(pmutils_path, pmutils_bin, NULL, environ);
> +/*
> + * If we get here execle() has failed.
> + */
> +_exit(EXIT_FAILURE);
> +} else if (pid < 0) {
> +error_setg_errno(errp, errno, "failed to create child process");
> +goto out;
> +}
> +
> +ga_wait_child(pid, &status, &local_err);
> +if (local_err) {
> +error_propagate(errp, local_err);
> +goto out;
> +}
> +
> +if (WEXITSTATUS(status)) {
> +error_setg(errp,
> +   "the helper program '%s' returned an unexpected exit status"
> +   " code (%d)", pmutils_path, WEXITSTATUS(status));
> +}
> +
> +out:
> +g_free(pmutils_path);
> +}
> +
>  static bool linux_sys_state_supports_mode(int suspend_mode, Error **errp)
>  {
>  const char *sysfile_str;
> @@ -1545,64 +1604,28 @@ static bool linux_sys_state_supports_mode(int suspend_mode, Error **errp)
>  return false;
>  }
>
> -static void bios_supports_mode(int suspend_mode, Error **errp)
> -{
> -Error *local_err = NULL;
> -bool ret;
> -
> -ret = pmutils_supports_mode(suspend_mode, &local_err);
> -if (ret) {
> -return;
> -}
> -if (local_err) {
> -error_propagate(errp, local_err);
> -return;
> -}
> -ret = linux_sys_state_supports_mode(suspend_mode, errp);
> -if (!ret) {
> -error_setg(errp,
> -   "the requested suspend mode is not supported by the guest");
> -return;
> -}
> -}
> -
> -static void guest_suspend(int suspend_mode, Error **errp)
> +static void linux_sys_state_suspend(int suspend_mode, Error **errp)
>  {
>  Error *local_err = NULL;
> -const char *pmutils_bin, *sysfile_str;
> -char *pmutils_path;
> +const char *sysfile_str;
>  pid_t pid;
>  int status;
>
> -bios_supports_mode(suspend_mode, &local_err);
> -if (local_err) {
> -error_propagate(errp, local_err);
> -return;
> -}
> -
>  switch (suspend_mode) {
>
>  case SUSPEND_MODE_DISK:
> -pmutils_bin = "pm-hibernate";
>  sysfile_str = "disk";
>  break;
>  case SUSPEND_MODE_RAM:
> -pmutils_bin = "pm-suspend";
>  sysfile_str = "mem";
>  break;
> -case SUSPEND_MODE_HYBRID:
> -pmutils_bin = "pm-suspend-hybrid";
> -sysfile_str = NULL;
> -break;
>  default:
>  error_setg(errp, "unknown guest suspend mode");
>  return;
>  }
>
> -pmutils_path = g_find_program_in_path(pmutils_bin);
> -
>  pid = fork();
> -if (pid == 0) {
> +if (!pid) {
>  /* child */
>  int fd;
>
> @@ -1611,19 +1634,6 @@ static void guest_suspend(int suspend_mode, Error **errp)
>  reopen_fd_to_null(1);
>  reopen_fd_to_null(2);
>
> -if (pmutils_path) {
> -execle(pmutils_path, pmutils_bin, NULL, environ);
> -}
> -
> -/*
> - * If we get here either pm-utils is not installed or execle() has
> - * failed. Let's try the manual method if the caller wants it.
> - */
> -
> -if (!sysfile_str) {
> -_exit(EXIT_FAILURE);
> -}
> -
>  fd = open(LINUX_SYS_STATE_FILE, O_WRONLY);
>  if (fd < 0) {
>  _exit(EXIT_FAILURE);
> @@ -1636,27 +1646,63 @@ static void guest_suspend(int suspend_mode, Error **errp)
>  _exit(EXIT_SUCCESS);
>  } else if (pid < 0) {

Re: [Qemu-devel] [PATCH] [RFC v2] aio: properly bubble up errors from initialization

2018-06-19 Thread Nishanth Aravamudan via Qemu-devel
On 19.06.2018 [15:35:57 -0700], Nishanth Aravamudan wrote:
> On 19.06.2018 [13:14:51 -0700], Nishanth Aravamudan wrote:
> > On 19.06.2018 [14:35:33 -0500], Eric Blake wrote:
> > > On 06/15/2018 12:47 PM, Nishanth Aravamudan via Qemu-devel wrote:
> 
> 
> 
> > > >   } else if (s->use_linux_aio) {
> > > > +int rc;
> > > > +rc = aio_setup_linux_aio(bdrv_get_aio_context(bs));
> > > > +if (rc != 0) {
> > > > > +error_report("Unable to use native AIO, falling back to "
> > > > > + "thread pool.");
> > > 
> > > In general, error_report() should not output a trailing '.'.
> > 
> > Will fix.
> > 
> > > > +s->use_linux_aio = 0;
> > > > +return rc;
> > > 
> > > > Wait - the message claims we are falling back, but the non-zero return
> > > > code sounds like we are returning an error instead of falling back.  (My
> > > > preference - if the user requested something and we can't do it, it's
> > > > better to error than to fall back to something that does not match the
> > > > user's request).
> > 
> > > I think that makes sense, I hadn't tested this specific case (in my
> > > reading of the code, it wasn't clear to me if raw_co_prw() could be
> > > called before raw_aio_plug() had been called), but I think returning the
> > > error code up should be handled correctly. What about the cases where
> > > there is no error handling (the other two changes in the patch)?
> 
> While looking at doing these changes, I realized that I'm not quite sure
> what the right approach is here. My original rationale for returning
> non-zero was that AIO was requested but could not be completed. I
> haven't fully tracked back the calling paths, but I assumed it would get
> retried at the top level, and since we indicated to not use AIO on
> subsequent calls, it will succeed and use threads then (note, that I do
> now realize this means a mismatch between the qemu command-line and the
> in-use AIO model).
> 
> In practice, with my v2 patch, where I do return a non-zero error-code
> from this function, qemu does not exit (nor is any logging other than
> that I added emitted on the monitor). If I do not fall back, I imagine we
> would just continuously see this error message and IO might not actually
> ever occur? Reworking all of the callpath to fail on non-zero returns
> from raw_co_prw() seems like a fair bit of work, but if that is what is
> being requested, I can try that (it will just take a while).
> Alternatively, I can produce a v3 quickly that does not bubble the
> actual errno all the way up (since it does seem like it is ignored
> anyways?).

Sorry for the noise, but I had one more thought. Would it be appropriate
to push the _setup() call up to when we parse the arguments about
aio=native? E.g., we already check there if cache=directsync is
specified and error out if not. We could, in theory, also call
laio_init() there (via the new function) and error out to the CLI if
that fails. Then the runtime paths would simply be able to use the
context that was set up earlier? I would need to verify that
laio_cleanup() still happens correctly.
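
A rough sketch in C of the "set up early, fail early" idea: do the AIO setup
once while the options are being handled and refuse to continue if it fails,
so the I/O path never has to fall back. All names below (BlockOptsSketch,
open_with_aio, the setup callbacks) are placeholders and do not reflect the
real laio_init() or block layer signatures.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Placeholder for the per-file state; not the real BDRVRawState. */
typedef struct {
    bool want_native_aio;   /* user asked for aio=native */
    bool native_aio_ready;  /* context was set up successfully */
} BlockOptsSketch;

/* Placeholder setup hook: returns 0 or a negative errno value. */
typedef int (*aio_setup_fn)(void);

/* Called once at open time; the request path only reads the flags. */
static int open_with_aio(BlockOptsSketch *o, aio_setup_fn setup)
{
    if (!o->want_native_aio) {
        return 0;               /* thread pool was requested, nothing to do */
    }

    int rc = setup();
    if (rc < 0) {
        return rc;              /* fail the open; no silent fallback */
    }
    o->native_aio_ready = true;
    return 0;
}

/* Stand-ins for a setup routine that succeeds or fails. */
static int setup_ok(void)   { return 0; }
static int setup_fail(void) { return -ENOMEM; }
```

With this shape the error reaches the user at the point where aio=native is
requested, and the command line always matches the AIO model in use.
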

Thanks,
Nish



Re: [Qemu-devel] [PATCH] [RFC v2] aio: properly bubble up errors from initialization

2018-06-19 Thread Nishanth Aravamudan via Qemu-devel
On 19.06.2018 [13:14:51 -0700], Nishanth Aravamudan wrote:
> On 19.06.2018 [14:35:33 -0500], Eric Blake wrote:
> > On 06/15/2018 12:47 PM, Nishanth Aravamudan via Qemu-devel wrote:



> > >   } else if (s->use_linux_aio) {
> > > +int rc;
> > > +rc = aio_setup_linux_aio(bdrv_get_aio_context(bs));
> > > +if (rc != 0) {
> > > +error_report("Unable to use native AIO, falling back to "
> > > + "thread pool.");
> > 
> > In general, error_report() should not output a trailing '.'.
> 
> Will fix.
> 
> > > +s->use_linux_aio = 0;
> > > +return rc;
> > 
> > Wait - the message claims we are falling back, but the non-zero return code
> > sounds like we are returning an error instead of falling back.  (My
> > preference - if the user requested something and we can't do it, it's better
> > to error than to fall back to something that does not match the user's
> > request).
> 
> I think that makes sense, I hadn't tested this specific case (in my
> reading of the code, it wasn't clear to me if raw_co_prw() could be
> called before raw_aio_plug() had been called), but I think returning the
> error code up should be handled correctly. What about the cases where
> there is no error handling (the other two changes in the patch)?

While looking at doing these changes, I realized that I'm not quite sure
what the right approach is here. My original rationale for returning
non-zero was that AIO was requested but could not be completed. I
haven't fully tracked back the calling paths, but I assumed it would get
retried at the top level, and since we indicated to not use AIO on
subsequent calls, it will succeed and use threads then (note, that I do
now realize this means a mismatch between the qemu command-line and the
in-use AIO model).

In practice, with my v2 patch, where I do return a non-zero error-code
from this function, qemu does not exit (nor is any logging other than
that I added emitted on the monitor). If I do not fall back, I imagine we
would just continuously see this error message and IO might not actually
ever occur? Reworking all of the callpath to fail on non-zero returns
from raw_co_prw() seems like a fair bit of work, but if that is what is
being requested, I can try that (it will just take a while).
Alternatively, I can produce a v3 quickly that does not bubble the
actual errno all the way up (since it does seem like it is ignored
anyways?).
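
To make the two options concrete, here is a minimal standalone sketch
(not QEMU code). Everything in it is an assumption made for illustration:
struct file_state, enum aio_policy, setup_native_aio() and prepare_io()
are hypothetical stand-ins for the driver state and for
aio_setup_linux_aio(), and the simulated_rc parameter only exists so the
failure path can be exercised.

```c
#include <errno.h>   /* negative errno values are used as error codes */
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the per-file driver state. */
struct file_state {
    bool use_linux_aio;
};

/* Hypothetical policy: what to do when native AIO setup fails. */
enum aio_policy {
    AIO_POLICY_FAIL,     /* propagate the error to the caller */
    AIO_POLICY_FALLBACK  /* continue with the thread pool instead */
};

/* Hypothetical setup hook; returns 0 or a negative errno value. */
static int setup_native_aio(int simulated_rc)
{
    return simulated_rc;
}

/*
 * Returns 0 if I/O may proceed, or a negative errno value if the
 * user's request for native AIO could not be honoured.
 */
int prepare_io(struct file_state *s, enum aio_policy policy, int simulated_rc)
{
    int rc;

    if (!s->use_linux_aio) {
        return 0; /* thread pool was requested or already selected */
    }

    rc = setup_native_aio(simulated_rc);
    if (rc == 0) {
        return 0;
    }

    if (policy == AIO_POLICY_FAIL) {
        /* Option 1: the request cannot be met, so report and fail. */
        fprintf(stderr, "Unable to use native AIO: error %d\n", -rc);
        return rc;
    }

    /* Option 2: warn once, remember the fallback, and keep going. */
    fprintf(stderr, "Unable to use native AIO, falling back to thread pool\n");
    s->use_linux_aio = false;
    return 0;
}
```

With AIO_POLICY_FAIL the caller sees the error and the state still says
native AIO was requested; with AIO_POLICY_FALLBACK the call succeeds but
the in-use model no longer matches the command line, which is the
mismatch mentioned above.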



> > > +s->use_linux_aio = 0;
> > 
> > Should s->use_linux_aio be a bool instead of an int?
> 
> It is:
> 
> bool use_linux_aio:1;
> 
> would you prefer I did a preparatory patch that converted users to
> true/false?

Sorry, I misunderstood this -- only my patch does an assignment, so I'll
switch to 'false'.

Thanks,
Nish



[Qemu-devel] [Bug 1776920] Re: qemu-img convert on Mac OSX creates corrupt images

2018-06-19 Thread Waldemar Kozaczuk
Have I provided all necessary data and other details?

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1776920

Title:
  qemu-img convert on Mac OSX creates corrupt images

Status in QEMU:
  New

Bug description:
  An image created by qemu-img create, then modified by another program
  is converted to bad/corrupt image when using convert sub command on
  Mac OSX. The same convert works on Linux. The version of qemu-img is
  2.12.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1776920/+subscriptions



Re: [Qemu-devel] [PATCH 2/6] nbd: allow authorization with nbd-server-start QMP command

2018-06-19 Thread Daniel P . Berrangé
On Tue, Jun 19, 2018 at 03:10:12PM -0500, Eric Blake wrote:
> On 06/15/2018 10:50 AM, Daniel P. Berrangé wrote:
> > From: "Daniel P. Berrange" 
> > 
> > As with the previous patch to qemu-nbd, the nbd-server-start QMP command
> > also needs to be able to specify authorization when enabling TLS encryption.
> > 
> > First the client must create a QAuthZ object instance using the
> > 'object-add' command:
> > 
> > {
> >   'execute': 'object-add',
> >   'arguments': {
> >     'qom-type': 'authz-simple',
> >     'id': 'authz0',
> >     'parameters': {
> >       'policy': 'deny',
> >       'rules': [
> >         {
> >           'match': '*CN=fred',
> >           'policy': 'allow'
> >         }
> >       ]
> >     }
> >   }
> > }
> > 
> > They can then reference this in the new 'tls-authz' parameter when
> > executing the 'nbd-server-start' command:
> > 
> > {
> >   'execute': 'nbd-server-start',
> >   'arguments': {
> >     'addr': {
> >       'type': 'inet',
> >       'host': '127.0.0.1',
> >       'port': '9000'
> >     },
> >     'tls-creds': 'tls0',
> >     'tls-authz': 'authz0'
> >   }
> > }
> 
> Is it worth using a discriminated union (string vs. QAuthZ) so that one
> could specify the authz policy inline rather than as a separate object, for
> convenience?  But that would be fine as a followup patch, if we even want
> it.

QAuthZ isn't a QAPI type - it's a QOM object interface, so you'd have to
allow the entire object_add arg set inline, and then validate that the QOM
type you received after the fact actually implemented the interface.  Also for
migration at least it is likely the single authz impl will be shared for
both migration + nbd services. So I think it's cleaner just to keep it
separate to avoid having 2 distinct codepaths for handling the same thing.


Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|



Re: [Qemu-devel] [PATCH v3 1/2] kvm: support -dedicated cpu-pm=on|off

2018-06-19 Thread Eric Blake

On 06/19/2018 10:17 AM, Paolo Bonzini wrote:
> On 16/06/2018 00:29, Michael S. Tsirkin wrote:
> >  
> > +static QemuOptsList qemu_dedicated_opts = {
> > +.name = "dedicated",
> > +.head = QTAILQ_HEAD_INITIALIZER(qemu_dedicated_opts.head),
> > +.desc = {
> > +{
> > +.name = "mem-lock",
> > +.type = QEMU_OPT_BOOL,
> > +},
> > +{
> > +.name = "cpu-pm",
> > +.type = QEMU_OPT_BOOL,
> > +},
> > +{ /* end of list */ }
> > +},
> > +};
> > +
> 
> Let the bikeshedding begin!
> 
> 1) Should we deprecate -realtime?
> 
> 2) Maybe -hostresource?

What further things might we add in the future?

-dedicated sounds wrong (it is an adjective, while most of our options
are nouns - think -machine, -drive, -object, ...)


-hostresource at least sounds like a noun, but is long to type.  But at 
least '-hostresource cpu-pm=on' reads reasonably well.


About the only other noun I could think of would be '-feature cpu-pm=on'.

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



Re: [Qemu-devel] [PATCH v1 0/6] QGA: systemd hibernate/suspend/hybrid-sleep

2018-06-19 Thread no-reply
Hi,

This series seems to have some coding style problems. See output below for
more information:

Type: series
Message-id: 20180619193806.17419-1-danielhb...@gmail.com
Subject: [Qemu-devel] [PATCH v1 0/6] QGA: systemd hibernate/suspend/hybrid-sleep

=== TEST SCRIPT BEGIN ===
#!/bin/bash

BASE=base
n=1
total=$(git log --oneline $BASE.. | wc -l)
failed=0

git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram

commits="$(git log --format=%H --reverse $BASE..)"
for c in $commits; do
    echo "Checking PATCH $n/$total: $(git log -n 1 --format=%s $c)..."
    if ! git show $c --format=email | ./scripts/checkpatch.pl --mailback -; then
        failed=1
        echo
    fi
    n=$((n+1))
done

exit $failed
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
Switched to a new branch 'test'
cc696f1009 qga: removing bios_supports_mode
dc712bbec2 qga: adding systemd hibernate/suspend/hybrid-sleep support
3be456b7f1 qga: removing switch statements, adding run_process_child
31cf0e6b15 qga: guest_suspend: decoupling pm-utils and sys logic
314d3a05ca qga: bios_supports_mode: decoupling pm-utils and sys logic
da135a7c9f qga: refactoring qmp_guest_suspend_* functions

=== OUTPUT BEGIN ===
Checking PATCH 1/6: qga: refactoring qmp_guest_suspend_* functions...
Checking PATCH 2/6: qga: bios_supports_mode: decoupling pm-utils and sys 
logic...
Checking PATCH 3/6: qga: guest_suspend: decoupling pm-utils and sys logic...
Checking PATCH 4/6: qga: removing switch statements, adding run_process_child...
ERROR: "(foo* const*)" should be "(foo * const*)"
#91: FILE: qga/commands-posix.c:1467:
+execve(cmd_path, (char* const*)command, environ);

ERROR: space required before that '*' (ctx:VxB)
#91: FILE: qga/commands-posix.c:1467:
+execve(cmd_path, (char* const*)command, environ);
  ^

total: 2 errors, 0 warnings, 295 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

Checking PATCH 5/6: qga: adding systemd hibernate/suspend/hybrid-sleep 
support...
Checking PATCH 6/6: qga: removing bios_supports_mode...
=== OUTPUT END ===

Test command exited with code: 1


---
Email generated automatically by Patchew [http://patchew.org/].
Please send your feedback to patchew-de...@redhat.com

Re: [Qemu-devel] [PATCH 00/113] Patch Round-up for stable 2.11.2, freeze on 2018-06-22

2018-06-19 Thread Bruce Rogers
>>> On 6/18/2018 at 7:41 PM, Michael Roth  wrote:
> Hi everyone,
> 
> The following new patches are queued for QEMU stable v2.11.2:
> 
>   https://github.com/mdroth/qemu/commits/stable-2.11-staging 
> 
> The release is planned for 2018-06-22:
> 
>   https://wiki.qemu.org/Planning/2.11 
> 
> Please respond here or CC qemu-sta...@nongnu.org on any patches you
> think should be included in the release.
> 

For openSUSE Leap 15's qemu package, based on v2.11.1, we
also add these patches: 

commit bb223055b9b327ec66e1f6d2fbaebaee0b8f3dbe
Author: Christian Borntraeger 
Date:   Mon Dec 11 13:21:46 2017 +0100

s390-ccw-virtio: allow for systems larger that 7.999TB

commit 05b71fb207ab7f016e067bd2a40fc0804362eb74
Author: Marc-André Lureau 
Date:   Mon Jan 29 19:33:04 2018 +0100

tpm: lookup cancel path under tpm device class

Bruce




Re: [Qemu-devel] [Qemu-block] [RFC 1/1] ide: bug #1777315: io_buffer_size and sg.size can represent partial sector sizes

2018-06-19 Thread John Snow



On 06/19/2018 05:26 PM, Amol Surati wrote:
> On Tue, Jun 19, 2018 at 08:04:03PM +0530, Amol Surati wrote:
>> On Tue, Jun 19, 2018 at 09:45:15AM -0400, John Snow wrote:
>>>
>>>
>>> On 06/19/2018 04:53 AM, Kevin Wolf wrote:
>>>> Am 19.06.2018 um 06:01 hat Amol Surati geschrieben:
>>>>> On Mon, Jun 18, 2018 at 08:14:10PM -0400, John Snow wrote:
>>>>>>
>>>>>>
>>>>>> On 06/18/2018 02:02 PM, Amol Surati wrote:
>>>>>>> On Mon, Jun 18, 2018 at 12:05:15AM +0530, Amol Surati wrote:
>>>>>>>> This patch fixes the assumption that io_buffer_size is always a perfect
>>>>>>>> multiple of the sector size. The assumption is the cause of the firing
>>>>>>>> of 'assert(n * 512 == s->sg.size);'.
>>>>>>>>
>>>>>>>> Signed-off-by: Amol Surati
>>>>>>>> ---
>>>>>>>
>>>>>>> The repository https://github.com/asurati/1777315 contains a module for
>>>>>>> QEMU's 8086:7010 ATA controller, which exercises the code path
>>>>>>> described in [RFC 0/1] of this series.
>>>>>>>
>>>>>>
>>>>>> Thanks, this made it easier to see what was happening. I was able to
>>>>>> write an ide-test test case using this source as a guide, and reproduce
>>>>>> the error.
>>>>>>
>>>>>> static void test_bmdma_partial_sector_short_prdt(void)
>>>>>> {
>>>>>> QPCIDevice *dev;
>>>>>> QPCIBar bmdma_bar, ide_bar;
>>>>>> uint8_t status;
>>>>>>
>>>>>> /* Read 2 sectors but only give 1 sector in PRDT */
>>>>>> PrdtEntry prdt[] = {
>>>>>> {
>>>>>> .addr = 0,
>>>>>> .size = cpu_to_le32(0x200),
>>>>>> },
>>>>>> {
>>>>>> .addr = 512,
>>>>>> .size = cpu_to_le32(0x44 | PRDT_EOT),
>>>>>> }
>>>>>> };
>>>>>>
>>>>>> dev = get_pci_device(&bmdma_bar, &ide_bar);
>>>>>> status = send_dma_request(CMD_READ_DMA, 0, 2,
>>>>>>   prdt, ARRAY_SIZE(prdt), NULL);
>>>>>> g_assert_cmphex(status, ==, 0);
>>>>>> assert_bit_clear(qpci_io_readb(dev, ide_bar, reg_status), DF | ERR);
>>>>>> free_pci_device(dev);
>>>>>> }
>>>>>>
>>>>>>> Loading the module reproduces the bug. Tested on the latest master
>>>>>>> branch.
>>>>>>>
>>>>>>> Steps:
>>>>>>> - Install a Linux distribution as a guest, ensuring that the boot disk
>>>>>>>   resides on non-IDE controllers (such as virtio)
>>>>>>> - Attach another disk as a master device on the primary
>>>>>>>   IDE controller (i.e. attach at -hda.)
>>>>>>> - Blacklist ata_piix, pata_acpi and ata_generic modules, and reboot.
>>>>>>> - Copy the source files into the guest and build the module.
>>>>>>> - Load the module. QEMU process should die with the message:
>>>>>>>   qemu-system-x86_64: hw/ide/core.c:871: ide_dma_cb:
>>>>>>>   Assertion `n * 512 == s->sg.size' failed.
>>>>>>>
>>>>>>>
>>>>>>> -Amol
>>>>>>>
>>>>>>
>>>>>> I'm less sure of the fix -- certainly the assert is wrong, but just
>>>>>> incrementing 'n' is wrong too -- we didn't copy (n+1) sectors, we copied
>>>>>> (n) and a few extra bytes.
>>>>>
>>>>> That is true.
>>>>>
>>>>> There are (at least) two fields that represent the total size of a DMA
>>>>> transfer -
>>>>> (1) The size, as requested through the NSECTOR field.
>>>>> (2) The size, as calculated through the length fields of the PRD entries.
>>>>>
>>>>> It makes sense to consider the most restrictive of the sizes, as the factor
>>>>> which determines both the end of a successful DMA transfer and the
>>>>> condition to assert.
>>>>>
>>>>>>
>>>>>> The sector-based math here would need to be adjusted to be able to cope
>>>>>> with partial sector reads... or we ought to avoid doing any partial
>>>>>> sector transfers.
>>>>>>
>>>>>>
>>>>>> I'm not sure which is more correct tonight, it depends:
>>>>>>
>>>>>> - If it's OK to transfer partial sectors before reporting overflow,
>>>>>> adjusting the command loop to work with partial sectors is OK.
>>>>>>
>>>>>> - If it's NOT OK to do partial sector transfer, the sglist preparation
>>>>>> phase needs to produce a truncated SGList that's some multiple of 512
>>>>>> bytes that leaves the excess bytes in a second sglist that we don't
>>>>>> throw away and can use as a basis for building the next sglist. (Or the
>>>>>> DMA helpers need to take a max_bytes parameter and return an sglist
>>>>>> representing unused buffer space if the command underflowed.)
>>>>>
>>>>> Support for partial sector transfers is built into the DMA interface's PRD
>>>>> mechanism itself, because an entry is allowed to transfer in the units of
>>>>> even number of bytes.
>>>>>
>>>>> I think the controller's IO process runs in two parts (probably loops over
>>>>> for a single transfer):
>>>>>
>>>>> (1) The controller's disk interface transfers between its internal buffer
>>>>> and the disk storage. The transfers are likely to be in the
>>>>> multiples of a sector.
>>>>> (2) The controller's DMA interface transfers between its internal buffer
>>>>> and the system memory. The transfers can be sub-sector in size(, and
>>>>> are preserving of the areas, of the internal 

[Qemu-devel] [PATCH v16 2/3] i386: Enable TOPOEXT feature on AMD EPYC CPU

2018-06-19 Thread Babu Moger
Enable TOPOEXT feature on EPYC CPU. This is required to support
hyperthreading on VM guests. Also extend xlevel to 0x8000001E.

Disable topoext on PC_COMPAT_2_12 and keep xlevel 0x8000000a.

Signed-off-by: Babu Moger 
---
 include/hw/i386/pc.h |  8 ++++++++
 target/i386/cpu.c    | 10 ++++++----
 2 files changed, 14 insertions(+), 4 deletions(-)

diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
index fc8dedc..d0ebeb9 100644
--- a/include/hw/i386/pc.h
+++ b/include/hw/i386/pc.h
@@ -303,6 +303,14 @@ bool e820_get_entry(int, uint32_t, uint64_t *, uint64_t *);
 .driver   = TYPE_X86_CPU,\
 .property = "legacy-cache",\
 .value= "on",\
+},{\
+.driver   = TYPE_X86_CPU,\
+.property = "topoext",\
+.value= "off",\
+},{\
+.driver   = "EPYC-" TYPE_X86_CPU,\
+.property = "xlevel",\
+.value= stringify(0x8000000a),\
 },
 
 #define PC_COMPAT_2_11 \
diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 130391c..d6ed29b 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -2579,7 +2579,8 @@ static X86CPUDefinition builtin_x86_defs[] = {
 .features[FEAT_8000_0001_ECX] =
 CPUID_EXT3_OSVW | CPUID_EXT3_3DNOWPREFETCH |
 CPUID_EXT3_MISALIGNSSE | CPUID_EXT3_SSE4A | CPUID_EXT3_ABM |
-CPUID_EXT3_CR8LEG | CPUID_EXT3_SVM | CPUID_EXT3_LAHF_LM,
+CPUID_EXT3_CR8LEG | CPUID_EXT3_SVM | CPUID_EXT3_LAHF_LM |
+CPUID_EXT3_TOPOEXT,
 .features[FEAT_7_0_EBX] =
 CPUID_7_0_EBX_FSGSBASE | CPUID_7_0_EBX_BMI1 | CPUID_7_0_EBX_AVX2 |
 CPUID_7_0_EBX_SMEP | CPUID_7_0_EBX_BMI2 | CPUID_7_0_EBX_RDSEED |
@@ -2594,7 +2595,7 @@ static X86CPUDefinition builtin_x86_defs[] = {
 CPUID_XSAVE_XGETBV1,
 .features[FEAT_6_EAX] =
 CPUID_6_EAX_ARAT,
-.xlevel = 0x8000000A,
+.xlevel = 0x8000001E,
 .model_id = "AMD EPYC Processor",
 .cache_info = &epyc_cache_info,
 },
@@ -2624,7 +2625,8 @@ static X86CPUDefinition builtin_x86_defs[] = {
 .features[FEAT_8000_0001_ECX] =
 CPUID_EXT3_OSVW | CPUID_EXT3_3DNOWPREFETCH |
 CPUID_EXT3_MISALIGNSSE | CPUID_EXT3_SSE4A | CPUID_EXT3_ABM |
-CPUID_EXT3_CR8LEG | CPUID_EXT3_SVM | CPUID_EXT3_LAHF_LM,
+CPUID_EXT3_CR8LEG | CPUID_EXT3_SVM | CPUID_EXT3_LAHF_LM |
+CPUID_EXT3_TOPOEXT,
 .features[FEAT_8000_0008_EBX] =
 CPUID_8000_0008_EBX_IBPB,
 .features[FEAT_7_0_EBX] =
@@ -2641,7 +2643,7 @@ static X86CPUDefinition builtin_x86_defs[] = {
 CPUID_XSAVE_XGETBV1,
 .features[FEAT_6_EAX] =
 CPUID_6_EAX_ARAT,
-.xlevel = 0x8000000A,
+.xlevel = 0x8000001E,
 .model_id = "AMD EPYC Processor (with IBPB)",
 .cache_info = &epyc_cache_info,
 },
-- 
1.8.3.1




[Qemu-devel] [PATCH v16 1/3] i386: Fix up the Node id for CPUID_8000_001E

2018-06-19 Thread Babu Moger
This is part of topoext support. To keep compatibility, it is better to
support all the combinations of nr_cores and nr_threads that are currently
supported. By allowing more nr_cores and nr_threads, we might end up with
more nodes than the real hardware can actually support. We need to
fix up the node id to make this work. We can achieve this by shifting the
socket_id bits left to address more nodes.

Signed-off-by: Babu Moger 
---
 target/i386/cpu.c | 26 +++++++++++++++++++++++++-
 1 file changed, 25 insertions(+), 1 deletion(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 7a4484b..130391c 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -19,6 +19,7 @@
 
 #include "qemu/osdep.h"
 #include "qemu/cutils.h"
+#include "qemu/bitops.h"
 
 #include "cpu.h"
 #include "exec/exec-all.h"
@@ -472,6 +473,8 @@ static void encode_topo_cpuid8000001e(CPUState *cs, X86CPU *cpu,
uint32_t *ecx, uint32_t *edx)
 {
 struct core_topology topo = {0};
+unsigned long nodes;
+int shift;
 
 build_core_topology(cs->nr_cores, cpu->core_id, &topo);
 *eax = cpu->apic_id;
@@ -504,7 +507,28 @@ static void encode_topo_cpuid8000001e(CPUState *cs, X86CPU *cpu,
  * 2  Socket id
  *   1:0  Node id
  */
-*ecx = ((topo.num_nodes - 1) << 8) | (cpu->socket_id << 2) | topo.node_id;
+if (topo.num_nodes <= 4) {
+*ecx = ((topo.num_nodes - 1) << 8) | (cpu->socket_id << 2) |
+topo.node_id;
+} else {
+/*
+ * Node id fix up. Actual hardware supports up to 4 nodes. But with
+ * more than 32 cores, we may end up with more than 4 nodes.
+ * Node id is a combination of socket id and node id. Only requirement
+ * here is that this number should be unique accross the system.
+ * Shift the socket id to accommodate more nodes. We dont expect both
+ * socket id and node id to be big number at the same time. This is not
+ * an ideal config but we need to to support it. Max nodes we can have
+ * is 32 (255/8) with 8 cores per node and 255 max cores. We only need
+ * 5 bits for nodes. Find the left most set bit to represent the total
+ * number of nodes. find_last_bit returns last set bit(0 based). Left
+ * shift(+1) the socket id to represent all the nodes.
+ */
+nodes = topo.num_nodes - 1;
+shift = find_last_bit(&nodes, 8);
+*ecx = ((topo.num_nodes - 1) << 8) | (cpu->socket_id << (shift + 1)) |
+topo.node_id;
+}
 *edx = 0;
 }
 
-- 
1.8.3.1
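
For readers following the node id fix-up, the ECX encoding described in
the patch comment can be sketched as a standalone function. This is only
an illustration under stated assumptions: encode_node_ecx() and
highest_set_bit() are hypothetical names, and highest_set_bit() is a
portable stand-in for the find_last_bit() call the patch uses (the two
agree for the non-zero values below 256 that occur here).

```c
#include <stdint.h>

/* Index of the highest set bit (0-based); v must be non-zero. */
static int highest_set_bit(uint32_t v)
{
    int bit = 0;

    while ((v >>= 1) != 0) {
        bit++;
    }
    return bit;
}

/*
 * Sketch of the ECX value for CPUID leaf 0x8000001E:
 *   bits 10:8  number of nodes per processor, minus one
 *   low bits   socket id shifted left, OR-ed with the node id
 *
 * With up to 4 nodes the node id needs 2 bits, so the socket id
 * starts at bit 2. With more nodes the socket id is shifted further
 * left so that it does not collide with the node id bits.
 */
uint32_t encode_node_ecx(uint32_t num_nodes, uint32_t socket_id,
                         uint32_t node_id)
{
    uint32_t nodes = num_nodes - 1;
    int socket_shift;

    if (num_nodes <= 4) {
        socket_shift = 2;
    } else {
        /* Enough bits to hold the largest node id, i.e. nodes. */
        socket_shift = highest_set_bit(nodes) + 1;
    }

    return (nodes << 8) | (socket_id << socket_shift) | node_id;
}
```

For example, 8 nodes give nodes = 7, whose highest set bit is bit 2, so
the socket id moves to bit 3 and node ids 0..7 fit below it.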




[Qemu-devel] [PATCH v16 0/3] i386: Enable TOPOEXT to support hyperthreading on AMD CPU

2018-06-19 Thread Babu Moger
This series enables the TOPOEXT feature for AMD CPUs. This is required to
support hyperthreading on kvm guests.

This addresses the issues reported in these bugs:
https://bugzilla.redhat.com/show_bug.cgi?id=1481253
https://bugs.launchpad.net/qemu/+bug/1703506 

v16:
 Patches are based off of Eduardo's git://github.com/ehabkost/qemu.git x86-next.
 Some of the patches are queued already. Submitting remaining series. Will be on
 vacation for a couple of weeks. Wanted to fix one issue before I go.
 1. Fixed the bit shifting issue with patch #1. Added more comments about the
    change.


v15:
 Patches are based off of Eduardo's git://github.com/ehabkost/qemu.git x86-next.
 Some of the patches are queued already. Submitting remaining series.
 Summary of changes.
 1. Added changes to support all the currently supported nr_cores and
    nr_threads. Fixed up the node id to support this.
 2. Removed topology_supports_topoext function. This is not required anymore as
we allow all the combinations to work now.
 3. Fixed other feedback from Eduardo for v14.

v14:
 Patches are based off of Eduardo's git://github.com/ehabkost/qemu.git x86-next.
 Some of the patches are queued already. Submitting remaining series.
 Summary of changes.
 1. Always set TOPOEXT feature in kvm_arch_get_supported_cpuid
 2. Implemented topology_supports_topoext a bit differently. The reason is
    that we need to disable this feature before x86_cpu_expand_features, but
    nr_cores and nr_threads are not populated at that time. They are
    populated in qemu_init_vcpus.
 3. Removed auto-topoext feature completely. It can cause lots of
    compatibility issues.

v13:
 Patches are based off of Eduardo's git://github.com/ehabkost/qemu.git x86-next.
 Some of the patches are queued already. Submitting remaining series.
 Summary of changes.
 1.Fixed the error format if the topology cannot be supported.
 2.Fixed the compatibility issues with old cpu models and new machine types.
   Here is the discussion thread.
   https://lists.gnu.org/archive/html/qemu-devel/2018-06/msg01239.html
 3.I am still testing it. But sending it to get review feedback.

v12:
 Patches are based off of Eduardo's git://github.com/ehabkost/qemu.git x86-next.
 Some of the patches are queued already. Submitting remaining series.

 Summary of changes.
 1.Added more comments explaining CPUID_Fn8000001E bit definitions.
 2.Split the patch into separate patch to check the topology. Moved the code to
   x86_cpu_realizefn. Display the error if topoext feature cannot be enabled.
 3.Few more text corrections.

v11:
 Patches are based off of Eduardo's git://github.com/ehabkost/qemu.git x86-next.
 Summary of changes.
 1.Added more comments explaining different constants and variables.
 2.Removed NUM_SHARING_CACHE macro and made the code simpler.
 3.Changed the function name num_sharing_l3_cache to cores_in_core_complex.
   This function is actually finding the number of cores in a core complex.
   Purpose here is to re-use the code in couple more places.
 4.Added new function nodes_in_socket to find number of nodes in the config.
   Purpose here is to re-use the code.
 5.Used DIV_ROUND_UP wherever applicable.
 6.Renamed few constants and functions to generic names.
 7.Few more text corrections.
 
v10:
 Based the patches on Eduardo's git://github.com/ehabkost/qemu.git x86-next
 Some of the earlier patches are already queued. So, submitting the rest of
 the series here. This series adds complete redesign of the cpu topology.
 Based on user given parameter, we try to build topology very close to the
 hardware. Maintains symmetry as much as possible. Added new function
 epyc_build_topology to build the topology based on user given nr_cores,
 nr_threads.
 Summary of changes.
 1. Build the topology dynamically based on nr_cores and nr_threads
 2. Added new epyc_build_topology to build the new topology.
 3. Added new function num_sharing_l3_cache to calculate the L3 sharing
 4. Added a check to verify the topology. Disabled the TOPOEXT if the
topology cannot be built.

v9:
 Based the patches on Eduardo's git://github.com/ehabkost/qemu.git x86-next
 tree. Following 3 patches from v8 are already queued.
  i386: Add cache information in X86CPUDefinition
  i386: Initialize cache information for EPYC family processors
  i386: Helpers to encode cache information consistently
 So, submitting the rest of the series here.

 Changes:
 1. Included Eduardo's clean up patch
 2. Added 2.13 machine types
 3. Disabled topoext for 2.12 and below versions.
 4. Added the assert to core_id as discussed.

v8:
 Addressed feedback from Eduardo. Thanks Eduardo for being patient with me.
 Tested on AMD EPYC server and also did some basic testing on Intel box.
 Summary of changes.
 1. Reverted back l2 cache associativity. Kept it same as legacy.
 2. Changed cache_info structure in X86CPUDefinition and CPUX86State to
    pointers.
 3. Added legacy_cache property in PC_COMPAT_2_12 

[Qemu-devel] [PATCH v16 3/3] i386: Remove generic SMT thread check

2018-06-19 Thread Babu Moger
Remove generic non-intel check while validating hyperthreading support.
Certain AMD CPUs can support hyperthreading now.

CPU family with TOPOEXT feature can support hyperthreading now.

Signed-off-by: Babu Moger 
Tested-by: Geoffrey McRae 
Reviewed-by: Eduardo Habkost 
---
 target/i386/cpu.c | 17 +++++++++++------
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index d6ed29b..e6c2f8a 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -4985,17 +4985,22 @@ static void x86_cpu_realizefn(DeviceState *dev, Error **errp)
 
 qemu_init_vcpu(cs);
 
-/* Only Intel CPUs support hyperthreading. Even though QEMU fixes this
- * issue by adjusting CPUID_0000_0001_EBX and CPUID_8000_0008_ECX
- * based on inputs (sockets,cores,threads), it is still better to gives
+/*
+ * Most Intel and certain AMD CPUs support hyperthreading. Even though QEMU
+ * fixes this issue by adjusting CPUID_0000_0001_EBX and CPUID_8000_0008_ECX
+ * based on inputs (sockets,cores,threads), it is still better to give
  * users a warning.
  *
  * NOTE: the following code has to follow qemu_init_vcpu(). Otherwise
  * cs->nr_threads hasn't be populated yet and the checking is incorrect.
  */
-if (!IS_INTEL_CPU(env) && cs->nr_threads > 1 && !ht_warned) {
-error_report("AMD CPU doesn't support hyperthreading. Please configure"
- " -smp options properly.");
+ if (IS_AMD_CPU(env) &&
+ !(env->features[FEAT_8000_0001_ECX] & CPUID_EXT3_TOPOEXT) &&
+ cs->nr_threads > 1 && !ht_warned) {
+error_report("This family of AMD CPU doesn't support "
+ "hyperthreading(%d). Please configure -smp "
+ "options properly or try enabling topoext feature.",
+ cs->nr_threads);
 ht_warned = true;
 }
 
-- 
1.8.3.1
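
The new warning condition in this patch can be read as a small predicate.
The sketch below is standalone and simplified, not the QEMU code: struct
smt_check and should_warn_about_smt() are hypothetical names, and the
fields stand in for IS_AMD_CPU(env), the CPUID_EXT3_TOPOEXT feature bit,
cs->nr_threads and the ht_warned flag.

```c
#include <stdbool.h>

/* Hypothetical, simplified view of the inputs the check looks at. */
struct smt_check {
    bool is_amd;         /* vendor is AMD */
    bool has_topoext;    /* TOPOEXT feature bit is enabled */
    int nr_threads;      /* threads per core requested via -smp */
    bool already_warned; /* the warning has been printed before */
};

/*
 * Warn only for AMD CPU models without TOPOEXT that are configured
 * with more than one thread per core, and only once.
 */
bool should_warn_about_smt(const struct smt_check *c)
{
    return c->is_amd && !c->has_topoext &&
           c->nr_threads > 1 && !c->already_warned;
}
```

Compared with the old !IS_INTEL_CPU() test, an AMD model that has
TOPOEXT enabled no longer triggers the warning, and neither does a
non-AMD, non-Intel vendor.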




Re: [Qemu-devel] [Qemu-block] [RFC 1/1] ide: bug #1777315: io_buffer_size and sg.size can represent partial sector sizes

2018-06-19 Thread Amol Surati
On Tue, Jun 19, 2018 at 08:04:03PM +0530, Amol Surati wrote:
> On Tue, Jun 19, 2018 at 09:45:15AM -0400, John Snow wrote:
> > 
> > 
> > On 06/19/2018 04:53 AM, Kevin Wolf wrote:
> > > Am 19.06.2018 um 06:01 hat Amol Surati geschrieben:
> > >> On Mon, Jun 18, 2018 at 08:14:10PM -0400, John Snow wrote:
> > >>>
> > >>>
> > >>> On 06/18/2018 02:02 PM, Amol Surati wrote:
> > >>>> On Mon, Jun 18, 2018 at 12:05:15AM +0530, Amol Surati wrote:
> > >>>>> This patch fixes the assumption that io_buffer_size is always a perfect
> > >>>>> multiple of the sector size. The assumption is the cause of the firing
> > >>>>> of 'assert(n * 512 == s->sg.size);'.
> > >>>>>
> > >>>>> Signed-off-by: Amol Surati
> > >>>>> ---
> > >>>>
> > >>>> The repository https://github.com/asurati/1777315 contains a module for
> > >>>> QEMU's 8086:7010 ATA controller, which exercises the code path
> > >>>> described in [RFC 0/1] of this series.
> > >>>>
> > >>>
> > >>> Thanks, this made it easier to see what was happening. I was able to
> > >>> write an ide-test test case using this source as a guide, and reproduce
> > >>> the error.
> > >>>
> > >>> static void test_bmdma_partial_sector_short_prdt(void)
> > >>> {
> > >>> QPCIDevice *dev;
> > >>> QPCIBar bmdma_bar, ide_bar;
> > >>> uint8_t status;
> > >>>
> > >>> /* Read 2 sectors but only give 1 sector in PRDT */
> > >>> PrdtEntry prdt[] = {
> > >>> {
> > >>> .addr = 0,
> > >>> .size = cpu_to_le32(0x200),
> > >>> },
> > >>> {
> > >>> .addr = 512,
> > >>> .size = cpu_to_le32(0x44 | PRDT_EOT),
> > >>> }
> > >>> };
> > >>>
> > >>> dev = get_pci_device(&bmdma_bar, &ide_bar);
> > >>> status = send_dma_request(CMD_READ_DMA, 0, 2,
> > >>>   prdt, ARRAY_SIZE(prdt), NULL);
> > >>> g_assert_cmphex(status, ==, 0);
> > >>> assert_bit_clear(qpci_io_readb(dev, ide_bar, reg_status), DF | ERR);
> > >>> free_pci_device(dev);
> > >>> }
> > >>>
> > >>>> Loading the module reproduces the bug. Tested on the latest master
> > >>>> branch.
> > >>>>
> > >>>> Steps:
> > >>>> - Install a Linux distribution as a guest, ensuring that the boot disk
> > >>>>   resides on non-IDE controllers (such as virtio)
> > >>>> - Attach another disk as a master device on the primary
> > >>>>   IDE controller (i.e. attach at -hda.)
> > >>>> - Blacklist ata_piix, pata_acpi and ata_generic modules, and reboot.
> > >>>> - Copy the source files into the guest and build the module.
> > >>>> - Load the module. QEMU process should die with the message:
> > >>>>   qemu-system-x86_64: hw/ide/core.c:871: ide_dma_cb:
> > >>>>   Assertion `n * 512 == s->sg.size' failed.
> > >>>>
> > >>>>
> > >>>> -Amol
> > >>>>
> > >>>
> > >>> I'm less sure of the fix -- certainly the assert is wrong, but just
> > >>> incrementing 'n' is wrong too -- we didn't copy (n+1) sectors, we copied
> > >>> (n) and a few extra bytes.
> > >>
> > >> That is true.
> > >>
> > >> There are (at least) two fields that represent the total size of a DMA
> > >> transfer -
> > >> (1) The size, as requested through the NSECTOR field.
> > >> (2) The size, as calculated through the length fields of the PRD entries.
> > >>
> > >> It makes sense to consider the most restrictive of the sizes, as the 
> > >> factor
> > >> which determines both the end of a successful DMA transfer and the
> > >> condition to assert.
> > >>
> > >>>
> > >>> The sector-based math here would need to be adjusted to be able to cope
> > >>> with partial sector reads... or we ought to avoid doing any partial
> > >>> sector transfers.
> > >>>
> > >>>
> > >>> I'm not sure which is more correct tonight, it depends:
> > >>>
> > >>> - If it's OK to transfer partial sectors before reporting overflow,
> > >>> adjusting the command loop to work with partial sectors is OK.
> > >>>
> > >>> - If it's NOT OK to do partial sector transfer, the sglist preparation
> > >>> phase needs to produce a truncated SGList that's some multiple of 512
> > >>> bytes that leaves the excess bytes in a second sglist that we don't
> > >>> throw away and can use as a basis for building the next sglist. (Or the
> > >>> DMA helpers need to take a max_bytes parameter and return an sglist
> > >>> representing unused buffer space if the command underflowed.)
> > >>
> > >> Support for partial sector transfers is built into the DMA interface's 
> > >> PRD
> > >> mechanism itself, because an entry is allowed to transfer in the units of
> > >> even number of bytes.
> > >>
> > >> I think the controller's IO process runs in two parts (probably loops 
> > >> over
> > >> for a single transfer):
> > >>
> > >> (1) The controller's disk interface transfers between its internal buffer
> > >> and the disk storage. The transfers are likely to be in the
> > >> multiples of a sector.
> > >> (2) The controller's DMA interface transfers between its internal buffer
> > >> and the system memory. The transfers 

Re: [Qemu-devel] [PATCH] tests: Simplify .gitignore

2018-06-19 Thread Philippe Mathieu-Daudé
On 06/19/2018 05:39 PM, Eric Blake wrote:
> Commit 0bcc8e5b was yet another instance of 'git status' reporting
> dirty files after an in-tree build, thanks to the new binary
> tests/check-block-qdict.
> 
> Instead of piecemeal exemptions of each new binary as they are
> added, let's use git's negative globbing feature to exempt ALL
> files that have a 'test-' or 'check-' prefix, except for the ones
> ending in '.c' or '.sh'.  We still have a couple of generated
> files that then need (re-)exclusion, but the overall list is a
> LOT shorter, and less prone to needing future edits.

Finally :)

> 
> Signed-off-by: Eric Blake 

Reviewed-by: Philippe Mathieu-Daudé 

> ---
>  tests/.gitignore | 93 
> +++-
>  1 file changed, 5 insertions(+), 88 deletions(-)
> 
> diff --git a/tests/.gitignore b/tests/.gitignore
> index 2bc61a9a58d..08e2df1ce1f 100644
> --- a/tests/.gitignore
> +++ b/tests/.gitignore
> @@ -2,101 +2,18 @@ atomic_add-bench
>  benchmark-crypto-cipher
>  benchmark-crypto-hash
>  benchmark-crypto-hmac
> -check-qdict
> -check-qnum
> -check-qjson
> -check-qlist
> -check-qlit
> -check-qnull
> -check-qobject
> -check-qstring
> -check-qom-interface
> -check-qom-proplist
> +check-*
> +!check-*.c
> +!check-*.sh
>  qht-bench
>  rcutorture
> -test-aio
> -test-aio-multithread
> -test-arm-mptimer
> -test-base64
> -test-bdrv-drain
> -test-bitops
> -test-bitcnt
> -test-block-backend
> -test-blockjob
> -test-blockjob-txn
> -test-bufferiszero
> -test-char
> -test-clone-visitor
> -test-coroutine
> -test-crypto-afsplit
> -test-crypto-block
> -test-crypto-cipher
> -test-crypto-hash
> -test-crypto-hmac
> -test-crypto-ivgen
> -test-crypto-pbkdf
> -test-crypto-secret
> -test-crypto-tlscredsx509
> -test-crypto-tlscredsx509-work/
> -test-crypto-tlscredsx509-certs/
> -test-crypto-tlssession
> -test-crypto-tlssession-work/
> -test-crypto-tlssession-client/
> -test-crypto-tlssession-server/
> -test-crypto-xts
> -test-cutils
> -test-hbitmap
> -test-hmp
> -test-int128
> -test-iov
> -test-io-channel-buffer
> -test-io-channel-command
> -test-io-channel-command.fifo
> -test-io-channel-file
> -test-io-channel-file.txt
> -test-io-channel-socket
> -test-io-channel-tls
> -test-io-task
> -test-keyval
> -test-logging
> -test-mul64
> -test-opts-visitor
> +test-*
> +!test-*.c
>  test-qapi-commands.[ch]
>  test-qapi-events.[ch]
>  test-qapi-types.[ch]
> -test-qapi-util
>  test-qapi-visit.[ch]
> -test-qdev-global-props
> -test-qemu-opts
> -test-qdist
> -test-qga
> -test-qht
> -test-qht-par
> -test-qmp-cmds
> -test-qmp-event
> -test-qobject-input-strict
> -test-qobject-input-visitor
>  test-qapi-introspect.[ch]
> -test-qobject-output-visitor
> -test-rcu-list
> -test-replication
> -test-shift128
> -test-string-input-visitor
> -test-string-output-visitor
> -test-thread-pool
> -test-throttle
> -test-timed-average
> -test-uuid
> -test-util-sockets
> -test-visitor-serialization
> -test-vmstate
> -test-write-threshold
> -test-x86-cpuid
> -test-x86-cpuid-compat
> -test-xbzrle
> -test-netfilter
> -test-filter-mirror
> -test-filter-redirector
>  *-test
>  qapi-schema/*.test.*
>  vm/*.img
> 



Re: [Qemu-devel] [PATCH v3 1/2] kvm: support -dedicated cpu-pm=on|off

2018-06-19 Thread Michael S. Tsirkin
On Tue, Jun 19, 2018 at 05:17:45PM +0200, Paolo Bonzini wrote:
> On 16/06/2018 00:29, Michael S. Tsirkin wrote:
> >  
> > +static QemuOptsList qemu_dedicated_opts = {
> > +.name = "dedicated",
> > +.head = QTAILQ_HEAD_INITIALIZER(qemu_dedicated_opts.head),
> > +.desc = {
> > +{
> > +.name = "mem-lock",
> > +.type = QEMU_OPT_BOOL,
> > +},
> > +{
> > +.name = "cpu-pm",
> > +.type = QEMU_OPT_BOOL,
> > +},
> > +{ /* end of list */ }
> > +},
> > +};
> > +
> 
> Let the bikeshedding begin!
> 
> 1) Should we deprecate -realtime?

Can be a patch on top, by whoever cares.

> 2) Maybe -hostresource?
> 
> Paolo

Is ability to cause high latency for other threads really a resource?

The issues in question:
1. a malicious guest can cause high latency for others sharing the host cpu.
2. to the host scheduler, the cpu looks busier than it really is.

Both are avoided if you use a dedicated host cpu, and 2 will
help the scheduler get closer to giving you one.


-- 
MST



[Qemu-devel] [PATCH] target/arm: Set strict alignment for ARMv6-M load/store

2018-06-19 Thread Julia Suvorova via Qemu-devel
Unlike ARMv7-M, ARMv6-M only supports naturally aligned memory accesses
for 16-bit halfword and 32-bit word accesses using the LDR, LDRH,
LDRSH, STR and STRH instructions.

Signed-off-by: Julia Suvorova 
---
 target/arm/translate.c | 18 ++++++++++++++++--
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/target/arm/translate.c b/target/arm/translate.c
index b988d379e7..d923cbe98e 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -1100,7 +1100,14 @@ static inline TCGv gen_aa32_addr(DisasContext *s, 
TCGv_i32 a32, TCGMemOp op)
 static void gen_aa32_ld_i32(DisasContext *s, TCGv_i32 val, TCGv_i32 a32,
 int index, TCGMemOp opc)
 {
-TCGv addr = gen_aa32_addr(s, a32, opc);
+TCGv addr;
+
+if (arm_dc_feature(s, ARM_FEATURE_M) &&
+!arm_dc_feature(s, ARM_FEATURE_V7)) {
+opc |= MO_ALIGN;
+}
+
+addr = gen_aa32_addr(s, a32, opc);
 tcg_gen_qemu_ld_i32(val, addr, index, opc);
 tcg_temp_free(addr);
 }
@@ -1108,7 +1115,14 @@ static void gen_aa32_ld_i32(DisasContext *s, TCGv_i32 
val, TCGv_i32 a32,
 static void gen_aa32_st_i32(DisasContext *s, TCGv_i32 val, TCGv_i32 a32,
 int index, TCGMemOp opc)
 {
-TCGv addr = gen_aa32_addr(s, a32, opc);
+TCGv addr;
+
+if (arm_dc_feature(s, ARM_FEATURE_M) &&
+!arm_dc_feature(s, ARM_FEATURE_V7)) {
+opc |= MO_ALIGN;
+}
+
+addr = gen_aa32_addr(s, a32, opc);
 tcg_gen_qemu_st_i32(val, addr, index, opc);
 tcg_temp_free(addr);
 }
-- 
2.17.0
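
The alignment rule described in the commit message above can be illustrated with a small standalone check. This is only a sketch: is_naturally_aligned() is a hypothetical helper and not part of QEMU. In the patch itself the check is not open-coded; it is requested by setting MO_ALIGN on the memory operation.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, not a QEMU function: an access of 'size' bytes
 * (assumed to be a power of two: 1, 2 or 4) is naturally aligned when
 * the address is a multiple of the access size.
 */
static bool is_naturally_aligned(uint32_t addr, uint32_t size)
{
    return (addr & (size - 1)) == 0;
}
```

Under this rule a 32-bit word access at 0x1002 is unaligned, while a 16-bit halfword access at the same address is fine.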




[Qemu-devel] [PATCH] tests: Simplify .gitignore

2018-06-19 Thread Eric Blake
Commit 0bcc8e5b was yet another instance of 'git status' reporting
dirty files after an in-tree build, thanks to the new binary
tests/check-block-qdict.

Instead of piecemeal exemptions of each new binary as they are
added, let's use git's negative globbing feature to exempt ALL
files that have a 'test-' or 'check-' prefix, except for the ones
ending in '.c' or '.sh'.  We still have a couple of generated
files that then need (re-)exclusion, but the overall list is a
LOT shorter, and less prone to needing future edits.

Signed-off-by: Eric Blake 
---
 tests/.gitignore | 93 +++-
 1 file changed, 5 insertions(+), 88 deletions(-)

diff --git a/tests/.gitignore b/tests/.gitignore
index 2bc61a9a58d..08e2df1ce1f 100644
--- a/tests/.gitignore
+++ b/tests/.gitignore
@@ -2,101 +2,18 @@ atomic_add-bench
 benchmark-crypto-cipher
 benchmark-crypto-hash
 benchmark-crypto-hmac
-check-qdict
-check-qnum
-check-qjson
-check-qlist
-check-qlit
-check-qnull
-check-qobject
-check-qstring
-check-qom-interface
-check-qom-proplist
+check-*
+!check-*.c
+!check-*.sh
 qht-bench
 rcutorture
-test-aio
-test-aio-multithread
-test-arm-mptimer
-test-base64
-test-bdrv-drain
-test-bitops
-test-bitcnt
-test-block-backend
-test-blockjob
-test-blockjob-txn
-test-bufferiszero
-test-char
-test-clone-visitor
-test-coroutine
-test-crypto-afsplit
-test-crypto-block
-test-crypto-cipher
-test-crypto-hash
-test-crypto-hmac
-test-crypto-ivgen
-test-crypto-pbkdf
-test-crypto-secret
-test-crypto-tlscredsx509
-test-crypto-tlscredsx509-work/
-test-crypto-tlscredsx509-certs/
-test-crypto-tlssession
-test-crypto-tlssession-work/
-test-crypto-tlssession-client/
-test-crypto-tlssession-server/
-test-crypto-xts
-test-cutils
-test-hbitmap
-test-hmp
-test-int128
-test-iov
-test-io-channel-buffer
-test-io-channel-command
-test-io-channel-command.fifo
-test-io-channel-file
-test-io-channel-file.txt
-test-io-channel-socket
-test-io-channel-tls
-test-io-task
-test-keyval
-test-logging
-test-mul64
-test-opts-visitor
+test-*
+!test-*.c
 test-qapi-commands.[ch]
 test-qapi-events.[ch]
 test-qapi-types.[ch]
-test-qapi-util
 test-qapi-visit.[ch]
-test-qdev-global-props
-test-qemu-opts
-test-qdist
-test-qga
-test-qht
-test-qht-par
-test-qmp-cmds
-test-qmp-event
-test-qobject-input-strict
-test-qobject-input-visitor
 test-qapi-introspect.[ch]
-test-qobject-output-visitor
-test-rcu-list
-test-replication
-test-shift128
-test-string-input-visitor
-test-string-output-visitor
-test-thread-pool
-test-throttle
-test-timed-average
-test-uuid
-test-util-sockets
-test-visitor-serialization
-test-vmstate
-test-write-threshold
-test-x86-cpuid
-test-x86-cpuid-compat
-test-xbzrle
-test-netfilter
-test-filter-mirror
-test-filter-redirector
 *-test
 qapi-schema/*.test.*
 vm/*.img
-- 
2.14.4
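
The "last matching rule wins" behaviour that the negative globs above rely on can be sketched in C with POSIX fnmatch(). This is a simplification under stated assumptions: is_ignored() is a hypothetical helper, and git's real matcher also handles directories, slashes and per-directory scoping, which this sketch ignores.

```c
#include <fnmatch.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: walk the rules in file order. A rule that
 * matches sets the state to "ignored"; a matching rule prefixed with
 * '!' sets it back to "not ignored". The last match decides.
 */
static bool is_ignored(const char *const *rules, size_t nrules,
                       const char *name)
{
    bool ignored = false;
    size_t i;

    for (i = 0; i < nrules; i++) {
        const char *pattern = rules[i];
        bool negate = (pattern[0] == '!');

        if (negate) {
            pattern++;              /* skip the '!' marker */
        }
        if (fnmatch(pattern, name, 0) == 0) {
            ignored = !negate;
        }
    }
    return ignored;
}
```

With the rules "test-*", "!test-*.c" and "test-qapi-commands.[ch]" in that order, a binary such as test-aio is ignored, the source file test-aio.c is tracked again, and the generated test-qapi-commands.c is re-ignored by the later rule.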




Re: [Qemu-devel] [virtio-dev] Re: [PATCH] qemu: Introduce VIRTIO_NET_F_STANDBY feature bit to virtio_net

2018-06-19 Thread Michael S. Tsirkin
On Tue, Jun 19, 2018 at 12:54:53PM +0200, Cornelia Huck wrote:
> Sorry about dragging mainframes into this, but this will only work for
> homogenous device coupling, not for heterogenous. Consider my vfio-pci
> + virtio-net-ccw example again: The guest cannot find out that the two
> belong together by checking some group ID, it has to either use the MAC
> or some needs-to-be-architectured property.
> 
> Alternatively, we could propose that mechanism as pci-only, which means
> we can rely on mechanisms that won't necessarily work on non-pci
> transports. (FWIW, I don't see a use case for using vfio-ccw to pass
> through a network card anytime in the near future, due to the nature of
> network cards currently in use on s390.)

That's what it boils down to, yes.  If there's need to have this for
non-pci devices, then we should put it in config space.
Cornelia, what do you think?

-- 
MST



[Qemu-devel] [PATCH v2] xilinx_spips: Make dma transactions as per dma_burst_size

2018-06-19 Thread Sai Pavan Boddu
Qspi dma has a burst length of 64 bytes, so limit transaction length to
64 max.

Signed-off-by: Sai Pavan Boddu 
---
 hw/ssi/xilinx_spips.c | 20 +---
 include/hw/ssi/xilinx_spips.h |  5 -
 2 files changed, 21 insertions(+), 4 deletions(-)

diff --git a/hw/ssi/xilinx_spips.c b/hw/ssi/xilinx_spips.c
index 03f5fae..0ea57d1 100644
--- a/hw/ssi/xilinx_spips.c
+++ b/hw/ssi/xilinx_spips.c
@@ -851,12 +851,17 @@ static void xlnx_zynqmp_qspips_notify(void *opaque)
 {
 size_t ret;
 uint32_t num;
-const void *rxd = pop_buf(recv_fifo, 4, &num);
+const void *rxd;
+int len;
+
+len = recv_fifo->num >= rq->dma_burst_size ? rq->dma_burst_size :
+   recv_fifo->num;
+rxd = pop_buf(recv_fifo, len, &num);
 
 memcpy(rq->dma_buf, rxd, num);
 
-ret = stream_push(rq->dma, rq->dma_buf, 4);
-assert(ret == 4);
+ret = stream_push(rq->dma, rq->dma_buf, num);
+assert(ret == num);
 xlnx_zynqmp_qspips_check_flush(rq);
 }
 }
@@ -1337,6 +1342,9 @@ static void xlnx_zynqmp_qspips_realize(DeviceState *dev, 
Error **errp)
 fifo8_create(&s->rx_fifo_g, xsc->rx_fifo_size);
 fifo8_create(&s->tx_fifo_g, xsc->tx_fifo_size);
 fifo32_create(&s->fifo_g, 32);
+if (s->dma_burst_size > QSPI_DMA_MAX_BURST_SIZE) {
+s->dma_burst_size = QSPI_DMA_MAX_BURST_SIZE;
+}
 }
 
 static void xlnx_zynqmp_qspips_init(Object *obj)
@@ -1411,6 +1419,11 @@ static const VMStateDescription 
vmstate_xlnx_zynqmp_qspips = {
 }
 };
 
+static Property xilinx_zynqmp_qspips_properties[] = {
+DEFINE_PROP_UINT32("dma-burst-size", XlnxZynqMPQSPIPS, dma_burst_size, 64),
+DEFINE_PROP_END_OF_LIST(),
+};
+
 static Property xilinx_qspips_properties[] = {
 /* We had to turn this off for 2.10 as it is not compatible with migration.
  * It can be enabled but will prevent the device to be migrated.
@@ -1463,6 +1476,7 @@ static void xlnx_zynqmp_qspips_class_init(ObjectClass 
*klass, void * data)
 dc->realize = xlnx_zynqmp_qspips_realize;
 dc->reset = xlnx_zynqmp_qspips_reset;
 dc->vmsd = &vmstate_xlnx_zynqmp_qspips;
+dc->props = xilinx_zynqmp_qspips_properties;
 xsc->reg_ops = &xlnx_zynqmp_qspips_ops;
 xsc->rx_fifo_size = RXFF_A_Q;
 xsc->tx_fifo_size = TXFF_A_Q;
diff --git a/include/hw/ssi/xilinx_spips.h b/include/hw/ssi/xilinx_spips.h
index d398a4e..bc5596a 100644
--- a/include/hw/ssi/xilinx_spips.h
+++ b/include/hw/ssi/xilinx_spips.h
@@ -37,6 +37,8 @@ typedef struct XilinxSPIPS XilinxSPIPS;
 /* Bite off 4k chunks at a time */
 #define LQSPI_CACHE_SIZE 1024
 
+#define QSPI_DMA_MAX_BURST_SIZE 2048
+
 typedef enum {
 READ = 0x3, READ_4 = 0x13,
 FAST_READ = 0xb,FAST_READ_4 = 0x0c,
@@ -95,7 +97,8 @@ typedef struct {
 XilinxQSPIPS parent_obj;
 
 StreamSlave *dma;
-uint8_t dma_buf[4];
+uint8_t dma_buf[QSPI_DMA_MAX_BURST_SIZE];
+uint32_t dma_burst_size;
 int gqspi_irqline;
 
 uint32_t regs[XLNX_ZYNQMP_SPIPS_R_MAX];
-- 
2.7.4
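
The burst-length arithmetic in the patch above can be sketched as a standalone C fragment. The helper names and SKETCH_DMA_MAX_BURST_SIZE are hypothetical; the constant only mirrors the value of QSPI_DMA_MAX_BURST_SIZE from the patch, and the FIFO is modelled as a plain byte count rather than the real Fifo type.

```c
#include <stdint.h>

/* Mirrors the 2048-byte cap introduced by the patch (name is hypothetical). */
#define SKETCH_DMA_MAX_BURST_SIZE 2048u

/* Clamp a user-configured burst size, as the realize function does. */
static uint32_t clamp_burst_size(uint32_t requested)
{
    return requested > SKETCH_DMA_MAX_BURST_SIZE
           ? SKETCH_DMA_MAX_BURST_SIZE : requested;
}

/* Bytes to pop for the next push: one full burst, or whatever is left. */
static uint32_t next_burst_len(uint32_t fifo_bytes, uint32_t burst_size)
{
    return fifo_bytes >= burst_size ? burst_size : fifo_bytes;
}

/* Number of pushes needed to drain 'fifo_bytes' from the FIFO. */
static uint32_t count_bursts(uint32_t fifo_bytes, uint32_t burst_size)
{
    uint32_t pushes = 0;

    if (burst_size == 0) {
        return 0;               /* a zero burst size could never drain the FIFO */
    }
    while (fifo_bytes > 0) {
        fifo_bytes -= next_burst_len(fifo_bytes, burst_size);
        pushes++;
    }
    return pushes;
}
```

For example, 130 bytes with a 64-byte burst drain in three pushes of 64, 64 and 2 bytes. The zero-size guard is an addition of this sketch; the patch does not discuss that case.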




Re: [Qemu-devel] [PATCH v7 1/3] qmp: adding 'wakeup-suspend-support' in query-target

2018-06-19 Thread Daniel Henrique Barboza

Hi,

Sorry for the delay. I'll summarize what I've understood from the discussion
so far:

- query-target is the wrong place for this flag. query-machines is (less)
wrong because it is not a static property of the machine object

- a new "query-current-machine" can be created to host these dynamic
properties that belong to the current instance of the VM

- there are machines in which the suspend support may vary with a
"-no-acpi" option that would disable both the suspend and wake-up
support. In this case, I see no problem with factoring this flag into
the logic (assuming it is possible, of course) and setting it to "false"
if -no-acpi is present (or even making the API return "yes",
"no" or "acpi" like Markus suggested).


Based on the last email from Eduardo, apparently there is a handful
of other machine properties that can be hosted in either this new
query-current-machine API or query-machines. I believe that this is
more of a long term goal, but this new query-current-machine API
would be a good kick-off and we should go for it.

Is this a fair understanding? Did I miss something?


Thanks,


Daniel



On 05/29/2018 11:55 AM, Eduardo Habkost wrote:

On Mon, May 28, 2018 at 09:23:54AM +0200, Markus Armbruster wrote:

Eduardo Habkost  writes:

[...]

[1] Doing a:
   $ git grep 'STR.*machine, "'
on libvirt source is enough to find some code demonstrating where
query-machines is already lacking today:

[...]

How can we get from this grep to a list of static or dynamic machine
type capabilities?

Let's look at the code:


$ git grep -W 'STR.*machine, "'
src/libxl/libxl_capabilities.c=libxlMakeDomainOSCaps(const char *machine,
src/libxl/libxl_capabilities.c-  virDomainCapsOSPtr os,
src/libxl/libxl_capabilities.c-  virFirmwarePtr *firmwares,
src/libxl/libxl_capabilities.c-  size_t nfirmwares)
src/libxl/libxl_capabilities.c-{
src/libxl/libxl_capabilities.c-virDomainCapsLoaderPtr capsLoader = 
>loader;
src/libxl/libxl_capabilities.c-size_t i;
src/libxl/libxl_capabilities.c-
src/libxl/libxl_capabilities.c-os->supported = true;
src/libxl/libxl_capabilities.c-
src/libxl/libxl_capabilities.c:if (STREQ(machine, "xenpv"))
src/libxl/libxl_capabilities.c-return 0;

I don't understand why this one is here, but we can find out what
we could add to query-machines to make this unnecessary.


[...]
--
src/libxl/libxl_capabilities.c=libxlMakeDomainCapabilities(virDomainCapsPtr 
domCaps,
src/libxl/libxl_capabilities.c-virFirmwarePtr 
*firmwares,
src/libxl/libxl_capabilities.c-size_t nfirmwares)
src/libxl/libxl_capabilities.c-{
src/libxl/libxl_capabilities.c-virDomainCapsOSPtr os = >os;
src/libxl/libxl_capabilities.c-virDomainCapsDeviceDiskPtr disk = 
>disk;
src/libxl/libxl_capabilities.c-virDomainCapsDeviceGraphicsPtr graphics = 
>graphics;
src/libxl/libxl_capabilities.c-virDomainCapsDeviceVideoPtr video = 
>video;
src/libxl/libxl_capabilities.c-virDomainCapsDeviceHostdevPtr hostdev = 
>hostdev;
src/libxl/libxl_capabilities.c-
src/libxl/libxl_capabilities.c:if (STREQ(domCaps->machine, "xenfv"))
src/libxl/libxl_capabilities.c-domCaps->maxvcpus = HVM_MAX_VCPUS;
src/libxl/libxl_capabilities.c-else
src/libxl/libxl_capabilities.c-domCaps->maxvcpus = PV_MAX_VCPUS;

Looks like libvirt isn't using MachineInfo::cpu-max.  libvirt
bug, or workaround for QEMU limitation?


[...]
--
src/libxl/libxl_driver.c=libxlConnectGetDomainCapabilities(virConnectPtr conn,
src/libxl/libxl_driver.c-  const char 
*emulatorbin,
src/libxl/libxl_driver.c-  const char *arch_str,
src/libxl/libxl_driver.c-  const char *machine,
src/libxl/libxl_driver.c-  const char 
*virttype_str,
src/libxl/libxl_driver.c-  unsigned int flags)
src/libxl/libxl_driver.c-{
[...]
src/libxl/libxl_driver.c-if (machine) {
src/libxl/libxl_driver.c:if (STRNEQ(machine, "xenpv") && STRNEQ(machine, 
"xenfv")) {
src/libxl/libxl_driver.c-virReportError(VIR_ERR_INVALID_ARG, "%s",
src/libxl/libxl_driver.c-   _("Xen only supports 'xenpv' and 
'xenfv' machines"));


Not sure if this should be encoded in QEMU.  accel=xen works with
other PC machines, doesn't it?


[...]
--
src/qemu/qemu_capabilities.c=bool virQEMUCapsHasPCIMultiBus(virQEMUCapsPtr 
qemuCaps,
src/qemu/qemu_capabilities.c-   const virDomainDef 
*def)
src/qemu/qemu_capabilities.c-{
src/qemu/qemu_capabilities.c-/* x86_64 and i686 support PCI-multibus on all 
machine types
src/qemu/qemu_capabilities.c- * since forever */
src/qemu/qemu_capabilities.c-if (ARCH_IS_X86(def->os.arch))
src/qemu/qemu_capabilities.c-return true;
src/qemu/qemu_capabilities.c-

Re: [Qemu-devel] [PATCH] xilinx_spips: Make dma transactions as per dma_burst_size

2018-06-19 Thread Sai Pavan Boddu
Hi Edgar,

I got your suggestion below. Will be sending a V2 asap.

Thanks,
Sai Pavan

> -----Original Message-----
> From: Edgar E. Iglesias [mailto:edgar.igles...@xilinx.com]
> Sent: Thursday, June 14, 2018 4:55 PM
> To: Sai Pavan Boddu 
> Cc: qemu-devel@nongnu.org; Alistair Francis ; Peter
> Crosthwaite ; Peter Maydell
> ; Francisco Iglesias 
> Subject: Re: [PATCH] xilinx_spips: Make dma transactions as per dma_burst_size
> 
> On Thu, Jun 14, 2018 at 10:57:04AM +0530, Sai Pavan Boddu wrote:
> > Qspi dma has a burst length of 64 bytes, So limit transaction length
> > to
> > 64 max.
> 
> Hi Sai,
> 
> Is this a v2 or a resend?
> 
> >
> > Signed-off-by: Sai Pavan Boddu 
> > ---
> >  hw/ssi/xilinx_spips.c | 18 +++---
> >  include/hw/ssi/xilinx_spips.h |  3 ++-
> >  2 files changed, 17 insertions(+), 4 deletions(-)
> >
> > diff --git a/hw/ssi/xilinx_spips.c b/hw/ssi/xilinx_spips.c index
> > 03f5fae..ea006c4 100644
> > --- a/hw/ssi/xilinx_spips.c
> > +++ b/hw/ssi/xilinx_spips.c
> > @@ -851,12 +851,17 @@ static void xlnx_zynqmp_qspips_notify(void
> *opaque)
> >  {
> >  size_t ret;
> >  uint32_t num;
> > -const void *rxd = pop_buf(recv_fifo, 4, &num);
> > +const void *rxd;
> > +int len;
> > +
> > +len = recv_fifo->num >= rq->dma_burst_size ? rq->dma_burst_size :
> > +   recv_fifo->num;
> > +rxd = pop_buf(recv_fifo, len, &num);
> >
> >  memcpy(rq->dma_buf, rxd, num);
> >
> > -ret = stream_push(rq->dma, rq->dma_buf, 4);
> > -assert(ret == 4);
> > +ret = stream_push(rq->dma, rq->dma_buf, num);
> > +assert(ret == num);
> >  xlnx_zynqmp_qspips_check_flush(rq);
> >  }
> >  }
> > @@ -1337,6 +1342,7 @@ static void xlnx_zynqmp_qspips_realize(DeviceState
> *dev, Error **errp)
> >  fifo8_create(&s->rx_fifo_g, xsc->rx_fifo_size);
> >  fifo8_create(&s->tx_fifo_g, xsc->tx_fifo_size);
> >  fifo32_create(&s->fifo_g, 32);
> > +s->dma_buf = g_new0(uint8_t, s->dma_burst_size);
> 
> 
> This would need to be free'd somewhere.
> But I think you should put a reasonably small limit on the burst size that 
> users
> can configure and then you can allocate this as an array in XlnxZynqMPQSPIPS.
> 
> 
> 
> >  }
> >
> >  static void xlnx_zynqmp_qspips_init(Object *obj) @@ -1411,6 +1417,11
> > @@ static const VMStateDescription vmstate_xlnx_zynqmp_qspips = {
> >  }
> >  };
> >
> > +static Property xilinx_zynqmp_qspips_properties[] = {
> > +DEFINE_PROP_UINT32("dma-burst-size", XlnxZynqMPQSPIPS,
> > +dma_burst_size, 64),
> 
> You need to limit this so users dont pick 4G. Perhaps 2 or 4K max.
> 
> Cheers,
> Edgar
> 
> 
> > +DEFINE_PROP_END_OF_LIST(),
> > +};
> > +
> >  static Property xilinx_qspips_properties[] = {
> >  /* We had to turn this off for 2.10 as it is not compatible with 
> > migration.
> >   * It can be enabled but will prevent the device to be migrated.
> > @@ -1463,6 +1474,7 @@ static void
> xlnx_zynqmp_qspips_class_init(ObjectClass *klass, void * data)
> >  dc->realize = xlnx_zynqmp_qspips_realize;
> >  dc->reset = xlnx_zynqmp_qspips_reset;
> >  dc->vmsd = &vmstate_xlnx_zynqmp_qspips;
> > +dc->props = xilinx_zynqmp_qspips_properties;
> >  xsc->reg_ops = &xlnx_zynqmp_qspips_ops;
> >  xsc->rx_fifo_size = RXFF_A_Q;
> >  xsc->tx_fifo_size = TXFF_A_Q;
> > diff --git a/include/hw/ssi/xilinx_spips.h
> > b/include/hw/ssi/xilinx_spips.h index d398a4e..cca1813 100644
> > --- a/include/hw/ssi/xilinx_spips.h
> > +++ b/include/hw/ssi/xilinx_spips.h
> > @@ -95,7 +95,8 @@ typedef struct {
> >  XilinxQSPIPS parent_obj;
> >
> >  StreamSlave *dma;
> > -uint8_t dma_buf[4];
> > +uint8_t *dma_buf;
> > +uint32_t dma_burst_size;
> >  int gqspi_irqline;
> >
> >  uint32_t regs[XLNX_ZYNQMP_SPIPS_R_MAX];
> > --
> > 2.7.4
> >



Re: [Qemu-devel] [PATCH v5 3/6] nbd/server: add nbd_meta_empty_or_pattern helper

2018-06-19 Thread Eric Blake

On 06/09/2018 10:17 AM, Vladimir Sementsov-Ogievskiy wrote:

Add nbd_meta_pattern() and nbd_meta_empty_or_pattern() helpers for
metadata query parsing. nbd_meta_pattern() will be reused for "qemu"


s/for/for the/


namespace in following patches.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
  nbd/server.c | 86 +---
  1 file changed, 59 insertions(+), 27 deletions(-)


Feels like growth, even though the goal of refactoring is reuse; but the 
reuse comes later so I'm okay with it.




diff --git a/nbd/server.c b/nbd/server.c
index 567561a77e..2d762d7289 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -733,52 +733,83 @@ static int nbd_negotiate_send_meta_context(NBDClient 
*client,
  return qio_channel_writev_all(client->ioc, iov, 2, errp) < 0 ? -EIO : 0;
  }
  
-/* nbd_meta_base_query

- *
- * Handle query to 'base' namespace. For now, only base:allocation context is


[1]...


- * available in it.  'len' is the amount of text remaining to be read from
- * the current name, after the 'base:' portion has been stripped.
+/* Read strlen(@pattern) bytes, and set @match to true if they match @pattern.
+ * @match is never set to false.
   *
   * Return -errno on I/O error, 0 if option was completely handled by
   * sending a reply about inconsistent lengths, or 1 on success.
   *
- * Note: return code = 1 doesn't mean that we've parsed "base:allocation"
- * namespace. It only means that there are no errors.*/
-static int nbd_meta_base_query(NBDClient *client, NBDExportMetaContexts *meta,
-   uint32_t len, Error **errp)
+ * Note: return code = 1 doesn't mean that we've read exactly @pattern
+ * It only means that there are no errors. */


Comment tail on its own line (now that we've got a patch pending for 
HACKING to document that, I'll start abiding by it...)



+static int nbd_meta_pattern(NBDClient *client, const char *pattern, bool 
*match,
+Error **errp)
  {
  int ret;
-char query[sizeof("allocation") - 1];
-size_t alen = strlen("allocation");
+char *query;
+int len = strlen(pattern);


size_t is better than int for strlen() results.
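
The remark above about keeping strlen() results in a size_t can be sketched as follows. query_matches() is a hypothetical helper, not code from nbd/server.c; it assumes the query bytes were already read from the wire into a buffer that is not NUL-terminated, so the length is passed separately and memcmp() is used for the comparison.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: compare 'query_len' raw bytes against a
 * NUL-terminated pattern. Both lengths stay in size_t, the type that
 * strlen() returns, so no narrowing conversion to int takes place.
 */
static bool query_matches(const char *query, size_t query_len,
                          const char *pattern)
{
    size_t pattern_len = strlen(pattern);

    return query_len == pattern_len &&
           memcmp(query, pattern, pattern_len) == 0;
}
```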

  
-if (len == 0) {

-if (client->opt == NBD_OPT_LIST_META_CONTEXT) {
-meta->base_allocation = true;
-}
-trace_nbd_negotiate_meta_query_parse("base:");
-return 1;
-}
-
-if (len != alen) {
-trace_nbd_negotiate_meta_query_skip("not base:allocation");
-return nbd_opt_skip(client, len, errp);
-}
+assert(len);
  
+query = g_malloc(len);


At first, I wondered if we could just use a pre-allocated stack buffer 
larger than any string we ever anticipate.  But thinking about it, your 
dirty bitmap exports expose a name under user control, which means a 
user could (spitefully) pick a name longer than our buffer (well, up to 
the 4k name limit imposed by the NBD protocol).  So I can live with the 
malloc.



  ret = nbd_opt_read(client, query, len, errp);
  if (ret <= 0) {
+g_free(query);
  return ret;
  }
  
-if (strncmp(query, "allocation", alen) == 0) {

-trace_nbd_negotiate_meta_query_parse("base:allocation");
-meta->base_allocation = true;
+if (strncmp(query, pattern, len) == 0) {
+trace_nbd_negotiate_meta_query_parse(pattern);
+*match = true;
  } else {
-trace_nbd_negotiate_meta_query_skip("not base:allocation");
+trace_nbd_negotiate_meta_query_skip(pattern);


Would this one read better as "not %s", pattern?


  }
+g_free(query);
  
  return 1;

  }
  
+/* Read @len bytes, and set @match to true if they match @pattern, or if @len

+ * is 0 and the client is performing _LIST_. @match is never set to false.
+ *
+ * Return -errno on I/O error, 0 if option was completely handled by
+ * sending a reply about inconsistent lengths, or 1 on success.
+ *
+ * Note: return code = 1 doesn't mean that we've read exactly @pattern
+ * It only means that there are no errors. */


More comment formatting.


+static int nbd_meta_empty_or_pattern(NBDClient *client, const char *pattern,
+ uint32_t len, bool *match, Error **errp)
+{
+if (len == 0) {
+if (client->opt == NBD_OPT_LIST_META_CONTEXT) {
+*match = true;
+}
+trace_nbd_negotiate_meta_query_parse("empty");
+return 1;
+}
+
+if (len != strlen(pattern)) {
+trace_nbd_negotiate_meta_query_skip("different lengths");
+return nbd_opt_skip(client, len, errp);
+}
+
+return nbd_meta_pattern(client, pattern, match, errp);
+}
+
+/* nbd_meta_base_query
+ *
+ * Handle query to 'base' namespace. For now, only base:allocation context is


Pre-existing (see [1]), but reads better as "Handle queries to the 
'base' namespace"



+ * available in it.  'len' is the amount of text remaining to be read from
+ * the current name, 

Re: [Qemu-devel] [PATCH] [RFC v2] aio: properly bubble up errors from initialization

2018-06-19 Thread Nishanth Aravamudan via Qemu-devel
On 19.06.2018 [14:35:33 -0500], Eric Blake wrote:
> On 06/15/2018 12:47 PM, Nishanth Aravamudan via Qemu-devel wrote:
> > laio_init() can fail for a couple of reasons, which will lead to a NULL
> > pointer dereference in laio_attach_aio_context().
> > 
> > To solve this, add a aio_setup_linux_aio() function which is called
> > before aio_get_linux_aio() where it is called currently, and which
> > propogates setup errors up. The signature of aio_get_linux_aio() was not
> 
> s/propogates/propagates/
 
Thanks!

> > modified, because it seems preferable to return the actual errno from
> > the possible failing initialization calls.
> > 
> > With respect to the error-handling in the file-posix.c, we properly
> > bubble any errors up in raw_co_prw and in the case s of
> > raw_aio_{,un}plug, the result is the same as if s->use_linux_aio was not
> > set (but there is no bubbling up). In all three cases, if the setup
> > function fails, we fallback to the thread pool and an error message is
> > emitted.
> > 
> > It is trivial to make qemu segfault in my testing. Set
> > /proc/sys/fs/aio-max-nr to 0 and start a guest with
> > aio=native,cache=directsync. With this patch, the guest successfully
> > starts (but obviously isn't using native AIO). Setting aio-max-nr back
> > up to a reasonable value, AIO contexts are consumed normally.
> > 
> > Signed-off-by: Nishanth Aravamudan 
> > 
> > ---
> > 
> > Changes from v1 -> v2:
> 
> When posting a v2, it's best to post as a new thread, rather than
> in-reply-to the v1 thread, so that automated tooling knows to check the new
> patch.  More patch submission tips at
> https://wiki.qemu.org/Contribute/SubmitAPatch

My apologies! I'll fix this in a (future) v3.

> > Rather than affect virtio-scsi/blk at all, make all the changes internal
> > to file-posix.c. Thanks to Kevin Wolf for the suggested change.
> > ---
> >   block/file-posix.c  | 24 
> >   block/linux-aio.c   | 15 ++-
> >   include/block/aio.h |  3 +++
> >   include/block/raw-aio.h |  2 +-
> >   stubs/linux-aio.c   |  2 +-
> >   util/async.c| 15 ---
> >   6 files changed, 51 insertions(+), 10 deletions(-)
> > 
> > diff --git a/block/file-posix.c b/block/file-posix.c
> > index 07bb061fe4..2415d09bf1 100644
> > --- a/block/file-posix.c
> > +++ b/block/file-posix.c
> > @@ -1665,6 +1665,14 @@ static int coroutine_fn raw_co_prw(BlockDriverState 
> > *bs, uint64_t offset,
> >   type |= QEMU_AIO_MISALIGNED;
> >   #ifdef CONFIG_LINUX_AIO
> >   } else if (s->use_linux_aio) {
> > +int rc;
> > +rc = aio_setup_linux_aio(bdrv_get_aio_context(bs));
> > +if (rc != 0) {
> > +error_report("Unable to use native AIO, falling back to "
> > + "thread pool.");
> 
> In general, error_report() should not output a trailing '.'.

Will fix.

> > +s->use_linux_aio = 0;
> > +return rc;
> 
> Wait - the message claims we are falling back, but the non-zero return code
> sounds like we are returning an error instead of falling back.  (My
> preference - if the user requested something and we can't do it, it's better
> to error than to fall back to something that does not match the user's
> request).

I think that makes sense. I hadn't tested this specific case (in my
reading of the code, it wasn't clear to me whether raw_co_prw() could be
called before raw_aio_plug() had been called), but I think returning the
error code up should be handled correctly. What about the cases where
there is no error handling (the other two changes in the patch)?
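
The two behaviours under discussion (propagate the setup error versus silently fall back to the thread pool) can be contrasted in a small C sketch. Everything here is hypothetical: the struct, the enum and sketch_setup_native_aio() are stand-ins invented for illustration, and -EAGAIN is an arbitrary errno chosen for the sketch, not a claim about what the real setup call returns.

```c
#include <errno.h>
#include <stdbool.h>

enum io_path { IO_PATH_NATIVE_AIO, IO_PATH_THREAD_POOL };

struct sketch_state {
    bool use_linux_aio;     /* what the user asked for */
};

/* Hypothetical stand-in for the setup call: 0 on success, -errno on failure. */
static int sketch_setup_native_aio(bool host_has_aio_contexts)
{
    return host_has_aio_contexts ? 0 : -EAGAIN;
}

/* Policy 1: the user asked for native AIO, so a setup failure is an error. */
static int choose_path_strict(struct sketch_state *s, bool host_ok,
                              enum io_path *path)
{
    if (s->use_linux_aio) {
        int rc = sketch_setup_native_aio(host_ok);

        if (rc != 0) {
            return rc;      /* propagate, do not change the request */
        }
        *path = IO_PATH_NATIVE_AIO;
        return 0;
    }
    *path = IO_PATH_THREAD_POOL;
    return 0;
}

/* Policy 2: on setup failure, clear the flag and use the thread pool. */
static int choose_path_fallback(struct sketch_state *s, bool host_ok,
                                enum io_path *path)
{
    if (s->use_linux_aio && sketch_setup_native_aio(host_ok) != 0) {
        s->use_linux_aio = false;
    }
    *path = s->use_linux_aio ? IO_PATH_NATIVE_AIO : IO_PATH_THREAD_POOL;
    return 0;
}
```

The hunk quoted above mixes the two: it clears the flag as in policy 2 but returns the error as in policy 1, which is the inconsistency the review points out.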

> > +}
> >   LinuxAioState *aio = 
> > aio_get_linux_aio(bdrv_get_aio_context(bs));
> >   assert(qiov->size == bytes);
> >   return laio_co_submit(bs, aio, s->fd, offset, qiov, type);
> > @@ -1695,6 +1703,14 @@ static void raw_aio_plug(BlockDriverState *bs)
> >   #ifdef CONFIG_LINUX_AIO
> >   BDRVRawState *s = bs->opaque;
> >   if (s->use_linux_aio) {
> > +int rc;
> > +rc = aio_setup_linux_aio(bdrv_get_aio_context(bs));
> > +if (rc != 0) {
> > +error_report("Unable to use native AIO, falling back to "
> > + "thread pool.");
> > +s->use_linux_aio = 0;
> 
> Should s->use_linux_aio be a bool instead of an int?

It is:

bool use_linux_aio:1;

would you prefer I did a preparatory patch that converted users to
true/false?

Thanks,
Nish



Re: [Qemu-devel] [PATCH 2/6] nbd: allow authorization with nbd-server-start QMP command

2018-06-19 Thread Eric Blake

On 06/15/2018 10:50 AM, Daniel P. Berrangé wrote:

From: "Daniel P. Berrange" 

As with the previous patch to qemu-nbd, the nbd-server-start QMP command
also needs to be able to specify authorization when enabling TLS encryption.

First the client must create a QAuthZ object instance using the
'object-add' command:

{
  'execute': 'object-add',
  'arguments': {
    'qom-type': 'authz-simple',
    'id': 'authz0',
    'parameters': {
      'policy': 'deny',
      'rules': [
        {
          'match': '*CN=fred',
          'policy': 'allow'
        }
      ]
    }
  }
}

They can then reference this in the new 'tls-authz' parameter when
executing the 'nbd-server-start' command:

{
  'execute': 'nbd-server-start',
  'arguments': {
    'addr': {
      'type': 'inet',
      'host': '127.0.0.1',
      'port': '9000'
    },
    'tls-creds': 'tls0',
    'tls-authz': 'authz0'
  }
}


Is it worth using a discriminated union (string vs. QAuthZ) so that one 
could specify the authz policy inline rather than as a separate object, 
for convenience?  But that would be fine as a followup patch, if we even 
want it.




Signed-off-by: Daniel P. Berrange 
---
  blockdev-nbd.c  | 14 +++---
  hmp.c   |  2 +-
  include/block/nbd.h |  2 +-
  qapi/block.json |  4 +++-
  4 files changed, 16 insertions(+), 6 deletions(-)




@@ -118,6 +121,10 @@ void nbd_server_start(SocketAddress *addr, const char 
*tls_creds,
  }
  }
  
+if (tls_authz) {

+nbd_server->tlsauthz = g_strdup(tls_authz);
+}


Pointless 'if'; g_strdup() does the right thing.
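
The reason the guard is redundant is that GLib's g_strdup() returns NULL when given NULL. To keep the example free of a GLib dependency, the sketch below uses sketch_strdup(), a hypothetical stand-in that mirrors only that NULL behaviour (unlike g_strdup(), it can also return NULL when allocation fails).

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical stand-in for g_strdup(): duplicating NULL yields NULL,
 * so callers do not need an 'if (str)' guard around the call.
 */
static char *sketch_strdup(const char *str)
{
    size_t size;
    char *copy;

    if (str == NULL) {
        return NULL;
    }
    size = strlen(str) + 1;         /* include the terminating NUL */
    copy = malloc(size);
    if (copy != NULL) {
        memcpy(copy, str, size);
    }
    return copy;
}
```

With a function of this shape, the assignment to the tlsauthz field can be made unconditionally: an absent tls-authz simply stores NULL.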


+++ b/qapi/block.json
@@ -197,6 +197,7 @@
  #
  # @addr: Address on which to listen.
  # @tls-creds: (optional) ID of the TLS credentials object. Since 2.6
+# @tls-authz: (optional) ID of the QAuthZ authorization object. Since 2.13


No need for the string '(optional)' (I thought we killed those uses when 
we automated the documentation generation - but obviously a few were 
left behind).


s/2.13/3.0/

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



Re: [Qemu-devel] [virtio-dev] Re: [PATCH] qemu: Introduce VIRTIO_NET_F_STANDBY feature bit to virtio_net

2018-06-19 Thread Siwei Liu
On Tue, Jun 19, 2018 at 3:54 AM, Cornelia Huck  wrote:
> On Fri, 15 Jun 2018 10:06:07 -0700
> Siwei Liu  wrote:
>
>> On Fri, Jun 15, 2018 at 4:48 AM, Cornelia Huck  wrote:
>> > On Thu, 14 Jun 2018 18:57:11 -0700
>> > Siwei Liu  wrote:
>> >
>> >> Thank you for sharing your thoughts, Cornelia. With questions below, I
>> >> think you raised really good points, some of which I don't have answer
>> >> yet and would also like to explore here.
>> >>
>> >> First off, I don't want to push the discussion to the extreme at this
>> >> point, or sell anything about having QEMU manage everything
>> >> automatically. Don't get me wrong, it's not there yet. Let's not
>> >> assume we are tied to a specific or concrete solution. I think the key
>> >> for our discussion might be to define or refine the boundary between
>> >> VM and guest,  e.g. what each layer is expected to control and manage
>> >> exactly.
>> >>
>> >> In my view, there might be possibly 3 different options to represent
>> >> the failover device concept to QEMU and libvirt (or any upper layer
>> >> software):
>> >>
>> >> a. Separate device: in this model, virtio and passthrough remain
>> >> separate devices just as today. QEMU exposes the standby feature bit
>> >> for virtio, and publishes status/events around the negotiation process of
>> >> this feature bit for libvirt to react upon. Since libvirt has the
>> >> pairing relationship itself, maybe through MAC address or something
>> >> else, it can control the presence of primary by hot plugging or
>> >> unplugging the passthrough device, although it has to work tightly
>> >> with virtio's feature negotiation process. Not just for migration but
>> >> also various corner scenarios (driver/feature ok, device reset,
>> >> reboot, legacy guest etc) along virtio's feature negotiation.
>> >
>> > Yes, that one has obvious tie-ins to virtio's modus operandi.
>> >
>> >>
>> >> b. Coupled device: in this model, virtio and passthrough devices are
>> >> weakly coupled using some group ID, i.e. QEMU matches the passthrough
>> >> device for a standby virtio instance by comparing the group ID value
>> >> present behind each device's bridge. Libvirt provides QEMU the group
>> >> ID for both types of devices, and only deals with hot plug for
>> >> migration, by checking some migration status exposed (e.g. the feature
>> >> negotiation status on the virtio device) by QEMU. QEMU manages the
>> >> visibility of the primary in guest along virtio's feature negotiation
>> >> process.
>> >
>> > I'm a bit confused here. What, exactly, ties the two devices together?
>>
>> The group UUID. Since the QEMU VFIO device does not have insight into the MAC
>> address (which it doesn't have to), the association between VFIO
>> passthrough and standby must be specified for QEMU to understand the
>> relationship with this model. Note, standby feature is no longer
>> required to be exposed under this model.
>
> Isn't that a bit limiting, though?
>
> With this model, you can probably tie a vfio-pci device and a
> virtio-net-pci device together. But this will fail if you have
> different transports: Consider tying together a vfio-pci device and a
> virtio-net-ccw device on s390, for example. The standby feature bit is
> on the virtio-net level and should not have any dependency on the
> transport used.

We'd probably limit grouping support to virtio-net-pci and vfio-pci
devices only. For virtio-net-pci, as you might see with Venu's patch,
we store the group UUID in the config space of virtio-pci, which is
only applicable to the PCI transport.

If virtio-net-ccw needs to support the same, I think a similar grouping
interface should be defined on the VirtIO CCW transport. I think the
current implementation of the Linux failover driver assumes that the
device virtio-net-pci pairs with is an SR-IOV VF with the same MAC
address, and that the PV path is on the same PF, so the network does
not need to be updated about a change in the port-MAC association. If
we need to extend the grouping mechanism to virtio-net-ccw, it has to
pass such a failover mode to the virtio driver specifically through
some other option, I guess.

>
>>
>> > If libvirt already has the knowledge that it should manage the two as a
>> > couple, why do we need the group id (or something else for other
>> > architectures)? (Maybe I'm simply missing something because I'm not
>> > that familiar with pci.)
>>
>> The idea is to have QEMU control the visibility and enumeration order
>> of the passthrough VFIO for the failover scenario. Hotplug can be one
>> way to achieve it, and perhaps there are other ways as well. The
>> group ID is not just for QEMU to couple devices; it is also helpful
>> to the guest, as grouping using the MAC address is just not safe.
>
> Sorry about dragging mainframes into this, but this will only work for
> homogenous device coupling, not for heterogenous. Consider my vfio-pci
> + virtio-net-ccw example again: The guest cannot find out that the two
> belong together by checking some group ID, it has to 

Re: [Qemu-devel] [PATCH 1/6] qemu-nbd: add support for authorization of TLS clients

2018-06-19 Thread Eric Blake

On 06/15/2018 10:50 AM, Daniel P. Berrangé wrote:

From: "Daniel P. Berrange" 

Currently any client which can complete the TLS handshake is able to use
the NBD server. The server admin can turn on the 'verify-peer' option
for the x509 creds to require the client to provide a x509 certificate.
This means the client will have to acquire a certificate from the CA
before they are permitted to use the NBD server. This is still a fairly
low bar to cross.

This adds a '--tls-authz OBJECT-ID' option to the qemu-nbd command which
takes the ID of a previously added 'QAuthZ' object instance. This will
be used to validate the client's x509 distinguished name. Clients
failing the authorization check will not be permitted to use the NBD
server.

For example, to set up authorization that only allows connections from a client
whose x509 certificate distinguished name contains 'CN=fred', you would
use:

   qemu-nbd -object tls-creds-x509,id=tls0,dir=/home/berrange/qemutls,\
endpoint=server,verify-peer=yes \
-object authz-simple,id=authz0,policy=deny,\
   rules.0.match=*CN=fred,rules.0.policy=allow \


s/-object/--object/g


-tls-creds tls0 \
-tls-authz authz0


s/-tls/--tls/g

(qemu-nbd requires double-dash long options: '-object' is parsed as
'-o bject', i.e. --offset with the invalid offset 'bject'; similarly,
'-tls-creds' starts with -t, which means --persistent)



   other qemu-nbd args...

Signed-off-by: Daniel P. Berrange 
---
  include/block/nbd.h |  2 +-
  nbd/server.c| 12 +++-
  qemu-nbd.c  | 13 -
  qemu-nbd.texi   |  4 
  4 files changed, 24 insertions(+), 7 deletions(-)



+++ b/nbd/server.c



@@ -2153,7 +2153,9 @@ void nbd_client_new(NBDExport *exp,
  if (tlscreds) {
  object_ref(OBJECT(client->tlscreds));
  }
-client->tlsaclname = g_strdup(tlsaclname);
+if (tlsauthz) {
+client->tlsauthz = g_strdup(tlsauthz);
+}


The 'if' is pointless; g_strdup(NULL) is safe.
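
A minimal sketch of why the guard is redundant. GLib documents g_strdup() as returning NULL when it is passed NULL; dup_or_null() below is a hypothetical plain-C stand-in with the same NULL contract, used only so the example builds without GLib:

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical stand-in mirroring the documented g_strdup() contract
 * for NULL input: NULL in, NULL out; anything else yields a heap copy.
 * Unlike g_strdup(), which aborts on allocation failure, this helper
 * returns NULL if malloc() fails.
 */
static char *dup_or_null(const char *s)
{
    size_t len;
    char *copy;

    if (s == NULL) {
        return NULL;
    }
    len = strlen(s) + 1;        /* include the terminating NUL */
    copy = malloc(len);
    if (copy != NULL) {
        memcpy(copy, s, len);
    }
    return copy;
}
```

Assuming the tlsauthz field starts out NULL, an unconditional "client->tlsauthz = g_strdup(tlsauthz);" leaves it in the same state as the guarded version.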



+++ b/qemu-nbd.c



@@ -533,6 +535,7 @@ int main(int argc, char **argv)
  { "image-opts", no_argument, NULL, QEMU_NBD_OPT_IMAGE_OPTS },
  { "trace", required_argument, NULL, 'T' },
  { "fork", no_argument, NULL, QEMU_NBD_OPT_FORK },
+{ "tls-authz", no_argument, NULL, QEMU_NBD_OPT_TLSAUTHZ },


Not your fault, but worth sorting these alphabetically?

Bummer that pre-patch, you could use '--tls' as an unambiguous 
abbreviation for --tls-creds; now it is an ambiguous prefix (you have to 
type --tls-c or --tls-a to get to the point of no ambiguity).  If we 
really cared, we could add:


{ "t", required_argument, NULL, QEMU_NBD_OPT_TLSCREDS },
{ "tl", required_argument, NULL, QEMU_NBD_OPT_TLSCREDS },
{ "tls", required_argument, NULL, QEMU_NBD_OPT_TLSCREDS },
{ "tls-", required_argument, NULL, QEMU_NBD_OPT_TLSCREDS },

since getopt_long() no longer reports ambiguity if there is an exact 
match to what is otherwise the common prefix of two ambiguous options. 
But I don't think backwards-compatibility on this front is worth 
worrying about (generally, scripts don't rely on getopt_long()'s 
unambiguous prefix handling).
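
To illustrate the prefix behaviour, here is a small sketch, assuming glibc's getopt_long(). The option values, the parse_one() helper and the reduced option tables are made up for the demo and are not qemu-nbd's real table; only a single exact "tls" alias is shown rather than all four entries above:

```c
#include <getopt.h>
#include <stddef.h>

/* Illustrative option values, not the ones qemu-nbd uses. */
enum { OPT_TLSCREDS = 256, OPT_TLSAUTHZ = 257 };

/*
 * Parse one long option and return getopt_long()'s result: the
 * option's val on success, '?' for an unknown or ambiguous option.
 * When with_alias is non-zero, the table also contains an exact
 * "tls" entry mapped to OPT_TLSCREDS.
 */
static int parse_one(const char *arg, int with_alias)
{
    static const struct option with[] = {
        { "tls-creds", required_argument, NULL, OPT_TLSCREDS },
        { "tls-authz", required_argument, NULL, OPT_TLSAUTHZ },
        { "tls",       required_argument, NULL, OPT_TLSCREDS },
        { NULL, 0, NULL, 0 }
    };
    static const struct option without[] = {
        { "tls-creds", required_argument, NULL, OPT_TLSCREDS },
        { "tls-authz", required_argument, NULL, OPT_TLSAUTHZ },
        { NULL, 0, NULL, 0 }
    };
    /* getopt may reorder argv, but it does not modify the strings. */
    char *argv[] = { (char *)"demo", (char *)arg, (char *)"tls0", NULL };

    optind = 0;     /* glibc: force re-initialisation between calls */
    opterr = 0;     /* keep the demo quiet on ambiguity */
    return getopt_long(3, argv, "", with_alias ? with : without, NULL);
}
```

With only the two real options, "--tls" is rejected as ambiguous while "--tls-c" and "--tls-a" resolve; adding the exact "tls" entry makes "--tls" resolve to the creds option again.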



+++ b/qemu-nbd.texi
@@ -91,6 +91,10 @@ of the TLS credentials object previously created with the 
--object
  option.
  @item --fork
  Fork off the server process and exit the parent once the server is running.
+@item --tls-authz=ID
+Specify the ID of a qauthz object previously created with the


s/qauthz/authz-simple/ ?


+--object option. This will be used to authorize users who
+connect against their x509 distinguished name.


Sounds like someone is "connecting against their name", rather than 
"authorizing against their name".  Better might be:


This will be used to authorize connecting users against their x509 
distinguished name.


--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



Re: [Qemu-devel] [PATCH v2] migration: fix crash in when incoming client channel setup fails

2018-06-19 Thread Juan Quintela
Daniel P. Berrangé  wrote:
> The way we determine if we can start the incoming migration was
> changed to use migration_has_all_channels() in:
>
>   commit 428d89084c709e568f9cd301c2f6416a54c53d6d
>   Author: Juan Quintela 
>   Date:   Mon Jul 24 13:06:25 2017 +0200
>
> migration: Create migration_has_all_channels
>
> This method in turn calls multifd_recv_all_channels_created()
> which is hardcoded to always return 'true' when multifd is
> not in use. This is a latent bug...
>
> ...activated in a following commit where that return result
> ends up acting as the flag to indicate whether it is possible
> to start processing the migration:
>
>   commit 36c2f8be2c4eb0003ac77a14910842b7ddd7337e
>   Author: Juan Quintela 
>   Date:   Wed Mar 7 08:40:52 2018 +0100
>
> migration: Delay start of migration main routines
>
> This means that if channel initialization fails with normal
> migration, it'll never notice and attempt to start the
> incoming migration regardless and crash on a NULL pointer.
>
> This can be seen, for example, if a client connects to a server
> requiring TLS, but has an invalid x509 certificate:
>
> qemu-system-x86_64: The certificate hasn't got a known issuer
> qemu-system-x86_64: migration/migration.c:386: process_incoming_migration_co: 
> Assertion `mis->from_src_file' failed.
>
>  #0  0x7fffebd24f2b in raise () at /lib64/libc.so.6
>  #1  0x7fffebd0f561 in abort () at /lib64/libc.so.6
>  #2  0x7fffebd0f431 in _nl_load_domain.cold.0 () at /lib64/libc.so.6
>  #3  0x7fffebd1d692 in  () at /lib64/libc.so.6
>  #4  0x55ad027e in process_incoming_migration_co (opaque=<optimized out>) at migration/migration.c:386
>  #5  0x55c45e8b in coroutine_trampoline (i0=, 
> i1=) at util/coroutine-ucontext.c:116
>  #6  0x7fffebd3a6a0 in __start_context () at /lib64/libc.so.6
>  #7  0x in  ()
>
> To handle the non-multifd case, we check whether mis->from_src_file
> is non-NULL. With this in place, the migration server drops the
> rejected client and stays around waiting for another, hopefully
> valid, client to arrive.
>
> Signed-off-by: Daniel P. Berrangé 

Reviewed-by: Juan Quintela 
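
For readers skimming the thread, the check described in the commit message can be sketched roughly as follows. This is a self-contained model, not the QEMU code: the struct, its fields and both function names are hypothetical stand-ins.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the incoming-migration state. */
typedef struct {
    void *from_src_file;    /* main channel; NULL until it is set up */
    bool multifd_enabled;
    int multifd_expected;   /* number of multifd channels wanted */
    int multifd_created;    /* number of multifd channels created */
} IncomingState;

/* Models the helper that always answers 'true' without multifd. */
static bool multifd_all_created(const IncomingState *s)
{
    if (!s->multifd_enabled) {
        return true;
    }
    return s->multifd_created == s->multifd_expected;
}

/*
 * Models the fixed check: besides the multifd answer, the main
 * channel must exist, so a failed channel setup is not mistaken
 * for "ready to start".
 */
static bool has_all_channels(const IncomingState *s)
{
    return multifd_all_created(s) && s->from_src_file != NULL;
}
```

In this model a rejected client leaves from_src_file NULL, has_all_channels() stays false, and the incoming side never reaches the assertion shown in the backtrace.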



[Qemu-devel] [PATCH] simpletrace: Convert name from mapping record to str

2018-06-19 Thread Eduardo Habkost
The rest of the code assumes that idtoname is a (int -> str)
dictionary, so convert the data accordingly.

This is necessary to make the script work with Python 3 (where
reads from a binary file return 'bytes' objects, not 'str').

Fixes the following error:

  $ python3 ./scripts/simpletrace.py trace-events-all trace-27445
  b'object_class_dynamic_cast_assert' event is logged but is not \
  declared in the trace events file, try using trace-events-all instead.

Signed-off-by: Eduardo Habkost 
---
 scripts/simpletrace.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/scripts/simpletrace.py b/scripts/simpletrace.py
index d4a50a1e2b..4ad34f90cd 100755
--- a/scripts/simpletrace.py
+++ b/scripts/simpletrace.py
@@ -70,7 +70,7 @@ def get_record(edict, idtoname, rechdr, fobj):
 def get_mapping(fobj):
 (event_id, ) = struct.unpack('=Q', fobj.read(8))
 (len, ) = struct.unpack('=L', fobj.read(4))
-name = fobj.read(len)
+name = fobj.read(len).decode()
 
 return (event_id, name)
 
-- 
2.18.0.rc1.1.g3f1ff2140




[Qemu-devel] [PATCH v1 6/6] qga: removing bios_supports_mode

2018-06-19 Thread Daniel Henrique Barboza
bios_supports_mode verifies whether the guest supports a certain
suspend mode, but it doesn't report back which suspend tool
provides it. The caller, guest_suspend, then executes all suspend
strategies in order again.

After adding systemd suspend support, bios_supports_mode now
checks for systemd support, then pmutils, then the Linux sys state
file. In a worst-case scenario where neither systemd nor pmutils is
supported but Linux sys state is:

- bios_supports_mode will check for systemd, then pmutils, then
Linux sys state. It will tell guest_suspend that there is support,
but it will not tell who provides it;

- guest_suspend will try to execute (and fail) systemd suspend,
then pmutils suspend, to only then use the Linux sys suspend.
The time spent executing systemd and pmutils suspend was wasted
and could have been avoided: bios_supports_mode knew which
mechanism was supported but did not report it back.

A quicker approach is to nuke bios_supports_mode and track
whether we found support at all with a bool flag inside
guest_suspend. guest_suspend will search for suspend support
and execute it as soon as possible. If a given suspend
mechanism fails, continue to the next. If no suspend
support is found, the "not supported" message is still
sent back to the user.
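
The control flow described above can be sketched as a loop over strategies. This is only an illustration under assumptions: the SuspendStrategy table, the SuspendResult enum, try_suspend() and the stub callbacks are hypothetical, whereas the patch calls the systemd, pmutils and sys state functions one after the other.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical strategy descriptor (not part of the patch). */
typedef struct {
    bool (*supports)(int mode);
    bool (*suspend)(int mode);      /* returns true on success */
} SuspendStrategy;

typedef enum {
    SUSPEND_OK,
    SUSPEND_FAILED,         /* supported somewhere, but every try failed */
    SUSPEND_UNSUPPORTED,    /* no strategy supports the mode */
} SuspendResult;

/*
 * Try each strategy in order: skip unsupported ones, run the first
 * supported one right away, and fall through to the next on failure.
 */
static SuspendResult try_suspend(const SuspendStrategy *s, size_t n, int mode)
{
    bool mode_supported = false;
    size_t i;

    for (i = 0; i < n; i++) {
        if (!s[i].supports(mode)) {
            continue;
        }
        mode_supported = true;
        if (s[i].suspend(mode)) {
            return SUSPEND_OK;
        }
    }
    return mode_supported ? SUSPEND_FAILED : SUSPEND_UNSUPPORTED;
}

/* Stub callbacks for experimenting with the loop. */
static int suspend_calls;
static bool yes(int mode) { (void)mode; return true; }
static bool no(int mode) { (void)mode; return false; }
static bool ok(int mode) { (void)mode; suspend_calls++; return true; }
static bool fail(int mode) { (void)mode; suspend_calls++; return false; }
```

With a table where only the last entry reports support, exactly one suspend call is made, which is the saving the commit message is after.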

Signed-off-by: Daniel Henrique Barboza 
---
 qga/commands-posix.c | 54 +++-
 1 file changed, 18 insertions(+), 36 deletions(-)

diff --git a/qga/commands-posix.c b/qga/commands-posix.c
index 6a573de86d..79acc28ee7 100644
--- a/qga/commands-posix.c
+++ b/qga/commands-posix.c
@@ -1681,60 +1681,42 @@ static void linux_sys_state_suspend(SuspendMode mode, 
Error **errp)
 
 }
 
-static void bios_supports_mode(SuspendMode mode, Error **errp)
-{
-Error *local_err = NULL;
-bool ret;
-
-ret = systemd_supports_mode(mode, &local_err);
-if (ret) {
-return;
-}
-if (local_err) {
-error_propagate(errp, local_err);
-return;
-}
-ret = pmutils_supports_mode(mode, &local_err);
-if (ret) {
-return;
-}
-if (local_err) {
-error_propagate(errp, local_err);
-return;
-}
-ret = linux_sys_state_supports_mode(mode, &local_err);
-if (!ret) {
-error_setg(errp,
-   "the requested suspend mode is not supported by the guest");
-}
-}
-
 static void guest_suspend(SuspendMode mode, Error **errp)
 {
 Error *local_err = NULL;
+bool mode_supported = false;
 
-bios_supports_mode(mode, &local_err);
-if (local_err) {
-error_propagate(errp, local_err);
-return;
+if (systemd_supports_mode(mode, &local_err)) {
+mode_supported = true;
+systemd_suspend(mode, &local_err);
 }
 
-systemd_suspend(mode, &local_err);
 if (!local_err) {
 return;
 }
 
 local_err = NULL;
 
-pmutils_suspend(mode, &local_err);
+if (pmutils_supports_mode(mode, &local_err)) {
+mode_supported = true;
+pmutils_suspend(mode, &local_err);
+}
+
 if (!local_err) {
 return;
 }
 
 local_err = NULL;
 
-linux_sys_state_suspend(mode, &local_err);
-if (local_err) {
+if (linux_sys_state_supports_mode(mode, &local_err)) {
+mode_supported = true;
+linux_sys_state_suspend(mode, &local_err);
+}
+
+if (!mode_supported) {
+error_setg(errp,
+   "the requested suspend mode is not supported by the guest");
+} else if (local_err) {
 error_propagate(errp, local_err);
 }
 }
-- 
2.17.1




[Qemu-devel] [PATCH v1 5/6] qga: adding systemd hibernate/suspend/hybrid-sleep support

2018-06-19 Thread Daniel Henrique Barboza
pmutils isn't supported by newer OSes like Fedora 27
or Mint. This means that the only suspend option QGA offers
for these guests is writing directly into the Linux sys state
file. It also means that QGA loses the ability to do
hybrid suspend in those guests - this suspend mode is only
available when using pmutils.

Newer guests can use systemd facilities to do all the suspend
types QGA supports. The mapping in comparison with pmutils is:

- pm-hibernate -> systemctl hibernate
- pm-suspend -> systemctl suspend
- pm-suspend-hybrid -> systemctl hybrid-sleep

To discover whether systemd supports these functions, we inspect
the status of the services that implement them.
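
As a rough sketch of how that status is interpreted: "systemctl status" follows the LSB init-script exit codes, and the patch treats any code strictly between 0 and 4 as "the unit exists". The enum names and the helper below are illustrative only; the range test mirrors the one in systemd_supports_mode.

```c
#include <stdbool.h>

/* LSB status codes as returned by "systemctl status". */
enum {
    LSB_STATUS_RUNNING       = 0,   /* running or service is OK */
    LSB_STATUS_DEAD_PIDFILE  = 1,   /* dead, pid file exists */
    LSB_STATUS_DEAD_LOCKFILE = 2,   /* dead, lock file exists */
    LSB_STATUS_NOT_RUNNING   = 3,   /* not running (the common case) */
    LSB_STATUS_UNKNOWN       = 4,   /* status unknown, e.g. no such unit */
};

/*
 * Hypothetical helper: true when the exit code says the unit is
 * known but not running, which the patch reads as "supported".
 */
static bool systemd_unit_known(int status)
{
    return status > LSB_STATUS_RUNNING && status < LSB_STATUS_UNKNOWN;
}
```

Note that with this range test a status of 0 is not counted as supported, and neither is the -1 that run_process_child returns when the binary is not found.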

With this patch, we can offer hybrid suspend again for newer
guests that do not have pmutils support anymore.

Signed-off-by: Daniel Henrique Barboza 
---
 qga/commands-posix.c | 72 
 1 file changed, 72 insertions(+)

diff --git a/qga/commands-posix.c b/qga/commands-posix.c
index d5e3805ce9..6a573de86d 100644
--- a/qga/commands-posix.c
+++ b/qga/commands-posix.c
@@ -1486,6 +1486,63 @@ out:
 return ret;
 }
 
+static bool systemd_supports_mode(SuspendMode mode, Error **errp)
+{
+Error *local_err = NULL;
+const char *systemctl_args[3] = {"systemd-hibernate", "systemd-suspend",
+ "systemd-hybrid-sleep"};
+const char *cmd[4] = {"systemctl", "status", systemctl_args[mode], NULL};
+int status;
+
+status = run_process_child(cmd, &local_err);
+
+/*
+ * systemctl status uses LSB return codes so we can expect
+ * status > 0 and be ok. To assert if the guest has support
+ * for the selected suspend mode, status should be < 4. 4 is
+ * the code for unknown service status, the return value when
+ * the service does not exist. A common value is status = 3
+ * (program is not running).
+ */
+if (status > 0 && status < 4) {
+return true;
+}
+
+if (local_err) {
+error_propagate(errp, local_err);
+}
+
+return false;
+}
+
+static void systemd_suspend(SuspendMode mode, Error **errp)
+{
+Error *local_err = NULL;
+const char *systemctl_args[3] = {"hibernate", "suspend", "hybrid-sleep"};
+const char *cmd[3] = {"systemctl", systemctl_args[mode], NULL};
+int status;
+
+status = run_process_child(cmd, &local_err);
+
+if (status == 0) {
+return;
+}
+
+if (status == -1) {
+error_setg(errp, "the helper program '%s' was not found",
+   systemctl_args[mode]);
+return;
+}
+
+if (local_err) {
+error_propagate(errp, local_err);
+} else {
+error_setg(errp, "the helper program 'systemctl %s' returned an "
+   " unexpected exit status code (%d)",
+   systemctl_args[mode], status);
+}
+}
+
 static bool pmutils_supports_mode(SuspendMode mode, Error **errp)
 {
 Error *local_err = NULL;
@@ -1629,6 +1686,14 @@ static void bios_supports_mode(SuspendMode mode, Error 
**errp)
 Error *local_err = NULL;
 bool ret;
 
+ret = systemd_supports_mode(mode, &local_err);
+if (ret) {
+return;
+}
+if (local_err) {
+error_propagate(errp, local_err);
+return;
+}
 ret = pmutils_supports_mode(mode, &local_err);
 if (ret) {
 return;
@@ -1654,6 +1719,13 @@ static void guest_suspend(SuspendMode mode, Error **errp)
 return;
 }
 
+systemd_suspend(mode, &local_err);
+if (!local_err) {
+return;
+}
+
+local_err = NULL;
+
 pmutils_suspend(mode, &local_err);
 if (!local_err) {
 return;
-- 
2.17.1




Re: [Qemu-devel] [PATCH] ppc: Include vga cirrus card into the compiling process

2018-06-19 Thread Sebastian Bauer

Hello David,

Am 2018-06-19 06:36, schrieb David Gibson:

Ok.  However, your patch doesn't apply against the ppc-for-3.0 tree.
It looks like you've made it against a tree including some of BALATON
Zoltan's proposed but not yet merged patches.

Please make sure your patches are against the current ppc-for-3.0 tree
before posting.


Okay, I'm sorry for the wrong timing. It is okay to wait for Zoltan's 
patch queue to be applied before applying this patch as I don't want to 
introduce new conflicts in those patches.


Bye
Sebastian



[Qemu-devel] [PATCH v1 3/6] qga: guest_suspend: decoupling pm-utils and sys logic

2018-06-19 Thread Daniel Henrique Barboza
Following the same logic of the previous patch, let's also
decouple the suspend logic from guest_suspend into specialized
functions, one for each strategy we support at this moment.

Signed-off-by: Daniel Henrique Barboza 
---
 qga/commands-posix.c | 170 +++
 1 file changed, 108 insertions(+), 62 deletions(-)

diff --git a/qga/commands-posix.c b/qga/commands-posix.c
index 89ffd8dc88..a2870f9ab9 100644
--- a/qga/commands-posix.c
+++ b/qga/commands-posix.c
@@ -1509,6 +1509,65 @@ out:
 return ret;
 }
 
+static void pmutils_suspend(int suspend_mode, Error **errp)
+{
+Error *local_err = NULL;
+const char *pmutils_bin;
+char *pmutils_path;
+pid_t pid;
+int status;
+
+switch (suspend_mode) {
+
+case SUSPEND_MODE_DISK:
+pmutils_bin = "pm-hibernate";
+break;
+case SUSPEND_MODE_RAM:
+pmutils_bin = "pm-suspend";
+break;
+case SUSPEND_MODE_HYBRID:
+pmutils_bin = "pm-suspend-hybrid";
+break;
+default:
+error_setg(errp, "unknown guest suspend mode");
+return;
+}
+
+pmutils_path = g_find_program_in_path(pmutils_bin);
+if (!pmutils_path) {
+error_setg(errp, "the helper program '%s' was not found", pmutils_bin);
+return;
+}
+
+pid = fork();
+if (!pid) {
+setsid();
+execle(pmutils_path, pmutils_bin, NULL, environ);
+/*
+ * If we get here execle() has failed.
+ */
+_exit(EXIT_FAILURE);
+} else if (pid < 0) {
+error_setg_errno(errp, errno, "failed to create child process");
+goto out;
+}
+
+ga_wait_child(pid, &status, &local_err);
+if (local_err) {
+error_propagate(errp, local_err);
+goto out;
+}
+
+if (WEXITSTATUS(status)) {
+error_setg(errp,
+   "the helper program '%s' returned an unexpected exit status"
+   " code (%d)", pmutils_path, WEXITSTATUS(status));
+}
+
+out:
+g_free(pmutils_path);
+}
+
 static bool linux_sys_state_supports_mode(int suspend_mode, Error **errp)
 {
 const char *sysfile_str;
@@ -1545,64 +1604,28 @@ static bool linux_sys_state_supports_mode(int 
suspend_mode, Error **errp)
 return false;
 }
 
-static void bios_supports_mode(int suspend_mode, Error **errp)
-{
-Error *local_err = NULL;
-bool ret;
-
-ret = pmutils_supports_mode(suspend_mode, &local_err);
-if (ret) {
-return;
-}
-if (local_err) {
-error_propagate(errp, local_err);
-return;
-}
-ret = linux_sys_state_supports_mode(suspend_mode, errp);
-if (!ret) {
-error_setg(errp,
-   "the requested suspend mode is not supported by the guest");
-return;
-}
-}
-
-static void guest_suspend(int suspend_mode, Error **errp)
+static void linux_sys_state_suspend(int suspend_mode, Error **errp)
 {
 Error *local_err = NULL;
-const char *pmutils_bin, *sysfile_str;
-char *pmutils_path;
+const char *sysfile_str;
 pid_t pid;
 int status;
 
-bios_supports_mode(suspend_mode, &local_err);
-if (local_err) {
-error_propagate(errp, local_err);
-return;
-}
-
 switch (suspend_mode) {
 
 case SUSPEND_MODE_DISK:
-pmutils_bin = "pm-hibernate";
 sysfile_str = "disk";
 break;
 case SUSPEND_MODE_RAM:
-pmutils_bin = "pm-suspend";
 sysfile_str = "mem";
 break;
-case SUSPEND_MODE_HYBRID:
-pmutils_bin = "pm-suspend-hybrid";
-sysfile_str = NULL;
-break;
 default:
 error_setg(errp, "unknown guest suspend mode");
 return;
 }
 
-pmutils_path = g_find_program_in_path(pmutils_bin);
-
 pid = fork();
-if (pid == 0) {
+if (!pid) {
 /* child */
 int fd;
 
@@ -1611,19 +1634,6 @@ static void guest_suspend(int suspend_mode, Error **errp)
 reopen_fd_to_null(1);
 reopen_fd_to_null(2);
 
-if (pmutils_path) {
-execle(pmutils_path, pmutils_bin, NULL, environ);
-}
-
-/*
- * If we get here either pm-utils is not installed or execle() has
- * failed. Let's try the manual method if the caller wants it.
- */
-
-if (!sysfile_str) {
-_exit(EXIT_FAILURE);
-}
-
 fd = open(LINUX_SYS_STATE_FILE, O_WRONLY);
 if (fd < 0) {
 _exit(EXIT_FAILURE);
@@ -1636,27 +1646,63 @@ static void guest_suspend(int suspend_mode, Error 
**errp)
 _exit(EXIT_SUCCESS);
 } else if (pid < 0) {
 error_setg_errno(errp, errno, "failed to create child process");
-goto out;
+return;
 }
 
 ga_wait_child(pid, &status, &local_err);
 if (local_err) {
 error_propagate(errp, local_err);
-goto out;
-}
-
-if (!WIFEXITED(status)) {
-error_setg(errp, "child process has terminated abnormally");
-goto out;
+return;
 }
 
 if 

[Qemu-devel] [PATCH v1 1/6] qga: refactoring qmp_guest_suspend_* functions

2018-06-19 Thread Daniel Henrique Barboza
To be able to add new suspend mechanisms we need to detach
the existing QMP functions from the current implementation
specifics.

At this moment we have functions such as qmp_guest_suspend_ram
calling bios_supports_mode and guest_suspend, passing the
pmutils command and arguments as parameters. This patch
removes this logic from the QMP functions, moving it to
the respective functions that will have to deal with which
binary to use.

Signed-off-by: Daniel Henrique Barboza 
---
 qga/commands-posix.c | 87 
 1 file changed, 55 insertions(+), 32 deletions(-)

diff --git a/qga/commands-posix.c b/qga/commands-posix.c
index eae817191b..63c49791a4 100644
--- a/qga/commands-posix.c
+++ b/qga/commands-posix.c
@@ -1438,15 +1438,38 @@ qmp_guest_fstrim(bool has_minimum, int64_t minimum, 
Error **errp)
 #define LINUX_SYS_STATE_FILE "/sys/power/state"
 #define SUSPEND_SUPPORTED 0
 #define SUSPEND_NOT_SUPPORTED 1
+#define SUSPEND_MODE_DISK 1
+#define SUSPEND_MODE_RAM 2
+#define SUSPEND_MODE_HYBRID 3
 
-static void bios_supports_mode(const char *pmutils_bin, const char 
*pmutils_arg,
-   const char *sysfile_str, Error **errp)
+static void bios_supports_mode(int suspend_mode, Error **errp)
 {
 Error *local_err = NULL;
+const char *pmutils_arg, *sysfile_str;
+const char *pmutils_bin = "pm-is-supported";
 char *pmutils_path;
 pid_t pid;
 int status;
 
+switch (suspend_mode) {
+
+case SUSPEND_MODE_DISK:
+pmutils_arg = "--hibernate";
+sysfile_str = "disk";
+break;
+case SUSPEND_MODE_RAM:
+pmutils_arg = "--suspend";
+sysfile_str = "mem";
+break;
+case SUSPEND_MODE_HYBRID:
+pmutils_arg = "--suspend-hybrid";
+sysfile_str = NULL;
+break;
+default:
+error_setg(errp, "guest suspend mode not supported");
+return;
+}
+
 pmutils_path = g_find_program_in_path(pmutils_bin);
 
 pid = fork();
@@ -1523,14 +1546,39 @@ out:
 g_free(pmutils_path);
 }
 
-static void guest_suspend(const char *pmutils_bin, const char *sysfile_str,
-  Error **errp)
+static void guest_suspend(int suspend_mode, Error **errp)
 {
 Error *local_err = NULL;
+const char *pmutils_bin, *sysfile_str;
 char *pmutils_path;
 pid_t pid;
 int status;
 
+bios_supports_mode(suspend_mode, &local_err);
+if (local_err) {
+error_propagate(errp, local_err);
+return;
+}
+
+switch (suspend_mode) {
+
+case SUSPEND_MODE_DISK:
+pmutils_bin = "pm-hibernate";
+sysfile_str = "disk";
+break;
+case SUSPEND_MODE_RAM:
+pmutils_bin = "pm-suspend";
+sysfile_str = "mem";
+break;
+case SUSPEND_MODE_HYBRID:
+pmutils_bin = "pm-suspend-hybrid";
+sysfile_str = NULL;
+break;
+default:
+error_setg(errp, "unknown guest suspend mode");
+return;
+}
+
 pmutils_path = g_find_program_in_path(pmutils_bin);
 
 pid = fork();
@@ -1593,42 +1641,17 @@ out:
 
 void qmp_guest_suspend_disk(Error **errp)
 {
-Error *local_err = NULL;
-
-bios_supports_mode("pm-is-supported", "--hibernate", "disk", &local_err);
-if (local_err) {
-error_propagate(errp, local_err);
-return;
-}
-
-guest_suspend("pm-hibernate", "disk", errp);
+guest_suspend(SUSPEND_MODE_DISK, errp);
 }
 
 void qmp_guest_suspend_ram(Error **errp)
 {
-Error *local_err = NULL;
-
-bios_supports_mode("pm-is-supported", "--suspend", "mem", &local_err);
-if (local_err) {
-error_propagate(errp, local_err);
-return;
-}
-
-guest_suspend("pm-suspend", "mem", errp);
+guest_suspend(SUSPEND_MODE_RAM, errp);
 }
 
 void qmp_guest_suspend_hybrid(Error **errp)
 {
-Error *local_err = NULL;
-
-bios_supports_mode("pm-is-supported", "--suspend-hybrid", NULL,
-   &local_err);
-if (local_err) {
-error_propagate(errp, local_err);
-return;
-}
-
-guest_suspend("pm-suspend-hybrid", NULL, errp);
+guest_suspend(SUSPEND_MODE_HYBRID, errp);
 }
 
 static GuestNetworkInterfaceList *
-- 
2.17.1




[Qemu-devel] [PATCH v1 2/6] qga: bios_supports_mode: decoupling pm-utils and sys logic

2018-06-19 Thread Daniel Henrique Barboza
In bios_supports_mode there is a check to see whether
the chosen suspend mode is supported by the pmutils tools and,
if not, we see if the Linux sys state file supports it.

This verification is done in the same function, one after
the other, and it works for now. But, when adding a new
suspend mechanism that will not necessarily follow the same
return 0 or 1 logic of pmutils, this code will be hard
to deal with.

This patch decouples the two existing logics into their own
functions, pmutils_supports_mode and linux_sys_state_supports_mode,
which in turn are used inside bios_supports_mode. The existing
logic is kept but now it's easier to extend it.
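
As an illustration of the sys state half of that split, the check boils down to asking whether a mode keyword appears in the contents of /sys/power/state. The sketch below works on an already-read buffer and matches whole words; that is a slightly stricter variant than the strstr() test in the patch, and state_lists_mode() is a hypothetical name.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Return true if 'mode' (e.g. "mem" or "disk") appears as a whole
 * whitespace-delimited word in 'contents', the text read from the
 * state file.
 */
static bool state_lists_mode(const char *contents, const char *mode)
{
    size_t len = strlen(mode);
    const char *p = contents;

    if (len == 0) {
        return false;
    }
    while ((p = strstr(p, mode)) != NULL) {
        bool starts = p == contents || isspace((unsigned char)p[-1]);
        bool ends = p[len] == '\0' || isspace((unsigned char)p[len]);

        if (starts && ends) {
            return true;
        }
        p++;    /* partial hit inside a longer word, keep looking */
    }
    return false;
}
```

A caller would read the file into a NUL-terminated buffer first, as linux_sys_state_supports_mode does, and then pass "disk" or "mem" depending on the requested mode.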

Signed-off-by: Daniel Henrique Barboza 
---
 qga/commands-posix.c | 116 +--
 1 file changed, 68 insertions(+), 48 deletions(-)

diff --git a/qga/commands-posix.c b/qga/commands-posix.c
index 63c49791a4..89ffd8dc88 100644
--- a/qga/commands-posix.c
+++ b/qga/commands-posix.c
@@ -1442,75 +1442,43 @@ qmp_guest_fstrim(bool has_minimum, int64_t minimum, 
Error **errp)
 #define SUSPEND_MODE_RAM 2
 #define SUSPEND_MODE_HYBRID 3
 
-static void bios_supports_mode(int suspend_mode, Error **errp)
+static bool pmutils_supports_mode(int suspend_mode, Error **errp)
 {
 Error *local_err = NULL;
-const char *pmutils_arg, *sysfile_str;
+const char *pmutils_arg;
 const char *pmutils_bin = "pm-is-supported";
 char *pmutils_path;
 pid_t pid;
 int status;
+bool ret = false;
 
 switch (suspend_mode) {
 
 case SUSPEND_MODE_DISK:
 pmutils_arg = "--hibernate";
-sysfile_str = "disk";
 break;
 case SUSPEND_MODE_RAM:
 pmutils_arg = "--suspend";
-sysfile_str = "mem";
 break;
 case SUSPEND_MODE_HYBRID:
 pmutils_arg = "--suspend-hybrid";
-sysfile_str = NULL;
 break;
 default:
-error_setg(errp, "guest suspend mode not supported");
-return;
+return ret;
 }
 
 pmutils_path = g_find_program_in_path(pmutils_bin);
+if (!pmutils_path) {
+return ret;
+}
 
 pid = fork();
 if (!pid) {
-char buf[32]; /* hopefully big enough */
-ssize_t ret;
-int fd;
-
 setsid();
-reopen_fd_to_null(0);
-reopen_fd_to_null(1);
-reopen_fd_to_null(2);
-
-if (pmutils_path) {
-execle(pmutils_path, pmutils_bin, pmutils_arg, NULL, environ);
-}
-
+execle(pmutils_path, pmutils_bin, pmutils_arg, NULL, environ);
 /*
- * If we get here either pm-utils is not installed or execle() has
- * failed. Let's try the manual method if the caller wants it.
+ * If we get here execle() has failed.
  */
-
-if (!sysfile_str) {
-_exit(SUSPEND_NOT_SUPPORTED);
-}
-
-fd = open(LINUX_SYS_STATE_FILE, O_RDONLY);
-if (fd < 0) {
-_exit(SUSPEND_NOT_SUPPORTED);
-}
-
-ret = read(fd, buf, sizeof(buf)-1);
-if (ret <= 0) {
-_exit(SUSPEND_NOT_SUPPORTED);
-}
-buf[ret] = '\0';
-
-if (strstr(buf, sysfile_str)) {
-_exit(SUSPEND_SUPPORTED);
-}
-
 _exit(SUSPEND_NOT_SUPPORTED);
 } else if (pid < 0) {
 error_setg_errno(errp, errno, "failed to create child process");
@@ -1523,17 +1491,11 @@ static void bios_supports_mode(int suspend_mode, Error 
**errp)
 goto out;
 }
 
-if (!WIFEXITED(status)) {
-error_setg(errp, "child process has terminated abnormally");
-goto out;
-}
-
 switch (WEXITSTATUS(status)) {
 case SUSPEND_SUPPORTED:
+ret = true;
 goto out;
 case SUSPEND_NOT_SUPPORTED:
-error_setg(errp,
-   "the requested suspend mode is not supported by the guest");
 goto out;
 default:
 error_setg(errp,
@@ -1544,6 +1506,64 @@ static void bios_supports_mode(int suspend_mode, Error 
**errp)
 
 out:
 g_free(pmutils_path);
+return ret;
+}
+
+static bool linux_sys_state_supports_mode(int suspend_mode, Error **errp)
+{
+const char *sysfile_str;
+char buf[32]; /* hopefully big enough */
+int fd;
+ssize_t ret;
+
+switch (suspend_mode) {
+
+case SUSPEND_MODE_DISK:
+sysfile_str = "disk";
+break;
+case SUSPEND_MODE_RAM:
+sysfile_str = "mem";
+break;
+default:
+return false;
+}
+
+fd = open(LINUX_SYS_STATE_FILE, O_RDONLY);
+if (fd < 0) {
+return false;
+}
+
+ret = read(fd, buf, sizeof(buf) - 1);
+if (ret <= 0) {
+return false;
+}
+buf[ret] = '\0';
+
+if (strstr(buf, sysfile_str)) {
+return true;
+}
+return false;
+}
+
+static void bios_supports_mode(int suspend_mode, Error **errp)
+{
+Error *local_err = NULL;
+bool ret;
+
+ret = pmutils_supports_mode(suspend_mode, &local_err);
+if (ret) {
+return;
+}
+if 

[Qemu-devel] [PATCH v1 4/6] qga: removing switch statements, adding run_process_child

2018-06-19 Thread Daniel Henrique Barboza
This is a cleanup of the resulting code after detaching
pmutils and Linux sys state file logic:

- remove the SUSPEND_MODE_* macros and use an enumeration
instead. At the same time, drop the switch statements
at the start of each function and use the enumeration
index to get the right binary/argument;

- create a new function called run_process_child(). This
function creates a child process and executes a (shell)
command, returning the command return code. This is a common
operation in the pmutils functions and will be used in the
systemd implementation as well, so this function will avoid
code repetition.
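
The first point can be sketched as an enum-indexed lookup table. The enum values match the ones in the patch; the pmutils_arg() helper and its bounds check are additions made for this illustration, since the patch indexes the array directly.

```c
#include <stddef.h>

typedef enum {
    SUSPEND_MODE_DISK = 0,
    SUSPEND_MODE_RAM = 1,
    SUSPEND_MODE_HYBRID = 2,
} SuspendMode;

/*
 * Map a suspend mode to its pm-is-supported argument by using the
 * enum value as an array index instead of a switch statement.
 * Returns NULL for a value outside the table.
 */
static const char *pmutils_arg(SuspendMode mode)
{
    static const char *const args[] = {
        [SUSPEND_MODE_DISK]   = "--hibernate",
        [SUSPEND_MODE_RAM]    = "--suspend",
        [SUSPEND_MODE_HYBRID] = "--suspend-hybrid",
    };

    if ((unsigned)mode >= sizeof(args) / sizeof(args[0])) {
        return NULL;
    }
    return args[mode];
}
```

The table only works because the enum starts at 0 and is dense, which is why the old 1-based SUSPEND_MODE_* macros had to change.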

There are more places inside commands-posix.c where this new
run_process_child function can also be used, but one step
at a time.
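
For reference, a stripped-down sketch of the fork/exec/wait pattern that run_process_child wraps. It is not the code from the patch: it uses execvp() for the PATH lookup instead of g_find_program_in_path() plus execve(), reports errors only through the return value, and the name run_child() is made up.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Run argv[0] with the given arguments and wait for it.
 * Returns the child's exit status, or -1 if the child could not be
 * created, could not be waited for, or did not exit normally.
 */
static int run_child(char *const argv[])
{
    int status;
    pid_t pid = fork();

    if (pid < 0) {
        return -1;
    }
    if (pid == 0) {
        setsid();
        execvp(argv[0], argv);
        /* Only reached if exec failed; 127 follows the shell convention. */
        _exit(127);
    }
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR) {
            return -1;
        }
    }
    if (!WIFEXITED(status)) {
        return -1;
    }
    return WEXITSTATUS(status);
}
```

A caller would build the argument vector much like the patch does, for example {"pm-is-supported", "--suspend", NULL}, and compare the result against the exit codes it expects.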

Signed-off-by: Daniel Henrique Barboza 
---
 qga/commands-posix.c | 190 +--
 1 file changed, 76 insertions(+), 114 deletions(-)

diff --git a/qga/commands-posix.c b/qga/commands-posix.c
index a2870f9ab9..d5e3805ce9 100644
--- a/qga/commands-posix.c
+++ b/qga/commands-posix.c
@@ -1438,152 +1438,122 @@ qmp_guest_fstrim(bool has_minimum, int64_t minimum, 
Error **errp)
 #define LINUX_SYS_STATE_FILE "/sys/power/state"
 #define SUSPEND_SUPPORTED 0
 #define SUSPEND_NOT_SUPPORTED 1
-#define SUSPEND_MODE_DISK 1
-#define SUSPEND_MODE_RAM 2
-#define SUSPEND_MODE_HYBRID 3
 
-static bool pmutils_supports_mode(int suspend_mode, Error **errp)
+typedef enum {
+SUSPEND_MODE_DISK = 0,
+SUSPEND_MODE_RAM = 1,
+SUSPEND_MODE_HYBRID = 2,
+} SuspendMode;
+
+static int run_process_child(const char *command[], Error **errp)
 {
 Error *local_err = NULL;
-const char *pmutils_arg;
-const char *pmutils_bin = "pm-is-supported";
-char *pmutils_path;
+char *cmd_path = g_find_program_in_path(command[0]);
 pid_t pid;
-int status;
-bool ret = false;
-
-switch (suspend_mode) {
-
-case SUSPEND_MODE_DISK:
-pmutils_arg = "--hibernate";
-break;
-case SUSPEND_MODE_RAM:
-pmutils_arg = "--suspend";
-break;
-case SUSPEND_MODE_HYBRID:
-pmutils_arg = "--suspend-hybrid";
-break;
-default:
-return ret;
-}
+int status, ret = -1;
 
-pmutils_path = g_find_program_in_path(pmutils_bin);
-if (!pmutils_path) {
+if (!cmd_path) {
 return ret;
 }
 
 pid = fork();
 if (!pid) {
 setsid();
-execle(pmutils_path, pmutils_bin, pmutils_arg, NULL, environ);
 /*
- * If we get here execle() has failed.
+ * execve receives a char* const argv[] as second arg but we're
+ * receiving a const char*[]. Since execve does not change the
+ * array contents it's tolerable to cast here.
  */
-_exit(SUSPEND_NOT_SUPPORTED);
+execve(cmd_path, (char* const*)command, environ);
+_exit(errno);
 } else if (pid < 0) {
 error_setg_errno(errp, errno, "failed to create child process");
+ret = EXIT_FAILURE;
 goto out;
 }
 
 ga_wait_child(pid, &status, &local_err);
 if (local_err) {
 error_propagate(errp, local_err);
+ret = EXIT_FAILURE;
 goto out;
 }
 
-switch (WEXITSTATUS(status)) {
-case SUSPEND_SUPPORTED:
-ret = true;
-goto out;
-case SUSPEND_NOT_SUPPORTED:
-goto out;
-default:
-error_setg(errp,
-   "the helper program '%s' returned an unexpected exit status"
-   " code (%d)", pmutils_path, WEXITSTATUS(status));
-goto out;
-}
+ret = WEXITSTATUS(status);
 
 out:
-g_free(pmutils_path);
+g_free(cmd_path);
 return ret;
 }
 
-static void pmutils_suspend(int suspend_mode, Error **errp)
+static bool pmutils_supports_mode(SuspendMode mode, Error **errp)
 {
 Error *local_err = NULL;
-const char *pmutils_bin;
-char *pmutils_path;
-pid_t pid;
+const char *pmutils_args[3] = {"--hibernate", "--suspend",
+   "--suspend-hybrid"};
+const char *cmd[3] = {"pm-is-supported", pmutils_args[mode], NULL};
 int status;
 
-switch (suspend_mode) {
-
-case SUSPEND_MODE_DISK:
-pmutils_bin = "pm-hibernate";
-break;
-case SUSPEND_MODE_RAM:
-pmutils_bin = "pm-suspend";
-break;
-case SUSPEND_MODE_HYBRID:
-pmutils_bin = "pm-suspend-hybrid";
-break;
-default:
-error_setg(errp, "unknown guest suspend mode");
-return;
-}
+status = run_process_child(cmd, &local_err);
 
-pmutils_path = g_find_program_in_path(pmutils_bin);
-if (!pmutils_path) {
-error_setg(errp, "the helper program '%s' was not found", pmutils_bin);
-return;
+if (status == SUSPEND_SUPPORTED) {
+return true;
 }
 
-pid = fork();
-if (!pid) {
-setsid();
-execle(pmutils_path, pmutils_bin, NULL, environ);
-/*
- * If we get here execle() has failed.
- */
-

[Qemu-devel] [PATCH v1 0/6] QGA: systemd hibernate/suspend/hybrid-sleep

2018-06-19 Thread Daniel Henrique Barboza
This series adds systemd suspend support for QGA. Some newer
guests don't have pmutils anymore, leaving us with just the
Linux state file mechanism to suspend the guest OS, which does
not support hybrid-sleep. With this implementation, QGA is
now able to hybrid suspend newer guests again.

Most of the patches are cleanups in the existing suspend code,
aiming at both simplifying it and making it easier to extend
it with systemd.


Note: checkpatch.pl complains about patch 3:

ERROR: "(foo* const*)" should be "(foo * const*)"
#94: FILE: qga/commands-posix.c:1467:
+execve(cmd_path, (char* const*)command, environ);

ERROR: space required before that '*' (ctx:VxB)
#94: FILE: qga/commands-posix.c:1467:
+execve(cmd_path, (char* const*)command, environ);


Not sure how to make it know that this is a cast instead
of a math operation. Suggestions welcome
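
For reference, the first checkpatch message asks for a space between the type and the '*' in the cast. A minimal standalone sketch of the fork/execve pattern from patch 4, written with that spacing, might look like the following. It is only an illustration under stated assumptions: it uses plain POSIX calls, expects an absolute path in argv[0] instead of calling g_find_program_in_path(), and the name run_child_sketch is hypothetical. Whether this spelling also silences the second checkpatch error has not been verified.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/*
 * Hypothetical helper, not the function from the patch: run argv[0]
 * (an absolute path) and return its exit status, or -1 on failure.
 */
int run_child_sketch(const char *argv[])
{
    int status;
    pid_t pid = fork();

    if (pid < 0) {
        return -1;              /* fork() failed */
    }
    if (pid == 0) {
        /*
         * execve() takes char *const argv[]; it does not modify the
         * strings, so casting away the inner const is tolerable here.
         */
        execve(argv[0], (char * const *)argv, environ);
        _exit(127);             /* only reached if execve() failed */
    }
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status)) {
        return -1;              /* wait failed or child died abnormally */
    }
    return WEXITSTATUS(status);
}
```

Returning a fixed 127 from the child when execve() fails is a choice made for this sketch; the patch itself uses _exit(errno).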

Daniel Henrique Barboza (6):
  qga: refactoring qmp_guest_suspend_* functions
  qga: bios_supports_mode: decoupling pm-utils and sys logic
  qga: guest_suspend: decoupling pm-utils and sys logic
  qga: removing switch statements, adding run_process_child
  qga: adding systemd hibernate/suspend/hybrid-sleep support
  qga: removing bios_supports_mode

 qga/commands-posix.c | 315 ---
 1 file changed, 210 insertions(+), 105 deletions(-)

-- 
2.17.1




Re: [Qemu-devel] [PATCH] [RFC v2] aio: properly bubble up errors from initialization

2018-06-19 Thread Eric Blake

On 06/15/2018 12:47 PM, Nishanth Aravamudan via Qemu-devel wrote:

laio_init() can fail for a couple of reasons, which will lead to a NULL
pointer dereference in laio_attach_aio_context().

To solve this, add a aio_setup_linux_aio() function which is called
before aio_get_linux_aio() where it is called currently, and which
propogates setup errors up. The signature of aio_get_linux_aio() was not


s/propogates/propagates/


modified, because it seems preferable to return the actual errno from
the possible failing initialization calls.

With respect to the error handling in file-posix.c, we properly
bubble any errors up in raw_co_prw, and in the cases of
raw_aio_{,un}plug, the result is the same as if s->use_linux_aio was not
set (but there is no bubbling up). In all three cases, if the setup
function fails, we fall back to the thread pool and an error message is
emitted.

It is trivial to make qemu segfault in my testing. Set
/proc/sys/fs/aio-max-nr to 0 and start a guest with
aio=native,cache=directsync. With this patch, the guest successfully
starts (but obviously isn't using native AIO). Setting aio-max-nr back
up to a reasonable value, AIO contexts are consumed normally.

Signed-off-by: Nishanth Aravamudan 

---

Changes from v1 -> v2:


When posting a v2, it's best to post as a new thread, rather than 
in-reply-to the v1 thread, so that automated tooling knows to check the 
new patch.  More patch submission tips at 
https://wiki.qemu.org/Contribute/SubmitAPatch




Rather than affect virtio-scsi/blk at all, make all the changes internal
to file-posix.c. Thanks to Kevin Wolf for the suggested change.
---
  block/file-posix.c  | 24 
  block/linux-aio.c   | 15 ++-
  include/block/aio.h |  3 +++
  include/block/raw-aio.h |  2 +-
  stubs/linux-aio.c   |  2 +-
  util/async.c| 15 ---
  6 files changed, 51 insertions(+), 10 deletions(-)

diff --git a/block/file-posix.c b/block/file-posix.c
index 07bb061fe4..2415d09bf1 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -1665,6 +1665,14 @@ static int coroutine_fn raw_co_prw(BlockDriverState *bs, uint64_t offset,
  type |= QEMU_AIO_MISALIGNED;
  #ifdef CONFIG_LINUX_AIO
  } else if (s->use_linux_aio) {
+int rc;
+rc = aio_setup_linux_aio(bdrv_get_aio_context(bs));
+if (rc != 0) {
+error_report("Unable to use native AIO, falling back to "
+ "thread pool.");


In general, error_report() should not output a trailing '.'.


+s->use_linux_aio = 0;
+return rc;


Wait - the message claims we are falling back, but the non-zero return 
code sounds like we are returning an error instead of falling back.  (My 
preference - if the user requested something and we can't do it, it's 
better to error than to fall back to something that does not match the 
user's request).
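
To make the two options concrete, here is a minimal sketch. All names in it are hypothetical stand-ins rather than QEMU APIs, and setup_rc stands for whatever the setup function returned. The first function fails the request when the requested mode is unavailable; the second really falls back and keeps going. The posted hunk mixes the two: it prints the fallback message and clears the flag, but still returns the error.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the per-device state in file-posix.c. */
struct raw_state_sketch {
    bool use_linux_aio;
};

/* Option 1: the user asked for native AIO, so fail if setup failed. */
int submit_strict(struct raw_state_sketch *s, int setup_rc)
{
    if (s->use_linux_aio && setup_rc != 0) {
        fprintf(stderr, "Unable to use native AIO\n");
        return setup_rc;        /* propagate the negative errno value */
    }
    return 0;                   /* the request would be submitted here */
}

/* Option 2: really fall back: warn, clear the flag, and keep going. */
int submit_fallback(struct raw_state_sketch *s, int setup_rc)
{
    if (s->use_linux_aio && setup_rc != 0) {
        fprintf(stderr,
                "Unable to use native AIO, falling back to thread pool\n");
        s->use_linux_aio = false;
    }
    return 0;                   /* request goes down whichever path is left */
}
```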



+}
  LinuxAioState *aio = aio_get_linux_aio(bdrv_get_aio_context(bs));
  assert(qiov->size == bytes);
  return laio_co_submit(bs, aio, s->fd, offset, qiov, type);
@@ -1695,6 +1703,14 @@ static void raw_aio_plug(BlockDriverState *bs)
  #ifdef CONFIG_LINUX_AIO
  BDRVRawState *s = bs->opaque;
  if (s->use_linux_aio) {
+int rc;
+rc = aio_setup_linux_aio(bdrv_get_aio_context(bs));
+if (rc != 0) {
+error_report("Unable to use native AIO, falling back to "
+ "thread pool.");
+s->use_linux_aio = 0;


Should s->use_linux_aio be a bool instead of an int?

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



Re: [Qemu-devel] [PULL 08/26] qobject: Move block-specific qdict code to block-qdict.c

2018-06-19 Thread Eric Blake

On 06/15/2018 09:20 AM, Kevin Wolf wrote:

From: Markus Armbruster 

Pure code motion, except for two brace placements and a comment
tweaked to appease checkpatch.

Signed-off-by: Markus Armbruster 
Reviewed-by: Kevin Wolf 
Signed-off-by: Kevin Wolf 
---
  qobject/block-qdict.c | 640 
  qobject/qdict.c   | 629 
  tests/check-block-qdict.c | 655 ++
  tests/check-qdict.c   | 642 -
  MAINTAINERS   |   2 +
  qobject/Makefile.objs |   1 +
  tests/Makefile.include|   4 +
  7 files changed, 1302 insertions(+), 1271 deletions(-)
  create mode 100644 qobject/block-qdict.c
  create mode 100644 tests/check-block-qdict.c


and missing a change to tests/.gitignore, so that 
tests/check-block-qdict now shows up as an untracked file on an in-tree 
build.


(We really should follow through with our threat of renaming all the 
tests to use a consistent suffix-based pattern, as it's much easier to 
gitignore a suffix than a prefix)


--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



Re: [Qemu-devel] [virtio-dev] Re: [PATCH 2/3] Add "Group Identifier" support to PCIe bridges.

2018-06-19 Thread Michael S. Tsirkin
On Tue, Jun 19, 2018 at 02:02:10PM -0500, Venu Busireddy wrote:
> On 2018-06-19 21:53:01 +0300, Michael S. Tsirkin wrote:
> > On Tue, Jun 19, 2018 at 01:36:17PM -0500, Venu Busireddy wrote:
> > > On 2018-06-19 21:21:23 +0300, Michael S. Tsirkin wrote:
> > > > On Tue, Jun 19, 2018 at 01:14:06PM -0500, Venu Busireddy wrote:
> > > > > On 2018-06-19 20:24:12 +0300, Michael S. Tsirkin wrote:
> > > > > > On Tue, Jun 19, 2018 at 11:32:26AM -0500, Venu Busireddy wrote:
> > > > > > > Add a "Vendor-Specific" capability to the PCIe bridge, to contain the
> > > > > > > "Group Identifier" (UUID) that will be used to pair a virtio device with
> > > > > > > the passthrough device attached to that bridge.
> > > > > > > 
> > > > > > > This capability is added to the bridge iff the "uuid" option is specified
> > > > > > > for the bridge device, via the qemu command line. Also, the bridge's
> > > > > > > Vendor ID is changed to PCI_VENDOR_ID_REDHAT, and Device ID is changed
> > > > > > > to PCI_DEVICE_ID_REDHAT_PCIE_BRIDGE (from the default values), when the
> > > > > > > "uuid" option is present.
> > > > > > > 
> > > > > > > Signed-off-by: Venu Busireddy 
> > > > > > 
> > > > > > I don't see why we should add it to all bridges.
> > > > > > Let's just add it to ones that already have the RH vendor ID?
> > > > > 
> > > > > No. I am not adding the capability to all bridges.
> > > > > 
> > > > > In the earlier discussions, we agreed that the bridge be left as
> > > > > Intel bridge if we do not intend to use it for storing the pairing
> > > > > information. If we do intend to store the pairing information in the
> > > > > bridge, we wanted to change the bridge's Vendor ID to RH Vendor ID to
> > > > > > avoid confusion. In other words, bridges with RH Vendor ID come into
> > > > > > existence only when there is an intent to store the pairing information
> > > > > > in the bridge.
> > > > > 
> > > > > Accordingly, if the "uuid" option is specified for the bridge, it
> > > > > is assumed that the user intends to use the bridge for storing the
> > > > > pairing information, and hence, the capability is added to the bridge,
> > > > > and the Vendor ID is changed to RH Vendor ID. If the "uuid" option
> > > > > is not specified, the bridge remains as Intel bridge, and without the
> > > > > vendor-specific capability.
> > > > > 
> > > > > Venu
> > > > 
> > > > Yes but the way to do it is not to tweak the vendor and device ID,
> > > > instead, just add the UUID property to bridges that already have the
> > > > correct vendor and device id.
> > > 
> > > I was using ioh3420 as the bridge device, because that is what is
> > > recommended here:
> > > 
> > >   https://git.qemu.org/?p=qemu.git;a=blob_plain;f=docs/pcie.txt;hb=HEAD
> > > 
> > > ioh3420 defaults to the Intel Vendor ID. Hence the tweak to change the
> > > Vendor ID to RH Vendor ID.
> > > 
> > > Is there another bridge device other than ioh3420 that I should use?
> > > what device do you suggest? 
> > > 
> > > Thanks,
> > > 
> > > Venu
> > 
> > For pci, use hw/pci-bridge/pci_bridge_dev.c
> > Maybe allocate a special ID for grouping bridges.
> > 
> > For express, add your own downstream port.
> 
> Specifically, on the command line, what device does the user specify?
> For example:
> 
>  qemu-system-x86_64 --device ${Bridge_Device},uuid="uuid string",
> 
> What does the user specify for ${Bridge_Device} from the following:
> 
> "i82801b11-bridge", bus PCI
> "ioh3420", bus PCI, desc "Intel IOH device id 3420 PCIE Root Port"
> "pci-bridge", bus PCI, desc "Standard PCI Bridge"

This one. Or add pci-bridge-group.

> "pci-bridge-seat", bus PCI, desc "Standard PCI Bridge (multiseat)"
> "pcie-pci-bridge", bus PCI
> "pcie-root-port", bus PCI, desc "PCI Express Root Port"
> "pxb", bus PCI, desc "PCI Expander Bridge"
> "pxb-pcie", bus PCI, desc "PCI Express Expander Bridge"
> "usb-host", bus usb-bus
> "usb-hub", bus usb-bus
> "vfio-pci-igd-lpc-bridge", bus PCI, desc "VFIO dummy ISA/LPC bridge for IGD 
> assignment"
> "x3130-upstream", bus PCI, desc "TI X3130 Upstream Port of PCI Express Switch"
> "xio3130-downstream", bus PCI, desc "TI X3130 Downstream Port of PCI Express 
> Switch"
> 
> Or, are you suggesting that I add a new type of device? If latter, what
> should it be called?
> 
> Thanks,
> 

For express, add pcie-downstream or pcie-downstream-group.

> 
> > 
> > 
> > > > 
> > > > 
> > > > > > 
> > > > > > 
> > > > > > > ---
> > > > > > >  hw/pci-bridge/ioh3420.c|  2 ++
> > > > > > >  hw/pci-bridge/pcie_root_port.c |  7 +++
> > > > > > >  hw/pci/pci_bridge.c| 32 
> > > > > > > 
> > > > > > >  include/hw/pci/pci.h   |  2 ++
> > > > > > >  include/hw/pci/pcie.h  |  1 +
> > > > > > >  include/hw/pci/pcie_port.h |  1 +
> > > > > > >  6 files changed, 45 insertions(+)
> > > > > > > 
> > > > > > > diff --git 

Re: [Qemu-devel] [PATCH v5 2/6] nbd/server: refactor NBDExportMetaContexts

2018-06-19 Thread Eric Blake

On 06/09/2018 10:17 AM, Vladimir Sementsov-Ogievskiy wrote:

Use NBDExport pointer instead of just export name: there no needs to


s/no needs/is no need/


store duplicated name in the struct, moreover, NBDExport will be used
further.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
  nbd/server.c | 23 +++
  1 file changed, 11 insertions(+), 12 deletions(-)




@@ -399,10 +399,9 @@ static int nbd_negotiate_handle_list(NBDClient *client, Error **errp)
  return nbd_negotiate_send_rep(client, NBD_REP_ACK, errp);
  }
  
-static void nbd_check_meta_export_name(NBDClient *client)

+static void nbd_check_meta_export(NBDClient *client)
  {
-client->export_meta.valid &= !strcmp(client->exp->name,
- client->export_meta.export_name);
+client->export_meta.valid &= client->exp == client->export_meta.exp;


Changes from string comparison to pointer comparison...


@@ -853,15 +852,15 @@ static int nbd_negotiate_meta_queries(NBDClient *client,
  
  memset(meta, 0, sizeof(*meta));
  
-ret = nbd_opt_read_name(client, meta->export_name, NULL, errp);

+ret = nbd_opt_read_name(client, export_name, NULL, errp);
  if (ret <= 0) {
  return ret;
  }
  
-exp = nbd_export_find(meta->export_name);

-if (exp == NULL) {
+meta->exp = nbd_export_find(export_name);
+if (meta->exp == NULL) {


...by remembering the results of the string comparison performed under 
the hood.  Looks good.
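
The idea can be sketched in isolation as follows. This is a simplified illustration, not the NBD server code: the struct, the static export table, and the function names are all hypothetical, and it assumes export names are unique so that one name maps to one object.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for NBDExport. */
struct export_sketch {
    const char *name;
};

/* Hypothetical export table; names are assumed to be unique. */
static struct export_sketch exports_sketch[] = {
    { "disk0" },
    { "disk1" },
};

/* The string comparison happens once, at lookup time. */
struct export_sketch *export_find_sketch(const char *name)
{
    size_t i;

    for (i = 0; i < sizeof(exports_sketch) / sizeof(exports_sketch[0]); i++) {
        if (strcmp(exports_sketch[i].name, name) == 0) {
            return &exports_sketch[i];
        }
    }
    return NULL;
}

/* Later checks compare the remembered pointers; no strcmp() is needed. */
bool same_export_sketch(const struct export_sketch *a,
                        const struct export_sketch *b)
{
    return a == b;
}
```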


Reviewed-by: Eric Blake 

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



Re: [Qemu-devel] [PATCH v2 05/11] hw/arm/virt: GICv3 DT node with one or two redistributor regions

2018-06-19 Thread Ard Biesheuvel
On 19 June 2018 at 20:53, Laszlo Ersek  wrote:
> Hi Eric,
>
> sorry about the late followup. I have one question (mainly for Ard):
>
> On 06/15/18 16:28, Eric Auger wrote:
>> This patch allows the creation of a GICv3 node with 1 or 2
>> redistributor regions depending on the number of smp_cpus.
>> The second redistributor region is located just after the
>> existing RAM region, at 256GB and contains up to 512 vcpus.
>>
>> Please refer to kernel documentation for further node details:
>> Documentation/devicetree/bindings/interrupt-controller/arm,gic-v3.txt
>>
>> Signed-off-by: Eric Auger 
>> Reviewed-by: Andrew Jones 
>>
>> ---
>> v1 (virt3.0) -> v2
>> - Added Drew's R-b
>>
>> v2 -> v3:
>> - VIRT_GIC_REDIST2 is now 64MB large, ie. 512 redistributor capacity
>> - virt_gicv3_redist_region_count does not test kvm_irqchip_in_kernel
>>   anymore
>> ---
>>  hw/arm/virt.c | 29 -
>>  include/hw/arm/virt.h | 14 ++
>>  2 files changed, 38 insertions(+), 5 deletions(-)
>>
>> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
>> index 2885d18..d9f72eb 100644
>> --- a/hw/arm/virt.c
>> +++ b/hw/arm/virt.c
>> @@ -148,6 +148,8 @@ static const MemMapEntry a15memmap[] = {
>>  [VIRT_PCIE_PIO] =   { 0x3eff0000, 0x00010000 },
>>  [VIRT_PCIE_ECAM] =  { 0x3f000000, 0x01000000 },
>>  [VIRT_MEM] = { 0x40000000, RAMLIMIT_BYTES },
>> +/* Additional 64 MB redist region (can contain up to 512 redistributors) */
>> +[VIRT_GIC_REDIST2] = { 0x4000000000ULL, 0x4000000 },
>>  /* Second PCIe window, 512GB wide at the 512GB boundary */
>>  [VIRT_PCIE_MMIO_HIGH] =   { 0x8000000000ULL, 0x8000000000ULL },
>>  };
>> @@ -401,13 +403,30 @@ static void fdt_add_gic_node(VirtMachineState *vms)
>>  qemu_fdt_setprop_cell(vms->fdt, "/intc", "#size-cells", 0x2);
>>  qemu_fdt_setprop(vms->fdt, "/intc", "ranges", NULL, 0);
>>  if (vms->gic_version == 3) {
>> +int nb_redist_regions = virt_gicv3_redist_region_count(vms);
>> +
>>  qemu_fdt_setprop_string(vms->fdt, "/intc", "compatible",
>>  "arm,gic-v3");
>> -qemu_fdt_setprop_sized_cells(vms->fdt, "/intc", "reg",
>> - 2, vms->memmap[VIRT_GIC_DIST].base,
>> - 2, vms->memmap[VIRT_GIC_DIST].size,
>> - 2, vms->memmap[VIRT_GIC_REDIST].base,
>> - 2, vms->memmap[VIRT_GIC_REDIST].size);
>> +
>> +qemu_fdt_setprop_cell(vms->fdt, "/intc",
>> +  "#redistributor-regions", nb_redist_regions);
>> +
>> +if (nb_redist_regions == 1) {
>> +qemu_fdt_setprop_sized_cells(vms->fdt, "/intc", "reg",
>> + 2, vms->memmap[VIRT_GIC_DIST].base,
>> + 2, vms->memmap[VIRT_GIC_DIST].size,
>> + 2, vms->memmap[VIRT_GIC_REDIST].base,
>> + 2, vms->memmap[VIRT_GIC_REDIST].size);
>> +} else {
>> +qemu_fdt_setprop_sized_cells(vms->fdt, "/intc", "reg",
>> + 2, vms->memmap[VIRT_GIC_DIST].base,
>> + 2, vms->memmap[VIRT_GIC_DIST].size,
>> + 2, vms->memmap[VIRT_GIC_REDIST].base,
>> + 2, vms->memmap[VIRT_GIC_REDIST].size,
>> + 2, vms->memmap[VIRT_GIC_REDIST2].base,
>> + 2, vms->memmap[VIRT_GIC_REDIST2].size);
>> +}
>> +
>>  if (vms->virt) {
>>  qemu_fdt_setprop_cells(vms->fdt, "/intc", "interrupts",
>> GIC_FDT_IRQ_TYPE_PPI, ARCH_GICV3_MAINT_IRQ,
>
> In edk2, we have the following code in
> "ArmVirtPkg/Library/ArmVirtGicArchLib/ArmVirtGicArchLib.c":
>
>   switch (GicRevision) {
>
>   case 3:
> //
> // The GIC v3 DT binding describes a series of at least 3 physical (base
> // addresses, size) pairs: the distributor interface (GICD), at least one
> // redistributor region (GICR) containing dedicated redistributor
> // interfaces for all individual CPUs, and the CPU interface (GICC).
> // Under virtualization, we assume that the first redistributor region
> // listed covers the boot CPU. Also, our GICv3 driver only supports the
> // system register CPU interface, so we can safely ignore the MMIO version
> // which is listed after the sequence of redistributor interfaces.
> // This means we are only interested in the first two memory regions
> // supplied, and ignore everything else.
> //
> ASSERT (RegSize >= 32);
>
> // RegProp[0..1] == { GICD base, GICD size }
> DistBase = SwapBytes64 (Reg[0]);
> ASSERT (DistBase < 

Re: [Qemu-devel] [virtio-dev] Re: [PATCH 2/3] Add "Group Identifier" support to PCIe bridges.

2018-06-19 Thread Venu Busireddy
On 2018-06-19 21:53:01 +0300, Michael S. Tsirkin wrote:
> On Tue, Jun 19, 2018 at 01:36:17PM -0500, Venu Busireddy wrote:
> > On 2018-06-19 21:21:23 +0300, Michael S. Tsirkin wrote:
> > > On Tue, Jun 19, 2018 at 01:14:06PM -0500, Venu Busireddy wrote:
> > > > On 2018-06-19 20:24:12 +0300, Michael S. Tsirkin wrote:
> > > > > On Tue, Jun 19, 2018 at 11:32:26AM -0500, Venu Busireddy wrote:
> > > > > > Add a "Vendor-Specific" capability to the PCIe bridge, to contain the
> > > > > > "Group Identifier" (UUID) that will be used to pair a virtio device with
> > > > > > the passthrough device attached to that bridge.
> > > > > > 
> > > > > > This capability is added to the bridge iff the "uuid" option is specified
> > > > > > for the bridge device, via the qemu command line. Also, the bridge's
> > > > > > Vendor ID is changed to PCI_VENDOR_ID_REDHAT, and Device ID is changed
> > > > > > to PCI_DEVICE_ID_REDHAT_PCIE_BRIDGE (from the default values), when the
> > > > > > "uuid" option is present.
> > > > > > 
> > > > > > Signed-off-by: Venu Busireddy 
> > > > > 
> > > > > I don't see why we should add it to all bridges.
> > > > > Let's just add it to ones that already have the RH vendor ID?
> > > > 
> > > > No. I am not adding the capability to all bridges.
> > > > 
> > > > In the earlier discussions, we agreed that the bridge be left as
> > > > Intel bridge if we do not intend to use it for storing the pairing
> > > > information. If we do intend to store the pairing information in the
> > > > bridge, we wanted to change the bridge's Vendor ID to RH Vendor ID to
> > > > avoid confusion. In other words, bridges with RH Vendor ID come into
> > > > existence only when there is an intent to store the pairing information
> > > > in the bridge.
> > > > 
> > > > Accordingly, if the "uuid" option is specified for the bridge, it
> > > > is assumed that the user intends to use the bridge for storing the
> > > > pairing information, and hence, the capability is added to the bridge,
> > > > and the Vendor ID is changed to RH Vendor ID. If the "uuid" option
> > > > is not specified, the bridge remains as Intel bridge, and without the
> > > > vendor-specific capability.
> > > > 
> > > > Venu
> > > 
> > > Yes but the way to do it is not to tweak the vendor and device ID,
> > > instead, just add the UUID property to bridges that already have the
> > > correct vendor and device id.
> > 
> > I was using ioh3420 as the bridge device, because that is what is
> > recommended here:
> > 
> >   https://git.qemu.org/?p=qemu.git;a=blob_plain;f=docs/pcie.txt;hb=HEAD
> > 
> > ioh3420 defaults to the Intel Vendor ID. Hence the tweak to change the
> > Vendor ID to RH Vendor ID.
> > 
> > Is there another bridge device other than ioh3420 that I should use?
> > what device do you suggest? 
> > 
> > Thanks,
> > 
> > Venu
> 
> For pci, use hw/pci-bridge/pci_bridge_dev.c
> Maybe allocate a special ID for grouping bridges.
> 
> For express, add your own downstream port.

Specifically, on the command line, what device does the user specify?
For example:

 qemu-system-x86_64 --device ${Bridge_Device},uuid="uuid string",

What does the user specify for ${Bridge_Device} from the following:

"i82801b11-bridge", bus PCI
"ioh3420", bus PCI, desc "Intel IOH device id 3420 PCIE Root Port"
"pci-bridge", bus PCI, desc "Standard PCI Bridge"
"pci-bridge-seat", bus PCI, desc "Standard PCI Bridge (multiseat)"
"pcie-pci-bridge", bus PCI
"pcie-root-port", bus PCI, desc "PCI Express Root Port"
"pxb", bus PCI, desc "PCI Expander Bridge"
"pxb-pcie", bus PCI, desc "PCI Express Expander Bridge"
"usb-host", bus usb-bus
"usb-hub", bus usb-bus
"vfio-pci-igd-lpc-bridge", bus PCI, desc "VFIO dummy ISA/LPC bridge for IGD 
assignment"
"x3130-upstream", bus PCI, desc "TI X3130 Upstream Port of PCI Express Switch"
"xio3130-downstream", bus PCI, desc "TI X3130 Downstream Port of PCI Express 
Switch"

Or, are you suggesting that I add a new type of device? If latter, what
should it be called?

Thanks,



> 
> 
> > > 
> > > 
> > > > > 
> > > > > 
> > > > > > ---
> > > > > >  hw/pci-bridge/ioh3420.c|  2 ++
> > > > > >  hw/pci-bridge/pcie_root_port.c |  7 +++
> > > > > >  hw/pci/pci_bridge.c| 32 
> > > > > > 
> > > > > >  include/hw/pci/pci.h   |  2 ++
> > > > > >  include/hw/pci/pcie.h  |  1 +
> > > > > >  include/hw/pci/pcie_port.h |  1 +
> > > > > >  6 files changed, 45 insertions(+)
> > > > > > 
> > > > > > diff --git a/hw/pci-bridge/ioh3420.c b/hw/pci-bridge/ioh3420.c
> > > > > > index a451d74ee6..b6b9ebc726 100644
> > > > > > --- a/hw/pci-bridge/ioh3420.c
> > > > > > +++ b/hw/pci-bridge/ioh3420.c
> > > > > > @@ -35,6 +35,7 @@
> > > > > >  #define IOH_EP_MSI_SUPPORTED_FLAGS  PCI_MSI_FLAGS_MASKBIT
> > > > > >  #define IOH_EP_MSI_NR_VECTOR2
> > > > > >  #define IOH_EP_EXP_OFFSET   0x90
> > > > > > +#define 

Re: [Qemu-devel] [PATCH V7 RESEND 12/17] savevm: split the process of different stages for loadvm/savevm

2018-06-19 Thread Dr. David Alan Gilbert
* Zhang Chen (zhangc...@gmail.com) wrote:
> On Wed, May 16, 2018 at 2:56 AM, Dr. David Alan Gilbert wrote:
> 
> > * Zhang Chen (zhangc...@gmail.com) wrote:
> > > From: zhanghailiang 
> > >
> > > There are several stages during the loadvm/savevm process. In different
> > > stages, the migration incoming side processes different types of sections.
> > > We want to control these stages more accurately, which will benefit COLO
> > > performance: we don't have to save QEMU_VM_SECTION_START sections
> > > every time we do a checkpoint. Besides, we want to separate
> > > the process of saving/loading memory and device state.
> > >
> > > So we add three new helper functions: qemu_load_device_state() and
> > > qemu_savevm_live_state() to achieve different process during migration.
> > >
> > > Besides, we make qemu_loadvm_state_main() and qemu_save_device_state()
> > > public, and simplify the codes of qemu_save_device_state() by calling the
> > > wrapper qemu_savevm_state_header().
> > >
> > > Signed-off-by: zhanghailiang 
> > > Signed-off-by: Li Zhijian 
> > > Signed-off-by: Zhang Chen 
> > > Reviewed-by: Dr. David Alan Gilbert 
> > > ---
> > >  migration/colo.c   | 36 
> > >  migration/savevm.c | 35 ---
> > >  migration/savevm.h |  4 
> > >  3 files changed, 60 insertions(+), 15 deletions(-)
> > >
> > > diff --git a/migration/colo.c b/migration/colo.c
> > > index cdff0a2490..5b055f79f1 100644
> > > --- a/migration/colo.c
> > > +++ b/migration/colo.c
> > > @@ -30,6 +30,7 @@
> > >  #include "block/block.h"
> > >  #include "qapi/qapi-events-migration.h"
> > >  #include "qapi/qmp/qerror.h"
> > > +#include "sysemu/cpus.h"
> > >
> > >  static bool vmstate_loading;
> > >  static Notifier packets_compare_notifier;
> > > @@ -414,23 +415,30 @@ static int colo_do_checkpoint_transaction(MigrationState *s,
> > >
> > >  /* Disable block migration */
> > >  migrate_set_block_enabled(false, &local_err);
> > > -qemu_savevm_state_header(fb);
> > > -qemu_savevm_state_setup(fb);
> > >  qemu_mutex_lock_iothread();
> > >  replication_do_checkpoint_all(&local_err);
> > >  if (local_err) {
> > >  qemu_mutex_unlock_iothread();
> > >  goto out;
> > >  }
> > > -qemu_savevm_state_complete_precopy(fb, false, false);
> > > -qemu_mutex_unlock_iothread();
> > > -
> > > -qemu_fflush(fb);
> > >
> > >  colo_send_message(s->to_dst_file, COLO_MESSAGE_VMSTATE_SEND, &local_err);
> > >  if (local_err) {
> > >  goto out;
> > >  }
> > > +/*
> > > + * Only save VM's live state, which not including device state.
> > > + * TODO: We may need a timeout mechanism to prevent COLO process
> > > + * to be blocked here.
> > > + */
> >
> > I guess that's the downside to transmitting it directly than into the
> > buffer;
> > Peter Xu's OOB command system would let you kill the connection - and
> > that's something I think COLO should use.
> > Still the change saves you having that huge outgoing buffer on the
> > source side and lets you start sending the checkpoint sooner, which
> > means the pause time should be smaller.
> >
> 
> Yes, you are right.
> But I think this is a performance optimization, this series focus on
> enabling.
> I will do this job in the future.
> 
> 
> >
> > > +qemu_savevm_live_state(s->to_dst_file);
> >
> > Does this actually need to be inside of the qemu_mutex_lock_iothread?
> > I'm pretty sure the device_state needs to be, but I'm not sure the
> > live_state needs to.
> >
> 
> I have checked the code: qemu_savevm_live_state() does not need to be inside
> qemu_mutex_lock_iothread().
> I will move it out of the locked area in the next version.
> 
> 
> 
> >
> > > +/* Note: device state is saved into buffer */
> > > +ret = qemu_save_device_state(fb);
> > > +
> > > +qemu_mutex_unlock_iothread();
> > > +
> > > +qemu_fflush(fb);
> > > +
> > >  /*
> > >   * We need the size of the VMstate data in Secondary side,
> > >   * With which we can decide how much data should be read.
> > > @@ -643,6 +651,7 @@ void *colo_process_incoming_thread(void *opaque)
> > >  uint64_t total_size;
> > >  uint64_t value;
> > >  Error *local_err = NULL;
> > > +int ret;
> > >
> > >  qemu_sem_init(&mis->colo_incoming_sem, 0);
> > >
> > > @@ -715,6 +724,16 @@ void *colo_process_incoming_thread(void *opaque)
> > >  goto out;
> > >  }
> > >
> > > +qemu_mutex_lock_iothread();
> > > +cpu_synchronize_all_pre_loadvm();
> > > +ret = qemu_loadvm_state_main(mis->from_src_file, mis);
> > > +qemu_mutex_unlock_iothread();
> > > +
> > > +if (ret < 0) {
> > > +error_report("Load VM's live state (ram) error");
> > > +goto out;
> > > +}
> > > +
> > >  value = colo_receive_message_value(mis->from_src_file,
> > >   COLO_MESSAGE_VMSTATE_SIZE, &local_err);
> > >  

Re: [Qemu-devel] [virtio-dev] Re: [PATCH 2/3] Add "Group Identifier" support to PCIe bridges.

2018-06-19 Thread Michael S. Tsirkin
On Tue, Jun 19, 2018 at 01:36:17PM -0500, Venu Busireddy wrote:
> On 2018-06-19 21:21:23 +0300, Michael S. Tsirkin wrote:
> > On Tue, Jun 19, 2018 at 01:14:06PM -0500, Venu Busireddy wrote:
> > > On 2018-06-19 20:24:12 +0300, Michael S. Tsirkin wrote:
> > > > On Tue, Jun 19, 2018 at 11:32:26AM -0500, Venu Busireddy wrote:
> > > > > Add a "Vendor-Specific" capability to the PCIe bridge, to contain the
> > > > > "Group Identifier" (UUID) that will be used to pair a virtio device with
> > > > > the passthrough device attached to that bridge.
> > > > > 
> > > > > This capability is added to the bridge iff the "uuid" option is specified
> > > > > for the bridge device, via the qemu command line. Also, the bridge's
> > > > > Vendor ID is changed to PCI_VENDOR_ID_REDHAT, and Device ID is changed
> > > > > to PCI_DEVICE_ID_REDHAT_PCIE_BRIDGE (from the default values), when the
> > > > > "uuid" option is present.
> > > > > 
> > > > > Signed-off-by: Venu Busireddy 
> > > > 
> > > > I don't see why we should add it to all bridges.
> > > > Let's just add it to ones that already have the RH vendor ID?
> > > 
> > > No. I am not adding the capability to all bridges.
> > > 
> > > In the earlier discussions, we agreed that the bridge be left as
> > > Intel bridge if we do not intend to use it for storing the pairing
> > > information. If we do intend to store the pairing information in the
> > > bridge, we wanted to change the bridge's Vendor ID to RH Vendor ID to
> > > avoid confusion. In other words, bridges with RH Vendor ID come into
> > > existence only when there is an intent to store the pairing information
> > > in the bridge.
> > > 
> > > Accordingly, if the "uuid" option is specified for the bridge, it
> > > is assumed that the user intends to use the bridge for storing the
> > > pairing information, and hence, the capability is added to the bridge,
> > > and the Vendor ID is changed to RH Vendor ID. If the "uuid" option
> > > is not specified, the bridge remains as Intel bridge, and without the
> > > vendor-specific capability.
> > > 
> > > Venu
> > 
> > Yes but the way to do it is not to tweak the vendor and device ID,
> > instead, just add the UUID property to bridges that already have the
> > correct vendor and device id.
> 
> I was using ioh3420 as the bridge device, because that is what is
> recommended here:
> 
>   https://git.qemu.org/?p=qemu.git;a=blob_plain;f=docs/pcie.txt;hb=HEAD
> 
> ioh3420 defaults to the Intel Vendor ID. Hence the tweak to change the
> Vendor ID to RH Vendor ID.
> 
> Is there another bridge device other than ioh3420 that I should use?
> what device do you suggest? 
> 
> Thanks,
> 
> Venu

For pci, use hw/pci-bridge/pci_bridge_dev.c
Maybe allocate a special ID for grouping bridges.

For express, add your own downstream port.
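
For readers following the thread, the capability under discussion is small. A rough sketch of writing one into a config-space buffer is below. It assumes a layout that the thread does not spell out (the usual 3-byte vendor-specific header of ID, next pointer, and length, followed by the 16 raw UUID bytes); the function and macro names are hypothetical, and 0xCC in the usage is simply the offset the quoted patch picks for ioh3420.

```c
#include <stdint.h>
#include <string.h>

#define CFG_SPACE_SIZE  256     /* conventional PCI config space */
#define CAP_ID_VENDOR   0x09    /* PCI "Vendor-Specific" capability ID */
#define CAP_HDR_LEN     3       /* id, next pointer, length */
#define UUID_LEN        16

/*
 * Hypothetical helper: place a vendor-specific capability carrying a
 * UUID at 'offset'.  The payload layout is an assumption made for this
 * sketch.  Returns 0 on success, -1 if the capability does not fit.
 */
int add_uuid_cap_sketch(uint8_t *cfg, uint8_t offset, uint8_t next,
                        const uint8_t uuid[UUID_LEN])
{
    /* Capabilities live after the 64-byte standard header. */
    if (offset < 0x40 || offset + CAP_HDR_LEN + UUID_LEN > CFG_SPACE_SIZE) {
        return -1;
    }
    cfg[offset] = CAP_ID_VENDOR;
    cfg[offset + 1] = next;                     /* next capability, 0 = end */
    cfg[offset + 2] = CAP_HDR_LEN + UUID_LEN;   /* total length in bytes */
    memcpy(&cfg[offset + CAP_HDR_LEN], uuid, UUID_LEN);
    return 0;
}
```

A guest would find this by walking the capability list and matching ID 0x09. Linking the new entry into that list is left out of the sketch.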


> > 
> > 
> > > > 
> > > > 
> > > > > ---
> > > > >  hw/pci-bridge/ioh3420.c|  2 ++
> > > > >  hw/pci-bridge/pcie_root_port.c |  7 +++
> > > > >  hw/pci/pci_bridge.c| 32 
> > > > >  include/hw/pci/pci.h   |  2 ++
> > > > >  include/hw/pci/pcie.h  |  1 +
> > > > >  include/hw/pci/pcie_port.h |  1 +
> > > > >  6 files changed, 45 insertions(+)
> > > > > 
> > > > > diff --git a/hw/pci-bridge/ioh3420.c b/hw/pci-bridge/ioh3420.c
> > > > > index a451d74ee6..b6b9ebc726 100644
> > > > > --- a/hw/pci-bridge/ioh3420.c
> > > > > +++ b/hw/pci-bridge/ioh3420.c
> > > > > @@ -35,6 +35,7 @@
> > > > >  #define IOH_EP_MSI_SUPPORTED_FLAGS  PCI_MSI_FLAGS_MASKBIT
> > > > >  #define IOH_EP_MSI_NR_VECTOR2
> > > > >  #define IOH_EP_EXP_OFFSET   0x90
> > > > > +#define IOH_EP_VENDOR_OFFSET0xCC
> > > > >  #define IOH_EP_AER_OFFSET   0x100
> > > > >  
> > > > >  /*
> > > > > @@ -111,6 +112,7 @@ static void ioh3420_class_init(ObjectClass 
> > > > > *klass, void *data)
> > > > >  rpc->exp_offset = IOH_EP_EXP_OFFSET;
> > > > >  rpc->aer_offset = IOH_EP_AER_OFFSET;
> > > > >  rpc->ssvid_offset = IOH_EP_SSVID_OFFSET;
> > > > > +rpc->vendor_offset = IOH_EP_VENDOR_OFFSET;
> > > > >  rpc->ssid = IOH_EP_SSVID_SSID;
> > > > >  }
> > > > >  
> > > > > > diff --git a/hw/pci-bridge/pcie_root_port.c b/hw/pci-bridge/pcie_root_port.c
> > > > > index 45f9e8cd4a..ba470c7fda 100644
> > > > > --- a/hw/pci-bridge/pcie_root_port.c
> > > > > +++ b/hw/pci-bridge/pcie_root_port.c
> > > > > @@ -71,6 +71,12 @@ static void rp_realize(PCIDevice *d, Error **errp)
> > > > >  goto err_bridge;
> > > > >  }
> > > > >  
> > > > > +rc = pci_bridge_vendor_init(d, rpc->vendor_offset, errp);
> > > > > +if (rc < 0) {
> > > > > > +error_append_hint(errp, "Can't init group ID, error %d\n", rc);
> > > > > +goto err_bridge;
> > > > > +}
> > > > > +
> > > > >  if (rpc->interrupts_init) {
> > > > >  rc = rpc->interrupts_init(d, errp);
> > > > >  if (rc < 0) {
> > > > > @@ 

Re: [Qemu-devel] [PATCH 6/7] block/qcow2-refcount: fix out-of-file L1 entries to be zero

2018-06-19 Thread Eric Blake

On 06/19/2018 01:34 PM, Vladimir Sementsov-Ogievskiy wrote:

Zero out a corrupted L1 table entry that references an L2 table outside
the underlying file.
Zero L1 table entry means that "the L2 table and all clusters described
by this L2 table are unallocated."

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
  block/qcow2-refcount.c | 37 +
  1 file changed, 37 insertions(+)

diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index d993252fb6..3c9e2da39e 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -1641,6 +1641,29 @@ static int fix_l2_entry_to_zero(BlockDriverState *bs, BdrvCheckResult *res,
  return ret;
  }
  
+/* Zero out L1 entry

+ *
+ * Returns: -errno if overlap check failed
+ *  0 if write failed


If the write failed, wouldn't there be an errno value worth returning?


+ *  1 on success
+ */
+static int fix_l1_entry_to_zero(BlockDriverState *bs, BdrvCheckResult *res,
+BdrvCheckMode fix, int64_t l1_offset,
+int l1_index, bool active,
+const char *fmt, ...)
+{
+int ret;
+int ign = active ? QCOW2_OL_ACTIVE_L2 : QCOW2_OL_INACTIVE_L2;
+va_list args;
+
+va_start(args, fmt);
+ret = fix_table_entry(bs, res, fix, "L1", l1_offset, l1_index, 0, ign,
+  fmt, args);
+va_end(args);
+
+return ret;
+}
+
  /*
   * Increases the refcount in the given refcount table for the all clusters
   * referenced in the L2 table. While doing so, performs some checks on L2
@@ -1837,6 +1860,20 @@ static int check_refcounts_l1(BlockDriverState *bs,
  if (l2_offset) {
  /* Mark L2 table as used */
  l2_offset &= L1E_OFFSET_MASK;
+if (l2_offset >= bdrv_getlength(bs->file->bs)) {


Again, bdrv_getlength() can fail; you want to make sure that you check 
for failures before using it in comparisons.



+ret = fix_l1_entry_to_zero(
+bs, res, fix, l1_table_offset, i, active,
+"l2 table offset out of file: offset 0x%" PRIx64,
+l2_offset);
+if (ret < 0) {
+/* Something is seriously wrong, so abort checking
+ * this L1 table */
+goto fail;
+}
+
+continue;
+}
+
  ret = qcow2_inc_refcounts_imrt(bs, res,
 refcount_table, refcount_table_size,
 l2_offset, s->cluster_size);



--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



Re: [Qemu-devel] [PATCH v2 05/11] hw/arm/virt: GICv3 DT node with one or two redistributor regions

2018-06-19 Thread Laszlo Ersek
Hi Eric,

sorry about the late followup. I have one question (mainly for Ard):

On 06/15/18 16:28, Eric Auger wrote:
> This patch allows the creation of a GICv3 node with 1 or 2
> redistributor regions depending on the number of smp_cpus.
> The second redistributor region is located just after the
> existing RAM region, at 256GB, and contains up to 512 vcpus.
>
> Please refer to kernel documentation for further node details:
> Documentation/devicetree/bindings/interrupt-controller/arm,gic-v3.txt
>
> Signed-off-by: Eric Auger 
> Reviewed-by: Andrew Jones 
>
> ---
> v1 (virt3.0) -> v2
> - Added Drew's R-b
>
> v2 -> v3:
> - VIRT_GIC_REDIST2 is now 64MB large, ie. 512 redistributor capacity
> - virt_gicv3_redist_region_count does not test kvm_irqchip_in_kernel
>   anymore
> ---
>  hw/arm/virt.c | 29 -
>  include/hw/arm/virt.h | 14 ++
>  2 files changed, 38 insertions(+), 5 deletions(-)
>
> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> index 2885d18..d9f72eb 100644
> --- a/hw/arm/virt.c
> +++ b/hw/arm/virt.c
> @@ -148,6 +148,8 @@ static const MemMapEntry a15memmap[] = {
>  [VIRT_PCIE_PIO] =   { 0x3eff0000, 0x00010000 },
>  [VIRT_PCIE_ECAM] =  { 0x3f000000, 0x01000000 },
>  [VIRT_MEM] ={ 0x40000000, RAMLIMIT_BYTES },
> +/* Additional 64 MB redist region (can contain up to 512 redistributors) */
> +[VIRT_GIC_REDIST2] ={ 0x4000000000ULL, 0x4000000 },
>  /* Second PCIe window, 512GB wide at the 512GB boundary */
>  [VIRT_PCIE_MMIO_HIGH] =   { 0x8000000000ULL, 0x8000000000ULL },
>  };
> @@ -401,13 +403,30 @@ static void fdt_add_gic_node(VirtMachineState *vms)
>  qemu_fdt_setprop_cell(vms->fdt, "/intc", "#size-cells", 0x2);
>  qemu_fdt_setprop(vms->fdt, "/intc", "ranges", NULL, 0);
>  if (vms->gic_version == 3) {
> +int nb_redist_regions = virt_gicv3_redist_region_count(vms);
> +
>  qemu_fdt_setprop_string(vms->fdt, "/intc", "compatible",
>  "arm,gic-v3");
> -qemu_fdt_setprop_sized_cells(vms->fdt, "/intc", "reg",
> - 2, vms->memmap[VIRT_GIC_DIST].base,
> - 2, vms->memmap[VIRT_GIC_DIST].size,
> - 2, vms->memmap[VIRT_GIC_REDIST].base,
> - 2, vms->memmap[VIRT_GIC_REDIST].size);
> +
> +qemu_fdt_setprop_cell(vms->fdt, "/intc",
> +  "#redistributor-regions", nb_redist_regions);
> +
> +if (nb_redist_regions == 1) {
> +qemu_fdt_setprop_sized_cells(vms->fdt, "/intc", "reg",
> + 2, vms->memmap[VIRT_GIC_DIST].base,
> + 2, vms->memmap[VIRT_GIC_DIST].size,
> + 2, vms->memmap[VIRT_GIC_REDIST].base,
> + 2, vms->memmap[VIRT_GIC_REDIST].size);
> +} else {
> +qemu_fdt_setprop_sized_cells(vms->fdt, "/intc", "reg",
> + 2, vms->memmap[VIRT_GIC_DIST].base,
> + 2, vms->memmap[VIRT_GIC_DIST].size,
> + 2, vms->memmap[VIRT_GIC_REDIST].base,
> + 2, vms->memmap[VIRT_GIC_REDIST].size,
> + 2, vms->memmap[VIRT_GIC_REDIST2].base,
> + 2, vms->memmap[VIRT_GIC_REDIST2].size);
> +}
> +
>  if (vms->virt) {
>  qemu_fdt_setprop_cells(vms->fdt, "/intc", "interrupts",
> GIC_FDT_IRQ_TYPE_PPI, ARCH_GICV3_MAINT_IRQ,
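
As a side note on the layout quoted above, the figures can be checked
with a short calculation. This is only a sketch under stated
assumptions: each GICv3 redistributor is taken to occupy two 64 KiB
frames (128 KiB in total), and the "reg" property is taken to hold one
(base, size) pair for the distributor plus one pair per redistributor
region, with every value encoded as 2 cells, as in the
qemu_fdt_setprop_sized_cells() calls above. The helper names are made up
for illustration.

```c
#include <stdint.h>

/* Assumption: one GICv3 redistributor occupies two 64 KiB frames. */
#define REDIST_FRAME_SIZE (64 * 1024ULL)
#define REDIST_SIZE       (2 * REDIST_FRAME_SIZE)

/* Hypothetical helper: redistributors that fit into region_size bytes. */
static uint64_t redist_capacity(uint64_t region_size)
{
    return region_size / REDIST_SIZE;
}

/*
 * Hypothetical helper: size in bytes of the "reg" property. It holds one
 * (base, size) pair for the distributor plus one pair per redistributor
 * region; each value takes 2 cells of 4 bytes.
 */
static uint32_t gic_reg_prop_size(uint32_t nb_redist_regions)
{
    const uint32_t values_per_pair = 2;
    const uint32_t cells_per_value = 2;
    const uint32_t bytes_per_cell = 4;

    return (1 + nb_redist_regions) * values_per_pair *
           cells_per_value * bytes_per_cell;
}
```

With these assumptions a 64 MB region (0x4000000 bytes) holds 512
redistributors, and the property grows from 32 bytes with one
redistributor region to 48 bytes with two, while the first redistributor
region keeps its position in the property.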

In edk2, we have the following code in
"ArmVirtPkg/Library/ArmVirtGicArchLib/ArmVirtGicArchLib.c":

  switch (GicRevision) {

  case 3:
//
// The GIC v3 DT binding describes a series of at least 3 physical (base
// addresses, size) pairs: the distributor interface (GICD), at least one
// redistributor region (GICR) containing dedicated redistributor
// interfaces for all individual CPUs, and the CPU interface (GICC).
// Under virtualization, we assume that the first redistributor region
// listed covers the boot CPU. Also, our GICv3 driver only supports the
// system register CPU interface, so we can safely ignore the MMIO version
// which is listed after the sequence of redistributor interfaces.
// This means we are only interested in the first two memory regions
// supplied, and ignore everything else.
//
ASSERT (RegSize >= 32);

// RegProp[0..1] == { GICD base, GICD size }
DistBase = SwapBytes64 (Reg[0]);
ASSERT (DistBase < MAX_UINTN);

// RegProp[2..3] == { GICR base, GICR size }
RedistBase = SwapBytes64 (Reg[2]);
ASSERT (RedistBase < MAX_UINTN);

PcdStatus = PcdSet64S 

Re: [Qemu-devel] [PATCH 3/7] block/qcow2-refcount: check_refcounts_l2: refactor compressed case

2018-06-19 Thread Eric Blake

On 06/19/2018 01:34 PM, Vladimir Sementsov-Ogievskiy wrote:

Separate offset and size of compressed cluster.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
  block/qcow2-refcount.c | 15 ++-
  1 file changed, 10 insertions(+), 5 deletions(-)


Hmm, I wonder if this duplicates my pending patch:

https://lists.gnu.org/archive/html/qemu-devel/2018-04/msg04542.html

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



Re: [Qemu-devel] [PATCH v6 5/6] iotests: Add new test 214 for max compressed cluster offset

2018-06-19 Thread Eric Blake

On 04/26/2018 07:10 AM, Alberto Garcia wrote:

On Thu 26 Apr 2018 04:51:28 AM CEST, Eric Blake wrote:

If you have a capable file system (tmpfs is good, ext4 not so much;
run ./check with TEST_DIR pointing to a good location so as not
to skip the test), it's actually possible to create a qcow2 file
that expands to a sparse 512T image with just over 38M of content.
The test is not the world's fastest (qemu crawling through 256M
bits of refcount table to find the next cluster to allocate takes
several seconds, as does qemu-img check reporting millions of
leaked clusters); but it DOES catch the problem that the previous
patch just fixed where writing a compressed cluster to a full
image ended up overwriting the wrong cluster.

Suggested-by: Max Reitz 
Signed-off-by: Eric Blake 


Nice test :-)

Reviewed-by: Alberto Garcia 


214 is already in the tree in the meantime; this will need a rebase to 
pick the next available test number (220 might be claimed, so 222?)


--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



Re: [Qemu-devel] [PATCH] hw/pci-host/xilinx-pcie: don't make "io" region be RAM

2018-06-19 Thread Alistair Francis
On Tue, Jun 19, 2018 at 5:07 AM, Peter Maydell  wrote:
> Currently we use memory_region_init_ram_nomigrate() to create
> the "io" memory region to pass to pci_register_root_bus().
> This is a dummy region, because this PCI controller doesn't
> support accesses to PCI IO space.
>
> There is no reason for the dummy region to be a RAM region;
> it is only used as a place where PCI BARs can be mapped,
> and if you could get a PCI card to do a bus master access
> to the IO space it should not get acts-like-RAM behaviour.
> Use a simple container memory region instead. (We do have
> one PCI card model which can do bus master accesses to IO
> space -- the LSI53C895A SCSI adaptor.)
>
> This avoids the oddity of having a memory region which is
> RAM but where the RAM is not migrated.
>
> Note that the size of the region we use here has no
> effect on behaviour.
>
> Signed-off-by: Peter Maydell 

Reviewed-by: Alistair Francis 

Alistair

> ---
>  hw/pci-host/xilinx-pcie.c | 5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)
>
> diff --git a/hw/pci-host/xilinx-pcie.c b/hw/pci-host/xilinx-pcie.c
> index 044e312dc18..b0a31b917d8 100644
> --- a/hw/pci-host/xilinx-pcie.c
> +++ b/hw/pci-host/xilinx-pcie.c
> @@ -120,9 +120,8 @@ static void xilinx_pcie_host_realize(DeviceState *dev, Error **errp)
>  memory_region_init(&s->mmio, OBJECT(s), "mmio", UINT64_MAX);
>  memory_region_set_enabled(&s->mmio, false);
>
> -/* dummy I/O region */
> -memory_region_init_ram_nomigrate(&s->io, OBJECT(s), "io", 16, NULL);
> -memory_region_set_enabled(&s->io, false);
> +/* dummy PCI I/O region (not visible to the CPU) */
> +memory_region_init(&s->io, OBJECT(s), "io", 16);
>
>  /* interrupt out */
>  qdev_init_gpio_out_named(dev, &s->irq, "interrupt_out", 1);
> --
> 2.17.1
>
>



Re: [Qemu-devel] [PATCH v6 0/6] minor qcow2 compression improvements

2018-06-19 Thread Eric Blake

ping

On 04/25/2018 09:51 PM, Eric Blake wrote:

Even though v5 was posted earlier today, it was worth a respin:
- 2/6: add R-b [Berto]
- 4/6, 6/6: improve commit messages [Max]
- 5/6: new patch, with an iotests proving that 4/6 is a bug fix [Max]

The new test is rather slow (nearly 90 seconds for me using
tmpfs) unless it skips entirely (such as testing on ext4); ideas
for speeding it up are welcome (translation: maybe qemu should
optimize the search for the next available cluster to allocate,
and/or qemu-img check should be faster at reporting leaked
clusters)

001/6:[] [--] 'qcow2: Prefer byte-based calls into bs->file'
002/6:[] [--] 'qcow2: Document some maximum size constraints'
003/6:[] [--] 'qcow2: Reduce REFT_OFFSET_MASK'
004/6:[] [--] 'qcow2: Don't allow overflow during cluster allocation'
005/6:[down] 'iotests: Add new test 214 for max compressed cluster offset'
006/6:[] [--] 'qcow2: Avoid memory over-allocation on compressed images'

Eric Blake (6):
   qcow2: Prefer byte-based calls into bs->file
   qcow2: Document some maximum size constraints
   qcow2: Reduce REFT_OFFSET_MASK
   qcow2: Don't allow overflow during cluster allocation
   iotests: Add new test 214 for max compressed cluster offset
   qcow2: Avoid memory over-allocation on compressed images

  docs/interop/qcow2.txt | 40 +--
  block/qcow2.h  |  8 +++-
  block/qcow2-cluster.c  | 32 +--
  block/qcow2-refcount.c | 27 -
  block/qcow2.c  |  2 +-
  tests/qemu-iotests/214 | 97 ++
  tests/qemu-iotests/214.out | 54 ++
  tests/qemu-iotests/group   |  1 +
  8 files changed, 234 insertions(+), 27 deletions(-)
  create mode 100755 tests/qemu-iotests/214
  create mode 100644 tests/qemu-iotests/214.out



--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



Re: [Qemu-devel] [PATCH 2/7] block/qcow2-refcount: avoid eating RAM

2018-06-19 Thread Eric Blake

On 06/19/2018 01:34 PM, Vladimir Sementsov-Ogievskiy wrote:

qcow2_inc_refcounts_imrt() (through realloc_refcount_array()) can eat
unpredicted amount of memory on corrupted table entries, which are


s/unpredicted/an unpredictable/


referencing regions far beyond the end of file.

Prevent this, by skipping such regions from further processing.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
  block/qcow2-refcount.c | 8 
  1 file changed, 8 insertions(+)

diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index f9d095aa2d..28d21bedc3 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -1505,6 +1505,14 @@ int qcow2_inc_refcounts_imrt(BlockDriverState *bs, BdrvCheckResult *res,
  return 0;
  }
  
+if (offset + size - bdrv_getlength(bs->file->bs) > s->cluster_size) {


bdrv_getlength() can fail (returning a negative value); this needs to be 
refactored so that you aren't performing arithmetic comparisons after 
such a failure (even if that failure is unlikely).
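
One possible shape for that refactoring, as a minimal standalone sketch
rather than a proposed patch: fetch the length once, bail out if it is
negative, and only then compare. region_exceeds_file() and its
parameters are hypothetical names; file_len stands in for the value
returned by bdrv_getlength().

```c
#include <stdint.h>

/*
 * Hypothetical helper, not part of the patch: decide whether the region
 * [offset, offset + size) ends more than one cluster past the end of a
 * file whose length is file_len.
 *
 * file_len stands in for the return value of bdrv_getlength(), which is
 * negative (-errno) on failure.
 *
 * Returns: -errno if file_len reports a failure
 *          1 if the region ends more than one cluster past EOF
 *          0 otherwise
 */
static int region_exceeds_file(int64_t file_len, uint64_t offset,
                               uint64_t size, uint64_t cluster_size)
{
    uint64_t len, end;

    if (file_len < 0) {
        /* Propagate the error; never do arithmetic on it. */
        return (int)file_len;
    }

    len = (uint64_t)file_len;
    end = offset + size;
    if (end < offset) {
        /* offset + size wrapped around: certainly beyond the file. */
        return 1;
    }
    if (end <= len) {
        return 0;
    }
    return end - len > cluster_size;
}
```

A caller would treat a negative result as a check error and a result of
1 as a corruption, so a failed length query can never be mistaken for an
out-of-file region.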



+fprintf(stderr, "ERROR: counting reference for region exceeding the "
+"end of the file by more than one cluster: offset 0x%" PRIx64
+" size 0x%" PRIx64 "\n", offset, size);


Why is this dumping directly to stderr?

/me reads the file

Oh.  We probably ought to fix the code to pass an Error **errp parameter 
through the callstack, but that's a bigger audit (and not the fault of 
your patch for copying existing usage).



+res->corruptions++;
+return 0;
+}
+
  start = start_of_cluster(s, offset);
  last = start_of_cluster(s, offset + size - 1);
  for(cluster_offset = start; cluster_offset <= last;



--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



Re: [Qemu-devel] [virtio-dev] Re: [PATCH 2/3] Add "Group Identifier" support to PCIe bridges.

2018-06-19 Thread Venu Busireddy
On 2018-06-19 21:21:23 +0300, Michael S. Tsirkin wrote:
> On Tue, Jun 19, 2018 at 01:14:06PM -0500, Venu Busireddy wrote:
> > On 2018-06-19 20:24:12 +0300, Michael S. Tsirkin wrote:
> > > On Tue, Jun 19, 2018 at 11:32:26AM -0500, Venu Busireddy wrote:
> > > > Add a "Vendor-Specific" capability to the PCIe bridge, to contain the
> > > > "Group Identifier" (UUID) that will be used to pair a virtio device with
> > > > the passthrough device attached to that bridge.
> > > > 
> > > > This capability is added to the bridge iff the "uuid" option is
> > > > specified for the bridge device, via the qemu command line. Also, the
> > > > bridge's Vendor ID is changed to PCI_VENDOR_ID_REDHAT, and Device ID is
> > > > changed to PCI_DEVICE_ID_REDHAT_PCIE_BRIDGE (from the default values),
> > > > when the "uuid" option is present.
> > > > 
> > > > Signed-off-by: Venu Busireddy 
> > > 
> > > I don't see why we should add it to all bridges.
> > > Let's just add it to ones that already have the RH vendor ID?
> > 
> > No. I am not adding the capability to all bridges.
> > 
> > In the earlier discussions, we agreed that the bridge be left as
> > Intel bridge if we do not intend to use it for storing the pairing
> > information. If we do intend to store the pairing information in the
> > bridge, we wanted to change the bridge's Vendor ID to RH Vendor ID to
> > avoid confusion. In other words, bridges with RH Vendor ID come into
> > existence only when there is an intent to store the pairing information
> > in the bridge.
> > 
> > Accordingly, if the "uuid" option is specified for the bridge, it
> > is assumed that the user intends to use the bridge for storing the
> > pairing information, and hence, the capability is added to the bridge,
> > and the Vendor ID is changed to RH Vendor ID. If the "uuid" option
> > is not specified, the bridge remains as Intel bridge, and without the
> > vendor-specific capability.
> > 
> > Venu
> 
> Yes but the way to do it is not to tweak the vendor and device ID,
> instead, just add the UUID property to bridges that already have the
> correct vendor and device id.

I was using ioh3420 as the bridge device, because that is what is
recommended here:

  https://git.qemu.org/?p=qemu.git;a=blob_plain;f=docs/pcie.txt;hb=HEAD

ioh3420 defaults to the Intel Vendor ID. Hence the tweak to change the
Vendor ID to RH Vendor ID.

Is there another bridge device other than ioh3420 that I should use?
What device do you suggest?

Thanks,

Venu

> 
> 
> > > 
> > > 
> > > > ---
> > > >  hw/pci-bridge/ioh3420.c|  2 ++
> > > >  hw/pci-bridge/pcie_root_port.c |  7 +++
> > > >  hw/pci/pci_bridge.c| 32 
> > > >  include/hw/pci/pci.h   |  2 ++
> > > >  include/hw/pci/pcie.h  |  1 +
> > > >  include/hw/pci/pcie_port.h |  1 +
> > > >  6 files changed, 45 insertions(+)
> > > > 
> > > > diff --git a/hw/pci-bridge/ioh3420.c b/hw/pci-bridge/ioh3420.c
> > > > index a451d74ee6..b6b9ebc726 100644
> > > > --- a/hw/pci-bridge/ioh3420.c
> > > > +++ b/hw/pci-bridge/ioh3420.c
> > > > @@ -35,6 +35,7 @@
> > > >  #define IOH_EP_MSI_SUPPORTED_FLAGS  PCI_MSI_FLAGS_MASKBIT
> > > > >  #define IOH_EP_MSI_NR_VECTOR 2
> > > > >  #define IOH_EP_EXP_OFFSET   0x90
> > > > > +#define IOH_EP_VENDOR_OFFSET 0xCC
> > > >  #define IOH_EP_AER_OFFSET   0x100
> > > >  
> > > >  /*
> > > > @@ -111,6 +112,7 @@ static void ioh3420_class_init(ObjectClass *klass, 
> > > > void *data)
> > > >  rpc->exp_offset = IOH_EP_EXP_OFFSET;
> > > >  rpc->aer_offset = IOH_EP_AER_OFFSET;
> > > >  rpc->ssvid_offset = IOH_EP_SSVID_OFFSET;
> > > > +rpc->vendor_offset = IOH_EP_VENDOR_OFFSET;
> > > >  rpc->ssid = IOH_EP_SSVID_SSID;
> > > >  }
> > > >  
> > > > > diff --git a/hw/pci-bridge/pcie_root_port.c b/hw/pci-bridge/pcie_root_port.c
> > > > index 45f9e8cd4a..ba470c7fda 100644
> > > > --- a/hw/pci-bridge/pcie_root_port.c
> > > > +++ b/hw/pci-bridge/pcie_root_port.c
> > > > @@ -71,6 +71,12 @@ static void rp_realize(PCIDevice *d, Error **errp)
> > > >  goto err_bridge;
> > > >  }
> > > >  
> > > > +rc = pci_bridge_vendor_init(d, rpc->vendor_offset, errp);
> > > > +if (rc < 0) {
> > > > +error_append_hint(errp, "Can't init group ID, error %d\n", rc);
> > > > +goto err_bridge;
> > > > +}
> > > > +
> > > >  if (rpc->interrupts_init) {
> > > >  rc = rpc->interrupts_init(d, errp);
> > > >  if (rc < 0) {
> > > > @@ -137,6 +143,7 @@ static void rp_exit(PCIDevice *d)
> > > >  static Property rp_props[] = {
> > > >  DEFINE_PROP_BIT(COMPAT_PROP_PCP, PCIDevice, cap_present,
> > > >  QEMU_PCIE_SLTCAP_PCP_BITNR, true),
> > > > +DEFINE_PROP_UUID(COMPAT_PROP_UUID, PCIDevice, uuid, false),
> > > >  DEFINE_PROP_END_OF_LIST()
> > > >  };
> > > >  
> > > > diff --git a/hw/pci/pci_bridge.c b/hw/pci/pci_bridge.c
> > > > index 40a39f57cb..c63bc439f7 

[Qemu-devel] [PATCH 5/7] block/qcow2-refcount: check_refcounts_l2: split fix_l2_entry_to_zero

2018-06-19 Thread Vladimir Sementsov-Ogievskiy
Split entry repair into a separate function, to be reused later.

Note: the entry in the in-memory L2 table (a local variable in
check_refcounts_l2) is no longer updated after this patch.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 block/qcow2-refcount.c | 147 -
 1 file changed, 109 insertions(+), 38 deletions(-)

diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 02583f260b..d993252fb6 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -1548,6 +1548,99 @@ enum {
 CHECK_FRAG_INFO = 0x2,  /* update BlockFragInfo counters */
 };
 
+/* Update entry in L1 or L2 table
+ *
+ * Returns: -errno if overlap check failed
+ *  0 if write failed
+ *  1 on success
+ */
+static int write_table_entry(BlockDriverState *bs, const char *table_name,
+ uint64_t table_offset, int entry_index,
+ uint64_t new_val, int ign)
+{
+int ret;
+uint64_t entry_offset =
+table_offset + (uint64_t)entry_index * sizeof(new_val);
+
+cpu_to_be64s(&new_val);
+ret = qcow2_pre_write_overlap_check(bs, ign, entry_offset, sizeof(new_val));
+if (ret < 0) {
+fprintf(stderr,
+"ERROR: Can't write %s table entry: overlap check failed: %s\n",
+table_name, strerror(-ret));
+return ret;
+}
+
+ret = bdrv_pwrite_sync(bs->file, entry_offset, &new_val, sizeof(new_val));
+if (ret < 0) {
+fprintf(stderr, "ERROR: Failed to overwrite %s table entry: %s\n",
+table_name, strerror(-ret));
+return 0;
+}
+
+return 1;
+}
+
+/* Try to fix (if allowed) entry in L1 or L2 table. Update @res correspondingly.
+ *
+ * Returns: -errno if overlap check failed
+ *  0 if entry was not updated for other reason
+ *(fixing disabled or write failed)
+ *  1 on success
+ */
+static int fix_table_entry(BlockDriverState *bs, BdrvCheckResult *res,
+   BdrvCheckMode fix, const char *table_name,
+   uint64_t table_offset, int entry_index,
+   uint64_t new_val, int ign,
+   const char *fmt, va_list args)
+{
+int ret;
+
+fprintf(stderr, fix & BDRV_FIX_ERRORS ? "Repairing: " : "ERROR: ");
+vfprintf(stderr, fmt, args);
+fprintf(stderr, "\n");
+
+if (!(fix & BDRV_FIX_ERRORS)) {
+res->corruptions++;
+return 0;
+}
+
+ret = write_table_entry(bs, table_name, table_offset, entry_index, new_val,
+ign);
+
+if (ret == 1) {
+res->corruptions_fixed++;
+} else {
+res->check_errors++;
+}
+
+return ret;
+}
+
+/* Make L2 entry to be QCOW2_CLUSTER_ZERO_PLAIN
+ *
+ * Returns: -errno if overlap check failed
+ *  0 if write failed
+ *  1 on success
+ */
+static int fix_l2_entry_to_zero(BlockDriverState *bs, BdrvCheckResult *res,
+BdrvCheckMode fix, int64_t l2_offset,
+int l2_index, bool active,
+const char *fmt, ...)
+{
+int ret;
+int ign = active ? QCOW2_OL_ACTIVE_L2 : QCOW2_OL_INACTIVE_L2;
+uint64_t l2_entry = QCOW_OFLAG_ZERO;
+va_list args;
+
+va_start(args, fmt);
+ret = fix_table_entry(bs, res, fix, "L2", l2_offset, l2_index, l2_entry,
+  ign, fmt, args);
+va_end(args);
+
+return ret;
+}
+
 /*
  * Increases the refcount in the given refcount table for the all clusters
  * referenced in the L2 table. While doing so, performs some checks on L2
@@ -1640,46 +1733,24 @@ static int check_refcounts_l2(BlockDriverState *bs, BdrvCheckResult *res,
 if (qcow2_get_cluster_type(l2_entry) ==
 QCOW2_CLUSTER_ZERO_ALLOC)
 {
-fprintf(stderr, "%s offset=%" PRIx64 ": Preallocated zero "
-"cluster is not properly aligned; L2 entry "
-"corrupted.\n",
-fix & BDRV_FIX_ERRORS ? "Repairing" : "ERROR",
+ret = fix_l2_entry_to_zero(
+bs, res, fix, l2_offset, i, active,
+"offset=%" PRIx64 ": Preallocated zero cluster is "
+"not properly aligned; L2 entry corrupted.",
 offset);
-if (fix & BDRV_FIX_ERRORS) {
-uint64_t l2e_offset =
-l2_offset + (uint64_t)i * sizeof(uint64_t);
-int ign = active ? QCOW2_OL_ACTIVE_L2 :
-   QCOW2_OL_INACTIVE_L2;
-
-l2_entry = QCOW_OFLAG_ZERO;
-l2_table[i] = cpu_to_be64(l2_entry);
-ret = qcow2_pre_write_overlap_check(bs, ign,
-

[Qemu-devel] [PATCH 3/7] block/qcow2-refcount: check_refcounts_l2: refactor compressed case

2018-06-19 Thread Vladimir Sementsov-Ogievskiy
Separate offset and size of compressed cluster.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 block/qcow2-refcount.c | 15 ++-
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 28d21bedc3..42167b7040 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -1564,7 +1564,7 @@ static int check_refcounts_l2(BlockDriverState *bs, BdrvCheckResult *res,
 BDRVQcow2State *s = bs->opaque;
 uint64_t *l2_table, l2_entry;
 uint64_t next_contiguous_offset = 0;
-int i, l2_size, nb_csectors, ret;
+int i, l2_size, ret;
 
 /* Read L2 table from disk */
 l2_size = s->l2_size * sizeof(uint64_t);
@@ -1583,6 +1583,9 @@ static int check_refcounts_l2(BlockDriverState *bs, BdrvCheckResult *res,
 
 switch (qcow2_get_cluster_type(l2_entry)) {
 case QCOW2_CLUSTER_COMPRESSED:
+{
+int64_t csize, coffset;
+
 /* Compressed clusters don't have QCOW_OFLAG_COPIED */
 if (l2_entry & QCOW_OFLAG_COPIED) {
 fprintf(stderr, "ERROR: coffset=0x%" PRIx64 ": "
@@ -1593,12 +1596,13 @@ static int check_refcounts_l2(BlockDriverState *bs, BdrvCheckResult *res,
 }
 
 /* Mark cluster as used */
-nb_csectors = ((l2_entry >> s->csize_shift) &
-   s->csize_mask) + 1;
-l2_entry &= s->cluster_offset_mask;
+csize = (((l2_entry >> s->csize_shift) & s->csize_mask) + 1) *
+BDRV_SECTOR_SIZE;
+coffset = l2_entry & s->cluster_offset_mask &
+  ~(BDRV_SECTOR_SIZE - 1);
 ret = qcow2_inc_refcounts_imrt(bs, res,
refcount_table, refcount_table_size,
-   l2_entry & ~511, nb_csectors * 512);
+   coffset, csize);
 if (ret < 0) {
 goto fail;
 }
@@ -1615,6 +1619,7 @@ static int check_refcounts_l2(BlockDriverState *bs, BdrvCheckResult *res,
 res->bfi.fragmented_clusters++;
 }
 break;
+}
 
 case QCOW2_CLUSTER_ZERO_ALLOC:
 case QCOW2_CLUSTER_NORMAL:
-- 
2.11.1
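
For readers following the arithmetic, here is a minimal standalone
sketch of the decoding this patch separates out. CompressedLayout,
compressed_layout() and decode_compressed() are hypothetical names,
SECTOR_SIZE stands in for BDRV_SECTOR_SIZE, and deriving the shift and
masks from cluster_bits is an assumption based on the qcow2 format
description, not something this patch changes.

```c
#include <stdint.h>

#define SECTOR_SIZE 512ULL /* stands in for BDRV_SECTOR_SIZE */

/* Hypothetical container for the fields read from BDRVQcow2State. */
typedef struct {
    int csize_shift;
    uint64_t csize_mask;
    uint64_t cluster_offset_mask;
} CompressedLayout;

/*
 * Assumption: the sector count uses (cluster_bits - 8) bits, located
 * directly below the two flag bits 62 and 63 of the L2 entry.
 */
static CompressedLayout compressed_layout(int cluster_bits)
{
    CompressedLayout l;

    l.csize_shift = 62 - (cluster_bits - 8);
    l.csize_mask = (1ULL << (cluster_bits - 8)) - 1;
    l.cluster_offset_mask = (1ULL << l.csize_shift) - 1;
    return l;
}

/* Split an L2 entry into a sector-aligned offset and a size in bytes. */
static void decode_compressed(const CompressedLayout *l, uint64_t l2_entry,
                              uint64_t *coffset, uint64_t *csize)
{
    /* The field stores "additional sectors", hence the + 1. */
    *csize = (((l2_entry >> l->csize_shift) & l->csize_mask) + 1) *
             SECTOR_SIZE;
    /* Round the host offset down to a sector boundary. */
    *coffset = l2_entry & l->cluster_offset_mask & ~(SECTOR_SIZE - 1);
}
```

Keeping offset and size in separate variables, both expressed in bytes,
is what lets later checks compare each of them against the cluster size
and the file length independently.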




Re: [Qemu-devel] [PATCH v5 1/6] nbd/server: fix trace

2018-06-19 Thread Eric Blake

On 06/09/2018 10:17 AM, Vladimir Sementsov-Ogievskiy wrote:

Return code = 1 doesn't mean that we parsed base:allocation. Use
correct traces in both -parsed and -skipped cases.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
  nbd/server.c | 9 +++--
  1 file changed, 7 insertions(+), 2 deletions(-)




diff --git a/nbd/server.c b/nbd/server.c
index 9e1f227178..8e02e077ec 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -741,7 +741,10 @@ static int nbd_negotiate_send_meta_context(NBDClient *client,
   * the current name, after the 'base:' portion has been stripped.
   *
   * Return -errno on I/O error, 0 if option was completely handled by
- * sending a reply about inconsistent lengths, or 1 on success. */
+ * sending a reply about inconsistent lengths, or 1 on success.
+ *
+ * Note: return code = 1 doesn't mean that we've parsed "base:allocation"
+ * namespace. It only means that there are no errors.*/


Space before comment tail (actually, the recent conversation on comment 
style says the tail should be on its own line...)


That's something I can tweak on commit.

Reviewed-by: Eric Blake 

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



[Qemu-devel] [PATCH 7/7] block/qcow2-refcount: fix out-of-file L2 entries to be read-as-zero

2018-06-19 Thread Vladimir Sementsov-Ogievskiy
Rewrite a corrupted L2 table entry that references space outside the
underlying file.

Make this L2 table entry read as all zeros without any allocation.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 block/qcow2-refcount.c | 32 
 1 file changed, 32 insertions(+)

diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 3c9e2da39e..cbad8355f3 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -1714,8 +1714,30 @@ static int check_refcounts_l2(BlockDriverState *bs, BdrvCheckResult *res,
 /* Mark cluster as used */
 csize = (((l2_entry >> s->csize_shift) & s->csize_mask) + 1) *
 BDRV_SECTOR_SIZE;
+if (csize > s->cluster_size) {
+ret = fix_l2_entry_to_zero(
+bs, res, fix, l2_offset, i, active,
+"compressed cluster larger than cluster: size 0x%"
+PRIx64, csize);
+if (ret < 0) {
+goto fail;
+}
+continue;
+}
+
 coffset = l2_entry & s->cluster_offset_mask &
   ~(BDRV_SECTOR_SIZE - 1);
+if (coffset >= bdrv_getlength(bs->file->bs)) {
+ret = fix_l2_entry_to_zero(
+bs, res, fix, l2_offset, i, active,
+"compressed cluster out of file: offset 0x%" PRIx64,
+coffset);
+if (ret < 0) {
+goto fail;
+}
+continue;
+}
+
 ret = qcow2_inc_refcounts_imrt(bs, res,
refcount_table, refcount_table_size,
coffset, csize);
@@ -1742,6 +1764,16 @@ static int check_refcounts_l2(BlockDriverState *bs, BdrvCheckResult *res,
 {
 uint64_t offset = l2_entry & L2E_OFFSET_MASK;
 
+if (offset >= bdrv_getlength(bs->file->bs)) {
+ret = fix_l2_entry_to_zero(
+bs, res, fix, l2_offset, i, active,
+"cluster out of file: offset 0x%" PRIx64, offset);
+if (ret < 0) {
+goto fail;
+}
+continue;
+}
+
 if (flags & CHECK_FRAG_INFO) {
 res->bfi.allocated_clusters++;
 if (next_contiguous_offset &&
-- 
2.11.1




[Qemu-devel] [PATCH 1/7] block/qcow2-refcount: fix check_oflag_copied

2018-06-19 Thread Vladimir Sementsov-Ogievskiy
Increase corruptions_fixed only after a successful fix.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 block/qcow2-refcount.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 18c729aa27..f9d095aa2d 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -1816,7 +1816,7 @@ static int check_oflag_copied(BlockDriverState *bs, BdrvCheckResult *res,
 for (i = 0; i < s->l1_size; i++) {
 uint64_t l1_entry = s->l1_table[i];
 uint64_t l2_offset = l1_entry & L1E_OFFSET_MASK;
-bool l2_dirty = false;
+int l2_fixed_entries = 0;
 
 if (!l2_offset) {
 continue;
@@ -1878,8 +1878,7 @@ static int check_oflag_copied(BlockDriverState *bs, BdrvCheckResult *res,
 l2_table[j] = cpu_to_be64(refcount == 1
 ? l2_entry |  QCOW_OFLAG_COPIED
 : l2_entry & ~QCOW_OFLAG_COPIED);
-l2_dirty = true;
-res->corruptions_fixed++;
+l2_fixed_entries++;
 } else {
 res->corruptions++;
 }
@@ -1887,7 +1886,7 @@ static int check_oflag_copied(BlockDriverState *bs, BdrvCheckResult *res,
 }
 }
 
-if (l2_dirty) {
+if (l2_fixed_entries > 0) {
 ret = qcow2_pre_write_overlap_check(bs, QCOW2_OL_ACTIVE_L2,
 l2_offset, s->cluster_size);
 if (ret < 0) {
@@ -1905,6 +1904,7 @@ static int check_oflag_copied(BlockDriverState *bs, BdrvCheckResult *res,
 res->check_errors++;
 goto fail;
 }
+res->corruptions_fixed += l2_fixed_entries;
 }
 }
 
-- 
2.11.1
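
The accounting rule this patch introduces can be shown in isolation.
The following is a minimal sketch, not the qcow2 code: CheckResult is a
hypothetical stand-in for BdrvCheckResult, and account_table_fixes() is
an invented helper that is called once the outcome of writing the
repaired table back is known.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the counters in BdrvCheckResult. */
typedef struct {
    int corruptions;
    int corruptions_fixed;
    int check_errors;
} CheckResult;

/*
 * Account for fixed_entries in-memory repairs of one table, once it is
 * known whether writing the table back succeeded. Repairs only count
 * as fixed when they actually reached the disk.
 */
static void account_table_fixes(CheckResult *res, int fixed_entries,
                                bool write_ok)
{
    if (fixed_entries == 0) {
        return; /* nothing was modified, nothing was written */
    }
    if (write_ok) {
        res->corruptions_fixed += fixed_entries;
    } else {
        res->check_errors++;
    }
}
```

Counting per table after the write, instead of per entry before it,
keeps corruptions_fixed from reporting repairs that were lost because
the overlap check or the write failed.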




[Qemu-devel] [PATCH 6/7] block/qcow2-refcount: fix out-of-file L1 entries to be zero

2018-06-19 Thread Vladimir Sementsov-Ogievskiy
Zero out a corrupted L1 table entry that references an L2 table outside
the underlying file.
Zero L1 table entry means that "the L2 table and all clusters described
by this L2 table are unallocated."

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 block/qcow2-refcount.c | 37 +
 1 file changed, 37 insertions(+)

diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index d993252fb6..3c9e2da39e 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -1641,6 +1641,29 @@ static int fix_l2_entry_to_zero(BlockDriverState *bs, BdrvCheckResult *res,
 return ret;
 }
 
+/* Zero out L1 entry
+ *
+ * Returns: -errno if overlap check failed
+ *  0 if write failed
+ *  1 on success
+ */
+static int fix_l1_entry_to_zero(BlockDriverState *bs, BdrvCheckResult *res,
+BdrvCheckMode fix, int64_t l1_offset,
+int l1_index, bool active,
+const char *fmt, ...)
+{
+int ret;
+int ign = active ? QCOW2_OL_ACTIVE_L2 : QCOW2_OL_INACTIVE_L2;
+va_list args;
+
+va_start(args, fmt);
+ret = fix_table_entry(bs, res, fix, "L1", l1_offset, l1_index, 0, ign,
+  fmt, args);
+va_end(args);
+
+return ret;
+}
+
 /*
  * Increases the refcount in the given refcount table for the all clusters
  * referenced in the L2 table. While doing so, performs some checks on L2
@@ -1837,6 +1860,20 @@ static int check_refcounts_l1(BlockDriverState *bs,
 if (l2_offset) {
 /* Mark L2 table as used */
 l2_offset &= L1E_OFFSET_MASK;
+if (l2_offset >= bdrv_getlength(bs->file->bs)) {
+ret = fix_l1_entry_to_zero(
+bs, res, fix, l1_table_offset, i, active,
+"l2 table offset out of file: offset 0x%" PRIx64,
+l2_offset);
+if (ret < 0) {
+/* Something is seriously wrong, so abort checking
+ * this L1 table */
+goto fail;
+}
+
+continue;
+}
+
 ret = qcow2_inc_refcounts_imrt(bs, res,
refcount_table, refcount_table_size,
l2_offset, s->cluster_size);
-- 
2.11.1




[Qemu-devel] [PATCH 4/7] block/qcow2-refcount: check_refcounts_l2: reduce ignored overlaps

2018-06-19 Thread Vladimir Sementsov-Ogievskiy
Reduce the number of structures ignored in the overlap check: when checking
an active table, ignore only active tables; when checking an inactive table,
ignore only inactive ones.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 block/qcow2-refcount.c | 16 +---
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 42167b7040..02583f260b 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -1559,7 +1559,7 @@ enum {
 static int check_refcounts_l2(BlockDriverState *bs, BdrvCheckResult *res,
   void **refcount_table,
   int64_t *refcount_table_size, int64_t l2_offset,
-  int flags, BdrvCheckMode fix)
+  int flags, BdrvCheckMode fix, bool active)
 {
 BDRVQcow2State *s = bs->opaque;
 uint64_t *l2_table, l2_entry;
@@ -1648,11 +1648,12 @@ static int check_refcounts_l2(BlockDriverState *bs, 
BdrvCheckResult *res,
 if (fix & BDRV_FIX_ERRORS) {
 uint64_t l2e_offset =
 l2_offset + (uint64_t)i * sizeof(uint64_t);
+int ign = active ? QCOW2_OL_ACTIVE_L2 :
+   QCOW2_OL_INACTIVE_L2;
 
 l2_entry = QCOW_OFLAG_ZERO;
 l2_table[i] = cpu_to_be64(l2_entry);
-ret = qcow2_pre_write_overlap_check(bs,
-QCOW2_OL_ACTIVE_L2 | QCOW2_OL_INACTIVE_L2,
+ret = qcow2_pre_write_overlap_check(bs, ign,
 l2e_offset, sizeof(uint64_t));
 if (ret < 0) {
 fprintf(stderr, "ERROR: Overlap check failed\n");
@@ -1726,7 +1727,7 @@ static int check_refcounts_l1(BlockDriverState *bs,
   void **refcount_table,
   int64_t *refcount_table_size,
   int64_t l1_table_offset, int l1_size,
-  int flags, BdrvCheckMode fix)
+  int flags, BdrvCheckMode fix, bool active)
 {
 BDRVQcow2State *s = bs->opaque;
 uint64_t *l1_table = NULL, l2_offset, l1_size2;
@@ -1782,7 +1783,7 @@ static int check_refcounts_l1(BlockDriverState *bs,
 /* Process and check L2 entries */
 ret = check_refcounts_l2(bs, res, refcount_table,
  refcount_table_size, l2_offset, flags,
- fix);
+ fix, active);
 if (ret < 0) {
 goto fail;
 }
@@ -2068,7 +2069,7 @@ static int calculate_refcounts(BlockDriverState *bs, 
BdrvCheckResult *res,
 /* current L1 table */
 ret = check_refcounts_l1(bs, res, refcount_table, nb_clusters,
  s->l1_table_offset, s->l1_size, CHECK_FRAG_INFO,
- fix);
+ fix, true);
 if (ret < 0) {
 return ret;
 }
@@ -2091,7 +2092,8 @@ static int calculate_refcounts(BlockDriverState *bs, 
BdrvCheckResult *res,
 continue;
 }
 ret = check_refcounts_l1(bs, res, refcount_table, nb_clusters,
- sn->l1_table_offset, sn->l1_size, 0, fix);
+ sn->l1_table_offset, sn->l1_size, 0, fix,
+ false);
 if (ret < 0) {
 return ret;
 }
-- 
2.11.1
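
The effect of the narrower ignore mask can be sketched in isolation. The
bit values below are placeholders chosen for the example (the real
QCOW2_OL_* constants are defined in QEMU), and both helper names are
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Placeholder bit values for illustration; not QEMU's real constants. */
enum {
    OL_ACTIVE_L2   = 1 << 0,
    OL_INACTIVE_L2 = 1 << 1,
};

/* Mask of structures to skip when rewriting an L2 entry, as in the patch. */
static int l2_overlap_ignore_mask(bool active)
{
    return active ? OL_ACTIVE_L2 : OL_INACTIVE_L2;
}

/* True if a structure type is still checked under the given ignore mask. */
static bool overlap_is_checked(int ignore, int structure)
{
    assert(structure != 0);
    return (ignore & structure) == 0;
}
```

With the old mask (OL_ACTIVE_L2 | OL_INACTIVE_L2) neither kind of L2 table
was checked. With the new one, a write into an active L2 table is still
checked against inactive L2 tables, and vice versa.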




[Qemu-devel] [PATCH 0/7] qcow2 check improvements

2018-06-19 Thread Vladimir Sementsov-Ogievskiy
Hi all!

We've hit the following problem: after host fs corruption, VM images
became invalid. More interestingly, running qemu-img check on them
allocated all available RAM, after which qemu-img was killed by the
OOM killer.

This was due to corrupted L2 entries that referenced clusters far
beyond the end of the qcow2 file.
02 is a generic fix for the bug, 01 is an unrelated improvement, and 03-07
add diagnostics and fixes for such corrupted table entries.

Questions on 02, 06 and 07:
1. Should restrictions be more or less strict?
2. Are there valid cases, when such entries should not be considered as
   corrupted?

Vladimir Sementsov-Ogievskiy (7):
  block/qcow2-refcount: fix check_oflag_copied
  block/qcow2-refcount: avoid eating RAM
  block/qcow2-refcount: check_refcounts_l2: refactor compressed case
  block/qcow2-refcount: check_refcounts_l2: reduce ignored overlaps
  block/qcow2-refcount: check_refcounts_l2: split fix_l2_entry_to_zero
  block/qcow2-refcount: fix out-of-file L1 entries to be zero
  block/qcow2-refcount: fix out-of-file L2 entries to be read-as-zero

 block/qcow2-refcount.c | 257 +++--
 1 file changed, 206 insertions(+), 51 deletions(-)

-- 
2.11.1




[Qemu-devel] [PATCH 2/7] block/qcow2-refcount: avoid eating RAM

2018-06-19 Thread Vladimir Sementsov-Ogievskiy
qcow2_inc_refcounts_imrt() (through realloc_refcount_array()) can consume
an unpredictable amount of memory on corrupted table entries that
reference regions far beyond the end of the file.

Prevent this by skipping such regions during further processing.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 block/qcow2-refcount.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index f9d095aa2d..28d21bedc3 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -1505,6 +1505,14 @@ int qcow2_inc_refcounts_imrt(BlockDriverState *bs, 
BdrvCheckResult *res,
 return 0;
 }
 
+if (offset + size - bdrv_getlength(bs->file->bs) > s->cluster_size) {
+fprintf(stderr, "ERROR: counting reference for region exceeding the "
+"end of the file by more than one cluster: offset 0x%" PRIx64
+" size 0x%" PRIx64 "\n", offset, size);
+res->corruptions++;
+return 0;
+}
+
 start = start_of_cluster(s, offset);
 last = start_of_cluster(s, offset + size - 1);
 for(cluster_offset = start; cluster_offset <= last;
-- 
2.11.1
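
The bound tested by this patch can be sketched as a standalone predicate.
This is only an illustration under stated assumptions: the helper name
region_far_beyond_eof() is hypothetical, all inputs are assumed to be
non-negative (so a negative error return from bdrv_getlength() would have
to be handled by the caller first), and the explicit overflow guard is an
addition that the patch itself does not contain.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: true when the region [offset, offset + size) ends
 * more than one cluster past the end of a file of length file_len.
 */
static bool region_far_beyond_eof(int64_t offset, int64_t size,
                                  int64_t file_len, int64_t cluster_size)
{
    assert(offset >= 0 && size >= 0 && file_len >= 0 && cluster_size > 0);

    /* Guard against signed overflow in offset + size. */
    if (size > INT64_MAX - offset) {
        return true;
    }

    int64_t end = offset + size;

    /* A region that ends inside the file is always acceptable. */
    if (end <= file_len) {
        return false;
    }

    /* Tolerate up to one cluster past EOF, as the patch does. */
    return end - file_len > cluster_size;
}
```

A region that overshoots the file by exactly one cluster is still accepted,
which matches the "more than one cluster" wording of the error message.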




Re: [Qemu-devel] [virtio-dev] Re: [PATCH 2/3] Add "Group Identifier" support to PCIe bridges.

2018-06-19 Thread Michael S. Tsirkin
On Tue, Jun 19, 2018 at 01:14:06PM -0500, Venu Busireddy wrote:
> On 2018-06-19 20:24:12 +0300, Michael S. Tsirkin wrote:
> > On Tue, Jun 19, 2018 at 11:32:26AM -0500, Venu Busireddy wrote:
> > > Add a "Vendor-Specific" capability to the PCIe bridge, to contain the
> > > "Group Identifier" (UUID) that will be used to pair a virtio device with
> > > the passthrough device attached to that bridge.
> > > 
> > > This capability is added to the bridge iff the "uuid" option is specified
> > > for the bridge device, via the qemu command line. Also, the bridge's
> > > Vendor ID is changed to PCI_VENDOR_ID_REDHAT, and Device ID is changed
> > > to PCI_DEVICE_ID_REDHAT_PCIE_BRIDGE (from the default values), when the
> > > "uuid" option is present.
> > > 
> > > Signed-off-by: Venu Busireddy 
> > 
> > I don't see why we should add it to all bridges.
> > Let's just add it to ones that already have the RH vendor ID?
> 
> No. I am not adding the capability to all bridges.
> 
> In the earlier discussions, we agreed that the bridge be left as
> Intel bridge if we do not intend to use it for storing the pairing
> information. If we do intend to store the pairing information in the
> bridge, we wanted to change the bridge's Vendor ID to RH Vendor ID to
> avoid confusion. In other words, bridges with RH Vendor ID come into
> existence only when there is an intent to store the pairing information
> in the bridge.
> 
> Accordingly, if the "uuid" option is specified for the bridge, it
> is assumed that the user intends to use the bridge for storing the
> pairing information, and hence, the capability is added to the bridge,
> and the Vendor ID is changed to RH Vendor ID. If the "uuid" option
> is not specified, the bridge remains as Intel bridge, and without the
> vendor-specific capability.
> 
> Venu

Yes, but the way to do it is not to tweak the vendor and device ID;
instead, just add the UUID property to bridges that already have the
correct vendor and device ID.
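
For readers following the layout implied by the quoted patch, here is a
rough sketch of how a 20-byte vendor-specific capability carrying a 16-byte
group ID could be filled in. The three PCI_VENDOR_* values are taken from
the quoted diff and PCI_CAP_ID_VNDR is the standard capability ID. The
helper fill_group_id_cap(), the choice to store the total capability length
at offset 2, and the zeroed byte at offset 3 are assumptions, since the body
of pci_bridge_vendor_init() is not shown in full here.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PCI_CAP_ID_VNDR            0x09 /* standard vendor-specific cap ID */
#define PCI_VENDOR_SIZEOF          20   /* values below are from the patch */
#define PCI_VENDOR_CAP_LEN_OFFSET  2
#define PCI_VENDOR_GROUP_ID_OFFSET 4
#define GROUP_ID_LEN               16   /* size of a UUID in bytes */

/*
 * Hypothetical helper: write the capability at byte position pos of a
 * 256-byte PCI configuration space image.
 */
static void fill_group_id_cap(uint8_t *config, uint8_t pos, uint8_t next,
                              const uint8_t uuid[GROUP_ID_LEN])
{
    assert(pos + PCI_VENDOR_SIZEOF <= 256);

    uint8_t *cap = config + pos;

    memset(cap, 0, PCI_VENDOR_SIZEOF);
    cap[0] = PCI_CAP_ID_VNDR;                      /* capability ID */
    cap[1] = next;                                 /* next capability */
    cap[PCI_VENDOR_CAP_LEN_OFFSET] = PCI_VENDOR_SIZEOF;
    memcpy(cap + PCI_VENDOR_GROUP_ID_OFFSET, uuid, GROUP_ID_LEN);
}
```

At IOH_EP_VENDOR_OFFSET (0xCC) the structure ends at 0xE0, so it stays
within the 256-byte configuration space.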


> > 
> > 
> > > ---
> > >  hw/pci-bridge/ioh3420.c|  2 ++
> > >  hw/pci-bridge/pcie_root_port.c |  7 +++
> > >  hw/pci/pci_bridge.c| 32 
> > >  include/hw/pci/pci.h   |  2 ++
> > >  include/hw/pci/pcie.h  |  1 +
> > >  include/hw/pci/pcie_port.h |  1 +
> > >  6 files changed, 45 insertions(+)
> > > 
> > > diff --git a/hw/pci-bridge/ioh3420.c b/hw/pci-bridge/ioh3420.c
> > > index a451d74ee6..b6b9ebc726 100644
> > > --- a/hw/pci-bridge/ioh3420.c
> > > +++ b/hw/pci-bridge/ioh3420.c
> > > @@ -35,6 +35,7 @@
> > >  #define IOH_EP_MSI_SUPPORTED_FLAGS  PCI_MSI_FLAGS_MASKBIT
> > >  #define IOH_EP_MSI_NR_VECTOR2
> > >  #define IOH_EP_EXP_OFFSET   0x90
> > > +#define IOH_EP_VENDOR_OFFSET0xCC
> > >  #define IOH_EP_AER_OFFSET   0x100
> > >  
> > >  /*
> > > @@ -111,6 +112,7 @@ static void ioh3420_class_init(ObjectClass *klass, 
> > > void *data)
> > >  rpc->exp_offset = IOH_EP_EXP_OFFSET;
> > >  rpc->aer_offset = IOH_EP_AER_OFFSET;
> > >  rpc->ssvid_offset = IOH_EP_SSVID_OFFSET;
> > > +rpc->vendor_offset = IOH_EP_VENDOR_OFFSET;
> > >  rpc->ssid = IOH_EP_SSVID_SSID;
> > >  }
> > >  
> > > diff --git a/hw/pci-bridge/pcie_root_port.c 
> > > b/hw/pci-bridge/pcie_root_port.c
> > > index 45f9e8cd4a..ba470c7fda 100644
> > > --- a/hw/pci-bridge/pcie_root_port.c
> > > +++ b/hw/pci-bridge/pcie_root_port.c
> > > @@ -71,6 +71,12 @@ static void rp_realize(PCIDevice *d, Error **errp)
> > >  goto err_bridge;
> > >  }
> > >  
> > > +rc = pci_bridge_vendor_init(d, rpc->vendor_offset, errp);
> > > +if (rc < 0) {
> > > +error_append_hint(errp, "Can't init group ID, error %d\n", rc);
> > > +goto err_bridge;
> > > +}
> > > +
> > >  if (rpc->interrupts_init) {
> > >  rc = rpc->interrupts_init(d, errp);
> > >  if (rc < 0) {
> > > @@ -137,6 +143,7 @@ static void rp_exit(PCIDevice *d)
> > >  static Property rp_props[] = {
> > >  DEFINE_PROP_BIT(COMPAT_PROP_PCP, PCIDevice, cap_present,
> > >  QEMU_PCIE_SLTCAP_PCP_BITNR, true),
> > > +DEFINE_PROP_UUID(COMPAT_PROP_UUID, PCIDevice, uuid, false),
> > >  DEFINE_PROP_END_OF_LIST()
> > >  };
> > >  
> > > diff --git a/hw/pci/pci_bridge.c b/hw/pci/pci_bridge.c
> > > index 40a39f57cb..c63bc439f7 100644
> > > --- a/hw/pci/pci_bridge.c
> > > +++ b/hw/pci/pci_bridge.c
> > > @@ -34,12 +34,17 @@
> > >  #include "hw/pci/pci_bus.h"
> > >  #include "qemu/range.h"
> > >  #include "qapi/error.h"
> > > +#include "qemu/uuid.h"
> > >  
> > >  /* PCI bridge subsystem vendor ID helper functions */
> > >  #define PCI_SSVID_SIZEOF8
> > >  #define PCI_SSVID_SVID  4
> > >  #define PCI_SSVID_SSID  6
> > >  
> > > +#define PCI_VENDOR_SIZEOF 20
> > > +#define PCI_VENDOR_CAP_LEN_OFFSET  2
> > > +#define PCI_VENDOR_GROUP_ID_OFFSET 4
> > > +
> > >  int pci_bridge_ssvid_init(PCIDevice *dev, uint8_t offset,
> 

Re: [Qemu-devel] [PATCH v2] migration: fix crash in when incoming client channel setup fails

2018-06-19 Thread Eric Blake

On 06/19/2018 11:35 AM, Daniel P. Berrangé wrote:

The way we determine if we can start the incoming migration was
changed to use migration_has_all_channels() in:

   commit 428d89084c709e568f9cd301c2f6416a54c53d6d
   Author: Juan Quintela 
   Date:   Mon Jul 24 13:06:25 2017 +0200

 migration: Create migration_has_all_channels

This method in turn calls multifd_recv_all_channels_created()
which is hardcoded to always return 'true' when multifd is
not in use. This is a latent bug...

...activated in in a following commit where that return result


s/in in/in/

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



Re: [Qemu-devel] [PATCH 00/113] Patch Round-up for stable 2.11.2, freeze on 2018-06-22

2018-06-19 Thread Cole Robinson
On 06/18/2018 09:41 PM, Michael Roth wrote:
> Hi everyone,
> 
> The following new patches are queued for QEMU stable v2.11.2:
> 
>   https://github.com/mdroth/qemu/commits/stable-2.11-staging
> 
> The release is planned for 2018-06-22:
> 
>   https://wiki.qemu.org/Planning/2.11
> 
> Please respond here or CC qemu-sta...@nongnu.org on any patches you
> think should be included in the release.
> 
> Thanks!
> 

Extra patches we are carrying in Fedora 28:

commit f7a5376d4b667cf6c83c1d640e32d22456d7b5ee
Author: Daniel P. Berrange 
Date:   Tue Jan 16 13:42:10 2018 +

qapi: ensure stable sort ordering when checking QAPI entities

commit 057ad0b46992e3ec4ce29b9103162aa3c683f347
Author: Daniel P. Berrangé 
Date:   Wed Feb 28 14:04:38 2018 +

crypto: ensure we use a predictable TLS priority setting


Thanks,
Cole





Re: [Qemu-devel] [virtio-dev] Re: [PATCH virtio 1/1] Add "Group Identifier" support to virtio PCI capabilities.

2018-06-19 Thread Venu Busireddy
On 2018-06-19 21:12:17 +0300, Michael S. Tsirkin wrote:
> On Tue, Jun 19, 2018 at 12:54:06PM -0500, Venu Busireddy wrote:
> > On 2018-06-19 20:30:06 +0300, Michael S. Tsirkin wrote:
> > > On Tue, Jun 19, 2018 at 11:32:28AM -0500, Venu Busireddy wrote:
> > > > Add VIRTIO_PCI_CAP_GROUP_ID_CFG (Group Identifier) capability to the
> > > > virtio PCI capabilities to allow for the grouping of devices.
> > > > 
> > > > Signed-off-by: Venu Busireddy 
> > > > ---
> > > >  content.tex | 43 +++
> > > >  1 file changed, 43 insertions(+)
> > > > 
> > > > diff --git a/content.tex b/content.tex
> > > > index 7a92cb1..7ea6267 100644
> > > > --- a/content.tex
> > > > +++ b/content.tex
> > > > @@ -599,6 +599,8 @@ The fields are interpreted as follows:
> > > >  #define VIRTIO_PCI_CAP_DEVICE_CFG4
> > > >  /* PCI configuration access */
> > > >  #define VIRTIO_PCI_CAP_PCI_CFG   5
> > > > +/* Group Identifier */
> > > > +#define VIRTIO_PCI_CAP_GROUP_ID_CFG  6
> > > >  \end{lstlisting}
> > > >  
> > > >  Any other value is reserved for future use.
> > > > @@ -997,6 +999,47 @@ address \field{cap.length} bytes within a BAR range
> > > >  specified by some other Virtio Structure PCI Capability
> > > >  of type other than \field{VIRTIO_PCI_CAP_PCI_CFG}.
> > > >  
> > > > +\subsubsection{Group Identifier capability}\label{sec:Virtio Transport 
> > > > Options / Virtio Over PCI Bus / PCI Device Layout / Group Identifier 
> > > > capability}
> > > > +
> > > > +The VIRTIO_PCI_CAP_GROUP_ID_CFG capability provides means for grouping 
> > > > devices together.
> > > > +
> > > > +The capability is immediately followed by an identifier of arbitrary 
> > > > size as below:
> > > > +
> > > > +\begin{lstlisting}
> > > > +struct virtio_pci_group_id_cap {
> > > > +struct virtio_pci_cap cap;
> > > > +u8 group_id[]; /* Group Identifier */
> > > > +};
> > > > +\end{lstlisting}
> > > > +
> > > > +The fields \field{cap.bar}, \field{cap.length}, \field{cap.offset}
> > > > +and \field{group_id} are read-only for the driver.
> > > > +
> > > > +The specification does not impose any restrictions on the size or
> > > > +structure of group_id[].
> > > 
> > > I think it must be a multiple of 4 in size, as is
> > > standard for all capabilities.
> > 
> > Sure. Would rephrasing it as below suffice?
> > 
> > The specification does not impose any restrictions on the size or
> > structure of group_id[], except that the size must be a multiple of 4.
> > 
> > > 
> > > 
> > > > Vendors
> 
> Devices

Will correct it in the next version.

> 
> > are free to declare this array as
> > > > +large as needed, as long as the combined size of all capabilities can
> > > > +be accommodated within the PCI configuration space.
> > > > +
> > > > +If there is enough room in the PCI configuration space to accommodate
> > > > +the group identifier, the fields \field{cap.bar}, \field{cap.offset}
> > > > +and \field{cap.length} should be set to 0.
> > > > +
> > > > +If there isn't enough room, some or all of the group identifier can be
> > > > +presented in the BAR region, in which case the fields \field{cap.bar},
> > > > +\field{cap.offset} and \field{cap.length} should be set appropriately.
> > > 
> > > And then how do you glue the two pieces?
> > 
> > How the user glues them up is up to the user. The specification should
> > not impose rules on that, right?
> 
> We need to define how these are matched.
> Let's assume device A has it all in config space, device B
> has part in memory. How would we compare them?

I will go with your suggestion below, and hence, this becomes obsolete.

> 
> 
> 
> > > 
> > > > +
> > > > +In either case, the field \field{cap.cap_len} indicates the length of
> > > > +the group identifier information present in the configuration space
> > > > +itself.
> > > 
> > > It seems like an overkill to me. Isn't it enough to have it in config
> > > space? This would make comparisons easier.
> > 
> > I was trying to make the proposal permissive for expansion, in case
> > the user needs the size to be larger than what can be accommodated in
> > the config space. Would you like me to restrict that the capability be
> > entirely present in the config space? I am fine with it. Please confirm,
> > and I will change it so.
> 
> I think so, yes.

Sure. I will revise the specification as above in the next version.

Thanks,

Venu
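
To make the outcome of this exchange concrete (group ID held entirely in
config space, with a size that is a multiple of 4), here is a small sketch
of how two such capabilities could be validated and compared. It assumes
the 16-byte struct virtio_pci_cap header from the virtio specification and
the proposed type value 6. The helper names and the raw-byte view of the
capability are illustrative only and are not part of the proposal.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define VIRTIO_PCI_CAP_HDR_LEN      16 /* sizeof(struct virtio_pci_cap) */
#define VIRTIO_PCI_CAP_LEN_OFF      2  /* cap.cap_len */
#define VIRTIO_PCI_CAP_TYPE_OFF     3  /* cap.cfg_type */
#define VIRTIO_PCI_CAP_GROUP_ID_CFG 6  /* proposed in this thread */

/* Length of group_id[] in bytes, or -1 if the capability is malformed. */
static int group_id_len(const uint8_t *cap)
{
    assert(cap != NULL);

    int cap_len = cap[VIRTIO_PCI_CAP_LEN_OFF];

    if (cap[VIRTIO_PCI_CAP_TYPE_OFF] != VIRTIO_PCI_CAP_GROUP_ID_CFG) {
        return -1;
    }
    /* group_id[] must be non-empty and a multiple of 4 bytes long. */
    if (cap_len <= VIRTIO_PCI_CAP_HDR_LEN ||
        (cap_len - VIRTIO_PCI_CAP_HDR_LEN) % 4 != 0) {
        return -1;
    }
    return cap_len - VIRTIO_PCI_CAP_HDR_LEN;
}

/* Two devices are in the same group if their identifiers are identical. */
static bool group_ids_match(const uint8_t *a, const uint8_t *b)
{
    int la = group_id_len(a);
    int lb = group_id_len(b);

    return la > 0 && la == lb &&
           memcmp(a + VIRTIO_PCI_CAP_HDR_LEN,
                  b + VIRTIO_PCI_CAP_HDR_LEN, (size_t)la) == 0;
}
```

Keeping the whole identifier in config space reduces the comparison to a
single memcmp(), which is the simplification argued for above.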

> 
> > > 
> > > > +
> > > > +\devicenormative{\paragraph}{Group Identifier capability}{Virtio 
> > > > Transport Options / Virtio Over PCI Bus / PCI Device Layout / Group 
> > > > Identifier capability}
> > > > +
> > > > +The device MAY present the VIRTIO_PCI_CAP_GROUP_ID_CFG capability.
> > > > +
> > > > +\drivernormative{\paragraph}{Group Identifier capability}{Virtio 
> > > > Transport Options / Virtio Over PCI Bus / PCI Device Layout / Group 
> > > > Identifier capability}
> > > > +
> > > > +The driver MUST NOT write to group_id[] area or the BAR region.
> > > > 

Re: [Qemu-devel] [virtio-dev] Re: [PATCH 2/3] Add "Group Identifier" support to PCIe bridges.

2018-06-19 Thread Venu Busireddy
On 2018-06-19 20:24:12 +0300, Michael S. Tsirkin wrote:
> On Tue, Jun 19, 2018 at 11:32:26AM -0500, Venu Busireddy wrote:
> > Add a "Vendor-Specific" capability to the PCIe bridge, to contain the
> > "Group Identifier" (UUID) that will be used to pair a virtio device with
> > the passthrough device attached to that bridge.
> > 
> > This capability is added to the bridge iff the "uuid" option is specified
> > for the bridge device, via the qemu command line. Also, the bridge's
> > Vendor ID is changed to PCI_VENDOR_ID_REDHAT, and Device ID is changed
> > to PCI_DEVICE_ID_REDHAT_PCIE_BRIDGE (from the default values), when the
> > "uuid" option is present.
> > 
> > Signed-off-by: Venu Busireddy 
> 
> I don't see why we should add it to all bridges.
> Let's just add it to ones that already have the RH vendor ID?

No. I am not adding the capability to all bridges.

In the earlier discussions, we agreed that the bridge be left as
Intel bridge if we do not intend to use it for storing the pairing
information. If we do intend to store the pairing information in the
bridge, we wanted to change the bridge's Vendor ID to RH Vendor ID to
avoid confusion. In other words, bridges with RH Vendor ID come into
existence only when there is an intent to store the pairing information
in the bridge.

Accordingly, if the "uuid" option is specified for the bridge, it
is assumed that the user intends to use the bridge for storing the
pairing information, and hence, the capability is added to the bridge,
and the Vendor ID is changed to RH Vendor ID. If the "uuid" option
is not specified, the bridge remains as Intel bridge, and without the
vendor-specific capability.

Venu

> 
> 
> > ---
> >  hw/pci-bridge/ioh3420.c|  2 ++
> >  hw/pci-bridge/pcie_root_port.c |  7 +++
> >  hw/pci/pci_bridge.c| 32 
> >  include/hw/pci/pci.h   |  2 ++
> >  include/hw/pci/pcie.h  |  1 +
> >  include/hw/pci/pcie_port.h |  1 +
> >  6 files changed, 45 insertions(+)
> > 
> > diff --git a/hw/pci-bridge/ioh3420.c b/hw/pci-bridge/ioh3420.c
> > index a451d74ee6..b6b9ebc726 100644
> > --- a/hw/pci-bridge/ioh3420.c
> > +++ b/hw/pci-bridge/ioh3420.c
> > @@ -35,6 +35,7 @@
> >  #define IOH_EP_MSI_SUPPORTED_FLAGS  PCI_MSI_FLAGS_MASKBIT
> >  #define IOH_EP_MSI_NR_VECTOR2
> >  #define IOH_EP_EXP_OFFSET   0x90
> > +#define IOH_EP_VENDOR_OFFSET0xCC
> >  #define IOH_EP_AER_OFFSET   0x100
> >  
> >  /*
> > @@ -111,6 +112,7 @@ static void ioh3420_class_init(ObjectClass *klass, void 
> > *data)
> >  rpc->exp_offset = IOH_EP_EXP_OFFSET;
> >  rpc->aer_offset = IOH_EP_AER_OFFSET;
> >  rpc->ssvid_offset = IOH_EP_SSVID_OFFSET;
> > +rpc->vendor_offset = IOH_EP_VENDOR_OFFSET;
> >  rpc->ssid = IOH_EP_SSVID_SSID;
> >  }
> >  
> > diff --git a/hw/pci-bridge/pcie_root_port.c b/hw/pci-bridge/pcie_root_port.c
> > index 45f9e8cd4a..ba470c7fda 100644
> > --- a/hw/pci-bridge/pcie_root_port.c
> > +++ b/hw/pci-bridge/pcie_root_port.c
> > @@ -71,6 +71,12 @@ static void rp_realize(PCIDevice *d, Error **errp)
> >  goto err_bridge;
> >  }
> >  
> > +rc = pci_bridge_vendor_init(d, rpc->vendor_offset, errp);
> > +if (rc < 0) {
> > +error_append_hint(errp, "Can't init group ID, error %d\n", rc);
> > +goto err_bridge;
> > +}
> > +
> >  if (rpc->interrupts_init) {
> >  rc = rpc->interrupts_init(d, errp);
> >  if (rc < 0) {
> > @@ -137,6 +143,7 @@ static void rp_exit(PCIDevice *d)
> >  static Property rp_props[] = {
> >  DEFINE_PROP_BIT(COMPAT_PROP_PCP, PCIDevice, cap_present,
> >  QEMU_PCIE_SLTCAP_PCP_BITNR, true),
> > +DEFINE_PROP_UUID(COMPAT_PROP_UUID, PCIDevice, uuid, false),
> >  DEFINE_PROP_END_OF_LIST()
> >  };
> >  
> > diff --git a/hw/pci/pci_bridge.c b/hw/pci/pci_bridge.c
> > index 40a39f57cb..c63bc439f7 100644
> > --- a/hw/pci/pci_bridge.c
> > +++ b/hw/pci/pci_bridge.c
> > @@ -34,12 +34,17 @@
> >  #include "hw/pci/pci_bus.h"
> >  #include "qemu/range.h"
> >  #include "qapi/error.h"
> > +#include "qemu/uuid.h"
> >  
> >  /* PCI bridge subsystem vendor ID helper functions */
> >  #define PCI_SSVID_SIZEOF8
> >  #define PCI_SSVID_SVID  4
> >  #define PCI_SSVID_SSID  6
> >  
> > +#define PCI_VENDOR_SIZEOF 20
> > +#define PCI_VENDOR_CAP_LEN_OFFSET  2
> > +#define PCI_VENDOR_GROUP_ID_OFFSET 4
> > +
> >  int pci_bridge_ssvid_init(PCIDevice *dev, uint8_t offset,
> >uint16_t svid, uint16_t ssid,
> >Error **errp)
> > @@ -57,6 +62,33 @@ int pci_bridge_ssvid_init(PCIDevice *dev, uint8_t offset,
> >  return pos;
> >  }
> >  
> > +int pci_bridge_vendor_init(PCIDevice *d, uint8_t offset, Error **errp)
> > +{
> > +int pos;
> > +PCIDeviceClass *dc = PCI_DEVICE_GET_CLASS(d);
> > +
> > +if (qemu_uuid_is_null(&d->uuid)) {
> > +return 0;
> > +}
> > +

Re: [Qemu-devel] Design Decision for KVM based anti rootkit

2018-06-19 Thread Ahmed Soliman
On 19 June 2018 at 19:37, David Vrabel  wrote:
> It's not clear how this increases security. What threats is this
> protecting against?
It won't completely prevent rootkits, because rootkits can still
edit dynamic kernel data structures, but it will limit the damage
rootkits can do to dynamic data only.
This way system calls and interrupt tables can't be changed.
> As an attacker, modifying the sensitive pages (kernel text?) will
> require either: a) altering the existing mappings for these (to make
> them read-write or user-writable for example); or b) creating aliased
> mappings with suitable permissions.
>
> If the attacker can modify page tables in this way then it can also
> bypass the suggested hypervisor's read-only protection by changing the
> mappings to point to an unprotected page.

I think I left this part out, but I meant to say: completely
prevent any modification to those pages, including the guest physical
address to guest virtual address mapping for the protected pages. Another,
trickier solution (something that just popped into my mind, so better
to say it than to forget it) is to make new memory mappings
inherit the same protection as the old ones. I assume that the hypervisor
can do either. That was also the kind of performance hit I was
talking about. I am not sure whether that would break things, but it
will certainly limit some functionality heavily, such as, maybe,
hibernating the guest. That is the kind of trade-off I am
expecting, at least at the beginning.



Re: [Qemu-devel] [virtio-dev] Re: [PATCH virtio 1/1] Add "Group Identifier" support to virtio PCI capabilities.

2018-06-19 Thread Michael S. Tsirkin
On Tue, Jun 19, 2018 at 12:54:06PM -0500, Venu Busireddy wrote:
> On 2018-06-19 20:30:06 +0300, Michael S. Tsirkin wrote:
> > On Tue, Jun 19, 2018 at 11:32:28AM -0500, Venu Busireddy wrote:
> > > Add VIRTIO_PCI_CAP_GROUP_ID_CFG (Group Identifier) capability to the
> > > virtio PCI capabilities to allow for the grouping of devices.
> > > 
> > > Signed-off-by: Venu Busireddy 
> > > ---
> > >  content.tex | 43 +++
> > >  1 file changed, 43 insertions(+)
> > > 
> > > diff --git a/content.tex b/content.tex
> > > index 7a92cb1..7ea6267 100644
> > > --- a/content.tex
> > > +++ b/content.tex
> > > @@ -599,6 +599,8 @@ The fields are interpreted as follows:
> > >  #define VIRTIO_PCI_CAP_DEVICE_CFG4
> > >  /* PCI configuration access */
> > >  #define VIRTIO_PCI_CAP_PCI_CFG   5
> > > +/* Group Identifier */
> > > +#define VIRTIO_PCI_CAP_GROUP_ID_CFG  6
> > >  \end{lstlisting}
> > >  
> > >  Any other value is reserved for future use.
> > > @@ -997,6 +999,47 @@ address \field{cap.length} bytes within a BAR range
> > >  specified by some other Virtio Structure PCI Capability
> > >  of type other than \field{VIRTIO_PCI_CAP_PCI_CFG}.
> > >  
> > > +\subsubsection{Group Identifier capability}\label{sec:Virtio Transport 
> > > Options / Virtio Over PCI Bus / PCI Device Layout / Group Identifier 
> > > capability}
> > > +
> > > +The VIRTIO_PCI_CAP_GROUP_ID_CFG capability provides means for grouping 
> > > devices together.
> > > +
> > > +The capability is immediately followed by an identifier of arbitrary 
> > > size as below:
> > > +
> > > +\begin{lstlisting}
> > > +struct virtio_pci_group_id_cap {
> > > +struct virtio_pci_cap cap;
> > > +u8 group_id[]; /* Group Identifier */
> > > +};
> > > +\end{lstlisting}
> > > +
> > > +The fields \field{cap.bar}, \field{cap.length}, \field{cap.offset}
> > > +and \field{group_id} are read-only for the driver.
> > > +
> > > +The specification does not impose any restrictions on the size or
> > > +structure of group_id[].
> > 
> > I think it must be a multiple of 4 in size, as is
> > standard for all capabilities.
> 
> Sure. Would rephrasing it as below suffice?
> 
> The specification does not impose any restrictions on the size or
> structure of group_id[], except that the size must be a multiple of 4.
> 
> > 
> > 
> > > Vendors

Devices

> are free to declare this array as
> > > +large as needed, as long as the combined size of all capabilities can
> > > +be accommodated within the PCI configuration space.
> > > +
> > > +If there is enough room in the PCI configuration space to accommodate
> > > +the group identifier, the fields \field{cap.bar}, \field{cap.offset}
> > > +and \field{cap.length} should be set to 0.
> > > +
> > > +If there isn't enough room, some or all of the group identifier can be
> > > +presented in the BAR region, in which case the fields \field{cap.bar},
> > > +\field{cap.offset} and \field{cap.length} should be set appropriately.
> > 
> > And then how do you glue the two pieces?
> 
> How the user glues them up is up to the user. The specification should
> not impose rules on that, right?

We need to define how these are matched.
Let's assume device A has it all in config space, device B
has part in memory. How would we compare them?




> > 
> > > +
> > > +In either case, the field \field{cap.cap_len} indicates the length of
> > > +the group identifier information present in the configuration space
> > > +itself.
> > 
> > It seems like an overkill to me. Isn't it enough to have it in config
> > space? This would make comparisons easier.
> 
> I was trying to make the proposal permissive for expansion, in case
> the user needs the size to be larger than what can be accommodated in
> the config space. Would you like me to restrict that the capability be
> entirely present in the config space? I am fine with it. Please confirm,
> and I will change it so.

I think so, yes.

> > 
> > > +
> > > +\devicenormative{\paragraph}{Group Identifier capability}{Virtio 
> > > Transport Options / Virtio Over PCI Bus / PCI Device Layout / Group 
> > > Identifier capability}
> > > +
> > > +The device MAY present the VIRTIO_PCI_CAP_GROUP_ID_CFG capability.
> > > +
> > > +\drivernormative{\paragraph}{Group Identifier capability}{Virtio 
> > > Transport Options / Virtio Over PCI Bus / PCI Device Layout / Group 
> > > Identifier capability}
> > > +
> > > +The driver MUST NOT write to group_id[] area or the BAR region.
> > > +
> > >  \subsubsection{Legacy Interfaces: A Note on PCI Device 
> > > Layout}\label{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI 
> > > Device Layout / Legacy Interfaces: A Note on PCI Device Layout}
> > >  
> > >  Transitional devices MUST present part of configuration
> > 
> > -
> > To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
> > For additional commands, e-mail: 

Re: [Qemu-devel] [PATCH V1 RESEND 0/6] Build ACPI Heterogeneous Memory Attribute Table (HMAT)

2018-06-19 Thread no-reply
Hi,

This series failed docker-mingw@fedora build test. Please find the testing
commands and their output below. If you have Docker installed, you can
probably reproduce it locally.

Type: series
Message-id: 1529421657-14969-1-git-send-email-jingqi@intel.com
Subject: [Qemu-devel] [PATCH V1 RESEND 0/6] Build ACPI Heterogeneous Memory 
Attribute Table (HMAT)

=== TEST SCRIPT BEGIN ===
#!/bin/bash
set -e
git submodule update --init dtc
# Let docker tests dump environment info
export SHOW_ENV=1
export J=8
time make docker-test-mingw@fedora
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
Switched to a new branch 'test'
b1e9f8529e hmat acpi: Implement _HMA method to update HMAT at runtime
f7eb5356e7 numa: Extend the command-line to provide memory side cache 
information
d89f8eb917 numa: Extend the command-line to provide memory latency and 
bandwidth information
9376f5703d hmat acpi: Build Memory Side Cache Information Structure(s) in ACPI 
HMAT
6e1685b947 hmat acpi: Build System Locality Latency and Bandwidth Information 
Structure(s) in ACPI HMAT
4a72c940eb hmat acpi: Build Memory Subsystem Address Range Structure(s) in ACPI 
HMAT

=== OUTPUT BEGIN ===
Submodule 'dtc' (git://git.qemu-project.org/dtc.git) registered for path 'dtc'
Cloning into '/var/tmp/patchew-tester-tmp-2ntfz6_t/src/dtc'...
Submodule path 'dtc': checked out 'e54388015af1fb4bf04d0bca99caba1074d9cc42'
  BUILD   fedora
make[1]: Entering directory '/var/tmp/patchew-tester-tmp-2ntfz6_t/src'
  GEN 
/var/tmp/patchew-tester-tmp-2ntfz6_t/src/docker-src.2018-06-19-14.05.45.15002/qemu.tar
Cloning into 
'/var/tmp/patchew-tester-tmp-2ntfz6_t/src/docker-src.2018-06-19-14.05.45.15002/qemu.tar.vroot'...
done.
Checking out files: 100% (6239/6239), done.
Your branch is up-to-date with 'origin/test'.
Submodule 'dtc' (git://git.qemu-project.org/dtc.git) registered for path 'dtc'
Cloning into 
'/var/tmp/patchew-tester-tmp-2ntfz6_t/src/docker-src.2018-06-19-14.05.45.15002/qemu.tar.vroot/dtc'...
Submodule path 'dtc': checked out 'e54388015af1fb4bf04d0bca99caba1074d9cc42'
Submodule 'ui/keycodemapdb' (git://git.qemu.org/keycodemapdb.git) registered 
for path 'ui/keycodemapdb'
Cloning into 
'/var/tmp/patchew-tester-tmp-2ntfz6_t/src/docker-src.2018-06-19-14.05.45.15002/qemu.tar.vroot/ui/keycodemapdb'...
Submodule path 'ui/keycodemapdb': checked out 
'6b3d716e2b6472eb7189d3220552280ef3d832ce'
  COPYRUNNER
RUN test-mingw in qemu:fedora 
Packages installed:
SDL2-devel-2.0.8-5.fc28.x86_64
bc-1.07.1-5.fc28.x86_64
bison-3.0.4-9.fc28.x86_64
bluez-libs-devel-5.49-3.fc28.x86_64
brlapi-devel-0.6.7-12.fc28.x86_64
bzip2-1.0.6-26.fc28.x86_64
bzip2-devel-1.0.6-26.fc28.x86_64
ccache-3.4.2-2.fc28.x86_64
clang-6.0.0-5.fc28.x86_64
