[Qemu-devel] [RFC PATCH 1/4] hw/intc/arm_gicv3_common: Add state information

2015-09-30 Thread Pavel Fedin
Add state information to GICv3 object structure and implement
arm_gicv3_common_reset(). Also, add some functions for registers which are
not stored directly but simulated.

State information includes not only pure GICv3 data, but also some legacy
registers. This will be useful for implementing software emulation of GICv3
with v2 backwards compatibility mode.

Signed-off-by: Pavel Fedin 
---
 hw/intc/arm_gicv3_common.c | 135 +++-
 hw/intc/gicv3_internal.h   | 152 +
 include/hw/intc/arm_gicv3_common.h |  76 +++
 3 files changed, 362 insertions(+), 1 deletion(-)
 create mode 100644 hw/intc/gicv3_internal.h

diff --git a/hw/intc/arm_gicv3_common.c b/hw/intc/arm_gicv3_common.c
index 032ece2..0818fb9 100644
--- a/hw/intc/arm_gicv3_common.c
+++ b/hw/intc/arm_gicv3_common.c
@@ -21,6 +21,7 @@
  */
 
 #include "hw/intc/arm_gicv3_common.h"
+#include "gicv3_internal.h"
 
 static void gicv3_pre_save(void *opaque)
 {
@@ -88,6 +89,7 @@ void gicv3_init_irqs_and_mmio(GICv3State *s, qemu_irq_handler handler,
 static void arm_gicv3_common_realize(DeviceState *dev, Error **errp)
 {
 GICv3State *s = ARM_GICV3_COMMON(dev);
+int i;
 
 /* revision property is actually reserved and currently used only in order
  * to keep the interface compatible with GICv2 code, avoiding extra
@@ -98,11 +100,142 @@ static void arm_gicv3_common_realize(DeviceState *dev, Error **errp)
 error_setg(errp, "unsupported GIC revision %d", s->revision);
 return;
 }
+
+if (s->num_irq > GICV3_MAXIRQ) {
+error_setg(errp,
+   "requested %u interrupt lines exceeds GIC maximum %d",
+   s->num_irq, GICV3_MAXIRQ);
+return;
+}
+
+for (i = 0; i < GICV3_MAXIRQ; i++) {
+uint32_t mask_size = BITS_TO_LONGS(s->num_cpu);
+
+s->irq_state[i].mask_size = mask_size;
+s->irq_state[i].enabled = g_malloc(mask_size * sizeof(unsigned long));
+s->irq_state[i].pending = g_malloc(mask_size * sizeof(unsigned long));
+s->irq_state[i].active = g_malloc(mask_size * sizeof(unsigned long));
+s->irq_state[i].level = g_malloc(mask_size * sizeof(unsigned long));
+s->irq_state[i].group = g_malloc(mask_size * sizeof(unsigned long));
+}
+
+s->cpu = g_malloc(s->num_cpu * sizeof(GICv3CPUState));
 }
 
 static void arm_gicv3_common_reset(DeviceState *dev)
 {
-/* TODO */
+GICv3State *s = ARM_GICV3_COMMON(dev);
+int i;
+
+for (i = 0; i < s->num_cpu; i++) {
+GICv3CPUState *c = &s->cpu[i];
+
+c->cpu_enabled = false;
+memset(c->priority1, 0, sizeof(c->priority1));
+memset(c->sgi_pending, 0, sizeof(c->sgi_pending));
+
+c->ctlr[0] = 0;
+c->ctlr[1] = 0;
+c->legacy_ctlr = 0;
+c->priority_mask = 0;
+c->bpr[0] = GIC_MIN_BPR0;
+c->bpr[1] = GIC_MIN_BPR1;
+memset(c->apr, 0, sizeof(c->apr));
+
+c->current_pending = 1023;
+c->running_irq = 1023;
+c->running_priority = 0x100;
+memset(c->last_active, 0, sizeof(c->last_active));
+}
+
+for (i = 0; i < GICV3_MAXIRQ; i++) {
+uint32_t mask_size = s->irq_state[i].mask_size;
+
+memset(s->irq_state[i].enabled, 0, mask_size * sizeof(unsigned long));
+memset(s->irq_state[i].pending, 0, mask_size * sizeof(unsigned long));
+memset(s->irq_state[i].active, 0, mask_size * sizeof(unsigned long));
+memset(s->irq_state[i].level, 0, mask_size * sizeof(unsigned long));
+memset(s->irq_state[i].group, 0, mask_size * sizeof(unsigned long));
+s->irq_state[i].edge_trigger = false;
+}
+
+/* GIC-500 comment 'j' SGI are always enabled */
+for (i = 0; i < GIC_NR_SGIS; i++) {
+set_all_cpus(s, s->irq_state[i].enabled);
+s->irq_state[i].edge_trigger = true;
+}
+/* By default all interrupts always target CPU #0 */
+for (i = 0; i < GICV3_MAXIRQ; i++) {
+s->irq_target[i] = 1;
+}
+memset(s->irq_route, 0, sizeof(s->irq_route));
+memset(s->priority2, 0, sizeof(s->priority2));
+
+/* With all configuration we don't support GICv2 backwards compatibility */
+if (s->security_extn) {
+/* GICv3 5.3.20 With two security So DS is RAZ/WI ARE_NS is RAO/WI
+ * and ARE_S is RAO/WI
+ */
+ s->ctlr = GICD_CTLR_ARE_S | GICD_CTLR_ARE_NS;
+} else {
+/* GICv3 5.3.20 With one security So DS is RAO/WI ARE is RAO/WI
+ */
+s->ctlr = GICD_CTLR_DS | GICD_CTLR_ARE;
+}
+/* Workaround!
+ * Linux (drivers/irqchip/irq-gic-v3.c) is enabling only group one,
+ * in gic_cpu_sys_reg_init it calls gic_write_grpen1(1);
+ * but it doesn't configure any interrupt to be in group one
+ */
+for (i = 0; i < s->num_irq; i++) {
+set_all_cpus(s, s->irq_state[i].group);
+}
+}
+
+void set_irq_bit(GICv3State *s, unsigned long *mask, int ir

Re: [Qemu-devel] [PATCH 1/7] string-input-visitor: Fix uint64 parsing

2015-09-30 Thread Eric Blake
On 09/30/2015 07:23 AM, Andreas Färber wrote:
> Am 30.09.2015 um 15:19 schrieb Markus Armbruster:
>> Andreas Färber  writes:
>>> As a bug fix, ignore warnings about preference of qemu_strto[u]ll().
>>
>> I'm not sure I get this sentence.
> 
> This patch causes checkpatch warnings. I intentionally do not address
> them in this bug-fix patch, but instead in a later patch in the series.

Maybe:

As this is a bug fix, the patch intentionally ignores checkpatch
warnings to prefer the use of qemu_strto[u]ll() to minimize size; a
later patch will further address that issue.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org





[Qemu-devel] [PATCH v5 1/2] PCI: add missing classes in pci_ids.h to build device tree

2015-09-30 Thread Laurent Vivier
To allow QEMU to add PCI entries in device tree,
we must have a more exhaustive list of PCI class IDs.

This patch synchronizes as much as possible with
pci_ids.h and adds some missing IDs from SLOF.

Signed-off-by: Laurent Vivier 
Reviewed-by: Michael S. Tsirkin 
Reviewed-by: Thomas Huth 
---
 include/hw/pci/pci_ids.h | 112 +++
 1 file changed, 103 insertions(+), 9 deletions(-)

diff --git a/include/hw/pci/pci_ids.h b/include/hw/pci/pci_ids.h
index d98e6c9..e27dc39 100644
--- a/include/hw/pci/pci_ids.h
+++ b/include/hw/pci/pci_ids.h
@@ -12,41 +12,84 @@
 
 /* Device classes and subclasses */
 
-#define PCI_BASE_CLASS_STORAGE           0x01
-#define PCI_BASE_CLASS_NETWORK           0x02
+#define PCI_CLASS_NOT_DEFINED            0x0000
+#define PCI_CLASS_NOT_DEFINED_VGA        0x0001
 
+#define PCI_BASE_CLASS_STORAGE           0x01
 #define PCI_CLASS_STORAGE_SCSI           0x0100
 #define PCI_CLASS_STORAGE_IDE            0x0101
+#define PCI_CLASS_STORAGE_FLOPPY         0x0102
+#define PCI_CLASS_STORAGE_IPI            0x0103
 #define PCI_CLASS_STORAGE_RAID           0x0104
+#define PCI_CLASS_STORAGE_ATA            0x0105
 #define PCI_CLASS_STORAGE_SATA           0x0106
+#define PCI_CLASS_STORAGE_SAS            0x0107
 #define PCI_CLASS_STORAGE_EXPRESS        0x0108
 #define PCI_CLASS_STORAGE_OTHER          0x0180
 
+#define PCI_BASE_CLASS_NETWORK           0x02
 #define PCI_CLASS_NETWORK_ETHERNET       0x0200
+#define PCI_CLASS_NETWORK_TOKEN_RING     0x0201
+#define PCI_CLASS_NETWORK_FDDI           0x0202
+#define PCI_CLASS_NETWORK_ATM            0x0203
+#define PCI_CLASS_NETWORK_ISDN           0x0204
+#define PCI_CLASS_NETWORK_WORLDFIP       0x0205
+#define PCI_CLASS_NETWORK_PICMG214       0x0206
 #define PCI_CLASS_NETWORK_OTHER          0x0280
 
+#define PCI_BASE_CLASS_DISPLAY           0x03
 #define PCI_CLASS_DISPLAY_VGA            0x0300
+#define PCI_CLASS_DISPLAY_XGA            0x0301
+#define PCI_CLASS_DISPLAY_3D             0x0302
 #define PCI_CLASS_DISPLAY_OTHER          0x0380
 
+#define PCI_BASE_CLASS_MULTIMEDIA        0x04
+#define PCI_CLASS_MULTIMEDIA_VIDEO       0x0400
 #define PCI_CLASS_MULTIMEDIA_AUDIO       0x0401
+#define PCI_CLASS_MULTIMEDIA_PHONE       0x0402
+#define PCI_CLASS_MULTIMEDIA_OTHER       0x0480
 
+#define PCI_BASE_CLASS_MEMORY            0x05
 #define PCI_CLASS_MEMORY_RAM             0x0500
+#define PCI_CLASS_MEMORY_FLASH           0x0501
+#define PCI_CLASS_MEMORY_OTHER           0x0580
 
-#define PCI_CLASS_SYSTEM_SDHCI           0x0805
-#define PCI_CLASS_SYSTEM_OTHER           0x0880
-
-#define PCI_CLASS_SERIAL_USB             0x0c03
-#define PCI_CLASS_SERIAL_SMBUS           0x0c05
-
+#define PCI_BASE_CLASS_BRIDGE            0x06
 #define PCI_CLASS_BRIDGE_HOST            0x0600
 #define PCI_CLASS_BRIDGE_ISA             0x0601
+#define PCI_CLASS_BRIDGE_EISA            0x0602
+#define PCI_CLASS_BRIDGE_MC              0x0603
 #define PCI_CLASS_BRIDGE_PCI             0x0604
 #define PCI_CLASS_BRIDGE_PCI_INF_SUB     0x01
+#define PCI_CLASS_BRIDGE_PCMCIA          0x0605
+#define PCI_CLASS_BRIDGE_NUBUS           0x0606
+#define PCI_CLASS_BRIDGE_CARDBUS         0x0607
+#define PCI_CLASS_BRIDGE_RACEWAY         0x0608
+#define PCI_CLASS_BRIDGE_PCI_SEMITP      0x0609
+#define PCI_CLASS_BRIDGE_IB_PCI          0x060a
 #define PCI_CLASS_BRIDGE_OTHER           0x0680
 
+#define PCI_BASE_CLASS_COMMUNICATION     0x07
 #define PCI_CLASS_COMMUNICATION_SERIAL   0x0700
+#define PCI_CLASS_COMMUNICATION_PARALLEL 0x0701
+#define PCI_CLASS_COMMUNICATION_MULTISERIAL 0x0702
+#define PCI_CLASS_COMMUNICATION_MODEM    0x0703
+#define PCI_CLASS_COMMUNICATION_GPIB     0x0704
+#define PCI_CLASS_COMMUNICATION_SC       0x0705
 #define PCI_CLASS_COMMUNICATION_OTHER    0x0780
 
+#define PCI_BASE_CLASS_SYSTEM            0x08
+#define PCI_CLASS_SYSTEM_PIC             0x0800
+#define PCI_CLASS_SYSTEM_PIC_IOAPIC      0x080010
+#define PCI_CLASS_SYSTEM_PIC_IOXAPIC     0x080020
+#define PCI_CLASS_SYSTEM_DMA             0x0801
+#define PCI_CLASS_SYSTEM_TIMER           0x0802
+#define PCI_CLASS_SYSTEM_RTC             0x0803
+#define PCI_CLASS_SYSTEM_PCI_HOTPLUG     0x0804
+#define PCI_CLASS_SYSTEM_SDHCI           0x0805
+#define PCI_CLASS_SYSTEM_OTHER           0x0880
+
+#define PCI_BASE_CLASS_INPUT             0x09
 #define PCI_CLASS_INPUT_KEYBOARD         0x0900
 #define PCI_CLASS_INPUT_PEN              0x0901
 #define PCI_CLASS_INPUT_MOUSE            0x0902
@@ -54,8 +97,59 @@
 #define PCI_CLASS_INPUT_GAMEPORT         0x0904
 #define PCI_CLASS_INPUT_OTHER            0x0980
 
-#define PCI_CLASS_PROCESSOR_CO           0x0b40
+#define PCI_BASE_CLASS_DOCKING           0x0a
+#define PCI_CLASS_DOCKING_GENERIC        0x0a00
+#define PCI_CLASS_DOCKING_OTHER          0x0a80
+
+#define PCI_BASE_CLASS_PROCESSOR         0x0b
+#define PCI_CLASS_PROCESSOR_PENTIUM      0x0b02
 #define PCI_CLASS_PROCESSOR_POWERPC      0x0b20
+#define PCI_CLASS_PROCESSOR_MIPS         0x0b30
+#define PCI_CLASS_

[Qemu-devel] [RFC PATCH 2/4] kernel: Add definitions for GICv3 attributes

2015-09-30 Thread Pavel Fedin
This temporary patch adds kernel API definitions. Use proper header update
procedure after these features are released.

Signed-off-by: Pavel Fedin 
---
 linux-headers/asm-arm64/kvm.h | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/linux-headers/asm-arm64/kvm.h b/linux-headers/asm-arm64/kvm.h
index c8abf25..2f8e86d 100644
--- a/linux-headers/asm-arm64/kvm.h
+++ b/linux-headers/asm-arm64/kvm.h
@@ -163,13 +163,21 @@ struct kvm_arch_memory_slot {
 #define KVM_DEV_ARM_VGIC_GRP_ADDR  0
 #define KVM_DEV_ARM_VGIC_GRP_DIST_REGS 1
 #define KVM_DEV_ARM_VGIC_GRP_CPU_REGS  2
+#define   KVM_DEV_ARM_VGIC_64BIT   (1ULL << 63)
 #define   KVM_DEV_ARM_VGIC_CPUID_SHIFT 32
-#define   KVM_DEV_ARM_VGIC_CPUID_MASK  (0xffULL << KVM_DEV_ARM_VGIC_CPUID_SHIFT)
+#define   KVM_DEV_ARM_VGIC_CPUID_MASK  (0xfULL << KVM_DEV_ARM_VGIC_CPUID_SHIFT)
 #define   KVM_DEV_ARM_VGIC_OFFSET_SHIFT 0
 #define   KVM_DEV_ARM_VGIC_OFFSET_MASK (0xffffffffULL << KVM_DEV_ARM_VGIC_OFFSET_SHIFT)
+#define   KVM_DEV_ARM_VGIC_REG_MASK    (KVM_REG_SIZE_MASK | \
+                                        KVM_REG_ARM64_SYSREG_OP0_MASK | \
+                                        KVM_REG_ARM64_SYSREG_OP1_MASK | \
+                                        KVM_REG_ARM64_SYSREG_CRN_MASK | \
+                                        KVM_REG_ARM64_SYSREG_CRM_MASK | \
+                                        KVM_REG_ARM64_SYSREG_OP2_MASK)
 #define KVM_DEV_ARM_VGIC_GRP_NR_IRQS   3
 #define KVM_DEV_ARM_VGIC_GRP_CTRL  4
 #define   KVM_DEV_ARM_VGIC_CTRL_INIT   0
+#define KVM_DEV_ARM_VGIC_GRP_REDIST_REGS 5
 
 /* KVM_IRQ_LINE irq field index values */
 #define KVM_ARM_IRQ_TYPE_SHIFT 24
-- 
2.4.4
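
For readers following the attribute layout, here is a minimal sketch of how a CPU index and a register offset might be packed into one 64-bit device attribute. It is not part of the patch: the helper names (vgic_attr, vgic_attr_cpu, vgic_attr_offset) are hypothetical, the CPUID mask uses the pre-patch 0xff value, and the offset mask is the 32-bit one from the upstream kernel header.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative copies of the field layout; the real definitions live in
 * linux-headers/asm-arm64/kvm.h. The 0xff CPUID mask is the pre-patch value. */
#define VGIC_CPUID_SHIFT   32
#define VGIC_CPUID_MASK    (0xffULL << VGIC_CPUID_SHIFT)
#define VGIC_OFFSET_SHIFT  0
#define VGIC_OFFSET_MASK   (0xffffffffULL << VGIC_OFFSET_SHIFT)

/* Hypothetical helper: pack a CPU index and a register offset into one attr.
 * Bits that do not fit a field are silently masked off. */
static inline uint64_t vgic_attr(uint32_t cpu, uint32_t offset)
{
    /* The two fields must not overlap, or packing would corrupt them. */
    assert((VGIC_CPUID_MASK & VGIC_OFFSET_MASK) == 0);

    return (((uint64_t)cpu << VGIC_CPUID_SHIFT) & VGIC_CPUID_MASK) |
           (((uint64_t)offset << VGIC_OFFSET_SHIFT) & VGIC_OFFSET_MASK);
}

/* Hypothetical helper: recover the CPU index from a packed attr. */
static inline uint32_t vgic_attr_cpu(uint64_t attr)
{
    return (uint32_t)((attr & VGIC_CPUID_MASK) >> VGIC_CPUID_SHIFT);
}

/* Hypothetical helper: recover the register offset from a packed attr. */
static inline uint32_t vgic_attr_offset(uint64_t attr)
{
    return (uint32_t)((attr & VGIC_OFFSET_MASK) >> VGIC_OFFSET_SHIFT);
}
```

With this layout, widening the CPUID mask only changes how many CPU index bits survive the masking; the offset field is unaffected.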




Re: [Qemu-devel] [PATCH 1/7] string-input-visitor: Fix uint64 parsing

2015-09-30 Thread Eric Blake
On 09/30/2015 07:19 AM, Markus Armbruster wrote:

> 
> The (essentially undocumented) Visitor abstraction has the following
> methods for integers:

I proposed documentation at:
https://lists.gnu.org/archive/html/qemu-devel/2015-09/msg05434.html

> 
> * Mandatory: type_int()
> 
>   Interface uses int64_t for the value.  The implementation should
>   ensure it fits into int64_t.
> 
> * Optional: type_int{8,16,32}()
> 
>   These use int{8,16,32}_t for the value.
> 
>   If present, it should ensure the value fits into the data type.
> 
>   If missing, the core falls back to type_int() plus appropriate range
>   checking.

No one implements them.  In fact, as part of preparing my documentation,
I actually proposed simplifying the visitor callback interface to drop them:
https://lists.gnu.org/archive/html/qemu-devel/2015-09/msg05432.html

> 
> * Optional: type_int64()
> 
>   Same interface as type_int().
> 
>   If present, it should ensure the value fits into int64_t.
> 
>   If missing, the core falls back to type_int().
> 
>   Aside: setting type_int64() would be useful only when you want to
>   distinguish QAPI types int and int64.  So far, nobody does.  In fact,
>   nobody uses QAPI type int64!  I'm tempted to define QAPI type int as a
>   mere alias for int64 and drop the redundant stuff.

Already part of my proposal.

> 
> * Optional: type_uint{8,16,32}()
> 
>   These use uint{8,16,32}_t for the value.
> 
>   If present, it should ensure the value fits into the data type.
> 
>   If missing, the core falls back to type_int() plus appropriate range
>   checking.

Also unused, and simplified above.

> 
> * Optional: type_uint64()
> 
>   Now it gets interesting.  Interface uses uint64_t for the value.
> 
>   If present, it should ensure the value fits into uint64_t.
> 
>   If missing, the core falls back to type_int().  No range checking.  If
>   type_int() performs range checking as it should, then uint64_t values
>   not representable in int64_t get rejected (wrong), and negative values
>   representable in int64_t get cast to uint64_t (also wrong).
> 
>   I think we need to make type_uint64() mandatory, and drop the
>   fallback.

Probably a good idea, although not done in my proposed patches.
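
To make the fallback problem concrete, here is a small standalone sketch. It does not use the Visitor API: parse_int64() merely stands in for a range-checking type_int(), and parse_uint64_via_int() and parse_uint64() are hypothetical names for the fallback path and a dedicated type_uint64() respectively.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Stand-in for a range-checking type_int(): accept only what fits int64_t. */
static bool parse_int64(const char *s, int64_t *out)
{
    char *end;

    assert(s && out);
    errno = 0;
    long long v = strtoll(s, &end, 0);
    if (errno == ERANGE || end == s || *end != '\0') {
        return false;
    }
    *out = v;
    return true;
}

/* Fallback path: parse as int64_t, then cast to uint64_t with no check. */
static bool parse_uint64_via_int(const char *s, uint64_t *out)
{
    int64_t v;

    if (!parse_int64(s, &v)) {
        return false;       /* large uint64_t values are rejected here */
    }
    *out = (uint64_t)v;     /* negative values wrap around here */
    return true;
}

/* Dedicated path: parse directly as an unsigned 64-bit value. */
static bool parse_uint64(const char *s, uint64_t *out)
{
    const char *p = s;
    char *end;

    assert(s && out);
    while (isspace((unsigned char)*p)) {
        p++;
    }
    if (*p == '-') {
        return false;       /* strtoull() would silently negate the value */
    }
    errno = 0;
    unsigned long long v = strtoull(p, &end, 0);
    if (errno == ERANGE || end == p || *end != '\0') {
        return false;
    }
    *out = v;
    return true;
}
```

In this sketch the fallback rejects "18446744073709551615" and turns "-1" into UINT64_MAX, which are the two wrong outcomes described above; the dedicated parser accepts the former and rejects the latter.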

> 
> * Optional: type_size()
> 
>   Same interface as type_uint64().
> 
>   If present, it should ensure the value fits into uint64_t.
> 
>   If missing, the core first tries falling back to type_uint64() and
>   then to type_int().  Falling back to type_int() is as wrong here as it
>   is in type_uint64().

Provided by the QemuOpts parser to allow '1k' to mean 1024, and so on.
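
As an illustration only, a size parser with binary suffixes could look roughly like the sketch below. This is not the QemuOpts code: parse_size() is a hypothetical helper, and the accepted suffix set (k/K, M, G) is an assumption made for the example.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: parse "<digits>[k|K|M|G]" into a byte count.
 * Returns false on malformed input or if the result overflows uint64_t. */
static bool parse_size(const char *s, uint64_t *out)
{
    char *end;
    unsigned shift = 0;

    assert(s && out);
    if (!isdigit((unsigned char)*s)) {
        return false;               /* no sign, no leading whitespace */
    }
    errno = 0;
    unsigned long long v = strtoull(s, &end, 10);
    if (errno == ERANGE) {
        return false;
    }

    switch (*end) {
    case '\0':
        break;                      /* plain byte count */
    case 'k':
    case 'K':
        shift = 10;
        end++;
        break;
    case 'M':
        shift = 20;
        end++;
        break;
    case 'G':
        shift = 30;
        end++;
        break;
    default:
        return false;               /* unknown suffix */
    }
    if (*end != '\0') {
        return false;               /* trailing garbage after the suffix */
    }
    if (v > (UINT64_MAX >> shift)) {
        return false;               /* scaling would overflow */
    }
    *out = (uint64_t)v << shift;
    return true;
}
```

Here "1k" yields 1024 and "2M" yields 2097152, while "1kk" and "-1" are rejected.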

> 
>> As a bug fix, ignore warnings about preference of qemu_strto[u]ll().
> 
> I'm not sure I get this sentence.
> 
>> Cc: qemu-sta...@nongnu.org
>> Signed-off-by: Andreas Färber 
> 
> On the actual patch, I have nothing to add over Eric's review right now.
> 

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org





Re: [Qemu-devel] [Qemu-block] [PATCH 3/3] block: mirror - zero unallocated target sectors when zero init not present

2015-09-30 Thread Jeff Cody
On Mon, Sep 28, 2015 at 04:23:16PM +0100, Stefan Hajnoczi wrote:
> On Sun, Sep 27, 2015 at 11:29:18PM -0400, Jeff Cody wrote:
> > +if (s->zero_cycle) {
> > +ret = bdrv_get_block_status(s->target, sector_num, nb_sectors, &pnum);
> > +if (!(ret & BDRV_BLOCK_ZERO)) {
> > +bdrv_aio_write_zeroes(s->target, sector_num, op->nb_sectors,
> > +  s->unmap ? BDRV_REQ_MAY_UNMAP : 0,
> > +  mirror_write_complete, op);
> 
> mirror_write_complete will advance s->common.offset.  Won't the progress
> be incorrect if we do that for both zeroing and regular mirroring?

Good point.  However, is it really wrong to count it in the progress,
if we do the zero mirror pass?  I



Re: [Qemu-devel] [PATCH 1/2] target-i386: Use 1UL for bit shift

2015-09-30 Thread Paolo Bonzini


On 29/09/2015 22:34, Eduardo Habkost wrote:
> Fix undefined behavior detected by clang runtime check:
> 
>   qemu/target-i386/cpu.c:1494:15: runtime error:
> left shift of 1 by 31 places cannot be represented in type 'int'
> 
> While doing that, add extra parentheses for clarity.
> 
> Reported-by: Peter Maydell 
> Signed-off-by: Eduardo Habkost 
> ---
>  target-i386/cpu.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/target-i386/cpu.c b/target-i386/cpu.c
> index 2b914b2..6af6db9 100644
> --- a/target-i386/cpu.c
> +++ b/target-i386/cpu.c
> @@ -1491,7 +1491,7 @@ static void report_unavailable_features(FeatureWord w, uint32_t mask)
>  int i;
>  
>  for (i = 0; i < 32; ++i) {
> -if (1 << i & mask) {
> +if ((1UL << i) & mask) {

1U is enough.

Paolo

ps: Ego ceterum censeo that these warnings are useless and uglify the
code unnecessarily.  But it looks like I'm in a minority so the patch is
okay.

>  const char *reg = get_register_name_32(f->cpuid_reg);
>  assert(reg);
>  fprintf(stderr, "warning: %s doesn't support requested feature: "
> 
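
For reference, a minimal standalone sketch of the pattern under discussion. It assumes a 32-bit unsigned int, and count_feature_bits() is a hypothetical function for illustration, not the code in target-i386/cpu.c.

```c
#include <assert.h>
#include <stdint.h>

/* Count the bits set in mask, testing each position with an unsigned shift. */
static int count_feature_bits(uint32_t mask)
{
    int n = 0;

    /* The sketch relies on unsigned int being at least 32 bits wide. */
    assert(sizeof(unsigned int) * 8 >= 32);

    for (int i = 0; i < 32; ++i) {
        /* 1U << 31 is well defined; 1 << 31 does not fit a 32-bit int. */
        if ((1U << i) & mask) {
            n++;
        }
    }
    return n;
}
```

Since mask is a uint32_t, an unsigned int operand is already wide enough for the test; a wider type such as unsigned long adds nothing here.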



Re: [Qemu-devel] Loading image/elf to cpu that has different not system memory address space

2015-09-30 Thread Peter Maydell
On 30 September 2015 at 13:15, Marcin Krzemiński
 wrote:
>
>
> 2015-09-30 12:44 GMT+02:00 Peter Maydell :
>>
>> On 30 September 2015 at 06:18, Marcin Krzemiński
>>  wrote:
>> > I have real memory at 0xfffffff0 now (aliased to a lower memory
>> > address).
>> > Does it mean that qemu might try to execute some instructions from
>> > there?
>>
>> As I say, we need there to be fake RAM at that address. We never
>> try to read its contents, though.

> That wasn't clear to me.
> Since I have real, in-use memory there in my model, I worried that I might
> sometimes get unexpected behavior.

It seems very unlikely that you would have real memory there in an
M profile CPU system -- that address range is part of the Vendor
System section of the address space. (Among other things it's
compulsorily execute-never.)

thanks
-- PMM



Re: [Qemu-devel] [PATCH 2/3] hw: do not pass NULL to memory_region_init from instance_init

2015-09-30 Thread Paolo Bonzini


On 30/09/2015 10:57, Markus Armbruster wrote:
> Paolo Bonzini  writes:
> 
>> > This causes the region to outlive the object, because it attaches the
>> > region to /machine.  This is not nice for the "realize" method, but
>> > much worse for "instance_init" because it can cause dangling pointers
>> > after a simple object_new/object_unref pair.
>> >
>> > Reported-by: Markus Armbruster 
>> > Signed-off-by: Paolo Bonzini 
> One more: pxa2xx_pcmcia_initfn().
> 
> The ones you fix are
> Tested-by: Markus Armbruster 

Can you fix it up and take it through your series?

Paolo



Re: [Qemu-devel] [PATCH 1/7] string-input-visitor: Fix uint64 parsing

2015-09-30 Thread Andreas Färber
Am 30.09.2015 um 15:19 schrieb Markus Armbruster:
> Andreas Färber  writes:
>> As a bug fix, ignore warnings about preference of qemu_strto[u]ll().
> 
> I'm not sure I get this sentence.

This patch causes checkpatch warnings. I intentionally do not address
them in this bug-fix patch, but instead in a later patch in the series.

Andreas

-- 
SUSE Linux GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Felix Imendörffer, Jane Smithard, Graham Norton; HRB 21284 (AG Nürnberg)



Re: [Qemu-devel] [PATCH 2/3] hw: do not pass NULL to memory_region_init from instance_init

2015-09-30 Thread Paolo Bonzini


On 30/09/2015 10:30, Thomas Huth wrote:
>> > @@ -944,7 +944,7 @@ static void tcx_initfn(Object *obj)
>> >  SysBusDevice *sbd = SYS_BUS_DEVICE(obj);
>> >  TCXState *s = TCX(obj);
>> >  
>> > -memory_region_init_ram(&s->rom, NULL, "tcx.prom", FCODE_MAX_ROM_SIZE,
>> > +memory_region_init_ram(&s->rom, OBJECT(s), "tcx.prom", FCODE_MAX_ROM_SIZE,
>> > &error_fatal);
> Why "OBJECT(s)" and not simply "obj" ?

No particular reason, just the way my brain worked. :)

Paolo



Re: [Qemu-devel] [PATCH] spapr: add a default rng device

2015-09-30 Thread Greg Kurz
On Wed, 30 Sep 2015 11:10:52 +0200
Thomas Huth  wrote:

> On 30/09/15 10:33, Greg Kurz wrote:
> > On Tue, 29 Sep 2015 15:01:09 +1000
> > David Gibson  wrote:
> > 
> >> On Mon, Sep 28, 2015 at 12:13:47PM +0200, Greg Kurz wrote:
> >>> A recent patch by Thomas Huth brought a new spapr-rng pseudo-device to
> >>> provide high-quality random numbers to guests. The device may either be
> >>> backed by a "RngBackend" or the in-kernel implementation of the H_RANDOM
> >>> hypercall.
> >>>
> >>> Since modern POWER8 based servers always provide a hardware rng, it makes
> >>> sense to create a spapr-rng device with use-kvm=true by default when it
> >>> is available.
> >>>
> >>> Of course we want the user to have full control on how the rng is handled.
> >>> The default device WILL NOT be created in the following cases:
> >>> - the -nodefaults option was passed
> >>> - a spapr-rng device was already passed on the command line
> >>>
> >>> The default device is created at reset time to ensure devices specified on
> >>> the command line have been created.
> >>>
> >>> Signed-off-by: Greg Kurz 
> >>
> >> So, I think the concept is ok, but..
> >>
> > 
> > Just to be sure about the concept.
> > 
> > The goal is to free users from having to explicitly pass
> > 
> > -device spapr-rng,use-kvm=true
> > 
> > ... when ALL the following conditions are met:
> > 
> > 1) KVM is used and advertises KVM_CAP_PPC_HWRNG
> > 2) -nodefaults HAS NOT been passed on the cmdline
> > 3) -device spapr-rng HAS NOT been passed on the cmdline
> > 
> >>> ---
> >>>  hw/ppc/spapr.c   |   17 +
> >>>  hw/ppc/spapr_rng.c   |2 +-
> >>>  target-ppc/kvm.c |9 +
> >>>  target-ppc/kvm_ppc.h |6 ++
> >>>  4 files changed, 29 insertions(+), 5 deletions(-)
> >>>
> >>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> >>> index 7f4f196e53e5..ee048ecffd0c 100644
> >>> --- a/hw/ppc/spapr.c
> >>> +++ b/hw/ppc/spapr.c
> >>> @@ -1059,6 +1059,14 @@ static int spapr_check_htab_fd(sPAPRMachineState *spapr)
> >>>  return rc;
> >>>  }
> >>>  
> >>> +static void spapr_rng_create(void)
> >>> +{
> >>> +Object *rng = object_new(TYPE_SPAPR_RNG);
> >>> +
> >>> +object_property_set_bool(rng, true, "use-kvm", &error_abort);
> >>> +object_property_set_bool(rng, true, "realized", &error_abort);
> >>> +}
> >>> +
> >>>  static void ppc_spapr_reset(void)
> >>>  {
> >>>  sPAPRMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
> >>> @@ -1082,6 +1090,15 @@ static void ppc_spapr_reset(void)
> >>>  spapr->rtas_addr = rtas_limit - RTAS_MAX_SIZE;
> >>>  spapr->fdt_addr = spapr->rtas_addr - FDT_MAX_SIZE;
> >>>  
> >>> +/* Create a rng device if the user did not provide it already and
> >>> + * KVM has hwrng support.
> >>> + */
> >>> +if (defaults_enabled() &&
> >>> +kvmppc_hwrng_present() &&
> >>> +!object_resolve_path_type("", TYPE_SPAPR_RNG, NULL)) {
> >>> +spapr_rng_create();
> >>> +}
> >>> +
> >>
> >> Constructing the RNG at reset time is just wrong.  Using
> >> defaults_enabled() is ugly at the best of times, using it at reset,
> >> after construction of the qom tree is generally complete, is just
> >> hideous.
> >>
> > 
> > Yeah I ended up with this hack because I could not figure out how
> > to give priority to a spapr-rng device specified on the cmdline
> > over the automatic one... poor QOM skills :\
> > 
> > If you have a suggestion to handle this case in a more appropriate way,
> > and it is worth the pain compared to the gain, please advise.
> 
> Not sure whether this might be an acceptable solution, but maybe you
> could use qemu_opts_foreach(qemu_find_opts("device"), ...) to check
> whether a "spapr-rng" device has been specified at the command line?
> 

Yes, it would at least allow creating the device at init time... but
I don't know if it is good practice, considering that:

$ grep -r qemu_opts_foreach hw/
hw/core/qdev-properties-system.c:qemu_opts_foreach(qemu_find_opts("global"),
$

Cheers.

--
Greg




[Qemu-devel] [RFC PATCH 3/4] hw/intc/arm_gicv3_kvm: Implement get/put functions

2015-09-30 Thread Pavel Fedin
This actually implements pre_save and post_load methods for in-kernel
vGICv3.

Signed-off-by: Pavel Fedin 
---
 hw/intc/arm_gicv3_kvm.c | 391 +++-
 1 file changed, 387 insertions(+), 4 deletions(-)

diff --git a/hw/intc/arm_gicv3_kvm.c b/hw/intc/arm_gicv3_kvm.c
index b48f78f..5f268a3 100644
--- a/hw/intc/arm_gicv3_kvm.c
+++ b/hw/intc/arm_gicv3_kvm.c
@@ -21,8 +21,10 @@
 
 #include "hw/intc/arm_gicv3_common.h"
 #include "hw/sysbus.h"
+#include "qemu/error-report.h"
 #include "sysemu/kvm.h"
 #include "kvm_arm.h"
+#include "gicv3_internal.h"
 #include "vgic_common.h"
 
 #ifdef DEBUG_GICV3_KVM
@@ -41,6 +43,15 @@
 #define KVM_ARM_GICV3_GET_CLASS(obj) \
  OBJECT_GET_CLASS(KVMARMGICv3Class, (obj), TYPE_KVM_ARM_GICV3)
 
+#define ICC_PMR_EL1     ARM64_SYS_REG(0b11, 0b000, 0b0100, 0b0110, 0b000)
+#define ICC_BPR0_EL1    ARM64_SYS_REG(0b11, 0b000, 0b1100, 0b1000, 0b011)
+#define ICC_APR0_EL1(n) ARM64_SYS_REG(0b11, 0b000, 0b1100, 0b1000, 0b100 | n)
+#define ICC_APR1_EL1(n) ARM64_SYS_REG(0b11, 0b000, 0b1100, 0b1001, 0b000 | n)
+#define ICC_BPR1_EL1    ARM64_SYS_REG(0b11, 0b000, 0b1100, 0b1100, 0b011)
+#define ICC_CTLR_EL1    ARM64_SYS_REG(0b11, 0b000, 0b1100, 0b1100, 0b100)
+#define ICC_IGRPEN0_EL1 ARM64_SYS_REG(0b11, 0b000, 0b1100, 0b1100, 0b110)
+#define ICC_IGRPEN1_EL1 ARM64_SYS_REG(0b11, 0b000, 0b1100, 0b1100, 0b111)
+
 typedef struct KVMARMGICv3Class {
 ARMGICv3CommonClass parent_class;
 DeviceRealize parent_realize;
@@ -54,16 +65,382 @@ static void kvm_arm_gicv3_set_irq(void *opaque, int irq, int level)
 kvm_arm_gic_set_irq(s->num_irq, irq, level);
 }
 
+#define KVM_VGIC_ATTR(offset, cpu) \
+((((uint64_t)(cpu) << KVM_DEV_ARM_VGIC_CPUID_SHIFT) & \
+  KVM_DEV_ARM_VGIC_CPUID_MASK) | \
+ (((uint64_t)(offset) << KVM_DEV_ARM_VGIC_OFFSET_SHIFT) & \
+  KVM_DEV_ARM_VGIC_OFFSET_MASK))
+
+static inline void kvm_gicd_access(GICv3State *s, int offset, int cpu,
+   uint32_t *val, bool write)
+{
+kvm_device_access(s->dev_fd, KVM_DEV_ARM_VGIC_GRP_DIST_REGS,
+  KVM_VGIC_ATTR(offset, cpu), val, write);
+}
+
+static inline void kvm_gicr_access(GICv3State *s, int offset, int cpu,
+   uint32_t *val, bool write)
+{
+kvm_device_access(s->dev_fd, KVM_DEV_ARM_VGIC_GRP_REDIST_REGS,
+  KVM_VGIC_ATTR(offset, cpu), val, write);
+}
+
+static inline void kvm_gicc_access(GICv3State *s, uint64_t reg, int cpu,
+   uint64_t *val, bool write)
+{
+kvm_device_access(s->dev_fd, KVM_DEV_ARM_VGIC_GRP_CPU_REGS,
+  ((((uint64_t)(cpu) << KVM_DEV_ARM_VGIC_CPUID_SHIFT) &
+KVM_DEV_ARM_VGIC_CPUID_MASK) | reg), val, write);
+}
+
+/*
+ * Translate from the in-kernel field for an IRQ value to/from the qemu
+ * representation.
+ */
+typedef void (*vgic_translate_fn)(GICv3State *s, int irq, int cpu,
+  uint32_t *field, bool to_kernel);
+
+/* synthetic translate function used for clear/set registers to completely
+ * clear a setting using a clear-register before setting the remaining bits
+ * using a set-register */
+static void translate_clear(GICv3State *s, int irq, int cpu,
+uint32_t *field, bool to_kernel)
+{
+if (to_kernel) {
+*field = ~0;
+} else {
+/* does not make sense: qemu model doesn't use set/clear regs */
+abort();
+}
+}
+
+static void translate_enabled(GICv3State *s, int irq, int cpu,
+  uint32_t *field, bool to_kernel)
+{
+if (to_kernel) {
+*field = test_bit(cpu, s->irq_state[irq].enabled);
+} else {
+set_irq_bit(s, s->irq_state[irq].enabled, irq, cpu, *field);
+}
+}
+
+static void translate_group(GICv3State *s, int irq, int cpu,
+uint32_t *field, bool to_kernel)
+{
+if (to_kernel) {
+*field = test_bit(cpu, s->irq_state[irq].group);
+} else {
+set_irq_bit(s, s->irq_state[irq].group, irq, cpu, *field);
+}
+}
+
+static void translate_trigger(GICv3State *s, int irq, int cpu,
+  uint32_t *field, bool to_kernel)
+{
+if (to_kernel) {
+*field = s->irq_state[irq].edge_trigger ? 2 : 0;
+} else {
+s->irq_state[irq].edge_trigger = (*field & 2) ? true : false;
+}
+}
+
+static void translate_pending(GICv3State *s, int irq, int cpu,
+  uint32_t *field, bool to_kernel)
+{
+if (to_kernel) {
+*field = gicv3_test_pending(s, irq, cpu);
+} else {
+set_irq_bit(s, s->irq_state[irq].pending, irq, cpu, *field);
+/* TODO: Capture if level-line is held high in the kernel */
+}
+}
+
+static void translate_active(GICv3State *s, int irq, int cpu,
+ uint32_t *field, bool to_kernel)
+{
+if (to_kernel) {
+*field = test_bit(cpu, s->irq_state[irq].active);
+  

Re: [Qemu-devel] [PATCH v4 3/5] acpi: pc: add fw_cfg device node to ssdt

2015-09-30 Thread Paolo Bonzini


On 30/09/2015 02:18, Gabriel L. Somlo wrote:
> Yes, we're OK. Throughout it all I *meant* to write 0x0B (bee), but my
> brain sometimes mistakenly makes me write 0x08 (eight) instead. Sorry for
> the confusion... :)

IIRC from the pvpanic trainwreck, Windows XP and 2003 always complain
even for 0x0B about a missing driver.

Paolo



Re: [Qemu-devel] [PATCH 1/4] spapr_pci: Allow PCI host bridge DMA window to be configured

2015-09-30 Thread Laurent Vivier


On 30/09/2015 05:48, David Gibson wrote:
> At present the PCI host bridge (PHB) for the pseries machine type has a
> fixed DMA window from 0..1GB (in PCI address space) which is mapped to real
> memory via the PAPR paravirtualized IOMMU.
> 
> For better support of VFIO devices, we're going to want to allow for
> different configurations of the DMA window.
> 
> Eventually we'll want to allow the guest itself to reconfigure the window
> via the PAPR dynamic DMA window interface, but as a preliminary this patch
> allows the user to reconfigure the window with new properties on the PHB
> device.
> 
> Signed-off-by: David Gibson 
> Reviewed-by: Thomas Huth 
> ---
>  hw/ppc/spapr_pci.c  | 7 +--
>  include/hw/pci-host/spapr.h | 3 +--
>  2 files changed, 6 insertions(+), 4 deletions(-)
> 
> diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
> index 617b7f3..cb7c351 100644
> --- a/hw/ppc/spapr_pci.c
> +++ b/hw/ppc/spapr_pci.c
> @@ -1387,7 +1387,7 @@ static void spapr_phb_finish_realize(sPAPRPHBState *sphb, Error **errp)
>  sPAPRTCETable *tcet;
>  uint32_t nb_table;
>  
> -nb_table = SPAPR_PCI_DMA32_SIZE >> SPAPR_TCE_PAGE_SHIFT;
> +nb_table = sphb->dma_win_size >> SPAPR_TCE_PAGE_SHIFT;
>  tcet = spapr_tce_new_table(DEVICE(sphb), sphb->dma_liobn,
> 0, SPAPR_TCE_PAGE_SHIFT, nb_table, false);
>  if (!tcet) {
> @@ -1397,7 +1397,7 @@ static void spapr_phb_finish_realize(sPAPRPHBState *sphb, Error **errp)
>  }
>  
>  /* Register default 32bit DMA window */
> -memory_region_add_subregion(&sphb->iommu_root, 0,
> +memory_region_add_subregion(&sphb->iommu_root, sphb->dma_win_addr,
>  spapr_tce_get_iommu(tcet));
>  }
>  
> @@ -1430,6 +1430,9 @@ static Property spapr_phb_properties[] = {
> SPAPR_PCI_IO_WIN_SIZE),
>  DEFINE_PROP_BOOL("dynamic-reconfiguration", sPAPRPHBState, dr_enabled,
>   true),
> +/* Default DMA window is 0..1GB */
> +DEFINE_PROP_UINT64("dma_win_addr", sPAPRPHBState, dma_win_addr, 0),
> +DEFINE_PROP_UINT64("dma_win_size", sPAPRPHBState, dma_win_size, 0x40000000),
>  DEFINE_PROP_END_OF_LIST(),
>  };
>  
> diff --git a/include/hw/pci-host/spapr.h b/include/hw/pci-host/spapr.h
> index 5322b56..7de5e02 100644
> --- a/include/hw/pci-host/spapr.h
> +++ b/include/hw/pci-host/spapr.h
> @@ -78,6 +78,7 @@ struct sPAPRPHBState {
>  MemoryRegion memwindow, iowindow, msiwindow;
>  
>  uint32_t dma_liobn;
> +hwaddr dma_win_addr, dma_win_size;
>  AddressSpace iommu_as;
>  MemoryRegion iommu_root;
>  
> @@ -115,8 +116,6 @@ struct sPAPRPHBVFIOState {
>  
>  #define SPAPR_PCI_MSI_WINDOW 0x40000000000ULL
>  
> -#define SPAPR_PCI_DMA32_SIZE 0x40000000
> -
>  static inline qemu_irq spapr_phb_lsi_qirq(struct sPAPRPHBState *phb, int pin)
>  {
>  sPAPRMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
> 
Reviewed-by: Laurent Vivier 
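
As a quick sanity check of the arithmetic in the patch, here is a minimal sketch of how the window size maps to a TCE table size. It assumes 4 KiB IOMMU pages (a page shift of 12); TCE_PAGE_SHIFT, tce_table_entries() and dma_window_end() are illustrative names, not QEMU symbols.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption for this sketch: 4 KiB IOMMU pages. */
#define TCE_PAGE_SHIFT 12

/* Number of TCE entries needed to map a DMA window of win_size bytes. */
static uint32_t tce_table_entries(uint64_t win_size)
{
    /* The window is expected to be a whole number of IOMMU pages. */
    assert((win_size & ((1ULL << TCE_PAGE_SHIFT) - 1)) == 0);

    return (uint32_t)(win_size >> TCE_PAGE_SHIFT);
}

/* First PCI address past a window that starts at win_addr. */
static uint64_t dma_window_end(uint64_t win_addr, uint64_t win_size)
{
    return win_addr + win_size;
}
```

Under that assumption the default 0..1GB window needs 262144 entries, and doubling the window size doubles the table.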



Re: [Qemu-devel] [RFC v5 0/6] Slow-path for atomic instruction translation

2015-09-30 Thread Paolo Bonzini


On 30/09/2015 10:14, alvise rigo wrote:
>> From 10000ft, both approaches rely on checking a flag during stores.
>> This is split between the TLB and the CPUState for Alvise's patches (in
>> order to exploit the existing fast-path checks), and entirely in the
>> radix tree for Emilio's.  However, the idea is the same.
>>
>> Now, the patches are okay for serial emulation, but I am not sure if it's
>> possible to do lock-free ll/sc emulation, because there is a race.
> 
> Do you mean to not use any locking mechanism at all at the emulation side?

Not using it in the fast path of the store, at least.

>>
>> If we check the flag before the store, the race is as follows:
>>
>>    CPU0                        CPU1
>>    -------------------------------------------------------
>>    check flag
>>                                load locked:
>>                                   set flag
>>                                   load value (normal load on CPU)
>>    store
>>                                store conditional (normal store on CPU)
>>
>> where the sc doesn't fail.  For completeness, if we check it afterwards
> 
> Shouldn't this be prevented by the tcg_excl_access_lock in the
> patchseries based on mttcg (branch slowpath-for-atomic-v5-mttcg)?

No, the issue happens already in the fast path, i.e. when checking the TLB.

Have you ever heard of consensus numbers?  Basically, it's a tool to
prove that it is impossible to implement an algorithm X in a wait-free
manner.

For LL/SC/store, consider a simple case with two CPUs, one executing LL
and one executing a store.  This does not require any lock on the LL/SC
side, because there is only one CPU running that code.  It is also okay
if it requires a lock to synchronize between LL/store in the slow path.
However, we want a wait-free fast path.  If we can formalize the fast
path as a consensus problem, consensus numbers let us prove whether it
can be done or not.

In fact, it's easy for the CPUs to use consensus to decide the outcome
of a subsequent SC instruction.  If the LL comes before the store, the
SC fails.  If the LL comes after the store, the SC succeeds.  Because
there are two CPUs, this consensus problem can be solved with any
primitive whose consensus number is >= 2.

Unfortunately atomic registers (i.e. memory cells, like QEMU's TLBs)
have consensus number 1.  You need test-and-set, compare-and-swap,
atomic writes to two memory locations or something like that.  All very
expensive stuff.
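
To make the consensus argument concrete, here is a minimal sketch in C11 (not taken from either patch series). The names Race, ll_arrives, store_arrives and sc_succeeds are made up for illustration. Both CPUs race on one compare-and-swap; only the first one wins, and the winner fixes the outcome of the later SC according to the rule above (LL before store: SC fails; LL after store: SC succeeds).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

enum { UNDECIDED = 0, LL_FIRST = 1, STORE_FIRST = 2 };

/* One shared decision cell per LL/store race (hypothetical type). */
typedef struct {
    atomic_int order;
} Race;

static void race_init(Race *r)
{
    atomic_init(&r->order, UNDECIDED);
}

/*
 * Each side proposes itself as "first".  Only the first CAS can succeed;
 * the loser reads back the value chosen by the winner.  This is the
 * classic two-process consensus protocol built on compare-and-swap.
 */
static int propose(Race *r, int me)
{
    int expected = UNDECIDED;

    if (atomic_compare_exchange_strong(&r->order, &expected, me)) {
        return me;
    }
    return expected; /* already decided by the other CPU */
}

static int ll_arrives(Race *r)
{
    return propose(r, LL_FIRST);
}

static int store_arrives(Race *r)
{
    return propose(r, STORE_FIRST);
}

/* LL before store => SC fails; LL after store => SC succeeds. */
static bool sc_succeeds(Race *r)
{
    return atomic_load(&r->order) == STORE_FIRST;
}
```

With plain loads and stores in place of the compare-and-swap, both sides could observe UNDECIDED and each believe it came first, which is the race shown in the diagram earlier in this message.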

I attach a PDF with some pseudocode examples, and a Promela model of the
same.  (Yes, I was nerd-sniped).

>> If I'm right, we can still keep the opcodes and implement them with a
>> simple cmpxchg.  It would provide a nice generic tool to implement
>> atomic operations, and it will work correctly if the target has ll/sc.
>> However, ll/sc-on-cmpxchg (e.g., ARM-on-x86) would be susceptible to the
>> ABA problem.
> 
> This was one of my fears that led me to the ll/sc approach. I think it
> could be even more probable in emulation since we can't assume the
> distance in time between LLs and SCs to be small to avoid "aba"
> accesses.

In practice cmpxchg works because most primitives and lock-free data
structures are written against cmpxchg, not LL/SC.  ABA is avoided
through garbage collection, RCU or hazard pointers, without relying on
LL/SC semantics.
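
The ABA exposure of ll/sc-on-cmpxchg can be shown with a small C11 sketch. This is only an illustration under the assumption that SC is implemented as a compare-and-swap against the value seen by LL; EmuCPU, emu_ll and emu_sc are hypothetical names, not functions from the patches.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

typedef struct {
    int ll_val; /* value observed by the emulated load-locked */
} EmuCPU;

/* Emulated LL: just remember what was read. */
static int emu_ll(EmuCPU *cpu, atomic_int *addr)
{
    cpu->ll_val = atomic_load(addr);
    return cpu->ll_val;
}

/*
 * Emulated SC: succeeds if memory still holds the value seen by LL.
 * It cannot tell "never written" apart from "written and restored".
 */
static bool emu_sc(EmuCPU *cpu, atomic_int *addr, int newval)
{
    int expected = cpu->ll_val;

    return atomic_compare_exchange_strong(addr, &expected, newval);
}
```

If another CPU writes 7 and then 5 again between emu_ll() reading 5 and emu_sc(), the emulated SC still succeeds, whereas a real LL/SC pair would be expected to fail. Guest code written against cmpxchg already has to tolerate this, which is the point made above.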

Again, your patches are still very useful to provide the abstraction.

Paolo


llsc.pdf
Description: Adobe PDF document
int ll_val;      // value read by LL
int val;         // current value of memory cell
int mark;        // mark to trigger slow path in stores
int listed;      // abstract representation of "locked list"
int sc_failure;  // did SC succeed or fail?

/* common implementation of store-conditional */
#define SC   \
   atomic {  \
   sc_failure = !listed; \
   assert(sc_failure || val == ll_val);  \
   if :: sc_failure -> skip; \
  :: else -> mark = 0; listed = 0; val = 2;  \
   fi;   \
   }

// trivial implementation, obviously broken - does not even try
#if 0
#define LL\
   listed = 1;\
   ll_val = val;  \
   mark = 1;

#define STORE \
   val = 1
#endif

// ---

// broken implementation #1
// STORE can read the mark before LL has set it.  If it stores
// the new value after LL has read the memory, SC will not
// notice the conflict.

#if 0
#define LL\
   ll_val = val;  \
   atomic {   \
 listed = 1;  \
 mark = 1;\
   }

#define STORE \
   if :: mark -> atomic { listed = 0; mark = 0; } \
  :: else -> s

Re: [Qemu-devel] [RFC v5 2/6] softmmu: Add new TLB_EXCL flag

2015-09-30 Thread alvise rigo
On Wed, Sep 30, 2015 at 1:09 PM, Peter Maydell  wrote:
> On 30 September 2015 at 10:24, alvise rigo
>  wrote:
>> On Wed, Sep 30, 2015 at 5:34 AM, Richard Henderson  wrote:
>>> (1) I don't see why EXCL support should differ whether MMIO is set or not.
>>> Either we support exclusive accesses on memory-mapped io like we do on ram
>>> (in which case this is wrong) or we don't (in which case this is
>>> unnecessary).
>>
>> I was not sure whether or not we had to support also MMIO memory.
>> In theory there shouldn't be any issues for including also
>> memory-mapped io, I will consider this for the next version.
>
> Worth considering the interaction between exclusives and other
> cases for which we force the slowpath, notably watchpoints.
>
>>> AFAIK, Alpha is the only target we have which specifies that any normal
>>> memory access during a ll+sc sequence may fail the sc.
>>
>> I will dig into it because I remember that the Alpha architecture
>> behaves like ARM in the handling of LDxL/STxC instructions.
>
> ARM semantics are that a non-exclusive store by this CPU between
> a ldrex and a strex might result in loss of the (local) monitor,
> but non-exclusive loads by this CPU won't. (It's an IMPDEF
> choice.)

Indeed, what is implemented by this patch series is one of the
permissible choices - very close to the one implemented by the current
TCG - that could match all the other architectures with similar
semantics (now I'm not sure about Alpha).
In this regard, I was wondering, should these semantics be somehow
target-* dependent?
Like having some per-architecture functions that, for each LoadLink,
set the size of the exclusive memory region to be protected and decide
whether a normal store/load will make one CPU's SC fail.
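
One possible shape for such per-architecture hooks, as a rough sketch only: every name below (ExclPolicy, ExclMonitor, the two example policies and the granule value) is hypothetical and nothing here exists in the series. The model only looks at normal accesses that fall inside the protected region.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Per-target choices (hypothetical). */
typedef struct {
    uint64_t granule;   /* size of the protected region, power of two */
    bool store_breaks;  /* normal store in the region fails the SC */
    bool load_breaks;   /* normal load in the region fails the SC */
} ExclPolicy;

typedef struct {
    bool valid;
    uint64_t base;
} ExclMonitor;

/* Stores break the reservation, loads do not (the choice described for ARM). */
static const ExclPolicy policy_store_only = { 8, true, false };
/* Any normal access breaks it (loosely what was said about Alpha). */
static const ExclPolicy policy_any_access = { 8, true, true };

static uint64_t excl_base(const ExclPolicy *p, uint64_t addr)
{
    return addr & ~(p->granule - 1);
}

static void excl_load_link(ExclMonitor *m, const ExclPolicy *p, uint64_t addr)
{
    m->valid = true;
    m->base = excl_base(p, addr);
}

/* Called for every normal (non-exclusive) access by the same CPU. */
static void excl_normal_access(ExclMonitor *m, const ExclPolicy *p,
                               uint64_t addr, bool is_store)
{
    bool hits = m->valid && excl_base(p, addr) == m->base;

    if (hits && (is_store ? p->store_breaks : p->load_breaks)) {
        m->valid = false;
    }
}

/* Returns true if the store-conditional may proceed. */
static bool excl_store_cond(ExclMonitor *m, const ExclPolicy *p, uint64_t addr)
{
    bool ok = m->valid && excl_base(p, addr) == m->base;

    m->valid = false; /* the reservation is consumed either way */
    return ok;
}
```

The granule of 8 bytes is made up; a real target would pick its own reservation size.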

Thank you,
alvise

>
> thanks
> -- PMM



Re: [Qemu-devel] [PATCH] block/raw-posix: Open file descriptor O_RDWR to work around glibc posix_fallocate emulation issue.

2015-09-30 Thread Kevin Wolf
Am 29.09.2015 um 17:54 hat Richard W.M. Jones geschrieben:
>   https://bugzilla.redhat.com/show_bug.cgi?id=1265196
> 
> The following command fails on an NFS mountpoint:
> 
>   $ qemu-img create -f qcow2 -o preallocation=falloc disk.img 262144
>   Formatting 'disk.img', fmt=qcow2 size=262144 encryption=off cluster_size=65536 preallocation='falloc' lazy_refcounts=off
>   qemu-img: disk.img: Could not preallocate data for the new file: Bad file descriptor
> 
> The reason turns out to be because NFS doesn't support the
> posix_fallocate call.  glibc emulates it instead.  However glibc's
> emulation involves using the pread(2) syscall.  The pread syscall
> fails with EBADF if the file descriptor is opened without the read
> open-flag (ie. open (..., O_WRONLY)).
> 
> I contacted glibc upstream about this, and their response is here:
> 
>   https://bugzilla.redhat.com/show_bug.cgi?id=1265196#c9
> 
> There are two possible fixes: Use Linux fallocate directly, or (this
> fix) work around the problem in qemu by opening the file with O_RDWR
> instead of O_WRONLY.
> 
> Signed-off-by: Richard W.M. Jones 
> BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1265196

Thanks, applied to the block branch.

Kevin
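
The underlying failure described in the quoted commit message can be reproduced without NFS or glibc's emulation. The following is a small stand-alone sketch (the helper name pread_errno_with_flags and the temporary file path are made up) showing that pread(2) fails with EBADF on a descriptor opened O_WRONLY and works on one opened O_RDWR.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Create a one-byte temporary file, reopen it with the given open flags
 * and try to pread() from it.  Returns 0 if the read worked, otherwise
 * the errno of the step that failed.
 */
static int pread_errno_with_flags(int flags)
{
    char path[] = "/tmp/pread-demo-XXXXXX";
    char byte;
    int err = 0;
    int fd;
    int tmpfd = mkstemp(path);

    if (tmpfd < 0) {
        return errno;
    }
    if (write(tmpfd, "x", 1) != 1) {
        err = errno ? errno : EIO;
        close(tmpfd);
        unlink(path);
        return err;
    }
    close(tmpfd);

    fd = open(path, flags);
    if (fd < 0) {
        err = errno;
        unlink(path);
        return err;
    }
    if (pread(fd, &byte, 1, 0) < 0) {
        err = errno; /* EBADF when fd is not open for reading */
    }
    close(fd);
    unlink(path);
    return err;
}
```

Calling pread_errno_with_flags(O_WRONLY) is expected to return EBADF, while pread_errno_with_flags(O_RDWR) returns 0, which is why opening the image O_RDWR sidesteps the problem.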



Re: [Qemu-devel] [RFC v5 2/6] softmmu: Add new TLB_EXCL flag

2015-09-30 Thread Peter Maydell
On 30 September 2015 at 10:24, alvise rigo
 wrote:
> On Wed, Sep 30, 2015 at 5:34 AM, Richard Henderson  wrote:
>> (1) I don't see why EXCL support should differ whether MMIO is set or not.
>> Either we support exclusive accesses on memory-mapped io like we do on ram
>> (in which case this is wrong) or we don't (in which case this is
>> unnecessary).
>
> I was not sure whether or not we had to support also MMIO memory.
> In theory there shouldn't be any issues for including also
> memory-mapped io, I will consider this for the next version.

Worth considering the interaction between exclusives and other
cases for which we force the slowpath, notably watchpoints.

>> AFAIK, Alpha is the only target we have which specifies that any normal
>> memory access during a ll+sc sequence may fail the sc.
>
> I will dig into it because I remember that the Alpha architecture
> behaves like ARM in the handling of LDxL/STxC instructions.

ARM semantics are that a non-exclusive store by this CPU between
a ldrex and a strex might result in loss of the (local) monitor,
but non-exclusive loads by this CPU won't. (It's an IMPDEF
choice.)

thanks
-- PMM



Re: [Qemu-devel] [PATCH 1/7] string-input-visitor: Fix uint64 parsing

2015-09-30 Thread Markus Armbruster
Andreas Färber  writes:

> All integers would get parsed by strtoll(), not handling the case of
> UINT64 properties with the most significant bit set.

This mess is part of a bigger mess, I'm afraid.

The major ways integers get parsed are:

* QMP: parse_literal() in qmp/qobject/json-parser.c

  This is what parses QMP off the wire.

  RFC 7159 does not prescribe range or precision of JSON numbers.  Our
  implementation accepts the union of int64_t and double.

  If the lexer recognizes a floating-point number, we convert it with
  strtod() and represent it as double.

  If the lexer recognizes a decimal integer, and strtoll() can convert
  it, we represent it in int64_t.  Else, we convert it with strtod() and
  represent it as double.  Unclean: code assumes int64_t is long long.

  Yes, that means QMP can't currently support the full range of QAPI's
  uint64 type.

* QemuOpts: parse_option_number() in util/qemu-option.c

  This is what parses key=value,... strings for command line and other
  places.

  QemuOpts can be used in two ways.  If you fill out QemuOptDesc desc[],
  it rejects unknown keys and parses values of known keys.  If you leave
  it empty, it accepts all keys, and doesn't parse values.  Either way,
  it also stores raw string values.

  QemuOpts' parser only supports unsigned numbers, in decimal, octal and
  hex.  Error checking is very poor.  In particular, it considers
  negative input valid, and silently casts it to uint64_t.  I wouldn't
  be surprised if some code depended on that.

* String input visitor: parse_str() in qapi/string-input-visitor.c

  This appears to be used only by QOM so far:

  - object_property_get_enum()
  - object_property_get_uint16List()
  - object_property_parse()

  parse_str() appears to parse some fancy list syntax.  Comes from
  commit 659268f.  The commit message is useless.  I can't see offhand
  how this interacts with the visitor core.

  Anyway, if we ignore the fancy crap and just look at the parsing of a
  single integer, we see that it supports int64_t in decimal, octal and
  hex, it fails to check for ERANGE, and assumes int64_t is long long.

* Options visitor: opts_type_int() in qapi/opts-visitor.c

  This one is for converting QemuOpts to QAPI-defined C types.  It uses
  the raw string values, not the parsed ones.  The QemuOpts parser is
  neither needed nor wanted here.  You should use the options visitor
  with an empty desc[] array to bypass it.  Example: numa.c.

  We got fancy list syntax again.  This one looks like I could
  understand it with a bit of effort.  But let's look just at the
  parsing of a single integer.  It supports uint64_t in decimal, octal
  and hex, and *surprise* checks for errors carefully.
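
As a point of comparison for the parsers listed above, careful parsing of a single integer might look like the sketch below. It is an illustration only, not code from any of the files mentioned; parse_uint64_strict and parse_int64_strict are made-up names, and the static assertion spells out the "int64_t is long long" assumption instead of leaving it implicit.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

_Static_assert(sizeof(unsigned long long) == sizeof(uint64_t),
               "sketch assumes unsigned long long is 64 bits");

/* Returns 0 on success, -EINVAL for malformed input, -ERANGE on overflow. */
static int parse_uint64_strict(const char *s, uint64_t *out)
{
    const char *p = s;
    char *end;
    unsigned long long v;

    while (isspace((unsigned char)*p)) {
        p++;
    }
    if (*p == '-') {
        return -EINVAL; /* strtoull() would silently negate the value */
    }
    errno = 0;
    v = strtoull(s, &end, 0); /* base 0: decimal, octal and hex */
    if (end == s || *end != '\0') {
        return -EINVAL; /* no digits, or trailing garbage */
    }
    if (errno == ERANGE) {
        return -ERANGE;
    }
    *out = v;
    return 0;
}

/* Signed counterpart, to show where strtoll() gives up. */
static int parse_int64_strict(const char *s, int64_t *out)
{
    char *end;
    long long v;

    errno = 0;
    v = strtoll(s, &end, 0);
    if (end == s || *end != '\0') {
        return -EINVAL;
    }
    if (errno == ERANGE) {
        return -ERANGE;
    }
    *out = v;
    return 0;
}
```

A value with the most significant bit set, such as 0x8000000000000000, is accepted by the unsigned variant and rejected with -ERANGE by the signed one.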

Fixing just a part of a mess can be okay.  I just don't want to leave
the bigger mess unmentioned.

> Implement a .type_uint64 visitor callback, reusing the existing
> parse_str() code through a new argument, using strtoull().

I'm afraid you're leaving the bug in the visitor core unfixed.

The (essentially undocumented) Visitor abstraction has the following
methods for integers:

* Mandatory: type_int()

  Interface uses int64_t for the value.  The implementation should
  ensure it fits into int64_t.

* Optional: type_int{8,16,32}()

  These use int{8,16,32}_t for the value.

  If present, it should ensure the value fits into the data type.

  If missing, the core falls back to type_int() plus appropriate range
  checking.

* Optional: type_int64()

  Same interface as type_int().

  If present, it should ensure the value fits into int64_t.

  If missing, the core falls back to type_int().

  Aside: setting type_int64() would be useful only when you want to
  distinguish QAPI types int and int64.  So far, nobody does.  In fact,
  nobody uses QAPI type int64!  I'm tempted to define QAPI type int as a
  mere alias for int64 and drop the redundant stuff.

* Optional: type_uint{8,16,32}()

  These use uint{8,16,32}_t for the value.

  If present, it should ensure the value fits into the data type.

  If missing, the core falls back to type_int() plus appropriate range
  checking.

* Optional: type_uint64()

  Now it gets interesting.  Interface uses uint64_t for the value.

  If present, it should ensure the value fits into uint64_t.

  If missing, the core falls back to type_int().  No range checking.  If
  type_int() performs range checking as it should, then uint64_t values
  not representable in int64_t get rejected (wrong), and negative values
  representable in int64_t get cast to uint64_t (also wrong).

  I think we need to make type_uint64() mandatory, and drop the
  fallback.

* Optional: type_size()

  Same interface as type_uint64().

  If present, it should ensure the value fits into uint64_t.

  If missing, the core first tries falling back to type_uint64() and
  then to type_int().  Falling back to type_int() is as wrong here as it
  is in type_uint64().
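
The two failure modes of the type_uint64() fallback described above can be demonstrated with a toy model. This is a simplified sketch, not the real Visitor interface: ToyVisitor and all toy_* functions are hypothetical and take a string directly instead of going through the visitor core.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct ToyVisitor {
    int (*type_int)(const char *s, int64_t *out);     /* mandatory */
    int (*type_uint64)(const char *s, uint64_t *out); /* optional */
} ToyVisitor;

/* Range-checking signed parser, as type_int() should be. */
static int toy_type_int(const char *s, int64_t *out)
{
    char *end;
    long long v;

    errno = 0;
    v = strtoll(s, &end, 0);
    if (end == s || *end != '\0') {
        return -EINVAL;
    }
    if (errno == ERANGE) {
        return -ERANGE;
    }
    *out = v;
    return 0;
}

/* Dedicated unsigned parser that refuses negative input. */
static int toy_type_uint64(const char *s, uint64_t *out)
{
    const char *p = s;
    char *end;
    unsigned long long v;

    while (isspace((unsigned char)*p)) {
        p++;
    }
    if (*p == '-') {
        return -EINVAL;
    }
    errno = 0;
    v = strtoull(s, &end, 0);
    if (end == s || *end != '\0') {
        return -EINVAL;
    }
    if (errno == ERANGE) {
        return -ERANGE;
    }
    *out = v;
    return 0;
}

/* Core dispatch that mimics the fallback: no range check after type_int(). */
static int toy_visit_uint64(const ToyVisitor *v, const char *s, uint64_t *out)
{
    int64_t tmp;
    int ret;

    if (v->type_uint64) {
        return v->type_uint64(s, out);
    }
    ret = v->type_int(s, &tmp);
    if (ret == 0) {
        *out = (uint64_t)tmp; /* negative values wrap around */
    }
    return ret;
}
```

With only type_int() set, "9223372036854775808" is rejected although it fits in uint64_t, and "-1" is accepted as UINT64_MAX. With type_uint64() set, both cases come out right.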

> As a bug fix, ignore warnings about preference of qemu_s

Re: [Qemu-devel] [PATCH v4 46/47] ivshmem: use kvm irqfd for msi notifications

2015-09-30 Thread Claudio Fontana
On 24.09.2015 13:37, marcandre.lur...@redhat.com wrote:
> From: Marc-André Lureau 
> 
> Use irqfd for improving context switch when notifying the guest.
> If the host doesn't support kvm irqfd, regular msi notifications are
> still supported.
> 
> Note: the ivshmem implementation doesn't allow switching between MSI and
> IO interrupts, this patch doesn't either.
> 
> Signed-off-by: Marc-André Lureau 

Paolo could you also take a look at this one?
Seems to work, but I am not familiar with the kvm msi irqfd primitives.
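
For readers in the same position: as far as I know, QEMU's EventNotifier is backed by an eventfd on Linux, and the poll callback in the patch below relies on a "test and clear" of that descriptor. The following stand-alone sketch shows only that eventfd behaviour; it is Linux-specific, uses no QEMU or KVM API, and the notifier_* names are made up.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Create a non-blocking eventfd with an initial count of zero. */
static int notifier_create(void)
{
    return eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC);
}

/* Signal the notifier: adds 1 to the kernel-side counter. */
static int notifier_signal(int fd)
{
    uint64_t one = 1;

    if (write(fd, &one, sizeof(one)) != (ssize_t)sizeof(one)) {
        return -errno;
    }
    return 0;
}

/*
 * Returns true if at least one signal was pending.  Reading a
 * non-semaphore eventfd returns the counter and resets it to zero,
 * so several signals collapse into a single "pending" result.
 */
static bool notifier_test_and_clear(int fd)
{
    uint64_t count;
    ssize_t n;

    do {
        n = read(fd, &count, sizeof(count));
    } while (n < 0 && errno == EINTR);

    return n == (ssize_t)sizeof(count);
}
```

In the patch, a vector that is masked keeps its events queued in the eventfd, and the poll callback turns a successful test-and-clear into a pending MSI-X bit.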

One comment below.

> ---
>  hw/misc/ivshmem.c | 175 
> --
>  1 file changed, 169 insertions(+), 6 deletions(-)
> 
> diff --git a/hw/misc/ivshmem.c b/hw/misc/ivshmem.c
> index 73644cc..39c0791 100644
> --- a/hw/misc/ivshmem.c
> +++ b/hw/misc/ivshmem.c
> @@ -19,6 +19,7 @@
>  #include "hw/hw.h"
>  #include "hw/i386/pc.h"
>  #include "hw/pci/pci.h"
> +#include "hw/pci/msi.h"
>  #include "hw/pci/msix.h"
>  #include "sysemu/kvm.h"
>  #include "migration/migration.h"
> @@ -68,6 +69,7 @@ typedef struct Peer {
>  
>  typedef struct MSIVector {
>  PCIDevice *pdev;
> +int virq;
>  } MSIVector;
>  
>  typedef struct IVShmemState {
> @@ -293,13 +295,73 @@ static void fake_irqfd(void *opaque, const uint8_t *buf, int size) {
>  msix_notify(pdev, vector);
>  }
>  
> +static int ivshmem_vector_unmask(PCIDevice *dev, unsigned vector,
> + MSIMessage msg)
> +{
> +IVShmemState *s = IVSHMEM(dev);
> +EventNotifier *n = &s->peers[s->vm_id].eventfds[vector];
> +MSIVector *v = &s->msi_vectors[vector];
> +int ret;
> +
> +IVSHMEM_DPRINTF("vector unmask %p %d\n", dev, vector);
> +
> +ret = kvm_irqchip_update_msi_route(kvm_state, v->virq, msg);
> +if (ret < 0) {
> +return ret;
> +}
> +
> +return kvm_irqchip_add_irqfd_notifier_gsi(kvm_state, n, NULL, v->virq);
> +}
> +
> +static void ivshmem_vector_mask(PCIDevice *dev, unsigned vector)
> +{
> +IVShmemState *s = IVSHMEM(dev);
> +EventNotifier *n = &s->peers[s->vm_id].eventfds[vector];
> +int ret;
> +
> +IVSHMEM_DPRINTF("vector mask %p %d\n", dev, vector);
> +
> +ret = kvm_irqchip_remove_irqfd_notifier_gsi(kvm_state, n,
> +s->msi_vectors[vector].virq);
> +if (ret != 0) {
> +error_report("remove_irqfd_notifier_gsi failed");
> +}
> +}
> +
> +static void ivshmem_vector_poll(PCIDevice *dev,
> +unsigned int vector_start,
> +unsigned int vector_end)
> +{
> +IVShmemState *s = IVSHMEM(dev);
> +unsigned int vector;
> +
> +IVSHMEM_DPRINTF("vector poll %p %d-%d\n", dev, vector_start, vector_end);
> +
> +vector_end = MIN(vector_end, s->vectors);
> +
> +for (vector = vector_start; vector < vector_end; vector++) {
> +EventNotifier *notifier = &s->peers[s->vm_id].eventfds[vector];
> +
> +if (!msix_is_masked(dev, vector)) {
> +continue;
> +}
> +
> +if (event_notifier_test_and_clear(notifier)) {
> +msix_set_pending(dev, vector);
> +}
> +}
> +}
> +
>  static CharDriverState* create_eventfd_chr_device(void * opaque, EventNotifier *n,
>int vector)
>  {
>  /* create a event character device based on the passed eventfd */
>  IVShmemState *s = opaque;
> -CharDriverState * chr;
> +PCIDevice *pdev = PCI_DEVICE(s);
>  int eventfd = event_notifier_get_fd(n);
> +CharDriverState *chr;
> +
> +s->msi_vectors[vector].pdev = pdev;
>  
>  chr = qemu_chr_open_eventfd(eventfd);
>  
> @@ -484,6 +546,53 @@ static bool fifo_update_and_get(IVShmemState *s, const uint8_t *buf, int size,
>  return true;
>  }
>  
> +static int ivshmem_add_kvm_msi_virq(IVShmemState *s, int vector)
> +{
> +PCIDevice *pdev = PCI_DEVICE(s);
> +MSIMessage msg = msix_get_message(pdev, vector);
> +int ret;
> +
> +IVSHMEM_DPRINTF("ivshmem_add_kvm_msi_virq vector:%d\n", vector);
> +
> +if (s->msi_vectors[vector].pdev != NULL) {
> +return 0;
> +}
> +
> +ret = kvm_irqchip_add_msi_route(kvm_state, msg); /*  */
> +if (ret < 0) {
> +error_report("ivshmem: kvm_irqchip_add_msi_route failed");
> +return -1;
> +}
> +
> +s->msi_vectors[vector].virq = ret;
> +s->msi_vectors[vector].pdev = pdev;
> +
> +return 0;
> +}
> +
> +static void setup_interrupt(IVShmemState *s, int vector)
> +{
> +EventNotifier *n = &s->peers[s->vm_id].eventfds[vector];
> +bool with_irqfd = kvm_msi_via_irqfd_enabled() &&
> +ivshmem_has_feature(s, IVSHMEM_MSI);
> +PCIDevice *pdev = PCI_DEVICE(s);
> +
> +IVSHMEM_DPRINTF("setting up interrupt for vector: %d\n", vector);
> +
> +if (!with_irqfd) {
> +s->eventfd_chr[vector] = create_eventfd_chr_device(s, n, vector);
> +} else if (msix_enabled(pdev)) {
> +   

Re: [Qemu-devel] Loading image/elf to cpu that has different not system memory address space

2015-09-30 Thread Marcin Krzemiński
2015-09-30 12:44 GMT+02:00 Peter Maydell :

> On 30 September 2015 at 06:18, Marcin Krzemiński
>  wrote:
> > I have at 0xfff0 real memory now (with aliasing to lower memory
> > address).
> > Does it mean that qemu might try to execute some instructions from there?
>
> As I say, we need there to be fake RAM at that address. We never
> try to read its contents, though.
>
> -- PMM
>

That wasn't clear to me.
Since I have real, in-use memory there in my model, I worried that I might
sometimes get unexpected behavior.

Thanks,
Marcin


Re: [Qemu-devel] [PATCH v4 4/5] acpi: arm: add fw_cfg device node to dsdt

2015-09-30 Thread Laszlo Ersek
On 09/30/15 13:13, Peter Maydell wrote:
> On 30 September 2015 at 11:21, Laszlo Ersek  wrote:
>> However: if Gabriel has no access to actual aarch64 hardware (ie. cannot
>> run KVM guests), then I don't think he should bother. Booting just the
>> UEFI firmware on qemu-system-aarch64 with TCG acceleration is fine, but
>> for checking "/proc/iomem", he'd really need to boot into guest Linux,
>> and *that* takes absolutely forever with TCG.
> 
> If it actually takes forever that's a bug of some sort I think.
> TCG isn't all that snappy but it shouldn't take more than a few
> minutes to boot and it should be at least usably responsive on
> the command line once you get there. (Best not to try to boot
> into a GUI, though.)

Yes, TCG is fast, relative to the feat it pulls off, but in absolute
terms, even those minutes to boot are annoying when you repeatedly test
something in the guest.

Here's a timing from my new company laptop (Thinkpad W541, i7-4810MQ CPU
@ 2.80GHz, running docked); QEMU built with --enable-debug:

(1) From starting the guest until the EFI stub of the kernel launches:
omitted (we're not timing the firmware, as it is not universally
necessary for testing)

(2) From launching the EFI stub until the login prompt appears on the
serial console: 3 minutes 46 seconds

(3) After logging in super fast, the time it takes to get a shell
prompt: 50 seconds

(4) The time it takes for background services to quiesce (= for QEMU to
stop spinning) while waiting idly at the shell prompt (because it makes
no sense to issue commands earlier): 1 minute 19 seconds

(5) Once the guest quiesces, shutting it down with "poweroff": 1 minute
36 seconds.

In total, 7 minutes 31 seconds for a test cycle (not counting the
firmware), without running any actual commands in the guest.

Again, it depends on the services that are enabled in systemd, but you
usually want to test with a guest OS that users normally run.

(I realize step (5) can be avoided if you have a qcow2 snapshot -- just
kill the guest when you're done, and revert the image to the snapshot
before next boot; hopefully new guest files are not important. I also
agree the first investment in a TCG guest should be heavily trimming its
services.)

So -- there's no bug, but TCG does not appear very suitable for testing
in guest userspace *now*.

... This is not to diminish TCG's general brilliance, and usefulness in
certain situations. I haven't forgotten that aarch64 emulation in TCG
was a long awaited godsend after the Foundation Model!

Still: Gabriel, how do you feel about buying a 96Boards EE (when it
becomes available)? :)

https://www.96boards.org/products/ee/
http://community.arm.com/people/jeffunderhill/status/9831
https://community.amd.com/community/amd-business/blog/2015/06/23/extending-arm-s-ecosystem-for-server-developers

Thanks
Laszlo



Re: [Qemu-devel] [PATCH repost 4/4] exec: factor out duplicate mmap code

2015-09-30 Thread Marc-André Lureau
Hi

On Sun, Sep 27, 2015 at 12:14 PM, Michael S. Tsirkin  wrote:
> Anonymous and file-backed RAM allocation are now almost exactly the same.
>
> Reduce code duplication by moving RAM mmap code out of oslib-posix.c and
> exec.c.
>
> Signed-off-by: Michael S. Tsirkin 

This patch is failing vhost-user-test:

x86_64/vhost-user/read-guest-mem: **
ERROR:tests/vhost-user-test.c:248:read_guest_mem: assertion failed (a
== b): (4026597203 == 0)


> ---
>  include/qemu/mmap-alloc.h | 10 +
>  exec.c| 47 +-
>  util/mmap-alloc.c | 52 
> +++
>  util/oslib-posix.c| 28 -
>  util/Makefile.objs|  2 +-
>  5 files changed, 77 insertions(+), 62 deletions(-)
>  create mode 100644 include/qemu/mmap-alloc.h
>  create mode 100644 util/mmap-alloc.c
>
> diff --git a/include/qemu/mmap-alloc.h b/include/qemu/mmap-alloc.h
> new file mode 100644
> index 000..3400e14
> --- /dev/null
> +++ b/include/qemu/mmap-alloc.h
> @@ -0,0 +1,10 @@
> +#ifndef QEMU_MMAP_ALLOC
> +#define QEMU_MMAP_ALLOC
> +
> +#include "qemu-common.h"
> +
> +void *qemu_ram_mmap(int fd, size_t size, size_t align);
> +
> +void qemu_ram_munmap(void *ptr, size_t size);
> +
> +#endif
> diff --git a/exec.c b/exec.c
> index 7d90a52..437634b 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -55,6 +55,9 @@
>  #include "exec/ram_addr.h"
>
>  #include "qemu/range.h"
> +#ifndef _WIN32
> +#include "qemu/mmap-alloc.h"
> +#endif
>
>  //#define DEBUG_SUBPAGE
>
> @@ -84,9 +87,9 @@ static MemoryRegion io_mem_unassigned;
>   */
>  #define RAM_RESIZEABLE (1 << 2)
>
> -/* An extra page is mapped on top of this RAM.
> +/* RAM is backed by an mmapped file.
>   */
> -#define RAM_EXTRA (1 << 3)
> +#define RAM_FILE (1 << 3)
>  #endif
>
>  struct CPUTailQ cpus = QTAILQ_HEAD_INITIALIZER(cpus);
> @@ -1188,13 +1191,10 @@ static void *file_ram_alloc(RAMBlock *block,
>  char *filename;
>  char *sanitized_name;
>  char *c;
> -void *ptr;
> -void *area = NULL;
> +void *area;
>  int fd;
>  uint64_t hpagesize;
> -uint64_t total;
>  Error *local_err = NULL;
> -size_t offset;
>
>  hpagesize = gethugepagesize(path, &local_err);
>  if (local_err) {
> @@ -1238,7 +1238,6 @@ static void *file_ram_alloc(RAMBlock *block,
>  g_free(filename);
>
>  memory = ROUND_UP(memory, hpagesize);
> -total = memory + hpagesize;
>
>  /*
>   * ftruncate is not supported by hugetlbfs in older
> @@ -1250,40 +1249,14 @@ static void *file_ram_alloc(RAMBlock *block,
>  perror("ftruncate");
>  }
>
> -ptr = mmap(0, total, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS,
> --1, 0);
> -if (ptr == MAP_FAILED) {
> -error_setg_errno(errp, errno,
> - "unable to allocate memory range for hugepages");
> -close(fd);
> -goto error;
> -}
> -
> -offset = QEMU_ALIGN_UP((uintptr_t)ptr, hpagesize) - (uintptr_t)ptr;
> -
> -area = mmap(ptr + offset, memory, PROT_READ | PROT_WRITE,
> -(block->flags & RAM_SHARED ? MAP_SHARED : MAP_PRIVATE) |
> -MAP_FIXED,
> -fd, 0);
> +area = qemu_ram_mmap(fd, memory, hpagesize);
>  if (area == MAP_FAILED) {
>  error_setg_errno(errp, errno,
>   "unable to map backing store for hugepages");
> -munmap(ptr, total);
>  close(fd);
>  goto error;
>  }
>
> -if (offset > 0) {
> -munmap(ptr, offset);
> -}
> -ptr += offset;
> -total -= offset;
> -
> -if (total > memory + getpagesize()) {
> -munmap(ptr + memory + getpagesize(),
> -   total - memory - getpagesize());
> -}
> -
>  if (mem_prealloc) {
>  os_mem_prealloc(fd, area, memory);
>  }
> @@ -1601,7 +1574,7 @@ ram_addr_t qemu_ram_alloc_from_file(ram_addr_t size, MemoryRegion *mr,
>  new_block->used_length = size;
>  new_block->max_length = size;
>  new_block->flags = share ? RAM_SHARED : 0;
> -new_block->flags |= RAM_EXTRA;
> +new_block->flags |= RAM_FILE;
>  new_block->host = file_ram_alloc(new_block, size,
>   mem_path, errp);
>  if (!new_block->host) {
> @@ -1703,8 +1676,8 @@ static void reclaim_ramblock(RAMBlock *block)
>  xen_invalidate_map_cache_entry(block->host);
>  #ifndef _WIN32
>  } else if (block->fd >= 0) {
> -if (block->flags & RAM_EXTRA) {
> -munmap(block->host, block->max_length + getpagesize());
> +if (block->flags & RAM_FILE) {
> +qemu_ram_munmap(block->host, block->max_length);
>  } else {
>  munmap(block->host, block->max_length);
>  }
> diff --git a/util/mmap-alloc.c b/util/mmap-alloc.c
> new file mode 100644
> index 000..05c8b4b
> --- /dev/null
> +++ b/util/mmap-alloc.c
> @@ -0,0 +1,52 @@
> +/*
> + * Support for RAM backe

Re: [Qemu-devel] [RFC/PATCH] monitor/ppc: Access all SPRs from the monitor

2015-09-30 Thread Peter Maydell
On 27 September 2015 at 07:31, Benjamin Herrenschmidt
 wrote:
> We already have a table with all supported SPRs along with their names,
> so let's use that rather than a duplicate table that is perpetually
> out of sync in the monitor code.
>
> This adds a new monitor hook target_extra_monitor_def() which is called
> if nothing is found is the normal table. We still use the old mechanism
> for anything that isn't an SPR.
>
> Signed-off-by: Benjamin Herrenschmidt 
> ---
>  include/monitor/hmp-target.h |  1 +
>  monitor.c|  8 +++-
>  stubs/Makefile.objs  |  1 +
>  stubs/target-extra-monitor-def.c | 10 +
>  target-ppc/monitor.c | 93 
> +---
>  5 files changed, 39 insertions(+), 74 deletions(-)
>  create mode 100644 stubs/target-extra-monitor-def.c
>
> diff --git a/include/monitor/hmp-target.h b/include/monitor/hmp-target.h
> index 213566c..b946e32 100644
> --- a/include/monitor/hmp-target.h
> +++ b/include/monitor/hmp-target.h
> @@ -35,6 +35,7 @@ struct MonitorDef {
>  };
>
>  const MonitorDef *target_monitor_defs(void);
> +int target_extra_monitor_def(uint64_t *pval, const char *name);

This would be a good place to put a doc comment documenting
the semantics of this new hook.
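
As an illustration of what such a doc comment could say, here is a sketch using a toy lookup function. The wording of the contract is only a guess at the intended semantics (the stub in the patch returns -1); toy_extra_monitor_def, ToyReg and the table contents are made up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    const char *name;
    uint64_t value;
} ToyReg;

/* Made-up register values, for demonstration only. */
static const ToyReg toy_regs[] = {
    { "srr0", 0x1000 },
    { "srr1", 0x2000 },
};

/**
 * toy_extra_monitor_def:
 * @pval: on success, receives the value of the register
 * @name: register name as typed in the monitor
 *
 * Called only after @name was not found in the regular table.
 *
 * Returns: 0 if @name was recognised and *@pval was set, a negative
 * value otherwise, in which case *@pval is left untouched.
 */
int toy_extra_monitor_def(uint64_t *pval, const char *name);

int toy_extra_monitor_def(uint64_t *pval, const char *name)
{
    size_t i;

    for (i = 0; i < sizeof(toy_regs) / sizeof(toy_regs[0]); i++) {
        if (strcmp(toy_regs[i].name, name) == 0) {
            *pval = toy_regs[i].value;
            return 0;
        }
    }
    return -1;
}
```

Spelling out whether *pval may be clobbered on failure is the kind of detail callers otherwise have to find out by reading every implementation.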

MonitorDef structs treat the value to be obtained as
a target_long, but this uses uint64_t, which is a bit
inconsistent.

It might be better to:
 (a) fix the core monitor code to deal in int64_t rather
 than target_long
 (b) consider whether it would be better to have the ppc
 code generate a bunch of MonitorDef structs to return for the
 SPRs rather than having an extra hook function

> --- /dev/null
> +++ b/stubs/target-extra-monitor-def.c
> @@ -0,0 +1,10 @@
> +#include "stddef.h"
> +#include "qemu/typedefs.h"
> +#include 
> +
> +int target_extra_monitor_def(uint64_t *pval, const char *name);
> +
> +int target_extra_monitor_def(uint64_t *pval, const char *name)
> +{
> +return -1;
> +}

It would be better to put the prototype for the hook somewhere
the stub file can include it rather than having it just rewritten
here.

thanks
-- PMM



Re: [Qemu-devel] [PATCH v4 40/47] tests: add ivshmem qtest

2015-09-30 Thread Claudio Fontana
On 24.09.2015 13:37, marcandre.lur...@redhat.com wrote:
> From: Marc-André Lureau 
> 
> Adds 4 ivshmem tests:
> - single qemu instance and basic IO
> - pair of instances, check memory sharing
> - pair of instances with server, and MSIX
> - hot plug/unplug
> 
> A temporary shm is created, as well as a directory in which to place the
> server socket; both should be cleaned up on exit and abort.
> 
> Cc: Cam Macdonell 
> CC: Andreas Färber 
> Signed-off-by: Marc-André Lureau 

Three comments below, otherwise fine.

> ---
>  tests/Makefile   |   3 +
>  tests/ivshmem-test.c | 481 
> +++
>  2 files changed, 484 insertions(+)
>  create mode 100644 tests/ivshmem-test.c
> 
> diff --git a/tests/Makefile b/tests/Makefile
> index 4063639..7e6ac43 100644
> --- a/tests/Makefile
> +++ b/tests/Makefile
> @@ -146,6 +146,8 @@ gcov-files-pci-y += hw/display/virtio-gpu-pci.c
>  gcov-files-pci-$(CONFIG_VIRTIO_VGA) += hw/display/virtio-vga.c
>  check-qtest-pci-y += tests/intel-hda-test$(EXESUF)
>  gcov-files-pci-y += hw/audio/intel-hda.c hw/audio/hda-codec.c
> +check-qtest-pci-$(CONFIG_LINUX) += tests/ivshmem-test$(EXESUF)
> +gcov-files-pci-y += hw/misc/ivshmem.c
>  
>  check-qtest-i386-y = tests/endianness-test$(EXESUF)
>  check-qtest-i386-y += tests/fdc-test$(EXESUF)
> @@ -435,6 +437,7 @@ tests/vhost-user-test$(EXESUF): tests/vhost-user-test.o qemu-char.o qemu-timer.o
>  tests/qemu-iotests/socket_scm_helper$(EXESUF): tests/qemu-iotests/socket_scm_helper.o
>  tests/test-qemu-opts$(EXESUF): tests/test-qemu-opts.o $(test-util-obj-y)
>  tests/test-write-threshold$(EXESUF): tests/test-write-threshold.o $(test-block-obj-y)
> +tests/ivshmem-test$(EXESUF): tests/ivshmem-test.o contrib/ivshmem-server/ivshmem-server.o $(libqos-pc-obj-y)
>  
>  ifeq ($(CONFIG_POSIX),y)
>  LIBS += -lutil
> diff --git a/tests/ivshmem-test.c b/tests/ivshmem-test.c
> new file mode 100644
> index 000..097de15
> --- /dev/null
> +++ b/tests/ivshmem-test.c
> @@ -0,0 +1,481 @@
> +/*
> + * QTest testcase for ivshmem
> + *
> + * Copyright (c) 2015 Red Hat, Inc.
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include "contrib/ivshmem-server/ivshmem-server.h"
> +#include "libqos/pci-pc.h"
> +#include "libqtest.h"
> +#include "qemu/osdep.h"
> +#include 
> +
> +#if GLIB_CHECK_VERSION(2, 32, 0)
> +#define HAVE_THREAD_NEW
> +#endif
> +
> +#define TMPSHMSIZE (1 << 20)
> +static char *tmpshm;
> +static void *tmpshmem;
> +static char *tmpdir;
> +static char *tmpserver;
> +
> +static void save_fn(QPCIDevice *dev, int devfn, void *data)
> +{
> +QPCIDevice **pdev = (QPCIDevice **) data;
> +
> +*pdev = dev;
> +}
> +
> +static QPCIDevice *get_device(void)
> +{
> +QPCIDevice *dev;
> +QPCIBus *pcibus;
> +
> +pcibus = qpci_init_pc();
> +qpci_device_foreach(pcibus, 0x1af4, 0x1110, save_fn, &dev);
> +g_assert(dev != NULL);
> +
> +return dev;
> +}
> +
> +typedef struct _IVState {
> +QTestState *qtest;
> +void *reg_base, *mem_base;
> +QPCIDevice *dev;
> +} IVState;
> +
> +enum Reg {
> +INTRMASK = 0,
> +INTRSTATUS = 4,
> +IVPOSITION = 8,
> +DOORBELL = 12,
> +};
> +
> +static const char* reg2str(enum Reg reg) {
> +switch (reg) {
> +case INTRMASK:
> +return "IntrMask";
> +case INTRSTATUS:
> +return "IntrStatus";
> +case IVPOSITION:
> +return "IVPosition";
> +case DOORBELL:
> +return "DoorBell";
> +default:
> +return NULL;
> +}
> +}
> +
> +static inline unsigned in_reg(IVState *s, enum Reg reg)
> +{
> +const char *name = reg2str(reg);
> +QTestState *qtest = global_qtest;
> +unsigned res;
> +
> +global_qtest = s->qtest;
> +res = qpci_io_readl(s->dev, s->reg_base + reg);
> +g_test_message("*%s -> %x\n", name, res);
> +global_qtest = qtest;
> +
> +return res;
> +}
> +
> +static inline void out_reg(IVState *s, enum Reg reg, unsigned v)
> +{
> +const char *name = reg2str(reg);
> +QTestState *qtest = global_qtest;
> +
> +global_qtest = s->qtest;
> +g_test_message("%x -> *%s\n", v, name);
> +qpci_io_writel(s->dev, s->reg_base + reg, v);
> +global_qtest = qtest;
> +}
> +
> +static void setup_vm_cmd(IVState *s, const char *cmd, bool msix)
> +{
> +uint64_t barsize;
> +
> +s->qtest = qtest_start(cmd);
> +
> +s->dev = get_device();
> +
> +/* FIXME: other bar order fails, mappings changes */
> +s->mem_base = qpci_iomap(s->dev, 2, &barsize);
> +g_assert_nonnull(s->mem_base);

I get an error on this one: g_assert_nonnull() was only introduced in glib 2.40.
What about g_assert(s->mem_base != NULL)?
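
For illustration, the fallback could also be spelled as a small compatibility macro, so the call site stays unchanged on older glib. This is only a sketch: it uses plain assert() instead of g_assert() so that it compiles without the GLib headers, and check_mapping() is a made-up helper standing in for the check on the qpci_iomap() result.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Assumption: GLib is not included here, so assert() stands in for
 * g_assert(). With GLib >= 2.40 the macro already exists and this
 * fallback is skipped.
 */
#ifndef g_assert_nonnull
#define g_assert_nonnull(expr) assert((expr) != NULL)
#endif

/* Hypothetical helper: aborts on a NULL mapping, returns 1 otherwise. */
int check_mapping(void *mem_base)
{
    g_assert_nonnull(mem_base);
    return 1;
}
```

Either form works; the direct g_assert(... != NULL) is the smaller change for a single call site.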

> +g_assert_cmpuint(barsize, ==, TMPSHMSIZE);
> +
> +if (msix) {
> +qpci_msix_enable(s->dev);
> +}
> +
> +s->reg_base = qpci_iomap(s->dev, 0, &bars

Re: [Qemu-devel] [PATCH v8 23/54] Add migration-capability boolean for postcopy-ram.

2015-09-30 Thread Eric Blake
On 09/30/2015 01:00 AM, Amit Shah wrote:

>> Reviewed-by: Eric Blake 
>>
>> I'm guessing the plan is to keep this experimental until a bit more
>> experience is gained, to make sure we aren't missing anything essential
>> in the use of postcopy.
> 
> From the cover letter:
> 
> I'm keeping the x-  for now, until the libvirt interface gets finalised.
> 
> I expect, though, that we'll merge this series in 2.5, and remove the
> x- before the 2.5 release.  My main concern of the Linux interface
> being not released in a stable release will be satisfied with the 4.3
> kernel release.
> 
> Any concerns from the libvirt side?

No, that should be fine. The libvirt side won't push the commit until
the x- is gone, but there's nothing stopping us from developing the
interface in parallel while x- is still present to prove that the design
will work.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org



Re: [Qemu-devel] [PATCH v5 24/38] blockdev: Pull out blockdev option extraction

2015-09-30 Thread Alberto Garcia
On Fri 18 Sep 2015 05:22:59 PM CEST, Max Reitz  wrote:
> Extract some of the blockdev option extraction code from blockdev_init()
> into its own function. This simplifies blockdev_init() and will allow
> reusing the code in a different function added in a follow-up patch.
>
> Signed-off-by: Max Reitz 

Reviewed-by: Alberto Garcia 

Berto



Re: [Qemu-devel] [PATCH v4 4/5] acpi: arm: add fw_cfg device node to dsdt

2015-09-30 Thread Peter Maydell
On 30 September 2015 at 11:21, Laszlo Ersek  wrote:
> However: if Gabriel has no access to actual aarch64 hardware (ie. cannot
> run KVM guests), then I don't think he should bother. Booting just the
> UEFI firmware on qemu-system-aarch64 with TCG acceleration is fine, but
> for checking "/proc/iomem", he'd really need to boot into guest Linux,
> and *that* takes absolutely forever with TCG.

If it actually takes forever that's a bug of some sort I think.
TCG isn't all that snappy but it shouldn't take more than a few
minutes to boot and it should be at least usably responsive on
the command line once you get there. (Best not to try to boot
into a GUI, though.)

-- PMM



Re: [Qemu-devel] feature idea: allow user to run custom scripts

2015-09-30 Thread Peter Maydell
On 30 September 2015 at 09:14, Dr. David Alan Gilbert
 wrote:
> * Markus Armbruster (arm...@redhat.com) wrote:
>> In my opinion, QEMU should leave them to separate GUI shells, because
>> doing everything in QEMU distracts from our core mission and we don't
>> have GUI expertise[*].  One more point: building in the GUI is
>> problematic when you don't trust the guest, because then you really want
>> to run QEMU with least privileges.
>
> Given that we have a built in GUI then I can see people wanting to expand
> it.

Right, but where do you draw the line? We clearly don't have the
active maintainer and review capacity to do anything serious with
"ui/" (MAINTAINERS lists everything except SPICE as Odd Fixes).

This is why I tend to agree with Markus' opinion here: we should
provide enough graphical UI to make raw QEMU minimally usable,
and leave further user-friendliness to other projects which have
more direct interest in that.

If we had more regular contributors who were actively interested
in improving our UI layer my opinion might be different.

thanks
-- PMM



Re: [Qemu-devel] [PATCH v8 00/54] Postcopy implementation

2015-09-30 Thread Bharata B Rao
On Mon, Sep 28, 2015 at 05:51:39PM +0100, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" 
> 
>   This is the 8th cut of my version of postcopy.
> 
> The userfaultfd linux kernel code is now in the upstream kernel
> tree, and so 4.3-rc3 can be used without modification.
> 
> This qemu series can be found at:
> https://github.com/orbitfp7/qemu.git
> on the wp3-postcopy-v8 tag
> 
> 
> Testing status:
>   * Tested heavily on x86
>   * Smoke tested on aarch64 (so it does work on different page sizes)
>   * Power is unhappy for me (but gets further than the htab problem
> v7 used to have) (I get a kvm run failed)

Seems to be completing successfully on Power. But it takes 2min for the
migration status to transition from setup to active.

Host: 4.3.0-rc3+
Guest: 4.3.0-rc3+
QEMU: wp3-postcopy-v8 of your tree.

# ./ppc64-softmmu/qemu-system-ppc64 --enable-kvm --nographic -machine pseries 
-m 8G,slots=32,maxmem=32G -device virtio-blk-pci,drive=rootdisk -drive 
file=/home/bharata/F20-snap1,if=none,cache=none,id=rootdisk,format=qcow2 -vga 
none -net nic,model=virtio -net user -redir tcp:2000::22 -smp 16,maxcpus=32 
-serial pty

(qemu) migrate_set_capability x-postcopy-ram on
(qemu) migrate -d tcp:localhost:
(qemu) info migrate
capabilities: xbzrle: off rdma-pin-all: off auto-converge: off zero-blocks: off 
compress: off events: off x-postcopy-ram: on 
Migration status: setup
total time: 0 milliseconds

same status for around 2min...

(qemu) info migrate
capabilities: xbzrle: off rdma-pin-all: off auto-converge: off zero-blocks: off 
compress: off events: off x-postcopy-ram: on 
Migration status: active
total time: 130089 milliseconds
expected downtime: 300 milliseconds
setup: 24 milliseconds
transferred ram: 79454 kbytes
throughput: 50.09 mbps
remaining ram: 7670688 kbytes
total ram: 8388864 kbytes
duplicate: 160684 pages
skipped: 0 pages
normal: 18860 pages
normal bytes: 75440 kbytes
dirty sync count: 1

(qemu) migrate_start_postcopy

(qemu) info migrate
capabilities: xbzrle: off rdma-pin-all: off auto-converge: off zero-blocks: off 
compress: off events: off x-postcopy-ram: on 
Migration status: postcopy-active
total time: 135627 milliseconds
expected downtime: 43 milliseconds
setup: 24 milliseconds
transferred ram: 338598 kbytes
throughput: 74.02 mbps
remaining ram: 1688384 kbytes
total ram: 8388864 kbytes
duplicate: 1600406 pages
skipped: 0 pages
normal: 75258 pages
normal bytes: 301032 kbytes
dirty sync count: 0
dirty pages rate: 98 pages

(qemu) info migrate
capabilities: xbzrle: off rdma-pin-all: off auto-converge: off zero-blocks: off 
compress: off events: off x-postcopy-ram: on 
Migration status: completed
total time: 136898 milliseconds
downtime: 685 milliseconds
setup: 24 milliseconds
transferred ram: 1194196 kbytes
throughput: 72.00 mbps
remaining ram: 0 kbytes
total ram: 8388864 kbytes
duplicate: 1810921 pages
skipped: 0 pages
normal: 286839 pages
normal bytes: 1147356 kbytes
dirty sync count: 2




Re: [Qemu-devel] [PATCH v4 4/5] acpi: arm: add fw_cfg device node to dsdt

2015-09-30 Thread Laszlo Ersek
test results from an aarch64 Linux guest (using KVM and UEFI):

On 09/29/15 12:40, Laszlo Ersek wrote:
> On 09/27/15 23:29, Gabriel L. Somlo wrote:
>> Add a fw_cfg device node to the ACPI DSDT. This is mostly
>> informational, as the authoritative fw_cfg MMIO region(s)
>> are listed in the Device Tree. However, since we are building
>> ACPI tables, we might as well be thorough while at it...
>>
>> Signed-off-by: Gabriel Somlo 
>> ---
>>  hw/arm/virt-acpi-build.c | 15 +++
>>  1 file changed, 15 insertions(+)
>>
>> diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
>> index 1aaff1f..f314132 100644
>> --- a/hw/arm/virt-acpi-build.c
>> +++ b/hw/arm/virt-acpi-build.c
>> @@ -110,6 +110,20 @@ static void acpi_dsdt_add_rtc(Aml *scope, const 
>> MemMapEntry *rtc_memmap,
>>  aml_append(scope, dev);
>>  }
>>  
>> +static void acpi_dsdt_add_fw_cfg(Aml *scope, const MemMapEntry 
>> *fw_cfg_memmap)
>> +{
>> +Aml *dev = aml_device("FWCF");
>> +aml_append(dev, aml_name_decl("_HID", aml_string("QEMU0002")));
>> +/* device present, functioning, decoding, not shown in UI */
>> +aml_append(dev, aml_name_decl("_STA", aml_int(0xB)));
>> +
>> +Aml *crs = aml_resource_template();
>> +aml_append(crs, aml_memory32_fixed(fw_cfg_memmap->base,
>> +   fw_cfg_memmap->size, 
>> AML_READ_WRITE));
>> +aml_append(dev, aml_name_decl("_CRS", crs));
>> +aml_append(scope, dev);
>> +}
>> +
>>  static void acpi_dsdt_add_flash(Aml *scope, const MemMapEntry *flash_memmap)
>>  {
>>  Aml *dev, *crs;
>> @@ -529,6 +543,7 @@ build_dsdt(GArray *table_data, GArray *linker, 
>> VirtGuestInfo *guest_info)
>> (irqmap[VIRT_UART] + ARM_SPI_BASE));
>>  acpi_dsdt_add_rtc(scope, &memmap[VIRT_RTC],
>>(irqmap[VIRT_RTC] + ARM_SPI_BASE));
>> +acpi_dsdt_add_fw_cfg(scope, &memmap[VIRT_FW_CFG]);
>>  acpi_dsdt_add_flash(scope, &memmap[VIRT_FLASH]);
>>  acpi_dsdt_add_virtio(scope, &memmap[VIRT_MMIO],
>>  (irqmap[VIRT_MMIO] + ARM_SPI_BASE), 
>> NUM_VIRTIO_TRANSPORTS);
>>
> 
> Looks sane to me.
> 
> Did you test this with an aarch64 Linux guest (acpidump -b; iasl -d;

So I dumped and decompiled the DSDT, and the relevant output is:

> Device (FWCF)
> {
> Name (_HID, "QEMU0002")  // _HID: Hardware ID
> Name (_STA, 0x0B)  // _STA: Status
> Name (_CRS, ResourceTemplate ()  // _CRS: Current Resource 
> Settings
> {
> Memory32Fixed (ReadWrite,
> 0x0902, // Address Base
> 0x000A, // Address Length
> )
> })
> }

This is correct -- the fw_cfg MMIO register block is correctly described by the 
above. (The actual size will change once Marc's fw_cfg-DMA series is merged, 
but that will be reflected by this patch automatically.)
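
As a side note on the _STA value of 0x0B in the decompiled output, the bits can be checked mechanically. The sketch below uses the bit positions from the ACPI specification; the macro and function names are made up for illustration and are not QEMU or kernel identifiers.

```c
#include <stdbool.h>
#include <stdint.h>

/* _STA bit positions per the ACPI specification (names are hypothetical). */
#define STA_PRESENT      (1u << 0)  /* device is present */
#define STA_ENABLED      (1u << 1)  /* enabled and decoding its resources */
#define STA_SHOWN_IN_UI  (1u << 2)  /* should be shown in the UI */
#define STA_FUNCTIONING  (1u << 3)  /* functioning properly */

/* Return true if the given flag is set in an _STA value. */
bool sta_has(uint32_t sta, uint32_t flag)
{
    return (sta & flag) != 0;
}
```

With 0x0B (binary 1011), bits 0, 1 and 3 are set and bit 2 is clear, which matches the "present, functioning, decoding, not shown in UI" comment in the patch.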

Second,

> cat
> /proc/iomem?) I can help with that, if you'd like.
> 
> Reviewed-by: Laszlo Ersek 

this is the contents of /proc/iomem:

> -03ff : LNRO0015:00
> 0400-07ff : LNRO0015:01
> 0900-09000fff : ARMH0011:00
>   0900-09000fff : ARMH0011:00
> 0901-09010fff : LNRO0013:00
> 0902-09020009 : QEMU0002:00 < see it here (it's inclusive)
> 0a00-0a0001ff : LNRO0005:00
> 0a000200-0a0003ff : LNRO0005:01
> 0a000400-0a0005ff : LNRO0005:02
> 0a000600-0a0007ff : LNRO0005:03
> 0a000800-0a0009ff : LNRO0005:04
> 0a000a00-0a000bff : LNRO0005:05
> 0a000c00-0a000dff : LNRO0005:06
> 0a000e00-0a000fff : LNRO0005:07
> 0a001000-0a0011ff : LNRO0005:08
> 0a001200-0a0013ff : LNRO0005:09
> 0a001400-0a0015ff : LNRO0005:0a
> 0a001600-0a0017ff : LNRO0005:0b
> 0a001800-0a0019ff : LNRO0005:0c
> 0a001a00-0a001bff : LNRO0005:0d
> 0a001c00-0a001dff : LNRO0005:0e
> 0a001e00-0a001fff : LNRO0005:0f
> 0a002000-0a0021ff : LNRO0005:10
> 0a002200-0a0023ff : LNRO0005:11
> 0a002400-0a0025ff : LNRO0005:12
> 0a002600-0a0027ff : LNRO0005:13
> 0a002800-0a0029ff : LNRO0005:14
> 0a002a00-0a002bff : LNRO0005:15
> 0a002c00-0a002dff : LNRO0005:16
> 0a002e00-0a002fff : LNRO0005:17
> 0a003000-0a0031ff : LNRO0005:18
> 0a003200-0a0033ff : LNRO0005:19
> 0a003400-0a0035ff : LNRO0005:1a
> 0a003600-0a0037ff : LNRO0005:1b
> 0a003800-0a0039ff : LNRO0005:1c
> 0a003a00-0a003bff : LNRO0005:1d
> 0a003c00-0a003dff : LNRO0005:1e
>   0a003c00-0a003dff : LNRO0005:1e
> 0a003e00-0a003fff : LNRO0005:1f
>   0a003e00-0a003fff : LNRO0005:1f
> 1000-3efe : PCI Bus :00
> 3f00-3fff : PCI MMCONFIG  [bus 00-0f]
> 4000-13fff : System RAM
>   4008-40c22523 : Kernel code
>   40d2-414d : Kernel data
>   7fe0-ffdf : Crash kernel
>   fff3-fff8 : ACPI RAM
>   fffe- : ACPI RAM
> 80-80 : PCI Bus :00

Therefore

Tested-by: Laszlo Ersek 

Thanks
Laszlo



Re: [Qemu-devel] [PATCH v8 00/54] Postcopy implementation

2015-09-30 Thread Dr. David Alan Gilbert
* Bharata B Rao (bhar...@linux.vnet.ibm.com) wrote:
> On Mon, Sep 28, 2015 at 05:51:39PM +0100, Dr. David Alan Gilbert (git) wrote:
> > From: "Dr. David Alan Gilbert" 
> > 
> >   This is the 8th cut of my version of postcopy.
> > 
> > The userfaultfd linux kernel code is now in the upstream kernel
> > tree, and so 4.3-rc3 can be used without modification.
> > 
> > This qemu series can be found at:
> > https://github.com/orbitfp7/qemu.git
> > on the wp3-postcopy-v8 tag
> > 
> > 
> > Testing status:
> >   * Tested heavily on x86
> >   * Smoke tested on aarch64 (so it does work on different page sizes)
> >   * Power is unhappy for me (but gets further than the htab problem
> > v7 used to have) (I get a kvm run failed)
> 
> Seems to be completing successfully on Power. But it takes 2min for the
> migration status to transition from setup to active.
> 
> Host: 4.3.0-rc3+
> Guest: 4.3.0-rc3+
> QEMU: wp3-postcopy-v8 of your tree.
> 
> # ./ppc64-softmmu/qemu-system-ppc64 --enable-kvm --nographic -machine pseries 
> -m 8G,slots=32,maxmem=32G -device virtio-blk-pci,drive=rootdisk -drive 
> file=/home/bharata/F20-snap1,if=none,cache=none,id=rootdisk,format=qcow2 -vga 
> none -net nic,model=virtio -net user -redir tcp:2000::22 -smp 16,maxcpus=32 
> -serial pty
> 
> (qemu) migrate_set_capability x-postcopy-ram on
> (qemu) migrate -d tcp:localhost:
> (qemu) info migrate
> capabilities: xbzrle: off rdma-pin-all: off auto-converge: off zero-blocks: 
> off compress: off events: off x-postcopy-ram: on 
> Migration status: setup
> total time: 0 milliseconds
> 
> same status for around 2min...

That's interesting; I saw that behaviour on my aarch64 box, but not on
my power box or on x86.  Can you try using tcp:127.0.0.1: to force
ipv4 (that fixed it for me on aarch64).  On the aarch box I found that
it still happened with head of tree qemu and so decided it wasn't my
postcopy world; I'm assuming what's happening is that it's trying
to connect to the IPv6 address, timing out and then trying IPv4.
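
One way to see which address families a given host string can produce is to ask the resolver directly. This is a minimal sketch, assuming POSIX getaddrinfo(); count_addrs() is a made-up helper and not part of QEMU.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L
#endif
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Hypothetical helper: count the stream-socket addresses that
 * getaddrinfo() returns for 'host' in the given address family
 * (AF_INET or AF_INET6). Returns 0 if resolution fails.
 */
int count_addrs(const char *host, int family)
{
    struct addrinfo hints, *res, *p;
    int n = 0;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = family;
    hints.ai_socktype = SOCK_STREAM;

    if (getaddrinfo(host, NULL, &hints, &res) != 0) {
        return 0;
    }
    for (p = res; p != NULL; p = p->ai_next) {
        n++;
    }
    freeaddrinfo(res);
    return n;
}
```

A name such as localhost may yield both IPv6 and IPv4 entries depending on the host configuration, whereas the literal 127.0.0.1 can only yield an IPv4 one, which is why it sidesteps any IPv6 connection attempt.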


> (qemu) info migrate
> capabilities: xbzrle: off rdma-pin-all: off auto-converge: off zero-blocks: 
> off compress: off events: off x-postcopy-ram: on 
> Migration status: active
> total time: 130089 milliseconds
> expected downtime: 300 milliseconds
> setup: 24 milliseconds
> transferred ram: 79454 kbytes
> throughput: 50.09 mbps
> remaining ram: 7670688 kbytes
> total ram: 8388864 kbytes
> duplicate: 160684 pages
> skipped: 0 pages
> normal: 18860 pages
> normal bytes: 75440 kbytes
> dirty sync count: 1
> 
> (qemu) migrate_start_postcopy
> 
> (qemu) info migrate
> capabilities: xbzrle: off rdma-pin-all: off auto-converge: off zero-blocks: 
> off compress: off events: off x-postcopy-ram: on 
> Migration status: postcopy-active
> total time: 135627 milliseconds
> expected downtime: 43 milliseconds
> setup: 24 milliseconds
> transferred ram: 338598 kbytes
> throughput: 74.02 mbps
> remaining ram: 1688384 kbytes
> total ram: 8388864 kbytes
> duplicate: 1600406 pages
> skipped: 0 pages
> normal: 75258 pages
> normal bytes: 301032 kbytes
> dirty sync count: 0
> dirty pages rate: 98 pages
> 
> (qemu) info migrate
> capabilities: xbzrle: off rdma-pin-all: off auto-converge: off zero-blocks: 
> off compress: off events: off x-postcopy-ram: on 
> Migration status: completed
> total time: 136898 milliseconds
> downtime: 685 milliseconds
> setup: 24 milliseconds
> transferred ram: 1194196 kbytes
> throughput: 72.00 mbps
> remaining ram: 0 kbytes
> total ram: 8388864 kbytes
> duplicate: 1810921 pages
> skipped: 0 pages
> normal: 286839 pages
> normal bytes: 1147356 kbytes
> dirty sync count: 2

Great; is the guest happy?

Dave

> 
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK



Re: [Qemu-devel] [PATCH v4 4/5] acpi: arm: add fw_cfg device node to dsdt

2015-09-30 Thread Laszlo Ersek
On 09/30/15 11:59, Ard Biesheuvel wrote:
> On 29 September 2015 at 20:26, Gabriel L. Somlo  wrote:
>> On Tue, Sep 29, 2015 at 12:40:16PM +0200, Laszlo Ersek wrote:
>>> On 09/27/15 23:29, Gabriel L. Somlo wrote:
>>>> Add a fw_cfg device node to the ACPI DSDT. This is mostly
>>>> informational, as the authoritative fw_cfg MMIO region(s)
>>>> are listed in the Device Tree. However, since we are building
>>>> ACPI tables, we might as well be thorough while at it...
>>>>
>>>> Signed-off-by: Gabriel Somlo
>>>> ---
>>>>  hw/arm/virt-acpi-build.c | 15 +++
>>>>  1 file changed, 15 insertions(+)
>>>>
>>>> diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
>>>> index 1aaff1f..f314132 100644
>>>> --- a/hw/arm/virt-acpi-build.c
>>>> +++ b/hw/arm/virt-acpi-build.c
>>>> @@ -110,6 +110,20 @@ static void acpi_dsdt_add_rtc(Aml *scope, const
>>>> MemMapEntry *rtc_memmap,
>>>>  aml_append(scope, dev);
>>>>  }
>>>>
>>>> +static void acpi_dsdt_add_fw_cfg(Aml *scope, const MemMapEntry
>>>> *fw_cfg_memmap)
>>>> +{
>>>> +Aml *dev = aml_device("FWCF");
>>>> +aml_append(dev, aml_name_decl("_HID", aml_string("QEMU0002")));
>>>> +/* device present, functioning, decoding, not shown in UI */
>>>> +aml_append(dev, aml_name_decl("_STA", aml_int(0xB)));
>>>> +
>>>> +Aml *crs = aml_resource_template();
>>>> +aml_append(crs, aml_memory32_fixed(fw_cfg_memmap->base,
>>>> +   fw_cfg_memmap->size,
>>>> AML_READ_WRITE));
>>>> +aml_append(dev, aml_name_decl("_CRS", crs));
>>>> +aml_append(scope, dev);
>>>> +}
>>>> +
>>>>  static void acpi_dsdt_add_flash(Aml *scope, const MemMapEntry
>>>> *flash_memmap)
>>>>  {
>>>>  Aml *dev, *crs;
>>>> @@ -529,6 +543,7 @@ build_dsdt(GArray *table_data, GArray *linker,
>>>> VirtGuestInfo *guest_info)
>>>> (irqmap[VIRT_UART] + ARM_SPI_BASE));
>>>>  acpi_dsdt_add_rtc(scope, &memmap[VIRT_RTC],
>>>>  (irqmap[VIRT_RTC] + ARM_SPI_BASE));
>>>> +acpi_dsdt_add_fw_cfg(scope, &memmap[VIRT_FW_CFG]);
>>>>  acpi_dsdt_add_flash(scope, &memmap[VIRT_FLASH]);
>>>>  acpi_dsdt_add_virtio(scope, &memmap[VIRT_MMIO],
>>>>  (irqmap[VIRT_MMIO] + ARM_SPI_BASE),
>>>> NUM_VIRTIO_TRANSPORTS);
>>>>
>>>
>>> Looks sane to me.
>>>
>>> Did you test this with an aarch64 Linux guest (acpidump -b; iasl -d; cat
>>> /proc/iomem?) I can help with that, if you'd like.
>>
>> I have a F22 arm setup generated by virt-builder, which I start using:
>>
>> bin/qemu-system-arm -M virt,accel=tcg -cpu cortex-a15 \
>>   -kernel ./ArmVirtBuilder/vmlinuz-4.0.4-301.fc22.armv7hl+lpae \
>>   -initrd ./ArmVirtBuilder/initramfs-4.0.4-301.fc22.armv7hl+lpae.img \
>>   -append "console=ttyAMA0 root=/dev/vda3 ro" \
>>   -device virtio-blk-device,drive=hd0 \
>>   -drive id=hd0,if=none,snapshot=on,file=./ArmVirtBuilder/fedora-22.img \
>>   -device virtio-net-device,netdev=usernet \
>>   -netdev user,id=usernet \
>>   -monitor stdio
>>
> 
> Note that you are booting 32-bit ARM here, which does not support ACPI nor 
> UEFI.
> (UEFI is work in progress, so you can try my ARM 32-bit UEFI tree if
> you need to: 
> https://git.linaro.org/people/ard.biesheuvel/linux-arm.git/shortlog/refs/heads/arm-efi-combined-v2)
> 
> You will need to create an arm64 / AArch64 setup and boot the virt
> model using 'qemu-system-aarch64 -M virt -cpu cortex-a57 ...' instead.
> In either case, as Laszlo pointed out, you need UEFI firmware in QEMU
> as well.

I'm about to follow up with my test results, and I considered writing up
a more or less complete guide for Gabriel to test this with an aarch64
guest.

However: if Gabriel has no access to actual aarch64 hardware (ie. cannot
run KVM guests), then I don't think he should bother. Booting just the
UEFI firmware on qemu-system-aarch64 with TCG acceleration is fine, but
for checking "/proc/iomem", he'd really need to boot into guest Linux,
and *that* takes absolutely forever with TCG.

(Dependent on your guest distro, of course; I have tested Fedora 21+ and
RHELSA / RHEL-7 candidates thus far. I wouldn't recommend TCG for those.)

So, I'll just leave these links here for posterity (they could be
somewhat outdated), and I offer to help with aarch64 guest testing in
the future as well, if the patch series overlaps with my interests.

https://wiki.linaro.org/LEG/UEFIforQEMU
https://fedoraproject.org/wiki/Architectures/AArch64/Install_with_QEMU

Thanks
Laszlo



Re: [Qemu-devel] Loading image/elf to cpu that has different not system memory address space

2015-09-30 Thread Peter Maydell
On 30 September 2015 at 06:18, Marcin Krzemiński
 wrote:
> I have at 0xfff0 real memory now (with aliasing to lower memory
> address).
> Does it mean that qemu might try to execute some instructions from there?

As I say, we need there to be fake RAM at that address. We never
try to read its contents, though.

-- PMM



Re: [Qemu-devel] [RFC v5 4/6] target-arm: Create new runtime helpers for excl accesses

2015-09-30 Thread alvise rigo
On Wed, Sep 30, 2015 at 6:03 AM, Richard Henderson  wrote:
> On 09/24/2015 06:32 PM, Alvise Rigo wrote:
>>
>> Introduce a set of new runtime helpers do handle exclusive instructions.
>> This helpers are used as hooks to call the respective LL/SC helpers in
>> softmmu_llsc_template.h from TCG code.
>>
>> Suggested-by: Jani Kokkonen 
>> Suggested-by: Claudio Fontana 
>> Signed-off-by: Alvise Rigo 
>> ---
>>   target-arm/helper.h| 10 ++
>>   target-arm/op_helper.c | 94
>> ++
>>   2 files changed, 104 insertions(+)
>>
>> diff --git a/target-arm/helper.h b/target-arm/helper.h
>> index 827b33d..8e7a7c2 100644
>> --- a/target-arm/helper.h
>> +++ b/target-arm/helper.h
>> @@ -530,6 +530,16 @@ DEF_HELPER_2(dc_zva, void, env, i64)
>>   DEF_HELPER_FLAGS_2(neon_pmull_64_lo, TCG_CALL_NO_RWG_SE, i64, i64, i64)
>>   DEF_HELPER_FLAGS_2(neon_pmull_64_hi, TCG_CALL_NO_RWG_SE, i64, i64, i64)
>>
>> +DEF_HELPER_3(ldlink_aa32_i8, i32, env, i32, i32)
>> +DEF_HELPER_3(ldlink_aa32_i16, i32, env, i32, i32)
>> +DEF_HELPER_3(ldlink_aa32_i32, i32, env, i32, i32)
>> +DEF_HELPER_3(ldlink_aa32_i64, i64, env, i32, i32)
>> +
>> +DEF_HELPER_4(stcond_aa32_i8, i32, env, i32, i32, i32)
>> +DEF_HELPER_4(stcond_aa32_i16, i32, env, i32, i32, i32)
>> +DEF_HELPER_4(stcond_aa32_i32, i32, env, i32, i32, i32)
>> +DEF_HELPER_4(stcond_aa32_i64, i32, env, i32, i64, i32)
>> +
>>   #ifdef TARGET_AARCH64
>>   #include "helper-a64.h"
>>   #endif
>> diff --git a/target-arm/op_helper.c b/target-arm/op_helper.c
>> index 663c05d..d832ba8 100644
>> --- a/target-arm/op_helper.c
>> +++ b/target-arm/op_helper.c
>> @@ -969,3 +969,97 @@ uint32_t HELPER(ror_cc)(CPUARMState *env, uint32_t x,
>> uint32_t i)
>>   return ((uint32_t)x >> shift) | (x << (32 - shift));
>>   }
>>   }
>> +
>> +/* LoadLink helpers, only unsigned. */
>> +static void * const qemu_ldex_helpers[16] = {
>> +[MO_UB]   = helper_ret_ldlinkub_mmu,
>> +
>> +[MO_LEUW] = helper_le_ldlinkuw_mmu,
>> +[MO_LEUL] = helper_le_ldlinkul_mmu,
>> +[MO_LEQ]  = helper_le_ldlinkq_mmu,
>> +
>> +[MO_BEUW] = helper_be_ldlinkuw_mmu,
>> +[MO_BEUL] = helper_be_ldlinkul_mmu,
>> +[MO_BEQ]  = helper_be_ldlinkq_mmu,
>> +};
>> +
>> +#define LDEX_HELPER(SUFF, OPC)  \
>> +uint32_t HELPER(ldlink_aa32_i##SUFF)(CPUARMState *env, uint32_t addr,   \
>> +   uint32_t index)  \
>> +{   \
>> +CPUArchState *state = env;  \
>> +TCGMemOpIdx op; \
>> +\
>> +op = make_memop_idx(OPC, index);\
>> +\
>> +tcg_target_ulong (*func)(CPUArchState *env, target_ulong addr,  \
>> + TCGMemOpIdx oi, uintptr_t retaddr);\
>> +func = qemu_ldex_helpers[OPC];  \
>> +\
>> +return (uint32_t)func(state, addr, op, GETRA());\
>> +}
>> +
>> +LDEX_HELPER(8, MO_UB)
>> +LDEX_HELPER(16, MO_TEUW)
>> +LDEX_HELPER(32, MO_TEUL)
>
>
> This is not what Aurelien meant.  I cannot see any reason at present why
> generic wrappers, available for all targets, shouldn't be sufficient.

I thought we could create ad-hoc helpers for each architecture - for
instance cmpxchg-like helpers for x86.
But now that I think about it, I can imagine just two types of helpers
(for the two atomic approaches - ll/sc and cmpxchg), that of course
can be generic.

>
> See tcg/tcg-runtime.h and tcg-runtime.c.

I will move them there.

>
> You shouldn't need to look up a function in a table like this.  The decision
> about whether to call a BE or LE helper should have been made in the
> translator.

In the next version I will:

- extend the macro to generate both versions of the helper and not just one
- make the decision in translate.c, always using MO_TE as 'toggle'

Thank you,
alvise

>
>
> r~



Re: [Qemu-devel] [PATCH 4/4] spapr_pci: Allow VFIO devices to work on the normal PCI host bridge

2015-09-30 Thread Thomas Huth
On 30/09/15 05:48, David Gibson wrote:
> The core VFIO infrastructure more or less allows VFIO devices to work
> on any normal guest PCI host bridge (PHB) without extra logic.
> However, the "spapr-pci-host-bridge" device (as opposed to the special
> "spapr-pci-vfio-host-bridge" device) breaks this by using a partially
> KVM accelerated implementation of the guest kernel IOMMU which won't
> work with VFIO devices, without additional kernel support.
> 
> This patch allows VFIO devices to work on the spapr-pci-host-bridge,
> by having it switch off KVM TCE acceleration when a VFIO device is
> added to the PHB (either on startup, or by hotplug).
> 
> Signed-off-by: David Gibson 
> ---
>  hw/ppc/spapr_pci.c | 6 ++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
> index cb7c351..55fa8db 100644
> --- a/hw/ppc/spapr_pci.c
> +++ b/hw/ppc/spapr_pci.c
> @@ -1083,6 +1083,12 @@ static void spapr_phb_add_pci_device(sPAPRDRConnector 
> *drc,
>  void *fdt = NULL;
>  int fdt_start_offset = 0, fdt_size;
>  
> +if (object_dynamic_cast(OBJECT(pdev), "vfio-pci")) {
> +sPAPRTCETable *tcet = spapr_tce_find_by_liobn(phb->dma_liobn);
> +
> +spapr_tce_set_need_vfio(tcet, true);
> +}
> +
>  if (dev->hotplugged) {
>  fdt = create_device_tree(&fdt_size);
>  fdt_start_offset = spapr_create_pci_child_dt(phb, pdev, fdt, 0);

Reviewed-by: Thomas Huth 




Re: [Qemu-devel] [PATCH 3/4] spapr_iommu: Provide a function to switch a TCE table to allowing VFIO

2015-09-30 Thread Thomas Huth
On 30/09/15 05:48, David Gibson wrote:
> Because of the way non-VFIO guest IOMMU operations are KVM accelerated, not
> all TCE tables (guest IOMMU contexts) can support VFIO devices.  Currently,
> this is decided at creation time.
> 
> To support hotplug of VFIO devices, we need to allow a TCE table which
> previously didn't allow VFIO devices to be switched so that it can.  This
> patch adds an spapr_tce_set_need_vfio() function to do this, by
> reallocating the table in userspace if necessary.
> 
> Currently this doesn't allow the KVM acceleration to be re-enabled if all
> the VFIO devices are removed.  That's an optimization for another time.
> 
> Signed-off-by: David Gibson 
> ---
>  hw/ppc/spapr_iommu.c   | 32 
>  include/hw/ppc/spapr.h |  2 ++
>  2 files changed, 34 insertions(+)
> 
> diff --git a/hw/ppc/spapr_iommu.c b/hw/ppc/spapr_iommu.c
> index 5166cde..8d60f8b 100644
> --- a/hw/ppc/spapr_iommu.c
> +++ b/hw/ppc/spapr_iommu.c
> @@ -168,6 +168,38 @@ static int spapr_tce_table_realize(DeviceState *dev)
>  return 0;
>  }
>  
> +void spapr_tce_set_need_vfio(sPAPRTCETable *tcet, bool need_vfio)
> +{
> +size_t table_size = tcet->nb_table * sizeof(uint64_t);
> +void *newtable;
> +
> +if (need_vfio == tcet->need_vfio) {
> +/* Nothing to do */
> +return;
> +}
> +
> +if (!need_vfio) {
> +/* FIXME: We don't support transition back to KVM accelerated
> + * TCEs yet */
> +return;
> +}
> +
> +tcet->need_vfio = true;
> +
> +if (tcet->fd < 0) {
> +/* Table is already in userspace, nothing to be do */
> +return;
> +}
> +
> +newtable = g_malloc0(table_size);

Since you immediately fill the whole table with the memcpy below, you do
not need to zero the memory here, i.e. g_malloc instead of g_malloc0
should be sufficient.
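
To illustrate the point, here is a rough standalone sketch. Assumptions: plain malloc() stands in for g_malloc(), and dup_table() is a made-up name rather than anything in spapr_iommu.c.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: duplicate a table of nb_table 64-bit entries.
 * The memcpy() overwrites every byte of the new buffer, so zeroing it
 * first (calloc()/g_malloc0()) would be redundant work.
 */
uint64_t *dup_table(const uint64_t *table, size_t nb_table)
{
    size_t table_size = nb_table * sizeof(uint64_t);
    uint64_t *newtable = malloc(table_size);

    if (newtable == NULL) {
        return NULL;
    }
    memcpy(newtable, table, table_size);
    return newtable;
}
```

Unlike g_malloc(), plain malloc() can return NULL, hence the extra check in the sketch.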

> +memcpy(newtable, tcet->table, table_size);
> +
> +kvmppc_remove_spapr_tce(tcet->table, tcet->fd, tcet->nb_table);
> +
> +tcet->fd = -1;
> +tcet->table = newtable;
> +}

 Thomas





Re: [Qemu-devel] [PATCH] migration: disallow migrate_add_blocker during migration

2015-09-30 Thread Kevin Wolf
Am 29.09.2015 um 22:20 hat John Snow geschrieben:
> If a migration is already in progress and somebody attempts
> to add a migration blocker, this should rightly fail.
> 
> Add an errp parameter and a retcode return value to migrate_add_blocker.
> 
> This is part one of two for a solution to prohibit e.g. block jobs
> from running concurrently with migration.
> 
> Signed-off-by: John Snow 

Through which tree should this be merged?

>  block/qcow.c  |  5 -
>  block/vdi.c   |  5 -
>  block/vhdx.c  |  5 -
>  block/vmdk.c  | 13 +
>  block/vpc.c   |  9 ++---
>  block/vvfat.c | 19 +++
>  hw/9pfs/virtio-9p.c   | 15 +++
>  hw/misc/ivshmem.c |  5 -
>  hw/scsi/vhost-scsi.c  | 11 +++
>  hw/virtio/vhost.c | 31 +++
>  include/migration/migration.h |  4 +++-
>  migration/migration.c | 32 
>  stubs/migr-blocker.c  |  3 ++-
>  target-i386/kvm.c |  6 +-
>  14 files changed, 117 insertions(+), 46 deletions(-)
> 
> diff --git a/block/qcow.c b/block/qcow.c
> index 6e35db1..1b82dec 100644
> --- a/block/qcow.c
> +++ b/block/qcow.c
> @@ -236,7 +236,10 @@ static int qcow_open(BlockDriverState *bs, QDict 
> *options, int flags,
>  error_setg(&s->migration_blocker, "The qcow format used by node '%s' "
> "does not support live migration",
> bdrv_get_device_or_node_name(bs));
> -migrate_add_blocker(s->migration_blocker);
> +ret = migrate_add_blocker(s->migration_blocker, errp);
> +if (ret < 0) {
> +goto fail;

This error path leaks s->migration_blocker.
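
Roughly what I have in mind, as a standalone sketch rather than the actual driver code. Everything here is a stand-in: a heap-allocated string replaces the Error object, free() replaces error_free(), and add_blocker_stub() only mimics migrate_add_blocker() failing while a migration is running.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical driver state holding the blocker reason. */
typedef struct DriverState {
    char *migration_blocker;
} DriverState;

/* Stub: fails with -1 when a migration is already in progress. */
int add_blocker_stub(const char *reason, int migration_running)
{
    (void)reason;
    return migration_running ? -1 : 0;
}

/* Open path: allocate the blocker, register it, free it on failure. */
int driver_open(DriverState *s, int migration_running)
{
    const char *msg = "format does not support live migration";
    int ret;

    s->migration_blocker = malloc(strlen(msg) + 1);
    if (s->migration_blocker == NULL) {
        return -1;
    }
    strcpy(s->migration_blocker, msg);

    ret = add_blocker_stub(s->migration_blocker, migration_running);
    if (ret < 0) {
        /* Without this cleanup the blocker would be leaked. */
        free(s->migration_blocker);
        s->migration_blocker = NULL;
        return ret;
    }
    return 0;
}
```

In the real drivers the cleanup would presumably belong under the existing fail label.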

> +}
>  
>  qemu_co_mutex_init(&s->lock);
>  return 0;
> diff --git a/block/vdi.c b/block/vdi.c
> index 062a654..95b2690 100644
> --- a/block/vdi.c
> +++ b/block/vdi.c
> @@ -505,7 +505,10 @@ static int vdi_open(BlockDriverState *bs, QDict 
> *options, int flags,
>  error_setg(&s->migration_blocker, "The vdi format used by node '%s' "
> "does not support live migration",
> bdrv_get_device_or_node_name(bs));
> -migrate_add_blocker(s->migration_blocker);
> +ret = migrate_add_blocker(s->migration_blocker, errp);
> +if (ret < 0) {
> +goto fail_free_bmap;

Same.

> +}
>  
>  qemu_co_mutex_init(&s->write_lock);
>  
> diff --git a/block/vhdx.c b/block/vhdx.c
> index d3bb1bd..5bebe34 100644
> --- a/block/vhdx.c
> +++ b/block/vhdx.c
> @@ -1005,7 +1005,10 @@ static int vhdx_open(BlockDriverState *bs, QDict 
> *options, int flags,
>  error_setg(&s->migration_blocker, "The vhdx format used by node '%s' "
> "does not support live migration",
> bdrv_get_device_or_node_name(bs));
> -migrate_add_blocker(s->migration_blocker);
> +ret = migrate_add_blocker(s->migration_blocker, errp);
> +if (ret < 0) {
> +goto fail;
> +}
>  
>  return 0;
>  fail:

This one happens to be okay because VHDX uses the close function in the
failure path (and at least up to now that function even seems to cope
with half-initialised images - it just feels a bit brittle).

> diff --git a/block/vmdk.c b/block/vmdk.c
> index be0d640..09dcf6b 100644
> --- a/block/vmdk.c
> +++ b/block/vmdk.c
> @@ -943,15 +943,20 @@ static int vmdk_open(BlockDriverState *bs, QDict 
> *options, int flags,
>  if (ret) {
>  goto fail;
>  }
> -s->cid = vmdk_read_cid(bs, 0);
> -s->parent_cid = vmdk_read_cid(bs, 1);

Why do you move this code? It doesn't seem to do anything that you would
need to undo on failure.

> -qemu_co_mutex_init(&s->lock);
>
>  /* Disable migration when VMDK images are used */
>  error_setg(&s->migration_blocker, "The vmdk format used by node '%s' "
> "does not support live migration",
> bdrv_get_device_or_node_name(bs));
> -migrate_add_blocker(s->migration_blocker);
> +ret = migrate_add_blocker(s->migration_blocker, errp);
> +if (ret < 0) {
> +goto fail;
> +}

But the usual leak is still there. :-)

> +
> +s->cid = vmdk_read_cid(bs, 0);
> +s->parent_cid = vmdk_read_cid(bs, 1);
> +qemu_co_mutex_init(&s->lock);
> +
>  g_free(buf);
>  return 0;
>  
> diff --git a/block/vpc.c b/block/vpc.c
> index 2b3b518..4c60942 100644
> --- a/block/vpc.c
> +++ b/block/vpc.c
> @@ -325,13 +325,16 @@ static int vpc_open(BlockDriverState *bs, QDict 
> *options, int flags,
>  #endif
>  }
>  
> -qemu_co_mutex_init(&s->lock);
> -
>  /* Disable migration when VHD images are used */
>  error_setg(&s->migration_blocker, "The vpc format used by node '%s' "
> "does not support live migration",
> bdrv_get_device_or_node_name(bs));
> -migrate_add_blocker(s->migration_blocker);
> +ret = migrate_add_blocker(s->migrat

Re: [Qemu-devel] [PATCH v4 4/5] acpi: arm: add fw_cfg device node to dsdt

2015-09-30 Thread Ard Biesheuvel
On 29 September 2015 at 20:26, Gabriel L. Somlo  wrote:
> On Tue, Sep 29, 2015 at 12:40:16PM +0200, Laszlo Ersek wrote:
>> On 09/27/15 23:29, Gabriel L. Somlo wrote:
>> > Add a fw_cfg device node to the ACPI DSDT. This is mostly
>> > informational, as the authoritative fw_cfg MMIO region(s)
>> > are listed in the Device Tree. However, since we are building
>> > ACPI tables, we might as well be thorough while at it...
>> >
>> > Signed-off-by: Gabriel Somlo 
>> > ---
>> >  hw/arm/virt-acpi-build.c | 15 +++
>> >  1 file changed, 15 insertions(+)
>> >
>> > diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
>> > index 1aaff1f..f314132 100644
>> > --- a/hw/arm/virt-acpi-build.c
>> > +++ b/hw/arm/virt-acpi-build.c
>> > @@ -110,6 +110,20 @@ static void acpi_dsdt_add_rtc(Aml *scope, const 
>> > MemMapEntry *rtc_memmap,
>> >  aml_append(scope, dev);
>> >  }
>> >
>> > +static void acpi_dsdt_add_fw_cfg(Aml *scope, const MemMapEntry 
>> > *fw_cfg_memmap)
>> > +{
>> > +Aml *dev = aml_device("FWCF");
>> > +aml_append(dev, aml_name_decl("_HID", aml_string("QEMU0002")));
>> > +/* device present, functioning, decoding, not shown in UI */
>> > +aml_append(dev, aml_name_decl("_STA", aml_int(0xB)));
>> > +
>> > +Aml *crs = aml_resource_template();
>> > +aml_append(crs, aml_memory32_fixed(fw_cfg_memmap->base,
>> > +   fw_cfg_memmap->size, 
>> > AML_READ_WRITE));
>> > +aml_append(dev, aml_name_decl("_CRS", crs));
>> > +aml_append(scope, dev);
>> > +}
>> > +
>> >  static void acpi_dsdt_add_flash(Aml *scope, const MemMapEntry 
>> > *flash_memmap)
>> >  {
>> >  Aml *dev, *crs;
>> > @@ -529,6 +543,7 @@ build_dsdt(GArray *table_data, GArray *linker, 
>> > VirtGuestInfo *guest_info)
>> > (irqmap[VIRT_UART] + ARM_SPI_BASE));
>> >  acpi_dsdt_add_rtc(scope, &memmap[VIRT_RTC],
>> >(irqmap[VIRT_RTC] + ARM_SPI_BASE));
>> > +acpi_dsdt_add_fw_cfg(scope, &memmap[VIRT_FW_CFG]);
>> >  acpi_dsdt_add_flash(scope, &memmap[VIRT_FLASH]);
>> >  acpi_dsdt_add_virtio(scope, &memmap[VIRT_MMIO],
>> >  (irqmap[VIRT_MMIO] + ARM_SPI_BASE), 
>> > NUM_VIRTIO_TRANSPORTS);
>> >
>>
>> Looks sane to me.
>>
>> Did you test this with an aarch64 Linux guest (acpidump -b; iasl -d; cat
>> /proc/iomem?) I can help with that, if you'd like.
>
> I have a F22 arm setup generated by virt-builder, which I start using:
>
> bin/qemu-system-arm -M virt,accel=tcg -cpu cortex-a15 \
>   -kernel ./ArmVirtBuilder/vmlinuz-4.0.4-301.fc22.armv7hl+lpae \
>   -initrd ./ArmVirtBuilder/initramfs-4.0.4-301.fc22.armv7hl+lpae.img \
>   -append "console=ttyAMA0 root=/dev/vda3 ro" \
>   -device virtio-blk-device,drive=hd0 \
>   -drive id=hd0,if=none,snapshot=on,file=./ArmVirtBuilder/fedora-22.img \
>   -device virtio-net-device,netdev=usernet \
>   -netdev user,id=usernet \
>   -monitor stdio
>

Note that you are booting 32-bit ARM here, which supports neither ACPI nor UEFI.
(UEFI is work in progress, so you can try my ARM 32-bit UEFI tree if
you need to: 
https://git.linaro.org/people/ard.biesheuvel/linux-arm.git/shortlog/refs/heads/arm-efi-combined-v2)

You will need to create an arm64 / AArch64 setup and boot the virt
model using 'qemu-system-aarch64 -M virt -cpu cortex-a57 ...' instead.
In either case, as Laszlo pointed out, you need UEFI firmware in QEMU
as well.

-- 
Ard.



Re: [Qemu-devel] [RFC v5 5/6] configure: Use slow-path for atomic only when the softmmu is enabled

2015-09-30 Thread alvise rigo
On Wed, Sep 30, 2015 at 6:05 AM, Richard Henderson  wrote:
> On 09/24/2015 06:32 PM, Alvise Rigo wrote:
>>
>> Use the new slow path for atomic instruction translation when the
>> softmmu is enabled.
>
>
> Um... why?  TCG_USE_LDST_EXCL would appear to be 100% redundant with
> SOFTMMU.

Oops, while modifying the previous version of the patch I didn't notice
that I had created a redundant variable.
I will remove it.

Regards,
alvise

>
>
> r~



Re: [Qemu-devel] [RFC v5 3/6] softmmu: Add helpers for a new slowpath

2015-09-30 Thread alvise rigo
On Wed, Sep 30, 2015 at 5:58 AM, Richard Henderson  wrote:
> On 09/24/2015 06:32 PM, Alvise Rigo wrote:
>>
>> The new helpers rely on the legacy ones to perform the actual read/write.
>>
>> The LoadLink helper (helper_ldlink_name) prepares the way for the
>> following SC operation. It sets the linked address and the size of the
>> access.
>> These helper also update the TLB entry of the page involved in the
>> LL/SC for those vCPUs that have the bit set (dirty), so that the
>> following accesses made by all the vCPUs will follow the slow path.
>>
>> The StoreConditional helper (helper_stcond_name) returns 1 if the
>> store has to fail due to a concurrent access to the same page by
>> another vCPU. A 'concurrent access' can be a store made by *any* vCPU
>> (although, some implementations allow stores made by the CPU that issued
>> the LoadLink).
>>
>> Suggested-by: Jani Kokkonen 
>> Suggested-by: Claudio Fontana 
>> Signed-off-by: Alvise Rigo 
>> ---
>>   cputlb.c|   3 ++
>>   softmmu_llsc_template.h | 124
>> 
>>   softmmu_template.h  |  12 +
>>   tcg/tcg.h   |  30 
>>   4 files changed, 169 insertions(+)
>>   create mode 100644 softmmu_llsc_template.h
>>
>> diff --git a/cputlb.c b/cputlb.c
>> index 1e25a2a..d5aae7c 100644
>> --- a/cputlb.c
>> +++ b/cputlb.c
>> @@ -416,6 +416,8 @@ static inline void
>> lookup_and_reset_cpus_ll_addr(hwaddr addr, hwaddr size)
>>
>>   #define MMUSUFFIX _mmu
>>
>> +/* Generates LoadLink/StoreConditional helpers in softmmu_template.h */
>> +#define GEN_EXCLUSIVE_HELPERS
>>   #define SHIFT 0
>>   #include "softmmu_template.h"
>>
>> @@ -428,6 +430,7 @@ static inline void
>> lookup_and_reset_cpus_ll_addr(hwaddr addr, hwaddr size)
>>   #define SHIFT 3
>>   #include "softmmu_template.h"
>>   #undef MMUSUFFIX
>> +#undef GEN_EXCLUSIVE_HELPERS
>>
>>   #define MMUSUFFIX _cmmu
>>   #undef GETPC_ADJ
>> diff --git a/softmmu_llsc_template.h b/softmmu_llsc_template.h
>> new file mode 100644
>> index 000..9f22834
>> --- /dev/null
>> +++ b/softmmu_llsc_template.h
>> @@ -0,0 +1,124 @@
>> +/*
>> + *  Software MMU support (esclusive load/store operations)
>> + *
>> + * Generate helpers used by TCG for qemu_ldlink/stcond ops.
>> + *
>> + * Included from softmmu_template.h only.
>> + *
>> + * Copyright (c) 2015 Virtual Open Systems
>> + *
>> + * Authors:
>> + *  Alvise Rigo 
>> + *
>> + * This library is free software; you can redistribute it and/or
>> + * modify it under the terms of the GNU Lesser General Public
>> + * License as published by the Free Software Foundation; either
>> + * version 2 of the License, or (at your option) any later version.
>> + *
>> + * This library is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
>> + * Lesser General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU Lesser General Public
>> + * License along with this library; if not, see
>> .
>> + */
>> +
>> +/* This template does not generate together the le and be version, but
>> only one
>> + * of the two depending on whether BIGENDIAN_EXCLUSIVE_HELPERS has been
>> set.
>> + * The same nomenclature as softmmu_template.h is used for the exclusive
>> + * helpers.  */
>> +
>> +#ifdef BIGENDIAN_EXCLUSIVE_HELPERS
>> +
>> +#define helper_ldlink_name  glue(glue(helper_be_ldlink, USUFFIX),
>> MMUSUFFIX)
>> +#define helper_stcond_name  glue(glue(helper_be_stcond, SUFFIX),
>> MMUSUFFIX)
>> +#define helper_ld glue(glue(helper_be_ld, USUFFIX), MMUSUFFIX)
>> +#define helper_st glue(glue(helper_be_st, SUFFIX), MMUSUFFIX)
>> +
>> +#else /* LE helpers + 8bit helpers (generated only once for both LE end
>> BE) */
>> +
>> +#if DATA_SIZE > 1
>> +#define helper_ldlink_name  glue(glue(helper_le_ldlink, USUFFIX),
>> MMUSUFFIX)
>> +#define helper_stcond_name  glue(glue(helper_le_stcond, SUFFIX),
>> MMUSUFFIX)
>> +#define helper_ld glue(glue(helper_le_ld, USUFFIX), MMUSUFFIX)
>> +#define helper_st glue(glue(helper_le_st, SUFFIX), MMUSUFFIX)
>> +#else /* DATA_SIZE <= 1 */
>> +#define helper_ldlink_name  glue(glue(helper_ret_ldlink, USUFFIX),
>> MMUSUFFIX)
>> +#define helper_stcond_name  glue(glue(helper_ret_stcond, SUFFIX),
>> MMUSUFFIX)
>> +#define helper_ld glue(glue(helper_ret_ld, USUFFIX), MMUSUFFIX)
>> +#define helper_st glue(glue(helper_ret_st, SUFFIX), MMUSUFFIX)
>> +#endif
>> +
>> +#endif
>> +
>> +WORD_TYPE helper_ldlink_name(CPUArchState *env, target_ulong addr,
>> +TCGMemOpIdx oi, uintptr_t retaddr)
>> +{
>> +WORD_TYPE ret;
>> +int index;
>> +CPUState *cpu;
>> +hwaddr hw_addr;
>> +unsigned mmu_idx = get_mmuidx(oi);
>> +
>> +/* Use the proper load helper from cpu_ldst.h */
>> +ret = helper_ld(env, addr, mmu_idx, retaddr);
>> +
>> +index = (addr >> TARGET_PAGE

Re: [Qemu-devel] [PATCH v3] Add argument filters to the seccomp sandbox

2015-09-30 Thread Daniel P. Berrange
On Wed, Sep 30, 2015 at 04:40:42AM -0400, Namsun Ch'o wrote:
> > This looks good now.
> > Thanks for your contribution.
> 
> > Acked-by: Eduardo Otubo 
> 
> > ps.: I'll create a pull request with all changes made so far on Friday.
> 
> I was told on IRC to submit patches in smaller chunks, with a few new filters
> at a time. Should I wait until it is merged, or should I go ahead and post a
> v1 patch in a new thread against the patched qemu-seccomp.c now?

There's no need to wait for things to be merged - feel free to post further
patches based on the patch you already submitted. Just mention when posting
them that they're a patch against your previous posting.

Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|



Re: [Qemu-devel] [PATCH 4/4] spapr_pci: Allow VFIO devices to work on the normal PCI host bridge

2015-09-30 Thread Laurent Vivier


On 30/09/2015 05:48, David Gibson wrote:
> The core VFIO infrastructure more or less allows VFIO devices to work
> on any normal guest PCI host bridge (PHB) without extra logic.
> However, the "spapr-pci-host-bridge" device (as opposed to the special
> "spapr-pci-vfio-host-bridge" device) breaks this by using a partially
> KVM accelerated implementation of the guest kernel IOMMU which won't
> work with VFIO devices, without additional kernel support.
> 
> This patch allows VFIO devices to work on the spapr-pci-host-bridge,
> by having it switch off KVM TCE acceleration when a VFIO device is
> added to the PHB (either on startup, or by hotplug).
> 
> Signed-off-by: David Gibson 
> ---
>  hw/ppc/spapr_pci.c | 6 ++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
> index cb7c351..55fa8db 100644
> --- a/hw/ppc/spapr_pci.c
> +++ b/hw/ppc/spapr_pci.c
> @@ -1083,6 +1083,12 @@ static void spapr_phb_add_pci_device(sPAPRDRConnector 
> *drc,
>  void *fdt = NULL;
>  int fdt_start_offset = 0, fdt_size;
>  
> +if (object_dynamic_cast(OBJECT(pdev), "vfio-pci")) {
> +sPAPRTCETable *tcet = spapr_tce_find_by_liobn(phb->dma_liobn);
> +
> +spapr_tce_set_need_vfio(tcet, true);
> +}
> +
>  if (dev->hotplugged) {
>  fdt = create_device_tree(&fdt_size);
>  fdt_start_offset = spapr_create_pci_child_dt(phb, pdev, fdt, 0);
> 

Reviewed-by: Laurent Vivier 



Re: [Qemu-devel] [PATCH 2/3] hw: do not pass NULL to memory_region_init from instance_init

2015-09-30 Thread Markus Armbruster
Paolo Bonzini  writes:

> This causes the region to outlive the object, because it attaches the
> region to /machine.  This is not nice for the "realize" method, but
> much worse for "instance_init" because it can cause dangling pointers
> after a simple object_new/object_unref pair.
>
> Reported-by: Markus Armbruster 
> Signed-off-by: Paolo Bonzini 

One more: pxa2xx_pcmcia_initfn().

The ones you fix are
Tested-by: Markus Armbruster 



Re: [Qemu-devel] [RFC v5 2/6] softmmu: Add new TLB_EXCL flag

2015-09-30 Thread alvise rigo
On Wed, Sep 30, 2015 at 5:34 AM, Richard Henderson  wrote:
> On 09/24/2015 06:32 PM, Alvise Rigo wrote:
>>
>> +if (unlikely(!(te->addr_write & TLB_MMIO) && (te->addr_write &
>> TLB_EXCL))) {
>> +/* We are removing an exclusive entry, set the page to dirty.
>> This
>> + * is not be necessary if the vCPU has performed both SC and LL.
>> */
>> +hwaddr hw_addr = (env->iotlb[mmu_idx][index].addr &
>> TARGET_PAGE_MASK) +
>> +  (te->addr_write &
>> TARGET_PAGE_MASK);
>> +cpu_physical_memory_set_excl_dirty(hw_addr, cpu->cpu_index);
>> +}
>
>
> Um... this seems dangerous.
>
> (1) I don't see why EXCL support should differ whether MMIO is set or not.
> Either we support exclusive accesses on memory-mapped io like we do on ram
> (in which case this is wrong) or we don't (in which case this is
> unnecessary).

I was not sure whether we also had to support MMIO memory.
In theory there shouldn't be any issue with including memory-mapped
I/O as well; I will consider this for the next version.

>
> (2) Doesn't this prevent a target from accessing a page during a ll/sc
> sequence that aliases within our trivial hash?  Such a case on arm might be
>
> mov r3, #0x10
> ldrex   r0, [r2]
> ldr r1, [r2, r3]
> add r0, r0, r1
> strex   r0, [r2]
>

I'm not sure I got it. When the CPU issues the ldrex the page will be
set as "clean" (meaning that all the CPUs will then follow the
slow-path for that page) and the exclusive range - [r2, r2+4] in this
case - is stored in the CPU state.
The forced slow-path is used to check if the normal store is hitting
the exclusive range of any CPUs, the normal loads are not affected.
I don't see any problem in the code above, what am I missing?
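
To make the behaviour described above concrete, here is a rough standalone
model. It is only a sketch with stand-in types, not the code from the patch:
vcpu_t, load_link, store_slow_path, store_conditional and RESET_ADDR are
names made up for illustration. Only stores are compared against the
exclusive ranges; plain loads are never checked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RESET_ADDR UINT64_MAX /* vCPU is not in the middle of a LL/SC */

typedef struct {
    uint64_t excl_begin; /* first protected byte */
    uint64_t excl_end;   /* one past the last protected byte */
} vcpu_t;

/* True if [a, a + alen) and [b, b + blen) share a byte (no wraparound). */
static bool overlap(uint64_t a, uint64_t alen, uint64_t b, uint64_t blen)
{
    return a < b + blen && b < a + alen;
}

/* LoadLink: remember the protected range for this vCPU. */
static void load_link(vcpu_t *cpu, uint64_t addr, uint64_t size)
{
    cpu->excl_begin = addr;
    cpu->excl_end = addr + size;
}

/* Normal store on the forced slow path: reset every vCPU whose protected
 * range is touched.  Returns how many vCPUs lost their reservation. */
static int store_slow_path(vcpu_t *cpus, size_t n, uint64_t addr,
                           uint64_t size)
{
    int hits = 0;

    for (size_t i = 0; i < n; i++) {
        vcpu_t *c = &cpus[i];

        if (c->excl_begin != RESET_ADDR &&
            overlap(c->excl_begin, c->excl_end - c->excl_begin, addr, size)) {
            c->excl_begin = RESET_ADDR;
            hits++;
        }
    }
    return hits;
}

/* StoreConditional: returns 0 on success, 1 if the store has to fail. */
static int store_conditional(vcpu_t *cpu, uint64_t addr, uint64_t size)
{
    bool ok = cpu->excl_begin == addr && cpu->excl_end == addr + size;

    cpu->excl_begin = RESET_ADDR; /* the reservation is consumed either way */
    return ok ? 0 : 1;
}
```

In this model the ldrex/ldr/strex sequence quoted above succeeds, because the
intermediate ldr never reaches store_slow_path(); only a store overlapping
[r2, r2+4) from any vCPU would make the strex fail.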

> AFAIK, Alpha is the only target we have which specifies that any normal
> memory access during a ll+sc sequence may fail the sc.

I will dig into it because I remember that the Alpha architecture
behaves like ARM in the handling of LDxL/STxC instructions.

>
> (3) I'm finding the "clean/dirty" words less helpful than they could be,
> especially since "clean" implies "some cpu has an excl lock on the page",
> which is reverse of what seems natural but understandable given the
> implementation.  Perhaps we could rename these helpers?
>
>> @@ -376,6 +392,28 @@ tb_page_addr_t get_page_addr_code(CPUArchState *env1,
>> target_ulong addr)
>>   return qemu_ram_addr_from_host_nofail(p);
>>   }
>>
>> +/* Atomic insn translation TLB support. */
>> +#define EXCLUSIVE_RESET_ADDR ULLONG_MAX
>> +/* For every vCPU compare the exclusive address and reset it in case of a
>> + * match. Since only one vCPU is running at once, no lock has to be held
>> to
>> + * guard this operation. */
>> +static inline void lookup_and_reset_cpus_ll_addr(hwaddr addr, hwaddr
>> size)
>> +{
>> +CPUState *cpu;
>> +CPUArchState *acpu;
>> +
>> +CPU_FOREACH(cpu) {
>> +acpu = (CPUArchState *)cpu->env_ptr;
>> +
>> +if (acpu->excl_protected_range.begin != EXCLUSIVE_RESET_ADDR &&
>> +ranges_overlap(acpu->excl_protected_range.begin,
>> +acpu->excl_protected_range.end -
>> acpu->excl_protected_range.begin,
>> +addr, size)) {
>
>
> Watch the indentation here... it ought to line up with the previous argument
> on the line above, just after the (.  This may require you split the
> subtract across the line too but that's ok.

OK, I will fix it.

>
>
>
>>   void dump_exec_info(FILE *f, fprintf_function cpu_fprintf);
>>   void dump_opcount_info(FILE *f, fprintf_function cpu_fprintf);
>> diff --git a/include/exec/cpu-defs.h b/include/exec/cpu-defs.h
>> index 98b9cff..a67f295 100644
>> --- a/include/exec/cpu-defs.h
>> +++ b/include/exec/cpu-defs.h
>> @@ -27,6 +27,7 @@
>>   #include 
>>   #include "qemu/osdep.h"
>>   #include "qemu/queue.h"
>> +#include "qemu/range.h"
>>   #include "tcg-target.h"
>>   #ifndef CONFIG_USER_ONLY
>>   #include "exec/hwaddr.h"
>> @@ -150,5 +151,16 @@ typedef struct CPUIOTLBEntry {
>>   #define CPU_COMMON
>> \
>>   /* soft mmu support */
>> \
>>   CPU_COMMON_TLB
>> \
>> +\
>> +/* Used by the atomic insn translation backend. */  \
>> +int ll_sc_context;  \
>> +/* vCPU current exclusive addresses range.
>> + * The address is set to EXCLUSIVE_RESET_ADDR if the vCPU is not.
>> + * in the middle of a LL/SC. */ \
>> +struct Range excl_protected_range;  \
>> +/* Used to carry the SC result but also to flag a normal (legacy)
>> + * store access made by a stcond (see softmmu_template.h). */   \
>> +int excl_succeeded; \
>
>
>
> This seems to be required by softmmu_template.h?  In which case this mu

Re: [Qemu-devel] [PATCH 2/4] spapr_iommu: Rename vfio_accel parameter

2015-09-30 Thread Laurent Vivier


On 30/09/2015 05:48, David Gibson wrote:
> The vfio_accel parameter used when creating a new TCE table (guest IOMMU
> context) has a confusing name.  What it really means is whether we need the
> TCE table created to be able to support VFIO devices.
> 
> VFIO is relevant, because when available we use in-kernel acceleration of
> the TCE table, but that may not work with VFIO devices because updates to
> the table are handled in kernel, bypass qemu and so don't hit qemu's
> infrastructure for keeping the VFIO host IOMMU state in sync with the guest
> IOMMU state.
> 
> Rename the parameter to "need_vfio" throughout.  This is a cosmetic change,
> with no impact on the logic.
> 
> Signed-off-by: David Gibson 
> ---
>  hw/ppc/spapr_iommu.c   | 6 +++---
>  include/hw/ppc/spapr.h | 4 ++--
>  target-ppc/kvm.c   | 4 ++--
>  target-ppc/kvm_ppc.h   | 2 +-
>  4 files changed, 8 insertions(+), 8 deletions(-)
> 
> diff --git a/hw/ppc/spapr_iommu.c b/hw/ppc/spapr_iommu.c
> index f61504e..5166cde 100644
> --- a/hw/ppc/spapr_iommu.c
> +++ b/hw/ppc/spapr_iommu.c
> @@ -146,7 +146,7 @@ static int spapr_tce_table_realize(DeviceState *dev)
>  tcet->table = kvmppc_create_spapr_tce(tcet->liobn,
>window_size,
>&tcet->fd,
> -  tcet->vfio_accel);
> +  tcet->need_vfio);
>  }
>  
>  if (!tcet->table) {
> @@ -172,7 +172,7 @@ sPAPRTCETable *spapr_tce_new_table(DeviceState *owner, 
> uint32_t liobn,
> uint64_t bus_offset,
> uint32_t page_shift,
> uint32_t nb_table,
> -   bool vfio_accel)
> +   bool need_vfio)
>  {
>  sPAPRTCETable *tcet;
>  char tmp[64];
> @@ -192,7 +192,7 @@ sPAPRTCETable *spapr_tce_new_table(DeviceState *owner, 
> uint32_t liobn,
>  tcet->bus_offset = bus_offset;
>  tcet->page_shift = page_shift;
>  tcet->nb_table = nb_table;
> -tcet->vfio_accel = vfio_accel;
> +tcet->need_vfio = need_vfio;
>  
>  snprintf(tmp, sizeof(tmp), "tce-table-%x", liobn);
>  object_property_add_child(OBJECT(owner), tmp, OBJECT(tcet), NULL);
> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> index 56c5b0b..27d65d5 100644
> --- a/include/hw/ppc/spapr.h
> +++ b/include/hw/ppc/spapr.h
> @@ -563,7 +563,7 @@ struct sPAPRTCETable {
>  uint32_t page_shift;
>  uint64_t *table;
>  bool bypass;
> -bool vfio_accel;
> +bool need_vfio;
>  int fd;
>  MemoryRegion iommu;
>  struct VIOsPAPRDevice *vdev; /* for @bypass migration compatibility only 
> */
> @@ -588,7 +588,7 @@ sPAPRTCETable *spapr_tce_new_table(DeviceState *owner, 
> uint32_t liobn,
> uint64_t bus_offset,
> uint32_t page_shift,
> uint32_t nb_table,
> -   bool vfio_accel);
> +   bool need_vfio);
>  MemoryRegion *spapr_tce_get_iommu(sPAPRTCETable *tcet);
>  int spapr_dma_dt(void *fdt, int node_off, const char *propname,
>   uint32_t liobn, uint64_t window, uint32_t size);
> diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c
> index e641680..04ce614 100644
> --- a/target-ppc/kvm.c
> +++ b/target-ppc/kvm.c
> @@ -2071,7 +2071,7 @@ bool kvmppc_spapr_use_multitce(void)
>  }
>  
>  void *kvmppc_create_spapr_tce(uint32_t liobn, uint32_t window_size, int *pfd,
> -  bool vfio_accel)
> +  bool need_vfio)
>  {
>  struct kvm_create_spapr_tce args = {
>  .liobn = liobn,
> @@ -2085,7 +2085,7 @@ void *kvmppc_create_spapr_tce(uint32_t liobn, uint32_t 
> window_size, int *pfd,
>   * destroying the table, which the upper layers -will- do
>   */
>  *pfd = -1;
> -if (!cap_spapr_tce || (vfio_accel && !cap_spapr_vfio)) {
> +if (!cap_spapr_tce || (need_vfio && !cap_spapr_vfio)) {
>  return NULL;
>  }
>  
> diff --git a/target-ppc/kvm_ppc.h b/target-ppc/kvm_ppc.h
> index 470f6d6..309cbe0 100644
> --- a/target-ppc/kvm_ppc.h
> +++ b/target-ppc/kvm_ppc.h
> @@ -36,7 +36,7 @@ int kvmppc_booke_watchdog_enable(PowerPCCPU *cpu);
>  off_t kvmppc_alloc_rma(void **rma);
>  bool kvmppc_spapr_use_multitce(void);
>  void *kvmppc_create_spapr_tce(uint32_t liobn, uint32_t window_size, int *pfd,
> -  bool vfio_accel);
> +  bool need_vfio);
>  int kvmppc_remove_spapr_tce(void *table, int pfd, uint32_t window_size);
>  int kvmppc_reset_htab(int shift_hint);
>  uint64_t kvmppc_rma_size(uint64_t current_size, unsigned int hash_shift);
> 
Reviewed-by: Laurent Vivier 



Re: [Qemu-devel] [PATCH] spapr: add a default rng device

2015-09-30 Thread Thomas Huth
On 30/09/15 10:33, Greg Kurz wrote:
> On Tue, 29 Sep 2015 15:01:09 +1000
> David Gibson  wrote:
> 
>> On Mon, Sep 28, 2015 at 12:13:47PM +0200, Greg Kurz wrote:
>>> A recent patch by Thomas Huth brought a new spapr-rng pseudo-device to
>>> provide high-quality random numbers to guests. The device may either be
>>> backed by a "RngBackend" or the in-kernel implementation of the H_RANDOM
>>> hypercall.
>>>
>>> Since modern POWER8 based servers always provide a hardware rng, it makes
>>> sense to create a spapr-rng device with use-kvm=true by default when it
>>> is available.
>>>
>>> Of course we want the user to have full control on how the rng is handled.
>>> The default device WILL NOT be created in the following cases:
>>> - the -nodefaults option was passed
>>> - a spapr-rng device was already passed on the command line
>>>
>>> The default device is created at reset time to ensure devices specified on
>>> the command line have been created.
>>>
>>> Signed-off-by: Greg Kurz 
>>
>> So, I think the concept is ok, but..
>>
> 
> Just to be sure about the concept.
> 
> The goal is to free users from having to explicitly pass
> 
>   -device spapr-rng,use-kvm=true
> 
> ... when ALL the following conditions are met:
> 
> 1) KVM is used and advertises KVM_CAP_PPC_HWRNG
> 2) -nodefaults HAS NOT been passed on the cmdline
> 3) -device spapr-rng HAS NOT been passed on the cmdline
> 
>>> ---
>>>  hw/ppc/spapr.c   |   17 +
>>>  hw/ppc/spapr_rng.c   |2 +-
>>>  target-ppc/kvm.c |9 +
>>>  target-ppc/kvm_ppc.h |6 ++
>>>  4 files changed, 29 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
>>> index 7f4f196e53e5..ee048ecffd0c 100644
>>> --- a/hw/ppc/spapr.c
>>> +++ b/hw/ppc/spapr.c
>>> @@ -1059,6 +1059,14 @@ static int spapr_check_htab_fd(sPAPRMachineState 
>>> *spapr)
>>>  return rc;
>>>  }
>>>  
>>> +static void spapr_rng_create(void)
>>> +{
>>> +Object *rng = object_new(TYPE_SPAPR_RNG);
>>> +
>>> +object_property_set_bool(rng, true, "use-kvm", &error_abort);
>>> +object_property_set_bool(rng, true, "realized", &error_abort);
>>> +}
>>> +
>>>  static void ppc_spapr_reset(void)
>>>  {
>>>  sPAPRMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
>>> @@ -1082,6 +1090,15 @@ static void ppc_spapr_reset(void)
>>>  spapr->rtas_addr = rtas_limit - RTAS_MAX_SIZE;
>>>  spapr->fdt_addr = spapr->rtas_addr - FDT_MAX_SIZE;
>>>  
>>> +/* Create a rng device if the user did not provide it already and
>>> + * KVM has hwrng support.
>>> + */
>>> +if (defaults_enabled() &&
>>> +kvmppc_hwrng_present() &&
>>> +!object_resolve_path_type("", TYPE_SPAPR_RNG, NULL)) {
>>> +spapr_rng_create();
>>> +}
>>> +
>>
>> Constructing the RNG at reset time is just wrong.  Using
>> defaults_enabled() is ugly at the best of times, using it at reset,
>> after construction of the qom tree is generally complete, is just
>> hideous.
>>
> 
> Yeah I ended up with this hack because I could not figure out how
> to give priority to a spapr-rng device specified on the cmdline
> over the automatic one... poor QOM skills :\
> 
> If you have a suggestion to handle this case in a more appropriate way,
> and it is worth the pain compared to the gain, please advise.

Not sure whether this might be an acceptable solution, but maybe you
could use qemu_opts_foreach(qemu_find_opts("device"), ...) to check
whether a "spapr-rng" device has been specified at the command line?
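
Roughly, the idea is a callback that is run over every "-device" option
group and reports whether one of them names the driver. The following is a
standalone model of that idea only, not the real QemuOpts API: DeviceOpt,
opts_foreach, match_driver and user_requested are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for one parsed "-device driver,..." option group. */
typedef struct {
    const char *driver;
} DeviceOpt;

typedef int (*opt_cb)(const DeviceOpt *opt, void *opaque);

/* Calls cb on each entry and stops at the first non-zero return value. */
static int opts_foreach(const DeviceOpt *opts, size_t n, opt_cb cb,
                        void *opaque)
{
    for (size_t i = 0; i < n; i++) {
        int rc = cb(&opts[i], opaque);

        if (rc) {
            return rc;
        }
    }
    return 0;
}

/* Callback: non-zero when the option group names the wanted driver. */
static int match_driver(const DeviceOpt *opt, void *opaque)
{
    const char *wanted = opaque;

    return opt->driver && strcmp(opt->driver, wanted) == 0;
}

/* True if any option group on the (modelled) command line uses driver. */
static bool user_requested(const DeviceOpt *opts, size_t n,
                           const char *driver)
{
    return opts_foreach(opts, n, match_driver, (void *)driver) != 0;
}
```

With something like this the check could run at machine init time instead
of at reset, since it only looks at the options and not at the QOM tree.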

 Thomas






Re: [Qemu-devel] [PATCHv3 6/7] vfio: Allow hotplug of containers onto existing guest IOMMU mappings

2015-09-30 Thread Laurent Vivier


On 30/09/2015 04:13, David Gibson wrote:
> At present the memory listener used by vfio to keep host IOMMU mappings
> in sync with the guest memory image assumes that if a guest IOMMU
> appears, then it has no existing mappings.
> 
> This may not be true if a VFIO device is hotplugged onto a guest bus
> which didn't previously include a VFIO device, and which has existing
> guest IOMMU mappings.
> 
> Therefore, use the memory_region_register_iommu_notifier_replay()
> function in order to fix this case, replaying existing guest IOMMU
> mappings, bringing the host IOMMU into sync with the guest IOMMU.
> 
> Signed-off-by: David Gibson 
> ---
>  hw/vfio/common.c | 23 +--
>  1 file changed, 9 insertions(+), 14 deletions(-)
> 
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index f666de2..6797208 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -312,6 +312,11 @@ out:
>  rcu_read_unlock();
>  }
>  
> +static hwaddr vfio_container_granularity(VFIOContainer *container)
> +{
> +return (hwaddr)1 << ctz64(container->iova_pgsizes);
> +}
> +
>  static void vfio_listener_region_add(MemoryListener *listener,
>   MemoryRegionSection *section)
>  {
> @@ -369,26 +374,16 @@ static void vfio_listener_region_add(MemoryListener 
> *listener,
>   * would be the right place to wire that up (tell the KVM
>   * device emulation the VFIO iommu handles to use).
>   */
> -/*
> - * This assumes that the guest IOMMU is empty of
> - * mappings at this point.
> - *
> - * One way of doing this is:
> - * 1. Avoid sharing IOMMUs between emulated devices or different
> - * IOMMU groups.
> - * 2. Implement VFIO_IOMMU_ENABLE in the host kernel to fail if
> - * there are some mappings in IOMMU.
> - *
> - * VFIO on SPAPR does that. Other IOMMU models may do that different,
> - * they must make sure there are no existing mappings or
> - * loop through existing mappings to map them into VFIO.
> - */
>  giommu = g_malloc0(sizeof(*giommu));
>  giommu->iommu = section->mr;
>  giommu->container = container;
>  giommu->n.notify = vfio_iommu_map_notify;
>  QLIST_INSERT_HEAD(&container->giommu_list, giommu, giommu_next);
> +
>  memory_region_register_iommu_notifier(giommu->iommu, &giommu->n);
> +memory_region_iommu_replay(giommu->iommu, &giommu->n,
> +   vfio_container_granularity(container),
> +   false);

I'm wondering whether it makes sense to expose the "is_write" information
at this level of the API: I don't think we have access to this
information when we call this function (so it will either always be called
with false, or called twice, once with false and once with true). I think
it would be better to manage this internally.

-
>  
>  return;
>  }
> 



Re: [Qemu-devel] [PATCH v6 06/24] memfd: add fallback for memfd

2015-09-30 Thread Michael S. Tsirkin
On Tue, Sep 29, 2015 at 06:34:36PM +0200, marcandre.lur...@redhat.com wrote:
> From: Marc-André Lureau 
> 
> Add an open/unlink/mmap fallback for systems that do not support memfd.
> This patch may require additional SELinux policies to work for enforced
> systems, but should gracefully fail nonetheless.
> 
> Signed-off-by: Marc-André Lureau 

I'd rather just fail migration.

> ---
>  util/memfd.c | 22 --
>  1 file changed, 20 insertions(+), 2 deletions(-)
> 
> diff --git a/util/memfd.c b/util/memfd.c
> index 3168902..970b5b0 100644
> --- a/util/memfd.c
> +++ b/util/memfd.c
> @@ -84,8 +84,26 @@ void *qemu_memfd_alloc(const char *name, size_t size, 
> unsigned int seals,
>  return NULL;
>  }
>  } else {
> -perror("memfd");
> -return NULL;
> +const char *tmpdir = getenv("TMPDIR");
> +gchar *fname;
> +
> +tmpdir = tmpdir ? tmpdir : "/tmp";
> +
> +fname = g_strdup_printf("%s/memfd-XX", tmpdir);
> +mfd = mkstemp(fname);
> +unlink(fname);
> +g_free(fname);
> +
> +if (mfd == -1) {
> +perror("mkstemp");
> +return NULL;
> +}
> +
> +if (ftruncate(mfd, size) == -1) {
> +perror("ftruncate");
> +close(mfd);
> +return NULL;
> +}
>  }
>  
>  ptr = mmap(0, size, PROT_READ | PROT_WRITE, MAP_SHARED, mfd, 0);
> -- 
> 2.4.3



[Qemu-devel] [PULL 6/6] migration: Disambiguate MAX_THROTTLE

2015-09-30 Thread Juan Quintela
From: "Jason J. Herne" 

Migration has a define for MAX_THROTTLE. Update comment to clarify that this is
used for throttling transfer speed. Hopefully this will prevent it from being
confused with a guest cpu throttling entity.

Signed-off-by: Jason J. Herne 
Reviewed-by: Dr. David Alan Gilbert 
Reviewed-by: Eric Blake 
Signed-off-by: Juan Quintela 
Reviewed-by: Juan Quintela 
---
 migration/migration.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/migration/migration.c b/migration/migration.c
index c7472ed..b7de9b7 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -31,7 +31,7 @@
 #include "qapi-event.h"
 #include "qom/cpu.h"

-#define MAX_THROTTLE  (32 << 20)  /* Migration speed throttling */
+#define MAX_THROTTLE  (32 << 20)  /* Migration transfer speed throttling */

 /* Amount of time to allocate to each "chunk" of bandwidth-throttled
  * data. */
-- 
2.4.3




[Qemu-devel] [PULL 5/6] qmp/hmp: Add throttle ratio to query-migrate and info migrate

2015-09-30 Thread Juan Quintela
From: "Jason J. Herne" 

Report throttle percentage in info migrate and query-migrate responses when
cpu throttling is active.

Signed-off-by: Jason J. Herne 
Reviewed-by: Dr. David Alan Gilbert 
Signed-off-by: Juan Quintela 
Reviewed-by: Juan Quintela 
---
 hmp.c | 5 +
 migration/migration.c | 5 +
 qapi-schema.json  | 7 ++-
 3 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/hmp.c b/hmp.c
index 48ce372..5048eee 100644
--- a/hmp.c
+++ b/hmp.c
@@ -232,6 +232,11 @@ void hmp_info_migrate(Monitor *mon, const QDict *qdict)
info->xbzrle_cache->overflow);
 }

+if (info->has_x_cpu_throttle_percentage) {
+monitor_printf(mon, "cpu throttle percentage: %" PRIu64 "\n",
+   info->x_cpu_throttle_percentage);
+}
+
 qapi_free_MigrationInfo(info);
 qapi_free_MigrationCapabilityStatusList(caps);
 }
diff --git a/migration/migration.c b/migration/migration.c
index e829231..c7472ed 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -447,6 +447,11 @@ MigrationInfo *qmp_query_migrate(Error **errp)
 info->disk->total = blk_mig_bytes_total();
 }

+if (cpu_throttle_active()) {
+info->has_x_cpu_throttle_percentage = true;
+info->x_cpu_throttle_percentage = cpu_throttle_get_percentage();
+}
+
 get_xbzrle_cache_stats(info);
 break;
 case MIGRATION_STATUS_COMPLETED:
diff --git a/qapi-schema.json b/qapi-schema.json
index 646c0fa..8b0520c 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -480,6 +480,10 @@
 #may be expensive, but do not actually occur during the iterative
 #migration rounds themselves. (since 1.6)
 #
+# @x-cpu-throttle-percentage: #optional percentage of time guest cpus are being
+#   throttled during auto-converge. This is only present when auto-converge
+#   has started throttling guest cpus. (Since 2.5)
+#
 # Since: 0.14.0
 ##
 { 'struct': 'MigrationInfo',
@@ -489,7 +493,8 @@
'*total-time': 'int',
'*expected-downtime': 'int',
'*downtime': 'int',
-   '*setup-time': 'int'} }
+   '*setup-time': 'int',
+   '*x-cpu-throttle-percentage': 'int'} }

 ##
 # @query-migrate
-- 
2.4.3




Re: [Qemu-devel] [PATCH] spapr: add a default rng device

2015-09-30 Thread Greg Kurz
On Tue, 29 Sep 2015 15:01:09 +1000
David Gibson  wrote:

> On Mon, Sep 28, 2015 at 12:13:47PM +0200, Greg Kurz wrote:
> > A recent patch by Thomas Huth brought a new spapr-rng pseudo-device to
> > provide high-quality random numbers to guests. The device may either be
> > backed by a "RngBackend" or the in-kernel implementation of the H_RANDOM
> > hypercall.
> > 
> > Since modern POWER8 based servers always provide a hardware rng, it makes
> > sense to create a spapr-rng device with use-kvm=true by default when it
> > is available.
> > 
> > Of course we want the user to have full control on how the rng is handled.
> > The default device WILL NOT be created in the following cases:
> > - the -nodefaults option was passed
> > - a spapr-rng device was already passed on the command line
> > 
> > The default device is created at reset time to ensure devices specified on
> > the command line have been created.
> > 
> > Signed-off-by: Greg Kurz 
> 
> So, I think the concept is ok, but..
> 

Just to be sure about the concept.

The goal is to free users from having to explicitly pass

-device spapr-rng,use-kvm=true

... when ALL the following conditions are met:

1) KVM is used and advertises KVM_CAP_PPC_HWRNG
2) -nodefaults HAS NOT been passed on the cmdline
3) -device spapr-rng HAS NOT been passed on the cmdline
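
To make the three conditions concrete, here is a minimal standalone sketch
in plain C. It is not QEMU code: the struct and the function name are made
up for illustration, and the inputs would really come from the KVM capability
check and from command-line parsing, neither of which is modelled here.

```c
#include <stdbool.h>

/* Hypothetical inputs, named only for this sketch. */
struct rng_policy_inputs {
    bool kvm_hwrng_available; /* 1) KVM is used and advertises KVM_CAP_PPC_HWRNG */
    bool nodefaults;          /* 2) -nodefaults was passed on the cmdline */
    bool user_spapr_rng;      /* 3) -device spapr-rng was passed on the cmdline */
};

/* Returns true only when all three conditions listed above are met. */
static bool want_default_spapr_rng(const struct rng_policy_inputs *in)
{
    return in->kvm_hwrng_available && !in->nodefaults && !in->user_spapr_rng;
}
```

With this shape, a user-supplied spapr-rng or -nodefaults always wins over
the automatic device, whatever the host supports.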

> > ---
> >  hw/ppc/spapr.c   |   17 +
> >  hw/ppc/spapr_rng.c   |2 +-
> >  target-ppc/kvm.c |9 +
> >  target-ppc/kvm_ppc.h |6 ++
> >  4 files changed, 29 insertions(+), 5 deletions(-)
> > 
> > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> > index 7f4f196e53e5..ee048ecffd0c 100644
> > --- a/hw/ppc/spapr.c
> > +++ b/hw/ppc/spapr.c
> > @@ -1059,6 +1059,14 @@ static int spapr_check_htab_fd(sPAPRMachineState 
> > *spapr)
> >  return rc;
> >  }
> >  
> > +static void spapr_rng_create(void)
> > +{
> > +Object *rng = object_new(TYPE_SPAPR_RNG);
> > +
> > +object_property_set_bool(rng, true, "use-kvm", &error_abort);
> > +object_property_set_bool(rng, true, "realized", &error_abort);
> > +}
> > +
> >  static void ppc_spapr_reset(void)
> >  {
> >  sPAPRMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
> > @@ -1082,6 +1090,15 @@ static void ppc_spapr_reset(void)
> >  spapr->rtas_addr = rtas_limit - RTAS_MAX_SIZE;
> >  spapr->fdt_addr = spapr->rtas_addr - FDT_MAX_SIZE;
> >  
> > +/* Create a rng device if the user did not provide it already and
> > + * KVM has hwrng support.
> > + */
> > +if (defaults_enabled() &&
> > +kvmppc_hwrng_present() &&
> > +!object_resolve_path_type("", TYPE_SPAPR_RNG, NULL)) {
> > +spapr_rng_create();
> > +}
> > +
> 
> Constructing the RNG at reset time is just wrong.  Using
> defaults_enabled() is ugly at the best of times, using it at reset,
> after construction of the qom tree is generally complete, is just
> hideous.
> 

Yeah I ended up with this hack because I could not figure out how
to give priority to a spapr-rng device specified on the cmdline
over the automatic one... poor QOM skills :\

If you have a suggestion to handle this case in a more appropriate way,
and it is worth the pain compared to the gain, please advise.

Thanks.

--
Greg




[Qemu-devel] [PULL 4/6] migration: Dynamic cpu throttling for auto-converge

2015-09-30 Thread Juan Quintela
From: "Jason J. Herne" 

Remove traditional auto-converge static 30ms throttling code and replace it
with a dynamic throttling algorithm.

Additionally, be more aggressive when deciding when to start throttling.
Previously we waited until four unproductive memory passes. Now we begin
throttling after only two unproductive memory passes. Four seemed quite
arbitrary and only waiting for two passes allows us to complete the migration
faster.

Signed-off-by: Jason J. Herne 
Reviewed-by: Matthew Rosato 
Signed-off-by: Juan Quintela 
Reviewed-by: Juan Quintela 
---
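
For readers who want the escalation rule in isolation, a minimal sketch
follows. It is not the patch itself: next_throttle_pct() and clamp_pct() are
hypothetical names, the 1..99 bounds are the ones from the vcpu throttling
patch in this series, and the initial/increment values used when calling it
are up to the caller (they are migration parameters in the real code).

```c
/* Bounds as defined by the vcpu throttling interface patch. */
#define CPU_THROTTLE_PCT_MIN 1
#define CPU_THROTTLE_PCT_MAX 99

/* Keep a requested percentage inside the valid range. */
static int clamp_pct(int pct)
{
    if (pct > CPU_THROTTLE_PCT_MAX) {
        return CPU_THROTTLE_PCT_MAX;
    }
    if (pct < CPU_THROTTLE_PCT_MIN) {
        return CPU_THROTTLE_PCT_MIN;
    }
    return pct;
}

/* current == 0 means throttling has not started yet.
 * First call starts at 'initial', later calls add 'increment'. */
static int next_throttle_pct(int current, int initial, int increment)
{
    if (current == 0) {
        return clamp_pct(initial);
    }
    return clamp_pct(current + increment);
}
```

Assuming, purely as an example, initial=20 and increment=10, successive
unproductive passes would give 20, 30, 40, ... and saturate at 99.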
 migration/migration.c |  4 +++
 migration/ram.c   | 89 +--
 2 files changed, 34 insertions(+), 59 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 8a1af3b..e829231 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -29,6 +29,7 @@
 #include "trace.h"
 #include "qapi/util.h"
 #include "qapi-event.h"
+#include "qom/cpu.h"

 #define MAX_THROTTLE  (32 << 20)  /* Migration speed throttling */

@@ -1070,6 +1071,9 @@ static void *migration_thread(void *opaque)
 }
 }

+/* If we enabled cpu throttling for auto-converge, turn it off. */
+cpu_throttle_stop();
+
 qemu_mutex_lock_iothread();
 if (s->state == MIGRATION_STATUS_COMPLETED) {
 int64_t end_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
diff --git a/migration/ram.c b/migration/ram.c
index 5187637..2d1d0b9 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -47,9 +47,7 @@
 do { } while (0)
 #endif

-static bool mig_throttle_on;
 static int dirty_rate_high_cnt;
-static void check_guest_throttling(void);

 static uint64_t bitmap_sync_count;

@@ -407,6 +405,29 @@ static size_t save_page_header(QEMUFile *f, RAMBlock 
*block, ram_addr_t offset)
 return size;
 }

+/* Reduce amount of guest cpu execution to hopefully slow down memory writes.
+ * If guest dirty memory rate is reduced below the rate at which we can
+ * transfer pages to the destination then we should be able to complete
+ * migration. Some workloads dirty memory way too fast and will not effectively
+ * converge, even with auto-converge.
+ */
+static void mig_throttle_guest_down(void)
+{
+MigrationState *s = migrate_get_current();
+uint64_t pct_initial =
+s->parameters[MIGRATION_PARAMETER_X_CPU_THROTTLE_INITIAL];
+uint64_t pct_icrement =
+s->parameters[MIGRATION_PARAMETER_X_CPU_THROTTLE_INCREMENT];
+
+/* We have not started throttling yet. Let's start it. */
+if (!cpu_throttle_active()) {
+cpu_throttle_set(pct_initial);
+} else {
+/* Throttling already on, just increase the rate */
+cpu_throttle_set(cpu_throttle_get_percentage() + pct_icrement);
+}
+}
+
 /* Update the xbzrle cache to reflect a page that's been sent as all 0.
  * The important thing is that a stale (not-yet-0'd) page be replaced
  * by the new data.
@@ -599,21 +620,21 @@ static void migration_bitmap_sync(void)
 /* The following detection logic can be refined later. For now:
Check to see if the dirtied bytes is 50% more than the approx.
amount of bytes that just got transferred since the last time we
-   were in this routine. If that happens >N times (for now N==4)
-   we turn on the throttle down logic */
+   were in this routine. If that happens twice, start or increase
+   throttling */
 bytes_xfer_now = ram_bytes_transferred();
+
 if (s->dirty_pages_rate &&
(num_dirty_pages_period * TARGET_PAGE_SIZE >
(bytes_xfer_now - bytes_xfer_prev)/2) &&
-   (dirty_rate_high_cnt++ > 4)) {
+   (dirty_rate_high_cnt++ >= 2)) {
 trace_migration_throttle();
-mig_throttle_on = true;
 dirty_rate_high_cnt = 0;
+mig_throttle_guest_down();
  }
  bytes_xfer_prev = bytes_xfer_now;
-} else {
- mig_throttle_on = false;
 }
+
 if (migrate_use_xbzrle()) {
 if (iterations_prev != acct_info.iterations) {
 acct_info.xbzrle_cache_miss_rate =
@@ -1146,7 +1167,6 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
 RAMBlock *block;
 int64_t ram_bitmap_pages; /* Size of bitmap in pages, including gaps */

-mig_throttle_on = false;
 dirty_rate_high_cnt = 0;
 bitmap_sync_count = 0;
 migration_bitmap_sync_init();
@@ -1251,7 +1271,7 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
 }
 pages_sent += pages;
 acct_info.iterations++;
-check_guest_throttling();
+
 /* we want to check in the 1st loop, just in case it was the 1st time
and we had to sync the dirty bitmap.
qemu_get_clock_ns() is a bit expensive, so we only check each some
@@ -1664,52 +1684,3 @@ void ram_mig

Re: [Qemu-devel] [PATCH v6 06/24] memfd: add fallback for memfd

2015-09-30 Thread Michael S. Tsirkin
On Wed, Sep 30, 2015 at 05:06:55AM -0400, Marc-André Lureau wrote:
> Hi
> 
> - Original Message -
> > On Tue, Sep 29, 2015 at 06:34:36PM +0200, marcandre.lur...@redhat.com wrote:
> > > From: Marc-André Lureau 
> > > 
> > > Add an open/unlink/mmap fallback for systems that do not support memfd.
> > > This patch may require additional SELinux policies to work for enforced
> > > systems, but should gracefully fail nonetheless.
> > > 
> > > Signed-off-by: Marc-André Lureau 
> > 
> > I'd rather just fail migration.
> 
> So we don't provide this compatibility code and migration should fail.
> 
> Would it be enough to check if memfd works at early runtime and add a 
> migration blocker for vhost-user? Or is it possible to recover if migration 
> fails when memfd fails to allocate? I would think the former is better.

Fine with me.
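
A rough sketch of what such an early-runtime probe could look like, assuming
Linux and the raw syscall. memfd_usable() is a hypothetical name, and wiring
the result up to a migration blocker for vhost-user is not shown.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <stdbool.h>
#include <unistd.h>
#include <sys/syscall.h>

/* Returns true if the running kernel accepts memfd_create().
 * On hosts whose headers lack the syscall number this reports false. */
static bool memfd_usable(void)
{
#ifdef __NR_memfd_create
    /* Flags 0: we only care whether the call succeeds at all. */
    int fd = (int)syscall(__NR_memfd_create, "probe", 0u);

    if (fd >= 0) {
        close(fd);
        return true;
    }
#endif
    return false;
}
```

The probe creates and immediately closes a descriptor, so it has no lasting
effect; a false result would be the point at which to register the blocker.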

> > 
> > > ---
> > >  util/memfd.c | 22 --
> > >  1 file changed, 20 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/util/memfd.c b/util/memfd.c
> > > index 3168902..970b5b0 100644
> > > --- a/util/memfd.c
> > > +++ b/util/memfd.c
> > > @@ -84,8 +84,26 @@ void *qemu_memfd_alloc(const char *name, size_t size,
> > > unsigned int seals,
> > >  return NULL;
> > >  }
> > >  } else {
> > > -perror("memfd");
> > > -return NULL;
> > > +const char *tmpdir = getenv("TMPDIR");
> > > +gchar *fname;
> > > +
> > > +tmpdir = tmpdir ? tmpdir : "/tmp";
> > > +
> > > +fname = g_strdup_printf("%s/memfd-XX", tmpdir);
> > > +mfd = mkstemp(fname);
> > > +unlink(fname);
> > > +g_free(fname);
> > > +
> > > +if (mfd == -1) {
> > > +perror("mkstemp");
> > > +return NULL;
> > > +}
> > > +
> > > +if (ftruncate(mfd, size) == -1) {
> > > +perror("ftruncate");
> > > +close(mfd);
> > > +return NULL;
> > > +}
> > >  }
> > >  
> > >  ptr = mmap(0, size, PROT_READ | PROT_WRITE, MAP_SHARED, mfd, 0);
> > > --
> > > 2.4.3
> > 



Re: [Qemu-devel] [PATCH 3/3] macio: move DBDMA_init from instance_init to realize

2015-09-30 Thread Thomas Huth
On 29/09/15 14:37, Paolo Bonzini wrote:
> DBDMA_init is not idempotent, and calling it from instance_init
> breaks a simple object_new/object_unref pair.  Work around this,
> pending qdev-ification of DBDMA, by moving the call to realize.
> 
> Reported-by: Markus Armbruster 
> Signed-off-by: Paolo Bonzini 
> ---
>  hw/misc/macio/macio.c | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/hw/misc/macio/macio.c b/hw/misc/macio/macio.c
> index 2548d96..c661f86 100644
> --- a/hw/misc/macio/macio.c
> +++ b/hw/misc/macio/macio.c
> @@ -131,6 +131,10 @@ static void macio_common_realize(PCIDevice *d, Error 
> **errp)
>  MacIOState *s = MACIO(d);
>  SysBusDevice *sysbus_dev;
>  Error *err = NULL;
> +MemoryRegion *dbdma_mem;
> +
> +s->dbdma = DBDMA_init(&dbdma_mem);
> +memory_region_add_subregion(&s->bar, 0x08000, dbdma_mem);
>  
>  object_property_set_bool(OBJECT(&s->cuda), true, "realized", &err);
>  if (err) {
> @@ -328,16 +332,12 @@ static void macio_newworld_init(Object *obj)
>  static void macio_instance_init(Object *obj)
>  {
>  MacIOState *s = MACIO(obj);
> -MemoryRegion *dbdma_mem;
>  
>  memory_region_init(&s->bar, obj, "macio", 0x8);
>  
>  object_initialize(&s->cuda, sizeof(s->cuda), TYPE_CUDA);
>  qdev_set_parent_bus(DEVICE(&s->cuda), sysbus_get_default());
>  object_property_add_child(obj, "cuda", OBJECT(&s->cuda), NULL);
> -
> -s->dbdma = DBDMA_init(&dbdma_mem);
> -memory_region_add_subregion(&s->bar, 0x08000, dbdma_mem);
>  }
>  
>  static const VMStateDescription vmstate_macio_oldworld = {

Reviewed-by: Thomas Huth 





Re: [Qemu-devel] [PATCH v6 05/24] util: add memfd helpers

2015-09-30 Thread Michael S. Tsirkin
On Tue, Sep 29, 2015 at 06:34:35PM +0200, marcandre.lur...@redhat.com wrote:
> From: Marc-André Lureau 
> 
> Add qemu_memfd_alloc/free() helpers.
> 
> The function helps to allocate and seal a memfd.
> 
> Signed-off-by: Marc-André Lureau 
> ---
>  include/qemu/memfd.h |  4 
>  util/memfd.c | 59 
> ++--
>  2 files changed, 61 insertions(+), 2 deletions(-)
> 
> diff --git a/include/qemu/memfd.h b/include/qemu/memfd.h
> index 8b1fe6a..950fb88 100644
> --- a/include/qemu/memfd.h
> +++ b/include/qemu/memfd.h
> @@ -17,4 +17,8 @@
>  #define F_SEAL_WRITE0x0008  /* prevent writes */
>  #endif
>  
> +void *qemu_memfd_alloc(const char *name, size_t size, unsigned int seals,
> +   int *fd);
> +void qemu_memfd_free(void *ptr, size_t size, int fd);
> +
>  #endif /* QEMU_MEMFD_H */
> diff --git a/util/memfd.c b/util/memfd.c
> index a98d57e..3168902 100644
> --- a/util/memfd.c
> +++ b/util/memfd.c
> @@ -27,6 +27,14 @@
>  
>  #include "config-host.h"
>  
> +#include 
> +#include 
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +
>  #include "qemu/memfd.h"
>  
>  #ifdef CONFIG_MEMFD
> @@ -44,13 +52,60 @@
>  #define MFD_ALLOW_SEALING 0x0002U
>  #endif
>  
> -static inline int memfd_create(const char *name, unsigned int flags)
> +static int memfd_create(const char *name, unsigned int flags)
>  {
>  return syscall(__NR_memfd_create, name, flags);
>  }
>  #else /* !LINUX */
> -static inline int memfd_create(const char *name, unsigned int flags)
> +static int memfd_create(const char *name, unsigned int flags)
>  {
>  return -1;
>  }
>  #endif
> +
> +void *qemu_memfd_alloc(const char *name, size_t size, unsigned int seals,
> +   int *fd)
> +{
> +void *ptr;
> +int mfd;
> +
> +*fd = -1;
> +mfd = memfd_create(name, MFD_ALLOW_SEALING | MFD_CLOEXEC);
> +if (mfd != -1) {
> +if (ftruncate(mfd, size) == -1) {
> +perror("ftruncate");
> +close(mfd);
> +return NULL;
> +}
> +
> +if (fcntl(mfd, F_ADD_SEALS, seals) == -1) {
> +perror("fcntl");
> +close(mfd);
> +return NULL;
> +}

Why do it here? I note that you don't try to do this with the tmpfs
fallback.

> +} else {
> +perror("memfd");
> +return NULL;
> +}
> +
> +ptr = mmap(0, size, PROT_READ | PROT_WRITE, MAP_SHARED, mfd, 0);
> +if (ptr == MAP_FAILED) {
> +perror("mmap");
> +close(mfd);
> +return NULL;
> +}
> +
> +*fd = mfd;
> +return ptr;
> +}
> +
> +void qemu_memfd_free(void *ptr, size_t size, int fd)
> +{
> +if (ptr) {
> +munmap(ptr, size);
> +}
> +
> +if (fd != -1) {
> +close(fd);
> +}
> +}
> -- 
> 2.4.3



Re: [Qemu-devel] [PATCH v6 04/24] util: add linux-only memfd fallback

2015-09-30 Thread Michael S. Tsirkin
On Tue, Sep 29, 2015 at 06:34:34PM +0200, marcandre.lur...@redhat.com wrote:
> From: Marc-André Lureau 
> 
> Implement memfd_create() fallback if not available in system libc.
> memfd_create() is still not included in glibc today, although it's been
> available since Linux 3.17 in Oct 2014.
> 
> memfd has numerous advantages over traditional shm/mmap for ipc memory
> sharing with fd handler, which we are going to make use of for
> vhost-user logging memory in following patches.
> 
> Signed-off-by: Marc-André Lureau 
> ---
>  include/qemu/memfd.h | 20 +++
>  util/Makefile.objs   |  2 +-
>  util/memfd.c | 56 
> 
>  3 files changed, 77 insertions(+), 1 deletion(-)
>  create mode 100644 include/qemu/memfd.h
>  create mode 100644 util/memfd.c
> 
> diff --git a/include/qemu/memfd.h b/include/qemu/memfd.h
> new file mode 100644
> index 000..8b1fe6a
> --- /dev/null
> +++ b/include/qemu/memfd.h
> @@ -0,0 +1,20 @@
> +#ifndef QEMU_MEMFD_H
> +#define QEMU_MEMFD_H
> +
> +#include "config-host.h"
> +
> +#ifndef F_LINUX_SPECIFIC_BASE
> +#define F_LINUX_SPECIFIC_BASE 1024
> +#endif
> +
> +#ifndef F_ADD_SEALS
> +#define F_ADD_SEALS (F_LINUX_SPECIFIC_BASE + 9)
> +#define F_GET_SEALS (F_LINUX_SPECIFIC_BASE + 10)
> +
> +#define F_SEAL_SEAL 0x0001  /* prevent further seals from being set */
> +#define F_SEAL_SHRINK   0x0002  /* prevent file from shrinking */
> +#define F_SEAL_GROW 0x0004  /* prevent file from growing */
> +#define F_SEAL_WRITE0x0008  /* prevent writes */
> +#endif
> +
> +#endif /* QEMU_MEMFD_H */
> diff --git a/util/Makefile.objs b/util/Makefile.objs
> index 114d657..84c5485 100644
> --- a/util/Makefile.objs
> +++ b/util/Makefile.objs
> @@ -1,6 +1,6 @@
>  util-obj-y = osdep.o cutils.o unicode.o qemu-timer-common.o
>  util-obj-$(CONFIG_WIN32) += oslib-win32.o qemu-thread-win32.o 
> event_notifier-win32.o
> -util-obj-$(CONFIG_POSIX) += oslib-posix.o qemu-thread-posix.o 
> event_notifier-posix.o qemu-openpty.o
> +util-obj-$(CONFIG_POSIX) += oslib-posix.o qemu-thread-posix.o 
> event_notifier-posix.o qemu-openpty.o memfd.o
>  util-obj-y += envlist.o path.o module.o
>  util-obj-$(call lnot,$(CONFIG_INT128)) += host-utils.o
>  util-obj-y += bitmap.o bitops.o hbitmap.o
> diff --git a/util/memfd.c b/util/memfd.c
> new file mode 100644
> index 000..a98d57e
> --- /dev/null
> +++ b/util/memfd.c
> @@ -0,0 +1,56 @@
> +/*
> + * memfd.c
> + *
> + * Copyright (c) 2015 Red Hat, Inc.
> + *
> + * QEMU library functions on POSIX which are shared between QEMU and
> + * the QEMU tools.
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a 
> copy
> + * of this software and associated documentation files (the "Software"), to 
> deal
> + * in the Software without restriction, including without limitation the 
> rights
> + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
> + * copies of the Software, and to permit persons to whom the Software is
> + * furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING 
> FROM,
> + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
> + * THE SOFTWARE.
> + */
> +
> +#include "config-host.h"
> +
> +#include "qemu/memfd.h"
> +
> +#ifdef CONFIG_MEMFD
> +#include 

So the file is empty if CONFIG_MEMFD is set?
It would be more elegant to avoid linking it in.
Can be a patch on top.

> +#elif defined CONFIG_LINUX
> +#include 
> +#include 
> +#include 
> +
> +#ifndef MFD_CLOEXEC
> +#define MFD_CLOEXEC 0x0001U
> +#endif
> +
> +#ifndef MFD_ALLOW_SEALING
> +#define MFD_ALLOW_SEALING 0x0002U
> +#endif
> +
> +static inline int memfd_create(const char *name, unsigned int flags)
> +{
> +return syscall(__NR_memfd_create, name, flags);
> +}
> +#else /* !LINUX */
> +static inline int memfd_create(const char *name, unsigned int flags)
> +{
> +return -1;
> +}
> +#endif
> -- 
> 2.4.3



Re: [Qemu-devel] [PATCH 3/4] spapr_iommu: Provide a function to switch a TCE table to allowing VFIO

2015-09-30 Thread Laurent Vivier


On 30/09/2015 05:48, David Gibson wrote:
> Because of the way non-VFIO guest IOMMU operations are KVM accelerated, not
> all TCE tables (guest IOMMU contexts) can support VFIO devices.  Currently,
> this is decided at creation time.
> 
> To support hotplug of VFIO devices, we need to allow a TCE table which
> previously didn't allow VFIO devices to be switched so that it can.  This
> patch adds an spapr_tce_set_need_vfio() function to do this, by
> reallocating the table in userspace if necessary.
> 
> Currently this doesn't allow the KVM acceleration to be re-enabled if all
> the VFIO devices are removed.  That's an optimization for another time.
> 
> Signed-off-by: David Gibson 
> ---
>  hw/ppc/spapr_iommu.c   | 32 
>  include/hw/ppc/spapr.h |  2 ++
>  2 files changed, 34 insertions(+)
> 
> diff --git a/hw/ppc/spapr_iommu.c b/hw/ppc/spapr_iommu.c
> index 5166cde..8d60f8b 100644
> --- a/hw/ppc/spapr_iommu.c
> +++ b/hw/ppc/spapr_iommu.c
> @@ -168,6 +168,38 @@ static int spapr_tce_table_realize(DeviceState *dev)
>  return 0;
>  }
>  
> +void spapr_tce_set_need_vfio(sPAPRTCETable *tcet, bool need_vfio)
> +{
> +size_t table_size = tcet->nb_table * sizeof(uint64_t);
> +void *newtable;
> +
> +if (need_vfio == tcet->need_vfio) {
> +/* Nothing to do */
> +return;
> +}
> +
> +if (!need_vfio) {
> +/* FIXME: We don't support transition back to KVM accelerated
> + * TCEs yet */

Report some warnings ?

> +return;
> +}
> +
> +tcet->need_vfio = true;
> +
> +if (tcet->fd < 0) {
> +/* Table is already in userspace, nothing to do */
> +return;
> +}
> +
> +newtable = g_malloc0(table_size);
> +memcpy(newtable, tcet->table, table_size);
> +
> +kvmppc_remove_spapr_tce(tcet->table, tcet->fd, tcet->nb_table);
> +
> +tcet->fd = -1;
> +tcet->table = newtable;
> +}
> +
>  sPAPRTCETable *spapr_tce_new_table(DeviceState *owner, uint32_t liobn,
> uint64_t bus_offset,
> uint32_t page_shift,
> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> index 27d65d5..5baa906 100644
> --- a/include/hw/ppc/spapr.h
> +++ b/include/hw/ppc/spapr.h
> @@ -589,6 +589,8 @@ sPAPRTCETable *spapr_tce_new_table(DeviceState *owner, 
> uint32_t liobn,
> uint32_t page_shift,
> uint32_t nb_table,
> bool need_vfio);
> +void spapr_tce_set_need_vfio(sPAPRTCETable *tcet, bool need_vfio);
> +
>  MemoryRegion *spapr_tce_get_iommu(sPAPRTCETable *tcet);
>  int spapr_dma_dt(void *fdt, int node_off, const char *propname,
>   uint32_t liobn, uint64_t window, uint32_t size);
> 

Reviewed-by: Laurent Vivier 
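
To illustrate the warning suggested above, here is a simplified standalone
sketch of the same switch. It is not the QEMU code: struct tce_table is a
cut-down, hypothetical stand-in for sPAPRTCETable, free() stands in for
kvmppc_remove_spapr_tce(), and the warning text is only an example.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for sPAPRTCETable. */
struct tce_table {
    uint64_t *table;
    size_t nb_table;
    int fd;          /* >= 0: kernel-backed table, -1: plain userspace memory */
    bool need_vfio;
};

/* Returns false when the requested transition cannot be performed. */
static bool tce_set_need_vfio(struct tce_table *t, bool need_vfio)
{
    uint64_t *copy;

    if (need_vfio == t->need_vfio) {
        return true;                  /* nothing to do */
    }
    if (!need_vfio) {
        /* Going back to the accelerated table is unsupported: say so. */
        fprintf(stderr, "warning: cannot switch TCE table back to KVM\n");
        return false;
    }
    if (t->fd >= 0) {
        /* Copy the entries into userspace memory before dropping the
         * kernel-backed table. */
        copy = calloc(t->nb_table, sizeof(*copy));
        if (!copy) {
            return false;
        }
        memcpy(copy, t->table, t->nb_table * sizeof(*copy));
        free(t->table);               /* stand-in for releasing the KVM table */
        t->table = copy;
        t->fd = -1;
    }
    t->need_vfio = true;
    return true;
}
```

Returning a status instead of void is a choice made for this sketch only; it
gives a hotplug caller something to act on if the switch is refused.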



Re: [Qemu-devel] [PATCH v2] vhost-user-test: do not reinvent glib-compat.h

2015-09-30 Thread Marc-André Lureau
Reviewed-by: Marc-André Lureau 
Tested-by: Marc-André Lureau 


(It conflicts with my vhost-user series, but fixing it is quite
trivial, I pushed a rebased version on my devel tree
https://github.com/elmarco/qemu/tree/vhost-user)

On Tue, Sep 29, 2015 at 2:12 PM, Paolo Bonzini  wrote:
> glib-compat.h has the gunk to support both old-style and new-style
> gthread functions.  Use it instead of reinventing it.
>
> Signed-off-by: Paolo Bonzini 
> ---
>  tests/vhost-user-test.c | 113 
> +++-
>  1 file changed, 16 insertions(+), 97 deletions(-)
>
> diff --git a/tests/vhost-user-test.c b/tests/vhost-user-test.c
> index e301db7..0e04f06 100644
> --- a/tests/vhost-user-test.c
> +++ b/tests/vhost-user-test.c
> @@ -8,7 +8,6 @@
>   *
>   */
>
> -#define QEMU_GLIB_COMPAT_H
>  #include 
>
>  #include "libqtest.h"
> @@ -30,12 +29,6 @@
>  #define HAVE_MONOTONIC_TIME
>  #endif
>
> -#if GLIB_CHECK_VERSION(2, 32, 0)
> -#define HAVE_MUTEX_INIT
> -#define HAVE_COND_INIT
> -#define HAVE_THREAD_NEW
> -#endif
> -
>  #define QEMU_CMD_ACCEL  " -machine accel=tcg"
>  #define QEMU_CMD_MEM" -m 512 -object 
> memory-backend-file,id=mem,size=512M,"\
>  "mem-path=%s,share=on -numa node,memdev=mem"
> @@ -113,93 +106,21 @@ static VhostUserMsg m __attribute__ ((unused));
>
>  int fds_num = 0, fds[VHOST_MEMORY_MAX_NREGIONS];
>  static VhostUserMemory memory;
> -static GMutex *data_mutex;
> -static GCond *data_cond;
> -
> -static gint64 _get_time(void)
> -{
> -#ifdef HAVE_MONOTONIC_TIME
> -return g_get_monotonic_time();
> -#else
> -GTimeVal time;
> -g_get_current_time(&time);
> -
> -return time.tv_sec * G_TIME_SPAN_SECOND + time.tv_usec;
> -#endif
> -}
> -
> -static GMutex *_mutex_new(void)
> -{
> -GMutex *mutex;
> -
> -#ifdef HAVE_MUTEX_INIT
> -mutex = g_new(GMutex, 1);
> -g_mutex_init(mutex);
> -#else
> -mutex = g_mutex_new();
> -#endif
> -
> -return mutex;
> -}
> +static CompatGMutex data_mutex;
> +static CompatGCond data_cond;
>
> -static void _mutex_free(GMutex *mutex)
> -{
> -#ifdef HAVE_MUTEX_INIT
> -g_mutex_clear(mutex);
> -g_free(mutex);
> -#else
> -g_mutex_free(mutex);
> -#endif
> -}
> -
> -static GCond *_cond_new(void)
> -{
> -GCond *cond;
> -
> -#ifdef HAVE_COND_INIT
> -cond = g_new(GCond, 1);
> -g_cond_init(cond);
> -#else
> -cond = g_cond_new();
> -#endif
> -
> -return cond;
> -}
> -
> -static gboolean _cond_wait_until(GCond *cond, GMutex *mutex, gint64 end_time)
> +#if !GLIB_CHECK_VERSION(2, 32, 0)
> +static gboolean g_cond_wait_until(CompatGCond cond, CompatGMutex mutex,
> +  gint64 end_time)
>  {
>  gboolean ret = FALSE;
> -#ifdef HAVE_COND_INIT
> -ret = g_cond_wait_until(cond, mutex, end_time);
> -#else
> +end_time -= g_get_monotonic_time();
>  GTimeVal time = { end_time / G_TIME_SPAN_SECOND,
>end_time % G_TIME_SPAN_SECOND };
>  ret = g_cond_timed_wait(cond, mutex, &time);
> -#endif
>  return ret;
>  }
> -
> -static void _cond_free(GCond *cond)
> -{
> -#ifdef HAVE_COND_INIT
> -g_cond_clear(cond);
> -g_free(cond);
> -#else
> -g_cond_free(cond);
>  #endif
> -}
> -
> -static GThread *_thread_new(const gchar *name, GThreadFunc func, gpointer 
> data)
> -{
> -GThread *thread = NULL;
> -GError *error = NULL;
> -#ifdef HAVE_THREAD_NEW
> -thread = g_thread_try_new(name, func, data, &error);
> -#else
> -thread = g_thread_create(func, data, TRUE, &error);
> -#endif
> -return thread;
> -}
>
>  static void read_guest_mem(void)
>  {
> @@ -208,11 +129,11 @@ static void read_guest_mem(void)
>  int i, j;
>  size_t size;
>
> -g_mutex_lock(data_mutex);
> +g_mutex_lock(&data_mutex);
>
> -end_time = _get_time() + 5 * G_TIME_SPAN_SECOND;
> +end_time = g_get_monotonic_time() + 5 * G_TIME_SPAN_SECOND;
>  while (!fds_num) {
> -if (!_cond_wait_until(data_cond, data_mutex, end_time)) {
> +if (!g_cond_wait_until(&data_cond, &data_mutex, end_time)) {
>  /* timeout has passed */
>  g_assert(fds_num);
>  break;
> @@ -252,7 +173,7 @@ static void read_guest_mem(void)
>  }
>
>  g_assert_cmpint(1, ==, 1);
> -g_mutex_unlock(data_mutex);
> +g_mutex_unlock(&data_mutex);
>  }
>
>  static void *thread_function(void *data)
> @@ -280,7 +201,7 @@ static void chr_read(void *opaque, const uint8_t *buf, 
> int size)
>  return;
>  }
>
> -g_mutex_lock(data_mutex);
> +g_mutex_lock(&data_mutex);
>  memcpy(p, buf, VHOST_USER_HDR_SIZE);
>
>  if (msg.size) {
> @@ -313,7 +234,7 @@ static void chr_read(void *opaque, const uint8_t *buf, 
> int size)
>  fds_num = qemu_chr_fe_get_msgfds(chr, fds, sizeof(fds) / 
> sizeof(int));
>
>  /* signal the test that it can continue */
> -g_cond_signal(data_cond);
> +g_cond_signal(&data_cond);
>  break;
>
>  case VHOST_USER_SET_VR

[Qemu-devel] [PULL 2/6] cpu: Provide vcpu throttling interface

2015-09-30 Thread Juan Quintela
From: "Jason J. Herne" 

Provide a method to throttle guest cpu execution. CPUState is augmented with
timeout controls and throttle start/stop functions. To throttle the guest cpu
the caller simply has to call the throttle set function and provide a percentage
of throttle time.

Signed-off-by: Jason J. Herne 
Reviewed-by: Matthew Rosato 
Signed-off-by: Juan Quintela 
Reviewed-by: Juan Quintela 
---
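
As a worked illustration of how a throttle percentage maps to sleep time,
here is a small standalone sketch of the arithmetic used by
cpu_throttle_thread() below. throttle_sleep_ns() is a hypothetical helper
name, and the timeslice is passed in as a parameter rather than taken from
CPU_THROTTLE_TIMESLICE_NS.

```c
/* Sleep time for one timeslice of guest execution.
 * ratio = pct / (1 - pct), so 50% means "sleep as long as you run". */
static long throttle_sleep_ns(int throttle_pct, long timeslice_ns)
{
    double pct = (double)throttle_pct / 100;
    double throttle_ratio = pct / (1 - pct);

    return (long)(throttle_ratio * timeslice_ns);
}
```

For example, a percentage of 25 with 30ms of run time gives roughly 10ms of
sleep, which matches the "10ms sleep for every 30ms awake" example in the
cpu_throttle_set() documentation. The caller must keep the percentage below
100, which the 1..99 clamp in cpu_throttle_set() guarantees.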
 cpus.c| 78 +++
 include/qom/cpu.h | 42 ++
 2 files changed, 120 insertions(+)

diff --git a/cpus.c b/cpus.c
index 056..d44c0ed 100644
--- a/cpus.c
+++ b/cpus.c
@@ -69,6 +69,14 @@ static CPUState *next_cpu;
 int64_t max_delay;
 int64_t max_advance;

+/* vcpu throttling controls */
+static QEMUTimer *throttle_timer;
+static unsigned int throttle_percentage;
+
+#define CPU_THROTTLE_PCT_MIN 1
+#define CPU_THROTTLE_PCT_MAX 99
+#define CPU_THROTTLE_TIMESLICE_NS 1000
+
 bool cpu_is_stopped(CPUState *cpu)
 {
 return cpu->stopped || !runstate_is_running();
@@ -505,10 +513,80 @@ static const VMStateDescription vmstate_timers = {
 }
 };

+static void cpu_throttle_thread(void *opaque)
+{
+CPUState *cpu = opaque;
+double pct;
+double throttle_ratio;
+long sleeptime_ns;
+
+if (!cpu_throttle_get_percentage()) {
+return;
+}
+
+pct = (double)cpu_throttle_get_percentage()/100;
+throttle_ratio = pct / (1 - pct);
+sleeptime_ns = (long)(throttle_ratio * CPU_THROTTLE_TIMESLICE_NS);
+
+qemu_mutex_unlock_iothread();
+atomic_set(&cpu->throttle_thread_scheduled, 0);
+g_usleep(sleeptime_ns / 1000); /* Convert ns to us for usleep call */
+qemu_mutex_lock_iothread();
+}
+
+static void cpu_throttle_timer_tick(void *opaque)
+{
+CPUState *cpu;
+double pct;
+
+/* Stop the timer if needed */
+if (!cpu_throttle_get_percentage()) {
+return;
+}
+CPU_FOREACH(cpu) {
+if (!atomic_xchg(&cpu->throttle_thread_scheduled, 1)) {
+async_run_on_cpu(cpu, cpu_throttle_thread, cpu);
+}
+}
+
+pct = (double)cpu_throttle_get_percentage()/100;
+timer_mod(throttle_timer, qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL_RT) +
+   CPU_THROTTLE_TIMESLICE_NS / (1-pct));
+}
+
+void cpu_throttle_set(int new_throttle_pct)
+{
+/* Ensure throttle percentage is within valid range */
+new_throttle_pct = MIN(new_throttle_pct, CPU_THROTTLE_PCT_MAX);
+new_throttle_pct = MAX(new_throttle_pct, CPU_THROTTLE_PCT_MIN);
+
+atomic_set(&throttle_percentage, new_throttle_pct);
+
+timer_mod(throttle_timer, qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL_RT) +
+   CPU_THROTTLE_TIMESLICE_NS);
+}
+
+void cpu_throttle_stop(void)
+{
+atomic_set(&throttle_percentage, 0);
+}
+
+bool cpu_throttle_active(void)
+{
+return (cpu_throttle_get_percentage() != 0);
+}
+
+int cpu_throttle_get_percentage(void)
+{
+return atomic_read(&throttle_percentage);
+}
+
 void cpu_ticks_init(void)
 {
 seqlock_init(&timers_state.vm_clock_seqlock, NULL);
 vmstate_register(NULL, 0, &vmstate_timers, &timers_state);
+throttle_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL_RT,
+   cpu_throttle_timer_tick, NULL);
 }

 void configure_icount(QemuOpts *opts, Error **errp)
diff --git a/include/qom/cpu.h b/include/qom/cpu.h
index 302673d..9405554 100644
--- a/include/qom/cpu.h
+++ b/include/qom/cpu.h
@@ -321,6 +321,11 @@ struct CPUState {
 uint32_t can_do_io;
 int32_t exception_index; /* used by m68k TCG */

+/* Used to keep track of an outstanding cpu throttle thread for migration
+ * autoconverge
+ */
+bool throttle_thread_scheduled;
+
 /* Note that this is accessed at the start of every TB via a negative
offset from AREG0.  Leave this field at the end so as to make the
(absolute value) offset as small as possible.  This reduces code
@@ -565,6 +570,43 @@ CPUState *qemu_get_cpu(int index);
  */
 bool cpu_exists(int64_t id);

+/**
+ * cpu_throttle_set:
+ * @new_throttle_pct: Percent of sleep time. Valid range is 1 to 99.
+ *
+ * Throttles all vcpus by forcing them to sleep for the given percentage of
+ * time. A throttle_percentage of 25 corresponds to a 75% duty cycle roughly.
+ * (example: 10ms sleep for every 30ms awake).
+ *
+ * cpu_throttle_set can be called as needed to adjust new_throttle_pct.
+ * Once the throttling starts, it will remain in effect until cpu_throttle_stop
+ * is called.
+ */
+void cpu_throttle_set(int new_throttle_pct);
+
+/**
+ * cpu_throttle_stop:
+ *
+ * Stops the vcpu throttling started by cpu_throttle_set.
+ */
+void cpu_throttle_stop(void);
+
+/**
+ * cpu_throttle_active:
+ *
+ * Returns: %true if the vcpus are currently being throttled, %false otherwise.
+ */
+bool cpu_throttle_active(void);
+
+/**
+ * cpu_throttle_get_percentage:
+ *
+ * Returns the vcpu throttle percentage. See cpu_throttle_set for

Re: [Qemu-devel] [PATCHv3 3/7] vfio: Check guest IOVA ranges against host IOMMU capabilities

2015-09-30 Thread Laurent Vivier


On 30/09/2015 04:13, David Gibson wrote:
> The current vfio core code assumes that the host IOMMU is capable of
> mapping any IOVA the guest wants to use to where we need.  However, real
> IOMMUs generally only support translating a certain range of IOVAs (the
> "DMA window") not a full 64-bit address space.
> 
> The common x86 IOMMUs support a wide enough range that guests are very
> unlikely to go beyond it in practice, however the IOMMU used on IBM Power
> machines - in the default configuration - supports only a much more limited
> IOVA range, usually 0..2GiB.
> 
> If the guest attempts to set up an IOVA range that the host IOMMU can't
> map, qemu won't report an error until it actually attempts to map a bad
> IOVA.  If guest RAM is being mapped directly into the IOMMU (i.e. no guest
> visible IOMMU) then this will show up very quickly.  If there is a guest
> visible IOMMU, however, the problem might not show up until much later when
> the guest actually attempts to DMA with an IOVA the host can't handle.
> 
> This patch adds a test so that we will detect earlier if the guest is
> attempting to use IOVA ranges that the host IOMMU won't be able to deal
> with.
> 
> For now, we assume that "Type1" (x86) IOMMUs can support any IOVA, this is
> incorrect, but no worse than what we have already.  We can't do better for
> now because the Type1 kernel interface doesn't tell us what IOVA range the
> IOMMU actually supports.
> 
> For the Power "sPAPR TCE" IOMMU, however, we can retrieve the supported
> IOVA range and validate guest IOVA ranges against it, and this patch does
> so.
> 
> Signed-off-by: David Gibson 
> Reviewed-by: Laurent Vivier 
> ---
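
For reference, the range test described in the commit message boils down to
the following standalone sketch. iova_range_ok() is a hypothetical name and
plain uint64_t is used in place of hwaddr; the comparison itself mirrors the
check added to vfio_listener_region_add() below.

```c
#include <stdbool.h>
#include <stdint.h>

/* [min_iova, max_iova] is the inclusive window the host IOMMU can map.
 * 'end' is exclusive, so the last address the guest uses is end - 1. */
static bool iova_range_ok(uint64_t iova, uint64_t end,
                          uint64_t min_iova, uint64_t max_iova)
{
    return iova >= min_iova && (end - 1) <= max_iova;
}
```

With a 0..2GiB window (max_iova = 0x7fffffff) a region ending exactly at 2GiB
passes, while one that extends a single page beyond it is rejected. With the
Type1 assumption (min 0, max all-ones) every region passes.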
>  hw/vfio/common.c  | 40 +---
>  include/hw/vfio/vfio-common.h |  6 ++
>  2 files changed, 43 insertions(+), 3 deletions(-)
> 
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index 95a4850..2faf492 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -343,14 +343,22 @@ static void vfio_listener_region_add(MemoryListener 
> *listener,
>  if (int128_ge(int128_make64(iova), llend)) {
>  return;
>  }
> +end = int128_get64(llend);
> +
> +if ((iova < container->min_iova) || ((end - 1) > container->max_iova)) {
> +error_report("vfio: IOMMU container %p can't map guest IOVA region"
> + " 0x%"HWADDR_PRIx"..0x%"HWADDR_PRIx,
> + container, iova, end - 1);
> +ret = -EFAULT;
> +goto fail;
> +}
>  
>  memory_region_ref(section->mr);
>  
>  if (memory_region_is_iommu(section->mr)) {
>  VFIOGuestIOMMU *giommu;
>  
> -trace_vfio_listener_region_add_iommu(iova,
> -int128_get64(int128_sub(llend, int128_one())));
> +trace_vfio_listener_region_add_iommu(iova, end - 1);
>  /*
>   * FIXME: We should do some checking to see if the
>   * capabilities of the host VFIO IOMMU are adequate to model
> @@ -387,7 +395,6 @@ static void vfio_listener_region_add(MemoryListener 
> *listener,
>  
>  /* Here we assume that memory_region_is_ram(section->mr)==true */
>  
> -end = int128_get64(llend);
>  vaddr = memory_region_get_ram_ptr(section->mr) +
>  section->offset_within_region +
>  (iova - section->offset_within_address_space);
> @@ -685,7 +692,19 @@ static int vfio_connect_container(VFIOGroup *group, 
> AddressSpace *as)
>  ret = -errno;
>  goto free_container_exit;
>  }
> +
> +/*
> + * FIXME: This assumes that a Type1 IOMMU can map any 64-bit
> + * IOVA whatsoever.  That's not actually true, but the current
> + * kernel interface doesn't tell us what it can map, and the
> + * existing Type1 IOMMUs generally support any IOVA we're
> + * going to actually try in practice.
> + */
> +container->min_iova = 0;
> +container->max_iova = (hwaddr)-1;
>  } else if (ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_SPAPR_TCE_IOMMU)) {
> +struct vfio_iommu_spapr_tce_info info;
> +
>  ret = ioctl(group->fd, VFIO_GROUP_SET_CONTAINER, &fd);
>  if (ret) {
>  error_report("vfio: failed to set group container: %m");
> @@ -710,6 +729,21 @@ static int vfio_connect_container(VFIOGroup *group, 
> AddressSpace *as)
>  ret = -errno;
>  goto free_container_exit;
>  }
> +
> +/*
> + * This only considers the host IOMMU's 32-bit window.  At
> + * some point we need to add support for the optional 64-bit
> + * window and dynamic windows
> + */
> +info.argsz = sizeof(info);
> +ret = ioctl(fd, VFIO_IOMMU_SPAPR_TCE_GET_INFO, &info);
> +if (ret) {
> +error_report("vfio: VFIO_IOMMU_SPAPR_TCE_GET_INFO failed: %m");
> +ret = -errno;
> +goto free_container_exit;
> +}
> +container->min_iova = i
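
For illustration, the range check this patch adds to vfio_listener_region_add() can be sketched on its own. This is a minimal standalone version, not the QEMU code: the IovaWindow type and the iova_region_fits() name are made up for the example, and plain uint64_t stands in for hwaddr.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the min_iova/max_iova fields the patch adds. */
typedef struct {
    uint64_t min_iova;  /* lowest IOVA the host IOMMU can map */
    uint64_t max_iova;  /* highest IOVA the host IOMMU can map (inclusive) */
} IovaWindow;

/*
 * Return true if the guest region [iova, end) fits inside the window.
 * 'end' is exclusive, so the last byte used is end - 1.  The region
 * must be non-empty (the patch returns early for empty regions).
 */
static bool iova_region_fits(const IovaWindow *w, uint64_t iova, uint64_t end)
{
    assert(end > iova);
    return iova >= w->min_iova && (end - 1) <= w->max_iova;
}
```

With a window of { 0, 2 GiB - 1 }, a region ending exactly at 2 GiB fits and one ending at 3 GiB does not. A window of { 0, UINT64_MAX } accepts everything, which matches the Type1 assumption described in the FIXME comment.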

Re: [Qemu-devel] [PATCHv3 7/7] vfio: Expose a VFIO PCI device's group for EEH

2015-09-30 Thread Laurent Vivier


On 30/09/2015 04:13, David Gibson wrote:
> The Enhanced Error Handling (EEH) interface in PAPR operates on units of a
> Partitionable Endpoint (PE).  For VFIO devices, the PE boundaries the guest
> sees must match the PE (i.e. IOMMU group) boundaries on the host.  To
> implement this it will need to discover from VFIO which group a given
> device belongs to.
> 
> This exposes a new vfio_pci_device_group() function for this purpose.
> 
> Signed-off-by: David Gibson 
> ---
>  hw/vfio/pci.c  | 14 ++
>  include/hw/vfio/vfio-pci.h | 11 +++
>  2 files changed, 25 insertions(+)
>  create mode 100644 include/hw/vfio/vfio-pci.h
> 
> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> index dcabb6d..49ae834 100644
> --- a/hw/vfio/pci.c
> +++ b/hw/vfio/pci.c
> @@ -35,6 +35,8 @@
>  #include "pci.h"
>  #include "trace.h"
>  
> +#include "hw/vfio/vfio-pci.h"
> +
>  #define MSIX_CAP_LENGTH 12
>  
>  static void vfio_disable_interrupts(VFIOPCIDevice *vdev);
> @@ -2312,6 +2314,18 @@ static void vfio_unregister_req_notifier(VFIOPCIDevice 
> *vdev)
>  vdev->req_enabled = false;
>  }
>  
> +VFIOGroup *vfio_pci_device_group(PCIDevice *pdev)
> +{
> +VFIOPCIDevice *vdev;
> +
> +if (!object_dynamic_cast(OBJECT(pdev), "vfio-pci")) {
> +return NULL;
> +}
> +
> +vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
> +return vdev->vbasedev.group;
> +}
> +
>  static int vfio_initfn(PCIDevice *pdev)
>  {
>  VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
> diff --git a/include/hw/vfio/vfio-pci.h b/include/hw/vfio/vfio-pci.h
> new file mode 100644
> index 0000000..32105f7
> --- /dev/null
> +++ b/include/hw/vfio/vfio-pci.h
> @@ -0,0 +1,11 @@
> +#ifndef VFIO_PCI_H
> +#define VFIO_PCI_H
> +
> +#include "qemu/typedefs.h"
> +
> +/* We expose the concept of a VFIOGroup, though not its internals */
> +typedef struct VFIOGroup VFIOGroup;
> +
> +extern VFIOGroup *vfio_pci_device_group(PCIDevice *pdev);
> +
> +#endif /* VFIO_PCI_H */
> 

Reviewed-by: Laurent Vivier 



[Qemu-devel] [PULL 0/6] Migration pull request

2015-09-30 Thread Juan Quintela
Hi

This gets the auto-converge changes for migration; please pull.


The following changes since commit b2312c680084ea18cd55fa7093397cad2224ec14:

  Merge remote-tracking branch 'remotes/amit-migration/tags/for-juan-201509' 
into staging (2015-09-29 12:41:19 +0100)

are available in the git repository at:

  git://github.com/juanquintela/qemu.git tags/migration/20150930

for you to fetch changes up to dc3256272cf70b2152279b013a8abb16e0f6fe96:

  migration: Disambiguate MAX_THROTTLE (2015-09-30 09:42:04 +0200)


migration/next for 20150930


Jason J. Herne (5):
  cpu: Provide vcpu throttling interface
  migration: Parameters for auto-converge cpu throttling
  migration: Dynamic cpu throttling for auto-converge
  qmp/hmp: Add throttle ratio to query-migrate and info migrate
  migration: Disambiguate MAX_THROTTLE

Juan Quintela (1):
  migration: yet more possible state transitions

 cpus.c| 78 
 hmp.c | 21 
 include/qom/cpu.h | 42 
 migration/migration.c | 57 +++--
 migration/ram.c   | 89 +--
 qapi-schema.json  | 40 ---
 vl.c  |  1 +
 7 files changed, 263 insertions(+), 65 deletions(-)



Re: [Qemu-devel] [PATCHv3 2/7] vfio: Generalize vfio_listener_region_add failure path

2015-09-30 Thread Laurent Vivier


On 30/09/2015 04:13, David Gibson wrote:
> If a DMA mapping operation fails in vfio_listener_region_add() it
> checks to see if we've already completed initial setup of the
> container.  If so it reports an error so the setup code can fail
> gracefully, otherwise throws a hw_error().
> 
> There are other potential failure cases in vfio_listener_region_add()
> which could benefit from the same logic, so move it to its own
> fail: block.  Later patches can use this to extend other failure cases
> to fail as gracefully as possible under the circumstances.
> 
> Signed-off-by: David Gibson 
> Reviewed-by: Thomas Huth 
> Reviewed-by: Laurent Vivier 
> ---
>  hw/vfio/common.c | 26 +++---
>  1 file changed, 15 insertions(+), 11 deletions(-)
> 
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index 1545f62..95a4850 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -399,19 +399,23 @@ static void vfio_listener_region_add(MemoryListener 
> *listener,
>  error_report("vfio_dma_map(%p, 0x%"HWADDR_PRIx", "
>   "0x%"HWADDR_PRIx", %p) = %d (%m)",
>   container, iova, end - iova, vaddr, ret);
> +goto fail;
> +}
>  
> -/*
> - * On the initfn path, store the first error in the container so we
> - * can gracefully fail.  Runtime, there's not much we can do other
> - * than throw a hardware error.
> - */
> -if (!container->initialized) {
> -if (!container->error) {
> -container->error = ret;
> -}
> -} else {
> -hw_error("vfio: DMA mapping failed, unable to continue");
> +return;
> +
> +fail:
> +/*
> + * On the initfn path, store the first error in the container so we
> + * can gracefully fail.  Runtime, there's not much we can do other
> + * than throw a hardware error.
> + */
> +if (!container->initialized) {
> +if (!container->error) {
> +container->error = ret;
>  }
> +} else {
> +hw_error("vfio: DMA mapping failed, unable to continue");
>  }
>  }
>  
> 
Reviewed-by: Laurent Vivier 
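
As a rough standalone sketch of the shared failure path this patch introduces (remember the first error during setup, treat later failures as fatal), something like the following would do. The Container type and container_record_failure() are hypothetical names for the example; the real code keeps this logic under the fail: label and calls hw_error() in the runtime case.

```c
#include <stdbool.h>

/* Hypothetical, pared-down stand-in for the container fields involved. */
typedef struct {
    bool initialized;   /* set once initial container setup has completed */
    int error;          /* first error seen during setup, 0 if none */
} Container;

/*
 * Shared failure handling.  During setup, remember only the first error
 * so the caller can unwind gracefully.  After setup, there is nothing to
 * unwind to, so report the failure as fatal.
 * Returns true when the failure must be treated as fatal.
 */
static bool container_record_failure(Container *c, int ret)
{
    if (!c->initialized) {
        if (!c->error) {
            c->error = ret;     /* keep the first error, ignore later ones */
        }
        return false;
    }
    return true;
}
```

Routing every failure case through one helper (or one label, as in the patch) keeps the "first error wins" rule in a single place.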



Re: [Qemu-devel] [PATCH 1/3] target-i386: add a subsection of vcpu's TSC rate in vmstate_x86_cpu

2015-09-30 Thread Dr. David Alan Gilbert
* Haozhong Zhang (haozhong.zh...@intel.com) wrote:
> On Tue, Sep 29, 2015 at 08:00:13PM +0100, Dr. David Alan Gilbert wrote:
> > * Haozhong Zhang (haozhong.zh...@intel.com) wrote:
> > > The newly added subsection 'vmstate_tsc_khz' in this patch results in
> > > vcpu's TSC rate being saved on the source machine and loaded on the
> > > target machine during the migration.
> > > 
> > > Signed-off-by: Haozhong Zhang 
> > 
> > Hi,
> >   I'd appreciate it if you could tie this to only do it on newer
> > machine types; that way it won't break back migration.
> >
> 
> Does "back migration" mean migrating from QEMU w/ vmstate_tsc_khz
> subsection to QEMU w/o that subsection?

Yes; like if we migrate from a newer qemu to an older qemu but with
the same machine type.

Dave

> 
> - Haozhong
> 
> > Dave
> > 
> > > ---
> > >  target-i386/machine.c | 20 
> > >  1 file changed, 20 insertions(+)
> > > 
> > > diff --git a/target-i386/machine.c b/target-i386/machine.c
> > > index 9fa0563..80108a3 100644
> > > --- a/target-i386/machine.c
> > > +++ b/target-i386/machine.c
> > > @@ -752,6 +752,25 @@ static const VMStateDescription vmstate_xss = {
> > >  }
> > >  };
> > >  
> > > +static bool tsc_khz_needed(void *opaque)
> > > +{
> > > +X86CPU *cpu = opaque;
> > > +CPUX86State *env = &cpu->env;
> > > +
> > > +return env->tsc_khz != 0;
> > > +}
> > > +
> > > +static const VMStateDescription vmstate_tsc_khz = {
> > > +.name = "cpu/tsc_khz",
> > > +.version_id = 1,
> > > +.minimum_version_id = 1,
> > > +.needed = tsc_khz_needed,
> > > +.fields = (VMStateField[]) {
> > > +VMSTATE_INT64(env.tsc_khz, X86CPU),
> > > +VMSTATE_END_OF_LIST()
> > > +}
> > > +};
> > > +
> > >  VMStateDescription vmstate_x86_cpu = {
> > >  .name = "cpu",
> > >  .version_id = 12,
> > > @@ -871,6 +890,7 @@ VMStateDescription vmstate_x86_cpu = {
> > >  &vmstate_msr_hyperv_crash,
> > >  &vmstate_avx512,
> > >  &vmstate_xss,
> > > +&vmstate_tsc_khz,
> > >  NULL
> > >  }
> > >  };
> > > -- 
> > > 2.4.8
> > > 
> > > 
> > --
> > Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK



Re: [Qemu-devel] [PATCH v6 06/24] memfd: add fallback for memfd

2015-09-30 Thread Marc-André Lureau
Hi

- Original Message -
> On Tue, Sep 29, 2015 at 06:34:36PM +0200, marcandre.lur...@redhat.com wrote:
> > From: Marc-André Lureau 
> > 
> > Add an open/unlink/mmap fallback for systems that do not support memfd.
> > This patch may require additional SELinux policies to work for enforced
> > systems, but should gracefully fail nonetheless.
> > 
> > Signed-off-by: Marc-André Lureau 
> 
> I'd rather just fail migration.

So we don't provide this compatibility code and migration should fail.

Would it be enough to check if memfd works at early runtime and add a migration 
blocker for vhost-user? Or is it possible to recover if migration fails when 
memfd fails to allocate? I would think the former is better.

> 
> > ---
> >  util/memfd.c | 22 --
> >  1 file changed, 20 insertions(+), 2 deletions(-)
> > 
> > diff --git a/util/memfd.c b/util/memfd.c
> > index 3168902..970b5b0 100644
> > --- a/util/memfd.c
> > +++ b/util/memfd.c
> > @@ -84,8 +84,26 @@ void *qemu_memfd_alloc(const char *name, size_t size,
> > unsigned int seals,
> >  return NULL;
> >  }
> >  } else {
> > -perror("memfd");
> > -return NULL;
> > +const char *tmpdir = getenv("TMPDIR");
> > +gchar *fname;
> > +
> > +tmpdir = tmpdir ? tmpdir : "/tmp";
> > +
> > +fname = g_strdup_printf("%s/memfd-XXXXXX", tmpdir);
> > +mfd = mkstemp(fname);
> > +unlink(fname);
> > +g_free(fname);
> > +
> > +if (mfd == -1) {
> > +perror("mkstemp");
> > +return NULL;
> > +}
> > +
> > +if (ftruncate(mfd, size) == -1) {
> > +perror("ftruncate");
> > +close(mfd);
> > +return NULL;
> > +}
> >  }
> >  
> >  ptr = mmap(0, size, PROT_READ | PROT_WRITE, MAP_SHARED, mfd, 0);
> > --
> > 2.4.3
> 
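
For reference, the open/unlink/mmap fallback discussed above can be sketched without glib as follows. This is only an illustration under stated assumptions: tmpfile_shared_alloc() is a made-up name, the sketch uses POSIX calls only, and unlike the quoted hunk it checks the mkstemp() result before unlinking. It says nothing about sealing, which a plain temporary file cannot provide.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Allocate 'size' bytes of shareable memory backed by an unlinked
 * temporary file.  On success returns the mapping and stores the file
 * descriptor in *fd; on failure returns NULL.
 */
static void *tmpfile_shared_alloc(size_t size, int *fd)
{
    const char *tmpdir = getenv("TMPDIR");
    char path[4096];
    void *ptr;
    int n;

    /* mkstemp() requires the template to end in exactly six 'X' characters */
    n = snprintf(path, sizeof(path), "%s/memfd-XXXXXX",
                 tmpdir ? tmpdir : "/tmp");
    if (n < 0 || (size_t)n >= sizeof(path)) {
        return NULL;
    }

    *fd = mkstemp(path);
    if (*fd == -1) {
        perror("mkstemp");
        return NULL;
    }
    unlink(path);   /* the file lives on until the descriptor is closed */

    if (ftruncate(*fd, (off_t)size) == -1) {
        perror("ftruncate");
        close(*fd);
        return NULL;
    }

    ptr = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, *fd, 0);
    if (ptr == MAP_FAILED) {
        perror("mmap");
        close(*fd);
        return NULL;
    }
    return ptr;
}
```

The caller owns both the mapping and the descriptor and should munmap() and close() them when done.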



Re: [Qemu-devel] feature idea: allow user to run custom scripts

2015-09-30 Thread Dr. David Alan Gilbert
* Markus Armbruster (arm...@redhat.com) wrote:
> "Dr. David Alan Gilbert"  writes:
> 
> > * Peter Maydell (peter.mayd...@linaro.org) wrote:
> >> On 29 September 2015 at 14:11, Dr. David Alan Gilbert
> >>  wrote:
> >> > * Peter Maydell (peter.mayd...@linaro.org) wrote:
> >> >> On 28 September 2015 at 20:43, Programmingkid
> >> >>  wrote:
> >> >> >
> >> >> > On Sep 28, 2015, at 3:29 AM, Markus Armbruster wrote:
> >> >> >> You didn't mention you're talking about a *GUI* feature.
> >> >> >
> >> >> > I'm thinking it would be easier to send in the patch rather
> >> >> > than talk about
> >> >> > what this feature could be.
> >> >>
> >> >> I think Markus and I are trying to save you that effort by
> >> >> pointing out that this is a VM management layer feature,
> >> >> not a core QEMU feature.
> >> >
> >> > OK, so I'm going to agree with Programmingkid here.
> >> > I think this would be a useful feature to have in QEMU; I've
> >> > got gratuitous hacks in some of my test scripts that work
> >> > around it not being there.
> >> >
> >> > I think there are two possible things, both of which seem fairly
> >> > easy:
> >> >   1) Add a -chardev from file that works in this case
> >> >  (I don't think the current chardev file works does it?)
> 
> In general, character devices provide a bidirectional pipe, but -chardev
> file is write-only.  I think you want -chardev pipe.  I don't use it
> myself, because as socat user, I don't have to learn lesser tools :)
> 
> Here's how I use it.  Set up a local socket (any convenient
> bidirectional pipe would do, actually).
> 
> Example: QMP
> 
> # Configuration file for -readconfig
> [chardev "qmp"]
>   backend = "socket"
>   path = "sock-qmp"
>   server = "on"
>   wait = "off"
> 
> [mon "qmp"]
>   mode = "control"
>   chardev = "qmp"
> 
> Example: HMP
> 
> [chardev "hmp"]
>   backend = "socket"
>   path = "sock-hmp"
>   server = "on"
>   wait = "off"
> 
> [mon "hmp"]
>   mode = "readline"
>   chardev = "hmp"
> 
> Then do stuff with it.
> 
> Example: interactive QMP
> 
> $ socat UNIX:sock-qmp READLINE,history=$HOME/.qmp_history,prompt='QMP> '
> 
> Example: interactive HMP
> 
> $ socat UNIX:sock-hmp READLINE,history=$HOME/.hmp_history
> 
> Arguably superior to our built-in not-quite readline monitor.
> 
> Example: send QMP input from a file, capture its output in a file
> 
> $ socat UNIX:sock-qmp STDIO <input >output

Yes, this example is exactly why I want something less painful.
A -chardev file that allowed read/write would be ideal, to be able to read
a series of commands at startup.

> >> >   2) A 'source' like command.
> 
> QMP?  The command would have to take a filename as argument, and return
> a list of replies.  Probably stop on first failed command.  Pretty
> useless for remote clients, because if you have to upload the file, you
> can just as well send it down the QMP pipe.  Actually, that pretty much
> applies to local clients, too.  Except perhaps for interactive use.  I
> feel a QMP client geared for such use would be the appropriate home for
> this feature.  We have some in scripts/qmp/.
> 
> I don't have an opinion on HMP right now.

If QMP doesn't have a user for it fine; I'm just saying it would be useful
from my point of view in HMP.

> >> Yeah, these are both plausible. Neither of them are GUI features,
> >> though...
> >
> > Well, I don't use the GTK gui; I can see that those who do
> > might want features in it.
> 
> GUI users want GUI features, of course.
> 
> In my opinion, QEMU should leave them to separate GUI shells, because
> doing everything in QEMU distracts from our core mission and we don't
> have GUI expertise[*].  One more point: building in the GUI is
> problematic when you don't trust the guest, because then you really want
> to run QEMU with least privileges.

Given that we have a built in GUI then I can see people wanting to expand
it.

Dave

> 
> 
> [*] Short version of the argument, for the long one, see
> Message-ID: <87oahn51ys@blackfin.pond.sub.org>
> http://lists.gnu.org/archive/html/qemu-devel/2015-08/msg03916.html
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK



Re: [Qemu-devel] [PATCHv3 5/7] memory: Allow replay of IOMMU mapping notifications

2015-09-30 Thread Laurent Vivier


On 30/09/2015 04:13, David Gibson wrote:
> When we have guest visible IOMMUs, we allow notifiers to be registered
> which will be informed of all changes to IOMMU mappings.  This is used by
> vfio to keep the host IOMMU mappings in sync with guest IOMMU mappings.
> 
> However, unlike with a memory region listener, an iommu notifier won't be
> told about any mappings which already exist in the (guest) IOMMU at the
> time it is registered.  This can cause problems if hotplugging a VFIO
> device onto a guest bus which had existing guest IOMMU mappings, but didn't
> previously have any VFIO devices (and hence no host IOMMU mappings).
> 
> This adds a memory_region_iommu_replay() function to handle this case.  It
> replays any existing mappings in an IOMMU memory region to a specified
> notifier.  Because the IOMMU memory region doesn't internally remember the
> granularity of the guest IOMMU it has a small hack where the caller must
> specify a granularity at which to replay mappings.
> 
> If there are finer mappings in the guest IOMMU these will be reported in
> the iotlb structures passed to the notifier which it must handle (probably
> causing it to flag an error).  This isn't new - the VFIO iommu notifier
> must already handle notifications about guest IOMMU mappings too short
> for it to represent in the host IOMMU.
> 
> Signed-off-by: David Gibson 
> ---
>  include/exec/memory.h | 13 +
>  memory.c  | 20 
>  2 files changed, 33 insertions(+)
> 
> diff --git a/include/exec/memory.h b/include/exec/memory.h
> index 5baaf48..0f07159 100644
> --- a/include/exec/memory.h
> +++ b/include/exec/memory.h
> @@ -583,6 +583,19 @@ void memory_region_notify_iommu(MemoryRegion *mr,
>  void memory_region_register_iommu_notifier(MemoryRegion *mr, Notifier *n);
>  
>  /**
> + * memory_region_iommu_replay: replay existing IOMMU translations to
> + * a notifier
> + *
> + * @mr: the memory region to observe
> + * @n: the notifier to which to replay iommu mappings
> + * @granularity: Minimum page granularity to replay notifications for
> + * @is_write: Whether to treat the replay as a translate "write"
> + * through the iommu
> + */
> +void memory_region_iommu_replay(MemoryRegion *mr, Notifier *n,
> +hwaddr granularity, bool is_write);
> +
> +/**
>   * memory_region_unregister_iommu_notifier: unregister a notifier for
>   * changes to IOMMU translation entries.
>   *
> diff --git a/memory.c b/memory.c
> index ef87363..1b03d22 100644
> --- a/memory.c
> +++ b/memory.c
> @@ -1403,6 +1403,26 @@ void 
> memory_region_register_iommu_notifier(MemoryRegion *mr, Notifier *n)
>  notifier_list_add(&mr->iommu_notify, n);
>  }
>  
> +void memory_region_iommu_replay(MemoryRegion *mr, Notifier *n,
> +hwaddr granularity, bool is_write)
> +{
> +hwaddr addr;
> +IOMMUTLBEntry iotlb;
> +
> +for (addr = 0; addr < memory_region_size(mr); addr += granularity) {
> +iotlb = mr->iommu_ops->translate(mr, addr, is_write);

in iotlb, there is an "address_mask", on spapr, it is copied from
"page_shift", which is SPAPR_TCE_PAGE_SHIFT (12 -> 4k).

At a first glance, we would like to use it to scan the memory region,
but as granularity could be a greater value, I think it is a better choice.

But the question is: why is the iotlb page_size not equal to the
granularity given by VFIO_IOMMU_GET_INFO _IO ?

> +if (iotlb.perm != IOMMU_NONE) {
> +n->notify(n, &iotlb);
> +}
> +
> +/* if (2^64 - MR size) < granularity, it's possible to get an
> + * infinite loop here.  This should catch such a wraparound */
> +if ((addr + granularity) < addr) {
> +break;
> +}
> +}
> +}
> +
>  void memory_region_unregister_iommu_notifier(Notifier *n)
>  {
>  notifier_remove(n);
> 

As my question is not about this particular patch but about another
existing part, I can say:

Reviewed-by: Laurent Vivier 
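
The wraparound guard in the replay loop quoted above can be checked in isolation. The sketch below is not the QEMU function: replay_steps() is a made-up helper that only counts iterations, with uint64_t standing in for hwaddr and a plain size argument standing in for memory_region_size(mr).

```c
#include <assert.h>
#include <stdint.h>

/*
 * Count how many steps of size 'granularity' a replay loop takes over
 * [0, size), stopping early if the next address would wrap around.
 * 'granularity' must be non-zero.
 */
static uint64_t replay_steps(uint64_t size, uint64_t granularity)
{
    uint64_t addr, steps = 0;

    assert(granularity != 0);
    for (addr = 0; addr < size; addr += granularity) {
        steps++;    /* the real loop translates 'addr' and notifies here */

        /* unsigned overflow is well defined in C: a wrapped sum is smaller */
        if (addr + granularity < addr) {
            break;
        }
    }
    return steps;
}
```

For example, a 16 KiB region at 4 KiB granularity takes four steps. With a size of UINT64_MAX and a granularity of 2^63 the loop stops after two steps; without the guard the address would wrap to 0 and the loop would never terminate.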



Re: [Qemu-devel] [RFC v5 0/6] Slow-path for atomic instruction translation

2015-09-30 Thread alvise rigo
Hi Paolo,

On Wed, Sep 30, 2015 at 6:44 AM, Paolo Bonzini  wrote:
>
>
> On 24/09/2015 10:32, Alvise Rigo wrote:
>> The implementation heavily uses the software TLB together with a new
>> bitmap that has been added to the ram_list structure which flags, on a
>> per-CPU basis, all the memory pages that are in the middle of a LoadLink
>> (LL), StoreConditional (SC) operation.  Since all these pages can be
>> accessed directly through the fast-path and alter a vCPU's linked value,
>> the new bitmap has been coupled with a new TLB flag for the TLB virtual
>> address which forces the slow-path execution for all the accesses to a
>> page containing a linked address.
>
> Alvise, Emilio,
>
> I have a doubt about your patches for ll/sc emulation, that I hope you
> can clarify.
>
> From 10000ft, both approaches rely on checking a flag during stores.
> This is split between the TLB and the CPUState for Alvise's patches (in
> order to exploit the existing fast-path checks), and entirely in the
> radix tree for Emilio's.  However, the idea is the same.
>
> Now, the patches are okay for serial emulation, but I am not sure if it's
> possible to do lock-free ll/sc emulation, because there is a race.

Do you mean to not use any locking mechanism at all at the emulation side?

>
> If we check the flag before the store, the race is as follows:
>
>    CPU0                  CPU1
>    -------------------------------------------------------------
>    check flag
>                          load locked:
>                             set flag
>                             load value (normal load on CPU)
>    store
>                          store conditional (normal store on CPU)
>
> where the sc doesn't fail.  For completeness, if we check it afterwards

Shouldn't this be prevented by the tcg_excl_access_lock in the
patchseries based on mttcg (branch slowpath-for-atomic-v5-mttcg)?
Consider also that CPU0 will always finish its store operation before
the transition "flag not set -> flag set" finishes.

> (which would be possible with Emilio's approach, though not for the
> TLB-based one):
>
>    CPU0                  CPU1
>    -------------------------------------------------------------
>                          load locked
>                             set bit
>                             load value (normal load on CPU)
>    store
>                          store conditional (normal store on CPU)
>    check flag
>
> and again the sc doesn't fail.
>
> Most solutions I can think of are impractical:
>
> - hardware ll/sc in CPU1. x86 doesn't have it.
>
> - hardware transactional memory in CPU0, checking the bit after the
> store and abort the transaction (I think).  HTM just doesn't exist.
>
> - some kind of store-in-progress (SIP) flag that ll can test and force
> failure of the corresponding sc.  For example, each CPU could store a
> (last_store_address, last_store_value) tuple. If the value that LL loads
> disagrees with any CPU, the LL would direct the SC to fail.  A store
> would look like:
>
>  store value to last_store_value
>  smp_wmb()
>  store address to last_store_address
>  smp_mb()
>  load TLB or radix tree
>
> The memory barrier orders the store to the SIP flag and the load from
> the TLB, and is probably too expensive. :(

Umm, I agree with you that this could be too expensive.

>
> - some array of atomic global generation counts, incremented
> unconditionally on every store and checked between ll and sc.  Cacheline
> bounce fiesta, hence extremely slow. :(
>
> Tell me I'm wrong. :)
>
> If I'm right, we can still keep the opcodes and implement them with a
> simple cmpxchg.  It would provide a nice generic tool to implement
> atomic operations, and it will work correctly if the target has ll/sc.
> However, ll/sc-on-cmpxchg (e.g., ARM-on-x86) would be susceptible to the
> ABA problem.

This was one of my fears that led me to the ll/sc approach. I think it
could be even more probable in emulation since we can't assume the
distance in time between LLs and SCs to be small to avoid "aba"
accesses.
We could solve this issue once again with an access counter, but this
would require incrementing it in the fast path, which would kill
performance.
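
To make the ABA concern concrete, here is a small single-threaded replay of the problematic interleaving using C11 atomics. It is only a sketch: sc_via_cmpxchg() and aba_goes_unnoticed() are hypothetical names, and the two intervening stores are issued from the same thread purely to keep the example deterministic.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Emulated store-conditional built on compare-and-swap: it succeeds
 * whenever the location still holds the value the emulated load-link saw.
 */
static bool sc_via_cmpxchg(_Atomic uint32_t *addr, uint32_t linked,
                           uint32_t newval)
{
    return atomic_compare_exchange_strong(addr, &linked, newval);
}

/*
 * Replay an A -> B -> A sequence between the emulated LL and SC.
 * Returns true if the emulated SC succeeds despite the two stores.
 */
static bool aba_goes_unnoticed(void)
{
    _Atomic uint32_t mem = 1;               /* value A */
    uint32_t linked = atomic_load(&mem);    /* emulated LL observes A */

    atomic_store(&mem, 2);                  /* another vCPU stores B ... */
    atomic_store(&mem, 1);                  /* ... and then A again */

    /* Real LL/SC would fail here; cmpxchg only compares values. */
    return sc_via_cmpxchg(&mem, linked, 3);
}
```

aba_goes_unnoticed() returns true: the compare-and-swap cannot tell that the location was written twice in between, which is exactly the behaviour a guest relying on LL/SC semantics would not expect.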

Regards,
alvise

>
> Paolo



Re: [Qemu-devel] [PATCH v3] Add argument filters to the seccomp sandbox

2015-09-30 Thread Namsun Ch'o
> This looks good now.
> Thanks for your contribution.

> Acked-by: Eduardo Otubo 

> ps.: I'll create a pull request with all changes made so far on Friday.

I was told on IRC to submit patches in smaller chunks, with a few new filters
at a time. Should I wait until it is merged, or should I go ahead and post a
v1 patch in a new thread against the patched qemu-seccomp.c now?



[Qemu-devel] [PULL 3/6] migration: Parameters for auto-converge cpu throttling

2015-09-30 Thread Juan Quintela
From: "Jason J. Herne" 

Add migration parameters to allow the user to adjust the parameters
that control cpu throttling when auto-converge is in effect. The added
parameters are as follows:

x-cpu-throttle-initial : Initial percentage of time guest cpus are throttled
when migration auto-converge is activated.

x-cpu-throttle-increment: throttle percentage increase each time
auto-converge detects that migration is not making progress.
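
As a rough illustration of how the two parameters combine, the sketch below computes the throttle percentage after a given number of "no progress" detections. It is not the migration code: throttle_after() is a made-up helper, and the upper bound SKETCH_THROTTLE_MAX is an assumption for the example, since no cap is stated in this message.

```c
/* Hypothetical cap for this sketch; this message does not state one. */
#define SKETCH_THROTTLE_MAX 99

/*
 * Throttle percentage after 'rounds' detections that migration is not
 * making progress: the first round applies 'initial', each later round
 * adds 'increment'.  Returns 0 while throttling has not started.
 */
static int throttle_after(int rounds, int initial, int increment)
{
    int pct;

    if (rounds <= 0) {
        return 0;
    }
    pct = initial + (rounds - 1) * increment;
    return pct > SKETCH_THROTTLE_MAX ? SKETCH_THROTTLE_MAX : pct;
}
```

With the defaults defined in this patch (initial 20, increment 10), the third detection would give 40 percent under this model.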

Signed-off-by: Jason J. Herne 
Reviewed-by: Dr. David Alan Gilbert 
Signed-off-by: Juan Quintela 
Reviewed-by: Juan Quintela 
---
 hmp.c | 16 
 migration/migration.c | 46 +-
 qapi-schema.json  | 33 ++---
 3 files changed, 91 insertions(+), 4 deletions(-)

diff --git a/hmp.c b/hmp.c
index 3f807b7..48ce372 100644
--- a/hmp.c
+++ b/hmp.c
@@ -272,6 +272,12 @@ void hmp_info_migrate_parameters(Monitor *mon, const QDict 
*qdict)
 monitor_printf(mon, " %s: %" PRId64,
 MigrationParameter_lookup[MIGRATION_PARAMETER_DECOMPRESS_THREADS],
 params->decompress_threads);
+monitor_printf(mon, " %s: %" PRId64,
+
MigrationParameter_lookup[MIGRATION_PARAMETER_X_CPU_THROTTLE_INITIAL],
+params->x_cpu_throttle_initial);
+monitor_printf(mon, " %s: %" PRId64,
+
MigrationParameter_lookup[MIGRATION_PARAMETER_X_CPU_THROTTLE_INCREMENT],
+params->x_cpu_throttle_increment);
 monitor_printf(mon, "\n");
 }

@@ -1221,6 +1227,8 @@ void hmp_migrate_set_parameter(Monitor *mon, const QDict 
*qdict)
 bool has_compress_level = false;
 bool has_compress_threads = false;
 bool has_decompress_threads = false;
+bool has_x_cpu_throttle_initial = false;
+bool has_x_cpu_throttle_increment = false;
 int i;

 for (i = 0; i < MIGRATION_PARAMETER_MAX; i++) {
@@ -1235,10 +1243,18 @@ void hmp_migrate_set_parameter(Monitor *mon, const 
QDict *qdict)
 case MIGRATION_PARAMETER_DECOMPRESS_THREADS:
 has_decompress_threads = true;
 break;
+case MIGRATION_PARAMETER_X_CPU_THROTTLE_INITIAL:
+has_x_cpu_throttle_initial = true;
+break;
+case MIGRATION_PARAMETER_X_CPU_THROTTLE_INCREMENT:
+has_x_cpu_throttle_increment = true;
+break;
 }
 qmp_migrate_set_parameters(has_compress_level, value,
has_compress_threads, value,
has_decompress_threads, value,
+   has_x_cpu_throttle_initial, value,
+   has_x_cpu_throttle_increment, value,
&err);
 break;
 }
diff --git a/migration/migration.c b/migration/migration.c
index e48dd13..8a1af3b 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -44,6 +44,9 @@
 #define DEFAULT_MIGRATE_DECOMPRESS_THREAD_COUNT 2
 /*0: means nocompress, 1: best speed, ... 9: best compress ratio */
 #define DEFAULT_MIGRATE_COMPRESS_LEVEL 1
+/* Define default autoconverge cpu throttle migration parameters */
+#define DEFAULT_MIGRATE_X_CPU_THROTTLE_INITIAL 20
+#define DEFAULT_MIGRATE_X_CPU_THROTTLE_INCREMENT 10

 /* Migration XBZRLE default cache size */
 #define DEFAULT_MIGRATE_CACHE_SIZE (64 * 1024 * 1024)
@@ -71,6 +74,10 @@ MigrationState *migrate_get_current(void)
 DEFAULT_MIGRATE_COMPRESS_THREAD_COUNT,
 .parameters[MIGRATION_PARAMETER_DECOMPRESS_THREADS] =
 DEFAULT_MIGRATE_DECOMPRESS_THREAD_COUNT,
+.parameters[MIGRATION_PARAMETER_X_CPU_THROTTLE_INITIAL] =
+DEFAULT_MIGRATE_X_CPU_THROTTLE_INITIAL,
+.parameters[MIGRATION_PARAMETER_X_CPU_THROTTLE_INCREMENT] =
+DEFAULT_MIGRATE_X_CPU_THROTTLE_INCREMENT,
 };

 return &current_migration;
@@ -372,6 +379,10 @@ MigrationParameters *qmp_query_migrate_parameters(Error 
**errp)
 s->parameters[MIGRATION_PARAMETER_COMPRESS_THREADS];
 params->decompress_threads =
 s->parameters[MIGRATION_PARAMETER_DECOMPRESS_THREADS];
+params->x_cpu_throttle_initial =
+s->parameters[MIGRATION_PARAMETER_X_CPU_THROTTLE_INITIAL];
+params->x_cpu_throttle_increment =
+s->parameters[MIGRATION_PARAMETER_X_CPU_THROTTLE_INCREMENT];

 return params;
 }
@@ -494,7 +505,11 @@ void qmp_migrate_set_parameters(bool has_compress_level,
 bool has_compress_threads,
 int64_t compress_threads,
 bool has_decompress_threads,
-int64_t decompress_threads, Error **errp)
+int64_t decompress_threads,
+bool has_x_cpu_throttle_initial,
+int64_t x_cpu_thr

Re: [Qemu-devel] [PATCH v4 42/47] ivshmem: use strtosz()

2015-09-30 Thread Claudio Fontana
On 24.09.2015 13:37, marcandre.lur...@redhat.com wrote:
> From: Marc-André Lureau 
> 
> Use the common qemu utility function to parse the memory size.
> 
> Signed-off-by: Marc-André Lureau 
> ---
>  hw/misc/ivshmem.c | 36 +---
>  1 file changed, 5 insertions(+), 31 deletions(-)
> 
> diff --git a/hw/misc/ivshmem.c b/hw/misc/ivshmem.c
> index 273db36..0ee61d5 100644
> --- a/hw/misc/ivshmem.c
> +++ b/hw/misc/ivshmem.c
> @@ -646,33 +646,6 @@ static void ivshmem_reset(DeviceState *d)
>  ivshmem_use_msix(s);
>  }
>  
> -static uint64_t ivshmem_get_size(IVShmemState * s, Error **errp) {
> -
> -uint64_t value;
> -char *ptr;
> -
> -value = strtoull(s->sizearg, &ptr, 10);
> -switch (*ptr) {
> -case 0: case 'M': case 'm':
> -value <<= 20;
> -break;
> -case 'G': case 'g':
> -value <<= 30;
> -break;
> -default:
> -error_setg(errp, "invalid ram size: %s", s->sizearg);
> -return 0;
> -}
> -
> -/* BARs must be a power of 2 */
> -if (!is_power_of_2(value)) {
> -error_setg(errp, "size must be power of 2");
> -return 0;
> -}
> -
> -return value;
> -}
> -
>  static int ivshmem_setup_msi(IVShmemState * s)
>  {
>  if (msix_init_exclusive_bar(PCI_DEVICE(s), s->vectors, 1)) {
> @@ -700,16 +673,17 @@ static void pci_ivshmem_realize(PCIDevice *dev, Error 
> **errp)
>  uint8_t *pci_conf;
>  uint8_t attr = PCI_BASE_ADDRESS_SPACE_MEMORY |
>  PCI_BASE_ADDRESS_MEM_PREFETCH;
> -Error *local_err = NULL;
>  
>  if (s->sizearg == NULL) {
>  s->ivshmem_size = 4 << 20; /* 4 MB default */
>  } else {
> -s->ivshmem_size = ivshmem_get_size(s, &local_err);
> -if (local_err) {
> -error_propagate(errp, local_err);
> +char *end;
> +int64_t size = strtosz(s->sizearg, &end);

hmm the function name is now qemu_strtosz, changed by yourself.
Are you on latest master?

> +if (size < 0 || *end != '\0') {
> +error_setg(errp, "Invalid size %s", s->sizearg);
>  return;
>  }
> +s->ivshmem_size = size;
>  }
>  
>  fifo8_create(&s->incoming_fifo, sizeof(long));
> 




Re: [Qemu-devel] [PATCH v8 23/54] Add migration-capability boolean for postcopy-ram.

2015-09-30 Thread Amit Shah
On (Tue) 29 Sep 2015 [14:22:17], Eric Blake wrote:
> On 09/29/2015 02:37 AM, Dr. David Alan Gilbert (git) wrote:
> > From: "Dr. David Alan Gilbert" 
> > 
> > The 'postcopy ram' capability allows postcopy migration of RAM;
> > note that the migration starts off in precopy mode until
> > postcopy mode is triggered (see the migrate_start_postcopy
> > patch later in the series).
> > 
> > Signed-off-by: Dr. David Alan Gilbert 
> > Reviewed-by: Juan Quintela 
> > Reviewed-by: Amit Shah 
> > ---
> >  include/migration/migration.h |  1 +
> >  migration/migration.c | 23 +++
> >  qapi-schema.json  |  6 +-
> >  3 files changed, 29 insertions(+), 1 deletion(-)
> 
> Reviewed-by: Eric Blake 
> 
> I'm guessing the plan is to keep this experimental until a bit more
> experience is gained, to make sure we aren't missing anything essential
> in the use of postcopy.

From the cover letter:

I'm keeping the x-  for now, until the libvirt interface gets finalised.

I expect, though, that we'll merge this series in 2.5, and remove the
x- before the 2.5 release.  My main concern of the Linux interface
being not released in a stable release will be satisfied with the 4.3
kernel release.

Any concerns from the libvirt side?



Amit



[Qemu-devel] [PULL 1/6] migration: yet more possible state transitions

2015-09-30 Thread Juan Quintela
On destination, we move from INMIGRATE to FINISH_MIGRATE.  Add that to
the list of allowed states.

Signed-off-by: Juan Quintela 
Reviewed-by: Juan Quintela 
---
 vl.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/vl.c b/vl.c
index e211f6a..8d1846c 100644
--- a/vl.c
+++ b/vl.c
@@ -580,6 +580,7 @@ static const RunStateTransition runstate_transitions_def[] 
= {
 { RUN_STATE_INMIGRATE, RUN_STATE_SUSPENDED },
 { RUN_STATE_INMIGRATE, RUN_STATE_WATCHDOG },
 { RUN_STATE_INMIGRATE, RUN_STATE_GUEST_PANICKED },
+{ RUN_STATE_INMIGRATE, RUN_STATE_FINISH_MIGRATE },

 { RUN_STATE_INTERNAL_ERROR, RUN_STATE_PAUSED },
 { RUN_STATE_INTERNAL_ERROR, RUN_STATE_FINISH_MIGRATE },
-- 
2.4.3
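
As an aside, the table touched by this patch is a plain list of (from, to) pairs, so adding an allowed transition is a one-line change. A minimal sketch of how such a table can be consulted is below. All Demo* names are hypothetical and the table is cut down to four entries; this is not the code in vl.c.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced set of run states for the example. */
typedef enum {
    DEMO_STATE_INMIGRATE,
    DEMO_STATE_RUNNING,
    DEMO_STATE_PAUSED,
    DEMO_STATE_FINISH_MIGRATE,
    DEMO_STATE_MAX
} DemoRunState;

typedef struct {
    DemoRunState from;
    DemoRunState to;
} DemoTransition;

/* Each entry whitelists one transition; anything not listed is invalid. */
static const DemoTransition demo_transitions[] = {
    { DEMO_STATE_INMIGRATE, DEMO_STATE_RUNNING },
    { DEMO_STATE_INMIGRATE, DEMO_STATE_PAUSED },
    { DEMO_STATE_INMIGRATE, DEMO_STATE_FINISH_MIGRATE },
    { DEMO_STATE_RUNNING,   DEMO_STATE_PAUSED },
};

/* Linear search of the whitelist for the requested transition. */
static bool demo_transition_allowed(DemoRunState from, DemoRunState to)
{
    size_t i;

    for (i = 0; i < sizeof(demo_transitions) / sizeof(demo_transitions[0]); i++) {
        if (demo_transitions[i].from == from && demo_transitions[i].to == to) {
            return true;
        }
    }
    return false;
}
```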




Re: [Qemu-devel] qemu-system-alpha -nographic does not work

2015-09-30 Thread Richard Henderson

On 09/30/2015 02:36 PM, Dennis Luehring wrote:

~/qemu/alpha-softmmu/qemu-system-alpha -m 1GB -monitor 
telnet::4440,server,nowait\
  -kernel vmlinux.img-2.6.26-2-alpha-generic -initrd
initrd.img-2.6.26-2-alpha-generic\
  -net nic -net user -hda alpha.qcow2\
  -drive file=debian-5010-alpha-netinst.iso,if=ide,media=cdrom -append
'root=/dev/hda3' #-serial telnet::3000,server -nographic


You forgot "-append console=ttyS0".  The kernel simply isn't writing to the 
serial port.



r~



Re: [Qemu-devel] [PATCH 2/3] hw: do not pass NULL to memory_region_init from instance_init

2015-09-30 Thread Thomas Huth
On 29/09/15 14:37, Paolo Bonzini wrote:
> This causes the region to outlive the object, because it attaches the
> region to /machine.  This is not nice for the "realize" method, but
> much worse for "instance_init" because it can cause dangling pointers
> after a simple object_new/object_unref pair.
> 
> Reported-by: Markus Armbruster 
> Signed-off-by: Paolo Bonzini 
...
> diff --git a/hw/display/tcx.c b/hw/display/tcx.c
> index 4635800..bf119bc 100644
> --- a/hw/display/tcx.c
> +++ b/hw/display/tcx.c
> @@ -944,7 +944,7 @@ static void tcx_initfn(Object *obj)
>  SysBusDevice *sbd = SYS_BUS_DEVICE(obj);
>  TCXState *s = TCX(obj);
>  
> -memory_region_init_ram(&s->rom, NULL, "tcx.prom", FCODE_MAX_ROM_SIZE,
> +memory_region_init_ram(&s->rom, OBJECT(s), "tcx.prom", 
> FCODE_MAX_ROM_SIZE,
> &error_fatal);

Why "OBJECT(s)" and not simply "obj" ?

 Thomas




Re: [Qemu-devel] feature idea: allow user to run custom scripts

2015-09-30 Thread Markus Armbruster
"Dr. David Alan Gilbert"  writes:

> * Peter Maydell (peter.mayd...@linaro.org) wrote:
>> On 29 September 2015 at 14:11, Dr. David Alan Gilbert
>>  wrote:
>> > * Peter Maydell (peter.mayd...@linaro.org) wrote:
>> >> On 28 September 2015 at 20:43, Programmingkid
>> >>  wrote:
>> >> >
>> >> > On Sep 28, 2015, at 3:29 AM, Markus Armbruster wrote:
>> >> >> You didn't mention you're talking about a *GUI* feature.
>> >> >
>> >> > I'm thinking it would be easier to send in the patch rather
>> >> > than talk about
>> >> > what this feature could be.
>> >>
>> >> I think Markus and I are trying to save you that effort by
>> >> pointing out that this is a VM management layer feature,
>> >> not a core QEMU feature.
>> >
>> > OK, so I'm going to agree with Programmingkid here.
>> > I think this would be a useful feature to have in QEMU; I've
>> > got gratuitous hacks in some of my test scripts that work
>> > around it not being there.
>> >
>> > I think there are two possible things, both of which seem fairly
>> > easy:
>> >   1) Add a -chardev from file that works in this case
>> >  (I don't think the current chardev file works does it?)

In general, character devices provide a bidirectional pipe, but -chardev
file is write-only.  I think you want -chardev pipe.  I don't use it
myself, because as a socat user, I don't have to learn lesser tools :)

Here's how I use it.  Set up a local socket (any convenient
bidirectional pipe would do, actually).

Example: QMP

# Configuration file for -readconfig
[chardev "qmp"]
  backend = "socket"
  path = "sock-qmp"
  server = "on"
  wait = "off"

[mon "qmp"]
  mode = "control"
  chardev = "qmp"

Example: HMP

[chardev "hmp"]
  backend = "socket"
  path = "sock-hmp"
  server = "on"
  wait = "off"

[mon "hmp"]
  mode = "readline"
  chardev = "hmp"

Then do stuff with it.

Example: interactive QMP

$ socat UNIX:sock-qmp READLINE,history=$HOME/.qmp_history,prompt='QMP> '

Example: interactive HMP

$ socat UNIX:sock-hmp READLINE,history=$HOME/.hmp_history

Arguably superior to our built-in not-quite readline monitor.

Example: send QMP input from a file, capture its output in a file

$ socat UNIX:sock-qmp STDIO <input >output

>> >   2) A 'source' like command.

QMP?  The command would have to take a filename as argument, and return
a list of replies.  Probably stop on first failed command.  Pretty
useless for remote clients, because if you have to upload the file, you
can just as well send it down the QMP pipe.  Actually, that pretty much
applies to local clients, too.  Except perhaps for interactive use.  I
feel a QMP client geared for such use would be the appropriate home for
this feature.  We have some in scripts/qmp/.

I don't have an opinion on HMP right now.

>> Yeah, these are both plausible. Neither of them are GUI features,
>> though...
>
> Well, I don't use the GTK gui; I can see that those who do
> might want features in it.

GUI users want GUI features, of course.

In my opinion, QEMU should leave them to separate GUI shells, because
doing everything in QEMU distracts from our core mission and we don't
have GUI expertise[*].  One more point: building in the GUI is
problematic when you don't trust the guest, because then you really want
to run QEMU with least privileges.


[*] Short version of the argument, for the long one, see
Message-ID: <87oahn51ys@blackfin.pond.sub.org>
http://lists.gnu.org/archive/html/qemu-devel/2015-08/msg03916.html



Re: [Qemu-devel] [PATCH v4 31/47] contrib: add ivshmem client and server

2015-09-30 Thread Claudio Fontana
On 24.09.2015 13:37, marcandre.lur...@redhat.com wrote:
> From: David Marchand 
> 
> When using ivshmem devices, notifications between guests can be sent as
> interrupts using a ivshmem-server (typical use described in documentation).
> The client is provided as a debug tool.
> 
> Signed-off-by: Olivier Matz 
> Signed-off-by: David Marchand 
> [fix a valgrind warning, option and server_close() segvs, extra server
> headers includes]
> Signed-off-by: Marc-André Lureau 

two small things below, the return value of getopt is int.

> ---
>  Makefile|   8 +
>  configure   |   3 +
>  contrib/ivshmem-client/ivshmem-client.c | 433 
> 
>  contrib/ivshmem-client/ivshmem-client.h | 212 
>  contrib/ivshmem-client/main.c   | 239 ++
>  contrib/ivshmem-server/ivshmem-server.c | 422 +++
>  contrib/ivshmem-server/ivshmem-server.h | 166 
>  contrib/ivshmem-server/main.c   | 264 +++
>  qemu-doc.texi   |  10 +-
>  9 files changed, 1754 insertions(+), 3 deletions(-)
>  create mode 100644 contrib/ivshmem-client/ivshmem-client.c
>  create mode 100644 contrib/ivshmem-client/ivshmem-client.h
>  create mode 100644 contrib/ivshmem-client/main.c
>  create mode 100644 contrib/ivshmem-server/ivshmem-server.c
>  create mode 100644 contrib/ivshmem-server/ivshmem-server.h
>  create mode 100644 contrib/ivshmem-server/main.c
> 
> diff --git a/Makefile b/Makefile
> index 8ec9b69..8e5dc12 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -325,6 +325,14 @@ msi:
>   @echo "MSI build not configured or dependency resolution failed 
> (reconfigure with --enable-guest-agent-msi option)"
>  endif
>  
> +IVSHMEM_CLIENT_OBJS=$(addprefix $(SRC_PATH)/contrib/ivshmem-client/, 
> ivshmem-client.o main.o)
> +ivshmem-client$(EXESUF): $(IVSHMEM_CLIENT_OBJS)
> + $(call LINK, $^)
> +
> +IVSHMEM_SERVER_OBJS=$(addprefix $(SRC_PATH)/contrib/ivshmem-server/, 
> ivshmem-server.o main.o)
> +ivshmem-server$(EXESUF): $(IVSHMEM_SERVER_OBJS) libqemuutil.a libqemustub.a
> + $(call LINK, $^)
> +
>  clean:
>  # avoid old build problems by removing potentially incorrect old files
>   rm -f config.mak op-i386.h opc-i386.h gen-op-i386.h op-arm.h opc-arm.h 
> gen-op-arm.h
> diff --git a/configure b/configure
> index 52f5b79..88f518f 100755
> --- a/configure
> +++ b/configure
> @@ -4375,6 +4375,9 @@ if test "$want_tools" = "yes" ; then
>if [ "$linux" = "yes" -o "$bsd" = "yes" -o "$solaris" = "yes" ] ; then
>  tools="qemu-nbd\$(EXESUF) $tools"
>fi
> +  if [ "$kvm" = "yes" ] ; then
> +tools="ivshmem-client\$(EXESUF) ivshmem-server\$(EXESUF) $tools"
> +  fi
>  fi
>  if test "$softmmu" = yes ; then
>if test "$virtfs" != no ; then
> diff --git a/contrib/ivshmem-client/ivshmem-client.c 
> b/contrib/ivshmem-client/ivshmem-client.c
> new file mode 100644
> index 000..11c805c
> --- /dev/null
> +++ b/contrib/ivshmem-client/ivshmem-client.c
> @@ -0,0 +1,433 @@
> +/*
> + * Copyright 6WIND S.A., 2014
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or
> + * (at your option) any later version.  See the COPYING file in the
> + * top-level directory.
> + */
> +
> +#include 
> +#include 
> +#include 
> +
> +#include "qemu-common.h"
> +#include "qemu/queue.h"
> +
> +#include "ivshmem-client.h"
> +
> +/* log a message on stdout if verbose=1 */
> +#define IVSHMEM_CLIENT_DEBUG(client, fmt, ...) do { \
> +if ((client)->verbose) { \
> +printf(fmt, ## __VA_ARGS__); \
> +}\
> +} while (0)
> +
> +/* read message from the unix socket */
> +static int
> +ivshmem_client_read_one_msg(IvshmemClient *client, long *index, int *fd)
> +{
> +int ret;
> +struct msghdr msg;
> +struct iovec iov[1];
> +union {
> +struct cmsghdr cmsg;
> +char control[CMSG_SPACE(sizeof(int))];
> +} msg_control;
> +struct cmsghdr *cmsg;
> +
> +iov[0].iov_base = index;
> +iov[0].iov_len = sizeof(*index);
> +
> +memset(&msg, 0, sizeof(msg));
> +msg.msg_iov = iov;
> +msg.msg_iovlen = 1;
> +msg.msg_control = &msg_control;
> +msg.msg_controllen = sizeof(msg_control);
> +
> +ret = recvmsg(client->sock_fd, &msg, 0);
> +if (ret < 0) {
> +IVSHMEM_CLIENT_DEBUG(client, "cannot read message: %s\n",
> + strerror(errno));
> +return -1;
> +}
> +if (ret == 0) {
> +IVSHMEM_CLIENT_DEBUG(client, "lost connection to server\n");
> +return -1;
> +}
> +
> +*fd = -1;
> +
> +for (cmsg = CMSG_FIRSTHDR(&msg); cmsg; cmsg = CMSG_NXTHDR(&msg, cmsg)) {
> +
> +if (cmsg->cmsg_len != CMSG_LEN(sizeof(int)) ||
> +cmsg->cmsg_level != SOL_SOCKET ||
> +cmsg->cmsg_type != SCM_RIGHTS) {
> +continue;
> +}
> +
> +memcp

Re: [Qemu-devel] [PATCHv3 4/7] vfio: Record host IOMMU's available IO page sizes

2015-09-30 Thread Laurent Vivier


On 30/09/2015 04:13, David Gibson wrote:
> Depending on the host IOMMU type we determine and record the available page
> sizes for IOMMU translation.  We'll need this for other validation in
> future patches.
> 
> Signed-off-by: David Gibson 
> Reviewed-by: Thomas Huth 
> Reviewed-by: Laurent Vivier 
> ---
>  hw/vfio/common.c  | 13 +
>  include/hw/vfio/vfio-common.h |  1 +
>  2 files changed, 14 insertions(+)
> 
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index 2faf492..f666de2 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -677,6 +677,7 @@ static int vfio_connect_container(VFIOGroup *group, 
> AddressSpace *as)
>  if (ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_TYPE1_IOMMU) ||
>  ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_TYPE1v2_IOMMU)) {
>  bool v2 = !!ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_TYPE1v2_IOMMU);
> +struct vfio_iommu_type1_info info;
>  
>  ret = ioctl(group->fd, VFIO_GROUP_SET_CONTAINER, &fd);
>  if (ret) {
> @@ -702,6 +703,15 @@ static int vfio_connect_container(VFIOGroup *group, 
> AddressSpace *as)
>   */
>  container->min_iova = 0;
>  container->max_iova = (hwaddr)-1;
> +
> +/* Assume just 4K IOVA page size */
> +container->iova_pgsizes = 0x1000;
> +info.argsz = sizeof(info);
> +ret = ioctl(fd, VFIO_IOMMU_GET_INFO, &info);
> +/* Ignore errors */
> +if ((ret == 0) && (info.flags & VFIO_IOMMU_INFO_PGSIZES)) {
> +container->iova_pgsizes = info.iova_pgsizes;
> +}
>  } else if (ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_SPAPR_TCE_IOMMU)) {
>  struct vfio_iommu_spapr_tce_info info;
>  
> @@ -744,6 +754,9 @@ static int vfio_connect_container(VFIOGroup *group, 
> AddressSpace *as)
>  }
>  container->min_iova = info.dma32_window_start;
>  container->max_iova = container->min_iova + info.dma32_window_size - 
> 1;
> +
> +/* Assume just 4K IOVA pages for now */
> +container->iova_pgsizes = 0x1000;
>  } else {
>  error_report("vfio: No available IOMMU models");
>  ret = -EINVAL;
> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
> index 27a14c0..f037f3c 100644
> --- a/include/hw/vfio/vfio-common.h
> +++ b/include/hw/vfio/vfio-common.h
> @@ -71,6 +71,7 @@ typedef struct VFIOContainer {
>   * future
>   */
>  hwaddr min_iova, max_iova;
> +uint64_t iova_pgsizes;
>  QLIST_HEAD(, VFIOGuestIOMMU) giommu_list;
>  QLIST_HEAD(, VFIOGroup) group_list;
>  QLIST_ENTRY(VFIOContainer) next;
> 
Reviewed-by: Laurent Vivier 
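
For readers following along: iova_pgsizes is a bitmap in which each set bit stands for one supported page size, so 0x1000 means "4K only". How later patches in the series consume the field is not shown here, so the helpers below are only a sketch of typical queries on such a bitmap, with hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Smallest supported page size: the lowest set bit of the bitmap.
 * Returns 0 if the bitmap is empty. */
static uint64_t demo_min_pgsize(uint64_t pgsizes)
{
    return pgsizes & (~pgsizes + 1);
}

/* True if 'size' is a single power of two that is set in the bitmap. */
static bool demo_pgsize_supported(uint64_t pgsizes, uint64_t size)
{
    return size != 0 && (size & (size - 1)) == 0 && (pgsizes & size) != 0;
}
```

For example, a bitmap of 0x201000 (4K and 2M) has a minimum page size of 0x1000 and supports 0x200000 but not 0x10000.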



Re: [Qemu-devel] [PATCH v12 0/5] remove icc bus/bridge

2015-09-30 Thread Zhu Guihua

Hi Eduardo,

Can you help merge this patch series to your x86 tree?

Thanks,
Zhu

On 09/16/2015 05:19 PM, Zhu Guihua wrote:

The ICC bus was used to provide a hotpluggable bus for the APIC and CPU, but now
we use HotplugHandler to handle hotplug, so the ICC bus is unnecessary.

This code has passed the new pc-cpu-test.
I have also tested with KVM with kernel_irqchip=on and off; it works fine.

This patch series is based on the latest master.

v12:
  -move APIC MMIO mapping into x86_cpu_apic_realize()
  -change commit message in PATCH 4

v11:
  -improve commit messages
  -split per CPU AS change into a separate patch

v10:
  -improve commit messages in patch 1 and 2
  -make the check of cpu->cpu_as_root simpler

v9:
  -use a callback to correct reset sequence for x86
  -update apic mmio mapping

Chen Fan (2):
   apic: move APIC's MMIO region mapping into APIC
   cpu/apic: drop icc bus/bridge

Zhu Guihua (3):
   apic: use per CPU AS to map APIC MMIO for TCG
   x86: use new method to correct reset sequence
   icc_bus: drop the unused files

  default-configs/i386-softmmu.mak   |   1 -
  default-configs/x86_64-softmmu.mak |   1 -
  hw/cpu/Makefile.objs   |   1 -
  hw/cpu/icc_bus.c   | 118 -
  hw/i386/pc.c   |  46 ---
  hw/i386/pc_piix.c  |   9 +--
  hw/i386/pc_q35.c   |   9 +--
  hw/intc/apic_common.c  |  11 +---
  include/hw/cpu/icc_bus.h   |  82 --
  include/hw/i386/apic_internal.h|   7 ++-
  include/hw/i386/pc.h   |   2 +-
  target-i386/cpu.c  |  33 ---
  12 files changed, 58 insertions(+), 262 deletions(-)
  delete mode 100644 hw/cpu/icc_bus.c
  delete mode 100644 include/hw/cpu/icc_bus.h






Re: [Qemu-devel] [PATCHv3 1/7] vfio: Remove unneeded union from VFIOContainer

2015-09-30 Thread Laurent Vivier


On 30/09/2015 04:13, David Gibson wrote:
> Currently the VFIOContainer iommu_data field contains a union with
> different information for different host iommu types.  However:
>* It only actually contains information for the x86-like "Type1" iommu
>* Because we have a common listener the Type1 fields are actually used
> on all IOMMU types, including the SPAPR TCE type as well
> 
> In fact we now have a general structure for the listener which is unlikely
> to ever need per-iommu-type information, so this patch removes the union.
> 
> In a similar way we can unify the setup of the vfio memory listener in
> vfio_connect_container() that is currently split across a switch on iommu
> type, but is effectively the same in both cases.
> 
> The iommu_data.release pointer was only needed as a cleanup function
> which would handle potentially different data in the union.  With the
> union gone, it too can be removed.
> 
> Signed-off-by: David Gibson 
> ---
>  hw/vfio/common.c  | 52 
> ---
>  include/hw/vfio/vfio-common.h | 16 +++--
>  2 files changed, 22 insertions(+), 46 deletions(-)
> 
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index 0d341a3..1545f62 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -315,8 +315,7 @@ out:
>  static void vfio_listener_region_add(MemoryListener *listener,
>   MemoryRegionSection *section)
>  {
> -VFIOContainer *container = container_of(listener, VFIOContainer,
> -iommu_data.type1.listener);
> +VFIOContainer *container = container_of(listener, VFIOContainer, 
> listener);
>  hwaddr iova, end;
>  Int128 llend;
>  void *vaddr;
> @@ -406,9 +405,9 @@ static void vfio_listener_region_add(MemoryListener 
> *listener,
>   * can gracefully fail.  Runtime, there's not much we can do other
>   * than throw a hardware error.
>   */
> -if (!container->iommu_data.type1.initialized) {
> -if (!container->iommu_data.type1.error) {
> -container->iommu_data.type1.error = ret;
> +if (!container->initialized) {
> +if (!container->error) {
> +container->error = ret;
>  }
>  } else {
>  hw_error("vfio: DMA mapping failed, unable to continue");
> @@ -419,8 +418,7 @@ static void vfio_listener_region_add(MemoryListener 
> *listener,
>  static void vfio_listener_region_del(MemoryListener *listener,
>   MemoryRegionSection *section)
>  {
> -VFIOContainer *container = container_of(listener, VFIOContainer,
> -iommu_data.type1.listener);
> +VFIOContainer *container = container_of(listener, VFIOContainer, 
> listener);
>  hwaddr iova, end;
>  int ret;
>  
> @@ -485,7 +483,7 @@ static const MemoryListener vfio_memory_listener = {
>  
>  static void vfio_listener_release(VFIOContainer *container)
>  {
> -memory_listener_unregister(&container->iommu_data.type1.listener);
> +memory_listener_unregister(&container->listener);
>  }
>  
>  int vfio_mmap_region(Object *obj, VFIORegion *region,
> @@ -683,21 +681,6 @@ static int vfio_connect_container(VFIOGroup *group, 
> AddressSpace *as)
>  ret = -errno;
>  goto free_container_exit;
>  }
> -
> -container->iommu_data.type1.listener = vfio_memory_listener;
> -container->iommu_data.release = vfio_listener_release;
> -
> -memory_listener_register(&container->iommu_data.type1.listener,
> - container->space->as);
> -
> -if (container->iommu_data.type1.error) {
> -ret = container->iommu_data.type1.error;
> -error_report("vfio: memory listener initialization failed for 
> container");
> -goto listener_release_exit;
> -}
> -
> -container->iommu_data.type1.initialized = true;
> -
>  } else if (ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_SPAPR_TCE_IOMMU)) {
>  ret = ioctl(group->fd, VFIO_GROUP_SET_CONTAINER, &fd);
>  if (ret) {
> @@ -723,19 +706,24 @@ static int vfio_connect_container(VFIOGroup *group, 
> AddressSpace *as)
>  ret = -errno;
>  goto free_container_exit;
>  }
> -
> -container->iommu_data.type1.listener = vfio_memory_listener;
> -container->iommu_data.release = vfio_listener_release;
> -
> -memory_listener_register(&container->iommu_data.type1.listener,
> - container->space->as);
> -
>  } else {
>  error_report("vfio: No available IOMMU models");
>  ret = -EINVAL;
>  goto free_container_exit;
>  }
>  
> +container->listener = vfio_memory_listener;
> +
> +memory_listener_register(&container->listener, container->space->as);
> +
> +if (container->error) {
> +ret 

Re: [Qemu-devel] [PATCHv3 5/7] memory: Allow replay of IOMMU mapping notifications

2015-09-30 Thread Laurent Vivier


On 30/09/2015 04:13, David Gibson wrote:
> When we have guest visible IOMMUs, we allow notifiers to be registered
> which will be informed of all changes to IOMMU mappings.  This is used by
> vfio to keep the host IOMMU mappings in sync with guest IOMMU mappings.
> 
> However, unlike with a memory region listener, an iommu notifier won't be
> told about any mappings which already exist in the (guest) IOMMU at the
> time it is registered.  This can cause problems if hotplugging a VFIO
> device onto a guest bus which had existing guest IOMMU mappings, but didn't
> previously have any VFIO devices (and hence no host IOMMU mappings).
> 
> This adds a memory_region_iommu_replay() function to handle this case.  It
> replays any existing mappings in an IOMMU memory region to a specified
> notifier.  Because the IOMMU memory region doesn't internally remember the
> granularity of the guest IOMMU it has a small hack where the caller must
> specify a granularity at which to replay mappings.
> 
> If there are finer mappings in the guest IOMMU these will be reported in
> the iotlb structures passed to the notifier which it must handle (probably
> causing it to flag an error).  This isn't new - the VFIO iommu notifier
> must already handle notifications about guest IOMMU mappings too short
> for it to represent in the host IOMMU.
> 
> Signed-off-by: David Gibson 
> ---
>  include/exec/memory.h | 13 +
>  memory.c  | 20 
>  2 files changed, 33 insertions(+)
> 
> diff --git a/include/exec/memory.h b/include/exec/memory.h
> index 5baaf48..0f07159 100644
> --- a/include/exec/memory.h
> +++ b/include/exec/memory.h
> @@ -583,6 +583,19 @@ void memory_region_notify_iommu(MemoryRegion *mr,
>  void memory_region_register_iommu_notifier(MemoryRegion *mr, Notifier *n);
>  
>  /**
> + * memory_region_iommu_replay: replay existing IOMMU translations to
> + * a notifier
> + *
> + * @mr: the memory region to observe
> + * @n: the notifier to which to replay iommu mappings
> + * @granularity: Minimum page granularity to replay notifications for
> + * @is_write: Whether to treat the replay as a translate "write"
> + * through the iommu
> + */
> +void memory_region_iommu_replay(MemoryRegion *mr, Notifier *n,
> +hwaddr granularity, bool is_write);
> +
> +/**
>   * memory_region_unregister_iommu_notifier: unregister a notifier for
>   * changes to IOMMU translation entries.
>   *
> diff --git a/memory.c b/memory.c
> index ef87363..1b03d22 100644
> --- a/memory.c
> +++ b/memory.c
> @@ -1403,6 +1403,26 @@ void 
> memory_region_register_iommu_notifier(MemoryRegion *mr, Notifier *n)
>  notifier_list_add(&mr->iommu_notify, n);
>  }
>  
> +void memory_region_iommu_replay(MemoryRegion *mr, Notifier *n,
> +hwaddr granularity, bool is_write)
> +{
> +hwaddr addr;
> +IOMMUTLBEntry iotlb;
> +
> +for (addr = 0; addr < memory_region_size(mr); addr += granularity) {
> +iotlb = mr->iommu_ops->translate(mr, addr, is_write);
> +if (iotlb.perm != IOMMU_NONE) {
> +n->notify(n, &iotlb);
> +}
> +
> +/* if (2^64 - MR size) < granularity, it's possible to get an
> + * infinite loop here.  This should catch such a wraparound */
> +if ((addr + granularity) < addr) {
> +break;
> +}
> +}
> +}
> +
>  void memory_region_unregister_iommu_notifier(Notifier *n)
>  {
>  notifier_remove(n);
> 
Reviewed-by: Laurent Vivier 
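
A note on the wraparound check in the loop above: when the region size is within one granule of 2^64, addr += granularity wraps to a small value that is still below the size, so without the check the loop would start over. The standalone sketch below reproduces just the loop with a visit callback so the behaviour can be seen in isolation. The names are hypothetical and the zero-granularity guard is an addition for the example, not part of the patch.

```c
#include <stdint.h>

typedef void (*DemoVisitFn)(uint64_t addr, void *opaque);

/* Call visit() for every granule-aligned address in [0, size). */
static void demo_replay_range(uint64_t size, uint64_t granularity,
                              DemoVisitFn visit, void *opaque)
{
    uint64_t addr;

    if (granularity == 0) {
        return;                 /* would never advance */
    }

    for (addr = 0; addr < size; addr += granularity) {
        visit(addr, opaque);

        /* If the next address wraps past 2^64, stop here instead of
         * restarting from a low address. */
        if (addr + granularity < addr) {
            break;
        }
    }
}

/* Example callback: count how many addresses were visited. */
static void demo_count_visit(uint64_t addr, void *opaque)
{
    (void)addr;
    (*(unsigned *)opaque)++;
}
```

With size UINT64_MAX and a granularity of 2^63 the callback runs twice (addresses 0 and 2^63) and the loop then stops.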



Re: [Qemu-devel] [Qemu-ppc] [RFC/PATCH] monitor/ppc: Access all SPRs from the monitor

2015-09-30 Thread Benjamin Herrenschmidt
On Wed, 2015-09-30 at 16:03 +1000, David Gibson wrote:
> On Sun, Sep 27, 2015 at 04:31:16PM +1000, Benjamin Herrenschmidt wrote:
> > We already have a table with all supported SPRs along with their names,
> > so let's use that rather than a duplicate table that is perpetually
> > out of sync in the monitor code.
> > 
> > This adds a new monitor hook target_extra_monitor_def() which is called
> > if nothing is found in the normal table. We still use the old mechanism
> > for anything that isn't an SPR.
> > 
> > Signed-off-by: Benjamin Herrenschmidt 
> 
> This looks like a good idea, but it seems to be a slightly different
> approach from the one taken by some rather similar patches Alexey
> posted recently.
> 
> Would you care to co-ordinate on which of those approaches to go ahead
> with?

The code upstream has changed quite a bit...

> [snip]
> > @@ -253,3 +180,23 @@ const MonitorDef *target_monitor_defs(void)
> >  {
> >  return monitor_defs;
> >  }
> > +
> > +int target_extra_monitor_def(uint64_t *pval, const char *name)
> > +{
> > + /* On ppc, search through the SPRs so we can print any of them */
> > +{
>^
> Also, this appears to be a redundant set of braces.

Ah right, that used to be inside the caller (monitor_defs()) and I
moved it to a hook and forgot to take out the extra braces.

I'll respin.

>  +CPUArchState *env = mon_get_cpu_env();
> > +ppc_spr_t *spr_cb = env->spr_cb;
> > +int i;
> > +
> > +for (i = 0; i < 1024; i++) {
> > +if (!spr_cb[i].name || strcasecmp(name, spr_cb[i].name)) {
> > +continue;
> > +}
> > +*pval = env->spr[i];
> > +return 0;
> > +}
> > +}
> > +return -1;
> > +}
> > +
> > 
> > 
> > 
> 
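
For reference, the lookup in the quoted hook boils down to a case-insensitive search over a table in which unused slots have a NULL name. A self-contained sketch of that pattern, without the redundant braces, is below. DemoReg and demo_lookup_reg() are hypothetical stand-ins, not the QEMU types.

```c
#include <stddef.h>
#include <stdint.h>
#include <strings.h>    /* strcasecmp() (POSIX) */

/* Hypothetical register table entry; name is NULL for unused slots. */
typedef struct {
    const char *name;
    uint64_t value;
} DemoReg;

/* Store the value of the register called 'name' in *pval.
 * Returns 0 if found, -1 otherwise. */
static int demo_lookup_reg(const DemoReg *regs, size_t n,
                           const char *name, uint64_t *pval)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (!regs[i].name || strcasecmp(name, regs[i].name) != 0) {
            continue;           /* unused slot or different name */
        }
        *pval = regs[i].value;
        return 0;
    }
    return -1;
}
```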



Re: [Qemu-devel] [Qemu-ppc] [PATCH] ppc/spapr: Allow VIRTIO_VGA

2015-09-30 Thread Gerd Hoffmann
On Mi, 2015-09-30 at 15:42 +1000, David Gibson wrote:
> On Wed, Sep 16, 2015 at 08:52:23AM +0200, Gerd Hoffmann wrote:
> > On Mi, 2015-09-16 at 07:08 +1000, Benjamin Herrenschmidt wrote:
> > > On Tue, 2015-09-15 at 11:19 +0200, Gerd Hoffmann wrote:
> > > > On Di, 2015-09-15 at 15:51 +1000, Benjamin Herrenschmidt wrote:
> > > > > It works fine with the Linux driver out of the box
> > > > 
> > > > Do you actually want the vga compatibility bits on pseries?
> > > 
> > > Yes, our firmware SLOF uses them (via MMIO BARs) for the early boot
> > > stuff (well, it will use them when the patches I sent are merged).
> > 
> > Fine then, patch queued up.
> 
> Just to clarify, Gerd,
> 
> You've taken this through your tree and I don't need to stage it in
> spapr-next?

If you prepare a spapr pull req anyway feel free to include it there.
Otherwise it'll go in with my next vga pull request, it's sitting in the
vga queue (it is the only patch there though ...)

cheers,
  Gerd





[Qemu-devel] [PATCH v4 01/26] tcg: Rename debug_insn_start to insn_start

2015-09-30 Thread Richard Henderson
With an eye toward making it mandatory.

Reviewed-by: Aurelien Jarno 
Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 target-alpha/translate.c  | 2 +-
 target-arm/translate-a64.c| 2 +-
 target-arm/translate.c| 2 +-
 target-cris/translate.c   | 4 ++--
 target-cris/translate_v10.c   | 2 +-
 target-i386/translate.c   | 2 +-
 target-lm32/translate.c   | 2 +-
 target-m68k/translate.c   | 2 +-
 target-microblaze/translate.c | 2 +-
 target-mips/translate.c   | 2 +-
 target-moxie/translate.c  | 2 +-
 target-openrisc/translate.c   | 2 +-
 target-ppc/translate.c| 2 +-
 target-s390x/translate.c  | 2 +-
 target-sh4/translate.c| 2 +-
 target-sparc/translate.c  | 2 +-
 target-tilegx/translate.c | 2 +-
 target-unicore32/translate.c  | 2 +-
 target-xtensa/translate.c | 2 +-
 tcg/tcg-op.h  | 6 +++---
 tcg/tcg-opc.h | 4 ++--
 tcg/tcg.c | 6 +++---
 tci.c | 9 -
 23 files changed, 28 insertions(+), 37 deletions(-)

diff --git a/target-alpha/translate.c b/target-alpha/translate.c
index 2ba5fb8..76916f4 100644
--- a/target-alpha/translate.c
+++ b/target-alpha/translate.c
@@ -2940,7 +2940,7 @@ static inline void 
gen_intermediate_code_internal(AlphaCPU *cpu,
 num_insns++;
 
if (unlikely(qemu_loglevel_mask(CPU_LOG_TB_OP | CPU_LOG_TB_OP_OPT))) {
-tcg_gen_debug_insn_start(ctx.pc);
+tcg_gen_insn_start(ctx.pc);
 }
 
 TCGV_UNUSED_I64(ctx.zero);
diff --git a/target-arm/translate-a64.c b/target-arm/translate-a64.c
index ec0936c..a618711 100644
--- a/target-arm/translate-a64.c
+++ b/target-arm/translate-a64.c
@@ -11109,7 +11109,7 @@ void gen_intermediate_code_internal_a64(ARMCPU *cpu,
 }
 
 if (unlikely(qemu_loglevel_mask(CPU_LOG_TB_OP | CPU_LOG_TB_OP_OPT))) {
-tcg_gen_debug_insn_start(dc->pc);
+tcg_gen_insn_start(dc->pc);
 }
 
 if (dc->ss_active && !dc->pstate_ss) {
diff --git a/target-arm/translate.c b/target-arm/translate.c
index 84a21ac..b521fc8 100644
--- a/target-arm/translate.c
+++ b/target-arm/translate.c
@@ -11353,7 +11353,7 @@ static inline void 
gen_intermediate_code_internal(ARMCPU *cpu,
 gen_io_start();
 
 if (unlikely(qemu_loglevel_mask(CPU_LOG_TB_OP | CPU_LOG_TB_OP_OPT))) {
-tcg_gen_debug_insn_start(dc->pc);
+tcg_gen_insn_start(dc->pc);
 }
 
 if (dc->ss_active && !dc->pstate_ss) {
diff --git a/target-cris/translate.c b/target-cris/translate.c
index d5b54e1..c5a22af 100644
--- a/target-cris/translate.c
+++ b/target-cris/translate.c
@@ -2995,8 +2995,8 @@ static unsigned int crisv32_decoder(CPUCRISState *env, 
DisasContext *dc)
 int i;
 
 if (unlikely(qemu_loglevel_mask(CPU_LOG_TB_OP | CPU_LOG_TB_OP_OPT))) {
-tcg_gen_debug_insn_start(dc->pc);
-}
+tcg_gen_insn_start(dc->pc);
+}
 
 /* Load a halfword onto the instruction register.  */
 dc->ir = cris_fetch(env, dc, dc->pc, 2, 0);
diff --git a/target-cris/translate_v10.c b/target-cris/translate_v10.c
index da0b2ca..12d7dfc 100644
--- a/target-cris/translate_v10.c
+++ b/target-cris/translate_v10.c
@@ -1200,7 +1200,7 @@ static unsigned int crisv10_decoder(CPUCRISState *env, 
DisasContext *dc)
 unsigned int insn_len = 2;
 
 if (unlikely(qemu_loglevel_mask(CPU_LOG_TB_OP)))
-tcg_gen_debug_insn_start(dc->pc);
+tcg_gen_insn_start(dc->pc);
 
 /* Load a halfword onto the instruction register.  */
 dc->ir = cpu_lduw_code(env, dc->pc);
diff --git a/target-i386/translate.c b/target-i386/translate.c
index 8b35de1..c18f82b 100644
--- a/target-i386/translate.c
+++ b/target-i386/translate.c
@@ -4402,7 +4402,7 @@ static target_ulong disas_insn(CPUX86State *env, 
DisasContext *s,
 int rex_w, rex_r;
 
 if (unlikely(qemu_loglevel_mask(CPU_LOG_TB_OP | CPU_LOG_TB_OP_OPT))) {
-tcg_gen_debug_insn_start(pc_start);
+tcg_gen_insn_start(pc_start);
 }
 s->pc = pc_start;
 prefixes = 0;
diff --git a/target-lm32/translate.c b/target-lm32/translate.c
index cf7042e..b1b4cbb 100644
--- a/target-lm32/translate.c
+++ b/target-lm32/translate.c
@@ -1006,7 +1006,7 @@ static const DecoderInfo decinfo[] = {
 static inline void decode(DisasContext *dc, uint32_t ir)
 {
 if (unlikely(qemu_loglevel_mask(CPU_LOG_TB_OP | CPU_LOG_TB_OP_OPT))) {
-tcg_gen_debug_insn_start(dc->pc);
+tcg_gen_insn_start(dc->pc);
 }
 
 dc->ir = ir;
diff --git a/target-m68k/translate.c b/target-m68k/translate.c
index 3cdf665..e34bf2b 100644
--- a/target-m68k/translate.c
+++ b/target-m68k/translate.c
@@ -2956,7 +2956,7 @@ static void disas_m68k_insn(CPUM68KState * env, 
DisasContext *s)
 uint16_t insn;
 
 if (unlikely(qemu_loglevel_mask(CPU_LOG_TB_OP | CPU_LOG_TB_OP_OPT))) {
-tcg_gen_debug_insn_start(s->pc);
+tcg_gen_insn_start(s->pc);
 }
 
   

Re: [Qemu-devel] [Qemu-ppc] [PATCH 00/10][TRIVIAL] Define categories for some PPC devices

2015-09-30 Thread David Gibson
On Sat, Sep 26, 2015 at 06:22:02PM +0200, Laurent Vivier wrote:
> Some PPC devices appear uncategorized in the output of
> "-device ?". This series tries to categorize some of
> them.

These all look good to me.

I've merged them to a new 'ppc-next' staging branch at
git://github.com/dgibson/qemu.git

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




[Qemu-devel] [PATCH v4 26/26] tcg: Adjust CODE_GEN_AVG_BLOCK_SIZE

2015-09-30 Thread Richard Henderson
At present, the "average" guestimate of TB size is way too small, leading
to many unused entries in the pre-allocated TB array.  For a guest with 1GB
ram, we're currently allocating 256MB for the array.

Survey arm, alpha, aarch64, ppc, sparc, i686, x86_64 guests running on
x86_64 and ppc64 hosts and select a new average.  The size of the array
drops to 81MB with no more flushing than before.

Signed-off-by: Richard Henderson 
---
 include/exec/exec-all.h | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index 71c9d85..a63fd60 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -161,13 +161,14 @@ static inline void tlb_flush_by_mmuidx(CPUState *cpu, ...)
 #define CODE_GEN_PHYS_HASH_BITS 15
 #define CODE_GEN_PHYS_HASH_SIZE (1 << CODE_GEN_PHYS_HASH_BITS)
 
-/* estimated block size for TB allocation */
-/* XXX: use a per code average code fragment size and modulate it
-   according to the host CPU */
+/* Estimated block size for TB allocation.  */
+/* ??? The following is based on a 2015 survey of x86_64 host output.
+   Better would seem to be some sort of dynamically sized TB array,
+   adapting to the block sizes actually being produced.  */
 #if defined(CONFIG_SOFTMMU)
-#define CODE_GEN_AVG_BLOCK_SIZE 128
+#define CODE_GEN_AVG_BLOCK_SIZE 400
 #else
-#define CODE_GEN_AVG_BLOCK_SIZE 64
+#define CODE_GEN_AVG_BLOCK_SIZE 150
 #endif
 
 #if defined(__arm__) || defined(_ARCH_PPC) \
-- 
2.4.3
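
The effect of the new constants can be checked with simple arithmetic. The sketch below follows the sizing rule visible elsewhere in this series (code_gen_max_blocks = code_gen_buffer_size / CODE_GEN_AVG_BLOCK_SIZE). The function names and the tb_struct_size parameter are assumptions for illustration; the real sizeof(TranslationBlock) is not stated in the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Number of TB slots pre-allocated for a code buffer of the given size. */
static size_t max_blocks(size_t code_gen_buffer_size, size_t avg_block_size)
{
    assert(avg_block_size > 0);
    return code_gen_buffer_size / avg_block_size;
}

/* Bytes needed for the TB array.  tb_struct_size is a stand-in for
   sizeof(TranslationBlock), which this sketch does not know.  */
static size_t tb_array_bytes(size_t code_gen_buffer_size,
                             size_t avg_block_size,
                             size_t tb_struct_size)
{
    return max_blocks(code_gen_buffer_size, avg_block_size) * tb_struct_size;
}
```

Raising the softmmu average from 128 to 400 shrinks the array by a factor of 128/400 = 0.32, which is in line with the drop from 256MB to roughly 81MB quoted in the commit message.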




[Qemu-devel] [PATCH v4 24/26] tcg: Allocate a guard page after code_gen_buffer

2015-09-30 Thread Richard Henderson
This will catch any overflow of the buffer.

Add a native win32 alternative for alloc_code_gen_buffer;
remove the malloc alternative.

Signed-off-by: Richard Henderson 
---
 translate-all.c | 210 
 1 file changed, 119 insertions(+), 91 deletions(-)

diff --git a/translate-all.c b/translate-all.c
index 0e8d176..b43bd03 100644
--- a/translate-all.c
+++ b/translate-all.c
@@ -312,31 +312,6 @@ bool cpu_restore_state(CPUState *cpu, uintptr_t retaddr)
 return false;
 }
 
-#ifdef _WIN32
-static __attribute__((unused)) void map_exec(void *addr, long size)
-{
-DWORD old_protect;
-VirtualProtect(addr, size,
-   PAGE_EXECUTE_READWRITE, &old_protect);
-}
-#else
-static __attribute__((unused)) void map_exec(void *addr, long size)
-{
-unsigned long start, end, page_size;
-
-page_size = getpagesize();
-start = (unsigned long)addr;
-start &= ~(page_size - 1);
-
-end = (unsigned long)addr + size;
-end += page_size - 1;
-end &= ~(page_size - 1);
-
-mprotect((void *)start, end - start,
- PROT_READ | PROT_WRITE | PROT_EXEC);
-}
-#endif
-
 void page_size_init(void)
 {
 /* NOTE: we can always suppose that qemu_host_page_size >=
@@ -473,14 +448,6 @@ static inline PageDesc *page_find(tb_page_addr_t index)
 #define USE_STATIC_CODE_GEN_BUFFER
 #endif
 
-/* ??? Should configure for this, not list operating systems here.  */
-#if (defined(__linux__) \
-|| defined(__FreeBSD__) || defined(__FreeBSD_kernel__) \
-|| defined(__DragonFly__) || defined(__OpenBSD__) \
-|| defined(__NetBSD__))
-# define USE_MMAP
-#endif
-
 /* Minimum size of the code gen buffer.  This number is randomly chosen,
but not so small that we can't have a fair number of TB's live.  */
 #define MIN_CODE_GEN_BUFFER_SIZE (1024u * 1024)
@@ -568,22 +535,102 @@ static inline void *split_cross_256mb(void *buf1, size_t size1)
 static uint8_t static_code_gen_buffer[DEFAULT_CODE_GEN_BUFFER_SIZE]
 __attribute__((aligned(CODE_GEN_ALIGN)));
 
+# ifdef _WIN32
+static inline void do_protect(void *addr, long size, int prot)
+{
+DWORD old_protect;
+VirtualProtect(addr, size, prot, &old_protect);
+}
+
+static inline void map_exec(void *addr, long size)
+{
+do_protect(addr, size, PAGE_EXECUTE_READWRITE);
+}
+
+static inline void map_none(void *addr, long size)
+{
+do_protect(addr, size, PAGE_NOACCESS);
+}
+# else
+static inline void do_protect(void *addr, long size, int prot)
+{
+uintptr_t start, end;
+
+start = (uintptr_t)addr;
+start &= qemu_real_host_page_mask;
+
+end = (uintptr_t)addr + size;
+end = ROUND_UP(end, qemu_real_host_page_size);
+
+mprotect((void *)start, end - start, prot);
+}
+
+static inline void map_exec(void *addr, long size)
+{
+do_protect(addr, size, PROT_READ | PROT_WRITE | PROT_EXEC);
+}
+
+static inline void map_none(void *addr, long size)
+{
+do_protect(addr, size, PROT_NONE);
+}
+# endif /* WIN32 */
+
 static inline void *alloc_code_gen_buffer(void)
 {
 void *buf = static_code_gen_buffer;
+size_t full_size, size;
+
+/* The size of the buffer, rounded down to end on a page boundary.  */
+full_size = (((uintptr_t)buf + sizeof(static_code_gen_buffer))
+ & qemu_real_host_page_mask) - (uintptr_t)buf;
+
+/* Reserve a guard page.  */
+size = full_size - qemu_real_host_page_size;
+
+/* Honor a command-line option limiting the size of the buffer.  */
+if (size > tcg_ctx.code_gen_buffer_size) {
+size = (((uintptr_t)buf + tcg_ctx.code_gen_buffer_size)
+& qemu_real_host_page_mask) - (uintptr_t)buf;
+}
+tcg_ctx.code_gen_buffer_size = size;
+
 #ifdef __mips__
-if (cross_256mb(buf, tcg_ctx.code_gen_buffer_size)) {
-buf = split_cross_256mb(buf, tcg_ctx.code_gen_buffer_size);
+if (cross_256mb(buf, size)) {
+buf = split_cross_256mb(buf, size);
+size = tcg_ctx.code_gen_buffer_size;
 }
 #endif
-map_exec(buf, tcg_ctx.code_gen_buffer_size);
+
+map_exec(buf, size);
+map_none(buf + size, qemu_real_host_page_size);
+qemu_madvise(buf, size, QEMU_MADV_HUGEPAGE);
+
 return buf;
 }
-#elif defined(USE_MMAP)
+#elif defined(_WIN32)
+static inline void *alloc_code_gen_buffer(void)
+{
+size_t size = tcg_ctx.code_gen_buffer_size;
+void *buf1, *buf2;
+
+/* Perform the allocation in two steps, so that the guard page
+   is reserved but uncommitted.  */
+buf1 = VirtualAlloc(NULL, size + qemu_real_host_page_size,
+MEM_RESERVE, PAGE_NOACCESS);
+if (buf1 != NULL) {
+buf2 = VirtualAlloc(buf1, size, MEM_COMMIT, PAGE_EXECUTE_READWRITE);
+assert(buf1 == buf2);
+}
+
+return buf1;
+}
+#else
 static inline void *alloc_code_gen_buffer(void)
 {
 int flags = MAP_PRIVATE | MAP_ANONYMOUS;
 uintptr_t start = 0;
+size_t size = tcg_ctx.code_gen_buffer_size;
 void *buf;
 
 /* Constrain the 
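
The guard-page idea in the patch above can be sketched for a POSIX host as follows. This is a minimal illustration, not the patch's code: alloc_with_guard is a hypothetical name, the mapping is only read/write (the real buffer is also executable), and size is assumed to be a multiple of the host page size.

```c
/* Needed for MAP_ANONYMOUS under strict -std=c99/c11 on glibc. */
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1
#endif

#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map `size` usable bytes followed by one inaccessible guard page.
   Returns NULL on failure.  Hypothetical helper for illustration.  */
static void *alloc_with_guard(size_t size)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    void *buf;

    assert(size > 0 && size % page == 0);

    /* Reserve the usable area plus one extra page in a single mapping. */
    buf = mmap(NULL, size + page, PROT_READ | PROT_WRITE,
               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED) {
        return NULL;
    }

    /* Revoke all access to the trailing page: any write that runs off
       the end of the buffer now faults instead of corrupting memory.  */
    if (mprotect((char *)buf + size, page, PROT_NONE) != 0) {
        munmap(buf, size + page);
        return NULL;
    }
    return buf;
}
```

A write to the last usable byte succeeds, while a write one byte further lands in the guard page and raises a fault.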

[Qemu-devel] [PATCH] vmsvga: more cursor checks

2015-09-30 Thread Gerd Hoffmann
Check the cursor size more carefully.  While at it, switch the size
fields to unsigned so they can't be negative.

Signed-off-by: Gerd Hoffmann 
---
 hw/display/vmware_vga.c | 11 +++
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/hw/display/vmware_vga.c b/hw/display/vmware_vga.c
index 8e93509..9354037 100644
--- a/hw/display/vmware_vga.c
+++ b/hw/display/vmware_vga.c
@@ -488,10 +488,10 @@ static inline int vmsvga_fill_rect(struct vmsvga_state_s *s,
 #endif
 
 struct vmsvga_cursor_definition_s {
-int width;
-int height;
+uint32_t width;
+uint32_t height;
 int id;
-int bpp;
+uint32_t bpp;
 int hot_x;
 int hot_y;
 uint32_t mask[1024];
@@ -658,7 +658,10 @@ static void vmsvga_fifo_run(struct vmsvga_state_s *s)
 cursor.bpp = vmsvga_fifo_read(s);
 
 args = SVGA_BITMAP_SIZE(x, y) + SVGA_PIXMAP_SIZE(x, y, cursor.bpp);
-if (SVGA_BITMAP_SIZE(x, y) > sizeof cursor.mask ||
+if (cursor.width > 256 ||
+cursor.height > 256 ||
+cursor.bpp > 32 ||
+SVGA_BITMAP_SIZE(x, y) > sizeof cursor.mask ||
 SVGA_PIXMAP_SIZE(x, y, cursor.bpp) > sizeof cursor.image) {
 goto badcmd;
 }
-- 
1.8.3.1
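
Why the switch to unsigned matters can be shown with a small sketch. The limits (256x256, 32 bpp) come from the patch; the function and macro names are hypothetical and only illustrate the upper-bound check, not the later SVGA_BITMAP_SIZE/SVGA_PIXMAP_SIZE tests.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Limits taken from the patch: 256x256 pixels, 32 bits per pixel. */
#define CURSOR_MAX_DIM 256u
#define CURSOR_MAX_BPP 32u

/* Upper-bound check on unsigned fields: a guest-supplied 0xffffffff
   is simply a very large value and is rejected.  */
static bool cursor_dims_ok_unsigned(uint32_t width, uint32_t height,
                                    uint32_t bpp)
{
    return width <= CURSOR_MAX_DIM
        && height <= CURSOR_MAX_DIM
        && bpp <= CURSOR_MAX_BPP;
}

/* The same check on signed fields: negative values slip through,
   because -1 <= 256 is true.  */
static bool cursor_dims_ok_signed(int width, int height, int bpp)
{
    return width <= 256 && height <= 256 && bpp <= 32;
}

/* Small self-check contrasting the two variants. */
static void cursor_check_demo(void)
{
    assert(cursor_dims_ok_unsigned(64, 64, 32));
    assert(!cursor_dims_ok_unsigned((uint32_t)-1, 64, 32));
    assert(cursor_dims_ok_signed(-1, 64, 32));   /* the hole being closed */
}
```

With signed fields a separate lower-bound test would be needed for every field; making them unsigned lets a single comparison cover both ends.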




[Qemu-devel] [PATCH v4 25/26] tcg: Check for overflow via highwater mark

2015-09-30 Thread Richard Henderson
We currently pre-compute a worst-case code size for any TB, which
works out to be 122kB.  Since the average TB size is near 1kB, this
wastes quite a lot of storage.

Instead, check for overflow in between generating code for each opcode.
The overhead of the check isn't measurable and wastage is minimized.

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 include/exec/exec-all.h |  6 --
 tcg/tcg.c   | 14 +++---
 tcg/tcg.h   |  5 +++--
 translate-all.c | 31 ++-
 4 files changed, 40 insertions(+), 16 deletions(-)

diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index 6871e78..71c9d85 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -62,12 +62,6 @@ typedef struct TranslationBlock TranslationBlock;
 #define OPC_BUF_SIZE 640
 #define OPC_MAX_SIZE (OPC_BUF_SIZE - MAX_OP_PER_INSTR)
 
-/* Maximum size a TCG op can expand to.  This is complicated because a
-   single op may require several host instructions and register reloads.
-   For now take a wild guess at 192 bytes, which should allow at least
-   a couple of fixup instructions per argument.  */
-#define TCG_MAX_OP_SIZE 192
-
 #define OPPARAM_BUF_SIZE (OPC_BUF_SIZE * MAX_OPC_PARAM)
 
 #include "qemu/log.h"
diff --git a/tcg/tcg.c b/tcg/tcg.c
index 5609108..682af8a 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -385,9 +385,10 @@ void tcg_prologue_init(TCGContext *s)
 total_size = s->code_gen_buffer_size - prologue_size;
 s->code_gen_buffer_size = total_size;
 
-/* Compute a high-water mark, at which we voluntarily flush the
-   buffer and start over.  */
-s->code_gen_buffer_max_size = total_size - TCG_MAX_OP_SIZE * OPC_BUF_SIZE;
+/* Compute a high-water mark, at which we voluntarily flush the buffer
+   and start over.  The size here is arbitrary, significantly larger
+   than we expect the code generation for any one opcode to require.  */
+s->code_gen_highwater = s->code_gen_buffer + (total_size - 1024);
 
 tcg_register_jit(s->code_gen_buffer, total_size);
 
@@ -2438,6 +2439,13 @@ int tcg_gen_code(TCGContext *s, tcg_insn_unit *gen_code_buf)
 #ifndef NDEBUG
 check_regs(s);
 #endif
+/* Test for (pending) buffer overflow.  The assumption is that any
+   one operation beginning below the high water mark cannot overrun
+   the buffer completely.  Thus we can test for overflow after
+   generating code without having to check during generation.  */
+if (unlikely(s->code_gen_ptr > s->code_gen_highwater)) {
+return -1;
+}
 }
 tcg_debug_assert(num_insns >= 0);
 s->gen_insn_end_off[num_insns] = tcg_current_code_size(s);
diff --git a/tcg/tcg.h b/tcg/tcg.h
index 5fbbd15..a696922 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -559,10 +559,11 @@ struct TCGContext {
 void *code_gen_prologue;
 void *code_gen_buffer;
 size_t code_gen_buffer_size;
-/* threshold to flush the translated code buffer */
-size_t code_gen_buffer_max_size;
 void *code_gen_ptr;
 
+/* Threshold to flush the translated code buffer.  */
+void *code_gen_highwater;
+
 TBContext tb_ctx;
 
 /* The TCGBackendData structure is private to tcg-target.c.  */
diff --git a/translate-all.c b/translate-all.c
index b43bd03..333eba4 100644
--- a/translate-all.c
+++ b/translate-all.c
@@ -223,6 +223,7 @@ static target_long decode_sleb128(uint8_t **pp)
 
 static int encode_search(TranslationBlock *tb, uint8_t *block)
 {
+uint8_t *highwater = tcg_ctx.code_gen_highwater;
 uint8_t *p = block;
 int i, j, n;
 
@@ -241,6 +242,14 @@ static int encode_search(TranslationBlock *tb, uint8_t *block)
 }
 prev = (i == 0 ? 0 : tcg_ctx.gen_insn_end_off[i - 1]);
 p = encode_sleb128(p, tcg_ctx.gen_insn_end_off[i] - prev);
+
+/* Test for (pending) buffer overflow.  The assumption is that any
+   one row beginning below the high water mark cannot overrun
+   the buffer completely.  Thus we can test for overflow after
+   encoding a row without having to check during encoding.  */
+if (unlikely(p > highwater)) {
+return -1;
+}
 }
 
 return p - block;
@@ -756,9 +765,7 @@ static TranslationBlock *tb_alloc(target_ulong pc)
 {
 TranslationBlock *tb;
 
-if (tcg_ctx.tb_ctx.nb_tbs >= tcg_ctx.code_gen_max_blocks ||
-(tcg_ctx.code_gen_ptr - tcg_ctx.code_gen_buffer) >=
- tcg_ctx.code_gen_buffer_max_size) {
+if (tcg_ctx.tb_ctx.nb_tbs >= tcg_ctx.code_gen_max_blocks) {
 return NULL;
 }
 tb = &tcg_ctx.tb_ctx.tbs[tcg_ctx.tb_ctx.nb_tbs++];
@@ -1063,12 +1070,15 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
 if (use_icount) {
 cflags |= CF_USE_ICOUNT;
 }
+
 tb = tb_alloc(pc);
-if (!tb) {
+if (unlikely(!tb)) {
+ buffer_overflow:
 /* flush must be done */
 tb_flush(cpu);
 /* cannot fail at t
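
The high-water scheme described in this patch can be sketched in isolation. The CodeBuf type and codebuf_* functions below are hypothetical; only the idea (keep a fixed slack at the end, emit first, test afterwards, flush and retry on overflow) and the 1024-byte margin come from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HIGHWATER_SLACK 1024   /* same arbitrary margin as the patch */

typedef struct {
    uint8_t *buf;        /* start of the buffer */
    uint8_t *ptr;        /* next free byte */
    uint8_t *highwater;  /* past this point the caller must flush */
} CodeBuf;

static void codebuf_init(CodeBuf *cb, uint8_t *storage, size_t size)
{
    assert(size > HIGHWATER_SLACK);
    cb->buf = storage;
    cb->ptr = storage;
    cb->highwater = storage + (size - HIGHWATER_SLACK);
}

/* Discard everything, as a flush of the translation cache would. */
static void codebuf_reset(CodeBuf *cb)
{
    cb->ptr = cb->buf;
}

/* Emit one chunk, then test for (pending) overflow.  Because the chunk
   starts at or below the high-water mark and is no larger than the
   slack, the copy itself can never run past the end of the buffer.
   Returns 0 on success, -1 if the caller must reset and retry.  */
static int codebuf_emit(CodeBuf *cb, const uint8_t *chunk, size_t len)
{
    assert(cb->ptr <= cb->highwater);
    assert(len <= HIGHWATER_SLACK);

    memcpy(cb->ptr, chunk, len);
    cb->ptr += len;
    return cb->ptr > cb->highwater ? -1 : 0;
}
```

The check costs one pointer comparison per emitted unit, which matches the commit message's point that the overhead is not measurable compared with reserving a worst-case size up front.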

Re: [Qemu-devel] [PATCH v2] Add argument filters to the seccomp sandbox

2015-09-30 Thread Namsun Ch'o
> (I'm not sure what happens to your emails that all of them does not
> relate to the same thread/Message-ID, making a pain to follow through
> out the volume of email on the list, please pay attention to that)

I just click Reply All, I'm not sure how else I would do it. Are they somehow
being posted as new top level threads instead of replies?

> I'm not particularly against any improvement like configuration files or
> more command line args, but I'm concerned about the security itself. If
> some guest can escape to the host, it's gonna be much easier to whitelist
> syscalls for the next guests, changing the command line is a little too
> obvious -- paranoid example, I know.

Any config would be used only for syscalls which are already whitelisted by
the default qemu-seccomp.c. For example, changing the config could be used to
allow ioctls only on certain file descriptors, since ioctl is already
whitelisted, but it could not be used to whitelist something which is not
already whitelisted, such as the personality system call.

> If you want to write an RFC with your idea, you're more than welcome. We
> could move on this discussion and perhaps come up with a nice solution.

I've never attempted that before. Could you point me in the right direction?



[Qemu-devel] [PATCH v4 23/26] tcg: Emit prologue to the beginning of code_gen_buffer

2015-09-30 Thread Richard Henderson
By putting the prologue at the end, we risk overwriting the
prologue should our estimate of maximum TB size prove too small.
Given the two different placements of the call to tcg_prologue_init,
move the high water mark computation into tcg_prologue_init.

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 tcg/tcg.c   | 35 ---
 translate-all.c | 28 +---
 2 files changed, 37 insertions(+), 26 deletions(-)

diff --git a/tcg/tcg.c b/tcg/tcg.c
index d3693b1..5609108 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -363,17 +363,38 @@ void tcg_context_init(TCGContext *s)
 
 void tcg_prologue_init(TCGContext *s)
 {
-/* init global prologue and epilogue */
-s->code_buf = s->code_gen_prologue;
-s->code_ptr = s->code_buf;
+size_t prologue_size, total_size;
+void *buf0, *buf1;
+
+/* Put the prologue at the beginning of code_gen_buffer.  */
+buf0 = s->code_gen_buffer;
+s->code_ptr = buf0;
+s->code_buf = buf0;
+s->code_gen_prologue = buf0;
+
+/* Generate the prologue.  */
 tcg_target_qemu_prologue(s);
-flush_icache_range((uintptr_t)s->code_buf, (uintptr_t)s->code_ptr);
+buf1 = s->code_ptr;
+flush_icache_range((uintptr_t)buf0, (uintptr_t)buf1);
+
+/* Deduct the prologue from the buffer.  */
+prologue_size = tcg_current_code_size(s);
+s->code_gen_ptr = buf1;
+s->code_gen_buffer = buf1;
+s->code_buf = buf1;
+total_size = s->code_gen_buffer_size - prologue_size;
+s->code_gen_buffer_size = total_size;
+
+/* Compute a high-water mark, at which we voluntarily flush the
+   buffer and start over.  */
+s->code_gen_buffer_max_size = total_size - TCG_MAX_OP_SIZE * OPC_BUF_SIZE;
+
+tcg_register_jit(s->code_gen_buffer, total_size);
 
 #ifdef DEBUG_DISAS
 if (qemu_loglevel_mask(CPU_LOG_TB_OUT_ASM)) {
-size_t size = tcg_current_code_size(s);
-qemu_log("PROLOGUE: [size=%zu]\n", size);
-log_disas(s->code_buf, size);
+qemu_log("PROLOGUE: [size=%zu]\n", prologue_size);
+log_disas(buf0, prologue_size);
 qemu_log("\n");
 qemu_log_flush();
 }
diff --git a/translate-all.c b/translate-all.c
index 3454f4e..0e8d176 100644
--- a/translate-all.c
+++ b/translate-all.c
@@ -690,23 +690,15 @@ static inline void code_gen_alloc(size_t tb_size)
 }
 
 qemu_madvise(tcg_ctx.code_gen_buffer, tcg_ctx.code_gen_buffer_size,
-QEMU_MADV_HUGEPAGE);
-
-/* Steal room for the prologue at the end of the buffer.  This ensures
-   (via the MAX_CODE_GEN_BUFFER_SIZE limits above) that direct branches
-   from TB's to the prologue are going to be in range.  It also means
-   that we don't need to mark (additional) portions of the data segment
-   as executable.  */
-tcg_ctx.code_gen_prologue = tcg_ctx.code_gen_buffer +
-tcg_ctx.code_gen_buffer_size - 1024;
-tcg_ctx.code_gen_buffer_size -= 1024;
-
-tcg_ctx.code_gen_buffer_max_size = tcg_ctx.code_gen_buffer_size -
-(TCG_MAX_OP_SIZE * OPC_BUF_SIZE);
-tcg_ctx.code_gen_max_blocks = tcg_ctx.code_gen_buffer_size /
-CODE_GEN_AVG_BLOCK_SIZE;
-tcg_ctx.tb_ctx.tbs =
-g_malloc(tcg_ctx.code_gen_max_blocks * sizeof(TranslationBlock));
+ QEMU_MADV_HUGEPAGE);
+
+/* Estimate a good size for the number of TBs we can support.  We
+   still haven't deducted the prologue from the buffer size here,
+   but that's minimal and won't affect the estimate much.  */
+tcg_ctx.code_gen_max_blocks
+= tcg_ctx.code_gen_buffer_size / CODE_GEN_AVG_BLOCK_SIZE;
+tcg_ctx.tb_ctx.tbs = g_new(TranslationBlock, tcg_ctx.code_gen_max_blocks);
+
 qemu_mutex_init(&tcg_ctx.tb_ctx.tb_lock);
 }
 
@@ -717,8 +709,6 @@ void tcg_exec_init(unsigned long tb_size)
 {
 cpu_gen_init();
 code_gen_alloc(tb_size);
-tcg_ctx.code_gen_ptr = tcg_ctx.code_gen_buffer;
-tcg_register_jit(tcg_ctx.code_gen_buffer, tcg_ctx.code_gen_buffer_size);
 page_init();
 #if defined(CONFIG_SOFTMMU)
 /* There's no guest base to take into account, so go ahead and
-- 
2.4.3
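
The bookkeeping in the new tcg_prologue_init (prologue first, then advance the buffer start and shrink its size) can be sketched on its own. The Region type and carve_prologue are hypothetical names; the real function also generates the prologue code and flushes the instruction cache, which this sketch leaves out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint8_t *prologue;     /* where the prologue lives */
    uint8_t *buffer;       /* start of the area left for TBs */
    size_t   buffer_size;  /* bytes left for TBs */
} Region;

/* Place a prologue of prologue_size bytes at the beginning of the
   buffer and deduct it from the space available for TBs.  */
static void carve_prologue(Region *r, size_t prologue_size)
{
    assert(prologue_size < r->buffer_size);

    r->prologue = r->buffer;
    r->buffer += prologue_size;
    r->buffer_size -= prologue_size;
}
```

With the prologue at the front, code that overruns the end of the buffer can no longer overwrite it.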




[Qemu-devel] [PATCH v4 21/26] tcg: Remove gen_intermediate_code_pc

2015-09-30 Thread Richard Henderson
It is no longer used, so tidy up everything reached by it.
This includes the gen_opc_* arrays, the search_pc parameter
and the inline gen_intermediate_code_internal functions.

Reviewed-by: Aurelien Jarno 
Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 include/exec/exec-all.h   |  1 -
 target-alpha/translate.c  | 41 
 target-arm/translate-a64.c| 30 +++-
 target-arm/translate.c| 54 ---
 target-arm/translate.h|  8 ++-
 target-cris/translate.c   | 50 +--
 target-i386/translate.c   | 49 ---
 target-lm32/translate.c   | 42 -
 target-m68k/translate.c   | 43 --
 target-microblaze/translate.c | 40 
 target-mips/translate.c   | 48 --
 target-moxie/translate.c  | 41 
 target-openrisc/translate.c   | 42 -
 target-ppc/translate.c| 40 
 target-s390x/translate.c  | 44 ---
 target-sh4/translate.c| 43 --
 target-sparc/translate.c  | 51 
 target-tilegx/translate.c | 41 
 target-tricore/translate.c| 31 -
 target-unicore32/translate.c  | 44 ---
 target-xtensa/translate.c | 39 ---
 tcg/tcg.h |  4 
 22 files changed, 90 insertions(+), 736 deletions(-)

diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index 402dd87..6871e78 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -73,7 +73,6 @@ typedef struct TranslationBlock TranslationBlock;
 #include "qemu/log.h"
 
 void gen_intermediate_code(CPUArchState *env, struct TranslationBlock *tb);
-void gen_intermediate_code_pc(CPUArchState *env, struct TranslationBlock *tb);
 void restore_state_to_opc(CPUArchState *env, struct TranslationBlock *tb,
   target_ulong *data);
 
diff --git a/target-alpha/translate.c b/target-alpha/translate.c
index 8395a30..f936d1b 100644
--- a/target-alpha/translate.c
+++ b/target-alpha/translate.c
@@ -2858,17 +2858,14 @@ static ExitStatus translate_one(DisasContext *ctx, uint32_t insn)
 return ret;
 }
 
-static inline void gen_intermediate_code_internal(AlphaCPU *cpu,
-  TranslationBlock *tb,
-  bool search_pc)
+void gen_intermediate_code(CPUAlphaState *env, struct TranslationBlock *tb)
 {
+AlphaCPU *cpu = alpha_env_get_cpu(env);
 CPUState *cs = CPU(cpu);
-CPUAlphaState *env = &cpu->env;
 DisasContext ctx, *ctxp = &ctx;
 target_ulong pc_start;
 target_ulong pc_mask;
 uint32_t insn;
-int j, lj = -1;
 ExitStatus ret;
 int num_insns;
 int max_insns;
@@ -2915,18 +2912,6 @@ static inline void gen_intermediate_code_internal(AlphaCPU *cpu,
 
 gen_tb_start(tb);
 do {
-if (search_pc) {
-j = tcg_op_buf_count();
-if (lj < j) {
-lj++;
-while (lj < j) {
-tcg_ctx.gen_opc_instr_start[lj++] = 0;
-}
-}
-tcg_ctx.gen_opc_pc[lj] = ctx.pc;
-tcg_ctx.gen_opc_instr_start[lj] = 1;
-tcg_ctx.gen_opc_icount[lj] = num_insns;
-}
 tcg_gen_insn_start(ctx.pc);
 num_insns++;
 
@@ -2993,16 +2978,8 @@ static inline void gen_intermediate_code_internal(AlphaCPU *cpu,
 
 gen_tb_end(tb, num_insns);
 
-if (search_pc) {
-j = tcg_op_buf_count();
-lj++;
-while (lj <= j) {
-tcg_ctx.gen_opc_instr_start[lj++] = 0;
-}
-} else {
-tb->size = ctx.pc - pc_start;
-tb->icount = num_insns;
-}
+tb->size = ctx.pc - pc_start;
+tb->icount = num_insns;
 
 #ifdef DEBUG_DISAS
 if (qemu_loglevel_mask(CPU_LOG_TB_IN_ASM)) {
@@ -3013,16 +2990,6 @@ static inline void gen_intermediate_code_internal(AlphaCPU *cpu,
 #endif
 }
 
-void gen_intermediate_code (CPUAlphaState *env, struct TranslationBlock *tb)
-{
-gen_intermediate_code_internal(alpha_env_get_cpu(env), tb, false);
-}
-
-void gen_intermediate_code_pc (CPUAlphaState *env, struct TranslationBlock *tb)
-{
-gen_intermediate_code_internal(alpha_env_get_cpu(env), tb, true);
-}
-
 void restore_state_to_opc(CPUAlphaState *env, TranslationBlock *tb,
   target_ulong *data)
 {
diff --git a/target-arm/translate-a64.c b/target-arm/translate-a64.c
index 5022fc3..e65e309 100644
--- a/target-arm/translate-a64.c
+++ b/target-arm/translate-a64.c
@@ -11000

Re: [Qemu-devel] qemu-system-alpha -nographic does not work

2015-09-30 Thread Dennis Luehring

Am 30.09.2015 um 08:48 schrieb Richard Henderson:

On 09/30/2015 02:36 PM, Dennis Luehring wrote:
> ~/qemu/alpha-softmmu/qemu-system-alpha -m 1GB -monitor telnet::4440,server,nowait\
>   -kernel vmlinux.img-2.6.26-2-alpha-generic -initrd
> initrd.img-2.6.26-2-alpha-generic\
>   -net nic -net user -hda alpha.qcow2\
>   -drive file=debian-5010-alpha-netinst.iso,if=ide,media=cdrom -append
> 'root=/dev/hda3' #-serial telnet::3000,server -nographic

You forgot "-append console=ttyS0".  The kernel simply isn't writing to the
serial port.


Works, thanks. Maybe I forgot it because some of my test images (sparc64,
mips64) don't need the -append and still work with -nographic.




Re: [Qemu-devel] [Qemu-ppc] [RFC/PATCH] monitor/ppc: Access all SPRs from the monitor

2015-09-30 Thread David Gibson
On Sun, Sep 27, 2015 at 04:31:16PM +1000, Benjamin Herrenschmidt wrote:
> We already have a table with all supported SPRs along with their names,
> so let's use that rather than a duplicate table that is perpetually
> out of sync in the monitor code.
> 
> This adds a new monitor hook target_extra_monitor_def() which is called
> if nothing is found in the normal table. We still use the old mechanism
> for anything that isn't an SPR.
> 
> Signed-off-by: Benjamin Herrenschmidt 

This looks like a good idea, but it seems to be a slightly different
approach from the one taken by some rather similar patches Alexey
posted recently.

Would you care to co-ordinate on which of those approaches to go ahead
with?

[snip]
> @@ -253,3 +180,23 @@ const MonitorDef *target_monitor_defs(void)
>  {
>  return monitor_defs;
>  }
> +
> +int target_extra_monitor_def(uint64_t *pval, const char *name)
> +{
> + /* On ppc, search through the SPRs so we can print any of them */
> +{
   ^
Also, this appears to be a redundant set of braces.

> +CPUArchState *env = mon_get_cpu_env();
> +ppc_spr_t *spr_cb = env->spr_cb;
> +int i;
> +
> +for (i = 0; i < 1024; i++) {
> +if (!spr_cb[i].name || strcasecmp(name, spr_cb[i].name)) {
> +continue;
> +}
> +*pval = env->spr[i];
> +return 0;
> +}
> +}
> +return -1;
> +}
> +
> 
> 
> 

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson
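
For reference, the lookup from the quoted hunk reads as follows once the extra brace level is dropped. This is a standalone sketch: DemoReg and demo_lookup_reg are hypothetical stand-ins for the SPR table and target_extra_monitor_def(), and the value is stored in the table entry itself instead of a separate env->spr[] array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <strings.h>   /* strcasecmp (POSIX) */

/* Hypothetical register table entry; unnamed slots have name == NULL. */
typedef struct {
    const char *name;
    uint64_t value;
} DemoReg;

/* Case-insensitive lookup by name.  Stores the value in *pval and
   returns 0 on a match, or returns -1 if no entry has that name.  */
static int demo_lookup_reg(const DemoReg *regs, size_t n,
                           const char *name, uint64_t *pval)
{
    size_t i;

    assert(name != NULL && pval != NULL);

    for (i = 0; i < n; i++) {
        if (!regs[i].name || strcasecmp(name, regs[i].name)) {
            continue;
        }
        *pval = regs[i].value;
        return 0;
    }
    return -1;
}
```

The declarations move to the top of the function body, so the inner block from the patch is no longer needed.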




[Qemu-devel] [PATCH v4 19/26] tcg: Pass data argument to restore_state_to_opc

2015-09-30 Thread Richard Henderson
The gen_opc_* arrays are already redundant with the data stored in
the insn_start arguments.  Transition restore_state_to_opc to use
data from the latter.

Reviewed-by: Aurelien Jarno 
Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 include/exec/exec-all.h   |  2 +-
 target-alpha/translate.c  |  5 +++--
 target-arm/translate.c|  9 +
 target-cris/translate.c   |  5 +++--
 target-i386/translate.c   | 26 ++
 target-lm32/translate.c   |  5 +++--
 target-m68k/translate.c   |  5 +++--
 target-microblaze/translate.c |  5 +++--
 target-mips/translate.c   |  9 +
 target-moxie/translate.c  |  5 +++--
 target-openrisc/translate.c   |  4 ++--
 target-ppc/translate.c|  5 +++--
 target-s390x/translate.c  |  8 
 target-sh4/translate.c|  7 ---
 target-sparc/translate.c  | 10 ++
 target-tilegx/translate.c |  5 +++--
 target-tricore/translate.c|  5 +++--
 target-unicore32/translate.c  |  5 +++--
 target-xtensa/translate.c |  5 +++--
 tcg/tcg.c | 11 ++-
 tcg/tcg.h |  2 ++
 translate-all.c   |  2 +-
 22 files changed, 79 insertions(+), 66 deletions(-)

diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index 5340745..6a69802 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -75,7 +75,7 @@ typedef struct TranslationBlock TranslationBlock;
 void gen_intermediate_code(CPUArchState *env, struct TranslationBlock *tb);
 void gen_intermediate_code_pc(CPUArchState *env, struct TranslationBlock *tb);
 void restore_state_to_opc(CPUArchState *env, struct TranslationBlock *tb,
-  int pc_pos);
+  target_ulong *data);
 
 void cpu_gen_init(void);
 bool cpu_restore_state(CPUState *cpu, uintptr_t searched_pc);
diff --git a/target-alpha/translate.c b/target-alpha/translate.c
index 538e202..8395a30 100644
--- a/target-alpha/translate.c
+++ b/target-alpha/translate.c
@@ -3023,7 +3023,8 @@ void gen_intermediate_code_pc (CPUAlphaState *env, struct TranslationBlock *tb)
 gen_intermediate_code_internal(alpha_env_get_cpu(env), tb, true);
 }
 
-void restore_state_to_opc(CPUAlphaState *env, TranslationBlock *tb, int pc_pos)
+void restore_state_to_opc(CPUAlphaState *env, TranslationBlock *tb,
+  target_ulong *data)
 {
-env->pc = tcg_ctx.gen_opc_pc[pc_pos];
+env->pc = data[0];
 }
diff --git a/target-arm/translate.c b/target-arm/translate.c
index fedb781..2296953 100644
--- a/target-arm/translate.c
+++ b/target-arm/translate.c
@@ -11612,13 +11612,14 @@ void arm_cpu_dump_state(CPUState *cs, FILE *f, fprintf_function cpu_fprintf,
 }
 }
 
-void restore_state_to_opc(CPUARMState *env, TranslationBlock *tb, int pc_pos)
+void restore_state_to_opc(CPUARMState *env, TranslationBlock *tb,
+  target_ulong *data)
 {
 if (is_a64(env)) {
-env->pc = tcg_ctx.gen_opc_pc[pc_pos];
+env->pc = data[0];
 env->condexec_bits = 0;
 } else {
-env->regs[15] = tcg_ctx.gen_opc_pc[pc_pos];
-env->condexec_bits = gen_opc_condexec_bits[pc_pos];
+env->regs[15] = data[0];
+env->condexec_bits = data[1];
 }
 }
diff --git a/target-cris/translate.c b/target-cris/translate.c
index d038bdb..77e2794 100644
--- a/target-cris/translate.c
+++ b/target-cris/translate.c
@@ -3433,7 +3433,8 @@ void cris_initialize_tcg(void)
 }
 }
 
-void restore_state_to_opc(CPUCRISState *env, TranslationBlock *tb, int pc_pos)
+void restore_state_to_opc(CPUCRISState *env, TranslationBlock *tb,
+  target_ulong *data)
 {
-env->pc = tcg_ctx.gen_opc_pc[pc_pos];
+env->pc = data[0];
 }
diff --git a/target-i386/translate.c b/target-i386/translate.c
index d3282e8..2f7b77f 100644
--- a/target-i386/translate.c
+++ b/target-i386/translate.c
@@ -8055,26 +8055,12 @@ void gen_intermediate_code_pc(CPUX86State *env, TranslationBlock *tb)
 gen_intermediate_code_internal(x86_env_get_cpu(env), tb, true);
 }
 
-void restore_state_to_opc(CPUX86State *env, TranslationBlock *tb, int pc_pos)
+void restore_state_to_opc(CPUX86State *env, TranslationBlock *tb,
+  target_ulong *data)
 {
-int cc_op;
-#ifdef DEBUG_DISAS
-if (qemu_loglevel_mask(CPU_LOG_TB_OP)) {
-int i;
-qemu_log("RESTORE:\n");
-for(i = 0;i <= pc_pos; i++) {
-if (tcg_ctx.gen_opc_instr_start[i]) {
-qemu_log("0x%04x: " TARGET_FMT_lx "\n", i,
-tcg_ctx.gen_opc_pc[i]);
-}
-}
-qemu_log("pc_pos=0x%x eip=" TARGET_FMT_lx " cs_base=%x\n",
-pc_pos, tcg_ctx.gen_opc_pc[pc_pos] - tb->cs_base,
-(uint32_t)tb->cs_base);
-}
-#endif
-env->eip = tcg_ctx.gen_opc_pc[pc_pos] - tb->cs_base;
-cc_op = gen_opc_cc_op[pc_pos];
-if (cc_op != CC_OP_DYNAMIC)
+int cc_op 

[Qemu-devel] [PATCH v4 22/26] tcg: Remove tcg_gen_code_search_pc

2015-09-30 Thread Richard Henderson
It's no longer used, so tidy up everything reached by it.

Reviewed-by: Aurelien Jarno 
Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 tcg/tcg.c | 59 +++
 tcg/tcg.h |  2 --
 2 files changed, 19 insertions(+), 42 deletions(-)

diff --git a/tcg/tcg.c b/tcg/tcg.c
index 40f24de..d3693b1 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -2290,12 +2290,28 @@ void tcg_dump_op_count(FILE *f, fprintf_function cpu_fprintf)
 #endif
 
 
-static inline int tcg_gen_code_common(TCGContext *s,
-  tcg_insn_unit *gen_code_buf,
-  long search_pc)
+int tcg_gen_code(TCGContext *s, tcg_insn_unit *gen_code_buf)
 {
 int i, oi, oi_next, num_insns;
 
+#ifdef CONFIG_PROFILER
+{
+int n;
+
+n = s->gen_last_op_idx + 1;
+s->op_count += n;
+if (n > s->op_count_max) {
+s->op_count_max = n;
+}
+
+n = s->nb_temps;
+s->temp_count += n;
+if (n > s->temp_count_max) {
+s->temp_count_max = n;
+}
+}
+#endif
+
 #ifdef DEBUG_DISAS
 if (unlikely(qemu_loglevel_mask(CPU_LOG_TB_OP))) {
 qemu_log("OP:\n");
@@ -2398,9 +2414,6 @@ static inline int tcg_gen_code_common(TCGContext *s,
 tcg_reg_alloc_op(s, def, opc, args, dead_args, sync_args);
 break;
 }
-if (search_pc >= 0 && search_pc < tcg_current_code_size(s)) {
-return oi;
-}
 #ifndef NDEBUG
 check_regs(s);
 #endif
@@ -2410,30 +2423,6 @@ static inline int tcg_gen_code_common(TCGContext *s,
 
 /* Generate TB finalization at the end of block */
 tcg_out_tb_finalize(s);
-return -1;
-}
-
-int tcg_gen_code(TCGContext *s, tcg_insn_unit *gen_code_buf)
-{
-#ifdef CONFIG_PROFILER
-{
-int n;
-
-n = s->gen_last_op_idx + 1;
-s->op_count += n;
-if (n > s->op_count_max) {
-s->op_count_max = n;
-}
-
-n = s->nb_temps;
-s->temp_count += n;
-if (n > s->temp_count_max) {
-s->temp_count_max = n;
-}
-}
-#endif
-
-tcg_gen_code_common(s, gen_code_buf, -1);
 
 /* flush instruction cache */
 flush_icache_range((uintptr_t)s->code_buf, (uintptr_t)s->code_ptr);
@@ -2441,16 +2430,6 @@ int tcg_gen_code(TCGContext *s, tcg_insn_unit *gen_code_buf)
 return tcg_current_code_size(s);
 }
 
-/* Return the index of the micro operation such as the pc after is <
-   offset bytes from the start of the TB.  The contents of gen_code_buf must
-   not be changed, though writing the same values is ok.
-   Return -1 if not found. */
-int tcg_gen_code_search_pc(TCGContext *s, tcg_insn_unit *gen_code_buf,
-   long offset)
-{
-return tcg_gen_code_common(s, gen_code_buf, offset);
-}
-
 #ifdef CONFIG_PROFILER
 void tcg_dump_info(FILE *f, fprintf_function cpu_fprintf)
 {
diff --git a/tcg/tcg.h b/tcg/tcg.h
index d079a91..5fbbd15 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -626,8 +626,6 @@ void tcg_prologue_init(TCGContext *s);
 void tcg_func_start(TCGContext *s);
 
 int tcg_gen_code(TCGContext *s, tcg_insn_unit *gen_code_buf);
-int tcg_gen_code_search_pc(TCGContext *s, tcg_insn_unit *gen_code_buf,
-   long offset);
 
 void tcg_set_frame(TCGContext *s, int reg, intptr_t start, intptr_t size);
 
-- 
2.4.3




[Qemu-devel] [PATCH v4 18/26] tcg: Add TCG_MAX_INSNS

2015-09-30 Thread Richard Henderson
Adjust all translators to respect it.

Reviewed-by: Aurelien Jarno 
Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 target-alpha/translate.c  |  3 +++
 target-arm/translate-a64.c|  3 +++
 target-arm/translate.c|  6 +-
 target-cris/translate.c   |  3 +++
 target-i386/translate.c   |  6 +-
 target-lm32/translate.c   |  3 +++
 target-m68k/translate.c   |  6 +-
 target-microblaze/translate.c |  6 +-
 target-mips/translate.c   |  7 ++-
 target-moxie/translate.c  | 13 +++--
 target-openrisc/translate.c   |  3 +++
 target-ppc/translate.c|  6 +-
 target-s390x/translate.c  |  3 +++
 target-sh4/translate.c|  7 ++-
 target-sparc/translate.c  |  7 ++-
 target-tilegx/translate.c |  3 +++
 target-tricore/translate.c| 20 +---
 target-unicore32/translate.c  |  3 +++
 target-xtensa/translate.c |  3 +++
 tcg/tcg.h |  1 +
 20 files changed, 95 insertions(+), 17 deletions(-)

diff --git a/target-alpha/translate.c b/target-alpha/translate.c
index c10193e..538e202 100644
--- a/target-alpha/translate.c
+++ b/target-alpha/translate.c
@@ -2903,6 +2903,9 @@ static inline void gen_intermediate_code_internal(AlphaCPU *cpu,
 if (max_insns == 0) {
 max_insns = CF_COUNT_MASK;
 }
+if (max_insns > TCG_MAX_INSNS) {
+max_insns = TCG_MAX_INSNS;
+}
 
 if (in_superpage(&ctx, pc_start)) {
 pc_mask = (1ULL << 41) - 1;
diff --git a/target-arm/translate-a64.c b/target-arm/translate-a64.c
index 654a586..5022fc3 100644
--- a/target-arm/translate-a64.c
+++ b/target-arm/translate-a64.c
@@ -11072,6 +11072,9 @@ void gen_intermediate_code_internal_a64(ARMCPU *cpu,
 if (max_insns == 0) {
 max_insns = CF_COUNT_MASK;
 }
+if (max_insns > TCG_MAX_INSNS) {
+max_insns = TCG_MAX_INSNS;
+}
 
 gen_tb_start(tb);
 
diff --git a/target-arm/translate.c b/target-arm/translate.c
index fb69ecb..fedb781 100644
--- a/target-arm/translate.c
+++ b/target-arm/translate.c
@@ -11258,8 +11258,12 @@ static inline void gen_intermediate_code_internal(ARMCPU *cpu,
 lj = -1;
 num_insns = 0;
 max_insns = tb->cflags & CF_COUNT_MASK;
-if (max_insns == 0)
+if (max_insns == 0) {
 max_insns = CF_COUNT_MASK;
+}
+if (max_insns > TCG_MAX_INSNS) {
+max_insns = TCG_MAX_INSNS;
+}
 
 gen_tb_start(tb);
 
diff --git a/target-cris/translate.c b/target-cris/translate.c
index 3d55a6a..d038bdb 100644
--- a/target-cris/translate.c
+++ b/target-cris/translate.c
@@ -3155,6 +3155,9 @@ gen_intermediate_code_internal(CRISCPU *cpu, TranslationBlock *tb,
     if (max_insns == 0) {
         max_insns = CF_COUNT_MASK;
     }
+    if (max_insns > TCG_MAX_INSNS) {
+        max_insns = TCG_MAX_INSNS;
+    }
 
     gen_tb_start(tb);
     do {
diff --git a/target-i386/translate.c b/target-i386/translate.c
index 7501b91..d3282e8 100644
--- a/target-i386/translate.c
+++ b/target-i386/translate.c
@@ -7932,8 +7932,12 @@ static inline void gen_intermediate_code_internal(X86CPU *cpu,
     lj = -1;
     num_insns = 0;
     max_insns = tb->cflags & CF_COUNT_MASK;
-    if (max_insns == 0)
+    if (max_insns == 0) {
         max_insns = CF_COUNT_MASK;
+    }
+    if (max_insns > TCG_MAX_INSNS) {
+        max_insns = TCG_MAX_INSNS;
+    }
 
     gen_tb_start(tb);
     for(;;) {
diff --git a/target-lm32/translate.c b/target-lm32/translate.c
index 8ea7929..e16c31a 100644
--- a/target-lm32/translate.c
+++ b/target-lm32/translate.c
@@ -1069,6 +1069,9 @@ void gen_intermediate_code_internal(LM32CPU *cpu,
     if (max_insns == 0) {
         max_insns = CF_COUNT_MASK;
     }
+    if (max_insns > TCG_MAX_INSNS) {
+        max_insns = TCG_MAX_INSNS;
+    }
 
     gen_tb_start(tb);
     do {
diff --git a/target-m68k/translate.c b/target-m68k/translate.c
index afef37f..185c565 100644
--- a/target-m68k/translate.c
+++ b/target-m68k/translate.c
@@ -2991,8 +2991,12 @@ gen_intermediate_code_internal(M68kCPU *cpu, TranslationBlock *tb,
     lj = -1;
     num_insns = 0;
     max_insns = tb->cflags & CF_COUNT_MASK;
-    if (max_insns == 0)
+    if (max_insns == 0) {
         max_insns = CF_COUNT_MASK;
+    }
+    if (max_insns > TCG_MAX_INSNS) {
+        max_insns = TCG_MAX_INSNS;
+    }
 
     gen_tb_start(tb);
     do {
diff --git a/target-microblaze/translate.c b/target-microblaze/translate.c
index 1224456..58b27ca 100644
--- a/target-microblaze/translate.c
+++ b/target-microblaze/translate.c
@@ -1674,8 +1674,12 @@ gen_intermediate_code_internal(MicroBlazeCPU *cpu, TranslationBlock *tb,
     lj = -1;
     num_insns = 0;
     max_insns = tb->cflags & CF_COUNT_MASK;
-    if (max_insns == 0)
+    if (max_insns == 0) {
         max_insns = CF_COUNT_MASK;
+    }
+    if (max_insns > TCG_MAX_INSNS) {
+        max_insns = TCG_MAX_INSNS;
+    }
 
     gen_tb_start(tb);
     do
diff --git a/target-mips/translate.c b/target-mips/translate.c
index 30d

Re: [Qemu-devel] Loading image/elf to cpu that has different not system memory address space

2015-09-30 Thread Marcin Krzemiński
2015-09-30 0:59 GMT+02:00 Peter Maydell :

> On 29 September 2015 at 23:40, Alistair Francis 
> wrote:
> > On Thu, Sep 24, 2015 at 11:58 AM, mar.krzeminski
> >  wrote:
> >>
> >>
> >> On 24.09.2015 at 20:38, Peter Crosthwaite wrote:
> >>
> >>> On Thu, Sep 24, 2015 at 10:14 AM, mar.krzeminski
> >>>  wrote:
>  Today I stumbled on another interesting thing - and I do not want to spam
>  this list - it is the hack in cortex-m3 from armv7m.
> 
>   /* Hack to map an additional page of ram at the top of the address
>      space.  This stops qemu complaining about executing code outside RAM
>      when returning from an exception.  */
>   memory_region_init_ram(hack, NULL, "armv7m.hack", 0x1000,
>                          &error_abort);
>
> >> Then it took me a while to understand why qemu crashed while serving an
> >> M3 exception: I had not carried over this hack :)
> >
> > It sounds like you figured out why it's there. From memory it is to
> > handle an exception, because the guest would jump to a really high
> > memory area and if there is no memory there QEMU will throw an error.
>
> Sort of. This is for the exception-return mechanism, which on
> M-profile works by having the function-return instructions
> special-case attempts to "return" to addresses 0xfffffffx, which
> should do exception return semantics. QEMU's implementation
> handles this by treating "attempt to execute at 0xfffffff0"
> specially in the translator (we generate EXCP_EXCEPTION_EXIT).
> For this to work we have to have some fake RAM at that address.
>
> This isn't actually an architecturally correct way to do it,
> because the architecture says that you should only get the
> special exception-return behaviour if you use the right
> instructions to return to one of the magic addresses. If
> you just try to branch into those addresses you should
> get some kind of fault instead.
>
> However, like many things in our M-profile implementation,
> it sort of works and nobody has cared enough about M-profile
> to try to clean it up. (An efficient implementation of the
> right behaviour could be a bit tricky.)
>
> thanks
> -- PMM
>
I now have real memory at 0xfff0 (aliased to a lower memory address).
Does that mean qemu might try to execute some instructions from there?

Regards,
Marcin
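
For reference, a minimal sketch of the check described in the quoted
reply, assuming the magic addresses are the top 16 bytes of the 32-bit
address space. All names here (EXC_RETURN_BASE, PcAction, classify_pc)
are hypothetical and only illustrate the idea; this is not QEMU's actual
translator code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed start of the magic exception-return range (hypothetical name). */
#define EXC_RETURN_BASE 0xfffffff0u

typedef enum {
    ACTION_TRANSLATE,       /* translate guest code at pc as usual */
    ACTION_EXCEPTION_EXIT   /* raise the internal exception-exit event */
} PcAction;

/* Decide what the translator does when asked to start a block at pc. */
static PcAction classify_pc(uint32_t pc, bool is_m_profile)
{
    /* On M-profile, "executing" at a magic address means exception return. */
    if (is_m_profile && pc >= EXC_RETURN_BASE) {
        return ACTION_EXCEPTION_EXIT;
    }
    return ACTION_TRANSLATE;
}
```

In this sketch the memory contents at the magic addresses are never
decoded as instructions on M-profile; the backing page only needs to
exist so that the address is considered executable at all.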

