Re: [Qemu-devel] [PATCH qemu v16 00/19] spapr: vfio: Enable Dynamic DMA windows (DDW)

2016-05-12 Thread Alex Williamson
On Fri, 13 May 2016 14:54:52 +1000
Alexey Kardashevskiy  wrote:

> Alex W,
> 
> could you please review VFIO-related chunks? Thanks!


https://lists.nongnu.org/archive/html/qemu-devel/2016-05/msg00744.html
https://lists.nongnu.org/archive/html/qemu-devel/2016-05/msg00745.html


> On 05/04/2016 04:52 PM, Alexey Kardashevskiy wrote:
> > Each Partitionable Endpoint (IOMMU group) has an address range on a PCI bus
> > where devices are allowed to do DMA. These ranges are called DMA windows.
> > By default, there is a single DMA window, 1 or 2GB big, mapped at zero
> > on a PCI bus.
> >
> > PAPR defines a DDW RTAS API which allows pseries guests
> > to query the hypervisor about DDW support and capabilities (page size mask
> > for now). A pseries guest may request additional (to the default)
> > DMA windows using this RTAS API.
> > The existing pseries Linux guests request an additional window as big as
> > the guest RAM and map the entire guest window which effectively creates
> > direct mapping of the guest memory to a PCI bus.
> >
> > This patchset reworks PPC64 IOMMU code and adds necessary structures
> > to support big windows on pseries.
> >
> > This patchset is based on the latest upstream.
> >
> > This includes "vmstate: Define VARRAY with VMS_ALLOC" as it has been accepted
> > but has not been merged to upstream yet.
> >
> > Please comment. Thanks!
> >
> >
> > Paolo, I did cc: you on this because of 02/19 and 03/19, would be great to
> > get an opinion as the rest of the series relies on it to do
> > vfio-pci hot _un_plug properly. Thanks!
> >
> >
> > Alexey Kardashevskiy (19):
> >   vfio: Delay DMA address space listener release
> >   memory: Call region_del() callbacks on memory listener unregistering
> >   memory: Fix IOMMU replay base address
> >   vmstate: Define VARRAY with VMS_ALLOC
> >   vfio: Check that IOMMU MR translates to system address space
> >   spapr_pci: Use correct DMA LIOBN when composing the device tree
> >   spapr_iommu: Move table allocation to helpers
> >   spapr_iommu: Introduce "enabled" state for TCE table
> >   spapr_iommu: Finish renaming vfio_accel to need_vfio
> >   spapr_iommu: Migrate full state
> >   spapr_iommu: Add root memory region
> >   spapr_pci: Reset DMA config on PHB reset
> >   memory: Add reporting of supported page sizes
> >   vfio: spapr: Add DMA memory preregistering (SPAPR IOMMU v2)
> >   spapr_pci: Add and export DMA resetting helper
> >   vfio: Add host side DMA window capabilities
> >   spapr_iommu, vfio, memory: Notify IOMMU about starting/stopping being
> > used by VFIO
> >   vfio/spapr: Create DMA window dynamically (SPAPR IOMMU v2)
> >   spapr_pci/spapr_pci_vfio: Support Dynamic DMA Windows (DDW)
> >
> >  hw/ppc/Makefile.objs          |   1 +
> >  hw/ppc/spapr.c                |   5 +
> >  hw/ppc/spapr_iommu.c          | 228 --
> >  hw/ppc/spapr_pci.c            |  96 +
> >  hw/ppc/spapr_rtas_ddw.c       | 292 ++
> >  hw/ppc/spapr_vio.c            |   8 +-
> >  hw/vfio/Makefile.objs         |   1 +
> >  hw/vfio/common.c              | 319 +++---
> >  hw/vfio/prereg.c              | 137 ++
> >  include/exec/memory.h         |  22 ++-
> >  include/hw/pci-host/spapr.h   |  10 +-
> >  include/hw/ppc/spapr.h        |  31 +++-
> >  include/hw/vfio/vfio-common.h |  21 ++-
> >  include/migration/vmstate.h   |  10 ++
> >  memory.c                      |  64 -
> >  target-ppc/kvm_ppc.h          |   2 +-
> >  trace-events                  |  12 +-
> >  17 files changed, 1120 insertions(+), 139 deletions(-)
> >  create mode 100644 hw/ppc/spapr_rtas_ddw.c
> >  create mode 100644 hw/vfio/prereg.c
> >  
> 
> 




Re: [Qemu-devel] [PATCH qemu v16 00/19] spapr: vfio: Enable Dynamic DMA windows (DDW)

2016-05-12 Thread Alexey Kardashevskiy

Alex W,

could you please review VFIO-related chunks? Thanks!


On 05/04/2016 04:52 PM, Alexey Kardashevskiy wrote:

Each Partitionable Endpoint (IOMMU group) has an address range on a PCI bus
where devices are allowed to do DMA. These ranges are called DMA windows.
By default, there is a single DMA window, 1 or 2GB big, mapped at zero
on a PCI bus.

PAPR defines a DDW RTAS API which allows pseries guests
to query the hypervisor about DDW support and capabilities (page size mask
for now). A pseries guest may request additional (to the default)
DMA windows using this RTAS API.
The existing pseries Linux guests request an additional window as big as
the guest RAM and map the entire guest window which effectively creates
direct mapping of the guest memory to a PCI bus.

This patchset reworks PPC64 IOMMU code and adds necessary structures
to support big windows on pseries.

This patchset is based on the latest upstream.

This includes "vmstate: Define VARRAY with VMS_ALLOC" as it has been accepted
but has not been merged to upstream yet.

Please comment. Thanks!


Paolo, I did cc: you on this because of 02/19 and 03/19, would be great to
get an opinion as the rest of the series relies on it to do
vfio-pci hot _un_plug properly. Thanks!


Alexey Kardashevskiy (19):
  vfio: Delay DMA address space listener release
  memory: Call region_del() callbacks on memory listener unregistering
  memory: Fix IOMMU replay base address
  vmstate: Define VARRAY with VMS_ALLOC
  vfio: Check that IOMMU MR translates to system address space
  spapr_pci: Use correct DMA LIOBN when composing the device tree
  spapr_iommu: Move table allocation to helpers
  spapr_iommu: Introduce "enabled" state for TCE table
  spapr_iommu: Finish renaming vfio_accel to need_vfio
  spapr_iommu: Migrate full state
  spapr_iommu: Add root memory region
  spapr_pci: Reset DMA config on PHB reset
  memory: Add reporting of supported page sizes
  vfio: spapr: Add DMA memory preregistering (SPAPR IOMMU v2)
  spapr_pci: Add and export DMA resetting helper
  vfio: Add host side DMA window capabilities
  spapr_iommu, vfio, memory: Notify IOMMU about starting/stopping being
used by VFIO
  vfio/spapr: Create DMA window dynamically (SPAPR IOMMU v2)
  spapr_pci/spapr_pci_vfio: Support Dynamic DMA Windows (DDW)

 hw/ppc/Makefile.objs          |   1 +
 hw/ppc/spapr.c                |   5 +
 hw/ppc/spapr_iommu.c          | 228 --
 hw/ppc/spapr_pci.c            |  96 +
 hw/ppc/spapr_rtas_ddw.c       | 292 ++
 hw/ppc/spapr_vio.c            |   8 +-
 hw/vfio/Makefile.objs         |   1 +
 hw/vfio/common.c              | 319 +++---
 hw/vfio/prereg.c              | 137 ++
 include/exec/memory.h         |  22 ++-
 include/hw/pci-host/spapr.h   |  10 +-
 include/hw/ppc/spapr.h        |  31 +++-
 include/hw/vfio/vfio-common.h |  21 ++-
 include/migration/vmstate.h   |  10 ++
 memory.c                      |  64 -
 target-ppc/kvm_ppc.h          |   2 +-
 trace-events                  |  12 +-
 17 files changed, 1120 insertions(+), 139 deletions(-)
 create mode 100644 hw/ppc/spapr_rtas_ddw.c
 create mode 100644 hw/vfio/prereg.c




--
Alexey



[Qemu-devel] [PATCH v3 1/4] hw/audio: QOM'ify cs4231.c

2016-05-12 Thread xiaoqiang zhao
Drop the old SysBus init function and use instance_init

Reviewed-by: Paolo Bonzini 
Signed-off-by: xiaoqiang zhao 
---
 hw/audio/cs4231.c | 12 +---
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/hw/audio/cs4231.c b/hw/audio/cs4231.c
index caf97c1..30690f9 100644
--- a/hw/audio/cs4231.c
+++ b/hw/audio/cs4231.c
@@ -145,16 +145,15 @@ static const VMStateDescription vmstate_cs4231 = {
 }
 };
 
-static int cs4231_init1(SysBusDevice *dev)
+static void cs4231_init(Object *obj)
 {
-CSState *s = CS4231(dev);
+CSState *s = CS4231(obj);
+SysBusDevice *dev = SYS_BUS_DEVICE(obj);
 
-memory_region_init_io(&s->iomem, OBJECT(s), &cs_mem_ops, s, "cs4321",
+memory_region_init_io(&s->iomem, obj, &cs_mem_ops, s, "cs4321",
   CS_SIZE);
 sysbus_init_mmio(dev, &s->iomem);
 sysbus_init_irq(dev, &s->irq);
-
-return 0;
 }
 
 static Property cs4231_properties[] = {
@@ -164,9 +163,7 @@ static Property cs4231_properties[] = {
 static void cs4231_class_init(ObjectClass *klass, void *data)
 {
 DeviceClass *dc = DEVICE_CLASS(klass);
-SysBusDeviceClass *k = SYS_BUS_DEVICE_CLASS(klass);
 
-k->init = cs4231_init1;
 dc->reset = cs_reset;
 dc->vmsd = &vmstate_cs4231;
 dc->props = cs4231_properties;
@@ -176,6 +173,7 @@ static const TypeInfo cs4231_info = {
 .name  = TYPE_CS4231,
 .parent= TYPE_SYS_BUS_DEVICE,
 .instance_size = sizeof(CSState),
+.instance_init = cs4231_init,
 .class_init= cs4231_class_init,
 };
 
-- 
2.1.4





[Qemu-devel] [PATCH v3 4/4] hw/audio: QOM'ify milkymist-ac97.c

2016-05-12 Thread xiaoqiang zhao
* Drop the old SysBus init function and use instance_init
* Move AUD_open_in / AUD_open_out function into realize stage
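
As a rough illustration of the split described in the two points above, here is a toy model in plain C. It is only a sketch: ToyDevice, toy_instance_init() and toy_realize() are hypothetical names and not QEMU API. The idea it tries to show is that the init step only touches the object itself and cannot fail, while the realize step is where outside resources (here, a pretend audio backend) get opened and where an error can be reported.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a device; none of these names are QEMU API. */
typedef struct {
    int irq_lines;         /* wired up in the init step */
    bool audio_available;  /* stand-in for external state (audio backend) */
    bool voice_open;       /* set by the realize step */
} ToyDevice;

/* instance_init analogue: cannot fail, only touches the object itself. */
static void toy_instance_init(ToyDevice *d)
{
    d->irq_lines = 4;
    d->audio_available = true;
    d->voice_open = false;
}

/* realize analogue: may fail, and reports the reason through errp. */
static bool toy_realize(ToyDevice *d, const char **errp)
{
    if (!d->audio_available) {
        if (errp) {
            *errp = "audio backend not available";
        }
        return false;
    }
    d->voice_open = true;
    return true;
}
```

A caller would run toy_instance_init() once when the object is created and toy_realize() later, after properties have been set, checking the return value before using the device.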

Acked-by: Michael Walle 
Tested-by: Michael Walle 
Signed-off-by: xiaoqiang zhao 
---
 hw/audio/milkymist-ac97.c | 26 +++---
 1 file changed, 15 insertions(+), 11 deletions(-)

diff --git a/hw/audio/milkymist-ac97.c b/hw/audio/milkymist-ac97.c
index 6a3b536..5075c2b 100644
--- a/hw/audio/milkymist-ac97.c
+++ b/hw/audio/milkymist-ac97.c
@@ -284,16 +284,26 @@ static int ac97_post_load(void *opaque, int version_id)
 return 0;
 }
 
-static int milkymist_ac97_init(SysBusDevice *dev)
+static void milkymist_ac97_init(Object *obj)
 {
-MilkymistAC97State *s = MILKYMIST_AC97(dev);
+MilkymistAC97State *s = MILKYMIST_AC97(obj);
+SysBusDevice *dev = SYS_BUS_DEVICE(obj);
 
-struct audsettings as;
 sysbus_init_irq(dev, &s->crrequest_irq);
 sysbus_init_irq(dev, &s->crreply_irq);
 sysbus_init_irq(dev, &s->dmar_irq);
 sysbus_init_irq(dev, &s->dmaw_irq);
 
+memory_region_init_io(&s->regs_region, obj, &ac97_mmio_ops, s,
+"milkymist-ac97", R_MAX * 4);
+sysbus_init_mmio(dev, &s->regs_region);
+}
+
+static void milkymist_ac97_realize(DeviceState *dev, Error **errp)
+{
+MilkymistAC97State *s = MILKYMIST_AC97(dev);
+struct audsettings as;
+
 AUD_register_card("Milkymist AC'97", &s->card);
 
 as.freq = 48000;
@@ -305,12 +315,6 @@ static int milkymist_ac97_init(SysBusDevice *dev)
 "mm_ac97.in", s, ac97_in_cb, &as);
 s->voice_out = AUD_open_out(&s->card, s->voice_out,
 "mm_ac97.out", s, ac97_out_cb, &as);
-
-memory_region_init_io(&s->regs_region, OBJECT(s), &ac97_mmio_ops, s,
-"milkymist-ac97", R_MAX * 4);
-sysbus_init_mmio(dev, &s->regs_region);
-
-return 0;
 }
 
 static const VMStateDescription vmstate_milkymist_ac97 = {
@@ -327,9 +331,8 @@ static const VMStateDescription vmstate_milkymist_ac97 = {
 static void milkymist_ac97_class_init(ObjectClass *klass, void *data)
 {
 DeviceClass *dc = DEVICE_CLASS(klass);
-SysBusDeviceClass *k = SYS_BUS_DEVICE_CLASS(klass);
 
-k->init = milkymist_ac97_init;
+dc->realize = milkymist_ac97_realize;
 dc->reset = milkymist_ac97_reset;
 dc->vmsd = &vmstate_milkymist_ac97;
 }
@@ -338,6 +341,7 @@ static const TypeInfo milkymist_ac97_info = {
 .name  = TYPE_MILKYMIST_AC97,
 .parent= TYPE_SYS_BUS_DEVICE,
 .instance_size = sizeof(MilkymistAC97State),
+.instance_init = milkymist_ac97_init,
 .class_init= milkymist_ac97_class_init,
 };
 
-- 
2.1.4





[Qemu-devel] [PATCH v3 2/4] hw/audio: QOM cleanup for intel-hda

2016-05-12 Thread xiaoqiang zhao
drop the DO_UPCAST macro
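
To make the difference concrete, here is a small stand-alone sketch in plain C. It is a simplified model, not the QEMU macros: UPCAST_SKETCH, BaseObj, CodecObj and codec_checked_cast() are hypothetical names. The first form only does pointer arithmetic on a member offset, so nothing checks what the pointer really refers to; the second looks at a type tag before casting. In this sketch a failed check returns NULL purely so it can be observed, which is an assumption of the example.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical base object carrying a type tag. */
typedef struct {
    const char *type_name;
} BaseObj;

/* Hypothetical derived object; the base must be the first member. */
typedef struct {
    BaseObj parent;
    int cad;
} CodecObj;

/* Offset-based upcast: pure pointer arithmetic, no type check at all. */
#define UPCAST_SKETCH(type, field, ptr) \
    ((type *)((char *)(ptr) - offsetof(type, field)))

/* Checked cast: verifies the type tag before handing out the pointer. */
static CodecObj *codec_checked_cast(BaseObj *obj)
{
    if (obj == NULL || strcmp(obj->type_name, "codec") != 0) {
        return NULL;
    }
    return (CodecObj *)obj;
}
```

With a BaseObj that is not embedded in a CodecObj, the offset-based form still hands back a pointer, while the checked form rejects it.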

Signed-off-by: xiaoqiang zhao 
---
 hw/audio/intel-hda.c | 20 ++--
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/hw/audio/intel-hda.c b/hw/audio/intel-hda.c
index d372d4a..5b1e760 100644
--- a/hw/audio/intel-hda.c
+++ b/hw/audio/intel-hda.c
@@ -52,8 +52,8 @@ void hda_codec_bus_init(DeviceState *dev, HDACodecBus *bus, size_t bus_size,
 
 static int hda_codec_dev_init(DeviceState *qdev)
 {
-HDACodecBus *bus = DO_UPCAST(HDACodecBus, qbus, qdev->parent_bus);
-HDACodecDevice *dev = DO_UPCAST(HDACodecDevice, qdev, qdev);
+HDACodecBus *bus = HDA_BUS(qdev->parent_bus);
+HDACodecDevice *dev = HDA_CODEC_DEVICE(qdev);
 HDACodecDeviceClass *cdc = HDA_CODEC_DEVICE_GET_CLASS(dev);
 
 if (dev->cad == -1) {
@@ -68,7 +68,7 @@ static int hda_codec_dev_init(DeviceState *qdev)
 
 static int hda_codec_dev_exit(DeviceState *qdev)
 {
-HDACodecDevice *dev = DO_UPCAST(HDACodecDevice, qdev, qdev);
+HDACodecDevice *dev = HDA_CODEC_DEVICE(qdev);
 HDACodecDeviceClass *cdc = HDA_CODEC_DEVICE_GET_CLASS(dev);
 
 if (cdc->exit) {
@@ -84,7 +84,7 @@ HDACodecDevice *hda_codec_find(HDACodecBus *bus, uint32_t cad)
 
 QTAILQ_FOREACH(kid, &bus->qbus.children, sibling) {
 DeviceState *qdev = kid->child;
-cdev = DO_UPCAST(HDACodecDevice, qdev, qdev);
+cdev = HDA_CODEC_DEVICE(qdev);
 if (cdev->cad == cad) {
 return cdev;
 }
@@ -94,14 +94,14 @@ HDACodecDevice *hda_codec_find(HDACodecBus *bus, uint32_t cad)
 
 void hda_codec_response(HDACodecDevice *dev, bool solicited, uint32_t response)
 {
-HDACodecBus *bus = DO_UPCAST(HDACodecBus, qbus, dev->qdev.parent_bus);
+HDACodecBus *bus = HDA_BUS(dev->qdev.parent_bus);
 bus->response(dev, solicited, response);
 }
 
 bool hda_codec_xfer(HDACodecDevice *dev, uint32_t stnr, bool output,
 uint8_t *buf, uint32_t len)
 {
-HDACodecBus *bus = DO_UPCAST(HDACodecBus, qbus, dev->qdev.parent_bus);
+HDACodecBus *bus = HDA_BUS(dev->qdev.parent_bus);
 return bus->xfer(dev, stnr, output, buf, len);
 }
 
@@ -337,7 +337,7 @@ static void intel_hda_corb_run(IntelHDAState *d)
 
 static void intel_hda_response(HDACodecDevice *dev, bool solicited, uint32_t response)
 {
-HDACodecBus *bus = DO_UPCAST(HDACodecBus, qbus, dev->qdev.parent_bus);
+HDACodecBus *bus = HDA_BUS(dev->qdev.parent_bus);
 IntelHDAState *d = container_of(bus, IntelHDAState, codecs);
 hwaddr addr;
 uint32_t wp, ex;
@@ -386,7 +386,7 @@ static void intel_hda_response(HDACodecDevice *dev, bool solicited, uint32_t res
 static bool intel_hda_xfer(HDACodecDevice *dev, uint32_t stnr, bool output,
uint8_t *buf, uint32_t len)
 {
-HDACodecBus *bus = DO_UPCAST(HDACodecBus, qbus, dev->qdev.parent_bus);
+HDACodecBus *bus = HDA_BUS(dev->qdev.parent_bus);
 IntelHDAState *d = container_of(bus, IntelHDAState, codecs);
 hwaddr addr;
 uint32_t s, copy, left;
@@ -493,7 +493,7 @@ static void intel_hda_notify_codecs(IntelHDAState *d, uint32_t stream, bool runn
 DeviceState *qdev = kid->child;
 HDACodecDeviceClass *cdc;
 
-cdev = DO_UPCAST(HDACodecDevice, qdev, qdev);
+cdev = HDA_CODEC_DEVICE(qdev);
 cdc = HDA_CODEC_DEVICE_GET_CLASS(cdev);
 if (cdc->stream) {
 cdc->stream(cdev, stream, running, output);
@@ -1120,7 +1120,7 @@ static void intel_hda_reset(DeviceState *dev)
 /* reset codecs */
 QTAILQ_FOREACH(kid, &d->codecs.qbus.children, sibling) {
 DeviceState *qdev = kid->child;
-cdev = DO_UPCAST(HDACodecDevice, qdev, qdev);
+cdev = HDA_CODEC_DEVICE(qdev);
 device_reset(DEVICE(cdev));
 d->state_sts |= (1 << cdev->cad);
 }
-- 
2.1.4





[Qemu-devel] [PATCH v3 3/4] hw/audio: QOM'ify intel-hda

2016-05-12 Thread xiaoqiang zhao
* use DeviceClass::realize instead of DeviceClass::init

Signed-off-by: xiaoqiang zhao 
---
 hw/audio/intel-hda.c | 12 
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/hw/audio/intel-hda.c b/hw/audio/intel-hda.c
index 5b1e760..93d7669 100644
--- a/hw/audio/intel-hda.c
+++ b/hw/audio/intel-hda.c
@@ -26,6 +26,7 @@
 #include "intel-hda.h"
 #include "intel-hda-defs.h"
 #include "sysemu/dma.h"
+#include "qapi/error.h"
 
 /* - */
 /* hda bus   */
@@ -50,7 +51,7 @@ void hda_codec_bus_init(DeviceState *dev, HDACodecBus *bus, size_t bus_size,
 bus->xfer = xfer;
 }
 
-static int hda_codec_dev_init(DeviceState *qdev)
+static void hda_codec_dev_realize(DeviceState *qdev, Error **errp)
 {
 HDACodecBus *bus = HDA_BUS(qdev->parent_bus);
 HDACodecDevice *dev = HDA_CODEC_DEVICE(qdev);
@@ -60,10 +61,13 @@ static int hda_codec_dev_init(DeviceState *qdev)
 dev->cad = bus->next_cad;
 }
 if (dev->cad >= 15) {
-return -1;
+error_setg(errp, "HDA audio codec address is full");
+return;
 }
 bus->next_cad = dev->cad + 1;
-return cdc->init(dev);
+if (cdc->init(dev) != 0) {
+error_setg(errp, "HDA audio init failed");
+}
 }
 
 static int hda_codec_dev_exit(DeviceState *qdev)
@@ -1298,7 +1302,7 @@ static const TypeInfo intel_hda_info_ich9 = {
 static void hda_codec_device_class_init(ObjectClass *klass, void *data)
 {
 DeviceClass *k = DEVICE_CLASS(klass);
-k->init = hda_codec_dev_init;
+k->realize = hda_codec_dev_realize;
 k->exit = hda_codec_dev_exit;
 set_bit(DEVICE_CATEGORY_SOUND, k->categories);
 k->bus_type = TYPE_HDA_BUS;
-- 
2.1.4





Re: [Qemu-devel] [PATCH V2] net/net: Add SocketReadState for reuse codes

2016-05-12 Thread Jason Wang



On 2016-05-12 17:19, Zhang Chen wrote:

This function comes from net/socket.c; move it to net.c and net.h.
Add SocketReadState so that others can reuse net_fill_rstate().
Suggested by Jason.

v2:
  - rename ReadState to SocketReadState
  - add SocketReadState init and finalize callback

v1:
  - init patch

Signed-off-by: Zhang Chen 
Signed-off-by: Li Zhijian 
Signed-off-by: Wen Congyang 
---
  include/net/net.h   | 14 +
  net/filter-mirror.c | 72 ++-
  net/net.c   | 65 ++
  net/socket.c| 89 +
  4 files changed, 137 insertions(+), 103 deletions(-)

diff --git a/include/net/net.h b/include/net/net.h
index 73e4c46..4f6b6bf 100644
--- a/include/net/net.h
+++ b/include/net/net.h
@@ -57,6 +57,9 @@ typedef void (SetOffload)(NetClientState *, int, int, int, int, int);
  typedef void (SetVnetHdrLen)(NetClientState *, int);
  typedef int (SetVnetLE)(NetClientState *, bool);
  typedef int (SetVnetBE)(NetClientState *, bool);
+typedef struct SocketReadState SocketReadState;
+typedef void (SocketReadStateInit)(SocketReadState *rs);
+typedef void (SocketReadStateFinalize)(SocketReadState *rs);
  
  typedef struct NetClientInfo {

  NetClientOptionsKind type;
@@ -102,6 +105,16 @@ typedef struct NICState {
  bool peer_deleted;
  } NICState;
  
+struct SocketReadState {

+int state; /* 0 = getting length, 1 = getting data */
+uint32_t index;
+uint32_t packet_len;
+uint8_t buf[NET_BUFSIZE];
+SocketReadStateInit *init;
+SocketReadStateFinalize *finalize;
+};
+
+int net_fill_rstate(SocketReadState *rs, const uint8_t *buf, int size);
  char *qemu_mac_strdup_printf(const uint8_t *macaddr);
  NetClientState *qemu_find_netdev(const char *id);
  int qemu_find_net_clients_except(const char *id, NetClientState **ncs,
@@ -160,6 +173,7 @@ ssize_t qemu_deliver_packet_iov(NetClientState *sender,
  
  void print_net_client(Monitor *mon, NetClientState *nc);

  void hmp_info_network(Monitor *mon, const QDict *qdict);
+void net_socket_rs_init(SocketReadState *rs);
  
  /* NIC info */
  
diff --git a/net/filter-mirror.c b/net/filter-mirror.c

index c0c4dc6..4e55924 100644
--- a/net/filter-mirror.c
+++ b/net/filter-mirror.c
@@ -40,10 +40,7 @@ typedef struct MirrorState {
  char *outdev;
  CharDriverState *chr_in;
  CharDriverState *chr_out;
-int state; /* 0 = getting length, 1 = getting data */
-unsigned int index;
-unsigned int packet_len;
-uint8_t buf[REDIRECTOR_MAX_LEN];
+SocketReadState rs;
  } MirrorState;
  
  static int filter_mirror_send(CharDriverState *chr_out,

@@ -108,50 +105,15 @@ static void redirector_chr_read(void *opaque, const uint8_t *buf, int size)
  {
  NetFilterState *nf = opaque;
  MirrorState *s = FILTER_REDIRECTOR(nf);
-unsigned int l;
-
-while (size > 0) {
-/* reassemble a packet from the network */
-switch (s->state) { /* 0 = getting length, 1 = getting data */
-case 0:
-l = 4 - s->index;
-if (l > size) {
-l = size;
-}
-memcpy(s->buf + s->index, buf, l);
-buf += l;
-size -= l;
-s->index += l;
-if (s->index == 4) {
-/* got length */
-s->packet_len = ntohl(*(uint32_t *)s->buf);
-s->index = 0;
-s->state = 1;
-}
-break;
-case 1:
-l = s->packet_len - s->index;
-if (l > size) {
-l = size;
-}
-if (s->index + l <= sizeof(s->buf)) {
-memcpy(s->buf + s->index, buf, l);
-} else {
-error_report("serious error: oversized packet received.");
-s->index = s->state = 0;
-qemu_chr_add_handlers(s->chr_in, NULL, NULL, NULL, NULL);
-return;
-}
-
-s->index += l;
-buf += l;
-size -= l;
-if (s->index >= s->packet_len) {
-s->index = 0;
-s->state = 0;
-redirector_to_filter(nf, s->buf, s->packet_len);
-}
-break;
+int ret;
+
+ret = net_fill_rstate(&s->rs, buf, size);
+
+if (ret == -1) {
+qemu_chr_add_handlers(s->chr_in, NULL, NULL, NULL, NULL);
+} else if (ret == 1) {
+if (s->rs.finalize) {
+s->rs.finalize(&s->rs);


Why not simply call this in net_fill_rstate()?
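
To illustrate the suggestion, here is a stand-alone sketch of a length-prefixed reassembler that invokes the finalize callback itself once a packet is complete, so callers do not need to. It is not the patch's code: ReadStateSketch, fill_rstate_sketch(), SKETCH_BUFSIZE and count_packet() are hypothetical names, the buffer size is an arbitrary assumption, and the 4-byte big-endian length prefix mirrors the ntohl() use in the removed hunk above.

```c
#include <stdint.h>
#include <string.h>

#define SKETCH_BUFSIZE 4096  /* assumption; stands in for NET_BUFSIZE */

typedef struct ReadStateSketch ReadStateSketch;
typedef void (FinalizeFn)(ReadStateSketch *rs);

struct ReadStateSketch {
    int state;             /* 0 = getting length, 1 = getting data */
    uint32_t index;
    uint32_t packet_len;
    uint8_t buf[SKETCH_BUFSIZE];
    FinalizeFn *finalize;  /* called once per completed packet */
};

/* Feed size bytes in; returns 0 on success, -1 on an oversized packet. */
static int fill_rstate_sketch(ReadStateSketch *rs, const uint8_t *buf, int size)
{
    uint32_t l;

    while (size > 0) {
        if (rs->state == 0) {
            /* Collect the 4-byte big-endian length prefix. */
            l = 4 - rs->index;
            if (l > (uint32_t)size) {
                l = (uint32_t)size;
            }
            memcpy(rs->buf + rs->index, buf, l);
            buf += l;
            size -= (int)l;
            rs->index += l;
            if (rs->index == 4) {
                rs->packet_len = ((uint32_t)rs->buf[0] << 24) |
                                 ((uint32_t)rs->buf[1] << 16) |
                                 ((uint32_t)rs->buf[2] << 8) |
                                 (uint32_t)rs->buf[3];
                rs->index = 0;
                rs->state = 1;
            }
        } else {
            /* Collect payload bytes until packet_len is reached. */
            l = rs->packet_len - rs->index;
            if (l > (uint32_t)size) {
                l = (uint32_t)size;
            }
            if (rs->index + l > sizeof(rs->buf)) {
                rs->index = rs->state = 0;
                return -1;
            }
            memcpy(rs->buf + rs->index, buf, l);
            buf += l;
            size -= (int)l;
            rs->index += l;
            if (rs->index >= rs->packet_len) {
                rs->index = 0;
                rs->state = 0;
                if (rs->finalize) {
                    rs->finalize(rs);  /* the callee delivers the packet */
                }
            }
        }
    }
    return 0;
}

/* Example callback: counts completed packets and records the last length. */
static int packets_seen;
static uint32_t last_len;

static void count_packet(ReadStateSketch *rs)
{
    packets_seen++;
    last_len = rs->packet_len;
}
```

With this shape the caller only has to look for the -1 error case; whether that fits the other users of the function is a design question for the patch author.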


  }
  }
  }
@@ -258,6 +220,14 @@ static void filter_mirror_setup(NetFilterState *nf, Error **errp)
  }
  }
  
+static void redirector_rs_finalize(SocketReadState *rs)

+{
+MirrorState *s = container_of(rs, MirrorState, rs);
+NetFilterState *nf = NETFILTER(s);
+
+redirector_to_filter(nf, rs->buf, 

Re: [Qemu-devel] [RFC PATCH v3 3/3] VFIO Type1 IOMMU change: to support with iommu and without iommu

2016-05-12 Thread Tian, Kevin
> From: Alex Williamson [mailto:alex.william...@redhat.com]
> Sent: Friday, May 13, 2016 3:06 AM
> 
> > >
> >
> > Based on above thought I'm thinking whether below would work:
> > (let's use gpa to replace existing iova in type1 driver, while using iova
> > for the one actually used in vGPU driver. Assume 'pin-all' scenario first
> > which matches existing vfio logic)
> >
> > - No change to existing vfio_dma structure. VFIO still maintains gpa<->vaddr
> > mapping, in coarse-grained regions;
> >
> > - Leverage same page accounting/pinning logic in type1 driver, which
> > should be enough for 'pin-all' usage;
> >
> > - Then main divergence point for vGPU would be in vfio_unmap_unpin
> > and vfio_iommu_map. I'm not sure whether it's easy to fake an
> > iommu_domain for vGPU so same iommu_map/unmap can be reused.
> 
> This seems troublesome.  Kirti's version used numerous api-only tests
> to avoid these which made the code difficult to trace.  Clearly one
> option is to split out the common code so that a new mediated-type1
> backend skips this, but they thought they could clean it up without
> this, so we'll see what happens in the next version.
> 
> > If not, we may introduce two new map/unmap callbacks provided
> > specifically by vGPU core driver, as you suggested:
> >
> > * vGPU core driver uses dma_map_page to map specified pfns:
> >
> > o When IOMMU is enabled, we'll get an iova returned different
> > from pfn;
> > o When IOMMU is disabled, returned iova is same as pfn;
> 
> Either way each iova needs to be stored and we have a worst case of one
> iova per page of guest memory.
> 
> > * Then vGPU core driver just maintains its own gpa<->iova lookup
> > table (e.g. called vgpu_dma)
> >
> > * Because each vfio_iommu_map invocation is about a contiguous
> > region, we can expect same number of vgpu_dma entries as maintained
> > for vfio_dma list;
> >
> > Then it's vGPU core driver's responsibility to provide gpa<->iova
> > lookup for vendor specific GPU driver. And we don't need worry about
> > tens of thousands of entries. Once we get this simple 'pin-all' model
> > ready, then it can be further extended to support 'pin-sparse'
> > scenario. We still maintain a top-level vgpu_dma list with each entry to
> > further link its own sparse mapping structure. In reality I don't expect
> > we really need to maintain per-page translation even with sparse pinning.
> 
> If you're trying to equate the scale of what we need to track vs what
> type1 currently tracks, they're significantly different.  Possible
> things we need to track include the pfn, the iova, and possibly a
> reference count or some sort of pinned page map.  In the pin-all model
> we can assume that every page is pinned on map and unpinned on unmap,
> so a reference count or map is unnecessary.  We can also assume that we
> can always regenerate the pfn with get_user_pages() from the vaddr, so
> we don't need to track that.  I don't see any way around tracking the
> iova.  The iommu can't tell us this like it can with the normal type1
> model because the pfn is the result of the translation, not the key for
> the translation. So we're always going to have between 1 and
> (size/PAGE_SIZE) iova entries per vgpu_dma entry.  You might be able to
> manage the vgpu_dma with an rb-tree, but each vgpu_dma entry needs some
> data structure tracking every iova.

There is one option. We may use alloc_iova to reserve a contiguous iova
range for each vgpu_dma range and then use iommu_map/unmap to
write iommu ptes later upon map request (the number of entries could then
be the same as for vfio_dma, compared to unbounded entries when using
dma_map_page). Of course this needs to be done in the vGPU core driver,
since vfio type1 only sees a faked iommu domain.

> 
> Sparse mapping has the same issue but of course the tree of iovas is
> potentially incomplete and we need a way to determine where it's
> incomplete.  A page table rooted in the vgpu_dma and indexed by the
> offset from the start vaddr seems like the way to go here.  It's also
> possible that some mediated device models might store the iova in the
> command sent to the device and therefore be able to parse those entries
> back out to unmap them without storing them separately.  This might be
> how the s390 channel-io model would prefer to work.  That seems like
> further validation that such tracking is going to be dependent on the
> mediated driver itself and probably not something to centralize in a
> mediated iommu driver.  Thanks,
> 

Another simpler way might be to allocate an array for each memory
region registered from user space. For a 512MB region, it means a
512K*4=2MB array to track the pfn or iova mapping corresponding to
a gfn. It may consume more resources than an rb-tree when not many
pages need to be pinned, but could consume less when the rb-tree grows
a lot.

Is such an array-based approach considered ugly in the kernel? :-)
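
For reference, a minimal user-space sketch of such a per-region array, indexed by gfn offset. Everything here is an assumption made for illustration: the names (RegionMapSketch, region_map_*), the 4 KiB page size, and the 4-byte entry holding an iova page frame number. Under those assumptions, 512K entries (2 MB) cover 2 GB of guest memory, and a 512 MB region needs 128K entries (512 KB).

```c
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_PAGE_SHIFT 12          /* assumption: 4 KiB pages */
#define SKETCH_UNMAPPED   UINT32_MAX  /* marks a gfn with no iova yet */

typedef struct {
    uint64_t base_gfn;   /* first guest frame number of the region */
    uint64_t npages;
    uint32_t *iova_pfn;  /* iova_pfn[gfn - base_gfn], one entry per page */
} RegionMapSketch;

/* Bytes needed to track a region with one entry per page. */
static uint64_t region_map_bytes(uint64_t region_bytes, uint64_t entry_size)
{
    return (region_bytes >> SKETCH_PAGE_SHIFT) * entry_size;
}

static int region_map_init(RegionMapSketch *m, uint64_t base_gpa,
                           uint64_t region_bytes)
{
    uint64_t i;

    m->base_gfn = base_gpa >> SKETCH_PAGE_SHIFT;
    m->npages = region_bytes >> SKETCH_PAGE_SHIFT;
    m->iova_pfn = malloc((size_t)m->npages * sizeof(*m->iova_pfn));
    if (!m->iova_pfn) {
        return -1;
    }
    for (i = 0; i < m->npages; i++) {
        m->iova_pfn[i] = SKETCH_UNMAPPED;
    }
    return 0;
}

/* Record the iova for a gfn; -1 if the gfn is outside the region. */
static int region_map_set(RegionMapSketch *m, uint64_t gfn, uint32_t iova_pfn)
{
    if (gfn < m->base_gfn || gfn - m->base_gfn >= m->npages) {
        return -1;
    }
    m->iova_pfn[gfn - m->base_gfn] = iova_pfn;
    return 0;
}

/* O(1) lookup; SKETCH_UNMAPPED if unknown or out of range. */
static uint32_t region_map_get(const RegionMapSketch *m, uint64_t gfn)
{
    if (gfn < m->base_gfn || gfn - m->base_gfn >= m->npages) {
        return SKETCH_UNMAPPED;
    }
    return m->iova_pfn[gfn - m->base_gfn];
}

static void region_map_destroy(RegionMapSketch *m)
{
    free(m->iova_pfn);
    m->iova_pfn = NULL;
    m->npages = 0;
}
```

The trade-off is the one described above: the array costs a fixed amount per region regardless of how many pages are pinned, in exchange for constant-time lookup without tree nodes.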

Thanks
Kevin



Re: [Qemu-devel] [RFC PATCH V3 1/4] colo-compare: introduce colo compare initlization

2016-05-12 Thread Jason Wang



On 2016-05-12 16:16, Zhang Chen wrote:



On 05/12/2016 04:01 PM, Jason Wang wrote:



On 2016-05-12 14:49, Zhang Chen wrote:



On 05/09/2016 06:49 PM, Zhang Chen wrote:



+
+s->chr_sec_in = qemu_chr_find(s->sec_indev);
+if (s->chr_sec_in == NULL) {
+error_setg(errp, "Secondary IN Device '%s' not found",
+   s->sec_indev);
+return;
+}
+
+s->chr_out = qemu_chr_find(s->outdev);
+if (s->chr_out == NULL) {
+error_setg(errp, "OUT Device '%s' not found", s->outdev);
+return;
+}
+
+qemu_chr_fe_claim_no_fail(s->chr_pri_in);
+qemu_chr_add_handlers(s->chr_pri_in, compare_chr_can_read,
+  compare_pri_chr_in, NULL, s);
+
+qemu_chr_fe_claim_no_fail(s->chr_sec_in);
+qemu_chr_add_handlers(s->chr_sec_in, compare_chr_can_read,
+  compare_sec_chr_in, NULL, s);
+
Btw, what's the reason of handling this in main loop? I thought it would
be better to do this in colo thread? Otherwise, you need lots of extra
synchronizations?

Do you mean we should start/stop/do checkpoint it by colo-frame?

I mean we probably want to handle pri_in and sec_in in colo compare
thread. Through this way, there's no need for extra synchronization with
main loop.

I get your point, but how to do this.
Now, we use qemu_chr_add_handlers to do this job.

You probably want to start a new main loop in colo comparing thread.


IIUC, do you mean
- remove char device read_handler

  ↓at colo comparing thread↓
while (true) {
- blocking read packet from char device with select(2)/poll(2)...
- compare packet
}

Yes, something like this.
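
A rough stand-alone sketch of the "blocking read with poll(2)" step from the loop above, on a plain file descriptor. It assumes a descriptor is available at all, which is the open question in this thread, and that packets are framed with a 4-byte big-endian length as in net/socket.c. read_full_sketch() and read_packet_sketch() are hypothetical helper names; nothing here is chardev API.

```c
#include <errno.h>
#include <poll.h>
#include <stdint.h>
#include <unistd.h>

/* Block until len bytes are read from fd; 0 on success, -1 on EOF/error. */
static int read_full_sketch(int fd, uint8_t *buf, size_t len)
{
    size_t done = 0;

    while (done < len) {
        struct pollfd pfd = { .fd = fd, .events = POLLIN };
        ssize_t n;

        if (poll(&pfd, 1, -1) < 0) {
            if (errno == EINTR) {
                continue;
            }
            return -1;
        }
        n = read(fd, buf + done, len - done);
        if (n < 0) {
            if (errno == EINTR || errno == EAGAIN) {
                continue;
            }
            return -1;
        }
        if (n == 0) {
            return -1;  /* peer closed the connection */
        }
        done += (size_t)n;
    }
    return 0;
}

/*
 * Read one packet framed as a 4-byte big-endian length plus payload.
 * Returns the payload length, or -1 on error, EOF or oversized packet.
 */
static long read_packet_sketch(int fd, uint8_t *buf, size_t cap)
{
    uint8_t hdr[4];
    uint32_t len;

    if (read_full_sketch(fd, hdr, sizeof(hdr)) < 0) {
        return -1;
    }
    len = ((uint32_t)hdr[0] << 24) | ((uint32_t)hdr[1] << 16) |
          ((uint32_t)hdr[2] << 8) | (uint32_t)hdr[3];
    if (len > cap) {
        return -1;
    }
    if (read_full_sketch(fd, buf, len) < 0) {
        return -1;
    }
    return (long)len;
}
```

The comparing thread would call read_packet_sketch() in its loop and compare each returned packet; handling two inputs (primary and secondary) would need both descriptors in the same poll() call.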



But if I remove qemu_chr_add_handlers, I can't get an fd to select/poll.

How do I get the fd from all kinds of chardev?



Hi~ jason.

If we use a chardev socket, the fd is saved in QIOChannelSocket,

and if we use a chardev file, the fd is saved in QIOChannelFile.

Is there any common method to get the fd?


I'm not sure I get the question. But you probably can call 
qemu_chr_add_handlers() in colo comparing thread to solve this I think?




I have tested calling qemu_chr_add_handlers() in the colo comparing thread,
but when data comes, the handler always runs in the main loop.

Thanks
Zhang Chen 


Cc Amit for the help.

Amit, we want to poll and handle a chardev in a thread other than the
main loop. But it looks like qemu_chr_add_handlers() only works for the
default context rather than the thread-default context. Is there any other
solution for this?


Thanks



[Qemu-devel] [PATCH v3 0/4] QOM'ify hw/audio files

2016-05-12 Thread xiaoqiang zhao
This patch set QOM'ifies some files under the hw/audio directory.
See each patch's commit message for details.

Changes in v3: 
* fix code style errors
* refine error_setg message format 

Changes in v2: 
Move AUD_open_in/out function into device realize stage

xiaoqiang zhao (4):
  hw/audio: QOM'ify cs4231.c
  hw/audio: QOM cleanup for intel-hda
  hw/audio: QOM'ify intel-hda
  hw/audio: QOM'ify milkymist-ac97.c

 hw/audio/cs4231.c | 12 +---
 hw/audio/intel-hda.c  | 32 ++--
 hw/audio/milkymist-ac97.c | 26 +++---
 3 files changed, 38 insertions(+), 32 deletions(-)

-- 
2.1.4





[Qemu-devel] [PATCH v3 4/4] hw/char: QOM'ify lm32_uart.c

2016-05-12 Thread xiaoqiang zhao
* Drop the old SysBus init function and use instance_init
* Call qemu_chr_add_handlers in the realize callback
* Use qdev chardev prop instead of qemu_char_get_next_serial

Signed-off-by: xiaoqiang zhao 
---
 hw/char/lm32_uart.c | 28 +---
 1 file changed, 17 insertions(+), 11 deletions(-)

diff --git a/hw/char/lm32_uart.c b/hw/char/lm32_uart.c
index 036813d..bcf851b 100644
--- a/hw/char/lm32_uart.c
+++ b/hw/char/lm32_uart.c
@@ -249,23 +249,25 @@ static void uart_reset(DeviceState *d)
 s->regs[R_LSR] = LSR_THRE | LSR_TEMT;
 }
 
-static int lm32_uart_init(SysBusDevice *dev)
+static void lm32_uart_init(Object *obj)
 {
-LM32UartState *s = LM32_UART(dev);
+LM32UartState *s = LM32_UART(obj);
+SysBusDevice *dev = SYS_BUS_DEVICE(obj);
 
 sysbus_init_irq(dev, &s->irq);
 
-memory_region_init_io(&s->iomem, OBJECT(s), &uart_ops, s,
+memory_region_init_io(&s->iomem, obj, &uart_ops, s,
   "uart", R_MAX * 4);
 sysbus_init_mmio(dev, &s->iomem);
+}
+
+static void lm32_uart_realize(DeviceState *dev, Error **errp)
+{
+LM32UartState *s = LM32_UART(dev);
 
-/* FIXME use a qdev chardev prop instead of qemu_char_get_next_serial() */
-s->chr = qemu_char_get_next_serial();
 if (s->chr) {
 qemu_chr_add_handlers(s->chr, uart_can_rx, uart_rx, uart_event, s);
 }
-
-return 0;
 }
 
 static const VMStateDescription vmstate_lm32_uart = {
@@ -278,22 +280,26 @@ static const VMStateDescription vmstate_lm32_uart = {
 }
 };
 
+static Property lm32_uart_properties[] = {
+DEFINE_PROP_CHR("lm32_uart", LM32UartState, chr),
+DEFINE_PROP_END_OF_LIST(),
+};
+
 static void lm32_uart_class_init(ObjectClass *klass, void *data)
 {
 DeviceClass *dc = DEVICE_CLASS(klass);
-SysBusDeviceClass *k = SYS_BUS_DEVICE_CLASS(klass);
 
-k->init = lm32_uart_init;
 dc->reset = uart_reset;
 dc->vmsd = &vmstate_lm32_uart;
-/* Reason: init() method uses qemu_char_get_next_serial() */
-dc->cannot_instantiate_with_device_add_yet = true;
+dc->props = lm32_uart_properties;
+dc->realize = lm32_uart_realize;
 }
 
 static const TypeInfo lm32_uart_info = {
 .name  = TYPE_LM32_UART,
 .parent= TYPE_SYS_BUS_DEVICE,
 .instance_size = sizeof(LM32UartState),
+.instance_init = lm32_uart_init,
 .class_init= lm32_uart_class_init,
 };
 
-- 
2.1.4





[Qemu-devel] [PATCH v3 0/4] QOM'ify hw/char devices

2016-05-12 Thread xiaoqiang zhao
This patch set tries to QOM'ify hw/char files; see the commit messages
for more details.

Note: patches
  hw/char: QOM'ify sclpconsole-lm.c
  hw/char: QOM'ify sclpconsole.c
of v2 have been taken by Cornelia Huck.

Thanks, Paolo, for your suggestions.

Changes in v3:
* use chardev property instead of qemu_char_get_next_serial
* call the functions that touch globals in the realize callback

Changes in v2:
* rename TYPE_SCLP_LM_CONSOLE to TYPE_SCLPLM_CONSOLE which is suggested by 
  Cornelia Huck 
* rebase on the current master

xiaoqiang zhao (4):
  hw/char: QOM'ify escc.c
  hw/char: QOM'ify etraxfs_ser.c
  hw/char: QOM'ify lm32_juart.c
  hw/char: QOM'ify lm32_uart.c

 hw/char/escc.c| 30 +++---
 hw/char/etraxfs_ser.c | 27 +--
 hw/char/lm32_juart.c  | 17 -
 hw/char/lm32_uart.c   | 28 +---
 4 files changed, 61 insertions(+), 41 deletions(-)

-- 
2.1.4





[Qemu-devel] [PATCH v3 3/4] hw/char: QOM'ify lm32_juart.c

2016-05-12 Thread xiaoqiang zhao
* Drop the old SysBus init function
* Call qemu_chr_add_handlers in the realize callback
* Use qdev chardev prop instead of qemu_char_get_next_serial

Signed-off-by: xiaoqiang zhao 
---
 hw/char/lm32_juart.c | 17 -
 1 file changed, 8 insertions(+), 9 deletions(-)

diff --git a/hw/char/lm32_juart.c b/hw/char/lm32_juart.c
index 5bf8acf..1cc5ccf 100644
--- a/hw/char/lm32_juart.c
+++ b/hw/char/lm32_juart.c
@@ -114,17 +114,13 @@ static void juart_reset(DeviceState *d)
 s->jrx = 0;
 }
 
-static int lm32_juart_init(SysBusDevice *dev)
+static void lm32_juart_realize(DeviceState *dev, Error **errp)
 {
 LM32JuartState *s = LM32_JUART(dev);
 
-/* FIXME use a qdev chardev prop instead of qemu_char_get_next_serial() */
-s->chr = qemu_char_get_next_serial();
 if (s->chr) {
 qemu_chr_add_handlers(s->chr, juart_can_rx, juart_rx, juart_event, s);
 }
-
-return 0;
 }
 
 static const VMStateDescription vmstate_lm32_juart = {
@@ -138,16 +134,19 @@ static const VMStateDescription vmstate_lm32_juart = {
 }
 };
 
+static Property lm32_juart_properties[] = {
+DEFINE_PROP_CHR("lm32_juart", LM32JuartState, chr),
+DEFINE_PROP_END_OF_LIST(),
+};
+
 static void lm32_juart_class_init(ObjectClass *klass, void *data)
 {
 DeviceClass *dc = DEVICE_CLASS(klass);
-SysBusDeviceClass *k = SYS_BUS_DEVICE_CLASS(klass);
 
-k->init = lm32_juart_init;
 dc->reset = juart_reset;
 dc->vmsd = &vmstate_lm32_juart;
-/* Reason: init() method uses qemu_char_get_next_serial() */
-dc->cannot_instantiate_with_device_add_yet = true;
+dc->props = lm32_juart_properties;
+dc->realize = lm32_juart_realize;
 }
 
 static const TypeInfo lm32_juart_info = {
-- 
2.1.4





[Qemu-devel] [PATCH v3 1/4] hw/char: QOM'ify escc.c

2016-05-12 Thread xiaoqiang zhao
* Drop the old SysBus init function and use instance_init
* Call qemu_chr_add_handlers in the realize callback

Signed-off-by: xiaoqiang zhao 
---
 hw/char/escc.c | 30 +++---
 1 file changed, 19 insertions(+), 11 deletions(-)

diff --git a/hw/char/escc.c b/hw/char/escc.c
index 7bf09a0..8e6a7df 100644
--- a/hw/char/escc.c
+++ b/hw/char/escc.c
@@ -983,9 +983,10 @@ void slavio_serial_ms_kbd_init(hwaddr base, qemu_irq irq,
 sysbus_mmio_map(s, 0, base);
 }
 
-static int escc_init1(SysBusDevice *dev)
+static void escc_init1(Object *obj)
 {
-ESCCState *s = ESCC(dev);
+ESCCState *s = ESCC(obj);
+SysBusDevice *dev = SYS_BUS_DEVICE(obj);
 unsigned int i;
 
 s->chn[0].disabled = s->disabled;
@@ -994,17 +995,26 @@ static int escc_init1(SysBusDevice *dev)
 sysbus_init_irq(dev, &s->chn[i].irq);
 s->chn[i].chn = 1 - i;
 s->chn[i].clock = s->frequency / 2;
-if (s->chn[i].chr) {
-qemu_chr_add_handlers(s->chn[i].chr, serial_can_receive,
-  serial_receive1, serial_event, &s->chn[i]);
-}
 }
 s->chn[0].otherchn = &s->chn[1];
 s->chn[1].otherchn = &s->chn[0];
 
-memory_region_init_io(&s->mmio, OBJECT(s), &escc_mem_ops, s, "escc",
+memory_region_init_io(&s->mmio, obj, &escc_mem_ops, s, "escc",
   ESCC_SIZE << s->it_shift);
 sysbus_init_mmio(dev, &s->mmio);
+}
+
+static void escc_realize(DeviceState *dev, Error **errp)
+{
+ESCCState *s = ESCC(dev);
+unsigned int i;
+
+for (i = 0; i < 2; i++) {
+if (s->chn[i].chr) {
+qemu_chr_add_handlers(s->chn[i].chr, serial_can_receive,
+  serial_receive1, serial_event, &s->chn[i]);
+}
+}
 
 if (s->chn[0].type == mouse) {
 qemu_add_mouse_event_handler(sunmouse_event, &s->chn[0], 0,
@@ -1014,8 +1024,6 @@ static int escc_init1(SysBusDevice *dev)
 s->chn[1].hs = qemu_input_handler_register((DeviceState *)(&s->chn[1]),
 &sunkbd_handler);
 }
-
-return 0;
 }
 
 static Property escc_properties[] = {
@@ -1032,10 +1040,9 @@ static Property escc_properties[] = {
 static void escc_class_init(ObjectClass *klass, void *data)
 {
 DeviceClass *dc = DEVICE_CLASS(klass);
-SysBusDeviceClass *k = SYS_BUS_DEVICE_CLASS(klass);
 
-k->init = escc_init1;
 dc->reset = escc_reset;
+dc->realize = escc_realize;
 dc->vmsd = &vmstate_escc;
 dc->props = escc_properties;
 set_bit(DEVICE_CATEGORY_INPUT, dc->categories);
@@ -1045,6 +1052,7 @@ static const TypeInfo escc_info = {
 .name  = TYPE_ESCC,
 .parent= TYPE_SYS_BUS_DEVICE,
 .instance_size = sizeof(ESCCState),
+.instance_init = escc_init1,
 .class_init= escc_class_init,
 };
 
-- 
2.1.4





[Qemu-devel] [PATCH v3 2/4] hw/char: QOM'ify etraxfs_ser.c

2016-05-12 Thread xiaoqiang zhao
* Drop the old SysBus init function and use instance_init
* Call qemu_chr_add_handlers in the realize callback
* Use qdev chardev prop instead of qemu_char_get_next_serial
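
The split described by the bullets above can be illustrated with a small
standalone model. This is only a sketch: ToyBackend, ToySerial and the
toy_* functions are made-up stand-ins, not QEMU APIs. It assumes the
creator sets the backend pointer (the "property") between the two phases.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a character backend; not a QEMU type. */
typedef struct ToyBackend {
    void (*rx)(void *opaque, int byte);   /* installed receive handler */
    void *opaque;
} ToyBackend;

/* Hypothetical stand-in for the device state. */
typedef struct ToySerial {
    ToyBackend *chr;   /* "property": set by the creator before realize */
    int irq_ready;     /* models the IRQ/MMIO setup done at init time */
    int last_byte;
} ToySerial;

static void toy_serial_rx(void *opaque, int byte)
{
    ToySerial *s = opaque;
    s->last_byte = byte;
}

/* Phase 1 (instance_init): only touches the object itself, cannot fail. */
static void toy_serial_instance_init(ToySerial *s)
{
    s->chr = NULL;
    s->irq_ready = 1;
    s->last_byte = -1;
}

/* Phase 2 (realize): runs after properties are set, so it is the place
 * to hook up whatever backend the creator chose. */
static void toy_serial_realize(ToySerial *s)
{
    assert(s->irq_ready);   /* instance_init must have run first */
    if (s->chr) {
        s->chr->rx = toy_serial_rx;
        s->chr->opaque = s;
    }
}
```

In this model nothing outside the object is touched until realize, which
is why the device no longer needs to grab "the next serial" on its own.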

Signed-off-by: xiaoqiang zhao 
---
 hw/char/etraxfs_ser.c | 27 +--
 1 file changed, 17 insertions(+), 10 deletions(-)

diff --git a/hw/char/etraxfs_ser.c b/hw/char/etraxfs_ser.c
index 146b387..6957c68 100644
--- a/hw/char/etraxfs_ser.c
+++ b/hw/char/etraxfs_ser.c
@@ -159,6 +159,11 @@ static const MemoryRegionOps ser_ops = {
 }
 };
 
+static Property etraxfs_ser_properties[] = {
+DEFINE_PROP_CHR("etraxfs-serial", ETRAXSerial, chr),
+DEFINE_PROP_END_OF_LIST(),
+};
+
 static void serial_receive(void *opaque, const uint8_t *buf, int size)
 {
 ETRAXSerial *s = opaque;
@@ -209,40 +214,42 @@ static void etraxfs_ser_reset(DeviceState *d)
 
 }
 
-static int etraxfs_ser_init(SysBusDevice *dev)
+static void etraxfs_ser_init(Object *obj)
 {
-ETRAXSerial *s = ETRAX_SERIAL(dev);
+ETRAXSerial *s = ETRAX_SERIAL(obj);
+SysBusDevice *dev = SYS_BUS_DEVICE(obj);
 
 sysbus_init_irq(dev, &s->irq);
-memory_region_init_io(&s->mmio, OBJECT(s), &ser_ops, s,
+memory_region_init_io(&s->mmio, obj, &ser_ops, s,
   "etraxfs-serial", R_MAX * 4);
 sysbus_init_mmio(dev, &s->mmio);
+}
+
+static void etraxfs_ser_realize(DeviceState *dev, Error **errp)
+{
+ETRAXSerial *s = ETRAX_SERIAL(dev);
 
-/* FIXME use a qdev chardev prop instead of qemu_char_get_next_serial() */
-s->chr = qemu_char_get_next_serial();
 if (s->chr) {
 qemu_chr_add_handlers(s->chr,
   serial_can_receive, serial_receive,
   serial_event, s);
 }
-return 0;
 }
 
 static void etraxfs_ser_class_init(ObjectClass *klass, void *data)
 {
 DeviceClass *dc = DEVICE_CLASS(klass);
-SysBusDeviceClass *k = SYS_BUS_DEVICE_CLASS(klass);
 
-k->init = etraxfs_ser_init;
 dc->reset = etraxfs_ser_reset;
-/* Reason: init() method uses qemu_char_get_next_serial() */
-dc->cannot_instantiate_with_device_add_yet = true;
+dc->props = etraxfs_ser_properties;
+dc->realize = etraxfs_ser_realize;
 }
 
 static const TypeInfo etraxfs_ser_info = {
 .name  = TYPE_ETRAX_FS_SERIAL,
 .parent= TYPE_SYS_BUS_DEVICE,
 .instance_size = sizeof(ETRAXSerial),
+.instance_init = etraxfs_ser_init,
 .class_init= etraxfs_ser_class_init,
 };
 
-- 
2.1.4





Re: [Qemu-devel] [RFC PATCH v3 3/3] VFIO Type1 IOMMU change: to support with iommu and without iommu

2016-05-12 Thread Tian, Kevin
> From: Neo Jia [mailto:c...@nvidia.com]
> Sent: Friday, May 13, 2016 3:49 AM
> 
> >
> > > Perhaps one possibility would be to allow the vgpu driver to register
> > > map and unmap callbacks.  The unmap callback might provide the
> > > invalidation interface that we're so far missing.  The combination of
> > > map and unmap callbacks might simplify the Intel approach of pinning the
> > > entire VM memory space, ie. for each map callback do a translation
> > > (pin) and dma_map_page, for each unmap do a dma_unmap_page and release
> > > the translation.
> >
> > Yes, adding map/unmap ops to the pGPU driver (I assume you are referring to
> > gpu_device_ops as implemented in Kirti's patch) sounds like a good idea,
> > satisfying both: 1) keeping the vGPU purely virtual; 2) dealing with the
> > Linux DMA API to achieve hardware IOMMU compatibility.
> >
> > PS: this has very little to do with pinning wholly or partially. Intel
> > KVMGT once had the whole guest memory pinned, only because we used a
> > spinlock, which can't sleep at runtime. We have removed that spinlock in
> > another upstreaming effort of ours, not here but for the i915 driver, so
> > probably no biggie.
> >
> 
> OK, then you guys don't need to pin everything. The next question is whether
> you can send the pinning request from your mediated driver backend to request
> memory pinning, as we have demonstrated in the v3 patch with the functions
> vfio_pin_pages and vfio_unpin_pages?
> 

Jike, can you confirm this statement? My feeling is that we don't have such logic
in our device model to figure out which pages need to be pinned on demand. So
currently pin-everything is the same requirement on both the KVM and Xen sides...

Thanks
Kevin



[Qemu-devel] [PULL 11/39] tcg/mips: Make direct jump patching thread-safe

2016-05-12 Thread Richard Henderson
From: Sergey Fedorov 

Ensure direct jump patching in MIPS is atomic by using
atomic_read()/atomic_set() for code patching.
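
As a rough standalone illustration of the idea: build the complete J
instruction word from a constant and publish it with a single store,
instead of reading the old word and modifying it. The sketch below uses
C11 atomics as a stand-in for QEMU's atomic_set(); the sketch_* names are
made up for this example, and the instruction cache flush is left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* MIPS J opcode (000010) in the top six bits of the instruction word. */
#define SKETCH_OPC_J (UINT32_C(0x02) << 26)

/* Replace 'length' bits of 'value' starting at bit 'start' with 'field'.
 * Assumes 1 <= length <= 32 and start + length <= 32. */
static uint32_t sketch_deposit32(uint32_t value, int start, int length,
                                 uint32_t field)
{
    uint32_t mask = (~UINT32_C(0) >> (32 - length)) << start;
    return (value & ~mask) | ((field << start) & mask);
}

/* Patch the jump with one aligned 32-bit store, so a concurrent reader
 * sees either the old instruction or the new one, never a mix. */
static void sketch_set_jmp_target(_Atomic uint32_t *insn, uintptr_t addr)
{
    uint32_t word;

    assert((addr & 3) == 0);   /* MIPS instructions are 4-byte aligned */
    word = sketch_deposit32(SKETCH_OPC_J, 0, 26, (uint32_t)(addr >> 2));
    atomic_store_explicit(insn, word, memory_order_relaxed);
}
```

For a target of 0x00400010 the stored word is 0x08100004: the opcode in
bits 31..26 and the target address shifted right by two in bits 25..0.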

Signed-off-by: Sergey Fedorov 
Signed-off-by: Sergey Fedorov 
Message-Id: <1461341333-19646-11-git-send-email-sergey.fedo...@linaro.org>
Signed-off-by: Richard Henderson 
[rth: Merged the deposit32 followup.]
[rth: Merged the following followup.]
Message-Id: <1462210518-26522-1-git-send-email-sergey.fedo...@linaro.org>
---
 tcg/mips/tcg-target.inc.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/tcg/mips/tcg-target.inc.c b/tcg/mips/tcg-target.inc.c
index aaf881c..1e5a6b4 100644
--- a/tcg/mips/tcg-target.inc.c
+++ b/tcg/mips/tcg-target.inc.c
@@ -1885,7 +1885,6 @@ static void tcg_target_init(TCGContext *s)
 
 void tb_set_jmp_target1(uintptr_t jmp_addr, uintptr_t addr)
 {
-uint32_t *ptr = (uint32_t *)jmp_addr;
-*ptr = deposit32(*ptr, 0, 26, addr >> 2);
+atomic_set((uint32_t *)jmp_addr, deposit32(OPC_J, 0, 26, addr >> 2));
 flush_icache_range(jmp_addr, jmp_addr + 4);
 }
-- 
2.5.5




[Qemu-devel] [PULL 33/39] cpu-exec: Remove relic orphaned comment

2016-05-12 Thread Richard Henderson
From: Sergey Fedorov 

This comment should have been deleted by commit 0ac087f1f3ae ("removed
unused code") but somehow it is still here. There's no point in keeping it.

Signed-off-by: Sergey Fedorov 
Signed-off-by: Sergey Fedorov 
Message-Id: <1462286050-21778-1-git-send-email-sergey.fedo...@linaro.org>
Signed-off-by: Richard Henderson 
---
 cpu-exec.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/cpu-exec.c b/cpu-exec.c
index d43d5ae..d55faa5 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -582,8 +582,6 @@ int cpu_exec(CPUState *cpu)
 /* Try to align the host and virtual clocks
if the guest is in advance */
 align_clocks(&sc, cpu);
-/* reset soft MMU for next block (it can currently
-   only be set by a memory fault) */
 } /* for(;;) */
 } else {
 #if defined(__clang__) || !QEMU_GNUC_PREREQ(4, 6)
-- 
2.5.5




[Qemu-devel] [PULL 23/39] tcg: Clean up tb_jmp_unlink()

2016-05-12 Thread Richard Henderson
From: Sergey Fedorov 

Unify the code of this function with tb_jmp_remove_from_list(). Making
these functions similar improves their readability. Also this could be a
step towards making this function thread-safe.
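
To make the reworked loop easier to follow, here is a standalone model
of the list walk. It is a sketch only: DemoTB and the demo_* helpers are
hypothetical, and a plain flag stands in for tb_reset_jump(). It assumes
the tagging scheme where the low two bits of each link select the next
field and the value 2 marks the link back to the list owner.

```c
#include <assert.h>
#include <stdint.h>

typedef struct DemoTB {
    uintptr_t jmp_list_next[2];
    uintptr_t jmp_list_first;
    int jump_reset[2];          /* stands in for tb_reset_jump() */
} DemoTB;

static void demo_tb_init(DemoTB *tb)
{
    assert(((uintptr_t)tb & 3) == 0);
    tb->jmp_list_next[0] = tb->jmp_list_next[1] = 0;
    tb->jmp_list_first = (uintptr_t)tb | 2;   /* empty list: points back */
    tb->jump_reset[0] = tb->jump_reset[1] = 0;
}

/* Record that jump n of 'tb' goes to 'tb_next'. */
static void demo_add_jump(DemoTB *tb, unsigned n, DemoTB *tb_next)
{
    tb->jmp_list_next[n] = tb_next->jmp_list_first;
    tb_next->jmp_list_first = (uintptr_t)tb | n;
}

/* Remove every jump to 'tb', popping entries from the head of the list. */
static void demo_jmp_unlink(DemoTB *tb)
{
    uintptr_t *ptb = &tb->jmp_list_first;

    for (;;) {
        uintptr_t ntb = *ptb;
        unsigned n1 = ntb & 3;
        DemoTB *tb1 = (DemoTB *)(ntb & ~(uintptr_t)3);

        if (n1 == 2) {
            break;              /* back at the owner: list is empty */
        }
        tb1->jump_reset[n1] = 1;
        *ptb = tb1->jmp_list_next[n1];
        tb1->jmp_list_next[n1] = 0;
    }
}
```

Because each iteration rewrites the head in place, the head already holds
the owner's own address tagged with 2 when the loop ends, which would
explain why no separate "fail safe" assignment is needed afterwards.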

Signed-off-by: Sergey Fedorov 
Signed-off-by: Sergey Fedorov 
Reviewed-by: Alex Bennée 
Signed-off-by: Richard Henderson 
---
 translate-all.c | 21 +
 1 file changed, 9 insertions(+), 12 deletions(-)

diff --git a/translate-all.c b/translate-all.c
index 9a57aab..d679ad1 100644
--- a/translate-all.c
+++ b/translate-all.c
@@ -965,25 +965,22 @@ static inline void tb_reset_jump(TranslationBlock *tb, int n)
 /* remove any jumps to the TB */
 static inline void tb_jmp_unlink(TranslationBlock *tb)
 {
-uintptr_t tb1, tb2;
+TranslationBlock *tb1;
+uintptr_t *ptb, ntb;
 unsigned int n1;
 
-tb1 = tb->jmp_list_first;
+ptb = &tb->jmp_list_first;
 for (;;) {
-TranslationBlock *tmp_tb;
-n1 = tb1 & 3;
+ntb = *ptb;
+n1 = ntb & 3;
+tb1 = (TranslationBlock *)(ntb & ~3);
 if (n1 == 2) {
 break;
 }
-tmp_tb = (TranslationBlock *)(tb1 & ~3);
-tb2 = tmp_tb->jmp_list_next[n1];
-tb_reset_jump(tmp_tb, n1);
-tmp_tb->jmp_list_next[n1] = (uintptr_t)NULL;
-tb1 = tb2;
+tb_reset_jump(tb1, n1);
+*ptb = tb1->jmp_list_next[n1];
+tb1->jmp_list_next[n1] = (uintptr_t)NULL;
 }
-
-assert(((uintptr_t)tb & 3) == 0);
-tb->jmp_list_first = (uintptr_t)tb | 2; /* fail safe */
 }
 
 /* invalidate one TB */
-- 
2.5.5




[Qemu-devel] [PULL 17/39] tcg: Use uintptr_t type for jmp_list_{next|first} fields of TB

2016-05-12 Thread Richard Henderson
From: Sergey Fedorov 

These fields do not contain pure pointers to a TranslationBlock
structure. So uintptr_t is the most appropriate type for them.
Also add some asserts to ensure that the two least significant bits of
the pointer are always zero before assigning it to jmp_list_first.
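
A minimal standalone sketch of the encoding, for readers unfamiliar with
tagged pointers. SketchTB and the sketch_* helpers are illustrative names
only; the real fields live in struct TranslationBlock.

```c
#include <assert.h>
#include <stdint.h>

typedef struct SketchTB {
    uintptr_t jmp_list_next[2];
    uintptr_t jmp_list_first;
} SketchTB;

/* Pack a TB pointer and a 2-bit selector (0, 1 or 2) into one word.
 * This only works if the pointer's two low bits are zero, hence the
 * alignment assert. */
static uintptr_t sketch_tag(SketchTB *tb, unsigned n)
{
    assert(((uintptr_t)tb & 3) == 0);
    assert(n <= 2);
    return (uintptr_t)tb | n;
}

/* Recover the pointer part. */
static SketchTB *sketch_ptr(uintptr_t v)
{
    return (SketchTB *)(v & ~(uintptr_t)3);
}

/* Recover the selector part. */
static unsigned sketch_sel(uintptr_t v)
{
    return (unsigned)(v & 3);
}
```

Storing the value as uintptr_t rather than as a pointer type makes it
explicit that it must be decoded before it can be dereferenced.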

Signed-off-by: Sergey Fedorov 
Signed-off-by: Sergey Fedorov 
Reviewed-by: Alex Bennée 
Signed-off-by: Richard Henderson 
---
 include/exec/exec-all.h | 12 +++-
 translate-all.c | 38 --
 2 files changed, 27 insertions(+), 23 deletions(-)

diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index 445d946..64c2a66 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -277,14 +277,16 @@ struct TranslationBlock {
  * jmp_list_first points to the first TB jumping to this one.
  * jmp_list_next is used to point to the next TB in a list.
  * Since each TB can have two jumps, it can participate in two lists.
- * The two least significant bits of a pointer are used to choose which
- * data field holds a pointer to the next TB:
+ * jmp_list_first and jmp_list_next are 4-byte aligned pointers to a
+ * TranslationBlock structure, but the two least significant bits of
+ * them are used to encode which data field of the pointed TB should
+ * be used to traverse the list further from that TB:
  * 0 => jmp_list_next[0], 1 => jmp_list_next[1], 2 => jmp_list_first.
  * In other words, 0/1 tells which jump is used in the pointed TB,
  * and 2 means that this is a pointer back to the target TB of this list.
  */
-struct TranslationBlock *jmp_list_next[2];
-struct TranslationBlock *jmp_list_first;
+uintptr_t jmp_list_next[2];
+uintptr_t jmp_list_first;
 };
 
 #include "qemu/thread.h"
@@ -382,7 +384,7 @@ static inline void tb_add_jump(TranslationBlock *tb, int n,
 
 /* add in TB jmp circular list */
 tb->jmp_list_next[n] = tb_next->jmp_list_first;
-tb_next->jmp_list_first = (TranslationBlock *)((uintptr_t)tb | n);
+tb_next->jmp_list_first = (uintptr_t)tb | n;
 }
 }
 
diff --git a/translate-all.c b/translate-all.c
index c6613d1..2fb1646 100644
--- a/translate-all.c
+++ b/translate-all.c
@@ -928,17 +928,17 @@ static inline void tb_page_remove(TranslationBlock **ptb, TranslationBlock *tb)
 
 static inline void tb_jmp_remove(TranslationBlock *tb, int n)
 {
-TranslationBlock *tb1, **ptb;
+TranslationBlock *tb1;
+uintptr_t *ptb, ntb;
 unsigned int n1;
 
 ptb = &tb->jmp_list_next[n];
-tb1 = *ptb;
-if (tb1) {
+if (*ptb) {
 /* find tb(n) in circular list */
 for (;;) {
-tb1 = *ptb;
-n1 = (uintptr_t)tb1 & 3;
-tb1 = (TranslationBlock *)((uintptr_t)tb1 & ~3);
+ntb = *ptb;
+n1 = ntb & 3;
+tb1 = (TranslationBlock *)(ntb & ~3);
 if (n1 == n && tb1 == tb) {
 break;
 }
@@ -951,7 +951,7 @@ static inline void tb_jmp_remove(TranslationBlock *tb, int n)
 /* now we can suppress tb(n) from the list */
 *ptb = tb->jmp_list_next[n];
 
-tb->jmp_list_next[n] = NULL;
+tb->jmp_list_next[n] = (uintptr_t)NULL;
 }
 }
 
@@ -970,7 +970,7 @@ void tb_phys_invalidate(TranslationBlock *tb, tb_page_addr_t page_addr)
 PageDesc *p;
 unsigned int h, n1;
 tb_page_addr_t phys_pc;
-TranslationBlock *tb1, *tb2;
+uintptr_t tb1, tb2;
 
 /* remove the TB from the hash list */
 phys_pc = tb->page_addr[0] + (tb->pc & ~TARGET_PAGE_MASK);
@@ -1006,19 +1006,20 @@ void tb_phys_invalidate(TranslationBlock *tb, tb_page_addr_t page_addr)
 /* suppress any remaining jumps to this TB */
 tb1 = tb->jmp_list_first;
 for (;;) {
-n1 = (uintptr_t)tb1 & 3;
+TranslationBlock *tmp_tb;
+n1 = tb1 & 3;
 if (n1 == 2) {
 break;
 }
-tb1 = (TranslationBlock *)((uintptr_t)tb1 & ~3);
-tb2 = tb1->jmp_list_next[n1];
-tb_reset_jump(tb1, n1);
-tb1->jmp_list_next[n1] = NULL;
+tmp_tb = (TranslationBlock *)(tb1 & ~3);
+tb2 = tmp_tb->jmp_list_next[n1];
+tb_reset_jump(tmp_tb, n1);
+tmp_tb->jmp_list_next[n1] = (uintptr_t)NULL;
 tb1 = tb2;
 }
 
-/* fail safe */
-tb->jmp_list_first = (TranslationBlock *)((uintptr_t)tb | 2);
+assert(((uintptr_t)tb & 3) == 0);
+tb->jmp_list_first = (uintptr_t)tb | 2; /* fail safe */
 
 tcg_ctx.tb_ctx.tb_phys_invalidate_count++;
 }
@@ -1492,9 +1493,10 @@ static void tb_link_page(TranslationBlock *tb, tb_page_addr_t phys_pc,
 tb->page_addr[1] = -1;
 }
 
-tb->jmp_list_first = (TranslationBlock *)((uintptr_t)tb | 2);
-tb->jmp_list_next[0] = NULL;
-tb->jmp_list_next[1] = NULL;
+assert(((uintptr_t)tb & 3) == 

[Qemu-devel] [PULL 24/39] tcg: Clean up direct block chaining safety checks

2016-05-12 Thread Richard Henderson
From: Sergey Fedorov 

We don't take care of direct jumps when the address mapping changes. Thus we
must be sure to generate direct jumps so that they always remain valid
even if the address mapping changes. Luckily, we only allow a TB to be
executed if it was generated from pages which match the current mapping.

Document tcg_gen_goto_tb() declaration and note the reason for
destination PC limitations.

Some targets with variable length instructions allow TB to straddle a
page boundary. However, we make sure that both of TB pages match the
current address mapping when looking up TBs. So it is safe to do direct
jumps into both pages. Correct the checks for some of those targets.

Given that, we can safely patch a TB which spans two pages. Remove the
unnecessary check in cpu_exec() and allow such TBs to be patched.
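
The per-target checks touched by this patch all follow the same shape,
which can be sketched as a standalone predicate. The names and the 4 KiB
page size below are assumptions made for the example; each target uses
its own TARGET_PAGE_MASK and its own way of finding the address of the
last instruction in the TB.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_BITS 12   /* assumed 4 KiB pages */
#define SKETCH_PAGE_MASK (~((UINT64_C(1) << SKETCH_PAGE_BITS) - 1))

/* A direct jump is allowed if the destination lies in a page the TB
 * itself occupies: the page of its first instruction, or the page of
 * its last instruction when the TB straddles a page boundary. */
static bool sketch_can_chain(uint64_t tb_start_pc, uint64_t last_insn_pc,
                             uint64_t dest)
{
    uint64_t dest_page = dest & SKETCH_PAGE_MASK;

    assert(tb_start_pc <= last_insn_pc);
    return (tb_start_pc & SKETCH_PAGE_MASK) == dest_page ||
           (last_insn_pc & SKETCH_PAGE_MASK) == dest_page;
}
```

With a TB that starts at 0x1ff8 and whose last instruction is at 0x2004,
destinations in pages 0x1000 and 0x2000 qualify, while 0x3000 does not.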

Signed-off-by: Sergey Fedorov 
Signed-off-by: Sergey Fedorov 
Reviewed-by: Alex Bennée 
Signed-off-by: Richard Henderson 
---
 cpu-exec.c   |  7 ++-
 target-arm/translate.c   |  3 ++-
 target-cris/translate.c  |  4 +++-
 target-i386/translate.c  |  2 +-
 target-m68k/translate.c  |  2 +-
 target-s390x/translate.c |  2 +-
 tcg/tcg-op.h | 10 ++
 7 files changed, 20 insertions(+), 10 deletions(-)

diff --git a/cpu-exec.c b/cpu-exec.c
index debc65c..f984dc7 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -508,11 +508,8 @@ int cpu_exec(CPUState *cpu)
 next_tb = 0;
 tcg_ctx.tb_ctx.tb_invalidated_flag = 0;
 }
-/* see if we can patch the calling TB. When the TB
-   spans two pages, we cannot safely do a direct
-   jump. */
-if (next_tb != 0 && tb->page_addr[1] == -1
-&& !qemu_loglevel_mask(CPU_LOG_TB_NOCHAIN)) {
+/* See if we can patch the calling TB. */
+if (next_tb != 0 && !qemu_loglevel_mask(CPU_LOG_TB_NOCHAIN)) {
 tb_add_jump((TranslationBlock *)(next_tb & ~TB_EXIT_MASK),
 next_tb & TB_EXIT_MASK, tb);
 }
diff --git a/target-arm/translate.c b/target-arm/translate.c
index 940ec8d..34196a8 100644
--- a/target-arm/translate.c
+++ b/target-arm/translate.c
@@ -4054,7 +4054,8 @@ static inline void gen_goto_tb(DisasContext *s, int n, target_ulong dest)
 TranslationBlock *tb;
 
 tb = s->tb;
-if ((tb->pc & TARGET_PAGE_MASK) == (dest & TARGET_PAGE_MASK)) {
+if ((tb->pc & TARGET_PAGE_MASK) == (dest & TARGET_PAGE_MASK) ||
+((s->pc - 1) & TARGET_PAGE_MASK) == (dest & TARGET_PAGE_MASK)) {
 tcg_gen_goto_tb(n);
 gen_set_pc_im(s, dest);
 tcg_gen_exit_tb((uintptr_t)tb + n);
diff --git a/target-cris/translate.c b/target-cris/translate.c
index a73176c..9c8ff8f 100644
--- a/target-cris/translate.c
+++ b/target-cris/translate.c
@@ -524,7 +524,9 @@ static void gen_goto_tb(DisasContext *dc, int n, target_ulong dest)
 {
 TranslationBlock *tb;
 tb = dc->tb;
-if ((tb->pc & TARGET_PAGE_MASK) == (dest & TARGET_PAGE_MASK)) {
+
+if ((tb->pc & TARGET_PAGE_MASK) == (dest & TARGET_PAGE_MASK) ||
+(dc->ppc & TARGET_PAGE_MASK) == (dest & TARGET_PAGE_MASK)) {
 tcg_gen_goto_tb(n);
 tcg_gen_movi_tl(env_pc, dest);
 tcg_gen_exit_tb((uintptr_t)tb + n);
diff --git a/target-i386/translate.c b/target-i386/translate.c
index 3a32f65..058d85a 100644
--- a/target-i386/translate.c
+++ b/target-i386/translate.c
@@ -2094,7 +2094,7 @@ static inline void gen_goto_tb(DisasContext *s, int tb_num, target_ulong eip)
 tb = s->tb;
 /* NOTE: we handle the case where the TB spans two pages here */
 if ((pc & TARGET_PAGE_MASK) == (tb->pc & TARGET_PAGE_MASK) ||
-(pc & TARGET_PAGE_MASK) == ((s->pc - 1) & TARGET_PAGE_MASK))  {
+(pc & TARGET_PAGE_MASK) == (s->pc_start & TARGET_PAGE_MASK))  {
 /* jump to same page: we can use a direct jump */
 tcg_gen_goto_tb(tb_num);
 gen_jmp_im(eip);
diff --git a/target-m68k/translate.c b/target-m68k/translate.c
index 7560c3a..e2ce6c6 100644
--- a/target-m68k/translate.c
+++ b/target-m68k/translate.c
@@ -861,7 +861,7 @@ static void gen_jmp_tb(DisasContext *s, int n, uint32_t dest)
 if (unlikely(s->singlestep_enabled)) {
 gen_exception(s, dest, EXCP_DEBUG);
 } else if ((tb->pc & TARGET_PAGE_MASK) == (dest & TARGET_PAGE_MASK) ||
-   (s->pc & TARGET_PAGE_MASK) == (dest & TARGET_PAGE_MASK)) {
+   (s->insn_pc & TARGET_PAGE_MASK) == (dest & TARGET_PAGE_MASK)) {
 tcg_gen_goto_tb(n);
 tcg_gen_movi_i32(QREG_PC, dest);
 tcg_gen_exit_tb((uintptr_t)tb + n);
diff --git a/target-s390x/translate.c b/target-s390x/translate.c
index c871ef2..c5179fe 100644
--- a/target-s390x/translate.c
+++ b/target-s390x/translate.c
@@ -610,7 +610,7 @@ static int 

[Qemu-devel] [PULL 38/39] cpu-exec: Remove unused 'x86_cpu' and 'env' from cpu_exec()

2016-05-12 Thread Richard Henderson
From: Sergey Fedorov 

Signed-off-by: Sergey Fedorov 
Signed-off-by: Sergey Fedorov 
Reviewed-by: Richard Henderson 
Message-Id: <1462962111-32237-6-git-send-email-sergey.fedo...@linaro.org>
Signed-off-by: Richard Henderson 
---
 cpu-exec.c | 12 
 1 file changed, 12 deletions(-)

diff --git a/cpu-exec.c b/cpu-exec.c
index 251988b..0ea47e9 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -570,10 +570,6 @@ static inline void cpu_loop_exec_tb(CPUState *cpu, TranslationBlock *tb,
 int cpu_exec(CPUState *cpu)
 {
 CPUClass *cc = CPU_GET_CLASS(cpu);
-#ifdef TARGET_I386
-X86CPU *x86_cpu = X86_CPU(cpu);
-CPUArchState *env = &x86_cpu->env;
-#endif
 int ret;
 SyncClocks sc;
 
@@ -629,18 +625,10 @@ int cpu_exec(CPUState *cpu)
  * Newer versions of gcc would complain about this code (-Wclobbered). */
 cpu = current_cpu;
 cc = CPU_GET_CLASS(cpu);
-#ifdef TARGET_I386
-x86_cpu = X86_CPU(cpu);
-env = &x86_cpu->env;
-#endif
 #else /* buggy compiler */
 /* Assert that the compiler does not smash local variables. */
 g_assert(cpu == current_cpu);
 g_assert(cc == CPU_GET_CLASS(cpu));
-#ifdef TARGET_I386
-g_assert(x86_cpu == X86_CPU(cpu));
-g_assert(env == &x86_cpu->env);
-#endif
 #endif /* buggy compiler */
 cpu->can_do_io = 1;
 tb_lock_reset();
-- 
2.5.5




[Qemu-devel] [PULL 32/39] tcg: Remove needless CPUState::current_tb

2016-05-12 Thread Richard Henderson
From: Sergey Fedorov 

This field was used for telling cpu_interrupt() to unlink a chain of TBs
being executed when it worked that way. Now, cpu_interrupt() doesn't do
this anymore, so we don't need this field either.

Signed-off-by: Sergey Fedorov 
Signed-off-by: Sergey Fedorov 
Message-Id: <1462273462-14036-1-git-send-email-sergey.fedo...@linaro.org>
Signed-off-by: Richard Henderson 
---
 cpu-exec-common.c  |  2 --
 cpu-exec.c |  4 
 cputlb.c   | 13 -
 hw/i386/kvmvapic.c |  1 -
 include/qom/cpu.h  |  2 --
 qom/cpu.c  |  1 -
 translate-all.c| 20 ++--
 7 files changed, 2 insertions(+), 41 deletions(-)

diff --git a/cpu-exec-common.c b/cpu-exec-common.c
index 1b1731c..6bdda6b 100644
--- a/cpu-exec-common.c
+++ b/cpu-exec-common.c
@@ -68,7 +68,6 @@ void cpu_reloading_memory_map(void)
 
 void cpu_loop_exit(CPUState *cpu)
 {
-cpu->current_tb = NULL;
 siglongjmp(cpu->jmp_env, 1);
 }
 
@@ -77,6 +76,5 @@ void cpu_loop_exit_restore(CPUState *cpu, uintptr_t pc)
 if (pc) {
 cpu_restore_state(cpu, pc);
 }
-cpu->current_tb = NULL;
 siglongjmp(cpu->jmp_env, 1);
 }
diff --git a/cpu-exec.c b/cpu-exec.c
index 7380b1e..d43d5ae 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -216,11 +216,9 @@ static void cpu_exec_nocache(CPUState *cpu, int max_cycles,
  | (ignore_icount ? CF_IGNORE_ICOUNT : 0));
 tb->orig_tb = cpu->tb_flushed ? NULL : orig_tb;
 cpu->tb_flushed |= old_tb_flushed;
-cpu->current_tb = tb;
 /* execute the generated code */
 trace_exec_tb_nocache(tb, tb->pc);
 cpu_tb_exec(cpu, tb);
-cpu->current_tb = NULL;
 tb_phys_invalidate(tb, -1);
 tb_free(tb);
 }
@@ -532,9 +530,7 @@ int cpu_exec(CPUState *cpu)
 uintptr_t ret;
 trace_exec_tb(tb, tb->pc);
 /* execute the generated code */
-cpu->current_tb = tb;
 ret = cpu_tb_exec(cpu, tb);
-cpu->current_tb = NULL;
 last_tb = (TranslationBlock *)(ret & ~TB_EXIT_MASK);
 tb_exit = ret & TB_EXIT_MASK;
 switch (tb_exit) {
diff --git a/cputlb.c b/cputlb.c
index 43b..167280a 100644
--- a/cputlb.c
+++ b/cputlb.c
@@ -76,10 +76,6 @@ void tlb_flush(CPUState *cpu, int flush_global)
 
 tlb_debug("(%d)\n", flush_global);
 
-/* must reset current TB so that interrupts cannot modify the
-   links while we are modifying them */
-cpu->current_tb = NULL;
-
 memset(env->tlb_table, -1, sizeof(env->tlb_table));
 memset(env->tlb_v_table, -1, sizeof(env->tlb_v_table));
 memset(cpu->tb_jmp_cache, 0, sizeof(cpu->tb_jmp_cache));
@@ -95,9 +91,6 @@ static inline void v_tlb_flush_by_mmuidx(CPUState *cpu, va_list argp)
 CPUArchState *env = cpu->env_ptr;
 
 tlb_debug("start\n");
-/* must reset current TB so that interrupts cannot modify the
-   links while we are modifying them */
-cpu->current_tb = NULL;
 
 for (;;) {
 int mmu_idx = va_arg(argp, int);
@@ -152,9 +145,6 @@ void tlb_flush_page(CPUState *cpu, target_ulong addr)
 tlb_flush(cpu, 1);
 return;
 }
-/* must reset current TB so that interrupts cannot modify the
-   links while we are modifying them */
-cpu->current_tb = NULL;
 
 addr &= TARGET_PAGE_MASK;
 i = (addr >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
@@ -193,9 +183,6 @@ void tlb_flush_page_by_mmuidx(CPUState *cpu, target_ulong addr, ...)
 va_end(argp);
 return;
 }
-/* must reset current TB so that interrupts cannot modify the
-   links while we are modifying them */
-cpu->current_tb = NULL;
 
 addr &= TARGET_PAGE_MASK;
 i = (addr >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
diff --git a/hw/i386/kvmvapic.c b/hw/i386/kvmvapic.c
index 4bb695d..f14445d 100644
--- a/hw/i386/kvmvapic.c
+++ b/hw/i386/kvmvapic.c
@@ -446,7 +446,6 @@ static void patch_instruction(VAPICROMState *s, X86CPU *cpu, target_ulong ip)
 resume_all_vcpus();
 
 if (!kvm_enabled()) {
-cs->current_tb = NULL;
 tb_gen_code(cs, current_pc, current_cs_base, current_flags, 1);
 cpu_resume_from_signal(cs, NULL);
 }
diff --git a/include/qom/cpu.h b/include/qom/cpu.h
index c1ae24d..4349c46 100644
--- a/include/qom/cpu.h
+++ b/include/qom/cpu.h
@@ -253,7 +253,6 @@ struct kvm_run;
  * @as: Pointer to the first AddressSpace, for the convenience of targets which
  *  only have a single AddressSpace
  * @env_ptr: Pointer to subclass-specific CPUArchState field.
- * @current_tb: Currently executing TB.
  * @gdb_regs: Additional GDB registers.
  * @gdb_num_regs: Number of total registers accessible to GDB.
  * @gdb_num_g_regs: Number of registers in GDB 'g' packets.
@@ -305,7 +304,6 @@ struct CPUState {
 MemoryRegion *memory;
 
 void *env_ptr; /* 

[Qemu-devel] [PULL 30/39] tcg: Rework tb_invalidated_flag

2016-05-12 Thread Richard Henderson
From: Sergey Fedorov 

'tb_invalidated_flag' was meant to catch two events:
 * some TB has been invalidated by tb_phys_invalidate();
 * the whole translation buffer has been flushed by tb_flush().

Then it was checked:
 * in cpu_exec() to ensure that the last executed TB can be safely
   linked to directly call the next one;
 * in cpu_exec_nocache() to decide if the original TB should be provided
   for further possible invalidation along with the temporarily
   generated TB.

It is always safe to patch an invalidated TB since it is not going to be
used anyway. It is also safe to call tb_phys_invalidate() for an already
invalidated TB. Thus, setting this flag in tb_phys_invalidate() is
simply unnecessary. Moreover, it can prevent otherwise proper linking
of TBs if any arbitrary TB has been invalidated. So just don't touch it
in tb_phys_invalidate().

Since this flag is then only used to catch whether tb_flush() has been called,
then rename it to 'tb_flushed'. Declare it as 'bool' and stick to using
only 'true' and 'false' to set its value. Also, instead of setting it in
tb_gen_code(), just after tb_flush() has been called, do it right inside
of tb_flush().

In cpu_exec(), this flag is used to track whether tb_flush() has been called
and has made 'next_tb' (a reference to the last executed TB) invalid
for linking it to directly call the next TB. tb_flush() can be called
during the CPU execution loop from tb_gen_code(), during TB execution or
by another thread while 'tb_lock' is released. Catch a translation
buffer flush reliably by resetting this flag once before the first TB lookup
and each time we find it set before trying to add a direct jump. Don't
touch it in tb_find_physical().

Each vCPU has its own execution loop in multithreaded mode and thus
should have its own copy of the flag to be able to reset it with its own
'next_tb' and not affect any other vCPU execution thread. So make this
flag per-vCPU and move it to CPUState.

In cpu_exec_nocache(), we only need to check if tb_flush() has been
called from tb_gen_code() called by cpu_exec_nocache() itself. To do
this reliably, preserve the old value of the flag, reset it before
calling tb_gen_code(), check afterwards, and combine the saved value
back to the flag.
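
The save/reset/check/combine sequence from the last paragraph can be
modelled in isolation as follows. This is a hedged sketch: SketchCPU and
the sketch_* functions are invented for the example, and the two
generator callbacks merely stand in for a tb_gen_code() call that does or
does not end up flushing the translation buffer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct SketchCPU {
    bool tb_flushed;   /* models the per-vCPU flag */
} SketchCPU;

/* Stand-in for code generation that triggers a flush. */
static void sketch_gen_flushing(SketchCPU *cpu)
{
    cpu->tb_flushed = true;
}

/* Stand-in for code generation that does not flush. */
static void sketch_gen_quiet(SketchCPU *cpu)
{
    (void)cpu;
}

/* Returns whether 'gen' itself caused a flush, without losing a flush
 * that was already pending before the call. */
static bool sketch_flushed_during(SketchCPU *cpu, void (*gen)(SketchCPU *))
{
    bool old_tb_flushed;
    bool flushed_now;

    assert(cpu != NULL);
    old_tb_flushed = cpu->tb_flushed;   /* preserve the old value */
    cpu->tb_flushed = false;            /* reset before generating */
    gen(cpu);
    flushed_now = cpu->tb_flushed;      /* check afterwards */
    cpu->tb_flushed |= old_tb_flushed;  /* combine the saved value back */
    return flushed_now;
}
```

The point of combining the saved value back is that the outer execution
loop still gets to see a flush that happened before the nested call.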

This patch is based on the patch "tcg: move tb_invalidated_flag to
CPUState" from Paolo Bonzini .

Signed-off-by: Sergey Fedorov 
Signed-off-by: Sergey Fedorov 
Signed-off-by: Richard Henderson 
---
 cpu-exec.c  | 21 +++--
 include/exec/exec-all.h |  2 --
 include/qom/cpu.h   |  2 ++
 translate-all.c |  5 +
 4 files changed, 14 insertions(+), 16 deletions(-)

diff --git a/cpu-exec.c b/cpu-exec.c
index 9407c66..f49a436 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -202,16 +202,20 @@ static void cpu_exec_nocache(CPUState *cpu, int max_cycles,
  TranslationBlock *orig_tb, bool ignore_icount)
 {
 TranslationBlock *tb;
+bool old_tb_flushed;
 
 /* Should never happen.
We only end up here when an existing TB is too long.  */
 if (max_cycles > CF_COUNT_MASK)
 max_cycles = CF_COUNT_MASK;
 
+old_tb_flushed = cpu->tb_flushed;
+cpu->tb_flushed = false;
 tb = tb_gen_code(cpu, orig_tb->pc, orig_tb->cs_base, orig_tb->flags,
  max_cycles | CF_NOCACHE
  | (ignore_icount ? CF_IGNORE_ICOUNT : 0));
-tb->orig_tb = tcg_ctx.tb_ctx.tb_invalidated_flag ? NULL : orig_tb;
+tb->orig_tb = cpu->tb_flushed ? NULL : orig_tb;
+cpu->tb_flushed |= old_tb_flushed;
 cpu->current_tb = tb;
 /* execute the generated code */
 trace_exec_tb_nocache(tb, tb->pc);
@@ -232,8 +236,6 @@ static TranslationBlock *tb_find_physical(CPUState *cpu,
 unsigned int h;
 tb_page_addr_t phys_pc, phys_page1;
 
-tcg_ctx.tb_ctx.tb_invalidated_flag = 0;
-
 /* find translated block using physical mappings */
 phys_pc = get_page_addr_code(env, pc);
 phys_page1 = phys_pc & TARGET_PAGE_MASK;
@@ -446,6 +448,7 @@ int cpu_exec(CPUState *cpu)
 }
 
 last_tb = NULL; /* forget the last executed TB after exception */
+cpu->tb_flushed = false; /* reset before first TB lookup */
 for(;;) {
 interrupt_request = cpu->interrupt_request;
 if (unlikely(interrupt_request)) {
@@ -510,14 +513,12 @@ int cpu_exec(CPUState *cpu)
 }
 tb_lock();
 tb = tb_find_fast(cpu);
-/* Note: we do it here to avoid a gcc bug on Mac OS X when
-   doing it in tb_find_slow */
-if (tcg_ctx.tb_ctx.tb_invalidated_flag) {
-/* as some TB could have been invalidated because
-   of memory exceptions while generating the code, we
-   must recompute the hash index here */
+ 

Re: [Qemu-devel] [PATCH v2 0/2] fix coverity complaint

2016-05-12 Thread Gonglei (Arei)
> From: Gerd Hoffmann [mailto:kra...@redhat.com]
> Sent: Thursday, May 12, 2016 9:38 PM
> To: Gonglei (Arei)
> Cc: qemu-devel@nongnu.org
> Subject: Re: [PATCH v2 0/2] fix coverity complaint
> 
> On Do, 2016-05-12 at 17:57 +0800, Gonglei wrote:
> > Rebase on the latest master brunch.
>   ^^
> Haha.  Intentional or tyops?
> 
Oh, sorry, it's a typo ;)

> Added to ui queue now.
> 
> thanks,
>   Gerd

Regards,
-Gonglei


[Qemu-devel] [PULL 31/39] cpu-exec: Move TB chaining into tb_find_fast()

2016-05-12 Thread Richard Henderson
From: Sergey Fedorov 

Move tb_add_jump() call and surrounding code from cpu_exec() into
tb_find_fast(). That simplifies cpu_exec() a little by hiding the direct
chaining optimization details in tb_find_fast(). It also allows moving the
tb_lock()/tb_unlock() pair into tb_find_fast(), putting it closer
to tb_find_slow() which also manipulates the lock.

Suggested-by: Alex Bennée 
Signed-off-by: Sergey Fedorov 
Signed-off-by: Sergey Fedorov 
Signed-off-by: Richard Henderson 
[rth: Fixed rebase typo in nochain test.]
---
 cpu-exec.c | 35 +++
 1 file changed, 19 insertions(+), 16 deletions(-)

diff --git a/cpu-exec.c b/cpu-exec.c
index f49a436..7380b1e 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -320,7 +320,9 @@ found:
 return tb;
 }
 
-static inline TranslationBlock *tb_find_fast(CPUState *cpu)
+static inline TranslationBlock *tb_find_fast(CPUState *cpu,
+ TranslationBlock **last_tb,
+ int tb_exit)
 {
 CPUArchState *env = (CPUArchState *)cpu->env_ptr;
 TranslationBlock *tb;
@@ -331,11 +333,24 @@ static inline TranslationBlock *tb_find_fast(CPUState *cpu)
always be the same before a given translated block
is executed. */
 cpu_get_tb_cpu_state(env, &pc, &cs_base, &flags);
+tb_lock();
 tb = cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)];
 if (unlikely(!tb || tb->pc != pc || tb->cs_base != cs_base ||
  tb->flags != flags)) {
 tb = tb_find_slow(cpu, pc, cs_base, flags);
 }
+if (cpu->tb_flushed) {
+/* Ensure that no TB jump will be modified as the
+ * translation buffer has been flushed.
+ */
+*last_tb = NULL;
+cpu->tb_flushed = false;
+}
+/* See if we can patch the calling TB. */
+if (*last_tb && !qemu_loglevel_mask(CPU_LOG_TB_NOCHAIN)) {
+tb_add_jump(*last_tb, tb_exit, tb);
+}
+tb_unlock();
 return tb;
 }
 
@@ -441,7 +456,8 @@ int cpu_exec(CPUState *cpu)
 } else if (replay_has_exception()
&& cpu->icount_decr.u16.low + cpu->icount_extra == 0) {
 /* try to cause an exception pending in the log */
-cpu_exec_nocache(cpu, 1, tb_find_fast(cpu), true);
+last_tb = NULL; /* Avoid chaining TBs */
+cpu_exec_nocache(cpu, 1, tb_find_fast(cpu, &last_tb, 0), true);
 ret = -1;
 break;
 #endif
@@ -511,20 +527,7 @@ int cpu_exec(CPUState *cpu)
 cpu->exception_index = EXCP_INTERRUPT;
 cpu_loop_exit(cpu);
 }
-tb_lock();
-tb = tb_find_fast(cpu);
-if (cpu->tb_flushed) {
-/* Ensure that no TB jump will be modified as the
- * translation buffer has been flushed.
- */
-last_tb = NULL;
-cpu->tb_flushed = false;
-}
-/* See if we can patch the calling TB. */
-if (last_tb && !qemu_loglevel_mask(CPU_LOG_TB_NOCHAIN)) {
-tb_add_jump(last_tb, tb_exit, tb);
-}
-tb_unlock();
+tb = tb_find_fast(cpu, &last_tb, tb_exit);
 if (likely(!cpu->exit_request)) {
 uintptr_t ret;
 trace_exec_tb(tb, tb->pc);
-- 
2.5.5




[Qemu-devel] [PULL 26/39] tcg: code_bitmap and code_write_count are not used by user-mode emulation

2016-05-12 Thread Richard Henderson
From: Paolo Bonzini 

Signed-off-by: Paolo Bonzini 
[Sergey Fedorov: eliminate the field entirely in user-mode]
Signed-off-by: Sergey Fedorov 
Reviewed-by: Richard Henderson  
Reviewed-by: Alex Bennée 
[rth: merged followup fixup]
Message-Id: <1462982777-4513-1-git-send-email-sergey.fedo...@linaro.org>
Signed-off-by: Richard Henderson 
---
 translate-all.c | 11 ---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/translate-all.c b/translate-all.c
index d679ad1..d5d2bbe 100644
--- a/translate-all.c
+++ b/translate-all.c
@@ -72,11 +72,12 @@
 typedef struct PageDesc {
 /* list of TBs intersecting this ram page */
 TranslationBlock *first_tb;
+#ifdef CONFIG_SOFTMMU
 /* in order to optimize self modifying code, we count the number
of lookups we do to a given page to use a bitmap */
 unsigned int code_write_count;
 unsigned long *code_bitmap;
-#if defined(CONFIG_USER_ONLY)
+#else
 unsigned long flags;
 #endif
 } PageDesc;
@@ -783,9 +784,11 @@ void tb_free(TranslationBlock *tb)
 
 static inline void invalidate_page_bitmap(PageDesc *p)
 {
+#ifdef CONFIG_SOFTMMU
 g_free(p->code_bitmap);
 p->code_bitmap = NULL;
 p->code_write_count = 0;
+#endif
 }
 
 /* Set to NULL all the 'first_tb' fields in all PageDescs. */
@@ -1028,6 +1031,7 @@ void tb_phys_invalidate(TranslationBlock *tb, tb_page_addr_t page_addr)
 tcg_ctx.tb_ctx.tb_phys_invalidate_count++;
 }
 
+#ifdef CONFIG_SOFTMMU
 static void build_page_bitmap(PageDesc *p)
 {
 int n, tb_start, tb_end;
@@ -1056,6 +1060,7 @@ static void build_page_bitmap(PageDesc *p)
 tb = tb->page_next[n];
 }
 }
+#endif
 
 /* add the tb in the target page and protect it if necessary
  *
@@ -1412,6 +1417,7 @@ void tb_invalidate_phys_page_range(tb_page_addr_t start, tb_page_addr_t end,
 #endif
 }
 
+#ifdef CONFIG_SOFTMMU
 /* len must be <= 8 and start must be a multiple of len */
 void tb_invalidate_phys_page_fast(tb_page_addr_t start, int len)
 {
@@ -1449,8 +1455,7 @@ void tb_invalidate_phys_page_fast(tb_page_addr_t start, int len)
 tb_invalidate_phys_page_range(start, start + len, 1);
 }
 }
-
-#if !defined(CONFIG_SOFTMMU)
+#else
 /* Called with mmap_lock held.  */
 static void tb_invalidate_phys_page(tb_page_addr_t addr,
 uintptr_t pc, void *puc,
-- 
2.5.5




[Qemu-devel] [PULL 37/39] cpu-exec: Move TB execution stuff out of cpu_exec()

2016-05-12 Thread Richard Henderson
From: Sergey Fedorov 

Simplify cpu_exec() by extracting TB execution code outside of
cpu_exec() into a new static inline function cpu_loop_exec_tb().

Signed-off-by: Sergey Fedorov 
Signed-off-by: Sergey Fedorov 
Reviewed-by: Richard Henderson 
Message-Id: <1462962111-32237-5-git-send-email-sergey.fedo...@linaro.org>
Signed-off-by: Richard Henderson 
---
 cpu-exec.c | 119 +
 1 file changed, 64 insertions(+), 55 deletions(-)

diff --git a/cpu-exec.c b/cpu-exec.c
index c83b354..251988b 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -505,6 +505,66 @@ static inline void cpu_handle_interrupt(CPUState *cpu,
 }
 }
 
+static inline void cpu_loop_exec_tb(CPUState *cpu, TranslationBlock *tb,
+TranslationBlock **last_tb, int *tb_exit,
+SyncClocks *sc)
+{
+uintptr_t ret;
+
+if (unlikely(cpu->exit_request)) {
+return;
+}
+
+trace_exec_tb(tb, tb->pc);
+ret = cpu_tb_exec(cpu, tb);
+*last_tb = (TranslationBlock *)(ret & ~TB_EXIT_MASK);
+*tb_exit = ret & TB_EXIT_MASK;
+switch (*tb_exit) {
+case TB_EXIT_REQUESTED:
+/* Something asked us to stop executing
+ * chained TBs; just continue round the main
+ * loop. Whatever requested the exit will also
+ * have set something else (eg exit_request or
+ * interrupt_request) which we will handle
+ * next time around the loop.  But we need to
+ * ensure the tcg_exit_req read in generated code
+ * comes before the next read of cpu->exit_request
+ * or cpu->interrupt_request.
+ */
+smp_rmb();
+*last_tb = NULL;
+break;
+case TB_EXIT_ICOUNT_EXPIRED:
+{
+/* Instruction counter expired.  */
+#ifdef CONFIG_USER_ONLY
+abort();
+#else
+int insns_left = cpu->icount_decr.u32;
+if (cpu->icount_extra && insns_left >= 0) {
+/* Refill decrementer and continue execution.  */
+cpu->icount_extra += insns_left;
+insns_left = MIN(0xffff, cpu->icount_extra);
+cpu->icount_extra -= insns_left;
+cpu->icount_decr.u16.low = insns_left;
+} else {
+if (insns_left > 0) {
+/* Execute remaining instructions.  */
+cpu_exec_nocache(cpu, insns_left, *last_tb, false);
+align_clocks(sc, cpu);
+}
+cpu->exception_index = EXCP_INTERRUPT;
+*last_tb = NULL;
+cpu_loop_exit(cpu);
+}
+break;
+#endif
+}
+default:
+break;
+}
+}
+
 /* main execution loop */
 
 int cpu_exec(CPUState *cpu)
@@ -515,8 +575,6 @@ int cpu_exec(CPUState *cpu)
 CPUArchState *env = &x86_cpu->env;
 #endif
 int ret;
-TranslationBlock *tb, *last_tb;
-int tb_exit = 0;
 SyncClocks sc;
 
 /* replay_interrupt may need current_cpu */
@@ -543,6 +601,9 @@ int cpu_exec(CPUState *cpu)
 init_delay_params(&sc, cpu);
 
 for(;;) {
+TranslationBlock *tb, *last_tb;
+int tb_exit = 0;
+
 /* prepare setjmp context for exception handling */
 if (sigsetjmp(cpu->jmp_env, 0) == 0) {
 /* if an exception is pending, we execute it here */
@@ -555,59 +616,7 @@ int cpu_exec(CPUState *cpu)
 for(;;) {
 cpu_handle_interrupt(cpu, &last_tb);
 tb = tb_find_fast(cpu, &last_tb, tb_exit);
-if (likely(!cpu->exit_request)) {
-uintptr_t ret;
-trace_exec_tb(tb, tb->pc);
-/* execute the generated code */
-ret = cpu_tb_exec(cpu, tb);
-last_tb = (TranslationBlock *)(ret & ~TB_EXIT_MASK);
-tb_exit = ret & TB_EXIT_MASK;
-switch (tb_exit) {
-case TB_EXIT_REQUESTED:
-/* Something asked us to stop executing
- * chained TBs; just continue round the main
- * loop. Whatever requested the exit will also
- * have set something else (eg exit_request or
- * interrupt_request) which we will handle
- * next time around the loop.  But we need to
- * ensure the tcg_exit_req read in generated code
- * comes before the next read of cpu->exit_request
- * or cpu->interrupt_request.
- */
-smp_rmb();
-last_tb = NULL;
-break;
-case TB_EXIT_ICOUNT_EXPIRED:
-{
-/* Instruction counter expired.  */
-#ifdef CONFIG_USER_ONLY
-abort();

[Qemu-devel] [PULL 28/39] cpu-exec: elide more icount code if CONFIG_USER_ONLY

2016-05-12 Thread Richard Henderson
From: Paolo Bonzini 

Signed-off-by: Paolo Bonzini 
[Alex Bennée: #ifndef replay code to match elided functions]
Signed-off-by: Alex Bennée 
Signed-off-by: Sergey Fedorov 
Signed-off-by: Richard Henderson 
---
 cpu-exec.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/cpu-exec.c b/cpu-exec.c
index 02a4907..bd831b5 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -192,6 +192,7 @@ static inline tcg_target_ulong cpu_tb_exec(CPUState *cpu, TranslationBlock *itb)
 return next_tb;
 }
 
+#ifndef CONFIG_USER_ONLY
 /* Execute the code without caching the generated code. An interpreter
could be used if available. */
 static void cpu_exec_nocache(CPUState *cpu, int max_cycles,
@@ -216,6 +217,7 @@ static void cpu_exec_nocache(CPUState *cpu, int max_cycles,
 tb_phys_invalidate(tb, -1);
 tb_free(tb);
 }
+#endif
 
 static TranslationBlock *tb_find_physical(CPUState *cpu,
   target_ulong pc,
@@ -430,12 +432,14 @@ int cpu_exec(CPUState *cpu)
 }
 #endif
 }
+#ifndef CONFIG_USER_ONLY
 } else if (replay_has_exception()
&& cpu->icount_decr.u16.low + cpu->icount_extra == 0) {
 /* try to cause an exception pending in the log */
 cpu_exec_nocache(cpu, 1, tb_find_fast(cpu), true);
 ret = -1;
 break;
+#endif
 }
 
 next_tb = 0; /* force lookup of first TB */
@@ -542,6 +546,9 @@ int cpu_exec(CPUState *cpu)
 case TB_EXIT_ICOUNT_EXPIRED:
 {
 /* Instruction counter expired.  */
+#ifdef CONFIG_USER_ONLY
+abort();
+#else
 int insns_left = cpu->icount_decr.u32;
 if (cpu->icount_extra && insns_left >= 0) {
 /* Refill decrementer and continue execution.  */
@@ -561,6 +568,7 @@ int cpu_exec(CPUState *cpu)
 cpu_loop_exit(cpu);
 }
 break;
+#endif
 }
 default:
 break;
-- 
2.5.5




[Qemu-devel] [PULL 20/39] tcg: Clarify thread safety check in tb_add_jump()

2016-05-12 Thread Richard Henderson
From: Sergey Fedorov 

The check is to make sure that another thread hasn't already done the
same while we were outside of tb_lock. Mention this in a comment.

Signed-off-by: Sergey Fedorov 
Signed-off-by: Sergey Fedorov 
Reviewed-by: Alex Bennée 
Signed-off-by: Richard Henderson 
---
 include/exec/exec-all.h | 29 -
 1 file changed, 16 insertions(+), 13 deletions(-)

diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index 64c2a66..06da1bc 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -372,20 +372,23 @@ static inline void tb_set_jmp_target(TranslationBlock *tb,
 static inline void tb_add_jump(TranslationBlock *tb, int n,
TranslationBlock *tb_next)
 {
-/* NOTE: this test is only needed for thread safety */
-if (!tb->jmp_list_next[n]) {
-qemu_log_mask_and_addr(CPU_LOG_EXEC, tb->pc,
-   "Linking TBs %p [" TARGET_FMT_lx
-   "] index %d -> %p [" TARGET_FMT_lx "]\n",
-   tb->tc_ptr, tb->pc, n,
-   tb_next->tc_ptr, tb_next->pc);
-/* patch the native jump address */
-tb_set_jmp_target(tb, n, (uintptr_t)tb_next->tc_ptr);
-
-/* add in TB jmp circular list */
-tb->jmp_list_next[n] = tb_next->jmp_list_first;
-tb_next->jmp_list_first = (uintptr_t)tb | n;
+if (tb->jmp_list_next[n]) {
+/* Another thread has already done this while we were
+ * outside of the lock; nothing to do in this case */
+return;
 }
+qemu_log_mask_and_addr(CPU_LOG_EXEC, tb->pc,
+   "Linking TBs %p [" TARGET_FMT_lx
+   "] index %d -> %p [" TARGET_FMT_lx "]\n",
+   tb->tc_ptr, tb->pc, n,
+   tb_next->tc_ptr, tb_next->pc);
+
+/* patch the native jump address */
+tb_set_jmp_target(tb, n, (uintptr_t)tb_next->tc_ptr);
+
+/* add in TB jmp circular list */
+tb->jmp_list_next[n] = tb_next->jmp_list_first;
+tb_next->jmp_list_first = (uintptr_t)tb | n;
 }
 
 /* GETRA is the true target of the return instruction that we'll execute,
-- 
2.5.5




[Qemu-devel] [PULL 39/39] cpu-exec: Clean up 'interrupt_request' reloading in cpu_handle_interrupt()

2016-05-12 Thread Richard Henderson
From: Sergey Fedorov 

Suggested-by: Richard Henderson 
Signed-off-by: Sergey Fedorov 
Signed-off-by: Sergey Fedorov 
Message-Id: <1463071937-26607-1-git-send-email-sergey.fedo...@linaro.org>
Reviewed-by: Richard Henderson 
Signed-off-by: Richard Henderson 
---
 cpu-exec.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/cpu-exec.c b/cpu-exec.c
index 0ea47e9..14df1aa 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -488,10 +488,11 @@ static inline void cpu_handle_interrupt(CPUState *cpu,
 if (cc->cpu_exec_interrupt(cpu, interrupt_request)) {
 *last_tb = NULL;
 }
+/* The target hook may have updated the 'cpu->interrupt_request';
+ * reload the 'interrupt_request' value */
+interrupt_request = cpu->interrupt_request;
 }
-/* Don't use the cached interrupt_request value,
-   do_interrupt may have updated the EXITTB flag. */
-if (cpu->interrupt_request & CPU_INTERRUPT_EXITTB) {
+if (interrupt_request & CPU_INTERRUPT_EXITTB) {
 cpu->interrupt_request &= ~CPU_INTERRUPT_EXITTB;
 /* ensure that no TB jump will be modified as
the program flow was changed */
-- 
2.5.5




[Qemu-devel] [PULL 21/39] tcg: Rename tb_jmp_remove() to tb_remove_from_jmp_list()

2016-05-12 Thread Richard Henderson
From: Sergey Fedorov 

tb_jmp_remove() was only used to remove the TB from the list of all TBs
jumping to the same TB, namely the n-th jump destination of the given TB.
Add a comment briefly describing the function's behavior and rename it to
better reflect its purpose.
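
To make the list manipulation concrete, here is a minimal sketch of removing
one entry from such a list. It assumes the encoding used in this series
(the two low bits of each link select which field holds the next link:
0 or 1 for an exit slot, 2 for the owner's list head). The Block type and
the helper names block_init(), add_jump() and remove_jump() are hypothetical
and only model the idea; they are not the code in translate-all.c.

```c
#include <assert.h>
#include <stdint.h>

enum { TAG_MASK = 3, TAG_OWNER = 2 };

/* Hypothetical model of the jump-list fields of a translation block. */
typedef struct Block {
    uintptr_t list_next[2]; /* next link in the list this exit belongs to */
    uintptr_t list_first;   /* head of the list of blocks jumping here */
} Block;

static void block_init(Block *b)
{
    b->list_next[0] = 0;
    b->list_next[1] = 0;
    /* An empty list points back at its owner, tagged with 2. */
    b->list_first = (uintptr_t)b | TAG_OWNER;
}

/* Record that exit n of src jumps to dst (push onto dst's list). */
static void add_jump(Block *src, int n, Block *dst)
{
    assert(n == 0 || n == 1);
    if (src->list_next[n]) {
        return; /* already linked */
    }
    src->list_next[n] = dst->list_first;
    dst->list_first = (uintptr_t)src | (uintptr_t)n;
}

/* Remove exit n of src from the list of its jump target. */
static void remove_jump(Block *src, int n)
{
    uintptr_t *link = &src->list_next[n];

    assert(n == 0 || n == 1);
    if (*link == 0) {
        return; /* this exit is not linked */
    }
    /* Walk the circular list until we find the link that names (src, n). */
    for (;;) {
        uintptr_t entry = *link;
        uintptr_t tag = entry & TAG_MASK;
        Block *b = (Block *)(entry & ~(uintptr_t)TAG_MASK);

        if (b == src && tag == (uintptr_t)n) {
            break;
        }
        link = (tag == TAG_OWNER) ? &b->list_first : &b->list_next[tag];
    }
    /* Unlink: the predecessor now skips over src's entry. */
    *link = src->list_next[n];
    src->list_next[n] = 0;
}
```

Because the list is circular, the walk can start at the entry being removed
and still reach its predecessor without knowing the target block in advance.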

Signed-off-by: Sergey Fedorov 
Signed-off-by: Sergey Fedorov 
Reviewed-by: Alex Bennée 
Signed-off-by: Richard Henderson 
---
 translate-all.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/translate-all.c b/translate-all.c
index 1dc1a73..5e057ba 100644
--- a/translate-all.c
+++ b/translate-all.c
@@ -924,7 +924,8 @@ static inline void tb_page_remove(TranslationBlock **ptb, TranslationBlock *tb)
 }
 }
 
-static inline void tb_jmp_remove(TranslationBlock *tb, int n)
+/* remove the TB from a list of TBs jumping to the n-th jump target of the TB */
+static inline void tb_remove_from_jmp_list(TranslationBlock *tb, int n)
 {
 TranslationBlock *tb1;
 uintptr_t *ptb, ntb;
@@ -998,8 +999,8 @@ void tb_phys_invalidate(TranslationBlock *tb, tb_page_addr_t page_addr)
 }
 
 /* suppress this TB from the two jump lists */
-tb_jmp_remove(tb, 0);
-tb_jmp_remove(tb, 1);
+tb_remove_from_jmp_list(tb, 0);
+tb_remove_from_jmp_list(tb, 1);
 
 /* suppress any remaining jumps to this TB */
 tb1 = tb->jmp_list_first;
-- 
2.5.5




[Qemu-devel] [PULL 36/39] cpu-exec: Move interrupt handling out of cpu_exec()

2016-05-12 Thread Richard Henderson
From: Sergey Fedorov 

Simplify cpu_exec() by extracting interrupt handling code outside of
cpu_exec() into a new static inline function cpu_handle_interrupt().

Signed-off-by: Sergey Fedorov 
Signed-off-by: Sergey Fedorov 
Reviewed-by: Richard Henderson  
Message-Id: <1462962111-32237-4-git-send-email-sergey.fedo...@linaro.org>
Signed-off-by: Richard Henderson 
---
 cpu-exec.c | 132 -
 1 file changed, 70 insertions(+), 62 deletions(-)

diff --git a/cpu-exec.c b/cpu-exec.c
index 36df395..c83b354 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -437,6 +437,74 @@ static inline bool cpu_handle_exception(CPUState *cpu, int *ret)
 return false;
 }
 
+static inline void cpu_handle_interrupt(CPUState *cpu,
+TranslationBlock **last_tb)
+{
+CPUClass *cc = CPU_GET_CLASS(cpu);
+int interrupt_request = cpu->interrupt_request;
+
+if (unlikely(interrupt_request)) {
+if (unlikely(cpu->singlestep_enabled & SSTEP_NOIRQ)) {
+/* Mask out external interrupts for this step. */
+interrupt_request &= ~CPU_INTERRUPT_SSTEP_MASK;
+}
+if (interrupt_request & CPU_INTERRUPT_DEBUG) {
+cpu->interrupt_request &= ~CPU_INTERRUPT_DEBUG;
+cpu->exception_index = EXCP_DEBUG;
+cpu_loop_exit(cpu);
+}
+if (replay_mode == REPLAY_MODE_PLAY && !replay_has_interrupt()) {
+/* Do nothing */
+} else if (interrupt_request & CPU_INTERRUPT_HALT) {
+replay_interrupt();
+cpu->interrupt_request &= ~CPU_INTERRUPT_HALT;
+cpu->halted = 1;
+cpu->exception_index = EXCP_HLT;
+cpu_loop_exit(cpu);
+}
+#if defined(TARGET_I386)
+else if (interrupt_request & CPU_INTERRUPT_INIT) {
+X86CPU *x86_cpu = X86_CPU(cpu);
+CPUArchState *env = &x86_cpu->env;
+replay_interrupt();
+cpu_svm_check_intercept_param(env, SVM_EXIT_INIT, 0);
+do_cpu_init(x86_cpu);
+cpu->exception_index = EXCP_HALTED;
+cpu_loop_exit(cpu);
+}
+#else
+else if (interrupt_request & CPU_INTERRUPT_RESET) {
+replay_interrupt();
+cpu_reset(cpu);
+cpu_loop_exit(cpu);
+}
+#endif
+/* The target hook has 3 exit conditions:
+   False when the interrupt isn't processed,
+   True when it is, and we should restart on a new TB,
+   and via longjmp via cpu_loop_exit.  */
+else {
+replay_interrupt();
+if (cc->cpu_exec_interrupt(cpu, interrupt_request)) {
+*last_tb = NULL;
+}
+}
+/* Don't use the cached interrupt_request value,
+   do_interrupt may have updated the EXITTB flag. */
+if (cpu->interrupt_request & CPU_INTERRUPT_EXITTB) {
+cpu->interrupt_request &= ~CPU_INTERRUPT_EXITTB;
+/* ensure that no TB jump will be modified as
+   the program flow was changed */
+*last_tb = NULL;
+}
+}
+if (unlikely(cpu->exit_request || replay_has_interrupt())) {
+cpu->exit_request = 0;
+cpu->exception_index = EXCP_INTERRUPT;
+cpu_loop_exit(cpu);
+}
+}
+
 /* main execution loop */
 
 int cpu_exec(CPUState *cpu)
@@ -446,7 +514,7 @@ int cpu_exec(CPUState *cpu)
 X86CPU *x86_cpu = X86_CPU(cpu);
 CPUArchState *env = &x86_cpu->env;
 #endif
-int ret, interrupt_request;
+int ret;
 TranslationBlock *tb, *last_tb;
 int tb_exit = 0;
 SyncClocks sc;
@@ -485,67 +553,7 @@ int cpu_exec(CPUState *cpu)
 last_tb = NULL; /* forget the last executed TB after exception */
 cpu->tb_flushed = false; /* reset before first TB lookup */
 for(;;) {
-interrupt_request = cpu->interrupt_request;
-if (unlikely(interrupt_request)) {
-if (unlikely(cpu->singlestep_enabled & SSTEP_NOIRQ)) {
-/* Mask out external interrupts for this step. */
-interrupt_request &= ~CPU_INTERRUPT_SSTEP_MASK;
-}
-if (interrupt_request & CPU_INTERRUPT_DEBUG) {
-cpu->interrupt_request &= ~CPU_INTERRUPT_DEBUG;
-cpu->exception_index = EXCP_DEBUG;
-cpu_loop_exit(cpu);
-}
-if (replay_mode == REPLAY_MODE_PLAY
-&& !replay_has_interrupt()) {
-/* Do nothing */
-} else if (interrupt_request & CPU_INTERRUPT_HALT) {
-replay_interrupt();
-cpu->interrupt_request &= ~CPU_INTERRUPT_HALT;
-cpu->halted = 1;
-   

[Qemu-devel] [PULL 25/39] tcg: Allow goto_tb to any target PC in user mode

2016-05-12 Thread Richard Henderson
From: Sergey Fedorov 

In user mode, there is only a static address translation, TBs are always
invalidated properly, and direct jumps are reset when the mapping changes.
Thus the destination address is always valid for direct jumps and
there is no need to restrict it to the pages the TB resides in.
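
The rule can be sketched as a small predicate. This is only an illustration:
the patch selects the behavior at compile time with CONFIG_USER_ONLY, whereas
the sketch below uses a runtime flag so both cases can be exercised. The
4 KiB page size and the names same_page() and can_direct_jump() are
assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed 4 KiB pages for illustration; real targets define their own. */
#define PAGE_BITS 12
#define PAGE_MASK (~(((uint64_t)1 << PAGE_BITS) - 1))

/* True when both addresses fall into the same page. */
static bool same_page(uint64_t a, uint64_t b)
{
    return ((a ^ b) & PAGE_MASK) == 0;
}

/* Decide whether a direct jump from the TB at tb_pc to dest is allowed. */
static bool can_direct_jump(bool user_mode, uint64_t tb_pc, uint64_t dest)
{
    if (user_mode) {
        /* Static translation: any destination is fine. */
        return true;
    }
    /* System mode: stay within the page the TB starts in. */
    return same_page(tb_pc, dest);
}
```

Some targets in the patch also accept a second page (for example the page of
the last translated instruction); the sketch omits that refinement.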

Signed-off-by: Sergey Fedorov 
Signed-off-by: Sergey Fedorov 
Cc: Riku Voipio 
Cc: Blue Swirl 
Reviewed-by: Alex Bennée 
Signed-off-by: Richard Henderson 
---
 target-alpha/translate.c  |  4 
 target-arm/translate-a64.c|  2 ++
 target-arm/translate.c| 18 --
 target-cris/translate.c   | 18 --
 target-i386/translate.c   | 23 ++-
 target-lm32/translate.c   | 21 +++--
 target-m68k/translate.c   | 18 --
 target-microblaze/translate.c | 15 +++
 target-mips/translate.c   | 20 +++-
 target-moxie/translate.c  | 21 +++--
 target-openrisc/translate.c   | 20 +++-
 target-ppc/translate.c| 20 +++-
 target-s390x/translate.c  | 17 +++--
 target-sh4/translate.c| 21 +++--
 target-sparc/translate.c  | 24 +---
 target-tricore/translate.c| 20 +++-
 target-unicore32/translate.c  | 16 +++-
 target-xtensa/translate.c |  4 
 tcg/tcg-op.h  |  9 ++---
 19 files changed, 221 insertions(+), 90 deletions(-)

diff --git a/target-alpha/translate.c b/target-alpha/translate.c
index 5b86992..8c2183a 100644
--- a/target-alpha/translate.c
+++ b/target-alpha/translate.c
@@ -460,12 +460,16 @@ static bool use_goto_tb(DisasContext *ctx, uint64_t dest)
 || ctx->singlestep_enabled || singlestep) {
 return false;
 }
+#ifndef CONFIG_USER_ONLY
 /* If the destination is in the superpage, the page perms can't change.  */
 if (in_superpage(ctx, dest)) {
 return true;
 }
 /* Check for the dest on the same page as the start of the TB.  */
 return ((ctx->tb->pc ^ dest) & TARGET_PAGE_MASK) == 0;
+#else
+return true;
+#endif
 }
 
 static ExitStatus gen_bdirect(DisasContext *ctx, int ra, int32_t disp)
diff --git a/target-arm/translate-a64.c b/target-arm/translate-a64.c
index 24f5e17..5526bbd 100644
--- a/target-arm/translate-a64.c
+++ b/target-arm/translate-a64.c
@@ -274,10 +274,12 @@ static inline bool use_goto_tb(DisasContext *s, int n, uint64_t dest)
 return false;
 }
 
+#ifndef CONFIG_USER_ONLY
 /* Only link tbs from inside the same guest page */
 if ((s->tb->pc & TARGET_PAGE_MASK) != (dest & TARGET_PAGE_MASK)) {
 return false;
 }
+#endif
 
 return true;
 }
diff --git a/target-arm/translate.c b/target-arm/translate.c
index 34196a8..a43b1f6 100644
--- a/target-arm/translate.c
+++ b/target-arm/translate.c
@@ -4049,16 +4049,22 @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
 return 0;
 }
 
-static inline void gen_goto_tb(DisasContext *s, int n, target_ulong dest)
+static inline bool use_goto_tb(DisasContext *s, target_ulong dest)
 {
-TranslationBlock *tb;
+#ifndef CONFIG_USER_ONLY
+return (s->tb->pc & TARGET_PAGE_MASK) == (dest & TARGET_PAGE_MASK) ||
+   ((s->pc - 1) & TARGET_PAGE_MASK) == (dest & TARGET_PAGE_MASK);
+#else
+return true;
+#endif
+}
 
-tb = s->tb;
-if ((tb->pc & TARGET_PAGE_MASK) == (dest & TARGET_PAGE_MASK) ||
-((s->pc - 1) & TARGET_PAGE_MASK) == (dest & TARGET_PAGE_MASK)) {
+static inline void gen_goto_tb(DisasContext *s, int n, target_ulong dest)
+{
+if (use_goto_tb(s, dest)) {
 tcg_gen_goto_tb(n);
 gen_set_pc_im(s, dest);
-tcg_gen_exit_tb((uintptr_t)tb + n);
+tcg_gen_exit_tb((uintptr_t)s->tb + n);
 } else {
 gen_set_pc_im(s, dest);
 tcg_gen_exit_tb(0);
diff --git a/target-cris/translate.c b/target-cris/translate.c
index 9c8ff8f..f28b199 100644
--- a/target-cris/translate.c
+++ b/target-cris/translate.c
@@ -520,16 +520,22 @@ static void t_gen_cc_jmp(TCGv pc_true, TCGv pc_false)
 gen_set_label(l1);
 }
 
-static void gen_goto_tb(DisasContext *dc, int n, target_ulong dest)
+static inline bool use_goto_tb(DisasContext *dc, target_ulong dest)
 {
-TranslationBlock *tb;
-tb = dc->tb;
+#ifndef CONFIG_USER_ONLY
+return (dc->tb->pc & TARGET_PAGE_MASK) == (dest & TARGET_PAGE_MASK) ||
+   (dc->ppc & TARGET_PAGE_MASK) == (dest & TARGET_PAGE_MASK);
+#else
+return true;
+#endif
+}
 
-if ((tb->pc & TARGET_PAGE_MASK) == (dest & TARGET_PAGE_MASK) ||
-(dc->ppc & TARGET_PAGE_MASK) == (dest & TARGET_PAGE_MASK)) {
+static void gen_goto_tb(DisasContext *dc, int n, target_ulong dest)
+{
+if (use_goto_tb(dc, dest)) {
 tcg_gen_goto_tb(n);
 

[Qemu-devel] [PULL 35/39] cpu-exec: Move exception handling out of cpu_exec()

2016-05-12 Thread Richard Henderson
From: Sergey Fedorov 

Simplify cpu_exec() by extracting exception handling code out of
cpu_exec() into a new static inline function cpu_handle_exception().
Also make cpu_handle_debug_exception() inline as it is used only once.

Signed-off-by: Sergey Fedorov 
Signed-off-by: Sergey Fedorov 
Reviewed-by: Richard Henderson 
Message-Id: <1462962111-32237-3-git-send-email-sergey.fedo...@linaro.org>
Signed-off-by: Richard Henderson 
---
 cpu-exec.c | 93 +++---
 1 file changed, 52 insertions(+), 41 deletions(-)

diff --git a/cpu-exec.c b/cpu-exec.c
index 529cac2..36df395 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -374,7 +374,7 @@ static inline bool cpu_handle_halt(CPUState *cpu)
 return false;
 }
 
-static void cpu_handle_debug_exception(CPUState *cpu)
+static inline void cpu_handle_debug_exception(CPUState *cpu)
 {
 CPUClass *cc = CPU_GET_CLASS(cpu);
 CPUWatchpoint *wp;
@@ -388,6 +388,55 @@ static void cpu_handle_debug_exception(CPUState *cpu)
 cc->debug_excp_handler(cpu);
 }
 
+static inline bool cpu_handle_exception(CPUState *cpu, int *ret)
+{
+if (cpu->exception_index >= 0) {
+if (cpu->exception_index >= EXCP_INTERRUPT) {
+/* exit request from the cpu execution loop */
+*ret = cpu->exception_index;
+if (*ret == EXCP_DEBUG) {
+cpu_handle_debug_exception(cpu);
+}
+cpu->exception_index = -1;
+return true;
+} else {
+#if defined(CONFIG_USER_ONLY)
+/* if user mode only, we simulate a fake exception
+   which will be handled outside the cpu execution
+   loop */
+#if defined(TARGET_I386)
+CPUClass *cc = CPU_GET_CLASS(cpu);
+cc->do_interrupt(cpu);
+#endif
+*ret = cpu->exception_index;
+cpu->exception_index = -1;
+return true;
+#else
+if (replay_exception()) {
+CPUClass *cc = CPU_GET_CLASS(cpu);
+cc->do_interrupt(cpu);
+cpu->exception_index = -1;
+} else if (!replay_has_interrupt()) {
+/* give a chance to iothread in replay mode */
+*ret = EXCP_INTERRUPT;
+return true;
+}
+#endif
+}
+#ifndef CONFIG_USER_ONLY
+} else if (replay_has_exception()
+   && cpu->icount_decr.u16.low + cpu->icount_extra == 0) {
+/* try to cause an exception pending in the log */
+TranslationBlock *last_tb = NULL; /* Avoid chaining TBs */
+cpu_exec_nocache(cpu, 1, tb_find_fast(cpu, &last_tb, 0), true);
+*ret = -1;
+return true;
+#endif
+}
+
+return false;
+}
+
 /* main execution loop */
 
 int cpu_exec(CPUState *cpu)
@@ -425,50 +474,12 @@ int cpu_exec(CPUState *cpu)
  */
 init_delay_params(&sc, cpu);
 
-/* prepare setjmp context for exception handling */
 for(;;) {
+/* prepare setjmp context for exception handling */
 if (sigsetjmp(cpu->jmp_env, 0) == 0) {
 /* if an exception is pending, we execute it here */
-if (cpu->exception_index >= 0) {
-if (cpu->exception_index >= EXCP_INTERRUPT) {
-/* exit request from the cpu execution loop */
-ret = cpu->exception_index;
-if (ret == EXCP_DEBUG) {
-cpu_handle_debug_exception(cpu);
-}
-cpu->exception_index = -1;
-break;
-} else {
-#if defined(CONFIG_USER_ONLY)
-/* if user mode only, we simulate a fake exception
-   which will be handled outside the cpu execution
-   loop */
-#if defined(TARGET_I386)
-cc->do_interrupt(cpu);
-#endif
-ret = cpu->exception_index;
-cpu->exception_index = -1;
-break;
-#else
-if (replay_exception()) {
-cc->do_interrupt(cpu);
-cpu->exception_index = -1;
-} else if (!replay_has_interrupt()) {
-/* give a chance to iothread in replay mode */
-ret = EXCP_INTERRUPT;
-break;
-}
-#endif
-}
-#ifndef CONFIG_USER_ONLY
-} else if (replay_has_exception()
-   && cpu->icount_decr.u16.low + cpu->icount_extra == 0) {
-/* try to cause an exception pending in the log */
-last_tb = NULL; /* Avoid chaining TBs */
-cpu_exec_nocache(cpu, 1, tb_find_fast(cpu, &last_tb, 0), true);
-ret = -1;
+if (cpu_handle_exception(cpu, &ret)) {
 break;
-#endif
 }
 

[Qemu-devel] [PULL 27/39] tcg: reorganize tb_find_physical loop

2016-05-12 Thread Richard Henderson
From: Alex Bennée 

Add some comments and improve the code structure. This should make the
code easier to read.
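
The restructured loop follows a common pattern: walk one hash chain, stop on
a match, and move the matching entry to the head of the chain. A minimal
sketch of that pattern is shown below, with a hypothetical Node type and a
single integer key standing in for the pc/page/flags comparison.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical chain entry; the real code compares several TB fields. */
typedef struct Node {
    unsigned long key;
    struct Node *next;
} Node;

/* Search the chain at *head; on a hit, move the node to the front. */
static Node *find_move_to_front(Node **head, unsigned long key)
{
    Node **link = head;  /* the pointer that currently points at n */
    Node *n = *link;

    while (n) {
        if (n->key == key) {
            break;       /* done, we have a match */
        }
        link = &n->next;
        n = *link;
    }

    if (n) {
        /* Unlink n, then reinsert it at the head of the chain. */
        *link = n->next;
        n->next = *head;
        *head = n;
    }
    return n;
}
```

Keeping a pointer to the link rather than to the previous node means the
head and non-head cases need no special handling when unlinking.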

Signed-off-by: Alex Bennée 
[Sergey Fedorov: provide commit message; bring back resetting of
tb_invalidated_flag]
Signed-off-by: Sergey Fedorov 
Reviewed-by: Richard Henderson  
Signed-off-by: Richard Henderson 
---
 cpu-exec.c | 44 
 1 file changed, 24 insertions(+), 20 deletions(-)

diff --git a/cpu-exec.c b/cpu-exec.c
index f984dc7..02a4907 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -223,10 +223,9 @@ static TranslationBlock *tb_find_physical(CPUState *cpu,
   uint32_t flags)
 {
 CPUArchState *env = (CPUArchState *)cpu->env_ptr;
-TranslationBlock *tb, **ptb1;
+TranslationBlock *tb, **tb_hash_head, **ptb1;
 unsigned int h;
 tb_page_addr_t phys_pc, phys_page1;
-target_ulong virt_page2;
 
 tcg_ctx.tb_ctx.tb_invalidated_flag = 0;
 
@@ -234,37 +233,42 @@ static TranslationBlock *tb_find_physical(CPUState *cpu,
 phys_pc = get_page_addr_code(env, pc);
 phys_page1 = phys_pc & TARGET_PAGE_MASK;
 h = tb_phys_hash_func(phys_pc);
-ptb1 = &tcg_ctx.tb_ctx.tb_phys_hash[h];
-for(;;) {
-tb = *ptb1;
-if (!tb) {
-return NULL;
-}
+
+/* Start at head of the hash entry */
+ptb1 = tb_hash_head = &tcg_ctx.tb_ctx.tb_phys_hash[h];
+tb = *ptb1;
+
+while (tb) {
 if (tb->pc == pc &&
 tb->page_addr[0] == phys_page1 &&
 tb->cs_base == cs_base &&
 tb->flags == flags) {
-/* check next page if needed */
-if (tb->page_addr[1] != -1) {
-tb_page_addr_t phys_page2;
 
-virt_page2 = (pc & TARGET_PAGE_MASK) +
-TARGET_PAGE_SIZE;
-phys_page2 = get_page_addr_code(env, virt_page2);
+if (tb->page_addr[1] == -1) {
+/* done, we have a match */
+break;
+} else {
+/* check next page if needed */
+target_ulong virt_page2 = (pc & TARGET_PAGE_MASK) +
+  TARGET_PAGE_SIZE;
+tb_page_addr_t phys_page2 = get_page_addr_code(env, virt_page2);
+
 if (tb->page_addr[1] == phys_page2) {
 break;
 }
-} else {
-break;
 }
 }
+
 ptb1 = &tb->phys_hash_next;
+tb = *ptb1;
 }
 
-/* Move the TB to the head of the list */
-*ptb1 = tb->phys_hash_next;
-tb->phys_hash_next = tcg_ctx.tb_ctx.tb_phys_hash[h];
-tcg_ctx.tb_ctx.tb_phys_hash[h] = tb;
+if (tb) {
+/* Move the TB to the head of the list */
+*ptb1 = tb->phys_hash_next;
+tb->phys_hash_next = *tb_hash_head;
+*tb_hash_head = tb;
+}
 return tb;
 }
 
-- 
2.5.5




[Qemu-devel] [PULL 16/39] tcg: Clean up direct block chaining data fields

2016-05-12 Thread Richard Henderson
From: Sergey Fedorov 

Briefly describe in a comment how direct block chaining is done. This
should help in understanding the data fields that follow.

Rename some fields in TranslationBlock and TCGContext structures to
better reflect their purpose (dropping excessive 'tb_' prefix in
TranslationBlock but keeping it in TCGContext):
   tb_next_offset  =>  jmp_reset_offset
   tb_jmp_offset   =>  jmp_insn_offset
   tb_next =>  jmp_target_addr
   jmp_next=>  jmp_list_next
   jmp_first   =>  jmp_list_first

Avoid using a magic constant as an invalid offset which is used to
indicate that there's no n-th jump generated.
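
The list encoding that the new comment in struct TranslationBlock describes
can be modeled in a few lines. The sketch below is an illustration under
assumptions, not the QEMU code: the Block type and the helpers block_init(),
add_jump() and count_incoming() are hypothetical, and the links are stored
as uintptr_t so that the two low tag bits can be manipulated directly.

```c
#include <assert.h>
#include <stdint.h>

enum { TAG_MASK = 3, TAG_OWNER = 2 };

/* Hypothetical model of the jump-list fields of a translation block. */
typedef struct Block {
    uintptr_t list_next[2]; /* next link in the list this exit belongs to */
    uintptr_t list_first;   /* head of the list of blocks jumping here */
} Block;

static void block_init(Block *b)
{
    b->list_next[0] = 0;
    b->list_next[1] = 0;
    /* Tag 2 marks a pointer back to the target block of the list. */
    b->list_first = (uintptr_t)b | TAG_OWNER;
}

/* Record that exit n of src jumps to dst (push onto dst's list). */
static void add_jump(Block *src, int n, Block *dst)
{
    assert(n == 0 || n == 1);
    if (src->list_next[n]) {
        return; /* already linked, nothing to do */
    }
    src->list_next[n] = dst->list_first;
    dst->list_first = (uintptr_t)src | (uintptr_t)n;
}

/* Count the blocks jumping to dst by walking until the owner tag. */
static int count_incoming(const Block *dst)
{
    int count = 0;
    uintptr_t entry = dst->list_first;

    while ((entry & TAG_MASK) != TAG_OWNER) {
        const Block *src = (const Block *)(entry & ~(uintptr_t)TAG_MASK);

        /* Tag 0/1 says which exit of src is part of this list. */
        entry = src->list_next[entry & TAG_MASK];
        count++;
    }
    return count;
}
```

Since a block has two exits, it can sit on two different lists at once, one
per list_next slot, which is why the tag has to travel with each pointer.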

Signed-off-by: Sergey Fedorov 
Signed-off-by: Sergey Fedorov 
Reviewed-by: Alex Bennée 
Signed-off-by: Richard Henderson 
---
 include/exec/exec-all.h  | 44 --
 tcg/aarch64/tcg-target.inc.c |  7 +++---
 tcg/arm/tcg-target.inc.c |  8 +++
 tcg/i386/tcg-target.inc.c|  8 +++
 tcg/ia64/tcg-target.inc.c|  6 +++---
 tcg/mips/tcg-target.inc.c|  8 +++
 tcg/ppc/tcg-target.inc.c |  6 +++---
 tcg/s390/tcg-target.inc.c| 11 +-
 tcg/sparc/tcg-target.inc.c   |  9 
 tcg/tcg.h|  6 +++---
 tcg/tci/tcg-target.inc.c | 10 -
 translate-all.c  | 51 +++-
 12 files changed, 96 insertions(+), 78 deletions(-)

diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index 6c113a3..445d946 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -259,20 +259,32 @@ struct TranslationBlock {
 struct TranslationBlock *page_next[2];
 tb_page_addr_t page_addr[2];
 
-/* the following data are used to directly call another TB from
-   the code of this one. */
-uint16_t tb_next_offset[2]; /* offset of original jump target */
+/* The following data are used to directly call another TB from
+ * the code of this one. This can be done either by emitting direct or
+ * indirect native jump instructions. These jumps are reset so that the TB
+ * just continue its execution. The TB can be linked to another one by
+ * setting one of the jump targets (or patching the jump instruction). Only
+ * two of such jumps are supported.
+ */
+uint16_t jmp_reset_offset[2]; /* offset of original jump target */
+#define TB_JMP_RESET_OFFSET_INVALID 0xffff /* indicates no jump generated */
 #ifdef USE_DIRECT_JUMP
-uint16_t tb_jmp_offset[2]; /* offset of jump instruction */
+uint16_t jmp_insn_offset[2]; /* offset of native jump instruction */
 #else
-uintptr_t tb_next[2]; /* address of jump generated code */
+uintptr_t jmp_target_addr[2]; /* target address for indirect jump */
 #endif
-/* list of TBs jumping to this one. This is a circular list using
-   the two least significant bits of the pointers to tell what is
-   the next pointer: 0 = jmp_next[0], 1 = jmp_next[1], 2 =
-   jmp_first */
-struct TranslationBlock *jmp_next[2];
-struct TranslationBlock *jmp_first;
+/* Each TB has an associated circular list of TBs jumping to this one.
+ * jmp_list_first points to the first TB jumping to this one.
+ * jmp_list_next is used to point to the next TB in a list.
+ * Since each TB can have two jumps, it can participate in two lists.
+ * The two least significant bits of a pointer are used to choose which
+ * data field holds a pointer to the next TB:
+ * 0 => jmp_list_next[0], 1 => jmp_list_next[1], 2 => jmp_list_first.
+ * In other words, 0/1 tells which jump is used in the pointed TB,
+ * and 2 means that this is a pointer back to the target TB of this list.
+ */
+struct TranslationBlock *jmp_list_next[2];
+struct TranslationBlock *jmp_list_first;
 };
 
 #include "qemu/thread.h"
@@ -340,7 +352,7 @@ void tb_set_jmp_target1(uintptr_t jmp_addr, uintptr_t addr);
 static inline void tb_set_jmp_target(TranslationBlock *tb,
  int n, uintptr_t addr)
 {
-uint16_t offset = tb->tb_jmp_offset[n];
+uint16_t offset = tb->jmp_insn_offset[n];
 tb_set_jmp_target1((uintptr_t)(tb->tc_ptr + offset), addr);
 }
 
@@ -350,7 +362,7 @@ static inline void tb_set_jmp_target(TranslationBlock *tb,
 static inline void tb_set_jmp_target(TranslationBlock *tb,
  int n, uintptr_t addr)
 {
-tb->tb_next[n] = addr;
+tb->jmp_target_addr[n] = addr;
 }
 
 #endif
@@ -359,7 +371,7 @@ static inline void tb_add_jump(TranslationBlock *tb, int n,
TranslationBlock *tb_next)
 {
 /* NOTE: this test is only needed for thread safety */
-if (!tb->jmp_next[n]) {
+if (!tb->jmp_list_next[n]) {
 qemu_log_mask_and_addr(CPU_LOG_EXEC, tb->pc,
"Linking TBs %p [" TARGET_FMT_lx
   

[Qemu-devel] [PULL 22/39] tcg: Extract removing of jumps to TB from tb_phys_invalidate()

2016-05-12 Thread Richard Henderson
From: Sergey Fedorov 

Move the code for removing jumps to a TB out of tb_phys_invalidate() to
a separate static inline function tb_jmp_unlink(). This simplifies
tb_phys_invalidate() and improves code structure.

Signed-off-by: Sergey Fedorov 
Signed-off-by: Sergey Fedorov 
Reviewed-by: Alex Bennée 
Signed-off-by: Richard Henderson 
---
 translate-all.c | 44 ++--
 1 file changed, 26 insertions(+), 18 deletions(-)

diff --git a/translate-all.c b/translate-all.c
index 5e057ba..9a57aab 100644
--- a/translate-all.c
+++ b/translate-all.c
@@ -962,14 +962,37 @@ static inline void tb_reset_jump(TranslationBlock *tb, 
int n)
 tb_set_jmp_target(tb, n, addr);
 }
 
+/* remove any jumps to the TB */
+static inline void tb_jmp_unlink(TranslationBlock *tb)
+{
+uintptr_t tb1, tb2;
+unsigned int n1;
+
+tb1 = tb->jmp_list_first;
+for (;;) {
+TranslationBlock *tmp_tb;
+n1 = tb1 & 3;
+if (n1 == 2) {
+break;
+}
+tmp_tb = (TranslationBlock *)(tb1 & ~3);
+tb2 = tmp_tb->jmp_list_next[n1];
+tb_reset_jump(tmp_tb, n1);
+tmp_tb->jmp_list_next[n1] = (uintptr_t)NULL;
+tb1 = tb2;
+}
+
+assert(((uintptr_t)tb & 3) == 0);
+tb->jmp_list_first = (uintptr_t)tb | 2; /* fail safe */
+}
+
 /* invalidate one TB */
 void tb_phys_invalidate(TranslationBlock *tb, tb_page_addr_t page_addr)
 {
 CPUState *cpu;
 PageDesc *p;
-unsigned int h, n1;
+unsigned int h;
 tb_page_addr_t phys_pc;
-uintptr_t tb1, tb2;
 
 /* remove the TB from the hash list */
 phys_pc = tb->page_addr[0] + (tb->pc & ~TARGET_PAGE_MASK);
@@ -1003,22 +1026,7 @@ void tb_phys_invalidate(TranslationBlock *tb, 
tb_page_addr_t page_addr)
 tb_remove_from_jmp_list(tb, 1);
 
 /* suppress any remaining jumps to this TB */
-tb1 = tb->jmp_list_first;
-for (;;) {
-TranslationBlock *tmp_tb;
-n1 = tb1 & 3;
-if (n1 == 2) {
-break;
-}
-tmp_tb = (TranslationBlock *)(tb1 & ~3);
-tb2 = tmp_tb->jmp_list_next[n1];
-tb_reset_jump(tmp_tb, n1);
-tmp_tb->jmp_list_next[n1] = (uintptr_t)NULL;
-tb1 = tb2;
-}
-
-assert(((uintptr_t)tb & 3) == 0);
-tb->jmp_list_first = (uintptr_t)tb | 2; /* fail safe */
+tb_jmp_unlink(tb);
 
 tcg_ctx.tb_ctx.tb_phys_invalidate_count++;
 }
-- 
2.5.5
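
The list walk that tb_jmp_unlink() performs can be modelled outside QEMU. The following is a minimal sketch, not the QEMU implementation: ToyTB, jump_linked and the toy_* functions are hypothetical names invented for illustration, and resetting a jump is reduced to clearing a flag. It shows how the two least significant bits of each pointer select the next field to follow, with tag 2 marking the pointer back to the list owner.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for QEMU's TranslationBlock. */
typedef struct ToyTB {
    uintptr_t jmp_list_next[2]; /* next entry of the list that jump n is on */
    uintptr_t jmp_list_first;   /* head of the list of TBs jumping here */
    int jump_linked[2];         /* toy flag: 1 while outgoing jump n is linked */
} ToyTB;

/* An empty list points back at its owner, tagged with 2. */
static void toy_tb_init(ToyTB *tb)
{
    assert(((uintptr_t)tb & 3) == 0);
    tb->jmp_list_first = (uintptr_t)tb | 2;
    tb->jmp_list_next[0] = 0;
    tb->jmp_list_next[1] = 0;
    tb->jump_linked[0] = 0;
    tb->jump_linked[1] = 0;
}

/* Link outgoing jump n of 'from' to 'to' by pushing onto to's list. */
static void toy_tb_add_jump(ToyTB *from, int n, ToyTB *to)
{
    from->jmp_list_next[n] = to->jmp_list_first;
    to->jmp_list_first = (uintptr_t)from | (uintptr_t)n;
    from->jump_linked[n] = 1;
}

/* Reset every jump into 'tb'; returns how many jumps were unlinked. */
static int toy_tb_jmp_unlink(ToyTB *tb)
{
    uintptr_t cur = tb->jmp_list_first;
    int count = 0;

    for (;;) {
        unsigned int n = cur & 3;
        if (n == 2) {
            break; /* reached the pointer back to the list owner */
        }
        ToyTB *src = (ToyTB *)(cur & ~(uintptr_t)3);
        uintptr_t next = src->jmp_list_next[n];
        src->jump_linked[n] = 0;    /* stands in for tb_reset_jump() */
        src->jmp_list_next[n] = 0;
        cur = next;
        count++;
    }
    tb->jmp_list_first = (uintptr_t)tb | 2; /* list is empty again */
    return count;
}
```

Because each source TB carries one next pointer per outgoing jump, a TB can sit on two different lists at once without any extra allocation.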




[Qemu-devel] [PULL 34/39] cpu-exec: Move halt handling out of cpu_exec()

2016-05-12 Thread Richard Henderson
From: Sergey Fedorov 

Simplify cpu_exec() by extracting CPU halt state handling code out of
cpu_exec() into a new static inline function cpu_handle_halt().

Signed-off-by: Sergey Fedorov 
Signed-off-by: Sergey Fedorov 
Reviewed-by: Richard Henderson 
Message-Id: <1462962111-32237-2-git-send-email-sergey.fedo...@linaro.org>
Signed-off-by: Richard Henderson 
---
 cpu-exec.c | 38 --
 1 file changed, 24 insertions(+), 14 deletions(-)

diff --git a/cpu-exec.c b/cpu-exec.c
index d55faa5..529cac2 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -352,6 +352,28 @@ static inline TranslationBlock *tb_find_fast(CPUState *cpu,
 return tb;
 }
 
+static inline bool cpu_handle_halt(CPUState *cpu)
+{
+if (cpu->halted) {
+#if defined(TARGET_I386) && !defined(CONFIG_USER_ONLY)
+if ((cpu->interrupt_request & CPU_INTERRUPT_POLL)
+&& replay_interrupt()) {
+X86CPU *x86_cpu = X86_CPU(cpu);
+apic_poll_irq(x86_cpu->apic_state);
+cpu_reset_interrupt(cpu, CPU_INTERRUPT_POLL);
+}
+#endif
+if (!cpu_has_work(cpu)) {
+current_cpu = NULL;
+return true;
+}
+
+cpu->halted = 0;
+}
+
+return false;
+}
+
 static void cpu_handle_debug_exception(CPUState *cpu)
 {
 CPUClass *cc = CPU_GET_CLASS(cpu);
@@ -383,20 +405,8 @@ int cpu_exec(CPUState *cpu)
 /* replay_interrupt may need current_cpu */
 current_cpu = cpu;
 
-if (cpu->halted) {
-#if defined(TARGET_I386) && !defined(CONFIG_USER_ONLY)
-if ((cpu->interrupt_request & CPU_INTERRUPT_POLL)
-&& replay_interrupt()) {
-apic_poll_irq(x86_cpu->apic_state);
-cpu_reset_interrupt(cpu, CPU_INTERRUPT_POLL);
-}
-#endif
-if (!cpu_has_work(cpu)) {
-current_cpu = NULL;
-return EXCP_HALTED;
-}
-
-cpu->halted = 0;
+if (cpu_handle_halt(cpu)) {
+return EXCP_HALTED;
 }
 
 atomic_mb_set(&tcg_current_cpu, cpu);
-- 
2.5.5




[Qemu-devel] [PULL 14/39] translate-all: add missing munmap of the code_gen guard page for MIPS

2016-05-12 Thread Richard Henderson
From: "Emilio G. Cota" 

Signed-off-by: Emilio G. Cota 
Message-Id: <1461283314-2353-2-git-send-email-c...@braap.org>
Signed-off-by: Richard Henderson 
---
 translate-all.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/translate-all.c b/translate-all.c
index 6b0ecb4..93b91ba 100644
--- a/translate-all.c
+++ b/translate-all.c
@@ -684,11 +684,11 @@ static inline void *alloc_code_gen_buffer(void)
 case 1:
 if (!cross_256mb(buf2, size)) {
 /* Success!  Use the new buffer.  */
-munmap(buf, size);
+munmap(buf, size + qemu_real_host_page_size);
 break;
 }
 /* Failure.  Work with what we had.  */
-munmap(buf2, size);
+munmap(buf2, size + qemu_real_host_page_size);
 /* fallthru */
 default:
 /* Split the original buffer.  Free the smaller half.  */
-- 
2.5.5
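
The fix above is about passing munmap() the same length that was mapped, guard page included. A minimal POSIX sketch of that pairing follows; it is an assumption-laden illustration, not the QEMU allocator. The toy_* names are hypothetical, and the layout (usable bytes followed by one PROT_NONE page) is assumed from the patch subject.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map 'size' usable bytes followed by one inaccessible guard page. */
static void *toy_alloc_with_guard(size_t size, size_t page)
{
    void *buf = mmap(NULL, size + page, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED) {
        return NULL;
    }
    if (mprotect((char *)buf + size, page, PROT_NONE) != 0) {
        munmap(buf, size + page);
        return NULL;
    }
    return buf;
}

/* Release the buffer; the length must cover the guard page as well,
 * otherwise that page stays mapped. */
static int toy_free_with_guard(void *buf, size_t size, size_t page)
{
    return munmap(buf, size + page);
}

/* Allocate, touch the first and last usable byte, and free again. */
static int toy_guard_roundtrip(size_t npages)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t size = npages * page;
    char *buf = toy_alloc_with_guard(size, page);

    if (buf == NULL) {
        return -1;
    }
    buf[0] = 1;
    buf[size - 1] = 2;
    return toy_free_with_guard(buf, size, page);
}
```

Calling munmap(buf, size) here would succeed too, but it would leave one page mapped per call.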




[Qemu-devel] [PULL 15/39] translate-all: Adjust 256mb testing for mips64

2016-05-12 Thread Richard Henderson
Make sure we preserve the high 32 bits when masking for mips64.

Signed-off-by: Richard Henderson 
---
 translate-all.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/translate-all.c b/translate-all.c
index 93b91ba..79a515d 100644
--- a/translate-all.c
+++ b/translate-all.c
@@ -515,7 +515,7 @@ static inline size_t size_code_gen_buffer(size_t tb_size)
that the buffer not cross a 256MB boundary.  */
 static inline bool cross_256mb(void *addr, size_t size)
 {
-return ((uintptr_t)addr ^ ((uintptr_t)addr + size)) & 0xf0000000;
+return ((uintptr_t)addr ^ ((uintptr_t)addr + size)) & ~0x0ffffffful;
 }
 
 /* We weren't able to allocate a buffer without crossing that boundary,
@@ -523,7 +523,7 @@ static inline bool cross_256mb(void *addr, size_t size)
Returns the new base of the buffer, and adjusts code_gen_buffer_size.  */
 static inline void *split_cross_256mb(void *buf1, size_t size1)
 {
-void *buf2 = (void *)(((uintptr_t)buf1 + size1) & 0xf0000000);
+void *buf2 = (void *)(((uintptr_t)buf1 + size1) & ~0x0ffffffful);
 size_t size2 = buf1 + size1 - buf2;
 
 size1 = buf2 - buf1;
-- 
2.5.5
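
The effect of the mask change can be checked with plain 64-bit integers. This is a sketch under stated assumptions: OLD_MASK and NEW_MASK are taken to be the 256MB boundary masks 0xf0000000 and ~0x0fffffff, the toy_* helpers are hypothetical, and uint64_t stands in for a 64-bit uintptr_t. A mask covering only bits 28..31 drops the upper half of a 64-bit address when computing the split point.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define OLD_MASK UINT64_C(0xf0000000)    /* bits 28..31 only */
#define NEW_MASK (~UINT64_C(0x0fffffff)) /* bits 28..63 */

/* True if [addr, addr + size] straddles a 256MB boundary under 'mask'. */
static bool toy_cross_256mb(uint64_t addr, uint64_t size, uint64_t mask)
{
    return ((addr ^ (addr + size)) & mask) != 0;
}

/* Round the end of a buffer down to the boundary, as the split code does. */
static uint64_t toy_split_base(uint64_t end, uint64_t mask)
{
    return end & mask;
}
```

With an end address of 0x7f1234567000, the narrow mask yields 0x30000000, which is not inside the buffer at all, while the wide mask yields 0x7f1230000000.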




[Qemu-devel] [PULL 19/39] tcg: Init TB's direct jumps before making it visible

2016-05-12 Thread Richard Henderson
From: Sergey Fedorov 

Initialize the TB's direct jump list data fields and reset the jumps before
tb_link_page() puts the TB into the physical hash table and the physical
page list, so that the TB is completely initialized before it becomes visible.

This is a pure rearrangement of code to a more suitable place, though it
could also serve as preparation for relaxing the locking scheme in the future.

Signed-off-by: Sergey Fedorov 
Signed-off-by: Sergey Fedorov 
Reviewed-by: Alex Bennée 
Signed-off-by: Richard Henderson 
---
 translate-all.c | 32 +++-
 1 file changed, 19 insertions(+), 13 deletions(-)

diff --git a/translate-all.c b/translate-all.c
index 4a58af4..1dc1a73 100644
--- a/translate-all.c
+++ b/translate-all.c
@@ -1134,19 +1134,6 @@ static void tb_link_page(TranslationBlock *tb, 
tb_page_addr_t phys_pc,
 tb->page_addr[1] = -1;
 }
 
-assert(((uintptr_t)tb & 3) == 0);
-tb->jmp_list_first = (uintptr_t)tb | 2;
-tb->jmp_list_next[0] = (uintptr_t)NULL;
-tb->jmp_list_next[1] = (uintptr_t)NULL;
-
-/* init original jump addresses */
-if (tb->jmp_reset_offset[0] != TB_JMP_RESET_OFFSET_INVALID) {
-tb_reset_jump(tb, 0);
-}
-if (tb->jmp_reset_offset[1] != TB_JMP_RESET_OFFSET_INVALID) {
-tb_reset_jump(tb, 1);
-}
-
 #ifdef DEBUG_TB_CHECK
 tb_page_check();
 #endif
@@ -1255,12 +1242,31 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
 ROUND_UP((uintptr_t)gen_code_buf + gen_code_size + search_size,
  CODE_GEN_ALIGN);
 
+/* init jump list */
+assert(((uintptr_t)tb & 3) == 0);
+tb->jmp_list_first = (uintptr_t)tb | 2;
+tb->jmp_list_next[0] = (uintptr_t)NULL;
+tb->jmp_list_next[1] = (uintptr_t)NULL;
+
+/* init original jump addresses which have been set during tcg_gen_code() */
+if (tb->jmp_reset_offset[0] != TB_JMP_RESET_OFFSET_INVALID) {
+tb_reset_jump(tb, 0);
+}
+if (tb->jmp_reset_offset[1] != TB_JMP_RESET_OFFSET_INVALID) {
+tb_reset_jump(tb, 1);
+}
+
 /* check next page if needed */
 virt_page2 = (pc + tb->size - 1) & TARGET_PAGE_MASK;
 phys_page2 = -1;
 if ((pc & TARGET_PAGE_MASK) != virt_page2) {
 phys_page2 = get_page_addr_code(env, virt_page2);
 }
+/* As long as consistency of the TB stuff is provided by tb_lock in user
+ * mode and is implicit in single-threaded softmmu emulation, no explicit
+ * memory barrier is required before tb_link_page() makes the TB visible
+ * through the physical hash table and physical page list.
+ */
 tb_link_page(tb, phys_pc, phys_page2);
 return tb;
 }
-- 
2.5.5




[Qemu-devel] [PULL 10/39] tcg/sparc: Make direct jump patching thread-safe

2016-05-12 Thread Richard Henderson
From: Sergey Fedorov 

Ensure direct jump patching in SPARC is atomic by using
atomic_read()/atomic_set() for code patching.

Signed-off-by: Sergey Fedorov 
Signed-off-by: Sergey Fedorov 
Reviewed-by: Alex Bennée 
Message-Id: <1461341333-19646-10-git-send-email-sergey.fedo...@linaro.org>
Signed-off-by: Richard Henderson 
---
 tcg/sparc/tcg-target.inc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tcg/sparc/tcg-target.inc.c b/tcg/sparc/tcg-target.inc.c
index d641cfd..c6479e2 100644
--- a/tcg/sparc/tcg-target.inc.c
+++ b/tcg/sparc/tcg-target.inc.c
@@ -1647,6 +1647,6 @@ void tb_set_jmp_target1(uintptr_t jmp_addr, uintptr_t 
addr)
the code_gen_buffer can't be larger than 2GB.  */
 tcg_debug_assert(disp == (int32_t)disp);
 
-*ptr = CALL | (uint32_t)disp >> 2;
+atomic_set(ptr, deposit32(CALL, 0, 30, disp >> 2));
 flush_icache_range(jmp_addr, jmp_addr + 4);
 }
-- 
2.5.5
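
The instruction word written by the patched line can be reproduced in portable C. This is a rough sketch rather than the QEMU code: toy_deposit32() mimics the bit-field insert that deposit32() is understood to perform, TOY_CALL assumes the SPARC CALL format (op = 01 in bits 31:30 followed by a 30-bit word displacement), and a C11 relaxed atomic store stands in for atomic_set().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define TOY_CALL UINT32_C(0x40000000) /* assumed SPARC CALL opcode bits */

/* Insert the low 'length' bits of 'field' into 'value' at bit 'start'. */
static uint32_t toy_deposit32(uint32_t value, int start, int length,
                              uint32_t field)
{
    uint32_t mask = (~UINT32_C(0) >> (32 - length)) << start;
    return (value & ~mask) | ((field << start) & mask);
}

/* Encode a CALL whose target lies 'disp' bytes from the instruction. */
static uint32_t toy_encode_call(int32_t disp)
{
    return toy_deposit32(TOY_CALL, 0, 30, (uint32_t)disp >> 2);
}

/* Publish the new instruction word with a single aligned 32-bit store. */
static void toy_patch_call(_Atomic uint32_t *insn, int32_t disp)
{
    atomic_store_explicit(insn, toy_encode_call(disp), memory_order_relaxed);
}
```

The point of the single store is that a concurrent reader sees either the old word or the new one, never a mix; flushing the instruction cache remains a separate step.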




[Qemu-devel] [PULL 29/39] tcg: Clean up from 'next_tb'

2016-05-12 Thread Richard Henderson
From: Sergey Fedorov 

The value returned from tcg_qemu_tb_exec() is the value passed to the
corresponding tcg_gen_exit_tb() at translation time of the last TB whose
execution was attempted. It is a little confusing to store it in a variable
named 'next_tb'. In fact, it is a combination of a 4-byte aligned pointer
and additional information in its two least significant bits. Break it
down right away into two variables named 'last_tb' and 'tb_exit', which
are a pointer to the last TB whose execution was attempted and the TB exit
reason, respectively. This simplifies the code and improves its
readability.
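
The split described above can be sketched in isolation. The names below (ToyBlock, ToyExit, TOY_TB_EXIT_MASK and the toy_* helpers) are hypothetical; only the idea of a 4-byte aligned pointer carrying an exit reason in its two low bits comes from the description.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_TB_EXIT_MASK 3 /* two low bits carry the exit reason */

typedef struct ToyBlock {
    uint64_t pc;
} ToyBlock;

typedef struct ToyExit {
    ToyBlock *last_tb; /* last block whose execution was attempted */
    int tb_exit;       /* why execution left that block */
} ToyExit;

/* Combine a block pointer and an exit reason into one word. */
static uintptr_t toy_pack_exit(ToyBlock *tb, int reason)
{
    assert(((uintptr_t)tb & TOY_TB_EXIT_MASK) == 0);
    assert(reason >= 0 && reason <= TOY_TB_EXIT_MASK);
    return (uintptr_t)tb | (uintptr_t)reason;
}

/* Break the word down right away instead of masking at every use. */
static ToyExit toy_unpack_exit(uintptr_t ret)
{
    ToyExit e;
    e.last_tb = (ToyBlock *)(ret & ~(uintptr_t)TOY_TB_EXIT_MASK);
    e.tb_exit = (int)(ret & TOY_TB_EXIT_MASK);
    return e;
}
```

Unpacking once keeps later code free of repeated mask expressions and makes it obvious which half of the value each test depends on.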

Correct a misleading documentation comment for tcg_qemu_tb_exec() and
fix logging in cpu_tb_exec(). Also rename a misleading 'next_tb' in
another couple of places.

Signed-off-by: Sergey Fedorov 
Signed-off-by: Sergey Fedorov 
Signed-off-by: Richard Henderson 
---
 cpu-exec.c   | 59 ---
 tcg/tcg.h| 19 ++-
 tci.c|  6 +++---
 trace-events |  2 +-
 4 files changed, 46 insertions(+), 40 deletions(-)

diff --git a/cpu-exec.c b/cpu-exec.c
index bd831b5..9407c66 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -136,7 +136,9 @@ static void init_delay_params(SyncClocks *sc, const 
CPUState *cpu)
 static inline tcg_target_ulong cpu_tb_exec(CPUState *cpu, TranslationBlock 
*itb)
 {
 CPUArchState *env = cpu->env_ptr;
-uintptr_t next_tb;
+uintptr_t ret;
+TranslationBlock *last_tb;
+int tb_exit;
 uint8_t *tb_ptr = itb->tc_ptr;
 
 qemu_log_mask_and_addr(CPU_LOG_EXEC, itb->pc,
@@ -160,36 +162,37 @@ static inline tcg_target_ulong cpu_tb_exec(CPUState *cpu, 
TranslationBlock *itb)
 #endif /* DEBUG_DISAS */
 
 cpu->can_do_io = !use_icount;
-next_tb = tcg_qemu_tb_exec(env, tb_ptr);
+ret = tcg_qemu_tb_exec(env, tb_ptr);
 cpu->can_do_io = 1;
-trace_exec_tb_exit((void *) (next_tb & ~TB_EXIT_MASK),
-   next_tb & TB_EXIT_MASK);
+last_tb = (TranslationBlock *)(ret & ~TB_EXIT_MASK);
+tb_exit = ret & TB_EXIT_MASK;
+trace_exec_tb_exit(last_tb, tb_exit);
 
-if ((next_tb & TB_EXIT_MASK) > TB_EXIT_IDX1) {
+if (tb_exit > TB_EXIT_IDX1) {
 /* We didn't start executing this TB (eg because the instruction
  * counter hit zero); we must restore the guest PC to the address
  * of the start of the TB.
  */
 CPUClass *cc = CPU_GET_CLASS(cpu);
-TranslationBlock *tb = (TranslationBlock *)(next_tb & ~TB_EXIT_MASK);
-qemu_log_mask_and_addr(CPU_LOG_EXEC, itb->pc,
+qemu_log_mask_and_addr(CPU_LOG_EXEC, last_tb->pc,
"Stopped execution of TB chain before %p ["
TARGET_FMT_lx "] %s\n",
-   itb->tc_ptr, itb->pc, lookup_symbol(itb->pc));
+   last_tb->tc_ptr, last_tb->pc,
+   lookup_symbol(last_tb->pc));
 if (cc->synchronize_from_tb) {
-cc->synchronize_from_tb(cpu, tb);
+cc->synchronize_from_tb(cpu, last_tb);
 } else {
 assert(cc->set_pc);
-cc->set_pc(cpu, tb->pc);
+cc->set_pc(cpu, last_tb->pc);
 }
 }
-if ((next_tb & TB_EXIT_MASK) == TB_EXIT_REQUESTED) {
+if (tb_exit == TB_EXIT_REQUESTED) {
 /* We were asked to stop executing TBs (probably a pending
  * interrupt). We've now stopped, so clear the flag.
  */
 cpu->tcg_exit_req = 0;
 }
-return next_tb;
+return ret;
 }
 
 #ifndef CONFIG_USER_ONLY
@@ -358,8 +361,8 @@ int cpu_exec(CPUState *cpu)
 CPUArchState *env = &x86_cpu->env;
 #endif
 int ret, interrupt_request;
-TranslationBlock *tb;
-uintptr_t next_tb;
+TranslationBlock *tb, *last_tb;
+int tb_exit = 0;
 SyncClocks sc;
 
 /* replay_interrupt may need current_cpu */
@@ -442,7 +445,7 @@ int cpu_exec(CPUState *cpu)
 #endif
 }
 
-next_tb = 0; /* force lookup of first TB */
+last_tb = NULL; /* forget the last executed TB after exception */
 for(;;) {
 interrupt_request = cpu->interrupt_request;
 if (unlikely(interrupt_request)) {
@@ -487,7 +490,7 @@ int cpu_exec(CPUState *cpu)
 else {
 replay_interrupt();
 if (cc->cpu_exec_interrupt(cpu, interrupt_request)) {
-next_tb = 0;
+last_tb = NULL;
 }
 }
 /* Don't use the cached interrupt_request value,
@@ -496,7 +499,7 @@ int cpu_exec(CPUState *cpu)
 cpu->interrupt_request &= ~CPU_INTERRUPT_EXITTB;
 /* ensure that no TB jump will be modified as
the program flow was changed */
-  

[Qemu-devel] [PULL 09/39] tcg/aarch64: Make direct jump patching thread-safe

2016-05-12 Thread Richard Henderson
From: Sergey Fedorov 

Ensure direct jump patching in AArch64 is atomic by using
atomic_read()/atomic_set() for code patching.

Signed-off-by: Sergey Fedorov 
Signed-off-by: Sergey Fedorov 
Message-Id: <1461341333-19646-9-git-send-email-sergey.fedo...@linaro.org>
Signed-off-by: Richard Henderson 
---
 tcg/aarch64/tcg-target.inc.c | 14 +-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/tcg/aarch64/tcg-target.inc.c b/tcg/aarch64/tcg-target.inc.c
index a8fb442..88183c8 100644
--- a/tcg/aarch64/tcg-target.inc.c
+++ b/tcg/aarch64/tcg-target.inc.c
@@ -73,6 +73,18 @@ static inline void reloc_pc26(tcg_insn_unit *code_ptr, 
tcg_insn_unit *target)
 *code_ptr = deposit32(*code_ptr, 0, 26, offset);
 }
 
+static inline void reloc_pc26_atomic(tcg_insn_unit *code_ptr,
+ tcg_insn_unit *target)
+{
+ptrdiff_t offset = target - code_ptr;
+tcg_insn_unit insn;
+tcg_debug_assert(offset == sextract64(offset, 0, 26));
+/* read instruction, mask away previous PC_REL26 parameter contents,
+   set the proper offset, then write back the instruction. */
+insn = atomic_read(code_ptr);
+atomic_set(code_ptr, deposit32(insn, 0, 26, offset));
+}
+
 static inline void reloc_pc19(tcg_insn_unit *code_ptr, tcg_insn_unit *target)
 {
 ptrdiff_t offset = target - code_ptr;
@@ -835,7 +847,7 @@ void aarch64_tb_set_jmp_target(uintptr_t jmp_addr, 
uintptr_t addr)
 tcg_insn_unit *code_ptr = (tcg_insn_unit *)jmp_addr;
 tcg_insn_unit *target = (tcg_insn_unit *)addr;
 
-reloc_pc26(code_ptr, target);
+reloc_pc26_atomic(code_ptr, target);
 flush_icache_range(jmp_addr, jmp_addr + 4);
 }
 
-- 
2.5.5
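
The read, modify and write-back sequence of reloc_pc26_atomic() can be imitated with C11 atomics. This sketch makes several assumptions: toy_* names are hypothetical, relaxed atomic load and store stand in for atomic_read() and atomic_set(), 0x14000000 is taken as the AArch64 B opcode with a zero offset, and a single writer at a time is assumed, since the load and the store are each atomic but the pair is not one atomic operation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Replace the low 26 bits of 'insn' with the word offset 'offset'. */
static uint32_t toy_deposit26(uint32_t insn, ptrdiff_t offset)
{
    uint32_t mask = UINT32_C(0x03ffffff);
    return (insn & ~mask) | ((uint32_t)offset & mask);
}

/* Retarget the branch at code_ptr so that it jumps to target. */
static void toy_reloc_pc26_atomic(_Atomic uint32_t *code_ptr,
                                  _Atomic uint32_t *target)
{
    ptrdiff_t offset = target - code_ptr; /* counted in instructions */
    uint32_t insn;

    /* The offset must fit in a signed 26-bit field. */
    assert(offset >= -(1 << 25) && offset < (1 << 25));
    insn = atomic_load_explicit(code_ptr, memory_order_relaxed);
    atomic_store_explicit(code_ptr, toy_deposit26(insn, offset),
                          memory_order_relaxed);
}
```

A thread executing the branch while it is being patched sees either the old or the new 32-bit word, which is the property the commit message refers to as atomic patching.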




[Qemu-devel] [PULL 18/39] tcg: Rearrange tb_link_page() to avoid forward declaration

2016-05-12 Thread Richard Henderson
From: Sergey Fedorov 

Signed-off-by: Sergey Fedorov 
Signed-off-by: Sergey Fedorov 
Reviewed-by: Alex Bennée 
Signed-off-by: Richard Henderson 
---
 translate-all.c | 204 
 1 file changed, 101 insertions(+), 103 deletions(-)

diff --git a/translate-all.c b/translate-all.c
index 2fb1646..4a58af4 100644
--- a/translate-all.c
+++ b/translate-all.c
@@ -153,8 +153,6 @@ void tb_lock_reset(void)
 #endif
 }
 
-static void tb_link_page(TranslationBlock *tb, tb_page_addr_t phys_pc,
- tb_page_addr_t phys_page2);
 static TranslationBlock *tb_find_pc(uintptr_t tc_ptr);
 
 void cpu_gen_init(void)
@@ -1053,6 +1051,107 @@ static void build_page_bitmap(PageDesc *p)
 }
 }
 
+/* add the tb in the target page and protect it if necessary
+ *
+ * Called with mmap_lock held for user-mode emulation.
+ */
+static inline void tb_alloc_page(TranslationBlock *tb,
+ unsigned int n, tb_page_addr_t page_addr)
+{
+PageDesc *p;
+#ifndef CONFIG_USER_ONLY
+bool page_already_protected;
+#endif
+
+tb->page_addr[n] = page_addr;
+p = page_find_alloc(page_addr >> TARGET_PAGE_BITS, 1);
+tb->page_next[n] = p->first_tb;
+#ifndef CONFIG_USER_ONLY
+page_already_protected = p->first_tb != NULL;
+#endif
+p->first_tb = (TranslationBlock *)((uintptr_t)tb | n);
+invalidate_page_bitmap(p);
+
+#if defined(CONFIG_USER_ONLY)
+if (p->flags & PAGE_WRITE) {
+target_ulong addr;
+PageDesc *p2;
+int prot;
+
+/* force the host page as non writable (writes will have a
+   page fault + mprotect overhead) */
+page_addr &= qemu_host_page_mask;
+prot = 0;
+for (addr = page_addr; addr < page_addr + qemu_host_page_size;
+addr += TARGET_PAGE_SIZE) {
+
+p2 = page_find(addr >> TARGET_PAGE_BITS);
+if (!p2) {
+continue;
+}
+prot |= p2->flags;
+p2->flags &= ~PAGE_WRITE;
+  }
+mprotect(g2h(page_addr), qemu_host_page_size,
+ (prot & PAGE_BITS) & ~PAGE_WRITE);
+#ifdef DEBUG_TB_INVALIDATE
+printf("protecting code page: 0x" TARGET_FMT_lx "\n",
+   page_addr);
+#endif
+}
+#else
+/* if some code is already present, then the pages are already
+   protected. So we handle the case where only the first TB is
+   allocated in a physical page */
+if (!page_already_protected) {
+tlb_protect_code(page_addr);
+}
+#endif
+}
+
+/* add a new TB and link it to the physical page tables. phys_page2 is
+ * (-1) to indicate that only one page contains the TB.
+ *
+ * Called with mmap_lock held for user-mode emulation.
+ */
+static void tb_link_page(TranslationBlock *tb, tb_page_addr_t phys_pc,
+ tb_page_addr_t phys_page2)
+{
+unsigned int h;
+TranslationBlock **ptb;
+
+/* add in the physical hash table */
+h = tb_phys_hash_func(phys_pc);
+ptb = &tcg_ctx.tb_ctx.tb_phys_hash[h];
+tb->phys_hash_next = *ptb;
+*ptb = tb;
+
+/* add in the page list */
+tb_alloc_page(tb, 0, phys_pc & TARGET_PAGE_MASK);
+if (phys_page2 != -1) {
+tb_alloc_page(tb, 1, phys_page2);
+} else {
+tb->page_addr[1] = -1;
+}
+
+assert(((uintptr_t)tb & 3) == 0);
+tb->jmp_list_first = (uintptr_t)tb | 2;
+tb->jmp_list_next[0] = (uintptr_t)NULL;
+tb->jmp_list_next[1] = (uintptr_t)NULL;
+
+/* init original jump addresses */
+if (tb->jmp_reset_offset[0] != TB_JMP_RESET_OFFSET_INVALID) {
+tb_reset_jump(tb, 0);
+}
+if (tb->jmp_reset_offset[1] != TB_JMP_RESET_OFFSET_INVALID) {
+tb_reset_jump(tb, 1);
+}
+
+#ifdef DEBUG_TB_CHECK
+tb_page_check();
+#endif
+}
+
 /* Called with mmap_lock held for user mode emulation.  */
 TranslationBlock *tb_gen_code(CPUState *cpu,
   target_ulong pc, target_ulong cs_base,
@@ -1410,107 +1509,6 @@ static void tb_invalidate_phys_page(tb_page_addr_t addr,
 }
 #endif
 
-/* add the tb in the target page and protect it if necessary
- *
- * Called with mmap_lock held for user-mode emulation.
- */
-static inline void tb_alloc_page(TranslationBlock *tb,
- unsigned int n, tb_page_addr_t page_addr)
-{
-PageDesc *p;
-#ifndef CONFIG_USER_ONLY
-bool page_already_protected;
-#endif
-
-tb->page_addr[n] = page_addr;
-p = page_find_alloc(page_addr >> TARGET_PAGE_BITS, 1);
-tb->page_next[n] = p->first_tb;
-#ifndef CONFIG_USER_ONLY
-page_already_protected = p->first_tb != NULL;
-#endif
-p->first_tb = (TranslationBlock *)((uintptr_t)tb | n);
-invalidate_page_bitmap(p);
-
-#if defined(CONFIG_USER_ONLY)
-if (p->flags & PAGE_WRITE) {
-target_ulong addr;
-PageDesc *p2;
-int prot;
-
-

[Qemu-devel] [PULL 13/39] translate-all: remove redundant setting of tcg_ctx.code_gen_buffer_size

2016-05-12 Thread Richard Henderson
From: "Emilio G. Cota" 

The setting of tcg_ctx.code_gen_buffer_size is done by the only caller of
size_code_gen_buffer(), which is code_gen_alloc():

  $ git grep size_code_gen_buffer
  translate-all.c:static inline size_t size_code_gen_buffer(size_t tb_size)
  translate-all.c:tcg_ctx.code_gen_buffer_size = 
size_code_gen_buffer(tb_size);

Signed-off-by: Emilio G. Cota 
Message-Id: <1461283314-2353-1-git-send-email-c...@braap.org>
Signed-off-by: Richard Henderson 
---
 translate-all.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/translate-all.c b/translate-all.c
index 781819e..6b0ecb4 100644
--- a/translate-all.c
+++ b/translate-all.c
@@ -507,7 +507,6 @@ static inline size_t size_code_gen_buffer(size_t tb_size)
 if (tb_size > MAX_CODE_GEN_BUFFER_SIZE) {
 tb_size = MAX_CODE_GEN_BUFFER_SIZE;
 }
-tcg_ctx.code_gen_buffer_size = tb_size;
 return tb_size;
 }
 
-- 
2.5.5




[Qemu-devel] [PULL 04/39] tci: Make direct jump patching thread-safe

2016-05-12 Thread Richard Henderson
From: Sergey Fedorov 

Ensure direct jump patching in TCI is atomic by:
 * naturally aligning the location of the direct jump address;
 * using atomic_read()/atomic_set() to load/store the address.

Signed-off-by: Sergey Fedorov 
Signed-off-by: Sergey Fedorov 
Message-Id: <1461341333-19646-4-git-send-email-sergey.fedo...@linaro.org>
Signed-off-by: Richard Henderson 
---
 include/exec/exec-all.h  | 2 +-
 tcg/tci/tcg-target.inc.c | 2 ++
 tci.c| 5 -
 3 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index c75fb3a..d49befd 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -303,7 +303,7 @@ void tb_phys_invalidate(TranslationBlock *tb, 
tb_page_addr_t page_addr);
 static inline void tb_set_jmp_target1(uintptr_t jmp_addr, uintptr_t addr)
 {
 /* patch the branch destination */
-*(uint32_t *)jmp_addr = addr - (jmp_addr + 4);
+atomic_set((int32_t *)jmp_addr, addr - (jmp_addr + 4));
 /* no need to flush icache explicitly */
 }
 #elif defined(_ARCH_PPC)
diff --git a/tcg/tci/tcg-target.inc.c b/tcg/tci/tcg-target.inc.c
index e2fc52a..85eeb5d 100644
--- a/tcg/tci/tcg-target.inc.c
+++ b/tcg/tci/tcg-target.inc.c
@@ -556,6 +556,8 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const 
TCGArg *args,
 if (s->tb_jmp_offset) {
 /* Direct jump method. */
 tcg_debug_assert(args[0] < ARRAY_SIZE(s->tb_jmp_offset));
+/* Align for atomic patching and thread safety */
+s->code_ptr = QEMU_ALIGN_PTR_UP(s->code_ptr, 4);
 s->tb_jmp_offset[args[0]] = tcg_current_code_size(s);
 tcg_out32(s, 0);
 } else {
diff --git a/tci.c b/tci.c
index 82705fe..a8939e6 100644
--- a/tci.c
+++ b/tci.c
@@ -1089,7 +1089,10 @@ uintptr_t tcg_qemu_tb_exec(CPUArchState *env, uint8_t 
*tb_ptr)
 goto exit;
 break;
 case INDEX_op_goto_tb:
-t0 = tci_read_i32(&tb_ptr);
+/* Jump address is aligned */
+tb_ptr = QEMU_ALIGN_PTR_UP(tb_ptr, 4);
+t0 = atomic_read((int32_t *)tb_ptr);
+tb_ptr += sizeof(int32_t);
 tci_assert(tb_ptr == old_code_ptr + op_size);
 tb_ptr += (int32_t)t0;
 continue;
-- 
2.5.5
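
The alignment step in this patch can be illustrated with a byte buffer. What follows is a sketch with hypothetical names (toy_align_up, toy_emit_goto); it assumes a one-byte opcode followed by a 4-byte jump slot and a buffer whose base address is itself 4-byte aligned, so that an aligned offset means an aligned address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Round 'off' up to the next multiple of 'align' (a power of two). */
static size_t toy_align_up(size_t off, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (off + align - 1) & ~(align - 1);
}

/* Emit an opcode byte, pad to a 4-byte boundary, then reserve a zeroed
 * 4-byte jump slot. Returns the offset of the slot within buf. */
static size_t toy_emit_goto(uint8_t *buf, size_t *pos, uint8_t opcode)
{
    size_t slot;

    buf[(*pos)++] = opcode;
    *pos = toy_align_up(*pos, 4);
    slot = *pos;
    memset(buf + slot, 0, 4);
    *pos += 4;
    return slot;
}
```

The reader has to apply the same rounding before loading the slot, which is why the interpreter side of the patch aligns tb_ptr as well. A naturally aligned 32-bit slot is what allows a single atomic load or store to cover it.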




[Qemu-devel] [PULL 08/39] tcg/arm: Make direct jump patching thread-safe

2016-05-12 Thread Richard Henderson
From: Sergey Fedorov 

Ensure direct jump patching in ARM is atomic by using
atomic_read()/atomic_set() for code patching.

Signed-off-by: Sergey Fedorov 
Signed-off-by: Sergey Fedorov 
Message-Id: <1461341333-19646-8-git-send-email-sergey.fedo...@linaro.org>
Signed-off-by: Richard Henderson 
---
 include/exec/exec-all.h  | 25 ++---
 tcg/arm/tcg-target.inc.c | 18 ++
 2 files changed, 20 insertions(+), 23 deletions(-)

diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index 6c095e8..30cdd69 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -328,29 +328,8 @@ static inline void tb_set_jmp_target1(uintptr_t jmp_addr, 
uintptr_t addr)
 void aarch64_tb_set_jmp_target(uintptr_t jmp_addr, uintptr_t addr);
 #define tb_set_jmp_target1 aarch64_tb_set_jmp_target
 #elif defined(__arm__)
-static inline void tb_set_jmp_target1(uintptr_t jmp_addr, uintptr_t addr)
-{
-#if !QEMU_GNUC_PREREQ(4, 1)
-register unsigned long _beg __asm ("a1");
-register unsigned long _end __asm ("a2");
-register unsigned long _flg __asm ("a3");
-#endif
-
-/* we could use a ldr pc, [pc, #-4] kind of branch and avoid the flush */
-*(uint32_t *)jmp_addr =
-(*(uint32_t *)jmp_addr & ~0xffffff)
-| (((addr - (jmp_addr + 8)) >> 2) & 0xffffff);
-
-#if QEMU_GNUC_PREREQ(4, 1)
-__builtin___clear_cache((char *) jmp_addr, (char *) jmp_addr + 4);
-#else
-/* flush icache */
-_beg = jmp_addr;
-_end = jmp_addr + 4;
-_flg = 0;
-__asm __volatile__ ("swi 0x9f0002" : : "r" (_beg), "r" (_end), "r" (_flg));
-#endif
-}
+void arm_tb_set_jmp_target(uintptr_t jmp_addr, uintptr_t addr);
+#define tb_set_jmp_target1 arm_tb_set_jmp_target
 #elif defined(__sparc__) || defined(__mips__)
 void tb_set_jmp_target1(uintptr_t jmp_addr, uintptr_t addr);
 #else
diff --git a/tcg/arm/tcg-target.inc.c b/tcg/arm/tcg-target.inc.c
index 2b7fbdd..977baa0 100644
--- a/tcg/arm/tcg-target.inc.c
+++ b/tcg/arm/tcg-target.inc.c
@@ -121,6 +121,14 @@ static inline void reloc_pc24(tcg_insn_unit *code_ptr, 
tcg_insn_unit *target)
 *code_ptr = (*code_ptr & ~0xffffff) | (offset & 0xffffff);
 }
 
+static inline void reloc_pc24_atomic(tcg_insn_unit *code_ptr, tcg_insn_unit 
*target)
+{
+ptrdiff_t offset = (tcg_ptr_byte_diff(target, code_ptr) - 8) >> 2;
+tcg_insn_unit insn = atomic_read(code_ptr);
+tcg_debug_assert(offset == sextract32(offset, 0, 24));
+atomic_set(code_ptr, deposit32(insn, 0, 24, offset));
+}
+
 static void patch_reloc(tcg_insn_unit *code_ptr, int type,
 intptr_t value, intptr_t addend)
 {
@@ -1038,6 +1046,16 @@ static void tcg_out_call(TCGContext *s, tcg_insn_unit 
*addr)
 }
 }
 
+void arm_tb_set_jmp_target(uintptr_t jmp_addr, uintptr_t addr)
+{
+tcg_insn_unit *code_ptr = (tcg_insn_unit *)jmp_addr;
+tcg_insn_unit *target = (tcg_insn_unit *)addr;
+
+/* we could use a ldr pc, [pc, #-4] kind of branch and avoid the flush */
+reloc_pc24_atomic(code_ptr, target);
+flush_icache_range(jmp_addr, jmp_addr + 4);
+}
+
 static inline void tcg_out_goto_label(TCGContext *s, int cond, TCGLabel *l)
 {
 if (l->has_value) {
-- 
2.5.5
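
The offset arithmetic in reloc_pc24_atomic() follows from the ARM-mode convention that the PC reads as the branch address plus 8. The sketch below works on plain 32-bit addresses instead of host pointers; the toy_* names are hypothetical, and 0xea000000 in the usage note is assumed to be an unconditional B with a zero offset field.

```c
#include <assert.h>
#include <stdint.h>

/* 24-bit offset field for a branch at insn_addr that targets target_addr. */
static uint32_t toy_arm_branch_field(uint32_t insn_addr, uint32_t target_addr)
{
    /* Subtract the 8-byte pipeline bias, then convert bytes to words. */
    uint32_t byte_diff = target_addr - insn_addr - 8;
    return (byte_diff >> 2) & UINT32_C(0x00ffffff);
}

/* Keep condition and opcode bits, replace only the offset field. */
static uint32_t toy_arm_patch_branch(uint32_t insn, uint32_t insn_addr,
                                     uint32_t target_addr)
{
    return (insn & UINT32_C(0xff000000))
           | toy_arm_branch_field(insn_addr, target_addr);
}
```

For example, patching 0xea000000 at address 0x1000 to branch to itself gives 0xeafffffe, and a target of 0x1010 gives 0xea000002.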




[Qemu-devel] [PULL 01/39] tb: consistently use uint32_t for tb->flags

2016-05-12 Thread Richard Henderson
From: "Emilio G. Cota" 

We are inconsistent with the type of tb->flags: usage varies loosely
between int and uint64_t. Settle on uint32_t everywhere, which is
superior to both: at least one target (aarch64) uses the most significant
bit in the u32, and uint64_t is wasteful.
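
One hazard of mixing the two types can be shown with a flags value whose most significant bit is set. This is an illustration under assumptions, not a report of an actual QEMU failure: the toy_* functions are hypothetical, and int32_t stands in for int.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Flags stored as uint64_t, looked up through a signed 32-bit value:
 * the conversion sign-extends, so bit 31 drags bits 32..63 along. */
static bool toy_flags_match_mixed(uint64_t tb_flags, int32_t lookup_flags)
{
    return tb_flags == (uint64_t)lookup_flags;
}

/* The same comparison with uint32_t on both sides. */
static bool toy_flags_match_u32(uint32_t tb_flags, uint32_t lookup_flags)
{
    return tb_flags == lookup_flags;
}
```

With bit 31 set, the stored value 0x80000000 and the sign-extended lookup value 0xffffffff80000000 no longer compare equal, whereas the uint32_t version matches.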

Compile-tested for all targets.

Suggested-by: Laurent Desnogues 
Suggested-by: Richard Henderson 
Tested-by: Edgar E. Iglesias 
Reviewed-by: Edgar E. Iglesias 
Reviewed-by: Laurent Desnogues 
Signed-off-by: Emilio G. Cota 
Signed-off-by: Richard Henderson 
Message-Id: <1460049562-23517-1-git-send-email-c...@braap.org>
---
 cpu-exec.c  |  6 +++---
 exec.c  |  2 +-
 hw/i386/kvmvapic.c  |  2 +-
 include/exec/exec-all.h |  5 +++--
 target-alpha/cpu.h  |  2 +-
 target-arm/cpu.h|  2 +-
 target-cris/cpu.h   |  2 +-
 target-i386/cpu.h   |  2 +-
 target-i386/translate.c |  2 +-
 target-lm32/cpu.h   |  2 +-
 target-m68k/cpu.h   |  2 +-
 target-microblaze/cpu.h |  2 +-
 target-mips/cpu.h   |  2 +-
 target-moxie/cpu.h  |  2 +-
 target-openrisc/cpu.h   |  2 +-
 target-ppc/cpu.h|  2 +-
 target-s390x/cpu.h  |  2 +-
 target-sh4/cpu.h|  2 +-
 target-sparc/cpu.h  |  2 +-
 target-tilegx/cpu.h |  2 +-
 target-tricore/cpu.h|  2 +-
 target-unicore32/cpu.h  |  2 +-
 target-xtensa/cpu.h |  2 +-
 translate-all.c | 10 +-
 24 files changed, 32 insertions(+), 31 deletions(-)

diff --git a/cpu-exec.c b/cpu-exec.c
index bbfcbfb..debc65c 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -220,7 +220,7 @@ static void cpu_exec_nocache(CPUState *cpu, int max_cycles,
 static TranslationBlock *tb_find_physical(CPUState *cpu,
   target_ulong pc,
   target_ulong cs_base,
-  uint64_t flags)
+  uint32_t flags)
 {
 CPUArchState *env = (CPUArchState *)cpu->env_ptr;
 TranslationBlock *tb, **ptb1;
@@ -271,7 +271,7 @@ static TranslationBlock *tb_find_physical(CPUState *cpu,
 static TranslationBlock *tb_find_slow(CPUState *cpu,
   target_ulong pc,
   target_ulong cs_base,
-  uint64_t flags)
+  uint32_t flags)
 {
 TranslationBlock *tb;
 
@@ -314,7 +314,7 @@ static inline TranslationBlock *tb_find_fast(CPUState *cpu)
 CPUArchState *env = (CPUArchState *)cpu->env_ptr;
 TranslationBlock *tb;
 target_ulong cs_base, pc;
-int flags;
+uint32_t flags;
 
 /* we record a subset of the CPU state. It will
always be the same before a given translated block
diff --git a/exec.c b/exec.c
index c4f9036..ee45472 100644
--- a/exec.c
+++ b/exec.c
@@ -2087,7 +2087,7 @@ static void check_watchpoint(int offset, int len, 
MemTxAttrs attrs, int flags)
 target_ulong pc, cs_base;
 target_ulong vaddr;
 CPUWatchpoint *wp;
-int cpu_flags;
+uint32_t cpu_flags;
 
 if (cpu->watchpoint_hit) {
 /* We re-entered the check after replacing the TB. Now raise
diff --git a/hw/i386/kvmvapic.c b/hw/i386/kvmvapic.c
index c69f374..4bb695d 100644
--- a/hw/i386/kvmvapic.c
+++ b/hw/i386/kvmvapic.c
@@ -397,7 +397,7 @@ static void patch_instruction(VAPICROMState *s, X86CPU 
*cpu, target_ulong ip)
 uint32_t imm32;
 target_ulong current_pc = 0;
 target_ulong current_cs_base = 0;
-int current_flags = 0;
+uint32_t current_flags = 0;
 
 if (smp_cpus == 1) {
 handlers = &s->rom_state.up;
diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index 7362095..c75fb3a 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -76,7 +76,8 @@ bool cpu_restore_state(CPUState *cpu, uintptr_t searched_pc);
 void QEMU_NORETURN cpu_resume_from_signal(CPUState *cpu, void *puc);
 void QEMU_NORETURN cpu_io_recompile(CPUState *cpu, uintptr_t retaddr);
 TranslationBlock *tb_gen_code(CPUState *cpu,
-  target_ulong pc, target_ulong cs_base, int flags,
+  target_ulong pc, target_ulong cs_base,
+  uint32_t flags,
   int cflags);
 void cpu_exec_init(CPUState *cpu, Error **errp);
 void QEMU_NORETURN cpu_loop_exit(CPUState *cpu);
@@ -235,7 +236,7 @@ static inline void tlb_flush_by_mmuidx(CPUState *cpu, ...)
 struct TranslationBlock {
 target_ulong pc;   /* simulated PC corresponding to this block (EIP + CS 
base) */
 target_ulong cs_base; /* CS base for this block */
-uint64_t flags; /* flags defining in which context the code was generated 
*/
+uint32_t flags; /* flags defining in which context the code was generated 
*/
  

[Qemu-devel] [PULL 12/39] tcg: Note requirement on atomic direct jump patching

2016-05-12 Thread Richard Henderson
From: Sergey Fedorov 

Signed-off-by: Sergey Fedorov 
Signed-off-by: Sergey Fedorov 
Reviewed-by: Alex Bennée 
Message-Id: <1461341333-19646-12-git-send-email-sergey.fedo...@linaro.org>
Signed-off-by: Richard Henderson 
---
 include/exec/exec-all.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index 30cdd69..6c113a3 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -230,6 +230,7 @@ static inline void tlb_flush_by_mmuidx(CPUState *cpu, ...)
 || defined(__sparc__) || defined(__aarch64__) \
 || defined(__s390x__) || defined(__mips__) \
 || defined(CONFIG_TCG_INTERPRETER)
+/* NOTE: Direct jump patching must be atomic to be thread-safe. */
 #define USE_DIRECT_JUMP
 #endif
 
-- 
2.5.5




[Qemu-devel] [PULL 05/39] tcg/ppc: Make direct jump patching thread-safe

2016-05-12 Thread Richard Henderson
From: Sergey Fedorov 

Ensure direct jump patching in PPC is atomic by:
 * limiting translation buffer size in 32-bit mode to be addressable by
   Branch I-form instruction;
 * using atomic_read()/atomic_set() for code patching.

Signed-off-by: Sergey Fedorov 
Signed-off-by: Sergey Fedorov 
Reviewed-by: Alex Bennée 
Message-Id: <1461341333-19646-5-git-send-email-sergey.fedo...@linaro.org>
Signed-off-by: Richard Henderson 
---
 tcg/ppc/tcg-target.inc.c | 22 ++
 translate-all.c  |  2 ++
 2 files changed, 20 insertions(+), 4 deletions(-)

diff --git a/tcg/ppc/tcg-target.inc.c b/tcg/ppc/tcg-target.inc.c
index 00bb90f..039fa77 100644
--- a/tcg/ppc/tcg-target.inc.c
+++ b/tcg/ppc/tcg-target.inc.c
@@ -1237,6 +1237,7 @@ static void tcg_out_brcond2 (TCGContext *s, const TCGArg 
*args,
 tcg_out_bc(s, BC | BI(7, CR_EQ) | BO_COND_TRUE, arg_label(args[5]));
 }
 
+#ifdef __powerpc64__
 void ppc_tb_set_jmp_target(uintptr_t jmp_addr, uintptr_t addr)
 {
 tcg_insn_unit i1, i2;
@@ -1265,11 +1266,18 @@ void ppc_tb_set_jmp_target(uintptr_t jmp_addr, 
uintptr_t addr)
 pair = (uint64_t)i2 << 32 | i1;
 #endif
 
-/* ??? __atomic_store_8, presuming there's some way to do that
-   for 32-bit, otherwise this is good enough for 64-bit.  */
-*(uint64_t *)jmp_addr = pair;
+atomic_set((uint64_t *)jmp_addr, pair);
 flush_icache_range(jmp_addr, jmp_addr + 8);
 }
+#else
+void ppc_tb_set_jmp_target(uintptr_t jmp_addr, uintptr_t addr)
+{
+intptr_t diff = addr - jmp_addr;
+tcg_debug_assert(in_range_b(diff));
+atomic_set((uint32_t *)jmp_addr, B | (diff & 0x3fffffc));
+flush_icache_range(jmp_addr, jmp_addr + 4);
+}
+#endif
 
 static void tcg_out_call(TCGContext *s, tcg_insn_unit *target)
 {
@@ -1895,7 +1903,9 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, 
const TCGArg *args,
 break;
 case INDEX_op_goto_tb:
 tcg_debug_assert(s->tb_jmp_offset);
-/* Direct jump.  Ensure the next insns are 8-byte aligned. */
+/* Direct jump. */
+#ifdef __powerpc64__
+/* Ensure the next insns are 8-byte aligned. */
 if ((uintptr_t)s->code_ptr & 7) {
 tcg_out32(s, NOP);
 }
@@ -1904,6 +1914,10 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, 
const TCGArg *args,
 s->code_ptr += 2;
 tcg_out32(s, MTSPR | RS(TCG_REG_TMP1) | CTR);
 tcg_out32(s, BCCTR | BO_ALWAYS);
+#else
+/* To be replaced by a branch.  */
+s->code_ptr++;
+#endif
 s->tb_next_offset[args[0]] = tcg_current_code_size(s);
 break;
 case INDEX_op_br:
diff --git a/translate-all.c b/translate-all.c
index 1a8f68b..781819e 100644
--- a/translate-all.c
+++ b/translate-all.c
@@ -464,6 +464,8 @@ static inline PageDesc *page_find(tb_page_addr_t index)
 # define MAX_CODE_GEN_BUFFER_SIZE  (2ul * 1024 * 1024 * 1024)
 #elif defined(__powerpc64__)
 # define MAX_CODE_GEN_BUFFER_SIZE  (2ul * 1024 * 1024 * 1024)
+#elif defined(__powerpc__)
+# define MAX_CODE_GEN_BUFFER_SIZE  (32u * 1024 * 1024)
 #elif defined(__aarch64__)
 # define MAX_CODE_GEN_BUFFER_SIZE  (128ul * 1024 * 1024)
 #elif defined(__arm__)
-- 
2.5.5
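
The 32-bit path in the patch above relies on the whole jump fitting in one
instruction word, so that a single aligned 32-bit store replaces it. A
minimal sketch of that encoding follows, assuming the standard Branch I-form
layout (primary opcode 18, signed 26-bit byte displacement). The names
PPC_B, ppc_in_range_b and ppc_encode_b are made up for illustration and are
not the QEMU helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Primary opcode 18 ("b") in the top six bits of the instruction word. */
#define PPC_B 0x48000000u

/* True if diff fits the signed 26-bit byte displacement of an I-form branch. */
static bool ppc_in_range_b(intptr_t diff)
{
    return diff >= -0x2000000 && diff < 0x2000000;
}

/* Build the one 32-bit instruction that would be stored into the jump slot. */
static uint32_t ppc_encode_b(uintptr_t jmp_addr, uintptr_t addr)
{
    intptr_t diff = (intptr_t)(addr - jmp_addr);

    assert(ppc_in_range_b(diff));
    assert((diff & 3) == 0);    /* instructions are word aligned */
    return PPC_B | ((uint32_t)diff & 0x3fffffcu);
}
```

With the code buffer capped at 32MB, any two addresses inside it differ by
less than 2^25 bytes, which is why the range assertion is expected to hold.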




[Qemu-devel] [PULL 06/39] tcg/i386: Make direct jump patching thread-safe

2016-05-12 Thread Richard Henderson
From: Sergey Fedorov 

Ensure direct jump patching in i386 is atomic by:
 * naturally aligning a location of direct jump address;
 * using atomic_read()/atomic_set() for code patching.

tcg_out_nopn() implementation:
Suggested-by: Richard Henderson .

Signed-off-by: Sergey Fedorov 
Signed-off-by: Sergey Fedorov 
Message-Id: <1461341333-19646-6-git-send-email-sergey.fedo...@linaro.org>
Signed-off-by: Richard Henderson 
---
 include/exec/exec-all.h   |  2 +-
 tcg/i386/tcg-target.inc.c | 23 +++
 2 files changed, 24 insertions(+), 1 deletion(-)

diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index d49befd..0ab7803 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -313,7 +313,7 @@ void ppc_tb_set_jmp_target(uintptr_t jmp_addr, uintptr_t 
addr);
 static inline void tb_set_jmp_target1(uintptr_t jmp_addr, uintptr_t addr)
 {
 /* patch the branch destination */
-stl_le_p((void*)jmp_addr, addr - (jmp_addr + 4));
+atomic_set((int32_t *)jmp_addr, addr - (jmp_addr + 4));
 /* no need to flush icache explicitly */
 }
 #elif defined(__s390x__)
diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
index 007407c..8d242a6 100644
--- a/tcg/i386/tcg-target.inc.c
+++ b/tcg/i386/tcg-target.inc.c
@@ -1123,6 +1123,21 @@ static void tcg_out_jmp(TCGContext *s, tcg_insn_unit 
*dest)
 tcg_out_branch(s, 0, dest);
 }
 
+static void tcg_out_nopn(TCGContext *s, int n)
+{
+int i;
+/* Emit 1 or 2 operand size prefixes for the standard one byte nop,
+ * "xchg %eax,%eax", forming "xchg %ax,%ax". All cores accept the
+ * duplicate prefix, and all of the interesting recent cores can
+ * decode and discard the duplicates in a single cycle.
+ */
+tcg_debug_assert(n >= 1);
+for (i = 1; i < n; ++i) {
+tcg_out8(s, 0x66);
+}
+tcg_out8(s, 0x90);
+}
+
 #if defined(CONFIG_SOFTMMU)
 /* helper signature: helper_ret_ld_mmu(CPUState *env, target_ulong addr,
  * int mmu_idx, uintptr_t ra)
@@ -1777,6 +1792,14 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode 
opc,
 case INDEX_op_goto_tb:
 if (s->tb_jmp_offset) {
 /* direct jump method */
+int gap;
+/* jump displacement must be aligned for atomic patching;
+ * see if we need to add extra nops before jump
+ */
+gap = tcg_pcrel_diff(s, QEMU_ALIGN_PTR_UP(s->code_ptr + 1, 4));
+if (gap != 1) {
+tcg_out_nopn(s, gap - 1);
+}
 tcg_out8(s, OPC_JMP_long); /* jmp im */
 s->tb_jmp_offset[args[0]] = tcg_current_code_size(s);
 tcg_out32(s, 0);
-- 
2.5.5
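
The padding computation in the goto_tb hunk above can be sketched on plain
addresses. This is only an illustration under the assumption that the jump
is a one-byte opcode followed by a 4-byte displacement; align_up and
jmp_padding are hypothetical names, not functions from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Round addr up to the next multiple of align (align must be a power of two). */
static uintptr_t align_up(uintptr_t addr, uintptr_t align)
{
    return (addr + align - 1) & ~(align - 1);
}

/*
 * Number of nop bytes to emit at code_ptr so that, after the one-byte jmp
 * opcode, the 4-byte displacement starts on a 4-byte boundary.
 */
static int jmp_padding(uintptr_t code_ptr)
{
    uintptr_t gap = align_up(code_ptr + 1, 4) - code_ptr;

    assert(gap >= 1 && gap <= 4);
    return (int)gap - 1;
}
```

The result is always between 0 and 3, which matches the patch emitting a
single nop with at most two 0x66 prefixes.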




[Qemu-devel] [PULL 02/39] include/qemu/osdep.h: Add a macro to check for alignment

2016-05-12 Thread Richard Henderson
From: Sergey Fedorov 

Signed-off-by: Sergey Fedorov 
Signed-off-by: Sergey Fedorov 
Message-Id: <1461341333-19646-2-git-send-email-sergey.fedo...@linaro.org>
Signed-off-by: Richard Henderson 
---
 include/qemu/osdep.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
index 408783f..e3bc50b 100644
--- a/include/qemu/osdep.h
+++ b/include/qemu/osdep.h
@@ -158,6 +158,9 @@ extern int daemon(int, int);
 /* Round number up to multiple */
 #define QEMU_ALIGN_UP(n, m) QEMU_ALIGN_DOWN((n) + (m) - 1, (m))
 
+/* Check if n is a multiple of m */
+#define QEMU_IS_ALIGNED(n, m) (((n) % (m)) == 0)
+
 #ifndef ROUND_UP
 #define ROUND_UP(n,d) (((n) + (d) - 1) & -(d))
 #endif
-- 
2.5.5




[Qemu-devel] [PULL 03/39] include/qemu/osdep.h: Add macros for pointer alignment

2016-05-12 Thread Richard Henderson
From: Sergey Fedorov 

These macros provide a convenient way to n-byte align pointers up and
down and check if a pointer is n-byte aligned.

Signed-off-by: Sergey Fedorov 
Signed-off-by: Sergey Fedorov 
Message-Id: <1461341333-19646-3-git-send-email-sergey.fedo...@linaro.org>
Signed-off-by: Richard Henderson 
---
 include/qemu/osdep.h | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
index e3bc50b..1e3221c 100644
--- a/include/qemu/osdep.h
+++ b/include/qemu/osdep.h
@@ -161,6 +161,17 @@ extern int daemon(int, int);
 /* Check if n is a multiple of m */
 #define QEMU_IS_ALIGNED(n, m) (((n) % (m)) == 0)
 
+/* n-byte align pointer down */
+#define QEMU_ALIGN_PTR_DOWN(p, n) \
+((typeof(p))QEMU_ALIGN_DOWN((uintptr_t)(p), (n)))
+
+/* n-byte align pointer up */
+#define QEMU_ALIGN_PTR_UP(p, n) \
+((typeof(p))QEMU_ALIGN_UP((uintptr_t)(p), (n)))
+
+/* Check if pointer p is n-bytes aligned */
+#define QEMU_PTR_IS_ALIGNED(p, n) QEMU_IS_ALIGNED((uintptr_t)(p), (n))
+
 #ifndef ROUND_UP
 #define ROUND_UP(n,d) (((n) + (d) - 1) & -(d))
 #endif
-- 
2.5.5
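
A short usage sketch of the alignment macros from the two osdep.h patches
follows. The macros are re-declared locally without the QEMU_ prefix so the
snippet stands alone, and the round-down definition is an assumption (the
patches only show the round-up and check macros). align_for_patch is a
hypothetical helper. __typeof__ is a GCC/Clang extension.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-ins for the osdep.h macros; ALIGN_DOWN's body is assumed. */
#define ALIGN_DOWN(n, m)      ((n) / (m) * (m))
#define ALIGN_UP(n, m)        ALIGN_DOWN((n) + (m) - 1, (m))
#define IS_ALIGNED(n, m)      (((n) % (m)) == 0)
#define ALIGN_PTR_DOWN(p, n)  ((__typeof__(p))ALIGN_DOWN((uintptr_t)(p), (n)))
#define ALIGN_PTR_UP(p, n)    ((__typeof__(p))ALIGN_UP((uintptr_t)(p), (n)))
#define PTR_IS_ALIGNED(p, n)  IS_ALIGNED((uintptr_t)(p), (n))

/* Return the first 4-byte aligned address at or after p. */
static uint8_t *align_for_patch(uint8_t *p)
{
    uint8_t *q = ALIGN_PTR_UP(p, 4);

    assert(PTR_IS_ALIGNED(q, 4));
    assert((uintptr_t)q - (uintptr_t)p < 4);
    return q;
}
```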




[Qemu-devel] [PULL 00/39] tcg-next patch queue

2016-05-12 Thread Richard Henderson
Wow, this has gotten a bit longer than I remembered.


r~


The following changes since commit f68419eee9a966f5a915314c43cda6778f976a77:

  Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging 
(2016-05-12 16:33:40 +0100)

are available in the git repository at:

  git://github.com/rth7680/qemu.git tags/pull-tcg-20160512

for you to fetch changes up to 8b1fe3f439eaa2f0a6ee7737942bb6c405725867:

  cpu-exec: Clean up 'interrupt_request' reloading in cpu_handle_interrupt() 
(2016-05-12 14:07:16 -1000)


queued 2.7 patches


Alex Bennée (1):
  tcg: reorganize tb_find_physical loop

Emilio G. Cota (3):
  tb: consistently use uint32_t for tb->flags
  translate-all: remove redundant setting of tcg_ctx.code_gen_buffer_size
  translate-all: add missing munmap of the code_gen guard page for MIPS

Paolo Bonzini (2):
  tcg: code_bitmap and code_write_count are not used by user-mode emulation
  cpu-exec: elide more icount code if CONFIG_USER_ONLY

Richard Henderson (1):
  translate-all: Adjust 256mb testing for mips64

Sergey Fedorov (32):
  include/qemu/osdep.h: Add a macro to check for alignment
  include/qemu/osdep.h: Add macros for pointer alignment
  tci: Make direct jump patching thread-safe
  tcg/ppc: Make direct jump patching thread-safe
  tcg/i386: Make direct jump patching thread-safe
  tcg/s390: Make direct jump patching thread-safe
  tcg/arm: Make direct jump patching thread-safe
  tcg/aarch64: Make direct jump patching thread-safe
  tcg/sparc: Make direct jump patching thread-safe
  tcg/mips: Make direct jump patching thread-safe
  tcg: Note requirement on atomic direct jump patching
  tcg: Clean up direct block chaining data fields
  tcg: Use uintptr_t type for jmp_list_{next|first} fields of TB
  tcg: Rearrange tb_link_page() to avoid forward declaration
  tcg: Init TB's direct jumps before making it visible
  tcg: Clarify thread safety check in tb_add_jump()
  tcg: Rename tb_jmp_remove() to tb_remove_from_jmp_list()
  tcg: Extract removing of jumps to TB from tb_phys_invalidate()
  tcg: Clean up tb_jmp_unlink()
  tcg: Clean up direct block chaining safety checks
  tcg: Allow goto_tb to any target PC in user mode
  tcg: Clean up from 'next_tb'
  tcg: Rework tb_invalidated_flag
  cpu-exec: Move TB chaining into tb_find_fast()
  tcg: Remove needless CPUState::current_tb
  cpu-exec: Remove relic orphaned comment
  cpu-exec: Move halt handling out of cpu_exec()
  cpu-exec: Move exception handling out of cpu_exec()
  cpu-exec: Move interrupt handling out of cpu_exec()
  cpu-exec: Move TB execution stuff out of cpu_exec()
  cpu-exec: Remove unused 'x86_cpu' and 'env' from cpu_exec()
  cpu-exec: Clean up 'interrupt_request' reloading in cpu_handle_interrupt()

 cpu-exec-common.c |   2 -
 cpu-exec.c| 519 +++---
 cputlb.c  |  13 --
 exec.c|   2 +-
 hw/i386/kvmvapic.c|   3 +-
 include/exec/exec-all.h   | 108 +
 include/qemu/osdep.h  |  14 ++
 include/qom/cpu.h |   4 +-
 qom/cpu.c |   1 -
 target-alpha/cpu.h|   2 +-
 target-alpha/translate.c  |   4 +
 target-arm/cpu.h  |   2 +-
 target-arm/translate-a64.c|   2 +
 target-arm/translate.c|  17 +-
 target-cris/cpu.h |   2 +-
 target-cris/translate.c   |  16 +-
 target-i386/cpu.h |   2 +-
 target-i386/translate.c   |  25 +-
 target-lm32/cpu.h |   2 +-
 target-lm32/translate.c   |  21 +-
 target-m68k/cpu.h |   2 +-
 target-m68k/translate.c   |  18 +-
 target-microblaze/cpu.h   |   2 +-
 target-microblaze/translate.c |  15 +-
 target-mips/cpu.h |   2 +-
 target-mips/translate.c   |  20 +-
 target-moxie/cpu.h|   2 +-
 target-moxie/translate.c  |  21 +-
 target-openrisc/cpu.h |   2 +-
 target-openrisc/translate.c   |  20 +-
 target-ppc/cpu.h  |   2 +-
 target-ppc/translate.c|  20 +-
 target-s390x/cpu.h|   2 +-
 target-s390x/translate.c  |  17 +-
 target-sh4/cpu.h  |   2 +-
 target-sh4/translate.c|  21 +-
 target-sparc/cpu.h|   2 +-
 target-sparc/translate.c  |  24 +-
 target-tilegx/cpu.h   |   2 +-
 target-tricore/cpu.h  |   2 +-
 target-tricore/translate.c|  20 +-
 target-unicore32/cpu.h|   2 +-
 target-unicore32/translate.c  |  16 +-
 target-xtensa/cpu.h   |   2 +-
 target-xtensa/translate.c |   4 +
 tcg/aarch64/tcg-target.inc.c  |  21 +-
 tcg/arm/tcg-target.inc.c  |  26 ++-
 tcg/i386/tcg-target.inc.c |  31 ++-
 tcg/ia64/

[Qemu-devel] [PULL 07/39] tcg/s390: Make direct jump patching thread-safe

2016-05-12 Thread Richard Henderson
From: Sergey Fedorov 

Ensure direct jump patching in s390 is atomic by:
 * naturally aligning a location of direct jump address;
 * using atomic_read()/atomic_set() for code patching.

Signed-off-by: Sergey Fedorov 
Signed-off-by: Sergey Fedorov 
Message-Id: <1461341333-19646-7-git-send-email-sergey.fedo...@linaro.org>
Signed-off-by: Richard Henderson 
---
 include/exec/exec-all.h   | 2 +-
 tcg/s390/tcg-target.inc.c | 8 
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index 0ab7803..6c095e8 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -321,7 +321,7 @@ static inline void tb_set_jmp_target1(uintptr_t jmp_addr, 
uintptr_t addr)
 {
 /* patch the branch destination */
 intptr_t disp = addr - (jmp_addr - 2);
-stl_be_p((void*)jmp_addr, disp / 2);
+atomic_set((int32_t *)jmp_addr, disp / 2);
 /* no need to flush icache explicitly */
 }
 #elif defined(__aarch64__)
diff --git a/tcg/s390/tcg-target.inc.c b/tcg/s390/tcg-target.inc.c
index 5805532..8e8064c 100644
--- a/tcg/s390/tcg-target.inc.c
+++ b/tcg/s390/tcg-target.inc.c
@@ -219,6 +219,8 @@ typedef enum S390Opcode {
 RX_ST   = 0x50,
 RX_STC  = 0x42,
 RX_STH  = 0x40,
+
+NOP = 0x0707,
 } S390Opcode;
 
 #ifdef CONFIG_DEBUG_TCG
@@ -1716,6 +1718,12 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode 
opc,
 
 case INDEX_op_goto_tb:
 if (s->tb_jmp_offset) {
+/* branch displacement must be aligned for atomic patching;
+ * see if we need to add extra nop before branch
+ */
+if (!QEMU_PTR_IS_ALIGNED(s->code_ptr + 1, 4)) {
+tcg_out16(s, NOP);
+}
 tcg_out16(s, RIL_BRCL | (S390_CC_ALWAYS << 4));
 s->tb_jmp_offset[args[0]] = tcg_current_code_size(s);
 s->code_ptr += 2;
-- 
2.5.5
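
For the s390 case, the two pieces of arithmetic in the patch above (the
halfword displacement and the nop decision) can be written out on plain
addresses. This is a sketch assuming a 6-byte BRCL whose 4-byte immediate
starts 2 bytes into the instruction; s390_brcl_disp and s390_needs_nop are
illustrative names only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Value to store at jmp_addr (the address of the BRCL immediate): the
 * distance from the start of the instruction to addr, counted in halfwords.
 */
static int32_t s390_brcl_disp(uintptr_t jmp_addr, uintptr_t addr)
{
    intptr_t disp = (intptr_t)(addr - (jmp_addr - 2));

    assert((disp & 1) == 0);    /* targets are halfword aligned */
    return (int32_t)(disp / 2);
}

/*
 * True if a 2-byte nop is needed before a BRCL starting at code_addr so
 * that its immediate (at code_addr + 2) lands on a 4-byte boundary.
 */
static bool s390_needs_nop(uintptr_t code_addr)
{
    return ((code_addr + 2) % 4) != 0;
}
```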




[Qemu-devel] RFC: Proposed vfio IGD assignment fw_cfg ABI

2016-05-12 Thread Alex Williamson
Hey folks,

I'm planning to add a couple fw_cfg files for vfio IGD (Intel
Graphics Device) assignment, but since this does represent a QEMU-BIOS
ABI and since most of the vfio code is committed with only my own
sign-off and review, I'd like to pull this out for discussion separate
from the patches themselves.

#1: "etc/igd-opregion"

The IGD OpRegion is an area of memory which contains, among other
things, the Video BIOS Table, which is integral in allowing an assigned
IGD to configure and make use of the physical display outputs of the
system.  "etc/igd-opregion" is an opaque fw_cfg file which the BIOS
will use to allocate an appropriately sized reserved memory region,
copy the contents of the fw_cfg file into the allocated memory region,
and write the base address of the allocated memory region to the dword
register at 0xFC in PCI config space on the IGD device itself.  The
BIOS will look for this fw_cfg file any time a PCI class VGA device is
found with Intel vendor ID.  Multiple IGD devices per VM, such as might
potentially be possible with Intel vGPU, are not within the scope of
this proposal.  The expected size of this fw_cfg file is on the order
of a few pages; 8KB is typical.

#2: "etc/igd-bdsm"

The BDSM is the register on IGD storing the base address of stolen
memory (Base Data of Stolen Memory).  This is simply an area of
reserved RAM which the IGD uses for both GTT and stolen video memory.
The semantics are much the same as for "etc/igd-opregion" with the
exception that the fw_cfg file is only used to request a reserved
memory allocation for this purpose and to indicate the size of the
reserved memory.  There is no content to this fw_cfg file and it should
not be read or written.  As above, the BIOS should look for this file
upon discovering a PCI class VGA device with Intel vendor ID, allocate
the necessary size and write the address to the IGD device.  The BDSM
register is a dword register located at 0x5C in PCI config space of the
IGD device.  This proposal does not intend to address the vague
possibility of multiple BDSM per VM.  The expected size of this fw_cfg
file is from 1MB to multiple hundreds of MB with user specified stolen
video memory.  8MB would be the typical maximum as QEMU currently does
not allocate stolen video memory itself.
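
To make the firmware-side steps for both files concrete, here is a rough
sketch operating on in-memory stand-ins. Everything except the two register
offsets (0xFC and 0x5C) is hypothetical: the function names, the idea of
passing the reserved region and its guest address as arguments, and the
config space image are illustration only, not an existing BIOS interface.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define IGD_REG_OPREGION 0xFC   /* dword register for the OpRegion address */
#define IGD_REG_BDSM     0x5C   /* dword register for the stolen memory base */

/* Store a 32-bit value into a config space image, little-endian. */
static void cfg_write32(uint8_t *cfg, unsigned reg, uint32_t val)
{
    for (int i = 0; i < 4; i++) {
        cfg[reg + i] = (uint8_t)(val >> (8 * i));
    }
}

/* Read a 32-bit little-endian value back from the config space image. */
static uint32_t cfg_read32(const uint8_t *cfg, unsigned reg)
{
    uint32_t val = 0;

    for (int i = 0; i < 4; i++) {
        val |= (uint32_t)cfg[reg + i] << (8 * i);
    }
    return val;
}

/*
 * Hypothetical firmware step.  'mem' is the reserved region the firmware
 * allocated ('size' bytes at guest address 'gpa').  'content' is the fw_cfg
 * payload for etc/igd-opregion, or NULL for etc/igd-bdsm, whose payload is
 * not to be read.
 */
static void igd_assign_region(uint8_t *cfg, unsigned reg, uint8_t *mem,
                              uint32_t gpa, const uint8_t *content,
                              uint32_t size)
{
    if (content) {
        memcpy(mem, content, size);
    }
    cfg_write32(cfg, reg, gpa);
}

/* Usage: place an 8KB OpRegion and return the value left at 0xFC. */
static uint32_t example_opregion(void)
{
    static uint8_t cfg[256], mem[8192];
    static const uint8_t file[8192] = { 'I', 'G', 'D' };

    igd_assign_region(cfg, IGD_REG_OPREGION, mem, 0x7f000000u,
                      file, sizeof(file));
    assert(memcmp(mem, file, sizeof(file)) == 0);
    return cfg_read32(cfg, IGD_REG_OPREGION);
}

/* Usage: record a stolen memory base without touching any payload. */
static uint32_t example_bdsm(void)
{
    static uint8_t cfg[256];

    igd_assign_region(cfg, IGD_REG_BDSM, NULL, 0x7e000000u, NULL, 0);
    return cfg_read32(cfg, IGD_REG_BDSM);
}
```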

I'd appreciate any comments on these entries and I'd be happy to
describe them further.  Perhaps we should create a docs/api/
directory with these sorts of descriptions where we define how a
given API is intended to work.  Thanks,

Alex



[Qemu-devel] [PATCH v6 12/13] misc: Introduce ZynqMP IOU SLCR

2016-05-12 Thread Alistair Francis
From: Peter Crosthwaite 

IOU = I/O Unit
SLCR = System Level Control Registers

This IP is a miscellaneous collection of control registers that switch
various properties of system IPs. Currently the only thing implemented is
the SD_SLOTTYPE control (implemented as a GPIO output).

Signed-off-by: Peter Crosthwaite 
Signed-off-by: Alistair Francis 
---

 hw/misc/Makefile.objs  |   1 +
 hw/misc/xlnx-zynqmp-iou-slcr.c | 115 +
 include/hw/misc/xlnx-zynqmp-iou-slcr.h |  47 ++
 3 files changed, 163 insertions(+)
 create mode 100644 hw/misc/xlnx-zynqmp-iou-slcr.c
 create mode 100644 include/hw/misc/xlnx-zynqmp-iou-slcr.h

diff --git a/hw/misc/Makefile.objs b/hw/misc/Makefile.objs
index 93f9528..d772c50 100644
--- a/hw/misc/Makefile.objs
+++ b/hw/misc/Makefile.objs
@@ -42,6 +42,7 @@ obj-$(CONFIG_RASPI) += bcm2835_property.o
 obj-$(CONFIG_SLAVIO) += slavio_misc.o
 obj-$(CONFIG_ZYNQ) += zynq_slcr.o
 obj-$(CONFIG_ZYNQ) += zynq-xadc.o
+obj-$(CONFIG_ZYNQ) += xlnx-zynqmp-iou-slcr.o
 obj-$(CONFIG_STM32F2XX_SYSCFG) += stm32f2xx_syscfg.o
 obj-$(CONFIG_MIPS_CPS) += mips_cmgcr.o
 obj-$(CONFIG_MIPS_CPS) += mips_cpc.o
diff --git a/hw/misc/xlnx-zynqmp-iou-slcr.c b/hw/misc/xlnx-zynqmp-iou-slcr.c
new file mode 100644
index 000..00b617e
--- /dev/null
+++ b/hw/misc/xlnx-zynqmp-iou-slcr.c
@@ -0,0 +1,115 @@
+/*
+ * Xilinx ZynqMP IOU System Level Control Registers (SLCR)
+ *
+ * Copyright (c) 2013 Xilinx Inc
+ * Copyright (c) 2013 Peter Crosthwaite 
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to 
deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING 
FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include "qemu/osdep.h"
+#include "hw/misc/xlnx-zynqmp-iou-slcr.h"
+
+#ifndef XLNX_ZYNQMP_IOU_SLCR_ERR_DEBUG
+#define XLNX_ZYNQMP_IOU_SLCR_ERR_DEBUG 0
+#endif
+
+REG32(SD_SLOTTYPE, 0x310)
+#define R_SD_SLOTTYPE_RSVD   0x7ffe
+
+static const RegisterAccessInfo xlnx_zynqmp_iou_slcr_regs_info[] = {
+{   .name = "SD Slot TYPE", .decode.addr = A_SD_SLOTTYPE,
+.rsvd = R_SD_SLOTTYPE_RSVD,
+.gpios = (RegisterGPIOMapping []) {
+{ .name = "SD0_SLOTTYPE",   .bit_pos = 0  },
+{ .name = "SD1_SLOTTYPE",   .bit_pos = 15 },
+{},
+}
+}
+/* FIXME: Complete device model */
+};
+
+static void xlnx_zynqmp_iou_slcr_reset(DeviceState *dev)
+{
+XlnxZynqMPIOUSLCR *s = XLNX_ZYNQMP_IOU_SLCR(dev);
+int i;
+
+for (i = 0; i < XLNX_ZYNQ_MP_IOU_SLCR_R_MAX; ++i) {
+register_reset(&s->regs_info[i]);
+}
+}
+
+static const MemoryRegionOps xlnx_zynqmp_iou_slcr_ops = {
+.read = register_read_memory_le,
+.write = register_write_memory_le,
+.endianness = DEVICE_LITTLE_ENDIAN,
+.valid = {
+.min_access_size = 4,
+.max_access_size = 4,
+}
+};
+
+static void xlnx_zynqmp_iou_slcr_init(Object *obj)
+{
+XlnxZynqMPIOUSLCR *s = XLNX_ZYNQMP_IOU_SLCR(obj);
+
+memory_region_init(&s->iomem, obj, "MMIO", XLNX_ZYNQ_MP_IOU_SLCR_R_MAX * 
4);
+register_init_block32(DEVICE(obj), xlnx_zynqmp_iou_slcr_regs_info,
+  ARRAY_SIZE(xlnx_zynqmp_iou_slcr_regs_info),
+  s->regs_info, s->regs, &s->iomem,
+  &xlnx_zynqmp_iou_slcr_ops,
+  XLNX_ZYNQMP_IOU_SLCR_ERR_DEBUG,
+  XLNX_ZYNQ_MP_IOU_SLCR_R_MAX);
+sysbus_init_mmio(SYS_BUS_DEVICE(obj), &s->iomem);
+}
+
+static const VMStateDescription vmstate_xlnx_zynqmp_iou_slcr = {
+.name = "xlnx_zynqmp_iou_slcr",
+.version_id = 1,
+.minimum_version_id = 1,
+.minimum_version_id_old = 1,
+.fields = (VMStateField[]) {
+VMSTATE_UINT32_ARRAY(regs, XlnxZynqMPIOUSLCR,
+ XLNX_ZYNQ_MP_IOU_SLCR_R_MAX),
+VMSTATE_END_OF_LIST(),
+}
+};
+
+static void xlnx_zynqmp_iou_slcr_class_init(ObjectClass *klass, void 

[Qemu-devel] [PATCH v6 10/13] irq: Add opaque setter routine

2016-05-12 Thread Alistair Francis
From: Peter Crosthwaite 

Add a routine to set or override the opaque data of an IRQ.

Qdev currently always initialises IRQ opaque as the device itself.
This routine allows overriding it with a custom opaque in cases where
extra or different data is needed.

Signed-off-by: Peter Crosthwaite 
Signed-off-by: Alistair Francis 
---

 hw/core/irq.c| 5 +
 include/hw/irq.h | 2 ++
 2 files changed, 7 insertions(+)

diff --git a/hw/core/irq.c b/hw/core/irq.c
index 49ff2e6..9d125fb 100644
--- a/hw/core/irq.c
+++ b/hw/core/irq.c
@@ -77,6 +77,11 @@ qemu_irq qemu_allocate_irq(qemu_irq_handler handler, void 
*opaque, int n)
 return irq;
 }
 
+void qemu_irq_set_opaque(qemu_irq irq, void *opaque)
+{
+irq->opaque = opaque;
+}
+
 void qemu_free_irqs(qemu_irq *s, int n)
 {
 int i;
diff --git a/include/hw/irq.h b/include/hw/irq.h
index 4c4c2ea..edad0fc 100644
--- a/include/hw/irq.h
+++ b/include/hw/irq.h
@@ -44,6 +44,8 @@ qemu_irq qemu_allocate_irq(qemu_irq_handler handler, void 
*opaque, int n);
 qemu_irq *qemu_extend_irqs(qemu_irq *old, int n_old, qemu_irq_handler handler,
 void *opaque, int n);
 
+void qemu_irq_set_opaque(qemu_irq irq, void *opaque);
+
 void qemu_free_irqs(qemu_irq *s, int n);
 void qemu_free_irq(qemu_irq irq);
 
-- 
2.7.4
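
The effect of overriding the opaque can be shown with a cut-down stand-in
for the IRQ object. This is a sketch only: struct irq_state, irq_set and
record_level are simplified, made-up equivalents and do not reproduce the
real QOM-based IRQState.

```c
#include <assert.h>

typedef void (*irq_handler)(void *opaque, int n, int level);

/* Simplified stand-in for an IRQ: handler, its opaque data, and line number. */
struct irq_state {
    irq_handler handler;
    void *opaque;
    int n;
};

/* Counterpart of the setter added by the patch. */
static void irq_set_opaque(struct irq_state *irq, void *opaque)
{
    irq->opaque = opaque;
}

/* Deliver a level change to the handler with whatever opaque is current. */
static void irq_set(struct irq_state *irq, int level)
{
    assert(irq && irq->handler);
    irq->handler(irq->opaque, irq->n, level);
}

/* Example handler: records the level in the int array that opaque points to. */
static void record_level(void *opaque, int n, int level)
{
    int *levels = opaque;

    levels[n] = level;
}

/* Returns 1 if the handler saw the custom opaque rather than the default. */
static int example_override(void)
{
    int device_levels[4] = { 0 }, custom_levels[4] = { 0 };
    struct irq_state irq = { record_level, device_levels, 2 };

    irq_set_opaque(&irq, custom_levels);
    irq_set(&irq, 1);
    return custom_levels[2] == 1 && device_levels[2] == 0;
}
```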




[Qemu-devel] [PATCH v6 06/13] register: Add block initialise helper

2016-05-12 Thread Alistair Francis
From: Peter Crosthwaite 

Add a helper that will scan a static RegisterAccessInfo Array
and populate a container MemoryRegion with registers as defined.

Signed-off-by: Peter Crosthwaite 
Signed-off-by: Alistair Francis 
---
The reason that I'm not using GArray is because the array needs to store
the memory region that covers all of the registers.

V6:
 - Fixup the loop logic
V5:
 - Convert to only using one memory region
V3:
 - Fix typo
V2:
 - Use memory_region_add_subregion_no_print()

 hw/core/register.c| 36 
 include/hw/register.h | 22 ++
 2 files changed, 58 insertions(+)

diff --git a/hw/core/register.c b/hw/core/register.c
index c5a2c78..c68d510 100644
--- a/hw/core/register.c
+++ b/hw/core/register.c
@@ -231,6 +231,42 @@ uint64_t register_read_memory_le(void *opaque, hwaddr 
addr, unsigned size)
 return register_read_memory(opaque, addr, size, false);
 }
 
+void register_init_block32(DeviceState *owner, const RegisterAccessInfo *rae,
+   int num, RegisterInfo *ri, uint32_t *data,
+   MemoryRegion *container, const MemoryRegionOps *ops,
+   bool debug_enabled, uint64_t memory_size)
+{
+const char *device_prefix = object_get_typename(OBJECT(owner));
+RegisterInfoArray *r_array = g_malloc(sizeof(RegisterInfoArray));
+int i;
+
+r_array->r = g_malloc_n(num, sizeof(RegisterInfo *));
+r_array->num_elements = num;
+r_array->debug = debug_enabled;
+r_array->prefix = device_prefix;
+
+for (i = 0; i < num; i++) {
+int index = rae[i].decode.addr / 4;
+RegisterInfo *r = &ri[index];
+
+*r = (RegisterInfo) {
+.data = &data[index],
+.data_size = sizeof(uint32_t),
+.access = &rae[i],
+.opaque = owner,
+};
+register_init(r);
+
+r_array->r[i] = r;
+}
+
+memory_region_init_io(&r_array->mem, OBJECT(owner), ops, r_array,
+  device_prefix, memory_size);
+memory_region_add_subregion(container,
+r_array->r[0]->access->decode.addr,
+&r_array->mem);
+}
+
 static const TypeInfo register_info = {
 .name  = TYPE_REGISTER,
 .parent = TYPE_DEVICE,
diff --git a/include/hw/register.h b/include/hw/register.h
index eedd578..c40cf03 100644
--- a/include/hw/register.h
+++ b/include/hw/register.h
@@ -102,6 +102,8 @@ struct RegisterInfo {
  */
 
 struct RegisterInfoArray {
+MemoryRegion mem;
+
 int num_elements;
 RegisterInfo **r;
 
@@ -172,6 +174,26 @@ void register_write_memory_le(void *opaque, hwaddr addr, 
uint64_t value,
 uint64_t register_read_memory_be(void *opaque, hwaddr addr, unsigned size);
 uint64_t register_read_memory_le(void *opaque, hwaddr addr, unsigned size);
 
+/**
+ * Init a block of consecutive registers into a container MemoryRegion. A
+ * number of constant register definitions are parsed to create a corresponding
+ * array of RegisterInfo's.
+ *
+ * @owner: device owning the registers
+ * @rae: Register definitions to init
+ * @num: number of registers to init (length of @rae)
+ * @ri: Register array to init
+ * @data: Array to use for register data
+ * @container: Memory region to contain new registers
+ * @ops: Memory region ops to access registers.
+ * @debug_enabled: turn on/off verbose debug information
+ */
+
+void register_init_block32(DeviceState *owner, const RegisterAccessInfo *rae,
+   int num, RegisterInfo *ri, uint32_t *data,
+   MemoryRegion *container, const MemoryRegionOps *ops,
+   bool debug_enabled, uint64_t memory_size);
+
 /* Define constants for a 32 bit register */
 #define REG32(reg, addr)  \
 enum { A_ ## reg = (addr) };  \
-- 
2.7.4
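
The indexing scheme used by the helper (register byte address divided by 4
selects the slot in both the RegisterInfo and data arrays) can be tried out
with reduced types. This is a sketch: struct access_info, struct reg_info
and init_block32 are simplified, hypothetical versions, and the register
names other than SD_SLOTTYPE at 0x310 are invented.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reduced stand-ins for RegisterAccessInfo and RegisterInfo. */
struct access_info {
    const char *name;
    uint32_t addr;      /* byte offset of the register in the block */
};

struct reg_info {
    uint32_t *data;
    const struct access_info *access;
};

/* Populate only the slots named in rae; other slots are left untouched. */
static void init_block32(const struct access_info *rae, int num,
                         struct reg_info *ri, uint32_t *data)
{
    assert(rae && ri && data);
    for (int i = 0; i < num; i++) {
        size_t index = rae[i].addr / 4;

        ri[index].data = &data[index];
        ri[index].access = &rae[i];
    }
}

/* Returns 1 if the sparse definitions landed in the expected slots. */
static int example_lookup(void)
{
    static const struct access_info defs[] = {
        { "REG_A", 0x00 }, { "REG_B", 0x04 }, { "SD_SLOTTYPE", 0x310 },
    };
    static struct reg_info ri[0x314 / 4];
    static uint32_t data[0x314 / 4];

    init_block32(defs, 3, ri, data);
    return ri[0x310 / 4].access == &defs[2]
        && ri[0x310 / 4].data == &data[0x310 / 4]
        && ri[1].access == &defs[1]
        && ri[2].access == NULL;
}
```

Slots with no definition keep a NULL access pointer, which is the case the
register_init() check on reg->access in patch 05/13 appears to cover.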




[Qemu-devel] [PATCH v6 07/13] dma: Add Xilinx Zynq devcfg device model

2016-05-12 Thread Alistair Francis
Add a minimal model for the devcfg device which is part of Zynq.
This model supports DMA capabilities and interrupt generation.

Signed-off-by: Peter Crosthwaite 
Signed-off-by: Alistair Francis 
---
V5:
 - Corrections to the device model logic

 default-configs/arm-softmmu.mak   |   1 +
 hw/dma/Makefile.objs  |   1 +
 hw/dma/xlnx-zynq-devcfg.c | 396 ++
 include/hw/dma/xlnx-zynq-devcfg.h |  62 ++
 4 files changed, 460 insertions(+)
 create mode 100644 hw/dma/xlnx-zynq-devcfg.c
 create mode 100644 include/hw/dma/xlnx-zynq-devcfg.h

diff --git a/default-configs/arm-softmmu.mak b/default-configs/arm-softmmu.mak
index c63cdd0..40f94ec 100644
--- a/default-configs/arm-softmmu.mak
+++ b/default-configs/arm-softmmu.mak
@@ -66,6 +66,7 @@ CONFIG_PXA2XX=y
 CONFIG_BITBANG_I2C=y
 CONFIG_FRAMEBUFFER=y
 CONFIG_XILINX_SPIPS=y
+CONFIG_ZYNQ_DEVCFG=y
 
 CONFIG_ARM11SCU=y
 CONFIG_A9SCU=y
diff --git a/hw/dma/Makefile.objs b/hw/dma/Makefile.objs
index a1abbcf..29b4520 100644
--- a/hw/dma/Makefile.objs
+++ b/hw/dma/Makefile.objs
@@ -5,6 +5,7 @@ common-obj-$(CONFIG_PL330) += pl330.o
 common-obj-$(CONFIG_I82374) += i82374.o
 common-obj-$(CONFIG_I8257) += i8257.o
 common-obj-$(CONFIG_XILINX_AXI) += xilinx_axidma.o
+common-obj-$(CONFIG_ZYNQ_DEVCFG) += xlnx-zynq-devcfg.o
 common-obj-$(CONFIG_ETRAXFS) += etraxfs_dma.o
 common-obj-$(CONFIG_STP2000) += sparc32_dma.o
 common-obj-$(CONFIG_SUN4M) += sun4m_iommu.o
diff --git a/hw/dma/xlnx-zynq-devcfg.c b/hw/dma/xlnx-zynq-devcfg.c
new file mode 100644
index 000..9de4e75
--- /dev/null
+++ b/hw/dma/xlnx-zynq-devcfg.c
@@ -0,0 +1,396 @@
+/*
+ * QEMU model of the Xilinx Zynq Devcfg Interface
+ *
+ * (C) 2011 PetaLogix Pty Ltd
+ * (C) 2014 Xilinx Inc.
+ * Written by Peter Crosthwaite 
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to 
deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING 
FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include "qemu/osdep.h"
+#include "hw/dma/xlnx-zynq-devcfg.h"
+#include "qemu/bitops.h"
+#include "sysemu/sysemu.h"
+#include "sysemu/dma.h"
+
+#define FREQ_HZ 9
+
+#define BTT_MAX 0x400
+
+#ifndef XLNX_ZYNQ_DEVCFG_ERR_DEBUG
+#define XLNX_ZYNQ_DEVCFG_ERR_DEBUG 0
+#endif
+
+#define DB_PRINT(fmt, args...) do { \
+if (XLNX_ZYNQ_DEVCFG_ERR_DEBUG) { \
+qemu_log("%s: " fmt, __func__, ## args); \
+} \
+} while (0);
+
+REG32(CTRL, 0x00)
+FIELD(CTRL, FORCE_RST,  31,  1) /* Not supported, wr ignored */
+FIELD(CTRL, PCAP_PR,27,  1) /* Forced to 0 on bad unlock */
+FIELD(CTRL, PCAP_MODE,  26,  1)
+FIELD(CTRL, MULTIBOOT_EN,   24,  1)
+FIELD(CTRL, USER_MODE,  15,  1)
+FIELD(CTRL, PCFG_AES_FUSE,  12,  1)
+FIELD(CTRL, PCFG_AES_EN, 9,  3)
+FIELD(CTRL, SEU_EN,  8,  1)
+FIELD(CTRL, SEC_EN,  7,  1)
+FIELD(CTRL, SPNIDEN, 6,  1)
+FIELD(CTRL, SPIDEN,  5,  1)
+FIELD(CTRL, NIDEN,   4,  1)
+FIELD(CTRL, DBGEN,   3,  1)
+FIELD(CTRL, DAP_EN,  0,  3)
+
+REG32(LOCK, 0x04)
+#define AES_FUSE_LOCK4
+#define AES_EN_LOCK  3
+#define SEU_LOCK 2
+#define SEC_LOCK 1
+#define DBG_LOCK 0
+
+/* mapping bits in R_LOCK to what they lock in R_CTRL */
+static const uint32_t lock_ctrl_map[] = {
+[AES_FUSE_LOCK] = R_CTRL_PCFG_AES_FUSE_MASK,
+[AES_EN_LOCK]   = R_CTRL_PCFG_AES_EN_MASK,
+[SEU_LOCK]  = R_CTRL_SEU_EN_MASK,
+[SEC_LOCK]  = R_CTRL_SEC_EN_MASK,
+[DBG_LOCK]  = R_CTRL_SPNIDEN_MASK | R_CTRL_SPIDEN_MASK |
+  R_CTRL_NIDEN_MASK   | R_CTRL_DBGEN_MASK  |
+  R_CTRL_DAP_EN_MASK,
+};
+
+REG32(CFG, 0x08)
+FIELD(CFG,  RFIFO_TH,   10,  2)
+FIELD(CFG,  WFIFO_TH,8,  2)
+FIELD(CFG,  

[Qemu-devel] [PATCH v6 09/13] qdev: Define qdev_get_gpio_out

2016-05-12 Thread Alistair Francis
From: Peter Crosthwaite 

An API similar to the existing qdev_get_gpio_in(), except it returns outputs.
Useful for:

1: Implementing lightweight devices that don't want to keep pointers
to their own GPIOs. They can get their GPIO pointers at runtime from
QOM using this API.

2: Testing or debugging code which may wish to override the
hardware generated value of a GPIO with a user specified value
(e.g. interrupt injection).

Signed-off-by: Peter Crosthwaite 
Signed-off-by: Alistair Francis 
---

 hw/core/qdev.c | 12 
 include/hw/qdev-core.h |  2 ++
 2 files changed, 14 insertions(+)

diff --git a/hw/core/qdev.c b/hw/core/qdev.c
index db41aa1..e3015d2 100644
--- a/hw/core/qdev.c
+++ b/hw/core/qdev.c
@@ -489,6 +489,18 @@ qemu_irq qdev_get_gpio_in(DeviceState *dev, int n)
 return qdev_get_gpio_in_named(dev, NULL, n);
 }
 
+qemu_irq qdev_get_gpio_out_named(DeviceState *dev, const char *name, int n)
+{
+char *propname = g_strdup_printf("%s[%d]",
+ name ? name : "unnamed-gpio-out", n);
+return (qemu_irq)object_property_get_link(OBJECT(dev), propname, NULL);
+}
+
+qemu_irq qdev_get_gpio_out(DeviceState *dev, int n)
+{
+return qdev_get_gpio_out_named(dev, NULL, n);
+}
+
 void qdev_connect_gpio_out_named(DeviceState *dev, const char *name, int n,
  qemu_irq pin)
 {
diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h
index 1ce02b2..0e216af 100644
--- a/include/hw/qdev-core.h
+++ b/include/hw/qdev-core.h
@@ -285,6 +285,8 @@ bool qdev_machine_modified(void);
 
 qemu_irq qdev_get_gpio_in(DeviceState *dev, int n);
 qemu_irq qdev_get_gpio_in_named(DeviceState *dev, const char *name, int n);
+qemu_irq qdev_get_gpio_out(DeviceState *dev, int n);
+qemu_irq qdev_get_gpio_out_named(DeviceState *dev, const char *name, int n);
 
 void qdev_connect_gpio_out(DeviceState *dev, int n, qemu_irq pin);
 void qdev_connect_gpio_out_named(DeviceState *dev, const char *name, int n,
-- 
2.7.4




[Qemu-devel] [PATCH v6 05/13] register: QOMify

2016-05-12 Thread Alistair Francis
From: Peter Crosthwaite 

QOMify registers as a child of TYPE_DEVICE. This allows registers to
define GPIOs.

Define an init helper that will do QOM initialisation.

Signed-off-by: Peter Crosthwaite 
Signed-off-by: Alistair Francis 
Reviewed-by: KONRAD Frederic 
---
V5:
 - Convert to using only one memory region

 hw/core/register.c| 23 +++
 include/hw/register.h | 15 +++
 2 files changed, 38 insertions(+)

diff --git a/hw/core/register.c b/hw/core/register.c
index 25196e6..c5a2c78 100644
--- a/hw/core/register.c
+++ b/hw/core/register.c
@@ -148,6 +148,17 @@ void register_reset(RegisterInfo *reg)
 register_write_val(reg, reg->access->reset);
 }
 
+void register_init(RegisterInfo *reg)
+{
+assert(reg);
+
+if (!reg->data || !reg->access) {
+return;
+}
+
+object_initialize((void *)reg, sizeof(*reg), TYPE_REGISTER);
+}
+
 static inline void register_write_memory(void *opaque, hwaddr addr,
  uint64_t value, unsigned size, bool be)
 {
@@ -219,3 +230,15 @@ uint64_t register_read_memory_le(void *opaque, hwaddr addr, unsigned size)
 {
 return register_read_memory(opaque, addr, size, false);
 }
+
+static const TypeInfo register_info = {
+.name  = TYPE_REGISTER,
+.parent = TYPE_DEVICE,
+};
+
+static void register_register_types(void)
+{
+type_register_static(&register_info);
+}
+
+type_init(register_register_types)
diff --git a/include/hw/register.h b/include/hw/register.h
index e0aac91..eedd578 100644
--- a/include/hw/register.h
+++ b/include/hw/register.h
@@ -11,6 +11,7 @@
 #ifndef REGISTER_H
 #define REGISTER_H
 
+#include "hw/qdev-core.h"
 #include "exec/memory.h"
 
 typedef struct RegisterInfo RegisterInfo;
@@ -74,6 +75,9 @@ struct RegisterAccessInfo {
  */
 
 struct RegisterInfo {
+/* <private> */
+DeviceState parent_obj;
+
 /* <public> */
 void *data;
 int data_size;
@@ -83,6 +87,9 @@ struct RegisterInfo {
 void *opaque;
 };
 
+#define TYPE_REGISTER "qemu,register"
+#define REGISTER(obj) OBJECT_CHECK(RegisterInfo, (obj), TYPE_REGISTER)
+
 /**
  * This structure is used to group all of the individual registers which are
  * modeled using the RegisterInfo structure.
@@ -132,6 +139,14 @@ uint64_t register_read(RegisterInfo *reg, const char* prefix, bool debug);
 void register_reset(RegisterInfo *reg);
 
 /**
+ * Initialize a register. GPIOs are set up as IOs to the specified device.
+ * Fast paths for eligible registers are enabled.
+ * @reg: Register to initialize
+ */
+
+void register_init(RegisterInfo *reg);
+
+/**
  * Memory API MMIO write handler that will write to a Register API register.
  *  _be for big endian variant and _le for little endian.
  * @opaque: RegisterInfo to write to
-- 
2.7.4




[Qemu-devel] [PATCH v6 03/13] register: Add Memory API glue

2016-05-12 Thread Alistair Francis
Add memory I/O handlers that glue the register API to the memory API.
These are just translation functions at this stage, although they do
allow devices to be created without all-in-one MMIO r/w handlers.

This patch also adds the RegisterInfoArray struct, which allows all of
the individual RegisterInfo structs to be grouped into a single memory
region.

Signed-off-by: Peter Crosthwaite 
Signed-off-by: Alistair Francis 
---
V6:
 - Add the memory region later
V5:
 - Convert to using only one memory region
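
A rough sketch of the write-enable selection done in
register_write_memory() below, for anyone following the size handling:
only the bytes covered by both the access and the register are
writable. The names low_mask64() and access_write_enable() are
hypothetical and used only for this illustration. The big-endian shift
is deliberately left out.

```c
#include <assert.h>
#include <stdint.h>

/* Mask with the low `bits` bits set; bits == 64 is handled explicitly. */
static uint64_t low_mask64(unsigned bits)
{
    assert(bits <= 64);
    return bits == 64 ? ~UINT64_C(0) : (UINT64_C(1) << bits) - 1;
}

/*
 * Hypothetical helper mirroring the mask selection in
 * register_write_memory(): the write-enable mask spans the smaller of
 * the register width and the access width (both given in bytes).
 */
static uint64_t access_write_enable(unsigned reg_bytes, unsigned access_bytes)
{
    unsigned bytes = reg_bytes < access_bytes ? reg_bytes : access_bytes;

    return low_mask64(bytes * 8);
}
```

For example, a one-byte access to a four-byte register yields a mask of
0xff, and a four-byte access to a two-byte register yields 0xffff.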

 hw/core/register.c| 72 +++
 include/hw/register.h | 50 +++
 2 files changed, 122 insertions(+)

diff --git a/hw/core/register.c b/hw/core/register.c
index 5e6f621..25196e6 100644
--- a/hw/core/register.c
+++ b/hw/core/register.c
@@ -147,3 +147,75 @@ void register_reset(RegisterInfo *reg)
 
 register_write_val(reg, reg->access->reset);
 }
+
+static inline void register_write_memory(void *opaque, hwaddr addr,
+ uint64_t value, unsigned size, bool be)
+{
+RegisterInfoArray *reg_array = opaque;
+RegisterInfo *reg = NULL;
+uint64_t we = ~0;
+int i, shift = 0;
+
+for (i = 0; i < reg_array->num_elements; i++) {
+if (reg_array->r[i]->access->decode.addr == addr) {
+reg = reg_array->r[i];
+break;
+}
+}
+assert(reg);
+
+/* Generate appropriate write enable mask and shift values */
+if (reg->data_size < size) {
+we = MAKE_64BIT_MASK(0, reg->data_size * 8);
+shift = 8 * (be ? reg->data_size - size : 0);
+} else if (reg->data_size >= size) {
+we = MAKE_64BIT_MASK(0, size * 8);
+}
+
+register_write(reg, value << shift, we << shift, reg_array->prefix,
+   reg_array->debug);
+}
+
+void register_write_memory_be(void *opaque, hwaddr addr, uint64_t value,
+  unsigned size)
+{
+register_write_memory(opaque, addr, value, size, true);
+}
+
+
+void register_write_memory_le(void *opaque, hwaddr addr, uint64_t value,
+  unsigned size)
+{
+register_write_memory(opaque, addr, value, size, false);
+}
+
+static inline uint64_t register_read_memory(void *opaque, hwaddr addr,
+unsigned size, bool be)
+{
+RegisterInfoArray *reg_array = opaque;
+RegisterInfo *reg = NULL;
+int i, shift;
+
+for (i = 0; i < reg_array->num_elements; i++) {
+if (reg_array->r[i]->access->decode.addr == addr) {
+reg = reg_array->r[i];
+break;
+}
+}
+assert(reg);
+
+shift = 8 * (be ? reg->data_size - size : 0);
+
+return (register_read(reg, reg_array->prefix, reg_array->debug) >> shift) &
+   MAKE_64BIT_MASK(0, size * 8);
+}
+
+uint64_t register_read_memory_be(void *opaque, hwaddr addr, unsigned size)
+{
+return register_read_memory(opaque, addr, size, true);
+}
+
+uint64_t register_read_memory_le(void *opaque, hwaddr addr, unsigned size)
+{
+return register_read_memory(opaque, addr, size, false);
+}
diff --git a/include/hw/register.h b/include/hw/register.h
index 07d0616..786707b 100644
--- a/include/hw/register.h
+++ b/include/hw/register.h
@@ -15,6 +15,7 @@
 
 typedef struct RegisterInfo RegisterInfo;
 typedef struct RegisterAccessInfo RegisterAccessInfo;
+typedef struct RegisterInfoArray RegisterInfoArray;
 
 /**
  * Access description for a register that is part of guest accessible device
@@ -51,6 +52,10 @@ struct RegisterAccessInfo {
 void (*post_write)(RegisterInfo *reg, uint64_t val);
 
 uint64_t (*post_read)(RegisterInfo *reg, uint64_t val);
+
+struct {
+hwaddr addr;
+} decode;
 };
 
 /**
@@ -79,6 +84,25 @@ struct RegisterInfo {
 };
 
 /**
+ * This structure is used to group all of the individual registers which are
+ * modeled using the RegisterInfo structure.
+ *
+ * @r is an array containing all of the relevant RegisterInfo structures.
+ *
+ * @num_elements is the number of elements in the array r
+ *
+ * @mem: optional Memory region for the register
+ */
+
+struct RegisterInfoArray {
+int num_elements;
+RegisterInfo **r;
+
+bool debug;
+const char *prefix;
+};
+
+/**
  * write a value to a register, subject to its restrictions
  * @reg: register to write to
  * @val: value to write
@@ -107,4 +131,30 @@ uint64_t register_read(RegisterInfo *reg, const char* prefix, bool debug);
 
 void register_reset(RegisterInfo *reg);
 
+/**
+ * Memory API MMIO write handler that will write to a Register API register.
+ *  _be for big endian variant and _le for little endian.
+ * @opaque: RegisterInfo to write to
+ * @addr: Address to write
+ * @value: Value to write
+ * @size: Number of bytes to write
+ */
+
+void register_write_memory_be(void *opaque, hwaddr addr, uint64_t value,
+  unsigned size);

[Qemu-devel] [PATCH v6 01/13] bitops: Add MAKE_64BIT_MASK macro

2016-05-12 Thread Alistair Francis
Add a macro that creates a 64-bit mask: 'length' consecutive ones,
shifted left by 'shift' bits.

Signed-off-by: Alistair Francis 
Reviewed-by: Alex Bennée 
---
V5:
 - Re-write to a 64-bit mask instead of ONES()
 - Re-order this patch in the series
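
A small sketch of what the macro evaluates to, with every argument
parenthesised. The wrapper make_mask64_checked() is hypothetical and not
part of this patch; it is only here to show one way a caller could
guard the full-width case, since shifting a 64-bit value by 64 is
undefined in C.

```c
#include <assert.h>
#include <stdint.h>

/* Same expansion as the macro in this patch, arguments parenthesised. */
#define MAKE_64BIT_MASK(shift, length) \
    (((1ull << (length)) - 1) << (shift))

/*
 * Hypothetical checked wrapper (not in the patch): 1ull << 64 is
 * undefined, so a full-width mask is produced without the shift.
 */
static uint64_t make_mask64_checked(unsigned shift, unsigned length)
{
    assert(length >= 1 && length <= 64 && shift <= 64 - length);
    if (length == 64) {
        return ~UINT64_C(0);
    }
    return MAKE_64BIT_MASK(shift, length);
}
```

So MAKE_64BIT_MASK(0, 8) is 0xff and MAKE_64BIT_MASK(4, 4) is 0xf0.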

 include/qemu/bitops.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/include/qemu/bitops.h b/include/qemu/bitops.h
index 755fdd1..3c45791 100644
--- a/include/qemu/bitops.h
+++ b/include/qemu/bitops.h
@@ -24,6 +24,9 @@
 #define BIT_WORD(nr)((nr) / BITS_PER_LONG)
 #define BITS_TO_LONGS(nr)   DIV_ROUND_UP(nr, BITS_PER_BYTE * sizeof(long))
 
+#define MAKE_64BIT_MASK(shift, length) \
+(((1ull << (length)) - 1) << (shift))
+
 /**
  * set_bit - Set a bit in memory
  * @nr: the bit to set
-- 
2.7.4




[Qemu-devel] [PATCH v6 08/13] xilinx_zynq: Connect devcfg to the Zynq machine model

2016-05-12 Thread Alistair Francis
From: Peter Crosthwaite 

Signed-off-by: Peter Crosthwaite 
Signed-off-by: Alistair Francis 
---
V4:
 - Small corrections to the device model logic

 hw/arm/xilinx_zynq.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/hw/arm/xilinx_zynq.c b/hw/arm/xilinx_zynq.c
index 98b17c9..ffea3be 100644
--- a/hw/arm/xilinx_zynq.c
+++ b/hw/arm/xilinx_zynq.c
@@ -293,6 +293,14 @@ static void zynq_init(MachineState *machine)
 sysbus_connect_irq(busdev, n + 1, pic[dma_irqs[n] - IRQ_OFFSET]);
 }
 
+dev = qdev_create(NULL, "xlnx.ps7-dev-cfg");
+object_property_add_child(qdev_get_machine(), "xlnx-devcfg", OBJECT(dev),
+  NULL);
+qdev_init_nofail(dev);
+busdev = SYS_BUS_DEVICE(dev);
+sysbus_connect_irq(busdev, 0, pic[40 - IRQ_OFFSET]);
+sysbus_mmio_map(busdev, 0, 0xF8007000);
+
 zynq_binfo.ram_size = ram_size;
 zynq_binfo.kernel_filename = kernel_filename;
 zynq_binfo.kernel_cmdline = kernel_cmdline;
-- 
2.7.4




[Qemu-devel] [PATCH v6 11/13] register: Add GPIO API

2016-05-12 Thread Alistair Francis
Add GPIO functionality to the register API. This allows association
and automatic connection of GPIOs to bits in registers. GPIO inputs
will attach to handlers that automatically set read-only bits in
registers. GPIO outputs will be updated to reflect their field value
when their respective registers are written (or reset). Supports
active low GPIOs.

This is particularly effective for implementing system-level
controllers, where heterogeneous collections of control signals are
placed in an SoC-specific peripheral and then propagated all over the
system.

Signed-off-by: Peter Crosthwaite 
[ EI Changes:
  * register: Add a polarity field to GPIO connections
  Makes it possible to directly connect active low signals
  to generic interrupt pins.
]
Signed-off-by: Edgar E. Iglesias 
Signed-off-by: Alistair Francis 
---
V6:
 - Don't use qdev_pass_all_gpios() as it doesn't seem to work
V5:
 - Remove RegisterAccessError struct
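
To illustrate the output refresh rule described above, here is a
standalone sketch of how a GPIO output level is derived from a register
value and when it counts as changed. field_extract64() is a local
stand-in for QEMU's extract64(); gpio_out_level() and
gpio_out_changed() are hypothetical names, and treating polarity as a
uint64_t is an assumption made for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local stand-in for extract64(): `length` bits starting at `start`. */
static uint64_t field_extract64(uint64_t value, unsigned start,
                                unsigned length)
{
    assert(start < 64 && length >= 1 && length <= 64 - start);
    return (value >> start) & (~UINT64_C(0) >> (64 - length));
}

/* Level driven on output `i` of a GPIO group: field value XOR polarity. */
static uint64_t gpio_out_level(uint64_t reg_val, unsigned bit_pos,
                               unsigned width, unsigned i, uint64_t polarity)
{
    return field_extract64(reg_val, bit_pos + i * width, width) ^ polarity;
}

/* An output is refreshed only if its level differs between old and new. */
static bool gpio_out_changed(uint64_t old_val, uint64_t new_val,
                             unsigned bit_pos, unsigned width, unsigned i,
                             uint64_t polarity)
{
    return gpio_out_level(old_val, bit_pos, width, i, polarity) !=
           gpio_out_level(new_val, bit_pos, width, i, polarity);
}
```

With polarity set to 1 a register bit of 1 drives the line low, which is
how an active low signal can be wired to a generic interrupt pin.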

 hw/core/register.c| 96 +++
 include/hw/register.h | 27 +++
 2 files changed, 123 insertions(+)

diff --git a/hw/core/register.c b/hw/core/register.c
index c68d510..306d196 100644
--- a/hw/core/register.c
+++ b/hw/core/register.c
@@ -101,6 +101,7 @@ void register_write(RegisterInfo *reg, uint64_t val, uint64_t we,
 }
 
 register_write_val(reg, new_val);
+register_refresh_gpios(reg, old_val, debug);
 
 if (ac->post_write) {
 ac->post_write(reg, new_val);
@@ -140,23 +141,118 @@ uint64_t register_read(RegisterInfo *reg, const char* prefix, bool debug)
 void register_reset(RegisterInfo *reg)
 {
 g_assert(reg);
+uint64_t old_val;
 
 if (!reg->data || !reg->access) {
 return;
 }
 
+old_val = register_read_val(reg);
+
 register_write_val(reg, reg->access->reset);
+register_refresh_gpios(reg, old_val, false);
+}
+
+void register_refresh_gpios(RegisterInfo *reg, uint64_t old_value, bool debug)
+{
+const RegisterAccessInfo *ac;
+const RegisterGPIOMapping *gpio;
+
+ac = reg->access;
+for (gpio = ac->gpios; gpio && gpio->name; gpio++) {
+int i;
+
+if (gpio->input) {
+continue;
+}
+
+for (i = 0; i < gpio->num; ++i) {
+uint64_t gpio_value, gpio_value_old;
+
+qemu_irq gpo = qdev_get_gpio_out_named(DEVICE(reg), gpio->name, i);
+gpio_value_old = extract64(old_value,
+   gpio->bit_pos + i * gpio->width,
+   gpio->width) ^ gpio->polarity;
+gpio_value = extract64(register_read_val(reg),
+   gpio->bit_pos + i * gpio->width,
+   gpio->width) ^ gpio->polarity;
+if (!(gpio_value_old ^ gpio_value)) {
+continue;
+}
+if (debug && gpo) {
+qemu_log("refreshing gpio out %s to %" PRIx64 "\n",
+ gpio->name, gpio_value);
+}
+qemu_set_irq(gpo, gpio_value);
+}
+}
+}
+
+typedef struct DeviceNamedGPIOHandlerOpaque {
+DeviceState *dev;
+const char *name;
+} DeviceNamedGPIOHandlerOpaque;
+
+static void register_gpio_handler(void *opaque, int n, int level)
+{
+DeviceNamedGPIOHandlerOpaque *gho = opaque;
+RegisterInfo *reg = REGISTER(gho->dev);
+
+const RegisterAccessInfo *ac;
+const RegisterGPIOMapping *gpio;
+
+ac = reg->access;
+for (gpio = ac->gpios; gpio && gpio->name; gpio++) {
+if (gpio->input && !strcmp(gho->name, gpio->name)) {
+register_write_val(reg, deposit64(register_read_val(reg),
+  gpio->bit_pos + n * gpio->width,
+  gpio->width,
+  level ^ gpio->polarity));
+return;
+}
+}
+
+abort();
 }
 
 void register_init(RegisterInfo *reg)
 {
 assert(reg);
+const RegisterAccessInfo *ac;
+const RegisterGPIOMapping *gpio;
 
 if (!reg->data || !reg->access) {
 return;
 }
 
 object_initialize((void *)reg, sizeof(*reg), TYPE_REGISTER);
+
+ac = reg->access;
+for (gpio = ac->gpios; gpio && gpio->name; gpio++) {
+if (!gpio->num) {
+((RegisterGPIOMapping *)gpio)->num = 1;
+}
+if (!gpio->width) {
+((RegisterGPIOMapping *)gpio)->width = 1;
+}
+if (gpio->input) {
+DeviceNamedGPIOHandlerOpaque gho = {
+.name = gpio->name,
+.dev = DEVICE(reg),
+};
+qemu_irq irq;
+
+qdev_init_gpio_in_named(DEVICE(reg), register_gpio_handler,
+gpio->name, gpio->num);
+irq = qdev_get_gpio_in_named(DEVICE(reg), gpio->name, gpio->num);

[Qemu-devel] [PATCH v6 04/13] register: Define REG and FIELD macros

2016-05-12 Thread Alistair Francis
From: Peter Crosthwaite 

Define some macros that can be used for defining registers and fields.

The REG32 macro will define A_FOO for the byte address of a register,
as well as R_FOO for the uint32_t[] register number (A_FOO / 4).

The FIELD macro will define R_FOO_BAR_MASK, R_FOO_BAR_SHIFT and
R_FOO_BAR_LENGTH constants for field BAR in register FOO.

Finally, there are some shorthand helpers for extracting/depositing
fields from registers based on these naming schemes.

Usage can greatly reduce the verbosity of device code.

The deposit and extract macros (e.g. F_EX32, AF_DP32 etc.) can be used
to generate extract and deposits without any repetition of the name
stems.

Signed-off-by: Peter Crosthwaite 
[ EI Changes:
  * Add Deposit macros
]
Signed-off-by: Edgar E. Iglesias 
Signed-off-by: Alistair Francis 
---
E.g. Currently you have to define something like:

\#define R_FOOREG (0x84/4)
\#define R_FOOREG_BARFIELD_SHIFT 10
\#define R_FOOREG_BARFIELD_LENGTH 5

uint32_t foobar_val = extract32(s->regs[R_FOOREG],
R_FOOREG_BARFIELD_SHIFT,
R_FOOREG_BARFIELD_LENGTH);

Which has:
2 macro definitions per field
3 register names ("FOOREG") per extract
2 field names ("BARFIELD") per extract

With these macros this becomes:

REG32(FOOREG, 0x84)
FIELD(FOOREG, BARFIELD, 10, 5)

uint32_t foobar_val = AF_EX32(s->regs, FOOREG, BARFIELD);

Which has:
1 macro definition per field
1 register name per extract
1 field name per extract

If you are not using arrays for the register data you can just use the
non-array "F_" variants and still save 2 name stems:

uint32_t foobar_val = F_EX32(s->fooreg, FOOREG, BARFIELD);

Deposit is similar for depositing values. Deposit has compile-time
overflow checking for literals.
For example:

REG32(XYZ1, 0x84)
FIELD(XYZ1, TRC, 0, 4)

/* Correctly set XYZ1.TRC = 5.  */
AF_DP32(s->regs, XYZ1, TRC, 5);

/* Incorrectly set XYZ1.TRC = 16.  */
AF_DP32(s->regs, XYZ1, TRC, 16);

The latter assignment results in:
warning: large integer implicitly truncated to unsigned type [-Woverflow]
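
For anyone who wants to try the naming scheme outside the QEMU tree,
here is a compilable sketch. REG32, FIELD, F_EX32 and AF_EX32 are copied
from the diff below. extract32() and deposit32() are simplified local
stand-ins for the QEMU bitops helpers, and the fooreg_*() accessors are
hypothetical names used only for this example. The deposit side calls
deposit32() directly with the generated constants, so it does not get
the compile-time overflow check that F_DP32 provides.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-ins for QEMU's extract32()/deposit32() bit helpers. */
static inline uint32_t extract32(uint32_t value, int start, int length)
{
    assert(start >= 0 && length > 0 && length <= 32 - start);
    return (value >> start) & (~0U >> (32 - length));
}

static inline uint32_t deposit32(uint32_t value, int start, int length,
                                 uint32_t fieldval)
{
    uint32_t mask;

    assert(start >= 0 && length > 0 && length <= 32 - start);
    mask = (~0U >> (32 - length)) << start;
    return (value & ~mask) | ((fieldval << start) & mask);
}

/* Macros as proposed in this patch. */
#define REG32(reg, addr)                                                  \
    enum { A_ ## reg = (addr) };                                          \
    enum { R_ ## reg = (addr) / 4 };

#define FIELD(reg, field, shift, length)                                  \
    enum { R_ ## reg ## _ ## field ## _SHIFT = (shift)};                  \
    enum { R_ ## reg ## _ ## field ## _LENGTH = (length)};                \
    enum { R_ ## reg ## _ ## field ## _MASK = (((1ULL << (length)) - 1)   \
                                          << (shift)) };

#define F_EX32(storage, reg, field)                                       \
    extract32((storage), R_ ## reg ## _ ## field ## _SHIFT,               \
              R_ ## reg ## _ ## field ## _LENGTH)

#define AF_EX32(regs, reg, field)                                         \
    F_EX32((regs)[R_ ## reg], reg, field)

REG32(FOOREG, 0x84)
FIELD(FOOREG, BARFIELD, 10, 5)

/* Hypothetical accessors: read and update FOOREG.BARFIELD in an array. */
static uint32_t fooreg_get_barfield(const uint32_t *regs)
{
    return AF_EX32(regs, FOOREG, BARFIELD);
}

static void fooreg_set_barfield(uint32_t *regs, uint32_t val)
{
    regs[R_FOOREG] = deposit32(regs[R_FOOREG], R_FOOREG_BARFIELD_SHIFT,
                               R_FOOREG_BARFIELD_LENGTH, val);
}
```

Here A_FOOREG is 0x84, R_FOOREG is 0x21 and R_FOOREG_BARFIELD_MASK is
0x7c00.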


 include/hw/register.h | 38 ++
 1 file changed, 38 insertions(+)

diff --git a/include/hw/register.h b/include/hw/register.h
index 786707b..e0aac91 100644
--- a/include/hw/register.h
+++ b/include/hw/register.h
@@ -157,4 +157,42 @@ void register_write_memory_le(void *opaque, hwaddr addr, uint64_t value,
 uint64_t register_read_memory_be(void *opaque, hwaddr addr, unsigned size);
 uint64_t register_read_memory_le(void *opaque, hwaddr addr, unsigned size);
 
+/* Define constants for a 32 bit register */
+#define REG32(reg, addr)  \
+enum { A_ ## reg = (addr) };  \
+enum { R_ ## reg = (addr) / 4 };
+
+/* Define SHIFT, LENGTH and MASK constants for a field within a register */
+#define FIELD(reg, field, shift, length)  \
+enum { R_ ## reg ## _ ## field ## _SHIFT = (shift)};  \
+enum { R_ ## reg ## _ ## field ## _LENGTH = (length)};\
+enum { R_ ## reg ## _ ## field ## _MASK = (((1ULL << (length)) - 1)   \
+  << (shift)) };
+
+/* Extract a field from a register */
+
+#define F_EX32(storage, reg, field)   \
+extract32((storage), R_ ## reg ## _ ## field ## _SHIFT,   \
+  R_ ## reg ## _ ## field ## _LENGTH)
+
+/* Extract a field from an array of registers */
+
+#define AF_EX32(regs, reg, field) \
+F_EX32((regs)[R_ ## reg], reg, field)
+
+/* Deposit a register field.  */
+
+#define F_DP32(storage, reg, field, val) ({   \
+struct {  \
+unsigned int v:R_ ## reg ## _ ## field ## _LENGTH;\
+} v = { .v = val };   \
+uint32_t d;   \
+d = deposit32((storage), R_ ## reg ## _ ## field ## _SHIFT,   \
+  R_ ## reg ## _ ## field ## _LENGTH, v.v);   \
+d; })
+
+/* Deposit a field to array of registers.  */
+
+#define AF_DP32(regs, reg, field, val)\
+(regs)[R_ ## reg] = F_DP32((regs)[R_ ## reg], reg, field, val);
 #endif
-- 
2.7.4




[Qemu-devel] [PATCH v6 00/13] data-driven device registers

2016-05-12 Thread Alistair Francis
This patch series is based on Peter C's original register API. His
original cover letter is below.

Future work: Allow support for memory attributes.

V6:
 - Small changes to the API based on Alex's comments
 - Remove 'register: Add support for decoding information' patch
 - Move prefix and debug into the RegisterInfoArray as it is the same
   for every register.
V5:
 - Only create a single memory region instead of a memory region for
   each register
 - General tidyups based on Alex's comments
V4:
 - Rebase and fix build issue
 - Simplify the register write logic
 - Other small fixes suggested by Alex Bennee
V3:
 - Small changes reported by Fred
V2:
 - Rebase
 - Fix up IOU SLCR connections
 - Add the memory_region_add_subregion_no_print() function and use it
   for the registers
Changes since RFC:
 - Connect the ZynqMP IOU SLCR device
 - Rebase

Original cover letter From Peter:
Hi All. This is a new scheme I've come up with for handling device registers in a
data driven way. My motivation for this is to factor out a lot of the access
checking that seems to be replicated in every device. See P1 commit message for
further discussion.

P1 is the main patch, adds the register definition functionality
P2-3,6 add helpers that glue the register API to the Memory API
P4 Defines a set of macros that minimise register and field definitions
P5 is QOMfication
P7 is a trivial
P10-13 Work up to GPIO support
P8,9,14 add new devices (the Xilinx Zynq devcfg & ZynqMP SLCR) that use this
scheme.
P15: Connect the ZynqMP SLCR device

This Zynq devcfg device was particularly finicky with per-bit restrictions.
I'm also looking for a higher-than-usual modelling fidelity
on the register space, with semantics defined for random reserved bits
in-between otherwise consistent fields.

Here's an example of the qemu_log output for the devcfg device. This is produced
by now generic sharable code:

/machine/unattached/device[44]:Addr 0x08:CFG: write of value 0508
/machine/unattached/device[44]:Addr 0x80:MCTRL: write of value 00800010
/machine/unattached/device[44]:Addr 0x10:INT_MASK: write of value 
/machine/unattached/device[44]:Addr :CTRL: write of value 0c00607f

And an example of a rogue guest banging on a bad bit:

/machine/unattached/device[44]:Addr 0x14:STATUS bits 0x01 may not be \
written to 1

A future feature I am interested in is implementing TCG optimisation of
side-effectless registers. The register API allows clear definition of
what registers have txn side effects and which ones don't. You could even
go a step further and translate such side-effectless accesses based on the
data pointer for the register.


Alistair Francis (6):
  bitops: Add MAKE_64BIT_MASK macro
  register: Add Register API
  register: Add Memory API glue
  dma: Add Xilinx Zynq devcfg device model
  register: Add GPIO API
  xlnx-zynqmp: Connect the ZynqMP IOU SLCR

Peter Crosthwaite (7):
  register: Define REG and FIELD macros
  register: QOMify
  register: Add block initialise helper
  xilinx_zynq: Connect devcfg to the Zynq machine model
  qdev: Define qdev_get_gpio_out
  irq: Add opaque setter routine
  misc: Introduce ZynqMP IOU SLCR

 default-configs/arm-softmmu.mak|   1 +
 hw/arm/xilinx_zynq.c   |   8 +
 hw/arm/xlnx-zynqmp.c   |  13 ++
 hw/core/Makefile.objs  |   1 +
 hw/core/irq.c  |   5 +
 hw/core/qdev.c |  12 +
 hw/core/register.c | 376 +++
 hw/dma/Makefile.objs   |   1 +
 hw/dma/xlnx-zynq-devcfg.c  | 396 +
 hw/misc/Makefile.objs  |   1 +
 hw/misc/xlnx-zynqmp-iou-slcr.c | 115 ++
 include/hw/arm/xlnx-zynqmp.h   |   2 +
 include/hw/dma/xlnx-zynq-devcfg.h  |  62 ++
 include/hw/irq.h   |   2 +
 include/hw/misc/xlnx-zynqmp-iou-slcr.h |  47 
 include/hw/qdev-core.h |   2 +
 include/hw/register.h  | 262 ++
 include/qemu/bitops.h  |   3 +
 18 files changed, 1309 insertions(+)
 create mode 100644 hw/core/register.c
 create mode 100644 hw/dma/xlnx-zynq-devcfg.c
 create mode 100644 hw/misc/xlnx-zynqmp-iou-slcr.c
 create mode 100644 include/hw/dma/xlnx-zynq-devcfg.h
 create mode 100644 include/hw/misc/xlnx-zynqmp-iou-slcr.h
 create mode 100644 include/hw/register.h

-- 
2.7.4




[Qemu-devel] [PATCH v6 02/13] register: Add Register API

2016-05-12 Thread Alistair Francis
This API provides some encapsulation of registers and factors out some
common functionality to common code. Bits of device state (usually MMIO
registers) often have all sorts of access restrictions and semantics
associated with them. This API allows you to define what those
restrictions are on a bit-by-bit basis.

Helper functions that observe the semantics defined by the
RegisterAccessInfo struct are then used to access the register.

Some features:
Bits can be marked as read_only (ro field)
Bits can be marked as write-1-clear (w1c field)
Bits can be marked as reserved (rsvd field)
Reset values can be defined (reset)
Bits can be marked clear on read (cor)
Pre and post action callbacks can be added to read and write ops
Verbose debugging info can be enabled/disabled

Useful for defining device register spaces in a data driven way. Cuts
down on a lot of the verbosity and repetition in the switch-case blocks
in the standard foo_mmio_read/write functions.

Also useful for automated generation of device models from hardware
design sources.

Signed-off-by: Peter Crosthwaite 
Signed-off-by: Alistair Francis 
Reviewed-by: Alex Bennée 
---
V5:
 - Convert to using only one memory region
V4:
 - Rebase
 - Remove the guest error masking
 - Simplify the unimplemented masking
 - Use the reserved value in the write calculations
 - Remove read_lite and write_lite
 - General fixes to asserts and log printing
V3:
 - Address some comments from Fred
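
As a reading aid, a standalone sketch of how register_write() below
combines the read-only, write-1-clear and reserved masks with the write
enable. The three expressions are taken from the diff. struct
access_masks and masked_write() are hypothetical names for this
example, and the logging and pre/post callbacks are left out.

```c
#include <assert.h>
#include <stdint.h>

/* Per-register bit masks; field names follow RegisterAccessInfo. */
struct access_masks {
    uint64_t ro;    /* read-only bits */
    uint64_t w1c;   /* write-1-to-clear bits */
    uint64_t rsvd;  /* reserved bits */
};

/* Compute the stored value after a guest write of `val` under `we`. */
static uint64_t masked_write(const struct access_masks *ac,
                             uint64_t old_val, uint64_t val, uint64_t we)
{
    /* Bits the guest may not set directly keep their old value. */
    uint64_t no_w_mask = ac->ro | ac->w1c | ac->rsvd | ~we;
    uint64_t new_val = (val & ~no_w_mask) | (old_val & no_w_mask);

    /* Writing 1 to a write-1-clear bit clears it. */
    new_val &= ~(val & ac->w1c);
    return new_val;
}
```

With ro = 0xf0 and w1c = 0x01, writing 0x5b over an old value of 0xa5
gives 0xaa: the upper nibble is preserved, bit 0 is cleared and bits 1
to 3 take the written value.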

 hw/core/Makefile.objs |   1 +
 hw/core/register.c| 149 ++
 include/hw/register.h | 110 +
 3 files changed, 260 insertions(+)
 create mode 100644 hw/core/register.c
 create mode 100644 include/hw/register.h

diff --git a/hw/core/Makefile.objs b/hw/core/Makefile.objs
index abb3560..bf95db5 100644
--- a/hw/core/Makefile.objs
+++ b/hw/core/Makefile.objs
@@ -14,4 +14,5 @@ common-obj-$(CONFIG_SOFTMMU) += machine.o
 common-obj-$(CONFIG_SOFTMMU) += null-machine.o
 common-obj-$(CONFIG_SOFTMMU) += loader.o
 common-obj-$(CONFIG_SOFTMMU) += qdev-properties-system.o
+common-obj-$(CONFIG_SOFTMMU) += register.o
 common-obj-$(CONFIG_PLATFORM_BUS) += platform-bus.o
diff --git a/hw/core/register.c b/hw/core/register.c
new file mode 100644
index 0000000..5e6f621
--- /dev/null
+++ b/hw/core/register.c
@@ -0,0 +1,149 @@
+/*
+ * Register Definition API
+ *
+ * Copyright (c) 2016 Xilinx Inc.
+ * Copyright (c) 2013 Peter Crosthwaite 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "hw/register.h"
+#include "hw/qdev.h"
+#include "qemu/log.h"
+
+static inline void register_write_val(RegisterInfo *reg, uint64_t val)
+{
+g_assert(reg->data);
+
+switch (reg->data_size) {
+case 1:
+*(uint8_t *)reg->data = val;
+break;
+case 2:
+*(uint16_t *)reg->data = val;
+break;
+case 4:
+*(uint32_t *)reg->data = val;
+break;
+case 8:
+*(uint64_t *)reg->data = val;
+break;
+default:
+g_assert_not_reached();
+}
+}
+
+static inline uint64_t register_read_val(RegisterInfo *reg)
+{
+switch (reg->data_size) {
+case 1:
+return *(uint8_t *)reg->data;
+case 2:
+return *(uint16_t *)reg->data;
+case 4:
+return *(uint32_t *)reg->data;
+case 8:
+return *(uint64_t *)reg->data;
+default:
+g_assert_not_reached();
+}
+return 0; /* unreachable */
+}
+
+void register_write(RegisterInfo *reg, uint64_t val, uint64_t we,
+const char* prefix, bool debug)
+{
+uint64_t old_val, new_val, test, no_w_mask;
+const RegisterAccessInfo *ac;
+
+assert(reg);
+
+ac = reg->access;
+
+if (!ac || !ac->name) {
+qemu_log_mask(LOG_GUEST_ERROR, "%s: write to undefined device state "
+  "(written value: %#" PRIx64 ")\n", prefix, val);
+return;
+}
+
+old_val = reg->data ? register_read_val(reg) : ac->reset;
+
+test = (old_val ^ val) & ac->rsvd;
+if (test) {
+qemu_log_mask(LOG_GUEST_ERROR, "%s: change of value in reserved bit"
+  "fields: %#" PRIx64 ")\n", prefix, test);
+}
+
+test = val & ac->unimp;
+if (test) {
+qemu_log_mask(LOG_UNIMP,
+  "%s:%s writing %#" PRIx64 " to unimplemented bits:" \
+  " %#" PRIx64 "",
+  prefix, reg->access->name, val, ac->unimp);
+}
+
+/* Create the no write mask based on the read only, write to clear and
+ * reserved bit masks.
+ */
+no_w_mask = ac->ro | ac->w1c | ac->rsvd | ~we;
+new_val = (val & ~no_w_mask) | (old_val & no_w_mask);
+new_val &= ~(val & ac->w1c);
+
+if (ac->pre_write) {
+new_val = 

Re: [Qemu-devel] [PATCH v5 4/6] qemu-io: Allow unaligned access by default

2016-05-12 Thread Eric Blake
On 05/12/2016 09:50 AM, Eric Blake wrote:
>> This breaks qemu-iotests 136 for raw. It's pretty obvious that this is a
>> test case problem (uses unaligned requests to test error accounting), so
>> I'm not dropping the patch, but please do send a follow-up.
> 
> ...which explains why I missed this failure with ./check -raw.  Will
> fix, and maybe I should have grepped a bit harder, since it is fairly
> obvious:
> 
> tests/qemu-iotests/136:# Two types of invalid operations:
> unaligned length and unaligned offset
> 
> I will also check if this needs updating:
> 
> tests/qemu-iotests/109:# qemu-img compare can't handle unaligned
> file sizes

Turns out the comment was stale, even before my recent patches, but I
didn't bother bisecting to find when qemu-img learned to handle
unaligned raw images.  But see my comments in my other mail on the patch
for this file: 'qemu-img compare' doesn't necessarily give the nicest of
error messages for unaligned files.

> 
> as both of those tests run under -raw but not -qcow2
> 
>>
>> Maybe negative length and offset work as a replacement.

Sadly, no, because cvtnum() doesn't like things larger than INT64_MAX,
so you can't pass in a negative number.  I added a new '-i' flag
instead; series now available for review.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org





[Qemu-devel] [PATCH 1/3] qemu-io: Fix missing getopt() updates

2016-05-12 Thread Eric Blake
Commit 770e0e0e [*] forgot to implement 'writev -f'.  Likewise,
commit c3e001c forgot to implement 'aio_write -u -z'.

[*] does it sound "ech0e" in here? :)

Signed-off-by: Eric Blake 
---
 qemu-io-cmds.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/qemu-io-cmds.c b/qemu-io-cmds.c
index 4a00bc6..415be25 100644
--- a/qemu-io-cmds.c
+++ b/qemu-io-cmds.c
@@ -1150,7 +1150,7 @@ static int writev_f(BlockBackend *blk, int argc, char **argv)
 int pattern = 0xcd;
 QEMUIOVector qiov;

-while ((c = getopt(argc, argv, "CqP:")) != -1) {
+while ((c = getopt(argc, argv, "CfqP:")) != -1) {
 switch (c) {
 case 'C':
 Cflag = true;
@@ -1595,7 +1595,7 @@ static int aio_write_f(BlockBackend *blk, int argc, char **argv)
 int flags = 0;

 ctx->blk = blk;
-while ((c = getopt(argc, argv, "CfqP:z")) != -1) {
+while ((c = getopt(argc, argv, "CfqP:uz")) != -1) {
 switch (c) {
 case 'C':
 ctx->Cflag = true;
-- 
2.5.5




[Qemu-devel] [PATCH 3/3] qemu-iotests: Fix regression in 136 on aio_read invalid

2016-05-12 Thread Eric Blake
Commit 093ea232 removed the ability for aio_read and aio_write
to artificially inflate the invalid statistics counters for
block devices, since it no longer flags unaligned offset or
length.  Add 'aio_read -i' and 'aio_write -i' to restore
the ability, and update test 136 to use it.

Reported-by: Kevin Wolf 
Signed-off-by: Eric Blake 
---
 qemu-io-cmds.c | 20 
 tests/qemu-iotests/136 | 18 +++---
 2 files changed, 23 insertions(+), 15 deletions(-)

diff --git a/qemu-io-cmds.c b/qemu-io-cmds.c
index 415be25..059b8ee 100644
--- a/qemu-io-cmds.c
+++ b/qemu-io-cmds.c
@@ -1476,6 +1476,7 @@ static void aio_read_help(void)
 " used to ensure all outstanding aio requests have been completed.\n"
 " -C, -- report statistics in a machine parsable format\n"
 " -P, -- use a pattern to verify read data\n"
+" -i, -- treat request as invalid, for exercising stats\n"
 " -v, -- dump buffer to standard output\n"
 " -q, -- quiet mode, do not show I/O statistics\n"
 "\n");
@@ -1488,7 +1489,7 @@ static const cmdinfo_t aio_read_cmd = {
 .cfunc  = aio_read_f,
 .argmin = 2,
 .argmax = -1,
-.args   = "[-Cqv] [-P pattern] off len [len..]",
+.args   = "[-Ciqv] [-P pattern] off len [len..]",
 .oneline= "asynchronously reads a number of bytes",
 .help   = aio_read_help,
 };
@@ -1499,7 +1500,7 @@ static int aio_read_f(BlockBackend *blk, int argc, char **argv)
 struct aio_ctx *ctx = g_new0(struct aio_ctx, 1);

 ctx->blk = blk;
-while ((c = getopt(argc, argv, "CP:qv")) != -1) {
+while ((c = getopt(argc, argv, "CP:iqv")) != -1) {
 switch (c) {
 case 'C':
 ctx->Cflag = true;
@@ -1512,6 +1513,11 @@ static int aio_read_f(BlockBackend *blk, int argc, char **argv)
 return 0;
 }
 break;
+case 'i':
+printf("injecting invalid read request\n");
+block_acct_invalid(blk_get_stats(blk), BLOCK_ACCT_READ);
+g_free(ctx);
+return 0;
 case 'q':
 ctx->qflag = true;
 break;
@@ -1569,6 +1575,7 @@ static void aio_write_help(void)
 " -P, -- use different pattern to fill file\n"
 " -C, -- report statistics in a machine parsable format\n"
 " -f, -- use Force Unit Access semantics\n"
+" -i, -- treat request as invalid, for exercising stats\n"
 " -q, -- quiet mode, do not show I/O statistics\n"
 " -u, -- with -z, allow unmapping\n"
 " -z, -- write zeroes using blk_aio_write_zeroes\n"
@@ -1582,7 +1589,7 @@ static const cmdinfo_t aio_write_cmd = {
 .cfunc  = aio_write_f,
 .argmin = 2,
 .argmax = -1,
-.args   = "[-Cfquz] [-P pattern] off len [len..]",
+.args   = "[-Cfiquz] [-P pattern] off len [len..]",
 .oneline= "asynchronously writes a number of bytes",
 .help   = aio_write_help,
 };
@@ -1595,7 +1602,7 @@ static int aio_write_f(BlockBackend *blk, int argc, char **argv)
 int flags = 0;

 ctx->blk = blk;
-while ((c = getopt(argc, argv, "CfqP:uz")) != -1) {
+while ((c = getopt(argc, argv, "CfiqP:uz")) != -1) {
 switch (c) {
 case 'C':
 ctx->Cflag = true;
@@ -1616,6 +1623,11 @@ static int aio_write_f(BlockBackend *blk, int argc, char **argv)
 return 0;
 }
 break;
+case 'i':
+printf("injecting invalid write request\n");
+block_acct_invalid(blk_get_stats(blk), BLOCK_ACCT_WRITE);
+g_free(ctx);
+return 0;
 case 'z':
 ctx->zflag = true;
 break;
diff --git a/tests/qemu-iotests/136 b/tests/qemu-iotests/136
index e8c6937..5e92c4b 100644
--- a/tests/qemu-iotests/136
+++ b/tests/qemu-iotests/136
@@ -226,18 +226,14 @@ sector = "%d"

 highest_offset = wr_ops * wr_size

-# Two types of invalid operations: unaligned length and unaligned offset
-for i in range(invalid_rd_ops / 2):
-ops.append("aio_read 0 511")
+# Block layer abstracts away unaligned length and offset, so we
+# can't trigger an invalid op with any addresses; use qemu-io's
+# invalid injection feature instead
+for i in range(invalid_rd_ops):
+ops.append("aio_read -i 0 512")

-for i in range(invalid_rd_ops / 2, invalid_rd_ops):
-ops.append("aio_read 13 512")
-
-for i in range(invalid_wr_ops / 2):
-ops.append("aio_write 0 511")
-
-for i in range(invalid_wr_ops / 2, invalid_wr_ops):
-ops.append("aio_write 13 512")
+for i in range(invalid_wr_ops):
+ops.append("aio_write -i 0 512")

 for i in range(failed_rd_ops):
 ops.append("aio_read %d 512" % bad_offset)
-- 
2.5.5




[Qemu-devel] [PATCH 2/3] qemu-iotests: Simplify 109 with unaligned qemu-img compare

2016-05-12 Thread Eric Blake
For some time now, qemu-img compare has been able to compare
unaligned images.  So we no longer need test 109's hack of
resizing to sector boundaries before invoking compare.

Signed-off-by: Eric Blake 

---
Note that qemu-img compare on unaligned images is still a bit
underwhelming on message quality:

$ printf abc > file1
$ printf ab > file2
$ qemu-img compare file1 file2
Content mismatch at offset 0!
$ printf 'ab\0' > file1
$ qemu-img compare file1 file2
Images are identical.

The first message should claim that the mismatch is at offset 2
(or in sector 0), rather than at offset 0; and the second message
might be wise to mention that the sizes differ even though the
contents read identically (since we pad out 0s to the end of the
sector for both raw files).  But improving that is unrelated to
this patch.
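
For illustration only, here is a toy model of why the two runs above
print what they do. It assumes the comparison works per 512-byte sector
with both files zero-padded to a sector boundary, as described above. It
is a Python sketch, not qemu-img's code; SECTOR_SIZE, pad_to_sector and
compare_sectors are made-up names.

```python
SECTOR_SIZE = 512  # assumed sector granularity of the comparison


def pad_to_sector(data):
    """Zero-pad data up to the next multiple of SECTOR_SIZE."""
    remainder = len(data) % SECTOR_SIZE
    if remainder:
        data += b"\0" * (SECTOR_SIZE - remainder)
    return data


def compare_sectors(a, b):
    """Return the byte offset of the first differing sector, or None."""
    a, b = pad_to_sector(a), pad_to_sector(b)
    # Treat the shorter image as if it continued with zeroes.
    length = max(len(a), len(b))
    a = a.ljust(length, b"\0")
    b = b.ljust(length, b"\0")
    for offset in range(0, length, SECTOR_SIZE):
        if a[offset:offset + SECTOR_SIZE] != b[offset:offset + SECTOR_SIZE]:
            # Only the start of the sector is known, not the exact byte.
            return offset
    return None


print(compare_sectors(b"abc", b"ab"))    # differing sector starts at 0
print(compare_sectors(b"ab\0", b"ab"))   # None: contents read identically
```

In this model the mismatch is reported at the start of the sector (0)
rather than at byte 2, and 'ab\0' versus 'ab' compares equal because the
padding hides the size difference.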
---
 tests/qemu-iotests/109 | 2 --
 tests/qemu-iotests/109.out | 4 
 2 files changed, 6 deletions(-)

diff --git a/tests/qemu-iotests/109 b/tests/qemu-iotests/109
index f980b0c..adf9889 100755
--- a/tests/qemu-iotests/109
+++ b/tests/qemu-iotests/109
@@ -104,8 +104,6 @@ for sample_img in empty.bochs iotest-dirtylog-10G-4M.vhdx parallels-v1 \
 $QEMU_IO -c 'read -P 0 0 64k' "$TEST_IMG" | _filter_qemu_io

 run_qemu "$TEST_IMG" "$TEST_IMG.src" "'format': 'raw'," "BLOCK_JOB_READY"
-# qemu-img compare can't handle unaligned file sizes
-$QEMU_IMG resize -f raw "$TEST_IMG.src" +0
 $QEMU_IMG compare -f raw -F raw "$TEST_IMG" "$TEST_IMG.src"
 done

diff --git a/tests/qemu-iotests/109.out b/tests/qemu-iotests/109.out
index 38bc073..7c797ed 100644
--- a/tests/qemu-iotests/109.out
+++ b/tests/qemu-iotests/109.out
@@ -143,7 +143,6 @@ read 65536/65536 bytes at offset 0
 {"return": {}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": 
"BLOCK_JOB_READY", "data": {"device": "src", "len": 2560, "offset": 2560, 
"speed": 0, "type": "mirror"}}
 {"return": [{"io-status": "ok", "device": "src", "busy": false, "len": 2560, 
"offset": 2560, "paused": false, "speed": 0, "ready": true, "type": "mirror"}]}
-Image resized.
 Warning: Image size mismatch!
 Images are identical.

@@ -164,7 +163,6 @@ read 65536/65536 bytes at offset 0
 {"return": {}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": 
"BLOCK_JOB_READY", "data": {"device": "src", "len": 31457280, "offset": 
31457280, "speed": 0, "type": "mirror"}}
 {"return": [{"io-status": "ok", "device": "src", "busy": false, "len": 
31457280, "offset": 31457280, "paused": false, "speed": 0, "ready": true, 
"type": "mirror"}]}
-Image resized.
 Warning: Image size mismatch!
 Images are identical.

@@ -185,7 +183,6 @@ read 65536/65536 bytes at offset 0
 {"return": {}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": 
"BLOCK_JOB_READY", "data": {"device": "src", "len": 327680, "offset": 327680, 
"speed": 0, "type": "mirror"}}
 {"return": [{"io-status": "ok", "device": "src", "busy": false, "len": 327680, 
"offset": 327680, "paused": false, "speed": 0, "ready": true, "type": 
"mirror"}]}
-Image resized.
 Warning: Image size mismatch!
 Images are identical.

@@ -206,7 +203,6 @@ read 65536/65536 bytes at offset 0
 {"return": {}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": 
"BLOCK_JOB_READY", "data": {"device": "src", "len": 2048, "offset": 2048, 
"speed": 0, "type": "mirror"}}
 {"return": [{"io-status": "ok", "device": "src", "busy": false, "len": 2048, 
"offset": 2048, "paused": false, "speed": 0, "ready": true, "type": "mirror"}]}
-Image resized.
 Warning: Image size mismatch!
 Images are identical.

-- 
2.5.5




[Qemu-devel] [PATCH 0/3] Fix recent qemu-iotests issues

2016-05-12 Thread Eric Blake
I introduced a couple of bugs in my recent qemu-io enhancements;
time to fix them back up now that the broken patches are already
part of mainline.

Eric Blake (3):
  qemu-io: Fix missing getopt() updates
  qemu-iotests: Simplify 109 with unaligned qemu-img compare
  qemu-iotests: Fix regression in 136 on aio_read invalid

 qemu-io-cmds.c | 22 +-
 tests/qemu-iotests/109 |  2 --
 tests/qemu-iotests/109.out |  4 
 tests/qemu-iotests/136 | 18 +++---
 4 files changed, 24 insertions(+), 22 deletions(-)

-- 
2.5.5




Re: [Qemu-devel] [PATCH 00/52] 680x0 instructions emulation

2016-05-12 Thread John Paul Adrian Glaubitz
Hi!

Now that qemu 2.6.0 has been released, what about making Laurent the
maintainer for the orphaned M68K target so that the 680x0 emulation
support can be merged?

What do the qemu maintainers think? Is there anything which speaks
against my suggestion?

Thanks,
Adrian

On 05/06/2016 11:54 AM, Laurent Vivier wrote:
> 
> 
> Le 06/05/2016 à 11:35, Andreas Schwab a écrit :
>> When I bootstrap gcc with the qemu built from your 680x0-master-dev
>> branch I get a bootstrap comparison failure for a lot of files.
>> Rerunning the stage1 compiler in aranym then produces object files that
>> are identical to what the stage2 compiler produced, thus some insn (or
>> combination thereof) only used in the stage1 compiler (which is built
>> unoptimized) isn't handled correctly yet.
> 
> Yes, I know this is not perfect, but keeping all these patches in my own
> tree doesn't help.
> 
> It's why I try to have them merged.
> 
> BTW, Adrian is using this branch (680x0-master-dev) for months to build
> Debian packages.
> 
> Thanks,
> Laurent
> 


-- 
 .''`.  John Paul Adrian Glaubitz
: :' :  Debian Developer - glaub...@debian.org
`. `'   Freie Universitaet Berlin - glaub...@physik.fu-berlin.de
  `-GPG: 62FF 8A75 84E0 2956 9546  0006 7426 3B37 F5B5 F913



Re: [Qemu-devel] [PATCH 00/52] 680x0 instructions emulation

2016-05-12 Thread John Paul Adrian Glaubitz
On 05/12/2016 11:29 PM, Alexander Graf wrote:
> Rest assured that we're all happy to see m68k finally going back to 
> maintained state ;).

Glad to hear that.  Laurent has been doing a fantastic job on m68k and
so far it has been a pleasure to help him improve the code with tests
and patches.

qemu-m68k has helped us in Debian to have the m68k port keep up with
the rest of the architectures.

Adrian

-- 
 .''`.  John Paul Adrian Glaubitz
: :' :  Debian Developer - glaub...@debian.org
`. `'   Freie Universitaet Berlin - glaub...@physik.fu-berlin.de
  `-GPG: 62FF 8A75 84E0 2956 9546  0006 7426 3B37 F5B5 F913



Re: [Qemu-devel] [PATCH 00/52] 680x0 instructions emulation

2016-05-12 Thread John Paul Adrian Glaubitz
On 05/12/2016 11:23 PM, Alexander Graf wrote:
> I expect he'll send a v2 of the patch set that fixes all review comments
> and includes the patch to MAINTAINERS. I don't see how applying only the
> MAINTAINERS patch would help anyone? Would it speed up his work to get v2 
> out? :)

Oh, sorry, I must have missed that part. I wasn't aware that Laurent
had already got enough feedback to work on a revised version. Then I will
be happy to wait, and I am looking forward to the updated patch set! :-)

Thanks for the heads-up!

Adrian

-- 
 .''`.  John Paul Adrian Glaubitz
: :' :  Debian Developer - glaub...@debian.org
`. `'   Freie Universitaet Berlin - glaub...@physik.fu-berlin.de
  `-GPG: 62FF 8A75 84E0 2956 9546  0006 7426 3B37 F5B5 F913



Re: [Qemu-devel] [PULL 64/69] qemu-io: Add 'write -f' to test FUA flag

2016-05-12 Thread Eric Blake
On 05/12/2016 08:35 AM, Kevin Wolf wrote:
> From: Eric Blake 
> 
> Make it easier to test block drivers with BDRV_REQ_FUA in
> .supported_write_flags, by adding the '-f' flag to qemu-io to
> conditionally pass the flag through to specific writes ('write',
> 'write -z', 'writev', 'aio_write', 'aio_write -z'). You'll want
> to use 'qemu-io -t none' to actually make -f useful (as
> otherwise, the default writethrough mode automatically sets the
> FUA bit on every write).
> 
> Signed-off-by: Eric Blake 
> Message-id: 1462677405-4752-6-git-send-email-ebl...@redhat.com
> Reviewed-by: Max Reitz 
> Signed-off-by: Max Reitz 
> ---
>  qemu-io-cmds.c | 57 +
>  1 file changed, 41 insertions(+), 16 deletions(-)
> 

> @@ -935,6 +938,7 @@ static void write_help(void)
> +" -f, -- use Force Unit Access semantics\n"

> @@ -951,7 +955,7 @@ static const cmdinfo_t write_cmd = {
> -.args   = "[-bcCqz] [-P pattern] off len",
> +.args   = "[-bcCfqz] [-P pattern] off len",

> @@ -969,7 +974,7 @@ static int write_f(BlockBackend *blk, int argc, char 
> **argv)
> -while ((c = getopt(argc, argv, "bcCpP:qz")) != -1) {
> +while ((c = getopt(argc, argv, "bcCfpP:qz")) != -1) {

> @@ -980,6 +985,9 @@ static int write_f(BlockBackend *blk, int argc, char 
> **argv)
>  case 'C':
>  Cflag = true;
>  break;
> +case 'f':

Four places to touch per added option.  'write -f' is okay...

> @@ -1097,6 +1110,7 @@ writev_help(void)
> +" -f, -- use Force Unit Access semantics\n"

> @@ -1108,7 +1122,7 @@ static const cmdinfo_t writev_cmd = {
> -.args   = "[-Cq] [-P pattern] off len [len..]",
> +.args   = "[-Cfq] [-P pattern] off len [len..]",

> @@ -1131,6 +1146,9 @@ static int writev_f(BlockBackend *blk, int argc, char 
> **argv)
> +case 'f':

Whoops - forgot to update getopt() for 'writev -f'. Followup patch
coming shortly.  I also botched it in 65/69, for 'aio_write -z -u'.
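
The failure mode is easy to reproduce outside qemu-io. Below is a minimal
sketch using Python's getopt module, which follows the same option-string
convention as C getopt(). The option strings and the helper name
parse_flags are illustrative assumptions, not copied from qemu-io-cmds.c.

```python
import getopt


def parse_flags(optstring, argv):
    """Return the sorted flags accepted from argv, or the rejected option."""
    try:
        opts, _rest = getopt.getopt(argv, optstring)
    except getopt.GetoptError as err:
        # The flag never reaches the switch statement at all.
        return "rejected: -%s" % err.opt
    return sorted(name.lstrip("-") for name, _value in opts)


args = ["-f", "-q", "0", "512"]
# Option string not updated: 'f' is unknown even if a "case 'f':" exists.
print(parse_flags("CqP:", args))
# Option string updated together with the switch: 'f' is accepted.
print(parse_flags("CfqP:", args))
```

So adding the case label, the help text and the .args string is not
enough; the getopt() string is the fourth place that has to change.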

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org





Re: [Qemu-devel] [PATCH v4 24/26] block: rip out all traces of password prompting

2016-05-12 Thread Eric Blake
On 02/29/2016 05:00 AM, Daniel P. Berrange wrote:
> Now that qcow & qcow2 are wired up to get encryption keys
> via the QCryptoSecret object, nothing is relying on the
> interactive prompting for passwords. All the code related
> to password prompting can thus be ripped out.
> 
> Signed-off-by: Daniel P. Berrange 
> ---
>  hmp.c | 31 -
>  hw/usb/dev-storage.c  | 34 
>  include/monitor/monitor.h |  7 -
>  include/qemu/osdep.h  |  2 --
>  monitor.c | 68 
> ---
>  qemu-img.c| 31 -
>  qemu-io.c | 21 ---
>  qmp.c | 10 +--
>  tests/qemu-iotests/087|  2 ++
>  util/oslib-posix.c| 66 -
>  util/oslib-win32.c| 24 -
>  11 files changed, 3 insertions(+), 293 deletions(-)

Missed a spot: in qapi-schema.json, human-monitor-command states:

# Notes: This command only exists as a stop-gap.  Its use is highly
#discouraged.  The semantics of this command are not guaranteed.
...
#   o Commands that prompt the user for data (eg. 'cont' when the block
# device is encrypted) don't currently work

but after your series, cont no longer prompts for passwords so the
comment is (or will be) stale and worth removing.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org





Re: [Qemu-devel] [PATCH v2 5/9] block: Remove bdrv_aio_multiwrite()

2016-05-12 Thread Eric Blake
On 04/27/2016 07:20 AM, Kevin Wolf wrote:
> Since virtio-blk implements request merging itself these days, the only
> remaining users are test cases for the function. That doesn't make the
> function exactly useful any more.
> 
> Signed-off-by: Kevin Wolf 
> Reviewed-by: Max Reitz 
> ---

> +++ b/tests/qemu-iotests/136
> @@ -248,14 +248,6 @@ sector = "%d"
>  if failed_wr_ops > 0:
>  highest_offset = max(highest_offset, bad_offset + 512)
>  
> -for i in range(wr_merged):
> -first = i * wr_size * 2
> -second = first + wr_size
> -ops.append("multiwrite %d %d ; %d %d" %
> -   (first, wr_size, second, wr_size))
> -
> -highest_offset = max(highest_offset, wr_merged * wr_size * 2)
> -

Why not delete the wr_merged parameter from do_test_stats()...

>  # Now perform all operations
>  for op in ops:
>  self.vm.hmp_qemu_io("drive0", op)
> @@ -309,19 +301,15 @@ sector = "%d"
>  def test_flush(self):
>  self.do_test_stats(flush_ops = 8)
>  
> -def test_merged(self):
> -for i in range(5):
> -self.do_test_stats(wr_merged = i * 3)
> -
>  def test_all(self):
>  # rd_size, rd_ops, wr_size, wr_ops, flush_ops
>  # invalid_rd_ops,  invalid_wr_ops,
>  # failed_rd_ops,   failed_wr_ops
>  # wr_merged
> -test_values = [[512,1, 512,   1, 1, 4, 7, 5, 2, 1],
> -   [65536,  1, 2048, 12, 7, 7, 5, 2, 5, 5],
> -   [32768,  9, 8192,  1, 4, 3, 2, 4, 6, 4],
> -   [16384, 11, 3584, 16, 9, 8, 6, 7, 3, 4]]
> +test_values = [[512,1, 512,   1, 1, 4, 7, 5, 2, 0],

as well as remove the # wr_merged comment and the now-useless final
member of each test_values[] array entry?

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org





Re: [Qemu-devel] [RFC PATCH v3 3/3] VFIO Type1 IOMMU change: to support with iommu and without iommu

2016-05-12 Thread Neo Jia
On Thu, May 12, 2016 at 01:05:52PM -0600, Alex Williamson wrote:
> On Thu, 12 May 2016 08:00:36 +
> "Tian, Kevin"  wrote:
> 
> > > From: Alex Williamson [mailto:alex.william...@redhat.com]
> > > Sent: Thursday, May 12, 2016 6:06 AM
> > > 
> > > On Wed, 11 May 2016 17:15:15 +0800
> > > Jike Song  wrote:
> > >   
> > > > On 05/11/2016 12:02 AM, Neo Jia wrote:  
> > > > > On Tue, May 10, 2016 at 03:52:27PM +0800, Jike Song wrote:  
> > > > >> On 05/05/2016 05:27 PM, Tian, Kevin wrote:  
> > > > >>>> From: Song, Jike
> > > > >>>>
> > > > >>>> IIUC, an api-only domain is a VFIO domain *without* underlying
> > > > >>>> IOMMU
> > > > >>>> hardware. It just, as you said in another mail, "rather than
> > > > >>>> programming them into an IOMMU for a device, it simply stores the
> > > > >>>> translations for use by later requests".
> > > > >>>>
> > > > >>>> That imposes a constraint on gfx driver: hardware IOMMU must be
> > > > >>>> disabled.
> > > > >>>> Otherwise, if IOMMU is present, the gfx driver eventually programs
> > > > >>>> the hardware IOMMU with IOVA returned by pci_map_page or
> > > > >>>> dma_map_page;
> > > > >>>> Meanwhile, the IOMMU backend for vgpu only maintains GPA <-> HPA
> > > > >>>> translations without any knowledge about hardware IOMMU, how is the
> > > > >>>> device model supposed to do to get an IOVA for a given GPA
> > > > >>>> (thereby HPA
> > > > >>>> by the IOMMU backend here)?
> > > > >>>>
> > > > >>>> If things go as guessed above, as vfio_pin_pages() indicates, it
> > > > >>>> pin & translate vaddr to PFN, then it will be very difficult for
> > > > >>>> the
> > > > >>>> device model to figure out:
> > > > >>>>
> > > > >>>>   1, for a given GPA, how to avoid calling dma_map_page multiple
> > > > >>>>   times?
> > > > >>>>   2, for which page to call dma_unmap_page?
> > > > >>>>
> > > > >>>> --
> > > > >>>
> > > > >>> We have to support both w/ iommu and w/o iommu case, since
> > > > >>> that fact is out of GPU driver control. A simple way is to use
> > > > >>> dma_map_page which internally will cope with w/ and w/o iommu
> > > > >>> case gracefully, i.e. return HPA w/o iommu and IOVA w/ iommu.
> > > > >>> Then in this file we only need to cache GPA to whatever dma_addr_t
> > > > >>> returned by dma_map_page.
> > > > >>>  
> > > > >>
> > > > >> Hi Alex, Kirti and Neo, any thought on the IOMMU compatibility here? 
> > > > >>  
> > > > >
> > > > > Hi Jike,
> > > > >
> > > > > With mediated passthru, you still can use hardware iommu, but more 
> > > > > important
> > > > > that part is actually orthogonal to what we are discussing here as we 
> > > > > will only
> > > > > cache the mapping between , 
> > > > > once we
> > > > > have pinned pages later with the help of above info, you can map it 
> > > > > into the
> > > > > proper iommu domain if the system has configured so.
> > > > >  
> > > >
> > > > Hi Neo,
> > > >
> > > > Technically yes you can map a pfn into the proper IOMMU domain 
> > > > elsewhere,
> > > > but to find out whether a pfn was previously mapped or not, you have to
> > > > track it with another rbtree-alike data structure (the IOMMU driver 
> > > > simply
> > > > doesn't bother with tracking), that seems somehow duplicate with the 
> > > > vGPU
> > > > IOMMU backend we are discussing here.
> > > >
> > > > And it is also semantically correct for an IOMMU backend to handle both 
> > > > w/
> > > > and w/o an IOMMU hardware? :)  
> > > 
> > > A problem with the iommu doing the dma_map_page() though is for what
> > > device does it do this?  In the mediated case the vfio infrastructure
> > > is dealing with a software representation of a device.  For all we
> > > know that software model could transparently migrate from one physical
> > > GPU to another.  There may not even be a physical device backing
> > > the mediated device.  Those are details left to the vgpu driver itself.  
> > 
> > This is a fair argument. VFIO iommu driver simply serves user space
> > requests, where only vaddr<->iova (essentially gpa in kvm case) is
> > mattered. How iova is mapped into real IOMMU is not VFIO's interest.
> > 
> > > 
> > > Perhaps one possibility would be to allow the vgpu driver to register
> > > map and unmap callbacks.  The unmap callback might provide the
> > > invalidation interface that we're so far missing.  The combination of
> > > map and unmap callbacks might simplify the Intel approach of pinning the
> > > entire VM memory space, ie. for each map callback do a translation
> > > (pin) and dma_map_page, for each unmap do a dma_unmap_page and release
> > > the translation.  There's still the problem of where that dma_addr_t
> > > from the dma_map_page is stored though.  Someone would need to keep
> > > track of iova to dma_addr_t.  The vfio iommu might be a place to do
> > > that since we're already tracking information based on iova, possibly
> > > in an opaque data element provided by the vgpu driver.  However, 

Re: [Qemu-devel] [RFC PATCH v3 3/3] VFIO Type1 IOMMU change: to support with iommu and without iommu

2016-05-12 Thread Neo Jia
On Thu, May 12, 2016 at 12:11:00PM +0800, Jike Song wrote:
> On Thu, May 12, 2016 at 6:06 AM, Alex Williamson
>  wrote:
> > On Wed, 11 May 2016 17:15:15 +0800
> > Jike Song  wrote:
> >
> >> On 05/11/2016 12:02 AM, Neo Jia wrote:
> >> > On Tue, May 10, 2016 at 03:52:27PM +0800, Jike Song wrote:
> >> >> On 05/05/2016 05:27 PM, Tian, Kevin wrote:
> >> >>>> From: Song, Jike
> >> >>>>
> >> >>>> IIUC, an api-only domain is a VFIO domain *without* underlying IOMMU
> >> >>>> hardware. It just, as you said in another mail, "rather than
> >> >>>> programming them into an IOMMU for a device, it simply stores the
> >> >>>> translations for use by later requests".
> >> >>>>
> >> >>>> That imposes a constraint on gfx driver: hardware IOMMU must be
> >> >>>> disabled.
> >> >>>> Otherwise, if IOMMU is present, the gfx driver eventually programs
> >> >>>> the hardware IOMMU with IOVA returned by pci_map_page or dma_map_page;
> >> >>>> Meanwhile, the IOMMU backend for vgpu only maintains GPA <-> HPA
> >> >>>> translations without any knowledge about hardware IOMMU, how is the
> >> >>>> device model supposed to do to get an IOVA for a given GPA (thereby
> >> >>>> HPA
> >> >>>> by the IOMMU backend here)?
> >> >>>>
> >> >>>> If things go as guessed above, as vfio_pin_pages() indicates, it
> >> >>>> pin & translate vaddr to PFN, then it will be very difficult for the
> >> >>>> device model to figure out:
> >> >>>>
> >> >>>>   1, for a given GPA, how to avoid calling dma_map_page multiple times?
> >> >>>>   2, for which page to call dma_unmap_page?
> >> >>>>
> >> >>>> --
> >> >>>
> >> >>> We have to support both w/ iommu and w/o iommu case, since
> >> >>> that fact is out of GPU driver control. A simple way is to use
> >> >>> dma_map_page which internally will cope with w/ and w/o iommu
> >> >>> case gracefully, i.e. return HPA w/o iommu and IOVA w/ iommu.
> >> >>> Then in this file we only need to cache GPA to whatever dma_addr_t
> >> >>> returned by dma_map_page.
> >> >>>
> >> >>
> >> >> Hi Alex, Kirti and Neo, any thought on the IOMMU compatibility here?
> >> >
> >> > Hi Jike,
> >> >
> >> > With mediated passthru, you still can use hardware iommu, but more 
> >> > important
> >> > that part is actually orthogonal to what we are discussing here as we 
> >> > will only
> >> > cache the mapping between , 
> >> > once we
> >> > have pinned pages later with the help of above info, you can map it into 
> >> > the
> >> > proper iommu domain if the system has configured so.
> >> >
> >>
> >> Hi Neo,
> >>
> >> Technically yes you can map a pfn into the proper IOMMU domain elsewhere,
> >> but to find out whether a pfn was previously mapped or not, you have to
> >> track it with another rbtree-alike data structure (the IOMMU driver simply
> >> doesn't bother with tracking), that seems somehow duplicate with the vGPU
> >> IOMMU backend we are discussing here.
> >>
> >> And it is also semantically correct for an IOMMU backend to handle both w/
> >> and w/o an IOMMU hardware? :)
> >
> > A problem with the iommu doing the dma_map_page() though is for what
> > device does it do this?  In the mediated case the vfio infrastructure
> > is dealing with a software representation of a device.  For all we
> > know that software model could transparently migrate from one physical
> > GPU to another.  There may not even be a physical device backing
> > the mediated device.  Those are details left to the vgpu driver itself.
> >
> 
> Great point :) Yes, I agree it's a bit intrusive to do the mapping for
> a particular pdev in a vGPU IOMMU BE.
> 
> > Perhaps one possibility would be to allow the vgpu driver to register
> > map and unmap callbacks.  The unmap callback might provide the
> > invalidation interface that we're so far missing.  The combination of
> > map and unmap callbacks might simplify the Intel approach of pinning the
> > entire VM memory space, ie. for each map callback do a translation
> > (pin) and dma_map_page, for each unmap do a dma_unmap_page and release
> > the translation.
> 
> Yes, adding map/unmap ops in the pGPU driver (I assume you are referring to
> gpu_device_ops as implemented in Kirti's patch) sounds like a good idea,
> satisfying both: 1) keeping vGPU purely virtual; 2) dealing with the Linux
> DMA API to achieve hardware IOMMU compatibility.
> 
> PS, this has very little to do with pinning wholly or partially. Intel KVMGT
> once had the whole guest memory pinned, only because we used a spinlock,
> which can't sleep at runtime.  We have removed that spinlock in another
> upstreaming effort, not here but for the i915 driver, so probably no biggie.
> 

OK, then you guys don't need to pin everything. The next question is whether
you can send the pinning requests from your mediated driver backend, as we
have demonstrated in the v3 patch with the functions vfio_pin_pages and
vfio_unpin_pages?
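
To make the bookkeeping questions quoted above concrete (how to avoid
calling dma_map_page more than once for a GPA, and which page to unmap),
here is a rough sketch of a refcounted GPA -> DMA address cache. It is a
Python model under stated assumptions, not kernel code: DmaMapCache,
map_page and unmap_page are hypothetical names, and the two callbacks
merely stand in for dma_map_page()/dma_unmap_page().

```python
class DmaMapCache:
    """Refcounted GPA -> DMA address cache (illustrative only)."""

    def __init__(self, map_page, unmap_page):
        self._map_page = map_page      # stand-in for dma_map_page()
        self._unmap_page = unmap_page  # stand-in for dma_unmap_page()
        self._entries = {}             # gpa -> [dma_addr, refcount]

    def map(self, gpa):
        """Map gpa on first use; later calls only bump the refcount."""
        entry = self._entries.get(gpa)
        if entry is None:
            entry = [self._map_page(gpa), 0]
            self._entries[gpa] = entry
        entry[1] += 1
        return entry[0]

    def unmap(self, gpa):
        """Drop one reference; unmap the DMA address on the last one."""
        entry = self._entries[gpa]
        entry[1] -= 1
        if entry[1] == 0:
            self._unmap_page(entry[0])
            del self._entries[gpa]


mapped, unmapped = [], []


def fake_map(gpa):
    mapped.append(gpa)
    return gpa | 0x80000000  # pretend this is the IOVA/HPA handed back


cache = DmaMapCache(fake_map, unmapped.append)
first = cache.map(0x1000)
second = cache.map(0x1000)   # served from the cache, no second mapping
cache.unmap(0x1000)
cache.unmap(0x1000)          # last reference: the page is unmapped here
print(first == second, mapped, unmapped)
```

Whoever owns such a table (the vfio iommu backend or the vendor driver)
is exactly the open question in this thread; the sketch only shows the
tracking itself.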

Thanks,
Neo

> 
> > There's still the problem of where that 

Re: [Qemu-devel] [RFC v2 04/11] tcg: comment on which functions have to be called with tb_lock held

2016-05-12 Thread Sergey Fedorov
On 11/05/16 16:46, Paolo Bonzini wrote:
> On 11/05/2016 15:36, Sergey Fedorov wrote:
>> On 11/05/16 15:58, Paolo Bonzini wrote:
>>> On 06/05/2016 20:22, Sergey Fedorov wrote:
 However, there's no sensible description of what is protected by tb_lock
 and mmap_lock. I think we need to have a clear documented description of
 the TCG locking scheme in order to be sure we do right things in MTTCG.
>>> I think there was such a patch somewhere, but: tb_lock basically
>>> protects tcg_ctx, while mmap_lock protects the user-mode emulation page
>>> table (the equivalent for system emulation is the memory map which is
>>> protected by the BQL).  Furthermore, mmap_lock must be taken outside
>>> tb_lock.
>> What's a user-mode emulation page table? 'l1_map'?
> Yes.  It's used beyond TCG in user-mode emulation.
>
>> It is used by system
>> emulation to keep track of TBs per page and 'code_bitmap'. Shouldn't it
>> be protected with 'mmap_lock' in system emulation?
> tb_lock is used instead because it's taken everywhere system emulation
> uses l1_map; so tb_lock is protecting l1_map too in system emulation.
>
> As mentioned above, user-mode emulation uses l1_map in linux-user/mmap.c
> via page_{get,set}_flags, which I guess is why the lock is separate.
> None of us was involved in the original multi-threaded linux-user work,
> we're reverse engineering it just like you. :)

While investigating 'tb_lock' and 'mmap_lock' usage, I am wondering:
why not put 'l1_map' into 'tcg_ctx'?

Kind regards,
Sergey



Re: [Qemu-devel] [PATCH 7/7] ipmi: Add ACPI to the SMBus IPMI device

2016-05-12 Thread Michael S. Tsirkin
On Thu, May 12, 2016 at 02:20:25PM -0500, Corey Minyard wrote:
> On 05/12/2016 08:35 AM, Michael S. Tsirkin wrote:
> >On Thu, May 12, 2016 at 08:32:51AM -0500, Corey Minyard wrote:
> >>On 05/12/2016 02:36 AM, Michael S. Tsirkin wrote:
> >>>On Wed, May 11, 2016 at 02:46:06PM -0500, miny...@acm.org wrote:
> From: Corey Minyard 
> 
> Signed-off-by: Corey Minyard 
> ---
>   hw/ipmi/smbus_ipmi.c | 1 +
>   1 file changed, 1 insertion(+)
> 
> diff --git a/hw/ipmi/smbus_ipmi.c b/hw/ipmi/smbus_ipmi.c
> index 4e7203b..3a34aaf 100644
> --- a/hw/ipmi/smbus_ipmi.c
> +++ b/hw/ipmi/smbus_ipmi.c
> @@ -167,6 +167,7 @@ static void smbus_ipmi_realize(DeviceState *dev, 
> Error **errp)
>   sid->fwinfo.base_address = sid->parent.i2c.address;
>   sid->fwinfo.memspace = IPMI_MEMSPACE_SMBUS;
>   sid->fwinfo.register_spacing = 1;
> +sid->fwinfo.acpi_parent = "\\_SB.PCI0.SMB0";
> >>>I don't think it's a good idea to spread things like PCI0
> >>>outside acpi-build.c. Why do you want to pass in the path
> >>>at all?
> >>You have to define the namespace for the ASL definition,
> >>and you have to give the name in the serial bus definition.
> >>However, thinking it through some more, this name needs
> >>to come from the I2C device, not just hard-coded here.
> >>
> >>-corey
> >For now you can put it as PCI0 within aml-build.c.
> >
> >Also do we need \\_SB.PCI0? Why not just create the
> >device within PCI0 scope?
> I'm not sure I follow here. \SB.PCI0.SMB0 is the proper
> scope to identify which SMBus the device is on.  This
> is how the ISA devices are added, for instance.

We add most of them within PCI0 scope instead.
E.g.

Device (PCI0) {
Device (SMB0) {
}
}

This way the device does not need to know where it is.
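
As a rough illustration of that point, here is a toy namespace tree in
Python where the full path is derived from the enclosing scopes instead
of being hard-coded in the device. AcpiNode and path() are hypothetical
names, not QEMU's AML builder API.

```python
class AcpiNode:
    """Toy namespace node; the path comes from the parent chain."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)

    def path(self):
        # The root is prefixed with a backslash, children are dot-separated.
        if self.parent is None:
            return "\\" + self.name
        return self.parent.path() + "." + self.name


sb = AcpiNode("_SB")
pci0 = AcpiNode("PCI0", sb)
smb0 = AcpiNode("SMB0", pci0)   # created inside the PCI0 scope
print(smb0.path())
```

The SMB0 node only knows its own name and its parent; nothing below
PCI0 has to spell out "PCI0" itself.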


> What I've done now to fix this is added an ACPI namespace
> to the I2C bus structure and stored the value for the I2C
> bus there in the i386 code.
> 
> Maybe it should be in BusState?  That way the ISA IPMI code
> could pull it from the bus, too.
> 
> Putting PCI0 into aml-build.c (or hw/acpi/ipmi.c, really) seems
> like a violation of scope.
> 
> -corey

Generally we scan each bus and list devices found there.


>   ipmi_add_fwinfo(&sid->fwinfo, errp);
>   }
> -- 
> 2.7.4



Re: [Qemu-devel] [PATCH 7/7] ipmi: Add ACPI to the SMBus IPMI device

2016-05-12 Thread Corey Minyard

On 05/12/2016 08:35 AM, Michael S. Tsirkin wrote:
> On Thu, May 12, 2016 at 08:32:51AM -0500, Corey Minyard wrote:
>> On 05/12/2016 02:36 AM, Michael S. Tsirkin wrote:
>>> On Wed, May 11, 2016 at 02:46:06PM -0500, miny...@acm.org wrote:
>>>> From: Corey Minyard 
>>>>
>>>> Signed-off-by: Corey Minyard 
>>>> ---
>>>>   hw/ipmi/smbus_ipmi.c | 1 +
>>>>   1 file changed, 1 insertion(+)
>>>>
>>>> diff --git a/hw/ipmi/smbus_ipmi.c b/hw/ipmi/smbus_ipmi.c
>>>> index 4e7203b..3a34aaf 100644
>>>> --- a/hw/ipmi/smbus_ipmi.c
>>>> +++ b/hw/ipmi/smbus_ipmi.c
>>>> @@ -167,6 +167,7 @@ static void smbus_ipmi_realize(DeviceState *dev, Error **errp)
>>>>   sid->fwinfo.base_address = sid->parent.i2c.address;
>>>>   sid->fwinfo.memspace = IPMI_MEMSPACE_SMBUS;
>>>>   sid->fwinfo.register_spacing = 1;
>>>> +sid->fwinfo.acpi_parent = "\\_SB.PCI0.SMB0";
>>> I don't think it's a good idea to spread things like PCI0
>>> outside acpi-build.c. Why do you want to pass in the path
>>> at all?
>> You have to define the namespace for the ASL definition,
>> and you have to give the name in the serial bus definition.
>> However, thinking it through some more, this name needs
>> to come from the I2C device, not just hard-coded here.
>>
>> -corey
> For now you can put it as PCI0 within aml-build.c.
>
> Also do we need \\_SB.PCI0? Why not just create the
> device within PCI0 scope?

I'm not sure I follow here. \SB.PCI0.SMB0 is the proper
scope to identify which SMBus the device is on.  This
is how the ISA devices are added, for instance.

What I've done now to fix this is added an ACPI namespace
to the I2C bus structure and stored the value for the I2C
bus there in the i386 code.

Maybe it should be in BusState?  That way the ISA IPMI code
could pull it from the bus, too.

Putting PCI0 into aml-build.c (or hw/acpi/ipmi.c, really) seems
like a violation of scope.

-corey

>>>>   ipmi_add_fwinfo(&sid->fwinfo, errp);
>>>>   }
>>>> --
>>>> 2.7.4





Re: [Qemu-devel] [RFC PATCH v3 3/3] VFIO Type1 IOMMU change: to support with iommu and without iommu

2016-05-12 Thread Alex Williamson
On Thu, 12 May 2016 08:00:36 +
"Tian, Kevin"  wrote:

> > From: Alex Williamson [mailto:alex.william...@redhat.com]
> > Sent: Thursday, May 12, 2016 6:06 AM
> > 
> > On Wed, 11 May 2016 17:15:15 +0800
> > Jike Song  wrote:
> >   
> > > On 05/11/2016 12:02 AM, Neo Jia wrote:  
> > > > On Tue, May 10, 2016 at 03:52:27PM +0800, Jike Song wrote:  
> > > >> On 05/05/2016 05:27 PM, Tian, Kevin wrote:  
> > > >>>> From: Song, Jike
> > > >>>>
> > > >>>> IIUC, an api-only domain is a VFIO domain *without* underlying IOMMU
> > > >>>> hardware. It just, as you said in another mail, "rather than
> > > >>>> programming them into an IOMMU for a device, it simply stores the
> > > >>>> translations for use by later requests".
> > > >>>>
> > > >>>> That imposes a constraint on gfx driver: hardware IOMMU must be
> > > >>>> disabled.
> > > >>>> Otherwise, if IOMMU is present, the gfx driver eventually programs
> > > >>>> the hardware IOMMU with IOVA returned by pci_map_page or
> > > >>>> dma_map_page;
> > > >>>> Meanwhile, the IOMMU backend for vgpu only maintains GPA <-> HPA
> > > >>>> translations without any knowledge about hardware IOMMU, how is the
> > > >>>> device model supposed to do to get an IOVA for a given GPA (thereby
> > > >>>> HPA
> > > >>>> by the IOMMU backend here)?
> > > >>>>
> > > >>>> If things go as guessed above, as vfio_pin_pages() indicates, it
> > > >>>> pin & translate vaddr to PFN, then it will be very difficult for the
> > > >>>> device model to figure out:
> > > >>>>
> > > >>>>   1, for a given GPA, how to avoid calling dma_map_page multiple
> > > >>>>   times?
> > > >>>>   2, for which page to call dma_unmap_page?
> > > >>>>
> > > >>>> --
> > > >>>
> > > >>> We have to support both w/ iommu and w/o iommu case, since
> > > >>> that fact is out of GPU driver control. A simple way is to use
> > > >>> dma_map_page which internally will cope with w/ and w/o iommu
> > > >>> case gracefully, i.e. return HPA w/o iommu and IOVA w/ iommu.
> > > >>> Then in this file we only need to cache GPA to whatever dma_addr_t
> > > >>> returned by dma_map_page.
> > > >>>  
> > > >>
> > > >> Hi Alex, Kirti and Neo, any thought on the IOMMU compatibility here?  
> > > >
> > > > Hi Jike,
> > > >
> > > > With mediated passthru, you still can use hardware iommu, but more 
> > > > important
> > > > that part is actually orthogonal to what we are discussing here as we 
> > > > will only
> > > > cache the mapping between , 
> > > > once we
> > > > have pinned pages later with the help of above info, you can map it 
> > > > into the
> > > > proper iommu domain if the system has configured so.
> > > >  
> > >
> > > Hi Neo,
> > >
> > > Technically yes you can map a pfn into the proper IOMMU domain elsewhere,
> > > but to find out whether a pfn was previously mapped or not, you have to
> > > track it with another rbtree-alike data structure (the IOMMU driver simply
> > > doesn't bother with tracking), that seems somehow duplicate with the vGPU
> > > IOMMU backend we are discussing here.
> > >
> > > And it is also semantically correct for an IOMMU backend to handle both w/
> > > and w/o an IOMMU hardware? :)  
> > 
> > A problem with the iommu doing the dma_map_page() though is for what
> > device does it do this?  In the mediated case the vfio infrastructure
> > is dealing with a software representation of a device.  For all we
> > know that software model could transparently migrate from one physical
> > GPU to another.  There may not even be a physical device backing
> > the mediated device.  Those are details left to the vgpu driver itself.  
> 
> This is a fair argument. VFIO iommu driver simply serves user space
> requests, where only vaddr<->iova (essentially gpa in kvm case) is
> mattered. How iova is mapped into real IOMMU is not VFIO's interest.
> 
> > 
> > Perhaps one possibility would be to allow the vgpu driver to register
> > map and unmap callbacks.  The unmap callback might provide the
> > invalidation interface that we're so far missing.  The combination of
> > map and unmap callbacks might simplify the Intel approach of pinning the
> > entire VM memory space, ie. for each map callback do a translation
> > (pin) and dma_map_page, for each unmap do a dma_unmap_page and release
> > the translation.  There's still the problem of where that dma_addr_t
> > from the dma_map_page is stored though.  Someone would need to keep
> > track of iova to dma_addr_t.  The vfio iommu might be a place to do
> > that since we're already tracking information based on iova, possibly
> > in an opaque data element provided by the vgpu driver.  However, we're
> > going to need to take a serious look at whether an rb-tree is the right
> > data structure for the job.  It works well for the current type1
> > functionality where we typically have tens of entries.  I think the
> > NVIDIA model of sparse pinning the VM is pushing that up to tens of
> > thousands.  If Intel intends to pin 

[Qemu-devel] [PATCH v2 21/28] linux-user: Add debug code to exercise restarting system calls

2016-05-12 Thread Peter Maydell
From: Timothy E Baldwin 

If DEBUG_ERESTARTSYS is set restart all system calls once. This
is pure debug code for exercising the syscall restart code paths
in the per-architecture cpu main loops.

Signed-off-by: Timothy Edward Baldwin 
Message-id: 1441497448-32489-10-git-send-email-t.e.baldwi...@members.leeds.ac.uk
[PMM: Add comment and a commented-out #define next to the commented-out
 generic DEBUG #define; remove the check on TARGET_USE_ERESTARTSYS;
 tweak comment message]
Reviewed-by: Peter Maydell 
Signed-off-by: Peter Maydell 
---
 linux-user/syscall.c | 19 +++
 1 file changed, 19 insertions(+)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index a4a1af7..ced519d 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -110,6 +110,10 @@ int __clone2(int (*fn)(void *), void *child_stack_base,
 CLONE_PARENT_SETTID | CLONE_CHILD_SETTID | CLONE_CHILD_CLEARTID)
 
 //#define DEBUG
+/* Define DEBUG_ERESTARTSYS to force every syscall to be restarted
+ * once. This exercises the codepaths for restart.
+ */
+//#define DEBUG_ERESTARTSYS
 
 //#include 
 #define VFAT_IOCTL_READDIR_BOTH _IOR('r', 1, struct linux_dirent [2])
@@ -5871,6 +5875,21 @@ abi_long do_syscall(void *cpu_env, int num, abi_long arg1,
 struct statfs stfs;
 void *p;
 
+#if defined(DEBUG_ERESTARTSYS)
+/* Debug-only code for exercising the syscall-restart code paths
+ * in the per-architecture cpu main loops: restart every syscall
+ * the guest makes once before letting it through.
+ */
+{
+static int flag;
+
+flag = !flag;
+if (flag) {
+return -TARGET_ERESTARTSYS;
+}
+}
+#endif
+
 #ifdef DEBUG
 gemu_log("syscall %d", num);
 #endif
-- 
1.9.1
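
The effect of the debug toggle above can be sketched in isolation. This is a
minimal illustration, not QEMU code: sketch_do_syscall() and
sketch_run_syscall() are hypothetical stand-ins for do_syscall() and a cpu
main loop, and the "result" of the syscall is invented. Only the alternating
static flag and the value 512 come from the series.

```c
/* Same numeric value the series defines for TARGET_ERESTARTSYS. */
#define SKETCH_ERESTARTSYS 512

/* Hypothetical stand-in for do_syscall(): every other call is bounced
 * with -SKETCH_ERESTARTSYS before any "real" work is attempted. */
long sketch_do_syscall(int num)
{
    static int flag;

    flag = !flag;
    if (flag) {
        return -SKETCH_ERESTARTSYS;   /* first attempt: request a restart */
    }
    return num * 2;                   /* second attempt: invented result */
}

/* Hypothetical stand-in for a cpu main loop: re-issue the syscall while
 * a restart is requested and count how many attempts were needed. */
long sketch_run_syscall(int num, int *attempts)
{
    long ret;

    *attempts = 0;
    do {
        ret = sketch_do_syscall(num);
        (*attempts)++;
    } while (ret == -SKETCH_ERESTARTSYS);
    return ret;
}
```

With the flag alternating like this, each syscall in the sketch takes exactly
two attempts, which is what "restart all system calls once" means here.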




Re: [Qemu-devel] [PATCH v2 1/2] exec: [tcg] Track which vCPU is performing translation and execution

2016-05-12 Thread Lluís Vilanova
Paolo Bonzini writes:

> On 11/05/2016 21:55, Lluís Vilanova wrote:
>> 
>> diff --git a/translate-all.c b/translate-all.c
>> index 8329ea6..1c16b14 100644
>> --- a/translate-all.c
>> +++ b/translate-all.c
>> @@ -1092,6 +1092,8 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
>> ti = profile_getclock();
>> #endif
>> 
>> +tcg_ctx.cpu = ENV_GET_CPU(env);
>> +
>> tcg_func_start(&tcg_ctx);
>> 
>> gen_intermediate_code(env, tb);
>> 

> I prefer to also set this to NULL outside translation.

Do you mean after the call to gen_intermediate_code()?


Cheers,
  Lluis




Re: [Qemu-devel] [PATCH v5 3/3] docs: Add a generic loader explanation document

2016-05-12 Thread Alistair Francis
On Thu, May 12, 2016 at 9:24 AM, Eric Blake  wrote:
> On 05/12/2016 10:13 AM, Alistair Francis wrote:
>> Signed-off-by: Alistair Francis 
>> ---
>> V4:
>>  - Re-write to be more comprehensive
>>
>>  docs/generic-loader.txt | 56 
>> +
>>  1 file changed, 56 insertions(+)
>>  create mode 100644 docs/generic-loader.txt
>
>> +Loading Memory Values
>> +---
>
> Worth matching  line length to the line above?

Good point, fixed.

>
>> +
>> +NOTE: The loader device supports other options (see the next section) but they
>> +  do not apply to setting memory values and will be ignored.
>
> Ignoring invalid option combinations is not as friendly as outright
> rejecting them.

Ok, I have added stricter checking to cover most invalid use cases.

>
>
>> +
>> +Loading Files
>> +---
>
> And again

Fixed.

>
>> +The loader device also allows files to be loaded into memory. This can be done
>> +similarly to setting memory values. The syntax is shown below:
>> +
>> +-device loader,file=<file>,addr=<addr>,cpu-num=<cpu-num>,force-raw=<force-raw>
>> +
>> +<file>  - A file to be loaded into memory
>> +<addr>  - The addr in memory that the file should be loaded. This is
>> +  ignored if you are using an ELF (unless force-raw is true).
>> +  This is requried if you aren't loading an ELF.
>
> s/requried/required/

Fixed

>
>> +<cpu-num>   - This specifices the CPU that should be used. This is an
>
> s/specifices/specifies/

Fixed

>
>> +  optional argument and will cause the CPU's PC to be set to
>> +  where the image is stored. This option should only be used
>> +  for the boot image.
>> +<force-raw> - Forces the file to be treated as a raw image. This can be
>> +  used to specificy the load address of ELF files.
>
> s/specificy/specify/

Fixed.

Thanks for reading the patch.

Alistair

>
>> +
>> +An example of loading an ELF file which CPU0 will boot is shown below:
>> +-device loader,file=./images/boot.elf,cpu-num=0
>>
>
> --
> Eric Blake   eblake redhat com+1-919-301-3266
> Libvirt virtualization library http://libvirt.org
>



[Qemu-devel] [PATCH v2 12/28] linux-user: Support for restarting system calls for Alpha targets

2016-05-12 Thread Peter Maydell
From: Timothy E Baldwin 

Update the Alpha main loop and sigreturn code:
 * on TARGET_ERESTARTSYS, wind guest PC backwards to repeat syscall insn
 * handle TARGET_QEMU_ESIGRETURN in the main loop as the indication
   that the main loop should not touch any guest CPU state

Signed-off-by: Timothy Edward Baldwin 
Message-id: 1441497448-32489-13-git-send-email-t.e.baldwi...@members.leeds.ac.uk
Reviewed-by: Peter Maydell 
[PMM: tweak commit message; drop TARGET_USE_ERESTARTSYS define;
 PC is env->pc, not env->ir[IR_PV]]
Signed-off-by: Peter Maydell 
---
 linux-user/alpha/target_signal.h | 1 +
 linux-user/main.c| 7 +--
 linux-user/signal.c  | 4 ++--
 3 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/linux-user/alpha/target_signal.h b/linux-user/alpha/target_signal.h
index d3822da..4c78319 100644
--- a/linux-user/alpha/target_signal.h
+++ b/linux-user/alpha/target_signal.h
@@ -27,6 +27,7 @@ static inline abi_ulong get_sp_from_cpustate(CPUAlphaState *state)
 return state->ir[IR_SP];
 }
 
+
 /* From .  */
 #define TARGET_GEN_INTOVF  -1  /* integer overflow */
 #define TARGET_GEN_INTDIV  -2  /* integer division by zero */
diff --git a/linux-user/main.c b/linux-user/main.c
index c2dc4b2..94dc2d4 100644
--- a/linux-user/main.c
+++ b/linux-user/main.c
@@ -3266,8 +3266,11 @@ void cpu_loop(CPUAlphaState *env)
 env->ir[IR_A2], env->ir[IR_A3],
 env->ir[IR_A4], env->ir[IR_A5],
 0, 0);
-if (trapnr == TARGET_NR_sigreturn
-|| trapnr == TARGET_NR_rt_sigreturn) {
+if (sysret == -TARGET_ERESTARTSYS) {
+env->pc -= 4;
+break;
+}
+if (sysret == -TARGET_QEMU_ESIGRETURN) {
 break;
 }
 /* Syscall writes 0 to V0 to bypass error check, similar
diff --git a/linux-user/signal.c b/linux-user/signal.c
index 8b5ddf2..559e764 100644
--- a/linux-user/signal.c
+++ b/linux-user/signal.c
@@ -5527,7 +5527,7 @@ long do_sigreturn(CPUAlphaState *env)
 
 restore_sigcontext(env, sc);
 unlock_user_struct(sc, sc_addr, 0);
-return env->ir[IR_V0];
+return -TARGET_QEMU_ESIGRETURN;
 
 badframe:
 force_sig(TARGET_SIGSEGV);
@@ -5554,7 +5554,7 @@ long do_rt_sigreturn(CPUAlphaState *env)
 }
 
 unlock_user_struct(frame, frame_addr, 0);
-return env->ir[IR_V0];
+return -TARGET_QEMU_ESIGRETURN;
 
 
 badframe:
-- 
1.9.1




[Qemu-devel] [PATCH v2 28/28] linux-user: Use safe_syscall for futex syscall

2016-05-12 Thread Peter Maydell
Use the safe_syscall wrapper for the futex syscall.

In particular, this fixes hangs when using programs that link
against the Boehm garbage collector, including the Mono runtime.

(We don't change the sys_futex() call in the implementation of
the exit syscall, because as the FIXME comment there notes
that should be handled by disabling signals, since we can't
easily back out if the futex were to return ERESTARTSYS.)

Signed-off-by: Peter Maydell 
---
 linux-user/syscall.c | 18 ++
 1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index c9c2ae9..4e419fb 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -697,6 +697,8 @@ safe_syscall5(int, waitid, idtype_t, idtype, id_t, id, siginfo_t *, infop, \
 safe_syscall3(int, execve, const char *, filename, char **, argv, char **, envp)
 safe_syscall6(int, pselect6, int, nfds, fd_set *, readfds, fd_set *, writefds, \
   fd_set *, exceptfds, struct timespec *, timeout, void *, sig)
+safe_syscall6(int,futex,int *,uaddr,int,op,int,val, \
+  const struct timespec *,timeout,int *,uaddr2,int,val3)
 
 static inline int host_to_target_sock_type(int host_type)
 {
@@ -5381,12 +5383,12 @@ static int do_futex(target_ulong uaddr, int op, int val, target_ulong timeout,
 } else {
 pts = NULL;
 }
-return get_errno(sys_futex(g2h(uaddr), op, tswap32(val),
+return get_errno(safe_futex(g2h(uaddr), op, tswap32(val),
  pts, NULL, val3));
 case FUTEX_WAKE:
-return get_errno(sys_futex(g2h(uaddr), op, val, NULL, NULL, 0));
+return get_errno(safe_futex(g2h(uaddr), op, val, NULL, NULL, 0));
 case FUTEX_FD:
-return get_errno(sys_futex(g2h(uaddr), op, val, NULL, NULL, 0));
+return get_errno(safe_futex(g2h(uaddr), op, val, NULL, NULL, 0));
 case FUTEX_REQUEUE:
 case FUTEX_CMP_REQUEUE:
 case FUTEX_WAKE_OP:
@@ -5396,11 +5398,11 @@ static int do_futex(target_ulong uaddr, int op, int val, target_ulong timeout,
to satisfy the compiler.  We do not need to tswap TIMEOUT
since it's not compared to guest memory.  */
 pts = (struct timespec *)(uintptr_t) timeout;
-return get_errno(sys_futex(g2h(uaddr), op, val, pts,
-   g2h(uaddr2),
-   (base_op == FUTEX_CMP_REQUEUE
-? tswap32(val3)
-: val3)));
+return get_errno(safe_futex(g2h(uaddr), op, val, pts,
+g2h(uaddr2),
+(base_op == FUTEX_CMP_REQUEUE
+ ? tswap32(val3)
+ : val3)));
 default:
 return -TARGET_ENOSYS;
 }
-- 
1.9.1




[Qemu-devel] [PATCH v2 14/28] linux-user: Support for restarting system calls for OpenRISC targets

2016-05-12 Thread Peter Maydell
From: Timothy E Baldwin 

Update the OpenRISC main loop code:
 * on TARGET_ERESTARTSYS, wind guest PC backwards to repeat syscall insn
 * handle TARGET_QEMU_ESIGRETURN in the main loop as the indication
   that the main loop should not touch any guest CPU state

(We don't implement sigreturn on this target so there is no
code there to update.)

Signed-off-by: Timothy Edward Baldwin 
Message-id: 1441497448-32489-31-git-send-email-t.e.baldwi...@members.leeds.ac.uk
Reviewed-by: Peter Maydell 
[PMM: tweak commit message; drop TARGET_USE_ERESTARTSYS define]
Signed-off-by: Peter Maydell 
---
 linux-user/main.c   | 22 ++
 linux-user/openrisc/target_signal.h |  1 +
 2 files changed, 15 insertions(+), 8 deletions(-)

diff --git a/linux-user/main.c b/linux-user/main.c
index 5e5efa9..fa75521 100644
--- a/linux-user/main.c
+++ b/linux-user/main.c
@@ -2723,6 +2723,7 @@ void cpu_loop(CPUOpenRISCState *env)
 {
 CPUState *cs = CPU(openrisc_env_get_cpu(env));
 int trapnr, gdbsig;
+abi_long ret;
 
 for (;;) {
 cpu_exec_start(cs);
@@ -2768,14 +2769,19 @@ void cpu_loop(CPUOpenRISCState *env)
 break;
 case EXCP_SYSCALL:
 env->pc += 4;   /* 0xc00; */
-env->gpr[11] = do_syscall(env,
-  env->gpr[11], /* return value   */
-  env->gpr[3],  /* r3 - r7 are params */
-  env->gpr[4],
-  env->gpr[5],
-  env->gpr[6],
-  env->gpr[7],
-  env->gpr[8], 0, 0);
+ret = do_syscall(env,
+ env->gpr[11], /* return value   */
+ env->gpr[3],  /* r3 - r7 are params */
+ env->gpr[4],
+ env->gpr[5],
+ env->gpr[6],
+ env->gpr[7],
+ env->gpr[8], 0, 0);
+if (ret == -TARGET_ERESTARTSYS) {
+env->pc -= 4;
+} else if (ret != -TARGET_QEMU_ESIGRETURN) {
+env->gpr[11] = ret;
+}
 break;
 case EXCP_FPE:
 qemu_log_mask(CPU_LOG_INT, "\nFloating point error\n");
diff --git a/linux-user/openrisc/target_signal.h b/linux-user/openrisc/target_signal.h
index 964aed6..f600501 100644
--- a/linux-user/openrisc/target_signal.h
+++ b/linux-user/openrisc/target_signal.h
@@ -23,4 +23,5 @@ static inline abi_ulong get_sp_from_cpustate(CPUOpenRISCState *state)
 return state->gpr[1];
 }
 
+
 #endif /* TARGET_SIGNAL_H */
-- 
1.9.1
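
The three-way handling in the EXCP_SYSCALL arm above can be sketched on its
own. This is an illustrative model under stated assumptions, not the QEMU
implementation: struct sketch_cpu, sketch_handle_syscall() and the three
sample syscalls are hypothetical names, and the value 513 for the sigreturn
code is assumed here purely for illustration (only 512 for the restart code
is stated in this series).

```c
#include <stdint.h>

#define SKETCH_ERESTARTSYS     512  /* value given in patch 04/28 */
#define SKETCH_QEMU_ESIGRETURN 513  /* assumed value, illustration only */

/* Hypothetical, minimal guest CPU state: a program counter and the
 * register that carries the syscall return value. */
struct sketch_cpu {
    uint32_t pc;
    long retreg;
};

typedef long (*sketch_syscall_fn)(struct sketch_cpu *cpu);

/* Mirrors the shape of the patched loop: advance past the 4-byte
 * syscall insn, run the syscall, then decide what to touch. */
void sketch_handle_syscall(struct sketch_cpu *cpu, sketch_syscall_fn fn)
{
    long ret;

    cpu->pc += 4;
    ret = fn(cpu);
    if (ret == -SKETCH_ERESTARTSYS) {
        cpu->pc -= 4;                 /* wind back: the insn runs again */
    } else if (ret != -SKETCH_QEMU_ESIGRETURN) {
        cpu->retreg = ret;            /* ordinary completion */
    }
    /* On -SKETCH_QEMU_ESIGRETURN nothing is written: the syscall has
     * already set up the guest state itself. */
}

/* Hypothetical sample syscalls used to exercise the three branches. */
long sketch_sys_ok(struct sketch_cpu *cpu)
{
    (void)cpu;
    return 7;
}

long sketch_sys_restart(struct sketch_cpu *cpu)
{
    (void)cpu;
    return -SKETCH_ERESTARTSYS;
}

long sketch_sys_sigreturn(struct sketch_cpu *cpu)
{
    cpu->pc = 0x2000;                 /* state restored from a signal frame */
    cpu->retreg = 99;
    return -SKETCH_QEMU_ESIGRETURN;
}
```

The point of the third branch is that a sigreturn-style call must not have its
restored register overwritten by the loop's own "store the return value" step.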




[Qemu-devel] [PATCH v2 10/28] linux-user: Support for restarting system calls for SPARC targets

2016-05-12 Thread Peter Maydell
From: Timothy E Baldwin 

Update the SPARC main loop and sigreturn code:
 * on TARGET_ERESTARTSYS, wind guest PC backwards to repeat syscall insn
 * set all guest CPU state within signal.c code on sigreturn
 * handle TARGET_QEMU_ESIGRETURN in the main loop as the indication
   that the main loop should not touch any guest CPU state

Signed-off-by: Timothy Edward Baldwin 
Message-id: 1441497448-32489-9-git-send-email-t.e.baldwi...@members.leeds.ac.uk
[PMM: Commit message tweaks; drop TARGET_USE_ERESTARTSYS define]
Reviewed-by: Peter Maydell 
Signed-off-by: Peter Maydell 
---
 linux-user/main.c  | 3 +++
 linux-user/signal.c| 2 +-
 linux-user/sparc/target_signal.h   | 1 +
 linux-user/sparc64/target_signal.h | 1 +
 4 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/linux-user/main.c b/linux-user/main.c
index be8dcdb..eec68c7 100644
--- a/linux-user/main.c
+++ b/linux-user/main.c
@@ -1375,6 +1375,9 @@ void cpu_loop (CPUSPARCState *env)
   env->regwptr[2], env->regwptr[3],
   env->regwptr[4], env->regwptr[5],
   0, 0);
+if (ret == -TARGET_ERESTARTSYS || ret == -TARGET_QEMU_ESIGRETURN) {
+break;
+}
 if ((abi_ulong)ret >= (abi_ulong)(-515)) {
 #if defined(TARGET_SPARC64) && !defined(TARGET_ABI32)
 env->xcc |= PSR_CARRY;
diff --git a/linux-user/signal.c b/linux-user/signal.c
index 14e58b0..e742347 100644
--- a/linux-user/signal.c
+++ b/linux-user/signal.c
@@ -2449,7 +2449,7 @@ long do_sigreturn(CPUSPARCState *env)
 goto segv_and_exit;
 }
 unlock_user_struct(sf, sf_addr, 0);
-return env->regwptr[0];
+return -TARGET_QEMU_ESIGRETURN;
 
 segv_and_exit:
 unlock_user_struct(sf, sf_addr, 0);
diff --git a/linux-user/sparc/target_signal.h b/linux-user/sparc/target_signal.h
index c7de300..2df38c8 100644
--- a/linux-user/sparc/target_signal.h
+++ b/linux-user/sparc/target_signal.h
@@ -33,4 +33,5 @@ static inline abi_ulong get_sp_from_cpustate(CPUSPARCState *state)
 return state->regwptr[UREG_FP];
 }
 
+
 #endif /* TARGET_SIGNAL_H */
diff --git a/linux-user/sparc64/target_signal.h b/linux-user/sparc64/target_signal.h
index c7de300..2df38c8 100644
--- a/linux-user/sparc64/target_signal.h
+++ b/linux-user/sparc64/target_signal.h
@@ -33,4 +33,5 @@ static inline abi_ulong get_sp_from_cpustate(CPUSPARCState *state)
 return state->regwptr[UREG_FP];
 }
 
+
 #endif /* TARGET_SIGNAL_H */
-- 
1.9.1
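
A small sketch of why the early "break" in the SPARC loop has to come before
the error check. The comparison against -515 is taken from the context line in
the diff above; the helper names are hypothetical, and the value 513 for the
sigreturn code is an assumption made for illustration (only 512 for
TARGET_ERESTARTSYS is stated in this series).

```c
#define SKETCH_ERESTARTSYS     512  /* value given in patch 04/28 */
#define SKETCH_QEMU_ESIGRETURN 513  /* assumed value, illustration only */

/* Same unsigned comparison as the SPARC loop uses to decide that a
 * return value is an errno to report to the guest. */
int sketch_is_error_range(long ret)
{
    return (unsigned long)ret >= (unsigned long)(-515L);
}

/* The QEMU-internal codes that must never reach the guest. */
int sketch_is_internal_code(long ret)
{
    return ret == -SKETCH_ERESTARTSYS || ret == -SKETCH_QEMU_ESIGRETURN;
}
```

Under these assumptions both internal codes satisfy sketch_is_error_range(),
so without the early exit the loop would set the carry flag and hand 512 or
513 to the guest as if it were an ordinary errno.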




[Qemu-devel] [PATCH v2 26/28] linux-user: Use safe_syscall for execve syscall

2016-05-12 Thread Peter Maydell
From: Timothy E Baldwin 

Wrap execve() in the safe-syscall handling. Although execve() is not
an interruptible syscall, it is a special case: if we allow a signal
to happen before we make the host syscall then we will 'lose' it,
because at the point of execve the process leaves QEMU's control.  So
we use the safe syscall wrapper to ensure that we either take the
signal as a guest signal, or else it does not happen before the
execve completes and makes it the other program's problem.

The practical upshot is that without this SIGTERM could fail to
terminate the process.

Signed-off-by: Timothy Edward Baldwin 
Message-id: 1441497448-32489-25-git-send-email-t.e.baldwi...@members.leeds.ac.uk
[PMM: expanded commit message to explain in more detail why this is
 needed, and add comment about it too]
Reviewed-by: Peter Maydell 
Signed-off-by: Peter Maydell 
---
 linux-user/syscall.c | 13 -
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index d9f4695..dea827f 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -703,6 +703,7 @@ safe_syscall4(pid_t, wait4, pid_t, pid, int *, status, int, options, \
   struct rusage *, rusage)
 safe_syscall5(int, waitid, idtype_t, idtype, id_t, id, siginfo_t *, infop, \
   int, options, struct rusage *, rusage)
+safe_syscall3(int, execve, const char *, filename, char **, argv, char **, envp)
 
 static inline int host_to_target_sock_type(int host_type)
 {
@@ -6179,7 +6180,17 @@ abi_long do_syscall(void *cpu_env, int num, abi_long arg1,
 
 if (!(p = lock_user_string(arg1)))
 goto execve_efault;
-ret = get_errno(execve(p, argp, envp));
+/* Although execve() is not an interruptible syscall it is
+ * a special case where we must use the safe_syscall wrapper:
+ * if we allow a signal to happen before we make the host
+ * syscall then we will 'lose' it, because at the point of
+ * execve the process leaves QEMU's control. So we use the
+ * safe syscall wrapper to ensure that we either take the
+ * signal as a guest signal, or else it does not happen
+ * before the execve completes and makes it the other
+ * program's problem.
+ */
+ret = get_errno(safe_execve(p, argp, envp));
 unlock_user(p, arg1, 0);
 
 goto execve_end;
-- 
1.9.1




[Qemu-devel] [PATCH v2 03/28] linux-user: Reindent signal handling

2016-05-12 Thread Peter Maydell
From: Timothy E Baldwin 

Some of the signal handling was a mess with a mixture of tabs and 8 space
indents.

Signed-off-by: Timothy Edward Baldwin 
Message-id: 1441497448-32489-3-git-send-email-t.e.baldwi...@members.leeds.ac.uk
Reviewed-by: Peter Maydell 
[PMM: just rebased]
Signed-off-by: Peter Maydell 
---
 linux-user/signal.c | 1543 ++-
 1 file changed, 791 insertions(+), 752 deletions(-)

diff --git a/linux-user/signal.c b/linux-user/signal.c
index 96e86c0..04c21d0 100644
--- a/linux-user/signal.c
+++ b/linux-user/signal.c
@@ -157,7 +157,7 @@ static void target_to_host_sigset_internal(sigset_t *d,
 if (target_sigismember(s, i)) {
 sigaddset(d, target_to_host_signal(i));
 }
- }
+}
 }
 
 void target_to_host_sigset(sigset_t *d, const target_sigset_t *s)
@@ -250,18 +250,18 @@ static inline void host_to_target_siginfo_noswap(target_siginfo_t *tinfo,
 tinfo->si_code = info->si_code;
 
 if (sig == TARGET_SIGILL || sig == TARGET_SIGFPE || sig == TARGET_SIGSEGV
-|| sig == TARGET_SIGBUS || sig == TARGET_SIGTRAP) {
+|| sig == TARGET_SIGBUS || sig == TARGET_SIGTRAP) {
 /* Should never come here, but who knows. The information for
the target is irrelevant.  */
 tinfo->_sifields._sigfault._addr = 0;
 } else if (sig == TARGET_SIGIO) {
 tinfo->_sifields._sigpoll._band = info->si_band;
-   tinfo->_sifields._sigpoll._fd = info->si_fd;
+tinfo->_sifields._sigpoll._fd = info->si_fd;
 } else if (sig == TARGET_SIGCHLD) {
 tinfo->_sifields._sigchld._pid = info->si_pid;
 tinfo->_sifields._sigchld._uid = info->si_uid;
 tinfo->_sifields._sigchld._status
-= host_to_target_waitstatus(info->si_status);
+= host_to_target_waitstatus(info->si_status);
 tinfo->_sifields._sigchld._utime = info->si_utime;
 tinfo->_sifields._sigchld._stime = info->si_stime;
 } else if (sig >= TARGET_SIGRTMIN) {
@@ -269,7 +269,7 @@ static inline void host_to_target_siginfo_noswap(target_siginfo_t *tinfo,
 tinfo->_sifields._rt._uid = info->si_uid;
 /* XXX: potential problem if 64 bit */
 tinfo->_sifields._rt._sigval.sival_ptr
-= (abi_ulong)(unsigned long)info->si_value.sival_ptr;
+= (abi_ulong)(unsigned long)info->si_value.sival_ptr;
 }
 }
 
@@ -723,75 +723,75 @@ int do_sigaction(int sig, const struct target_sigaction *act,
 /* from the Linux kernel */
 
 struct target_fpreg {
-   uint16_t significand[4];
-   uint16_t exponent;
+uint16_t significand[4];
+uint16_t exponent;
 };
 
 struct target_fpxreg {
-   uint16_t significand[4];
-   uint16_t exponent;
-   uint16_t padding[3];
+uint16_t significand[4];
+uint16_t exponent;
+uint16_t padding[3];
 };
 
 struct target_xmmreg {
-   abi_ulong element[4];
+abi_ulong element[4];
 };
 
 struct target_fpstate {
-   /* Regular FPU environment */
-abi_ulong   cw;
-abi_ulong   sw;
-abi_ulong   tag;
-abi_ulong   ipoff;
-abi_ulong   cssel;
-abi_ulong   dataoff;
-abi_ulong   datasel;
-   struct target_fpreg _st[8];
-   uint16_t status;
-   uint16_t magic;  /* 0xffff = regular FPU data only */
-
-   /* FXSR FPU environment */
-abi_ulong   _fxsr_env[6];   /* FXSR FPU env is ignored */
-abi_ulong   mxcsr;
-abi_ulong   reserved;
-   struct target_fpxreg _fxsr_st[8]; /* FXSR FPU reg data is ignored */
-   struct target_xmmreg _xmm[8];
-abi_ulong   padding[56];
+/* Regular FPU environment */
+abi_ulong cw;
+abi_ulong sw;
+abi_ulong tag;
+abi_ulong ipoff;
+abi_ulong cssel;
+abi_ulong dataoff;
+abi_ulong datasel;
+struct target_fpreg _st[8];
+uint16_t  status;
+uint16_t  magic;  /* 0xffff = regular FPU data only */
+
+/* FXSR FPU environment */
+abi_ulong _fxsr_env[6];   /* FXSR FPU env is ignored */
+abi_ulong mxcsr;
+abi_ulong reserved;
+struct target_fpxreg _fxsr_st[8]; /* FXSR FPU reg data is ignored */
+struct target_xmmreg _xmm[8];
+abi_ulong padding[56];
 };
 
 #define X86_FXSR_MAGIC 0x0000
 
 struct target_sigcontext {
-   uint16_t gs, __gsh;
-   uint16_t fs, __fsh;
-   uint16_t es, __esh;
-   uint16_t ds, __dsh;
-abi_ulong edi;
-abi_ulong esi;
-abi_ulong ebp;
-abi_ulong esp;
-abi_ulong ebx;
-abi_ulong edx;
-abi_ulong ecx;
-abi_ulong eax;
-abi_ulong trapno;
-abi_ulong err;
-abi_ulong eip;
-   uint16_t cs, __csh;
-abi_ulong eflags;
-abi_ulong esp_at_signal;
-   

[Qemu-devel] [PATCH v2 08/28] linux-user: Support for restarting system calls for MIPS targets

2016-05-12 Thread Peter Maydell
From: Timothy E Baldwin 

Update the MIPS main loop code:
 * on TARGET_ERESTARTSYS, wind guest PC backwards to repeat syscall insn

(We already handle TARGET_QEMU_ESIGRETURN.)

Signed-off-by: Timothy Edward Baldwin 
Message-id: 1441497448-32489-7-git-send-email-t.e.baldwi...@members.leeds.ac.uk
Reviewed-by: Peter Maydell 
[PMM: tweak commit message; drop TARGET_USE_ERESTARTSYS define]
Signed-off-by: Peter Maydell 
---
 linux-user/main.c | 4 
 linux-user/mips/target_signal.h   | 1 +
 linux-user/mips64/target_signal.h | 1 +
 3 files changed, 6 insertions(+)

diff --git a/linux-user/main.c b/linux-user/main.c
index 59c1166..0a8b7b6 100644
--- a/linux-user/main.c
+++ b/linux-user/main.c
@@ -2527,6 +2527,10 @@ done_syscall:
  env->active_tc.gpr[8], env->active_tc.gpr[9],
  env->active_tc.gpr[10], env->active_tc.gpr[11]);
 # endif /* O32 */
+if (ret == -TARGET_ERESTARTSYS) {
+env->active_tc.PC -= 4;
+break;
+}
 if (ret == -TARGET_QEMU_ESIGRETURN) {
 /* Returning from a successful sigreturn syscall.
Avoid clobbering register state.  */
diff --git a/linux-user/mips/target_signal.h b/linux-user/mips/target_signal.h
index 6e1dc8b..460cc9f 100644
--- a/linux-user/mips/target_signal.h
+++ b/linux-user/mips/target_signal.h
@@ -26,4 +26,5 @@ static inline abi_ulong get_sp_from_cpustate(CPUMIPSState *state)
 return state->active_tc.gpr[29];
 }
 
+
 #endif /* TARGET_SIGNAL_H */
diff --git a/linux-user/mips64/target_signal.h b/linux-user/mips64/target_signal.h
index 5fb6a2c..a2dc514 100644
--- a/linux-user/mips64/target_signal.h
+++ b/linux-user/mips64/target_signal.h
@@ -26,4 +26,5 @@ static inline abi_ulong get_sp_from_cpustate(CPUMIPSState *state)
 return state->active_tc.gpr[29];
 }
 
+
 #endif /* TARGET_SIGNAL_H */
-- 
1.9.1




[Qemu-devel] [PATCH v2 04/28] linux-user: Define TARGET_ERESTART* errno values

2016-05-12 Thread Peter Maydell
From: Timothy E Baldwin 

Define TARGET_ERESTARTSYS; like the kernel, we will use this to
indicate that a guest system call should be restarted. We use
the same value the kernel does for this, 512.

Signed-off-by: Timothy Edward Baldwin 
[PMM: split out from the patch which moves and renumbers
 TARGET_QEMU_ESIGRETURN, add comment on usage]
Reviewed-by: Peter Maydell 
Signed-off-by: Peter Maydell 
---
 linux-user/errno_defs.h | 8 
 1 file changed, 8 insertions(+)

diff --git a/linux-user/errno_defs.h b/linux-user/errno_defs.h
index 8a1cf76..b7a8c9f 100644
--- a/linux-user/errno_defs.h
+++ b/linux-user/errno_defs.h
@@ -139,3 +139,11 @@
 /* for robust mutexes */
 #define TARGET_EOWNERDEAD  130 /* Owner died */
 #define TARGET_ENOTRECOVERABLE 131 /* State not recoverable */
+
+/* QEMU internal, not visible to the guest. This is returned when a
+ * system call should be restarted, to tell the main loop that it
+ * should wind the guest PC backwards so it will re-execute the syscall
+ * after handling any pending signals. They match with the ones the guest
+ * kernel uses for the same purpose.
+ */
+#define TARGET_ERESTARTSYS 512 /* Restart system call (if SA_RESTART) */
-- 
1.9.1




[Qemu-devel] [PATCH v2 25/28] linux-user: Use safe_syscall for wait system calls

2016-05-12 Thread Peter Maydell
From: Timothy E Baldwin 

Use safe_syscall for waitpid, waitid and wait4 syscalls. Note that this
change allows us to implement support for waitid's fifth (rusage) argument
in future; for the moment we ignore it, as we have done up to now.

Signed-off-by: Timothy Edward Baldwin 
Message-id: 1441497448-32489-18-git-send-email-t.e.baldwi...@members.leeds.ac.uk
[PMM: Adjust to new safe_syscall convention. Add fifth waitid syscall argument
 (which isn't present in the libc interface but is in the syscall ABI)]
Signed-off-by: Peter Maydell 
---
 linux-user/syscall.c | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 0037ee7..d9f4695 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -699,6 +699,10 @@ safe_syscall3(ssize_t, read, int, fd, void *, buff, size_t, count)
 safe_syscall3(ssize_t, write, int, fd, const void *, buff, size_t, count)
 safe_syscall4(int, openat, int, dirfd, const char *, pathname, \
   int, flags, mode_t, mode)
+safe_syscall4(pid_t, wait4, pid_t, pid, int *, status, int, options, \
+  struct rusage *, rusage)
+safe_syscall5(int, waitid, idtype_t, idtype, id_t, id, siginfo_t *, infop, \
+  int, options, struct rusage *, rusage)
 
 static inline int host_to_target_sock_type(int host_type)
 {
@@ -6037,7 +6041,7 @@ abi_long do_syscall(void *cpu_env, int num, abi_long arg1,
 case TARGET_NR_waitpid:
 {
 int status;
-ret = get_errno(waitpid(arg1, &status, arg3));
+ret = get_errno(safe_wait4(arg1, &status, arg3, 0));
 if (!is_error(ret) && arg2 && ret
 && put_user_s32(host_to_target_waitstatus(status), arg2))
 goto efault;
@@ -6049,7 +6053,7 @@ abi_long do_syscall(void *cpu_env, int num, abi_long arg1,
 {
 siginfo_t info;
 info.si_pid = 0;
-ret = get_errno(waitid(arg1, arg2, &info, arg4));
+ret = get_errno(safe_waitid(arg1, arg2, &info, arg4, NULL));
 if (!is_error(ret) && arg3 && info.si_pid != 0) {
 if (!(p = lock_user(VERIFY_WRITE, arg3, sizeof(target_siginfo_t), 0)))
 goto efault;
@@ -7761,7 +7765,7 @@ abi_long do_syscall(void *cpu_env, int num, abi_long arg1,
 rusage_ptr = &rusage;
 else
 rusage_ptr = NULL;
-ret = get_errno(wait4(arg1, &status, arg3, rusage_ptr));
+ret = get_errno(safe_wait4(arg1, &status, arg3, rusage_ptr));
 if (!is_error(ret)) {
 if (status_ptr && ret) {
 status = host_to_target_waitstatus(status);
-- 
1.9.1
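
The waitpid hunk above routes TARGET_NR_waitpid through the wait4 wrapper with
a null rusage pointer. The equivalence it relies on can be sketched with the
host libc's wait4(); this is an illustration only and does not use QEMU's
safe_syscall machinery. The function names are hypothetical, and the
_DEFAULT_SOURCE define is an assumption about building against glibc.

```c
#define _DEFAULT_SOURCE          /* assumed: exposes wait4() on glibc */
#include <stddef.h>
#include <sys/types.h>
#include <sys/time.h>
#include <sys/resource.h>
#include <sys/wait.h>
#include <unistd.h>

/* waitpid() behaviour expressed through wait4() with no rusage buffer. */
pid_t sketch_waitpid_via_wait4(pid_t pid, int *status, int options)
{
    return wait4(pid, status, options, NULL);
}

/* Hypothetical helper: fork a child that exits with 'code', reap it
 * through the wrapper above, and return the exit status (or -1). */
int sketch_reap_child(int code)
{
    int status = 0;
    pid_t pid = fork();

    if (pid < 0) {
        return -1;               /* fork failed */
    }
    if (pid == 0) {
        _exit(code);             /* child: exit immediately */
    }
    if (sketch_waitpid_via_wait4(pid, &status, 0) != pid) {
        return -1;               /* unexpected wait result */
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Passing a real struct rusage pointer instead of NULL is the only difference
between the waitpid and wait4 cases in the patched code.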



