[Qemu-devel] Re: SVM emulation: EVENTINJ marked valid when a pagefault happens while issuing a software interrupt

2010-05-27 Thread Jan Kiszka
Erik van der Kouwe wrote:
> Hi,
> 
>> Use Linux+KVM as host OS, it can also run VMMs as guests (aka nested
>> SVM). And you could even debug those guests just like when you would run
>> QEMU in emulation mode. In contrast to SVM emulation, nesting is fairly
>> stable AFAIK. And it is faster.
> 
> In my experience, if I provide the -enable-kvm switch then the guest VMM
> never detects the presence of virtualization support. Does this only
> work on AMD hardware? Or do I need to supply some additional parameter
> to make it work?

Yes, forgot to mention: -enable-nesting, and you need qemu-kvm. This
feature hasn't been merged upstream yet.
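For reference, a complete invocation might look like the sketch below. Only the -enable-kvm and -enable-nesting flags come from this thread (and -enable-nesting exists only in qemu-kvm); the +svm cpu flag, memory size, and image name are placeholder assumptions:

```shell
# Placeholder sketch: run a guest VMM with nested SVM under qemu-kvm.
# -enable-nesting is a qemu-kvm-only flag, not yet merged upstream.
qemu-system-x86_64 \
    -enable-kvm \
    -enable-nesting \
    -cpu qemu64,+svm \
    -m 1024 \
    -hda guest-vmm.img
```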

Jan





[Qemu-devel] Re: SVM emulation: EVENTINJ marked valid when a pagefault happens while issuing a software interrupt

2010-05-27 Thread Erik van der Kouwe

Hi,


Use Linux+KVM as host OS, it can also run VMMs as guests (aka nested
SVM). And you could even debug those guests just like when you would run
QEMU in emulation mode. In contrast to SVM emulation, nesting is fairly
stable AFAIK. And it is faster.


In my experience, if I provide the -enable-kvm switch then the guest VMM 
never detects the presence of virtualization support. Does this only 
work on AMD hardware? Or do I need to supply some additional parameter 
to make it work?


With kind regards,
Erik



[Qemu-devel] Re: [OpenBIOS] [PATCH 0/3] sparc64 cleanups v1

2010-05-27 Thread Igor Kovalenko
On Thu, May 27, 2010 at 8:57 PM, Mark Cave-Ayland wrote:
> Blue Swirl wrote:
>
>> On Tue, May 25, 2010 at 12:12 PM, Igor V. Kovalenko wrote:
>>>
>>> One code cleanup and another pci host bridge remap change,
>>> the latter requires qemu update with patch already posted to qemu list.
>>>
>>> v0->v1: added missing patch moving asi.h to arch includes
>>
>> Thanks, applied all.
>
> Whilst updating to OpenBIOS SVN and qemu git head to test these patches,
> I've found a regression with qemu-system-sparc64 and
> debian-504-sparc-netinst.iso. Rather than getting to the end of the kernel
> boot and being unable to mount the root filesystem, instead I now get the
> following fatal trap message:
>
>
> [   42.493402] Console: switching to mono PROM 128x96
> [   63.440200] [drm] Initialized drm 1.1.0 20060810
> [   63.542123] su: probe of ffe2dea0 failed with error -12
> [   63.690331] brd: module loaded
> [   63.787034] loop: module loaded
> [   63.863989] Uniform Multi-Platform E-IDE driver
> [   63.961215] ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
> [   64.115119] mice: PS/2 mouse device common for all mice
> [   64.234482] usbcore: registered new interface driver usbhid
> [   64.359397] usbhid: v2.6:USB HID core driver
> [   64.462167] TCP cubic registered
> [   64.539714] NET: Registered protocol family 17
> [   64.642969] registered taskstats version 1
> [   64.737822] drivers/rtc/hctosys.c: unable to open rtc device (rtc0)
> qemu: fatal: Trap 0x0068 while trap level (5) >= MAXTL (5), Error state
> pc: 00424d18  npc: 00424d1c
> General Registers:
> %g0-3:  0800 4000 0002
> %g4-7: 03ff 0001 0020 4000
>
> Current Register Window:
> %o0-3:    
> %o4-7:   fffd3ef0 
> %l0-3:    
> %l4-7:    
> %i0-3:    
> %i4-7:    
>
> Floating Point Registers:
> %f00: 0.00 0.00 0.00 0.00
> %f04: 0.00 0.00 0.00 0.00
> %f08: 0.00 0.00 0.00 0.00
> %f12: 0.00 0.00 0.00 0.00
> %f16: 0.00 0.00 0.00 0.00
> %f20: 0.00 0.00 0.00 0.00
> %f24: 0.00 0.00 0.00 0.00
> %f28: 0.00 0.00 0.00 0.00
> %f32: 0.00 0.00 0.00 0.00
> %f36: 0.00 0.00 0.00 0.00
> %f40: 0.00 0.00 0.00 0.00
> %f44: 0.00 0.00 0.00 0.00
> %f48: 0.00 0.00 0.00 0.00
> %f52: 0.00 0.00 0.00 0.00
> %f56: 0.00 0.00 0.00 0.00
> %f60: 0.00 0.00 0.00 0.00
> pstate: 0414 ccr: 00 (icc:  xcc: ) asi: 82 tl: 5 pil: 0
> cansave: 6 canrestore: 0 otherwin: 0 wstate: 2 cleanwin: 0 cwp: 7
> fsr:  y:  fprs: 
> Aborted
>
>
> Digging deeper, it seems that this was something that was introduced earlier
> than the last set of patches. Reverting to OpenBIOS SVN r777 and using 'git
> bisect', I can identify the offending commit in qemu git as
> 2aae2b8e0abd58e76d616bcbe93c6966d06d0188 "sparc64: fix pstate privilege
> bits". Does that help at all?

With many debian iso images I consistently get scrolling blanks after
the following line on qemu video console:

io sched cfq registered (default)

Please share your qemu command line, and installer prompt input if any.

-- 
Kind regards,
Igor V. Kovalenko



[Qemu-devel] Re: [OpenBIOS] [PATCH 0/3] sparc64 cleanups v1

2010-05-27 Thread Igor Kovalenko
On Fri, May 28, 2010 at 12:42 AM, Blue Swirl wrote:
> On Thu, May 27, 2010 at 4:57 PM, Mark Cave-Ayland wrote:
>> Blue Swirl wrote:
>>
>>> On Tue, May 25, 2010 at 12:12 PM, Igor V. Kovalenko wrote:

 One code cleanup and another pci host bridge remap change,
 the latter requires qemu update with patch already posted to qemu list.

 v0->v1: added missing patch moving asi.h to arch includes
>>>
>>> Thanks, applied all.
>>
>> Whilst updating to OpenBIOS SVN and qemu git head to test these patches,
>> I've found a regression with qemu-system-sparc64 and
>> debian-504-sparc-netinst.iso. Rather than getting to the end of the kernel
>> boot and being unable to mount the root filesystem, instead I now get the
>> following fatal trap message:
>>
>>
>> [   42.493402] Console: switching to mono PROM 128x96
>> [   63.440200] [drm] Initialized drm 1.1.0 20060810
>> [   63.542123] su: probe of ffe2dea0 failed with error -12
>> [   63.690331] brd: module loaded
>> [   63.787034] loop: module loaded
>> [   63.863989] Uniform Multi-Platform E-IDE driver
>> [   63.961215] ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
>> [   64.115119] mice: PS/2 mouse device common for all mice
>> [   64.234482] usbcore: registered new interface driver usbhid
>> [   64.359397] usbhid: v2.6:USB HID core driver
>> [   64.462167] TCP cubic registered
>> [   64.539714] NET: Registered protocol family 17
>> [   64.642969] registered taskstats version 1
>> [   64.737822] drivers/rtc/hctosys.c: unable to open rtc device (rtc0)
>> qemu: fatal: Trap 0x0068 while trap level (5) >= MAXTL (5), Error state
>> pc: 00424d18  npc: 00424d1c
>> General Registers:
>> %g0-3:  0800 4000 0002
>> %g4-7: 03ff 0001 0020 4000
>>
>> Current Register Window:
>> %o0-3:    
>> %o4-7:   fffd3ef0 
>> %l0-3:    
>> %l4-7:    
>> %i0-3:    
>> %i4-7:    
>>
>> Floating Point Registers:
>> %f00: 0.00 0.00 0.00 0.00
>> %f04: 0.00 0.00 0.00 0.00
>> %f08: 0.00 0.00 0.00 0.00
>> %f12: 0.00 0.00 0.00 0.00
>> %f16: 0.00 0.00 0.00 0.00
>> %f20: 0.00 0.00 0.00 0.00
>> %f24: 0.00 0.00 0.00 0.00
>> %f28: 0.00 0.00 0.00 0.00
>> %f32: 0.00 0.00 0.00 0.00
>> %f36: 0.00 0.00 0.00 0.00
>> %f40: 0.00 0.00 0.00 0.00
>> %f44: 0.00 0.00 0.00 0.00
>> %f48: 0.00 0.00 0.00 0.00
>> %f52: 0.00 0.00 0.00 0.00
>> %f56: 0.00 0.00 0.00 0.00
>> %f60: 0.00 0.00 0.00 0.00
>> pstate: 0414 ccr: 00 (icc:  xcc: ) asi: 82 tl: 5 pil: 0
>> cansave: 6 canrestore: 0 otherwin: 0 wstate: 2 cleanwin: 0 cwp: 7
>> fsr:  y:  fprs: 
>> Aborted
>>
>>
>> Digging deeper, it seems that this was something that was introduced earlier
>> than the last set of patches. Reverting to OpenBIOS SVN r777 and using 'git
>> bisect', I can identify the offending commit in qemu git as
>> 2aae2b8e0abd58e76d616bcbe93c6966d06d0188 "sparc64: fix pstate privilege
>> bits". Does that help at all?
>
> Yes, bisection results are usually very helpful, thanks.
>
> I think the problem is that previously psrs was always 1 and PSR_HYPV
> always set, so maximally permissive MMU_HYPV_INDEX was always selected
> by cpu_mmu_index (bug!). Also because PSR_HYPV is no longer set, some
> checks in translate.c indicate privilege violations.
>
> The logic was previously such that if the CPU does not have a
> hypervisor mode, for compatibility, supervisor mode would also select
> hypervisor mode (or at least that was my intention and probably Igor
> wasn't aware of this, sorry). Now that they are separate, CPUs without
> hypervisor mode must be handled differently. Perhaps this commit
> should be reverted, the fix won't be so trivial.

I'll take a look at this issue.

>
> The lesson here is also that subtle
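The cpu_mmu_index problem Blue Swirl describes above can be illustrated with a small sketch. The flag bits, index names, and the function itself are simplified assumptions for illustration, not the actual target-sparc definitions:

```c
#include <assert.h>

/* Illustrative flag bits and MMU indices (assumed, not QEMU's). */
#define PSR_SUPV 0x1
#define PSR_HYPV 0x2

enum { MMU_USER_IDX, MMU_KERNEL_IDX, MMU_HYPV_INDEX };

/* Select an MMU index from the privilege bits.  A CPU model without a
 * hypervisor mode must map supervisor mode to hypervisor permissions
 * for compatibility; conversely, if PSR_HYPV were (wrongly) always
 * set, every access would take the maximally permissive
 * MMU_HYPV_INDEX path -- the bug described above. */
static int cpu_mmu_index_sketch(unsigned int pstate, int has_hypervisor)
{
    if (pstate & PSR_HYPV) {
        return MMU_HYPV_INDEX;
    }
    if (pstate & PSR_SUPV) {
        return has_hypervisor ? MMU_KERNEL_IDX : MMU_HYPV_INDEX;
    }
    return MMU_USER_IDX;
}
```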

[Qemu-devel] [RFC PATCH v4 3/3] block: add sheepdog driver for distributed storage support

2010-05-27 Thread MORITA Kazutaka
Sheepdog is a distributed storage system for QEMU. It provides highly
available block-level storage volumes to VMs, similar to Amazon EBS.  This
patch adds a QEMU block driver for Sheepdog.

Sheepdog features are:
- No node in the cluster is special (no metadata node, no control
  node, etc)
- Linear scalability in performance and capacity
- No single point of failure
- Autonomous management (zero configuration)
- Useful volume management support such as snapshot and cloning
- Thin provisioning
- Autonomous load balancing

More details are available at the project site:
http://www.osrg.net/sheepdog/

Signed-off-by: MORITA Kazutaka 
---
 Makefile.objs|2 +-
 block/sheepdog.c | 1835 ++
 2 files changed, 1836 insertions(+), 1 deletions(-)
 create mode 100644 block/sheepdog.c

diff --git a/Makefile.objs b/Makefile.objs
index 1a942e5..527a754 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -14,7 +14,7 @@ block-obj-$(CONFIG_LINUX_AIO) += linux-aio.o
 
block-nested-y += raw.o cow.o qcow.o vdi.o vmdk.o cloop.o dmg.o bochs.o vpc.o vvfat.o
 block-nested-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o
-block-nested-y += parallels.o nbd.o blkdebug.o
+block-nested-y += parallels.o nbd.o blkdebug.o sheepdog.o
 block-nested-$(CONFIG_WIN32) += raw-win32.o
 block-nested-$(CONFIG_POSIX) += raw-posix.o
 block-nested-$(CONFIG_CURL) += curl.o
diff --git a/block/sheepdog.c b/block/sheepdog.c
new file mode 100644
index 000..68545e8
--- /dev/null
+++ b/block/sheepdog.c
@@ -0,0 +1,1835 @@
+/*
+ * Copyright (C) 2009-2010 Nippon Telegraph and Telephone Corporation.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License version
+ * 2 as published by the Free Software Foundation.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see .
+ */
+#include 
+#include 
+
+#include "qemu-common.h"
+#include "qemu-error.h"
+#include "block_int.h"
+
+#define SD_PROTO_VER 0x01
+
+#define SD_DEFAULT_ADDR "localhost:7000"
+
+#define SD_OP_CREATE_AND_WRITE_OBJ  0x01
+#define SD_OP_READ_OBJ   0x02
+#define SD_OP_WRITE_OBJ  0x03
+
+#define SD_OP_NEW_VDI0x11
+#define SD_OP_LOCK_VDI   0x12
+#define SD_OP_RELEASE_VDI0x13
+#define SD_OP_GET_VDI_INFO   0x14
+#define SD_OP_READ_VDIS  0x15
+
+#define SD_FLAG_CMD_WRITE0x01
+#define SD_FLAG_CMD_COW  0x02
+
+#define SD_RES_SUCCESS   0x00 /* Success */
+#define SD_RES_UNKNOWN   0x01 /* Unknown error */
+#define SD_RES_NO_OBJ0x02 /* No object found */
+#define SD_RES_EIO   0x03 /* I/O error */
+#define SD_RES_VDI_EXIST 0x04 /* Vdi exists already */
+#define SD_RES_INVALID_PARMS 0x05 /* Invalid parameters */
+#define SD_RES_SYSTEM_ERROR  0x06 /* System error */
+#define SD_RES_VDI_LOCKED0x07 /* Vdi is locked */
+#define SD_RES_NO_VDI0x08 /* No vdi found */
+#define SD_RES_NO_BASE_VDI   0x09 /* No base vdi found */
+#define SD_RES_VDI_READ  0x0A /* Cannot read requested vdi */
+#define SD_RES_VDI_WRITE 0x0B /* Cannot write requested vdi */
+#define SD_RES_BASE_VDI_READ 0x0C /* Cannot read base vdi */
+#define SD_RES_BASE_VDI_WRITE   0x0D /* Cannot write base vdi */
+#define SD_RES_NO_TAG0x0E /* Requested tag is not found */
+#define SD_RES_STARTUP   0x0F /* Sheepdog is starting up */
+#define SD_RES_VDI_NOT_LOCKED   0x10 /* Vdi is not locked */
+#define SD_RES_SHUTDOWN  0x11 /* Sheepdog is shutting down */
+#define SD_RES_NO_MEM0x12 /* Cannot allocate memory */
+#define SD_RES_FULL_VDI  0x13 /* we already have the maximum vdis */
+#define SD_RES_VER_MISMATCH  0x14 /* Protocol version mismatch */
+#define SD_RES_NO_SPACE  0x15 /* Server has no room for new objects */
+#define SD_RES_WAIT_FOR_FORMAT  0x16 /* Sheepdog is waiting for a format operation */
+#define SD_RES_WAIT_FOR_JOIN 0x17 /* Sheepdog is waiting for other nodes joining */
+#define SD_RES_JOIN_FAILED   0x18 /* Target node had failed to join sheepdog */
+
+/*
+ * Object ID rules
+ *
+ *  0 - 19 (20 bits): data object space
+ * 20 - 31 (12 bits): reserved data object space
+ * 32 - 55 (24 bits): vdi object space
+ * 56 - 59 ( 4 bits): reserved vdi object space
+ * 60 - 63 ( 4 bits): object type identifier space
+ */
+
+#define VDI_SPACE_SHIFT   32
+#define VDI_BIT (UINT64_C(1) << 63)
+#define VMSTATE_BIT (UINT64_C(1) << 62)
+#define MAX_DATA_OBJS (1ULL << 20)
+#define MAX_CHILDREN 1024
+#define SD_MAX_VDI_LEN 256
+#define SD_NR_VDIS   (1U << 24)
+#define SD_DATA_OBJ_SIZE (UINT64_C(1) << 22)
+
+#define SD_INODE_SIZE (sizeof(SheepdogInode))
+#define CURRENT_VDI_ID 0
+
+typedef struct SheepdogReq {
+   uint8_t proto_ver;
+   uint8_t opcode;
+   uint16_t flags;
+   uint32_t epoch;
+   uint32_t id;
+   uint32_t data_length
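The object-ID layout documented in the comment above can be exercised with a short sketch. The helper names are assumptions chosen for illustration; they are not taken from the truncated excerpt:

```c
#include <stdint.h>

#define VDI_SPACE_SHIFT 32
#define VDI_BIT (UINT64_C(1) << 63)

/* Compose a vdi object ID: type bit 63 set, 24-bit vdi id in bits 32-55. */
static uint64_t vid_to_vdi_oid_sketch(uint32_t vid)
{
    return VDI_BIT | ((uint64_t)vid << VDI_SPACE_SHIFT);
}

/* Compose a data object ID: vdi id in bits 32-55, 20-bit data index
 * in bits 0-19. */
static uint64_t vid_to_data_oid_sketch(uint32_t vid, uint32_t idx)
{
    return ((uint64_t)vid << VDI_SPACE_SHIFT) | idx;
}
```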

[Qemu-devel] [RFC PATCH v4 1/3] close all the block drivers before the qemu process exits

2010-05-27 Thread MORITA Kazutaka
This patch calls the close handler of the block driver before the qemu
process exits.

This is necessary because the sheepdog block driver releases the lock
of VM images in the close handler.

Signed-off-by: MORITA Kazutaka 
---
 block.c |9 +
 block.h |1 +
 vl.c|1 +
 3 files changed, 11 insertions(+), 0 deletions(-)

diff --git a/block.c b/block.c
index 24c63f6..da0dc47 100644
--- a/block.c
+++ b/block.c
@@ -646,6 +646,15 @@ void bdrv_close(BlockDriverState *bs)
 }
 }
 
+void bdrv_close_all(void)
+{
+BlockDriverState *bs;
+
+QTAILQ_FOREACH(bs, &bdrv_states, list) {
+bdrv_close(bs);
+}
+}
+
 void bdrv_delete(BlockDriverState *bs)
 {
 /* remove from list, if necessary */
diff --git a/block.h b/block.h
index 756670d..25744b1 100644
--- a/block.h
+++ b/block.h
@@ -123,6 +123,7 @@ BlockDriverAIOCB *bdrv_aio_ioctl(BlockDriverState *bs,
 /* Ensure contents are flushed to disk.  */
 void bdrv_flush(BlockDriverState *bs);
 void bdrv_flush_all(void);
+void bdrv_close_all(void);
 
 int bdrv_has_zero_init(BlockDriverState *bs);
 int bdrv_is_allocated(BlockDriverState *bs, int64_t sector_num, int nb_sectors,
diff --git a/vl.c b/vl.c
index 7121cd0..8ffe36f 100644
--- a/vl.c
+++ b/vl.c
@@ -1992,6 +1992,7 @@ static void main_loop(void)
 vm_stop(r);
 }
 }
+bdrv_close_all();
 pause_all_vcpus();
 }
 
-- 
1.5.6.5




[Qemu-devel] [RFC PATCH v4 0/3] Sheepdog: distributed storage system for QEMU

2010-05-27 Thread MORITA Kazutaka
Hi all,

This patch adds a block driver for Sheepdog distributed storage
system.  Please consider for inclusion.

I have applied the review comments to the 2nd patch (thanks Kevin!).  The
remaining patches are unchanged from the previous version.


Changes from v3 to v4 are:
 - fix error handling in bdrv_snapshot_goto.

Changes from v2 to v3 are:

 - add drv->bdrv_close() and drv->bdrv_open() before and after
   bdrv_snapshot_goto() call of the protocol.
 - address the review comments on the sheepdog driver code.

Changes from v1 to v2 are:

 - rebase onto git://repo.or.cz/qemu/kevin.git block
 - modify the sheepdog driver as a protocol driver
 - add new patch to call the snapshot handler of the protocol

Thanks,

Kazutaka


MORITA Kazutaka (3):
  close all the block drivers before the qemu process exits
  block: call the snapshot handlers of the protocol drivers
  block: add sheepdog driver for distributed storage support

 Makefile.objs|2 +-
 block.c  |   70 ++-
 block.h  |1 +
 block/sheepdog.c | 1835 ++
 vl.c |1 +
 5 files changed, 1890 insertions(+), 19 deletions(-)
 create mode 100644 block/sheepdog.c




[Qemu-devel] [RFC PATCH v4 2/3] block: call the snapshot handlers of the protocol drivers

2010-05-27 Thread MORITA Kazutaka
When snapshot handlers are not defined in the format driver, it is
better to fall back to the ones of the protocol driver.  This enables us
to implement snapshot support in the protocol driver.

We need to call the bdrv_close() and bdrv_open() handlers of the format
driver before and after the bdrv_snapshot_goto() call of the protocol,
because the contents of the block driver state may need to be changed
after loading vmstate.
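The fallback pattern that each handler adopts can be modeled in miniature. The types and names below are simplified stand-ins for illustration, not the actual block layer API:

```c
#include <errno.h>
#include <stddef.h>

/* Simplified stand-ins for BlockDriver / BlockDriverState. */
typedef struct Drv {
    int (*snapshot_create)(void);   /* NULL when the driver lacks it */
} Drv;

typedef struct BDS {
    Drv *drv;                       /* format driver */
    struct BDS *file;               /* protocol driver state, if any */
} BDS;

/* Trivial handler used for demonstration. */
static int sketch_ok(void)
{
    return 0;
}

/* Prefer the format driver's handler; otherwise delegate to the
 * protocol driver underneath, mirroring the patch's dispatch order. */
static int snapshot_create_sketch(BDS *bs)
{
    if (bs == NULL || bs->drv == NULL) {
        return -ENOMEDIUM;
    }
    if (bs->drv->snapshot_create) {
        return bs->drv->snapshot_create();
    }
    if (bs->file) {
        return snapshot_create_sketch(bs->file);
    }
    return -ENOTSUP;
}
```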

Signed-off-by: MORITA Kazutaka 
---
 block.c |   61 +++--
 1 files changed, 43 insertions(+), 18 deletions(-)

diff --git a/block.c b/block.c
index da0dc47..cf80dbf 100644
--- a/block.c
+++ b/block.c
@@ -1697,9 +1697,11 @@ int bdrv_save_vmstate(BlockDriverState *bs, const uint8_t *buf,
 BlockDriver *drv = bs->drv;
 if (!drv)
 return -ENOMEDIUM;
-if (!drv->bdrv_save_vmstate)
-return -ENOTSUP;
-return drv->bdrv_save_vmstate(bs, buf, pos, size);
+if (drv->bdrv_save_vmstate)
+return drv->bdrv_save_vmstate(bs, buf, pos, size);
+if (bs->file)
+return bdrv_save_vmstate(bs->file, buf, pos, size);
+return -ENOTSUP;
 }
 
 int bdrv_load_vmstate(BlockDriverState *bs, uint8_t *buf,
@@ -1708,9 +1710,11 @@ int bdrv_load_vmstate(BlockDriverState *bs, uint8_t *buf,
 BlockDriver *drv = bs->drv;
 if (!drv)
 return -ENOMEDIUM;
-if (!drv->bdrv_load_vmstate)
-return -ENOTSUP;
-return drv->bdrv_load_vmstate(bs, buf, pos, size);
+if (drv->bdrv_load_vmstate)
+return drv->bdrv_load_vmstate(bs, buf, pos, size);
+if (bs->file)
+return bdrv_load_vmstate(bs->file, buf, pos, size);
+return -ENOTSUP;
 }
 
 void bdrv_debug_event(BlockDriverState *bs, BlkDebugEvent event)
@@ -1734,20 +1738,37 @@ int bdrv_snapshot_create(BlockDriverState *bs,
 BlockDriver *drv = bs->drv;
 if (!drv)
 return -ENOMEDIUM;
-if (!drv->bdrv_snapshot_create)
-return -ENOTSUP;
-return drv->bdrv_snapshot_create(bs, sn_info);
+if (drv->bdrv_snapshot_create)
+return drv->bdrv_snapshot_create(bs, sn_info);
+if (bs->file)
+return bdrv_snapshot_create(bs->file, sn_info);
+return -ENOTSUP;
 }
 
 int bdrv_snapshot_goto(BlockDriverState *bs,
const char *snapshot_id)
 {
 BlockDriver *drv = bs->drv;
+int ret, open_ret;
+
 if (!drv)
 return -ENOMEDIUM;
-if (!drv->bdrv_snapshot_goto)
-return -ENOTSUP;
-return drv->bdrv_snapshot_goto(bs, snapshot_id);
+if (drv->bdrv_snapshot_goto)
+return drv->bdrv_snapshot_goto(bs, snapshot_id);
+
+if (bs->file) {
+drv->bdrv_close(bs);
+ret = bdrv_snapshot_goto(bs->file, snapshot_id);
+open_ret = drv->bdrv_open(bs, bs->open_flags);
+if (open_ret < 0) {
+bdrv_delete(bs->file);
+bs->drv = NULL;
+return open_ret;
+}
+return ret;
+}
+
+return -ENOTSUP;
 }
 
 int bdrv_snapshot_delete(BlockDriverState *bs, const char *snapshot_id)
@@ -1755,9 +1776,11 @@ int bdrv_snapshot_delete(BlockDriverState *bs, const char *snapshot_id)
 BlockDriver *drv = bs->drv;
 if (!drv)
 return -ENOMEDIUM;
-if (!drv->bdrv_snapshot_delete)
-return -ENOTSUP;
-return drv->bdrv_snapshot_delete(bs, snapshot_id);
+if (drv->bdrv_snapshot_delete)
+return drv->bdrv_snapshot_delete(bs, snapshot_id);
+if (bs->file)
+return bdrv_snapshot_delete(bs->file, snapshot_id);
+return -ENOTSUP;
 }
 
 int bdrv_snapshot_list(BlockDriverState *bs,
@@ -1766,9 +1789,11 @@ int bdrv_snapshot_list(BlockDriverState *bs,
 BlockDriver *drv = bs->drv;
 if (!drv)
 return -ENOMEDIUM;
-if (!drv->bdrv_snapshot_list)
-return -ENOTSUP;
-return drv->bdrv_snapshot_list(bs, psn_info);
+if (drv->bdrv_snapshot_list)
+return drv->bdrv_snapshot_list(bs, psn_info);
+if (bs->file)
+return bdrv_snapshot_list(bs->file, psn_info);
+return -ENOTSUP;
 }
 
 #define NB_SUFFIXES 4
-- 
1.5.6.5




Re: [Qemu-devel] [PATCH 00/62] s390x tcg target

2010-05-27 Thread Richard Henderson
On 05/27/2010 02:00 PM, Blue Swirl wrote:
>>  tcg-s390: Update disassembler from binutils head.
> 
> This is GPLv3, which is not OK. Please use the last v2 version, see
> 88103cfecf5666237fb2e55a7dd666fa66d316ec.

Ok.  Thankfully there aren't too many changes since then.

I'll wait for more comments before reorganizing the patches
on the branch.



r~



Re: [Qemu-devel] Re: [RFT][PATCH 07/15] qemu_irq: Add IRQ handlers with delivery feedback

2010-05-27 Thread Paul Brook

> >> Then the amount
> >> of CPU cycles between timer interrupts would increase and hopefully
> >> the guest can keep up. If the guest sleeps, time base could be
> >> accelerated to catch up with wall clock and then set back to 1:1 rate.
> > 
> > Can't follow you ATM, sorry. What should be slowed down then? And how
> > precisely?
> 
> I think vm_clock and everything that depends on vm_clock, also
> rtc_clock should be tied to vm_clock in this mode, not host_clock.

The problem is more fundamental than that. There is no real correlation 
between vm_clock and the amount of code executed by the guest, especially not 
on timescales less than a second.

Paul



Re: [Qemu-devel] [PATCH 00/62] s390x tcg target

2010-05-27 Thread Blue Swirl
On Thu, May 27, 2010 at 8:45 PM, Richard Henderson wrote:
> The following patch series is available at
>
>  git://repo.or.cz/qemu/rth.git tcg-s390-2
>
> It begins with Uli Hecht's original patch, posted by Alexander
> sometime last year.  I then make incremental changes to
>
>  (1) Make it compile -- first patch that compiles is tagged
>      as tcg-s390-2-first-compile and is
>
>      d142103... tcg-s390: Define tcg_target_reg_names.
>
>  (2) Make it work -- the first patch that i386-linux-user
>      successfully completes linux-test-user-0.2 is tagged
>      as tcg-s390-2-first-working and is
>
>      3571f8d... tcg-s390: Implement setcond.
>
>  (3) Make it work for other targets.  I don't tag this,
>      but there are lots of load/store aborts and an
>      incorrect division routine until
>
>      9798371... tcg-s390: Implement div2.
>
>  (4) Make it work well.  The balance of the patches incrementally
>      add support for new instructions.  At
>
>      7bfaa9e... tcg-s390: Query instruction extensions that are installed.
>
>      I add support for detecting the instruction set extensions
>      present in the host and then start disabling some of those
>      new instructions that may not be present.
>
> Once things start working, each step was tested with an --enable-debug
> compile, and running the linux-user-test suite as well as booting
> the {arm,coldfire,sparc}-linux test kernels, and booting freedos.
>
> Unfortunately, each step was only built without optimization, and it
> is only at the end that we discovered that TCG was not properly honoring
> the host ABI.  This is solved by the last patch, adding proper sign
> extensions for the 32-bit function arguments.  With the final patch
> everything works for an optimized build as well.
>
> The current state is that the TCG compiler works for an s390x host.
> That is, with a 64-bit userland binary.  It will *compile* for a
> 32-bit userland binary, but that facility is only retained for the
> purpose of running the s390 kvm guest.  If kvm is not used, the
> 32-bit binary will exit with an error message.
>
> Given that this is the beginning of proper support for s390, I don't
> know whether bisectability is really an issue.  I suppose we could
> fairly easily re-base the patches that touch files outside tcg/s390/
> and then squash the rest, but I suspect the history may be useful.
>
>
>
> r~
>
>
>
> Alexander Graf (2):
>  S390 TCG target
>  add lost chunks from the original patch
>
> Richard Henderson (60):
>  tcg-s390: Only validate CPUTLBEntry for system mode.
>  tcg-s390: Fix tcg_prepare_qemu_ldst for user mode.
>  tcg-s390: Move opcode defines to tcg-target.c.
>  s390x: Avoid _llseek.
>  s390x: Don't use a linker script for user-only.
>  tcg-s390: Avoid set-but-not-used werrors.
>  tcg-s390: Mark R0 & R15 reserved.
>  tcg-s390: R6 is a function argument register
>  tcg-s390: Move tcg_out_mov up and use it throughout.
>  tcg-s390: Eliminate the S constraint.
>  tcg-s390: Add -m64 and -march to s390x compilation.
>  tcg-s390: Define tcg_target_reg_names.
>  tcg-s390: Update disassembler from binutils head.

This is GPLv3, which is not OK. Please use the last v2 version, see
88103cfecf5666237fb2e55a7dd666fa66d316ec.

>  tcg-s390: Compute is_write in cpu_signal_handler.
>  tcg-s390: Reorganize instruction emission
>  tcg-s390: Use matching constraints.
>  tcg-s390: Fixup qemu_ld/st opcodes.
>  tcg-s390: Implement setcond.
>  tcg-s390: Generalize the direct load/store emission.
>  tcg-s390: Tidy branches.
>  tcg-s390: Add tgen_calli.
>  tcg-s390: Implement div2.
>  tcg-s390: Re-implement tcg_out_movi.
>  tcg-s390: Implement sign and zero-extension operations.
>  tcg-s390: Implement bswap operations.
>  tcg-s390: Implement rotates.
>  tcg-s390: Use LOAD COMPLIMENT for negate.
>  tcg-s390: Tidy unimplemented opcodes.
>  tcg-s390: Use the extended-immediate facility for add/sub.
>  tcg-s390: Implement immediate ANDs.
>  tcg-s390: Implement immediate ORs.
>  tcg-s390: Implement immediate MULs.
>  tcg-s390: Implement immediate XORs.
>  tcg-s390: Icache flush is a no-op.
>  tcg-s390: Define TCG_TMP0.
>  tcg-s390: Tidy regset initialization; use R14 as temporary.
>  tcg-s390: Rearrange register allocation order.
>  tcg-s390: Tidy goto_tb.
>  tcg-s390: Allocate the code_gen_buffer near the main program.
>  tcg-s390: Rearrange qemu_ld/st to avoid register copy.
>  tcg-s390: Tidy tcg_prepare_qemu_ldst.
>  tcg-s390: Tidy user qemu_ld/st.
>  tcg-s390: Implement GUEST_BASE.
>  tcg-s390: Query instruction extensions that are installed.
>  tcg-s390: Conditionalize general-instruction-extension insns.
>  tcg-s390: Conditionalize ADD IMMEDIATE instructions.
>  tcg-s390: Conditionalize LOAD IMMEDIATE instructions.
>  tcg-s390: Conditionalize 8 and 16 bit extensions.
>  tcg-s390: Conditionalize AND IMMEDIATE instructions.
>  tcg-s390: Conditionalize OR IMMEDIATE instructions.
>  tcg-s390: Conditionalize XOR IMMEDIATE instructions.
>  tcg-s390: Do not require the

[Qemu-devel] [PATCH 60/62] tcg-s390: Fix TLB comparison width.

2010-05-27 Thread Richard Henderson
The TLB comparator is sized for the target.
Use a 32-bit compare when appropriate.

Signed-off-by: Richard Henderson 
---
 tcg/s390/tcg-target.c |   12 ++--
 1 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/tcg/s390/tcg-target.c b/tcg/s390/tcg-target.c
index 6101255..ec4c72a 100644
--- a/tcg/s390/tcg-target.c
+++ b/tcg/s390/tcg-target.c
@@ -174,7 +174,9 @@ typedef enum S390Opcode {
 RS_SRL  = 0x88,
 
 RXY_AG  = 0xe308,
+RXY_AY  = 0xe35a,
 RXY_CG  = 0xe320,
+RXY_CY  = 0xe359,
 RXY_LB  = 0xe376,
 RXY_LG  = 0xe304,
 RXY_LGB = 0xe377,
@@ -198,6 +200,8 @@ typedef enum S390Opcode {
 RXY_STRVH   = 0xe33f,
 RXY_STY = 0xe350,
 
+RX_A= 0x5a,
+RX_C= 0x59,
 RX_L= 0x58,
 RX_LH   = 0x48,
 RX_ST   = 0x50,
@@ -1442,7 +1446,11 @@ static void tcg_prepare_qemu_ldst(TCGContext* s, TCGReg data_reg,
 }
 assert(ofs < 0x8);
 
-tcg_out_insn(s, RXY, CG, arg0, arg1, TCG_AREG0, ofs);
+if (TARGET_LONG_BITS == 32) {
+tcg_out_mem(s, RX_C, RXY_CY, arg0, arg1, TCG_AREG0, ofs);
+} else {
+tcg_out_mem(s, 0, RXY_CG, arg0, arg1, TCG_AREG0, ofs);
+}
 
 if (TARGET_LONG_BITS == 32) {
 tgen_ext32u(s, arg0, addr_reg);
@@ -1494,7 +1502,7 @@ static void tcg_prepare_qemu_ldst(TCGContext* s, TCGReg data_reg,
 ofs = offsetof(CPUState, tlb_table[mem_index][0].addend);
 assert(ofs < 0x8);
 
-tcg_out_insn(s, RXY, AG, arg0, arg1, TCG_AREG0, ofs);
+tcg_out_mem(s, 0, RXY_AG, arg0, arg1, TCG_AREG0, ofs);
 }
 
 static void tcg_finish_qemu_ldst(TCGContext* s, uint16_t *label2_ptr)
-- 
1.7.0.1




[Qemu-devel] [PATCH 59/62] tcg-s390: Generalize load/store support.

2010-05-27 Thread Richard Henderson
Rename tcg_out_ldst to tcg_out_mem and add an index parameter.  If the
index parameter is present, handle it when the offset parameter is large
and the addend must be (partially) loaded.

Rename SH{32,64}_REG_NONE to TCG_REG_NONE, as the concept of a missing
register is not unique to the shift operations.

Adjust all users of tcg_out_mem to add TCG_REG_NONE as the index.

Signed-off-by: Richard Henderson 
---
 tcg/s390/tcg-target.c |   94 +---
 1 files changed, 49 insertions(+), 45 deletions(-)

diff --git a/tcg/s390/tcg-target.c b/tcg/s390/tcg-target.c
index 4e3fb8b..6101255 100644
--- a/tcg/s390/tcg-target.c
+++ b/tcg/s390/tcg-target.c
@@ -47,6 +47,7 @@
 #define TCG_CT_CONST_XORI  0x4000
 #define TCG_CT_CONST_CMPI  0x8000
 
+#define TCG_REG_NONE   TCG_REG_R0
 #define TCG_TMP0   TCG_REG_R14
 
 #ifdef CONFIG_USE_GUEST_BASE
@@ -204,9 +205,6 @@ typedef enum S390Opcode {
 RX_STH  = 0x40,
 } S390Opcode;
 
-#define SH32_REG_NONE  0
-#define SH64_REG_NONE  0
-
 #define LD_SIGNED  0x04
 #define LD_UINT8   0x00
 #define LD_INT8(LD_UINT8 | LD_SIGNED)
@@ -338,7 +336,7 @@ static void patch_reloc(uint8_t *code_ptr, int type,
 
 /* ??? Not the usual definition of "addend".  */
 pcrel2 = (value - (code_ptr_tl + addend)) >> 1;
-
+
 switch (type) {
 case R_390_PC16DBL:
 assert(pcrel2 == (int16_t)pcrel2);
@@ -597,7 +595,7 @@ static int tcg_target_const_match(tcg_target_long val,
 } else if (ct & TCG_CT_CONST_MULI) {
 /* Immediates that may be used with multiply.  If we have the
general-instruction-extensions, then we have MULTIPLY SINGLE
-   IMMEDIATE with a signed 32-bit, otherwise we have only 
+   IMMEDIATE with a signed 32-bit, otherwise we have only
MULTIPLY HALFWORD IMMEDIATE, with a signed 16-bit.  */
 if (facilities & FACILITY_GEN_INST_EXT) {
 return val == (int32_t)val;
@@ -799,17 +797,21 @@ static void tcg_out_movi(TCGContext *s, TCGType type,
OPC_RX:   If the operation has an RX format opcode (e.g. STC), otherwise 0.
OPC_RXY:  The RXY format opcode for the operation (e.g. STCY).  */
 
-static void tcg_out_ldst(TCGContext *s, S390Opcode opc_rx, S390Opcode opc_rxy,
- TCGReg data, TCGReg base, tcg_target_long ofs)
+static void tcg_out_mem(TCGContext *s, S390Opcode opc_rx, S390Opcode opc_rxy,
+TCGReg data, TCGReg base, TCGReg index,
+tcg_target_long ofs)
 {
-TCGReg index = 0;
-
 if (ofs < -0x8 || ofs >= 0x8) {
 /* Combine the low 16 bits of the offset with the actual load insn;
the high 48 bits must come from an immediate load.  */
-index = TCG_TMP0;
-tcg_out_movi(s, TCG_TYPE_PTR, index, ofs & ~0x);
+tcg_out_movi(s, TCG_TYPE_PTR, TCG_TMP0, ofs & ~0x);
 ofs &= 0x;
+
+/* If we were already given an index register, add it in.  */
+if (index != TCG_REG_NONE) {
+tcg_out_insn(s, RRE, AGR, TCG_TMP0, index);
+}
+index = TCG_TMP0;
 }
 
 if (opc_rx && ofs >= 0 && ofs < 0x1000) {
@@ -825,9 +827,9 @@ static inline void tcg_out_ld(TCGContext *s, TCGType type, TCGReg data,
   TCGReg base, tcg_target_long ofs)
 {
 if (type == TCG_TYPE_I32) {
-tcg_out_ldst(s, RX_L, RXY_LY, data, base, ofs);
+tcg_out_mem(s, RX_L, RXY_LY, data, base, TCG_REG_NONE, ofs);
 } else {
-tcg_out_ldst(s, 0, RXY_LG, data, base, ofs);
+tcg_out_mem(s, 0, RXY_LG, data, base, TCG_REG_NONE, ofs);
 }
 }
 
@@ -835,9 +837,9 @@ static inline void tcg_out_st(TCGContext *s, TCGType type, TCGReg data,
   TCGReg base, tcg_target_long ofs)
 {
 if (type == TCG_TYPE_I32) {
-tcg_out_ldst(s, RX_ST, RXY_STY, data, base, ofs);
+tcg_out_mem(s, RX_ST, RXY_STY, data, base, TCG_REG_NONE, ofs);
 } else {
-tcg_out_ldst(s, 0, RXY_STG, data, base, ofs);
+tcg_out_mem(s, 0, RXY_STG, data, base, TCG_REG_NONE, ofs);
 }
 }
 
@@ -871,14 +873,14 @@ static void tgen_ext8s(TCGContext *s, TCGType type, TCGReg dest, TCGReg src)
 
 if (type == TCG_TYPE_I32) {
 if (dest == src) {
-tcg_out_sh32(s, RS_SLL, dest, SH32_REG_NONE, 24);
+tcg_out_sh32(s, RS_SLL, dest, TCG_REG_NONE, 24);
 } else {
-tcg_out_sh64(s, RSY_SLLG, dest, src, SH64_REG_NONE, 24);
+tcg_out_sh64(s, RSY_SLLG, dest, src, TCG_REG_NONE, 24);
 }
-tcg_out_sh32(s, RS_SRA, dest, SH32_REG_NONE, 24);
+tcg_out_sh32(s, RS_SRA, dest, TCG_REG_NONE, 24);
 } else {
-tcg_out_sh64(s, RSY_SLLG, dest, src, SH64_REG_NONE, 56);
-tcg_out_sh64(s, RSY_SRAG, dest, dest, SH64_REG_NONE, 56);
+tcg_out_sh64(s, RSY_SLLG, dest, src, TCG_REG_NONE, 56);
+tcg_out_sh64(s, RSY_SRAG, dest, dest, TCG_REG_NONE, 56);

[Qemu-devel] [PATCH 41/62] tcg-s390: Allocate the code_gen_buffer near the main program.

2010-05-27 Thread Richard Henderson
This allows the use of direct calls to the helpers,
and a direct branch back to the epilogue.

Signed-off-by: Richard Henderson 
---
 exec.c |7 +++
 1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/exec.c b/exec.c
index bb3dcad..7bbfe60 100644
--- a/exec.c
+++ b/exec.c
@@ -519,6 +519,13 @@ static void code_gen_alloc(unsigned long tb_size)
 start = (void *) 0x01000000UL;
 if (code_gen_buffer_size > 16 * 1024 * 1024)
 code_gen_buffer_size = 16 * 1024 * 1024;
+#elif defined(__s390x__)
+/* Map the buffer so that we can use direct calls and branches.  */
+/* We have a +- 4GB range on the branches; leave some slop.  */
+if (code_gen_buffer_size > (3ul * 1024 * 1024 * 1024)) {
+code_gen_buffer_size = 3ul * 1024 * 1024 * 1024;
+}
+start = (void *)0x90000000UL;
 #endif
 code_gen_buffer = mmap(start, code_gen_buffer_size,
PROT_WRITE | PROT_READ | PROT_EXEC,
-- 
1.7.0.1




[Qemu-devel] [PATCH 56/62] tcg-s390: Use the LOAD AND TEST instruction for compares.

2010-05-27 Thread Richard Henderson
This instruction is always available, and nicely eliminates
the constant load for comparisons against zero.

Signed-off-by: Richard Henderson 
---
 tcg/s390/tcg-target.c |  133 +---
 1 files changed, 91 insertions(+), 42 deletions(-)

diff --git a/tcg/s390/tcg-target.c b/tcg/s390/tcg-target.c
index 697c5e4..edae6a8 100644
--- a/tcg/s390/tcg-target.c
+++ b/tcg/s390/tcg-target.c
@@ -45,6 +45,7 @@
 #define TCG_CT_CONST_ANDI  0x1000
 #define TCG_CT_CONST_ORI   0x2000
 #define TCG_CT_CONST_XORI  0x4000
+#define TCG_CT_CONST_CMPI  0x8000
 
 #define TCG_TMP0   TCG_REG_R14
 
@@ -126,6 +127,7 @@ typedef enum S390Opcode {
 RRE_LLGHR   = 0xb985,
 RRE_LRVR= 0xb91f,
 RRE_LRVGR   = 0xb90f,
+RRE_LTGR= 0xb902,
 RRE_MSGR= 0xb90c,
 RRE_MSR = 0xb252,
 RRE_NGR = 0xb980,
@@ -141,6 +143,7 @@ typedef enum S390Opcode {
 RR_DR   = 0x1d,
 RR_LCR  = 0x13,
 RR_LR   = 0x18,
+RR_LTR  = 0x12,
 RR_NR   = 0x14,
 RR_OR   = 0x16,
 RR_SR   = 0x1b,
@@ -242,9 +245,6 @@ static const int tcg_target_call_oarg_regs[] = {
 TCG_REG_R3,
 };
 
-/* signed/unsigned is handled by using COMPARE and COMPARE LOGICAL,
-   respectively */
-
 #define S390_CC_EQ  8
 #define S390_CC_LT  4
 #define S390_CC_GT  2
@@ -252,19 +252,37 @@ static const int tcg_target_call_oarg_regs[] = {
 #define S390_CC_NE  (S390_CC_LT | S390_CC_GT)
 #define S390_CC_LE  (S390_CC_LT | S390_CC_EQ)
 #define S390_CC_GE  (S390_CC_GT | S390_CC_EQ)
+#define S390_CC_NEVER   0
 #define S390_CC_ALWAYS  15
 
+/* Condition codes that result from a COMPARE and COMPARE LOGICAL.  */
 static const uint8_t tcg_cond_to_s390_cond[10] = {
 [TCG_COND_EQ]  = S390_CC_EQ,
+[TCG_COND_NE]  = S390_CC_NE,
 [TCG_COND_LT]  = S390_CC_LT,
-[TCG_COND_LTU] = S390_CC_LT,
 [TCG_COND_LE]  = S390_CC_LE,
-[TCG_COND_LEU] = S390_CC_LE,
 [TCG_COND_GT]  = S390_CC_GT,
-[TCG_COND_GTU] = S390_CC_GT,
 [TCG_COND_GE]  = S390_CC_GE,
+[TCG_COND_LTU] = S390_CC_LT,
+[TCG_COND_LEU] = S390_CC_LE,
+[TCG_COND_GTU] = S390_CC_GT,
 [TCG_COND_GEU] = S390_CC_GE,
+};
+
+/* Condition codes that result from a LOAD AND TEST.  Here, we have no
+   unsigned instruction variation, however since the test is vs zero we
+   can re-map the outcomes appropriately.  */
+static const uint8_t tcg_cond_to_ltr_cond[10] = {
+[TCG_COND_EQ]  = S390_CC_EQ,
 [TCG_COND_NE]  = S390_CC_NE,
+[TCG_COND_LT]  = S390_CC_LT,
+[TCG_COND_LE]  = S390_CC_LE,
+[TCG_COND_GT]  = S390_CC_GT,
+[TCG_COND_GE]  = S390_CC_GE,
+[TCG_COND_LTU] = S390_CC_NEVER,
+[TCG_COND_LEU] = S390_CC_EQ,
+[TCG_COND_GTU] = S390_CC_NE,
+[TCG_COND_GEU] = S390_CC_ALWAYS,
 };
 
 #ifdef CONFIG_SOFTMMU
@@ -381,6 +399,10 @@ static int target_parse_constraint(TCGArgConstraint *ct, 
const char **pct_str)
 ct->ct &= ~TCG_CT_REG;
 ct->ct |= TCG_CT_CONST_XORI;
 break;
+case 'C':
+ct->ct &= ~TCG_CT_REG;
+ct->ct |= TCG_CT_CONST_CMPI;
+break;
 default:
 break;
 }
@@ -501,6 +523,13 @@ static int tcg_match_xori(int ct, tcg_target_long val)
 return 1;
 }
 
+/* Immediates to be used with comparisons.  */
+
+static int tcg_match_cmpi(int ct, tcg_target_long val)
+{
+return (val == 0);
+}
+
 /* Test if a constant matches the constraint. */
 static int tcg_target_const_match(tcg_target_long val,
   const TCGArgConstraint *arg_ct)
@@ -546,6 +575,8 @@ static int tcg_target_const_match(tcg_target_long val,
 return tcg_match_ori(ct, val);
 } else if (ct & TCG_CT_CONST_XORI) {
 return tcg_match_xori(ct, val);
+} else if (ct & TCG_CT_CONST_CMPI) {
+return tcg_match_cmpi(ct, val);
 }
 
 return 0;
@@ -1040,39 +1071,48 @@ static void tgen64_xori(TCGContext *s, TCGReg dest, 
tcg_target_ulong val)
 }
 }
 
-static void tgen32_cmp(TCGContext *s, TCGCond c, TCGReg r1, TCGReg r2)
-{
-if (c > TCG_COND_GT) {
-/* unsigned */
-tcg_out_insn(s, RR, CLR, r1, r2);
-} else {
-/* signed */
-tcg_out_insn(s, RR, CR, r1, r2);
-}
-}
-
-static void tgen64_cmp(TCGContext *s, TCGCond c, TCGReg r1, TCGReg r2)
+static int tgen_cmp(TCGContext *s, TCGType type, TCGCond c, TCGReg r1,
+TCGArg c2, int c2const)
 {
-if (c > TCG_COND_GT) {
-/* unsigned */
-tcg_out_insn(s, RRE, CLGR, r1, r2);
+if (c2const) {
+if (c2 == 0) {
+if (type == TCG_TYPE_I32) {
+tcg_out_insn(s, RR, LTR, r1, r1);
+} else {
+tcg_out_insn(s, RRE, LTGR, r1, r1);
+}
+return tcg_cond_to_ltr_cond[c];
+} else {
+tcg_abort();
+}
 } else {
-/* signed */
-tcg_out_insn(s, RRE, CGR, r1, r2);
+if (c > TCG_COND_GT) {
+/* unsigned */
+if (ty
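
The unsigned rows of tcg_cond_to_ltr_cond above follow from how unsigned comparisons behave against zero: LTU can never hold, LEU degenerates to EQ, GTU to NE, and GEU always holds. A quick C sanity check of those identities (helper names illustrative):

```c
#include <assert.h>
#include <stdint.h>

/* Identities behind the unsigned rows of tcg_cond_to_ltr_cond:
   against zero there is no unsigned "below", u <= 0 iff u == 0,
   u > 0 iff u != 0, and u >= 0 always holds. */
static int cmp_ltu0(uint64_t u) { return u <  (uint64_t)0; } /* S390_CC_NEVER  */
static int cmp_leu0(uint64_t u) { return u <= (uint64_t)0; } /* S390_CC_EQ     */
static int cmp_gtu0(uint64_t u) { return u >  (uint64_t)0; } /* S390_CC_NE     */
static int cmp_geu0(uint64_t u) { return u >= (uint64_t)0; } /* S390_CC_ALWAYS */
```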

Re: [Qemu-devel] Re: [RFT][PATCH 07/15] qemu_irq: Add IRQ handlers with delivery feedback

2010-05-27 Thread Jan Kiszka
Blue Swirl wrote:
> On Thu, May 27, 2010 at 7:08 PM, Jan Kiszka  wrote:
>> Blue Swirl wrote:
>>> On Thu, May 27, 2010 at 6:31 PM, Jan Kiszka  wrote:
 Blue Swirl wrote:
> On Wed, May 26, 2010 at 11:26 PM, Paul Brook  
> wrote:
>>> At the other extreme, would it be possible to make the educated guests
>>> aware of the virtualization also in clock aspect: virtio-clock?
>> The guest doesn't even need to be aware of virtualization. It just needs 
>> to be
>> able to accommodate the lack of guaranteed realtime behavior.
>>
>> The fundamental problem here is that some guest operating systems assume 
>> that
>> the hardware provides certain realtime guarantees with respect to 
>> execution of
>> interrupt handlers.  In particular they assume that the CPU will always 
>> be
>> able to complete execution of the timer IRQ handler before the periodic 
>> timer
>> triggers again.  In most virtualized environments you have absolutely no
>> guarantee of realtime response.
>>
>> With Linux guests this was solved a long time ago by the introduction of
>> tickless kernels.  These separate the timekeeping from wakeup events, so 
>> it
>> doesn't matter if several wakeup triggers end up getting merged (either 
>> at the
>> hardware level or via top/bottom half guest IRQ handlers).
>>
>>
>> It's worth mentioning that this problem also occurs on real hardware,
>> typically due to lame hardware/drivers which end up masking interrupts or
>> otherwise stall the CPU for long periods of time.
>>
>>
>> The PIT hack attempts to workaround broken guests by adding artificial 
>> latency
>> to the timer event, ensuring that the guest "sees" them all.  
>> Unfortunately
>> guests vary on when it is safe for them to see the next timer event, and
>> trying to observe this behavior involves potentially harmful heuristics 
>> and
>> collusion between unrelated devices (e.g. interrupt controller and 
>> timer).
>>
>> In some cases we don't even do that, and just reschedule the event some
>> arbitrarily small amount of time later. This assumes the guest to do 
>> useful
>> work in that time. In a single threaded environment this is probably 
>> true -
>> qemu got enough CPU to inject the first interrupt, so will probably 
>> manage to
>> execute some guest code before the end of its timeslice. In an 
>> environment
>> where interrupt processing/delivery and execution of the guest code 
>> happen in
>> different threads this becomes increasingly likely to fail.
> So any voodoo around timer events is doomed to fail in some cases.
> What's the amount of hacks what we want then? Is there any generic
 The aim of this patch is to reduce the amount of existing and upcoming
 hacks. It may still require some refinements, but I think we haven't
 found any smarter approach yet that fits existing use cases.
>>> I don't feel we have tried other possibilities hard enough.
>> Well, seeing prototypes wouldn't be bad, also to run real load againt
>> them. But at least I'm currently clueless what to implement.
> 
> Perhaps now is then not the time to rush to implement something, but
> to brainstorm for a clean solution.

And sometimes it can help to understand how ideas could even be improved
or why others don't work at all.

> 
> solution, like slowing down the guest system to the point where we can
> guarantee the interrupt rate vs. CPU execution speed?
 That's generally a non-option in virtualized production environments.
 Specifically if the guest system lost interrupts due to host
 overcommitment, you do not want it slow down even further.
>>> I meant that the guest time could be scaled down, for example 2s in
>>> wall clock time would be presented to the guest as 1s.
>> But that is precisely what already happens when the guest loses timer
>> interrupts. There is no other time source for this kind of guests -
>> often except for some external events generated by systems which you
>> don't want to fall behind arbitrarily.
>>
>>> Then the amount
>>> of CPU cycles between timer interrupts would increase and hopefully
>>> the guest can keep up. If the guest sleeps, time base could be
>>> accelerated to catch up with wall clock and then set back to 1:1 rate.
>> Can't follow you ATM, sorry. What should be slowed down then? And how
>> precisely?
> 
> I think vm_clock and everything that depends on vm_clock, also
> rtc_clock should be tied to vm_clock in this mode, not host_clock.

Let me check if I got this idea correctly: Instead of tuning just the
tick frequency of the affected timer device / sending its backlog in a
row, you rather want to tune the vm_clock correspondingly? Maybe a way
to abstract the required logic currently sitting only in the RTC for use
by other timer sources as well.

But just s
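
A minimal sketch of the time-scaling idea under discussion, with all names and the fixed divisor purely hypothetical: while the guest is overcommitted, vm_clock advances at a fraction of host time, and the divisor would return to 1 once the guest has caught up.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scaled clock: with divisor 2, the guest sees 1s of
   virtual time per 2s of wall-clock time, giving it more CPU cycles
   between timer interrupts. */
static int64_t scaled_vm_clock_ns(int64_t vm_base_ns, int64_t host_base_ns,
                                  int64_t host_now_ns, int divisor)
{
    return vm_base_ns + (host_now_ns - host_base_ns) / divisor;
}
```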

[Qemu-devel] [PATCH 53/62] tcg-s390: Conditionalize XOR IMMEDIATE instructions.

2010-05-27 Thread Richard Henderson
The immediate XOR instructions are in the extended-immediate
facility.  Use these only if present.

At the same time, pull the logic to load immediates into registers
into a constraint letter for TCG.

Signed-off-by: Richard Henderson 
---
 tcg/s390/tcg-target.c |   53 +++-
 1 files changed, 34 insertions(+), 19 deletions(-)

diff --git a/tcg/s390/tcg-target.c b/tcg/s390/tcg-target.c
index 36d4ad0..084448a 100644
--- a/tcg/s390/tcg-target.c
+++ b/tcg/s390/tcg-target.c
@@ -39,6 +39,7 @@
 #define TCG_CT_CONST_MULI  0x0800
 #define TCG_CT_CONST_ANDI  0x1000
 #define TCG_CT_CONST_ORI   0x2000
+#define TCG_CT_CONST_XORI  0x4000
 
 #define TCG_TMP0   TCG_REG_R14
 
@@ -363,6 +364,10 @@ static int target_parse_constraint(TCGArgConstraint *ct, 
const char **pct_str)
 ct->ct &= ~TCG_CT_REG;
 ct->ct |= TCG_CT_CONST_ORI;
 break;
+case 'X':
+ct->ct &= ~TCG_CT_REG;
+ct->ct |= TCG_CT_CONST_XORI;
+break;
 default:
 break;
 }
@@ -459,6 +464,30 @@ static int tcg_match_ori(int ct, tcg_target_long val)
 return 1;
 }
 
+/* Immediates to be used with logical XOR.  This is almost, but not quite,
+   only an optimization.  XOR with immediate is only supported with the
+   extended-immediate facility.  That said, there are a few patterns for
+   which it is better to load the value into a register first.  */
+
+static int tcg_match_xori(int ct, tcg_target_long val)
+{
+if ((facilities & FACILITY_EXT_IMM) == 0) {
+return 0;
+}
+
+if (ct & TCG_CT_CONST_32) {
+/* All 32-bit XORs can be performed with 1 48-bit insn.  */
+return 1;
+}
+
+/* Look for negative values.  These are best to load with LGHI.  */
+if (val < 0 && val == (int32_t)val) {
+return 0;
+}
+
+return 1;
+}
+
 /* Test if a constant matches the constraint. */
 static int tcg_target_const_match(tcg_target_long val,
   const TCGArgConstraint *arg_ct)
@@ -502,6 +531,8 @@ static int tcg_target_const_match(tcg_target_long val,
 return tcg_match_andi(ct, val);
 } else if (ct & TCG_CT_CONST_ORI) {
 return tcg_match_ori(ct, val);
+} else if (ct & TCG_CT_CONST_XORI) {
+return tcg_match_xori(ct, val);
 }
 
 return 0;
@@ -987,23 +1018,7 @@ static void tgen64_ori(TCGContext *s, TCGReg dest, 
tcg_target_ulong val)
 
 static void tgen64_xori(TCGContext *s, TCGReg dest, tcg_target_ulong val)
 {
-tcg_target_long sval = val;
-
-/* Zero-th, look for no-op.  */
-if (val == 0) {
-return;
-}
-
-/* First, look for 64-bit values for which it is better to load the
-   value first and perform the xor via registers.  This is true for
-   any 32-bit negative value, where the high 32-bits get flipped too.  */
-if (sval < 0 && sval == (int32_t)sval) {
-tcg_out_movi(s, TCG_TYPE_I64, TCG_TMP0, sval);
-tcg_out_insn(s, RRE, XGR, dest, TCG_TMP0);
-return;
-}
-
-/* Second, perform the xor by parts.  */
+/* Perform the xor by parts.  */
 if (val & 0xffffffffull) {
 tcg_out_insn(s, RIL, XILF, dest, val);
 }
@@ -1813,7 +1828,7 @@ static const TCGTargetOpDef s390_op_defs[] = {
 
 { INDEX_op_and_i32, { "r", "0", "rWA" } },
 { INDEX_op_or_i32, { "r", "0", "rWO" } },
-{ INDEX_op_xor_i32, { "r", "0", "ri" } },
+{ INDEX_op_xor_i32, { "r", "0", "rWX" } },
 { INDEX_op_neg_i32, { "r", "r" } },
 
 { INDEX_op_shl_i32, { "r", "0", "Ri" } },
@@ -1874,7 +1889,7 @@ static const TCGTargetOpDef s390_op_defs[] = {
 
 { INDEX_op_and_i64, { "r", "0", "rA" } },
 { INDEX_op_or_i64, { "r", "0", "rO" } },
-{ INDEX_op_xor_i64, { "r", "0", "ri" } },
+{ INDEX_op_xor_i64, { "r", "0", "rX" } },
 { INDEX_op_neg_i64, { "r", "r" } },
 
 { INDEX_op_shl_i64, { "r", "r", "Ri" } },
-- 
1.7.0.1
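
Why tcg_match_xori above rejects negative 32-bit constants: when such a value is materialized sign-extended (as a cheap LGHI does), XOR-ing it in necessarily flips the high 32 bits of the destination as well, so load-then-XGR beats two 48-bit immediate XORs. A small check (helper name illustrative):

```c
#include <assert.h>
#include <stdint.h>

/* XOR with a sign-extended negative 32-bit immediate flips the high
   32 bits of the destination too, since those bits of the operand are
   all ones. */
static uint64_t xor_sign_extended(uint64_t reg, int32_t imm)
{
    uint64_t v = (uint64_t)(int64_t)imm; /* what the register load materializes */
    return reg ^ v;
}
```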




Re: [Qemu-devel] Re: [RFT][PATCH 07/15] qemu_irq: Add IRQ handlers with delivery feedback

2010-05-27 Thread Paul Brook
> > In some cases we don't even do that, and just reschedule the event some
> > arbitrarily small amount of time later. This assumes the guest to do
> > useful work in that time. In a single threaded environment this is
> > probably true - qemu got enough CPU to inject the first interrupt, so
> > will probably manage to execute some guest code before the end of its
> > timeslice. In an environment where interrupt processing/delivery and
> > execution of the guest code happen in different threads this becomes
> > increasingly likely to fail.
> 
> So any voodoo around timer events is doomed to fail in some cases.

Depends on the level of voodoo.
My guess is that common guest operating systems require hacks which result in 
demonstrably incorrect behavior.

> What's the amount of hacks what we want then? Is there any generic
> solution, like slowing down the guest system to the point where we can
> guarantee the interrupt rate vs. CPU execution speed?

The "-icount N" option gives deterministic virtual realtime behavior; however,
the guest is completely decoupled from real-world time.
The "-icount auto" option gives semi-deterministic behavior while maintaining 
overall consistency with the real world.  This may introduce some small-scale 
time jitter, but will still satisfy all but the most demanding hard-real-time 
assumptions.
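
As a rough model of why "-icount N" is deterministic: virtual time is a pure function of the executed-instruction count, with each instruction accounting for 2^N clock ticks, so replaying the same instruction stream reproduces the same timings. A sketch (helper name illustrative):

```c
#include <assert.h>
#include <stdint.h>

/* With "-icount N", each executed instruction accounts for 2^N clock
   ticks, so virtual time depends only on how many instructions ran,
   not on host scheduling. */
static int64_t icount_to_vm_ticks(int64_t executed_insns, int shift)
{
    return executed_insns << shift;
}
```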

Neither of these options works with KVM. It may be possible to implement
something using performance counters.  I don't know how much additional
overhead this would involve.

Paul



[Qemu-devel] Re: SVM emulation: EVENTINJ marked valid when a pagefault happens while issuing a software interrupt

2010-05-27 Thread Jan Kiszka
Erik van der Kouwe wrote:
> Hi,
> 
>> Be warned: Though my experience is already more than a year old, the SVM
>> emulation in QEMU is most probably not yet rock-stable. Always check
>> suspicious behavior against real hardware and/or the spec. [ As real
>> hardware is everywhere, nesting works with KVM+SVM and is much faster,
>> motivation to improve QEMU in this area is unfortunately limited. ]
> 
> Problem is: I'm compiling in Linux and testing in MINIX. Testing on the
> real hardware would require a reboot everytime. Moreover, it might screw
> up my system if I make bad mistakes (the MINIX filesystem is easily
> corrupted).

Use Linux+KVM as host OS, it can also run VMMs as guests (aka nested
SVM). And you could even debug those guests just like when you would run
QEMU in emulation mode. In contrast to SVM emulation, nesting is fairly
stable AFAIK. And it is faster.

> 
> That said, I do aim to eventually test the real hardware. Plenty of
> virtualization capable hardware where I work, although unfortunately all
> Intel.
> 
>>> This issue is easy to work around by clearing the EVENTINJ field on each
>>> #VMEXIT (and I have submitted a patch to that effect to the Palacios
>>> people) and this approach is also found in KVM.
>>
>> /me does not find such clearing in KVM - what line(s) are you looking at?
> 
> Linux source tree (2.6.31-ubuntu), arch/x86/kvm/svm.c, end of function
> nested_svm_vmrun. Here event_inj and event_inj_err are copied from a
> different VMCB, effectively clearing the value set by the CPU. Maybe
> this isn't where I should have been looking though?

Yep. This is the path taken for injecting events when switching from
level-1 to level-2 guests, i.e. you are running some VMM inside KVM.

> 
>>> The relevant code is in target-i386/op_helper.c. The "handle_even_inj"
>>> function sets the EVENTINJ field (called event_inf in the QEMU code) and
>>> the helper_vmexit function copies that field into EXITINTINFO
>>> (exit_int_info in the QEMU code). I believe (but once again, am not
>>> certain) that the SVM documentation only says that this information
>>> should be stored in EXITINTINFO.
>>
>> Yes, this also looks suspicious. handle_even_inj should not push the
>> real (level 1) event to be injected into event_inj[_err] but into
>> exit_int_info[_err] or some temporary fields from which the exit info is
>> then loaded later on.
> 
> Yes, if this is indeed incorrect behaviour then this is what I would
> expect a fix to be like.
> 
> Thanks again,
> Erik

Jan



signature.asc
Description: OpenPGP digital signature


[Qemu-devel] [PATCH 52/62] tcg-s390: Conditionalize OR IMMEDIATE instructions.

2010-05-27 Thread Richard Henderson
The 32-bit immediate OR instructions are in the extended-immediate
facility.  Use these only if present.

At the same time, pull the logic to load immediates into registers
into a constraint letter for TCG.

Signed-off-by: Richard Henderson 
---
 tcg/s390/tcg-target.c |   92 +
 1 files changed, 70 insertions(+), 22 deletions(-)

diff --git a/tcg/s390/tcg-target.c b/tcg/s390/tcg-target.c
index 359f6d1..36d4ad0 100644
--- a/tcg/s390/tcg-target.c
+++ b/tcg/s390/tcg-target.c
@@ -38,6 +38,7 @@
 #define TCG_CT_CONST_ADDI  0x0400
 #define TCG_CT_CONST_MULI  0x0800
 #define TCG_CT_CONST_ANDI  0x1000
+#define TCG_CT_CONST_ORI   0x2000
 
 #define TCG_TMP0   TCG_REG_R14
 
@@ -358,6 +359,10 @@ static int target_parse_constraint(TCGArgConstraint *ct, 
const char **pct_str)
 ct->ct &= ~TCG_CT_REG;
 ct->ct |= TCG_CT_CONST_ANDI;
 break;
+case 'O':
+ct->ct &= ~TCG_CT_REG;
+ct->ct |= TCG_CT_CONST_ORI;
+break;
 default:
 break;
 }
@@ -424,6 +429,36 @@ static int tcg_match_andi(int ct, tcg_target_ulong val)
 return 1;
 }
 
+/* Immediates to be used with logical OR.  This is an optimization only,
+   since a full 64-bit immediate OR can always be performed with 4 sequential
+   OI[LH][LH] instructions.  What we're looking for is immediates that we
+   can load efficiently, and the immediate load plus the reg-reg OR is
+   smaller than the sequential OI's.  */
+
+static int tcg_match_ori(int ct, tcg_target_long val)
+{
+if (facilities & FACILITY_EXT_IMM) {
+if (ct & TCG_CT_CONST_32) {
+/* All 32-bit ORs can be performed with 1 48-bit insn.  */
+return 1;
+}
+}
+
+/* Look for negative values.  These are best to load with LGHI.  */
+if (val < 0) {
+if (val == (int16_t)val) {
+return 0;
+}
+if (facilities & FACILITY_EXT_IMM) {
+if (val == (int32_t)val) {
+return 0;
+}
+}
+}
+
+return 1;
+}
+
 /* Test if a constant matches the constraint. */
 static int tcg_target_const_match(tcg_target_long val,
   const TCGArgConstraint *arg_ct)
@@ -465,6 +500,8 @@ static int tcg_target_const_match(tcg_target_long val,
 }
 } else if (ct & TCG_CT_CONST_ANDI) {
 return tcg_match_andi(ct, val);
+} else if (ct & TCG_CT_CONST_ORI) {
+return tcg_match_ori(ct, val);
 }
 
 return 0;
@@ -907,34 +944,45 @@ static void tgen64_ori(TCGContext *s, TCGReg dest, 
tcg_target_ulong val)
 
 int i;
 
-/* Zero-th, look for no-op.  */
+/* Look for no-op.  */
 if (val == 0) {
 return;
 }
 
-/* First, try all 32-bit insns that can perform it in one go.  */
-for (i = 0; i < 4; i++) {
-tcg_target_ulong mask = (0xffffull << i*16);
-if ((val & mask) != 0 && (val & ~mask) == 0) {
-tcg_out_insn_RI(s, oi_insns[i], dest, val >> i*16);
-return;
+if (facilities & FACILITY_EXT_IMM) {
+/* Try all 32-bit insns that can perform it in one go.  */
+for (i = 0; i < 4; i++) {
+tcg_target_ulong mask = (0xffffull << i*16);
+if ((val & mask) != 0 && (val & ~mask) == 0) {
+tcg_out_insn_RI(s, oi_insns[i], dest, val >> i*16);
+return;
+}
 }
-}
 
-/* Second, try all 48-bit insns that can perform it in one go.  */
-for (i = 0; i < 2; i++) {
-tcg_target_ulong mask = (0xffffffffull << i*32);
-if ((val & mask) != 0 && (val & ~mask) == 0) {
-tcg_out_insn_RIL(s, nif_insns[i], dest, val >> i*32);
-return;
+/* Try all 48-bit insns that can perform it in one go.  */
+for (i = 0; i < 2; i++) {
+tcg_target_ulong mask = (0xffffffffull << i*32);
+if ((val & mask) != 0 && (val & ~mask) == 0) {
+tcg_out_insn_RIL(s, nif_insns[i], dest, val >> i*32);
+return;
+}
 }
-}
 
-/* Last, perform the OR via sequential modifications to the
-   high and low parts.  Do this via recursion to handle 16-bit
-   vs 32-bit masks in each half.  */
-tgen64_ori(s, dest, val & 0x00000000ffffffffull);
-tgen64_ori(s, dest, val & 0xffffffff00000000ull);
+/* Perform the OR via sequential modifications to the high and
+   low parts.  Do this via recursion to handle 16-bit vs 32-bit
+   masks in each half.  */
+tgen64_ori(s, dest, val & 0x00000000ffffffffull);
+tgen64_ori(s, dest, val & 0xffffffff00000000ull);
+} else {
+/* With no extended-immediate facility, we don't need to be so
+   clever.  Just iterate over the insns and mask in the constant.  */
+for (i = 0; i < 4; i++) {
+tcg_target_ulong mask = (0xffffull << i*16);
+if ((val & mask) != 0) {
+tcg_ou
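
The no-extended-immediate fallback in tgen64_ori above ORs the constant in as up to four 16-bit pieces, one OI[LH][LH] instruction per non-zero piece. A C model of that loop (helper name illustrative):

```c
#include <assert.h>
#include <stdint.h>

/* OR a 64-bit constant into dest in 16-bit pieces, skipping pieces
   that are zero, mirroring the OI[LH][LH] fallback path. */
static uint64_t or_by_parts(uint64_t dest, uint64_t val)
{
    int i;
    for (i = 0; i < 4; i++) {
        uint64_t mask = 0xffffULL << (i * 16);
        if (val & mask) {
            dest |= val & mask; /* one OI[LH][LH] instruction */
        }
    }
    return dest;
}
```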

[Qemu-devel] Re: [PATCH v3 10/17] QMP: Reserve namespace for complex object classes

2010-05-27 Thread Jan Kiszka
Luiz Capitulino wrote:
> On Sun, 23 May 2010 12:59:23 +0200
> Jan Kiszka  wrote:
> 
>> From: Jan Kiszka 
>>
>> This reserves JSON objects that contain the key '__class__' for QMP-specific
>> complex objects. First user will be the buffer class.
>>
>> Signed-off-by: Jan Kiszka 
>> ---
>>  QMP/qmp-spec.txt |   16 +---
>>  1 files changed, 13 insertions(+), 3 deletions(-)
>>
>> diff --git a/QMP/qmp-spec.txt b/QMP/qmp-spec.txt
>> index 9d30a8c..fa1dd62 100644
>> --- a/QMP/qmp-spec.txt
>> +++ b/QMP/qmp-spec.txt
>> @@ -146,6 +146,15 @@ The format is:
>>  For a listing of supported asynchronous events, please, refer to the
>>  qmp-events.txt file.
>>  
>> +2.6 Complex object classes
>> +--
>> +
>> +JSON objects that contain the key-value pair '"__class__": json-string' are
> 
>  I'm not strong about this, but it's better to call it just a 'pair', as 
> 'value'
> is a bit problematic because of json-value.

Hmm, the official term is "name/value pairs". Will use that instead.

> 
>> +reserved for QMP-specific complex object classes that. QMP specifies which
> 
>  Early full stop?

Obviously. I just don't remember what I wanted to add.

> 
>> +further keys each of these objects include and how they are encoded.
>> +
>> +So far, no complex object class is specified.
>> +
>>  3. QMP Examples
>>  ===
>>  
>> @@ -229,9 +238,10 @@ avoid modifying QMP.  Both upstream and downstream need 
>> to take care to
>>  preserve long-term compatibility and interoperability.
>>  
>>  To help with that, QMP reserves JSON object member names beginning with
>> -'__' (double underscore) for downstream use ("downstream names").  This
>> -means upstream will never use any downstream names for its commands,
>> -arguments, errors, asynchronous events, and so forth.
>> +'__' (double underscore) for downstream use ("downstream names").  
>> Downstream
>> +names MUST NOT end with '__' as this pattern is reserved for QMP-defined 
>> JSON
>> +object classes.  Upstream will never use any downstream names for its
>> +commands, arguments, errors, asynchronous events, and so forth.
> 
>  Suggest mentioning subsection 2.6.

OK.

Thanks,
Jan
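
For illustration only (no complex object class is specified yet, so the class and member names below are hypothetical), a buffer object under this reservation could look like:

```json
{
    "__class__": "buffer",
    "data": "aGVsbG8gd29ybGQ=",
    "encoding": "base64"
}
```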





[Qemu-devel] Re: [PATCH v3 06/17] qdev: Allow device specification by qtree path for device_del

2010-05-27 Thread Jan Kiszka
Luiz Capitulino wrote:
> On Sun, 23 May 2010 12:59:19 +0200
> Jan Kiszka  wrote:
> 
>> From: Jan Kiszka 
>>
>> Allow to specify the device to be removed via device_del not only by ID
>> but also by its full or abbreviated qtree path. For this purpose,
>> qdev_find is introduced which combines walking the qtree with searching
>> for device IDs if required.
> 
>  [...]
> 
>>  Arguments:
>>  
>> -- "id": the device's ID (json-string)
>> +- "path": the device's qtree path or unique ID (json-string)
>>  
>>  Example:
>>  
>> --> { "execute": "device_del", "arguments": { "id": "net1" } }
>> +-> { "execute": "device_del", "arguments": { "path": "net1" } }
> 
>  Doesn't seem like a good change to me, besides being incompatible[1] we
> shouldn't overload arguments this way in QMP as overloading leads to
> interface degradation (harder to use, understand, maintain).

It's not overloaded; think of an ID as a (weak) symbolic link in the
qtree filesystem. The advantage of basing everything on top of full or
abbreviated qtree paths is that IDs are not always assigned, paths are.

> 
>  Maybe we could have both arguments as optional, but one must be passed.

This would at least require some way to keep the proposed unified path
specification for the human monitor (having separate arguments there is
really unhandy).

> 
> [1] It's 'legal' to break the protocol before 0.13, but this has to be
> coordinated with libvirt so we should have a good reason to do this
> 
>>  <- { "return": {} }
>>  
>>  EQMP
> 

Jan





[Qemu-devel] [PATCH 49/62] tcg-s390: Conditionalize LOAD IMMEDIATE instructions.

2010-05-27 Thread Richard Henderson
The LOAD IMMEDIATE and (some of) the LOAD LOGICAL IMMEDIATE instructions
are in the extended-immediate facility.  Begin making that facility
optional by using these only if present.  Thankfully, the LOAD ADDRESS
RELATIVE and the LOAD LOGICAL IMMEDIATE insns with 16-bit constants are
always available.

Signed-off-by: Richard Henderson 
---
 tcg/s390/tcg-target.c |   79 
 1 files changed, 59 insertions(+), 20 deletions(-)

diff --git a/tcg/s390/tcg-target.c b/tcg/s390/tcg-target.c
index b66778a..491de07 100644
--- a/tcg/s390/tcg-target.c
+++ b/tcg/s390/tcg-target.c
@@ -491,7 +491,7 @@ static void tcg_out_movi(TCGContext *s, TCGType type,
 sval = (int32_t)sval;
 }
 
-/* First, try all 32-bit insns that can load it in one go.  */
+/* Try all 32-bit insns that can load it in one go.  */
 if (sval >= -0x8000 && sval < 0x8000) {
 tcg_out_insn(s, RI, LGHI, ret, sval);
 return;
@@ -505,22 +505,22 @@ static void tcg_out_movi(TCGContext *s, TCGType type,
 }
 }
 
-/* Second, try all 48-bit insns that can load it in one go.  */
-if (sval == (int32_t)sval) {
-tcg_out_insn(s, RIL, LGFI, ret, sval);
-return;
-}
-if (uval <= 0xffffffff) {
-tcg_out_insn(s, RIL, LLILF, ret, uval);
-return;
-}
-if ((uval & 0xffffffff) == 0) {
-tcg_out_insn(s, RIL, LLIHF, ret, uval >> 32);
-return;
+/* Try all 48-bit insns that can load it in one go.  */
+if (facilities & FACILITY_EXT_IMM) {
+if (sval == (int32_t)sval) {
+tcg_out_insn(s, RIL, LGFI, ret, sval);
+return;
+}
+if (uval <= 0xffffffff) {
+tcg_out_insn(s, RIL, LLILF, ret, uval);
+return;
+}
+if ((uval & 0xffffffff) == 0) {
+tcg_out_insn(s, RIL, LLIHF, ret, uval >> 32);
+return;
+}
 }
 
-/* If we get here, both the high and low parts have non-zero bits.  */
-
 /* Try for PC-relative address load.  */
 if ((sval & 1) == 0) {
 intptr_t off = (sval - (intptr_t)s->code_ptr) >> 1;
@@ -530,17 +530,56 @@ static void tcg_out_movi(TCGContext *s, TCGType type,
 }
 }
 
+/* If extended immediates are not present, then we may have to issue
+   several instructions to load the low 32 bits.  */
+if (!(facilities & FACILITY_EXT_IMM)) {
+/* A 32-bit unsigned value can be loaded in 2 insns.  And given
+   that the lli_insns loop above did not succeed, we know that
+   both insns are required.  */
+if (uval <= 0xffffffff) {
+tcg_out_insn(s, RI, LLILL, ret, uval);
+tcg_out_insn(s, RI, IILH, ret, uval >> 16);
+return;
+}
+
+/* If all high bits are set, the value can be loaded in 2 or 3 insns.
+   We first want to make sure that all the high bits get set.  With
+   luck the low 16-bits can be considered negative to perform that for
+   free, otherwise we load an explicit -1.  */
+if (sval >> 32 == -1) {
+if (uval & 0x8000) {
+tcg_out_insn(s, RI, LGHI, ret, uval);
+} else {
+tcg_out_insn(s, RI, LGHI, ret, -1);
+tcg_out_insn(s, RI, IILL, ret, uval);
+}
+tcg_out_insn(s, RI, IILH, ret, uval >> 16);
+return;
+}
+}
+
+/* If we get here, both the high and low parts have non-zero bits.  */
+
 /* Recurse to load the lower 32-bits.  */
 tcg_out_movi(s, TCG_TYPE_I32, ret, sval);
 
 /* Insert data into the high 32-bits.  */
 uval >>= 32;
-if (uval < 0x10000) {
-tcg_out_insn(s, RI, IIHL, ret, uval);
-} else if ((uval & 0xffff) == 0) {
-tcg_out_insn(s, RI, IIHH, ret, uval >> 16);
+if (facilities & FACILITY_EXT_IMM) {
+if (uval < 0x10000) {
+tcg_out_insn(s, RI, IIHL, ret, uval);
+} else if ((uval & 0xffff) == 0) {
+tcg_out_insn(s, RI, IIHH, ret, uval >> 16);
+} else {
+tcg_out_insn(s, RIL, IIHF, ret, uval);
+}
 } else {
-tcg_out_insn(s, RIL, IIHF, ret, uval);
+if (uval & 0xffff) {
+tcg_out_insn(s, RI, IIHL, ret, uval);
+}
+if (uval & 0xffff0000) {
+tcg_out_insn(s, RI, IIHH, ret, uval >> 16);
+}
 }
 }
 
-- 
1.7.0.1
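
The LLILL + IILH path above builds a 32-bit unsigned constant from two 16-bit pieces when the extended-immediate facility is absent. A C model (helper name illustrative):

```c
#include <assert.h>
#include <stdint.h>

/* Build a 32-bit constant in two steps: LLILL zero-extends the low
   16 bits, then IILH inserts bits 16..31. */
static uint64_t load_u32_by_halves(uint32_t uval)
{
    uint64_t reg = uval & 0xffff;        /* LLILL: zero-extend low 16 bits */
    reg |= (uint64_t)(uval >> 16) << 16; /* IILH: insert bits 16..31 */
    return reg;
}
```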




[Qemu-devel] [PATCH 50/62] tcg-s390: Conditionalize 8 and 16 bit extensions.

2010-05-27 Thread Richard Henderson
These instructions are part of the extended-immediate facility.

Signed-off-by: Richard Henderson 
---
 tcg/s390/tcg-target.c |  115 ++---
 1 files changed, 90 insertions(+), 25 deletions(-)

diff --git a/tcg/s390/tcg-target.c b/tcg/s390/tcg-target.c
index 491de07..8a7c9ae 100644
--- a/tcg/s390/tcg-target.c
+++ b/tcg/s390/tcg-target.c
@@ -156,11 +156,9 @@ typedef enum S390Opcode {
 RXY_LGF = 0xe314,
 RXY_LGH = 0xe315,
 RXY_LHY = 0xe378,
-RXY_LLC = 0xe394,
 RXY_LLGC= 0xe390,
 RXY_LLGF= 0xe316,
 RXY_LLGH= 0xe391,
-RXY_LLH = 0xe395,
 RXY_LMG = 0xeb04,
 RXY_LRV = 0xe31e,
 RXY_LRVG= 0xe30f,
@@ -653,24 +651,84 @@ static void tcg_out_ld_abs(TCGContext *s, TCGType type, 
TCGReg dest, void *abs)
 tcg_out_ld(s, type, dest, dest, addr & 0xffff);
 }
 
-static inline void tgen_ext8s(TCGContext *s, TCGReg dest, TCGReg src)
+static void tgen_ext8s(TCGContext *s, TCGType type, TCGReg dest, TCGReg src)
 {
-tcg_out_insn(s, RRE, LGBR, dest, src);
+if (facilities & FACILITY_EXT_IMM) {
+tcg_out_insn(s, RRE, LGBR, dest, src);
+return;
+}
+
+if (type == TCG_TYPE_I32) {
+if (dest == src) {
+tcg_out_sh32(s, RS_SLL, dest, SH32_REG_NONE, 24);
+} else {
+tcg_out_sh64(s, RSY_SLLG, dest, src, SH64_REG_NONE, 24);
+}
+tcg_out_sh32(s, RS_SRA, dest, SH32_REG_NONE, 24);
+} else {
+tcg_out_sh64(s, RSY_SLLG, dest, src, SH64_REG_NONE, 56);
+tcg_out_sh64(s, RSY_SRAG, dest, dest, SH64_REG_NONE, 56);
+}
 }
 
-static inline void tgen_ext8u(TCGContext *s, TCGReg dest, TCGReg src)
+static void tgen_ext8u(TCGContext *s, TCGType type, TCGReg dest, TCGReg src)
 {
-tcg_out_insn(s, RRE, LLGCR, dest, src);
+if (facilities & FACILITY_EXT_IMM) {
+tcg_out_insn(s, RRE, LLGCR, dest, src);
+return;
+}
+
+if (dest == src) {
+tcg_out_movi(s, type, TCG_TMP0, 0xff);
+src = TCG_TMP0;
+} else {
+tcg_out_movi(s, type, dest, 0xff);
+}
+if (type == TCG_TYPE_I32) {
+tcg_out_insn(s, RR, NR, dest, src);
+} else {
+tcg_out_insn(s, RRE, NGR, dest, src);
+}
 }
 
-static inline void tgen_ext16s(TCGContext *s, TCGReg dest, TCGReg src)
+static void tgen_ext16s(TCGContext *s, TCGType type, TCGReg dest, TCGReg src)
 {
-tcg_out_insn(s, RRE, LGHR, dest, src);
+if (facilities & FACILITY_EXT_IMM) {
+tcg_out_insn(s, RRE, LGHR, dest, src);
+return;
+}
+
+if (type == TCG_TYPE_I32) {
+if (dest == src) {
+tcg_out_sh32(s, RS_SLL, dest, SH32_REG_NONE, 16);
+} else {
+tcg_out_sh64(s, RSY_SLLG, dest, src, SH64_REG_NONE, 16);
+}
+tcg_out_sh32(s, RS_SRA, dest, SH32_REG_NONE, 24);
+} else {
+tcg_out_sh64(s, RSY_SLLG, dest, src, SH64_REG_NONE, 48);
+tcg_out_sh64(s, RSY_SRAG, dest, dest, SH64_REG_NONE, 48);
+}
 }
 
-static inline void tgen_ext16u(TCGContext *s, TCGReg dest, TCGReg src)
+static void tgen_ext16u(TCGContext *s, TCGType type, TCGReg dest, TCGReg src)
 {
-tcg_out_insn(s, RRE, LLGHR, dest, src);
+if (facilities & FACILITY_EXT_IMM) {
+tcg_out_insn(s, RRE, LLGHR, dest, src);
+return;
+}
+
+if (dest == src) {
+tcg_out_movi(s, type, TCG_TMP0, 0x);
+src = TCG_TMP0;
+} else {
+tcg_out_movi(s, type, dest, 0x);
+}
+if (type == TCG_TYPE_I32) {
+tcg_out_insn(s, RR, NR, dest, src);
+} else {
+tcg_out_insn(s, RRE, NGR, dest, src);
+}
 }
 
 static inline void tgen_ext32s(TCGContext *s, TCGReg dest, TCGReg src)
@@ -972,7 +1030,7 @@ static void tcg_out_qemu_ld_direct(TCGContext *s, int opc, 
TCGReg data,
 if (bswap) {
 /* swapped unsigned halfword load with upper bits zeroed */
 tcg_out_insn(s, RXY, LRVH, data, base, index, disp);
-tgen_ext16u(s, data, data);
+tgen_ext16u(s, TCG_TYPE_I64, data, data);
 } else {
 tcg_out_insn(s, RXY, LLGH, data, base, index, disp);
 }
@@ -981,7 +1039,7 @@ static void tcg_out_qemu_ld_direct(TCGContext *s, int opc, 
TCGReg data,
 if (bswap) {
 /* swapped sign-extended halfword load */
 tcg_out_insn(s, RXY, LRVH, data, base, index, disp);
-tgen_ext16s(s, data, data);
+tgen_ext16s(s, TCG_TYPE_I64, data, data);
 } else {
 tcg_out_insn(s, RXY, LGH, data, base, index, disp);
 }
@@ -1117,10 +1175,10 @@ static void tcg_prepare_qemu_ldst(TCGContext* s, TCGReg 
data_reg,
 /* sign extension */
 switch (opc) {
 case LD_INT8:
-tgen_ext8s(s, data_reg, arg0);
+tgen_ext8s(s, TCG_TYPE_I64, data_reg, arg0);
 break;
 case LD_INT16:
-tgen_ext16s(s, data_reg, arg0);
+tgen_ext16s(s, TCG_TYPE_I64,

[Qemu-devel] [PATCH 48/62] tcg-s390: Conditionalize ADD IMMEDIATE instructions.

2010-05-27 Thread Richard Henderson
The ADD IMMEDIATE instructions are in the extended-immediate facility.
Begin making that facility optional by using these only if present.
This requires rearranging the way constant constraints are handled,
so that we properly canonicalize constants for 32-bit operations.

Signed-off-by: Richard Henderson 
---
 tcg/s390/tcg-target.c |   64 +++-
 1 files changed, 46 insertions(+), 18 deletions(-)

diff --git a/tcg/s390/tcg-target.c b/tcg/s390/tcg-target.c
index aecabf9..b66778a 100644
--- a/tcg/s390/tcg-target.c
+++ b/tcg/s390/tcg-target.c
@@ -33,9 +33,10 @@
 do { } while (0)
 #endif
 
-#define TCG_CT_CONST_S32   0x100
-#define TCG_CT_CONST_N32   0x200
-#define TCG_CT_CONST_MULI  0x400
+#define TCG_CT_CONST_320x100
+#define TCG_CT_CONST_NEG   0x200
+#define TCG_CT_CONST_ADDI  0x400
+#define TCG_CT_CONST_MULI  0x800
 
 #define TCG_TMP0   TCG_REG_R14
 
@@ -57,6 +58,7 @@
 typedef enum S390Opcode {
 RIL_AFI = 0xc209,
 RIL_AGFI= 0xc208,
+RIL_ALGFI   = 0xc20a,
 RIL_BRASL   = 0xc005,
 RIL_BRCL= 0xc004,
 RIL_IIHF= 0xc008,
@@ -337,13 +339,17 @@ static int target_parse_constraint(TCGArgConstraint *ct, 
const char **pct_str)
 tcg_regset_clear(ct->u.regs);
 tcg_regset_set_reg(ct->u.regs, TCG_REG_R3);
 break;
-case 'I':
+case 'N':  /* force immediate negate */
+ct->ct &= ~TCG_CT_REG;
+ct->ct |= TCG_CT_CONST_NEG;
+break;
+case 'W':  /* force 32-bit ("word") immediate */
 ct->ct &= ~TCG_CT_REG;
-ct->ct |= TCG_CT_CONST_S32;
+ct->ct |= TCG_CT_CONST_32;
 break;
-case 'J':
+case 'I':
 ct->ct &= ~TCG_CT_REG;
-ct->ct |= TCG_CT_CONST_N32;
+ct->ct |= TCG_CT_CONST_ADDI;
 break;
 case 'K':
 ct->ct &= ~TCG_CT_REG;
@@ -366,10 +372,27 @@ static inline int tcg_target_const_match(tcg_target_long 
val,
 
 if (ct & TCG_CT_CONST) {
 return 1;
-} else if (ct & TCG_CT_CONST_S32) {
-return val == (int32_t)val;
-} else if (ct & TCG_CT_CONST_N32) {
-return -val == (int32_t)-val;
+}
+
+/* Handle the modifiers.  */
+if (ct & TCG_CT_CONST_NEG) {
+val = -val;
+}
+if (ct & TCG_CT_CONST_32) {
+val = (int32_t)val;
+}
+
+/* The following are mutually exclusive.  */
+if (ct & TCG_CT_CONST_ADDI) {
+/* Immediates that may be used with add.  If we have the
+   extended-immediates facility then we have ADD IMMEDIATE
+   with signed and unsigned 32-bit, otherwise we have only
+   ADD HALFWORD IMMEDIATE with a signed 16-bit.  */
+if (facilities & FACILITY_EXT_IMM) {
+return val == (int32_t)val || val == (uint32_t)val;
+} else {
+return val == (int16_t)val;
+}
 } else if (ct & TCG_CT_CONST_MULI) {
 /* Immediates that may be used with multiply.  If we have the
general-instruction-extensions, then we have MULTIPLY SINGLE
@@ -621,7 +644,7 @@ static inline void tgen_ext32u(TCGContext *s, TCGReg dest, 
TCGReg src)
 tcg_out_insn(s, RRE, LLGFR, dest, src);
 }
 
-static inline void tgen32_addi(TCGContext *s, TCGReg dest, tcg_target_long val)
+static void tgen32_addi(TCGContext *s, TCGReg dest, int32_t val)
 {
 if (val == (int16_t)val) {
 tcg_out_insn(s, RI, AHI, dest, val);
@@ -630,13 +653,18 @@ static inline void tgen32_addi(TCGContext *s, TCGReg 
dest, tcg_target_long val)
 }
 }
 
-static inline void tgen64_addi(TCGContext *s, TCGReg dest, tcg_target_long val)
+static void tgen64_addi(TCGContext *s, TCGReg dest, int64_t val)
 {
 if (val == (int16_t)val) {
 tcg_out_insn(s, RI, AGHI, dest, val);
-} else {
+} else if (val == (int32_t)val) {
 tcg_out_insn(s, RIL, AGFI, dest, val);
+} else if (val == (uint32_t)val) {
+tcg_out_insn(s, RIL, ALGFI, dest, val);
+} else {
+tcg_abort();
 }
+
 }
 
 static void tgen32_andi(TCGContext *s, TCGReg dest, uint32_t val)
@@ -1589,9 +1617,9 @@ static const TCGTargetOpDef s390_op_defs[] = {
 { INDEX_op_st16_i32, { "r", "r" } },
 { INDEX_op_st_i32, { "r", "r" } },
 
-{ INDEX_op_add_i32, { "r", "0", "ri" } },
-{ INDEX_op_sub_i32, { "r", "0", "ri" } },
-{ INDEX_op_mul_i32, { "r", "0", "rK" } },
+{ INDEX_op_add_i32, { "r", "0", "rWI" } },
+{ INDEX_op_sub_i32, { "r", "0", "rWNI" } },
+{ INDEX_op_mul_i32, { "r", "0", "rWK" } },
 
 { INDEX_op_div2_i32, { "b", "a", "0", "1", "r" } },
 { INDEX_op_divu2_i32, { "b", "a", "0", "1", "r" } },
@@ -1651,7 +1679,7 @@ static const TCGTargetOpDef s390_op_defs[] = {
 { INDEX_op_st_i64, { "r", "r" } },
 
 { INDEX_op_add_i64, { "r", "0", "rI" } },
-{ INDEX_op_sub_i64, { "r", "0", "rJ" } },
+{ INDEX_op_sub_i64, { "r", "0", "rNI" } },
 { INDEX_op_mul_i64, { "r", "0", "rK" } },
 
 { INDEX_op_div2_i64, { "b", "a", "0", 

[Qemu-devel] [PATCH 61/62] tcg-s390: Enable compile in 32-bit mode.

2010-05-27 Thread Richard Henderson
The TCG translator will *not* work in 32-bit mode, and there is a
check added to query_facilities to enforce that.

However, QEMU can run in KVM mode when built in 32-bit mode, and
this patch is just good enough to enable that method to continue.

Signed-off-by: Richard Henderson 
---
 configure |3 +-
 tcg/s390/tcg-target.c |  386 +
 tcg/s390/tcg-target.h |7 +
 3 files changed, 207 insertions(+), 189 deletions(-)

diff --git a/configure b/configure
index f818198..f565026 100755
--- a/configure
+++ b/configure
@@ -697,7 +697,8 @@ case "$cpu" in
fi
;;
 s390)
-   QEMU_CFLAGS="-march=z990 $QEMU_CFLAGS"
+   QEMU_CFLAGS="-m31 -march=z990 $QEMU_CFLAGS"
+   LDFLAGS="-m31 $LDFLAGS"
host_guest_base="yes"
;;
 s390x)
diff --git a/tcg/s390/tcg-target.c b/tcg/s390/tcg-target.c
index ec4c72a..cb1b013 100644
--- a/tcg/s390/tcg-target.c
+++ b/tcg/s390/tcg-target.c
@@ -727,7 +727,7 @@ static void tcg_out_movi(TCGContext *s, TCGType type,
 return;
 }
 if ((uval & 0x) == 0) {
-tcg_out_insn(s, RIL, LLIHF, ret, uval >> 32);
+tcg_out_insn(s, RIL, LLIHF, ret, uval >> 31 >> 1);
 return;
 }
 }
@@ -757,7 +757,7 @@ static void tcg_out_movi(TCGContext *s, TCGType type,
We first want to make sure that all the high bits get set.  With
luck the low 16-bits can be considered negative to perform that for
free, otherwise we load an explicit -1.  */
-if (sval >> 32 == -1) {
+if (sval >> 31 >> 1 == -1) {
 if (uval & 0x8000) {
 tcg_out_insn(s, RI, LGHI, ret, uval);
 } else {
@@ -775,7 +775,7 @@ static void tcg_out_movi(TCGContext *s, TCGType type,
 tcg_out_movi(s, TCG_TYPE_I32, ret, sval);
 
 /* Insert data into the high 32-bits.  */
-uval >>= 32;
+uval = uval >> 31 >> 1;
 if (facilities & FACILITY_EXT_IMM) {
 if (uval < 0x1) {
 tcg_out_insn(s, RI, IIHL, ret, uval);
@@ -958,7 +958,7 @@ static inline void tgen_ext32u(TCGContext *s, TCGReg dest, 
TCGReg src)
 tcg_out_insn(s, RRE, LLGFR, dest, src);
 }
 
-static void tgen32_addi(TCGContext *s, TCGReg dest, int32_t val)
+static inline void tgen32_addi(TCGContext *s, TCGReg dest, int32_t val)
 {
 if (val == (int16_t)val) {
 tcg_out_insn(s, RI, AHI, dest, val);
@@ -967,7 +967,7 @@ static void tgen32_addi(TCGContext *s, TCGReg dest, int32_t 
val)
 }
 }
 
-static void tgen64_addi(TCGContext *s, TCGReg dest, int64_t val)
+static inline void tgen64_addi(TCGContext *s, TCGReg dest, int64_t val)
 {
 if (val == (int16_t)val) {
 tcg_out_insn(s, RI, AGHI, dest, val);
@@ -1108,7 +1108,7 @@ static void tgen64_xori(TCGContext *s, TCGReg dest, 
tcg_target_ulong val)
 tcg_out_insn(s, RIL, XILF, dest, val);
 }
 if (val > 0x) {
-tcg_out_insn(s, RIL, XIHF, dest, val >> 32);
+tcg_out_insn(s, RIL, XIHF, dest, val >> 31 >> 1);
 }
 }
 
@@ -1589,6 +1589,15 @@ static void tcg_out_qemu_st(TCGContext* s, const TCGArg* 
args, int opc)
 #endif
 }
 
+#if TCG_TARGET_REG_BITS == 64
+# define OP_32_64(x) \
+case glue(glue(INDEX_op_,x),_i32): \
+case glue(glue(INDEX_op_,x),_i64)
+#else
+# define OP_32_64(x) \
+case glue(glue(INDEX_op_,x),_i32)
+#endif
+
 static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
 const TCGArg *args, const int *const_args)
 {
@@ -1621,21 +1630,18 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode 
opc,
 }
 break;
 
-case INDEX_op_ld8u_i32:
-case INDEX_op_ld8u_i64:
+OP_32_64(ld8u):
 /* ??? LLC (RXY format) is only present with the extended-immediate
facility, whereas LLGC is always present.  */
 tcg_out_mem(s, 0, RXY_LLGC, args[0], args[1], TCG_REG_NONE, args[2]);
 break;
 
-case INDEX_op_ld8s_i32:
-case INDEX_op_ld8s_i64:
+OP_32_64(ld8s):
 /* ??? LB is no smaller than LGB, so no point to using it.  */
 tcg_out_mem(s, 0, RXY_LGB, args[0], args[1], TCG_REG_NONE, args[2]);
 break;
 
-case INDEX_op_ld16u_i32:
-case INDEX_op_ld16u_i64:
+OP_32_64(ld16u):
 /* ??? LLH (RXY format) is only present with the extended-immediate
facility, whereas LLGH is always present.  */
 tcg_out_mem(s, 0, RXY_LLGH, args[0], args[1], TCG_REG_NONE, args[2]);
@@ -1644,45 +1650,25 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode 
opc,
 case INDEX_op_ld16s_i32:
 tcg_out_mem(s, RX_LH, RXY_LHY, args[0], args[1], TCG_REG_NONE, 
args[2]);
 break;
-case INDEX_op_ld16s_i64:
-tcg_out_mem(s, 0, RXY_LGH, args[0], args[1], TCG_REG_NONE, args[2]);
-break;
 
 case INDEX_op_ld_i32:
 tcg_out_ld(s, TCG_TYPE_I32, args[0], args[1], args[2]);
 break;
-cas

[Qemu-devel] Re: [PATCH v3 13/17] monitor: Allow to exclude commands from QMP

2010-05-27 Thread Jan Kiszka
Luiz Capitulino wrote:
> On Sun, 23 May 2010 12:59:26 +0200
> Jan Kiszka  wrote:
> 
>> From: Jan Kiszka 
>>
>> Ported commands that are marked 'user_only' will not be considered for
>> QMP monitor sessions. This allows implementing new commands that do not
>> (yet) provide a sufficiently stable interface for QMP use (e.g.
>> device_show).
> 
>  This is fine for me, but two things I've been wondering:
> 
>  1. Isn't a 'flags' struct member better? So that we can do (in the
> qemu-monitor.hx entry):
> 
> .flags = MONITOR_USER_ONLY | MONITOR_HANDLER_ASYNC,
> 
> I'm not suggesting this is an async handler, just exemplifying multiple
> flags.

Yes, can refactor this.

> 
>   2. Getting QMP handlers right the first time might be difficult, so
>  we could have a way to mark them unstable. Maybe a different namespace
>  which is only enabled at configure time with:
> 
>  --enable-qmp-unstable-commands
> 
>  If this were possible, we could have device_show and any command we
>  aren't sure is QMP-ready working in QMP this way.

Do you suggest this as an alternative to this patch? Or an extension
later on? I have no opinion on this yet, I would just like to know how
to proceed for this series.

Jan





[Qemu-devel] [PATCH 39/62] tcg-s390: Rearrange register allocation order.

2010-05-27 Thread Richard Henderson
Try to avoid conflicting with the outgoing function call arguments.

Signed-off-by: Richard Henderson 
---
 tcg/s390/tcg-target.c |   23 +--
 1 files changed, 13 insertions(+), 10 deletions(-)

diff --git a/tcg/s390/tcg-target.c b/tcg/s390/tcg-target.c
index a26c963..eb57e24 100644
--- a/tcg/s390/tcg-target.c
+++ b/tcg/s390/tcg-target.c
@@ -186,22 +186,25 @@ static const char * const 
tcg_target_reg_names[TCG_TARGET_NB_REGS] = {
 };
 #endif
 
+/* Since R6 is a potential argument register, choose it last of the
+   call-saved registers.  Likewise prefer the call-clobbered registers
+   in reverse order to maximize the chance of avoiding the arguments.  */
 static const int tcg_target_reg_alloc_order[] = {
-TCG_REG_R6,
-TCG_REG_R7,
-TCG_REG_R8,
-TCG_REG_R9,
-TCG_REG_R10,
-TCG_REG_R11,
-TCG_REG_R12,
 TCG_REG_R13,
+TCG_REG_R12,
+TCG_REG_R11,
+TCG_REG_R10,
+TCG_REG_R9,
+TCG_REG_R8,
+TCG_REG_R7,
+TCG_REG_R6,
 TCG_REG_R14,
 TCG_REG_R0,
 TCG_REG_R1,
-TCG_REG_R2,
-TCG_REG_R3,
-TCG_REG_R4,
 TCG_REG_R5,
+TCG_REG_R4,
+TCG_REG_R3,
+TCG_REG_R2,
 };
 
 static const int tcg_target_call_iarg_regs[] = {
-- 
1.7.0.1




[Qemu-devel] [PATCH 43/62] tcg-s390: Tidy tcg_prepare_qemu_ldst.

2010-05-27 Thread Richard Henderson
Make use of the reg+reg+disp addressing mode to eliminate
redundant additions.  Make use of the load-and-operate insns.
Avoid an extra register copy when using the 64-bit shift insns.

Signed-off-by: Richard Henderson 
---
 tcg/s390/tcg-target.c |   56 
 1 files changed, 19 insertions(+), 37 deletions(-)

diff --git a/tcg/s390/tcg-target.c b/tcg/s390/tcg-target.c
index 5d2efaa..000a646 100644
--- a/tcg/s390/tcg-target.c
+++ b/tcg/s390/tcg-target.c
@@ -135,6 +135,7 @@ typedef enum S390Opcode {
 RS_SRA  = 0x8a,
 RS_SRL  = 0x88,
 
+RXY_AG  = 0xe308,
 RXY_CG  = 0xe320,
 RXY_LB  = 0xe376,
 RXY_LG  = 0xe304,
@@ -962,24 +963,16 @@ static void tcg_prepare_qemu_ldst(TCGContext* s, TCGReg 
data_reg,
 {
 const TCGReg arg0 = TCG_REG_R2;
 const TCGReg arg1 = TCG_REG_R3;
-const TCGReg arg2 = TCG_REG_R4;
-int s_bits;
+int s_bits = opc & 3;
 uint16_t *label1_ptr;
+tcg_target_long ofs;
 
-if (is_store) {
-s_bits = opc;
+if (TARGET_LONG_BITS == 32) {
+tgen_ext32u(s, arg0, addr_reg);
 } else {
-s_bits = opc & 3;
+tcg_out_mov(s, arg0, addr_reg);
 }
 
-#if TARGET_LONG_BITS == 32
-tgen_ext32u(s, arg1, addr_reg);
-tgen_ext32u(s, arg0, addr_reg);
-#else
-tcg_out_mov(s, arg1, addr_reg);
-tcg_out_mov(s, arg0, addr_reg);
-#endif
-
 tcg_out_sh64(s, RSY_SRLG, arg1, addr_reg, SH64_REG_NONE,
  TARGET_PAGE_BITS - CPU_TLB_ENTRY_BITS);
 
@@ -987,17 +980,19 @@ static void tcg_prepare_qemu_ldst(TCGContext* s, TCGReg 
data_reg,
 tgen64_andi(s, arg1, (CPU_TLB_SIZE - 1) << CPU_TLB_ENTRY_BITS);
 
 if (is_store) {
-tcg_out_movi(s, TCG_TYPE_PTR, TCG_TMP0,
- offsetof(CPUState, tlb_table[mem_index][0].addr_write));
+ofs = offsetof(CPUState, tlb_table[mem_index][0].addr_write);
 } else {
-tcg_out_movi(s, TCG_TYPE_PTR, TCG_TMP0,
- offsetof(CPUState, tlb_table[mem_index][0].addr_read));
+ofs = offsetof(CPUState, tlb_table[mem_index][0].addr_read);
 }
-tcg_out_insn(s, RRE, AGR, arg1, TCG_TMP0);
+assert(ofs < 0x8);
 
-tcg_out_insn(s, RRE, AGR, arg1, TCG_AREG0);
+tcg_out_insn(s, RXY, CG, arg0, arg1, TCG_AREG0, ofs);
 
-tcg_out_insn(s, RXY, CG, arg0, arg1, 0, 0);
+if (TARGET_LONG_BITS == 32) {
+tgen_ext32u(s, arg0, addr_reg);
+} else {
+tcg_out_mov(s, arg0, addr_reg);
+}
 
 label1_ptr = (uint16_t*)s->code_ptr;
 
@@ -1005,15 +1000,9 @@ static void tcg_prepare_qemu_ldst(TCGContext* s, TCGReg 
data_reg,
 tcg_out_insn(s, RI, BRC, S390_CC_EQ, 0);
 
 /* call load/store helper */
-#if TARGET_LONG_BITS == 32
-tgen_ext32u(s, arg0, addr_reg);
-#else
-tcg_out_mov(s, arg0, addr_reg);
-#endif
-
 if (is_store) {
 tcg_out_mov(s, arg1, data_reg);
-tcg_out_movi(s, TCG_TYPE_I32, arg2, mem_index);
+tcg_out_movi(s, TCG_TYPE_I32, TCG_REG_R4, mem_index);
 tgen_calli(s, (tcg_target_ulong)qemu_st_helpers[s_bits]);
 } else {
 tcg_out_movi(s, TCG_TYPE_I32, arg1, mem_index);
@@ -1046,17 +1035,10 @@ static void tcg_prepare_qemu_ldst(TCGContext* s, TCGReg 
data_reg,
 *(label1_ptr + 1) = ((unsigned long)s->code_ptr -
  (unsigned long)label1_ptr) >> 1;
 
-if (is_store) {
-tcg_out_insn(s, RXY, LG, arg1, arg1, 0,
- offsetof(CPUTLBEntry, addend)
- - offsetof(CPUTLBEntry, addr_write));
-} else {
-tcg_out_insn(s, RXY, LG, arg1, arg1, 0,
- offsetof(CPUTLBEntry, addend)
- - offsetof(CPUTLBEntry, addr_read));
-}
+ofs = offsetof(CPUState, tlb_table[mem_index][0].addend);
+assert(ofs < 0x8);
 
-tcg_out_insn(s, RRE, AGR, arg0, arg1);
+tcg_out_insn(s, RXY, AG, arg0, arg1, TCG_AREG0, ofs);
 }
 
 static void tcg_finish_qemu_ldst(TCGContext* s, uint16_t *label2_ptr)
-- 
1.7.0.1




[Qemu-devel] [PATCH 54/62] tcg-s390: Do not require the extended-immediate facility.

2010-05-27 Thread Richard Henderson
All of the instructions from this group are now conditionalized.

Signed-off-by: Richard Henderson 
---
 configure |2 +-
 tcg/s390/tcg-target.c |4 
 2 files changed, 1 insertions(+), 5 deletions(-)

diff --git a/configure b/configure
index 56dee88..f818198 100755
--- a/configure
+++ b/configure
@@ -701,7 +701,7 @@ case "$cpu" in
host_guest_base="yes"
;;
 s390x)
-   QEMU_CFLAGS="-m64 -march=z9-109 $QEMU_CFLAGS"
+   QEMU_CFLAGS="-m64 -march=z990 $QEMU_CFLAGS"
LDFLAGS="-m64 $LDFLAGS"
host_guest_base="yes"
;;
diff --git a/tcg/s390/tcg-target.c b/tcg/s390/tcg-target.c
index 084448a..0dc71e2 100644
--- a/tcg/s390/tcg-target.c
+++ b/tcg/s390/tcg-target.c
@@ -2010,10 +2010,6 @@ static void query_facilities(void)
 fprintf(stderr, "TCG: long-displacement facility is required\n");
 fail = 1;
 }
-if ((facilities & FACILITY_EXT_IMM) == 0) {
-fprintf(stderr, "TCG: extended-immediate facility is required\n");
-fail = 1;
-}
 if (fail) {
 exit(-1);
 }
-- 
1.7.0.1




[Qemu-devel] [PATCH 45/62] tcg-s390: Implement GUEST_BASE.

2010-05-27 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 configure |2 ++
 tcg/s390/tcg-target.c |   30 +-
 tcg/s390/tcg-target.h |2 ++
 3 files changed, 29 insertions(+), 5 deletions(-)

diff --git a/configure b/configure
index 72d3df8..56dee88 100755
--- a/configure
+++ b/configure
@@ -698,10 +698,12 @@ case "$cpu" in
;;
 s390)
QEMU_CFLAGS="-march=z990 $QEMU_CFLAGS"
+   host_guest_base="yes"
;;
 s390x)
QEMU_CFLAGS="-m64 -march=z9-109 $QEMU_CFLAGS"
LDFLAGS="-m64 $LDFLAGS"
+   host_guest_base="yes"
;;
 i386)
QEMU_CFLAGS="-m32 $QEMU_CFLAGS"
diff --git a/tcg/s390/tcg-target.c b/tcg/s390/tcg-target.c
index fa089ab..4a3235c 100644
--- a/tcg/s390/tcg-target.c
+++ b/tcg/s390/tcg-target.c
@@ -33,10 +33,20 @@
 do { } while (0)
 #endif
 
-#define TCG_CT_CONST_S320x100
-#define TCG_CT_CONST_N320x200
+#define TCG_CT_CONST_S32   0x100
+#define TCG_CT_CONST_N32   0x200
 
-#define TCG_TMP0TCG_REG_R14
+#define TCG_TMP0   TCG_REG_R14
+
+#ifdef CONFIG_USE_GUEST_BASE
+#define TCG_GUEST_BASE_REG TCG_REG_R13
+#else
+#define TCG_GUEST_BASE_REG TCG_REG_R0
+#endif
+
+#ifndef GUEST_BASE
+#define GUEST_BASE 0
+#endif
 
 
 /* All of the following instructions are prefixed with their instruction
@@ -1051,12 +1061,17 @@ static void tcg_finish_qemu_ldst(TCGContext* s, 
uint16_t *label2_ptr)
 static void tcg_prepare_user_ldst(TCGContext *s, TCGReg *addr_reg,
   TCGReg *index_reg, tcg_target_long *disp)
 {
-*index_reg = 0;
-*disp = 0;
 if (TARGET_LONG_BITS == 32) {
 tgen_ext32u(s, TCG_TMP0, *addr_reg);
 *addr_reg = TCG_TMP0;
 }
+if (GUEST_BASE < 0x8) {
+*index_reg = 0;
+*disp = GUEST_BASE;
+} else {
+*index_reg = TCG_GUEST_BASE_REG;
+*disp = 0;
+}
 }
 #endif /* CONFIG_SOFTMMU */
 
@@ -1682,6 +1697,11 @@ void tcg_target_qemu_prologue(TCGContext *s)
 /* aghi %r15,-160 (stack frame) */
 tcg_out_insn(s, RI, AGHI, TCG_REG_R15, -160);
 
+if (GUEST_BASE >= 0x8) {
+tcg_out_movi(s, TCG_TYPE_PTR, TCG_GUEST_BASE_REG, GUEST_BASE);
+tcg_regset_set_reg(s->reserved_regs, TCG_GUEST_BASE_REG);
+}
+
 /* br %r2 (go to TB) */
 tcg_out_insn(s, RR, BCR, S390_CC_ALWAYS, TCG_REG_R2);
 
diff --git a/tcg/s390/tcg-target.h b/tcg/s390/tcg-target.h
index fae8ed7..940f530 100644
--- a/tcg/s390/tcg-target.h
+++ b/tcg/s390/tcg-target.h
@@ -83,6 +83,8 @@ typedef enum TCGReg {
 // #define TCG_TARGET_HAS_nand_i64
 // #define TCG_TARGET_HAS_nor_i64
 
+#define TCG_TARGET_HAS_GUEST_BASE
+
 /* used for function call generation */
 #define TCG_REG_CALL_STACK TCG_REG_R15
 #define TCG_TARGET_STACK_ALIGN 8
-- 
1.7.0.1




[Qemu-devel] [PATCH 62/62] tcg: Optionally sign-extend 32-bit arguments for 64-bit host.

2010-05-27 Thread Richard Henderson
Some hosts (amd64, ia64) have an ABI that ignores the high bits
of the 64-bit register when passing 32-bit arguments.  Others,
like s390x, require the value to be properly sign-extended for
the type.  I.e. "int32_t" must be sign-extended and "uint32_t"
must be zero-extended to 64-bits.

To effect this, extend the "sizemask" parameter to tcg_gen_callN
to include the signedness of the type of each parameter.  If the
tcg target requires it, extend each 32-bit argument into a 64-bit
temp and pass that to the function call.

Signed-off-by: Richard Henderson 
---
 def-helper.h |   38 +-
 target-i386/ops_sse_header.h |3 +++
 target-ppc/helper.h  |1 +
 tcg/s390/tcg-target.h|2 ++
 tcg/tcg-op.h |   34 +-
 tcg/tcg.c|   41 +++--
 6 files changed, 87 insertions(+), 32 deletions(-)

diff --git a/def-helper.h b/def-helper.h
index 8a88c5b..8a822c7 100644
--- a/def-helper.h
+++ b/def-helper.h
@@ -81,9 +81,29 @@
 #define dh_is_64bit_ptr (TCG_TARGET_REG_BITS == 64)
 #define dh_is_64bit(t) glue(dh_is_64bit_, dh_alias(t))
 
+#define dh_is_signed_void 0
+#define dh_is_signed_i32 0
+#define dh_is_signed_s32 1
+#define dh_is_signed_i64 0
+#define dh_is_signed_s64 1
+#define dh_is_signed_f32 0
+#define dh_is_signed_f64 0
+#define dh_is_signed_tl  0
+#define dh_is_signed_int 1
+/* ??? This is highly specific to the host cpu.  There are even special
+   extension instructions that may be required, e.g. ia64's addp4.  But
+   for now we don't support any 64-bit targets with 32-bit pointers.  */
+#define dh_is_signed_ptr 0
+#define dh_is_signed_env dh_is_signed_ptr
+#define dh_is_signed(t) dh_is_signed_##t
+
+#define dh_sizemask(t, n) \
+  sizemask |= dh_is_64bit(t) << (n*2); \
+  sizemask |= dh_is_signed(t) << (n*2+1)
+
 #define dh_arg(t, n) \
   args[n - 1] = glue(GET_TCGV_, dh_alias(t))(glue(arg, n)); \
-  sizemask |= dh_is_64bit(t) << n
+  dh_sizemask(t, n)
 
 #define dh_arg_decl(t, n) glue(TCGv_, dh_alias(t)) glue(arg, n)
 
@@ -138,8 +158,8 @@ static inline void glue(gen_helper_, 
name)(dh_retvar_decl0(ret)) \
 static inline void glue(gen_helper_, name)(dh_retvar_decl(ret) dh_arg_decl(t1, 
1)) \
 { \
   TCGArg args[1]; \
-  int sizemask; \
-  sizemask = dh_is_64bit(ret); \
+  int sizemask = 0; \
+  dh_sizemask(ret, 0); \
   dh_arg(t1, 1); \
   tcg_gen_helperN(HELPER(name), flags, sizemask, dh_retvar(ret), 1, args); \
 }
@@ -149,8 +169,8 @@ static inline void glue(gen_helper_, 
name)(dh_retvar_decl(ret) dh_arg_decl(t1, 1
 dh_arg_decl(t2, 2)) \
 { \
   TCGArg args[2]; \
-  int sizemask; \
-  sizemask = dh_is_64bit(ret); \
+  int sizemask = 0; \
+  dh_sizemask(ret, 0); \
   dh_arg(t1, 1); \
   dh_arg(t2, 2); \
   tcg_gen_helperN(HELPER(name), flags, sizemask, dh_retvar(ret), 2, args); \
@@ -161,8 +181,8 @@ static inline void glue(gen_helper_, 
name)(dh_retvar_decl(ret) dh_arg_decl(t1, 1
 dh_arg_decl(t2, 2), dh_arg_decl(t3, 3)) \
 { \
   TCGArg args[3]; \
-  int sizemask; \
-  sizemask = dh_is_64bit(ret); \
+  int sizemask = 0; \
+  dh_sizemask(ret, 0); \
   dh_arg(t1, 1); \
   dh_arg(t2, 2); \
   dh_arg(t3, 3); \
@@ -174,8 +194,8 @@ static inline void glue(gen_helper_, 
name)(dh_retvar_decl(ret) dh_arg_decl(t1, 1
 dh_arg_decl(t2, 2), dh_arg_decl(t3, 3), dh_arg_decl(t4, 4)) \
 { \
   TCGArg args[4]; \
-  int sizemask; \
-  sizemask = dh_is_64bit(ret); \
+  int sizemask = 0; \
+  dh_sizemask(ret, 0); \
   dh_arg(t1, 1); \
   dh_arg(t2, 2); \
   dh_arg(t3, 3); \
diff --git a/target-i386/ops_sse_header.h b/target-i386/ops_sse_header.h
index a0a6361..8d4b2b7 100644
--- a/target-i386/ops_sse_header.h
+++ b/target-i386/ops_sse_header.h
@@ -30,6 +30,9 @@
 #define dh_ctype_Reg Reg *
 #define dh_ctype_XMMReg XMMReg *
 #define dh_ctype_MMXReg MMXReg *
+#define dh_is_signed_Reg dh_is_signed_ptr
+#define dh_is_signed_XMMReg dh_is_signed_ptr
+#define dh_is_signed_MMXReg dh_is_signed_ptr
 
 DEF_HELPER_2(glue(psrlw, SUFFIX), void, Reg, Reg)
 DEF_HELPER_2(glue(psraw, SUFFIX), void, Reg, Reg)
diff --git a/target-ppc/helper.h b/target-ppc/helper.h
index 5cf6cd4..c025a2f 100644
--- a/target-ppc/helper.h
+++ b/target-ppc/helper.h
@@ -95,6 +95,7 @@ DEF_HELPER_3(fsel, i64, i64, i64, i64)
 
 #define dh_alias_avr ptr
 #define dh_ctype_avr ppc_avr_t *
+#define dh_is_signed_avr dh_is_signed_ptr
 
 DEF_HELPER_3(vaddubm, void, avr, avr, avr)
 DEF_HELPER_3(vadduhm, void, avr, avr, avr)
diff --git a/tcg/s390/tcg-target.h b/tcg/s390/tcg-target.h
index 451f1f5..4e45cf3 100644
--- a/tcg/s390/tcg-target.h
+++ b/tcg/s390/tcg-target.h
@@ -97,6 +97,8 @@ typedef enum TCGReg {
 #define TCG_TARGET_STACK_ALIGN 8
 #define TCG_TARGET_CALL_STACK_OFFSET   0
 
+#define TCG_TARGET_EXTEND_ARGS 1
+
 enum {
 /* Note: must be synced with dyngen-exec.h */
 TCG_AREG0 = TCG_REG_R10,
diff --git a/tcg/tcg-op.h b/tcg/tcg-op.h
index bafac2b..fbafa89 100644
--- a/tcg/tcg-op.h
+++ b/t

[Qemu-devel] cg14

2010-05-27 Thread Artyom Tarasenko
2010/5/27 Bob Breuer :
> Artyom Tarasenko wrote:
>> Was going to put some more empty slots into SS-10/20 (VSIMMs, SX)
>> after we are done with SS-5 (due to technical limitations I can switch
>> access from one real SS model to another one once a few days only).
>>
> I have a partial implementation of the SS-20 VSIMM (cg14) that I've been
> working on.  With the Sun firmware, I have working text console, color
> boot logo, and programmable video resolutions up to 1600x1280.

Great news! This would allow qemu to boot NeXTStep! Are you planning
to submit the patches any time soon?


-- 
Regards,
Artyom Tarasenko

solaris/sparc under qemu blog: http://tyom.blogspot.com/



[Qemu-devel] [PATCH 44/62] tcg-s390: Tidy user qemu_ld/st.

2010-05-27 Thread Richard Henderson
Create a tcg_prepare_user_ldst helper to prepare the host address
used to implement the guest memory operation.

Signed-off-by: Richard Henderson 
---
 tcg/s390/tcg-target.c |   33 +
 1 files changed, 21 insertions(+), 12 deletions(-)

diff --git a/tcg/s390/tcg-target.c b/tcg/s390/tcg-target.c
index 000a646..fa089ab 100644
--- a/tcg/s390/tcg-target.c
+++ b/tcg/s390/tcg-target.c
@@ -1047,6 +1047,17 @@ static void tcg_finish_qemu_ldst(TCGContext* s, uint16_t 
*label2_ptr)
 *(label2_ptr + 1) = ((unsigned long)s->code_ptr -
  (unsigned long)label2_ptr) >> 1;
 }
+#else
+static void tcg_prepare_user_ldst(TCGContext *s, TCGReg *addr_reg,
+  TCGReg *index_reg, tcg_target_long *disp)
+{
+*index_reg = 0;
+*disp = 0;
+if (TARGET_LONG_BITS == 32) {
+tgen_ext32u(s, TCG_TMP0, *addr_reg);
+*addr_reg = TCG_TMP0;
+}
+}
 #endif /* CONFIG_SOFTMMU */
 
 /* load data with address translation (if applicable)
@@ -1057,6 +1068,9 @@ static void tcg_out_qemu_ld(TCGContext* s, const TCGArg* 
args, int opc)
 #if defined(CONFIG_SOFTMMU)
 int mem_index;
 uint16_t *label2_ptr;
+#else
+TCGReg index_reg;
+tcg_target_long disp;
 #endif
 
 data_reg = *args++;
@@ -1072,12 +1086,8 @@ static void tcg_out_qemu_ld(TCGContext* s, const TCGArg* 
args, int opc)
 
 tcg_finish_qemu_ldst(s, label2_ptr);
 #else
-if (TARGET_LONG_BITS == 32) {
-tgen_ext32u(s, TCG_TMP0, addr_reg);
-tcg_out_qemu_ld_direct(s, opc, data_reg, TCG_TMP0, 0, 0);
-} else {
-tcg_out_qemu_ld_direct(s, opc, data_reg, addr_reg, 0, 0);
-}
+tcg_prepare_user_ldst(s, &addr_reg, &index_reg, &disp);
+tcg_out_qemu_ld_direct(s, opc, data_reg, addr_reg, index_reg, disp);
 #endif
 }
 
@@ -1087,6 +1097,9 @@ static void tcg_out_qemu_st(TCGContext* s, const TCGArg* 
args, int opc)
 #if defined(CONFIG_SOFTMMU)
 int mem_index;
 uint16_t *label2_ptr;
+#else
+TCGReg index_reg;
+tcg_target_long disp;
 #endif
 
 data_reg = *args++;
@@ -1102,12 +1115,8 @@ static void tcg_out_qemu_st(TCGContext* s, const TCGArg* 
args, int opc)
 
 tcg_finish_qemu_ldst(s, label2_ptr);
 #else
-if (TARGET_LONG_BITS == 32) {
-tgen_ext32u(s, TCG_TMP0, addr_reg);
-tcg_out_qemu_st_direct(s, opc, data_reg, TCG_TMP0, 0, 0);
-} else {
-tcg_out_qemu_st_direct(s, opc, data_reg, addr_reg, 0, 0);
-}
+tcg_prepare_user_ldst(s, &addr_reg, &index_reg, &disp);
+tcg_out_qemu_st_direct(s, opc, data_reg, addr_reg, index_reg, disp);
 #endif
 }
 
-- 
1.7.0.1




[Qemu-devel] [PATCH 40/62] tcg-s390: Tidy goto_tb.

2010-05-27 Thread Richard Henderson
Invent tcg_out_ld_abs, using LOAD RELATIVE instructions, and use it.

Signed-off-by: Richard Henderson 
---
 tcg/s390/tcg-target.c |   34 ++
 1 files changed, 22 insertions(+), 12 deletions(-)

diff --git a/tcg/s390/tcg-target.c b/tcg/s390/tcg-target.c
index eb57e24..627f7b7 100644
--- a/tcg/s390/tcg-target.c
+++ b/tcg/s390/tcg-target.c
@@ -48,12 +48,14 @@ typedef enum S390Opcode {
 RIL_AGFI= 0xc208,
 RIL_BRASL   = 0xc005,
 RIL_BRCL= 0xc004,
-RIL_LARL= 0xc000,
 RIL_IIHF= 0xc008,
 RIL_IILF= 0xc009,
+RIL_LARL= 0xc000,
 RIL_LGFI= 0xc001,
+RIL_LGRL= 0xc408,
 RIL_LLIHF   = 0xc00e,
 RIL_LLILF   = 0xc00f,
+RIL_LRL = 0xc40d,
 RIL_MSFI= 0xc201,
 RIL_MSGFI   = 0xc200,
 RIL_NIHF= 0xc00a,
@@ -531,6 +533,24 @@ static inline void tcg_out_st(TCGContext *s, TCGType type, 
TCGReg data,
 }
 }
 
+/* load data from an absolute host address */
+static void tcg_out_ld_abs(TCGContext *s, TCGType type, TCGReg dest, void *abs)
+{
+tcg_target_long addr = (tcg_target_long)abs;
+tcg_target_long disp = (addr - (tcg_target_long)s->code_ptr) >> 1;
+
+if (disp == (int32_t)disp) {
+if (type == TCG_TYPE_I32) {
+tcg_out_insn(s, RIL, LRL, dest, disp);
+} else {
+tcg_out_insn(s, RIL, LGRL, dest, disp);
+}
+} else {
+tcg_out_movi(s, TCG_TYPE_PTR, dest, addr & ~0x);
+tcg_out_ld(s, type, dest, dest, addr & 0x);
+}
+}
+
 static inline void tgen_ext8s(TCGContext *s, TCGReg dest, TCGReg src)
 {
 tcg_out_insn(s, RRE, LGBR, dest, src);
@@ -1105,18 +1125,8 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode 
opc,
 if (s->tb_jmp_offset) {
 tcg_abort();
 } else {
-tcg_target_long off = ((tcg_target_long)(s->tb_next + args[0]) -
-   (tcg_target_long)s->code_ptr) >> 1;
-if (off == (int32_t)off) {
-/* load address relative to PC */
-tcg_out_insn(s, RIL, LARL, TCG_TMP0, off);
-} else {
-/* too far for larl */
-tcg_out_movi(s, TCG_TYPE_PTR, TCG_TMP0,
- (tcg_target_long)(s->tb_next + args[0]));
-}
 /* load address stored at s->tb_next + args[0] */
-tcg_out_ld(s, TCG_TYPE_PTR, TCG_TMP0, TCG_TMP0, 0);
+tcg_out_ld_abs(s, TCG_TYPE_PTR, TCG_TMP0, s->tb_next + args[0]);
 /* and go there */
 tcg_out_insn(s, RR, BCR, S390_CC_ALWAYS, TCG_TMP0);
 }
-- 
1.7.0.1




[Qemu-devel] [PATCH 58/62] tcg-s390: Use COMPARE AND BRANCH instructions.

2010-05-27 Thread Richard Henderson
These instructions are available with the general-instructions-extension
facility.  Use them if available.

Signed-off-by: Richard Henderson 
---
 tcg/s390/tcg-target.c |  102 +---
 1 files changed, 95 insertions(+), 7 deletions(-)

diff --git a/tcg/s390/tcg-target.c b/tcg/s390/tcg-target.c
index 5af8bc9..4e3fb8b 100644
--- a/tcg/s390/tcg-target.c
+++ b/tcg/s390/tcg-target.c
@@ -114,6 +114,15 @@ typedef enum S390Opcode {
 RI_OILH = 0xa50a,
 RI_OILL = 0xa50b,
 
+RIE_CGIJ= 0xec7c,
+RIE_CGRJ= 0xec64,
+RIE_CIJ = 0xec7e,
+RIE_CLGRJ   = 0xec65,
+RIE_CLIJ= 0xec7f,
+RIE_CLGIJ   = 0xec7d,
+RIE_CLRJ= 0xec77,
+RIE_CRJ = 0xec76,
+
 RRE_AGR = 0xb908,
 RRE_CGR = 0xb920,
 RRE_CLGR= 0xb921,
@@ -1100,6 +1109,7 @@ static void tgen64_xori(TCGContext *s, TCGReg dest, 
tcg_target_ulong val)
 static int tgen_cmp(TCGContext *s, TCGType type, TCGCond c, TCGReg r1,
 TCGArg c2, int c2const)
 {
+_Bool is_unsigned = (c > TCG_COND_GT);
 if (c2const) {
 if (c2 == 0) {
 if (type == TCG_TYPE_I32) {
@@ -1109,15 +1119,13 @@ static int tgen_cmp(TCGContext *s, TCGType type, 
TCGCond c, TCGReg r1,
 }
 return tcg_cond_to_ltr_cond[c];
 } else {
-if (c > TCG_COND_GT) {
-/* unsigned */
+if (is_unsigned) {
 if (type == TCG_TYPE_I32) {
 tcg_out_insn(s, RIL, CLFI, r1, c2);
 } else {
 tcg_out_insn(s, RIL, CLGFI, r1, c2);
 }
 } else {
-/* signed */
 if (type == TCG_TYPE_I32) {
 tcg_out_insn(s, RIL, CFI, r1, c2);
 } else {
@@ -1126,15 +1134,13 @@ static int tgen_cmp(TCGContext *s, TCGType type, 
TCGCond c, TCGReg r1,
 }
 }
 } else {
-if (c > TCG_COND_GT) {
-/* unsigned */
+if (is_unsigned) {
 if (type == TCG_TYPE_I32) {
 tcg_out_insn(s, RR, CLR, r1, c2);
 } else {
 tcg_out_insn(s, RRE, CLGR, r1, c2);
 }
 } else {
-/* signed */
 if (type == TCG_TYPE_I32) {
 tcg_out_insn(s, RR, CR, r1, c2);
 } else {
@@ -1185,10 +1191,92 @@ static void tgen_branch(TCGContext *s, int cc, int 
labelno)
 }
 }
 
+static void tgen_compare_branch(TCGContext *s, S390Opcode opc, int cc,
+TCGReg r1, TCGReg r2, int labelno)
+{
+TCGLabel* l = &s->labels[labelno];
+tcg_target_long off;
+
+if (l->has_value) {
+off = (l->u.value - (tcg_target_long)s->code_ptr) >> 1;
+} else {
+/* We need to keep the offset unchanged for retranslation.  */
+off = ((int16_t *)s->code_ptr)[1];
+tcg_out_reloc(s, s->code_ptr + 2, R_390_PC16DBL, labelno, -2);
+}
+
+tcg_out16(s, (opc & 0xff00) | (r1 << 4) | r2);
+tcg_out16(s, off);
+tcg_out16(s, cc << 12 | (opc & 0xff));
+}
+
+static void tgen_compare_imm_branch(TCGContext *s, S390Opcode opc, int cc,
+TCGReg r1, int i2, int labelno)
+{
+TCGLabel* l = &s->labels[labelno];
+tcg_target_long off;
+
+if (l->has_value) {
+off = (l->u.value - (tcg_target_long)s->code_ptr) >> 1;
+} else {
+/* We need to keep the offset unchanged for retranslation.  */
+off = ((int16_t *)s->code_ptr)[1];
+tcg_out_reloc(s, s->code_ptr + 2, R_390_PC16DBL, labelno, -2);
+}
+
+tcg_out16(s, (opc & 0xff00) | (r1 << 4) | cc);
+tcg_out16(s, off);
+tcg_out16(s, (i2 << 8) | (opc & 0xff));
+}
+
 static void tgen_brcond(TCGContext *s, TCGType type, TCGCond c,
 TCGReg r1, TCGArg c2, int c2const, int labelno)
 {
-int cc = tgen_cmp(s, type, c, r1, c2, c2const);
+int cc;
+
+if (facilities & FACILITY_GEN_INST_EXT) {
+_Bool is_unsigned = (c > TCG_COND_GT);
+_Bool in_range;
+S390Opcode opc;
+
+cc = tcg_cond_to_s390_cond[c];
+
+if (!c2const) {
+opc = (type == TCG_TYPE_I32
+   ? (is_unsigned ? RIE_CLRJ : RIE_CRJ)
+   : (is_unsigned ? RIE_CLGRJ : RIE_CGRJ));
+tgen_compare_branch(s, opc, cc, r1, c2, labelno);
+return;
+}
+
+/* COMPARE IMMEDIATE AND BRANCH RELATIVE has an 8-bit immediate field.
+   If the immediate we've been given does not fit that range, we'll
+   fall back to separate compare and branch instructions using the
+   larger comparison range afforded by COMPARE IMMEDIATE.  */
+if (type == TCG_TYPE_I32) {
+if (is_unsigned) {
+opc = RIE_CLIJ;
+in_range = (uint32_t)c2 == (uint8_t)c2;
+} else {
+opc = RIE_CIJ;
+in_range = (int32_

[Qemu-devel] [PATCH 42/62] tcg-s390: Rearrange qemu_ld/st to avoid register copy.

2010-05-27 Thread Richard Henderson
Split out qemu_ld/st_direct with full address components.
Avoid copy from addr_reg to R2 for 64-bit guests.

Signed-off-by: Richard Henderson 
---
 tcg/s390/tcg-target.c |  270 ++---
 1 files changed, 145 insertions(+), 125 deletions(-)

diff --git a/tcg/s390/tcg-target.c b/tcg/s390/tcg-target.c
index 627f7b7..5d2efaa 100644
--- a/tcg/s390/tcg-target.c
+++ b/tcg/s390/tcg-target.c
@@ -846,14 +846,123 @@ static void tgen_calli(TCGContext *s, tcg_target_long 
dest)
 }
 }
 
+static void tcg_out_qemu_ld_direct(TCGContext *s, int opc, TCGReg data,
+   TCGReg base, TCGReg index, int disp)
+{
+#ifdef TARGET_WORDS_BIGENDIAN
+const int bswap = 0;
+#else
+const int bswap = 1;
+#endif
+switch (opc) {
+case LD_UINT8:
+tcg_out_insn(s, RXY, LLGC, data, base, index, disp);
+break;
+case LD_INT8:
+tcg_out_insn(s, RXY, LGB, data, base, index, disp);
+break;
+case LD_UINT16:
+if (bswap) {
+/* swapped unsigned halfword load with upper bits zeroed */
+tcg_out_insn(s, RXY, LRVH, data, base, index, disp);
+tgen_ext16u(s, data, data);
+} else {
+tcg_out_insn(s, RXY, LLGH, data, base, index, disp);
+}
+break;
+case LD_INT16:
+if (bswap) {
+/* swapped sign-extended halfword load */
+tcg_out_insn(s, RXY, LRVH, data, base, index, disp);
+tgen_ext16s(s, data, data);
+} else {
+tcg_out_insn(s, RXY, LGH, data, base, index, disp);
+}
+break;
+case LD_UINT32:
+if (bswap) {
+/* swapped unsigned int load with upper bits zeroed */
+tcg_out_insn(s, RXY, LRV, data, base, index, disp);
+tgen_ext32u(s, data, data);
+} else {
+tcg_out_insn(s, RXY, LLGF, data, base, index, disp);
+}
+break;
+case LD_INT32:
+if (bswap) {
+/* swapped sign-extended int load */
+tcg_out_insn(s, RXY, LRV, data, base, index, disp);
+tgen_ext32s(s, data, data);
+} else {
+tcg_out_insn(s, RXY, LGF, data, base, index, disp);
+}
+break;
+case LD_UINT64:
+if (bswap) {
+tcg_out_insn(s, RXY, LRVG, data, base, index, disp);
+} else {
+tcg_out_insn(s, RXY, LG, data, base, index, disp);
+}
+break;
+default:
+tcg_abort();
+}
+}
+
+static void tcg_out_qemu_st_direct(TCGContext *s, int opc, TCGReg data,
+   TCGReg base, TCGReg index, int disp)
+{
+#ifdef TARGET_WORDS_BIGENDIAN
+const int bswap = 0;
+#else
+const int bswap = 1;
+#endif
+switch (opc) {
+case LD_UINT8:
+if (disp >= 0 && disp < 0x1000) {
+tcg_out_insn(s, RX, STC, data, base, index, disp);
+} else {
+tcg_out_insn(s, RXY, STCY, data, base, index, disp);
+}
+break;
+case LD_UINT16:
+if (bswap) {
+tcg_out_insn(s, RXY, STRVH, data, base, index, disp);
+} else if (disp >= 0 && disp < 0x1000) {
+tcg_out_insn(s, RX, STH, data, base, index, disp);
+} else {
+tcg_out_insn(s, RXY, STHY, data, base, index, disp);
+}
+break;
+case LD_UINT32:
+if (bswap) {
+tcg_out_insn(s, RXY, STRV, data, base, index, disp);
+} else if (disp >= 0 && disp < 0x1000) {
+tcg_out_insn(s, RX, ST, data, base, index, disp);
+} else {
+tcg_out_insn(s, RXY, STY, data, base, index, disp);
+}
+break;
+case LD_UINT64:
+if (bswap) {
+tcg_out_insn(s, RXY, STRVG, data, base, index, disp);
+} else {
+tcg_out_insn(s, RXY, STG, data, base, index, disp);
+}
+break;
+default:
+tcg_abort();
+}
+}
+
 #if defined(CONFIG_SOFTMMU)
-static void tcg_prepare_qemu_ldst(TCGContext* s, int data_reg, int addr_reg,
-  int mem_index, int opc,
+static void tcg_prepare_qemu_ldst(TCGContext* s, TCGReg data_reg,
+  TCGReg addr_reg, int mem_index, int opc,
   uint16_t **label2_ptr_p, int is_store)
-  {
-int arg0 = TCG_REG_R2;
-int arg1 = TCG_REG_R3;
-int arg2 = TCG_REG_R4;
+{
+const TCGReg arg0 = TCG_REG_R2;
+const TCGReg arg1 = TCG_REG_R3;
+const TCGReg arg2 = TCG_REG_R4;
 int s_bits;
 uint16_t *label1_ptr;
 
@@ -947,13 +1056,6 @@ static void tcg_prepare_qemu_ldst(TCGContext* s, int 
data_reg, int addr_reg,
  - offsetof(CPUTLBEntry, addr_read));
 }
 
-#if TARGET_LONG_BITS == 32
-/* zero upper 32 bits */
-tcg_out_insn(s, RRE, LLGFR, arg0, addr_reg);
-#else
-/* just copy */
-tcg_out_mov(s, arg0, addr_reg);
-#endif
 tcg_out_insn(s, RRE, AGR, arg0, arg1);

[Qemu-devel] [PATCH 36/62] tcg-s390: Icache flush is a no-op.

2010-05-27 Thread Richard Henderson
Before gcc 4.2, __builtin___clear_cache doesn't exist, and
afterward the gcc s390 backend implements it as nothing.

Signed-off-by: Richard Henderson 
---
 tcg/s390/tcg-target.h |5 -
 1 files changed, 0 insertions(+), 5 deletions(-)

diff --git a/tcg/s390/tcg-target.h b/tcg/s390/tcg-target.h
index 0af4d38..fae8ed7 100644
--- a/tcg/s390/tcg-target.h
+++ b/tcg/s390/tcg-target.h
@@ -95,9 +95,4 @@ enum {
 
 static inline void flush_icache_range(unsigned long start, unsigned long stop)
 {
-#if QEMU_GNUC_PREREQ(4, 1)
-__builtin___clear_cache((char *) start, (char *) stop);
-#else
-#error not implemented
-#endif
 }
-- 
1.7.0.1




[Qemu-devel] [PATCH 57/62] tcg-s390: Use the COMPARE IMMEDIATE instructions for compares.

2010-05-27 Thread Richard Henderson
These instructions are available with the extended-immediate facility.

Signed-off-by: Richard Henderson 
---
 tcg/s390/tcg-target.c |   44 ++--
 1 files changed, 42 insertions(+), 2 deletions(-)

diff --git a/tcg/s390/tcg-target.c b/tcg/s390/tcg-target.c
index edae6a8..5af8bc9 100644
--- a/tcg/s390/tcg-target.c
+++ b/tcg/s390/tcg-target.c
@@ -70,6 +70,10 @@ typedef enum S390Opcode {
 RIL_ALGFI   = 0xc20a,
 RIL_BRASL   = 0xc005,
 RIL_BRCL= 0xc004,
+RIL_CFI = 0xc20d,
+RIL_CGFI= 0xc20c,
+RIL_CLFI= 0xc20f,
+RIL_CLGFI   = 0xc20e,
 RIL_IIHF= 0xc008,
 RIL_IILF= 0xc009,
 RIL_LARL= 0xc000,
@@ -527,7 +531,29 @@ static int tcg_match_xori(int ct, tcg_target_long val)
 
 static int tcg_match_cmpi(int ct, tcg_target_long val)
 {
-return (val == 0);
+if (facilities & FACILITY_EXT_IMM) {
+/* The COMPARE IMMEDIATE instruction is available.  */
+if (ct & TCG_CT_CONST_32) {
+/* We have a 32-bit immediate and can compare against anything.  */
+return 1;
+} else {
+/* ??? We have no insight here into whether the comparison is
+   signed or unsigned.  The COMPARE IMMEDIATE insn uses a 32-bit
+   signed immediate, and the COMPARE LOGICAL IMMEDIATE insn uses
+   a 32-bit unsigned immediate.  If we were to use the (semi)
+   obvious "val == (int32_t)val" we would be enabling unsigned
+   comparisons vs very large numbers.  The only solution is to
+   take the intersection of the ranges.  */
+/* ??? Another possible solution is to simply lie and allow all
+   constants here and force the out-of-range values into a temp
+   register in tgen_cmp when we have knowledge of the actual
+   comparison code in use.  */
+return val >= 0 && val <= 0x7fffffff;
+}
+} else {
+/* Only the LOAD AND TEST instruction is available.  */
+return val == 0;
+}
 }
 
 /* Test if a constant matches the constraint. */
@@ -1083,7 +1109,21 @@ static int tgen_cmp(TCGContext *s, TCGType type, TCGCond 
c, TCGReg r1,
 }
 return tcg_cond_to_ltr_cond[c];
 } else {
-tcg_abort();
+if (c > TCG_COND_GT) {
+/* unsigned */
+if (type == TCG_TYPE_I32) {
+tcg_out_insn(s, RIL, CLFI, r1, c2);
+} else {
+tcg_out_insn(s, RIL, CLGFI, r1, c2);
+}
+} else {
+/* signed */
+if (type == TCG_TYPE_I32) {
+tcg_out_insn(s, RIL, CFI, r1, c2);
+} else {
+tcg_out_insn(s, RIL, CGFI, r1, c2);
+}
+}
 }
 } else {
 if (c > TCG_COND_GT) {
-- 
1.7.0.1
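The range-intersection argument in tcg_match_cmpi above can be checked in isolation. A sketch, with an illustrative helper name:

```c
#include <assert.h>
#include <stdint.h>

/* COMPARE IMMEDIATE (CFI/CGFI) takes a signed 32-bit immediate and
   COMPARE LOGICAL IMMEDIATE (CLFI/CLGFI) an unsigned one.  Since the
   constraint check runs before the comparison condition is known,
   only the intersection of the two ranges, 0..0x7fffffff, is safe.  */
static int cmp_imm_ok(int64_t val)
{
    int signed_ok   = (val == (int32_t)val);
    int unsigned_ok = (val == (int64_t)(uint32_t)val);
    return signed_ok && unsigned_ok;
}
```

The second ??? comment in the patch notes the alternative design: accept everything and force out-of-range values into a temporary once the condition is known.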




[Qemu-devel] [PATCH 38/62] tcg-s390: Tidy regset initialization; use R14 as temporary.

2010-05-27 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 tcg/s390/tcg-target.c |   26 --
 1 files changed, 12 insertions(+), 14 deletions(-)

diff --git a/tcg/s390/tcg-target.c b/tcg/s390/tcg-target.c
index ee2e879..a26c963 100644
--- a/tcg/s390/tcg-target.c
+++ b/tcg/s390/tcg-target.c
@@ -36,7 +36,7 @@
#define TCG_CT_CONST_S32    0x100
#define TCG_CT_CONST_N32    0x200
 
-#define TCG_TMP0    TCG_REG_R13
+#define TCG_TMP0    TCG_REG_R14
 
 
 /* All of the following instructions are prefixed with their instruction
@@ -1630,24 +1630,22 @@ void tcg_target_init(TCGContext *s)
 
tcg_regset_set32(tcg_target_available_regs[TCG_TYPE_I32], 0, 0xffff);
tcg_regset_set32(tcg_target_available_regs[TCG_TYPE_I64], 0, 0xffff);
-tcg_regset_set32(tcg_target_call_clobber_regs, 0,
- (1 << TCG_REG_R0) |
- (1 << TCG_REG_R1) |
- (1 << TCG_REG_R2) |
- (1 << TCG_REG_R3) |
- (1 << TCG_REG_R4) |
- (1 << TCG_REG_R5) |
- (1 << TCG_REG_R14)); /* link register */
+
+tcg_regset_clear(tcg_target_call_clobber_regs);
+tcg_regset_set_reg(tcg_target_call_clobber_regs, TCG_REG_R0);
+tcg_regset_set_reg(tcg_target_call_clobber_regs, TCG_REG_R1);
+tcg_regset_set_reg(tcg_target_call_clobber_regs, TCG_REG_R2);
+tcg_regset_set_reg(tcg_target_call_clobber_regs, TCG_REG_R3);
+tcg_regset_set_reg(tcg_target_call_clobber_regs, TCG_REG_R4);
+tcg_regset_set_reg(tcg_target_call_clobber_regs, TCG_REG_R5);
+/* The return register can be considered call-clobbered.  */
+tcg_regset_set_reg(tcg_target_call_clobber_regs, TCG_REG_R14);
 
 tcg_regset_clear(s->reserved_regs);
-/* frequently used as a temporary */
 tcg_regset_set_reg(s->reserved_regs, TCG_TMP0);
-/* another temporary */
-tcg_regset_set_reg(s->reserved_regs, TCG_REG_R12);
 /* XXX many insns can't be used with R0, so we better avoid it for now */
 tcg_regset_set_reg(s->reserved_regs, TCG_REG_R0);
-/* The stack pointer.  */
-tcg_regset_set_reg(s->reserved_regs, TCG_REG_R15);
+tcg_regset_set_reg(s->reserved_regs, TCG_REG_CALL_STACK);
 
 tcg_add_target_add_op_defs(s390_op_defs);
 }
-- 
1.7.0.1




[Qemu-devel] [PATCH 47/62] tcg-s390: Conditionalize general-instruction-extension insns.

2010-05-27 Thread Richard Henderson
The LOAD RELATIVE and MULTIPLY SINGLE IMMEDIATE instructions
are currently the only insns from that extension.  It's easy
enough to test for that facility and avoid emitting them.

Signed-off-by: Richard Henderson 
---
 tcg/s390/tcg-target.c |   51 +---
 1 files changed, 31 insertions(+), 20 deletions(-)

diff --git a/tcg/s390/tcg-target.c b/tcg/s390/tcg-target.c
index 4807bca..aecabf9 100644
--- a/tcg/s390/tcg-target.c
+++ b/tcg/s390/tcg-target.c
@@ -35,6 +35,7 @@
 
 #define TCG_CT_CONST_S32   0x100
 #define TCG_CT_CONST_N32   0x200
+#define TCG_CT_CONST_MULI  0x400
 
 #define TCG_TMP0   TCG_REG_R14
 
@@ -344,6 +345,10 @@ static int target_parse_constraint(TCGArgConstraint *ct, 
const char **pct_str)
 ct->ct &= ~TCG_CT_REG;
 ct->ct |= TCG_CT_CONST_N32;
 break;
+case 'K':
+ct->ct &= ~TCG_CT_REG;
+ct->ct |= TCG_CT_CONST_MULI;
+break;
 default:
 break;
 }
@@ -365,6 +370,16 @@ static inline int tcg_target_const_match(tcg_target_long 
val,
 return val == (int32_t)val;
 } else if (ct & TCG_CT_CONST_N32) {
 return -val == (int32_t)-val;
+} else if (ct & TCG_CT_CONST_MULI) {
+/* Immediates that may be used with multiply.  If we have the
+   general-instruction-extensions, then we have MULTIPLY SINGLE
+   IMMEDIATE with a signed 32-bit, otherwise we have only 
+   MULTIPLY HALFWORD IMMEDIATE, with a signed 16-bit.  */
+if (facilities & FACILITY_GEN_INST_EXT) {
+return val == (int32_t)val;
+} else {
+return val == (int16_t)val;
+}
 }
 
 return 0;
@@ -559,18 +574,21 @@ static inline void tcg_out_st(TCGContext *s, TCGType 
type, TCGReg data,
 static void tcg_out_ld_abs(TCGContext *s, TCGType type, TCGReg dest, void *abs)
 {
 tcg_target_long addr = (tcg_target_long)abs;
-tcg_target_long disp = (addr - (tcg_target_long)s->code_ptr) >> 1;
 
-if (disp == (int32_t)disp) {
-if (type == TCG_TYPE_I32) {
-tcg_out_insn(s, RIL, LRL, dest, disp);
-} else {
-tcg_out_insn(s, RIL, LGRL, dest, disp);
+if (facilities & FACILITY_GEN_INST_EXT) {
+tcg_target_long disp = (addr - (tcg_target_long)s->code_ptr) >> 1;
+if (disp == (int32_t)disp) {
+if (type == TCG_TYPE_I32) {
+tcg_out_insn(s, RIL, LRL, dest, disp);
+} else {
+tcg_out_insn(s, RIL, LGRL, dest, disp);
+}
+return;
 }
-} else {
-tcg_out_movi(s, TCG_TYPE_PTR, dest, addr & ~0xffff);
-tcg_out_ld(s, type, dest, dest, addr & 0xffff);
 }
+
+tcg_out_movi(s, TCG_TYPE_PTR, dest, addr & ~0xffff);
+tcg_out_ld(s, type, dest, dest, addr & 0xffff);
 }
 
 static inline void tgen_ext8s(TCGContext *s, TCGReg dest, TCGReg src)
@@ -1322,7 +1340,7 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode 
opc,
 
 case INDEX_op_mul_i32:
 if (const_args[2]) {
-if (args[2] == (int16_t)args[2]) {
+if ((int32_t)args[2] == (int16_t)args[2]) {
 tcg_out_insn(s, RI, MHI, args[0], args[2]);
 } else {
 tcg_out_insn(s, RIL, MSFI, args[0], args[2]);
@@ -1573,7 +1591,7 @@ static const TCGTargetOpDef s390_op_defs[] = {
 
 { INDEX_op_add_i32, { "r", "0", "ri" } },
 { INDEX_op_sub_i32, { "r", "0", "ri" } },
-{ INDEX_op_mul_i32, { "r", "0", "ri" } },
+{ INDEX_op_mul_i32, { "r", "0", "rK" } },
 
 { INDEX_op_div2_i32, { "b", "a", "0", "1", "r" } },
 { INDEX_op_divu2_i32, { "b", "a", "0", "1", "r" } },
@@ -1634,7 +1652,7 @@ static const TCGTargetOpDef s390_op_defs[] = {
 
 { INDEX_op_add_i64, { "r", "0", "rI" } },
 { INDEX_op_sub_i64, { "r", "0", "rJ" } },
-{ INDEX_op_mul_i64, { "r", "0", "rI" } },
+{ INDEX_op_mul_i64, { "r", "0", "rK" } },
 
 { INDEX_op_div2_i64, { "b", "a", "0", "1", "r" } },
 { INDEX_op_divu2_i64, { "b", "a", "0", "1", "r" } },
@@ -1752,9 +1770,7 @@ static void query_facilities(void)
 
 sigaction(SIGILL, &sa_old, NULL);
 
-/* ??? The translator currently uses all of these extensions
-   unconditionally.  This list could be pruned back to just
-   z/Arch and long displacement with some work.  */
+/* The translator currently uses these extensions unconditionally.  */
 fail = 0;
 if ((facilities & FACILITY_ZARCH_ACTIVE) == 0) {
 fprintf(stderr, "TCG: z/Arch facility is required\n");
@@ -1768,11 +1784,6 @@ static void query_facilities(void)
 fprintf(stderr, "TCG: extended-immediate facility is required\n");
 fail = 1;
 }
-if ((facilities & FACILITY_GEN_INST_EXT) == 0) {
-fprintf(stderr, "TCG: general-instructions-extension "
-"facility is required\n");
-fail = 1;
-}
 if (fail) {
 exit(-1);
 }
-- 
1.7.0.1
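The FACILITY_* masks tested above come from STORE FACILITY LIST (EXTENDED), which numbers bits big-endian from the left. A sketch of the bit arithmetic, with macro values as in the series and an illustrative helper name:

```c
#include <assert.h>
#include <stdint.h>

/* Facility bit n in IBM's big-endian numbering maps to host bit
   63 - n of the first doubleword returned by STFLE.  */
#define FACILITY_EXT_IMM      (1ULL << (63 - 21))
#define FACILITY_GEN_INST_EXT (1ULL << (63 - 34))

static int has_facility(uint64_t facilities, uint64_t mask)
{
    return (facilities & mask) != 0;
}
```

With this patch, only z/Arch, long-displacement, and extended-immediate remain hard requirements; general-instructions-extension becomes an optimization probe.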




[Qemu-devel] [PATCH 35/62] tcg-s390: Implement immediate XORs.

2010-05-27 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 tcg/s390/tcg-target.c |   45 +
 1 files changed, 41 insertions(+), 4 deletions(-)

diff --git a/tcg/s390/tcg-target.c b/tcg/s390/tcg-target.c
index 1bc9b4c..ec8c84d 100644
--- a/tcg/s390/tcg-target.c
+++ b/tcg/s390/tcg-target.c
@@ -57,6 +57,8 @@ typedef enum S390Opcode {
 RIL_NILF= 0xc00b,
 RIL_OIHF= 0xc00c,
 RIL_OILF= 0xc00d,
+RIL_XIHF= 0xc006,
+RIL_XILF= 0xc007,
 
 RI_AGHI = 0xa70b,
 RI_AHI  = 0xa70a,
@@ -719,6 +721,33 @@ static void tgen64_ori(TCGContext *s, TCGReg dest, 
tcg_target_ulong val)
tgen64_ori(s, dest, val & 0xffffffff00000000ull);
 }
 
+static void tgen64_xori(TCGContext *s, TCGReg dest, tcg_target_ulong val)
+{
+tcg_target_long sval = val;
+
+/* Zero-th, look for no-op.  */
+if (val == 0) {
+return;
+}
+
+/* First, look for 64-bit values for which it is better to load the
+   value first and perform the xor via registers.  This is true for
+   any 32-bit negative value, where the high 32-bits get flipped too.  */
+if (sval < 0 && sval == (int32_t)sval) {
+tcg_out_movi(s, TCG_TYPE_I64, TCG_REG_R13, sval);
+tcg_out_insn(s, RRE, XGR, dest, TCG_REG_R13);
+return;
+}
+
+/* Second, perform the xor by parts.  */
+if (val & 0xffffffffull) {
+tcg_out_insn(s, RIL, XILF, dest, val);
+}
+if (val > 0xffffffffull) {
+tcg_out_insn(s, RIL, XIHF, dest, val >> 32);
+}
+}
+
 static void tgen32_cmp(TCGContext *s, TCGCond c, TCGReg r1, TCGReg r2)
 {
 if (c > TCG_COND_GT) {
@@ -1202,7 +1231,11 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode 
opc,
 }
 break;
 case INDEX_op_xor_i32:
-tcg_out_insn(s, RR, XR, args[0], args[2]);
+if (const_args[2]) {
+tgen64_xori(s, args[0], args[2] & 0xffffffff);
+} else {
+tcg_out_insn(s, RR, XR, args[0], args[2]);
+}
 break;
 
 case INDEX_op_and_i64:
@@ -1220,7 +1253,11 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode 
opc,
 }
 break;
 case INDEX_op_xor_i64:
-tcg_out_insn(s, RRE, XGR, args[0], args[2]);
+if (const_args[2]) {
+tgen64_xori(s, args[0], args[2]);
+} else {
+tcg_out_insn(s, RRE, XGR, args[0], args[2]);
+}
 break;
 
 case INDEX_op_neg_i32:
@@ -1490,7 +1527,7 @@ static const TCGTargetOpDef s390_op_defs[] = {
 
 { INDEX_op_and_i32, { "r", "0", "ri" } },
 { INDEX_op_or_i32, { "r", "0", "ri" } },
-{ INDEX_op_xor_i32, { "r", "0", "r" } },
+{ INDEX_op_xor_i32, { "r", "0", "ri" } },
 { INDEX_op_neg_i32, { "r", "r" } },
 
 { INDEX_op_shl_i32, { "r", "0", "Ri" } },
@@ -1551,7 +1588,7 @@ static const TCGTargetOpDef s390_op_defs[] = {
 
 { INDEX_op_and_i64, { "r", "0", "ri" } },
 { INDEX_op_or_i64, { "r", "0", "ri" } },
-{ INDEX_op_xor_i64, { "r", "0", "r" } },
+{ INDEX_op_xor_i64, { "r", "0", "ri" } },
 { INDEX_op_neg_i64, { "r", "r" } },
 
 { INDEX_op_shl_i64, { "r", "r", "Ri" } },
-- 
1.7.0.1
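tgen64_xori above performs the operation by parts: XILF xors the low 32 bits, XIHF the high 32 bits, and a half that is zero needs no instruction at all. A sketch of that decomposition (helper name illustrative):

```c
#include <assert.h>
#include <stdint.h>

/* Split a 64-bit XOR immediate into the XILF (low) and XIHF (high)
   operands; a zero half means the corresponding insn is skipped.  */
static void split_xor_imm(uint64_t val, uint32_t *low, uint32_t *high)
{
    *low  = (uint32_t)val;
    *high = (uint32_t)(val >> 32);
}
```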




[Qemu-devel] [PATCH 55/62] tcg-s390: Use 16-bit branches for forward jumps.

2010-05-27 Thread Richard Henderson
Translation blocks are never big enough to require 32-bit branches.

Signed-off-by: Richard Henderson 
---
 tcg/s390/tcg-target.c |   27 ++-
 1 files changed, 22 insertions(+), 5 deletions(-)

diff --git a/tcg/s390/tcg-target.c b/tcg/s390/tcg-target.c
index 0dc71e2..697c5e4 100644
--- a/tcg/s390/tcg-target.c
+++ b/tcg/s390/tcg-target.c
@@ -33,6 +33,11 @@
 do { } while (0)
 #endif
 
+/* ??? The translation blocks produced by TCG are generally small enough to
+   be entirely reachable with a 16-bit displacement.  Leaving the option for
+   a 32-bit displacement here Just In Case.  */
+#define USE_LONG_BRANCHES 0
+
#define TCG_CT_CONST_32    0x0100
 #define TCG_CT_CONST_NEG   0x0200
 #define TCG_CT_CONST_ADDI  0x0400
@@ -295,14 +300,22 @@ static uint8_t *tb_ret_addr;
 static uint64_t facilities;
 
 static void patch_reloc(uint8_t *code_ptr, int type,
-tcg_target_long value, tcg_target_long addend)
+tcg_target_long value, tcg_target_long addend)
 {
-uint32_t *code_ptr_32 = (uint32_t*)code_ptr;
-tcg_target_long code_ptr_tlong = (tcg_target_long)code_ptr;
+tcg_target_long code_ptr_tl = (tcg_target_long)code_ptr;
+tcg_target_long pcrel2;
 
+/* ??? Not the usual definition of "addend".  */
+pcrel2 = (value - (code_ptr_tl + addend)) >> 1;
+
 switch (type) {
+case R_390_PC16DBL:
+assert(pcrel2 == (int16_t)pcrel2);
+*(int16_t *)code_ptr = pcrel2;
+break;
 case R_390_PC32DBL:
-*code_ptr_32 = (value - (code_ptr_tlong + addend)) >> 1;
+assert(pcrel2 == (int32_t)pcrel2);
+*(int32_t *)code_ptr = pcrel2;
 break;
 default:
 tcg_abort();
@@ -1081,10 +1094,14 @@ static void tgen_branch(TCGContext *s, int cc, int 
labelno)
 TCGLabel* l = &s->labels[labelno];
 if (l->has_value) {
 tgen_gotoi(s, cc, l->u.value);
-} else {
+} else if (USE_LONG_BRANCHES) {
 tcg_out16(s, RIL_BRCL | (cc << 4));
 tcg_out_reloc(s, s->code_ptr, R_390_PC32DBL, labelno, -2);
 s->code_ptr += 4;
+} else {
+tcg_out16(s, RI_BRC | (cc << 4));
+tcg_out_reloc(s, s->code_ptr, R_390_PC16DBL, labelno, -2);
+s->code_ptr += 2;
 }
 }
 
-- 
1.7.0.1
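The relocation arithmetic in patch_reloc above reduces to a single expression. A sketch, with an illustrative helper name and "addend" used the way the patch uses it (the offset from the patched field back to the reference point, hence the -2 passed to tcg_out_reloc):

```c
#include <assert.h>
#include <stdint.h>

/* Branch displacements on s390 are counted in halfwords from the
   instruction address; the displacement field sits 2 bytes into the
   insn, which is why the patch relocates with addend -2.  */
static int64_t pcrel2(int64_t value, int64_t code_ptr, int64_t addend)
{
    return (value - (code_ptr + addend)) >> 1;
}
```

The BRC form then requires the result to fit in 16 bits and the BRCL form in 32, matching the two asserts in patch_reloc.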




[Qemu-devel] [PATCH 33/62] tcg-s390: Implement immediate ORs.

2010-05-27 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 tcg/s390/tcg-target.c |   63 +---
 1 files changed, 59 insertions(+), 4 deletions(-)

diff --git a/tcg/s390/tcg-target.c b/tcg/s390/tcg-target.c
index 2fd58bd..2a9d64d 100644
--- a/tcg/s390/tcg-target.c
+++ b/tcg/s390/tcg-target.c
@@ -53,6 +53,8 @@ typedef enum S390Opcode {
 RIL_LLILF   = 0xc00f,
 RIL_NIHF= 0xc00a,
 RIL_NILF= 0xc00b,
+RIL_OIHF= 0xc00c,
+RIL_OILF= 0xc00d,
 
 RI_AGHI = 0xa70b,
 RI_AHI  = 0xa70a,
@@ -70,6 +72,10 @@ typedef enum S390Opcode {
 RI_NIHL = 0xa505,
 RI_NILH = 0xa506,
 RI_NILL = 0xa507,
+RI_OIHH = 0xa508,
+RI_OIHL = 0xa509,
+RI_OILH = 0xa50a,
+RI_OILL = 0xa50b,
 
 RRE_AGR = 0xb908,
 RRE_CGR = 0xb920,
@@ -668,6 +674,47 @@ static void tgen64_andi(TCGContext *s, TCGReg dest, 
tcg_target_ulong val)
 tgen64_andi(s, dest, val | 0xull);
 }
 
+static void tgen64_ori(TCGContext *s, TCGReg dest, tcg_target_ulong val)
+{
+static const S390Opcode oi_insns[4] = {
+RI_OILL, RI_OILH, RI_OIHL, RI_OIHH
+};
+static const S390Opcode nif_insns[2] = {
+RIL_OILF, RIL_OIHF
+};
+
+int i;
+
+/* Zero-th, look for no-op.  */
+if (val == 0) {
+return;
+}
+
+/* First, try all 32-bit insns that can perform it in one go.  */
+for (i = 0; i < 4; i++) {
+tcg_target_ulong mask = (0xffffull << i*16);
+if ((val & mask) != 0 && (val & ~mask) == 0) {
+tcg_out_insn_RI(s, oi_insns[i], dest, val >> i*16);
+return;
+}
+}
+
+/* Second, try all 48-bit insns that can perform it in one go.  */
+for (i = 0; i < 2; i++) {
+tcg_target_ulong mask = (0xffffffffull << i*32);
+if ((val & mask) != 0 && (val & ~mask) == 0) {
+tcg_out_insn_RIL(s, nif_insns[i], dest, val >> i*32);
+return;
+}
+}
+
+/* Last, perform the OR via sequential modifications to the
+   high and low parts.  Do this via recursion to handle 16-bit
+   vs 32-bit masks in each half.  */
+tgen64_ori(s, dest, val & 0x00000000ffffffffull);
+tgen64_ori(s, dest, val & 0xffffffff00000000ull);
+}
+
 static void tgen32_cmp(TCGContext *s, TCGCond c, TCGReg r1, TCGReg r2)
 {
 if (c > TCG_COND_GT) {
@@ -1144,7 +1191,11 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode 
opc,
 }
 break;
 case INDEX_op_or_i32:
-tcg_out_insn(s, RR, OR, args[0], args[2]);
+if (const_args[2]) {
+tgen64_ori(s, args[0], args[2] & 0xffffffff);
+} else {
+tcg_out_insn(s, RR, OR, args[0], args[2]);
+}
 break;
 case INDEX_op_xor_i32:
 tcg_out_insn(s, RR, XR, args[0], args[2]);
@@ -1158,7 +1209,11 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode 
opc,
 }
 break;
 case INDEX_op_or_i64:
-tcg_out_insn(s, RRE, OGR, args[0], args[2]);
+if (const_args[2]) {
+tgen64_ori(s, args[0], args[2]);
+} else {
+tcg_out_insn(s, RRE, OGR, args[0], args[2]);
+}
 break;
 case INDEX_op_xor_i64:
 tcg_out_insn(s, RRE, XGR, args[0], args[2]);
@@ -1414,7 +1469,7 @@ static const TCGTargetOpDef s390_op_defs[] = {
 { INDEX_op_divu2_i32, { "b", "a", "0", "1", "r" } },
 
 { INDEX_op_and_i32, { "r", "0", "ri" } },
-{ INDEX_op_or_i32, { "r", "0", "r" } },
+{ INDEX_op_or_i32, { "r", "0", "ri" } },
 { INDEX_op_xor_i32, { "r", "0", "r" } },
 { INDEX_op_neg_i32, { "r", "r" } },
 
@@ -1475,7 +1530,7 @@ static const TCGTargetOpDef s390_op_defs[] = {
 { INDEX_op_divu2_i64, { "b", "a", "0", "1", "r" } },
 
 { INDEX_op_and_i64, { "r", "0", "ri" } },
-{ INDEX_op_or_i64, { "r", "0", "r" } },
+{ INDEX_op_or_i64, { "r", "0", "ri" } },
 { INDEX_op_xor_i64, { "r", "0", "r" } },
 { INDEX_op_neg_i64, { "r", "r" } },
 
-- 
1.7.0.1
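The first loop in tgen64_ori above selects among OILL/OILH/OIHL/OIHH by finding the single 16-bit chunk that covers the whole immediate. A sketch of that search, returning the chunk index or -1 when no single chunk suffices (helper name illustrative):

```c
#include <assert.h>
#include <stdint.h>

/* Return the index 0..3 of the one 16-bit chunk containing every set
   bit of val, or -1 if zero or more than one chunk is populated.  */
static int or_imm_chunk(uint64_t val)
{
    int i;
    for (i = 0; i < 4; i++) {
        uint64_t mask = 0xffffULL << (i * 16);
        if ((val & mask) != 0 && (val & ~mask) == 0) {
            return i;   /* OILL, OILH, OIHL, OIHH respectively */
        }
    }
    return -1;
}
```

Only when this and the analogous 32-bit (OILF/OIHF) search both fail does the patch recurse on the two halves.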




[Qemu-devel] [PATCH 34/62] tcg-s390: Implement immediate MULs.

2010-05-27 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 tcg/s390/tcg-target.c |   28 
 1 files changed, 24 insertions(+), 4 deletions(-)

diff --git a/tcg/s390/tcg-target.c b/tcg/s390/tcg-target.c
index 2a9d64d..1bc9b4c 100644
--- a/tcg/s390/tcg-target.c
+++ b/tcg/s390/tcg-target.c
@@ -51,6 +51,8 @@ typedef enum S390Opcode {
 RIL_LGFI= 0xc001,
 RIL_LLIHF   = 0xc00e,
 RIL_LLILF   = 0xc00f,
+RIL_MSFI= 0xc201,
+RIL_MSGFI   = 0xc200,
 RIL_NIHF= 0xc00a,
 RIL_NILF= 0xc00b,
 RIL_OIHF= 0xc00c,
@@ -68,6 +70,8 @@ typedef enum S390Opcode {
 RI_LLIHL= 0xa50d,
 RI_LLILH= 0xa50e,
 RI_LLILL= 0xa50f,
+RI_MGHI = 0xa70d,
+RI_MHI  = 0xa70c,
 RI_NIHH = 0xa504,
 RI_NIHL = 0xa505,
 RI_NILH = 0xa506,
@@ -1227,10 +1231,26 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode 
opc,
 break;
 
 case INDEX_op_mul_i32:
-tcg_out_insn(s, RRE, MSR, args[0], args[2]);
+if (const_args[2]) {
+if (args[2] == (int16_t)args[2]) {
+tcg_out_insn(s, RI, MHI, args[0], args[2]);
+} else {
+tcg_out_insn(s, RIL, MSFI, args[0], args[2]);
+}
+} else {
+tcg_out_insn(s, RRE, MSR, args[0], args[2]);
+}
 break;
 case INDEX_op_mul_i64:
-tcg_out_insn(s, RRE, MSGR, args[0], args[2]);
+if (const_args[2]) {
+if (args[2] == (int16_t)args[2]) {
+tcg_out_insn(s, RI, MGHI, args[0], args[2]);
+} else {
+tcg_out_insn(s, RIL, MSGFI, args[0], args[2]);
+}
+} else {
+tcg_out_insn(s, RRE, MSGR, args[0], args[2]);
+}
 break;
 
 case INDEX_op_div2_i32:
@@ -1463,7 +1483,7 @@ static const TCGTargetOpDef s390_op_defs[] = {
 
 { INDEX_op_add_i32, { "r", "0", "ri" } },
 { INDEX_op_sub_i32, { "r", "0", "ri" } },
-{ INDEX_op_mul_i32, { "r", "0", "r" } },
+{ INDEX_op_mul_i32, { "r", "0", "ri" } },
 
 { INDEX_op_div2_i32, { "b", "a", "0", "1", "r" } },
 { INDEX_op_divu2_i32, { "b", "a", "0", "1", "r" } },
@@ -1524,7 +1544,7 @@ static const TCGTargetOpDef s390_op_defs[] = {
 
 { INDEX_op_add_i64, { "r", "0", "rI" } },
 { INDEX_op_sub_i64, { "r", "0", "rJ" } },
-{ INDEX_op_mul_i64, { "r", "0", "r" } },
+{ INDEX_op_mul_i64, { "r", "0", "rI" } },
 
 { INDEX_op_div2_i64, { "b", "a", "0", "1", "r" } },
 { INDEX_op_divu2_i64, { "b", "a", "0", "1", "r" } },
-- 
1.7.0.1
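The MHI/MSFI selection above rests on a standard truncation test: a value fits the 16-bit immediate form iff truncating to int16_t and sign-extending back is lossless. A sketch (helper name illustrative):

```c
#include <assert.h>
#include <stdint.h>

/* True iff val is representable as a signed 16-bit immediate,
   i.e. usable with MHI/MGHI rather than MSFI/MSGFI.  */
static int fits_s16(int64_t val)
{
    return val == (int16_t)val;
}
```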




[Qemu-devel] [PATCH 51/62] tcg-s390: Conditionalize AND IMMEDIATE instructions.

2010-05-27 Thread Richard Henderson
The 32-bit immediate AND instructions are in the extended-immediate
facility.  Use these only if present.

At the same time, pull the logic to load immediates into registers
into a constraint letter for TCG.

Signed-off-by: Richard Henderson 
---
 tcg/s390/tcg-target.c |  209 
 1 files changed, 122 insertions(+), 87 deletions(-)

diff --git a/tcg/s390/tcg-target.c b/tcg/s390/tcg-target.c
index 8a7c9ae..359f6d1 100644
--- a/tcg/s390/tcg-target.c
+++ b/tcg/s390/tcg-target.c
@@ -33,10 +33,11 @@
 do { } while (0)
 #endif
 
-#define TCG_CT_CONST_32    0x100
-#define TCG_CT_CONST_NEG   0x200
-#define TCG_CT_CONST_ADDI  0x400
-#define TCG_CT_CONST_MULI  0x800
+#define TCG_CT_CONST_32    0x0100
+#define TCG_CT_CONST_NEG   0x0200
+#define TCG_CT_CONST_ADDI  0x0400
+#define TCG_CT_CONST_MULI  0x0800
+#define TCG_CT_CONST_ANDI  0x1000
 
 #define TCG_TMP0   TCG_REG_R14
 
@@ -353,6 +354,10 @@ static int target_parse_constraint(TCGArgConstraint *ct, 
const char **pct_str)
 ct->ct &= ~TCG_CT_REG;
 ct->ct |= TCG_CT_CONST_MULI;
 break;
+case 'A':
+ct->ct &= ~TCG_CT_REG;
+ct->ct |= TCG_CT_CONST_ANDI;
+break;
 default:
 break;
 }
@@ -362,9 +367,66 @@ static int target_parse_constraint(TCGArgConstraint *ct, 
const char **pct_str)
 return 0;
 }
 
+/* Immediates to be used with logical AND.  This is an optimization only,
+   since a full 64-bit immediate AND can always be performed with 4 sequential
+   NI[LH][LH] instructions.  What we're looking for is immediates that we
+   can load efficiently, and the immediate load plus the reg-reg AND is
+   smaller than the sequential NI's.  */
+
+static int tcg_match_andi(int ct, tcg_target_ulong val)
+{
+int i;
+
+if (facilities & FACILITY_EXT_IMM) {
+if (ct & TCG_CT_CONST_32) {
+/* All 32-bit ANDs can be performed with 1 48-bit insn.  */
+return 1;
+}
+
+/* Zero-extensions.  */
+if (val == 0xff || val == 0xffff || val == 0xffffffff) {
+return 1;
+}
+} else {
+if (ct & TCG_CT_CONST_32) {
+val = (uint32_t)val;
+} else if (val == 0xffffffff) {
+return 1;
+}
+}
+
+/* Try all 32-bit insns that can perform it in one go.  */
+for (i = 0; i < 4; i++) {
+tcg_target_ulong mask = ~(0xffffull << i*16);
+if ((val & mask) == mask) {
+return 1;
+}
+}
+
+/* Look for 16-bit values performing the mask.  These are better
+   to load with LLI[LH][LH].  */
+for (i = 0; i < 4; i++) {
+tcg_target_ulong mask = 0xffffull << i*16;
+if ((val & mask) == val) {
+return 0;
+}
+}
+
+/* Look for 32-bit values performing the 64-bit mask.  These
+   are better to load with LLI[LH]F, or if extended immediates
+   not available, with a pair of LLI insns.  */
+if ((ct & TCG_CT_CONST_32) == 0) {
+if (val <= 0xffffffff || (val & 0xffffffff) == 0) {
+return 0;
+}
+}
+
+return 1;
+}
+
 /* Test if a constant matches the constraint. */
-static inline int tcg_target_const_match(tcg_target_long val,
- const TCGArgConstraint *arg_ct)
+static int tcg_target_const_match(tcg_target_long val,
+  const TCGArgConstraint *arg_ct)
 {
 int ct = arg_ct->ct;
 
@@ -401,6 +463,8 @@ static inline int tcg_target_const_match(tcg_target_long 
val,
 } else {
 return val == (int16_t)val;
 }
+} else if (ct & TCG_CT_CONST_ANDI) {
+return tcg_match_andi(ct, val);
 }
 
 return 0;
@@ -764,37 +828,6 @@ static void tgen64_addi(TCGContext *s, TCGReg dest, 
int64_t val)
 
 }
 
-static void tgen32_andi(TCGContext *s, TCGReg dest, uint32_t val)
-{
-/* Zero-th, look for no-op.  */
-if (val == -1) {
-return;
-}
-
-/* First, look for the zero-extensions.  */
-if (val == 0xff) {
-tgen_ext8u(s, dest, dest);
-return;
-}
-if (val == 0xffff) {
-tgen_ext16u(s, dest, dest);
-return;
-}
-
-/* Second, try all 32-bit insns that can perform it in one go.  */
-if ((val & 0xffff0000) == 0xffff0000) {
-tcg_out_insn(s, RI, NILL, dest, val);
-return;
-}
-if ((val & 0x0000ffff) == 0x0000ffff) {
-tcg_out_insn(s, RI, NILH, dest, val >> 16);
-return;
-}
-
-/* Lastly, perform the entire operation with a 48-bit insn.  */
-tcg_out_insn(s, RIL, NILF, dest, val);
-}
-
 static void tgen64_andi(TCGContext *s, TCGReg dest, tcg_target_ulong val)
 {
 static const S390Opcode ni_insns[4] = {
@@ -806,69 +839,61 @@ static void tgen64_andi(TCGContext *s, TCGReg dest, 
tcg_target_ulong val)
 
 int i;
 
-/* Zero-th, look for no-op.  */
+/* Look for no-op.  */
 if (val == -1) {
 return;
 }
 
-/* 

[Qemu-devel] [PATCH 46/62] tcg-s390: Query instruction extensions that are installed.

2010-05-27 Thread Richard Henderson
Verify that we have all the instruction extensions that we generate.
Future patches can tailor code generation to the set of instructions
that are present.

Signed-off-by: Richard Henderson 
---
 tcg/s390/tcg-target.c |  122 +
 1 files changed, 122 insertions(+), 0 deletions(-)

diff --git a/tcg/s390/tcg-target.c b/tcg/s390/tcg-target.c
index 4a3235c..4807bca 100644
--- a/tcg/s390/tcg-target.c
+++ b/tcg/s390/tcg-target.c
@@ -279,6 +279,17 @@ static void *qemu_st_helpers[4] = {
 
 static uint8_t *tb_ret_addr;
 
+/* A list of relevant facilities used by this translator.  Some of these
+   are required for proper operation, and these are checked at startup.  */
+
+#define FACILITY_ZARCH (1ULL << (63 - 1))
+#define FACILITY_ZARCH_ACTIVE  (1ULL << (63 - 2))
+#define FACILITY_LONG_DISP (1ULL << (63 - 18))
+#define FACILITY_EXT_IMM   (1ULL << (63 - 21))
+#define FACILITY_GEN_INST_EXT  (1ULL << (63 - 34))
+
+static uint64_t facilities;
+
 static void patch_reloc(uint8_t *code_ptr, int type,
 tcg_target_long value, tcg_target_long addend)
 {
@@ -1658,6 +1669,115 @@ static const TCGTargetOpDef s390_op_defs[] = {
 { -1 },
 };
 
+/* ??? Linux kernels provide an AUXV entry AT_HWCAP that provides most of
+   this information.  However, getting at that entry is not easy this far
+   away from main.  Our options are: start searching from environ, but 
+   that fails as soon as someone does a setenv in between.  Read the data
+   from /proc/self/auxv.  Or do the probing ourselves.  The only thing 
+   extra that AT_HWCAP gives us is HWCAP_S390_HIGH_GPRS, which indicates
+   that the kernel saves all 64-bits of the registers around traps while
+   in 31-bit mode.  But this is true of all "recent" kernels (ought to dig
+   back and see from when this might not be true).  */
+
+#include <signal.h>
+
+static volatile sig_atomic_t got_sigill;
+
+static void sigill_handler(int sig)
+{
+got_sigill = 1;
+}
+
+static void query_facilities(void)
+{
+struct sigaction sa_old, sa_new;
+register int r0 __asm__("0");
+register void *r1 __asm__("1");
+int fail;
+
+memset(&sa_new, 0, sizeof(sa_new));
+sa_new.sa_handler = sigill_handler;
+sigaction(SIGILL, &sa_new, &sa_old);
+
+/* First, try STORE FACILITY LIST EXTENDED.  If this is present, then
+   we need not do any more probing.  Unfortunately, this itself is an
+   extension and the original STORE FACILITY LIST instruction is
+   kernel-only, storing its results at absolute address 200.  */
+/* stfle 0(%r1) */
+r1 = &facilities;
+asm volatile(".word 0xb2b0,0x1000"
+ : "=r"(r0) : "0"(0), "r"(r1) : "memory", "cc");
+
+if (got_sigill) {
+/* STORE FACILITY EXTENDED is not available.  Probe for one of each
+   kind of instruction that we're interested in.  */
+/* ??? Possibly some of these are in practice never present unless
+   the store-facility-extended facility is also present.  But since
+   that isn't documented it's just better to probe for each.  */
+   
+/* Test for z/Architecture.  Required even in 31-bit mode.  */
+got_sigill = 0;
+/* agr %r0,%r0 */
asm volatile(".word 0xb908,0x0000" : "=r"(r0) : : "cc");
+if (!got_sigill) {
+facilities |= FACILITY_ZARCH | FACILITY_ZARCH_ACTIVE;
+}
+
+/* Test for long displacement.  */
+got_sigill = 0;
+/* ly %r0,0(%r1) */
+r1 = &facilities;
+asm volatile(".word 0xe300,0x1000,0x0058"
+ : "=r"(r0) : "r"(r1) : "cc");
+if (!got_sigill) {
+facilities |= FACILITY_LONG_DISP;
+}
+
+/* Test for extended immediates.  */
+got_sigill = 0;
+/* afi %r0,0 */
asm volatile(".word 0xc209,0x0000,0x0000" : : : "cc");
+if (!got_sigill) {
+facilities |= FACILITY_EXT_IMM;
+}
+
+/* Test for general-instructions-extension.  */
+got_sigill = 0;
+/* msfi %r0,1 */
asm volatile(".word 0xc201,0x0000,0x0001");
+if (!got_sigill) {
+facilities |= FACILITY_GEN_INST_EXT;
+}
+}
+
+sigaction(SIGILL, &sa_old, NULL);
+
+/* ??? The translator currently uses all of these extensions
+   unconditionally.  This list could be pruned back to just
+   z/Arch and long displacement with some work.  */
+fail = 0;
+if ((facilities & FACILITY_ZARCH_ACTIVE) == 0) {
+fprintf(stderr, "TCG: z/Arch facility is required\n");
+fail = 1;
+}
+if ((facilities & FACILITY_LONG_DISP) == 0) {
+fprintf(stderr, "TCG: long-displacement facility is required\n");
+fail = 1;
+}
+if ((facilities & FACILITY_EXT_IMM) == 0) {
+fprintf(stderr, "TCG: extended-immediate facility is required\n");
+fail = 1;
+}
+if ((facilities & FACILITY_GEN_INST_EXT) == 0) {
+

[Qemu-devel] [PATCH 25/62] tcg-s390: Re-implement tcg_out_movi.

2010-05-27 Thread Richard Henderson
Make better use of the LOAD HALFWORD IMMEDIATE, LOAD IMMEDIATE,
and INSERT IMMEDIATE instruction groups.

Signed-off-by: Richard Henderson 
---
 tcg/s390/tcg-target.c |   90 -
 1 files changed, 74 insertions(+), 16 deletions(-)

diff --git a/tcg/s390/tcg-target.c b/tcg/s390/tcg-target.c
index 4c2acca..fe83415 100644
--- a/tcg/s390/tcg-target.c
+++ b/tcg/s390/tcg-target.c
@@ -44,12 +44,23 @@ typedef enum S390Opcode {
 RIL_BRASL   = 0xc005,
 RIL_BRCL= 0xc004,
 RIL_LARL= 0xc000,
+RIL_IIHF= 0xc008,
+RIL_IILF= 0xc009,
+RIL_LGFI= 0xc001,
+RIL_LLIHF   = 0xc00e,
+RIL_LLILF   = 0xc00f,
 
 RI_AGHI = 0xa70b,
 RI_AHI  = 0xa70a,
 RI_BRC  = 0xa704,
+RI_IIHH = 0xa500,
+RI_IIHL = 0xa501,
 RI_IILH = 0xa502,
+RI_IILL = 0xa503,
 RI_LGHI = 0xa709,
+RI_LLIHH= 0xa50c,
+RI_LLIHL= 0xa50d,
+RI_LLILH= 0xa50e,
 RI_LLILL= 0xa50f,
 
 RRE_AGR = 0xb908,
@@ -363,24 +374,71 @@ static inline void tcg_out_mov(TCGContext *s, int ret, 
int arg)
 }
 
 /* load a register with an immediate value */
-static inline void tcg_out_movi(TCGContext *s, TCGType type,
-int ret, tcg_target_long arg)
+static void tcg_out_movi(TCGContext *s, TCGType type,
+ TCGReg ret, tcg_target_long sval)
 {
-if (arg >= -0x8000 && arg < 0x8000) { /* signed immediate load */
-tcg_out_insn(s, RI, LGHI, ret, arg);
-} else if (!(arg & 0xffffffffffff0000UL)) {
-tcg_out_insn(s, RI, LLILL, ret, arg);
-} else if (!(arg & 0xffffffff00000000UL) || type == TCG_TYPE_I32) {
-tcg_out_insn(s, RI, LLILL, ret, arg);
-tcg_out_insn(s, RI, IILH, ret, arg >> 16);
+static const S390Opcode lli_insns[4] = {
+RI_LLILL, RI_LLILH, RI_LLIHL, RI_LLIHH
+};
+
+tcg_target_ulong uval = sval;
+int i;
+
+if (type == TCG_TYPE_I32) {
+uval = (uint32_t)sval;
+sval = (int32_t)sval;
+}
+
+/* First, try all 32-bit insns that can load it in one go.  */
+if (sval >= -0x8000 && sval < 0x8000) {
+tcg_out_insn(s, RI, LGHI, ret, sval);
+return;
+}
+
+for (i = 0; i < 4; i++) {
tcg_target_long mask = 0xffffull << i*16;
+if ((uval & mask) != 0 && (uval & ~mask) == 0) {
+tcg_out_insn_RI(s, lli_insns[i], ret, uval >> i*16);
+return;
+}
+}
+
+/* Second, try all 48-bit insns that can load it in one go.  */
+if (sval == (int32_t)sval) {
+tcg_out_insn(s, RIL, LGFI, ret, sval);
+return;
+}
if (uval <= 0xffffffff) {
+tcg_out_insn(s, RIL, LLILF, ret, uval);
+return;
+}
if ((uval & 0xffffffff) == 0) {
+tcg_out_insn(s, RIL, LLIHF, ret, uval >> 32);
+return;
+}
+
+/* If we get here, both the high and low parts have non-zero bits.  */
+
+/* Try for PC-relative address load.  */
+if ((sval & 1) == 0) {
+intptr_t off = (sval - (intptr_t)s->code_ptr) >> 1;
+if (off == (int32_t)off) {
+tcg_out_insn(s, RIL, LARL, ret, off);
+return;
+}
+}
+
+/* Recurse to load the lower 32-bits.  */
+tcg_out_movi(s, TCG_TYPE_I32, ret, sval);
+
+/* Insert data into the high 32-bits.  */
+uval >>= 32;
if (uval < 0x10000) {
+tcg_out_insn(s, RI, IIHL, ret, uval);
} else if ((uval & 0xffff) == 0) {
+tcg_out_insn(s, RI, IIHH, ret, uval >> 16);
 } else {
-/* branch over constant and store its address in R13 */
-tcg_out_insn(s, RIL, BRASL, TCG_REG_R13, (6 + 8) >> 1);
-/* 64-bit constant */
-tcg_out32(s, arg >> 32);
-tcg_out32(s, arg);
-/* load constant to ret */
-tcg_out_insn(s, RXY, LG, ret, TCG_REG_R13, 0, 0);
+tcg_out_insn(s, RIL, IIHF, ret, uval);
 }
 }
 
-- 
1.7.0.1




[Qemu-devel] [PATCH 30/62] tcg-s390: Tidy unimplemented opcodes.

2010-05-27 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 tcg/s390/tcg-target.c |   25 ++---
 1 files changed, 10 insertions(+), 15 deletions(-)

diff --git a/tcg/s390/tcg-target.c b/tcg/s390/tcg-target.c
index 97ac66d..cf70cc2 100644
--- a/tcg/s390/tcg-target.c
+++ b/tcg/s390/tcg-target.c
@@ -911,11 +911,6 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
 }
 break;
 
-case INDEX_op_jmp:
-/* XXX */
-tcg_abort();
-break;
-
 case INDEX_op_ld8u_i32:
 tcg_out_ldst(s, 0, RXY_LLC, args[0], args[1], args[2]);
 break;
@@ -977,16 +972,6 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
 tcg_out_st(s, TCG_TYPE_I64, args[0], args[1], args[2]);
 break;
 
-case INDEX_op_mov_i32:
-/* XXX */
-tcg_abort();
-break;
-
-case INDEX_op_movi_i32:
-/* XXX */
-tcg_abort();
-break;
-
 case INDEX_op_add_i32:
 if (const_args[2]) {
 tcg_out_insn(s, RI, AHI, args[0], args[2]);
@@ -1234,6 +1219,16 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode 
opc,
 tcg_out_qemu_st(s, args, LD_UINT64);
 break;
 
+case INDEX_op_mov_i32:
+case INDEX_op_mov_i64:
+case INDEX_op_movi_i32:
+case INDEX_op_movi_i64:
+/* These are always emitted by TCG directly.  */
+case INDEX_op_jmp:
+/* This one is obsolete and never emitted.  */
+tcg_abort();
+break;
+
 default:
 fprintf(stderr,"unimplemented opc 0x%x\n",opc);
 tcg_abort();
-- 
1.7.0.1




[Qemu-devel] [PATCH 29/62] tcg-s390: Use LOAD COMPLEMENT for negate.

2010-05-27 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 tcg/s390/tcg-target.c |   10 ++
 1 files changed, 2 insertions(+), 8 deletions(-)

diff --git a/tcg/s390/tcg-target.c b/tcg/s390/tcg-target.c
index f85063e..97ac66d 100644
--- a/tcg/s390/tcg-target.c
+++ b/tcg/s390/tcg-target.c
@@ -1028,16 +1028,10 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode 
opc,
 break;
 
 case INDEX_op_neg_i32:
-/* FIXME: optimize args[0] != args[1] case */
-tcg_out_insn(s, RR, LR, 13, args[1]);
-tcg_out_movi(s, TCG_TYPE_I32, args[0], 0);
-tcg_out_insn(s, RR, SR, args[0], 13);
+tcg_out_insn(s, RR, LCR, args[0], args[1]);
 break;
 case INDEX_op_neg_i64:
-/* FIXME: optimize args[0] != args[1] case */
-tcg_out_mov(s, TCG_REG_R13, args[1]);
-tcg_out_movi(s, TCG_TYPE_I64, args[0], 0);
-tcg_out_insn(s, RRE, SGR, args[0], TCG_REG_R13);
+tcg_out_insn(s, RRE, LCGR, args[0], args[1]);
 break;
 
 case INDEX_op_mul_i32:
-- 
1.7.0.1




[Qemu-devel] [PATCH 26/62] tcg-s390: Implement sign and zero-extension operations.

2010-05-27 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 tcg/s390/tcg-target.c |   94 +---
 tcg/s390/tcg-target.h |   20 +-
 2 files changed, 90 insertions(+), 24 deletions(-)

diff --git a/tcg/s390/tcg-target.c b/tcg/s390/tcg-target.c
index fe83415..3f7d08d 100644
--- a/tcg/s390/tcg-target.c
+++ b/tcg/s390/tcg-target.c
@@ -70,10 +70,14 @@ typedef enum S390Opcode {
 RRE_DLR = 0xb997,
 RRE_DSGFR   = 0xb91d,
 RRE_DSGR= 0xb90d,
+RRE_LGBR= 0xb906,
 RRE_LCGR= 0xb903,
 RRE_LGFR= 0xb914,
+RRE_LGHR= 0xb907,
 RRE_LGR = 0xb904,
+RRE_LLGCR   = 0xb984,
 RRE_LLGFR   = 0xb916,
+RRE_LLGHR   = 0xb985,
 RRE_MSGR= 0xb90c,
 RRE_MSR = 0xb252,
 RRE_NGR = 0xb980,
@@ -491,6 +495,36 @@ static inline void tcg_out_st(TCGContext *s, TCGType type, 
TCGReg data,
 }
 }
 
+static inline void tgen_ext8s(TCGContext *s, TCGReg dest, TCGReg src)
+{
+tcg_out_insn(s, RRE, LGBR, dest, src);
+}
+
+static inline void tgen_ext8u(TCGContext *s, TCGReg dest, TCGReg src)
+{
+tcg_out_insn(s, RRE, LLGCR, dest, src);
+}
+
+static inline void tgen_ext16s(TCGContext *s, TCGReg dest, TCGReg src)
+{
+tcg_out_insn(s, RRE, LGHR, dest, src);
+}
+
+static inline void tgen_ext16u(TCGContext *s, TCGReg dest, TCGReg src)
+{
+tcg_out_insn(s, RRE, LLGHR, dest, src);
+}
+
+static inline void tgen_ext32s(TCGContext *s, TCGReg dest, TCGReg src)
+{
+tcg_out_insn(s, RRE, LGFR, dest, src);
+}
+
+static inline void tgen_ext32u(TCGContext *s, TCGReg dest, TCGReg src)
+{
+tcg_out_insn(s, RRE, LLGFR, dest, src);
+}
+
 static void tgen32_cmp(TCGContext *s, TCGCond c, TCGReg r1, TCGReg r2)
 {
 if (c > TCG_COND_GT) {
@@ -581,8 +615,8 @@ static void tcg_prepare_qemu_ldst(TCGContext* s, int 
data_reg, int addr_reg,
 }
 
 #if TARGET_LONG_BITS == 32
-tcg_out_insn(s, RRE, LLGFR, arg1, addr_reg);
-tcg_out_insn(s, RRE, LLGFR, arg0, addr_reg);
+tgen_ext32u(s, arg1, addr_reg);
+tgen_ext32u(s, arg0, addr_reg);
 #else
 tcg_out_mov(s, arg1, addr_reg);
 tcg_out_mov(s, arg0, addr_reg);
@@ -619,7 +653,7 @@ static void tcg_prepare_qemu_ldst(TCGContext* s, int 
data_reg, int addr_reg,
 
 /* call load/store helper */
 #if TARGET_LONG_BITS == 32
-tcg_out_insn(s, RRE, LLGFR, arg0, addr_reg);
+tgen_ext32u(s, arg0, addr_reg);
 #else
 tcg_out_mov(s, arg0, addr_reg);
 #endif
@@ -635,15 +669,13 @@ static void tcg_prepare_qemu_ldst(TCGContext* s, int 
data_reg, int addr_reg,
 /* sign extension */
 switch (opc) {
 case LD_INT8:
-tcg_out_insn(s, RSY, SLLG, data_reg, arg0, SH64_REG_NONE, 56);
-tcg_out_insn(s, RSY, SRAG, data_reg, data_reg, SH64_REG_NONE, 56);
+tgen_ext8s(s, data_reg, arg0);
 break;
 case LD_INT16:
-tcg_out_insn(s, RSY, SLLG, data_reg, arg0, SH64_REG_NONE, 48);
-tcg_out_insn(s, RSY, SRAG, data_reg, data_reg, SH64_REG_NONE, 48);
+tgen_ext16s(s, data_reg, arg0);
 break;
 case LD_INT32:
-tcg_out_insn(s, RRE, LGFR, data_reg, arg0);
+tgen_ext32s(s, data_reg, arg0);
 break;
 default:
 /* unsigned -> just copy */
@@ -741,8 +773,7 @@ static void tcg_out_qemu_ld(TCGContext* s, const TCGArg* 
args, int opc)
 #else
 /* swapped unsigned halfword load with upper bits zeroed */
 tcg_out_insn(s, RXY, LRVH, data_reg, arg0, 0, 0);
-tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_R13, 0xffffL);
-tcg_out_insn(s, RRE, NGR, data_reg, 13);
+tgen_ext16u(s, data_reg, data_reg);
 #endif
 break;
 case LD_INT16:
@@ -751,8 +782,7 @@ static void tcg_out_qemu_ld(TCGContext* s, const TCGArg* 
args, int opc)
 #else
 /* swapped sign-extended halfword load */
 tcg_out_insn(s, RXY, LRVH, data_reg, arg0, 0, 0);
-tcg_out_insn(s, RSY, SLLG, data_reg, data_reg, SH64_REG_NONE, 48);
-tcg_out_insn(s, RSY, SRAG, data_reg, data_reg, SH64_REG_NONE, 48);
+tgen_ext16s(s, data_reg, data_reg);
 #endif
 break;
 case LD_UINT32:
@@ -761,7 +791,7 @@ static void tcg_out_qemu_ld(TCGContext* s, const TCGArg* 
args, int opc)
 #else
 /* swapped unsigned int load with upper bits zeroed */
 tcg_out_insn(s, RXY, LRV, data_reg, arg0, 0, 0);
-tcg_out_insn(s, RRE, LLGFR, data_reg, data_reg);
+tgen_ext32u(s, data_reg, data_reg);
 #endif
 break;
 case LD_INT32:
@@ -770,7 +800,7 @@ static void tcg_out_qemu_ld(TCGContext* s, const TCGArg* 
args, int opc)
 #else
 /* swapped sign-extended int load */
 tcg_out_insn(s, RXY, LRV, data_reg, arg0, 0, 0);
-tcg_out_insn(s, RRE, LGFR, data_reg, data_reg);
+tgen_ext32s(s, data_reg, data_reg);
 #endif
 break;
 case LD_UINT64:
@@ -1063,6 +1093,30 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode 
opc,
 op = RSY_SRAG;
 goto do_s

[Qemu-devel] [PATCH 37/62] tcg-s390: Define TCG_TMP0.

2010-05-27 Thread Richard Henderson
Use a define for the temp register instead of hard-coding it.

Signed-off-by: Richard Henderson 
---
 tcg/s390/tcg-target.c |   49 ++---
 1 files changed, 26 insertions(+), 23 deletions(-)

diff --git a/tcg/s390/tcg-target.c b/tcg/s390/tcg-target.c
index ec8c84d..ee2e879 100644
--- a/tcg/s390/tcg-target.c
+++ b/tcg/s390/tcg-target.c
@@ -36,6 +36,9 @@
#define TCG_CT_CONST_S32 0x100
#define TCG_CT_CONST_N32 0x200
 
+#define TCG_TMP0 TCG_REG_R13
+
+
 /* All of the following instructions are prefixed with their instruction
format, and are defined as 8- or 16-bit quantities, even when the two
halves of the 16-bit quantity may appear 32 bits apart in the insn.
@@ -491,7 +494,7 @@ static void tcg_out_ldst(TCGContext *s, S390Opcode opc_rx, 
S390Opcode opc_rxy,
if (ofs < -0x80000 || ofs >= 0x80000) {
 /* Combine the low 16 bits of the offset with the actual load insn;
the high 48 bits must come from an immediate load.  */
-index = TCG_REG_R13;
+index = TCG_TMP0;
tcg_out_movi(s, TCG_TYPE_PTR, index, ofs & ~0xffff);
ofs &= 0xffff;
 }
@@ -658,8 +661,8 @@ static void tgen64_andi(TCGContext *s, TCGReg dest, 
tcg_target_ulong val)
 for (i = 0; i < 4; i++) {
tcg_target_ulong mask = ~(0xffffull << i*16);
 if ((val & mask) == 0) {
-tcg_out_movi(s, TCG_TYPE_I64, TCG_REG_R13, val);
-tcg_out_insn(s, RRE, NGR, dest, TCG_REG_R13);
+tcg_out_movi(s, TCG_TYPE_I64, TCG_TMP0, val);
+tcg_out_insn(s, RRE, NGR, dest, TCG_TMP0);
 return;
 }
 }
@@ -667,8 +670,8 @@ static void tgen64_andi(TCGContext *s, TCGReg dest, 
tcg_target_ulong val)
 for (i = 0; i < 2; i++) {
tcg_target_ulong mask = ~(0xffffffffull << i*32);
 if ((val & mask) == 0) {
-tcg_out_movi(s, TCG_TYPE_I64, TCG_REG_R13, val);
-tcg_out_insn(s, RRE, NGR, dest, TCG_REG_R13);
+tcg_out_movi(s, TCG_TYPE_I64, TCG_TMP0, val);
+tcg_out_insn(s, RRE, NGR, dest, TCG_TMP0);
 return;
 }
 }
@@ -734,8 +737,8 @@ static void tgen64_xori(TCGContext *s, TCGReg dest, 
tcg_target_ulong val)
value first and perform the xor via registers.  This is true for
any 32-bit negative value, where the high 32-bits get flipped too.  */
 if (sval < 0 && sval == (int32_t)sval) {
-tcg_out_movi(s, TCG_TYPE_I64, TCG_REG_R13, sval);
-tcg_out_insn(s, RRE, XGR, dest, TCG_REG_R13);
+tcg_out_movi(s, TCG_TYPE_I64, TCG_TMP0, sval);
+tcg_out_insn(s, RRE, XGR, dest, TCG_TMP0);
 return;
 }
 
@@ -792,8 +795,8 @@ static void tgen_gotoi(TCGContext *s, int cc, 
tcg_target_long dest)
 } else if (off == (int32_t)off) {
 tcg_out_insn(s, RIL, BRCL, cc, off);
 } else {
-tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_R13, dest);
-tcg_out_insn(s, RR, BCR, cc, TCG_REG_R13);
+tcg_out_movi(s, TCG_TYPE_PTR, TCG_TMP0, dest);
+tcg_out_insn(s, RR, BCR, cc, TCG_TMP0);
 }
 }
 
@@ -815,8 +818,8 @@ static void tgen_calli(TCGContext *s, tcg_target_long dest)
 if (off == (int32_t)off) {
 tcg_out_insn(s, RIL, BRASL, TCG_REG_R14, off);
 } else {
-tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_R13, dest);
-tcg_out_insn(s, RR, BASR, TCG_REG_R14, TCG_REG_R13);
+tcg_out_movi(s, TCG_TYPE_PTR, TCG_TMP0, dest);
+tcg_out_insn(s, RR, BASR, TCG_REG_R14, TCG_TMP0);
 }
 }
 
@@ -852,13 +855,13 @@ static void tcg_prepare_qemu_ldst(TCGContext* s, int 
data_reg, int addr_reg,
 tgen64_andi(s, arg1, (CPU_TLB_SIZE - 1) << CPU_TLB_ENTRY_BITS);
 
 if (is_store) {
-tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_R13,
+tcg_out_movi(s, TCG_TYPE_PTR, TCG_TMP0,
  offsetof(CPUState, tlb_table[mem_index][0].addr_write));
 } else {
-tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_R13,
+tcg_out_movi(s, TCG_TYPE_PTR, TCG_TMP0,
  offsetof(CPUState, tlb_table[mem_index][0].addr_read));
 }
-tcg_out_insn(s, RRE, AGR, arg1, TCG_REG_R13);
+tcg_out_insn(s, RRE, AGR, arg1, TCG_TMP0);
 
 tcg_out_insn(s, RRE, AGR, arg1, TCG_AREG0);
 
@@ -1103,16 +1106,16 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode 
opc,
(tcg_target_long)s->code_ptr) >> 1;
 if (off == (int32_t)off) {
 /* load address relative to PC */
-tcg_out_insn(s, RIL, LARL, TCG_REG_R13, off);
+tcg_out_insn(s, RIL, LARL, TCG_TMP0, off);
 } else {
 /* too far for larl */
-tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_R13,
+tcg_out_movi(s, TCG_TYPE_PTR, TCG_TMP0,
  (tcg_target_long)(s->tb_next + args[0]));
 }
 /* load address stored at s->tb_next + args[0

[Qemu-devel] [PATCH 22/62] tcg-s390: Tidy branches.

2010-05-27 Thread Richard Henderson
Add tgen_gotoi to implement conditional and unconditional direct
branches.  Add tgen_branch to implement branches to labels.

Signed-off-by: Richard Henderson 
---
 tcg/s390/tcg-target.c |   96 -
 1 files changed, 55 insertions(+), 41 deletions(-)

diff --git a/tcg/s390/tcg-target.c b/tcg/s390/tcg-target.c
index 21ad1a3..f4dab1a 100644
--- a/tcg/s390/tcg-target.c
+++ b/tcg/s390/tcg-target.c
@@ -41,8 +41,9 @@
halves of the 16-bit quantity may appear 32 bits apart in the insn.
This makes it easy to copy the values from the tables in Appendix B.  */
 typedef enum S390Opcode {
-RIL_LARL= 0xc000,
 RIL_BRASL   = 0xc005,
+RIL_BRCL= 0xc004,
+RIL_LARL= 0xc000,
 
 RI_AGHI = 0xa70b,
 RI_AHI  = 0xa70a,
@@ -175,17 +176,27 @@ static const int tcg_target_call_oarg_regs[] = {
 
 /* signed/unsigned is handled by using COMPARE and COMPARE LOGICAL,
respectively */
+
+#define S390_CC_EQ  8
+#define S390_CC_LT  4
+#define S390_CC_GT  2
+#define S390_CC_OV  1
+#define S390_CC_NE  (S390_CC_LT | S390_CC_GT)
+#define S390_CC_LE  (S390_CC_LT | S390_CC_EQ)
+#define S390_CC_GE  (S390_CC_GT | S390_CC_EQ)
+#define S390_CC_ALWAYS  15
+
 static const uint8_t tcg_cond_to_s390_cond[10] = {
-[TCG_COND_EQ]  = 8,
-[TCG_COND_LT]  = 4,
-[TCG_COND_LTU] = 4,
-[TCG_COND_LE]  = 8 | 4,
-[TCG_COND_LEU] = 8 | 4,
-[TCG_COND_GT]  = 2,
-[TCG_COND_GTU] = 2,
-[TCG_COND_GE]  = 8 | 2,
-[TCG_COND_GEU] = 8 | 2,
-[TCG_COND_NE]  = 4 | 2 | 1,
+[TCG_COND_EQ]  = S390_CC_EQ,
+[TCG_COND_LT]  = S390_CC_LT,
+[TCG_COND_LTU] = S390_CC_LT,
+[TCG_COND_LE]  = S390_CC_LE,
+[TCG_COND_LEU] = S390_CC_LE,
+[TCG_COND_GT]  = S390_CC_GT,
+[TCG_COND_GTU] = S390_CC_GT,
+[TCG_COND_GE]  = S390_CC_GE,
+[TCG_COND_GEU] = S390_CC_GE,
+[TCG_COND_NE]  = S390_CC_NE,
 };
 
 #ifdef CONFIG_SOFTMMU
@@ -455,6 +466,31 @@ static void tgen_setcond(TCGContext *s, TCGType type, 
TCGCond c,
 tcg_out_movi(s, type, dest, 0);
 }
 
+static void tgen_gotoi(TCGContext *s, int cc, tcg_target_long dest)
+{
+tcg_target_long off = (dest - (tcg_target_long)s->code_ptr) >> 1;
+if (off > -0x8000 && off < 0x7fff) {
+tcg_out_insn(s, RI, BRC, cc, off);
+} else if (off == (int32_t)off) {
+tcg_out_insn(s, RIL, BRCL, cc, off);
+} else {
+tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_R13, dest);
+tcg_out_insn(s, RR, BCR, cc, TCG_REG_R13);
+}
+}
+
+static void tgen_branch(TCGContext *s, int cc, int labelno)
+{
+TCGLabel* l = &s->labels[labelno];
+if (l->has_value) {
+tgen_gotoi(s, cc, l->u.value);
+} else {
+tcg_out16(s, RIL_BRCL | (cc << 4));
+tcg_out_reloc(s, s->code_ptr, R_390_PC32DBL, labelno, -2);
+s->code_ptr += 4;
+}
+}
+
 #if defined(CONFIG_SOFTMMU)
 static void tcg_prepare_qemu_ldst(TCGContext* s, int data_reg, int addr_reg,
   int mem_index, int opc,
@@ -507,7 +543,7 @@ static void tcg_prepare_qemu_ldst(TCGContext* s, int 
data_reg, int addr_reg,
 label1_ptr = (uint16_t*)s->code_ptr;
 
 /* je label1 (offset will be patched in later) */
-tcg_out_insn(s, RI, BRC, 8, 0);
+tcg_out_insn(s, RI, BRC, S390_CC_EQ, 0);
 
 /* call load/store helper */
 #if TARGET_LONG_BITS == 32
@@ -551,7 +587,7 @@ static void tcg_prepare_qemu_ldst(TCGContext* s, int 
data_reg, int addr_reg,
 /* jump to label2 (end) */
 *label2_ptr_p = (uint16_t*)s->code_ptr;
 
-tcg_out_insn(s, RI, BRC, 15, 0);
+tcg_out_insn(s, RI, BRC, S390_CC_ALWAYS, 0);
 
 /* this is label1, patch branch */
 *(label1_ptr + 1) = ((unsigned long)s->code_ptr -
@@ -734,16 +770,13 @@ static void tcg_out_qemu_st(TCGContext* s, const TCGArg* 
args, int opc)
 static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
 const TCGArg *args, const int *const_args)
 {
-TCGLabel* l;
 S390Opcode op;
 
 switch (opc) {
 case INDEX_op_exit_tb:
 /* return value */
 tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_R2, args[0]);
-tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_R13, (unsigned long)tb_ret_addr);
-/* br %r13 */
-tcg_out_insn(s, RR, BCR, 15, TCG_REG_R13);
+tgen_gotoi(s, S390_CC_ALWAYS, (unsigned long)tb_ret_addr);
 break;
 
 case INDEX_op_goto_tb:
@@ -763,7 +796,7 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
 /* load address stored at s->tb_next + args[0] */
 tcg_out_ld(s, TCG_TYPE_PTR, TCG_REG_R13, TCG_REG_R13, 0);
 /* and go there */
-tcg_out_insn(s, RR, BASR, TCG_REG_R13, TCG_REG_R13);
+tcg_out_insn(s, RR, BCR, S390_CC_ALWAYS, TCG_REG_R13);
 }
 s->tb_next_offset[args[0]] = s->code_ptr - s->code_buf;
 break;
@@ -971,16 +1004,7 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode 
opc,
 goto do_shift64;
 
 case IND

[Qemu-devel] [PATCH 14/62] tcg-s390: Define tcg_target_reg_names.

2010-05-27 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 tcg/s390/tcg-target.c |6 ++
 1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/tcg/s390/tcg-target.c b/tcg/s390/tcg-target.c
index 2f29728..e0a0e73 100644
--- a/tcg/s390/tcg-target.c
+++ b/tcg/s390/tcg-target.c
@@ -113,6 +113,12 @@
 #define S390_INS_MSR   0xb252
 #define S390_INS_LARL  0xc000
 
+#ifndef NDEBUG
+static const char * const tcg_target_reg_names[TCG_TARGET_NB_REGS] = {
+"%r0", "%r1", "%r2", "%r3", "%r4", "%r5", "%r6", "%r7",
"%r8", "%r9", "%r10", "%r11", "%r12", "%r13", "%r14", "%r15"
+};
+#endif
 
 static const int tcg_target_reg_alloc_order[] = {
 TCG_REG_R6,
-- 
1.7.0.1




[Qemu-devel] [PATCH 32/62] tcg-s390: Implement immediate ANDs.

2010-05-27 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 tcg/s390/tcg-target.c |  138 +
 1 files changed, 127 insertions(+), 11 deletions(-)

diff --git a/tcg/s390/tcg-target.c b/tcg/s390/tcg-target.c
index caa2d0d..2fd58bd 100644
--- a/tcg/s390/tcg-target.c
+++ b/tcg/s390/tcg-target.c
@@ -51,6 +51,8 @@ typedef enum S390Opcode {
 RIL_LGFI= 0xc001,
 RIL_LLIHF   = 0xc00e,
 RIL_LLILF   = 0xc00f,
+RIL_NIHF= 0xc00a,
+RIL_NILF= 0xc00b,
 
 RI_AGHI = 0xa70b,
 RI_AHI  = 0xa70a,
@@ -64,6 +66,10 @@ typedef enum S390Opcode {
 RI_LLIHL= 0xa50d,
 RI_LLILH= 0xa50e,
 RI_LLILL= 0xa50f,
+RI_NIHH = 0xa504,
+RI_NIHL = 0xa505,
+RI_NILH = 0xa506,
+RI_NILL = 0xa507,
 
 RRE_AGR = 0xb908,
 RRE_CGR = 0xb920,
@@ -555,6 +561,113 @@ static inline void tgen64_addi(TCGContext *s, TCGReg 
dest, tcg_target_long val)
 }
 }
 
+static void tgen32_andi(TCGContext *s, TCGReg dest, uint32_t val)
+{
+/* Zero-th, look for no-op.  */
+if (val == -1) {
+return;
+}
+
+/* First, look for the zero-extensions.  */
+if (val == 0xff) {
+tgen_ext8u(s, dest, dest);
+return;
+}
+if (val == 0xffff) {
+tgen_ext16u(s, dest, dest);
+return;
+}
+
+/* Second, try all 32-bit insns that can perform it in one go.  */
+if ((val & 0xffff0000) == 0xffff0000) {
+tcg_out_insn(s, RI, NILL, dest, val);
+return;
+}
+if ((val & 0x0000ffff) == 0x0000ffff) {
+tcg_out_insn(s, RI, NILH, dest, val >> 16);
+return;
+}
+
+/* Lastly, perform the entire operation with a 48-bit insn.  */
+tcg_out_insn(s, RIL, NILF, dest, val);
+}
+
+static void tgen64_andi(TCGContext *s, TCGReg dest, tcg_target_ulong val)
+{
+static const S390Opcode ni_insns[4] = {
+RI_NILL, RI_NILH, RI_NIHL, RI_NIHH
+};
+static const S390Opcode nif_insns[2] = {
+RIL_NILF, RIL_NIHF
+};
+
+int i;
+
+/* Zero-th, look for no-op.  */
+if (val == -1) {
+return;
+}
+
+/* First, look for the zero-extensions.  */
+if (val == 0xff) {
+tgen_ext8u(s, dest, dest);
+return;
+}
+if (val == 0xffff) {
+tgen_ext16u(s, dest, dest);
+return;
+}
+if (val == 0xffffffff) {
+tgen_ext32u(s, dest, dest);
+return;
+}
+
+/* Second, try all 32-bit insns that can perform it in one go.  */
+for (i = 0; i < 4; i++) {
+tcg_target_ulong mask = ~(0xffffull << i*16);
+if ((val & mask) == mask) {
+tcg_out_insn_RI(s, ni_insns[i], dest, val >> i*16);
+return;
+}
+}
+
+/* Third, try all 48-bit insns that can perform it in one go.  */
+for (i = 0; i < 2; i++) {
+tcg_target_ulong mask = ~(0xffffffffull << i*32);
+if ((val & mask) == mask) {
+tcg_out_insn_RIL(s, nif_insns[i], dest, val >> i*32);
+return;
+}
+}
+
+/* Fourth, look for masks that can be loaded with one instruction
+   into a register.  This is slightly smaller than using two 48-bit
+   masks, as below.  */
+for (i = 0; i < 4; i++) {
+tcg_target_ulong mask = ~(0xffffull << i*16);
+if ((val & mask) == 0) {
+tcg_out_movi(s, TCG_TYPE_I64, TCG_REG_R13, val);
+tcg_out_insn(s, RRE, NGR, dest, TCG_REG_R13);
+return;
+}
+}
+
+for (i = 0; i < 2; i++) {
+tcg_target_ulong mask = ~(0xffffffffull << i*32);
+if ((val & mask) == 0) {
+tcg_out_movi(s, TCG_TYPE_I64, TCG_REG_R13, val);
+tcg_out_insn(s, RRE, NGR, dest, TCG_REG_R13);
+return;
+}
+}
+
+/* Last, perform the AND via sequential modifications to the
+   high and low parts.  Do this via recursion to handle 16-bit
+   vs 32-bit masks in each half.  */
+tgen64_andi(s, dest, val | 0xffffffff00000000ull);
+tgen64_andi(s, dest, val | 0x00000000ffffffffull);
+}
+
 static void tgen32_cmp(TCGContext *s, TCGCond c, TCGReg r1, TCGReg r2)
 {
 if (c > TCG_COND_GT) {
@@ -655,13 +768,8 @@ static void tcg_prepare_qemu_ldst(TCGContext* s, int 
data_reg, int addr_reg,
 tcg_out_sh64(s, RSY_SRLG, arg1, addr_reg, SH64_REG_NONE,
  TARGET_PAGE_BITS - CPU_TLB_ENTRY_BITS);
 
-tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_R13,
- TARGET_PAGE_MASK | ((1 << s_bits) - 1));
-tcg_out_insn(s, RRE, NGR, arg0, TCG_REG_R13);
-
-tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_R13,
- (CPU_TLB_SIZE - 1) << CPU_TLB_ENTRY_BITS);
-tcg_out_insn(s, RRE, NGR, arg1, TCG_REG_R13);
+tgen64_andi(s, arg0, TARGET_PAGE_MASK | ((1 << s_bits) - 1));
+tgen64_andi(s, arg1, (CPU_TLB_SIZE - 1) << CPU_TLB_ENTRY_BITS);
 
 if (is_store) {
 tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_R13,
@@ -1029,7 +1137,11 @@ static inline void tcg_out_op(TCGContext *s, 

[Qemu-devel] [PATCH 24/62] tcg-s390: Implement div2.

2010-05-27 Thread Richard Henderson
The s390 divide instructions always produce both remainder and quotient.
Since TCG has no mechanism for allocating even+odd register pairs, force
the use of the R2/R3 register pair.

Signed-off-by: Richard Henderson 
---
 tcg/s390/tcg-target.c |   44 ++--
 tcg/s390/tcg-target.h |4 ++--
 2 files changed, 32 insertions(+), 16 deletions(-)

diff --git a/tcg/s390/tcg-target.c b/tcg/s390/tcg-target.c
index 0bd4276..4c2acca 100644
--- a/tcg/s390/tcg-target.c
+++ b/tcg/s390/tcg-target.c
@@ -75,6 +75,7 @@ typedef enum S390Opcode {
 RR_BCR  = 0x07,
 RR_CLR  = 0x15,
 RR_CR   = 0x19,
+RR_DR   = 0x1d,
 RR_LCR  = 0x13,
 RR_LR   = 0x18,
 RR_NR   = 0x14,
@@ -258,6 +259,14 @@ static int target_parse_constraint(TCGArgConstraint *ct, 
const char **pct_str)
 case 'R':/* not R0 */
 tcg_regset_reset_reg(ct->u.regs, TCG_REG_R0);
 break;
+case 'a':  /* force R2 for division */
+tcg_regset_clear(ct->u.regs);
+tcg_regset_set_reg(ct->u.regs, TCG_REG_R2);
+break;
+case 'b':  /* force R3 for division */
+tcg_regset_clear(ct->u.regs);
+tcg_regset_set_reg(ct->u.regs, TCG_REG_R3);
+break;
 case 'I':
 ct->ct &= ~TCG_CT_REG;
 ct->ct |= TCG_CT_CONST_S16;
@@ -946,16 +955,22 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode 
opc,
 tcg_out_insn(s, RRE, MSGR, args[0], args[2]);
 break;
 
-case INDEX_op_divu_i32:
-case INDEX_op_remu_i32:
-tcg_out_movi(s, TCG_TYPE_I32, TCG_REG_R12, 0);
-tcg_out_insn(s, RR, LR, TCG_REG_R13, args[1]);
-tcg_out_insn(s, RRE, DLR, TCG_REG_R12, args[2]);
-if (opc == INDEX_op_divu_i32) {
-  tcg_out_insn(s, RR, LR, args[0], TCG_REG_R13);/* quotient */
-} else {
-  tcg_out_insn(s, RR, LR, args[0], TCG_REG_R12);/* remainder */
-}
+case INDEX_op_div2_i32:
+tcg_out_insn(s, RR, DR, TCG_REG_R2, args[4]);
+break;
+case INDEX_op_divu2_i32:
+tcg_out_insn(s, RRE, DLR, TCG_REG_R2, args[4]);
+break;
+
+case INDEX_op_div2_i64:
+/* ??? We get an unnecessary sign-extension of the dividend
+   into R3 with this definition, but as we do in fact always
+   produce both quotient and remainder using INDEX_op_div_i64
+   instead requires jumping through even more hoops.  */
+tcg_out_insn(s, RRE, DSGR, TCG_REG_R2, args[4]);
+break;
+case INDEX_op_divu2_i64:
+tcg_out_insn(s, RRE, DLGR, TCG_REG_R2, args[4]);
 break;
 
 case INDEX_op_shl_i32:
@@ -1085,10 +1100,8 @@ static const TCGTargetOpDef s390_op_defs[] = {
 { INDEX_op_sub_i32, { "r", "0", "r" } },
 { INDEX_op_mul_i32, { "r", "0", "r" } },
 
-{ INDEX_op_div_i32, { "r", "r", "r" } },
-{ INDEX_op_divu_i32, { "r", "r", "r" } },
-{ INDEX_op_rem_i32, { "r", "r", "r" } },
-{ INDEX_op_remu_i32, { "r", "r", "r" } },
+{ INDEX_op_div2_i32, { "b", "a", "0", "1", "r" } },
+{ INDEX_op_divu2_i32, { "b", "a", "0", "1", "r" } },
 
 { INDEX_op_and_i32, { "r", "0", "r" } },
 { INDEX_op_or_i32, { "r", "0", "r" } },
@@ -1137,6 +1150,9 @@ static const TCGTargetOpDef s390_op_defs[] = {
 { INDEX_op_sub_i64, { "r", "0", "r" } },
 { INDEX_op_mul_i64, { "r", "0", "r" } },
 
+{ INDEX_op_div2_i64, { "b", "a", "0", "1", "r" } },
+{ INDEX_op_divu2_i64, { "b", "a", "0", "1", "r" } },
+
 { INDEX_op_and_i64, { "r", "0", "r" } },
 { INDEX_op_or_i64, { "r", "0", "r" } },
 { INDEX_op_xor_i64, { "r", "0", "r" } },
diff --git a/tcg/s390/tcg-target.h b/tcg/s390/tcg-target.h
index c81f886..b987a7e 100644
--- a/tcg/s390/tcg-target.h
+++ b/tcg/s390/tcg-target.h
@@ -48,7 +48,7 @@ typedef enum TCGReg {
 #define TCG_TARGET_NB_REGS 16
 
 /* optional instructions */
-#define TCG_TARGET_HAS_div_i32
+#define TCG_TARGET_HAS_div2_i32
 // #define TCG_TARGET_HAS_rot_i32
 // #define TCG_TARGET_HAS_ext8s_i32
 // #define TCG_TARGET_HAS_ext16s_i32
@@ -64,7 +64,7 @@ typedef enum TCGReg {
 // #define TCG_TARGET_HAS_nand_i32
 // #define TCG_TARGET_HAS_nor_i32
 
-// #define TCG_TARGET_HAS_div_i64
+#define TCG_TARGET_HAS_div2_i64
 // #define TCG_TARGET_HAS_rot_i64
 // #define TCG_TARGET_HAS_ext8s_i64
 // #define TCG_TARGET_HAS_ext16s_i64
-- 
1.7.0.1




[Qemu-devel] [PATCH 20/62] tcg-s390: Implement setcond.

2010-05-27 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 tcg/s390/tcg-target.c |   66 ++--
 1 files changed, 47 insertions(+), 19 deletions(-)

diff --git a/tcg/s390/tcg-target.c b/tcg/s390/tcg-target.c
index f21a9ca..b150d1a 100644
--- a/tcg/s390/tcg-target.c
+++ b/tcg/s390/tcg-target.c
@@ -381,6 +381,42 @@ static void tcg_out_ld(TCGContext *s, TCGType type, TCGReg data,
 }
 }
 
+static void tgen32_cmp(TCGContext *s, TCGCond c, TCGReg r1, TCGReg r2)
+{
+if (c > TCG_COND_GT) {
+/* unsigned */
+tcg_out_insn(s, RR, CLR, r1, r2);
+} else {
+/* signed */
+tcg_out_insn(s, RR, CR, r1, r2);
+}
+}
+
+static void tgen64_cmp(TCGContext *s, TCGCond c, TCGReg r1, TCGReg r2)
+{
+if (c > TCG_COND_GT) {
+/* unsigned */
+tcg_out_insn(s, RRE, CLGR, r1, r2);
+} else {
+/* signed */
+tcg_out_insn(s, RRE, CGR, r1, r2);
+}
+}
+
+static void tgen_setcond(TCGContext *s, TCGType type, TCGCond c,
+ TCGReg dest, TCGReg r1, TCGReg r2)
+{
+if (type == TCG_TYPE_I32) {
+tgen32_cmp(s, c, r1, r2);
+} else {
+tgen64_cmp(s, c, r1, r2);
+}
+/* Emit: r1 = 1; if (cc) goto over; r1 = 0; over:  */
+tcg_out_movi(s, type, dest, 1);
+tcg_out_insn(s, RI, BRC, tcg_cond_to_s390_cond[c], (4 + 4) >> 1);
+tcg_out_movi(s, type, dest, 0);
+}
+
 #if defined(CONFIG_SOFTMMU)
 static void tcg_prepare_qemu_ldst(TCGContext* s, int data_reg, int addr_reg,
   int mem_index, int opc,
@@ -958,27 +994,10 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
 break;
 
 case INDEX_op_brcond_i64:
-if (args[2] > TCG_COND_GT) {
-/* unsigned */
-/* clgr %ra0, %ra1 */
-tcg_out_insn(s, RRE, CLGR, args[0], args[1]);
-} else {
-/* signed */
-/* cgr %ra0, %ra1 */
-tcg_out_insn(s, RRE, CGR, args[0], args[1]);
-}
+tgen64_cmp(s, args[2], args[0], args[1]);
 goto do_brcond;
-
 case INDEX_op_brcond_i32:
-if (args[2] > TCG_COND_GT) {
-/* unsigned */
-/* clr %ra0, %ra1 */
-tcg_out_insn(s, RR, CLR, args[0], args[1]);
-} else {
-/* signed */
-/* cr %ra0, %ra1 */
-tcg_out_insn(s, RR, CR, args[0], args[1]);
-}
+tgen32_cmp(s, args[2], args[0], args[1]);
 do_brcond:
 l = &s->labels[args[3]];
 if (l->has_value) {
@@ -993,6 +1012,13 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
 tcg_out_insn(s, RR, BCR, tcg_cond_to_s390_cond[args[2]], TCG_REG_R13);
 break;
 
+case INDEX_op_setcond_i32:
+tgen_setcond(s, TCG_TYPE_I32, args[3], args[0], args[1], args[2]);
+break;
+case INDEX_op_setcond_i64:
+tgen_setcond(s, TCG_TYPE_I64, args[3], args[0], args[1], args[2]);
+break;
+
 case INDEX_op_qemu_ld8u:
 tcg_out_qemu_ld(s, args, LD_UINT8);
 break;
@@ -1083,6 +1109,7 @@ static const TCGTargetOpDef s390_op_defs[] = {
 { INDEX_op_sar_i32, { "r", "0", "Ri" } },
 
 { INDEX_op_brcond_i32, { "r", "r" } },
+{ INDEX_op_setcond_i32, { "r", "r", "r" } },
 
 { INDEX_op_qemu_ld8u, { "r", "L" } },
 { INDEX_op_qemu_ld8s, { "r", "L" } },
@@ -1129,6 +1156,7 @@ static const TCGTargetOpDef s390_op_defs[] = {
 { INDEX_op_sar_i64, { "r", "r", "Ri" } },
 
 { INDEX_op_brcond_i64, { "r", "r" } },
+{ INDEX_op_setcond_i64, { "r", "r", "r" } },
 #endif
 
 { -1 },
-- 
1.7.0.1




[Qemu-devel] [PATCH 18/62] tcg-s390: Use matching constraints.

2010-05-27 Thread Richard Henderson
Simplify the generation within tcg_out_op by forcing arg1 == arg0 for
the two-operand instructions.

In addition, fix the use of the 64-bit shift insns in implementing the
32-bit shifts.  This would yield incorrect results for the right shifts.

Signed-off-by: Richard Henderson 
---
 tcg/s390/tcg-target.c |  181 +++--
 1 files changed, 39 insertions(+), 142 deletions(-)

diff --git a/tcg/s390/tcg-target.c b/tcg/s390/tcg-target.c
index 0deb332..c45d8b5 100644
--- a/tcg/s390/tcg-target.c
+++ b/tcg/s390/tcg-target.c
@@ -683,7 +683,7 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
 const TCGArg *args, const int *const_args)
 {
 TCGLabel* l;
-S390Opcode op, op2;
+S390Opcode op;
 
 switch (opc) {
 case INDEX_op_exit_tb:
@@ -842,111 +842,43 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
 
 case INDEX_op_add_i32:
 if (const_args[2]) {
-if (args[0] == args[1]) {
-tcg_out_insn(s, RI, AHI, args[1], args[2]);
-} else {
-tcg_out_insn(s, RR, LR, args[0], args[1]);
-tcg_out_insn(s, RI, AHI, args[0], args[2]);
-}
-} else if (args[0] == args[1]) {
-tcg_out_insn(s, RR, AR, args[1], args[2]);
-} else if (args[0] == args[2]) {
-tcg_out_insn(s, RR, AR, args[0], args[1]);
+tcg_out_insn(s, RI, AHI, args[0], args[2]);
 } else {
-tcg_out_insn(s, RR, LR, args[0], args[1]);
 tcg_out_insn(s, RR, AR, args[0], args[2]);
 }
 break;
 
-case INDEX_op_sub_i32:
-if (args[0] == args[1]) {
-/* sr %ra0/1, %ra2 */
-tcg_out_insn(s, RR, SR, args[1], args[2]);
-} else if (args[0] == args[2]) {
-/* lr %r13, %raa0/2 */
-tcg_out_insn(s, RR, LR, TCG_REG_R13, args[2]);
-/* lr %ra0/2, %ra1 */
-tcg_out_insn(s, RR, LR, args[0], args[1]);
-/* sr %ra0/2, %r13 */
-tcg_out_insn(s, RR, SR, args[0], TCG_REG_R13);
-} else {
-/* lr %ra0, %ra1 */
-tcg_out_insn(s, RR, LR, args[0], args[1]);
-/* sr %ra0, %ra2 */
-tcg_out_insn(s, RR, SR, args[0], args[2]);
-}
+case INDEX_op_add_i64:
+tcg_out_insn(s, RRE, AGR, args[0], args[2]);
 break;
 
-case INDEX_op_sub_i64:
-if (args[0] == args[1]) {
-/* sgr %ra0/1, %ra2 */
-tcg_out_insn(s, RRE, SGR, args[1], args[2]);
-} else if (args[0] == args[2]) {
-tcg_out_mov(s, TCG_REG_R13, args[2]);
-tcg_out_mov(s, args[0], args[1]);
-/* sgr %ra0/2, %r13 */
-tcg_out_insn(s, RRE, SGR, args[0], TCG_REG_R13);
-} else {
-tcg_out_mov(s, args[0], args[1]);
-/* sgr %ra0, %ra2 */
-tcg_out_insn(s, RRE, SGR, args[0], args[2]);
-}
+case INDEX_op_sub_i32:
+tcg_out_insn(s, RR, SR, args[0], args[2]);
 break;
 
-case INDEX_op_add_i64:
-if (args[0] == args[1]) {
-tcg_out_insn(s, RRE, AGR, args[1], args[2]);
-} else if (args[0] == args[2]) {
-tcg_out_insn(s, RRE, AGR, args[0], args[1]);
-} else {
-tcg_out_mov(s, args[0], args[1]);
-tcg_out_insn(s, RRE, AGR, args[0], args[2]);
-}
+case INDEX_op_sub_i64:
+tcg_out_insn(s, RRE, SGR, args[0], args[2]);
 break;
 
 case INDEX_op_and_i32:
-op = RR_NR;
-do_logic_i32:
-if (args[0] == args[1]) {
-/* xr %ra0/1, %ra2 */
-tcg_out_insn_RR(s, op, args[1], args[2]);
-} else if (args[0] == args[2]) {
-/* xr %ra0/2, %ra1 */
-tcg_out_insn_RR(s, op, args[0], args[1]);
-} else {
-/* lr %ra0, %ra1 */
-tcg_out_insn(s, RR, LR, args[0], args[1]);
-/* xr %ra0, %ra2 */
-tcg_out_insn_RR(s, op, args[0], args[2]);
-}
+tcg_out_insn(s, RR, NR, args[0], args[2]);
 break;
-
 case INDEX_op_or_i32:
-op = RR_OR;
-goto do_logic_i32;
+tcg_out_insn(s, RR, OR, args[0], args[2]);
+break;
 case INDEX_op_xor_i32:
-op = RR_XR;
-goto do_logic_i32;
+tcg_out_insn(s, RR, XR, args[0], args[2]);
+break;
 
 case INDEX_op_and_i64:
-op = RRE_NGR;
-do_logic_i64:
-if (args[0] == args[1]) {
-tcg_out_insn_RRE(s, op, args[0], args[2]);
-} else if (args[0] == args[2]) {
-tcg_out_insn_RRE(s, op, args[0], args[1]);
-} else {
-tcg_out_mov(s, args[0], args[1]);
-tcg_out_insn_RRE(s, op, args[0], args[2]);
-}
+tcg_out_insn(s, RRE, NGR, args[0], args[2]);
 break;
-
 case INDEX_op_or_i64:
-op = RRE_OGR;
-goto do_logic_i64;
+tcg_out_insn(s, RRE, OG

[Qemu-devel] [PATCH 23/62] tcg-s390: Add tgen_calli.

2010-05-27 Thread Richard Henderson
Use it in the softmmu code paths, and INDEX_op_call.

Signed-off-by: Richard Henderson 
---
 tcg/s390/tcg-target.c |   45 -
 1 files changed, 16 insertions(+), 29 deletions(-)

diff --git a/tcg/s390/tcg-target.c b/tcg/s390/tcg-target.c
index f4dab1a..0bd4276 100644
--- a/tcg/s390/tcg-target.c
+++ b/tcg/s390/tcg-target.c
@@ -347,12 +347,6 @@ static void tcg_out_sh32(TCGContext* s, S390Opcode op, TCGReg dest,
 tcg_out_insn_RS(s, op, dest, sh_reg, 0, sh_imm);
 }
 
-/* branch to relative address (long) */
-static void tcg_out_brasl(TCGContext *s, TCGReg r, tcg_target_long raddr)
-{
-tcg_out_insn(s, RIL, BRASL, r, raddr >> 1);
-}
-
 static inline void tcg_out_mov(TCGContext *s, int ret, int arg)
 {
 /* ??? With a TCGType argument, we could emit the smaller LR insn.  */
@@ -372,7 +366,7 @@ static inline void tcg_out_movi(TCGContext *s, TCGType type,
 tcg_out_insn(s, RI, IILH, ret, arg >> 16);
 } else {
 /* branch over constant and store its address in R13 */
-tcg_out_brasl(s, TCG_REG_R13, 14);
+tcg_out_insn(s, RIL, BRASL, TCG_REG_R13, (6 + 8) >> 1);
 /* 64-bit constant */
 tcg_out32(s, arg >> 32);
 tcg_out32(s, arg);
@@ -491,6 +485,17 @@ static void tgen_branch(TCGContext *s, int cc, int labelno)
 }
 }
 
+static void tgen_calli(TCGContext *s, tcg_target_long dest)
+{
+tcg_target_long off = (dest - (tcg_target_long)s->code_ptr) >> 1;
+if (off == (int32_t)off) {
+tcg_out_insn(s, RIL, BRASL, TCG_REG_R14, off);
+} else {
+tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_R13, dest);
+tcg_out_insn(s, RR, BASR, TCG_REG_R14, TCG_REG_R13);
+}
+}
+
 #if defined(CONFIG_SOFTMMU)
 static void tcg_prepare_qemu_ldst(TCGContext* s, int data_reg, int addr_reg,
   int mem_index, int opc,
@@ -555,14 +560,10 @@ static void tcg_prepare_qemu_ldst(TCGContext* s, int data_reg, int addr_reg,
 if (is_store) {
 tcg_out_mov(s, arg1, data_reg);
 tcg_out_movi(s, TCG_TYPE_I32, arg2, mem_index);
-tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_R13,
- (tcg_target_ulong)qemu_st_helpers[s_bits]);
-tcg_out_insn(s, RR, BASR, TCG_REG_R14, TCG_REG_R13);
+tgen_calli(s, (tcg_target_ulong)qemu_st_helpers[s_bits]);
 } else {
 tcg_out_movi(s, TCG_TYPE_I32, arg1, mem_index);
-tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_R13,
- (tcg_target_ulong)qemu_ld_helpers[s_bits]);
-tcg_out_insn(s, RR, BASR, TCG_REG_R14, TCG_REG_R13);
+tgen_calli(s, (tcg_target_ulong)qemu_ld_helpers[s_bits]);
 
 /* sign extension */
 switch (opc) {
@@ -785,7 +786,7 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
 } else {
 tcg_target_long off = ((tcg_target_long)(s->tb_next + args[0]) -
(tcg_target_long)s->code_ptr) >> 1;
-if (off > -0x8000L && off < 0x7fffL) {
+if (off == (int32_t)off) {
 /* load address relative to PC */
 tcg_out_insn(s, RIL, LARL, TCG_REG_R13, off);
 } else {
@@ -803,22 +804,8 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
 
 case INDEX_op_call:
 if (const_args[0]) {
-tcg_target_long off;
-
-/* FIXME: + 4? Where did that come from? */
-off = (args[0] - (tcg_target_long)s->code_ptr + 4) >> 1;
-if (off > -0x8000 && off < 0x7fff) {
-/* relative call */
-tcg_out_brasl(s, TCG_REG_R14, off << 1);
-/* XXX untested */
-tcg_abort();
-} else {
-/* too far for a relative call, load full address */
-tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_R13, args[0]);
-tcg_out_insn(s, RR, BASR, TCG_REG_R14, TCG_REG_R13);
-}
+tgen_calli(s, args[0]);
 } else {
-/* call function in register args[0] */
 tcg_out_insn(s, RR, BASR, TCG_REG_R14, args[0]);
 }
 break;
-- 
1.7.0.1




[Qemu-devel] [PATCH 31/62] tcg-s390: Use the extended-immediate facility for add/sub.

2010-05-27 Thread Richard Henderson
This gives us 32-bit immediate addends.

Signed-off-by: Richard Henderson 
---
 tcg/s390/tcg-target.c |   68 +---
 1 files changed, 52 insertions(+), 16 deletions(-)

diff --git a/tcg/s390/tcg-target.c b/tcg/s390/tcg-target.c
index cf70cc2..caa2d0d 100644
--- a/tcg/s390/tcg-target.c
+++ b/tcg/s390/tcg-target.c
@@ -33,14 +33,16 @@
 do { } while (0)
 #endif
 
-#define TCG_CT_CONST_S160x100
-#define TCG_CT_CONST_U120x200
+#define TCG_CT_CONST_S320x100
+#define TCG_CT_CONST_N320x200
 
 /* All of the following instructions are prefixed with their instruction
format, and are defined as 8- or 16-bit quantities, even when the two
halves of the 16-bit quantity may appear 32 bits apart in the insn.
This makes it easy to copy the values from the tables in Appendix B.  */
 typedef enum S390Opcode {
+RIL_AFI = 0xc209,
+RIL_AGFI= 0xc208,
 RIL_BRASL   = 0xc005,
 RIL_BRCL= 0xc004,
 RIL_LARL= 0xc000,
@@ -288,7 +290,11 @@ static int target_parse_constraint(TCGArgConstraint *ct, const char **pct_str)
 break;
 case 'I':
 ct->ct &= ~TCG_CT_REG;
-ct->ct |= TCG_CT_CONST_S16;
+ct->ct |= TCG_CT_CONST_S32;
+break;
+case 'J':
+ct->ct &= ~TCG_CT_REG;
+ct->ct |= TCG_CT_CONST_N32;
 break;
 default:
 break;
@@ -305,10 +311,12 @@ static inline int tcg_target_const_match(tcg_target_long val,
 {
 int ct = arg_ct->ct;
 
-if ((ct & TCG_CT_CONST) ||
-   ((ct & TCG_CT_CONST_S16) && val == (int16_t)val) ||
-   ((ct & TCG_CT_CONST_U12) && val == (val & 0xfff))) {
+if (ct & TCG_CT_CONST) {
 return 1;
+} else if (ct & TCG_CT_CONST_S32) {
+return val == (int32_t)val;
+} else if (ct & TCG_CT_CONST_N32) {
+return -val == (int32_t)-val;
 }
 
 return 0;
@@ -529,6 +537,24 @@ static inline void tgen_ext32u(TCGContext *s, TCGReg dest, TCGReg src)
 tcg_out_insn(s, RRE, LLGFR, dest, src);
 }
 
+static inline void tgen32_addi(TCGContext *s, TCGReg dest, tcg_target_long val)
+{
+if (val == (int16_t)val) {
+tcg_out_insn(s, RI, AHI, dest, val);
+} else {
+tcg_out_insn(s, RIL, AFI, dest, val);
+}
+}
+
+static inline void tgen64_addi(TCGContext *s, TCGReg dest, tcg_target_long val)
+{
+if (val == (int16_t)val) {
+tcg_out_insn(s, RI, AGHI, dest, val);
+} else {
+tcg_out_insn(s, RIL, AGFI, dest, val);
+}
+}
+
 static void tgen32_cmp(TCGContext *s, TCGCond c, TCGReg r1, TCGReg r2)
 {
 if (c > TCG_COND_GT) {
@@ -974,22 +1000,32 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
 
 case INDEX_op_add_i32:
 if (const_args[2]) {
-tcg_out_insn(s, RI, AHI, args[0], args[2]);
+tgen32_addi(s, args[0], args[2]);
 } else {
 tcg_out_insn(s, RR, AR, args[0], args[2]);
 }
 break;
-
 case INDEX_op_add_i64:
-tcg_out_insn(s, RRE, AGR, args[0], args[2]);
+if (const_args[2]) {
+tgen64_addi(s, args[0], args[2]);
+} else {
+tcg_out_insn(s, RRE, AGR, args[0], args[2]);
+}
 break;
 
 case INDEX_op_sub_i32:
-tcg_out_insn(s, RR, SR, args[0], args[2]);
+if (const_args[2]) {
+tgen32_addi(s, args[0], -args[2]);
+} else {
+tcg_out_insn(s, RR, SR, args[0], args[2]);
+}
 break;
-
 case INDEX_op_sub_i64:
-tcg_out_insn(s, RRE, SGR, args[0], args[2]);
+if (const_args[2]) {
+tgen64_addi(s, args[0], -args[2]);
+} else {
+tcg_out_insn(s, RRE, SGR, args[0], args[2]);
+}
 break;
 
 case INDEX_op_and_i32:
@@ -1254,8 +1290,8 @@ static const TCGTargetOpDef s390_op_defs[] = {
 { INDEX_op_st16_i32, { "r", "r" } },
 { INDEX_op_st_i32, { "r", "r" } },
 
-{ INDEX_op_add_i32, { "r", "0", "rI" } },
-{ INDEX_op_sub_i32, { "r", "0", "r" } },
+{ INDEX_op_add_i32, { "r", "0", "ri" } },
+{ INDEX_op_sub_i32, { "r", "0", "ri" } },
 { INDEX_op_mul_i32, { "r", "0", "r" } },
 
 { INDEX_op_div2_i32, { "b", "a", "0", "1", "r" } },
@@ -1315,8 +1351,8 @@ static const TCGTargetOpDef s390_op_defs[] = {
 { INDEX_op_st32_i64, { "r", "r" } },
 { INDEX_op_st_i64, { "r", "r" } },
 
-{ INDEX_op_add_i64, { "r", "0", "r" } },
-{ INDEX_op_sub_i64, { "r", "0", "r" } },
+{ INDEX_op_add_i64, { "r", "0", "rI" } },
+{ INDEX_op_sub_i64, { "r", "0", "rJ" } },
 { INDEX_op_mul_i64, { "r", "0", "r" } },
 
 { INDEX_op_div2_i64, { "b", "a", "0", "1", "r" } },
-- 
1.7.0.1




[Qemu-devel] [PATCH 28/62] tcg-s390: Implement rotates.

2010-05-27 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 tcg/s390/tcg-target.c |   46 ++
 tcg/s390/tcg-target.h |4 ++--
 2 files changed, 48 insertions(+), 2 deletions(-)

diff --git a/tcg/s390/tcg-target.c b/tcg/s390/tcg-target.c
index 7c7adb3..f85063e 100644
--- a/tcg/s390/tcg-target.c
+++ b/tcg/s390/tcg-target.c
@@ -100,6 +100,8 @@ typedef enum S390Opcode {
 RR_SR   = 0x1b,
 RR_XR   = 0x17,
 
+RSY_RLL = 0xeb1d,
+RSY_RLLG= 0xeb1c,
 RSY_SLLG= 0xeb0d,
 RSY_SRAG= 0xeb0a,
 RSY_SRLG= 0xeb0c,
@@ -1095,6 +1097,44 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
 op = RSY_SRAG;
 goto do_shift64;
 
+case INDEX_op_rotl_i32:
+/* ??? Using tcg_out_sh64 here for the format; it is a 32-bit rol.  */
+if (const_args[2]) {
+tcg_out_sh64(s, RSY_RLL, args[0], args[1], SH32_REG_NONE, args[2]);
+} else {
+tcg_out_sh64(s, RSY_RLL, args[0], args[1], args[2], 0);
+}
+break;
+case INDEX_op_rotr_i32:
+if (const_args[2]) {
+tcg_out_sh64(s, RSY_RLL, args[0], args[1],
+ SH32_REG_NONE, (32 - args[2]) & 31);
+} else {
+tcg_out_insn(s, RR, LCR, TCG_REG_R13, args[2]);
+tcg_out_sh64(s, RSY_RLL, args[0], args[1], TCG_REG_R13, 0);
+}
+break;
+
+case INDEX_op_rotl_i64:
+if (const_args[2]) {
+tcg_out_sh64(s, RSY_RLLG, args[0], args[1],
+ SH64_REG_NONE, args[2]);
+} else {
+tcg_out_sh64(s, RSY_RLLG, args[0], args[1], args[2], 0);
+}
+break;
+case INDEX_op_rotr_i64:
+if (const_args[2]) {
+tcg_out_sh64(s, RSY_RLLG, args[0], args[1],
+ SH64_REG_NONE, (64 - args[2]) & 63);
+} else {
+/* We can use the smaller 32-bit negate because only the
+   low 6 bits are examined for the rotate.  */
+tcg_out_insn(s, RR, LCR, TCG_REG_R13, args[2]);
+tcg_out_sh64(s, RSY_RLLG, args[0], args[1], TCG_REG_R13, 0);
+}
+break;
+
 case INDEX_op_ext8s_i32:
 case INDEX_op_ext8s_i64:
 tgen_ext8s(s, args[0], args[1]);
@@ -1241,6 +1281,9 @@ static const TCGTargetOpDef s390_op_defs[] = {
 { INDEX_op_shr_i32, { "r", "0", "Ri" } },
 { INDEX_op_sar_i32, { "r", "0", "Ri" } },
 
+{ INDEX_op_rotl_i32, { "r", "r", "Ri" } },
+{ INDEX_op_rotr_i32, { "r", "r", "Ri" } },
+
 { INDEX_op_ext8s_i32, { "r", "r" } },
 { INDEX_op_ext8u_i32, { "r", "r" } },
 { INDEX_op_ext16s_i32, { "r", "r" } },
@@ -1299,6 +1342,9 @@ static const TCGTargetOpDef s390_op_defs[] = {
 { INDEX_op_shr_i64, { "r", "r", "Ri" } },
 { INDEX_op_sar_i64, { "r", "r", "Ri" } },
 
+{ INDEX_op_rotl_i64, { "r", "r", "Ri" } },
+{ INDEX_op_rotr_i64, { "r", "r", "Ri" } },
+
 { INDEX_op_ext8s_i64, { "r", "r" } },
 { INDEX_op_ext8u_i64, { "r", "r" } },
 { INDEX_op_ext16s_i64, { "r", "r" } },
diff --git a/tcg/s390/tcg-target.h b/tcg/s390/tcg-target.h
index 76f1d03..0af4d38 100644
--- a/tcg/s390/tcg-target.h
+++ b/tcg/s390/tcg-target.h
@@ -49,7 +49,7 @@ typedef enum TCGReg {
 
 /* optional instructions */
 #define TCG_TARGET_HAS_div2_i32
-// #define TCG_TARGET_HAS_rot_i32
+#define TCG_TARGET_HAS_rot_i32
 #define TCG_TARGET_HAS_ext8s_i32
 #define TCG_TARGET_HAS_ext16s_i32
 #define TCG_TARGET_HAS_ext8u_i32
@@ -65,7 +65,7 @@ typedef enum TCGReg {
 // #define TCG_TARGET_HAS_nor_i32
 
 #define TCG_TARGET_HAS_div2_i64
-// #define TCG_TARGET_HAS_rot_i64
+#define TCG_TARGET_HAS_rot_i64
 #define TCG_TARGET_HAS_ext8s_i64
 #define TCG_TARGET_HAS_ext16s_i64
 #define TCG_TARGET_HAS_ext32s_i64
-- 
1.7.0.1




[Qemu-devel] [PATCH 17/62] tcg-s390: Reorganize instruction emission

2010-05-27 Thread Richard Henderson
Tie the opcode names to the format, and arrange for moderate
compile-time checking that the instruction format output routine
matches the format used by the opcode.

Signed-off-by: Richard Henderson 
---
 tcg/s390/tcg-target.c |  625 +++--
 tcg/s390/tcg-target.h |5 +-
 2 files changed, 297 insertions(+), 333 deletions(-)

diff --git a/tcg/s390/tcg-target.c b/tcg/s390/tcg-target.c
index e0a0e73..0deb332 100644
--- a/tcg/s390/tcg-target.c
+++ b/tcg/s390/tcg-target.c
@@ -36,64 +36,86 @@
 #define TCG_CT_CONST_S160x100
 #define TCG_CT_CONST_U120x200
 
-#define E3_LG  0x04
-#define E3_LRVG0x0f
-#define E3_LGF 0x14
-#define E3_LGH 0x15
-#define E3_LLGF0x16
-#define E3_LRV 0x1e
-#define E3_LRVH0x1f
-#define E3_CG  0x20
-#define E3_STG 0x24
-#define E3_STRVG   0x2f
-#define E3_STRV0x3e
-#define E3_STRVH   0x3f
-#define E3_STHY0x70
-#define E3_STCY0x72
-#define E3_LGB 0x77
-#define E3_LLGC0x90
-#define E3_LLGH0x91
-
-#define B9_LGR 0x04
-#define B9_AGR 0x08
-#define B9_SGR 0x09
-#define B9_MSGR0x0c
-#define B9_LGFR0x14
-#define B9_LLGFR   0x16
-#define B9_CGR 0x20
-#define B9_CLGR0x21
-#define B9_NGR 0x80
-#define B9_OGR 0x81
-#define B9_XGR 0x82
-#define B9_DLGR0x87
-#define B9_DLR 0x97
-
-#define RR_BASR0x0d
-#define RR_NR  0x14
-#define RR_CLR 0x15
-#define RR_OR  0x16
-#define RR_XR  0x17
-#define RR_LR  0x18
-#define RR_CR  0x19
-#define RR_AR  0x1a
-#define RR_SR  0x1b
-
-#define A7_AHI 0xa
-#define A7_AHGI0xb
-
-#define SH64_REG_NONE  0x00 /* use immediate only (not R0!) */
-#define SH64_SRAG  0x0a
-#define SH64_SRLG  0x0c
-#define SH64_SLLG  0x0d
-
-#define SH32_REG_NONE  0x00 /* use immediate only (not R0!) */
-#define SH32_SRL   0x08
-#define SH32_SLL   0x09
-#define SH32_SRA   0x0a
-
-#define ST_STH 0x40
-#define ST_STC 0x42
-#define ST_ST  0x50
+/* All of the following instructions are prefixed with their instruction
+   format, and are defined as 8- or 16-bit quantities, even when the two
+   halves of the 16-bit quantity may appear 32 bits apart in the insn.
+   This makes it easy to copy the values from the tables in Appendix B.  */
+typedef enum S390Opcode {
+RIL_LARL= 0xc000,
+RIL_BRASL   = 0xc005,
+
+RI_AGHI = 0xa70b,
+RI_AHI  = 0xa70a,
+RI_BRC  = 0xa704,
+RI_IILH = 0xa502,
+RI_LGHI = 0xa709,
+RI_LLILL= 0xa50f,
+
+RRE_AGR = 0xb908,
+RRE_CGR = 0xb920,
+RRE_CLGR= 0xb921,
+RRE_DLGR= 0xb987,
+RRE_DLR = 0xb997,
+RRE_DSGFR   = 0xb91d,
+RRE_DSGR= 0xb90d,
+RRE_LCGR= 0xb903,
+RRE_LGFR= 0xb914,
+RRE_LGR = 0xb904,
+RRE_LLGFR   = 0xb916,
+RRE_MSGR= 0xb90c,
+RRE_MSR = 0xb252,
+RRE_NGR = 0xb980,
+RRE_OGR = 0xb981,
+RRE_SGR = 0xb909,
+RRE_XGR = 0xb982,
+
+RR_AR   = 0x1a,
+RR_BASR = 0x0d,
+RR_BCR  = 0x07,
+RR_CLR  = 0x15,
+RR_CR   = 0x19,
+RR_LCR  = 0x13,
+RR_LR   = 0x18,
+RR_NR   = 0x14,
+RR_OR   = 0x16,
+RR_SR   = 0x1b,
+RR_XR   = 0x17,
+
+RSY_SLLG= 0xeb0d,
+RSY_SRAG= 0xeb0a,
+RSY_SRLG= 0xeb0c,
+
+RS_SLL  = 0x89,
+RS_SRA  = 0x8a,
+RS_SRL  = 0x88,
+
+RXY_CG  = 0xe320,
+RXY_LG  = 0xe304,
+RXY_LGB = 0xe377,
+RXY_LGF = 0xe314,
+RXY_LGH = 0xe315,
+RXY_LLGC= 0xe390,
+RXY_LLGF= 0xe316,
+RXY_LLGH= 0xe391,
+RXY_LMG = 0xeb04,
+RXY_LRV = 0xe31e,
+RXY_LRVG= 0xe30f,
+RXY_LRVH= 0xe31f,
+RXY_STCY= 0xe372,
+RXY_STG = 0xe324,
+RXY_STHY= 0xe370,
+RXY_STMG= 0xeb24,
+RXY_STRV= 0xe33e,
+RXY_STRVG   = 0xe32f,
+RXY_STRVH   = 0xe33f,
+
+RX_ST   = 0x50,
+RX_STC  = 0x42,
+RX_STH  = 0x40,
+} S390Opcode;
+
+#define SH32_REG_NONE  0
+#define SH64_REG_NONE  0
 
 #define LD_SIGNED  0x04
 #define LD_UINT8   0x00
@@ -105,14 +127,6 @@
 #define LD_UINT64  0x03
 #define LD_INT64   (LD_UINT64 | LD_SIGNED)
 
-#define S390_INS_BCR   0x0700
-#define S390_INS_BR(S390_INS_BCR | 0x00f0)
-#define S390_INS_IILH  0xa502
-#define S390_INS_LLILL 0xa50f
-#define S390_INS_LGHI  0xa709
-#define S390_INS_MSR   0xb252
-#define S390_INS_LARL  0xc000
-
 #ifndef NDEBUG
 static const char * const tcg_target_reg_names[TCG_TARGET_NB_REGS] = {
 "%r0", "%r1", "%r2", "%r3", "%r4", "%r5", "%r6", "%r7",
@@ -204,7 +218,7 @@ static void patch_reloc(uint8_t *code_ptr, int type,
 }
 
 static int tcg_target_get_call_iarg_r

[Qemu-devel] [PATCH 12/62] tcg-s390: Eliminate the S constraint.

2010-05-27 Thread Richard Henderson
R4 is not clobbered until all of the inputs are consumed,
so there's no need to avoid R4 in the qemu_st paths.

Signed-off-by: Richard Henderson 
---
 tcg/s390/tcg-target.c |   30 ++
 1 files changed, 6 insertions(+), 24 deletions(-)

diff --git a/tcg/s390/tcg-target.c b/tcg/s390/tcg-target.c
index 455cf6a..2f29728 100644
--- a/tcg/s390/tcg-target.c
+++ b/tcg/s390/tcg-target.c
@@ -202,20 +202,6 @@ static int tcg_target_get_call_iarg_regs_count(int flags)
 return sizeof(tcg_target_call_iarg_regs) / sizeof(int);
 }
 
-static void constraint_softmmu(TCGArgConstraint *ct, const char c)
-{
-#ifdef CONFIG_SOFTMMU
-switch (c) {
-case 'S':   /* qemu_st constraint */
-tcg_regset_reset_reg (ct->u.regs, TCG_REG_R4);
-/* fall through */
-case 'L':   /* qemu_ld constraint */
-tcg_regset_reset_reg (ct->u.regs, TCG_REG_R3);
-break;
-}
-#endif
-  }
-
 /* parse target specific constraints */
 static int target_parse_constraint(TCGArgConstraint *ct, const char **pct_str)
 {
@@ -226,13 +212,9 @@ static int target_parse_constraint(TCGArgConstraint *ct, const char **pct_str)
 ct_str = *pct_str;
 
 switch (ct_str[0]) {
-case 'L':   /* qemu_ld constraint */
+case 'L':   /* qemu_ld/st constraint */
 tcg_regset_reset_reg (ct->u.regs, TCG_REG_R2);
-constraint_softmmu(ct, 'L');
-break;
-case 'S':   /* qemu_st constraint */
-tcg_regset_reset_reg (ct->u.regs, TCG_REG_R2);
-constraint_softmmu(ct, 'S');
+tcg_regset_reset_reg (ct->u.regs, TCG_REG_R3);
 break;
 case 'R':/* not R0 */
 tcg_regset_reset_reg(ct->u.regs, TCG_REG_R0);
@@ -1239,9 +1221,9 @@ do_logic_i64:
 { INDEX_op_qemu_ld32u, { "r", "L" } },
 { INDEX_op_qemu_ld32s, { "r", "L" } },
 
-{ INDEX_op_qemu_st8, { "S", "S" } },
-{ INDEX_op_qemu_st16, { "S", "S" } },
-{ INDEX_op_qemu_st32, { "S", "S" } },
+{ INDEX_op_qemu_st8, { "L", "L" } },
+{ INDEX_op_qemu_st16, { "L", "L" } },
+{ INDEX_op_qemu_st32, { "L", "L" } },
 
 #if defined(__s390x__)
 { INDEX_op_mov_i64, { "r", "r" } },
@@ -1261,7 +1243,7 @@ do_logic_i64:
 { INDEX_op_st_i64, { "r", "r" } },
 
 { INDEX_op_qemu_ld64, { "L", "L" } },
-{ INDEX_op_qemu_st64, { "S", "S" } },
+{ INDEX_op_qemu_st64, { "L", "L" } },
 
 { INDEX_op_add_i64, { "r", "r", "r" } },
 { INDEX_op_mul_i64, { "r", "r", "r" } },
-- 
1.7.0.1




[Qemu-devel] [PATCH 09/62] tcg-s390: Mark R0 & R15 reserved.

2010-05-27 Thread Richard Henderson
Don't merely exclude them from the register allocation order.

Signed-off-by: Richard Henderson 
---
 tcg/s390/tcg-target.c |7 +--
 1 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/tcg/s390/tcg-target.c b/tcg/s390/tcg-target.c
index eb3ca38..6988937 100644
--- a/tcg/s390/tcg-target.c
+++ b/tcg/s390/tcg-target.c
@@ -124,8 +124,7 @@ static const int tcg_target_reg_alloc_order[] = {
 TCG_REG_R12,
 TCG_REG_R13,
 TCG_REG_R14,
-/* XXX many insns can't be used with R0, so we better avoid it for now */
-/* TCG_REG_R0 */
+TCG_REG_R0,
 TCG_REG_R1,
 TCG_REG_R2,
 TCG_REG_R3,
@@ -1304,6 +1303,10 @@ void tcg_target_init(TCGContext *s)
 tcg_regset_set_reg(s->reserved_regs, TCG_REG_R13);
 /* another temporary */
 tcg_regset_set_reg(s->reserved_regs, TCG_REG_R12);
+/* XXX many insns can't be used with R0, so we better avoid it for now */
+tcg_regset_set_reg(s->reserved_regs, TCG_REG_R0);
+/* The stack pointer.  */
+tcg_regset_set_reg(s->reserved_regs, TCG_REG_R15);
 
 tcg_add_target_add_op_defs(s390_op_defs);
 }
-- 
1.7.0.1




[Qemu-devel] [PATCH 07/62] s390x: Don't use a linker script for user-only.

2010-05-27 Thread Richard Henderson
The default placement of the application at 0x8000 is fine,
and will avoid the default placement for most other guests.

Signed-off-by: Richard Henderson 
---
 configure |3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/configure b/configure
index 3cd2c5f..e2b389d 100755
--- a/configure
+++ b/configure
@@ -2753,6 +2753,9 @@ if test "$target_linux_user" = "yes" -o "$target_bsd_user" = "yes" ; then
 # -static is used to avoid g1/g3 usage by the dynamic linker
 ldflags="$linker_script -static $ldflags"
 ;;
+  alpha | s390x)
+# The default placement of the application is fine.
+;;
   *)
 ldflags="$linker_script $ldflags"
 ;;
-- 
1.7.0.1




[Qemu-devel] [PATCH 11/62] tcg-s390: Move tcg_out_mov up and use it throughout.

2010-05-27 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 tcg/s390/tcg-target.c |   42 --
 1 files changed, 20 insertions(+), 22 deletions(-)

diff --git a/tcg/s390/tcg-target.c b/tcg/s390/tcg-target.c
index 25c80e6..455cf6a 100644
--- a/tcg/s390/tcg-target.c
+++ b/tcg/s390/tcg-target.c
@@ -315,6 +315,12 @@ static void tcg_out_store(TCGContext* s, int op, int r0, int r1, int off)
 tcg_out32(s, (op << 24) | (r0 << 20) | (r1 << 12) | off);
 }
 
+static inline void tcg_out_mov(TCGContext *s, int ret, int arg)
+{
+/* ??? With a TCGType argument, we could emit the smaller LR insn.  */
+tcg_out_b9(s, B9_LGR, ret, arg);
+}
+
 /* load a register with an immediate value */
 static inline void tcg_out_movi(TCGContext *s, TCGType type,
 int ret, tcg_target_long arg)
@@ -386,8 +392,8 @@ static void tcg_prepare_qemu_ldst(TCGContext* s, int data_reg, int addr_reg,
 tcg_out_b9(s, B9_LLGFR, arg1, addr_reg);
 tcg_out_b9(s, B9_LLGFR, arg0, addr_reg);
 #else
-tcg_out_b9(s, B9_LGR, arg1, addr_reg);
-tcg_out_b9(s, B9_LGR, arg0, addr_reg);
+tcg_out_mov(s, arg1, addr_reg);
+tcg_out_mov(s, arg0, addr_reg);
 #endif
 
 tcg_out_sh64(s, SH64_SRLG, arg1, addr_reg, SH64_REG_NONE,
@@ -423,11 +429,11 @@ static void tcg_prepare_qemu_ldst(TCGContext* s, int data_reg, int addr_reg,
 #if TARGET_LONG_BITS == 32
 tcg_out_b9(s, B9_LLGFR, arg0, addr_reg);
 #else
-tcg_out_b9(s, B9_LGR, arg0, addr_reg);
+tcg_out_mov(s, arg0, addr_reg);
 #endif
 
 if (is_store) {
-tcg_out_b9(s, B9_LGR, arg1, data_reg);
+tcg_out_mov(s, arg1, data_reg);
 tcg_out_movi(s, TCG_TYPE_I32, arg2, mem_index);
 tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_R13,
  (tcg_target_ulong)qemu_st_helpers[s_bits]);
@@ -453,7 +459,7 @@ static void tcg_prepare_qemu_ldst(TCGContext* s, int data_reg, int addr_reg,
 break;
 default:
 /* unsigned -> just copy */
-tcg_out_b9(s, B9_LGR, data_reg, arg0);
+tcg_out_mov(s, data_reg, arg0);
 break;
 }
 }
@@ -481,7 +487,7 @@ static void tcg_prepare_qemu_ldst(TCGContext* s, int data_reg, int addr_reg,
 tcg_out_b9(s, B9_LLGFR, arg0, addr_reg);
 #else
 /* just copy */
-tcg_out_b9(s, B9_LGR, arg0, addr_reg);
+tcg_out_mov(s, arg0, addr_reg);
 #endif
 tcg_out_b9(s, B9_AGR, arg0, arg1);
   }
@@ -505,7 +511,7 @@ static void tcg_prepare_qemu_ldst(TCGContext* s, int data_reg, int addr_reg,
 if (TARGET_LONG_BITS == 32) {
 tcg_out_b9(s, B9_LLGFR, arg0, addr_reg);
 } else {
-tcg_out_b9(s, B9_LGR, arg0, addr_reg);
+tcg_out_mov(s, arg0, addr_reg);
 }
 }
 
@@ -898,15 +904,12 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
 /* sgr %ra0/1, %ra2 */
 tcg_out_b9(s, B9_SGR, args[1], args[2]);
 } else if (args[0] == args[2]) {
-/* lgr %r13, %raa0/2 */
-tcg_out_b9(s, B9_LGR, TCG_REG_R13, args[2]);
-/* lgr %ra0/2, %ra1 */
-tcg_out_b9(s, B9_LGR, args[0], args[1]);
+tcg_out_mov(s, TCG_REG_R13, args[2]);
+tcg_out_mov(s, args[0], args[1]);
 /* sgr %ra0/2, %r13 */
 tcg_out_b9(s, B9_SGR, args[0], TCG_REG_R13);
 } else {
-/* lgr %ra0, %ra1 */
-tcg_out_b9(s, B9_LGR, args[0], args[1]);
+tcg_out_mov(s, args[0], args[1]);
 /* sgr %ra0, %ra2 */
 tcg_out_b9(s, B9_SGR, args[0], args[2]);
 }
@@ -920,7 +923,7 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
 } else if (args[0] == args[2]) {
 tcg_out_b9(s, B9_AGR, args[0], args[1]);
 } else {
-tcg_out_b9(s, B9_LGR, args[0], args[1]);
+tcg_out_mov(s, args[0], args[1]);
 tcg_out_b9(s, B9_AGR, args[0], args[2]);
 }
 break;
@@ -960,7 +963,7 @@ do_logic_i64:
 } else if (args[0] == args[2]) {
 tcg_out_b9(s, op, args[0], args[1]);
 } else {
-tcg_out_b9(s, B9_LGR, args[0], args[1]);
+tcg_out_mov(s, args[0], args[1]);
 tcg_out_b9(s, op, args[0], args[2]);
 }
 break;
@@ -987,10 +990,10 @@ do_logic_i64:
 dprintf("op 0x%x neg_i64 0x%lx 0x%lx 0x%lx\n",
 opc, args[0], args[1], args[2]);
 /* FIXME: optimize args[0] != args[1] case */
-tcg_out_b9(s, B9_LGR, 13, args[1]);
+tcg_out_mov(s, TCG_REG_R13, args[1]);
 /* lghi %ra0, 0 */
 tcg_out32(s, S390_INS_LGHI | (args[0] << 20));
-tcg_out_b9(s, B9_SGR, args[0], 13);
+tcg_out_b9(s, B9_SGR, args[0], TCG_REG_R13);
 break;
 
 case INDEX_op_mul_i32:
@@ -1334,11 +1337,6 @@ void tcg_target_qemu_prologue(TCGContext *s)
 tcg_out16(s, S390_INS_BR | TCG_REG_R14);
 }
 
-static inline void tcg_out_mov(TCGContext *s, int ret, int arg)
-{
-tcg_out_b9(s, B9_LGR, ret,

[Qemu-devel] [PATCH 27/62] tcg-s390: Implement bswap operations.

2010-05-27 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 tcg/s390/tcg-target.c |   24 
 tcg/s390/tcg-target.h |   10 +-
 2 files changed, 29 insertions(+), 5 deletions(-)

diff --git a/tcg/s390/tcg-target.c b/tcg/s390/tcg-target.c
index 3f7d08d..7c7adb3 100644
--- a/tcg/s390/tcg-target.c
+++ b/tcg/s390/tcg-target.c
@@ -78,6 +78,8 @@ typedef enum S390Opcode {
 RRE_LLGCR   = 0xb984,
 RRE_LLGFR   = 0xb916,
 RRE_LLGHR   = 0xb985,
+RRE_LRVR= 0xb91f,
+RRE_LRVGR   = 0xb90f,
 RRE_MSGR= 0xb90c,
 RRE_MSR = 0xb252,
 RRE_NGR = 0xb980,
@@ -1117,6 +1119,21 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
 tgen_ext32u(s, args[0], args[1]);
 break;
 
+case INDEX_op_bswap16_i32:
+case INDEX_op_bswap16_i64:
+/* The TCG bswap definition requires bits 0-47 already be zero.
+   Thus we don't need the G-type insns to implement bswap16_i64.  */
+tcg_out_insn(s, RRE, LRVR, args[0], args[1]);
+tcg_out_insn(s, RS, SRL, args[0], 0, SH32_REG_NONE, 16);
+break;
+case INDEX_op_bswap32_i32:
+case INDEX_op_bswap32_i64:
+tcg_out_insn(s, RRE, LRVR, args[0], args[1]);
+break;
+case INDEX_op_bswap64_i64:
+tcg_out_insn(s, RRE, LRVGR, args[0], args[1]);
+break;
+
 case INDEX_op_br:
 tgen_branch(s, S390_CC_ALWAYS, args[0]);
 break;
@@ -1229,6 +1246,9 @@ static const TCGTargetOpDef s390_op_defs[] = {
 { INDEX_op_ext16s_i32, { "r", "r" } },
 { INDEX_op_ext16u_i32, { "r", "r" } },
 
+{ INDEX_op_bswap16_i32, { "r", "r" } },
+{ INDEX_op_bswap32_i32, { "r", "r" } },
+
 { INDEX_op_brcond_i32, { "r", "r" } },
 { INDEX_op_setcond_i32, { "r", "r", "r" } },
 
@@ -1286,6 +1306,10 @@ static const TCGTargetOpDef s390_op_defs[] = {
 { INDEX_op_ext32s_i64, { "r", "r" } },
 { INDEX_op_ext32u_i64, { "r", "r" } },
 
+{ INDEX_op_bswap16_i64, { "r", "r" } },
+{ INDEX_op_bswap32_i64, { "r", "r" } },
+{ INDEX_op_bswap64_i64, { "r", "r" } },
+
 { INDEX_op_brcond_i64, { "r", "r" } },
 { INDEX_op_setcond_i64, { "r", "r", "r" } },
 #endif
diff --git a/tcg/s390/tcg-target.h b/tcg/s390/tcg-target.h
index 76a13fc..76f1d03 100644
--- a/tcg/s390/tcg-target.h
+++ b/tcg/s390/tcg-target.h
@@ -54,8 +54,8 @@ typedef enum TCGReg {
 #define TCG_TARGET_HAS_ext16s_i32
 #define TCG_TARGET_HAS_ext8u_i32
 #define TCG_TARGET_HAS_ext16u_i32
-// #define TCG_TARGET_HAS_bswap16_i32
-// #define TCG_TARGET_HAS_bswap32_i32
+#define TCG_TARGET_HAS_bswap16_i32
+#define TCG_TARGET_HAS_bswap32_i32
 // #define TCG_TARGET_HAS_not_i32
 #define TCG_TARGET_HAS_neg_i32
 // #define TCG_TARGET_HAS_andc_i32
@@ -72,9 +72,9 @@ typedef enum TCGReg {
 #define TCG_TARGET_HAS_ext8u_i64
 #define TCG_TARGET_HAS_ext16u_i64
 #define TCG_TARGET_HAS_ext32u_i64
-// #define TCG_TARGET_HAS_bswap16_i64
-// #define TCG_TARGET_HAS_bswap32_i64
-// #define TCG_TARGET_HAS_bswap64_i64
+#define TCG_TARGET_HAS_bswap16_i64
+#define TCG_TARGET_HAS_bswap32_i64
+#define TCG_TARGET_HAS_bswap64_i64
 // #define TCG_TARGET_HAS_not_i64
 #define TCG_TARGET_HAS_neg_i64
 // #define TCG_TARGET_HAS_andc_i64
-- 
1.7.0.1




[Qemu-devel] [PATCH 19/62] tcg-s390: Fixup qemu_ld/st opcodes.

2010-05-27 Thread Richard Henderson
Implement INDEX_op_qemu_ld32.  Fix constraints on qemu_ld64.

Signed-off-by: Richard Henderson 
---
 tcg/s390/tcg-target.c |8 +---
 1 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/tcg/s390/tcg-target.c b/tcg/s390/tcg-target.c
index c45d8b5..f21a9ca 100644
--- a/tcg/s390/tcg-target.c
+++ b/tcg/s390/tcg-target.c
@@ -1009,6 +1009,8 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode 
opc,
 tcg_out_qemu_ld(s, args, LD_INT16);
 break;
 
+case INDEX_op_qemu_ld32:
+/* ??? Technically we can use a non-extending instruction.  */
 case INDEX_op_qemu_ld32u:
 tcg_out_qemu_ld(s, args, LD_UINT32);
 break;
@@ -1088,10 +1090,13 @@ static const TCGTargetOpDef s390_op_defs[] = {
 { INDEX_op_qemu_ld16s, { "r", "L" } },
 { INDEX_op_qemu_ld32u, { "r", "L" } },
 { INDEX_op_qemu_ld32s, { "r", "L" } },
+{ INDEX_op_qemu_ld32, { "r", "L" } },
+{ INDEX_op_qemu_ld64, { "r", "L" } },
 
 { INDEX_op_qemu_st8, { "L", "L" } },
 { INDEX_op_qemu_st16, { "L", "L" } },
 { INDEX_op_qemu_st32, { "L", "L" } },
+{ INDEX_op_qemu_st64, { "L", "L" } },
 
 #if defined(__s390x__)
 { INDEX_op_mov_i64, { "r", "r" } },
@@ -1110,9 +1115,6 @@ static const TCGTargetOpDef s390_op_defs[] = {
 { INDEX_op_st32_i64, { "r", "r" } },
 { INDEX_op_st_i64, { "r", "r" } },
 
-{ INDEX_op_qemu_ld64, { "L", "L" } },
-{ INDEX_op_qemu_st64, { "L", "L" } },
-
 { INDEX_op_add_i64, { "r", "0", "r" } },
 { INDEX_op_sub_i64, { "r", "0", "r" } },
 { INDEX_op_mul_i64, { "r", "0", "r" } },
-- 
1.7.0.1




[Qemu-devel] [PATCH 05/62] tcg-s390: Move opcode defines to tcg-target.c.

2010-05-27 Thread Richard Henderson
In addition to being the Right Thing, some of the RR_* defines
conflict with RR_* enumerations in target-mips/cpu.h.

Signed-off-by: Richard Henderson 
---
 tcg/s390/tcg-target.c |   81 +
 tcg/s390/tcg-target.h |   80 
 2 files changed, 81 insertions(+), 80 deletions(-)

diff --git a/tcg/s390/tcg-target.c b/tcg/s390/tcg-target.c
index f0013e7..1f961ad 100644
--- a/tcg/s390/tcg-target.c
+++ b/tcg/s390/tcg-target.c
@@ -33,6 +33,87 @@
 do { } while (0)
 #endif
 
+#define TCG_CT_CONST_S160x100
+#define TCG_CT_CONST_U120x200
+
+#define E3_LG  0x04
+#define E3_LRVG0x0f
+#define E3_LGF 0x14
+#define E3_LGH 0x15
+#define E3_LLGF0x16
+#define E3_LRV 0x1e
+#define E3_LRVH0x1f
+#define E3_CG  0x20
+#define E3_STG 0x24
+#define E3_STRVG   0x2f
+#define E3_STRV0x3e
+#define E3_STRVH   0x3f
+#define E3_STHY0x70
+#define E3_STCY0x72
+#define E3_LGB 0x77
+#define E3_LLGC0x90
+#define E3_LLGH0x91
+
+#define B9_LGR 0x04
+#define B9_AGR 0x08
+#define B9_SGR 0x09
+#define B9_MSGR0x0c
+#define B9_LGFR0x14
+#define B9_LLGFR   0x16
+#define B9_CGR 0x20
+#define B9_CLGR0x21
+#define B9_NGR 0x80
+#define B9_OGR 0x81
+#define B9_XGR 0x82
+#define B9_DLGR0x87
+#define B9_DLR 0x97
+
+#define RR_BASR0x0d
+#define RR_NR  0x14
+#define RR_CLR 0x15
+#define RR_OR  0x16
+#define RR_XR  0x17
+#define RR_LR  0x18
+#define RR_CR  0x19
+#define RR_AR  0x1a
+#define RR_SR  0x1b
+
+#define A7_AHI 0xa
+#define A7_AHGI0xb
+
+#define SH64_REG_NONE  0x00 /* use immediate only (not R0!) */
+#define SH64_SRAG  0x0a
+#define SH64_SRLG  0x0c
+#define SH64_SLLG  0x0d
+
+#define SH32_REG_NONE  0x00 /* use immediate only (not R0!) */
+#define SH32_SRL   0x08
+#define SH32_SLL   0x09
+#define SH32_SRA   0x0a
+
+#define ST_STH 0x40
+#define ST_STC 0x42
+#define ST_ST  0x50
+
+#define LD_SIGNED  0x04
+#define LD_UINT8   0x00
+#define LD_INT8(LD_UINT8 | LD_SIGNED)
+#define LD_UINT16  0x01
+#define LD_INT16   (LD_UINT16 | LD_SIGNED)
+#define LD_UINT32  0x02
+#define LD_INT32   (LD_UINT32 | LD_SIGNED)
+#define LD_UINT64  0x03
+#define LD_INT64   (LD_UINT64 | LD_SIGNED)
+
+#define S390_INS_BCR   0x0700
+#define S390_INS_BR(S390_INS_BCR | 0x00f0)
+#define S390_INS_IILH  0xa502
+#define S390_INS_LLILL 0xa50f
+#define S390_INS_LGHI  0xa709
+#define S390_INS_MSR   0xb252
+#define S390_INS_LARL  0xc000
+
+
 static const int tcg_target_reg_alloc_order[] = {
 TCG_REG_R6,
 TCG_REG_R7,
diff --git a/tcg/s390/tcg-target.h b/tcg/s390/tcg-target.h
index bd72115..7495258 100644
--- a/tcg/s390/tcg-target.h
+++ b/tcg/s390/tcg-target.h
@@ -87,86 +87,6 @@ enum {
 #define TCG_TARGET_STACK_ALIGN 8
 #define TCG_TARGET_CALL_STACK_OFFSET   0
 
-#define TCG_CT_CONST_S160x100
-#define TCG_CT_CONST_U120x200
-
-#define E3_LG  0x04
-#define E3_LRVG0x0f
-#define E3_LGF 0x14
-#define E3_LGH 0x15
-#define E3_LLGF0x16
-#define E3_LRV 0x1e
-#define E3_LRVH0x1f
-#define E3_CG  0x20
-#define E3_STG 0x24
-#define E3_STRVG   0x2f
-#define E3_STRV0x3e
-#define E3_STRVH   0x3f
-#define E3_STHY0x70
-#define E3_STCY0x72
-#define E3_LGB 0x77
-#define E3_LLGC0x90
-#define E3_LLGH0x91
-
-#define B9_LGR 0x04
-#define B9_AGR 0x08
-#define B9_SGR 0x09
-#define B9_MSGR0x0c
-#define B9_LGFR0x14
-#define B9_LLGFR   0x16
-#define B9_CGR 0x20
-#define B9_CLGR0x21
-#define B9_NGR 0x80
-#define B9_OGR 0x81
-#define B9_XGR 0x82
-#define B9_DLGR0x87
-#define B9_DLR 0x97
-
-#define RR_BASR0x0d
-#define RR_NR  0x14
-#define RR_CLR 0x15
-#define RR_OR  0x16
-#define RR_XR  0x17
-#define RR_LR  0x18
-#define RR_CR  0x19
-#define RR_AR  0x1a
-#define RR_SR  0x1b
-
-#define A7_AHI 0xa
-#define A7_AHGI0xb
-
-#define SH64_REG_NONE  0x00 /* use immediate only (not R0!) */
-#define SH64_SRAG  0x0a
-#define SH64_SRLG  0x0c
-#define SH64_SLLG  0x0d
-
-#define SH32_REG_NONE  0x00 /* use immediate only (not R0!) */
-#define SH32_SRL   0x08
-#define SH32_SLL   0x09
-#define SH32_SRA   0x0a
-
-#define ST_STH 0x40
-#define ST_STC 0x42
-#define ST_ST  0x50
-
-#define LD_SIGNED  0x04
-#define LD_UINT8   0x00
-#define LD_INT8(LD_UINT8 | LD_SIGNED)
-#define LD_UINT16  

[Qemu-devel] [PATCH 08/62] tcg-s390: Avoid set-but-not-used werrors.

2010-05-27 Thread Richard Henderson
The s_bits variable was only used in a dprintf, and isn't
really informative since we already dump 'opc' from which
s_bits is trivially derived.

Signed-off-by: Richard Henderson 
---
 tcg/s390/tcg-target.c |   16 ++--
 1 files changed, 6 insertions(+), 10 deletions(-)

diff --git a/tcg/s390/tcg-target.c b/tcg/s390/tcg-target.c
index 1f961ad..eb3ca38 100644
--- a/tcg/s390/tcg-target.c
+++ b/tcg/s390/tcg-target.c
@@ -519,7 +519,7 @@ static void tcg_finish_qemu_ldst(TCGContext* s, uint16_t *label2_ptr)
and endianness conversion */
 static void tcg_out_qemu_ld(TCGContext* s, const TCGArg* args, int opc)
 {
-int addr_reg, data_reg, mem_index, s_bits;
+int addr_reg, data_reg, mem_index;
 int arg0 = TCG_REG_R2;
 uint16_t *label2_ptr;
 
@@ -527,10 +527,8 @@ static void tcg_out_qemu_ld(TCGContext* s, const TCGArg* args, int opc)
 addr_reg = *args++;
 mem_index = *args;
 
-s_bits = opc & 3;
-
-dprintf("tcg_out_qemu_ld opc %d data_reg %d addr_reg %d mem_index %d "
-"s_bits %d\n", opc, data_reg, addr_reg, mem_index, s_bits);
+dprintf("tcg_out_qemu_ld opc %d data_reg %d addr_reg %d mem_index %d\n",
+opc, data_reg, addr_reg, mem_index);
 
 tcg_prepare_qemu_ldst(s, data_reg, addr_reg, mem_index,
   opc, &label2_ptr, 0);
@@ -596,7 +594,7 @@ static void tcg_out_qemu_ld(TCGContext* s, const TCGArg* args, int opc)
 
 static void tcg_out_qemu_st(TCGContext* s, const TCGArg* args, int opc)
 {
-int addr_reg, data_reg, mem_index, s_bits;
+int addr_reg, data_reg, mem_index;
 uint16_t *label2_ptr;
 int arg0 = TCG_REG_R2;
 
@@ -604,10 +602,8 @@ static void tcg_out_qemu_st(TCGContext* s, const TCGArg* args, int opc)
 addr_reg = *args++;
 mem_index = *args;
 
-s_bits = opc;
-
-dprintf("tcg_out_qemu_st opc %d data_reg %d addr_reg %d mem_index %d "
-"s_bits %d\n", opc, data_reg, addr_reg, mem_index, s_bits);
+dprintf("tcg_out_qemu_st opc %d data_reg %d addr_reg %d mem_index %d\n",
+opc, data_reg, addr_reg, mem_index);
 
 tcg_prepare_qemu_ldst(s, data_reg, addr_reg, mem_index,
   opc, &label2_ptr, 1);
-- 
1.7.0.1




[Qemu-devel] [PATCH 21/62] tcg-s390: Generalize the direct load/store emission.

2010-05-27 Thread Richard Henderson
Define tcg_out_ldst which can properly choose between RX and RXY
format instructions based on the offset used, and also handles
large offsets.  Use it to implement all the INDEX_op_ld/st operations.

Signed-off-by: Richard Henderson 
---
 tcg/s390/tcg-target.c |  152 +++--
 1 files changed, 71 insertions(+), 81 deletions(-)

diff --git a/tcg/s390/tcg-target.c b/tcg/s390/tcg-target.c
index b150d1a..21ad1a3 100644
--- a/tcg/s390/tcg-target.c
+++ b/tcg/s390/tcg-target.c
@@ -90,17 +90,22 @@ typedef enum S390Opcode {
 RS_SRL  = 0x88,
 
 RXY_CG  = 0xe320,
+RXY_LB  = 0xe376,
 RXY_LG  = 0xe304,
 RXY_LGB = 0xe377,
 RXY_LGF = 0xe314,
 RXY_LGH = 0xe315,
+RXY_LHY = 0xe378,
+RXY_LLC = 0xe394,
 RXY_LLGC= 0xe390,
 RXY_LLGF= 0xe316,
 RXY_LLGH= 0xe391,
+RXY_LLH = 0xe395,
 RXY_LMG = 0xeb04,
 RXY_LRV = 0xe31e,
 RXY_LRVG= 0xe30f,
 RXY_LRVH= 0xe31f,
+RXY_LY  = 0xe358,
 RXY_STCY= 0xe372,
 RXY_STG = 0xe324,
 RXY_STHY= 0xe370,
@@ -108,7 +113,10 @@ typedef enum S390Opcode {
 RXY_STRV= 0xe33e,
 RXY_STRVG   = 0xe32f,
 RXY_STRVH   = 0xe33f,
+RXY_STY = 0xe350,
 
+RX_L= 0x58,
+RX_LH   = 0x48,
 RX_ST   = 0x50,
 RX_STC  = 0x42,
 RX_STH  = 0x40,
@@ -362,22 +370,52 @@ static inline void tcg_out_movi(TCGContext *s, TCGType type,
 }
 }
 
-/* load data without address translation or endianness conversion */
-static void tcg_out_ld(TCGContext *s, TCGType type, TCGReg data,
-   TCGReg base, tcg_target_long ofs)
+
+/* Emit a load/store type instruction.  Inputs are:
+   DATA: The register to be loaded or stored.
+   BASE+OFS: The effective address.
+   OPC_RX:   If the operation has an RX format opcode (e.g. STC), otherwise 0.
+   OPC_RXY:  The RXY format opcode for the operation (e.g. STCY).  */
+
+static void tcg_out_ldst(TCGContext *s, S390Opcode opc_rx, S390Opcode opc_rxy,
+ TCGReg data, TCGReg base, tcg_target_long ofs)
 {
-S390Opcode op;
+TCGReg index = 0;
+
+if (ofs < -0x80000 || ofs >= 0x80000) {
+/* Combine the low 16 bits of the offset with the actual load insn;
+   the high 48 bits must come from an immediate load.  */
+index = TCG_REG_R13;
+tcg_out_movi(s, TCG_TYPE_PTR, index, ofs & ~0xffff);
+ofs &= 0xffff;
+}
 
-op = (type == TCG_TYPE_I32) ? RXY_LLGF : RXY_LG;
+if (opc_rx && ofs >= 0 && ofs < 0x1000) {
+tcg_out_insn_RX(s, opc_rx, data, base, index, ofs);
+} else {
+tcg_out_insn_RXY(s, opc_rxy, data, base, index, ofs);
+}
+}
 
-if (ofs < -0x80000 || ofs > 0x7ffff) {
-/* load the displacement */
-tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_R13, ofs);
-/* load the data */
-tcg_out_insn_RXY(s, op, data, base, TCG_REG_R13, 0);
+
+/* load data without address translation or endianness conversion */
+static inline void tcg_out_ld(TCGContext *s, TCGType type, TCGReg data,
+  TCGReg base, tcg_target_long ofs)
+{
+if (type == TCG_TYPE_I32) {
+tcg_out_ldst(s, RX_L, RXY_LY, data, base, ofs);
 } else {
-/* load the data */
-tcg_out_insn_RXY(s, op, data, base, 0, ofs);
+tcg_out_ldst(s, 0, RXY_LG, data, base, ofs);
+}
+}
+
+static inline void tcg_out_st(TCGContext *s, TCGType type, TCGReg data,
+  TCGReg base, tcg_target_long ofs)
+{
+if (type == TCG_TYPE_I32) {
+tcg_out_ldst(s, RX_ST, RXY_STY, data, base, ofs);
+} else {
+tcg_out_ldst(s, 0, RXY_STG, data, base, ofs);
 }
 }
 
@@ -693,28 +731,6 @@ static void tcg_out_qemu_st(TCGContext* s, const TCGArg* args, int opc)
 tcg_finish_qemu_ldst(s, label2_ptr);
 }
 
-static inline void tcg_out_st(TCGContext *s, TCGType type, int arg,
-  int arg1, tcg_target_long arg2)
-{
-dprintf("tcg_out_st arg 0x%x arg1 0x%x arg2 0x%lx\n", arg, arg1, arg2);
-
-if (type == TCG_TYPE_I32) {
-if (((long)arg2) < -0x800 || ((long)arg2) > 0x7ff) {
-tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_R13, arg2);
-tcg_out_insn(s, RRE, AGR, 13, arg1);
-tcg_out_insn(s, RX, ST, arg, TCG_REG_R13, 0, 0);
-} else {
-tcg_out_insn(s, RX, ST, arg, arg1, 0, arg2);
-}
-}
-else {
-if (((long)arg2) < -0x80000 || ((long)arg2) > 0x7ffff) {
-tcg_abort();
-}
-tcg_out_insn(s, RXY, STG, arg, arg1, 0, arg2);
-}
-}
-
 static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
 const TCGArg *args, const int *const_args)
 {
@@ -780,51 +796,41 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
 break;
 
 case INDEX_op_ld8u_i32:
+tcg_out_ldst(s, 0, RXY_LLC, args[0], args[1], args[2]);
+

[Qemu-devel] [PATCH 10/62] tcg-s390: R6 is a function argument register

2010-05-27 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 tcg/s390/tcg-target.c |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/tcg/s390/tcg-target.c b/tcg/s390/tcg-target.c
index 6988937..25c80e6 100644
--- a/tcg/s390/tcg-target.c
+++ b/tcg/s390/tcg-target.c
@@ -137,6 +137,7 @@ static const int tcg_target_call_iarg_regs[] = {
 TCG_REG_R3,
 TCG_REG_R4,
 TCG_REG_R5,
+TCG_REG_R6,
 };
 
 static const int tcg_target_call_oarg_regs[] = {
-- 
1.7.0.1




[Qemu-devel] [PATCH 01/62] S390 TCG target

2010-05-27 Thread Richard Henderson
From: Alexander Graf 

We already have stubs for a TCG target on S390, but were missing code that
would actually generate instructions.

So I took Uli's patch, cleaned it up and present it to you again :-).

I hope I found all odd coding style and unprettiness issues, but if you
still spot one feel free to nag about it.

Signed-off-by: Alexander Graf 
CC: Uli Hecht 
---
 tcg/s390/tcg-target.c | 1176 -
 1 files changed, 1162 insertions(+), 14 deletions(-)

diff --git a/tcg/s390/tcg-target.c b/tcg/s390/tcg-target.c
index 265194a..d2a93c2 100644
--- a/tcg/s390/tcg-target.c
+++ b/tcg/s390/tcg-target.c
@@ -2,6 +2,7 @@
  * Tiny Code Generator for QEMU
  *
  * Copyright (c) 2009 Ulrich Hecht 
+ * Copyright (c) 2009 Alexander Graf 
  *
  * Permission is hereby granted, free of charge, to any person obtaining a copy
  * of this software and associated documentation files (the "Software"), to deal
@@ -22,31 +23,146 @@
  * THE SOFTWARE.
  */
 
+/* #define DEBUG_S390_TCG */
+
+#ifdef DEBUG_S390_TCG
+#define dprintf(fmt, ...) \
+do { fprintf(stderr, fmt, ## __VA_ARGS__); } while (0)
+#else
+#define dprintf(fmt, ...) \
+do { } while (0)
+#endif
+
 static const int tcg_target_reg_alloc_order[] = {
+TCG_REG_R6,
+TCG_REG_R7,
+TCG_REG_R8,
+TCG_REG_R9,
+TCG_REG_R10,
+TCG_REG_R11,
+TCG_REG_R12,
+TCG_REG_R13,
+TCG_REG_R14,
+/* XXX many insns can't be used with R0, so we better avoid it for now */
+/* TCG_REG_R0 */
+TCG_REG_R1,
+TCG_REG_R2,
+TCG_REG_R3,
+TCG_REG_R4,
+TCG_REG_R5,
 };
 
 static const int tcg_target_call_iarg_regs[] = {
+TCG_REG_R2,
+TCG_REG_R3,
+TCG_REG_R4,
+TCG_REG_R5,
 };
 
 static const int tcg_target_call_oarg_regs[] = {
+TCG_REG_R2,
+TCG_REG_R3,
+};
+
+/* signed/unsigned is handled by using COMPARE and COMPARE LOGICAL,
+   respectively */
+static const uint8_t tcg_cond_to_s390_cond[10] = {
+[TCG_COND_EQ]  = 8,
+[TCG_COND_LT]  = 4,
+[TCG_COND_LTU] = 4,
+[TCG_COND_LE]  = 8 | 4,
+[TCG_COND_LEU] = 8 | 4,
+[TCG_COND_GT]  = 2,
+[TCG_COND_GTU] = 2,
+[TCG_COND_GE]  = 8 | 2,
+[TCG_COND_GEU] = 8 | 2,
+[TCG_COND_NE]  = 4 | 2 | 1,
+};
+
+#ifdef CONFIG_SOFTMMU
+
+#include "../../softmmu_defs.h"
+
+static void *qemu_ld_helpers[4] = {
+__ldb_mmu,
+__ldw_mmu,
+__ldl_mmu,
+__ldq_mmu,
+};
+
+static void *qemu_st_helpers[4] = {
+__stb_mmu,
+__stw_mmu,
+__stl_mmu,
+__stq_mmu,
 };
 
 static void patch_reloc(uint8_t *code_ptr, int type,
 tcg_target_long value, tcg_target_long addend)
 {
-tcg_abort();
+uint32_t *code_ptr_32 = (uint32_t*)code_ptr;
+tcg_target_long code_ptr_tlong = (tcg_target_long)code_ptr;
+
+switch (type) {
+case R_390_PC32DBL:
+*code_ptr_32 = (value - (code_ptr_tlong + addend)) >> 1;
+break;
+default:
+tcg_abort();
+break;
+}
 }
 
-static inline int tcg_target_get_call_iarg_regs_count(int flags)
-{
-tcg_abort();
-return 0;
+static int tcg_target_get_call_iarg_regs_count(int flags)
+{
+return sizeof(tcg_target_call_iarg_regs) / sizeof(int);
 }
 
+static void constraint_softmmu(TCGArgConstraint *ct, const char c)
+{
+#ifdef CONFIG_SOFTMMU
+switch (c) {
+case 'S':   /* qemu_st constraint */
+tcg_regset_reset_reg (ct->u.regs, TCG_REG_R4);
+/* fall through */
+case 'L':   /* qemu_ld constraint */
+tcg_regset_reset_reg (ct->u.regs, TCG_REG_R3);
+break;
+}
+#endif
+}
+
 /* parse target specific constraints */
 static int target_parse_constraint(TCGArgConstraint *ct, const char **pct_str)
 {
-tcg_abort();
+const char *ct_str;
+
+ct->ct |= TCG_CT_REG;
+tcg_regset_set32(ct->u.regs, 0, 0xffff);
+ct_str = *pct_str;
+
+switch (ct_str[0]) {
+case 'L':   /* qemu_ld constraint */
+tcg_regset_reset_reg (ct->u.regs, TCG_REG_R2);
+constraint_softmmu(ct, 'L');
+break;
+case 'S':   /* qemu_st constraint */
+tcg_regset_reset_reg (ct->u.regs, TCG_REG_R2);
+constraint_softmmu(ct, 'S');
+break;
+case 'R':/* not R0 */
+tcg_regset_reset_reg(ct->u.regs, TCG_REG_R0);
+break;
+case 'I':
+ct->ct &= ~TCG_CT_REG;
+ct->ct |= TCG_CT_CONST_S16;
+break;
+default:
+break;
+}
+ct_str++;
+*pct_str = ct_str;
+
 return 0;
 }
 
@@ -54,49 +170,1081 @@ static int target_parse_constraint(TCGArgConstraint *ct, const char **pct_str)
 static inline int tcg_target_const_match(tcg_target_long val,
 const TCGArgConstraint *arg_ct)
 {
-tcg_abort();
+int ct = arg_ct->ct;
+
+if ((ct & TCG_CT_CONST) ||
+   ((ct & TCG_CT_CONST_S16) && val == (int16_t)val) ||
+   ((ct & TCG_CT_CONST_U12) && val == (val & 0xfff))) {
+return 1;
+}
+

[Qemu-devel] [PATCH 16/62] tcg-s390: Compute is_write in cpu_signal_handler.

2010-05-27 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 cpu-exec.c |   42 +++---
 1 files changed, 39 insertions(+), 3 deletions(-)

diff --git a/cpu-exec.c b/cpu-exec.c
index c776605..026980a 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -1156,11 +1156,47 @@ int cpu_signal_handler(int host_signum, void *pinfo,
 siginfo_t *info = pinfo;
 struct ucontext *uc = puc;
 unsigned long pc;
-int is_write;
+uint16_t *pinsn;
+int is_write = 0;
 
 pc = uc->uc_mcontext.psw.addr;
-/* XXX: compute is_write */
-is_write = 0;
+
+/* ??? On linux, the non-rt signal handler has 4 (!) arguments instead
+   of the normal 2 arguments.  The 3rd argument contains the "int_code"
+   from the hardware which does in fact contain the is_write value.
+   The rt signal handler, as far as I can tell, does not give this value
+   at all.  Not that we could get to it from here even if it were.  */
+/* ??? This is not even close to complete, since it ignores all
+   of the read-modify-write instructions.  */
+pinsn = (uint16_t *)pc;
+switch (pinsn[0] >> 8) {
+case 0x50: /* ST */
+case 0x42: /* STC */
+case 0x40: /* STH */
+is_write = 1;
+break;
+case 0xc4: /* RIL format insns */
+switch (pinsn[0] & 0xf) {
+case 0xf: /* STRL */
+case 0xb: /* STGRL */
+case 0x7: /* STHRL */
+is_write = 1;
+}
+break;
+case 0xe3: /* RXY format insns */
+switch (pinsn[2] & 0xff) {
+case 0x50: /* STY */
+case 0x24: /* STG */
+case 0x72: /* STCY */
+case 0x70: /* STHY */
+case 0x8e: /* STPQ */
+case 0x3f: /* STRVH */
+case 0x3e: /* STRV */
+case 0x2f: /* STRVG */
+is_write = 1;
+}
+break;
+}
 return handle_cpu_signal(pc, (unsigned long)info->si_addr,
  is_write, &uc->uc_sigmask, puc);
 }
-- 
1.7.0.1




[Qemu-devel] [PATCH 06/62] s390x: Avoid _llseek.

2010-05-27 Thread Richard Henderson
There's no _llseek on s390x either.  Replace the existing
test for __x86_64__ with a functional test for __NR_llseek.

Signed-off-by: Richard Henderson 
---
 linux-user/syscall.c |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 8222cb9..e94f1ee 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -208,7 +208,7 @@ _syscall3(int, sys_getdents, uint, fd, struct linux_dirent *, dirp, uint, count)
 _syscall3(int, sys_getdents64, uint, fd, struct linux_dirent64 *, dirp, uint, count);
 #endif
 _syscall2(int, sys_getpriority, int, which, int, who);
-#if defined(TARGET_NR__llseek) && !defined (__x86_64__)
+#if defined(TARGET_NR__llseek) && defined(__NR_llseek)
 _syscall5(int, _llseek,  uint,  fd, ulong, hi, ulong, lo,
   loff_t *, res, uint, wh);
 #endif
@@ -5933,7 +5933,7 @@ abi_long do_syscall(void *cpu_env, int num, abi_long arg1,
 #ifdef TARGET_NR__llseek /* Not on alpha */
 case TARGET_NR__llseek:
 {
-#if defined (__x86_64__)
+#if !defined(__NR_llseek)
 ret = get_errno(lseek(arg1, ((uint64_t )arg2 << 32) | arg3, arg5));
 if (put_user_s64(ret, arg4))
 goto efault;
-- 
1.7.0.1




[Qemu-devel] [PATCH 04/62] tcg-s390: Fix tcg_prepare_qemu_ldst for user mode.

2010-05-27 Thread Richard Henderson
This isn't the most efficient way to implement user
memory accesses, but it's the minimal change to fix
the compilation error.

Signed-off-by: Richard Henderson 
---
 tcg/s390/tcg-target.c |8 +++-
 1 files changed, 7 insertions(+), 1 deletions(-)

diff --git a/tcg/s390/tcg-target.c b/tcg/s390/tcg-target.c
index 9ab1d96..f0013e7 100644
--- a/tcg/s390/tcg-target.c
+++ b/tcg/s390/tcg-target.c
@@ -418,8 +418,14 @@ static void tcg_prepare_qemu_ldst(TCGContext* s, int data_reg, int addr_reg,
 int mem_index, int opc,
 uint16_t **label2_ptr_p, int is_store)
 {
+int arg0 = TCG_REG_R2;
+
 /* user mode, no address translation required */
-*arg0 = addr_reg;
+if (TARGET_LONG_BITS == 32) {
+tcg_out_b9(s, B9_LLGFR, arg0, addr_reg);
+} else {
+tcg_out_b9(s, B9_LGR, arg0, addr_reg);
+}
 }
 
 static void tcg_finish_qemu_ldst(TCGContext* s, uint16_t *label2_ptr)
-- 
1.7.0.1




[Qemu-devel] [PATCH 13/62] tcg-s390: Add -m64 and -march to s390x compilation.

2010-05-27 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 configure |6 +-
 1 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/configure b/configure
index e2b389d..72d3df8 100755
--- a/configure
+++ b/configure
@@ -697,7 +697,11 @@ case "$cpu" in
fi
;;
 s390)
-   QEMU_CFLAGS="-march=z900 $QEMU_CFLAGS"
+   QEMU_CFLAGS="-march=z990 $QEMU_CFLAGS"
+   ;;
+s390x)
+   QEMU_CFLAGS="-m64 -march=z9-109 $QEMU_CFLAGS"
+   LDFLAGS="-m64 $LDFLAGS"
;;
 i386)
QEMU_CFLAGS="-m32 $QEMU_CFLAGS"
-- 
1.7.0.1




[Qemu-devel] [PATCH 03/62] tcg-s390: Only validate CPUTLBEntry for system mode.

2010-05-27 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 tcg/s390/tcg-target.c |2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/tcg/s390/tcg-target.c b/tcg/s390/tcg-target.c
index 45c1bf7..9ab1d96 100644
--- a/tcg/s390/tcg-target.c
+++ b/tcg/s390/tcg-target.c
@@ -1198,10 +1198,12 @@ do_logic_i64:
 
 void tcg_target_init(TCGContext *s)
 {
+#if !defined(CONFIG_USER_ONLY)
 /* fail safe */
 if ((1 << CPU_TLB_ENTRY_BITS) != sizeof(CPUTLBEntry)) {
 tcg_abort();
 }
+#endif
 
 tcg_regset_set32(tcg_target_available_regs[TCG_TYPE_I32], 0, 0xffff);
 tcg_regset_set32(tcg_target_available_regs[TCG_TYPE_I64], 0, 0xffff);
-- 
1.7.0.1




[Qemu-devel] [PATCH 02/62] add lost chunks from the original patch

2010-05-27 Thread Richard Henderson
From: Alexander Graf 

---
 tcg/s390/tcg-target.c |3 ++
 tcg/s390/tcg-target.h |   86 +++--
 2 files changed, 86 insertions(+), 3 deletions(-)

diff --git a/tcg/s390/tcg-target.c b/tcg/s390/tcg-target.c
index d2a93c2..45c1bf7 100644
--- a/tcg/s390/tcg-target.c
+++ b/tcg/s390/tcg-target.c
@@ -96,6 +96,9 @@ static void *qemu_st_helpers[4] = {
 __stl_mmu,
 __stq_mmu,
 };
+#endif
+
+static uint8_t *tb_ret_addr;
 
 static void patch_reloc(uint8_t *code_ptr, int type,
 tcg_target_long value, tcg_target_long addend)
diff --git a/tcg/s390/tcg-target.h b/tcg/s390/tcg-target.h
index d8a2955..bd72115 100644
--- a/tcg/s390/tcg-target.h
+++ b/tcg/s390/tcg-target.h
@@ -47,7 +47,7 @@ enum {
 #define TCG_TARGET_NB_REGS 16
 
 /* optional instructions */
-// #define TCG_TARGET_HAS_div_i32
+#define TCG_TARGET_HAS_div_i32
 // #define TCG_TARGET_HAS_rot_i32
 // #define TCG_TARGET_HAS_ext8s_i32
 // #define TCG_TARGET_HAS_ext16s_i32
@@ -56,7 +56,7 @@ enum {
 // #define TCG_TARGET_HAS_bswap16_i32
 // #define TCG_TARGET_HAS_bswap32_i32
 // #define TCG_TARGET_HAS_not_i32
-// #define TCG_TARGET_HAS_neg_i32
+#define TCG_TARGET_HAS_neg_i32
 // #define TCG_TARGET_HAS_andc_i32
 // #define TCG_TARGET_HAS_orc_i32
 // #define TCG_TARGET_HAS_eqv_i32
@@ -75,7 +75,7 @@ enum {
 // #define TCG_TARGET_HAS_bswap32_i64
 // #define TCG_TARGET_HAS_bswap64_i64
 // #define TCG_TARGET_HAS_not_i64
-// #define TCG_TARGET_HAS_neg_i64
+#define TCG_TARGET_HAS_neg_i64
 // #define TCG_TARGET_HAS_andc_i64
 // #define TCG_TARGET_HAS_orc_i64
 // #define TCG_TARGET_HAS_eqv_i64
@@ -87,6 +87,86 @@ enum {
 #define TCG_TARGET_STACK_ALIGN 8
 #define TCG_TARGET_CALL_STACK_OFFSET   0
 
+#define TCG_CT_CONST_S160x100
+#define TCG_CT_CONST_U120x200
+
+#define E3_LG  0x04
+#define E3_LRVG0x0f
+#define E3_LGF 0x14
+#define E3_LGH 0x15
+#define E3_LLGF0x16
+#define E3_LRV 0x1e
+#define E3_LRVH0x1f
+#define E3_CG  0x20
+#define E3_STG 0x24
+#define E3_STRVG   0x2f
+#define E3_STRV0x3e
+#define E3_STRVH   0x3f
+#define E3_STHY0x70
+#define E3_STCY0x72
+#define E3_LGB 0x77
+#define E3_LLGC0x90
+#define E3_LLGH0x91
+
+#define B9_LGR 0x04
+#define B9_AGR 0x08
+#define B9_SGR 0x09
+#define B9_MSGR0x0c
+#define B9_LGFR0x14
+#define B9_LLGFR   0x16
+#define B9_CGR 0x20
+#define B9_CLGR0x21
+#define B9_NGR 0x80
+#define B9_OGR 0x81
+#define B9_XGR 0x82
+#define B9_DLGR0x87
+#define B9_DLR 0x97
+
+#define RR_BASR0x0d
+#define RR_NR  0x14
+#define RR_CLR 0x15
+#define RR_OR  0x16
+#define RR_XR  0x17
+#define RR_LR  0x18
+#define RR_CR  0x19
+#define RR_AR  0x1a
+#define RR_SR  0x1b
+
+#define A7_AHI 0xa
+#define A7_AHGI0xb
+
+#define SH64_REG_NONE  0x00 /* use immediate only (not R0!) */
+#define SH64_SRAG  0x0a
+#define SH64_SRLG  0x0c
+#define SH64_SLLG  0x0d
+
+#define SH32_REG_NONE  0x00 /* use immediate only (not R0!) */
+#define SH32_SRL   0x08
+#define SH32_SLL   0x09
+#define SH32_SRA   0x0a
+
+#define ST_STH 0x40
+#define ST_STC 0x42
+#define ST_ST  0x50
+
+#define LD_SIGNED  0x04
+#define LD_UINT8   0x00
+#define LD_INT8(LD_UINT8 | LD_SIGNED)
+#define LD_UINT16  0x01
+#define LD_INT16   (LD_UINT16 | LD_SIGNED)
+#define LD_UINT32  0x02
+#define LD_INT32   (LD_UINT32 | LD_SIGNED)
+#define LD_UINT64  0x03
+#define LD_INT64   (LD_UINT64 | LD_SIGNED)
+
+#define S390_INS_BCR   0x0700
+#define S390_INS_BR(S390_INS_BCR | 0x00f0)
+#define S390_INS_IILH  0xa502
+#define S390_INS_LLILL 0xa50f
+#define S390_INS_LGHI  0xa709
+#define S390_INS_MSR   0xb252
+#define S390_INS_LARL  0xc000
+
 enum {
 /* Note: must be synced with dyngen-exec.h */
 TCG_AREG0 = TCG_REG_R10,
-- 
1.7.0.1




[Qemu-devel] [PATCH 00/62] s390x tcg target

2010-05-27 Thread Richard Henderson
The following patch series is available at

  git://repo.or.cz/qemu/rth.git tcg-s390-2

It begins with Uli Hecht's original patch, posted by Alexander
sometime last year.  I then make incremental changes to

  (1) Make it compile -- first patch that compiles is tagged
  as tcg-s390-2-first-compile and is

  d142103... tcg-s390: Define tcg_target_reg_names.

  (2) Make it work -- the first patch that i386-linux-user 
  successfully completes linux-test-user-0.2 is tagged
  as tcg-s390-2-first-working and is

  3571f8d... tcg-s390: Implement setcond.

  (3) Make it work for other targets.  I don't tag this,
  but there are lots of load/store aborts and an 
  incorrect division routine until

  9798371... tcg-s390: Implement div2.

  (4) Make it work well.  The balance of the patches incrementally
  add support for new instructions.  At

  7bfaa9e... tcg-s390: Query instruction extensions that are installed.

  I add support for detecting the instruction set extensions
  present in the host and then start disabling some of those
  new instructions that may not be present.

Once things start working, each step was tested with an --enable-debug
compile, and running the linux-user-test suite as well as booting 
the {arm,coldfire,sparc}-linux test kernels, and booting freedos.

Unfortunately, each step was only built without optimization, and it
is only at the end that we discovered that TCG was not properly honoring
the host ABI.  This is solved by the last patch, adding proper sign
extensions for the 32-bit function arguments.  With the final patch
everything works for an optimized build as well.

The current state is that the TCG compiler works for an s390x host.
That is, with a 64-bit userland binary.  It will *compile* for a 
32-bit userland binary, but that facility is only retained for the
purpose of running the s390 kvm guest.  If kvm is not used, the
32-bit binary will exit with an error message.

Given that this is the beginning of proper support for s390, I don't
know whether bisectability is really an issue.  I suppose we could
fairly easily re-base the patches that touch files outside tcg/s390/
and then squash the rest, but I suspect the history may be useful.



r~



Alexander Graf (2):
  S390 TCG target
  add lost chunks from the original patch

Richard Henderson (60):
  tcg-s390: Only validate CPUTLBEntry for system mode.
  tcg-s390: Fix tcg_prepare_qemu_ldst for user mode.
  tcg-s390: Move opcode defines to tcg-target.c.
  s390x: Avoid _llseek.
  s390x: Don't use a linker script for user-only.
  tcg-s390: Avoid set-but-not-used werrors.
  tcg-s390: Mark R0 & R15 reserved.
  tcg-s390: R6 is a function argument register
  tcg-s390: Move tcg_out_mov up and use it throughout.
  tcg-s390: Eliminate the S constraint.
  tcg-s390: Add -m64 and -march to s390x compilation.
  tcg-s390: Define tcg_target_reg_names.
  tcg-s390: Update disassembler from binutils head.
  tcg-s390: Compute is_write in cpu_signal_handler.
  tcg-s390: Reorganize instruction emission
  tcg-s390: Use matching constraints.
  tcg-s390: Fixup qemu_ld/st opcodes.
  tcg-s390: Implement setcond.
  tcg-s390: Generalize the direct load/store emission.
  tcg-s390: Tidy branches.
  tcg-s390: Add tgen_calli.
  tcg-s390: Implement div2.
  tcg-s390: Re-implement tcg_out_movi.
  tcg-s390: Implement sign and zero-extension operations.
  tcg-s390: Implement bswap operations.
  tcg-s390: Implement rotates.
  tcg-s390: Use LOAD COMPLEMENT for negate.
  tcg-s390: Tidy unimplemented opcodes.
  tcg-s390: Use the extended-immediate facility for add/sub.
  tcg-s390: Implement immediate ANDs.
  tcg-s390: Implement immediate ORs.
  tcg-s390: Implement immediate MULs.
  tcg-s390: Implement immediate XORs.
  tcg-s390: Icache flush is a no-op.
  tcg-s390: Define TCG_TMP0.
  tcg-s390: Tidy regset initialization; use R14 as temporary.
  tcg-s390: Rearrange register allocation order.
  tcg-s390: Tidy goto_tb.
  tcg-s390: Allocate the code_gen_buffer near the main program.
  tcg-s390: Rearrange qemu_ld/st to avoid register copy.
  tcg-s390: Tidy tcg_prepare_qemu_ldst.
  tcg-s390: Tidy user qemu_ld/st.
  tcg-s390: Implement GUEST_BASE.
  tcg-s390: Query instruction extensions that are installed.
  tcg-s390: Conditionalize general-instruction-extension insns.
  tcg-s390: Conditionalize ADD IMMEDIATE instructions.
  tcg-s390: Conditionalize LOAD IMMEDIATE instructions.
  tcg-s390: Conditionalize 8 and 16 bit extensions.
  tcg-s390: Conditionalize AND IMMEDIATE instructions.
  tcg-s390: Conditionalize OR IMMEDIATE instructions.
  tcg-s390: Conditionalize XOR IMMEDIATE instructions.
  tcg-s390: Do not require the extended-immediate facility.
  tcg-s390: Use 16-bit branches for forward jumps.
  tcg-s390: Use the LOAD AND TEST instruction for compares.
  tcg-s390: Use the COMPARE IMMEDIATE instructions for compares.
  tcg-s390: Use COMPARE AND BRANCH instructions.
  tcg-s390: Generalize load/store support.
  tcg-s390

[Qemu-devel] Re: [OpenBIOS] [PATCH 0/3] sparc64 cleanups v1

2010-05-27 Thread Blue Swirl
On Thu, May 27, 2010 at 4:57 PM, Mark Cave-Ayland
 wrote:
> Blue Swirl wrote:
>
>> On Tue, May 25, 2010 at 12:12 PM, Igor V. Kovalenko
>>  wrote:
>>>
>>> One code cleanup and another pci host bridge remap change,
>>> the latter requires qemu update with patch already posted to qemu list.
>>>
>>> v0->v1: added missing patch moving asi.h to arch includes
>>
>> Thanks, applied all.
>
> Whilst updating to OpenBIOS SVN and qemu git head to test these patches,
> I've found a regression with qemu-system-sparc64 and
> debian-504-sparc-netinst.iso. Rather than getting to the end of the kernel
> boot and being unable to mount the root filesystem, instead I now get the
> following fatal trap message:
>
>
> [   42.493402] Console: switching to mono PROM 128x96
> [   63.440200] [drm] Initialized drm 1.1.0 20060810
> [   63.542123] su: probe of ffe2dea0 failed with error -12
> [   63.690331] brd: module loaded
> [   63.787034] loop: module loaded
> [   63.863989] Uniform Multi-Platform E-IDE driver
> [   63.961215] ide: Assuming 33MHz system bus speed for PIO modes; override
> with idebus=xx
> [   64.115119] mice: PS/2 mouse device common for all mice
> [   64.234482] usbcore: registered new interface driver usbhid
> [   64.359397] usbhid: v2.6:USB HID core driver
> [   64.462167] TCP cubic registered
> [   64.539714] NET: Registered protocol family 17
> [   64.642969] registered taskstats version 1
> [   64.737822] drivers/rtc/hctosys.c: unable to open rtc device (rtc0)
> qemu: fatal: Trap 0x0068 while trap level (5) >= MAXTL (5), Error state
> pc: 00424d18  npc: 00424d1c
> General Registers:
> %g0-3:  0800 4000 0002
> %g4-7: 03ff 0001 0020 4000
>
> Current Register Window:
> %o0-3:    
> %o4-7:   fffd3ef0 
> %l0-3:    
> %l4-7:    
> %i0-3:    
> %i4-7:    
>
> Floating Point Registers:
> %f00: 0.00 0.00 0.00 0.00
> %f04: 0.00 0.00 0.00 0.00
> %f08: 0.00 0.00 0.00 0.00
> %f12: 0.00 0.00 0.00 0.00
> %f16: 0.00 0.00 0.00 0.00
> %f20: 0.00 0.00 0.00 0.00
> %f24: 0.00 0.00 0.00 0.00
> %f28: 0.00 0.00 0.00 0.00
> %f32: 0.00 0.00 0.00 0.00
> %f36: 0.00 0.00 0.00 0.00
> %f40: 0.00 0.00 0.00 0.00
> %f44: 0.00 0.00 0.00 0.00
> %f48: 0.00 0.00 0.00 0.00
> %f52: 0.00 0.00 0.00 0.00
> %f56: 0.00 0.00 0.00 0.00
> %f60: 0.00 0.00 0.00 0.00
> pstate: 0414 ccr: 00 (icc:  xcc: ) asi: 82 tl: 5 pil: 0
> cansave: 6 canrestore: 0 otherwin: 0 wstate: 2 cleanwin: 0 cwp: 7
> fsr:  y:  fprs: 
> Aborted
>
>
> Digging deeper, it seems that this was something that was introduced earlier
> than the last set of patches. Reverting to OpenBIOS SVN r777 and using 'git
> bisect', I can identify the offending commit in qemu git as
> 2aae2b8e0abd58e76d616bcbe93c6966d06d0188 "sparc64: fix pstate privilege
> bits". Does that help at all?

Yes, bisection results are usually very helpful, thanks.

I think the problem is that previously psrs was always 1 and PSR_HYPV
always set, so maximally permissive MMU_HYPV_INDEX was always selected
by cpu_mmu_index (bug!). Also because PSR_HYPV is no longer set, some
checks in translate.c indicate privilege violations.

The logic was previously such that if the CPU does not have a
hypervisor mode, for compatibility, supervisor mode would also select
hypervisor mode (or at least that was my intention and probably Igor
wasn't aware of this, sorry). Now that they are separate, CPUs without
hypervisor mode must be handled differently. Perhaps this commit
should be reverted, the fix won't be so trivial.

The lesson here is also that subtle assumptions like this should be documented.
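Blue Swirl's intended fallback can be sketched as follows; the constant names (PS_PRIV, PS_HYPV, MMU_*_IDX) are illustrative stand-ins, not QEMU's actual definitions:

```python
# Illustrative pstate bits and MMU index values (stand-ins only, not
# QEMU's real constants).
PS_PRIV = 0x4
PS_HYPV = 0x100
MMU_USER_IDX, MMU_KERNEL_IDX, MMU_HYPV_IDX = 0, 1, 2

def mmu_index(pstate, has_hypervisor):
    """Intended logic: on CPUs without a hypervisor mode, supervisor
    mode keeps selecting the (maximally permissive) hypervisor index
    for compatibility; on CPUs with one, only PS_HYPV selects it."""
    if pstate & PS_HYPV:
        return MMU_HYPV_IDX
    if pstate & PS_PRIV:
        return MMU_KERNEL_IDX if has_hypervisor else MMU_HYPV_IDX
    return MMU_USER_IDX

# Pre-hypervisor CPU: supervisor still gets the hypervisor index.
assert mmu_index(PS_PRIV, False) == MMU_HYPV_IDX
# Hypervisor-capable CPU: the two modes are now distinct.
assert mmu_index(PS_PRIV, True) == MMU_KERNEL_IDX
```

The regression above corresponds to the second case being applied to a CPU model that expected the first.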



[Qemu-devel] Re: [PATCH v3 16/17] QMP: Fix python helper /wrt long return strings

2010-05-27 Thread Luiz Capitulino
On Sun, 23 May 2010 12:59:29 +0200
Jan Kiszka  wrote:

> From: Jan Kiszka 
> 
> Remove the arbitrary limitation of 1024 characters per return string and
> read complete lines instead. Required for device_show.

 Thanks for both fixes. I have started working on a better version of this
script that mimics the user monitor more closely, but it's only half done.

> 
> Signed-off-by: Jan Kiszka 
> ---
>  QMP/qmp.py |6 +-
>  1 files changed, 5 insertions(+), 1 deletions(-)
> 
> diff --git a/QMP/qmp.py b/QMP/qmp.py
> index d9da603..4062f84 100644
> --- a/QMP/qmp.py
> +++ b/QMP/qmp.py
> @@ -63,10 +63,14 @@ class QEMUMonitorProtocol:
>  
>  def __json_read(self):
>  try:
> -return json.loads(self.sock.recv(1024))
> +while True:
> +line = json.loads(self.sockfile.readline())
> +if not 'event' in line:
> +return line
>  except ValueError:
>  return
>  
>  def __init__(self, filename):
>  self.filename = filename
>  self.sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
> +self.sockfile = self.sock.makefile()
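The reason the patch replaces recv(1024) with a buffered readline can be seen in a minimal standalone sketch (a socketpair stands in for the QMP unix socket; the loop also shows the new event-skipping behavior):

```python
import json
import socket

# A socketpair stands in for the QMP unix socket; the "server" side
# sends an asynchronous event line followed by the command's return.
a, b = socket.socketpair()
b.sendall(b'{"event": "RESET"}\n{"return": {}}\n')

# readline() on a file-like wrapper returns one complete JSON line no
# matter how long it is, unlike a single recv(1024).
f = a.makefile()
while True:
    line = json.loads(f.readline())
    if 'event' not in line:
        break

print(line)  # the command's return value, with events skipped
```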




[Qemu-devel] Re: [PATCH v3 13/17] monitor: Allow to exclude commands from QMP

2010-05-27 Thread Luiz Capitulino
On Sun, 23 May 2010 12:59:26 +0200
Jan Kiszka  wrote:

> From: Jan Kiszka 
> 
> Ported commands that are marked 'user_only' will not be considered for
> QMP monitor sessions. This allows to implement new commands that do not
> (yet) provide a sufficiently stable interface for QMP use (e.g.
> device_show).

 This is fine for me, but two things I've been wondering:

 1. Isn't a 'flags' struct member better? So that we can do (in the
qemu-monitor.hx entry):

.flags = MONITOR_USER_ONLY | MONITOR_HANDLER_ASYNC,

I'm not suggesting this is an async handler, just exemplifying multiple
flags.

  2. Getting QMP handlers right the first time might be difficult, so
 we could have a way to mark them unstable. Maybe a different namespace
 which is only enabled at configure time with:

 --enable-qmp-unstable-commands

 If this were possible, we could have device_show and any command we
 aren't sure is QMP-ready working in QMP this way.
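Luiz's flags idea could look roughly like this; the flag names and values are hypothetical, purely to illustrate why a bitmask scales better than adding a bool member per property:

```python
# Hypothetical flag values mirroring the suggestion above; QEMU's
# monitor uses neither these names nor these values.
MONITOR_USER_ONLY     = 1 << 0
MONITOR_HANDLER_ASYNC = 1 << 1

commands = {
    'device_show': MONITOR_USER_ONLY,      # human monitor only
    'query-block': 0,                      # available in QMP too
    'balloon':     MONITOR_HANDLER_ASYNC,  # async, but QMP-visible
}

def visible_in_qmp(flags):
    # A single bitmask test replaces a growing list of bool members.
    return not (flags & MONITOR_USER_ONLY)

assert not visible_in_qmp(commands['device_show'])
assert visible_in_qmp(commands['balloon'])
```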

> 
> Signed-off-by: Jan Kiszka 
> ---
>  monitor.c |   13 ++---
>  1 files changed, 10 insertions(+), 3 deletions(-)
> 
> diff --git a/monitor.c b/monitor.c
> index 6766e49..5768c6e 100644
> --- a/monitor.c
> +++ b/monitor.c
> @@ -114,6 +114,7 @@ typedef struct mon_cmd_t {
>MonitorCompletion *cb, void *opaque);
>  } mhandler;
>  int async;
> +bool user_only;
>  } mon_cmd_t;
>  
>  /* file descriptors passed via SCM_RIGHTS */
> @@ -635,6 +636,11 @@ static int do_info(Monitor *mon, const QDict *qdict, 
> QObject **ret_data)
>  goto help;
>  }
>  
> +if (monitor_ctrl_mode(mon) && cmd->user_only) {
> +qerror_report(QERR_COMMAND_NOT_FOUND, item);
> +return -1;
> +}
> +
>  if (monitor_handler_is_async(cmd)) {
>  if (monitor_ctrl_mode(mon)) {
>  qmp_async_info_handler(mon, cmd);
> @@ -732,13 +738,14 @@ static void do_info_commands(Monitor *mon, QObject 
> **ret_data)
>  cmd_list = qlist_new();
>  
>  for (cmd = mon_cmds; cmd->name != NULL; cmd++) {
> -if (monitor_handler_ported(cmd) && !compare_cmd(cmd->name, "info")) {
> +if (monitor_handler_ported(cmd) && !cmd->user_only &&
> +!compare_cmd(cmd->name, "info")) {
>  qlist_append_obj(cmd_list, get_cmd_dict(cmd->name));
>  }
>  }
>  
>  for (cmd = info_cmds; cmd->name != NULL; cmd++) {
> -if (monitor_handler_ported(cmd)) {
> +if (monitor_handler_ported(cmd) && !cmd->user_only) {
>  char buf[128];
>  snprintf(buf, sizeof(buf), "query-%s", cmd->name);
>  qlist_append_obj(cmd_list, get_cmd_dict(buf));
> @@ -4416,7 +4423,7 @@ static void handle_qmp_command(JSONMessageParser 
> *parser, QList *tokens)
>qobject_from_jsonf("{ 'item': %s }", info_item));
>  } else {
>  cmd = monitor_find_command(cmd_name);
> -if (!cmd || !monitor_handler_ported(cmd)) {
> +if (!cmd || !monitor_handler_ported(cmd) || cmd->user_only) {
>  qerror_report(QERR_COMMAND_NOT_FOUND, cmd_name);
>  goto err_input;
>  }




[Qemu-devel] [PATCH] Extra scan codes for missing keys

2010-05-27 Thread Brendan Sleight
Hi All,

First - Qemu is fantastic and allows lots of wonderful things.

Second, when using qemu-system-ppc, I wanted to use sendkey to emulate
a colon. This patch enables shift-semicolon to emulate a ':'.

Whilst I was adding semicolon, I used the following link to look up
some other missing keys :-
 http://terpconnect.umd.edu/~nsw/ench250/scancode.htm#Key2

Please cc me on any replies as I am not subscribed to the list.

Best Regards,
Brendan M. Sleight

diff --git a/monitor.c b/monitor.c
index ad50f12..e1ffa0e 100644
--- a/monitor.c
+++ b/monitor.c
@@ -1641,6 +1641,8 @@ static const KeyDef key_defs[] = {
 { 0x17, "i" },
 { 0x18, "o" },
 { 0x19, "p" },
+{ 0x1a, "sqr_brack_l" },
+{ 0x1b, "sqr_brack_r" },

 { 0x1c, "ret" },

@@ -1653,7 +1655,11 @@ static const KeyDef key_defs[] = {
 { 0x24, "j" },
 { 0x25, "k" },
 { 0x26, "l" },
+{ 0x27, "semicolon" },
+{ 0x28, "apostrophe" },
+{ 0x29, "grave_accent" },

+{ 0x2b, "backslash" },
 { 0x2c, "z" },
 { 0x2d, "x" },
 { 0x2e, "c" },
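As a rough illustration of how a combined argument such as "shift-semicolon" decomposes into press/release scancodes (the table is a tiny excerpt of key_defs above, and this is not QEMU's actual sendkey parser):

```python
# Tiny excerpt of the key_defs table, including the keys added by the
# patch; this sketch is not QEMU's actual sendkey implementation.
KEY_DEFS = {
    'shift': 0x2a,
    'semicolon': 0x27,
    'apostrophe': 0x28,
    'backslash': 0x2b,
}

def sendkey_codes(arg):
    codes = [KEY_DEFS[name] for name in arg.split('-')]
    # Press in order, then release in reverse (release = code | 0x80).
    return codes + [c | 0x80 for c in reversed(codes)]

# "sendkey shift-semicolon" produces a ':' on a US layout.
assert sendkey_codes('shift-semicolon') == [0x2a, 0x27, 0xa7, 0xaa]
```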



[Qemu-devel] Re: [PATCH v3 10/17] QMP: Reserve namespace for complex object classes

2010-05-27 Thread Luiz Capitulino
On Sun, 23 May 2010 12:59:23 +0200
Jan Kiszka  wrote:

> From: Jan Kiszka 
> 
> This reserves JSON objects that contain the key '__class__' for QMP-specific
> complex objects. First user will be the buffer class.
> 
> Signed-off-by: Jan Kiszka 
> ---
>  QMP/qmp-spec.txt |   16 +---
>  1 files changed, 13 insertions(+), 3 deletions(-)
> 
> diff --git a/QMP/qmp-spec.txt b/QMP/qmp-spec.txt
> index 9d30a8c..fa1dd62 100644
> --- a/QMP/qmp-spec.txt
> +++ b/QMP/qmp-spec.txt
> @@ -146,6 +146,15 @@ The format is:
>  For a listing of supported asynchronous events, please, refer to the
>  qmp-events.txt file.
>  
> +2.6 Complex object classes
> +--
> +
> +JSON objects that contain the key-value pair '"__class__": json-string' are

 I'm not strong about this, but it's better to call it just a 'pair', as 'value'
is a bit problematic because of json-value.

> +reserved for QMP-specific complex object classes that. QMP specifies which

 Early full stop?

> +further keys each of these objects include and how they are encoded.
> +
> +So far, no complex object class is specified.
> +
>  3. QMP Examples
>  ===
>  
> @@ -229,9 +238,10 @@ avoid modifying QMP.  Both upstream and downstream need 
> to take care to
>  preserve long-term compatibility and interoperability.
>  
>  To help with that, QMP reserves JSON object member names beginning with
> -'__' (double underscore) for downstream use ("downstream names").  This
> -means upstream will never use any downstream names for its commands,
> -arguments, errors, asynchronous events, and so forth.
> +'__' (double underscore) for downstream use ("downstream names").  Downstream
> +names MUST NOT end with '__' as this pattern is reserved for QMP-defined JSON
> +object classes.  Upstream will never use any downstream names for its
> +commands, arguments, errors, asynchronous events, and so forth.

 Suggest mentioning subsection 2.6.

>  
>  Any new names downstream wishes to add must begin with '__'.  To
>  ensure compatibility with other downstreams, it is strongly
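The combined naming rule from the amended paragraph can be captured in a few lines; this is just a sketch of the rule as worded, not part of any QMP implementation:

```python
def is_valid_downstream_name(name):
    """Sketch of the naming rule in the amended spec text: downstream
    names must begin with '__', and must not end with '__' because
    that pattern is reserved for QMP-defined object classes."""
    return name.startswith('__') and not name.endswith('__')

assert is_valid_downstream_name('__org.example.cmd')
assert not is_valid_downstream_name('query-block')  # no '__' prefix
assert not is_valid_downstream_name('__class__')    # reserved pattern
```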




[Qemu-devel] Re: SVM emulation: EVENTINJ marked valid when a pagefault happens while issuing a software interrupt

2010-05-27 Thread Erik van der Kouwe

Hi,


Be warned: Though my experience is already more than a year old, the SVM
emulation in QEMU is most probably not yet rock-stable. Always check
suspicious behavior against real hardware and/or the spec. [ As real
hardware is everywhere, nesting works with KVM+SVM and is much faster,
motivation to improve QEMU in this area is unfortunately limited. ]


Problem is: I'm compiling in Linux and testing in MINIX. Testing on the 
real hardware would require a reboot every time. Moreover, it might screw 
up my system if I make bad mistakes (the MINIX filesystem is easily 
corrupted).


That said, I do aim to eventually test on real hardware. There is plenty of
virtualization-capable hardware where I work, although unfortunately it is
all Intel.



This issue is easy to work around by clearing the EVENTINJ field on each
#VMEXIT (and I have submitted a patch to that effect to the Palacios
people) and this approach is also found in KVM.


/me does not find such clearing in KVM - what line(s) are you looking at?


Linux source tree (2.6.31-ubuntu), arch/x86/kvm/svm.c, end of function 
nested_svm_vmrun. Here event_inj and event_inj_err are copied from a 
different VMCB, effectively clearing the value set by the CPU. Maybe 
this isn't where I should have been looking though?



The relevant code is in target-i386/op_helper.c. The "handle_even_inj"
function sets the EVENTINJ field (called event_inj in the QEMU code) and 
the helper_vmexit function copies that field into EXITINTINFO
(exit_int_info in the QEMU code). I believe (but once again, am not
certain) that the SVM documentation only says that this information
should be stored in EXITINTINFO.


Yes, this also looks suspicious. handle_even_inj should not push the
real (level 1) event to be injected into event_inj[_err] but into
exit_int_info[_err] or some temporary fields from which the exit info is
then loaded later on.


Yes, if this is indeed incorrect behaviour then this is what I would 
expect a fix to be like.


Thanks again,
Erik



[Qemu-devel] Re: [PATCH v3 06/17] qdev: Allow device specification by qtree path for device_del

2010-05-27 Thread Luiz Capitulino
On Sun, 23 May 2010 12:59:19 +0200
Jan Kiszka  wrote:

> From: Jan Kiszka 
> 
> Allow to specify the device to be removed via device_del not only by ID
> but also by its full or abbreviated qtree path. For this purpose,
> qdev_find is introduced which combines walking the qtree with searching
> for device IDs if required.

 [...]

>  Arguments:
>  
> -- "id": the device's ID (json-string)
> +- "path": the device's qtree path or unique ID (json-string)
>  
>  Example:
>  
> --> { "execute": "device_del", "arguments": { "id": "net1" } }
> +-> { "execute": "device_del", "arguments": { "path": "net1" } }

 Doesn't seem like a good change to me; besides being incompatible[1], we
shouldn't overload arguments this way in QMP as overloading leads to
interface degradation (harder to use, understand, maintain).

 Maybe we could have both arguments as optional, but one must be passed.

[1] It's 'legal' to break the protocol before 0.13, but this has to be
coordinated with libvirt so we should have a good reason to do this

>  <- { "return": {} }
>  
>  EQMP
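Luiz's suggestion of two optional, mutually exclusive arguments could be validated along these lines; the argument names follow the thread and are not from any shipped QMP schema:

```python
def pick_device_selector(args):
    """Sketch of the 'both optional, exactly one required' idea for
    device_del: accept 'id' or 'path' without overloading a single
    argument.  Names follow the thread, not any shipped QMP schema."""
    given = [k for k in ('id', 'path') if k in args]
    if len(given) != 1:
        raise ValueError("exactly one of 'id' or 'path' must be given")
    return given[0], args[given[0]]

assert pick_device_selector({'id': 'net1'}) == ('id', 'net1')
try:
    pick_device_selector({'id': 'net1', 'path': 'net1'})
    assert False, "overloaded arguments must be rejected"
except ValueError:
    pass
```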




Re: [Qemu-devel] Re: [RFT][PATCH 07/15] qemu_irq: Add IRQ handlers with delivery feedback

2010-05-27 Thread Blue Swirl
On Thu, May 27, 2010 at 7:08 PM, Jan Kiszka  wrote:
> Blue Swirl wrote:
>> On Thu, May 27, 2010 at 6:31 PM, Jan Kiszka  wrote:
>>> Blue Swirl wrote:
 On Wed, May 26, 2010 at 11:26 PM, Paul Brook  wrote:
>> At the other extreme, would it be possible to make the educated guests
>> aware of the virtualization also in clock aspect: virtio-clock?
> The guest doesn't even need to be aware of virtualization. It just needs 
> to be
> able to accommodate the lack of guaranteed realtime behavior.
>
> The fundamental problem here is that some guest operating systems assume 
> that
> the hardware provides certain realtime guarantees with respect to 
> execution of
> interrupt handlers.  In particular they assume that the CPU will always be
> able to complete execution of the timer IRQ handler before the periodic 
> timer
> triggers again.  In most virtualized environments you have absolutely no
> guarantee of realtime response.
>
> With Linux guests this was solved a long time ago by the introduction of
> tickless kernels.  These separate the timekeeping from wakeup events, so 
> it
> doesn't matter if several wakeup triggers end up getting merged (either 
> at the
> hardware level or via top/bottom half guest IRQ handlers).
>
>
> It's worth mentioning that this problem also occurs on real hardware,
> typically due to lame hardware/drivers which end up masking interrupts or
> otherwise stall the CPU for long periods of time.
>
>
> The PIT hack attempts to workaround broken guests by adding artificial 
> latency
> to the timer event, ensuring that the guest "sees" them all.  
> Unfortunately
> guests vary on when it is safe for them to see the next timer event, and
> trying to observe this behavior involves potentially harmful heuristics 
> and
> collusion between unrelated devices (e.g. interrupt controller and timer).
>
> In some cases we don't even do that, and just reschedule the event some
> arbitrarily small amount of time later. This assumes the guest to do 
> useful
> work in that time. In a single threaded environment this is probably true 
> -
> qemu got enough CPU to inject the first interrupt, so will probably 
> manage to
> execute some guest code before the end of its timeslice. In an environment
> where interrupt processing/delivery and execution of the guest code 
> happen in
> different threads this becomes increasingly likely to fail.
 So any voodoo around timer events is doomed to fail in some cases.
 What's the amount of hacks we want then? Is there any generic
>>> The aim of this patch is to reduce the amount of existing and upcoming
>>> hacks. It may still require some refinements, but I think we haven't
>>> found any smarter approach yet that fits existing use cases.
>>
>> I don't feel we have tried other possibilities hard enough.
>
> Well, seeing prototypes wouldn't be bad, also to run real load against
> them. But at least I'm currently clueless what to implement.

Perhaps now is then not the time to rush to implement something, but
to brainstorm for a clean solution.

>>
 solution, like slowing down the guest system to the point where we can
 guarantee the interrupt rate vs. CPU execution speed?
>>> That's generally a non-option in virtualized production environments.
>>> Specifically if the guest system lost interrupts due to host
>>> overcommitment, you do not want it slow down even further.
>>
>> I meant that the guest time could be scaled down, for example 2s in
>> wall clock time would be presented to the guest as 1s.
>
> But that is precisely what already happens when the guest loses timer
> interrupts. There is no other time source for this kind of guests -
> often except for some external events generated by systems which you
> don't want to fall behind arbitrarily.
>
>> Then the amount
>> of CPU cycles between timer interrupts would increase and hopefully
>> the guest can keep up. If the guest sleeps, time base could be
>> accelerated to catch up with wall clock and then set back to 1:1 rate.
>
> Can't follow you ATM, sorry. What should be slowed down then? And how
> precisely?

I think vm_clock and everything that depends on vm_clock, also
rtc_clock should be tied to vm_clock in this mode, not host_clock.

>
> Jan
>
>>
>> Slowing down could be triggered by measuring the guest load (for
>> example, by checking for presence of halt instructions), if it's close
>> to 1, time would be slowed down. If the guest starts to issue halt
>> instructions because it's more idle, we can increase speed.
>>
>> If this approach worked, even APIC could be made ignorant about
>> coalescing voodoo so it should be a major cleanup.
>
>
>
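The scaling idea debated above can be toy-modeled; the 1:2 ratio and the catch-up policy are arbitrary choices for illustration, not a proposal for concrete qemu code:

```python
# Toy model of the proposed scaling: while the guest is overloaded, the
# virtual clock advances at half the host rate; once it idles, it runs
# faster than real time until the backlog is repaid.  The ratio and
# policy are arbitrary, purely to illustrate the catch-up idea.
def advance(vm_ns, host_delta_ns, backlog_ns, overloaded):
    if overloaded:
        vm_ns += host_delta_ns // 2               # guest sees time at 1:2
        backlog_ns += host_delta_ns - host_delta_ns // 2
    elif backlog_ns > 0:
        catchup = min(backlog_ns, host_delta_ns)  # repay while idle
        vm_ns += host_delta_ns + catchup
        backlog_ns -= catchup
    else:
        vm_ns += host_delta_ns                    # steady state, 1:1
    return vm_ns, backlog_ns

vm, backlog = advance(0, 1000, 0, overloaded=True)
assert (vm, backlog) == (500, 500)
vm, backlog = advance(vm, 1000, backlog, overloaded=False)
assert (vm, backlog) == (2000, 0)   # caught up with wall clock
```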



[Qemu-devel] [PATCH 1/1] ceph/rbd block driver for qemu-kvm (v2)

2010-05-27 Thread Christian Brunner
This is a block driver for the distributed file system Ceph 
(http://ceph.newdream.net/). This driver uses librados (which 
is part of the Ceph server) for direct access to the Ceph object 
store and is running entirely in userspace. Therefore it is 
called "rbd" - rados block device.

To compile the driver a recent version of ceph (unstable/testing git
head, or 0.20.3 once it is released) is needed, and you have to pass
"--enable-rbd" when running configure.

Additional information is available on the Ceph-Wiki:

http://ceph.newdream.net/wiki/Kvm-rbd

The patch is based on git://repo.or.cz/qemu/kevin.git block

---
 Makefile  |3 +
 Makefile.objs |1 +
 block/rbd.c   |  584 +
 block/rbd_types.h |   52 +
 configure |   27 +++
 5 files changed, 667 insertions(+), 0 deletions(-)
 create mode 100644 block/rbd.c
 create mode 100644 block/rbd_types.h

diff --git a/Makefile b/Makefile
index 7986bf6..8d09612 100644
--- a/Makefile
+++ b/Makefile
@@ -27,6 +27,9 @@ configure: ;
 $(call set-vpath, $(SRC_PATH):$(SRC_PATH)/hw)
 
 LIBS+=-lz $(LIBS_TOOLS)
+ifdef CONFIG_RBD
+LIBS+=-lrados
+endif
 
 ifdef BUILD_DOCS
 DOCS=qemu-doc.html qemu-tech.html qemu.1 qemu-img.1 qemu-nbd.8
diff --git a/Makefile.objs b/Makefile.objs
index 1a942e5..08dc11f 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -18,6 +18,7 @@ block-nested-y += parallels.o nbd.o blkdebug.o
 block-nested-$(CONFIG_WIN32) += raw-win32.o
 block-nested-$(CONFIG_POSIX) += raw-posix.o
 block-nested-$(CONFIG_CURL) += curl.o
+block-nested-$(CONFIG_RBD) += rbd.o
 
 block-obj-y +=  $(addprefix block/, $(block-nested-y))
 
diff --git a/block/rbd.c b/block/rbd.c
new file mode 100644
index 000..375ae9d
--- /dev/null
+++ b/block/rbd.c
@@ -0,0 +1,584 @@
+/*
+ * QEMU Block driver for RADOS (Ceph)
+ *
+ * Copyright (C) 2010 Christian Brunner 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ *
+ */
+
+#include "qemu-common.h"
+#include 
+#include 
+
+#include 
+
+#include "rbd_types.h"
+#include "module.h"
+#include "block_int.h"
+
+#include 
+#include 
+#include 
+
+#include 
+
+/*
+ * When specifying the image filename use:
+ *
+ * rbd:poolname/devicename
+ *
+ * poolname must be the name of an existing rados pool
+ *
+ * devicename is the basename for all objects used to
+ * emulate the raw device.
+ *
+ * Metadata information (image size, ...) is stored in an
+ * object with the name "devicename.rbd".
+ *
+ * The raw device is split into 4MB sized objects by default.
+ * The sequencenumber is encoded in a 12 byte long hex-string,
+ * and is attached to the devicename, separated by a dot.
+ * e.g. "devicename.1234567890ab"
+ *
+ */
+
+#define OBJ_MAX_SIZE (1UL << OBJ_DEFAULT_OBJ_ORDER)
+
+typedef struct RBDAIOCB {
+BlockDriverAIOCB common;
+QEMUBH *bh;
+int ret;
+QEMUIOVector *qiov;
+char *bounce;
+int write;
+int64_t sector_num;
+int aiocnt;
+int error;
+} RBDAIOCB;
+
+typedef struct RADOSCB {
+int rcbid;
+RBDAIOCB *acb;
+int done;
+int64_t segsize;
+char *buf;
+} RADOSCB;
+
+typedef struct RBDRVRBDState {
+rados_pool_t pool;
+char name[RBD_MAX_OBJ_NAME_SIZE];
+int name_len;
+uint64_t size;
+uint64_t objsize;
+} RBDRVRBDState;
+
+typedef struct rbd_obj_header_ondisk RbdHeader1;
+
+static int rbd_parsename(const char *filename, char *pool, char *name)
+{
+const char *rbdname;
+char *p, *n;
+int l;
+
+if (!strstart(filename, "rbd:", &rbdname)) {
+return -EINVAL;
+}
+
+pstrcpy(pool, 2 * RBD_MAX_SEG_NAME_SIZE, rbdname);
+p = strchr(pool, '/');
+if (p == NULL) {
+return -EINVAL;
+}
+
+*p = '\0';
+n = ++p;
+
+l = strlen(n);
+
+if (l > RBD_MAX_OBJ_NAME_SIZE) {
+fprintf(stderr, "object name too long\n");
+return -EINVAL;
+} else if (l <= 0) {
+fprintf(stderr, "object name to short\n");
+return -EINVAL;
+}
+
+strcpy(name, n);
+
+return l;
+}
+
+static int create_tmap_op(uint8_t op, const char *name, char **tmap_desc)
+{
+uint32_t len = strlen(name);
+uint32_t total_len = 1 + (sizeof(uint32_t) + len) + sizeof(uint32_t); /* encoding op + name + empty buffer */
+char *desc;
+
+desc = qemu_malloc(total_len);
+if (!desc) {
+return -ENOMEM;
+}
+
+*tmap_desc = desc;
+
+*desc = op;
+desc++;
+memcpy(desc, &len, sizeof(len));
+desc += sizeof(len);
+memcpy(desc, name, len);
+desc += len;
+len = 0;
+memcpy(desc, &len, sizeof(len));
+desc += sizeof(len);
+
+return desc - *tmap_desc;
+}
+
+static void free_tmap_op(char *tmap_desc)
+{
+qemu_free(tmap_desc);
+}
+
+static int rbd_register_image(rados_pool_t pool, const char *name)
+{
+char *tmap_desc;
+const char *dir = RBD_DIRECTORY;
+int ret;
+
+ret = create_tmap_op(CEPH_OSD_TMAP_SET, name, &tmap_d

[Qemu-devel] [PATCH 0/1] ceph/rbd block driver for qemu-kvm (v2)

2010-05-27 Thread Christian Brunner
Hi,

Based on the review notes Blue Swirl sent us after my last mail, Yehuda
cleaned up the header files. The patch is much smaller now and I hope
that you accept it for inclusion.

To build it, you will need the testing (or unstable) git head of ceph 
now. The required header files will be part of the next release of 
ceph (0.20.3).

In case you didn't read my last posting, here is the short description
again:

This patch is a block driver for the distributed file system Ceph
(http://ceph.newdream.net/). Ceph was included in the Linux v2.6.34
kernel. However, this driver uses librados (which is part of the Ceph
server) for direct access to the Ceph object store and is running entirely
in userspace. Therefore it is called "rbd" - rados block device.

The basic idea is to stripe a VM block device over (by default) 4MB objects
stored in the Ceph distributed object store.  This is very similar to what
the sheepdog project is doing, but uses the ceph server as a storage backend.
If you don't plan on using the entire ceph filesystem you may leave out the
metadata service of ceph.
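Under the layout described above, mapping a linear device offset to a rados object name is straightforward; this is a sketch of the naming scheme only, not librados/rbd code:

```python
# Sketch of the naming scheme from the driver's description: the device
# is striped over 4 MB objects, and the stripe's sequence number is a
# 12-digit hex suffix attached to the device name with a dot.
OBJ_ORDER = 22            # default object order: 4 MB objects
OBJ_SIZE = 1 << OBJ_ORDER

def rbd_object_name(devicename, byte_offset):
    seq = byte_offset >> OBJ_ORDER
    return '%s.%012x' % (devicename, seq)

assert rbd_object_name('vm1', 0) == 'vm1.000000000000'
assert rbd_object_name('vm1', 5 * OBJ_SIZE + 123) == 'vm1.000000000005'
# Metadata (image size, ...) lives in a separate "<devicename>.rbd" object.
```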

Yehuda Sadeh helped me with the implementation and put some additional
usage information on the Ceph-Wiki (http://ceph.newdream.net/wiki/Kvm-rbd).
He has also written a Linux kernel driver to make an rbd image accessible as
a block device.

Regards,
Christian

---
 Makefile  |3 +
 Makefile.objs |1 +
 block/rbd.c   |  584 +
 block/rbd_types.h |   52 +
 configure |   27 +++
 5 files changed, 667 insertions(+), 0 deletions(-)
 create mode 100644 block/rbd.c
 create mode 100644 block/rbd_types.h



Re: [Qemu-devel] Re: [RFT][PATCH 07/15] qemu_irq: Add IRQ handlers with delivery feedback

2010-05-27 Thread Jan Kiszka
Blue Swirl wrote:
> On Thu, May 27, 2010 at 6:31 PM, Jan Kiszka  wrote:
>> Blue Swirl wrote:
>>> On Wed, May 26, 2010 at 11:26 PM, Paul Brook  wrote:
> At the other extreme, would it be possible to make the educated guests
> aware of the virtualization also in clock aspect: virtio-clock?
 The guest doesn't even need to be aware of virtualization. It just needs 
 to be
 able to accommodate the lack of guaranteed realtime behavior.

 The fundamental problem here is that some guest operating systems assume 
 that
 the hardware provides certain realtime guarantees with respect to 
 execution of
 interrupt handlers.  In particular they assume that the CPU will always be
 able to complete execution of the timer IRQ handler before the periodic 
 timer
 triggers again.  In most virtualized environments you have absolutely no
 guarantee of realtime response.

 With Linux guests this was solved a long time ago by the introduction of
 tickless kernels.  These separate the timekeeping from wakeup events, so it
 doesn't matter if several wakeup triggers end up getting merged (either at 
 the
 hardware level or via top/bottom half guest IRQ handlers).


 It's worth mentioning that this problem also occurs on real hardware,
 typically due to lame hardware/drivers which end up masking interrupts or
 otherwise stall the CPU for long periods of time.


 The PIT hack attempts to workaround broken guests by adding artificial 
 latency
 to the timer event, ensuring that the guest "sees" them all.  Unfortunately
 guests vary on when it is safe for them to see the next timer event, and
 trying to observe this behavior involves potentially harmful heuristics and
 collusion between unrelated devices (e.g. interrupt controller and timer).

 In some cases we don't even do that, and just reschedule the event some
 arbitrarily small amount of time later. This assumes the guest to do useful
 work in that time. In a single threaded environment this is probably true -
 qemu got enough CPU to inject the first interrupt, so will probably manage 
 to
 execute some guest code before the end of its timeslice. In an environment
 where interrupt processing/delivery and execution of the guest code happen 
 in
 different threads this becomes increasingly likely to fail.
>>> So any voodoo around timer events is doomed to fail in some cases.
>>> What's the amount of hacks we want then? Is there any generic
>> The aim of this patch is to reduce the amount of existing and upcoming
>> hacks. It may still require some refinements, but I think we haven't
>> found any smarter approach yet that fits existing use cases.
> 
> I don't feel we have tried other possibilities hard enough.

Well, seeing prototypes wouldn't be bad, also to run real load against
them. But at least I'm currently clueless what to implement.

> 
>>> solution, like slowing down the guest system to the point where we can
>>> guarantee the interrupt rate vs. CPU execution speed?
>> That's generally a non-option in virtualized production environments.
>> Specifically if the guest system lost interrupts due to host
>> overcommitment, you do not want it slow down even further.
> 
> I meant that the guest time could be scaled down, for example 2s in
> wall clock time would be presented to the guest as 1s.

But that is precisely what already happens when the guest loses timer
interrupts. There is no other time source for this kind of guests -
often except for some external events generated by systems which you
don't want to fall behind arbitrarily.

> Then the amount
> of CPU cycles between timer interrupts would increase and hopefully
> the guest can keep up. If the guest sleeps, time base could be
> accelerated to catch up with wall clock and then set back to 1:1 rate.

Can't follow you ATM, sorry. What should be slowed down then? And how
precisely?

Jan

> 
> Slowing down could be triggered by measuring the guest load (for
> example, by checking for presence of halt instructions), if it's close
> to 1, time would be slowed down. If the guest starts to issue halt
> instructions because it's more idle, we can increase speed.
> 
> If this approach worked, even APIC could be made ignorant about
> coalescing voodoo so it should be a major cleanup.




signature.asc
Description: OpenPGP digital signature


[Qemu-devel] Re: [PATCH, RFC 1/4] pci: add I/O registration functions

2010-05-27 Thread Blue Swirl
On Thu, May 27, 2010 at 2:39 PM, Michael S. Tsirkin  wrote:
> On Sun, May 23, 2010 at 08:34:30PM +, Blue Swirl wrote:
>> Convert also APB to use the registration so that
>> we can remove mem_base.
>>
>> Signed-off-by: Blue Swirl 
>> ---
>>  hw/apb_pci.c |   23 -
>>  hw/pci.c     |   64 
>> ++---
>>  hw/pci.h     |    9 +++-
>>  3 files changed, 68 insertions(+), 28 deletions(-)
>
> Probably should mention pci.c changes in the changelog.

It's the subject.

>
>> diff --git a/hw/apb_pci.c b/hw/apb_pci.c
>> index 65d8ba6..fb23397 100644
>> --- a/hw/apb_pci.c
>> +++ b/hw/apb_pci.c
>> @@ -74,6 +74,7 @@ typedef struct APBState {
>>      qemu_irq pci_irqs[32];
>>      uint32_t reset_control;
>>      unsigned int nr_resets;
>> +    target_phys_addr_t mem_base;
>>  } APBState;
>>
>>  static void apb_config_writel (void *opaque, target_phys_addr_t addr,
>> @@ -316,6 +317,24 @@ static void apb_pci_bridge_init(PCIBus *b)
>>                   PCI_HEADER_TYPE_MULTI_FUNCTION);
>>  }
>>
>> +static void apb_register_mem(void *opaque, pcibus_t addr, pcibus_t size, int mm)
>> +{
>> +    APBState *d = opaque;
>> +
>> +    APB_DPRINTF("%s: addr %" FMT_PCIBUS " size %" FMT_PCIBUS "mm %x\n",
>> +                __func__, addr, size, mm);
>> +    cpu_register_physical_memory(addr + d->mem_base, size, mm);
>> +}
>> +
>> +static void apb_unregister_mem(void *opaque, pcibus_t addr, pcibus_t size)
>> +{
>> +    APBState *d = opaque;
>> +
>> +    APB_DPRINTF("%s: addr %" FMT_PCIBUS " size %" FMT_PCIBUS "\n",
>> +                __func__, addr, size);
>> +    cpu_register_physical_memory(addr + d->mem_base, size, IO_MEM_UNASSIGNED);
>> +}
>> +
>>  PCIBus *pci_apb_init(target_phys_addr_t special_base,
>>                       target_phys_addr_t mem_base,
>>                       qemu_irq *pic, PCIBus **bus2, PCIBus **bus3)
>> @@ -338,10 +357,12 @@ PCIBus *pci_apb_init(target_phys_addr_t special_base,
>>      /* mem_data */
>>      sysbus_mmio_map(s, 3, mem_base);
>>      d = FROM_SYSBUS(APBState, s);
>> +    d->mem_base = mem_base;
>>      d->host_state.bus = pci_register_bus(&d->busdev.qdev, "pci",
>>                                           pci_apb_set_irq, pci_pbm_map_irq, d,
>>                                           0, 32);
>> -    pci_bus_set_mem_base(d->host_state.bus, mem_base);
>> +    pci_bus_set_register_mem_fn(d->host_state.bus, apb_register_mem,
>> +                                apb_unregister_mem, d);
>>
>>      for (i = 0; i < 32; i++) {
>>          sysbus_connect_irq(s, i, pic[i]);
>> diff --git a/hw/pci.c b/hw/pci.c
>> index 8d84651..ffd6dc3 100644
>> --- a/hw/pci.c
>> +++ b/hw/pci.c
>> @@ -46,7 +46,9 @@ struct PCIBus {
>>      void *irq_opaque;
>>      PCIDevice *devices[256];
>>      PCIDevice *parent_dev;
>> -    target_phys_addr_t mem_base;
>> +    pci_register_mem_fn register_mem;
>> +    pci_unregister_mem_fn unregister_mem;
>> +    void *register_fn_opaque;
>>
>>      QLIST_HEAD(, PCIBus) child; /* this will be replaced by qdev later */
>>      QLIST_ENTRY(PCIBus) sibling;/* this will be replaced by qdev later */
>> @@ -163,6 +165,18 @@ static void pci_device_reset(PCIDevice *dev)
>>      pci_update_mappings(dev);
>>  }
>>
>> +static void pci_bus_default_register_mem(void *opaque, pcibus_t addr,
>> +                                         pcibus_t size, int mm)
>> +{
>> +    cpu_register_physical_memory(addr, size, mm);
>> +}
>> +
>> +static void pci_bus_default_unregister_mem(void *opaque, pcibus_t addr,
>> +                                           pcibus_t size)
>> +{
>> +    cpu_register_physical_memory(addr, size, IO_MEM_UNASSIGNED);
>> +}
>> +
>>  static void pci_bus_reset(void *opaque)
>>  {
>>      PCIBus *bus = opaque;
>> @@ -205,6 +219,8 @@ void pci_bus_new_inplace(PCIBus *bus, DeviceState *parent,
>>  {
>>      qbus_create_inplace(&bus->qbus, &pci_bus_info, parent, name);
>>      bus->devfn_min = devfn_min;
>> +    bus->register_mem = pci_bus_default_register_mem;
>> +    bus->unregister_mem = pci_bus_default_unregister_mem;
>>
>>      /* host bridge */
>>      QLIST_INIT(&bus->child);
>> @@ -241,11 +257,6 @@ void pci_bus_hotplug(PCIBus *bus, pci_hotplug_fn hotplug, DeviceState *qdev)
>>      bus->hotplug_qdev = qdev;
>>  }
>>
>> -void pci_bus_set_mem_base(PCIBus *bus, target_phys_addr_t base)
>> -{
>> -    bus->mem_base = base;
>> -}
>> -
>>  PCIBus *pci_register_bus(DeviceState *parent, const char *name,
>>                           pci_set_irq_fn set_irq, pci_map_irq_fn map_irq,
>>                           void *irq_opaque, int devfn_min, int nirq)
>> @@ -651,12 +662,6 @@ PCIDevice *pci_register_device(PCIBus *bus, const char *name,
>>      return pci_dev;
>>  }
>>
>> -static target_phys_addr_t pci_to_cpu_addr(PCIBus *bus,
>> -                                          target_phys_addr_t addr)
>> -{
>> -    return addr + bus->mem_base;
>> -}
>> -
>>  static void pci_unregister_io_regions(PCIDe

Re: [Qemu-devel] [PATCH] vhost_net.c: v2 Fix build failure introduced by 0bfcd599e3f5c5679cc7d0165a0a1822e2f60de2

2010-05-27 Thread Blue Swirl
Thanks, applied.

On Thu, May 27, 2010 at 12:26 PM,   wrote:
> From: Jes Sorensen 
>
> Fix build failure introduced by 0bfcd599e3f5c5679cc7d0165a0a1822e2f60de2
>
> The format statement expects unsigned long on x86_64, but receives
> unsigned long long, so gcc exits with an error.
>
> Signed-off-by: Jes Sorensen 
> ---
>  hw/vhost_net.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/hw/vhost_net.c b/hw/vhost_net.c
> index 26dae79..606aa0c 100644
> --- a/hw/vhost_net.c
> +++ b/hw/vhost_net.c
> @@ -100,7 +100,7 @@ struct vhost_net *vhost_net_init(VLANClientState *backend, int devfd)
>     }
>     if (~net->dev.features & net->dev.backend_features) {
>         fprintf(stderr, "vhost lacks feature mask %" PRIu64 " for backend\n",
> -                ~net->dev.features & net->dev.backend_features);
> +                (uint64_t)(~net->dev.features & net->dev.backend_features));
>         vhost_dev_cleanup(&net->dev);
>         goto fail;
>     }
> --
> 1.6.5.2
>
>
>



[Qemu-devel] [Bug 586221] Re: Linux on ARM/Mainstone machine fails at bootstrap

2010-05-27 Thread Lars Munch
There are already patches pending to solve these issues:

http://article.gmane.org/gmane.comp.emulators.qemu/69598
and
http://article.gmane.org/gmane.comp.emulators.qemu/69597

Hopefully they will be reviewed/applied soon.

-- 
Linux on ARM/Mainstone machine fails at bootstrap
https://bugs.launchpad.net/bugs/586221
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.

Status in QEMU: New

Bug description:
When QEMU (0.12.4) starts an ARM machine which boots Linux, it immediately 
fails because the initial PC is wrong (it is equal to 0 instead of 0xa000). 
First investigations indicate that the reset-handler queue does not return the 
correct opaque structure in main_cpu_reset(), the one passed during insertion 
within arm_load_kernel(). I checked the value of r15: it is correct at 
insertion time, but 0x0 in main_cpu_reset().

There may still be a bug in the new queue-management functions (in 
qemu-queue.h).





[Qemu-devel] [Bug 586424] Re: SMC91C111 failed when booting Linux/ARM(Mainstone) since 0.10.0

2010-05-27 Thread Lars Munch
This was fixed some time ago in commit
3b4b86aace17ef07fc4f85a9662c991efbc83e15

-- 
SMC91C111 failed when booting Linux/ARM(Mainstone) since 0.10.0
https://bugs.launchpad.net/bugs/586424
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.

Status in QEMU: In Progress

Bug description:
Since QEMU 0.10.0, the SMC91C111 emulation on an ARM machine like Mainstone 
fails when performing some reads/writes.





Re: [Qemu-devel] Re: [RFT][PATCH 07/15] qemu_irq: Add IRQ handlers with delivery feedback

2010-05-27 Thread Blue Swirl
On Thu, May 27, 2010 at 6:31 PM, Jan Kiszka  wrote:
> Blue Swirl wrote:
>> On Wed, May 26, 2010 at 11:26 PM, Paul Brook  wrote:
>>>> At the other extreme, would it be possible to make the educated guests
>>>> aware of the virtualization also in clock aspect: virtio-clock?
>>> The guest doesn't even need to be aware of virtualization. It just needs to 
>>> be
>>> able to accommodate the lack of guaranteed realtime behavior.
>>>
>>> The fundamental problem here is that some guest operating systems assume 
>>> that
>>> the hardware provides certain realtime guarantees with respect to execution 
>>> of
>>> interrupt handlers.  In particular they assume that the CPU will always be
>>> able to complete execution of the timer IRQ handler before the periodic 
>>> timer
>>> triggers again.  In most virtualized environments you have absolutely no
>>> guarantee of realtime response.
>>>
>>> With Linux guests this was solved a long time ago by the introduction of
>>> tickless kernels.  These separate the timekeeping from wakeup events, so it
>>> doesn't matter if several wakeup triggers end up getting merged (either at 
>>> the
>>> hardware level or via top/bottom half guest IRQ handlers).
>>>
>>>
>>> It's worth mentioning that this problem also occurs on real hardware,
>>> typically due to lame hardware/drivers which end up masking interrupts or
>>> otherwise stall the CPU for for long periods of time.
>>>
>>>
>>> The PIT hack attempts to workaround broken guests by adding artificial 
>>> latency
>>> to the timer event, ensuring that the guest "sees" them all.  Unfortunately
>>> guests vary on when it is safe for them to see the next timer event, and
>>> trying to observe this behavior involves potentially harmful heuristics and
>>> collusion between unrelated devices (e.g. interrupt controller and timer).
>>>
>>> In some cases we don't even do that, and just reschedule the event some
>>> arbitrarily small amount of time later. This assumes the guest to do useful
>>> work in that time. In a single threaded environment this is probably true -
>>> qemu got enough CPU to inject the first interrupt, so will probably manage 
>>> to
>>> execute some guest code before the end of its timeslice. In an environment
>>> where interrupt processing/delivery and execution of the guest code happen 
>>> in
>>> different threads this becomes increasingly likely to fail.
>>
>> So any voodoo around timer events is doomed to fail in some cases.
>> What's the amount of hacks what we want then? Is there any generic
>
> The aim of this patch is to reduce the amount of existing and upcoming
> hacks. It may still require some refinements, but I think we haven't
> found any smarter approach yet that fits existing use cases.

I don't feel we have tried other possibilities hard enough.

>> solution, like slowing down the guest system to the point where we can
>> guarantee the interrupt rate vs. CPU execution speed?
>
> That's generally a non-option in virtualized production environments.
> Specifically if the guest system lost interrupts due to host
> overcommitment, you do not want it to slow down even further.

I meant that the guest time could be scaled down, for example 2s in
wall clock time would be presented to the guest as 1s. Then the amount
of CPU cycles between timer interrupts would increase and hopefully
the guest can keep up. If the guest sleeps, time base could be
accelerated to catch up with wall clock and then set back to 1:1 rate.

Slowing down could be triggered by measuring the guest load (for
example, by checking for presence of halt instructions), if it's close
to 1, time would be slowed down. If the guest starts to issue halt
instructions because it's more idle, we can increase speed.

If this approach worked, even APIC could be made ignorant about
coalescing voodoo so it should be a major cleanup.



[Qemu-devel] Re: SVM emulation: EVENTINJ marked valid when a pagefault happens while issuing a software interrupt

2010-05-27 Thread Jan Kiszka
Erik van der Kouwe wrote:
> Dear all,
> 
> I have been experiencing problems with duplicate delivery of software
> interrupts when running a VMM inside QEMU with SVM emulation. I believe

Be warned: Though my experience is already more than a year old, the SVM
emulation in QEMU is most probably not yet rock-stable. Always check
suspicious behavior against real hardware and/or the spec. [ As real
hardware is everywhere, nesting works with KVM+SVM and is much faster,
motivation to improve QEMU in this area is unfortunately limited. ]

> QEMU's behaviour deviates from the SVM specification in "AMD64
> Architecture Programmer’s Manual Volume 2 System Programming" but I am
> not entirely certain because this specification isn't very clear. I
> would like to hear your views on this.
> 
> My set-up is as follows:
> Host: Linux 2.6.31-21-generic-pae (Ubuntu 9.10)
> VMM running on host: QEMU 0.12.3 (compiled from source)
> Outer guest: MINIX 3.1.7 (from SVN, see http://www.minix3.org/)
> VMM running on outer guest: Palacios 1.2.0 32-bit (from git, see
> http://www.v3vee.org/palacios/)
> Inner guest: MINIX 3.1.7 (from SVN, see http://www.minix3.org/)
> 
> The issue is the following: whenever a software interrupt instruction
> (INT n, used in this case to perform a system call) in the inner guest
> triggers a page fault (used for shadow paging by Palacios, not a real
> guest page fault), QEMU sets the EVENTINJ field of the guest VMCB to the
> exit information that the software interrupt would produce and marks it
> as valid. Palacios does not overwrite the EVENTINJ field, so after the
> page fault is handled a software interrupt event is injected. After the
> IRET of the interrupt handler, control returns to the original INT n
> instruction which once again triggers the interrupt.
> 
> This issue is easy to work around by clearing the EVENTINJ field on each
> #VMEXIT (and I have submitted a patch to that effect to the Palacios
> people) and this approach is also found in KVM.

/me does not find such clearing in KVM - what line(s) are you looking at?

> 
> However, I haven't been able to find information in the AMD
> documentation that mentions that the CPU sets the valid bit in the
> EVENTINJ field so, unless I am mistaken here, I believe this behaviour
> is incorrect. QEMU stores interrupt information in both EVENTINJ and
> EXITINTINFO while I believe it should be only in the latter.
> Unfortunately I don't have a physical AMD available to verify its
> behaviour.

Based on the KVM code (which is known to work perfectly :) ), I think
you are right: SVM apparently clears the valid bit in EVENTINJ during
VMRUN once it starts processing the injection, not after it as it's the
case in current QEMU. But better ask the experts: Jörg, Gleb?

> 
> The relevant code is in target-i386/op_helper.c. The "handle_even_inj"
> function sets the EVENTINJ field (called event_inf in the QEMU code) and
> the helper_vmexit function copies that field into EXITINTINFO
> (exit_int_info in the QEMU code). I believe (but once again, am not
> certain) that the SVM documentation only says that this information
> should be stored in EXITINTINFO.

Yes, this also looks suspicious. handle_even_inj should not push the
real (level 1) event to be injected into event_inj[_err] but into
exit_int_info[_err] or some temporary fields from which the exit info is
then loaded later on.

Jan





[Qemu-devel] Matheus Teles cantor sertanejo

2010-05-27 Thread Matheus Teles

Matheus Teles, 15 years old, sertanejo singer. Visit: www.MatheusTeles.com.br



Re: [Qemu-devel] Re: [RFT][PATCH 07/15] qemu_irq: Add IRQ handlers with delivery feedback

2010-05-27 Thread Jan Kiszka
Blue Swirl wrote:
> On Wed, May 26, 2010 at 11:26 PM, Paul Brook  wrote:
>>> At the other extreme, would it be possible to make the educated guests
>>> aware of the virtualization also in clock aspect: virtio-clock?
>> The guest doesn't even need to be aware of virtualization. It just needs to 
>> be
>> able to accommodate the lack of guaranteed realtime behavior.
>>
>> The fundamental problem here is that some guest operating systems assume that
>> the hardware provides certain realtime guarantees with respect to execution 
>> of
>> interrupt handlers.  In particular they assume that the CPU will always be
>> able to complete execution of the timer IRQ handler before the periodic timer
>> triggers again.  In most virtualized environments you have absolutely no
>> guarantee of realtime response.
>>
>> With Linux guests this was solved a long time ago by the introduction of
>> tickless kernels.  These separate the timekeeping from wakeup events, so it
>> doesn't matter if several wakeup triggers end up getting merged (either at 
>> the
>> hardware level or via top/bottom half guest IRQ handlers).
>>
>>
>> It's worth mentioning that this problem also occurs on real hardware,
>> typically due to lame hardware/drivers which end up masking interrupts or
>> otherwise stall the CPU for for long periods of time.
>>
>>
>> The PIT hack attempts to workaround broken guests by adding artificial 
>> latency
>> to the timer event, ensuring that the guest "sees" them all.  Unfortunately
>> guests vary on when it is safe for them to see the next timer event, and
>> trying to observe this behavior involves potentially harmful heuristics and
>> collusion between unrelated devices (e.g. interrupt controller and timer).
>>
>> In some cases we don't even do that, and just reschedule the event some
>> arbitrarily small amount of time later. This assumes the guest to do useful
>> work in that time. In a single threaded environment this is probably true -
>> qemu got enough CPU to inject the first interrupt, so will probably manage to
>> execute some guest code before the end of its timeslice. In an environment
>> where interrupt processing/delivery and execution of the guest code happen in
>> different threads this becomes increasingly likely to fail.
> 
> So any voodoo around timer events is doomed to fail in some cases.
> What's the amount of hacks what we want then? Is there any generic

The aim of this patch is to reduce the amount of existing and upcoming
hacks. It may still require some refinements, but I think we haven't
found any smarter approach yet that fits existing use cases.

> solution, like slowing down the guest system to the point where we can
> guarantee the interrupt rate vs. CPU execution speed?

That's generally a non-option in virtualized production environments.
Specifically if the guest system lost interrupts due to host
overcommitment, you do not want it to slow down even further.

Jan





Re: [Qemu-devel] Re: [RFT][PATCH 07/15] qemu_irq: Add IRQ handlers with delivery feedback

2010-05-27 Thread Blue Swirl
2010/5/27 Gleb Natapov :
> On Wed, May 26, 2010 at 08:35:00PM +, Blue Swirl wrote:
>> On Wed, May 26, 2010 at 8:09 PM, Jan Kiszka  wrote:
>> > Blue Swirl wrote:
>> >> On Tue, May 25, 2010 at 9:44 PM, Jan Kiszka  wrote:
>> >>> Anthony Liguori wrote:
>>  On 05/25/2010 02:09 PM, Blue Swirl wrote:
>> > On Mon, May 24, 2010 at 8:13 PM, Jan Kiszka  wrote:
>> >
>> >> From: Jan Kiszka
>> >>
>> >> This allows to communicate potential IRQ coalescing during delivery 
>> >> from
>> >> the sink back to the source. Targets that support IRQ coalescing
>> >> workarounds need to register handlers that return the appropriate
>> >> QEMU_IRQ_* code, and they have to propagate the code across all IRQ
>> >> redirections. If the IRQ source receives a QEMU_IRQ_COALESCED, it can
>> >> apply its workaround. If multiple sinks exist, the source may only
>> >> consider an IRQ coalesced if all other sinks either report
>> >> QEMU_IRQ_COALESCED as well or QEMU_IRQ_MASKED.
>> >>
>> > No real devices are interested whether any of their output lines are
>> > even connected. This would introduce a new signal type, bidirectional
>> > multi-level, which is not correct.
>> >
>>  I don't think it's really an issue of correct, but I wouldn't disagree
>>  to a suggestion that we ought to introduce a new signal type for this
>>  type of bidirectional feedback.  Maybe it's qemu_coalesced_irq and has a
>>  similar interface as qemu_irq.
>> >>> A separate type would complicate the delivery of the feedback value
>> >>> across GPIO pins (as Paul requested for the RTC->HPET routing).
>> >>>
>> > I think the real solution to coalescing is put the logic inside one
>> > device, in this case APIC because it has the information about irq
>> > delivery. APIC could monitor incoming RTC irqs for frequency
>> > information and whether they get delivered or not. If not, an internal
>> > timer is installed which injects the lost irqs.
>> >>> That won't fly as the IRQs will already arrive at the APIC with a
>> >>> sufficiently high jitter. At the bare minimum, you need to tell the
>> >>> interrupt controller about the fact that a particular IRQ should be
>> >>> delivered at a specific regular rate. For this, you also need a generic
>> >>> interface - nothing really "won".
>> >>
>> >> OK, let's simplify: just reinject at next possible chance. No need to
>> >> monitor or tell anything.
>> >
>> > There are guests that won't like this (I know of one in-house, but
>> > others may even have more examples), specifically if you end up firing
>> > multiple IRQs in a row due to a longer backlog. For that reason, the RTC
>> > spreads the reinjection according to the current rate.
>>
>> Then reinject with a constant delay, or next CPU exit. Such buggy
> If guest's time frequency is the same as host time frequency you can't
> reinject with constant delay. That is why current code mixes two
> approaches: reinject M interrupts in a row, then delay.

This approach can be also used by APIC-only version.

>> guests could also be assisted with special handling (like win2k
>> install hack), for example guest instructions could be counted
>> (approximately, for example using TB size or TSC) and only inject
>> after at least N instructions have passed.
> Guest instructions cannot be easily counted in KVM (it can be done more
> or less reliably using perf counters, maybe).

Aren't there any debug registers or perf counters, which can generate
an interrupt after some number of instructions have been executed?

>>
>> > And even if the rate did not matter, the APIC woult still have to now
>> > about the fact that an IRQ is really periodic and does not only appear
>> > as such for a certain interval. This really does not sound like
>> > simplifying things or even make them cleaner.
>>
>> It would, the voodoo would be contained only in APIC, RTC would be
>> just like any other device. With the bidirectional irqs, this voodoo
>> would probably eventually spread to many other devices. The logical
>> conclusion of that would be a system where all devices would be
>> careful not to disturb the guest at wrong moment because that would
>> trigger a bug.
>>
> This voodoo will be so complex and unreliable that it will make RTC hack
> pale in comparison (and I still don't see how you are going to make it
> actually work).

Implement everything inside APIC: only coalescing and reinjection.
Maybe that version would not bend backwards as much as the current to
cater for buggy hosts.

> The fact is that timer device is not "just like any
> other device" in virtual world. Any other device is easy: you just
> implement spec as close as possible and everything works. For time
> source device this is not enough. You can implement RTC+HPET to the
> letter and your guest will drift like crazy.

It's doable: a cycle-accurate emulator will not cause any drift,
without any voodoo. The interrupts wou
