date:20111006

Re: [Qemu-devel] [0/4] pseries: Support and improvements for KVM Book3S-HV support (v2)

2011-10-06 Thread Alexander Graf


On 30.09.2011, at 09:39, David Gibson wrote:

> Alex Graf has added support for KVM acceleration of the pseries
> machine, using his Book3S-PR KVM variant, which runs the guest in
> userspace, emulating supervisor operations.  Recent kernels now have
> the Book3S-HV KVM variant which uses the hardware hypervisor features
> of recent POWER CPUs.  Alex's changes to qemu are enough to get qemu
> working roughly with Book3S-HV, but taking full advantage of this mode
> needs more work.  This patch series makes a start on better exploiting
> Book3S-HV.
> 
> Even with these patches, qemu won't quite be able to run on a current
> Book3S-HV KVM kernel.  That's because current Book3S-HV requires guest
> memory to be backed by hugepages, but qemu refuses to use hugepages
> for guest memory unless KVM advertises CAP_SYNC_MMU, which Book3S-HV
> does not currently do.  We're working on improvements to the KVM code
> which will implement CAP_SYNC_MMU and allow smallpage backing of
> guests, but they're not there yet.  So, in order to test Book3S-HV for
> now you need to either:
> 
> * Hack the host kernel to lie and advertise CAP_SYNC_MMU even though
>   it doesn't really implement it.
> 
> or
> 
> * Hack qemu so it does not check for CAP_SYNC_MMU when the -mem-path
>   option is used.
> 
> Both approaches are ugly and unsafe, but it seems we can generally get
> away with it in practice.  Obviously this is only an interim hack
> until the proper CAP_SYNC_MMU support is ready.

I would prefer the latter. We could even #ifdef it for TARGET_PPC.


Alex

Re: [Qemu-devel] [PATCH v2 2/2] ppc/e500_pci: Fix an array overflow issue

2011-10-06 Thread Alexander Graf


On 30.09.2011, at 05:52, Liu Yu wrote:

> When accessing PPCE500_PCI_IW1, the previous index overflows.
> This patch fixes the issue and updates all accesses to keep a consistent style.
> 
> Signed-off-by: Liu Yu 

Thanks, applied both to my local ppc-next tree. Will push once Blue has pulled
the request.


Alex

[Qemu-devel] [PATCH 2/2] qemu-options.hx: Update virtfs command documentation

2011-10-06 Thread Aneesh Kumar K.V

Clarify the virtfs option better
Updates from:Sripathi Kodi 

Signed-off-by: Aneesh Kumar K.V 
---
 qemu-options.hx |  119 ---
 1 files changed, 69 insertions(+), 50 deletions(-)

diff --git a/qemu-options.hx b/qemu-options.hx
index 38f0aef..6c744e0 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -522,43 +522,61 @@ possible drivers and properties, use @code{-device ?} and
 @code{-device @var{driver},?}.
 ETEXI
 
+DEFHEADING()
+
 DEFHEADING(File system options:)
 
 DEF("fsdev", HAS_ARG, QEMU_OPTION_fsdev,
-"-fsdev local,id=id,path=path,security_model=[mapped|passthrough|none]\n"
+"-fsdev fsdriver,id=id,path=path,security_model=[mapped|passthrough|none]\n"
 "   [,cache=writethrough]\n",
 QEMU_ARCH_ALL)
 
 STEXI
 
-The general form of a File system device option is:
-@table @option
-
-@item -fsdev @var{fstype} ,id=@var{id} [,@var{options}]
+@item -fsdev @var{fsdriver},id=@var{id},path=@var{path},security_model=@var{security_model}[,cache=@var{cache}]
 @findex -fsdev
-Fstype is one of:
-@option{local},
-The specific Fstype will determine the applicable options.
-
-Options to each backend are described below.
-
-@item -fsdev local ,id=@var{id} ,path=@var{path} ,security_model=@var{security_model}[,cache=@var{cache}]
-
-Create a file-system-"device" for local-filesystem.
-
-@option{local} is only available on Linux.
-
-@option{path} specifies the path to be exported. @option{path} is required.
-
-@option{security_model} specifies the security model to be followed.
-@option{security_model} is required.
-
-@option{cache} specifies whether to skip the host page cache.
-@option{cache} is an optional argument.
+Define a new file system device. Valid options are:
+@table @option
+@item @var{fsdriver}
+This option specifies the fs driver backend to use.
+Currently "local" and "handle" file system drivers are supported.
+@item id=@var{id}
+Specifies identifier for this device
+@item path=@var{path}
+Specifies the export path for the file system device. Files under
+this path will be available to the 9p client on the guest.
+@item security_model=@var{security_model}
+Specifies the security model to be used for this export path.
+Supported security models are "passthrough", "mapped" and "none".
+In "passthrough" security model, files are stored using the same
+credentials as they are created on the guest. This requires qemu
+to run as root. In "mapped" security model, some of the file
+attributes like uid, gid, mode bits and link target are stored as
+file attributes. Directories exported by this security model cannot
+interact with other unix tools. "none" security model is same as
+passthrough except the server won't report failures if it fails to
+set file attributes like ownership.
+@item cache=@var{cache}
+This is an optional argument. The only supported value is "writethrough".
+This means that host page cache will be used to read and write data but
+write notification will be sent to the guest only when the data has been
+reported as written by the storage subsystem.
+@end table
 
+-fsdev option is used along with -device driver "virtio-9p-pci".
+@item -device virtio-9p-pci,fsdev=@var{id},mount_tag=@var{mount_tag}
+Options for virtio-9p-pci driver are:
+@table @option
+@item fsdev=@var{id}
+Specifies the id value specified along with -fsdev option
+@item mount_tag=@var{mount_tag}
+Specifies the tag name to be used by the guest to mount this export point
 @end table
+
 ETEXI
 
+DEFHEADING()
+
 DEFHEADING(Virtual File system pass-through options:)
 
 DEF("virtfs", HAS_ARG, QEMU_OPTION_virtfs,
@@ -568,34 +586,35 @@ DEF("virtfs", HAS_ARG, QEMU_OPTION_virtfs,
 
 STEXI
 
-The general form of a Virtual File system pass-through option is:
-@table @option
-
-@item -virtfs @var{fstype} [,@var{options}]
+@item -virtfs @var{fsdriver},path=@var{path},mount_tag=@var{mount_tag},security_model=@var{security_model}[,cache=@var{cache}]
 @findex -virtfs
-Fstype is one of:
-@option{local},
-The specific Fstype will determine the applicable options.
-
-Options to each backend are described below.
-
-@item -virtfs local ,path=@var{path} ,mount_tag=@var{mount_tag} ,security_model=@var{security_model}[,cache=@var{cache}]
-
-Create a Virtual file-system-pass through for local-filesystem.
-
-@option{local} is only available on Linux.
-
-@option{path} specifies the path to be exported. @option{path} is required.
-
-@option{security_model} specifies the security model to be followed.
-@option{security_model} is required.
-
-@option{mount_tag} specifies the tag with which the exported file is mounted.
-@option{mount_tag} is required.
-
-@option{cache} specifies whether to skip the host page cache.
-@option{cache} is an optional argument.
 
+The general form of a Virtual File system pass-through options are:
+@table @option
+@item @var{fsdriver}
+This option specifies the fs driver backend to use.
+Currently "local" and "handle" file system drivers are supported.

[Qemu-devel] [PATCH 1/2] hw/9pfs: Add new virtfs option cache=writethrough to skip host page cache

2011-10-06 Thread Aneesh Kumar K.V

cache=writethrough implies that files are opened on the host with the O_SYNC open flag

Signed-off-by: Aneesh Kumar K.V 
---
 fsdev/file-op-9p.h |1 +
 fsdev/qemu-fsdev.c |   10 --
 fsdev/qemu-fsdev.h |2 ++
 hw/9pfs/virtio-9p-device.c |5 +
 hw/9pfs/virtio-9p.c|   24 ++--
 qemu-config.c  |6 ++
 qemu-options.hx|   17 -
 vl.c   |6 ++
 8 files changed, 58 insertions(+), 13 deletions(-)

diff --git a/fsdev/file-op-9p.h b/fsdev/file-op-9p.h
index 8de8abf..5d088d4 100644
--- a/fsdev/file-op-9p.h
+++ b/fsdev/file-op-9p.h
@@ -59,6 +59,7 @@ typedef struct FsContext
 char *fs_root;
 SecModel fs_sm;
 uid_t uid;
+int open_flags;
 struct xattr_operations **xops;
 /* fs driver specific data */
 void *private;
diff --git a/fsdev/qemu-fsdev.c b/fsdev/qemu-fsdev.c
index 768819f..fce016b 100644
--- a/fsdev/qemu-fsdev.c
+++ b/fsdev/qemu-fsdev.c
@@ -34,6 +34,8 @@ int qemu_fsdev_add(QemuOpts *opts)
 const char *fstype = qemu_opt_get(opts, "fstype");
 const char *path = qemu_opt_get(opts, "path");
 const char *sec_model = qemu_opt_get(opts, "security_model");
+const char *cache = qemu_opt_get(opts, "cache");
+
 
 if (!fsdev_id) {
 fprintf(stderr, "fsdev: No id specified\n");
@@ -72,10 +74,14 @@ int qemu_fsdev_add(QemuOpts *opts)
 fsle->fse.path = g_strdup(path);
 fsle->fse.security_model = g_strdup(sec_model);
 fsle->fse.ops = FsTypes[i].ops;
-
+fsle->fse.cache_model = 0;
+if (cache) {
+if (!strcmp(cache, "writethrough")) {
+fsle->fse.cache_model = V9FS_WRITETHROUGH_CACHE;
+}
+}
 QTAILQ_INSERT_TAIL(&fstype_entries, fsle, next);
 return 0;
-
 }
 
 FsTypeEntry *get_fsdev_fsentry(char *id)
diff --git a/fsdev/qemu-fsdev.h b/fsdev/qemu-fsdev.h
index e04931a..4e53966 100644
--- a/fsdev/qemu-fsdev.h
+++ b/fsdev/qemu-fsdev.h
@@ -34,6 +34,7 @@ typedef struct FsTypeTable {
 FileOperations *ops;
 } FsTypeTable;
 
+#define V9FS_WRITETHROUGH_CACHE 0x1
 /*
  * Structure to store the various fsdev's passed through command line.
  */
@@ -41,6 +42,7 @@ typedef struct FsTypeEntry {
 char *fsdev_id;
 char *path;
 char *security_model;
+int cache_model;
 FileOperations *ops;
 } FsTypeEntry;
 
diff --git a/hw/9pfs/virtio-9p-device.c b/hw/9pfs/virtio-9p-device.c
index e5b68da..a267f01 100644
--- a/hw/9pfs/virtio-9p-device.c
+++ b/hw/9pfs/virtio-9p-device.c
@@ -115,6 +115,11 @@ VirtIODevice *virtio_9p_init(DeviceState *dev, V9fsConf *conf)
 exit(1);
 }
 
+if (fse->cache_model & V9FS_WRITETHROUGH_CACHE) {
+s->ctx.open_flags = O_SYNC;
+} else {
+s->ctx.open_flags = 0;
+}
 s->ctx.fs_root = g_strdup(fse->path);
 len = strlen(conf->tag);
 if (len > MAX_TAG_LEN) {
diff --git a/hw/9pfs/virtio-9p.c b/hw/9pfs/virtio-9p.c
index c01c31a..1ca3c8e 100644
--- a/hw/9pfs/virtio-9p.c
+++ b/hw/9pfs/virtio-9p.c
@@ -80,6 +80,22 @@ void cred_init(FsCred *credp)
 credp->fc_rdev = -1;
 }
 
+static int get_dotl_openflags(V9fsState *s, int oflags)
+{
+int flags;
+/*
+ * Filter the client open flags
+ */
+flags = s->ctx.open_flags;
+flags |= oflags;
+flags &= ~(O_NOCTTY | O_ASYNC | O_CREAT);
+/*
+ * Ignore direct disk access hint until the server supports it.
+ */
+flags &= ~O_DIRECT;
+return flags;
+}
+
 void v9fs_string_init(V9fsString *str)
 {
 str->data = NULL;
@@ -1598,10 +1614,7 @@ static void v9fs_open(void *opaque)
 err = offset;
 } else {
 if (s->proto_version == V9FS_PROTO_2000L) {
-flags = mode;
-flags &= ~(O_NOCTTY | O_ASYNC | O_CREAT);
-/* Ignore direct disk access hint until the server supports it. */
-flags &= ~O_DIRECT;
+flags = get_dotl_openflags(s, mode);
 } else {
 flags = omode_to_uflags(mode);
 }
@@ -1650,8 +1663,7 @@ static void v9fs_lcreate(void *opaque)
 goto out_nofid;
 }
 
-/* Ignore direct disk access hint until the server supports it. */
-flags &= ~O_DIRECT;
+flags = get_dotl_openflags(pdu->s, flags);
 err = v9fs_co_open2(pdu, fidp, &name, gid,
 flags | O_CREAT, mode, &stbuf);
 if (err < 0) {
diff --git a/qemu-config.c b/qemu-config.c
index 7a7854f..b2ab0b2 100644
--- a/qemu-config.c
+++ b/qemu-config.c
@@ -177,6 +177,9 @@ QemuOptsList qemu_fsdev_opts = {
 }, {
 .name = "security_model",
 .type = QEMU_OPT_STRING,
+}, {
+.name = "cache",
+.type = QEMU_OPT_STRING,
 },
 { /*End of list */ }
 },
@@ -199,6 +202,9 @@ QemuOptsList qemu_virtfs_opts = {
 }, {
 .name = "security_model",
 .type = QEMU_OPT_STRING,
+}, {
+.name = "cache",
+.type = QEMU_OPT_STRING,
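
For readers following the flag handling in this patch, the idea can be
sketched standalone as below. This is only an illustration under stated
assumptions, not the QEMU code: SKETCH_WRITETHROUGH_CACHE,
export_open_flags() and filter_open_flags() are hypothetical names standing
in for V9FS_WRITETHROUGH_CACHE, the open_flags setup in virtio_9p_init() and
get_dotl_openflags(). The #ifdef guards are an addition so the sketch builds
where O_ASYNC or O_DIRECT are not exposed.

```c
#include <fcntl.h>

/* Hypothetical stand-in for the V9FS_WRITETHROUGH_CACHE bit in the patch. */
#define SKETCH_WRITETHROUGH_CACHE 0x1

/* Flags the export adds to every open: O_SYNC for cache=writethrough. */
int export_open_flags(int cache_model)
{
    return (cache_model & SKETCH_WRITETHROUGH_CACHE) ? O_SYNC : 0;
}

/* Merge the export flags with the client's flags, then drop the flags
 * the server does not want to honour. */
int filter_open_flags(int export_flags, int client_flags)
{
    int flags = export_flags | client_flags;

    flags &= ~(O_NOCTTY | O_CREAT);
#ifdef O_ASYNC
    flags &= ~O_ASYNC;
#endif
#ifdef O_DIRECT
    /* Ignore the direct disk access hint, as the patch does. */
    flags &= ~O_DIRECT;
#endif
    return flags;
}
```

With this shape, a client open with O_RDWR | O_CREAT on a writethrough
export ends up as O_RDWR | O_SYNC; the create path would add O_CREAT back
explicitly, as v9fs_lcreate() does in the diff.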

Re: [Qemu-devel] [PATCH 4/5] savevm: improve subsections detection on load

2011-10-06 Thread Paolo Bonzini

On 10/07/2011 12:42 AM, Juan Quintela wrote:

>  This changes semantics for reads above 32KB.  It should be in the
>  commit message, or preferably v1 could be committed instead.:)

how it changes?  My understanding is that we read the same, only change
that I can think of is the one that I have just shown (and that is on
the error case).

Yes, you're right.

Paolo

[Qemu-devel] [PATCH] hw/9pfs: Use ioeventfd for 9p

2011-10-06 Thread Aneesh Kumar K.V

With ioeventfd:
[root@qemu-img-64 storage]# dd if=/dev/zero of=/storage/testx bs=8k count=131072 oflag=direct
131072+0 records in
131072+0 records out
1073741824 bytes (1.1 GB) copied, 26.767 s, 40.1 MB/s

Without:
[root@qemu-img-64 storage]# dd if=/dev/zero of=/storage/testx bs=8k count=131072 oflag=direct
131072+0 records in
131072+0 records out
1073741824 bytes (1.1 GB) copied, 65.3361 s, 16.4 MB/s

Signed-off-by: Aneesh Kumar K.V 
---
 hw/9pfs/virtio-9p-device.c |2 ++
 hw/virtio-pci.c|5 -
 hw/virtio-pci.h|5 +
 3 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/hw/9pfs/virtio-9p-device.c b/hw/9pfs/virtio-9p-device.c
index 513e181..e5b68da 100644
--- a/hw/9pfs/virtio-9p-device.c
+++ b/hw/9pfs/virtio-9p-device.c
@@ -169,6 +169,8 @@ static PCIDeviceInfo virtio_9p_info = {
 .revision  = VIRTIO_PCI_ABI_VERSION,
 .class_id  = 0x2,
 .qdev.props = (Property[]) {
+DEFINE_PROP_BIT("ioeventfd", VirtIOPCIProxy, flags,
+VIRTIO_PCI_FLAG_USE_IOEVENTFD_BIT, true),
 DEFINE_PROP_UINT32("vectors", VirtIOPCIProxy, nvectors, 2),
 DEFINE_VIRTIO_COMMON_FEATURES(VirtIOPCIProxy, host_features),
 DEFINE_PROP_STRING("mount_tag", VirtIOPCIProxy, fsconf.tag),
diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c
index df27c19..ca5923c 100644
--- a/hw/virtio-pci.c
+++ b/hw/virtio-pci.c
@@ -83,11 +83,6 @@
 /* Flags track per-device state like workarounds for quirks in older guests. */
 #define VIRTIO_PCI_FLAG_BUS_MASTER_BUG  (1 << 0)
 
-/* Performance improves when virtqueue kick processing is decoupled from the
- * vcpu thread using ioeventfd for some devices. */
-#define VIRTIO_PCI_FLAG_USE_IOEVENTFD_BIT 1
-#define VIRTIO_PCI_FLAG_USE_IOEVENTFD   (1 << VIRTIO_PCI_FLAG_USE_IOEVENTFD_BIT)
-
 /* QEMU doesn't strictly need write barriers since everything runs in
  * lock-step.  We'll leave the calls to wmb() in though to make it obvious for
  * KVM or if kqemu gets SMP support.
diff --git a/hw/virtio-pci.h b/hw/virtio-pci.h
index 14c10f7..f8404de 100644
--- a/hw/virtio-pci.h
+++ b/hw/virtio-pci.h
@@ -18,6 +18,11 @@
 #include "virtio-net.h"
 #include "virtio-serial.h"
 
+/* Performance improves when virtqueue kick processing is decoupled from the
+ * vcpu thread using ioeventfd for some devices. */
+#define VIRTIO_PCI_FLAG_USE_IOEVENTFD_BIT 1
+#define VIRTIO_PCI_FLAG_USE_IOEVENTFD   (1 << VIRTIO_PCI_FLAG_USE_IOEVENTFD_BIT)
+
 typedef struct {
 PCIDevice pci_dev;
 VirtIODevice *vdev;
-- 
1.7.4.1

[Qemu-devel] [PATCH] qemu-char: Fix use of free() instead of g_free()

2011-10-06 Thread Stefan Weil

cppcheck reported these errors:

qemu-char.c:1667: error: Mismatching allocation and deallocation: s
qemu-char.c:1668: error: Mismatching allocation and deallocation: chr
qemu-char.c:1769: error: Mismatching allocation and deallocation: s
qemu-char.c:1770: error: Mismatching allocation and deallocation: chr

Signed-off-by: Stefan Weil 
---
 qemu-char.c |8 
 1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/qemu-char.c b/qemu-char.c
index 09d2309..e1b2b87 100644
--- a/qemu-char.c
+++ b/qemu-char.c
@@ -1664,8 +1664,8 @@ static int qemu_chr_open_win(QemuOpts *opts, CharDriverState **_chr)
 chr->chr_close = win_chr_close;
 
 if (win_chr_init(chr, filename) < 0) {
-free(s);
-free(chr);
+g_free(s);
+g_free(chr);
 return -EIO;
 }
 qemu_chr_generic_open(chr);
@@ -1766,8 +1766,8 @@ static int qemu_chr_open_win_pipe(QemuOpts *opts, CharDriverState **_chr)
 chr->chr_close = win_chr_close;
 
 if (win_chr_pipe_init(chr, filename) < 0) {
-free(s);
-free(chr);
+g_free(s);
+g_free(chr);
 return -EIO;
 }
 qemu_chr_generic_open(chr);
-- 
1.7.2.5

[Qemu-devel] [PATCH] block/qcow: Fix use of free() instead of g_free()

2011-10-06 Thread Stefan Weil

cppcheck reported this error:

qemu/block/qcow.c:599: error: Mismatching allocation and deallocation: cluster_data

Signed-off-by: Stefan Weil 
---
 block/qcow.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/block/qcow.c b/block/qcow.c
index c8bfecc..eba5a04 100644
--- a/block/qcow.c
+++ b/block/qcow.c
@@ -596,7 +596,7 @@ static int qcow_co_writev(BlockDriverState *bs, int64_t sector_num,
 if (qiov->niov > 1) {
 qemu_vfree(orig_buf);
 }
-free(cluster_data);
+g_free(cluster_data);
 
 return ret;
 }
-- 
1.7.2.5

Re: [Qemu-devel] QEMU + ARMMP11Core combination does not work

2011-10-06 Thread TusharK

Hi Peter,

Thanks for the reply.

I am asking for your help because I am rather stuck: the kernel image does not
get executed by QEMU. I tried single-stepping, but I could not debug the initial
assembly code, and I do not know why; my breakpoint is never hit. I would like
to use the ARM11MPCore because my software will run on an ARM11MPCore when the
hardware is available.

Can you please help me resolve this? If you suspect a problem with the
configuration, can you please share a configuration file that you have tested
with kernel 2.6.39.3, or let me know which changes to the kernel are mandatory?
I used the default configuration that ships with the kernel itself, i.e.
realview-smp-defconfig under arch/arm/configs. I did not make any changes to
this configuration and expected it to work.

Can you please comment on this and let me know how I can proceed? It would be
good if you could also try this at your end.

Thanks & Regards,

Tushar

On Thu, 06 Oct 2011 13:43:52 +0530  wrote

>On 6 October 2011 04:43, TusharK  wrote:

> (1) Does your kernel boot on the real hardware?

> I do not have real hardware to test my kernel. But what I did was, I
> downloaded pre-built kernel image from
> http://code.google.com/p/smp-on-qemu/downloads/list website and
> tried to run using QEMU, it boots but my kernel 2.6.39.3 does not boot.

If somebody else's kernel boots but yours does not then the chances
are very high that there is a problem with your kernel (probably
a wrong config) which you'll need to debug the same way you'd
debug this kind of misconfiguration on real hardware. Connecting
an ARM gdb up to qemu and singlestepping kernel startup may be
helpful.

> Even decompressing kernel print itself is not coming and hence
> I suspect something is wrong with either kernel or QEMU.

If there's no output of the "Uncompressing the kernel" message
this is almost certainly a kernel configuration or compilation
problem -- QEMU's serial port code is pretty heavily tested.

(Why are you using the 11MPCore model anyway, just out of interest?)

-- PMM

Re: [Qemu-devel] [PATCH] qemu: new option for snapshot_blkdev to avoid image creation

2011-10-06 Thread Dor Laor


On 10/03/2011 06:09 PM, Federico Simoncelli wrote:

Add the new option [-n] for snapshot_blkdev to avoid the image creation.
The file provided as [new-image-file] is considered as already initialized
and will be used after passing a check for the backing file.


Seems ok to me as a way to go around fdget and still have selinux gain.
Worth getting Kevin's view too.

Federico, would you like to ack or extend the design:
http://wiki.qemu.org/Features/Snapshots



Signed-off-by: Federico Simoncelli
---
  blockdev.c  |   54 --
  hmp-commands.hx |7 ---
  qmp-commands.hx |4 ++--
  3 files changed, 58 insertions(+), 7 deletions(-)

diff --git a/blockdev.c b/blockdev.c
index 0827bf7..bd46808 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -550,8 +550,53 @@ void do_commit(Monitor *mon, const QDict *qdict)
  }
  }

+static int check_snapshot_file(const char *filename, const char *oldfilename,
+   int flags, BlockDriver *drv)
+{
+BlockDriverState *bs;
+char bak_filename[1024], *abs_filename;
+int ret = 0;
+
+bs = bdrv_new("");
+if (!bs) {
+return -1;
+}
+
+ret = bdrv_open(bs, filename, flags, drv);
+if (ret) {
+qerror_report(QERR_OPEN_FILE_FAILED, filename);
+goto err0;
+}
+
+if (bs->backing_file) {
+path_combine(bak_filename, sizeof(bak_filename),
+ filename, bs->backing_file);
+
+abs_filename = realpath(bak_filename, NULL);
+if (!abs_filename) {
+ret = -1;
+goto err1;
+}
+
+if (strcmp(abs_filename, oldfilename)) {
+qerror_report(QERR_OPEN_FILE_FAILED, filename);
+ret = -1;
+}
+
+free(abs_filename);
+}
+
+err1:
+bdrv_close(bs);
+
+err0:
+bdrv_delete(bs);
+return ret;
+}
+
  int do_snapshot_blkdev(Monitor *mon, const QDict *qdict, QObject **ret_data)
  {
+const int nocreate = qdict_get_try_bool(qdict, "nocreate", 0);
  const char *device = qdict_get_str(qdict, "device");
  const char *filename = qdict_get_try_str(qdict, "snapshot-file");
  const char *format = qdict_get_try_str(qdict, "format");
@@ -597,8 +642,13 @@ int do_snapshot_blkdev(Monitor *mon, const QDict *qdict, QObject **ret_data)
  goto out;
  }

-ret = bdrv_img_create(filename, format, bs->filename,
-  bs->drv->format_name, NULL, -1, flags);
+if (nocreate) {
+ret = check_snapshot_file(filename, bs->filename, flags, drv);
+} else {
+ret = bdrv_img_create(filename, format, bs->filename,
+  bs->drv->format_name, NULL, -1, flags);
+}
+
  if (ret) {
  goto out;
  }
diff --git a/hmp-commands.hx b/hmp-commands.hx
index 9e1cca8..eb9fcd4 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -840,11 +840,12 @@ ETEXI

  {
  .name   = "snapshot_blkdev",
-.args_type  = "device:B,snapshot-file:s?,format:s?",
-.params = "device [new-image-file] [format]",
+.args_type  = "nocreate:-n,device:B,snapshot-file:s?,format:s?",
+.params = "[-n] device [new-image-file] [format]",
  .help   = "initiates a live snapshot\n\t\t\t"
"of device. If a new image file is specified, the\n\t\t\t"
-  "new image file will become the new root image.\n\t\t\t"
+  "new image file will be created (unless -n is\n\t\t\t"
+  "specified) and will become the new root image.\n\t\t\t"
"If format is specified, the snapshot file will\n\t\t\t"
"be created in that format. Otherwise the\n\t\t\t"
"snapshot will be internal! (currently unsupported)",
diff --git a/qmp-commands.hx b/qmp-commands.hx
index d83bce5..7af36d8 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -695,8 +695,8 @@ EQMP

  {
  .name   = "blockdev-snapshot-sync",
-.args_type  = "device:B,snapshot-file:s?,format:s?",
-.params = "device [new-image-file] [format]",
+.args_type  = "nocreate:-n,device:B,snapshot-file:s?,format:s?",
+.params = "[-n] device [new-image-file] [format]",
  .user_print = monitor_user_noop,
  .mhandler.cmd_new = do_snapshot_blkdev,
  },

Re: [Qemu-devel] [PATCH 4/5] savevm: improve subsections detection on load

2011-10-06 Thread Juan Quintela

Paolo Bonzini  wrote:
> On 10/06/2011 06:21 PM, Juan Quintela wrote:
>> +
>> +int qemu_get_buffer(QEMUFile *f, uint8_t *buf, int size)
>> +{
>> +int pending = size;
>> +int done = 0;
>> +
>> +while (pending>  0) {
>> +int res;
>> +
>> +res = qemu_peek_buffer(f, buf, pending, 0);
>> +if (res == 0) {
>> +return 0;

Should this line return "done" instead?

>>   }
>> -memcpy(buf, f->buf + f->buf_index, l);
>> -f->buf_index += l;
>> -buf += l;
>> -size -= l;
>> +qemu_file_skip(f, res);
>> +buf += res;
>> +pending -= res;
>> +done += res;
>>   }
>> -return size1 - size;
>> +return done;
>>   }
>
> This changes semantics for reads above 32KB.  It should be in the
> commit message, or preferably v1 could be committed instead. :)

how it changes?  My understanding is that we read the same, only change
that I can think of is the one that I have just shown (and that is on
the error case).

Later, Juan.
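
To make the short-read question concrete, here is a toy version of the
peek-then-skip loop. It is a sketch only: ToyFile, toy_peek_buffer(),
toy_skip() and toy_get_buffer() are hypothetical stand-ins for QEMUFile,
qemu_peek_buffer(), qemu_file_skip() and qemu_get_buffer(), and the "chunk"
field is an assumption used to mimic a bounded internal buffer. This variant
returns the bytes read so far when the stream runs dry, which is what the
inline question above suggests; it makes no claim about what was finally
committed.

```c
#include <stddef.h>
#include <string.h>

/* Toy in-memory stream; a hypothetical stand-in for QEMUFile. */
typedef struct ToyFile {
    const unsigned char *data;
    size_t len;    /* total bytes in the stream */
    size_t pos;    /* current read position */
    size_t chunk;  /* most bytes a single peek may return */
} ToyFile;

/* Copy up to size bytes without consuming them; returns bytes copied. */
size_t toy_peek_buffer(ToyFile *f, unsigned char *buf, size_t size)
{
    size_t avail = f->len - f->pos;
    size_t n = size < avail ? size : avail;

    if (n > f->chunk) {
        n = f->chunk;
    }
    memcpy(buf, f->data + f->pos, n);
    return n;
}

/* Consume n bytes that were previously peeked. */
void toy_skip(ToyFile *f, size_t n)
{
    f->pos += n;
}

/* Read loop built from peek + skip. On a short stream it returns the
 * number of bytes read so far ("done") rather than 0. */
size_t toy_get_buffer(ToyFile *f, unsigned char *buf, size_t size)
{
    size_t pending = size;
    size_t done = 0;

    while (pending > 0) {
        size_t res = toy_peek_buffer(f, buf, pending);

        if (res == 0) {
            return done;
        }
        toy_skip(f, res);
        buf += res;
        pending -= res;
        done += res;
    }
    return done;
}
```

With a 10-byte stream and a 4-byte chunk, a 6-byte read takes two loop
iterations, and a following 8-byte read returns the remaining 4 bytes
instead of 0.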

Re: [Qemu-devel] [PATCH 3/5] savevm: define qemu_get_byte() using qemu_peek_byte()

2011-10-06 Thread Juan Quintela

Paolo Bonzini  wrote:
> On 10/06/2011 06:21 PM, Juan Quintela wrote:
>> +result = qemu_peek_byte(f);
>> +
>> +if (f->buf_index<  f->buf_size) {
>> +f->buf_index++;
>>   }
>
> This should really be an assert that f->buf_index < f->buf_size,
> otherwise qemu_peek_byte has read garbage.

That is a change from current behaviour.  qemu_get_byte() returns 0 in
the case that there is nothing to read.  Yes, it is ugly.

Later, Juan.

Re: [Qemu-devel] qemu guest agent spins in poll/nanosleep(100ms) when nothing is listening on host

2011-10-06 Thread Michael Roth

On Thu, 6 Oct 2011 12:31:05 +0100, "Daniel P. Berrange"  wrote:
> I've been doing some experimentation with the QEMU guest agent and have
> noticed that when nothing is connected on the host side of the virtio
> serial channel, the guest agent just spins in a poll/sleep(100ms) loop.
> I know you'd ordinarily expect some mgmt app in the host to be listening
> to the other end of the channel, but it still seems suboptimal to have
> to spin in a loop like this when nothing is listening, constantly causing
> wakeups in an otherwise idle guest.
> 
> Looking at the qemu-ga.c code I see two places where it might handle
> a poll event and then sleep, when nothing is on the other end of the
> virtio serial socket.
> 
> 
>case G_IO_STATUS_AGAIN:
> /* virtio causes us to spin here when no process is attached to
>  * host-side chardev. sleep a bit to mitigate this
>  */
> if (s->virtio) {
> usleep(100*1000);
> }
> return true;
> 
>
> 
> 
> } else if (strcmp(s->method, "virtio-serial") == 0) {
> /* we spin on EOF for virtio-serial, so back off a bit. also,
>  * dont close the connection in this case, it'll resume normal
>  * operation when another process connects to host chardev
>  */
> usleep(100*1000);
> goto out_noclose;
> }
> 
> 
> I get the feeling that this kind of problem is inherent in the use of any
> virtio-serial channel, in the same way you can't detect EOF for a regular
> serial device channel either. Given that virtio-serial is a nice paravirt
> device, is there anything we can do to it, to allow better handling of
> EOF by applications ?

Indeed, and there was a discussion a while back where I think we had tentative
agreement on a path forward for this. Unfortunately there doesn't seem to be
a clear solution for doing it purely in guest-userspace:

http://www.mail-archive.com/qemu-devel@nongnu.org/msg57002.html

The gist of it is basically making the (guest-side) virtio-serial chardev
behave more like a unix socket, i.e. if the host hangs up you get a single EOF
and then your FD becomes invalid, at which point you need to re-open the
chardev to get a valid FD. This could potentially be done via a new set of
-chardev/-device flags.

> 
> Or perhaps there is some way to make use of epoll() in edge-triggered
> mode to detect it already, because IIUC, edge-triggered mode should only
> fire once for the EOF condition, and then not fire again until something
> in the host actually sends some data ?
> 
> Of course glib's event loop doesn't support edge-triggered events/epoll,
> but perhaps we could just call epoll() directly in the event handler,
> instead of the usleep() call ?

That's definitely worth looking into. Has the 100ms sleep been causing any
issues though? My main concern with the polling behavior was less a matter of
performance than being able to provide a "session" where the start and end
of a stream could be reliably determined, which we don't have currently. But
the guest agent has since been reworked to persist state between host
connects/disconnects so it didn't seem to be a major issue anymore.
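
For anyone who wants to experiment with the edge-triggered idea, a minimal
sketch of the epoll side is below. The helper names et_watch() and et_wait()
are hypothetical, and the sketch has only been reasoned about for ordinary
descriptors such as pipes; whether a virtio-serial port reports EOF this way
is exactly the open question in this thread, so treat that part as an
untested assumption.

```c
#include <sys/epoll.h>
#include <unistd.h>

/* Register fd for edge-triggered read events on a new epoll instance.
 * Returns the epoll fd, or -1 on error. */
int et_watch(int fd)
{
    struct epoll_event ev;
    int epfd = epoll_create1(0);

    if (epfd < 0) {
        return -1;
    }
    ev.events = EPOLLIN | EPOLLET;
    ev.data.fd = fd;
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) < 0) {
        close(epfd);
        return -1;
    }
    return epfd;
}

/* Wait up to timeout_ms for one event. Returns the number of ready
 * events (0 or 1), or -1 on error. */
int et_wait(int epfd, int timeout_ms)
{
    struct epoll_event ev;

    return epoll_wait(epfd, &ev, 1, timeout_ms);
}
```

On a pipe, one write makes et_wait() return 1 once; a second et_wait()
without new activity returns 0 even though the data is still unread. That
is the "fire once" behaviour the proposal would rely on in place of the
usleep() back-off.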


> 
> Regards,
> Daniel
> -- 
> |: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
> |: http://libvirt.org  -o- http://virt-manager.org :|
> |: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
> |: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|
> 

-- 
Sincerely,
Mike Roth
IBM Linux Technology Center

Re: [Qemu-devel] [PATCH 1/4] Add basic version of bridge helper

2011-10-06 Thread Corey Bryant




On 10/06/2011 02:04 PM, Anthony Liguori wrote:

On 10/06/2011 11:41 AM, Daniel P. Berrange wrote:

On Thu, Oct 06, 2011 at 11:38:25AM -0400, Richa Marwaha wrote:

This patch adds a helper that can be used to create a tap device attached to
a bridge device. Since this helper is minimal in what it does, it can be
given CAP_NET_ADMIN which allows qemu to avoid running as root while still
satisfying the majority of what users tend to want to do with tap devices.

The way this all works is that qemu launches this helper passing a bridge
name and the name of an inherited file descriptor. The descriptor is one
end of a socketpair() of domain sockets. This domain socket is used to
transmit a file descriptor of the opened tap device from the helper to qemu.

The helper can then exit and let qemu use the tap device.
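
The descriptor hand-off described above is the standard SCM_RIGHTS
mechanism on Unix domain sockets. A minimal sketch of both ends follows.
send_fd() and recv_fd() are hypothetical names chosen for illustration, not
functions from the patch, and the one-byte payload is an assumption of this
sketch (some data must accompany the ancillary message on a stream socket).

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Send fd over a Unix domain socket as SCM_RIGHTS ancillary data.
 * Returns 0 on success, -1 on error. */
int send_fd(int sock, int fd)
{
    char byte = 0;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        struct cmsghdr align;               /* forces correct alignment */
        char buf[CMSG_SPACE(sizeof(int))];
    } u;
    struct msghdr msg;
    struct cmsghdr *cmsg;

    memset(&msg, 0, sizeof(msg));
    memset(&u, 0, sizeof(u));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof(u.buf);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one descriptor sent by send_fd().
 * Returns the new fd, or -1 on error. */
int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        struct cmsghdr align;
        char buf[CMSG_SPACE(sizeof(int))];
    } u;
    struct msghdr msg;
    struct cmsghdr *cmsg;
    int fd;

    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof(u.buf);

    if (recvmsg(sock, &msg, 0) <= 0) {
        return -1;
    }
    cmsg = CMSG_FIRSTHDR(&msg);
    if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS ||
        cmsg->cmsg_len != CMSG_LEN(sizeof(int))) {
        return -1;
    }
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}
```

In the scheme described here the helper would call something like
send_fd() with the tap descriptor on its inherited socket end, and qemu (or
libvirt, as suggested further down) would call recv_fd() on the other end.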


When QEMU is run by libvirt, we generally like to use capng to
remove the ability for QEMU to run setuid programs at all. So
obviously it will struggle to run the qemu-bridge-helper binary
in such a scenario.

With the way you transmit the TAP device FD back to the caller,
it looks like libvirt itself could execute the qemu-bridge-helper
receiving the FD, and then pass the FD onto QEMU using the
traditional tap,fd=XX syntax.


Exactly. This would allow tap-based networking using libvirt session://
URIs.



I'll take note of this.  It seems like it would be a nice future 
addition to libvirt.


A slight tangent, but a point on DAC isolation.  The helper enables DAC 
isolation for qemu:///session but we still need some work in libvirt to 
provide DAC isolation for qemu:///system.  This could be done by 
allowing management applications to specify custom user/group IDs when 
creating guests rather than hard coding the IDs in the configuration file.




The TAP device FD is only one FD we normally pass to QEMU. How about
support for vhost net ? Is it reasonable to ask the qemu-bridge-helper
to send back a vhost net FD also.


Absolutely.


Or indeed multiple vhost net FDs
when we get multiqueue NICs. Should we expect the bridge helper to
be strictly limited to just connecting a TAP dev to a bridge, or is
the expectation that it will grow more& more functionality over
time ?


I would not expect it to do more than create virtual network interfaces,
and add them to bridges. Multiqueue virtual nics, vhost, etc. would all
be in scope as they are part of creating a virtual network interface.

Creating the bridges and managing the bridges should be done statically
by an administrator and would be out of scope.

Regards,

Anthony Liguori



Daniel




--
Regards,
Corey

Re: [Qemu-devel] [PATCH 4/4] Add support for bridge

2011-10-06 Thread Corey Bryant




On 10/06/2011 02:19 PM, Anthony Liguori wrote:

On 10/06/2011 01:15 PM, Corey Bryant wrote:



On 10/06/2011 01:49 PM, Anthony Liguori wrote:

On 10/06/2011 10:38 AM, Richa Marwaha wrote:

The most common use of -net tap is to connect a tap device to a bridge. This
requires the use of a script and running qemu as root in order to allocate a
tap device to pass to the script.

This model is great for portability and flexibility but it's incredibly
difficult to eliminate the need to run qemu as root. The only really viable
mechanism is to use tunctl to create a tap device, attach it to a bridge as
root, and then hand that tap device to qemu. The problem with this mechanism
is that it requires administrator intervention whenever a user wants to create
a guest.

By essentially writing a helper that implements the most common qemu-ifup
script that can be safely given cap_net_admin, we can dramatically simplify
things for non-privileged users. We still support existing -net tap options
as a mechanism for advanced users and backwards compatibility.

Currently, this is very Linux centric but there's really no reason why it
couldn't be extended for other Unixes.

The default bridge that we attach to is qemubr0. The thinking is that a distro
could preconfigure such an interface to allow out-of-the-box bridged
networking.

Alternatively, if a user wants to use a different bridge, they can say:

qemu -hda linux.img -net tap,br=br0,helper=/usr/local/libexec/qemu-bridge-helper -net nic,model=virtio



Wouldn't it be better to make the syntax:

-net bridge[,br=BRIDGE][,helper=HELPER]

And default BRIDGE to br0 and HELPER to
${prefix}/libexec/qemu-bridge-helper ?

That gives distros a proper way to configure a default bridge making
-net bridge Just Work for most people.

Regards,

Anthony Liguori



Yes I think it would be much more usable under -net bridge. I really
wanted this
to work under -net tap (where fd and init are) but now we know there's
no good
way to default to the helper without spelling out the path.


I'm certainly in favor of leaving helper as part of -net tap, but I
think there should be a -net bridge in addition.

Regards,

Anthony Liguori


Ok, yes.  The best of both worlds.

--
Regards,
Corey

Re: [Qemu-devel] [PATCH 4/4] Add support for bridge

2011-10-06 Thread Anthony Liguori


On 10/06/2011 01:15 PM, Corey Bryant wrote:



On 10/06/2011 01:49 PM, Anthony Liguori wrote:

On 10/06/2011 10:38 AM, Richa Marwaha wrote:

The most common use of -net tap is to connect a tap device to a
bridge. This
requires the use of a script and running qemu as root in order to
allocate a
tap device to pass to the script.

This model is great for portability and flexibility but it's incredibly
difficult to eliminate the need to run qemu as root. The only really
viable
mechanism is to use tunctl to create a tap device, attach it to a
bridge as
root, and then hand that tap device to qemu. The problem with this
mechanism
is that it requires administrator intervention whenever a user wants
to create
a guest.

By essentially writing a helper that implements the most common qemu-ifup
script that can be safely given cap_net_admin, we can dramatically
simplify
things for non-privileged users. We still support existing -net tap
options
as a mechanism for advanced users and backwards compatibility.

Currently, this is very Linux centric but there's really no reason why it
couldn't be extended for other Unixes.

The default bridge that we attach to is qemubr0. The thinking is that
a distro
could preconfigure such an interface to allow out-of-the-box bridged
networking.

Alternatively, if a user wants to use a different bridge, they can say:

qemu -hda linux.img -net tap,br=br0,helper=/usr/local/libexec/qemu-bridge-helper -net nic,model=virtio



Wouldn't it be better to make the syntax:

-net bridge[,br=BRIDGE][,helper=HELPER]

And default BRIDGE to br0 and HELPER to
${prefix}/libexec/qemu-bridge-helper ?

That gives distros a proper way to configure a default bridge making
-net bridge Just Work for most people.

Regards,

Anthony Liguori



Yes I think it would be much more usable under -net bridge. I really wanted this
to work under -net tap (where fd and init are) but now we know there's no good
way to default to the helper without spelling out the path.


I'm certainly in favor of leaving helper as part of -net tap, but I think there 
should be a -net bridge in addition.


Regards,

Anthony Liguori

Re: [Qemu-devel] [PATCH 4/4] Add support for bridge

2011-10-06 Thread Corey Bryant




On 10/06/2011 01:49 PM, Anthony Liguori wrote:

On 10/06/2011 10:38 AM, Richa Marwaha wrote:

The most common use of -net tap is to connect a tap device to a
bridge. This
requires the use of a script and running qemu as root in order to
allocate a
tap device to pass to the script.

This model is great for portability and flexibility but it's incredibly
difficult to eliminate the need to run qemu as root. The only really
viable
mechanism is to use tunctl to create a tap device, attach it to a
bridge as
root, and then hand that tap device to qemu. The problem with this
mechanism
is that it requires administrator intervention whenever a user wants
to create
a guest.

By essentially writing a helper that implements the most common qemu-ifup
script that can be safely given cap_net_admin, we can dramatically
simplify
things for non-privileged users. We still support existing -net tap
options
as a mechanism for advanced users and backwards compatibility.

Currently, this is very Linux centric but there's really no reason why it
couldn't be extended for other Unixes.

The default bridge that we attach to is qemubr0. The thinking is that
a distro
could preconfigure such an interface to allow out-of-the-box bridged
networking.

Alternatively, if a user wants to use a different bridge, they can say:

qemu -hda linux.img -net tap,br=br0,helper=/usr/local/libexec/qemu-bridge-helper -net nic,model=virtio



Wouldn't it be better to make the syntax:

-net bridge[,br=BRIDGE][,helper=HELPER]

And default BRIDGE to br0 and HELPER to
${prefix}/libexec/qemu-bridge-helper ?

That gives distros a proper way to configure a default bridge making
-net bridge Just Work for most people.

Regards,

Anthony Liguori



Yes I think it would be much more usable under -net bridge.  I really 
wanted this to work under -net tap (where fd and init are) but now we 
know there's no good way to default to the helper without spelling out 
the path.


We'll move to -net bridge if folks are in agreement and default to 
bridge br0.




Signed-off-by: Richa Marwaha
---
configure | 2 +
net.c | 8 +++
net.h | 2 +
net/tap.c | 150 ---
qemu-options.hx | 48 +-
5 files changed, 190 insertions(+), 20 deletions(-)

diff --git a/configure b/configure
index f46e9b7..ef05954 100755
--- a/configure
+++ b/configure
@@ -2775,6 +2775,8 @@ echo "sysconfdir=$sysconfdir">> $config_host_mak
echo "docdir=$docdir">> $config_host_mak
echo "libexecdir=\${prefix}/libexec">> $config_host_mak
echo "confdir=$confdir">> $config_host_mak
+echo "CONFIG_QEMU_SHAREDIR=\"$prefix$datasuffix\"">> $config_host_mak
+echo "CONFIG_QEMU_HELPERDIR=\"$prefix/libexec\"">> $config_host_mak

case "$cpu" in
i386|x86_64|alpha|cris|hppa|ia64|lm32|m68k|microblaze|mips|mips64|ppc|ppc64|s390|s390x|sparc|sparc64|unicore32)

diff --git a/net.c b/net.c
index d05930c..4c3c551 100644
--- a/net.c
+++ b/net.c
@@ -956,6 +956,14 @@ static const struct {
.type = QEMU_OPT_STRING,
.help = "script to shut down the interface",
}, {
+ .name = "br",
+ .type = QEMU_OPT_STRING,
+ .help = "bridge name",
+ }, {
+ .name = "helper",
+ .type = QEMU_OPT_STRING,
+ .help = "command to execute to configure bridge",
+ }, {
.name = "sndbuf",
.type = QEMU_OPT_SIZE,
.help = "send buffer limit"
diff --git a/net.h b/net.h
index 9f633f8..eeb19a7 100644
--- a/net.h
+++ b/net.h
@@ -174,6 +174,8 @@ int do_netdev_del(Monitor *mon, const QDict
*qdict, QObject **ret_data);

#define DEFAULT_NETWORK_SCRIPT "/etc/qemu-ifup"
#define DEFAULT_NETWORK_DOWN_SCRIPT "/etc/qemu-ifdown"
+#define DEFAULT_BRIDGE_HELPER CONFIG_QEMU_HELPERDIR
"/qemu-bridge-helper"
+#define DEFAULT_BRIDGE_INTERFACE "qemubr0"

void qdev_set_nic_properties(DeviceState *dev, NICInfo *nd);

diff --git a/net/tap.c b/net/tap.c
index 1f26dc9..74f103a 100644
--- a/net/tap.c
+++ b/net/tap.c
@@ -388,6 +388,108 @@ static int launch_script(const char
*setup_script, const char *ifname, int fd)
return -1;
}

+static int recv_fd(int c)
+{
+ int fd;
+ uint8_t msgbuf[CMSG_SPACE(sizeof(fd))];
+ struct msghdr msg = {
+ .msg_control = msgbuf,
+ .msg_controllen = sizeof(msgbuf),
+ };
+ struct cmsghdr *cmsg;
+ struct iovec iov;
+ uint8_t req[1];
+ ssize_t len;
+
+ cmsg = CMSG_FIRSTHDR(&msg);
+ cmsg->cmsg_level = SOL_SOCKET;
+ cmsg->cmsg_type = SCM_RIGHTS;
+ cmsg->cmsg_len = CMSG_LEN(sizeof(fd));
+ msg.msg_controllen = cmsg->cmsg_len;
+
+ iov.iov_base = req;
+ iov.iov_len = sizeof(req);
+
+ msg.msg_iov = &iov;
+ msg.msg_iovlen = 1;
+
+ len = recvmsg(c, &msg, 0);
+ if (len > 0) {
+ memcpy(&fd, CMSG_DATA(cmsg), sizeof(fd));
+ return fd;
+ }
+
+ return len;
+}
+
+static int net_bridge_run_helper(const char *helper, const char *bridge)
+{
+ sigset_t oldmask, mask;
+ int pid, status;
+ char *args[5];
+ char **parg;
+ int sv[2];
+
+ sigemptyset(&mask);
+ sigaddset(&mask, SIGCHLD);
+ sigprocmask(SIG_BLOCK, &mask, &oldmask);
+
+ if (socketpair(PF_UNIX, SOCK_STREAM, 0, sv) == -1) {
+ return -1;
+ }
+
+ /* try to launch bridge helper */
+ pid

Re: [Qemu-devel] [PATCH 1/4] Add basic version of bridge helper

2011-10-06 Thread Corey Bryant




On 10/06/2011 01:44 PM, Anthony Liguori wrote:

On 10/06/2011 10:38 AM, Richa Marwaha wrote:

This patch adds a helper that can be used to create a tap device
attached to
a bridge device. Since this helper is minimal in what it does, it can be
given CAP_NET_ADMIN which allows qemu to avoid running as root while
still
satisfying the majority of what users tend to want to do with tap
devices.

The way this all works is that qemu launches this helper passing a bridge
name and the name of an inherited file descriptor. The descriptor is one
end of a socketpair() of domain sockets. This domain socket is used to
transmit a file descriptor of the opened tap device from the helper to
qemu.

The helper can then exit and let qemu use the tap device.

Signed-off-by: Richa Marwaha
---
Makefile | 12 +++-
configure | 1 +
qemu-bridge-helper.c | 205
++
3 files changed, 216 insertions(+), 2 deletions(-)
create mode 100644 qemu-bridge-helper.c

diff --git a/Makefile b/Makefile
index 6ed3194..f2caedc 100644
--- a/Makefile
+++ b/Makefile
@@ -34,6 +34,8 @@ $(call set-vpath, $(SRC_PATH):$(SRC_PATH)/hw)

LIBS+=-lz $(LIBS_TOOLS)

+HELPERS-$(CONFIG_LINUX) = qemu-bridge-helper$(EXESUF)
+
ifdef BUILD_DOCS
DOCS=qemu-doc.html qemu-tech.html qemu.1 qemu-img.1 qemu-nbd.8
QMP/qmp-commands.txt
else
@@ -74,7 +76,7 @@ defconfig:

-include config-all-devices.mak

-build-all: $(DOCS) $(TOOLS) recurse-all
+build-all: $(DOCS) $(TOOLS) $(HELPERS-y) recurse-all

config-host.h: config-host.h-timestamp
config-host.h-timestamp: config-host.mak
@@ -151,6 +153,8 @@ qemu-nbd$(EXESUF): qemu-nbd.o qemu-tool.o
qemu-error.o $(oslib-obj-y) $(trace-ob

qemu-io$(EXESUF): qemu-io.o cmd.o qemu-tool.o qemu-error.o
$(oslib-obj-y) $(trace-obj-y) $(block-obj-y) $(qobject-obj-y)
$(version-obj-y) qemu-timer-common.o

+qemu-bridge-helper$(EXESUF): qemu-bridge-helper.o
+
qemu-img-cmds.h: $(SRC_PATH)/qemu-img-cmds.hx
$(call quiet-command,sh $(SRC_PATH)/scripts/hxtool -h< $< > $@," GEN $@")

@@ -208,7 +212,7 @@ clean:
# avoid old build problems by removing potentially incorrect old files
rm -f config.mak op-i386.h opc-i386.h gen-op-i386.h op-arm.h opc-arm.h
gen-op-arm.h
rm -f qemu-options.def
- rm -f *.o *.d *.a *.lo $(TOOLS) qemu-ga TAGS cscope.* *.pod *~ */*~
+ rm -f *.o *.d *.a *.lo $(TOOLS) $(HELPERS-y) qemu-ga TAGS cscope.*
*.pod *~ */*~
rm -Rf .libs
rm -f slirp/*.o slirp/*.d audio/*.o audio/*.d block/*.o block/*.d
net/*.o net/*.d fsdev/*.o fsdev/*.d ui/*.o ui/*.d qapi/*.o qapi/*.d
qga/*.o qga/*.d
rm -f qemu-img-cmds.h
@@ -275,6 +279,10 @@ install: all $(if $(BUILD_DOCS),install-doc)
install-sysconfig
ifneq ($(TOOLS),)
$(INSTALL_PROG) $(STRIP_OPT) $(TOOLS) "$(DESTDIR)$(bindir)"
endif
+ifneq ($(HELPERS-y),)
+ $(INSTALL_DIR) "$(DESTDIR)$(libexecdir)"
+ $(INSTALL_PROG) $(STRIP_OPT) $(HELPERS-y) "$(DESTDIR)$(libexecdir)"
+endif
ifneq ($(BLOBS),)
$(INSTALL_DIR) "$(DESTDIR)$(datadir)"
set -e; for x in $(BLOBS); do \
diff --git a/configure b/configure
index 59b1494..3e32834 100755
--- a/configure
+++ b/configure
@@ -2742,6 +2742,7 @@ echo "mandir=$mandir">> $config_host_mak
echo "datadir=$datadir">> $config_host_mak
echo "sysconfdir=$sysconfdir">> $config_host_mak
echo "docdir=$docdir">> $config_host_mak
+echo "libexecdir=\${prefix}/libexec">> $config_host_mak
echo "confdir=$confdir">> $config_host_mak

case "$cpu" in
diff --git a/qemu-bridge-helper.c b/qemu-bridge-helper.c
new file mode 100644
index 000..4ac7b36
--- /dev/null
+++ b/qemu-bridge-helper.c
@@ -0,0 +1,205 @@
+/*
+ * QEMU Bridge Helper
+ *
+ * Copyright IBM, Corp. 2011
+ *
+ * Authors:
+ * Anthony Liguori


Heh, fairly sure that's not my email address ;-)



I thought that was a secret identity. :) We'll update that.


+ *
+ * This work is licensed under the terms of the GNU GPL, version 2. See
+ * the COPYING file in the top-level directory.
+ *
+ */
+
+#include "config-host.h"
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+#include
+#include
+#include
+#include
+#include
+
+#include
+
+#include
+
+#include "net/tap-linux.h"
+
+static int has_vnet_hdr(int fd)
+{
+ unsigned int features = 0;
+ struct ifreq ifreq;
+
+ if (ioctl(fd, TUNGETFEATURES, &features) == -1) {
+ return -errno;
+ }
+
+ if (!(features & IFF_VNET_HDR)) {
+ return -ENOTSUP;
+ }
+
+ if (ioctl(fd, TUNGETIFF, &ifreq) != -1 || errno != EBADFD) {
+ return -ENOTSUP;
+ }
+
+ return 1;
+}
+
+static void prep_ifreq(struct ifreq *ifr, const char *ifname)
+{
+ memset(ifr, 0, sizeof(*ifr));
+ snprintf(ifr->ifr_name, IFNAMSIZ, "%s", ifname);
+}
+
+static int send_fd(int c, int fd)
+{
+ char msgbuf[CMSG_SPACE(sizeof(fd))];
+ struct msghdr msg = {
+ .msg_control = msgbuf,
+ .msg_controllen = sizeof(msgbuf),
+ };
+ struct cmsghdr *cmsg;
+ struct iovec iov;
+ char req[1] = { 0x00 };
+
+ cmsg = CMSG_FIRSTHDR(&msg);
+ cmsg->cmsg_level = SOL_SOCKET;
+ cmsg->cmsg_type = SCM_RIGHTS;
+ cmsg->cmsg_len = CMSG_LEN(sizeof(fd));
+ msg.msg_controllen = cmsg->cmsg_len;
+
+ iov.iov_base = req;
+

Re: [Qemu-devel] [PATCH 1/4] Add basic version of bridge helper

2011-10-06 Thread Anthony Liguori


On 10/06/2011 11:41 AM, Daniel P. Berrange wrote:

On Thu, Oct 06, 2011 at 11:38:25AM -0400, Richa Marwaha wrote:

This patch adds a helper that can be used to create a tap device attached to
a bridge device.  Since this helper is minimal in what it does, it can be
given CAP_NET_ADMIN which allows qemu to avoid running as root while still
satisfying the majority of what users tend to want to do with tap devices.

The way this all works is that qemu launches this helper passing a bridge
name and the name of an inherited file descriptor.  The descriptor is one
end of a socketpair() of domain sockets.  This domain socket is used to
transmit a file descriptor of the opened tap device from the helper to qemu.

The helper can then exit and let qemu use the tap device.
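
The descriptor hand-off described above can be sketched as follows. This is only an illustration of SCM_RIGHTS passing over a UNIX domain socket, not the code from the patch; the names send_fd_sketch() and recv_fd_sketch() are made up for this example, and it assumes Linux behaviour for stream sockets.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper side: send descriptor fd over the UNIX socket sock.
 * Returns 0 on success, -1 on failure. */
static int send_fd_sketch(int sock, int fd)
{
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;          /* keeps buf suitably aligned */
    } ctl;
    char byte = 0;                     /* one ordinary data byte travels
                                        * alongside the ancillary data */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    struct msghdr msg;
    struct cmsghdr *cmsg;

    memset(&ctl, 0, sizeof(ctl));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof(ctl.buf);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Hypothetical qemu side: receive one descriptor from sock.
 * Returns the new descriptor, or -1 if none arrived. */
static int recv_fd_sketch(int sock)
{
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } ctl;
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    struct msghdr msg;
    struct cmsghdr *cmsg;
    int fd;

    memset(&ctl, 0, sizeof(ctl));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof(ctl.buf);

    if (recvmsg(sock, &msg, 0) <= 0) {
        return -1;
    }

    /* Only trust the payload if the kernel really delivered SCM_RIGHTS. */
    cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS ||
        cmsg->cmsg_len != CMSG_LEN(sizeof(int))) {
        return -1;
    }
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}
```

In this sketch the receiver inspects the control message after recvmsg() returns instead of assuming it is present, so a peer that exits without sending anything yields -1.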


When QEMU is run by libvirt, we generally like to use capng to
remove the ability for QEMU to run setuid programs at all. So
obviously it will struggle to run the qemu-bridge-helper binary
in such a scenario.

With the way you transmit the TAP device FD back to the caller,
it looks like libvirt itself could execute the qemu-bridge-helper
receiving the FD, and then pass the FD onto QEMU using the
traditional tap,fd=XX syntax.


Exactly.  This would allow tap-based networking using libvirt session:// URIs.
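
As a rough sketch of that hand-off: a management process that already holds the tap FD has to keep it open across exec and name it on the QEMU command line. The function make_tap_fd_arg() below is hypothetical and only shows those two steps; it is not taken from libvirt or QEMU.

```c
#include <fcntl.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical: make fd survive exec() and format the "-net" argument
 * value that refers to it. Returns 0 on success, -1 on failure. */
static int make_tap_fd_arg(int fd, char *buf, size_t len)
{
    int flags;
    int n;

    /* Clear close-on-exec so the child process inherits the descriptor. */
    flags = fcntl(fd, F_GETFD);
    if (flags == -1) {
        return -1;
    }
    if (fcntl(fd, F_SETFD, flags & ~FD_CLOEXEC) == -1) {
        return -1;
    }

    /* The tap,fd=XX form mentioned in the thread. */
    n = snprintf(buf, len, "tap,fd=%d", fd);
    if (n < 0 || (size_t)n >= len) {
        return -1;
    }
    return 0;
}
```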



The TAP device FD is only one FD we normally pass to QEMU. How about
support for vhost net? Is it reasonable to ask the qemu-bridge-helper
to send back a vhost net FD also?


Absolutely.


Or indeed multiple vhost net FDs
when we get multiqueue NICs.  Should we expect the bridge helper to
be strictly limited to just connecting a TAP dev to a bridge, or is
the expectation that it will grow more & more functionality over
time?


I would not expect it to do more than create virtual network interfaces, and add 
them to bridges.  Multiqueue virtual nics, vhost, etc. would all be in scope as 
they are part of creating a virtual network interface.


Creating the bridges and managing the bridges should be done statically by an 
administrator and would be out of scope.


Regards,

Anthony Liguori



Daniel

Re: [Qemu-devel] [PATCH 3/4] Add cap reduction support to enable use as SUID

2011-10-06 Thread Corey Bryant




On 10/06/2011 01:42 PM, Anthony Liguori wrote:

On 10/06/2011 11:34 AM, Daniel P. Berrange wrote:

On Thu, Oct 06, 2011 at 11:38:27AM -0400, Richa Marwaha wrote:

The ideal way to use qemu-bridge-helper is to give it an fscap using:

setcap cap_net_admin=ep qemu-bridge-helper

Unfortunately, most distros still do not have a mechanism to package
files
with fscaps applied. This means they'll have to SUID the
qemu-bridge-helper
binary.

To improve security, use libcap to reduce our capability set to just
cap_net_admin, then reduce privileges down to the calling user. This is
hopefully close to equivalent to fscap support from a security
perspective.
+#ifdef CONFIG_LIBCAP
+static int drop_privileges(void)
+{
+ cap_t cap;
+ cap_value_t new_caps[] = {CAP_NET_ADMIN};
+
+ cap = cap_init();


Check for NULL ?


+
+ /* set capabilities to be permitted and inheritable. we don't need the
+ * caps to be effective right now as they'll get reset when we seteuid
+ * anyway */
+ cap_set_flag(cap, CAP_PERMITTED, 1, new_caps, CAP_SET);
+ cap_set_flag(cap, CAP_INHERITABLE, 1, new_caps, CAP_SET);


Check for failure ?


+
+ if (cap_set_proc(cap) == -1) {
+ return -1;
+ }
+
+ cap_free(cap);


Check for failure ?


+
+ /* reduce our privileges to a normal user */
+ setegid(getgid());
+ seteuid(getuid());


Check for failure ?


+ cap = cap_init();


Check for NULL ?


+
+ /* enable our capabilities. we marked them as inheritable earlier
+ * which is what allows this to work. */
+ cap_set_flag(cap, CAP_EFFECTIVE, 1, new_caps, CAP_SET);
+ cap_set_flag(cap, CAP_PERMITTED, 1, new_caps, CAP_SET);


Check for failure ?


+
+ if (cap_set_proc(cap) == -1) {
+ return -1;
+ }
+
+ cap_free(cap);


Check for failure ?


+
+ return 0;
+}
+#endif


It may seem like checking for failure on cap_free/cap_set_flag is
not required because they can only return EINVAL for invalid
args, but since this is missing the check for NULL on cap_init
you can actually see errors from those latter functions in an
OOM scenario.

I think I'd suggest not using libcap, instead try libcap-ng [1] whose
APIs are designed with safety in mind & result in much simpler and
clearer code:

eg, that entire function above can be expressed using capng with
something approximating:

  capng_clear(CAPNG_SELECT_BOTH);
  if (capng_update(CAPNG_ADD, CAPNG_EFFECTIVE|CAPNG_PERMITTED,
                   CAP_NET_ADMIN) < 0)
      error(...);
  if (capng_change_id(getuid(), getgid(), CAPNG_DROP_SUPP_GRP |
                      CAPNG_CLEAR_BOUNDING))
      error(...);


Ah, libcap-ng didn't exist when the code was initially written but I
agree, it looks like a nice library.

Regards,

Anthony Liguori



This looks a lot simpler.  We'll definitely look into implementing this
in v2.

--
Regards,
Corey




Regards,
Daniel

[1] http://people.redhat.com/sgrubb/libcap-ng/

Re: [Qemu-devel] [PATCH 4/4] Add support for bridge

2011-10-06 Thread Anthony Liguori


On 10/06/2011 10:38 AM, Richa Marwaha wrote:

The most common use of -net tap is to connect a tap device to a bridge.  This
requires the use of a script and running qemu as root in order to allocate a
tap device to pass to the script.

This model is great for portability and flexibility but it's incredibly
difficult to eliminate the need to run qemu as root.  The only really viable
mechanism is to use tunctl to create a tap device, attach it to a bridge as
root, and then hand that tap device to qemu.  The problem with this mechanism
is that it requires administrator intervention whenever a user wants to create
a guest.

By essentially writing a helper that implements the most common qemu-ifup
script that can be safely given cap_net_admin, we can dramatically simplify
things for non-privileged users.  We still support existing -net tap options
as a mechanism for advanced users and backwards compatibility.

Currently, this is very Linux centric but there's really no reason why it
couldn't be extended for other Unixes.

The default bridge that we attach to is qemubr0.  The thinking is that a distro
could preconfigure such an interface to allow out-of-the-box bridged networking.

Alternatively, if a user wants to use a different bridge, they can say:

   qemu -hda linux.img -net tap,br=br0,helper=/usr/local/libexec/qemu-bridge-helper -net nic,model=virtio



Wouldn't it be better to make the syntax:

-net bridge[,br=BRIDGE][,helper=HELPER]

And default BRIDGE to br0 and HELPER to ${prefix}/libexec/qemu-bridge-helper ?

That gives distros a proper way to configure a default bridge making -net bridge 
Just Work for most people.
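
A minimal sketch of how those defaults could behave, assuming the proposed "bridge[,br=BRIDGE][,helper=HELPER]" form. This is a standalone illustration and does not use QEMU's own option parser; parse_bridge_opts(), struct bridge_opts and the SKETCH_* macros are invented for the example, and the helper path assumes a /usr/local prefix.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Assumed defaults, following the proposal above. */
#define SKETCH_DEFAULT_BRIDGE "br0"
#define SKETCH_DEFAULT_HELPER "/usr/local/libexec/qemu-bridge-helper"

struct bridge_opts {
    char br[64];
    char helper[256];
};

/* Parse "bridge[,br=BRIDGE][,helper=HELPER]" into *o, filling in the
 * defaults for anything not given. Returns 0 on success, -1 on a
 * syntax error. */
static int parse_bridge_opts(const char *arg, struct bridge_opts *o)
{
    const char *p = arg;

    snprintf(o->br, sizeof(o->br), "%s", SKETCH_DEFAULT_BRIDGE);
    snprintf(o->helper, sizeof(o->helper), "%s", SKETCH_DEFAULT_HELPER);

    if (strncmp(p, "bridge", 6) != 0) {
        return -1;
    }
    p += 6;

    /* Each remaining field is ",key=value" with a non-empty value. */
    while (*p == ',') {
        size_t len;

        p++;
        len = strcspn(p, ",");
        if (len > 3 && strncmp(p, "br=", 3) == 0) {
            snprintf(o->br, sizeof(o->br), "%.*s", (int)(len - 3), p + 3);
        } else if (len > 7 && strncmp(p, "helper=", 7) == 0) {
            snprintf(o->helper, sizeof(o->helper), "%.*s",
                     (int)(len - 7), p + 7);
        } else {
            return -1;
        }
        p += len;
    }

    return *p == '\0' ? 0 : -1;
}
```

With this shape a plain "bridge" resolves to br0 and the default helper, which is the out-of-the-box case a distro would configure.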


Regards,

Anthony Liguori



Signed-off-by: Richa Marwaha
---
  configure   |2 +
  net.c   |8 +++
  net.h   |2 +
  net/tap.c   |  150 ---
  qemu-options.hx |   48 +-
  5 files changed, 190 insertions(+), 20 deletions(-)

diff --git a/configure b/configure
index f46e9b7..ef05954 100755
--- a/configure
+++ b/configure
@@ -2775,6 +2775,8 @@ echo "sysconfdir=$sysconfdir">>  $config_host_mak
  echo "docdir=$docdir">>  $config_host_mak
  echo "libexecdir=\${prefix}/libexec">>  $config_host_mak
  echo "confdir=$confdir">>  $config_host_mak
+echo "CONFIG_QEMU_SHAREDIR=\"$prefix$datasuffix\"">>  $config_host_mak
+echo "CONFIG_QEMU_HELPERDIR=\"$prefix/libexec\"">>  $config_host_mak

  case "$cpu" in

i386|x86_64|alpha|cris|hppa|ia64|lm32|m68k|microblaze|mips|mips64|ppc|ppc64|s390|s390x|sparc|sparc64|unicore32)
diff --git a/net.c b/net.c
index d05930c..4c3c551 100644
--- a/net.c
+++ b/net.c
@@ -956,6 +956,14 @@ static const struct {
  .type = QEMU_OPT_STRING,
  .help = "script to shut down the interface",
  }, {
+.name = "br",
+.type = QEMU_OPT_STRING,
+.help = "bridge name",
+}, {
+.name = "helper",
+.type = QEMU_OPT_STRING,
+.help = "command to execute to configure bridge",
+}, {
  .name = "sndbuf",
  .type = QEMU_OPT_SIZE,
  .help = "send buffer limit"
diff --git a/net.h b/net.h
index 9f633f8..eeb19a7 100644
--- a/net.h
+++ b/net.h
@@ -174,6 +174,8 @@ int do_netdev_del(Monitor *mon, const QDict *qdict, QObject 
**ret_data);

  #define DEFAULT_NETWORK_SCRIPT "/etc/qemu-ifup"
  #define DEFAULT_NETWORK_DOWN_SCRIPT "/etc/qemu-ifdown"
+#define DEFAULT_BRIDGE_HELPER CONFIG_QEMU_HELPERDIR "/qemu-bridge-helper"
+#define DEFAULT_BRIDGE_INTERFACE "qemubr0"

  void qdev_set_nic_properties(DeviceState *dev, NICInfo *nd);

diff --git a/net/tap.c b/net/tap.c
index 1f26dc9..74f103a 100644
--- a/net/tap.c
+++ b/net/tap.c
@@ -388,6 +388,108 @@ static int launch_script(const char *setup_script, const 
char *ifname, int fd)
  return -1;
  }

+static int recv_fd(int c)
+{
+int fd;
+uint8_t msgbuf[CMSG_SPACE(sizeof(fd))];
+struct msghdr msg = {
+.msg_control = msgbuf,
+.msg_controllen = sizeof(msgbuf),
+};
+struct cmsghdr *cmsg;
+struct iovec iov;
+uint8_t req[1];
+ssize_t len;
+
+cmsg = CMSG_FIRSTHDR(&msg);
+cmsg->cmsg_level = SOL_SOCKET;
+cmsg->cmsg_type = SCM_RIGHTS;
+cmsg->cmsg_len = CMSG_LEN(sizeof(fd));
+msg.msg_controllen = cmsg->cmsg_len;
+
+iov.iov_base = req;
+iov.iov_len = sizeof(req);
+
+msg.msg_iov = &iov;
+msg.msg_iovlen = 1;
+
+len = recvmsg(c, &msg, 0);
+if (len > 0) {
+memcpy(&fd, CMSG_DATA(cmsg), sizeof(fd));
+return fd;
+}
+
+return len;
+}
+
+static int net_bridge_run_helper(const char *helper, const char *bridge)
+{
+sigset_t oldmask, mask;
+int pid, status;
+char *args[5];
+char **parg;
+int sv[2];
+
+sigemptyset(&mask);
+sigaddset(&mask, SIGCHLD);
+sigprocmask(SIG_BLOCK, &mask, &oldmask);
+
+if (socketpair(

Re: [Qemu-devel] [PATCH 3/4] Add cap reduction support to enable use as SUID

2011-10-06 Thread Anthony Liguori


On 10/06/2011 11:34 AM, Daniel P. Berrange wrote:

On Thu, Oct 06, 2011 at 11:38:27AM -0400, Richa Marwaha wrote:

The ideal way to use qemu-bridge-helper is to give it an fscap using:

  setcap cap_net_admin=ep qemu-bridge-helper

Unfortunately, most distros still do not have a mechanism to package files
with fscaps applied.  This means they'll have to SUID the qemu-bridge-helper
binary.

To improve security, use libcap to reduce our capability set to just
cap_net_admin, then reduce privileges down to the calling user.  This is
hopefully close to equivalent to fscap support from a security perspective.
+#ifdef CONFIG_LIBCAP
+static int drop_privileges(void)
+{
+cap_t cap;
+cap_value_t new_caps[] = {CAP_NET_ADMIN};
+
+cap = cap_init();


Check for NULL ?


+
+/* set capabilities to be permitted and inheritable.  we don't need the
+ * caps to be effective right now as they'll get reset when we seteuid
+ * anyway */
+cap_set_flag(cap, CAP_PERMITTED, 1, new_caps, CAP_SET);
+cap_set_flag(cap, CAP_INHERITABLE, 1, new_caps, CAP_SET);


Check for failure ?


+
+if (cap_set_proc(cap) == -1) {
+return -1;
+}
+
+cap_free(cap);


Check for failure ?


+
+/* reduce our privileges to a normal user */
+setegid(getgid());
+seteuid(getuid());


Check for failure ?


+cap = cap_init();


Check for NULL ?


+
+/* enable our capabilities.  we marked them as inheritable earlier
+ * which is what allows this to work. */
+cap_set_flag(cap, CAP_EFFECTIVE, 1, new_caps, CAP_SET);
+cap_set_flag(cap, CAP_PERMITTED, 1, new_caps, CAP_SET);


Check for failure ?


+
+if (cap_set_proc(cap) == -1) {
+return -1;
+}
+
+cap_free(cap);


Check for failure ?


+
+return 0;
+}
+#endif


It may seem like checking for failure on cap_free/cap_set_flag is
not required because they can only return EINVAL for invalid
args, but since this is missing the check for NULL on cap_init
you can actually see errors from those latter functions in an
OOM scenario.
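
For the setegid()/seteuid() pair specifically, the checks asked for could look roughly like the sketch below. It covers only the ID change, not the libcap calls, and drop_to_real_ids() is a name invented for this illustration.

```c
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical: switch the effective group and user IDs back to the
 * real (calling) user, failing loudly if either step is refused.
 * Returns 0 on success, -1 on failure. */
static int drop_to_real_ids(void)
{
    /* Group first: once the effective UID is no longer privileged,
     * changing the group ID may no longer be permitted. */
    if (setegid(getgid()) == -1) {
        perror("setegid");
        return -1;
    }
    if (seteuid(getuid()) == -1) {
        perror("seteuid");
        return -1;
    }

    /* Confirm the change actually took effect before continuing. */
    if (getegid() != getgid() || geteuid() != getuid()) {
        return -1;
    }
    return 0;
}
```

Note that this changes only the effective IDs; whether the saved IDs also need to be reset depends on how the remaining capability setup is done.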

I think I'd suggest not using libcap, instead try libcap-ng [1] whose
APIs are designed with safety in mind & result in much simpler and
clearer code:

eg, that entire function above can be expressed using capng with
something approximating:

  capng_clear(CAPNG_SELECT_BOTH);
  if (capng_update(CAPNG_ADD, CAPNG_EFFECTIVE|CAPNG_PERMITTED,
                   CAP_NET_ADMIN) < 0)
      error(...);
  if (capng_change_id(getuid(), getgid(), CAPNG_DROP_SUPP_GRP |
                      CAPNG_CLEAR_BOUNDING))
      error(...);


Ah, libcap-ng didn't exist when the code was initially written but I agree, it 
looks like a nice library.


Regards,

Anthony Liguori




Regards,
Daniel

[1] http://people.redhat.com/sgrubb/libcap-ng/

Re: [Qemu-devel] [PATCH 1/4] Add basic version of bridge helper

2011-10-06 Thread Anthony Liguori


On 10/06/2011 10:38 AM, Richa Marwaha wrote:

This patch adds a helper that can be used to create a tap device attached to
a bridge device.  Since this helper is minimal in what it does, it can be
given CAP_NET_ADMIN which allows qemu to avoid running as root while still
satisfying the majority of what users tend to want to do with tap devices.

The way this all works is that qemu launches this helper passing a bridge
name and the name of an inherited file descriptor.  The descriptor is one
end of a socketpair() of domain sockets.  This domain socket is used to
transmit a file descriptor of the opened tap device from the helper to qemu.

The helper can then exit and let qemu use the tap device.

Signed-off-by: Richa Marwaha
---
  Makefile |   12 +++-
  configure|1 +
  qemu-bridge-helper.c |  205 ++
  3 files changed, 216 insertions(+), 2 deletions(-)
  create mode 100644 qemu-bridge-helper.c

diff --git a/Makefile b/Makefile
index 6ed3194..f2caedc 100644
--- a/Makefile
+++ b/Makefile
@@ -34,6 +34,8 @@ $(call set-vpath, $(SRC_PATH):$(SRC_PATH)/hw)

  LIBS+=-lz $(LIBS_TOOLS)

+HELPERS-$(CONFIG_LINUX) = qemu-bridge-helper$(EXESUF)
+
  ifdef BUILD_DOCS
  DOCS=qemu-doc.html qemu-tech.html qemu.1 qemu-img.1 qemu-nbd.8 
QMP/qmp-commands.txt
  else
@@ -74,7 +76,7 @@ defconfig:

  -include config-all-devices.mak

-build-all: $(DOCS) $(TOOLS) recurse-all
+build-all: $(DOCS) $(TOOLS) $(HELPERS-y) recurse-all

  config-host.h: config-host.h-timestamp
  config-host.h-timestamp: config-host.mak
@@ -151,6 +153,8 @@ qemu-nbd$(EXESUF): qemu-nbd.o qemu-tool.o qemu-error.o 
$(oslib-obj-y) $(trace-ob

  qemu-io$(EXESUF): qemu-io.o cmd.o qemu-tool.o qemu-error.o $(oslib-obj-y) 
$(trace-obj-y) $(block-obj-y) $(qobject-obj-y) $(version-obj-y) 
qemu-timer-common.o

+qemu-bridge-helper$(EXESUF): qemu-bridge-helper.o
+
  qemu-img-cmds.h: $(SRC_PATH)/qemu-img-cmds.hx
$(call quiet-command,sh $(SRC_PATH)/scripts/hxtool -h<  $<  >  $@,"  GEN   
$@")

@@ -208,7 +212,7 @@ clean:
  # avoid old build problems by removing potentially incorrect old files
rm -f config.mak op-i386.h opc-i386.h gen-op-i386.h op-arm.h opc-arm.h 
gen-op-arm.h
rm -f qemu-options.def
-   rm -f *.o *.d *.a *.lo $(TOOLS) qemu-ga TAGS cscope.* *.pod *~ */*~
+   rm -f *.o *.d *.a *.lo $(TOOLS) $(HELPERS-y) qemu-ga TAGS cscope.* 
*.pod *~ */*~
rm -Rf .libs
rm -f slirp/*.o slirp/*.d audio/*.o audio/*.d block/*.o block/*.d 
net/*.o net/*.d fsdev/*.o fsdev/*.d ui/*.o ui/*.d qapi/*.o qapi/*.d qga/*.o 
qga/*.d
rm -f qemu-img-cmds.h
@@ -275,6 +279,10 @@ install: all $(if $(BUILD_DOCS),install-doc) 
install-sysconfig
  ifneq ($(TOOLS),)
$(INSTALL_PROG) $(STRIP_OPT) $(TOOLS) "$(DESTDIR)$(bindir)"
  endif
+ifneq ($(HELPERS-y),)
+   $(INSTALL_DIR) "$(DESTDIR)$(libexecdir)"
+   $(INSTALL_PROG) $(STRIP_OPT) $(HELPERS-y) "$(DESTDIR)$(libexecdir)"
+endif
  ifneq ($(BLOBS),)
$(INSTALL_DIR) "$(DESTDIR)$(datadir)"
set -e; for x in $(BLOBS); do \
diff --git a/configure b/configure
index 59b1494..3e32834 100755
--- a/configure
+++ b/configure
@@ -2742,6 +2742,7 @@ echo "mandir=$mandir">>  $config_host_mak
  echo "datadir=$datadir">>  $config_host_mak
  echo "sysconfdir=$sysconfdir">>  $config_host_mak
  echo "docdir=$docdir">>  $config_host_mak
+echo "libexecdir=\${prefix}/libexec">>  $config_host_mak
  echo "confdir=$confdir">>  $config_host_mak

  case "$cpu" in
diff --git a/qemu-bridge-helper.c b/qemu-bridge-helper.c
new file mode 100644
index 000..4ac7b36
--- /dev/null
+++ b/qemu-bridge-helper.c
@@ -0,0 +1,205 @@
+/*
+ * QEMU Bridge Helper
+ *
+ * Copyright IBM, Corp. 2011
+ *
+ * Authors:
+ * Anthony Liguori


Heh, fairly sure that's not my email address ;-)


+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ *
+ */
+
+#include "config-host.h"
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+#include
+#include
+#include
+#include
+#include
+
+#include
+
+#include
+
+#include "net/tap-linux.h"
+
+static int has_vnet_hdr(int fd)
+{
+unsigned int features = 0;
+struct ifreq ifreq;
+
+if (ioctl(fd, TUNGETFEATURES,&features) == -1) {
+return -errno;
+}
+
+if (!(features&  IFF_VNET_HDR)) {
+return -ENOTSUP;
+}
+
+if (ioctl(fd, TUNGETIFF,&ifreq) != -1 || errno != EBADFD) {
+return -ENOTSUP;
+}
+
+return 1;
+}
+
+static void prep_ifreq(struct ifreq *ifr, const char *ifname)
+{
+memset(ifr, 0, sizeof(*ifr));
+snprintf(ifr->ifr_name, IFNAMSIZ, "%s", ifname);
+}
+
+static int send_fd(int c, int fd)
+{
+char msgbuf[CMSG_SPACE(sizeof(fd))];
+struct msghdr msg = {
+.msg_control = msgbuf,
+.msg_controllen = sizeof(msgbuf),
+};
+struct cmsghdr *cmsg;
+struct iovec iov;
+char req[1] = { 0x00 };
+
+cmsg = CMSG_FIRSTHDR(&ms

Re: [Qemu-devel] [PATCH 1/4] Add basic version of bridge helper

2011-10-06 Thread Daniel P. Berrange

On Thu, Oct 06, 2011 at 11:38:25AM -0400, Richa Marwaha wrote:
> This patch adds a helper that can be used to create a tap device attached to
> a bridge device.  Since this helper is minimal in what it does, it can be
> given CAP_NET_ADMIN which allows qemu to avoid running as root while still
> satisfying the majority of what users tend to want to do with tap devices.
> 
> The way this all works is that qemu launches this helper passing a bridge
> name and the name of an inherited file descriptor.  The descriptor is one
> end of a socketpair() of domain sockets.  This domain socket is used to
> transmit a file descriptor of the opened tap device from the helper to qemu.
> 
> The helper can then exit and let qemu use the tap device.

When QEMU is run by libvirt, we generally like to use capng to
remove the ability for QEMU to run setuid programs at all. So
obviously it will struggle to run the qemu-bridge-helper binary
in such a scenario.

With the way you transmit the TAP device FD back to the caller,
it looks like libvirt itself could execute the qemu-bridge-helper
receiving the FD, and then pass the FD onto QEMU using the
traditional tap,fd=XX syntax.

The TAP device FD is only one of the FDs we normally pass to QEMU. What
about support for vhost-net? Is it reasonable to ask the
qemu-bridge-helper to send back a vhost-net FD as well, or indeed
multiple vhost-net FDs once we get multiqueue NICs?  Should we expect
the bridge helper to be strictly limited to connecting a TAP dev to a
bridge, or is the expectation that it will grow more & more
functionality over time?

Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|

Re: [Qemu-devel] [PATCH 3/4] Add cap reduction support to enable use as SUID

2011-10-06 Thread Daniel P. Berrange

On Thu, Oct 06, 2011 at 11:38:27AM -0400, Richa Marwaha wrote:
> The ideal way to use qemu-bridge-helper is to give it an fscap using:
> 
>  setcap cap_net_admin=ep qemu-bridge-helper
> 
> Unfortunately, most distros still do not have a mechanism to package files
> with fscaps applied.  This means they'll have to SUID the qemu-bridge-helper
> binary.
> 
> To improve security, use libcap to reduce our capability set to just
> cap_net_admin, then reduce privileges down to the calling user.  This is
> hopefully close to equivalent to fscap support from a security perspective.
> +#ifdef CONFIG_LIBCAP
> +static int drop_privileges(void)
> +{
> +cap_t cap;
> +cap_value_t new_caps[] = {CAP_NET_ADMIN};
> +
> +cap = cap_init();

Check for NULL ?

> +
> +/* set capabilities to be permitted and inheritable.  we don't need the
> + * caps to be effective right now as they'll get reset when we seteuid
> + * anyway */
> +cap_set_flag(cap, CAP_PERMITTED, 1, new_caps, CAP_SET);
> +cap_set_flag(cap, CAP_INHERITABLE, 1, new_caps, CAP_SET);

Check for failure ?

> +
> +if (cap_set_proc(cap) == -1) {
> +return -1;
> +}
> +
> +cap_free(cap);

Check for failure ?

> +
> +/* reduce our privileges to a normal user */
> +setegid(getgid());
> +seteuid(getuid());

Check for failure ?

> +cap = cap_init();

Check for NULL ?

> +
> +/* enable the our capabilities.  we marked them as inheritable earlier
> + * which is what allows this to work. */
> +cap_set_flag(cap, CAP_EFFECTIVE, 1, new_caps, CAP_SET);
> +cap_set_flag(cap, CAP_PERMITTED, 1, new_caps, CAP_SET);

Check for failure ?

> +
> +if (cap_set_proc(cap) == -1) {
> +return -1;
> +}
> +
> +cap_free(cap);

Check for failure ?

> +
> +return 0;
> +}
> +#endif

It may seem like checking for failure on cap_free/cap_set_flag is
not required because they can only return EINVAL for invalid
args, but since this is missing the check for NULL on cap_init
you can actually see errors from those latter functions in an
OOM scenario.
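
To illustrate the "check for failure" remarks on the setegid()/seteuid()
calls, here is a minimal sketch that uses only POSIX calls and no libcap.
The function name drop_to_real_ids() is hypothetical and not part of the
patch; it only shows the shape of a checked drop to the calling user's IDs.

```c
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the patch): drop the effective IDs back
 * to the real (calling) user and report any failure to the caller.
 * Returns 0 on success, -1 if a call fails or the change did not stick.
 */
static int drop_to_real_ids(void)
{
    uid_t ruid = getuid();
    gid_t rgid = getgid();

    /* Change the group first: once the effective uid is unprivileged,
     * changing the effective gid may no longer be permitted. */
    if (setegid(rgid) == -1) {
        return -1;
    }
    if (seteuid(ruid) == -1) {
        return -1;
    }

    /* Verify the result instead of trusting the return codes alone. */
    if (getegid() != rgid || geteuid() != ruid) {
        return -1;
    }
    return 0;
}
```

A SUID helper would normally treat a -1 return here as fatal and exit
before touching any network device.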

I think I'd suggest not using libcap, instead try libcap-ng [1] whose
APIs are designed with safety in mind & result in much simpler and
clearer code:

eg, that entire function above can be expressed using capng with
something approximating:

 capng_clear(CAPNG_SELECT_BOTH);
 if (capng_update(CAPNG_ADD, CAPNG_EFFECTIVE|CAPNG_PERMITTED,
                  CAP_NET_ADMIN) < 0)
     error(...);
 if (capng_change_id(getuid(), getgid(),
                     CAPNG_DROP_SUPP_GRP | CAPNG_CLEAR_BOUNDING))
     error(...);


Regards,
Daniel

[1] http://people.redhat.com/sgrubb/libcap-ng/

-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|

Re: [Qemu-devel] [RFC] Use TCGReg for all TCG targets?

2011-10-06 Thread Richard Henderson

On 10/06/2011 09:24 AM, Stefan Weil wrote:
> Is there consensus that this is a good idea, or should
> TCGReg be removed (then all TCG targets use int) or only
> used for s390?

I think it's a good idea.


r~

[Qemu-devel] Integrating Dynamips and GNS3 UDP tunnels (Patches)

2011-10-06 Thread Benjamin Epitech

The GNS3 team developed a GUI to interconnect different kinds of emulated
hardware. To provide network connectivity between the hosts, a single
protocol is used: a UDP tunneling protocol introduced by Dynamips (a Cisco
hardware emulator).

Since the beginning, GNS3 has supported Qemu by providing patches to its
users; these patches bring an implementation of the Dynamips UDP tunneling
protocol to Qemu.

As GNS3 improves and now supports VirtualBox, it is time to free users from
the hassle of having to patch Qemu themselves. FreeBSD integrated our patches
in the ports tree, we ship a patched Qemu for Windows, and we are now looking
to integrate those patches upstream.

Here are the patches, which apply to the latest release of Qemu. I hereby
submit them for your approval.

1) Basic patch in order to build the new source file
http://code.gns3.net/qemu-patches/file/6a927b6cdaf8/Makefile_objs.patch

2) Parse -net udp
http://code.gns3.net/qemu-patches/file/6a927b6cdaf8/net_c.patch

3) New NET_CLIENT_TYPE_UDP macro
http://code.gns3.net/qemu-patches/file/6a927b6cdaf8/net_h.patch

4) New source code file, implementation of the UDP tunneling protocol
http://code.gns3.net/qemu-patches/file/6a927b6cdaf8/net_udp_c.patch

5) Corresponding header file
http://code.gns3.net/qemu-patches/file/6a927b6cdaf8/net_udp_h.patch

The hw_e1000_c.patch is no longer needed; it was a dirty hack that we kept
for too long. The block_raw-win32_c.patch fixes a minor issue that arises
only on Windows; it may deserve a separate thread.

Please include me in the replies as I am not subscribed to the list.

Regards,

Benjamin
GNS3 contributor

[Qemu-devel] [PATCH] hw/9pfs: Fix build error on platform that don't support futimens

2011-10-06 Thread Aneesh Kumar K.V

Signed-off-by: Aneesh Kumar K.V 
---
 hw/9pfs/virtio-9p-handle.c |   14 +-
 1 files changed, 13 insertions(+), 1 deletions(-)

diff --git a/hw/9pfs/virtio-9p-handle.c b/hw/9pfs/virtio-9p-handle.c
index 860b0e3..9860a87 100644
--- a/hw/9pfs/virtio-9p-handle.c
+++ b/hw/9pfs/virtio-9p-handle.c
@@ -386,12 +386,17 @@ static int handle_utimensat(FsContext *ctx, V9fsPath 
*fs_path,
 int fd, ret;
 struct handle_data *data = (struct handle_data *)ctx->private;
 
+#ifdef CONFIG_UTIMENSAT
 fd = open_by_handle(data->mountfd, fs_path->data, O_NONBLOCK);
 if (fd < 0) {
 return fd;
 }
 ret = futimens(fd, buf);
 close(fd);
+#else
+ret = -1;
+errno = ENOSYS;
+#endif
 return ret;
 }
 
@@ -591,8 +596,15 @@ static int handle_init(FsContext *ctx)
 int ret, mnt_id;
 struct statfs stbuf;
 struct file_handle fh;
-struct handle_data *data = g_malloc(sizeof(struct handle_data));
+struct handle_data *data;
 
+#ifndef CONFIG_UTIMENSAT
+/*
+ * We support handle fs driver only if futimens is provided by the host
+ */
+return -1;
+#endif
+data = g_malloc(sizeof(struct handle_data));
 data->mountfd = open(ctx->fs_root, O_DIRECTORY);
 if (data->mountfd < 0) {
 ret = data->mountfd;
-- 
1.7.4.1

[Qemu-devel] [PATCH 5/5] Revert "savevm: fix corruption in vmstate_subsection_load()."

2011-10-06 Thread Juan Quintela

This reverts commit eb60260de0b050a5e8ab725e84d377d0b44c43ae.

Conflicts:

savevm.c

We changed qemu_peek_byte() prototype, just fixed the rejects.

Signed-off-by: Juan Quintela 
Reviewed-by: Anthony Liguori 
---
 savevm.c |   10 +-
 1 files changed, 1 insertions(+), 9 deletions(-)

diff --git a/savevm.c b/savevm.c
index 28c0a43..1c62269 100644
--- a/savevm.c
+++ b/savevm.c
@@ -1704,12 +1704,6 @@ static const VMStateDescription 
*vmstate_get_subsection(const VMStateSubsection
 static int vmstate_subsection_load(QEMUFile *f, const VMStateDescription *vmsd,
void *opaque)
 {
-const VMStateSubsection *sub = vmsd->subsections;
-
-if (!sub || !sub->needed) {
-return 0;
-}
-
 while (qemu_peek_byte(f, 0) == QEMU_VM_SUBSECTION) {
 char idstr[256];
 int ret;
@@ -1731,7 +1725,7 @@ static int vmstate_subsection_load(QEMUFile *f, const 
VMStateDescription *vmsd,
 /* it don't have a valid subsection name */
 return 0;
 }
-sub_vmsd = vmstate_get_subsection(sub, idstr);
+sub_vmsd = vmstate_get_subsection(vmsd->subsections, idstr);
 if (sub_vmsd == NULL) {
 return -ENOENT;
 }
@@ -1740,7 +1734,6 @@ static int vmstate_subsection_load(QEMUFile *f, const 
VMStateDescription *vmsd,
 qemu_file_skip(f, len); /* idstr */
 version_id = qemu_get_be32(f);

-assert(!sub_vmsd->subsections);
 ret = vmstate_load_state(f, sub_vmsd, opaque, version_id);
 if (ret) {
 return ret;
@@ -1764,7 +1757,6 @@ static void vmstate_subsection_save(QEMUFile *f, const 
VMStateDescription *vmsd,
 qemu_put_byte(f, len);
 qemu_put_buffer(f, (uint8_t *)vmsd->name, len);
 qemu_put_be32(f, vmsd->version_id);
-assert(!vmsd->subsections);
 vmstate_save_state(f, vmsd, opaque);
 }
 sub++;
-- 
1.7.6.4

Re: [Qemu-devel] [PATCH 4/5] savevm: improve subsections detection on load

2011-10-06 Thread Paolo Bonzini


On 10/06/2011 06:21 PM, Juan Quintela wrote:

+
+int qemu_get_buffer(QEMUFile *f, uint8_t *buf, int size)
+{
+int pending = size;
+int done = 0;
+
+while (pending>  0) {
+int res;
+
+res = qemu_peek_buffer(f, buf, pending, 0);
+if (res == 0) {
+return 0;
  }
-memcpy(buf, f->buf + f->buf_index, l);
-f->buf_index += l;
-buf += l;
-size -= l;
+qemu_file_skip(f, res);
+buf += res;
+pending -= res;
+done += res;
  }
-return size1 - size;
+return done;
  }


This changes semantics for reads above 32KB.  It should be in the commit 
message, or preferably v1 could be committed instead. :)


Paolo

Re: [Qemu-devel] [PATCH 3/5] savevm: define qemu_get_byte() using qemu_peek_byte()

2011-10-06 Thread Paolo Bonzini


On 10/06/2011 06:21 PM, Juan Quintela wrote:

+result = qemu_peek_byte(f);
+
+if (f->buf_index<  f->buf_size) {
+f->buf_index++;
  }


This should really be an assert that f->buf_index < f->buf_size, 
otherwise qemu_peek_byte has read garbage.


Paolo

[Qemu-devel] [RFC] Use TCGReg for all TCG targets?

2011-10-06 Thread Stefan Weil


Hi,

commit 48bb3750e13cbb5a634d3aeab5191d74d124232f
introduced the data type 'TCGReg' in tcg/s390.

Today, s390 is the only TCG target which uses TCGReg.

This causes a conflict with my commit
c0ad3001bf12292b137b05e1c4643f31c6b0a727,
because some function prototypes in tcg/s390/tcg-target.c
differ from those in all the other TCG targets.
Builds on s390 hosts are broken now.

I'd like to use TCGReg in all TCG targets, thus fixing the
conflict and improving readability of the code
('TCGReg' is more specific than 'int').

Is there consensus that this is a good idea, or should
TCGReg be removed (then all TCG targets use int) or only
used for s390?

I cc'ed all TCG maintainers because their code would
have to be changed.

Regards,
Stefan Weil

[Qemu-devel] [PATCH 4/5] savevm: improve subsections detection on load

2011-10-06 Thread Juan Quintela

We add qemu_peek_buffer(), which is identical to qemu_get_buffer() except
that it doesn't update f->buf_index.

We add an offset parameter to qemu_peek_byte() to be able to peek at bytes
beyond the first one.

Once this is done, to see if we have a subsection we check that:
- 1st byte is QEMU_VM_SUBSECTION
- 2nd byte is a length, and is bigger than the section name's length
- 3rd element is a string that starts with section_name

So, we shouldn't have false positives (yes, the content could still fool
us, but the probability is really low).
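
The three checks above can be sketched on a plain byte buffer as follows.
This is only an illustration, not the savevm.c code: the function name
looks_like_subsection() and its arguments are hypothetical, and the marker
value 0x05 is taken from the cover letter's "starts from 5 (SUBSECTION)".

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define QEMU_VM_SUBSECTION 0x05

/*
 * Hypothetical helper: return 1 if buf[0..avail) looks like the start of
 * a subsection belonging to section_name, 0 otherwise.  Nothing is
 * consumed; the caller only skips the bytes once all checks pass.
 */
static int looks_like_subsection(const uint8_t *buf, size_t avail,
                                 const char *section_name)
{
    size_t name_len = strlen(section_name);
    size_t len;

    /* 1st byte must be the subsection marker. */
    if (avail < 2 || buf[0] != QEMU_VM_SUBSECTION) {
        return 0;
    }

    /* 2nd byte is the idstr length; it must exceed the section name. */
    len = buf[1];
    if (len <= name_len) {
        return 0;
    }

    /* The whole idstr must be available to be compared. */
    if (avail < 2 + len) {
        return 0;
    }

    /* The idstr must start with the section name. */
    return memcmp(buf + 2, section_name, name_len) == 0;
}
```

For a section named "cpu", the bytes 05 08 "cpu/feat" pass all three
checks, while the same bytes with a different marker or a different name
prefix are rejected without moving the read position.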

v2:
- Alex Williamson found that we could get negative values on index.
- Rework code to fix that part.
- Rewrite qemu_get_buffer() using qemu_peek_buffer()

Signed-off-by: Juan Quintela 
---
 savevm.c |  110 ++
 1 files changed, 75 insertions(+), 35 deletions(-)

diff --git a/savevm.c b/savevm.c
index 94628c6..28c0a43 100644
--- a/savevm.c
+++ b/savevm.c
@@ -532,59 +532,85 @@ void qemu_put_byte(QEMUFile *f, int v)
 qemu_fflush(f);
 }

-int qemu_get_buffer(QEMUFile *f, uint8_t *buf, int size1)
+static void qemu_file_skip(QEMUFile *f, int size)
 {
-int size, l;
+if (f->buf_index + size < f->buf_size) {
+f->buf_index += size;
+}
+}
+
+static int qemu_peek_buffer(QEMUFile *f, uint8_t *buf, int size, size_t offset)
+{
+int pending;
+int index;

 if (f->is_write) {
 abort();
 }

-size = size1;
-while (size > 0) {
-l = f->buf_size - f->buf_index;
-if (l == 0) {
-qemu_fill_buffer(f);
-l = f->buf_size - f->buf_index;
-if (l == 0) {
-break;
-}
-}
-if (l > size) {
-l = size;
+index = f->buf_index + offset;
+pending = f->buf_size - index;
+if (pending < size) {
+qemu_fill_buffer(f);
+index = f->buf_index + offset;
+pending = f->buf_size - index;
+}
+
+if (pending <= 0) {
+return 0;
+}
+if (size > pending) {
+size = pending;
+}
+
+memcpy(buf, f->buf + index, size);
+return size;
+}
+
+int qemu_get_buffer(QEMUFile *f, uint8_t *buf, int size)
+{
+int pending = size;
+int done = 0;
+
+while (pending > 0) {
+int res;
+
+res = qemu_peek_buffer(f, buf, pending, 0);
+if (res == 0) {
+return 0;
 }
-memcpy(buf, f->buf + f->buf_index, l);
-f->buf_index += l;
-buf += l;
-size -= l;
+qemu_file_skip(f, res);
+buf += res;
+pending -= res;
+done += res;
 }
-return size1 - size;
+return done;
 }

-static int qemu_peek_byte(QEMUFile *f)
+static int qemu_peek_byte(QEMUFile *f, int offset)
 {
+int index = f->buf_index + offset;
+
 if (f->is_write) {
 abort();
 }

-if (f->buf_index >= f->buf_size) {
+if (index >= f->buf_size) {
 qemu_fill_buffer(f);
-if (f->buf_index >= f->buf_size) {
+index = f->buf_index + offset;
+if (index >= f->buf_size) {
 return 0;
 }
 }
-return f->buf[f->buf_index];
+return f->buf[index];
 }

 int qemu_get_byte(QEMUFile *f)
 {
 int result;

-result = qemu_peek_byte(f);
-
-if (f->buf_index < f->buf_size) {
-f->buf_index++;
-}
+result = qemu_peek_byte(f, 0);
+qemu_file_skip(f, 1);
 return result;
 }

@@ -1684,22 +1710,36 @@ static int vmstate_subsection_load(QEMUFile *f, const 
VMStateDescription *vmsd,
 return 0;
 }

-while (qemu_peek_byte(f) == QEMU_VM_SUBSECTION) {
+while (qemu_peek_byte(f, 0) == QEMU_VM_SUBSECTION) {
 char idstr[256];
 int ret;
-uint8_t version_id, len;
+uint8_t version_id, len, size;
 const VMStateDescription *sub_vmsd;

-qemu_get_byte(f); /* subsection */
-len = qemu_get_byte(f);
-qemu_get_buffer(f, (uint8_t *)idstr, len);
-idstr[len] = 0;
-version_id = qemu_get_be32(f);
+len = qemu_peek_byte(f, 1);
+if (len < strlen(vmsd->name) + 1) {
+/* subsection name has be be "section_name/a" */
+return 0;
+}
+size = qemu_peek_buffer(f, (uint8_t *)idstr, len, 2);
+if (size != len) {
+return 0;
+}
+idstr[size] = 0;

+if (strncmp(vmsd->name, idstr, strlen(vmsd->name)) != 0) {
+/* it don't have a valid subsection name */
+return 0;
+}
 sub_vmsd = vmstate_get_subsection(sub, idstr);
 if (sub_vmsd == NULL) {
 return -ENOENT;
 }
+qemu_file_skip(f, 1); /* subsection */
+qemu_file_skip(f, 1); /* len */
+qemu_file_skip(f, len); /* idstr */
+version_id = qemu_get_be32(f);
+
 assert(!sub_vmsd->subsections);
 ret = vmstate_load_state(f, sub_vmsd, opaque, version_id);
 if (ret) {
-- 
1.7.6.4

[Qemu-devel] [PATCH 3/5] savevm: define qemu_get_byte() using qemu_peek_byte()

2011-10-06 Thread Juan Quintela

Signed-off-by: Juan Quintela 
---
 savevm.c |   15 ++-
 1 files changed, 6 insertions(+), 9 deletions(-)

diff --git a/savevm.c b/savevm.c
index 4069b34..94628c6 100644
--- a/savevm.c
+++ b/savevm.c
@@ -578,17 +578,14 @@ static int qemu_peek_byte(QEMUFile *f)

 int qemu_get_byte(QEMUFile *f)
 {
-if (f->is_write) {
-abort();
-}
+int result;

-if (f->buf_index >= f->buf_size) {
-qemu_fill_buffer(f);
-if (f->buf_index >= f->buf_size) {
-return 0;
-}
+result = qemu_peek_byte(f);
+
+if (f->buf_index < f->buf_size) {
+f->buf_index++;
 }
-return f->buf[f->buf_index++];
+return result;
 }

 int64_t qemu_ftell(QEMUFile *f)
-- 
1.7.6.4

[Qemu-devel] [PATCH 2/5] savevm: some coding style cleanups

2011-10-06 Thread Juan Quintela

This patch makes it easier to move code in the following patches while
keeping checkpatch happy.

Signed-off-by: Juan Quintela 
Reviewed-by: Anthony Liguori 
---
 savevm.c |   21 ++---
 1 files changed, 14 insertions(+), 7 deletions(-)

diff --git a/savevm.c b/savevm.c
index 743c304..4069b34 100644
--- a/savevm.c
+++ b/savevm.c
@@ -536,8 +536,9 @@ int qemu_get_buffer(QEMUFile *f, uint8_t *buf, int size1)
 {
 int size, l;

-if (f->is_write)
+if (f->is_write) {
 abort();
+}

 size = size1;
 while (size > 0) {
@@ -545,11 +546,13 @@ int qemu_get_buffer(QEMUFile *f, uint8_t *buf, int size1)
 if (l == 0) {
 qemu_fill_buffer(f);
 l = f->buf_size - f->buf_index;
-if (l == 0)
+if (l == 0) {
 break;
+}
 }
-if (l > size)
+if (l > size) {
 l = size;
+}
 memcpy(buf, f->buf + f->buf_index, l);
 f->buf_index += l;
 buf += l;
@@ -560,26 +563,30 @@ int qemu_get_buffer(QEMUFile *f, uint8_t *buf, int size1)

 static int qemu_peek_byte(QEMUFile *f)
 {
-if (f->is_write)
+if (f->is_write) {
 abort();
+}

 if (f->buf_index >= f->buf_size) {
 qemu_fill_buffer(f);
-if (f->buf_index >= f->buf_size)
+if (f->buf_index >= f->buf_size) {
 return 0;
+}
 }
 return f->buf[f->buf_index];
 }

 int qemu_get_byte(QEMUFile *f)
 {
-if (f->is_write)
+if (f->is_write) {
 abort();
+}

 if (f->buf_index >= f->buf_size) {
 qemu_fill_buffer(f);
-if (f->buf_index >= f->buf_size)
+if (f->buf_index >= f->buf_size) {
 return 0;
+}
 }
 return f->buf[f->buf_index++];
 }
-- 
1.7.6.4

[Qemu-devel] [PATCH 1/5] savevm: teach qemu_fill_buffer to do partial refills

2011-10-06 Thread Juan Quintela

The next patches need this in order to look ahead in the buffer.

v2: rename "used" to "pending" (Alex Williamson)
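
As a rough illustration of what a partial refill means, here is a small
self-contained sketch. The Reader type, reader_fill() and the in-memory
source are hypothetical stand-ins for QEMUFile, qemu_fill_buffer() and
f->get_buffer(); only the idea (keep the unread tail, then top up the
buffer) follows the patch.

```c
#include <stddef.h>
#include <string.h>

#define BUF_CAP 8   /* tiny on purpose; the patch uses IO_BUF_SIZE */

/* Hypothetical reader over an in-memory source. */
typedef struct Reader {
    const unsigned char *src;   /* backing data */
    size_t src_len;
    size_t src_pos;
    unsigned char buf[BUF_CAP];
    size_t buf_index;           /* next unread byte in buf */
    size_t buf_size;            /* number of valid bytes in buf */
} Reader;

/* Refill without discarding the bytes that were not consumed yet. */
static void reader_fill(Reader *r)
{
    size_t pending = r->buf_size - r->buf_index;
    size_t room, n;

    /* Move the unread tail to the front of the buffer. */
    if (pending > 0) {
        memmove(r->buf, r->buf + r->buf_index, pending);
    }
    r->buf_index = 0;
    r->buf_size = pending;

    /* Top up the rest of the buffer from the source. */
    room = BUF_CAP - pending;
    n = r->src_len - r->src_pos;
    if (n > room) {
        n = room;
    }
    memcpy(r->buf + pending, r->src + r->src_pos, n);
    r->src_pos += n;
    r->buf_size += n;
}
```

Because the unread bytes survive a refill, a caller can peek several bytes
ahead even when the lookahead window straddles two reads from the source.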

Signed-off-by: Juan Quintela 
Reviewed-by: Anthony Liguori 
---
 savevm.c |   14 +++---
 1 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/savevm.c b/savevm.c
index 46f2447..743c304 100644
--- a/savevm.c
+++ b/savevm.c
@@ -455,6 +455,7 @@ void qemu_fflush(QEMUFile *f)
 static void qemu_fill_buffer(QEMUFile *f)
 {
 int len;
+int pending;

 if (!f->get_buffer)
 return;
@@ -462,10 +463,17 @@ static void qemu_fill_buffer(QEMUFile *f)
 if (f->is_write)
 abort();

-len = f->get_buffer(f->opaque, f->buf, f->buf_offset, IO_BUF_SIZE);
+pending = f->buf_size - f->buf_index;
+if (pending > 0) {
+memmove(f->buf, f->buf + f->buf_index, pending);
+}
+f->buf_index = 0;
+f->buf_size = pending;
+
+len = f->get_buffer(f->opaque, f->buf + pending, f->buf_offset,
+IO_BUF_SIZE - pending);
 if (len > 0) {
-f->buf_index = 0;
-f->buf_size = len;
+f->buf_size += len;
 f->buf_offset += len;
 } else if (len != -EAGAIN)
 f->has_error = 1;
-- 
1.7.6.4

[Qemu-devel] [PATCH 0/5] migration: Improve subsections detection

2011-10-06 Thread Juan Quintela

Hi

v2:
- rename "used" to "pending" (Alex suggestion)
- implement qemu_get_{byte,buffer} on top of qemu_peek_{byte,buffer}
  (Anthony suggestion)
- fix qemu_peek_buffer() logic (Alex discovered the problem)

v1:
This series changes the subsection detection code from:
- Check that it starts with 5
To:
- Check that it starts with 5 (SUBSECTION)
- Look at the length
- Check that the length is bigger than the section name
- Look at the idstr and check that it starts with the section name.

Please review.

Later, Juan.

Juan Quintela (5):
  savevm: teach qemu_fill_buffer to do partial refills
  savevm: some coding style cleanups
  savevm: define qemu_get_byte() using qemu_peek_byte()
  savevm: improve subsections detection on load
  Revert "savevm: fix corruption in vmstate_subsection_load()."

 savevm.c |  144 -
 1 files changed, 94 insertions(+), 50 deletions(-)

-- 
1.7.6.4

[Qemu-devel] [PATCH 0/4] -net tap: rootless bridge support for qemu

2011-10-06 Thread Richa Marwaha

With qemu it is possible to run a guest as an unprivileged user, but if
we wanted to communicate with the outside world we had to switch
to root.

We address this problem by introducing a new network option.  This
option is less flexible than the other -net tap options because
it relies on a helper with elevated privileges to do the heavy lifting
of allocating and attaching a tap device to a bridge.  We use a special
purpose helper because we don't want to elevate the privileges of more
generic tools like brctl.

Qemu can be run with the default network helper as follows (in
this case attaching the tap device to the default qemubr0 bridge):

 qemu -hda linux.img \
      -net tap,helper=/usr/local/libexec/qemu-bridge-helper -net nic

We're not overly thrilled with having to spell out the helper file name,
however we didn't want to regress any current behavior of -net tap.
Additionally, we feel that this support makes sense in the -net tap backend.
Any suggestions to improve on this are more than welcome.

The default helper uses its own ACL mechanism for access control, but
future network helpers could be developed, for example, to support PolicyKit
for access control.

More details are included in the individual patches.  The helper is broken
into a series of patches to improve reviewability.

Richa Marwaha (4):
  Add basic version of bridge helper
  Add access control support to qemu-bridge-helper
  Add cap reduction support to enable use as SUID
  Add support for bridge

 Makefile |   12 ++-
 configure|   37 +
 net.c|8 +
 net.h|2 +
 net/tap.c|  150 ++-
 qemu-bridge-helper.c |  402 ++
 qemu-options.hx  |   48 +--
 7 files changed, 637 insertions(+), 22 deletions(-)
 create mode 100644 qemu-bridge-helper.c

[Qemu-devel] [PATCH 1/4] Add basic version of bridge helper

2011-10-06 Thread Richa Marwaha

This patch adds a helper that can be used to create a tap device attached to
a bridge device.  Since this helper is minimal in what it does, it can be
given CAP_NET_ADMIN which allows qemu to avoid running as root while still
satisfying the majority of what users tend to want to do with tap devices.

The way this all works is that qemu launches this helper passing a bridge
name and the name of an inherited file descriptor.  The descriptor is one
end of a socketpair() of domain sockets.  This domain socket is used to
transmit a file descriptor of the opened tap device from the helper to qemu.

The helper can then exit and let qemu use the tap device.
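
For readers unfamiliar with descriptor passing, the mechanism described
above can be sketched as follows. This is an independent illustration, not
the code from this patch: send_fd() only mirrors the name used in
qemu-bridge-helper.c, and recv_fd() is a hypothetical receiving side.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control buffer sized and aligned for exactly one descriptor. */
union fd_ctrl {
    char buf[CMSG_SPACE(sizeof(int))];
    struct cmsghdr align;
};

/* Send fd over the connected UNIX socket sock; 0 on success, -1 on error. */
static int send_fd(int sock, int fd)
{
    union fd_ctrl ctrl;
    char byte = 0;              /* one data byte carries the control message */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    struct msghdr msg;
    struct cmsghdr *cmsg;

    assert(sock >= 0 && fd >= 0);
    memset(&msg, 0, sizeof(msg));
    memset(&ctrl, 0, sizeof(ctrl));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Hypothetical receiver: returns the new descriptor, or -1 on error. */
static int recv_fd(int sock)
{
    union fd_ctrl ctrl;
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    struct msghdr msg;
    struct cmsghdr *cmsg;
    int fd;

    assert(sock >= 0);
    memset(&msg, 0, sizeof(msg));
    memset(&ctrl, 0, sizeof(ctrl));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    if (recvmsg(sock, &msg, 0) != 1) {
        return -1;
    }
    cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS ||
        cmsg->cmsg_len != CMSG_LEN(sizeof(int))) {
        return -1;
    }
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}
```

The received descriptor is a new number in the receiving process that
refers to the same open file, which is why the helper can exit once the
message has been sent.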

Signed-off-by: Richa Marwaha 
---
 Makefile |   12 +++-
 configure|1 +
 qemu-bridge-helper.c |  205 ++
 3 files changed, 216 insertions(+), 2 deletions(-)
 create mode 100644 qemu-bridge-helper.c

diff --git a/Makefile b/Makefile
index 6ed3194..f2caedc 100644
--- a/Makefile
+++ b/Makefile
@@ -34,6 +34,8 @@ $(call set-vpath, $(SRC_PATH):$(SRC_PATH)/hw)
 
 LIBS+=-lz $(LIBS_TOOLS)
 
+HELPERS-$(CONFIG_LINUX) = qemu-bridge-helper$(EXESUF)
+
 ifdef BUILD_DOCS
 DOCS=qemu-doc.html qemu-tech.html qemu.1 qemu-img.1 qemu-nbd.8 
QMP/qmp-commands.txt
 else
@@ -74,7 +76,7 @@ defconfig:
 
 -include config-all-devices.mak
 
-build-all: $(DOCS) $(TOOLS) recurse-all
+build-all: $(DOCS) $(TOOLS) $(HELPERS-y) recurse-all
 
 config-host.h: config-host.h-timestamp
 config-host.h-timestamp: config-host.mak
@@ -151,6 +153,8 @@ qemu-nbd$(EXESUF): qemu-nbd.o qemu-tool.o qemu-error.o 
$(oslib-obj-y) $(trace-ob
 
 qemu-io$(EXESUF): qemu-io.o cmd.o qemu-tool.o qemu-error.o $(oslib-obj-y) 
$(trace-obj-y) $(block-obj-y) $(qobject-obj-y) $(version-obj-y) 
qemu-timer-common.o
 
+qemu-bridge-helper$(EXESUF): qemu-bridge-helper.o
+
 qemu-img-cmds.h: $(SRC_PATH)/qemu-img-cmds.hx
$(call quiet-command,sh $(SRC_PATH)/scripts/hxtool -h < $< > $@,"  GEN  
 $@")
 
@@ -208,7 +212,7 @@ clean:
 # avoid old build problems by removing potentially incorrect old files
rm -f config.mak op-i386.h opc-i386.h gen-op-i386.h op-arm.h opc-arm.h 
gen-op-arm.h
rm -f qemu-options.def
-   rm -f *.o *.d *.a *.lo $(TOOLS) qemu-ga TAGS cscope.* *.pod *~ */*~
+   rm -f *.o *.d *.a *.lo $(TOOLS) $(HELPERS-y) qemu-ga TAGS cscope.* 
*.pod *~ */*~
rm -Rf .libs
rm -f slirp/*.o slirp/*.d audio/*.o audio/*.d block/*.o block/*.d 
net/*.o net/*.d fsdev/*.o fsdev/*.d ui/*.o ui/*.d qapi/*.o qapi/*.d qga/*.o 
qga/*.d
rm -f qemu-img-cmds.h
@@ -275,6 +279,10 @@ install: all $(if $(BUILD_DOCS),install-doc) 
install-sysconfig
 ifneq ($(TOOLS),)
$(INSTALL_PROG) $(STRIP_OPT) $(TOOLS) "$(DESTDIR)$(bindir)"
 endif
+ifneq ($(HELPERS-y),)
+   $(INSTALL_DIR) "$(DESTDIR)$(libexecdir)"
+   $(INSTALL_PROG) $(STRIP_OPT) $(HELPERS-y) "$(DESTDIR)$(libexecdir)"
+endif
 ifneq ($(BLOBS),)
$(INSTALL_DIR) "$(DESTDIR)$(datadir)"
set -e; for x in $(BLOBS); do \
diff --git a/configure b/configure
index 59b1494..3e32834 100755
--- a/configure
+++ b/configure
@@ -2742,6 +2742,7 @@ echo "mandir=$mandir" >> $config_host_mak
 echo "datadir=$datadir" >> $config_host_mak
 echo "sysconfdir=$sysconfdir" >> $config_host_mak
 echo "docdir=$docdir" >> $config_host_mak
+echo "libexecdir=\${prefix}/libexec" >> $config_host_mak
 echo "confdir=$confdir" >> $config_host_mak
 
 case "$cpu" in
diff --git a/qemu-bridge-helper.c b/qemu-bridge-helper.c
new file mode 100644
index 000..4ac7b36
--- /dev/null
+++ b/qemu-bridge-helper.c
@@ -0,0 +1,205 @@
+/*
+ * QEMU Bridge Helper
+ *
+ * Copyright IBM, Corp. 2011
+ *
+ * Authors:
+ * Anthony Liguori   
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ *
+ */
+
+#include "config-host.h"
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+
+#include 
+
+#include "net/tap-linux.h"
+
+static int has_vnet_hdr(int fd)
+{
+unsigned int features = 0;
+struct ifreq ifreq;
+
+if (ioctl(fd, TUNGETFEATURES, &features) == -1) {
+return -errno;
+}
+
+if (!(features & IFF_VNET_HDR)) {
+return -ENOTSUP;
+}
+
+if (ioctl(fd, TUNGETIFF, &ifreq) != -1 || errno != EBADFD) {
+return -ENOTSUP;
+}
+
+return 1;
+}
+
+static void prep_ifreq(struct ifreq *ifr, const char *ifname)
+{
+memset(ifr, 0, sizeof(*ifr));
+snprintf(ifr->ifr_name, IFNAMSIZ, "%s", ifname);
+}
+
+static int send_fd(int c, int fd)
+{
+char msgbuf[CMSG_SPACE(sizeof(fd))];
+struct msghdr msg = {
+.msg_control = msgbuf,
+.msg_controllen = sizeof(msgbuf),
+};
+struct cmsghdr *cmsg;
+struct iovec iov;
+char req[1] = { 0x00 };
+
+cmsg = CMSG_FIRSTHDR(&msg);
+cmsg->cmsg_level = SOL_SOCKET;
+cmsg->cmsg_type = SCM_RIGHTS;
+cmsg->cmsg_len =

[Qemu-devel] [PATCH 2/4] Add access control support to qemu-bridge-helper

2011-10-06 Thread Richa Marwaha

We go to great lengths to restrict ourselves to just cap_net_admin as an OS
enforced security mechanism.  However, we further restrict what we allow users
to do to simply adding a tap device to a bridge interface by virtue of the fact
that this is the only functionality we expose.

This is not good enough though.  An administrator is likely to want to restrict
the bridges that an unprivileged user can access, in particular, to restrict
an unprivileged user from putting a guest on what should be isolated networks.

This patch implements an ACL mechanism that is enforced by qemu-bridge-helper.
The ACLs are fairly simple whitelist/blacklist mechanisms with a wildcard of
'all'.

An interesting feature of this ACL mechanism is that you can include external
ACL files.  The main reason to support this is so that you can set different
file system permissions on those external ACL files.  This allows an
administrator to implement rather sophisticated ACL policies based on user/group
policies via the file system.

As an example:

/etc/qemu/bridge.conf root:qemu 0640

 deny all
 allow br0
 include /etc/qemu/alice.conf
 include /etc/qemu/bob.conf

/etc/qemu/alice.conf root:alice 0640
 allow br1

/etc/qemu/bob.conf root:bob 0640
 allow br2

This ACL pattern allows any user in the qemu group to get a tap device
connected to br0 (which is bridged to the physical network).

Users in the alice group can additionally get a tap device connected to br1.
This allows br1 to act as a private bridge for the alice group.

Users in the bob group can additionally get a tap device connected to br2.
This allows br2 to act as a private bridge for the bob group.

Under no circumstance can the bob group get access to br1 or can the alice
group get access to br2.
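
One way the parsed rules could be evaluated for a requested bridge is
sketched below. The enum and ACLRule mirror the definitions in this patch,
but acl_permits(), IFACE_MAX and the precedence policy are assumptions made
for the illustration: rules apply in order, a rule naming the bridge
overrides earlier decisions, and the default is to deny.

```c
#include <string.h>

#define IFACE_MAX 16    /* stand-in for IFNAMSIZ */

enum {
    ACL_ALLOW = 0,
    ACL_ALLOW_ALL,
    ACL_DENY,
    ACL_DENY_ALL,
};

typedef struct ACLRule {
    int type;
    char iface[IFACE_MAX];
} ACLRule;

/*
 * Hypothetical evaluation: return 1 if bridge may be used, 0 otherwise.
 * Assumed policy, not taken from the patch: rules are applied in file
 * order, a rule naming the bridge overrides any earlier decision, and
 * access is denied when no rule allows it.
 */
static int acl_permits(const ACLRule *acls, int count, const char *bridge)
{
    int allowed = 0;
    int denied = 0;
    int i;

    for (i = 0; i < count; i++) {
        switch (acls[i].type) {
        case ACL_ALLOW_ALL:
            allowed = 1;
            break;
        case ACL_DENY_ALL:
            denied = 1;
            break;
        case ACL_ALLOW:
            if (strcmp(bridge, acls[i].iface) == 0) {
                allowed = 1;
                denied = 0;
            }
            break;
        case ACL_DENY:
            if (strcmp(bridge, acls[i].iface) == 0) {
                allowed = 0;
                denied = 1;
            }
            break;
        }
    }
    return allowed && !denied;
}
```

With the rules "deny all", "allow br0", "allow br1" (what a member of the
alice group would effectively see in the example above), br0 and br1 are
permitted and br2 is refused.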

Signed-off-by: Richa Marwaha 
---
 qemu-bridge-helper.c |  141 ++
 1 files changed, 141 insertions(+), 0 deletions(-)

diff --git a/qemu-bridge-helper.c b/qemu-bridge-helper.c
index 4ac7b36..5e09fea 100644
--- a/qemu-bridge-helper.c
+++ b/qemu-bridge-helper.c
@@ -33,6 +33,105 @@
 
 #include "net/tap-linux.h"
 
+#define MAX_ACLS (128)
+#define DEFAULT_ACL_FILE CONFIG_QEMU_CONFDIR "/bridge.conf"
+
+enum {
+ACL_ALLOW = 0,
+ACL_ALLOW_ALL,
+ACL_DENY,
+ACL_DENY_ALL,
+};
+
+typedef struct ACLRule {
+int type;
+char iface[IFNAMSIZ];
+} ACLRule;
+
+static int parse_acl_file(const char *filename, ACLRule *acls, int *pacl_count)
+{
+int acl_count = *pacl_count;
+FILE *f;
+char line[4096];
+
+f = fopen(filename, "r");
+if (f == NULL) {
+return -1;
+}
+
+while (acl_count != MAX_ACLS &&
+fgets(line, sizeof(line), f) != NULL) {
+char *ptr = line;
+char *cmd, *arg, *argend;
+
+while (isspace(*ptr)) {
+ptr++;
+}
+
+/* skip comments and empty lines */
+if (*ptr == '#' || *ptr == 0) {
+continue;
+}
+
+cmd = ptr;
+arg = strchr(cmd, ' ');
+if (arg == NULL) {
+arg = strchr(cmd, '\t');
+}
+
+if (arg == NULL) {
+fprintf(stderr, "Invalid config line:\n  %s\n", line);
+fclose(f);
+errno = EINVAL;
+return -1;
+}
+
+*arg = 0;
+arg++;
+while (isspace(*arg)) {
+arg++;
+}
+
+argend = arg + strlen(arg);
+while (arg != argend && isspace(*(argend - 1))) {
+argend--;
+}
+*argend = 0;
+
+if (strcmp(cmd, "deny") == 0) {
+if (strcmp(arg, "all") == 0) {
+acls[acl_count].type = ACL_DENY_ALL;
+} else {
+acls[acl_count].type = ACL_DENY;
+snprintf(acls[acl_count].iface, IFNAMSIZ, "%s", arg);
+}
+acl_count++;
+} else if (strcmp(cmd, "allow") == 0) {
+if (strcmp(arg, "all") == 0) {
+acls[acl_count].type = ACL_ALLOW_ALL;
+} else {
+acls[acl_count].type = ACL_ALLOW;
+snprintf(acls[acl_count].iface, IFNAMSIZ, "%s", arg);
+}
+acl_count++;
+} else if (strcmp(cmd, "include") == 0) {
+/* ignore errors */
+parse_acl_file(arg, acls, &acl_count);
+} else {
+fprintf(stderr, "Unknown command `%s'\n", cmd);
+fclose(f);
+errno = EINVAL;
+return -1;
+}
+}
+
+*pacl_count = acl_count;
+
+fclose(f);
+
+return 0;
+}
+
 static int has_vnet_hdr(int fd)
 {
 unsigned int features = 0;
@@ -95,6 +194,9 @@ int main(int argc, char **argv)
 const char *bridge;
 char iface[IFNAMSIZ];
 int index;
+ACLRule acls[MAX_ACLS];
+int acl_count = 0;
+int i, access_allowed, access_denied;
 
 /* parse arguments */
 if (argc < 3 || argc > 4) {
@@ -115,6 +217,45 @@ int main(int argc, char **argv)
 bridge = argv[index++];
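
The parser above accepts lines of the form "allow IFACE", "deny IFACE", "allow all", "deny all" and "include FILE". As a rough illustration of how a split command/argument pair maps onto the rule types, here is a minimal sketch; sketch_acl_type is a made-up name and is not part of the patch, and "include" is left out because the patch handles it by recursing into parse_acl_file.

```c
#include <string.h>

/* Rule kinds, mirroring the enum in the patch. */
enum {
    ACL_ALLOW = 0,
    ACL_ALLOW_ALL,
    ACL_DENY,
    ACL_DENY_ALL,
};

/*
 * Map an already-split "cmd arg" pair to a rule kind.
 * Returns -1 for anything other than "allow" or "deny".
 * Hypothetical helper, for illustration only.
 */
static int sketch_acl_type(const char *cmd, const char *arg)
{
    int all = strcmp(arg, "all") == 0;

    if (strcmp(cmd, "deny") == 0) {
        return all ? ACL_DENY_ALL : ACL_DENY;
    }
    if (strcmp(cmd, "allow") == 0) {
        return all ? ACL_ALLOW_ALL : ACL_ALLOW;
    }
    return -1;
}
```

With this mapping, a bridge.conf containing "allow br0" followed by "deny all" would yield an ACL_ALLOW rule for br0 and then an ACL_DENY_ALL rule.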

[Qemu-devel] [PATCH 3/4] Add cap reduction support to enable use as SUID

2011-10-06 Thread Richa Marwaha

The ideal way to use qemu-bridge-helper is to give it an fscap using:

 setcap cap_net_admin=ep qemu-bridge-helper

Unfortunately, most distros still do not have a mechanism to package files
with fscaps applied.  This means they'll have to SUID the qemu-bridge-helper
binary.

To improve security, use libcap to reduce our capability set to just
cap_net_admin, then reduce privileges down to the calling user.  This is
hopefully close to equivalent to fscap support from a security perspective.
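
The step that returns to the calling user's IDs can be sketched on its own, without libcap. This is only an illustration under the assumption that the caller wants a hard failure if either call is refused; sketch_drop_to_real_ids is a made-up name, and the capability handling from the patch is deliberately left out.

```c
#include <unistd.h>

/*
 * Drop the effective group and user IDs back to the real ones.
 * The group goes first: once the effective UID is unprivileged,
 * changing the group may no longer be permitted.
 * Returns 0 on success, -1 on failure.
 * Hypothetical helper, for illustration only.
 */
static int sketch_drop_to_real_ids(void)
{
    if (setegid(getgid()) == -1) {
        return -1;
    }
    if (seteuid(getuid()) == -1) {
        return -1;
    }
    /* Verify that the change really took effect. */
    if (getegid() != getgid() || geteuid() != getuid()) {
        return -1;
    }
    return 0;
}
```

Checking the return values matters here because a silently failed seteuid() would leave the helper running with full root privileges.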

Signed-off-by: Richa Marwaha 
---
 configure|   34 ++
 qemu-bridge-helper.c |   56 ++
 2 files changed, 90 insertions(+), 0 deletions(-)

diff --git a/configure b/configure
index 3e32834..f46e9b7 100755
--- a/configure
+++ b/configure
@@ -128,6 +128,7 @@ vnc_thread="no"
 xen=""
 xen_ctrl_version=""
 linux_aio=""
+cap=""
 attr=""
 xfs=""
 
@@ -653,6 +654,10 @@ for opt do
   ;;
   --enable-kvm) kvm="yes"
   ;;
+  --disable-cap)  cap="no"
+  ;;
+  --enable-cap) cap="yes"
+  ;;
   --disable-spice) spice="no"
   ;;
   --enable-spice) spice="yes"
@@ -1032,6 +1037,8 @@ echo "  --disable-vde  disable support for vde network"
 echo "  --enable-vde enable support for vde network"
 echo "  --disable-linux-aio  disable Linux AIO support"
 echo "  --enable-linux-aio   enable Linux AIO support"
+echo "  --disable-cap  disable libcap support"
+echo "  --enable-cap enable libcap support"
 echo "  --disable-attr   disables attr and xattr support"
 echo "  --enable-attr    enable attr and xattr support"
 echo "  --disable-blobs  disable installing provided firmware blobs"
@@ -1638,6 +1645,29 @@ EOF
 fi
 
 ##
+# cap library probe
+if test "$cap" != "no" ; then
+  cap_libs="-lcap"
+  cat > $TMPC << EOF
+#include <sys/capability.h>
+int main(void)
+{
+cap_init();
+return 0;
+}
+EOF
+  if compile_prog "" "$cap_libs" ; then
+cap=yes
+libs_tools="$cap_libs $libs_tools"
+  else
+if test "$cap" = "yes" ; then
+  feature_not_found "cap"
+fi
+cap=no
+  fi
+fi
+
+##
 # Sound support libraries probe
 
 audio_drv_probe()
@@ -2710,6 +2740,7 @@ echo "fdatasync $fdatasync"
 echo "madvise   $madvise"
 echo "posix_madvise $posix_madvise"
 echo "uuid support  $uuid"
+echo "libcap support $cap"
 echo "vhost-net support $vhost_net"
 echo "Trace backend $trace_backend"
 echo "Trace output file $trace_file-"
@@ -2821,6 +2852,9 @@ fi
 if test "$vde" = "yes" ; then
   echo "CONFIG_VDE=y" >> $config_host_mak
 fi
+if test "$cap" = "yes" ; then
+  echo "CONFIG_LIBCAP=y" >> $config_host_mak
+fi
 for card in $audio_card_list; do
 def=CONFIG_`echo $card | tr '[:lower:]' '[:upper:]'`
 echo "$def=y" >> $config_host_mak
diff --git a/qemu-bridge-helper.c b/qemu-bridge-helper.c
index 5e09fea..b1519e0 100644
--- a/qemu-bridge-helper.c
+++ b/qemu-bridge-helper.c
@@ -33,6 +33,10 @@
 
 #include "net/tap-linux.h"
 
+#ifdef CONFIG_LIBCAP
+#include <sys/capability.h>
+#endif
+
 #define MAX_ACLS (128)
 #define DEFAULT_ACL_FILE CONFIG_QEMU_CONFDIR "/bridge.conf"
 
@@ -185,6 +189,47 @@ static int send_fd(int c, int fd)
 return sendmsg(c, &msg, 0);
 }
 
+#ifdef CONFIG_LIBCAP
+static int drop_privileges(void)
+{
+cap_t cap;
+cap_value_t new_caps[] = {CAP_NET_ADMIN};
+
+cap = cap_init();
+
+/* set capabilities to be permitted and inheritable.  we don't need the
+ * caps to be effective right now as they'll get reset when we seteuid
+ * anyway */
+cap_set_flag(cap, CAP_PERMITTED, 1, new_caps, CAP_SET);
+cap_set_flag(cap, CAP_INHERITABLE, 1, new_caps, CAP_SET);
+
+if (cap_set_proc(cap) == -1) {
+return -1;
+}
+
+cap_free(cap);
+
+/* reduce our privileges to a normal user */
+setegid(getgid());
+seteuid(getuid());
+
+cap = cap_init();
+
+/* enable our capabilities.  we marked them as inheritable earlier
+ * which is what allows this to work. */
+cap_set_flag(cap, CAP_EFFECTIVE, 1, new_caps, CAP_SET);
+cap_set_flag(cap, CAP_PERMITTED, 1, new_caps, CAP_SET);
+
+if (cap_set_proc(cap) == -1) {
+return -1;
+}
+
+cap_free(cap);
+
+return 0;
+}
+#endif
+
 int main(int argc, char **argv)
 {
 struct ifreq ifr;
@@ -198,6 +243,17 @@ int main(int argc, char **argv)
 int acl_count = 0;
 int i, access_allowed, access_denied;
 
+#ifdef CONFIG_LIBCAP
+/* if we're run from an suid binary, immediately drop privileges preserving
+ * cap_net_admin */
+if (geteuid() == 0 && getuid() != geteuid()) {
+if (drop_privileges() == -1) {
+fprintf(stderr, "failed to drop privileges\n");
+return 1;
+}
+}
+#endif
+
 /* parse arguments */
 if (argc < 3 || argc > 4) {
 fprintf(stderr, "Usage: %s [--use-vnet] BRIDGE FD\n", argv[0]);
-- 
1.7.1

[Qemu-devel] [PATCH 4/4] Add support for bridge

2011-10-06 Thread Richa Marwaha

The most common use of -net tap is to connect a tap device to a bridge.  This
requires the use of a script and running qemu as root in order to allocate a
tap device to pass to the script.

This model is great for portability and flexibility but it's incredibly
difficult to eliminate the need to run qemu as root.  The only really viable
mechanism is to use tunctl to create a tap device, attach it to a bridge as
root, and then hand that tap device to qemu.  The problem with this mechanism
is that it requires administrator intervention whenever a user wants to create
a guest.

By essentially writing a helper that implements the most common qemu-ifup
script that can be safely given cap_net_admin, we can dramatically simplify
things for non-privileged users.  We still support existing -net tap options
as a mechanism for advanced users and backwards compatibility.
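
A helper of this kind has to hand the configured tap descriptor back to qemu. The recv_fd() code in this patch expects it as SCM_RIGHTS ancillary data on a Unix socket. Below is a minimal sketch of both ends of such a transfer; it is not the patch's code, and the sketch_ names are made up for illustration.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send descriptor fd over the Unix socket sock. Returns 0 or -1. */
static int sketch_send_fd(int sock, int fd)
{
    char payload = 0;            /* at least one data byte must travel */
    struct iovec iov = { .iov_base = &payload, .iov_len = 1 };
    union {
        struct cmsghdr align;    /* forces correct alignment of buf */
        char buf[CMSG_SPACE(sizeof(int))];
    } control;
    struct msghdr msg;
    struct cmsghdr *cmsg;

    memset(&control, 0, sizeof(control));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = control.buf;
    msg.msg_controllen = sizeof(control.buf);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one descriptor from sock. Returns the new fd or -1. */
static int sketch_recv_fd(int sock)
{
    char payload;
    struct iovec iov = { .iov_base = &payload, .iov_len = 1 };
    union {
        struct cmsghdr align;
        char buf[CMSG_SPACE(sizeof(int))];
    } control;
    struct msghdr msg;
    struct cmsghdr *cmsg;
    int fd;

    memset(&control, 0, sizeof(control));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = control.buf;
    msg.msg_controllen = sizeof(control.buf);

    if (recvmsg(sock, &msg, 0) <= 0) {
        return -1;
    }

    /* Only trust the control data after checking what arrived. */
    cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS ||
        cmsg->cmsg_len != CMSG_LEN(sizeof(int))) {
        return -1;
    }
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}
```

The receiver inspects the control header only after recvmsg() returns, so a peer that sends plain data without a descriptor is rejected instead of yielding an uninitialized fd.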

Currently, this is very Linux centric but there's really no reason why it
couldn't be extended for other Unixes.

The default bridge that we attach to is qemubr0.  The thinking is that a distro
could preconfigure such an interface to allow out-of-the-box bridged networking.

Alternatively, if a user wants to use a different bridge, they can say:

  qemu -hda linux.img -net tap,br=br0,helper=/usr/local/libexec/qemu-bridge-helper -net nic,model=virtio

Signed-off-by: Richa Marwaha 
---
 configure   |2 +
 net.c   |8 +++
 net.h   |2 +
 net/tap.c   |  150 ---
 qemu-options.hx |   48 +-
 5 files changed, 190 insertions(+), 20 deletions(-)

diff --git a/configure b/configure
index f46e9b7..ef05954 100755
--- a/configure
+++ b/configure
@@ -2775,6 +2775,8 @@ echo "sysconfdir=$sysconfdir" >> $config_host_mak
 echo "docdir=$docdir" >> $config_host_mak
 echo "libexecdir=\${prefix}/libexec" >> $config_host_mak
 echo "confdir=$confdir" >> $config_host_mak
+echo "CONFIG_QEMU_SHAREDIR=\"$prefix$datasuffix\"" >> $config_host_mak
+echo "CONFIG_QEMU_HELPERDIR=\"$prefix/libexec\"" >> $config_host_mak
 
 case "$cpu" in
   
i386|x86_64|alpha|cris|hppa|ia64|lm32|m68k|microblaze|mips|mips64|ppc|ppc64|s390|s390x|sparc|sparc64|unicore32)
diff --git a/net.c b/net.c
index d05930c..4c3c551 100644
--- a/net.c
+++ b/net.c
@@ -956,6 +956,14 @@ static const struct {
 .type = QEMU_OPT_STRING,
 .help = "script to shut down the interface",
 }, {
+.name = "br",
+.type = QEMU_OPT_STRING,
+.help = "bridge name",
+}, {
+.name = "helper",
+.type = QEMU_OPT_STRING,
+.help = "command to execute to configure bridge",
+}, {
 .name = "sndbuf",
 .type = QEMU_OPT_SIZE,
 .help = "send buffer limit"
diff --git a/net.h b/net.h
index 9f633f8..eeb19a7 100644
--- a/net.h
+++ b/net.h
@@ -174,6 +174,8 @@ int do_netdev_del(Monitor *mon, const QDict *qdict, QObject **ret_data);
 
 #define DEFAULT_NETWORK_SCRIPT "/etc/qemu-ifup"
 #define DEFAULT_NETWORK_DOWN_SCRIPT "/etc/qemu-ifdown"
+#define DEFAULT_BRIDGE_HELPER CONFIG_QEMU_HELPERDIR "/qemu-bridge-helper"
+#define DEFAULT_BRIDGE_INTERFACE "qemubr0"
 
 void qdev_set_nic_properties(DeviceState *dev, NICInfo *nd);
 
diff --git a/net/tap.c b/net/tap.c
index 1f26dc9..74f103a 100644
--- a/net/tap.c
+++ b/net/tap.c
@@ -388,6 +388,108 @@ static int launch_script(const char *setup_script, const char *ifname, int fd)
 return -1;
 }
 
+static int recv_fd(int c)
+{
+int fd;
+uint8_t msgbuf[CMSG_SPACE(sizeof(fd))];
+struct msghdr msg = {
+.msg_control = msgbuf,
+.msg_controllen = sizeof(msgbuf),
+};
+struct cmsghdr *cmsg;
+struct iovec iov;
+uint8_t req[1];
+ssize_t len;
+
+cmsg = CMSG_FIRSTHDR(&msg);
+cmsg->cmsg_level = SOL_SOCKET;
+cmsg->cmsg_type = SCM_RIGHTS;
+cmsg->cmsg_len = CMSG_LEN(sizeof(fd));
+msg.msg_controllen = cmsg->cmsg_len;
+
+iov.iov_base = req;
+iov.iov_len = sizeof(req);
+
+msg.msg_iov = &iov;
+msg.msg_iovlen = 1;
+
+len = recvmsg(c, &msg, 0);
+if (len > 0) {
+memcpy(&fd, CMSG_DATA(cmsg), sizeof(fd));
+return fd;
+}
+
+return len;
+}
+
+static int net_bridge_run_helper(const char *helper, const char *bridge)
+{
+sigset_t oldmask, mask;
+int pid, status;
+char *args[5];
+char **parg;
+int sv[2];
+
+sigemptyset(&mask);
+sigaddset(&mask, SIGCHLD);
+sigprocmask(SIG_BLOCK, &mask, &oldmask);
+
+if (socketpair(PF_UNIX, SOCK_STREAM, 0, sv) == -1) {
+return -1;
+}
+
+/* try to launch bridge helper */
+pid = fork();
+if (pid == 0) {
+int open_max = sysconf(_SC_OPEN_MAX), i;
+char buf[32];
+
+snprintf(buf, sizeof(buf), "%d", sv[1]);
+
+for (i = 0; i < open_max; i++) {
+if (i != STDIN_FILENO &&
+

[Qemu-devel] Running Qemu on Mac OS 10.7

2011-10-06 Thread mneug

Hi,

Has anyone on this list tried and succeeded in running a recent version of
Qemu on Mac OS X?
Are there any precautions that need to be taken? For me 0.15.0 compiles
fine on OS X 10.7, but crashes right after trying to load an image with a
segfault:

Exception Type:  EXC_BAD_ACCESS (SIGSEGV)
Exception Codes: KERN_INVALID_ADDRESS at 0x003a
0   qemu  0x00010d2cf5bd helper_svm_check_intercept_param + 29 (op_helper.c:5236)
1   qemu  0x00010d2de109 helper_write_crN + 41 (op_helper.c:2946)
2   ???   0x0001113d107c 0 + 4584181884

Cheers,
Matthias

[Qemu-devel] [RFC] 1.0 release schedule adjustment (spreading out RCs)

2011-10-06 Thread Anthony Liguori


Hi,

I'm trying to map out the 1.1 release using the same formula as the 1.0 release.
To make things work a bit better, I'd like to adjust the -rc schedule a bit.
Namely:


| 2011-11-01
| Freeze master
|-
| 2011-11-04  ->   2011-11-07
| Tag qemu-1.0-rc1
|-
| 2011-11-11  ->   2011-11-14
| Tag qemu-1.0-rc2
|-
| 2011-11-18  ->   2011-11-21
| Tag qemu-1.0-rc3
|-
| 2011-11-23  ->   2011-11-28
| Tag qemu-1.0-rc4
|-
| 2011-12-01
| Tag qemu-1.0

I had squashed things originally because of the US Thanksgiving holiday on the 
25th but realistically, the 28th is no better than the 23rd.  This spreads out 
the -rcs a bit more evenly.


Any thoughts/objections?

Regards,

Anthony Liguori

Re: [Qemu-devel] [PATCH] runstate: do not discard runstate changes when paused

2011-10-06 Thread Jan Kiszka

On 2011-10-06 16:27, Avi Kivity wrote:
> On 10/05/2011 08:02 PM, Jan Kiszka wrote:
>> >
>> >  Let's examine a concrete example: a user is debugging a guest, which
>> >  stops at a breakpoint.  Meanwhile a live migration is going on,
>> >  involving internal stops.  When the guest does manage to run for a
>> bit,
>> >  it runs out of disk space, generating a stop, which the management
>> agent
>> >  resolves by allocating more space and issuing a cont.
>> >
>> >  With a counting cont, no matter in what order these events happen,
>> >  things work out fine.  How do they work out with your proposal?
>>
>> We can enforce stop for temporal reasons (migration/savevm), something
>> that overrules user/management initiated stops.
> 
> Migration resume shouldn't overrule user stop.

That's not what I had in mind. Migration stop could overrule user resume.

But that discussion is moot as there is no time span where this could
happen. Migration just needs to re-enter the original state on error,
savevm/loadvm restore what it found on entry. All this is atomic wrt
other agents.

> 
> It's really simple.  If any agent wants the system stopped, it's
> stopped.  Only when no one wants it stopped, it may run.
> 
>>
>> BTW, does stop due to migration actually have a window where it accepts
>> other commands? I thought that phase is synchronous. Then we would just
>> have to implement proper state saving/restoring.
> 
> Save: ++stop_count, restore: --stop_count.
> 
>>
>> Anyway, there is no point in lock counting for stop reasons that require
>> external synchronization anyway. gdb vs. management stack vs. human
>> monitor - nothing is solved by counting the stops, they all can step on
>> each other's shoes.
> 
> Please elaborate.

Every agent can issue every monitor command. If you have a gdb session
running, you don't want the management stack to migrate your VM away or
mess with it otherwise. If you try to migrate a machine, you don't want
any other agent change its configuration beforehand, adding a device
that is not present on the target, etc.

> 
>> Even worse, exposing a counting stop via the user
>> interface requires additional interfaces to recover lost or forgotten
>> locks. We've discussed this in the past IIRC.
>>
> 
> Agree with that.  So there's the second proposal:
> 
> vm_stop(unsigned reason)
> {
> if (!stop_state) {
> do_vm_stop();
> }
> stop_state |= 1 << reason;
> }
> 
> vm_resume(unsigned reason)
> {
> stop_state &= ~(1 << reason);
> if (!stop_state) {
> do_vm_resume();
> }
> }
> 
> so now each agent is separated from the other.
> 

Stop reasons are orthogonal to agents.

BTW, the above model would still require extending the user interface to
report pending stop reasons and allow specifying resume reasons.
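
To make that interface point concrete: with a bitmask of stop reasons, a monitor command would need some way to list which reasons are still pending. A minimal sketch follows, assuming three hypothetical reasons whose names and bit positions are invented here; nothing like this exists in the proposal itself.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical reason names; bit i of stop_state maps to entry i. */
static const char *const sketch_reason_names[] = {
    "user", "migration", "io-error",
};
#define SKETCH_NREASONS \
    (sizeof(sketch_reason_names) / sizeof(sketch_reason_names[0]))

/*
 * Write the pending reasons as a comma-separated list into buf
 * (truncating if buf is too small) and return how many are pending.
 */
static int sketch_pending_reasons(unsigned stop_state, char *buf, size_t len)
{
    size_t off = 0;
    int count = 0;
    unsigned i;

    if (len > 0) {
        buf[0] = '\0';
    }
    for (i = 0; i < SKETCH_NREASONS; i++) {
        if (!(stop_state & (1u << i))) {
            continue;
        }
        if (off < len) {
            int n = snprintf(buf + off, len - off, "%s%s",
                             count ? "," : "", sketch_reason_names[i]);
            if (n > 0) {
                off += (size_t)n;
            }
        }
        count++;
    }
    return count;
}
```

A resume command would then take one of these names and clear only the matching bit.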

Jan




Re: [Qemu-devel] [PATCH] Set an invalid-bits mask for each SPE instructions

2011-10-06 Thread Fabien Chouteau

On 28/09/2011 17:54, Fabien Chouteau wrote:
> SPE instructions are defined by pairs. Currently, the invalid-bits mask is set
> for the first instruction, but the second one can have a different mask.
> 
> example:
> GEN_SPE(efdcmpeq,efdcfs,  0x17, 0x0B, 0x0060, 0x0018, PPC_SPE_DOUBLE),
> 
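
For readers unfamiliar with the idea, a per-instruction invalid-bits check for such a pair could look roughly like the sketch below. It assumes that the low opcode bit selects between the two instructions of a pair and that an opcode is rejected when it sets any bit in the selected mask; the struct and function names are invented for illustration and are not taken from the patch.

```c
#include <stdint.h>

/* Hypothetical descriptor: one invalid-bits mask per instruction. */
struct sketch_spe_pair {
    uint32_t inval0;    /* mask for the first instruction of the pair */
    uint32_t inval1;    /* mask for the second instruction of the pair */
};

/*
 * Return nonzero if opcode sets a bit that is invalid for the
 * instruction it selects. The low bit picking the second
 * instruction is an assumption of this sketch.
 */
static int sketch_spe_has_invalid_bits(const struct sketch_spe_pair *p,
                                       uint32_t opcode)
{
    uint32_t mask = (opcode & 1u) ? p->inval1 : p->inval0;

    return (opcode & mask) != 0;
}
```

With a single shared mask, an encoding that is legal for the second instruction could be rejected, or an illegal one accepted, depending on the first instruction's mask.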

Any comments?

-- 
Fabien Chouteau

[Qemu-devel] [PATCH V5] Add stdio char device on windows

2011-10-06 Thread Fabien Chouteau

Simple implementation of an stdio char device on Windows.

Signed-off-by: Fabien Chouteau 
---
 qemu-char.c |  227 ++-
 1 files changed, 225 insertions(+), 2 deletions(-)

diff --git a/qemu-char.c b/qemu-char.c
index 09d2309..b9381be 100644
--- a/qemu-char.c
+++ b/qemu-char.c
@@ -538,6 +538,9 @@ int send_all(int fd, const void *_buf, int len1)
 }
 #endif /* !_WIN32 */
 
+#define STDIO_MAX_CLIENTS 1
+static int stdio_nb_clients;
+
 #ifndef _WIN32
 
 typedef struct {
@@ -545,8 +548,6 @@ typedef struct {
 int max_size;
 } FDCharDriver;
 
-#define STDIO_MAX_CLIENTS 1
-static int stdio_nb_clients = 0;
 
 static int fd_chr_write(CharDriverState *chr, const uint8_t *buf, int len)
 {
@@ -1451,6 +1452,8 @@ static int qemu_chr_open_pp(QemuOpts *opts, CharDriverState **_chr)
 
 #else /* _WIN32 */
 
+static CharDriverState *stdio_clients[STDIO_MAX_CLIENTS];
+
 typedef struct {
 int max_size;
 HANDLE hcom, hrecv, hsend;
@@ -1459,6 +1462,14 @@ typedef struct {
 DWORD len;
 } WinCharState;
 
+typedef struct {
+HANDLE  hStdIn;
+HANDLE  hInputReadyEvent;
+HANDLE  hInputDoneEvent;
+HANDLE  hInputThread;
+uint8_t win_stdio_buf;
+} WinStdioCharState;
+
 #define NSENDBUF 2048
 #define NRECVBUF 2048
 #define MAXCONNECT 1
@@ -1809,6 +1820,217 @@ static int qemu_chr_open_win_file_out(QemuOpts *opts, CharDriverState **_chr)
 
 return qemu_chr_open_win_file(fd_out, _chr);
 }
+
+static int win_stdio_write(CharDriverState *chr, const uint8_t *buf, int len)
+{
+HANDLE  hStdOut = GetStdHandle(STD_OUTPUT_HANDLE);
+DWORD   dwSize;
+int len1;
+
+len1 = len;
+
+while (len1 > 0) {
+if (!WriteFile(hStdOut, buf, len1, &dwSize, NULL)) {
+break;
+}
+buf  += dwSize;
+len1 -= dwSize;
+}
+
+return len - len1;
+}
+
+static void win_stdio_wait_func(void *opaque)
+{
+CharDriverState   *chr   = opaque;
+WinStdioCharState *stdio = chr->opaque;
+INPUT_RECORD   buf[4];
+int            ret;
+DWORD          dwSize;
+int            i;
+
+ret = ReadConsoleInput(stdio->hStdIn, buf, sizeof(buf) / sizeof(*buf),
+   &dwSize);
+
+if (!ret) {
+/* Avoid error storm */
+qemu_del_wait_object(stdio->hStdIn, NULL, NULL);
+return;
+}
+
+for (i = 0; i < dwSize; i++) {
+KEY_EVENT_RECORD *kev = &buf[i].Event.KeyEvent;
+
+if (buf[i].EventType == KEY_EVENT && kev->bKeyDown) {
+int j;
+if (kev->uChar.AsciiChar != 0) {
+for (j = 0; j < kev->wRepeatCount; j++) {
+if (qemu_chr_be_can_write(chr)) {
+uint8_t c = kev->uChar.AsciiChar;
+qemu_chr_be_write(chr, &c, 1);
+}
+}
+}
+}
+}
+}
+
+static DWORD WINAPI win_stdio_thread(LPVOID param)
+{
+CharDriverState   *chr   = param;
+WinStdioCharState *stdio = chr->opaque;
+int                ret;
+DWORD  dwSize;
+
+while (1) {
+
+/* Wait for one byte */
+ret = ReadFile(stdio->hStdIn, &stdio->win_stdio_buf, 1, &dwSize, NULL);
+
+/* Exit in case of error, continue if nothing read */
+if (!ret) {
+break;
+}
+if (!dwSize) {
+continue;
+}
+
+/* Some terminal emulator returns \r\n for Enter, just pass \n */
+if (stdio->win_stdio_buf == '\r') {
+continue;
+}
+
+/* Signal the main thread and wait until the byte was eaten */
+if (!SetEvent(stdio->hInputReadyEvent)) {
+break;
+}
+if (WaitForSingleObject(stdio->hInputDoneEvent, INFINITE)
+!= WAIT_OBJECT_0) {
+break;
+}
+}
+
+qemu_del_wait_object(stdio->hInputReadyEvent, NULL, NULL);
+return 0;
+}
+
+static void win_stdio_thread_wait_func(void *opaque)
+{
+CharDriverState   *chr   = opaque;
+WinStdioCharState *stdio = chr->opaque;
+
+if (qemu_chr_be_can_write(chr)) {
+qemu_chr_be_write(chr, &stdio->win_stdio_buf, 1);
+}
+
+SetEvent(stdio->hInputDoneEvent);
+}
+
+static void qemu_chr_set_echo_win_stdio(CharDriverState *chr, bool echo)
+{
+WinStdioCharState *stdio  = chr->opaque;
+DWORD  dwMode = 0;
+
+GetConsoleMode(stdio->hStdIn, &dwMode);
+
+if (echo) {
+SetConsoleMode(stdio->hStdIn, dwMode | ENABLE_ECHO_INPUT);
+} else {
+SetConsoleMode(stdio->hStdIn, dwMode & ~ENABLE_ECHO_INPUT);
+}
+}
+
+static void win_stdio_close(CharDriverState *chr)
+{
+WinStdioCharState *stdio = chr->opaque;
+
+if (stdio->hInputReadyEvent != INVALID_HANDLE_VALUE) {
+CloseHandle(stdio->hInputReadyEvent);
+}
+if (stdio->hInputDoneEvent != INVALID_HANDLE_VALUE) {
+CloseHandle(stdio->hInputDoneEvent);
+}
+if (stdio->hInputThrea

Re: [Qemu-devel] [PATCH] runstate: do not discard runstate changes when paused

2011-10-06 Thread Avi Kivity

On 10/05/2011 08:02 PM, Jan Kiszka wrote:

> >
> >  Let's examine a concrete example: a user is debugging a guest, which
> >  stops at a breakpoint.  Meanwhile a live migration is going on,
> >  involving internal stops.  When the guest does manage to run for a bit,
> >  it runs out of disk space, generating a stop, which the management agent
> >  resolves by allocating more space and issuing a cont.
> >
> >  With a counting cont, no matter in what order these events happen,
> >  things work out fine.  How do they work out with your proposal?
>
> We can enforce stop for temporal reasons (migration/savevm), something
> that overrules user/management initiated stops.

Migration resume shouldn't overrule user stop.

It's really simple.  If any agent wants the system stopped, it's
stopped.  Only when no one wants it stopped, it may run.

> BTW, does stop due to migration actually have a window where it accepts
> other commands? I thought that phase is synchronous. Then we would just
> have to implement proper state saving/restoring.

Save: ++stop_count, restore: --stop_count.

> Anyway, there is no point in lock counting for stop reasons that require
> external synchronization anyway. gdb vs. management stack vs. human
> monitor - nothing is solved by counting the stops, they all can step on
> each other's shoes.

Please elaborate.

> Even worse, exposing a counting stop via the user
> interface requires additional interfaces to recover lost or forgotten
> locks. We've discussed this in the past IIRC.

Agree with that.  So there's the second proposal:

vm_stop(unsigned reason)
{
if (!stop_state) {
do_vm_stop();
}
stop_state |= 1 << reason;
}

vm_resume(unsigned reason)
{
stop_state &= ~(1 << reason);
if (!stop_state) {
do_vm_resume();
}
}

so now each agent is separated from the other.
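
The proposal can be turned into a small compilable model to walk through the debugging/migration/disk-space example from earlier in the thread. The reason names, the vm_running flag and the stubbed do_vm_stop()/do_vm_resume() are assumptions made for this sketch; only the vm_stop()/vm_resume() logic follows the pseudocode above.

```c
/* Hypothetical stop reasons, one bit each. */
enum sketch_stop_reason {
    STOP_USER = 0,
    STOP_MIGRATION,
    STOP_IO_ERROR,
};

static unsigned stop_state;     /* bitmask of pending stop reasons */
static int vm_running = 1;      /* stand-in for the real VM state */

static void do_vm_stop(void)   { vm_running = 0; }
static void do_vm_resume(void) { vm_running = 1; }

static void vm_stop(unsigned reason)
{
    if (!stop_state) {
        do_vm_stop();           /* first reason actually stops the VM */
    }
    stop_state |= 1u << reason;
}

static void vm_resume(unsigned reason)
{
    stop_state &= ~(1u << reason);
    if (!stop_state) {
        do_vm_resume();         /* last reason cleared: run again */
    }
}
```

Unlike a plain counter, stopping twice for the same reason is idempotent in this model, so a repeated stop cannot leave a stray reference behind.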

--
error compiling committee.c: too many arguments to function

Re: [Qemu-devel] [fedora-virt] balloon drivers missing in virtio-win-1.1.16.vfd

2011-10-06 Thread Andrew Cathrow


- Original Message -
> From: "Justin M. Forbes" 
> To: "Andrew Cathrow" 
> Cc: v...@lists.fedoraproject.org, "Onkar N Mahajan" , 
> qemu-devel@nongnu.org, k...@vger.kernel.org
> Sent: Thursday, October 6, 2011 9:35:44 AM
> Subject: Re: [Qemu-devel] [fedora-virt] balloon drivers missing in
> virtio-win-1.1.16.vfd
> 
> On Thu, 2011-10-06 at 02:33 -0400, Andrew Cathrow wrote:
> > 
> > 
> > - Original Message -
> > > From: "Onkar N Mahajan" 
> > > To: k...@vger.kernel.org, qemu-devel@nongnu.org
> > > Sent: Thursday, September 29, 2011 6:03:26 AM
> > > Subject: balloon drivers missing in virtio-win-1.1.16.vfd
> > > 
> > > virtio_balloon drivers are missing in the virtio-win floppy disk
> > > image
> > > found at
> > > http://alt.fedoraproject.org/pub/alt/virtio-win/latest/images/bin/
> > > whereas they are present in the ISO image , any specific reason
> > > for
> > > this ? Shouldn't they be ideally present ?
> 
> 
> The vfd is not supposed to contain the full set of drivers, it is
> meant
> to be the bare minimum drivers required to install (and fit in
> 1.44mb).
> The vfd only contains network and block drivers so that you can
> install
> the system and grab the full set of drivers from the ISO or another
> location.  Later versions of Windows can install using the ISO for
> drivers and do not need the vfd at all.

Makes sense,

thanks
Aic


> 
> Justin
> 
> 
> 
>

Re: [Qemu-devel] [fedora-virt] balloon drivers missing in virtio-win-1.1.16.vfd

2011-10-06 Thread Justin M. Forbes

On Thu, 2011-10-06 at 02:33 -0400, Andrew Cathrow wrote:
> 
> 
> - Original Message -
> > From: "Onkar N Mahajan" 
> > To: k...@vger.kernel.org, qemu-devel@nongnu.org
> > Sent: Thursday, September 29, 2011 6:03:26 AM
> > Subject: balloon drivers missing in virtio-win-1.1.16.vfd
> > 
> > virtio_balloon drivers are missing in the virtio-win floppy disk
> > image
> > found at
> > http://alt.fedoraproject.org/pub/alt/virtio-win/latest/images/bin/
> > whereas they are present in the ISO image , any specific reason for
> > this ? Shouldn't they be ideally present ?

The vfd is not supposed to contain the full set of drivers, it is meant
to be the bare minimum drivers required to install (and fit in 1.44mb).
The vfd only contains network and block drivers so that you can install
the system and grab the full set of drivers from the ISO or another
location.  Later versions of Windows can install using the ISO for
drivers and do not need the vfd at all.

Justin

[Qemu-devel] [PATCH 02/25] PPC: Fix via-cuda memory registration

2011-10-06 Thread Avi Kivity

From: Alexander Graf 

Commit 23c5e4ca (convert to memory API) broke the VIA Cuda emulation layer
by not registering the IO structs.

This patch registers them properly and thus makes -M g3beige and -M mac99
work again.

Signed-off-by: Alexander Graf 
Signed-off-by: Avi Kivity 
---
 hw/cuda.c |   28 
 1 files changed, 16 insertions(+), 12 deletions(-)

diff --git a/hw/cuda.c b/hw/cuda.c
index 5c92d81..736de7f 100644
--- a/hw/cuda.c
+++ b/hw/cuda.c
@@ -633,16 +633,20 @@ static uint32_t cuda_readl (void *opaque, target_phys_addr_t addr)
 return 0;
 }
 
-static CPUWriteMemoryFunc * const cuda_write[] = {
-&cuda_writeb,
-&cuda_writew,
-&cuda_writel,
-};
-
-static CPUReadMemoryFunc * const cuda_read[] = {
-&cuda_readb,
-&cuda_readw,
-&cuda_readl,
+static MemoryRegionOps cuda_ops = {
+.old_mmio = {
+.write = {
+cuda_writeb,
+cuda_writew,
+cuda_writel,
+},
+.read = {
+cuda_readb,
+cuda_readw,
+cuda_readl,
+},
+},
+.endianness = DEVICE_NATIVE_ENDIAN,
 };
 
 static bool cuda_timer_exist(void *opaque, int version_id)
@@ -739,8 +743,8 @@ void cuda_init (MemoryRegion **cuda_mem, qemu_irq irq)
 s->tick_offset = (uint32_t)mktimegm(&tm) + RTC_OFFSET;
 
 s->adb_poll_timer = qemu_new_timer_ns(vm_clock, cuda_adb_poll, s);
-cpu_register_io_memory(cuda_read, cuda_write, s,
- DEVICE_NATIVE_ENDIAN);
+memory_region_init_io(&s->mem, &cuda_ops, s, "cuda", 0x2000);
+
 *cuda_mem = &s->mem;
 vmstate_register(NULL, -1, &vmstate_cuda, s);
 qemu_register_reset(cuda_reset, s);
-- 
1.7.6.3

[Qemu-devel] [PATCH 13/25] isa: Add isa_register_portio_list()

2011-10-06 Thread Avi Kivity

Signed-off-by: Richard Henderson 
Signed-off-by: Avi Kivity 
---
 hw/isa-bus.c |   17 +
 hw/isa.h |   31 ++-
 2 files changed, 47 insertions(+), 1 deletions(-)

diff --git a/hw/isa-bus.c b/hw/isa-bus.c
index e9c1712..5d8ff84 100644
--- a/hw/isa-bus.c
+++ b/hw/isa-bus.c
@@ -103,6 +103,23 @@ void isa_register_ioport(ISADevice *dev, MemoryRegion *io, uint16_t start)
 }
 }
 
+void isa_register_portio_list(ISADevice *dev, uint16_t start,
+  const MemoryRegionPortio *pio_start,
+  void *opaque, const char *name)
+{
+PortioList *piolist = g_new(PortioList, 1);
+
+/* START is how we should treat DEV, regardless of the actual
+   contents of the portio array.  This is how the old code
+   actually handled e.g. the FDC device.  */
+if (dev) {
+isa_init_ioport(dev, start);
+}
+
+portio_list_init(piolist, pio_start, opaque, name);
+portio_list_add(piolist, isabus->address_space_io, start);
+}
+
 static int isa_qdev_init(DeviceState *qdev, DeviceInfo *base)
 {
 ISADevice *dev = DO_UPCAST(ISADevice, qdev, qdev);
diff --git a/hw/isa.h b/hw/isa.h
index c5c2618..177ef95 100644
--- a/hw/isa.h
+++ b/hw/isa.h
@@ -28,7 +28,6 @@ ISABus *isa_bus_new(DeviceState *dev, MemoryRegion *address_space_io);
 void isa_bus_irqs(qemu_irq *irqs);
 qemu_irq isa_get_irq(int isairq);
 void isa_init_irq(ISADevice *dev, qemu_irq *p, int isairq);
-void isa_register_ioport(ISADevice *dev, MemoryRegion *io, uint16_t start);
 void isa_init_ioport(ISADevice *dev, uint16_t ioport);
 void isa_init_ioport_range(ISADevice *dev, uint16_t start, uint16_t length);
 void isa_qdev_register(ISADeviceInfo *info);
@@ -37,6 +36,36 @@ ISADevice *isa_create(const char *name);
 ISADevice *isa_try_create(const char *name);
 ISADevice *isa_create_simple(const char *name);
 
+/**
+ * isa_register_ioport: Install an I/O port region on the ISA bus.
+ *
+ * Register an I/O port region via memory_region_add_subregion
+ * inside the ISA I/O address space.
+ *
+ * @dev: the ISADevice against which these are registered; may be NULL.
+ * @io: the #MemoryRegion being registered.
+ * @start: the base I/O port.
+ */
+void isa_register_ioport(ISADevice *dev, MemoryRegion *io, uint16_t start);
+
+/**
+ * isa_register_portio_list: Initialize a set of ISA io ports
+ *
+ * Several ISA devices have many disjoint I/O ports.  Worse, these I/O
+ * ports can be interleaved with I/O ports from other devices.  This
+ * function makes it easy to create multiple MemoryRegions for a single
+ * device and use the legacy portio routines.
+ *
+ * @dev: the ISADevice against which these are registered; may be NULL.
+ * @start: the base I/O port against which the portio->offset is applied.
+ * @portio: the ports, sorted by offset.
+ * @opaque: passed into the old_portio callbacks.
+ * @name: passed into memory_region_init_io.
+ */
+void isa_register_portio_list(ISADevice *dev, uint16_t start,
+  const MemoryRegionPortio *portio,
+  void *opaque, const char *name);
+
 extern target_phys_addr_t isa_mem_base;
 
 void isa_mmio_setup(MemoryRegion *mr, target_phys_addr_t size);
-- 
1.7.6.3

[Qemu-devel] [PATCH 10/25] isa: Tidy support code for isabus_get_fw_dev_path

2011-10-06 Thread Avi Kivity

From: Richard Henderson 

The only user of ISADevice.ioports is isabus_get_fw_dev_path, and it
only looks at the first entry of the array.  Which suggests that this
entire array+sort operation can be replaced by a simple minimum.
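
The "simple minimum" can be sketched in isolation as follows. The function name is made up; the zero-means-unset convention mirrors the ioport_id check in the patch below, with the side effect that port 0 itself can never be recorded.

```c
#include <stdint.h>

/*
 * Remember the lowest I/O port seen so far in *ioport_id.
 * 0 means "no port recorded yet".
 * Hypothetical helper, for illustration only.
 */
static void sketch_note_ioport(int *ioport_id, uint16_t start)
{
    if (*ioport_id == 0 || start < *ioport_id) {
        *ioport_id = start;
    }
}
```

Since only the first entry of the sorted array was ever read, tracking the running minimum gives the same answer without the array or the qsort() call.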

Signed-off-by: Richard Henderson 
Signed-off-by: Avi Kivity 
---
 hw/isa-bus.c |   25 +
 hw/isa.h |5 +
 2 files changed, 6 insertions(+), 24 deletions(-)

diff --git a/hw/isa-bus.c b/hw/isa-bus.c
index 6c15a31..e9c1712 100644
--- a/hw/isa-bus.c
+++ b/hw/isa-bus.c
@@ -83,24 +83,11 @@ void isa_init_irq(ISADevice *dev, qemu_irq *p, int isairq)
 dev->nirqs++;
 }
 
-static void isa_init_ioport_one(ISADevice *dev, uint16_t ioport)
-{
-assert(dev->nioports < ARRAY_SIZE(dev->ioports));
-dev->ioports[dev->nioports++] = ioport;
-}
-
-static int isa_cmp_ports(const void *p1, const void *p2)
-{
-return *(uint16_t*)p1 - *(uint16_t*)p2;
-}
-
 void isa_init_ioport_range(ISADevice *dev, uint16_t start, uint16_t length)
 {
-int i;
-for (i = start; i < start + length; i++) {
-isa_init_ioport_one(dev, i);
+if (dev->ioport_id == 0 || start < dev->ioport_id) {
+dev->ioport_id = start;
 }
-qsort(dev->ioports, dev->nioports, sizeof(dev->ioports[0]), isa_cmp_ports);
 }
 
 void isa_init_ioport(ISADevice *dev, uint16_t ioport)
@@ -112,9 +99,7 @@ void isa_register_ioport(ISADevice *dev, MemoryRegion *io, uint16_t start)
 {
 memory_region_add_subregion(isabus->address_space_io, start, io);
 if (dev != NULL) {
-assert(dev->nio < ARRAY_SIZE(dev->io));
-dev->io[dev->nio++] = io;
-isa_init_ioport_range(dev, start, memory_region_size(io));
+isa_init_ioport(dev, start);
 }
 }
 
@@ -208,8 +193,8 @@ static void isabus_register_devices(void)
 int off;
 
 off = snprintf(path, sizeof(path), "%s", qdev_fw_name(dev));
-if (d->nioports) {
-snprintf(path + off, sizeof(path) - off, "@%04x", d->ioports[0]);
+if (d->ioport_id) {
+snprintf(path + off, sizeof(path) - off, "@%04x", d->ioport_id);
 }
 
 return strdup(path);
diff --git a/hw/isa.h b/hw/isa.h
index 432d17a..c5c2618 100644
--- a/hw/isa.h
+++ b/hw/isa.h
@@ -13,12 +13,9 @@ typedef struct ISADeviceInfo ISADeviceInfo;
 
 struct ISADevice {
 DeviceState qdev;
-MemoryRegion *io[32];
 uint32_t isairq[2];
-uint16_t ioports[32];
 int nirqs;
-int nioports;
-int nio;
+int ioport_id;
 };
 
 typedef int (*isa_qdev_initfn)(ISADevice *dev);
-- 
1.7.6.3

[Qemu-devel] [PATCH 23/25] vmport: Convert to isa_register_ioport

2011-10-06 Thread Avi Kivity

From: Richard Henderson 

Signed-off-by: Richard Henderson 
Signed-off-by: Avi Kivity 
---
 hw/vmport.c |   16 +---
 1 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/hw/vmport.c b/hw/vmport.c
index c8aefaa..b5c6fa1 100644
--- a/hw/vmport.c
+++ b/hw/vmport.c
@@ -38,6 +38,7 @@
 typedef struct _VMPortState
 {
 ISADevice dev;
+MemoryRegion io;
 IOPortReadFunc *func[VMPORT_ENTRIES];
 void *opaque[VMPORT_ENTRIES];
 } VMPortState;
@@ -120,13 +121,22 @@ void vmmouse_set_data(const uint32_t *data)
 env->regs[R_ESI] = data[4]; env->regs[R_EDI] = data[5];
 }
 
+static const MemoryRegionPortio vmport_portio[] = {
+{0, 1, 4, .read = vmport_ioport_read, .write = vmport_ioport_write },
+PORTIO_END_OF_LIST(),
+};
+
+static const MemoryRegionOps vmport_ops = {
+.old_portio = vmport_portio
+};
+
 static int vmport_initfn(ISADevice *dev)
 {
 VMPortState *s = DO_UPCAST(VMPortState, dev, dev);
 
-register_ioport_read(0x5658, 1, 4, vmport_ioport_read, s);
-register_ioport_write(0x5658, 1, 4, vmport_ioport_write, s);
-isa_init_ioport(dev, 0x5658);
+memory_region_init_io(&s->io, &vmport_ops, s, "vmport", 1);
+isa_register_ioport(dev, &s->io, 0x5658);
+
 port_state = s;
 /* Register some generic port commands */
 vmport_register(VMPORT_CMD_GETVERSION, vmport_cmd_get_version, NULL);
-- 
1.7.6.3

[Qemu-devel] [PATCH] qemu-options: avoid #if in spicevmc texi help

2011-10-06 Thread Stefan Hajnoczi

Preprocessor directives cannot be used in STEXI/ETEXI sections since
they are not passed through the preprocessor.  The spicevmc chardev
option help currently uses #if, which is included verbatim in the man
page output.

Fix this by simply stating that spicevmc chardevs are available only in
builds with spice support.

Signed-off-by: Stefan Hajnoczi 
---
 qemu-options.hx |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/qemu-options.hx b/qemu-options.hx
index dfbabd0..d4fe990 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -1673,15 +1673,15 @@ Connect to a local parallel port.
 @option{path} specifies the path to the parallel port device. @option{path} is
 required.
 
-#if defined(CONFIG_SPICE)
 @item -chardev spicevmc ,id=@var{id} ,debug=@var{debug}, name=@var{name}
 
+@option{spicevmc} is only available when spice support is built in.
+
 @option{debug} debug level for spicevmc
 
 @option{name} name of spice channel to connect to
 
 Connect to a spice virtual machine channel, such as vdiport.
-#endif
 
 @end table
 ETEXI
-- 
1.7.6.3

[Qemu-devel] [PATCH 07/25] hw/arm11mpcore: Clean up to avoid using sysbus_init_mmio_cb2

2011-10-06 Thread Avi Kivity

From: Peter Maydell 

Clean up the initialisation of the realview_mpcore device to avoid
using sysbus_init_mmio_cb2(): we can pass through the MemoryRegion
of the private arm11mpcore_priv device directly now.

Signed-off-by: Peter Maydell 
Signed-off-by: Avi Kivity 
---
 hw/arm11mpcore.c |   13 +
 1 files changed, 1 insertions(+), 12 deletions(-)

diff --git a/hw/arm11mpcore.c b/hw/arm11mpcore.c
index 7d60ef6..974a0d8 100644
--- a/hw/arm11mpcore.c
+++ b/hw/arm11mpcore.c
@@ -48,17 +48,6 @@ static void mpcore_rirq_set_irq(void *opaque, int irq, int level)
 }
 }
 
-static void mpcore_rirq_map(SysBusDevice *dev, target_phys_addr_t base)
-{
-mpcore_rirq_state *s = FROM_SYSBUS(mpcore_rirq_state, dev);
-sysbus_mmio_map(s->priv, 0, base);
-}
-
-static void mpcore_rirq_unmap(SysBusDevice *dev, target_phys_addr_t base)
-{
-/* nothing to do */
-}
-
 static int realview_mpcore_init(SysBusDevice *dev)
 {
 mpcore_rirq_state *s = FROM_SYSBUS(mpcore_rirq_state, dev);
@@ -84,7 +73,7 @@ static int realview_mpcore_init(SysBusDevice *dev)
 }
 }
 qdev_init_gpio_in(&dev->qdev, mpcore_rirq_set_irq, 64);
-sysbus_init_mmio_cb2(dev, mpcore_rirq_map, mpcore_rirq_unmap);
+sysbus_init_mmio_region(dev, sysbus_mmio_get_region(s->priv, 0));
 return 0;
 }
 
-- 
1.7.6.3

[Qemu-devel] [PATCH 22/25] pc: Convert port92 to isa_register_ioport

2011-10-06 Thread Avi Kivity

From: Richard Henderson 

Signed-off-by: Richard Henderson 
Signed-off-by: Avi Kivity 
---
 hw/pc.c |   16 +---
 1 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/hw/pc.c b/hw/pc.c
index 203627d..ded4758 100644
--- a/hw/pc.c
+++ b/hw/pc.c
@@ -428,6 +428,7 @@ void pc_cmos_init(ram_addr_t ram_size, ram_addr_t above_4g_mem_size,
 /* port 92 stuff: could be split off */
 typedef struct Port92State {
 ISADevice dev;
+MemoryRegion io;
 uint8_t outport;
 qemu_irq *a20_out;
 } Port92State;
@@ -479,13 +480,22 @@ static void port92_reset(DeviceState *d)
 s->outport &= ~1;
 }
 
+static const MemoryRegionPortio port92_portio[] = {
+{ 0, 1, 1, .read = port92_read, .write = port92_write },
+PORTIO_END_OF_LIST(),
+};
+
+static const MemoryRegionOps port92_ops = {
+.old_portio = port92_portio
+};
+
 static int port92_initfn(ISADevice *dev)
 {
 Port92State *s = DO_UPCAST(Port92State, dev, dev);
 
-register_ioport_read(0x92, 1, 1, port92_read, s);
-register_ioport_write(0x92, 1, 1, port92_write, s);
-isa_init_ioport(dev, 0x92);
+memory_region_init_io(&s->io, &port92_ops, s, "port92", 1);
+isa_register_ioport(dev, &s->io, 0x92);
+
 s->outport = 0;
 return 0;
 }
-- 
1.7.6.3

[Qemu-devel] qemu guest agent spins in poll/nanosleep(100ms) when nothing is listening on host

2011-10-06 Thread Daniel P. Berrange

I've been doing some experimentation with the QEMU guest agent and have
noticed that when nothing is connected on the host side of the virtio
serial channel, the guest agent just spins in a poll/sleep(100ms) loop.
I know you'd ordinarily expect some mgmt app in the host to be listening
to the other end of the channel, but it still seems suboptimal to have
to spin in a loop like this when nothing is listening, constantly causing
wakeups in an otherwise idle guest.

Looking at the qemu-ga.c code I see two places where it might handle
a poll event and then sleep, when nothing is on the other end of the
virtio serial socket.


   case G_IO_STATUS_AGAIN:
/* virtio causes us to spin here when no process is attached to
 * host-side chardev. sleep a bit to mitigate this
 */
if (s->virtio) {
usleep(100*1000);
}
return true;

   


} else if (strcmp(s->method, "virtio-serial") == 0) {
/* we spin on EOF for virtio-serial, so back off a bit. also,
 * dont close the connection in this case, it'll resume normal
 * operation when another process connects to host chardev
 */
usleep(100*1000);
goto out_noclose;
}


I get the feeling that this kind of problem is inherent in the use of any
virtio-serial channel, in the same way you can't detect EOF for a regular
serial device channel either. Given that virtio-serial is a nice paravirt
device, is there anything we can do to it, to allow better handling of
EOF by applications ?

Or perhaps there is some way to make use of epoll() in edge-triggered
mode to detect it already, because IIUC, edge-triggered mode should only
fire once for the EOF condition, and then not fire again until something
in the host actually sends some data ?

Of course glib's event loop doesn't support edge-triggered events/epoll,
but perhaps we could just call epoll() directly in the event handler,
instead of the usleep() call ?
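
The edge-triggered idea above can be sketched roughly as follows. This is
only an illustration, not qemu-ga code: watch_edge_triggered() and
wait_for_edge() are hypothetical helper names, and whether a virtio-serial
port really reports the "nothing connected" condition as a single edge is
exactly the open question, so that part remains an assumption.

```c
#include <assert.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper: create an epoll instance that watches fd in
 * edge-triggered mode. Returns the epoll fd, or -1 on error. */
int watch_edge_triggered(int fd)
{
    struct epoll_event ev = { 0 };
    int epfd;

    assert(fd >= 0);
    epfd = epoll_create1(0);
    if (epfd < 0) {
        return -1;
    }
    /* EPOLLET: report a readiness change once, not on every wait */
    ev.events = EPOLLIN | EPOLLHUP | EPOLLET;
    ev.data.fd = fd;
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) < 0) {
        close(epfd);
        return -1;
    }
    return epfd;
}

/* Hypothetical helper: wait up to timeout_ms (-1 blocks forever) for a
 * new edge. Returns the number of ready fds (0 or 1), or -1 on error. */
int wait_for_edge(int epfd, int timeout_ms)
{
    struct epoll_event ev;

    return epoll_wait(epfd, &ev, 1, timeout_ms);
}
```

If the assumption holds, the handler could call wait_for_edge(epfd, -1) in
place of usleep(100*1000) and would only wake up again once the host side
produces a new event.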

Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|

[Qemu-devel] [PATCH 24/25] ide: Convert to isa_register_portio_list

2011-10-06 Thread Avi Kivity

From: Richard Henderson 

Signed-off-by: Richard Henderson 
Signed-off-by: Avi Kivity 
---
 hw/ide/core.c |   30 +++---
 hw/ide/internal.h |3 ++-
 hw/ide/isa.c  |4 +---
 hw/ide/piix.c |7 ---
 hw/ide/via.c  |7 ---
 5 files changed, 30 insertions(+), 21 deletions(-)

diff --git a/hw/ide/core.c b/hw/ide/core.c
index 4e76fc7..9eaf7f2 100644
--- a/hw/ide/core.c
+++ b/hw/ide/core.c
@@ -25,6 +25,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "qemu-error.h"
 #include "qemu-timer.h"
 #include "sysemu.h"
@@ -1969,20 +1970,27 @@ void ide_init2_with_non_qdev_drives(IDEBus *bus, DriveInfo *hd0,
 bus->dma = &ide_dma_nop;
 }
 
-void ide_init_ioport(IDEBus *bus, int iobase, int iobase2)
+static const MemoryRegionPortio ide_portio_list[] = {
+{ 0, 8, 1, .read = ide_ioport_read, .write = ide_ioport_write },
+{ 0, 2, 2, .read = ide_data_readw, .write = ide_data_writew },
+{ 0, 4, 4, .read = ide_data_readl, .write = ide_data_writel },
+PORTIO_END_OF_LIST(),
+};
+
+static const MemoryRegionPortio ide_portio2_list[] = {
+{ 0, 1, 1, .read = ide_status_read, .write = ide_cmd_write },
+PORTIO_END_OF_LIST(),
+};
+
+void ide_init_ioport(IDEBus *bus, ISADevice *dev, int iobase, int iobase2)
 {
-register_ioport_write(iobase, 8, 1, ide_ioport_write, bus);
-register_ioport_read(iobase, 8, 1, ide_ioport_read, bus);
+/* ??? Assume only ISA and PCI configurations, and that the PCI-ISA
+   bridge has been setup properly to always register with ISA.  */
+isa_register_portio_list(dev, iobase, ide_portio_list, bus, "ide");
+
 if (iobase2) {
-register_ioport_read(iobase2, 1, 1, ide_status_read, bus);
-register_ioport_write(iobase2, 1, 1, ide_cmd_write, bus);
+isa_register_portio_list(dev, iobase2, ide_portio2_list, bus, "ide");
 }
-
-/* data ports */
-register_ioport_write(iobase, 2, 2, ide_data_writew, bus);
-register_ioport_read(iobase, 2, 2, ide_data_readw, bus);
-register_ioport_write(iobase, 4, 4, ide_data_writel, bus);
-register_ioport_read(iobase, 4, 4, ide_data_readl, bus);
 }
 
 static bool is_identify_set(void *opaque, int version_id)
diff --git a/hw/ide/internal.h b/hw/ide/internal.h
index 9046e96..c39dc05 100644
--- a/hw/ide/internal.h
+++ b/hw/ide/internal.h
@@ -7,6 +7,7 @@
  * non-internal declarations are in hw/ide.h
  */
 #include 
+#include 
 #include "iorange.h"
 #include "dma.h"
 #include "sysemu.h"
@@ -600,7 +601,7 @@ int ide_init_drive(IDEState *s, BlockDriverState *bs, IDEDriveKind kind,
 void ide_init2(IDEBus *bus, qemu_irq irq);
 void ide_init2_with_non_qdev_drives(IDEBus *bus, DriveInfo *hd0,
 DriveInfo *hd1, qemu_irq irq);
-void ide_init_ioport(IDEBus *bus, int iobase, int iobase2);
+void ide_init_ioport(IDEBus *bus, ISADevice *isa, int iobase, int iobase2);
 
 void ide_exec_cmd(IDEBus *bus, uint32_t val);
 void ide_dma_cb(void *opaque, int ret);
diff --git a/hw/ide/isa.c b/hw/ide/isa.c
index 28b69d2..01a9e59 100644
--- a/hw/ide/isa.c
+++ b/hw/ide/isa.c
@@ -66,10 +66,8 @@ static int isa_ide_initfn(ISADevice *dev)
 ISAIDEState *s = DO_UPCAST(ISAIDEState, dev, dev);
 
 ide_bus_new(&s->bus, &s->dev.qdev, 0);
-ide_init_ioport(&s->bus, s->iobase, s->iobase2);
+ide_init_ioport(&s->bus, dev, s->iobase, s->iobase2);
 isa_init_irq(dev, &s->irq, s->isairq);
-isa_init_ioport_range(dev, s->iobase, 8);
-isa_init_ioport(dev, s->iobase2);
 ide_init2(&s->bus, s->irq);
 vmstate_register(&dev->qdev, 0, &vmstate_ide_isa, s);
 return 0;
diff --git a/hw/ide/piix.c b/hw/ide/piix.c
index 88d3181..08cbbe2 100644
--- a/hw/ide/piix.c
+++ b/hw/ide/piix.c
@@ -122,8 +122,7 @@ static void piix3_reset(void *opaque)
 }
 
 static void pci_piix_init_ports(PCIIDEState *d) {
-int i;
-struct {
+static const struct {
 int iobase;
 int iobase2;
 int isairq;
@@ -131,10 +130,12 @@ static void pci_piix_init_ports(PCIIDEState *d) {
 {0x1f0, 0x3f6, 14},
 {0x170, 0x376, 15},
 };
+int i;
 
 for (i = 0; i < 2; i++) {
 ide_bus_new(&d->bus[i], &d->dev.qdev, i);
-ide_init_ioport(&d->bus[i], port_info[i].iobase, port_info[i].iobase2);
+ide_init_ioport(&d->bus[i], NULL, port_info[i].iobase,
+port_info[i].iobase2);
 ide_init2(&d->bus[i], isa_get_irq(port_info[i].isairq));
 
 bmdma_init(&d->bus[i], &d->bmdma[i], d);
diff --git a/hw/ide/via.c b/hw/ide/via.c
index dab8a39..098f150 100644
--- a/hw/ide/via.c
+++ b/hw/ide/via.c
@@ -146,8 +146,7 @@ static void via_reset(void *opaque)
 }
 
 static void vt82c686b_init_ports(PCIIDEState *d) {
-int i;
-struct {
+static const struct {
 int iobase;
 int iobase2;
 int isairq;
@@ -155,10 +154,12 @@ static void vt82c686b_init_ports(PCIIDEState *d) {
 {0x1f0, 0x3f6, 14},
 {0x170, 0x376, 15},
 };
+int i;

[Qemu-devel] [PATCH 18/25] ne2000: Convert to isa_register_ioport

2011-10-06 Thread Avi Kivity

From: Richard Henderson 

Signed-off-by: Richard Henderson 
Signed-off-by: Avi Kivity 
---
 hw/ne2000-isa.c |5 +
 1 files changed, 1 insertions(+), 4 deletions(-)

diff --git a/hw/ne2000-isa.c b/hw/ne2000-isa.c
index 756ed5c..11ffee7 100644
--- a/hw/ne2000-isa.c
+++ b/hw/ne2000-isa.c
@@ -68,10 +68,7 @@ static int isa_ne2000_initfn(ISADevice *dev)
 NE2000State *s = &isa->ne2000;
 
 ne2000_setup_io(s, 0x20);
-isa_init_ioport_range(dev, isa->iobase, 16);
-isa_init_ioport_range(dev, isa->iobase + 0x10, 2);
-isa_init_ioport(dev, isa->iobase + 0x1f);
-memory_region_add_subregion(get_system_io(), isa->iobase, &s->io);
+isa_register_ioport(dev, &s->io, isa->iobase);
 
 isa_init_irq(dev, &s->irq, isa->isairq);
 
-- 
1.7.6.3

[Qemu-devel] [PATCH 06/25] ppc405_boards: convert to memory API

2011-10-06 Thread Avi Kivity

Signed-off-by: Avi Kivity 
---
 hw/ppc405_boards.c |   84 +++-
 1 files changed, 37 insertions(+), 47 deletions(-)

diff --git a/hw/ppc405_boards.c b/hw/ppc405_boards.c
index ca65ac3..b28bdda 100644
--- a/hw/ppc405_boards.c
+++ b/hw/ppc405_boards.c
@@ -137,16 +137,16 @@ static void ref405ep_fpga_writel (void *opaque,
 ref405ep_fpga_writeb(opaque, addr + 3, value & 0xFF);
 }
 
-static CPUReadMemoryFunc * const ref405ep_fpga_read[] = {
-&ref405ep_fpga_readb,
-&ref405ep_fpga_readw,
-&ref405ep_fpga_readl,
-};
-
-static CPUWriteMemoryFunc * const ref405ep_fpga_write[] = {
-&ref405ep_fpga_writeb,
-&ref405ep_fpga_writew,
-&ref405ep_fpga_writel,
+static const MemoryRegionOps ref405ep_fpga_ops = {
+.old_mmio = {
+.read = {
+ref405ep_fpga_readb, ref405ep_fpga_readw, ref405ep_fpga_readl,
+},
+.write = {
+ref405ep_fpga_writeb, ref405ep_fpga_writew, ref405ep_fpga_writel,
+},
+},
+.endianness = DEVICE_NATIVE_ENDIAN,
 };
 
 static void ref405ep_fpga_reset (void *opaque)
@@ -158,16 +158,15 @@ static void ref405ep_fpga_reset (void *opaque)
 fpga->reg1 = 0x0F;
 }
 
-static void ref405ep_fpga_init (uint32_t base)
+static void ref405ep_fpga_init (MemoryRegion *sysmem, uint32_t base)
 {
 ref405ep_fpga_t *fpga;
-int fpga_memory;
+MemoryRegion *fpga_memory = g_new(MemoryRegion, 1);
 
 fpga = g_malloc0(sizeof(ref405ep_fpga_t));
-fpga_memory = cpu_register_io_memory(ref405ep_fpga_read,
- ref405ep_fpga_write, fpga,
- DEVICE_NATIVE_ENDIAN);
-cpu_register_physical_memory(base, 0x0100, fpga_memory);
+memory_region_init_io(fpga_memory, &ref405ep_fpga_ops, fpga,
+  "fpga", 0x0100);
+memory_region_add_subregion(sysmem, base, fpga_memory);
 qemu_register_reset(&ref405ep_fpga_reset, fpga);
 }
 
@@ -183,7 +182,8 @@ static void ref405ep_init (ram_addr_t ram_size,
 CPUPPCState *env;
 qemu_irq *pic;
 MemoryRegion *bios;
-ram_addr_t sram_offset, bdloc;
+MemoryRegion *sram = g_new(MemoryRegion, 1);
+ram_addr_t bdloc;
 MemoryRegion *ram_memories = g_malloc(2 * sizeof(*ram_memories));
 target_phys_addr_t ram_bases[2], ram_sizes[2];
 target_ulong sram_size;
@@ -195,6 +195,7 @@ static void ref405ep_init (ram_addr_t ram_size,
 int linux_boot;
 int fl_idx, fl_sectors, len;
 DriveInfo *dinfo;
+MemoryRegion *sysmem = get_system_memory();
 
 /* XXX: fix this */
 memory_region_init_ram(&ram_memories[0], NULL, "ef405ep.ram", 0x0800);
@@ -207,16 +208,12 @@ static void ref405ep_init (ram_addr_t ram_size,
 #ifdef DEBUG_BOARD_INIT
 printf("%s: register cpu\n", __func__);
 #endif
-env = ppc405ep_init(get_system_memory(), ram_memories, ram_bases, ram_sizes,
+env = ppc405ep_init(sysmem, ram_memories, ram_bases, ram_sizes,
 , &pic, kernel_filename == NULL ? 0 : 1);
 /* allocate SRAM */
 sram_size = 512 * 1024;
-sram_offset = qemu_ram_alloc(NULL, "ef405ep.sram", sram_size);
-#ifdef DEBUG_BOARD_INIT
-printf("%s: register SRAM at offset %08lx\n", __func__, sram_offset);
-#endif
-cpu_register_physical_memory(0xFFF0, sram_size,
- sram_offset | IO_MEM_RAM);
+memory_region_init_ram(sram, NULL, "ef405ep.sram", sram_size);
+memory_region_add_subregion(sysmem, 0xFFF0, sram);
 /* allocate and load BIOS */
 #ifdef DEBUG_BOARD_INIT
 printf("%s: register BIOS\n", __func__);
@@ -263,14 +260,13 @@ static void ref405ep_init (ram_addr_t ram_size,
 }
 bios_size = (bios_size + 0xfff) & ~0xfff;
 memory_region_set_readonly(bios, true);
-memory_region_add_subregion(get_system_memory(),
-(uint32_t)(-bios_size), bios);
+memory_region_add_subregion(sysmem, (uint32_t)(-bios_size), bios);
 }
 /* Register FPGA */
 #ifdef DEBUG_BOARD_INIT
 printf("%s: register FPGA\n", __func__);
 #endif
-ref405ep_fpga_init(0xF030);
+ref405ep_fpga_init(sysmem, 0xF030);
 /* Register NVRAM */
 #ifdef DEBUG_BOARD_INIT
 printf("%s: register NVRAM\n", __func__);
@@ -468,16 +464,12 @@ static void taihu_cpld_writel (void *opaque,
 taihu_cpld_writeb(opaque, addr + 3, value & 0xFF);
 }
 
-static CPUReadMemoryFunc * const taihu_cpld_read[] = {
-&taihu_cpld_readb,
-&taihu_cpld_readw,
-&taihu_cpld_readl,
-};
-
-static CPUWriteMemoryFunc * const taihu_cpld_write[] = {
-&taihu_cpld_writeb,
-&taihu_cpld_writew,
-&taihu_cpld_writel,
+static const MemoryRegionOps taihu_cpld_ops = {
+.old_mmio = {
+.read = { taihu_cpld_readb, taihu_cpld_readw, taihu_cpld_readl, },
+.write = { taihu_cpld_writeb, taihu_cpld_writew, taihu_cpld_writel, },
+},
+.endianness = DEVICE_NATIVE_ENDIAN,
 };
 
 static vo

[Qemu-devel] [PATCH 04/25] petalogix_ml605: convert to memory API

2011-10-06 Thread Avi Kivity

Signed-off-by: Avi Kivity 
---
 hw/petalogix_ml605_mmu.c |   15 +++
 1 files changed, 7 insertions(+), 8 deletions(-)

diff --git a/hw/petalogix_ml605_mmu.c b/hw/petalogix_ml605_mmu.c
index 2a0f7fd..fb4ba29 100644
--- a/hw/petalogix_ml605_mmu.c
+++ b/hw/petalogix_ml605_mmu.c
@@ -149,8 +149,8 @@ static uint64_t translate_kernel_address(void *opaque, uint64_t addr)
 DriveInfo *dinfo;
 int i;
 target_phys_addr_t ddr_base = MEMORY_BASEADDR;
-ram_addr_t phys_lmb_bram;
-ram_addr_t phys_ram;
+MemoryRegion *phys_lmb_bram = g_new(MemoryRegion, 1);
+MemoryRegion *phys_ram = g_new(MemoryRegion, 1);
 qemu_irq irq[32], *cpu_irq;
 
 /* init CPUs */
@@ -162,13 +162,12 @@ static uint64_t translate_kernel_address(void *opaque, uint64_t addr)
 qemu_register_reset(main_cpu_reset, env);
 
 /* Attach emulated BRAM through the LMB.  */
-phys_lmb_bram = qemu_ram_alloc(NULL, "petalogix_ml605.lmb_bram",
-   LMB_BRAM_SIZE);
-cpu_register_physical_memory(0x, LMB_BRAM_SIZE,
- phys_lmb_bram | IO_MEM_RAM);
+memory_region_init_ram(phys_lmb_bram, NULL, "petalogix_ml605.lmb_bram",
+   LMB_BRAM_SIZE);
+memory_region_add_subregion(address_space_mem, 0x, phys_lmb_bram);
 
-phys_ram = qemu_ram_alloc(NULL, "petalogix_ml605.ram", ram_size);
-cpu_register_physical_memory(ddr_base, ram_size, phys_ram | IO_MEM_RAM);
+memory_region_init_ram(phys_ram, NULL, "petalogix_ml605.ram", ram_size);
+memory_region_add_subregion(address_space_mem, ddr_base, phys_ram);
 
 dinfo = drive_get(IF_PFLASH, 0, 0);
 /* 5th parameter 2 means bank-width
-- 
1.7.6.3

[Qemu-devel] [PATCH 20/25] sb16: Convert to isa_register_portio_list

2011-10-06 Thread Avi Kivity

From: Richard Henderson 

Signed-off-by: Richard Henderson 
Signed-off-by: Avi Kivity 
---
 hw/sb16.c |   32 +---
 1 files changed, 13 insertions(+), 19 deletions(-)

diff --git a/hw/sb16.c b/hw/sb16.c
index a76df1b..fe927e2 100644
--- a/hw/sb16.c
+++ b/hw/sb16.c
@@ -1341,12 +1341,21 @@ static int sb16_post_load (void *opaque, int version_id)
 }
 };
 
+static const MemoryRegionPortio sb16_ioport_list[] = {
+{  4, 1, 1, .write = mixer_write_indexb },
+{  4, 1, 2, .write = mixer_write_indexw },
+{  5, 1, 1, .read = mixer_read, .write = mixer_write_datab },
+{  6, 1, 1, .read = dsp_read, .write = dsp_write },
+{ 10, 1, 1, .read = dsp_read },
+{ 12, 1, 1, .write = dsp_write },
+{ 12, 4, 1, .read = dsp_read },
+PORTIO_END_OF_LIST(),
+};
+
+
 static int sb16_initfn (ISADevice *dev)
 {
-static const uint8_t dsp_write_ports[] = {0x6, 0xc};
-static const uint8_t dsp_read_ports[] = {0x6, 0xa, 0xc, 0xd, 0xe, 0xf};
 SB16State *s;
-int i;
 
 s = DO_UPCAST (SB16State, dev, dev);
 
@@ -1366,22 +1375,7 @@ static int sb16_initfn (ISADevice *dev)
 dolog ("warning: Could not create auxiliary timer\n");
 }
 
-for (i = 0; i < ARRAY_SIZE (dsp_write_ports); i++) {
-register_ioport_write (s->port + dsp_write_ports[i], 1, 1, dsp_write, s);
-isa_init_ioport(dev, s->port + dsp_write_ports[i]);
-}
-
-for (i = 0; i < ARRAY_SIZE (dsp_read_ports); i++) {
-register_ioport_read (s->port + dsp_read_ports[i], 1, 1, dsp_read, s);
-isa_init_ioport(dev, s->port + dsp_read_ports[i]);
-}
-
-register_ioport_write (s->port + 0x4, 1, 1, mixer_write_indexb, s);
-register_ioport_write (s->port + 0x4, 1, 2, mixer_write_indexw, s);
-isa_init_ioport(dev, s->port + 0x4);
-register_ioport_read (s->port + 0x5, 1, 1, mixer_read, s);
-register_ioport_write (s->port + 0x5, 1, 1, mixer_write_datab, s);
-isa_init_ioport(dev, s->port + 0x5);
+isa_register_portio_list(dev, s->port, sb16_ioport_list, s, "sb16");
 
 DMA_register_channel (s->hdma, SB_read_DMA, s);
 DMA_register_channel (s->dma, SB_read_DMA, s);
-- 
1.7.6.3

[Qemu-devel] [PATCH 09/25] ppc_newworld: convert to memory API

2011-10-06 Thread Avi Kivity

Signed-off-by: Avi Kivity 
---
 hw/ppc_newworld.c |   39 +--
 1 files changed, 17 insertions(+), 22 deletions(-)

diff --git a/hw/ppc_newworld.c b/hw/ppc_newworld.c
index b1cc3d7..946070c 100644
--- a/hw/ppc_newworld.c
+++ b/hw/ppc_newworld.c
@@ -83,12 +83,13 @@
 #endif
 
 /* UniN device */
-static void unin_writel (void *opaque, target_phys_addr_t addr, uint32_t value)
+static void unin_write(void *opaque, target_phys_addr_t addr, uint64_t value,
+   unsigned size)
 {
-UNIN_DPRINTF("writel addr " TARGET_FMT_plx " val %x\n", addr, value);
+UNIN_DPRINTF("write addr " TARGET_FMT_plx " val %"PRIx64"\n", addr, value);
 }
 
-static uint32_t unin_readl (void *opaque, target_phys_addr_t addr)
+static uint64_t unin_read(void *opaque, target_phys_addr_t addr, unsigned size)
 {
 uint32_t value;
 
@@ -98,16 +99,10 @@ static uint32_t unin_readl (void *opaque, target_phys_addr_t addr)
 return value;
 }
 
-static CPUWriteMemoryFunc * const unin_write[] = {
-&unin_writel,
-&unin_writel,
-&unin_writel,
-};
-
-static CPUReadMemoryFunc * const unin_read[] = {
-&unin_readl,
-&unin_readl,
-&unin_readl,
+static const MemoryRegionOps unin_ops = {
+.read = unin_read,
+.write = unin_write,
+.endianness = DEVICE_NATIVE_ENDIAN,
 };
 
 static int fw_cfg_boot_set(void *opaque, const char *boot_device)
@@ -137,9 +132,9 @@ static void ppc_core99_init (ram_addr_t ram_size,
 CPUState *env = NULL;
 char *filename;
 qemu_irq *pic, **openpic_irqs;
-int unin_memory;
+MemoryRegion *unin_memory = g_new(MemoryRegion, 1);
 int linux_boot, i;
-ram_addr_t ram_offset, bios_offset;
+MemoryRegion *ram = g_new(MemoryRegion, 1), *bios = g_new(MemoryRegion, 1);
 target_phys_addr_t kernel_base, initrd_base, cmdline_base = 0;
 long kernel_size, initrd_size;
 PCIBus *pci_bus;
@@ -175,15 +170,16 @@ static void ppc_core99_init (ram_addr_t ram_size,
 }
 
 /* allocate RAM */
-ram_offset = qemu_ram_alloc(NULL, "ppc_core99.ram", ram_size);
-cpu_register_physical_memory(0, ram_size, ram_offset);
+memory_region_init_ram(ram, NULL, "ppc_core99.ram", ram_size);
+memory_region_add_subregion(get_system_memory(), 0, ram);
 
 /* allocate and load BIOS */
-bios_offset = qemu_ram_alloc(NULL, "ppc_core99.bios", BIOS_SIZE);
+memory_region_init_ram(bios, NULL, "ppc_core99.bios", BIOS_SIZE);
 if (bios_name == NULL)
 bios_name = PROM_FILENAME;
 filename = qemu_find_file(QEMU_FILE_TYPE_BIOS, bios_name);
-cpu_register_physical_memory(PROM_ADDR, BIOS_SIZE, bios_offset | IO_MEM_ROM);
+memory_region_set_readonly(bios, true);
+memory_region_add_subregion(get_system_memory(), PROM_ADDR, bios);
 
 /* Load OpenBIOS (ELF) */
 if (filename) {
@@ -266,9 +262,8 @@ static void ppc_core99_init (ram_addr_t ram_size,
 isa_mmio_init(0xf200, 0x0080);
 
 /* UniN init */
-unin_memory = cpu_register_io_memory(unin_read, unin_write, NULL,
- DEVICE_NATIVE_ENDIAN);
-cpu_register_physical_memory(0xf800, 0x1000, unin_memory);
+memory_region_init_io(unin_memory, &unin_ops, NULL, "unin", 0x1000);
+memory_region_add_subregion(get_system_memory(), 0xf800, unin_memory);
 
 openpic_irqs = g_malloc0(smp_cpus * sizeof(qemu_irq *));
 openpic_irqs[0] =
-- 
1.7.6.3

[Qemu-devel] [PATCH 08/25] hw/versatile_pci: Expose multiple sysbus mmio regions

2011-10-06 Thread Avi Kivity

From: Peter Maydell 

Clean up versatile_pci to expose the various PCI mmio regions
properly as separate mmio regions rather than as a single mmio
which uses callbacks to map and unmap everything.

Signed-off-by: Peter Maydell 
Signed-off-by: Avi Kivity 
---
 hw/realview.c  |   12 ++--
 hw/versatile_pci.c |   42 --
 hw/versatilepb.c   |   12 ++--
 3 files changed, 28 insertions(+), 38 deletions(-)

diff --git a/hw/realview.c b/hw/realview.c
index 549bb15..11ffb8a 100644
--- a/hw/realview.c
+++ b/hw/realview.c
@@ -272,8 +272,16 @@ static void realview_init(ram_addr_t ram_size,
 sysbus_create_simple("pl031", 0x10017000, pic[10]);
 
 if (!is_pb) {
-dev = sysbus_create_varargs("realview_pci", 0x6000,
-pic[48], pic[49], pic[50], pic[51], NULL);
+dev = qdev_create(NULL, "realview_pci");
+busdev = sysbus_from_qdev(dev);
+qdev_init_nofail(dev);
+sysbus_mmio_map(busdev, 0, 0x6100); /* PCI self-config */
+sysbus_mmio_map(busdev, 1, 0x6200); /* PCI config */
+sysbus_mmio_map(busdev, 2, 0x6300); /* PCI I/O */
+sysbus_connect_irq(busdev, 0, pic[48]);
+sysbus_connect_irq(busdev, 1, pic[49]);
+sysbus_connect_irq(busdev, 2, pic[50]);
+sysbus_connect_irq(busdev, 3, pic[51]);
 pci_bus = (PCIBus *)qdev_get_child_bus(dev, "pci");
 if (usb_enabled) {
 usb_ohci_init_pci(pci_bus, -1);
diff --git a/hw/versatile_pci.c b/hw/versatile_pci.c
index 98e56f1..8a88696 100644
--- a/hw/versatile_pci.c
+++ b/hw/versatile_pci.c
@@ -58,38 +58,6 @@ static void pci_vpb_set_irq(void *opaque, int irq_num, int level)
 qemu_set_irq(pic[irq_num], level);
 }
 
-
-static void pci_vpb_map(SysBusDevice *dev, target_phys_addr_t base)
-{
-PCIVPBState *s = (PCIVPBState *)dev;
-/* Selfconfig area.  */
-memory_region_add_subregion(get_system_memory(), base + 0x0100,
-&s->mem_config);
-/* Normal config area.  */
-memory_region_add_subregion(get_system_memory(), base + 0x0200,
-&s->mem_config2);
-
-if (s->realview) {
-/* IO memory area.  */
-memory_region_add_subregion(get_system_memory(), base + 0x0300,
-&s->isa);
-}
-}
-
-static void pci_vpb_unmap(SysBusDevice *dev, target_phys_addr_t base)
-{
-PCIVPBState *s = (PCIVPBState *)dev;
-/* Selfconfig area.  */
-memory_region_del_subregion(get_system_memory(), &s->mem_config);
-/* Normal config area.  */
-memory_region_del_subregion(get_system_memory(), &s->mem_config2);
-
-if (s->realview) {
-/* IO memory area.  */
-memory_region_del_subregion(get_system_memory(), &s->isa);
-}
-}
-
 static int pci_vpb_init(SysBusDevice *dev)
 {
 PCIVPBState *s = FROM_SYSBUS(PCIVPBState, dev);
@@ -106,16 +74,22 @@ static int pci_vpb_init(SysBusDevice *dev)
 
 /* ??? Register memory space.  */
 
+/* Our memory regions are:
+ * 0 : PCI self config window
+ * 1 : PCI config window
+ * 2 : PCI IO window (realview_pci only)
+ */
 memory_region_init_io(&s->mem_config, &pci_vpb_config_ops, bus,
   "pci-vpb-selfconfig", 0x100);
+sysbus_init_mmio_region(dev, &s->mem_config);
 memory_region_init_io(&s->mem_config2, &pci_vpb_config_ops, bus,
   "pci-vpb-config", 0x100);
+sysbus_init_mmio_region(dev, &s->mem_config2);
 if (s->realview) {
 isa_mmio_setup(&s->isa, 0x010);
+sysbus_init_mmio_region(dev, &s->isa);
 }
 
-sysbus_init_mmio_cb2(dev, pci_vpb_map, pci_vpb_unmap);
-
 pci_create_simple(bus, -1, "versatile_pci_host");
 return 0;
 }
diff --git a/hw/versatilepb.c b/hw/versatilepb.c
index 49f8f5f..68402cc 100644
--- a/hw/versatilepb.c
+++ b/hw/versatilepb.c
@@ -181,6 +181,7 @@ static void versatile_init(ram_addr_t ram_size,
 qemu_irq pic[32];
 qemu_irq sic[32];
 DeviceState *dev, *sysctl;
+SysBusDevice *busdev;
 PCIBus *pci_bus;
 NICInfo *nd;
 int n;
@@ -219,8 +220,15 @@ static void versatile_init(ram_addr_t ram_size,
 sysbus_create_simple("pl050_keyboard", 0x10006000, sic[3]);
 sysbus_create_simple("pl050_mouse", 0x10007000, sic[4]);
 
-dev = sysbus_create_varargs("versatile_pci", 0x4000,
-sic[27], sic[28], sic[29], sic[30], NULL);
+dev = qdev_create(NULL, "versatile_pci");
+busdev = sysbus_from_qdev(dev);
+qdev_init_nofail(dev);
+sysbus_mmio_map(busdev, 0, 0x4100); /* PCI self-config */
+sysbus_mmio_map(busdev, 1, 0x4200); /* PCI config */
+sysbus_connect_irq(busdev, 0, sic[27]);
+sysbus_connect_irq(busdev, 1, sic[28]);
+sysbus_connect_irq(busdev, 2, sic[29]);
+sysbus_connect_irq(busdev, 3, sic[30]);
 pci_bus = (PCIBus *)qdev_get_child_bus(dev, "pci

Re: [Qemu-devel] [PATCH] runstate: do not discard runstate changes when paused

2011-10-06 Thread Paolo Bonzini

On 10/05/2011 08:50 PM, Luiz Capitulino wrote:

>  >  >  I'm not exactly against the semantics you're proposing, but they don't
>  >  >  seem to fit today's qemu.

>  >
>  >  Today's qemu is broken here.

>
>  For me it's broken because it will abort() if you migrate a paused vm, for
>  you it seems to be broken at the semantic level.
>
>  We can fix the semantics without breaking compatibility.

s/We can/ We can't

I think we should divide stop causes into three groups:

1) those that are undone by QEMU itself:
RSTATE_DEBUG
RSTATE_SAVEVM
RSTATE_PRE_MIGRATE
RSTATE_RESTORE

For these a lock/release scheme is definitely better.  The VM should not 
start until none of these conditions is in effect, even after a "cont" 
command.

2) those that are undone by management:
RSTATE_IO_ERROR

For this we can add a new "retry" monitor command that guarantees no 
races if the user issues a "stop" or "cont" command while management is 
processing it.  Effectively, it is also a lock/release scheme but 
controlled by management.

3) those that are undone by "cont":
RSTATE_PRE_LAUNCH
RSTATE_PAUSED
RSTATE_WATCHDOG
RSTATE_POST_MIGRATE
RSTATE_PANICKED

I put here the three runstates where the VM should really not be 
restarted at all.  We can then add a new "start" command that only flips 
these five to RSTATE_RUNNING.

So the runstate is composed of six elements: five lock/unlock states (of 
which only one can be unlocked by the user), and one running/paused 
state (composed of five pause reasons + "none").  That is, the runstate 
is a tuple like [debug, savevm, pre_migrate, restore, io_error, 
pause_reason] and for the VM to run it must look like [false, false, 
false, false, false, none].

The four monitor commands would be:

1) "stop":
if runstate[pause_reason] == none then
runstate[pause_reason] = paused

2) "retry":
runstate[io_error] = false

3) "start":
runstate[pause_reason] = none

There could also be a differentiation between "start" and "start -f", 
where "-f" would be needed to get out of RSTATE_POST_MIGRATE, 
RSTATE_PANICKED and probably RSTATE_WATCHDOG too.

4) "cont": backwards compatibility provided by "retry"+"start -f".

How does this look?

Paolo

[Qemu-devel] [PATCH 19/25] parallel: Convert to isa_register_portio_list

2011-10-06 Thread Avi Kivity

From: Richard Henderson 

Signed-off-by: Richard Henderson 
Signed-off-by: Avi Kivity 
---
 hw/parallel.c |   47 ---
 1 files changed, 28 insertions(+), 19 deletions(-)

diff --git a/hw/parallel.c b/hw/parallel.c
index ecbc8c3..8494d94 100644
--- a/hw/parallel.c
+++ b/hw/parallel.c
@@ -448,6 +448,29 @@ static void parallel_reset(void *opaque)
 
 static const int isa_parallel_io[MAX_PARALLEL_PORTS] = { 0x378, 0x278, 0x3bc };
 
+static const MemoryRegionPortio isa_parallel_portio_hw_list[] = {
+{ 0, 8, 1,
+  .read = parallel_ioport_read_hw,
+  .write = parallel_ioport_write_hw },
+{ 4, 1, 2,
+  .read = parallel_ioport_eppdata_read_hw2,
+  .write = parallel_ioport_eppdata_write_hw2 },
+{ 4, 1, 4,
+  .read = parallel_ioport_eppdata_read_hw4,
+  .write = parallel_ioport_eppdata_write_hw4 },
+{ 0x400, 8, 1,
+  .read = parallel_ioport_ecp_read,
+  .write = parallel_ioport_ecp_write },
+PORTIO_END_OF_LIST(),
+};
+
+static const MemoryRegionPortio isa_parallel_portio_sw_list[] = {
+{ 0, 8, 1,
+  .read = parallel_ioport_read_sw,
+  .write = parallel_ioport_write_sw },
+PORTIO_END_OF_LIST(),
+};
+
 static int parallel_isa_initfn(ISADevice *dev)
 {
 static int index;
@@ -478,25 +501,11 @@ static int parallel_isa_initfn(ISADevice *dev)
 s->status = dummy;
 }
 
-if (s->hw_driver) {
-register_ioport_write(base, 8, 1, parallel_ioport_write_hw, s);
-register_ioport_read(base, 8, 1, parallel_ioport_read_hw, s);
-isa_init_ioport_range(dev, base, 8);
-
-register_ioport_write(base+4, 1, 2, parallel_ioport_eppdata_write_hw2, s);
-register_ioport_read(base+4, 1, 2, parallel_ioport_eppdata_read_hw2, s);
-register_ioport_write(base+4, 1, 4, parallel_ioport_eppdata_write_hw4, s);
-register_ioport_read(base+4, 1, 4, parallel_ioport_eppdata_read_hw4, s);
-isa_init_ioport(dev, base+4);
-register_ioport_write(base+0x400, 8, 1, parallel_ioport_ecp_write, s);
-register_ioport_read(base+0x400, 8, 1, parallel_ioport_ecp_read, s);
-isa_init_ioport_range(dev, base+0x400, 8);
-}
-else {
-register_ioport_write(base, 8, 1, parallel_ioport_write_sw, s);
-register_ioport_read(base, 8, 1, parallel_ioport_read_sw, s);
-isa_init_ioport_range(dev, base, 8);
-}
+isa_register_portio_list(dev, base,
+ (s->hw_driver
+  ? &isa_parallel_portio_hw_list[0]
+  : &isa_parallel_portio_sw_list[0]),
+ s, "parallel");
 return 0;
 }
 
-- 
1.7.6.3

[Qemu-devel] [PATCH 14/25] fdc: Convert to isa_register_portio_list

2011-10-06 Thread Avi Kivity

From: Richard Henderson 

Signed-off-by: Richard Henderson 
Signed-off-by: Avi Kivity 
---
 hw/fdc.c |   34 --
 1 files changed, 4 insertions(+), 30 deletions(-)

diff --git a/hw/fdc.c b/hw/fdc.c
index 0f1cee9..4b06e04 100644
--- a/hw/fdc.c
+++ b/hw/fdc.c
@@ -424,7 +424,6 @@ struct FDCtrl {
 
 typedef struct FDCtrlISABus {
 ISADevice busdev;
-MemoryRegion io_0, io_7;
 struct FDCtrl state;
 int32_t bootindexA;
 int32_t bootindexB;
@@ -1880,32 +1879,10 @@ static int fdctrl_init_common(FDCtrl *fdctrl)
 return fdctrl_connect_drives(fdctrl);
 }
 
-static uint32_t fdctrl_read_port_7(void *opaque, uint32_t reg)
-{
-return fdctrl_read(opaque, reg + 7);
-}
-
-static void fdctrl_write_port_7(void *opaque, uint32_t reg, uint32_t value)
-{
-fdctrl_write(opaque, reg + 7, value);
-}
-
-static const MemoryRegionPortio fdc_portio_0[] = {
+static const MemoryRegionPortio fdc_portio_list[] = {
 { 1, 5, 1, .read = fdctrl_read, .write = fdctrl_write },
-PORTIO_END_OF_LIST()
-};
-
-static const MemoryRegionPortio fdc_portio_7[] = {
-{ 0, 1, 1, .read = fdctrl_read_port_7, .write = fdctrl_write_port_7 },
-PORTIO_END_OF_LIST()
-};
-
-static const MemoryRegionOps fdc_ioport_0_ops = {
-.old_portio = fdc_portio_0
-};
-
-static const MemoryRegionOps fdc_ioport_7_ops = {
-.old_portio = fdc_portio_7
+{ 7, 1, 1, .read = fdctrl_read, .write = fdctrl_write },
+PORTIO_END_OF_LIST(),
 };
 
 static int isabus_fdc_init1(ISADevice *dev)
@@ -1917,10 +1894,7 @@ static int isabus_fdc_init1(ISADevice *dev)
 int dma_chann = 2;
 int ret;
 
-memory_region_init_io(&isa->io_0, &fdc_ioport_0_ops, fdctrl, "fdc", 6);
-memory_region_init_io(&isa->io_7, &fdc_ioport_7_ops, fdctrl, "fdc", 1);
-isa_register_ioport(dev, &isa->io_0, iobase);
-isa_register_ioport(dev, &isa->io_7, iobase + 7);
+isa_register_portio_list(dev, iobase, fdc_portio_list, fdctrl, "fdc");
 
 isa_init_irq(&isa->busdev, &fdctrl->irq, isairq);
 fdctrl->dma_chann = dma_chann;
-- 
1.7.6.3

[Qemu-devel] [PATCH 25/25] isa: Remove isa_init_ioport_range and isa_init_ioport

2011-10-06 Thread Avi Kivity

From: Richard Henderson 

All users have been converted to either isa_register_ioport
or isa_register_portio_list.

Signed-off-by: Richard Henderson 
Signed-off-by: Avi Kivity 
---
 hw/isa-bus.c |   19 +--
 hw/isa.h |2 --
 2 files changed, 5 insertions(+), 16 deletions(-)

diff --git a/hw/isa-bus.c b/hw/isa-bus.c
index 5d8ff84..7c2c261 100644
--- a/hw/isa-bus.c
+++ b/hw/isa-bus.c
@@ -83,24 +83,17 @@ void isa_init_irq(ISADevice *dev, qemu_irq *p, int isairq)
 dev->nirqs++;
 }
 
-void isa_init_ioport_range(ISADevice *dev, uint16_t start, uint16_t length)
+static inline void isa_init_ioport(ISADevice *dev, uint16_t ioport)
 {
-if (dev->ioport_id == 0 || start < dev->ioport_id) {
-dev->ioport_id = start;
+if (dev && (dev->ioport_id == 0 || ioport < dev->ioport_id)) {
+dev->ioport_id = ioport;
 }
 }
 
-void isa_init_ioport(ISADevice *dev, uint16_t ioport)
-{
-isa_init_ioport_range(dev, ioport, 1);
-}
-
 void isa_register_ioport(ISADevice *dev, MemoryRegion *io, uint16_t start)
 {
 memory_region_add_subregion(isabus->address_space_io, start, io);
-if (dev != NULL) {
-isa_init_ioport(dev, start);
-}
+isa_init_ioport(dev, start);
 }
 
 void isa_register_portio_list(ISADevice *dev, uint16_t start,
@@ -112,9 +105,7 @@ void isa_register_portio_list(ISADevice *dev, uint16_t start,
 /* START is how we should treat DEV, regardless of the actual
contents of the portio array.  This is how the old code
actually handled e.g. the FDC device.  */
-if (dev) {
-isa_init_ioport(dev, start);
-}
+isa_init_ioport(dev, start);
 
 portio_list_init(piolist, pio_start, opaque, name);
 portio_list_add(piolist, isabus->address_space_io, start);
diff --git a/hw/isa.h b/hw/isa.h
index 177ef95..d3cae35 100644
--- a/hw/isa.h
+++ b/hw/isa.h
@@ -28,8 +28,6 @@ ISABus *isa_bus_new(DeviceState *dev, MemoryRegion *address_space_io);
 void isa_bus_irqs(qemu_irq *irqs);
 qemu_irq isa_get_irq(int isairq);
 void isa_init_irq(ISADevice *dev, qemu_irq *p, int isairq);
-void isa_init_ioport(ISADevice *dev, uint16_t ioport);
-void isa_init_ioport_range(ISADevice *dev, uint16_t start, uint16_t length);
 void isa_qdev_register(ISADeviceInfo *info);
 MemoryRegion *isa_address_space(ISADevice *dev);
 ISADevice *isa_create(const char *name);
-- 
1.7.6.3

[Qemu-devel] [PATCH 16/25] m48t59: Convert to isa_register_ioport

2011-10-06 Thread Avi Kivity

From: Richard Henderson 

The sysbus interface is as yet unconverted.

Signed-off-by: Richard Henderson 
Signed-off-by: Avi Kivity 
---
 hw/m48t59.c |   15 ---
 1 files changed, 12 insertions(+), 3 deletions(-)

diff --git a/hw/m48t59.c b/hw/m48t59.c
index 0cc361e..f318e67 100644
--- a/hw/m48t59.c
+++ b/hw/m48t59.c
@@ -73,6 +73,7 @@ struct M48t59State {
 typedef struct M48t59ISAState {
 ISADevice busdev;
 M48t59State state;
+MemoryRegion io;
 } M48t59ISAState;
 
 typedef struct M48t59SysBusState {
@@ -626,6 +627,15 @@ static void m48t59_reset_sysbus(DeviceState *d)
 m48t59_reset_common(NVRAM);
 }
 
+static const MemoryRegionPortio m48t59_portio[] = {
+{0, 4, 1, .read = NVRAM_readb, .write = NVRAM_writeb },
+PORTIO_END_OF_LIST(),
+};
+
+static const MemoryRegionOps m48t59_io_ops = {
+.old_portio = m48t59_portio,
+};
+
 /* Initialisation routine */
 M48t59State *m48t59_init(qemu_irq IRQ, target_phys_addr_t mem_base,
  uint32_t io_base, uint16_t size, int type)
@@ -669,10 +679,9 @@ static void m48t59_reset_sysbus(DeviceState *d)
 d = DO_UPCAST(M48t59ISAState, busdev, dev);
 s = &d->state;
 
+memory_region_init_io(&d->io, &m48t59_io_ops, s, "m48t59", 4);
 if (io_base != 0) {
-register_ioport_read(io_base, 0x04, 1, NVRAM_readb, s);
-register_ioport_write(io_base, 0x04, 1, NVRAM_writeb, s);
-isa_init_ioport_range(dev, io_base, 4);
+isa_register_ioport(dev, &d->io, io_base);
 }
 
 return s;
-- 
1.7.6.3

[Qemu-devel] [PATCH 05/25] petalogix_s3adsp1800: convert to memory API

2011-10-06 Thread Avi Kivity

Signed-off-by: Avi Kivity 
---
 hw/petalogix_s3adsp1800_mmu.c |   18 ++
 1 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/hw/petalogix_s3adsp1800_mmu.c b/hw/petalogix_s3adsp1800_mmu.c
index 66fb96d..17da2fd 100644
--- a/hw/petalogix_s3adsp1800_mmu.c
+++ b/hw/petalogix_s3adsp1800_mmu.c
@@ -35,6 +35,7 @@
 #include "loader.h"
 #include "elf.h"
 #include "blockdev.h"
+#include "exec-memory.h"
 
 #include "microblaze_pic_cpu.h"
 
@@ -125,9 +126,10 @@ static uint64_t translate_kernel_address(void *opaque, uint64_t addr)
 DriveInfo *dinfo;
 int i;
 target_phys_addr_t ddr_base = 0x9000;
-ram_addr_t phys_lmb_bram;
-ram_addr_t phys_ram;
+MemoryRegion *phys_lmb_bram = g_new(MemoryRegion, 1);
+MemoryRegion *phys_ram = g_new(MemoryRegion, 1);
 qemu_irq irq[32], *cpu_irq;
+MemoryRegion *sysmem = get_system_memory();
 
 /* init CPUs */
 if (cpu_model == NULL) {
@@ -139,13 +141,13 @@ static uint64_t translate_kernel_address(void *opaque, uint64_t addr)
 qemu_register_reset(main_cpu_reset, env);
 
 /* Attach emulated BRAM through the LMB.  */
-phys_lmb_bram = qemu_ram_alloc(NULL, "petalogix_s3adsp1800.lmb_bram",
-   LMB_BRAM_SIZE);
-cpu_register_physical_memory(0x, LMB_BRAM_SIZE,
- phys_lmb_bram | IO_MEM_RAM);
+memory_region_init_ram(phys_lmb_bram, NULL,
+   "petalogix_s3adsp1800.lmb_bram", LMB_BRAM_SIZE);
+memory_region_add_subregion(sysmem, 0x, phys_lmb_bram);
 
-phys_ram = qemu_ram_alloc(NULL, "petalogix_s3adsp1800.ram", ram_size);
-cpu_register_physical_memory(ddr_base, ram_size, phys_ram | IO_MEM_RAM);
+memory_region_init_ram(phys_ram, NULL, "petalogix_s3adsp1800.ram",
+   ram_size);
+memory_region_add_subregion(sysmem, ddr_base, phys_ram);
 
 dinfo = drive_get(IF_PFLASH, 0, 0);
 pflash_cfi01_register(0xa000,
-- 
1.7.6.3

[Qemu-devel] [PATCH 21/25] vga: Convert to isa_register_portio_list

2011-10-06 Thread Avi Kivity

From: Richard Henderson 

[jan: fix cut'n'paste errors]
[avi: adjust pci variants not to use isa functions]

Signed-off-by: Richard Henderson 
Signed-off-by: Jan Kiszka 
Signed-off-by: Avi Kivity 
---
 hw/qxl.c|2 +-
 hw/vga-isa.c|   17 
 hw/vga-pci.c|2 +-
 hw/vga.c|   73 +++---
 hw/vga_int.h|7 -
 hw/vmware_vga.c |7 +++--
 6 files changed, 59 insertions(+), 49 deletions(-)

diff --git a/hw/qxl.c b/hw/qxl.c
index 6db2f1a..03848ed 100644
--- a/hw/qxl.c
+++ b/hw/qxl.c
@@ -1601,7 +1601,7 @@ static int qxl_init_primary(PCIDevice *dev)
 ram_size = 32 * 1024 * 1024;
 }
 vga_common_init(vga, ram_size);
-vga_init(vga, pci_address_space(dev));
+vga_init(vga, pci_address_space(dev), pci_address_space_io(dev), false);
 register_ioport_write(0x3c0, 16, 1, qxl_vga_ioport_write, vga);
 register_ioport_write(0x3b4,  2, 1, qxl_vga_ioport_write, vga);
 register_ioport_write(0x3d4,  2, 1, qxl_vga_ioport_write, vga);
diff --git a/hw/vga-isa.c b/hw/vga-isa.c
index 6b5c8ed..4825313 100644
--- a/hw/vga-isa.c
+++ b/hw/vga-isa.c
@@ -47,24 +47,19 @@ static int vga_initfn(ISADevice *dev)
 ISAVGAState *d = DO_UPCAST(ISAVGAState, dev, dev);
 VGACommonState *s = &d->state;
 MemoryRegion *vga_io_memory;
+const MemoryRegionPortio *vga_ports, *vbe_ports;
 
 vga_common_init(s, VGA_RAM_SIZE);
 s->legacy_address_space = isa_address_space(dev);
-vga_io_memory = vga_init_io(s);
+vga_io_memory = vga_init_io(s, &vga_ports, &vbe_ports);
+isa_register_portio_list(dev, 0x3b0, vga_ports, s, "vga");
+if (vbe_ports) {
+isa_register_portio_list(dev, 0x1ce, vbe_ports, s, "vbe");
+}
 memory_region_add_subregion_overlap(isa_address_space(dev),
 isa_mem_base + 0x000a,
 vga_io_memory, 1);
 memory_region_set_coalescing(vga_io_memory);
-isa_init_ioport(dev, 0x3c0);
-isa_init_ioport(dev, 0x3b4);
-isa_init_ioport(dev, 0x3ba);
-isa_init_ioport(dev, 0x3da);
-isa_init_ioport(dev, 0x3c0);
-#ifdef CONFIG_BOCHS_VBE
-isa_init_ioport(dev, 0x1ce);
-isa_init_ioport(dev, 0x1cf);
-isa_init_ioport(dev, 0x1d0);
-#endif /* CONFIG_BOCHS_VBE */
 s->ds = graphic_console_init(s->update, s->invalidate,
  s->screen_dump, s->text_update, s);
 
diff --git a/hw/vga-pci.c b/hw/vga-pci.c
index 3c8bcb0..14bfadb 100644
--- a/hw/vga-pci.c
+++ b/hw/vga-pci.c
@@ -54,7 +54,7 @@ static int pci_vga_initfn(PCIDevice *dev)
 
  // vga + console init
  vga_common_init(s, VGA_RAM_SIZE);
- vga_init(s, pci_address_space(dev));
+ vga_init(s, pci_address_space(dev), pci_address_space_io(dev), true);
 
  s->ds = graphic_console_init(s->update, s->invalidate,
   s->screen_dump, s->text_update, s);
diff --git a/hw/vga.c b/hw/vga.c
index f9a6014..5beaa99 100644
--- a/hw/vga.c
+++ b/hw/vga.c
@@ -2241,40 +2241,39 @@ void vga_common_init(VGACommonState *s, int vga_ram_size)
 vga_dirty_log_start(s);
 }
 
-/* used by both ISA and PCI */
-MemoryRegion *vga_init_io(VGACommonState *s)
-{
-MemoryRegion *vga_mem;
-
-register_ioport_write(0x3c0, 16, 1, vga_ioport_write, s);
-
-register_ioport_write(0x3b4, 2, 1, vga_ioport_write, s);
-register_ioport_write(0x3d4, 2, 1, vga_ioport_write, s);
-register_ioport_write(0x3ba, 1, 1, vga_ioport_write, s);
-register_ioport_write(0x3da, 1, 1, vga_ioport_write, s);
-
-register_ioport_read(0x3c0, 16, 1, vga_ioport_read, s);
-
-register_ioport_read(0x3b4, 2, 1, vga_ioport_read, s);
-register_ioport_read(0x3d4, 2, 1, vga_ioport_read, s);
-register_ioport_read(0x3ba, 1, 1, vga_ioport_read, s);
-register_ioport_read(0x3da, 1, 1, vga_ioport_read, s);
+static const MemoryRegionPortio vga_portio_list[] = {
+{ 0x04,  2, 1, .read = vga_ioport_read, .write = vga_ioport_write }, /* 3b4 */
+{ 0x0a,  1, 1, .read = vga_ioport_read, .write = vga_ioport_write }, /* 3ba */
+{ 0x10, 16, 1, .read = vga_ioport_read, .write = vga_ioport_write }, /* 3c0 */
+{ 0x24,  2, 1, .read = vga_ioport_read, .write = vga_ioport_write }, /* 3d4 */
+{ 0x2a,  1, 1, .read = vga_ioport_read, .write = vga_ioport_write }, /* 3da */
+PORTIO_END_OF_LIST(),
+};
 
 #ifdef CONFIG_BOCHS_VBE
-#if defined (TARGET_I386)
-register_ioport_read(0x1ce, 1, 2, vbe_ioport_read_index, s);
-register_ioport_read(0x1cf, 1, 2, vbe_ioport_read_data, s);
+static const MemoryRegionPortio vbe_portio_list[] = {
+{ 0, 1, 2, .read = vbe_ioport_read_index, .write = vbe_ioport_write_index },
+# ifdef TARGET_I386
+{ 1, 1, 2, .read = vbe_ioport_read_data, .write = vbe_ioport_write_data },
+# else
+{ 2, 1, 2, .read = vbe_ioport_read_data, .write = vbe_ioport_write_data },
+# endif
+PORTIO_END_OF_LIST(),
+};
+#endif /* CONFIG_BOCHS_VBE */
 
-register

[Qemu-devel] [PATCH 12/25] memory: Fix old portio word accesses

2011-10-06 Thread Avi Kivity

From: Jan Kiszka 

As we register old portio regions via ioport_register, we are also
responsible for providing the word access wrapper.

Signed-off-by: Jan Kiszka 
Signed-off-by: Avi Kivity 
---
 memory.c |   10 ++
 1 files changed, 10 insertions(+), 0 deletions(-)

diff --git a/memory.c b/memory.c
index 528e5fb..a8359b1 100644
--- a/memory.c
+++ b/memory.c
@@ -404,6 +404,11 @@ static void memory_region_iorange_read(IORange *iorange,
 *data = ((uint64_t)1 << (width * 8)) - 1;
 if (mrp) {
 *data = mrp->read(mr->opaque, offset + mr->offset);
+} else if (width == 2) {
+mrp = find_portio(mr, offset, 1, false);
+assert(mrp);
+*data = mrp->read(mr->opaque, offset + mr->offset) |
+(mrp->read(mr->opaque, offset + mr->offset + 1) << 8);
 }
 return;
 }
@@ -426,6 +431,11 @@ static void memory_region_iorange_write(IORange *iorange,
 
 if (mrp) {
 mrp->write(mr->opaque, offset + mr->offset, data);
+} else if (width == 2) {
+mrp = find_portio(mr, offset, 1, false);
+assert(mrp);
+mrp->write(mr->opaque, offset + mr->offset, data & 0xff);
+mrp->write(mr->opaque, offset + mr->offset + 1, data >> 8);
 }
 return;
 }
-- 
1.7.6.3
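
The word access wrapper added by the patch above can be illustrated outside of QEMU. The following is only a minimal sketch, assuming a toy byte-wide device; ByteDev, bytedev_read, bytedev_write, word_read and word_write are hypothetical names and not part of memory.c. It shows a 16-bit access being composed from two byte accesses, low byte first, in the same way as the new width == 2 branches.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical byte-wide device with eight one-byte registers. */
typedef struct {
    uint8_t regs[8];
} ByteDev;

static uint32_t bytedev_read(void *opaque, uint32_t addr)
{
    ByteDev *d = opaque;
    return d->regs[addr & 7];
}

static void bytedev_write(void *opaque, uint32_t addr, uint32_t val)
{
    ByteDev *d = opaque;
    d->regs[addr & 7] = (uint8_t)val;
}

/* 16-bit read built from two byte reads: low byte at addr, high at addr + 1. */
static uint32_t word_read(void *opaque, uint32_t addr)
{
    return bytedev_read(opaque, addr) |
           (bytedev_read(opaque, addr + 1) << 8);
}

/* 16-bit write split into two byte writes, low byte first. */
static void word_write(void *opaque, uint32_t addr, uint32_t val)
{
    assert(val <= 0xffff);
    bytedev_write(opaque, addr, val & 0xff);
    bytedev_write(opaque, addr + 1, val >> 8);
}
```

Writing 0xbeef at offset 2 with word_write() leaves 0xef in register 2 and 0xbe in register 3, and word_read() at the same offset reassembles the original value.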

Re: [Qemu-devel] [RFC 0/2] target-arm: Adding Cortex-R4F support

2011-10-06 Thread Peter Maydell

On 6 October 2011 11:16, Andreas Färber  wrote:
> Am 02.10.2011 23:44, schrieb Peter Maydell:
>> On 2 October 2011 19:56, Andreas Färber  wrote:
>>> 1) Currently, -cpu is used to look up a Main ID Register value and to base
>>> feature decisions on that. This doesn't work for Cortex-R4 and Cortex-R4F,
>>> which have an identical MIDR but only -R4F has the FPU.
>>> Re-checking the model string, while ugly, does the trick. Comments?
>>
>> That is indeed kind of ugly. I think if CPUID value isn't a unique value
>> for the things we pass to -cpu then we shouldn't treat it as one.
>
> For the reset, the MIDR is read, then the memset() is performed and
> cpu_reset_model_id() is called with the previously read MIDR value,
> which the function then writes into the register first thing. I'd
> suggest to move that out into cpu_reset(), drop the id parameter and
> switch on the register instead (only other use is cpu_abort()).

If we're shuffling code around we should probably be doing something like:
 * in cpu_arm_init() look at the model string and set feature switches,
   ID register values, etc
 * in reset, don't reset ID registers (they're constant, after all),
   and [as with the rest of the code] behave based only on cpuid and
   feature switches
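
To make the two points above concrete, here is a rough sketch, not actual target-arm code. CpuModel, CpuState, cpu_init_model, cpu_reset_state, the FEAT_* bits and EXAMPLE_MIDR are all made up for illustration, and EXAMPLE_MIDR is a placeholder rather than a real Main ID Register value. The model string selects the ID value and feature switches once at init time, and reset only clears mutable state.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical feature switches. */
enum {
    FEAT_VFP = 1u << 0,
    FEAT_MPU = 1u << 1,
};

/* Placeholder only: NOT a real Main ID Register value. */
#define EXAMPLE_MIDR 0x0000c140u

typedef struct {
    const char *name;
    uint32_t midr;
    uint32_t features;
} CpuModel;

/* Two models sharing one ID value but differing in features. */
static const CpuModel models[] = {
    { "cortex-r4",  EXAMPLE_MIDR, FEAT_MPU },
    { "cortex-r4f", EXAMPLE_MIDR, FEAT_MPU | FEAT_VFP },
};

typedef struct {
    uint32_t midr;      /* constant after init */
    uint32_t features;  /* constant after init */
    uint32_t regs[16];  /* mutable state, cleared on reset */
} CpuState;

/* Look at the model string once and set ID value and feature switches. */
static int cpu_init_model(CpuState *cpu, const char *name)
{
    for (size_t i = 0; i < sizeof(models) / sizeof(models[0]); i++) {
        if (strcmp(models[i].name, name) == 0) {
            memset(cpu, 0, sizeof(*cpu));
            cpu->midr = models[i].midr;
            cpu->features = models[i].features;
            return 0;
        }
    }
    return -1;
}

/* Reset clears only the mutable state; ID and features are untouched. */
static void cpu_reset_state(CpuState *cpu)
{
    assert(cpu != NULL);
    memset(cpu->regs, 0, sizeof(cpu->regs));
}
```

With this split, "cortex-r4" and "cortex-r4f" end up with the same midr but different features, and nothing after init needs to look at the model string again.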

>> More
>> generally, it would be nice to be able to say "I want a Cortex-A9
>> but I only want the no-neon VFPv3D16 variant". (I think some of the
>> other targets already have syntax for this.)
>
> Coming from a ppc background, we have a whole matrix of processors with
> fixed features but I'm not aware of an arch where we opt-in/out
> processor core features.

target-i386 seems to have some code for handling syntax like this
(you seem to be able to say -cpu pentium,-fpu for instance).
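
A sketch of what handling such a syntax could look like, purely as an assumption and not a description of the target-i386 parser: apply_cpu_spec, feature_bit and the F_* bits are hypothetical. It takes a string of the form "model,+feat,-feat" and applies the flags to a feature mask.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical feature bits. */
enum { F_FPU = 1u << 0, F_NEON = 1u << 1 };

/* Map a feature name of the given length to its bit, or 0 if unknown. */
static uint32_t feature_bit(const char *name, size_t len)
{
    if (len == 3 && strncmp(name, "fpu", 3) == 0) {
        return F_FPU;
    }
    if (len == 4 && strncmp(name, "neon", 4) == 0) {
        return F_NEON;
    }
    return 0;
}

/* Apply the ",+name" and ",-name" suffixes of SPEC to *features.
   Returns the length of the model-name prefix, or -1 on a bad flag. */
static int apply_cpu_spec(const char *spec, uint32_t *features)
{
    assert(spec != NULL && features != NULL);

    const char *p = strchr(spec, ',');
    int model_len = p ? (int)(p - spec) : (int)strlen(spec);

    while (p) {
        const char *tok = p + 1;
        const char *next = strchr(tok, ',');
        size_t len = next ? (size_t)(next - tok) : strlen(tok);

        if (len < 2 || (tok[0] != '+' && tok[0] != '-')) {
            return -1;
        }
        uint32_t bit = feature_bit(tok + 1, len - 1);
        if (bit == 0) {
            return -1;
        }
        if (tok[0] == '+') {
            *features |= bit;
        } else {
            *features &= ~bit;
        }
        p = next;
    }
    return model_len;
}
```

Starting from a mask with both bits set, "pentium,-fpu" clears F_FPU and reports a model name of length 7; the caller would then look up the first 7 characters as the base model.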

>> I think that (1) the bare CPU name should be the most recent rev of the
>> core that QEMU knows about [and that we should be happy to change qemu
>> to move up to supporting newer revisions]
>
>> (Anybody want to argue with (1) ?)
>
> I concur that an easy-to-type -cpu should provide the latest and
> greatest features. Features hidden will not get much exposure. But if a
> revision noticeably changes behavior, I guess we should remain command
> line compatible.

Depends what you mean by "noticeably". User space will basically never
notice or care, typically. The kernel does care occasionally. I think
I'd rather have "cortex-foo" do the right thing for the vast majority
of users who don't care whether they get r2 or r3, rather than be stuck
with it meaning r1 because that's what we happened to model first and
there was some minor incompatible change between r1 and r2. I don't think
there's a position between "cortex-foo is always the most recent rev
we model" and "user must always specify rXpY" which doesn't lead you
into weird and confusing UI inconsistencies between CPUs.

-- PMM

[Qemu-devel] [PATCH 11/25] Introduce PortioList

2011-10-06 Thread Avi Kivity

Add a type and methods for manipulating a list of disjoint I/O ports,
used in some older hardware devices.

Based on original patch by Richard Henderson.

Signed-off-by: Richard Henderson 
Signed-off-by: Avi Kivity 
---
 Makefile.objs   |2 +-
 Makefile.target |2 +-
 ioport.c|  108 +++
 ioport.h|   21 +++
 memory.c|8 ++--
 5 files changed, 135 insertions(+), 6 deletions(-)

diff --git a/Makefile.objs b/Makefile.objs
index 8d23fbb..86ab37b 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -82,7 +82,7 @@ common-obj-$(CONFIG_WIN32) += os-win32.o
 common-obj-$(CONFIG_POSIX) += os-posix.o
 
 common-obj-y += tcg-runtime.o host-utils.o
-common-obj-y += irq.o ioport.o input.o
+common-obj-y += irq.o input.o
 common-obj-$(CONFIG_PTIMER) += ptimer.o
 common-obj-$(CONFIG_MAX7310) += max7310.o
 common-obj-$(CONFIG_WM8750) += wm8750.o
diff --git a/Makefile.target b/Makefile.target
index 88d2f1f..c1529b3 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -183,7 +183,7 @@ endif #CONFIG_BSD_USER
 # System emulator target
 ifdef CONFIG_SOFTMMU
 
-obj-y = arch_init.o cpus.o monitor.o machine.o gdbstub.o balloon.o
+obj-y = arch_init.o cpus.o monitor.o machine.o gdbstub.o balloon.o ioport.o
 # virtio has to be here due to weird dependency between PCI and virtio-net.
 # need to fix this properly
 obj-$(CONFIG_NO_PCI) += pci-stub.o
diff --git a/ioport.c b/ioport.c
index a32483b..36fa3a4 100644
--- a/ioport.c
+++ b/ioport.c
@@ -27,6 +27,7 @@
 
 #include "ioport.h"
 #include "trace.h"
+#include "memory.h"
 
 /***/
 /* IO Port */
@@ -313,3 +314,110 @@ uint32_t cpu_inl(pio_addr_t addr)
 LOG_IOPORT("inl : %04"FMT_pioaddr" %08"PRIx32"\n", addr, val);
 return val;
 }
+
+void portio_list_init(PortioList *piolist,
+  const MemoryRegionPortio *callbacks,
+  void *opaque, const char *name)
+{
+unsigned n = 0;
+
+while (callbacks[n].size) {
+++n;
+}
+
+piolist->ports = callbacks;
+piolist->nr = 0;
+piolist->regions = g_new0(MemoryRegion *, n);
+piolist->address_space = NULL;
+piolist->opaque = opaque;
+piolist->name = name;
+}
+
+void portio_list_destroy(PortioList *piolist)
+{
+g_free(piolist->regions);
+}
+
+static void portio_list_add_1(PortioList *piolist,
+  const MemoryRegionPortio *pio_init,
+  unsigned count, unsigned start,
+  unsigned off_low, unsigned off_high)
+{
+MemoryRegionPortio *pio;
+MemoryRegionOps *ops;
+MemoryRegion *region;
+unsigned i;
+
+/* Copy the sub-list and null-terminate it.  */
+pio = g_new(MemoryRegionPortio, count + 1);
+memcpy(pio, pio_init, sizeof(MemoryRegionPortio) * count);
+memset(pio + count, 0, sizeof(MemoryRegionPortio));
+
+/* Adjust the offsets to all be zero-based for the region.  */
+for (i = 0; i < count; ++i) {
+pio[i].offset -= off_low;
+}
+
+ops = g_new0(MemoryRegionOps, 1);
+ops->old_portio = pio;
+
+region = g_new(MemoryRegion, 1);
+memory_region_init_io(region, ops, piolist->opaque, piolist->name,
+  off_high - off_low);
+memory_region_set_offset(region, start + off_low);
+memory_region_add_subregion(piolist->address_space,
+start + off_low, region);
+piolist->regions[piolist->nr++] = region;
+}
+
+void portio_list_add(PortioList *piolist,
+ MemoryRegion *address_space,
+ uint32_t start)
+{
+const MemoryRegionPortio *pio, *pio_start = piolist->ports;
+unsigned int off_low, off_high, off_last, count;
+
+piolist->address_space = address_space;
+
+/* Handle the first entry specially.  */
+off_last = off_low = pio_start->offset;
+off_high = off_low + pio_start->len;
+count = 1;
+
+for (pio = pio_start + 1; pio->size != 0; pio++, count++) {
+/* All entries must be sorted by offset.  */
+assert(pio->offset >= off_last);
+off_last = pio->offset;
+
+/* If we see a hole, break the region.  */
+if (off_last > off_high) {
+portio_list_add_1(piolist, pio_start, count, start, off_low,
+  off_high);
+/* ... and start collecting anew.  */
+pio_start = pio;
+off_low = off_last;
+off_high = off_low + pio->len;
+count = 0;
+} else if (off_last + pio->len > off_high) {
+off_high = off_last + pio->len;
+}
+}
+
+/* There will always be an open sub-list.  */
+portio_list_add_1(piolist, pio_start, count, start, off_low, off_high);
+}
+
+void portio_list_del(PortioList *piolist)
+{
+MemoryRegion *mr;
+unsigned i;
+
+for (i = 0; i < piolist->nr; ++i) {
+mr = piolist->regions[i];
+

[Qemu-devel] [PATCH 17/25] rtc: Convert to isa_register_ioport

2011-10-06 Thread Avi Kivity

From: Richard Henderson 

Signed-off-by: Richard Henderson 
Signed-off-by: Avi Kivity 
---
 hw/mc146818rtc.c |   15 ---
 1 files changed, 12 insertions(+), 3 deletions(-)

diff --git a/hw/mc146818rtc.c b/hw/mc146818rtc.c
index feb3b25..2aaca2f 100644
--- a/hw/mc146818rtc.c
+++ b/hw/mc146818rtc.c
@@ -81,6 +81,7 @@
 
 typedef struct RTCState {
 ISADevice dev;
+MemoryRegion io;
 uint8_t cmos_data[128];
 uint8_t cmos_index;
 struct tm current_tm;
@@ -604,6 +605,15 @@ static void rtc_reset(void *opaque)
 #endif
 }
 
+static const MemoryRegionPortio cmos_portio[] = {
+{0, 2, 1, .read = cmos_ioport_read, .write = cmos_ioport_write },
+PORTIO_END_OF_LIST(),
+};
+
+static const MemoryRegionOps cmos_ops = {
+.old_portio = cmos_portio
+};
+
 static int rtc_initfn(ISADevice *dev)
 {
 RTCState *s = DO_UPCAST(RTCState, dev, dev);
@@ -632,9 +642,8 @@ static int rtc_initfn(ISADevice *dev)
 qemu_get_clock_ns(rtc_clock) + (get_ticks_per_sec() * 99) / 100;
 qemu_mod_timer(s->second_timer2, s->next_second_time);
 
-register_ioport_write(base, 2, 1, cmos_ioport_write, s);
-register_ioport_read(base, 2, 1, cmos_ioport_read, s);
-isa_init_ioport_range(dev, base, 2);
+memory_region_init_io(&s->io, &cmos_ops, s, "rtc", 2);
+isa_register_ioport(dev, &s->io, base);
 
 qdev_set_legacy_instance_id(&dev->qdev, base, 2);
 qemu_register_reset(rtc_reset, s);
-- 
1.7.6.3

[Qemu-devel] [PATCH 15/25] gus: Convert to isa_register_portio_list

2011-10-06 Thread Avi Kivity

From: Richard Henderson 

Signed-off-by: Richard Henderson 
Signed-off-by: Avi Kivity 
---
 hw/gus.c |   39 +++
 1 files changed, 19 insertions(+), 20 deletions(-)

diff --git a/hw/gus.c b/hw/gus.c
index 37e543a..1532686 100644
--- a/hw/gus.c
+++ b/hw/gus.c
@@ -232,6 +232,22 @@ static int GUS_read_DMA (void *opaque, int nchan, int dma_pos, int dma_len)
 }
 };
 
+static const MemoryRegionPortio gus_portio_list1[] = {
+{0x000,  1, 1, .write = gus_writeb },
+{0x000,  1, 2, .write = gus_writew },
+{0x006, 10, 1, .read = gus_readb, .write = gus_writeb },
+{0x006, 10, 2, .read = gus_readw, .write = gus_writew },
+{0x100,  8, 1, .read = gus_readb, .write = gus_writeb },
+{0x100,  8, 2, .read = gus_readw, .write = gus_writew },
+PORTIO_END_OF_LIST(),
+};
+
+static const MemoryRegionPortio gus_portio_list2[] = {
+{0, 1, 1, .read = gus_readb },
+{0, 1, 2, .read = gus_readw },
+PORTIO_END_OF_LIST(),
+};
+
 static int gus_initfn (ISADevice *dev)
 {
 GUSState *s = DO_UPCAST(GUSState, dev, dev);
@@ -262,26 +278,9 @@ static int gus_initfn (ISADevice *dev)
 s->samples = AUD_get_buffer_size_out (s->voice) >> s->shift;
 s->mixbuf = g_malloc0 (s->samples << s->shift);
 
-register_ioport_write (s->port, 1, 1, gus_writeb, s);
-register_ioport_write (s->port, 1, 2, gus_writew, s);
-isa_init_ioport_range(dev, s->port, 2);
-
-register_ioport_read ((s->port + 0x100) & 0xf00, 1, 1, gus_readb, s);
-register_ioport_read ((s->port + 0x100) & 0xf00, 1, 2, gus_readw, s);
-isa_init_ioport_range(dev, (s->port + 0x100) & 0xf00, 2);
-
-register_ioport_write (s->port + 6, 10, 1, gus_writeb, s);
-register_ioport_write (s->port + 6, 10, 2, gus_writew, s);
-register_ioport_read (s->port + 6, 10, 1, gus_readb, s);
-register_ioport_read (s->port + 6, 10, 2, gus_readw, s);
-isa_init_ioport_range(dev, s->port + 6, 10);
-
-
-register_ioport_write (s->port + 0x100, 8, 1, gus_writeb, s);
-register_ioport_write (s->port + 0x100, 8, 2, gus_writew, s);
-register_ioport_read (s->port + 0x100, 8, 1, gus_readb, s);
-register_ioport_read (s->port + 0x100, 8, 2, gus_readw, s);
-isa_init_ioport_range(dev, s->port + 0x100, 8);
+isa_register_portio_list(dev, s->port, gus_portio_list1, s, "gus");
+isa_register_portio_list(dev, (s->port + 0x100) & 0xf00,
+ gus_portio_list2, s, "gus");
 
 DMA_register_channel (s->emu.gusdma, GUS_read_DMA, s);
 s->emu.himemaddr = s->himem;
-- 
1.7.6.3

[Qemu-devel] [PATCH 03/25] palm: convert to memory API

2011-10-06 Thread Avi Kivity

Signed-off-by: Avi Kivity 
---
 hw/palm.c |   53 +
 1 files changed, 25 insertions(+), 28 deletions(-)

diff --git a/hw/palm.c b/hw/palm.c
index d8f50e3..094bfde 100644
--- a/hw/palm.c
+++ b/hw/palm.c
@@ -54,16 +54,12 @@ static void static_write(void *opaque, target_phys_addr_t offset,
 #endif
 }
 
-static CPUReadMemoryFunc * const static_readfn[] = {
-static_readb,
-static_readh,
-static_readw,
-};
-
-static CPUWriteMemoryFunc * const static_writefn[] = {
-static_write,
-static_write,
-static_write,
+static const MemoryRegionOps static_ops = {
+.old_mmio = {
+.read = { static_readb, static_readh, static_readw, },
+.write = { static_write, static_write, static_write, },
+},
+.endianness = DEVICE_NATIVE_ENDIAN,
 };
 
 /* Palm Tunsgten|E support */
@@ -203,34 +199,35 @@ static void palmte_init(ram_addr_t ram_size,
 struct omap_mpu_state_s *cpu;
 int flash_size = 0x0080;
 int sdram_size = palmte_binfo.ram_size;
-int io;
 static uint32_t cs0val = 0x;
 static uint32_t cs1val = 0xe1a0;
 static uint32_t cs2val = 0xe1a0;
 static uint32_t cs3val = 0xe1a0e1a0;
 int rom_size, rom_loaded = 0;
 DisplayState *ds = get_displaystate();
+MemoryRegion *flash = g_new(MemoryRegion, 1);
+MemoryRegion *cs = g_new(MemoryRegion, 4);
 
 cpu = omap310_mpu_init(address_space_mem, sdram_size, cpu_model);
 
 /* External Flash (EMIFS) */
-cpu_register_physical_memory(OMAP_CS0_BASE, flash_size,
- qemu_ram_alloc(NULL, "palmte.flash",
-flash_size) | IO_MEM_ROM);
-
-io = cpu_register_io_memory(static_readfn, static_writefn, &cs0val,
-DEVICE_NATIVE_ENDIAN);
-cpu_register_physical_memory(OMAP_CS0_BASE + flash_size,
-OMAP_CS0_SIZE - flash_size, io);
-io = cpu_register_io_memory(static_readfn, static_writefn, &cs1val,
-DEVICE_NATIVE_ENDIAN);
-cpu_register_physical_memory(OMAP_CS1_BASE, OMAP_CS1_SIZE, io);
-io = cpu_register_io_memory(static_readfn, static_writefn, &cs2val,
-DEVICE_NATIVE_ENDIAN);
-cpu_register_physical_memory(OMAP_CS2_BASE, OMAP_CS2_SIZE, io);
-io = cpu_register_io_memory(static_readfn, static_writefn, &cs3val,
-DEVICE_NATIVE_ENDIAN);
-cpu_register_physical_memory(OMAP_CS3_BASE, OMAP_CS3_SIZE, io);
+memory_region_init_ram(flash, NULL, "palmte.flash", flash_size);
+memory_region_set_readonly(flash, true);
+memory_region_add_subregion(address_space_mem, OMAP_CS0_BASE, flash);
+
+memory_region_init_io(&cs[0], &static_ops, &cs0val, "palmte-cs0",
+  OMAP_CS0_SIZE - flash_size);
+memory_region_add_subregion(address_space_mem, OMAP_CS0_BASE + flash_size,
+&cs[0]);
+memory_region_init_io(&cs[1], &static_ops, &cs1val, "palmte-cs1",
+  OMAP_CS1_SIZE);
+memory_region_add_subregion(address_space_mem, OMAP_CS1_BASE, &cs[1]);
+memory_region_init_io(&cs[2], &static_ops, &cs2val, "palmte-cs2",
+  OMAP_CS2_SIZE);
+memory_region_add_subregion(address_space_mem, OMAP_CS2_BASE, &cs[2]);
+memory_region_init_io(&cs[3], &static_ops, &cs3val, "palmte-cs3",
+  OMAP_CS3_SIZE);
+memory_region_add_subregion(address_space_mem, OMAP_CS3_BASE, &cs[3]);
 
 palmte_microwire_setup(cpu);
 
-- 
1.7.6.3

[Qemu-devel] [PATCH 00/25] Memory API conversions, batch 11

2011-10-06 Thread Avi Kivity

Please review before I push.  Alex's patch is also in the ppc queue; I'll
drop it from this series if it gets merged there first.

Alexander Graf (1):
  PPC: Fix via-cuda memory registration

Avi Kivity (7):
  palm: convert to memory API
  petalogix_ml605: convert to memory API
  petalogix_s3adsp1800: convert to memory API
  ppc405_boards: convert to memory API
  ppc_newworld: convert to memory API
  Introduce PortioList
  isa: Add isa_register_portio_list()

Jan Kiszka (1):
  memory: Fix old portio word accesses

Peter Maydell (3):
  hw/lan9118.c: Convert to MemoryRegion
  hw/arm11mpcore: Clean up to avoid using sysbus_mmio_init_cb2
  hw/versatile_pci: Expose multiple sysbus mmio regions

Richard Henderson (13):
  isa: Tidy support code for isabus_get_fw_dev_path
  fdc: Convert to isa_register_portio_list
  gus: Convert to isa_register_portio_list
  m48t59: Convert to isa_register_ioport
  rtc: Convert to isa_register_ioport
  ne2000: Convert to isa_register_ioport
  parallel: Convert to isa_register_portio_list
  sb16: Convert to isa_register_portio_list
  vga: Convert to isa_register_portio_list
  pc: Convert port92 to isa_register_ioport
  vmport: Convert to isa_register_ioport
  ide: Convert to isa_register_portio_list
  isa: Remove isa_init_ioport_range and isa_init_ioport

 Makefile.objs |2 +-
 Makefile.target   |2 +-
 hw/arm11mpcore.c  |   13 +-
 hw/cuda.c |   28 ++-
 hw/fdc.c  |   34 ++---
 hw/gus.c  |   39 +++
 hw/ide/core.c |   30 +++
 hw/ide/internal.h |3 +-
 hw/ide/isa.c  |4 +-
 hw/ide/piix.c |7 ++-
 hw/ide/via.c  |7 ++-
 hw/isa-bus.c  |   45 +++--
 hw/isa.h  |   38 ---
 hw/lan9118.c  |   29 ---
 hw/m48t59.c   |   15 +-
 hw/mc146818rtc.c  |   15 +-
 hw/ne2000-isa.c   |5 +--
 hw/palm.c |   53 +---
 hw/parallel.c |   47 +++---
 hw/pc.c   |   16 +-
 hw/petalogix_ml605_mmu.c  |   15 +++---
 hw/petalogix_s3adsp1800_mmu.c |   18 ---
 hw/ppc405_boards.c|   84 ++--
 hw/ppc_newworld.c |   39 ++
 hw/qxl.c  |2 +-
 hw/realview.c |   12 -
 hw/sb16.c |   32 +---
 hw/versatile_pci.c|   42 +++-
 hw/versatilepb.c  |   12 -
 hw/vga-isa.c  |   17 ++
 hw/vga-pci.c  |2 +-
 hw/vga.c  |   73 
 hw/vga_int.h  |7 ++-
 hw/vmport.c   |   16 +-
 hw/vmware_vga.c   |7 ++-
 ioport.c  |  108 +
 ioport.h  |   21 
 memory.c  |   18 +--
 38 files changed, 551 insertions(+), 406 deletions(-)

-- 
1.7.6.3

[Qemu-devel] [PATCH 01/25] hw/lan9118.c: Convert to MemoryRegion

2011-10-06 Thread Avi Kivity

From: Peter Maydell 

Signed-off-by: Peter Maydell 
Signed-off-by: Avi Kivity 
---
 hw/lan9118.c |   29 +++--
 1 files changed, 11 insertions(+), 18 deletions(-)

diff --git a/hw/lan9118.c b/hw/lan9118.c
index 73a8661..634b88e 100644
--- a/hw/lan9118.c
+++ b/hw/lan9118.c
@@ -152,7 +152,7 @@ enum tx_state {
 NICState *nic;
 NICConf conf;
 qemu_irq irq;
-int mmio_index;
+MemoryRegion mmio;
 ptimer_state *timer;
 
 uint32_t irq_cfg;
@@ -895,7 +895,7 @@ static void lan9118_tick(void *opaque)
 }
 
 static void lan9118_writel(void *opaque, target_phys_addr_t offset,
-   uint32_t val)
+   uint64_t val, unsigned size)
 {
 lan9118_state *s = (lan9118_state *)opaque;
 offset &= 0xff;
@@ -1022,13 +1022,14 @@ static void lan9118_writel(void *opaque, target_phys_addr_t offset,
 break;
 
 default:
-hw_error("lan9118_write: Bad reg 0x%x = %x\n", (int)offset, val);
+hw_error("lan9118_write: Bad reg 0x%x = %x\n", (int)offset, (int)val);
 break;
 }
 lan9118_update(s);
 }
 
-static uint32_t lan9118_readl(void *opaque, target_phys_addr_t offset)
+static uint64_t lan9118_readl(void *opaque, target_phys_addr_t offset,
+  unsigned size)
 {
 lan9118_state *s = (lan9118_state *)opaque;
 
@@ -1101,16 +1102,10 @@ static uint32_t lan9118_readl(void *opaque, target_phys_addr_t offset)
 return 0;
 }
 
-static CPUReadMemoryFunc * const lan9118_readfn[] = {
-lan9118_readl,
-lan9118_readl,
-lan9118_readl
-};
-
-static CPUWriteMemoryFunc * const lan9118_writefn[] = {
-lan9118_writel,
-lan9118_writel,
-lan9118_writel
+static const MemoryRegionOps lan9118_mem_ops = {
+.read = lan9118_readl,
+.write = lan9118_writel,
+.endianness = DEVICE_NATIVE_ENDIAN,
 };
 
 static void lan9118_cleanup(VLANClientState *nc)
@@ -1135,10 +1130,8 @@ static int lan9118_init1(SysBusDevice *dev)
 QEMUBH *bh;
 int i;
 
-s->mmio_index = cpu_register_io_memory(lan9118_readfn,
-   lan9118_writefn, s,
-   DEVICE_NATIVE_ENDIAN);
-sysbus_init_mmio(dev, 0x100, s->mmio_index);
+memory_region_init_io(&s->mmio, &lan9118_mem_ops, s, "lan9118-mmio", 0x100);
+sysbus_init_mmio_region(dev, &s->mmio);
 sysbus_init_irq(dev, &s->irq);
 qemu_macaddr_default_if_unset(&s->conf.macaddr);
 
-- 
1.7.6.3

Re: [Qemu-devel] [RFC 0/2] target-arm: Adding Cortex-R4F support

2011-10-06 Thread Andreas Färber

On 02.10.2011 23:44, Peter Maydell wrote:
> On 2 October 2011 19:56, Andreas Färber  wrote:
>> I've been looking into adding support for Cortex-R4F.
> 
> Ooh, that will be the first R profile core. In particular the only
> other non-M-profile PMSA core we support is the 946 which was a v5
> core,

Yeah, I rarely pick the easy tasks. :)

>> 1) Currently, -cpu is used to look up a Main ID Register value and to base
>> feature decisions on that. This doesn't work for Cortex-R4 and Cortex-R4F,
>> which have an identical MIDR but only -R4F has the FPU.
>> Re-checking the model string, while ugly, does the trick. Comments?
> 
> That is indeed kind of ugly. I think if CPUID value isn't a unique value
> for the things we pass to -cpu then we shouldn't treat it as one.

For the reset, the MIDR is read, then the memset() is performed and
cpu_reset_model_id() is called with the previously read MIDR value,
which the function then writes into the register first thing. I'd
suggest moving that out into cpu_reset(), dropping the id parameter and
switching on the register instead (the only other use is cpu_abort()).

> More
> generally, it would be nice to be able to say "I want a Cortex-A9
> but I only want the no-neon VFPv3D16 variant". (I think some of the
> other targets already have syntax for this.)

Coming from a ppc background, we have a whole matrix of processors with
fixed features, but I'm not aware of an arch where we can opt in to or
out of individual processor core features.

> Currently the approach is to say "you only get one variant of the
> processor, and it's the one with all the bells and whistles enabled".
> That would imply that '-cpu cortex-r4' gives you one with an FPU.

I'll go with cortex-r4f then.

> I think that (1) the bare CPU name should be the most recent rev of the
> core that QEMU knows about [and that we should be happy to change qemu
> to move up to supporting newer revisions]

> (Anybody want to argue with (1) ?)

I concur that an easy-to-type -cpu should provide the latest and
greatest features. Features hidden will not get much exposure. But if a
revision noticeably changes behavior, I guess we should remain command
line compatible.

Andreas
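
The point discussed in this thread, that the MIDR is not unique across the things passed to -cpu, can be sketched as a lookup keyed on the model name instead of the ID register. This is only an illustration under stated assumptions: the struct, the table, the function name and the MIDR constant are placeholders made up for the example and are not QEMU code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model description; not QEMU's actual data structure. */
struct cpu_model {
    const char *name;   /* string passed to -cpu */
    uint32_t midr;      /* placeholder ID value shared by both variants */
    int has_fpu;        /* feature that the MIDR alone cannot tell apart */
};

#define EXAMPLE_R4_MIDR 0x12345678u /* placeholder, not the real MIDR */

static const struct cpu_model models[] = {
    { "cortex-r4",  EXAMPLE_R4_MIDR, 0 },
    { "cortex-r4f", EXAMPLE_R4_MIDR, 1 },
};

/* Look a model up by name; returns NULL if the name is unknown. */
static const struct cpu_model *find_cpu_model(const char *name)
{
    size_t i;

    for (i = 0; i < sizeof(models) / sizeof(models[0]); i++) {
        if (strcmp(models[i].name, name) == 0) {
            return &models[i];
        }
    }
    return NULL;
}
```

With a table like this, feature decisions follow from the entry that was selected rather than from re-deriving them out of an ambiguous register value.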

Re: [Qemu-devel] [PATCH v2] tap: Add optional parameters to up/down script

2011-10-06 Thread Thomas Jung

On 30.09 2011 11:45, Sasha Levin wrote:
> Subject: [PATCH v2] tap: Add optional parameters to up/down script
>
> This allows the user to add custom parameters to the up or down
> scripts.
> 
> Extra parameters are useful in more complex networking scenarios
> where we would like to configure network devices when starting
> or stopping the guest.

PATCH v2 isn't working for me. Neither the scriptparams nor the
downscriptparams option works.
Usage was like:
qemu-system-x86_64 -m 512 -net nic,macaddr=[...] -net
tap,script=/path/to/script,scriptparams="param1",
downscript=/path/to/downscript -drive...

Greetings
Thomas

[Qemu-devel] [PATCH 13/64] PPC: E500: Generate IRQ lines for many CPUs

2011-10-06 Thread Alexander Graf

Now that we can generate multiple envs for all our virtual CPUs, we
also need to tell the MPIC that we have multiple CPUs connected and
connect them all to the respective virtual interrupt lines.

Signed-off-by: Alexander Graf 
---
 hw/ppce500_mpc8544ds.c |   17 -
 1 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/hw/ppce500_mpc8544ds.c b/hw/ppce500_mpc8544ds.c
index 8d05587..9cb01f3 100644
--- a/hw/ppce500_mpc8544ds.c
+++ b/hw/ppce500_mpc8544ds.c
@@ -237,7 +237,7 @@ static void mpc8544ds_init(ram_addr_t ram_size,
 target_long initrd_size=0;
 int i=0;
 unsigned int pci_irq_nrs[4] = {1, 2, 3, 4};
-qemu_irq *irqs, *mpic;
+qemu_irq **irqs, *mpic;
 DeviceState *dev;
 struct boot_info *boot_info;
 CPUState *firstenv = NULL;
@@ -247,6 +247,8 @@ static void mpc8544ds_init(ram_addr_t ram_size,
 cpu_model = "e500v2_v30";
 }
 
+irqs = g_malloc0(smp_cpus * sizeof(qemu_irq *));
+irqs[0] = g_malloc0(smp_cpus * sizeof(qemu_irq) * OPENPIC_OUTPUT_NB);
 for (i = 0; i < smp_cpus; i++) {
 qemu_irq *input;
 env = cpu_ppc_init(cpu_model);
@@ -259,6 +261,10 @@ static void mpc8544ds_init(ram_addr_t ram_size,
 firstenv = env;
 }
 
+irqs[i] = irqs[0] + (i * OPENPIC_OUTPUT_NB);
+input = (qemu_irq *)env->irq_inputs;
+irqs[i][OPENPIC_OUTPUT_INT] = input[PPCE500_INPUT_INT];
+irqs[i][OPENPIC_OUTPUT_CINT] = input[PPCE500_INPUT_CINT];
 env->spr[SPR_BOOKE_PIR] = env->cpu_index = i;
 
 /* XXX register timer? */
@@ -283,10 +289,11 @@ static void mpc8544ds_init(ram_addr_t ram_size,
  "mpc8544ds.ram", ram_size));
 
 /* MPIC */
-irqs = g_malloc0(sizeof(qemu_irq) * OPENPIC_OUTPUT_NB);
-irqs[OPENPIC_OUTPUT_INT] = ((qemu_irq *)env->irq_inputs)[PPCE500_INPUT_INT];
-irqs[OPENPIC_OUTPUT_CINT] = ((qemu_irq *)env->irq_inputs)[PPCE500_INPUT_CINT];
-mpic = mpic_init(MPC8544_MPIC_REGS_BASE, 1, &irqs, NULL);
+mpic = mpic_init(MPC8544_MPIC_REGS_BASE, smp_cpus, irqs, NULL);
+
+if (!mpic) {
+cpu_abort(env, "MPIC failed to initialize\n");
+}
 
 /* Serial */
 if (serial_hds[0]) {
-- 
1.6.0.2
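
The allocation pattern in the patch above (one array of per-CPU row pointers, all pointing into a single contiguous block) can be sketched in plain C as follows. This is a minimal sketch, not the QEMU code: `irq_line`, `OUTPUTS_PER_CPU`, `alloc_irq_rows` and `free_irq_rows` are hypothetical names, and standard `calloc` stands in for `g_malloc0`.

```c
#include <stdlib.h>

#define OUTPUTS_PER_CPU 5   /* hypothetical stand-in for OPENPIC_OUTPUT_NB */

typedef void *irq_line;     /* hypothetical stand-in for qemu_irq */

/*
 * Allocate a table of per-CPU row pointers that all point into one
 * contiguous, zero-initialised block. Returns NULL on failure.
 */
static irq_line **alloc_irq_rows(int ncpus)
{
    irq_line **rows;
    int i;

    if (ncpus <= 0) {
        return NULL;
    }
    rows = calloc((size_t)ncpus, sizeof(*rows));
    if (!rows) {
        return NULL;
    }
    rows[0] = calloc((size_t)ncpus * OUTPUTS_PER_CPU, sizeof(**rows));
    if (!rows[0]) {
        free(rows);
        return NULL;
    }
    for (i = 1; i < ncpus; i++) {
        /* Row i starts i * OUTPUTS_PER_CPU entries into the block. */
        rows[i] = rows[0] + (size_t)i * OUTPUTS_PER_CPU;
    }
    return rows;
}

/* Release both allocations made by alloc_irq_rows(). */
static void free_irq_rows(irq_line **rows)
{
    if (rows) {
        free(rows[0]);
        free(rows);
    }
}
```

Keeping the lines in one block means only two allocations are needed regardless of the CPU count, and `rows[i][n]` still reads like a two-dimensional array.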

[Qemu-devel] [PATCH 20/64] PPC: KVM: Remove kvmppc_read_host_property

2011-10-06 Thread Alexander Graf

We just got rid of the last user of kvmppc_read_host_property, so we
can now safely remove it.

Signed-off-by: Alexander Graf 
---
 target-ppc/kvm_ppc.c |   35 ---
 target-ppc/kvm_ppc.h |   11 ---
 2 files changed, 0 insertions(+), 46 deletions(-)

diff --git a/target-ppc/kvm_ppc.c b/target-ppc/kvm_ppc.c
index 26ecc9d..24fc6bc 100644
--- a/target-ppc/kvm_ppc.c
+++ b/target-ppc/kvm_ppc.c
@@ -21,41 +21,6 @@
 static QEMUTimer *kvmppc_timer;
 static unsigned int kvmppc_timer_rate;
 
-#ifdef CONFIG_FDT
-int kvmppc_read_host_property(const char *node_path, const char *prop,
- void *val, size_t len)
-{
-char *path;
-FILE *f;
-int ret = 0;
-int pathlen;
-
-pathlen = snprintf(NULL, 0, "%s/%s/%s", PROC_DEVTREE_PATH, node_path, prop)
-  + 1;
-path = g_malloc(pathlen);
-
-snprintf(path, pathlen, "%s/%s/%s", PROC_DEVTREE_PATH, node_path, prop);
-
-f = fopen(path, "rb");
-if (f == NULL) {
-ret = errno;
-goto free;
-}
-
-len = fread(val, len, 1, f);
-if (len != 1) {
-ret = ferror(f);
-goto close;
-}
-
-close:
-fclose(f);
-free:
-free(path);
-return ret;
-}
-#endif
-
 static void kvmppc_timer_hack(void *opaque)
 {
 qemu_notify_event();
diff --git a/target-ppc/kvm_ppc.h b/target-ppc/kvm_ppc.h
index 7c08c0f..0c659c8 100644
--- a/target-ppc/kvm_ppc.h
+++ b/target-ppc/kvm_ppc.h
@@ -10,17 +10,6 @@
 #define __KVM_PPC_H__
 
 void kvmppc_init(void);
-#ifndef CONFIG_KVM
-static inline int kvmppc_read_host_property(const char *node_path, const char *prop,
-void *val, size_t len)
-{
-assert(0);
-return -ENOSYS;
-}
-#else
-int kvmppc_read_host_property(const char *node_path, const char *prop,
- void *val, size_t len);
-#endif
 
 uint32_t kvmppc_get_tbfreq(void);
 uint64_t kvmppc_get_clockfreq(void);
-- 
1.6.0.2

[Qemu-devel] [PATCH 28/64] device tree: give dt more size

2011-10-06 Thread Alexander Graf

We currently load a device tree blob and then just take its size x2 to
account for modifications we do inside. While this is nice and great,
it fails when we have a small device tree as blob and lots of nodes added
in machine init code.

So for now, just make it 20k bigger than it was before. We may want to
be smarter about this later.

Signed-off-by: Alexander Graf 
---
 device_tree.c |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/device_tree.c b/device_tree.c
index 751538e..dc69232 100644
--- a/device_tree.c
+++ b/device_tree.c
@@ -41,6 +41,7 @@ void *load_device_tree(const char *filename_path, int *sizep)
 }
 
 /* Expand to 2x size to give enough room for manipulation.  */
+dt_size += 1;
 dt_size *= 2;
 /* First allocate space in qemu for device tree */
 fdt = g_malloc0(dt_size);
-- 
1.6.0.2
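
The sizing arithmetic described in the commit message can be sketched as below. The padding constant in the hunk is not legible in this archive copy, so `DT_PAD_BYTES` is an assumption: 10000 bytes added before the doubling would give the "20k bigger" result the message mentions. The function name is hypothetical as well.

```c
#include <stddef.h>

/* Assumed padding: 10000 bytes before doubling gives 20000 extra bytes. */
#define DT_PAD_BYTES 10000

/* Buffer size for a blob of blob_size bytes: pad first, then double. */
static size_t dt_buffer_size(size_t blob_size)
{
    return (blob_size + DT_PAD_BYTES) * 2;
}
```

For a 2028-byte blob this yields 24056 bytes, compared with 4056 bytes when the size is only doubled, which leaves far more room for nodes added at machine init.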

[Qemu-devel] [PATCH 39/64] pseries: More complete WIMG validation in H_ENTER code

2011-10-06 Thread Alexander Graf

From: David Gibson 

Currently our implementation of the H_ENTER hypercall, which inserts a
mapping in the hash page table, assumes that only ordinary memory is ever
mapped, and only permits mapping attribute bits accordingly (WIMG==0010).

However, we intend to start adding emulated IO to the pseries platform
(and real IO with PCI passthrough on kvm) which means this simple test
will no longer suffice.

This patch extends the h_enter validation code to check if the given
address is a RAM address.  If it is, it enforces WIMG==0010; otherwise
it assumes that it is an IO mapping and instead enforces WIMG==010x.

Signed-off-by: David Gibson 
Signed-off-by: Alexander Graf 
---
 hw/spapr.c   |3 ++-
 hw/spapr.h   |1 +
 hw/spapr_hcall.c |   22 ++
 3 files changed, 21 insertions(+), 5 deletions(-)

diff --git a/hw/spapr.c b/hw/spapr.c
index 9eefef9..00aed62 100644
--- a/hw/spapr.c
+++ b/hw/spapr.c
@@ -336,7 +336,8 @@ static void ppc_spapr_init(ram_addr_t ram_size,
 }
 
 /* allocate RAM */
-ram_offset = qemu_ram_alloc(NULL, "ppc_spapr.ram", ram_size);
+spapr->ram_limit = ram_size;
+ram_offset = qemu_ram_alloc(NULL, "ppc_spapr.ram", spapr->ram_limit);
 cpu_register_physical_memory(0, ram_size, ram_offset);
 
 /* allocate hash page table.  For now we always make this 16mb,
diff --git a/hw/spapr.h b/hw/spapr.h
index 009c459..3d21b7a 100644
--- a/hw/spapr.h
+++ b/hw/spapr.h
@@ -10,6 +10,7 @@ typedef struct sPAPREnvironment {
 struct VIOsPAPRBus *vio_bus;
 struct icp_state *icp;
 
+target_phys_addr_t ram_limit;
 void *htab;
 long htab_size;
 target_phys_addr_t fdt_addr, rtas_addr;
diff --git a/hw/spapr_hcall.c b/hw/spapr_hcall.c
index f7ead04..70f853c 100644
--- a/hw/spapr_hcall.c
+++ b/hw/spapr_hcall.c
@@ -99,6 +99,8 @@ static target_ulong h_enter(CPUState *env, sPAPREnvironment *spapr,
 target_ulong pte_index = args[1];
 target_ulong pteh = args[2];
 target_ulong ptel = args[3];
+target_ulong page_shift = 12;
+target_ulong raddr;
 target_ulong i;
 uint8_t *hpte;
 
@@ -111,6 +113,7 @@ static target_ulong h_enter(CPUState *env, sPAPREnvironment *spapr,
 #endif
 if ((ptel & 0xff000) == 0) {
 /* 16M page */
+page_shift = 24;
 /* lowest AVA bit must be 0 for 16M pages */
 if (pteh & 0x80) {
 return H_PARAMETER;
@@ -120,12 +123,23 @@ static target_ulong h_enter(CPUState *env, sPAPREnvironment *spapr,
 }
 }
 
-/* FIXME: bounds check the pa? */
+raddr = (ptel & HPTE_R_RPN) & ~((1ULL << page_shift) - 1);
 
-/* Check WIMG */
-if ((ptel & HPTE_R_WIMG) != HPTE_R_M) {
-return H_PARAMETER;
+if (raddr < spapr->ram_limit) {
+/* Regular RAM - should have WIMG=0010 */
+if ((ptel & HPTE_R_WIMG) != HPTE_R_M) {
+return H_PARAMETER;
+}
+} else {
+/* Looks like an IO address */
+/* FIXME: What WIMG combinations could be sensible for IO?
+ * For now we allow WIMG=010x, but are there others? */
+/* FIXME: Should we check against registered IO addresses? */
+if ((ptel & (HPTE_R_W | HPTE_R_I | HPTE_R_M)) != HPTE_R_I) {
+return H_PARAMETER;
+}
 }
+
 pteh &= ~0x60ULL;
 
 if ((pte_index * HASH_PTE_SIZE_64) & ~env->htab_mask) {
-- 
1.6.0.2
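
The RAM versus IO decision made by the patch above can be restated as a small standalone predicate. This is a sketch, not the QEMU code: the `EX_` bit values are illustrative assumptions (the real definitions live in hw/spapr.h), and `wimg_ok` is a hypothetical helper name.

```c
#include <stdint.h>

/* Illustrative bit values; the real definitions live in hw/spapr.h. */
#define EX_HPTE_R_W    0x40ULL
#define EX_HPTE_R_I    0x20ULL
#define EX_HPTE_R_M    0x10ULL
#define EX_HPTE_R_G    0x08ULL
#define EX_HPTE_R_WIMG (EX_HPTE_R_W | EX_HPTE_R_I | EX_HPTE_R_M | EX_HPTE_R_G)

/* Returns 1 if the WIMG bits in ptel are acceptable for real address raddr. */
static int wimg_ok(uint64_t ptel, uint64_t raddr, uint64_t ram_limit)
{
    if (raddr < ram_limit) {
        /* RAM: exactly WIMG=0010 */
        return (ptel & EX_HPTE_R_WIMG) == EX_HPTE_R_M;
    }
    /* IO: WIMG=010x, so the G bit is left out of the comparison */
    return (ptel & (EX_HPTE_R_W | EX_HPTE_R_I | EX_HPTE_R_M)) == EX_HPTE_R_I;
}
```

Masking G out of the IO comparison is what makes the last bit a "don't care", while the RAM branch compares all four bits.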

[Qemu-devel] [PATCH 29/64] MPC8544DS: Remove CPU nodes

2011-10-06 Thread Alexander Graf

We want to generate the CPU nodes in machine init code, so remove them from
the device tree definition that we precompile.

Signed-off-by: Alexander Graf 
---
 pc-bios/mpc8544ds.dtb |  Bin 2277 -> 2028 bytes
 pc-bios/mpc8544ds.dts |   12 
 2 files changed, 0 insertions(+), 12 deletions(-)

diff --git a/pc-bios/mpc8544ds.dtb b/pc-bios/mpc8544ds.dtb
index ae318b1fe83846cc2e133951a3666fcfcdf87f79..c6d302153c7407d5d0127be29b0c35f80e47f8fb 100644
GIT binary patch
delta 424
zcmaDV_=aEO0`I@K3=HgV7#J8V7#P?t0BH>%76f7eAO-?P8KC%#jT*{~lRq;qVGNu+
zgGpO80wTx2Se#mvnV92XVrpOj5@H5o79dUoaVFO=n@yHu7E~<+@qhp%%K^lVK&%DC
zOh63N(K9)OS(!0yas{(Dk?LOn)z6*G!y?7RuxYXeOPCPDVW4@8NM@d#Jb@*NiQ(ep
zFD&Xnqh(mFTTP3F~Xi9v};53e2?3j(nK5CZ{YE>PTIqlPkLJ!3$Ad1_IB
zvyO$SiHU;&Seh9~vH-DTazQCb0LJ$Paex5E4+OFmkod`He4yqApb%Vr6B@stfk6!<
z4_B}V%tP=uK>19QJs6iW9+>=rQJZnmWEm!T#^aN1n7mbC@*oFs0P!Ut)&gQCAci^e
z?&LL0%0TrOh*s~wtStKu$plcqfdJG*M&`*4%wa-|B0wQVBw?w^FPM{<7?mdbu&4v=
zD`Bx>Vl;
#size-cells = <0>;
-
-   PowerPC,8544@0 {
-   device_type = "cpu";
-   reg = <0x0>;
-   d-cache-line-size = <32>;   // 32 bytes
-   i-cache-line-size = <32>;   // 32 bytes
-   d-cache-size = <0x8000>;// L1, 32K
-   i-cache-size = <0x8000>;// L1, 32K
-   timebase-frequency = <0>;
-   bus-frequency = <0>;
-   clock-frequency = <0>;
-   };
};
 
memory {
-- 
1.6.0.2

[Qemu-devel] [PATCH 26/64] device tree: add add_subnode command

2011-10-06 Thread Alexander Graf

We want to be able to create subnodes in our device tree, so export
fdt_add_subnode through the qemu device tree abstraction framework.

Signed-off-by: Alexander Graf 
---
 device_tree.c |   24 
 device_tree.h |1 +
 2 files changed, 25 insertions(+), 0 deletions(-)

diff --git a/device_tree.c b/device_tree.c
index 23e89e3..f4a78c8 100644
--- a/device_tree.c
+++ b/device_tree.c
@@ -118,3 +118,27 @@ int qemu_devtree_nop_node(void *fdt, const char *node_path)
 
 return fdt_nop_node(fdt, offset);
 }
+
+int qemu_devtree_add_subnode(void *fdt, const char *name)
+{
+int offset;
+char *dupname = g_strdup(name);
+char *basename = strrchr(dupname, '/');
+int retval;
+
+if (!basename) {
+return -1;
+}
+
+basename[0] = '\0';
+basename++;
+
+offset = fdt_path_offset(fdt, dupname);
+if (offset < 0) {
+return offset;
+}
+
+retval = fdt_add_subnode(fdt, offset, basename);
+g_free(dupname);
+return retval;
+}
diff --git a/device_tree.h b/device_tree.h
index 76fce5f..4378685 100644
--- a/device_tree.h
+++ b/device_tree.h
@@ -23,5 +23,6 @@ int qemu_devtree_setprop_cell(void *fdt, const char *node_path,
 int qemu_devtree_setprop_string(void *fdt, const char *node_path,
 const char *property, const char *string);
 int qemu_devtree_nop_node(void *fdt, const char *node_path);
+int qemu_devtree_add_subnode(void *fdt, const char *name);
 
 #endif /* __DEVICE_TREE_H__ */
-- 
1.6.0.2
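
The path handling in qemu_devtree_add_subnode() above splits a full node path at its last '/' into a parent path and a node name. The early-return paths in the hunk appear to return before g_free(dupname) is reached, so the following standalone sketch shows one way to do the same split with cleanup on every path. It is an illustration only: `split_node_path` is a hypothetical helper, it uses standard `malloc`/`free` instead of glib, and treating a top-level node's parent as "/" is a design choice of the sketch, not something the patch does.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Split "/parent/child" into newly allocated parent and child strings.
 * Returns 0 on success and -1 on failure; nothing is leaked on failure.
 * On success the caller frees *parent and *child.
 */
static int split_node_path(const char *path, char **parent, char **child)
{
    const char *slash = strrchr(path, '/');
    size_t plen;

    if (!slash || slash[1] == '\0') {
        return -1;  /* no separator, or empty node name */
    }

    /* A node directly under the root gets "/" as its parent. */
    plen = (slash == path) ? 1 : (size_t)(slash - path);

    *parent = malloc(plen + 1);
    *child = malloc(strlen(slash + 1) + 1);
    if (!*parent || !*child) {
        free(*parent);
        free(*child);
        *parent = *child = NULL;
        return -1;
    }

    memcpy(*parent, path, plen);
    (*parent)[plen] = '\0';
    strcpy(*child, slash + 1);
    return 0;
}
```

A caller would pass the parent string to the path lookup and the child string to the subnode creation call, then free both regardless of the outcome.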

[Qemu-devel] [PATCH 30/64] MPC8544DS: Generate CPU nodes on init

2011-10-06 Thread Alexander Graf

With this patch, we generate CPU nodes in the machine initialization, giving
us the freedom to generate as many nodes as we want and as the machine supports,
but only those.

This is a first step towards a much cleaner device tree generation
infrastructure, where we would not require precompiled dtb blobs anymore.

Signed-off-by: Alexander Graf 
---
 hw/ppce500_mpc8544ds.c |   46 +-
 1 files changed, 33 insertions(+), 13 deletions(-)

diff --git a/hw/ppce500_mpc8544ds.c b/hw/ppce500_mpc8544ds.c
index a3e1ce4..dfa8034 100644
--- a/hw/ppce500_mpc8544ds.c
+++ b/hw/ppce500_mpc8544ds.c
@@ -123,23 +123,43 @@ static int mpc8544_load_device_tree(CPUState *env,
  hypercall, sizeof(hypercall));
 }
 
-for (i = 0; i < smp_cpus; i++) {
+/* We need to generate the cpu nodes in reverse order, so Linux can pick
+   the first node as boot node and be happy */
+for (i = smp_cpus - 1; i >= 0; i--) {
 char cpu_name[128];
-uint64_t cpu_release_addr[] = {
-cpu_to_be64(MPC8544_SPIN_BASE + (i * 0x20))
-};
+uint64_t cpu_release_addr = cpu_to_be64(MPC8544_SPIN_BASE + (i * 0x20));
+
+for (env = first_cpu; env != NULL; env = env->next_cpu) {
+if (env->cpu_index == i) {
+break;
+}
+}
+
+if (!env) {
+continue;
+}
 
-snprintf(cpu_name, sizeof(cpu_name), "/cpus/PowerPC,8544@%x", i);
+snprintf(cpu_name, sizeof(cpu_name), "/cpus/PowerPC,8544@%x", env->cpu_index);
+qemu_devtree_add_subnode(fdt, cpu_name);
 qemu_devtree_setprop_cell(fdt, cpu_name, "clock-frequency", clock_freq);
 qemu_devtree_setprop_cell(fdt, cpu_name, "timebase-frequency", tb_freq);
-qemu_devtree_setprop(fdt, cpu_name, "cpu-release-addr",
- cpu_release_addr, sizeof(cpu_release_addr));
-}
-
-for (i = smp_cpus; i < 32; i++) {
-char cpu_name[128];
-snprintf(cpu_name, sizeof(cpu_name), "/cpus/PowerPC,8544@%x", i);
-qemu_devtree_nop_node(fdt, cpu_name);
+qemu_devtree_setprop_string(fdt, cpu_name, "device_type", "cpu");
+qemu_devtree_setprop_cell(fdt, cpu_name, "reg", env->cpu_index);
+qemu_devtree_setprop_cell(fdt, cpu_name, "d-cache-line-size",
+  env->dcache_line_size);
+qemu_devtree_setprop_cell(fdt, cpu_name, "i-cache-line-size",
+  env->icache_line_size);
+qemu_devtree_setprop_cell(fdt, cpu_name, "d-cache-size", 0x8000);
+qemu_devtree_setprop_cell(fdt, cpu_name, "i-cache-size", 0x8000);
+qemu_devtree_setprop_cell(fdt, cpu_name, "bus-frequency", 0);
+if (env->cpu_index) {
+qemu_devtree_setprop_string(fdt, cpu_name, "status", "disabled");
+qemu_devtree_setprop_string(fdt, cpu_name, "enable-method", "spin-table");
+qemu_devtree_setprop(fdt, cpu_name, "cpu-release-addr",
+ &cpu_release_addr, sizeof(cpu_release_addr));
+} else {
+qemu_devtree_setprop_string(fdt, cpu_name, "status", "okay");
+}
 }
 
 ret = rom_add_blob_fixed(BINARY_DEVICE_TREE_FILE, fdt, fdt_size, addr);
-- 
1.6.0.2
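
The per-CPU values that the loop above writes into each node can be sketched as three small helpers. This is a hedged illustration: `EX_SPIN_BASE` is a hypothetical placeholder for MPC8544_SPIN_BASE (its real value is not shown in this patch), and the helper names are made up. The 0x20 stride, the node name format and the status strings are taken from the hunk.

```c
#include <stdint.h>
#include <stdio.h>

#define EX_SPIN_BASE 0xEF000000ULL /* hypothetical stand-in for MPC8544_SPIN_BASE */
#define EX_SPIN_SIZE 0x20          /* per-CPU stride used in the patch */

/* Format the node path for CPU idx; returns snprintf()'s result. */
static int cpu_node_path(char *buf, size_t len, int idx)
{
    return snprintf(buf, len, "/cpus/PowerPC,8544@%x", idx);
}

/* Address of the spin-table entry for CPU idx, in host byte order. */
static uint64_t cpu_release_addr(int idx)
{
    return EX_SPIN_BASE + (uint64_t)idx * EX_SPIN_SIZE;
}

/* Only CPU 0 starts enabled; the others wait on the spin table. */
static const char *cpu_status(int idx)
{
    return idx == 0 ? "okay" : "disabled";
}
```

Note that the patch converts the address with cpu_to_be64() before storing it, since device tree properties are big-endian; the sketch leaves that conversion out.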

[Qemu-devel] [PATCH 37/64] pseries: Add a phandle to the xicp interrupt controller device tree node

2011-10-06 Thread Alexander Graf

From: David Gibson 

Future devices we will be adding to the pseries machine (e.g. PCI) will
need nodes in the device tree which explicitly reference the top-level
interrupt controller via interrupt-parent or interrupt-map properties.

In order to do this, the interrupt controller node needs an assigned
phandle.  This patch adds the appropriate property, in preparation.

Signed-off-by: David Gibson 
Signed-off-by: Alexander Graf 
---
 hw/spapr.c |5 +
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/hw/spapr.c b/hw/spapr.c
index 760e323..bb00ae6 100644
--- a/hw/spapr.c
+++ b/hw/spapr.c
@@ -57,6 +57,8 @@
 #define MAX_CPUS                256
 #define XICS_IRQS  1024
 
+#define PHANDLE_XICP0x
+
 sPAPREnvironment *spapr;
 
 static void *spapr_create_fdt_skel(const char *cpu_model,
@@ -202,6 +204,9 @@ static void *spapr_create_fdt_skel(const char *cpu_model,
 _FDT((fdt_property(fdt, "ibm,interrupt-server-ranges",
interrupt_server_ranges_prop,
   sizeof(interrupt_server_ranges_prop))));
+_FDT((fdt_property_cell(fdt, "#interrupt-cells", 2)));
+_FDT((fdt_property_cell(fdt, "linux,phandle", PHANDLE_XICP)));
+_FDT((fdt_property_cell(fdt, "phandle", PHANDLE_XICP)));
 
 _FDT((fdt_end_node(fdt)));
 
-- 
1.6.0.2

[Qemu-devel] [PATCH 54/64] ppc: move ADB stuff from ppc_mac.h to adb.h

2011-10-06 Thread Alexander Graf

From: Laurent Vivier 

Allow ADB to be used by non-PPC Macintosh machines.

Signed-off-by: Laurent Vivier 
Signed-off-by: Alexander Graf 
---
 hw/adb.c  |2 +-
 hw/adb.h  |   67 +
 hw/cuda.c |1 +
 hw/ppc_mac.h  |   42 -
 hw/ppc_newworld.c |1 +
 hw/ppc_oldworld.c |1 +
 6 files changed, 71 insertions(+), 43 deletions(-)
 create mode 100644 hw/adb.h

diff --git a/hw/adb.c b/hw/adb.c
index 8dedbf8..aa15f55 100644
--- a/hw/adb.c
+++ b/hw/adb.c
@@ -22,7 +22,7 @@
  * THE SOFTWARE.
  */
 #include "hw.h"
-#include "ppc_mac.h"
+#include "adb.h"
 #include "console.h"
 
 /* debug ADB */
diff --git a/hw/adb.h b/hw/adb.h
new file mode 100644
index 000..b2a591c
--- /dev/null
+++ b/hw/adb.h
@@ -0,0 +1,67 @@
+/*
+ * QEMU ADB emulation shared definitions and prototypes
+ *
+ * Copyright (c) 2004-2007 Fabrice Bellard
+ * Copyright (c) 2007 Jocelyn Mayer
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#if !defined(__ADB_H__)
+#define __ADB_H__
+
+#define MAX_ADB_DEVICES 16
+
+#define ADB_MAX_OUT_LEN 16
+
+typedef struct ADBDevice ADBDevice;
+
+/* buf = NULL means polling */
+typedef int ADBDeviceRequest(ADBDevice *d, uint8_t *buf_out,
+  const uint8_t *buf, int len);
+typedef int ADBDeviceReset(ADBDevice *d);
+
+struct ADBDevice {
+struct ADBBusState *bus;
+int devaddr;
+int handler;
+ADBDeviceRequest *devreq;
+ADBDeviceReset *devreset;
+void *opaque;
+};
+
+typedef struct ADBBusState {
+ADBDevice devices[MAX_ADB_DEVICES];
+int nb_devices;
+int poll_index;
+} ADBBusState;
+
+int adb_request(ADBBusState *s, uint8_t *buf_out,
+const uint8_t *buf, int len);
+int adb_poll(ADBBusState *s, uint8_t *buf_out);
+
+ADBDevice *adb_register_device(ADBBusState *s, int devaddr,
+   ADBDeviceRequest *devreq,
+   ADBDeviceReset *devreset,
+   void *opaque);
+void adb_kbd_init(ADBBusState *bus);
+void adb_mouse_init(ADBBusState *bus);
+
+extern ADBBusState adb_bus;
+#endif /* !defined(__ADB_H__) */
diff --git a/hw/cuda.c b/hw/cuda.c
index 5c92d81..6f05975 100644
--- a/hw/cuda.c
+++ b/hw/cuda.c
@@ -24,6 +24,7 @@
  */
 #include "hw.h"
 #include "ppc_mac.h"
+#include "adb.h"
 #include "qemu-timer.h"
 #include "sysemu.h"
 
diff --git a/hw/ppc_mac.h b/hw/ppc_mac.h
index 7351bb6..af75e45 100644
--- a/hw/ppc_mac.h
+++ b/hw/ppc_mac.h
@@ -77,46 +77,4 @@ void macio_nvram_setup_bar(MacIONVRAMState *s, MemoryRegion *bar,
 void pmac_format_nvram_partition (MacIONVRAMState *nvr, int len);
 uint32_t macio_nvram_read (void *opaque, uint32_t addr);
 void macio_nvram_write (void *opaque, uint32_t addr, uint32_t val);
-
-/* adb.c */
-
-#define MAX_ADB_DEVICES 16
-
-#define ADB_MAX_OUT_LEN 16
-
-typedef struct ADBDevice ADBDevice;
-
-/* buf = NULL means polling */
-typedef int ADBDeviceRequest(ADBDevice *d, uint8_t *buf_out,
-  const uint8_t *buf, int len);
-typedef int ADBDeviceReset(ADBDevice *d);
-
-struct ADBDevice {
-struct ADBBusState *bus;
-int devaddr;
-int handler;
-ADBDeviceRequest *devreq;
-ADBDeviceReset *devreset;
-void *opaque;
-};
-
-typedef struct ADBBusState {
-ADBDevice devices[MAX_ADB_DEVICES];
-int nb_devices;
-int poll_index;
-} ADBBusState;
-
-int adb_request(ADBBusState *s, uint8_t *buf_out,
-const uint8_t *buf, int len);
-int adb_poll(ADBBusState *s, uint8_t *buf_out);
-
-ADBDevice *adb_register_device(ADBBusState *s, int devaddr,
-   ADBDeviceRequest *devreq,
-   ADBDeviceReset *devreset,
-   void *opaque);
-void adb_kbd_init(ADBBusState *bus);
-void adb_mouse_init(ADBBusState *bus);
-
-extern ADBBusState adb_bus;
-
 #endif /* !defined(__PPC_MAC_H__) */
diff --git a

[Qemu-devel] [PATCH 35/64] PPC: SPAPR: Use KVM function for time info

2011-10-06 Thread Alexander Graf

One of the things we can't fake on PPC is the timer speed. So
we need to extract the frequency information from the host and
put it back into the guest device tree.

Luckily, we already have functions for that from the non-pseries
targets, so all we need to do is to connect the dots and the guest
suddenly gets to know its real timer speeds.

Signed-off-by: Alexander Graf 
---
 hw/spapr.c |8 
 1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/hw/spapr.c b/hw/spapr.c
index c5c9a95..760e323 100644
--- a/hw/spapr.c
+++ b/hw/spapr.c
@@ -140,6 +140,8 @@ static void *spapr_create_fdt_skel(const char *cpu_model,
 char *nodename;
 uint32_t segs[] = {cpu_to_be32(28), cpu_to_be32(40),
0x, 0x};
+uint32_t tbfreq = kvm_enabled() ? kvmppc_get_tbfreq() : TIMEBASE_FREQ;
+uint32_t cpufreq = kvm_enabled() ? kvmppc_get_clockfreq() : 10;
 
 if (asprintf(&nodename, "%s@%x", modelname, index) < 0) {
 fprintf(stderr, "Allocation failure\n");
@@ -158,10 +160,8 @@ static void *spapr_create_fdt_skel(const char *cpu_model,
 env->dcache_line_size)));
 _FDT((fdt_property_cell(fdt, "icache-block-size",
 env->icache_line_size)));
-_FDT((fdt_property_cell(fdt, "timebase-frequency", TIMEBASE_FREQ)));
-/* Hardcode CPU frequency for now.  It's kind of arbitrary on
- * full emu, for kvm we should copy it from the host */
-_FDT((fdt_property_cell(fdt, "clock-frequency", 10)));
+_FDT((fdt_property_cell(fdt, "timebase-frequency", tbfreq)));
+_FDT((fdt_property_cell(fdt, "clock-frequency", cpufreq)));
 _FDT((fdt_property_cell(fdt, "ibm,slb-size", env->slb_nr)));
 _FDT((fdt_property(fdt, "ibm,pft-size",
   pft_size_prop, sizeof(pft_size_prop))));
-- 
1.6.0.2

[Qemu-devel] [PATCH 53/64] openpic: Unfold write_IRQreg

2011-10-06 Thread Alexander Graf

The helper function write_IRQreg was always called with a specific argument on
the type of register to access. Inside the function we were simply doing a
switch on that constant argument again. It's a lot easier to just unfold this
into two separate functions and call each individually.

Reported-by: Blue Swirl 
Signed-off-by: Alexander Graf 
---
 hw/openpic.c |   79 +++--
 1 files changed, 37 insertions(+), 42 deletions(-)

diff --git a/hw/openpic.c b/hw/openpic.c
index fbd8837..43b8f27 100644
--- a/hw/openpic.c
+++ b/hw/openpic.c
@@ -482,30 +482,25 @@ static inline uint32_t read_IRQreg_ipvp(openpic_t *opp, int n_IRQ)
 return opp->src[n_IRQ].ipvp;
 }
 
-static inline void write_IRQreg (openpic_t *opp, int n_IRQ,
- uint32_t reg, uint32_t val)
+static inline void write_IRQreg_ide(openpic_t *opp, int n_IRQ, uint32_t val)
 {
 uint32_t tmp;
 
-switch (reg) {
-case IRQ_IPVP:
-/* NOTE: not fully accurate for special IRQs, but simple and
-   sufficient */
-/* ACTIVITY bit is read-only */
-opp->src[n_IRQ].ipvp =
-(opp->src[n_IRQ].ipvp & 0x4000) |
-(val & 0x800F00FF);
-openpic_update_irq(opp, n_IRQ);
-DPRINTF("Set IPVP %d to 0x%08x -> 0x%08x\n",
-n_IRQ, val, opp->src[n_IRQ].ipvp);
-break;
-case IRQ_IDE:
-tmp = val & 0xC000;
-tmp |= val & ((1ULL << MAX_CPU) - 1);
-opp->src[n_IRQ].ide = tmp;
-DPRINTF("Set IDE %d to 0x%08x\n", n_IRQ, opp->src[n_IRQ].ide);
-break;
-}
+tmp = val & 0xC000;
+tmp |= val & ((1ULL << MAX_CPU) - 1);
+opp->src[n_IRQ].ide = tmp;
+DPRINTF("Set IDE %d to 0x%08x\n", n_IRQ, opp->src[n_IRQ].ide);
+}
+
+static inline void write_IRQreg_ipvp(openpic_t *opp, int n_IRQ, uint32_t val)
+{
+/* NOTE: not fully accurate for special IRQs, but simple and sufficient */
+/* ACTIVITY bit is read-only */
+opp->src[n_IRQ].ipvp = (opp->src[n_IRQ].ipvp & 0x4000)
+ | (val & 0x800F00FF);
+openpic_update_irq(opp, n_IRQ);
+DPRINTF("Set IPVP %d to 0x%08x -> 0x%08x\n", n_IRQ, val,
+opp->src[n_IRQ].ipvp);
 }
 
 #if 0 // Code provision for Intel model
@@ -535,10 +530,10 @@ static void write_doorbell_register (penpic_t *opp, int n_dbl,
 {
 switch (offset) {
 case DBL_IVPR_OFFSET:
-write_IRQreg(opp, IRQ_DBL0 + n_dbl, IRQ_IPVP, value);
+write_IRQreg_ipvp(opp, IRQ_DBL0 + n_dbl, value);
 break;
 case DBL_IDE_OFFSET:
-write_IRQreg(opp, IRQ_DBL0 + n_dbl, IRQ_IDE, value);
+write_IRQreg_ide(opp, IRQ_DBL0 + n_dbl, value);
 break;
 case DBL_DMR_OFFSET:
 opp->doorbells[n_dbl].dmr = value;
@@ -576,10 +571,10 @@ static void write_mailbox_register (openpic_t *opp, int n_mbx,
 opp->mailboxes[n_mbx].mbr = value;
 break;
 case MBX_IVPR_OFFSET:
-write_IRQreg(opp, IRQ_MBX0 + n_mbx, IRQ_IPVP, value);
+write_IRQreg_ipvp(opp, IRQ_MBX0 + n_mbx, value);
 break;
 case MBX_DMR_OFFSET:
-write_IRQreg(opp, IRQ_MBX0 + n_mbx, IRQ_IDE, value);
+write_IRQreg_ide(opp, IRQ_MBX0 + n_mbx, value);
 break;
 }
 }
@@ -636,7 +631,7 @@ static void openpic_gbl_write (void *opaque, target_phys_addr_t addr, uint32_t v
 {
 int idx;
 idx = (addr - 0x10A0) >> 4;
-write_IRQreg(opp, opp->irq_ipi0 + idx, IRQ_IPVP, val);
+write_IRQreg_ipvp(opp, opp->irq_ipi0 + idx, val);
 }
 break;
 case 0x10E0: /* SPVE */
@@ -729,10 +724,10 @@ static void openpic_timer_write (void *opaque, uint32_t addr, uint32_t val)
 opp->timers[idx].tibc = val;
 break;
 case 0x20: /* TIVP */
-write_IRQreg(opp, opp->irq_tim0 + idx, IRQ_IPVP, val);
+write_IRQreg_ipvp(opp, opp->irq_tim0 + idx, val);
 break;
 case 0x30: /* TIDE */
-write_IRQreg(opp, opp->irq_tim0 + idx, IRQ_IDE, val);
+write_IRQreg_ide(opp, opp->irq_tim0 + idx, val);
 break;
 }
 }
@@ -782,10 +777,10 @@ static void openpic_src_write (void *opaque, uint32_t addr, uint32_t val)
 idx = addr >> 5;
 if (addr & 0x10) {
 /* EXDE / IFEDE / IEEDE */
-write_IRQreg(opp, idx, IRQ_IDE, val);
+write_IRQreg_ide(opp, idx, val);
 } else {
 /* EXVP / IFEVP / IEEVP */
-write_IRQreg(opp, idx, IRQ_IPVP, val);
+write_IRQreg_ipvp(opp, idx, val);
 }
 }
 
@@ -835,8 +830,8 @@ static void openpic_cpu_write_internal(void *opaque, target_phys_addr_t addr,
 case 0x70:
 idx = (addr - 0x40) >> 4;
 /* we use IDE as mask which CPUs to deliver the IPI to still. */
-write_IRQreg(opp, opp->irq_ipi0 + idx, IRQ_IDE,
- opp->src[opp->irq_ipi0 + idx].ide | val);
+write_IRQreg_ide(opp, opp->irq_ipi0 + idx,
+ opp->src[opp

[Qemu-devel] [PATCH 46/64] ppc: booke206: use MAV=2.0 TSIZE definition, fix 4G pages

2011-10-06 Thread Alexander Graf

From: Scott Wood 

This definition is backward compatible with MAV=1.0 as long as
the guest does not set reserved bits in MAS1/MAS4.

Also, fix the shift in booke206_tlb_to_page_size -- it's the base
that should be able to hold a 4G page size, not the shift count.

Signed-off-by: Scott Wood 
Signed-off-by: Alexander Graf 
---
 hw/ppce500_mpc8544ds.c |2 +-
 target-ppc/cpu.h   |4 ++--
 target-ppc/helper.c|5 +++--
 3 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/hw/ppce500_mpc8544ds.c b/hw/ppce500_mpc8544ds.c
index 61151d8..8095516 100644
--- a/hw/ppce500_mpc8544ds.c
+++ b/hw/ppce500_mpc8544ds.c
@@ -174,7 +174,7 @@ out:
 /* Create -kernel TLB entries for BookE, linearly spanning 256MB.  */
 static inline target_phys_addr_t booke206_page_size_to_tlb(uint64_t size)
 {
-return (ffs(size >> 10) - 1) >> 1;
+return ffs(size >> 10) - 1;
 }
 
 static void mmubooke_create_initial_mapping(CPUState *env,
diff --git a/target-ppc/cpu.h b/target-ppc/cpu.h
index 5200e6e..32706df 100644
--- a/target-ppc/cpu.h
+++ b/target-ppc/cpu.h
@@ -667,8 +667,8 @@ enum {
 #define MAS0_ATSEL_TLB 0
 #define MAS0_ATSEL_LRAT    MAS0_ATSEL
 
-#define MAS1_TSIZE_SHIFT   8
-#define MAS1_TSIZE_MASK    (0xf << MAS1_TSIZE_SHIFT)
+#define MAS1_TSIZE_SHIFT   7
+#define MAS1_TSIZE_MASK    (0x1f << MAS1_TSIZE_SHIFT)
 
 #define MAS1_TS_SHIFT  12
 #define MAS1_TS            (1 << MAS1_TS_SHIFT)
diff --git a/target-ppc/helper.c b/target-ppc/helper.c
index 4b3731e..6339be3 100644
--- a/target-ppc/helper.c
+++ b/target-ppc/helper.c
@@ -1293,7 +1293,7 @@ target_phys_addr_t booke206_tlb_to_page_size(CPUState *env, ppcmas_tlb_t *tlb)
 {
 uint32_t tlbncfg;
 int tlbn = booke206_tlbm_to_tlbn(env, tlb);
-target_phys_addr_t tlbm_size;
+int tlbm_size;
 
 tlbncfg = env->spr[SPR_BOOKE_TLB0CFG + tlbn];
 
@@ -1301,9 +1301,10 @@ target_phys_addr_t booke206_tlb_to_page_size(CPUState *env, ppcmas_tlb_t *tlb)
 tlbm_size = (tlb->mas1 & MAS1_TSIZE_MASK) >> MAS1_TSIZE_SHIFT;
 } else {
 tlbm_size = (tlbncfg & TLBnCFG_MINSIZE) >> TLBnCFG_MINSIZE_SHIFT;
+tlbm_size <<= 1;
 }
 
-return (1 << (tlbm_size << 1)) << 10;
+return 1024ULL << tlbm_size;
 }
 
 /* TLB check function for MAS based SoftTLBs */
-- 
1.6.0.2
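
The TSIZE arithmetic changed by this patch can be checked with a small standalone sketch. Under the MAV=2.0 definition used above, the field is 5 bits wide at shift 7 and the page size is 1 KiB times 2^TSIZE. The `EX_` macros and function names below are hypothetical, and the loop in `bytes_to_tsize` is a portable replacement for the `ffs()` call in the patch.

```c
#include <stdint.h>

#define EX_TSIZE_SHIFT 7
#define EX_TSIZE_MASK  (0x1fu << EX_TSIZE_SHIFT)

/* MAV 2.0: page size in bytes is 1 KiB * 2^tsize, computed in 64 bits. */
static uint64_t tsize_to_bytes(unsigned tsize)
{
    return 1024ULL << tsize;
}

/* Inverse for power-of-two sizes of at least 1 KiB: log2(size / 1 KiB). */
static unsigned bytes_to_tsize(uint64_t size)
{
    unsigned tsize = 0;

    size >>= 10;
    while (size > 1) {
        size >>= 1;
        tsize++;
    }
    return tsize;
}

/* Extract the 5-bit TSIZE field from a MAS1 value. */
static unsigned mas1_tsize(uint32_t mas1)
{
    return (mas1 & EX_TSIZE_MASK) >> EX_TSIZE_SHIFT;
}
```

A MAV=1.0 guest writes a 4-bit value t at shift 8, meaning 4^t KiB. Read through the 5-bit field at shift 7 with the reserved low bit clear, that value appears as 2t, and 2^(2t) KiB is the same size, which is the backward compatibility the commit message refers to. Starting from `1024ULL` also keeps a 4G page (TSIZE 22) from overflowing a 32-bit intermediate.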

[Qemu-devel] [PATCH 57/64] KVM: Update kernel headers

2011-10-06 Thread Alexander Graf

Removes ABI-breaking HIOR parts - KVM patch to follow.

Signed-off-by: Alexander Graf 
---
 linux-headers/asm-powerpc/kvm.h |8 
 linux-headers/linux/kvm.h   |1 -
 2 files changed, 0 insertions(+), 9 deletions(-)

diff --git a/linux-headers/asm-powerpc/kvm.h b/linux-headers/asm-powerpc/kvm.h
index 28eecf0..25964ee 100644
--- a/linux-headers/asm-powerpc/kvm.h
+++ b/linux-headers/asm-powerpc/kvm.h
@@ -149,12 +149,6 @@ struct kvm_regs {
 #define KVM_SREGS_E_UPDATE_DBSR (1 << 3)
 
 /*
- * Book3S special bits to indicate contents in the struct by maintaining
- * backwards compatibility with older structs. If adding a new field,
- * please make sure to add a flag for that new field */
-#define KVM_SREGS_S_HIOR   (1 << 0)
-
-/*
  * In KVM_SET_SREGS, reserved/pad fields must be left untouched from a
  * previous KVM_GET_REGS.
  *
@@ -179,8 +173,6 @@ struct kvm_sregs {
__u64 ibat[8];
__u64 dbat[8];
} ppc32;
-   __u64 flags; /* KVM_SREGS_S_ */
-   __u64 hior;
} s;
struct {
union {
diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h
index 8bb6cde..6f5095c 100644
--- a/linux-headers/linux/kvm.h
+++ b/linux-headers/linux/kvm.h
@@ -554,7 +554,6 @@ struct kvm_ppc_pvinfo {
 #define KVM_CAP_PPC_SMT 64
 #define KVM_CAP_PPC_RMA 65
 #define KVM_CAP_MAX_VCPUS 66   /* returns max vcpus per vm */
-#define KVM_CAP_PPC_HIOR 67
 #define KVM_CAP_PPC_PAPR 68
 #define KVM_CAP_SW_TLB 69
 
-- 
1.6.0.2

[Qemu-devel] [PATCH 21/64] PPC: KVM: Add stubs for kvm helper functions

2011-10-06 Thread Alexander Graf

We have a bunch of helper functions that have no stubs for builds where
CONFIG_KVM is disabled. That hasn't bitten us so far, because gcc can
optimize the calls out pretty well, but we should really provide them.

Signed-off-by: Alexander Graf 

---

v1 -> v2:

   - use uint64_t for clockfreq
---
 target-ppc/kvm_ppc.h |   26 ++
 1 files changed, 26 insertions(+), 0 deletions(-)

diff --git a/target-ppc/kvm_ppc.h b/target-ppc/kvm_ppc.h
index 0c659c8..76f98d9 100644
--- a/target-ppc/kvm_ppc.h
+++ b/target-ppc/kvm_ppc.h
@@ -11,11 +11,37 @@
 
 void kvmppc_init(void);
 
+#ifdef CONFIG_KVM
+
 uint32_t kvmppc_get_tbfreq(void);
 uint64_t kvmppc_get_clockfreq(void);
 int kvmppc_get_hypercall(CPUState *env, uint8_t *buf, int buf_len);
 int kvmppc_set_interrupt(CPUState *env, int irq, int level);
 
+#else
+
+static inline uint32_t kvmppc_get_tbfreq(void)
+{
+    return 0;
+}
+
+static inline uint64_t kvmppc_get_clockfreq(void)
+{
+    return 0;
+}
+
+static inline int kvmppc_get_hypercall(CPUState *env, uint8_t *buf, int buf_len)
+{
+    return -1;
+}
+
+static inline int kvmppc_set_interrupt(CPUState *env, int irq, int level)
+{
+    return -1;
+}
+
+#endif
+
 #ifndef CONFIG_KVM
 #define kvmppc_eieio() do { } while (0)
 #else
-- 
1.6.0.2
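
The stub pattern in the patch above can be tried outside of QEMU. The following is a minimal sketch, assuming a made-up build switch CONFIG_DEMO_ACCEL in place of CONFIG_KVM; demo_get_tbfreq() and demo_pick_tbfreq() are hypothetical names that exist only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical build switch standing in for CONFIG_KVM. Leave it
 * undefined to exercise the stub path. */
/* #define CONFIG_DEMO_ACCEL 1 */

#ifdef CONFIG_DEMO_ACCEL
/* The real implementation would live in another translation unit. */
uint32_t demo_get_tbfreq(void);
#else
/* Stub: reports "unknown" so callers need no #ifdef of their own. */
static inline uint32_t demo_get_tbfreq(void)
{
    return 0;
}
#endif

/* The caller compiles unchanged in both configurations. */
static uint32_t demo_pick_tbfreq(uint32_t fallback)
{
    uint32_t freq = demo_get_tbfreq();

    assert(fallback != 0);
    return freq ? freq : fallback;
}
```

With the switch undefined, demo_pick_tbfreq(400000000) returns the fallback because the stub reports 0. The point of the design is that the link no longer depends on the compiler eliminating dead calls.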

[Qemu-devel] [PATCH 42/64] pseries: use macro for firmware filename

2011-10-06 Thread Alexander Graf

From: Nishanth Aravamudan 

For some time we've had a nicely defined macro with the filename for our
firmware image.  However we didn't actually use it in the place we're
supposed to.  This patch fixes it.

Signed-off-by: Nishanth Aravamudan 
Signed-off-by: David Gibson 
Signed-off-by: Alexander Graf 
---
 hw/spapr.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/hw/spapr.c b/hw/spapr.c
index 00aed62..91953cf 100644
--- a/hw/spapr.c
+++ b/hw/spapr.c
@@ -442,7 +442,7 @@ static void ppc_spapr_init(ram_addr_t ram_size,
 "%ldM guest RAM\n", MIN_RAM_SLOF);
 exit(1);
 }
-filename = qemu_find_file(QEMU_FILE_TYPE_BIOS, "slof.bin");
+filename = qemu_find_file(QEMU_FILE_TYPE_BIOS, FW_FILE_NAME);
 fw_size = load_image_targphys(filename, 0, FW_MAX_SIZE);
 if (fw_size < 0) {
 hw_error("qemu: could not load LPAR rtas '%s'\n", filename);
-- 
1.6.0.2

[Qemu-devel] [PATCH 31/64] PPC: E500: Bump CPU count to 15

2011-10-06 Thread Alexander Graf

Now that we have everything in place, make the machine description
aware of the fact that we can now handle 15 virtual CPUs!

Signed-off-by: Alexander Graf 

---

v1 -> v2:

  - Max cpus is 15 because of MPIC
---
 hw/ppce500_mpc8544ds.c |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/hw/ppce500_mpc8544ds.c b/hw/ppce500_mpc8544ds.c
index dfa8034..b86a008 100644
--- a/hw/ppce500_mpc8544ds.c
+++ b/hw/ppce500_mpc8544ds.c
@@ -396,6 +396,7 @@ static QEMUMachine mpc8544ds_machine = {
     .name = "mpc8544ds",
     .desc = "mpc8544ds",
     .init = mpc8544ds_init,
+    .max_cpus = 15,
 };
 
 static void mpc8544ds_machine_init(void)
-- 
1.6.0.2

[Qemu-devel] [PATCH 23/64] PPC: E500: Remove unneeded CPU nodes

2011-10-06 Thread Alexander Graf

We should only keep CPU nodes in the device tree around that we really have
virtual CPUs for. So remove all superfluous entries that we just keep there
in case someone wants to create a lot of vCPUs.

Signed-off-by: Alexander Graf 
---
 hw/ppce500_mpc8544ds.c |6 ++
 1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/hw/ppce500_mpc8544ds.c b/hw/ppce500_mpc8544ds.c
index 0791e27..9379624 100644
--- a/hw/ppce500_mpc8544ds.c
+++ b/hw/ppce500_mpc8544ds.c
@@ -129,6 +129,12 @@ static int mpc8544_load_device_tree(CPUState *env,
         qemu_devtree_setprop_cell(fdt, cpu_name, "timebase-frequency", tb_freq);
     }
 
+    for (i = smp_cpus; i < 32; i++) {
+        char cpu_name[128];
+        snprintf(cpu_name, sizeof(cpu_name), "/cpus/PowerPC,8544@%x", i);
+        qemu_devtree_nop_node(fdt, cpu_name);
+    }
+
     ret = rom_add_blob_fixed(BINARY_DEVICE_TREE_FILE, fdt, fdt_size, addr);
     g_free(fdt);
 
-- 
1.6.0.2
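
To make the node naming in the loop above concrete, here is a small standalone sketch. It assumes, as the patch does, that the device tree template carries up to 32 CPU nodes named /cpus/PowerPC,8544@N with N in hex. Both helpers are hypothetical and do not touch a real device tree; the actual removal in the patch is done by qemu_devtree_nop_node().

```c
#include <stdio.h>
#include <string.h>   /* strcmp, handy when checking the formatted path */

#define DEMO_DT_MAX_CPUS 32u   /* upper bound of the loop in the patch */

/* Hypothetical helper: format the device-tree path of CPU node i into
 * buf and return buf. The unit address is printed in hex, as in the patch. */
static char *demo_cpu_node_path(char *buf, size_t len, unsigned int i)
{
    snprintf(buf, len, "/cpus/PowerPC,8544@%x", i);
    return buf;
}

/* Hypothetical helper: number of template nodes with no vCPU behind them,
 * i.e. how many nodes the loop would turn into NOPs. */
static unsigned int demo_unused_cpu_nodes(unsigned int smp_cpus)
{
    return smp_cpus < DEMO_DT_MAX_CPUS ? DEMO_DT_MAX_CPUS - smp_cpus : 0;
}
```

For a guest started with two vCPUs, the nodes from /cpus/PowerPC,8544@2 up to /cpus/PowerPC,8544@1f would be removed, which is 30 nodes. Because of the hex formatting, node 10 is named /cpus/PowerPC,8544@a.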
