Re: [Qemu-devel] [0/4] pseries: Support and improvements for KVM Book3S-HV support (v2)
On 30.09.2011, at 09:39, David Gibson wrote:

> Alex Graf has added support for KVM acceleration of the pseries
> machine, using his Book3S-PR KVM variant, which runs the guest in
> userspace, emulating supervisor operations. Recent kernels now have
> the Book3S-HV KVM variant which uses the hardware hypervisor features
> of recent POWER CPUs. Alex's changes to qemu are enough to get qemu
> working roughly with Book3S-HV, but taking full advantage of this mode
> needs more work. This patch series makes a start on better exploiting
> Book3S-HV.
>
> Even with these patches, qemu won't quite be able to run on a current
> Book3S-HV KVM kernel. That's because current Book3S-HV requires guest
> memory to be backed by hugepages, but qemu refuses to use hugepages
> for guest memory unless KVM advertises CAP_SYNC_MMU, which Book3S-HV
> does not currently do. We're working on improvements to the KVM code
> which will implement CAP_SYNC_MMU and allow smallpage backing of
> guests, but they're not there yet. So, in order to test Book3S-HV for
> now you need to either:
>
> * Hack the host kernel to lie and advertise CAP_SYNC_MMU even though
> it doesn't really implement it.
>
> or
>
> * Hack qemu so it does not check for CAP_SYNC_MMU when the -mem-path
> option is used.
>
> Both approaches are ugly and unsafe, but it seems we can generally get
> away with it in practice. Obviously this is only an interim hack
> until the proper CAP_SYNC_MMU support is ready.

I would prefer the latter. We could even #ifdef it for TARGET_PPC.

Alex
Re: [Qemu-devel] [PATCH v2 2/2] ppc/e500_pci: Fix an array overflow issue
On 30.09.2011, at 05:52, Liu Yu wrote:

> When accessing PPCE500_PCI_IW1, the previous index overflows.
> The patch fixes the issue and updates all accesses to keep a consistent style.
>
> Signed-off-by: Liu Yu

Thanks, applied both to my local ppc-next tree. Will push once Blue has pulled the request.

Alex
[Qemu-devel] [PATCH 2/2] qemu-options.hx: Update virtfs command documentation
Clarify the virtfs option better Updates from:Sripathi Kodi Signed-off-by: Aneesh Kumar K.V --- qemu-options.hx | 119 --- 1 files changed, 69 insertions(+), 50 deletions(-) diff --git a/qemu-options.hx b/qemu-options.hx index 38f0aef..6c744e0 100644 --- a/qemu-options.hx +++ b/qemu-options.hx @@ -522,43 +522,61 @@ possible drivers and properties, use @code{-device ?} and @code{-device @var{driver},?}. ETEXI +DEFHEADING() + DEFHEADING(File system options:) DEF("fsdev", HAS_ARG, QEMU_OPTION_fsdev, -"-fsdev local,id=id,path=path,security_model=[mapped|passthrough|none]\n" +"-fsdev fsdriver,id=id,path=path,security_model=[mapped|passthrough|none]\n" " [,cache=writethrough]\n", QEMU_ARCH_ALL) STEXI -The general form of a File system device option is: -@table @option - -@item -fsdev @var{fstype} ,id=@var{id} [,@var{options}] +@item -fsdev @var{fsdriver},id=@var{id},path=@var{path},security_model=@var{security_model}[,cache=@var{cache}] @findex -fsdev -Fstype is one of: -@option{local}, -The specific Fstype will determine the applicable options. - -Options to each backend are described below. - -@item -fsdev local ,id=@var{id} ,path=@var{path} ,security_model=@var{security_model}[,cache=@var{cache}] - -Create a file-system-"device" for local-filesystem. - -@option{local} is only available on Linux. - -@option{path} specifies the path to be exported. @option{path} is required. - -@option{security_model} specifies the security model to be followed. -@option{security_model} is required. - -@option{cache} specifies whether to skip the host page cache. -@option{cache} is an optional argument. +Define a new file system device. Valid options are: +@table @option +@item @var{fsdriver} +This option specifies the fs driver backend to use. +Currently "local" and "handle" file system drivers are supported. +@item id=@var{id} +Specifies identifier for this device +@item path=@var{path} +Specifies the export path for the file system device. 
Files under +this path will be available to the 9p client on the guest. +@item security_model=@var{security_model} +Specifies the security model to be used for this export path. +Supported security models are "passthrough", "mapped" and "none". +In "passthrough" security model, files are stored using the same +credentials as they are created on the guest. This requires qemu +to run as root. In "mapped" security model, some of the file +attributes like uid, gid, mode bits and link target are stored as +file attributes. Directories exported by this security model cannot +interact with other unix tools. "none" security model is same as +passthrough except the server won't report failures if it fails to +set file attributes like ownership. +@item cache=@var{cache} +This is an optional argument. The only supported value is "writethrough". +This means that host page cache will be used to read and write data but +write notification will be sent to the guest only when the data has been +reported as written by the storage subsystem. +@end table +-fsdev option is used along with -device driver "virtio-9p-pci". 
+@item -device virtio-9p-pci,fsdev=@var{id},mount_tag=@var{mount_tag} +Options for virtio-9p-pci driver are: +@table @option +@item fsdev=@var{id} +Specifies the id value specified along with -fsdev option +@item mount_tag=@var{mount_tag} +Specifies the tag name to be used by the guest to mount this export point @end table + ETEXI +DEFHEADING() + DEFHEADING(Virtual File system pass-through options:) DEF("virtfs", HAS_ARG, QEMU_OPTION_virtfs, @@ -568,34 +586,35 @@ DEF("virtfs", HAS_ARG, QEMU_OPTION_virtfs, STEXI -The general form of a Virtual File system pass-through option is: -@table @option - -@item -virtfs @var{fstype} [,@var{options}] +@item -virtfs @var{fsdriver},path=@var{path},mount_tag=@var{mount_tag},security_model=@var{security_model}[,cache=@var{cache}] @findex -virtfs -Fstype is one of: -@option{local}, -The specific Fstype will determine the applicable options. - -Options to each backend are described below. - -@item -virtfs local ,path=@var{path} ,mount_tag=@var{mount_tag} ,security_model=@var{security_model}[,cache=@var{cache}] - -Create a Virtual file-system-pass through for local-filesystem. - -@option{local} is only available on Linux. - -@option{path} specifies the path to be exported. @option{path} is required. - -@option{security_model} specifies the security model to be followed. -@option{security_model} is required. - -@option{mount_tag} specifies the tag with which the exported file is mounted. -@option{mount_tag} is required. - -@option{cache} specifies whether to skip the host page cache. -@option{cache} is an optional argument. +The general form of a Virtual File system pass-through options are: +@table @option +@item @var{fsdriver} +This option specifies the fs driver backend to use. +Currently "local" and "handle" file system drivers are supported.
[Qemu-devel] [PATCH 1/2] hw/9pfs: Add new virtfs option cache=writethrough to skip host page cache
cache=writethrough implies the file are opened in the host with O_SYNC open flag Signed-off-by: Aneesh Kumar K.V --- fsdev/file-op-9p.h |1 + fsdev/qemu-fsdev.c | 10 -- fsdev/qemu-fsdev.h |2 ++ hw/9pfs/virtio-9p-device.c |5 + hw/9pfs/virtio-9p.c| 24 ++-- qemu-config.c |6 ++ qemu-options.hx| 17 - vl.c |6 ++ 8 files changed, 58 insertions(+), 13 deletions(-) diff --git a/fsdev/file-op-9p.h b/fsdev/file-op-9p.h index 8de8abf..5d088d4 100644 --- a/fsdev/file-op-9p.h +++ b/fsdev/file-op-9p.h @@ -59,6 +59,7 @@ typedef struct FsContext char *fs_root; SecModel fs_sm; uid_t uid; +int open_flags; struct xattr_operations **xops; /* fs driver specific data */ void *private; diff --git a/fsdev/qemu-fsdev.c b/fsdev/qemu-fsdev.c index 768819f..fce016b 100644 --- a/fsdev/qemu-fsdev.c +++ b/fsdev/qemu-fsdev.c @@ -34,6 +34,8 @@ int qemu_fsdev_add(QemuOpts *opts) const char *fstype = qemu_opt_get(opts, "fstype"); const char *path = qemu_opt_get(opts, "path"); const char *sec_model = qemu_opt_get(opts, "security_model"); +const char *cache = qemu_opt_get(opts, "cache"); + if (!fsdev_id) { fprintf(stderr, "fsdev: No id specified\n"); @@ -72,10 +74,14 @@ int qemu_fsdev_add(QemuOpts *opts) fsle->fse.path = g_strdup(path); fsle->fse.security_model = g_strdup(sec_model); fsle->fse.ops = FsTypes[i].ops; - +fsle->fse.cache_model = 0; +if (cache) { +if (!strcmp(cache, "writethrough")) { +fsle->fse.cache_model = V9FS_WRITETHROUGH_CACHE; +} +} QTAILQ_INSERT_TAIL(&fstype_entries, fsle, next); return 0; - } FsTypeEntry *get_fsdev_fsentry(char *id) diff --git a/fsdev/qemu-fsdev.h b/fsdev/qemu-fsdev.h index e04931a..4e53966 100644 --- a/fsdev/qemu-fsdev.h +++ b/fsdev/qemu-fsdev.h @@ -34,6 +34,7 @@ typedef struct FsTypeTable { FileOperations *ops; } FsTypeTable; +#define V9FS_WRITETHROUGH_CACHE 0x1 /* * Structure to store the various fsdev's passed through command line. 
*/ @@ -41,6 +42,7 @@ typedef struct FsTypeEntry { char *fsdev_id; char *path; char *security_model; +int cache_model; FileOperations *ops; } FsTypeEntry; diff --git a/hw/9pfs/virtio-9p-device.c b/hw/9pfs/virtio-9p-device.c index e5b68da..a267f01 100644 --- a/hw/9pfs/virtio-9p-device.c +++ b/hw/9pfs/virtio-9p-device.c @@ -115,6 +115,11 @@ VirtIODevice *virtio_9p_init(DeviceState *dev, V9fsConf *conf) exit(1); } +if (fse->cache_model & V9FS_WRITETHROUGH_CACHE) { +s->ctx.open_flags = O_SYNC; +} else { +s->ctx.open_flags = 0; +} s->ctx.fs_root = g_strdup(fse->path); len = strlen(conf->tag); if (len > MAX_TAG_LEN) { diff --git a/hw/9pfs/virtio-9p.c b/hw/9pfs/virtio-9p.c index c01c31a..1ca3c8e 100644 --- a/hw/9pfs/virtio-9p.c +++ b/hw/9pfs/virtio-9p.c @@ -80,6 +80,22 @@ void cred_init(FsCred *credp) credp->fc_rdev = -1; } +static int get_dotl_openflags(V9fsState *s, int oflags) +{ +int flags; +/* + * Filter the client open flags + */ +flags = s->ctx.open_flags; +flags |= oflags; +flags &= ~(O_NOCTTY | O_ASYNC | O_CREAT); +/* + * Ignore direct disk access hint until the server supports it. + */ +flags &= ~O_DIRECT; +return flags; +} + void v9fs_string_init(V9fsString *str) { str->data = NULL; @@ -1598,10 +1614,7 @@ static void v9fs_open(void *opaque) err = offset; } else { if (s->proto_version == V9FS_PROTO_2000L) { -flags = mode; -flags &= ~(O_NOCTTY | O_ASYNC | O_CREAT); -/* Ignore direct disk access hint until the server supports it. */ -flags &= ~O_DIRECT; +flags = get_dotl_openflags(s, mode); } else { flags = omode_to_uflags(mode); } @@ -1650,8 +1663,7 @@ static void v9fs_lcreate(void *opaque) goto out_nofid; } -/* Ignore direct disk access hint until the server supports it. 
*/ -flags &= ~O_DIRECT; +flags = get_dotl_openflags(pdu->s, flags); err = v9fs_co_open2(pdu, fidp, &name, gid, flags | O_CREAT, mode, &stbuf); if (err < 0) { diff --git a/qemu-config.c b/qemu-config.c index 7a7854f..b2ab0b2 100644 --- a/qemu-config.c +++ b/qemu-config.c @@ -177,6 +177,9 @@ QemuOptsList qemu_fsdev_opts = { }, { .name = "security_model", .type = QEMU_OPT_STRING, +}, { +.name = "cache", +.type = QEMU_OPT_STRING, }, { /*End of list */ } }, @@ -199,6 +202,9 @@ QemuOptsList qemu_virtfs_opts = { }, { .name = "security_model", .type = QEMU_OPT_STRING, +}, { +.name = "cache", +.type = QEMU_OPT_STRING,
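For readers skimming the diff, here is a minimal standalone sketch of the flag handling the patch describes: cache=writethrough maps to O_SYNC on the server side, and the client's open flags are then merged with it and filtered. The names export_ctx, cache_option_to_open_flags and filter_open_flags are hypothetical stand-ins for this illustration and are not QEMU symbols; the masking mirrors get_dotl_openflags() from the patch, under the assumption of a Linux-like fcntl.h.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <string.h>

/* O_DIRECT is only exposed by glibc under _GNU_SOURCE; fall back to 0 so the
 * sketch still compiles where it is unavailable (assumption for portability). */
#ifndef O_DIRECT
#define O_DIRECT 0
#endif

/* Hypothetical stand-in for the per-export state (FsContext.open_flags in the patch). */
struct export_ctx {
    int open_flags; /* O_SYNC when cache=writethrough was requested, else 0 */
};

/* Map the cache= option string to server-side open flags. */
static int cache_option_to_open_flags(const char *cache)
{
    if (cache != NULL && strcmp(cache, "writethrough") == 0) {
        return O_SYNC;
    }
    return 0;
}

/* Merge the export-wide flags with the client's flags, then drop the flags
 * the server does not honour, in the same order as the patch does. */
static int filter_open_flags(const struct export_ctx *ctx, int client_flags)
{
    int flags;

    assert(ctx != NULL);
    flags = ctx->open_flags | client_flags;
    flags &= ~(O_NOCTTY | O_ASYNC | O_CREAT);
    flags &= ~O_DIRECT; /* direct disk access hint is ignored */
    return flags;
}
```

With an export created with cache=writethrough, a client open using O_RDWR | O_CREAT would come out of the filter as O_RDWR | O_SYNC, so every host-side open carries O_SYNC regardless of what the guest asked for.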
Re: [Qemu-devel] [PATCH 4/5] savevm: improve subsections detection on load
On 10/07/2011 12:42 AM, Juan Quintela wrote:
>> This changes semantics for reads above 32KB. It should be in the
>> commit message, or preferably v1 could be committed instead. :)
>
> How does it change? My understanding is that we read the same; the only
> change that I can think of is the one that I have just shown (and that
> is in the error case).

Yes, you're right.

Paolo
[Qemu-devel] [PATCH] hw/9pfs: Use ioeventfd for 9p
With ioeventfd: [root@qemu-img-64 storage]# dd if=/dev/zero of=/storage/testx bs=8k count=131072 oflag=direct 131072+0 records in 131072+0 records out 1073741824 bytes (1.1 GB) copied, 26.767 s, 40.1 MB/s Without: [root@qemu-img-64 storage]# dd if=/dev/zero of=/storage/testx bs=8k count=131072 oflag=direct 131072+0 records in 131072+0 records out 1073741824 bytes (1.1 GB) copied, 65.3361 s, 16.4 MB/s Signed-off-by: Aneesh Kumar K.V --- hw/9pfs/virtio-9p-device.c |2 ++ hw/virtio-pci.c|5 - hw/virtio-pci.h|5 + 3 files changed, 7 insertions(+), 5 deletions(-) diff --git a/hw/9pfs/virtio-9p-device.c b/hw/9pfs/virtio-9p-device.c index 513e181..e5b68da 100644 --- a/hw/9pfs/virtio-9p-device.c +++ b/hw/9pfs/virtio-9p-device.c @@ -169,6 +169,8 @@ static PCIDeviceInfo virtio_9p_info = { .revision = VIRTIO_PCI_ABI_VERSION, .class_id = 0x2, .qdev.props = (Property[]) { +DEFINE_PROP_BIT("ioeventfd", VirtIOPCIProxy, flags, +VIRTIO_PCI_FLAG_USE_IOEVENTFD_BIT, true), DEFINE_PROP_UINT32("vectors", VirtIOPCIProxy, nvectors, 2), DEFINE_VIRTIO_COMMON_FEATURES(VirtIOPCIProxy, host_features), DEFINE_PROP_STRING("mount_tag", VirtIOPCIProxy, fsconf.tag), diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c index df27c19..ca5923c 100644 --- a/hw/virtio-pci.c +++ b/hw/virtio-pci.c @@ -83,11 +83,6 @@ /* Flags track per-device state like workarounds for quirks in older guests. */ #define VIRTIO_PCI_FLAG_BUS_MASTER_BUG (1 << 0) -/* Performance improves when virtqueue kick processing is decoupled from the - * vcpu thread using ioeventfd for some devices. */ -#define VIRTIO_PCI_FLAG_USE_IOEVENTFD_BIT 1 -#define VIRTIO_PCI_FLAG_USE_IOEVENTFD (1 << VIRTIO_PCI_FLAG_USE_IOEVENTFD_BIT) - /* QEMU doesn't strictly need write barriers since everything runs in * lock-step. We'll leave the calls to wmb() in though to make it obvious for * KVM or if kqemu gets SMP support. 
diff --git a/hw/virtio-pci.h b/hw/virtio-pci.h index 14c10f7..f8404de 100644 --- a/hw/virtio-pci.h +++ b/hw/virtio-pci.h @@ -18,6 +18,11 @@ #include "virtio-net.h" #include "virtio-serial.h" +/* Performance improves when virtqueue kick processing is decoupled from the + * vcpu thread using ioeventfd for some devices. */ +#define VIRTIO_PCI_FLAG_USE_IOEVENTFD_BIT 1 +#define VIRTIO_PCI_FLAG_USE_IOEVENTFD (1 << VIRTIO_PCI_FLAG_USE_IOEVENTFD_BIT) + typedef struct { PCIDevice pci_dev; VirtIODevice *vdev; -- 1.7.4.1
[Qemu-devel] [PATCH] qemu-char: Fix use of free() instead of g_free()
cppcheck reported these errors: qemu-char.c:1667: error: Mismatching allocation and deallocation: s qemu-char.c:1668: error: Mismatching allocation and deallocation: chr qemu-char.c:1769: error: Mismatching allocation and deallocation: s qemu-char.c:1770: error: Mismatching allocation and deallocation: chr Signed-off-by: Stefan Weil --- qemu-char.c |8 1 files changed, 4 insertions(+), 4 deletions(-) diff --git a/qemu-char.c b/qemu-char.c index 09d2309..e1b2b87 100644 --- a/qemu-char.c +++ b/qemu-char.c @@ -1664,8 +1664,8 @@ static int qemu_chr_open_win(QemuOpts *opts, CharDriverState **_chr) chr->chr_close = win_chr_close; if (win_chr_init(chr, filename) < 0) { -free(s); -free(chr); +g_free(s); +g_free(chr); return -EIO; } qemu_chr_generic_open(chr); @@ -1766,8 +1766,8 @@ static int qemu_chr_open_win_pipe(QemuOpts *opts, CharDriverState **_chr) chr->chr_close = win_chr_close; if (win_chr_pipe_init(chr, filename) < 0) { -free(s); -free(chr); +g_free(s); +g_free(chr); return -EIO; } qemu_chr_generic_open(chr); -- 1.7.2.5
[Qemu-devel] [PATCH] block/qcow: Fix use of free() instead of g_free()
cppcheck reported this error: qemu/block/qcow.c:599: error: Mismatching allocation and deallocation: cluster_data Signed-off-by: Stefan Weil --- block/qcow.c |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/block/qcow.c b/block/qcow.c index c8bfecc..eba5a04 100644 --- a/block/qcow.c +++ b/block/qcow.c @@ -596,7 +596,7 @@ static int qcow_co_writev(BlockDriverState *bs, int64_t sector_num, if (qiov->niov > 1) { qemu_vfree(orig_buf); } -free(cluster_data); +g_free(cluster_data); return ret; } -- 1.7.2.5
Re: [Qemu-devel] QEMU + ARMMP11Core combination does not work
Hi Peter,

Thanks for the reply. I am seeking your help because I am stuck: the kernel image does not get executed by QEMU. I tried single-stepping but I could not debug the initial assembly code. I do not know why; my breakpoint is never hit. I would like to use ARM11MPCore because my software will be running on ARM11MPCore when the hardware is available. Can you please help me resolve this?

If you suspect the configuration is the problem, can you please share the configuration file which you test for kernel 2.6.39.3, or let me know what the mandatory changes for the kernel are? I used the default configuration which is present in the kernel itself, i.e. realview-smp-defconfig under arch/arm/configs. I did not make any changes to this configuration and expected that it would work. Can you please comment on this and let me know how I can proceed? It would be good if you could also try this at your end.

Thanks & Regards,
Tushar

On Thu, 06 Oct 2011 13:43:52 +0530 wrote
>On 6 October 2011 04:43, TusharK wrote:
> (1) Does your kernel boot on the real hardware?
> I do not have real hardware to test my kernel. But what I did was, I
> downloaded pre-built kernel image from
> http://code.google.com/p/smp-on-qemu/downloads/list website and
> tried to run using QEMU, it boots but my kernel 2.6.39.3 does not boot.

If somebody else's kernel boots but yours does not then the chances are very high that there is a problem with your kernel (probably a wrong config) which you'll need to debug the same way you'd debug this kind of misconfiguration on real hardware. Connecting an ARM gdb up to qemu and singlestepping kernel startup may be helpful.

> Even decompressing kernel print itself is not coming and hence
> I suspect something is wrong with either kernel or QEMU.

If there's no output of the "Uncompressing the kernel" message this is almost certainly a kernel configuration or compilation problem -- QEMU's serial port code is pretty heavily tested.

(Why are you using the 11MPCore model anyway, just out of interest?)

-- PMM
Re: [Qemu-devel] [PATCH] qemu: new option for snapshot_blkdev to avoid image creation
On 10/03/2011 06:09 PM, Federico Simoncelli wrote: Add the new option [-n] for snapshot_blkdev to avoid the image creation. The file provided as [new-image-file] is considered as already initialized and will be used after passing a check for the backing file. Seems ok to me as a way to go around fdget and still have selinux gain. Worth to get Kevin's view too. Federico, would you like to ack or extend the design: http://wiki.qemu.org/Features/Snapshots Signed-off-by: Federico Simoncelli --- blockdev.c | 54 -- hmp-commands.hx |7 --- qmp-commands.hx |4 ++-- 3 files changed, 58 insertions(+), 7 deletions(-) diff --git a/blockdev.c b/blockdev.c index 0827bf7..bd46808 100644 --- a/blockdev.c +++ b/blockdev.c @@ -550,8 +550,53 @@ void do_commit(Monitor *mon, const QDict *qdict) } } +static int check_snapshot_file(const char *filename, const char *oldfilename, + int flags, BlockDriver *drv) +{ +BlockDriverState *bs; +char bak_filename[1024], *abs_filename; +int ret = 0; + +bs = bdrv_new(""); +if (!bs) { +return -1; +} + +ret = bdrv_open(bs, filename, flags, drv); +if (ret) { +qerror_report(QERR_OPEN_FILE_FAILED, filename); +goto err0; +} + +if (bs->backing_file) { +path_combine(bak_filename, sizeof(bak_filename), + filename, bs->backing_file); + +abs_filename = realpath(bak_filename, NULL); +if (!abs_filename) { +ret = -1; +goto err1; +} + +if (strcmp(abs_filename, oldfilename)) { +qerror_report(QERR_OPEN_FILE_FAILED, filename); +ret = -1; +} + +free(abs_filename); +} + +err1: +bdrv_close(bs); + +err0: +bdrv_delete(bs); +return ret; +} + int do_snapshot_blkdev(Monitor *mon, const QDict *qdict, QObject **ret_data) { +const int nocreate = qdict_get_try_bool(qdict, "nocreate", 0); const char *device = qdict_get_str(qdict, "device"); const char *filename = qdict_get_try_str(qdict, "snapshot-file"); const char *format = qdict_get_try_str(qdict, "format"); @@ -597,8 +642,13 @@ int do_snapshot_blkdev(Monitor *mon, const QDict *qdict, QObject **ret_data) goto out; } -ret = 
bdrv_img_create(filename, format, bs->filename, - bs->drv->format_name, NULL, -1, flags); +if (nocreate) { +ret = check_snapshot_file(filename, bs->filename, flags, drv); +} else { +ret = bdrv_img_create(filename, format, bs->filename, + bs->drv->format_name, NULL, -1, flags); +} + if (ret) { goto out; } diff --git a/hmp-commands.hx b/hmp-commands.hx index 9e1cca8..eb9fcd4 100644 --- a/hmp-commands.hx +++ b/hmp-commands.hx @@ -840,11 +840,12 @@ ETEXI { .name = "snapshot_blkdev", -.args_type = "device:B,snapshot-file:s?,format:s?", -.params = "device [new-image-file] [format]", +.args_type = "nocreate:-n,device:B,snapshot-file:s?,format:s?", +.params = "[-n] device [new-image-file] [format]", .help = "initiates a live snapshot\n\t\t\t" "of device. If a new image file is specified, the\n\t\t\t" - "new image file will become the new root image.\n\t\t\t" + "new image file will be created (unless -n is\n\t\t\t" + "specified) and will become the new root image.\n\t\t\t" "If format is specified, the snapshot file will\n\t\t\t" "be created in that format. Otherwise the\n\t\t\t" "snapshot will be internal! (currently unsupported)", diff --git a/qmp-commands.hx b/qmp-commands.hx index d83bce5..7af36d8 100644 --- a/qmp-commands.hx +++ b/qmp-commands.hx @@ -695,8 +695,8 @@ EQMP { .name = "blockdev-snapshot-sync", -.args_type = "device:B,snapshot-file:s?,format:s?", -.params = "device [new-image-file] [format]", +.args_type = "nocreate:-n,device:B,snapshot-file:s?,format:s?", +.params = "[-n] device [new-image-file] [format]", .user_print = monitor_user_noop, .mhandler.cmd_new = do_snapshot_blkdev, },
Re: [Qemu-devel] [PATCH 4/5] savevm: improve subsections detection on load
Paolo Bonzini wrote:
> On 10/06/2011 06:21 PM, Juan Quintela wrote:
>> +
>> +int qemu_get_buffer(QEMUFile *f, uint8_t *buf, int size)
>> +{
>> +    int pending = size;
>> +    int done = 0;
>> +
>> +    while (pending > 0) {
>> +        int res;
>> +
>> +        res = qemu_peek_buffer(f, buf, pending, 0);
>> +        if (res == 0) {
>> +            return 0;

Should this line return "done" instead?

>>          }
>> -        memcpy(buf, f->buf + f->buf_index, l);
>> -        f->buf_index += l;
>> -        buf += l;
>> -        size -= l;
>> +        qemu_file_skip(f, res);
>> +        buf += res;
>> +        pending -= res;
>> +        done += res;
>>      }
>> -    return size1 - size;
>> +    return done;
>>  }
>
> This changes semantics for reads above 32KB. It should be in the
> commit message, or preferably v1 could be committed instead. :)

How does it change? My understanding is that we read the same; the only change that I can think of is the one that I have just shown (and that is in the error case).

Later, Juan.
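To make the short-read question concrete, here is a small self-contained sketch of the same peek/skip loop over an in-memory source. DemoFile and the demo_* functions are hypothetical stand-ins, not QEMUFile or its API, and the per-call chunk limit is an assumption that imitates a bounded internal buffer. This version returns "done" on a short read, i.e. the variant asked about above, not necessarily what the patch ended up doing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical in-memory stand-in for QEMUFile. */
typedef struct DemoFile {
    const uint8_t *src; /* backing data */
    size_t src_len;     /* total bytes available */
    size_t pos;         /* current read position */
    size_t chunk;       /* max bytes one peek may return (models the buffer size) */
} DemoFile;

/* Copy up to size bytes from the current position without consuming them.
 * Returns the number of bytes copied, 0 at end of data. */
static int demo_peek_buffer(DemoFile *f, uint8_t *buf, int size)
{
    size_t avail = f->src_len - f->pos;
    size_t n = (size_t)size;

    if (n > f->chunk) {
        n = f->chunk;
    }
    if (n > avail) {
        n = avail;
    }
    memcpy(buf, f->src + f->pos, n);
    return (int)n;
}

/* Consume n bytes that a previous peek has already returned. */
static void demo_file_skip(DemoFile *f, int n)
{
    assert(n >= 0 && (size_t)n <= f->src_len - f->pos);
    f->pos += (size_t)n;
}

/* Read size bytes by repeated peek + skip; on a short read, report the
 * bytes copied so far instead of 0. */
static int demo_get_buffer(DemoFile *f, uint8_t *buf, int size)
{
    int pending = size;
    int done = 0;

    while (pending > 0) {
        int res = demo_peek_buffer(f, buf, pending);

        if (res == 0) {
            return done;
        }
        demo_file_skip(f, res);
        buf += res;
        pending -= res;
        done += res;
    }
    return done;
}
```

With 10 bytes of data and a chunk of 4, a request for 6 bytes takes two iterations and returns 6; a following request for 8 returns the remaining 4 rather than 0, which is the only place the two variants differ.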
Re: [Qemu-devel] [PATCH 3/5] savevm: define qemu_get_byte() using qemu_peek_byte()
Paolo Bonzini wrote:
> On 10/06/2011 06:21 PM, Juan Quintela wrote:
>> +    result = qemu_peek_byte(f);
>> +
>> +    if (f->buf_index < f->buf_size) {
>> +        f->buf_index++;
>>      }
>
> This should really be an assert that f->buf_index < f->buf_size,
> otherwise qemu_peek_byte has read garbage.

That is a change from current behaviour. qemu_get_byte() returns 0 in the case that there is nothing to read. Yes, it is ugly.

Later, Juan.
Re: [Qemu-devel] qemu guest agent spins in poll/nanosleep(100ms) when nothing is listening on host
On Thu, 6 Oct 2011 12:31:05 +0100, "Daniel P. Berrange" wrote:
> I've been doing some experimentation with the QEMU guest agent and have
> noticed that when nothing is connected on the host side of the virtio
> serial channel, the guest agent just spins in a poll/sleep(100ms) loop.
> I know you'd ordinarily expect some mgmt app in the host to be listening
> to the other end of the channel, but it still seems suboptimal to have
> to spin in a loop like this when nothing is listening, constantly causing
> wakeups in an otherwise idle guest.
>
> Looking at the qemu-ga.c code I see two places where it might handle
> a poll event and then sleep, when nothing is on the other end of the
> virtio serial socket.
>
>
>     case G_IO_STATUS_AGAIN:
>         /* virtio causes us to spin here when no process is attached to
>          * host-side chardev. sleep a bit to mitigate this
>          */
>         if (s->virtio) {
>             usleep(100*1000);
>         }
>         return true;
>
>
>     } else if (strcmp(s->method, "virtio-serial") == 0) {
>         /* we spin on EOF for virtio-serial, so back off a bit. also,
>          * dont close the connection in this case, it'll resume normal
>          * operation when another process connects to host chardev
>          */
>         usleep(100*1000);
>         goto out_noclose;
>     }
>
>
> I get the feeling that this kind of problem is inherent in the use of any
> virtio-serial channel, in the same way you can't detect EOF for a regular
> serial device channel either. Given that virtio-serial is a nice paravirt
> device, is there anything we can do to it, to allow better handling of
> EOF by applications ?

Indeed, and there was a discussion a while back where I think we had tentative agreement on a path forward for this. Unfortunately there doesn't seem to be a clear solution for doing it purely in guest-userspace:

http://www.mail-archive.com/qemu-devel@nongnu.org/msg57002.html

The gist of it is basically making the (guest-side) virtio-serial chardev behave more like a unix socket, i.e. if the host hangs up you get a single EOF and then your FD becomes invalid, at which point you need to re-open the chardev to get a valid FD. This could potentially be done via a new set of -chardev/-device flags.

>
> Or perhaps there is some way to make use of epoll() in edge-triggered
> mode to detect it already, because IIUC, edge-triggered mode should only
> fire once for the EOF condition, and then not fire again until something
> in the host actually sends some data ?
>
> Of course glib's event loop doesn't support edge-triggered events/epoll,
> but perhaps we could just call epoll() directly in the event handler,
> instead of the usleep() call ?

That's definitely worth looking into. Has the 100ms sleep been causing any issues though? My main concern with the polling behavior was less a matter of performance than being able to provide a "session" where the start and end of a stream could be reliably determined, which we don't have currently. But the guest agent has since been reworked to persist state between host connects/disconnects so it didn't seem to be a major issue anymore.

>
> Regards,
> Daniel
> --
> |: http://berrange.com -o-http://www.flickr.com/photos/dberrange/ :|
> |: http://libvirt.org -o- http://virt-manager.org :|
> |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
> |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
>

--
Sincerely,
Mike Roth
IBM Linux Technology Center
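As a rough illustration of the edge-triggered idea raised above, the sketch below counts how often epoll_wait() reports a hangup on an ordinary pipe, once level-triggered and once with EPOLLET. It is Linux-only and uses a pipe purely as a stand-in: whether the guest-side virtio-serial chardev behaves the same way is exactly the open question in this thread, and this code does not answer it. The function name count_hangup_reports is made up for the example.

```c
#include <assert.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Register the read end of a pipe with epoll, close the write end (the
 * "peer hangs up"), then poll `rounds` times without blocking.
 * Returns how many of those polls reported the descriptor, or -1 on error. */
static int count_hangup_reports(int edge_triggered, int rounds)
{
    int fds[2];
    int epfd;
    int reports = 0;
    struct epoll_event ev = { 0 };
    struct epoll_event out;

    assert(rounds > 0);
    if (pipe(fds) < 0) {
        return -1;
    }
    epfd = epoll_create1(0);
    if (epfd < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    ev.events = EPOLLIN;
    if (edge_triggered) {
        ev.events |= EPOLLET;
    }
    ev.data.fd = fds[0];
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, fds[0], &ev) < 0) {
        close(epfd);
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    close(fds[1]); /* writer goes away: the read end now sees EOF/hangup */

    for (int i = 0; i < rounds; i++) {
        /* timeout 0: just check the ready list, never sleep */
        if (epoll_wait(epfd, &out, 1, 0) > 0) {
            reports++;
        }
    }

    close(epfd);
    close(fds[0]);
    return reports;
}
```

On Linux, the level-triggered case is reported on every poll (the spinning that the usleep() works around), while the edge-triggered case should be reported once and then stay quiet until the descriptor's state changes again.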
Re: [Qemu-devel] [PATCH 1/4] Add basic version of bridge helper
On 10/06/2011 02:04 PM, Anthony Liguori wrote:
On 10/06/2011 11:41 AM, Daniel P. Berrange wrote:
On Thu, Oct 06, 2011 at 11:38:25AM -0400, Richa Marwaha wrote:

This patch adds a helper that can be used to create a tap device attached to a bridge device. Since this helper is minimal in what it does, it can be given CAP_NET_ADMIN which allows qemu to avoid running as root while still satisfying the majority of what users tend to want to do with tap devices.

The way this all works is that qemu launches this helper passing a bridge name and the name of an inherited file descriptor. The descriptor is one end of a socketpair() of domain sockets. This domain socket is used to transmit a file descriptor of the opened tap device from the helper to qemu. The helper can then exit and let qemu use the tap device.

When QEMU is run by libvirt, we generally like to use capng to remove the ability for QEMU to run setuid programs at all. So obviously it will struggle to run the qemu-bridge-helper binary in such a scenario.

With the way you transmit the TAP device FD back to the caller, it looks like libvirt itself could execute the qemu-bridge-helper receiving the FD, and then pass the FD onto QEMU using the traditional tap,fd=XX syntax.

Exactly. This would allow tap-based networking using libvirt session:// URIs.

I'll take note of this. It seems like it would be a nice future addition to libvirt.

A slight tangent, but a point on DAC isolation. The helper enables DAC isolation for qemu:///session but we still need some work in libvirt to provide DAC isolation for qemu:///system. This could be done by allowing management applications to specify custom user/group IDs when creating guests rather than hard coding the IDs in the configuration file.

The TAP device FD is only one FD we normally pass to QEMU. How about support for vhost net ? Is it reasonable to ask the qemu-bridge-helper to send back a vhost net FD also.

Absolutely.

Or indeed multiple vhost net FDs when we get multiqueue NICs. Should we expect the bridge helper to be strictly limited to just connecting a TAP dev to a bridge, or is the expectation that it will grow more & more functionality over time ?

I would not expect it to do more than create virtual network interfaces, and add them to bridges. Multiqueue virtual nics, vhost, etc. would all be in scope as they are part of creating a virtual network interface. Creating the bridges and managing the bridges should be done statically by an administrator and would be out of scope.

Regards,

Anthony Liguori

Daniel

--
Regards,
Corey
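For anyone unfamiliar with how a file descriptor travels over a Unix domain socket, here is a minimal sketch of the usual mechanism, SCM_RIGHTS ancillary data with sendmsg()/recvmsg(). The patch description only says the tap FD is transmitted over the socketpair; that it uses SCM_RIGHTS is an assumption on my part, and send_fd/recv_fd are illustrative names, not functions from the helper.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send one file descriptor over a connected AF_UNIX socket.
 * A single dummy payload byte accompanies the control message.
 * Returns 0 on success, -1 on failure. */
static int send_fd(int sock, int fd)
{
    char byte = 0;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        struct cmsghdr hdr; /* forces correct alignment of buf */
        char buf[CMSG_SPACE(sizeof(int))];
    } ctl;
    struct msghdr msg;
    struct cmsghdr *cmsg;

    memset(&msg, 0, sizeof(msg));
    memset(&ctl, 0, sizeof(ctl));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof(ctl.buf);

    cmsg = CMSG_FIRSTHDR(&msg);
    assert(cmsg != NULL);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one file descriptor sent by send_fd().
 * Returns the new descriptor, or -1 on failure. */
static int recv_fd(int sock)
{
    char byte;
    int fd = -1;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        struct cmsghdr hdr;
        char buf[CMSG_SPACE(sizeof(int))];
    } ctl;
    struct msghdr msg;
    struct cmsghdr *cmsg;

    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof(ctl.buf);

    if (recvmsg(sock, &msg, 0) != 1) {
        return -1;
    }
    cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS ||
        cmsg->cmsg_len != CMSG_LEN(sizeof(int))) {
        return -1;
    }
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}
```

In the scenario described here, the helper would call something like send_fd() on its end of the socketpair with the opened tap descriptor and then exit, while the parent (qemu, or libvirt as suggested above) calls the receiving side and keeps the resulting descriptor.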
Re: [Qemu-devel] [PATCH 4/4] Add support for bridge
On 10/06/2011 02:19 PM, Anthony Liguori wrote:
On 10/06/2011 01:15 PM, Corey Bryant wrote:
On 10/06/2011 01:49 PM, Anthony Liguori wrote:
On 10/06/2011 10:38 AM, Richa Marwaha wrote:

The most common use of -net tap is to connect a tap device to a bridge. This requires the use of a script and running qemu as root in order to allocate a tap device to pass to the script.

This model is great for portability and flexibility but it's incredibly difficult to eliminate the need to run qemu as root. The only really viable mechanism is to use tunctl to create a tap device, attach it to a bridge as root, and then hand that tap device to qemu. The problem with this mechanism is that it requires administrator intervention whenever a user wants to create a guest.

By essentially writing a helper that implements the most common qemu-ifup script that can be safely given cap_net_admin, we can dramatically simplify things for non-privileged users. We still support existing -net tap options as a mechanism for advanced users and backwards compatibility.

Currently, this is very Linux centric but there's really no reason why it couldn't be extended for other Unixes.

The default bridge that we attach to is qemubr0. The thinking is that a distro could preconfigure such an interface to allow out-of-the-box bridged networking.

Alternatively, if a user wants to use a different bridge, they can say:

qemu -hda linux.img -net tap,br=br0,helper=/usr/local/libexec/qemu-bridge-helper -net nic,model=virtio

Wouldn't it be better to make the syntax:

-net bridge[,br=BRIDGE][,helper=HELPER]

And default BRIDGE to br0 and HELPER to ${prefix}/libexec/qemu-bridge-helper ?

That gives distros a proper way to configure a default bridge making -net bridge Just Work for most people.

Regards,

Anthony Liguori

Yes I think it would be much more usable under -net bridge. I really wanted this to work under -net tap (where fd and init are) but now we know there's no good way to default to the helper without spelling out the path.

I'm certainly in favor of leaving helper as part of -net tap, but I think there should be a -net bridge in addition.

Regards,

Anthony Liguori

Ok, yes. The best of both worlds.

--
Regards,
Corey
Re: [Qemu-devel] [PATCH 4/4] Add support for bridge
On 10/06/2011 01:15 PM, Corey Bryant wrote: On 10/06/2011 01:49 PM, Anthony Liguori wrote: On 10/06/2011 10:38 AM, Richa Marwaha wrote: The most common use of -net tap is to connect a tap device to a bridge. This requires the use of a script and running qemu as root in order to allocate a tap device to pass to the script. This model is great for portability and flexibility but it's incredibly difficult to eliminate the need to run qemu as root. The only really viable mechanism is to use tunctl to create a tap device, attach it to a bridge as root, and then hand that tap device to qemu. The problem with this mechanism is that it requires administrator intervention whenever a user wants to create a guest. By essentially writing a helper that implements the most common qemu-ifup script that can be safely given cap_net_admin, we can dramatically simplify things for non-privileged users. We still support existing -net tap options as a mechanism for advanced users and backwards compatibility. Currently, this is very Linux centric but there's really no reason why it couldn't be extended for other Unixes. The default bridge that we attach to is qemubr0. The thinking is that a distro could preconfigure such an interface to allow out-of-the-box bridged networking. Alternatively, if a user wants to use a different bridge, they can say: qemu-hda linux.img -net tap,br=br0,helper=/usr/local/libexec/qemu-bridge-helper -net nic,model=virtio Wouldn't it be better to make the syntax: -net bridge[,br=BRIDGE][,helper=HELPER] And default BRIDGE to br0 and HELPER to ${prefix}/libexec/qemu-bridge-helper ? That gives distros a proper way to configure a default bridge making -net bridge Just Work for most people. Regards, Anthony Liguori Yes I think it would be much more usable under -net bridge. I really wanted this to work under -net tap (where fd and init are) but now we know there's no good way to default to the helper without spelling out the path. 
I'm certainly in favor of leaving helper as part of -net tap, but I think there should be a -net bridge in addition. Regards, Anthony Liguori
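A minimal sketch of the defaulting behaviour discussed above, assuming a hypothetical install prefix of /usr/local and hypothetical names (SKETCH_PREFIX, bridge_opts_resolve); the real patch derives the helper path at configure time and may choose a different default bridge:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical prefix for illustration only; configure would supply this. */
#ifndef SKETCH_PREFIX
#define SKETCH_PREFIX "/usr/local"
#endif

#define SKETCH_DEFAULT_BRIDGE "br0"
#define SKETCH_DEFAULT_HELPER SKETCH_PREFIX "/libexec/qemu-bridge-helper"

struct bridge_opts {
    const char *br;      /* bridge to attach the tap device to */
    const char *helper;  /* path of the helper binary to execute */
};

/* Fill in defaults for any option the user left out (NULL or empty). */
static struct bridge_opts bridge_opts_resolve(const char *br,
                                              const char *helper)
{
    struct bridge_opts o;

    o.br = (br && *br) ? br : SKETCH_DEFAULT_BRIDGE;
    o.helper = (helper && *helper) ? helper : SKETCH_DEFAULT_HELPER;
    return o;
}
```

With this shape, a bare -net bridge would resolve to br0 and the packaged helper, while br= and helper= remain available as overrides.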
Re: [Qemu-devel] [PATCH 4/4] Add support for bridge
On 10/06/2011 01:49 PM, Anthony Liguori wrote: On 10/06/2011 10:38 AM, Richa Marwaha wrote: The most common use of -net tap is to connect a tap device to a bridge. This requires the use of a script and running qemu as root in order to allocate a tap device to pass to the script. This model is great for portability and flexibility but it's incredibly difficult to eliminate the need to run qemu as root. The only really viable mechanism is to use tunctl to create a tap device, attach it to a bridge as root, and then hand that tap device to qemu. The problem with this mechanism is that it requires administrator intervention whenever a user wants to create a guest. By essentially writing a helper that implements the most common qemu-ifup script that can be safely given cap_net_admin, we can dramatically simplify things for non-privileged users. We still support existing -net tap options as a mechanism for advanced users and backwards compatibility. Currently, this is very Linux centric but there's really no reason why it couldn't be extended for other Unixes. The default bridge that we attach to is qemubr0. The thinking is that a distro could preconfigure such an interface to allow out-of-the-box bridged networking. Alternatively, if a user wants to use a different bridge, they can say: qemu-hda linux.img -net tap,br=br0,helper=/usr/local/libexec/qemu-bridge-helper -net nic,model=virtio Wouldn't it be better to make the syntax: -net bridge[,br=BRIDGE][,helper=HELPER] And default BRIDGE to br0 and HELPER to ${prefix}/libexec/qemu-bridge-helper ? That gives distros a proper way to configure a default bridge making -net bridge Just Work for most people. Regards, Anthony Liguori Yes I think it would be much more usable under -net bridge. I really wanted this to work under -net tap (where fd and init are) but now we know there's no good way to default to the helper without spelling out the path. We'll move to -net bridge if folks are in agreement and default to bridge br0. 
Signed-off-by: Richa Marwaha --- configure | 2 + net.c | 8 +++ net.h | 2 + net/tap.c | 150 --- qemu-options.hx | 48 +- 5 files changed, 190 insertions(+), 20 deletions(-) diff --git a/configure b/configure index f46e9b7..ef05954 100755 --- a/configure +++ b/configure @@ -2775,6 +2775,8 @@ echo "sysconfdir=$sysconfdir">> $config_host_mak echo "docdir=$docdir">> $config_host_mak echo "libexecdir=\${prefix}/libexec">> $config_host_mak echo "confdir=$confdir">> $config_host_mak +echo "CONFIG_QEMU_SHAREDIR=\"$prefix$datasuffix\"">> $config_host_mak +echo "CONFIG_QEMU_HELPERDIR=\"$prefix/libexec\"">> $config_host_mak case "$cpu" in i386|x86_64|alpha|cris|hppa|ia64|lm32|m68k|microblaze|mips|mips64|ppc|ppc64|s390|s390x|sparc|sparc64|unicore32) diff --git a/net.c b/net.c index d05930c..4c3c551 100644 --- a/net.c +++ b/net.c @@ -956,6 +956,14 @@ static const struct { .type = QEMU_OPT_STRING, .help = "script to shut down the interface", }, { + .name = "br", + .type = QEMU_OPT_STRING, + .help = "bridge name", + }, { + .name = "helper", + .type = QEMU_OPT_STRING, + .help = "command to execute to configure bridge", + }, { .name = "sndbuf", .type = QEMU_OPT_SIZE, .help = "send buffer limit" diff --git a/net.h b/net.h index 9f633f8..eeb19a7 100644 --- a/net.h +++ b/net.h @@ -174,6 +174,8 @@ int do_netdev_del(Monitor *mon, const QDict *qdict, QObject **ret_data); #define DEFAULT_NETWORK_SCRIPT "/etc/qemu-ifup" #define DEFAULT_NETWORK_DOWN_SCRIPT "/etc/qemu-ifdown" +#define DEFAULT_BRIDGE_HELPER CONFIG_QEMU_HELPERDIR "/qemu-bridge-helper" +#define DEFAULT_BRIDGE_INTERFACE "qemubr0" void qdev_set_nic_properties(DeviceState *dev, NICInfo *nd); diff --git a/net/tap.c b/net/tap.c index 1f26dc9..74f103a 100644 --- a/net/tap.c +++ b/net/tap.c @@ -388,6 +388,108 @@ static int launch_script(const char *setup_script, const char *ifname, int fd) return -1; } +static int recv_fd(int c) +{ + int fd; + uint8_t msgbuf[CMSG_SPACE(sizeof(fd))]; + struct msghdr msg = { + .msg_control = msgbuf, + 
.msg_controllen = sizeof(msgbuf), + }; + struct cmsghdr *cmsg; + struct iovec iov; + uint8_t req[1]; + ssize_t len; + + cmsg = CMSG_FIRSTHDR(&msg); + cmsg->cmsg_level = SOL_SOCKET; + cmsg->cmsg_type = SCM_RIGHTS; + cmsg->cmsg_len = CMSG_LEN(sizeof(fd)); + msg.msg_controllen = cmsg->cmsg_len; + + iov.iov_base = req; + iov.iov_len = sizeof(req); + + msg.msg_iov =&iov; + msg.msg_iovlen = 1; + + len = recvmsg(c,&msg, 0); + if (len> 0) { + memcpy(&fd, CMSG_DATA(cmsg), sizeof(fd)); + return fd; + } + + return len; +} + +static int net_bridge_run_helper(const char *helper, const char *bridge) +{ + sigset_t oldmask, mask; + int pid, status; + char *args[5]; + char **parg; + int sv[2]; + + sigemptyset(&mask); + sigaddset(&mask, SIGCHLD); + sigprocmask(SIG_BLOCK,&mask,&oldmask); + + if (socketpair(PF_UNIX, SOCK_STREAM, 0, sv) == -1) { + return -1; + } + + /* try to launch bridge helper */ + pid
Re: [Qemu-devel] [PATCH 1/4] Add basic version of bridge helper
On 10/06/2011 01:44 PM, Anthony Liguori wrote: On 10/06/2011 10:38 AM, Richa Marwaha wrote: This patch adds a helper that can be used to create a tap device attached to a bridge device. Since this helper is minimal in what it does, it can be given CAP_NET_ADMIN which allows qemu to avoid running as root while still satisfying the majority of what users tend to want to do with tap devices. The way this all works is that qemu launches this helper passing a bridge name and the name of an inherited file descriptor. The descriptor is one end of a socketpair() of domain sockets. This domain socket is used to transmit a file descriptor of the opened tap device from the helper to qemu. The helper can then exit and let qemu use the tap device. Signed-off-by: Richa Marwaha --- Makefile | 12 +++- configure | 1 + qemu-bridge-helper.c | 205 ++ 3 files changed, 216 insertions(+), 2 deletions(-) create mode 100644 qemu-bridge-helper.c diff --git a/Makefile b/Makefile index 6ed3194..f2caedc 100644 --- a/Makefile +++ b/Makefile @@ -34,6 +34,8 @@ $(call set-vpath, $(SRC_PATH):$(SRC_PATH)/hw) LIBS+=-lz $(LIBS_TOOLS) +HELPERS-$(CONFIG_LINUX) = qemu-bridge-helper$(EXESUF) + ifdef BUILD_DOCS DOCS=qemu-doc.html qemu-tech.html qemu.1 qemu-img.1 qemu-nbd.8 QMP/qmp-commands.txt else @@ -74,7 +76,7 @@ defconfig: -include config-all-devices.mak -build-all: $(DOCS) $(TOOLS) recurse-all +build-all: $(DOCS) $(TOOLS) $(HELPERS-y) recurse-all config-host.h: config-host.h-timestamp config-host.h-timestamp: config-host.mak @@ -151,6 +153,8 @@ qemu-nbd$(EXESUF): qemu-nbd.o qemu-tool.o qemu-error.o $(oslib-obj-y) $(trace-ob qemu-io$(EXESUF): qemu-io.o cmd.o qemu-tool.o qemu-error.o $(oslib-obj-y) $(trace-obj-y) $(block-obj-y) $(qobject-obj-y) $(version-obj-y) qemu-timer-common.o +qemu-bridge-helper$(EXESUF): qemu-bridge-helper.o + qemu-img-cmds.h: $(SRC_PATH)/qemu-img-cmds.hx $(call quiet-command,sh $(SRC_PATH)/scripts/hxtool -h< $< > $@," GEN $@") @@ -208,7 +212,7 @@ clean: # avoid old build problems 
by removing potentially incorrect old files rm -f config.mak op-i386.h opc-i386.h gen-op-i386.h op-arm.h opc-arm.h gen-op-arm.h rm -f qemu-options.def - rm -f *.o *.d *.a *.lo $(TOOLS) qemu-ga TAGS cscope.* *.pod *~ */*~ + rm -f *.o *.d *.a *.lo $(TOOLS) $(HELPERS-y) qemu-ga TAGS cscope.* *.pod *~ */*~ rm -Rf .libs rm -f slirp/*.o slirp/*.d audio/*.o audio/*.d block/*.o block/*.d net/*.o net/*.d fsdev/*.o fsdev/*.d ui/*.o ui/*.d qapi/*.o qapi/*.d qga/*.o qga/*.d rm -f qemu-img-cmds.h @@ -275,6 +279,10 @@ install: all $(if $(BUILD_DOCS),install-doc) install-sysconfig ifneq ($(TOOLS),) $(INSTALL_PROG) $(STRIP_OPT) $(TOOLS) "$(DESTDIR)$(bindir)" endif +ifneq ($(HELPERS-y),) + $(INSTALL_DIR) "$(DESTDIR)$(libexecdir)" + $(INSTALL_PROG) $(STRIP_OPT) $(HELPERS-y) "$(DESTDIR)$(libexecdir)" +endif ifneq ($(BLOBS),) $(INSTALL_DIR) "$(DESTDIR)$(datadir)" set -e; for x in $(BLOBS); do \ diff --git a/configure b/configure index 59b1494..3e32834 100755 --- a/configure +++ b/configure @@ -2742,6 +2742,7 @@ echo "mandir=$mandir">> $config_host_mak echo "datadir=$datadir">> $config_host_mak echo "sysconfdir=$sysconfdir">> $config_host_mak echo "docdir=$docdir">> $config_host_mak +echo "libexecdir=\${prefix}/libexec">> $config_host_mak echo "confdir=$confdir">> $config_host_mak case "$cpu" in diff --git a/qemu-bridge-helper.c b/qemu-bridge-helper.c new file mode 100644 index 000..4ac7b36 --- /dev/null +++ b/qemu-bridge-helper.c @@ -0,0 +1,205 @@ +/* + * QEMU Bridge Helper + * + * Copyright IBM, Corp. 2011 + * + * Authors: + * Anthony Liguori Heh, fairly sure that's not my email address ;-) I thought that was a secret identity. :) We'll update that. + * + * This work is licensed under the terms of the GNU GPL, version 2. See + * the COPYING file in the top-level directory. 
+ * + */ + +#include "config-host.h" + +#include +#include +#include +#include +#include +#include +#include + +#include +#include +#include +#include +#include + +#include + +#include + +#include "net/tap-linux.h" + +static int has_vnet_hdr(int fd) +{ + unsigned int features = 0; + struct ifreq ifreq; + + if (ioctl(fd, TUNGETFEATURES,&features) == -1) { + return -errno; + } + + if (!(features& IFF_VNET_HDR)) { + return -ENOTSUP; + } + + if (ioctl(fd, TUNGETIFF,&ifreq) != -1 || errno != EBADFD) { + return -ENOTSUP; + } + + return 1; +} + +static void prep_ifreq(struct ifreq *ifr, const char *ifname) +{ + memset(ifr, 0, sizeof(*ifr)); + snprintf(ifr->ifr_name, IFNAMSIZ, "%s", ifname); +} + +static int send_fd(int c, int fd) +{ + char msgbuf[CMSG_SPACE(sizeof(fd))]; + struct msghdr msg = { + .msg_control = msgbuf, + .msg_controllen = sizeof(msgbuf), + }; + struct cmsghdr *cmsg; + struct iovec iov; + char req[1] = { 0x00 }; + + cmsg = CMSG_FIRSTHDR(&msg); + cmsg->cmsg_level = SOL_SOCKET; + cmsg->cmsg_type = SCM_RIGHTS; + cmsg->cmsg_len = CMSG_LEN(sizeof(fd)); + msg.msg_controllen = cmsg->cmsg_len; + + iov.iov_base = req; +
Re: [Qemu-devel] [PATCH 1/4] Add basic version of bridge helper
On 10/06/2011 11:41 AM, Daniel P. Berrange wrote: On Thu, Oct 06, 2011 at 11:38:25AM -0400, Richa Marwaha wrote: This patch adds a helper that can be used to create a tap device attached to a bridge device. Since this helper is minimal in what it does, it can be given CAP_NET_ADMIN which allows qemu to avoid running as root while still satisfying the majority of what users tend to want to do with tap devices. The way this all works is that qemu launches this helper passing a bridge name and the name of an inherited file descriptor. The descriptor is one end of a socketpair() of domain sockets. This domain socket is used to transmit a file descriptor of the opened tap device from the helper to qemu. The helper can then exit and let qemu use the tap device. When QEMU is run by libvirt, we generally like to use capng to remove the ability for QEMU to run setuid programs at all. So obviously it will struggle to run the qemu-bridge-helper binary in such a scenario. With the way you transmit the TAP device FD back to the caller, it looks like libvirt itself could execute the qemu-bridge-helper receiving the FD, and then pass the FD onto QEMU using the traditional tap,fd=XX syntax. Exactly. This would allow tap-based networking using libvirt session:// URIs. The TAP device FD is only one FD we normally pass to QEMU. How about support for vhost net ? Is it reasonable to ask the qemu-bridge-helper to send back a vhost net FD also. Absolutely. Or indeed multiple vhost net FDs when we get multiqueue NICs. Should we expect the bridge helper to be strictly limited to just connecting a TAP dev to a bridge, or is the expectation that it will grow more& more functionality over time ? I would not expect it to do more than create virtual network interfaces, and add them to bridges. Multiqueue virtual nics, vhost, etc. would all be in scope as they are part of creating a virtual network interface. 
Creating the bridges and managing the bridges should be done statically by an administrator and would be out of scope. Regards, Anthony Liguori Daniel
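For anyone wanting to drive the helper from another process (libvirt, for example), here is a rough sketch of the descriptor hand-off over a socketpair() using SCM_RIGHTS. It is not the code from the patch: the function names sketch_send_fd and sketch_recv_fd are made up for illustration, and unlike the posted recv_fd() the receiver checks that a control message of the expected type and length actually arrived before copying the descriptor out.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control buffer, aligned for struct cmsghdr, with room for one int. */
union sketch_ctl {
    struct cmsghdr align;
    char buf[CMSG_SPACE(sizeof(int))];
};

/* Send descriptor fd over the connected UNIX socket sock; 0 on success. */
static int sketch_send_fd(int sock, int fd)
{
    union sketch_ctl ctl;
    char byte = 0;                      /* one byte of ordinary payload */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    struct msghdr msg;
    struct cmsghdr *cmsg;

    memset(&ctl, 0, sizeof(ctl));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof(ctl.buf);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one descriptor from sock; returns the new fd, or -1 on error. */
static int sketch_recv_fd(int sock)
{
    union sketch_ctl ctl;
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    struct msghdr msg;
    struct cmsghdr *cmsg;
    int fd;

    memset(&ctl, 0, sizeof(ctl));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof(ctl.buf);

    if (recvmsg(sock, &msg, 0) <= 0) {
        return -1;
    }

    /* Only trust the payload if the kernel really attached a descriptor. */
    cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS ||
        cmsg->cmsg_len != CMSG_LEN(sizeof(int))) {
        return -1;
    }

    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}
```

The caller would create the socketpair, start the helper with one end, call sketch_recv_fd() on the other, and pass the result on with the existing tap,fd=XX syntax.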
Re: [Qemu-devel] [PATCH 3/4] Add cap reduction support to enable use as SUID
On 10/06/2011 01:42 PM, Anthony Liguori wrote: On 10/06/2011 11:34 AM, Daniel P. Berrange wrote: On Thu, Oct 06, 2011 at 11:38:27AM -0400, Richa Marwaha wrote: The ideal way to use qemu-bridge-helper is to give it an fscap of using: setcap cap_net_admin=ep qemu-bridge-helper Unfortunately, most distros still do not have a mechanism to package files with fscaps applied. This means they'll have to SUID the qemu-bridge-helper binary. To improve security, use libcap to reduce our capability set to just cap_net_admin, then reduce privileges down to the calling user. This is hopefully close to equivalent to fscap support from a security perspective. +#ifdef CONFIG_LIBCAP +static int drop_privileges(void) +{ + cap_t cap; + cap_value_t new_caps[] = {CAP_NET_ADMIN}; + + cap = cap_init(); Check for NULL ? + + /* set capabilities to be permitted and inheritable. we don't need the + * caps to be effective right now as they'll get reset when we seteuid + * anyway */ + cap_set_flag(cap, CAP_PERMITTED, 1, new_caps, CAP_SET); + cap_set_flag(cap, CAP_INHERITABLE, 1, new_caps, CAP_SET); Check for failure ? + + if (cap_set_proc(cap) == -1) { + return -1; + } + + cap_free(cap); Check for failure ? + + /* reduce our privileges to a normal user */ + setegid(getgid()); + seteuid(getuid()); Check for failure ? + cap = cap_init(); Check for NULL ? + + /* enable the our capabilities. we marked them as inheritable earlier + * which is what allows this to work. */ + cap_set_flag(cap, CAP_EFFECTIVE, 1, new_caps, CAP_SET); + cap_set_flag(cap, CAP_PERMITTED, 1, new_caps, CAP_SET); Check for failure ? + + if (cap_set_proc(cap) == -1) { + return -1; + } + + cap_free(cap); Check for failure ? + + return 0; +} +#endif It may seem like checking for failure on cap_free/cap_set_flag is not required because they can only return EINVAL for invalid args, but since this is missing the check for NULL on cap_init you can actually see errors from those latter functions in an OOM scenario.
I think I'd suggest not using libcap, instead try libcap-ng [1] whose APIs are designed with safety in mind & result in much simpler and clearer code: eg, that entire function above can be expressed using capng with something approximating: capng_clear(CAPNG_SELECT_BOTH); if (capng_update(CAPNG_ADD, CAPNG_EFFECTIVE|CAPNG_PERMITTED, CAP_NET_ADMIN) < 0) error(...); if (capng_change_id(getuid(), getgid(), CAPNG_DROP_SUPP_GRP | CAPNG_CLEAR_BOUNDING)) error(...); Ah, libcap-ng didn't exist when the code was initially written but I agree, it looks like a nice library. Regards, Anthony Liguori This looks a lot simpler. We'll definitely look into implementing this in v2. -- Regards, Corey Regards, Daniel [1] http://people.redhat.com/sgrubb/libcap-ng/
Re: [Qemu-devel] [PATCH 4/4] Add support for bridge
On 10/06/2011 10:38 AM, Richa Marwaha wrote: The most common use of -net tap is to connect a tap device to a bridge. This requires the use of a script and running qemu as root in order to allocate a tap device to pass to the script. This model is great for portability and flexibility but it's incredibly difficult to eliminate the need to run qemu as root. The only really viable mechanism is to use tunctl to create a tap device, attach it to a bridge as root, and then hand that tap device to qemu. The problem with this mechanism is that it requires administrator intervention whenever a user wants to create a guest. By essentially writing a helper that implements the most common qemu-ifup script that can be safely given cap_net_admin, we can dramatically simplify things for non-privileged users. We still support existing -net tap options as a mechanism for advanced users and backwards compatibility. Currently, this is very Linux centric but there's really no reason why it couldn't be extended for other Unixes. The default bridge that we attach to is qemubr0. The thinking is that a distro could preconfigure such an interface to allow out-of-the-box bridged networking. Alternatively, if a user wants to use a different bridge, they can say: qemu-hda linux.img -net tap,br=br0,helper=/usr/local/libexec/qemu-bridge-helper -net nic,model=virtio Wouldn't it be better to make the syntax: -net bridge[,br=BRIDGE][,helper=HELPER] And default BRIDGE to br0 and HELPER to ${prefix}/libexec/qemu-bridge-helper ? That gives distros a proper way to configure a default bridge making -net bridge Just Work for most people. 
Regards, Anthony Liguori Signed-off-by: Richa Marwaha --- configure |2 + net.c |8 +++ net.h |2 + net/tap.c | 150 --- qemu-options.hx | 48 +- 5 files changed, 190 insertions(+), 20 deletions(-) diff --git a/configure b/configure index f46e9b7..ef05954 100755 --- a/configure +++ b/configure @@ -2775,6 +2775,8 @@ echo "sysconfdir=$sysconfdir">> $config_host_mak echo "docdir=$docdir">> $config_host_mak echo "libexecdir=\${prefix}/libexec">> $config_host_mak echo "confdir=$confdir">> $config_host_mak +echo "CONFIG_QEMU_SHAREDIR=\"$prefix$datasuffix\"">> $config_host_mak +echo "CONFIG_QEMU_HELPERDIR=\"$prefix/libexec\"">> $config_host_mak case "$cpu" in i386|x86_64|alpha|cris|hppa|ia64|lm32|m68k|microblaze|mips|mips64|ppc|ppc64|s390|s390x|sparc|sparc64|unicore32) diff --git a/net.c b/net.c index d05930c..4c3c551 100644 --- a/net.c +++ b/net.c @@ -956,6 +956,14 @@ static const struct { .type = QEMU_OPT_STRING, .help = "script to shut down the interface", }, { +.name = "br", +.type = QEMU_OPT_STRING, +.help = "bridge name", +}, { +.name = "helper", +.type = QEMU_OPT_STRING, +.help = "command to execute to configure bridge", +}, { .name = "sndbuf", .type = QEMU_OPT_SIZE, .help = "send buffer limit" diff --git a/net.h b/net.h index 9f633f8..eeb19a7 100644 --- a/net.h +++ b/net.h @@ -174,6 +174,8 @@ int do_netdev_del(Monitor *mon, const QDict *qdict, QObject **ret_data); #define DEFAULT_NETWORK_SCRIPT "/etc/qemu-ifup" #define DEFAULT_NETWORK_DOWN_SCRIPT "/etc/qemu-ifdown" +#define DEFAULT_BRIDGE_HELPER CONFIG_QEMU_HELPERDIR "/qemu-bridge-helper" +#define DEFAULT_BRIDGE_INTERFACE "qemubr0" void qdev_set_nic_properties(DeviceState *dev, NICInfo *nd); diff --git a/net/tap.c b/net/tap.c index 1f26dc9..74f103a 100644 --- a/net/tap.c +++ b/net/tap.c @@ -388,6 +388,108 @@ static int launch_script(const char *setup_script, const char *ifname, int fd) return -1; } +static int recv_fd(int c) +{ +int fd; +uint8_t msgbuf[CMSG_SPACE(sizeof(fd))]; +struct msghdr msg = { +.msg_control = 
msgbuf, +.msg_controllen = sizeof(msgbuf), +}; +struct cmsghdr *cmsg; +struct iovec iov; +uint8_t req[1]; +ssize_t len; + +cmsg = CMSG_FIRSTHDR(&msg); +cmsg->cmsg_level = SOL_SOCKET; +cmsg->cmsg_type = SCM_RIGHTS; +cmsg->cmsg_len = CMSG_LEN(sizeof(fd)); +msg.msg_controllen = cmsg->cmsg_len; + +iov.iov_base = req; +iov.iov_len = sizeof(req); + +msg.msg_iov =&iov; +msg.msg_iovlen = 1; + +len = recvmsg(c,&msg, 0); +if (len> 0) { +memcpy(&fd, CMSG_DATA(cmsg), sizeof(fd)); +return fd; +} + +return len; +} + +static int net_bridge_run_helper(const char *helper, const char *bridge) +{ +sigset_t oldmask, mask; +int pid, status; +char *args[5]; +char **parg; +int sv[2]; + +sigemptyset(&mask); +sigaddset(&mask, SIGCHLD); +sigprocmask(SIG_BLOCK,&mask,&oldmask); + +if (socketpair(
Re: [Qemu-devel] [PATCH 3/4] Add cap reduction support to enable use as SUID
On 10/06/2011 11:34 AM, Daniel P. Berrange wrote: On Thu, Oct 06, 2011 at 11:38:27AM -0400, Richa Marwaha wrote: The ideal way to use qemu-bridge-helper is to give it an fscap of using: setcap cap_net_admin=ep qemu-bridge-helper Unfortunately, most distros still do not have a mechanism to package files with fscaps applied. This means they'll have to SUID the qemu-bridge-helper binary. To improve security, use libcap to reduce our capability set to just cap_net_admin, then reduce privileges down to the calling user. This is hopefully close to equivalent to fscap support from a security perspective. +#ifdef CONFIG_LIBCAP +static int drop_privileges(void) +{ +cap_t cap; +cap_value_t new_caps[] = {CAP_NET_ADMIN}; + +cap = cap_init(); Check for NULL ? + +/* set capabilities to be permitted and inheritable. we don't need the + * caps to be effective right now as they'll get reset when we seteuid + * anyway */ +cap_set_flag(cap, CAP_PERMITTED, 1, new_caps, CAP_SET); +cap_set_flag(cap, CAP_INHERITABLE, 1, new_caps, CAP_SET); Check for failure ? + +if (cap_set_proc(cap) == -1) { +return -1; +} + +cap_free(cap); Check for failure ? + +/* reduce our privileges to a normal user */ +setegid(getgid()); +seteuid(getuid()); Check for failure ? +cap = cap_init(); Check for NULL ? + +/* enable the our capabilities. we marked them as inheritable earlier + * which is what allows this to work. */ +cap_set_flag(cap, CAP_EFFECTIVE, 1, new_caps, CAP_SET); +cap_set_flag(cap, CAP_PERMITTED, 1, new_caps, CAP_SET); Check for failure ? + +if (cap_set_proc(cap) == -1) { +return -1; +} + +cap_free(cap); Check for failure ? + +return 0; +} +#endif It may seem like checking for failure on cap_free/cap_set_flag is not required because they can only return EINVAL for invalid args, but since this is missing the check for NULL on cap_init you can actually see errors from those latter functions in an OOM scenario.
I think I'd suggest not using libcap, instead try libcap-ng [1] whose APIs are designed with safety in mind & result in much simpler and clearer code: eg, that entire function above can be expressed using capng with something approximating: capng_clear(CAPNG_SELECT_BOTH); if (capng_update(CAPNG_ADD, CAPNG_EFFECTIVE|CAPNG_PERMITTED, CAP_NET_ADMIN) < 0) error(...); if (capng_change_id(getuid(), getgid(), CAPNG_DROP_SUPP_GRP | CAPNG_CLEAR_BOUNDING)) error(...); Ah, libcap-ng didn't exist when the code was initially written but I agree, it looks like a nice library. Regards, Anthony Liguori Regards, Daniel [1] http://people.redhat.com/sgrubb/libcap-ng/
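Independent of which capability library ends up being used, the "Check for failure ?" remarks on setegid()/seteuid() can be addressed with plain POSIX calls. The following is only a sketch of that one step, with a made-up function name (sketch_drop_to_real_ids); it deliberately leaves out all capability handling, so it is not a replacement for drop_privileges() in the patch.

```c
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Drop the effective IDs of a SUID/SGID process back to the real
 * (calling) user. The group is changed first, because once the
 * effective UID is no longer privileged setegid() may be refused.
 * Returns 0 on success, -1 if any step fails.
 */
static int sketch_drop_to_real_ids(void)
{
    gid_t gid = getgid();
    uid_t uid = getuid();

    if (setegid(gid) == -1) {
        perror("setegid");
        return -1;
    }
    if (seteuid(uid) == -1) {
        perror("seteuid");
        return -1;
    }

    /* Verify rather than assume that the change took effect. */
    if (getegid() != gid || geteuid() != uid) {
        fprintf(stderr, "effective IDs were not dropped\n");
        return -1;
    }
    return 0;
}
```

A helper built this way should treat a -1 return as fatal and exit before touching any network device.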
Re: [Qemu-devel] [PATCH 1/4] Add basic version of bridge helper
On 10/06/2011 10:38 AM, Richa Marwaha wrote: This patch adds a helper that can be used to create a tap device attached to a bridge device. Since this helper is minimal in what it does, it can be given CAP_NET_ADMIN which allows qemu to avoid running as root while still satisfying the majority of what users tend to want to do with tap devices. The way this all works is that qemu launches this helper passing a bridge name and the name of an inherited file descriptor. The descriptor is one end of a socketpair() of domain sockets. This domain socket is used to transmit a file descriptor of the opened tap device from the helper to qemu. The helper can then exit and let qemu use the tap device. Signed-off-by: Richa Marwaha --- Makefile | 12 +++- configure|1 + qemu-bridge-helper.c | 205 ++ 3 files changed, 216 insertions(+), 2 deletions(-) create mode 100644 qemu-bridge-helper.c diff --git a/Makefile b/Makefile index 6ed3194..f2caedc 100644 --- a/Makefile +++ b/Makefile @@ -34,6 +34,8 @@ $(call set-vpath, $(SRC_PATH):$(SRC_PATH)/hw) LIBS+=-lz $(LIBS_TOOLS) +HELPERS-$(CONFIG_LINUX) = qemu-bridge-helper$(EXESUF) + ifdef BUILD_DOCS DOCS=qemu-doc.html qemu-tech.html qemu.1 qemu-img.1 qemu-nbd.8 QMP/qmp-commands.txt else @@ -74,7 +76,7 @@ defconfig: -include config-all-devices.mak -build-all: $(DOCS) $(TOOLS) recurse-all +build-all: $(DOCS) $(TOOLS) $(HELPERS-y) recurse-all config-host.h: config-host.h-timestamp config-host.h-timestamp: config-host.mak @@ -151,6 +153,8 @@ qemu-nbd$(EXESUF): qemu-nbd.o qemu-tool.o qemu-error.o $(oslib-obj-y) $(trace-ob qemu-io$(EXESUF): qemu-io.o cmd.o qemu-tool.o qemu-error.o $(oslib-obj-y) $(trace-obj-y) $(block-obj-y) $(qobject-obj-y) $(version-obj-y) qemu-timer-common.o +qemu-bridge-helper$(EXESUF): qemu-bridge-helper.o + qemu-img-cmds.h: $(SRC_PATH)/qemu-img-cmds.hx $(call quiet-command,sh $(SRC_PATH)/scripts/hxtool -h< $< > $@," GEN $@") @@ -208,7 +212,7 @@ clean: # avoid old build problems by removing potentially incorrect old files rm 
-f config.mak op-i386.h opc-i386.h gen-op-i386.h op-arm.h opc-arm.h gen-op-arm.h rm -f qemu-options.def - rm -f *.o *.d *.a *.lo $(TOOLS) qemu-ga TAGS cscope.* *.pod *~ */*~ + rm -f *.o *.d *.a *.lo $(TOOLS) $(HELPERS-y) qemu-ga TAGS cscope.* *.pod *~ */*~ rm -Rf .libs rm -f slirp/*.o slirp/*.d audio/*.o audio/*.d block/*.o block/*.d net/*.o net/*.d fsdev/*.o fsdev/*.d ui/*.o ui/*.d qapi/*.o qapi/*.d qga/*.o qga/*.d rm -f qemu-img-cmds.h @@ -275,6 +279,10 @@ install: all $(if $(BUILD_DOCS),install-doc) install-sysconfig ifneq ($(TOOLS),) $(INSTALL_PROG) $(STRIP_OPT) $(TOOLS) "$(DESTDIR)$(bindir)" endif +ifneq ($(HELPERS-y),) + $(INSTALL_DIR) "$(DESTDIR)$(libexecdir)" + $(INSTALL_PROG) $(STRIP_OPT) $(HELPERS-y) "$(DESTDIR)$(libexecdir)" +endif ifneq ($(BLOBS),) $(INSTALL_DIR) "$(DESTDIR)$(datadir)" set -e; for x in $(BLOBS); do \ diff --git a/configure b/configure index 59b1494..3e32834 100755 --- a/configure +++ b/configure @@ -2742,6 +2742,7 @@ echo "mandir=$mandir">> $config_host_mak echo "datadir=$datadir">> $config_host_mak echo "sysconfdir=$sysconfdir">> $config_host_mak echo "docdir=$docdir">> $config_host_mak +echo "libexecdir=\${prefix}/libexec">> $config_host_mak echo "confdir=$confdir">> $config_host_mak case "$cpu" in diff --git a/qemu-bridge-helper.c b/qemu-bridge-helper.c new file mode 100644 index 000..4ac7b36 --- /dev/null +++ b/qemu-bridge-helper.c @@ -0,0 +1,205 @@ +/* + * QEMU Bridge Helper + * + * Copyright IBM, Corp. 2011 + * + * Authors: + * Anthony Liguori Heh, fairly sure that's not my email address ;-) + * + * This work is licensed under the terms of the GNU GPL, version 2. See + * the COPYING file in the top-level directory. 
+ * + */ + +#include "config-host.h" + +#include +#include +#include +#include +#include +#include +#include + +#include +#include +#include +#include +#include + +#include + +#include + +#include "net/tap-linux.h" + +static int has_vnet_hdr(int fd) +{ +unsigned int features = 0; +struct ifreq ifreq; + +if (ioctl(fd, TUNGETFEATURES,&features) == -1) { +return -errno; +} + +if (!(features& IFF_VNET_HDR)) { +return -ENOTSUP; +} + +if (ioctl(fd, TUNGETIFF,&ifreq) != -1 || errno != EBADFD) { +return -ENOTSUP; +} + +return 1; +} + +static void prep_ifreq(struct ifreq *ifr, const char *ifname) +{ +memset(ifr, 0, sizeof(*ifr)); +snprintf(ifr->ifr_name, IFNAMSIZ, "%s", ifname); +} + +static int send_fd(int c, int fd) +{ +char msgbuf[CMSG_SPACE(sizeof(fd))]; +struct msghdr msg = { +.msg_control = msgbuf, +.msg_controllen = sizeof(msgbuf), +}; +struct cmsghdr *cmsg; +struct iovec iov; +char req[1] = { 0x00 }; + +cmsg = CMSG_FIRSTHDR(&ms
Re: [Qemu-devel] [PATCH 1/4] Add basic version of bridge helper
On Thu, Oct 06, 2011 at 11:38:25AM -0400, Richa Marwaha wrote: > This patch adds a helper that can be used to create a tap device attached to > a bridge device. Since this helper is minimal in what it does, it can be > given CAP_NET_ADMIN which allows qemu to avoid running as root while still > satisfying the majority of what users tend to want to do with tap devices. > > The way this all works is that qemu launches this helper passing a bridge > name and the name of an inherited file descriptor. The descriptor is one > end of a socketpair() of domain sockets. This domain socket is used to > transmit a file descriptor of the opened tap device from the helper to qemu. > > The helper can then exit and let qemu use the tap device. When QEMU is run by libvirt, we generally like to use capng to remove the ability for QEMU to run setuid programs at all. So obviously it will struggle to run the qemu-bridge-helper binary in such a scenario. With the way you transmit the TAP device FD back to the caller, it looks like libvirt itself could execute the qemu-bridge-helper receiving the FD, and then pass the FD onto QEMU using the traditional tap,fd=XX syntax. The TAP device FD is only one FD we normally pass to QEMU. How about support for vhost net ? Is it reasonable to ask the qemu-bridge-helper to send back a vhost net FD also. Or indeed multiple vhost net FDs when we get multiqueue NICs. Should we expect the bridge helper to be strictly limited to just connecting a TAP dev to a bridge, or is the expectation that it will grow more & more functionality over time ? Daniel -- |: http://berrange.com -o-http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
Re: [Qemu-devel] [PATCH 3/4] Add cap reduction support to enable use as SUID
On Thu, Oct 06, 2011 at 11:38:27AM -0400, Richa Marwaha wrote: > The ideal way to use qemu-bridge-helper is to give it an fscap of using: > > setcap cap_net_admin=ep qemu-bridge-helper > > Unfortunately, most distros still do not have a mechanism to package files > with fscaps applied. This means they'll have to SUID the qemu-bridge-helper > binary. > > To improve security, use libcap to reduce our capability set to just > cap_net_admin, then reduce privileges down to the calling user. This is > hopefully close to equivalent to fscap support from a security perspective. > +#ifdef CONFIG_LIBCAP > +static int drop_privileges(void) > +{ > +cap_t cap; > +cap_value_t new_caps[] = {CAP_NET_ADMIN}; > + > +cap = cap_init(); Check for NULL ? > + > +/* set capabilities to be permitted and inheritable. we don't need the > + * caps to be effective right now as they'll get reset when we seteuid > + * anyway */ > +cap_set_flag(cap, CAP_PERMITTED, 1, new_caps, CAP_SET); > +cap_set_flag(cap, CAP_INHERITABLE, 1, new_caps, CAP_SET); Check for failure ? > + > +if (cap_set_proc(cap) == -1) { > +return -1; > +} > + > +cap_free(cap); Check for failure ? > + > +/* reduce our privileges to a normal user */ > +setegid(getgid()); > +seteuid(getuid()); Check for failure ? > +cap = cap_init(); Check for NULL ? > + > +/* enable the our capabilities. we marked them as inheritable earlier > + * which is what allows this to work. */ > +cap_set_flag(cap, CAP_EFFECTIVE, 1, new_caps, CAP_SET); > +cap_set_flag(cap, CAP_PERMITTED, 1, new_caps, CAP_SET); Check for failure ? > + > +if (cap_set_proc(cap) == -1) { > +return -1; > +} > + > +cap_free(cap); Check for failure ? > + > +return 0; > +} > +#endif It may seem like checking for failure on cap_free/cap_set_flag is not required because they can only return EINVAL for invalid args, but since this is missing the check for NULL on cap_init you can actually see errors from those latter functions in an OOM scenario.
I think I'd suggest not using libcap, instead try libcap-ng [1] whose APIs are designed with safety in mind & result in much simpler and clearer code: eg, that entire function above can be expressed using capng with something approximating: capng_clear(CAPNG_SELECT_BOTH); if (capng_update(CAPNG_ADD, CAPNG_EFFECTIVE|CAPNG_PERMITTED, CAP_NET_ADMIN) < 0) error(...); if (capng_change_id(getuid(), getgid(), CAPNG_DROP_SUPP_GRP | CAPNG_CLEAR_BOUNDING)) error(...); Regards, Daniel [1] http://people.redhat.com/sgrubb/libcap-ng/ -- |: http://berrange.com -o-http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
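As an illustration of the "check for failure" remarks on setegid()/seteuid() in the review above, here is a minimal sketch that uses only POSIX calls. The function name drop_to_real_ids() is hypothetical and not part of the patch, and the capability handling is deliberately left out, so this only covers the ID-switching step.

```c
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper, not part of the patch: switch the effective IDs
 * back to the real (calling) user and report any failure to the caller. */
static int drop_to_real_ids(void)
{
    /* Change the group first: once the effective uid is no longer 0 the
     * process may no longer be permitted to change its gid. */
    if (setegid(getgid()) == -1) {
        perror("setegid");
        return -1;
    }
    if (seteuid(getuid()) == -1) {
        perror("seteuid");
        return -1;
    }
    /* Verify that the change really took effect before continuing. */
    if (geteuid() != getuid() || getegid() != getgid()) {
        return -1;
    }
    return 0;
}
```

A caller would treat a -1 return as fatal and exit before touching any tap or bridge device, in the same way main() in the patch treats a failing drop_privileges().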
Re: [Qemu-devel] [RFC] Use TCGReg for all TCG targets?
On 10/06/2011 09:24 AM, Stefan Weil wrote:
> Is there consensus that this is a good idea, or should
> TCGReg be removed (then all TCG targets use int) or only
> used for s390?

I think it's a good idea.

r~
[Qemu-devel] Integrating Dynamips and GNS3 UDP tunnels (Patches)
The GNS3 team developed a GUI to interconnect different kinds of emulated hardware. To connect the hosts over a network, a single protocol is used: a UDP tunneling protocol introduced by Dynamips (a Cisco hardware emulator). Since the beginning, GNS3 has supported Qemu by providing patches to its users; these patches add an implementation of the Dynamips UDP tunneling protocol to Qemu. As GNS3 improves and now supports VirtualBox, it is time to free users from the hassle of having to patch Qemu themselves. FreeBSD has integrated our patches in the ports tree, we ship a patched Qemu for Windows, and we are now looking to integrate those patches upstream.

Here are the patches, which apply to the latest release of Qemu. I hereby submit them for your review.

1) Basic patch in order to build the new source file
   http://code.gns3.net/qemu-patches/file/6a927b6cdaf8/Makefile_objs.patch
2) Parse -net udp
   http://code.gns3.net/qemu-patches/file/6a927b6cdaf8/net_c.patch
3) New NET_CLIENT_TYPE_UDP macro
   http://code.gns3.net/qemu-patches/file/6a927b6cdaf8/net_h.patch
4) New source code file, implementation of the UDP tunneling protocol
   http://code.gns3.net/qemu-patches/file/6a927b6cdaf8/net_udp_c.patch
5) Corresponding header file
   http://code.gns3.net/qemu-patches/file/6a927b6cdaf8/net_udp_h.patch

The hw_e1000_c.patch is no longer needed; it was a dirty hack that we kept for too long. The block_raw-win32_c.patch fixes a minor issue that arises only on Windows and may deserve a separate thread.

Please include me in the replies as I am not subscribed to the list.

Regards,
Benjamin
GNS3 contributor
[Qemu-devel] [PATCH] hw/9pfs: Fix build error on platform that don't support futimens
Signed-off-by: Aneesh Kumar K.V --- hw/9pfs/virtio-9p-handle.c | 14 +- 1 files changed, 13 insertions(+), 1 deletions(-) diff --git a/hw/9pfs/virtio-9p-handle.c b/hw/9pfs/virtio-9p-handle.c index 860b0e3..9860a87 100644 --- a/hw/9pfs/virtio-9p-handle.c +++ b/hw/9pfs/virtio-9p-handle.c @@ -386,12 +386,17 @@ static int handle_utimensat(FsContext *ctx, V9fsPath *fs_path, int fd, ret; struct handle_data *data = (struct handle_data *)ctx->private; +#ifdef CONFIG_UTIMENSAT fd = open_by_handle(data->mountfd, fs_path->data, O_NONBLOCK); if (fd < 0) { return fd; } ret = futimens(fd, buf); close(fd); +#else +ret = -1; +errno = ENOSYS; +#endif return ret; } @@ -591,8 +596,15 @@ static int handle_init(FsContext *ctx) int ret, mnt_id; struct statfs stbuf; struct file_handle fh; -struct handle_data *data = g_malloc(sizeof(struct handle_data)); +struct handle_data *data; +#ifndef CONFIG_UTIMENSAT +/* + * We support handle fs driver only if futimens is provided by the host + */ +return -1; +#endif +data = g_malloc(sizeof(struct handle_data)); data->mountfd = open(ctx->fs_root, O_DIRECTORY); if (data->mountfd < 0) { ret = data->mountfd; -- 1.7.4.1
[Qemu-devel] [PATCH 5/5] Revert "savevm: fix corruption in vmstate_subsection_load()."
This reverts commit eb60260de0b050a5e8ab725e84d377d0b44c43ae. Conflicts: savevm.c We changed qemu_peek_byte() prototype, just fixed the rejects. Signed-off-by: Juan Quintela Reviewed-by: Anthony Liguori --- savevm.c | 10 +- 1 files changed, 1 insertions(+), 9 deletions(-) diff --git a/savevm.c b/savevm.c index 28c0a43..1c62269 100644 --- a/savevm.c +++ b/savevm.c @@ -1704,12 +1704,6 @@ static const VMStateDescription *vmstate_get_subsection(const VMStateSubsection static int vmstate_subsection_load(QEMUFile *f, const VMStateDescription *vmsd, void *opaque) { -const VMStateSubsection *sub = vmsd->subsections; - -if (!sub || !sub->needed) { -return 0; -} - while (qemu_peek_byte(f, 0) == QEMU_VM_SUBSECTION) { char idstr[256]; int ret; @@ -1731,7 +1725,7 @@ static int vmstate_subsection_load(QEMUFile *f, const VMStateDescription *vmsd, /* it don't have a valid subsection name */ return 0; } -sub_vmsd = vmstate_get_subsection(sub, idstr); +sub_vmsd = vmstate_get_subsection(vmsd->subsections, idstr); if (sub_vmsd == NULL) { return -ENOENT; } @@ -1740,7 +1734,6 @@ static int vmstate_subsection_load(QEMUFile *f, const VMStateDescription *vmsd, qemu_file_skip(f, len); /* idstr */ version_id = qemu_get_be32(f); -assert(!sub_vmsd->subsections); ret = vmstate_load_state(f, sub_vmsd, opaque, version_id); if (ret) { return ret; @@ -1764,7 +1757,6 @@ static void vmstate_subsection_save(QEMUFile *f, const VMStateDescription *vmsd, qemu_put_byte(f, len); qemu_put_buffer(f, (uint8_t *)vmsd->name, len); qemu_put_be32(f, vmsd->version_id); -assert(!vmsd->subsections); vmstate_save_state(f, vmsd, opaque); } sub++; -- 1.7.6.4
Re: [Qemu-devel] [PATCH 4/5] savevm: improve subsections detection on load
On 10/06/2011 06:21 PM, Juan Quintela wrote:
+
+int qemu_get_buffer(QEMUFile *f, uint8_t *buf, int size)
+{
+    int pending = size;
+    int done = 0;
+
+    while (pending > 0) {
+        int res;
+
+        res = qemu_peek_buffer(f, buf, pending, 0);
+        if (res == 0) {
+            return 0;
         }
-        memcpy(buf, f->buf + f->buf_index, l);
-        f->buf_index += l;
-        buf += l;
-        size -= l;
+        qemu_file_skip(f, res);
+        buf += res;
+        pending -= res;
+        done += res;
     }
-    return size1 - size;
+    return done;
 }

This changes semantics for reads above 32KB. It should be in the commit message, or preferably v1 could be committed instead. :)

Paolo
Re: [Qemu-devel] [PATCH 3/5] savevm: define qemu_get_byte() using qemu_peek_byte()
On 10/06/2011 06:21 PM, Juan Quintela wrote:
+    result = qemu_peek_byte(f);
+
+    if (f->buf_index < f->buf_size) {
+        f->buf_index++;
     }

This should really be an assert that f->buf_index < f->buf_size, otherwise qemu_peek_byte has read garbage.

Paolo
[Qemu-devel] [RFC] Use TCGReg for all TCG targets?
Hi, commit 48bb3750e13cbb5a634d3aeab5191d74d124232f introduced the data type 'TCGReg' in tcg/s390. Today, s390 is the only TCG target which uses TCGReg. This causes a conflict with my commit c0ad3001bf12292b137b05e1c4643f31c6b0a727, because some function prototypes in tcg/s390/tcg-target.c differ from those in all the other TCG targets. Builds on s390 hosts are broken now. I'd like to use TCGReg in all TCG targets, thus fixing the conflict and improving readability of the code ('TCGReg' is more specific than 'int'). Is there consensus that this is a good idea, or should TCGReg be removed (then all TCG targets use int) or only used for s390? I cc'ed all TCG maintainers because their code would have to be changed. Regards, Stefan Weil
[Qemu-devel] [PATCH 4/5] savevm: improve subsections detection on load
We add qemu_peek_buffer(), which is identical to qemu_get_buffer() except that it doesn't update f->buf_index. We add a parameter to qemu_peek_byte() to be able to peek at more than one byte. Once this is done, to see if we have a subsection we check that:
- the 1st byte is QEMU_VM_SUBSECTION
- the 2nd byte is a length, and is bigger than the section name
- the 3rd element is a string that starts with section_name

So, we shouldn't have false positives (yes, the content could still mislead us, but the probability is really low).

v2:
- Alex Williamson found that we could get negative values on index.
- Rework code to fix that part.
- Rewrite qemu_get_buffer() using qemu_peek_buffer()

Signed-off-by: Juan Quintela
---
 savevm.c | 110 ++
 1 files changed, 75 insertions(+), 35 deletions(-)

diff --git a/savevm.c b/savevm.c index 94628c6..28c0a43 100644 --- a/savevm.c +++ b/savevm.c @@ -532,59 +532,85 @@ void qemu_put_byte(QEMUFile *f, int v) qemu_fflush(f); } -int qemu_get_buffer(QEMUFile *f, uint8_t *buf, int size1) +static void qemu_file_skip(QEMUFile *f, int size) { -int size, l; +if (f->buf_index + size < f->buf_size) { +f->buf_index += size; +} +} + +static int qemu_peek_buffer(QEMUFile *f, uint8_t *buf, int size, size_t offset) +{ +int pending; +int index; if (f->is_write) { abort(); } -size = size1; -while (size > 0) { -l = f->buf_size - f->buf_index; -if (l == 0) { -qemu_fill_buffer(f); -l = f->buf_size - f->buf_index; -if (l == 0) { -break; -} -} -if (l > size) { -l = size; +index = f->buf_index + offset; +pending = f->buf_size - index; +if (pending < size) { +qemu_fill_buffer(f); +index = f->buf_index + offset; +pending = f->buf_size - index; +} + +if (pending <= 0) { +return 0; +} +if (size > pending) { +size = pending; +} + +memcpy(buf, f->buf + index, size); +return size; +} + +int qemu_get_buffer(QEMUFile *f, uint8_t *buf, int size) +{ +int pending = size; +int done = 0; + +while (pending > 0) { +int res; + +res = qemu_peek_buffer(f, buf, pending, 0); +if (res == 0) { +return 0; } -memcpy(buf, 
f->buf + f->buf_index, l); -f->buf_index += l; -buf += l; -size -= l; +qemu_file_skip(f, res); +buf += res; +pending -= res; +done += res; } -return size1 - size; +return done; } -static int qemu_peek_byte(QEMUFile *f) +static int qemu_peek_byte(QEMUFile *f, int offset) { +int index = f->buf_index + offset; + if (f->is_write) { abort(); } -if (f->buf_index >= f->buf_size) { +if (index >= f->buf_size) { qemu_fill_buffer(f); -if (f->buf_index >= f->buf_size) { +index = f->buf_index + offset; +if (index >= f->buf_size) { return 0; } } -return f->buf[f->buf_index]; +return f->buf[index]; } int qemu_get_byte(QEMUFile *f) { int result; -result = qemu_peek_byte(f); - -if (f->buf_index < f->buf_size) { -f->buf_index++; -} +result = qemu_peek_byte(f, 0); +qemu_file_skip(f, 1); return result; } @@ -1684,22 +1710,36 @@ static int vmstate_subsection_load(QEMUFile *f, const VMStateDescription *vmsd, return 0; } -while (qemu_peek_byte(f) == QEMU_VM_SUBSECTION) { +while (qemu_peek_byte(f, 0) == QEMU_VM_SUBSECTION) { char idstr[256]; int ret; -uint8_t version_id, len; +uint8_t version_id, len, size; const VMStateDescription *sub_vmsd; -qemu_get_byte(f); /* subsection */ -len = qemu_get_byte(f); -qemu_get_buffer(f, (uint8_t *)idstr, len); -idstr[len] = 0; -version_id = qemu_get_be32(f); +len = qemu_peek_byte(f, 1); +if (len < strlen(vmsd->name) + 1) { +/* subsection name has be be "section_name/a" */ +return 0; +} +size = qemu_peek_buffer(f, (uint8_t *)idstr, len, 2); +if (size != len) { +return 0; +} +idstr[size] = 0; +if (strncmp(vmsd->name, idstr, strlen(vmsd->name)) != 0) { +/* it don't have a valid subsection name */ +return 0; +} sub_vmsd = vmstate_get_subsection(sub, idstr); if (sub_vmsd == NULL) { return -ENOENT; } +qemu_file_skip(f, 1); /* subsection */ +qemu_file_skip(f, 1); /* len */ +qemu_file_skip(f, len); /* idstr */ +version_id = qemu_get_be32(f); + assert(!sub_vmsd->subsections); ret = vmstate_load_state(f, sub_vmsd, opaque, version_id); if (ret) { -- 1.7.6.4
[Qemu-devel] [PATCH 3/5] savevm: define qemu_get_byte() using qemu_peek_byte()
Signed-off-by: Juan Quintela --- savevm.c | 15 ++- 1 files changed, 6 insertions(+), 9 deletions(-) diff --git a/savevm.c b/savevm.c index 4069b34..94628c6 100644 --- a/savevm.c +++ b/savevm.c @@ -578,17 +578,14 @@ static int qemu_peek_byte(QEMUFile *f) int qemu_get_byte(QEMUFile *f) { -if (f->is_write) { -abort(); -} +int result; -if (f->buf_index >= f->buf_size) { -qemu_fill_buffer(f); -if (f->buf_index >= f->buf_size) { -return 0; -} +result = qemu_peek_byte(f); + +if (f->buf_index < f->buf_size) { +f->buf_index++; } -return f->buf[f->buf_index++]; +return result; } int64_t qemu_ftell(QEMUFile *f) -- 1.7.6.4
[Qemu-devel] [PATCH 2/5] savevm: some coding style cleanups
This patch will make moving code on next patches and having checkpatch happy easier. Signed-off-by: Juan Quintela Reviewed-by: Anthony Liguori --- savevm.c | 21 ++--- 1 files changed, 14 insertions(+), 7 deletions(-) diff --git a/savevm.c b/savevm.c index 743c304..4069b34 100644 --- a/savevm.c +++ b/savevm.c @@ -536,8 +536,9 @@ int qemu_get_buffer(QEMUFile *f, uint8_t *buf, int size1) { int size, l; -if (f->is_write) +if (f->is_write) { abort(); +} size = size1; while (size > 0) { @@ -545,11 +546,13 @@ int qemu_get_buffer(QEMUFile *f, uint8_t *buf, int size1) if (l == 0) { qemu_fill_buffer(f); l = f->buf_size - f->buf_index; -if (l == 0) +if (l == 0) { break; +} } -if (l > size) +if (l > size) { l = size; +} memcpy(buf, f->buf + f->buf_index, l); f->buf_index += l; buf += l; @@ -560,26 +563,30 @@ int qemu_get_buffer(QEMUFile *f, uint8_t *buf, int size1) static int qemu_peek_byte(QEMUFile *f) { -if (f->is_write) +if (f->is_write) { abort(); +} if (f->buf_index >= f->buf_size) { qemu_fill_buffer(f); -if (f->buf_index >= f->buf_size) +if (f->buf_index >= f->buf_size) { return 0; +} } return f->buf[f->buf_index]; } int qemu_get_byte(QEMUFile *f) { -if (f->is_write) +if (f->is_write) { abort(); +} if (f->buf_index >= f->buf_size) { qemu_fill_buffer(f); -if (f->buf_index >= f->buf_size) +if (f->buf_index >= f->buf_size) { return 0; +} } return f->buf[f->buf_index++]; } -- 1.7.6.4
[Qemu-devel] [PATCH 1/5] savevm: teach qemu_fill_buffer to do partial refills
A later patch in this series needs to be able to look ahead in the buffer, so teach qemu_fill_buffer() to keep the unread bytes and refill only the rest.

v2: rename "used" to "pending" (Alex Williamson)

Signed-off-by: Juan Quintela
Reviewed-by: Anthony Liguori
---
 savevm.c | 14 +++---
 1 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/savevm.c b/savevm.c index 46f2447..743c304 100644 --- a/savevm.c +++ b/savevm.c @@ -455,6 +455,7 @@ void qemu_fflush(QEMUFile *f) static void qemu_fill_buffer(QEMUFile *f) { int len; +int pending; if (!f->get_buffer) return; @@ -462,10 +463,17 @@ static void qemu_fill_buffer(QEMUFile *f) if (f->is_write) abort(); -len = f->get_buffer(f->opaque, f->buf, f->buf_offset, IO_BUF_SIZE); +pending = f->buf_size - f->buf_index; +if (pending > 0) { +memmove(f->buf, f->buf + f->buf_index, pending); +} +f->buf_index = 0; +f->buf_size = pending; + +len = f->get_buffer(f->opaque, f->buf + pending, f->buf_offset, +IO_BUF_SIZE - pending); if (len > 0) { -f->buf_index = 0; -f->buf_size = len; +f->buf_size += len; f->buf_offset += len; } else if (len != -EAGAIN) f->has_error = 1; -- 1.7.6.4
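The partial refill in the patch above can be tried out in isolation with a small stand-alone sketch. ToyFile, toy_fill_buffer() and BUF_CAP are hypothetical stand-ins for QEMUFile, qemu_fill_buffer() and IO_BUF_SIZE, and the backing store is a plain memory array rather than a get_buffer() callback, so this only illustrates the memmove-then-top-up idea and not the real error handling.

```c
#include <stddef.h>
#include <string.h>

#define BUF_CAP 8   /* tiny on purpose; the patch uses IO_BUF_SIZE */

/* Hypothetical stand-in for QEMUFile: a read buffer fed from memory. */
typedef struct ToyFile {
    const unsigned char *src;   /* backing data, stands in for get_buffer() */
    size_t src_len;
    size_t src_pos;
    unsigned char buf[BUF_CAP];
    size_t buf_index;           /* next unread byte in buf */
    size_t buf_size;            /* number of valid bytes in buf */
} ToyFile;

/* Keep the unread tail, move it to the front, then top up the buffer. */
static void toy_fill_buffer(ToyFile *f)
{
    size_t pending = f->buf_size - f->buf_index;
    size_t room, avail, len;

    if (pending > 0) {
        memmove(f->buf, f->buf + f->buf_index, pending);
    }
    f->buf_index = 0;
    f->buf_size = pending;

    /* Read only as much as fits behind the bytes we kept. */
    room = BUF_CAP - pending;
    avail = f->src_len - f->src_pos;
    len = avail < room ? avail : room;
    memcpy(f->buf + pending, f->src + f->src_pos, len);
    f->src_pos += len;
    f->buf_size += len;
}
```

Because unread bytes survive a refill, a peek at buf_index + offset can still be answered after the buffer has been topped up, which is what the lookahead in the later patches relies on.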
[Qemu-devel] [PATCH 0/5] migration: Improve subsections detection
Hi

v2:
- rename "used" to "remaining" (Alex's suggestion)
- implement qemu_get_{byte,buffer} on top of qemu_peek_{byte,buffer} (Anthony's suggestion)
- fix the qemu_peek_buffer() logic (Alex discovered the problem)

v1:
This series moves the subsection detection code from:
- Check that it starts with 5
to:
- Check that it starts with 5 (SUBSECTION)
- Look at the length
- Check that the length is bigger than the section name
- Look at the idstr and check that it starts with the section name.

Please review.

Later, Juan.

Juan Quintela (5):
  savevm: teach qemu_fill_buffer to do partial refills
  savevm: some coding style cleanups
  savevm: define qemu_get_byte() using qemu_peek_byte()
  savevm: improve subsections detection on load
  Revert "savevm: fix corruption in vmstate_subsection_load()."

 savevm.c | 144 -
 1 files changed, 94 insertions(+), 50 deletions(-)

--
1.7.6.4
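The detection steps listed in the cover letter can be sketched on a flat byte array. This is only an illustration under assumptions: looks_like_subsection() and TOY_VM_SUBSECTION are hypothetical names, the value 0x05 is taken from the "starts with 5" wording above, and the real code peeks into a QEMUFile instead of receiving a complete buffer.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_VM_SUBSECTION 0x05  /* the "starts with 5" marker byte */

/* Hypothetical helper working on a flat byte array instead of a QEMUFile.
 * Returns 1 if buf looks like a subsection of section_name, else 0. */
static int looks_like_subsection(const uint8_t *buf, size_t avail,
                                 const char *section_name)
{
    size_t name_len = strlen(section_name);
    size_t len;

    /* 1st byte: subsection marker */
    if (avail < 2 || buf[0] != TOY_VM_SUBSECTION) {
        return 0;
    }
    /* 2nd byte: idstr length, must be bigger than the section name
     * (same comparison as in the patch) */
    len = buf[1];
    if (len < name_len + 1) {
        return 0;
    }
    /* the whole idstr has to be available */
    if (avail < 2 + len) {
        return 0;
    }
    /* idstr must start with the section name */
    return memcmp(buf + 2, section_name, name_len) == 0;
}
```

Nothing is consumed here, which mirrors the peek-then-skip approach of the series: the stream position only moves once all checks have passed.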
[Qemu-devel] [PATCH 0/4] -net tap: rootless bridge support for qemu
With qemu it is possible to run a guest as an unprivileged user, but if we wanted the guest to communicate with the outside world we had to switch to root. We address this problem by introducing a new network option. This option is less flexible than the other -net tap options because it relies on a helper with elevated privileges to do the heavy lifting of allocating a tap device and attaching it to a bridge. We use a special-purpose helper because we don't want to elevate the privileges of more generic tools like brctl.

Qemu can be run with the default network helper as follows (in this case attaching the tap device to the default qemubr0 bridge):

  qemu -hda linux.img -net tap,helper=/usr/local/libexec/qemu-bridge-helper -net nic

We're not overly thrilled with having to spell out the helper file name; however, we didn't want to regress any current behavior of -net tap. Additionally, we feel that this support makes sense in the -net tap backend. Any suggestions to improve on this are more than welcome.

The default helper uses its own ACL mechanism for access control, but future network helpers could be developed, for example, to support PolicyKit for access control.

More details are included in the individual patches. The helper is broken into a series of patches to improve reviewability.

Richa Marwaha (4):
  Add basic version of bridge helper
  Add access control support to qemu-bridge-helper
  Add cap reduction support to enable use as SUID
  Add support for bridge

 Makefile | 12 ++-
 configure| 37 +
 net.c|8 +
 net.h|2 +
 net/tap.c| 150 ++-
 qemu-bridge-helper.c | 402 ++
 qemu-options.hx | 48 +--
 7 files changed, 637 insertions(+), 22 deletions(-)
 create mode 100644 qemu-bridge-helper.c
[Qemu-devel] [PATCH 1/4] Add basic version of bridge helper
This patch adds a helper that can be used to create a tap device attached to a bridge device. Since this helper is minimal in what it does, it can be given CAP_NET_ADMIN which allows qemu to avoid running as root while still satisfying the majority of what users tend to want to do with tap devices. The way this all works is that qemu launches this helper passing a bridge name and the name of an inherited file descriptor. The descriptor is one end of a socketpair() of domain sockets. This domain socket is used to transmit a file descriptor of the opened tap device from the helper to qemu. The helper can then exit and let qemu use the tap device. Signed-off-by: Richa Marwaha --- Makefile | 12 +++- configure|1 + qemu-bridge-helper.c | 205 ++ 3 files changed, 216 insertions(+), 2 deletions(-) create mode 100644 qemu-bridge-helper.c diff --git a/Makefile b/Makefile index 6ed3194..f2caedc 100644 --- a/Makefile +++ b/Makefile @@ -34,6 +34,8 @@ $(call set-vpath, $(SRC_PATH):$(SRC_PATH)/hw) LIBS+=-lz $(LIBS_TOOLS) +HELPERS-$(CONFIG_LINUX) = qemu-bridge-helper$(EXESUF) + ifdef BUILD_DOCS DOCS=qemu-doc.html qemu-tech.html qemu.1 qemu-img.1 qemu-nbd.8 QMP/qmp-commands.txt else @@ -74,7 +76,7 @@ defconfig: -include config-all-devices.mak -build-all: $(DOCS) $(TOOLS) recurse-all +build-all: $(DOCS) $(TOOLS) $(HELPERS-y) recurse-all config-host.h: config-host.h-timestamp config-host.h-timestamp: config-host.mak @@ -151,6 +153,8 @@ qemu-nbd$(EXESUF): qemu-nbd.o qemu-tool.o qemu-error.o $(oslib-obj-y) $(trace-ob qemu-io$(EXESUF): qemu-io.o cmd.o qemu-tool.o qemu-error.o $(oslib-obj-y) $(trace-obj-y) $(block-obj-y) $(qobject-obj-y) $(version-obj-y) qemu-timer-common.o +qemu-bridge-helper$(EXESUF): qemu-bridge-helper.o + qemu-img-cmds.h: $(SRC_PATH)/qemu-img-cmds.hx $(call quiet-command,sh $(SRC_PATH)/scripts/hxtool -h < $< > $@," GEN $@") @@ -208,7 +212,7 @@ clean: # avoid old build problems by removing potentially incorrect old files rm -f config.mak op-i386.h opc-i386.h 
gen-op-i386.h op-arm.h opc-arm.h gen-op-arm.h rm -f qemu-options.def - rm -f *.o *.d *.a *.lo $(TOOLS) qemu-ga TAGS cscope.* *.pod *~ */*~ + rm -f *.o *.d *.a *.lo $(TOOLS) $(HELPERS-y) qemu-ga TAGS cscope.* *.pod *~ */*~ rm -Rf .libs rm -f slirp/*.o slirp/*.d audio/*.o audio/*.d block/*.o block/*.d net/*.o net/*.d fsdev/*.o fsdev/*.d ui/*.o ui/*.d qapi/*.o qapi/*.d qga/*.o qga/*.d rm -f qemu-img-cmds.h @@ -275,6 +279,10 @@ install: all $(if $(BUILD_DOCS),install-doc) install-sysconfig ifneq ($(TOOLS),) $(INSTALL_PROG) $(STRIP_OPT) $(TOOLS) "$(DESTDIR)$(bindir)" endif +ifneq ($(HELPERS-y),) + $(INSTALL_DIR) "$(DESTDIR)$(libexecdir)" + $(INSTALL_PROG) $(STRIP_OPT) $(HELPERS-y) "$(DESTDIR)$(libexecdir)" +endif ifneq ($(BLOBS),) $(INSTALL_DIR) "$(DESTDIR)$(datadir)" set -e; for x in $(BLOBS); do \ diff --git a/configure b/configure index 59b1494..3e32834 100755 --- a/configure +++ b/configure @@ -2742,6 +2742,7 @@ echo "mandir=$mandir" >> $config_host_mak echo "datadir=$datadir" >> $config_host_mak echo "sysconfdir=$sysconfdir" >> $config_host_mak echo "docdir=$docdir" >> $config_host_mak +echo "libexecdir=\${prefix}/libexec" >> $config_host_mak echo "confdir=$confdir" >> $config_host_mak case "$cpu" in diff --git a/qemu-bridge-helper.c b/qemu-bridge-helper.c new file mode 100644 index 000..4ac7b36 --- /dev/null +++ b/qemu-bridge-helper.c @@ -0,0 +1,205 @@ +/* + * QEMU Bridge Helper + * + * Copyright IBM, Corp. 2011 + * + * Authors: + * Anthony Liguori + * + * This work is licensed under the terms of the GNU GPL, version 2. See + * the COPYING file in the top-level directory. 
+ * + */ + +#include "config-host.h" + +#include +#include +#include +#include +#include +#include +#include + +#include +#include +#include +#include +#include + +#include + +#include + +#include "net/tap-linux.h" + +static int has_vnet_hdr(int fd) +{ +unsigned int features = 0; +struct ifreq ifreq; + +if (ioctl(fd, TUNGETFEATURES, &features) == -1) { +return -errno; +} + +if (!(features & IFF_VNET_HDR)) { +return -ENOTSUP; +} + +if (ioctl(fd, TUNGETIFF, &ifreq) != -1 || errno != EBADFD) { +return -ENOTSUP; +} + +return 1; +} + +static void prep_ifreq(struct ifreq *ifr, const char *ifname) +{ +memset(ifr, 0, sizeof(*ifr)); +snprintf(ifr->ifr_name, IFNAMSIZ, "%s", ifname); +} + +static int send_fd(int c, int fd) +{ +char msgbuf[CMSG_SPACE(sizeof(fd))]; +struct msghdr msg = { +.msg_control = msgbuf, +.msg_controllen = sizeof(msgbuf), +}; +struct cmsghdr *cmsg; +struct iovec iov; +char req[1] = { 0x00 }; + +cmsg = CMSG_FIRSTHDR(&msg); +cmsg->cmsg_level = SOL_SOCKET; +cmsg->cmsg_type = SCM_RIGHTS; +cmsg->cmsg_len =
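The descriptor hand-off described in this patch (helper opens the tap device, then passes the fd to qemu over a domain socket) can be sketched in a self-contained way as follows. toy_send_fd() and toy_recv_fd() are hypothetical names, and this is a generic SCM_RIGHTS example under that assumption, not a reconstruction of the truncated send_fd() above.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control buffer sized for exactly one int, aligned for struct cmsghdr. */
typedef union ToyCtrl {
    char buf[CMSG_SPACE(sizeof(int))];
    struct cmsghdr align;
} ToyCtrl;

/* Send descriptor fd over the Unix domain socket sock. Returns 0 or -1. */
static int toy_send_fd(int sock, int fd)
{
    ToyCtrl ctrl;
    char byte = 0;                  /* one byte of ordinary payload */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    struct msghdr msg;
    struct cmsghdr *cmsg;

    memset(&msg, 0, sizeof(msg));
    memset(&ctrl, 0, sizeof(ctrl));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one descriptor from sock. Returns the new fd or -1. */
static int toy_recv_fd(int sock)
{
    ToyCtrl ctrl;
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    struct msghdr msg;
    struct cmsghdr *cmsg;
    int fd;

    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    if (recvmsg(sock, &msg, 0) <= 0) {
        return -1;
    }
    /* Only trust the payload if the kernel really attached one fd. */
    cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS ||
        cmsg->cmsg_len != CMSG_LEN(sizeof(int))) {
        return -1;
    }
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}
```

In the scheme described above, the helper would call something like toy_send_fd() on its inherited end of the socketpair() and then exit, while qemu would call the receiving side on the other end and keep using the tap descriptor.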
[Qemu-devel] [PATCH 2/4] Add access control support to qemu-bridge-helper
We go to great lengths to restrict ourselves to just cap_net_admin as an OS-enforced security mechanism. However, we further restrict what we allow users to do to simply adding a tap device to a bridge interface, by virtue of the fact that this is the only functionality we expose.

This is not good enough though. An administrator is likely to want to restrict the bridges that an unprivileged user can access, in particular, to prevent an unprivileged user from putting a guest on what should be isolated networks.

This patch implements an ACL mechanism that is enforced by qemu-bridge-helper. The ACLs are fairly simple whitelist/blacklist mechanisms with a wildcard of 'all'.

An interesting feature of this ACL mechanism is that you can include external ACL files. The main reason to support this is so that you can set different file system permissions on those external ACL files. This allows an administrator to implement rather sophisticated ACL policies based on user/group policies via the file system.

As an example:

/etc/qemu/bridge.conf root:qemu 0640

  deny all
  allow br0
  include /etc/qemu/alice.conf
  include /etc/qemu/bob.conf

/etc/qemu/alice.conf root:alice 0640

  allow br1

/etc/qemu/bob.conf root:bob 0640

  allow br2

This ACL pattern allows any user in the qemu group to get a tap device connected to br0 (which is bridged to the physical network).

Users in the alice group can additionally get a tap device connected to br1. This allows br1 to act as a private bridge for the alice group.

Users in the bob group can additionally get a tap device connected to br2. This allows br2 to act as a private bridge for the bob group.

Under no circumstance can the bob group get access to br1 or can the alice group get access to br2.
Signed-off-by: Richa Marwaha --- qemu-bridge-helper.c | 141 ++ 1 files changed, 141 insertions(+), 0 deletions(-) diff --git a/qemu-bridge-helper.c b/qemu-bridge-helper.c index 4ac7b36..5e09fea 100644 --- a/qemu-bridge-helper.c +++ b/qemu-bridge-helper.c @@ -33,6 +33,105 @@ #include "net/tap-linux.h" +#define MAX_ACLS (128) +#define DEFAULT_ACL_FILE CONFIG_QEMU_CONFDIR "/bridge.conf" + +enum { +ACL_ALLOW = 0, +ACL_ALLOW_ALL, +ACL_DENY, +ACL_DENY_ALL, +}; + +typedef struct ACLRule { +int type; +char iface[IFNAMSIZ]; +} ACLRule; + +static int parse_acl_file(const char *filename, ACLRule *acls, int *pacl_count) +{ +int acl_count = *pacl_count; +FILE *f; +char line[4096]; + +f = fopen(filename, "r"); +if (f == NULL) { +return -1; +} + +while (acl_count != MAX_ACLS && +fgets(line, sizeof(line), f) != NULL) { +char *ptr = line; +char *cmd, *arg, *argend; + +while (isspace(*ptr)) { +ptr++; +} + +/* skip comments and empty lines */ +if (*ptr == '#' || *ptr == 0) { +continue; +} + +cmd = ptr; +arg = strchr(cmd, ' '); +if (arg == NULL) { +arg = strchr(cmd, '\t'); +} + +if (arg == NULL) { +fprintf(stderr, "Invalid config line:\n %s\n", line); +fclose(f); +errno = EINVAL; +return -1; +} + +*arg = 0; +arg++; +while (isspace(*arg)) { +arg++; +} + +argend = arg + strlen(arg); +while (arg != argend && isspace(*(argend - 1))) { +argend--; +} +*argend = 0; + +if (strcmp(cmd, "deny") == 0) { +if (strcmp(arg, "all") == 0) { +acls[acl_count].type = ACL_DENY_ALL; +} else { +acls[acl_count].type = ACL_DENY; +snprintf(acls[acl_count].iface, IFNAMSIZ, "%s", arg); +} +acl_count++; +} else if (strcmp(cmd, "allow") == 0) { +if (strcmp(arg, "all") == 0) { +acls[acl_count].type = ACL_ALLOW_ALL; +} else { +acls[acl_count].type = ACL_ALLOW; +snprintf(acls[acl_count].iface, IFNAMSIZ, "%s", arg); +} +acl_count++; +} else if (strcmp(cmd, "include") == 0) { +/* ignore errors */ +parse_acl_file(arg, acls, &acl_count); +} else { +fprintf(stderr, "Unknown command `%s'\n", cmd); +fclose(f); +errno = 
EINVAL; +return -1; +} +} + +*pacl_count = acl_count; + +fclose(f); + +return 0; +} + static int has_vnet_hdr(int fd) { unsigned int features = 0; @@ -95,6 +194,9 @@ int main(int argc, char **argv) const char *bridge; char iface[IFNAMSIZ]; int index; +ACLRule acls[MAX_ACLS]; +int acl_count = 0; +int i, access_allowed, access_denied; /* parse arguments */ if (argc < 3 || argc > 4) { @@ -115,6 +217,45 @@ int main(int argc, char **argv) bridge = argv[index++];
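The part of the patch that evaluates the parsed rules is cut off in this excerpt, so the following is only a sketch of one evaluation order that would give the outcome described in the example configuration. It assumes that the last matching rule wins and that no match means deny; both are assumptions, not something the visible patch text confirms. ToyRule, the TOY_* constants and toy_access_allowed() are hypothetical names.

```c
#include <string.h>

enum { TOY_ALLOW, TOY_ALLOW_ALL, TOY_DENY, TOY_DENY_ALL };

typedef struct ToyRule {
    int type;
    const char *iface;   /* unused for the *_ALL types */
} ToyRule;

/* Returns 1 if bridge may be used, 0 otherwise.
 * ASSUMPTION: the last matching rule wins, and no match means deny. */
static int toy_access_allowed(const ToyRule *rules, int count,
                              const char *bridge)
{
    int allowed = 0;
    int i;

    for (i = 0; i < count; i++) {
        switch (rules[i].type) {
        case TOY_ALLOW_ALL:
            allowed = 1;
            break;
        case TOY_DENY_ALL:
            allowed = 0;
            break;
        case TOY_ALLOW:
            if (strcmp(rules[i].iface, bridge) == 0) {
                allowed = 1;
            }
            break;
        case TOY_DENY:
            if (strcmp(rules[i].iface, bridge) == 0) {
                allowed = 0;
            }
            break;
        }
    }
    return allowed;
}
```

With the rules "deny all", "allow br0", "allow br1" (what a member of the alice group would end up with after the includes), this sketch admits br0 and br1 and rejects br2.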
[Qemu-devel] [PATCH 3/4] Add cap reduction support to enable use as SUID
The ideal way to use qemu-bridge-helper is to give it an fscap of using: setcap cap_net_admin=ep qemu-bridge-helper Unfortunately, most distros still do not have a mechanism to package files with fscaps applied. This means they'll have to SUID the qemu-bridge-helper binary. To improve security, use libcap to reduce our capability set to just cap_net_admin, then reduce privileges down to the calling user. This is hopefully close to equivalent to fscap support from a security perspective. Signed-off-by: Richa Marwaha --- configure| 34 ++ qemu-bridge-helper.c | 56 ++ 2 files changed, 90 insertions(+), 0 deletions(-) diff --git a/configure b/configure index 3e32834..f46e9b7 100755 --- a/configure +++ b/configure @@ -128,6 +128,7 @@ vnc_thread="no" xen="" xen_ctrl_version="" linux_aio="" +cap="" attr="" xfs="" @@ -653,6 +654,10 @@ for opt do ;; --enable-kvm) kvm="yes" ;; + --disable-cap) cap="no" + ;; + --enable-cap) cap="yes" + ;; --disable-spice) spice="no" ;; --enable-spice) spice="yes" @@ -1032,6 +1037,8 @@ echo " --disable-vdedisable support for vde network" echo " --enable-vde enable support for vde network" echo " --disable-linux-aio disable Linux AIO support" echo " --enable-linux-aio enable Linux AIO support" +echo " --disable-capdisable libcap support" +echo " --enable-cap enable libcap support" echo " --disable-attr disables attr and xattr support" echo " --enable-attrenable attr and xattr support" echo " --disable-blobs disable installing provided firmware blobs" @@ -1638,6 +1645,29 @@ EOF fi ## +# cap library probe +if test "$cap" != "no" ; then + cap_libs="-lcap" + cat > $TMPC << EOF +#include +int main(void) +{ +cap_init(); +return 0; +} +EOF + if compile_prog "" "$cap_libs" ; then +cap=yes +libs_tools="$cap_libs $libs_tools" + else +if test "$cap" = "yes" ; then + feature_not_found "cap" +fi +cap=no + fi +fi + +## # Sound support libraries probe audio_drv_probe() @@ -2710,6 +2740,7 @@ echo "fdatasync $fdatasync" echo "madvise $madvise" echo 
"posix_madvise $posix_madvise" echo "uuid support $uuid" +echo "libcap support$cap" echo "vhost-net support $vhost_net" echo "Trace backend $trace_backend" echo "Trace output file $trace_file-" @@ -2821,6 +2852,9 @@ fi if test "$vde" = "yes" ; then echo "CONFIG_VDE=y" >> $config_host_mak fi +if test "$cap" = "yes" ; then + echo "CONFIG_LIBCAP=y" >> $config_host_mak +fi for card in $audio_card_list; do def=CONFIG_`echo $card | tr '[:lower:]' '[:upper:]'` echo "$def=y" >> $config_host_mak diff --git a/qemu-bridge-helper.c b/qemu-bridge-helper.c index 5e09fea..b1519e0 100644 --- a/qemu-bridge-helper.c +++ b/qemu-bridge-helper.c @@ -33,6 +33,10 @@ #include "net/tap-linux.h" +#ifdef CONFIG_LIBCAP +#include +#endif + #define MAX_ACLS (128) #define DEFAULT_ACL_FILE CONFIG_QEMU_CONFDIR "/bridge.conf" @@ -185,6 +189,47 @@ static int send_fd(int c, int fd) return sendmsg(c, &msg, 0); } +#ifdef CONFIG_LIBCAP +static int drop_privileges(void) +{ +cap_t cap; +cap_value_t new_caps[] = {CAP_NET_ADMIN}; + +cap = cap_init(); + +/* set capabilities to be permitted and inheritable. we don't need the + * caps to be effective right now as they'll get reset when we seteuid + * anyway */ +cap_set_flag(cap, CAP_PERMITTED, 1, new_caps, CAP_SET); +cap_set_flag(cap, CAP_INHERITABLE, 1, new_caps, CAP_SET); + +if (cap_set_proc(cap) == -1) { +return -1; +} + +cap_free(cap); + +/* reduce our privileges to a normal user */ +setegid(getgid()); +seteuid(getuid()); + +cap = cap_init(); + +/* enable the our capabilities. we marked them as inheritable earlier + * which is what allows this to work. 
*/ +cap_set_flag(cap, CAP_EFFECTIVE, 1, new_caps, CAP_SET); +cap_set_flag(cap, CAP_PERMITTED, 1, new_caps, CAP_SET); + +if (cap_set_proc(cap) == -1) { +return -1; +} + +cap_free(cap); + +return 0; +} +#endif + int main(int argc, char **argv) { struct ifreq ifr; @@ -198,6 +243,17 @@ int main(int argc, char **argv) int acl_count = 0; int i, access_allowed, access_denied; +#ifdef CONFIG_LIBCAP +/* if we're run from an suid binary, immediately drop privileges preserving + * cap_net_admin */ +if (geteuid() == 0 && getuid() != geteuid()) { +if (drop_privileges() == -1) { +fprintf(stderr, "failed to drop privileges\n"); +return 1; +} +} +#endif + /* parse arguments */ if (argc < 3 || argc > 4) { fprintf(stderr, "Usage: %s [--use-vnet] BRIDGE FD\n", argv[0]); -- 1.7.1
[Qemu-devel] [PATCH 4/4] Add support for bridge
The most common use of -net tap is to connect a tap device to a bridge. This requires the use of a script and running qemu as root in order to allocate a tap device to pass to the script. This model is great for portability and flexibility but it's incredibly difficult to eliminate the need to run qemu as root. The only really viable mechanism is to use tunctl to create a tap device, attach it to a bridge as root, and then hand that tap device to qemu. The problem with this mechanism is that it requires administrator intervention whenever a user wants to create a guest. By essentially writing a helper that implements the most common qemu-ifup script that can be safely given cap_net_admin, we can dramatically simplify things for non-privileged users. We still support existing -net tap options as a mechanism for advanced users and backwards compatibility. Currently, this is very Linux centric but there's really no reason why it couldn't be extended for other Unixes. The default bridge that we attach to is qemubr0. The thinking is that a distro could preconfigure such an interface to allow out-of-the-box bridged networking. 
Alternatively, if a user wants to use a different bridge, they can say:

qemu -hda linux.img -net tap,br=br0,helper=/usr/local/libexec/qemu-bridge-helper -net nic,model=virtio

Signed-off-by: Richa Marwaha
---
configure |2 +
net.c |8 +++
net.h |2 +
net/tap.c | 150 ---
qemu-options.hx | 48 +-
5 files changed, 190 insertions(+), 20 deletions(-)

diff --git a/configure b/configure
index f46e9b7..ef05954 100755
--- a/configure
+++ b/configure
@@ -2775,6 +2775,8 @@ echo "sysconfdir=$sysconfdir" >> $config_host_mak
 echo "docdir=$docdir" >> $config_host_mak
 echo "libexecdir=\${prefix}/libexec" >> $config_host_mak
 echo "confdir=$confdir" >> $config_host_mak
+echo "CONFIG_QEMU_SHAREDIR=\"$prefix$datasuffix\"" >> $config_host_mak
+echo "CONFIG_QEMU_HELPERDIR=\"$prefix/libexec\"" >> $config_host_mak
 case "$cpu" in
 i386|x86_64|alpha|cris|hppa|ia64|lm32|m68k|microblaze|mips|mips64|ppc|ppc64|s390|s390x|sparc|sparc64|unicore32)
diff --git a/net.c b/net.c
index d05930c..4c3c551 100644
--- a/net.c
+++ b/net.c
@@ -956,6 +956,14 @@ static const struct {
 .type = QEMU_OPT_STRING,
 .help = "script to shut down the interface",
 }, {
+.name = "br",
+.type = QEMU_OPT_STRING,
+.help = "bridge name",
+}, {
+.name = "helper",
+.type = QEMU_OPT_STRING,
+.help = "command to execute to configure bridge",
+}, {
 .name = "sndbuf",
 .type = QEMU_OPT_SIZE,
 .help = "send buffer limit"
diff --git a/net.h b/net.h
index 9f633f8..eeb19a7 100644
--- a/net.h
+++ b/net.h
@@ -174,6 +174,8 @@ int do_netdev_del(Monitor *mon, const QDict *qdict, QObject **ret_data);
 #define DEFAULT_NETWORK_SCRIPT "/etc/qemu-ifup"
 #define DEFAULT_NETWORK_DOWN_SCRIPT "/etc/qemu-ifdown"
+#define DEFAULT_BRIDGE_HELPER CONFIG_QEMU_HELPERDIR "/qemu-bridge-helper"
+#define DEFAULT_BRIDGE_INTERFACE "qemubr0"
 void qdev_set_nic_properties(DeviceState *dev, NICInfo *nd);
diff --git a/net/tap.c b/net/tap.c
index 1f26dc9..74f103a 100644
--- a/net/tap.c
+++ b/net/tap.c
@@ -388,6 +388,108 @@ static int launch_script(const char *setup_script, const
char *ifname, int fd) return -1; } +static int recv_fd(int c) +{ +int fd; +uint8_t msgbuf[CMSG_SPACE(sizeof(fd))]; +struct msghdr msg = { +.msg_control = msgbuf, +.msg_controllen = sizeof(msgbuf), +}; +struct cmsghdr *cmsg; +struct iovec iov; +uint8_t req[1]; +ssize_t len; + +cmsg = CMSG_FIRSTHDR(&msg); +cmsg->cmsg_level = SOL_SOCKET; +cmsg->cmsg_type = SCM_RIGHTS; +cmsg->cmsg_len = CMSG_LEN(sizeof(fd)); +msg.msg_controllen = cmsg->cmsg_len; + +iov.iov_base = req; +iov.iov_len = sizeof(req); + +msg.msg_iov = &iov; +msg.msg_iovlen = 1; + +len = recvmsg(c, &msg, 0); +if (len > 0) { +memcpy(&fd, CMSG_DATA(cmsg), sizeof(fd)); +return fd; +} + +return len; +} + +static int net_bridge_run_helper(const char *helper, const char *bridge) +{ +sigset_t oldmask, mask; +int pid, status; +char *args[5]; +char **parg; +int sv[2]; + +sigemptyset(&mask); +sigaddset(&mask, SIGCHLD); +sigprocmask(SIG_BLOCK, &mask, &oldmask); + +if (socketpair(PF_UNIX, SOCK_STREAM, 0, sv) == -1) { +return -1; +} + +/* try to launch bridge helper */ +pid = fork(); +if (pid == 0) { +int open_max = sysconf(_SC_OPEN_MAX), i; +char buf[32]; + +snprintf(buf, sizeof(buf), "%d", sv[1]); + +for (i = 0; i < open_max; i++) { +if (i != STDIN_FILENO && +
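The helper returns the configured tap descriptor to qemu over the socketpair as SCM_RIGHTS ancillary data. The following is a minimal standalone sketch of that mechanism, not the patch's code: the names send_fd_sketch and recv_fd_sketch are hypothetical, and unlike the recv_fd() above it inspects the control message only after recvmsg() has filled it in.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control buffer sized for exactly one descriptor; the union keeps it
 * aligned for struct cmsghdr. */
union fd_ctl {
    char buf[CMSG_SPACE(sizeof(int))];
    struct cmsghdr align;
};

/* Hypothetical helper: send descriptor fd over UNIX socket c.
 * Returns 0 on success, -1 on failure. */
static int send_fd_sketch(int c, int fd)
{
    union fd_ctl u;
    char req = 0;                      /* one dummy payload byte */
    struct iovec iov = { .iov_base = &req, .iov_len = 1 };
    struct msghdr msg;
    struct cmsghdr *cmsg;

    memset(&msg, 0, sizeof(msg));
    memset(&u, 0, sizeof(u));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof(u.buf);

    cmsg = CMSG_FIRSTHDR(&msg);
    assert(cmsg != NULL);              /* buffer is large enough by construction */
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(c, &msg, 0) == 1 ? 0 : -1;
}

/* Hypothetical helper: receive one descriptor from UNIX socket c.
 * Returns the new descriptor, or -1 if none was passed. */
static int recv_fd_sketch(int c)
{
    union fd_ctl u;
    char req;
    struct iovec iov = { .iov_base = &req, .iov_len = 1 };
    struct msghdr msg;
    struct cmsghdr *cmsg;
    int fd;

    memset(&msg, 0, sizeof(msg));
    memset(&u, 0, sizeof(u));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof(u.buf);

    if (recvmsg(c, &msg, 0) != 1) {
        return -1;
    }

    /* Only trust the control data the kernel actually delivered. */
    cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS ||
        cmsg->cmsg_len != CMSG_LEN(sizeof(int))) {
        return -1;
    }
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}
```

The received descriptor is a new number in the receiving process that refers to the same open file, which is what lets an unprivileged qemu use a tap device opened by the privileged helper.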
[Qemu-devel] Running Qemu on Mac OS 10.7
Hi,

Has anyone on this list tried and succeeded in running a recent version of Qemu on Mac OS X? Are there any precautions that need to be taken?

For me 0.15.0 compiles fine on OS X 10.7, but it crashes with a segfault right after trying to load an image:

Exception Type: EXC_BAD_ACCESS (SIGSEGV)
Exception Codes: KERN_INVALID_ADDRESS at 0x003a

0 qemu 0x00010d2cf5bd helper_svm_check_intercept_param + 29 (op_helper.c:5236)
1 qemu 0x00010d2de109 helper_write_crN + 41 (op_helper.c:2946)
2 ??? 0x0001113d107c 0 + 4584181884

Cheers,
Matthias
[Qemu-devel] [RFC] 1.0 release schedule adjustment (spreading out RCs)
Hi,

I'm trying to map out the 1.1 release using the same formula as the 1.0 release. To make things work a bit better, I'd like to adjust the -rc schedule a bit. Namely:

| 2011-11-01               | Freeze master    |
|-
| 2011-11-04 -> 2011-11-07 | Tag qemu-1.0-rc1 |
|-
| 2011-11-11 -> 2011-11-14 | Tag qemu-1.0-rc2 |
|-
| 2011-11-18 -> 2011-11-21 | Tag qemu-1.0-rc3 |
|-
| 2011-11-23 -> 2011-11-28 | Tag qemu-1.0-rc4 |
|-
| 2011-12-01               | Tag qemu-1.0     |

I had squashed things originally because of the US Thanksgiving holiday on the 25th but realistically, the 28th is no better than the 23rd. This spreads out the -rcs a bit more evenly.

Any thoughts/objections?

Regards,
Anthony Liguori
Re: [Qemu-devel] [PATCH] runstate: do not discard runstate changes when paused
On 2011-10-06 16:27, Avi Kivity wrote:
> On 10/05/2011 08:02 PM, Jan Kiszka wrote:
>> >
>> > Let's examine a concrete example: a user is debugging a guest, which
>> > stops at a breakpoint. Meanwhile a live migration is going on,
>> > involving internal stops. When the guest does manage to run for a bit,
>> > it runs out of disk space, generating a stop, which the management agent
>> > resolves by allocating more space and issuing a cont.
>> >
>> > With a counting cont, no matter in what order these events happen,
>> > things work out fine. How do they work out with your proposal?
>>
>> We can enforce stop for temporal reasons (migration/savevm), something
>> that overrules user/management initiated stops.
>
> Migration resume shouldn't overrule user stop.

That's not what I had in mind. Migration stop could overrule user resume. But that discussion is moot as there is no time span where this could happen. Migration just needs to re-enter the original state on error, savevm/loadvm restore what it found on entry. All this is atomic wrt other agents.

>
> It's really simple. If any agent wants the system stopped, it's
> stopped. Only when no one wants it stopped, it may run.
>
>>
>> BTW, does stop due to migration actually have a window where it accepts
>> other commands? I thought that phase is synchronous. Then we would just
>> have to implement proper state saving/restoring.
>
> Save: ++stop_count, restore: --stop_count.
>
>>
>> Anyway, there is no point in lock counting for stop reasons that require
>> external synchronization anyway. gdb vs. management stack vs. human
>> monitor - nothing is solved by counting the stops, they all can step on
>> each other's shoes.
>
> Please elaborate.

Every agent can issue every monitor command. If you have a gdb session running, you don't want the management stack to migrate your VM away or mess with it otherwise. If you try to migrate a machine, you don't want any other agent to change its configuration beforehand, adding a device that is not present on the target, etc.

>
>> Even worse, exposing a counting stop via the user
>> interface requires additional interfaces to recover lost or forgotten
>> locks. We've discussed this in the past IIRC.
>>
>
> Agree with that. So there's the second proposal:
>
> vm_stop(unsigned reason)
> {
>     if (!stop_state) {
>         do_vm_stop();
>     }
>     stop_state |= 1 << reason;
> }
>
> vm_resume(unsigned reason)
> {
>     stop_state &= ~(1 << reason);
>     if (!stop_state) {
>         do_vm_resume();
>     }
> }
>
> so now each agent is separated from the other.
>

Stop reasons are orthogonal to agents.

BTW, the above model would still require extending the user interface to report pending stop reasons and allow specifying resume reasons.

Jan
Re: [Qemu-devel] [PATCH] Set an invalid-bits mask for each SPE instructions
On 28/09/2011 17:54, Fabien Chouteau wrote: > SPE instructions are defined by pairs. Currently, the invalid-bits mask is set > for the first instruction, but the second one can have a different mask. > > example: > GEN_SPE(efdcmpeq,efdcfs, 0x17, 0x0B, 0x0060, 0x0018, > PPC_SPE_DOUBLE), > Any comments? -- Fabien Chouteau
[Qemu-devel] [PATCH V5] Add stdio char device on windows
Simple implementation of an stdio char device on Windows. Signed-off-by: Fabien Chouteau --- qemu-char.c | 227 ++- 1 files changed, 225 insertions(+), 2 deletions(-) diff --git a/qemu-char.c b/qemu-char.c index 09d2309..b9381be 100644 --- a/qemu-char.c +++ b/qemu-char.c @@ -538,6 +538,9 @@ int send_all(int fd, const void *_buf, int len1) } #endif /* !_WIN32 */ +#define STDIO_MAX_CLIENTS 1 +static int stdio_nb_clients; + #ifndef _WIN32 typedef struct { @@ -545,8 +548,6 @@ typedef struct { int max_size; } FDCharDriver; -#define STDIO_MAX_CLIENTS 1 -static int stdio_nb_clients = 0; static int fd_chr_write(CharDriverState *chr, const uint8_t *buf, int len) { @@ -1451,6 +1452,8 @@ static int qemu_chr_open_pp(QemuOpts *opts, CharDriverState **_chr) #else /* _WIN32 */ +static CharDriverState *stdio_clients[STDIO_MAX_CLIENTS]; + typedef struct { int max_size; HANDLE hcom, hrecv, hsend; @@ -1459,6 +1462,14 @@ typedef struct { DWORD len; } WinCharState; +typedef struct { +HANDLE hStdIn; +HANDLE hInputReadyEvent; +HANDLE hInputDoneEvent; +HANDLE hInputThread; +uint8_t win_stdio_buf; +} WinStdioCharState; + #define NSENDBUF 2048 #define NRECVBUF 2048 #define MAXCONNECT 1 @@ -1809,6 +1820,217 @@ static int qemu_chr_open_win_file_out(QemuOpts *opts, CharDriverState **_chr) return qemu_chr_open_win_file(fd_out, _chr); } + +static int win_stdio_write(CharDriverState *chr, const uint8_t *buf, int len) +{ +HANDLE hStdOut = GetStdHandle(STD_OUTPUT_HANDLE); +DWORD dwSize; +int len1; + +len1 = len; + +while (len1 > 0) { +if (!WriteFile(hStdOut, buf, len1, &dwSize, NULL)) { +break; +} +buf += dwSize; +len1 -= dwSize; +} + +return len - len1; +} + +static void win_stdio_wait_func(void *opaque) +{ +CharDriverState *chr = opaque; +WinStdioCharState *stdio = chr->opaque; +INPUT_RECORD buf[4]; +intret; +DWORD dwSize; +inti; + +ret = ReadConsoleInput(stdio->hStdIn, buf, sizeof(buf) / sizeof(*buf), + &dwSize); + +if (!ret) { +/* Avoid error storm */ +qemu_del_wait_object(stdio->hStdIn, NULL, 
NULL); +return; +} + +for (i = 0; i < dwSize; i++) { +KEY_EVENT_RECORD *kev = &buf[i].Event.KeyEvent; + +if (buf[i].EventType == KEY_EVENT && kev->bKeyDown) { +int j; +if (kev->uChar.AsciiChar != 0) { +for (j = 0; j < kev->wRepeatCount; j++) { +if (qemu_chr_be_can_write(chr)) { +uint8_t c = kev->uChar.AsciiChar; +qemu_chr_be_write(chr, &c, 1); +} +} +} +} +} +} + +static DWORD WINAPI win_stdio_thread(LPVOID param) +{ +CharDriverState *chr = param; +WinStdioCharState *stdio = chr->opaque; +intret; +DWORD dwSize; + +while (1) { + +/* Wait for one byte */ +ret = ReadFile(stdio->hStdIn, &stdio->win_stdio_buf, 1, &dwSize, NULL); + +/* Exit in case of error, continue if nothing read */ +if (!ret) { +break; +} +if (!dwSize) { +continue; +} + +/* Some terminal emulator returns \r\n for Enter, just pass \n */ +if (stdio->win_stdio_buf == '\r') { +continue; +} + +/* Signal the main thread and wait until the byte was eaten */ +if (!SetEvent(stdio->hInputReadyEvent)) { +break; +} +if (WaitForSingleObject(stdio->hInputDoneEvent, INFINITE) +!= WAIT_OBJECT_0) { +break; +} +} + +qemu_del_wait_object(stdio->hInputReadyEvent, NULL, NULL); +return 0; +} + +static void win_stdio_thread_wait_func(void *opaque) +{ +CharDriverState *chr = opaque; +WinStdioCharState *stdio = chr->opaque; + +if (qemu_chr_be_can_write(chr)) { +qemu_chr_be_write(chr, &stdio->win_stdio_buf, 1); +} + +SetEvent(stdio->hInputDoneEvent); +} + +static void qemu_chr_set_echo_win_stdio(CharDriverState *chr, bool echo) +{ +WinStdioCharState *stdio = chr->opaque; +DWORD dwMode = 0; + +GetConsoleMode(stdio->hStdIn, &dwMode); + +if (echo) { +SetConsoleMode(stdio->hStdIn, dwMode | ENABLE_ECHO_INPUT); +} else { +SetConsoleMode(stdio->hStdIn, dwMode & ~ENABLE_ECHO_INPUT); +} +} + +static void win_stdio_close(CharDriverState *chr) +{ +WinStdioCharState *stdio = chr->opaque; + +if (stdio->hInputReadyEvent != INVALID_HANDLE_VALUE) { +CloseHandle(stdio->hInputReadyEvent); +} +if (stdio->hInputDoneEvent != INVALID_HANDLE_VALUE) 
{ +CloseHandle(stdio->hInputDoneEvent); +} +if (stdio->hInputThrea
Re: [Qemu-devel] [PATCH] runstate: do not discard runstate changes when paused
On 10/05/2011 08:02 PM, Jan Kiszka wrote:
> >
> > Let's examine a concrete example: a user is debugging a guest, which
> > stops at a breakpoint. Meanwhile a live migration is going on,
> > involving internal stops. When the guest does manage to run for a bit,
> > it runs out of disk space, generating a stop, which the management agent
> > resolves by allocating more space and issuing a cont.
> >
> > With a counting cont, no matter in what order these events happen,
> > things work out fine. How do they work out with your proposal?
>
> We can enforce stop for temporal reasons (migration/savevm), something
> that overrules user/management initiated stops.

Migration resume shouldn't overrule user stop.

It's really simple. If any agent wants the system stopped, it's stopped. Only when no one wants it stopped, it may run.

>
> BTW, does stop due to migration actually have a window where it accepts
> other commands? I thought that phase is synchronous. Then we would just
> have to implement proper state saving/restoring.

Save: ++stop_count, restore: --stop_count.

>
> Anyway, there is no point in lock counting for stop reasons that require
> external synchronization anyway. gdb vs. management stack vs. human
> monitor - nothing is solved by counting the stops, they all can step on
> each other's shoes.

Please elaborate.

> Even worse, exposing a counting stop via the user
> interface requires additional interfaces to recover lost or forgotten
> locks. We've discussed this in the past IIRC.
>

Agree with that. So there's the second proposal:

vm_stop(unsigned reason)
{
    if (!stop_state) {
        do_vm_stop();
    }
    stop_state |= 1 << reason;
}

vm_resume(unsigned reason)
{
    stop_state &= ~(1 << reason);
    if (!stop_state) {
        do_vm_resume();
    }
}

so now each agent is separated from the other.

-- 
error compiling committee.c: too many arguments to function
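For readers following the thread, the bitmask proposal can be written out as compilable C. This is only a sketch under assumptions: the reason names in the enum are hypothetical, do_vm_stop() and do_vm_resume() are stand-ins that flip a flag, and the guard against resuming a VM that was never stopped is an addition that the pseudocode in the mail does not have.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stop reasons; the thread does not define a list. */
enum stop_reason {
    STOP_USER,
    STOP_DEBUG,
    STOP_MIGRATION,
    STOP_IO_ERROR,
    STOP_REASON_MAX
};

static unsigned stop_state;        /* one bit per pending stop reason */
static bool vm_running = true;     /* stand-in for the real VM state */

static void do_vm_stop(void)   { vm_running = false; }
static void do_vm_resume(void) { vm_running = true; }

static void vm_stop(unsigned reason)
{
    assert(reason < STOP_REASON_MAX);
    if (!stop_state) {
        do_vm_stop();              /* first reason actually stops the VM */
    }
    stop_state |= 1u << reason;
}

static void vm_resume(unsigned reason)
{
    unsigned old = stop_state;

    assert(reason < STOP_REASON_MAX);
    stop_state &= ~(1u << reason);
    /* Resume only when the last pending reason is cleared.  The check
     * on old is an addition: it avoids resuming a VM nobody stopped. */
    if (old && !stop_state) {
        do_vm_resume();
    }
}
```

Unlike a counter, stopping twice for the same reason needs only one resume, so there are no lost counts to recover. Each reason still has to be cleared by name, which is the user interface extension raised elsewhere in the thread.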
Re: [Qemu-devel] [fedora-virt] balloon drivers missing in virtio-win-1.1.16.vfd
- Original Message - > From: "Justin M. Forbes" > To: "Andrew Cathrow" > Cc: v...@lists.fedoraproject.org, "Onkar N Mahajan" , > qemu-devel@nongnu.org, k...@vger.kernel.org > Sent: Thursday, October 6, 2011 9:35:44 AM > Subject: Re: [Qemu-devel] [fedora-virt] balloon drivers missing in > virtio-win-1.1.16.vfd > > On Thu, 2011-10-06 at 02:33 -0400, Andrew Cathrow wrote: > > > > > > - Original Message - > > > From: "Onkar N Mahajan" > > > To: k...@vger.kernel.org, qemu-devel@nongnu.org > > > Sent: Thursday, September 29, 2011 6:03:26 AM > > > Subject: balloon drivers missing in virtio-win-1.1.16.vfd > > > > > > virtio_balloon drivers are missing in the virtio-win floppy disk > > > image > > > found at > > > http://alt.fedoraproject.org/pub/alt/virtio-win/latest/images/bin/ > > > whereas they are present in the ISO image , any specific reason > > > for > > > this ? Shouldn't they be ideally present ? > > > The vfd is not supposed to contain the full set of drivers, it is > meant > to be the bare minimum drivers required to install (and fit in > 1.44mb). > The vfd only contains network and block drivers so that you can > install > the system and grab the full set of drivers from the ISO or another > location. Later versions of Windows can install using the ISO for > drivers and do not need the vfd at all. Makes sense, thanks Aic > > Justin > > > >
Re: [Qemu-devel] [fedora-virt] balloon drivers missing in virtio-win-1.1.16.vfd
On Thu, 2011-10-06 at 02:33 -0400, Andrew Cathrow wrote:
>
>
> - Original Message -
> > From: "Onkar N Mahajan"
> > To: k...@vger.kernel.org, qemu-devel@nongnu.org
> > Sent: Thursday, September 29, 2011 6:03:26 AM
> > Subject: balloon drivers missing in virtio-win-1.1.16.vfd
> >
> > virtio_balloon drivers are missing in the virtio-win floppy disk
> > image found at
> > http://alt.fedoraproject.org/pub/alt/virtio-win/latest/images/bin/
> > whereas they are present in the ISO image , any specific reason for
> > this ? Shouldn't they be ideally present ?

The vfd is not supposed to contain the full set of drivers, it is meant to be the bare minimum drivers required to install (and fit in 1.44mb). The vfd only contains network and block drivers so that you can install the system and grab the full set of drivers from the ISO or another location. Later versions of Windows can install using the ISO for drivers and do not need the vfd at all.

Justin
[Qemu-devel] [PATCH 02/25] PPC: Fix via-cuda memory registration
From: Alexander Graf Commit 23c5e4ca (convert to memory API) broke the VIA Cuda emulation layer by not registering the IO structs. This patch registers them properly and thus makes -M g3beige and -M mac99 work again. Signed-off-by: Alexander Graf Signed-off-by: Avi Kivity --- hw/cuda.c | 28 1 files changed, 16 insertions(+), 12 deletions(-) diff --git a/hw/cuda.c b/hw/cuda.c index 5c92d81..736de7f 100644 --- a/hw/cuda.c +++ b/hw/cuda.c @@ -633,16 +633,20 @@ static uint32_t cuda_readl (void *opaque, target_phys_addr_t addr) return 0; } -static CPUWriteMemoryFunc * const cuda_write[] = { -&cuda_writeb, -&cuda_writew, -&cuda_writel, -}; - -static CPUReadMemoryFunc * const cuda_read[] = { -&cuda_readb, -&cuda_readw, -&cuda_readl, +static MemoryRegionOps cuda_ops = { +.old_mmio = { +.write = { +cuda_writeb, +cuda_writew, +cuda_writel, +}, +.read = { +cuda_readb, +cuda_readw, +cuda_readl, +}, +}, +.endianness = DEVICE_NATIVE_ENDIAN, }; static bool cuda_timer_exist(void *opaque, int version_id) @@ -739,8 +743,8 @@ void cuda_init (MemoryRegion **cuda_mem, qemu_irq irq) s->tick_offset = (uint32_t)mktimegm(&tm) + RTC_OFFSET; s->adb_poll_timer = qemu_new_timer_ns(vm_clock, cuda_adb_poll, s); -cpu_register_io_memory(cuda_read, cuda_write, s, - DEVICE_NATIVE_ENDIAN); +memory_region_init_io(&s->mem, &cuda_ops, s, "cuda", 0x2000); + *cuda_mem = &s->mem; vmstate_register(NULL, -1, &vmstate_cuda, s); qemu_register_reset(cuda_reset, s); -- 1.7.6.3
[Qemu-devel] [PATCH 13/25] isa: Add isa_register_portio_list()
Signed-off-by: Richard Henderson Signed-off-by: Avi Kivity --- hw/isa-bus.c | 17 + hw/isa.h | 31 ++- 2 files changed, 47 insertions(+), 1 deletions(-) diff --git a/hw/isa-bus.c b/hw/isa-bus.c index e9c1712..5d8ff84 100644 --- a/hw/isa-bus.c +++ b/hw/isa-bus.c @@ -103,6 +103,23 @@ void isa_register_ioport(ISADevice *dev, MemoryRegion *io, uint16_t start) } } +void isa_register_portio_list(ISADevice *dev, uint16_t start, + const MemoryRegionPortio *pio_start, + void *opaque, const char *name) +{ +PortioList *piolist = g_new(PortioList, 1); + +/* START is how we should treat DEV, regardless of the actual + contents of the portio array. This is how the old code + actually handled e.g. the FDC device. */ +if (dev) { +isa_init_ioport(dev, start); +} + +portio_list_init(piolist, pio_start, opaque, name); +portio_list_add(piolist, isabus->address_space_io, start); +} + static int isa_qdev_init(DeviceState *qdev, DeviceInfo *base) { ISADevice *dev = DO_UPCAST(ISADevice, qdev, qdev); diff --git a/hw/isa.h b/hw/isa.h index c5c2618..177ef95 100644 --- a/hw/isa.h +++ b/hw/isa.h @@ -28,7 +28,6 @@ ISABus *isa_bus_new(DeviceState *dev, MemoryRegion *address_space_io); void isa_bus_irqs(qemu_irq *irqs); qemu_irq isa_get_irq(int isairq); void isa_init_irq(ISADevice *dev, qemu_irq *p, int isairq); -void isa_register_ioport(ISADevice *dev, MemoryRegion *io, uint16_t start); void isa_init_ioport(ISADevice *dev, uint16_t ioport); void isa_init_ioport_range(ISADevice *dev, uint16_t start, uint16_t length); void isa_qdev_register(ISADeviceInfo *info); @@ -37,6 +36,36 @@ ISADevice *isa_create(const char *name); ISADevice *isa_try_create(const char *name); ISADevice *isa_create_simple(const char *name); +/** + * isa_register_ioport: Install an I/O port region on the ISA bus. + * + * Register an I/O port region via memory_region_add_subregion + * inside the ISA I/O address space. + * + * @dev: the ISADevice against which these are registered; may be NULL. 
+ * @io: the #MemoryRegion being registered. + * @start: the base I/O port. + */ +void isa_register_ioport(ISADevice *dev, MemoryRegion *io, uint16_t start); + +/** + * isa_register_portio_list: Initialize a set of ISA io ports + * + * Several ISA devices have many dis-joint I/O ports. Worse, these I/O + * ports can be interleaved with I/O ports from other devices. This + * function makes it easy to create multiple MemoryRegions for a single + * device and use the legacy portio routines. + * + * @dev: the ISADevice against which these are registered; may be NULL. + * @start: the base I/O port against which the portio->offset is applied. + * @portio: the ports, sorted by offset. + * @opaque: passed into the old_portio callbacks. + * @name: passed into memory_region_init_io. + */ +void isa_register_portio_list(ISADevice *dev, uint16_t start, + const MemoryRegionPortio *portio, + void *opaque, const char *name); + extern target_phys_addr_t isa_mem_base; void isa_mmio_setup(MemoryRegion *mr, target_phys_addr_t size); -- 1.7.6.3
[Qemu-devel] [PATCH 10/25] isa: Tidy support code for isabus_get_fw_dev_path
From: Richard Henderson The only user of ISADevice.ioports is isabus_get_fw_dev_path, and it only looks at the first entry of the array. Which suggests that this entire array+sort operation can be replaced by a simple minimum. Signed-off-by: Richard Henderson Signed-off-by: Avi Kivity --- hw/isa-bus.c | 25 + hw/isa.h |5 + 2 files changed, 6 insertions(+), 24 deletions(-) diff --git a/hw/isa-bus.c b/hw/isa-bus.c index 6c15a31..e9c1712 100644 --- a/hw/isa-bus.c +++ b/hw/isa-bus.c @@ -83,24 +83,11 @@ void isa_init_irq(ISADevice *dev, qemu_irq *p, int isairq) dev->nirqs++; } -static void isa_init_ioport_one(ISADevice *dev, uint16_t ioport) -{ -assert(dev->nioports < ARRAY_SIZE(dev->ioports)); -dev->ioports[dev->nioports++] = ioport; -} - -static int isa_cmp_ports(const void *p1, const void *p2) -{ -return *(uint16_t*)p1 - *(uint16_t*)p2; -} - void isa_init_ioport_range(ISADevice *dev, uint16_t start, uint16_t length) { -int i; -for (i = start; i < start + length; i++) { -isa_init_ioport_one(dev, i); +if (dev->ioport_id == 0 || start < dev->ioport_id) { +dev->ioport_id = start; } -qsort(dev->ioports, dev->nioports, sizeof(dev->ioports[0]), isa_cmp_ports); } void isa_init_ioport(ISADevice *dev, uint16_t ioport) @@ -112,9 +99,7 @@ void isa_register_ioport(ISADevice *dev, MemoryRegion *io, uint16_t start) { memory_region_add_subregion(isabus->address_space_io, start, io); if (dev != NULL) { -assert(dev->nio < ARRAY_SIZE(dev->io)); -dev->io[dev->nio++] = io; -isa_init_ioport_range(dev, start, memory_region_size(io)); +isa_init_ioport(dev, start); } } @@ -208,8 +193,8 @@ static void isabus_register_devices(void) int off; off = snprintf(path, sizeof(path), "%s", qdev_fw_name(dev)); -if (d->nioports) { -snprintf(path + off, sizeof(path) - off, "@%04x", d->ioports[0]); +if (d->ioport_id) { +snprintf(path + off, sizeof(path) - off, "@%04x", d->ioport_id); } return strdup(path); diff --git a/hw/isa.h b/hw/isa.h index 432d17a..c5c2618 100644 --- a/hw/isa.h +++ b/hw/isa.h @@ -13,12 
+13,9 @@ typedef struct ISADeviceInfo ISADeviceInfo; struct ISADevice { DeviceState qdev; -MemoryRegion *io[32]; uint32_t isairq[2]; -uint16_t ioports[32]; int nirqs; -int nioports; -int nio; +int ioport_id; }; typedef int (*isa_qdev_initfn)(ISADevice *dev); -- 1.7.6.3
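The idea of the patch, tracking only the lowest registered port instead of keeping a sorted array, can be sketched in isolation. The struct and function names below are hypothetical stand-ins for ISADevice and isa_init_ioport_range(); only the comparison mirrors the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the one field the patch adds to ISADevice. */
typedef struct {
    int ioport_id;   /* lowest I/O port seen so far, 0 meaning "none yet" */
} IsaDevSketch;

/* Keep the running minimum of all registered start ports.  As in the
 * patch, 0 doubles as the "unset" marker, so a device whose lowest
 * port really is 0 would be reported as having no port. */
static void track_ioport(IsaDevSketch *dev, uint16_t start)
{
    assert(dev != NULL);
    if (dev->ioport_id == 0 || start < dev->ioport_id) {
        dev->ioport_id = start;
    }
}
```

Registering 0x3f6, 0x1f0 and 0x1f7 in any order leaves ioport_id at 0x1f0, which is the value the old code obtained from ioports[0] after sorting.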
[Qemu-devel] [PATCH 23/25] vmport: Convert to isa_register_ioport
From: Richard Henderson Signed-off-by: Richard Henderson Signed-off-by: Avi Kivity --- hw/vmport.c | 16 +--- 1 files changed, 13 insertions(+), 3 deletions(-) diff --git a/hw/vmport.c b/hw/vmport.c index c8aefaa..b5c6fa1 100644 --- a/hw/vmport.c +++ b/hw/vmport.c @@ -38,6 +38,7 @@ typedef struct _VMPortState { ISADevice dev; +MemoryRegion io; IOPortReadFunc *func[VMPORT_ENTRIES]; void *opaque[VMPORT_ENTRIES]; } VMPortState; @@ -120,13 +121,22 @@ void vmmouse_set_data(const uint32_t *data) env->regs[R_ESI] = data[4]; env->regs[R_EDI] = data[5]; } +static const MemoryRegionPortio vmport_portio[] = { +{0, 1, 4, .read = vmport_ioport_read, .write = vmport_ioport_write }, +PORTIO_END_OF_LIST(), +}; + +static const MemoryRegionOps vmport_ops = { +.old_portio = vmport_portio +}; + static int vmport_initfn(ISADevice *dev) { VMPortState *s = DO_UPCAST(VMPortState, dev, dev); -register_ioport_read(0x5658, 1, 4, vmport_ioport_read, s); -register_ioport_write(0x5658, 1, 4, vmport_ioport_write, s); -isa_init_ioport(dev, 0x5658); +memory_region_init_io(&s->io, &vmport_ops, s, "vmport", 1); +isa_register_ioport(dev, &s->io, 0x5658); + port_state = s; /* Register some generic port commands */ vmport_register(VMPORT_CMD_GETVERSION, vmport_cmd_get_version, NULL); -- 1.7.6.3
[Qemu-devel] [PATCH] qemu-options: avoid #if in spicevmc texi help
Preprocessor directives cannot be used in STEXI/ETEXI sections since they are not passed through the preprocessor. The spicevmc chardev option help currently uses #if, which is included verbatim in the man page output. Fix this by simply stating that spicevmc chardevs are available only in builds with spice support. Signed-off-by: Stefan Hajnoczi --- qemu-options.hx |4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) diff --git a/qemu-options.hx b/qemu-options.hx index dfbabd0..d4fe990 100644 --- a/qemu-options.hx +++ b/qemu-options.hx @@ -1673,15 +1673,15 @@ Connect to a local parallel port. @option{path} specifies the path to the parallel port device. @option{path} is required. -#if defined(CONFIG_SPICE) @item -chardev spicevmc ,id=@var{id} ,debug=@var{debug}, name=@var{name} +@option{spicevmc} is only available when spice support is built in. + @option{debug} debug level for spicevmc @option{name} name of spice channel to connect to Connect to a spice virtual machine channel, such as vdiport. -#endif @end table ETEXI -- 1.7.6.3
[Qemu-devel] [PATCH 07/25] hw/arm11mpcore: Clean up to avoid using sysbus_mmio_init_cb2
From: Peter Maydell Clean up the initialisation of the realview_mpcore device to avoid using sysbus_init_mmio_cb2(): we can pass through the MemoryRegion of the private arm11mpcore_priv device directly now. Signed-off-by: Peter Maydell Signed-off-by: Avi Kivity --- hw/arm11mpcore.c | 13 + 1 files changed, 1 insertions(+), 12 deletions(-) diff --git a/hw/arm11mpcore.c b/hw/arm11mpcore.c index 7d60ef6..974a0d8 100644 --- a/hw/arm11mpcore.c +++ b/hw/arm11mpcore.c @@ -48,17 +48,6 @@ static void mpcore_rirq_set_irq(void *opaque, int irq, int level) } } -static void mpcore_rirq_map(SysBusDevice *dev, target_phys_addr_t base) -{ -mpcore_rirq_state *s = FROM_SYSBUS(mpcore_rirq_state, dev); -sysbus_mmio_map(s->priv, 0, base); -} - -static void mpcore_rirq_unmap(SysBusDevice *dev, target_phys_addr_t base) -{ -/* nothing to do */ -} - static int realview_mpcore_init(SysBusDevice *dev) { mpcore_rirq_state *s = FROM_SYSBUS(mpcore_rirq_state, dev); @@ -84,7 +73,7 @@ static int realview_mpcore_init(SysBusDevice *dev) } } qdev_init_gpio_in(&dev->qdev, mpcore_rirq_set_irq, 64); -sysbus_init_mmio_cb2(dev, mpcore_rirq_map, mpcore_rirq_unmap); +sysbus_init_mmio_region(dev, sysbus_mmio_get_region(s->priv, 0)); return 0; } -- 1.7.6.3
[Qemu-devel] [PATCH 22/25] pc: Convert port92 to isa_register_ioport
From: Richard Henderson Signed-off-by: Richard Henderson Signed-off-by: Avi Kivity --- hw/pc.c | 16 +--- 1 files changed, 13 insertions(+), 3 deletions(-) diff --git a/hw/pc.c b/hw/pc.c index 203627d..ded4758 100644 --- a/hw/pc.c +++ b/hw/pc.c @@ -428,6 +428,7 @@ void pc_cmos_init(ram_addr_t ram_size, ram_addr_t above_4g_mem_size, /* port 92 stuff: could be split off */ typedef struct Port92State { ISADevice dev; +MemoryRegion io; uint8_t outport; qemu_irq *a20_out; } Port92State; @@ -479,13 +480,22 @@ static void port92_reset(DeviceState *d) s->outport &= ~1; } +static const MemoryRegionPortio port92_portio[] = { +{ 0, 1, 1, .read = port92_read, .write = port92_write }, +PORTIO_END_OF_LIST(), +}; + +static const MemoryRegionOps port92_ops = { +.old_portio = port92_portio +}; + static int port92_initfn(ISADevice *dev) { Port92State *s = DO_UPCAST(Port92State, dev, dev); -register_ioport_read(0x92, 1, 1, port92_read, s); -register_ioport_write(0x92, 1, 1, port92_write, s); -isa_init_ioport(dev, 0x92); +memory_region_init_io(&s->io, &port92_ops, s, "port92", 1); +isa_register_ioport(dev, &s->io, 0x92); + s->outport = 0; return 0; } -- 1.7.6.3
[Qemu-devel] qemu guest agent spins in poll/nanosleep(100ms) when nothing is listening on host
I've been doing some experimentation with the QEMU guest agent and have noticed that when nothing is connected on the host side of the virtio serial channel, the guest agent just spins in a poll/sleep(100ms) loop.

I know you'd ordinarily expect some mgmt app in the host to be listening to the other end of the channel, but it still seems suboptimal to have to spin in a loop like this when nothing is listening, constantly causing wakeups in an otherwise idle guest.

Looking at the qemu-ga.c code I see two places where it might handle a poll event and then sleep, when nothing is on the other end of the virtio serial socket.

    case G_IO_STATUS_AGAIN:
        /* virtio causes us to spin here when no process is attached to
         * host-side chardev. sleep a bit to mitigate this */
        if (s->virtio) {
            usleep(100*1000);
        }
        return true;

    } else if (strcmp(s->method, "virtio-serial") == 0) {
        /* we spin on EOF for virtio-serial, so back off a bit. also,
         * dont close the connection in this case, it'll resume normal
         * operation when another process connects to host chardev */
        usleep(100*1000);
        goto out_noclose;
    }

I get the feeling that this kind of problem is inherent in the use of any virtio-serial channel, in the same way you can't detect EOF for a regular serial device channel either.

Given that virtio-serial is a nice paravirt device, is there anything we can do to it, to allow better handling of EOF by applications?

Or perhaps there is some way to make use of epoll() in edge-triggered mode to detect it already, because IIUC, edge-triggered mode should only fire once for the EOF condition, and then not fire again until something in the host actually sends some data?

Of course glib's event loop doesn't support edge-triggered events/epoll, but perhaps we could just call epoll() directly in the event handler, instead of the usleep() call?

Regards,
Daniel
-- 
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
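To make the edge-triggered behaviour concrete, here is a small Linux-only sketch. It uses a UNIX socketpair as a stand-in for the channel, so it only demonstrates the general EPOLLET semantics; whether a virtio-serial port reports EOF/hangup the same way is exactly the open question above and is not shown here. The function names are hypothetical.

```c
#include <assert.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Register fd for edge-triggered read events.
 * Returns the epoll descriptor, or -1 on failure. */
static int watch_edge(int fd)
{
    struct epoll_event ev = { .events = EPOLLIN | EPOLLET };
    int epfd = epoll_create1(0);

    if (epfd < 0) {
        return -1;
    }
    ev.data.fd = fd;
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) < 0) {
        close(epfd);
        return -1;
    }
    return epfd;
}

/* Non-blocking poll: number of events currently reported (0 or 1). */
static int count_ready(int epfd)
{
    struct epoll_event ev;

    return epoll_wait(epfd, &ev, 1, 0);
}

/* Write twice to a socketpair without ever reading, recording what
 * epoll reports after each step. */
static void edge_demo(int out[3])
{
    int sv[2];
    int rc, epfd;
    ssize_t n;

    rc = socketpair(AF_UNIX, SOCK_STREAM, 0, sv);
    assert(rc == 0);
    epfd = watch_edge(sv[0]);
    assert(epfd >= 0);

    n = write(sv[1], "x", 1);
    assert(n == 1);
    out[0] = count_ready(epfd);   /* new data: the edge is reported */
    out[1] = count_ready(epfd);   /* still unread, but no new edge */

    n = write(sv[1], "y", 1);
    assert(n == 1);
    out[2] = count_ready(epfd);   /* more data arrived: reported again */

    close(epfd);
    close(sv[0]);
    close(sv[1]);
}
```

With level-triggered polling the second check would keep reporting the descriptor as readable, which is the busy loop the usleep() calls work around; with EPOLLET the handler is woken only when the condition changes.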
[Qemu-devel] [PATCH 24/25] ide: Convert to isa_register_portio_list
From: Richard Henderson Signed-off-by: Richard Henderson Signed-off-by: Avi Kivity --- hw/ide/core.c | 30 +++--- hw/ide/internal.h |3 ++- hw/ide/isa.c |4 +--- hw/ide/piix.c |7 --- hw/ide/via.c |7 --- 5 files changed, 30 insertions(+), 21 deletions(-) diff --git a/hw/ide/core.c b/hw/ide/core.c index 4e76fc7..9eaf7f2 100644 --- a/hw/ide/core.c +++ b/hw/ide/core.c @@ -25,6 +25,7 @@ #include #include #include +#include #include "qemu-error.h" #include "qemu-timer.h" #include "sysemu.h" @@ -1969,20 +1970,27 @@ void ide_init2_with_non_qdev_drives(IDEBus *bus, DriveInfo *hd0, bus->dma = &ide_dma_nop; } -void ide_init_ioport(IDEBus *bus, int iobase, int iobase2) +static const MemoryRegionPortio ide_portio_list[] = { +{ 0, 8, 1, .read = ide_ioport_read, .write = ide_ioport_write }, +{ 0, 2, 2, .read = ide_data_readw, .write = ide_data_writew }, +{ 0, 4, 4, .read = ide_data_readl, .write = ide_data_writel }, +PORTIO_END_OF_LIST(), +}; + +static const MemoryRegionPortio ide_portio2_list[] = { +{ 0, 1, 1, .read = ide_status_read, .write = ide_cmd_write }, +PORTIO_END_OF_LIST(), +}; + +void ide_init_ioport(IDEBus *bus, ISADevice *dev, int iobase, int iobase2) { -register_ioport_write(iobase, 8, 1, ide_ioport_write, bus); -register_ioport_read(iobase, 8, 1, ide_ioport_read, bus); +/* ??? Assume only ISA and PCI configurations, and that the PCI-ISA + bridge has been setup properly to always register with ISA. 
*/ +isa_register_portio_list(dev, iobase, ide_portio_list, bus, "ide"); + if (iobase2) { -register_ioport_read(iobase2, 1, 1, ide_status_read, bus); -register_ioport_write(iobase2, 1, 1, ide_cmd_write, bus); +isa_register_portio_list(dev, iobase2, ide_portio2_list, bus, "ide"); } - -/* data ports */ -register_ioport_write(iobase, 2, 2, ide_data_writew, bus); -register_ioport_read(iobase, 2, 2, ide_data_readw, bus); -register_ioport_write(iobase, 4, 4, ide_data_writel, bus); -register_ioport_read(iobase, 4, 4, ide_data_readl, bus); } static bool is_identify_set(void *opaque, int version_id) diff --git a/hw/ide/internal.h b/hw/ide/internal.h index 9046e96..c39dc05 100644 --- a/hw/ide/internal.h +++ b/hw/ide/internal.h @@ -7,6 +7,7 @@ * non-internal declarations are in hw/ide.h */ #include +#include #include "iorange.h" #include "dma.h" #include "sysemu.h" @@ -600,7 +601,7 @@ int ide_init_drive(IDEState *s, BlockDriverState *bs, IDEDriveKind kind, void ide_init2(IDEBus *bus, qemu_irq irq); void ide_init2_with_non_qdev_drives(IDEBus *bus, DriveInfo *hd0, DriveInfo *hd1, qemu_irq irq); -void ide_init_ioport(IDEBus *bus, int iobase, int iobase2); +void ide_init_ioport(IDEBus *bus, ISADevice *isa, int iobase, int iobase2); void ide_exec_cmd(IDEBus *bus, uint32_t val); void ide_dma_cb(void *opaque, int ret); diff --git a/hw/ide/isa.c b/hw/ide/isa.c index 28b69d2..01a9e59 100644 --- a/hw/ide/isa.c +++ b/hw/ide/isa.c @@ -66,10 +66,8 @@ static int isa_ide_initfn(ISADevice *dev) ISAIDEState *s = DO_UPCAST(ISAIDEState, dev, dev); ide_bus_new(&s->bus, &s->dev.qdev, 0); -ide_init_ioport(&s->bus, s->iobase, s->iobase2); +ide_init_ioport(&s->bus, dev, s->iobase, s->iobase2); isa_init_irq(dev, &s->irq, s->isairq); -isa_init_ioport_range(dev, s->iobase, 8); -isa_init_ioport(dev, s->iobase2); ide_init2(&s->bus, s->irq); vmstate_register(&dev->qdev, 0, &vmstate_ide_isa, s); return 0; diff --git a/hw/ide/piix.c b/hw/ide/piix.c index 88d3181..08cbbe2 100644 --- a/hw/ide/piix.c +++ 
b/hw/ide/piix.c @@ -122,8 +122,7 @@ static void piix3_reset(void *opaque) } static void pci_piix_init_ports(PCIIDEState *d) { -int i; -struct { +static const struct { int iobase; int iobase2; int isairq; @@ -131,10 +130,12 @@ static void pci_piix_init_ports(PCIIDEState *d) { {0x1f0, 0x3f6, 14}, {0x170, 0x376, 15}, }; +int i; for (i = 0; i < 2; i++) { ide_bus_new(&d->bus[i], &d->dev.qdev, i); -ide_init_ioport(&d->bus[i], port_info[i].iobase, port_info[i].iobase2); +ide_init_ioport(&d->bus[i], NULL, port_info[i].iobase, +port_info[i].iobase2); ide_init2(&d->bus[i], isa_get_irq(port_info[i].isairq)); bmdma_init(&d->bus[i], &d->bmdma[i], d); diff --git a/hw/ide/via.c b/hw/ide/via.c index dab8a39..098f150 100644 --- a/hw/ide/via.c +++ b/hw/ide/via.c @@ -146,8 +146,7 @@ static void via_reset(void *opaque) } static void vt82c686b_init_ports(PCIIDEState *d) { -int i; -struct { +static const struct { int iobase; int iobase2; int isairq; @@ -155,10 +154,12 @@ static void vt82c686b_init_ports(PCIIDEState *d) { {0x1f0, 0x3f6, 14}, {0x170, 0x376, 15}, }; +int i;
[Qemu-devel] [PATCH 18/25] ne2000: Convert to isa_register_ioport
From: Richard Henderson

Signed-off-by: Richard Henderson
Signed-off-by: Avi Kivity
---
 hw/ne2000-isa.c |    5 +----
 1 files changed, 1 insertions(+), 4 deletions(-)

diff --git a/hw/ne2000-isa.c b/hw/ne2000-isa.c
index 756ed5c..11ffee7 100644
--- a/hw/ne2000-isa.c
+++ b/hw/ne2000-isa.c
@@ -68,10 +68,7 @@ static int isa_ne2000_initfn(ISADevice *dev)
     NE2000State *s = &isa->ne2000;

     ne2000_setup_io(s, 0x20);
-    isa_init_ioport_range(dev, isa->iobase, 16);
-    isa_init_ioport_range(dev, isa->iobase + 0x10, 2);
-    isa_init_ioport(dev, isa->iobase + 0x1f);
-    memory_region_add_subregion(get_system_io(), isa->iobase, &s->io);
+    isa_register_ioport(dev, &s->io, isa->iobase);

     isa_init_irq(dev, &s->irq, isa->isairq);

--
1.7.6.3
[Qemu-devel] [PATCH 06/25] ppc405_boards: convert to memory API
Signed-off-by: Avi Kivity --- hw/ppc405_boards.c | 84 +++- 1 files changed, 37 insertions(+), 47 deletions(-) diff --git a/hw/ppc405_boards.c b/hw/ppc405_boards.c index ca65ac3..b28bdda 100644 --- a/hw/ppc405_boards.c +++ b/hw/ppc405_boards.c @@ -137,16 +137,16 @@ static void ref405ep_fpga_writel (void *opaque, ref405ep_fpga_writeb(opaque, addr + 3, value & 0xFF); } -static CPUReadMemoryFunc * const ref405ep_fpga_read[] = { -&ref405ep_fpga_readb, -&ref405ep_fpga_readw, -&ref405ep_fpga_readl, -}; - -static CPUWriteMemoryFunc * const ref405ep_fpga_write[] = { -&ref405ep_fpga_writeb, -&ref405ep_fpga_writew, -&ref405ep_fpga_writel, +static const MemoryRegionOps ref405ep_fpga_ops = { +.old_mmio = { +.read = { +ref405ep_fpga_readb, ref405ep_fpga_readw, ref405ep_fpga_readl, +}, +.write = { +ref405ep_fpga_writeb, ref405ep_fpga_writew, ref405ep_fpga_writel, +}, +}, +.endianness = DEVICE_NATIVE_ENDIAN, }; static void ref405ep_fpga_reset (void *opaque) @@ -158,16 +158,15 @@ static void ref405ep_fpga_reset (void *opaque) fpga->reg1 = 0x0F; } -static void ref405ep_fpga_init (uint32_t base) +static void ref405ep_fpga_init (MemoryRegion *sysmem, uint32_t base) { ref405ep_fpga_t *fpga; -int fpga_memory; +MemoryRegion *fpga_memory = g_new(MemoryRegion, 1); fpga = g_malloc0(sizeof(ref405ep_fpga_t)); -fpga_memory = cpu_register_io_memory(ref405ep_fpga_read, - ref405ep_fpga_write, fpga, - DEVICE_NATIVE_ENDIAN); -cpu_register_physical_memory(base, 0x0100, fpga_memory); +memory_region_init_io(fpga_memory, &ref405ep_fpga_ops, fpga, + "fpga", 0x0100); +memory_region_add_subregion(sysmem, base, fpga_memory); qemu_register_reset(&ref405ep_fpga_reset, fpga); } @@ -183,7 +182,8 @@ static void ref405ep_init (ram_addr_t ram_size, CPUPPCState *env; qemu_irq *pic; MemoryRegion *bios; -ram_addr_t sram_offset, bdloc; +MemoryRegion *sram = g_new(MemoryRegion, 1); +ram_addr_t bdloc; MemoryRegion *ram_memories = g_malloc(2 * sizeof(*ram_memories)); target_phys_addr_t ram_bases[2], ram_sizes[2]; 
target_ulong sram_size; @@ -195,6 +195,7 @@ static void ref405ep_init (ram_addr_t ram_size, int linux_boot; int fl_idx, fl_sectors, len; DriveInfo *dinfo; +MemoryRegion *sysmem = get_system_memory(); /* XXX: fix this */ memory_region_init_ram(&ram_memories[0], NULL, "ef405ep.ram", 0x0800); @@ -207,16 +208,12 @@ static void ref405ep_init (ram_addr_t ram_size, #ifdef DEBUG_BOARD_INIT printf("%s: register cpu\n", __func__); #endif -env = ppc405ep_init(get_system_memory(), ram_memories, ram_bases, ram_sizes, +env = ppc405ep_init(sysmem, ram_memories, ram_bases, ram_sizes, , &pic, kernel_filename == NULL ? 0 : 1); /* allocate SRAM */ sram_size = 512 * 1024; -sram_offset = qemu_ram_alloc(NULL, "ef405ep.sram", sram_size); -#ifdef DEBUG_BOARD_INIT -printf("%s: register SRAM at offset %08lx\n", __func__, sram_offset); -#endif -cpu_register_physical_memory(0xFFF0, sram_size, - sram_offset | IO_MEM_RAM); +memory_region_init_ram(sram, NULL, "ef405ep.sram", sram_size); +memory_region_add_subregion(sysmem, 0xFFF0, sram); /* allocate and load BIOS */ #ifdef DEBUG_BOARD_INIT printf("%s: register BIOS\n", __func__); @@ -263,14 +260,13 @@ static void ref405ep_init (ram_addr_t ram_size, } bios_size = (bios_size + 0xfff) & ~0xfff; memory_region_set_readonly(bios, true); -memory_region_add_subregion(get_system_memory(), -(uint32_t)(-bios_size), bios); +memory_region_add_subregion(sysmem, (uint32_t)(-bios_size), bios); } /* Register FPGA */ #ifdef DEBUG_BOARD_INIT printf("%s: register FPGA\n", __func__); #endif -ref405ep_fpga_init(0xF030); +ref405ep_fpga_init(sysmem, 0xF030); /* Register NVRAM */ #ifdef DEBUG_BOARD_INIT printf("%s: register NVRAM\n", __func__); @@ -468,16 +464,12 @@ static void taihu_cpld_writel (void *opaque, taihu_cpld_writeb(opaque, addr + 3, value & 0xFF); } -static CPUReadMemoryFunc * const taihu_cpld_read[] = { -&taihu_cpld_readb, -&taihu_cpld_readw, -&taihu_cpld_readl, -}; - -static CPUWriteMemoryFunc * const taihu_cpld_write[] = { -&taihu_cpld_writeb, 
-&taihu_cpld_writew, -&taihu_cpld_writel, +static const MemoryRegionOps taihu_cpld_ops = { +.old_mmio = { +.read = { taihu_cpld_readb, taihu_cpld_readw, taihu_cpld_readl, }, +.write = { taihu_cpld_writeb, taihu_cpld_writew, taihu_cpld_writel, }, +}, +.endianness = DEVICE_NATIVE_ENDIAN, }; static vo
[Qemu-devel] [PATCH 04/25] petalogix_ml605: convert to memory API
Signed-off-by: Avi Kivity --- hw/petalogix_ml605_mmu.c | 15 +++ 1 files changed, 7 insertions(+), 8 deletions(-) diff --git a/hw/petalogix_ml605_mmu.c b/hw/petalogix_ml605_mmu.c index 2a0f7fd..fb4ba29 100644 --- a/hw/petalogix_ml605_mmu.c +++ b/hw/petalogix_ml605_mmu.c @@ -149,8 +149,8 @@ static uint64_t translate_kernel_address(void *opaque, uint64_t addr) DriveInfo *dinfo; int i; target_phys_addr_t ddr_base = MEMORY_BASEADDR; -ram_addr_t phys_lmb_bram; -ram_addr_t phys_ram; +MemoryRegion *phys_lmb_bram = g_new(MemoryRegion, 1); +MemoryRegion *phys_ram = g_new(MemoryRegion, 1); qemu_irq irq[32], *cpu_irq; /* init CPUs */ @@ -162,13 +162,12 @@ static uint64_t translate_kernel_address(void *opaque, uint64_t addr) qemu_register_reset(main_cpu_reset, env); /* Attach emulated BRAM through the LMB. */ -phys_lmb_bram = qemu_ram_alloc(NULL, "petalogix_ml605.lmb_bram", - LMB_BRAM_SIZE); -cpu_register_physical_memory(0x, LMB_BRAM_SIZE, - phys_lmb_bram | IO_MEM_RAM); +memory_region_init_ram(phys_lmb_bram, NULL, "petalogix_ml605.lmb_bram", + LMB_BRAM_SIZE); +memory_region_add_subregion(address_space_mem, 0x, phys_lmb_bram); -phys_ram = qemu_ram_alloc(NULL, "petalogix_ml605.ram", ram_size); -cpu_register_physical_memory(ddr_base, ram_size, phys_ram | IO_MEM_RAM); +memory_region_init_ram(phys_ram, NULL, "petalogix_ml605.ram", ram_size); +memory_region_add_subregion(address_space_mem, ddr_base, phys_ram); dinfo = drive_get(IF_PFLASH, 0, 0); /* 5th parameter 2 means bank-width -- 1.7.6.3
[Qemu-devel] [PATCH 20/25] sb16: Convert to isa_register_portio_list
From: Richard Henderson Signed-off-by: Richard Henderson Signed-off-by: Avi Kivity --- hw/sb16.c | 32 +--- 1 files changed, 13 insertions(+), 19 deletions(-) diff --git a/hw/sb16.c b/hw/sb16.c index a76df1b..fe927e2 100644 --- a/hw/sb16.c +++ b/hw/sb16.c @@ -1341,12 +1341,21 @@ static int sb16_post_load (void *opaque, int version_id) } }; +static const MemoryRegionPortio sb16_ioport_list[] = { +{ 4, 1, 1, .write = mixer_write_indexb }, +{ 4, 1, 2, .write = mixer_write_indexw }, +{ 5, 1, 1, .read = mixer_read, .write = mixer_write_datab }, +{ 6, 1, 1, .read = dsp_read, .write = dsp_write }, +{ 10, 1, 1, .read = dsp_read }, +{ 12, 1, 1, .write = dsp_write }, +{ 12, 4, 1, .read = dsp_read }, +PORTIO_END_OF_LIST(), +}; + + static int sb16_initfn (ISADevice *dev) { -static const uint8_t dsp_write_ports[] = {0x6, 0xc}; -static const uint8_t dsp_read_ports[] = {0x6, 0xa, 0xc, 0xd, 0xe, 0xf}; SB16State *s; -int i; s = DO_UPCAST (SB16State, dev, dev); @@ -1366,22 +1375,7 @@ static int sb16_initfn (ISADevice *dev) dolog ("warning: Could not create auxiliary timer\n"); } -for (i = 0; i < ARRAY_SIZE (dsp_write_ports); i++) { -register_ioport_write (s->port + dsp_write_ports[i], 1, 1, dsp_write, s); -isa_init_ioport(dev, s->port + dsp_write_ports[i]); -} - -for (i = 0; i < ARRAY_SIZE (dsp_read_ports); i++) { -register_ioport_read (s->port + dsp_read_ports[i], 1, 1, dsp_read, s); -isa_init_ioport(dev, s->port + dsp_read_ports[i]); -} - -register_ioport_write (s->port + 0x4, 1, 1, mixer_write_indexb, s); -register_ioport_write (s->port + 0x4, 1, 2, mixer_write_indexw, s); -isa_init_ioport(dev, s->port + 0x4); -register_ioport_read (s->port + 0x5, 1, 1, mixer_read, s); -register_ioport_write (s->port + 0x5, 1, 1, mixer_write_datab, s); -isa_init_ioport(dev, s->port + 0x5); +isa_register_portio_list(dev, s->port, sb16_ioport_list, s, "sb16"); DMA_register_channel (s->hdma, SB_read_DMA, s); DMA_register_channel (s->dma, SB_read_DMA, s); -- 1.7.6.3
[Qemu-devel] [PATCH 09/25] ppc_newworld: convert to memory API
Signed-off-by: Avi Kivity --- hw/ppc_newworld.c | 39 +-- 1 files changed, 17 insertions(+), 22 deletions(-) diff --git a/hw/ppc_newworld.c b/hw/ppc_newworld.c index b1cc3d7..946070c 100644 --- a/hw/ppc_newworld.c +++ b/hw/ppc_newworld.c @@ -83,12 +83,13 @@ #endif /* UniN device */ -static void unin_writel (void *opaque, target_phys_addr_t addr, uint32_t value) +static void unin_write(void *opaque, target_phys_addr_t addr, uint64_t value, + unsigned size) { -UNIN_DPRINTF("writel addr " TARGET_FMT_plx " val %x\n", addr, value); +UNIN_DPRINTF("write addr " TARGET_FMT_plx " val %"PRIx64"\n", addr, value); } -static uint32_t unin_readl (void *opaque, target_phys_addr_t addr) +static uint64_t unin_read(void *opaque, target_phys_addr_t addr, unsigned size) { uint32_t value; @@ -98,16 +99,10 @@ static uint32_t unin_readl (void *opaque, target_phys_addr_t addr) return value; } -static CPUWriteMemoryFunc * const unin_write[] = { -&unin_writel, -&unin_writel, -&unin_writel, -}; - -static CPUReadMemoryFunc * const unin_read[] = { -&unin_readl, -&unin_readl, -&unin_readl, +static const MemoryRegionOps unin_ops = { +.read = unin_read, +.write = unin_write, +.endianness = DEVICE_NATIVE_ENDIAN, }; static int fw_cfg_boot_set(void *opaque, const char *boot_device) @@ -137,9 +132,9 @@ static void ppc_core99_init (ram_addr_t ram_size, CPUState *env = NULL; char *filename; qemu_irq *pic, **openpic_irqs; -int unin_memory; +MemoryRegion *unin_memory = g_new(MemoryRegion, 1); int linux_boot, i; -ram_addr_t ram_offset, bios_offset; +MemoryRegion *ram = g_new(MemoryRegion, 1), *bios = g_new(MemoryRegion, 1); target_phys_addr_t kernel_base, initrd_base, cmdline_base = 0; long kernel_size, initrd_size; PCIBus *pci_bus; @@ -175,15 +170,16 @@ static void ppc_core99_init (ram_addr_t ram_size, } /* allocate RAM */ -ram_offset = qemu_ram_alloc(NULL, "ppc_core99.ram", ram_size); -cpu_register_physical_memory(0, ram_size, ram_offset); +memory_region_init_ram(ram, NULL, "ppc_core99.ram", ram_size); 
+memory_region_add_subregion(get_system_memory(), 0, ram); /* allocate and load BIOS */ -bios_offset = qemu_ram_alloc(NULL, "ppc_core99.bios", BIOS_SIZE); +memory_region_init_ram(bios, NULL, "ppc_core99.bios", BIOS_SIZE); if (bios_name == NULL) bios_name = PROM_FILENAME; filename = qemu_find_file(QEMU_FILE_TYPE_BIOS, bios_name); -cpu_register_physical_memory(PROM_ADDR, BIOS_SIZE, bios_offset | IO_MEM_ROM); +memory_region_set_readonly(bios, true); +memory_region_add_subregion(get_system_memory(), PROM_ADDR, bios); /* Load OpenBIOS (ELF) */ if (filename) { @@ -266,9 +262,8 @@ static void ppc_core99_init (ram_addr_t ram_size, isa_mmio_init(0xf200, 0x0080); /* UniN init */ -unin_memory = cpu_register_io_memory(unin_read, unin_write, NULL, - DEVICE_NATIVE_ENDIAN); -cpu_register_physical_memory(0xf800, 0x1000, unin_memory); +memory_region_init_io(unin_memory, &unin_ops, NULL, "unin", 0x1000); +memory_region_add_subregion(get_system_memory(), 0xf800, unin_memory); openpic_irqs = g_malloc0(smp_cpus * sizeof(qemu_irq *)); openpic_irqs[0] = -- 1.7.6.3
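The ppc_newworld patch above replaces three per-width handler arrays with one read and one write callback that take a size argument. A minimal sketch of how the two styles relate follows. All names in it (OldStyleOps, width_index, unified_read and the toy handlers) are hypothetical and are not QEMU API; it only assumes that access sizes 1, 2 and 4 map to the byte, word and long slots.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the old style: one handler per access width. */
typedef uint32_t (*WidthRead)(void *opaque, uint64_t addr);

typedef struct {
    WidthRead read[3];          /* [0] byte, [1] word, [2] long */
} OldStyleOps;

/* Map an access size in bytes to a slot in the handler array. */
static int width_index(unsigned size)
{
    switch (size) {
    case 1: return 0;
    case 2: return 1;
    case 4: return 2;
    default: return -1;         /* unsupported width */
    }
}

/* New style: a single callback that receives the access size. */
static uint64_t unified_read(const OldStyleOps *ops, void *opaque,
                             uint64_t addr, unsigned size)
{
    int i = width_index(size);

    assert(i >= 0 && ops->read[i] != 0);
    return ops->read[i](opaque, addr);
}

/* Toy handlers that return a recognisable value per width. */
static uint32_t toy_readb(void *opaque, uint64_t addr)
{
    (void)opaque; (void)addr;
    return 0x11;
}

static uint32_t toy_readw(void *opaque, uint64_t addr)
{
    (void)opaque; (void)addr;
    return 0x2222;
}

static uint32_t toy_readl(void *opaque, uint64_t addr)
{
    (void)opaque; (void)addr;
    return 0x44444444;
}

static const OldStyleOps toy_ops = {
    .read = { toy_readb, toy_readw, toy_readl },
};
```

In the patch itself the UniN device uses the same handler for every width, which is presumably why it could drop the arrays and take the size as a parameter directly.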
[Qemu-devel] [PATCH 08/25] hw/versatile_pci: Expose multiple sysbus mmio regions
From: Peter Maydell Clean up versatile_pci to expose the various PCI mmio regions properly as separate mmio regions rather than as a single mmio which uses callbacks to map and unmap everything. Signed-off-by: Peter Maydell Signed-off-by: Avi Kivity --- hw/realview.c | 12 ++-- hw/versatile_pci.c | 42 -- hw/versatilepb.c | 12 ++-- 3 files changed, 28 insertions(+), 38 deletions(-) diff --git a/hw/realview.c b/hw/realview.c index 549bb15..11ffb8a 100644 --- a/hw/realview.c +++ b/hw/realview.c @@ -272,8 +272,16 @@ static void realview_init(ram_addr_t ram_size, sysbus_create_simple("pl031", 0x10017000, pic[10]); if (!is_pb) { -dev = sysbus_create_varargs("realview_pci", 0x6000, -pic[48], pic[49], pic[50], pic[51], NULL); +dev = qdev_create(NULL, "realview_pci"); +busdev = sysbus_from_qdev(dev); +qdev_init_nofail(dev); +sysbus_mmio_map(busdev, 0, 0x6100); /* PCI self-config */ +sysbus_mmio_map(busdev, 1, 0x6200); /* PCI config */ +sysbus_mmio_map(busdev, 2, 0x6300); /* PCI I/O */ +sysbus_connect_irq(busdev, 0, pic[48]); +sysbus_connect_irq(busdev, 1, pic[49]); +sysbus_connect_irq(busdev, 2, pic[50]); +sysbus_connect_irq(busdev, 3, pic[51]); pci_bus = (PCIBus *)qdev_get_child_bus(dev, "pci"); if (usb_enabled) { usb_ohci_init_pci(pci_bus, -1); diff --git a/hw/versatile_pci.c b/hw/versatile_pci.c index 98e56f1..8a88696 100644 --- a/hw/versatile_pci.c +++ b/hw/versatile_pci.c @@ -58,38 +58,6 @@ static void pci_vpb_set_irq(void *opaque, int irq_num, int level) qemu_set_irq(pic[irq_num], level); } - -static void pci_vpb_map(SysBusDevice *dev, target_phys_addr_t base) -{ -PCIVPBState *s = (PCIVPBState *)dev; -/* Selfconfig area. */ -memory_region_add_subregion(get_system_memory(), base + 0x0100, -&s->mem_config); -/* Normal config area. */ -memory_region_add_subregion(get_system_memory(), base + 0x0200, -&s->mem_config2); - -if (s->realview) { -/* IO memory area. 
*/ -memory_region_add_subregion(get_system_memory(), base + 0x0300, -&s->isa); -} -} - -static void pci_vpb_unmap(SysBusDevice *dev, target_phys_addr_t base) -{ -PCIVPBState *s = (PCIVPBState *)dev; -/* Selfconfig area. */ -memory_region_del_subregion(get_system_memory(), &s->mem_config); -/* Normal config area. */ -memory_region_del_subregion(get_system_memory(), &s->mem_config2); - -if (s->realview) { -/* IO memory area. */ -memory_region_del_subregion(get_system_memory(), &s->isa); -} -} - static int pci_vpb_init(SysBusDevice *dev) { PCIVPBState *s = FROM_SYSBUS(PCIVPBState, dev); @@ -106,16 +74,22 @@ static int pci_vpb_init(SysBusDevice *dev) /* ??? Register memory space. */ +/* Our memory regions are: + * 0 : PCI self config window + * 1 : PCI config window + * 2 : PCI IO window (realview_pci only) + */ memory_region_init_io(&s->mem_config, &pci_vpb_config_ops, bus, "pci-vpb-selfconfig", 0x100); +sysbus_init_mmio_region(dev, &s->mem_config); memory_region_init_io(&s->mem_config2, &pci_vpb_config_ops, bus, "pci-vpb-config", 0x100); +sysbus_init_mmio_region(dev, &s->mem_config2); if (s->realview) { isa_mmio_setup(&s->isa, 0x010); +sysbus_init_mmio_region(dev, &s->isa); } -sysbus_init_mmio_cb2(dev, pci_vpb_map, pci_vpb_unmap); - pci_create_simple(bus, -1, "versatile_pci_host"); return 0; } diff --git a/hw/versatilepb.c b/hw/versatilepb.c index 49f8f5f..68402cc 100644 --- a/hw/versatilepb.c +++ b/hw/versatilepb.c @@ -181,6 +181,7 @@ static void versatile_init(ram_addr_t ram_size, qemu_irq pic[32]; qemu_irq sic[32]; DeviceState *dev, *sysctl; +SysBusDevice *busdev; PCIBus *pci_bus; NICInfo *nd; int n; @@ -219,8 +220,15 @@ static void versatile_init(ram_addr_t ram_size, sysbus_create_simple("pl050_keyboard", 0x10006000, sic[3]); sysbus_create_simple("pl050_mouse", 0x10007000, sic[4]); -dev = sysbus_create_varargs("versatile_pci", 0x4000, -sic[27], sic[28], sic[29], sic[30], NULL); +dev = qdev_create(NULL, "versatile_pci"); +busdev = sysbus_from_qdev(dev); 
+qdev_init_nofail(dev); +sysbus_mmio_map(busdev, 0, 0x4100); /* PCI self-config */ +sysbus_mmio_map(busdev, 1, 0x4200); /* PCI config */ +sysbus_connect_irq(busdev, 0, sic[27]); +sysbus_connect_irq(busdev, 1, sic[28]); +sysbus_connect_irq(busdev, 2, sic[29]); +sysbus_connect_irq(busdev, 3, sic[30]); pci_bus = (PCIBus *)qdev_get_child_bus(dev, "pci
Re: [Qemu-devel] [PATCH] runstate: do not discard runstate changes when paused
On 10/05/2011 08:50 PM, Luiz Capitulino wrote:
> > > I'm not exactly against the semantics you're proposing, but they don't
> > > seem to fit today's qemu.
> >
> > Today's qemu is broken here.
>
> For me it's broken because it will abort() if you migrate a paused vm, for
> you it seems to be broken at the semantic level.
>
> We can fix the semantics without breaking compatibility.

s/We can/We can't/

I think we should divide stop causes into three groups:

1) those that are undone by QEMU itself:

    RSTATE_DEBUG
    RSTATE_SAVEVM
    RSTATE_PRE_MIGRATE
    RSTATE_RESTORE

For these a lock/release scheme is definitely better. The VM should not
start until none of these conditions is in effect, even after a "cont"
command.

2) those that are undone by management:

    RSTATE_IO_ERROR

For this we can add a new "retry" monitor command that guarantees no
races if the user issues a "stop" or "cont" command while management is
processing it. Effectively, it is also a lock/release scheme but
controlled by management.

3) those that are undone by "cont":

    RSTATE_PRE_LAUNCH
    RSTATE_PAUSED
    RSTATE_WATCHDOG
    RSTATE_POST_MIGRATE
    RSTATE_PANICKED

I put here the three runstates where the VM should really not be
restarted at all. We can then add a new "start" command that only flips
these five to RSTATE_RUNNING.

So the runstate is composed of six elements: five lock/unlock states (of
which only one can be unlocked by the user), and one running/paused
state (composed of five pause reasons + "none"). That is, the runstate
is a tuple like

    [debug, savevm, pre_migrate, restore, io_error, pause_reason]

and for the VM to run it must look like

    [false, false, false, false, false, none].

The four monitor commands would be:

1) "stop": if runstate[pause_reason] == none then
   runstate[pause_reason] = paused

2) "retry": runstate[io_error] = false

3) "start": runstate[pause_reason] = none

   There could also be a differentiation between "start" and "start -f",
   where "-f" would be needed to get out of RSTATE_POST_MIGRATE,
   RSTATE_PANICKED and probably RSTATE_WATCHDOG too.

4) "cont": backwards compatibility provided by "retry"+"start -f".

How does this look?

Paolo
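The tuple and the four commands proposed above can be sketched in C as follows. This is only an illustration of the proposal as worded in the mail: every identifier (RunStateTuple, PauseReason, vm_may_run, cmd_stop and so on) is hypothetical and none of it is existing QEMU code. The sketch also assumes that "start" without "-f" refuses to leave the post-migrate, panicked and watchdog states.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names throughout; this is not QEMU's runstate API. */
typedef enum {
    PAUSE_NONE,
    PAUSE_PRE_LAUNCH,
    PAUSE_PAUSED,
    PAUSE_WATCHDOG,
    PAUSE_POST_MIGRATE,
    PAUSE_PANICKED,
} PauseReason;

typedef struct {
    bool debug, savevm, pre_migrate, restore; /* group 1: released by QEMU */
    bool io_error;                            /* group 2: released by "retry" */
    PauseReason pause_reason;                 /* group 3: cleared by "start" */
} RunStateTuple;

/* The VM runs only when no lock is held and there is no pause reason. */
static bool vm_may_run(const RunStateTuple *rs)
{
    assert(rs != NULL);
    return !rs->debug && !rs->savevm && !rs->pre_migrate && !rs->restore
        && !rs->io_error && rs->pause_reason == PAUSE_NONE;
}

/* "stop": only records a pause if nothing else already paused the VM. */
static void cmd_stop(RunStateTuple *rs)
{
    if (rs->pause_reason == PAUSE_NONE) {
        rs->pause_reason = PAUSE_PAUSED;
    }
}

/* "retry": releases the management-controlled I/O error lock. */
static void cmd_retry(RunStateTuple *rs)
{
    rs->io_error = false;
}

/* "start" / "start -f": returns false if the state needs -f and lacks it. */
static bool cmd_start(RunStateTuple *rs, bool force)
{
    switch (rs->pause_reason) {
    case PAUSE_WATCHDOG:
    case PAUSE_POST_MIGRATE:
    case PAUSE_PANICKED:
        if (!force) {
            return false;
        }
        break;
    default:
        break;
    }
    rs->pause_reason = PAUSE_NONE;
    return true;
}

/* "cont": backwards-compatible shorthand for "retry" + "start -f". */
static void cmd_cont(RunStateTuple *rs)
{
    cmd_retry(rs);
    (void)cmd_start(rs, true);
}
```

Note that in this sketch "cont" never touches the group 1 flags, so a VM that is in the middle of a savevm stays stopped until QEMU itself releases that lock.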
[Qemu-devel] [PATCH 19/25] parallel: Convert to isa_register_portio_list
From: Richard Henderson Signed-off-by: Richard Henderson Signed-off-by: Avi Kivity --- hw/parallel.c | 47 --- 1 files changed, 28 insertions(+), 19 deletions(-) diff --git a/hw/parallel.c b/hw/parallel.c index ecbc8c3..8494d94 100644 --- a/hw/parallel.c +++ b/hw/parallel.c @@ -448,6 +448,29 @@ static void parallel_reset(void *opaque) static const int isa_parallel_io[MAX_PARALLEL_PORTS] = { 0x378, 0x278, 0x3bc }; +static const MemoryRegionPortio isa_parallel_portio_hw_list[] = { +{ 0, 8, 1, + .read = parallel_ioport_read_hw, + .write = parallel_ioport_write_hw }, +{ 4, 1, 2, + .read = parallel_ioport_eppdata_read_hw2, + .write = parallel_ioport_eppdata_write_hw2 }, +{ 4, 1, 4, + .read = parallel_ioport_eppdata_read_hw4, + .write = parallel_ioport_eppdata_write_hw4 }, +{ 0x400, 8, 1, + .read = parallel_ioport_ecp_read, + .write = parallel_ioport_ecp_write }, +PORTIO_END_OF_LIST(), +}; + +static const MemoryRegionPortio isa_parallel_portio_sw_list[] = { +{ 0, 8, 1, + .read = parallel_ioport_read_sw, + .write = parallel_ioport_write_sw }, +PORTIO_END_OF_LIST(), +}; + static int parallel_isa_initfn(ISADevice *dev) { static int index; @@ -478,25 +501,11 @@ static int parallel_isa_initfn(ISADevice *dev) s->status = dummy; } -if (s->hw_driver) { -register_ioport_write(base, 8, 1, parallel_ioport_write_hw, s); -register_ioport_read(base, 8, 1, parallel_ioport_read_hw, s); -isa_init_ioport_range(dev, base, 8); - -register_ioport_write(base+4, 1, 2, parallel_ioport_eppdata_write_hw2, s); -register_ioport_read(base+4, 1, 2, parallel_ioport_eppdata_read_hw2, s); -register_ioport_write(base+4, 1, 4, parallel_ioport_eppdata_write_hw4, s); -register_ioport_read(base+4, 1, 4, parallel_ioport_eppdata_read_hw4, s); -isa_init_ioport(dev, base+4); -register_ioport_write(base+0x400, 8, 1, parallel_ioport_ecp_write, s); -register_ioport_read(base+0x400, 8, 1, parallel_ioport_ecp_read, s); -isa_init_ioport_range(dev, base+0x400, 8); -} -else { -register_ioport_write(base, 8, 1, 
parallel_ioport_write_sw, s); -register_ioport_read(base, 8, 1, parallel_ioport_read_sw, s); -isa_init_ioport_range(dev, base, 8); -} +isa_register_portio_list(dev, base, + (s->hw_driver + ? &isa_parallel_portio_hw_list[0] + : &isa_parallel_portio_sw_list[0]), + s, "parallel"); return 0; } -- 1.7.6.3
[Qemu-devel] [PATCH 14/25] fdc: Convert to isa_register_portio_list
From: Richard Henderson

Signed-off-by: Richard Henderson
Signed-off-by: Avi Kivity
---
 hw/fdc.c |   34 ++++------------------------------
 1 files changed, 4 insertions(+), 30 deletions(-)

diff --git a/hw/fdc.c b/hw/fdc.c
index 0f1cee9..4b06e04 100644
--- a/hw/fdc.c
+++ b/hw/fdc.c
@@ -424,7 +424,6 @@ struct FDCtrl {

 typedef struct FDCtrlISABus {
     ISADevice busdev;
-    MemoryRegion io_0, io_7;
     struct FDCtrl state;
     int32_t bootindexA;
     int32_t bootindexB;
@@ -1880,32 +1879,10 @@ static int fdctrl_init_common(FDCtrl *fdctrl)
     return fdctrl_connect_drives(fdctrl);
 }

-static uint32_t fdctrl_read_port_7(void *opaque, uint32_t reg)
-{
-    return fdctrl_read(opaque, reg + 7);
-}
-
-static void fdctrl_write_port_7(void *opaque, uint32_t reg, uint32_t value)
-{
-    fdctrl_write(opaque, reg + 7, value);
-}
-
-static const MemoryRegionPortio fdc_portio_0[] = {
+static const MemoryRegionPortio fdc_portio_list[] = {
     { 1, 5, 1, .read = fdctrl_read, .write = fdctrl_write },
-    PORTIO_END_OF_LIST()
-};
-
-static const MemoryRegionPortio fdc_portio_7[] = {
-    { 0, 1, 1, .read = fdctrl_read_port_7, .write = fdctrl_write_port_7 },
-    PORTIO_END_OF_LIST()
-};
-
-static const MemoryRegionOps fdc_ioport_0_ops = {
-    .old_portio = fdc_portio_0
-};
-
-static const MemoryRegionOps fdc_ioport_7_ops = {
-    .old_portio = fdc_portio_7
+    { 7, 1, 1, .read = fdctrl_read, .write = fdctrl_write },
+    PORTIO_END_OF_LIST(),
 };

 static int isabus_fdc_init1(ISADevice *dev)
@@ -1917,10 +1894,7 @@ static int isabus_fdc_init1(ISADevice *dev)
     int dma_chann = 2;
     int ret;

-    memory_region_init_io(&isa->io_0, &fdc_ioport_0_ops, fdctrl, "fdc", 6);
-    memory_region_init_io(&isa->io_7, &fdc_ioport_7_ops, fdctrl, "fdc", 1);
-    isa_register_ioport(dev, &isa->io_0, iobase);
-    isa_register_ioport(dev, &isa->io_7, iobase + 7);
+    isa_register_portio_list(dev, iobase, fdc_portio_list, fdctrl, "fdc");
     isa_init_irq(&isa->busdev, &fdctrl->irq, isairq);
     fdctrl->dma_chann = dma_chann;

--
1.7.6.3
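The fdc table above describes two disjoint port ranges relative to one base: offsets 1 to 5 and offset 7, leaving offset 6 unclaimed. A rough sketch of how such an { offset, len, size } table can be searched is shown below. The PortioEntry type, the END_OF_LIST terminator and find_entry() are hypothetical stand-ins written for this note, not the MemoryRegionPortio implementation, and the label field merely takes the place of the real read/write callbacks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table entry modelled on the { offset, len, size } triples. */
typedef struct {
    uint32_t offset;    /* first port, relative to the device base */
    uint32_t len;       /* number of consecutive ports covered */
    unsigned size;      /* access width in bytes; 0 terminates the list */
    const char *label;  /* stands in for the read/write callbacks */
} PortioEntry;

#define END_OF_LIST { 0, 0, 0, NULL }

/* Return the first entry covering 'offset' at exactly this access size. */
static const PortioEntry *find_entry(const PortioEntry *list,
                                     uint32_t offset, unsigned size)
{
    assert(list != NULL);
    for (; list->size != 0; list++) {
        if (offset >= list->offset &&
            offset < list->offset + list->len &&
            size == list->size) {
            return list;
        }
    }
    return NULL;        /* nothing claims this port at this width */
}

/* Same shape as the fdc list: ports base+1..base+5 and base+7. */
static const PortioEntry fdc_like_list[] = {
    { 1, 5, 1, "registers 1..5" },
    { 7, 1, 1, "register 7" },
    END_OF_LIST,
};
```

With this layout a byte access at offset 6 finds no entry, which matches the hole between the two ranges in the patch.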
[Qemu-devel] [PATCH 25/25] isa: Remove isa_init_ioport_range and isa_init_ioport
From: Richard Henderson

All users have been converted to either isa_register_ioport or
isa_register_portio_list.

Signed-off-by: Richard Henderson
Signed-off-by: Avi Kivity
---
 hw/isa-bus.c |   19 +++++--------------
 hw/isa.h     |    2 --
 2 files changed, 5 insertions(+), 16 deletions(-)

diff --git a/hw/isa-bus.c b/hw/isa-bus.c
index 5d8ff84..7c2c261 100644
--- a/hw/isa-bus.c
+++ b/hw/isa-bus.c
@@ -83,24 +83,17 @@ void isa_init_irq(ISADevice *dev, qemu_irq *p, int isairq)
     dev->nirqs++;
 }

-void isa_init_ioport_range(ISADevice *dev, uint16_t start, uint16_t length)
+static inline void isa_init_ioport(ISADevice *dev, uint16_t ioport)
 {
-    if (dev->ioport_id == 0 || start < dev->ioport_id) {
-        dev->ioport_id = start;
+    if (dev && (dev->ioport_id == 0 || ioport < dev->ioport_id)) {
+        dev->ioport_id = ioport;
     }
 }

-void isa_init_ioport(ISADevice *dev, uint16_t ioport)
-{
-    isa_init_ioport_range(dev, ioport, 1);
-}
-
 void isa_register_ioport(ISADevice *dev, MemoryRegion *io, uint16_t start)
 {
     memory_region_add_subregion(isabus->address_space_io, start, io);
-    if (dev != NULL) {
-        isa_init_ioport(dev, start);
-    }
+    isa_init_ioport(dev, start);
 }

 void isa_register_portio_list(ISADevice *dev, uint16_t start,
@@ -112,9 +105,7 @@ void isa_register_portio_list(ISADevice *dev, uint16_t start,
     /* START is how we should treat DEV, regardless of the actual
        contents of the portio array.  This is how the old code
        actually handled e.g. the FDC device.  */
-    if (dev) {
-        isa_init_ioport(dev, start);
-    }
+    isa_init_ioport(dev, start);

     portio_list_init(piolist, pio_start, opaque, name);
     portio_list_add(piolist, isabus->address_space_io, start);
diff --git a/hw/isa.h b/hw/isa.h
index 177ef95..d3cae35 100644
--- a/hw/isa.h
+++ b/hw/isa.h
@@ -28,8 +28,6 @@ ISABus *isa_bus_new(DeviceState *dev, MemoryRegion *address_space_io);
 void isa_bus_irqs(qemu_irq *irqs);
 qemu_irq isa_get_irq(int isairq);
 void isa_init_irq(ISADevice *dev, qemu_irq *p, int isairq);
-void isa_init_ioport(ISADevice *dev, uint16_t ioport);
-void isa_init_ioport_range(ISADevice *dev, uint16_t start, uint16_t length);
 void isa_qdev_register(ISADeviceInfo *info);
 MemoryRegion *isa_address_space(ISADevice *dev);
 ISADevice *isa_create(const char *name);
--
1.7.6.3
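The helper in this patch does two things at once: it tolerates a NULL device (the PIIX and VIA IDE callers earlier in the series pass NULL), and it keeps the lowest registered port as the device's id. The toy version below restates that behaviour outside QEMU so it can be exercised in isolation. ToyIsaDevice, toy_note_ioport and toy_ioport_id are hypothetical names, and the struct carries only the one field the helper touches.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical miniature of the device state; only ioport_id is modelled. */
typedef struct {
    uint16_t ioport_id;     /* 0 means "no port recorded yet" */
} ToyIsaDevice;

/* Record 'ioport' if it is the first or the lowest seen; ignore NULL. */
static void toy_note_ioport(ToyIsaDevice *dev, uint16_t ioport)
{
    if (dev && (dev->ioport_id == 0 || ioport < dev->ioport_id)) {
        dev->ioport_id = ioport;
    }
}

/* Accessor used by the checks; the caller must pass a real device. */
static uint16_t toy_ioport_id(const ToyIsaDevice *dev)
{
    assert(dev != NULL);
    return dev->ioport_id;
}
```

Registering 0x3f6 and then 0x1f0 leaves the id at 0x1f0 regardless of order, and a NULL device is simply skipped.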
[Qemu-devel] [PATCH 16/25] m48t59: Convert to isa_register_ioport
From: Richard Henderson

The sysbus interface is as yet unconverted.

Signed-off-by: Richard Henderson
Signed-off-by: Avi Kivity
---
 hw/m48t59.c |   15 ++++++++++++---
 1 files changed, 12 insertions(+), 3 deletions(-)

diff --git a/hw/m48t59.c b/hw/m48t59.c
index 0cc361e..f318e67 100644
--- a/hw/m48t59.c
+++ b/hw/m48t59.c
@@ -73,6 +73,7 @@ struct M48t59State {
 typedef struct M48t59ISAState {
     ISADevice busdev;
     M48t59State state;
+    MemoryRegion io;
 } M48t59ISAState;

 typedef struct M48t59SysBusState {
@@ -626,6 +627,15 @@ static void m48t59_reset_sysbus(DeviceState *d)
     m48t59_reset_common(NVRAM);
 }

+static const MemoryRegionPortio m48t59_portio[] = {
+    {0, 4, 1, .read = NVRAM_readb, .write = NVRAM_writeb },
+    PORTIO_END_OF_LIST(),
+};
+
+static const MemoryRegionOps m48t59_io_ops = {
+    .old_portio = m48t59_portio,
+};
+
 /* Initialisation routine */
 M48t59State *m48t59_init(qemu_irq IRQ, target_phys_addr_t mem_base,
                          uint32_t io_base, uint16_t size, int type)
@@ -669,10 +679,9 @@ static void m48t59_reset_sysbus(DeviceState *d)
     d = DO_UPCAST(M48t59ISAState, busdev, dev);
     s = &d->state;

+    memory_region_init_io(&d->io, &m48t59_io_ops, s, "m48t59", 4);
     if (io_base != 0) {
-        register_ioport_read(io_base, 0x04, 1, NVRAM_readb, s);
-        register_ioport_write(io_base, 0x04, 1, NVRAM_writeb, s);
-        isa_init_ioport_range(dev, io_base, 4);
+        isa_register_ioport(dev, &d->io, io_base);
     }

     return s;
--
1.7.6.3
[Qemu-devel] [PATCH 05/25] petalogix_s3adsp1800: convert to memory API
Signed-off-by: Avi Kivity --- hw/petalogix_s3adsp1800_mmu.c | 18 ++ 1 files changed, 10 insertions(+), 8 deletions(-) diff --git a/hw/petalogix_s3adsp1800_mmu.c b/hw/petalogix_s3adsp1800_mmu.c index 66fb96d..17da2fd 100644 --- a/hw/petalogix_s3adsp1800_mmu.c +++ b/hw/petalogix_s3adsp1800_mmu.c @@ -35,6 +35,7 @@ #include "loader.h" #include "elf.h" #include "blockdev.h" +#include "exec-memory.h" #include "microblaze_pic_cpu.h" @@ -125,9 +126,10 @@ static uint64_t translate_kernel_address(void *opaque, uint64_t addr) DriveInfo *dinfo; int i; target_phys_addr_t ddr_base = 0x9000; -ram_addr_t phys_lmb_bram; -ram_addr_t phys_ram; +MemoryRegion *phys_lmb_bram = g_new(MemoryRegion, 1); +MemoryRegion *phys_ram = g_new(MemoryRegion, 1); qemu_irq irq[32], *cpu_irq; +MemoryRegion *sysmem = get_system_memory(); /* init CPUs */ if (cpu_model == NULL) { @@ -139,13 +141,13 @@ static uint64_t translate_kernel_address(void *opaque, uint64_t addr) qemu_register_reset(main_cpu_reset, env); /* Attach emulated BRAM through the LMB. */ -phys_lmb_bram = qemu_ram_alloc(NULL, "petalogix_s3adsp1800.lmb_bram", - LMB_BRAM_SIZE); -cpu_register_physical_memory(0x, LMB_BRAM_SIZE, - phys_lmb_bram | IO_MEM_RAM); +memory_region_init_ram(phys_lmb_bram, NULL, + "petalogix_s3adsp1800.lmb_bram", LMB_BRAM_SIZE); +memory_region_add_subregion(sysmem, 0x, phys_lmb_bram); -phys_ram = qemu_ram_alloc(NULL, "petalogix_s3adsp1800.ram", ram_size); -cpu_register_physical_memory(ddr_base, ram_size, phys_ram | IO_MEM_RAM); +memory_region_init_ram(phys_ram, NULL, "petalogix_s3adsp1800.ram", + ram_size); +memory_region_add_subregion(sysmem, ddr_base, phys_ram); dinfo = drive_get(IF_PFLASH, 0, 0); pflash_cfi01_register(0xa000, -- 1.7.6.3
[Qemu-devel] [PATCH 21/25] vga: Convert to isa_register_portio_list
From: Richard Henderson [jan: fix cut'n'paste errors] [avi: adjust pci variants not to use isa functions] Signed-off-by: Richard Henderson Signed-off-by: Jan Kiszka Signed-off-by: Avi Kivity --- hw/qxl.c|2 +- hw/vga-isa.c| 17 hw/vga-pci.c|2 +- hw/vga.c| 73 +++--- hw/vga_int.h|7 - hw/vmware_vga.c |7 +++-- 6 files changed, 59 insertions(+), 49 deletions(-) diff --git a/hw/qxl.c b/hw/qxl.c index 6db2f1a..03848ed 100644 --- a/hw/qxl.c +++ b/hw/qxl.c @@ -1601,7 +1601,7 @@ static int qxl_init_primary(PCIDevice *dev) ram_size = 32 * 1024 * 1024; } vga_common_init(vga, ram_size); -vga_init(vga, pci_address_space(dev)); +vga_init(vga, pci_address_space(dev), pci_address_space_io(dev), false); register_ioport_write(0x3c0, 16, 1, qxl_vga_ioport_write, vga); register_ioport_write(0x3b4, 2, 1, qxl_vga_ioport_write, vga); register_ioport_write(0x3d4, 2, 1, qxl_vga_ioport_write, vga); diff --git a/hw/vga-isa.c b/hw/vga-isa.c index 6b5c8ed..4825313 100644 --- a/hw/vga-isa.c +++ b/hw/vga-isa.c @@ -47,24 +47,19 @@ static int vga_initfn(ISADevice *dev) ISAVGAState *d = DO_UPCAST(ISAVGAState, dev, dev); VGACommonState *s = &d->state; MemoryRegion *vga_io_memory; +const MemoryRegionPortio *vga_ports, *vbe_ports; vga_common_init(s, VGA_RAM_SIZE); s->legacy_address_space = isa_address_space(dev); -vga_io_memory = vga_init_io(s); +vga_io_memory = vga_init_io(s, &vga_ports, &vbe_ports); +isa_register_portio_list(dev, 0x3b0, vga_ports, s, "vga"); +if (vbe_ports) { +isa_register_portio_list(dev, 0x1ce, vbe_ports, s, "vbe"); +} memory_region_add_subregion_overlap(isa_address_space(dev), isa_mem_base + 0x000a, vga_io_memory, 1); memory_region_set_coalescing(vga_io_memory); -isa_init_ioport(dev, 0x3c0); -isa_init_ioport(dev, 0x3b4); -isa_init_ioport(dev, 0x3ba); -isa_init_ioport(dev, 0x3da); -isa_init_ioport(dev, 0x3c0); -#ifdef CONFIG_BOCHS_VBE -isa_init_ioport(dev, 0x1ce); -isa_init_ioport(dev, 0x1cf); -isa_init_ioport(dev, 0x1d0); -#endif /* CONFIG_BOCHS_VBE */ s->ds = 
graphic_console_init(s->update, s->invalidate, s->screen_dump, s->text_update, s); diff --git a/hw/vga-pci.c b/hw/vga-pci.c index 3c8bcb0..14bfadb 100644 --- a/hw/vga-pci.c +++ b/hw/vga-pci.c @@ -54,7 +54,7 @@ static int pci_vga_initfn(PCIDevice *dev) // vga + console init vga_common_init(s, VGA_RAM_SIZE); - vga_init(s, pci_address_space(dev)); + vga_init(s, pci_address_space(dev), pci_address_space_io(dev), true); s->ds = graphic_console_init(s->update, s->invalidate, s->screen_dump, s->text_update, s); diff --git a/hw/vga.c b/hw/vga.c index f9a6014..5beaa99 100644 --- a/hw/vga.c +++ b/hw/vga.c @@ -2241,40 +2241,39 @@ void vga_common_init(VGACommonState *s, int vga_ram_size) vga_dirty_log_start(s); } -/* used by both ISA and PCI */ -MemoryRegion *vga_init_io(VGACommonState *s) -{ -MemoryRegion *vga_mem; - -register_ioport_write(0x3c0, 16, 1, vga_ioport_write, s); - -register_ioport_write(0x3b4, 2, 1, vga_ioport_write, s); -register_ioport_write(0x3d4, 2, 1, vga_ioport_write, s); -register_ioport_write(0x3ba, 1, 1, vga_ioport_write, s); -register_ioport_write(0x3da, 1, 1, vga_ioport_write, s); - -register_ioport_read(0x3c0, 16, 1, vga_ioport_read, s); - -register_ioport_read(0x3b4, 2, 1, vga_ioport_read, s); -register_ioport_read(0x3d4, 2, 1, vga_ioport_read, s); -register_ioport_read(0x3ba, 1, 1, vga_ioport_read, s); -register_ioport_read(0x3da, 1, 1, vga_ioport_read, s); +static const MemoryRegionPortio vga_portio_list[] = { +{ 0x04, 2, 1, .read = vga_ioport_read, .write = vga_ioport_write }, /* 3b4 */ +{ 0x0a, 1, 1, .read = vga_ioport_read, .write = vga_ioport_write }, /* 3ba */ +{ 0x10, 16, 1, .read = vga_ioport_read, .write = vga_ioport_write }, /* 3c0 */ +{ 0x24, 2, 1, .read = vga_ioport_read, .write = vga_ioport_write }, /* 3d4 */ +{ 0x2a, 1, 1, .read = vga_ioport_read, .write = vga_ioport_write }, /* 3da */ +PORTIO_END_OF_LIST(), +}; #ifdef CONFIG_BOCHS_VBE -#if defined (TARGET_I386) -register_ioport_read(0x1ce, 1, 2, vbe_ioport_read_index, s); 
-register_ioport_read(0x1cf, 1, 2, vbe_ioport_read_data, s); +static const MemoryRegionPortio vbe_portio_list[] = { +{ 0, 1, 2, .read = vbe_ioport_read_index, .write = vbe_ioport_write_index }, +# ifdef TARGET_I386 +{ 1, 1, 2, .read = vbe_ioport_read_data, .write = vbe_ioport_write_data }, +# else +{ 2, 1, 2, .read = vbe_ioport_read_data, .write = vbe_ioport_write_data }, +# endif +PORTIO_END_OF_LIST(), +}; +#endif /* CONFIG_BOCHS_VBE */ -register
[Qemu-devel] [PATCH 12/25] memory: Fix old portio word accesses
From: Jan Kiszka

As we register old portio regions via ioport_register, we are also
responsible for providing the word access wrapper.

Signed-off-by: Jan Kiszka
Signed-off-by: Avi Kivity
---
 memory.c |   10 ++++++++++
 1 files changed, 10 insertions(+), 0 deletions(-)

diff --git a/memory.c b/memory.c
index 528e5fb..a8359b1 100644
--- a/memory.c
+++ b/memory.c
@@ -404,6 +404,11 @@ static void memory_region_iorange_read(IORange *iorange,
         *data = ((uint64_t)1 << (width * 8)) - 1;
         if (mrp) {
             *data = mrp->read(mr->opaque, offset + mr->offset);
+        } else if (width == 2) {
+            mrp = find_portio(mr, offset, 1, false);
+            assert(mrp);
+            *data = mrp->read(mr->opaque, offset + mr->offset) |
+                    (mrp->read(mr->opaque, offset + mr->offset + 1) << 8);
         }
         return;
     }
@@ -426,6 +431,11 @@ static void memory_region_iorange_write(IORange *iorange,
 
         if (mrp) {
             mrp->write(mr->opaque, offset + mr->offset, data);
+        } else if (width == 2) {
+            mrp = find_portio(mr, offset, 1, false);
+            assert(mrp);
+            mrp->write(mr->opaque, offset + mr->offset, data & 0xff);
+            mrp->write(mr->opaque, offset + mr->offset + 1, data >> 8);
         }
         return;
     }
-- 
1.7.6.3
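For readers who want to see the word access wrapper in isolation: a minimal standalone sketch, assuming a hypothetical device (ByteDev) that only implements byte-wide handlers. None of these names come from memory.c; the point is only that a 16-bit access is built from two 8-bit accesses with the low byte at the lower port.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical byte-wide device: two 8-bit registers at ports 0 and 1. */
typedef struct {
    uint8_t regs[2];
} ByteDev;

/* Byte handlers, the only accessors the device itself provides. */
static uint32_t bytedev_read(void *opaque, uint32_t addr)
{
    ByteDev *d = opaque;
    return d->regs[addr & 1];
}

static void bytedev_write(void *opaque, uint32_t addr, uint32_t val)
{
    ByteDev *d = opaque;
    d->regs[addr & 1] = (uint8_t)val;
}

/* 16-bit read composed from two byte reads, low byte at the lower port. */
static uint32_t word_read(void *opaque, uint32_t addr)
{
    return bytedev_read(opaque, addr) |
           (bytedev_read(opaque, addr + 1) << 8);
}

/* 16-bit write split into two byte writes, low byte first. */
static void word_write(void *opaque, uint32_t addr, uint32_t val)
{
    bytedev_write(opaque, addr, val & 0xff);
    bytedev_write(opaque, addr + 1, val >> 8);
}
```

Writing 0xBEEF through word_write() at port 0 leaves 0xEF in the first register and 0xBE in the second, and word_read() puts them back together.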
Re: [Qemu-devel] [RFC 0/2] target-arm: Adding Cortex-R4F support
On 6 October 2011 11:16, Andreas Färber wrote:
> On 02.10.2011 23:44, Peter Maydell wrote:
>> On 2 October 2011 19:56, Andreas Färber wrote:
>>> 1) Currently, -cpu is used to look up a Main ID Register value and to base
>>> feature decisions on that. This doesn't work for Cortex-R4 and Cortex-R4F,
>>> which have an identical MIDR but only -R4F has the FPU.
>>> Re-checking the model string, while ugly, does the trick. Comments?
>>
>> That is indeed kind of ugly. I think if CPUID value isn't a unique value
>> for the things we pass to -cpu then we shouldn't treat it as one.
>
> For the reset, the MIDR is read, then the memset() is performed and
> cpu_reset_model_id() is called with the previously read MIDR value,
> which the function then writes into the register first thing. I'd
> suggest to move that out into cpu_reset(), drop the id parameter and
> switch on the register instead (only other use is cpu_abort()).

If we're shuffling code around we should probably be doing something like:
 * in cpu_arm_init() look at the model string and set feature switches,
   ID register values, etc
 * in reset, don't reset ID registers (they're constant, after all), and
   [as with the rest of the code] behave based only on cpuid and feature
   switches

>> More
>> generally, it would be nice to be able to say "I want a Cortex-A9
>> but I only want the no-neon VFPv3D16 variant". (I think some of the
>> other targets already have syntax for this.)
>
> Coming from a ppc background, we have a whole matrix of processors with
> fixed features but I'm not aware of an arch where we opt-in/out
> processor core features.

target-i386 seems to have some code for handling syntax like this
(you seem to be able to say -cpu pentium,-fpu for instance).

>> I think that (1) the bare CPU name should be the most recent rev of the
>> core that QEMU knows about [and that we should be happy to change qemu
>> to move up to supporting newer revisions]
>
>> (Anybody want to argue with (1) ?)
>
> I concur that an easy-to-type -cpu should provide the latest and
> greatest features. Features hidden will not get much exposure. But if a
> revision noticeably changes behavior, I guess we should remain command
> line compatible.

Depends what you mean by "noticeably". User space will basically never
notice or care, typically. The kernel does care occasionally. I think
I'd rather have "cortex-foo" do the right thing for the vast majority of
users who don't care whether they get r2 or r3, rather than be stuck with
it meaning r1 because that's what we happened to model first and there was
some minor incompatible change between r1 and r2.

I don't think there's a position between "cortex-foo is always the most
recent rev we model" and "user must always specify rXpY" which doesn't
lead you into weird and confusing UI inconsistencies between CPUs.

-- PMM
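To make the "-cpu pentium,-fpu" style concrete: a rough standalone sketch of parsing "model,+feat,-feat" into a model name plus a feature mask. This is not target-i386's parser and not a proposal for target-arm's data structures; CpuSpec, FEAT_FPU, FEAT_NEON and parse_cpu_spec() are all made-up names for illustration, and the defaults argument stands in for whatever the bare model name implies.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical feature bits, for illustration only. */
enum { FEAT_FPU = 1u << 0, FEAT_NEON = 1u << 1 };

typedef struct {
    char model[32];
    uint32_t features;
} CpuSpec;

/* Map a feature name of the given length to its bit, 0 if unknown. */
static uint32_t feature_bit(const char *name, size_t len)
{
    if (len == 3 && strncmp(name, "fpu", 3) == 0) {
        return FEAT_FPU;
    }
    if (len == 4 && strncmp(name, "neon", 4) == 0) {
        return FEAT_NEON;
    }
    return 0;
}

/*
 * Parse "model[,+feat|-feat]...". 'defaults' is the feature set the bare
 * model name implies. Returns 0 on success, -1 on a malformed string.
 */
static int parse_cpu_spec(const char *arg, uint32_t defaults, CpuSpec *out)
{
    const char *p = strchr(arg, ',');
    size_t mlen = p ? (size_t)(p - arg) : strlen(arg);

    if (mlen == 0 || mlen >= sizeof(out->model)) {
        return -1;
    }
    memcpy(out->model, arg, mlen);
    out->model[mlen] = '\0';
    out->features = defaults;

    while (p) {
        const char *tok = p + 1;
        const char *next = strchr(tok, ',');
        size_t len = next ? (size_t)(next - tok) : strlen(tok);
        uint32_t bit;

        if (len < 2 || (tok[0] != '+' && tok[0] != '-')) {
            return -1;
        }
        bit = feature_bit(tok + 1, len - 1);
        if (!bit) {
            return -1;
        }
        if (tok[0] == '+') {
            out->features |= bit;   /* opt in */
        } else {
            out->features &= ~bit;  /* opt out */
        }
        p = next;
    }
    return 0;
}
```

With this shape, "cortex-a9,-neon" starting from an all-features default ends up with only the FPU bit set, which is roughly the "no-neon variant" case mentioned above.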
[Qemu-devel] [PATCH 11/25] Introduce PortioList
Add a type and methods for manipulating a list of disjoint I/O ports, used in some older hardware devices. Based on original patch by Richard Henderson. Signed-off-by: Richard Henderson Signed-off-by: Avi Kivity --- Makefile.objs |2 +- Makefile.target |2 +- ioport.c| 108 +++ ioport.h| 21 +++ memory.c|8 ++-- 5 files changed, 135 insertions(+), 6 deletions(-) diff --git a/Makefile.objs b/Makefile.objs index 8d23fbb..86ab37b 100644 --- a/Makefile.objs +++ b/Makefile.objs @@ -82,7 +82,7 @@ common-obj-$(CONFIG_WIN32) += os-win32.o common-obj-$(CONFIG_POSIX) += os-posix.o common-obj-y += tcg-runtime.o host-utils.o -common-obj-y += irq.o ioport.o input.o +common-obj-y += irq.o input.o common-obj-$(CONFIG_PTIMER) += ptimer.o common-obj-$(CONFIG_MAX7310) += max7310.o common-obj-$(CONFIG_WM8750) += wm8750.o diff --git a/Makefile.target b/Makefile.target index 88d2f1f..c1529b3 100644 --- a/Makefile.target +++ b/Makefile.target @@ -183,7 +183,7 @@ endif #CONFIG_BSD_USER # System emulator target ifdef CONFIG_SOFTMMU -obj-y = arch_init.o cpus.o monitor.o machine.o gdbstub.o balloon.o +obj-y = arch_init.o cpus.o monitor.o machine.o gdbstub.o balloon.o ioport.o # virtio has to be here due to weird dependency between PCI and virtio-net. 
# need to fix this properly obj-$(CONFIG_NO_PCI) += pci-stub.o diff --git a/ioport.c b/ioport.c index a32483b..36fa3a4 100644 --- a/ioport.c +++ b/ioport.c @@ -27,6 +27,7 @@ #include "ioport.h" #include "trace.h" +#include "memory.h" /***/ /* IO Port */ @@ -313,3 +314,110 @@ uint32_t cpu_inl(pio_addr_t addr) LOG_IOPORT("inl : %04"FMT_pioaddr" %08"PRIx32"\n", addr, val); return val; } + +void portio_list_init(PortioList *piolist, + const MemoryRegionPortio *callbacks, + void *opaque, const char *name) +{ +unsigned n = 0; + +while (callbacks[n].size) { +++n; +} + +piolist->ports = callbacks; +piolist->nr = 0; +piolist->regions = g_new0(MemoryRegion *, n); +piolist->address_space = NULL; +piolist->opaque = opaque; +piolist->name = name; +} + +void portio_list_destroy(PortioList *piolist) +{ +g_free(piolist->regions); +} + +static void portio_list_add_1(PortioList *piolist, + const MemoryRegionPortio *pio_init, + unsigned count, unsigned start, + unsigned off_low, unsigned off_high) +{ +MemoryRegionPortio *pio; +MemoryRegionOps *ops; +MemoryRegion *region; +unsigned i; + +/* Copy the sub-list and null-terminate it. */ +pio = g_new(MemoryRegionPortio, count + 1); +memcpy(pio, pio_init, sizeof(MemoryRegionPortio) * count); +memset(pio + count, 0, sizeof(MemoryRegionPortio)); + +/* Adjust the offsets to all be zero-based for the region. 
*/ +for (i = 0; i < count; ++i) { +pio[i].offset -= off_low; +} + +ops = g_new0(MemoryRegionOps, 1); +ops->old_portio = pio; + +region = g_new(MemoryRegion, 1); +memory_region_init_io(region, ops, piolist->opaque, piolist->name, + off_high - off_low); +memory_region_set_offset(region, start + off_low); +memory_region_add_subregion(piolist->address_space, +start + off_low, region); +piolist->regions[piolist->nr++] = region; +} + +void portio_list_add(PortioList *piolist, + MemoryRegion *address_space, + uint32_t start) +{ +const MemoryRegionPortio *pio, *pio_start = piolist->ports; +unsigned int off_low, off_high, off_last, count; + +piolist->address_space = address_space; + +/* Handle the first entry specially. */ +off_last = off_low = pio_start->offset; +off_high = off_low + pio_start->len; +count = 1; + +for (pio = pio_start + 1; pio->size != 0; pio++, count++) { +/* All entries must be sorted by offset. */ +assert(pio->offset >= off_last); +off_last = pio->offset; + +/* If we see a hole, break the region. */ +if (off_last > off_high) { +portio_list_add_1(piolist, pio_start, count, start, off_low, + off_high); +/* ... and start collecting anew. */ +pio_start = pio; +off_low = off_last; +off_high = off_low + pio->len; +count = 0; +} else if (off_last + pio->len > off_high) { +off_high = off_last + pio->len; +} +} + +/* There will always be an open sub-list. */ +portio_list_add_1(piolist, pio_start, count, start, off_low, off_high); +} + +void portio_list_del(PortioList *piolist) +{ +MemoryRegion *mr; +unsigned i; + +for (i = 0; i < piolist->nr; ++i) { +mr = piolist->regions[i]; +
[Qemu-devel] [PATCH 17/25] rtc: Convert to isa_register_ioport
From: Richard Henderson

Signed-off-by: Richard Henderson
Signed-off-by: Avi Kivity
---
 hw/mc146818rtc.c |   15 ++++++++++++---
 1 files changed, 12 insertions(+), 3 deletions(-)

diff --git a/hw/mc146818rtc.c b/hw/mc146818rtc.c
index feb3b25..2aaca2f 100644
--- a/hw/mc146818rtc.c
+++ b/hw/mc146818rtc.c
@@ -81,6 +81,7 @@
 
 typedef struct RTCState {
     ISADevice dev;
+    MemoryRegion io;
     uint8_t cmos_data[128];
     uint8_t cmos_index;
     struct tm current_tm;
@@ -604,6 +605,15 @@ static void rtc_reset(void *opaque)
 #endif
 }
 
+static const MemoryRegionPortio cmos_portio[] = {
+    {0, 2, 1, .read = cmos_ioport_read, .write = cmos_ioport_write },
+    PORTIO_END_OF_LIST(),
+};
+
+static const MemoryRegionOps cmos_ops = {
+    .old_portio = cmos_portio
+};
+
 static int rtc_initfn(ISADevice *dev)
 {
     RTCState *s = DO_UPCAST(RTCState, dev, dev);
@@ -632,9 +642,8 @@ static int rtc_initfn(ISADevice *dev)
         qemu_get_clock_ns(rtc_clock) + (get_ticks_per_sec() * 99) / 100;
     qemu_mod_timer(s->second_timer2, s->next_second_time);
 
-    register_ioport_write(base, 2, 1, cmos_ioport_write, s);
-    register_ioport_read(base, 2, 1, cmos_ioport_read, s);
-    isa_init_ioport_range(dev, base, 2);
+    memory_region_init_io(&s->io, &cmos_ops, s, "rtc", 2);
+    isa_register_ioport(dev, &s->io, base);
 
     qdev_set_legacy_instance_id(&dev->qdev, base, 2);
     qemu_register_reset(rtc_reset, s);
-- 
1.7.6.3
[Qemu-devel] [PATCH 15/25] gus: Convert to isa_register_portio_list
From: Richard Henderson Signed-off-by: Richard Henderson Signed-off-by: Avi Kivity --- hw/gus.c | 39 +++ 1 files changed, 19 insertions(+), 20 deletions(-) diff --git a/hw/gus.c b/hw/gus.c index 37e543a..1532686 100644 --- a/hw/gus.c +++ b/hw/gus.c @@ -232,6 +232,22 @@ static int GUS_read_DMA (void *opaque, int nchan, int dma_pos, int dma_len) } }; +static const MemoryRegionPortio gus_portio_list1[] = { +{0x000, 1, 1, .write = gus_writeb }, +{0x000, 1, 2, .write = gus_writew }, +{0x006, 10, 1, .read = gus_readb, .write = gus_writeb }, +{0x006, 10, 2, .read = gus_readw, .write = gus_writew }, +{0x100, 8, 1, .read = gus_readb, .write = gus_writeb }, +{0x100, 8, 2, .read = gus_readw, .write = gus_writew }, +PORTIO_END_OF_LIST(), +}; + +static const MemoryRegionPortio gus_portio_list2[] = { +{0, 1, 1, .read = gus_readb }, +{0, 1, 2, .read = gus_readw }, +PORTIO_END_OF_LIST(), +}; + static int gus_initfn (ISADevice *dev) { GUSState *s = DO_UPCAST(GUSState, dev, dev); @@ -262,26 +278,9 @@ static int gus_initfn (ISADevice *dev) s->samples = AUD_get_buffer_size_out (s->voice) >> s->shift; s->mixbuf = g_malloc0 (s->samples << s->shift); -register_ioport_write (s->port, 1, 1, gus_writeb, s); -register_ioport_write (s->port, 1, 2, gus_writew, s); -isa_init_ioport_range(dev, s->port, 2); - -register_ioport_read ((s->port + 0x100) & 0xf00, 1, 1, gus_readb, s); -register_ioport_read ((s->port + 0x100) & 0xf00, 1, 2, gus_readw, s); -isa_init_ioport_range(dev, (s->port + 0x100) & 0xf00, 2); - -register_ioport_write (s->port + 6, 10, 1, gus_writeb, s); -register_ioport_write (s->port + 6, 10, 2, gus_writew, s); -register_ioport_read (s->port + 6, 10, 1, gus_readb, s); -register_ioport_read (s->port + 6, 10, 2, gus_readw, s); -isa_init_ioport_range(dev, s->port + 6, 10); - - -register_ioport_write (s->port + 0x100, 8, 1, gus_writeb, s); -register_ioport_write (s->port + 0x100, 8, 2, gus_writew, s); -register_ioport_read (s->port + 0x100, 8, 1, gus_readb, s); -register_ioport_read 
(s->port + 0x100, 8, 2, gus_readw, s); -isa_init_ioport_range(dev, s->port + 0x100, 8); +isa_register_portio_list(dev, s->port, gus_portio_list1, s, "gus"); +isa_register_portio_list(dev, (s->port + 0x100) & 0xf00, + gus_portio_list2, s, "gus"); DMA_register_channel (s->emu.gusdma, GUS_read_DMA, s); s->emu.himemaddr = s->himem; -- 1.7.6.3
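gus_portio_list1 above has holes between offsets 0x000, 0x006 and 0x100, so a list like this has to be cut into several contiguous ranges before it can be registered. A standalone sketch of that grouping step, loosely following the loop in portio_list_add() from patch 11/25; PortEntry, PortRange and group_ports() are illustrative names, and unlike the real MemoryRegionPortio table the terminator here is simply len == 0.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified table entry: port offset and number of ports; len == 0 ends. */
typedef struct {
    unsigned offset, len;
} PortEntry;

/* Half-open contiguous range [low, high). */
typedef struct {
    unsigned low, high;
} PortRange;

/*
 * Walk a list sorted by offset and emit one range per run of entries
 * that overlap or touch. Returns the number of ranges found; at most
 * 'max' of them are stored in 'out'.
 */
static size_t group_ports(const PortEntry *e, PortRange *out, size_t max)
{
    size_t n = 0;
    unsigned low, high, last;

    if (e->len == 0) {
        return 0;
    }
    last = low = e->offset;
    high = low + e->len;

    for (++e; e->len != 0; ++e) {
        assert(e->offset >= last);      /* entries must be sorted */
        last = e->offset;
        if (last > high) {
            /* Hole: close the current range and start a new one. */
            if (n < max) {
                out[n] = (PortRange){ low, high };
            }
            n++;
            low = last;
            high = low + e->len;
        } else if (last + e->len > high) {
            high = last + e->len;       /* extend the current range */
        }
    }

    /* There is always one open range left at the end. */
    if (n < max) {
        out[n] = (PortRange){ low, high };
    }
    return n + 1;
}
```

Fed the offsets and lengths of gus_portio_list1, this yields three ranges: [0x000, 0x001), [0x006, 0x010) and [0x100, 0x108).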
[Qemu-devel] [PATCH 03/25] palm: convert to memory API
Signed-off-by: Avi Kivity --- hw/palm.c | 53 + 1 files changed, 25 insertions(+), 28 deletions(-) diff --git a/hw/palm.c b/hw/palm.c index d8f50e3..094bfde 100644 --- a/hw/palm.c +++ b/hw/palm.c @@ -54,16 +54,12 @@ static void static_write(void *opaque, target_phys_addr_t offset, #endif } -static CPUReadMemoryFunc * const static_readfn[] = { -static_readb, -static_readh, -static_readw, -}; - -static CPUWriteMemoryFunc * const static_writefn[] = { -static_write, -static_write, -static_write, +static const MemoryRegionOps static_ops = { +.old_mmio = { +.read = { static_readb, static_readh, static_readw, }, +.write = { static_write, static_write, static_write, }, +}, +.endianness = DEVICE_NATIVE_ENDIAN, }; /* Palm Tunsgten|E support */ @@ -203,34 +199,35 @@ static void palmte_init(ram_addr_t ram_size, struct omap_mpu_state_s *cpu; int flash_size = 0x0080; int sdram_size = palmte_binfo.ram_size; -int io; static uint32_t cs0val = 0x; static uint32_t cs1val = 0xe1a0; static uint32_t cs2val = 0xe1a0; static uint32_t cs3val = 0xe1a0e1a0; int rom_size, rom_loaded = 0; DisplayState *ds = get_displaystate(); +MemoryRegion *flash = g_new(MemoryRegion, 1); +MemoryRegion *cs = g_new(MemoryRegion, 4); cpu = omap310_mpu_init(address_space_mem, sdram_size, cpu_model); /* External Flash (EMIFS) */ -cpu_register_physical_memory(OMAP_CS0_BASE, flash_size, - qemu_ram_alloc(NULL, "palmte.flash", -flash_size) | IO_MEM_ROM); - -io = cpu_register_io_memory(static_readfn, static_writefn, &cs0val, -DEVICE_NATIVE_ENDIAN); -cpu_register_physical_memory(OMAP_CS0_BASE + flash_size, -OMAP_CS0_SIZE - flash_size, io); -io = cpu_register_io_memory(static_readfn, static_writefn, &cs1val, -DEVICE_NATIVE_ENDIAN); -cpu_register_physical_memory(OMAP_CS1_BASE, OMAP_CS1_SIZE, io); -io = cpu_register_io_memory(static_readfn, static_writefn, &cs2val, -DEVICE_NATIVE_ENDIAN); -cpu_register_physical_memory(OMAP_CS2_BASE, OMAP_CS2_SIZE, io); -io = cpu_register_io_memory(static_readfn, static_writefn, &cs3val, 
-DEVICE_NATIVE_ENDIAN); -cpu_register_physical_memory(OMAP_CS3_BASE, OMAP_CS3_SIZE, io); +memory_region_init_ram(flash, NULL, "palmte.flash", flash_size); +memory_region_set_readonly(flash, true); +memory_region_add_subregion(address_space_mem, OMAP_CS0_BASE, flash); + +memory_region_init_io(&cs[0], &static_ops, &cs0val, "palmte-cs0", + OMAP_CS0_SIZE - flash_size); +memory_region_add_subregion(address_space_mem, OMAP_CS0_BASE + flash_size, +&cs[0]); +memory_region_init_io(&cs[1], &static_ops, &cs1val, "palmte-cs1", + OMAP_CS1_SIZE); +memory_region_add_subregion(address_space_mem, OMAP_CS1_BASE, &cs[1]); +memory_region_init_io(&cs[2], &static_ops, &cs2val, "palmte-cs2", + OMAP_CS2_SIZE); +memory_region_add_subregion(address_space_mem, OMAP_CS2_BASE, &cs[2]); +memory_region_init_io(&cs[3], &static_ops, &cs3val, "palmte-cs3", + OMAP_CS3_SIZE); +memory_region_add_subregion(address_space_mem, OMAP_CS3_BASE, &cs[3]); palmte_microwire_setup(cpu); -- 1.7.6.3
[Qemu-devel] [PATCH 00/25] Memory API conversions, batch 11
Review before push. I see that Alex's patch is also in the ppc queue, I'll
drop it if it's merged before.

Alexander Graf (1):
  PPC: Fix via-cuda memory registration

Avi Kivity (7):
  palm: convert to memory API
  petalogix_ml605: convert to memory API
  petalogix_s2adsp1800: convert to memory API
  ppc405_boards: convert to memory API
  ppc_newworld: convert to memory API
  Introduce PortioList
  isa: Add isa_register_portio_list()

Jan Kiszka (1):
  memory: Fix old portio word accesses

Peter Maydell (3):
  hw/lan9118.c: Convert to MemoryRegion
  hw/arm11mpcore: Clean up to avoid using sysbus_mmio_init_cb2
  hw/versatile_pci: Expose multiple sysbus mmio regions

Richard Henderson (13):
  isa: Tidy support code for isabus_get_fw_dev_path
  fdc: Convert to isa_register_portio_list
  gus: Convert to isa_register_portio_list
  m48t59: Convert to isa_register_ioport
  rtc: Convert to isa_register_ioport
  ne2000: Convert to isa_register_ioport
  parallel: Convert to isa_register_portio_list
  sb16: Convert to isa_register_portio_list
  vga: Convert to isa_register_portio_list
  pc: Convert port92 to isa_register_ioport
  vmport: Convert to isa_register_ioport
  ide: Convert to isa_register_portio_list
  isa: Remove isa_init_ioport_range and isa_init_ioport

 Makefile.objs                 |    2 +-
 Makefile.target               |    2 +-
 hw/arm11mpcore.c              |   13 +-
 hw/cuda.c                     |   28 ++-
 hw/fdc.c                      |   34 ++---
 hw/gus.c                      |   39 +++
 hw/ide/core.c                 |   30 +++
 hw/ide/internal.h             |    3 +-
 hw/ide/isa.c                  |    4 +-
 hw/ide/piix.c                 |    7 ++-
 hw/ide/via.c                  |    7 ++-
 hw/isa-bus.c                  |   45 +++--
 hw/isa.h                      |   38 ---
 hw/lan9118.c                  |   29 ---
 hw/m48t59.c                   |   15 +-
 hw/mc146818rtc.c              |   15 +-
 hw/ne2000-isa.c               |    5 +--
 hw/palm.c                     |   53 +---
 hw/parallel.c                 |   47 +++---
 hw/pc.c                       |   16 +-
 hw/petalogix_ml605_mmu.c      |   15 +++---
 hw/petalogix_s3adsp1800_mmu.c |   18 ---
 hw/ppc405_boards.c            |   84 ++--
 hw/ppc_newworld.c             |   39 ++
 hw/qxl.c                      |    2 +-
 hw/realview.c                 |   12 -
 hw/sb16.c                     |   32 +---
 hw/versatile_pci.c            |   42 +++-
 hw/versatilepb.c              |   12 -
 hw/vga-isa.c                  |   17 ++
 hw/vga-pci.c                  |    2 +-
 hw/vga.c                      |   73
 hw/vga_int.h                  |    7 ++-
 hw/vmport.c                   |   16 +-
 hw/vmware_vga.c               |    7 ++-
 ioport.c                      |  108 +
 ioport.h                      |   21
 memory.c                      |   18 +--
 38 files changed, 551 insertions(+), 406 deletions(-)

-- 
1.7.6.3
[Qemu-devel] [PATCH 01/25] hw/lan9118.c: Convert to MemoryRegion
From: Peter Maydell Signed-off-by: Peter Maydell Signed-off-by: Avi Kivity --- hw/lan9118.c | 29 +++-- 1 files changed, 11 insertions(+), 18 deletions(-) diff --git a/hw/lan9118.c b/hw/lan9118.c index 73a8661..634b88e 100644 --- a/hw/lan9118.c +++ b/hw/lan9118.c @@ -152,7 +152,7 @@ enum tx_state { NICState *nic; NICConf conf; qemu_irq irq; -int mmio_index; +MemoryRegion mmio; ptimer_state *timer; uint32_t irq_cfg; @@ -895,7 +895,7 @@ static void lan9118_tick(void *opaque) } static void lan9118_writel(void *opaque, target_phys_addr_t offset, - uint32_t val) + uint64_t val, unsigned size) { lan9118_state *s = (lan9118_state *)opaque; offset &= 0xff; @@ -1022,13 +1022,14 @@ static void lan9118_writel(void *opaque, target_phys_addr_t offset, break; default: -hw_error("lan9118_write: Bad reg 0x%x = %x\n", (int)offset, val); +hw_error("lan9118_write: Bad reg 0x%x = %x\n", (int)offset, (int)val); break; } lan9118_update(s); } -static uint32_t lan9118_readl(void *opaque, target_phys_addr_t offset) +static uint64_t lan9118_readl(void *opaque, target_phys_addr_t offset, + unsigned size) { lan9118_state *s = (lan9118_state *)opaque; @@ -1101,16 +1102,10 @@ static uint32_t lan9118_readl(void *opaque, target_phys_addr_t offset) return 0; } -static CPUReadMemoryFunc * const lan9118_readfn[] = { -lan9118_readl, -lan9118_readl, -lan9118_readl -}; - -static CPUWriteMemoryFunc * const lan9118_writefn[] = { -lan9118_writel, -lan9118_writel, -lan9118_writel +static const MemoryRegionOps lan9118_mem_ops = { +.read = lan9118_readl, +.write = lan9118_writel, +.endianness = DEVICE_NATIVE_ENDIAN, }; static void lan9118_cleanup(VLANClientState *nc) @@ -1135,10 +1130,8 @@ static int lan9118_init1(SysBusDevice *dev) QEMUBH *bh; int i; -s->mmio_index = cpu_register_io_memory(lan9118_readfn, - lan9118_writefn, s, - DEVICE_NATIVE_ENDIAN); -sysbus_init_mmio(dev, 0x100, s->mmio_index); +memory_region_init_io(&s->mmio, &lan9118_mem_ops, s, "lan9118-mmio", 0x100); +sysbus_init_mmio_region(dev, 
&s->mmio); sysbus_init_irq(dev, &s->irq); qemu_macaddr_default_if_unset(&s->conf.macaddr); -- 1.7.6.3
Re: [Qemu-devel] [RFC 0/2] target-arm: Adding Cortex-R4F support
On 02.10.2011 23:44, Peter Maydell wrote:
> On 2 October 2011 19:56, Andreas Färber wrote:
>> I've been looking into adding support for Cortex-R4F.
>
> Ooh, that will be the first R profile core. In particular the only
> other non-M-profile PMSA core we support is the 946 which was a v5
> core,

Yeah, I rarely pick the easy tasks. :)

>> 1) Currently, -cpu is used to look up a Main ID Register value and to base
>> feature decisions on that. This doesn't work for Cortex-R4 and Cortex-R4F,
>> which have an identical MIDR but only -R4F has the FPU.
>> Re-checking the model string, while ugly, does the trick. Comments?
>
> That is indeed kind of ugly. I think if CPUID value isn't a unique value
> for the things we pass to -cpu then we shouldn't treat it as one.

For the reset, the MIDR is read, then the memset() is performed and
cpu_reset_model_id() is called with the previously read MIDR value,
which the function then writes into the register first thing. I'd
suggest to move that out into cpu_reset(), drop the id parameter and
switch on the register instead (only other use is cpu_abort()).

> More
> generally, it would be nice to be able to say "I want a Cortex-A9
> but I only want the no-neon VFPv3D16 variant". (I think some of the
> other targets already have syntax for this.)

Coming from a ppc background, we have a whole matrix of processors with
fixed features but I'm not aware of an arch where we opt-in/out
processor core features.

> Currently the approach is to say "you only get one variant of the
> processor, and it's the one with all the bells and whistles enabled".
> That would imply that '-cpu cortex-r4' gives you one with an FPU.

I'll go with cortex-r4f then.

> I think that (1) the bare CPU name should be the most recent rev of the
> core that QEMU knows about [and that we should be happy to change qemu
> to move up to supporting newer revisions]

> (Anybody want to argue with (1) ?)

I concur that an easy-to-type -cpu should provide the latest and
greatest features. Features hidden will not get much exposure. But if a
revision noticeably changes behavior, I guess we should remain command
line compatible.

Andreas
Re: [Qemu-devel] [PATCH v2] tap: Add optional parameters to up/down script
On 30.09 2011 11:45, Sasha Levin wrote:
> Subject: [PATCH v2] tap: Add optional parameters to up/down script
>
> This allows the user to add custom parameters to the up or down
> scripts.
>
> Extra parameters are useful in more complex networking scenarios
> where we would like to configure network devices when starting
> or stopping the guest.

PATCH v2 isn't working for me. Neither the scriptparams nor the
downscriptparams.

Usage was like:

qemu-system-x86_64 -m 512 -net nic,macaddr=[...] -net tap,script=/path/to/script,scriptparams="param1", downscript=/path/to/downscript -drive...

Greetings
Thomas
[Qemu-devel] [PATCH 13/64] PPC: E500: Generate IRQ lines for many CPUs
Now that we can generate multiple envs for all our virtual CPUs, we also
need to tell the MPIC that we have multiple CPUs connected and connect
them all to the respective virtual interrupt lines.

Signed-off-by: Alexander Graf
---
 hw/ppce500_mpc8544ds.c |   17 ++++++++++++-----
 1 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/hw/ppce500_mpc8544ds.c b/hw/ppce500_mpc8544ds.c
index 8d05587..9cb01f3 100644
--- a/hw/ppce500_mpc8544ds.c
+++ b/hw/ppce500_mpc8544ds.c
@@ -237,7 +237,7 @@ static void mpc8544ds_init(ram_addr_t ram_size,
     target_long initrd_size=0;
     int i=0;
     unsigned int pci_irq_nrs[4] = {1, 2, 3, 4};
-    qemu_irq *irqs, *mpic;
+    qemu_irq **irqs, *mpic;
     DeviceState *dev;
     struct boot_info *boot_info;
     CPUState *firstenv = NULL;
@@ -247,6 +247,8 @@ static void mpc8544ds_init(ram_addr_t ram_size,
         cpu_model = "e500v2_v30";
     }
 
+    irqs = g_malloc0(smp_cpus * sizeof(qemu_irq *));
+    irqs[0] = g_malloc0(smp_cpus * sizeof(qemu_irq) * OPENPIC_OUTPUT_NB);
     for (i = 0; i < smp_cpus; i++) {
         qemu_irq *input;
         env = cpu_ppc_init(cpu_model);
@@ -259,6 +261,10 @@ static void mpc8544ds_init(ram_addr_t ram_size,
             firstenv = env;
         }
 
+        irqs[i] = irqs[0] + (i * OPENPIC_OUTPUT_NB);
+        input = (qemu_irq *)env->irq_inputs;
+        irqs[i][OPENPIC_OUTPUT_INT] = input[PPCE500_INPUT_INT];
+        irqs[i][OPENPIC_OUTPUT_CINT] = input[PPCE500_INPUT_CINT];
         env->spr[SPR_BOOKE_PIR] = env->cpu_index = i;
 
         /* XXX register timer? */
@@ -283,10 +289,11 @@ static void mpc8544ds_init(ram_addr_t ram_size,
                                  "mpc8544ds.ram", ram_size));
 
     /* MPIC */
-    irqs = g_malloc0(sizeof(qemu_irq) * OPENPIC_OUTPUT_NB);
-    irqs[OPENPIC_OUTPUT_INT] = ((qemu_irq *)env->irq_inputs)[PPCE500_INPUT_INT];
-    irqs[OPENPIC_OUTPUT_CINT] = ((qemu_irq *)env->irq_inputs)[PPCE500_INPUT_CINT];
-    mpic = mpic_init(MPC8544_MPIC_REGS_BASE, 1, &irqs, NULL);
+    mpic = mpic_init(MPC8544_MPIC_REGS_BASE, smp_cpus, irqs, NULL);
+
+    if (!mpic) {
+        cpu_abort(env, "MPIC failed to initialize\n");
+    }
 
     /* Serial */
     if (serial_hds[0]) {
-- 
1.6.0.2
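The irqs table in this patch is a per-CPU array of row pointers that all point into one contiguous block (irqs[i] = irqs[0] + i * OPENPIC_OUTPUT_NB). A standalone sketch of that allocation pattern, using plain int cells and libc calloc() in place of qemu_irq and g_malloc0(); alloc_table() and free_table() are hypothetical helper names, not anything in the tree.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Allocate a rows x cols table as one array of row pointers plus one
 * zeroed data block. t[i] points at row i inside the block, so the whole
 * table is released with two free() calls.
 */
static int **alloc_table(size_t rows, size_t cols)
{
    int **t;
    size_t i;

    if (rows == 0 || cols == 0) {
        return NULL;
    }
    t = calloc(rows, sizeof(*t));
    if (!t) {
        return NULL;
    }
    t[0] = calloc(rows * cols, sizeof(**t));
    if (!t[0]) {
        free(t);
        return NULL;
    }
    for (i = 1; i < rows; i++) {
        t[i] = t[0] + i * cols;   /* row i starts i * cols cells in */
    }
    return t;
}

static void free_table(int **t)
{
    if (t) {
        free(t[0]);   /* the shared data block */
        free(t);      /* the row pointer array */
    }
}
```

The upside of this layout is that the callee can index it as table[cpu][line] while the caller still owns just two allocations.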
[Qemu-devel] [PATCH 20/64] PPC: KVM: Remove kvmppc_read_host_property
We just got rid of the last user of kvmppc_read_host_property, so we can
now safely remove it.

Signed-off-by: Alexander Graf
---
 target-ppc/kvm_ppc.c |   35 -----------------------------------
 target-ppc/kvm_ppc.h |   11 -----------
 2 files changed, 0 insertions(+), 46 deletions(-)

diff --git a/target-ppc/kvm_ppc.c b/target-ppc/kvm_ppc.c
index 26ecc9d..24fc6bc 100644
--- a/target-ppc/kvm_ppc.c
+++ b/target-ppc/kvm_ppc.c
@@ -21,41 +21,6 @@
 static QEMUTimer *kvmppc_timer;
 static unsigned int kvmppc_timer_rate;
 
-#ifdef CONFIG_FDT
-int kvmppc_read_host_property(const char *node_path, const char *prop,
-                              void *val, size_t len)
-{
-    char *path;
-    FILE *f;
-    int ret = 0;
-    int pathlen;
-
-    pathlen = snprintf(NULL, 0, "%s/%s/%s", PROC_DEVTREE_PATH, node_path, prop)
-              + 1;
-    path = g_malloc(pathlen);
-
-    snprintf(path, pathlen, "%s/%s/%s", PROC_DEVTREE_PATH, node_path, prop);
-
-    f = fopen(path, "rb");
-    if (f == NULL) {
-        ret = errno;
-        goto free;
-    }
-
-    len = fread(val, len, 1, f);
-    if (len != 1) {
-        ret = ferror(f);
-        goto close;
-    }
-
-close:
-    fclose(f);
-free:
-    free(path);
-    return ret;
-}
-#endif
-
 static void kvmppc_timer_hack(void *opaque)
 {
     qemu_notify_event();
diff --git a/target-ppc/kvm_ppc.h b/target-ppc/kvm_ppc.h
index 7c08c0f..0c659c8 100644
--- a/target-ppc/kvm_ppc.h
+++ b/target-ppc/kvm_ppc.h
@@ -10,17 +10,6 @@
 #define __KVM_PPC_H__
 
 void kvmppc_init(void);
-#ifndef CONFIG_KVM
-static inline int kvmppc_read_host_property(const char *node_path, const char *prop,
-                                            void *val, size_t len)
-{
-    assert(0);
-    return -ENOSYS;
-}
-#else
-int kvmppc_read_host_property(const char *node_path, const char *prop,
-                              void *val, size_t len);
-#endif
 uint32_t kvmppc_get_tbfreq(void);
 uint64_t kvmppc_get_clockfreq(void);
 
-- 
1.6.0.2
[Qemu-devel] [PATCH 28/64] device tree: give dt more size
We currently load a device tree blob and then just take its size x2 to
account for modifications we do inside. While this is nice and great,
it fails when we have a small device tree as blob and lots of nodes added
in machine init code.

So for now, just make it 20k bigger than it was before. We maybe want to
be more clever about this later.

Signed-off-by: Alexander Graf
---
 device_tree.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/device_tree.c b/device_tree.c
index 751538e..dc69232 100644
--- a/device_tree.c
+++ b/device_tree.c
@@ -41,6 +41,7 @@ void *load_device_tree(const char *filename_path, int *sizep)
     }
 
     /* Expand to 2x size to give enough room for manipulation. */
+    dt_size += 10000;
     dt_size *= 2;
     /* First allocate space in qemu for device tree */
     fdt = g_malloc0(dt_size);
-- 
1.6.0.2
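The sizing arithmetic, spelled out as a small sketch. DT_PAD and both helper names are made up here, and the 10000-byte padding is an assumption read back from the "20k bigger" statement in the commit message: padding added before the doubling shows up twice in the result.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed padding: doubled, it gives the "20k bigger" from the message. */
#define DT_PAD 10000

/* Old scheme: working buffer is simply twice the blob size. */
static size_t dt_buffer_size_old(size_t blob_size)
{
    return blob_size * 2;
}

/* New scheme: pad first, then double. */
static size_t dt_buffer_size_new(size_t blob_size)
{
    return (blob_size + DT_PAD) * 2;
}
```

For a 2028-byte blob the buffer grows from 4056 to 24056 bytes, so a small blob no longer limits how many nodes machine init code can add.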
[Qemu-devel] [PATCH 39/64] pseries: More complete WIMG validation in H_ENTER code
From: David Gibson Currently our implementation of the H_ENTER hypercall, which inserts a mapping in the hash page table assumes that only ordinary memory is ever mapped, and only permits mapping attribute bits accordingly (WIMG==0010). However, we intend to start adding emulated IO to the pseries platform (and real IO with PCI passthrough on kvm) which means this simple test will no longer suffice. This patch extends the h_enter validation code to check if the given address is a RAM address. If it is it enforces WIMG==0010, otherwise it assumes that it is an IO mapping and instead enforces WIMG=010x. Signed-off-by: David Gibson Signed-off-by: Alexander Graf --- hw/spapr.c |3 ++- hw/spapr.h |1 + hw/spapr_hcall.c | 22 ++ 3 files changed, 21 insertions(+), 5 deletions(-) diff --git a/hw/spapr.c b/hw/spapr.c index 9eefef9..00aed62 100644 --- a/hw/spapr.c +++ b/hw/spapr.c @@ -336,7 +336,8 @@ static void ppc_spapr_init(ram_addr_t ram_size, } /* allocate RAM */ -ram_offset = qemu_ram_alloc(NULL, "ppc_spapr.ram", ram_size); +spapr->ram_limit = ram_size; +ram_offset = qemu_ram_alloc(NULL, "ppc_spapr.ram", spapr->ram_limit); cpu_register_physical_memory(0, ram_size, ram_offset); /* allocate hash page table. 
For now we always make this 16mb, diff --git a/hw/spapr.h b/hw/spapr.h index 009c459..3d21b7a 100644 --- a/hw/spapr.h +++ b/hw/spapr.h @@ -10,6 +10,7 @@ typedef struct sPAPREnvironment { struct VIOsPAPRBus *vio_bus; struct icp_state *icp; +target_phys_addr_t ram_limit; void *htab; long htab_size; target_phys_addr_t fdt_addr, rtas_addr; diff --git a/hw/spapr_hcall.c b/hw/spapr_hcall.c index f7ead04..70f853c 100644 --- a/hw/spapr_hcall.c +++ b/hw/spapr_hcall.c @@ -99,6 +99,8 @@ static target_ulong h_enter(CPUState *env, sPAPREnvironment *spapr, target_ulong pte_index = args[1]; target_ulong pteh = args[2]; target_ulong ptel = args[3]; +target_ulong page_shift = 12; +target_ulong raddr; target_ulong i; uint8_t *hpte; @@ -111,6 +113,7 @@ static target_ulong h_enter(CPUState *env, sPAPREnvironment *spapr, #endif if ((ptel & 0xff000) == 0) { /* 16M page */ +page_shift = 24; /* lowest AVA bit must be 0 for 16M pages */ if (pteh & 0x80) { return H_PARAMETER; @@ -120,12 +123,23 @@ static target_ulong h_enter(CPUState *env, sPAPREnvironment *spapr, } } -/* FIXME: bounds check the pa? */ +raddr = (ptel & HPTE_R_RPN) & ~((1ULL << page_shift) - 1); -/* Check WIMG */ -if ((ptel & HPTE_R_WIMG) != HPTE_R_M) { -return H_PARAMETER; +if (raddr < spapr->ram_limit) { +/* Regular RAM - should have WIMG=0010 */ +if ((ptel & HPTE_R_WIMG) != HPTE_R_M) { +return H_PARAMETER; +} +} else { +/* Looks like an IO address */ +/* FIXME: What WIMG combinations could be sensible for IO? + * For now we allow WIMG=010x, but are there others? */ +/* FIXME: Should we check against registered IO addresses? */ +if ((ptel & (HPTE_R_W | HPTE_R_I | HPTE_R_M)) != HPTE_R_I) { +return H_PARAMETER; +} } + pteh &= ~0x60ULL; if ((pte_index * HASH_PTE_SIZE_64) & ~env->htab_mask) { -- 1.6.0.2
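The RAM versus IO decision from the h_enter hunk, pulled out into a standalone predicate for illustration. wimg_ok() is a hypothetical helper, and the HPTE_R_* bit values below are assumed for the sketch rather than copied from the headers; the logic mirrors the patch: below ram_limit only WIMG=0010 passes, at or above it WIMG=010x passes (G is a don't-care).

```c
#include <assert.h>
#include <stdint.h>

/* Bit values assumed for illustration; check the real headers. */
#define HPTE_R_W    0x40ULL
#define HPTE_R_I    0x20ULL
#define HPTE_R_M    0x10ULL
#define HPTE_R_G    0x08ULL
#define HPTE_R_WIMG (HPTE_R_W | HPTE_R_I | HPTE_R_M | HPTE_R_G)

/* Returns nonzero if the WIMG bits in ptel are acceptable for raddr. */
static int wimg_ok(uint64_t ptel, uint64_t raddr, uint64_t ram_limit)
{
    if (raddr < ram_limit) {
        /* Regular RAM: exactly WIMG=0010 (only M set). */
        return (ptel & HPTE_R_WIMG) == HPTE_R_M;
    }
    /* Treated as IO: WIMG=010x, i.e. I set, W and M clear, G ignored. */
    return (ptel & (HPTE_R_W | HPTE_R_I | HPTE_R_M)) == HPTE_R_I;
}
```

A caller in the style of h_enter would return H_PARAMETER whenever this predicate is false.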
[Qemu-devel] [PATCH 29/64] MPC8544DS: Remove CPU nodes
We want to generate the CPU nodes in machine init code, so remove them from the device tree definition that we precompile. Signed-off-by: Alexander Graf --- pc-bios/mpc8544ds.dtb | Bin 2277 -> 2028 bytes pc-bios/mpc8544ds.dts | 12 2 files changed, 0 insertions(+), 12 deletions(-) diff --git a/pc-bios/mpc8544ds.dtb b/pc-bios/mpc8544ds.dtb index ae318b1fe83846cc2e133951a3666fcfcdf87f79..c6d302153c7407d5d0127be29b0c35f80e47f8fb 100644 GIT binary patch delta 424 zcmaDV_=aEO0`I@K3=HgV7#J8V7#P?t0BH>%76f7eAO-?P8KC%#jT*{~lRq;qVGNu+ zgGpO80wTx2Se#mvnV92XVrpOj5@H5o79dUoaVFO=n@yHu7E~<+@qhp%%K^lVK&%DC zOh63N(K9)OS(!0yas{(Dk?LOn)z6*G!y?7RuxYXeOPCPDVW4@8NM@d#Jb@*NiQ(ep zFD&Xnqh(mFTTP3F~Xi9v};53e2?3j(nK5CZ{YE>PTIqlPkLJ!3$Ad1_IB zvyO$SiHU;&Seh9~vH-DTazQCb0LJ$Paex5E4+OFmkod`He4yqApb%Vr6B@stfk6!< z4_B}V%tP=uK>19QJs6iW9+>=rQJZnmWEm!T#^aN1n7mbC@*oFs0P!Ut)&gQCAci^e z?&LL0%0TrOh*s~wtStKu$plcqfdJG*M&`*4%wa-|B0wQVBw?w^FPM{<7?mdbu&4v= zD`Bx>Vl; #size-cells = <0>; - - PowerPC,8544@0 { - device_type = "cpu"; - reg = <0x0>; - d-cache-line-size = <32>; // 32 bytes - i-cache-line-size = <32>; // 32 bytes - d-cache-size = <0x8000>;// L1, 32K - i-cache-size = <0x8000>;// L1, 32K - timebase-frequency = <0>; - bus-frequency = <0>; - clock-frequency = <0>; - }; }; memory { -- 1.6.0.2
[Qemu-devel] [PATCH 26/64] device tree: add add_subnode command
We want to be able to create subnodes in our device tree, so export it
through the qemu device tree abstraction framework.

Signed-off-by: Alexander Graf
---
 device_tree.c |   24 ++++++++++++++++++++++++
 device_tree.h |    1 +
 2 files changed, 25 insertions(+), 0 deletions(-)

diff --git a/device_tree.c b/device_tree.c
index 23e89e3..f4a78c8 100644
--- a/device_tree.c
+++ b/device_tree.c
@@ -118,3 +118,27 @@ int qemu_devtree_nop_node(void *fdt, const char *node_path)
 
     return fdt_nop_node(fdt, offset);
 }
+
+int qemu_devtree_add_subnode(void *fdt, const char *name)
+{
+    int offset;
+    char *dupname = g_strdup(name);
+    char *basename = strrchr(dupname, '/');
+    int retval;
+
+    if (!basename) {
+        return -1;
+    }
+
+    basename[0] = '\0';
+    basename++;
+
+    offset = fdt_path_offset(fdt, dupname);
+    if (offset < 0) {
+        return offset;
+    }
+
+    retval = fdt_add_subnode(fdt, offset, basename);
+    g_free(dupname);
+    return retval;
+}
diff --git a/device_tree.h b/device_tree.h
index 76fce5f..4378685 100644
--- a/device_tree.h
+++ b/device_tree.h
@@ -23,5 +23,6 @@ int qemu_devtree_setprop_cell(void *fdt, const char *node_path,
 int qemu_devtree_setprop_string(void *fdt, const char *node_path,
                                 const char *property, const char *string);
 int qemu_devtree_nop_node(void *fdt, const char *node_path);
+int qemu_devtree_add_subnode(void *fdt, const char *name);
 
 #endif /* __DEVICE_TREE_H__ */
-- 
1.6.0.2
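The path handling in qemu_devtree_add_subnode() boils down to splitting "/parent/leaf" at the last slash. A standalone sketch of just that step, using libc malloc() instead of g_strdup() and a hypothetical split_node_path() helper; it also frees on every error path, which the early returns in the function as posted appear not to do for dupname.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Split "/a/b/c" into parent "/a/b" and leaf "c". A node directly under
 * the root ("/cpus") gets parent "/". On success returns 0 and stores two
 * malloc'd strings the caller must free; returns -1 otherwise.
 */
static int split_node_path(const char *path, char **parent, char **leaf)
{
    const char *slash = strrchr(path, '/');
    size_t plen;

    if (!slash || slash[1] == '\0') {
        return -1;                      /* no slash, or empty leaf name */
    }
    plen = (size_t)(slash - path);
    if (plen == 0) {
        plen = 1;                       /* keep "/" as the parent */
    }

    *parent = malloc(plen + 1);
    *leaf = malloc(strlen(slash + 1) + 1);
    if (!*parent || !*leaf) {
        free(*parent);
        free(*leaf);
        return -1;
    }
    memcpy(*parent, path, plen);
    (*parent)[plen] = '\0';
    strcpy(*leaf, slash + 1);
    return 0;
}
```

The parent string is what would be handed to fdt_path_offset() and the leaf to fdt_add_subnode(), as in the patch.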
[Qemu-devel] [PATCH 30/64] MPC8544DS: Generate CPU nodes on init
With this patch, we generate CPU nodes in the machine initialization, giving us the freedom to generate as many nodes as we want and as the machine supports, but only those. This is a first step towards a much cleaner device tree generation infrastructure, where we would not require precompiled dtb blobs anymore. Signed-off-by: Alexander Graf --- hw/ppce500_mpc8544ds.c | 46 +- 1 files changed, 33 insertions(+), 13 deletions(-) diff --git a/hw/ppce500_mpc8544ds.c b/hw/ppce500_mpc8544ds.c index a3e1ce4..dfa8034 100644 --- a/hw/ppce500_mpc8544ds.c +++ b/hw/ppce500_mpc8544ds.c @@ -123,23 +123,43 @@ static int mpc8544_load_device_tree(CPUState *env, hypercall, sizeof(hypercall)); } -for (i = 0; i < smp_cpus; i++) { +/* We need to generate the cpu nodes in reverse order, so Linux can pick + the first node as boot node and be happy */ +for (i = smp_cpus - 1; i >= 0; i--) { char cpu_name[128]; -uint64_t cpu_release_addr[] = { -cpu_to_be64(MPC8544_SPIN_BASE + (i * 0x20)) -}; +uint64_t cpu_release_addr = cpu_to_be64(MPC8544_SPIN_BASE + (i * 0x20)); + +for (env = first_cpu; env != NULL; env = env->next_cpu) { +if (env->cpu_index == i) { +break; +} +} + +if (!env) { +continue; +} -snprintf(cpu_name, sizeof(cpu_name), "/cpus/PowerPC,8544@%x", i); +snprintf(cpu_name, sizeof(cpu_name), "/cpus/PowerPC,8544@%x", env->cpu_index); +qemu_devtree_add_subnode(fdt, cpu_name); qemu_devtree_setprop_cell(fdt, cpu_name, "clock-frequency", clock_freq); qemu_devtree_setprop_cell(fdt, cpu_name, "timebase-frequency", tb_freq); -qemu_devtree_setprop(fdt, cpu_name, "cpu-release-addr", - cpu_release_addr, sizeof(cpu_release_addr)); -} - -for (i = smp_cpus; i < 32; i++) { -char cpu_name[128]; -snprintf(cpu_name, sizeof(cpu_name), "/cpus/PowerPC,8544@%x", i); -qemu_devtree_nop_node(fdt, cpu_name); +qemu_devtree_setprop_string(fdt, cpu_name, "device_type", "cpu"); +qemu_devtree_setprop_cell(fdt, cpu_name, "reg", env->cpu_index); +qemu_devtree_setprop_cell(fdt, cpu_name, "d-cache-line-size", + 
env->dcache_line_size); +qemu_devtree_setprop_cell(fdt, cpu_name, "i-cache-line-size", + env->icache_line_size); +qemu_devtree_setprop_cell(fdt, cpu_name, "d-cache-size", 0x8000); +qemu_devtree_setprop_cell(fdt, cpu_name, "i-cache-size", 0x8000); +qemu_devtree_setprop_cell(fdt, cpu_name, "bus-frequency", 0); +if (env->cpu_index) { +qemu_devtree_setprop_string(fdt, cpu_name, "status", "disabled"); +qemu_devtree_setprop_string(fdt, cpu_name, "enable-method", "spin-table"); +qemu_devtree_setprop(fdt, cpu_name, "cpu-release-addr", + &cpu_release_addr, sizeof(cpu_release_addr)); +} else { +qemu_devtree_setprop_string(fdt, cpu_name, "status", "okay"); +} } ret = rom_add_blob_fixed(BINARY_DEVICE_TREE_FILE, fdt, fdt_size, addr); -- 1.6.0.2
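The per-CPU values computed in the loop above can be sketched in isolation. This is an illustration only: EXAMPLE_SPIN_BASE is a placeholder, since the real MPC8544_SPIN_BASE value is not shown in this patch, and the three helper names are hypothetical. The node name format, the 0x20 stride and the status strings are the ones used in the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Placeholder for illustration only; the real MPC8544_SPIN_BASE is defined
 * elsewhere in QEMU and its value is not part of this patch. */
#define EXAMPLE_SPIN_BASE 0x10000000ULL
#define SPIN_ENTRY_SIZE   0x20ULL   /* stride used by the patch: i * 0x20 */

/* Format the device tree path for CPU 'index', as the patch does. */
static int cpu_node_path(char *buf, size_t len, int index)
{
    assert(buf && len > 0 && index >= 0);
    return snprintf(buf, len, "/cpus/PowerPC,8544@%x", (unsigned)index);
}

/* Address of the spin-table entry that a secondary CPU polls. */
static uint64_t cpu_release_addr(int index)
{
    assert(index >= 0);
    return EXAMPLE_SPIN_BASE + (uint64_t)index * SPIN_ENTRY_SIZE;
}

/* CPU 0 boots directly; the others start out disabled (spin-table). */
static const char *cpu_status(int index)
{
    return index == 0 ? "okay" : "disabled";
}
```

Note that the unit address in the node name is printed in hex, so CPU 10 becomes PowerPC,8544@a.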
[Qemu-devel] [PATCH 37/64] pseries: Add a phandle to the xicp interrupt controller device tree node
From: David Gibson Future devices we will be adding to the pseries machine (e.g. PCI) will need nodes in the device tree which explicitly reference the top-level interrupt controller via interrupt-parent or interrupt-map properties. In order to do this, the interrupt controller node needs an assigned phandle. This patch adds the appropriate property, in preparation. Signed-off-by: David Gibson Signed-off-by: Alexander Graf --- hw/spapr.c |5 + 1 files changed, 5 insertions(+), 0 deletions(-) diff --git a/hw/spapr.c b/hw/spapr.c index 760e323..bb00ae6 100644 --- a/hw/spapr.c +++ b/hw/spapr.c @@ -57,6 +57,8 @@ #define MAX_CPUS 256 #define XICS_IRQS 1024 +#define PHANDLE_XICP 0x + sPAPREnvironment *spapr; static void *spapr_create_fdt_skel(const char *cpu_model, @@ -202,6 +204,9 @@ static void *spapr_create_fdt_skel(const char *cpu_model, _FDT((fdt_property(fdt, "ibm,interrupt-server-ranges", interrupt_server_ranges_prop, sizeof(interrupt_server_ranges_prop)))); +_FDT((fdt_property_cell(fdt, "#interrupt-cells", 2))); +_FDT((fdt_property_cell(fdt, "linux,phandle", PHANDLE_XICP))); +_FDT((fdt_property_cell(fdt, "phandle", PHANDLE_XICP))); _FDT((fdt_end_node(fdt))); -- 1.6.0.2
[Qemu-devel] [PATCH 54/64] ppc: move ADB stuff from ppc_mac.h to adb.h
From: Laurent Vivier Allow ADB to be used on non-PPC Macintosh machines. Signed-off-by: Laurent Vivier Signed-off-by: Alexander Graf --- hw/adb.c |2 +- hw/adb.h | 67 + hw/cuda.c |1 + hw/ppc_mac.h | 42 - hw/ppc_newworld.c |1 + hw/ppc_oldworld.c |1 + 6 files changed, 71 insertions(+), 43 deletions(-) create mode 100644 hw/adb.h diff --git a/hw/adb.c b/hw/adb.c index 8dedbf8..aa15f55 100644 --- a/hw/adb.c +++ b/hw/adb.c @@ -22,7 +22,7 @@ * THE SOFTWARE. */ #include "hw.h" -#include "ppc_mac.h" +#include "adb.h" #include "console.h" /* debug ADB */ diff --git a/hw/adb.h b/hw/adb.h new file mode 100644 index 000..b2a591c --- /dev/null +++ b/hw/adb.h @@ -0,0 +1,67 @@ +/* + * QEMU ADB emulation shared definitions and prototypes + * + * Copyright (c) 2004-2007 Fabrice Bellard + * Copyright (c) 2007 Jocelyn Mayer + * + * Permission is hereby granted, free of charge, to any person obtaining a copy + * of this software and associated documentation files (the "Software"), to deal + * in the Software without restriction, including without limitation the rights + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell + * copies of the Software, and to permit persons to whom the Software is + * furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN + * THE SOFTWARE. 
+ */ + +#if !defined(__ADB_H__) +#define __ADB_H__ + +#define MAX_ADB_DEVICES 16 + +#define ADB_MAX_OUT_LEN 16 + +typedef struct ADBDevice ADBDevice; + +/* buf = NULL means polling */ +typedef int ADBDeviceRequest(ADBDevice *d, uint8_t *buf_out, + const uint8_t *buf, int len); +typedef int ADBDeviceReset(ADBDevice *d); + +struct ADBDevice { +struct ADBBusState *bus; +int devaddr; +int handler; +ADBDeviceRequest *devreq; +ADBDeviceReset *devreset; +void *opaque; +}; + +typedef struct ADBBusState { +ADBDevice devices[MAX_ADB_DEVICES]; +int nb_devices; +int poll_index; +} ADBBusState; + +int adb_request(ADBBusState *s, uint8_t *buf_out, +const uint8_t *buf, int len); +int adb_poll(ADBBusState *s, uint8_t *buf_out); + +ADBDevice *adb_register_device(ADBBusState *s, int devaddr, + ADBDeviceRequest *devreq, + ADBDeviceReset *devreset, + void *opaque); +void adb_kbd_init(ADBBusState *bus); +void adb_mouse_init(ADBBusState *bus); + +extern ADBBusState adb_bus; +#endif /* !defined(__ADB_H__) */ diff --git a/hw/cuda.c b/hw/cuda.c index 5c92d81..6f05975 100644 --- a/hw/cuda.c +++ b/hw/cuda.c @@ -24,6 +24,7 @@ */ #include "hw.h" #include "ppc_mac.h" +#include "adb.h" #include "qemu-timer.h" #include "sysemu.h" diff --git a/hw/ppc_mac.h b/hw/ppc_mac.h index 7351bb6..af75e45 100644 --- a/hw/ppc_mac.h +++ b/hw/ppc_mac.h @@ -77,46 +77,4 @@ void macio_nvram_setup_bar(MacIONVRAMState *s, MemoryRegion *bar, void pmac_format_nvram_partition (MacIONVRAMState *nvr, int len); uint32_t macio_nvram_read (void *opaque, uint32_t addr); void macio_nvram_write (void *opaque, uint32_t addr, uint32_t val); - -/* adb.c */ - -#define MAX_ADB_DEVICES 16 - -#define ADB_MAX_OUT_LEN 16 - -typedef struct ADBDevice ADBDevice; - -/* buf = NULL means polling */ -typedef int ADBDeviceRequest(ADBDevice *d, uint8_t *buf_out, - const uint8_t *buf, int len); -typedef int ADBDeviceReset(ADBDevice *d); - -struct ADBDevice { -struct ADBBusState *bus; -int devaddr; -int handler; -ADBDeviceRequest *devreq; 
-ADBDeviceReset *devreset; -void *opaque; -}; - -typedef struct ADBBusState { -ADBDevice devices[MAX_ADB_DEVICES]; -int nb_devices; -int poll_index; -} ADBBusState; - -int adb_request(ADBBusState *s, uint8_t *buf_out, -const uint8_t *buf, int len); -int adb_poll(ADBBusState *s, uint8_t *buf_out); - -ADBDevice *adb_register_device(ADBBusState *s, int devaddr, - ADBDeviceRequest *devreq, - ADBDeviceReset *devreset, - void *opaque); -void adb_kbd_init(ADBBusState *bus); -void adb_mouse_init(ADBBusState *bus); - -extern ADBBusState adb_bus; - #endif /* !defined(__PPC_MAC_H__) */ diff --git a
[Qemu-devel] [PATCH 35/64] PPC: SPAPR: Use KVM function for time info
One of the things we can't fake on PPC is the timer speed. So we need to extract the frequency information from the host and put it back into the guest device tree. Luckily, we already have functions for that from the non-pseries targets, so all we need to do is to connect the dots and the guest suddenly gets to know its real timer speeds. Signed-off-by: Alexander Graf --- hw/spapr.c |8 1 files changed, 4 insertions(+), 4 deletions(-) diff --git a/hw/spapr.c b/hw/spapr.c index c5c9a95..760e323 100644 --- a/hw/spapr.c +++ b/hw/spapr.c @@ -140,6 +140,8 @@ static void *spapr_create_fdt_skel(const char *cpu_model, char *nodename; uint32_t segs[] = {cpu_to_be32(28), cpu_to_be32(40), 0x, 0x}; +uint32_t tbfreq = kvm_enabled() ? kvmppc_get_tbfreq() : TIMEBASE_FREQ; +uint32_t cpufreq = kvm_enabled() ? kvmppc_get_clockfreq() : 10; if (asprintf(&nodename, "%s@%x", modelname, index) < 0) { fprintf(stderr, "Allocation failure\n"); @@ -158,10 +160,8 @@ static void *spapr_create_fdt_skel(const char *cpu_model, env->dcache_line_size))); _FDT((fdt_property_cell(fdt, "icache-block-size", env->icache_line_size))); -_FDT((fdt_property_cell(fdt, "timebase-frequency", TIMEBASE_FREQ))); -/* Hardcode CPU frequency for now. It's kind of arbitrary on - * full emu, for kvm we should copy it from the host */ -_FDT((fdt_property_cell(fdt, "clock-frequency", 10))); +_FDT((fdt_property_cell(fdt, "timebase-frequency", tbfreq))); +_FDT((fdt_property_cell(fdt, "clock-frequency", cpufreq))); _FDT((fdt_property_cell(fdt, "ibm,slb-size", env->slb_nr))); _FDT((fdt_property(fdt, "ibm,pft-size", pft_size_prop, sizeof(pft_size_prop)))); -- 1.6.0.2
[Qemu-devel] [PATCH 53/64] openpic: Unfold write_IRQreg
The helper function write_IRQreg was always called with a specific argument on the type of register to access. Inside the function we were simply doing a switch on that constant argument again. It's a lot easier to just unfold this into two separate functions and call each individually. Reported-by: Blue Swirl Signed-off-by: Alexander Graf --- hw/openpic.c | 79 +++-- 1 files changed, 37 insertions(+), 42 deletions(-) diff --git a/hw/openpic.c b/hw/openpic.c index fbd8837..43b8f27 100644 --- a/hw/openpic.c +++ b/hw/openpic.c @@ -482,30 +482,25 @@ static inline uint32_t read_IRQreg_ipvp(openpic_t *opp, int n_IRQ) return opp->src[n_IRQ].ipvp; } -static inline void write_IRQreg (openpic_t *opp, int n_IRQ, - uint32_t reg, uint32_t val) +static inline void write_IRQreg_ide(openpic_t *opp, int n_IRQ, uint32_t val) { uint32_t tmp; -switch (reg) { -case IRQ_IPVP: -/* NOTE: not fully accurate for special IRQs, but simple and - sufficient */ -/* ACTIVITY bit is read-only */ -opp->src[n_IRQ].ipvp = -(opp->src[n_IRQ].ipvp & 0x4000) | -(val & 0x800F00FF); -openpic_update_irq(opp, n_IRQ); -DPRINTF("Set IPVP %d to 0x%08x -> 0x%08x\n", -n_IRQ, val, opp->src[n_IRQ].ipvp); -break; -case IRQ_IDE: -tmp = val & 0xC000; -tmp |= val & ((1ULL << MAX_CPU) - 1); -opp->src[n_IRQ].ide = tmp; -DPRINTF("Set IDE %d to 0x%08x\n", n_IRQ, opp->src[n_IRQ].ide); -break; -} +tmp = val & 0xC000; +tmp |= val & ((1ULL << MAX_CPU) - 1); +opp->src[n_IRQ].ide = tmp; +DPRINTF("Set IDE %d to 0x%08x\n", n_IRQ, opp->src[n_IRQ].ide); +} + +static inline void write_IRQreg_ipvp(openpic_t *opp, int n_IRQ, uint32_t val) +{ +/* NOTE: not fully accurate for special IRQs, but simple and sufficient */ +/* ACTIVITY bit is read-only */ +opp->src[n_IRQ].ipvp = (opp->src[n_IRQ].ipvp & 0x4000) + | (val & 0x800F00FF); +openpic_update_irq(opp, n_IRQ); +DPRINTF("Set IPVP %d to 0x%08x -> 0x%08x\n", n_IRQ, val, +opp->src[n_IRQ].ipvp); } #if 0 // Code provision for Intel model @@ -535,10 +530,10 @@ static void 
write_doorbell_register (penpic_t *opp, int n_dbl, { switch (offset) { case DBL_IVPR_OFFSET: -write_IRQreg(opp, IRQ_DBL0 + n_dbl, IRQ_IPVP, value); +write_IRQreg_ipvp(opp, IRQ_DBL0 + n_dbl, value); break; case DBL_IDE_OFFSET: -write_IRQreg(opp, IRQ_DBL0 + n_dbl, IRQ_IDE, value); +write_IRQreg_ide(opp, IRQ_DBL0 + n_dbl, value); break; case DBL_DMR_OFFSET: opp->doorbells[n_dbl].dmr = value; @@ -576,10 +571,10 @@ static void write_mailbox_register (openpic_t *opp, int n_mbx, opp->mailboxes[n_mbx].mbr = value; break; case MBX_IVPR_OFFSET: -write_IRQreg(opp, IRQ_MBX0 + n_mbx, IRQ_IPVP, value); +write_IRQreg_ipvp(opp, IRQ_MBX0 + n_mbx, value); break; case MBX_DMR_OFFSET: -write_IRQreg(opp, IRQ_MBX0 + n_mbx, IRQ_IDE, value); +write_IRQreg_ide(opp, IRQ_MBX0 + n_mbx, value); break; } } @@ -636,7 +631,7 @@ static void openpic_gbl_write (void *opaque, target_phys_addr_t addr, uint32_t v { int idx; idx = (addr - 0x10A0) >> 4; -write_IRQreg(opp, opp->irq_ipi0 + idx, IRQ_IPVP, val); +write_IRQreg_ipvp(opp, opp->irq_ipi0 + idx, val); } break; case 0x10E0: /* SPVE */ @@ -729,10 +724,10 @@ static void openpic_timer_write (void *opaque, uint32_t addr, uint32_t val) opp->timers[idx].tibc = val; break; case 0x20: /* TIVP */ -write_IRQreg(opp, opp->irq_tim0 + idx, IRQ_IPVP, val); +write_IRQreg_ipvp(opp, opp->irq_tim0 + idx, val); break; case 0x30: /* TIDE */ -write_IRQreg(opp, opp->irq_tim0 + idx, IRQ_IDE, val); +write_IRQreg_ide(opp, opp->irq_tim0 + idx, val); break; } } @@ -782,10 +777,10 @@ static void openpic_src_write (void *opaque, uint32_t addr, uint32_t val) idx = addr >> 5; if (addr & 0x10) { /* EXDE / IFEDE / IEEDE */ -write_IRQreg(opp, idx, IRQ_IDE, val); +write_IRQreg_ide(opp, idx, val); } else { /* EXVP / IFEVP / IEEVP */ -write_IRQreg(opp, idx, IRQ_IPVP, val); +write_IRQreg_ipvp(opp, idx, val); } } @@ -835,8 +830,8 @@ static void openpic_cpu_write_internal(void *opaque, target_phys_addr_t addr, case 0x70: idx = (addr - 0x40) >> 4; /* we use IDE as mask which CPUs to 
deliver the IPI to still. */ -write_IRQreg(opp, opp->irq_ipi0 + idx, IRQ_IDE, - opp->src[opp->irq_ipi0 + idx].ide | val); +write_IRQreg_ide(opp, opp->irq_ipi0 + idx, + opp->src[opp
[Qemu-devel] [PATCH 46/64] ppc: booke206: use MAV=2.0 TSIZE definition, fix 4G pages
From: Scott Wood This definition is backward compatible with MAV=1.0 as long as the guest does not set reserved bits in MAS1/MAS4. Also, fix the shift in booke206_tlb_to_page_size -- it's the base that should be able to hold a 4G page size, not the shift count. Signed-off-by: Scott Wood Signed-off-by: Alexander Graf --- hw/ppce500_mpc8544ds.c |2 +- target-ppc/cpu.h |4 ++-- target-ppc/helper.c |5 +++-- 3 files changed, 6 insertions(+), 5 deletions(-) diff --git a/hw/ppce500_mpc8544ds.c b/hw/ppce500_mpc8544ds.c index 61151d8..8095516 100644 --- a/hw/ppce500_mpc8544ds.c +++ b/hw/ppce500_mpc8544ds.c @@ -174,7 +174,7 @@ out: /* Create -kernel TLB entries for BookE, linearly spanning 256MB. */ static inline target_phys_addr_t booke206_page_size_to_tlb(uint64_t size) { -return (ffs(size >> 10) - 1) >> 1; +return ffs(size >> 10) - 1; } static void mmubooke_create_initial_mapping(CPUState *env, diff --git a/target-ppc/cpu.h b/target-ppc/cpu.h index 5200e6e..32706df 100644 --- a/target-ppc/cpu.h +++ b/target-ppc/cpu.h @@ -667,8 +667,8 @@ enum { #define MAS0_ATSEL_TLB 0 #define MAS0_ATSEL_LRAT MAS0_ATSEL -#define MAS1_TSIZE_SHIFT 8 -#define MAS1_TSIZE_MASK (0xf << MAS1_TSIZE_SHIFT) +#define MAS1_TSIZE_SHIFT 7 +#define MAS1_TSIZE_MASK (0x1f << MAS1_TSIZE_SHIFT) #define MAS1_TS_SHIFT 12 #define MAS1_TS (1 << MAS1_TS_SHIFT) diff --git a/target-ppc/helper.c b/target-ppc/helper.c index 4b3731e..6339be3 100644 --- a/target-ppc/helper.c +++ b/target-ppc/helper.c @@ -1293,7 +1293,7 @@ target_phys_addr_t booke206_tlb_to_page_size(CPUState *env, ppcmas_tlb_t *tlb) { uint32_t tlbncfg; int tlbn = booke206_tlbm_to_tlbn(env, tlb); -target_phys_addr_t tlbm_size; +int tlbm_size; tlbncfg = env->spr[SPR_BOOKE_TLB0CFG + tlbn]; @@ -1301,9 +1301,10 @@ target_phys_addr_t booke206_tlb_to_page_size(CPUState *env, ppcmas_tlb_t *tlb) tlbm_size = (tlb->mas1 & MAS1_TSIZE_MASK) >> MAS1_TSIZE_SHIFT; } else { tlbm_size = (tlbncfg & TLBnCFG_MINSIZE) >> TLBnCFG_MINSIZE_SHIFT; +tlbm_size <<= 1; } -return (1 << 
(tlbm_size << 1)) << 10; +return 1024ULL << tlbm_size; } /* TLB check function for MAS based SoftTLBs */ -- 1.6.0.2
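The size arithmetic in this patch is easy to check by hand, so here is a small worked sketch. It assumes the MAV 2.0 encoding described above, where the page size is 1 KiB shifted left by TSIZE; the shift and mask values are the ones from the patch, while the function names are hypothetical. The old expression started from the int constant 1, so a 4G result (2^32) could not be represented where int is 32 bits wide; starting from 1024ULL keeps the whole computation in 64 bits. The inverse uses a plain loop rather than ffs() to stay independent of POSIX headers.

```c
#include <assert.h>
#include <stdint.h>

/* Values taken from the patch (MAV 2.0 layout of MAS1.TSIZE). */
#define MAS1_TSIZE_SHIFT 7
#define MAS1_TSIZE_MASK  (0x1fu << MAS1_TSIZE_SHIFT)

/* Page size in bytes for a MAV 2.0 TSIZE value: 1 KiB * 2^tsize. */
static uint64_t tsize_to_bytes(unsigned tsize)
{
    assert(tsize <= 0x1f);
    return 1024ULL << tsize;
}

/* Inverse mapping for a power-of-two size of at least 1 KiB. */
static unsigned bytes_to_tsize(uint64_t size)
{
    unsigned tsize = 0;

    assert(size >= 1024 && (size & (size - 1)) == 0);
    size >>= 10;
    while (size > 1) {
        size >>= 1;
        tsize++;
    }
    return tsize;
}

/* Extract the TSIZE field from a MAS1 register value. */
static unsigned mas1_tsize(uint32_t mas1)
{
    return (mas1 & MAS1_TSIZE_MASK) >> MAS1_TSIZE_SHIFT;
}
```

With this encoding a 4K page is TSIZE 2, the 256MB initial mapping is TSIZE 18, and a 4G page is TSIZE 22.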
[Qemu-devel] [PATCH 57/64] KVM: Update kernel headers
Removes ABI-breaking HIOR parts - KVM patch to follow. Signed-off-by: Alexander Graf --- linux-headers/asm-powerpc/kvm.h |8 linux-headers/linux/kvm.h |1 - 2 files changed, 0 insertions(+), 9 deletions(-) diff --git a/linux-headers/asm-powerpc/kvm.h b/linux-headers/asm-powerpc/kvm.h index 28eecf0..25964ee 100644 --- a/linux-headers/asm-powerpc/kvm.h +++ b/linux-headers/asm-powerpc/kvm.h @@ -149,12 +149,6 @@ struct kvm_regs { #define KVM_SREGS_E_UPDATE_DBSR (1 << 3) /* - * Book3S special bits to indicate contents in the struct by maintaining - * backwards compatibility with older structs. If adding a new field, - * please make sure to add a flag for that new field */ -#define KVM_SREGS_S_HIOR (1 << 0) - -/* * In KVM_SET_SREGS, reserved/pad fields must be left untouched from a * previous KVM_GET_REGS. * @@ -179,8 +173,6 @@ struct kvm_sregs { __u64 ibat[8]; __u64 dbat[8]; } ppc32; - __u64 flags; /* KVM_SREGS_S_ */ - __u64 hior; } s; struct { union { diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h index 8bb6cde..6f5095c 100644 --- a/linux-headers/linux/kvm.h +++ b/linux-headers/linux/kvm.h @@ -554,7 +554,6 @@ struct kvm_ppc_pvinfo { #define KVM_CAP_PPC_SMT 64 #define KVM_CAP_PPC_RMA 65 #define KVM_CAP_MAX_VCPUS 66 /* returns max vcpus per vm */ -#define KVM_CAP_PPC_HIOR 67 #define KVM_CAP_PPC_PAPR 68 #define KVM_CAP_SW_TLB 69 -- 1.6.0.2
[Qemu-devel] [PATCH 21/64] PPC: KVM: Add stubs for kvm helper functions
Several KVM helper functions have no stubs for builds without CONFIG_KVM. This has not bitten us so far, because gcc optimizes the calls away, but we should really provide them. Signed-off-by: Alexander Graf --- v1 -> v2: - use uint64_t for clockfreq --- target-ppc/kvm_ppc.h | 26 ++ 1 files changed, 26 insertions(+), 0 deletions(-) diff --git a/target-ppc/kvm_ppc.h b/target-ppc/kvm_ppc.h index 0c659c8..76f98d9 100644 --- a/target-ppc/kvm_ppc.h +++ b/target-ppc/kvm_ppc.h @@ -11,11 +11,37 @@ void kvmppc_init(void); +#ifdef CONFIG_KVM + uint32_t kvmppc_get_tbfreq(void); uint64_t kvmppc_get_clockfreq(void); int kvmppc_get_hypercall(CPUState *env, uint8_t *buf, int buf_len); int kvmppc_set_interrupt(CPUState *env, int irq, int level); +#else + +static inline uint32_t kvmppc_get_tbfreq(void) +{ +return 0; +} + +static inline uint64_t kvmppc_get_clockfreq(void) +{ +return 0; +} + +static inline int kvmppc_get_hypercall(CPUState *env, uint8_t *buf, int buf_len) +{ +return -1; +} + +static inline int kvmppc_set_interrupt(CPUState *env, int irq, int level) +{ +return -1; +} + +#endif + #ifndef CONFIG_KVM #define kvmppc_eieio() do { } while (0) #else -- 1.6.0.2
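The stub pattern in this patch is generic, and a reduced sketch may make it easier to follow. Everything named here is hypothetical: EXAMPLE_ACCEL stands in for CONFIG_KVM, and the example_* functions stand in for the kvmppc_* helpers. The point is that callers compile and link unchanged whether or not the feature is built in, without relying on the compiler to discard dead calls.

```c
#include <assert.h>
#include <stdint.h>

/* EXAMPLE_ACCEL stands in for CONFIG_KVM; all names here are hypothetical. */
#ifdef EXAMPLE_ACCEL

/* Real implementations would live in a separate .c file. */
uint32_t example_get_tbfreq(void);
int example_set_interrupt(int irq, int level);

#else

/* Stubs: used when the accelerator is compiled out. */
static inline uint32_t example_get_tbfreq(void)
{
    return 0;
}

static inline int example_set_interrupt(int irq, int level)
{
    (void)irq;
    (void)level;
    return -1;      /* report failure, as the stubs in the patch do */
}

#endif

/* Caller guarded by a runtime check, in the style of
 * kvm_enabled() ? kvmppc_get_tbfreq() : TIMEBASE_FREQ. */
static uint32_t pick_tbfreq(int accel_enabled, uint32_t fallback)
{
    assert(fallback > 0);   /* a zero fallback would be useless to a guest */
    return accel_enabled ? example_get_tbfreq() : fallback;
}
```

In a build without the accelerator the runtime check is always false, so the stub return values are never meant to reach a guest; they only keep the call sites valid.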
[Qemu-devel] [PATCH 42/64] pseries: use macro for firmware filename
From: Nishanth Aravamudan For some time we've had a nicely defined macro with the filename for our firmware image. However we didn't actually use it in the place we're supposed to. This patch fixes it. Signed-off-by: Nishanth Aravamudan Signed-off-by: David Gibson Signed-off-by: Alexander Graf --- hw/spapr.c |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/hw/spapr.c b/hw/spapr.c index 00aed62..91953cf 100644 --- a/hw/spapr.c +++ b/hw/spapr.c @@ -442,7 +442,7 @@ static void ppc_spapr_init(ram_addr_t ram_size, "%ldM guest RAM\n", MIN_RAM_SLOF); exit(1); } -filename = qemu_find_file(QEMU_FILE_TYPE_BIOS, "slof.bin"); +filename = qemu_find_file(QEMU_FILE_TYPE_BIOS, FW_FILE_NAME); fw_size = load_image_targphys(filename, 0, FW_MAX_SIZE); if (fw_size < 0) { hw_error("qemu: could not load LPAR rtas '%s'\n", filename); -- 1.6.0.2
[Qemu-devel] [PATCH 31/64] PPC: E500: Bump CPU count to 15
Now that we have everything in place, make the machine description aware of the fact that we can now handle 15 virtual CPUs! Signed-off-by: Alexander Graf --- v1 -> v2: - Max cpus is 15 because of MPIC --- hw/ppce500_mpc8544ds.c |1 + 1 files changed, 1 insertions(+), 0 deletions(-) diff --git a/hw/ppce500_mpc8544ds.c b/hw/ppce500_mpc8544ds.c index dfa8034..b86a008 100644 --- a/hw/ppce500_mpc8544ds.c +++ b/hw/ppce500_mpc8544ds.c @@ -396,6 +396,7 @@ static QEMUMachine mpc8544ds_machine = { .name = "mpc8544ds", .desc = "mpc8544ds", .init = mpc8544ds_init, +.max_cpus = 15, }; static void mpc8544ds_machine_init(void) -- 1.6.0.2
[Qemu-devel] [PATCH 23/64] PPC: E500: Remove unneeded CPU nodes
We should only keep CPU nodes in the device tree around that we really have virtual CPUs for. So remove all superfluous entries that we just keep there in case someone wants to create a lot of vCPUs. Signed-off-by: Alexander Graf --- hw/ppce500_mpc8544ds.c |6 ++ 1 files changed, 6 insertions(+), 0 deletions(-) diff --git a/hw/ppce500_mpc8544ds.c b/hw/ppce500_mpc8544ds.c index 0791e27..9379624 100644 --- a/hw/ppce500_mpc8544ds.c +++ b/hw/ppce500_mpc8544ds.c @@ -129,6 +129,12 @@ static int mpc8544_load_device_tree(CPUState *env, qemu_devtree_setprop_cell(fdt, cpu_name, "timebase-frequency", tb_freq); } +for (i = smp_cpus; i < 32; i++) { +char cpu_name[128]; +snprintf(cpu_name, sizeof(cpu_name), "/cpus/PowerPC,8544@%x", i); +qemu_devtree_nop_node(fdt, cpu_name); +} + ret = rom_add_blob_fixed(BINARY_DEVICE_TREE_FILE, fdt, fdt_size, addr); g_free(fdt); -- 1.6.0.2