Re: [Qemu-devel] [PATCH v3] Sort the help info shown in monitor at runtime
On Wed, 2011-10-12 at 11:32 +0800, Wayne Xia wrote: This patch would try sort the command list in monitor at runtime. As a result, command help and help info would show a more friendly sorted command list. For eg: (qemu)help acl_add acl_policy acl_remove acl_reset acl_show balloon block_passwd ... the command list is sorted. v3: using qsort function to sort the command list. Signed-off-by: Wayne Xia xiaw...@linux.vnet.ibm.com --- monitor.c | 30 ++ 1 files changed, 26 insertions(+), 4 deletions(-) diff --git a/monitor.c b/monitor.c index 31b212a..a172167 100644 --- a/monitor.c +++ b/monitor.c @@ -195,8 +195,8 @@ static inline int mon_print_count_get(const Monitor *mon) { return 0; } static QLIST_HEAD(mon_list, Monitor) mon_list; -static const mon_cmd_t mon_cmds[]; -static const mon_cmd_t info_cmds[]; +static mon_cmd_t mon_cmds[]; +static mon_cmd_t info_cmds[]; static const mon_cmd_t qmp_cmds[]; static const mon_cmd_t qmp_query_cmds[]; @@ -2726,13 +2726,14 @@ int monitor_get_fd(Monitor *mon, const char *fdname) return -1; } -static const mon_cmd_t mon_cmds[] = { +/* mon_cmds and info_cmds would be sorted at runtime */ +static mon_cmd_t mon_cmds[] = { #include hmp-commands.h { NULL, NULL, }, }; /* Please update hmp-commands.hx when adding or changing commands */ -static const mon_cmd_t info_cmds[] = { +static mon_cmd_t info_cmds[] = { { .name = version, .args_type = , @@ -5068,6 +5069,25 @@ static void monitor_event(void *opaque, int event) } } +static int +compare_mon_cmd(const void *a, const void *b) +{ +return strcmp(((const mon_cmd_t *)a)-name, +((const mon_cmd_t *)b)-name); +} + +static void sortcmdlist(void) +{ +int array_num; +int elem_size = sizeof(mon_cmd_t); + +array_num = sizeof(mon_cmds)/elem_size-1; +qsort((void *)mon_cmds, array_num, elem_size, compare_mon_cmd); + +array_num = sizeof(info_cmds)/elem_size-1; +qsort((void *)info_cmds, array_num, elem_size, compare_mon_cmd); +} + /* * Local variables: @@ -5110,6 +5130,8 @@ void monitor_init(CharDriverState *chr, int flags) QLIST_INSERT_HEAD(mon_list, mon, entry); if (!default_mon || (flags MONITOR_IS_DEFAULT)) default_mon = mon; + +sortcmdlist(); } static void bdrv_password_cb(Monitor *mon, const char *password, void *opaque) Tested-by: Wenyi Gao we...@linux.vnet.ibm.com Work nice. Wenyi Gao
Re: [Qemu-devel] [PATCH] kernel/kvm: fix improper nmi emulation
(2011/10/10 19:26), Avi Kivity wrote: On 10/10/2011 08:06 AM, Lai Jiangshan wrote: From: Kenji Kaneshigekaneshige.ke...@jp.fujitsu.com Currently, NMI interrupt is blindly sent to all the vCPUs when NMI button event happens. This doesn't properly emulate real hardware on which NMI button event triggers LINT1. Because of this, NMI is sent to the processor even when LINT1 is maskied in LVT. For example, this causes the problem that kdump initiated by NMI sometimes doesn't work on KVM, because kdump assumes NMI is masked on CPUs other than CPU0. With this patch, KVM_NMI ioctl is handled as follows. - When in-kernel irqchip is enabled, KVM_NMI ioctl is handled as a request of triggering LINT1 on the processor. LINT1 is emulated in in-kernel irqchip. - When in-kernel irqchip is disabled, KVM_NMI ioctl is handled as a request of injecting NMI to the processor. This assumes LINT1 is already emulated in userland. Please add a KVM_NMI section to Documentation/virtual/kvm/api.txt. -static int kvm_vcpu_ioctl_nmi(struct kvm_vcpu *vcpu) -{ - kvm_inject_nmi(vcpu); - - return 0; -} - static int vcpu_ioctl_tpr_access_reporting(struct kvm_vcpu *vcpu, struct kvm_tpr_access_ctl *tac) { @@ -3038,9 +3031,10 @@ long kvm_arch_vcpu_ioctl(struct file *fi break; } case KVM_NMI: { - r = kvm_vcpu_ioctl_nmi(vcpu); - if (r) - goto out; + if (irqchip_in_kernel(vcpu-kvm)) + kvm_apic_lint1_deliver(vcpu); + else + kvm_inject_nmi(vcpu); r = 0; break; } Why did you drop kvm_vcpu_ioctl_nmi()? Please add (and document) a KVM_CAP flag that lets userspace know the new behaviour is supported. Sorry for the delayed responding. I don't understand why new KVM_CAP flag is needed. I think the old behavior was clearly a bug, and new behavior is not a new capability. Furthermore, the kvm patch and the qemu patch in this patchset can be applied independently. If only the kvm patch is applied, NMI bug in kernel irq is fixed and qemu NMI behavior is not changed. If the only the qemu patch is applied, qemu NMI bug is fixed and the NMI behavior in kernel irq is not changed. Regards, Kenji Kaneshige
Re: [Qemu-devel] [PATCH 1/1 V2] kernel/kvm: fix improper nmi emulation
(2011/10/12 2:00), Lai Jiangshan wrote: From: Kenji Kaneshigekaneshige.ke...@jp.fujitsu.com Currently, NMI interrupt is blindly sent to all the vCPUs when NMI button event happens. This doesn't properly emulate real hardware on which NMI button event triggers LINT1. Because of this, NMI is sent to the processor even when LINT1 is maskied in LVT. For example, this causes the problem that kdump initiated by NMI sometimes doesn't work on KVM, because kdump assumes NMI is masked on CPUs other than CPU0. With this patch, KVM_NMI ioctl is handled as follows. - When in-kernel irqchip is enabled, KVM_NMI ioctl is handled as a request of triggering LINT1 on the processor. LINT1 is emulated in in-kernel irqchip. - When in-kernel irqchip is disabled, KVM_NMI ioctl is handled as a request of injecting NMI to the processor. This assumes LINT1 is already emulated in userland. (laijs) Changed from v1: Add KVM_NMI API document Add KVM_CAP_USER_NMI Signed-off-by: Kenji Kaneshigekaneshige.ke...@jp.fujitsu.com Tested-by: Lai Jiangshanla...@cn.fujitsu.com --- Documentation/virtual/kvm/api.txt | 20 arch/x86/kvm/irq.h|1 + arch/x86/kvm/lapic.c |7 +++ arch/x86/kvm/x86.c| 12 include/linux/kvm.h |3 +++ 5 files changed, 43 insertions(+), 0 deletions(-) diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt index b0e4b9c..5c24cc3 100644 --- a/Documentation/virtual/kvm/api.txt +++ b/Documentation/virtual/kvm/api.txt @@ -1430,6 +1430,26 @@ is supported; 2 if the processor requires all virtual machines to have an RMA, or 1 if the processor can use an RMA but doesn't require it, because it supports the Virtual RMA (VRMA) facility. +4.64 KVM_NMI + +Capability: KVM_CAP_USER_NMI +Architectures: x86 +Type: vcpu ioctl +Parameters: none +Returns: 0 on success, -1 on error + +This ioctl injects NMI to the vcpu. + +If with capability KVM_CAP_LAPIC_NMI, KVM_NMI ioctl is handled as follows: + + - When in-kernel irqchip is enabled, KVM_NMI ioctl is handled as a + request of triggering LINT1 on the processor. LINT1 is emulated in + in-kernel lapic irqchip. + + - When in-kernel irqchip is disabled, KVM_NMI ioctl is handled as a + request of injecting NMI to the processor. This assumes LINT1 is + already emulated in userland lapic. + 5. The kvm_run structure Application code obtains a pointer to the kvm_run structure by diff --git a/arch/x86/kvm/irq.h b/arch/x86/kvm/irq.h index 53e2d08..0c96315 100644 --- a/arch/x86/kvm/irq.h +++ b/arch/x86/kvm/irq.h @@ -95,6 +95,7 @@ void kvm_pic_reset(struct kvm_kpic_state *s); void kvm_inject_pending_timer_irqs(struct kvm_vcpu *vcpu); void kvm_inject_apic_timer_irqs(struct kvm_vcpu *vcpu); void kvm_apic_nmi_wd_deliver(struct kvm_vcpu *vcpu); +void kvm_apic_lint1_deliver(struct kvm_vcpu *vcpu); void __kvm_migrate_apic_timer(struct kvm_vcpu *vcpu); void __kvm_migrate_pit_timer(struct kvm_vcpu *vcpu); void __kvm_migrate_timers(struct kvm_vcpu *vcpu); diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c index 57dcbd4..87fe36a 100644 --- a/arch/x86/kvm/lapic.c +++ b/arch/x86/kvm/lapic.c @@ -1039,6 +1039,13 @@ void kvm_apic_nmi_wd_deliver(struct kvm_vcpu *vcpu) kvm_apic_local_deliver(apic, APIC_LVT0); } +void kvm_apic_lint1_deliver(struct kvm_vcpu *vcpu) +{ + struct kvm_lapic *apic = vcpu-arch.apic; + + kvm_apic_local_deliver(apic, APIC_LVT1); +} + static struct kvm_timer_ops lapic_timer_ops = { .is_periodic = lapic_is_periodic, }; diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 84a28ea..6862ef7 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -2729,12 +2729,24 @@ static int kvm_vcpu_ioctl_interrupt(struct kvm_vcpu *vcpu, return 0; } +#ifdef KVM_CAP_LAPIC_NMI +static int kvm_vcpu_ioctl_nmi(struct kvm_vcpu *vcpu) +{ + if (irqchip_in_kernel(vcpu-kvm)) + kvm_apic_lint1_deliver(vcpu); + else + kvm_inject_nmi(vcpu); + + return 0; +} +#else static int kvm_vcpu_ioctl_nmi(struct kvm_vcpu *vcpu) { kvm_inject_nmi(vcpu); return 0; } +#endif I don't think we need to keep old kvm_vcpu_ioctl_nmi() behavior because it's clearly a bug. Regards, Kenji Kaneshige
Re: [Qemu-devel] [PATCH 1/9] Add stub functions for PCI device models to do PCI DMA
On Wed, Oct 12, 2011 at 02:07:46PM +1100, David Gibson wrote: Um.. why? PCI is defined by the spec to be LE, so I don't see that we need explicit endianness versions for PCI helpers. LE in the spec only applies to structures defined by the spec, that is pci configuration and msix tables in device memory. -- MST
Re: [Qemu-devel] [PATCH] spice-input: migrate ledstate
Hi, There is no ledstate in a PS/2 keyboard (or I'm reading too much into the implementation in qemu). There is. It isn't explicitly stored into the state struct though because the ps/2 keyboard itself doesn't use it, it just calls kbd_put_ledstate() to inform others about it. cheers, Gerd
[Qemu-devel] [PATCH 1/2] Add opt_set_bool function
In addition to qemu_opt_set function, we need a function to set bool value also. Signed-off-by: M. Mohan Kumar mo...@in.ibm.com --- qemu-option.c | 35 +++ qemu-option.h |1 + 2 files changed, 36 insertions(+), 0 deletions(-) diff --git a/qemu-option.c b/qemu-option.c index 105d760..d6bc908 100644 --- a/qemu-option.c +++ b/qemu-option.c @@ -636,6 +636,41 @@ int qemu_opt_set(QemuOpts *opts, const char *name, const char *value) return 0; } +int qemu_opt_set_bool(QemuOpts *opts, const char *name, int val) +{ +QemuOpt *opt; +const QemuOptDesc *desc = opts-list-desc; +int i; + +for (i = 0; desc[i].name != NULL; i++) { +if (strcmp(desc[i].name, name) == 0) { +break; +} +} +if (desc[i].name == NULL) { +if (i == 0) { +/* empty list - allow any */; +} else { +qerror_report(QERR_INVALID_PARAMETER, name); +return -1; +} +} + +opt = g_malloc0(sizeof(*opt)); +opt-name = g_strdup(name); +opt-opts = opts; +QTAILQ_INSERT_TAIL(opts-head, opt, next); +if (desc[i].name != NULL) { +opt-desc = desc+i; +} +opt-value.boolean = !!val; +if (qemu_opt_parse(opt) 0) { +qemu_opt_del(opt); +return -1; +} +return 0; +} + int qemu_opt_foreach(QemuOpts *opts, qemu_opt_loopfunc func, void *opaque, int abort_on_failure) { diff --git a/qemu-option.h b/qemu-option.h index b515813..af4d36b 100644 --- a/qemu-option.h +++ b/qemu-option.h @@ -109,6 +109,7 @@ int qemu_opt_get_bool(QemuOpts *opts, const char *name, int defval); uint64_t qemu_opt_get_number(QemuOpts *opts, const char *name, uint64_t defval); uint64_t qemu_opt_get_size(QemuOpts *opts, const char *name, uint64_t defval); int qemu_opt_set(QemuOpts *opts, const char *name, const char *value); +int qemu_opt_set_bool(QemuOpts *opts, const char *name, int val); typedef int (*qemu_opt_loopfunc)(const char *name, const char *value, void *opaque); int qemu_opt_foreach(QemuOpts *opts, qemu_opt_loopfunc func, void *opaque, int abort_on_failure); -- 1.7.6
[Qemu-devel] [PATCH V4 2/2] hw/9pfs: Add readonly support for 9p export
A new fsdev parameter readonly is introduced to control accessing 9p export. readonly=on|off can be used to specify the access type. By default rw access is given. Signed-off-by: M. Mohan Kumar mo...@in.ibm.com --- Changes from previous version V3: * Use opt_set_bool function to set readonly option * Change the flag from MS_READONLY to 9p specific Change from previous version V2: * QEMU_OPT_BOOL is used for readdonly parameter Changes from previous version: * Use readonly option instead of access * Change function return type to boolean where its needed fsdev/file-op-9p.h |3 +- fsdev/qemu-fsdev.c | 12 +- fsdev/qemu-fsdev.h |1 + hw/9pfs/virtio-9p-device.c |3 ++ hw/9pfs/virtio-9p.c| 46 qemu-config.c |7 ++ vl.c |2 + 7 files changed, 71 insertions(+), 3 deletions(-) diff --git a/fsdev/file-op-9p.h b/fsdev/file-op-9p.h index 33fb07f..b75290d 100644 --- a/fsdev/file-op-9p.h +++ b/fsdev/file-op-9p.h @@ -58,7 +58,8 @@ typedef struct extended_ops { } extended_ops; /* FsContext flag values */ -#define PATHNAME_FSCONTEXT 0x1 +#define PATHNAME_FSCONTEXT 0x1 +#define P9_RDONLY_EXPORT0x2 /* cache flags */ #define V9FS_WRITETHROUGH_CACHE 0x1 diff --git a/fsdev/qemu-fsdev.c b/fsdev/qemu-fsdev.c index d08ba9c..f8a8227 100644 --- a/fsdev/qemu-fsdev.c +++ b/fsdev/qemu-fsdev.c @@ -29,13 +29,13 @@ static FsTypeTable FsTypes[] = { int qemu_fsdev_add(QemuOpts *opts) { struct FsTypeListEntry *fsle; -int i; +int i, flags = 0; const char *fsdev_id = qemu_opts_id(opts); const char *fstype = qemu_opt_get(opts, fstype); const char *path = qemu_opt_get(opts, path); const char *sec_model = qemu_opt_get(opts, security_model); const char *cache = qemu_opt_get(opts, cache); - +int rdonly = qemu_opt_get_bool(opts, readonly, 0); if (!fsdev_id) { fprintf(stderr, fsdev: No id specified\n); @@ -76,6 +76,14 @@ int qemu_fsdev_add(QemuOpts *opts) fsle-fse.cache_flags = V9FS_WRITETHROUGH_CACHE; } } +if (rdonly) { +flags |= P9_RDONLY_EXPORT; +} else { +flags = ~P9_RDONLY_EXPORT; +} + +fsle-fse.flags = flags; + QTAILQ_INSERT_TAIL(fstype_entries, fsle, next); return 0; } diff --git a/fsdev/qemu-fsdev.h b/fsdev/qemu-fsdev.h index 0f67880..2938eaf 100644 --- a/fsdev/qemu-fsdev.h +++ b/fsdev/qemu-fsdev.h @@ -44,6 +44,7 @@ typedef struct FsTypeEntry { char *security_model; int cache_flags; FileOperations *ops; +int flags; } FsTypeEntry; typedef struct FsTypeListEntry { diff --git a/hw/9pfs/virtio-9p-device.c b/hw/9pfs/virtio-9p-device.c index 1846e36..336292c 100644 --- a/hw/9pfs/virtio-9p-device.c +++ b/hw/9pfs/virtio-9p-device.c @@ -125,6 +125,9 @@ VirtIODevice *virtio_9p_init(DeviceState *dev, V9fsConf *conf) s-tag_len = len; s-ctx.uid = -1; s-ctx.flags = 0; +if (fse-flags P9_RDONLY_EXPORT) { +s-ctx.flags |= P9_RDONLY_EXPORT; +} s-ops = fse-ops; s-vdev.get_features = virtio_9p_get_features; diff --git a/hw/9pfs/virtio-9p.c b/hw/9pfs/virtio-9p.c index 47ed2f1..9f15787 100644 --- a/hw/9pfs/virtio-9p.c +++ b/hw/9pfs/virtio-9p.c @@ -1271,6 +1271,11 @@ static void v9fs_fix_path(V9fsPath *dst, V9fsPath *src, int len) dst-size++; } +static inline bool is_ro_export(FsContext *fs_ctx) +{ +return fs_ctx-flags P9_RDONLY_EXPORT; +} + static void v9fs_version(void *opaque) { V9fsPDU *pdu = opaque; @@ -1690,6 +1695,14 @@ static void v9fs_open(void *opaque) } else { flags = omode_to_uflags(mode); } +if (is_ro_export(s-ctx)) { +if (mode O_WRONLY || mode O_RDWR || mode O_APPEND) { +err = -EROFS; +goto out; +} else { +flags |= O_NOATIME; +} +} err = v9fs_co_open(pdu, fidp, flags); if (err 0) { goto out; @@ -3301,6 +3314,33 @@ static void v9fs_op_not_supp(void *opaque) complete_pdu(pdu-s, pdu, -EOPNOTSUPP); } +static inline bool is_read_only_op(int id) +{ +switch (id) { +case P9_TREADDIR: +case P9_TSTATFS: +case P9_TGETATTR: +case P9_TXATTRWALK: +case P9_TLOCK: +case P9_TGETLOCK: +case P9_TREADLINK: +case P9_TVERSION: +case P9_TLOPEN: +case P9_TATTACH: +case P9_TSTAT: +case P9_TWALK: +case P9_TCLUNK: +case P9_TFSYNC: +case P9_TOPEN: +case P9_TREAD: +case P9_TAUTH: +case P9_TFLUSH: +return 1; +default: +return 0; +} +} + static void submit_pdu(V9fsState *s, V9fsPDU *pdu) { Coroutine *co; @@ -3312,6 +3352,12 @@ static void submit_pdu(V9fsState *s, V9fsPDU *pdu) } else { handler = pdu_co_handlers[pdu-id]; } + +if (is_ro_export(s-ctx) !is_read_only_op(pdu-id)) { +complete_pdu(s,
Re: [Qemu-devel] [PATCH] qed: fix use-after-free during l2 cache commit
On Tue, Oct 11, 2011 at 04:22:11PM +0200, Kevin Wolf wrote: Am 30.09.2011 17:49, schrieb Amit Shah: On (Fri) 30 Sep 2011 [16:23:30], Stefan Hajnoczi wrote: On Fri, Sep 30, 2011 at 12:27 PM, Amit Shah amit.s...@redhat.com wrote: On (Fri) 30 Sep 2011 [11:39:11], Stefan Hajnoczi wrote: QED's metadata caching strategy allows two parallel requests to race for metadata lookup. The first one to complete will populate the metadata cache and the second one will drop the data it just read in favor of the cached data. There is a use-after-free in qed_read_l2_table_cb() and qed_commit_l2_update() where l2_table-offset was used after the l2_table may have been freed due to a metadata lookup race. Fix this by keeping the l2_offset in a local variable and not reaching into the possibly freed l2_table. Reported-by: Amit Shah amit.s...@redhat.com Signed-off-by: Stefan Hajnoczi stefa...@linux.vnet.ibm.com --- Hi Amit, Thanks for reporting the assertion failure you saw at http://fpaste.org/CDuv/. Does this patch fix the problem? Yes, this fixes it. Were you able to reliably reproduce the assertion failure before? Absolutely. I even reverted the patch and tried the same image; same segfault again. I wonder because this only happens when two metadata lookups race (which is rare enough on my setup that I've never seen this failure). It might be worth trying a few times. Get the F16 beta-rc LXE live iso, install guest. It doesn't cleanly reboot, you have to kill the VM. Next start of the VM produces this segfault. https://alt.fedoraproject.org/pub/alt/stage/16-Beta.RC2/Live/x86_64/Fedora-16-Beta-x86_64-Live-LXDE.iso Can we try to artificially produce it in a qemu-iotests case? I will take a look. Stefan
[Qemu-devel] [PATCH] hw/9pfs: Handle Security model parsing
Security model is needed only for 'local' fs driver. Signed-off-by: M. Mohan Kumar mo...@in.ibm.com --- fsdev/qemu-fsdev.c |6 + fsdev/qemu-fsdev.h |1 + hw/9pfs/virtio-9p-device.c | 47 ++- vl.c | 20 +++-- 4 files changed, 43 insertions(+), 31 deletions(-) diff --git a/fsdev/qemu-fsdev.c b/fsdev/qemu-fsdev.c index 36db127..d08ba9c 100644 --- a/fsdev/qemu-fsdev.c +++ b/fsdev/qemu-fsdev.c @@ -58,11 +58,6 @@ int qemu_fsdev_add(QemuOpts *opts) return -1; } -if (!sec_model) { -fprintf(stderr, fsdev: No security_model specified.\n); -return -1; -} - if (!path) { fprintf(stderr, fsdev: No path specified.\n); return -1; @@ -72,6 +67,7 @@ int qemu_fsdev_add(QemuOpts *opts) fsle-fse.fsdev_id = g_strdup(fsdev_id); fsle-fse.path = g_strdup(path); +fsle-fse.fsdriver = g_strdup(fstype); fsle-fse.security_model = g_strdup(sec_model); fsle-fse.ops = FsTypes[i].ops; fsle-fse.cache_flags = 0; diff --git a/fsdev/qemu-fsdev.h b/fsdev/qemu-fsdev.h index 9c440f2..0f67880 100644 --- a/fsdev/qemu-fsdev.h +++ b/fsdev/qemu-fsdev.h @@ -40,6 +40,7 @@ typedef struct FsTypeTable { typedef struct FsTypeEntry { char *fsdev_id; char *path; +char *fsdriver; char *security_model; int cache_flags; FileOperations *ops; diff --git a/hw/9pfs/virtio-9p-device.c b/hw/9pfs/virtio-9p-device.c index aac58ad..1846e36 100644 --- a/hw/9pfs/virtio-9p-device.c +++ b/hw/9pfs/virtio-9p-device.c @@ -83,29 +83,30 @@ VirtIODevice *virtio_9p_init(DeviceState *dev, V9fsConf *conf) exit(1); } -if (!strcmp(fse-security_model, passthrough)) { -/* Files on the Fileserver set to client user credentials */ -s-ctx.fs_sm = SM_PASSTHROUGH; -s-ctx.xops = passthrough_xattr_ops; -} else if (!strcmp(fse-security_model, mapped)) { -/* Files on the fileserver are set to QEMU credentials. - * Client user credentials are saved in extended attributes. - */ -s-ctx.fs_sm = SM_MAPPED; -s-ctx.xops = mapped_xattr_ops; -} else if (!strcmp(fse-security_model, none)) { -/* - * Files on the fileserver are set to QEMU credentials. - */ -s-ctx.fs_sm = SM_NONE; -s-ctx.xops = none_xattr_ops; -} else { -fprintf(stderr, Default to security_model=none. You may want - enable advanced security model using -security option:\n\t security_model=passthrough\n\t -security_model=mapped\n); -s-ctx.fs_sm = SM_NONE; -s-ctx.xops = none_xattr_ops; +/* security models is needed only for local fs driver */ +if (!strcmp(fse-fsdriver, local)) { +if (!strcmp(fse-security_model, passthrough)) { +/* Files on the Fileserver set to client user credentials */ +s-ctx.fs_sm = SM_PASSTHROUGH; +s-ctx.xops = passthrough_xattr_ops; +} else if (!strcmp(fse-security_model, mapped)) { +/* Files on the fileserver are set to QEMU credentials. +* Client user credentials are saved in extended attributes. +*/ +s-ctx.fs_sm = SM_MAPPED; +s-ctx.xops = mapped_xattr_ops; +} else if (!strcmp(fse-security_model, none)) { +/* +* Files on the fileserver are set to QEMU credentials. +*/ +s-ctx.fs_sm = SM_NONE; +s-ctx.xops = none_xattr_ops; +} else { +fprintf(stderr, Invalid security_model %s specified.\n +Available security models are:\t +passthrough,mapped or none\n, fse-security_model); +exit(1); +} } s-ctx.cache_flags = fse-cache_flags; diff --git a/vl.c b/vl.c index 6760e39..a961fa3 100644 --- a/vl.c +++ b/vl.c @@ -2795,6 +2795,7 @@ int main(int argc, char **argv, char **envp) QemuOpts *fsdev; QemuOpts *device; const char *cache; +const char *fsdriver; olist = qemu_find_opts(virtfs); if (!olist) { @@ -2809,13 +2810,26 @@ int main(int argc, char **argv, char **envp) if (qemu_opt_get(opts, fstype) == NULL || qemu_opt_get(opts, mount_tag) == NULL || -qemu_opt_get(opts, path) == NULL || -qemu_opt_get(opts, security_model) == NULL) { +qemu_opt_get(opts, path) == NULL) { fprintf(stderr, Usage: -virtfs fstype,path=/share_path/, -security_model=[mapped|passthrough|none], +{security_model=[mapped|passthrough|none]}, mount_tag=tag.\n); exit(1); } +fsdriver =
Re: [Qemu-devel] [PATCH] hw/9pfs: Handle Security model parsing
On Wed, Oct 12, 2011 at 01:24:16PM +0530, M. Mohan Kumar wrote: Security model is needed only for 'local' fs driver. Signed-off-by: M. Mohan Kumar mo...@in.ibm.com --- fsdev/qemu-fsdev.c |6 + fsdev/qemu-fsdev.h |1 + hw/9pfs/virtio-9p-device.c | 47 ++- vl.c | 20 +++-- 4 files changed, 43 insertions(+), 31 deletions(-) --- a/fsdev/qemu-fsdev.h +++ b/fsdev/qemu-fsdev.h @@ -40,6 +40,7 @@ typedef struct FsTypeTable { typedef struct FsTypeEntry { char *fsdev_id; char *path; +char *fsdriver; char *security_model; int cache_flags; FileOperations *ops; diff --git a/hw/9pfs/virtio-9p-device.c b/hw/9pfs/virtio-9p-device.c index aac58ad..1846e36 100644 --- a/hw/9pfs/virtio-9p-device.c +++ b/hw/9pfs/virtio-9p-device.c @@ -83,29 +83,30 @@ VirtIODevice *virtio_9p_init(DeviceState *dev, V9fsConf *conf) exit(1); } -if (!strcmp(fse-security_model, passthrough)) { -/* Files on the Fileserver set to client user credentials */ -s-ctx.fs_sm = SM_PASSTHROUGH; -s-ctx.xops = passthrough_xattr_ops; -} else if (!strcmp(fse-security_model, mapped)) { -/* Files on the fileserver are set to QEMU credentials. - * Client user credentials are saved in extended attributes. - */ -s-ctx.fs_sm = SM_MAPPED; -s-ctx.xops = mapped_xattr_ops; -} else if (!strcmp(fse-security_model, none)) { -/* - * Files on the fileserver are set to QEMU credentials. - */ -s-ctx.fs_sm = SM_NONE; -s-ctx.xops = none_xattr_ops; -} else { -fprintf(stderr, Default to security_model=none. You may want - enable advanced security model using -security option:\n\t security_model=passthrough\n\t -security_model=mapped\n); -s-ctx.fs_sm = SM_NONE; -s-ctx.xops = none_xattr_ops; +/* security models is needed only for local fs driver */ +if (!strcmp(fse-fsdriver, local)) { +if (!strcmp(fse-security_model, passthrough)) { +/* Files on the Fileserver set to client user credentials */ +s-ctx.fs_sm = SM_PASSTHROUGH; +s-ctx.xops = passthrough_xattr_ops; +} else if (!strcmp(fse-security_model, mapped)) { +/* Files on the fileserver are set to QEMU credentials. +* Client user credentials are saved in extended attributes. +*/ +s-ctx.fs_sm = SM_MAPPED; +s-ctx.xops = mapped_xattr_ops; +} else if (!strcmp(fse-security_model, none)) { +/* +* Files on the fileserver are set to QEMU credentials. +*/ +s-ctx.fs_sm = SM_NONE; +s-ctx.xops = none_xattr_ops; +} else { +fprintf(stderr, Invalid security_model %s specified.\n +Available security models are:\t +passthrough,mapped or none\n, fse-security_model); +exit(1); +} Are you sure there aren't use cases where people would like to choose between passthrough mapped, even when using the 'proxy' or 'handle' security drivers. Both of the security models seem pretty generally useful to me, regardless of the driver type. Regards, Daniel -- |: http://berrange.com -o-http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
Re: [Qemu-devel] [PATCH 1/9] Add stub functions for PCI device models to do PCI DMA
On Wed, Oct 12, 2011 at 02:11:37PM +1100, David Gibson wrote: On Sun, Oct 02, 2011 at 12:52:39PM +0200, Michael S. Tsirkin wrote: On Sun, Oct 02, 2011 at 12:29:08PM +0200, Avi Kivity wrote: On 10/02/2011 12:25 PM, Michael S. Tsirkin wrote: On Mon, Sep 05, 2011 at 02:34:56PM +1000, David Gibson wrote: This patch adds functions to pci.[ch] to perform PCI DMA operations. At present, these are just stubs which perform directly cpu physical memory accesses. Using these stubs, however, distinguishes PCI device DMA transactions from other accesses to physical memory, which will allow PCI IOMMU support to be added in one place, rather than updating every PCI driver at that time. That is, it allows us to update individual PCI drivers to support an IOMMU without having yet determined the details of how the IOMMU emulation will operate. This will let us remove the most bitrot-sensitive part of an IOMMU patch in advance. Signed-off-by: David Gibsonda...@gibson.dropbear.id.au So something I just thought about: all wrappers now go through cpu_physical_memory_rw. This is a problem as e.g. virtio assumes that accesses such as stw are atomic. cpu_physical_memory_rw is a memcpy which makes no such guarantees. Let's change cpu_physical_memory_rw() to provide that guarantee for aligned two and four byte accesses. Having separate paths just for that is not maintainable. Well, we also have stX_phys convert to target native endian-ness (nop for KVM but not necessarily for qemu). Yes.. as do the stX_pci_dma() helpers. They assume LE, rather than having two variants, because PCI is an LE spec, and all normal PCI devices work in LE. IMO, not really. PCI devices do DMA any way they like. LE is probably more common because both ARM and x86 processors are LE. If we need to model some perverse BE PCI device, it can reswap itself. An explicit API for this would be cleaner. -- MST
Re: [Qemu-devel] [PATCH 01/36] ds1225y: Use stdio instead of QEMUFile
On 10/11/2011 06:00 PM, Juan Quintela wrote: QEMUFile * is only intended for migration nowadays. Using it for anything else just adds pain and a layer of buffers for no good reason. Signed-off-by: Juan Quintelaquint...@redhat.com --- hw/ds1225y.c | 28 1 files changed, 16 insertions(+), 12 deletions(-) diff --git a/hw/ds1225y.c b/hw/ds1225y.c index 9875c44..6852a61 100644 --- a/hw/ds1225y.c +++ b/hw/ds1225y.c @@ -29,7 +29,7 @@ typedef struct { DeviceState qdev; uint32_t chip_size; char *filename; -QEMUFile *file; +FILE *file; uint8_t *contents; } NvRamState; @@ -70,9 +70,9 @@ static void nvram_writeb (void *opaque, target_phys_addr_t addr, uint32_t val) s-contents[addr] = val; if (s-file) { -qemu_fseek(s-file, addr, SEEK_SET); -qemu_put_byte(s-file, (int)val); -qemu_fflush(s-file); +fseek(s-file, addr, SEEK_SET); +fputc(val, s-file); +fflush(s-file); } } @@ -108,15 +108,17 @@ static int nvram_post_load(void *opaque, int version_id) /* Close file, as filename may has changed in load/store process */ if (s-file) { -qemu_fclose(s-file); +fclose(s-file); } /* Write back nvram contents */ -s-file = qemu_fopen(s-filename, wb); +s-file = fopen(s-filename, wb); if (s-file) { /* Write back contents, as 'wb' mode cleaned the file */ -qemu_put_buffer(s-file, s-contents, s-chip_size); -qemu_fflush(s-file); +if (fwrite(s-contents, s-chip_size, 1, s-file) != 1) { +printf(nvram_post_load: short write\n); +} +fflush(s-file); } return 0; @@ -143,7 +145,7 @@ typedef struct { static int nvram_sysbus_initfn(SysBusDevice *dev) { NvRamState *s =FROM_SYSBUS(SysBusNvRamState, dev)-nvram; -QEMUFile *file; +FILE *file; int s_io; s-contents = g_malloc0(s-chip_size); @@ -153,11 +155,13 @@ static int nvram_sysbus_initfn(SysBusDevice *dev) sysbus_init_mmio(dev, s-chip_size, s_io); /* Read current file */ -file = qemu_fopen(s-filename, rb); +file = fopen(s-filename, rb); if (file) { /* Read nvram contents */ -qemu_get_buffer(file, s-contents, s-chip_size); -qemu_fclose(file); +if (fread(s-contents, s-chip_size, 1, file) != 1) { +printf(nvram_sysbus_initfn: short read\n); +} +fclose(file); } nvram_post_load(s, 0); Tested-by: Zhi Hui Lizhihu...@linux.vnet.ibm.com
Re: [Qemu-devel] [PATCH 3/6] block: switch bdrv_read()/bdrv_write() to coroutines
On Tue, Oct 11, 2011 at 7:44 AM, Zhi Yong Wu zwu.ker...@gmail.com wrote: On Thu, Oct 6, 2011 at 12:17 AM, Stefan Hajnoczi stefa...@linux.vnet.ibm.com wrote: @@ -1101,36 +1144,7 @@ static void set_dirty_bitmap(BlockDriverState *bs, int64_t sector_num, int bdrv_write(BlockDriverState *bs, int64_t sector_num, const uint8_t *buf, int nb_sectors) { - BlockDriver *drv = bs-drv; - - if (!bs-drv) - return -ENOMEDIUM; - - if (bdrv_has_async_rw(drv) qemu_in_coroutine()) { - QEMUIOVector qiov; - struct iovec iov = { - .iov_base = (void *)buf, - .iov_len = nb_sectors * BDRV_SECTOR_SIZE, - }; - - qemu_iovec_init_external(qiov, iov, 1); - return bdrv_co_writev(bs, sector_num, nb_sectors, qiov); - } - - if (bs-read_only) - return -EACCES; - if (bdrv_check_request(bs, sector_num, nb_sectors)) - return -EIO; - - if (bs-dirty_bitmap) { - set_dirty_bitmap(bs, sector_num, nb_sectors, 1); - } - - if (bs-wr_highest_sector sector_num + nb_sectors - 1) { - bs-wr_highest_sector = sector_num + nb_sectors - 1; - } The above codes are removed, will it be safe? If you are checking that removing bs-wr_highest_sector code is okay, then yes, it is safe because bdrv_co_do_writev() does the dirty bitmap and wr_highest_sector updates. We haven't lost any code by unifying request processing - bdrv_co_do_writev() must do everything that bdrv_aio_writev() and bdrv_write() did. Stefan
Re: [Qemu-devel] [PATCH 1/9] Add stub functions for PCI device models to do PCI DMA
Hi, Yes.. as do the stX_pci_dma() helpers. They assume LE, rather than having two variants, because PCI is an LE spec, and all normal PCI devices work in LE. IMO, not really. PCI devices do DMA any way they like. LE is probably more common because both ARM and x86 processors are LE. Also having _le_ in the function name makes explicitly clear that the functions read/write little endian values and byteswaps if needed, which makes the code more readable. I'd suggest to add it even if there is no need for a _be_ companion as devices needing that are rare. cheers, Gerd
Re: [Qemu-devel] [PATCH 3/6] block: switch bdrv_read()/bdrv_write() to coroutines
On Wed, Oct 12, 2011 at 5:03 PM, Stefan Hajnoczi stefa...@gmail.com wrote: On Tue, Oct 11, 2011 at 7:44 AM, Zhi Yong Wu zwu.ker...@gmail.com wrote: On Thu, Oct 6, 2011 at 12:17 AM, Stefan Hajnoczi stefa...@linux.vnet.ibm.com wrote: @@ -1101,36 +1144,7 @@ static void set_dirty_bitmap(BlockDriverState *bs, int64_t sector_num, int bdrv_write(BlockDriverState *bs, int64_t sector_num, const uint8_t *buf, int nb_sectors) { - BlockDriver *drv = bs-drv; - - if (!bs-drv) - return -ENOMEDIUM; - - if (bdrv_has_async_rw(drv) qemu_in_coroutine()) { - QEMUIOVector qiov; - struct iovec iov = { - .iov_base = (void *)buf, - .iov_len = nb_sectors * BDRV_SECTOR_SIZE, - }; - - qemu_iovec_init_external(qiov, iov, 1); - return bdrv_co_writev(bs, sector_num, nb_sectors, qiov); - } - - if (bs-read_only) - return -EACCES; - if (bdrv_check_request(bs, sector_num, nb_sectors)) - return -EIO; How about the above four lines of codes? - - if (bs-dirty_bitmap) { - set_dirty_bitmap(bs, sector_num, nb_sectors, 1); - } - - if (bs-wr_highest_sector sector_num + nb_sectors - 1) { - bs-wr_highest_sector = sector_num + nb_sectors - 1; - } The above codes are removed, will it be safe? If you are checking that removing bs-wr_highest_sector code is okay, then yes, it is safe because bdrv_co_do_writev() does the dirty bitmap and wr_highest_sector updates. We haven't lost any code by unifying OK. got it. thanks. request processing - bdrv_co_do_writev() must do everything that bdrv_aio_writev() and bdrv_write() did. Stefan -- Regards, Zhi Yong Wu
Re: [Qemu-devel] [PATCH 1/9] Add stub functions for PCI device models to do PCI DMA
On Wed, Oct 12, 2011 at 02:09:26PM +1100, David Gibson wrote: On Mon, Oct 03, 2011 at 08:17:05AM -0500, Anthony Liguori wrote: On 10/02/2011 07:14 AM, Michael S. Tsirkin wrote: On Sun, Oct 02, 2011 at 02:01:10PM +0200, Avi Kivity wrote: Hmm, not entirely virtio specific, some devices use stX macros to do the conversion. E.g. stw_be_phys and stl_le_phys are used in several places. These are fine - explicit endianness. Right. So changing these to e.g. stl_dma and assuming LE is default seems like a step backwards. We're generalizing too much. In general, the device model doesn't need atomic access functions. Anthony, are you sure? PCI both provides atomic operations for devices (likely uncommon). PCI express spec strongly recommends at least dword update granularity for both reads and writes. Some guests might depend on this. That's because device model RAM access is not coherent with CPU RAM access. Virtio is a very, very special case. virtio requires coherent RAM access. E.g., e1000 driver seems to allocate its rings in coherent memory. Required? Your guess is as good as mine. It seems to work fine ATM without these guarantees. Right, but it should only need that for the actual rings in the virtio core. I was expecting that those would remain as direct physical memory accesses - precisely because virtio is special - rather than accesses through any kind of DMA interface. At the moment, yes. Further, that was just an example I know about. How about msi/msix? We don't want to split these writes as that would confuse the APIC. -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au| minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson
Re: [Qemu-devel] Is realview-pb-a8 fully supported ?
On 10 October 2011 14:48, Francis Moreau francis.m...@gmail.com wrote: On Mon, Oct 10, 2011 at 10:42 AM, Peter Maydell peter.mayd...@linaro.org wrote: On 10 October 2011 08:35, Francis Moreau francis.m...@gmail.com wrote: I noticed another point for the realview platofrm: if I boot with -M 512, it works however if I set -M 256 then it doesn't. Perhaps your kernel is configured to load in the higher 256MB address range hmm which options do you have in mind ? Hmm, I thought there was an option for this but I can't find it in the config, so I must have been misremembering somehow. When I say it doesn't work, it means that nothing happen when starting qemu: no trace, it looks like it's running an infinite loop. Not even Uncompressing the kernel ? If you want to track down what's going on then you'll need to connect an ARM gdb up to qemu and single step through the boot process, I'm afraid. BTW I'm wondering which kernel source I should use to build kernels for such plateforms (realview, vexpress, versatile) ? I'm currently using the source from kernel.org (well similar since this server seems really dead). but I'm not sure if it's a good idea... I think the mainline kernel sources should in theory work (in particular if they work with 512MB then that's a good sign...) but I'm not a kernel expert; mostly I use other peoples' prebuilt ones. -- PMM
Re: [Qemu-devel] qemu-0.15.1 stable call for patches
On 26/09/11 9:16 AM, Justin M. Forbes wrote: With the current patch queue I would like to start getting qemu-0.15.1 stable into shape. With this in mind, the plan is to tag the release on Monday Oct 3rd. If you have patches pending for stable, now would be the time to send them. Please CC jmfor...@linuxtx.org if you can to ensure that I see them. Thanks, Justin Is there anywhere where we can see what's been pulled into the pending 0.15.1 code base so far since you don't seem to really post to the list about the stable branches? -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.
Re: [Qemu-devel] [PATCH 1/2] hw/9pfs: Add new virtfs option cache=writethrough to skip host page cache
On Mon, Oct 10, 2011 at 10:06 AM, Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com wrote: diff --git a/hw/9pfs/virtio-9p-handle.c b/hw/9pfs/virtio-9p-handle.c index 5c8b5ed..441a37f 100644 --- a/hw/9pfs/virtio-9p-handle.c +++ b/hw/9pfs/virtio-9p-handle.c @@ -202,6 +202,15 @@ static ssize_t handle_pwritev(FsContext *ctx, int fd, const struct iovec *iov, return writev(fd, iov, iovcnt); The sync_file_range(2) call below is dead-code since we'll return immediately after writev(2) completes. The writev(2) return value needs to be saved temporarily and then returned after sync_file_range(2). } #endif + if (ctx-cache_flags V9FS_WRITETHROUGH_CACHE) { -drive cache=writethrough means something different from 9pfs writethrough. This is confusing so I wonder if there is a better name like immediate write-out. + /* + * Initiate a writeback. This is not a data integrity sync. + * We want to ensure that we don't leave dirty pages in the cache + * after write when cache=writethrough is sepcified. + */ + sync_file_range(fd, offset, 0, + SYNC_FILE_RANGE_WAIT_BEFORE | SYNC_FILE_RANGE_WRITE); + } I'm not sure whether SYNC_FILE_RANGE_WAIT_BEFORE is necessary. As a best-effort mechanism just SYNC_FILE_RANGE_WRITE does the job although a client that rapidly rewrites may be able to leave dirty pages in the host page cache. SYNC_FILE_RANGE_WAIT_BEFORE ensures that dirty pages get written out but it is no longer asynchronous because it blocks. Stefan
Re: [Qemu-devel] [BUG] USB assertion triggers in usb_packet_complete()
On Tue, Oct 11, 2011 at 8:35 AM, Thomas Huth th...@linux.vnet.ibm.com wrote: Am Mon, 10 Oct 2011 15:03:41 +0200 schrieb Thomas Huth th...@linux.vnet.ibm.com: I am currently facing a problem when running QEMU (up-to-date git version) with OHCI and a lot of virtual USB devices. The emulator dies with the following assertion: qemu-system-arm: hw/usb.c:337: usb_packet_complete: Assertion `p-owner != ((void *)0)' failed. Hi Thomas, I hit the same bug recently and Gerd has posted a patch which you can test: http://patchwork.ozlabs.org/patch/118726/ Stefan
[Qemu-devel] PCI 64-bit BAR access with qemu
Hi there, I've read a few days ago that it was possible to emulate PCI device with 64-bit BARs and have a real 64-bit memory access. Thus, I've created a virtual device named toto accessible through a 64-bit BAR ___ static const MemoryRegionOps bxi_common_mmio_ops = { .read = toto_mmio_read, .write = toto_mmio_write, .endianness = DEVICE_LITTLE_ENDIAN, .impl = { .min_access_size = 1, .max_access_size = 8, }, }; ___ memory_region_init_io(d-mmio, toto_mmio_ops, d, toto-mmio, 0x1000); pci_register_bar(d-dev, BAR_0, PCI_BASE_ADDRESS_SPACE_MEMORY | PCI_BASE_ADDRESS_MEM_TYPE_64, d-mmio); when I use in my driver pointers to unsigned char, unsigned short or unsigned int, I can see that toto_mmio_read is called with increasing sizes. But that does not work at all with unsigned long long. In such a case, I always obtain two calls to toto_mmio_read with sizes limited to 4. My understanding here is that a direct 64-bit memory access does not work (but is composed of 2 32-bit accesses). Then, I've tried to understand why this happens and realized that cpu_physical_memory_rw was always called for this memory region (for in the old_mmio manner) : ___ (gdb) bt #0 toto_mmio_read (opaque=0x2bec2c0, addr=0, size=4) at /home/workspace/qemu/hw/toto_hw.c:92 #1 0x005f6d37 in memory_region_read_accessor (opaque=0x2bec738, addr=0, value=0x7f9997ffec78, size= 4, shift=0, mask=4294967295) at /home/workspace/qemu/memory.c:239 #2 0x005f6ee0 in access_with_adjusted_size (addr=0, value=0x7f9997ffec78, size=4, access_size_min=1, access_size_max=8, access=0x5f6cdf memory_region_read_accessor, opaque=0x2bec738) at /home/workspace/qemu/memory.c:284 #3 0x005f8909 in memory_region_read_thunk_n (_mr=0x2bec738, addr=0, size=4) at /home/workspace/qemu/memory.c:824 #4 0x005f8af1 in memory_region_read_thunk_l (mr=0x2bec738, addr=0) at /home/workspace/qemu/memory.c:867 #5 0x005c8f69 in cpu_physical_memory_rw (addr=3892314112, buf=0x7f999c5d0028 \320\b, len=8, is_write=0) at /home/workspace/qemu/exec.c:3965 #6 0x005ecfa1 in kvm_cpu_exec (env=0x238ee10) at /home/workspace/qemu/kvm-all.c:985 #7 0x005bf1fb in qemu_kvm_cpu_thread_fn (arg=0x238ee10) at /home/workspace/qemu/cpus.c:661 #8 0x0034dfc077e1 in start_thread () from /lib64/libpthread.so.0 #9 0x0034df4e68ed in clone () from /lib64/libc.so.6 Did I miss anything ? Is it a real defect ? Are there developments still planned to allow a direct 64-bit access ? Thanks for any help, François
Re: [Qemu-devel] [PATCH 0/3] block: zero write detection
On Tue, Oct 11, 2011 at 03:46:28PM +0200, Kevin Wolf wrote: Am 07.10.2011 17:49, schrieb Stefan Hajnoczi: Image streaming copies data from the backing file into the image file. It is important to represent zero regions from the backing file efficiently during streaming, otherwise the image file grows to the full virtual disk size and loses sparseness. There are two ways to implement zero write detection, they are subtly different: 1. Allow image formats to provide efficient representations for zero regions. QED does this with zero clusters and it has been discussed for qcow2v3. 2. During streaming, check for zeroes and skip writing to the image file when zeroes are detected. However, there are some disadvantages to #2 because it leaves unallocated holes in the image file. If image streaming is aborted before it completes then it will be necessary to reread all unallocated clusters from the backing file upon resuming image streaming. Potentionally worse is that a backing file over a slow remote connection will have the zero regions fetched again and again if the guest accesses them. #1 avoids these problems because the image file contains information on which regions are zeroes and do not need to be refetched. This patch series implements #1 with the existing QED zero cluster feature. In the future we can add qcow2v3 zero clusters too. We can also implement #2 directly in the image streaming code as a fallback when the BlockDriver does not support zero detection #1 itself. That way we get the best possible zero write detection, depending on the image format. Here is a qemu-iotest to verify that zero write detection is working: http://repo.or.cz/w/qemu-iotests/stefanha.git/commitdiff/226949695eef51bdcdea3e6ce3d7e5a863427f37 Stefan Hajnoczi (3): block: add zero write detection interface qed: add zero write detection support qemu-io: add zero write detection option block.c | 16 +++ block.h |2 + block/qed.c | 81 +-- block_int.h | 13 + qemu-io.c | 35 - 5 files changed, 132 insertions(+), 15 deletions(-) It's good to have an option to detect zero writes and turn them into zero clusters, but it's something that introduces some overhead and probably won't be suitable as a default. Yes, this series simply has a bdrv_set_zero_detection() API to toggle it at runtime. By default it is off to save CPU cycles. I think what we really want to have for image streaming is an API that explicitly writes zeros and doesn't have to look at the whole buffer (or actually doesn't even get a buffer). I didn't take this approach to avoid having block drivers handle the zero buffers that need to be allocated when the region does not cover entire clusters. It can be done for sure but I'm not sure how to do it nicely yet. Stefan
Re: [Qemu-devel] PCI 64-bit BAR access with qemu
I've read a few days ago that it was possible to emulate PCI device with 64-bit BARs and have a real 64-bit memory access. Thus, I've created a virtual device named toto accessible through a 64-bit BAR You've probably confused an ability to locate BAR anywhere in 64-bit address space (such BAR actually spans 2 consecutive PCI BAR registers and has 100 in its 3 least significant bits) and an ability to access BAR-mapped memory in 64 bit items. You obviously want the latter but currently it is not implemented, see e.g. static inline DATA_TYPE glue(io_read, SUFFIX)(target_phys_addr_t physaddr, target_ulong addr, void *retaddr) definition in the softmmu_template.h. And it's quite simple to fix it, you only need to change io_{read,write} in the softmmu_template.h and extend io_mem_{read,write} loops in exec.c to 4 elements, taking care that io_mem_{read,write}[3] can pass uint64_t. -- Thanks. -- Max
Re: [Qemu-devel] [PATCH 0/3] block: zero write detection
Am 12.10.2011 12:39, schrieb Stefan Hajnoczi: On Tue, Oct 11, 2011 at 03:46:28PM +0200, Kevin Wolf wrote: Am 07.10.2011 17:49, schrieb Stefan Hajnoczi: Image streaming copies data from the backing file into the image file. It is important to represent zero regions from the backing file efficiently during streaming, otherwise the image file grows to the full virtual disk size and loses sparseness. There are two ways to implement zero write detection, they are subtly different: 1. Allow image formats to provide efficient representations for zero regions. QED does this with zero clusters and it has been discussed for qcow2v3. 2. During streaming, check for zeroes and skip writing to the image file when zeroes are detected. However, there are some disadvantages to #2 because it leaves unallocated holes in the image file. If image streaming is aborted before it completes then it will be necessary to reread all unallocated clusters from the backing file upon resuming image streaming. Potentionally worse is that a backing file over a slow remote connection will have the zero regions fetched again and again if the guest accesses them. #1 avoids these problems because the image file contains information on which regions are zeroes and do not need to be refetched. This patch series implements #1 with the existing QED zero cluster feature. In the future we can add qcow2v3 zero clusters too. We can also implement #2 directly in the image streaming code as a fallback when the BlockDriver does not support zero detection #1 itself. That way we get the best possible zero write detection, depending on the image format. Here is a qemu-iotest to verify that zero write detection is working: http://repo.or.cz/w/qemu-iotests/stefanha.git/commitdiff/226949695eef51bdcdea3e6ce3d7e5a863427f37 Stefan Hajnoczi (3): block: add zero write detection interface qed: add zero write detection support qemu-io: add zero write detection option block.c | 16 +++ block.h |2 + block/qed.c | 81 +-- block_int.h | 13 + qemu-io.c | 35 - 5 files changed, 132 insertions(+), 15 deletions(-) It's good to have an option to detect zero writes and turn them into zero clusters, but it's something that introduces some overhead and probably won't be suitable as a default. Yes, this series simply has a bdrv_set_zero_detection() API to toggle it at runtime. By default it is off to save CPU cycles. I think what we really want to have for image streaming is an API that explicitly writes zeros and doesn't have to look at the whole buffer (or actually doesn't even get a buffer). I didn't take this approach to avoid having block drivers handle the zero buffers that need to be allocated when the region does not cover entire clusters. It can be done for sure but I'm not sure how to do it nicely yet. If I understand your QED code right, in such cases it ignores that there are some zeros that could be turned into a zero cluster. Considering this and that you always fill a buffer just to be able to check it (which is known to take considerable time from qemu-img convert experience) - how could any solution that works consistently, but requires an allocation in the block driver be less nice? Kevin
[Qemu-devel] balloon driver on winxp guest start failed
Hi, I used balloon driver for windows virtio-win-0.1-15.iso (from http://alt.fedoraproject.org/pub/alt/virtio-win/latest/images/bin/) following the install guard , I installed the balloon driver like this: devcon.exe install d:\wxp\x86\balloon.inf PCI\VEN_1AF4DEV_1002SUBSYS_00051AF4REV_00 then reboot guest Os, but the status of driver installed is always incorrect, that show me the driver start failed (code 10) in the device manager. I typed the following cmds in the monitor command line: (qemu) device_add virtio-balloon (qemu) info balloon balloon: actual=2048 (qemu) balloon 1024 (qemu) info balloon balloon: actual=2048 (qemu) info balloon balloon: actual=2048 And I also tried it by using qemu -balloon virtio param when getting qemu up, the status is worse, the winxp guest froze at boot screen. Am I using balloon driver in a correct way?
Re: [Qemu-devel] [BUG] USB assertion triggers in usb_packet_complete()
Am Wed, 12 Oct 2011 11:02:42 +0100 schrieb Stefan Hajnoczi stefa...@gmail.com: On Tue, Oct 11, 2011 at 8:35 AM, Thomas Huth th...@linux.vnet.ibm.com wrote: Am Mon, 10 Oct 2011 15:03:41 +0200 schrieb Thomas Huth th...@linux.vnet.ibm.com: I am currently facing a problem when running QEMU (up-to-date git version) with OHCI and a lot of virtual USB devices. The emulator dies with the following assertion: qemu-system-arm: hw/usb.c:337: usb_packet_complete: Assertion `p-owner != ((void *)0)' failed. Hi Thomas, I hit the same bug recently and Gerd has posted a patch which you can test: http://patchwork.ozlabs.org/patch/118726/ Thanks for the hint, Stefan, you're right, that seems to be the same bug. Your patch is working fine in my scenario, too. However, Gerd's patch is not working for me, the assertion still triggers. It seems like usb_packet_complete() is called for the leaf node before it is called for the hub node, so the leaf node already set p-owner = NULL. Thomas
[Qemu-devel] [PATCH] usb-hid: activate usb tablet / mouse after migration.
qemu uses the ps/2 mouse by default. The usb tablet (or mouse) is activated as soon as qemu sees some guest activity on the device, i.e. polling for HID events. That used to work fine for both fresh boot and migration. Remote wakeup support changed the picture though: There will be no polling after migration in case the guest suspended the usb bus, waiting for wakeup events. Result is that the ps/2 mouse stays active. Fix this by activating the usb tablet / mouse in post_load() in case the guest enabled remote wakeup. Signed-off-by: Gerd Hoffmann kra...@redhat.com --- hw/usb-hid.c | 11 +++ 1 files changed, 11 insertions(+), 0 deletions(-) diff --git a/hw/usb-hid.c b/hw/usb-hid.c index 7c564b6..997f828 100644 --- a/hw/usb-hid.c +++ b/hw/usb-hid.c @@ -520,10 +520,21 @@ static int usb_keyboard_initfn(USBDevice *dev) return usb_hid_initfn(dev, HID_KEYBOARD); } +static int usb_ptr_post_load(void *opaque, int version_id) +{ +USBHIDState *s = opaque; + +if (s-dev.remote_wakeup) { +hid_pointer_activate(s-hid); +} +return 0; +} + static const VMStateDescription vmstate_usb_ptr = { .name = usb-ptr, .version_id = 1, .minimum_version_id = 1, +.post_load = usb_ptr_post_load, .fields = (VMStateField []) { VMSTATE_USB_DEVICE(dev, USBHIDState), VMSTATE_HID_POINTER_DEVICE(hid, USBHIDState), -- 1.7.1
Re: [Qemu-devel] [PATCH 0/3] block: zero write detection
On Wed, Oct 12, 2011 at 12:03 PM, Kevin Wolf kw...@redhat.com wrote: Am 12.10.2011 12:39, schrieb Stefan Hajnoczi: On Tue, Oct 11, 2011 at 03:46:28PM +0200, Kevin Wolf wrote: Am 07.10.2011 17:49, schrieb Stefan Hajnoczi: Image streaming copies data from the backing file into the image file. It is important to represent zero regions from the backing file efficiently during streaming, otherwise the image file grows to the full virtual disk size and loses sparseness. There are two ways to implement zero write detection, they are subtly different: 1. Allow image formats to provide efficient representations for zero regions. QED does this with zero clusters and it has been discussed for qcow2v3. 2. During streaming, check for zeroes and skip writing to the image file when zeroes are detected. However, there are some disadvantages to #2 because it leaves unallocated holes in the image file. If image streaming is aborted before it completes then it will be necessary to reread all unallocated clusters from the backing file upon resuming image streaming. Potentionally worse is that a backing file over a slow remote connection will have the zero regions fetched again and again if the guest accesses them. #1 avoids these problems because the image file contains information on which regions are zeroes and do not need to be refetched. This patch series implements #1 with the existing QED zero cluster feature. In the future we can add qcow2v3 zero clusters too. We can also implement #2 directly in the image streaming code as a fallback when the BlockDriver does not support zero detection #1 itself. That way we get the best possible zero write detection, depending on the image format. Here is a qemu-iotest to verify that zero write detection is working: http://repo.or.cz/w/qemu-iotests/stefanha.git/commitdiff/226949695eef51bdcdea3e6ce3d7e5a863427f37 Stefan Hajnoczi (3): block: add zero write detection interface qed: add zero write detection support qemu-io: add zero write detection option block.c | 16 +++ block.h | 2 + block/qed.c | 81 +-- block_int.h | 13 + qemu-io.c | 35 - 5 files changed, 132 insertions(+), 15 deletions(-) It's good to have an option to detect zero writes and turn them into zero clusters, but it's something that introduces some overhead and probably won't be suitable as a default. Yes, this series simply has a bdrv_set_zero_detection() API to toggle it at runtime. By default it is off to save CPU cycles. I think what we really want to have for image streaming is an API that explicitly writes zeros and doesn't have to look at the whole buffer (or actually doesn't even get a buffer). I didn't take this approach to avoid having block drivers handle the zero buffers that need to be allocated when the region does not cover entire clusters. It can be done for sure but I'm not sure how to do it nicely yet. If I understand your QED code right, in such cases it ignores that there are some zeros that could be turned into a zero cluster. Considering this and that you always fill a buffer just to be able to check it (which is known to take considerable time from qemu-img convert experience) - how could any solution that works consistently, but requires an allocation in the block driver be less nice? The fallback is easy when you already have a buffer - just do the write :). My point is that this patch is the simplest approach. Other approaches can optimize better and the question is whether they are worth doing. Stefan
Re: [Qemu-devel] PCI 64-bit BAR access with qemu
Hi Max, thank you for your answer. Actually, I hadn't confused. I already thought to your proposal but I found that it was a really ugly solution (essentially because of the uint32_t to uint64_t silent conversion). Isn't there any other (current or future) development that may fix it ? François Le 12/10/2011 13:00, Max Filippov a écrit : I've read a few days ago that it was possible to emulate PCI device with 64-bit BARs and have a real 64-bit memory access. Thus, I've created a virtual device named toto accessible through a 64-bit BAR You've probably confused an ability to locate BAR anywhere in 64-bit address space (such BAR actually spans 2 consecutive PCI BAR registers and has 100 in its 3 least significant bits) and an ability to access BAR-mapped memory in 64 bit items. You obviously want the latter but currently it is not implemented, see e.g. static inline DATA_TYPE glue(io_read, SUFFIX)(target_phys_addr_t physaddr, target_ulong addr, void *retaddr) definition in the softmmu_template.h. And it's quite simple to fix it, you only need to change io_{read,write} in the softmmu_template.h and extend io_mem_{read,write} loops in exec.c to 4 elements, taking care that io_mem_{read,write}[3] can pass uint64_t.
Re: [Qemu-devel] [PATCH 3/6] block: switch bdrv_read()/bdrv_write() to coroutines
Am 05.10.2011 18:17, schrieb Stefan Hajnoczi: The bdrv_read()/bdrv_write() functions call .bdrv_read()/.bdrv_write(). They should go through bdrv_co_do_readv() and bdrv_co_do_writev() instead in order to unify request processing code across sync, aio, and coroutine interfaces. This is also an important step towards removing BlockDriverState .bdrv_read()/.bdrv_write() in the future. Signed-off-by: Stefan Hajnoczi stefa...@linux.vnet.ibm.com This breaks drivers that only provide synchronous .bdrv_read/write. Attempts to read or write anything results in endless recursion. Kevin
Re: [Qemu-devel] [RFC PATCH v2] Specification for qcow2 version 3
On Tue, Jun 28, 2011 at 10:38 AM, Frediano Ziglio fredd...@gmail.com wrote: 2011/6/27 Kevin Wolf kw...@redhat.com: This is the second draft for what I think could be added when we increase qcow2's version number to 3. This includes points that have been made by several people over the past few months. We're probably not going to implement this next week, but I think it's important to get discussions started early, so here it is. Changes implemented in this RFC: - Added compatible/incompatible/auto-clear feature bits plus an optional feature name table to allow useful error messages even if an older version doesn't know some feature at all. - Added a dirty flag which tells that the refcount may not be accurate (QED mode). This means that we can save writes to the refcount table with cache=writethrough, but isn't really useful otherwise since Qcow2Cache. - Configurable refcount width. If you don't want to use internal snapshots, make refcounts one bit and save cache space and I/O. - Added subclusters. This separate the COW size (one subcluster, I'm thinking of 64k default size here) from the allocation size (one cluster, 2M). Less fragmentation, less metadata, but still reasonable COW granularity. This also allows to preallocate clusters, but none of their subclusters. You can have an image that is like raw + COW metadata, and you can also preallocate metadata for images with backing files. - Zero cluster flags. This allows discard even with a backing file that doesn't contain zeros. It is also useful for copy-on-read/image streaming, as you'll want to keep sparseness without accessing the remote image for an unallocated cluster all the time. - Fixed internal snapshot metadata to use 64 bit VM state size. You can't save a snapshot of a VM with = 4 GB RAM today. Possible future additions: - Add per-L2-table dirty flag to L1? - Add per-refcount-block full flag to refcount table? Hi, thinking about image improvement I would add - GUID for image and backing file - relative path for backing file This would help finding images in a distributed environment or if file are moved, ie: gfs/nfs/ocfs mounted in different mount points, backing used a template in a different images directory and move this directory somewhere else. Also with GUID a possible higher level could manage a GUID - file image db. I was also think about a backing file length field to support resizing but probably can be implemented with zero cluster. Assume you have a image of 5gb, create a new image with first image as backing one, now resize second image from 5gb to 3gb then resize it again (after some works) to 10gb, part from 3gb to 5gb should not be read from backing file. Interesting idea. One could argue either way. When image file size != backing file size you need to know what you are doing :). I think the case where the image is smaller than the backing file is rare and zeroing vs exposing the backing file on resize isn't an obvious choice. Also a bit in l2 offset to say there is no l2 table cause all clusters in l2 are contiguous so we avoid entirely l2. Obviously this require an optimization step to detect or create such condition. There are several reserved L1 entry bits which could be used to mark this mode. This mode severely restricts qcow2 features though: how would snapshots and COW work? Perhaps by breaking the huge cluster back into an L2 table with individual clusters? Backing files also cannot be used - unless we extend the sub-clusters approach and also keep a large bitmap with allocated/unallocated/zero information. A mode like this could be used for best performance on local storage, where efficiently image transport (e.g. scp or http) is not required. Actually I think this is reasonable, we could use qemu-img convert to produce a compact qcow2 for export and use the L2-less qcow2 for running the actual VM. Kevin: what do you think about fleshing out this mode instead of sub-clusters? This mail sound quite strange to me, I thought qed would be the future of qcow2 but I must be really wrong. What it's called doesn't matter but we need better metadata, and by making qcow2v3 extensible we can now improvements without losing support for existing image files. Stefan
Re: [Qemu-devel] [PATCH 4/6] block: switch bdrv_aio_readv() to coroutines
Am 05.10.2011 18:17, schrieb Stefan Hajnoczi: More sync, aio, and coroutine unification. Make bdrv_aio_readv() go through coroutine request processing. Signed-off-by: Stefan Hajnoczi stefa...@linux.vnet.ibm.com --- block.c | 48 +++- 1 files changed, 35 insertions(+), 13 deletions(-) diff --git a/block.c b/block.c index 90c29db..b83e911 100644 --- a/block.c +++ b/block.c @@ -78,6 +78,15 @@ static int coroutine_fn bdrv_co_do_readv(BlockDriverState *bs, int64_t sector_num, int nb_sectors, QEMUIOVector *qiov); static int coroutine_fn bdrv_co_do_writev(BlockDriverState *bs, int64_t sector_num, int nb_sectors, QEMUIOVector *qiov); +static BlockDriverAIOCB *bdrv_co_aio_rw_vector(BlockDriverState *bs, + int64_t sector_num, + QEMUIOVector *qiov, + int nb_sectors, + BlockDriverCompletionFunc *cb, + void *opaque, + bool is_write, + CoroutineEntry *entry); +static void coroutine_fn bdrv_co_do_rw(void *opaque); static QTAILQ_HEAD(, BlockDriverState) bdrv_states = QTAILQ_HEAD_INITIALIZER(bdrv_states); @@ -2346,17 +2355,10 @@ BlockDriverAIOCB *bdrv_aio_readv(BlockDriverState *bs, int64_t sector_num, QEMUIOVector *qiov, int nb_sectors, BlockDriverCompletionFunc *cb, void *opaque) { -BlockDriver *drv = bs-drv; - trace_bdrv_aio_readv(bs, sector_num, nb_sectors, opaque); -if (!drv) -return NULL; -if (bdrv_check_request(bs, sector_num, nb_sectors)) -return NULL; - -return drv-bdrv_aio_readv(bs, sector_num, qiov, nb_sectors, - cb, opaque); +return bdrv_co_aio_rw_vector(bs, sector_num, qiov, nb_sectors, + cb, opaque, false, bdrv_co_do_rw); } typedef struct BlockCompleteData { @@ -2803,6 +2805,7 @@ static void bdrv_co_rw_bh(void *opaque) qemu_aio_release(acb); } +/* Invoke .bdrv_co_readv/.bdrv_co_writev */ static void coroutine_fn bdrv_co_rw(void *opaque) { BlockDriverAIOCBCoroutine *acb = opaque; @@ -2820,13 +2823,32 @@ static void coroutine_fn bdrv_co_rw(void *opaque) qemu_bh_schedule(acb-bh); } +/* Invoke bdrv_co_do_readv/bdrv_co_do_writev */ +static void coroutine_fn bdrv_co_do_rw(void *opaque) +{ +BlockDriverAIOCBCoroutine *acb = opaque; +BlockDriverState *bs = acb-common.bs; + +if (!acb-is_write) { +acb-req.error = bdrv_co_do_readv(bs, acb-req.sector, +acb-req.nb_sectors, acb-req.qiov); +} else { +acb-req.error = bdrv_co_do_writev(bs, acb-req.sector, +acb-req.nb_sectors, acb-req.qiov); +} + +acb-bh = qemu_bh_new(bdrv_co_rw_bh, acb); +qemu_bh_schedule(acb-bh); +} The difference between the existing bdrv_co_rw and the new bdrv_co_do_rw is that the former directly calls drv-... whereas the latter does some checks first. I think you could just switch bdrv_co_rw to do the checks. If I'm not mistaken, the other path is dead code anyway after this change. Actually, it looks like this series leaves quite a bit of dead code behind, but I need to apply all patches and check the result to be sure. Kevin
Re: [Qemu-devel] [PATCH 0/6] block: do request processing in a coroutine
Am 05.10.2011 18:17, schrieb Stefan Hajnoczi: Block layer features like dirty block tracing, I/O throttling, and live block copy are forced to duplicate code due to the three different interfaces: synchronous, asynchronous, and coroutines. Since there are bdrv_read(), bdrv_aio_readv(), and bdrv_co_readv() interfaces for read (and similar for write), per-request processing needs to be duplicated for each of these execution contexts. For example, dirty block tracking code is duplicated across these three interfaces. This patch series unifies request processing so that there is only one code path. I see this as a prerequisite to the live block copy (image streaming) code I am working on, so I'm pushing it now. The short-term win from this series is that it becomes easy to add live block copy and other features. We now have a single code path where the perf-request processing is done. The longer-term win will be dropping the BlockDriver .bdrv_read(), .bdrv_write(), .bdrv_aio_readv(), and .bdrv_aio_writev() interfaces. By doing that we can bring all BlockDrivers onto a common interface, namely .bdrv_co_readv() and .bdrv_co_writev(). It will also allow us to drop most of the sync and aio emulation code. A consequence of this patch series is that every I/O request goes through at least one coroutine. There is no longer a direct .bdrv_read(), .bdrv_write(), .bdrv_aio_readv(), or .bdrv_aio_writev() call - we're trying to phase out those interfaces. I have not noticed performance degradation in correctness tests but we need to confirm that there has not been a performance regression. Stefan Hajnoczi (6): block: directly invoke .bdrv_aio_*() in bdrv_co_io_em() block: split out bdrv_co_do_readv() and bdrv_co_do_writev() block: switch bdrv_read()/bdrv_write() to coroutines block: switch bdrv_aio_readv() to coroutines block: mark blocks dirty on coroutine write completion block: switch bdrv_aio_writev() to coroutines block.c | 273 +++ 1 files changed, 134 insertions(+), 139 deletions(-) Except for breaking synchronous drivers this looks good. I have applied the first two patches to the block branch, so if you don't need to change them, submitting patches 3-6 for v2 will be enough. Kevin
Re: [Qemu-devel] [PATCH 1/2] hw/9pfs: Add new virtfs option cache=writethrough to skip host page cache
On Wed, 12 Oct 2011 10:55:21 +0100, Stefan Hajnoczi stefa...@gmail.com wrote: On Mon, Oct 10, 2011 at 10:06 AM, Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com wrote: diff --git a/hw/9pfs/virtio-9p-handle.c b/hw/9pfs/virtio-9p-handle.c index 5c8b5ed..441a37f 100644 --- a/hw/9pfs/virtio-9p-handle.c +++ b/hw/9pfs/virtio-9p-handle.c @@ -202,6 +202,15 @@ static ssize_t handle_pwritev(FsContext *ctx, int fd, const struct iovec *iov, return writev(fd, iov, iovcnt); The sync_file_range(2) call below is dead-code since we'll return immediately after writev(2) completes. The writev(2) return value needs to be saved temporarily and then returned after sync_file_range(2). Missed that. Will fix in the next update } #endif + if (ctx-cache_flags V9FS_WRITETHROUGH_CACHE) { -drive cache=writethrough means something different from 9pfs writethrough. This is confusing so I wonder if there is a better name like immediate write-out. cache=immediate-write-out ? + /* + * Initiate a writeback. This is not a data integrity sync. + * We want to ensure that we don't leave dirty pages in the cache + * after write when cache=writethrough is sepcified. + */ + sync_file_range(fd, offset, 0, + SYNC_FILE_RANGE_WAIT_BEFORE | SYNC_FILE_RANGE_WRITE); + } I'm not sure whether SYNC_FILE_RANGE_WAIT_BEFORE is necessary. As a best-effort mechanism just SYNC_FILE_RANGE_WRITE does the job although a client that rapidly rewrites may be able to leave dirty pages in the host page cache Shouldn't we prevent this ? That is the reason for me to use that WAIT_BEFORE ? . SYNC_FILE_RANGE_WAIT_BEFORE ensures that dirty pages get written out but it is no longer asynchronous because it blocks. -aneesh
Re: [Qemu-devel] [RFC PATCH v2] Specification for qcow2 version 3
Am 12.10.2011 14:51, schrieb Stefan Hajnoczi: Also a bit in l2 offset to say there is no l2 table cause all clusters in l2 are contiguous so we avoid entirely l2. Obviously this require an optimization step to detect or create such condition. There are several reserved L1 entry bits which could be used to mark this mode. This mode severely restricts qcow2 features though: how would snapshots and COW work? Perhaps by breaking the huge cluster back into an L2 table with individual clusters? Backing files also cannot be used - unless we extend the sub-clusters approach and also keep a large bitmap with allocated/unallocated/zero information. A mode like this could be used for best performance on local storage, where efficiently image transport (e.g. scp or http) is not required. Actually I think this is reasonable, we could use qemu-img convert to produce a compact qcow2 for export and use the L2-less qcow2 for running the actual VM. Kevin: what do you think about fleshing out this mode instead of sub-clusters? I'm hesitant to something like this as it adds quite some complexity and I'm not sure if there are practical use cases for it at all. If you take the current cluster sizes, an L2 table contains 512 MB of data, so you would lose any sparseness. You would probably already get full allocation just by creating a file system on the image. But even if you do have a use case where sparseness doesn't matter, the effect is very much the same as allowing a 512 MB cluster size and not changing any of the qcow2 internals. (What would the use case be? Backing files or snapshots with a COW granularity of 512 MB isn't going to fly. That leaves only something like encryption.) Kevin
Re: [Qemu-devel] [v7 Patch 1/5]Qemu: Enhance info block to display host cache setting
Am 11.10.2011 05:10, schrieb Supriya Kannery: Enhance info block to display hostcache setting for each block device. Example: (qemu) info block ide0-hd0: removable=0 file=../sles11-32.raw ro=0 drv=raw encrypted=0 Enhanced to display hostcache setting: (qemu) info block ide0-hd0: removable=0 hostcache=1 file=../sles11-32.raw ro=0 drv=raw encrypted=0 Signed-off-by: Supriya Kannery supri...@linux.vnet.ibm.com --- block.c | 20 qmp-commands.hx |2 ++ 2 files changed, 18 insertions(+), 4 deletions(-) Index: qemu/qmp-commands.hx === --- qemu.orig/qmp-commands.hx +++ qemu/qmp-commands.hx @@ -1142,6 +1142,7 @@ Each json-object contain the following: - locked: true if the device is locked, false otherwise (json-bool) - tray-open: only present if removable, true if the device has a tray, and it is open (json-bool) +- hostcache: true if host pagecache enabled, false otherwise (json-bool) - inserted: only present if the device is inserted, it is a json-object containing the following: - file: device file name (json-string) @@ -1163,6 +1164,7 @@ Example: { device:ide0-hd0, locked:false, +hostcache:false, removable:false, inserted:{ ro:false, Index: qemu/block.c === --- qemu.orig/block.c +++ qemu/block.c @@ -1866,6 +1866,15 @@ static void bdrv_print_dict(QObject *obj monitor_printf(mon, tray-open=%d, qdict_get_bool(bs_dict, tray-open)); } + +if (qdict_haskey(bs_dict, open_flags)) { +int open_flags = qdict_get_int(bs_dict, open_flags); +if (open_flags BDRV_O_NOCACHE) +monitor_printf(mon, hostcache=0); +else +monitor_printf(mon, hostcache=1); Coding style requires braces. +} + if (qdict_haskey(bs_dict, inserted)) { QDict *qdict = qobject_to_qdict(qdict_get(bs_dict, inserted)); @@ -1903,11 +1912,14 @@ void bdrv_info(Monitor *mon, QObject **r QDict *bs_dict; bs_obj = qobject_from_jsonf({ 'device': %s, 'type': 'unknown', -'removable': %i, 'locked': %i }, +'removable': %i, 'locked': %i, +'hostcache': %i }, bs-device_name, bdrv_dev_has_removable_media(bs), -bdrv_dev_is_medium_locked(bs)); +bdrv_dev_is_medium_locked(bs), +!(bs-open_flags BDRV_O_NOCACHE)); bs_dict = qobject_to_qdict(bs_obj); +qdict_put(bs_dict, open_flags, qint_from_int(bs-open_flags)); No. This adds a open_flags field to the QMP structure that is transferred to clients. This is wrong, open_flags is an internal thing that should never be visible on an interface. In bdrv_print_dict, access the hostcache field that you introduced, it provides the same information. Kevin
Re: [Qemu-devel] [PATCH 2/2] pseries: Add device tree properties for VMX/VSX and DFP under kvm
On 10/11/2011 06:31 AM, David Gibson wrote: Sufficiently recent PAPR specifications define properties ibm,vmx and ibm,dfp on the CPU node which advertise whether the VMX vector extensions (or the later VSX version) and/or the Decimal Floating Point operations from IBM's recent POWER CPUs are available. Currently we do not put these in the guest device tree and the guest kernel will consequently assume they are not available. This is good, because they are not supported under TCG. VMX is similar enough to Altivec that it might be trivial to support, but VSX and DFP would both require significant work to support in TCG. However, when running under kvm on a host which supports these instructions, there's no reason not to let the guest use them. This patch, therefore, checks for the relevant support on the host CPU and, if present, advertises them to the guest as well. Signed-off-by: David Gibsonda...@gibson.dropbear.id.au --- hw/spapr.c | 17 + target-ppc/kvm.c | 10 ++ target-ppc/kvm_ppc.h | 12 3 files changed, 39 insertions(+), 0 deletions(-) diff --git a/hw/spapr.c b/hw/spapr.c index 9a3a1ea..00b9c67 100644 --- a/hw/spapr.c +++ b/hw/spapr.c @@ -186,6 +186,8 @@ static void *spapr_create_fdt_skel(const char *cpu_model, 0x, 0x}; uint32_t tbfreq = kvm_enabled() ? kvmppc_get_tbfreq() : TIMEBASE_FREQ; uint32_t cpufreq = kvm_enabled() ? kvmppc_get_clockfreq() : 10; +uint32_t vmx = kvm_enabled() ? kvmppc_get_vmx() : 0; +uint32_t dfp = kvm_enabled() ? kvmppc_get_dfp() : 0; This should only happen when the CPU type selected by -cpu also supports VMX or DFP. For now, this is moot as we can't run any non-host CPU in KVM for HV PPC. But looking forward, it certainly makes sense to split it. This is not a nack, but please make sure to implement -cpu host in the near future and default to that for -M pseries. Then we can change this piece here to check for vcpu capabilities and AND them with the host caps. The rest looks good. Alex if ((index % smt) != 0) { continue; @@ -233,6 +235,21 @@ static void *spapr_create_fdt_skel(const char *cpu_model, segs, sizeof(segs; } +/* Advertise VMX/VSX (vector extensions) if available + * 0 / no property == no vector extensions + * 1 == VMX / Altivec available + * 2 == VSX available */ +if (vmx) { +_FDT((fdt_property_cell(fdt, ibm,vmx, vmx))); +} + +/* Advertise DFP (Decimal Floating Point) if available + * 0 / no property == no DFP + * 1 == DFP available */ +if (dfp) { +_FDT((fdt_property_cell(fdt, ibm,dfp, dfp))); +} + _FDT((fdt_end_node(fdt))); } diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c index 6667b61..6a48eb4 100644 --- a/target-ppc/kvm.c +++ b/target-ppc/kvm.c @@ -694,6 +694,16 @@ uint64_t kvmppc_get_clockfreq(void) return kvmppc_read_int_cpu_dt(clock-frequency); } +uint32_t kvmppc_get_vmx(void) +{ +return kvmppc_read_int_cpu_dt(ibm,vmx); +} + +uint32_t kvmppc_get_dfp(void) +{ +return kvmppc_read_int_cpu_dt(ibm,dfp); +} + int kvmppc_get_hypercall(CPUState *env, uint8_t *buf, int buf_len) { uint32_t *hc = (uint32_t*)buf; diff --git a/target-ppc/kvm_ppc.h b/target-ppc/kvm_ppc.h index 9e8a7b5..fa131bf 100644 --- a/target-ppc/kvm_ppc.h +++ b/target-ppc/kvm_ppc.h @@ -15,6 +15,8 @@ void kvmppc_init(void); uint32_t kvmppc_get_tbfreq(void); uint64_t kvmppc_get_clockfreq(void); +uint32_t kvmppc_get_vmx(void); +uint32_t kvmppc_get_dfp(void); int kvmppc_get_hypercall(CPUState *env, uint8_t *buf, int buf_len); int kvmppc_set_interrupt(CPUState *env, int irq, int level); void kvmppc_set_papr(CPUState *env); @@ -35,6 +37,16 @@ static inline uint64_t kvmppc_get_clockfreq(void) return 0; } +static inline uint32_t kvmppc_get_vmx(void) +{ +return 0; +} + +static inline uint32_t kvmppc_get_dfp(void) +{ +return 0; +} + static inline int kvmppc_get_hypercall(CPUState *env, uint8_t *buf, int buf_len) { return -1;
Re: [Qemu-devel] [PATCH V4 2/2] hw/9pfs: Add readonly support for 9p export
On Wed, 12 Oct 2011 13:23:34 +0530, M. Mohan Kumar mo...@in.ibm.com wrote: A new fsdev parameter readonly is introduced to control accessing 9p export. readonly=on|off can be used to specify the access type. By default rw access is given. Signed-off-by: M. Mohan Kumar mo...@in.ibm.com --- Changes from previous version V3: * Use opt_set_bool function to set readonly option * Change the flag from MS_READONLY to 9p specific Change from previous version V2: * QEMU_OPT_BOOL is used for readdonly parameter Changes from previous version: * Use readonly option instead of access * Change function return type to boolean where its needed fsdev/file-op-9p.h |3 +- fsdev/qemu-fsdev.c | 12 +- fsdev/qemu-fsdev.h |1 + hw/9pfs/virtio-9p-device.c |3 ++ hw/9pfs/virtio-9p.c| 46 qemu-config.c |7 ++ vl.c |2 + 7 files changed, 71 insertions(+), 3 deletions(-) diff --git a/fsdev/file-op-9p.h b/fsdev/file-op-9p.h index 33fb07f..b75290d 100644 --- a/fsdev/file-op-9p.h +++ b/fsdev/file-op-9p.h @@ -58,7 +58,8 @@ typedef struct extended_ops { } extended_ops; /* FsContext flag values */ -#define PATHNAME_FSCONTEXT 0x1 +#define PATHNAME_FSCONTEXT 0x1 why ? +#define P9_RDONLY_EXPORT0x2 /* cache flags */ #define V9FS_WRITETHROUGH_CACHE 0x1 diff --git a/fsdev/qemu-fsdev.c b/fsdev/qemu-fsdev.c index d08ba9c..f8a8227 100644 --- a/fsdev/qemu-fsdev.c +++ b/fsdev/qemu-fsdev.c @@ -29,13 +29,13 @@ static FsTypeTable FsTypes[] = { int qemu_fsdev_add(QemuOpts *opts) { struct FsTypeListEntry *fsle; -int i; +int i, flags = 0; const char *fsdev_id = qemu_opts_id(opts); const char *fstype = qemu_opt_get(opts, fstype); const char *path = qemu_opt_get(opts, path); const char *sec_model = qemu_opt_get(opts, security_model); const char *cache = qemu_opt_get(opts, cache); - +int rdonly = qemu_opt_get_bool(opts, readonly, 0); if (!fsdev_id) { fprintf(stderr, fsdev: No id specified\n); @@ -76,6 +76,14 @@ int qemu_fsdev_add(QemuOpts *opts) fsle-fse.cache_flags = V9FS_WRITETHROUGH_CACHE; } } +if (rdonly) { +flags |= P9_RDONLY_EXPORT; +} else { +flags = ~P9_RDONLY_EXPORT; +} + +fsle-fse.flags = flags; + QTAILQ_INSERT_TAIL(fstype_entries, fsle, next); return 0; } diff --git a/fsdev/qemu-fsdev.h b/fsdev/qemu-fsdev.h index 0f67880..2938eaf 100644 --- a/fsdev/qemu-fsdev.h +++ b/fsdev/qemu-fsdev.h @@ -44,6 +44,7 @@ typedef struct FsTypeEntry { char *security_model; int cache_flags; FileOperations *ops; +int flags; do we need extra flags ? Why not use cache_flags ? renaming that to export_flags ? } FsTypeEntry; typedef struct FsTypeListEntry { diff --git a/hw/9pfs/virtio-9p-device.c b/hw/9pfs/virtio-9p-device.c index 1846e36..336292c 100644 --- a/hw/9pfs/virtio-9p-device.c +++ b/hw/9pfs/virtio-9p-device.c @@ -125,6 +125,9 @@ VirtIODevice *virtio_9p_init(DeviceState *dev, V9fsConf *conf) s-tag_len = len; s-ctx.uid = -1; s-ctx.flags = 0; +if (fse-flags P9_RDONLY_EXPORT) { +s-ctx.flags |= P9_RDONLY_EXPORT; +} s-ops = fse-ops; s-vdev.get_features = virtio_9p_get_features; diff --git a/hw/9pfs/virtio-9p.c b/hw/9pfs/virtio-9p.c index 47ed2f1..9f15787 100644 --- a/hw/9pfs/virtio-9p.c +++ b/hw/9pfs/virtio-9p.c @@ -1271,6 +1271,11 @@ static void v9fs_fix_path(V9fsPath *dst, V9fsPath *src, int len) dst-size++; } +static inline bool is_ro_export(FsContext *fs_ctx) +{ +return fs_ctx-flags P9_RDONLY_EXPORT; +} + static void v9fs_version(void *opaque) { V9fsPDU *pdu = opaque; @@ -1690,6 +1695,14 @@ static void v9fs_open(void *opaque) } else { flags = omode_to_uflags(mode); } +if (is_ro_export(s-ctx)) { +if (mode O_WRONLY || mode O_RDWR || mode O_APPEND) { +err = -EROFS; +goto out; +} else { +flags |= O_NOATIME; +} +} What about O_TRUNC ? err = v9fs_co_open(pdu, fidp, flags); if (err 0) { goto out; @@ -3301,6 +3314,33 @@ static void v9fs_op_not_supp(void *opaque) complete_pdu(pdu-s, pdu, -EOPNOTSUPP); } -aneesh
Re: [Qemu-devel] [PATCH 1/2] ppc: Generalize the kvmppc_get_clockfreq() function
On 10/11/2011 06:31 AM, David Gibson wrote: Currently the kvmppc_get_clockfreq() function reads the host's clock frequency from /proc/device-tree, which is useful to past to the guest in KVM setups. However, there are some other host properties advertised in the device tree which can also be relevant to the guests. This patch, therefore, replaces kvmppc_get_clockfreq() which can retrieve any named, single integer property from the host device tree's CPU node. Signed-off-by: David Gibsonda...@gibson.dropbear.id.au Thanks, applied both patches to ppc-next. Alex
Re: [Qemu-devel] [PATCH] hw/9pfs: Handle Security model parsing
On Wed, 12 Oct 2011 13:24:16 +0530, M. Mohan Kumar mo...@in.ibm.com wrote: Security model is needed only for 'local' fs driver. Can you also cleanup that fstype - fsdriver rename ? fsdriver seems more appropriate. Signed-off-by: M. Mohan Kumar mo...@in.ibm.com --- fsdev/qemu-fsdev.c |6 + fsdev/qemu-fsdev.h |1 + hw/9pfs/virtio-9p-device.c | 47 ++- vl.c | 20 +++-- 4 files changed, 43 insertions(+), 31 deletions(-) diff --git a/fsdev/qemu-fsdev.c b/fsdev/qemu-fsdev.c index 36db127..d08ba9c 100644 --- a/fsdev/qemu-fsdev.c +++ b/fsdev/qemu-fsdev.c @@ -58,11 +58,6 @@ int qemu_fsdev_add(QemuOpts *opts) return -1; } -if (!sec_model) { -fprintf(stderr, fsdev: No security_model specified.\n); -return -1; -} - if (!path) { fprintf(stderr, fsdev: No path specified.\n); return -1; @@ -72,6 +67,7 @@ int qemu_fsdev_add(QemuOpts *opts) fsle-fse.fsdev_id = g_strdup(fsdev_id); fsle-fse.path = g_strdup(path); +fsle-fse.fsdriver = g_strdup(fstype); Why use it as a string ? Why can't this again be an export_flag. That would help us to avoid that strdup fsle-fse.security_model = g_strdup(sec_model); fsle-fse.ops = FsTypes[i].ops; fsle-fse.cache_flags = 0; diff --git a/fsdev/qemu-fsdev.h b/fsdev/qemu-fsdev.h index 9c440f2..0f67880 100644 --- a/fsdev/qemu-fsdev.h +++ b/fsdev/qemu-fsdev.h @@ -40,6 +40,7 @@ typedef struct FsTypeTable { typedef struct FsTypeEntry { char *fsdev_id; char *path; +char *fsdriver; char *security_model; int cache_flags; FileOperations *ops; diff --git a/hw/9pfs/virtio-9p-device.c b/hw/9pfs/virtio-9p-device.c index aac58ad..1846e36 100644 --- a/hw/9pfs/virtio-9p-device.c +++ b/hw/9pfs/virtio-9p-device.c @@ -83,29 +83,30 @@ VirtIODevice *virtio_9p_init(DeviceState *dev, V9fsConf *conf) exit(1); } -if (!strcmp(fse-security_model, passthrough)) { -/* Files on the Fileserver set to client user credentials */ -s-ctx.fs_sm = SM_PASSTHROUGH; -s-ctx.xops = passthrough_xattr_ops; -} else if (!strcmp(fse-security_model, mapped)) { -/* Files on the fileserver are set to QEMU credentials. - * Client user credentials are saved in extended attributes. - */ -s-ctx.fs_sm = SM_MAPPED; -s-ctx.xops = mapped_xattr_ops; -} else if (!strcmp(fse-security_model, none)) { -/* - * Files on the fileserver are set to QEMU credentials. - */ -s-ctx.fs_sm = SM_NONE; -s-ctx.xops = none_xattr_ops; -} else { -fprintf(stderr, Default to security_model=none. You may want - enable advanced security model using -security option:\n\t security_model=passthrough\n\t -security_model=mapped\n); -s-ctx.fs_sm = SM_NONE; -s-ctx.xops = none_xattr_ops; +/* security models is needed only for local fs driver */ +if (!strcmp(fse-fsdriver, local)) { +if (!strcmp(fse-security_model, passthrough)) { +/* Files on the Fileserver set to client user credentials */ +s-ctx.fs_sm = SM_PASSTHROUGH; +s-ctx.xops = passthrough_xattr_ops; +} else if (!strcmp(fse-security_model, mapped)) { +/* Files on the fileserver are set to QEMU credentials. +* Client user credentials are saved in extended attributes. +*/ +s-ctx.fs_sm = SM_MAPPED; +s-ctx.xops = mapped_xattr_ops; +} else if (!strcmp(fse-security_model, none)) { +/* +* Files on the fileserver are set to QEMU credentials. +*/ +s-ctx.fs_sm = SM_NONE; +s-ctx.xops = none_xattr_ops; +} else { +fprintf(stderr, Invalid security_model %s specified.\n +Available security models are:\t +passthrough,mapped or none\n, fse-security_model); +exit(1); +} } s-ctx.cache_flags = fse-cache_flags; diff --git a/vl.c b/vl.c index 6760e39..a961fa3 100644 --- a/vl.c +++ b/vl.c @@ -2795,6 +2795,7 @@ int main(int argc, char **argv, char **envp) QemuOpts *fsdev; QemuOpts *device; const char *cache; +const char *fsdriver; olist = qemu_find_opts(virtfs); if (!olist) { @@ -2809,13 +2810,26 @@ int main(int argc, char **argv, char **envp) if (qemu_opt_get(opts, fstype) == NULL || qemu_opt_get(opts, mount_tag) == NULL || -qemu_opt_get(opts, path) == NULL || -qemu_opt_get(opts, security_model) == NULL) { +
[Qemu-devel] [PATCH 03/10] scsi-generic: check ioctl statuses when SG_IO succeeds
A succeeding ioctl does not imply that the SCSI command succeeded. Signed-off-by: Paolo Bonzini pbonz...@redhat.com --- hw/scsi-generic.c |4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) diff --git a/hw/scsi-generic.c b/hw/scsi-generic.c index 5130e9a..04549aa 100644 --- a/hw/scsi-generic.c +++ b/hw/scsi-generic.c @@ -311,7 +311,7 @@ static int get_blocksize(BlockDriverState *bdrv) io_header.timeout = 6000; /* XXX */ ret = bdrv_ioctl(bdrv, SG_IO, io_header); -if (ret 0) +if (ret 0 || io_header.driver_status || io_header.host_status) return -1; return ldl_be_p(buf[4]); @@ -341,7 +341,7 @@ static int get_stream_blocksize(BlockDriverState *bdrv) io_header.timeout = 6000; /* XXX */ ret = bdrv_ioctl(bdrv, SG_IO, io_header); -if (ret 0) +if (ret 0 || io_header.driver_status || io_header.host_status) return -1; return ldl_be_p(buf[0]) 0xff; -- 1.7.6
[Qemu-devel] [PATCH 00/10] scsi: add specialized block device passthrough
This series adds a new scsi-block device that is able to do SCSI passthrough for block devices only, but at the same time does not suffer the limitations of scsi-generic. In particular it does not need a bounce buffer that is as big as the request, and will be able to support s/g lists. This puts scsi-block and virtio-blk on feature parity. To do this, scsi-generic is simplified to the point that its ReqOps can be used by any SCSIDevice. Then, a new scsi-disk variant is introduced that uses regular AIO read/writes whenever possible, and falls back to SG_IO for other SCSI commands. Management can then choose between isolating the guest from the specificities of the target device (scsi-hd/scsi-cd), or making it aware of them (scsi-block). In the future, both scsi-generic and scsi-block could be made to support other types of passthrough, for example iSCSI passthrough. Patches 1-4 are cleanups to scsi-generic, patches 5-6 are cleanups to scsi-disk. Then comes the actual implementation of the feature. Paolo Bonzini (10): scsi-generic: drop SCSIGenericState scsi-generic: remove scsi_req_fixup scsi-generic: check ioctl statuses when SG_IO succeeds scsi-generic: look at host status scsi-disk: do not duplicate BlockDriverState member scsi-disk: small clean up to INQUIRY scsi: make reqops const scsi: export scsi_generic_reqops scsi: pass cdb to alloc_req scsi-disk: add scsi-block for device passthrough hw/scsi-bus.c | 12 ++-- hw/scsi-disk.c| 260 +--- hw/scsi-generic.c | 128 --- hw/scsi.h | 12 ++- 4 files changed, 255 insertions(+), 157 deletions(-) -- 1.7.6
[Qemu-devel] [PATCH 06/10] scsi-disk: small clean up to INQUIRY
Set s-removable, s-qdev.blocksize and s-qdev.type in the callers of scsi_initfn. With this in place, s-qdev.type is allowed, and we can just reuse it as the first byte in VPD data (just like we do in standard INQUIRY data). Also set s-removable is set consistently and we can use it. Signed-off-by: Paolo Bonzini pbonz...@redhat.com --- hw/scsi-disk.c | 46 +- 1 files changed, 21 insertions(+), 25 deletions(-) diff --git a/hw/scsi-disk.c b/hw/scsi-disk.c index 5e3ef51..a4daa29 100644 --- a/hw/scsi-disk.c +++ b/hw/scsi-disk.c @@ -397,11 +397,7 @@ static int scsi_disk_emulate_inquiry(SCSIRequest *req, uint8_t *outbuf) return -1; } -if (s-qdev.type == TYPE_ROM) { -outbuf[buflen++] = 5; -} else { -outbuf[buflen++] = 0; -} +outbuf[buflen++] = s-qdev.type 0x1f; outbuf[buflen++] = page_code ; // this page outbuf[buflen++] = 0x00; @@ -541,11 +537,10 @@ static int scsi_disk_emulate_inquiry(SCSIRequest *req, uint8_t *outbuf) memset(outbuf, 0, buflen); outbuf[0] = s-qdev.type 0x1f; +outbuf[1] = s-removable ? 0x80 : 0; if (s-qdev.type == TYPE_ROM) { -outbuf[1] = 0x80; memcpy(outbuf[16], QEMU CD-ROM , 16); } else { -outbuf[1] = s-removable ? 0x80 : 0; memcpy(outbuf[16], QEMU HARDDISK , 16); } memcpy(outbuf[8], QEMU, 8); @@ -1511,7 +1506,7 @@ static void scsi_disk_unit_attention_reported(SCSIDevice *dev) } } -static int scsi_initfn(SCSIDevice *dev, uint8_t scsi_type) +static int scsi_initfn(SCSIDevice *dev) { SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, dev); DriveInfo *dinfo; @@ -1521,7 +1516,7 @@ static int scsi_initfn(SCSIDevice *dev, uint8_t scsi_type) return -1; } -if (scsi_type == TYPE_DISK !bdrv_is_inserted(s-qdev.conf.bs)) { +if (!s-removable !bdrv_is_inserted(s-qdev.conf.bs)) { error_report(Device needs media, but drive is empty); return -1; } @@ -1543,19 +1538,12 @@ static int scsi_initfn(SCSIDevice *dev, uint8_t scsi_type) return -1; } -if (scsi_type == TYPE_ROM) { +if (s-removable) { bdrv_set_dev_ops(s-qdev.conf.bs, scsi_cd_block_ops, s); -s-qdev.blocksize = 2048; -} else if (scsi_type == TYPE_DISK) { -s-qdev.blocksize = s-qdev.conf.logical_block_size; -} else { -error_report(scsi-disk: Unhandled SCSI type %02x, scsi_type); -return -1; } s-cluster_size = s-qdev.blocksize / 512; bdrv_set_buffer_alignment(s-qdev.conf.bs, s-qdev.blocksize); -s-qdev.type = scsi_type; qemu_add_vm_change_state_handler(scsi_dma_restart_cb, s); bdrv_iostatus_enable(s-qdev.conf.bs); add_boot_device_path(s-qdev.conf.bootindex, dev-qdev, ,0); @@ -1564,27 +1552,35 @@ static int scsi_initfn(SCSIDevice *dev, uint8_t scsi_type) static int scsi_hd_initfn(SCSIDevice *dev) { -return scsi_initfn(dev, TYPE_DISK); +SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, dev); +s-qdev.blocksize = s-qdev.conf.logical_block_size; +s-qdev.type = TYPE_DISK; +return scsi_initfn(s-qdev); } static int scsi_cd_initfn(SCSIDevice *dev) { -return scsi_initfn(dev, TYPE_ROM); +SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, dev); +s-qdev.blocksize = 2048; +s-qdev.type = TYPE_ROM; +s-removable = true; +return scsi_initfn(s-qdev); } static int scsi_disk_initfn(SCSIDevice *dev) { DriveInfo *dinfo; -uint8_t scsi_type; if (!dev-conf.bs) { -scsi_type = TYPE_DISK; /* will die in scsi_initfn() */ -} else { -dinfo = drive_get_by_blockdev(dev-conf.bs); -scsi_type = dinfo-media_cd ? TYPE_ROM : TYPE_DISK; +return scsi_initfn(dev); /* ... and die there */ } -return scsi_initfn(dev, scsi_type); +dinfo = drive_get_by_blockdev(dev-conf.bs); +if (dinfo-media_cd) { +return scsi_cd_initfn(dev); +} else { +return scsi_hd_initfn(dev); +} } static SCSIReqOps scsi_disk_reqops = { -- 1.7.6
Re: [Qemu-devel] [v7 Patch 4/5]Qemu: Add commandline -drive option 'hostcache'
Am 11.10.2011 05:11, schrieb Supriya Kannery: qemu command option 'hostcache' added to -drive for block devices. While starting a VM from qemu commandline, this option can be used for setting host cache usage for block data access. Simultaneous use of 'hostcache' and 'cache' options not allowed. Signed-off-by: Supriya Kannery supri...@linux.vnet.ibm.com I'm not sure if introducing this alone makes sense. I think I would only do it when we introduce more options that allow replacing all cache=... options by other parameters. --- blockdev.c | 13 + qemu-config.c |4 qemu-options.hx |2 +- 3 files changed, 18 insertions(+), 1 deletion(-) Index: qemu/blockdev.c === --- qemu.orig/blockdev.c +++ qemu/blockdev.c @@ -237,6 +237,7 @@ DriveInfo *drive_init(QemuOpts *opts, in DriveInfo *dinfo; int snapshot = 0; int ret; +int hostcache = 0; translation = BIOS_ATA_TRANSLATION_AUTO; media = MEDIA_DISK; @@ -319,7 +320,19 @@ DriveInfo *drive_init(QemuOpts *opts, in } } +if ((hostcache = qemu_opt_get_bool(opts, hostcache, -1)) != -1) { +if (!hostcache) { +bdrv_flags |= BDRV_O_NOCACHE; +} else { +bdrv_flags = ~BDRV_O_NOCACHE; +} bdrv_flags is initialised to 0, so the else branch is unnecessary. Kevin
Re: [Qemu-devel] [PATCH 1/2] hw/9pfs: Add new virtfs option cache=writethrough to skip host page cache
On Wed, Oct 12, 2011 at 2:19 PM, Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com wrote: On Wed, 12 Oct 2011 10:55:21 +0100, Stefan Hajnoczi stefa...@gmail.com wrote: On Mon, Oct 10, 2011 at 10:06 AM, Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com wrote: diff --git a/hw/9pfs/virtio-9p-handle.c b/hw/9pfs/virtio-9p-handle.c index 5c8b5ed..441a37f 100644 --- a/hw/9pfs/virtio-9p-handle.c +++ b/hw/9pfs/virtio-9p-handle.c @@ -202,6 +202,15 @@ static ssize_t handle_pwritev(FsContext *ctx, int fd, const struct iovec *iov, return writev(fd, iov, iovcnt); The sync_file_range(2) call below is dead-code since we'll return immediately after writev(2) completes. The writev(2) return value needs to be saved temporarily and then returned after sync_file_range(2). Missed that. Will fix in the next update } #endif + if (ctx-cache_flags V9FS_WRITETHROUGH_CACHE) { -drive cache=writethrough means something different from 9pfs writethrough. This is confusing so I wonder if there is a better name like immediate write-out. cache=immediate-write-out ? + /* + * Initiate a writeback. This is not a data integrity sync. + * We want to ensure that we don't leave dirty pages in the cache + * after write when cache=writethrough is sepcified. + */ + sync_file_range(fd, offset, 0, + SYNC_FILE_RANGE_WAIT_BEFORE | SYNC_FILE_RANGE_WRITE); + } I'm not sure whether SYNC_FILE_RANGE_WAIT_BEFORE is necessary. As a best-effort mechanism just SYNC_FILE_RANGE_WRITE does the job although a client that rapidly rewrites may be able to leave dirty pages in the host page cache Shouldn't we prevent this ? That is the reason for me to use that WAIT_BEFORE ? The flag will cause sync_file_range(2) to wait on in-flight I/O. The guest will notice slow I/O. You should at least specify a range instead of nbytes=0 in the arguments. Stefan
Re: [Qemu-devel] [RFC PATCH v2] Specification for qcow2 version 3
On Wed, Oct 12, 2011 at 2:31 PM, Kevin Wolf kw...@redhat.com wrote: Am 12.10.2011 14:51, schrieb Stefan Hajnoczi: Also a bit in l2 offset to say there is no l2 table cause all clusters in l2 are contiguous so we avoid entirely l2. Obviously this require an optimization step to detect or create such condition. There are several reserved L1 entry bits which could be used to mark this mode. This mode severely restricts qcow2 features though: how would snapshots and COW work? Perhaps by breaking the huge cluster back into an L2 table with individual clusters? Backing files also cannot be used - unless we extend the sub-clusters approach and also keep a large bitmap with allocated/unallocated/zero information. A mode like this could be used for best performance on local storage, where efficiently image transport (e.g. scp or http) is not required. Actually I think this is reasonable, we could use qemu-img convert to produce a compact qcow2 for export and use the L2-less qcow2 for running the actual VM. Kevin: what do you think about fleshing out this mode instead of sub-clusters? I'm hesitant to something like this as it adds quite some complexity and I'm not sure if there are practical use cases for it at all. If you take the current cluster sizes, an L2 table contains 512 MB of data, so you would lose any sparseness. You would probably already get full allocation just by creating a file system on the image. But even if you do have a use case where sparseness doesn't matter, the effect is very much the same as allowing a 512 MB cluster size and not changing any of the qcow2 internals. I guess I'm thinking of the 512 MB cluster size situation, because we'd definitely want a cow bitmap in order to keep backing files and sparseness. (What would the use case be? Backing files or snapshots with a COW granularity of 512 MB isn't going to fly. That leaves only something like encryption.) COW granularity needs to stay at 64-256 kb since those are reasonable request sizes for COW. Stefan
Re: [Qemu-devel] [PATCH V5] Add stdio char device on windows
On 06/10/2011 16:37, Fabien Chouteau wrote: Simple implementation of an stdio char device on Windows. Any comments? -- Fabien Chouteau
Re: [Qemu-devel] [v7 Patch 5/5]Qemu: New struct 'BDRVReopenState' for image files reopen
Am 11.10.2011 05:11, schrieb Supriya Kannery: Struct BDRVReopenState introduced for handling reopen state of images. This can be extended by each of the block drivers to reopen respective image files. Implementation for raw-posix is done here. Signed-off-by: Supriya Kannery supri...@linux.vnet.ibm.com Maybe it would make sense to split this into two patches, one for the block.c infrastructure and another one for adding the callbacks in drivers. --- block.c | 46 block/raw-posix.c | 62 ++ block_int.h | 15 + 3 files changed, 114 insertions(+), 9 deletions(-) Index: qemu/block/raw-posix.c === --- qemu.orig/block/raw-posix.c +++ qemu/block/raw-posix.c @@ -279,6 +279,66 @@ static int raw_open(BlockDriverState *bs return raw_open_common(bs, filename, flags, 0); } +static int raw_reopen_prepare(BlockDriverState *bs, BDRVReopenState **rs, + int flags) +{ +BDRVReopenState *raw_rs = g_malloc0(sizeof(BDRVReopenState)); +BDRVRawState *s = bs-opaque; + +raw_rs-bs = bs; +raw_rs-reopen_flags = flags; +raw_rs-reopen_fd = -1; + +/* If only O_DIRECT to be toggled, use fcntl */ +if (!((bs-open_flags ~BDRV_O_NOCACHE) ^ +(flags ~BDRV_O_NOCACHE))) { +raw_rs-reopen_fd = dup(s-fd); +if (raw_rs-reopen_fd = 0) { +return -1; This leaks raw_rs. +} +} + +/* TBD: Handle O_DSYNC and other flags */ +*rs = raw_rs; +return 0; +} + +static int raw_reopen_commit(BDRVReopenState *rs) bdrv_reopen_commit must never fail. Any error that can happen must already be handled in bdrv_reopen_prepare. The commit function should really only do s-fd = rs-reopen_fd (besides cleanup), everything else should already be prepared. +{ +BlockDriverState *bs = rs-bs; +BDRVRawState *s = bs-opaque; + +if (!rs-reopen_fd) { +return -1; +} + +int ret = fcntl_setfl(rs-reopen_fd, rs-reopen_flags); reopen_flags is BDRV_O_*, not O_*, so it needs to be translated. +if (ret 0) { +return ret; +} + +/* Set new flags, so replace old fd with new one */ +close(s-fd); +s-fd = rs-reopen_fd; +s-open_flags = bs-open_flags = rs-reopen_flags; +g_free(rs); +rs = NULL; Setting to NULL has no effect, rs isn't read afterwards. + +return 0; +} + +static int raw_reopen_abort(BDRVReopenState *rs) This must not fail either, so it can be void, too. +{ +if (rs-reopen_fd != -1) { +close(rs-reopen_fd); +rs-reopen_fd = -1; +rs-reopen_flags = 0; +} +g_free(rs); +rs = NULL; +return 0; +} + /* XXX: use host sector size if necessary with: #ifdef DIOCGSECTORSIZE { @@ -896,6 +956,9 @@ static BlockDriver bdrv_file = { .instance_size = sizeof(BDRVRawState), .bdrv_probe = NULL, /* no probe for protocols */ .bdrv_file_open = raw_open, +.bdrv_reopen_prepare = raw_reopen_prepare, +.bdrv_reopen_commit = raw_reopen_commit, +.bdrv_reopen_abort = raw_reopen_abort, .bdrv_read = raw_read, .bdrv_write = raw_write, .bdrv_close = raw_close, @@ -1166,6 +1229,10 @@ static BlockDriver bdrv_host_device = { .instance_size = sizeof(BDRVRawState), .bdrv_probe_device = hdev_probe_device, .bdrv_file_open = hdev_open, +.bdrv_reopen_prepare += raw_reopen_prepare, +.bdrv_reopen_commit = raw_reopen_commit, +.bdrv_reopen_abort = raw_reopen_abort, .bdrv_close = raw_close, .bdrv_create= hdev_create, .create_options = raw_create_options, Index: qemu/block.c === --- qemu.orig/block.c +++ qemu/block.c @@ -706,6 +706,7 @@ int bdrv_reopen(BlockDriverState *bs, in { BlockDriver *drv = bs-drv; int ret = 0, open_flags; +BDRVReopenState *rs; /* Quiesce IO for the given block device */ qemu_aio_flush(); @@ -713,20 +714,48 @@ int bdrv_reopen(BlockDriverState *bs, in qerror_report(QERR_DATA_SYNC_FAILED, bs-device_name); return ret; } -open_flags = bs-open_flags; -bdrv_close(bs); -ret = bdrv_open(bs, bs-filename, bdrv_flags, drv); -if (ret 0) { -/* Reopen failed. Try to open with original flags */ -qerror_report(QERR_REOPEN_FILE_FAILED, bs-filename); -ret = bdrv_open(bs, bs-filename, open_flags, drv); +/* Use driver specific reopen() if available */ +if (drv-bdrv_reopen_prepare) { +ret = drv-bdrv_reopen_prepare(bs, rs, bdrv_flags); if (ret 0) { -/* Reopen failed with orig and modified flags */ -abort(); +
Re: [Qemu-devel] [RFC PATCH v2] Specification for qcow2 version 3
Am 12.10.2011 16:37, schrieb Stefan Hajnoczi: On Wed, Oct 12, 2011 at 2:31 PM, Kevin Wolf kw...@redhat.com wrote: Am 12.10.2011 14:51, schrieb Stefan Hajnoczi: Also a bit in l2 offset to say there is no l2 table cause all clusters in l2 are contiguous so we avoid entirely l2. Obviously this require an optimization step to detect or create such condition. There are several reserved L1 entry bits which could be used to mark this mode. This mode severely restricts qcow2 features though: how would snapshots and COW work? Perhaps by breaking the huge cluster back into an L2 table with individual clusters? Backing files also cannot be used - unless we extend the sub-clusters approach and also keep a large bitmap with allocated/unallocated/zero information. A mode like this could be used for best performance on local storage, where efficiently image transport (e.g. scp or http) is not required. Actually I think this is reasonable, we could use qemu-img convert to produce a compact qcow2 for export and use the L2-less qcow2 for running the actual VM. Kevin: what do you think about fleshing out this mode instead of sub-clusters? I'm hesitant to something like this as it adds quite some complexity and I'm not sure if there are practical use cases for it at all. If you take the current cluster sizes, an L2 table contains 512 MB of data, so you would lose any sparseness. You would probably already get full allocation just by creating a file system on the image. But even if you do have a use case where sparseness doesn't matter, the effect is very much the same as allowing a 512 MB cluster size and not changing any of the qcow2 internals. I guess I'm thinking of the 512 MB cluster size situation, because we'd definitely want a cow bitmap in order to keep backing files and sparseness. (What would the use case be? Backing files or snapshots with a COW granularity of 512 MB isn't going to fly. That leaves only something like encryption.) COW granularity needs to stay at 64-256 kb since those are reasonable request sizes for COW. But how do you do that without L2 tables? What you're describing (different sizes for allocation and COW) is exactly what subclusters are doing. I can't see how switching to 512 MB clusters and a single-level table can make that work. Kevin
[Qemu-devel] [PATCH 07/10] scsi: make reqops const
Also delete a stale occurrence of SCSIReqOps inside SCSIDeviceInfo. Signed-off-by: Paolo Bonzini pbonz...@redhat.com --- hw/scsi-bus.c | 10 +- hw/scsi-disk.c|2 +- hw/scsi-generic.c |2 +- hw/scsi.h |7 +++ 4 files changed, 10 insertions(+), 11 deletions(-) diff --git a/hw/scsi-bus.c b/hw/scsi-bus.c index bdd6e94..252e903 100644 --- a/hw/scsi-bus.c +++ b/hw/scsi-bus.c @@ -160,7 +160,7 @@ static int32_t scsi_invalid_command(SCSIRequest *req, uint8_t *buf) return 0; } -struct SCSIReqOps reqops_invalid_opcode = { +const struct SCSIReqOps reqops_invalid_opcode = { .size = sizeof(SCSIRequest), .send_command = scsi_invalid_command }; @@ -178,7 +178,7 @@ static int32_t scsi_unit_attention(SCSIRequest *req, uint8_t *buf) return 0; } -struct SCSIReqOps reqops_unit_attention = { +const struct SCSIReqOps reqops_unit_attention = { .size = sizeof(SCSIRequest), .send_command = scsi_unit_attention }; @@ -386,7 +386,7 @@ static uint8_t *scsi_target_get_buf(SCSIRequest *req) return r-buf; } -struct SCSIReqOps reqops_target_command = { +const struct SCSIReqOps reqops_target_command = { .size = sizeof(SCSITargetReq), .send_command = scsi_target_send_command, .read_data= scsi_target_read_data, @@ -394,8 +394,8 @@ struct SCSIReqOps reqops_target_command = { }; -SCSIRequest *scsi_req_alloc(SCSIReqOps *reqops, SCSIDevice *d, uint32_t tag, -uint32_t lun, void *hba_private) +SCSIRequest *scsi_req_alloc(const SCSIReqOps *reqops, SCSIDevice *d, +uint32_t tag, uint32_t lun, void *hba_private) { SCSIRequest *req; diff --git a/hw/scsi-disk.c b/hw/scsi-disk.c index a4daa29..6c1d5a2 100644 --- a/hw/scsi-disk.c +++ b/hw/scsi-disk.c @@ -1583,7 +1583,7 @@ static int scsi_disk_initfn(SCSIDevice *dev) } } -static SCSIReqOps scsi_disk_reqops = { +static const SCSIReqOps scsi_disk_reqops = { .size = sizeof(SCSIDiskReq), .free_req = scsi_free_request, .send_command = scsi_send_command, diff --git a/hw/scsi-generic.c b/hw/scsi-generic.c index 7b291ec..9c9f64a 100644 --- a/hw/scsi-generic.c +++ b/hw/scsi-generic.c @@ -432,7 +432,7 @@ static int scsi_generic_initfn(SCSIDevice *s) return 0; } -static SCSIReqOps scsi_generic_req_ops = { +static const SCSIReqOps scsi_generic_req_ops = { .size = sizeof(SCSIGenericReq), .free_req = scsi_free_request, .send_command = scsi_send_command, diff --git a/hw/scsi.h b/hw/scsi.h index c8649cf..832682e 100644 --- a/hw/scsi.h +++ b/hw/scsi.h @@ -41,7 +41,7 @@ struct SCSICommand { struct SCSIRequest { SCSIBus *bus; SCSIDevice*dev; -SCSIReqOps*ops; +const SCSIReqOps *ops; uint32_t refcount; uint32_t tag; uint32_t lun; @@ -95,7 +95,6 @@ struct SCSIDeviceInfo { SCSIRequest *(*alloc_req)(SCSIDevice *s, uint32_t tag, uint32_t lun, void *hba_private); void (*unit_attention_reported)(SCSIDevice *s); -SCSIReqOps reqops; }; struct SCSIBusInfo { @@ -175,8 +174,8 @@ extern const struct SCSISense sense_code_DEVICE_INTERNAL_RESET; int scsi_sense_valid(SCSISense sense); -SCSIRequest *scsi_req_alloc(SCSIReqOps *reqops, SCSIDevice *d, uint32_t tag, -uint32_t lun, void *hba_private); +SCSIRequest *scsi_req_alloc(const SCSIReqOps *reqops, SCSIDevice *d, +uint32_t tag, uint32_t lun, void *hba_private); SCSIRequest *scsi_req_new(SCSIDevice *d, uint32_t tag, uint32_t lun, uint8_t *buf, void *hba_private); int32_t scsi_req_enqueue(SCSIRequest *req); -- 1.7.6
[Qemu-devel] [PATCH 05/10] scsi-disk: do not duplicate BlockDriverState member
Same as for scsi-generic, avoid duplication even if it causes longer lines. Signed-off-by: Paolo Bonzini pbonz...@redhat.com --- hw/scsi-disk.c | 94 +++ 1 files changed, 46 insertions(+), 48 deletions(-) diff --git a/hw/scsi-disk.c b/hw/scsi-disk.c index 860a3bf..5e3ef51 100644 --- a/hw/scsi-disk.c +++ b/hw/scsi-disk.c @@ -65,7 +65,6 @@ typedef struct SCSIDiskReq { struct SCSIDiskState { SCSIDevice qdev; -BlockDriverState *bs; /* The qemu block layer uses a fixed 512 byte sector size. This is the number of 512 byte blocks in a single scsi sector. */ int cluster_size; @@ -119,7 +118,7 @@ static uint32_t scsi_init_iovec(SCSIDiskReq *r) if (!r-iov.iov_base) { r-buflen = SCSI_DMA_BUF_SIZE; -r-iov.iov_base = qemu_blockalign(s-bs, r-buflen); +r-iov.iov_base = qemu_blockalign(s-qdev.conf.bs, r-buflen); } r-iov.iov_len = MIN(r-sector_count * 512, r-buflen); qemu_iovec_init_external(r-qiov, r-iov, 1); @@ -134,7 +133,7 @@ static void scsi_read_complete(void * opaque, int ret) if (r-req.aiocb != NULL) { r-req.aiocb = NULL; -bdrv_acct_done(s-bs, r-acct); +bdrv_acct_done(s-qdev.conf.bs, r-acct); } if (ret) { @@ -160,7 +159,7 @@ static void scsi_flush_complete(void * opaque, int ret) if (r-req.aiocb != NULL) { r-req.aiocb = NULL; -bdrv_acct_done(s-bs, r-acct); +bdrv_acct_done(s-qdev.conf.bs, r-acct); } if (ret 0) { @@ -210,8 +209,8 @@ static void scsi_read_data(SCSIRequest *req) /* Save a ref for scsi_read_complete, in case r is canceled. */ scsi_req_ref(r-req); n = scsi_init_iovec(r); -bdrv_acct_start(s-bs, r-acct, n * BDRV_SECTOR_SIZE, BDRV_ACCT_READ); -r-req.aiocb = bdrv_aio_readv(s-bs, r-sector, r-qiov, n, +bdrv_acct_start(s-qdev.conf.bs, r-acct, n * BDRV_SECTOR_SIZE, BDRV_ACCT_READ); +r-req.aiocb = bdrv_aio_readv(s-qdev.conf.bs, r-sector, r-qiov, n, scsi_read_complete, r); if (r-req.aiocb == NULL) { scsi_read_complete(r, -EIO); @@ -222,10 +221,10 @@ static int scsi_handle_rw_error(SCSIDiskReq *r, int error, int type) { int is_read = (type == SCSI_REQ_STATUS_RETRY_READ); SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, r-req.dev); -BlockErrorAction action = bdrv_get_on_error(s-bs, is_read); +BlockErrorAction action = bdrv_get_on_error(s-qdev.conf.bs, is_read); if (action == BLOCK_ERR_IGNORE) { -bdrv_mon_event(s-bs, BDRV_ACTION_IGNORE, is_read); +bdrv_mon_event(s-qdev.conf.bs, BDRV_ACTION_IGNORE, is_read); return 0; } @@ -235,9 +234,9 @@ static int scsi_handle_rw_error(SCSIDiskReq *r, int error, int type) type = SCSI_REQ_STATUS_RETRY_TYPE_MASK; r-status |= SCSI_REQ_STATUS_RETRY | type; -bdrv_mon_event(s-bs, BDRV_ACTION_STOP, is_read); +bdrv_mon_event(s-qdev.conf.bs, BDRV_ACTION_STOP, is_read); vm_stop(RUN_STATE_IO_ERROR); -bdrv_iostatus_set_err(s-bs, error); +bdrv_iostatus_set_err(s-qdev.conf.bs, error); return 1; } else { switch (error) { @@ -254,7 +253,7 @@ static int scsi_handle_rw_error(SCSIDiskReq *r, int error, int type) scsi_check_condition(r, SENSE_CODE(IO_ERROR)); break; } -bdrv_mon_event(s-bs, BDRV_ACTION_REPORT, is_read); +bdrv_mon_event(s-qdev.conf.bs, BDRV_ACTION_REPORT, is_read); return 0; } } @@ -267,7 +266,7 @@ static void scsi_write_complete(void * opaque, int ret) if (r-req.aiocb != NULL) { r-req.aiocb = NULL; -bdrv_acct_done(s-bs, r-acct); +bdrv_acct_done(s-qdev.conf.bs, r-acct); } if (ret) { @@ -311,8 +310,8 @@ static void scsi_write_data(SCSIRequest *req) if (s-tray_open) { scsi_write_complete(r, -ENOMEDIUM); } -bdrv_acct_start(s-bs, r-acct, n * BDRV_SECTOR_SIZE, BDRV_ACCT_WRITE); -r-req.aiocb = bdrv_aio_writev(s-bs, r-sector, r-qiov, n, +bdrv_acct_start(s-qdev.conf.bs, r-acct, n * BDRV_SECTOR_SIZE, BDRV_ACCT_WRITE); +r-req.aiocb = bdrv_aio_writev(s-qdev.conf.bs, r-sector, r-qiov, n, scsi_write_complete, r); if (r-req.aiocb == NULL) { scsi_write_complete(r, -ENOMEM); @@ -450,7 +449,7 @@ static int scsi_disk_emulate_inquiry(SCSIRequest *req, uint8_t *outbuf) case 0x83: /* Device identification page, mandatory */ { int max_len = 255 - 8; -int id_len = strlen(bdrv_get_device_name(s-bs)); +int id_len = strlen(bdrv_get_device_name(s-qdev.conf.bs)); if (id_len max_len) id_len = max_len; @@ -463,7 +462,7 @@ static int scsi_disk_emulate_inquiry(SCSIRequest *req, uint8_t *outbuf) outbuf[buflen++] = 0; // reserved
[Qemu-devel] [PATCH 08/10] scsi: export scsi_generic_reqops
Signed-off-by: Paolo Bonzini pbonz...@redhat.com --- hw/scsi-generic.c |2 +- hw/scsi.h |3 +++ 2 files changed, 4 insertions(+), 1 deletions(-) diff --git a/hw/scsi-generic.c b/hw/scsi-generic.c index 9c9f64a..eb066ba 100644 --- a/hw/scsi-generic.c +++ b/hw/scsi-generic.c @@ -432,7 +432,7 @@ static int scsi_generic_initfn(SCSIDevice *s) return 0; } -static const SCSIReqOps scsi_generic_req_ops = { +const SCSIReqOps scsi_generic_req_ops = { .size = sizeof(SCSIGenericReq), .free_req = scsi_free_request, .send_command = scsi_send_command, diff --git a/hw/scsi.h b/hw/scsi.h index 832682e..c23 100644 --- a/hw/scsi.h +++ b/hw/scsi.h @@ -196,4 +196,7 @@ void scsi_device_purge_requests(SCSIDevice *sdev, SCSISense sense); int scsi_device_get_sense(SCSIDevice *dev, uint8_t *buf, int len, bool fixed); SCSIDevice *scsi_device_find(SCSIBus *bus, int channel, int target, int lun); +/* scsi-generic.c. */ +extern const SCSIReqOps scsi_generic_req_ops; + #endif -- 1.7.6
[Qemu-devel] [PATCH 10/10] scsi-disk: add scsi-block for device passthrough
Signed-off-by: Paolo Bonzini pbonz...@redhat.com --- hw/scsi-disk.c | 116 1 files changed, 116 insertions(+), 0 deletions(-) diff --git a/hw/scsi-disk.c b/hw/scsi-disk.c index 835cc7f..c97ef52 100644 --- a/hw/scsi-disk.c +++ b/hw/scsi-disk.c @@ -39,6 +39,10 @@ do { fprintf(stderr, scsi-disk: fmt , ## __VA_ARGS__); } while (0) #include blockdev.h #include block_int.h +#ifdef __linux +#include scsi/sg.h +#endif + #define SCSI_DMA_BUF_SIZE131072 #define SCSI_MAX_INQUIRY_LEN 256 @@ -1603,6 +1607,103 @@ static SCSIRequest *scsi_new_request(SCSIDevice *d, uint32_t tag, uint32_t lun, return req; } +#ifdef __linux__ +static int get_device_type(SCSIDiskState *s) +{ +BlockDriverState *bdrv = s-qdev.conf.bs; +uint8_t cmd[16]; +uint8_t buf[36]; +uint8_t sensebuf[8]; +sg_io_hdr_t io_header; +int ret; + +memset(cmd, 0, sizeof(cmd)); +memset(buf, 0, sizeof(buf)); +cmd[0] = INQUIRY; +cmd[4] = sizeof(buf); + +memset(io_header, 0, sizeof(io_header)); +io_header.interface_id = 'S'; +io_header.dxfer_direction = SG_DXFER_FROM_DEV; +io_header.dxfer_len = sizeof(buf); +io_header.dxferp = buf; +io_header.cmdp = cmd; +io_header.cmd_len = sizeof(cmd); +io_header.mx_sb_len = sizeof(sensebuf); +io_header.sbp = sensebuf; +io_header.timeout = 6000; /* XXX */ + +ret = bdrv_ioctl(bdrv, SG_IO, io_header); +if (ret 0 || io_header.driver_status || io_header.host_status) { +return -1; +} +s-qdev.blocksize = s-qdev.conf.logical_block_size; +s-qdev.type = buf[0]; +s-removable = (buf[1] 0x80) != 0; +return 0; +} + +static int scsi_block_initfn(SCSIDevice *dev) +{ +SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, dev); +int sg_version; +int rc; + +if (!s-qdev.conf.bs) { +error_report(scsi-block: drive property not set); +return -1; +} + +/* check we are using a driver managing SG_IO (version 3 and after) */ +if (bdrv_ioctl(s-qdev.conf.bs, SG_GET_VERSION_NUM, sg_version) 0 || +sg_version 3) { +error_report(scsi-block: scsi generic interface too old); +return -1; +} + +/* get device type from INQUIRY data */ +rc = get_device_type(s); +if (rc 0) { +error_report(scsi-block: INQUIRY failed); +return -1; +} + +return scsi_initfn(s-qdev); +} + +static SCSIRequest *scsi_block_new_request(SCSIDevice *d, uint32_t tag, + uint32_t lun, uint8_t *buf, + void *hba_private) +{ +SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, d); + +switch (buf[0]) { +case SERVICE_ACTION_IN_16: +if ((buf[1] 31) != SAI_READ_CAPACITY_16) { +break; +} + +case READ_6: +case READ_10: +case READ_12: +case READ_16: +case READ_CAPACITY_10: +case WRITE_6: +case WRITE_10: +case WRITE_12: +case WRITE_16: +case WRITE_VERIFY_10: +case WRITE_VERIFY_12: +case WRITE_VERIFY_16: +return scsi_req_alloc(scsi_disk_reqops, s-qdev, tag, lun, + hba_private); +} + +return scsi_req_alloc(scsi_generic_req_ops, s-qdev, tag, lun, + hba_private); +} +#endif + #define DEFINE_SCSI_DISK_PROPERTIES() \ DEFINE_BLOCK_PROPERTIES(SCSIDiskState, qdev.conf), \ DEFINE_PROP_STRING(ver, SCSIDiskState, version), \ @@ -1638,6 +1739,21 @@ static SCSIDeviceInfo scsi_disk_info[] = { DEFINE_SCSI_DISK_PROPERTIES(), DEFINE_PROP_END_OF_LIST(), }, +#ifdef __linux__ +},{ +.qdev.name= scsi-block, +.qdev.fw_name = disk, +.qdev.desc= SCSI block device passthrough, +.qdev.size= sizeof(SCSIDiskState), +.qdev.reset = scsi_disk_reset, +.init = scsi_block_initfn, +.destroy = scsi_destroy, +.alloc_req= scsi_block_new_request, +.qdev.props = (Property[]) { +DEFINE_SCSI_DISK_PROPERTIES(), +DEFINE_PROP_END_OF_LIST(), +}, +#endif },{ .qdev.name= scsi-disk, /* legacy -device scsi-disk */ .qdev.fw_name = disk, -- 1.7.6
[Qemu-devel] [PATCH 01/10] scsi-generic: drop SCSIGenericState
It is not needed, because s-bs is already stored in SCSIDevice, and can be reached from the conf.bs member. Signed-off-by: Paolo Bonzini pbonz...@redhat.com --- hw/scsi-generic.c | 85 +++-- 1 files changed, 37 insertions(+), 48 deletions(-) diff --git a/hw/scsi-generic.c b/hw/scsi-generic.c index 2696fa1..1e5c41b 100644 --- a/hw/scsi-generic.c +++ b/hw/scsi-generic.c @@ -46,8 +46,6 @@ do { fprintf(stderr, scsi-generic: fmt , ## __VA_ARGS__); } while (0) #define MAX_UINT ((unsigned int)-1) #endif -typedef struct SCSIGenericState SCSIGenericState; - typedef struct SCSIGenericReq { SCSIRequest req; uint8_t *buf; @@ -56,12 +54,6 @@ typedef struct SCSIGenericReq { sg_io_hdr_t io_header; } SCSIGenericReq; -struct SCSIGenericState -{ -SCSIDevice qdev; -BlockDriverState *bs; -}; - static void scsi_free_request(SCSIRequest *req) { SCSIGenericReq *r = DO_UPCAST(SCSIGenericReq, req, req); @@ -174,7 +166,7 @@ static void scsi_read_complete(void * opaque, int ret) static void scsi_read_data(SCSIRequest *req) { SCSIGenericReq *r = DO_UPCAST(SCSIGenericReq, req, req); -SCSIGenericState *s = DO_UPCAST(SCSIGenericState, qdev, r-req.dev); +SCSIDevice *s = r-req.dev; int ret; DPRINTF(scsi_read_data 0x%x\n, req-tag); @@ -183,7 +175,7 @@ static void scsi_read_data(SCSIRequest *req) return; } -ret = execute_command(s-bs, r, SG_DXFER_FROM_DEV, scsi_read_complete); +ret = execute_command(s-conf.bs, r, SG_DXFER_FROM_DEV, scsi_read_complete); if (ret 0) { scsi_command_complete(r, ret); return; @@ -193,7 +185,7 @@ static void scsi_read_data(SCSIRequest *req) static void scsi_write_complete(void * opaque, int ret) { SCSIGenericReq *r = (SCSIGenericReq *)opaque; -SCSIGenericState *s = DO_UPCAST(SCSIGenericState, qdev, r-req.dev); +SCSIDevice *s = r-req.dev; DPRINTF(scsi_write_complete() ret = %d\n, ret); r-req.aiocb = NULL; @@ -204,9 +196,9 @@ static void scsi_write_complete(void * opaque, int ret) } if (r-req.cmd.buf[0] == MODE_SELECT r-req.cmd.buf[4] == 12 -s-qdev.type == TYPE_TAPE) { -s-qdev.blocksize = (r-buf[9] 16) | (r-buf[10] 8) | r-buf[11]; -DPRINTF(block size %d\n, s-qdev.blocksize); +s-type == TYPE_TAPE) { +s-blocksize = (r-buf[9] 16) | (r-buf[10] 8) | r-buf[11]; +DPRINTF(block size %d\n, s-blocksize); } scsi_command_complete(r, ret); @@ -216,8 +208,8 @@ static void scsi_write_complete(void * opaque, int ret) The transfer may complete asynchronously. */ static void scsi_write_data(SCSIRequest *req) { -SCSIGenericState *s = DO_UPCAST(SCSIGenericState, qdev, req-dev); SCSIGenericReq *r = DO_UPCAST(SCSIGenericReq, req, req); +SCSIDevice *s = r-req.dev; int ret; DPRINTF(scsi_write_data 0x%x\n, req-tag); @@ -227,7 +219,7 @@ static void scsi_write_data(SCSIRequest *req) return; } -ret = execute_command(s-bs, r, SG_DXFER_TO_DEV, scsi_write_complete); +ret = execute_command(s-conf.bs, r, SG_DXFER_TO_DEV, scsi_write_complete); if (ret 0) { scsi_command_complete(r, ret); } @@ -261,8 +253,8 @@ static void scsi_req_fixup(SCSIRequest *req) static int32_t scsi_send_command(SCSIRequest *req, uint8_t *cmd) { -SCSIGenericState *s = DO_UPCAST(SCSIGenericState, qdev, req-dev); SCSIGenericReq *r = DO_UPCAST(SCSIGenericReq, req, req); +SCSIDevice *s = r-req.dev; int ret; scsi_req_fixup(r-req); @@ -285,7 +277,7 @@ static int32_t scsi_send_command(SCSIRequest *req, uint8_t *cmd) g_free(r-buf); r-buflen = 0; r-buf = NULL; -ret = execute_command(s-bs, r, SG_DXFER_NONE, scsi_command_complete); +ret = execute_command(s-conf.bs, r, SG_DXFER_NONE, scsi_command_complete); if (ret 0) { scsi_command_complete(r, ret); return 0; @@ -372,77 +364,74 @@ static int get_stream_blocksize(BlockDriverState *bdrv) static void scsi_generic_reset(DeviceState *dev) { -SCSIGenericState *s = DO_UPCAST(SCSIGenericState, qdev.qdev, dev); +SCSIDevice *s = DO_UPCAST(SCSIDevice, qdev, dev); -scsi_device_purge_requests(s-qdev, SENSE_CODE(RESET)); +scsi_device_purge_requests(s, SENSE_CODE(RESET)); } -static void scsi_destroy(SCSIDevice *d) +static void scsi_destroy(SCSIDevice *s) { -SCSIGenericState *s = DO_UPCAST(SCSIGenericState, qdev, d); - -scsi_device_purge_requests(s-qdev, SENSE_CODE(NO_SENSE)); -blockdev_mark_auto_del(s-qdev.conf.bs); +scsi_device_purge_requests(s, SENSE_CODE(NO_SENSE)); +blockdev_mark_auto_del(s-conf.bs); } -static int scsi_generic_initfn(SCSIDevice *dev) +static int scsi_generic_initfn(SCSIDevice *s) { -SCSIGenericState *s = DO_UPCAST(SCSIGenericState, qdev, dev); int sg_version; struct sg_scsi_id scsiid; -if (!s-qdev.conf.bs)
[Qemu-devel] [PATCH 09/10] scsi: pass cdb to alloc_req
This will let scsi-block choose between passthrough and emulation. Signed-off-by: Paolo Bonzini pbonz...@redhat.com --- hw/scsi-bus.c |2 +- hw/scsi-disk.c|4 ++-- hw/scsi-generic.c |2 +- hw/scsi.h |2 +- 4 files changed, 5 insertions(+), 5 deletions(-) diff --git a/hw/scsi-bus.c b/hw/scsi-bus.c index 252e903..72c0dd2 100644 --- a/hw/scsi-bus.c +++ b/hw/scsi-bus.c @@ -451,7 +451,7 @@ SCSIRequest *scsi_req_new(SCSIDevice *d, uint32_t tag, uint32_t lun, req = scsi_req_alloc(reqops_target_command, d, tag, lun, hba_private); } else { -req = d-info-alloc_req(d, tag, lun, hba_private); +req = d-info-alloc_req(d, tag, lun, buf, hba_private); } } diff --git a/hw/scsi-disk.c b/hw/scsi-disk.c index 6c1d5a2..835cc7f 100644 --- a/hw/scsi-disk.c +++ b/hw/scsi-disk.c @@ -1593,8 +1593,8 @@ static const SCSIReqOps scsi_disk_reqops = { .get_buf = scsi_get_buf, }; -static SCSIRequest *scsi_new_request(SCSIDevice *d, uint32_t tag, - uint32_t lun, void *hba_private) +static SCSIRequest *scsi_new_request(SCSIDevice *d, uint32_t tag, uint32_t lun, + uint8_t *buf, void *hba_private) { SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, d); SCSIRequest *req; diff --git a/hw/scsi-generic.c b/hw/scsi-generic.c index eb066ba..9091e13 100644 --- a/hw/scsi-generic.c +++ b/hw/scsi-generic.c @@ -443,7 +443,7 @@ const SCSIReqOps scsi_generic_req_ops = { }; static SCSIRequest *scsi_new_request(SCSIDevice *d, uint32_t tag, uint32_t lun, - void *hba_private) + uint8_t *buf, void *hba_private) { SCSIRequest *req; diff --git a/hw/scsi.h b/hw/scsi.h index c23..aa816e6 100644 --- a/hw/scsi.h +++ b/hw/scsi.h @@ -93,7 +93,7 @@ struct SCSIDeviceInfo { scsi_qdev_initfn init; void (*destroy)(SCSIDevice *s); SCSIRequest *(*alloc_req)(SCSIDevice *s, uint32_t tag, uint32_t lun, - void *hba_private); + uint8_t *buf, void *hba_private); void (*unit_attention_reported)(SCSIDevice *s); }; -- 1.7.6
Re: [Qemu-devel] [PATCH V4 2/2] hw/9pfs: Add readonly support for 9p export
On Wednesday, October 12, 2011 07:38:25 PM Aneesh Kumar K.V wrote: On Wed, 12 Oct 2011 13:23:34 +0530, M. Mohan Kumar mo...@in.ibm.com wrote: A new fsdev parameter readonly is introduced to control accessing 9p export. readonly=on|off can be used to specify the access type. By default rw access is given. Signed-off-by: M. Mohan Kumar mo...@in.ibm.com --- Changes from previous version V3: * Use opt_set_bool function to set readonly option * Change the flag from MS_READONLY to 9p specific Change from previous version V2: * QEMU_OPT_BOOL is used for readdonly parameter Changes from previous version: * Use readonly option instead of access * Change function return type to boolean where its needed fsdev/file-op-9p.h |3 +- fsdev/qemu-fsdev.c | 12 +- fsdev/qemu-fsdev.h |1 + hw/9pfs/virtio-9p-device.c |3 ++ hw/9pfs/virtio-9p.c| 46 qemu-config.c |7 ++ vl.c |2 + 7 files changed, 71 insertions(+), 3 deletions(-) diff --git a/fsdev/file-op-9p.h b/fsdev/file-op-9p.h index 33fb07f..b75290d 100644 --- a/fsdev/file-op-9p.h +++ b/fsdev/file-op-9p.h @@ -58,7 +58,8 @@ typedef struct extended_ops { } extended_ops; /* FsContext flag values */ -#define PATHNAME_FSCONTEXT 0x1 +#define PATHNAME_FSCONTEXT 0x1 why ? It was part of alignment change for above line +#define P9_RDONLY_EXPORT0x2 /* cache flags */ #define V9FS_WRITETHROUGH_CACHE 0x1 diff --git a/fsdev/qemu-fsdev.c b/fsdev/qemu-fsdev.c index d08ba9c..f8a8227 100644 --- a/fsdev/qemu-fsdev.c +++ b/fsdev/qemu-fsdev.c @@ -29,13 +29,13 @@ static FsTypeTable FsTypes[] = { int qemu_fsdev_add(QemuOpts *opts) { struct FsTypeListEntry *fsle; -int i; +int i, flags = 0; const char *fsdev_id = qemu_opts_id(opts); const char *fstype = qemu_opt_get(opts, fstype); const char *path = qemu_opt_get(opts, path); const char *sec_model = qemu_opt_get(opts, security_model); const char *cache = qemu_opt_get(opts, cache); - +int rdonly = qemu_opt_get_bool(opts, readonly, 0); if (!fsdev_id) { fprintf(stderr, fsdev: No id specified\n); @@ -76,6 +76,14 @@ int qemu_fsdev_add(QemuOpts *opts) fsle-fse.cache_flags = V9FS_WRITETHROUGH_CACHE; } } +if (rdonly) { +flags |= P9_RDONLY_EXPORT; +} else { +flags = ~P9_RDONLY_EXPORT; +} + +fsle-fse.flags = flags; + QTAILQ_INSERT_TAIL(fstype_entries, fsle, next); return 0; } diff --git a/fsdev/qemu-fsdev.h b/fsdev/qemu-fsdev.h index 0f67880..2938eaf 100644 --- a/fsdev/qemu-fsdev.h +++ b/fsdev/qemu-fsdev.h @@ -44,6 +44,7 @@ typedef struct FsTypeEntry { char *security_model; int cache_flags; FileOperations *ops; +int flags; do we need extra flags ? Why not use cache_flags ? renaming that to export_flags ? Yes, we can use same variable. } FsTypeEntry; typedef struct FsTypeListEntry { diff --git a/hw/9pfs/virtio-9p-device.c b/hw/9pfs/virtio-9p-device.c index 1846e36..336292c 100644 --- a/hw/9pfs/virtio-9p-device.c +++ b/hw/9pfs/virtio-9p-device.c @@ -125,6 +125,9 @@ VirtIODevice *virtio_9p_init(DeviceState *dev, V9fsConf *conf) s-tag_len = len; s-ctx.uid = -1; s-ctx.flags = 0; +if (fse-flags P9_RDONLY_EXPORT) { +s-ctx.flags |= P9_RDONLY_EXPORT; +} s-ops = fse-ops; s-vdev.get_features = virtio_9p_get_features; diff --git a/hw/9pfs/virtio-9p.c b/hw/9pfs/virtio-9p.c index 47ed2f1..9f15787 100644 --- a/hw/9pfs/virtio-9p.c +++ b/hw/9pfs/virtio-9p.c @@ -1271,6 +1271,11 @@ static void v9fs_fix_path(V9fsPath *dst, V9fsPath *src, int len) dst-size++; } +static inline bool is_ro_export(FsContext *fs_ctx) +{ +return fs_ctx-flags P9_RDONLY_EXPORT; +} + static void v9fs_version(void *opaque) { V9fsPDU *pdu = opaque; @@ -1690,6 +1695,14 @@ static void v9fs_open(void *opaque) } else { flags = omode_to_uflags(mode); } +if (is_ro_export(s-ctx)) { +if (mode O_WRONLY || mode O_RDWR || mode O_APPEND) { +err = -EROFS; +goto out; +} else { +flags |= O_NOATIME; +} +} What about O_TRUNC ? Thanks, I will include the check for O_TRUNC err = v9fs_co_open(pdu, fidp, flags); if (err 0) { goto out; @@ -3301,6 +3314,33 @@ static void v9fs_op_not_supp(void *opaque) complete_pdu(pdu-s, pdu, -EOPNOTSUPP);
Re: [Qemu-devel] [PATCH 1/2] hw/9pfs: Add new virtfs option cache=writethrough to skip host page cache
On Wed, 12 Oct 2011 15:32:36 +0100, Stefan Hajnoczi stefa...@gmail.com wrote: On Wed, Oct 12, 2011 at 2:19 PM, Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com wrote: On Wed, 12 Oct 2011 10:55:21 +0100, Stefan Hajnoczi stefa...@gmail.com wrote: On Mon, Oct 10, 2011 at 10:06 AM, Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com wrote: diff --git a/hw/9pfs/virtio-9p-handle.c b/hw/9pfs/virtio-9p-handle.c index 5c8b5ed..441a37f 100644 --- a/hw/9pfs/virtio-9p-handle.c +++ b/hw/9pfs/virtio-9p-handle.c @@ -202,6 +202,15 @@ static ssize_t handle_pwritev(FsContext *ctx, int fd, const struct iovec *iov, return writev(fd, iov, iovcnt); The sync_file_range(2) call below is dead-code since we'll return immediately after writev(2) completes. The writev(2) return value needs to be saved temporarily and then returned after sync_file_range(2). Missed that. Will fix in the next update } #endif + if (ctx-cache_flags V9FS_WRITETHROUGH_CACHE) { -drive cache=writethrough means something different from 9pfs writethrough. This is confusing so I wonder if there is a better name like immediate write-out. cache=immediate-write-out ? + /* + * Initiate a writeback. This is not a data integrity sync. + * We want to ensure that we don't leave dirty pages in the cache + * after write when cache=writethrough is sepcified. + */ + sync_file_range(fd, offset, 0, + SYNC_FILE_RANGE_WAIT_BEFORE | SYNC_FILE_RANGE_WRITE); + } I'm not sure whether SYNC_FILE_RANGE_WAIT_BEFORE is necessary. As a best-effort mechanism just SYNC_FILE_RANGE_WRITE does the job although a client that rapidly rewrites may be able to leave dirty pages in the host page cache Shouldn't we prevent this ? That is the reason for me to use that WAIT_BEFORE ? The flag will cause sync_file_range(2) to wait on in-flight I/O. The guest will notice slow I/O. You should at least specify a range instead of nbytes=0 in the arguments. version 3 i have commit db3cb2ac561d837176f9e0d02075a898c57ad309 Author: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com Date: Wed Oct 12 19:11:23 2011 +0530 hw/9pfs: Add new virtfs option writeout=immediate skip host page cache writeout=immediate implies the after pwritev we do a sync_file_range. Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com diff --git a/fsdev/file-op-9p.h b/fsdev/file-op-9p.h index 8de8abf..af3ecbe 100644 --- a/fsdev/file-op-9p.h +++ b/fsdev/file-op-9p.h @@ -53,12 +53,16 @@ struct xattr_operations; /* FsContext flag values */ #define PATHNAME_FSCONTEXT 0x1 +/* export flags */ +#define V9FS_IMMEDIATE_WRITEOUT 0x1 + typedef struct FsContext { int flags; char *fs_root; SecModel fs_sm; uid_t uid; +int export_flags; struct xattr_operations **xops; /* fs driver specific data */ void *private; diff --git a/fsdev/qemu-fsdev.c b/fsdev/qemu-fsdev.c index 768819f..946bad7 100644 --- a/fsdev/qemu-fsdev.c +++ b/fsdev/qemu-fsdev.c @@ -34,6 +34,8 @@ int qemu_fsdev_add(QemuOpts *opts) const char *fstype = qemu_opt_get(opts, fstype); const char *path = qemu_opt_get(opts, path); const char *sec_model = qemu_opt_get(opts, security_model); +const char *writeout = qemu_opt_get(opts, writeout); + if (!fsdev_id) { fprintf(stderr, fsdev: No id specified\n); @@ -72,10 +74,14 @@ int qemu_fsdev_add(QemuOpts *opts) fsle-fse.path = g_strdup(path); fsle-fse.security_model = g_strdup(sec_model); fsle-fse.ops = FsTypes[i].ops; - +fsle-fse.export_flags = 0; +if (writeout) { +if (!strcmp(writeout, immediate)) { +fsle-fse.export_flags = V9FS_IMMEDIATE_WRITEOUT; +} +} QTAILQ_INSERT_TAIL(fstype_entries, fsle, next); return 0; - } FsTypeEntry *get_fsdev_fsentry(char *id) diff --git a/fsdev/qemu-fsdev.h b/fsdev/qemu-fsdev.h index e04931a..3b2feb4 100644 --- a/fsdev/qemu-fsdev.h +++ b/fsdev/qemu-fsdev.h @@ -41,6 +41,7 @@ typedef struct FsTypeEntry { char *fsdev_id; char *path; char *security_model; +int export_flags; FileOperations *ops; } FsTypeEntry; diff --git a/hw/9pfs/virtio-9p-device.c b/hw/9pfs/virtio-9p-device.c index e5b68da..403eed0 100644 --- a/hw/9pfs/virtio-9p-device.c +++ b/hw/9pfs/virtio-9p-device.c @@ -115,6 +115,7 @@ VirtIODevice *virtio_9p_init(DeviceState *dev, V9fsConf *conf) exit(1); } +s-ctx.export_flags = fse-export_flags; s-ctx.fs_root = g_strdup(fse-path); len = strlen(conf-tag); if (len MAX_TAG_LEN) { diff --git a/hw/9pfs/virtio-9p-handle.c b/hw/9pfs/virtio-9p-handle.c index 5c8b5ed..91d757a 100644 --- a/hw/9pfs/virtio-9p-handle.c +++ b/hw/9pfs/virtio-9p-handle.c @@ -192,16 +192,27 @@ static ssize_t handle_preadv(FsContext *ctx, int fd, const struct iovec *iov, static ssize_t
[Qemu-devel] [PATCH 04/10] scsi-generic: look at host status
Pass down the host status so that failing transport can be detected by the guest. Similar treatment of host status could be done in virtio-blk, too. Signed-off-by: Paolo Bonzini pbonz...@redhat.com --- hw/scsi-generic.c | 20 1 files changed, 16 insertions(+), 4 deletions(-) diff --git a/hw/scsi-generic.c b/hw/scsi-generic.c index 04549aa..7b291ec 100644 --- a/hw/scsi-generic.c +++ b/hw/scsi-generic.c @@ -39,8 +39,13 @@ do { fprintf(stderr, scsi-generic: fmt , ## __VA_ARGS__); } while (0) #define SCSI_SENSE_BUF_SIZE 96 -#define SG_ERR_DRIVER_TIMEOUT 0x06 -#define SG_ERR_DRIVER_SENSE 0x08 +#define SG_ERR_DRIVER_TIMEOUT 0x06 +#define SG_ERR_DRIVER_SENSE0x08 + +#define SG_ERR_DID_OK 0x00 +#define SG_ERR_DID_NO_CONNECT 0x01 +#define SG_ERR_DID_BUS_BUSY0x02 +#define SG_ERR_DID_TIME_OUT0x03 #ifndef MAX_UINT #define MAX_UINT ((unsigned int)-1) @@ -68,8 +73,9 @@ static void scsi_command_complete(void *opaque, int ret) SCSIGenericReq *r = (SCSIGenericReq *)opaque; r-req.aiocb = NULL; -if (r-io_header.driver_status SG_ERR_DRIVER_SENSE) +if (r-io_header.driver_status SG_ERR_DRIVER_SENSE) { r-req.sense_len = r-io_header.sb_len_wr; +} if (ret != 0) { switch (ret) { @@ -86,9 +92,15 @@ static void scsi_command_complete(void *opaque, int ret) break; } } else { -if (r-io_header.driver_status SG_ERR_DRIVER_TIMEOUT) { +if (r-io_header.host_status == SG_ERR_DID_NO_CONNECT || +r-io_header.host_status == SG_ERR_DID_BUS_BUSY || +r-io_header.host_status == SG_ERR_DID_TIME_OUT || +(r-io_header.driver_status SG_ERR_DRIVER_TIMEOUT)) { status = BUSY; BADF(Driver Timeout\n); +} else if (r-io_header.host_status) { +status = CHECK_CONDITION; +scsi_req_build_sense(r-req, SENSE_CODE(I_T_NEXUS_LOSS)); } else if (r-io_header.status) { status = r-io_header.status; } else if (r-io_header.driver_status SG_ERR_DRIVER_SENSE) { -- 1.7.6
[Qemu-devel] [PATCH 02/10] scsi-generic: remove scsi_req_fixup
This is not needed anymore, since asynchronous ioctls were introduced by commit 221f715 (new scsi-generic abstraction, use SG_IO, 2009-03-28). Signed-off-by: Paolo Bonzini pbonz...@redhat.com --- hw/scsi-generic.c | 15 --- 1 files changed, 0 insertions(+), 15 deletions(-) diff --git a/hw/scsi-generic.c b/hw/scsi-generic.c index 1e5c41b..5130e9a 100644 --- a/hw/scsi-generic.c +++ b/hw/scsi-generic.c @@ -233,19 +233,6 @@ static uint8_t *scsi_get_buf(SCSIRequest *req) return r-buf; } -static void scsi_req_fixup(SCSIRequest *req) -{ -switch(req-cmd.buf[0]) { -case REWIND: -case START_STOP: -if (req-dev-type == TYPE_TAPE) { -/* force IMMED, otherwise qemu waits end of command */ -req-cmd.buf[1] = 0x01; -} -break; -} -} - /* Execute a scsi command. Returns the length of the data expected by the command. This will be Positive for data transfers from the device (eg. disk reads), negative for transfers to the device (eg. disk writes), @@ -257,8 +244,6 @@ static int32_t scsi_send_command(SCSIRequest *req, uint8_t *cmd) SCSIDevice *s = r-req.dev; int ret; -scsi_req_fixup(r-req); - DPRINTF(Command: lun=%d tag=0x%x len %zd data=0x%02x, lun, tag, r-req.cmd.xfer, cmd[0]); -- 1.7.6
Re: [Qemu-devel] [PATCH] hw/9pfs: Handle Security model parsing
-- Regards, M. Mohan Kumar On Wednesday, October 12, 2011 01:58:00 PM Daniel P. Berrange wrote: On Wed, Oct 12, 2011 at 01:24:16PM +0530, M. Mohan Kumar wrote: Security model is needed only for 'local' fs driver. Signed-off-by: M. Mohan Kumar mo...@in.ibm.com --- fsdev/qemu-fsdev.c |6 + fsdev/qemu-fsdev.h |1 + hw/9pfs/virtio-9p-device.c | 47 ++- vl.c | 20 +++-- 4 files changed, 43 insertions(+), 31 deletions(-) --- a/fsdev/qemu-fsdev.h +++ b/fsdev/qemu-fsdev.h @@ -40,6 +40,7 @@ typedef struct FsTypeTable { typedef struct FsTypeEntry { char *fsdev_id; char *path; +char *fsdriver; char *security_model; int cache_flags; FileOperations *ops; diff --git a/hw/9pfs/virtio-9p-device.c b/hw/9pfs/virtio-9p-device.c index aac58ad..1846e36 100644 --- a/hw/9pfs/virtio-9p-device.c +++ b/hw/9pfs/virtio-9p-device.c @@ -83,29 +83,30 @@ VirtIODevice *virtio_9p_init(DeviceState *dev, V9fsConf *conf) exit(1); } -if (!strcmp(fse-security_model, passthrough)) { -/* Files on the Fileserver set to client user credentials */ -s-ctx.fs_sm = SM_PASSTHROUGH; -s-ctx.xops = passthrough_xattr_ops; -} else if (!strcmp(fse-security_model, mapped)) { -/* Files on the fileserver are set to QEMU credentials. - * Client user credentials are saved in extended attributes. - */ -s-ctx.fs_sm = SM_MAPPED; -s-ctx.xops = mapped_xattr_ops; -} else if (!strcmp(fse-security_model, none)) { -/* - * Files on the fileserver are set to QEMU credentials. - */ -s-ctx.fs_sm = SM_NONE; -s-ctx.xops = none_xattr_ops; -} else { -fprintf(stderr, Default to security_model=none. You may want - enable advanced security model using -security option:\n\t security_model=passthrough\n\t -security_model=mapped\n); -s-ctx.fs_sm = SM_NONE; -s-ctx.xops = none_xattr_ops; +/* security models is needed only for local fs driver */ +if (!strcmp(fse-fsdriver, local)) { +if (!strcmp(fse-security_model, passthrough)) { +/* Files on the Fileserver set to client user credentials */ +s-ctx.fs_sm = SM_PASSTHROUGH; +s-ctx.xops = passthrough_xattr_ops; +} else if (!strcmp(fse-security_model, mapped)) { +/* Files on the fileserver are set to QEMU credentials. +* Client user credentials are saved in extended attributes. +*/ +s-ctx.fs_sm = SM_MAPPED; +s-ctx.xops = mapped_xattr_ops; +} else if (!strcmp(fse-security_model, none)) { +/* +* Files on the fileserver are set to QEMU credentials. +*/ +s-ctx.fs_sm = SM_NONE; +s-ctx.xops = none_xattr_ops; +} else { +fprintf(stderr, Invalid security_model %s specified.\n +Available security models are:\t +passthrough,mapped or none\n, fse-security_model); +exit(1); +} Are you sure there aren't use cases where people would like to choose between passthrough mapped, even when using the 'proxy' or 'handle' security drivers. Proxy FS driver is added to overcome the limit imposed by local + passthrough security model combination that needs qemu to be started by root user. Mapped and none secuiry model can be used by non root user also. So Proxy FS driver does not need any security model(its pass-through only) Both of the security models seem pretty generally useful to me, regardless of the driver type.
Re: [Qemu-devel] [PATCH 1/9] Add stub functions for PCI device models to do PCI DMA
On Wed, Oct 12, 2011 at 09:22:01AM +0200, Michael S. Tsirkin wrote: On Wed, Oct 12, 2011 at 02:07:46PM +1100, David Gibson wrote: Um.. why? PCI is defined by the spec to be LE, so I don't see that we need explicit endianness versions for PCI helpers. LE in the spec only applies to structures defined by the spec, that is pci configuration and msix tables in device memory. Well, true. But when was the last time you saw a PCI device with BE registers? I think it's a rare enough case that the individual device models can reswap themselves. -- David Gibson| I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson
Re: [Qemu-devel] [PATCH 1/9] Add stub functions for PCI device models to do PCI DMA
On Thu, Oct 13, 2011 at 02:43:06AM +1100, David Gibson wrote: On Wed, Oct 12, 2011 at 09:22:01AM +0200, Michael S. Tsirkin wrote: On Wed, Oct 12, 2011 at 02:07:46PM +1100, David Gibson wrote: Um.. why? PCI is defined by the spec to be LE, so I don't see that we need explicit endianness versions for PCI helpers. LE in the spec only applies to structures defined by the spec, that is pci configuration and msix tables in device memory. Well, true. But when was the last time you saw a PCI device with BE registers? I think it's a rare enough case that the individual device models can reswap themselves. s/BE registers/BE DMA accessed structures/ -- David Gibson| I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson
Re: [Qemu-devel] [PATCH v2 16/23] i8259: PREP: Replace pic_intack_read with pic_read_irq
Hello Jan, Since the last round of pulls, PReP magically boots again. :) Am 07.10.2011 09:19, schrieb Jan Kiszka: There is nothing in the i8259 spec that justifies the special pic_intack_read. At least the Linux PREP kernels configure the PICs properly so that pic_read_irq returns identical values, and setting read_reg_select in PIC0 cannot be derived from any special i8259 mode. So switch ppc_prep to pic_read_irq and drop the now unused PIC code. Seems to resolve an existing XXX in the code. CC: Andreas Färber andreas.faer...@web.de Signed-off-by: Jan Kiszka jan.kis...@siemens.com Linux is the only thing we can boot with -M prep that I'm aware of, via -kernel. And the 40p series doesn't use this at all. I see no regression on my Debian Etch, so I'm fine with the simplification. Tested-by: Andreas Färber andreas.faer...@web.de diff --git a/hw/ppc_prep.c b/hw/ppc_prep.c index d26049b..6427baa 100644 --- a/hw/ppc_prep.c +++ b/hw/ppc_prep.c @@ -130,7 +130,7 @@ static inline uint32_t _PPC_intack_read(target_phys_addr_t addr) uint32_t retval = 0; if ((addr 0xf) == 0) -retval = pic_intack_read(isa_pic); +retval = pic_read_irq(isa_pic); Mind to add the braces while touching it? Regards, Andreas
Re: [Qemu-devel] [PATCH] hw/9pfs: Handle Security model parsing
On Wed, Oct 12, 2011 at 09:05:50PM +0530, M. Mohan Kumar wrote: -- Regards, M. Mohan Kumar On Wednesday, October 12, 2011 01:58:00 PM Daniel P. Berrange wrote: On Wed, Oct 12, 2011 at 01:24:16PM +0530, M. Mohan Kumar wrote: Security model is needed only for 'local' fs driver. Signed-off-by: M. Mohan Kumar mo...@in.ibm.com --- fsdev/qemu-fsdev.c |6 + fsdev/qemu-fsdev.h |1 + hw/9pfs/virtio-9p-device.c | 47 ++- vl.c | 20 +++-- 4 files changed, 43 insertions(+), 31 deletions(-) --- a/fsdev/qemu-fsdev.h +++ b/fsdev/qemu-fsdev.h @@ -40,6 +40,7 @@ typedef struct FsTypeTable { typedef struct FsTypeEntry { char *fsdev_id; char *path; +char *fsdriver; char *security_model; int cache_flags; FileOperations *ops; diff --git a/hw/9pfs/virtio-9p-device.c b/hw/9pfs/virtio-9p-device.c index aac58ad..1846e36 100644 --- a/hw/9pfs/virtio-9p-device.c +++ b/hw/9pfs/virtio-9p-device.c @@ -83,29 +83,30 @@ VirtIODevice *virtio_9p_init(DeviceState *dev, V9fsConf *conf) exit(1); } -if (!strcmp(fse-security_model, passthrough)) { -/* Files on the Fileserver set to client user credentials */ -s-ctx.fs_sm = SM_PASSTHROUGH; -s-ctx.xops = passthrough_xattr_ops; -} else if (!strcmp(fse-security_model, mapped)) { -/* Files on the fileserver are set to QEMU credentials. - * Client user credentials are saved in extended attributes. - */ -s-ctx.fs_sm = SM_MAPPED; -s-ctx.xops = mapped_xattr_ops; -} else if (!strcmp(fse-security_model, none)) { -/* - * Files on the fileserver are set to QEMU credentials. - */ -s-ctx.fs_sm = SM_NONE; -s-ctx.xops = none_xattr_ops; -} else { -fprintf(stderr, Default to security_model=none. You may want - enable advanced security model using -security option:\n\t security_model=passthrough\n\t -security_model=mapped\n); -s-ctx.fs_sm = SM_NONE; -s-ctx.xops = none_xattr_ops; +/* security models is needed only for local fs driver */ +if (!strcmp(fse-fsdriver, local)) { +if (!strcmp(fse-security_model, passthrough)) { +/* Files on the Fileserver set to client user credentials */ +s-ctx.fs_sm = SM_PASSTHROUGH; +s-ctx.xops = passthrough_xattr_ops; +} else if (!strcmp(fse-security_model, mapped)) { +/* Files on the fileserver are set to QEMU credentials. +* Client user credentials are saved in extended attributes. +*/ +s-ctx.fs_sm = SM_MAPPED; +s-ctx.xops = mapped_xattr_ops; +} else if (!strcmp(fse-security_model, none)) { +/* +* Files on the fileserver are set to QEMU credentials. +*/ +s-ctx.fs_sm = SM_NONE; +s-ctx.xops = none_xattr_ops; +} else { +fprintf(stderr, Invalid security_model %s specified.\n +Available security models are:\t +passthrough,mapped or none\n, fse-security_model); +exit(1); +} Are you sure there aren't use cases where people would like to choose between passthrough mapped, even when using the 'proxy' or 'handle' security drivers. Proxy FS driver is added to overcome the limit imposed by local + passthrough security model combination that needs qemu to be started by root user. Mapped and none secuiry model can be used by non root user also. So Proxy FS driver does not need any security model(its pass-through only) The Proxy FS driver does not need the security model, but if so desired it would be possible to choose to implement the security models. It just happens that the driver is hardcoded to only operate in 'passthrough' mode. I think that disabling the parsing of the 'security' parameter for non-local drivers is dangerous, because an application might think that the 'mapped' model was supported, but its parameter would get silently ignored. If the requested value is not supported, then the application should always be told about that. So, IMHO, it would be better to have logic such as: if (strcmp(security_mode, passthrough) == 0) { ... } else if (strcmp(security_model, mapped) == 0) { if (strcmp(fsdriver, local) != 0) { fprintf(stderr, security mode 'passthrough' is not supported by '%s'\n, fsdriver); exit(1); } ... } else if (strcmp(security_model, none) == 0) { if (strcmp(fsdriver, local) != 0) {
Re: [Qemu-devel] [PATCH v2 16/23] i8259: PREP: Replace pic_intack_read with pic_read_irq
On 2011-10-12 18:02, Andreas Färber wrote: Hello Jan, Since the last round of pulls, PReP magically boots again. :) Am 07.10.2011 09:19, schrieb Jan Kiszka: There is nothing in the i8259 spec that justifies the special pic_intack_read. At least the Linux PREP kernels configure the PICs properly so that pic_read_irq returns identical values, and setting read_reg_select in PIC0 cannot be derived from any special i8259 mode. So switch ppc_prep to pic_read_irq and drop the now unused PIC code. Seems to resolve an existing XXX in the code. CC: Andreas Färber andreas.faer...@web.de Signed-off-by: Jan Kiszka jan.kis...@siemens.com Linux is the only thing we can boot with -M prep that I'm aware of, via -kernel. And the 40p series doesn't use this at all. I see no regression on my Debian Etch, so I'm fine with the simplification. Tested-by: Andreas Färber andreas.faer...@web.de Thanks! diff --git a/hw/ppc_prep.c b/hw/ppc_prep.c index d26049b..6427baa 100644 --- a/hw/ppc_prep.c +++ b/hw/ppc_prep.c @@ -130,7 +130,7 @@ static inline uint32_t _PPC_intack_read(target_phys_addr_t addr) uint32_t retval = 0; if ((addr 0xf) == 0) -retval = pic_intack_read(isa_pic); +retval = pic_read_irq(isa_pic); Mind to add the braces while touching it? Will do if I have to repost this series (I hope not...). Jan -- Siemens AG, Corporate Technology, CT T DE IT 1 Corporate Competence Center Embedded Linux
Re: [Qemu-devel] [PATCH 1/2] Add opt_set_bool function
Am 12.10.2011 09:53, schrieb M. Mohan Kumar: In addition to qemu_opt_set function, we need a function to set bool value also. Signed-off-by: M. Mohan Kumarmo...@in.ibm.com --- qemu-option.c | 35 +++ qemu-option.h |1 + 2 files changed, 36 insertions(+), 0 deletions(-) diff --git a/qemu-option.c b/qemu-option.c index 105d760..d6bc908 100644 --- a/qemu-option.c +++ b/qemu-option.c @@ -636,6 +636,41 @@ int qemu_opt_set(QemuOpts *opts, const char *name, const char *value) return 0; } +int qemu_opt_set_bool(QemuOpts *opts, const char *name, int val) Might it make sense to let qemu_opt_{get,set}_bool() use bool type? Andreas -- SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746, AG Nürnberg
[Qemu-devel] [PATCH] configure: Detect when glibc implements makecontext() to always fail
Improve the configure test for presence of ucontext functions by making linker warnings fatal; this allows us to detect when we are linked with a glibc which implements makecontext() to always return ENOSYS. Signed-off-by: Peter Maydell peter.mayd...@linaro.org --- Compiling on an Ubuntu Natty ARM host will hit this. (Anybody think we should clean up our configure tests so we can enable -Werror and -Wl,--fatal-warnings on all of them?) configure |7 +-- 1 files changed, 5 insertions(+), 2 deletions(-) diff --git a/configure b/configure index 9b4fe34..4d9d9e0 100755 --- a/configure +++ b/configure @@ -2549,9 +2549,12 @@ ucontext_coroutine=no if test $darwin != yes; then cat $TMPC EOF #include ucontext.h -int main(void) { makecontext(0, 0, 0); } +int main(void) { makecontext(0, 0, 0); return 0; } EOF - if compile_prog ; then + # Note that we enable fatal linker warnings to catch the + # glibc makecontext is not implemented and will always fail + # linker warning. + if compile_prog -Wl,--fatal-warnings ; then ucontext_coroutine=yes fi fi -- 1.7.4.1
[Qemu-devel] [PATCH v3] add add-cow file format
Add add-cow file format Signed-off-by: Dong Xu Wang wdon...@linux.vnet.ibm.com --- Makefile.objs |1 + block.c|2 +- block.h|1 + block/add-cow.c| 412 block_int.h|1 + docs/specs/add-cow.txt | 45 ++ 6 files changed, 461 insertions(+), 1 deletions(-) create mode 100644 block/add-cow.c create mode 100644 docs/specs/add-cow.txt diff --git a/Makefile.objs b/Makefile.objs index c849e51..624c04c 100644 --- a/Makefile.objs +++ b/Makefile.objs @@ -31,6 +31,7 @@ block-obj-$(CONFIG_LINUX_AIO) += linux-aio.o block-nested-y += raw.o cow.o qcow.o vdi.o vmdk.o cloop.o dmg.o bochs.o vpc.o vvfat.o block-nested-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o qcow2-cache.o +block-nested-y += add-cow.o block-nested-y += qed.o qed-gencb.o qed-l2-cache.o qed-table.o qed-cluster.o block-nested-y += qed-check.o block-nested-y += parallels.o nbd.o blkdebug.o sheepdog.o blkverify.o diff --git a/block.c b/block.c index e865fab..c25241d 100644 --- a/block.c +++ b/block.c @@ -106,7 +106,7 @@ int is_windows_drive(const char *filename) #endif /* check if the path starts with protocol: */ -static int path_has_protocol(const char *path) +int path_has_protocol(const char *path) { #ifdef _WIN32 if (is_windows_drive(path) || diff --git a/block.h b/block.h index 16bfa0a..8b09f12 100644 --- a/block.h +++ b/block.h @@ -256,6 +256,7 @@ char *bdrv_snapshot_dump(char *buf, int buf_size, QEMUSnapshotInfo *sn); char *get_human_readable_size(char *buf, int buf_size, int64_t size); int path_is_absolute(const char *path); +int path_has_protocol(const char *path); void path_combine(char *dest, int dest_size, const char *base_path, const char *filename); diff --git a/block/add-cow.c b/block/add-cow.c new file mode 100644 index 000..d2538a2 --- /dev/null +++ b/block/add-cow.c @@ -0,0 +1,412 @@ +#include qemu-common.h +#include block_int.h +#include module.h + +#define ADD_COW_MAGIC (((uint64_t)'A' 56) | ((uint64_t)'D' 48) | \ +((uint64_t)'D' 40) | ((uint64_t)'_' 32) | \ +((uint64_t)'C' 24) | ((uint64_t)'O' 16) | \ +((uint64_t)'W' 8) | 0xFF) +#define ADD_COW_VERSION 1 + +typedef struct AddCowHeader { +uint64_t magic; +uint32_t version; +char backing_file[1024]; +char image_file[1024]; +uint64_t size; +} QEMU_PACKED AddCowHeader; + +typedef struct BDRVAddCowState { +char image_file[1024]; +BlockDriverState *image_hd; +uint8_t *bitmap; +uint64_t bitmap_size; +} BDRVAddCowState; + +static int add_cow_probe(const uint8_t *buf, int buf_size, const char *filename) +{ +const AddCowHeader *header = (const void *)buf; + +if (be64_to_cpu(header-magic) == ADD_COW_MAGIC +be32_to_cpu(header-version) == ADD_COW_VERSION) { +return 100; +} else { +return 0; +} +} + +static int add_cow_open(BlockDriverState *bs, int flags) +{ +AddCowHeader header; +int64_t size; +char image_filename[1024]; +int image_flags; +BlockDriver *image_drv = NULL; +int ret; +BDRVAddCowState *state = (BDRVAddCowState *)(bs-opaque); + +ret = bdrv_pread(bs-file, 0, header, sizeof(header)); +if (ret != sizeof(header)) { +goto fail; +} + +if (be64_to_cpu(header.magic) != ADD_COW_MAGIC || +be32_to_cpu(header.version) != ADD_COW_VERSION) { +ret = -1; +goto fail; +} + +size = be64_to_cpu(header.size); +bs-total_sectors = size / BDRV_SECTOR_SIZE; + +QEMU_BUILD_BUG_ON(sizeof(state-image_file) != sizeof(header.image_file)); +pstrcpy(bs-backing_file, sizeof(bs-backing_file), +header.backing_file); +pstrcpy(state-image_file, sizeof(state-image_file), +header.image_file); + +state-bitmap_size = ((bs-total_sectors + 7) 3); +state-bitmap = g_malloc0(state-bitmap_size); + +ret = bdrv_pread(bs-file, sizeof(header), state-bitmap, +state-bitmap_size); +if (ret != state-bitmap_size) { +goto fail; +} + /* If there is a image_file, must be together with backing_file */ +if (state-image_file[0] != '\0') { +state-image_hd = bdrv_new(); +/* Relative to image or working dir, need discussion */ +if (path_has_protocol(state-image_file)) { +pstrcpy(image_filename, sizeof(image_filename), +state-image_file); +} else { +path_combine(image_filename, sizeof(image_filename), + bs-filename, state-image_file); +} + +image_drv = bdrv_find_format(raw); +image_flags = + (flags (~(BDRV_O_SNAPSHOT | BDRV_O_NO_BACKING))) | BDRV_O_RDWR; +state-image_hd-keep_read_only = 0; + +ret = bdrv_open(state-image_hd, image_filename, image_flags, +
Re: [Qemu-devel] qemu-0.15.1 stable call for patches
On Wed, 2011-10-12 at 05:41 -0400, Brad wrote: On 26/09/11 9:16 AM, Justin M. Forbes wrote: With the current patch queue I would like to start getting qemu-0.15.1 stable into shape. With this in mind, the plan is to tag the release on Monday Oct 3rd. If you have patches pending for stable, now would be the time to send them. Please CC jmfor...@linuxtx.org if you can to ensure that I see them. Thanks, Justin Is there anywhere where we can see what's been pulled into the pending 0.15.1 code base so far since you don't seem to really post to the list about the stable branches? I typically reply when things get applied, though you can always check the git tree at: git://git.qemu.org/qemu-stable-0.15.git Justin
[Qemu-devel] [PATCH] ps2: migrate ledstate
Make the ps2 device track its ledstate so that we can migrate it. Otherwise it gets lost across migration, and spice-server gets confused about the actual keyboard state and sends bogus caps/scroll/num key events. Signed-off-by: Christophe Fergeau cferg...@redhat.com --- hw/ps2.c | 13 +++-- 1 files changed, 11 insertions(+), 2 deletions(-) diff --git a/hw/ps2.c b/hw/ps2.c index 24228c1..f19ea18 100644 --- a/hw/ps2.c +++ b/hw/ps2.c @@ -92,6 +92,7 @@ typedef struct { not the keyboard controller. */ int translate; int scancode_set; /* 1=XT, 2=AT, 3=PS/2 */ +int ledstate; } PS2KbdState; typedef struct { @@ -195,11 +196,17 @@ uint32_t ps2_read_data(void *opaque) return val; } +static void ps2_set_ledstate(PS2KbdState *s, int ledstate) +{ +s-ledstate = ledstate; +kbd_put_ledstate(ledstate); +} + static void ps2_reset_keyboard(PS2KbdState *s) { s-scan_enabled = 1; s-scancode_set = 2; -kbd_put_ledstate(0); +ps2_set_ledstate(s, 0); } void ps2_write_keyboard(void *opaque, int val) @@ -274,7 +281,7 @@ void ps2_write_keyboard(void *opaque, int val) s-common.write_cmd = -1; break; case KBD_CMD_SET_LEDS: -kbd_put_ledstate(val); +ps2_set_ledstate(s, val); ps2_queue(s-common, KBD_REPLY_ACK); s-common.write_cmd = -1; break; @@ -563,6 +570,7 @@ static int ps2_kbd_post_load(void* opaque, int version_id) if (version_id == 2) s-scancode_set=2; +kbd_put_ledstate(s-ledstate); return 0; } @@ -577,6 +585,7 @@ static const VMStateDescription vmstate_ps2_keyboard = { VMSTATE_INT32(scan_enabled, PS2KbdState), VMSTATE_INT32(translate, PS2KbdState), VMSTATE_INT32_V(scancode_set, PS2KbdState,3), +VMSTATE_INT32(ledstate, PS2KbdState), VMSTATE_END_OF_LIST() } }; -- 1.7.6.4
Re: [Qemu-devel] [PATCH] ps2: migrate ledstate
On 10/12/2011 11:35 AM, Christophe Fergeau wrote: Make the ps2 device track its ledstate so that we can migrate it. Otherwise it gets lost across migration, and spice-server gets confused about the actual keyboard state and sends bogus caps/scroll/num key events. Signed-off-by: Christophe Fergeaucferg...@redhat.com --- hw/ps2.c | 13 +++-- 1 files changed, 11 insertions(+), 2 deletions(-) diff --git a/hw/ps2.c b/hw/ps2.c index 24228c1..f19ea18 100644 --- a/hw/ps2.c +++ b/hw/ps2.c @@ -92,6 +92,7 @@ typedef struct { not the keyboard controller. */ int translate; int scancode_set; /* 1=XT, 2=AT, 3=PS/2 */ +int ledstate; } PS2KbdState; typedef struct { @@ -195,11 +196,17 @@ uint32_t ps2_read_data(void *opaque) return val; } +static void ps2_set_ledstate(PS2KbdState *s, int ledstate) +{ +s-ledstate = ledstate; +kbd_put_ledstate(ledstate); +} + static void ps2_reset_keyboard(PS2KbdState *s) { s-scan_enabled = 1; s-scancode_set = 2; -kbd_put_ledstate(0); +ps2_set_ledstate(s, 0); } void ps2_write_keyboard(void *opaque, int val) @@ -274,7 +281,7 @@ void ps2_write_keyboard(void *opaque, int val) s-common.write_cmd = -1; break; case KBD_CMD_SET_LEDS: -kbd_put_ledstate(val); +ps2_set_ledstate(s, val); ps2_queue(s-common, KBD_REPLY_ACK); s-common.write_cmd = -1; break; @@ -563,6 +570,7 @@ static int ps2_kbd_post_load(void* opaque, int version_id) if (version_id == 2) s-scancode_set=2; +kbd_put_ledstate(s-ledstate); return 0; } @@ -577,6 +585,7 @@ static const VMStateDescription vmstate_ps2_keyboard = { VMSTATE_INT32(scan_enabled, PS2KbdState), VMSTATE_INT32(translate, PS2KbdState), VMSTATE_INT32_V(scancode_set, PS2KbdState,3), +VMSTATE_INT32(ledstate, PS2KbdState), If you're adding a field, you need to bump the version. Regards, Anthony Liguori VMSTATE_END_OF_LIST() } };
Re: [Qemu-devel] [PATCH] ps2: migrate ledstate
Hey, On Wed, Oct 12, 2011 at 06:35:28PM +0200, Christophe Fergeau wrote: Make the ps2 device track its ledstate so that we can migrate it. Otherwise it gets lost across migration, and spice-server gets confused about the actual keyboard state and sends bogus caps/scroll/num key events. This patch adds a ledstate to the PS2 keyboard driver as an alternative to the original patch. I forgot to mention in the changelog entry that this fixes https://bugzilla.redhat.com/show_bug.cgi?id=729294 One thing that could be improved after this patch is that the ledstate is now present both in spice-input and hw/ps2.c, but I couldn't find an easy way to get the ps2 ledstate from spice-input so I left things this way for now. Christophe pgpCVtzFk8rCm.pgp Description: PGP signature
Re: [Qemu-devel] [PATCH] hw/9pfs: Handle Security model parsing
On Wed, 12 Oct 2011 09:28:00 +0100, Daniel P. Berrange berra...@redhat.com wrote: On Wed, Oct 12, 2011 at 01:24:16PM +0530, M. Mohan Kumar wrote: Security model is needed only for 'local' fs driver. Signed-off-by: M. Mohan Kumar mo...@in.ibm.com --- fsdev/qemu-fsdev.c |6 + fsdev/qemu-fsdev.h |1 + hw/9pfs/virtio-9p-device.c | 47 ++- vl.c | 20 +++-- 4 files changed, 43 insertions(+), 31 deletions(-) * Files on the fileserver are set to QEMU credentials. +*/ +s-ctx.fs_sm = SM_NONE; +s-ctx.xops = none_xattr_ops; +} else { +fprintf(stderr, Invalid security_model %s specified.\n +Available security models are:\t +passthrough,mapped or none\n, fse-security_model); +exit(1); +} Are you sure there aren't use cases where people would like to choose between passthrough mapped, even when using the 'proxy' or 'handle' security drivers. Currently handle fs driver requires CAP_DAC_READ_SEARCH and if qemu is not going to run with specific capabilities this implies root privileges. So handle fs driver doesn't do the mapping required by different security model. Proxy fs driver is enabling us to run file system operations as root. So even for that we don't need mapped security model. Even if we want to store file attributes in xattr with proxy fs driver, that will go as a proxy's argument not as -fsdev argument. Proxy also don't require export path name. But that is another patch. Both of the security models seem pretty generally useful to me, regardless of the driver type. -aneesh
[Qemu-devel] [ANNOUNCE] QEMU 0.15.1.tar.gz is available
The QEMU team is pleased to announce the availability of the 0.15.1 stable release. Download instructions are available at http://wiki.qemu.org/Download A detailed change log is available at http://wiki.qemu.org/Changelog/0.15 On behalf of the QEMU team, I'd like to thank everyone who contributed to make this release happen! Regards, Justin M. Forbes
Re: [Qemu-devel] [PATCH RFC V1 01/11] Introduce HostPCIDevice to access a pci device on the host.
On Tue, Oct 4, 2011 at 19:21, Jan Kiszka jan.kis...@web.de wrote: This wasn't run through checkpatch.pl, I bet. On 2011-10-04 16:51, Anthony PERARD wrote: Signed-off-by: Anthony PERARD anthony.per...@citrix.com --- hw/host-pci-device.c | 192 ++ hw/host-pci-device.h | 36 + 2 files changed, 228 insertions(+), 0 deletions(-) create mode 100644 hw/host-pci-device.c create mode 100644 hw/host-pci-device.h diff --git a/hw/host-pci-device.c b/hw/host-pci-device.c new file mode 100644 index 000..b3f2899 --- /dev/null +++ b/hw/host-pci-device.c @@ -0,0 +1,192 @@ +#include qemu-common.h +#include host-pci-device.h + +static int path_to(const HostPCIDevice *d, + const char *name, char *buf, ssize_t size) +{ + return snprintf(buf, size, /sys/bus/pci/devices/%04x:%02x:%02x.%x/%s, + d-domain, d-bus, d-dev, d-func, name); +} + +static int get_resource(HostPCIDevice *d) +{ + int i; + FILE *f; + char path[PATH_MAX]; + unsigned long long start, end, flags, size; + + path_to(d, resource, path, sizeof (path)); + f = fopen(path, r); + if (!f) { + fprintf(stderr, Error: Can't open %s: %s\n, path, strerror(errno)); + return -1; + } + + for (i = 0; i PCI_NUM_REGIONS; i++) { + if (fscanf(f, %llx %llx %llx, start, end, flags) != 3) { + fprintf(stderr, Error: Syntax error in %s\n, path); + break; + } + if (start) { + size = end - start + 1; + } else { + size = 0; + } + + flags = 0xf; No magic numbers please. It also looks a bit strange to me: It's the resource type encoded in the second byte? Aren't you interested in it? Actually, we are interessted to have the BAR with the different flags (IO/MEM, prefetch, size) like the value in the config space. Because the base_address value will be given to the VM (by the function pt_bar_reg_read). But I can later merge the values (start | (flags ~pci_base_address_io/mem_mask)), and have the right value to give back. So here, I will keep seperate the base address, and the flags. + + if (i PCI_ROM_SLOT) { + d-base_addr[i] = start | flags; + d-size[i] = size; + } else { + d-rom_base_addr = start | flags; + d-rom_size = size; + } + } + + fclose(f); + return 0; +} + +static unsigned long get_value(HostPCIDevice *d, const char *name) +{ + char path[PATH_MAX]; + FILE *f; + unsigned long value; + + path_to(d, name, path, sizeof (path)); + f = fopen(path, r); + if (!f) { + fprintf(stderr, Error: Can't open %s: %s\n, path, strerror(errno)); + return -1; + } + if (fscanf(f, %lx\n, value) != 1) { + fprintf(stderr, Error: Syntax error in %s\n, path); + value = -1; + } + fclose(f); + return value; +} + +static int pci_dev_is_virtfn(HostPCIDevice *d) +{ + int rc; + char path[PATH_MAX]; + struct stat buf; + + path_to(d, physfn, path, sizeof (path)); + rc = !stat(path, buf); + + return rc; +} + +static int host_pci_config_fd(HostPCIDevice *d) [ We will also need the reverse: pass in open file descriptors that HostPCIDevice should use. Can be added later. ] +{ + char path[PATH_MAX]; + + if (d-config_fd 0) { + path_to(d, config, path, sizeof (path)); + d-config_fd = open(path, O_RDWR); + if (d-config_fd 0) { + fprintf(stderr, HostPCIDevice: Can not open '%s': %s\n, + path, strerror(errno)); + } + } + return d-config_fd; +} +static int host_pci_config_read(HostPCIDevice *d, int pos, void *buf, int len) +{ + int fd = host_pci_config_fd(d); + int res = 0; + + res = pread(fd, buf, len, pos); + if (res 0) { + fprintf(stderr, host_pci_config: read failed: %s (fd: %i)\n, + strerror(errno), fd); + return -1; + } + return res; +} +static int host_pci_config_write(HostPCIDevice *d, + int pos, const void *buf, int len) +{ + int fd = host_pci_config_fd(d); + int res = 0; + + res = pwrite(fd, buf, len, pos); + if (res 0) { + fprintf(stderr, host_pci_config: write failed: %s\n, + strerror(errno)); + return -1; + } + return res; +} + +uint8_t host_pci_read_byte(HostPCIDevice *d, int pos) +{ + uint8_t buf; + host_pci_config_read(d, pos, buf, 1); + return buf; +} +uint16_t host_pci_read_word(HostPCIDevice *d, int pos) +{ + uint16_t buf; + host_pci_config_read(d, pos, buf, 2); + return le16_to_cpu(buf); +} +uint32_t host_pci_read_long(HostPCIDevice *d, int pos) +{ + uint32_t buf; + host_pci_config_read(d, pos, buf, 4); + return le32_to_cpu(buf); +} +int
[Qemu-devel] [PATCH 0/6] trace: Add support for trace events grouping
This series add support for trace events grouping. The state of a given group of trace events can be queried or changed in bulk by the following monitor commands: * info trace-groups View available trace event groups and their state. State 1 means enabled, state 0 means disabled. * trace-group NAME on|off Enable/disable a given trace event group. A group of trace events can also be enabled in early running stage through adding its group name prefixed with group: to trace events list file which is passed to -trace events. Mark Wu (6): trace: Make tracetool generate a group list trace: Add HMP monitor commands for trace events group trace: Add trace events group implementation in the backend simple trace: Add trace events group implementation in the backend stderr trace: Enable -trace events argument to control initial state of groups trace: Update doc for trace events group docs/tracing.txt | 29 ++-- hmp-commands.hx | 14 monitor.c | 22 scripts/tracetool | 94 +++- trace-events | 88 + trace/control.c | 17 + trace/control.h |9 + trace/default.c | 15 trace/simple.c| 30 + trace/simple.h|7 trace/stderr.c| 32 ++ trace/stderr.h|7 12 files changed, 359 insertions(+), 5 deletions(-)
[Qemu-devel] [PATCH 2/6] trace: Add HMP monitor commands for trace events group
Add monitor commands 'trace-group NAME on|off' and 'info trace-groups' to set and query the state of a given group of trace events. Signed-off-by: Mark Wu wu...@linux.vnet.ibm.com --- hmp-commands.hx | 14 ++ monitor.c | 22 ++ trace/control.h |9 + trace/default.c | 15 +++ 4 files changed, 60 insertions(+), 0 deletions(-) diff --git a/hmp-commands.hx b/hmp-commands.hx index 9e1cca8..b415616 100644 --- a/hmp-commands.hx +++ b/hmp-commands.hx @@ -194,6 +194,20 @@ STEXI changes status of a trace event ETEXI +{ +.name = trace-group, +.args_type = name:s,option:b, +.params = name on|off, +.help = changes status of a specific trace event, +.mhandler.cmd = do_trace_event_group_set_state, +}, + +STEXI +@item trace-group +@findex trace-group +changes status of a group of trace events +ETEXI + #if defined(CONFIG_SIMPLE_TRACE) { .name = trace-file, diff --git a/monitor.c b/monitor.c index 88d8228..0b8ca09 100644 --- a/monitor.c +++ b/monitor.c @@ -605,6 +605,17 @@ static void do_trace_event_set_state(Monitor *mon, const QDict *qdict) } } +static void do_trace_event_group_set_state(Monitor *mon, const QDict *qdict) +{ +const char *gp_name = qdict_get_str(qdict, name); +bool new_state = qdict_get_bool(qdict, option); +int ret = trace_event_group_set_state(gp_name, new_state); + +if (!ret) { +monitor_printf(mon, unknown group name \%s\\n, gp_name); +} +} + #ifdef CONFIG_SIMPLE_TRACE static void do_trace_file(Monitor *mon, const QDict *qdict) { @@ -1010,6 +1021,10 @@ static void do_trace_print_events(Monitor *mon) trace_print_events((FILE *)mon, monitor_fprintf); } +static void do_trace_print_groups(Monitor *mon) +{ +trace_print_groups((FILE *)mon, monitor_fprintf); +} /** * do_quit(): Quit QEMU execution */ @@ -3170,6 +3185,13 @@ static const mon_cmd_t info_cmds[] = { .mhandler.info = do_trace_print_events, }, { +.name = trace-groups, +.args_type = , +.params = , +.help = show available trace-groups their state, +.mhandler.info = do_trace_print_groups, +}, +{ .name = NULL, }, }; diff --git a/trace/control.h b/trace/control.h index 2acaa42..97ecce7 100644 --- a/trace/control.h +++ b/trace/control.h @@ -15,12 +15,21 @@ /** Print the state of all events. */ void trace_print_events(FILE *stream, fprintf_function stream_printf); + +/** Print the state of all groups. */ +void trace_print_groups(FILE *stream, fprintf_function stream_printf); + /** Set the state of an event. * * @return Whether the state changed. */ bool trace_event_set_state(const char *name, bool state); +/** Set the state of a group of events. + * + * @return Whether the state changed. + */ +bool trace_event_group_set_state(const char *name, bool state); /** Initialize the tracing backend. * diff --git a/trace/default.c b/trace/default.c index c9b27a2..c7e70c7 100644 --- a/trace/default.c +++ b/trace/default.c @@ -18,6 +18,14 @@ void trace_print_events(FILE *stream, fprintf_function stream_printf) operation not supported with the current backend\n); } +void trace_print_groups(FILE *stream, fprintf_function stream_printf) +{ +fprintf(stderr, warning: +cannot print the trace groups with the current backend\n); +stream_printf(stream, error: + operation not supported with the current backend\n); +} + bool trace_event_set_state(const char *name, bool state) { fprintf(stderr, warning: @@ -25,6 +33,13 @@ bool trace_event_set_state(const char *name, bool state) return false; } +bool trace_event_group_set_state(const char *gp_name, bool state) +{ +fprintf(stderr, warning: +cannot set the state of a trace group with the current backend\n); +return false; +} + bool trace_backend_init(const char *events, const char *file) { if (events) { -- 1.7.1
[Qemu-devel] [PATCH 1/6] trace: Make tracetool generate a group list
Each trace events group starts with a line containing group_start: GroupName and end with a line containing group_end. The range of a trace events group is determined by the tracetool script when it processes the trace-events file. Signed-off-by: Mark Wu wu...@linux.vnet.ibm.com --- scripts/tracetool | 94 +++- trace-events | 88 + 2 files changed, 180 insertions(+), 2 deletions(-) diff --git a/scripts/tracetool b/scripts/tracetool index 4c9951d..3b4ca41 100755 --- a/scripts/tracetool +++ b/scripts/tracetool @@ -166,6 +166,82 @@ linetoc_end_nop() return } +linetoh_begin_group() +{ +group_num=0 +} + +linetoh_end_group() +{ +cat EOF +#define NR_TRACE_EVENT_GROUPS $group_num +extern TraceEventGroup trace_group_list[NR_TRACE_EVENT_GROUPS]; +EOF +} + +linetoc_begin_group() +{ +cat EOF trace-groups + +TraceEventGroup trace_group_list[] = { + +EOF +group_num=0 +} + +linetoc_end_group() +{ +cat EOF trace-groups +}; +EOF +cat trace-groups +rm -f trace-groups +} + +linetoc_group() +{ +if echo $str|grep -q group_start; then +gp_name=${1##*group_start:} + if ! test -z $gp_name; then +start=$((${backend}_event_num)) + fi +elif echo $str|grep -q group_end; then + stop=$((${backend}_event_num - 1)) + cat EOF trace-groups +{.gp_name = $gp_name, .state = 0, .start = $start, .end = $stop}, + +EOF + group_num=$((group_num + 1)) +fi +} + +linetoc_group_simple() +{ +linetoc_group $1 +} + +linetoc_group_stderr() +{ +linetoc_group $1 +} + +linetoh_group() +{ +if echo $str|grep -q group_end; then +group_num=$((group_num + 1)) +fi +} + +linetoh_group_simple() +{ +linetoh_group +} + +linetoh_group_stderr() +{ +linetoh_group +} + linetoh_begin_simple() { cat EOF @@ -173,8 +249,10 @@ linetoh_begin_simple() EOF simple_event_num=0 +linetoh_begin_group } + cast_args_to_uint64_t() { local arg @@ -206,12 +284,14 @@ EOF simple_event_num=$((simple_event_num + 1)) } + linetoh_end_simple() { cat EOF #define NR_TRACE_EVENTS $simple_event_num extern TraceEvent trace_list[NR_TRACE_EVENTS]; EOF +linetoh_end_group } linetoc_begin_simple() @@ -222,7 +302,7 @@ linetoc_begin_simple() TraceEvent trace_list[] = { EOF simple_event_num=0 - +linetoc_begin_group } linetoc_simple() @@ -240,6 +320,7 @@ linetoc_end_simple() cat EOF }; EOF +linetoc_end_group } #STDERR @@ -285,6 +366,7 @@ linetoh_end_stderr() cat EOF #define NR_TRACE_EVENTS $stderr_event_num EOF +linetoh_end_group } linetoc_begin_stderr() @@ -295,6 +377,7 @@ linetoc_begin_stderr() TraceEvent trace_list[] = { EOF stderr_event_num=0 +linetoc_begin_group } linetoc_stderr() @@ -312,6 +395,7 @@ linetoc_end_stderr() cat EOF }; EOF +linetoc_end_group } #END OF STDERR @@ -523,12 +607,18 @@ convert() begin=lineto$1_begin_$backend process_line=lineto$1_$backend end=lineto$1_end_$backend +group=lineto$1_group_$backend $begin while read -r str; do # Skip comments and empty lines -test -z ${str%%#*} continue +if test -z ${str%%#*}; then + if [ $backend == simple -o $backend == stderr ] [ ! `echo $str|grep -q ^# group_` ]; then + $group $str + fi + continue + fi echo # Process the line. The nop backend handles disabled lines. diff --git a/trace-events b/trace-events index a31d9aa..cbd4b58 100644 --- a/trace-events +++ b/trace-events @@ -26,6 +26,7 @@ # The format-string should be a sprintf()-compatible format string. # qemu-malloc.c +# group_start:qemu-memory g_malloc(size_t size, void *ptr) size %zu ptr %p g_realloc(void *ptr, size_t size, void *newptr) ptr %p size %zu newptr %p g_free(void *ptr) ptr %p @@ -34,8 +35,10 @@ g_free(void *ptr) ptr %p qemu_memalign(size_t alignment, size_t size, void *ptr) alignment %zu size %zu ptr %p qemu_vmalloc(size_t size, void *ptr) size %zu ptr %p qemu_vfree(void *ptr) ptr %p +# group_end # hw/virtio.c +# group_start:virtio virtqueue_fill(void *vq, const void *elem, unsigned int len, unsigned int idx) vq %p elem %p len %u idx %u virtqueue_flush(void *vq, unsigned int count) vq %p count %u virtqueue_pop(void *vq, void *elem, unsigned int in_num, unsigned int out_num) vq %p elem %p in_num %u out_num %u @@ -43,6 +46,7 @@ virtio_queue_notify(void *vdev, int n, void *vq) vdev %p n %d vq %p virtio_irq(void *vq) vq %p virtio_notify(void *vdev, void *vq) vdev %p vq %p virtio_set_status(void *vdev, uint8_t val) vdev %p val %u +# group_end # hw/virtio-serial-bus.c virtio_serial_send_control_event(unsigned int port, uint16_t event, uint16_t value) port %u, event %u, value %u @@ -51,11 +55,14 @@ virtio_serial_handle_control_message(uint16_t event, uint16_t value)
[Qemu-devel] [PATCH 3/6] trace: Add trace events group implementation in the backend simple
Signed-off-by: Mark Wu wu...@linux.vnet.ibm.com --- trace/simple.c | 30 ++ trace/simple.h |7 +++ 2 files changed, 37 insertions(+), 0 deletions(-) diff --git a/trace/simple.c b/trace/simple.c index b639dda..7aa4c0b 100644 --- a/trace/simple.c +++ b/trace/simple.c @@ -321,6 +321,16 @@ void trace_print_events(FILE *stream, fprintf_function stream_printf) } } +void trace_print_groups(FILE *stream, fprintf_function stream_printf) +{ +unsigned int i; + +for (i = 0; i NR_TRACE_EVENT_GROUPS; i++) { +stream_printf(stream, %s [GROUP ID %u] : state %u\n, + trace_group_list[i].gp_name, i, + trace_group_list[i].state); +} +} bool trace_event_set_state(const char *name, bool state) { unsigned int i; @@ -334,6 +344,26 @@ bool trace_event_set_state(const char *name, bool state) return false; } +bool trace_event_group_set_state(const char *gp_name, bool state) +{ +unsigned int i; +unsigned int j; +TraceEventGroup *group; + +for (i = 0; i NR_TRACE_EVENT_GROUPS; i++) { + group = trace_group_list[i]; +if (!strcmp(group-gp_name, gp_name)) { +group-state = state; + + for (j = group-start; j = group-end; j++) { +trace_list[j].state = state; +} +return true; +} +} +return false; +} + /* Helper function to create a thread with signals blocked. Use glib's * portable threads since QEMU abstractions cannot be used due to reentrancy in * the tracer. Also note the signal masking on POSIX hosts so that the thread diff --git a/trace/simple.h b/trace/simple.h index 466e75b..cf119f3 100644 --- a/trace/simple.h +++ b/trace/simple.h @@ -22,6 +22,13 @@ typedef struct { bool state; } TraceEvent; +typedef struct { +const char *gp_name; +bool state; +int start; +int end; +} TraceEventGroup; + void trace0(TraceEventID event); void trace1(TraceEventID event, uint64_t x1); void trace2(TraceEventID event, uint64_t x1, uint64_t x2); -- 1.7.1
[Qemu-devel] [PATCH 5/6] trace: Enable -trace events argument to control initial state of groups
A group of trace events can be enabled in early running stage through adding its group name prefixed with group: to trace events list file which is passed to -trace events. Signed-off-by: Mark Wu wu...@linux.vnet.ibm.com --- trace/control.c | 17 + 1 files changed, 17 insertions(+), 0 deletions(-) diff --git a/trace/control.c b/trace/control.c index 4c5527d..5043c83 100644 --- a/trace/control.c +++ b/trace/control.c @@ -23,10 +23,27 @@ void trace_backend_init_events(const char *fname) exit(1); } char line_buf[1024]; +char *group; + while (fgets(line_buf, sizeof(line_buf), fp)) { size_t len = strlen(line_buf); if (len 1) { /* skip empty lines */ line_buf[len - 1] = '\0'; + group = strstr(line_buf, group:); + if (group != NULL) { + group += strlen(group:); +if (group == NULL) { + fprintf(stderr, error: empty group name\n); + exit(1); +} +if (!trace_event_group_set_state(group, true)) { +fprintf(stderr, error: trace event group '%s' +does not exist\n, group); +exit(1); +} +continue; +} + if (!trace_event_set_state(line_buf, true)) { fprintf(stderr, error: trace event '%s' does not exist\n, line_buf); -- 1.7.1
[Qemu-devel] [PATCH 6/6] trace: Update doc for trace events group
Signed-off-by: Mark Wu wu...@linux.vnet.ibm.com --- docs/tracing.txt | 29 ++--- 1 files changed, 26 insertions(+), 3 deletions(-) diff --git a/docs/tracing.txt b/docs/tracing.txt index 95ca16c..2bd4824 100644 --- a/docs/tracing.txt +++ b/docs/tracing.txt @@ -16,6 +16,7 @@ for debugging, profiling, and observing execution. echo bdrv_aio_readv/tmp/events echo bdrv_aio_writev /tmp/events + echo group:virtio /tmp/events 3. Run the virtual machine to produce a trace file: @@ -53,6 +54,16 @@ source code like this: return ptr; } +== Trace events group == + +Trace events group is used to represent a set of trace events added for the same +component or feature. Each trace events group starts with a line containing +group_start:GroupName and end with a line containing group_end. The range of +a trace events group is determined by the tracetool script when it processes +the trace-events file. You can change the state of a group of trace events at +one time through trace events group. + + === Declaring trace events === The tracetool script produces the trace.h header file which is included by @@ -106,8 +117,8 @@ respectively. This ensures portability between 32- and 64-bit platforms. == Generic interface and monitor commands == -You can programmatically query and control the dynamic state of trace events -through a backend-agnostic interface: +You can programmatically query and control the dynamic state of trace events or +groups through a backend-agnostic interface: * trace_print_events @@ -122,6 +133,11 @@ through a backend-agnostic interface: [...] trace_event_set_state(virtio_irq, false); /* disable */ +* trace_print_groups + +* trace_event_group_set_state + Enables or disables a given group of trace events at runtime inside QEMU. + Note that some of the backends do not provide an implementation for this interface, in which case QEMU will just print a warning. @@ -131,12 +147,19 @@ This functionality is also provided through monitor commands: View available trace events and their state. State 1 means enabled, state 0 means disabled. +* info trace-groups + View available trace event groups and their state. State 1 means enabled, state 0 + means disabled. + * trace-event NAME on|off Enable/disable a given trace event. +* trace-group NAME on|off + Enable/disable a given trace event group. + The -trace events=file command line argument can be used to enable the events listed in file from the very beginning of the program. This file must -contain one event name per line. +contain one event name or one group name, which is prefixed by group:. == Trace backends == -- 1.7.1
[Qemu-devel] [PATCHv2] ps2: migrate ledstate
Make the ps2 device track its ledstate so that we can migrate it. Otherwise it gets lost across migration, and spice-server gets confused about the actual keyboard state and sends bogus caps/scroll/num key events. This fixes RH bug #729294 Signed-off-by: Christophe Fergeau cferg...@redhat.com --- hw/ps2.c | 15 --- 1 files changed, 12 insertions(+), 3 deletions(-) diff --git a/hw/ps2.c b/hw/ps2.c index 24228c1..3a88681 100644 --- a/hw/ps2.c +++ b/hw/ps2.c @@ -92,6 +92,7 @@ typedef struct { not the keyboard controller. */ int translate; int scancode_set; /* 1=XT, 2=AT, 3=PS/2 */ +int ledstate; } PS2KbdState; typedef struct { @@ -195,11 +196,17 @@ uint32_t ps2_read_data(void *opaque) return val; } +static void ps2_set_ledstate(PS2KbdState *s, int ledstate) +{ +s-ledstate = ledstate; +kbd_put_ledstate(ledstate); +} + static void ps2_reset_keyboard(PS2KbdState *s) { s-scan_enabled = 1; s-scancode_set = 2; -kbd_put_ledstate(0); +ps2_set_ledstate(s, 0); } void ps2_write_keyboard(void *opaque, int val) @@ -274,7 +281,7 @@ void ps2_write_keyboard(void *opaque, int val) s-common.write_cmd = -1; break; case KBD_CMD_SET_LEDS: -kbd_put_ledstate(val); +ps2_set_ledstate(s, val); ps2_queue(s-common, KBD_REPLY_ACK); s-common.write_cmd = -1; break; @@ -544,7 +551,7 @@ static void ps2_mouse_reset(void *opaque) static const VMStateDescription vmstate_ps2_common = { .name = PS2 Common State, -.version_id = 3, +.version_id = 4, .minimum_version_id = 2, .minimum_version_id_old = 2, .fields = (VMStateField []) { @@ -563,6 +570,7 @@ static int ps2_kbd_post_load(void* opaque, int version_id) if (version_id == 2) s-scancode_set=2; +kbd_put_ledstate(s-ledstate); return 0; } @@ -577,6 +585,7 @@ static const VMStateDescription vmstate_ps2_keyboard = { VMSTATE_INT32(scan_enabled, PS2KbdState), VMSTATE_INT32(translate, PS2KbdState), VMSTATE_INT32_V(scancode_set, PS2KbdState,3), +VMSTATE_INT32(ledstate, PS2KbdState), VMSTATE_END_OF_LIST() } }; -- 1.7.6.4
[Qemu-devel] [PATCH 4/6] trace: Add trace events group implementation in the backend stderr
Signed-off-by: Mark Wu wu...@linux.vnet.ibm.com --- trace/stderr.c | 32 trace/stderr.h |7 +++ 2 files changed, 39 insertions(+), 0 deletions(-) diff --git a/trace/stderr.c b/trace/stderr.c index 7107c4a..c409840 100644 --- a/trace/stderr.c +++ b/trace/stderr.c @@ -12,6 +12,17 @@ void trace_print_events(FILE *stream, fprintf_function stream_printf) } } +void trace_print_groups(FILE *stream, fprintf_function stream_printf) +{ +unsigned int i; + +for (i = 0; i NR_TRACE_EVENT_GROUPS; i++) { +stream_printf(stream, %s [GROUP ID %u] : state %u\n, + trace_group_list[i].gp_name, i, + trace_group_list[i].state); +} +} + bool trace_event_set_state(const char *name, bool state) { unsigned int i; @@ -25,6 +36,27 @@ bool trace_event_set_state(const char *name, bool state) return false; } +bool trace_event_group_set_state(const char *gp_name, bool state) +{ +unsigned int i; +unsigned int j; +TraceEventGroup *group; + +for (i = 0; i NR_TRACE_EVENT_GROUPS; i++) { + + group = trace_group_list[i]; +if (!strcmp(group-gp_name, gp_name)) { +group-state = state; + +for (j = group-start; j = group-end; j++) { +trace_list[j].state = state; +} +return true; +} +} +return false; +} + bool trace_backend_init(const char *events, const char *file) { if (file) { diff --git a/trace/stderr.h b/trace/stderr.h index d575b61..45499f6 100644 --- a/trace/stderr.h +++ b/trace/stderr.h @@ -8,4 +8,11 @@ typedef struct { bool state; } TraceEvent; +typedef struct { +const char *gp_name; +bool state; +int start; +int end; +} TraceEventGroup; + #endif /* ! TRACE_STDERR_H */ -- 1.7.1
[Qemu-devel] 2 MiB alignment in qemu_vmalloc()
Hello Avi, commit 36b58628 increased the alignment for some large memory blocks (typically the system RAM) to 2 MiB (QEMU_VMALLOC_ALIGN) on x86_64 Linux hosts. As far as I know, this was only required for KVM. There is a bad side effect of this increase: the Valgrind tool only supports an alignment of up to 1 MiB. It aborts execution with current QEMU for any target (even non-KVM targets). It might be possible to modify Valgrind (as far as I know this is already discussed), and of course I can also patch my local QEMU. Nevertheless, I think the alignment should be reduced again when there is no KVM support or KVM is disabled. Maybe the large alignment has other unwanted side effects. The code is in oslib-posix.c (target independent) and needs something like kvm_enabled() (currently a macro). What would you suggest? Maybe you can provide a patch. Kind regards, Stefan
Re: [Qemu-devel] Is realview-pb-a8 fully supported ?
On Wed, Oct 12, 2011 at 11:23 AM, Peter Maydell peter.mayd...@linaro.org wrote: On 10 October 2011 14:48, Francis Moreau francis.m...@gmail.com wrote: On Mon, Oct 10, 2011 at 10:42 AM, Peter Maydell peter.mayd...@linaro.org wrote: On 10 October 2011 08:35, Francis Moreau francis.m...@gmail.com wrote: I noticed another point for the realview platofrm: if I boot with -M 512, it works however if I set -M 256 then it doesn't. Perhaps your kernel is configured to load in the higher 256MB address range hmm which options do you have in mind ? Hmm, I thought there was an option for this but I can't find it in the config, so I must have been misremembering somehow. When I say it doesn't work, it means that nothing happen when starting qemu: no trace, it looks like it's running an infinite loop. Not even Uncompressing the kernel ? yes uncompressing the kernel works. Below is what I'm seeing exactly: Uncompressing Linux... done, booting the kernel. If you want to track down what's going on then you'll need to connect an ARM gdb up to qemu and single step through the boot process, I'm afraid. Ok but that means that I understand the code I'm going to step through :-/ BTW I'm wondering which kernel source I should use to build kernels for such plateforms (realview, vexpress, versatile) ? I'm currently using the source from kernel.org (well similar since this server seems really dead). but I'm not sure if it's a good idea... I think the mainline kernel sources should in theory work (in particular if they work with 512MB then that's a good sign...) but I'm not a kernel expert; mostly I use other peoples' prebuilt ones. ok, but wouldn't you recommend to use the kernel from Linaro ? ;) Thanks -- Francis
Re: [Qemu-devel] 2 MiB alignment in qemu_vmalloc()
On 12.10.2011, at 20:05, Stefan Weil wrote: Hello Avi, commit 36b58628 increased the alignment for some large memory blocks (typically the system RAM) to 2 MiB (QEMU_VMALLOC_ALIGN) on x86_64 Linux hosts. As far as I know, this was only required for KVM. There is a bad side effect of this increase: the Valgrind tool only supports an alignment of up to 1 MiB. It aborts execution with current QEMU for any target (even non-KVM targets). It might be possible to modify Valgrind (as far as I know this is already discussed), and of course I can also patch my local QEMU. Nevertheless, I think the alignment should be reduced again when there is no KVM support or KVM is disabled. Maybe the large alignment has other unwanted side effects. Actually, I'd much rather prefer to keep the differences between KVM and non-KVM low here. THP can potentially also work on TCG, so the alignment isn't completely moot here. Though it's certainly a lot less useful, as code isn't directly executed from there and we the rest of the overhead is a lot higher either way (especially the softmmu one). Either way, why does valgrind break when we enforce big alignment? That really sounds more like a valgrind bug than anything else. Memalign is there for exactly that reason, no? Alex
Re: [Qemu-devel] 2 MiB alignment in qemu_vmalloc()
Am 12.10.2011 22:02, schrieb Alexander Graf: Actually, I'd much rather prefer to keep the differences between KVM and non-KVM low here. THP can potentially also work on TCG, so the alignment isn't completely moot here. Though it's certainly a lot less useful, as code isn't directly executed from there and we the rest of the overhead is a lot higher either way (especially the softmmu one). Either way, why does valgrind break when we enforce big alignment? That really sounds more like a valgrind bug than anything else. Memalign is there for exactly that reason, no? Alex Actually, there is even a difference between KVM (x86_64) and KVM (non x86_64) in the current code: only x86_64 hosts use the 2 MiB alignment. Valgrind breaks because it has an assertion which limits the alignment. This limitation was already discussed in 2008 and still exists in latest Ubuntu and other distributions (and also in latest Valgrind SVN trunk). Therefore I don't expect that it will be fixed soon. See these bug reports, for example: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=489297 http://bugs.kde.org/show_bug.cgi?id=203877 Cheers, Stefan
Re: [Qemu-devel] Is realview-pb-a8 fully supported ?
On 12 October 2011 20:39, Francis Moreau francis.m...@gmail.com wrote: On Wed, Oct 12, 2011 at 11:23 AM, Peter Maydell peter.mayd...@linaro.org wrote: I think the mainline kernel sources should in theory work (in particular if they work with 512MB then that's a good sign...) but I'm not a kernel expert; mostly I use other peoples' prebuilt ones. ok, but wouldn't you recommend to use the kernel from Linaro ? ;) Obviously those work too (and in practice they're the ones I test most often with). Linaro kernels will certainly have been tested with Versatile Express, but probably not with the other older ARM devboards. But if you don't need to be on the bleeding edge for anything then I don't think it makes much difference whether you use the Linaro kernel or mainline for an established platform like the ARM devboards. (more info on the aims of the linaro kernel tree here: https://wiki.linaro.org/WorkingGroups/Kernel/KernelTree ) -- PMM
Re: [Qemu-devel] 2 MiB alignment in qemu_vmalloc()
On 12.10.2011, at 22:41, Stefan Weil wrote: Am 12.10.2011 22:02, schrieb Alexander Graf: Actually, I'd much rather prefer to keep the differences between KVM and non-KVM low here. THP can potentially also work on TCG, so the alignment isn't completely moot here. Though it's certainly a lot less useful, as code isn't directly executed from there and we the rest of the overhead is a lot higher either way (especially the softmmu one). Either way, why does valgrind break when we enforce big alignment? That really sounds more like a valgrind bug than anything else. Memalign is there for exactly that reason, no? Alex Actually, there is even a difference between KVM (x86_64) and KVM (non x86_64) in the current code: only x86_64 hosts use the 2 MiB alignment. Right. It might make sense to find a reasonable alignment for all archs and just set it to that. I vote for 16MB :). Valgrind breaks because it has an assertion which limits the alignment. This limitation was already discussed in 2008 and still exists in latest Ubuntu and other distributions (and also in latest Valgrind SVN trunk). Therefore I don't expect that it will be fixed soon. See these bug reports, for example: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=489297 http://bugs.kde.org/show_bug.cgi?id=203877 Well, yes, my point is that it's a bug in valgrind that should be fixed. I don't think we should special-case QEMU because of bugs in debugging software :) Alex
Re: [Qemu-devel] [PATCH 3/6] trace: Add trace events group implementation in the backend simple
* Mark Wu wu...@linux.vnet.ibm.com [2011-10-12 12:26]: Signed-off-by: Mark Wu wu...@linux.vnet.ibm.com --- trace/simple.c | 30 ++ trace/simple.h |7 +++ 2 files changed, 37 insertions(+), 0 deletions(-) diff --git a/trace/simple.c b/trace/simple.c index b639dda..7aa4c0b 100644 --- a/trace/simple.c +++ b/trace/simple.c @@ -321,6 +321,16 @@ void trace_print_events(FILE *stream, fprintf_function stream_printf) } } +void trace_print_groups(FILE *stream, fprintf_function stream_printf) +{ +unsigned int i; + +for (i = 0; i NR_TRACE_EVENT_GROUPS; i++) { +stream_printf(stream, %s [GROUP ID %u] : state %u\n, + trace_group_list[i].gp_name, i, + trace_group_list[i].state); You've got a mix of space and tabs in this file. Remove the tabs and adjust the spacing to CODING_STYLE rules. +} +} bool trace_event_set_state(const char *name, bool state) { unsigned int i; @@ -334,6 +344,26 @@ bool trace_event_set_state(const char *name, bool state) return false; } +bool trace_event_group_set_state(const char *gp_name, bool state) +{ +unsigned int i; +unsigned int j; +TraceEventGroup *group; + +for (i = 0; i NR_TRACE_EVENT_GROUPS; i++) { + group = trace_group_list[i]; here +if (!strcmp(group-gp_name, gp_name)) { +group-state = state; + + for (j = group-start; j = group-end; j++) { here +trace_list[j].state = state; +} +return true; +} +} +return false; +} + /* Helper function to create a thread with signals blocked. Use glib's * portable threads since QEMU abstractions cannot be used due to reentrancy in * the tracer. Also note the signal masking on POSIX hosts so that the thread diff --git a/trace/simple.h b/trace/simple.h index 466e75b..cf119f3 100644 --- a/trace/simple.h +++ b/trace/simple.h @@ -22,6 +22,13 @@ typedef struct { bool state; } TraceEvent; +typedef struct { +const char *gp_name; +bool state; +int start; +int end; +} TraceEventGroup; + void trace0(TraceEventID event); void trace1(TraceEventID event, uint64_t x1); void trace2(TraceEventID event, uint64_t x1, uint64_t x2); -- 1.7.1 -- Ryan Harper Software Engineer; Linux Technology Center IBM Corp., Austin, Tx ry...@us.ibm.com
[Qemu-devel] Buggy SDL Zoom
Hi, the SDL zoom feature which is implemented in sdl_zoom_template.h (and the SDL_rotozoom version which it is based on) accesses memory beyond the allocated limits. This can be easily reproduced using Valgrind and some Linux desktop which resizes QEMU's window to fill the whole screen (I did run the tests on an Ubuntu netbook). Another effect can be observed by repeatedly increasing the zoom factor with the Alt-Ctrl-+: the image grows up to a certain value and then collapses again. It looks like other programs using SDL_rotozoom also discovered out-of-bound problems, and in newer versions, the SDL_rotozoom code was totally rewritten. For security reasons, I suggest disabling the zoom feature until either the current code is replaced by a (tested) newer version of SDL_rotozoom or fixed. Cheers, Stefan Weil
Re: [Qemu-devel] 2 MiB alignment in qemu_vmalloc()
Am 12.10.2011 22:47, schrieb Alexander Graf: Well, yes, my point is that it's a bug in valgrind that should be fixed. I don't think we should special-case QEMU because of bugs in debugging software :) Alex Yes, the valgrind bug should be fixed. I don't know why it isn't, but that's not the point: Valgrind is very valuable for finding certain kinds of bugs in QEMU which we want to find and fix (I hope we agree on this). Therefore developers should be able to use it. Today, they cannot use Valgrind with QEMU out-of-the box, because they get an assertion. Some developers will stop here. Others will ask Google, look in Valgrind's code and spend some time to find and fix the problem before they start using Valgrind to find QEMU bugs. They could have spent their time better. I can try to make QEMU more useable with Valgrind by changing the QEMU code (which was Valgrind compatible up to Avi's change). I cannot change the Valgrind code, and even if I could, it would take a lot of time until all Linux distributions would include a fixed Valgrind :-( If all existing gdb versions did not work with QEMU, but there were a simple QEMU change which made them work, what would you do? Cheers, Stefan
Re: [Qemu-devel] 2 MiB alignment in qemu_vmalloc()
On 12.10.2011, at 23:19, Stefan Weil wrote: Am 12.10.2011 22:47, schrieb Alexander Graf: Well, yes, my point is that it's a bug in valgrind that should be fixed. I don't think we should special-case QEMU because of bugs in debugging software :) Alex Yes, the valgrind bug should be fixed. I don't know why it isn't, but that's not the point: Valgrind is very valuable for finding certain kinds of bugs in QEMU which we want to find and fix (I hope we agree on this). Therefore developers should be able to use it. Today, they cannot use Valgrind with QEMU out-of-the box, because they get an assertion. Some developers will stop here. Others will ask Google, look in Valgrind's code and spend some time to find and fix the problem before they start using Valgrind to find QEMU bugs. They could have spent their time better. I can try to make QEMU more useable with Valgrind by changing the QEMU code (which was Valgrind compatible up to Avi's change). I cannot change the Valgrind code, and even if I could, it would take a lot of time until all Linux distributions would include a fixed Valgrind :-( If all existing gdb versions did not work with QEMU, but there were a simple QEMU change which made them work, what would you do? I would add a command line option to modify the alignment in runtime, defaulting it to something sane (16MB), so you can work around bugs in vagrind with a simple parameter :) Alex
[Qemu-devel] [PATCH V4] Add AACI audio playback support to the ARM Versatile/PB platform
This driver emulates the ARM AACI interface (PL041) connected to a LM4549 codec. It enables audio playback for the Versatile/PB platform. Limitations: - Supports only a playback on one channel (Versatile/Vexpress) - Supports only one TX FIFO in compact-mode or non-compact mode. - Supports playback of 12, 16, 18 and 20 bits samples. - Record is not supported. - The PL041 is hardwired to a LM4549 codec. Versatile/PB test build: linux-2.6.38.5 buildroot-2010.11 alsa-lib-1.0.22 alsa-utils-1.0.22 mpg123-0.66 Qemu host: Ubuntu 10.04 in Vmware/OS X Playback tested successfully with speaker-test/aplay/mpg123. Signed-off-by: Mathieu Sonet cont...@elasticsheep.com --- v3-v4 * Fix debug message compile error * Remove Versatile/PB reference in pl041.c * Use 20-bit samples in the LM4549 API * Add support for non-compact mode * Now support 12, 16, 18 and 20 bit sample size * A FIFO reset is triggered when the AACIFE bit is reset Makefile.target |1 + hw/lm4549.c | 332 hw/lm4549.h | 44 hw/pl041.c | 644 ++ hw/pl041.h | 135 hw/pl041.hx | 81 +++ hw/versatilepb.c |8 + 7 files changed, 1245 insertions(+), 0 deletions(-) create mode 100644 hw/lm4549.c create mode 100644 hw/lm4549.h create mode 100644 hw/pl041.c create mode 100644 hw/pl041.h create mode 100644 hw/pl041.hx diff --git a/Makefile.target b/Makefile.target index 42adfec..00173cd 100644 --- a/Makefile.target +++ b/Makefile.target @@ -355,6 +355,7 @@ obj-arm-y += syborg_virtio.o obj-arm-y += vexpress.o obj-arm-y += strongarm.o obj-arm-y += collie.o +obj-arm-y += pl041.o lm4549.o obj-sh4-y = shix.o r2d.o sh7750.o sh7750_regnames.o tc58128.o obj-sh4-y += sh_timer.o sh_serial.o sh_intc.o sh_pci.o sm501.o diff --git a/hw/lm4549.c b/hw/lm4549.c new file mode 100644 index 000..883ef60 --- /dev/null +++ b/hw/lm4549.c @@ -0,0 +1,332 @@ +/* + * LM4549 Audio Codec Interface + * + * Copyright (c) 2011 + * Written by Mathieu Sonet - www.elasticsheep.com + * + * This code is licenced under the GPL. + * + * * + * + * This driver emulates the LM4549 codec. + * + * It supports only one playback voice and no record voice. + */ + +#include hw.h +#include audio/audio.h +#include lm4549.h + +#if 0 +#define LM4549_DEBUG 1 +#endif + +#if 0 +#define LM4549_DUMP_DAC_INPUT 1 +#endif + +#ifdef LM4549_DEBUG +#define DPRINTF(fmt, ...) \ +do { printf(lm4549: fmt , ## __VA_ARGS__); } while (0) +#else +#define DPRINTF(fmt, ...) do {} while (0) +#endif + +#if defined(LM4549_DUMP_DAC_INPUT) +#include stdio.h +static FILE *fp_dac_input; +#endif + +/* LM4549 register list */ +enum { +LM4549_Reset= 0x00, +LM4549_Master_Volume= 0x02, +LM4549_Line_Out_Volume = 0x04, +LM4549_Master_Volume_Mono = 0x06, +LM4549_PC_Beep_Volume = 0x0A, +LM4549_Phone_Volume = 0x0C, +LM4549_Mic_Volume = 0x0E, +LM4549_Line_In_Volume = 0x10, +LM4549_CD_Volume= 0x12, +LM4549_Video_Volume = 0x14, +LM4549_Aux_Volume = 0x16, +LM4549_PCM_Out_Volume = 0x18, +LM4549_Record_Select= 0x1A, +LM4549_Record_Gain = 0x1C, +LM4549_General_Purpose = 0x20, +LM4549_3D_Control = 0x22, +LM4549_Powerdown_Ctrl_Stat = 0x26, +LM4549_Ext_Audio_ID = 0x28, +LM4549_Ext_Audio_Stat_Ctrl = 0x2A, +LM4549_PCM_Front_DAC_Rate = 0x2C, +LM4549_PCM_ADC_Rate = 0x32, +LM4549_Vendor_ID1 = 0x7C, +LM4549_Vendor_ID2 = 0x7E +}; + +static void lm4549_reset(lm4549_state *s) +{ +uint16_t *regfile = s-regfile; + +regfile[LM4549_Reset] = 0x0d50; +regfile[LM4549_Master_Volume] = 0x8008; +regfile[LM4549_Line_Out_Volume] = 0x8000; +regfile[LM4549_Master_Volume_Mono] = 0x8000; +regfile[LM4549_PC_Beep_Volume] = 0x; +regfile[LM4549_Phone_Volume]= 0x8008; +regfile[LM4549_Mic_Volume] = 0x8008; +regfile[LM4549_Line_In_Volume] = 0x8808; +regfile[LM4549_CD_Volume] = 0x8808; +regfile[LM4549_Video_Volume]= 0x8808; +regfile[LM4549_Aux_Volume] = 0x8808; +regfile[LM4549_PCM_Out_Volume] = 0x8808; +regfile[LM4549_Record_Select] = 0x; +regfile[LM4549_Record_Gain] = 0x8000; +regfile[LM4549_General_Purpose] = 0x; +regfile[LM4549_3D_Control] = 0x0101; +regfile[LM4549_Powerdown_Ctrl_Stat] = 0x000f; +regfile[LM4549_Ext_Audio_ID]= 0x0001; +regfile[LM4549_Ext_Audio_Stat_Ctrl] = 0x; +regfile[LM4549_PCM_Front_DAC_Rate] = 0xbb80; +regfile[LM4549_PCM_ADC_Rate]= 0xbb80; +
Re: [Qemu-devel] [PATCH] usb-hid: activate usb tablet / mouse after migration.
On Wed, Oct 12, 2011 at 19:30, Gerd Hoffmann kra...@redhat.com wrote: qemu uses the ps/2 mouse by default. The usb tablet (or mouse) is activated as soon as qemu sees some guest activity on the device, i.e. polling for HID events. That used to work fine for both fresh boot and migration. It does not fix usb tablet/mouse when starting vm directly from snapshot. Remote wakeup support changed the picture though: There will be no polling after migration in case the guest suspended the usb bus, waiting for wakeup events. Result is that the ps/2 mouse stays active. Fix this by activating the usb tablet / mouse in post_load() in case the guest enabled remote wakeup. Signed-off-by: Gerd Hoffmann kra...@redhat.com --- hw/usb-hid.c | 11 +++ 1 files changed, 11 insertions(+), 0 deletions(-) diff --git a/hw/usb-hid.c b/hw/usb-hid.c index 7c564b6..997f828 100644 --- a/hw/usb-hid.c +++ b/hw/usb-hid.c @@ -520,10 +520,21 @@ static int usb_keyboard_initfn(USBDevice *dev) return usb_hid_initfn(dev, HID_KEYBOARD); } +static int usb_ptr_post_load(void *opaque, int version_id) +{ + USBHIDState *s = opaque; + + if (s-dev.remote_wakeup) { + hid_pointer_activate(s-hid); + } + return 0; +} + static const VMStateDescription vmstate_usb_ptr = { .name = usb-ptr, .version_id = 1, .minimum_version_id = 1, + .post_load = usb_ptr_post_load, .fields = (VMStateField []) { VMSTATE_USB_DEVICE(dev, USBHIDState), VMSTATE_HID_POINTER_DEVICE(hid, USBHIDState), -- 1.7.1
Re: [Qemu-devel] [PATCH] qemu-char: Fix use of free() instead of g_free()
于 10/07/2011 01:38 PM, Stefan Weil 写道: cppcheck reported these errors: qemu-char.c:1667: error: Mismatching allocation and deallocation: s qemu-char.c:1668: error: Mismatching allocation and deallocation: chr qemu-char.c:1769: error: Mismatching allocation and deallocation: s qemu-char.c:1770: error: Mismatching allocation and deallocation: chr Signed-off-by: Stefan Weils...@weilnetz.de --- qemu-char.c |8 1 files changed, 4 insertions(+), 4 deletions(-) diff --git a/qemu-char.c b/qemu-char.c index 09d2309..e1b2b87 100644 --- a/qemu-char.c +++ b/qemu-char.c @@ -1664,8 +1664,8 @@ static int qemu_chr_open_win(QemuOpts *opts, CharDriverState **_chr) chr-chr_close = win_chr_close; if (win_chr_init(chr, filename) 0) { -free(s); -free(chr); +g_free(s); +g_free(chr); return -EIO; } qemu_chr_generic_open(chr); @@ -1766,8 +1766,8 @@ static int qemu_chr_open_win_pipe(QemuOpts *opts, CharDriverState **_chr) chr-chr_close = win_chr_close; if (win_chr_pipe_init(chr, filename) 0) { -free(s); -free(chr); +g_free(s); +g_free(chr); return -EIO; } qemu_chr_generic_open(chr); Tested-by: Dongxu Wang wdon...@linux.vnet.ibm.com
Re: [Qemu-devel] balloon driver on winxp guest start failed
On 10/12/2011 07:09 PM, hkran wrote: Hi, I used balloon driver for windows virtio-win-0.1-15.iso (from http://alt.fedoraproject.org/pub/alt/virtio-win/latest/images/bin/) following the install guard , I installed the balloon driver like this: devcon.exe install d:\wxp\x86\balloon.inf PCI\VEN_1AF4DEV_1002SUBSYS_00051AF4REV_00 then reboot guest Os, but the status of driver installed is always incorrect, that show me the driver start failed (code 10) in the device manager. I typed the following cmds in the monitor command line: (qemu) device_add virtio-balloon (qemu) info balloon balloon: actual=2048 (qemu) balloon 1024 (qemu) info balloon balloon: actual=2048 (qemu) info balloon balloon: actual=2048 And I also tried it by using qemu -balloon virtio param when getting qemu up, the status is worse, the winxp guest froze at boot screen. Am I using balloon driver in a correct way? For the boot failure case, I take more looks into it. I open the trace output and see the following when boot failed Balloon driver, built on Oct 13 2011 10:46:59 ^M-- DriverEntry ^Mfile z:\source\kvm-guest-drivers-windows\balloon\sys\driver.c line 151 ^M-- BalloonDeviceAdd ^M-- BalloonDeviceAdd ^M-- BalloonEvtDevicePrepareHardware ^M- Port Resource [C0A0-C0C0] ^M-- BalloonEvtDevicePrepareHardware ^M-- BalloonEvtDeviceD0Entry ^M-- BalloonInit ^M-- VIRTIO_BALLOON_F_STATS_VQ ^M-- BalloonInit ^M-- BalloonInterruptEnable ^M-- BalloonInterruptEnable here, the system is blocked. I compare it with the logfile in the normal case that I hot-plugin the balloon device, and then find the system blocked before calling at BalloonInterruptDpc. Is it meaning that we open the interrupt of balloon device too soon when booting the system?