[PATCH] V2 Handle guest access to BBL_CR_CTL3 MSR
[Resubmit of prior version which contained a wayward patch hunk. Thanks Marcelo] A correction to Intel cpu model CPUID data (patch queued) caused winxp to BSOD when booted with a Penryn model. This was traced to the CPUID model field correction from 6 - 23 (as is proper for a Penryn class of cpu). Only in this case does the problem surface. The cause for this failure is winxp accessing the BBL_CR_CTL3 MSR which is unsupported by current kvm, appears to be a legacy MSR not fully characterized yet existing in current silicon, and is apparently carried forward in MSR space to accommodate vintage code as here. It is not yet conclusive whether this MSR implements any of its legacy functionality or is just an ornamental dud for compatibility. While I found no silicon version specific documentation link to this MSR, a general description exists in Intel's developer's reference which agrees with the functional behavior of other bootloader/kernel code I've examined accessing BBL_CR_CTL3. Regrettably winxp appears to be setting bit #19 called out as reserved in the above document. So to minimally accommodate this MSR, kvm msr get will provide the equivalent mock data and kvm msr write will simply toss the guest passed data without interpretation. While this treatment of BBL_CR_CTL3 addresses the immediate problem, the approach may be modified pending clarification from Intel. Signed-off-by: john cooper john.coo...@redhat.com --- diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h index 4d0dfa0..5bfafb6 100644 --- a/arch/x86/include/asm/msr-index.h +++ b/arch/x86/include/asm/msr-index.h @@ -38,6 +38,7 @@ #define MSR_MTRRcap0x00fe #define MSR_IA32_BBL_CR_CTL0x0119 +#define MSR_IA32_BBL_CR_CTL3 0x011e #define MSR_IA32_SYSENTER_CS 0x0174 #define MSR_IA32_SYSENTER_ESP 0x0175 diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index bcc0efc..04d6c55 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -1592,6 +1592,12 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 data) } else return set_msr_hyperv(vcpu, msr, data); break; + case MSR_IA32_BBL_CR_CTL3: + /* Drop writes to this legacy MSR -- see rdmsr +* counterpart for further detail. +*/ + pr_unimpl(vcpu, ignored wrmsr: 0x%x data %llx\n, msr, data); + break; default: if (msr (msr == vcpu-kvm-arch.xen_hvm_config.msr)) return xen_hvm_config(vcpu, data); @@ -1846,6 +1852,19 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata) } else return get_msr_hyperv(vcpu, msr, pdata); break; + case MSR_IA32_BBL_CR_CTL3: + /* This legacy MSR exists but isn't fully documented in current +* silicon. It is however accessed by winxp in very narrow +* scenarios where it sets bit #19, itself documented as +* a reserved bit. Best effort attempt to source coherent +* read data here should the balance of the register be +* interpreted by the guest: +* +* L2 cache control register 3: 64GB range, 256KB size, +* enabled, latency 0x1, configured +*/ + data = 0xbe702111; + break; default: if (!ignore_msrs) { pr_unimpl(vcpu, unhandled rdmsr: 0x%x\n, msr); -- john.coo...@redhat.com -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Handle guest access to BBL_CR_CTL3 MSR
Marcelo Tosatti wrote: On Sat, Jan 08, 2011 at 12:05:14AM -0500, john cooper wrote: A correction to Intel cpu model CPUID data (patch queued) caused winxp-64 to BSOD when booted with a Penryn model. This was traced to the CPUID model field correction from 6 - 23 (as is proper for a Penryn class of cpu). Only in this case does the problem surface. The cause for this failure is winxp accessing the BBL_CR_CTL3 MSR which is unsupported by current kvm, appears to be a legacy MSR not fully characterized yet existing in current silicon, and is apparently carried forward in MSR space to accommodate vintage code as here. It is not yet conclusive whether this MSR implements any of its legacy functionality or is just an ornamental dud for compatibility. While I found no silicon version specific documentation link to this MSR, a general description exists in Intel's developer's reference which agrees with the functional behavior of other bootloader/kernel code I've examined accessing BBL_CR_CTL3. Regrettably winxp-64 appears to be setting bit #19 called out as reserved in the above document. So to minimally accommodate this MSR, kvm msr get will provide the equivalent mock data and kvm msr write will simply toss the guest passed data without interpretation. While this treatment of BBL_CR_CTL3 addresses the immediate problem, the approach may be modified pending clarification from Intel. Signed-off-by: john cooper john.coo...@redhat.com --- diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h index 6b89f5e..145cd60 100644 --- a/arch/x86/include/asm/msr-index.h +++ b/arch/x86/include/asm/msr-index.h @@ -38,6 +38,7 @@ #define MSR_MTRRcap 0x00fe #define MSR_IA32_BBL_CR_CTL 0x0119 +#define MSR_IA32_BBL_CR_CTL30x011e #define MSR_IA32_SYSENTER_CS0x0174 #define MSR_IA32_SYSENTER_ESP 0x0175 diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index fa708c9..9a8331c 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -1283,6 +1283,12 @@ static int set_msr_mce(struct kvm_vcpu *vcpu, u32 msr, u64 data) return -1; vcpu-arch.mcg_ctl = data; break; +case MSR_IA32_BBL_CR_CTL3: +/* Drop writes to this legacy MSR -- see rdmsr + * counterpart for further detail. + */ +pr_unimpl(vcpu, ignored wrmsr: 0x%x data %llx\n, msr, data); +break; default: if (msr = MSR_IA32_MC0_CTL msr MSR_IA32_MC0_CTL + 4 * bank_num) { @@ -1592,6 +1598,19 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 data) } else return set_msr_hyperv(vcpu, msr, data); break; +case MSR_IA32_BBL_CR_CTL3: +/* This legacy MSR exists but isn't fully documented in current + * silicon. It is however accessed by winxp in very narrow + * scenarios where it sets bit #19, itself documented as + * a reserved bit. Best effort attempt to source coherent + * read data here should the balance of the register be + * interpreted by the guest: + * + * L2 cache control register 3: 64GB range, 256KB size, + * enabled, latency 0x1, configured + */ +data = 0xbe702111; +break; default: if (msr (msr == vcpu-kvm-arch.xen_hvm_config.msr)) return xen_hvm_config(vcpu, data); This is the MSR write path ? The write path ignores the data. The MSR read returns a mock-up of BBL_CR_CTL3 data. The above addresses the narrow usage leading to the immediate bug. From the feedback thus far from Intel I believe the above is sufficient and minimal. -john -- john.coo...@third-harmonic.com -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] Handle guest access to BBL_CR_CTL3 MSR
A correction to Intel cpu model CPUID data (patch queued) caused winxp-64 to BSOD when booted with a Penryn model. This was traced to the CPUID model field correction from 6 - 23 (as is proper for a Penryn class of cpu). Only in this case does the problem surface. The cause for this failure is winxp accessing the BBL_CR_CTL3 MSR which is unsupported by current kvm, appears to be a legacy MSR not fully characterized yet existing in current silicon, and is apparently carried forward in MSR space to accommodate vintage code as here. It is not yet conclusive whether this MSR implements any of its legacy functionality or is just an ornamental dud for compatibility. While I found no silicon version specific documentation link to this MSR, a general description exists in Intel's developer's reference which agrees with the functional behavior of other bootloader/kernel code I've examined accessing BBL_CR_CTL3. Regrettably winxp-64 appears to be setting bit #19 called out as reserved in the above document. So to minimally accommodate this MSR, kvm msr get will provide the equivalent mock data and kvm msr write will simply toss the guest passed data without interpretation. While this treatment of BBL_CR_CTL3 addresses the immediate problem, the approach may be modified pending clarification from Intel. Signed-off-by: john cooper john.coo...@redhat.com --- diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h index 6b89f5e..145cd60 100644 --- a/arch/x86/include/asm/msr-index.h +++ b/arch/x86/include/asm/msr-index.h @@ -38,6 +38,7 @@ #define MSR_MTRRcap0x00fe #define MSR_IA32_BBL_CR_CTL0x0119 +#define MSR_IA32_BBL_CR_CTL3 0x011e #define MSR_IA32_SYSENTER_CS 0x0174 #define MSR_IA32_SYSENTER_ESP 0x0175 diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index fa708c9..9a8331c 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -1283,6 +1283,12 @@ static int set_msr_mce(struct kvm_vcpu *vcpu, u32 msr, u64 data) return -1; vcpu-arch.mcg_ctl = data; break; + case MSR_IA32_BBL_CR_CTL3: + /* Drop writes to this legacy MSR -- see rdmsr +* counterpart for further detail. +*/ + pr_unimpl(vcpu, ignored wrmsr: 0x%x data %llx\n, msr, data); + break; default: if (msr = MSR_IA32_MC0_CTL msr MSR_IA32_MC0_CTL + 4 * bank_num) { @@ -1592,6 +1598,19 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 data) } else return set_msr_hyperv(vcpu, msr, data); break; + case MSR_IA32_BBL_CR_CTL3: + /* This legacy MSR exists but isn't fully documented in current +* silicon. It is however accessed by winxp in very narrow +* scenarios where it sets bit #19, itself documented as +* a reserved bit. Best effort attempt to source coherent +* read data here should the balance of the register be +* interpreted by the guest: +* +* L2 cache control register 3: 64GB range, 256KB size, +* enabled, latency 0x1, configured +*/ + data = 0xbe702111; + break; default: if (msr (msr == vcpu-kvm-arch.xen_hvm_config.msr)) return xen_hvm_config(vcpu, data); -- john.coo...@redhat.com -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Looking at using KVM for embedded product
Tom Shoes wrote: Hi there, I am looking at using KVM for an embedded product. I am also new to Virtualization so pardon if I ask dumb questions. This is my first time posting to this forum. The embedded product that need to run KVM has: a. Intel processor with VT b. BIOS supports enabling VT c. Linux kernel 2.6.26 (from kernel.org) d. No VGA adapter e. Serial console f. BusyBox Busybox is an interesting requirement in that context. If you are constrained with userland size and linking against other than glibc, use of qemu could be interesting. Can't say I've built it other than linked against glibc and an extensive list of runtime libraries. Although I've never tried to configure-down that dependency. -john -- john.coo...@third-harmonic.com -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] Add 'serial' attribute to virtio-blk devices
Rusty Russell wrote: On Sat, 19 Jun 2010 04:08:02 am Ryan Harper wrote: Create a new attribute for virtio-blk devices that will fetch the serial number of the block device. This attribute can be used by udev to create disk/by-id symlinks for devices that don't have a UUID (filesystem) associated with them. ATA_IDENTIFY strings are special in that they can be up to 20 chars long and aren't required to be NULL-terminated. The buffer is also zero-padded meaning that if the serial is 19 chars or less that we get a NULL terminated string. When copying this value into a string buffer, we must be careful to copy up to the NULL (if it present) and only 20 if it is longer and not to attempt to NULL terminate; this isn't needed. Signed-off-by: Ryan Harper ry...@us.ibm.com Signed-off-by: john cooper john.coo...@redhat.com --- drivers/block/virtio_blk.c | 32 1 files changed, 32 insertions(+), 0 deletions(-) diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c index 258bc2a..f1ef26f 100644 --- a/drivers/block/virtio_blk.c +++ b/drivers/block/virtio_blk.c @@ -281,6 +281,31 @@ static int index_to_minor(int index) return index PART_BITS; } +/* Copy serial number from *s to *d. Copy operation terminates on either + * encountering a nul in *s or after n bytes have been copied, whichever + * occurs first. *d is not forcibly nul terminated. Return # of bytes copied. + */ +static inline int serial_sysfs(char *d, char *s, int n) +{ +char *di = d; + +while (*s n--) +*d++ = *s++; +return d - di; +} + +static ssize_t virtblk_serial_show(struct device *dev, +struct device_attribute *attr, char *buf) +{ +struct gendisk *disk = dev_to_disk(dev); +char id_str[VIRTIO_BLK_ID_BYTES]; + +if (IS_ERR(virtblk_get_id(disk, id_str))) +return 0; 0? Really? That doesn't seem very informative. Propagating a prospective error from virtblk_get_id() should be possible. Unsure if doing so is more useful from the user's perspective compared to just a nul id string. +return serial_sysfs(buf, id_str, min(VIRTIO_BLK_ID_BYTES, PAGE_SIZE)); How about something like this: BUILD_BUG_ON(PAGE_SIZE VIRTIO_BLK_ID_BYTES + 1); Agreed, that's a better wrench in the gearworks. Note padding buf[] by 1 isn't necessary as indicated below. /* id_str is not necessarily nul-terminated! */ buf[VIRTIO_BLK_ID_BYTES] = '\0'; return virtblk_get_id(disk, buf); The /sys file is rendered according to the length returned from this function and the trailing nul is not interpreted in this context. In fact if a nul is added and included in the byte count of the string it will appear in the /sys file. Thanks, -john -- john.coo...@redhat.com -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] Add 'serial' attribute to virtio-blk devices
Ryan Harper wrote: * john cooper john.coo...@redhat.com [2010-06-21 01:11]: Rusty Russell wrote: On Sat, 19 Jun 2010 04:08:02 am Ryan Harper wrote: Create a new attribute for virtio-blk devices that will fetch the serial number of the block device. This attribute can be used by udev to create disk/by-id symlinks for devices that don't have a UUID (filesystem) associated with them. ATA_IDENTIFY strings are special in that they can be up to 20 chars long and aren't required to be NULL-terminated. The buffer is also zero-padded meaning that if the serial is 19 chars or less that we get a NULL terminated string. When copying this value into a string buffer, we must be careful to copy up to the NULL (if it present) and only 20 if it is longer and not to attempt to NULL terminate; this isn't needed. Signed-off-by: Ryan Harper ry...@us.ibm.com Signed-off-by: john cooper john.coo...@redhat.com --- drivers/block/virtio_blk.c | 32 1 files changed, 32 insertions(+), 0 deletions(-) diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c index 258bc2a..f1ef26f 100644 --- a/drivers/block/virtio_blk.c +++ b/drivers/block/virtio_blk.c @@ -281,6 +281,31 @@ static int index_to_minor(int index) return index PART_BITS; } +/* Copy serial number from *s to *d. Copy operation terminates on either + * encountering a nul in *s or after n bytes have been copied, whichever + * occurs first. *d is not forcibly nul terminated. Return # of bytes copied. + */ +static inline int serial_sysfs(char *d, char *s, int n) +{ + char *di = d; + + while (*s n--) + *d++ = *s++; + return d - di; +} + +static ssize_t virtblk_serial_show(struct device *dev, + struct device_attribute *attr, char *buf) +{ + struct gendisk *disk = dev_to_disk(dev); + char id_str[VIRTIO_BLK_ID_BYTES]; + + if (IS_ERR(virtblk_get_id(disk, id_str))) + return 0; 0? Really? That doesn't seem very informative. Propagating a prospective error from virtblk_get_id() should be possible. Unsure if doing so is more useful from the user's perspective compared to just a nul id string. I'm not sure we can do any thing else here; maybe printk a warning? Documentation/filesystems/sysfs.txt says that showing attributes should always return the number of chars put into the buffer; so when there is an error; zero is the right value to return since we're not filling the buffer. So we return a nul string in the case the qemu user didn't specify an id string and also in the case a legacy qemu doesn't support retrieval of an id string. Not too much difference and if needed going forward the error return can be elaborated. /* id_str is not necessarily nul-terminated! */ buf[VIRTIO_BLK_ID_BYTES] = '\0'; return virtblk_get_id(disk, buf); The /sys file is rendered according to the length returned from this function and the trailing nul is not interpreted in this context. In fact if a nul is added and included in the byte count of the string it will appear in the /sys file. Yeah; I like the simplicity; but we do need to know how long the string is so we can return that value. Which we're getting from serial_sysfs() without having to accommodate an unused nul. I'd hazard the primary reason the sysfs calling code keys off a return of byte count vs. traversing the string itself is due to the called function almost always having the byte count available. -john -- john.coo...@redhat.com -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] Add 'serial' attribute to virtio-blk devices
Rusty Russell wrote: On Tue, 22 Jun 2010 02:13:21 am Ryan Harper wrote: * john cooper john.coo...@redhat.com [2010-06-21 01:11]: Rusty Russell wrote: /* id_str is not necessarily nul-terminated! */ buf[VIRTIO_BLK_ID_BYTES] = '\0'; return virtblk_get_id(disk, buf); The /sys file is rendered according to the length returned from this function and the trailing nul is not interpreted in this context. In fact if a nul is added and included in the byte count of the string it will appear in the /sys file. Yeah; I like the simplicity; but we do need to know how long the string is so we can return that value. So we're looking at something like: /* id_str is not necessarily nul-terminated! */ buf[VIRTIO_BLK_ID_BYTES] = '\0'; err = virtblk_get_id(disk, buf); if (!err) return strlen(buf); if (err == -EIO) /* Unsupported? Make it empty. */ return 0; return err; In my haste reading your prior mail, I'd glossed over the fact you were copying direct to the sysfs buf. So in retrospect that (and the above) do make sense. Thanks, -john -- john.coo...@redhat.com -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/2] Remove virtio_blk VBID ioctl
Rusty Russell wrote: On Sat, 19 Jun 2010 04:08:03 am Ryan Harper wrote: With the availablility of a sysfs device attribute for examining disk serial numbers the ioctl is no longer needed. The user-space changes for this aren't upstream yet so we don't have any users to worry about. If John Cooper acks this, I'll push it to Linus immediately. Actually I'm the one who suggested removing it. The code in question was only intended as example usage of accessing the s/n data in the driver, for the /sys interface under discussion back then. That effort subsequently stalled and Ryan had recently picked it up. As such I believe this overshadows the general need for an ioctl. Even if for some reason an ioctl would be justified going forward, a more usage friendly form would be better. So let's just drop it for now as the corresponding qemu-side code hasn't been merged. Unfortunately we offered this interface in 2.6.34, and we're now removing it. That's unpleasant. Indeed. This entire effort, aside from being an exercise in protracted agony, probably violates a Rube Goldberg patent. Thanks, -john -- john.coo...@redhat.com -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] use unfair spinlock when running on hypervisor.
Avi Kivity wrote: On 06/01/2010 07:38 PM, Andi Kleen wrote: Your new code would starve again, right? Yes, of course it may starve with unfair spinlock. Since vcpus are not always running there is much smaller chance then vcpu on remote memory node will starve forever. Old kernels with unfair spinlocks are running fine in VMs on NUMA machines with various loads. Try it on a NUMA system with unfair memory. We are running everything on NUMA (since all modern machines are now NUMA). At what scale do the issues become observable? I understand that reason and do not propose to get back to old spinlock on physical HW! But with virtualization performance hit is unbearable. Extreme unfairness can be unbearable too. Well, the question is what happens first. In our experience, vcpu overcommit is a lot more painful. People will never see the NUMA unfairness issue if they can't use kvm due to the vcpu overcommit problem. Gleb's observed performance hit seems to be a rather mild throughput depression compared with creating a worst case by enforcing vcpu overcommit. Running a single guest with 2:1 overcommit on a 4 core machine I saw over an order of magnitude slowdown vs. 1:1 commit with the same kernel build test. Others have reported similar results. How close you'll get to that scenario depends on host scheduling dynamics, and statistically the number of opened and stalled lock held paths waiting to be contended. So I'd expect to see quite variable numbers for guest-guest aggravation of this problem. What I'd like to see eventually is a short-term-unfair, long-term-fair spinlock. Might make sense for bare metal as well. But it won't be easy to write. Collecting the contention/usage statistics on a per spinlock basis seems complex. I believe a practical approximation to this are adaptive mutexes where upon hitting a spin time threshold, punt and let the scheduler reconcile fairness. -john -- john.coo...@third-harmonic.com -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH] Add cpu model configuration support.. (resend)
Anthony Liguori wrote: On 02/01/2010 01:02 PM, john cooper wrote: [target-x86_64.conf was unintentionally omitted from the earlier patch] This is a reimplementation of prior versions which adds the ability to define cpu models for contemporary processors. The added models are likewise selected via -cpuname, and are intended to displace the existing convention of -cpu qemu64 augmented with a series of feature flags This breaks the arm-softmmu build. Ugh, does indeed as well as a few other builds. Updated patch follows. -john -- john.coo...@redhat.com -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] Add cpu model configuration support..
[Revision to fix build breakage for a few targets. This does not yet reflect Andre's suggestion to coalesce all config file flags into one space, the implementation of which depends somewhat upon acceptance of the proposed config file syntax modification and is left as a TBD for now.] This is a reimplementation of prior versions which adds the ability to define cpu models for contemporary processors. The added models are likewise selected via -cpu name, and are intended to displace the existing convention of -cpu qemu64 augmented with a series of feature flags. A primary motivation was determination of a least common denominator within a given processor class to simplify guest migration. It is still possible to modify an arbitrary model via additional feature flags however the goal here was to make doing so unnecessary in typical usage. The other consideration was providing models names reflective of current processors. Both AMD and Intel have reviewed the models in terms of balancing generality of migration vs. excessive feature downgrade relative to released silicon. This version of the patch replaces the prior hard wired definitions with a configuration file approach for new models. Existing models are thus far left as-is but may easily be transitioned to (or may be overridden by) the configuration file representation. Proposed new model definitions are provided here for current AMD and Intel processors. Each model consists of a name used to select it on the command line (-cpu name), and a model_id which corresponds to a least common denominator commercial instance of the processor class. A table of names/model_ids may be queried via -cpu ?model: : x86 Opteron_G3 AMD Opteron 23xx (Gen 3 Class Opteron) x86 Opteron_G2 AMD Opteron 22xx (Gen 2 Class Opteron) x86 Opteron_G1 AMD Opteron 240 (Gen 1 Class Opteron) x86 Nehalem Intel Core i7 9xx (Nehalem Class Core i7) x86 Penryn Intel Core 2 Duo P9xxx (Penryn Class Core 2) x86 Conroe Intel Celeron_4x0 (Conroe/Merom Class Core 2) : Also added is -cpu ?dump which exhaustively outputs all config data for all defined models, and -cpu ?cpuid which enumerates all qemu recognized CPUID feature flags. The pseudo cpuid flag 'check' when added to the feature flag list will warn when feature flags (either implicit in a cpu model or explicit on the command line) would have otherwise been quietly unavailable to a guest: # qemu-system-x86_64 ... -cpu Nehalem,check warning: host cpuid _0001 lacks requested flag 'sse4.2|sse4_2' [0x0010] warning: host cpuid _0001 lacks requested flag 'popcnt' [0x0080] A similar 'enforce' pseudo flag exists which in addition to the above causes qemu to error exit if requested flags are unavailable. Configuration data for a cpu model resides in the target config file which by default will be installed as: /usr/local/etc/qemu/target-arch.conf The format of this file should be self explanatory given the definitions for the above six models and essentially mimics the structure of the static x86_def_t x86_defs. Encoding of cpuid flags names now allows aliases for both the configuration file and the command line which reconciles some Intel/AMD/Linux/Qemu naming differences. This patch was tested relative to qemu.git. Signed-off-by: john cooper john.coo...@redhat.com --- diff --git a/Makefile b/Makefile index c72a059..c6629d6 100644 --- a/Makefile +++ b/Makefile @@ -191,7 +191,11 @@ ifdef CONFIG_POSIX $(INSTALL_DATA) qemu-nbd.8 $(DESTDIR)$(mandir)/man8 endif -install: all $(if $(BUILD_DOCS),install-doc) +install-sysconfig: + $(INSTALL_DIR) $(sysconfdir)/qemu + $(INSTALL_DATA) sysconfigs/target/target-x86_64.conf $(sysconfdir)/qemu + +install: all $(if $(BUILD_DOCS),install-doc) install-sysconfig $(INSTALL_DIR) $(DESTDIR)$(bindir) ifneq ($(TOOLS),) $(INSTALL_PROG) $(STRIP_OPT) $(TOOLS) $(DESTDIR)$(bindir) diff --git a/linux-user/main.c b/linux-user/main.c index a0d8ce7..1189dda 100644 --- a/linux-user/main.c +++ b/linux-user/main.c @@ -2553,6 +2553,10 @@ int main(int argc, char **argv, char **envp) } cpu_model = NULL; +#if defined(cpudef_setup) +cpudef_setup(); /* parse cpu definitions in target config file (TBD) */ +#endif + optind = 1; for(;;) { if (optind = argc) @@ -2624,8 +2628,8 @@ int main(int argc, char **argv, char **envp) cpu_model = argv[optind++]; if (cpu_model == NULL || strcmp(cpu_model, ?) == 0) { /* XXX: implement xxx_cpu_list for targets that still miss it */ -#if defined(cpu_list) -cpu_list(stdout, fprintf); +#if defined(cpu_list_id) +cpu_list_id(stdout, fprintf, ); #endif exit(1); } diff --git a/qemu-config.c b/qemu-config.c index c3203c8..246fae6 100644 --- a/qemu
Re: [Qemu-devel] [PATCH] Add assignment operation to config file parser..
Gerd Hoffmann wrote: On 02/07/10 17:24, Anthony Liguori wrote: On 02/06/2010 12:59 PM, john cooper wrote: This patch reworks support for both assignment and append in the config file parser. It was motivated by comments received on the cpu model config file format. Commit dc9ca4ba27be4fe6a0284061b8f056c4364fb0d9 changed the behavior of = from assign to append. This patch preserves the ability to append to an option (however now via +=), reverts = to its previous behavior, and allows both to interoperate. Signed-off-by: john cooperjohn.coo...@redhat.com This deviates from standard ini syntax which makes me a big uncomfortable with it. Gerd, do you have an opinion? Also it the syntax change will break existing users of the append feature (host/guestfwd for slirp networking). Any reason why you can't use the current append to avoid the overlong feature flag lines? Impacting existing usage was my primary concern as well. I'd run this by Mark who created the patch changing the disposition of = to append and I didn't find any alarms there. The proposed scheme supports both semantics and arguably seems more intuitive. If that is the general consensus and it won't unduly perturb existing usage the above would be opportune before the current behavior becomes more entrenched (e.g. by further dependencies such as the proposed cpu model configuration). But to answer the question, my prior patch can work with either. Thanks, -john -- john.coo...@redhat.com -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] Add assignment operation to config file parser..
This patch reworks support for both assignment and append in the config file parser. It was motivated by comments received on the cpu model config file format. Commit dc9ca4ba27be4fe6a0284061b8f056c4364fb0d9 changed the behavior of = from assign to append. This patch preserves the ability to append to an option (however now via +=), reverts = to its previous behavior, and allows both to interoperate. Signed-off-by: john cooper john.coo...@redhat.com --- diff --git a/qemu-config.c b/qemu-config.c index 246fae6..4e53250 100644 --- a/qemu-config.c +++ b/qemu-config.c @@ -429,6 +429,7 @@ int qemu_config_parse(FILE *fp) char line[1024], group[64], id[64], arg[64], value[1024]; QemuOptsList *list = NULL; QemuOpts *opts = NULL; +char append; while (fgets(line, sizeof(line), fp) != NULL) { if (line[0] == '\n') { @@ -455,13 +456,16 @@ int qemu_config_parse(FILE *fp) opts = qemu_opts_create(list, NULL, 0); continue; } -if (sscanf(line, %63s = \%1023[^\]\, arg, value) == 2) { -/* arg = value */ +append = 0; +if (sscanf(line, %63s = \%1023[^\]\, arg, value) == 2 || +(sscanf(line, %63s += \%1023[^\]\, arg, value) == 2 ? +append = 1 : 0)) { +/* arg = value, arg += value */ if (opts == NULL) { fprintf(stderr, no group defined\n); return -1; } -if (qemu_opt_set(opts, arg, value) != 0) { +if (_qemu_opt_set(opts, arg, value, append) != 0) { fprintf(stderr, failed to set \%s\ for %s\n, arg, group); return -1; diff --git a/qemu-option.c b/qemu-option.c index a52a4c4..7c0faed 100644 --- a/qemu-option.c +++ b/qemu-option.c @@ -562,7 +562,11 @@ static void qemu_opt_del(QemuOpt *opt) qemu_free(opt); } -int qemu_opt_set(QemuOpts *opts, const char *name, const char *value) +/* add option *name,*value to group *opts. if append add to tail of option + * list, else set as sole element (overwrite any existing entries of *name). + */ +int _qemu_opt_set(QemuOpts *opts, const char *name, const char *value, + char append) { QemuOpt *opt; QemuOptDesc *desc = opts-list-desc; @@ -582,13 +586,27 @@ int qemu_opt_set(QemuOpts *opts, const char *name, const char *value) return -1; } } - -opt = qemu_mallocz(sizeof(*opt)); -opt-name = qemu_strdup(name); -opt-opts = opts; -QTAILQ_INSERT_TAIL(opts-head, opt, next); -if (desc[i].name != NULL) { -opt-desc = desc+i; +if (append || !(opt = qemu_opt_find(opts, name))) { +opt = qemu_mallocz(sizeof(*opt)); +opt-name = qemu_strdup(name); +opt-opts = opts; +QTAILQ_INSERT_TAIL(opts-head, opt, next); +if (desc[i].name != NULL) { +opt-desc = desc+i; +} +} else if (!append) { +QemuOpt *p, *next; + +/* assign to reclaimed *opt, remove all other *name defs */ +QTAILQ_FOREACH_SAFE(p, opts-head, next, next) { +if (p != opt !strcmp(name, p-name)) { +qemu_free((char *)p-str); +QTAILQ_REMOVE(opts-head, p, next); +qemu_free((char *)p); +} +} +qemu_free((char *)opt-str); +opt-str = NULL; } if (value) { opt-str = qemu_strdup(value); diff --git a/qemu-option.h b/qemu-option.h index 666b666..2385b1a 100644 --- a/qemu-option.h +++ b/qemu-option.h @@ -104,7 +104,14 @@ const char *qemu_opt_get(QemuOpts *opts, const char *name); int qemu_opt_get_bool(QemuOpts *opts, const char *name, int defval); uint64_t qemu_opt_get_number(QemuOpts *opts, const char *name, uint64_t defval); uint64_t qemu_opt_get_size(QemuOpts *opts, const char *name, uint64_t defval); -int qemu_opt_set(QemuOpts *opts, const char *name, const char *value); +int _qemu_opt_set(QemuOpts *opts, const char *name, const char *value, + char append); +static inline int qemu_opt_set(QemuOpts *opts, const char *name, + const char *value) +{ +return (_qemu_opt_set(opts, name, value, 0)); +} + typedef int (*qemu_opt_loopfunc)(const char *name, const char *value, void *opaque); int qemu_opt_foreach(QemuOpts *opts, qemu_opt_loopfunc func, void *opaque, int abort_on_failure); -- john.coo...@redhat.com -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Add cpu model configuration support.. (resend)
Andre Przywara wrote: +[cpudef] + name = Conroe + level = 2 + vendor = GenuineIntel + family = 6 + model = 2 + stepping = 3 + feature_edx = sse2 sse fxsr mmx pat cmov pge sep apic cx8 mce pae msr tsc pse de fpumtrr clflush mca pse36 + feature_ecx = sse3 ssse3 + extfeature_edx = fxsr mmx pat cmov pge apic cx8 mce pae msr tsc pse de fpulm syscall nx + extfeature_ecx = lahf_lm Wouldn't it be much more user friendly to merge them all into one string? Just from the feature names it is quite obscure to guess which flag belongs into which string (especially since we lack the EXTn_ prefix we had in helper.c). I haven't tried it, but the parsing code looks like this shouldn't be too hard. To avoid overlong lines one could think about a += operator. That's true. Although I expect setup of a cpu model to be a rather infrequent occurrence by the expert (+/-) user so the above didn't strike me as a significant issue. Also -cpu ?cpuid dumps out the entire motley crew of flags relative to each grouping for reference. That said the current config file syntax seems rather rigid and I think your suggestion makes sense. I avoided modifying the parser at this point just in the interest of minimizing the sprawl of this patch. I would just drop all definitions here except qemu{32,64} and kvm{32,64}. The other models should be described in the config file. That's the goal but I wanted to leave an interim firewall of sorts. If the target-x86_64.conf isn't installed for whatever reason, qemu still can fall back to the internal definitions. Even here it isn't strictly necessary to remove an internal def as it can be redefined in the config file which will override the internal version. In general -cpu ?model will indicate internal vs. externally defined models by enclosing internal model names in brackets: : x86 Opteron_G3 AMD Opteron 23xx (Gen 3 Class Opteron) : x86 [athlon] QEMU Virtual CPU version 0.12.50 : It also seems worth dropping a hint to the user in the case qemu fails to find a target config file rather than leaving them to puzzle out why an external model has gone missing. Thanks for the feedback. -john -- john.coo...@redhat.com -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] Add cpu model configuration support.. (resend)
[target-x86_64.conf was unintentionally omitted from the earlier patch] This is a reimplementation of prior versions which adds the ability to define cpu models for contemporary processors. The added models are likewise selected via -cpu name, and are intended to displace the existing convention of -cpu qemu64 augmented with a series of feature flags. A primary motivation was determination of a least common denominator within a given processor class to simplify guest migration. It is still possible to modify an arbitrary model via additional feature flags however the goal here was to make doing so unnecessary in typical usage. The other consideration was providing models names reflective of current processors. Both AMD and Intel have reviewed the models in terms of balancing generality of migration vs. excessive feature downgrade relative to released silicon. This version of the patch replaces the prior hard wired definitions with a configuration file approach for new models. Existing models are thus far left as-is but may easily be transitioned to (or may be overridden by) the configuration file representation. Proposed new model definitions are provided here for current AMD and Intel processors. Each model consists of a name used to select it on the command line (-cpu name), and a model_id which corresponds to a least common denominator commercial instance of the processor class. A table of names/model_ids may be queried via -cpu ?model: : x86 Opteron_G3 AMD Opteron 23xx (Gen 3 Class Opteron) x86 Opteron_G2 AMD Opteron 22xx (Gen 2 Class Opteron) x86 Opteron_G1 AMD Opteron 240 (Gen 1 Class Opteron) x86 Nehalem Intel Core i7 9xx (Nehalem Class Core i7) x86 Penryn Intel Core 2 Duo P9xxx (Penryn Class Core 2) x86 Conroe Intel Celeron_4x0 (Conroe/Merom Class Core 2) : Also added is -cpu ?dump which exhaustively outputs all config data for all defined models, and -cpu ?cpuid which enumerates all qemu recognized CPUID feature flags. The pseudo cpuid flag 'check' when added to the feature flag list will warn when feature flags (either implicit in a cpu model or explicit on the command line) would have otherwise been quietly unavailable to a guest: # qemu-system-x86_64 ... -cpu Nehalem,check warning: host cpuid _0001 lacks requested flag 'sse4.2|sse4_2' [0x0010] warning: host cpuid _0001 lacks requested flag 'popcnt' [0x0080] A similar 'enforce' pseudo flag exists which in addition to the above causes qemu to error exit if requested flags are unavailable. Configuration data for a cpu model resides in the target config file which by default will be installed as: /usr/local/etc/qemu/target-arch.conf The format of this file should be self explanatory given the definitions for the above six models and essentially mimics the structure of the static x86_def_t x86_defs. Encoding of cpuid flags names now allows aliases for both the configuration file and the command line which reconciles some Intel/AMD/Linux/Qemu naming differences. This patch was tested relative to qemu.git. Signed-off-by: john cooper john.coo...@redhat.com --- diff --git a/Makefile b/Makefile index 3848627..b7fa6ef 100644 --- a/Makefile +++ b/Makefile @@ -191,7 +191,11 @@ ifdef CONFIG_POSIX $(INSTALL_DATA) qemu-nbd.8 $(DESTDIR)$(mandir)/man8 endif -install: all $(if $(BUILD_DOCS),install-doc) +install-sysconfig: + $(INSTALL_DIR) $(sysconfdir)/qemu + $(INSTALL_DATA) sysconfigs/target/target-x86_64.conf $(sysconfdir)/qemu + +install: all $(if $(BUILD_DOCS),install-doc) install-sysconfig $(INSTALL_DIR) $(DESTDIR)$(bindir) ifneq ($(TOOLS),) $(INSTALL_PROG) $(STRIP_OPT) $(TOOLS) $(DESTDIR)$(bindir) diff --git a/qemu-config.c b/qemu-config.c index c3203c8..246fae6 100644 --- a/qemu-config.c +++ b/qemu-config.c @@ -242,6 +242,54 @@ QemuOptsList qemu_mon_opts = { }, }; +QemuOptsList qemu_cpudef_opts = { +.name = cpudef, +.head = QTAILQ_HEAD_INITIALIZER(qemu_cpudef_opts.head), +.desc = { +{ +.name = name, +.type = QEMU_OPT_STRING, +},{ +.name = level, +.type = QEMU_OPT_NUMBER, +},{ +.name = vendor, +.type = QEMU_OPT_STRING, +},{ +.name = family, +.type = QEMU_OPT_NUMBER, +},{ +.name = model, +.type = QEMU_OPT_NUMBER, +},{ +.name = stepping, +.type = QEMU_OPT_NUMBER, +},{ +.name = feature_edx, /* cpuid _0001.edx */ +.type = QEMU_OPT_STRING, +},{ +.name = feature_ecx, /* cpuid _0001.ecx */ +.type = QEMU_OPT_STRING, +},{ +.name = extfeature_edx, /* cpuid 8000_0001.edx */ +.type = QEMU_OPT_STRING
[PATCH] Add cpu model configuration support..
This is a reimplementation of prior versions which adds the ability to define cpu models for contemporary processors. The added models are likewise selected via -cpu name, and are intended to displace the existing convention of -cpu qemu64 augmented with a series of feature flags. A primary motivation was determination of a least common denominator within a given processor class to simplify guest migration. It is still possible to modify an arbitrary model via additional feature flags however the goal here was to make doing so unnecessary in typical usage. The other consideration was providing models names reflective of current processors. Both AMD and Intel have reviewed the models in terms of balancing generality of migration vs. excessive feature downgrade relative to released silicon. This version of the patch replaces the prior hard wired definitions with a configuration file approach for new models. Existing models are thus far left as-is but may easily be transitioned to (or may be overridden by) the configuration file representation. Proposed new model definitions are provided here for current AMD and Intel processors. Each model consists of a name used to select it on the command line (-cpu name), and a model_id which corresponds to a least common denominator commercial instance of the processor class. A table of names/model_ids may be queried via -cpu ?model: : x86 Opteron_G3 AMD Opteron 23xx (Gen 3 Class Opteron) x86 Opteron_G2 AMD Opteron 22xx (Gen 2 Class Opteron) x86 Opteron_G1 AMD Opteron 240 (Gen 1 Class Opteron) x86 Nehalem Intel Core i7 9xx (Nehalem Class Core i7) x86 Penryn Intel Core 2 Duo P9xxx (Penryn Class Core 2) x86 Conroe Intel Celeron_4x0 (Conroe/Merom Class Core 2) : Also added is -cpu ?dump which exhaustively outputs all config data for all defined models, and -cpu ?cpuid which enumerates all qemu recognized CPUID feature flags. The pseudo cpuid flag 'check' when added to the feature flag list will warn when feature flags (either implicit in a cpu model or explicit on the command line) would have otherwise been quietly unavailable to a guest: # qemu-system-x86_64 ... -cpu Nehalem,check warning: host cpuid _0001 lacks requested flag 'sse4.2|sse4_2' [0x0010] warning: host cpuid _0001 lacks requested flag 'popcnt' [0x0080] A similar 'enforce' pseudo flag exists which in addition to the above causes qemu to error exit if requested flags are unavailable. Configuration data for a cpu model resides in the target config file which by default will be installed as: /usr/local/etc/qemu/target-arch.conf The format of this file should be self explanatory given the definitions for the above six models and essentially mimics the structure of the static x86_def_t x86_defs. Encoding of cpuid flags names now allows aliases for both the configuration file and the command line which reconciles some Intel/AMD/Linux/Qemu naming differences. This patch was tested relative to qemu.git. Signed-off-by: john cooper john.coo...@redhat.com --- diff --git a/Makefile b/Makefile index 3848627..b7fa6ef 100644 --- a/Makefile +++ b/Makefile @@ -191,7 +191,11 @@ ifdef CONFIG_POSIX $(INSTALL_DATA) qemu-nbd.8 $(DESTDIR)$(mandir)/man8 endif -install: all $(if $(BUILD_DOCS),install-doc) +install-sysconfig: + $(INSTALL_DIR) $(sysconfdir)/qemu + $(INSTALL_DATA) sysconfigs/target/target-x86_64.conf $(sysconfdir)/qemu + +install: all $(if $(BUILD_DOCS),install-doc) install-sysconfig $(INSTALL_DIR) $(DESTDIR)$(bindir) ifneq ($(TOOLS),) $(INSTALL_PROG) $(STRIP_OPT) $(TOOLS) $(DESTDIR)$(bindir) diff --git a/qemu-config.c b/qemu-config.c index c3203c8..246fae6 100644 --- a/qemu-config.c +++ b/qemu-config.c @@ -242,6 +242,54 @@ QemuOptsList qemu_mon_opts = { }, }; +QemuOptsList qemu_cpudef_opts = { +.name = cpudef, +.head = QTAILQ_HEAD_INITIALIZER(qemu_cpudef_opts.head), +.desc = { +{ +.name = name, +.type = QEMU_OPT_STRING, +},{ +.name = level, +.type = QEMU_OPT_NUMBER, +},{ +.name = vendor, +.type = QEMU_OPT_STRING, +},{ +.name = family, +.type = QEMU_OPT_NUMBER, +},{ +.name = model, +.type = QEMU_OPT_NUMBER, +},{ +.name = stepping, +.type = QEMU_OPT_NUMBER, +},{ +.name = feature_edx, /* cpuid _0001.edx */ +.type = QEMU_OPT_STRING, +},{ +.name = feature_ecx, /* cpuid _0001.ecx */ +.type = QEMU_OPT_STRING, +},{ +.name = extfeature_edx, /* cpuid 8000_0001.edx */ +.type = QEMU_OPT_STRING, +},{ +.name = extfeature_ecx, /* cpuid 8000_0001.ecx
Re: [Qemu-devel] [PATCH] Add definitions for current cpu models..
Anthony Liguori wrote: On 01/20/2010 07:18 PM, john cooper wrote: I can appreciate the concern of wanting to get this as correct as possible. This is the root of the trouble. At the qemu layer, we try to focus on being correct. Management tools are typically the layer that deals with being correct. A good compromise is making things user tunable which means that a downstream can make correctness decisions without forcing those decisions on upstream. Conceptually I agree with such a malleable approach -- actually I prefer it. I thought however it was too much infrastructure to foist on the problem just to add a few more models into the mix. The only reservation which comes to mind is that of logistics. This may ruffle the code some and impact others such as Andre who seem to have existing patches relative to the current structure. Anyone have strong objections to this approach before I have a look at an implementation? Thanks, -john -- john.coo...@redhat.com -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH] Add definitions for current cpu models..
Jamie Lokier wrote: Do you mean that more powerful management tools to support safe migration will maintain _their own_ processor model tables, and perform their calculations using their own tables instead of querying qemu, and therefore not have any need of qemu's built in table? I would expect so. IIRC that is what the libvirt folks have in mind for example. But we're also trying to simplify the use case of the lonesome user at one with the qemu CLI. -john -- john.coo...@redhat.com -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH] Add definitions for current cpu models..
Jamie Lokier wrote: I think we can all agree that there is no point looking for a familiar -cpu naming scheme because there aren't any familiar and meaningful names these days. Even if we dismiss the Intel coined names as internal code names, there is still VMW's use of them in this space which we can either align with or attempt to displace. All considered I don't see any motivation nor gain in doing the latter. Anyway it doesn't appear likely we're going to resolve this to our collective satisfaction with a hard-wired naming scheme. It would be nice if qemu could tell the user which of the built-in -cpu choices is the most featureful subset of their own host. With -cpu host implemented, finding that is probably quite easy. This should be doable although it may not be as simple as traversing a hierarchy of features and picking one with the most host flags present. In any case this should be fairly detachable from settling the immediate issue. -john -- john.coo...@redhat.com -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH] Add definitions for current cpu models..
Jamie Lokier wrote: Anthony Liguori wrote: On 01/18/2010 10:45 AM, john cooper wrote: x86 Conroe Intel Celeron_4x0 (Conroe/Merom Class Core 2) x86 Penryn Intel Core 2 Duo P9xxx (Penryn Class Core 2) x86 Nehalem Intel Core i7 9xx (Nehalem Class Core i7) x86 Opteron_G1 AMD Opteron 240 (Gen 1 Class Opteron) x86 Opteron_G2 AMD Opteron 22xx (Gen 2 Class Opteron) x86 Opteron_G3 AMD Opteron 23xx (Gen 3 Class Opteron) I'm very much against having -cpu Nehalem. The whole point of this is to make things easier for a user and for most of the users I've encountered, -cpu Nehalem is just as obscure as -cpu qemu64,-sse3,+vmx,... When I saw that table just now, I had no idea whether Nehalem is newer and more advanced than Penryn, or the other way around. I also have no idea if Core i7 is newer than Core 2 Duo or not. I can appreciate the argument above, however the goal was choosing names with some basis in reality. These were recommended by our contacts within Intel, are used by VmWare to describe their similar cpu models, and arguably have fallen to defacto usage as evidenced by such sources as: http://en.wikipedia.org/wiki/Conroe_(microprocessor) http://en.wikipedia.org/wiki/Penryn_(microprocessor) http://en.wikipedia.org/wiki/Nehalem_(microarchitecture) I suspect whatever we choose of reasonable length as a model tag for -cpu some further detail is going to be required. That was the motivation to augment the table as above with an instance of a LCD for that associated class. I'm not a typical user: I know quite a lot about x86 architecture; I just haven't kept up to date enough to know the code/model names. Typical users will know less about them. Understood. One thought I had to further clarify what is going on under the hood was to dump the cpuid flags for each model as part of (or in addition to) the above table. But this seems a bit extreme and kvm itself can modify flags exported from qemu to a guest. -john -- john.coo...@redhat.com -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH] Add definitions for current cpu models..
Anthony Liguori wrote: On 01/19/2010 02:03 PM, Chris Wright wrote: * Anthony Liguori (anth...@codemonkey.ws) wrote: I'm very much against having -cpu Nehalem. The whole point of this is to make things easier for a user and for most of the users I've encountered, -cpu Nehalem is just as obscure as -cpu qemu64,-sse3,+vmx,... What name will these users know? FWIW, it makes sense to me as it is. Whatever is in /proc/cpuinfo. $ grep name /proc/cpuinfo model name : Intel(R) Core(TM)2 Duo CPU E8400 @ 3.00GHz Which is detailing that exact cpu vs. the class of which it is a member. So are you suggesting to map all instances of processors called out in /proc/cpuinfo into one of the three defined models? We can certainly do that however I was looking for a more terse and simplified solution at this level while deferring more ornate mapping schemes to management tools. Still at the user facing CLI this doesn't strike me as the most friendly encoding of a -cpu name. -john -- john.coo...@redhat.com -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH] Add definitions for current cpu models..
Jamie Lokier wrote: john cooper wrote: As before a cpu feature 'check' option is added which warns when feature flags (either implicit in a cpu model or explicit on the command line) would have otherwise been quietly unavailable to a guest: # qemu-system-x86_64 ... -cpu Nehalem,check warning: host cpuid _0001 lacks requested flag 'sse4.2' [0x0010] warning: host cpuid _0001 lacks requested flag 'popcnt' [0x0080] That's a nice feature. Can we have a 'checkfail' option which refuses to run if a requested capability isn't available? Thanks. Certainly, others have requested the same. Let's resolve the issue at hand first. I foresee wanting to iterate over the models and pick the latest one which a host supports - on the grounds that you have done the hard work of ensuring it is a reasonably good performer, while probably working on another host of similar capability when a new host is made available. That's a fairly close use case to that of safe migration which was one of the primary motivations to identify the models being discussed. Although presentation and administration of such was considered the domain of management tools. -john -- john.coo...@redhat.com -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH] Add definitions for current cpu models..
Chris Wright wrote: * Daniel P. Berrange (berra...@redhat.com) wrote: To be honest all possible naming schemes for '-cpu name' are just as unfriendly as each other. The only user friendly option is '-cpu host'. IMHO, we should just pick a concise naming scheme document it. Given they are all equally unfriendly, the one that has consistency with vmware naming seems like a mild winner. Heh, I completely agree, and was just saying the same thing to John earlier today. May as well be -cpu {foo,bar,baz} since the meaning for those command line options must be well-documented in the man page. I can appreciate the concern of wanting to get this as correct as possible. But ultimately we just need three unique tags which ideally have some relation to their associated architectures. The diatribes available from /proc/cpuinfo while generally accurate don't really offer any more of a clue to the model group, and in their unmodified form are rather unwieldy as command line flags. This is from an EVC kb article[1]: Here is a pointer to a more detailed version: http://kb.vmware.com/selfservice/microsites/search.do?language=en_UScmd=displayKCexternalId=1003212 We probably should also add an option to dump out the full set of qemu-side cpuid flags for the benefit of users and upper level tools. -john -- john.coo...@redhat.com -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] Add definitions for current cpu models..
This is a rework of the prior version which adds definitions for contemporary processors selected via -cpu model, as an alternative to the existing use of -cpu qemu64 augmented with a series of feature flags. The primary motivation was determination of a least common denominator within a given processor class to simplify guest migration. It is still possible to modify an arbitrary model via additional feature flags however the goal here was to make doing so unnecessary in typical usage. The other consideration was providing models names reflective of current processors. Both AMD and Intel have reviewed the models in terms of balancing generality of migration vs. excessive feature downgrade relative to released silicon. Concerning the prior version of the patch, the proposed name used for a given model drew a fair amount of debate, the main concern being use of names as mnemonic as possible to the wisest group of users. Another suggestion was to use the vendor name of released silicon corresponding to a least common denominator CPU within the class, rational being doing so is more definitive of the intended functionality. However something like: -cpu Intel Core 2 Duo P9xxx probably isn't all that easy to remember nor type when selecting a Penryn class cpu. So I struck what I believe to be a reasonable compromise where the original x86_def_t.name was for the most part retained with the x86_def_t.model_id capturing the marketing name of the cpu being used as the least common denominator for the class. To make it easier for a user to associate a *.name with *.model_id, -cpu ? invoked rather as -cpu ?? will append *.model_id to the generated table: : x86 Conroe Intel Celeron_4x0 (Conroe/Merom Class Core 2) x86 Penryn Intel Core 2 Duo P9xxx (Penryn Class Core 2) x86 Nehalem Intel Core i7 9xx (Nehalem Class Core i7) x86 Opteron_G1 AMD Opteron 240 (Gen 1 Class Opteron) x86 Opteron_G2 AMD Opteron 22xx (Gen 2 Class Opteron) x86 Opteron_G3 AMD Opteron 23xx (Gen 3 Class Opteron) : As before a cpu feature 'check' option is added which warns when feature flags (either implicit in a cpu model or explicit on the command line) would have otherwise been quietly unavailable to a guest: # qemu-system-x86_64 ... -cpu Nehalem,check warning: host cpuid _0001 lacks requested flag 'sse4.2' [0x0010] warning: host cpuid _0001 lacks requested flag 'popcnt' [0x0080] This patch was tested relative to qemu.git. Signed-off-by: john cooper john.coo...@redhat.com --- diff --git a/target-i386/cpu.h b/target-i386/cpu.h index f3834b3..58400ab 100644 --- a/target-i386/cpu.h +++ b/target-i386/cpu.h @@ -722,8 +722,8 @@ typedef struct CPUX86State { CPUX86State *cpu_x86_init(const char *cpu_model); int cpu_x86_exec(CPUX86State *s); void cpu_x86_close(CPUX86State *s); -void x86_cpu_list (FILE *f, int (*cpu_fprintf)(FILE *f, const char *fmt, - ...)); +void x86_cpu_list (FILE *f, int (*cpu_fprintf)(FILE *f, const char *fmt, ...), +const char *optarg); int cpu_get_pic_interrupt(CPUX86State *s); /* MSDOS compatibility mode FPU exception support */ void cpu_set_ferr(CPUX86State *s); @@ -875,7 +875,7 @@ uint64_t cpu_get_tsc(CPUX86State *env); #define cpu_exec cpu_x86_exec #define cpu_gen_code cpu_x86_gen_code #define cpu_signal_handler cpu_x86_signal_handler -#define cpu_list x86_cpu_list +#define cpu_list_id x86_cpu_list #define CPU_SAVE_VERSION 11 diff --git a/target-i386/helper.c b/target-i386/helper.c index 730e396..34f4936 100644 --- a/target-i386/helper.c +++ b/target-i386/helper.c @@ -42,7 +42,7 @@ static const char *feature_name[] = { static const char *ext_feature_name[] = { pni /* Intel,AMD sse3 */, NULL, NULL, monitor, ds_cpl, vmx, NULL /* Linux smx */, est, tm2, ssse3, cid, NULL, NULL, cx16, xtpr, NULL, -NULL, NULL, dca, NULL, NULL, NULL, NULL, popcnt, +NULL, NULL, dca, sse4.1, sse4.2, x2apic, NULL, popcnt, NULL, NULL, NULL, NULL, NULL, NULL, NULL, hypervisor, }; static const char *ext2_feature_name[] = { @@ -58,6 +58,18 @@ static const char *ext3_feature_name[] = { NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, }; +/* collects per-function cpuid data + */ +typedef struct model_features_t { +uint32_t *guest_feat; +uint32_t *host_feat; +uint32_t check_feat; +const char **flag_names; +uint32_t cpuid; +} model_features_t; + +int check_cpuid = 0; + static void add_flagname_to_bitmaps(const char *flagname, uint32_t *features, uint32_t *ext_features, uint32_t *ext2_features, @@ -111,10 +123,25 @@ typedef struct x86_def_t { CPUID_MTRR | CPUID_PGE | CPUID_MCA | CPUID_CMOV | CPUID_PAT | \ CPUID_PSE36 | CPUID_FXSR
Re: [PATCH] Add definitions for current cpu models..
Marcelo Tosatti wrote: On Mon, Dec 21, 2009 at 01:46:36AM -0500, john cooper wrote: +{ +.name = Opteron_G2, +.level = 5, +.vendor1 = CPUID_VENDOR_INTEL_1, +.vendor2 = CPUID_VENDOR_INTEL_2, +.vendor3 = CPUID_VENDOR_INTEL_3, Silly question: why a CPU named Opteron_G2 uses intel vendor id's? The feedback I had from AMD indicated using the Intel strings for a family 15 cpu resulted in the least amount of guest confusion. The upstream kvm64 model does similarly. Sorry for my late response here which was preempted by the intervening holiday. -john -- john.coo...@redhat.com -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] Add definitions for current cpu models..
This adds definitions for contemporary processors which may be selected via -cpu model, as an alternative to the existing use of -cpu qemu64 augmented with a series of feature flags. The primary motivation was determination of a least common denominator within a given processor class for simplification of guest migration. It is still possible to modify an arbitrary model via additional feature flags however the goal here was to make doing so unnecessary in typical usage. The other consideration was providing models names reflective of current processors. Both AMD and Intel have reviewed the models in terms of balancing generality of migration vs. excessive feature downgrade relative to released silicon. A cpu feature 'check' option is also added which warns when feature flags (either implicit in a cpu model or explicit on the command line) would have otherwise been quietly disabled for a guest. This patch was tested relative to qemu-kvm.git. Signed-off-by: john cooper john.coo...@redhat.com --- diff --git a/target-i386/helper.c b/target-i386/helper.c index 9a50da6..a706cae 100644 --- a/target-i386/helper.c +++ b/target-i386/helper.c @@ -44,7 +44,7 @@ static const char *feature_name[] = { static const char *ext_feature_name[] = { pni /* Intel,AMD sse3 */, NULL, NULL, monitor, ds_cpl, vmx, NULL /* Linux smx */, est, tm2, ssse3, cid, NULL, NULL, cx16, xtpr, NULL, -NULL, NULL, dca, NULL, NULL, x2apic, NULL, popcnt, +NULL, NULL, dca, sse4.1, sse4.2, x2apic, NULL, popcnt, NULL, NULL, NULL, NULL, NULL, NULL, NULL, hypervisor, }; static const char *ext2_feature_name[] = { @@ -60,6 +60,18 @@ static const char *ext3_feature_name[] = { NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, }; +/* collects per-function cpuid data + */ +typedef struct model_features_t { +uint32_t *guest_feat; +uint32_t *host_feat; +uint32_t check_feat; +const char **flag_names; +uint32_t cpuid; +} model_features_t; + +int check_cpuid = 0; + static void add_flagname_to_bitmaps(const char *flagname, uint32_t *features, uint32_t *ext_features, uint32_t *ext2_features, @@ -171,6 +183,139 @@ static x86_def_t x86_defs[] = { .model_id = AMD Phenom(tm) 9550 Quad-Core Processor }, { +.name = Merom, +.level = 2, +.vendor1 = CPUID_VENDOR_INTEL_1, +.vendor2 = CPUID_VENDOR_INTEL_2, +.vendor3 = CPUID_VENDOR_INTEL_3, +.family = 6, /* P6 */ +.model = 2, +.stepping = 3, +.features = PPRO_FEATURES | +/* these features are needed for Win64 and aren't fully implemented */ +CPUID_MTRR | CPUID_CLFLUSH | CPUID_MCA | +/* this feature is needed for Solaris and isn't fully implemented */ +CPUID_PSE36, +.ext_features = CPUID_EXT_SSE3,/* from qemu64 */ +.ext2_features = (PPRO_FEATURES 0x0183F3FF) | +CPUID_EXT2_LM | CPUID_EXT2_SYSCALL | CPUID_EXT2_NX, +.ext3_features = CPUID_EXT3_SVM, /* from qemu64 */ +.xlevel = 0x800A, +.model_id = Intel Merom Core 2, +}, +{ +.name = Penryn, +.level = 2, +.vendor1 = CPUID_VENDOR_INTEL_1, +.vendor2 = CPUID_VENDOR_INTEL_2, +.vendor3 = CPUID_VENDOR_INTEL_3, +.family = 6, /* P6 */ +.model = 2, +.stepping = 3, +.features = PPRO_FEATURES | +/* these features are needed for Win64 and aren't fully implemented */ +CPUID_MTRR | CPUID_CLFLUSH | CPUID_MCA | +/* this feature is needed for Solaris and isn't fully implemented */ +CPUID_PSE36, +.ext_features = CPUID_EXT_SSE3 | +CPUID_EXT_CX16 | CPUID_EXT_SSSE3 | CPUID_EXT_SSE41, +.ext2_features = (PPRO_FEATURES 0x0183F3FF) | +CPUID_EXT2_LM | CPUID_EXT2_SYSCALL | CPUID_EXT2_NX, +.ext3_features = CPUID_EXT3_SVM, +.xlevel = 0x800A, +.model_id = Intel Penryn Core 2, +}, +{ +.name = Nehalem, +.level = 2, +.vendor1 = CPUID_VENDOR_INTEL_1, +.vendor2 = CPUID_VENDOR_INTEL_2, +.vendor3 = CPUID_VENDOR_INTEL_3, +.family = 6, /* P6 */ +.model = 2, +.stepping = 3, +.features = PPRO_FEATURES | +/* these features are needed for Win64 and aren't fully implemented */ +CPUID_MTRR | CPUID_CLFLUSH | CPUID_MCA | +/* this feature is needed for Solaris and isn't fully implemented */ +CPUID_PSE36, +.ext_features = CPUID_EXT_SSE3 | +CPUID_EXT_CX16 | CPUID_EXT_SSSE3 | CPUID_EXT_SSE41 | +CPUID_EXT_SSE42 | CPUID_EXT_POPCNT, +.ext2_features = (PPRO_FEATURES 0x0183F3FF) | +CPUID_EXT2_LM | CPUID_EXT2_SYSCALL | CPUID_EXT2_NX, +.ext3_features = CPUID_EXT3_SVM, +.xlevel = 0x800A, +.model_id
Re: virtio disk slower than IDE?
[prior attempts from elsewhere kept bouncing, apologies for any replication] Gordan Bobic wrote: The test is building the Linux kernel (only taking the second run to give the test the benefit of local cache): make clean; make -j8 all; make clean; sync; time make -j8 all This takes about 10 minutes with IDE disk emulation and about 13 minutes with virtio. I ran the tests multiple time with most non-essential services on the host switched off (including cron/atd), and the guest in single-user mode to reduce the noise in the test to the minimum, and the results are pretty consistent, with virtio being about 30% behind. I'd expect for an observed 30% wall clock time difference of an operation as complex as a kernel build the base i/o throughput disparity is substantially greater. Did you try a more simple/regular load, eg: a streaming dd read of various block sizes from guest raw disk devices? This is also considerably easier to debug vs. the complex i/o load generated by a build. One way to chop up the problem space is using blktrace on the host to observe both the i/o patterns coming out of qemu and the host's response to them in terms of turn around time. I expect you'll see somewhat different nature requests generated by qemu w/r/t blocking and number of threads serving virtio_blk requests relative to ide but the host response should be essentially the same in terms of data returned per unit time. If the host looks to be turning around i/o request with similar latency in both cases, the problem would be lower frequency of requests generated by qemu in the case of virtio_blk. Here it would be useful to know the host load generated by the guest for both cases. -john -- john.coo...@redhat.com -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: virtio disk slower than IDE?
Gordan Bobic wrote: The test is building the Linux kernel (only taking the second run to give the test the benefit of local cache): make clean; make -j8 all; make clean; sync; time make -j8 all This takes about 10 minutes with IDE disk emulation and about 13 minutes with virtio. I ran the tests multiple time with most non-essential services on the host switched off (including cron/atd), and the guest in single-user mode to reduce the noise in the test to the minimum, and the results are pretty consistent, with virtio being about 30% behind. I'd expect for an observed 30% wall clock time difference of an operation as complex as a kernel build the base i/o throughput disparity is substantially greater. Did you try a more simple/regular load, eg: a streaming dd read of various block sizes from guest raw disk devices? This is also considerably easier to debug vs. the complex i/o load generated by a build. One way to chop up the problem space is using blktrace on the host to observe both the i/o patterns coming out of qemu and the host's response to them in terms of turn around time. I expect you'll see somewhat different nature requests generated by qemu w/r/t blocking and number of threads serving virtio_blk requests relative to ide but the host response should be essentially the same in terms of data returned per unit time. If the host looks to be turning around i/o request with similar latency in both cases, the problem would be lower frequency of requests generated by qemu in the case of virtio_blk. Here it would be useful to know the host load generated by the guest for both cases. -john -- john.coo...@third-harmonic.com -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: KVM-88 broke VirtIO Hard Disks
Anthony Liguori wrote: Marcelo Tosatti wrote: bf011293f is an easy one to blame, can you revert it and check, please? Has anyone tracked down a proper fix? Apologies, I'd been distracted elsewhere. I suspect transferring the identify page in its entirety via the config space is somehow confusing the windows virtio driver although it isn't immediately clear why. Investigating it now. -john -- john.coo...@redhat.com -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/2] Add serial number support for virtio_blk, V4
Jens Axboe wrote: On Mon, Jun 01 2009, Rusty Russell wrote: On Fri, 29 May 2009 01:45:27 pm john cooper wrote: virtio_blk-serial-4.patch Hate to ask dumb questions, but is there a scsi equivalent of this? It'd be nice if we could avoid being ATA-specific in the long run... SCSI has mode pages, where ATA pretty much stuffs everything into the identify data. Also, why u16? The identify page is word based, so u16 makes sense. It is actually an artifact left over from the previous version. In that case the driver was only pulling the serial number over, itself constructing a mocked-up identify page, and needed to be aware of the internal structure. Currently qemu fabricates the identify page which the driver treats as opaque data passed onto the user. So the u16 * structure has no meaning. Sorry to be late in responding here. A patch to address this wrinkle and other feedback to follow shortly. -john +/* return ATA identify data + */ +static int virtblk_identify(struct gendisk *disk, void *argp) +{ + struct virtio_blk *vblk = disk-private_data; + u16 *id; + int err = -ENOMEM; + + id = kmalloc(VIRTIO_BLK_ID_BYTES, GFP_KERNEL); + if (!id) + goto out; + + err = virtio_config_buf(vblk-vdev, VIRTIO_BLK_F_IDENTIFY, + offsetof(struct virtio_blk_config, identify), id, + VIRTIO_BLK_ID_BYTES); -- john.coo...@redhat.com -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/2] Add serial number support for virtio_blk, V4a
This patch extracts the opaque data from pci i/o region 0 via the added VIRTIO_BLK_F_IDENTIFY field. By convention this data takes the form of that returned by an ATA IDENTIFY DEVICE command, however the driver (except for structure size) makes no interpretation of the data. The structure data is copied wholesale to userspace via a HDIO_GET_IDENTITY ioctl command (eg: hdparm -i dev). Signed-off-by: john cooper john.coo...@redhat.com --- drivers/block/virtio_blk.c | 41 ++--- include/linux/virtio_blk.h |6 ++ 2 files changed, 44 insertions(+), 3 deletions(-) = --- a/drivers/block/virtio_blk.c +++ b/drivers/block/virtio_blk.c @@ -146,12 +146,46 @@ static void do_virtblk_request(struct re vblk-vq-vq_ops-kick(vblk-vq); } +/* return ATA identify data + */ +static int virtblk_identify(struct gendisk *disk, void *argp) +{ + struct virtio_blk *vblk = disk-private_data; + void *opaque; + int err = -ENOMEM; + + opaque = kmalloc(VIRTIO_BLK_ID_BYTES, GFP_KERNEL); + if (!opaque) + goto out; + + err = virtio_config_buf(vblk-vdev, VIRTIO_BLK_F_IDENTIFY, + offsetof(struct virtio_blk_config, identify), opaque, + VIRTIO_BLK_ID_BYTES); + + if (err) + goto out_kfree; + + if (copy_to_user(argp, opaque, VIRTIO_BLK_ID_BYTES)) + err = -EFAULT; + +out_kfree: + kfree(opaque); +out: + return err; +} + static int virtblk_ioctl(struct block_device *bdev, fmode_t mode, unsigned cmd, unsigned long data) { - return scsi_cmd_ioctl(bdev-bd_disk-queue, - bdev-bd_disk, mode, cmd, - (void __user *)data); + struct gendisk *disk = bdev-bd_disk; + void __user *argp = (void __user *)data; + + switch (cmd) { + case HDIO_GET_IDENTITY: + return virtblk_identify(disk, argp); + default: + return scsi_cmd_ioctl(disk-queue, disk, mode, cmd, argp); + } } /* We provide getgeo only to please some old bootloader/partitioning tools */ @@ -356,6 +390,7 @@ static struct virtio_device_id id_table[ static unsigned int features[] = { VIRTIO_BLK_F_BARRIER, VIRTIO_BLK_F_SEG_MAX, VIRTIO_BLK_F_SIZE_MAX, VIRTIO_BLK_F_GEOMETRY, VIRTIO_BLK_F_RO, VIRTIO_BLK_F_BLK_SIZE, + VIRTIO_BLK_F_IDENTIFY }; static struct virtio_driver virtio_blk = { = --- a/include/linux/virtio_blk.h +++ b/include/linux/virtio_blk.h @@ -15,7 +15,12 @@ #define VIRTIO_BLK_F_GEOMETRY 4 /* Legacy geometry available */ #define VIRTIO_BLK_F_RO5 /* Disk is read-only */ #define VIRTIO_BLK_F_BLK_SIZE 6 /* Block size of disk is available*/ +#define VIRTIO_BLK_F_IDENTIFY 8 /* ATA IDENTIFY supported */ +#define VIRTIO_BLK_ID_BYTES(sizeof (__u16[256])) /* IDENTIFY DATA */ + +/* mapped into pci i/o region 0 + */ struct virtio_blk_config { /* The capacity (in 512-byte sectors). */ @@ -32,6 +37,7 @@ struct virtio_blk_config } geometry; /* block size of device (if VIRTIO_BLK_F_BLK_SIZE) */ __u32 blk_size; + __u8 identify[VIRTIO_BLK_ID_BYTES]; } __attribute__((packed)); /* These two define direction. */ -- john.coo...@redhat.com -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: configure script bug..
Avi Kivity wrote: john cooper wrote: Hit this yesterday when configure hung attempting to pull the version from a kernel's .config. Looks right, but missing a signoff. Signed-off-by: john cooper john.coo...@redhat.com diff --git a/configure b/configure index 493c178..1fd133c 100755 --- a/configure +++ b/configure @@ -126,7 +126,7 @@ if [ -n $no_uname ]; then elif [ -e $kerneldir/include/config/kernel.release ]; then depmod_version=`cat $kerneldir/include/config/kernel.release` elif [ -e $kerneldir/.config ]; then - depmod_version=$(awk '/Linux kernel version:/ { print $NF }' + depmod_version=$(awk '/Linux kernel version:/ { print $NF }' \ $kerneldir/.config) else echo -- john.coo...@redhat.com -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
configure script bug..
Hit this yesterday when configure hung attempting to pull the version from a kernel's .config. diff --git a/configure b/configure index 493c178..1fd133c 100755 --- a/configure +++ b/configure @@ -126,7 +126,7 @@ if [ -n $no_uname ]; then elif [ -e $kerneldir/include/config/kernel.release ]; then depmod_version=`cat $kerneldir/include/config/kernel.release` elif [ -e $kerneldir/.config ]; then - depmod_version=$(awk '/Linux kernel version:/ { print $NF }' + depmod_version=$(awk '/Linux kernel version:/ { print $NF }' \ $kerneldir/.config) else echo -- john.coo...@redhat.com -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH 2/2] Add serial number support for virtio_blk, V3
+if (!(id = kzalloc(ATA_ID_WORDS, GFP_KERNEL))) +rv = -ENOMEM; Doesn't ATA_ID_WORDS seem like a strange name for a number of bytes? Yes I caught that bug in the rework as well. What's this *for* BTW? Sorry -- I assumed you were on either list. Please see patch to follow. -john -- john.coo...@redhat.com -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH 2/2] Add serial number support for virtio_blk, V3
Christoph Hellwig wrote: On Wed, May 27, 2009 at 09:49:19AM +0200, Christoph Hellwig wrote: /* * IDE-compatible identify ioctl. * * Currenlyt only returns the serial number and leaves all other fields * zero. */ Btw, thinking about it the rest of the information in the ioctl should probably be filled up with faked data, similar to how we do it for the ide emulation inside qemu. Agreed. I've done so to the extent it makes sense in the case of the more generic fields. A fair amount of the identify information being generated by hw/ide.c appears obsoleted. -john -- john.coo...@redhat.com -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 0/2] Add serial number support for virtio_blk, V4
[Rework of earlier patch to provide additional information in the response to an ATA identify request -- virtio_blk treats the data as opaque, content created by qemu's virtio-blk. Comments from Christoph also incorporated.] This patch allows passing of a virtio_blk drive serial number from qemu into a guest's virtio_blk driver, and provides a means to access the serial number from a guest's userspace. Equivalent functionality currently exists for IDE and SCSI, however it is not yet implemented for virtio. Scenarios exist where guest code relies on a unique drive serial number to correctly identify the machine environment in which it exists. The following two patches implement the above: qemu-vblk-serial-4.patch which provides the qemu missing bits to interpret a '-drive .. serial=XYZ ..' flag, and: virtio_blk-serial-4.patch which extracts this information and makes it available to guest userspace via an HDIO_GET_IDENTITY ioctl, eg: 'hdparm -i /dev/vda'. The above patches are relative to qemu-kvm.git and 2.6.29.3 respectively. -john -- john.coo...@redhat.com -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/2] Add serial number support for virtio_blk, V4
qemu-vblk-serial-4.patch diff --git a/hw/virtio-blk.c b/hw/virtio-blk.c index 8dd3c7a..0b7ebe9 100644 --- a/hw/virtio-blk.c +++ b/hw/virtio-blk.c @@ -25,6 +25,7 @@ typedef struct VirtIOBlock BlockDriverState *bs; VirtQueue *vq; void *rq; +char serial_str[BLOCK_SERIAL_STRLEN + 1]; } VirtIOBlock; static VirtIOBlock *to_virtio_blk(VirtIODevice *vdev) @@ -285,6 +286,50 @@ static void virtio_blk_reset(VirtIODevice *vdev) qemu_aio_flush(); } +/* store identify data in little endian format + */ +static inline void put_le16(uint16_t *p, unsigned int v) +{ +*p = cpu_to_le16(v); +} + +/* copy to *dst from *src, nul pad dst tail as needed to len bytes + */ +static inline void padstr(char *dst, const char *src, int len) +{ +while (len--) +*dst++ = *src ? *src++ : '\0'; +} + +/* setup simulated identify data as appropriate for virtio block device + * + * ref: AT Attachment 8 - ATA/ATAPI Command Set (ATA8-ACS) + */ +static inline void virtio_identify_template(struct virtio_blk_config *bc) +{ +uint16_t *p = bc-identify[0]; +uint64_t lba_sectors = bc-capacity; + +memset(p, 0, sizeof(bc-identify)); +put_le16(p + 0, 0x0);/* ATA device */ +padstr((char *)(p + 23), QEMU_VERSION, 8); /* firmware revision */ +padstr((char *)(p + 27), QEMU VIRT_BLK, 40); /* model# */ +put_le16(p + 47, 0x80ff);/* max xfer 255 sectors */ +put_le16(p + 49, 0x0b00);/* support IORDY/LBA/DMA */ +put_le16(p + 59, 0x1ff); /* cur xfer 255 sectors */ +put_le16(p + 80, 0x1f0); /* support ATA8/7/6/5/4 */ +put_le16(p + 81, 0x16); +put_le16(p + 82, 0x400); +put_le16(p + 83, 0x400); +put_le16(p + 100, lba_sectors); +put_le16(p + 101, lba_sectors 16); +put_le16(p + 102, lba_sectors 32); +put_le16(p + 103, lba_sectors 48); +} + +/* coalesce internal state, copy to pci i/o region 0 + */ + static void virtio_blk_update_config(VirtIODevice *vdev, uint8_t *config) { VirtIOBlock *s = to_virtio_blk(vdev); @@ -299,11 +344,15 @@ static void virtio_blk_update_config(VirtIODevice *vdev, uint8_t *config) stw_raw(blkcfg.cylinders, cylinders); blkcfg.heads = heads; blkcfg.sectors = secs; +virtio_identify_template(blkcfg); +memcpy(blkcfg.identify[VIRTIO_BLK_ID_SN], s-serial_str, +VIRTIO_BLK_ID_SN_BYTES); memcpy(config, blkcfg, sizeof(blkcfg)); } static uint32_t virtio_blk_get_features(VirtIODevice *vdev) { +VirtIOBlock *s = to_virtio_blk(vdev); uint32_t features = 0; features |= (1 VIRTIO_BLK_F_SEG_MAX); @@ -311,6 +360,8 @@ static uint32_t virtio_blk_get_features(VirtIODevice *vdev) #ifdef __linux__ features |= (1 VIRTIO_BLK_F_SCSI); #endif +if (strcmp(s-serial_str, 0)) +features |= 1 VIRTIO_BLK_F_IDENTIFY; return features; } @@ -354,6 +405,7 @@ VirtIODevice *virtio_blk_init(DeviceState *dev) int cylinders, heads, secs; static int virtio_blk_id; BlockDriverState *bs; +char *ps; s = (VirtIOBlock *)virtio_common_init(virtio-blk, VIRTIO_ID_BLOCK, sizeof(struct virtio_blk_config), @@ -365,6 +417,10 @@ VirtIODevice *virtio_blk_init(DeviceState *dev) s-vdev.reset = virtio_blk_reset; s-bs = bs; s-rq = NULL; +if (strlen(ps = (char *)drive_get_serial(bs))) +strncpy(s-serial_str, ps, sizeof(s-serial_str)); +else +snprintf(s-serial_str, sizeof(s-serial_str), 0); bs-private = dev; bdrv_guess_geometry(s-bs, cylinders, heads, secs); bdrv_set_geometry_hint(s-bs, cylinders, heads, secs); diff --git a/hw/virtio-blk.h b/hw/virtio-blk.h index dff3e0c..1be4342 100644 --- a/hw/virtio-blk.h +++ b/hw/virtio-blk.h @@ -30,6 +30,11 @@ #define VIRTIO_BLK_F_RO 5 /* Disk is read-only */ #define VIRTIO_BLK_F_BLK_SIZE 6 /* Block size of disk is available*/ #define VIRTIO_BLK_F_SCSI 7 /* Supports scsi command passthru */ +#define VIRTIO_BLK_F_IDENTIFY 8 /* ATA IDENTIFY supported */ + +#define VIRTIO_BLK_ID_LEN 256 /* length of identify u16 array */ +#define VIRTIO_BLK_ID_SN10 /* start of char * serial# */ +#define VIRTIO_BLK_ID_SN_BYTES 20 /* length in bytes of serial# */ struct virtio_blk_config { @@ -39,6 +44,8 @@ struct virtio_blk_config uint16_t cylinders; uint8_t heads; uint8_t sectors; +uint32_t _blk_size;/* structure pad, currently unused */ +uint16_t identify[VIRTIO_BLK_ID_LEN]; } __attribute__((packed)); /* These two define direction. */ diff --git a/sysemu.h b/sysemu.h index 47d001e..d3df19f 100644 --- a/sysemu.h +++ b/sysemu.h @@ -152,6 +152,8 @@ typedef enum { BLOCK_ERR_STOP_ANY } BlockInterfaceErrorAction; +#define BLOCK_SERIAL_STRLEN 20 + typedef struct DriveInfo { BlockDriverState *bdrv;
[PATCH 2/2] Add serial number support for virtio_blk, V4
virtio_blk-serial-4.patch drivers/block/virtio_blk.c | 41 ++--- include/linux/virtio_blk.h |7 +++ 2 files changed, 45 insertions(+), 3 deletions(-) = --- a/drivers/block/virtio_blk.c +++ b/drivers/block/virtio_blk.c @@ -146,12 +146,46 @@ static void do_virtblk_request(struct re vblk-vq-vq_ops-kick(vblk-vq); } +/* return ATA identify data + */ +static int virtblk_identify(struct gendisk *disk, void *argp) +{ + struct virtio_blk *vblk = disk-private_data; + u16 *id; + int err = -ENOMEM; + + id = kmalloc(VIRTIO_BLK_ID_BYTES, GFP_KERNEL); + if (!id) + goto out; + + err = virtio_config_buf(vblk-vdev, VIRTIO_BLK_F_IDENTIFY, + offsetof(struct virtio_blk_config, identify), id, + VIRTIO_BLK_ID_BYTES); + + if (err) + goto out_kfree; + + if (copy_to_user(argp, id, VIRTIO_BLK_ID_BYTES)) + err = -EFAULT; + +out_kfree: + kfree(id); +out: + return err; +} + static int virtblk_ioctl(struct block_device *bdev, fmode_t mode, unsigned cmd, unsigned long data) { - return scsi_cmd_ioctl(bdev-bd_disk-queue, - bdev-bd_disk, mode, cmd, - (void __user *)data); + struct gendisk *disk = bdev-bd_disk; + void __user *argp = (void __user *)data; + + switch (cmd) { + case HDIO_GET_IDENTITY: + return virtblk_identify(disk, argp); + default: + return scsi_cmd_ioctl(disk-queue, disk, mode, cmd, argp); + } } /* We provide getgeo only to please some old bootloader/partitioning tools */ @@ -356,6 +390,7 @@ static struct virtio_device_id id_table[ static unsigned int features[] = { VIRTIO_BLK_F_BARRIER, VIRTIO_BLK_F_SEG_MAX, VIRTIO_BLK_F_SIZE_MAX, VIRTIO_BLK_F_GEOMETRY, VIRTIO_BLK_F_RO, VIRTIO_BLK_F_BLK_SIZE, + VIRTIO_BLK_F_IDENTIFY }; static struct virtio_driver virtio_blk = { = --- a/include/linux/virtio_blk.h +++ b/include/linux/virtio_blk.h @@ -15,7 +15,13 @@ #define VIRTIO_BLK_F_GEOMETRY 4 /* Legacy geometry available */ #define VIRTIO_BLK_F_RO5 /* Disk is read-only */ #define VIRTIO_BLK_F_BLK_SIZE 6 /* Block size of disk is available*/ +#define VIRTIO_BLK_F_IDENTIFY 8 /* ATA IDENTIFY supported */ +#define VIRTIO_BLK_ID_LEN 256 +#define VIRTIO_BLK_ID_BYTES (VIRTIO_BLK_ID_LEN * sizeof (u16)) + +/* mapped into pci i/o region 0 + */ struct virtio_blk_config { /* The capacity (in 512-byte sectors). */ @@ -32,6 +38,7 @@ struct virtio_blk_config } geometry; /* block size of device (if VIRTIO_BLK_F_BLK_SIZE) */ __u32 blk_size; + __u16 identify[VIRTIO_BLK_ID_LEN]; } __attribute__((packed)); /* These two define direction. */ -- john.coo...@redhat.com -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/2] Add serial number support for virtio_blk, V2
Christoph Hellwig wrote: On Mon, May 18, 2009 at 11:00:41AM -0400, john cooper wrote: Christoph Hellwig wrote: So why can't we re-use the existing interfaces instead of inventing a new one? I'm unclear to what specifically you're referring -- the ioctl() used to retrieve the serial number in the guest? Well, there's not specific ioctl to get a serial number for scsi, but given that we now have SG_IO passthrough in virtio-blk it should be easy enough to provide inquiry data and the device identification VPD page by that way. Not sure how it's handled for ide, maybe that way is even easier. Yea, I'm hardly in enamored with the IDE/ATA diatribe. But in this case displacing the new ioctl with HDIO_GET_IDENTITY seemed the most straightforward means to provide access within an existing interface. Updated patch follows. -john -- john.coo...@redhat.com -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/2] Add serial number support for virtio_blk, V3
-- john.coo...@redhat.com diff --git a/hw/virtio-blk.c b/hw/virtio-blk.c index dad4ef0..90825a8 100644 --- a/hw/virtio-blk.c +++ b/hw/virtio-blk.c @@ -25,6 +25,7 @@ typedef struct VirtIOBlock BlockDriverState *bs; VirtQueue *vq; void *rq; +char serial_str[BLOCK_SERIAL_STRLEN + 1]; } VirtIOBlock; static VirtIOBlock *to_virtio_blk(VirtIODevice *vdev) @@ -285,6 +286,8 @@ static void virtio_blk_reset(VirtIODevice *vdev) qemu_aio_flush(); } +/* coalesce internal state, copy to pci i/o region 0 + */ static void virtio_blk_update_config(VirtIODevice *vdev, uint8_t *config) { VirtIOBlock *s = to_virtio_blk(vdev); @@ -299,11 +302,13 @@ static void virtio_blk_update_config(VirtIODevice *vdev, uint8_t *config) stw_raw(blkcfg.cylinders, cylinders); blkcfg.heads = heads; blkcfg.sectors = secs; +memcpy(blkcfg.serial, s-serial_str, sizeof (blkcfg.serial)); memcpy(config, blkcfg, sizeof(blkcfg)); } static uint32_t virtio_blk_get_features(VirtIODevice *vdev) { +VirtIOBlock *s = to_virtio_blk(vdev); uint32_t features = 0; features |= (1 VIRTIO_BLK_F_SEG_MAX); @@ -311,6 +316,8 @@ static uint32_t virtio_blk_get_features(VirtIODevice *vdev) #ifdef __linux__ features |= (1 VIRTIO_BLK_F_SCSI); #endif +if (strcmp(s-serial_str, 0)) +features |= 1 VIRTIO_BLK_F_SN; return features; } @@ -353,6 +360,7 @@ void *virtio_blk_init(PCIBus *bus, BlockDriverState *bs) VirtIOBlock *s; int cylinders, heads, secs; static int virtio_blk_id; +char *ps = drive_get_serial(bs); s = (VirtIOBlock *)virtio_init_pci(bus, virtio-blk, PCI_VENDOR_ID_REDHAT_QUMRANET, @@ -369,6 +377,10 @@ void *virtio_blk_init(PCIBus *bus, BlockDriverState *bs) s-vdev.reset = virtio_blk_reset; s-bs = bs; s-rq = NULL; +if (strlen(ps)) +strncpy(s-serial_str, ps, sizeof (s-serial_str)); +else +snprintf(s-serial_str, sizeof (s-serial_str), 0); bs-private = s-vdev.pci_dev; bdrv_guess_geometry(s-bs, cylinders, heads, secs); bdrv_set_geometry_hint(s-bs, cylinders, heads, secs); diff --git a/hw/virtio-blk.h b/hw/virtio-blk.h index 5ef6c36..3229394 100644 --- a/hw/virtio-blk.h +++ b/hw/virtio-blk.h @@ -31,6 +31,7 @@ #define VIRTIO_BLK_F_RO 5 /* Disk is read-only */ #define VIRTIO_BLK_F_BLK_SIZE 6 /* Block size of disk is available*/ #define VIRTIO_BLK_F_SCSI 7 /* Supports scsi command passthru */ +#define VIRTIO_BLK_F_SN 8 /* serial number supported */ struct virtio_blk_config { @@ -40,6 +41,8 @@ struct virtio_blk_config uint16_t cylinders; uint8_t heads; uint8_t sectors; +uint32_t _blk_size;/* structure pad, currently unused */ +uint8_t serial[BLOCK_SERIAL_STRLEN]; } __attribute__((packed)); /* These two define direction. */ diff --git a/sysemu.h b/sysemu.h index 1f45fd6..185b4e3 100644 --- a/sysemu.h +++ b/sysemu.h @@ -141,6 +141,8 @@ typedef enum { BLOCK_ERR_STOP_ANY } BlockInterfaceErrorAction; +#define BLOCK_SERIAL_STRLEN 20 + typedef struct DriveInfo { BlockDriverState *bdrv; BlockInterfaceType type; @@ -149,7 +151,7 @@ typedef struct DriveInfo { int used; int drive_opt_idx; BlockInterfaceErrorAction onerror; -char serial[21]; +char serial[BLOCK_SERIAL_STRLEN + 1]; } DriveInfo; #define MAX_IDE_DEVS 2
[PATCH 2/2] Add serial number support for virtio_blk, V3
-- john.coo...@redhat.com drivers/block/virtio_blk.c | 32 +--- include/linux/virtio_blk.h |6 ++ 2 files changed, 35 insertions(+), 3 deletions(-) = --- a/drivers/block/virtio_blk.c +++ b/drivers/block/virtio_blk.c @@ -146,12 +146,37 @@ static void do_virtblk_request(struct re vblk-vq-vq_ops-kick(vblk-vq); } +/* return serial# in IDE identify data (all other fields are currently zero) + */ +static int virtblk_identity(struct block_device *bdev, void *buf) +{ + struct virtio_blk *vblk = bdev-bd_disk-private_data; + u16 *id; + int rv; + + if (!(id = kzalloc(ATA_ID_WORDS, GFP_KERNEL))) + rv = -ENOMEM; + else if ((rv = virtio_config_buf(vblk-vdev, VIRTIO_BLK_F_SN, + offsetof(struct virtio_blk_config, serial), id[ATA_ID_SERNO], + ATA_ID_SERNO_LEN))) + ; + else if (copy_to_user(buf, id, ATA_ID_WORDS)) + rv = -EFAULT; + else + rv = 0; + if (id) + kfree(id); +return (rv); +} + static int virtblk_ioctl(struct block_device *bdev, fmode_t mode, unsigned cmd, unsigned long data) { - return scsi_cmd_ioctl(bdev-bd_disk-queue, - bdev-bd_disk, mode, cmd, - (void __user *)data); + if (cmd == HDIO_GET_IDENTITY) + return (virtblk_identity(bdev, (void __user *)data)); + else + return scsi_cmd_ioctl(bdev-bd_disk-queue, bdev-bd_disk, + mode, cmd, (void __user *)data); } /* We provide getgeo only to please some old bootloader/partitioning tools */ @@ -356,6 +381,7 @@ static struct virtio_device_id id_table[ static unsigned int features[] = { VIRTIO_BLK_F_BARRIER, VIRTIO_BLK_F_SEG_MAX, VIRTIO_BLK_F_SIZE_MAX, VIRTIO_BLK_F_GEOMETRY, VIRTIO_BLK_F_RO, VIRTIO_BLK_F_BLK_SIZE, + VIRTIO_BLK_F_SN }; static struct virtio_driver virtio_blk = { = --- a/include/linux/virtio_blk.h +++ b/include/linux/virtio_blk.h @@ -15,7 +15,12 @@ #define VIRTIO_BLK_F_GEOMETRY 4 /* Legacy geometry available */ #define VIRTIO_BLK_F_RO 5 /* Disk is read-only */ #define VIRTIO_BLK_F_BLK_SIZE 6 /* Block size of disk is available*/ +#define VIRTIO_BLK_F_SN 8 /* serial number supported */ +#define BLOCK_SERIAL_STRLEN 20 + +/* mapped into pci i/o region 0 + */ struct virtio_blk_config { /* The capacity (in 512-byte sectors). */ @@ -32,6 +37,7 @@ struct virtio_blk_config } geometry; /* block size of device (if VIRTIO_BLK_F_BLK_SIZE) */ __u32 blk_size; + __u8 serial[BLOCK_SERIAL_STRLEN]; } __attribute__((packed)); /* These two define direction. */
Re: [PATCH 0/2] Add serial number support for virtio_blk, V2
Christoph Hellwig wrote: On Wed, May 13, 2009 at 01:06:57PM -0400, john cooper wrote: [Resend of earlier patch: 1/2 rebased to qemu-kvm, 2/2 minor tweak] patch 1/2 seems to be missing. It is in the kvm and qemu-devel list archives: http://www.spinics.net/lists/kvm/maillist.html http://lists.gnu.org/archive/html/qemu-devel/2009-05/msg00661.html Equivalent functionality currently exists for IDE and SCSI, however it is not yet implemented for virtio. So why can't we re-use the existing interfaces instead of inventing a new one? I'm unclear to what specifically you're referring -- the ioctl() used to retrieve the serial number in the guest? -john -- john.coo...@redhat.com -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 0/2] Add serial number support for virtio_blk, V2
[Resend of earlier patch: 1/2 rebased to qemu-kvm, 2/2 minor tweak] This patch allows passing of a virtio_blk drive serial number from qemu into a guest's virtio_blk driver, and provides a means to access the serial number from a guest's userspace. Equivalent functionality currently exists for IDE and SCSI, however it is not yet implemented for virtio. Scenarios exist where guest code relies on a unique drive serial number to correctly identify the machine environment in which it exists. The following two patches implement the above: qemu-vblk-serial-2.patch which provides the qemu missing bits to interpret a '-drive .. serial=XYZ ..' flag, and: virtio_blk-serial-2.patch which extracts this information and makes it available to guest userspace via ioctl. Attached to this patch header is a trivial example program which retrieves the serial number from guest userspace. The above patches are relative to qemu-kvm.git and 2.6.29.3 respectively. -john -- john.coo...@redhat.com /* example: retrieve serial number from virtio block device */ #include stdio.h #include fcntl.h #include stdlib.h #include linux/virtio_blk.h #define iswhite(c) (!('!' = (c) (c) = '~')) #ifndef VBLK_GET_SN #define VBLK_GET_SN ((unsigned int)('V' 24 | 'B' 16 | 'L' 8 | 'K')) #endif /* get virtblk drive serial# */ int main(int ac, char ***av) { int fd, nb, i; unsigned char sn[30]; unsigned char *p; sn[0] = sizeof (sn); if ((fd = open(/dev/vda, O_RDONLY)) 0) perror(can't open device), exit(1); else if ((nb = ioctl(fd, VBLK_GET_SN, sn)) 0) perror(can't ioctl device), exit(1); printf(returned %d bytes:\n, nb); for (p = sn, i = nb; 0 = --i; ++p) printf(%02x%c, *p, i ? ' ' : '\t'); for (p = sn, i = nb; 0 = --i; ++p) printf(%c%s, iswhite(*p) ? '.' : *p, i ? : \n); return (0); }
[PATCH 1/2] Add serial number support for virtio_blk, V2
-- john.coo...@redhat.com diff --git a/hw/virtio-blk.c b/hw/virtio-blk.c index dad4ef0..90825a8 100644 --- a/hw/virtio-blk.c +++ b/hw/virtio-blk.c @@ -25,6 +25,7 @@ typedef struct VirtIOBlock BlockDriverState *bs; VirtQueue *vq; void *rq; +char serial_str[BLOCK_SERIAL_STRLEN + 1]; } VirtIOBlock; static VirtIOBlock *to_virtio_blk(VirtIODevice *vdev) @@ -285,6 +286,8 @@ static void virtio_blk_reset(VirtIODevice *vdev) qemu_aio_flush(); } +/* coalesce internal state, copy to pci i/o region 0 + */ static void virtio_blk_update_config(VirtIODevice *vdev, uint8_t *config) { VirtIOBlock *s = to_virtio_blk(vdev); @@ -299,11 +302,13 @@ static void virtio_blk_update_config(VirtIODevice *vdev, uint8_t *config) stw_raw(blkcfg.cylinders, cylinders); blkcfg.heads = heads; blkcfg.sectors = secs; +memcpy(blkcfg.serial, s-serial_str, sizeof (blkcfg.serial)); memcpy(config, blkcfg, sizeof(blkcfg)); } static uint32_t virtio_blk_get_features(VirtIODevice *vdev) { +VirtIOBlock *s = to_virtio_blk(vdev); uint32_t features = 0; features |= (1 VIRTIO_BLK_F_SEG_MAX); @@ -311,6 +316,8 @@ static uint32_t virtio_blk_get_features(VirtIODevice *vdev) #ifdef __linux__ features |= (1 VIRTIO_BLK_F_SCSI); #endif +if (strcmp(s-serial_str, 0)) +features |= 1 VIRTIO_BLK_F_SN; return features; } @@ -353,6 +360,7 @@ void *virtio_blk_init(PCIBus *bus, BlockDriverState *bs) VirtIOBlock *s; int cylinders, heads, secs; static int virtio_blk_id; +char *ps = drive_get_serial(bs); s = (VirtIOBlock *)virtio_init_pci(bus, virtio-blk, PCI_VENDOR_ID_REDHAT_QUMRANET, @@ -369,6 +377,10 @@ void *virtio_blk_init(PCIBus *bus, BlockDriverState *bs) s-vdev.reset = virtio_blk_reset; s-bs = bs; s-rq = NULL; +if (strlen(ps)) +strncpy(s-serial_str, ps, sizeof (s-serial_str)); +else +snprintf(s-serial_str, sizeof (s-serial_str), 0); bs-private = s-vdev.pci_dev; bdrv_guess_geometry(s-bs, cylinders, heads, secs); bdrv_set_geometry_hint(s-bs, cylinders, heads, secs); diff --git a/hw/virtio-blk.h b/hw/virtio-blk.h index 5ef6c36..3229394 100644 --- a/hw/virtio-blk.h +++ b/hw/virtio-blk.h @@ -31,6 +31,7 @@ #define VIRTIO_BLK_F_RO 5 /* Disk is read-only */ #define VIRTIO_BLK_F_BLK_SIZE 6 /* Block size of disk is available*/ #define VIRTIO_BLK_F_SCSI 7 /* Supports scsi command passthru */ +#define VIRTIO_BLK_F_SN 8 /* serial number supported */ struct virtio_blk_config { @@ -40,6 +41,8 @@ struct virtio_blk_config uint16_t cylinders; uint8_t heads; uint8_t sectors; +uint32_t _blk_size;/* structure pad, currently unused */ +uint8_t serial[BLOCK_SERIAL_STRLEN]; } __attribute__((packed)); /* These two define direction. */ diff --git a/sysemu.h b/sysemu.h index 1f45fd6..185b4e3 100644 --- a/sysemu.h +++ b/sysemu.h @@ -141,6 +141,8 @@ typedef enum { BLOCK_ERR_STOP_ANY } BlockInterfaceErrorAction; +#define BLOCK_SERIAL_STRLEN 20 + typedef struct DriveInfo { BlockDriverState *bdrv; BlockInterfaceType type; @@ -149,7 +151,7 @@ typedef struct DriveInfo { int used; int drive_opt_idx; BlockInterfaceErrorAction onerror; -char serial[21]; +char serial[BLOCK_SERIAL_STRLEN + 1]; } DriveInfo; #define MAX_IDE_DEVS 2
[PATCH 2/2] Add serial number support for virtio_blk, V2
-- john.coo...@redhat.com drivers/block/virtio_blk.c | 35 --- include/linux/virtio_blk.h | 10 ++ 2 files changed, 42 insertions(+), 3 deletions(-) = --- a/drivers/block/virtio_blk.c +++ b/drivers/block/virtio_blk.c @@ -146,12 +146,40 @@ static void do_virtblk_request(struct re vblk-vq-vq_ops-kick(vblk-vq); } +/* user passes the address of a char[] for serial# return, and has set char[0] + * to the array size. copy serial# to this char[] and return number of + * characters copied excluding any trailing '\0' pad chars in buffer. + */ +static int get_virtblk_sn(struct block_device *bdev, void *buf) +{ + struct virtio_blk *vblk = bdev-bd_disk-private_data; + unsigned char serial[BLOCK_SERIAL_STRLEN]; + unsigned char snlen; + int rv; + + if (copy_from_user(snlen, buf, sizeof (snlen))) + rv = -EFAULT; + else if ((rv = virtio_config_val(vblk-vdev, VIRTIO_BLK_F_SN, + offsetof(struct virtio_blk_config, serial), serial))) + ; + else if (copy_to_user(buf, serial, + snlen = min(snlen, (unsigned char)sizeof (serial + rv = -EFAULT; + else + for (rv = 0; rv snlen; ++rv) + if (!serial[rv]) +break; + return (rv); +} + static int virtblk_ioctl(struct block_device *bdev, fmode_t mode, unsigned cmd, unsigned long data) { - return scsi_cmd_ioctl(bdev-bd_disk-queue, - bdev-bd_disk, mode, cmd, - (void __user *)data); + if (cmd == VBLK_GET_SN) + return (get_virtblk_sn(bdev, (void __user *)data)); + else + return scsi_cmd_ioctl(bdev-bd_disk-queue, bdev-bd_disk, + mode, cmd, (void __user *)data); } /* We provide getgeo only to please some old bootloader/partitioning tools */ @@ -356,6 +384,7 @@ static struct virtio_device_id id_table[ static unsigned int features[] = { VIRTIO_BLK_F_BARRIER, VIRTIO_BLK_F_SEG_MAX, VIRTIO_BLK_F_SIZE_MAX, VIRTIO_BLK_F_GEOMETRY, VIRTIO_BLK_F_RO, VIRTIO_BLK_F_BLK_SIZE, + VIRTIO_BLK_F_SN }; static struct virtio_driver virtio_blk = { = --- a/include/linux/virtio_blk.h +++ b/include/linux/virtio_blk.h @@ -15,7 +15,16 @@ #define VIRTIO_BLK_F_GEOMETRY 4 /* Legacy geometry available */ #define VIRTIO_BLK_F_RO 5 /* Disk is read-only */ #define VIRTIO_BLK_F_BLK_SIZE 6 /* Block size of disk is available*/ +#define VIRTIO_BLK_F_SN 8 /* serial number supported */ +/* ioctl cmd to retrieve serial# +*/ +#define VBLK_GET_SN ((unsigned int)('V' 24 | 'B' 16 | 'L' 8 | 'K')) + +#define BLOCK_SERIAL_STRLEN 20 + +/* mapped into pci i/o region 0 + */ struct virtio_blk_config { /* The capacity (in 512-byte sectors). */ @@ -32,6 +41,7 @@ struct virtio_blk_config } geometry; /* block size of device (if VIRTIO_BLK_F_BLK_SIZE) */ __u32 blk_size; + __u8 serial[BLOCK_SERIAL_STRLEN]; } __attribute__((packed)); /* These two define direction. */
Re: changing guest CD
Stuart Jansen wrote: Does KVM support changing the CD in a running guest's disc drive? I've tried to do it using the qemu monitor, but so far haven't been able to. I've seen rumor and innuendo that KVM can't change the disc in a running system, but no official confirmation yet. If KVM doesn't support changing the disc in a running system, what would be required to support it? The following worked for me when doing a guest install from multiple CD iso images: to enter monitor from guest screen: cntlalt2 to exit back to guest screen: cntlalt1 changing a cd image: (qemu) info block list of block devices (qemu) eject ide1-cd0 (qemu) change ide1-cd0 target_file cntlalt1 -- john.coo...@third-harmonic.com -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/2] Add serial number support for virtio_blk
-- john.coo...@third-harmonic.com hw/virtio-blk.c | 15 ++- hw/virtio-blk.h |3 +++ sysemu.h|4 +++- 3 files changed, 20 insertions(+), 2 deletions(-) = --- a/qemu/hw/virtio-blk.h +++ b/qemu/hw/virtio-blk.h @@ -28,6 +28,7 @@ #define VIRTIO_BLK_F_SIZE_MAX 1 /* Indicates maximum segment size */ #define VIRTIO_BLK_F_SEG_MAX2 /* Indicates maximum # of segments */ #define VIRTIO_BLK_F_GEOMETRY 4 /* Indicates support of legacy geometry */ +#define VIRTIO_BLK_F_SN 7 /* serial number supported */ struct virtio_blk_config { @@ -37,6 +38,8 @@ struct virtio_blk_config uint16_t cylinders; uint8_t heads; uint8_t sectors; +uint32_t _blk_size;/* structure pad, currently unused */ +uint8_t serial[BLOCK_SERIAL_STRLEN]; } __attribute__((packed)); /* These two define direction. */ = --- a/qemu/hw/virtio-blk.c +++ b/qemu/hw/virtio-blk.c @@ -22,6 +22,7 @@ typedef struct VirtIOBlock BlockDriverState *bs; VirtQueue *vq; void *rq; +char serial_str[BLOCK_SERIAL_STRLEN + 1]; } VirtIOBlock; static VirtIOBlock *to_virtio_blk(VirtIODevice *vdev) @@ -238,6 +239,8 @@ static void virtio_blk_reset(VirtIODevic qemu_aio_flush(); } +/* coalesce internal state, copy to pci i/o region 0 + */ static void virtio_blk_update_config(VirtIODevice *vdev, uint8_t *config) { VirtIOBlock *s = to_virtio_blk(vdev); @@ -252,12 +255,17 @@ static void virtio_blk_update_config(Vir stw_raw(blkcfg.cylinders, cylinders); blkcfg.heads = heads; blkcfg.sectors = secs; +memcpy(blkcfg.serial, s-serial_str, sizeof (blkcfg.serial)); memcpy(config, blkcfg, sizeof(blkcfg)); } static uint32_t virtio_blk_get_features(VirtIODevice *vdev) { -return (1 VIRTIO_BLK_F_SEG_MAX | 1 VIRTIO_BLK_F_GEOMETRY); +VirtIOBlock *s = to_virtio_blk(vdev); +char ser_set = strcmp(s-serial_str, 0); + +return (1 VIRTIO_BLK_F_SEG_MAX | 1 VIRTIO_BLK_F_GEOMETRY | +(ser_set ? 1 VIRTIO_BLK_F_SN : 0)); } static void virtio_blk_save(QEMUFile *f, void *opaque) @@ -298,6 +306,7 @@ void *virtio_blk_init(PCIBus *bus, Block VirtIOBlock *s; int cylinders, heads, secs; static int virtio_blk_id; +char *ps = drive_get_serial(bs); s = (VirtIOBlock *)virtio_init_pci(bus, virtio-blk, PCI_VENDOR_ID_REDHAT_QUMRANET, @@ -314,6 +323,10 @@ void *virtio_blk_init(PCIBus *bus, Block s-vdev.reset = virtio_blk_reset; s-bs = bs; s-rq = NULL; +if (strlen(ps)) +strncpy(s-serial_str, ps, sizeof (s-serial_str)); +else +snprintf(s-serial_str, sizeof (s-serial_str), 0); bs-devfn = s-vdev.pci_dev.devfn; bdrv_guess_geometry(s-bs, cylinders, heads, secs); bdrv_set_geometry_hint(s-bs, cylinders, heads, secs); = --- a/qemu/sysemu.h +++ b/qemu/sysemu.h @@ -133,13 +133,15 @@ typedef enum { BLOCK_ERR_STOP_ANY } BlockInterfaceErrorAction; +#define BLOCK_SERIAL_STRLEN 20 + typedef struct DriveInfo { BlockDriverState *bdrv; BlockInterfaceType type; int bus; int unit; BlockInterfaceErrorAction onerror; -char serial[21]; +char serial[BLOCK_SERIAL_STRLEN + 1]; int used; int drive_opt_idx; } DriveInfo;
[PATCH 2/2] Add serial number support for virtio_blk
-- john.coo...@third-harmonic.com drivers/block/virtio_blk.c | 36 +--- include/linux/virtio_blk.h | 10 ++ 2 files changed, 43 insertions(+), 3 deletions(-) = --- a/include/linux/virtio_blk.h +++ b/include/linux/virtio_blk.h @@ -14,7 +14,16 @@ #define VIRTIO_BLK_F_GEOMETRY 4 /* Legacy geometry available */ #define VIRTIO_BLK_F_RO 5 /* Disk is read-only */ #define VIRTIO_BLK_F_BLK_SIZE 6 /* Block size of disk is available*/ +#define VIRTIO_BLK_F_SN 7 /* serial number supported */ +/* ioctl cmd to retrieve serial# +*/ +#define VBLK_GET_SN ((unsigned int)('V' 24 | 'B' 16 | 'L' 8 | 'K')) + +#define BLOCK_SERIAL_STRLEN 20 + +/* mapped into pci i/o region 0 + */ struct virtio_blk_config { /* The capacity (in 512-byte sectors). */ @@ -31,6 +40,7 @@ struct virtio_blk_config } geometry; /* block size of device (if VIRTIO_BLK_F_BLK_SIZE) */ __u32 blk_size; + __u8 serial[BLOCK_SERIAL_STRLEN]; } __attribute__((packed)); /* These two define direction. */ = --- a/drivers/block/virtio_blk.c +++ b/drivers/block/virtio_blk.c @@ -146,12 +146,41 @@ static void do_virtblk_request(struct re vblk-vq-vq_ops-kick(vblk-vq); } +/* user passes the address of a char[] for serial# return, and has set char[0] + * to the array size. copy serial# to this char[] and return number of + * characters copied excluding any trailing '\0' pad chars in buffer. + */ +#define _min(a,b) ((a) (b) ? (a) : (b)) +static int get_virtblk_sn(struct block_device *bdev, void *buf) +{ + struct virtio_blk *vblk = bdev-bd_disk-private_data; + unsigned char serial[BLOCK_SERIAL_STRLEN]; + unsigned char snlen; + int rv; + + if (copy_from_user(snlen, buf, sizeof (snlen))) + rv = -EFAULT; + else if ((rv = virtio_config_val(vblk-vdev, VIRTIO_BLK_F_SN, + offsetof(struct virtio_blk_config, serial), serial))) + ; + else if (copy_to_user(buf, serial, + snlen = min(snlen, (unsigned char)sizeof (serial + rv = -EFAULT; + else + for (rv = 0; rv snlen; ++rv) + if (!serial[rv]) +break; + return (rv); +} + static int virtblk_ioctl(struct block_device *bdev, fmode_t mode, unsigned cmd, unsigned long data) { - return scsi_cmd_ioctl(bdev-bd_disk-queue, - bdev-bd_disk, mode, cmd, - (void __user *)data); + if (cmd == VBLK_GET_SN) + return (get_virtblk_sn(bdev, (void __user *)data)); + else + return scsi_cmd_ioctl(bdev-bd_disk-queue, bdev-bd_disk, + mode, cmd, (void __user *)data); } /* We provide getgeo only to please some old bootloader/partitioning tools */ @@ -339,6 +368,7 @@ static struct virtio_device_id id_table[ static unsigned int features[] = { VIRTIO_BLK_F_BARRIER, VIRTIO_BLK_F_SEG_MAX, VIRTIO_BLK_F_SIZE_MAX, VIRTIO_BLK_F_GEOMETRY, VIRTIO_BLK_F_RO, VIRTIO_BLK_F_BLK_SIZE, + VIRTIO_BLK_F_SN }; static struct virtio_driver virtio_blk = {
[PATCH 0/2] Add serial number support for virtio_blk
This patch allows passing of a virtio_blk drive serial number from qemu into a guest's virtio_blk driver, and provides a means to access the serial number from a guest's userspace. Equivalent functionality currently exists for IDE and SCSI, however it is not yet implemented for virtio. Scenarios exist where guest code relies on a unique drive serial number to correctly identify the machine environment in which it exists. The following two patches implement the above qemu-vblk-serial.patch which provides the qemu missing bits to interpret a '-drive .. serial=XYZ ..' flag, and virtio_blk-serial.patch which extracts this information and make it available to guest userspace via ioctl. Attached to this patch header is a trivial example program which retrieves the serial number from guest userspace. The above patches are relative to kvm-84 and 2.6.28 respectively. -john -- john.coo...@third-harmonic.com /* example: retrieve serial number from virtio block device */ #include stdio.h #include fcntl.h #include stdlib.h #include linux/virtio_blk.h #define iswhite(c) (!('!' = (c) (c) = '~')) #ifndef VBLK_GET_SN #define VBLK_GET_SN ((unsigned int)('V' 24 | 'B' 16 | 'L' 8 | 'K')) #endif /* get virtblk drive serial# */ int main(int ac, char ***av) { int fd, nb, i; unsigned char sn[30]; unsigned char *p; sn[0] = sizeof (sn); if ((fd = open(/dev/vda, O_RDONLY)) 0) perror(can't open device), exit(1); else if ((nb = ioctl(fd, VBLK_GET_SN, sn)) 0) perror(can't ioctl device), exit(1); printf(returned %d bytes:\n, nb); for (p = sn, i = nb; 0 = --i; ++p) printf(%02x%c, *p, i ? ' ' : '\t'); for (p = sn, i = nb; 0 = --i; ++p) printf(%c%s, iswhite(*p) ? '.' : *p, i ? : \n); return (0); }
[PATCH 1/2] Add serial number support for virtio_blk
-- john.coo...@third-harmonic.com hw/virtio-blk.c | 15 ++- hw/virtio-blk.h |3 +++ sysemu.h|4 +++- 3 files changed, 20 insertions(+), 2 deletions(-) = --- a/qemu/hw/virtio-blk.h +++ b/qemu/hw/virtio-blk.h @@ -28,6 +28,7 @@ #define VIRTIO_BLK_F_SIZE_MAX 1 /* Indicates maximum segment size */ #define VIRTIO_BLK_F_SEG_MAX2 /* Indicates maximum # of segments */ #define VIRTIO_BLK_F_GEOMETRY 4 /* Indicates support of legacy geometry */ +#define VIRTIO_BLK_F_SN 7 /* serial number supported */ struct virtio_blk_config { @@ -37,6 +38,8 @@ struct virtio_blk_config uint16_t cylinders; uint8_t heads; uint8_t sectors; +uint32_t _blk_size;/* structure pad, currently unused */ +uint8_t serial[BLOCK_SERIAL_STRLEN]; } __attribute__((packed)); /* These two define direction. */ = --- a/qemu/hw/virtio-blk.c +++ b/qemu/hw/virtio-blk.c @@ -22,6 +22,7 @@ typedef struct VirtIOBlock BlockDriverState *bs; VirtQueue *vq; void *rq; +char serial_str[BLOCK_SERIAL_STRLEN + 1]; } VirtIOBlock; static VirtIOBlock *to_virtio_blk(VirtIODevice *vdev) @@ -238,6 +239,8 @@ static void virtio_blk_reset(VirtIODevic qemu_aio_flush(); } +/* coalesce internal state, copy to pci i/o region 0 + */ static void virtio_blk_update_config(VirtIODevice *vdev, uint8_t *config) { VirtIOBlock *s = to_virtio_blk(vdev); @@ -252,12 +255,17 @@ static void virtio_blk_update_config(Vir stw_raw(blkcfg.cylinders, cylinders); blkcfg.heads = heads; blkcfg.sectors = secs; +memcpy(blkcfg.serial, s-serial_str, sizeof (blkcfg.serial)); memcpy(config, blkcfg, sizeof(blkcfg)); } static uint32_t virtio_blk_get_features(VirtIODevice *vdev) { -return (1 VIRTIO_BLK_F_SEG_MAX | 1 VIRTIO_BLK_F_GEOMETRY); +VirtIOBlock *s = to_virtio_blk(vdev); +char ser_set = strcmp(s-serial_str, 0); + +return (1 VIRTIO_BLK_F_SEG_MAX | 1 VIRTIO_BLK_F_GEOMETRY | +(ser_set ? 1 VIRTIO_BLK_F_SN : 0)); } static void virtio_blk_save(QEMUFile *f, void *opaque) @@ -298,6 +306,7 @@ void *virtio_blk_init(PCIBus *bus, Block VirtIOBlock *s; int cylinders, heads, secs; static int virtio_blk_id; +char *ps = drive_get_serial(bs); s = (VirtIOBlock *)virtio_init_pci(bus, virtio-blk, PCI_VENDOR_ID_REDHAT_QUMRANET, @@ -314,6 +323,10 @@ void *virtio_blk_init(PCIBus *bus, Block s-vdev.reset = virtio_blk_reset; s-bs = bs; s-rq = NULL; +if (strlen(ps)) +strncpy(s-serial_str, ps, sizeof (s-serial_str)); +else +snprintf(s-serial_str, sizeof (s-serial_str), 0); bs-devfn = s-vdev.pci_dev.devfn; bdrv_guess_geometry(s-bs, cylinders, heads, secs); bdrv_set_geometry_hint(s-bs, cylinders, heads, secs); = --- a/qemu/sysemu.h +++ b/qemu/sysemu.h @@ -133,13 +133,15 @@ typedef enum { BLOCK_ERR_STOP_ANY } BlockInterfaceErrorAction; +#define BLOCK_SERIAL_STRLEN 20 + typedef struct DriveInfo { BlockDriverState *bdrv; BlockInterfaceType type; int bus; int unit; BlockInterfaceErrorAction onerror; -char serial[21]; +char serial[BLOCK_SERIAL_STRLEN + 1]; int used; int drive_opt_idx; } DriveInfo;
[PATCH 2/2] Add serial number support for virtio_blk
-- john.coo...@third-harmonic.com drivers/block/virtio_blk.c | 36 +--- include/linux/virtio_blk.h | 10 ++ 2 files changed, 43 insertions(+), 3 deletions(-) = --- a/include/linux/virtio_blk.h +++ b/include/linux/virtio_blk.h @@ -14,7 +14,16 @@ #define VIRTIO_BLK_F_GEOMETRY 4 /* Legacy geometry available */ #define VIRTIO_BLK_F_RO 5 /* Disk is read-only */ #define VIRTIO_BLK_F_BLK_SIZE 6 /* Block size of disk is available*/ +#define VIRTIO_BLK_F_SN 7 /* serial number supported */ +/* ioctl cmd to retrieve serial# +*/ +#define VBLK_GET_SN ((unsigned int)('V' 24 | 'B' 16 | 'L' 8 | 'K')) + +#define BLOCK_SERIAL_STRLEN 20 + +/* mapped into pci i/o region 0 + */ struct virtio_blk_config { /* The capacity (in 512-byte sectors). */ @@ -31,6 +40,7 @@ struct virtio_blk_config } geometry; /* block size of device (if VIRTIO_BLK_F_BLK_SIZE) */ __u32 blk_size; + __u8 serial[BLOCK_SERIAL_STRLEN]; } __attribute__((packed)); /* These two define direction. */ = --- a/drivers/block/virtio_blk.c +++ b/drivers/block/virtio_blk.c @@ -146,12 +146,41 @@ static void do_virtblk_request(struct re vblk-vq-vq_ops-kick(vblk-vq); } +/* user passes the address of a char[] for serial# return, and has set char[0] + * to the array size. copy serial# to this char[] and return number of + * characters copied excluding any trailing '\0' pad chars in buffer. + */ +#define _min(a,b) ((a) (b) ? (a) : (b)) +static int get_virtblk_sn(struct block_device *bdev, void *buf) +{ + struct virtio_blk *vblk = bdev-bd_disk-private_data; + unsigned char serial[BLOCK_SERIAL_STRLEN]; + unsigned char snlen; + int rv; + + if (copy_from_user(snlen, buf, sizeof (snlen))) + rv = -EFAULT; + else if ((rv = virtio_config_val(vblk-vdev, VIRTIO_BLK_F_SN, + offsetof(struct virtio_blk_config, serial), serial))) + ; + else if (copy_to_user(buf, serial, + snlen = min(snlen, (unsigned char)sizeof (serial + rv = -EFAULT; + else + for (rv = 0; rv snlen; ++rv) + if (!serial[rv]) +break; + return (rv); +} + static int virtblk_ioctl(struct block_device *bdev, fmode_t mode, unsigned cmd, unsigned long data) { - return scsi_cmd_ioctl(bdev-bd_disk-queue, - bdev-bd_disk, mode, cmd, - (void __user *)data); + if (cmd == VBLK_GET_SN) + return (get_virtblk_sn(bdev, (void __user *)data)); + else + return scsi_cmd_ioctl(bdev-bd_disk-queue, bdev-bd_disk, + mode, cmd, (void __user *)data); } /* We provide getgeo only to please some old bootloader/partitioning tools */ @@ -339,6 +368,7 @@ static struct virtio_device_id id_table[ static unsigned int features[] = { VIRTIO_BLK_F_BARRIER, VIRTIO_BLK_F_SEG_MAX, VIRTIO_BLK_F_SIZE_MAX, VIRTIO_BLK_F_GEOMETRY, VIRTIO_BLK_F_RO, VIRTIO_BLK_F_BLK_SIZE, + VIRTIO_BLK_F_SN }; static struct virtio_driver virtio_blk = {
Re: bad virtio disk performance
Lucas Nussbaum wrote: Hi, I'm experiencing bad disk I/O performance using virtio disks. I'm using Linux 2.6.29 (host guest), kvm 84 userspace. On the host, and in a non-virtio guest, I get ~120 MB/s when writing with dd (the disks are fast RAID0 SAS disks). Could you provide detail of the exact type and size of i/o load you were creating with dd? Also the full qemu cmd line invocation in both cases would be useful. In a guest with a virtio disk, I get at most ~32 MB/s. Which non-virtio interface was used for the comparison? The rest of the setup is the same. For reference, I'm running kvm -drive file=/tmp/debian-amd64.img,if=virtio. Is such performance expected? What should I check? Not expected, something is awry. blktrace(8) run on the host will shed some light on the type of i/o requests issued by qemu in both cases. -john -- john.coo...@third-harmonic.com -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: bad virtio disk performance
Lucas Nussbaum wrote: On 27/04/09 at 13:36 -0400, john cooper wrote: Lucas Nussbaum wrote: non-virtio: kvm -drive file=/tmp/debian-amd64.img,if=scsi,cache=writethrough -net nic,macaddr=00:16:3e:5a:28:1,model=e1000 -net tap -nographic -kernel /boot/vmlinuz-2.6.29 -initrd /boot/initrd.img-2.6.29 -append root=/dev/sda1 ro console=tty0 console=ttyS0,9600,8n1 virtio: kvm -drive file=/tmp/debian-amd64.img,if=virtio,cache=writethrough -net nic,macaddr=00:16:3e:5a:28:1,model=e1000 -net tap -nographic -kernel /boot/vmlinuz-2.6.29 -initrd /boot/initrd.img-2.6.29 -append root=/dev/vda1 ro console=tty0 console=ttyS0,9600,8n1 One suggestion would be to use a separate drive for the virtio vs. non-virtio comparison to avoid a Heisenberg effect. So, apparently, with virtio, there's a lot more data being written to disk. The underlying filesystem is ext3, and is monted as /tmp. It only contains the VM image file. Another difference is that, with virtio, the data was shared equally over all 4 CPUs, with without virt-io, CPU0 and CPU1 did all the work. In the virtio log, I also see a (null) process doing a lot of writes. Can't say what is causing that -- only took a look at the short logs. However the isolation suggested above may help factor that out if you need to pursue this path. I uploaded the logs to http://blop.info/bazaar/virtio/, if you want to take a look. In the virtio case i/o is being issued from multiple threads. You could be hitting the cfq close-cooperator bug we saw as late as 2.6.28. A quick test to rule this in/out would be to change the block scheduler to other than cfq for the host device where the backing image resides -- in your case the host device containing /tmp/debian-amd64.img. Eg for /dev/sda1: # cat /sys/block/sda/queue/scheduler noop anticipatory deadline [cfq] # echo deadline /sys/block/sda/queue/scheduler # cat /sys/block/sda/queue/scheduler noop anticipatory [deadline] cfq And re-run your test to see if this brings virtio vs. non-virtio closer to the expected performance. -john -- john.coo...@redhat.com -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] mm/memory.c:unmap_vmas(): fix NULL * deref
This cropped up in stress testing of a backport of the mmu notifier mechanism, however it still exists in 2.6.28.8 as well. Patch attached. Signed-off-by: john.coo...@redhat.com -- john.coo...@third-harmonic.com mm/memory.c |8 +--- 1 file changed, 5 insertions(+), 3 deletions(-) = --- a/mm/memory.c +++ b/mm/memory.c @@ -899,9 +899,10 @@ unsigned long unmap_vmas(struct mmu_gath unsigned long start = start_addr; spinlock_t *i_mmap_lock = details? details-i_mmap_lock: NULL; int fullmm = (*tlbp)-fullmm; - struct mm_struct *mm = vma-vm_mm; + struct mm_struct *mm = vma ? vma-vm_mm : NULL; - mmu_notifier_invalidate_range_start(mm, start_addr, end_addr); + if (mm) + mmu_notifier_invalidate_range_start(mm, start_addr, end_addr); for ( ; vma vma-vm_start end_addr; vma = vma-vm_next) { unsigned long end; @@ -966,7 +967,8 @@ unsigned long unmap_vmas(struct mmu_gath } } out: - mmu_notifier_invalidate_range_end(mm, start_addr, end_addr); + if (mm) + mmu_notifier_invalidate_range_end(mm, start_addr, end_addr); return start; /* which is now the end (or restart) address */ }
Re: Resend: patch: qemu + hugetlbfs..
Avi Kivity wrote: john cooper wrote: This trivial patch never did manage to find its way in. Marcelo called it to my attention earlier in the week. I've tweaked it to apply to kvm-83 and the resulting patch is attached. I've left the prior e-mail discussion below for reference. Sorry for missing this (copying me helps). Please resubmit with a signoff.Also, please protect with #ifdef MAP_POPULATE to ease merging into upstream eventually. Updated patch attached. -john Signed-off-by: john cooper john.coo...@redhat.com -- john.coo...@redhat.com kernel/x86/Kbuild |4 ++-- qemu/vl.c | 38 ++ 2 files changed, 36 insertions(+), 6 deletions(-) = --- a/qemu/vl.c +++ b/qemu/vl.c @@ -237,6 +237,9 @@ int semihosting_enabled = 0; int time_drift_fix = 0; unsigned int kvm_shadow_memory = 0; const char *mem_path = NULL; +#ifdef MAP_POPULATE +int mem_prealloc = 1; /* force preallocation of physical target memory */ +#endif int hpagesize = 0; const char *cpu_vendor_string; #ifdef TARGET_ARM @@ -4116,7 +4119,12 @@ static void help(int exitcode) #endif -tdfinject timer interrupts that got lost\n -kvm-shadow-memory megs set the amount of shadow pages to be allocated\n - -mem-path set the path to hugetlbfs/tmpfs mounted directory, also enables allocation of guest memory with huge pages\n + -mem-path set the path to hugetlbfs/tmpfs mounted directory, also\n + enables allocation of guest memory with huge pages\n +#ifdef MAP_POPULATE + -mem-prealloc toggles preallocation of -mem-path backed physical memory\n + at startup. Default is enabled.\n +#endif -option-rom rom load a file, rom, into the option ROM space\n #ifdef TARGET_SPARC -prom-env variable=value set OpenBIOS nvram variables\n @@ -4246,6 +4254,9 @@ enum { QEMU_OPTION_tdf, QEMU_OPTION_kvm_shadow_memory, QEMU_OPTION_mempath, +#ifdef MAP_POPULATE +QEMU_OPTION_mem_prealloc, +#endif }; typedef struct QEMUOption { @@ -4381,6 +4392,9 @@ static const QEMUOption qemu_options[] = { icount, HAS_ARG, QEMU_OPTION_icount }, { incoming, HAS_ARG, QEMU_OPTION_incoming }, { mem-path, HAS_ARG, QEMU_OPTION_mempath }, +#ifdef MAP_POPULATE +{ mem-prealloc, 0, QEMU_OPTION_mem_prealloc }, +#endif { NULL }, }; @@ -4663,6 +4677,9 @@ void *alloc_mem_area(size_t memory, unsi char *filename; void *area; int fd; +#ifdef MAP_POPULATE +int flags; +#endif if (asprintf(filename, %s/kvm.XX, path) == -1) return NULL; @@ -4690,13 +4707,21 @@ void *alloc_mem_area(size_t memory, unsi */ ftruncate(fd, memory); +#ifdef MAP_POPULATE +/* NB: MAP_POPULATE won't exhaustively alloc all phys pages in the case + * MAP_PRIVATE is requested. For mem_prealloc we mmap as MAP_SHARED + * to sidestep this quirk. + */ +flags = mem_prealloc ? MAP_POPULATE|MAP_SHARED : MAP_PRIVATE; +area = mmap(0, memory, PROT_READ|PROT_WRITE, flags, fd, 0); +#else area = mmap(0, memory, PROT_READ|PROT_WRITE, MAP_PRIVATE, fd, 0); +#endif if (area == MAP_FAILED) { - perror(mmap); + perror(alloc_mem_area: can't mmap hugetlbfs pages); close(fd); - return NULL; + return (NULL); } - *len = memory; return area; } @@ -5377,6 +5402,11 @@ int main(int argc, char **argv, char **e case QEMU_OPTION_mempath: mem_path = optarg; break; +#ifdef MAP_POPULATE +case QEMU_OPTION_mem_prealloc: + mem_prealloc = !mem_prealloc; + break; +#endif case QEMU_OPTION_name: qemu_name = optarg; break; = --- a/kernel/x86/Kbuild +++ b/kernel/x86/Kbuild @@ -9,8 +9,8 @@ kvm-objs := kvm_main.o x86.o mmu.o x86_e ifeq ($(EXT_CONFIG_KVM_TRACE),y) kvm-objs += kvm_trace.o endif -ifeq ($(CONFIG_DMAR),y) -kvm-objs += vtd.o +ifeq ($(CONFIG_IOMMU_API),y) +kvm-objs += iommu.o endif kvm-intel-objs := vmx.o vmx-debug.o ../external-module-compat.o kvm-amd-objs := svm.o ../external-module-compat.o
Re: Resend: patch: qemu + hugetlbfs..
Avi Kivity wrote: john cooper wrote: This patch from over a month ago doesn't seem to have made it into kvm-73 and may have been lost in the shuffle. Attached is essentially the same patch but as applied to kvm-73, and validated relative to that version. What is the motivation for providing an option to disable this? If we can detect mem-path is backed by huge pages somehow, I think we can prefault the memory unconditionally. Pre-allocation of the entire huge page backed guest memory avoids the nondeterministic termination but admittedly is overly pessimistic. As this patch does so by default when -mem-path is specified, allowing for disable of pre-allocation simply reverts this change to prior behavior for use cases more tolerant to it as well as for debug purposes. The real fix arguably hinges on huge pages having more general virtual memory behavior. But that appears to be a much longer term prospect. -john -- [EMAIL PROTECTED] -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Resend: patch: qemu + hugetlbfs..
This patch from over a month ago doesn't seem to have made it into kvm-73 and may have been lost in the shuffle. Attached is essentially the same patch but as applied to kvm-73, and validated relative to that version. In a nutshell the intention here is to allow preallocation of guest huge page backed memory at qemu initialization time to avoid a quirk in the kernel's huge page accounting allowing overcommit of huge pages. Failure of the kernel to resolve a guest fault to overcommitted huge page memory during runtime results in sigkill termination of the guest. This patch provides the option of avoiding such behavior at the cost of up-front preallocation of physical huge pages backing the guest. -john Anthony Liguori wrote: john cooper wrote: Anthony Liguori wrote: john cooper wrote: As it currently exists alloc_hpage_mem() is tied to the notion of huge page allocation as it will reference gethugepagesize() irrespective of *mem_path. So even in the case of tmpfs backed files, if the host kernel has been configured with CONFIG_HUGETLBFS we will wind up doing allocations of /dev/shm mapped files at /proc/meminfo:Hugepagesize granularity. Which is fine. It just means we round -m values up to even numbers. Well, yes it will round the allocation. But from a minimally sufficient 4KB boundary to that of 4MB/2MB relative to a 32/64 bit x86 host which is excessive. Probably not what was intended but probably not too much of a concern as -mem-path /dev/shm is likely only used in debug of this flag and associated logic. I don't see it currently being worth the trouble to correct from a squeaky clean POV, and doing so may drag in far more than the header file we've just booted above to deal with this architecture/config dependency. Renaming a function to a name that's less accurate seems bad to me. I don't mean to be pedantic, but it seems like a strange thing to do. I prefer it the way it was before. I don't see any harm reverting the name. But I do believe it is largely cosmetic as given the above, the current code does require some work to make it independent of huge page assumptions. Update attached. -john Looks good to me. Acked-by: Anthony Liguori [EMAIL PROTECTED] -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html -- [EMAIL PROTECTED] vl.c | 48 ++-- 1 file changed, 30 insertions(+), 18 deletions(-) = --- ./qemu/vl.c.PAORG +++ ./qemu/vl.c @@ -239,6 +239,7 @@ int autostart = 1; int time_drift_fix = 0; unsigned int kvm_shadow_memory = 0; const char *mem_path = NULL; +int mem_prealloc = 1; /* force preallocation of physical target memory */ int hpagesize = 0; const char *cpu_vendor_string; #ifdef TARGET_ARM @@ -8423,7 +8424,10 @@ static void help(int exitcode) #endif -tdfinject timer interrupts that got lost\n -kvm-shadow-memory megs set the amount of shadow pages to be allocated\n - -mem-path set the path to hugetlbfs/tmpfs mounted directory, also enables allocation of guest memory with huge pages\n + -mem-path set the path to hugetlbfs/tmpfs mounted directory, also\n + enables allocation of guest memory with huge pages\n + -mem-prealloc toggles preallocation of -mem-path backed physical memory\n + at startup. Default is enabled.\n -option-rom rom load a file, rom, into the option ROM space\n #ifdef TARGET_SPARC -prom-env variable=value set OpenBIOS nvram variables\n @@ -8546,6 +8550,7 @@ enum { QEMU_OPTION_tdf, QEMU_OPTION_kvm_shadow_memory, QEMU_OPTION_mempath, +QEMU_OPTION_mem_prealloc }; typedef struct QEMUOption { @@ -8671,6 +8676,7 @@ const QEMUOption qemu_options[] = { { tb-size, HAS_ARG, QEMU_OPTION_tb_size }, { icount, HAS_ARG, QEMU_OPTION_icount }, { mem-path, HAS_ARG, QEMU_OPTION_mempath }, +{ mem-prealloc, 0, QEMU_OPTION_mem_prealloc }, { NULL }, }; @@ -8890,11 +8896,13 @@ static int gethugepagesize(void) return hugepagesize; } +/* attempt to allocate memory mmap'ed to mem_path + */ void *alloc_mem_area(unsigned long memory, const char *path) { char *filename; void *area; -int fd; +int fd, flags; if (asprintf(filename, %s/kvm.XX, path) == -1) return NULL; @@ -8922,26 +8930,27 @@ void *alloc_mem_area(unsigned long memor */ ftruncate(fd, memory); -area = mmap(0, memory, PROT_READ|PROT_WRITE, MAP_PRIVATE, fd, 0); -if (area == MAP_FAILED) { - perror(mmap); - close(fd); - return NULL; -} - -return area; +/* NB: MAP_POPULATE won't exhaustively alloc all phys pages in the case + * MAP_PRIVATE is requested
Re: Un-googlable
David Abrahams wrote: on Sat Aug 16 2008, Glauber Costa glommer-AT-gmail.com wrote: Furthermore, we're not the first word with (at least) double meaning in the world. No, but as I said, the target audience (and application area) overlaps so heavily that even adding other relevant search terms usually doesn't eliminate the chaff. I seem to recall from back when some half-baked software known as windows which was introduced by that name despite existing software at the time using the same moniker. Doesn't appear to have impacted its viral propagation in the world too much. Give it some time. I think it is a safe bet for the hypervisor by that name to be more ubiquitous than the like named console multiplexer box. -john -- [EMAIL PROTECTED] -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: patch: qemu + hugetlbfs..
Anthony Liguori wrote: +#include asm/param.h I don't think this is necessary anymore. Depending on a Linux headers breaks the QEMU build on other unices so it's a bad thing. It is no longer required, but see below. hpage is a misnomer too as we aren't actually dependent on huge pages (this code should work equally well for tmpfs). As it currently exists alloc_hpage_mem() is tied to the notion of huge page allocation as it will reference gethugepagesize() irrespective of *mem_path. So even in the case of tmpfs backed files, if the host kernel has been configured with CONFIG_HUGETLBFS we will wind up doing allocations of /dev/shm mapped files at /proc/meminfo:Hugepagesize granularity. Otherwise if HUGETLBFS is not configured gethugepagesize() returns zero and alloc_hpage_mem() itself will not perform the allocation. Probably not what was intended but probably not too much of a concern as -mem-path /dev/shm is likely only used in debug of this flag and associated logic. I don't see it currently being worth the trouble to correct from a squeaky clean POV, and doing so may drag in far more than the header file we've just booted above to deal with this architecture/config dependency. An updated patch is attached. -john -- [EMAIL PROTECTED] --- a/qemu/vl.c +++ b/qemu/vl.c @@ -234,6 +234,7 @@ int autostart = 1; int time_drift_fix = 0; unsigned int kvm_shadow_memory = 0; const char *mem_path = NULL; +int mem_prealloc = 1; /* force preallocation of physical target memory */ int hpagesize = 0; const char *cpu_vendor_string; #ifdef TARGET_ARM @@ -7809,7 +7810,10 @@ static void help(int exitcode) #endif -tdfinject timer interrupts that got lost\n -kvm-shadow-memory megs set the amount of shadow pages to be allocated\n - -mem-path set the path to hugetlbfs/tmpfs mounted directory, also enables allocation of guest memory with huge pages\n + -mem-path set the path to hugetlbfs/tmpfs mounted directory, also\n + enables allocation of guest memory with huge pages\n + -mem-prealloc toggles preallocation of -mem-path backed physical memory\n + at startup. Default is enabled.\n -option-rom rom load a file, rom, into the option ROM space\n #ifdef TARGET_SPARC -prom-env variable=value set OpenBIOS nvram variables\n @@ -7932,6 +7936,7 @@ enum { QEMU_OPTION_tdf, QEMU_OPTION_kvm_shadow_memory, QEMU_OPTION_mempath, +QEMU_OPTION_mem_prealloc }; typedef struct QEMUOption { @@ -8059,6 +8064,7 @@ const QEMUOption qemu_options[] = { { startdate, HAS_ARG, QEMU_OPTION_startdate }, { tb-size, HAS_ARG, QEMU_OPTION_tb_size }, { mem-path, HAS_ARG, QEMU_OPTION_mempath }, +{ mem-prealloc, 0, QEMU_OPTION_mem_prealloc }, { NULL }, }; @@ -8276,11 +8282,13 @@ static int gethugepagesize(void) return hugepagesize; } -void *alloc_mem_area(unsigned long memory, const char *path) +/* attempt to allocate memory mmap'ed to mem_path + */ +void *alloc_hpage_mem(unsigned long memory, const char *path) { char *filename; void *area; -int fd; +int fd, flags; if (asprintf(filename, %s/kvm.XX, path) == -1) return NULL; @@ -8308,26 +8316,27 @@ void *alloc_mem_area(unsigned long memor */ ftruncate(fd, memory); -area = mmap(0, memory, PROT_READ|PROT_WRITE, MAP_PRIVATE, fd, 0); -if (area == MAP_FAILED) { - perror(mmap); - close(fd); - return NULL; -} - -return area; +/* NB: MAP_POPULATE won't exhaustively alloc all phys pages in the case + * MAP_PRIVATE is requested. For mem_prealloc we mmap as MAP_SHARED + * to sidestep this quirk. + */ +flags = mem_prealloc ? MAP_POPULATE|MAP_SHARED : MAP_PRIVATE; +area = mmap(0, memory, PROT_READ|PROT_WRITE, flags, fd, 0); +if (area != MAP_FAILED) + return (area); +perror(alloc_hpage_mem: can't mmap hugetlbfs pages); +close(fd); +return (NULL); } -void *qemu_alloc_physram(unsigned long memory) +/* allocate guest memory as requested + */ +void *qemu_alloc_physram(unsigned long size) { -void *area = NULL; - if (mem_path) - area = alloc_mem_area(memory, mem_path); -if (!area) - area = qemu_vmalloc(memory); - -return area; + return (alloc_hpage_mem(size, mem_path)); +else + return (qemu_vmalloc(size)); } int main(int argc, char **argv) @@ -8962,6 +8971,9 @@ int main(int argc, char **argv) case QEMU_OPTION_mempath: mem_path = optarg; break; +case QEMU_OPTION_mem_prealloc: + mem_prealloc = !mem_prealloc; + break; case QEMU_OPTION_name: qemu_name = optarg; break;
Re: patch: qemu + hugetlbfs..
Anthony Liguori wrote: +/* assertion checking + */ +#define ASSERT(c) if (!(c)) _assert(#c, __FILE__, __LINE__) + +static inline void _assert(char *text, char *file, int line) { +fprintf(stderr, ASSERTION FAILED: [%s] %s:%d\n, text, file, line); +kill(getpid(), 12); /* dump core */ +} + Is it really necessary to add this as part of this patch? No, certainly not strictly necessary. +int mem_fallback = {0};/* allow alloc from alternate page size */ This is a bizarre way to initialize an integer. I had no idea you could even do this for a scalar. Just initialize to 1 and 0. That's historically been valid syntax since pre-KR. But certainly can be adapted to the context style if it raises concern. -void *alloc_mem_area(unsigned long memory, const char *path) +/* fault in associated page, fail syscall when free page is unavailable + */ +static inline int fault_in(void *a) +{ +return (gettimeofday(a, NULL) 0 ? errno != EFAULT : 1); +} I don't like this very much. Why not just MAP_POPULATE? I like it even less. MAP_POPULATE does not fault in physical hpages to the map. Again this was a qemu-side interim bandaid. +/* we failed to fault in hpage *a, fall back to conventional page mapping + */ +int remap_hpage(void *a, int sz) +{ +ASSERT(!(sz (EXEC_PAGESIZE - 1))); +if (munmap(a, sz) 0) +perror(remap_hpage: munmap); +else if (mmap(a, sz, PROT_READ|PROT_WRITE, +MAP_PRIVATE|MAP_ANONYMOUS|MAP_FIXED, -1, 0) == MAP_FAILED) +perror(remap_hpage: mmap); +else +return (1); +return (0); +} I think this would be simplified with MAP_POPULATE since you can fail in large chunks of memory instead of potentially having a highly fragmented set of VMAs. Here for 4K pages we only need to setup the map. If we later fault on a physically absent 4K page we'll wait if a page isn't immediately available. Rather in the case of a hpage being unavailable, we'll terminate. Note at this point we've effectively locked onto whatever hpages we've been able to map as they can't be reclaimed from us until we exit. Also it appears tab formatting in this patch may need to be scrubbed some as the mailers seem to be jostling the whitespace. +int bool_arg(const char *optarg, const char *flag_name) +{ +if (!optarg) +; +else if (altmatch(y|yes|1, optarg)) +return (1); +else if (altmatch(n|no|0, optarg)) +return (0); +fprintf(stderr, usage: %s [0|1]\n, flag_name); +exit(1); } + This is introducing too much infrastructure. Just follow the conventions in the rest of the code. No need to make the options take a boolean argument. The options themselves are boolean arguments. That's how it originally existed. However with the defaults different for the two flags it seemed a bit clunky to have one recall what the initial disposition of the option was and to toggle its behavior if that wasn't the intention. But admittedly I don't have a strong affinity either way. I was only concerned with usability and clarity of the flags. -john -- [EMAIL PROTECTED] -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html