Re: [PATCHv3 4/4] acpi: automatically generated ssdt proc
On Tue, Oct 04, 2011 at 10:52:33PM -0400, Kevin O'Connor wrote: On Tue, Oct 04, 2011 at 03:26:19PM +0200, Michael S. Tsirkin wrote: Get rid of manually cut and pasted ssdt_proc, use ssdt compiled by iasl and offsets extracted by acpi_extract instead. Thanks - I like the idea of auto-generating the offsets. [...] +#define AmlCode static ssdp_proc_aml +#include ssdt-proc.hex +#undef AmlCode Side note - since you're post-processing the acpi data, it would be nice to update the name in the hex file too. That would mean we will output hex as well, ignore the hex produced by iasl. Sure, I can do that. Another benefit would be that we can skip the ssdt preamble generated by iasl in the hex we output. +/* 0x5B 0x83 ProcessorOp PkgLength NameString ProcID */ +#define SD_OFFSET_CPUHEX (*ssdt_proc_name - *ssdt_proc_start + 2) +#define SD_OFFSET_CPUID1 (*ssdt_proc_name - *ssdt_proc_start + 4) +#define SD_OFFSET_CPUID2 (*ssdt_proc_id - *ssdt_proc_start) +#define SD_SIZEOF (*ssdt_proc_end - *ssdt_proc_start) +#define SD_PROC (ssdp_proc_aml + *ssdt_proc_start) [...] DefinitionBlock (ssdt-proc.aml, SSDT, 0x01, BXPC, BXSSDT, 0x1) -/* v-- DO NOT EDIT --v */ { +ACPI_EXTRACT_PROCESSOR_START ssdt_proc_start +ACPI_EXTRACT_PROCESSOR_END ssdt_proc_end +ACPI_EXTRACT_PROCESSOR_STRING ssdt_proc_name Processor (CPAA, 0xAA, 0xb010, 0x06) { Since the acpi.c code needs to know the processor object format anyway, what about making a generic ACPI_EXTRACT indicator that exports the location, size, and parameter location in one go. Something like: ACPI_EXTRACT ssdt_proc_obj Processor (CPAA, 0xAA, 0xb010, 0x06) { I considered this, sure. We could parse AML to figure out what is the object type we are trying to look up. However I decided sanity-checking that we get the right type of object in AML is better. This way if iasl output format breaks we will have a better chance to detect that. Makes sense? Also this can not be as generic as it seems: each type of object in AML bytecode is encoded slightly differently. So we would still have types of objects we support and types of object we don't. which would produce something like: static struct aml_object ssdt_proc_obj = {.addr=0x24, .size=0x40, .param=0x28}; What is the param offset here? I really think multiple arrays are better so we don't waste memory on fields that we don't need and alignment. I also dislike hardcoding field names. With a struct, if you want to know where did 'param' come from you must read python. My way, all names come from DSL source. As for the other parts of this patch series - I'm still leary of changing the DSDT dynamically. Hmm, not sure I understand why. Could you explain pls? I'd be curious to see if we can add the following to ssdt-proc.dsl: ACPI_EXTRACT hotplug_obj Device (SL00) { ACPI_EXTRACT_NAME_DWORD_CONST hotplog_id Name (ID, 0xAABBCCDD) Name (_ADR, ID) Method (_EJ0, 1) { Return(PCEJ(ID)) } Name (_SUN, ID) } and then just memcpy the hotplug_obj N number of times into the ssdt for each available slot. (This would be on top of the DSDT simplification patch series that I posted previously.) This assumes all devices are the same. But unfortunately this will not work for other devices such as VGA. -Kevin -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] qemu-kvm: Remove extboot support
From: Jan Kiszka jan.kis...@siemens.com Except for booting from SCSI via the LSI controller, SeaBIOS has native support for mass storage interfaces now. And SCSI can be worked around via [1] - or someone finally adds the necessary bits to SeaBIOS if there is a real need. [1] http://thread.gmane.org/gmane.comp.emulators.qemu/78467 Signed-off-by: Jan Kiszka jan.kis...@siemens.com Acked-by: Anthony Liguori aligu...@us.ibm.com --- .gitignore |1 - Makefile|1 - Makefile.target |1 - blockdev.c | 13 - blockdev.h |2 - hw/extboot.c| 123 hw/pc.c | 22 -- hw/pc.h |4 - pc-bios/optionrom/Makefile |2 +- pc-bios/optionrom/extboot.S | 691 --- qemu-config.c |4 - qemu-options.hx |4 +- 12 files changed, 2 insertions(+), 866 deletions(-) delete mode 100644 hw/extboot.c delete mode 100644 pc-bios/optionrom/extboot.S diff --git a/.gitignore b/.gitignore index 625f28e..ea2bd8a 100644 --- a/.gitignore +++ b/.gitignore @@ -66,7 +66,6 @@ pc-bios/vgabios-pq/status pc-bios/optionrom/linuxboot.bin pc-bios/optionrom/multiboot.bin pc-bios/optionrom/multiboot.raw -pc-bios/optionrom/extboot.bin pc-bios/optionrom/vapic.bin .stgit-* cscope.* diff --git a/Makefile b/Makefile index 936f130..f7f33a6 100644 --- a/Makefile +++ b/Makefile @@ -252,7 +252,6 @@ mpc8544ds.dtb \ multiboot.bin linuxboot.bin \ s390-zipl.rom \ spapr-rtas.bin slof.bin -BLOBS += extboot.bin BLOBS += vapic.bin else BLOBS= diff --git a/Makefile.target b/Makefile.target index 324b4f1..f84d8cb 100644 --- a/Makefile.target +++ b/Makefile.target @@ -229,7 +229,6 @@ obj-i386-y += mc146818rtc.o i8259.o pc.o obj-i386-y += cirrus_vga.o sga.o apic.o ioapic.o piix_pci.o obj-i386-y += vmport.o obj-i386-y += device-hotplug.o pci-hotplug.o smbios.o wdt_ib700.o -obj-i386-y += extboot.o obj-i386-y += debugcon.o multiboot.o obj-i386-y += pc_piix.o obj-i386-$(CONFIG_KVM) += kvmclock.o diff --git a/blockdev.c b/blockdev.c index da74171..0827bf7 100644 --- a/blockdev.c +++ b/blockdev.c @@ -16,8 +16,6 @@ #include sysemu.h #include block_int.h -DriveInfo *extboot_drive = NULL; - static QTAILQ_HEAD(drivelist, DriveInfo) drives = QTAILQ_HEAD_INITIALIZER(drives); static const char *const if_name[IF_COUNT] = { @@ -237,7 +235,6 @@ DriveInfo *drive_init(QemuOpts *opts, int default_to_scsi) int on_read_error, on_write_error; const char *devaddr; DriveInfo *dinfo; -int is_extboot = 0; int snapshot = 0; int ret; @@ -356,12 +353,6 @@ DriveInfo *drive_init(QemuOpts *opts, int default_to_scsi) } } -is_extboot = qemu_opt_get_bool(opts, boot, 0); -if (is_extboot extboot_drive) { -fprintf(stderr, qemu: two bootable drives specified\n); -return NULL; -} - on_write_error = BLOCK_ERR_STOP_ENOSPC; if ((buf = qemu_opt_get(opts, werror)) != NULL) { if (type != IF_IDE type != IF_SCSI type != IF_VIRTIO type != IF_NONE) { @@ -467,10 +458,6 @@ DriveInfo *drive_init(QemuOpts *opts, int default_to_scsi) strncpy(dinfo-serial, serial, sizeof(dinfo-serial) - 1); QTAILQ_INSERT_TAIL(drives, dinfo, next); -if (is_extboot) { -extboot_drive = dinfo; -} - bdrv_set_on_error(dinfo-bdrv, on_read_error, on_write_error); switch(type) { diff --git a/blockdev.h b/blockdev.h index 0a5144c..3587786 100644 --- a/blockdev.h +++ b/blockdev.h @@ -66,6 +66,4 @@ int do_drive_del(Monitor *mon, const QDict *qdict, QObject **ret_data); int do_snapshot_blkdev(Monitor *mon, const QDict *qdict, QObject **ret_data); int do_block_resize(Monitor *mon, const QDict *qdict, QObject **ret_data); -extern DriveInfo *extboot_drive; - #endif diff --git a/hw/extboot.c b/hw/extboot.c deleted file mode 100644 index d517834..000 --- a/hw/extboot.c +++ /dev/null @@ -1,123 +0,0 @@ -/* - * Extended boot option ROM support. - * - * Copyright IBM, Corp. 2007 - * - * Authors: - * Anthony Liguori aligu...@us.ibm.com - * - * This work is licensed under the terms of the GNU GPL, version 2. See - * the COPYING file in the top-level directory. - * - */ - -#include hw.h -#include pc.h -#include isa.h -#include block.h - -/* Extended Boot ROM suport */ - -union extboot_cmd -{ -uint16_t type; -struct { - uint16_t type; - uint16_t cylinders; - uint16_t heads; - uint16_t sectors; - uint64_t nb_sectors; -} query_geometry; -struct { - uint16_t type; - uint16_t nb_sectors; - uint16_t segment; - uint16_t offset; - uint64_t sector; -} xfer; -}; - -static void get_translated_chs(BlockDriverState *bs, int *c, int *h, int *s) -{ -bdrv_get_geometry_hint(bs, c, h, s); - -if (*c = 1024) { - *c = 0; - *h = 0; -} else if (*c = 2048) { - *c
Re: [Qemu-devel] [PATCH 2/2] LAPIC: make lapic support cpu hotplug
On 2011-10-05 12:26, liu ping fan wrote: And make the creation of apic as part of cpu initialization, so apic's state has been ready, before setting kvm_apic. There is no kvm-apic upstream yet, so it's hard to judge why we need this here. If we do, this has to be a separate patch. But I seriously doubt we need it (my hack worked without it, and that was not because of its hack nature). Sorry, I did not explain it clearly. What I mean is that “env-apic_state” must be prepared before qemu_kvm_cpu_thread_fn() - ... - kvm_put_sregs(), where we get apic_base by “ sregs.apic_base = cpu_get_apic_base(env-apic_state);” and then call “kvm_vcpu_ioctl(env, KVM_SET_SREGS, sregs);” which will finally affect the kvm_apic structure in kernel. But as current code, in pc_new_cpu(), we call apic_init() to initialize apic_state, after cpu_init(), so we can not guarantee the order of apic_state initializaion and the setting to kernel. Because LAPIC is part of x86 chip, I want to move it into cpu_x86_init(), and ensure apic_init() called before thread “qemu_kvm_cpu_thread_fn()” creation. The LAPIC is part of the CPU, the classic APIC was a dedicated chip. For various reasons, a safer approach for creating a new CPU is to stop the machine, add the new device models, run cpu_synchronize_post_init on that new cpu (looks like you missed that) and then resume everything. See http://git.kiszka.org/?p=qemu-kvm.git;a=commitdiff;h=be8f21c6b54eac82f7add7ee9d4ecf9cb8ebb320 ... diff --git a/hw/icc_bus.c b/hw/icc_bus.c new file mode 100644 index 000..360ca2a --- /dev/null +++ b/hw/icc_bus.c @@ -0,0 +1,62 @@ +/* +*/ +#define ICC_BUS_PLUG +#ifdef ICC_BUS_PLUG +#include icc_bus.h + + + +struct icc_bus_info icc_info = { +.qinfo.name = icc, +.qinfo.size = sizeof(struct icc_bus), +.qinfo.props = (Property[]) { +DEFINE_PROP_END_OF_LIST(), +} + +}; + + +static const VMStateDescription vmstate_icc_bus = { +.name = icc_bus, +.version_id = 1, +.minimum_version_id = 1, +.minimum_version_id_old = 1, +.pre_save = NULL, +.post_load = NULL, +}; + +struct icc_bus *g_iccbus; + +struct icc_bus *icc_init_bus(DeviceState *parent, const char *name) +{ +struct icc_bus *bus; + +bus = FROM_QBUS(icc_bus, qbus_create(icc_info.qinfo, parent, name)); +bus-qbus.allow_hotplug = 1; /* Yes, we can */ +bus-qbus.name = icc; +vmstate_register(NULL, -1, vmstate_icc_bus, bus); The chipset is the owner of this bus and instantiates it. So it also provides a vmstate. You can drop this unneeded one here (it's created via an obsolete API anyway). No familiar with Qemu bus emulation, keep on learning :) . But what I thought is, the x86-ICC bus is not the same as bus like PCI. For a PCI bus, it lies behind a host bridge, but ICC is shared by all x86 processors in SMP system, so there is not a outstanding owner. And I right? ICC is also attached to the chipset (due to the IOAPIC). So it looks reasonable to me to let the chipset do the lifecycle management as well. It is the fixed point, CPUs may come and go. Jan signature.asc Description: OpenPGP digital signature
[PATCH v2 5/9] perf, tools: Do guest-only counting in perf-kvm by default
From: Joerg Roedel joerg.roe...@amd.com Make use of exclude_guest and exlude_host in perf-kvm to do only guest-only counting by default. Signed-off-by: Joerg Roedel joerg.roe...@amd.com Signed-off-by: Gleb Natapov g...@redhat.com --- tools/perf/builtin-kvm.c |3 ++- tools/perf/util/event.c|8 tools/perf/util/event.h|2 ++ tools/perf/util/evlist.c |5 - tools/perf/util/parse-events.c |1 + 5 files changed, 17 insertions(+), 2 deletions(-) diff --git a/tools/perf/builtin-kvm.c b/tools/perf/builtin-kvm.c index 032324a..9b05afa 100644 --- a/tools/perf/builtin-kvm.c +++ b/tools/perf/builtin-kvm.c @@ -107,7 +107,8 @@ static int __cmd_buildid_list(int argc, const char **argv) int cmd_kvm(int argc, const char **argv, const char *prefix __used) { - perf_host = perf_guest = 0; + perf_host = 0; + perf_guest = 1; argc = parse_options(argc, argv, kvm_options, kvm_usage, PARSE_OPT_STOP_AT_NON_OPTION); diff --git a/tools/perf/util/event.c b/tools/perf/util/event.c index 437f8ca..31a6d7f 100644 --- a/tools/perf/util/event.c +++ b/tools/perf/util/event.c @@ -44,6 +44,14 @@ static struct perf_sample synth_sample = { .period= 1, }; +void event_attr_init(struct perf_event_attr *attr) +{ + if (!perf_host) + attr-exclude_host = 1; + if (!perf_guest) + attr-exclude_guest = 1; +} + static pid_t perf_event__synthesize_comm(union perf_event *event, pid_t pid, int full, perf_event__handler_t process, struct perf_session *session) diff --git a/tools/perf/util/event.h b/tools/perf/util/event.h index 357a85b..e5f101e 100644 --- a/tools/perf/util/event.h +++ b/tools/perf/util/event.h @@ -150,6 +150,8 @@ typedef int (*perf_event__handler_t)(union perf_event *event, struct perf_sample *sample, struct perf_session *session); +void event_attr_init(struct perf_event_attr *attr); + int perf_event__synthesize_thread_map(struct thread_map *threads, perf_event__handler_t process, struct perf_session *session); diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c index 72e9f48..4773fbe 100644 --- a/tools/perf/util/evlist.c +++ b/tools/perf/util/evlist.c @@ -82,8 +82,11 @@ int perf_evlist__add_default(struct perf_evlist *evlist) .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_CPU_CYCLES, }; - struct perf_evsel *evsel = perf_evsel__new(attr, 0); + struct perf_evsel *evsel; + + event_attr_init(attr); + evsel = perf_evsel__new(attr, 0); if (evsel == NULL) goto error; diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c index 3b00775..620ba98 100644 --- a/tools/perf/util/parse-events.c +++ b/tools/perf/util/parse-events.c @@ -848,6 +848,7 @@ int parse_events(struct perf_evlist *evlist , const char *str, int unset __used) for (;;) { ostr = str; memset(attr, 0, sizeof(attr)); + event_attr_init(attr); ret = parse_event_symbols(evlist, str, attr); if (ret == EVT_FAILED) return -1; -- 1.7.5.3 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2 1/9] perf, core: Introduce attrs to count in either host or guest mode
From: Joerg Roedel joerg.roe...@amd.com The two new attributes exclude_guest and exclude_host can bes used by user-space to tell the kernel to setup performance counter to either only count while the CPU is in guest or in host mode. An additional check is also introduced to make sure user-space does not try to exclude guest and host mode from counting. Signed-off-by: Joerg Roedel joerg.roe...@amd.com Signed-off-by: Gleb Natapov g...@redhat.com --- include/linux/perf_event.h |5 - 1 files changed, 4 insertions(+), 1 deletions(-) diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h index c816075..1e9ebe5 100644 --- a/include/linux/perf_event.h +++ b/include/linux/perf_event.h @@ -220,7 +220,10 @@ struct perf_event_attr { mmap_data : 1, /* non-exec mmap data*/ sample_id_all : 1, /* sample_type all events */ - __reserved_1 : 45; + exclude_host : 1, /* don't count in host */ + exclude_guest : 1, /* don't count in guest */ + + __reserved_1 : 43; union { __u32 wakeup_events;/* wakeup every n events */ -- 1.7.5.3 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2 3/9] perf, tools: Add support for guest/host-only profiling
From: Joerg Roedel joerg.roe...@amd.com To restrict a counter to either host or guest mode this patch introduces two new event modifiers: G and H. With G the counter is configured in guest-only mode and with H in host-only mode. Signed-off-by: Joerg Roedel joerg.roe...@amd.com Signed-off-by: Gleb Natapov g...@redhat.com --- tools/perf/util/parse-events.c | 14 -- 1 files changed, 12 insertions(+), 2 deletions(-) diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c index 928918b..3b00775 100644 --- a/tools/perf/util/parse-events.c +++ b/tools/perf/util/parse-events.c @@ -735,8 +735,8 @@ static int parse_event_modifier(const char **strp, struct perf_event_attr *attr) { const char *str = *strp; - int exclude = 0; - int eu = 0, ek = 0, eh = 0, precise = 0; + int exclude = 0, exclude_GH = 0; + int eu = 0, ek = 0, eh = 0, eH = 0, eG = 0, precise = 0; if (!*str) return 0; @@ -760,6 +760,14 @@ parse_event_modifier(const char **strp, struct perf_event_attr *attr) if (!exclude) exclude = eu = ek = eh = 1; eh = 0; + } else if (*str == 'G') { + if (!exclude_GH) + exclude_GH = eG = eH = 1; + eG = 0; + } else if (*str == 'H') { + if (!exclude_GH) + exclude_GH = eG = eH = 1; + eH = 0; } else if (*str == 'p') { precise++; } else @@ -776,6 +784,8 @@ parse_event_modifier(const char **strp, struct perf_event_attr *attr) attr-exclude_kernel = ek; attr-exclude_hv = eh; attr-precise_ip = precise; + attr-exclude_host = eH; + attr-exclude_guest = eG; return 0; } -- 1.7.5.3 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2 6/9] perf, intel: Use GO/HO bits in perf-ctr
Intel does not have guest/host-only bit in perf counters like AMD does. To support GO/HO bits KVM needs to switch EVENTSELn values (or PERF_GLOBAL_CTRL if available) at a guest entry. If a counter is configured to count only in a guest mode it stays disabled in a host, but VMX is configured to switch it to enabled value during guest entry. This patch adds GO/HO tracking to Intel perf code and provides interface for KVM to get a list of MSRs that need to be switched on a guest entry. Only cpus with architectural PMU (v1 or later) are supported with this patch. To my knowledge there is not p6 models with VMX but without architectural PMU and p4 with VMX are rare and the interface is general enough to support them if need arise. Signed-off-by: Gleb Natapov g...@redhat.com --- arch/x86/include/asm/perf_event.h | 12 arch/x86/kernel/cpu/perf_event.c | 14 + arch/x86/kernel/cpu/perf_event_intel.c | 90 ++- 3 files changed, 113 insertions(+), 3 deletions(-) diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h index ce2bfb3..e47cb61 100644 --- a/arch/x86/include/asm/perf_event.h +++ b/arch/x86/include/asm/perf_event.h @@ -162,7 +162,19 @@ extern unsigned long perf_misc_flags(struct pt_regs *regs); ); \ } +struct perf_guest_switch_msr { + unsigned msr; + u64 host, guest; +}; + +extern struct perf_guest_switch_msr *perf_guest_get_msrs(int *nr); #else +static inline perf_guest_switch_msr *perf_guest_get_msrs(int *nr) +{ + *nr = 0; + return NULL; +} + static inline void perf_events_lapic_init(void){ } #endif diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c index cfa62ec..531f5c6 100644 --- a/arch/x86/kernel/cpu/perf_event.c +++ b/arch/x86/kernel/cpu/perf_event.c @@ -119,6 +119,10 @@ struct cpu_hw_events { struct perf_branch_stacklbr_stack; struct perf_branch_entrylbr_entries[MAX_LBR_ENTRIES]; + /* Intel host/guest exclude bits */ + u64 intel_ctrl_guest_mask; + u64 intel_ctrl_host_mask; + /* * manage shared (per-core, per-cpu) registers * used on Intel NHM/WSM/SNB @@ -129,6 +133,11 @@ struct cpu_hw_events { * AMD specific bits */ struct amd_nb *amd_nb; + + /* +* Intel guest/host-only support +*/ + struct perf_guest_switch_msr guest_switch_msrs[X86_PMC_IDX_MAX]; }; #define __EVENT_CONSTRAINT(c, n, m, w) {\ @@ -292,6 +301,11 @@ struct x86_pmu { */ struct extra_reg *extra_regs; unsigned int er_flags; + + /* +* Guest event support +*/ + struct perf_guest_switch_msr* (*guest_get_msrs)(int *nr); }; #define ERF_NO_HT_SHARING 1 diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c index f88af2c..e35b916 100644 --- a/arch/x86/kernel/cpu/perf_event_intel.c +++ b/arch/x86/kernel/cpu/perf_event_intel.c @@ -746,7 +746,8 @@ static void intel_pmu_enable_all(int added) intel_pmu_pebs_enable_all(); intel_pmu_lbr_enable_all(); - wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, x86_pmu.intel_ctrl); + wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, + x86_pmu.intel_ctrl ~cpuc-intel_ctrl_guest_mask); if (test_bit(X86_PMC_IDX_FIXED_BTS, cpuc-active_mask)) { struct perf_event *event = @@ -869,6 +870,7 @@ static void intel_pmu_disable_fixed(struct hw_perf_event *hwc) static void intel_pmu_disable_event(struct perf_event *event) { struct hw_perf_event *hwc = event-hw; + struct cpu_hw_events *cpuc = __get_cpu_var(cpu_hw_events); if (unlikely(hwc-idx == X86_PMC_IDX_FIXED_BTS)) { intel_pmu_disable_bts(); @@ -876,6 +878,9 @@ static void intel_pmu_disable_event(struct perf_event *event) return; } + cpuc-intel_ctrl_guest_mask = ~(1ull hwc-idx); + cpuc-intel_ctrl_host_mask = ~(1ull hwc-idx); + if (unlikely(hwc-config_base == MSR_ARCH_PERFMON_FIXED_CTR_CTRL)) { intel_pmu_disable_fixed(hwc); return; @@ -921,6 +926,7 @@ static void intel_pmu_enable_fixed(struct hw_perf_event *hwc) static void intel_pmu_enable_event(struct perf_event *event) { struct hw_perf_event *hwc = event-hw; + struct cpu_hw_events *cpuc = __get_cpu_var(cpu_hw_events); if (unlikely(hwc-idx == X86_PMC_IDX_FIXED_BTS)) { if (!__this_cpu_read(cpu_hw_events.enabled)) @@ -930,6 +936,11 @@ static void intel_pmu_enable_event(struct perf_event *event) return; } + if (event-attr.exclude_host) + cpuc-intel_ctrl_guest_mask |= (1ull hwc-idx); + if (event-attr.exclude_guest) + cpuc-intel_ctrl_host_mask |= (1ull
[PATCH v2 7/9] KVM, VMX: add support for switching of PERF_GLOBAL_CTRL
Some cpus have special support for switching PERF_GLOBAL_CTRL msr. Add logic to detect if such support exists and works properly and extend msr switching code to use it if available. Also extend number of generic msr switching entries to 8. Signed-off-by: Gleb Natapov g...@redhat.com --- arch/x86/kvm/vmx.c | 104 ++- 1 files changed, 93 insertions(+), 11 deletions(-) diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index f3ec38f..a938ddf 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -118,7 +118,7 @@ module_param(ple_gap, int, S_IRUGO); static int ple_window = KVM_VMX_DEFAULT_PLE_WINDOW; module_param(ple_window, int, S_IRUGO); -#define NR_AUTOLOAD_MSRS 1 +#define NR_AUTOLOAD_MSRS 8 #define VMCS02_POOL_SIZE 1 struct vmcs { @@ -622,6 +622,7 @@ static unsigned long *vmx_msr_bitmap_legacy; static unsigned long *vmx_msr_bitmap_longmode; static bool cpu_has_load_ia32_efer; +static bool cpu_has_load_perf_global_ctrl; static DECLARE_BITMAP(vmx_vpid_bitmap, VMX_NR_VPIDS); static DEFINE_SPINLOCK(vmx_vpid_lock); @@ -1191,15 +1192,34 @@ static void update_exception_bitmap(struct kvm_vcpu *vcpu) vmcs_write32(EXCEPTION_BITMAP, eb); } +static void clear_atomic_switch_msr_special(unsigned long entry, + unsigned long exit) +{ + vmcs_clear_bits(VM_ENTRY_CONTROLS, entry); + vmcs_clear_bits(VM_EXIT_CONTROLS, exit); +} + static void clear_atomic_switch_msr(struct vcpu_vmx *vmx, unsigned msr) { unsigned i; struct msr_autoload *m = vmx-msr_autoload; - if (msr == MSR_EFER cpu_has_load_ia32_efer) { - vmcs_clear_bits(VM_ENTRY_CONTROLS, VM_ENTRY_LOAD_IA32_EFER); - vmcs_clear_bits(VM_EXIT_CONTROLS, VM_EXIT_LOAD_IA32_EFER); - return; + switch (msr) { + case MSR_EFER: + if (cpu_has_load_ia32_efer) { + clear_atomic_switch_msr_special(VM_ENTRY_LOAD_IA32_EFER, + VM_EXIT_LOAD_IA32_EFER); + return; + } + break; + case MSR_CORE_PERF_GLOBAL_CTRL: + if (cpu_has_load_perf_global_ctrl) { + clear_atomic_switch_msr_special( + VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL, + VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL); + return; + } + break; } for (i = 0; i m-nr; ++i) @@ -1215,18 +1235,44 @@ static void clear_atomic_switch_msr(struct vcpu_vmx *vmx, unsigned msr) vmcs_write32(VM_EXIT_MSR_LOAD_COUNT, m-nr); } +static void add_atomic_switch_msr_special(unsigned long entry, + unsigned long exit, unsigned long guest_val_vmcs, + unsigned long host_val_vmcs, u64 guest_val, u64 host_val) +{ + vmcs_write64(guest_val_vmcs, guest_val); + vmcs_write64(host_val_vmcs, host_val); + vmcs_set_bits(VM_ENTRY_CONTROLS, entry); + vmcs_set_bits(VM_EXIT_CONTROLS, exit); +} + static void add_atomic_switch_msr(struct vcpu_vmx *vmx, unsigned msr, u64 guest_val, u64 host_val) { unsigned i; struct msr_autoload *m = vmx-msr_autoload; - if (msr == MSR_EFER cpu_has_load_ia32_efer) { - vmcs_write64(GUEST_IA32_EFER, guest_val); - vmcs_write64(HOST_IA32_EFER, host_val); - vmcs_set_bits(VM_ENTRY_CONTROLS, VM_ENTRY_LOAD_IA32_EFER); - vmcs_set_bits(VM_EXIT_CONTROLS, VM_EXIT_LOAD_IA32_EFER); - return; + switch (msr) { + case MSR_EFER: + if (cpu_has_load_ia32_efer) { + add_atomic_switch_msr_special(VM_ENTRY_LOAD_IA32_EFER, + VM_EXIT_LOAD_IA32_EFER, + GUEST_IA32_EFER, + HOST_IA32_EFER, + guest_val, host_val); + return; + } + break; + case MSR_CORE_PERF_GLOBAL_CTRL: + if (cpu_has_load_perf_global_ctrl) { + add_atomic_switch_msr_special( + VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL, + VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL, + GUEST_IA32_PERF_GLOBAL_CTRL, + HOST_IA32_PERF_GLOBAL_CTRL, + guest_val, host_val); + return; + } + break; } for (i = 0; i m-nr; ++i) @@ -2455,6 +2501,42 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf) allow_1_setting(MSR_IA32_VMX_EXIT_CTLS, VM_EXIT_LOAD_IA32_EFER); +
[PATCH v2 4/9] perf, tools: Fix copypaste error in perf-kvm option description
From: Joerg Roedel joerg.roe...@amd.com The --host option certainly enables host-data collection. Signed-off-by: Joerg Roedel joerg.roe...@amd.com Signed-off-by: Gleb Natapov g...@redhat.com --- tools/perf/builtin-kvm.c |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/tools/perf/builtin-kvm.c b/tools/perf/builtin-kvm.c index 34d1e85..032324a 100644 --- a/tools/perf/builtin-kvm.c +++ b/tools/perf/builtin-kvm.c @@ -38,7 +38,7 @@ static const struct option kvm_options[] = { OPT_BOOLEAN(0, guest, perf_guest, Collect guest os data), OPT_BOOLEAN(0, host, perf_host, - Collect guest os data), + Collect host os data), OPT_STRING(0, guestmount, symbol_conf.guestmount, directory, guest mount directory under which every guest os instance has a subdir), -- 1.7.5.3 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2 0/9] perf support for x86 guest/host-only bits
This patch series consists of Joerg series named perf support for amd guest/host-only bits v2 [1] rebased to 3.1.0-rc7 and in addition, support for intel cpus for the same functionality. [1] https://lkml.org/lkml/2011/6/17/171 Changelog: v1-v2 - move perf_guest_switch_msr array to perf code. - small cosmetic changes. Gleb Natapov (4): perf, intel: Use GO/HO bits in perf-ctr KVM, VMX: add support for switching of PERF_GLOBAL_CTRL KVM, VMX: Add support for guest/host-only profiling KVM, VMX: Check for automatic switch msr table overflow. Joerg Roedel (5): perf, core: Introduce attrs to count in either host or guest mode perf, amd: Use GO/HO bits in perf-ctr perf, tools: Add support for guest/host-only profiling perf, tools: Fix copypaste error in perf-kvm option description perf, tools: Do guest-only counting in perf-kvm by default arch/x86/include/asm/perf_event.h | 15 arch/x86/kernel/cpu/perf_event.c | 14 arch/x86/kernel/cpu/perf_event_amd.c | 13 +++ arch/x86/kernel/cpu/perf_event_intel.c | 90 +- arch/x86/kvm/vmx.c | 131 +--- include/linux/perf_event.h |5 +- tools/perf/builtin-kvm.c |5 +- tools/perf/util/event.c|8 ++ tools/perf/util/event.h|2 + tools/perf/util/evlist.c |5 +- tools/perf/util/parse-events.c | 15 +++- 11 files changed, 282 insertions(+), 21 deletions(-) -- 1.7.5.3 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2 9/9] KVM, VMX: Check for automatic switch msr table overflow.
Signed-off-by: Gleb Natapov g...@redhat.com --- arch/x86/kvm/vmx.c |6 +- 1 files changed, 5 insertions(+), 1 deletions(-) diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index d0d4afa..6e28d58 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -1280,7 +1280,11 @@ static void add_atomic_switch_msr(struct vcpu_vmx *vmx, unsigned msr, if (m-guest[i].index == msr) break; - if (i == m-nr) { + if (i == NR_AUTOLOAD_MSRS) { + printk_once(KERN_WARNINGNot enough mst switch entries. + Can't add msr %x\n, msr); + return; + } else if (i == m-nr) { ++m-nr; vmcs_write32(VM_ENTRY_MSR_LOAD_COUNT, m-nr); vmcs_write32(VM_EXIT_MSR_LOAD_COUNT, m-nr); -- 1.7.5.3 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2 2/9] perf, amd: Use GO/HO bits in perf-ctr
From: Joerg Roedel joerg.roe...@amd.com The AMD perf-counters support counting in guest or host-mode only. Make use of that feature when user-space specified guest/host-mode only counting. Signed-off-by: Joerg Roedel joerg.roe...@amd.com Signed-off-by: Gleb Natapov g...@redhat.com --- arch/x86/include/asm/perf_event.h|3 +++ arch/x86/kernel/cpu/perf_event_amd.c | 13 + 2 files changed, 16 insertions(+), 0 deletions(-) diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h index 094fb30..ce2bfb3 100644 --- a/arch/x86/include/asm/perf_event.h +++ b/arch/x86/include/asm/perf_event.h @@ -29,6 +29,9 @@ #define ARCH_PERFMON_EVENTSEL_INV (1ULL 23) #define ARCH_PERFMON_EVENTSEL_CMASK0xFF00ULL +#define AMD_PERFMON_EVENTSEL_GUESTONLY (1ULL 40) +#define AMD_PERFMON_EVENTSEL_HOSTONLY (1ULL 41) + #define AMD64_EVENTSEL_EVENT \ (ARCH_PERFMON_EVENTSEL_EVENT | (0x0FULL 32)) #define INTEL_ARCH_EVENT_MASK \ diff --git a/arch/x86/kernel/cpu/perf_event_amd.c b/arch/x86/kernel/cpu/perf_event_amd.c index 941caa2..5c01a73 100644 --- a/arch/x86/kernel/cpu/perf_event_amd.c +++ b/arch/x86/kernel/cpu/perf_event_amd.c @@ -132,6 +132,19 @@ static int amd_pmu_hw_config(struct perf_event *event) if (ret) return ret; + if (event-attr.exclude_host event-attr.exclude_guest) + /* +* When HO == GO == 1 the hardware treats that as GO == HO == 0 +* and will count in both modes. We don't want to count in that +* case so we emulate no-counting by setting US = OS = 0. +*/ + event-hw.config = ~(ARCH_PERFMON_EVENTSEL_USR | + ARCH_PERFMON_EVENTSEL_OS); + else if (event-attr.exclude_host) + event-hw.config |= AMD_PERFMON_EVENTSEL_GUESTONLY; + else if (event-attr.exclude_guest) + event-hw.config |= AMD_PERFMON_EVENTSEL_HOSTONLY; + if (event-attr.type != PERF_TYPE_RAW) return 0; -- 1.7.5.3 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2 8/9] KVM, VMX: Add support for guest/host-only profiling
Support guest/host-only profiling by switch perf msrs on a guest entry if needed. Signed-off-by: Gleb Natapov g...@redhat.com --- arch/x86/kvm/vmx.c | 21 + 1 files changed, 21 insertions(+), 0 deletions(-) diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index a938ddf..d0d4afa 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -39,6 +39,7 @@ #include asm/mce.h #include asm/i387.h #include asm/xcr.h +#include asm/perf_event.h #include trace.h @@ -6054,6 +6055,24 @@ static void vmx_cancel_injection(struct kvm_vcpu *vcpu) vmcs_write32(VM_ENTRY_INTR_INFO_FIELD, 0); } +static void atomic_switch_perf_msrs(struct vcpu_vmx *vmx) +{ + int i, nr_msrs; + struct perf_guest_switch_msr *msrs; + + msrs = perf_guest_get_msrs(nr_msrs); + + if (!msrs) + return; + + for (i = 0; i nr_msrs; i++) + if (msrs[i].host == msrs[i].guest) + clear_atomic_switch_msr(vmx, msrs[i].msr); + else + add_atomic_switch_msr(vmx, msrs[i].msr, msrs[i].guest, + msrs[i].host); +} + #ifdef CONFIG_X86_64 #define R r #define Q q @@ -6103,6 +6122,8 @@ static void __noclone vmx_vcpu_run(struct kvm_vcpu *vcpu) if (vcpu-guest_debug KVM_GUESTDBG_SINGLESTEP) vmx_set_interrupt_shadow(vcpu, 0); + atomic_switch_perf_msrs(vmx); + vmx-__launched = vmx-loaded_vmcs-launched; asm( /* Store host registers */ -- 1.7.5.3 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 2/2] virtio-net: Prevent NULL dereference
On Mon, 2011-10-03 at 20:40 +0200, Michael S. Tsirkin wrote: On Wed, Sep 28, 2011 at 05:40:55PM +0300, Sasha Levin wrote: This patch prevents a NULL dereference when the user has passed a length longer than an actual buffer to virtio-net. Cc: Rusty Russell ru...@rustcorp.com.au Cc: Michael S. Tsirkin m...@redhat.com Cc: virtualizat...@lists.linux-foundation.org Cc: net...@vger.kernel.org Cc: kvm@vger.kernel.org Signed-off-by: Sasha Levin levinsasha...@gmail.com --- drivers/net/virtio_net.c | 12 +++- 1 files changed, 11 insertions(+), 1 deletions(-) diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index bde0dec..4a53d2a 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -208,12 +208,22 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi, return NULL; } - while (len) { + while (len page) { set_skb_frag(skb, page, offset, len); page = (struct page *)page-private; offset = 0; } + /* +* This is the case where we ran out of pages in our linked list, but +* supposedly have more data to read. Again, let's clarify that this only happens with broken devices. I think that the code within the if() makes it clear that it isn't the regular path. -- Sasha. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 1/2] virtio-net: Verify page list size before fitting into skb
On Mon, 2011-10-03 at 21:04 +0200, Michael S. Tsirkin wrote: On Wed, Sep 28, 2011 at 05:40:54PM +0300, Sasha Levin wrote: This patch verifies that the length of a buffer stored in a linked list of pages is small enough to fit into a skb. If the size is larger than a max size of a skb, it means that we shouldn't go ahead building skbs anyway since we won't be able to send the buffer as the user requested. Cc: Rusty Russell ru...@rustcorp.com.au Cc: Michael S. Tsirkin m...@redhat.com Cc: virtualizat...@lists.linux-foundation.org Cc: net...@vger.kernel.org Cc: kvm@vger.kernel.org Signed-off-by: Sasha Levin levinsasha...@gmail.com --- drivers/net/virtio_net.c | 13 + 1 files changed, 13 insertions(+), 0 deletions(-) diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index 0c7321c..bde0dec 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -195,6 +195,19 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi, len -= copy; offset += copy; + /* +* Verify that we can indeed put this data into a skb. +* This is here to handle cases when the device erroneously +* tries to receive more than is possible. This is usually +* the case of a broken device. +*/ + if (unlikely(len MAX_SKB_FRAGS * PAGE_SIZE)) { + if (net_ratelimit()) + pr_debug(%s: too much data\n, skb-dev-name); + dev_kfree_skb(skb); + return NULL; + } + BTW, receive_mergeable does pr_debug(%s: packet too long\n, skb-dev-name); skb-dev-stats.rx_length_errors++; which makes sense. Do you think we should increase rx_length_errors here as well? -- Sasha. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v4 00/11] KVM: x86: optimize for writing guest page
On 09/23/2011 02:51 PM, Marcelo Tosatti wrote: On Thu, Sep 22, 2011 at 04:52:40PM +0800, Xiao Guangrong wrote: This patchset is against https://github.com/avikivity/kvm.git next branch. In this version, some changes come from Avi's comments: - fix instruction retried for nested guest - skip write-flooding for the sp whose level is 1 - rename some functions Looks good to me. To me as well. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 0/9] perf support for x86 guest/host-only bits
On 10/05/2011 02:01 PM, Gleb Natapov wrote: This patch series consists of Joerg series named perf support for amd guest/host-only bits v2 [1] rebased to 3.1.0-rc7 and in addition, support for intel cpus for the same functionality. Looks good to me. Peter, Ingo, if it works for you as well, please merge 1-6 into a branch on tip that I can pull and apply the rest on top of. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 6/9] perf, intel: Use GO/HO bits in perf-ctr
Had some serious conflicts with tip/master, queued the below merge. --- Subject: perf, intel: Use GO/HO bits in perf-ctr From: Gleb Natapov g...@redhat.com Date: Wed, 5 Oct 2011 14:01:21 +0200 Intel does not have guest/host-only bit in perf counters like AMD does. To support GO/HO bits KVM needs to switch EVENTSELn values (or PERF_GLOBAL_CTRL if available) at a guest entry. If a counter is configured to count only in a guest mode it stays disabled in a host, but VMX is configured to switch it to enabled value during guest entry. This patch adds GO/HO tracking to Intel perf code and provides interface for KVM to get a list of MSRs that need to be switched on a guest entry. Only cpus with architectural PMU (v1 or later) are supported with this patch. To my knowledge there is not p6 models with VMX but without architectural PMU and p4 with VMX are rare and the interface is general enough to support them if need arise. Signed-off-by: Gleb Natapov g...@redhat.com Signed-off-by: Peter Zijlstra a.p.zijls...@chello.nl Link: http://lkml.kernel.org/r/1317816084-18026-7-git-send-email-g...@redhat.com --- arch/x86/include/asm/perf_event.h | 12 arch/x86/kernel/cpu/perf_event.h | 12 arch/x86/kernel/cpu/perf_event_intel.c | 91 +++-- 3 files changed, 112 insertions(+), 3 deletions(-) Index: linux-2.6/arch/x86/include/asm/perf_event.h === --- linux-2.6.orig/arch/x86/include/asm/perf_event.h +++ linux-2.6/arch/x86/include/asm/perf_event.h @@ -162,7 +162,19 @@ extern unsigned long perf_misc_flags(str ); \ } +struct perf_guest_switch_msr { + unsigned msr; + u64 host, guest; +}; + +extern struct perf_guest_switch_msr *perf_guest_get_msrs(int *nr); #else +static inline perf_guest_switch_msr *perf_guest_get_msrs(int *nr) +{ + *nr = 0; + return NULL; +} + static inline void perf_events_lapic_init(void){ } #endif Index: linux-2.6/arch/x86/kernel/cpu/perf_event_intel.c === --- linux-2.6.orig/arch/x86/kernel/cpu/perf_event_intel.c +++ linux-2.6/arch/x86/kernel/cpu/perf_event_intel.c @@ -749,7 +749,8 @@ static void intel_pmu_enable_all(int add intel_pmu_pebs_enable_all(); intel_pmu_lbr_enable_all(); - wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, x86_pmu.intel_ctrl); + wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, + x86_pmu.intel_ctrl ~cpuc-intel_ctrl_guest_mask); if (test_bit(X86_PMC_IDX_FIXED_BTS, cpuc-active_mask)) { struct perf_event *event = @@ -872,6 +873,7 @@ static void intel_pmu_disable_fixed(stru static void intel_pmu_disable_event(struct perf_event *event) { struct hw_perf_event *hwc = event-hw; + struct cpu_hw_events *cpuc = __get_cpu_var(cpu_hw_events); if (unlikely(hwc-idx == X86_PMC_IDX_FIXED_BTS)) { intel_pmu_disable_bts(); @@ -879,6 +881,9 @@ static void intel_pmu_disable_event(stru return; } + cpuc-intel_ctrl_guest_mask = ~(1ull hwc-idx); + cpuc-intel_ctrl_host_mask = ~(1ull hwc-idx); + if (unlikely(hwc-config_base == MSR_ARCH_PERFMON_FIXED_CTR_CTRL)) { intel_pmu_disable_fixed(hwc); return; @@ -924,6 +929,7 @@ static void intel_pmu_enable_fixed(struc static void intel_pmu_enable_event(struct perf_event *event) { struct hw_perf_event *hwc = event-hw; + struct cpu_hw_events *cpuc = __get_cpu_var(cpu_hw_events); if (unlikely(hwc-idx == X86_PMC_IDX_FIXED_BTS)) { if (!__this_cpu_read(cpu_hw_events.enabled)) @@ -933,6 +939,11 @@ static void intel_pmu_enable_event(struc return; } + if (event-attr.exclude_host) + cpuc-intel_ctrl_guest_mask |= (1ull hwc-idx); + if (event-attr.exclude_guest) + cpuc-intel_ctrl_host_mask |= (1ull hwc-idx); + if (unlikely(hwc-config_base == MSR_ARCH_PERFMON_FIXED_CTR_CTRL)) { intel_pmu_enable_fixed(hwc); return; @@ -1302,12 +1313,84 @@ static int intel_pmu_hw_config(struct pe return 0; } +struct perf_guest_switch_msr *perf_guest_get_msrs(int *nr) +{ + if (x86_pmu.guest_get_msrs) + return x86_pmu.guest_get_msrs(nr); + *nr = 0; + return NULL; +} +EXPORT_SYMBOL_GPL(perf_guest_get_msrs); + +static struct perf_guest_switch_msr *intel_guest_get_msrs(int *nr) +{ + struct cpu_hw_events *cpuc = __get_cpu_var(cpu_hw_events); + struct perf_guest_switch_msr *arr = cpuc-guest_switch_msrs; + + arr[0].msr = MSR_CORE_PERF_GLOBAL_CTRL; + arr[0].host = x86_pmu.intel_ctrl ~cpuc-intel_ctrl_guest_mask; + arr[0].guest = x86_pmu.intel_ctrl ~cpuc-intel_ctrl_host_mask; + + *nr = 1; + return arr; +} + +static struct
Re: [PATCH v2 8/9] KVM, VMX: Add support for guest/host-only profiling
On Wed, 2011-10-05 at 14:01 +0200, Gleb Natapov wrote: +static void atomic_switch_perf_msrs(struct vcpu_vmx *vmx) +{ + int i, nr_msrs; + struct perf_guest_switch_msr *msrs; + + msrs = perf_guest_get_msrs(nr_msrs); + + if (!msrs) + return; + + for (i = 0; i nr_msrs; i++) + if (msrs[i].host == msrs[i].guest) + clear_atomic_switch_msr(vmx, msrs[i].msr); + else + add_atomic_switch_msr(vmx, msrs[i].msr, msrs[i].guest, + msrs[i].host); +} I don't think this will actually compile with PERF_EVENTS=n due to struct perf_guest_switch_msr not being defined. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 0/9] perf support for x86 guest/host-only bits
On Wed, 2011-10-05 at 15:48 +0200, Avi Kivity wrote: On 10/05/2011 02:01 PM, Gleb Natapov wrote: This patch series consists of Joerg series named perf support for amd guest/host-only bits v2 [1] rebased to 3.1.0-rc7 and in addition, support for intel cpus for the same functionality. Looks good to me. Peter, Ingo, if it works for you as well, please merge 1-6 into a branch on tip that I can pull and apply the rest on top of. I queued bits, and they should hopefully find their way to tip:perf/core soonish. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] KVM: PPC: E500: Support hugetlbfs
With hugetlbfs support emerging on e500, we should also support KVM backing its guest memory by it. This patch adds support for hugetlbfs into the e500 shadow mmu code. Signed-off-by: Alexander Graf ag...@suse.de --- v1 - v2: - address scott's comments --- arch/powerpc/kvm/e500_tlb.c | 24 1 files changed, 24 insertions(+), 0 deletions(-) diff --git a/arch/powerpc/kvm/e500_tlb.c b/arch/powerpc/kvm/e500_tlb.c index ec17148..1dd96a9 100644 --- a/arch/powerpc/kvm/e500_tlb.c +++ b/arch/powerpc/kvm/e500_tlb.c @@ -24,6 +24,7 @@ #include linux/sched.h #include linux/rwsem.h #include linux/vmalloc.h +#include linux/hugetlb.h #include asm/kvm_ppc.h #include asm/kvm_e500.h @@ -673,12 +674,31 @@ static inline void kvmppc_e500_shadow_map(struct kvmppc_vcpu_e500 *vcpu_e500, pfn = ~(tsize_pages - 1); break; } + } else if (vma hva = vma-vm_start + (vma-vm_flags VM_HUGETLB)) { + unsigned long psize = vma_kernel_pagesize(vma); + + tsize = (gtlbe-mas1 MAS1_TSIZE_MASK) + MAS1_TSIZE_SHIFT; + + /* +* Take the largest page size that satisfies both host +* and guest mapping +*/ + tsize = min(__ilog2(psize) - 10, tsize); + + /* +* e500 doesn't implement the lowest tsize bit, +* or 1K pages. +*/ + tsize = max(BOOK3E_PAGESZ_4K, tsize ~1); } up_read(current-mm-mmap_sem); } if (likely(!pfnmap)) { + unsigned long tsize_pages = 1 (tsize + 10 - PAGE_SHIFT); pfn = gfn_to_pfn_memslot(vcpu_e500-vcpu.kvm, slot, gfn); if (is_error_pfn(pfn)) { printk(KERN_ERR Couldn't get real page for gfn %lx!\n, @@ -686,6 +706,10 @@ static inline void kvmppc_e500_shadow_map(struct kvmppc_vcpu_e500 *vcpu_e500, kvm_release_pfn_clean(pfn); return; } + + /* Align guest and physical address to page map boundaries */ + pfn = ~(tsize_pages - 1); + gvaddr = ~((tsize_pages PAGE_SHIFT) - 1); } /* Drop old ref and setup new one. */ -- 1.6.0.2 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] qemu-kvm: Remove extboot support
On Wed, Oct 05, 2011 at 09:52:12AM +0200, Jan Kiszka wrote: From: Jan Kiszka jan.kis...@siemens.com Except for booting from SCSI via the LSI controller, SeaBIOS has native support for mass storage interfaces now. And SCSI can be worked around via [1] - or someone finally adds the necessary bits to SeaBIOS if there is a real need. [1] http://thread.gmane.org/gmane.comp.emulators.qemu/78467 Signed-off-by: Jan Kiszka jan.kis...@siemens.com Applied, thanks. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 1/2] virtio-net: Verify page list size before fitting into skb
On Wed, Oct 05, 2011 at 03:50:54PM +0200, Sasha Levin wrote: On Mon, 2011-10-03 at 21:04 +0200, Michael S. Tsirkin wrote: On Wed, Sep 28, 2011 at 05:40:54PM +0300, Sasha Levin wrote: This patch verifies that the length of a buffer stored in a linked list of pages is small enough to fit into a skb. If the size is larger than a max size of a skb, it means that we shouldn't go ahead building skbs anyway since we won't be able to send the buffer as the user requested. Cc: Rusty Russell ru...@rustcorp.com.au Cc: Michael S. Tsirkin m...@redhat.com Cc: virtualizat...@lists.linux-foundation.org Cc: net...@vger.kernel.org Cc: kvm@vger.kernel.org Signed-off-by: Sasha Levin levinsasha...@gmail.com --- drivers/net/virtio_net.c | 13 + 1 files changed, 13 insertions(+), 0 deletions(-) diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index 0c7321c..bde0dec 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -195,6 +195,19 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi, len -= copy; offset += copy; + /* + * Verify that we can indeed put this data into a skb. + * This is here to handle cases when the device erroneously + * tries to receive more than is possible. This is usually + * the case of a broken device. + */ + if (unlikely(len MAX_SKB_FRAGS * PAGE_SIZE)) { + if (net_ratelimit()) + pr_debug(%s: too much data\n, skb-dev-name); + dev_kfree_skb(skb); + return NULL; + } + BTW, receive_mergeable does pr_debug(%s: packet too long\n, skb-dev-name); skb-dev-stats.rx_length_errors++; which makes sense. Do you think we should increase rx_length_errors here as well? this is all debugging tool for devices/drivers, right? so maybe not worth the noise. -- Sasha. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 2/2] virtio-net: Prevent NULL dereference
On Wed, Oct 05, 2011 at 03:50:14PM +0200, Sasha Levin wrote: On Mon, 2011-10-03 at 20:40 +0200, Michael S. Tsirkin wrote: On Wed, Sep 28, 2011 at 05:40:55PM +0300, Sasha Levin wrote: This patch prevents a NULL dereference when the user has passed a length longer than an actual buffer to virtio-net. Cc: Rusty Russell ru...@rustcorp.com.au Cc: Michael S. Tsirkin m...@redhat.com Cc: virtualizat...@lists.linux-foundation.org Cc: net...@vger.kernel.org Cc: kvm@vger.kernel.org Signed-off-by: Sasha Levin levinsasha...@gmail.com --- drivers/net/virtio_net.c | 12 +++- 1 files changed, 11 insertions(+), 1 deletions(-) diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index bde0dec..4a53d2a 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -208,12 +208,22 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi, return NULL; } - while (len) { + while (len page) { set_skb_frag(skb, page, offset, len); page = (struct page *)page-private; offset = 0; } + /* + * This is the case where we ran out of pages in our linked list, but + * supposedly have more data to read. Again, let's clarify that this only happens with broken devices. I think that the code within the if() makes it clear that it isn't the regular path. It doesn't make it clear that this never happens in absence of bugs. -- Sasha. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 8/9] KVM, VMX: Add support for guest/host-only profiling
On Wed, Oct 05, 2011 at 04:19:39PM +0200, Peter Zijlstra wrote: On Wed, 2011-10-05 at 14:01 +0200, Gleb Natapov wrote: +static void atomic_switch_perf_msrs(struct vcpu_vmx *vmx) +{ + int i, nr_msrs; + struct perf_guest_switch_msr *msrs; + + msrs = perf_guest_get_msrs(nr_msrs); + + if (!msrs) + return; + + for (i = 0; i nr_msrs; i++) + if (msrs[i].host == msrs[i].guest) + clear_atomic_switch_msr(vmx, msrs[i].msr); + else + add_atomic_switch_msr(vmx, msrs[i].msr, msrs[i].guest, + msrs[i].host); +} I don't think this will actually compile with PERF_EVENTS=n due to struct perf_guest_switch_msr not being defined. Oops you are right. Turns out it is not enough to remove PERF_EVENTS from .config to disable it. It re-appears after make oldconfig. Should I send incremental patch to fix that? -- Gleb. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 8/9] KVM, VMX: Add support for guest/host-only profiling
On Wed, 2011-10-05 at 17:29 +0200, Gleb Natapov wrote: On Wed, Oct 05, 2011 at 04:19:39PM +0200, Peter Zijlstra wrote: On Wed, 2011-10-05 at 14:01 +0200, Gleb Natapov wrote: +static void atomic_switch_perf_msrs(struct vcpu_vmx *vmx) +{ + int i, nr_msrs; + struct perf_guest_switch_msr *msrs; + + msrs = perf_guest_get_msrs(nr_msrs); + + if (!msrs) + return; + + for (i = 0; i nr_msrs; i++) + if (msrs[i].host == msrs[i].guest) + clear_atomic_switch_msr(vmx, msrs[i].msr); + else + add_atomic_switch_msr(vmx, msrs[i].msr, msrs[i].guest, + msrs[i].host); +} I don't think this will actually compile with PERF_EVENTS=n due to struct perf_guest_switch_msr not being defined. Oops you are right. Turns out it is not enough to remove PERF_EVENTS from .config to disable it. It re-appears after make oldconfig. Should I send incremental patch to fix that? Frederic, what's the status of being able to disable PERF on x86 again? -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Kernel bug - 3.1.0 - rc9
I'm not sure exactly where to report this (and the kernel.org reporting bugs page is still offline) RC9 of the 3.1 kernel is causing VirtualBox to fail (certainly here on amd64 - Phenom II 965 quad). syslog output is: Message from syslogd@Tex at Oct 5 16:16:51 ... kernel:Oops: [#1] PREEMPT SMP Message from syslogd@Tex at Oct 5 16:16:51 ... kernel:Stack: Message from syslogd@Tex at Oct 5 16:16:51 ... kernel:Call Trace: Message from syslogd@Tex at Oct 5 16:16:51 ... kernel:Code: ac e0 66 8b b3 b8 02 00 00 8b bb b0 02 00 00 48 c7 83 68 02 00 00 00 00 00 00 e8 68 ff ff ff 48 c7 c7 00 aa 62 a0 e8 8e d4 cd e0 Message from syslogd@Tex at Oct 5 16:16:51 ... kernel:CR2: 0088 vboxsrv drivers install correctly and there are no obvious errors from vbox itself. The bug manifests itself by simply hanging the VM at startup. Versions pre rc9 work fine in this regard. I hope that this is of some help. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] KVM: PPC: E500: Support hugetlbfs
On 10/05/2011 09:37 AM, Alexander Graf wrote: diff --git a/arch/powerpc/kvm/e500_tlb.c b/arch/powerpc/kvm/e500_tlb.c index ec17148..1dd96a9 100644 --- a/arch/powerpc/kvm/e500_tlb.c +++ b/arch/powerpc/kvm/e500_tlb.c @@ -24,6 +24,7 @@ #include linux/sched.h #include linux/rwsem.h #include linux/vmalloc.h +#include linux/hugetlb.h #include asm/kvm_ppc.h #include asm/kvm_e500.h @@ -673,12 +674,31 @@ static inline void kvmppc_e500_shadow_map(struct kvmppc_vcpu_e500 *vcpu_e500, pfn = ~(tsize_pages - 1); break; } + } else if (vma hva = vma-vm_start + (vma-vm_flags VM_HUGETLB)) { + unsigned long psize = vma_kernel_pagesize(vma); + + tsize = (gtlbe-mas1 MAS1_TSIZE_MASK) + MAS1_TSIZE_SHIFT; + + /* + * Take the largest page size that satisfies both host + * and guest mapping + */ + tsize = min(__ilog2(psize) - 10, tsize); Any reason for __ilog2() rather than ilog2()? Shouldn't make a difference, just curious about avoiding the public interface. Either way, Acked-by: Scott Wood scottw...@freescale.com -Scott -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: KVM: emulate lapic tsc deadline timer for guest
Yes, Avi has noticed this issue and fix the bug as attached. Thanks, Jinsong Dan Carpenter wrote: This patch causes a NULL dereference for me when I start qemu. [ 136.130978] BUG: unable to handle kernel NULL pointer dereference at 0078 [ 136.131032] IP: [a015a3d3] update_cpuid+0x63/0x90 [kvm] [ 136.131076] PGD 3fcac4067 PUD 3fc91a067 PMD 0 [ 136.131108] Oops: 0002 [#1] SMP [ 136.131132] CPU 0 [ 136.131145] Modules linked in: e1000e fuse kvm_intel kvm radeon ttm [last unloaded: e1000e] [ 136.131208] [ 136.131219] Pid: 2678, comm: qemu-system-x86 Not tainted 3.1.0-rc8-next-20110930+ #92 System manufacturer System Product Name/P8Z68-V PRO [ 136.131289] RIP: 0010:[a015a3d3] [a015a3d3] update_cpuid+0x63/0x90 [kvm] [ 136.131341] RSP: 0018:880404761d20 EFLAGS: 00010282 [ 136.131370] RAX: RBX: 8803fc408000 RCX: 8803fc408b18 [ 136.131408] RDX: 80802001 RSI: 0001 RDI: 8803fc408000 [ 136.131445] RBP: 880404761d28 R08: 0015 R09: 0003 [ 136.131483] R10: 0003 R11: R12: 8803fc408000 [ 136.131520] R13: 7fffb5e8d870 R14: 0015 R15: [ 136.131559] FS: 7f96c6324760() GS:88042f40() knlGS: [ 136.131602] CS: 0010 DS: ES: CR0: 80050033 [ 136.131632] CR2: 0078 CR3: 0003fcac5000 CR4: 000426f0 [ 136.131670] DR0: DR1: DR2: [ 136.131707] DR3: DR6: 0ff0 DR7: 0400 [ 136.131745] Process qemu-system-x86 (pid: 2678, threadinfo 88040476, task 880404618000) [ 136.131792] Stack: [ 136.131804] 880404761df8 a016635f [ 136.131851] 0015 0200 [ 136.131897] 88042f7ee000 880404761da8 a01beba9 c90006722000 [ 136.131944] Call Trace: [ 136.131966] [a016635f] kvm_arch_vcpu_ioctl+0xd5f/0x15d0 [kvm] [ 136.132005] [a01beba9] ? vmx_vcpu_load+0x39/0x1b0 [kvm_intel] [ 136.132046] [a01654fb] ? kvm_arch_vcpu_load+0x6b/0x170 [kvm] Here is the code listing from gdb: (gdb) list *(update_cpuid+0x63) 0xc3d3 is in update_cpuid (arch/x86/kvm/x86.c:618). 613 } 614 615 if (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL 616 best-function == 0x1) { 617 best-ecx |= bit(X86_FEATURE_TSC_DEADLINE_TIMER); 618 vcpu-arch.apic-lapic_timer.timer_mode_mask = (3 17); 619 } else 620 vcpu-arch.apic-lapic_timer.timer_mode_mask = (1 17); 621 } 622 (gdb) Reverting the patch fixes things for me. I'm using linux-next from Friday. regards, dan carpenter ---BeginMessage--- vcpu-arch.apic may be NULL. Signed-off-by: Avi Kivity a...@redhat.com --- arch/x86/kvm/x86.c |9 +++-- 1 files changed, 7 insertions(+), 2 deletions(-) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 83b839f..aa11707 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -600,6 +600,8 @@ static bool guest_cpuid_has_fsgsbase(struct kvm_vcpu *vcpu) static void update_cpuid(struct kvm_vcpu *vcpu) { struct kvm_cpuid_entry2 *best; + struct kvm_lapic *apic = vcpu-arch.apic; + u32 timer_mode_mask; best = kvm_find_cpuid_entry(vcpu, 1, 0); if (!best) @@ -615,9 +617,12 @@ static void update_cpuid(struct kvm_vcpu *vcpu) if (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL best-function == 0x1) { best-ecx |= bit(X86_FEATURE_TSC_DEADLINE_TIMER); - vcpu-arch.apic-lapic_timer.timer_mode_mask = (3 17); + timer_mode_mask = 3 17; } else - vcpu-arch.apic-lapic_timer.timer_mode_mask = (1 17); + timer_mode_mask = 1 17; + + if (apic) + apic-lapic_timer.timer_mode_mask = timer_mode_mask; } int kvm_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4) -- 1.7.6.3 ---End Message---
RE: [PATCH] KVM: emulate lapic tsc deadline timer for guest
Marcelo Tosatti wrote: On Sun, Sep 25, 2011 at 10:47:46PM +0800, Liu, Jinsong wrote: Marcelo Tosatti wrote: On Fri, Sep 23, 2011 at 04:25:51PM +0800, Liu, Jinsong wrote: Marcelo Tosatti wrote: On Thu, Sep 22, 2011 at 04:55:52PM +0800, Liu, Jinsong wrote: From 4d5b83aba40ce0d421add9a41a6c591a8590a32e Mon Sep 17 00:00:00 2001 From: Liu, Jinsong jinsong@intel.com Date: Thu, 22 Sep 2011 14:00:08 +0800 Subject: [PATCH 2/2] KVM: emulate lapic tsc deadline timer for guest This patch emulate lapic tsc deadline timer for guest: Enumerate tsc deadline timer capability by CPUID; Enable tsc deadline timer mode by lapic MMIO; Start tsc deadline timer by WRMSR; Signed-off-by: Liu, Jinsong jinsong@intel.com --- arch/x86/include/asm/kvm_host.h |2 + arch/x86/kvm/kvm_timer.h|2 + arch/x86/kvm/lapic.c| 123 --- arch/x86/kvm/lapic.h |3 + arch/x86/kvm/x86.c | 16 +- 5 files changed, 123 insertions(+), 23 deletions(-) Looks good, please rebase against branch master of git://github.com/avikivity/kvm.git Rebased as attached. Thanks, Jinsong Please write a simple test case to arm a lapic timer via wrmsr (see https://github.com/avikivity/kvm-unit-tests). Kernel patches have been applied, thanks. Marcelo, I'm not quite clear the purpose and usage of test case of the kvm-unit-tests. Can you give me some hint? The purpose is to add unit tests for new features (such as lapic deadline timer). There are examples that make it relatively easy to construct new test case (or modify existing ones to accomodate new tests). Please add a new test case for lapic deadline timer, thanks. Thanks Marcelo. I will add the test case. Sorry for slow email reply because of holiday. Regards, Jinsong -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: tcpdump locks up kvm host for a while.
On 10/04/2011 09:40 PM, Robin Lee Powell wrote: When I run tcpdump on a *guest*, the entire guest completely freezes up; no response even to hitting enter on the console. virsh list also locks up whenever it tries to print state about that VM (but the others work fine), as does any other operation that touches the state of that VM. The VM takes up 100% of CPU on one core while this is happening. Eventually it gets better. You can use 'perf kvm' to figure out where the guest is spinning. OK, gathered with: sudo perf kvm --guest --host record -o /tmp/kvm_perf -a I don't know how to read it at all, so it's at http://users.digitalkingdom.org/~rlpowell/media/public/kvm_perf Not accessible. Please post the output of 'perf kvm report log'. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
accessing a 'native' disk from KVM ?
HI all Using KVM (qemu-kvm-0.12.1) on CentOS-6. how can I supply an entire disk for KVM to use? Is this correct snippet in kvm.xml ? disk type='block' device='disk' driver name='qemu' type='raw'/ source dev='/dev/sdi1'/ target dev='vdb' bus='virtio'/ /disk thanks very much Sujee Maniyam http://sujee.net -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [kvm] accessing a 'native' disk from KVM ?
On Wed, Oct 05, 2011 at 11:41:13AM -0700, Sujee Maniyam wrote: HI all Using KVM (qemu-kvm-0.12.1) on CentOS-6. how can I supply an entire disk for KVM to use? Is this correct snippet in kvm.xml ? disk type='block' device='disk' driver name='qemu' type='raw'/ source dev='/dev/sdi1'/ target dev='vdb' bus='virtio'/ /disk This is what I have: disk type='block' device='disk' driver name='qemu' type='raw' cache='none'/ source dev='/dev/mapper/local-vrici_root'/ target dev='vda' bus='virtio'/ address type='pci' domain='0x' bus='0x00' slot='0x04' function='0x0'/ /disk So, yeah, pretty much, although note tat you're making /dev/sdi*1* available, which is a partition, not a disk. Regardless, http://publib.boulder.ibm.com/infocenter/lnxinfo/v3r0m0/index.jsp?topic=%2Fliaat%2Fliaattuncache.htm (and other things I've seen) suggest cache=none for these cases. -Robin -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] KVM: PPC: E500: Support hugetlbfs
On 05.10.2011, at 18:06, Scott Wood wrote: On 10/05/2011 09:37 AM, Alexander Graf wrote: diff --git a/arch/powerpc/kvm/e500_tlb.c b/arch/powerpc/kvm/e500_tlb.c index ec17148..1dd96a9 100644 --- a/arch/powerpc/kvm/e500_tlb.c +++ b/arch/powerpc/kvm/e500_tlb.c @@ -24,6 +24,7 @@ #include linux/sched.h #include linux/rwsem.h #include linux/vmalloc.h +#include linux/hugetlb.h #include asm/kvm_ppc.h #include asm/kvm_e500.h @@ -673,12 +674,31 @@ static inline void kvmppc_e500_shadow_map(struct kvmppc_vcpu_e500 *vcpu_e500, pfn = ~(tsize_pages - 1); break; } +} else if (vma hva = vma-vm_start + (vma-vm_flags VM_HUGETLB)) { +unsigned long psize = vma_kernel_pagesize(vma); + +tsize = (gtlbe-mas1 MAS1_TSIZE_MASK) +MAS1_TSIZE_SHIFT; + +/* + * Take the largest page size that satisfies both host + * and guest mapping + */ +tsize = min(__ilog2(psize) - 10, tsize); Any reason for __ilog2() rather than ilog2()? Shouldn't make a difference, just curious about avoiding the public interface. I grep'ed through the kernel tree and only found __ilog2 defined as well as mostly users for __ilog2, so I figured there's got to be a reason ;) Alex -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[QEMU PATCH] kvm: support TSC deadline MSR with subsection
Jinsong, please test this qemu-kvm patch by migrating a guest which is currently using TSC deadline timer. Using subsections avoids breaking migration to older qemu versions when the guest does not make use of TSC deadline feature. - From: Liu, Jinsong jinsong@intel.com KVM add emulation of lapic tsc deadline timer for guest. This patch is co-operation work at qemu side. Use subsections to save/restore the field (mtosatti). Signed-off-by: Liu, Jinsong jinsong@intel.com diff --git a/target-i386/cpu.h b/target-i386/cpu.h index ae36489..29412dc 100644 --- a/target-i386/cpu.h +++ b/target-i386/cpu.h @@ -283,6 +283,7 @@ #define MSR_IA32_APICBASE_BSP (18) #define MSR_IA32_APICBASE_ENABLE(111) #define MSR_IA32_APICBASE_BASE (0xf12) +#define MSR_IA32_TSCDEADLINE0x6e0 #define MSR_MTRRcap0xfe #define MSR_MTRRcap_VCNT 8 @@ -687,6 +688,7 @@ typedef struct CPUX86State { uint64_t async_pf_en_msr; uint64_t tsc; +uint64_t tsc_deadline; uint64_t mcg_status; diff --git a/target-i386/kvm.c b/target-i386/kvm.c index b6eef04..90a6ffb 100644 --- a/target-i386/kvm.c +++ b/target-i386/kvm.c @@ -59,6 +59,7 @@ const KVMCapabilityInfo kvm_arch_required_capabilities[] = { static bool has_msr_star; static bool has_msr_hsave_pa; +static bool has_msr_tsc_deadline; static bool has_msr_async_pf_en; static int lm_capable_kernel; @@ -568,6 +569,10 @@ static int kvm_get_supported_msrs(KVMState *s) has_msr_hsave_pa = true; continue; } +if (kvm_msr_list-indices[i] == MSR_IA32_TSCDEADLINE) { +has_msr_tsc_deadline = true; +continue; +} } } @@ -881,6 +886,9 @@ static int kvm_put_msrs(CPUState *env, int level) if (has_msr_hsave_pa) { kvm_msr_entry_set(msrs[n++], MSR_VM_HSAVE_PA, env-vm_hsave); } +if (has_msr_tsc_deadline) { +kvm_msr_entry_set(msrs[n++], MSR_IA32_TSCDEADLINE, env-tsc_deadline); +} #ifdef TARGET_X86_64 if (lm_capable_kernel) { kvm_msr_entry_set(msrs[n++], MSR_CSTAR, env-cstar); @@ -1127,6 +1135,9 @@ static int kvm_get_msrs(CPUState *env) if (has_msr_hsave_pa) { msrs[n++].index = MSR_VM_HSAVE_PA; } +if (has_msr_tsc_deadline) { +msrs[n++].index = MSR_IA32_TSCDEADLINE; +} if (!env-tsc_valid) { msrs[n++].index = MSR_IA32_TSC; @@ -1195,6 +1206,9 @@ static int kvm_get_msrs(CPUState *env) case MSR_IA32_TSC: env-tsc = msrs[i].data; break; +case MSR_IA32_TSCDEADLINE: +env-tsc_deadline = msrs[i].data; +break; case MSR_VM_HSAVE_PA: env-vm_hsave = msrs[i].data; break; diff --git a/target-i386/machine.c b/target-i386/machine.c index 9aca8e0..176d372 100644 --- a/target-i386/machine.c +++ b/target-i386/machine.c @@ -310,6 +310,24 @@ static const VMStateDescription vmstate_fpop_ip_dp = { } }; +static bool tscdeadline_needed(void *opaque) +{ +CPUState *env = opaque; + +return env-tsc_deadline != 0; +} + +static const VMStateDescription vmstate_msr_tscdeadline = { +.name = cpu/msr_tscdeadline, +.version_id = 1, +.minimum_version_id = 1, +.minimum_version_id_old = 1, +.fields = (VMStateField []) { +VMSTATE_UINT64(tsc_deadline, CPUState), +VMSTATE_END_OF_LIST() +} +}; + static const VMStateDescription vmstate_cpu = { .name = cpu, .version_id = CPU_SAVE_VERSION, @@ -420,6 +438,9 @@ static const VMStateDescription vmstate_cpu = { } , { .vmsd = vmstate_fpop_ip_dp, .needed = fpop_ip_dp_needed, +}, { +.vmsd = vmstate_msr_tscdeadline, +.needed = tscdeadline_needed, } , { /* empty */ } -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] KVM: PPC: E500: Support hugetlbfs
On 10/05/2011 01:55 PM, Alexander Graf wrote: On 05.10.2011, at 18:06, Scott Wood wrote: Any reason for __ilog2() rather than ilog2()? Shouldn't make a difference, just curious about avoiding the public interface. I grep'ed through the kernel tree and only found __ilog2 defined as well as mostly users for __ilog2, so I figured there's got to be a reason ;) ilog2() is defined in include/linux/ilog2.h. It produces constant output if the input is constant, and appears to be the front door to __ilog2_u32/__ilog2_u64. Plain __ilog2 is older and powerpc-specific, which is probably why there are more users of that in arch/powerpc. -Scott -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Perf for only one VM?
Is it possible to get perf kvm record to record only what's going on on a single VM? The options seem to be --host (all host data) and --guest (all VM data). I'm getting 74.86% qemu-kvm [unknown][u] 0x7f3795bc408d, and I think that's because I've got kallsyms loaded from a particular guest, but not the others. -Robin -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/5] virtio: support unlocked queue kick
Split virtqueue_kick to be able to do the actual notification outside the lock protecting the virtqueue. This patch was originally done by Stefan Hajnoczi, but I can't find the original one anymore and had to recreated it from memory. Pointers to the original or corrections for the commit message are welcome. Index: linux-2.6/drivers/virtio/virtio_ring.c === --- linux-2.6.orig/drivers/virtio/virtio_ring.c 2011-09-15 15:28:55.891347016 +0200 +++ linux-2.6/drivers/virtio/virtio_ring.c 2011-10-03 18:45:32.492738431 +0200 @@ -237,9 +237,11 @@ add_head: } EXPORT_SYMBOL_GPL(virtqueue_add_buf_gfp); -void virtqueue_kick(struct virtqueue *_vq) +bool virtqueue_kick_prepare(struct virtqueue *_vq) { struct vring_virtqueue *vq = to_vvq(_vq); + bool need_kick = false; + u16 new, old; START_USE(vq); /* Descriptors and available array need to be set before we expose the @@ -253,15 +255,32 @@ void virtqueue_kick(struct virtqueue *_v /* Need to update avail index before checking if we should notify */ virtio_mb(); - if (vq-event ? - vring_need_event(vring_avail_event(vq-vring), new, old) : - !(vq-vring.used-flags VRING_USED_F_NO_NOTIFY)) - /* Prod other side to tell it about changes. */ - vq-notify(vq-vq); - + if (vq-event) { + if (vring_need_event(vring_avail_event(vq-vring), new, old)) + need_kick = true; + } else { + if (!(vq-vring.used-flags VRING_USED_F_NO_NOTIFY)) + need_kick = true; + } END_USE(vq); + return need_kick; +} +EXPORT_SYMBOL_GPL(virtqueue_kick_prepare); + +void virtqueue_notify(struct virtqueue *_vq) +{ + struct vring_virtqueue *vq = to_vvq(_vq); + + /* Prod other side to tell it about changes. */ + vq-notify(_vq); +} +EXPORT_SYMBOL_GPL(virtqueue_notify); + +void virtqueue_kick(struct virtqueue *vq) +{ + if (virtqueue_kick_prepare(vq)) + virtqueue_notify(vq); } -EXPORT_SYMBOL_GPL(virtqueue_kick); static void detach_buf(struct vring_virtqueue *vq, unsigned int head) { Index: linux-2.6/include/linux/virtio.h === --- linux-2.6.orig/include/linux/virtio.h 2011-09-15 15:28:57.078857804 +0200 +++ linux-2.6/include/linux/virtio.h2011-10-03 18:41:07.309766531 +0200 @@ -86,6 +86,8 @@ static inline int virtqueue_add_buf(stru } void virtqueue_kick(struct virtqueue *vq); +bool virtqueue_kick_prepare(struct virtqueue *vq); +void virtqueue_notify(struct virtqueue *vq); void *virtqueue_get_buf(struct virtqueue *vq, unsigned int *len); -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 5/5] virtio-blk: implement -make_request
Add an alternate I/O path that implements -make_request for virtio-blk. This is required for high IOPs devices which get slowed down to 1/5th of the native speed by all the locking, memory allocation and other overhead in the request based I/O path. This patch is not quite merge ready due to two issues: - it doesn't implement FUA and FLUSH requests yet - it hardcodes which I/O path to chose Signed-off-by: Christoph Hellwig h...@lst.de Index: linux-2.6/drivers/block/virtio_blk.c === --- linux-2.6.orig/drivers/block/virtio_blk.c 2011-10-05 10:36:42.883913334 -0400 +++ linux-2.6/drivers/block/virtio_blk.c2011-10-05 15:29:35.591405323 -0400 @@ -11,6 +11,8 @@ #define PART_BITS 4 +static int use_make_request = 1; + static int major, index; struct workqueue_struct *virtblk_wq; @@ -20,6 +22,7 @@ struct virtio_blk struct virtio_device *vdev; struct virtqueue *vq; + wait_queue_head_t queue_wait; /* The disk structure for the kernel. */ struct gendisk *disk; @@ -39,11 +42,13 @@ struct virtio_blk struct virtblk_req { void *private; + struct virtblk_req *next; struct virtio_blk_outhdr out_hdr; struct virtio_scsi_inhdr in_hdr; u8 kind; #define VIRTIO_BLK_REQUEST 0x00 -#define VIRTIO_BLK_INTERNAL0x01 +#define VIRTIO_BLK_BIO 0x01 +#define VIRTIO_BLK_INTERNAL0x02 u8 status; }; @@ -74,10 +79,17 @@ static void virtblk_request_done(struct mempool_free(vbr, vblk-pool); } +static void virtblk_bio_done(struct virtio_blk *vblk, + struct virtblk_req *vbr) +{ + bio_endio(vbr-private, virtblk_result(vbr)); + mempool_free(vbr, vblk-pool); +} + static void blk_done(struct virtqueue *vq) { struct virtio_blk *vblk = vq-vdev-priv; - struct virtblk_req *vbr; + struct virtblk_req *vbr, *head = NULL, *tail = NULL; unsigned int len; unsigned long flags; @@ -88,15 +100,47 @@ static void blk_done(struct virtqueue *v virtblk_request_done(vblk, vbr); break; case VIRTIO_BLK_INTERNAL: - complete(vbr-private); + case VIRTIO_BLK_BIO: + if (head) { + tail-next = vbr; + tail = vbr; + } else { + tail = head = vbr; + } break; default: BUG(); } } - /* In case queue is stopped waiting for more buffers. */ - blk_start_queue(vblk-disk-queue); + + if (!use_make_request) { + /* In case queue is stopped waiting for more buffers. */ + blk_start_queue(vblk-disk-queue); + } spin_unlock_irqrestore(vblk-lock, flags); + + wake_up(vblk-queue_wait); + + /* +* Process completions after freeing up space in the virtqueue and +* dropping the lock. +*/ + while (head) { + vbr = head; + head = head-next; + + switch (vbr-kind) { + case VIRTIO_BLK_BIO: + virtblk_bio_done(vblk, vbr); + break; + case VIRTIO_BLK_INTERNAL: + complete(vbr-private); + break; + default: + BUG(); + } + + } } static bool do_req(struct request_queue *q, struct virtio_blk *vblk, @@ -111,6 +155,7 @@ static bool do_req(struct request_queue return false; vbr-private = req; + vbr-next = NULL; vbr-kind = VIRTIO_BLK_REQUEST; if (req-cmd_flags REQ_FLUSH) { @@ -199,6 +244,128 @@ static void do_virtblk_request(struct re virtqueue_kick(vblk-vq); } +struct virtblk_plug_cb { + struct blk_plug_cb cb; + struct virtio_blk *vblk; +}; + +static void virtblk_unplug(struct blk_plug_cb *bcb) +{ + struct virtblk_plug_cb *cb = + container_of(bcb, struct virtblk_plug_cb, cb); + + virtqueue_notify(cb-vblk-vq); + kfree(cb); +} + +static bool virtblk_plugged(struct virtio_blk *vblk) +{ + struct blk_plug *plug = current-plug; + struct virtblk_plug_cb *cb; + + if (!plug) + return false; + + list_for_each_entry(cb, plug-cb_list, cb.list) { + if (cb-cb.callback == virtblk_unplug cb-vblk == vblk) + return true; + } + + /* Not currently on the callback list */ + cb = kmalloc(sizeof(*cb), GFP_ATOMIC); + if (!cb) + return false; + + cb-vblk = vblk; + cb-cb.callback = virtblk_unplug; + list_add(cb-cb.list, plug-cb_list); + return true; +} + +static void
[PATCH 3/5] virtio-blk: remove the unused list of pending requests
Signed-off-by: Christoph Hellwig h...@lst.de Index: linux-2.6/drivers/block/virtio_blk.c === --- linux-2.6.orig/drivers/block/virtio_blk.c 2011-10-03 19:55:29.061215040 +0200 +++ linux-2.6/drivers/block/virtio_blk.c2011-10-03 19:55:54.196412176 +0200 @@ -24,9 +24,6 @@ struct virtio_blk /* The disk structure for the kernel. */ struct gendisk *disk; - /* Request tracking. */ - struct list_head reqs; - mempool_t *pool; /* Process context for config space updates */ @@ -41,7 +38,6 @@ struct virtio_blk struct virtblk_req { - struct list_head list; struct request *req; struct virtio_blk_outhdr out_hdr; struct virtio_scsi_inhdr in_hdr; @@ -85,7 +81,6 @@ static void blk_done(struct virtqueue *v } __blk_end_request_all(vbr-req, error); - list_del(vbr-list); mempool_free(vbr, vblk-pool); } /* In case queue is stopped waiting for more buffers. */ @@ -170,7 +165,6 @@ static bool do_req(struct request_queue return false; } - list_add_tail(vbr-list, vblk-reqs); return true; } @@ -368,7 +362,6 @@ static int __devinit virtblk_probe(struc goto out; } - INIT_LIST_HEAD(vblk-reqs); spin_lock_init(vblk-lock); vblk-vdev = vdev; vblk-sg_elems = sg_elems; @@ -526,9 +519,6 @@ static void __devexit virtblk_remove(str flush_work(vblk-config_work); - /* Nothing should be pending. */ - BUG_ON(!list_empty(vblk-reqs)); - /* Stop all the virtqueues. */ vdev-config-reset(vdev); -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 0/5] RFC: -make_request support for virtio-blk
This patchset allows the virtio-blk driver to support much higher IOP rates which can be driven out of modern PCI-e flash devices. At this point it really is just a RFC due to various issues. The first four patches are infrastructure that could go in fairly soon as far as I'm concerned. Patch 5 implements the actual -make_request support and still has a few issues, see there for more details. With it I can driver my PCI-e test devices to 85-90% of the native IOPS and bandwith, but be warned that this is still a fairly low end setup as far as expensive flash storage is concerned. One big downside that is has is that it current exposes a nasty race in the qemu virtqueue code - just running xfstests inside a guest using the new virtio-blk driver (even on a slow device) will trigger it and lead to a filesystem shutdown. I've tracked it down to getting data I/O segments overwritten with status s/g list entries, but got lost at that point. I can start a separate thread on it. Besides that it is missing a few features, and we have to decided how to select which mode to use in virtio-blk - either a module option, sysfs attribute or something that the host communicates. Or maybe decide that just going with -make_request alone is fine, even on my cheap laptop SSD it actually is just as fast if not slightly faster than the request based variant on my laptop. There are a few other bottlenecks in virtio that this exposes. The first one is the low queue length of just 128 entries in the virtio-blk queue - to drive higher IOPs with a deep queue we absolutely need to increment that. Comments welcome! -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 4/5] virtio-blk: reimplement the serial attribute without using requests
If we want to do bio-based I/O in virtio-blk we have to implement reading the serial attribute ourselves. Do that and also prepare struct virtblk_req for dealing with different types of requests. Signed-off-by: Christoph Hellwig h...@lst.de Index: linux-2.6/drivers/block/virtio_blk.c === --- linux-2.6.orig/drivers/block/virtio_blk.c 2011-10-03 20:32:12.997713070 +0200 +++ linux-2.6/drivers/block/virtio_blk.c2011-10-03 20:37:28.836714193 +0200 @@ -38,12 +38,42 @@ struct virtio_blk struct virtblk_req { - struct request *req; + void *private; struct virtio_blk_outhdr out_hdr; struct virtio_scsi_inhdr in_hdr; + u8 kind; +#define VIRTIO_BLK_REQUEST 0x00 +#define VIRTIO_BLK_INTERNAL0x01 u8 status; }; +static inline int virtblk_result(struct virtblk_req *vbr) +{ + switch (vbr-status) { + case VIRTIO_BLK_S_OK: + return 0; + case VIRTIO_BLK_S_UNSUPP: + return -ENOTTY; + default: + return -EIO; + } +} + +static void virtblk_request_done(struct virtio_blk *vblk, + struct virtblk_req *vbr) +{ + struct request *req = vbr-private; + + if (req-cmd_type == REQ_TYPE_BLOCK_PC) { + req-resid_len = vbr-in_hdr.residual; + req-sense_len = vbr-in_hdr.sense_len; + req-errors = vbr-in_hdr.errors; + } + + __blk_end_request_all(req, virtblk_result(vbr)); + mempool_free(vbr, vblk-pool); +} + static void blk_done(struct virtqueue *vq) { struct virtio_blk *vblk = vq-vdev-priv; @@ -53,35 +83,16 @@ static void blk_done(struct virtqueue *v spin_lock_irqsave(vblk-lock, flags); while ((vbr = virtqueue_get_buf(vblk-vq, len)) != NULL) { - int error; - - switch (vbr-status) { - case VIRTIO_BLK_S_OK: - error = 0; - break; - case VIRTIO_BLK_S_UNSUPP: - error = -ENOTTY; - break; - default: - error = -EIO; - break; - } - - switch (vbr-req-cmd_type) { - case REQ_TYPE_BLOCK_PC: - vbr-req-resid_len = vbr-in_hdr.residual; - vbr-req-sense_len = vbr-in_hdr.sense_len; - vbr-req-errors = vbr-in_hdr.errors; + switch (vbr-kind) { + case VIRTIO_BLK_REQUEST: + virtblk_request_done(vblk, vbr); break; - case REQ_TYPE_SPECIAL: - vbr-req-errors = (error != 0); + case VIRTIO_BLK_INTERNAL: + complete(vbr-private); break; default: - break; + BUG(); } - - __blk_end_request_all(vbr-req, error); - mempool_free(vbr, vblk-pool); } /* In case queue is stopped waiting for more buffers. */ blk_start_queue(vblk-disk-queue); @@ -99,28 +110,24 @@ static bool do_req(struct request_queue /* When another request finishes we'll try again. */ return false; - vbr-req = req; + vbr-private = req; + vbr-kind = VIRTIO_BLK_REQUEST; if (req-cmd_flags REQ_FLUSH) { vbr-out_hdr.type = VIRTIO_BLK_T_FLUSH; vbr-out_hdr.sector = 0; - vbr-out_hdr.ioprio = req_get_ioprio(vbr-req); + vbr-out_hdr.ioprio = req_get_ioprio(req); } else { switch (req-cmd_type) { case REQ_TYPE_FS: vbr-out_hdr.type = 0; - vbr-out_hdr.sector = blk_rq_pos(vbr-req); - vbr-out_hdr.ioprio = req_get_ioprio(vbr-req); + vbr-out_hdr.sector = blk_rq_pos(req); + vbr-out_hdr.ioprio = req_get_ioprio(req); break; case REQ_TYPE_BLOCK_PC: vbr-out_hdr.type = VIRTIO_BLK_T_SCSI_CMD; vbr-out_hdr.sector = 0; - vbr-out_hdr.ioprio = req_get_ioprio(vbr-req); - break; - case REQ_TYPE_SPECIAL: - vbr-out_hdr.type = VIRTIO_BLK_T_GET_ID; - vbr-out_hdr.sector = 0; - vbr-out_hdr.ioprio = req_get_ioprio(vbr-req); + vbr-out_hdr.ioprio = req_get_ioprio(req); break; default: /* We don't put anything else in the queue. */ @@ -136,13 +143,14 @@ static bool do_req(struct request_queue * block, and before the normal inhdr we put the sense data and the * inhdr with
[PATCH 1/5] block: add bio_map_sg
Add a helper to map a bio to a scatterlist, modelled after blk_rq_map_sg. This helper is useful for any driver that wants to create a scatterlist from its -make_request method. Signed-off-by: Christoph Hellwig h...@lst.de Index: linux-2.6/block/blk-merge.c === --- linux-2.6.orig/block/blk-merge.c2011-10-04 11:49:32.857014742 -0400 +++ linux-2.6/block/blk-merge.c 2011-10-04 13:37:51.305630951 -0400 @@ -199,6 +199,69 @@ new_segment: } EXPORT_SYMBOL(blk_rq_map_sg); +/* + * map a bio to a scatterlist, return number of sg entries setup. Caller + * must make sure sg can hold bio-bi_phys_segments entries + */ +int bio_map_sg(struct request_queue *q, struct bio *bio, + struct scatterlist *sglist) +{ + struct bio_vec *bvec, *bvprv; + struct scatterlist *sg; + int nsegs, cluster; + unsigned long i; + + nsegs = 0; + cluster = blk_queue_cluster(q); + + bvprv = NULL; + sg = NULL; + bio_for_each_segment(bvec, bio, i) { + int nbytes = bvec-bv_len; + + if (bvprv cluster) { + if (sg-length + nbytes queue_max_segment_size(q)) + goto new_segment; + + if (!BIOVEC_PHYS_MERGEABLE(bvprv, bvec)) + goto new_segment; + if (!BIOVEC_SEG_BOUNDARY(q, bvprv, bvec)) + goto new_segment; + + sg-length += nbytes; + } else { +new_segment: + if (!sg) + sg = sglist; + else { + /* +* If the driver previously mapped a shorter +* list, we could see a termination bit +* prematurely unless it fully inits the sg +* table on each mapping. We KNOW that there +* must be more entries here or the driver +* would be buggy, so force clear the +* termination bit to avoid doing a full +* sg_init_table() in drivers for each command. +*/ + sg-page_link = ~0x02; + sg = sg_next(sg); + } + + sg_set_page(sg, bvec-bv_page, nbytes, bvec-bv_offset); + nsegs++; + } + bvprv = bvec; + } /* segments in bio */ + + if (sg) + sg_mark_end(sg); + + BUG_ON(bio-bi_phys_segments nsegs bio-bi_phys_segments); + return nsegs; +} +EXPORT_SYMBOL(bio_map_sg); + static inline int ll_new_hw_segment(struct request_queue *q, struct request *req, struct bio *bio) Index: linux-2.6/include/linux/blkdev.h === --- linux-2.6.orig/include/linux/blkdev.h 2011-10-04 13:37:13.216148915 -0400 +++ linux-2.6/include/linux/blkdev.h2011-10-04 13:37:51.317613617 -0400 @@ -854,6 +854,8 @@ extern void blk_queue_flush_queueable(st extern struct backing_dev_info *blk_get_backing_dev_info(struct block_device *bdev); extern int blk_rq_map_sg(struct request_queue *, struct request *, struct scatterlist *); +extern int bio_map_sg(struct request_queue *q, struct bio *bio, + struct scatterlist *sglist); extern void blk_dump_rq_flags(struct request *, char *); extern long nr_blockdev_pages(void); -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] pci_cap_init: add 82599 VF quirk
On Tue, Oct 04, 2011 at 03:02:18PM -0400, Donald Dutile wrote: v2: Updated to reflect changes requested by reviewers. Add check when PCIe capability structure is version 0 and VID and DID is 82599 VF. In this case, the size of the PCIe cap structure should be the same as a version 2 cap structure. Documented in 82599 Errata 35, and is still marked No Fix. According to Intel, it's in silicon not fw, and needs a sw workaround. Signed-off-by: Donald Dutile ddut...@redhat.com --- hw/device-assignment.c | 17 +++-- 1 files changed, 15 insertions(+), 2 deletions(-) Applied, thanks. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [kvm] Re: tcpdump locks up kvm host for a while.
On Wed, Oct 05, 2011 at 11:36:53AM -0700, Robin Lee Powell wrote: On Wed, Oct 05, 2011 at 08:21:53PM +0200, Avi Kivity wrote: On 10/04/2011 09:40 PM, Robin Lee Powell wrote: When I run tcpdump on a *guest*, the entire guest completely freezes up; no response even to hitting enter on the console. virsh list also locks up whenever it tries to print state about that VM (but the others work fine), as does any other operation that touches the state of that VM. The VM takes up 100% of CPU on one core while this is happening. Eventually it gets better. You can use 'perf kvm' to figure out where the guest is spinning. OK, gathered with: sudo perf kvm --guest --host record -o /tmp/kvm_perf -a I don't know how to read it at all, so it's at http://users.digitalkingdom.org/~rlpowell/media/public/kvm_perf Not accessible. -_- Fixed. Please post the output of 'perf kvm report log'. # Events: 42K # # Overhead Command Shared Object Symbol # .. # 100.00% qemu-kvm [unknown] [g] 0x8111c7a5 # # (For a higher level overview, try: perf report --sort comm,dso) # How helpful is that? -_- I'm guessing I need --guestkallsyms= ; since they're all the same kernel I thought it'd figure it out. I'll redo. OK, here's a better version. # Events: 46K cycles # # Overhead CommandShared Object Symbol # ... ... # 74.81% qemu-kvm [unknown][u] 0x7fbdffd4c18a 25.14% qemu-kvm [guest.kernel.kallsyms] [g] 0x82f0 0.03% qemu-kvm [virtio_net] [g] 0x83e8 0.01% qemu-kvm [virtio_balloon] [g] 0x103b 0.00% qemu-kvm [ip6_tables] [g] compat_standard_to_user 0.00% qemu-kvm [ipv6] [g] icmpv6_send 0.00% qemu-kvm [virtio_blk] [g] 0x7783 0.00% qemu-kvm [ipv6] [g] raw6_seq_show 0.00% qemu-kvm [ipv6] [g] icmpv6_rcv 0.00% qemu-kvm [virtio_net] [g] fini 0.00% qemu-kvm [ip6table_filter][g] 0x9b5 # # (For a higher level overview, try: perf report --sort comm,dso) # The file is also updated. -Robin -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/5] RFC: -make_request support for virtio-blk
On Wed, Oct 05, 2011 at 03:54:03PM -0400, Christoph Hellwig wrote: This patchset allows the virtio-blk driver to support much higher IOP rates which can be driven out of modern PCI-e flash devices. At this point it really is just a RFC due to various issues. So you no longer believe that request queue overhead can be brought down to mangeable levels for these fast devices. And instead go for bio based drivers and give up on merging and implement own FLUSH/FUA machinery. Not that I am advocating for continuing with request based driver. Just curious.. Thanks Vivek -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/5] RFC: -make_request support for virtio-blk
On Wed, Oct 05, 2011 at 04:31:16PM -0400, Vivek Goyal wrote: So you no longer believe that request queue overhead can be brought down to mangeable levels for these fast devices. And instead go for bio based drivers and give up on merging and implement own FLUSH/FUA machinery. Not in a reasonable time frame. Having a common interface is still the grand plan, but the virtio performance issues are hurting people so badly that we need a fix soon. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Perf for only one VM?
On 10/05/2011 01:54 PM, Robin Lee Powell wrote: Is it possible to get perf kvm record to record only what's going on on a single VM? The options seem to be --host (all host data) and --guest (all VM data). -p pid argument after the record. David -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/5] block: add bio_map_sg
On 10/05/2011 09:54 PM, Christoph Hellwig wrote: Add a helper to map a bio to a scatterlist, modelled after blk_rq_map_sg. This helper is useful for any driver that wants to create a scatterlist from its -make_request method. Signed-off-by: Christoph Hellwig h...@lst.de I have some questions. - Could we later use this bio_map_sg() to implement blk_rq_map_sg() and remove some duplicated code? - Don't you need to support a chained bio (bio-next != NULL)? Because I did not see any looping in the last patch [PATCH 5/5] virtio-blk: implement -make_request Or is it that -make_request() is a single bio at a time? If so could we benefit from both bio-chaining and sg-chaning to make bigger IOs? Thanks Boaz Index: linux-2.6/block/blk-merge.c === --- linux-2.6.orig/block/blk-merge.c 2011-10-04 11:49:32.857014742 -0400 +++ linux-2.6/block/blk-merge.c 2011-10-04 13:37:51.305630951 -0400 @@ -199,6 +199,69 @@ new_segment: } EXPORT_SYMBOL(blk_rq_map_sg); +/* + * map a bio to a scatterlist, return number of sg entries setup. Caller + * must make sure sg can hold bio-bi_phys_segments entries + */ +int bio_map_sg(struct request_queue *q, struct bio *bio, + struct scatterlist *sglist) +{ + struct bio_vec *bvec, *bvprv; + struct scatterlist *sg; + int nsegs, cluster; + unsigned long i; + + nsegs = 0; + cluster = blk_queue_cluster(q); + + bvprv = NULL; + sg = NULL; + bio_for_each_segment(bvec, bio, i) { + int nbytes = bvec-bv_len; + + if (bvprv cluster) { + if (sg-length + nbytes queue_max_segment_size(q)) + goto new_segment; + + if (!BIOVEC_PHYS_MERGEABLE(bvprv, bvec)) + goto new_segment; + if (!BIOVEC_SEG_BOUNDARY(q, bvprv, bvec)) + goto new_segment; + + sg-length += nbytes; + } else { +new_segment: + if (!sg) + sg = sglist; + else { + /* + * If the driver previously mapped a shorter + * list, we could see a termination bit + * prematurely unless it fully inits the sg + * table on each mapping. We KNOW that there + * must be more entries here or the driver + * would be buggy, so force clear the + * termination bit to avoid doing a full + * sg_init_table() in drivers for each command. + */ + sg-page_link = ~0x02; + sg = sg_next(sg); + } + + sg_set_page(sg, bvec-bv_page, nbytes, bvec-bv_offset); + nsegs++; + } + bvprv = bvec; + } /* segments in bio */ + + if (sg) + sg_mark_end(sg); + + BUG_ON(bio-bi_phys_segments nsegs bio-bi_phys_segments); + return nsegs; +} +EXPORT_SYMBOL(bio_map_sg); + static inline int ll_new_hw_segment(struct request_queue *q, struct request *req, struct bio *bio) Index: linux-2.6/include/linux/blkdev.h === --- linux-2.6.orig/include/linux/blkdev.h 2011-10-04 13:37:13.216148915 -0400 +++ linux-2.6/include/linux/blkdev.h 2011-10-04 13:37:51.317613617 -0400 @@ -854,6 +854,8 @@ extern void blk_queue_flush_queueable(st extern struct backing_dev_info *blk_get_backing_dev_info(struct block_device *bdev); extern int blk_rq_map_sg(struct request_queue *, struct request *, struct scatterlist *); +extern int bio_map_sg(struct request_queue *q, struct bio *bio, + struct scatterlist *sglist); extern void blk_dump_rq_flags(struct request *, char *); extern long nr_blockdev_pages(void); -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/ -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [kvm] Re: Perf for only one VM?
On Wed, Oct 05, 2011 at 04:03:13PM -0600, David Ahern wrote: On 10/05/2011 01:54 PM, Robin Lee Powell wrote: Is it possible to get perf kvm record to record only what's going on on a single VM? The options seem to be --host (all host data) and --guest (all VM data). -p pid argument after the record. Thanks! -Robin -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH 2/2] LAPIC: make lapic support cpu hotplug
On Wed, Oct 5, 2011 at 7:01 PM, Jan Kiszka jan.kis...@web.de wrote: On 2011-10-05 12:26, liu ping fan wrote: And make the creation of apic as part of cpu initialization, so apic's state has been ready, before setting kvm_apic. There is no kvm-apic upstream yet, so it's hard to judge why we need this here. If we do, this has to be a separate patch. But I seriously doubt we need it (my hack worked without it, and that was not because of its hack nature). Sorry, I did not explain it clearly. What I mean is that “env-apic_state” must be prepared before qemu_kvm_cpu_thread_fn() - ... - kvm_put_sregs(), where we get apic_base by “ sregs.apic_base = cpu_get_apic_base(env-apic_state);” and then call “kvm_vcpu_ioctl(env, KVM_SET_SREGS, sregs);” which will finally affect the kvm_apic structure in kernel. But as current code, in pc_new_cpu(), we call apic_init() to initialize apic_state, after cpu_init(), so we can not guarantee the order of apic_state initializaion and the setting to kernel. Because LAPIC is part of x86 chip, I want to move it into cpu_x86_init(), and ensure apic_init() called before thread “qemu_kvm_cpu_thread_fn()” creation. The LAPIC is part of the CPU, the classic APIC was a dedicated chip. Sorry, a little puzzle. I think x86 interrupt system consists of two parts: IOAPIC/LAPIC. For we have hw/ioapic.c to simulate IOAPIC, I think hw/apic.c takes the role as LAPIC, especially that we create an APICState instance for each CPUX86State, just like each LAPIC for x86 CPU in real machine. So we can consider apic_init() to create a LAPIC instance, rather than create a classic APIC? I guess If there is lack of something in IOAPIC/LAPIC bus topology, that will be the arbitrator of ICC bus, right? So the classic APIC was a dedicated chip what you said, play this role, right? Could you tell me a sample chipset of APIC, and I can increase my knowledge about it, thanks. For various reasons, a safer approach for creating a new CPU is to stop the machine, add the new device models, run cpu_synchronize_post_init on that new cpu (looks like you missed that) and then resume everything. See http://git.kiszka.org/?p=qemu-kvm.git;a=commitdiff;h=be8f21c6b54eac82f7add7ee9d4ecf9cb8ebb320 Great job. And I am interesting about it. Could you give some sample reason about why we need to stop the machine, or list all of the reasons, so we can resolve it one by one. I can not figure out such scenes by myself. From my view, especially for KVM, the creation of vcpu are protected well by lock mechanism from other vcpu threads in kernel, so we need not to stop all of the threads. Hope for suggestion and direct from you, and make a further step for CPU hot-plug feature, Thanks and regards, ping fan ... diff --git a/hw/icc_bus.c b/hw/icc_bus.c new file mode 100644 index 000..360ca2a --- /dev/null +++ b/hw/icc_bus.c @@ -0,0 +1,62 @@ +/* +*/ +#define ICC_BUS_PLUG +#ifdef ICC_BUS_PLUG +#include icc_bus.h + + + +struct icc_bus_info icc_info = { + .qinfo.name = icc, + .qinfo.size = sizeof(struct icc_bus), + .qinfo.props = (Property[]) { + DEFINE_PROP_END_OF_LIST(), + } + +}; + + +static const VMStateDescription vmstate_icc_bus = { + .name = icc_bus, + .version_id = 1, + .minimum_version_id = 1, + .minimum_version_id_old = 1, + .pre_save = NULL, + .post_load = NULL, +}; + +struct icc_bus *g_iccbus; + +struct icc_bus *icc_init_bus(DeviceState *parent, const char *name) +{ + struct icc_bus *bus; + + bus = FROM_QBUS(icc_bus, qbus_create(icc_info.qinfo, parent, name)); + bus-qbus.allow_hotplug = 1; /* Yes, we can */ + bus-qbus.name = icc; + vmstate_register(NULL, -1, vmstate_icc_bus, bus); The chipset is the owner of this bus and instantiates it. So it also provides a vmstate. You can drop this unneeded one here (it's created via an obsolete API anyway). No familiar with Qemu bus emulation, keep on learning :) . But what I thought is, the x86-ICC bus is not the same as bus like PCI. For a PCI bus, it lies behind a host bridge, but ICC is shared by all x86 processors in SMP system, so there is not a outstanding owner. And I right? ICC is also attached to the chipset (due to the IOAPIC). So it looks reasonable to me to let the chipset do the lifecycle management as well. It is the fixed point, CPUs may come and go. Jan -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH 2/2] LAPIC: make lapic support cpu hotplug
On 10/05/2011 08:13 PM, liu ping fan wrote: On Wed, Oct 5, 2011 at 7:01 PM, Jan Kiszkajan.kis...@web.de wrote: On 2011-10-05 12:26, liu ping fan wrote: And make the creation of apic as part of cpu initialization, so apic's state has been ready, before setting kvm_apic. There is no kvm-apic upstream yet, so it's hard to judge why we need this here. If we do, this has to be a separate patch. But I seriously doubt we need it (my hack worked without it, and that was not because of its hack nature). Sorry, I did not explain it clearly. What I mean is that “env-apic_state” must be prepared before qemu_kvm_cpu_thread_fn() - ... - kvm_put_sregs(), where we get apic_base by “ sregs.apic_base = cpu_get_apic_base(env-apic_state);” and then call “kvm_vcpu_ioctl(env, KVM_SET_SREGS,sregs);” which will finally affect the kvm_apic structure in kernel. But as current code, in pc_new_cpu(), we call apic_init() to initialize apic_state, after cpu_init(), so we can not guarantee the order of apic_state initializaion and the setting to kernel. Because LAPIC is part of x86 chip, I want to move it into cpu_x86_init(), and ensure apic_init() called before thread “qemu_kvm_cpu_thread_fn()” creation. The LAPIC is part of the CPU, the classic APIC was a dedicated chip. Sorry, a little puzzle. I think x86 interrupt system consists of two parts: IOAPIC/LAPIC. For we have hw/ioapic.c to simulate IOAPIC, I think hw/apic.c takes the role as LAPIC, especially that we create an APICState instance for each CPUX86State, just like each LAPIC for x86 CPU in real machine. So we can consider apic_init() to create a LAPIC instance, rather than create a classic APIC? I guess If there is lack of something in IOAPIC/LAPIC bus topology, that will be the arbitrator of ICC bus, right? So the classic APIC was a dedicated chip what you said, play this role, right? Could you tell me a sample chipset of APIC, and I can increase my knowledge about it, thanks. I think Jan meant the PIC was a dedicated chip. hw/apic.c implements an LAPIC, hw/i8259.c implements an I8259A otherwise known as the PIC. hw/ioapic.c implements an I/O APIC. Together, the I/O APIC and the LAPIC form an 'APIC Architecture'. Usually, the legacy PIC can hang off of the BSP LAPIC. Regards, Anthony Liguori For various reasons, a safer approach for creating a new CPU is to stop the machine, add the new device models, run cpu_synchronize_post_init on that new cpu (looks like you missed that) and then resume everything. See http://git.kiszka.org/?p=qemu-kvm.git;a=commitdiff;h=be8f21c6b54eac82f7add7ee9d4ecf9cb8ebb320 Great job. And I am interesting about it. Could you give some sample reason about why we need to stop the machine, or list all of the reasons, so we can resolve it one by one. I can not figure out such scenes by myself. From my view, especially for KVM, the creation of vcpu are protected well by lock mechanism from other vcpu threads in kernel, so we need not to stop all of the threads. Hope for suggestion and direct from you, and make a further step for CPU hot-plug feature, Thanks and regards, ping fan ... diff --git a/hw/icc_bus.c b/hw/icc_bus.c new file mode 100644 index 000..360ca2a --- /dev/null +++ b/hw/icc_bus.c @@ -0,0 +1,62 @@ +/* +*/ +#define ICC_BUS_PLUG +#ifdef ICC_BUS_PLUG +#include icc_bus.h + + + +struct icc_bus_info icc_info = { +.qinfo.name = icc, +.qinfo.size = sizeof(struct icc_bus), +.qinfo.props = (Property[]) { +DEFINE_PROP_END_OF_LIST(), +} + +}; + + +static const VMStateDescription vmstate_icc_bus = { +.name = icc_bus, +.version_id = 1, +.minimum_version_id = 1, +.minimum_version_id_old = 1, +.pre_save = NULL, +.post_load = NULL, +}; + +struct icc_bus *g_iccbus; + +struct icc_bus *icc_init_bus(DeviceState *parent, const char *name) +{ +struct icc_bus *bus; + +bus = FROM_QBUS(icc_bus, qbus_create(icc_info.qinfo, parent, name)); +bus-qbus.allow_hotplug = 1; /* Yes, we can */ +bus-qbus.name = icc; +vmstate_register(NULL, -1,vmstate_icc_bus, bus); The chipset is the owner of this bus and instantiates it. So it also provides a vmstate. You can drop this unneeded one here (it's created via an obsolete API anyway). No familiar with Qemu bus emulation, keep on learning :) . But what I thought is, the x86-ICC bus is not the same as bus like PCI. For a PCI bus, it lies behind a host bridge, but ICC is shared by all x86 processors in SMP system, so there is not a outstanding owner. And I right? ICC is also attached to the chipset (due to the IOAPIC). So it looks reasonable to me to let the chipset do the lifecycle management as well. It is the fixed point, CPUs may come and go. Jan -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCHv3 4/4] acpi: automatically generated ssdt proc
On Wed, Oct 05, 2011 at 08:35:26AM -0200, Michael S. Tsirkin wrote: On Tue, Oct 04, 2011 at 10:52:33PM -0400, Kevin O'Connor wrote: Something like: ACPI_EXTRACT ssdt_proc_obj Processor (CPAA, 0xAA, 0xb010, 0x06) { I considered this, sure. We could parse AML to figure out what is the object type we are trying to look up. However I decided sanity-checking that we get the right type of object in AML is better. This way if iasl output format breaks we will have a better chance to detect that. Makes sense? Yes. I guess one could do ACPI_EXTRACT_PROCESSOR for the sanity check. Also this can not be as generic as it seems: each type of object in AML bytecode is encoded slightly differently. So we would still have types of objects we support and types of object we don't. Yes. which would produce something like: static struct aml_object ssdt_proc_obj = {.addr=0x24, .size=0x40, .param=0x28}; What is the param offset here? The location of the first byte of the parameters (the same as you had for ssdt_proc_name). Normally, AML objects take the form: id length fixed params variable param list. The length is itself of variable length, so passing in the start of the fixed parameters would make manipulating the results easier. As for the other parts of this patch series - I'm still leary of changing the DSDT dynamically. Hmm, not sure I understand why. Could you explain pls? Sure: - The DSDT is big and has several cross-functional users. Patching up the DSDT for hotplug when the DSDT also has unrelated stuff (eg, mouse) seems ugly. - The PCI hotplug stuff is generating a whole bunch of devices and the dynamic code is effectively disabling the unwanted ones. It seems nicer to dynamically generate the desired entries instead of bulk generating and dynamically blanking. - The CPU hotplug has similar requirements, but is implemented differently - it generates the CPU objects dynamically. It's not desirable to bulk generate the CPU objects and blank them dynamically, because 255 CPU objects would noticeably increase SeaBIOS' static size. - Some time back there were patches floating around to pass the DSDT into SeaBIOS via fw_cfg interface. Those patches never made it in (I forget why), but the basic functionality seemed sound. Patching the DSDT in SeaBIOS would seem to eliminate that possibility. None of these would be road-blocks. However, they make me want to consider other approaches. and then just memcpy the hotplug_obj N number of times into the ssdt for each available slot. (This would be on top of the DSDT simplification patch series that I posted previously.) This assumes all devices are the same. But unfortunately this will not work for other devices such as VGA. The VGA device can't be hotplugged, so I don't see why that would be an issue. -Kevin -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 5/5] virtio-blk: implement -make_request
On Wed, 05 Oct 2011 15:54:08 -0400, Christoph Hellwig h...@infradead.org wrote: Add an alternate I/O path that implements -make_request for virtio-blk. This is required for high IOPs devices which get slowed down to 1/5th of the native speed by all the locking, memory allocation and other overhead in the request based I/O path. Ouch. I'd be tempted to just switch across to this, though I'd be interested to see if the simple add_buf change I referred to before has some effect by itself (I doubt it). Also, though it's overkill I'd use standard list primitives rather than open-coding a single linked list. Thanks! Rusty. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] KVM: PPC: E500: Support hugetlbfs
On 05.10.2011, at 18:06, Scott Wood wrote: On 10/05/2011 09:37 AM, Alexander Graf wrote: diff --git a/arch/powerpc/kvm/e500_tlb.c b/arch/powerpc/kvm/e500_tlb.c index ec17148..1dd96a9 100644 --- a/arch/powerpc/kvm/e500_tlb.c +++ b/arch/powerpc/kvm/e500_tlb.c @@ -24,6 +24,7 @@ #include linux/sched.h #include linux/rwsem.h #include linux/vmalloc.h +#include linux/hugetlb.h #include asm/kvm_ppc.h #include asm/kvm_e500.h @@ -673,12 +674,31 @@ static inline void kvmppc_e500_shadow_map(struct kvmppc_vcpu_e500 *vcpu_e500, pfn = ~(tsize_pages - 1); break; } +} else if (vma hva = vma-vm_start + (vma-vm_flags VM_HUGETLB)) { +unsigned long psize = vma_kernel_pagesize(vma); + +tsize = (gtlbe-mas1 MAS1_TSIZE_MASK) +MAS1_TSIZE_SHIFT; + +/* + * Take the largest page size that satisfies both host + * and guest mapping + */ +tsize = min(__ilog2(psize) - 10, tsize); Any reason for __ilog2() rather than ilog2()? Shouldn't make a difference, just curious about avoiding the public interface. I grep'ed through the kernel tree and only found __ilog2 defined as well as mostly users for __ilog2, so I figured there's got to be a reason ;) Alex -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] KVM: PPC: E500: Support hugetlbfs
On 10/05/2011 01:55 PM, Alexander Graf wrote: On 05.10.2011, at 18:06, Scott Wood wrote: Any reason for __ilog2() rather than ilog2()? Shouldn't make a difference, just curious about avoiding the public interface. I grep'ed through the kernel tree and only found __ilog2 defined as well as mostly users for __ilog2, so I figured there's got to be a reason ;) ilog2() is defined in include/linux/ilog2.h. It produces constant output if the input is constant, and appears to be the front door to __ilog2_u32/__ilog2_u64. Plain __ilog2 is older and powerpc-specific, which is probably why there are more users of that in arch/powerpc. -Scott -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html