Re: [kvm-devel] The SMP RHEL 5.1 PAE guest can't boot up issue
I believe the patch is still necessary, since we still need to guarantee that a vcpu's tsc is monotonic. I think there are three issues to be addressed: 1. The majority of Intel machines don't need the offset adjustment since they already have a constant-rate tsc that is synchronized on all cpus. I think this is indicated by X86_FEATURE_CONSTANT_TSC (though I'm not 100% certain whether it means that the rate is the same for all cpus; Thomas, can you clarify?). This will improve tsc quality for those machines, but we can't depend on it, since some machines don't have a constant tsc. Further, I don't think really large machines can have a constant tsc, since clock distribution becomes difficult or impossible. I have another newbie question: can the current Linux kernel handle an unsynced TSC? If the kernel can't handle this case, it will still be a problem to run Linux on hardware with an unsynced TSC. Thanks, Forrest - This SF.net email is sponsored by: Microsoft Defy all challenges. Microsoft(R) Visual Studio 2008. http://clk.atdmt.com/MRT/go/vse012070mrt/direct/01/ ___ kvm-devel mailing list kvm-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/kvm-devel
[kvm-devel] Can Linux kernel handle unsynced TSC?
For example:

1. rdtsc() is invoked on CPU0.
2. The process is migrated to CPU1, and rdtsc() is invoked on CPU1.
3. If the TSC on CPU1 is slower than the TSC on CPU0, can the kernel guarantee that the second rdtsc() doesn't return a value smaller than the one returned by the first rdtsc()?

Thanks, Forrest
Re: [kvm-devel] Can Linux kernel handle unsynced TSC?
On Fri, 2008-02-29 at 16:55 +0800, Zhao Forrest wrote:

 Sorry for reposting it. For example:
 1. rdtsc() is invoked on CPU0.
 2. The process is migrated to CPU1, and rdtsc() is invoked on CPU1.
 3. If the TSC on CPU1 is slower than the TSC on CPU0, can the kernel guarantee that the second rdtsc() doesn't return a value smaller than the one returned by the first rdtsc()?

No, rdtsc() goes directly to the hardware. You need a (preferably cheap) clock abstraction layer on top if you need this.
[kvm-devel] I/O bandwidth control on KVM
Hello all, I've implemented a block device which throttles block I/O bandwidth, called dm-ioband, and I have been trying to throttle I/O bandwidth in a KVM environment. Unfortunately it doesn't work well: the number of issued I/Os does not match the bandwidth settings. On the other hand, I got good results when accessing the local disk directly on the local machine. I'm not so familiar with KVM. Could anyone give me any advice? For dm-ioband details, please see the website at http://people.valinux.co.jp/~ryov/dm-ioband/

The number of issued I/Os:

  +--------+----------------+--------+--------+
  | device                  | sda11  | sda12  |
  | weight setting          |  80%   |  20%   |
  +--------+----------------+--------+--------+
  | KVM    | I/Os           |  4397  |  2902  |
  |        | ratio to total | 60.2%  | 39.8%  |
  +--------+----------------+--------+--------+
  | local  | I/Os           |  5447  |  1314  |
  |        | ratio to total | 80.6%  | 19.4%  |
  +--------+----------------+--------+--------+

The test environment and procedure are as follows:
 o Prepare two partitions, sda11 and sda12.
 o Create two bandwidth control devices, mapped to sda11 and sda12 respectively.
 o Give weights of 80 and 20 to the two bandwidth control devices respectively.
 o Run two virtual machines; each virtual machine's disk is mapped to one of the bandwidth control devices.
 o On each virtual machine, run 128 processes issuing random read/write direct I/O with 4KB data, all at the same time.
 o Count the number of I/Os completed in 60 seconds.
Access through KVM:

  +---------------------------+  +---------------------------+
  | Virtual Machine 1 (VM1)   |  | Virtual Machine 2 (VM2)   |
  | in cgroup ioband1         |  | in cgroup ioband2         |
  |                           |  |                           |
  | Read/Write with O_DIRECT  |  | Read/Write with O_DIRECT  |
  |       process x 128       |  |       process x 128       |
  |             |             |  |             |             |
  |             V             |  |             V             |
  |         /dev/vda1         |  |         /dev/vda1         |
  +-------------|-------------+  +-------------|-------------+
                V                              V
  +---------------------------+  +---------------------------+
  | /dev/mapper/ioband1       |  | /dev/mapper/ioband2       |
  | 80% for cgroup ioband1    |  | 20% for cgroup ioband2    |
  +-------------|-------------+  +-------------|-------------+
                V                              V
  +---------------------------+  +---------------------------+
  | /dev/sda11                |  | /dev/sda12                |
  +---------------------------+  +---------------------------+

Direct access:

  +---------------------------+  +---------------------------+
  | cgroup ioband1            |  | cgroup ioband2            |
  |                           |  |                           |
  | Read/Write with O_DIRECT  |  | Read/Write with O_DIRECT  |
  |       process x 128       |  |       process x 128       |
  +-------------|-------------+  +-------------|-------------+
                V                              V
  +---------------------------+  +---------------------------+
  | /dev/mapper/ioband1       |  | /dev/mapper/ioband2       |
  | 80% for cgroup ioband1    |  | 20% for cgroup ioband2    |
  +-------------|-------------+  +-------------|-------------+
                V                              V
  +---------------------------+  +---------------------------+
  | /dev/sda11                |  | /dev/sda12                |
  +---------------------------+  +---------------------------+

(The ioband devices control I/O bandwidth according to the cgroup of the issuing tasks.)

Thanks, Ryo Tsuruta
[kvm-devel] FW: KVM Test result, kernel 4a7f582.., userspace bc6db37..
Zhao, Yunfeng wrote: Hi, all, This is today's KVM test result against kvm.git 4a7f582a07e14763ee4714b681e98b3b134d1d46 and kvm-userspace.git bc6db37817ce749dcc88fbc761a36bb8df5cf60a. The LTP and kernel build tests on a PAE Linux guest failed, because these cases boot guests with an SMP 2.6.9 kernel; this is related to today's new issue. In manual testing, save/restore had no problem the first time: the save/restore test cases passed when run manually. Because the command has been changed, they failed in automated testing; we will update the test cases. One new issue: 1. Cannot boot guests with a 2.6.9 SMP PAE kernel https://sourceforge.net/tracker/index.php?func=detail&aid=1903732&group_id=180599&atid=893831 We suspect this issue is caused by this commit: kvm: bios: mark extra cpus as present kvm-userspace: 538c90271b9431f8c7f2ebfdffdab07749b97d86
Re: [kvm-devel] [patch 2/6] mmu_notifier: Callbacks to invalidate address ranges
On Thu, Feb 28, 2008 at 04:59:59PM -0800, Christoph Lameter wrote: And thus the device driver may stop receiving data on a UP system? It will never get the ack. Not sure to follow, sorry. My idea was:

    post the invalidate in the mmio region of the device
    smp_call_function()
    while (mmio device wait-bitflag is on);

Instead of the current:

    smp_call_function()
    post the invalidate in the mmio region of the device
    while (mmio device wait-bitflag is on);

to decrease the wait loop time. invalidate_page_before/end could be realized as an invalidate_range_begin/end on a page sized range? If we go this route, once you add support to xpmem, you'll have to make the anon_vma lock a mutex too, that would be fine with me though. The main reason invalidate_page exists is to allow you to leave it as non-sleep-capable even after you make invalidate_range sleep capable, and to implement the mmu_rmap_notifiers sleep capable in all the paths where invalidate_page would be called. That was the strategy you had in your patch. I'll try to drop invalidate_page. I wonder if then you won't need the mmu_rmap_notifiers anymore.
Re: [kvm-devel] [PATCH] mmu notifiers #v7
On Thu, Feb 28, 2008 at 05:03:01PM -0800, Christoph Lameter wrote: I thought you wanted to get rid of the sync via pte lock? Sure. _notify is happening inside the pt lock by coincidence, to reduce the changes to mm/* as long as the mmu notifiers aren't sleep capable. What changes to do_wp_page do you envision? Converting it to invalidate_range_begin/end. What is the trouble with the current do_wp_page modifications? There is no need for invalidate_page() there so far. invalidate_range() does the trick there. No trouble, it's just that I didn't want to mangle over the logic of do_wp_page unless it was strictly required, the patch has to be obviously safe. You need to keep that bit of your patch to make the mmu notifiers sleepable.
Re: [kvm-devel] Can Linux kernel handle unsynced TSC?
On 2/29/08, Peter Zijlstra [EMAIL PROTECTED] wrote: On Fri, 2008-02-29 at 16:55 +0800, Zhao Forrest wrote: Sorry for reposting it. For example, 1 rdtsc() is invoked on CPU0 2 process is migrated to CPU1, and rdtsc() is invoked on CPU1 3 if TSC on CPU1 is slower than TSC on CPU0, can kernel guarantee that the second rdtsc() doesn't return a value smaller than the one returned by the first rdtsc()? No, rdtsc() goes directly to the hardware. You need a (preferably cheap) clock abstraction layer on top if you need this. Thank you for the clarification. I think gettimeofday() is such kind of clock abstraction layer, am I right?
[kvm-devel] catch vmentry failure (was enable gfxboot on VMX)
On Mon, 18 Feb 2008 10:39:31 +0100 Alexander Graf [EMAIL PROTECTED] wrote: So if you want to see a VMentry failure, just remove the SS patching and you'll see one. My guess would be that you see a lot of problems with otherwise working code too then, though, as SS can be anything in that state. So I made some tests and you were right: removing the SS patching showed VM entry failures, but it also generated lots of problems. Thus I modified the code a little, and with the following patch (see the end of the email) I can detect VM entry failures without generating other problems. It works when you use a distribution that is free of big real mode. I pasted the patch just to show the idea. It's interesting because we can continue to use virtual-8086 mode for the majority of distributions, and when a VM entry failure is detected, it means that we need to switch from virtual-8086 mode to full real mode emulation. Such failures are caught in handle_vmentry_failure() when the patch is applied. If it's doable, the next step is to modify the SS segment selector so that the VM entry succeeds, and to switch from virtual-8086 mode to real mode emulation; both could be done in handle_vmentry_failure(). Does it make sense?
Regards,
Guillaume

---
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 46e0e58..c2c3897 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -1166,15 +1166,13 @@ static void enter_pmode(struct kvm_vcpu *vcpu)
 		      (vmcs_readl(CR4_READ_SHADOW) & X86_CR4_VME));
 
 	update_exception_bitmap(vcpu);
 
+	fix_pmode_dataseg(VCPU_SREG_SS, &vcpu->arch.rmode.ss);
 	fix_pmode_dataseg(VCPU_SREG_ES, &vcpu->arch.rmode.es);
 	fix_pmode_dataseg(VCPU_SREG_DS, &vcpu->arch.rmode.ds);
 	fix_pmode_dataseg(VCPU_SREG_GS, &vcpu->arch.rmode.gs);
 	fix_pmode_dataseg(VCPU_SREG_FS, &vcpu->arch.rmode.fs);
 
-	vmcs_write16(GUEST_SS_SELECTOR, 0);
-	vmcs_write32(GUEST_SS_AR_BYTES, 0x93);
-
 	vmcs_write16(GUEST_CS_SELECTOR,
 		     vmcs_read16(GUEST_CS_SELECTOR) & ~SELECTOR_RPL_MASK);
 	vmcs_write32(GUEST_CS_AR_BYTES, 0x9b);
@@ -1228,20 +1226,12 @@ static void enter_rmode(struct kvm_vcpu *vcpu)
 	vmcs_writel(GUEST_CR4, vmcs_readl(GUEST_CR4) | X86_CR4_VME);
 	update_exception_bitmap(vcpu);
 
-	vmcs_write16(GUEST_SS_SELECTOR, vmcs_readl(GUEST_SS_BASE) >> 4);
-	vmcs_write32(GUEST_SS_LIMIT, 0xffff);
-	vmcs_write32(GUEST_SS_AR_BYTES, 0xf3);
-
-	vmcs_write32(GUEST_CS_AR_BYTES, 0xf3);
-	vmcs_write32(GUEST_CS_LIMIT, 0xffff);
-	if (vmcs_readl(GUEST_CS_BASE) == 0xffff0000)
-		vmcs_writel(GUEST_CS_BASE, 0xf0000);
-	vmcs_write16(GUEST_CS_SELECTOR, vmcs_readl(GUEST_CS_BASE) >> 4);
-
+	fix_rmode_seg(VCPU_SREG_CS, &vcpu->arch.rmode.cs);
 	fix_rmode_seg(VCPU_SREG_ES, &vcpu->arch.rmode.es);
 	fix_rmode_seg(VCPU_SREG_DS, &vcpu->arch.rmode.ds);
 	fix_rmode_seg(VCPU_SREG_GS, &vcpu->arch.rmode.gs);
 	fix_rmode_seg(VCPU_SREG_FS, &vcpu->arch.rmode.fs);
+	fix_rmode_seg(VCPU_SREG_SS, &vcpu->arch.rmode.ss);
 
 	kvm_mmu_reset_context(vcpu);
 	init_rmode_tss(vcpu->kvm);
@@ -2257,6 +2247,39 @@ static int (*kvm_vmx_exit_handlers[])(struct kvm_vcpu *vcpu,
 static const int kvm_vmx_max_exit_handlers =
 	ARRAY_SIZE(kvm_vmx_exit_handlers);
 
+static int handle_vmentry_failure(u32 exit_reason, struct kvm_vcpu *vcpu)
+{
+	unsigned long exit_qualification = vmcs_read64(EXIT_QUALIFICATION);
+	u32 info_field = vmcs_read32(VMX_INSTRUCTION_INFO);
+	unsigned int basic_exit_reason = (uint16_t)exit_reason;
+
+	printk("%s: exit reason 0x%x\n", __FUNCTION__, exit_reason);
+	printk("%s: vmentry failure reason %u\n", __FUNCTION__, basic_exit_reason);
+	printk("%s: VMX-instruction Information field 0x%x\n", __FUNCTION__, info_field);
+
+	switch (basic_exit_reason) {
+	case EXIT_REASON_INVALID_GUEST_STATE:
+		printk("caused by invalid guest state (%ld).\n", exit_qualification);
+		/* At this point we need to modify the SS selector to pass the
+		 * vmentry test.  This modification prevents the use of virtual
+		 * mode to emulate real mode, so we need to enter big real mode
+		 * emulation with something like:
+		 * vcpu->arch.rmode.emulate = 1
+		 */
+		break;
+	case EXIT_REASON_MSR_LOADING:
+		printk("caused by MSR entry %ld loading.\n", exit_qualification);
+		break;
+	case EXIT_REASON_MACHINE_CHECK:
+		printk("caused by machine check.\n");
+		break;
+	default:
+		printk("reason not known yet!\n");
+		break;
+	}
+	return 0;
+}
+
 /*
  * The guest has exited.  See if we can
Re: [kvm-devel] Can Linux kernel handle unsynced TSC?
On Fri, 2008-02-29 at 22:20 +0800, Zhao Forrest wrote: On 2/29/08, Peter Zijlstra [EMAIL PROTECTED] wrote: On Fri, 2008-02-29 at 16:55 +0800, Zhao Forrest wrote: Sorry for reposting it. For example, 1 rdtsc() is invoked on CPU0 2 process is migrated to CPU1, and rdtsc() is invoked on CPU1 3 if TSC on CPU1 is slower than TSC on CPU0, can kernel guarantee that the second rdtsc() doesn't return a value smaller than the one returned by the first rdtsc()? No, rdtsc() goes directly to the hardware. You need a (preferably cheap) clock abstraction layer on top if you need this. Thank you for the clarification. I think gettimeofday() is such kind of clock abstraction layer, am I right? Yes, gtod is one such layer; however, it fails the 'cheap' test for many definitions of cheap.
Re: [kvm-devel] [Qemu-devel] [PATCH] USB 2.0 EHCI emulation
On Fri, Feb 29, 2008 at 2:33 AM, Arnon Gilboa [EMAIL PROTECTED] wrote: In hw/pc.c, replace usb_uhci_piix3_init(pci_bus, piix3_devfn + 2); with usb_ehci_init(pci_bus, piix3_devfn + 2); With these changes I can't add USB devices anymore to a Windows XP (32 bit) guest. This is the command I use to start kvm: /usr/local/bin/kvm/qemu-system-x86_64 -localtime -m 512 -usb -hda win32xp.img To add a USB device I normally go to the qemu console, type "info usbhost", find the number for the device I want to connect, then: usb_add host:03f0:01cda But with your patch, when I try to add a USB device I get: Could not add 'USB device host:03f0:01cda' Since I'm using EHCI emulation, do I need to add USB devices in a different way? Or should it work exactly the same way? Thanks, Jerry Note my comments on the original post: -tested on XP guest -does not support ISO transfers -timing issues -Original Message- From: Gerb Stralko [mailto:[EMAIL PROTECTED] Sent: Thursday, February 28, 2008 9:46 PM To: Arnon Gilboa Cc: [EMAIL PROTECTED]; kvm-devel@lists.sourceforge.net Subject: Re: [kvm-devel] [Qemu-devel] [PATCH] USB 2.0 EHCI emulation Attached is a repost of the preliminary patch implementing USB 2.0 EHCI emulation. I want to start testing your patches for the EHCI stuff. Do I need to enable anything in order to get EHCI emulation working after applying your patch? Unfortunately, with this patch it doesn't work for me. My guest (Windows Vista) still becomes really slow when I add a USB device. Waiting for your comments, Arnon Thanks, Jerry
[kvm-devel] KVM-61/62 build fails on SLES 10
Whereas KVM-60 builds out of the box on SLES 10 SP1 (assuming gcc 3.4 is installed), KVM-61 and KVM-62 don't. They fail with:

make[1]: Entering directory `/scratch/KVM/kvm-61/kernel'
# include header priority 1) LINUX 2) KERNELDIR 3) include-compat
make -C /lib/modules/2.6.16.54-0.2.5-smp/build M=`pwd` \
	LINUXINCLUDE=-I`pwd`/include -Iinclude -I`pwd`/include-compat \
	-include include/linux/autoconf.h \
	$@
make[2]: Entering directory `/usr/src/linux-2.6.16.54-0.2.5-obj/x86_64/smp'
make -C ../../../linux-2.6.16.54-0.2.5 O=../linux-2.6.16.54-0.2.5-obj/x86_64/smp
  LD      /scratch/KVM/kvm-61/kernel/built-in.o
  CC [M]  /scratch/KVM/kvm-61/kernel/svm.o
In file included from command line:1:
/scratch/KVM/kvm-61/kernel/external-module-compat.h:10:28: error: linux/compiler.h: No such file or directory
/scratch/KVM/kvm-61/kernel/external-module-compat.h:12:26: error: linux/string.h: No such file or directory

Trying to fiddle the include path to ensure that it finds /usr/src/linux/include then produces an error for linux/clocksource.h. SLES 10 SP1 uses a kernel whose version is 2.6.16.54-0.2.5-smp, i.e. 2.6.16 plus various back-ported bits. However, SLES 10 SP1 is the current version of SuSE Linux Enterprise Server, so in some sense this is current. KVM was configured with

./configure --prefix=/usr/local/kvm/kvm-61 \
	--qemu-cc=/scratch/gcc-3.4/bin/gcc-3.4

Michael
Re: [kvm-devel] KVM-61/62 build fails on SLES 10
On Fri, 29 Feb 2008, M.J. Rutter wrote: Whereas KVM-60 builds out of the box on SLES 10 SP1 (assuming gcc 3.4 is installed), KVM-61 and KVM-62 don't. Bother. Ignore that. As far as I can see, no KVM since about KVM-37 has actually run on a kernel that old, due to the lack of hrtimer_init and friends. KVM-60 may build, but it certainly doesn't run, so KVM-61/62's inability to build is of no consequence. Michael
[kvm-devel] 64bit host performance
Hello All, Is there a significant performance advantage with using a 64bit host os? I am specifically wondering about the advantages where KVM and QEMU are concerned. Thanks in advance, -G
Re: [kvm-devel] [PATCH] mmu notifiers #v7
On Fri, 29 Feb 2008, Andrea Arcangeli wrote: On Thu, Feb 28, 2008 at 05:03:01PM -0800, Christoph Lameter wrote: I thought you wanted to get rid of the sync via pte lock? Sure. _notify is happening inside the pt lock by coincidence, to reduce the changes to mm/* as long as the mmu notifiers aren't sleep capable. Ok if this is a coincidence then it would be better to separate the notifier callouts from the pte macro calls.
Re: [kvm-devel] [patch 2/6] mmu_notifier: Callbacks to invalidate address ranges
On Fri, 29 Feb 2008, Andrea Arcangeli wrote: On Thu, Feb 28, 2008 at 04:59:59PM -0800, Christoph Lameter wrote: And thus the device driver may stop receiving data on a UP system? It will never get the ack. Not sure to follow, sorry. My idea was:

    post the invalidate in the mmio region of the device
    smp_call_function()
    while (mmio device wait-bitflag is on);

So the device driver on UP can only operate through interrupts? If you are hogging the only cpu then driver operations may not be possible. invalidate_page_before/end could be realized as an invalidate_range_begin/end on a page sized range? If we go this route, once you add support to xpmem, you'll have to make the anon_vma lock a mutex too, that would be fine with me though. The main reason invalidate_page exists, is to allow you to leave it as non-sleep-capable even after you make invalidate_range sleep capable, and to implement the mmu_rmap_notifiers sleep capable in all the paths that invalidate_page would be called. That was the strategy you had in your patch. I'll try to drop invalidate_page. I wonder if then you won't need the mmu_rmap_notifiers anymore. I am mainly concerned with making the mmu notifier a generally useful feature for multiple users. Xpmem is one example of a different user. It should be considered as one example of a different type of callback user. It is not the gold standard that you make it to be. RDMA is another, and there are likely scores of others (DMA engines etc) once it becomes clear that such a feature is available. In general the mmu notifier will allow us to fix the problems caused by memory pinning and mlock by various devices and other mechanisms that need to directly access memory. And yes I would like to get rid of the mmu_rmap_notifiers altogether. It would be much cleaner with just one mmu_notifier that can sleep in all functions.
Re: [kvm-devel] [patch 2/6] mmu_notifier: Callbacks to invalidate address ranges
On Fri, Feb 29, 2008 at 11:55:17AM -0800, Christoph Lameter wrote: post the invalidate in the mmio region of the device smp_call_function() while (mmio device wait-bitflag is on); So the device driver on UP can only operate through interrupts? If you are hogging the only cpu then driver operations may not be possible. There was no irq involved in the above pseudocode; the irq, if anything, would run on the remote system. Still, irqs can run fine during the while loop, just as they run fine on top of smp_call_function. The send-irq and the following spin-on-a-bitflag work exactly like smp_call_function, except this isn't a numa-CPU to invalidate. And yes I would like to get rid of the mmu_rmap_notifiers altogether. It would be much cleaner with just one mmu_notifier that can sleep in all functions. Agreed. I just thought xpmem needed an invalidate-by-page, but I'm glad if xpmem can go in sync with the KVM/GRU/DRI model in this regard.
Re: [kvm-devel] 64bit host performance
Quoting [EMAIL PROTECTED]: Hello All, Is there a significant performance advantage with using a 64bit host os? I am specifically wondering about the advantages where KVM and QEMU are concerned. Thanks in advance, -G

The MMU code (the page table entry pointers are 64 bits) would run faster on a 64-bit host; I think this should be the main difference.
Re: [kvm-devel] [patch 2/6] mmu_notifier: Callbacks to invalidate address ranges
On Fri, 29 Feb 2008, Andrea Arcangeli wrote: Agreed. I just thought xpmem needed an invalidate-by-page, but I'm glad if xpmem can go in sync with the KVM/GRU/DRI model in this regard. That means we need both the anon_vma locks and the i_mmap_lock to become semaphores. I think semaphores are better than mutexes. Rik and Lee saw some performance improvements because list can be traversed in parallel when the anon_vma lock is switched to be a rw lock. Sounds like we get to a conceptually clean version here?
Re: [kvm-devel] [patch 2/6] mmu_notifier: Callbacks to invalidate address ranges
On Fri, 29 Feb 2008, Andrea Arcangeli wrote: On Fri, Feb 29, 2008 at 01:03:16PM -0800, Christoph Lameter wrote: That means we need both the anon_vma locks and the i_mmap_lock to become semaphores. I think semaphores are better than mutexes. Rik and Lee saw some performance improvements because list can be traversed in parallel when the anon_vma lock is switched to be a rw lock. The improvement was with a rw spinlock IIRC, so I don't see how it's related to this. AFAICT the rw semaphore fastpath is similar in performance to a rw spinlock. Perhaps the rwlock spinlock can be changed to a rw semaphore without measurable overscheduling in the fast path. However theoretically Overscheduling? You mean overhead? speaking the rw_lock spinlock is more efficient than a rw semaphore in case of a little contention during the page fault fast path because the critical section is just a list_add so it'd be overkill to schedule while waiting. That's why currently it's a spinlock (or rw spinlock). On the other hand a semaphore puts the process to sleep and may actually improve performance because there is less time spent in a busy loop. Other processes may do something useful and we stay off the contended cacheline, reducing traffic on the interconnect. preempt-rt runs quite a bit slower, or we could rip spinlocks out of the kernel in the first place ;) The question is why that is the case, and it seems that there are issues with interrupt on/off that are important here and particularly significant with the SLAB allocator (significant hacks there to deal with that issue). The fastpath that we have in the works for SLUB may address a large part of that issue because it no longer relies on disabling interrupts.
Re: [kvm-devel] [patch 2/6] mmu_notifier: Callbacks to invalidate address ranges
On Fri, Feb 29, 2008 at 01:34:34PM -0800, Christoph Lameter wrote: On Fri, 29 Feb 2008, Andrea Arcangeli wrote: On Fri, Feb 29, 2008 at 01:03:16PM -0800, Christoph Lameter wrote: That means we need both the anon_vma locks and the i_mmap_lock to become semaphores. I think semaphores are better than mutexes. Rik and Lee saw some performance improvements because list can be traversed in parallel when the anon_vma lock is switched to be a rw lock. The improvement was with a rw spinlock IIRC, so I don't see how it's related to this. AFAICT The rw semaphore fastpath is similar in performance to a rw spinlock. read side is taken in the slow path. write side is taken in the fast path. pagefault is fast path, VM during swapping is slow path. Perhaps the rwlock spinlock can be changed to a rw semaphore without measurable overscheduling in the fast path. However theoretically Overscheduling? You mean overhead? The only possible overhead that a rw semaphore could ever generate vs a rw lock is overscheduling. speaking the rw_lock spinlock is more efficient than a rw semaphore in case of a little contention during the page fault fast path because the critical section is just a list_add so it'd be overkill to schedule while waiting. That's why currently it's a spinlock (or rw spinlock). On the other hand a semaphore puts the process to sleep and may actually improve performance because there is less time spend in a busy loop. Other processes may do something useful and we stay off the contended cacheline reducing traffic on the interconnect. Yes, that's the positive side, the negative side is that you'll put the task in uninterruptible sleep and call schedule() and require a wakeup, because a list_add taking 1usec is running in the other cpu. No other downside. But that's the only reason it's a spinlock right now, infact there can't be any other reason.
Re: [kvm-devel] [patch 2/6] mmu_notifier: Callbacks to invalidate address ranges
On Fri, Feb 29, 2008 at 01:03:16PM -0800, Christoph Lameter wrote: That means we need both the anon_vma locks and the i_mmap_lock to become semaphores. I think semaphores are better than mutexes. Rik and Lee saw some performance improvements because list can be traversed in parallel when the anon_vma lock is switched to be a rw lock. The improvement was with a rw spinlock IIRC, so I don't see how it's related to this. Perhaps the rwlock spinlock can be changed to a rw semaphore without measurable overscheduling in the fast path. However theoretically speaking the rw_lock spinlock is more efficient than a rw semaphore in case of a little contention during the page fault fast path because the critical section is just a list_add so it'd be overkill to schedule while waiting. That's why currently it's a spinlock (or rw spinlock). Sounds like we get to a conceptually clean version here? I don't have a strong opinion if it should become a semaphore unconditionally or only with a CONFIG_XPMEM=y. But keep in mind preempt-rt runs quite a bit slower, or we could rip spinlocks out of the kernel in the first place ;)
Re: [kvm-devel] [patch 2/6] mmu_notifier: Callbacks to invalidate address ranges
On Fri, Feb 29, 2008 at 02:12:57PM -0800, Christoph Lameter wrote: On Fri, 29 Feb 2008, Andrea Arcangeli wrote: AFAICT The rw semaphore fastpath is similar in performance to a rw spinlock. read side is taken in the slow path. Slowpath meaning VM slowpath or lock slow path? It seems that the rwsem With slow path I meant the VM. Sorry if that was confusing, given locks also have fast paths (no contention) and slow paths (contention). read side path is pretty efficient: Yes. The assembly doesn't worry me at all. pagefault is fast path, VM during swapping is slow path. Not sure what you are saying here. A pagefault should be considered as a fast path and swapping is not performance critical? Yes, swapping is I/O bound and it rarely becomes a CPU hog in the common case. There are corner case workloads (including OOM) where swapping can become cpu bound (that's also where the rwlock helps). But certainly the speed of fork() and a page fault is critical for _everyone_, not just a few workloads and setups. Ok too many calls to schedule() because the slow path (of the semaphore) is taken? Yes, that's the only possible worry when converting a spinlock to a mutex. But that is only happening for the contended case. Certainly a spinlock is better for a 2p system, but the more processors contend for the lock (and the longer the hold-off is, typical for processors with 4p or 8p or more) the better a semaphore will work. Sure. That's also why the PT lock switches for 4way compiles. A config option helps to keep the VM optimal for everyone. Here it is possible it won't be necessary, but I can't be sure given both the i_mmap_lock and the anon_vma lock are used in so many places. Some TPC comparison would be nice before making a default switch IMHO.