Re: [KVM paravirt issue?] Re: vsyscall=emulate regression
On 02/24/2012 08:58 PM, Andy Lutomirski wrote:
> On Thu, Feb 23, 2012 at 8:34 PM, H. Peter Anvin <h...@zytor.com> wrote:
>> On 02/16/2012 09:39 AM, Avi Kivity wrote:
>>> Yes, this is on purpose
>>
>> Why?
>
> I think the "this" refers to the PF_INSTR fault when executing at 0xff600xxx. That's definitely intentional -- it's how vsyscall emulation works.
>
> I think it's unintentional that some kvm versions apparently forget to set the PF_INSTR bit.

Correct. Can you provide the version that failed, so we can fix it?

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [KVM paravirt issue?] Re: vsyscall=emulate regression
On (Tue) 28 Feb 2012 [12:00:34], Avi Kivity wrote:
> On 02/24/2012 08:58 PM, Andy Lutomirski wrote:
>> On Thu, Feb 23, 2012 at 8:34 PM, H. Peter Anvin <h...@zytor.com> wrote:
>>> On 02/16/2012 09:39 AM, Avi Kivity wrote:
>>>> Yes, this is on purpose
>>>
>>> Why?
>>
>> I think the "this" refers to the PF_INSTR fault when executing at 0xff600xxx. That's definitely intentional -- it's how vsyscall emulation works.
>>
>> I think it's unintentional that some kvm versions apparently forget to set the PF_INSTR bit.
>
> Correct. Can you provide the version that failed, so we can fix it?

I'm running this on a RHEL host. The version that fails is 2.6.32-220.4.1.el6.x86_64, but I can't say when the problem was introduced; I haven't gone back and checked.

		Amit
Re: [KVM paravirt issue?] Re: vsyscall=emulate regression
On Thu, Feb 23, 2012 at 8:34 PM, H. Peter Anvin <h...@zytor.com> wrote:
> On 02/16/2012 09:39 AM, Avi Kivity wrote:
>> Yes, this is on purpose
>
> Why?

I think the "this" refers to the PF_INSTR fault when executing at 0xff600xxx. That's definitely intentional -- it's how vsyscall emulation works.

I think it's unintentional that some kvm versions apparently forget to set the PF_INSTR bit.

--Andy

> -hpa
>
> --
> H. Peter Anvin, Intel Open Source Technology Center
> I work for Intel. I don't speak on their behalf.

--
Andy Lutomirski
AMA Capital Management, LLC
Office: (310) 553-5322
Mobile: (650) 906-0647
Re: [KVM paravirt issue?] Re: vsyscall=emulate regression
On 02/16/2012 09:39 AM, Avi Kivity wrote:
> Yes, this is on purpose

Why?

	-hpa

--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.
Re: [KVM paravirt issue?] Re: vsyscall=emulate regression
On 02/15/2012 09:36 PM, Andy Lutomirski wrote:
> Hi, kvm people-
>
> Here's a strange failure. It could be a bug in something RHEL6-specific, but it could be a generic issue that only triggers with a paravirt guest with old userspace on a non-ept host. There was a bug like this on Xen, and I'm wondering if something's wrong on kvm as well.
>
> For background, a change in 3.1 (IIRC) means that, when vsyscall=emulate or vsyscall=none, the vsyscall page in the fixmap is NX. It seems like Amit's machine is marking the physical PTE present but unreadable.

No such thing as present and unreadable, without EPT.

> So I could have messed up, or there could be a subtle bug somewhere. Any ideas?

What's the code trying to do? Execute an instruction from a non-executable page, trap the #PF, and emulate?

And what are the symptoms? Wrong error code for the #PF? That could easily be a kvm bug.

> I'll try to reproduce on a non-ept host later on, but that will involve finding one.

rmmod kvm-intel
modprobe kvm-intel ept=0

> Hmm. You don't have ept. If your guest kernel supports paravirt, then you might use the hypercall interface instead of programming the fixmap directly.

There is no hypercall interface for writing page tables in kvm.

>> This is what I get with vsyscall=none, where emulate and native work fine on the 3.2 kernel on different host hardware, the guest stays the same:
>>
>> [2.874661] debug: unmapping init memory 8167f000..818dc000
>> [2.876778] Write protecting the kernel read-only data: 6144k
>> [2.879111] debug: unmapping init memory 880001318000..88000140
>> [2.881242] debug: unmapping init memory 8800015a..88000160
>> [2.884637] init[1] vsyscall attempted with vsyscall=none ip:ff600400 cs:33 sp:7fff2f48fe18 ax:7fff2f48fe50 si:7fff2f48ff08 di:0
>
> This line (vsyscall attempted) means that the emulation worked correctly. Your other traces didn't have it or anything like it, which mostly rules out do_emulate_vsyscall issues.

Can you point me at the code in question?

Amit, a trace would be nice.

--
error compiling committee.c: too many arguments to function
Re: [KVM paravirt issue?] Re: vsyscall=emulate regression
On Thu, Feb 16, 2012 at 8:17 AM, Avi Kivity <a...@redhat.com> wrote:
> On 02/15/2012 09:36 PM, Andy Lutomirski wrote:
>> Hi, kvm people-
>>
>> Here's a strange failure. It could be a bug in something RHEL6-specific, but it could be a generic issue that only triggers with a paravirt guest with old userspace on a non-ept host. There was a bug like this on Xen, and I'm wondering if something's wrong on kvm as well.
>>
>> For background, a change in 3.1 (IIRC) means that, when vsyscall=emulate or vsyscall=none, the vsyscall page in the fixmap is NX. It seems like Amit's machine is marking the physical PTE present but unreadable.
>
> No such thing as present and unreadable, without EPT.
>
>> So I could have messed up, or there could be a subtle bug somewhere. Any ideas?
>
> What's the code trying to do? Execute an instruction from a non-executable page, trap the #PF, and emulate?
>
> And what are the symptoms? Wrong error code for the #PF? That could easily be a kvm bug.

The symptom is that some kind of access to a page that's supposed to be readable, NX is reporting error 5. I'm not quite sure what kind of access is causing that.

>> I'll try to reproduce on a non-ept host later on, but that will involve finding one.
>
> rmmod kvm-intel
> modprobe kvm-intel ept=0

I just tried that and still can't reproduce the problem. FWIW, I also failed to reproduce it on the one RHEL6 machine I have access to.

>> Hmm. You don't have ept. If your guest kernel supports paravirt, then you might use the hypercall interface instead of programming the fixmap directly.
>
> There is no hypercall interface for writing page tables in kvm.

Evidently I was looking at the removed kvm_set_pte stuff :)

>>> This is what I get with vsyscall=none, where emulate and native work fine on the 3.2 kernel on different host hardware, the guest stays the same:
>>>
>>> [ 2.874661] debug: unmapping init memory 8167f000..818dc000
>>> [ 2.876778] Write protecting the kernel read-only data: 6144k
>>> [ 2.879111] debug: unmapping init memory 880001318000..88000140
>>> [ 2.881242] debug: unmapping init memory 8800015a..88000160
>>> [ 2.884637] init[1] vsyscall attempted with vsyscall=none ip:ff600400 cs:33 sp:7fff2f48fe18 ax:7fff2f48fe50 si:7fff2f48ff08 di:0
>>
>> This line (vsyscall attempted) means that the emulation worked correctly. Your other traces didn't have it or anything like it, which mostly rules out do_emulate_vsyscall issues.
>
> Can you point me at the code in question?

The setup code is in arch/x86/kernel/vsyscall_64.c in map_vsyscall. The bad access is to the vsyscall page.

> Amit, a trace would be nice.

The full output from a test boot of my (updated this morning) initramfs here:

http://web.mit.edu/luto/www/linux/vsyscall_initramfs.img

may give a better hint. The updated code is here:

#include <unistd.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

typedef time_t (*vsys_time_t)(time_t *);

int main()
{
	/* time() entry point in the vsyscall page */
	vsys_time_t vsys_time = (vsys_time_t)(0xff600400);
	unsigned char *p = (unsigned char *)0xff600400;
	int i;

	/* First read the page as data... */
	printf("Will try reading...\n");
	printf("The first few bytes are:\n");
	for (i = 0; i < 16; i++) {
		unsigned char c = p[i];
		printf("%02x ", (int)c);
	}
	printf("\n");

	/* ...then execute from it, which is what triggers the emulation. */
	printf("Will try executing...\n");
	printf("The time is %ld\n", (long)(vsys_time(0)));

	printf("All done\n");
	while (1)
		pause();
}

--Andy
Re: [KVM paravirt issue?] Re: vsyscall=emulate regression
On 02/16/2012 06:45 PM, Andy Lutomirski wrote:
>>> So I could have messed up, or there could be a subtle bug somewhere. Any ideas?
>>
>> What's the code trying to do? Execute an instruction from a non-executable page, trap the #PF, and emulate?
>>
>> And what are the symptoms? Wrong error code for the #PF? That could easily be a kvm bug.
>
> The symptom is that some kind of access to a page that's supposed to be readable, NX is reporting error 5. I'm not quite sure what kind of access is causing that.

Might it be a fetch access, with kvm forgetting to set bit 4 correctly?

>> Can you point me at the code in question?
>
> The setup code is in arch/x86/kernel/vsyscall_64.c in map_vsyscall. The bad access is to the vsyscall page.

The bad access is on purpose, yes?

From fault.c:

#ifdef CONFIG_X86_64
	/*
	 * Instruction fetch faults in the vsyscall page might need
	 * emulation.
	 */
	if (unlikely((error_code & PF_INSTR) &&
		     ((address & ~0xfff) == VSYSCALL_START))) {
		if (emulate_vsyscall(regs, address))
			return;
	}
#endif

so it seems like kvm doesn't set PF_INSTR?

I thought we unit tested that, but maybe not this exact scenario.

--
error compiling committee.c: too many arguments to function
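To make the consequence of a missing PF_INSTR bit concrete, the condition quoted from fault.c can be written as a stand-alone predicate. This is only a sketch: the function name is hypothetical, and VSYSCALL_START is assumed to be the usual x86-64 vsyscall page address, 0xffffffffff600000, which this archive displays in shortened form as ff600xxx.

```c
#include <stdbool.h>
#include <stdint.h>

#define PF_INSTR       (UINT64_C(1) << 4)             /* bit 4: fault was an instruction fetch */
#define VSYSCALL_START UINT64_C(0xffffffffff600000)   /* assumed x86-64 vsyscall page address */

/*
 * Hypothetical stand-alone version of the check quoted from fault.c:
 * emulation is attempted only for instruction-fetch faults that land
 * inside the vsyscall page.
 */
bool wants_vsyscall_emulation(uint64_t error_code, uint64_t address)
{
	return (error_code & PF_INSTR) &&
	       ((address & ~UINT64_C(0xfff)) == VSYSCALL_START);
}
```

With error code 0x15 (PF_INSTR set) and an address inside the page the predicate is true; with the reported error code 5 it is false, so the fault would fall through to the ordinary segfault path.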
Re: [KVM paravirt issue?] Re: vsyscall=emulate regression
On Thu, Feb 16, 2012 at 9:14 AM, Avi Kivity <a...@redhat.com> wrote:
> On 02/16/2012 06:45 PM, Andy Lutomirski wrote:
>>>> So I could have messed up, or there could be a subtle bug somewhere. Any ideas?
>>>
>>> What's the code trying to do? Execute an instruction from a non-executable page, trap the #PF, and emulate?
>>>
>>> And what are the symptoms? Wrong error code for the #PF? That could easily be a kvm bug.
>>
>> The symptom is that some kind of access to a page that's supposed to be readable, NX is reporting error 5. I'm not quite sure what kind of access is causing that.
>
> Might it be a fetch access, with kvm forgetting to set bit 4 correctly?
>
>>> Can you point me at the code in question?
>>
>> The setup code is in arch/x86/kernel/vsyscall_64.c in map_vsyscall. The bad access is to the vsyscall page.
>
> The bad access is on purpose, yes?
>
> From fault.c:
>
> #ifdef CONFIG_X86_64
> 	/*
> 	 * Instruction fetch faults in the vsyscall page might need
> 	 * emulation.
> 	 */
> 	if (unlikely((error_code & PF_INSTR) &&
> 		     ((address & ~0xfff) == VSYSCALL_START))) {
> 		if (emulate_vsyscall(regs, address))
> 			return;
> 	}
> #endif
>
> so it seems like kvm doesn't set PF_INSTR?

Yes, this is on purpose, and you're almost certainly right (and I feel dumb for not figuring this out immediately). The error message is:

segfault at ff600400 ip ff600400 sp 7fff103d72f8 error 5

which is garbage. The instruction at 0xff600400 can't fetch itself as data and fault on the data access (at least not in 64-bit mode, as far as I can think of, without evil messing with the TLBs).

So... what do we do about this? This (whitespace-damaged, untested) patch will probably work around it well enough to boot the system:

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 9d74824..52b9522 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -741,8 +741,11 @@ __bad_area_nosemaphore(struct pt_regs *regs, unsigned long
 	 * Instruction fetch faults in the vsyscall page might need
 	 * emulation.
 	 */
-	if (unlikely((error_code & PF_INSTR) &&
+	if (unlikely(address == regs->ip && !(error_code & PF_WRITE) &&
 		     ((address & ~0xfff) == VSYSCALL_START))) {
+		WARN_ONCE(!(error_code & PF_INSTR),
+			  "Fixing up bogus vsyscall read fault -- "
+			  "your hypervisor is buggy.");
 		if (emulate_vsyscall(regs, address))
 			return;
 	}

Before we patch the guest like this, though, it would be nice to know what hosts are affected. If it's just one version of RHEL6, maybe it makes sense to fix the hypervisor and either leave the guest alone or just add a warning saying to fix your hypervisor, like:

	WARN_ONCE(address == regs->ip &&
		  !(error_code & (PF_INSTR | PF_WRITE)) &&
		  user_64bit_mode(regs),
		  "Fishy page fault -- you might need to fix your hypervisor");

near some exit path in the page fault handler. The 64-bit check is because (I think) 32-bit code can mess with regs->ip using a cs offset in the LDT and trigger the warning at will.

--Andy
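The relaxed condition in the proposed workaround can also be sketched outside the kernel to see which faults it would accept. The struct and function names below are hypothetical and exist only for illustration; VSYSCALL_START is again assumed to be 0xffffffffff600000, the usual x86-64 vsyscall page address.

```c
#include <stdbool.h>
#include <stdint.h>

#define PF_WRITE       (UINT64_C(1) << 1)             /* bit 1: fault was a write */
#define PF_INSTR       (UINT64_C(1) << 4)             /* bit 4: fault was an instruction fetch */
#define VSYSCALL_START UINT64_C(0xffffffffff600000)   /* assumed x86-64 vsyscall page address */

/* Hypothetical container for the three values the patch looks at. */
struct fault {
	uint64_t error_code;
	uint64_t address;   /* faulting address */
	uint64_t ip;        /* regs->ip at the time of the fault */
};

/*
 * Relaxed match from the workaround: instead of trusting PF_INSTR,
 * accept any non-write fault whose address equals the instruction
 * pointer and lies in the vsyscall page.
 */
bool relaxed_vsyscall_match(const struct fault *f)
{
	return f->address == f->ip &&
	       !(f->error_code & PF_WRITE) &&
	       ((f->address & ~UINT64_C(0xfff)) == VSYSCALL_START);
}

/* The case the WARN_ONCE in the patch would flag: matched, but PF_INSTR is missing. */
bool hypervisor_looks_buggy(const struct fault *f)
{
	return relaxed_vsyscall_match(f) && !(f->error_code & PF_INSTR);
}
```

Under these assumptions, the reported fault (error 5 with address equal to ip) matches and is flagged as bogus, a correct fault with error code 0x15 matches without a warning, and an ordinary data read of the page from code elsewhere does not match at all.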
Re: [KVM paravirt issue?] Re: vsyscall=emulate regression
On 02/16/2012 07:35 PM, Andy Lutomirski wrote:
>> so it seems like kvm doesn't set PF_INSTR?
>
> Yes, this is on purpose, and you're almost certainly right (and I feel dumb for not figuring this out immediately). The error message is:
>
> segfault at ff600400 ip ff600400 sp 7fff103d72f8 error 5
>
> which is garbage. The instruction at 0xff600400 can't fetch itself as data and fault on the data access (at least not in 64-bit mode, as far as I can think of, without evil messing with the TLBs).
>
> So... what do we do about this? This (whitespace-damaged, untested) patch will probably work around it well enough to boot the system:
>
> diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
> index 9d74824..52b9522 100644
> --- a/arch/x86/mm/fault.c
> +++ b/arch/x86/mm/fault.c
> @@ -741,8 +741,11 @@ __bad_area_nosemaphore(struct pt_regs *regs, unsigned long
>  	 * Instruction fetch faults in the vsyscall page might need
>  	 * emulation.
>  	 */
> -	if (unlikely((error_code & PF_INSTR) &&
> +	if (unlikely(address == regs->ip && !(error_code & PF_WRITE) &&
>  		     ((address & ~0xfff) == VSYSCALL_START))) {
> +		WARN_ONCE(!(error_code & PF_INSTR),
> +			  "Fixing up bogus vsyscall read fault -- "
> +			  "your hypervisor is buggy.");
>  		if (emulate_vsyscall(regs, address))
>  			return;
>  	}
>
> Before we patch the guest like this, though, it would be nice to know what hosts are affected. If it's just one version of RHEL6, maybe it makes sense to fix the hypervisor and either leave the guest alone or just add a warning saying to fix your hypervisor, like:
>
> 	WARN_ONCE(address == regs->ip &&
> 		  !(error_code & (PF_INSTR | PF_WRITE)) &&
> 		  user_64bit_mode(regs),
> 		  "Fishy page fault -- you might need to fix your hypervisor");
>
> near some exit path in the page fault handler. The 64-bit check is because (I think) 32-bit code can mess with regs->ip using a cs offset in the LDT and trigger the warning at will.

We'll just fix all affected hypervisor versions. No need to uglify the guest for a clear kvm bug.

--
error compiling committee.c: too many arguments to function
Re: vsyscall=emulate regression
On (Tue) 14 Feb 2012 [08:26:22], Andy Lutomirski wrote:
> On Tue, Feb 14, 2012 at 4:22 AM, Amit Shah <amit.s...@redhat.com> wrote:
>> On (Fri) 03 Feb 2012 [13:57:48], Amit Shah wrote:
>>> Hello,
>>>
>>> I'm booting some latest kernels on a Fedora 11 (released June 2009) guest. After the recent change of default to vsyscall=emulate, the guest fails to boot (init segfaults).
>>>
>>> I also tried vsyscall=none, as suggested by hpa, and that fails as well. Only vsyscall=native works fine.
>>>
>>> The commit that introduced the kernel parameter, 3ae36655b97a03fa1decf72f04078ef945647c1a, is bad too.
>>
>> I suggest we revert 2e57ae0515124af45dd889bfbd4840fd40fcc07d till we track down and fix the vsyscall=emulate case.
>
> Hi-
>
> Sorry, I lost track of this one. I can't reproduce it, although I doubt I've set up the right test environment. But this is fishy:
>
> init[1]: segfault at ff600400 ip ff600400 sp 7fff9c8ba098 error 5
>
> Error 5, if I'm decoding it correctly, is a userspace read (i.e. not execute) fault. The vsyscall emulation changes shouldn't have had any effect on reads there.
>
> Can you try booting the initramfs here:
>
> http://web.mit.edu/luto/www/linux/vsyscall_initramfs.img
>
> with your kernel image (i.e. qemu-kvm -kernel whatever -initrd vsyscall_initramfs.img -whatever_else) and seeing what happens? It works for me.

This too results in a similar error.

> I'm also curious what happens if you run without kvm (i.e. straight qemu)

Interesting; without kvm, this does work fine.

> and what your .config on the guest kernel is. It sounds like something's wrong with your fixmap, which makes me wonder if your qemu/kernel combo is capable of booting even a modern distro (up-to-date F16, say) -- the vvar page uses identical fixmap flags as the vsyscall page in vsyscall=emulate and vsyscall=none mode.

I didn't try a modern distro, but it looks like this is enough evidence for now to check the kvm emulator code.

I tried the same guests on a newer kernel (Fedora 16's 3.2), and things worked fine except for vsyscall=none, panic message below.

> What host cpu are you on and what qemu flags do you use?

$ cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 15
model name      : Intel(R) Core(TM)2 Duo CPU E6550 @ 2.33GHz
stepping        : 11
cpu MHz         : 2000.000
cache size      : 4096 KB
physical id     : 0
siblings        : 2
core id         : 0
cpu cores       : 2
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 10
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good aperfmperf pni dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm lahf_lm dts tpr_shadow vnmi flexpriority
bogomips        : 4654.73
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:

> Maybe something is wrong with your emulator.

Yes, looks like it. Thanks!

This is what I get with vsyscall=none, where emulate and native work fine on the 3.2 kernel on different host hardware, the guest stays the same:

[2.874661] debug: unmapping init memory 8167f000..818dc000
[2.876778] Write protecting the kernel read-only data: 6144k
[2.879111] debug: unmapping init memory 880001318000..88000140
[2.881242] debug: unmapping init memory 8800015a..88000160
[2.884637] init[1] vsyscall attempted with vsyscall=none ip:ff600400 cs:33 sp:7fff2f48fe18 ax:7fff2f48fe50 si:7fff2f48ff08 di:0
[2.888078] init[1]: segfault at ff600400 ip ff600400 sp 7fff2f48fe18 error 15
[2.888193] Refined TSC clocksource calibration: 2691.293 MHz.
[2.892748]
[2.895219] Kernel panic - not syncing: Attempted to kill init!

		Amit
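The two segfault lines in this message differ only in the error field (5 versus 15), so it may help to decode both. The sketch below assumes the standard x86 page-fault error-code layout (bit 0 protection, bit 1 write, bit 2 user, bit 4 instruction fetch) and assumes the kernel prints the field in hexadecimal, so that "error 15" means 0x15. The PF_* names follow the ones used in fault.c; the helper functions are hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>

/* x86 page-fault error-code bits, named as in the kernel's fault.c. */
#define PF_PROT  (1UL << 0)   /* 0: page not present, 1: protection violation */
#define PF_WRITE (1UL << 1)   /* 0: read access, 1: write access */
#define PF_USER  (1UL << 2)   /* 0: kernel mode, 1: user mode */
#define PF_INSTR (1UL << 4)   /* 1: fault was an instruction fetch */

/* Hypothetical helper: parse the "error" field of a segfault line as hex. */
unsigned long pf_parse(const char *field)
{
	return strtoul(field, NULL, 16);
}

/* Hypothetical helper: name the kind of access an error code describes. */
const char *pf_access(unsigned long ec)
{
	if (ec & PF_INSTR)
		return "instruction fetch";
	if (ec & PF_WRITE)
		return "write";
	return "read";
}

/* Hypothetical helper: print a one-line decoding of the error field. */
void pf_print(const char *field)
{
	unsigned long ec = pf_parse(field);

	printf("error %s: %s-mode %s, %s\n", field,
	       (ec & PF_USER) ? "user" : "kernel",
	       pf_access(ec),
	       (ec & PF_PROT) ? "protection violation" : "page not present");
}
```

Under those assumptions, pf_print("5") describes a user-mode read and pf_print("15") a user-mode instruction fetch, both protection violations. Only the second is what an attempt to execute from an NX vsyscall page should produce.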
[KVM paravirt issue?] Re: vsyscall=emulate regression
Hi, kvm people-

Here's a strange failure. It could be a bug in something RHEL6-specific, but it could be a generic issue that only triggers with a paravirt guest with old userspace on a non-ept host. There was a bug like this on Xen, and I'm wondering if something's wrong on kvm as well.

For background, a change in 3.1 (IIRC) means that, when vsyscall=emulate or vsyscall=none, the vsyscall page in the fixmap is NX. It seems like Amit's machine is marking the physical PTE present but unreadable.

So I could have messed up, or there could be a subtle bug somewhere. Any ideas?

I'll try to reproduce on a non-ept host later on, but that will involve finding one.

On Wed, Feb 15, 2012 at 3:01 AM, Amit Shah <amit.s...@redhat.com> wrote:
> On (Tue) 14 Feb 2012 [08:26:22], Andy Lutomirski wrote:
>> On Tue, Feb 14, 2012 at 4:22 AM, Amit Shah <amit.s...@redhat.com> wrote:
>>
>> Can you try booting the initramfs here:
>>
>> http://web.mit.edu/luto/www/linux/vsyscall_initramfs.img
>>
>> with your kernel image (i.e. qemu-kvm -kernel whatever -initrd vsyscall_initramfs.img -whatever_else) and seeing what happens? It works for me.
>
> This too results in a similar error.

Can you post the exact error? I'm interested in how far it gets before it fails.

> I didn't try a modern distro, but it looks like this is enough evidence for now to check the kvm emulator code.
>
> I tried the same guests on a newer kernel (Fedora 16's 3.2), and things worked fine except for vsyscall=none, panic message below.

vsyscall=none isn't supposed to work unless you're running a very modern distro *and* you have no legacy static binaries *and* you aren't using anything written in Go (sigh). It will probably either never become the default or will take 5-10 years.

> model name : Intel(R) Core(TM)2 Duo CPU E6550 @ 2.33GHz
> flags : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good aperfmperf pni dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm lahf_lm dts tpr_shadow vnmi flexpriority

Hmm. You don't have ept. If your guest kernel supports paravirt, then you might use the hypercall interface instead of programming the fixmap directly.

> This is what I get with vsyscall=none, where emulate and native work fine on the 3.2 kernel on different host hardware, the guest stays the same:
>
> [ 2.874661] debug: unmapping init memory 8167f000..818dc000
> [ 2.876778] Write protecting the kernel read-only data: 6144k
> [ 2.879111] debug: unmapping init memory 880001318000..88000140
> [ 2.881242] debug: unmapping init memory 8800015a..88000160
> [ 2.884637] init[1] vsyscall attempted with vsyscall=none ip:ff600400 cs:33 sp:7fff2f48fe18 ax:7fff2f48fe50 si:7fff2f48ff08 di:0

This line (vsyscall attempted) means that the emulation worked correctly. Your other traces didn't have it or anything like it, which mostly rules out do_emulate_vsyscall issues.

--Andy