Re: [PATCH 0/2] kvm: disable virtualization on kdump
Eric W. Biederman wrote: svm can writeback into memory at odd times if we don't do this, and the cost is small - clear a bit in EFER. There's no reason to be lazy. Especially if we can clear that bit unconditionally (when EFER is present) I'm all for it. That is the case. Cool. I forget if we have to test for EFER on 32bit, or if we can just wrmsr and be done with it. Regardless that sounds easy to do on the kexec path. I'm pretty sure you have to test for it first; pre-64 bit x86 hardware doesn't have the EFER register, so you'll fault on access. On the other hand pre-64 bit x86 hardware doesn't usually (ever?) have VT extensions either. -- Chris Lalancette -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/2] kvm: disable virtualization on kdump
Chris Lalancette wrote: I'm pretty sure you have to test for it first; pre-64 bit x86 hardware doesn't have the EFER register, so you'll fault on access. On the other hand pre-64 bit x86 hardware doesn't usually (ever?) have VT extensions either. All amd hardware that supports svm has an efer. Some Intel hardware doesn't, but we don't care in that case. -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/2] kvm: disable virtualization on kdump
Eric W. Biederman wrote: Avi Kivity [EMAIL PROTECTED] writes: Eric W. Biederman wrote: Most of the reason I was wondering is that the cpu hardware probing largely seems to be a duplicate of what we have in the core for probing cpu capabilities already, and could likely be made smaller by building upon the existing codebase. We use the core cpuid functions, or are you referring to something else? I was referring to the arch/x86/kernel/cpu/* and arch/x86/include/asm/cpufeature.h The core functions that are responsible for detecting all of the cpu features, and disabling them if there are cpu errata. The usual pattern is that code does all of the magic detection logic and then the code that would use it would just need to test: cpu_has_vmx or cpu_has_svm. vmx is much more complicated than that, with some features define in read-only msrs. At least in part that allows us to show the working cpu features in /proc/cpuinfo. Yes that's a problem now; you can only tell if you have vmx or not, without any information as to the various vmx subfeatures. Cool. I forget if we have to test for EFER on 32bit, or if we can just wrmsr and be done with it. Regardless that sounds easy to do on the kexec path. if (cpu_has_svm()) disable_svme(); -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/2] kvm: disable virtualization on kdump
Eric W. Biederman wrote: I think I will get flamed if I try to pull to the core a bunch of code that always lived in the KVM module. 8) Why is KVM modular anyway? That seems like some pretty core cpu functionality... Many reasons. Developers like the ability to rmmod and modprobe during development. Distros like to keep their non-modular core small. There is an external module distribution that allows users to graft a new kvm on an old kernel, which our testers and bleeding edge users like. Because it's there. There's always CONFIG_KVM=y if you don't want it. Depending. It doesn't sound like svm has the problem where init doesn't work so svm really doesn't need to do this. svm can writeback into memory at odd times if we don't do this, and the cost is small - clear a bit in EFER. There's no reason to be lazy. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/2] kvm: disable virtualization on kdump
Eduardo Habkost wrote: I think I will get flamed if I try to pull to the core a bunch of code that always lived in the KVM module. 8) Having a simple api to enter vmx or svm mode in core code should not be too bad. The kvm hardware_enable() stuff could simply call that. The only downsides I see are a very minor bloat to the core, and spreading kvm code around. And even if we pull those functions to the core, we will still have a function pointer on the new code anyway, because we would need to support vmx and svm. Or have an if () statement there. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/2] kvm: disable virtualization on kdump
Avi Kivity [EMAIL PROTECTED] writes: Eric W. Biederman wrote: I think I will get flamed if I try to pull to the core a bunch of code that always lived in the KVM module. 8) Why is KVM modular anyway? That seems like some pretty core cpu functionality... Many reasons. Developers like the ability to rmmod and modprobe during development. Distros like to keep their non-modular core small. There is an external module distribution that allows users to graft a new kvm on an old kernel, which our testers and bleeding edge users like. Because it's there. Most of the reason I was wondering is that the cpu hardware probing largely seems to be a duplicate of what we have in the core for probing cpu capabilities already, and could likely be made smaller by building upon the existing codebase. svm can writeback into memory at odd times if we don't do this, and the cost is small - clear a bit in EFER. There's no reason to be lazy. Especially if we can clear that bit unconditionally (when EFER is present) I'm all for it. Eric -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/2] kvm: disable virtualization on kdump
Eric W. Biederman wrote: Most of the reason I was wondering is that the cpu hardware probing largely seems to be a duplicate of what we have in the core for probing cpu capabilities already, and could likely be made smaller by building upon the existing codebase. We use the core cpuid functions, or are you referring to something else? svm can writeback into memory at odd times if we don't do this, and the cost is small - clear a bit in EFER. There's no reason to be lazy. Especially if we can clear that bit unconditionally (when EFER is present) I'm all for it. That is the case. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/2] kvm: disable virtualization on kdump
On Mon, Oct 27, 2008 at 09:09:37AM -0400, Vivek Goyal wrote: On Mon, Oct 27, 2008 at 10:54:01AM +0200, Avi Kivity wrote: snip The hang was likely caused by vmx blocking INIT. Sigh. Avi, We boot kdump kernel with maxcpus=1. IIUC, in that code path we will not be using INIT. So did you try booting kdump kernel with maxcpus=1 and did it work for you? If not than problem could be something else. Just checked it here: I am passing maxcpus=1 to the kdump kernel. My full kexec commandline is this: /sbin/kexec --console-serial -p '--command-line=ro root=/dev/vg0/rawhide rhgb console=ttyS0,115200 ignore_loglevel irqpoll maxcpus=1 reset_devices earlyprintk=ttyS0,115200' --initrd=/boot/initrd-2.6.28-rc2kdump.img /boot/vmlinuz-2.6.28-rc2 Additional info: I don't get the I'm in purgatory message on serial console, when the system hangs. I do get the message if I trigger the crash dump after unloading the kvm-intel module, so the serial console is working. -- Eduardo -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/2] kvm: disable virtualization on kdump
Eduardo Habkost wrote: We boot kdump kernel with maxcpus=1. IIUC, in that code path we will not be using INIT. So did you try booting kdump kernel with maxcpus=1 and did it work for you? If not than problem could be something else. vmx also disallows changing some bits in cr0 and cr4; for example you can't switch off paging. This may interfere with the boot sequence. The full set of pleasant surprises is detailed in section 19.8 of the SDM volume 3, RESTRICTIONS ON VMX OPERATION. -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/2] kvm: disable virtualization on kdump
On Wed, Oct 29, 2008 at 06:10:06PM -0200, Eduardo Habkost wrote: On Mon, Oct 27, 2008 at 09:09:37AM -0400, Vivek Goyal wrote: On Mon, Oct 27, 2008 at 10:54:01AM +0200, Avi Kivity wrote: snip The hang was likely caused by vmx blocking INIT. Sigh. Avi, We boot kdump kernel with maxcpus=1. IIUC, in that code path we will not be using INIT. So did you try booting kdump kernel with maxcpus=1 and did it work for you? If not than problem could be something else. Just checked it here: I am passing maxcpus=1 to the kdump kernel. My full kexec commandline is this: /sbin/kexec --console-serial -p '--command-line=ro root=/dev/vg0/rawhide rhgb console=ttyS0,115200 ignore_loglevel irqpoll maxcpus=1 reset_devices earlyprintk=ttyS0,115200' --initrd=/boot/initrd-2.6.28-rc2kdump.img /boot/vmlinuz-2.6.28-rc2 Additional info: I don't get the I'm in purgatory message on serial console, when the system hangs. I do get the message if I trigger the crash dump after unloading the kvm-intel module, so the serial console is working. That means we are not even beginning to boot second kernel and are getting lost somewhere in the first kernel itself (most likely relocate_kernel_32/64.S). And it could be because of any of the interesting restrictions put when vmx is enabled (As Avi mentioned in another mail.). To debug that code, I used to put some outb messages to output a character to serial console and find out which instructions is creating trouble. You can debug to find out exactly what operation is not allowed when vmx is enabled or simply write the patches to disable vmx and it should automatically lead to problem resolution. Thanks Vivek -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/2] kvm: disable virtualization on kdump
Vivek Goyal [EMAIL PROTECTED] writes: On Wed, Oct 29, 2008 at 06:10:06PM -0200, Eduardo Habkost wrote: On Mon, Oct 27, 2008 at 09:09:37AM -0400, Vivek Goyal wrote: On Mon, Oct 27, 2008 at 10:54:01AM +0200, Avi Kivity wrote: snip The hang was likely caused by vmx blocking INIT. Sigh. Avi, We boot kdump kernel with maxcpus=1. IIUC, in that code path we will not be using INIT. So did you try booting kdump kernel with maxcpus=1 and did it work for you? If not than problem could be something else. Just checked it here: I am passing maxcpus=1 to the kdump kernel. My full kexec commandline is this: /sbin/kexec --console-serial -p '--command-line=ro root=/dev/vg0/rawhide rhgb console=ttyS0,115200 ignore_loglevel irqpoll maxcpus=1 reset_devices earlyprintk=ttyS0,115200' --initrd=/boot/initrd-2.6.28-rc2kdump.img /boot/vmlinuz-2.6.28-rc2 Additional info: I don't get the I'm in purgatory message on serial console, when the system hangs. I do get the message if I trigger the crash dump after unloading the kvm-intel module, so the serial console is working. That means we are not even beginning to boot second kernel and are getting lost somewhere in the first kernel itself (most likely relocate_kernel_32/64.S). And it could be because of any of the interesting restrictions put when vmx is enabled (As Avi mentioned in another mail.). To debug that code, I used to put some outb messages to output a character to serial console and find out which instructions is creating trouble. You can debug to find out exactly what operation is not allowed when vmx is enabled or simply write the patches to disable vmx and it should automatically lead to problem resolution. The transfer between kernels happens with virtual addresses identity mapped to physical addresses. For 32bit mode we implement this by disabling paging. Which apparently is one of the interesting restrictions placed of vmx. Eric -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/2] kvm: disable virtualization on kdump
Avi Kivity [EMAIL PROTECTED] writes: Eric W. Biederman wrote: Most of the reason I was wondering is that the cpu hardware probing largely seems to be a duplicate of what we have in the core for probing cpu capabilities already, and could likely be made smaller by building upon the existing codebase. We use the core cpuid functions, or are you referring to something else? I was referring to the arch/x86/kernel/cpu/* and arch/x86/include/asm/cpufeature.h The core functions that are responsible for detecting all of the cpu features, and disabling them if there are cpu errata. The usual pattern is that code does all of the magic detection logic and then the code that would use it would just need to test: cpu_has_vmx or cpu_has_svm. At least in part that allows us to show the working cpu features in /proc/cpuinfo. svm can writeback into memory at odd times if we don't do this, and the cost is small - clear a bit in EFER. There's no reason to be lazy. Especially if we can clear that bit unconditionally (when EFER is present) I'm all for it. That is the case. Cool. I forget if we have to test for EFER on 32bit, or if we can just wrmsr and be done with it. Regardless that sounds easy to do on the kexec path. Eric -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/2] kvm: disable virtualization on kdump
On Mon, Oct 27, 2008 at 10:32:43AM -0700, Eric W. Biederman wrote: Avi Kivity [EMAIL PROTECTED] writes: snip I wouldn't mind notifiers (with a nice comment explaining that you must know what you're doing, though that's the case with most kernel APIs). I'm fine with either approach. This is the 3rd request I have seen for a notifier. This is the first request I have seen for code that must be executed in the kexec on panic path. So history suggest to me that notifiers make it unreasonably easy to get code onto the kexec on panic code path. Occasionally the kexec on panic code path is tested to see how well it works in strange situations like being called from a stack overflow etc. The rest of the history is that previous attempts like lkcd had very programmer friendly interfaces, that worked fine in test environments giving beautiful core dumps, but when things broke in the field they were essentially useless. The kdump approach is still not completely reliable but it does work well enough that people get useful crash dumps sometimes. I feel anything that makes the kexec on panic code path harder to verify it will work when things are crazy broken, like a notifier is something we should avoid. I am still wondering if a simple function pointer (instead of a full notifier interface) would be good enough. It looks like a reasonable tradeoff. I think I will get flamed if I try to pull to the core a bunch of code that always lived in the KVM module. 8) And even if we pull those functions to the core, we will still have a function pointer on the new code anyway, because we would need to support vmx and svm. -- Eduardo -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/2] kvm: disable virtualization on kdump
Eduardo Habkost wrote: (we can use NMI IPIs, but that will likely be messy) NMI IPIs are already used on x86 native_machine_crash_shutdown(), so it wouldn't get more messy that it is currently. We just need to add another bit of code to the code that already runs on an NMI handler. That looks like the easiest (and best) way out. My question is: is a notifier chain too much complexity for a sensible piece of code like that? If so, a compile-time hook on that point would be safer, I think an unconditional vmx disable is wanted here, so kexec can work with other hypervisors. but then it wouldn't work when KVM is compiled as a out-of-tree module. The external module can do without. It's possible to hijack the nmi vector, but I don't think that's a good idea. If someone wants kexec+vmx on an older kernel, they can patch that kernel. But what happens when the kdump kernel reboots? If it is uniprocessor, it will never have a chance to disable vmx on other cpus. Using acpi reset (now default) works around this on some machines, but not all. Good point. My problem was a hang when booting the kdump kernel, but it may also cause problems later, when the kdump kernel reboots. The hang was likely caused by vmx blocking INIT. Sigh. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/2] kvm: disable virtualization on kdump
Eric W. Biederman wrote: But what happens when the kdump kernel reboots? If it is uniprocessor, it will never have a chance to disable vmx on other cpus. Using acpi reset (now default) works around this on some machines, but not all. Mostly likely any reboot path will trigger software to toggle the reset line on the board. At least that has been my experience, and the reason we don't see many problems when we fail to properly shutdown devices before reboot. If vmx persists across that it would seem to be very broken by design. We've observed on some machines that keyboard controller reset didn't work, and that the fallback to triple-fault asserted INIT and not RESET (and that this INIT was blocked, hanging the machine). Switching to acpi reset worked on some machines, but not all. Either acpi reset was not implemented on the failing machines, or it was wired to INIT and not RESET. My objections are: on_each_cpu called from after a panic looks like a good way to loose control of the box and never return. Adding a notifier looks like a good way too add functionality onto the kdump path that never gets reviewed for robustness after a kernel crash and thus a good way to remove the usefulness of the kexec on panic kernel path. Since we already have NMI IPIs, we could disable vmx there. If it is done unconditionally and without notifiers, there is no chance of having unreviewed surprises sneaking in. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/2] kvm: disable virtualization on kdump
Eric W. Biederman wrote: NMI IPIs are already used on x86 native_machine_crash_shutdown(), so it wouldn't get more messy that it is currently. We just need to add another bit of code to the code that already runs on an NMI handler. Yes. And handling of those NMIs is best effort. Nothing fails if they don't actually run. Unless someone can come up with another way to disable vmx remotely, that's going to change if you have vmx enabled. Well we could fairly easily have a non-modular function that does. if (vmx_present vmx_enabled) { turn_off_vmx(); } Which at first skim looks like it is all of about 10-20 machine instructions. There's no way to query whether vmx is enabled or disabled, AFAICT. So we have to execute vmxoff and ignore possible #UDs. If we trust the exception handlers, there's no problem. Otherwise we need to replace the current #UD handler with an iret (perhaps switching temporarily to another IDT). There are a few real places where we need code on the kdump path because there it is not possible to do the work any other way. However we need to think long and hard about that because placing the code anywhere besides in a broken and failing kernel is going to be easier to maintain and more reliable. vmx blocking INITs makes it impossible to leave this to the new kernel. I oppose an atomic notifier because it makes the review essentially impossible. If any module can come in and register a notifier we can't know what code is running on that code path and we can't be certain the code is safe in an abnormal case to run on that code path. What if it's a specialized notifier for kexec? Or even kexec_crash? That said, I have no issue with static code at the call site. Right now we only need to support vmx on the kdump path because of what appears to be a hardware design bug. Enabling vmx apparently disables standard functions like an INIT IPI. Things like this do happen but they should be rare. The general kexec path also wants this fixed. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/2] kvm: disable virtualization on kdump
On Mon, Oct 27, 2008 at 11:13:41AM +0200, Avi Kivity wrote: Eric W. Biederman wrote: NMI IPIs are already used on x86 native_machine_crash_shutdown(), so it wouldn't get more messy that it is currently. We just need to add another bit of code to the code that already runs on an NMI handler. Yes. And handling of those NMIs is best effort. Nothing fails if they don't actually run. Unless someone can come up with another way to disable vmx remotely, that's going to change if you have vmx enabled. Well we could fairly easily have a non-modular function that does. if (vmx_present vmx_enabled) { turn_off_vmx(); } Which at first skim looks like it is all of about 10-20 machine instructions. There's no way to query whether vmx is enabled or disabled, AFAICT. So we have to execute vmxoff and ignore possible #UDs. Oops. This means the notifier my patches add would break, if vmx is disabled on any CPU. Can't we just set a flag when we are about to enable vmx, so we run vmxoff only when know it's enabled? There will be a tiny window between setting this flag and and actually running vmxon where things could go wrong, but this doesn't look that bad. Having to handle #UD would make things more messy, in my opinion. BTW, is this problem vmx-specific? Do we need to do something similar for svm? If we trust the exception handlers, there's no problem. Otherwise we need to replace the current #UD handler with an iret (perhaps switching temporarily to another IDT). I think we can't fully trust anything if we are on the crash dump path, so the less code we depend on, the better. There are a few real places where we need code on the kdump path because there it is not possible to do the work any other way. However we need to think long and hard about that because placing the code anywhere besides in a broken and failing kernel is going to be easier to maintain and more reliable. vmx blocking INITs makes it impossible to leave this to the new kernel. I oppose an atomic notifier because it makes the review essentially impossible. If any module can come in and register a notifier we can't know what code is running on that code path and we can't be certain the code is safe in an abnormal case to run on that code path. What if it's a specialized notifier for kexec? Or even kexec_crash? The patches I've sent to the kvm mailing list added a notifier interface specific for kexec_crash, using raw_notifier_*(). IMO, if a notifier registration interface was acceptable, the raw notifiers would be good enough for that. But Eric seems to think that adding a notifier registration interface for the crash handler path wouldn't be a good idea, and I am starting to agree with him. That said, I have no issue with static code at the call site. Right now we only need to support vmx on the kdump path because of what appears to be a hardware design bug. Enabling vmx apparently disables standard functions like an INIT IPI. Things like this do happen but they should be rare. The general kexec path also wants this fixed. When I've tested it, kexec called the kvm reboot notifier, so everything worked fine. -- Eduardo -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/2] kvm: disable virtualization on kdump
On Mon, Oct 27, 2008 at 10:54:01AM +0200, Avi Kivity wrote: Eduardo Habkost wrote: (we can use NMI IPIs, but that will likely be messy) NMI IPIs are already used on x86 native_machine_crash_shutdown(), so it wouldn't get more messy that it is currently. We just need to add another bit of code to the code that already runs on an NMI handler. That looks like the easiest (and best) way out. My question is: is a notifier chain too much complexity for a sensible piece of code like that? If so, a compile-time hook on that point would be safer, I think an unconditional vmx disable is wanted here, so kexec can work with other hypervisors. but then it wouldn't work when KVM is compiled as a out-of-tree module. The external module can do without. It's possible to hijack the nmi vector, but I don't think that's a good idea. If someone wants kexec+vmx on an older kernel, they can patch that kernel. But what happens when the kdump kernel reboots? If it is uniprocessor, it will never have a chance to disable vmx on other cpus. Using acpi reset (now default) works around this on some machines, but not all. Good point. My problem was a hang when booting the kdump kernel, but it may also cause problems later, when the kdump kernel reboots. The hang was likely caused by vmx blocking INIT. Sigh. Avi, We boot kdump kernel with maxcpus=1. IIUC, in that code path we will not be using INIT. So did you try booting kdump kernel with maxcpus=1 and did it work for you? If not than problem could be something else. Thanks Vivek -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/2] kvm: disable virtualization on kdump
Eduardo Habkost wrote: Can't we just set a flag when we are about to enable vmx, so we run vmxoff only when know it's enabled? There will be a tiny window between setting this flag and and actually running vmxon where things could go wrong, but this doesn't look that bad. It makes more sense to have a vmxon api in the core; you call it, the kernel enables it and sets a flag; then either you or the core can disable it. Having to handle #UD would make things more messy, in my opinion. It's not too bad, either relying on exception handlers or hacking our own. BTW, is this problem vmx-specific? Do we need to do something similar for svm? svm needs it as well, since it shares some memory with the cpu. It's less critical though, will likely work even without it. If we trust the exception handlers, there's no problem. Otherwise we need to replace the current #UD handler with an iret (perhaps switching temporarily to another IDT). I think we can't fully trust anything if we are on the crash dump path, so the less code we depend on, the better. So we can point #UD temporarily at an 'addq $3, (%rsp); iret' for the vmxoff instruction. Or implement the 'enable virt extensions' API. The patches I've sent to the kvm mailing list added a notifier interface specific for kexec_crash, using raw_notifier_*(). IMO, if a notifier registration interface was acceptable, the raw notifiers would be good enough for that. But Eric seems to think that adding a notifier registration interface for the crash handler path wouldn't be a good idea, and I am starting to agree with him. I wouldn't mind notifiers (with a nice comment explaining that you must know what you're doing, though that's the case with most kernel APIs). I'm fine with either approach. The general kexec path also wants this fixed. When I've tested it, kexec called the kvm reboot notifier, so everything worked fine. Oh, okay. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/2] kvm: disable virtualization on kdump
Vivek Goyal wrote: The hang was likely caused by vmx blocking INIT. Sigh. Avi, We boot kdump kernel with maxcpus=1. IIUC, in that code path we will not be using INIT. So did you try booting kdump kernel with maxcpus=1 and did it work for you? If not than problem could be something else. It wasn't me who tried it, so I can't tell. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/2] kvm: disable virtualization on kdump
Avi Kivity [EMAIL PROTECTED] writes: There's no way to query whether vmx is enabled or disabled, AFAICT. So we have to execute vmxoff and ignore possible #UDs. If we trust the exception handlers, there's no problem. Otherwise we need to replace the current #UD handler with an iret (perhaps switching temporarily to another IDT). Ugh. We already change the IDT on that code path so that may be a way to go. The general kexec path also wants this fixed. It looks like someone hooked the reboot notifier which should be called on the normal kexec path. Eric -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/2] kvm: disable virtualization on kdump
On Mon, Oct 27, 2008 at 08:02:31AM -0700, Eric W. Biederman wrote: Avi Kivity [EMAIL PROTECTED] writes: Eric W. Biederman wrote: We've observed on some machines that keyboard controller reset didn't work, and that the fallback to triple-fault asserted INIT and not RESET (and that this INIT was blocked, hanging the machine). Switching to acpi reset worked on some machines, but not all. Either acpi reset was not implemented on the failing machines, or it was wired to INIT and not RESET. Right. INIT does not reset things like the MTRRs, I had forgotten that distinction. Frequently the firmware when it regains control at 0x0 asserts reset, but if we can't get there. Ouch! Since we already have NMI IPIs, we could disable vmx there. If it is done unconditionally and without notifiers, there is no chance of having unreviewed surprises sneaking in. Sounds good to me. The problem here is that we can't disable it unconditionally, so we need to either hack into #UD, or track on which CPUs it is enabled. But tracking on which CPUs it is enabled is exactly what KVM hardware_enable()/hardware_disable() do. To avoid hacking into #UD only for that, I was being inclined to simply add core code to track on which CPUs vmx is enabled. But just calling back into KVM doesn't look too bad, as long as the callback does only what is strictly necessary. It looks a bit better than moving kvm code to the core, and looks simple enough to be better than hacking #UD. Maybe we could have a simple virt-specific callback pointer instead of a whole notifier chain? -- Eduardo -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/2] kvm: disable virtualization on kdump
Avi Kivity [EMAIL PROTECTED] writes: Eduardo Habkost wrote: Can't we just set a flag when we are about to enable vmx, so we run vmxoff only when know it's enabled? There will be a tiny window between setting this flag and and actually running vmxon where things could go wrong, but this doesn't look that bad. It makes more sense to have a vmxon api in the core; you call it, the kernel enables it and sets a flag; then either you or the core can disable it. Yes. That sounds like the most maintainable approach. The patches I've sent to the kvm mailing list added a notifier interface specific for kexec_crash, using raw_notifier_*(). IMO, if a notifier registration interface was acceptable, the raw notifiers would be good enough for that. But Eric seems to think that adding a notifier registration interface for the crash handler path wouldn't be a good idea, and I am starting to agree with him. I wouldn't mind notifiers (with a nice comment explaining that you must know what you're doing, though that's the case with most kernel APIs). I'm fine with either approach. This is the 3rd request I have seen for a notifier. This is the first request I have seen for code that must be executed in the kexec on panic path. So history suggest to me that notifiers make it unreasonably easy to get code onto the kexec on panic code path. Occasionally the kexec on panic code path is tested to see how well it works in strange situations like being called from a stack overflow etc. The rest of the history is that previous attempts like lkcd had very programmer friendly interfaces, that worked fine in test environments giving beautiful core dumps, but when things broke in the field they were essentially useless. The kdump approach is still not completely reliable but it does work well enough that people get useful crash dumps sometimes. I feel anything that makes the kexec on panic code path harder to verify it will work when things are crazy broken, like a notifier is something we should avoid. Eric -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/2] kvm: disable virtualization on kdump
Avi Kivity [EMAIL PROTECTED] writes: Eric W. Biederman wrote: Why do we need to disable vmx mode before booting a normal linux kernel? vmx mode blocks INIT (even on the host; not just on the guests) *blink* broken hardware design there. so reboots don't work. It also assigns some memory to the cpu; if the new kernel isn't aware of it, Not a problem for a kdump kernel, as it lives out of a reserved region of memory. the cpu and the kernel would both think it belongs to them. Finally, if vmx mode is enabled, you can't start kvm on the new kernel. This isn't especially interesting in the crash dump scenario. Is it possible to disable vmx mode before we enable interrrupts in the kdump kernel? You need IPIs to disable vmx on smp. Thank you. Reading your description and taking a quick look at the code in hardware disable it does not appear that there is anything needed (other than restricting ourselves it running uniprocessor in the kdump kernel) that needs to happen. Certainly it would be nice to have kvm disabled in hardware, but if you are proposing using the existing hardware disable I must say that the cure looks much worse than the disease. It looks like the disable function is all of about 20 assembly instructions so I would not have a problem if he had a little inline function we could call that test to see if vmx is enabled and disable it in the case of kexec on panic. The normal polite shutdown. That just looks like asking for trouble. Eric -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/2] kvm: disable virtualization on kdump
Eric W. Biederman wrote: so reboots don't work. It also assigns some memory to the cpu; if the new kernel isn't aware of it, Not a problem for a kdump kernel, as it lives out of a reserved region of memory. But it is a problem for general kexec. Is it possible to disable vmx mode before we enable interrrupts in the kdump kernel? You need IPIs to disable vmx on smp. Thank you. Reading your description and taking a quick look at the code in hardware disable it does not appear that there is anything needed (other than restricting ourselves it running uniprocessor in the kdump kernel) that needs to happen. Certainly it would be nice to have kvm disabled in hardware, but if you are proposing using the existing hardware disable I must say that the cure looks much worse than the disease. Certainly you don't want to issue IPIs when kdump()ing. It's not unlikely that the other cpus have interrupts permanently disabled. (we can use NMI IPIs, but that will likely be messy) It looks like the disable function is all of about 20 assembly instructions so I would not have a problem if he had a little inline function we could call that test to see if vmx is enabled and disable it in the case of kexec on panic. The normal polite shutdown. That just looks like asking for trouble. But what happens when the kdump kernel reboots? If it is uniprocessor, it will never have a chance to disable vmx on other cpus. Using acpi reset (now default) works around this on some machines, but not all. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/2] kvm: disable virtualization on kdump
On Sun, Oct 26, 2008 at 05:07:45PM +0200, Avi Kivity wrote: Eric W. Biederman wrote: snip Is it possible to disable vmx mode before we enable interrrupts in the kdump kernel? You need IPIs to disable vmx on smp. Thank you. Reading your description and taking a quick look at the code in hardware disable it does not appear that there is anything needed (other than restricting ourselves it running uniprocessor in the kdump kernel) that needs to happen. Certainly it would be nice to have kvm disabled in hardware, but if you are proposing using the existing hardware disable I must say that the cure looks much worse than the disease. Certainly you don't want to issue IPIs when kdump()ing. It's not unlikely that the other cpus have interrupts permanently disabled. (we can use NMI IPIs, but that will likely be messy) NMI IPIs are already used on x86 native_machine_crash_shutdown(), so it wouldn't get more messy that it is currently. We just need to add another bit of code to the code that already runs on an NMI handler. My question is: is a notifier chain too much complexity for a sensible piece of code like that? If so, a compile-time hook on that point would be safer, but then it wouldn't work when KVM is compiled as a out-of-tree module. It looks like the disable function is all of about 20 assembly instructions so I would not have a problem if he had a little inline function we could call that test to see if vmx is enabled and disable it in the case of kexec on panic. The normal polite shutdown. That just looks like asking for trouble. But what happens when the kdump kernel reboots? If it is uniprocessor, it will never have a chance to disable vmx on other cpus. Using acpi reset (now default) works around this on some machines, but not all. Good point. My problem was a hang when booting the kdump kernel, but it may also cause problems later, when the kdump kernel reboots. -- Eduardo -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/2] kvm: disable virtualization on kdump
Avi Kivity [EMAIL PROTECTED] writes: Eric W. Biederman wrote: so reboots don't work. It also assigns some memory to the cpu; if the new kernel isn't aware of it, Not a problem for a kdump kernel, as it lives out of a reserved region of memory. But it is a problem for general kexec. Agreed. We certainly want to cleanly disable vmx on a normal kexec reboot. Is it possible to disable vmx mode before we enable interrrupts in the kdump kernel? You need IPIs to disable vmx on smp. Thank you. Reading your description and taking a quick look at the code in hardware disable it does not appear that there is anything needed (other than restricting ourselves it running uniprocessor in the kdump kernel) that needs to happen. Certainly it would be nice to have kvm disabled in hardware, but if you are proposing using the existing hardware disable I must say that the cure looks much worse than the disease. Certainly you don't want to issue IPIs when kdump()ing. It's not unlikely that the other cpus have interrupts permanently disabled. (we can use NMI IPIs, but that will likely be messy) It looks like the disable function is all of about 20 assembly instructions so I would not have a problem if he had a little inline function we could call that test to see if vmx is enabled and disable it in the case of kexec on panic. The normal polite shutdown. That just looks like asking for trouble. But what happens when the kdump kernel reboots? If it is uniprocessor, it will never have a chance to disable vmx on other cpus. Using acpi reset (now default) works around this on some machines, but not all. Mostly likely any reboot path will trigger software to toggle the reset line on the board. At least that has been my experience, and the reason we don't see many problems when we fail to properly shutdown devices before reboot. If vmx persists across that it would seem to be very broken by design. In any case if reboot fails after a kdump that is a minor thing. Someone can always push the power button or the equivalent. My objections are: on_each_cpu called from after a panic looks like a good way to loose control of the box and never return. Adding a notifier looks like a good way too add functionality onto the kdump path that never gets reviewed for robustness after a kernel crash and thus a good way to remove the usefulness of the kexec on panic kernel path. Eric -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/2] kvm: disable virtualization on kdump
Eduardo Habkost [EMAIL PROTECTED] writes: On Sun, Oct 26, 2008 at 05:07:45PM +0200, Avi Kivity wrote: Eric W. Biederman wrote: snip Is it possible to disable vmx mode before we enable interrrupts in the kdump kernel? You need IPIs to disable vmx on smp. Thank you. Reading your description and taking a quick look at the code in hardware disable it does not appear that there is anything needed (other than restricting ourselves it running uniprocessor in the kdump kernel) that needs to happen. Certainly it would be nice to have kvm disabled in hardware, but if you are proposing using the existing hardware disable I must say that the cure looks much worse than the disease. Certainly you don't want to issue IPIs when kdump()ing. It's not unlikely that the other cpus have interrupts permanently disabled. (we can use NMI IPIs, but that will likely be messy) NMI IPIs are already used on x86 native_machine_crash_shutdown(), so it wouldn't get more messy that it is currently. We just need to add another bit of code to the code that already runs on an NMI handler. Yes. And handling of those NMIs is best effort. Nothing fails if they don't actually run. My question is: is a notifier chain too much complexity for a sensible piece of code like that? If so, a compile-time hook on that point would be safer, but then it wouldn't work when KVM is compiled as a out-of-tree module. Well we could fairly easily have a non-modular function that does. if (vmx_present vmx_enabled) { turn_off_vmx(); } Which at first skim looks like it is all of about 10-20 machine instructions. There are a few real places where we need code on the kdump path because there it is not possible to do the work any other way. However we need to think long and hard about that because placing the code anywhere besides in a broken and failing kernel is going to be easier to maintain and more reliable. I oppose an atomic notifier because it makes the review essentially impossible. If any module can come in and register a notifier we can't know what code is running on that code path and we can't be certain the code is safe in an abnormal case to run on that code path. Right now we only need to support vmx on the kdump path because of what appears to be a hardware design bug. Enabling vmx apparently disables standard functions like an INIT IPI. Things like this do happen but they should be rare. Good point. My problem was a hang when booting the kdump kernel, but it may also cause problems later, when the kdump kernel reboots. What was the cause of the hang when booting the kdump kernel? Eric -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/2] kvm: disable virtualization on kdump
On Thu, Oct 23, 2008 at 10:28:24AM +1100, Simon Horman wrote: On Mon, Oct 20, 2008 at 01:01:32PM -0200, Eduardo Habkost wrote: The following two patches should make kdump work when the kvm-intel module is loaded. We need to disable vmx mode before booting the kdump kernel, so I've introduced a notifier interface where KVM can hook and disable virtualization on all CPUs just before they are halted. It has the same purpose of the KVM reboot notifier that gets executed at kexec-time. But on the kdump case, things are not as simple because the kernel has just crashed. The notifier interface being introduced is x86-specific. I don't know if an arch-independent interface would be more appropriate for this case. It was tested only using kvm-intel. Testing on different machines is welcome. These changes look fine to me from a kexec/kdump point of view. Reviewed-by: Simon Horman [EMAIL PROTECTED] Thanks. Considering they touch both KVM and kexec, which tree would be best way to get them in? (Avi: the patches were sent only to kexec and kvm mailing lists, initially. If it's better to submit them to your address also so it gets on your queue, please let me know) -- Eduardo -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/2] kvm: disable virtualization on kdump
[ Added Andrew Morton, Eric Biederman, Vivek Goyal and Haren Myneni to CC ] On Thu, Oct 23, 2008 at 05:41:29PM -0200, Eduardo Habkost wrote: On Thu, Oct 23, 2008 at 10:28:24AM +1100, Simon Horman wrote: On Mon, Oct 20, 2008 at 01:01:32PM -0200, Eduardo Habkost wrote: The following two patches should make kdump work when the kvm-intel module is loaded. We need to disable vmx mode before booting the kdump kernel, so I've introduced a notifier interface where KVM can hook and disable virtualization on all CPUs just before they are halted. It has the same purpose of the KVM reboot notifier that gets executed at kexec-time. But on the kdump case, things are not as simple because the kernel has just crashed. The notifier interface being introduced is x86-specific. I don't know if an arch-independent interface would be more appropriate for this case. It was tested only using kvm-intel. Testing on different machines is welcome. These changes look fine to me from a kexec/kdump point of view. Reviewed-by: Simon Horman [EMAIL PROTECTED] Thanks. Considering they touch both KVM and kexec, which tree would be best way to get them in? As I understand it, there is no kexec tree as such, rather patches either get picked up by an arch tree or Andrew Morton. I am happy to create and maintain a kexec tree if there is a need. But in this case it seems that using the KVM tree would be best. (Avi: the patches were sent only to kexec and kvm mailing lists, initially. If it's better to submit them to your address also so it gets on your queue, please let me know) I won't speak for Avi, but usually its good to CC the maintainer. -- Simon Horman VA Linux Systems Japan K.K., Sydney, Australia Satellite Office H: www.vergenet.net/~horms/ W: www.valinux.co.jp/en -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/2] kvm: disable virtualization on kdump
Simon Horman [EMAIL PROTECTED] writes: [ Added Andrew Morton, Eric Biederman, Vivek Goyal and Haren Myneni to CC ] On Thu, Oct 23, 2008 at 05:41:29PM -0200, Eduardo Habkost wrote: On Thu, Oct 23, 2008 at 10:28:24AM +1100, Simon Horman wrote: On Mon, Oct 20, 2008 at 01:01:32PM -0200, Eduardo Habkost wrote: The following two patches should make kdump work when the kvm-intel module is loaded. We need to disable vmx mode before booting the kdump kernel, so I've introduced a notifier interface where KVM can hook and disable virtualization on all CPUs just before they are halted. It has the same purpose of the KVM reboot notifier that gets executed at kexec-time. But on the kdump case, things are not as simple because the kernel has just crashed. The notifier interface being introduced is x86-specific. I don't know if an arch-independent interface would be more appropriate for this case. My preference would be to have a magic function call that compiles out when kvm isn't present. This is a code path that is hard to audit and test, and get right. A notifier chain seems to make a proper audit all but impossible. Why do we need to disable vmx mode before booting a normal linux kernel? Is it possible to disable vmx mode before we enable interrrupts in the kdump kernel? Eric -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/2] kvm: disable virtualization on kdump
On Mon, Oct 20, 2008 at 01:01:32PM -0200, Eduardo Habkost wrote: The following two patches should make kdump work when the kvm-intel module is loaded. We need to disable vmx mode before booting the kdump kernel, so I've introduced a notifier interface where KVM can hook and disable virtualization on all CPUs just before they are halted. It has the same purpose of the KVM reboot notifier that gets executed at kexec-time. But on the kdump case, things are not as simple because the kernel has just crashed. The notifier interface being introduced is x86-specific. I don't know if an arch-independent interface would be more appropriate for this case. It was tested only using kvm-intel. Testing on different machines is welcome. These changes look fine to me from a kexec/kdump point of view. Reviewed-by: Simon Horman [EMAIL PROTECTED] -- Simon Horman VA Linux Systems Japan K.K., Sydney, Australia Satellite Office H: www.vergenet.net/~horms/ W: www.valinux.co.jp/en -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html