On Tue, 18 Sep 2012 14:41:53 +0200 Jan Kiszka <jan.kis...@siemens.com> wrote:
> On 2012-09-18 14:23, Markus Armbruster wrote: > > Jan Kiszka <jan.kis...@siemens.com> writes: > > > >> On 2012-09-18 03:52, Wen Congyang wrote: > >>> At 09/18/2012 01:56 AM, Luiz Capitulino Wrote: > >>>> Hi Wen, > >>>> > >>>> We've re-reviewed the dump-guest-memory command and found some > >>>> possible issues with the -p option. > >>>> > >>>> The main issue is that it seems possible for a malicious guest to set > >>>> page tables in a way that we allocate a MemoryMapping structure for > >>>> each possible PTE. If IA-32e paging is used, this could lead to the > >>>> allocation of dozens of gigabytes by qemu. > >>>> > >>>> Of course that this is not expected for the regular case, where a > >>>> MemoryMapping allocation can be skipped for several reasons (I/O memory, > >>>> page not present, contiguous/in same range addresses etc), but the > >>>> point is what a malicious guest can do. > >>>> > >>>> Another problem is that the -p option seems to be broken for SMP guests. > >>>> The problem is in qemu_get_guest_memory_mapping(): > >>>> > >>>> first_paging_enabled_cpu = find_paging_enabled_cpu(first_cpu); > >>>> if (first_paging_enabled_cpu) { > >>>> for (env = first_paging_enabled_cpu; env != NULL; env = > >>>> env->next_cpu) { > >>>> ret = cpu_get_memory_mapping(list, env); > >>>> if (ret < 0) { > >>>> return -1; > >>>> } > >>>> } > >>>> return 0; > >>>> } > >>>> > >>>> This looks for the first vCPU with paging enabled, and then assumes > >>>> that all the following vCPUs also have paging enabled. How does this > >>>> hold? > >> > >> cpu_get_memory_mapping re-validates that paging is one. In fact, > >> cpu_get_memory_mapping should handle both cases so that the generic code > >> need not worry about paging on/off. > > > > The loop Luiz quoted is confusing. > > > > Actually, the whole function is confusing. Here's how I understand it: > > > > if there is a CPU that has paging enabled > > there is a proper prefix of env whose members don't have paging > > enabled; ignore them all [WTF#1] > > for all members of env not in that prefix ("the suffix"): > > get memory mapping for a CPU with paging enabled [WTF#2], > > and merge it into list > > else > > get memory mapping for ram_list, > > and merge it into list > > > > WTF#1: Why is it okay to ignore leading CPUs with paging disabled, but > > only if there's at least one CPU with paging enabled? > > > > WTF#2: What if a CPU in the suffix doesn't have paging enabled? Oh, the > > arch-specific function to get its memory map is expected to do nothing > > then. > > > > Bonus WTF#3: What if a guest enables/disables paging between > > find_paging_enabled_cpu() and the loop? What if it changes page tables > > while we walk them? > > In fact, the dump should be taken in a consistent state, means it should > run synchronously /wrt at least the CPU we refer to. So we need to run > the dump over the VCPU thread or with that VCPU stopped. This command stops all vCPUs, so yes, you're right here. Any idea about WTF#1? > > WTF is this function supposed to do? > > Associate virtual and physical addresses for the whole machine at a > given time. The picture is not fully consistent as we cannot express yet > that different CPUs have different views on memory. IIRC, the first view > is taken, the rests are dropped - correct me if I'm wrong, Wen. > > > > >>>> Assuming that this last issue is fixable (ie. we can make the -p > >>>> option work well with SMP guests), we should at least document that > >>>> -p can make QEMU allocates lots of memory and end up being killed > >>>> by the OS. > >>>> > >>>> However, I also think that we should consider if having the -p > >>>> feature is really worth it. It's a complex feature and has a number > >>>> of limitations*. If libvirt doesn't use this, dropping it shouldn't > >>>> be a big deal (we can return an error when -p is used). > >> > >> libvirt should surely not be the only reference for debugging features. > > > > No, it's just a user, albeit an important one. We don't break known > > users cavalierly. > > That is not what is being discussed here. It's about dropping a feature > because that one user doesn't expose it. No, let me clarify. First, what's being discussed is whether or not to drop an unsafe feature. Having an important user relying on the feature would be a strong indication that feature should not be dropped. As it turns out, libvirt is the only known open-source project that is making use of this feature (although they don't use the -p option). We'd give the same importance to any other project that makes themselves heard. Of course, the rule is never to drop anything. There are exceptions though, and this is one of them as it puts the host in danger. > >>>> > >>>> * The issues discussed in this email plus the fact that the guest > >>>> memory may be corrupted, and the guest may be in real-mode even > >>>> when paging is enabled > >>>> > >>> > >>> Yes, there are some limitations with this option. Jan said that he > >>> always use gdb to deal with vmcore, so he needs such information. > >> > >> The point is to overcome the focus on Linux-only dump processing tools. > > > > While I don't care for supporting alternate dump processing tools > > myself, I certainly don't mind supporting them, as long as the code > > satisfies basic safety and reliability requirements. > > > > This code doesn't, as far as I can tell. > > It works, thought not under all circumstances. That would be fine if the worst case would a mangled dump file. > > If that's correct, we should either rip it out until a satisfactory > > replacemnt is available, or at least document -p as unsafe and > > unreliable debugging feature (master & stable). > > > >> I'm sure the memory allocation can be avoided by writing out any found > >> virt->phys mapping directly to the vmcore file. We know where physical > >> RAM will be, we only need the corresponding virtual addresses - IIUC. So > >> first prepare the section according to the guest's RAM size and then, > >> once we identified a page while walking the tables carefully, seek to > >> that file position and write to it. > > > > Sounds like a non-trivial change from the current code. Makes me lean > > towards ripping it out. > > We are in early 1.3 cycle. There is surely no need to rip something > useful out based on these arguments. I'd agree if we fix it and the fix is backportable to -stable. Although I have mentioned documenting it as a possible solution, I'm now unsure if this is feasible, as we just can't allow qemu to screw up the host. Btw, I've thought about another solution, although it can turn out to be as bad as dropping the feature. Jan, do you intend to use this feature through the mngt stack or do you use it for debugging purposes only? If it's the latter case, then maybe we could rip -p from dump-guest-memory and: 1. Add a new qmp command namespace for unstable/debugging-only commands. This should be done through a vendor extension, so maybe something like qemu.org- 2. Add qemu.org-dump-guest-memory-gdb, which does what -p does today