On 11/09/16 15:47, Daniel P. Berrange wrote:
> On Wed, Nov 09, 2016 at 01:20:51PM +0100, Andrew Jones wrote:
>> On Wed, Nov 09, 2016 at 11:58:19AM +0000, Daniel P. Berrange wrote:
>>> On Wed, Nov 09, 2016 at 12:48:09PM +0100, Andrew Jones wrote:
>>>> On Wed, Nov 09, 2016 at 11:37:35AM +0000, Daniel P. Berrange wrote:
>>>>> On Wed, Nov 09, 2016 at 12:26:17PM +0100, Laszlo Ersek wrote:
>>>>>> On 11/09/16 11:40, Andrew Jones wrote:
>>>>>>> On Wed, Nov 09, 2016 at 11:01:46AM +0800, Dave Young wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> The latest Linux kernel enables KASLR to randomize phys/virt memory
>>>>>>>> addresses; we made some effort to support kexec/kdump so that the
>>>>>>>> crash utility still works when the crashed kernel has KASLR enabled.
>>>>>>>>
>>>>>>>> But according to Dave Anderson, virsh dump does not work; quoted
>>>>>>>> message from Dave below:
>>>>>>>>
>>>>>>>> """
>>>>>>>> With virsh dump, there's no way of even knowing that KASLR
>>>>>>>> has randomized the kernel __START_KERNEL_map region, because there
>>>>>>>> is no virtual address information -- e.g., like "SYMBOL(_stext)" in
>>>>>>>> the kdump vmcoreinfo data to compare against the vmlinux file symbol
>>>>>>>> value. Unless virsh dump can export some basic virtual memory data,
>>>>>>>> which they say it can't, I don't see how KASLR can ever be supported.
>>>>>>>> """
>>>>>>>>
>>>>>>>> I assume virsh dump is using QEMU's guest memory dump facility, so
>>>>>>>> it should first be addressed in QEMU; thus I'm posting this query to
>>>>>>>> the qemu-devel list. If this is not correct, please let me know.
>>>>>>>>
>>>>>>>> Could you QEMU dump people make it work? Otherwise we cannot support
>>>>>>>> virsh dump as long as KASLR is enabled. The latest Fedora kernel has
>>>>>>>> enabled it on x86_64.
>>>>>>>>
>>>>>>>
>>>>>>> When the -kernel command line option is used, it may be possible
>>>>>>> to extract some information that could be used to supplement the
>>>>>>> memory dump that dump-guest-memory provides. However, that would be
>>>>>>> a specific use. In general, QEMU knows nothing about the guest
>>>>>>> kernel. It doesn't know where it is in the disk image, and it
>>>>>>> doesn't even know if it's Linux.
>>>>>>>
>>>>>>> Is there anything a guest userspace application could probe from
>>>>>>> e.g. /proc that would work? If so, then the guest agent could gain
>>>>>>> a new feature providing that.
>>>>>>
>>>>>> I fully agree. This is exactly what I suggested too, independently,
>>>>>> in the downstream thread, before arriving at this upstream thread.
>>>>>> Let me quote that email:
>>>>>>
>>>>>> On 11/09/16 12:09, Laszlo Ersek wrote:
>>>>>>> [...] the dump-guest-memory QEMU command supports an option called
>>>>>>> "paging". Here's its documentation, from the "qapi-schema.json"
>>>>>>> source file:
>>>>>>>
>>>>>>>> # @paging: if true, do paging to get guest's memory mapping. This allows
>>>>>>>> #          using gdb to process the core file.
>>>>>>>> #
>>>>>>>> #          IMPORTANT: this option can make QEMU allocate several gigabytes
>>>>>>>> #                     of RAM. This can happen for a large guest, or a
>>>>>>>> #                     malicious guest pretending to be large.
>>>>>>>> #
>>>>>>>> #          Also, paging=true has the following limitations:
>>>>>>>> #
>>>>>>>> #             1. The guest may be in a catastrophic state or can have
>>>>>>>> #                corrupted memory, which cannot be trusted
>>>>>>>> #             2. The guest can be in real-mode even if paging is
>>>>>>>> #                enabled. For example, the guest uses ACPI to sleep,
>>>>>>>> #                and ACPI sleep state goes in real-mode
>>>>>>>> #             3. Currently only supported on i386 and x86_64.
>>>>>>>> #
>>>>>>>
>>>>>>> "virsh dump --memory-only" sets paging=false, for obvious reasons.
>>>>>>>
>>>>>>> [...]
>>>>>>> the dump-guest-memory command provides a raw snapshot of the
>>>>>>> virtual machine's memory (and of the registers of the VCPUs); it is
>>>>>>> not enlightened about the guest.
>>>>>>>
>>>>>>> If the additional information you are looking for can be retrieved
>>>>>>> within the running Linux guest, using an appropriately privileged
>>>>>>> userspace process, then I would recommend considering an extension
>>>>>>> to the qemu guest agent. The management layer (libvirt, [...]) could
>>>>>>> first invoke the guest agent (a process with root privileges running
>>>>>>> in the guest) from the host side, through virtio-serial. The new
>>>>>>> guest agent command would return the information necessary to deal
>>>>>>> with KASLR. Then the management layer would initiate the dump like
>>>>>>> always. Finally, the extra information would be combined with (or
>>>>>>> placed beside) the dump file in some way.
>>>>>>>
>>>>>>> So, this proposal would affect the guest agent and the management
>>>>>>> layer (= libvirt).
>>>>>>
>>>>>> Given that we already dislike "paging=true", enlightening
>>>>>> dump-guest-memory with even more guest-specific insight is the wrong
>>>>>> approach, IMO. That kind of knowledge belongs to the guest agent.
>>>>>
>>>>> If you're trying to debug a hung/panicked guest, then using a guest
>>>>> agent to fetch info is a complete non-starter as it'll be dead.
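(Editorial aside: the piece of data such a guest-agent command would need to report is small -- essentially the KASLR slide, i.e. the difference between a kernel text symbol's runtime address, which a privileged guest process can read from /proc/kallsyms when kptr_restrict permits, and that symbol's link-time value in the vmlinux file. A hedged sketch of the arithmetic, using made-up addresses:)

```python
def stext_from_kallsyms(text):
    """Find the runtime address of _stext in /proc/kallsyms-style output."""
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[2] == "_stext":
            return int(fields[0], 16)
    raise LookupError("_stext not found (kptr_restrict may be hiding it)")

def kaslr_slide(runtime_addr, vmlinux_addr):
    """KASLR slide = runtime address - compile-time (vmlinux) address."""
    return runtime_addr - vmlinux_addr

# Illustrative values only; real ones come from the guest and the vmlinux file.
sample = "ffffffff9e200000 T _stext\nffffffff9e200040 T do_one_initcall"
print(hex(kaslr_slide(stext_from_kallsyms(sample), 0xffffffff81000000)))
# prints 0x1d200000
```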
Yes, I realized this a while after posting...

>>>> So don't wait. Management software can make this query immediately
>>>> after the guest agent goes live. The information needed won't change.

... and then figured this would solve the problem.

>>> That doesn't help with trying to diagnose a crash during boot up, since
>>> the guest agent isn't running till fairly late. I'm also concerned that
>>> the QEMU guest agent is likely to be far from widely deployed in guests,

I have no hard data, but from the recent Fedora and RHEL-7 guest
installations I've done, it seems like qga is installed automatically.
(Not sure if that's because Anaconda realizes it's installing the OS in
a VM.) Once I made sure there was an appropriate virtio-serial config in
the domain XMLs, I could talk to the agents (mainly for fstrim's sake)
immediately.

>>> so reliance on the guest agent will mean the dump facility is no
>>> longer reliably available.
>>>
>>
>> It'd still be reliably available and usable during early boot, just
>> like it is now, for kernels that don't use KASLR. This proposal is only
>> attempting to *also* address KASLR kernels, for which there is
>> currently no support whatsoever. Call it a best-effort.
>>
>> Of course we can get support for [probably] early boot and
>> guest-agent-less guests using KASLR too if we introduce a paravirt
>> solution, requiring guest kernel and KVM changes. Is it worth it?
>
> There's a standard for persistent storage that is intended to allow
> the kernel to dump out data at time of crash:
>
> https://lwn.net/Articles/434821/
>
> and there are some recent patches to provide a QEMU backend. Could we
> leverage that facility to get the data we need from the guest kernel?
>
> Instead of only using pstore at time of crash, the kernel could see
> that it's running on KVM, and write out the paging data to pstore. So
> when QEMU later generates a core dump, it can grab the corresponding
> data from the pstore backend?
>
> Still requires an extra device to be configured, but at least we would
> not have to invent yet another paravirt device ourselves; we'd just use
> the existing framework.

Not disagreeing; I'd just like to point out that the kernel can also
crash before the extra device (the pstore driver) is configured
(especially if the driver is built as a module).

Laszlo
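(Editorial aside: if the guest kernel did write its KASLR slide into a pstore record, the host-side consumer would stay trivial. A minimal sketch, assuming -- purely for illustration -- a record carrying a "KERNELOFFSET=<hex>" field similar to the one kdump already exposes via vmcoreinfo; the actual record layout would be whatever the guest kernel and the QEMU pstore backend agree on:)

```python
import re

def kernel_offset_from_record(record: bytes):
    """Pull a KERNELOFFSET=<hex> field out of a raw pstore record.

    The field name and layout are assumptions for this sketch, not an
    existing on-disk format.
    """
    m = re.search(rb"KERNELOFFSET=([0-9a-fA-F]+)", record)
    return int(m.group(1), 16) if m else None

print(hex(kernel_offset_from_record(b"...\nKERNELOFFSET=1d200000\n...")))
# prints 0x1d200000
```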