On Mon, Nov 14, 2016 at 01:32:56PM +0800, Dave Young wrote:
> On 11/09/16 at 04:38pm, Laszlo Ersek wrote:
> > On 11/09/16 15:47, Daniel P. Berrange wrote:
> > > On Wed, Nov 09, 2016 at 01:20:51PM +0100, Andrew Jones wrote:
> > >> On Wed, Nov 09, 2016 at 11:58:19AM +0000, Daniel P. Berrange wrote:
> > >>> On Wed, Nov 09, 2016 at 12:48:09PM +0100, Andrew Jones wrote:
> > >>>> On Wed, Nov 09, 2016 at 11:37:35AM +0000, Daniel P. Berrange wrote:
> > >>>>> On Wed, Nov 09, 2016 at 12:26:17PM +0100, Laszlo Ersek wrote:
> > >>>>>> On 11/09/16 11:40, Andrew Jones wrote:
> > >>>>>>> On Wed, Nov 09, 2016 at 11:01:46AM +0800, Dave Young wrote:
> > >>>>>>>> Hi,
> > >>>>>>>>
> > >>>>>>>> The latest Linux kernel enables KASLR to randomize phys/virt
> > >>>>>>>> memory addresses; we have put some effort into kexec/kdump
> > >>>>>>>> support so that the crash utility still works when the crashed
> > >>>>>>>> kernel has KASLR enabled.
> > >>>>>>>>
> > >>>>>>>> But according to Dave Anderson, virsh dump does not work; quoted
> > >>>>>>>> message from Dave below:
> > >>>>>>>>
> > >>>>>>>> """
> > >>>>>>>> With virsh dump, there's no way of even knowing that KASLR
> > >>>>>>>> has randomized the kernel __START_KERNEL_map region, because
> > >>>>>>>> there is no virtual address information -- e.g., like
> > >>>>>>>> "SYMBOL(_stext)" in the kdump vmcoreinfo data to compare against
> > >>>>>>>> the vmlinux file symbol value. Unless virsh dump can export some
> > >>>>>>>> basic virtual memory data, which they say it can't, I don't see
> > >>>>>>>> how KASLR can ever be supported.
> > >>>>>>>> """
> > >>>>>>>>
> > >>>>>>>> I assume virsh dump is using the qemu guest memory dump
> > >>>>>>>> facility, so it should be addressed in qemu first; thus I am
> > >>>>>>>> posting this query to the qemu-devel list. If this is not
> > >>>>>>>> correct, please let me know.
> > >>>>>>>>
> > >>>>>>>> Could you qemu dump people make it work?
> > >>>>>>>> Or do we simply not support virsh dump as long as KASLR is
> > >>>>>>>> enabled? The latest Fedora kernel has enabled it on x86_64.
> > >>>>>>>>
> > >>>>>>>
> > >>>>>>> When the -kernel command line option is used, it may be possible
> > >>>>>>> to extract some information that could be used to supplement the
> > >>>>>>> memory dump that dump-guest-memory provides. However, that would
> > >>>>>>> be a specific use. In general, QEMU knows nothing about the guest
> > >>>>>>> kernel. It doesn't know where it is in the disk image, and it
> > >>>>>>> doesn't even know if it's Linux.
> > >>>>>>>
> > >>>>>>> Is there anything a guest userspace application could probe from
> > >>>>>>> e.g. /proc that would work? If so, then the guest agent could
> > >>>>>>> gain a new feature providing that.
> > >>>>>>
> > >>>>>> I fully agree. This is exactly what I suggested too,
> > >>>>>> independently, in the downstream thread, before arriving at this
> > >>>>>> upstream thread. Let me quote that email:
> > >>>>>>
> > >>>>>> On 11/09/16 12:09, Laszlo Ersek wrote:
> > >>>>>>> [...] the dump-guest-memory QEMU command supports an option
> > >>>>>>> called "paging". Here's its documentation, from the
> > >>>>>>> "qapi-schema.json" source file:
> > >>>>>>>
> > >>>>>>>> # @paging: if true, do paging to get guest's memory mapping.
> > >>>>>>>> #          This allows using gdb to process the core file.
> > >>>>>>>> #
> > >>>>>>>> #          IMPORTANT: this option can make QEMU allocate several
> > >>>>>>>> #                     gigabytes of RAM. This can happen for a
> > >>>>>>>> #                     large guest, or a malicious guest
> > >>>>>>>> #                     pretending to be large.
> > >>>>>>>> #
> > >>>>>>>> #          Also, paging=true has the following limitations:
> > >>>>>>>> #
> > >>>>>>>> #          1. The guest may be in a catastrophic state or can
> > >>>>>>>> #             have corrupted memory, which cannot be trusted
> > >>>>>>>> #          2. The guest can be in real-mode even if paging is
> > >>>>>>>> #             enabled. For example, the guest uses ACPI to
> > >>>>>>>> #             sleep, and ACPI sleep state goes in real-mode
> > >>>>>>>> #          3. Currently only supported on i386 and x86_64.
> > >>>>>>>> #
> > >>>>>>>
> > >>>>>>> "virsh dump --memory-only" sets paging=false, for obvious
> > >>>>>>> reasons.
> > >>>>>>>
> > >>>>>>> [...] the dump-guest-memory command provides a raw snapshot of
> > >>>>>>> the virtual machine's memory (and of the registers of the
> > >>>>>>> VCPUs); it is not enlightened about the guest.
> > >>>>>>>
> > >>>>>>> If the additional information you are looking for can be
> > >>>>>>> retrieved within the running Linux guest, using an appropriately
> > >>>>>>> privileged userspace process, then I would recommend considering
> > >>>>>>> an extension to the qemu guest agent. The management layer
> > >>>>>>> (libvirt, [...]) could first invoke the guest agent (a process
> > >>>>>>> with root privileges running in the guest) from the host side,
> > >>>>>>> through virtio-serial. The new guest agent command would return
> > >>>>>>> the information necessary to deal with KASLR. Then the
> > >>>>>>> management layer would initiate the dump like always. Finally,
> > >>>>>>> the extra information would be combined with (or placed beside)
> > >>>>>>> the dump file in some way.
> > >>>>>>>
> > >>>>>>> So, this proposal would affect the guest agent and the
> > >>>>>>> management layer (= libvirt).
> > >>>>>>
> > >>>>>> Given that we already dislike "paging=true", enlightening
> > >>>>>> dump-guest-memory with even more guest-specific insight is the
> > >>>>>> wrong approach, IMO. That kind of knowledge belongs to the guest
> > >>>>>> agent.
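[For concreteness, the guest-side probe being proposed could be as small as the sketch below: derive the KASLR slide by comparing the runtime `_stext` address (in the `/proc/kallsyms` format, readable by a root process) against the link-time `_stext` value from the vmlinux file. The function names are illustrative, not existing qemu-ga API.]

```python
# Sketch of a guest-side KASLR probe (hypothetical helper names, not
# qemu-ga code). A root process can read /proc/kallsyms; the vmlinux
# _stext value would come from the host side (e.g. via nm on vmlinux).

def find_symbol(kallsyms_text, name):
    """Return the address of symbol `name` from kallsyms-formatted text
    ("ADDRESS TYPE NAME" per line)."""
    for line in kallsyms_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[2] == name:
            return int(fields[0], 16)
    raise KeyError(name)

def kaslr_slide(kallsyms_text, vmlinux_stext):
    """KASLR slide = runtime _stext minus link-time _stext."""
    return find_symbol(kallsyms_text, "_stext") - vmlinux_stext
```

[A guest agent command would only need to report the runtime `_stext` value; the subtraction can equally happen on the host, next to the dump file.]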
> > >>>>>
> > >>>>> If you're trying to debug a hung/panicked guest, then using a
> > >>>>> guest agent to fetch info is a complete non-starter, as it'll be
> > >>>>> dead.
> >
> > Yes, I realized this a while after posting...
> >
> > >>>> So don't wait. Management software can make this query immediately
> > >>>> after the guest agent goes live. The information needed won't
> > >>>> change.
> >
> > ... and then figured this would solve the problem.
> >
> > >>> That doesn't help with trying to diagnose a crash during boot up,
> > >>> since the guest agent isn't running till fairly late. I'm also
> > >>> concerned that the QEMU guest agent is likely to be far from widely
> > >>> deployed in guests,
> >
> > I have no hard data, but from the recent Fedora and RHEL-7 guest
> > installations I've done, it seems like qga is installed automatically.
> > (Not sure if that's because Anaconda realizes it's installing the OS in
> > a VM.) Once I made sure there was an appropriate virtio-serial config
> > in the domain XMLs, I could talk to the agents (mainly for fstrim's
> > sake) immediately.
> >
> > >>> so reliance on the guest agent will mean the dump facility is no
> > >>> longer reliably available.
> > >>>
> > >>
> > >> It'd still be reliably available and usable during early boot, just
> > >> like it is now, for kernels that don't use KASLR. This proposal is
> > >> only attempting to *also* address KASLR kernels, for which there is
> > >> currently no support whatsoever. Call it a best-effort.
> > >>
> > >> Of course, we could get support for [probably] early boot and
> > >> guest-agent-less guests using KASLR too if we introduced a paravirt
> > >> solution, requiring guest kernel and KVM changes. Is it worth it?
> > >
> > > There's a standard for persistent storage that is intended to allow
> > > the kernel to dump out data at time of crash:
> > >
> > > https://lwn.net/Articles/434821/
> > >
> > > and there are some recent patches to provide a QEMU backend. Could we
> > > leverage that facility to get the data we need from the guest kernel?
> > >
> > > Instead of only using pstore at time of crash, the kernel could see
> > > that it's running on KVM, and write out the paging data to pstore. So
> > > when QEMU later generates a core dump, it can grab the corresponding
> > > data from the pstore backend?
> > >
> > > This still requires an extra device to be configured, but at least we
> > > would not have to invent yet another paravirt device ourselves, just
> > > use the existing framework.
> >
> > Not disagreeing, I'd just like to point out that the kernel can also
> > crash before the extra device (the pstore driver) is configured
> > (especially if the driver is built as a module).
>
> A boot-phase crash is also a problem for kdump, but hopefully boot-phase
> crashes will be found and fixed early. The run-time problems are harder,
> so this would still be helpful.
>
> I'm not a virt expert, but from my feeling, comparing the guest agent
> and pstore, I would vote for the guest agent; it is ready to be worked
> on now, no? For pstore I'm not sure how to make a pstore device for all
> guests. I know a UEFI guest can use its NVRAM, but introducing some
> general pstore sounds hard.
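[To illustrate the host-side half of the pstore idea: once the guest kernel has written its KASLR/paging data through a pstore backend that the host can see as plain files, collecting it next to the dump is trivial. The directory layout and record names below are assumptions; the virtio-pstore format was still under review at this point.]

```python
# Sketch (assumed file layout, not the virtio-pstore spec): gather
# whatever records the guest's pstore backend exposed as host files, so
# they can be stored alongside the dump-guest-memory core file.
import os

def read_pstore_records(pstore_dir):
    """Return {record_name: bytes} for every file in a pstore directory."""
    records = {}
    for name in sorted(os.listdir(pstore_dir)):
        path = os.path.join(pstore_dir, name)
        if os.path.isfile(path):
            with open(path, "rb") as f:
                records[name] = f.read()
    return records
```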
There's already a patch series posted to create a virtio-pstore device
for QEMU, which is what led me to suggest this as an option:

https://lists.nongnu.org/archive/html/qemu-devel/2016-09/msg00381.html

Regards,
Daniel
--
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://entangle-photo.org       -o-    http://search.cpan.org/~danberr/ :|