On Mon, Nov 14, 2016 at 01:32:56PM +0800, Dave Young wrote:
> On 11/09/16 at 04:38pm, Laszlo Ersek wrote:
> > On 11/09/16 15:47, Daniel P. Berrange wrote:
> > > On Wed, Nov 09, 2016 at 01:20:51PM +0100, Andrew Jones wrote:
> > >> On Wed, Nov 09, 2016 at 11:58:19AM +0000, Daniel P. Berrange wrote:
> > >>> On Wed, Nov 09, 2016 at 12:48:09PM +0100, Andrew Jones wrote:
> > >>>> On Wed, Nov 09, 2016 at 11:37:35AM +0000, Daniel P. Berrange wrote:
> > >>>>> On Wed, Nov 09, 2016 at 12:26:17PM +0100, Laszlo Ersek wrote:
> > >>>>>> On 11/09/16 11:40, Andrew Jones wrote:
> > >>>>>>> On Wed, Nov 09, 2016 at 11:01:46AM +0800, Dave Young wrote:
> > >>>>>>>> Hi,
> > >>>>>>>>
> > >>>>>>>> The latest Linux kernel enables KASLR to randomize phys/virt
> > >>>>>>>> memory addresses. We have made some effort to support kexec/kdump
> > >>>>>>>> so that the crash utility still works when the crashed kernel has
> > >>>>>>>> KASLR enabled.
> > >>>>>>>>
> > >>>>>>>> But according to Dave Anderson, virsh dump does not work; quoted
> > >>>>>>>> message from Dave below:
> > >>>>>>>>
> > >>>>>>>> """
> > >>>>>>>> with virsh dump, there's no way of even knowing that KASLR
> > >>>>>>>> has randomized the kernel __START_KERNEL_map region, because there
> > >>>>>>>> is no virtual address information -- e.g., like "SYMBOL(_stext)"
> > >>>>>>>> in the kdump vmcoreinfo data to compare against the vmlinux file
> > >>>>>>>> symbol value. Unless virsh dump can export some basic virtual
> > >>>>>>>> memory data, which they say it can't, I don't see how KASLR can
> > >>>>>>>> ever be supported.
> > >>>>>>>> """
> > >>>>>>>>
> > >>>>>>>> I assume virsh dump uses QEMU's guest memory dump facility, so
> > >>>>>>>> this should first be addressed in QEMU; thus I am posting this
> > >>>>>>>> query to the qemu-devel list. If this is not correct, please let
> > >>>>>>>> me know.
> > >>>>>>>>
> > >>>>>>>> Could the QEMU dump people make it work? Otherwise we cannot
> > >>>>>>>> support virsh dump as long as KASLR is enabled. The latest Fedora
> > >>>>>>>> kernel has enabled it on x86_64.
> > >>>>>>>>
> > >>>>>>>
> > >>>>>>> When the -kernel command line option is used, then it may be
> > >>>>>>> possible to extract some information that could be used to
> > >>>>>>> supplement the memory dump that dump-guest-memory provides.
> > >>>>>>> However, that would be a specific use. In general, QEMU knows
> > >>>>>>> nothing about the guest kernel. It doesn't know where it is in the
> > >>>>>>> disk image, and it doesn't even know if it's Linux.
> > >>>>>>>
> > >>>>>>> Is there anything a guest userspace application could probe from
> > >>>>>>> e.g. /proc that would work? If so, then the guest agent could gain
> > >>>>>>> a new feature providing that.
> > >>>>>>
> > >>>>>> I fully agree. This is exactly what I suggested too, independently,
> > >>>>>> in the downstream thread, before arriving at this upstream thread.
> > >>>>>> Let me quote that email:
> > >>>>>>
> > >>>>>> On 11/09/16 12:09, Laszlo Ersek wrote:
> > >>>>>>> [...] the dump-guest-memory QEMU command supports an option called
> > >>>>>>> "paging". Here's its documentation, from the "qapi-schema.json"
> > >>>>>>> source file:
> > >>>>>>>
> > >>>>>>>> # @paging: if true, do paging to get guest's memory mapping. This
> > >>>>>>>> #          allows using gdb to process the core file.
> > >>>>>>>> #
> > >>>>>>>> #          IMPORTANT: this option can make QEMU allocate several
> > >>>>>>>> #                     gigabytes of RAM. This can happen for a
> > >>>>>>>> #                     large guest, or a malicious guest pretending
> > >>>>>>>> #                     to be large.
> > >>>>>>>> #
> > >>>>>>>> #          Also, paging=true has the following limitations:
> > >>>>>>>> #
> > >>>>>>>> #             1. The guest may be in a catastrophic state or can
> > >>>>>>>> #                have corrupted memory, which cannot be trusted
> > >>>>>>>> #             2. The guest can be in real-mode even if paging is
> > >>>>>>>> #                enabled. For example, the guest uses ACPI to
> > >>>>>>>> #                sleep, and ACPI sleep state goes in real-mode
> > >>>>>>>> #             3. Currently only supported on i386 and x86_64.
> > >>>>>>>> #
> > >>>>>>>
> > >>>>>>> "virsh dump --memory-only" sets paging=false, for obvious reasons.
> > >>>>>>>
> > >>>>>>> [...] the dump-guest-memory command provides a raw snapshot of the
> > >>>>>>> virtual machine's memory (and of the registers of the VCPUs); it is
> > >>>>>>> not enlightened about the guest.
> > >>>>>>>
> > >>>>>>> If the additional information you are looking for can be retrieved
> > >>>>>>> within the running Linux guest, using an appropriately privileged
> > >>>>>>> userspace process, then I would recommend considering an extension
> > >>>>>>> to the qemu guest agent. The management layer (libvirt, [...]) could
> > >>>>>>> first invoke the guest agent (a process with root privileges running
> > >>>>>>> in the guest) from the host side, through virtio-serial. The new
> > >>>>>>> guest agent command would return the information necessary to deal
> > >>>>>>> with KASLR. Then the management layer would initiate the dump like
> > >>>>>>> always. Finally, the extra information would be combined with (or
> > >>>>>>> placed beside) the dump file in some way.
> > >>>>>>>
> > >>>>>>> So, this proposal would affect the guest agent and the management
> > >>>>>>> layer (= libvirt).
> > >>>>>>
> > >>>>>> Given that we already dislike "paging=true", enlightening
> > >>>>>> dump-guest-memory with even more guest-specific insight is the wrong
> > >>>>>> approach, IMO. That kind of knowledge belongs to the guest agent.
> > >>>>>
> > >>>>> If you're trying to debug a hung/panicked guest, then using a guest
> > >>>>> agent to fetch info is a complete non-starter as it'll be dead.
> > 
> > Yes, I realized this a while after posting...
> > 
> > >>>> So don't wait. Management software can make this query immediately
> > >>>> after the guest agent goes live. The information needed won't change.
> > 
> > ... and then figured this would solve the problem.
> > 
> > >>> That doesn't help with trying to diagnose a crash during boot up, since
> > >>> the guest agent isn't running till fairly late. I'm also concerned that
> > >>> the QEMU guest agent is likely to be far from widely deployed in guests,
> > 
> > I have no hard data, but from the recent Fedora and RHEL-7 guest
> > installations I've done, it seems like qga is installed automatically.
> > (Not sure if that's because Anaconda realizes it's installing the OS in
> > a VM.) Once I made sure there was an appropriate virtio-serial config in
> > the domain XMLs, I could talk to the agents (mainly for fstrim's sake)
> > immediately.
> > 
> > >>> so reliance on the guest agent will mean the dump facility is no longer
> > >>> reliably available.
> > >>>
> > >>
> > >> It'd still be reliably available and usable during early boot, just like
> > >> it is now, for kernels that don't use KASLR. This proposal is only
> > >> attempting to *also* address KASLR kernels, for which there is currently
> > >> no support whatsoever. Call it a best-effort.
> > >>
> > >> Of course we can get support for [probably] early boot and
> > >> guest-agent-less guests using KASLR too if we introduce a paravirt
> > >> solution, requiring guest kernel and KVM changes. Is it worth it?
> > > 
> > > There's a standard for persistent storage that is intended to allow
> > > the kernel to dump out data at time of crash:
> > > 
> > >    https://lwn.net/Articles/434821/
> > > 
> > > and there's some recent patches to provide a QEMU backend. Could we
> > > leverage that facility to get the data we need from the guest kernel ?
> > > 
> > > Instead of only using pstore at time of crash, the kernel could see
> > > that it's running on KVM, and write out the paging data to pstore. So
> > > when QEMU later generates a core dump, it can grab the corresponding
> > > data from the pstore backend?
> > > 
> > > Still requires an extra device to be configured, but at least we
> > > would not have to invent yet another paravirt device ourselves, just
> > > use the existing framework.
> > 
> > Not disagreeing, I'd just like to point out that the kernel can also
> > crash before the extra device (the pstore driver) is configured
> > (especially if the driver is built as a module).
> 
> Boot-phase crashes are also a problem for kdump, but hopefully a boot-phase
> crash will be found and fixed early. The run-time problems are harder, so
> this will still be helpful.
> 
> I'm not a virt expert, but from my feeling, comparing the guest agent and
> pstore, I would vote for the guest agent; it is ready to work on now, no?
> For pstore I'm not sure how to make a pstore device for all guests. I know
> a UEFI guest can use its NVRAM, but introducing some general pstore sounds
> hard.

There's already patches posted to create a virtio-pstore device for
QEMU, which is what led me to suggest this as an option:

  https://lists.nongnu.org/archive/html/qemu-devel/2016-09/msg00381.html

Regards,
Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://entangle-photo.org       -o-    http://search.cpan.org/~danberr/ :|
