Re: [Qemu-devel] Kernel memory allocation debugging with Qemu

2008-02-08 Thread Paul Brook
On Friday 08 February 2008, Blue Swirl wrote:
> On 2/8/08, Paul Brook <[EMAIL PROTECTED]> wrote:
> > > The patch takes half of the memory and slows down the system. I
> > > think Qemu could be used instead. A channel (IO/MMIO) is created
> > > between the memory allocator in the target kernel and Qemu running
> > > on the host. The memory allocator tells Qemu about each allocated
> > > area using the channel. Qemu changes the physical memory mapping
> > > for the area to special memory that reports any read-before-write
> > > back to the allocator. Writes change the memory back to standard
> > > RAM. The performance would be comparable to Qemu in general, and
> > > the host kernel + Qemu only take a few MB of memory. The system
> > > would be directly usable for other OSes as well.
> >
> > The qemu implementation isn't actually any more space efficient than the
> > in-kernel implementation. You still need the same amount of bookkeeping
> > ram. In both cases it should be possible to reduce the overhead from 1/2
> > to 1/9 by using a bitmask rather than whole bytes.
>
> Qemu would not track all memory, only the regions that kmalloc() has
> handed out to other kernel code and that have not yet been written to.

Memory still has to be tracked after it has been written to.  You can only 
stop tracking after the whole page has been written to, and there's no easy 
way to determine when that is.  The kernel actually has better information 
about this because it can replace the clear/copy_page routines.

If you're only trying to track things with page granularity then that's a much 
easier problem.

> > Performance is less clear. A qemu implementation probably causes less
> > relative slowdown than an in-kernel implementation. However it's still
> > going to be significantly slower than normal qemu.  Remember that any
> > checked access is going to have to go through the slow case in the TLB
> > lookup. Any optimizations that are applicable to one implementation can
> > probably also be applied to the other.
>
> Again, we are not trapping all accesses. The fast case should be used
> for most kernel accesses and all of userland.

Ok. So all the accesses that the in-kernel implementation intercepts.  That's 
obviously a significant number; if it weren't, performance wouldn't matter.

The number of accesses intercepted and the amount of bookkeeping required 
should be the same in both cases. The only difference is the runtime overhead 
when an access is intercepted:  qemu goes through the slow-path softmmu 
routines, the in-kernel implementation takes a pagefault+singlestep.

Paul




Re: [Qemu-devel] Kernel memory allocation debugging with Qemu

2008-02-08 Thread Blue Swirl
On 2/8/08, Paul Brook <[EMAIL PROTECTED]> wrote:
> > The patch takes half of the memory and slows down the system. I
> > think Qemu could be used instead. A channel (IO/MMIO) is created
> > between the memory allocator in the target kernel and Qemu running
> > on the host. The memory allocator tells Qemu about each allocated
> > area using the channel. Qemu changes the physical memory mapping for
> > the area to special memory that reports any read-before-write back
> > to the allocator. Writes change the memory back to standard RAM. The
> > performance would be comparable to Qemu in general, and the host
> > kernel + Qemu only take a few MB of memory. The system would be
> > directly usable for other OSes as well.
>
> The qemu implementation isn't actually any more space efficient than the
> in-kernel implementation. You still need the same amount of bookkeeping ram.
> In both cases it should be possible to reduce the overhead from 1/2 to 1/9 by
> using a bitmask rather than whole bytes.

Qemu would not track all memory, only the regions that kmalloc() has
handed out to other kernel code and that have not yet been written to.

> Performance is less clear. A qemu implementation probably causes less
> relative slowdown than an in-kernel implementation. However it's still going
> to be significantly slower than normal qemu.  Remember that any checked
> access is going to have to go through the slow case in the TLB lookup. Any
> optimizations that are applicable to one implementation can probably also be
> applied to the other.

Again, we are not trapping all accesses. The fast case should be used
for most kernel accesses and all of userland.

> Given qemu is significantly slower to start with, and depending on the
> overhead of taking the page fault, it might not end up much better overall. A
> KVM implementation would most likely be slower than the in-kernel one.
>
> That said it may be an interesting thing to play with. In practice it's
> probably most useful to generate an interrupt and report back to the guest
> OS, rather than having qemu report faults directly.

The access could happen while interrupts are disabled, so a buffer
would be needed. The accesses could also be written to a block device
seen by both Qemu and the kernel, or appear to arrive from a fake
network device.
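Buffering is the key point here: the faulting CPU may not be able to take an
interrupt, so reports have to be queued and drained later from a safe context.
A toy sketch of such a report buffer follows (Python used purely for
illustration; `ReportBuffer`, `push`, and `drain` are invented names, not part
of any actual Qemu or kernel interface):

```python
from collections import deque

class ReportBuffer:
    """Fixed-size buffer for read-before-write reports.

    The producer (Qemu, in the proposed scheme) appends reports as faults
    are detected; the consumer (the guest kernel) drains the buffer later,
    from interrupt-enabled context. Overflows are counted rather than
    blocking, since the faulting CPU may have interrupts disabled.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = deque()
        self.dropped = 0          # reports lost to overflow

    def push(self, fault_addr):
        # Producer side: never block, just count drops on overflow.
        if len(self.entries) >= self.capacity:
            self.dropped += 1
        else:
            self.entries.append(fault_addr)

    def drain(self):
        # Consumer side: take everything queued so far.
        out = list(self.entries)
        self.entries.clear()
        return out

buf = ReportBuffer(capacity=2)
for addr in (0x100, 0x104, 0x108):
    buf.push(addr)
print(buf.drain(), buf.dropped)   # [256, 260] 1
```

The same drop-counting shape would apply whether the channel is MMIO, a shared
block device, or a fake network device as suggested above.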




Re: [Qemu-devel] Kernel memory allocation debugging with Qemu

2008-02-08 Thread Paul Brook
> The patch takes half of the memory and slows down the system. I
> think Qemu could be used instead. A channel (IO/MMIO) is created
> between the memory allocator in the target kernel and Qemu running
> on the host. The memory allocator tells Qemu about each allocated
> area using the channel. Qemu changes the physical memory mapping for
> the area to special memory that reports any read-before-write back
> to the allocator. Writes change the memory back to standard RAM. The
> performance would be comparable to Qemu in general, and the host
> kernel + Qemu only take a few MB of memory. The system would be
> directly usable for other OSes as well.

The qemu implementation isn't actually any more space efficient than the 
in-kernel implementation. You still need the same amount of bookkeeping ram. 
In both cases it should be possible to reduce the overhead from 1/2 to 1/9 by 
using a bitmask rather than whole bytes.
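The 1/2 vs 1/9 figures come straight from the shadow representation: one
shadow byte per tracked byte doubles memory use, while one shadow bit per
byte adds only 1/8 on top of the original, i.e. 1/9 of the total. A minimal
sketch of bit-level shadow bookkeeping (Python purely for illustration; the
class and method names are invented):

```python
class ShadowBitmap:
    """One shadow bit per tracked byte: bit set = byte has been written.

    A byte-per-byte shadow map needs size extra bytes (1/2 of the total);
    a bitmap needs size/8 extra bytes (1/9 of the total).
    """

    def __init__(self, size):
        self.size = size
        self.bits = bytearray((size + 7) // 8)   # 1 bit per tracked byte

    def mark_written(self, addr, length):
        # Called on every store into the tracked region.
        for a in range(addr, addr + length):
            self.bits[a >> 3] |= 1 << (a & 7)

    def check_read(self, addr, length):
        # Return the offsets that are being read before ever being written.
        return [a for a in range(addr, addr + length)
                if not (self.bits[a >> 3] >> (a & 7)) & 1]

shadow = ShadowBitmap(4096)          # 4 KiB region -> 512-byte bitmap
shadow.mark_written(0, 8)
print(shadow.check_read(4, 8))       # [8, 9, 10, 11]
```

Either implementation (in-kernel or in-qemu) could keep its bookkeeping in
this form, which is why the space cost is the same in both cases.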

Performance is less clear. A qemu implementation probably causes less 
relative slowdown than an in-kernel implementation. However it's still going 
to be significantly slower than normal qemu.  Remember that any checked 
access is going to have to go through the slow case in the TLB lookup. Any 
optimizations that are applicable to one implementation can probably also be 
applied to the other.

Given qemu is significantly slower to start with, and depending on the 
overhead of taking the page fault, it might not end up much better overall. A 
KVM implementation would most likely be slower than the in-kernel one.

That said it may be an interesting thing to play with. In practice it's 
probably most useful to generate an interrupt and report back to the guest 
OS, rather than having qemu report faults directly.

Paul




[Qemu-devel] Kernel memory allocation debugging with Qemu

2008-02-08 Thread Blue Swirl
On KernelTrap there is a story about a Linux kernel memory allocation
debugging patch that allows detection of reads from uninitialized
memory (http://kerneltrap.org/Linux/Debugging_With_kmemcheck).

The patch takes half of the memory and slows down the system. I
think Qemu could be used instead. A channel (IO/MMIO) is created
between the memory allocator in the target kernel and Qemu running
on the host. The memory allocator tells Qemu about each allocated
area using the channel. Qemu changes the physical memory mapping for
the area to special memory that reports any read-before-write back
to the allocator. Writes change the memory back to standard RAM. The
performance would be comparable to Qemu in general, and the host
kernel + Qemu only take a few MB of memory. The system would be
directly usable for other OSes as well.
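The proposal amounts to a small per-byte state machine: a freshly allocated
byte is "special" until first written; a read while special is reported, and
a write demotes it to ordinary RAM. A toy model of that behaviour (pure
Python, all names invented; in reality this logic would sit in Qemu's softmmu
slow path, not in guest-visible code):

```python
class TrackedRegion:
    """Toy model of the proposed Qemu-side tracking.

    The allocator announces a region over the IO/MMIO channel; any read
    of a byte that has not been written since allocation is reported
    back, and a write reverts that byte to ordinary RAM behaviour.
    """

    def __init__(self, base, size):
        self.base = base
        self.ram = bytearray(size)
        self.written = [False] * size   # per-byte state: still "special"?
        self.reports = []               # read-before-write reports

    def write(self, addr, value):
        off = addr - self.base
        self.ram[off] = value
        self.written[off] = True        # byte becomes standard RAM

    def read(self, addr):
        off = addr - self.base
        if not self.written[off]:
            self.reports.append(addr)   # read before write: report it
        return self.ram[off]

# kmalloc() announces a fresh allocation to Qemu over the channel
region = TrackedRegion(base=0x1000, size=16)
region.write(0x1000, 0xAB)
region.read(0x1000)    # initialised byte: no report
region.read(0x1001)    # uninitialised byte: reported
print(region.reports)  # [4097]
```

Note the limitation Paul raises elsewhere in the thread: once a byte is
written it stops being tracked, but knowing when a whole page can be returned
to the fast path is not straightforward.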

A similar debugging tool could be used in user space too (by instrumenting
libc malloc/free), but that's probably reinventing Valgrind or other
malloc checkers.

The special memory could also report unaligned accesses, even on targets
where these are normally not detected, though this would be less efficient.