On 04.08.2010, at 17:48, Gleb Natapov wrote:

> On Wed, Aug 04, 2010 at 05:31:12PM +0200, Alexander Graf wrote:
>> 
>> On 04.08.2010, at 17:25, Gleb Natapov wrote:
>> 
>>> On Wed, Aug 04, 2010 at 09:57:17AM -0500, Anthony Liguori wrote:
>>>> On 08/04/2010 09:51 AM, David S. Ahern wrote:
>>>>> 
>>>>> On 08/03/10 12:43, Avi Kivity wrote:
>>>>>> libguestfs does not depend on an x86 architectural feature.
>>>>>> qemu-system-x86_64 emulates a PC, and PCs don't have -kernel.  We should
>>>>>> discourage people from depending on this interface for production use.
>>>>> That is a feature of qemu - and an important one to me as well. Why
>>>>> should it be discouraged? You end up at the same place -- a running
>>>>> kernel and in-ram filesystem; why require going through a bootloader
>>>>> just because the hardware case needs it?
>>>> 
>>>> It's smoke and mirrors.  We're still providing a boot loader; it's
>>>> just a little tiny one that we've written solely for this purpose.
>>>> 
>>>> And it works fine for production use.  The question is whether we
>>>> ought to be aggressively optimizing it for large initrd sizes.  To
>>>> be honest, after a lot of discussion of possibilities, I've come to
>>>> the conclusion that it's just not worth it.
>>>> 
>>>> There are better ways like using string I/O and optimizing the PIO
>>>> path in the kernel.  That should cut down the 1s slow down with a
>>>> 100MB initrd by a bit.  But honestly, shaving a couple hundred ms
>>>> further off the initrd load is just not worth it using the current
>>>> model.
>>>> 
>>> The slow down is not 1s any more. String PIO emulation had many bugs
>>> that were fixed in 2.6.35. I verified how much time it took to load 100M
>>> via fw_cfg interface on older kernel and on 2.6.35. On older kernels on
>>> my machine it took ~2-3 seconds; on 2.6.35 it took 26s. Some optimizations
>>> that were already committed make it 20s. I have a code prototype that
>>> makes it 11s. I don't see how we can get below that, surely not back to
>>> ~2-3sec.
>> 
>> What exactly is the reason for the slowdown? It can't be only boundary and 
>> permission checks, right?
>> 
>> 
> The biggest part of the slowdown right now is that the write into memory
> is done for each byte. It means that for each byte we call kvm_write_guest()
> and kvm_mmu_pte_write(). The second call is needed in case the memory the
> instruction is trying to write to is shadowed. Previously we didn't check
> for that at all. This can be mitigated by introducing a write cache, doing
> combined writes into memory, and unshadowing the page if there is more
> than one write into it. This optimization saves ~10secs. Currently string

Ok, so you tackled that bit already.
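
For anyone following along, the write-cache idea quoted above could be
sketched roughly like this in plain userspace C. This is not the actual KVM
change: backend_write() is only a stand-in for the kvm_write_guest() /
kvm_mmu_pte_write() pair, and every name, the buffer layout, and the
fixed 4K page size are assumptions made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096u

/* Fake "guest memory" and a counter of expensive backend calls. */
static uint8_t guest_mem[4 * SKETCH_PAGE_SIZE];
static unsigned long flush_calls;

/* Hypothetical write cache: one contiguous run of bytes within one page. */
struct write_cache {
    uint64_t gpa;                   /* address of the first buffered byte */
    size_t len;                     /* number of buffered bytes */
    uint8_t buf[SKETCH_PAGE_SIZE];
};

/* Stand-in for the costly per-write path; called once per combined run. */
static void backend_write(uint64_t gpa, const uint8_t *data, size_t len)
{
    if (gpa + len > sizeof(guest_mem))
        return;                     /* out of range for this toy model */
    memcpy(guest_mem + gpa, data, len);
    flush_calls++;
}

/* Push any buffered bytes to the backend in a single call. */
static void wc_flush(struct write_cache *wc)
{
    if (wc->len) {
        backend_write(wc->gpa, wc->buf, wc->len);
        wc->len = 0;
    }
}

/* Buffer one byte; flush first if it does not extend the current run
 * or if it would cross into a different page. */
static void wc_write_byte(struct write_cache *wc, uint64_t gpa, uint8_t val)
{
    int contiguous = wc->len && gpa == wc->gpa + wc->len;
    int same_page = wc->len &&
        (gpa / SKETCH_PAGE_SIZE) == (wc->gpa / SKETCH_PAGE_SIZE);

    if (wc->len && !(contiguous && same_page))
        wc_flush(wc);
    if (wc->len == 0)
        wc->gpa = gpa;
    wc->buf[wc->len++] = val;
}
```

In this toy model, writing two pages byte by byte through wc_write_byte()
and then calling wc_flush() ends up in two backend calls instead of 8192.
It deliberately leaves out the "unshadow the page after more than one
write" part.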

> emulation enters the guest from time to time to check whether event
> injection is needed, and reads from userspace are done in 1K chunks, not
> 4K as before, but when I made the reads 4K and disabled guest reentry I
> didn't see any speed improvement worth talking about.

So what are we wasting those 10 seconds on then? Does perf tell you anything 
useful?


Alex

