On 10/16/2012 11:52 PM, Paul Mackerras wrote:
> On Tue, Oct 16, 2012 at 03:06:33PM +0200, Avi Kivity wrote:
>> On 10/16/2012 01:58 PM, Paul Mackerras wrote:
>> > On Tue, Oct 16, 2012 at 12:06:58PM +0200, Avi Kivity wrote:
>> >> Does/should the fd support O_NONBLOCK and poll(), i.e. waiting for
>> >> an entry to change?
>> > 
>> > No.
>> 
>> This forces userspace to dedicate a thread for the HPT.
> 
> Why? Reads never block in any case.

Ok.  This parallels KVM_GET_DIRTY_LOG.
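
For concreteness, here's a minimal sketch of the userspace side as I
understand it from the patch under discussion; the ioctl and struct names
are from that patch, while vm_fd, the buffer size, and
send_to_destination() are illustrative stand-ins:

	#include <sys/ioctl.h>
	#include <unistd.h>
	#include <linux/kvm.h>

	extern void send_to_destination(const void *buf, size_t len);

	static void send_htab(int vm_fd)
	{
		struct kvm_get_htab_fd ghf = {
			.flags = 0,		/* read mode */
			.start_index = 0,	/* start at the first HPTE */
		};
		int fd = ioctl(vm_fd, KVM_PPC_GET_HTAB_FD, &ghf);
		char buf[16384];
		ssize_t n;

		if (fd < 0)
			return;

		/*
		 * Reads never block: a zero-length read just means there
		 * is nothing to report right now, so the migration loop
		 * can retry later from its existing thread rather than
		 * dedicating one to the HPT.
		 */
		while ((n = read(fd, buf, sizeof(buf))) > 0)
			send_to_destination(buf, n);

		close(fd);
	}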

>> 
>> I meant the internal data structure that holds HPT entries.
> 
> Oh, that's just an array, and userspace already knows how big it is.
> 
>> I guess I don't understand the index.  Do we expect changes to be in
>> contiguous ranges?  And invalid entries to be contiguous as well?  That
>> doesn't fit with how hash tables work.  Does the index represent the
>> position of the entry within the table, or something else?
> 
> The index is just the position in the array.  Typically, in each group
> of 8 it will tend to be the low-numbered ones that are valid, since
> creating an entry usually uses the first empty slot.  So I expect that
> on the first pass, most of the records will represent 8 HPTEs.  On
> subsequent passes, probably most records will represent a single HPTE.

So it's a form of RLE compression.  Ok.
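
Concretely, the record header being described is 8 bytes, with each record
followed by n_valid HPTEs of 16 bytes each (field names as in the patch,
if I'm reading it right):

	/*
	 * One stream record: this header, then n_valid 16-byte HPTEs.
	 * The n_invalid entries following those are to be treated as
	 * invalid, so a run of empty slots costs only this header.
	 */
	struct kvm_get_htab_header {
		__u32 index;     /* position of the first HPTE in the table */
		__u16 n_valid;   /* valid entries that follow, 16 bytes each */
		__u16 n_invalid; /* invalid entries after those */
	};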

>> 
>> 16 MiB is transferred in ~0.15 s on GbE, and much faster on 10GbE.
>> Does it warrant a live migration protocol?
> 
> The qemu people I talked to seemed to think so.
> 
>> > Because it is a hash table, updates tend to be scattered throughout
>> > the whole table, which is another reason why per-page dirty tracking
>> > and updates would be pretty inefficient.
>> 
>> This suggests a stream format that includes the index in every entry.
> 
> That would amount to dropping the n_valid and n_invalid fields from
> the current header format.  That would be less efficient for the
> initial pass (assuming we achieve an average n_valid of at least 2 on
> the initial pass), and probably less efficient for the incremental
> updates, since a newly-invalidated entry would have to be represented
> as 16 zero bytes rather than just an 8-byte header with n_valid=0 and
> n_invalid=1.  I'm assuming here that the initial pass would omit
> invalid entries.

I agree.  But let's have some measurements to make sure.
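
For reference, a back-of-envelope comparison, assuming 16-byte HPTEs, the
8-byte header above, and (my assumption) an 8-byte aligned index header
per entry in the alternative format:

	initial pass, avg n_valid = 2:  (8 + 2*16)/2 = 20 bytes per HPTE
	index-per-entry format:          8 + 16      = 24 bytes per HPTE
	invalidation, current format:    8 bytes (n_valid=0, n_invalid=1)
	invalidation, index-per-entry:   8 + 16 zero bytes = 24 bytes

So the header format wins on both counts once n_valid averages 2 or more,
and ties on the initial pass at n_valid = 1; what the real n_valid
distribution looks like is exactly what the measurements should tell us.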

> 
>> > 
>> > As for the change rate, it depends on the application of course, but
>> > basically every time the guest changes a PTE in its Linux page tables
>> > we make the corresponding change to the HPT entry, so the
>> > rate can be quite high.  Workloads that do a lot of fork, exit, mmap,
>> > exec, etc. have a high rate of HPT updates.
>> 
>> If the rate is high enough, then there's no point in a live update.
> 
> True, but doesn't that argument apply to memory pages as well?

In some cases it does.  The question is what happens in practice.  If
you migrate a guest running a kernel build, how many entries are sent in
the guest-stopped phase?

