Re: [test result] dirty logging without srcu update -- Re: [RFC][PATCH] srcu: Implement call_srcu()

2012-02-02 Thread Avi Kivity
On 02/02/2012 04:44 PM, Takuya Yoshikawa wrote:
> Avi Kivity  wrote:
>
> > > I have one concern about correctness issue though:
> > >
> > > concurrent rmap write protection may not be safe due to
> > > delayed tlb flush ... cannot happen?
> > 
> > What do you mean by concurrent rmap write protection?
> > 
>
> Not sure, but other codes like:
>
> - mmu_sync_children()
>   for_each_sp(pages, sp, parents, i)
>   protected |= rmap_write_protect(vcpu->kvm, sp->gfn);
>
>   if (protected)
>   kvm_flush_remote_tlbs(vcpu->kvm);
>
> - kvm_mmu_get_page()
>   if (rmap_write_protect(vcpu->kvm, gfn))
>   kvm_flush_remote_tlbs(vcpu->kvm);
>
> I just wondered what can happen if GET_DIRTY_LOG is being processed
> behind these processing?

It's a bug.  If the flush happens outside the spinlock, then one of the
callers can return before it is assured the tlb is flushed.

  A B

  spin_lock
  clear pte.w
  spin_unlock
spin_lock
pte.w already clear
spin_unlock
skip flush
return
  flush


>
>
> They may find nothing to write protect and won't do kvm_flush_remote_tlbs()
> if the gfn has been already protected by GET_DIRTY_LOG.
>
> But GET_DIRTY_LOG may still be busy write protecting other pages and
> others can return before.  (My code releases mmu_lock to not include
> __put_user() in the critical section.)
>
> I am not still enough familier with these code yet.

It's actually an advantage, since you don't have any assumptions on how
the code works.

> (maybe empty concern)

Nope, good catch of this subtle bug.

-- 
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [test result] dirty logging without srcu update -- Re: [RFC][PATCH] srcu: Implement call_srcu()

2012-02-02 Thread Takuya Yoshikawa
Avi Kivity  wrote:

> > I have one concern about correctness issue though:
> >
> > concurrent rmap write protection may not be safe due to
> > delayed tlb flush ... cannot happen?
> 
> What do you mean by concurrent rmap write protection?
> 

Not sure, but other codes like:

- mmu_sync_children()
for_each_sp(pages, sp, parents, i)
protected |= rmap_write_protect(vcpu->kvm, sp->gfn);

if (protected)
kvm_flush_remote_tlbs(vcpu->kvm);

- kvm_mmu_get_page()
if (rmap_write_protect(vcpu->kvm, gfn))
kvm_flush_remote_tlbs(vcpu->kvm);

I just wondered what can happen if GET_DIRTY_LOG is being processed
behind these processing?

They may find nothing to write protect and won't do kvm_flush_remote_tlbs()
if the gfn has been already protected by GET_DIRTY_LOG.

But GET_DIRTY_LOG may still be busy write protecting other pages and
others can return before.  (My code releases mmu_lock to not include
__put_user() in the critical section.)

I am not still enough familier with these code yet.
(maybe empty concern)

Takuya
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [test result] dirty logging without srcu update -- Re: [RFC][PATCH] srcu: Implement call_srcu()

2012-02-02 Thread Avi Kivity
On 02/02/2012 12:40 PM, Takuya Yoshikawa wrote:
>
> I have one concern about correctness issue though:
>
> concurrent rmap write protection may not be safe due to
> delayed tlb flush ... cannot happen?

What do you mean by concurrent rmap write protection?


-- 
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [test result] dirty logging without srcu update -- Re: [RFC][PATCH] srcu: Implement call_srcu()

2012-02-02 Thread Takuya Yoshikawa

(2012/02/02 19:21), Avi Kivity wrote:



I used "unsigned int" just because I wanted to use the current
atomic_clear_mask() as is.

We need to implement atomic_clear_mask_long() or use ...


If we use cmpxchg8b/cmpxchg16b then this won't fit with the
atomic_*_long family.



OK, I will try.


I have one concern about correctness issue though:

concurrent rmap write protection may not be safe due to
delayed tlb flush ... cannot happen?


Takuya
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [test result] dirty logging without srcu update -- Re: [RFC][PATCH] srcu: Implement call_srcu()

2012-02-02 Thread Avi Kivity
On 02/02/2012 12:21 PM, Takuya Yoshikawa wrote:
> (2012/02/02 19:10), Avi Kivity wrote:
>
>>>
>>> =
>>> # of dirty pages:  kvm.git (ns),  with this patch (ns)
>>> 1: 102,077 ns  10,105 ns
>>> 2:  47,197 ns   9,395 ns
>>> 4:  43,563 ns   9,938 ns
>>> 8:  41,239 ns  10,618 ns
>>> 16: 42,988 ns  12,299 ns
>>> 32: 45,503 ns  14,298 ns
>>> 64: 50,915 ns  19,895 ns
>>> 128:61,087 ns  29,260 ns
>>> 256:81,007 ns  49,023 ns
>>> 512:   132,776 ns  86,670 ns
>>> 1024:  939,299 ns 131,496 ns
>>> 2048:  992,209 ns 250,429 ns
>>> 4096:  891,809 ns 479,280 ns
>>> 8192:1,027,280 ns 906,971 ns
>>> (until now pretty good)
>>>
>>> (ah, for every 32-bit atomic clear mask ...)
>>> 16384:   1,270,972 ns   6,661,741 ns//  1  1  1 ...  1
>>> 32768:   1,581,335 ns   9,673,985 ns//  ...
>>> 65536:   2,161,604 ns  11,466,134 ns//  ...
>>> 131072:  3,253,027 ns  13,412,954 ns//  ...
>>> 262144:  5,663,002 ns  16,309,924 ns// 31 31 31 ... 31
>>> =
>>
>> On a 64-bit host, this will be twice as fast.  Or if we use cmpxchg16b,
>> and there are no surprises, four times as fast.  It will still be slower
>> than the original, but by a smaller margin.
>
> Yes.
>
> I used "unsigned int" just because I wanted to use the current
> atomic_clear_mask() as is.
>
> We need to implement atomic_clear_mask_long() or use ...

If we use cmpxchg8b/cmpxchg16b then this won't fit with the
atomic_*_long family.

-- 
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [test result] dirty logging without srcu update -- Re: [RFC][PATCH] srcu: Implement call_srcu()

2012-02-02 Thread Takuya Yoshikawa

(2012/02/02 19:10), Avi Kivity wrote:



=
# of dirty pages:  kvm.git (ns),  with this patch (ns)
1: 102,077 ns  10,105 ns
2:  47,197 ns   9,395 ns
4:  43,563 ns   9,938 ns
8:  41,239 ns  10,618 ns
16: 42,988 ns  12,299 ns
32: 45,503 ns  14,298 ns
64: 50,915 ns  19,895 ns
128:61,087 ns  29,260 ns
256:81,007 ns  49,023 ns
512:   132,776 ns  86,670 ns
1024:  939,299 ns 131,496 ns
2048:  992,209 ns 250,429 ns
4096:  891,809 ns 479,280 ns
8192:1,027,280 ns 906,971 ns
(until now pretty good)

(ah, for every 32-bit atomic clear mask ...)
16384:   1,270,972 ns   6,661,741 ns//  1  1  1 ...  1
32768:   1,581,335 ns   9,673,985 ns//  ...
65536:   2,161,604 ns  11,466,134 ns//  ...
131072:  3,253,027 ns  13,412,954 ns//  ...
262144:  5,663,002 ns  16,309,924 ns// 31 31 31 ... 31
=


On a 64-bit host, this will be twice as fast.  Or if we use cmpxchg16b,
and there are no surprises, four times as fast.  It will still be slower
than the original, but by a smaller margin.


Yes.

I used "unsigned int" just because I wanted to use the current
atomic_clear_mask() as is.

We need to implement atomic_clear_mask_long() or use ...





Yeah.  But I think we should switch to srcu-less dirty logs regardless.
Here are you numbers, but normalized by the number of dirty pages.


Thanks,

I can prepare the official patch series then, of course with more test.


Takuya



dirty pagesold (ns/page)new (ns/page)
110207710105
2235994698
4108912485
851551327
162687769
321422447
64796311
128477229
256316191
512259169
1024917128
2048484122
4096218117
8192125111
1638478407
3276848295
6553633175
13107225102
2621442262


Your worst case, when considering a reasonable number of dirty pages, is
407ns/page, which is still lower than what userspace will actually do to
process the page, so it's reasonable.  The old method is often a lot
worse than your worst case, by this metric.





--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [test result] dirty logging without srcu update -- Re: [RFC][PATCH] srcu: Implement call_srcu()

2012-02-02 Thread Avi Kivity
On 02/02/2012 07:46 AM, Takuya Yoshikawa wrote:
> Avi Kivity  wrote:
>
> > >> That'll be great, numbers are better than speculation.
> > >>
> > >
> > >
> > > Yes, I already have some good numbers to show (and some patches).
> > 
> > Looking forward.
>
> I made a patch to see if Avi's suggestion of getting rid of srcu
> update for dirty logging is practical;  tested with my unit-test.
>
> (I used a function to write protect a range of pages using rmap,
>  which is itself useful for optimizing the current code.)
>
> 1. test result
>
> on 32bit host (core i3 box)   // just for the unit-test ...
> slot size: 256K pages (1GB memory)
>
>
> Measured by dirty-log-perf (executed only once for each case)
>
> Note: dirty pages are completely distributed
>   (no locality: worst case for my patch?)
>
> =
> # of dirty pages:  kvm.git (ns),  with this patch (ns)
> 1: 102,077 ns  10,105 ns
> 2:  47,197 ns   9,395 ns
> 4:  43,563 ns   9,938 ns
> 8:  41,239 ns  10,618 ns
> 16: 42,988 ns  12,299 ns
> 32: 45,503 ns  14,298 ns
> 64: 50,915 ns  19,895 ns
> 128:61,087 ns  29,260 ns
> 256:81,007 ns  49,023 ns
> 512:   132,776 ns  86,670 ns
> 1024:  939,299 ns 131,496 ns
> 2048:  992,209 ns 250,429 ns
> 4096:  891,809 ns 479,280 ns
> 8192:1,027,280 ns 906,971 ns
> (until now pretty good)
>
> (ah, for every 32-bit atomic clear mask ...)
> 16384:   1,270,972 ns   6,661,741 ns//  1  1  1 ...  1
> 32768:   1,581,335 ns   9,673,985 ns//  ...
> 65536:   2,161,604 ns  11,466,134 ns//  ...
> 131072:  3,253,027 ns  13,412,954 ns//  ...
> 262144:  5,663,002 ns  16,309,924 ns// 31 31 31 ... 31
> =

On a 64-bit host, this will be twice as fast.  Or if we use cmpxchg16b,
and there are no surprises, four times as fast.  It will still be slower
than the original, but by a smaller margin.

> According to a 2005 usenix paper, WWS with a 8sec window was
> about 50,000 pages for a high dirtying rate program.
>
> Taking into acount of another possible gains from the WWS locality
> of real workloads, these numbers are not so bad IMO.

I agree.

>
> Furthermore the code has been made for initial test only and I did
> not do any optimization:  I know what I should try.
>
> So this seems worth more testing.
>
>
> The new code also makes it possible to do find-grained get dirty log.
> Live migration can be done like this ??? (not sure yet):
>
>   until the dirty rate becomes enough low
>   get dirty log for the first 32K pages (partial return is OK)
>   while sending
>   get dirty log for the next 32K pages  (partial return is OK)
>   while sending
>   ...
>   get dirty log for the last 32K pages  (partial return is OK)
>
>   stop the guest and get dirty log (but no need to write protect now)
>   send the remaining pages
>
> New API is needed for this as discussed before!

Yeah.  But I think we should switch to srcu-less dirty logs regardless. 
Here are you numbers, but normalized by the number of dirty pages.

dirty pagesold (ns/page)new (ns/page)
110207710105
2235994698
4108912485
851551327
162687769
321422447
64796311
128477229
256316191
512259169
1024917128
2048484122
4096218117
8192125111
1638478407
3276848295
6553633175
13107225102
2621442262


Your worst case, when considering a reasonable number of dirty pages, is
407ns/page, which is still lower than what userspace will actually do to
process the page, so it's reasonable.  The old method is often a lot
worse than your worst case, by this metric.



-- 
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html