Re: [test result] dirty logging without srcu update -- Re: [RFC][PATCH] srcu: Implement call_srcu()
On 02/02/2012 04:44 PM, Takuya Yoshikawa wrote: > Avi Kivity wrote: > > > > I have one concern about correctness issue though: > > > > > > concurrent rmap write protection may not be safe due to > > > delayed tlb flush ... cannot happen? > > > > What do you mean by concurrent rmap write protection? > > > > Not sure, but other codes like: > > - mmu_sync_children() > for_each_sp(pages, sp, parents, i) > protected |= rmap_write_protect(vcpu->kvm, sp->gfn); > > if (protected) > kvm_flush_remote_tlbs(vcpu->kvm); > > - kvm_mmu_get_page() > if (rmap_write_protect(vcpu->kvm, gfn)) > kvm_flush_remote_tlbs(vcpu->kvm); > > I just wondered what can happen if GET_DIRTY_LOG is being processed > behind these processing? It's a bug. If the flush happens outside the spinlock, then one of the callers can return before it is assured the tlb is flushed. A B spin_lock clear pte.w spin_unlock spin_lock pte.w already clear spin_unlock skip flush return flush > > > They may find nothing to write protect and won't do kvm_flush_remote_tlbs() > if the gfn has been already protected by GET_DIRTY_LOG. > > But GET_DIRTY_LOG may still be busy write protecting other pages and > others can return before. (My code releases mmu_lock to not include > __put_user() in the critical section.) > > I am not still enough familier with these code yet. It's actually an advantage, since you don't have any assumptions on how the code works. > (maybe empty concern) Nope, good catch of this subtle bug. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [test result] dirty logging without srcu update -- Re: [RFC][PATCH] srcu: Implement call_srcu()
Avi Kivity wrote: > > I have one concern about correctness issue though: > > > > concurrent rmap write protection may not be safe due to > > delayed tlb flush ... cannot happen? > > What do you mean by concurrent rmap write protection? > Not sure, but other codes like: - mmu_sync_children() for_each_sp(pages, sp, parents, i) protected |= rmap_write_protect(vcpu->kvm, sp->gfn); if (protected) kvm_flush_remote_tlbs(vcpu->kvm); - kvm_mmu_get_page() if (rmap_write_protect(vcpu->kvm, gfn)) kvm_flush_remote_tlbs(vcpu->kvm); I just wondered what can happen if GET_DIRTY_LOG is being processed behind these processing? They may find nothing to write protect and won't do kvm_flush_remote_tlbs() if the gfn has been already protected by GET_DIRTY_LOG. But GET_DIRTY_LOG may still be busy write protecting other pages and others can return before. (My code releases mmu_lock to not include __put_user() in the critical section.) I am not still enough familier with these code yet. (maybe empty concern) Takuya -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [test result] dirty logging without srcu update -- Re: [RFC][PATCH] srcu: Implement call_srcu()
On 02/02/2012 12:40 PM, Takuya Yoshikawa wrote: > > I have one concern about correctness issue though: > > concurrent rmap write protection may not be safe due to > delayed tlb flush ... cannot happen? What do you mean by concurrent rmap write protection? -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [test result] dirty logging without srcu update -- Re: [RFC][PATCH] srcu: Implement call_srcu()
(2012/02/02 19:21), Avi Kivity wrote: I used "unsigned int" just because I wanted to use the current atomic_clear_mask() as is. We need to implement atomic_clear_mask_long() or use ... If we use cmpxchg8b/cmpxchg16b then this won't fit with the atomic_*_long family. OK, I will try. I have one concern about correctness issue though: concurrent rmap write protection may not be safe due to delayed tlb flush ... cannot happen? Takuya -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [test result] dirty logging without srcu update -- Re: [RFC][PATCH] srcu: Implement call_srcu()
On 02/02/2012 12:21 PM, Takuya Yoshikawa wrote: > (2012/02/02 19:10), Avi Kivity wrote: > >>> >>> = >>> # of dirty pages: kvm.git (ns), with this patch (ns) >>> 1: 102,077 ns 10,105 ns >>> 2: 47,197 ns 9,395 ns >>> 4: 43,563 ns 9,938 ns >>> 8: 41,239 ns 10,618 ns >>> 16: 42,988 ns 12,299 ns >>> 32: 45,503 ns 14,298 ns >>> 64: 50,915 ns 19,895 ns >>> 128:61,087 ns 29,260 ns >>> 256:81,007 ns 49,023 ns >>> 512: 132,776 ns 86,670 ns >>> 1024: 939,299 ns 131,496 ns >>> 2048: 992,209 ns 250,429 ns >>> 4096: 891,809 ns 479,280 ns >>> 8192:1,027,280 ns 906,971 ns >>> (until now pretty good) >>> >>> (ah, for every 32-bit atomic clear mask ...) >>> 16384: 1,270,972 ns 6,661,741 ns// 1 1 1 ... 1 >>> 32768: 1,581,335 ns 9,673,985 ns// ... >>> 65536: 2,161,604 ns 11,466,134 ns// ... >>> 131072: 3,253,027 ns 13,412,954 ns// ... >>> 262144: 5,663,002 ns 16,309,924 ns// 31 31 31 ... 31 >>> = >> >> On a 64-bit host, this will be twice as fast. Or if we use cmpxchg16b, >> and there are no surprises, four times as fast. It will still be slower >> than the original, but by a smaller margin. > > Yes. > > I used "unsigned int" just because I wanted to use the current > atomic_clear_mask() as is. > > We need to implement atomic_clear_mask_long() or use ... If we use cmpxchg8b/cmpxchg16b then this won't fit with the atomic_*_long family. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [test result] dirty logging without srcu update -- Re: [RFC][PATCH] srcu: Implement call_srcu()
(2012/02/02 19:10), Avi Kivity wrote: = # of dirty pages: kvm.git (ns), with this patch (ns) 1: 102,077 ns 10,105 ns 2: 47,197 ns 9,395 ns 4: 43,563 ns 9,938 ns 8: 41,239 ns 10,618 ns 16: 42,988 ns 12,299 ns 32: 45,503 ns 14,298 ns 64: 50,915 ns 19,895 ns 128:61,087 ns 29,260 ns 256:81,007 ns 49,023 ns 512: 132,776 ns 86,670 ns 1024: 939,299 ns 131,496 ns 2048: 992,209 ns 250,429 ns 4096: 891,809 ns 479,280 ns 8192:1,027,280 ns 906,971 ns (until now pretty good) (ah, for every 32-bit atomic clear mask ...) 16384: 1,270,972 ns 6,661,741 ns// 1 1 1 ... 1 32768: 1,581,335 ns 9,673,985 ns// ... 65536: 2,161,604 ns 11,466,134 ns// ... 131072: 3,253,027 ns 13,412,954 ns// ... 262144: 5,663,002 ns 16,309,924 ns// 31 31 31 ... 31 = On a 64-bit host, this will be twice as fast. Or if we use cmpxchg16b, and there are no surprises, four times as fast. It will still be slower than the original, but by a smaller margin. Yes. I used "unsigned int" just because I wanted to use the current atomic_clear_mask() as is. We need to implement atomic_clear_mask_long() or use ... Yeah. But I think we should switch to srcu-less dirty logs regardless. Here are you numbers, but normalized by the number of dirty pages. Thanks, I can prepare the official patch series then, of course with more test. Takuya dirty pagesold (ns/page)new (ns/page) 110207710105 2235994698 4108912485 851551327 162687769 321422447 64796311 128477229 256316191 512259169 1024917128 2048484122 4096218117 8192125111 1638478407 3276848295 6553633175 13107225102 2621442262 Your worst case, when considering a reasonable number of dirty pages, is 407ns/page, which is still lower than what userspace will actually do to process the page, so it's reasonable. The old method is often a lot worse than your worst case, by this metric. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [test result] dirty logging without srcu update -- Re: [RFC][PATCH] srcu: Implement call_srcu()
On 02/02/2012 07:46 AM, Takuya Yoshikawa wrote: > Avi Kivity wrote: > > > >> That'll be great, numbers are better than speculation. > > >> > > > > > > > > > Yes, I already have some good numbers to show (and some patches). > > > > Looking forward. > > I made a patch to see if Avi's suggestion of getting rid of srcu > update for dirty logging is practical; tested with my unit-test. > > (I used a function to write protect a range of pages using rmap, > which is itself useful for optimizing the current code.) > > 1. test result > > on 32bit host (core i3 box) // just for the unit-test ... > slot size: 256K pages (1GB memory) > > > Measured by dirty-log-perf (executed only once for each case) > > Note: dirty pages are completely distributed > (no locality: worst case for my patch?) > > = > # of dirty pages: kvm.git (ns), with this patch (ns) > 1: 102,077 ns 10,105 ns > 2: 47,197 ns 9,395 ns > 4: 43,563 ns 9,938 ns > 8: 41,239 ns 10,618 ns > 16: 42,988 ns 12,299 ns > 32: 45,503 ns 14,298 ns > 64: 50,915 ns 19,895 ns > 128:61,087 ns 29,260 ns > 256:81,007 ns 49,023 ns > 512: 132,776 ns 86,670 ns > 1024: 939,299 ns 131,496 ns > 2048: 992,209 ns 250,429 ns > 4096: 891,809 ns 479,280 ns > 8192:1,027,280 ns 906,971 ns > (until now pretty good) > > (ah, for every 32-bit atomic clear mask ...) > 16384: 1,270,972 ns 6,661,741 ns// 1 1 1 ... 1 > 32768: 1,581,335 ns 9,673,985 ns// ... > 65536: 2,161,604 ns 11,466,134 ns// ... > 131072: 3,253,027 ns 13,412,954 ns// ... > 262144: 5,663,002 ns 16,309,924 ns// 31 31 31 ... 31 > = On a 64-bit host, this will be twice as fast. Or if we use cmpxchg16b, and there are no surprises, four times as fast. It will still be slower than the original, but by a smaller margin. > According to a 2005 usenix paper, WWS with a 8sec window was > about 50,000 pages for a high dirtying rate program. > > Taking into acount of another possible gains from the WWS locality > of real workloads, these numbers are not so bad IMO. I agree. > > Furthermore the code has been made for initial test only and I did > not do any optimization: I know what I should try. > > So this seems worth more testing. > > > The new code also makes it possible to do find-grained get dirty log. > Live migration can be done like this ??? (not sure yet): > > until the dirty rate becomes enough low > get dirty log for the first 32K pages (partial return is OK) > while sending > get dirty log for the next 32K pages (partial return is OK) > while sending > ... > get dirty log for the last 32K pages (partial return is OK) > > stop the guest and get dirty log (but no need to write protect now) > send the remaining pages > > New API is needed for this as discussed before! Yeah. But I think we should switch to srcu-less dirty logs regardless. Here are you numbers, but normalized by the number of dirty pages. dirty pagesold (ns/page)new (ns/page) 110207710105 2235994698 4108912485 851551327 162687769 321422447 64796311 128477229 256316191 512259169 1024917128 2048484122 4096218117 8192125111 1638478407 3276848295 6553633175 13107225102 2621442262 Your worst case, when considering a reasonable number of dirty pages, is 407ns/page, which is still lower than what userspace will actually do to process the page, so it's reasonable. The old method is often a lot worse than your worst case, by this metric. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html