Re: [PATCH 0/4] KVM: Dirty logging optimization using rmap
On 12/03/2011 06:37 AM, Takuya Yoshikawa wrote:
> Avi Kivity <a...@redhat.com> wrote:
>> That's true. But some applications do require low latency, and the
>> current code can impose a lot of time with the mmu spinlock held. The
>> total amount of work actually increases slightly, from O(N) to
>> O(N log N), but since the tree is so wide, the overhead is small.
>
> Controlling the latency can be achieved by making user space limit the
> number of dirty pages to scan, without hacking the core mmu code. The
> fact that we cannot transfer so many pages over the network at once
> suggests this is reasonable.

That is true. Write protecting everything at once means that there is a
large window between sampling the dirty log and transferring the page.
Any writes within that window cause a re-transfer, even when they
should not.

> With the rmap write protection method in KVM, the only thing we need
> is a new GET_DIRTY_LOG API which takes a [gfn_start, gfn_end] range to
> scan, or optionally a max_write_protections limit.

Right.

> I remember that someone suggested splitting the slot at KVM Forum.

Same effect with less effort.

> QEMU can also avoid unwanted page faults by using this API wisely.
> E.g. it could be used for the "Interactivity improvements" TODO on the
> KVM wiki, I think. Furthermore, QEMU may be able to use multiple
> threads for the memory copy task: each thread has its own range of
> memory to copy and calls GET_DIRTY_LOG independently. This will make
> it easy to add further optimizations in QEMU.
>
> In summary, my impression is that the main cause of the current
> latency problem is not KVM's write protection but the strategy of
> trying to process the whole large slot in one go. What do you think?

I agree. Maybe O(1) write protection has a place, but it is secondary to
fine-grained dirty logging; if we implement it, it should come after
your idea, and after further measurements.
--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/4] KVM: Dirty logging optimization using rmap
Avi Kivity <a...@redhat.com> wrote:
> That's true. But some applications do require low latency, and the
> current code can impose a lot of time with the mmu spinlock held. The
> total amount of work actually increases slightly, from O(N) to
> O(N log N), but since the tree is so wide, the overhead is small.

Controlling the latency can be achieved by making user space limit the
number of dirty pages to scan, without hacking the core mmu code. The
fact that we cannot transfer so many pages over the network at once
suggests this is reasonable.

With the rmap write protection method in KVM, the only thing we need is
a new GET_DIRTY_LOG API which takes a [gfn_start, gfn_end] range to
scan, or optionally a max_write_protections limit.

I remember that someone suggested splitting the slot at KVM Forum.

QEMU can also avoid unwanted page faults by using this API wisely.
E.g. it could be used for the "Interactivity improvements" TODO on the
KVM wiki, I think. Furthermore, QEMU may be able to use multiple threads
for the memory copy task: each thread has its own range of memory to
copy and calls GET_DIRTY_LOG independently. This will make it easy to
add further optimizations in QEMU.

In summary, my impression is that the main cause of the current latency
problem is not KVM's write protection but the strategy of trying to
process the whole large slot in one go. What do you think?

	Takuya
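The range-limited interface proposed here could look roughly like the
following. This is purely an illustrative sketch: the struct layout and
every field name (kvm_dirty_log_range, max_write_protections, etc.) are
invented for this example, not an actual KVM ABI.

```c
#include <stdint.h>

/*
 * Hypothetical extension of the GET_DIRTY_LOG ioctl argument:
 * user space names the slot plus a [gfn_start, gfn_end) window,
 * and may cap the number of write protections done in one call.
 */
struct kvm_dirty_log_range {
	uint32_t slot;                  /* memory slot id                    */
	uint32_t max_write_protections; /* 0 = unlimited                     */
	uint64_t gfn_start;             /* first guest frame number to scan  */
	uint64_t gfn_end;               /* one past the last gfn to scan     */
	void    *dirty_bitmap;          /* output: one bit per page in range */
};
```

With something like this, QEMU could migrate with several threads, each
owning a disjoint gfn range and issuing the call on its own window,
which also bounds the time the mmu lock is held per call.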
Re: [PATCH 0/4] KVM: Dirty logging optimization using rmap
On 11/30/2011 09:03 AM, Xiao Guangrong wrote:
> On 11/29/2011 08:01 PM, Avi Kivity wrote:
>> On 11/29/2011 01:56 PM, Xiao Guangrong wrote:
>>> On 11/29/2011 07:20 PM, Avi Kivity wrote:
>>>> We used to have a bitmap in a shadow page with a bit set for every
>>>> slot pointed to by the page. If we extend this to non-leaf pages
>>>> (so, when we set a bit, we propagate it through its parent_ptes
>>>> list), then we do the following on write fault:
>>>
>>> Thanks for the detail. Um, propagating the slot bit to parent ptes
>>> is a little slow; especially, it is overhead for non-Xwindow guests,
>>> which are dirty logged only during migration (I guess most Linux
>>> guests run in this mode, and migration is not frequent). No?
>>
>> You need to propagate very infrequently. The first pte added to a
>> page will need to propagate, but the second (if from the same slot,
>> which is likely) will already have the bit set in the page, so we're
>> assured it's set in all its parents.
>
> What will happen if a guest page is unmapped or a shadow page is
> zapped? It should immediately clear the slot bit of the shadow page
> and its parents; that means it should propagate this clear-bit event
> to all parents, in the case of softmmu. Zapping shadow pages is
> frequent; maybe that is unacceptable?

You can keep the bit set. A clear bit means there are exactly zero pages
from the slot in this mmu page or its descendents. A set bit means there
are zero or more pages. If a bit is accidentally set, nothing bad
happens.

> It is not like the unsync bit, which can be lazily cleared, because
> all bits in the hierarchy can be cleared on cr3 reload.

With tdp (and without nested virt) the mappings never change anyway.
With shadow, they do change. Not sure how to handle that at the higher
levels.
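The conservative semantics Avi describes (set bits propagate up, stale
set bits are harmless and never eagerly cleared) can be sketched as
follows. This is a toy model, not KVM code: the single `parent` pointer
stands in for KVM's parent_ptes list, and all names are invented.

```c
#include <stdint.h>
#include <stddef.h>

/* Toy model of a shadow page with a per-slot bitmap and a parent link. */
struct mmu_page {
	uint64_t slot_bitmap;    /* bit s set => slot s MAY map through here */
	struct mmu_page *parent; /* one parent for simplicity; KVM keeps a   */
	                         /* parent_ptes list                         */
};

/*
 * On adding a pte for `slot`: propagate upward, but stop as soon as a
 * page already has the bit - its ancestors are then guaranteed set too.
 * This is why propagation is infrequent: only the first pte from a slot
 * walks toward the root.
 */
static unsigned int set_slot_bit(struct mmu_page *sp, unsigned int slot)
{
	unsigned int steps = 0;
	uint64_t mask = 1ull << slot;

	for (; sp && !(sp->slot_bitmap & mask); sp = sp->parent) {
		sp->slot_bitmap |= mask;
		steps++;
	}
	return steps; /* how many levels were actually updated */
}

/*
 * On unmap/zap: do nothing. A stale set bit only means "zero or more
 * pages from this slot", so no clear-propagation is ever needed.
 */
```

The second call from the same slot returns immediately, matching the
"you need to propagate very infrequently" argument above.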
Re: [PATCH 0/4] KVM: Dirty logging optimization using rmap
On 11/30/2011 07:15 AM, Takuya Yoshikawa wrote:
> (2011/11/30 14:02), Takuya Yoshikawa wrote:
>> IIUC, even though O(1) is O(1) at the time of GET_DIRTY_LOG, it needs
>> O(N) write protections with respect to the total number of dirty
>> pages: distributed, but each page fault that should be logged does
>> some write protection?
>
> Sorry, I was not precise. It depends on the level, and it is not
> completely distributed. But I think it is O(N), and the total cost
> will not change much, I guess.

That's true. But some applications do require low latency, and the
current code can impose a lot of time with the mmu spinlock held. The
total amount of work actually increases slightly, from O(N) to
O(N log N), but since the tree is so wide, the overhead is small.
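A back-of-the-envelope reading of those bounds (my gloss; the thread
does not spell this out): rmap-based protection does constant work per
dirty page, while hierarchical protection can additionally touch each
level on the root-to-leaf path, and with 512-entry page tables over M
guest pages that depth is at most 4 on x86-64:

```latex
W_{\mathrm{rmap}}(N) = O(N),
\qquad
W_{\mathrm{hier}}(N) = O\bigl(N \log_{512} M\bigr),
\qquad
\log_{512} M \le 4 \ \text{(4-level paging)}
```

so the extra logarithmic factor is a small constant in practice, which
is why the tree being "so wide" keeps the overhead small.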
Re: [PATCH 0/4] KVM: Dirty logging optimization using rmap
Avi Kivity <avi at redhat.com> writes:
> On 11/16/2011 06:28 AM, Takuya Yoshikawa wrote:
>> (2011/11/14 21:39), Avi Kivity wrote:
>>> There was a patchset from Peter Zijlstra that converted mmu
>>> notifiers to be preemptible; with that, we can convert the mmu
>>> spinlock to a mutex. I'll see what happened to it.
>>
>> Interesting!
>>
>>> There is a third method of doing write protection, and that is by
>>> write-protecting at the higher levels of the paging hierarchy. The
>>> advantage there is that write protection is O(1) no matter how large
>>> the guest is, or the number of dirty pages. To write protect all
>>> guest memory, we just write protect the 512 PTEs at the very top,
>>> and leave the rest alone. When the guest writes to a page, we allow
>>> writes for the top-level PTE that faulted, and write-protect all the
>>> PTEs that it points to.
>>
>> One important point is that the guest, not the GET_DIRTY_LOG caller,
>> will pay for the write protection, at fault time.
>
> I don't think there is a significant difference. The number of write
> faults does not change. The amount of work done per fault does, but
> not by much, thanks to the writeable bitmap.

Avi, I think it needs more thinking when only a few pages need to be
write protected. For example, with a framebuffer-based device used by
Xwindow, only ~64M needs to be write protected, but in your way, the
guest will get write page faults on all memory? Hmm? Are there some
tricks that I missed?
Re: [PATCH 0/4] KVM: Dirty logging optimization using rmap
Sorry, CC list is lost. :(

On 11/29/2011 06:01 PM, Xiao Guangrong wrote:
> Avi Kivity <avi at redhat.com> writes:
>> On 11/16/2011 06:28 AM, Takuya Yoshikawa wrote:
>>> (2011/11/14 21:39), Avi Kivity wrote:
>>>> There was a patchset from Peter Zijlstra that converted mmu
>>>> notifiers to be preemptible; with that, we can convert the mmu
>>>> spinlock to a mutex. I'll see what happened to it.
>>>
>>> Interesting!
>>>
>>>> There is a third method of doing write protection, and that is by
>>>> write-protecting at the higher levels of the paging hierarchy. The
>>>> advantage there is that write protection is O(1) no matter how
>>>> large the guest is, or the number of dirty pages. To write protect
>>>> all guest memory, we just write protect the 512 PTEs at the very
>>>> top, and leave the rest alone. When the guest writes to a page, we
>>>> allow writes for the top-level PTE that faulted, and write-protect
>>>> all the PTEs that it points to.
>>>
>>> One important point is that the guest, not the GET_DIRTY_LOG caller,
>>> will pay for the write protection, at fault time.
>>
>> I don't think there is a significant difference. The number of write
>> faults does not change. The amount of work done per fault does, but
>> not by much, thanks to the writeable bitmap.
>
> Avi, I think it needs more thinking when only a few pages need to be
> write protected. For example, with a framebuffer-based device used by
> Xwindow, only ~64M needs to be write protected, but in your way, the
> guest will get write page faults on all memory? Hmm? Are there some
> tricks that I missed?
Re: [PATCH 0/4] KVM: Dirty logging optimization using rmap
(2011/11/29 19:09), Xiao Guangrong wrote:
> Sorry, CC list is lost. :(
>
> On 11/29/2011 06:01 PM, Xiao Guangrong wrote:
>> Avi Kivity <avi at redhat.com> writes:
>>> On 11/16/2011 06:28 AM, Takuya Yoshikawa wrote:
>>>> (2011/11/14 21:39), Avi Kivity wrote:
>>>>> There was a patchset from Peter Zijlstra that converted mmu
>>>>> notifiers to be preemptible; with that, we can convert the mmu
>>>>> spinlock to a mutex. I'll see what happened to it.
>>>>
>>>> Interesting!
>>>>
>>>>> There is a third method of doing write protection, and that is by
>>>>> write-protecting at the higher levels of the paging hierarchy. The
>>>>> advantage there is that write protection is O(1) no matter how
>>>>> large the guest is, or the number of dirty pages. To write protect
>>>>> all guest memory, we just write protect the 512 PTEs at the very
>>>>> top, and leave the rest alone. When the guest writes to a page, we
>>>>> allow writes for the top-level PTE that faulted, and write-protect
>>>>> all the PTEs that it points to.
>>>>
>>>> One important point is that the guest, not the GET_DIRTY_LOG
>>>> caller, will pay for the write protection, at fault time.
>>>
>>> I don't think there is a significant difference. The number of write
>>> faults does not change. The amount of work done per fault does, but
>>> not by much, thanks to the writeable bitmap.
>>
>> Avi, I think it needs more thinking when only a few pages need to be
>> write protected. For example, with a framebuffer-based device used by
>> Xwindow, only ~64M needs to be write protected, but in your way, the
>> guest will get write page faults on all memory? Hmm? Are there some
>> tricks that I missed?

Do you mean write protecting slot by slot is difficult in the case of
O(1)?

	Takuya
Re: [PATCH 0/4] KVM: Dirty logging optimization using rmap
On 11/29/2011 07:20 PM, Avi Kivity wrote:
> We used to have a bitmap in a shadow page with a bit set for every
> slot pointed to by the page. If we extend this to non-leaf pages (so,
> when we set a bit, we propagate it through its parent_ptes list), then
> we do the following on write fault:

Thanks for the detail. Um, propagating the slot bit to parent ptes is a
little slow; especially, it is overhead for non-Xwindow guests, which
are dirty logged only during migration (I guess most Linux guests run
in this mode, and migration is not frequent). No?
Re: [PATCH 0/4] KVM: Dirty logging optimization using rmap
On 11/29/2011 01:56 PM, Xiao Guangrong wrote:
> On 11/29/2011 07:20 PM, Avi Kivity wrote:
>> We used to have a bitmap in a shadow page with a bit set for every
>> slot pointed to by the page. If we extend this to non-leaf pages (so,
>> when we set a bit, we propagate it through its parent_ptes list),
>> then we do the following on write fault:
>
> Thanks for the detail. Um, propagating the slot bit to parent ptes is
> a little slow; especially, it is overhead for non-Xwindow guests,
> which are dirty logged only during migration (I guess most Linux
> guests run in this mode, and migration is not frequent). No?

You need to propagate very infrequently. The first pte added to a page
will need to propagate, but the second (if from the same slot, which is
likely) will already have the bit set in the page, so we're assured
it's set in all its parents.
Re: [PATCH 0/4] KVM: Dirty logging optimization using rmap
On 11/29/2011 02:01 PM, Avi Kivity wrote:
> On 11/29/2011 01:56 PM, Xiao Guangrong wrote:
>> On 11/29/2011 07:20 PM, Avi Kivity wrote:
>>> We used to have a bitmap in a shadow page with a bit set for every
>>> slot pointed to by the page. If we extend this to non-leaf pages
>>> (so, when we set a bit, we propagate it through its parent_ptes
>>> list), then we do the following on write fault:
>>
>> Thanks for the detail. Um, propagating the slot bit to parent ptes is
>> a little slow; especially, it is overhead for non-Xwindow guests,
>> which are dirty logged only during migration (I guess most Linux
>> guests run in this mode, and migration is not frequent). No?
>
> You need to propagate very infrequently. The first pte added to a page
> will need to propagate, but the second (if from the same slot, which
> is likely) will already have the bit set in the page, so we're assured
> it's set in all its parents.

btw, if you plan to work on this, let's agree on pseudocode/data
structures first to minimize churn. I'll also want this documented in
mmu.txt. Of course we can still end up with something different than
planned, but let's at least try to think of the issues in advance.
Re: [PATCH 0/4] KVM: Dirty logging optimization using rmap
CCing qemu-devel, Juan,

(2011/11/29 23:03), Avi Kivity wrote:
> On 11/29/2011 02:01 PM, Avi Kivity wrote:
>> On 11/29/2011 01:56 PM, Xiao Guangrong wrote:
>>> On 11/29/2011 07:20 PM, Avi Kivity wrote:
>>>> We used to have a bitmap in a shadow page with a bit set for every
>>>> slot pointed to by the page. If we extend this to non-leaf pages
>>>> (so, when we set a bit, we propagate it through its parent_ptes
>>>> list), then we do the following on write fault:
>>>
>>> Thanks for the detail. Um, propagating the slot bit to parent ptes
>>> is a little slow; especially, it is overhead for non-Xwindow guests,
>>> which are dirty logged only during migration (I guess most Linux
>>> guests run in this mode, and migration is not frequent). No?
>>
>> You need to propagate very infrequently. The first pte added to a
>> page will need to propagate, but the second (if from the same slot,
>> which is likely) will already have the bit set in the page, so we're
>> assured it's set in all its parents.
>
> btw, if you plan to work on this, let's agree on pseudocode/data
> structures first to minimize churn. I'll also want this documented in
> mmu.txt. Of course we can still end up with something different than
> planned, but let's at least try to think of the issues in advance.

I want to hear the overall view as well.

Now we are trying to improve cases when there are too many dirty pages
during live migration. I did some measurements of live migration some
months ago on a 10Gbps dedicated line, two servers directly connected,
and confirmed that transferring even a few MBs of memory took
milliseconds of latency, excluding other QEMU-side overheads: it
matches a simple math calculation. In another test, I found that even a
relatively normal workload needed a few seconds of pause at the last
stage. Juan has more data?

So, the current scheme is not scalable with respect to the number of
dirty pages, and administrators should avoid migrating during such
workloads if possible. Server consolidation at night will be OK, but
dynamic load balancing may not work well under such restrictions: I am
now more interested in the former.

With that in mind, I set the goal at 1K dirty pages, 4MB of memory,
when I did the rmap optimization. Now it takes a few ms or so to write
protect that number of pages, IIRC: that is not so bad compared to the
overall latency?

So, though I like the O(1) method, I want to hear the expected
improvements in a bit more detail, if possible. IIUC, even though O(1)
is O(1) at the time of GET_DIRTY_LOG, it needs O(N) write protections
with respect to the total number of dirty pages: distributed, but each
page fault that should be logged does some write protection?

In general, what kind of improvements are actually needed for live
migration?

Thanks,
	Takuya
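The "simple math calculation" mentioned above is easy to reproduce.
This is a back-of-envelope sketch; the 10Gbps line rate and the
1K-pages/4MB figure come from the message, everything else (the helper
name, ignoring protocol overhead) is assumption:

```c
#include <assert.h>

/*
 * Wire time in milliseconds to send `bytes` over a link running at
 * `bps` bits/sec, ignoring protocol overhead and QEMU-side costs.
 */
static double wire_time_ms(double bytes, double bps)
{
	return bytes * 8.0 / bps * 1e3;
}

/*
 * 1K dirty pages x 4KB = 4MB over a 10Gbps dedicated line comes out
 * around 3.4 ms of raw wire time alone - i.e. "ms order of latency"
 * for "a few MBs of memory", matching the measurements above.
 */
```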
Re: [PATCH 0/4] KVM: Dirty logging optimization using rmap
(2011/11/30 14:02), Takuya Yoshikawa wrote:
> IIUC, even though O(1) is O(1) at the time of GET_DIRTY_LOG, it needs
> O(N) write protections with respect to the total number of dirty
> pages: distributed, but each page fault that should be logged does
> some write protection?

Sorry, I was not precise. It depends on the level, and it is not
completely distributed. But I think it is O(N), and the total cost will
not change much, I guess.

	Takuya

> In general, what kind of improvements are actually needed for live
> migration?
>
> Thanks,
>	Takuya
Re: [PATCH 0/4] KVM: Dirty logging optimization using rmap
On 11/29/2011 08:01 PM, Avi Kivity wrote:
> On 11/29/2011 01:56 PM, Xiao Guangrong wrote:
>> On 11/29/2011 07:20 PM, Avi Kivity wrote:
>>> We used to have a bitmap in a shadow page with a bit set for every
>>> slot pointed to by the page. If we extend this to non-leaf pages
>>> (so, when we set a bit, we propagate it through its parent_ptes
>>> list), then we do the following on write fault:
>>
>> Thanks for the detail. Um, propagating the slot bit to parent ptes is
>> a little slow; especially, it is overhead for non-Xwindow guests,
>> which are dirty logged only during migration (I guess most Linux
>> guests run in this mode, and migration is not frequent). No?
>
> You need to propagate very infrequently. The first pte added to a page
> will need to propagate, but the second (if from the same slot, which
> is likely) will already have the bit set in the page, so we're assured
> it's set in all its parents.

What will happen if a guest page is unmapped or a shadow page is
zapped? It should immediately clear the slot bit of the shadow page and
its parents; that means it should propagate this clear-bit event to all
parents, in the case of softmmu. Zapping shadow pages is frequent;
maybe that is unacceptable?

It is not like the unsync bit, which can be lazily cleared, because all
bits in the hierarchy can be cleared on cr3 reload.
Re: [PATCH 0/4] KVM: Dirty logging optimization using rmap
On 11/29/2011 10:03 PM, Avi Kivity wrote:
> On 11/29/2011 02:01 PM, Avi Kivity wrote:
>> On 11/29/2011 01:56 PM, Xiao Guangrong wrote:
>>> On 11/29/2011 07:20 PM, Avi Kivity wrote:
>>>> We used to have a bitmap in a shadow page with a bit set for every
>>>> slot pointed to by the page. If we extend this to non-leaf pages
>>>> (so, when we set a bit, we propagate it through its parent_ptes
>>>> list), then we do the following on write fault:
>>>
>>> Thanks for the detail. Um, propagating the slot bit to parent ptes
>>> is a little slow; especially, it is overhead for non-Xwindow guests,
>>> which are dirty logged only during migration (I guess most Linux
>>> guests run in this mode, and migration is not frequent). No?
>>
>> You need to propagate very infrequently. The first pte added to a
>> page will need to propagate, but the second (if from the same slot,
>> which is likely) will already have the bit set in the page, so we're
>> assured it's set in all its parents.
>
> btw, if you plan to work on this, let's agree on pseudocode/data
> structures first to minimize churn. I'll also want this documented in
> mmu.txt. Of course we can still end up with something different than
> planned, but let's at least try to think of the issues in advance.

Yeap, this work is interesting, I will keep researching it. ;)
Re: [PATCH 0/4] KVM: Dirty logging optimization using rmap
On 11/14/2011 11:20 AM, Takuya Yoshikawa wrote:
> This is a revised version of my previous work. I hope that the patches
> are more self explanatory than before.

Thanks, applied.
Re: [PATCH 0/4] KVM: Dirty logging optimization using rmap
Adding qemu-devel to Cc.

(2011/11/14 21:39), Avi Kivity wrote:
> On 11/14/2011 12:56 PM, Takuya Yoshikawa wrote:
>> (2011/11/14 19:25), Avi Kivity wrote:
>>> On 11/14/2011 11:20 AM, Takuya Yoshikawa wrote:
>>>> This is a revised version of my previous work. I hope that the
>>>> patches are more self explanatory than before.
>>>
>>> It looks good. I'll let Marcelo (or anyone else?) review it as well
>>> before applying. Do you have performance measurements?
>>
>> For VGA, 30-40us became 3-5us when the display was quiet, with an
>> adequately warmed-up guest.
>
> That's a nice improvement.
>
>> Near the criterion, the numbers were not much different from the
>> original version. For live migration, I forgot the numbers but the
>> result was good. But my test case did not cover every pattern, so I
>> changed the criterion to be a bit conservative. More tests may find a
>> better criterion. I am not in a hurry about this, so it is OK to add
>> some tests before merging this.
>
> I think we can merge it as is; it's clear we get an improvement.

I did a simple test to show numbers! Here, a 4GB guest was being
migrated locally while a file was being copied inside it. Case 1
corresponds to the original method and case 2 to the optimized one.

Small numbers are, probably, from VGA:
  Case 1: about 30us
  Case 2: about 3us

Other numbers are from the system RAM (triggered by live migration):
  Case 1: about 500us, 2000us
  Case 2: about 80us, 2000us
(not exactly averaged; see below for details)
  * 2000us was when rmap was not used, so equal to that of case 1.

So I can say that my patch worked well for both VGA and live migration.

	Takuya

=== measurement snippet ===

Case 1. kvm_mmu_slot_remove_write_access() only (same as the original
method):

qemu-system-x86-25413 [000] 6546.215009: funcgraph_entry:                |  write_protect_slot() {
qemu-system-x86-25413 [000] 6546.215010: funcgraph_entry: ! 2039.512 us  |    kvm_mmu_slot_remove_write_access();
qemu-system-x86-25413 [000] 6546.217051: funcgraph_exit:  ! 2040.487 us  |  }
qemu-system-x86-25413 [002] 6546.217347: funcgraph_entry:                |  write_protect_slot() {
qemu-system-x86-25413 [002] 6546.217349: funcgraph_entry: ! 571.121 us   |    kvm_mmu_slot_remove_write_access();
qemu-system-x86-25413 [002] 6546.217921: funcgraph_exit:  ! 572.525 us   |  }
qemu-system-x86-25413 [000] 6546.314583: funcgraph_entry:                |  write_protect_slot() {
qemu-system-x86-25413 [000] 6546.314585: funcgraph_entry: + 29.598 us    |    kvm_mmu_slot_remove_write_access();
qemu-system-x86-25413 [000] 6546.314616: funcgraph_exit:  + 31.053 us    |  }
qemu-system-x86-25413 [000] 6546.314784: funcgraph_entry:                |  write_protect_slot() {
qemu-system-x86-25413 [000] 6546.314785: funcgraph_entry: ! 2002.591 us  |    kvm_mmu_slot_remove_write_access();
qemu-system-x86-25413 [000] 6546.316788: funcgraph_exit:  ! 2003.537 us  |  }
qemu-system-x86-25413 [000] 6546.317082: funcgraph_entry:                |  write_protect_slot() {
qemu-system-x86-25413 [000] 6546.317083: funcgraph_entry: ! 624.445 us   |    kvm_mmu_slot_remove_write_access();
qemu-system-x86-25413 [000] 6546.317709: funcgraph_exit:  ! 625.861 us   |  }
qemu-system-x86-25413 [000] 6546.414261: funcgraph_entry:                |  write_protect_slot() {
qemu-system-x86-25413 [000] 6546.414263: funcgraph_entry: + 29.593 us    |    kvm_mmu_slot_remove_write_access();
qemu-system-x86-25413 [000] 6546.414293: funcgraph_exit:  + 30.944 us    |  }
qemu-system-x86-25413 [000] 6546.414528: funcgraph_entry:                |  write_protect_slot() {
qemu-system-x86-25413 [000] 6546.414529: funcgraph_entry: ! 1990.363 us  |    kvm_mmu_slot_remove_write_access();
qemu-system-x86-25413 [000] 6546.416520: funcgraph_exit:  ! 1991.370 us  |  }
qemu-system-x86-25413 [000] 6546.416775: funcgraph_entry:                |  write_protect_slot() {
qemu-system-x86-25413 [000] 6546.416776: funcgraph_entry: ! 594.333 us   |    kvm_mmu_slot_remove_write_access();
qemu-system-x86-25413 [000] 6546.417371: funcgraph_exit:  ! 595.415 us   |  }
qemu-system-x86-25413 [000] 6546.514133: funcgraph_entry:                |  write_protect_slot() {
qemu-system-x86-25413 [000] 6546.514135: funcgraph_entry: + 24.032 us    |    kvm_mmu_slot_remove_write_access();
qemu-system-x86-25413 [000] 6546.514160: funcgraph_exit:  + 25.074 us    |  }
qemu-system-x86-25413 [000] 6546.514312: funcgraph_entry:                |  write_protect_slot() {
qemu-system-x86-25413 [000] 6546.514313: funcgraph_entry: ! 2035.365 us  |    kvm_mmu_slot_remove_write_access();
qemu-system-x86-25413 [000] 6546.516349: funcgraph_exit:  ! 2036.298 us  |  }
qemu-system-x86-25413 [000] 6546.516642: funcgraph_entry:                |  write_protect_slot() {
Re: [PATCH 0/4] KVM: Dirty logging optimization using rmap
On 11/16/2011 06:28 AM, Takuya Yoshikawa wrote:
> (2011/11/14 21:39), Avi Kivity wrote:
>> There was a patchset from Peter Zijlstra that converted mmu notifiers
>> to be preemptible; with that, we can convert the mmu spinlock to a
>> mutex. I'll see what happened to it.
>
> Interesting!
>
>> There is a third method of doing write protection, and that is by
>> write-protecting at the higher levels of the paging hierarchy. The
>> advantage there is that write protection is O(1) no matter how large
>> the guest is, or the number of dirty pages. To write protect all
>> guest memory, we just write protect the 512 PTEs at the very top, and
>> leave the rest alone. When the guest writes to a page, we allow
>> writes for the top-level PTE that faulted, and write-protect all the
>> PTEs that it points to.
>
> One important point is that the guest, not the GET_DIRTY_LOG caller,
> will pay for the write protection, at fault time.

I don't think there is a significant difference. The number of write
faults does not change. The amount of work done per fault does, but not
by much, thanks to the writeable bitmap.
Re: [PATCH 0/4] KVM: Dirty logging optimization using rmap
(2011/11/14 21:39), Avi Kivity wrote:
> There was a patchset from Peter Zijlstra that converted mmu notifiers
> to be preemptible; with that, we can convert the mmu spinlock to a
> mutex. I'll see what happened to it.

Interesting!

> There is a third method of doing write protection, and that is by
> write-protecting at the higher levels of the paging hierarchy. The
> advantage there is that write protection is O(1) no matter how large
> the guest is, or the number of dirty pages. To write protect all guest
> memory, we just write protect the 512 PTEs at the very top, and leave
> the rest alone. When the guest writes to a page, we allow writes for
> the top-level PTE that faulted, and write-protect all the PTEs that it
> points to.

One important point is that the guest, not the GET_DIRTY_LOG caller,
will pay for the write protection, at fault time. For live migration,
it may be good because we have to make the guest memory converge
anyway.

> We can combine it with your method by having a small bitmap (say, just
> 64 bits) per shadow page. Each bit represents 8 PTEs (total 512 PTEs)
> and is set if any of those PTEs are writeable.

Yes, there seem to be some good ways to make every case work well.

	Takuya
[PATCH 0/4] KVM: Dirty logging optimization using rmap
This is a revised version of my previous work. I hope that the patches
are more self explanatory than before.

Thanks,
	Takuya
Re: [PATCH 0/4] KVM: Dirty logging optimization using rmap
On 11/14/2011 11:20 AM, Takuya Yoshikawa wrote:
> This is a revised version of my previous work. I hope that the patches
> are more self explanatory than before.

It looks good. I'll let Marcelo (or anyone else?) review it as well
before applying. Do you have performance measurements?
Re: [PATCH 0/4] KVM: Dirty logging optimization using rmap
(2011/11/14 19:25), Avi Kivity wrote:
> On 11/14/2011 11:20 AM, Takuya Yoshikawa wrote:
>> This is a revised version of my previous work. I hope that the
>> patches are more self explanatory than before.
>
> It looks good. I'll let Marcelo (or anyone else?) review it as well
> before applying. Do you have performance measurements?

For VGA, 30-40us became 3-5us when the display was quiet, with an
adequately warmed-up guest. Near the criterion, the numbers were not
much different from the original version. For live migration, I forgot
the numbers but the result was good. But my test case did not cover
every pattern, so I changed the criterion to be a bit conservative.
More tests may find a better criterion. I am not in a hurry about this,
so it is OK to add some tests before merging this. But what I did not
like was holding the spin lock for more than 100us or so with the
original version. With this version, at least, that problem should be
alleviated somewhat.

	Takuya

One note: kvm-unit-tests' dirty logging test was broken for 32-bit
boxes: a compile error. I changed an "idt" to "boot_idt" and used it. I
do not know kvm-unit-tests well, so I want somebody to fix that
officially.
Re: [PATCH 0/4] KVM: Dirty logging optimization using rmap
On 11/14/2011 12:56 PM, Takuya Yoshikawa wrote:
> (2011/11/14 19:25), Avi Kivity wrote:
>> On 11/14/2011 11:20 AM, Takuya Yoshikawa wrote:
>>> This is a revised version of my previous work. I hope that the
>>> patches are more self explanatory than before.
>>
>> It looks good. I'll let Marcelo (or anyone else?) review it as well
>> before applying. Do you have performance measurements?
>
> For VGA, 30-40us became 3-5us when the display was quiet, with an
> adequately warmed-up guest.

That's a nice improvement.

> Near the criterion, the numbers were not much different from the
> original version. For live migration, I forgot the numbers but the
> result was good. But my test case did not cover every pattern, so I
> changed the criterion to be a bit conservative. More tests may find a
> better criterion. I am not in a hurry about this, so it is OK to add
> some tests before merging this.

I think we can merge it as is; it's clear we get an improvement.

> But what I did not like was holding the spin lock for more than 100us
> or so with the original version. With this version, at least, that
> problem should be alleviated somewhat.

There was a patchset from Peter Zijlstra that converted mmu notifiers
to be preemptible; with that, we can convert the mmu spinlock to a
mutex. I'll see what happened to it.

There is a third method of doing write protection, and that is by
write-protecting at the higher levels of the paging hierarchy. The
advantage there is that write protection is O(1) no matter how large
the guest is, or the number of dirty pages. To write protect all guest
memory, we just write protect the 512 PTEs at the very top, and leave
the rest alone. When the guest writes to a page, we allow writes for
the top-level PTE that faulted, and write-protect all the PTEs that it
points to.

We can combine it with your method by having a small bitmap (say, just
64 bits) per shadow page. Each bit represents 8 PTEs (total 512 PTEs)
and is set if any of those PTEs are writeable.

> One note: kvm-unit-tests' dirty logging test was broken for 32-bit
> boxes: a compile error. I changed an "idt" to "boot_idt" and used it.
> I do not know kvm-unit-tests well, so I want somebody to fix that
> officially.

I'll look into it.
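The 64-bit per-shadow-page bitmap described above could be modeled like
this. An illustrative sketch only, not KVM code: the struct and helper
names are invented, and the shadow page is reduced to the two fields
the idea needs.

```c
#include <stdint.h>

#define PTES_PER_PAGE 512
#define PTES_PER_BIT  8   /* 512 PTEs / 64 bitmap bits */

/* Hypothetical stand-in for a KVM shadow page. */
struct shadow_page {
	uint64_t spte[PTES_PER_PAGE];
	uint64_t writeable_bitmap; /* bit i: any of PTEs 8i..8i+7 writeable */
};

/* Record that PTE `index` was made writeable. */
static void mark_writeable(struct shadow_page *sp, unsigned int index)
{
	sp->writeable_bitmap |= 1ull << (index / PTES_PER_BIT);
}

/*
 * Write protection only has to visit the 8-PTE groups whose bit is
 * set; a clear bit guarantees the whole group is already read-only.
 */
static unsigned int groups_to_scan(const struct shadow_page *sp)
{
	return (unsigned int)__builtin_popcountll(sp->writeable_bitmap);
}
```

With this, re-protecting a mostly-clean shadow page touches only a
handful of 8-PTE groups instead of all 512 entries, which is what keeps
the per-fault work small in the combined scheme.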