CCing qemu-devel and Juan.
(2011/11/29 23:03), Avi Kivity wrote:
On 11/29/2011 02:01 PM, Avi Kivity wrote:
On 11/29/2011 01:56 PM, Xiao Guangrong wrote:
On 11/29/2011 07:20 PM, Avi Kivity wrote:
We used to have a bitmap in a shadow page with a bit set for every slot
pointed to by the page. If we extend this to non-leaf pages (so, when
we set a bit, we propagate it through its parent_ptes list), then we do
the following on write fault:
Thanks for the detail.
Um, propagating the slot bit to parent ptes is a little slow; in
particular, it is pure overhead for guests without Xwindow, which are
dirty logged only during migration (I guess most Linux guests run in
this mode, and migration is not frequent). No?
You need to propagate very infrequently. The first pte added to a page
will need to propagate, but the second (if from the same slot, which is
likely) will already have the bit set in the page, so we're assured it's
set in all its parents.
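(To make sure I am reading the proposal right, here is a rough sketch of
the propagation I have in mind. slot_bitmap and the single parent pointer
are only illustrative stand-ins; the real kvm_mmu_page uses the
parent_ptes chain, which can have several entries.)

struct sp_sketch {
	unsigned long slot_bitmap;	/* one bit per memslot reachable from this page */
	struct sp_sketch *parent;	/* stand-in for the real parent_ptes list */
};

/*
 * Set the slot bit and propagate it upwards, stopping as soon as an
 * ancestor already has the bit set: the common case (second pte from
 * the same slot) then costs a single test, as described above.
 */
static void sketch_set_slot_bit(struct sp_sketch *sp, int slot)
{
	while (sp && !(sp->slot_bitmap & (1UL << slot))) {
		sp->slot_bitmap |= 1UL << slot;
		sp = sp->parent;
	}
}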
btw, if you plan to work on this, let's agree on pseudocode/data
structures first to minimize churn. I'll also want this documented in
mmu.txt. Of course we can still end up with something different than
planned, but let's at least try to think of the issues in advance.
I want to hear the overall view as well.
Now we are trying to improve the case in which there are too many dirty
pages during live migration.
I did some measurements of live migration a few months ago on a 10Gbps
dedicated line, with two servers directly connected, and confirmed that
transferring even a few MBs of memory took latency on the order of
milliseconds, even after excluding other QEMU-side overheads: this
matches a simple calculation.
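For example, the wire time alone for 4MB at 10Gbps is roughly

    4MB = 4 * 1024 * 1024 * 8 bits = ~33.6 Mbits
    33.6 Mbits / 10000 Mbits/s     = ~3.4 ms

so ms-order latency for a few MBs is already what the link itself costs,
before any per-page processing on either side.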
In another test, I found that even under a relatively normal workload,
a pause of a few seconds was needed at the final stage.
Juan, do you have more data?
So, the current scheme does not scale with the number of dirty pages,
and administrators have to avoid migrating during such workloads if
possible.
Server consolidation at night will be OK, but dynamic load balancing
may not work well under such restrictions: I am now more interested in
the former.
Then, with that in mind, I set the goal at 1K dirty pages, 4MB of memory,
when I did the rmap optimization. Now it takes a few ms or so to write
protect that number of pages, IIRC: that is not so bad compared to the
overall latency?
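(For reference, the GET_DIRTY_LOG side work in the current scheme is
roughly the following; write_protect_gfn() is a hypothetical stand-in
for the real rmap walk, test_bit() is the usual kernel helper.)

#include <linux/bitops.h>

/*
 * Very simplified sketch of the current scheme: at GET_DIRTY_LOG time we
 * walk the dirty bitmap and write protect every dirty gfn through its
 * rmap, so the cost is O(number of dirty pages), paid in one burst.
 */
static void sketch_write_protect_dirty(unsigned long *dirty_bitmap,
				       unsigned long npages,
				       void (*write_protect_gfn)(unsigned long gfn))
{
	unsigned long gfn;

	for (gfn = 0; gfn < npages; gfn++)
		if (test_bit(gfn, dirty_bitmap))
			write_protect_gfn(gfn);
}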
So, though I like the O(1) method, I want to hear about the expected
improvements in a bit more detail, if possible.
IIUC, even though O(1) is O(1) at GET_DIRTY_LOG time, it still needs
O(N) write protections with respect to the total number of dirty pages:
the work is distributed, but each page fault that must be logged still
does some write protection, right?
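(My mental model of where the cost moves is something like the sketch
below; the names are made up, and the fault side is only a comment, but
that is the part I am asking about.)

#include <linux/bitops.h>

/*
 * With the O(1) approach, GET_DIRTY_LOG itself only exchanges the bitmap,
 * which is constant cost, while each write fault that must be logged marks
 * the page dirty and makes the spte writable again -- so the O(N) write
 * protection work is still there, just spread over the faults.
 */
static unsigned long *sketch_get_dirty_log(unsigned long **cur_bitmap,
					   unsigned long *empty_bitmap)
{
	unsigned long *full = *cur_bitmap;

	*cur_bitmap = empty_bitmap;	/* O(1) swap, independent of dirty count */
	return full;			/* caller copies this out to userspace */
}

static void sketch_logged_write_fault(unsigned long gfn,
				      unsigned long *cur_bitmap)
{
	__set_bit(gfn, cur_bitmap);	/* mark dirty */
	/*
	 * ... make the spte writable here; it has to be write protected
	 * again before the next dirty log round, which is where the
	 * per-dirty-page work comes back.
	 */
}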
In general, what kind of improvements are actually needed for live migration?
Thanks,
Takuya