On 11/06/2012 07:22 AM, Alexey Kardashevskiy wrote:
> On 02/11/12 23:12, Orit Wasserman wrote:
>> On 11/02/2012 05:10 AM, David Gibson wrote:
>>> Asking for some advice on the list.
>>>
>>> I have prototype savevm and migration support ready for the pseries
>>> machine. They seem to work under simple circumstances (idle guest).
>>> To test them more extensively I've been attempting to perform live
>>> migrations (just over tcp->localhost) while the guest is active with
>>> something. In particular I've tried while using octave to do matrix
>>> multiply (so exercising the FP unit) and my colleague Alexey has tried
>>> during some video encoding.
>>>
>> As you are doing local migration, one option is to set the speed higher
>> than line speed, as we don't actually send the data; another is to set a
>> high downtime.
>>
>>> However, in each of these cases, we've found that the migration only
>>> completes and the source instance only stops after the intensive
>>> workload has (just) completed. What I surmise is happening is that
>>> the workload is touching memory pages fast enough that the ram
>>> migration code is never getting below the threshold to complete the
>>> migration until the guest is idle again.
>>>
>> The workload you chose is really bad for live migration, as all the guest
>> does is dirty its memory. I recommend looking for a workload that does some
>> networking or disk I/O.
>> Vinod succeeded in running the SwingBench and SLOB benchmarks, which
>> converged OK. I don't know if they run on pseries, but a similar workload
>> (small database/warehouse) should be OK.
>> We found that SpecJbb, on the other hand, is hard to converge.
>> A web workload or video streaming also does the trick.
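
To illustrate why a memory-dirtying workload may never converge, here is a
rough sketch of the pre-copy loop as a toy model. It is not QEMU's actual
code: the function name, the parameters and the numbers are all made up for
illustration, and it assumes a constant dirty rate and constant bandwidth.

```python
MiB = 1024 * 1024

def rounds_to_converge(ram_bytes, dirty_rate, bandwidth, max_downtime,
                       max_rounds=100):
    """Toy model (hypothetical, not QEMU code) of iterative pre-copy.

    Each round sends everything that is currently dirty while the guest
    keeps dirtying memory at `dirty_rate` bytes/s. Returns the number of
    rounds needed before the remaining data can be sent within
    `max_downtime` seconds, or None if that never happens.
    """
    remaining = ram_bytes
    for rounds in range(max_rounds):
        send_time = remaining / bandwidth
        if send_time <= max_downtime:
            return rounds  # the final stop-and-copy fits in the downtime budget
        # Pages dirtied while this round was being sent, capped at guest RAM.
        remaining = min(ram_bytes, dirty_rate * send_time)
    return None

# Guest dirties memory slower than we can send it: converges.
print(rounds_to_converge(1024 * MiB, 8 * MiB, 32 * MiB, 0.03))
# Guest dirties memory faster than we can send it: never converges.
print(rounds_to_converge(1024 * MiB, 40 * MiB, 32 * MiB, 0.03))
# Same slow guest, but with a much larger allowed downtime.
print(rounds_to_converge(1024 * MiB, 8 * MiB, 32 * MiB, 1.0))
```

In this model, raising the bandwidth or the allowed downtime is what lets the
loop terminate while the guest is still busy.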
>
>
> My ffmpeg workload is simple encoding h263+ac3 to h263+ac3, 64*36 pixels. So
> it should not be dirtying memory too much. Or is it?
>
> (qemu) info migrate
> capabilities: xbzrle: off
> Migration status: completed
> total time: 14538 milliseconds
> downtime: 1273 milliseconds
> transferred ram: 389961 kbytes
> remaining ram: 0 kbytes
> total ram: 1065024 kbytes
> duplicate: 181949 pages
> normal: 97446 pages
> normal bytes: 389784 kbytes
>
> How many bytes were actually transferred? "duplicate" * 4K = 745MB?
For each duplicate page we send one byte plus the page header; those are
usually zero pages.
"transferred" is the actual number of bytes sent, so here around 389M was sent.
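
As a quick sanity check of the numbers above, the arithmetic can be written
out like this. The 4K page size is an assumption (it is what makes "normal"
match "normal bytes"), and the variable names are only for illustration.

```python
PAGE_SIZE_KB = 4  # assumed page size in kbytes

# Values copied from the "info migrate" output above.
normal_pages = 97446
duplicate_pages = 181949
transferred_kb = 389961

# Pages sent in full account for almost all of the traffic.
normal_kb = normal_pages * PAGE_SIZE_KB
# Whatever is left of "transferred ram" after the full pages.
remainder_kb = transferred_kb - normal_kb
# What the duplicates would have cost if each were sent as a full page.
duplicates_if_sent_in_full_kb = duplicate_pages * PAGE_SIZE_KB

print("normal pages:       ", normal_kb, "kbytes")
print("remainder:          ", remainder_kb, "kbytes")
print("duplicates, if full:", duplicates_if_sent_in_full_kb, "kbytes")
```

So the duplicate pages add very little to "transferred ram", nowhere near
the 745MB they would take if each one were sent in full.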
>
> Is there any tool in QEMU to see how many pages are used/dirty/etc?
Sadly, no.
> "info" does not seem to have any kind of such statistic.
>
> btw the new guest did not resume (qemu still responds on commands) but this
> is probably our problem within "pseries" platform. What is strange is that
> "info migrate" on the new guest shows nothing:
>
> (qemu) info migrate
> (qemu)
>
The "info migrate" command displays outgoing migration information, not
incoming.
>
>
>
>> Cheers,
>> Orit
>>
>>> Does anyone have some ideas for testing this better: workloads that
>>> are less likely to trigger this behaviour, or settings to tweak in the
>>> migration itself to make it more likely to complete migration while
>>> the workload is still active.
>>>
>>
>
>