Re: [Qemu-devel] [Question] dump memory when host pci device is used by guest

2011-10-18 Thread KAMEZAWA Hiroyuki
On Tue, 18 Oct 2011 10:31:10 +0200
Jan Kiszka  wrote:

> On 2011-10-18 10:31, Wen Congyang wrote:
> > At 10/18/2011 04:26 PM, Jan Kiszka wrote:
> >> On 2011-10-18 10:25, Wen Congyang wrote:
> >>> At 10/18/2011 04:19 PM, Jan Kiszka wrote:
>  On 2011-10-18 09:58, Wen Congyang wrote:
> > At 10/18/2011 03:52 PM, Jan Kiszka wrote:
> >> On 2011-10-18 09:15, Wen Congyang wrote:
> >>> Hi, Jan Kiszka
> >>>
> >>> At 10/10/2011 05:34 PM, Jan Kiszka wrote:
>  On 2011-10-10 11:02, Daniel P. Berrange wrote:
> > On Mon, Oct 10, 2011 at 08:52:08AM +0200, Jan Kiszka wrote:
> >>>
> 
>  Run gdb with "set debug remote 1" and watch the communication, it is
>  not that complex. But a dump command is probably simpler for those
>  scenarios, I agree.
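
For reference, a minimal way to watch that communication, assuming the guest
was started with QEMU's built-in gdbstub (the -s option, which listens on
tcp port 1234) and a vmlinux with debug info is at hand:

  $ gdb vmlinux
  (gdb) set debug remote 1            # trace the remote protocol packets
  (gdb) target remote localhost:1234
  (gdb) info registers
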
> >>>
> >>> I have implemented the dump command and reused the migration code.
> >>> But I met a problem when I tested it.
> >>
> >> Using migration code for dump is most probably the wrong approach, as
> >> you saw through that conflict. All you need are the register states
> >> and the RAM. Reuse gdbstub services for this.
> >
> > Hmm, if the migration code cannot be reused, I think we should define
> > a new vmcore format for qemu, and add some code to crash to support
> > that format.
> 
>  Please try to avoid defining something new. Unless there is a striking
>  reason, standard gdb core files should be generated so that you can load
>  the dump directly into gdb for analysis.
> >>>
> >>> I am not sure whether the standard gdb core files can be analyzed by
> >>> crash. If not, I think we should define something new, because it's
> >>> easier to use crash than gdb to analyze the core files.
> >>
> >> gdb allows you to walk up the frame and print variables (globals &
> >> locals), etc.
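
For example, once a standard core file exists, that kind of analysis works
in plain gdb, assuming the guest kernel was built with debug info (the file
names below are placeholders):

  $ gdb vmlinux guest.core
  (gdb) bt                 # walk up the stack frames
  (gdb) frame 2
  (gdb) info locals
  (gdb) print jiffies      # print a global variable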
> > 
> > Crash uses gdb to provide common functionality, and you can also use
> > all the gdb commands in crash.
> 
> Then what's the added value here when I can use gdb directly?
> 

I didn't read the full story, but 'crash' has been used for several years to
investigate kernel cores generated by kdump. Considering the support service
guys, virsh dump should support a format for crash, because they can't work
well investigating a vmcore with gdb.

crash has several functions useful for them, such as 'show kernel log',
'focus on a cpu', 'for-each-task', 'for-each-vma', 'extract ftrace log', etc.
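
For example, a typical crash session against a kdump vmcore looks roughly
like this (file names and the pid are placeholders):

  $ crash vmlinux vmcore
  crash> log            # kernel log buffer
  crash> ps             # list all tasks
  crash> bt -a          # backtrace of the active task on every cpu
  crash> foreach bt     # backtrace of every task
  crash> vm <pid>       # vma list of a given task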

Anyway, if someone who is not a developer of qemu/kvm has to learn two tools
for investigating a kernel dump, that sounds harmful.

Thanks,
-Kame


Re: [Qemu-devel] [PATCH] stop the iteration when too many pages is transferred

2010-11-21 Thread KAMEZAWA Hiroyuki
On Fri, 19 Nov 2010 20:23:55 -0600
Anthony Liguori  wrote:

> On 11/17/2010 08:32 PM, Wen Congyang wrote:
> > When the total size of the sent pages is larger than max_factor
> > times the size of the guest OS's memory, stop the
> > iteration.
> > The default value of max_factor is 3.
> >
> > This is similar to XEN.
> >
> >
> > Signed-off-by: Wen Congyang
> >   
> 
> I'm strongly opposed to doing this. I think Xen gets this totally wrong.
> 
> Migration is a contract. When you set the stop time, you're saying that
> you only want the guest to experience a fixed amount of downtime.
> Stopping the guest after some arbitrary number of iterations makes the
> downtime non-deterministic. With a very large guest, this could wreak
> havoc, causing dropped networking connections, etc.
> 
> It's totally unsafe.
> 
> If a management tool wants this behavior, they can set a timeout and
> explicitly stop the guest during the live migration. IMHO, such a
> management tool is not doing its job properly, but it can still be
> implemented.
> 
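
For instance, roughly, via the monitor (the downtime value, destination host
and port are just placeholders):

  (qemu) migrate_set_downtime 0.1
  (qemu) migrate -d tcp:dest-host:4444
  ... the management tool waits for its own timeout ...
  (qemu) stop              # no new dirty pages, so the migration converges
  (qemu) info migrate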

Hmm, is there any information available for management tools saying something
like "the reason the migration failed is that it never converges because of
new dirty pages"?

I'd be glad to know, before stopping the machine, that cold migration will
succeed at a high rate even when live migration failed by timeout. If "the
network" or "the target node is too busy" is the reason for the failure, cold
migration will also be in trouble and we'll see a longer downtime than
expected.
I think it's helpful to show how the transfer went, as in "sent 3x the
guest's pages but failed."
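
Purely as an illustration of the kind of output that would help (I'm not
claiming 'info migrate' prints this today), something like:

  (qemu) info migrate
  Migration status: failed
  transferred ram: 3145728 kbytes   <- ~3x the guest's RAM was resent
  remaining ram: 262144 kbytes
  total ram: 1048576 kbytes

would let a management tool (or a human) judge whether a cold migration is
likely to go through.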

Any ideas?

Thanks,
-Kame