At 12/15/2011 09:30 AM, HATAYAMA Daisuke Write:
> From: Wen Congyang <we...@cn.fujitsu.com>
> Subject: Re: [Qemu-devel] [RFC][PATCT 0/5 v2] dump memory when host pci
> device is used by guest
> Date: Tue, 13 Dec 2011 17:20:24 +0800
>
>> At 12/13/2011 02:01 PM, HATAYAMA Daisuke Write:
>>> From: Wen Congyang <we...@cn.fujitsu.com>
>>> Subject: Re: [Qemu-devel] [RFC][PATCT 0/5 v2] dump memory when host pci
>>> device is used by guest
>>> Date: Tue, 13 Dec 2011 11:35:53 +0800
>>>
>>>> Hi, hatayama-san
>>>>
>>>> At 12/13/2011 11:12 AM, HATAYAMA Daisuke Write:
>>>>> Hello Wen,
>>>>>
>>>>> From: Wen Congyang <we...@cn.fujitsu.com>
>>>>> Subject: [Qemu-devel] [RFC][PATCT 0/5 v2] dump memory when host pci
>>>>> device is used by guest
>>>>> Date: Fri, 09 Dec 2011 15:57:26 +0800
>>>>>
>>>>>> Hi, all
>>>>>>
>>>>>> 'virsh dump' cannot work when a host PCI device is used by the guest.
>>>>>> We have discussed this issue here:
>>>>>> http://lists.nongnu.org/archive/html/qemu-devel/2011-10/msg00736.html
>>>>>>
>>>>>> We have decided to introduce a new command, dump, to dump memory. The
>>>>>> core file's format can be ELF.
>>>>>>
>>>>>> Note:
>>>>>> 1. The guest should be x86 or x86_64. Other arches are not supported.
>>>>>> 2. If you use an old gdb, gdb may crash. I use gdb-7.3.1, and it does
>>>>>>    not crash.
>>>>>> 3. If the OS is in the second kernel, gdb may not work well, but crash
>>>>>>    can work by specifying '--machdep phys_base=xxx' on the command
>>>>>>    line. The reason is that the second kernel updates the page table,
>>>>>>    and we cannot get the page table of the first kernel.
>>>>>
>>>>> I guess the current implementation still breaks vmalloc'ed areas that
>>>>> need page tables originally located in the first 640kB, right? If you
>>>>> want to handle this correctly, you need to identify the position of
>>>>> the backup region and read the 1st kernel's page tables from there.
>>>>
>>>> I do not know anything about vmalloc'ed areas. Can you explain them in
>>>> more detail?
>>>>
>>>
>>> It is a memory area that is not straight-mapped. To read the area, it is
>>> necessary to look up the guest machine's page tables. If I understand
>>> correctly, your current implementation translates the vmalloc'ed area so
>>> that the generated vmcore is linearly mapped w.r.t. virtual addresses
>>> for gdb to work.
>>
>> Do you mean the page table for the vmalloc'ed area is stored in the first
>> 640KB, and it may be overwritten by the second kernel (this region has
>> been backed up)?
>>
>
> This might be wrong... I have tried locally to confirm this, but I have
> not managed to yet.
>
> I can confirm that at least pglist_data can be within the first 640kB:
>
> crash> log
> <cut>
> No NUMA configuration found
> Faking a node at 0000000000000000-000000007f800000
> Bootmem setup node 0 0000000000000000-000000007f800000
>   NODE_DATA [0000000000011000 - 0000000000044fff]   <-- this
Only a kernel built with CONFIG_NUMA has this. This config is only enabled
on RHEL for x86_64. I do not have such an environment on hand now.

>   bootmap [0000000000045000 - 0000000000054eff] pages 10
> (7 early reservations) ==> bootmem [0000000000 - 007f800000]
>   #0 [0000000000 - 0000001000]   BIOS data page ==> [0000000000 - 0000001000]
>   #1 [0000006000 - 0000008000]       TRAMPOLINE ==> [0000006000 - 0000008000]
>
> And I once had a vmcore created after entering the 2nd kernel where I
> could not see module data using the mod sub-command; this was resolved by
> re-reading the addresses from the corresponding backup region.
>
> I guess that because crash uses the page table in memory, this affects
> paging badly.
>
> I want to look into this more, but I don't have such a vmcore now because
> I lost them accidentally... I tried to reproduce this several times
> yesterday but did not succeed. The vmcore above is one of them.
>
>>>
>>> kdump saves the first 640kB of physical memory into the backup region.
>>> I guess that, for some vmcores created by the current implementation,
>>> gdb and crash cannot correctly see the vmalloc'ed memory areas that
>>> need page tables
>>
>> Hmm, IIRC, crash does not use the CPU's page table. gdb uses the
>> information in PT_LOAD to read memory areas.
>>
>
> I was confused about this. Your dump command uses the CPU's page table.
>
> So on the qemu side you can get page tables over the whole physical
> address space, right? If so, the contents themselves are not broken, I
> think.
>
>>> placed at the 640kB region correctly. For example, try to use the mod
>>> sub-command. Kernel modules are allocated in the vmalloc'ed area.
>>>
>>> I have developed very similar logic for sadump. Look at sadump.c in
>>> crash. The logic itself is very simple, but debugging information is
>>> necessary. Documentation/kdump/kdump.txt and the following paper explain
>>> the backup region mechanism very well, and the implementation around
>>> there remains the same now.
>>
>> Hmm, we cannot use debugging information on the qemu side.
>>
>
> How about re-reading them later in crash? Users want to see the 1st
> kernel rather than the 2nd kernel.

An easy way to see the 1st kernel is to specify --machdep phys_base=xxx on
the command line.

Thanks
Wen Congyang

>
> To do that, the dump format must be distinguishable by crash. Which
> function in crash reads the vmcores created by this command? kcore, or
> netdump?
>
> Thanks.
> HATAYAMA, Daisuke
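
---

For readers following the thread: the translation being discussed is an
ordinary 4-level x86_64 page-table walk performed over guest physical
memory, so that vmalloc'ed (non-straight-mapped) areas come out linearly
mapped in the generated vmcore. Below is a minimal sketch of such a walk,
not the patch code itself; read_guest_phys() is a hypothetical helper
standing in for something like QEMU's cpu_physical_memory_read(), and
error handling is reduced to a bool return.

/*
 * Sketch: translate a guest virtual address to a guest physical
 * address by walking the guest's 4-level page tables (PML4 -> PDPT
 * -> PD -> PT), starting from the guest's CR3.
 */
#include <stdint.h>
#include <stdbool.h>

#define PTE_P     0x1ULL                  /* present bit */
#define PTE_PS    0x80ULL                 /* page-size bit (1GB/2MB page) */
#define ADDR_MASK 0x000ffffffffff000ULL   /* physical address bits of an entry */

/* Hypothetical helper: read 'len' bytes of guest physical memory. */
extern bool read_guest_phys(uint64_t paddr, void *buf, int len);

static bool virt_to_phys(uint64_t cr3, uint64_t vaddr, uint64_t *paddr)
{
    uint64_t entry = 0;
    uint64_t table = cr3 & ADDR_MASK;
    int shift;

    /* Index bits: PML4 47..39, PDPT 38..30, PD 29..21, PT 20..12. */
    for (shift = 39; shift >= 12; shift -= 9) {
        uint64_t idx = (vaddr >> shift) & 0x1ff;

        if (!read_guest_phys(table + idx * 8, &entry, 8) ||
            !(entry & PTE_P)) {
            return false;                 /* not mapped */
        }
        /* 1GB (shift == 30) or 2MB (shift == 21) large page ends the walk. */
        if ((shift == 30 || shift == 21) && (entry & PTE_PS)) {
            uint64_t page_mask = (1ULL << shift) - 1;
            *paddr = (entry & ADDR_MASK & ~page_mask) | (vaddr & page_mask);
            return true;
        }
        table = entry & ADDR_MASK;
    }
    /* 'entry' now holds the 4kB PTE. */
    *paddr = (entry & ADDR_MASK) | (vaddr & 0xfff);
    return true;
}

With physical addresses obtained this way, the dump command can emit ELF
PT_LOAD segments whose p_vaddr/p_paddr pairs give gdb a linear view of the
guest's virtual address space. A vmcore of the 1st kernel can then be
opened with something like

    crash --machdep phys_base=0x200000 vmlinux vmcore

where the phys_base value shown is only a placeholder; the real value
depends on where the 1st kernel was loaded.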