Re: [GIT PULL] x86/mm changes for v3.9-rc1
Message-ID: <1b89b5cf-4ad4-4c25-ab76-a8ac6910c...@email.android.com>

Again... you probably want to check into Dave's debug changes first. Makes more sense.

Yinghai Lu wrote:

>On Fri, Feb 22, 2013 at 10:06 AM, Stefano Stabellini wrote:
>> On Fri, 22 Feb 2013, Konrad Rzeszutek Wilk wrote:
>>> On Fri, Feb 22, 2013 at 09:12:57AM -0800, H. Peter Anvin wrote:
>>> > On 02/22/2013 08:55 AM, Konrad Rzeszutek Wilk wrote:
>>> > >
>>> > >What is bizarre is that I do recall testing this (and Stefano also did it).
>>> > >So I am not sure what has altered.
>>> > >
>>> >
>>> > Yes, there was a very specific reason why I wanted you guys to test it...
>>>
>>> Exactly. And I re-ran the same test, but with a new kernel. This is what
>>> git reflog tells me:
>>>
>>> 473cd24 HEAD@{75}: checkout: moving from 08f321ed97353cf3b3fafa6b1c1971d6a8970830 to linux-next
>>> 08f321e HEAD@{76}: checkout: moving from linux-next to yinghai/for-x86-mm
>>> eb827a7 HEAD@{77}: checkout: moving from 1b66ccf15ff4bd0200567e8d70446a8763f96ee7 to linux-next
>>> [konrad@build linux]$ git show 08f321e
>>> commit 08f321ed97353cf3b3fafa6b1c1971d6a8970830
>>> Author: Yinghai Lu
>>> Date: Thu Nov 8 00:00:19 2012 -0800
>>>
>>>     mm: Kill NO_BOOTMEM version free_all_bootmem_node()
>>>
>>> And I recall Stefano later on testing (I was in a conference and did not have
>>> the opportunity to test it). Not sure what he ran with.
>>>
>>
>> FYI the last patch series I tested was Yinghai's "x86, boot, 64bit: Add
>> support for loading ramdisk and bzImage above 4G" v7u1.
>
>the one in tip and linus's tree is
>---
>-v7u2: update changelog and comments, and clear more fields for sentinel.
>       Update swiotlb autoswitch off patch.
>       Fix crash with xen PV guest with 2G.
>---
>
>and it fixes xen crash that you reported with v7u1, and you tested that add-on patch
>fix_xen_2g.patch with v7u1.
>and I fold the addon patch into offending patch in v7u2.
>
>Thanks
>
>Yinghai

--
Sent from my mobile phone. Please excuse brevity and lack of formatting.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [GIT PULL] x86/mm changes for v3.9-rc1
> > [0.00] DMI: MSI MS-7680/H61M-P23 (MS-7680), BIOS V17.0 03/14/2011 > > [0.00] e820: update [mem 0x-0x0fff] usable ==> reserved > > [0.00] e820: remove [mem 0x000a-0x000f] usable > > [0.00] No AGP bridge found > > [0.00] e820: last_pfn = 0x23fe00 max_arch_pfn = 0x4 > > [0.00] e820: lacanning 1 areas for low memory corruption > > [0.00] Base memory trampoline at [88098000] 98000 size 24576 > > [0.00] reserving inaccessible SNB gfx pages > > [0.00] init_memory_mapping: [mem 0x-0x000f] > > [0.00] [mem 0x-0x000f] page 4k > > [0.00] init_memory_mapping: [mem 0x1f200-0x1f20c3fff] > > [0.00] [mem 0x1f200-0x1f20c3fff] page 4k > > [0.00] BRK [0x01cd2000, 0x01cd2fff] PGTABLE > > [0.00] BRK [0x01cd3000, 0x01cd3fff] PGTABLE > > [0.00] init_memory_mapping: [mem 0x1f000-0x1f1ff] > > [0.00] [mem 0x1f000-0x1f1ff] page 4k > > [0.00] BRK [0x01cd4000, 0x01cd4fff] PGTABLE > > [0.00] BRK [0x01cd5000, 0x01cd5fff] PGTABLE > > [0.00] BRK [0x01cd6000, 0x01cd6fff] PGTABLE > > [0.00] init_memory_mapping: [mem 0x18000-0x1efff] > > [0.00] [mem 0x18000-0x1efff] page 4k > > [0.00] init_memory_mapping: [mem 0x0010-0x1fff] > > [0.00] [mem 0x0010-0x1fff] page 4k > > [0.00] init_memory_mapping: [mem 0x2020-0x3fff] > > [0.00] [mem 0x2020-0x3fff] page 4k > > [0.00] init_memory_mapping: [mem 0x4020-0xbad7] > > [0.00] [mem 0x4020-0xbad7] page 4k > > [0.00] init_memory_mapping: [mem 0xbadf4000-0xbadf5fff] > > [0.00] [mem 0xbadf4000-0xbadf5fff] page 4k > > [0.00] init_memory_mapping: [mem 0xbae7f000-0xbaff] > > [0.00] [mem 0xbae7f000-0xbaff] page 4k > > [0.00] init_memory_mapping: [mem 0x1-0x17fff] > > [0.00] [mem 0x1-0x17fff] page 4k > > [0.00] init_memory_mapping: [mem 0x1f20c4000-0x23fdf] > > [0.00] [mem 0x1f20c4000-0x23fdf] page 4k > > so init_memory_mapping are all done. Not so. 
> > > (XEN) d0:v0: unhandled page fault (ec=) > > (XEN) Pagetable walk from ea05b2d0: > > (XEN) L4[0x1d4] = > > (XEN) domain_crash_sync called from entry.S > > (XEN) Domain 0 (vcpu#0) crashed on cpu#0: > > (XEN) [ Xen-4.1.5-pre x86_64 debug=y Tainted:C ] > > (XEN) CPU:0 > > (XEN) RIP:e033:[] > > (XEN) RFLAGS: 0206 EM: 1 CONTEXT: pv guest > > (XEN) rax: ea00 rbx: 01a0c000 rcx: 8000 > > (XEN) rdx: 0005b2a0 rsi: 01a0c000 rdi: > > (XEN) rbp: 81a01dd8 rsp: 81a01d90 r8: > > (XEN) r9: 1001 r10: 0005 r11: 0010 > > (XEN) r12: r13: 0200 r14: > > (XEN) r15: 0010 cr0: 8005003b cr4: 26f0 > > (XEN) cr3: 000221a0c000 cr2: ea05b2d0 > > (XEN) ds: es: fs: gs: ss: e02b cs: e033 > > (XEN) Guest stack trace from rsp=81a01d90: > > (XEN)8000 0010 8103feba > > (XEN)0001e030 00010006 81a01dd8 e02b > > (XEN) 81a01e08 81042d27 00023fe0 > > (XEN)0001f20c4000 0200 0001acac7000 81a01e48 > > (XEN)81ad2d21 0028 40004000 > > (XEN) 81a01ed8 > > (XEN)81ac293f 81b46900 > > (XEN) 81a01f00 8165fbd1 0010 > > (XEN)81a01ee8 81a01ea8 81a01ec8 > > (XEN) 81b46900 > > (XEN) 81a01f28 81abcd62 96062000 > > (XEN)81cc6000 81ccd000 81b4f2e0 > > (XEN) 81a01f38 > > (XEN)81abc5f7 81a01ff8 81abf0c7 03010032 > > (XEN)0005 > > (XEN) > > (XEN) > > (XEN) > > (XEN) 819822831fc9cbf5 000206a700100800 0001 > > (XEN) 0f0060c0c748 c305 > > (XEN) Domain 0 crashed: re
Re: [GIT PULL] x86/mm changes for v3.9-rc1
On Fri, Feb 22, 2013 at 10:06 AM, Stefano Stabellini wrote:
> On Fri, 22 Feb 2013, Konrad Rzeszutek Wilk wrote:
>> On Fri, Feb 22, 2013 at 09:12:57AM -0800, H. Peter Anvin wrote:
>> > On 02/22/2013 08:55 AM, Konrad Rzeszutek Wilk wrote:
>> > >
>> > >What is bizarre is that I do recall testing this (and Stefano also did it).
>> > >So I am not sure what has altered.
>> > >
>> >
>> > Yes, there was a very specific reason why I wanted you guys to test it...
>>
>> Exactly. And I re-ran the same test, but with a new kernel. This is what
>> git reflog tells me:
>>
>> 473cd24 HEAD@{75}: checkout: moving from 08f321ed97353cf3b3fafa6b1c1971d6a8970830 to linux-next
>> 08f321e HEAD@{76}: checkout: moving from linux-next to yinghai/for-x86-mm
>> eb827a7 HEAD@{77}: checkout: moving from 1b66ccf15ff4bd0200567e8d70446a8763f96ee7 to linux-next
>> [konrad@build linux]$ git show 08f321e
>> commit 08f321ed97353cf3b3fafa6b1c1971d6a8970830
>> Author: Yinghai Lu
>> Date: Thu Nov 8 00:00:19 2012 -0800
>>
>>     mm: Kill NO_BOOTMEM version free_all_bootmem_node()
>>
>> And I recall Stefano later on testing (I was in a conference and did not have
>> the opportunity to test it). Not sure what he ran with.
>>
>
> FYI the last patch series I tested was Yinghai's "x86, boot, 64bit: Add
> support for loading ramdisk and bzImage above 4G" v7u1.

the one in tip and linus's tree is
---
-v7u2: update changelog and comments, and clear more fields for sentinel.
       Update swiotlb autoswitch off patch.
       Fix crash with xen PV guest with 2G.
---

and it fixes xen crash that you reported with v7u1, and you tested that add-on patch
fix_xen_2g.patch with v7u1.
and I fold the addon patch into offending patch in v7u2.

Thanks

Yinghai
Re: [GIT PULL] x86/mm changes for v3.9-rc1
On Fri, 22 Feb 2013, Konrad Rzeszutek Wilk wrote:
> On Fri, Feb 22, 2013 at 09:12:57AM -0800, H. Peter Anvin wrote:
> > On 02/22/2013 08:55 AM, Konrad Rzeszutek Wilk wrote:
> > >
> > >What is bizarre is that I do recall testing this (and Stefano also did it).
> > >So I am not sure what has altered.
> > >
> >
> > Yes, there was a very specific reason why I wanted you guys to test it...
>
> Exactly. And I re-ran the same test, but with a new kernel. This is what
> git reflog tells me:
>
> 473cd24 HEAD@{75}: checkout: moving from 08f321ed97353cf3b3fafa6b1c1971d6a8970830 to linux-next
> 08f321e HEAD@{76}: checkout: moving from linux-next to yinghai/for-x86-mm
> eb827a7 HEAD@{77}: checkout: moving from 1b66ccf15ff4bd0200567e8d70446a8763f96ee7 to linux-next
> [konrad@build linux]$ git show 08f321e
> commit 08f321ed97353cf3b3fafa6b1c1971d6a8970830
> Author: Yinghai Lu
> Date: Thu Nov 8 00:00:19 2012 -0800
>
>     mm: Kill NO_BOOTMEM version free_all_bootmem_node()
>
> And I recall Stefano later on testing (I was in a conference and did not have
> the opportunity to test it). Not sure what he ran with.
>

FYI the last patch series I tested was Yinghai's "x86, boot, 64bit: Add
support for loading ramdisk and bzImage above 4G" v7u1.
Re: [GIT PULL] x86/mm changes for v3.9-rc1
On Fri, Feb 22, 2013 at 9:38 AM, Konrad Rzeszutek Wilk wrote:
> On Fri, Feb 22, 2013 at 09:12:57AM -0800, H. Peter Anvin wrote:
>> On 02/22/2013 08:55 AM, Konrad Rzeszutek Wilk wrote:
>> >
>> >What is bizarre is that I do recall testing this (and Stefano also did it).
>> >So I am not sure what has altered.
>> >
>>
>> Yes, there was a very specific reason why I wanted you guys to test it...
>
> Exactly. And I re-ran the same test, but with a new kernel. This is what
> git reflog tells me:
>
> 473cd24 HEAD@{75}: checkout: moving from 08f321ed97353cf3b3fafa6b1c1971d6a8970830 to linux-next
> 08f321e HEAD@{76}: checkout: moving from linux-next to yinghai/for-x86-mm
> eb827a7 HEAD@{77}: checkout: moving from 1b66ccf15ff4bd0200567e8d70446a8763f96ee7 to linux-next
> [konrad@build linux]$ git show 08f321e
> commit 08f321ed97353cf3b3fafa6b1c1971d6a8970830
> Author: Yinghai Lu
> Date: Thu Nov 8 00:00:19 2012 -0800
>
>     mm: Kill NO_BOOTMEM version free_all_bootmem_node()
>
> And I recall Stefano later on testing (I was in a conference and did not have
> the opportunity to test it). Not sure what he ran with.

The commit in tip and Linus's tree has a different hash...

commit 600cc5b7f6371706679490d7ee108015ae57ac2f
Author: Yinghai Lu
Date: Fri Nov 16 19:39:22 2012 -0800

    mm: Kill NO_BOOTMEM version free_all_bootmem_node()

    Now NO_BOOTMEM version free_all_bootmem_node() does not really
    do free_bootmem at all, and it only call
    register_page_bootmem_info_node for online nodes instead.
Re: [GIT PULL] x86/mm changes for v3.9-rc1
On Fri, Feb 22, 2013 at 9:24 AM, Konrad Rzeszutek Wilk wrote: > On Fri, Feb 22, 2013 at 11:55:31AM -0500, Konrad Rzeszutek Wilk wrote: >> On Thu, Feb 21, 2013 at 04:34:06PM -0800, H. Peter Anvin wrote: >> > Hi Linus, >> > >> > This is a huge set of several partly interrelated (and concurrently >> > developed) changes, which is why the branch history is messier than >> > one would like. >> > >> > The *really* big items are two humonguous patchsets mostly developed >> > by Yinghai Lu at my request, which completely revamps the way we >> > create initial page tables. In particular, rather than estimating how >> > much memory we will need for page tables and then build them into that >> > memory -- a calculation that has shown to be incredibly fragile -- we >> > now build them (on 64 bits) with the aid of a "pseudo-linear mode" -- >> > a #PF handler which creates temporary page tables on demand. >> > >> > This has several advantages: >> > >> > 1. It makes it much easier to support things that need access to >> >data very early (a followon patchset uses this to load microcode >> >way early in the kernel startup). >> > >> > 2. It allows the kernel and all the kernel data objects to be invoked >> >from above the 4 GB limit. This allows kdump to work on very large >> >systems. >> > >> > 3. It greatly reduces the difference between Xen and native (Xen's >> >equivalent of the #PF handler are the temporary page tables created >> >by the domain builder), eliminating a bunch of fragile hooks. >> > >> > The patch series also gets us a bit closer to W^X. >> > >> > Additional work in this pull is the 64-bit get_user() work which you >> > were also involved with, and a bunch of cleanups/speedups to >> > __phys_addr()/__pa(). 
>> >> Looking at figuring out which of the patches in the branch did this, but >> with this merge I am getting a crash with a very simple PV guest (booted with >> one 1G): >> >> Call Trace: >> [] xen_get_user_pgd+0x5a <-- >> [] xen_get_user_pgd+0x5a >> [] xen_write_cr3+0x77 >> [] init_mem_mapping+0x1f9 >> [] setup_arch+0x742 >> [] printk+0x48 >> [] start_kernel+0x90 >> [] __add_preferred_console.clone.1+0x9b >> [] x86_64_start_reservations+0x2a >> [] xen_start_kernel+0x564 >> >> And the hypervisor says: >> (XEN) d7:v0: unhandled page fault (ec=) >> (XEN) Pagetable walk from ea05b2d0: >> (XEN) L4[0x1d4] = >> (XEN) domain_crash_sync called from entry.S >> (XEN) Domain 7 (vcpu#0) crashed on cpu#3: >> (XEN) [ Xen-4.2.0 x86_64 debug=n Not tainted ] >> (XEN) CPU:3 >> (XEN) RIP:e033:[] >> (XEN) RFLAGS: 0206 EM: 1 CONTEXT: pv guest >> (XEN) rax: ea00 rbx: 01a0c000 rcx: 8000 >> (XEN) rdx: 0005b2a0 rsi: 01a0c000 rdi: >> (XEN) rbp: 81a01dd8 rsp: 81a01d90 r8: >> (XEN) r9: 1001 r10: r11: >> (XEN) r12: r13: 0010 r14: >> (XEN) r15: 0010 cr0: 8005003b cr4: 000406f0 >> (XEN) cr3: 000411165000 cr2: ea05b2d0 >> (XEN) ds: es: fs: gs: ss: e02b cs: e033 >> (XEN) Guest stack trace from rsp=81a01d90: >> (XEN)8000 8103feba >> (XEN)0001e030 00010006 81a01dd8 e02b > > Here is a better serial log of the crash (just booting a normal Xen 4.1 + > initial > kernel with 8GB): > > PXELINUX 3.82 2009-06-09 Copyright (C) 1994-2009 H. Peter Anvin et al > boot: > Loading xen.gz... ok > Loading vmlinuz... ok > Loading initramfs.cpio.gz... ok > __ ___ __ > \ \/ /___ _ __ | || | / | | ___|_ __ _ __ ___ > \ // _ \ '_(_)_(_)/ | .__/|_| \___| >|_| > (XEN) Xen version 4.1.5-pre (kon...@dumpdata.com) (gcc version 4.4.4 20100503 > (Red Hat 4.4.4-2) (GCC) ) Fri Feb 22 11:37:00 EST 2013 > (XEN) Latest ChangeSet: Fri Feb 15 15:31:55 2013 +0100 23459:9f12bdd6b7f0 > (XEN) Console output is synchronous. 
> (XEN) Bootloader: unknown > (XEN) Command line: cpuinfo conring_size=1048576 sync_console cpufreq=verbose > com1=115200,8n1 console=com1,vga loglvl=all guest_loglvl=all > (XEN) Video information: > (XEN) VGA is text mode 80x25, font 8x16 > (XEN) VBE/DDC methods: none; EDID transfer time: 0 seconds > (XEN) EDID info not retrieved because no DDC retrieval method detected > (XEN) Disc information: > (XEN) Found 1 MBR signatures > (XEN) Found 1 EDD information structures > (XEN) Xen-e820 RAM map: > (XEN) - 0009ec00 (usable) > (XEN) 0009ec00 - 000a (reserved) > (XEN) 000e - 0010 (reserved) > (XEN) 0010 - 2000 (usable) > (XEN) 2000 - 202
Re: [GIT PULL] x86/mm changes for v3.9-rc1
On 02/22/2013 08:55 AM, Konrad Rzeszutek Wilk wrote:
> On Thu, Feb 21, 2013 at 04:34:06PM -0800, H. Peter Anvin wrote:
>> Hi Linus,
>>
>> This is a huge set of several partly interrelated (and concurrently
>> developed) changes, which is why the branch history is messier than
>> one would like.
>>
>> The *really* big items are two humongous patchsets mostly developed
>> by Yinghai Lu at my request, which completely revamps the way we
>> create initial page tables. In particular, rather than estimating how
>> much memory we will need for page tables and then build them into that
>> memory -- a calculation that has shown to be incredibly fragile -- we
>> now build them (on 64 bits) with the aid of a "pseudo-linear mode" --
>> a #PF handler which creates temporary page tables on demand.
>>
>> This has several advantages:
>>
>> 1. It makes it much easier to support things that need access to
>>    data very early (a followon patchset uses this to load microcode
>>    way early in the kernel startup).
>>
>> 2. It allows the kernel and all the kernel data objects to be invoked
>>    from above the 4 GB limit. This allows kdump to work on very large
>>    systems.
>>
>> 3. It greatly reduces the difference between Xen and native (Xen's
>>    equivalent of the #PF handler are the temporary page tables created
>>    by the domain builder), eliminating a bunch of fragile hooks.
>>
>> The patch series also gets us a bit closer to W^X.
>>
>> Additional work in this pull is the 64-bit get_user() work which you
>> were also involved with, and a bunch of cleanups/speedups to
>> __phys_addr()/__pa().
>
> Looking at figuring out which of the patches in the branch did this, but
> with this merge I am getting a crash with a very simple PV guest (booted with
> one 1G):
>
> Call Trace:
>  [] xen_get_user_pgd+0x5a <--
>  [] xen_get_user_pgd+0x5a
>  [] xen_write_cr3+0x77
>  [] init_mem_mapping+0x1f9
>  [] setup_arch+0x742
>  [] printk+0x48
>  [] start_kernel+0x90
>  [] __add_preferred_console.clone.1+0x9b
>  [] x86_64_start_reservations+0x2a
>  [] xen_start_kernel+0x564

Do you have CONFIG_DEBUG_VIRTUAL on?

You're probably hitting the new BUG_ON() in __phys_addr(). It's
intended to detect places where someone is doing a __pa()/__phys_addr()
on an address that's outside the kernel's identity mapping.

There are a lot of __pa() calls around there, but from the looks of it,
it's this code:

static pgd_t *xen_get_user_pgd(pgd_t *pgd)
{
...
        if (offset < pgd_index(USER_LIMIT)) {
                struct page *page = virt_to_page(pgd_page);

I'm a bit fuzzy on exactly what the code is trying to do here. It could
mean either that the identity mapping isn't set up enough yet, or that
__pa() is getting called on a bogus address.

I'm especially fuzzy on why we'd be calling anything that's looking at
userspace pagetables (xen_get_user_pgd() ??) this early in boot.
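To make the kind of check Dave describes more concrete, here is a minimal userspace sketch of a range-checked virtual-to-physical translation. It is not the kernel's __phys_addr(); the function name model_phys_addr(), the MODEL_* constants (direct-map base, kernel-text base, image size, physical address limit) and the choice of phys_base = 0 are all assumptions made for illustration, and the function returns false where a CONFIG_DEBUG_VIRTUAL kernel would BUG.

```c
#include <stdbool.h>
#include <stdint.h>

/* All constants below are assumptions for this sketch, not values
 * taken from the thread. */
#define MODEL_PAGE_OFFSET       0xffff880000000000ULL /* direct-map base */
#define MODEL_START_KERNEL_MAP  0xffffffff80000000ULL /* kernel-text base */
#define MODEL_KERNEL_IMAGE_SIZE (512ULL << 20)        /* 512 MiB */
#define MODEL_MAX_PHYS          (1ULL << 46)          /* physical limit */

/*
 * Translate vaddr to a physical address if it lies in one of the two
 * linearly mapped ranges; return false for anything else. phys_base is
 * assumed to be 0, so the kernel-text range maps from physical 0.
 */
static bool model_phys_addr(uint64_t vaddr, uint64_t *paddr)
{
    if (vaddr >= MODEL_START_KERNEL_MAP) {
        uint64_t off = vaddr - MODEL_START_KERNEL_MAP;

        if (off >= MODEL_KERNEL_IMAGE_SIZE)
            return false;       /* beyond the kernel image mapping */
        *paddr = off;
        return true;
    }
    if (vaddr >= MODEL_PAGE_OFFSET) {
        uint64_t off = vaddr - MODEL_PAGE_OFFSET;

        if (off >= MODEL_MAX_PHYS)
            return false;       /* not a valid physical address */
        *paddr = off;
        return true;
    }
    return false;               /* userspace or otherwise unmapped */
}
```

Under these assumed constants, model_phys_addr(0xffff880000001000, &pa) succeeds with pa == 0x1000, while an address that falls outside both ranges is rejected instead of being silently turned into a meaningless physical address. That is the class of caller bug the debug option is meant to expose.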
Re: [GIT PULL] x86/mm changes for v3.9-rc1
On Fri, Feb 22, 2013 at 09:12:57AM -0800, H. Peter Anvin wrote:
> On 02/22/2013 08:55 AM, Konrad Rzeszutek Wilk wrote:
> >
> >What is bizarre is that I do recall testing this (and Stefano also did it).
> >So I am not sure what has altered.
> >
>
> Yes, there was a very specific reason why I wanted you guys to test it...

Exactly. And I re-ran the same test, but with a new kernel. This is what
git reflog tells me:

473cd24 HEAD@{75}: checkout: moving from 08f321ed97353cf3b3fafa6b1c1971d6a8970830 to linux-next
08f321e HEAD@{76}: checkout: moving from linux-next to yinghai/for-x86-mm
eb827a7 HEAD@{77}: checkout: moving from 1b66ccf15ff4bd0200567e8d70446a8763f96ee7 to linux-next
[konrad@build linux]$ git show 08f321e
commit 08f321ed97353cf3b3fafa6b1c1971d6a8970830
Author: Yinghai Lu
Date: Thu Nov 8 00:00:19 2012 -0800

    mm: Kill NO_BOOTMEM version free_all_bootmem_node()

And I recall Stefano later on testing (I was at a conference and did not have
the opportunity to test it). Not sure what he ran with.
Re: [GIT PULL] x86/mm changes for v3.9-rc1
On 02/22/2013 09:30 AM, Dave Hansen wrote:
>
> Do you have CONFIG_DEBUG_VIRTUAL on?
>
> You're probably hitting the new BUG_ON() in __phys_addr(). It's
> intended to detect places where someone is doing a __pa()/__phys_addr()
> on an address that's outside the kernel's identity mapping.
>
> There are a lot of __pa() calls around there, but from the looks of it,
> it's this code:
>
> static pgd_t *xen_get_user_pgd(pgd_t *pgd)
> {
> ...
>         if (offset < pgd_index(USER_LIMIT)) {
>                 struct page *page = virt_to_page(pgd_page);
>
> I'm a bit fuzzy on exactly what the code is trying to do here. It could
> mean either that the identity mapping isn't set up enough yet, or that
> __pa() is getting called on a bogus address.
>
> I'm especially fuzzy on why we'd be calling anything that's looking at
> userspace pagetables (xen_get_user_pgd() ??) this early in boot.
>

Ah yes, of course. This is unrelated to the early page table setups,
which is why it didn't trip in Konrad's earlier testing.

These debugging bits have already found real bugs in the kernel, and this
might be another.

	-hpa

--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.
Re: [GIT PULL] x86/mm changes for v3.9-rc1
On 02/22/2013 08:22 AM, Linus Torvalds wrote:
>
> Ugh. So I've tried to walk through this, and it's painful. If this
> results in problems, we're going to be *so* screwed. Is it bisectable?
>

I can't tell you for sure that it is bisectable at every point. There
are definite bisection points in there, though, as this is several
pieces of work from two kernel cycles that were independently tested.

> I also don't understand how "early_idt_handler" could *possibly* work.
> In particular, it seems to rely on the trap number being set up in the
> stack frame:
>
>         cmpl $14,72(%rsp)       # Page fault?
>
> but that's not even *true*. Why? Because we export both the
> early_idt_handlers[] array (that sets up the trap number and makes the
> stack frame be reliable) and the single early_idt_handler function
> (that relies on the trap number and the reliable stack frame), AND
> AFAIK WE USE THE LATTER!
>
> See x86_64_start_kernel():
>
>         for (i = 0; i < NUM_EXCEPTION_VECTORS; i++) {
> #ifdef CONFIG_EARLY_PRINTK
>                 set_intr_gate(i, &early_idt_handlers[i]);
> #else
>                 set_intr_gate(i, early_idt_handler);
> #endif
>         }
>
> so unless you have CONFIG_EARLY_PRINTK, the interrupt gate will point
> to that raw early_idt_handler function that doesn't *work* on its own,
> afaik.

This is a (pre-existing!) bug that absolutely needs to be fixed, and one
which ought to break other things too (early use of *msr_safe for example,
or anything else that relies on an early exception entry, of which there
aren't a lot so far). The fix is simple and obvious.

But you're right... what the heck is going on here?

My own testing would probably not have caught this, as I consider
EARLY_PRINTK a must have, but Ingo's test machines definitely would have.

> Btw, it's not just the page fault index testing that is wrong. The whole
>
>         cmpl $__KERNEL_CS,96(%rsp)
>         jne 11f
>
> also relies on the stack frame being set up the same way for all
> exceptions - which again is only true if we ran through the
> early_idt_handlers[] prologue that added the extra stack entry.
>
> How does this even work for me? I don't have EARLY_PRINTK enabled.
> What am I missing?

I just ran a simulation without EARLY_PRINTK. Presumably depending on the
memory layout, we can apparently go through the entire bootup sequence
without actually ever taking an early trap. It is a bug, though, and it
is a bug even without this patchset. I will submit a fix.

However, the Xen "we tested this, this worked, now it doesn't" worries
me a lot.

	-hpa

--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.
Re: [GIT PULL] x86/mm changes for v3.9-rc1
On 02/22/2013 09:24 AM, Konrad Rzeszutek Wilk wrote:
>
> Here is a better serial log of the crash (just booting a normal Xen 4.1 + initial
> kernel with 8GB):
>

Configuration, please, especially: is early_printk compiled in?

Also, since this is Xen-related we really need your help on this. A lot
of this is not going to be meaningful to non-Xen people.

	-hpa

--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.
Re: [GIT PULL] x86/mm changes for v3.9-rc1
On Fri, Feb 22, 2013 at 11:55:31AM -0500, Konrad Rzeszutek Wilk wrote: > On Thu, Feb 21, 2013 at 04:34:06PM -0800, H. Peter Anvin wrote: > > Hi Linus, > > > > This is a huge set of several partly interrelated (and concurrently > > developed) changes, which is why the branch history is messier than > > one would like. > > > > The *really* big items are two humonguous patchsets mostly developed > > by Yinghai Lu at my request, which completely revamps the way we > > create initial page tables. In particular, rather than estimating how > > much memory we will need for page tables and then build them into that > > memory -- a calculation that has shown to be incredibly fragile -- we > > now build them (on 64 bits) with the aid of a "pseudo-linear mode" -- > > a #PF handler which creates temporary page tables on demand. > > > > This has several advantages: > > > > 1. It makes it much easier to support things that need access to > >data very early (a followon patchset uses this to load microcode > >way early in the kernel startup). > > > > 2. It allows the kernel and all the kernel data objects to be invoked > >from above the 4 GB limit. This allows kdump to work on very large > >systems. > > > > 3. It greatly reduces the difference between Xen and native (Xen's > >equivalent of the #PF handler are the temporary page tables created > >by the domain builder), eliminating a bunch of fragile hooks. > > > > The patch series also gets us a bit closer to W^X. > > > > Additional work in this pull is the 64-bit get_user() work which you > > were also involved with, and a bunch of cleanups/speedups to > > __phys_addr()/__pa(). 
> > Looking at figuring out which of the patches in the branch did this, but > with this merge I am getting a crash with a very simple PV guest (booted with > one 1G): > > Call Trace: > [] xen_get_user_pgd+0x5a <-- > [] xen_get_user_pgd+0x5a > [] xen_write_cr3+0x77 > [] init_mem_mapping+0x1f9 > [] setup_arch+0x742 > [] printk+0x48 > [] start_kernel+0x90 > [] __add_preferred_console.clone.1+0x9b > [] x86_64_start_reservations+0x2a > [] xen_start_kernel+0x564 > > And the hypervisor says: > (XEN) d7:v0: unhandled page fault (ec=) > (XEN) Pagetable walk from ea05b2d0: > (XEN) L4[0x1d4] = > (XEN) domain_crash_sync called from entry.S > (XEN) Domain 7 (vcpu#0) crashed on cpu#3: > (XEN) [ Xen-4.2.0 x86_64 debug=n Not tainted ] > (XEN) CPU:3 > (XEN) RIP:e033:[] > (XEN) RFLAGS: 0206 EM: 1 CONTEXT: pv guest > (XEN) rax: ea00 rbx: 01a0c000 rcx: 8000 > (XEN) rdx: 0005b2a0 rsi: 01a0c000 rdi: > (XEN) rbp: 81a01dd8 rsp: 81a01d90 r8: > (XEN) r9: 1001 r10: r11: > (XEN) r12: r13: 0010 r14: > (XEN) r15: 0010 cr0: 8005003b cr4: 000406f0 > (XEN) cr3: 000411165000 cr2: ea05b2d0 > (XEN) ds: es: fs: gs: ss: e02b cs: e033 > (XEN) Guest stack trace from rsp=81a01d90: > (XEN)8000 8103feba > (XEN)0001e030 00010006 81a01dd8 e02b Here is a better serial log of the crash (just booting a normal Xen 4.1 + initial kernel with 8GB): PXELINUX 3.82 2009-06-09 Copyright (C) 1994-2009 H. Peter Anvin et al boot: Loading xen.gz... ok Loading vmlinuz... ok Loading initramfs.cpio.gz... ok __ ___ __ \ \/ /___ _ __ | || | / | | ___|_ __ _ __ ___ \ // _ \ '_(_)_(_)/ | .__/|_| \___| |_| (XEN) Xen version 4.1.5-pre (kon...@dumpdata.com) (gcc version 4.4.4 20100503 (Red Hat 4.4.4-2) (GCC) ) Fri Feb 22 11:37:00 EST 2013 (XEN) Latest ChangeSet: Fri Feb 15 15:31:55 2013 +0100 23459:9f12bdd6b7f0 (XEN) Console output is synchronous. 
(XEN) Bootloader: unknown (XEN) Command line: cpuinfo conring_size=1048576 sync_console cpufreq=verbose com1=115200,8n1 console=com1,vga loglvl=all guest_loglvl=all (XEN) Video information: (XEN) VGA is text mode 80x25, font 8x16 (XEN) VBE/DDC methods: none; EDID transfer time: 0 seconds (XEN) EDID info not retrieved because no DDC retrieval method detected (XEN) Disc information: (XEN) Found 1 MBR signatures (XEN) Found 1 EDD information structures (XEN) Xen-e820 RAM map: (XEN) - 0009ec00 (usable) (XEN) 0009ec00 - 000a (reserved) (XEN) 000e - 0010 (reserved) (XEN) 0010 - 2000 (usable) (XEN) 2000 - 2020 (reserved) (XEN) 2020 - 4000 (usable) (XEN) 4000 - 4020 (reserved) (XEN) 4020 -
Re: [GIT PULL] x86/mm changes for v3.9-rc1
On 02/22/2013 08:55 AM, Konrad Rzeszutek Wilk wrote:
>
> What is bizarre is that I do recall testing this (and Stefano also did it).
> So I am not sure what has altered.
>

Yes, there was a very specific reason why I wanted you guys to test it...

	-hpa

--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.
Re: [GIT PULL] x86/mm changes for v3.9-rc1
On Thu, Feb 21, 2013 at 04:34:06PM -0800, H. Peter Anvin wrote:
> Hi Linus,
>
> This is a huge set of several partly interrelated (and concurrently
> developed) changes, which is why the branch history is messier than
> one would like.
>
> The *really* big items are two humongous patchsets mostly developed
> by Yinghai Lu at my request, which completely revamps the way we
> create initial page tables. In particular, rather than estimating how
> much memory we will need for page tables and then build them into that
> memory -- a calculation that has shown to be incredibly fragile -- we
> now build them (on 64 bits) with the aid of a "pseudo-linear mode" --
> a #PF handler which creates temporary page tables on demand.
>
> This has several advantages:
>
> 1. It makes it much easier to support things that need access to
>    data very early (a followon patchset uses this to load microcode
>    way early in the kernel startup).
>
> 2. It allows the kernel and all the kernel data objects to be invoked
>    from above the 4 GB limit. This allows kdump to work on very large
>    systems.
>
> 3. It greatly reduces the difference between Xen and native (Xen's
>    equivalent of the #PF handler are the temporary page tables created
>    by the domain builder), eliminating a bunch of fragile hooks.
>
> The patch series also gets us a bit closer to W^X.
>
> Additional work in this pull is the 64-bit get_user() work which you
> were also involved with, and a bunch of cleanups/speedups to
> __phys_addr()/__pa().

I am still trying to figure out which of the patches in the branch did this,
but with this merge I am getting a crash with a very simple PV guest (booted
with one 1G):

Call Trace:
 [] xen_get_user_pgd+0x5a <--
 [] xen_get_user_pgd+0x5a
 [] xen_write_cr3+0x77
 [] init_mem_mapping+0x1f9
 [] setup_arch+0x742
 [] printk+0x48
 [] start_kernel+0x90
 [] __add_preferred_console.clone.1+0x9b
 [] x86_64_start_reservations+0x2a
 [] xen_start_kernel+0x564

And the hypervisor says:
(XEN) d7:v0: unhandled page fault (ec=)
(XEN) Pagetable walk from ea05b2d0:
(XEN) L4[0x1d4] =
(XEN) domain_crash_sync called from entry.S
(XEN) Domain 7 (vcpu#0) crashed on cpu#3:
(XEN) [ Xen-4.2.0 x86_64 debug=n Not tainted ]
(XEN) CPU:3
(XEN) RIP:e033:[]
(XEN) RFLAGS: 0206 EM: 1 CONTEXT: pv guest
(XEN) rax: ea00 rbx: 01a0c000 rcx: 8000
(XEN) rdx: 0005b2a0 rsi: 01a0c000 rdi:
(XEN) rbp: 81a01dd8 rsp: 81a01d90 r8:
(XEN) r9: 1001 r10: r11:
(XEN) r12: r13: 0010 r14:
(XEN) r15: 0010 cr0: 8005003b cr4: 000406f0
(XEN) cr3: 000411165000 cr2: ea05b2d0
(XEN) ds: es: fs: gs: ss: e02b cs: e033
(XEN) Guest stack trace from rsp=81a01d90:
(XEN)    8000 8103feba
(XEN)    0001e030 00010006 81a01dd8 e02b

What is bizarre is that I do recall testing this (and Stefano also did it).
So I am not sure what has altered.
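For readers unfamiliar with the "pseudo-linear mode" mentioned in the quoted pull request, the following toy model sketches the general idea of a fault handler that fills in page tables on demand from a small fixed pool. It is only an illustration: the two-level layout, the 2 MiB mapping size, the four-table pool, the reset-when-exhausted policy and every toy_* name are assumptions of this sketch, not details taken from the patch series, and it only handles addresses below 512 GiB.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_ENTRIES     512
#define TOY_POOL_TABLES 4               /* assumed size of the table pool */
#define TOY_PAGE_SHIFT  21              /* model 2 MiB mappings */
#define TOY_PAGE_MASK   ((1ULL << TOY_PAGE_SHIFT) - 1)
#define TOY_PRESENT     1ULL

/* Zero-initialise this struct (e.g. make it static) before first use. */
struct toy_mmu {
    uint64_t *top[TOY_ENTRIES];         /* one slot per 1 GiB region */
    uint64_t pool[TOY_POOL_TABLES][TOY_ENTRIES];
    unsigned next_free;                 /* next unused table in pool */
    unsigned faults;                    /* how many faults were taken */
};

/* Drop every temporary mapping and hand the whole pool back. */
static void toy_reset(struct toy_mmu *m)
{
    memset(m->top, 0, sizeof(m->top));
    memset(m->pool, 0, sizeof(m->pool));
    m->next_free = 0;
}

/* Walk the tables; return 1 and fill *paddr if vaddr is mapped. */
static int toy_lookup(const struct toy_mmu *m, uint64_t vaddr, uint64_t *paddr)
{
    const uint64_t *tbl = m->top[(vaddr >> 30) & (TOY_ENTRIES - 1)];
    uint64_t entry;

    if (!tbl)
        return 0;
    entry = tbl[(vaddr >> TOY_PAGE_SHIFT) & (TOY_ENTRIES - 1)];
    if (!(entry & TOY_PRESENT))
        return 0;
    *paddr = (entry & ~TOY_PAGE_MASK) | (vaddr & TOY_PAGE_MASK);
    return 1;
}

/* "Page fault handler": install an identity mapping covering vaddr. */
static void toy_fault(struct toy_mmu *m, uint64_t vaddr)
{
    size_t i = (vaddr >> 30) & (TOY_ENTRIES - 1);

    m->faults++;
    if (!m->top[i]) {
        if (m->next_free == TOY_POOL_TABLES)
            toy_reset(m);               /* pool exhausted: start over */
        m->top[i] = m->pool[m->next_free++];
    }
    m->top[i][(vaddr >> TOY_PAGE_SHIFT) & (TOY_ENTRIES - 1)] =
        (vaddr & ~TOY_PAGE_MASK) | TOY_PRESENT;
}

/* Access vaddr, faulting the mapping in first if necessary. */
static uint64_t toy_access(struct toy_mmu *m, uint64_t vaddr)
{
    uint64_t paddr = 0;

    if (!toy_lookup(m, vaddr, &paddr)) {
        toy_fault(m, vaddr);
        toy_lookup(m, vaddr, &paddr);
    }
    return paddr;
}
```

The point of the model is that nothing has to estimate in advance how many page-table pages will be needed: a first touch of an address costs one fault, later touches of the same 2 MiB range cost none, and running out of pool space just discards the temporary mappings so they can be rebuilt on the next fault.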
Re: [GIT PULL] x86/mm changes for v3.9-rc1
On Thu, Feb 21, 2013 at 4:34 PM, H. Peter Anvin wrote:
>
> This is a huge set of several partly interrelated (and concurrently
> developed) changes, which is why the branch history is messier than
> one would like.
>
> The *really* big items are two humongous patchsets mostly developed
> by Yinghai Lu at my request, which completely revamps the way we
> create initial page tables.

Ugh. So I've tried to walk through this, and it's painful. If this
results in problems, we're going to be *so* screwed. Is it bisectable?

I also don't understand how "early_idt_handler" could *possibly* work.
In particular, it seems to rely on the trap number being set up in the
stack frame:

        cmpl $14,72(%rsp)       # Page fault?

but that's not even *true*. Why? Because we export both the
early_idt_handlers[] array (that sets up the trap number and makes the
stack frame be reliable) and the single early_idt_handler function
(that relies on the trap number and the reliable stack frame), AND
AFAIK WE USE THE LATTER!

See x86_64_start_kernel():

        for (i = 0; i < NUM_EXCEPTION_VECTORS; i++) {
#ifdef CONFIG_EARLY_PRINTK
                set_intr_gate(i, &early_idt_handlers[i]);
#else
                set_intr_gate(i, early_idt_handler);
#endif
        }

so unless you have CONFIG_EARLY_PRINTK, the interrupt gate will point
to that raw early_idt_handler function that doesn't *work* on its own,
afaik.

Btw, it's not just the page fault index testing that is wrong. The whole

        cmpl $__KERNEL_CS,96(%rsp)
        jne 11f

also relies on the stack frame being set up the same way for all
exceptions - which again is only true if we ran through the
early_idt_handlers[] prologue that added the extra stack entry.

How does this even work for me? I don't have EARLY_PRINTK enabled.
What am I missing?

                 Linus
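The stack-frame dependency Linus points out can be modelled in a few lines of userspace C. This is a sketch under stated assumptions, not the kernel's code: it covers only a page fault (where the CPU pushes an error code), it assumes nine 8-byte registers are saved before the comparisons (inferred from the 72(%rsp) offset), and build_frame(), read_at(), struct hw_frame and SAVED_REGS are hypothetical names introduced here.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: nine 8-byte registers are saved before the checks, which
 * is what would place the first interesting slot at 72(%rsp). */
enum { SAVED_REGS = 9 };

/* What the CPU pushes for a 64-bit #PF, lowest address first. */
struct hw_frame {
    uint64_t error_code;
    uint64_t rip, cs, rflags, rsp, ss;
};

/*
 * Lay out the stack as the common handler body would see it, with
 * slot 0 at (%rsp). via_stub != 0 models entry through a per-vector
 * stub that pushes the vector number first; via_stub == 0 models a
 * gate that points straight at the common handler.
 * Returns the number of slots written (at most 16).
 */
static size_t build_frame(uint64_t *slots, int via_stub, uint64_t vector,
                          const struct hw_frame *hw)
{
    size_t n = 0;

    for (int i = 0; i < SAVED_REGS; i++)
        slots[n++] = 0;         /* saved general-purpose registers */
    if (via_stub)
        slots[n++] = vector;    /* the extra entry added by the stub */
    slots[n++] = hw->error_code;
    slots[n++] = hw->rip;
    slots[n++] = hw->cs;
    slots[n++] = hw->rflags;
    slots[n++] = hw->rsp;
    slots[n++] = hw->ss;
    return n;
}

/* Read the 8-byte slot at byte offset off from the modelled %rsp. */
static uint64_t read_at(const uint64_t *slots, size_t off)
{
    return slots[off / 8];
}
```

With the stub, read_at(slots, 72) yields the vector number and read_at(slots, 96) yields the saved CS, which is what the two cmpl instructions expect. Without it, every slot sits 8 bytes lower, so in this model offset 72 holds the error code and offset 96 holds RFLAGS, and both comparisons test the wrong values.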