Re: [GIT PULL] x86/mm changes for v3.9-rc1

2013-02-23 Thread H. Peter Anvin

Message-ID: <1b89b5cf-4ad4-4c25-ab76-a8ac6910c...@email.android.com>

Again... you probably want to check into Dave's debug changes first. Makes more 
sense.

Yinghai Lu  wrote:

>On Fri, Feb 22, 2013 at 10:06 AM, Stefano Stabellini wrote:
>> On Fri, 22 Feb 2013, Konrad Rzeszutek Wilk wrote:
>>> On Fri, Feb 22, 2013 at 09:12:57AM -0800, H. Peter Anvin wrote:
>>> > On 02/22/2013 08:55 AM, Konrad Rzeszutek Wilk wrote:
>>> > >
>>> > >What is bizarre is that I do recall testing this (and Stefano also did it).
>>> > >So I am not sure what has altered.
>>> > >
>>> >
>>> > Yes, there was a very specific reason why I wanted you guys to test it...
>>>
>>> Exactly. And I re-ran the same test, but with a new kernel. This is what
>>> git reflog tells me:
>>>
>>> 473cd24 HEAD@{75}: checkout: moving from 08f321ed97353cf3b3fafa6b1c1971d6a8970830 to linux-next
>>> 08f321e HEAD@{76}: checkout: moving from linux-next to yinghai/for-x86-mm
>>> eb827a7 HEAD@{77}: checkout: moving from 1b66ccf15ff4bd0200567e8d70446a8763f96ee7 to linux-next
>>> [konrad@build linux]$ git show 08f321e
>>> commit 08f321ed97353cf3b3fafa6b1c1971d6a8970830
>>> Author: Yinghai Lu
>>> Date:   Thu Nov 8 00:00:19 2012 -0800
>>>
>>> mm: Kill NO_BOOTMEM version free_all_bootmem_node()
>>>
>>> And I recall Stefano later on testing (I was in a conference and did not have
>>> the opportunity to test it). Not sure what he ran with.
>>>
>>
>> FYI the last patch series I tested was Yinghai's "x86, boot, 64bit: Add
>> support for loading ramdisk and bzImage above 4G" v7u1.
>
>
>the one in tip and linus's tree is
>---
>-v7u2: update changelog and comments, and clear more fields for sentinel.
> Update swiotlb autoswitch off patch.
> Fix crash with xen PV guest with 2G.
>---
>
>and it fixes xen crash that you reported with v7u1, and you tested
>that add-on patch fix_xen_2g.patch with v7u1.
>and I fold the addon patch into offending patch in v7u2.
>
>
>Thanks
>
>Yinghai

-- 
Sent from my mobile phone. Please excuse brevity and lack of formatting.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [GIT PULL] x86/mm changes for v3.9-rc1

2013-02-22 Thread Konrad Rzeszutek Wilk
> > [0.00] DMI: MSI MS-7680/H61M-P23 (MS-7680), BIOS V17.0 03/14/2011
> > [0.00] e820: update [mem 0x-0x0fff] usable ==> reserved
> > [0.00] e820: remove [mem 0x000a-0x000f] usable
> > [0.00] No AGP bridge found
> > [0.00] e820: last_pfn = 0x23fe00 max_arch_pfn = 0x4
> > [0.00] Scanning 1 areas for low memory corruption
> > [0.00] Base memory trampoline at [88098000] 98000 size 24576
> > [0.00] reserving inaccessible SNB gfx pages
> > [0.00] init_memory_mapping: [mem 0x-0x000f]
> > [0.00]  [mem 0x-0x000f] page 4k
> > [0.00] init_memory_mapping: [mem 0x1f200-0x1f20c3fff]
> > [0.00]  [mem 0x1f200-0x1f20c3fff] page 4k
> > [0.00] BRK [0x01cd2000, 0x01cd2fff] PGTABLE
> > [0.00] BRK [0x01cd3000, 0x01cd3fff] PGTABLE
> > [0.00] init_memory_mapping: [mem 0x1f000-0x1f1ff]
> > [0.00]  [mem 0x1f000-0x1f1ff] page 4k
> > [0.00] BRK [0x01cd4000, 0x01cd4fff] PGTABLE
> > [0.00] BRK [0x01cd5000, 0x01cd5fff] PGTABLE
> > [0.00] BRK [0x01cd6000, 0x01cd6fff] PGTABLE
> > [0.00] init_memory_mapping: [mem 0x18000-0x1efff]
> > [0.00]  [mem 0x18000-0x1efff] page 4k
> > [0.00] init_memory_mapping: [mem 0x0010-0x1fff]
> > [0.00]  [mem 0x0010-0x1fff] page 4k
> > [0.00] init_memory_mapping: [mem 0x2020-0x3fff]
> > [0.00]  [mem 0x2020-0x3fff] page 4k
> > [0.00] init_memory_mapping: [mem 0x4020-0xbad7]
> > [0.00]  [mem 0x4020-0xbad7] page 4k
> > [0.00] init_memory_mapping: [mem 0xbadf4000-0xbadf5fff]
> > [0.00]  [mem 0xbadf4000-0xbadf5fff] page 4k
> > [0.00] init_memory_mapping: [mem 0xbae7f000-0xbaff]
> > [0.00]  [mem 0xbae7f000-0xbaff] page 4k
> > [0.00] init_memory_mapping: [mem 0x1-0x17fff]
> > [0.00]  [mem 0x1-0x17fff] page 4k
> > [0.00] init_memory_mapping: [mem 0x1f20c4000-0x23fdf]
> > [0.00]  [mem 0x1f20c4000-0x23fdf] page 4k
> 
> so init_memory_mapping are all done.

Not so.
> 
> > (XEN) d0:v0: unhandled page fault (ec=)
> > (XEN) Pagetable walk from ea05b2d0:
> > (XEN)  L4[0x1d4] =  
> > (XEN) domain_crash_sync called from entry.S
> > (XEN) Domain 0 (vcpu#0) crashed on cpu#0:
> > (XEN) [ Xen-4.1.5-pre  x86_64  debug=y  Tainted:C ]
> > (XEN) CPU:0
> > (XEN) RIP:e033:[]
> > (XEN) RFLAGS: 0206   EM: 1   CONTEXT: pv guest
> > (XEN) rax: ea00   rbx: 01a0c000   rcx: 8000
> > (XEN) rdx: 0005b2a0   rsi: 01a0c000   rdi: 
> > (XEN) rbp: 81a01dd8   rsp: 81a01d90   r8:  
> > (XEN) r9:  1001   r10: 0005   r11: 0010
> > (XEN) r12:    r13: 0200   r14: 
> > (XEN) r15: 0010   cr0: 8005003b   cr4: 26f0
> > (XEN) cr3: 000221a0c000   cr2: ea05b2d0
> > (XEN) ds:    es:    fs:    gs:    ss: e02b   cs: e033
> > (XEN) Guest stack trace from rsp=81a01d90:
> > (XEN)8000 0010  8103feba
> > (XEN)0001e030 00010006 81a01dd8 e02b
> > (XEN) 81a01e08 81042d27 00023fe0
> > (XEN)0001f20c4000 0200 0001acac7000 81a01e48
> > (XEN)81ad2d21  0028 40004000
> > (XEN)   81a01ed8
> > (XEN)81ac293f 81b46900  
> > (XEN) 81a01f00 8165fbd1 0010
> > (XEN)81a01ee8 81a01ea8  81a01ec8
> > (XEN) 81b46900  
> > (XEN) 81a01f28 81abcd62 96062000
> > (XEN)81cc6000 81ccd000 81b4f2e0 
> > (XEN)   81a01f38
> > (XEN)81abc5f7 81a01ff8 81abf0c7 03010032
> > (XEN)0005   
> > (XEN)   
> > (XEN)   
> > (XEN)   
> > (XEN) 819822831fc9cbf5 000206a700100800 0001
> > (XEN)  0f0060c0c748 c305
> > (XEN) Domain 0 crashed: re

Re: [GIT PULL] x86/mm changes for v3.9-rc1

2013-02-22 Thread Yinghai Lu
On Fri, Feb 22, 2013 at 10:06 AM, Stefano Stabellini
 wrote:
> On Fri, 22 Feb 2013, Konrad Rzeszutek Wilk wrote:
>> On Fri, Feb 22, 2013 at 09:12:57AM -0800, H. Peter Anvin wrote:
>> > On 02/22/2013 08:55 AM, Konrad Rzeszutek Wilk wrote:
>> > >
>> > >What is bizarre is that I do recall testing this (and Stefano also did it).
>> > >So I am not sure what has altered.
>> > >
>> >
>> > Yes, there was a very specific reason why I wanted you guys to test it...
>>
>> Exactly. And I re-ran the same test, but with a new kernel. This is what
>> git reflog tells me:
>>
>> 473cd24 HEAD@{75}: checkout: moving from 
>> 08f321ed97353cf3b3fafa6b1c1971d6a8970830 to linux-next
>> 08f321e HEAD@{76}: checkout: moving from linux-next to yinghai/for-x86-mm
>> eb827a7 HEAD@{77}: checkout: moving from 
>> 1b66ccf15ff4bd0200567e8d70446a8763f96ee7 to linux-next
>> [konrad@build linux]$ git show 08f321e
>> commit 08f321ed97353cf3b3fafa6b1c1971d6a8970830
>> Author: Yinghai Lu 
>> Date:   Thu Nov 8 00:00:19 2012 -0800
>>
>> mm: Kill NO_BOOTMEM version free_all_bootmem_node()
>>
>> And I recall Stefano later on testing (I was in a conference and did not have
>> the opportunity to test it). Not sure what he ran with.
>>
>
> FYI the last patch series I tested was Yinghai's "x86, boot, 64bit: Add
> support for loading ramdisk and bzImage above 4G" v7u1.


The one in tip and Linus's tree is v7u2:
---
-v7u2: update changelog and comments, and clear more fields for sentinel.
 Update swiotlb autoswitch off patch.
 Fix crash with xen PV guest with 2G.
---

It fixes the Xen crash that you reported with v7u1. You tested the add-on
patch fix_xen_2g.patch on top of v7u1, and I folded that add-on patch into
the offending patch in v7u2.


Thanks

Yinghai


Re: [GIT PULL] x86/mm changes for v3.9-rc1

2013-02-22 Thread Stefano Stabellini
On Fri, 22 Feb 2013, Konrad Rzeszutek Wilk wrote:
> On Fri, Feb 22, 2013 at 09:12:57AM -0800, H. Peter Anvin wrote:
> > On 02/22/2013 08:55 AM, Konrad Rzeszutek Wilk wrote:
> > >
> > >What is bizarre is that I do recall testing this (and Stefano also did it).
> > >So I am not sure what has altered.
> > >
> > 
> > Yes, there was a very specific reason why I wanted you guys to test it...
> 
> Exactly. And I re-ran the same test, but with a new kernel. This is what
> git reflog tells me:
> 
> 473cd24 HEAD@{75}: checkout: moving from 
> 08f321ed97353cf3b3fafa6b1c1971d6a8970830 to linux-next
> 08f321e HEAD@{76}: checkout: moving from linux-next to yinghai/for-x86-mm
> eb827a7 HEAD@{77}: checkout: moving from 
> 1b66ccf15ff4bd0200567e8d70446a8763f96ee7 to linux-next
> [konrad@build linux]$ git show 08f321e
> commit 08f321ed97353cf3b3fafa6b1c1971d6a8970830
> Author: Yinghai Lu 
> Date:   Thu Nov 8 00:00:19 2012 -0800
> 
> mm: Kill NO_BOOTMEM version free_all_bootmem_node()
> 
> And I recall Stefano later on testing (I was in a conference and did not have
> the opportunity to test it). Not sure what he ran with.
> 

FYI the last patch series I tested was Yinghai's "x86, boot, 64bit: Add
support for loading ramdisk and bzImage above 4G" v7u1.


Re: [GIT PULL] x86/mm changes for v3.9-rc1

2013-02-22 Thread Yinghai Lu
On Fri, Feb 22, 2013 at 9:38 AM, Konrad Rzeszutek Wilk
 wrote:
> On Fri, Feb 22, 2013 at 09:12:57AM -0800, H. Peter Anvin wrote:
>> On 02/22/2013 08:55 AM, Konrad Rzeszutek Wilk wrote:
>> >
>> >What is bizarre is that I do recall testing this (and Stefano also did it).
>> >So I am not sure what has altered.
>> >
>>
>> Yes, there was a very specific reason why I wanted you guys to test it...
>
> Exactly. And I re-ran the same test, but with a new kernel. This is what
> git reflog tells me:
>
> 473cd24 HEAD@{75}: checkout: moving from 
> 08f321ed97353cf3b3fafa6b1c1971d6a8970830 to linux-next
> 08f321e HEAD@{76}: checkout: moving from linux-next to yinghai/for-x86-mm
> eb827a7 HEAD@{77}: checkout: moving from 
> 1b66ccf15ff4bd0200567e8d70446a8763f96ee7 to linux-next
> [konrad@build linux]$ git show 08f321e
> commit 08f321ed97353cf3b3fafa6b1c1971d6a8970830
> Author: Yinghai Lu 
> Date:   Thu Nov 8 00:00:19 2012 -0800
>
> mm: Kill NO_BOOTMEM version free_all_bootmem_node()
>
> And I recall Stefano later on testing (I was in a conference and did not have
> the opportunity to test it). Not sure what he ran with.

The commit in tip and in Linus's tree has a different hash:

commit 600cc5b7f6371706679490d7ee108015ae57ac2f
Author: Yinghai Lu 
Date:   Fri Nov 16 19:39:22 2012 -0800

mm: Kill NO_BOOTMEM version free_all_bootmem_node()

Now NO_BOOTMEM version free_all_bootmem_node() does not really
do free_bootmem at all, and it only call register_page_bootmem_info_node
for online nodes instead.


Re: [GIT PULL] x86/mm changes for v3.9-rc1

2013-02-22 Thread Yinghai Lu
On Fri, Feb 22, 2013 at 9:24 AM, Konrad Rzeszutek Wilk
 wrote:
> On Fri, Feb 22, 2013 at 11:55:31AM -0500, Konrad Rzeszutek Wilk wrote:
>> On Thu, Feb 21, 2013 at 04:34:06PM -0800, H. Peter Anvin wrote:
>> > Hi Linus,
>> >
>> > This is a huge set of several partly interrelated (and concurrently
>> > developed) changes, which is why the branch history is messier than
>> > one would like.
>> >
>> > The *really* big items are two humongous patchsets mostly developed
>> > by Yinghai Lu at my request, which completely revamp the way we
>> > create initial page tables.  In particular, rather than estimating how
>> > much memory we will need for page tables and then building them into that
>> > memory -- a calculation that has been shown to be incredibly fragile -- we
>> > now build them (on 64 bits) with the aid of a "pseudo-linear mode" --
>> > a #PF handler which creates temporary page tables on demand.
>> >
>> > This has several advantages:
>> >
>> > 1. It makes it much easier to support things that need access to
>> >data very early (a followon patchset uses this to load microcode
>> >way early in the kernel startup).
>> >
>> > 2. It allows the kernel and all the kernel data objects to be invoked
>> >from above the 4 GB limit.  This allows kdump to work on very large
>> >systems.
>> >
>> > 3. It greatly reduces the difference between Xen and native (Xen's
>> >equivalent of the #PF handler are the temporary page tables created
>> >by the domain builder), eliminating a bunch of fragile hooks.
>> >
>> > The patch series also gets us a bit closer to W^X.
>> >
>> > Additional work in this pull is the 64-bit get_user() work which you
>> > were also involved with, and a bunch of cleanups/speedups to
>> > __phys_addr()/__pa().
>>
>> Looking at figuring out which of the patches in the branch did this, but
>> with this merge I am getting a crash with a very simple PV guest (booted with
>> one 1G):
>>
>> Call Trace:
>>   [] xen_get_user_pgd+0x5a  <--
>>   [] xen_get_user_pgd+0x5a
>>   [] xen_write_cr3+0x77
>>   [] init_mem_mapping+0x1f9
>>   [] setup_arch+0x742
>>   [] printk+0x48
>>   [] start_kernel+0x90
>>   [] __add_preferred_console.clone.1+0x9b
>>   [] x86_64_start_reservations+0x2a
>>   [] xen_start_kernel+0x564
>>
>> And the hypervisor says:
>> (XEN) d7:v0: unhandled page fault (ec=)
>> (XEN) Pagetable walk from ea05b2d0:
>> (XEN)  L4[0x1d4] =  
>> (XEN) domain_crash_sync called from entry.S
>> (XEN) Domain 7 (vcpu#0) crashed on cpu#3:
>> (XEN) [ Xen-4.2.0  x86_64  debug=n  Not tainted ]
>> (XEN) CPU:3
>> (XEN) RIP:e033:[]
>> (XEN) RFLAGS: 0206   EM: 1   CONTEXT: pv guest
>> (XEN) rax: ea00   rbx: 01a0c000   rcx: 8000
>> (XEN) rdx: 0005b2a0   rsi: 01a0c000   rdi: 
>> (XEN) rbp: 81a01dd8   rsp: 81a01d90   r8:  
>> (XEN) r9:  1001   r10:    r11: 
>> (XEN) r12:    r13: 0010   r14: 
>> (XEN) r15: 0010   cr0: 8005003b   cr4: 000406f0
>> (XEN) cr3: 000411165000   cr2: ea05b2d0
>> (XEN) ds:    es:    fs:    gs:    ss: e02b   cs: e033
>> (XEN) Guest stack trace from rsp=81a01d90:
>> (XEN)8000   8103feba
>> (XEN)0001e030 00010006 81a01dd8 e02b
>
> Here is a better serial log of the crash (just booting a normal Xen 4.1 +
> initial kernel with 8GB):
>
> PXELINUX 3.82 2009-06-09  Copyright (C) 1994-2009 H. Peter Anvin et al
> boot:
> Loading xen.gz... ok
> Loading vmlinuz... ok
> Loading initramfs.cpio.gz... ok
> (XEN) Xen version 4.1.5-pre (kon...@dumpdata.com) (gcc version 4.4.4 20100503 
> (Red Hat 4.4.4-2) (GCC) ) Fri Feb 22 11:37:00 EST 2013
> (XEN) Latest ChangeSet: Fri Feb 15 15:31:55 2013 +0100 23459:9f12bdd6b7f0
> (XEN) Console output is synchronous.
> (XEN) Bootloader: unknown
> (XEN) Command line: cpuinfo conring_size=1048576 sync_console cpufreq=verbose 
> com1=115200,8n1 console=com1,vga loglvl=all guest_loglvl=all
> (XEN) Video information:
> (XEN)  VGA is text mode 80x25, font 8x16
> (XEN)  VBE/DDC methods: none; EDID transfer time: 0 seconds
> (XEN)  EDID info not retrieved because no DDC retrieval method detected
> (XEN) Disc information:
> (XEN)  Found 1 MBR signatures
> (XEN)  Found 1 EDD information structures
> (XEN) Xen-e820 RAM map:
> (XEN)   - 0009ec00 (usable)
> (XEN)  0009ec00 - 000a (reserved)
> (XEN)  000e - 0010 (reserved)
> (XEN)  0010 - 2000 (usable)
> (XEN)  2000 - 202

Re: [GIT PULL] x86/mm changes for v3.9-rc1

2013-02-22 Thread Dave Hansen
On 02/22/2013 08:55 AM, Konrad Rzeszutek Wilk wrote:
> On Thu, Feb 21, 2013 at 04:34:06PM -0800, H. Peter Anvin wrote:
>> Hi Linus,
>>
>> This is a huge set of several partly interrelated (and concurrently
>> developed) changes, which is why the branch history is messier than
>> one would like.
>>
>> The *really* big items are two humongous patchsets mostly developed
>> by Yinghai Lu at my request, which completely revamp the way we
>> create initial page tables.  In particular, rather than estimating how
>> much memory we will need for page tables and then building them into that
>> memory -- a calculation that has been shown to be incredibly fragile -- we
>> now build them (on 64 bits) with the aid of a "pseudo-linear mode" --
>> a #PF handler which creates temporary page tables on demand.
>>
>> This has several advantages:
>>
>> 1. It makes it much easier to support things that need access to
>>data very early (a followon patchset uses this to load microcode
>>way early in the kernel startup).
>>
>> 2. It allows the kernel and all the kernel data objects to be invoked
>>from above the 4 GB limit.  This allows kdump to work on very large
>>systems.
>>
>> 3. It greatly reduces the difference between Xen and native (Xen's
>>equivalent of the #PF handler are the temporary page tables created
>>by the domain builder), eliminating a bunch of fragile hooks.
>>
>> The patch series also gets us a bit closer to W^X.
>>
>> Additional work in this pull is the 64-bit get_user() work which you
>> were also involved with, and a bunch of cleanups/speedups to
>> __phys_addr()/__pa().
> 
> Looking at figuring out which of the patches in the branch did this, but
> with this merge I am getting a crash with a very simple PV guest (booted with
> one 1G):
> 
> Call Trace:
>   [] xen_get_user_pgd+0x5a  <--
>   [] xen_get_user_pgd+0x5a 
>   [] xen_write_cr3+0x77 
>   [] init_mem_mapping+0x1f9 
>   [] setup_arch+0x742 
>   [] printk+0x48 
>   [] start_kernel+0x90 
>   [] __add_preferred_console.clone.1+0x9b 
>   [] x86_64_start_reservations+0x2a 
>   [] xen_start_kernel+0x564 

Do you have CONFIG_DEBUG_VIRTUAL on?

You're probably hitting the new BUG_ON() in __phys_addr().  It's
intended to detect places where someone is doing a __pa()/__phys_addr()
on an address that's outside the kernel's identity mapping.

There are a lot of __pa() calls around there, but from the looks of it,
it's this code:

static pgd_t *xen_get_user_pgd(pgd_t *pgd)
{
...
        if (offset < pgd_index(USER_LIMIT)) {
                struct page *page = virt_to_page(pgd_page);

I'm a bit fuzzy on exactly what the code is trying to do here.  It could
mean either that the identity mapping isn't set up enough yet, or that
__pa() is getting called on a bogus address.

I'm especially fuzzy on why we'd be calling anything that's looking at
userspace pagetables (xen_get_user_pgd() ??) this early in boot.
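
For readers following along, the kind of check described above can be
sketched in user-space C. This is only an illustration under stated
assumptions: the base address, the mapped size and every sketch_* name are
made up for the example and are not the kernel's real constants or
functions, and a plain assert() stands in for BUG_ON().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout, for illustration only (not the real x86-64 values). */
#define SKETCH_MAP_BASE 0xffff880000000000ULL /* start of the linear mapping */
#define SKETCH_MAP_SIZE (1ULL << 30)          /* pretend 1 GiB is mapped */

/* True if vaddr lies inside the pretend linear mapping. */
static bool sketch_virt_addr_valid(uint64_t vaddr)
{
    return vaddr >= SKETCH_MAP_BASE &&
           vaddr - SKETCH_MAP_BASE < SKETCH_MAP_SIZE;
}

/*
 * Debug-style translation: refuse an address outside the mapping instead of
 * silently returning a meaningless physical address.
 */
static uint64_t sketch_phys_addr(uint64_t vaddr)
{
    assert(sketch_virt_addr_valid(vaddr)); /* stands in for BUG_ON() */
    return vaddr - SKETCH_MAP_BASE;
}
```

With these made-up values, sketch_phys_addr(SKETCH_MAP_BASE + 0x1000)
returns 0x1000, while an address below the base or past the end of the
mapping trips the assertion. Without the check, the subtraction would still
produce a number, which is how a bad caller can go unnoticed.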



Re: [GIT PULL] x86/mm changes for v3.9-rc1

2013-02-22 Thread Konrad Rzeszutek Wilk
On Fri, Feb 22, 2013 at 09:12:57AM -0800, H. Peter Anvin wrote:
> On 02/22/2013 08:55 AM, Konrad Rzeszutek Wilk wrote:
> >
> >What is bizarre is that I do recall testing this (and Stefano also did it).
> >So I am not sure what has altered.
> >
> 
> Yes, there was a very specific reason why I wanted you guys to test it...

Exactly. And I re-ran the same test, but with a new kernel. This is what
git reflog tells me:

473cd24 HEAD@{75}: checkout: moving from 
08f321ed97353cf3b3fafa6b1c1971d6a8970830 to linux-next
08f321e HEAD@{76}: checkout: moving from linux-next to yinghai/for-x86-mm
eb827a7 HEAD@{77}: checkout: moving from 
1b66ccf15ff4bd0200567e8d70446a8763f96ee7 to linux-next
[konrad@build linux]$ git show 08f321e
commit 08f321ed97353cf3b3fafa6b1c1971d6a8970830
Author: Yinghai Lu 
Date:   Thu Nov 8 00:00:19 2012 -0800

mm: Kill NO_BOOTMEM version free_all_bootmem_node()

And I recall Stefano later on testing (I was in a conference and did not have
the opportunity to test it). Not sure what he ran with.


Re: [GIT PULL] x86/mm changes for v3.9-rc1

2013-02-22 Thread H. Peter Anvin

On 02/22/2013 09:30 AM, Dave Hansen wrote:


> Do you have CONFIG_DEBUG_VIRTUAL on?
>
> You're probably hitting the new BUG_ON() in __phys_addr().  It's
> intended to detect places where someone is doing a __pa()/__phys_addr()
> on an address that's outside the kernel's identity mapping.
>
> There are a lot of __pa() calls around there, but from the looks of it,
> it's this code:
>
> static pgd_t *xen_get_user_pgd(pgd_t *pgd)
> {
> ...
>         if (offset < pgd_index(USER_LIMIT)) {
>                 struct page *page = virt_to_page(pgd_page);
>
> I'm a bit fuzzy on exactly what the code is trying to do here.  It could
> mean either that the identity mapping isn't set up enough yet, or that
> __pa() is getting called on a bogus address.
>
> I'm especially fuzzy on why we'd be calling anything that's looking at
> userspace pagetables (xen_get_user_pgd() ??) this early in boot.



Ah yes, of course.

This is unrelated to the early page table setups, which is why it didn't 
trip in Konrad's earlier testing.


These debugging bits have already found real bugs in the kernel, and this 
might be another.


-hpa

--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.



Re: [GIT PULL] x86/mm changes for v3.9-rc1

2013-02-22 Thread H. Peter Anvin

On 02/22/2013 08:22 AM, Linus Torvalds wrote:


> Ugh. So I've tried to walk through this, and it's painful. If this
> results in problems, we're going to be *so* screwed. Is it bisectable?



I can't tell you for sure that it is bisectable at every point.  There 
are definite bisection points in there, though, as this is several 
pieces of work from two kernel cycles that were independently tested.



> I also don't understand how "early_idt_handler" could *possibly* work.
> In particular, it seems to rely on the trap number being set up in the
> stack frame:
>
>     cmpl $14,72(%rsp)   # Page fault?
>
> but that's not even *true*. Why? Because we export both the
> early_idt_handlers[] array (that sets up the trap number and makes the
> stack frame be reliable) and the single early_idt_handler function
> (that relies on the trap number and the reliable stack frame), AND
> AFAIK WE USE THE LATTER!
>
> See x86_64_start_kernel():
>
>     for (i = 0; i < NUM_EXCEPTION_VECTORS; i++) {
> #ifdef CONFIG_EARLY_PRINTK
>         set_intr_gate(i, &early_idt_handlers[i]);
> #else
>         set_intr_gate(i, early_idt_handler);
> #endif
>     }
>
> so unless you have CONFIG_EARLY_PRINTK, the interrupt gate will point
> to that raw early_idt_handler function that doesn't *work* on its own,
> afaik.



This is a (pre-existing!) bug that absolutely needs to be fixed, which 
ought to break other things too (early use of *msr_safe for example, or 
anything else that relies on an early exception entry, which there 
aren't a lot of so far).  The fix is simple and obvious.

But you're right... what the heck is going on here?

My own testing would probably not have caught this, as I consider 
EARLY_PRINTK a must have, but Ingo's test machines definitely would have.



> Btw, it's not just the page fault index testing that is wrong. The whole
>
>     cmpl $__KERNEL_CS,96(%rsp)
>     jne 11f
>
> also relies on the stack frame being set up the same way for all
> exceptions - which again is only true if we ran through the
> early_idt_handlers[] prologue that added the extra stack entry.
>
> How does this even work for me? I don't have EARLY_PRINTK enabled.
>
> What am I missing?


I just ran a simulation without EARLY_PRINTK.  Presumably depending on the 
memory layout, we can apparently go through the entire bootup sequence 
without ever actually taking an early trap.  It is a bug, though, and it 
is a bug even without this patchset.  I will submit a fix.  However, the 
Xen "we tested this, this worked, now it doesn't" worries me a lot.


-hpa


--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.



Re: [GIT PULL] x86/mm changes for v3.9-rc1

2013-02-22 Thread H. Peter Anvin

On 02/22/2013 09:24 AM, Konrad Rzeszutek Wilk wrote:


> Here is a better serial log of the crash (just booting a normal Xen 4.1 +
> initial kernel with 8GB):



Configuration, please, especially: is early_printk compiled in?  Also, 
since this is Xen-related we really need your help on this.  A lot of 
this is not going to be meaningful to non-Xen people.


-hpa

--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.



Re: [GIT PULL] x86/mm changes for v3.9-rc1

2013-02-22 Thread Konrad Rzeszutek Wilk
On Fri, Feb 22, 2013 at 11:55:31AM -0500, Konrad Rzeszutek Wilk wrote:
> On Thu, Feb 21, 2013 at 04:34:06PM -0800, H. Peter Anvin wrote:
> > Hi Linus,
> > 
> > This is a huge set of several partly interrelated (and concurrently
> > developed) changes, which is why the branch history is messier than
> > one would like.
> > 
> > The *really* big items are two humongous patchsets mostly developed
> > by Yinghai Lu at my request, which completely revamp the way we
> > create initial page tables.  In particular, rather than estimating how
> > much memory we will need for page tables and then building them into that
> > memory -- a calculation that has been shown to be incredibly fragile -- we
> > now build them (on 64 bits) with the aid of a "pseudo-linear mode" --
> > a #PF handler which creates temporary page tables on demand.
> > 
> > This has several advantages:
> > 
> > 1. It makes it much easier to support things that need access to
> >data very early (a followon patchset uses this to load microcode
> >way early in the kernel startup).
> > 
> > 2. It allows the kernel and all the kernel data objects to be invoked
> >from above the 4 GB limit.  This allows kdump to work on very large
> >systems.
> > 
> > 3. It greatly reduces the difference between Xen and native (Xen's
> >equivalent of the #PF handler are the temporary page tables created
> >by the domain builder), eliminating a bunch of fragile hooks.
> > 
> > The patch series also gets us a bit closer to W^X.
> > 
> > Additional work in this pull is the 64-bit get_user() work which you
> > were also involved with, and a bunch of cleanups/speedups to
> > __phys_addr()/__pa().
> 
> Looking at figuring out which of the patches in the branch did this, but
> with this merge I am getting a crash with a very simple PV guest (booted with
> one 1G):
> 
> Call Trace:
>   [] xen_get_user_pgd+0x5a  <--
>   [] xen_get_user_pgd+0x5a 
>   [] xen_write_cr3+0x77 
>   [] init_mem_mapping+0x1f9 
>   [] setup_arch+0x742 
>   [] printk+0x48 
>   [] start_kernel+0x90 
>   [] __add_preferred_console.clone.1+0x9b 
>   [] x86_64_start_reservations+0x2a 
>   [] xen_start_kernel+0x564 
> 
> And the hypervisor says:
> (XEN) d7:v0: unhandled page fault (ec=)
> (XEN) Pagetable walk from ea05b2d0:
> (XEN)  L4[0x1d4] =  
> (XEN) domain_crash_sync called from entry.S
> (XEN) Domain 7 (vcpu#0) crashed on cpu#3:
> (XEN) [ Xen-4.2.0  x86_64  debug=n  Not tainted ]
> (XEN) CPU:3
> (XEN) RIP:e033:[]
> (XEN) RFLAGS: 0206   EM: 1   CONTEXT: pv guest
> (XEN) rax: ea00   rbx: 01a0c000   rcx: 8000
> (XEN) rdx: 0005b2a0   rsi: 01a0c000   rdi: 
> (XEN) rbp: 81a01dd8   rsp: 81a01d90   r8:  
> (XEN) r9:  1001   r10:    r11: 
> (XEN) r12:    r13: 0010   r14: 
> (XEN) r15: 0010   cr0: 8005003b   cr4: 000406f0
> (XEN) cr3: 000411165000   cr2: ea05b2d0
> (XEN) ds:    es:    fs:    gs:    ss: e02b   cs: e033
> (XEN) Guest stack trace from rsp=81a01d90:
> (XEN)8000   8103feba
> (XEN)0001e030 00010006 81a01dd8 e02b

Here is a better serial log of the crash (just booting a normal Xen 4.1 +
initial kernel with 8GB):

PXELINUX 3.82 2009-06-09  Copyright (C) 1994-2009 H. Peter Anvin et al
boot: 
Loading xen.gz... ok
Loading vmlinuz... ok
Loading initramfs.cpio.gz... ok
(XEN) Xen version 4.1.5-pre (kon...@dumpdata.com) (gcc version 4.4.4 20100503 
(Red Hat 4.4.4-2) (GCC) ) Fri Feb 22 11:37:00 EST 2013
(XEN) Latest ChangeSet: Fri Feb 15 15:31:55 2013 +0100 23459:9f12bdd6b7f0
(XEN) Console output is synchronous.
(XEN) Bootloader: unknown
(XEN) Command line: cpuinfo conring_size=1048576 sync_console cpufreq=verbose 
com1=115200,8n1 console=com1,vga loglvl=all guest_loglvl=all
(XEN) Video information:
(XEN)  VGA is text mode 80x25, font 8x16
(XEN)  VBE/DDC methods: none; EDID transfer time: 0 seconds
(XEN)  EDID info not retrieved because no DDC retrieval method detected
(XEN) Disc information:
(XEN)  Found 1 MBR signatures
(XEN)  Found 1 EDD information structures
(XEN) Xen-e820 RAM map:
(XEN)   - 0009ec00 (usable)
(XEN)  0009ec00 - 000a (reserved)
(XEN)  000e - 0010 (reserved)
(XEN)  0010 - 2000 (usable)
(XEN)  2000 - 2020 (reserved)
(XEN)  2020 - 4000 (usable)
(XEN)  4000 - 4020 (reserved)
(XEN)  4020 -

Re: [GIT PULL] x86/mm changes for v3.9-rc1

2013-02-22 Thread H. Peter Anvin

On 02/22/2013 08:55 AM, Konrad Rzeszutek Wilk wrote:


> What is bizarre is that I do recall testing this (and Stefano also did it).
> So I am not sure what has altered.



Yes, there was a very specific reason why I wanted you guys to test it...

-hpa

--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.



Re: [GIT PULL] x86/mm changes for v3.9-rc1

2013-02-22 Thread Konrad Rzeszutek Wilk
On Thu, Feb 21, 2013 at 04:34:06PM -0800, H. Peter Anvin wrote:
> Hi Linus,
> 
> This is a huge set of several partly interrelated (and concurrently
> developed) changes, which is why the branch history is messier than
> one would like.
> 
> The *really* big items are two humongous patchsets mostly developed
> by Yinghai Lu at my request, which completely revamp the way we
> create initial page tables.  In particular, rather than estimating how
> much memory we will need for page tables and then building them into that
> memory -- a calculation that has been shown to be incredibly fragile -- we
> now build them (on 64 bits) with the aid of a "pseudo-linear mode" --
> a #PF handler which creates temporary page tables on demand.
> 
> This has several advantages:
> 
> 1. It makes it much easier to support things that need access to
>data very early (a followon patchset uses this to load microcode
>way early in the kernel startup).
> 
> 2. It allows the kernel and all the kernel data objects to be invoked
>from above the 4 GB limit.  This allows kdump to work on very large
>systems.
> 
> 3. It greatly reduces the difference between Xen and native (Xen's
>equivalent of the #PF handler are the temporary page tables created
>by the domain builder), eliminating a bunch of fragile hooks.
> 
> The patch series also gets us a bit closer to W^X.
> 
> Additional work in this pull is the 64-bit get_user() work which you
> were also involved with, and a bunch of cleanups/speedups to
> __phys_addr()/__pa().

I am still working out which of the patches in the branch did this, but
with this merge I am getting a crash with a very simple PV guest (booted with
one 1G):

Call Trace:
  [] xen_get_user_pgd+0x5a  <--
  [] xen_get_user_pgd+0x5a 
  [] xen_write_cr3+0x77 
  [] init_mem_mapping+0x1f9 
  [] setup_arch+0x742 
  [] printk+0x48 
  [] start_kernel+0x90 
  [] __add_preferred_console.clone.1+0x9b 
  [] x86_64_start_reservations+0x2a 
  [] xen_start_kernel+0x564 

And the hypervisor says:
(XEN) d7:v0: unhandled page fault (ec=)
(XEN) Pagetable walk from ea05b2d0:
(XEN)  L4[0x1d4] =  
(XEN) domain_crash_sync called from entry.S
(XEN) Domain 7 (vcpu#0) crashed on cpu#3:
(XEN) [ Xen-4.2.0  x86_64  debug=n  Not tainted ]
(XEN) CPU:3
(XEN) RIP:e033:[]
(XEN) RFLAGS: 0206   EM: 1   CONTEXT: pv guest
(XEN) rax: ea00   rbx: 01a0c000   rcx: 8000
(XEN) rdx: 0005b2a0   rsi: 01a0c000   rdi: 
(XEN) rbp: 81a01dd8   rsp: 81a01d90   r8:  
(XEN) r9:  1001   r10:    r11: 
(XEN) r12:    r13: 0010   r14: 
(XEN) r15: 0010   cr0: 8005003b   cr4: 000406f0
(XEN) cr3: 000411165000   cr2: ea05b2d0
(XEN) ds:    es:    fs:    gs:    ss: e02b   cs: e033
(XEN) Guest stack trace from rsp=81a01d90:
(XEN)8000   8103feba
(XEN)0001e030 00010006 81a01dd8 e02b


What is bizarre is that I do recall testing this (and Stefano also did it).
So I am not sure what has altered.


Re: [GIT PULL] x86/mm changes for v3.9-rc1

2013-02-22 Thread Linus Torvalds
On Thu, Feb 21, 2013 at 4:34 PM, H. Peter Anvin  wrote:
>
> This is a huge set of several partly interrelated (and concurrently
> developed) changes, which is why the branch history is messier than
> one would like.
>
> The *really* big items are two humongous patchsets mostly developed
> by Yinghai Lu at my request, which completely revamp the way we
> create initial page tables.

Ugh. So I've tried to walk through this, and it's painful. If this
results in problems, we're going to be *so* screwed. Is it bisectable?

I also don't understand how "early_idt_handler" could *possibly* work.
In particular, it seems to rely on the trap number being set up in the
stack frame:

    cmpl $14,72(%rsp)   # Page fault?

but that's not even *true*. Why? Because we export both the
early_idt_handlers[] array (that sets up the trap number and makes the
stack frame be reliable) and the single early_idt_handler function
(that relies on the trap number and the reliable stack frame), AND
AFAIK WE USE THE LATTER!

See x86_64_start_kernel():

    for (i = 0; i < NUM_EXCEPTION_VECTORS; i++) {
#ifdef CONFIG_EARLY_PRINTK
        set_intr_gate(i, &early_idt_handlers[i]);
#else
        set_intr_gate(i, early_idt_handler);
#endif
    }

so unless you have CONFIG_EARLY_PRINTK, the interrupt gate will point
to that raw early_idt_handler function that doesn't *work* on its own,
afaik.

Btw, it's not just the page fault index testing that is wrong. The whole

    cmpl $__KERNEL_CS,96(%rsp)
    jne 11f

also relies on the stack frame being set up the same way for all
exceptions - which again is only true if we ran through the
early_idt_handlers[] prologue that added the extra stack entry.

How does this even work for me? I don't have EARLY_PRINTK enabled.

What am I missing?

Linus
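
The stack-frame concern raised in this message can be illustrated with a
small user-space model. It is a sketch under assumptions, not the kernel
code: it assumes the common handler saves nine registers before looking at
the frame (which is consistent with the 72 and 96 byte offsets quoted
above), and all sim_* names and the selector, RIP, RSP and RFLAGS values
are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SIM_KERNEL_CS 0x10u   /* hypothetical code segment selector */
#define SIM_RIP       0x1000u /* hypothetical faulting instruction address */
#define SIM_SLOTS     16

struct sim_frame {
    uint64_t slot[SIM_SLOTS]; /* slot[0] is the word that (%rsp) points at */
    size_t used;
};

/* Push one word: everything already pushed moves 8 bytes away from %rsp. */
static void sim_push(struct sim_frame *f, uint64_t value)
{
    for (size_t i = f->used; i > 0; i--)
        f->slot[i] = f->slot[i - 1];
    f->slot[0] = value;
    f->used++;
}

/* Build the stack as the common handler sees it after saving registers. */
static struct sim_frame sim_take_exception(unsigned vector,
                                           bool cpu_pushes_error,
                                           bool via_stub)
{
    struct sim_frame f = { .used = 0 };

    sim_push(&f, 0x18);          /* SS (hypothetical) */
    sim_push(&f, 0x7000);        /* RSP (hypothetical) */
    sim_push(&f, 0x2);           /* RFLAGS (hypothetical) */
    sim_push(&f, SIM_KERNEL_CS); /* CS */
    sim_push(&f, SIM_RIP);       /* RIP */
    if (cpu_pushes_error)
        sim_push(&f, 0);         /* hardware error code */

    if (via_stub) {              /* per-vector prologue */
        if (!cpu_pushes_error)
            sim_push(&f, 0);     /* dummy error code keeps the layout uniform */
        sim_push(&f, vector);    /* trap number */
    }

    for (int i = 0; i < 9; i++)  /* assumed nine saved registers */
        sim_push(&f, 0xAAAA);
    return f;
}

/* Read the word at a byte offset from the simulated %rsp. */
static uint64_t sim_read(const struct sim_frame *f, size_t byte_offset)
{
    return f->slot[byte_offset / 8];
}
```

In this model, entering through the per-vector stub puts the trap number at
offset 72 and CS at offset 96 for every vector. Entering the common handler
directly leaves the hardware error code or RIP at offset 72 and RFLAGS or
the saved RSP at offset 96, so both comparisons would test unrelated words.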