Re: Linux 3.0 boot failure on the Powerbook G4

2011-07-24 Thread Andreas Schwab
Michael Büsch m...@bues.ch writes:

 Linux 3.0 fails to boot _very_ early on my Powerbook G4. See the
 yaboot/OF screenshot:
 http://bues.ch/misc/linux-3.0-pbook.jpg

 Linux 2.6.39.2 boots fine.
 Does somebody have an idea?

Perhaps your image is getting too large?

Andreas.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
And now for something completely different.

Re: Linux 3.0 boot failure on the Powerbook G4

2011-07-24 Thread Michael Büsch
On Sun, 24 Jul 2011 10:06:03 +0200
Andreas Schwab sch...@linux-m68k.org wrote:

 Michael Büsch m...@bues.ch writes:
 
  Linux 3.0 fails to boot _very_ early on my Powerbook G4. See the
  yaboot/OF screenshot:
  http://bues.ch/misc/linux-3.0-pbook.jpg
 
  Linux 2.6.39.2 boots fine.
  Does somebody have an idea?
 
 Perhaps your image is getting too large?

I reduced the image size, so that it's way less than the
2.6.39 kernel size (.old is 2.6.39, .a is 3.0):
-rwxr-xr-x 1 root root 5.6M Jul 24 12:16 /boot/linux.a
-rwxr-xr-x 1 root root 6.3M Jul 23 20:50 /boot/linux.old

But that didn't help. :(

I'm currently trying to bisect it, but that turns out to be hard
due to various compile issues and stuff like that...

-- 
Greetings, Michael.

Re: Linux 3.0 boot failure on the Powerbook G4

2011-07-24 Thread Benjamin Herrenschmidt
On Sat, 2011-07-23 at 22:20 +0200, Michael Büsch wrote:
 Linux 3.0 fails to boot _very_ early on my Powerbook G4. See the
 yaboot/OF screenshot:
 http://bues.ch/misc/linux-3.0-pbook.jpg
 
 Linux 2.6.39.2 boots fine.
 Does somebody have an idea?

Interesting, that's before it even kills OF. Are you booting a zImage or
a vmlinux ?

It might also be useful to compile yaboot with debug output enabled to
figure out where the kernel is loaded, so we can try calculating where
exactly it dies if it's a vmlinux...

Cheers,
Ben.

 The config can be found here:
 http://bues.ch/misc/linux-3.0-ppc-config
 It's mostly equivalent to the working 2.6.39 config. I enabled
 early OF console, but that didn't help to show better error messages.
 
 The machine is a:
 
 cpu : 7447A, altivec supported
 clock   : 1499.999000MHz
 revision: 1.2 (pvr 8003 0102)
 platform: PowerMac
 motherboard : PowerBook5,6 MacRISC3 Power Macintosh 
 detected as : 287 (PowerBook G4 15)
 pmac-generation : NewWorld
 



Re: Linux 3.0 boot failure on the Powerbook G4

2011-07-24 Thread Michael Büsch
On Sun, 24 Jul 2011 22:07:30 +1000
Benjamin Herrenschmidt b...@kernel.crashing.org wrote:

 On Sat, 2011-07-23 at 22:20 +0200, Michael Büsch wrote:
  Linux 3.0 fails to boot _very_ early on my Powerbook G4. See the
  yaboot/OF screenshot:
  http://bues.ch/misc/linux-3.0-pbook.jpg
  
  Linux 2.6.39.2 boots fine.
  Does somebody have an idea?
 
 Interesting, that's before it even kills OF. Are you booting a zImage or
 a vmlinux ?

I'm booting zImage.pmac.

 It might also be useful to compile yaboot with debug output enabled to
 figure out where the kernel is loaded, so we can try calculating where
 exactly it dies if it's a vmlinux...

Hm, I guess I could probably try that.

-- 
Greetings, Michael.

Re: Linux 3.0 boot failure on the Powerbook G4

2011-07-24 Thread Benjamin Herrenschmidt
On Sun, 2011-07-24 at 14:10 +0200, Michael Büsch wrote:
 On Sun, 24 Jul 2011 22:07:30 +1000
 Benjamin Herrenschmidt b...@kernel.crashing.org wrote:
 
  On Sat, 2011-07-23 at 22:20 +0200, Michael Büsch wrote:
   Linux 3.0 fails to boot _very_ early on my Powerbook G4. See the
   yaboot/OF screenshot:
   http://bues.ch/misc/linux-3.0-pbook.jpg
   
   Linux 2.6.39.2 boots fine.
   Does somebody have an idea?
  
  Interesting, that's before it even kills OF. Are you booting a zImage or
  a vmlinux ?
 
 I'm booting zImage.pmac.

Ah that might make it easier... I don't remember where it links, can you
show me the program headers out of readelf -a of the zImage ?

  It might also be useful to compile yaboot with debug output enabled to
  figure out where the kernel is loaded, so we can try calculating where
  exactly it dies if it's a vmlinux...
 
 Hm, I guess I could probably try that.

Cheers,
Ben.



Re: Linux 3.0 boot failure on the Powerbook G4

2011-07-24 Thread Michael Büsch
On Sun, 24 Jul 2011 22:13:34 +1000
Benjamin Herrenschmidt b...@kernel.crashing.org wrote:
  I'm booting zImage.pmac.
 
 Ah that might make it easier... I don't remember where it links, can you
 show me the program headers out of readelf -a of the zImage ?

As I recompiled stuff, here's the current failure log:
http://bues.ch/misc/linux-3.0-pbook-2.jpg

And this is the corresponding readelf output:

mb@maggie:~$ readelf -a /boot/linux.a
ELF Header:
  Magic:   7f 45 4c 46 01 02 01 00 00 00 00 00 00 00 00 00 
  Class: ELF32
  Data:  2's complement, big endian
  Version:   1 (current)
  OS/ABI:UNIX - System V
  ABI Version:   0
  Type:  EXEC (Executable file)
  Machine:   PowerPC
  Version:   0x1
  Entry point address:   0x400230
  Start of program headers:  52 (bytes into file)
  Start of section headers:  5769716 (bytes into file)
  Flags: 0x8000, relocatable-lib
  Size of this header:   52 (bytes)
  Size of program headers:   32 (bytes)
  Number of program headers: 2
  Size of section headers:   40 (bytes)
  Number of section headers: 12
  Section header string table index: 9

Section Headers:
  [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            00000000 000000 000000 00      0   0  0
  [ 1] .text             PROGBITS        00400000 010000 0048b0 00  AX  0   0  4
  [ 2] .data             PROGBITS        00405000 015000 0012f8 00  WA  0   0  4
  [ 3] .got              PROGBITS        004062f8 0162f8 00000c 04  WA  0   0  4
  [ 4] __builtin_cmdline PROGBITS        00406304 016304 000200 00  WA  0   0  4
  [ 5] .kernel:vmlinux.s PROGBITS        00407000 017000 569952 00   A  0   0  1
  [ 6] .bss              NOBITS          00971000 580952 00bc70 00  WA  0   0  4
  [ 7] .comment          PROGBITS        00000000 580952 00001c 01  MS  0   0  1
  [ 8] .gnu.attributes   LOOS+ffffff5    00000000 58096e 000014 00      0   0  1
  [ 9] .shstrtab         STRTAB          00000000 580982 000072 00      0   0  1
  [10] .symtab           SYMTAB          00000000 580bd4 000780 10     11  55  4
  [11] .strtab           STRTAB          00000000 581354 0004f3 00      0   0  1
Key to Flags:
  W (write), A (alloc), X (execute), M (merge), S (strings)
  I (info), L (link order), G (group), x (unknown)
  O (extra OS processing required) o (OS specific), p (processor specific)

There are no section groups in this file.

Program Headers:
  Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
  LOAD           0x010000 0x00400000 0x00400000 0x570952 0x57cc70 RWE 0x10000
  GNU_STACK      0x000000 0x00000000 0x00000000 0x00000 0x00000 RWE 0x4

 Section to Segment mapping:
  Segment Sections...
   00 .text .data .got __builtin_cmdline .kernel:vmlinux.strip .bss 
   01 

There is no dynamic section in this file.

There are no relocations in this file.

There are no unwind sections in this file.

Symbol table '.symtab' contains 120 entries:
   Num:Value  Size TypeBind   Vis  Ndx Name
 0:  0 NOTYPE  LOCAL  DEFAULT  UND 
 1: 0040 0 SECTION LOCAL  DEFAULT1 
 2: 00405000 0 SECTION LOCAL  DEFAULT2 
 3: 004062f8 0 SECTION LOCAL  DEFAULT3 
 4: 00406304 0 SECTION LOCAL  DEFAULT4 
 5: 00407000 0 SECTION LOCAL  DEFAULT5 
 6: 00971000 0 SECTION LOCAL  DEFAULT6 
 7:  0 SECTION LOCAL  DEFAULT7 
 8:  0 SECTION LOCAL  DEFAULT8 
 9:  0 FILELOCAL  DEFAULT  ABS of.c
10: 004096 FUNCLOCAL  DEFAULT1 of_image_hdr
11: 00400130   220 FUNCLOCAL  DEFAULT1 of_try_claim
12: 00971000 4 OBJECT  LOCAL  DEFAULT6 claim_base
13:  0 FILELOCAL  DEFAULT  ABS empty.c
14: 0040021c 0 NOTYPE  LOCAL  DEFAULT1 p_start
15: 00400220 0 NOTYPE  LOCAL  DEFAULT1 p_etext
16: 00400224 0 NOTYPE  LOCAL  DEFAULT1 p_bss_start
17: 00400228 0 NOTYPE  LOCAL  DEFAULT1 p_end
18: 0040022c 0 NOTYPE  LOCAL  DEFAULT1 p_pstack
19: 00400234 0 NOTYPE  LOCAL  DEFAULT1 p_base
20: 0007 0 NOTYPE  LOCAL  DEFAULT  ABS RELA
21: 6ff9 0 NOTYPE  LOCAL  DEFAULT  ABS RELACOUNT
22:  0 FILELOCAL  DEFAULT  ABS main.c
23: 0040032c   536 FUNCLOCAL  DEFAULT1 prep_kernel
24: 00971004 46960 OBJECT  LOCAL  DEFAULT6 gzstate
25: 00406304   512 OBJECT  LOCAL  DEFAULT4 cmdline
26:  0 FILELOCAL  DEFAULT  ABS gunzip_util.c
27: 0097c774   128 OBJECT  LOCAL  DEFAULT6 discard_buf.1439
28:  0 FILELOCAL  DEFAULT  ABS elf_util.c
  

Re: [PATCH 2/5] hugetlb: add phys addr to struct huge_bootmem_page

2011-07-24 Thread Tabi Timur-B04825
On Thu, Jun 30, 2011 at 1:50 PM, Becky Bruce bec...@kernel.crashing.org wrote:

 Because there was no bootmem allocation in the normal case - the non-highmem
 version stores the data structure in the huge page itself.  This is perfectly
 fine as long as you have a mapping.  Since this isn't true for HIGHMEM pages,
 I allocate bootmem to store the early data structure that stores information
 about the hugepage (this happens in arch-specific code in
 alloc_bootmem_huge_page).

I would put this text in a comment in the code.
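
For illustration only, such a comment might read roughly as follows, placed
above the arch-specific alloc_bootmem_huge_page() (wording adapted from the
paragraph quoted above, not actual kernel source):

/*
 * In the normal (non-HIGHMEM) case no bootmem allocation is needed: the
 * early data structure describing the huge page is stored in the huge
 * page itself, which is fine as long as a mapping exists.  HIGHMEM pages
 * have no such mapping, so allocate bootmem to hold the early
 * per-hugepage information instead.
 */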

-- 
Timur Tabi
Linux kernel developer at Freescale


Re: perf PPC: kernel panic with callchains and context switch events

2011-07-24 Thread David Ahern
On 07/20/2011 03:57 PM, David Ahern wrote:
 I am hoping someone familiar with PPC can help understand a panic that
 is generated when capturing callchains with context switch events.
 
 Call trace is below. The short of it is that walking the callchain
 generates a page fault. To handle the page fault the mmap_sem is needed,
 but it is currently held by setup_arg_pages. setup_arg_pages calls
 shift_arg_pages with the mmap_sem held. shift_arg_pages then calls
 move_page_tables which has a cond_resched at the top of its for loop. If
 the cond_resched() is removed from move_page_tables everything works
 beautifully - no panics.
 
 So, the question: is it normal for walking the stack to trigger a page
 fault on PPC? The panic is not seen on x86 based systems.

Can anyone confirm whether page faults while walking the stack are
normal for PPC? We really want to use the context switch event with
callchains and need to understand whether this behavior is normal. Of
course if it is normal, a way to address the problem without a panic
will be needed.

Thanks,
David

 
  [b0180e00]rb_erase+0x1b4/0x3e8
  [b00430f4]__dequeue_entity+0x50/0xe8
  [b0043304]set_next_entity+0x178/0x1bc
  [b0043440]pick_next_task_fair+0xb0/0x118
  [b02ada80]schedule+0x500/0x614
  [b02afaa8]rwsem_down_failed_common+0xf0/0x264
  [b02afca0]rwsem_down_read_failed+0x34/0x54
  [b02aed4c]down_read+0x3c/0x54
  [b0023b58]do_page_fault+0x114/0x5e8
  [b001e350]handle_page_fault+0xc/0x80
  [b0022dec]perf_callchain+0x224/0x31c
  [b009ba70]perf_prepare_sample+0x240/0x2fc
  [b009d760]__perf_event_overflow+0x280/0x398
  [b009d914]perf_swevent_overflow+0x9c/0x10c
  [b009db54]perf_swevent_ctx_event+0x1d0/0x230
  [b009dc38]do_perf_sw_event+0x84/0xe4
  [b009dde8]perf_sw_event_context_switch+0x150/0x1b4
  [b009de90]perf_event_task_sched_out+0x44/0x2d4
  [b02ad840]schedule+0x2c0/0x614
  [b0047dc0]__cond_resched+0x34/0x90
  [b02adcc8]_cond_resched+0x4c/0x68
  [b00bccf8]move_page_tables+0xb0/0x418
  [b00d7ee0]setup_arg_pages+0x184/0x2a0
  [b0110914]load_elf_binary+0x394/0x1208
  [b00d6e28]search_binary_handler+0xe0/0x2c4
  [b00d834c]do_execve+0x1bc/0x268
  [b0015394]sys_execve+0x84/0xc8
  [b001df10]ret_from_syscall+0x0/0x3c
 
 Thanks,
 David


Re: Linux 3.0 boot failure on the Powerbook G4

2011-07-24 Thread Benjamin Herrenschmidt
On Sun, 2011-07-24 at 14:37 +0200, Michael Büsch wrote:
 On Sun, 24 Jul 2011 22:13:34 +1000
 Benjamin Herrenschmidt b...@kernel.crashing.org wrote:
   I'm booting zImage.pmac.
  
  Ah that might make it easier... I don't remember where it links, can you
  show me the program headers out of readelf -a of the zImage ?
 
 As I recompiled stuff, here's the current failure log:
 http://bues.ch/misc/linux-3.0-pbook-2.jpg
 
 And this is the corresponding readelf output:

Hrm.. the faulting address is outside of the zImage. Odd.

Can you try loading a plain vmlinux instead ? (feel free to strip it).

yaboot 1.3.13 might not be the best one to load a real ELF ...

On my side I'll dig out one of my old powerbooks and see if I can reproduce
(I generally tend to netboot the zImage directly, but it needs to be under
4M for that to work due to Apple OF limitations, or use yaboot with a plain
vmlinux which exercises a different code path within yaboot).

Cheers,
Ben.

 [readelf output snipped -- identical to the listing quoted in full above]

[PATCH] perf: powerpc: Disable pagefaults during callchain stack read

2011-07-24 Thread Anton Blanchard
Hi David,

  I am hoping someone familiar with PPC can help understand a panic
  that is generated when capturing callchains with context switch
  events.
  
  Call trace is below. The short of it is that walking the callchain
  generates a page fault. To handle the page fault the mmap_sem is
  needed, but it is currently held by setup_arg_pages.
  setup_arg_pages calls shift_arg_pages with the mmap_sem held.
  shift_arg_pages then calls move_page_tables which has a
  cond_resched at the top of its for loop. If the cond_resched() is
  removed from move_page_tables everything works beautifully - no
  panics.
  
  So, the question: is it normal for walking the stack to trigger a
  page fault on PPC? The panic is not seen on x86 based systems.
 
 Can anyone confirm whether page faults while walking the stack are
 normal for PPC? We really want to use the context switch event with
 callchains and need to understand whether this behavior is normal. Of
 course if it is normal, a way to address the problem without a panic
 will be needed.

I talked to Ben about this last week and he pointed me at
pagefault_disable/enable. Untested patch below.

Anton

--

We need to disable pagefaults when reading the stack otherwise
we can lock up trying to take the mmap_sem when the code we are
profiling already has a write lock taken.

This will not happen for hardware events, but could for software
events.

Reported-by: David Ahern dsah...@gmail.com
Signed-off-by: Anton Blanchard an...@samba.org
Cc: sta...@kernel.org
---

Index: linux-powerpc/arch/powerpc/kernel/perf_callchain.c
===================================================================
--- linux-powerpc.orig/arch/powerpc/kernel/perf_callchain.c	2011-07-25 09:54:27.296757427 +1000
+++ linux-powerpc/arch/powerpc/kernel/perf_callchain.c	2011-07-25 09:56:08.828367882 +1000
@@ -154,8 +154,12 @@ static int read_user_stack_64(unsigned l
 	    ((unsigned long)ptr & 7))
 		return -EFAULT;
 
-	if (!__get_user_inatomic(*ret, ptr))
+	pagefault_disable();
+	if (!__get_user_inatomic(*ret, ptr)) {
+		pagefault_enable();
 		return 0;
+	}
+	pagefault_enable();
 
 	return read_user_stack_slow(ptr, ret, 8);
 }
@@ -166,8 +170,12 @@ static int read_user_stack_32(unsigned i
 	    ((unsigned long)ptr & 3))
 		return -EFAULT;
 
-	if (!__get_user_inatomic(*ret, ptr))
+	pagefault_disable();
+	if (!__get_user_inatomic(*ret, ptr)) {
+		pagefault_enable();
 		return 0;
+	}
+	pagefault_enable();
 
 	return read_user_stack_slow(ptr, ret, 4);
 }
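
(For comparison only, not part of the patch above: the same pattern could be
factored into a small helper so pagefault_enable() isn't repeated on both
exit paths. The helper name here is made up.)

static int read_user_word_nofault(unsigned long *ret,
				  unsigned long __user *ptr)
{
	int rc;

	/*
	 * Fail fast with -EFAULT instead of taking the fault path,
	 * which could sleep on mmap_sem.
	 */
	pagefault_disable();
	rc = __get_user_inatomic(*ret, ptr);
	pagefault_enable();

	return rc;
}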


Re: perf PPC: kernel panic with callchains and context switch events

2011-07-24 Thread Benjamin Herrenschmidt
On Sun, 2011-07-24 at 11:18 -0600, David Ahern wrote:
 On 07/20/2011 03:57 PM, David Ahern wrote:
  I am hoping someone familiar with PPC can help understand a panic that
  is generated when capturing callchains with context switch events.
  
  Call trace is below. The short of it is that walking the callchain
  generates a page fault. To handle the page fault the mmap_sem is needed,
  but it is currently held by setup_arg_pages. setup_arg_pages calls
  shift_arg_pages with the mmap_sem held. shift_arg_pages then calls
  move_page_tables which has a cond_resched at the top of its for loop. If
  the cond_resched() is removed from move_page_tables everything works
  beautifully - no panics.
  
  So, the question: is it normal for walking the stack to trigger a page
  fault on PPC? The panic is not seen on x86 based systems.
 
 Can anyone confirm whether page faults while walking the stack are
 normal for PPC? We really want to use the context switch event with
 callchains and need to understand whether this behavior is normal. Of
 course if it is normal, a way to address the problem without a panic
 will be needed.

Now that leads to interesting discoveries :-) Becky, can you read all
the way and let me know what you think ?

So, walking the user stack will potentially cause page
faults if it's done by direct access. So if you're going to do it in a
spot where you can't afford it, you need to pagefault_disable() I
suppose. I think the problem with our existing code is that it's missing
those around __get_user_inatomic().

In fact, arguably, we don't want the hash code modifying the hash
either (or even hashing things in). Our 64-bit code handles it today in
perf_callchain.c in a way that involves pretty much duplicating the
functionality of __get_user_pages_fast() as used by x86 (see below), but
as a fallback from a direct access which misses the pagefault_disable()
as well.

I think it comes from an old assumption that this would always be called
from an nmi, and the explicit tracepoints broke that assumption.

In fact we probably want to bump the NMI count, not just the IRQ count
as pagefault_disable() does, to make sure we prevent hashing. 

x86 does things differently, using __get_user_pages_fast() (a variant of
get_user_pages_fast() that doesn't fall back to normal get_user_pages()).

Now, we could do the same (use __gup_fast too), but I can see a
potential issue with ppc 32-bit platforms that have 64-bit PTEs, since
we could end up GUP'ing in the middle of the two accesses.

Becky: I think gup_fast is generally broken on 32-bit with 64-bit PTE
because of that, the problem isn't specific to perf backtraces, I'll
propose a solution further down.

Now, on x86, there is a similar problem with PAE, which is handled by

 - having gup disable IRQs
 - relying on the fact that to change from a valid value to another valid
   value, the PTE will first get invalidated, which requires an IPI
   and thus will be blocked by our interrupts being off

We do the first part, but the second part will break if we use HW TLB
invalidation broadcast (yet another reason why those are bad, I think I
will write a blog entry about it one of these days).

I think we can work around this while keeping our broadcast TLB
invalidations by having the invalidation code also increment a global
generation count (using the existing lock used by the invalidation code,
all 32-bit platforms have such a lock).

From there, gup_fast can be changed to, with proper ordering, check the
generation count around the loading of the PTE and loop if it has
changed, kind-of a seqlock.
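
To make that concrete, here is a minimal sketch of the idea (names like
tlb_inval_gen, note_tlb_invalidate and gup_read_pte are made up, the real
thing would sit inside the gup_fast PTE walk, and the usual kernel
atomic_t/pte_t/smp_rmb() definitions are assumed):

/* Bumped by the broadcast TLB invalidation path, under its existing lock. */
static atomic_t tlb_inval_gen = ATOMIC_INIT(0);

static inline void note_tlb_invalidate(void)
{
	atomic_inc(&tlb_inval_gen);
}

/*
 * Read a 64-bit PTE on a 32-bit platform without using a torn value:
 * sample the generation count, read the PTE (possibly as two 32-bit
 * loads), and retry if an invalidation happened in between -- kind of
 * a seqlock, as described above.
 */
static pte_t gup_read_pte(pte_t *ptep)
{
	pte_t pte;
	int gen;

	do {
		gen = atomic_read(&tlb_inval_gen);
		smp_rmb();	/* order the generation read vs. the PTE read */
		pte = *ptep;
		smp_rmb();	/* order the PTE read vs. the re-check */
	} while (gen != atomic_read(&tlb_inval_gen));

	return pte;
}

The invalidation code would call note_tlb_invalidate() under its lock, so
gup_fast never acts on a PTE value that a concurrent invalidate could have
torn under it.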

We also need the NMI count bump if we are going to try to keep the
attempt at doing a direct access first for perfs.

Becky, do you feel like giving that a shot or should I find another
victim ? (Or even do it myself ... ) :-)

Cheers,
Ben.

 Thanks,
 David
 
  
   [b0180e00]rb_erase+0x1b4/0x3e8
   [b00430f4]__dequeue_entity+0x50/0xe8
   [b0043304]set_next_entity+0x178/0x1bc
   [b0043440]pick_next_task_fair+0xb0/0x118
   [b02ada80]schedule+0x500/0x614
   [b02afaa8]rwsem_down_failed_common+0xf0/0x264
   [b02afca0]rwsem_down_read_failed+0x34/0x54
   [b02aed4c]down_read+0x3c/0x54
   [b0023b58]do_page_fault+0x114/0x5e8
   [b001e350]handle_page_fault+0xc/0x80
   [b0022dec]perf_callchain+0x224/0x31c
   [b009ba70]perf_prepare_sample+0x240/0x2fc
   [b009d760]__perf_event_overflow+0x280/0x398
   [b009d914]perf_swevent_overflow+0x9c/0x10c
   [b009db54]perf_swevent_ctx_event+0x1d0/0x230
   [b009dc38]do_perf_sw_event+0x84/0xe4
   [b009dde8]perf_sw_event_context_switch+0x150/0x1b4
   [b009de90]perf_event_task_sched_out+0x44/0x2d4
   [b02ad840]schedule+0x2c0/0x614
   [b0047dc0]__cond_resched+0x34/0x90
   [b02adcc8]_cond_resched+0x4c/0x68
   [b00bccf8]move_page_tables+0xb0/0x418
   [b00d7ee0]setup_arg_pages+0x184/0x2a0
   [b0110914]load_elf_binary+0x394/0x1208
   [b00d6e28]search_binary_handler+0xe0/0x2c4
   [b00d834c]do_execve+0x1bc/0x268