Re: Next July 29 : Hugetlb test failure (OOPS free_hugepte_range)

2009-08-05 Thread Sachin Sant

Benjamin Herrenschmidt wrote:

> Thanks. I'll have a look next week. I think when I changed the indices
> I may have forgotten to update something.

Ben,

I can recreate this issue with today's next.
Let me know if I can help in any way to fix this issue.

Thanks
-Sachin


: [ cut here ]
cpu 0x0: Vector: 700 (Program Check) at [c00038923560]
pc: c00486d4: .free_hugepte_range+0x68/0xa0
lr: c0048954: .hugetlb_free_pgd_range+0x248/0x38c
sp: c000389237e0
   msr: 80029032
  current = 0xc0003b1d7780
  paca= 0xc1002400
pid   = 2839, comm = readback
kernel BUG at /home/linux-2.6.31-rc4/arch/powerpc/include/asm/pgalloc.h:36!
enter ? for help
[c00038923880] c0048954 .hugetlb_free_pgd_range+0x248/0x38c
[c00038923970] c0165a48 .free_pgtables+0xa0/0x154
[c00038923a30] c0167f78 .exit_mmap+0x13c/0x1cc
[c00038923ae0] c00997ec .mmput+0x68/0x14c
[c00038923b70] c009f1d4 .exit_mm+0x190/0x1b8
[c00038923c20] c00a16e8 .do_exit+0x214/0x784
[c00038923d00] c00a1d1c .do_group_exit+0xc4/0xf8
[c00038923da0] c00a1d7c .SyS_exit_group+0x2c/0x48
[c00038923e30] c00085b4 syscall_exit+0x0/0x40
--- Exception: c01 (System Call) at 0fe15038
SP (ffb8e030) is in userspace
0:mon> e
cpu 0x0: Vector: 700 (Program Check) at [c00038923560]
pc: c00486d4: .free_hugepte_range+0x68/0xa0
lr: c0048954: .hugetlb_free_pgd_range+0x248/0x38c
sp: c000389237e0
   msr: 80029032
  current = 0xc0003b1d7780
  paca= 0xc1002400
pid   = 2839, comm = readback
kernel BUG at /home/linux-2.6.31-rc4/arch/powerpc/include/asm/pgalloc.h:36!
0:mon> r
R00 = 0001   R16 = 
R01 = c000389237e0   R17 = 0001
R02 = c0f165a8   R18 = 3fff
R03 = c14504d0   R19 = 
R04 = c00039390001   R20 = 
R05 = 0007   R21 = 0100
R06 =    R22 = 4000
R07 = 4000   R23 = c14504d0
R08 = c0003d708188   R24 = 3fff
R09 = c0003eb4   R25 = 0007
R10 = c0003d708188   R26 = c0003ebd41b8
R11 = 0018   R27 = c14504d0
R12 = 4448   R28 = c0003eb40018
R13 = c1002400   R29 = 0008
R14 =    R30 = 4000
R15 =    R31 = c000389237e0
pc  = c00486d4 .free_hugepte_range+0x68/0xa0
lr  = c0048954 .hugetlb_free_pgd_range+0x248/0x38c
msr = 80029032   cr  = 20042444
ctr = 8000b6f4   xer = 0001   trap =  700
0:mon>


Line 36 of arch/powerpc/include/asm/pgalloc.h corresponds to

BUG_ON(cachenum > PGF_CACHENUM_MASK);

Maybe this has something to do with the number of elements in
huge_pgtable_cache_name?
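
For readers following the thread, here is a minimal user-space sketch of the
check that fires at pgalloc.h:36. It assumes the comparison in the BUG_ON()
line is "greater than" and that the mask is 0x7, the value the header carried
before the fix. The names cachenum_overflows and OLD_PGF_CACHENUM_MASK are
hypothetical and are not kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/* Pre-fix mask value; assumed here to be 0x7, i.e. three low bits. */
#define OLD_PGF_CACHENUM_MASK 0x7

/* Three bits can represent cache numbers 0..7 and nothing larger. */
static_assert(OLD_PGF_CACHENUM_MASK == (1 << 3) - 1, "mask covers three bits");

/*
 * Hypothetical stand-in for the BUG_ON() condition: returns true when
 * the cache number cannot be represented under the given mask.
 * A negative cachenum converts to a huge unsigned value and is rejected.
 */
static inline bool cachenum_overflows(int cachenum, unsigned long mask)
{
	return (unsigned long)cachenum > mask;
}
```

With a three-bit mask, cache numbers 0 through 7 pass the check and 8 is the
first value that trips it.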

Thanks
-Sachin

--

-
Sachin Sant
IBM Linux Technology Center
India Systems and Technology Labs
Bangalore, India
-

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: Next July 29 : Hugetlb test failure (OOPS free_hugepte_range)

2009-08-05 Thread Kumar Gala


On Jul 29, 2009, at 10:04 AM, Sachin Sant wrote:

> While executing hugetlb tests against today's Next tree on
> a Power 6 box, I came across the following oops.

Out of interest, what tests are you running for hugetlb?

- k


Re: Next July 29 : Hugetlb test failure (OOPS free_hugepte_range)

2009-08-05 Thread Sachin Sant

Kumar Gala wrote:

> On Jul 29, 2009, at 10:04 AM, Sachin Sant wrote:
>
> > While executing hugetlb tests against today's Next tree on
> > a Power 6 box, I came across the following oops.
>
> Out of interest, what tests are you running for hugetlb?

The one maintained at http://libhugetlbfs.ozlabs.org/, which points
to the SourceForge libhugetlbfs project.

The latest release can be downloaded from SourceForge at
http://sourceforge.net/projects/libhugetlbfs/files/

I am using version 2.5.

Thanks
-Sachin





Re: Next July 29 : Hugetlb test failure (OOPS free_hugepte_range)

2009-08-05 Thread Benjamin Herrenschmidt
On Wed, 2009-08-05 at 16:13 +0530, Sachin Sant wrote:
> Benjamin Herrenschmidt wrote:
> > Thanks. I'll have a look next week. I think when I changed the indices
> > I may have forgotten to update something.
>
> Ben,
>
> I can recreate this issue with today's next.
> Let me know if I can help in any way to fix this issue.

Does this patch fix it?

[PATCH] powerpc/mm: Fix encoding of page table cache numbers

The mask used to encode the page table cache number in the
batch when freeing page tables was too small for the new
possible values of MMU page sizes. This increases it along
with a comment explaining the constraints.

Signed-off-by: Benjamin Herrenschmidt <b...@kernel.crashing.org>
---
 arch/powerpc/include/asm/pgalloc.h |    7 ++++++-
 1 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/include/asm/pgalloc.h b/arch/powerpc/include/asm/pgalloc.h
index 34b0806..f2e812d 100644
--- a/arch/powerpc/include/asm/pgalloc.h
+++ b/arch/powerpc/include/asm/pgalloc.h
@@ -28,7 +28,12 @@ typedef struct pgtable_free {
 	unsigned long val;
 } pgtable_free_t;
 
-#define PGF_CACHENUM_MASK	0x7
+/* This needs to be big enough to allow for MMU_PAGE_COUNT + 2 to be stored
+ * and small enough to fit in the low bits of any naturally aligned page
+ * table cache entry. Arbitrarily set to 0x1f, that should give us some
+ * room to grow
+ */
+#define PGF_CACHENUM_MASK	0x1f
 
 static inline pgtable_free_t pgtable_free_cache(void *p, int cachenum,
 						unsigned long mask)
-- 
1.6.0.4
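
The constraint described in the patch comment can be sketched in user space
as follows. This is an illustration only: pgf_pack, pgf_cachenum and
pgf_table are hypothetical helpers, and unlike the kernel's
pgtable_free_cache() they take no separate mask argument. The sketch shows
why the mask has to cover the largest cache number while staying within the
alignment bits of the page table pointer.

```c
#include <assert.h>
#include <stdint.h>

/* Mask value from the patch; the low five bits carry the cache number. */
#define PGF_CACHENUM_MASK 0x1fUL

/* The mask must have the form 2^n - 1 so that it selects only low bits. */
static_assert((PGF_CACHENUM_MASK & (PGF_CACHENUM_MASK + 1)) == 0,
	      "mask must be a contiguous run of low bits");

/*
 * Hypothetical helper: store cachenum in the low bits of a table pointer.
 * The pointer must be aligned to at least PGF_CACHENUM_MASK + 1 bytes,
 * otherwise real address bits would be discarded here.
 */
static inline uintptr_t pgf_pack(void *table, unsigned int cachenum)
{
	return ((uintptr_t)table & ~PGF_CACHENUM_MASK) | cachenum;
}

/* Recover the cache number from a packed value. */
static inline unsigned int pgf_cachenum(uintptr_t val)
{
	return (unsigned int)(val & PGF_CACHENUM_MASK);
}

/* Recover the table pointer from a packed value. */
static inline void *pgf_table(uintptr_t val)
{
	return (void *)(val & ~PGF_CACHENUM_MASK);
}
```

Packing cache number 8 into a 32-byte-aligned pointer and unpacking it
returns 8 again, whereas masking 8 with the old value 0x7 yields 0, so the
old mask could not have carried that cache number.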




Re: Next July 29 : Hugetlb test failure (OOPS free_hugepte_range)

2009-08-05 Thread Sachin Sant

Benjamin Herrenschmidt wrote:

> Does this patch fix it?
>
> [PATCH] powerpc/mm: Fix encoding of page table cache numbers
>
> The mask used to encode the page table cache number in the
> batch when freeing page tables was too small for the new
> possible values of MMU page sizes. This increases it along
> with a comment explaining the constraints.
>
> Signed-off-by: Benjamin Herrenschmidt <b...@kernel.crashing.org>
> ---

Yes, this patch fixed the issue for me. Thanks, Ben.

Tested-by: Sachin Sant <sach...@in.ibm.com>

Regards
-Sachin



Re: Next July 29 : Hugetlb test failure (OOPS free_hugepte_range)

2009-07-30 Thread Sachin Sant

Sachin Sant wrote:

> next-20090728 worked fine. Last commit that changed
> arch/powerpc/mm/hugetlbpage.c was
> cb7f3f2d92d1b26c13e30e639b6ee4a78e9a3afa
>
> powerpc: Add memory management headers for new 64-bit BookE
>
> I will try reverting that commit and check if that helps.

Hi Ben,

Reverting the above patch helped. The tests ran fine against the
patched kernel. But of course that's not the solution :-)

Here is some data from xmon that might help find the reason for
the failure. This is with today's next.

: [ cut here ]
cpu 0x0: Vector: 700 (Program Check) at [c00038923560]
   pc: c00486d4: .free_hugepte_range+0x68/0xa0
   lr: c0048954: .hugetlb_free_pgd_range+0x248/0x38c
   sp: c000389237e0
  msr: 80029032
 current = 0xc0003b1d7780
 paca= 0xc1002400
   pid   = 2839, comm = readback
kernel BUG at /home/linux-2.6.31-rc4/arch/powerpc/include/asm/pgalloc.h:36!
enter ? for help
[c00038923880] c0048954 .hugetlb_free_pgd_range+0x248/0x38c
[c00038923970] c0165a48 .free_pgtables+0xa0/0x154
[c00038923a30] c0167f78 .exit_mmap+0x13c/0x1cc
[c00038923ae0] c00997ec .mmput+0x68/0x14c
[c00038923b70] c009f1d4 .exit_mm+0x190/0x1b8
[c00038923c20] c00a16e8 .do_exit+0x214/0x784
[c00038923d00] c00a1d1c .do_group_exit+0xc4/0xf8
[c00038923da0] c00a1d7c .SyS_exit_group+0x2c/0x48
[c00038923e30] c00085b4 syscall_exit+0x0/0x40
--- Exception: c01 (System Call) at 0fe15038
SP (ffb8e030) is in userspace
0:mon> e
cpu 0x0: Vector: 700 (Program Check) at [c00038923560]
   pc: c00486d4: .free_hugepte_range+0x68/0xa0
   lr: c0048954: .hugetlb_free_pgd_range+0x248/0x38c
   sp: c000389237e0
  msr: 80029032
 current = 0xc0003b1d7780
 paca= 0xc1002400
   pid   = 2839, comm = readback
kernel BUG at /home/linux-2.6.31-rc4/arch/powerpc/include/asm/pgalloc.h:36!
0:mon> r
R00 = 0001   R16 = 
R01 = c000389237e0   R17 = 0001
R02 = c0f165a8   R18 = 3fff
R03 = c14504d0   R19 = 
R04 = c00039390001   R20 = 
R05 = 0007   R21 = 0100
R06 =    R22 = 4000
R07 = 4000   R23 = c14504d0
R08 = c0003d708188   R24 = 3fff
R09 = c0003eb4   R25 = 0007
R10 = c0003d708188   R26 = c0003ebd41b8
R11 = 0018   R27 = c14504d0
R12 = 4448   R28 = c0003eb40018
R13 = c1002400   R29 = 0008
R14 =    R30 = 4000
R15 =    R31 = c000389237e0
pc  = c00486d4 .free_hugepte_range+0x68/0xa0
lr  = c0048954 .hugetlb_free_pgd_range+0x248/0x38c
msr = 80029032   cr  = 20042444
ctr = 8000b6f4   xer = 0001   trap =  700
0:mon>


Line 36 of arch/powerpc/include/asm/pgalloc.h corresponds to

BUG_ON(cachenum > PGF_CACHENUM_MASK);

Maybe this has something to do with the number of elements in
huge_pgtable_cache_name?

Thanks
-Sachin



Re: Next July 29 : Hugetlb test failure (OOPS free_hugepte_range)

2009-07-30 Thread Benjamin Herrenschmidt
On Thu, 2009-07-30 at 17:55 +0530, Sachin Sant wrote:
> Sachin Sant wrote:
> > next-20090728 worked fine. Last commit that changed
> > arch/powerpc/mm/hugetlbpage.c was
> > cb7f3f2d92d1b26c13e30e639b6ee4a78e9a3afa
> >
> > powerpc: Add memory management headers for new 64-bit BookE
> >
> > I will try reverting that commit and check if that helps.
>
> Hi Ben,
>
> Reverting the above patch helped. The tests ran fine against the
> patched kernel. But of course that's not the solution :-)
>
> Here is some data from xmon that might help find the reason for
> the failure. This is with today's next.

Thanks. I'll have a look next week. I think when I changed the indices
I may have forgotten to update something.

Cheers,
Ben.



Next July 29 : Hugetlb test failure (OOPS free_hugepte_range)

2009-07-29 Thread Sachin Sant

While executing hugetlb tests against today's Next tree on
a Power 6 box, I came across the following oops.

------------[ cut here ]------------
Oops: Exception in kernel mode, sig: 5 [#1]
SMP NR_CPUS=1024 NUMA pSeries
Modules linked in: ipv6 fuse loop dm_mod ehea sg sd_mod crc_t10dif ibmvscsic scsi_transport_srp scsi_tgt scsi_mod
NIP: c003e794 LR: c003e9ec CTR: bba4
REGS: c0006a72b5d0 TRAP: 0700   Not tainted  (2.6.31-rc4-autotest-next-20090729-5-ppc64)
MSR: 80029032 <EE,ME,CE,IR,DR>  CR: 2204  XER: 0001
TASK = c00069c00180[1115] 'readback' THREAD: c0006a728000 CPU: 2
GPR00: 0001 c0006a72b850 c0a93190 c0006f6f04d0
GPR04: c0006b810001 0008  0400
GPR08: c0006ece0ca8 c0006a137ff8 c0006ece0ca8 0018
GPR12: 42000448 c0b72800  
GPR16: 477555d0  0001 03ff
GPR20: 0400  0100 0400
GPR24: c0006f6f04d0 03ff 0007 c0006cdc28d0
GPR28: 0400 03fff000 c0006a137ff8 0400
NIP [c003e794] .free_hugepte_range+0x44/0x68
LR [c003e9ec] .hugetlb_free_pgd_range+0x234/0x374
Call Trace:
[c0006a72b850] [175c08000393] 0x175c08000393 (unreliable)
[c0006a72b8c0] [c003e9ec] .hugetlb_free_pgd_range+0x234/0x374
[c0006a72b9b0] [c013742c] .free_pgtables+0x90/0x140
[c0006a72ba60] [c01393c4] .exit_mmap+0x12c/0x1b8
[c0006a72bb10] [c008d460] .mmput+0x54/0x14c
[c0006a72bba0] [c0092428] .exit_mm+0x17c/0x1a0
[c0006a72bc50] [c009481c] .do_exit+0x204/0x774
[c0006a72bd30] [c0094e40] .do_group_exit+0xb4/0xe8
[c0006a72bdc0] [c0094e88] .SyS_exit_group+0x14/0x28
[c0006a72be30] [c00085b4] syscall_exit+0x0/0x40
Instruction dump:
6881 780007e0 0b00 38a50001 3800 7ca507b4 f809 3801
2f850007 9003000c 7c101026 5400f7fe 0b00 78840724 7ca42378 4bff8ed5

next-20090728 worked fine. Last commit that changed
arch/powerpc/mm/hugetlbpage.c was cb7f3f2d92d1b26c13e30e639b6ee4a78e9a3afa

powerpc: Add memory management headers for new 64-bit BookE

I will try reverting that commit and check if that helps.

Thanks
-Sachin

