Re: Next July 29 : Hugetlb test failure (OOPS free_hugepte_range)

2009-08-05 Thread Sachin Sant

Benjamin Herrenschmidt wrote:
> Does this patch fix it?
>
> [PATCH] powerpc/mm: Fix encoding of page table cache numbers
>
> The mask used to encode the page table cache number in the
> batch when freeing page tables was too small for the new
> possible values of MMU page sizes. This increases it along
> with a comment explaining the constraints.
>
> Signed-off-by: Benjamin Herrenschmidt
> ---

Yes, this patch fixed the issue for me. Thanks, Ben.

Tested-by: Sachin Sant 

Regards
-Sachin

--

-
Sachin Sant
IBM Linux Technology Center
India Systems and Technology Labs
Bangalore, India
-

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: Next July 29 : Hugetlb test failure (OOPS free_hugepte_range)

2009-08-05 Thread Benjamin Herrenschmidt
On Wed, 2009-08-05 at 16:13 +0530, Sachin Sant wrote:
> Benjamin Herrenschmidt wrote:
> > Thanks. I'll have a look next week. I think when I changed the indices
> > I may have forgotten to update something.
> >   
> Ben,
> 
> I can recreate this issue with today's next.
> Let me know if I can help in any way to fix this issue.

Does this patch fix it?

[PATCH] powerpc/mm: Fix encoding of page table cache numbers

The mask used to encode the page table cache number in the
batch when freeing page tables was too small for the new
possible values of MMU page sizes. This increases it along
with a comment explaining the constraints.

Signed-off-by: Benjamin Herrenschmidt 
---
 arch/powerpc/include/asm/pgalloc.h |7 ++-
 1 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/include/asm/pgalloc.h b/arch/powerpc/include/asm/pgalloc.h
index 34b0806..f2e812d 100644
--- a/arch/powerpc/include/asm/pgalloc.h
+++ b/arch/powerpc/include/asm/pgalloc.h
@@ -28,7 +28,12 @@ typedef struct pgtable_free {
 	unsigned long val;
 } pgtable_free_t;
 
-#define PGF_CACHENUM_MASK  0x7
+/* This needs to be big enough to allow for MMU_PAGE_COUNT + 2 to be stored
+ * and small enough to fit in the low bits of any naturally aligned page
+ * table cache entry. Arbitrarily set to 0x1f, that should give us some
+ * room to grow
+ */
+#define PGF_CACHENUM_MASK  0x1f
 
 static inline pgtable_free_t pgtable_free_cache(void *p, int cachenum,
unsigned long mask)
-- 
1.6.0.4
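
For readers following the patch description, here is a minimal user-space sketch of the idea it refers to: a small cache number is stored in the low bits of a naturally aligned address, so the mask has to cover the largest index while still fitting in the bits that alignment leaves free. Only the value 0x1f comes from the patch; the helper names `pgf_pack`, `pgf_cachenum` and `pgf_addr` are hypothetical, and this is not the kernel's actual `pgtable_free_cache()` code.

```c
#include <assert.h>
#include <stdint.h>

/* Mask value taken from the patch; everything else is illustrative. */
#define PGF_CACHENUM_MASK 0x1fu

/* Hypothetical helper: store a small cache number in the low bits of an
 * aligned address. Both preconditions mirror the constraints named in
 * the patch comment. */
static uintptr_t pgf_pack(uintptr_t addr, unsigned int cachenum)
{
	assert(cachenum <= PGF_CACHENUM_MASK);    /* index must fit in the mask */
	assert((addr & PGF_CACHENUM_MASK) == 0);  /* low bits must be free */
	return addr | cachenum;
}

/* Hypothetical helper: recover the cache number from the packed value. */
static unsigned int pgf_cachenum(uintptr_t val)
{
	return (unsigned int)(val & PGF_CACHENUM_MASK);
}

/* Hypothetical helper: recover the original aligned address. */
static uintptr_t pgf_addr(uintptr_t val)
{
	return val & ~(uintptr_t)PGF_CACHENUM_MASK;
}
```

Packing an address aligned to at least 32 bytes with an index up to 31 and then unpacking it should return both values unchanged; a wider mask would demand stricter alignment, which is the trade-off the patch comment describes.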


> Thanks
> -Sachin
> 
> >> : [ cut here ]
> >> cpu 0x0: Vector: 700 (Program Check) at [c00038923560]
> >> pc: c00486d4: .free_hugepte_range+0x68/0xa0
> >> lr: c0048954: .hugetlb_free_pgd_range+0x248/0x38c
> >> sp: c000389237e0
> >>msr: 80029032
> >>   current = 0xc0003b1d7780
> >>   paca= 0xc1002400
> >> pid   = 2839, comm = readback
> >> kernel BUG at /home/linux-2.6.31-rc4/arch/powerpc/include/asm/pgalloc.h:36!
> >> enter ? for help
> >> [c00038923880] c0048954 .hugetlb_free_pgd_range+0x248/0x38c
> >> [c00038923970] c0165a48 .free_pgtables+0xa0/0x154
> >> [c00038923a30] c0167f78 .exit_mmap+0x13c/0x1cc
> >> [c00038923ae0] c00997ec .mmput+0x68/0x14c
> >> [c00038923b70] c009f1d4 .exit_mm+0x190/0x1b8
> >> [c00038923c20] c00a16e8 .do_exit+0x214/0x784
> >> [c00038923d00] c00a1d1c .do_group_exit+0xc4/0xf8
> >> [c00038923da0] c00a1d7c .SyS_exit_group+0x2c/0x48
> >> [c00038923e30] c00085b4 syscall_exit+0x0/0x40
> >> --- Exception: c01 (System Call) at 0fe15038
> >> SP (ffb8e030) is in userspace
> >> 0:mon> e
> >> cpu 0x0: Vector: 700 (Program Check) at [c00038923560]
> >> pc: c00486d4: .free_hugepte_range+0x68/0xa0
> >> lr: c0048954: .hugetlb_free_pgd_range+0x248/0x38c
> >> sp: c000389237e0
> >>msr: 80029032
> >>   current = 0xc0003b1d7780
> >>   paca= 0xc1002400
> >> pid   = 2839, comm = readback
> >> kernel BUG at /home/linux-2.6.31-rc4/arch/powerpc/include/asm/pgalloc.h:36!
> >> 0:mon> r
> >> R00 = 0001   R16 = 
> >> R01 = c000389237e0   R17 = 0001
> >> R02 = c0f165a8   R18 = 3fff
> >> R03 = c14504d0   R19 = 
> >> R04 = c00039390001   R20 = 
> >> R05 = 0007   R21 = 0100
> >> R06 =    R22 = 4000
> >> R07 = 4000   R23 = c14504d0
> >> R08 = c0003d708188   R24 = 3fff
> >> R09 = c0003eb4   R25 = 0007
> >> R10 = c0003d708188   R26 = c0003ebd41b8
> >> R11 = 0018   R27 = c14504d0
> >> R12 = 4448   R28 = c0003eb40018
> >> R13 = c1002400   R29 = 0008
> >> R14 =    R30 = 4000
> >> R15 =    R31 = c000389237e0
> >> pc  = c00486d4 .free_hugepte_range+0x68/0xa0
> >> lr  = c0048954 .hugetlb_free_pgd_range+0x248/0x38c
> >> msr = 80029032   cr  = 20042444
> >> ctr = 8000b6f4   xer = 0001   trap =  700
> >> 0:mon> 
> >>
> >> Line 36 of arch/powerpc/include/asm/pgalloc.h corresponds to
> >>
> >> BUG_ON(cachenum > PGF_CACHENUM_MASK);
> >>
> >> Maybe it has something to do with the number of elements in huge_pgtable_cache_name?
> >>
> >> Thanks
> >> -Sachin



Re: Next July 29 : Hugetlb test failure (OOPS free_hugepte_range)

2009-08-05 Thread Sachin Sant

Kumar Gala wrote:
> On Jul 29, 2009, at 10:04 AM, Sachin Sant wrote:
> > While executing hugetlb tests against today's Next tree on
> > a Power 6 box, I came across the following OOPS.
>
> Out of interest, what tests are you running for hugetlb?

The one maintained at http://libhugetlbfs.ozlabs.org/, which points
to the SourceForge libhugetlbfs project.

The latest release can be downloaded from SourceForge at
http://sourceforge.net/projects/libhugetlbfs/files/

I am using version 2.5.

Thanks
-Sachin



--

-
Sachin Sant
IBM Linux Technology Center
India Systems and Technology Labs
Bangalore, India
-



Re: Next July 29 : Hugetlb test failure (OOPS free_hugepte_range)

2009-08-05 Thread Kumar Gala


On Jul 29, 2009, at 10:04 AM, Sachin Sant wrote:
> While executing hugetlb tests against today's Next tree on
> a Power 6 box, I came across the following OOPS.

Out of interest, what tests are you running for hugetlb?

- k


Re: Next July 29 : Hugetlb test failure (OOPS free_hugepte_range)

2009-08-05 Thread Sachin Sant

Benjamin Herrenschmidt wrote:
> Thanks. I'll have a look next week. I think when I changed the indices
> I may have forgotten to update something.

Ben,

I can recreate this issue with today's next.
Let me know if I can help in any way to fix this issue.

Thanks
-Sachin


: [ cut here ]
cpu 0x0: Vector: 700 (Program Check) at [c00038923560]
pc: c00486d4: .free_hugepte_range+0x68/0xa0
lr: c0048954: .hugetlb_free_pgd_range+0x248/0x38c
sp: c000389237e0
   msr: 80029032
  current = 0xc0003b1d7780
  paca= 0xc1002400
pid   = 2839, comm = readback
kernel BUG at /home/linux-2.6.31-rc4/arch/powerpc/include/asm/pgalloc.h:36!
enter ? for help
[c00038923880] c0048954 .hugetlb_free_pgd_range+0x248/0x38c
[c00038923970] c0165a48 .free_pgtables+0xa0/0x154
[c00038923a30] c0167f78 .exit_mmap+0x13c/0x1cc
[c00038923ae0] c00997ec .mmput+0x68/0x14c
[c00038923b70] c009f1d4 .exit_mm+0x190/0x1b8
[c00038923c20] c00a16e8 .do_exit+0x214/0x784
[c00038923d00] c00a1d1c .do_group_exit+0xc4/0xf8
[c00038923da0] c00a1d7c .SyS_exit_group+0x2c/0x48
[c00038923e30] c00085b4 syscall_exit+0x0/0x40
--- Exception: c01 (System Call) at 0fe15038
SP (ffb8e030) is in userspace
0:mon> e
cpu 0x0: Vector: 700 (Program Check) at [c00038923560]
pc: c00486d4: .free_hugepte_range+0x68/0xa0
lr: c0048954: .hugetlb_free_pgd_range+0x248/0x38c
sp: c000389237e0
   msr: 80029032
  current = 0xc0003b1d7780
  paca= 0xc1002400
pid   = 2839, comm = readback
kernel BUG at /home/linux-2.6.31-rc4/arch/powerpc/include/asm/pgalloc.h:36!
0:mon> r
R00 = 0001   R16 = 
R01 = c000389237e0   R17 = 0001
R02 = c0f165a8   R18 = 3fff
R03 = c14504d0   R19 = 
R04 = c00039390001   R20 = 
R05 = 0007   R21 = 0100
R06 =    R22 = 4000
R07 = 4000   R23 = c14504d0
R08 = c0003d708188   R24 = 3fff
R09 = c0003eb4   R25 = 0007
R10 = c0003d708188   R26 = c0003ebd41b8
R11 = 0018   R27 = c14504d0
R12 = 4448   R28 = c0003eb40018
R13 = c1002400   R29 = 0008
R14 =    R30 = 4000
R15 =    R31 = c000389237e0
pc  = c00486d4 .free_hugepte_range+0x68/0xa0
lr  = c0048954 .hugetlb_free_pgd_range+0x248/0x38c
msr = 80029032   cr  = 20042444
ctr = 8000b6f4   xer = 0001   trap =  700
0:mon> 


Line 36 of arch/powerpc/include/asm/pgalloc.h corresponds to

BUG_ON(cachenum > PGF_CACHENUM_MASK);

Maybe it has something to do with the number of elements in huge_pgtable_cache_name?
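
To make the failing check concrete, here is a small user-space sketch of the comparison inside the `BUG_ON()` quoted above, evaluated against the old mask (0x7) and the new mask (0x1f) from the patch elsewhere in this thread. The function names and the sample index values are assumptions chosen for illustration; the thread does not state which cache number actually triggered the BUG.

```c
#include <assert.h>
#include <stdbool.h>

#define OLD_PGF_CACHENUM_MASK 0x7u   /* value removed by the patch */
#define NEW_PGF_CACHENUM_MASK 0x1fu  /* value added by the patch */

/* Mirrors the condition inside BUG_ON(): true means the check would fire. */
static bool cachenum_overflows(unsigned int cachenum, unsigned int mask)
{
	return cachenum > mask;
}

/* User-space stand-in for the kernel check: aborts instead of BUG(). */
static void check_cachenum(unsigned int cachenum, unsigned int mask)
{
	assert(!cachenum_overflows(cachenum, mask));
}
```

Under the old mask any index above 7 trips the check, while the new mask accepts indices up to 31, which is consistent with the crash appearing once the set of possible cache indices grew.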

Thanks
-Sachin

--

-
Sachin Sant
IBM Linux Technology Center
India Systems and Technology Labs
Bangalore, India
-



Re: Next July 29 : Hugetlb test failure (OOPS free_hugepte_range)

2009-07-30 Thread Benjamin Herrenschmidt
On Thu, 2009-07-30 at 17:55 +0530, Sachin Sant wrote:
> Sachin Sant wrote:
> > next-20090728 worked fine. Last commit that changed
> > arch/powerpc/mm/hugetlbpage.c was 
> > cb7f3f2d92d1b26c13e30e639b6ee4a78e9a3afa
> >
> > powerpc: Add memory management headers for new 64-bit BookE
> >
> > I will try reverting that commit and check if that helps.
> Hi Ben,
> 
> Reverting the above patch helped. The tests ran fine against the
> patched kernel. But of course that's not the solution :-)
> 
> Here is some data from xmon that might help find the reason for
> the failure. This is with today's next.

Thanks. I'll have a look next week. I think when I changed the indices
I may have forgotten to update something.

Cheers,
Ben.

> : [ cut here ]
> cpu 0x0: Vector: 700 (Program Check) at [c00038923560]
> pc: c00486d4: .free_hugepte_range+0x68/0xa0
> lr: c0048954: .hugetlb_free_pgd_range+0x248/0x38c
> sp: c000389237e0
>msr: 80029032
>   current = 0xc0003b1d7780
>   paca= 0xc1002400
> pid   = 2839, comm = readback
> kernel BUG at /home/linux-2.6.31-rc4/arch/powerpc/include/asm/pgalloc.h:36!
> enter ? for help
> [c00038923880] c0048954 .hugetlb_free_pgd_range+0x248/0x38c
> [c00038923970] c0165a48 .free_pgtables+0xa0/0x154
> [c00038923a30] c0167f78 .exit_mmap+0x13c/0x1cc
> [c00038923ae0] c00997ec .mmput+0x68/0x14c
> [c00038923b70] c009f1d4 .exit_mm+0x190/0x1b8
> [c00038923c20] c00a16e8 .do_exit+0x214/0x784
> [c00038923d00] c00a1d1c .do_group_exit+0xc4/0xf8
> [c00038923da0] c00a1d7c .SyS_exit_group+0x2c/0x48
> [c00038923e30] c00085b4 syscall_exit+0x0/0x40
> --- Exception: c01 (System Call) at 0fe15038
> SP (ffb8e030) is in userspace
> 0:mon> e
> cpu 0x0: Vector: 700 (Program Check) at [c00038923560]
> pc: c00486d4: .free_hugepte_range+0x68/0xa0
> lr: c0048954: .hugetlb_free_pgd_range+0x248/0x38c
> sp: c000389237e0
>msr: 80029032
>   current = 0xc0003b1d7780
>   paca= 0xc1002400
> pid   = 2839, comm = readback
> kernel BUG at /home/linux-2.6.31-rc4/arch/powerpc/include/asm/pgalloc.h:36!
> 0:mon> r
> R00 = 0001   R16 = 
> R01 = c000389237e0   R17 = 0001
> R02 = c0f165a8   R18 = 3fff
> R03 = c14504d0   R19 = 
> R04 = c00039390001   R20 = 
> R05 = 0007   R21 = 0100
> R06 =    R22 = 4000
> R07 = 4000   R23 = c14504d0
> R08 = c0003d708188   R24 = 3fff
> R09 = c0003eb4   R25 = 0007
> R10 = c0003d708188   R26 = c0003ebd41b8
> R11 = 0018   R27 = c14504d0
> R12 = 4448   R28 = c0003eb40018
> R13 = c1002400   R29 = 0008
> R14 =    R30 = 4000
> R15 =    R31 = c000389237e0
> pc  = c00486d4 .free_hugepte_range+0x68/0xa0
> lr  = c0048954 .hugetlb_free_pgd_range+0x248/0x38c
> msr = 80029032   cr  = 20042444
> ctr = 8000b6f4   xer = 0001   trap =  700
> 0:mon> 
> 
> Line 36 of arch/powerpc/include/asm/pgalloc.h corresponds to
> 
> BUG_ON(cachenum > PGF_CACHENUM_MASK);
> 
> Maybe it has something to do with the number of elements in huge_pgtable_cache_name?
> 
> Thanks
> -Sachin
> 



Re: Next July 29 : Hugetlb test failure (OOPS free_hugepte_range)

2009-07-30 Thread Sachin Sant

Sachin Sant wrote:
> next-20090728 worked fine. Last commit that changed
> arch/powerpc/mm/hugetlbpage.c was
> cb7f3f2d92d1b26c13e30e639b6ee4a78e9a3afa
>
> powerpc: Add memory management headers for new 64-bit BookE
>
> I will try reverting that commit and check if that helps.

Hi Ben,

Reverting the above patch helped. The tests ran fine against the
patched kernel. But of course that's not the solution :-)

Here is some data from xmon that might help find the reason for
the failure. This is with today's next.

: [ cut here ]
cpu 0x0: Vector: 700 (Program Check) at [c00038923560]
   pc: c00486d4: .free_hugepte_range+0x68/0xa0
   lr: c0048954: .hugetlb_free_pgd_range+0x248/0x38c
   sp: c000389237e0
  msr: 80029032
 current = 0xc0003b1d7780
 paca= 0xc1002400
   pid   = 2839, comm = readback
kernel BUG at /home/linux-2.6.31-rc4/arch/powerpc/include/asm/pgalloc.h:36!
enter ? for help
[c00038923880] c0048954 .hugetlb_free_pgd_range+0x248/0x38c
[c00038923970] c0165a48 .free_pgtables+0xa0/0x154
[c00038923a30] c0167f78 .exit_mmap+0x13c/0x1cc
[c00038923ae0] c00997ec .mmput+0x68/0x14c
[c00038923b70] c009f1d4 .exit_mm+0x190/0x1b8
[c00038923c20] c00a16e8 .do_exit+0x214/0x784
[c00038923d00] c00a1d1c .do_group_exit+0xc4/0xf8
[c00038923da0] c00a1d7c .SyS_exit_group+0x2c/0x48
[c00038923e30] c00085b4 syscall_exit+0x0/0x40
--- Exception: c01 (System Call) at 0fe15038
SP (ffb8e030) is in userspace
0:mon> e
cpu 0x0: Vector: 700 (Program Check) at [c00038923560]
   pc: c00486d4: .free_hugepte_range+0x68/0xa0
   lr: c0048954: .hugetlb_free_pgd_range+0x248/0x38c
   sp: c000389237e0
  msr: 80029032
 current = 0xc0003b1d7780
 paca= 0xc1002400
   pid   = 2839, comm = readback
kernel BUG at /home/linux-2.6.31-rc4/arch/powerpc/include/asm/pgalloc.h:36!
0:mon> r
R00 = 0001   R16 = 
R01 = c000389237e0   R17 = 0001
R02 = c0f165a8   R18 = 3fff
R03 = c14504d0   R19 = 
R04 = c00039390001   R20 = 
R05 = 0007   R21 = 0100
R06 =    R22 = 4000
R07 = 4000   R23 = c14504d0
R08 = c0003d708188   R24 = 3fff
R09 = c0003eb4   R25 = 0007
R10 = c0003d708188   R26 = c0003ebd41b8
R11 = 0018   R27 = c14504d0
R12 = 4448   R28 = c0003eb40018
R13 = c1002400   R29 = 0008
R14 =    R30 = 4000
R15 =    R31 = c000389237e0
pc  = c00486d4 .free_hugepte_range+0x68/0xa0
lr  = c0048954 .hugetlb_free_pgd_range+0x248/0x38c
msr = 80029032   cr  = 20042444
ctr = 8000b6f4   xer = 0001   trap =  700
0:mon> 


Line 36 of arch/powerpc/include/asm/pgalloc.h corresponds to

BUG_ON(cachenum > PGF_CACHENUM_MASK);

Maybe it has something to do with the number of elements in huge_pgtable_cache_name?

Thanks
-Sachin

--

-
Sachin Sant
IBM Linux Technology Center
India Systems and Technology Labs
Bangalore, India
-
