Re: Next July 29 : Hugetlb test failure (OOPS free_hugepte_range)
Benjamin Herrenschmidt wrote:
> Does this patch fix it?
>
> [PATCH] powerpc/mm: Fix encoding of page table cache numbers
>
> The mask used to encode the page table cache number in the batch
> when freeing page tables was too small for the new possible values
> of MMU page sizes. This increases it along with a comment explaining
> the constraints.
>
> Signed-off-by: Benjamin Herrenschmidt
> ---

Yes, this patch fixed the issue for me. Thanks Ben.

Tested-by: Sachin Sant

Regards
-Sachin

--
Sachin Sant
IBM Linux Technology Center
India Systems and Technology Labs
Bangalore, India

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: Next July 29 : Hugetlb test failure (OOPS free_hugepte_range)
On Wed, 2009-08-05 at 16:13 +0530, Sachin Sant wrote:
> Benjamin Herrenschmidt wrote:
> > Thanks. I'll have a look next week. I think when I changed the indices
> > I may have forgotten to update something.
> >
> Ben,
>
> I can recreate this issue with today's next.
> Let me know if I can help in any way to fix this issue.

Does this patch fix it?

[PATCH] powerpc/mm: Fix encoding of page table cache numbers

The mask used to encode the page table cache number in the batch
when freeing page tables was too small for the new possible values
of MMU page sizes. This increases it along with a comment explaining
the constraints.

Signed-off-by: Benjamin Herrenschmidt
---
 arch/powerpc/include/asm/pgalloc.h | 7 ++-
 1 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/include/asm/pgalloc.h b/arch/powerpc/include/asm/pgalloc.h
index 34b0806..f2e812d 100644
--- a/arch/powerpc/include/asm/pgalloc.h
+++ b/arch/powerpc/include/asm/pgalloc.h
@@ -28,7 +28,12 @@ typedef struct pgtable_free {
     unsigned long val;
 } pgtable_free_t;
 
-#define PGF_CACHENUM_MASK 0x7
+/* This needs to be big enough to allow for MMU_PAGE_COUNT + 2 to be stored
+ * and small enough to fit in the low bits of any naturally aligned page
+ * table cache entry. Arbitrarily set to 0x1f, that should give us some
+ * room to grow
+ */
+#define PGF_CACHENUM_MASK 0x1f
 
 static inline pgtable_free_t pgtable_free_cache(void *p, int cachenum,
                                                 unsigned long mask)
--
1.6.0.4

> Thanks
> -Sachin
>
> >> : [ cut here ]
> >> cpu 0x0: Vector: 700 (Program Check) at [c00038923560]
> >>     pc: c00486d4: .free_hugepte_range+0x68/0xa0
> >>     lr: c0048954: .hugetlb_free_pgd_range+0x248/0x38c
> >>     sp: c000389237e0
> >>    msr: 80029032
> >>   current = 0xc0003b1d7780
> >>   paca    = 0xc1002400
> >>     pid   = 2839, comm = readback
> >> kernel BUG at /home/linux-2.6.31-rc4/arch/powerpc/include/asm/pgalloc.h:36!
> >> enter ? for help
> >> [c00038923880] c0048954 .hugetlb_free_pgd_range+0x248/0x38c
> >> [c00038923970] c0165a48 .free_pgtables+0xa0/0x154
> >> [c00038923a30] c0167f78 .exit_mmap+0x13c/0x1cc
> >> [c00038923ae0] c00997ec .mmput+0x68/0x14c
> >> [c00038923b70] c009f1d4 .exit_mm+0x190/0x1b8
> >> [c00038923c20] c00a16e8 .do_exit+0x214/0x784
> >> [c00038923d00] c00a1d1c .do_group_exit+0xc4/0xf8
> >> [c00038923da0] c00a1d7c .SyS_exit_group+0x2c/0x48
> >> [c00038923e30] c00085b4 syscall_exit+0x0/0x40
> >> --- Exception: c01 (System Call) at 0fe15038
> >> SP (ffb8e030) is in userspace
> >> 0:mon> e
> >> cpu 0x0: Vector: 700 (Program Check) at [c00038923560]
> >>     pc: c00486d4: .free_hugepte_range+0x68/0xa0
> >>     lr: c0048954: .hugetlb_free_pgd_range+0x248/0x38c
> >>     sp: c000389237e0
> >>    msr: 80029032
> >>   current = 0xc0003b1d7780
> >>   paca    = 0xc1002400
> >>     pid   = 2839, comm = readback
> >> kernel BUG at /home/linux-2.6.31-rc4/arch/powerpc/include/asm/pgalloc.h:36!
> >> 0:mon> r
> >> R00 = 0001            R16 =
> >> R01 = c000389237e0    R17 = 0001
> >> R02 = c0f165a8        R18 = 3fff
> >> R03 = c14504d0        R19 =
> >> R04 = c00039390001    R20 =
> >> R05 = 0007            R21 = 0100
> >> R06 =                 R22 = 4000
> >> R07 = 4000            R23 = c14504d0
> >> R08 = c0003d708188    R24 = 3fff
> >> R09 = c0003eb4        R25 = 0007
> >> R10 = c0003d708188    R26 = c0003ebd41b8
> >> R11 = 0018            R27 = c14504d0
> >> R12 = 4448            R28 = c0003eb40018
> >> R13 = c1002400        R29 = 0008
> >> R14 =                 R30 = 4000
> >> R15 =                 R31 = c000389237e0
> >> pc  = c00486d4 .free_hugepte_range+0x68/0xa0
> >> lr  = c0048954 .hugetlb_free_pgd_range+0x248/0x38c
> >> msr = 80029032   cr  = 20042444
> >> ctr = 8000b6f4   xer = 0001   trap = 700
> >> 0:mon>
> >>
> >> Line 36 of arch/powerpc/include/asm/pgalloc.h corresponds to
> >>
> >> BUG_ON(cachenum > PGF_CACHENUM_MASK);
> >>
> >> Maybe something to do with the number of elements in
> >> huge_pgtable_cache_name?
> >>
> >> Thanks
> >> -Sachin
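The constraint spelled out in the patch comment can be illustrated with a small user-space sketch. This is not the kernel code: the names `OLD_CACHENUM_MASK`, `NEW_CACHENUM_MASK`, `tagged_ptr_t`, `can_encode`, `encode`, `decode_cachenum` and `decode_ptr` are hypothetical, and `assert` stands in for `BUG_ON`. It only shows the general technique of storing a small index in the low bits of an aligned pointer, and why an index larger than the mask cannot be stored. The index value 8 used in the usage note is an arbitrary example of a number that does not fit in 0x7.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the old and new PGF_CACHENUM_MASK values. */
#define OLD_CACHENUM_MASK 0x7UL
#define NEW_CACHENUM_MASK 0x1fUL

/* A pointer with a small cache index packed into its low bits. */
typedef struct {
    uintptr_t val;
} tagged_ptr_t;

/*
 * The index must fit in the mask, and the pointer must be aligned so
 * that the bits covered by the mask are all zero. Otherwise the index
 * and the address would overlap and neither could be recovered.
 */
static bool can_encode(const void *p, unsigned int cachenum, uintptr_t mask)
{
    return cachenum <= mask && ((uintptr_t)p & mask) == 0;
}

/* Pack the index into the low bits; assert() plays the role of BUG_ON(). */
static tagged_ptr_t encode(const void *p, unsigned int cachenum, uintptr_t mask)
{
    tagged_ptr_t t;

    assert(can_encode(p, cachenum, mask));
    t.val = (uintptr_t)p | cachenum;
    return t;
}

/* Recover the index from the low bits. */
static unsigned int decode_cachenum(tagged_ptr_t t, uintptr_t mask)
{
    return (unsigned int)(t.val & mask);
}

/* Recover the original pointer by clearing the low bits. */
static void *decode_ptr(tagged_ptr_t t, uintptr_t mask)
{
    return (void *)(t.val & ~mask);
}
```

With these helpers, `can_encode(p, 8, OLD_CACHENUM_MASK)` is false for any `p`, while `can_encode(p, 8, NEW_CACHENUM_MASK)` is true as long as `p` is at least 32-byte aligned. That is the trade-off the patch comment describes: a wider mask admits more cache numbers but demands more alignment from every page table cache entry.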
Re: Next July 29 : Hugetlb test failure (OOPS free_hugepte_range)
Kumar Gala wrote:
> On Jul 29, 2009, at 10:04 AM, Sachin Sant wrote:
> > While executing hugetlb tests against today's Next tree on a Power 6
> > box came across following OOPS.
>
> out of interest what tests are you running for hugetlb?

The one maintained at:
http://libhugetlbfs.ozlabs.org/
which points to the sourceforge libhugetlbfs project. The latest release
can be downloaded from sourceforge at
http://sourceforge.net/projects/libhugetlbfs/files/

I am using version 2.5.

Thanks
-Sachin

--
Sachin Sant
IBM Linux Technology Center
India Systems and Technology Labs
Bangalore, India
Re: Next July 29 : Hugetlb test failure (OOPS free_hugepte_range)
On Jul 29, 2009, at 10:04 AM, Sachin Sant wrote:
> While executing hugetlb tests against today's Next tree on a Power 6
> box came across following OOPS.

out of interest what tests are you running for hugetlb?

- k
Re: Next July 29 : Hugetlb test failure (OOPS free_hugepte_range)
Benjamin Herrenschmidt wrote:
> Thanks. I'll have a look next week. I think when I changed the indices
> I may have forgotten to update something.

Ben,

I can recreate this issue with today's next.
Let me know if I can help in any way to fix this issue.

Thanks
-Sachin

--
Sachin Sant
IBM Linux Technology Center
India Systems and Technology Labs
Bangalore, India
Re: Next July 29 : Hugetlb test failure (OOPS free_hugepte_range)
On Thu, 2009-07-30 at 17:55 +0530, Sachin Sant wrote:
> Sachin Sant wrote:
> > next-20090728 worked fine. Last commit that changed
> > arch/powerpc/mm/hugetlbpage.c was
> > cb7f3f2d92d1b26c13e30e639b6ee4a78e9a3afa
> >
> > powerpc: Add memory management headers for new 64-bit BookE
> >
> > I will try reverting that commit and check if that helps.
> Hi Ben,
>
> Reverting the above patch helped. The tests ran fine against the
> patched kernel. But of course that's not the solution :-)
>
> Here is some data from xmon that might help find the reason for
> the failure. This is with today's next.

Thanks. I'll have a look next week. I think when I changed the indices
I may have forgotten to update something.

Cheers,
Ben.
Re: Next July 29 : Hugetlb test failure (OOPS free_hugepte_range)
Sachin Sant wrote:
> next-20090728 worked fine. Last commit that changed
> arch/powerpc/mm/hugetlbpage.c was
> cb7f3f2d92d1b26c13e30e639b6ee4a78e9a3afa
>
> powerpc: Add memory management headers for new 64-bit BookE
>
> I will try reverting that commit and check if that helps.

Hi Ben,

Reverting the above patch helped. The tests ran fine against the
patched kernel. But of course that's not the solution :-)

Here is some data from xmon that might help find the reason for
the failure. This is with today's next.

: [ cut here ]
cpu 0x0: Vector: 700 (Program Check) at [c00038923560]
    pc: c00486d4: .free_hugepte_range+0x68/0xa0
    lr: c0048954: .hugetlb_free_pgd_range+0x248/0x38c
    sp: c000389237e0
   msr: 80029032
  current = 0xc0003b1d7780
  paca    = 0xc1002400
    pid   = 2839, comm = readback
kernel BUG at /home/linux-2.6.31-rc4/arch/powerpc/include/asm/pgalloc.h:36!
enter ? for help
[c00038923880] c0048954 .hugetlb_free_pgd_range+0x248/0x38c
[c00038923970] c0165a48 .free_pgtables+0xa0/0x154
[c00038923a30] c0167f78 .exit_mmap+0x13c/0x1cc
[c00038923ae0] c00997ec .mmput+0x68/0x14c
[c00038923b70] c009f1d4 .exit_mm+0x190/0x1b8
[c00038923c20] c00a16e8 .do_exit+0x214/0x784
[c00038923d00] c00a1d1c .do_group_exit+0xc4/0xf8
[c00038923da0] c00a1d7c .SyS_exit_group+0x2c/0x48
[c00038923e30] c00085b4 syscall_exit+0x0/0x40
--- Exception: c01 (System Call) at 0fe15038
SP (ffb8e030) is in userspace
0:mon> e
cpu 0x0: Vector: 700 (Program Check) at [c00038923560]
    pc: c00486d4: .free_hugepte_range+0x68/0xa0
    lr: c0048954: .hugetlb_free_pgd_range+0x248/0x38c
    sp: c000389237e0
   msr: 80029032
  current = 0xc0003b1d7780
  paca    = 0xc1002400
    pid   = 2839, comm = readback
kernel BUG at /home/linux-2.6.31-rc4/arch/powerpc/include/asm/pgalloc.h:36!
0:mon> r
R00 = 0001            R16 =
R01 = c000389237e0    R17 = 0001
R02 = c0f165a8        R18 = 3fff
R03 = c14504d0        R19 =
R04 = c00039390001    R20 =
R05 = 0007            R21 = 0100
R06 =                 R22 = 4000
R07 = 4000            R23 = c14504d0
R08 = c0003d708188    R24 = 3fff
R09 = c0003eb4        R25 = 0007
R10 = c0003d708188    R26 = c0003ebd41b8
R11 = 0018            R27 = c14504d0
R12 = 4448            R28 = c0003eb40018
R13 = c1002400        R29 = 0008
R14 =                 R30 = 4000
R15 =                 R31 = c000389237e0
pc  = c00486d4 .free_hugepte_range+0x68/0xa0
lr  = c0048954 .hugetlb_free_pgd_range+0x248/0x38c
msr = 80029032   cr  = 20042444
ctr = 8000b6f4   xer = 0001   trap = 700
0:mon>

Line 36 of arch/powerpc/include/asm/pgalloc.h corresponds to

BUG_ON(cachenum > PGF_CACHENUM_MASK);

Maybe something to do with the number of elements in
huge_pgtable_cache_name?

Thanks
-Sachin

--
Sachin Sant
IBM Linux Technology Center
India Systems and Technology Labs
Bangalore, India