Re: Next July 29 : Hugetlb test failure (OOPS free_hugepte_range)
Benjamin Herrenschmidt wrote:
> Thanks. I'll have a look next week. I think when I changed the indices
> I may have forgotten to update something.

Ben, I can recreate this issue with today's next. Let me know if I can help in any way to fix this issue.

Thanks
-Sachin

--
Sachin Sant
IBM Linux Technology Center
India Systems and Technology Labs
Bangalore, India

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: Next July 29 : Hugetlb test failure (OOPS free_hugepte_range)
On Jul 29, 2009, at 10:04 AM, Sachin Sant wrote:
> While executing hugetlb tests against today's Next tree on a Power 6 box,
> I came across the following OOPS.

Out of interest, what tests are you running for hugetlb?

- k
Re: Next July 29 : Hugetlb test failure (OOPS free_hugepte_range)
Kumar Gala wrote:
> On Jul 29, 2009, at 10:04 AM, Sachin Sant wrote:
> > While executing hugetlb tests against today's Next tree on a Power 6 box,
> > I came across the following OOPS.
>
> Out of interest, what tests are you running for hugetlb?

The test suite maintained at http://libhugetlbfs.ozlabs.org/, which points to the SourceForge libhugetlbfs project. The latest release can be downloaded from SourceForge at http://sourceforge.net/projects/libhugetlbfs/files/

I am using version 2.5.

Thanks
-Sachin
Re: Next July 29 : Hugetlb test failure (OOPS free_hugepte_range)
On Wed, 2009-08-05 at 16:13 +0530, Sachin Sant wrote:
> Benjamin Herrenschmidt wrote:
> > Thanks. I'll have a look next week. I think when I changed the indices
> > I may have forgotten to update something.
>
> Ben, I can recreate this issue with today's next. Let me know if I can
> help in any way to fix this issue.

Does this patch fix it?

[PATCH] powerpc/mm: Fix encoding of page table cache numbers

The mask used to encode the page table cache number in the batch when
freeing page tables was too small for the new possible values of MMU
page sizes. This increases it along with a comment explaining the
constraints.

Signed-off-by: Benjamin Herrenschmidt <b...@kernel.crashing.org>
---
 arch/powerpc/include/asm/pgalloc.h |    7 ++++++-
 1 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/include/asm/pgalloc.h b/arch/powerpc/include/asm/pgalloc.h
index 34b0806..f2e812d 100644
--- a/arch/powerpc/include/asm/pgalloc.h
+++ b/arch/powerpc/include/asm/pgalloc.h
@@ -28,7 +28,12 @@ typedef struct pgtable_free {
 	unsigned long val;
 } pgtable_free_t;
 
-#define PGF_CACHENUM_MASK	0x7
+/* This needs to be big enough to allow for MMU_PAGE_COUNT + 2 to be stored
+ * and small enough to fit in the low bits of any naturally aligned page
+ * table cache entry. Arbitrarily set to 0x1f, that should give us some
+ * room to grow
+ */
+#define PGF_CACHENUM_MASK	0x1f
 
 static inline pgtable_free_t pgtable_free_cache(void *p, int cachenum,
 						unsigned long mask)
--
1.6.0.4
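The comment added by the patch states two constraints on PGF_CACHENUM_MASK: it must be big enough to store MMU_PAGE_COUNT + 2, and small enough to fit in the low bits of any naturally aligned page table cache entry. The following is a minimal user-space sketch of checking both constraints, assuming the mask is a contiguous run of low bits and that the smallest page table cache object is aligned to a power of two. The page count (12) and smallest alignment (64 bytes) are hypothetical placeholders, not values taken from the kernel, and all names in the block are illustrative.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical placeholders: NOT the kernel's real values. */
#define EXAMPLE_MMU_PAGE_COUNT  12UL
#define EXAMPLE_MIN_TABLE_ALIGN 64UL   /* bytes, smallest table alignment */
#define FIXED_MASK              0x1fUL /* mirrors the patch's new value */

/* True if mask is a contiguous run of low bits (0x1, 0x3, 0x7, 0xf, ...). */
static bool is_low_bit_mask(unsigned long mask)
{
	return mask != 0 && (mask & (mask + 1)) == 0;
}

/*
 * Check the two constraints from the patch comment:
 *  - big enough: MMU_PAGE_COUNT + 2 must not exceed the mask;
 *  - small enough: no mask bit may reach the alignment of the smallest
 *    naturally aligned table, so those pointer bits are always zero.
 */
static bool cachenum_mask_ok(unsigned long mask, unsigned long page_count,
			     unsigned long min_align)
{
	/* min_align must be a power of two for the bit test below. */
	assert(min_align != 0 && (min_align & (min_align - 1)) == 0);

	if (!is_low_bit_mask(mask))
		return false;
	if (page_count + 2 > mask)
		return false;
	return (mask & ~(min_align - 1)) == 0;
}

/* Compile-time form of the same checks for the fixed mask value. */
_Static_assert(EXAMPLE_MMU_PAGE_COUNT + 2 <= FIXED_MASK, "mask too small");
_Static_assert((FIXED_MASK & ~(EXAMPLE_MIN_TABLE_ALIGN - 1)) == 0,
	       "mask too large for table alignment");
```

With these placeholder values, 0x7 fails the "big enough" test, 0x1f passes both, and 0x7f fails the "small enough" test because bit 6 would overlap a 64-byte-aligned pointer. Inside the kernel the compile-time form would more naturally use BUILD_BUG_ON; the sketch uses C11 _Static_assert so it builds in user space.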
Re: Next July 29 : Hugetlb test failure (OOPS free_hugepte_range)
Benjamin Herrenschmidt wrote:
> Does this patch fix it?
>
> [PATCH] powerpc/mm: Fix encoding of page table cache numbers
>
> The mask used to encode the page table cache number in the batch when
> freeing page tables was too small for the new possible values of MMU
> page sizes. This increases it along with a comment explaining the
> constraints.
>
> Signed-off-by: Benjamin Herrenschmidt <b...@kernel.crashing.org>
> ---

Yes, this patch fixed the issue for me. Thanks, Ben.

Tested-by: Sachin Sant <sach...@in.ibm.com>

Regards
-Sachin
Re: Next July 29 : Hugetlb test failure (OOPS free_hugepte_range)
Sachin Sant wrote:
> next-20090728 worked fine. Last commit that changed
> arch/powerpc/mm/hugetlbpage.c was cb7f3f2d92d1b26c13e30e639b6ee4a78e9a3afa
> powerpc: Add memory management headers for new 64-bit BookE
>
> I will try reverting that commit and check if that helps.

Hi Ben,

Reverting the above patch helped. The tests ran fine against the patched
kernel. But of course that's not the solution :-)

Here is some data from xmon that might help find the reason for the
failure. This is with today's next.

[ cut here ]
cpu 0x0: Vector: 700 (Program Check) at [c00038923560]
    pc: c00486d4: .free_hugepte_range+0x68/0xa0
    lr: c0048954: .hugetlb_free_pgd_range+0x248/0x38c
    sp: c000389237e0
   msr: 80029032
  current = 0xc0003b1d7780
  paca    = 0xc1002400
    pid   = 2839, comm = readback
kernel BUG at /home/linux-2.6.31-rc4/arch/powerpc/include/asm/pgalloc.h:36!
enter ? for help
[c00038923880] c0048954 .hugetlb_free_pgd_range+0x248/0x38c
[c00038923970] c0165a48 .free_pgtables+0xa0/0x154
[c00038923a30] c0167f78 .exit_mmap+0x13c/0x1cc
[c00038923ae0] c00997ec .mmput+0x68/0x14c
[c00038923b70] c009f1d4 .exit_mm+0x190/0x1b8
[c00038923c20] c00a16e8 .do_exit+0x214/0x784
[c00038923d00] c00a1d1c .do_group_exit+0xc4/0xf8
[c00038923da0] c00a1d7c .SyS_exit_group+0x2c/0x48
[c00038923e30] c00085b4 syscall_exit+0x0/0x40
--- Exception: c01 (System Call) at 0fe15038
SP (ffb8e030) is in userspace
0:mon> e
cpu 0x0: Vector: 700 (Program Check) at [c00038923560]
    pc: c00486d4: .free_hugepte_range+0x68/0xa0
    lr: c0048954: .hugetlb_free_pgd_range+0x248/0x38c
    sp: c000389237e0
   msr: 80029032
  current = 0xc0003b1d7780
  paca    = 0xc1002400
    pid   = 2839, comm = readback
kernel BUG at /home/linux-2.6.31-rc4/arch/powerpc/include/asm/pgalloc.h:36!
0:mon> r
R00 = 0001           R16 =
R01 = c000389237e0   R17 = 0001
R02 = c0f165a8       R18 = 3fff
R03 = c14504d0       R19 =
R04 = c00039390001   R20 =
R05 = 0007           R21 = 0100
R06 =                R22 = 4000
R07 = 4000           R23 = c14504d0
R08 = c0003d708188   R24 = 3fff
R09 = c0003eb4       R25 = 0007
R10 = c0003d708188   R26 = c0003ebd41b8
R11 = 0018           R27 = c14504d0
R12 = 4448           R28 = c0003eb40018
R13 = c1002400       R29 = 0008
R14 =                R30 = 4000
R15 =                R31 = c000389237e0
pc  = c00486d4 .free_hugepte_range+0x68/0xa0
lr  = c0048954 .hugetlb_free_pgd_range+0x248/0x38c
msr = 80029032   cr  = 20042444
ctr = 8000b6f4   xer = 0001   trap = 700
0:mon>

Line 36 of arch/powerpc/include/asm/pgalloc.h corresponds to

BUG_ON(cachenum > PGF_CACHENUM_MASK);

Maybe it has something to do with the number of elements in
huge_pgtable_cache_name?

Thanks
-Sachin
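The BUG_ON at pgalloc.h:36 guards a pointer-tagging trick: a small page table cache index is packed into the low bits of a page table pointer, which are zero because the table is naturally aligned (the patch later in this thread describes the mask as having to fit "in the low bits of any naturally aligned page table cache entry"). The following is a minimal user-space sketch of that idea, assuming a 3-bit field (mask 0x7, the value in use when the BUG fired). The type and function names (tagged_ptr, tag_encode, tag_encode_unchecked, tag_cachenum, tag_pointer) are hypothetical and are not the kernel's API; assert() stands in for BUG_ON.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for PGF_CACHENUM_MASK as it was before the fix. */
#define CACHENUM_MASK 0x7UL

/* Hypothetical stand-in for pgtable_free_t: pointer and index in one word. */
typedef struct {
	uintptr_t val;
} tagged_ptr;

/* The condition BUG_ON(cachenum > PGF_CACHENUM_MASK) rejects. */
static bool cachenum_fits(unsigned int cachenum)
{
	return cachenum <= CACHENUM_MASK;
}

/* Pack cachenum into the (zero) low bits of an aligned pointer.
 * assert() plays the role of the kernel's BUG_ON here. */
static tagged_ptr tag_encode(void *p, unsigned int cachenum)
{
	assert(cachenum_fits(cachenum));
	tagged_ptr t = { ((uintptr_t)p & ~(uintptr_t)CACHENUM_MASK) | cachenum };
	return t;
}

/* Same packing without the check, to show what an oversized index does. */
static tagged_ptr tag_encode_unchecked(void *p, unsigned int cachenum)
{
	tagged_ptr t = { ((uintptr_t)p & ~(uintptr_t)CACHENUM_MASK) | cachenum };
	return t;
}

/* Recover the index from the low bits. */
static unsigned int tag_cachenum(tagged_ptr t)
{
	return (unsigned int)(t.val & CACHENUM_MASK);
}

/* Recover the pointer by clearing the low bits. */
static void *tag_pointer(tagged_ptr t)
{
	return (void *)(t.val & ~(uintptr_t)CACHENUM_MASK);
}
```

For example, with a 64-byte-aligned buffer p from aligned_alloc(64, 64), tag_encode(p, 5) decodes back to index 5 and pointer p. Without the check, tag_encode_unchecked(p, 8) silently sets bit 3 of the pointer: it decodes as index 0 and a pointer 8 bytes past p, which is why trapping on an oversized index is preferable to encoding it.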
Re: Next July 29 : Hugetlb test failure (OOPS free_hugepte_range)
On Thu, 2009-07-30 at 17:55 +0530, Sachin Sant wrote:
> Sachin Sant wrote:
> > next-20090728 worked fine. Last commit that changed
> > arch/powerpc/mm/hugetlbpage.c was cb7f3f2d92d1b26c13e30e639b6ee4a78e9a3afa
> > powerpc: Add memory management headers for new 64-bit BookE
> >
> > I will try reverting that commit and check if that helps.
>
> Hi Ben,
>
> Reverting the above patch helped. The tests ran fine against the patched
> kernel. But of course that's not the solution :-)
>
> Here is some data from xmon that might help find the reason for the
> failure. This is with today's next.

Thanks. I'll have a look next week. I think when I changed the indices
I may have forgotten to update something.

Cheers,
Ben.
Next July 29 : Hugetlb test failure (OOPS free_hugepte_range)
While executing hugetlb tests against today's Next tree on a Power 6 box, I came across the following OOPS.

[ cut here ]
Oops: Exception in kernel mode, sig: 5 [#1]
SMP NR_CPUS=1024 NUMA pSeries
Modules linked in: ipv6 fuse loop dm_mod ehea sg sd_mod crc_t10dif ibmvscsic scsi_transport_srp scsi_tgt scsi_mod
NIP: c003e794 LR: c003e9ec CTR: bba4
REGS: c0006a72b5d0 TRAP: 0700   Not tainted  (2.6.31-rc4-autotest-next-20090729-5-ppc64)
MSR: 80029032 EE,ME,CE,IR,DR  CR: 2204  XER: 0001
TASK = c00069c00180[1115] 'readback' THREAD: c0006a728000 CPU: 2
GPR00: 0001 c0006a72b850 c0a93190 c0006f6f04d0
GPR04: c0006b810001 0008 0400
GPR08: c0006ece0ca8 c0006a137ff8 c0006ece0ca8 0018
GPR12: 42000448 c0b72800
GPR16: 477555d0 0001 03ff
GPR20: 0400 0100 0400
GPR24: c0006f6f04d0 03ff 0007 c0006cdc28d0
GPR28: 0400 03fff000 c0006a137ff8 0400
NIP [c003e794] .free_hugepte_range+0x44/0x68
LR [c003e9ec] .hugetlb_free_pgd_range+0x234/0x374
Call Trace:
[c0006a72b850] [175c08000393] 0x175c08000393 (unreliable)
[c0006a72b8c0] [c003e9ec] .hugetlb_free_pgd_range+0x234/0x374
[c0006a72b9b0] [c013742c] .free_pgtables+0x90/0x140
[c0006a72ba60] [c01393c4] .exit_mmap+0x12c/0x1b8
[c0006a72bb10] [c008d460] .mmput+0x54/0x14c
[c0006a72bba0] [c0092428] .exit_mm+0x17c/0x1a0
[c0006a72bc50] [c009481c] .do_exit+0x204/0x774
[c0006a72bd30] [c0094e40] .do_group_exit+0xb4/0xe8
[c0006a72bdc0] [c0094e88] .SyS_exit_group+0x14/0x28
[c0006a72be30] [c00085b4] syscall_exit+0x0/0x40
Instruction dump:
6881 780007e0 0b00 38a50001 3800 7ca507b4 f809 3801
2f850007 9003000c 7c101026 5400f7fe 0b00 78840724 7ca42378 4bff8ed5

next-20090728 worked fine. Last commit that changed arch/powerpc/mm/hugetlbpage.c was
cb7f3f2d92d1b26c13e30e639b6ee4a78e9a3afa powerpc: Add memory management headers for new 64-bit BookE

I will try reverting that commit and check if that helps.
Thanks
-Sachin