[PATCH] powerpc/pci: cleanup on duplicate assignment
While creating the PCI root bus through pci_create_root_bus(), the PCI
core already assigns the secondary bus number for the newly created root
bus, so we need not repeat the assignment in pcibios_scan_phb().

Signed-off-by: Gavin Shan <sha...@linux.vnet.ibm.com>
---
 arch/powerpc/kernel/pci-common.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
index 8e78e93..0f75bd5 100644
--- a/arch/powerpc/kernel/pci-common.c
+++ b/arch/powerpc/kernel/pci-common.c
@@ -1646,7 +1646,6 @@ void __devinit pcibios_scan_phb(struct pci_controller *hose)
 		pci_free_resource_list(&resources);
 		return;
 	}
-	bus->secondary = hose->first_busno;
 	hose->bus = bus;
 
 	/* Get probe mode and perform scan */
--
1.7.9.5
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH] powerpc: Use enhanced touch instructions in POWER7 copy_to_user/copy_from_user
Version 2.06 of the POWER ISA introduced enhanced touch instructions,
allowing us to specify a number of attributes, including the length of a
stream.

This patch adds a software stream for both loads and stores in the
POWER7 copy_tofrom_user loop. Since the setup is quite complicated and
we have to use an eieio to ensure correct ordering of the "GO" command,
we only do this for copies above 4kB.

To quantify any performance improvements we need a working set bigger
than the caches, so we operate on a 1GB file:

# dd if=/dev/zero of=/tmp/foo bs=1M count=1024

And we compare how fast we can read the file:

# dd if=/tmp/foo of=/dev/null bs=1M

before: 7.7 GB/s
after:  9.6 GB/s

A 25% improvement.

The worst case for this patch will be a completely L1 cache contained
copy of just over 4kB. We can test this with the copy_to_user testcase
we used to tune copy_tofrom_user originally:

http://ozlabs.org/~anton/junkcode/copy_to_user.c

# time ./copy_to_user2 -l 4224 -i 1000

before: 6.807 s
after:  6.946 s

A 2% slowdown, which seems reasonable considering our data is unlikely
to be completely L1 contained.

Signed-off-by: Anton Blanchard <an...@samba.org>
---

v2: Use cr1 in the comparison so we don't corrupt the compare/branch to
select vmx vs non vmx loops.

Index: linux-build/arch/powerpc/lib/copyuser_power7.S
===
--- linux-build.orig/arch/powerpc/lib/copyuser_power7.S	2012-05-29 21:22:40.445551834 +1000
+++ linux-build/arch/powerpc/lib/copyuser_power7.S	2012-05-31 15:28:35.336354208 +1000
@@ -298,6 +298,37 @@ err1;	stb	r0,0(r3)
 
 	ld	r5,STACKFRAMESIZE+64(r1)
 	mtlr	r0
 
+	/*
+	 * We prefetch both the source and destination using enhanced touch
+	 * instructions. We use a stream ID of 0 for the load side and
+	 * 1 for the store side.
+	 */
+	clrrdi	r6,r4,7
+	clrrdi	r9,r3,7
+	ori	r9,r9,1		/* stream=1 */
+
+	srdi	r7,r5,7		/* length in cachelines, capped at 0x3FF */
+	cmpldi	cr1,r7,0x3FF
+	ble	cr1,1f
+	li	r7,0x3FF
+1:	lis	r0,0x0E00	/* depth=7 */
+	sldi	r7,r7,7
+	or	r7,r7,r0
+	ori	r10,r7,1	/* stream=1 */
+
+	lis	r8,0x8000	/* GO=1 */
+	clrldi	r8,r8,32
+
+.machine push
+.machine "power4"
+	dcbt	r0,r6,0b01000
+	dcbt	r0,r7,0b01010
+	dcbtst	r0,r9,0b01000
+	dcbtst	r0,r10,0b01010
+	eieio
+	dcbt	r0,r8,0b01010	/* GO */
+.machine pop
+
 	beq	.Lunwind_stack_nonvmx_copy
 
 	/*
[PATCH] powerpc: POWER7 optimised memcpy using VMX and enhanced prefetch
Implement a POWER7 optimised memcpy using VMX and enhanced prefetch
instructions. This is a copy of the POWER7 optimised
copy_to_user/copy_from_user loop. Detailed implementation and
performance details can be found in commit a66086b8197d (powerpc:
POWER7 optimised copy_to_user/copy_from_user using VMX).

I noticed memcpy issues when profiling a RAID6 workload:

	.memcpy
	.async_memcpy
	.async_copy_data
	.__raid_run_ops
	.handle_stripe
	.raid5d
	.md_thread

I created a simplified testcase by building a RAID6 array with 4 1GB
ramdisks (booting with brd.rd_size=1048576):

# mdadm -CR -e 1.2 /dev/md0 --level=6 -n4 /dev/ram[0-3]

I then timed how long it took to write to the entire array:

# dd if=/dev/zero of=/dev/md0 bs=1M

Before: 892 MB/s
After:  999 MB/s

A 12% improvement.

Signed-off-by: Anton Blanchard <an...@samba.org>
---

Index: linux-build/arch/powerpc/lib/Makefile
===
--- linux-build.orig/arch/powerpc/lib/Makefile	2012-05-30 15:27:30.0 +1000
+++ linux-build/arch/powerpc/lib/Makefile	2012-05-31 09:12:27.574372864 +1000
@@ -17,7 +17,8 @@ obj-$(CONFIG_HAS_IOMEM)	+= devres.o
 obj-$(CONFIG_PPC64)	+= copypage_64.o copyuser_64.o \
 			   memcpy_64.o usercopy_64.o mem_64.o string.o \
 			   checksum_wrappers_64.o hweight_64.o \
-			   copyuser_power7.o string_64.o copypage_power7.o
+			   copyuser_power7.o string_64.o copypage_power7.o \
+			   memcpy_power7.o
 obj-$(CONFIG_XMON)	+= sstep.o ldstfp.o
 obj-$(CONFIG_KPROBES)	+= sstep.o ldstfp.o
 obj-$(CONFIG_HAVE_HW_BREAKPOINT)	+= sstep.o ldstfp.o
Index: linux-build/arch/powerpc/lib/memcpy_64.S
===
--- linux-build.orig/arch/powerpc/lib/memcpy_64.S	2012-05-30 09:39:59.0 +1000
+++ linux-build/arch/powerpc/lib/memcpy_64.S	2012-05-31 09:12:00.093876936 +1000
@@ -11,7 +11,11 @@
 	.align	7
 _GLOBAL(memcpy)
+BEGIN_FTR_SECTION
 	std	r3,48(r1)	/* save destination pointer for return value */
+FTR_SECTION_ELSE
+	b	memcpy_power7
+ALT_FTR_SECTION_END_IFCLR(CPU_FTR_VMX_COPY)
 	PPC_MTOCRF(0x01,r5)
 	cmpldi	cr1,r5,16
 	neg	r6,r3		# LS 3 bits = # bytes to 8-byte dest bdry
Index: linux-build/arch/powerpc/lib/memcpy_power7.S
===
--- /dev/null	1970-01-01 00:00:00.0 +0000
+++ linux-build/arch/powerpc/lib/memcpy_power7.S	2012-05-31 15:28:03.495781127 +1000
@@ -0,0 +1,650 @@
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ * Copyright (C) IBM Corporation, 2012
+ *
+ * Author: Anton Blanchard <an...@au.ibm.com>
+ */
+#include <asm/ppc_asm.h>
+
+#define STACKFRAMESIZE	256
+#define STK_REG(i)	(112 + ((i)-14)*8)
+
+_GLOBAL(memcpy_power7)
+#ifdef CONFIG_ALTIVEC
+	cmpldi	r5,16
+	cmpldi	cr1,r5,4096
+
+	std	r3,48(r1)
+
+	blt	.Lshort_copy
+	bgt	cr1,.Lvmx_copy
+#else
+	cmpldi	r5,16
+
+	std	r3,48(r1)
+
+	blt	.Lshort_copy
+#endif
+
+.Lnonvmx_copy:
+	/* Get the source 8B aligned */
+	neg	r6,r4
+	mtocrf	0x01,r6
+	clrldi	r6,r6,(64-3)
+
+	bf	cr7*4+3,1f
+	lbz	r0,0(r4)
+	addi	r4,r4,1
+	stb	r0,0(r3)
+	addi	r3,r3,1
+
+1:	bf	cr7*4+2,2f
+	lhz	r0,0(r4)
+	addi	r4,r4,2
+	sth	r0,0(r3)
+	addi	r3,r3,2
+
+2:	bf	cr7*4+1,3f
+	lwz	r0,0(r4)
+	addi	r4,r4,4
+	stw	r0,0(r3)
+	addi	r3,r3,4
+
+3:	sub	r5,r5,r6
+	cmpldi	r5,128
+	blt	5f
+
+	mflr	r0
+	stdu	r1,-STACKFRAMESIZE(r1)
+	std	r14,STK_REG(r14)(r1)
+	std	r15,STK_REG(r15)(r1)
+	std	r16,STK_REG(r16)(r1)
+	std	r17,STK_REG(r17)(r1)
+	std	r18,STK_REG(r18)(r1)
+	std	r19,STK_REG(r19)(r1)
+	std	r20,STK_REG(r20)(r1)
+	std	r21,STK_REG(r21)(r1)
+	std	r22,STK_REG(r22)(r1)
+	std	r0,STACKFRAMESIZE+16(r1)
+
+	srdi	r6,r5,7
+	mtctr	r6
RE: kernel panic during kernel module load (powerpc specific part)
Michael,

> On Wed, 2012-05-30 at 16:33 +0200, Steffen Rumler wrote:
>> I've found the following root cause:
>> [...]
>> (6) Unfortunately, the trampoline code (do_plt_call()) is using
>>     register r11 to set up the jump. It looks like the prologue and
>>     epilogue are also using register r11, in order to point to the
>>     previous stack frame. This is a conflict !!! The trampoline code
>>     is damaging the content of r11.
>
> Hi Steffen,
>
> Great bug report!
>
> I can't quite work out what the standards say, the versions I'm looking
> at are probably old anyway.

The ABI supplement from https://www.power.org/resources/downloads/ is
explicit about r11 being a requirement for the statically linked
save/restore functions in section 3.3.4 on page 59. This means that any
trampoline code must not ever use r11, or it can't be used to reach such
save/restore functions safely from far away.

Unfortunately, the same doc and its predecessors show r11 in all basic
examples for PLT/trampoline code AFAICS, which is likely why all
trampoline code uses r11 in any known case.

I would guess that it was never envisioned that compiler generated code
would be in a different section than the save/restore functions, i.e.,
the Linux module __init assumptions for Power break the ABI. Or does the
ABI break the __init concept?!

Using r12 in the trampoline seems to be the obvious solution for module
loading. But what about other code loading? If, e.g., a user runs any
app from bash, it gets loaded and relocated and trampolines might get
set up somehow. Wouldn't we have to find and fix ANY trampoline code
generator remotely related to a basic Power Architecture Linux? Or is it
a basic assumption for anything but modules that compiler generated code
may never ever be outside the .text section? I am not sure that would be
a safe assumption.

Isn't this problem going beyond just module loading for Power
Architecture Linux?

Regards,
Heinz
[PATCH 1/2] edac: Use ccsr_pci structure instead of hardcoded define
There are some differences in register offset and definition between the
PCI and PCIe error management registers, while some other PCI/PCIe error
management registers are nearly the same. To merge the PCI and PCIe edac
code into one, it is easier to use the ccsr_pci structure than the
hardcoded defines. So remove the hardcoded defines and add the PCI/PCIe
error management registers to the ccsr_pci structure.

Signed-off-by: Chunhe Lan <chunhe@freescale.com>
Signed-off-by: Kumar Gala <ga...@kernel.crashing.org>
Cc: Grant Likely <grant.lik...@secretlab.ca>
Cc: Doug Thompson <dougthomp...@xmission.com>
---
 arch/powerpc/sysdev/fsl_pci.h | 46 +---
 drivers/edac/mpc85xx_edac.h   | 13 +-
 2 files changed, 40 insertions(+), 19 deletions(-)

diff --git a/arch/powerpc/sysdev/fsl_pci.h b/arch/powerpc/sysdev/fsl_pci.h
index a39ed5c..5378a47 100644
--- a/arch/powerpc/sysdev/fsl_pci.h
+++ b/arch/powerpc/sysdev/fsl_pci.h
@@ -1,7 +1,7 @@
 /*
  * MPC85xx/86xx PCI Express structure define
  *
- * Copyright 2007,2011 Freescale Semiconductor, Inc
+ * Copyright 2007,2011,2012 Freescale Semiconductor, Inc
  *
  * This program is free software; you can redistribute it and/or modify it
  * under the terms of the GNU General Public License as published by the
@@ -14,6 +14,8 @@
 #ifndef __POWERPC_FSL_PCI_H
 #define __POWERPC_FSL_PCI_H
 
+#include <asm/pci-bridge.h>
+
 #define PCIE_LTSSM	0x0404		/* PCIE Link Training and Status */
 #define PCIE_LTSSM_L0	0x16		/* L0 state */
 #define PIWAR_EN	0x80000000	/* Enable */
@@ -74,13 +76,41 @@ struct ccsr_pci {
 	 */
 	struct pci_inbound_window_regs piw[4];
 
-	__be32	pex_err_dr;		/* 0x.e00 - PCI/PCIE error detect register */
-	u8	res21[4];
-	__be32	pex_err_en;		/* 0x.e08 - PCI/PCIE error interrupt enable register */
-	u8	res22[4];
-	__be32	pex_err_disr;		/* 0x.e10 - PCI/PCIE error disable register */
-	u8	res23[12];
-	__be32	pex_err_cap_stat;	/* 0x.e20 - PCI/PCIE error capture status register */
+/* Merge PCI Express/PCI error management registers */
+	__be32	pex_err_dr;		/* 0x.e00
+					 * - PCI/PCIE error detect register
+					 */
+	__be32	pex_err_cap_dr;		/* 0x.e04
+					 * - PCI error capture disabled register
+					 * - PCIE has no this register
+					 */
+	__be32	pex_err_en;		/* 0x.e08
+					 * - PCI/PCIE error interrupt enable register
+					 */
+	__be32	pex_err_attrib;		/* 0x.e0c
+					 * - PCI error attributes capture register
+					 * - PCIE has no this register
+					 */
+	__be32	pex_err_disr;		/* 0x.e10
+					 * - PCI error address capture register
+					 * - PCIE error disable register
+					 */
+	__be32	pex_err_ext_addr;	/* 0x.e14
+					 * - PCI error extended addr capture register
+					 * - PCIE has no this register
+					 */
+	__be32	pex_err_dl;		/* 0x.e18
+					 * - PCI error data low capture register
+					 * - PCIE has no this register
+					 */
+	__be32	pex_err_dh;		/* 0x.e1c
+					 * - PCI error data high capture register
+					 * - PCIE has no this register
+					 */
+	__be32	pex_err_cap_stat;	/* 0x.e20
+					 * - PCI gasket timer register
+					 * - PCIE error capture status register
+					 */
 	u8	res24[4];
 	__be32	pex_err_cap_r0;		/* 0x.e28 - PCIE error capture register 0 */
 	__be32	pex_err_cap_r1;		/* 0x.e2c - PCIE error capture register 0 */
diff --git a/drivers/edac/mpc85xx_edac.h b/drivers/edac/mpc85xx_edac.h
index 932016f..8ba4152 100644
--- a/drivers/edac/mpc85xx_edac.h
+++ b/drivers/edac/mpc85xx_edac.h
@@ -1,5 +1,7 @@
 /*
  * Freescale MPC85xx Memory Controller kenel module
+ * Copyright (c) 2012 Freescale Semiconductor, Inc.
+ *
  * Author: Dave Jiang <dji...@mvista.com>
  *
  * 2006-2007 (c) MontaVista Software, Inc. This file is licensed under
@@ -131,17 +133,6 @@
 #define PCI_EDE_PERR_MASK	(PCI_EDE_TGT_PERR | PCI_EDE_MST_PERR | \
				 PCI_EDE_ADDR_PERR)
 
-#define MPC85XX_PCI_ERR_DR		0x0000
-#define MPC85XX_PCI_ERR_CAP_DR		0x0004
-#define MPC85XX_PCI_ERR_EN		0x0008
-#define MPC85XX_PCI_ERR_ATTRIB
[PATCH 2/2] edac/85xx: PCI/PCIe error interrupt edac support
Add PCIe error interrupt edac support for mpc85xx and p4080.

mpc85xx uses the legacy interrupt reporting mechanism - the error
interrupts are reported directly to the mpic. p4080, on the other hand,
attaches most of the error interrupts to interrupt 0 and reports them to
the mpic via interrupt 0. This patch can handle both of them.

Signed-off-by: Chunhe Lan <chunhe@freescale.com>
Signed-off-by: Kumar Gala <ga...@kernel.crashing.org>
Cc: Grant Likely <grant.lik...@secretlab.ca>
Cc: Doug Thompson <dougthomp...@xmission.com>
---
 drivers/edac/mpc85xx_edac.c | 236 +--
 drivers/edac/mpc85xx_edac.h |   9 ++-
 2 files changed, 191 insertions(+), 54 deletions(-)

diff --git a/drivers/edac/mpc85xx_edac.c b/drivers/edac/mpc85xx_edac.c
index 73464a6..35eef79 100644
--- a/drivers/edac/mpc85xx_edac.c
+++ b/drivers/edac/mpc85xx_edac.c
@@ -1,5 +1,6 @@
 /*
  * Freescale MPC85xx Memory Controller kenel module
+ * Copyright (c) 2012 Freescale Semiconductor, Inc.
  *
  * Author: Dave Jiang <dji...@mvista.com>
  *
@@ -21,6 +22,7 @@
 #include <linux/of_platform.h>
 #include <linux/of_device.h>
+#include <sysdev/fsl_pci.h>
 #include "edac_module.h"
 #include "edac_core.h"
 #include "mpc85xx_edac.h"
@@ -37,11 +39,6 @@ static u32 orig_ddr_err_sbe;
 /*
  * PCI Err defines
  */
-#ifdef CONFIG_PCI
-static u32 orig_pci_err_cap_dr;
-static u32 orig_pci_err_en;
-#endif
-
 static u32 orig_l2_err_disable;
 #ifdef CONFIG_FSL_SOC_BOOKE
 static u32 orig_hid1[2];
@@ -151,37 +148,52 @@ static void mpc85xx_pci_check(struct edac_pci_ctl_info *pci)
 {
 	struct mpc85xx_pci_pdata *pdata = pci->pvt_info;
 	u32 err_detect;
+	struct ccsr_pci *reg = pdata->pci_reg;
+
+	err_detect = in_be32(&pdata->pci_reg->pex_err_dr);
+
+	if (pdata->pcie_flag) {
+		printk(KERN_ERR "PCIE error(s) detected\n");
+		printk(KERN_ERR "PCIE ERR_DR register: 0x%08x\n", err_detect);
+		printk(KERN_ERR "PCIE ERR_CAP_STAT register: 0x%08x\n",
+				in_be32(&reg->pex_err_cap_stat));
+		printk(KERN_ERR "PCIE ERR_CAP_R0 register: 0x%08x\n",
+				in_be32(&reg->pex_err_cap_r0));
+		printk(KERN_ERR "PCIE ERR_CAP_R1 register: 0x%08x\n",
+				in_be32(&reg->pex_err_cap_r1));
+		printk(KERN_ERR "PCIE ERR_CAP_R2 register: 0x%08x\n",
+				in_be32(&reg->pex_err_cap_r2));
+		printk(KERN_ERR "PCIE ERR_CAP_R3 register: 0x%08x\n",
+				in_be32(&reg->pex_err_cap_r3));
+	} else {
+		/* master aborts can happen during PCI config cycles */
+		if (!(err_detect & ~(PCI_EDE_MULTI_ERR | PCI_EDE_MST_ABRT))) {
+			out_be32(&reg->pex_err_dr, err_detect);
+			return;
+		}
 
-	err_detect = in_be32(pdata->pci_vbase + MPC85XX_PCI_ERR_DR);
-
-	/* master aborts can happen during PCI config cycles */
-	if (!(err_detect & ~(PCI_EDE_MULTI_ERR | PCI_EDE_MST_ABRT))) {
-		out_be32(pdata->pci_vbase + MPC85XX_PCI_ERR_DR, err_detect);
-		return;
+		printk(KERN_ERR "PCI error(s) detected\n");
+		printk(KERN_ERR "PCI/X ERR_DR register: 0x%08x\n", err_detect);
+		printk(KERN_ERR "PCI/X ERR_ATTRIB register: 0x%08x\n",
+				in_be32(&reg->pex_err_attrib));
+		printk(KERN_ERR "PCI/X ERR_ADDR register: 0x%08x\n",
+				in_be32(&reg->pex_err_disr));
+		printk(KERN_ERR "PCI/X ERR_EXT_ADDR register: 0x%08x\n",
+				in_be32(&reg->pex_err_ext_addr));
+		printk(KERN_ERR "PCI/X ERR_DL register: 0x%08x\n",
+				in_be32(&reg->pex_err_dl));
+		printk(KERN_ERR "PCI/X ERR_DH register: 0x%08x\n",
+				in_be32(&reg->pex_err_dh));
+
+		if (err_detect & PCI_EDE_PERR_MASK)
+			edac_pci_handle_pe(pci, pci->ctl_name);
+
+		if (err_detect & ~(PCI_EDE_MULTI_ERR | PCI_EDE_PERR_MASK))
+			edac_pci_handle_npe(pci, pci->ctl_name);
 	}
 
-	printk(KERN_ERR "PCI error(s) detected\n");
-	printk(KERN_ERR "PCI/X ERR_DR register: %#08x\n", err_detect);
-
-	printk(KERN_ERR "PCI/X ERR_ATTRIB register: %#08x\n",
-	       in_be32(pdata->pci_vbase + MPC85XX_PCI_ERR_ATTRIB));
-	printk(KERN_ERR "PCI/X ERR_ADDR register: %#08x\n",
-	       in_be32(pdata->pci_vbase + MPC85XX_PCI_ERR_ADDR));
-	printk(KERN_ERR "PCI/X ERR_EXT_ADDR register: %#08x\n",
-	       in_be32(pdata->pci_vbase + MPC85XX_PCI_ERR_EXT_ADDR));
-	printk(KERN_ERR "PCI/X ERR_DL register: %#08x\n",
-	       in_be32(pdata->pci_vbase + MPC85XX_PCI_ERR_DL));
-	printk(KERN_ERR "PCI/X ERR_DH register: %#08x\n",
-	       in_be32(pdata->pci_vbase + MPC85XX_PCI_ERR_DH));
-
-	/* clear error bits */
-	out_be32(pdata->pci_vbase
Re: Re[2]: [RFC] [PATCH] powerpc: Add MSR_DE to MSR_KERNEL
Abatron Support <supp...@abatron.ch> wrote on 2012/05/30 14:08:26:
>>> I have tested this briefly with BDI2000 on P2010(e500) and it works
>>> for me. I don't know if there are any bad side effects, therefore
>>> this RFC.
>>
>> We used to have MSR_DE surrounded by CONFIG_something to ensure it
>> wasn't set under normal operation. IIRC, if MSR_DE is set, you will
>> have problems with software debuggers that utilize the debugging
>> registers in the chip itself. You only want to force this to be set
>> when using the BDI, not at other times.
>
> This MSR_DE is also of interest and used for software debuggers that
> make use of the debug registers. Only if MSR_DE is set are debug
> interrupts generated. Whether a debug event leads to a debug interrupt
> handled by a software debugger, or to a debug halt handled by a JTAG
> tool, is selected with DBCR0_EDM / DBCR0_IDM.
>
> The e500 Core Family Reference Manual, Chapter 8 "Debug Support",
> explains the effect of MSR_DE in detail.

So what is the verdict on this? I don't buy into Dan's argument without
some hard data.

 Jocke
Re: Re[4]: [RFC] [PATCH] powerpc: Add MSR_DE to MSR_KERNEL
Abatron Support <supp...@abatron.ch> wrote on 2012/05/31 11:30:57:
>> Abatron Support <supp...@abatron.ch> wrote on 2012/05/30 14:08:26:
>>>> I have tested this briefly with BDI2000 on P2010(e500) and it works
>>>> for me. I don't know if there are any bad side effects, therefore
>>>> this RFC.
>>>
>>> We used to have MSR_DE surrounded by CONFIG_something to ensure it
>>> wasn't set under normal operation. IIRC, if MSR_DE is set, you will
>>> have problems with software debuggers that utilize the debugging
>>> registers in the chip itself. You only want to force this to be set
>>> when using the BDI, not at other times.
>>
>> This MSR_DE is also of interest and used for software debuggers that
>> make use of the debug registers. Only if MSR_DE is set are debug
>> interrupts generated. Whether a debug event leads to a debug interrupt
>> handled by a software debugger, or to a debug halt handled by a JTAG
>> tool, is selected with DBCR0_EDM / DBCR0_IDM.
>>
>> The e500 Core Family Reference Manual, Chapter 8 "Debug Support",
>> explains the effect of MSR_DE in detail.
>>
>> So what is the verdict on this? I don't buy into Dan's argument
>> without some hard data.
>
> What I tried to mention is that handling MSR_DE correctly is not only
> an emulator (JTAG debugger) requirement. A software debugger may also
> depend on a correctly handled MSR_DE bit.

Yes, that made sense to me too. How would SW debuggers work if the
kernel keeps turning off MSR_DE first chance it gets?

 Jocke
Re[4]: [RFC] [PATCH] powerpc: Add MSR_DE to MSR_KERNEL
> Abatron Support <supp...@abatron.ch> wrote on 2012/05/30 14:08:26:
>>> I have tested this briefly with BDI2000 on P2010(e500) and it works
>>> for me. I don't know if there are any bad side effects, therefore
>>> this RFC.
>>
>> We used to have MSR_DE surrounded by CONFIG_something to ensure it
>> wasn't set under normal operation. IIRC, if MSR_DE is set, you will
>> have problems with software debuggers that utilize the debugging
>> registers in the chip itself. You only want to force this to be set
>> when using the BDI, not at other times.
>
> This MSR_DE is also of interest and used for software debuggers that
> make use of the debug registers. Only if MSR_DE is set are debug
> interrupts generated. Whether a debug event leads to a debug interrupt
> handled by a software debugger, or to a debug halt handled by a JTAG
> tool, is selected with DBCR0_EDM / DBCR0_IDM.
>
> The e500 Core Family Reference Manual, Chapter 8 "Debug Support",
> explains the effect of MSR_DE in detail.
>
> So what is the verdict on this? I don't buy into Dan's argument without
> some hard data.

What I tried to mention is that handling MSR_DE correctly is not only an
emulator (JTAG debugger) requirement. A software debugger may also
depend on a correctly handled MSR_DE bit.

Ruedi
Re: kernel panic during kernel module load (powerpc specific part)
On Thu, May 31, 2012 at 07:04:42AM +0000, Wrobel Heinz-R39252 wrote:
> Michael,
>
>>> On Wed, 2012-05-30 at 16:33 +0200, Steffen Rumler wrote:
>>>> I've found the following root cause:
>>>> [...]
>>>> (6) Unfortunately, the trampoline code (do_plt_call()) is using
>>>>     register r11 to set up the jump. It looks like the prologue and
>>>>     epilogue are also using register r11, in order to point to the
>>>>     previous stack frame. This is a conflict !!! The trampoline
>>>>     code is damaging the content of r11.
>>
>> Hi Steffen,
>>
>> Great bug report!
>>
>> I can't quite work out what the standards say, the versions I'm
>> looking at are probably old anyway.
>
> The ABI supplement from https://www.power.org/resources/downloads/ is
> explicit about r11 being a requirement for the statically linked
> save/restore functions in section 3.3.4 on page 59. This means that
> any trampoline code must not ever use r11, or it can't be used to
> reach such save/restore functions safely from far away.

I believe that the basic premise is that you should provide a directly
reachable copy of the save/restore functions, even if this means that
you need several copies of the functions.

> Unfortunately, the same doc and its predecessors show r11 in all basic
> examples for PLT/trampoline code AFAICS, which is likely why all
> trampoline code uses r11 in any known case.
>
> I would guess that it was never envisioned that compiler generated
> code would be in a different section than the save/restore functions,
> i.e., the Linux module __init assumptions for Power break the ABI. Or
> does the ABI break the __init concept?!
>
> Using r12 in the trampoline seems to be the obvious solution for
> module loading. But what about other code loading? If, e.g., a user
> runs any app from bash, it gets loaded and relocated and trampolines
> might get set up somehow.

I don't think so. The linker/whatever should generate a copy of the
save/restore functions for every executable code area (shared library),
and probably more than one copy if the text becomes too large.

For 64 bit code, these functions are actually inserted by the linker.
[Ok, I just recompiled my 64 bit kernel with -Os and I see that vmlinux
gets one copy of the save/restore functions and every module also gets
its copy.] This makes sense: really, these functions are there as a
compromise between locality and performance. There should be one per
code section, otherwise the cache line used by the trampoline negates a
large part of their advantage.

> Wouldn't we have to find and fix ANY trampoline code generator
> remotely related to a basic Power Architecture Linux? Or is it a basic
> assumption for anything but modules that compiler generated code may
> never ever be outside the .text section? I am not sure that would be a
> safe assumption.
>
> Isn't this problem going beyond just module loading for Power
> Architecture Linux?

I don't think so. It really seems to be a 32 bit kernel problem.

Regards,
Gabriel
Re: [RFC] [PATCH] powerpc: Add MSR_DE to MSR_KERNEL
On 05/31/2012 04:56 AM, Joakim Tjernlund wrote:
> Abatron Support <supp...@abatron.ch> wrote on 2012/05/31 11:30:57:
>>> Abatron Support <supp...@abatron.ch> wrote on 2012/05/30 14:08:26:
>>>>> I have tested this briefly with BDI2000 on P2010(e500) and it
>>>>> works for me. I don't know if there are any bad side effects,
>>>>> therefore this RFC.
>>>>
>>>> We used to have MSR_DE surrounded by CONFIG_something to ensure it
>>>> wasn't set under normal operation. IIRC, if MSR_DE is set, you will
>>>> have problems with software debuggers that utilize the debugging
>>>> registers in the chip itself. You only want to force this to be set
>>>> when using the BDI, not at other times.
>>>
>>> This MSR_DE is also of interest and used for software debuggers that
>>> make use of the debug registers. Only if MSR_DE is set are debug
>>> interrupts generated. Whether a debug event leads to a debug
>>> interrupt handled by a software debugger, or to a debug halt handled
>>> by a JTAG tool, is selected with DBCR0_EDM / DBCR0_IDM.
>>>
>>> The e500 Core Family Reference Manual, Chapter 8 "Debug Support",
>>> explains the effect of MSR_DE in detail.
>>>
>>> So what is the verdict on this? I don't buy into Dan's argument
>>> without some hard data.
>>
>> What I tried to mention is that handling MSR_DE correctly is not only
>> an emulator (JTAG debugger) requirement. A software debugger may also
>> depend on a correctly handled MSR_DE bit.
>
> Yes, that made sense to me too. How would SW debuggers work if the
> kernel keeps turning off MSR_DE first chance it gets?

The kernel selectively enables MSR_DE when it wants to debug. I'm not
sure if anything will be bothered by leaving it on all the time. This is
something we need for virtualization as well, so a hypervisor can debug
the guest.

-Scott
[Early RFC 0/6] arch/powerpc: Add 64TB support to ppc64
Hi,

This patchset includes preparatory patches for supporting 64TB with
ppc64. I haven't completed the actual patch that bumps the USER_ESID
bits. I wanted to share the changes early so that I can get feedback on
the approach. The changes contain a few FIXME!!s which I will be
addressing in later updates.

Thanks,
-aneesh
[Early RFC 1/6] arch/powerpc: Use hpt_va to compute virtual address
From: Aneesh Kumar K.V <aneesh.ku...@linux.vnet.ibm.com>

Don't open code the same.

Signed-off-by: Aneesh Kumar K.V <aneesh.ku...@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/cell/beat_htab.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/cell/beat_htab.c b/arch/powerpc/platforms/cell/beat_htab.c
index 943c9d3..b83077e 100644
--- a/arch/powerpc/platforms/cell/beat_htab.c
+++ b/arch/powerpc/platforms/cell/beat_htab.c
@@ -259,7 +259,7 @@ static void beat_lpar_hpte_updateboltedpp(unsigned long newpp,
 	u64 dummy0, dummy1;
 
 	vsid = get_kernel_vsid(ea, MMU_SEGSIZE_256M);
-	va = (vsid << 28) | (ea & 0x0fffffff);
+	va = hpt_va(ea, vsid, MMU_SEGSIZE_256M);
 
 	raw_spin_lock(&beat_htab_lock);
 	slot = beat_lpar_hpte_find(va, psize);
--
1.7.10
[Early RFC 2/6] arch/powerpc: Convert virtual address to a struct
From: Aneesh Kumar K.V <aneesh.ku...@linux.vnet.ibm.com>

This is in preparation for the conversion of the 64 bit powerpc virtual
address to the max 78 bits. A later patch will switch struct virt_addr
to a struct of virtual segment id and segment offset.

Signed-off-by: Aneesh Kumar K.V <aneesh.ku...@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/kvm_book3s.h   |  2 +-
 arch/powerpc/include/asm/machdep.h      |  6 +-
 arch/powerpc/include/asm/mmu-hash64.h   | 24 ++++++---
 arch/powerpc/include/asm/tlbflush.h     |  4 +-
 arch/powerpc/kvm/book3s_64_mmu_host.c   |  3 +-
 arch/powerpc/mm/hash_native_64.c        | 76 +++++++++++-----------
 arch/powerpc/mm/hash_utils_64.c         | 12 ++---
 arch/powerpc/mm/hugetlbpage-hash64.c    |  3 +-
 arch/powerpc/mm/tlb_hash64.c            |  3 +-
 arch/powerpc/platforms/cell/beat_htab.c | 17 +++---
 arch/powerpc/platforms/ps3/htab.c       |  6 +--
 arch/powerpc/platforms/pseries/lpar.c   | 30 ++++++-----
 12 files changed, 103 insertions(+), 83 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h
index fd07f43..374b75d 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -59,7 +59,7 @@ struct hpte_cache {
 	struct hlist_node list_vpte;
 	struct hlist_node list_vpte_long;
 	struct rcu_head rcu_head;
-	u64 host_va;
+	struct virt_addr host_va;
 	u64 pfn;
 	ulong slot;
 	struct kvmppc_pte pte;
diff --git a/arch/powerpc/include/asm/machdep.h b/arch/powerpc/include/asm/machdep.h
index 42ce570..b34d0a9 100644
--- a/arch/powerpc/include/asm/machdep.h
+++ b/arch/powerpc/include/asm/machdep.h
@@ -34,19 +34,19 @@ struct machdep_calls {
 	char		*name;
 #ifdef CONFIG_PPC64
 	void		(*hpte_invalidate)(unsigned long slot,
-					   unsigned long va,
+					   struct virt_addr va,
 					   int psize, int ssize,
 					   int local);
 	long		(*hpte_updatepp)(unsigned long slot,
 					 unsigned long newpp,
-					 unsigned long va,
+					 struct virt_addr va,
 					 int psize, int ssize,
 					 int local);
 	void		(*hpte_updateboltedpp)(unsigned long newpp,
 					       unsigned long ea,
 					       int psize, int ssize);
 	long		(*hpte_insert)(unsigned long hpte_group,
-				       unsigned long va,
+				       struct virt_addr va,
 				       unsigned long prpn,
 				       unsigned long rflags,
 				       unsigned long vflags,
diff --git a/arch/powerpc/include/asm/mmu-hash64.h b/arch/powerpc/include/asm/mmu-hash64.h
index 1c65a59..5ff936b 100644
--- a/arch/powerpc/include/asm/mmu-hash64.h
+++ b/arch/powerpc/include/asm/mmu-hash64.h
@@ -143,6 +143,10 @@ struct mmu_psize_def
 	unsigned long	sllp;	/* SLB L||LP (exact mask to use in slbmte) */
 };
 
+struct virt_addr {
+	unsigned long addr;
+};
+
 #endif /* __ASSEMBLY__ */
 
 /*
@@ -183,11 +187,11 @@ extern int mmu_ci_restrictions;
  * This function sets the AVPN and L fields of the HPTE appropriately
  * for the page size
  */
-static inline unsigned long hpte_encode_v(unsigned long va, int psize,
+static inline unsigned long hpte_encode_v(struct virt_addr va, int psize,
 					  int ssize)
 {
 	unsigned long v;
-	v = (va >> 23) & ~(mmu_psize_defs[psize].avpnm);
+	v = (va.addr >> 23) & ~(mmu_psize_defs[psize].avpnm);
 	v <<= HPTE_V_AVPN_SHIFT;
 	if (psize != MMU_PAGE_4K)
 		v |= HPTE_V_LARGE;
@@ -218,28 +222,30 @@ static inline unsigned long hpte_encode_r(unsigned long pa, int psize)
 /*
  * Build a VA given VSID, EA and segment size
  */
-static inline unsigned long hpt_va(unsigned long ea, unsigned long vsid,
+static inline struct virt_addr hpt_va(unsigned long ea, unsigned long vsid,
 				   int ssize)
 {
+	struct virt_addr va;
 	if (ssize == MMU_SEGSIZE_256M)
-		return (vsid << 28) | (ea & 0xfffffffUL);
-	return (vsid << 40) | (ea & 0xffffffffffUL);
+		va.addr = (vsid << 28) | (ea & 0xfffffffUL);
+	va.addr = (vsid << 40) | (ea & 0xffffffffffUL);
+	return va;
 }
 
 /*
  * This hashes a virtual address
  */
-static inline unsigned long hpt_hash(unsigned long va, unsigned int shift,
+static inline unsigned long hpt_hash(struct virt_addr va, unsigned int shift,
 				     int ssize)
 {
 	unsigned
[Early RFC 3/6] arch/powerpc: Simplify hpte_decode
From: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com

This patch simplifies hpte_decode to ease switching from a virtual address to a vsid and segment offset combination in a later patch.

Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
---
 arch/powerpc/mm/hash_native_64.c | 51 ++++++++++++++++++++--------------
 1 file changed, 30 insertions(+), 21 deletions(-)

diff --git a/arch/powerpc/mm/hash_native_64.c b/arch/powerpc/mm/hash_native_64.c
index cab3892..76c2574 100644
--- a/arch/powerpc/mm/hash_native_64.c
+++ b/arch/powerpc/mm/hash_native_64.c
@@ -357,9 +357,10 @@ static void native_hpte_invalidate(unsigned long slot, struct virt_addr va,
 static void hpte_decode(struct hash_pte *hpte, unsigned long slot,
 			int *psize, int *ssize, struct virt_addr *va)
 {
+	unsigned long avpn, pteg, vpi;
 	unsigned long hpte_r = hpte->r;
 	unsigned long hpte_v = hpte->v;
-	unsigned long avpn;
+	unsigned long vsid, seg_off;
 	int i, size, shift, penc;
 
 	if (!(hpte_v & HPTE_V_LARGE))
@@ -386,32 +387,40 @@ static void hpte_decode(struct hash_pte *hpte, unsigned long slot,
 	}
 
 	/* This works for all page sizes, and for 256M and 1T segments */
+	*ssize = hpte_v >> HPTE_V_SSIZE_SHIFT;
 	shift = mmu_psize_defs[size].shift;
-	avpn = (HPTE_V_AVPN_VAL(hpte_v) & ~mmu_psize_defs[size].avpnm) << 23;
-
-	if (shift < 23) {
-		unsigned long vpi, vsid, pteg;
-
-		pteg = slot / HPTES_PER_GROUP;
-		if (hpte_v & HPTE_V_SECONDARY)
-			pteg = ~pteg;
-		switch (hpte_v >> HPTE_V_SSIZE_SHIFT) {
-		case MMU_SEGSIZE_256M:
-			vpi = ((avpn >> 28) ^ pteg) & htab_hash_mask;
-			break;
-		case MMU_SEGSIZE_1T:
-			vsid = avpn >> 40;
+	avpn = (HPTE_V_AVPN_VAL(hpte_v) & ~mmu_psize_defs[size].avpnm);
+	pteg = slot / HPTES_PER_GROUP;
+	if (hpte_v & HPTE_V_SECONDARY)
+		pteg = ~pteg;
+
+	switch (*ssize) {
+	case MMU_SEGSIZE_256M:
+		/* We only have 28 - 23 bits of seg_off in avpn */
+		seg_off = (avpn & 0x1f) << 23;
+		vsid    = avpn >> 5;
+		/* We can find more bits from the pteg value */
+		if (shift < 23) {
+			vpi = (vsid ^ pteg) & htab_hash_mask;
+			seg_off |= vpi << shift;
+		}
+		va->addr = vsid << 28 | seg_off;
+		break;
+	case MMU_SEGSIZE_1T:
+		/* We only have 40 - 23 bits of seg_off in avpn */
+		seg_off = (avpn & 0x1ffff) << 23;
+		vsid    = avpn >> 17;
+		if (shift < 23) {
 			vpi = (vsid ^ (vsid << 25) ^ pteg) & htab_hash_mask;
-			break;
-		default:
-			avpn = vpi = size = 0;
+			seg_off |= vpi << shift;
 		}
-		avpn |= (vpi << mmu_psize_defs[size].shift);
+		va->addr = vsid << 40 | seg_off;
+		break;
+	default:
+		seg_off = 0;
+		vsid    = 0;
+		va->addr = 0;
 	}
-
-	va->addr = avpn;
 	*psize = size;
-	*ssize = hpte_v >> HPTE_V_SSIZE_SHIFT;
 }
 
 /*
--
1.7.10

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev
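The 256M-segment branch of the decode above can be exercised outside the kernel. The following is a toy model (not kernel code): HTAB_HASH_MASK is a made-up hash mask and the mmu_psize_defs table is omitted. It only shows how the segment offset bits at and above bit 23 come straight out of the AVPN, while for page shifts below 23 the remaining bits are recovered from the PTEG index.

```c
#include <assert.h>
#include <stdint.h>

#define HTAB_HASH_MASK 0x7ffUL /* hypothetical hash mask */

/*
 * Toy model of hpte_decode()'s MMU_SEGSIZE_256M case: the AVPN keeps
 * VA bits 23 and up, so its low 5 bits are seg_off bits 27..23 and
 * the rest is the VSID (avpn >> 5).  For page shifts below 23 the
 * missing offset bits are recovered from the PTEG index, using the
 * hash relation vpi = (vsid ^ pteg) & htab_hash_mask.
 */
static uint64_t decode_seg_off_256m(uint64_t avpn, uint64_t pteg,
                                    unsigned int shift)
{
    uint64_t seg_off = (avpn & 0x1f) << 23; /* seg_off bits 27..23 */
    uint64_t vsid = avpn >> 5;

    if (shift < 23) {
        /* invert the hash to get the bits below bit 23 */
        uint64_t vpi = (vsid ^ pteg) & HTAB_HASH_MASK;
        seg_off |= vpi << shift;
    }
    return seg_off;
}
```

With avpn = 0xB3 (vsid 0x5, offset bits 0x13), pteg 0x3 and a 4K page (shift 12), the model yields seg_off 0x9806000; with shift 24 nothing needs recovering and it yields 0x9800000.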
[Early RFC 4/6] arch/powerpc: Use vsid and segment offset to represent virtual address
From: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com

This patch enables us to have a 78-bit virtual address. With 1TB segments we use 40 bits of the virtual address as the segment offset, and the remaining 24 bits (of the current 64-bit virtual address) are used to index the virtual segment. Out of those 24 bits we currently use 19 bits for the user context, which leaves us with only 4 bits for the effective segment ID. In order to support more than 16TB of memory we would require more than 4 ESID bits. This patch splits the virtual address into two unsigned long components, vsid and segment offset, thereby allowing us to support a 78-bit virtual address.

Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
---
 arch/powerpc/include/asm/mmu-hash64.h |  62 ++++++---
 arch/powerpc/mm/hash_low_64.S         | 191 ++++++++++++++++----------
 arch/powerpc/mm/hash_native_64.c      |  34 +++--
 arch/powerpc/mm/hash_utils_64.c       |   6 +-
 arch/powerpc/platforms/ps3/htab.c     |  13 +-
 arch/powerpc/platforms/pseries/lpar.c |  29 ++---
 6 files changed, 191 insertions(+), 144 deletions(-)

diff --git a/arch/powerpc/include/asm/mmu-hash64.h b/arch/powerpc/include/asm/mmu-hash64.h
index 5ff936b..e563bd2 100644
--- a/arch/powerpc/include/asm/mmu-hash64.h
+++ b/arch/powerpc/include/asm/mmu-hash64.h
@@ -143,8 +143,10 @@ struct mmu_psize_def
 	unsigned long sllp;	/* SLB L||LP (exact mask to use in slbmte) */
 };
 
+/* 78 bit power virtual address */
 struct virt_addr {
-	unsigned long addr;
+	unsigned long vsid;
+	unsigned long seg_off;
 };
 
 #endif /* __ASSEMBLY__ */
@@ -161,6 +163,13 @@ struct virt_addr {
 
 #ifndef __ASSEMBLY__
 
+static inline int segment_shift(int ssize)
+{
+	if (ssize == MMU_SEGSIZE_256M)
+		return SID_SHIFT;
+	return SID_SHIFT_1T;
+}
+
 /*
  * The current system page and segment sizes
  */
@@ -184,6 +193,32 @@ extern unsigned long tce_alloc_start, tce_alloc_end;
 extern int mmu_ci_restrictions;
 
 /*
+ * This computes the AVPN and B fields of the first dword of a HPTE,
+ * for use when we want to match an existing PTE. The bottom 7 bits
+ * of the returned value are zero.
+ */
+static inline unsigned long hpte_encode_avpn(struct virt_addr va, int psize,
+					     int ssize)
+{
+	unsigned long v;
+
+	/*
+	 * The AVA field omits the low-order 23 bits of the 78 bits VA.
+	 * These bits are not needed in the PTE, because the
+	 * low-order b of these bits are part of the byte offset
+	 * into the virtual page and, if b < 23, the high-order
+	 * 23-b of these bits are always used in selecting the
+	 * PTEGs to be searched
+	 */
+	v = va.seg_off >> 23;
+	v |= va.vsid << (segment_shift(ssize) - 23);
+	v &= ~(mmu_psize_defs[psize].avpnm);
+	v <<= HPTE_V_AVPN_SHIFT;
+	v |= ((unsigned long) ssize) << HPTE_V_SSIZE_SHIFT;
+	return v;
+}
+
+/*
  * This function sets the AVPN and L fields of the HPTE appropriately
  * for the page size
  */
@@ -191,11 +226,9 @@ static inline unsigned long hpte_encode_v(struct virt_addr va, int psize,
 					  int ssize)
 {
 	unsigned long v;
-	v = (va.addr >> 23) & ~(mmu_psize_defs[psize].avpnm);
-	v <<= HPTE_V_AVPN_SHIFT;
+	v = hpte_encode_avpn(va, psize, ssize);
 	if (psize != MMU_PAGE_4K)
 		v |= HPTE_V_LARGE;
-	v |= ((unsigned long) ssize) << HPTE_V_SSIZE_SHIFT;
 	return v;
 }
 
@@ -222,30 +255,31 @@ static inline unsigned long hpte_encode_r(unsigned long pa, int psize)
 /*
  * Build a VA given VSID, EA and segment size
  */
-static inline struct virt_addr hpt_va(unsigned long ea, unsigned long vsid,
-				      int ssize)
+static inline struct virt_addr hpt_va(unsigned long ea, unsigned long vsid, int ssize)
 {
 	struct virt_addr va;
+
+	va.vsid = vsid;
 	if (ssize == MMU_SEGSIZE_256M)
-		va.addr = (vsid << 28) | (ea & 0xfffffffUL);
-	else
-		va.addr = (vsid << 40) | (ea & 0xffffffffffUL);
+		va.seg_off = ea & 0xfffffffUL;
+	else
+		va.seg_off = ea & 0xffffffffffUL;
 	return va;
 }
 
 /*
  * This hashes a virtual address
  */
-
-static inline unsigned long hpt_hash(struct virt_addr va, unsigned int shift,
-				     int ssize)
+/* Verify */
+static inline unsigned long hpt_hash(struct virt_addr va, unsigned int shift, int ssize)
 {
 	unsigned long hash, vsid;
 
 	if (ssize == MMU_SEGSIZE_256M) {
-		hash = (va.addr >> 28) ^ ((va.addr & 0x0fffffffUL) >> shift);
+		hash = va.vsid ^ (va.seg_off >> shift);
 	} else {
-		vsid = va.addr >> 40;
-		hash = vsid ^ (vsid << 25) ^ ((va.addr & 0xffffffffffUL) >> shift);
+		vsid = va.vsid;
+		hash = vsid ^ (vsid << 25) ^ (va.seg_off >> shift);
 	}
 	return hash
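The reworked hash is simple enough to render standalone. This is a minimal sketch of the arithmetic in hpt_hash() above, assuming the usual values of the MMU_SEGSIZE constants; it is a model for illustration, not the kernel function itself.

```c
#include <assert.h>
#include <stdint.h>

enum { MMU_SEGSIZE_256M = 0, MMU_SEGSIZE_1T = 1 };

/* The VA already split into its two components, as in the patch. */
struct virt_addr {
    uint64_t vsid;
    uint64_t seg_off;
};

/* Mirrors hpt_hash(): with a split VA the 256M case is a plain xor
 * of the VSID with the page index within the segment, and the 1T
 * case additionally folds in the VSID shifted left by 25. */
static uint64_t hpt_hash_model(struct virt_addr va, unsigned int shift,
                               int ssize)
{
    if (ssize == MMU_SEGSIZE_256M)
        return va.vsid ^ (va.seg_off >> shift);
    return va.vsid ^ (va.vsid << 25) ^ (va.seg_off >> shift);
}
```

Because neither branch masks va.seg_off any more, the helper works unchanged whether the offset is 28 or 40 bits wide, which is the point of the split representation.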
[Early RFC 5/6] arch/powerpc: Make KERN_VIRT_SIZE not dependent on PGTABLE_RANGE
From: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com

As we keep increasing PGTABLE_RANGE we need not increase the virtual map area for the kernel.

Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
---
 arch/powerpc/include/asm/pgtable-ppc64.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/pgtable-ppc64.h b/arch/powerpc/include/asm/pgtable-ppc64.h
index c420561..8af1cf2 100644
--- a/arch/powerpc/include/asm/pgtable-ppc64.h
+++ b/arch/powerpc/include/asm/pgtable-ppc64.h
@@ -41,7 +41,7 @@
 #else
 #define KERN_VIRT_START ASM_CONST(0xD000)
 #endif
-#define KERN_VIRT_SIZE	PGTABLE_RANGE
+#define KERN_VIRT_SIZE	ASM_CONST(0x1000)
 
 /*
  * The vmalloc space starts at the beginning of that region, and
--
1.7.10

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev
[Early RFC 6/6] arch/powerpc: Increase the slice range to 64TB
From: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com

This patch makes the high psizes mask an unsigned char array so that we can have more than 16TB. Currently we support up to 64TB.

Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
---
 arch/powerpc/include/asm/mmu-hash64.h |  7 ++-
 arch/powerpc/include/asm/page_64.h    |  7 ++-
 arch/powerpc/mm/hash_utils_64.c       | 15 +++---
 arch/powerpc/mm/slb_low.S             | 35 ++++++++----
 arch/powerpc/mm/slice.c               | 95 +++++++++++++++++----------
 5 files changed, 109 insertions(+), 50 deletions(-)

diff --git a/arch/powerpc/include/asm/mmu-hash64.h b/arch/powerpc/include/asm/mmu-hash64.h
index e563bd2..0f8b10d 100644
--- a/arch/powerpc/include/asm/mmu-hash64.h
+++ b/arch/powerpc/include/asm/mmu-hash64.h
@@ -456,7 +456,12 @@ typedef struct {
 
 #ifdef CONFIG_PPC_MM_SLICES
 	u64 low_slices_psize;	/* SLB page size encodings */
-	u64 high_slices_psize;  /* 4 bits per slice for now */
+	/*
+	 * FIXME!! it should be derived from PGTABLE_RANGE
+	 * Right now we support 64TB and 4 bits for each
+	 * 1TB slice we need 32 bytes for 64TB.
+	 */
+	unsigned char high_slices_psize[32]; /* 4 bits per slice for now */
 #else
 	u16 sllp;		/* SLB page size encoding */
 #endif
diff --git a/arch/powerpc/include/asm/page_64.h b/arch/powerpc/include/asm/page_64.h
index fed85e6..8806e87 100644
--- a/arch/powerpc/include/asm/page_64.h
+++ b/arch/powerpc/include/asm/page_64.h
@@ -82,7 +82,12 @@ extern u64 ppc64_pft_size;
 
 struct slice_mask {
 	u16 low_slices;
-	u16 high_slices;
+	/*
+	 * FIXME!!
+	 * This should be derived out of PGTABLE_RANGE. For the current
+	 * max 64TB, u64 should be ok.
+	 */
+	u64 high_slices;
 };
 
 struct mm_struct;
diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
index 8b5d3c2..beace0b 100644
--- a/arch/powerpc/mm/hash_utils_64.c
+++ b/arch/powerpc/mm/hash_utils_64.c
@@ -804,16 +804,19 @@ unsigned int hash_page_do_lazy_icache(unsigned int pp, pte_t pte, int trap)
 #ifdef CONFIG_PPC_MM_SLICES
 unsigned int get_paca_psize(unsigned long addr)
 {
-	unsigned long index, slices;
+	u64 lpsizes;
+	unsigned char *hpsizes;
+	unsigned long index, mask_index;
 
 	if (addr < SLICE_LOW_TOP) {
-		slices = get_paca()->context.low_slices_psize;
+		lpsizes = get_paca()->context.low_slices_psize;
 		index = GET_LOW_SLICE_INDEX(addr);
-	} else {
-		slices = get_paca()->context.high_slices_psize;
-		index = GET_HIGH_SLICE_INDEX(addr);
+		return (lpsizes >> (index * 4)) & 0xF;
 	}
-	return (slices >> (index * 4)) & 0xF;
+	hpsizes = get_paca()->context.high_slices_psize;
+	index = GET_HIGH_SLICE_INDEX(addr) >> 1;
+	mask_index = GET_HIGH_SLICE_INDEX(addr) - (index << 1);
+	return (hpsizes[index] >> (mask_index * 4)) & 0xF;
 }
 
 #else
diff --git a/arch/powerpc/mm/slb_low.S b/arch/powerpc/mm/slb_low.S
index b9ee79ce..c355af6 100644
--- a/arch/powerpc/mm/slb_low.S
+++ b/arch/powerpc/mm/slb_low.S
@@ -108,17 +108,34 @@ END_MMU_FTR_SECTION_IFCLR(MMU_FTR_1T_SEGMENT)
  * between 4k and 64k standard page size
  */
 #ifdef CONFIG_PPC_MM_SLICES
+	/* r10 have esid */
 	cmpldi	r10,16
-
-	/* Get the slice index * 4 in r11 and matching slice size mask in r9 */
-	ld	r9,PACALOWSLICESPSIZE(r13)
-	sldi	r11,r10,2
+	/* below SLICE_LOW_TOP */
 	blt	5f
-	ld	r9,PACAHIGHSLICEPSIZE(r13)
-	srdi	r11,r10,(SLICE_HIGH_SHIFT - SLICE_LOW_SHIFT - 2)
-	andi.	r11,r11,0x3c
-
-5:	/* Extract the psize and multiply to get an array offset */
+	/*
+	 * Handle hpsizes,
+	 * r9 is get_paca()->context.high_slices_psize[index], r11 is mask_index
+	 * We use r10 here, later we restore it to esid.
+	 * Can we use other register instead of r10 ?
+	 */
+	srdi	r10,r10,(SLICE_HIGH_SHIFT - SLICE_LOW_SHIFT) /* index */
+	srdi	r11,r10,1		/* r11 is array index */
+	addi	r9,r11,PACAHIGHSLICEPSIZE
+	lbzx	r9,r9,r13		/* r9 is hpsizes[r11] */
+	sldi	r11,r11,1
+	subf	r11,r11,r10	/* mask_index = index - (array_index << 1) */
+	srdi	r10,r3,28	/* restore r10 with esid */
+	b	6f
+5:
+	/*
+	 * Handle lpsizes
+	 * r9 is get_paca()->context.low_slices_psize, r11 is index
+	 */
+	ld	r9,PACALOWSLICESPSIZE(r13)
+	mr	r11,r10
+6:
+	sldi	r11,r11,2	/* index * 4 */
+	/* Extract the psize and multiply to get an array offset */
 	srd	r9,r9,r11
 	andi.	r9,r9,0xf
 	mulli	r9,r9,MMUPSIZEDEFSIZE
diff --git a/arch/powerpc/mm/slice.c b/arch/powerpc/mm/slice.c
index 73709f7..302a481 100644
---
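The new byte-array layout can be exercised in isolation. This sketch (plain C, not kernel code) models the encoding the patch introduces — two 4-bit psize fields per byte, low nibble for the even slice — and mirrors the index/mask_index arithmetic used by both get_paca_psize() and the slb_low.S sequence above.

```c
#include <assert.h>

/* Look up the page-size encoding for a high slice in an array that
 * packs two slices per byte, low nibble first.  index selects the
 * byte; mask_index (0 or 1) selects the nibble within it, exactly as
 * in get_paca_psize(): index = slice >> 1,
 * mask_index = slice - (index << 1). */
static unsigned int high_slice_psize(const unsigned char *hpsizes,
                                     unsigned long slice)
{
    unsigned long index = slice >> 1;                /* byte in array */
    unsigned long mask_index = slice - (index << 1); /* 0 or 1 */

    return (hpsizes[index] >> (mask_index * 4)) & 0xF;
}
```

With hpsizes = {0x21, 0x43}, slices 0..3 decode to psize encodings 1, 2, 3, 4, which is the ordering the assembly's lbzx/srd sequence relies on.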
Re: [Early RFC 0/6] arch/powerpc: Add 64TB support to ppc64
Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com writes:

Hi,

This patchset includes preparatory patches for supporting 64TB with ppc64. I haven't completed the actual patch that bumps the USER_ESID bits. I wanted to share the changes early so that I can get feedback on the approach. The changes themselves contain a few FIXME!!s which I will be addressing in later updates.

Here is the patch that updates USER_ESID_BITS. I get a machine check exception with these changes. That is why I didn't include this in the patch series.

commit 5fff8ff606bc136c510350a40528431196f60001
Author: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
Date:   Wed May 23 16:59:36 2012 +0530

    arch/powerpc: Add 64TB support

    Increase max addressable range to 64TB. This is not tested on real
    hardware yet.

    Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com

diff --git a/arch/powerpc/include/asm/mmu-hash64.h b/arch/powerpc/include/asm/mmu-hash64.h
index 0f8b10d..9f9a2a1 100644
--- a/arch/powerpc/include/asm/mmu-hash64.h
+++ b/arch/powerpc/include/asm/mmu-hash64.h
@@ -375,8 +375,8 @@ extern void slb_set_size(u16 size);
 #define VSID_MODULUS_1T		((1UL<<VSID_BITS_1T)-1)
 
 #define CONTEXT_BITS		19
-#define USER_ESID_BITS		16
-#define USER_ESID_BITS_1T	4
+#define USER_ESID_BITS		18
+#define USER_ESID_BITS_1T	6
 
 #define USER_VSID_RANGE	(1UL << (USER_ESID_BITS + SID_SHIFT))
 
diff --git a/arch/powerpc/include/asm/pgtable-ppc64-4k.h b/arch/powerpc/include/asm/pgtable-ppc64-4k.h
index 6eefdcf..b3eccf2 100644
--- a/arch/powerpc/include/asm/pgtable-ppc64-4k.h
+++ b/arch/powerpc/include/asm/pgtable-ppc64-4k.h
@@ -7,7 +7,7 @@
  */
 #define PTE_INDEX_SIZE  9
 #define PMD_INDEX_SIZE  7
-#define PUD_INDEX_SIZE  7
+#define PUD_INDEX_SIZE  9
 #define PGD_INDEX_SIZE  9
 
 #ifndef __ASSEMBLY__
diff --git a/arch/powerpc/include/asm/pgtable-ppc64-64k.h b/arch/powerpc/include/asm/pgtable-ppc64-64k.h
index 90533dd..be4e287 100644
--- a/arch/powerpc/include/asm/pgtable-ppc64-64k.h
+++ b/arch/powerpc/include/asm/pgtable-ppc64-64k.h
@@ -7,7 +7,7 @@
 #define PTE_INDEX_SIZE  12
 #define PMD_INDEX_SIZE  12
 #define PUD_INDEX_SIZE	0
-#define PGD_INDEX_SIZE  4
+#define PGD_INDEX_SIZE  6
 
 #ifndef __ASSEMBLY__
 #define PTE_TABLE_SIZE	(sizeof(real_pte_t) << PTE_INDEX_SIZE)
diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h
index 8e2d037..426ed13 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -100,8 +100,8 @@ extern struct task_struct *last_task_used_spe;
 #endif
 
 #ifdef CONFIG_PPC64
-/* 64-bit user address space is 44-bits (16TB user VM) */
-#define TASK_SIZE_USER64 (0x1000UL)
+/* 64-bit user address space is 46-bits (64TB user VM) */
+#define TASK_SIZE_USER64 (0x4000UL)
 
 /*
  * 32-bit user address space is 4GB - 1 page
diff --git a/arch/powerpc/include/asm/sparsemem.h b/arch/powerpc/include/asm/sparsemem.h
index 0c5fa31..f6fc0ee 100644
--- a/arch/powerpc/include/asm/sparsemem.h
+++ b/arch/powerpc/include/asm/sparsemem.h
@@ -10,8 +10,8 @@
  */
 #define SECTION_SIZE_BITS       24
 
-#define MAX_PHYSADDR_BITS       44
-#define MAX_PHYSMEM_BITS        44
+#define MAX_PHYSADDR_BITS       46
+#define MAX_PHYSMEM_BITS        46
 
 #endif /* CONFIG_SPARSEMEM */

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev
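The constants in this patch are tied together by simple arithmetic: the user address space is USER_ESID_BITS + SID_SHIFT bits wide (with SID_SHIFT = 28 for 256M segments), so bumping USER_ESID_BITS from 16 to 18 moves it from 44 bits (16TB) to 46 bits (64TB), matching the new TASK_SIZE_USER64 comment and the MAX_PHYSMEM_BITS bump. A quick standalone check:

```c
#include <assert.h>
#include <stdint.h>

#define SID_SHIFT          28 /* 256M segment offset bits */
#define OLD_USER_ESID_BITS 16
#define NEW_USER_ESID_BITS 18

/* Width of the user virtual address space in bits: ESID bits plus
 * the 28-bit offset within a 256M segment. */
static unsigned int user_va_bits(unsigned int esid_bits)
{
    return esid_bits + SID_SHIFT;
}
```

The same bookkeeping explains USER_ESID_BITS_1T going from 4 to 6: with 40-bit (1TB) segment offsets, 6 ESID bits again give a 46-bit user space.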
Re: [RFC] [PATCH] powerpc: Add MSR_DE to MSR_KERNEL
Scott Wood scottw...@freescale.com wrote on 2012/05/31 19:47:53: On 05/31/2012 04:56 AM, Joakim Tjernlund wrote: Abatron Support supp...@abatron.ch wrote on 2012/05/31 11:30:57: Abatron Support supp...@abatron.ch wrote on 2012/05/30 14:08:26: I have tested this briefly with BDI2000 on P2010(e500) and it works for me. I don't know if there are any bad side effects, therfore this RFC. We used to have MSR_DE surrounded by CONFIG_something to ensure it wasn't set under normal operation. IIRC, if MSR_DE is set, you will have problems with software debuggers that utilize the the debugging registers in the chip itself. You only want to force this to be set when using the BDI, not at other times. This MSR_DE is also of interest and used for software debuggers that make use of the debug registers. Only if MSR_DE is set then debug interrupts are generated. If a debug event leads to a debug interrupt handled by a software debugger or if it leads to a debug halt handled by a JTAG tool is selected with DBCR0_EDM / DBCR0_IDM. The e500 Core Family Reference Manual chapter Chapter 8 Debug Support explains in detail the effect of MSR_DE. So what is the verdict on this? I don't buy into Dan argument without some hard data. What I tried to mention is that handling the MSR_DE correct is not only an emulator (JTAG debugger) requirement. Also a software debugger may depend on a correct handled MSR_DE bit. Yes, that made sense to me too. How would SW debuggers work if the kernel keeps turning off MSR_DE first chance it gets? The kernel selectively enables MSR_DE when it wants to debug. I'm not sure if anything will be bothered by leaving it on all the time. This is something we need for virtualization as well, so a hypervisor can debug the guest. hmm, I read that as you as in favour of the patch? ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
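As a side note, the gating discussed in this thread is easy to model: a debug event only raises an interrupt when MSR[DE] is set, and DBCR0[EDM]/[IDM] then decides whether the JTAG tool or the software handler sees it. The bit values in this sketch are illustrative assumptions, not the architected e500 encodings:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MSR_DE    0x00000200u /* debug enable (illustrative position) */
#define DBCR0_EDM 0x80000000u /* external (JTAG) debug mode */
#define DBCR0_IDM 0x40000000u /* internal (software) debug mode */

/* A software debugger only receives a debug interrupt when MSR[DE]
 * is set and DBCR0 selects internal debug mode; with EDM set the
 * event halts into the JTAG tool instead. */
static bool sw_debug_interrupt(uint32_t msr, uint32_t dbcr0)
{
    if (!(msr & MSR_DE))
        return false; /* debug events are gated off entirely */
    if (dbcr0 & DBCR0_EDM)
        return false; /* halt handled by the JTAG tool */
    return (dbcr0 & DBCR0_IDM) != 0;
}
```

This is why clearing MSR_DE in MSR_KERNEL breaks both kinds of debugger: neither path fires while the bit is off.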
Re: [RFC] [PATCH] powerpc: Add MSR_DE to MSR_KERNEL
On 05/31/2012 04:38 PM, Joakim Tjernlund wrote: Scott Wood scottw...@freescale.com wrote on 2012/05/31 19:47:53: On 05/31/2012 04:56 AM, Joakim Tjernlund wrote: Abatron Support supp...@abatron.ch wrote on 2012/05/31 11:30:57: Abatron Support supp...@abatron.ch wrote on 2012/05/30 14:08:26: I have tested this briefly with BDI2000 on P2010(e500) and it works for me. I don't know if there are any bad side effects, therfore this RFC. We used to have MSR_DE surrounded by CONFIG_something to ensure it wasn't set under normal operation. IIRC, if MSR_DE is set, you will have problems with software debuggers that utilize the the debugging registers in the chip itself. You only want to force this to be set when using the BDI, not at other times. This MSR_DE is also of interest and used for software debuggers that make use of the debug registers. Only if MSR_DE is set then debug interrupts are generated. If a debug event leads to a debug interrupt handled by a software debugger or if it leads to a debug halt handled by a JTAG tool is selected with DBCR0_EDM / DBCR0_IDM. The e500 Core Family Reference Manual chapter Chapter 8 Debug Support explains in detail the effect of MSR_DE. So what is the verdict on this? I don't buy into Dan argument without some hard data. What I tried to mention is that handling the MSR_DE correct is not only an emulator (JTAG debugger) requirement. Also a software debugger may depend on a correct handled MSR_DE bit. Yes, that made sense to me too. How would SW debuggers work if the kernel keeps turning off MSR_DE first chance it gets? The kernel selectively enables MSR_DE when it wants to debug. I'm not sure if anything will be bothered by leaving it on all the time. This is something we need for virtualization as well, so a hypervisor can debug the guest. hmm, I read that as you as in favour of the patch? 
I'd want some confirmation that it doesn't break anything, and that there aren't any other places that need MSR_DE that this doesn't cover, but in general yes. -Scott ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [RFC] [PATCH] powerpc: Add MSR_DE to MSR_KERNEL
Scott Wood scottw...@freescale.com wrote on 2012/05/31 23:43:34: On 05/31/2012 04:38 PM, Joakim Tjernlund wrote: Scott Wood scottw...@freescale.com wrote on 2012/05/31 19:47:53: On 05/31/2012 04:56 AM, Joakim Tjernlund wrote: Abatron Support supp...@abatron.ch wrote on 2012/05/31 11:30:57: Abatron Support supp...@abatron.ch wrote on 2012/05/30 14:08:26: I have tested this briefly with BDI2000 on P2010(e500) and it works for me. I don't know if there are any bad side effects, therfore this RFC. We used to have MSR_DE surrounded by CONFIG_something to ensure it wasn't set under normal operation. IIRC, if MSR_DE is set, you will have problems with software debuggers that utilize the the debugging registers in the chip itself. You only want to force this to be set when using the BDI, not at other times. This MSR_DE is also of interest and used for software debuggers that make use of the debug registers. Only if MSR_DE is set then debug interrupts are generated. If a debug event leads to a debug interrupt handled by a software debugger or if it leads to a debug halt handled by a JTAG tool is selected with DBCR0_EDM / DBCR0_IDM. The e500 Core Family Reference Manual chapter Chapter 8 Debug Support explains in detail the effect of MSR_DE. So what is the verdict on this? I don't buy into Dan argument without some hard data. What I tried to mention is that handling the MSR_DE correct is not only an emulator (JTAG debugger) requirement. Also a software debugger may depend on a correct handled MSR_DE bit. Yes, that made sense to me too. How would SW debuggers work if the kernel keeps turning off MSR_DE first chance it gets? The kernel selectively enables MSR_DE when it wants to debug. I'm not sure if anything will be bothered by leaving it on all the time. This is something we need for virtualization as well, so a hypervisor can debug the guest. hmm, I read that as you as in favour of the patch? 
I'd want some confirmation that it doesn't break anything, and that there aren't any other places that need MSR_DE that this doesn't cover, but in general yes. Then you need to test drive the patch :) Jocke ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [RFC] [PATCH] powerpc: Add MSR_DE to MSR_KERNEL
On 05/31/2012 05:14 PM, Joakim Tjernlund wrote: Scott Wood scottw...@freescale.com wrote on 2012/05/31 23:43:34: On 05/31/2012 04:38 PM, Joakim Tjernlund wrote: Scott Wood scottw...@freescale.com wrote on 2012/05/31 19:47:53: On 05/31/2012 04:56 AM, Joakim Tjernlund wrote: Abatron Support supp...@abatron.ch wrote on 2012/05/31 11:30:57: Abatron Support supp...@abatron.ch wrote on 2012/05/30 14:08:26: I have tested this briefly with BDI2000 on P2010(e500) and it works for me. I don't know if there are any bad side effects, therfore this RFC. We used to have MSR_DE surrounded by CONFIG_something to ensure it wasn't set under normal operation. IIRC, if MSR_DE is set, you will have problems with software debuggers that utilize the the debugging registers in the chip itself. You only want to force this to be set when using the BDI, not at other times. This MSR_DE is also of interest and used for software debuggers that make use of the debug registers. Only if MSR_DE is set then debug interrupts are generated. If a debug event leads to a debug interrupt handled by a software debugger or if it leads to a debug halt handled by a JTAG tool is selected with DBCR0_EDM / DBCR0_IDM. The e500 Core Family Reference Manual chapter Chapter 8 Debug Support explains in detail the effect of MSR_DE. So what is the verdict on this? I don't buy into Dan argument without some hard data. What I tried to mention is that handling the MSR_DE correct is not only an emulator (JTAG debugger) requirement. Also a software debugger may depend on a correct handled MSR_DE bit. Yes, that made sense to me too. How would SW debuggers work if the kernel keeps turning off MSR_DE first chance it gets? The kernel selectively enables MSR_DE when it wants to debug. I'm not sure if anything will be bothered by leaving it on all the time. This is something we need for virtualization as well, so a hypervisor can debug the guest. hmm, I read that as you as in favour of the patch? 
I'd want some confirmation that it doesn't break anything, and that there aren't any other places that need MSR_DE that this doesn't cover, but in general yes. Then you need to test drive the patch :) I was thinking more along the lines of someone who's more familiar with the relevant parts of the code confirming that it's really OK, not just testing that it doesn't blow up in my face. -Scott ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [RFC] [PATCH] powerpc: Add MSR_DE to MSR_KERNEL
Scott Wood scottw...@freescale.com wrote on 2012/06/01 00:16:53: On 05/31/2012 05:14 PM, Joakim Tjernlund wrote: Scott Wood scottw...@freescale.com wrote on 2012/05/31 23:43:34: On 05/31/2012 04:38 PM, Joakim Tjernlund wrote: Scott Wood scottw...@freescale.com wrote on 2012/05/31 19:47:53: On 05/31/2012 04:56 AM, Joakim Tjernlund wrote: Abatron Support supp...@abatron.ch wrote on 2012/05/31 11:30:57: Abatron Support supp...@abatron.ch wrote on 2012/05/30 14:08:26: I have tested this briefly with BDI2000 on P2010(e500) and it works for me. I don't know if there are any bad side effects, therfore this RFC. We used to have MSR_DE surrounded by CONFIG_something to ensure it wasn't set under normal operation. IIRC, if MSR_DE is set, you will have problems with software debuggers that utilize the the debugging registers in the chip itself. You only want to force this to be set when using the BDI, not at other times. This MSR_DE is also of interest and used for software debuggers that make use of the debug registers. Only if MSR_DE is set then debug interrupts are generated. If a debug event leads to a debug interrupt handled by a software debugger or if it leads to a debug halt handled by a JTAG tool is selected with DBCR0_EDM / DBCR0_IDM. The e500 Core Family Reference Manual chapter Chapter 8 Debug Support explains in detail the effect of MSR_DE. So what is the verdict on this? I don't buy into Dan argument without some hard data. What I tried to mention is that handling the MSR_DE correct is not only an emulator (JTAG debugger) requirement. Also a software debugger may depend on a correct handled MSR_DE bit. Yes, that made sense to me too. How would SW debuggers work if the kernel keeps turning off MSR_DE first chance it gets? The kernel selectively enables MSR_DE when it wants to debug. I'm not sure if anything will be bothered by leaving it on all the time. This is something we need for virtualization as well, so a hypervisor can debug the guest. 
hmm, I read that as you as in favour of the patch? I'd want some confirmation that it doesn't break anything, and that there aren't any other places that need MSR_DE that this doesn't cover, but in general yes. Then you need to test drive the patch :) I was thinking more along the lines of someone who's more familiar with the relevant parts of the code confirming that it's really OK, not just testing that it doesn't blow up in my face. Still needs a test run, just throw it in :) ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [RFC] [PATCH] powerpc: Add MSR_DE to MSR_KERNEL
Scott Wood scottw...@freescale.com wrote on 2012/06/01 00:16:53: On 05/31/2012 05:14 PM, Joakim Tjernlund wrote: Scott Wood scottw...@freescale.com wrote on 2012/05/31 23:43:34: On 05/31/2012 04:38 PM, Joakim Tjernlund wrote: Scott Wood scottw...@freescale.com wrote on 2012/05/31 19:47:53: On 05/31/2012 04:56 AM, Joakim Tjernlund wrote: Abatron Support supp...@abatron.ch wrote on 2012/05/31 11:30:57: Abatron Support supp...@abatron.ch wrote on 2012/05/30 14:08:26: I have tested this briefly with BDI2000 on P2010(e500) and it works for me. I don't know if there are any bad side effects, therfore this RFC. We used to have MSR_DE surrounded by CONFIG_something to ensure it wasn't set under normal operation. IIRC, if MSR_DE is set, you will have problems with software debuggers that utilize the the debugging registers in the chip itself. You only want to force this to be set when using the BDI, not at other times. This MSR_DE is also of interest and used for software debuggers that make use of the debug registers. Only if MSR_DE is set then debug interrupts are generated. If a debug event leads to a debug interrupt handled by a software debugger or if it leads to a debug halt handled by a JTAG tool is selected with DBCR0_EDM / DBCR0_IDM. The e500 Core Family Reference Manual chapter Chapter 8 Debug Support explains in detail the effect of MSR_DE. So what is the verdict on this? I don't buy into Dan argument without some hard data. What I tried to mention is that handling the MSR_DE correct is not only an emulator (JTAG debugger) requirement. Also a software debugger may depend on a correct handled MSR_DE bit. Yes, that made sense to me too. How would SW debuggers work if the kernel keeps turning off MSR_DE first chance it gets? The kernel selectively enables MSR_DE when it wants to debug. I'm not sure if anything will be bothered by leaving it on all the time. This is something we need for virtualization as well, so a hypervisor can debug the guest. 
hmm, I read that as you as in favour of the patch? I'd want some confirmation that it doesn't break anything, and that there aren't any other places that need MSR_DE that this doesn't cover, but in general yes. Then you need to test drive the patch :) I was thinking more along the lines of someone who's more familiar with the relevant parts of the code confirming that it's really OK, not just testing that it doesn't blow up in my face. It just occurred to me that you guys have this already in your Linux SDK so it can't be that bad. Jocke ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH] powerpc/pci: cleanup on duplicate assignment
Agree, this is a duplication.

On Thu, May 31, 2012 at 02:17:29PM +0800, Gavin Shan wrote:
While creating the PCI root bus through function pci_create_root_bus() of PCI core, it should have assigned the secondary bus number for the newly created PCI root bus. Thus we needn't do the explicit assignment for the secondary bus number again in pcibios_scan_phb().

Signed-off-by: Gavin Shan sha...@linux.vnet.ibm.com
---
 arch/powerpc/kernel/pci-common.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
index 8e78e93..0f75bd5 100644
--- a/arch/powerpc/kernel/pci-common.c
+++ b/arch/powerpc/kernel/pci-common.c
@@ -1646,7 +1646,6 @@ void __devinit pcibios_scan_phb(struct pci_controller *hose)
 		pci_free_resource_list(&resources);
 		return;
 	}
-	bus->secondary = hose->first_busno;
 	hose->bus = bus;
 
 	/* Get probe mode and perform scan */
--
1.7.9.5

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

--
Richard Yang
Help you, Help me

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH 5/9] blackfin: A couple of task-mm handling fixes
On Monday 23 April 2012 03:09:01 Anton Vorontsov wrote:
> 1. Working with task->mm w/o getting mm or grabbing the task lock is
> dangerous as ->mm might disappear (exit_mm() assigns NULL under
> task_lock(), so tasklist lock is not enough).

that isn't a problem for this code as it specifically checks if it's in an atomic section. if it is, then task->mm can't go away on us.

> We can't use get_task_mm()/mmput() pair as mmput() might sleep,
> so we have to take the task lock while handling its mm.

if we're not in an atomic section, then sleeping is fine.

> 2. Checking for process->mm is not enough because process' main
> thread may exit or detach its mm via use_mm(), but other threads
> may still have a valid mm.

i don't think it matters for this code (per the reasons above).

> To catch this we use find_lock_task_mm(), which walks up all
> threads and returns an appropriate task (with task lock held).

certainly fine for the non-atomic code path. i guess we'll notice in crashes if it causes a problem in atomic code paths as well.
-mike

signature.asc
Description: This is a digitally signed message part.

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev