Re: [RFC PATCH V3 06/17] ppc/pnv: allocate pe->iommu_table dynamically
On Wed, 2014-06-25 at 09:20 +, David Laight wrote: > What are the sizes of the iommu table and the PE structure? > If the table is a round number of pages then you probably don't want > to embed it inside the PE structure. The problem isn't the table itself but the struct iommu_table which contains the pointer to the actual table and various other bits of controlling state. Cheers, Ben. ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH 2/2] PCI/MSI: Remove arch_msi_check_device()
> /** > - * pci_msi_check_device - check whether MSI may be enabled on a device > + * msi_check_device - check whether MSI may be enabled on a device > * @dev: pointer to the pci_dev data structure of MSI device function > * @nvec: how many MSIs have been requested ? > - * @type: are we checking for MSI or MSI-X ? > * > * Look at global flags, the device itself, and its parent buses > * to determine if MSI/-X are supported for the device. If MSI/-X is > * supported return 0, else return an error code. > **/ > -static int pci_msi_check_device(struct pci_dev *dev, int nvec, int type) > +static int msi_check_device(struct pci_dev *dev, int nvec) > { > struct pci_bus *bus; > - int ret; > > /* MSI must be globally enabled and supported by the device */ > - if (!pci_msi_enable || !dev || dev->no_msi) > + if (!pci_msi_enable) > + return -EINVAL; > + > + if (!dev || dev->no_msi || dev->current_state != PCI_D0) > return -EINVAL; > > /* > @@ -846,10 +837,6 @@ static int pci_msi_check_device(struct pci_dev *dev, int > nvec, int type) > if (bus->bus_flags & PCI_BUS_FLAGS_NO_MSI) > return -EINVAL; > > - ret = arch_msi_check_device(dev, nvec, type); > - if (ret) > - return ret; > - Move the arch_msi_check_device() into arch_msi_setup_irq(), make we can not detect whether the device in this platform supports MSI or MSI-X aeap. If we delay this, maybe we will do a lot unnecessary working for MSI/MSI-X setup. Thanks! Yijing. > return 0; > } > > @@ -954,13 +941,13 @@ int pci_enable_msix(struct pci_dev *dev, struct > msix_entry *entries, int nvec) > int status, nr_entries; > int i, j; > > - if (!entries || !dev->msix_cap || dev->current_state != PCI_D0) > - return -EINVAL; > - > - status = pci_msi_check_device(dev, nvec, PCI_CAP_ID_MSIX); > + status = msi_check_device(dev, nvec); > if (status) > return status; > > + if (!entries) > + return -EINVAL; > + > nr_entries = pci_msix_vec_count(dev); > if (nr_entries < 0) > return nr_entries; > @@ -1085,8 +1072,9 @@ int pci_enable_msi_range(struct pci_dev *dev, int > minvec, int maxvec) > int nvec; > int rc; > > - if (dev->current_state != PCI_D0) > - return -EINVAL; > + rc = msi_check_device(dev, minvec); > + if (rc) > + return rc; > > WARN_ON(!!dev->msi_enabled); > > @@ -1109,17 +1097,6 @@ int pci_enable_msi_range(struct pci_dev *dev, int > minvec, int maxvec) > nvec = maxvec; > > do { > - rc = pci_msi_check_device(dev, nvec, PCI_CAP_ID_MSI); > - if (rc < 0) { > - return rc; > - } else if (rc > 0) { > - if (rc < minvec) > - return -ENOSPC; > - nvec = rc; > - } > - } while (rc); > - > - do { > rc = msi_capability_init(dev, nvec); > if (rc < 0) { > return rc; > diff --git a/include/linux/msi.h b/include/linux/msi.h > index 92a2f99..3b873bc 100644 > --- a/include/linux/msi.h > +++ b/include/linux/msi.h > @@ -59,7 +59,6 @@ int arch_setup_msi_irq(struct pci_dev *dev, struct msi_desc > *desc); > void arch_teardown_msi_irq(unsigned int irq); > int arch_setup_msi_irqs(struct pci_dev *dev, int nvec, int type); > void arch_teardown_msi_irqs(struct pci_dev *dev); > -int arch_msi_check_device(struct pci_dev* dev, int nvec, int type); > void arch_restore_msi_irqs(struct pci_dev *dev); > > void default_teardown_msi_irqs(struct pci_dev *dev); > @@ -76,8 +75,6 @@ struct msi_chip { > int (*setup_irq)(struct msi_chip *chip, struct pci_dev *dev, >struct msi_desc *desc); > void (*teardown_irq)(struct msi_chip *chip, unsigned int irq); > - int (*check_device)(struct msi_chip *chip, struct pci_dev *dev, > - int nvec, int type); > }; > > #endif /* LINUX_MSI_H */ > -- Thanks! Yijing ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: bit fields && data tearing
On Sun, 2014-07-13 at 09:15 -0400, Peter Hurley wrote: > > I'm not sure I understand your point here, Ben. > > Suppose that two different spinlocks are used independently to > protect r-m-w access to adjacent data. In Oleg's example, > suppose spinlock 1 is used for access to the bitfield and > spinlock 2 is used for access to freeze_stop. > > What would prevent an accidental write to freeze_stop from the > kt_1 thread? My point was to be weary of bitfields in general because access to them is always R-M-W, never atomic and that seem to escape people regularily :-) (Among other problems such as endian etc...) As for Oleg's example, it *should* have worked because the bitfield and the adjacent freeze_stop should have been accessed using load/stores that don't actually overlap, but the compiler bug causes the bitfield access to not properly use the basic type of the bitfield, but escalate to a full 64-bit R-M-W instead, thus incorrectly R-M-W'ing the field next door. Cheers, Ben. ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH v3] arm64, ia64, ppc, s390, sh, tile, um, x86, mm: Remove default gate area
The core mm code will provide a default gate area based on FIXADDR_USER_START and FIXADDR_USER_END if !defined(__HAVE_ARCH_GATE_AREA) && defined(AT_SYSINFO_EHDR). This default is only useful for ia64. arm64, ppc, s390, sh, tile, 64-bit UML, and x86_32 have their own code just to disable it. arm, 32-bit UML, and x86_64 have gate areas, but they have their own implementations. This gets rid of the default and moves the code into ia64. This should save some code on architectures without a gate area: it's now possible to inline the gate_area functions in the default case. Acked-by: Will Deacon Cc: Catalin Marinas Cc: Will Deacon Cc: Tony Luck Cc: Fenghua Yu Cc: Benjamin Herrenschmidt Cc: Paul Mackerras Cc: Martin Schwidefsky Cc: Heiko Carstens Cc: linux...@de.ibm.com Cc: Chris Metcalf Cc: Jeff Dike Cc: Richard Weinberger Cc: Thomas Gleixner Cc: Ingo Molnar Cc: "H. Peter Anvin" Cc: Nathan Lynch Cc: x...@kernel.org Cc: linux-a...@vger.kernel.org Cc: linux-arm-ker...@lists.infradead.org Cc: linux-ker...@vger.kernel.org Cc: linux-i...@vger.kernel.org Cc: linuxppc-dev@lists.ozlabs.org Cc: linux-s...@vger.kernel.org Cc: linux...@vger.kernel.org Cc: user-mode-linux-de...@lists.sourceforge.net Cc: linux...@kvack.org Signed-off-by: Andy Lutomirski --- It would be nice to get this into some tree that's in -next and that people can base on. Changes from v1 and v2: Nothing except Will Deacon's ack and splitting this out from the larger series. arch/arm64/include/asm/page.h | 3 --- arch/arm64/kernel/vdso.c | 19 --- arch/ia64/include/asm/page.h | 2 ++ arch/ia64/mm/init.c| 26 ++ arch/powerpc/include/asm/page.h| 3 --- arch/powerpc/kernel/vdso.c | 16 arch/s390/include/asm/page.h | 2 -- arch/s390/kernel/vdso.c| 15 --- arch/sh/include/asm/page.h | 5 - arch/sh/kernel/vsyscall/vsyscall.c | 15 --- arch/tile/include/asm/page.h | 6 -- arch/tile/kernel/vdso.c| 15 --- arch/um/include/asm/page.h | 5 + arch/x86/include/asm/page.h| 1 - arch/x86/include/asm/page_64.h | 2 ++ arch/x86/um/asm/elf.h | 1 - arch/x86/um/mem_64.c | 15 --- arch/x86/vdso/vdso32-setup.c | 19 +-- include/linux/mm.h | 17 - mm/memory.c| 38 -- mm/nommu.c | 5 - 21 files changed, 48 insertions(+), 182 deletions(-) diff --git a/arch/arm64/include/asm/page.h b/arch/arm64/include/asm/page.h index 46bf666..992710f 100644 --- a/arch/arm64/include/asm/page.h +++ b/arch/arm64/include/asm/page.h @@ -28,9 +28,6 @@ #define PAGE_SIZE (_AC(1,UL) << PAGE_SHIFT) #define PAGE_MASK (~(PAGE_SIZE-1)) -/* We do define AT_SYSINFO_EHDR but don't use the gate mechanism */ -#define __HAVE_ARCH_GATE_AREA 1 - #ifndef __ASSEMBLY__ #ifdef CONFIG_ARM64_64K_PAGES diff --git a/arch/arm64/kernel/vdso.c b/arch/arm64/kernel/vdso.c index 50384fe..f630626 100644 --- a/arch/arm64/kernel/vdso.c +++ b/arch/arm64/kernel/vdso.c @@ -187,25 +187,6 @@ const char *arch_vma_name(struct vm_area_struct *vma) } /* - * We define AT_SYSINFO_EHDR, so we need these function stubs to keep - * Linux happy. - */ -int in_gate_area_no_mm(unsigned long addr) -{ - return 0; -} - -int in_gate_area(struct mm_struct *mm, unsigned long addr) -{ - return 0; -} - -struct vm_area_struct *get_gate_vma(struct mm_struct *mm) -{ - return NULL; -} - -/* * Update the vDSO data page to keep in sync with kernel timekeeping. */ void update_vsyscall(struct timekeeper *tk) diff --git a/arch/ia64/include/asm/page.h b/arch/ia64/include/asm/page.h index f1e1b2e..1f1bf14 100644 --- a/arch/ia64/include/asm/page.h +++ b/arch/ia64/include/asm/page.h @@ -231,4 +231,6 @@ get_order (unsigned long size) #define PERCPU_ADDR(-PERCPU_PAGE_SIZE) #define LOAD_OFFSET(KERNEL_START - KERNEL_TR_PAGE_SIZE) +#define __HAVE_ARCH_GATE_AREA 1 + #endif /* _ASM_IA64_PAGE_H */ diff --git a/arch/ia64/mm/init.c b/arch/ia64/mm/init.c index 25c3502..35efaa3 100644 --- a/arch/ia64/mm/init.c +++ b/arch/ia64/mm/init.c @@ -278,6 +278,32 @@ setup_gate (void) ia64_patch_gate(); } +static struct vm_area_struct gate_vma; + +static int __init gate_vma_init(void) +{ + gate_vma.vm_mm = NULL; + gate_vma.vm_start = FIXADDR_USER_START; + gate_vma.vm_end = FIXADDR_USER_END; + gate_vma.vm_flags = VM_READ | VM_MAYREAD | VM_EXEC | VM_MAYEXEC; + gate_vma.vm_page_prot = __P101; + + return 0; +} +__initcall(gate_vma_init); + +struct vm_area_struct *get_gate_vma(struct mm_struct *mm) +{ + return &gate_vma; +} + +int in_gate_area_no_mm(unsigned long addr) +{ + if ((addr >= FIXADDR_USER_START) &&
Re: bit fields && data tearing
On 07/12/2014 07:34 PM, Benjamin Herrenschmidt wrote: On Sat, 2014-07-12 at 22:51 +0200, Oleg Nesterov wrote: OK, looks like this is compiler bug, https://gcc.gnu.org/bugzilla/show_bug.cgi?id=52080 Thanks to Dan who informed me privately. So yes, there's is this compiler bug which means a bitfield access can cause a r-m-w access to a neighbouring field but in general, I would be weary of bitfields anyway since accessing them isn't going to be atomic anyway... it's too easy to get things wrong and in most cases the benefit is yet to be demonstrated. I'm not sure I understand your point here, Ben. Suppose that two different spinlocks are used independently to protect r-m-w access to adjacent data. In Oleg's example, suppose spinlock 1 is used for access to the bitfield and spinlock 2 is used for access to freeze_stop. What would prevent an accidental write to freeze_stop from the kt_1 thread? Regards, Peter Hurley In your example, I don't see the point of the bitfield. Cheers, Ben. On 07/12, Oleg Nesterov wrote: Hello, I am not sure I should ask here, but since Documentation/memory-barriers.txt mentions load/store tearing perhaps my question is not completely off-topic... I am fighting with mysterious RHEL bug, it can be reproduced on ppc and s390 but not on x86. Finally I seem to understand the problem, and I even wrote the stupid kernel module to ensure, see it below at the end. It triggers the problem immediately, kt_2() sees the wrong value in freeze_stop. (If I turn ->freeze_stop int "long", the problem goes away). So the question is: is this gcc bug or the code below is buggy? If it is buggy, then probably memory-barriers.txt could mention that you should be carefull with bit fields, even ACCESS_ONCE() obviously can't help. Or this just discloses my ignorance and you need at least aligned(long) after a bit field to be thread-safe ? I thought that compiler should take care and add the necessary alignment if (say) CPU can't update a single byte/uint. gcc version 4.4.7 20120313 (Red Hat 4.4.7-9) (GCC). Asm: <.kt_2>: 0: 7c 08 02 a6 mflrr0 4: fb 81 ff e0 std r28,-32(r1) 8: fb a1 ff e8 std r29,-24(r1) c: fb c1 ff f0 std r30,-16(r1) 10: fb e1 ff f8 std r31,-8(r1) 14: eb c2 00 00 ld r30,0(r2) 18: f8 01 00 10 std r0,16(r1) 1c: f8 21 ff 71 stdur1,-144(r1) 20: 7c 7d 1b 78 mr r29,r3 24: 3b e0 00 00 li r31,0 28: 78 3c 04 64 rldicr r28,r1,0,49 2c: 3b 9c 00 80 addir28,r28,128 30: 48 00 00 2c b 5c <.kt_2+0x5c> 34: 60 00 00 00 nop 38: 60 00 00 00 nop 3c: 60 00 00 00 nop 40: 93 fd 00 04 stw r31,4(r29) 44: e8 9d 00 06 lwa r4,4(r29) 48: 7f 84 f8 00 cmpwcr7,r4,r31 4c: 40 de 00 4c bne-cr7,98 <.kt_2+0x98> 50: e8 1c 00 00 ld r0,0(r28) 54: 78 09 f7 e3 rldicl. r9,r0,62,63 58: 40 c2 00 54 bne-ac <.kt_2+0xac> 5c: 48 00 00 01 bl 5c <.kt_2+0x5c> 60: 60 00 00 00 nop 64: 3b ff 00 01 addir31,r31,1 68: 2f a3 00 00 cmpdi cr7,r3,0 6c: 7f ff 07 b4 extsw r31,r31 70: 41 9e ff d0 beq+cr7,40 <.kt_2+0x40> 74: 38 21 00 90 addir1,r1,144 78: 38 60 00 00 li r3,0 7c: e8 01 00 10 ld r0,16(r1) 80: eb 81 ff e0 ld r28,-32(r1) 84: eb a1 ff e8 ld r29,-24(r1) 88: eb c1 ff f0 ld r30,-16(r1) 8c: eb e1 ff f8 ld r31,-8(r1) 90: 7c 08 03 a6 mtlrr0 94: 4e 80 00 20 blr 98: e8 7e 80 28 ld r3,-32728(r30) 9c: 7f e5 fb 78 mr r5,r31 a0: 48 00 00 01 bl a0 <.kt_2+0xa0> a4: 60 00 00 00 nop a8: 4b ff ff a8 b 50 <.kt_2+0x50> ac: 48 00 00 01 bl ac <.kt_2+0xac> b0: 60 00 00 00 nop b4: 4b ff ff a8 b 5c <.kt_2+0x5c> b8: 60 00 00 00 nop bc: 60 00 00 00 nop 00c0 <.kt_1>: c0: 7c 08 02 a6 mflrr0 c4: fb 81 ff e0 std r28,-32(r1) c8: fb a1 ff e8 std r29,-24(r1) cc: fb c1 ff f0 std r30,-16(r1) d0: fb e1 ff f8 std r31,-8(r1) d4: eb c2 00 00 ld r30,0(r2) d8: f8 01 00 10 std r0,16(r1) dc: f8 21 ff 71 stdur1,-144(r1) e0: 7c 7d 1b 78 mr r29,r3 e4: 3b e0 00 00 li r31,0 e8: 78 3c 04 64
Re: bit fields && data tearing
On 07/13, Benjamin Herrenschmidt wrote: > > On Sat, 2014-07-12 at 22:51 +0200, Oleg Nesterov wrote: > > OK, looks like this is compiler bug, > > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=52080 > > > > Thanks to Dan who informed me privately. > > So yes, there's is this compiler bug which means a bitfield > access can cause a r-m-w access to a neighbouring field Thanks. So I can forward this all back to bugzilla. > but > in general, I would be weary of bitfields anyway since accessing > them isn't going to be atomic anyway... it's too easy to get things > wrong and in most cases the benefit is yet to be demonstrated. Sure, bit fields should be used with care. But the same arguments apply to bitmasks, they are often used without "atomic" set/clear_bit. > In your example, I don't see the point of the bitfield. This is just test-case. The real code has more adjacent bit fields, only the tracee can modify them, and only debugger can change ->freeze_stop. Thanks, Oleg. ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev