Re: [RFC PATCH V3 06/17] ppc/pnv: allocate pe->iommu_table dynamically

2014-07-13 Thread Benjamin Herrenschmidt
On Wed, 2014-06-25 at 09:20 +, David Laight wrote:
> What are the sizes of the iommu table and the PE structure? 
> If the table is a round number of pages then you probably don't want
> to embed it inside the PE structure. 

The problem isn't the table itself but the struct iommu_table which
contains the pointer to the actual table and various other bits of
controlling state.

Cheers,
Ben.



___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH 2/2] PCI/MSI: Remove arch_msi_check_device()

2014-07-13 Thread Yijing Wang
>  /**
> - * pci_msi_check_device - check whether MSI may be enabled on a device
> + * msi_check_device - check whether MSI may be enabled on a device
>   * @dev: pointer to the pci_dev data structure of MSI device function
>   * @nvec: how many MSIs have been requested ?
> - * @type: are we checking for MSI or MSI-X ?
>   *
>   * Look at global flags, the device itself, and its parent buses
>   * to determine if MSI/-X are supported for the device. If MSI/-X is
>   * supported return 0, else return an error code.
>   **/
> -static int pci_msi_check_device(struct pci_dev *dev, int nvec, int type)
> +static int msi_check_device(struct pci_dev *dev, int nvec)
>  {
>   struct pci_bus *bus;
> - int ret;
>  
>   /* MSI must be globally enabled and supported by the device */
> - if (!pci_msi_enable || !dev || dev->no_msi)
> + if (!pci_msi_enable)
> + return -EINVAL;
> +
> + if (!dev || dev->no_msi || dev->current_state != PCI_D0)
>   return -EINVAL;
>  
>   /*
> @@ -846,10 +837,6 @@ static int pci_msi_check_device(struct pci_dev *dev, int nvec, int type)
>   if (bus->bus_flags & PCI_BUS_FLAGS_NO_MSI)
>   return -EINVAL;
>  
> - ret = arch_msi_check_device(dev, nvec, type);
> - if (ret)
> - return ret;
> -

Moving arch_msi_check_device() into arch_msi_setup_irq() means we cannot
detect whether the device on this platform supports MSI or MSI-X as early as
possible. If we delay this check, we may do a lot of unnecessary work during
MSI/MSI-X setup.

Thanks!
Yijing.

>   return 0;
>  }
>  
> @@ -954,13 +941,13 @@ int pci_enable_msix(struct pci_dev *dev, struct msix_entry *entries, int nvec)
>   int status, nr_entries;
>   int i, j;
>  
> - if (!entries || !dev->msix_cap || dev->current_state != PCI_D0)
> - return -EINVAL;
> -
> - status = pci_msi_check_device(dev, nvec, PCI_CAP_ID_MSIX);
> + status = msi_check_device(dev, nvec);
>   if (status)
>   return status;
>  
> + if (!entries)
> + return -EINVAL;
> +
>   nr_entries = pci_msix_vec_count(dev);
>   if (nr_entries < 0)
>   return nr_entries;
> @@ -1085,8 +1072,9 @@ int pci_enable_msi_range(struct pci_dev *dev, int minvec, int maxvec)
>   int nvec;
>   int rc;
>  
> - if (dev->current_state != PCI_D0)
> - return -EINVAL;
> + rc = msi_check_device(dev, minvec);
> + if (rc)
> + return rc;
>  
>   WARN_ON(!!dev->msi_enabled);
>  
> @@ -1109,17 +1097,6 @@ int pci_enable_msi_range(struct pci_dev *dev, int minvec, int maxvec)
>   nvec = maxvec;
>  
>   do {
> - rc = pci_msi_check_device(dev, nvec, PCI_CAP_ID_MSI);
> - if (rc < 0) {
> - return rc;
> - } else if (rc > 0) {
> - if (rc < minvec)
> - return -ENOSPC;
> - nvec = rc;
> - }
> - } while (rc);
> -
> - do {
>   rc = msi_capability_init(dev, nvec);
>   if (rc < 0) {
>   return rc;
> diff --git a/include/linux/msi.h b/include/linux/msi.h
> index 92a2f99..3b873bc 100644
> --- a/include/linux/msi.h
> +++ b/include/linux/msi.h
> @@ -59,7 +59,6 @@ int arch_setup_msi_irq(struct pci_dev *dev, struct msi_desc *desc);
>  void arch_teardown_msi_irq(unsigned int irq);
>  int arch_setup_msi_irqs(struct pci_dev *dev, int nvec, int type);
>  void arch_teardown_msi_irqs(struct pci_dev *dev);
> -int arch_msi_check_device(struct pci_dev* dev, int nvec, int type);
>  void arch_restore_msi_irqs(struct pci_dev *dev);
>  
>  void default_teardown_msi_irqs(struct pci_dev *dev);
> @@ -76,8 +75,6 @@ struct msi_chip {
>   int (*setup_irq)(struct msi_chip *chip, struct pci_dev *dev,
>struct msi_desc *desc);
>   void (*teardown_irq)(struct msi_chip *chip, unsigned int irq);
> - int (*check_device)(struct msi_chip *chip, struct pci_dev *dev,
> - int nvec, int type);
>  };
>  
>  #endif /* LINUX_MSI_H */
> 


-- 
Thanks!
Yijing


Re: bit fields && data tearing

2014-07-13 Thread Benjamin Herrenschmidt
On Sun, 2014-07-13 at 09:15 -0400, Peter Hurley wrote:
> 
> I'm not sure I understand your point here, Ben.
> 
> Suppose that two different spinlocks are used independently to
> protect r-m-w access to adjacent data. In Oleg's example,
> suppose spinlock 1 is used for access to the bitfield and
> spinlock 2 is used for access to freeze_stop.
> 
> What would prevent an accidental write to freeze_stop from the
> kt_1 thread?

My point was to be wary of bitfields in general because access
to them is always R-M-W, never atomic, and that seems to escape
people regularly :-) (Among other problems such as endianness, etc.)

As for Oleg's example, it *should* have worked because the bitfield and
the adjacent freeze_stop should have been accessed using load/stores
that don't actually overlap, but the compiler bug causes the bitfield
access to not properly use the basic type of the bitfield, but escalate
to a full 64-bit R-M-W instead, thus incorrectly R-M-W'ing the field
next door.

Cheers,
Ben.





[PATCH v3] arm64, ia64, ppc, s390, sh, tile, um, x86, mm: Remove default gate area

2014-07-13 Thread Andy Lutomirski
The core mm code will provide a default gate area based on
FIXADDR_USER_START and FIXADDR_USER_END if
!defined(__HAVE_ARCH_GATE_AREA) && defined(AT_SYSINFO_EHDR).

This default is only useful for ia64.  arm64, ppc, s390, sh, tile,
64-bit UML, and x86_32 have their own code just to disable it.  arm,
32-bit UML, and x86_64 have gate areas, but they have their own
implementations.

This gets rid of the default and moves the code into ia64.

This should save some code on architectures without a gate area: it's
now possible to inline the gate_area functions in the default case.

Acked-by: Will Deacon 
Cc: Catalin Marinas 
Cc: Will Deacon 
Cc: Tony Luck 
Cc: Fenghua Yu 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Martin Schwidefsky 
Cc: Heiko Carstens 
Cc: linux...@de.ibm.com
Cc: Chris Metcalf 
Cc: Jeff Dike 
Cc: Richard Weinberger 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: "H. Peter Anvin" 
Cc: Nathan Lynch 
Cc: x...@kernel.org
Cc: linux-a...@vger.kernel.org
Cc: linux-arm-ker...@lists.infradead.org
Cc: linux-ker...@vger.kernel.org
Cc: linux-i...@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-s...@vger.kernel.org
Cc: linux...@vger.kernel.org
Cc: user-mode-linux-de...@lists.sourceforge.net
Cc: linux...@kvack.org
Signed-off-by: Andy Lutomirski 
---

It would be nice to get this into some tree that's in -next and that
people can base on.

Changes from v1 and v2: Nothing except Will Deacon's ack and splitting
this out from the larger series.

 arch/arm64/include/asm/page.h  |  3 ---
 arch/arm64/kernel/vdso.c   | 19 ---
 arch/ia64/include/asm/page.h   |  2 ++
 arch/ia64/mm/init.c| 26 ++
 arch/powerpc/include/asm/page.h|  3 ---
 arch/powerpc/kernel/vdso.c | 16 
 arch/s390/include/asm/page.h   |  2 --
 arch/s390/kernel/vdso.c| 15 ---
 arch/sh/include/asm/page.h |  5 -
 arch/sh/kernel/vsyscall/vsyscall.c | 15 ---
 arch/tile/include/asm/page.h   |  6 --
 arch/tile/kernel/vdso.c| 15 ---
 arch/um/include/asm/page.h |  5 +
 arch/x86/include/asm/page.h|  1 -
 arch/x86/include/asm/page_64.h |  2 ++
 arch/x86/um/asm/elf.h  |  1 -
 arch/x86/um/mem_64.c   | 15 ---
 arch/x86/vdso/vdso32-setup.c   | 19 +--
 include/linux/mm.h | 17 -
 mm/memory.c| 38 --
 mm/nommu.c |  5 -
 21 files changed, 48 insertions(+), 182 deletions(-)

diff --git a/arch/arm64/include/asm/page.h b/arch/arm64/include/asm/page.h
index 46bf666..992710f 100644
--- a/arch/arm64/include/asm/page.h
+++ b/arch/arm64/include/asm/page.h
@@ -28,9 +28,6 @@
 #define PAGE_SIZE  (_AC(1,UL) << PAGE_SHIFT)
 #define PAGE_MASK  (~(PAGE_SIZE-1))
 
-/* We do define AT_SYSINFO_EHDR but don't use the gate mechanism */
-#define __HAVE_ARCH_GATE_AREA  1
-
 #ifndef __ASSEMBLY__
 
 #ifdef CONFIG_ARM64_64K_PAGES
diff --git a/arch/arm64/kernel/vdso.c b/arch/arm64/kernel/vdso.c
index 50384fe..f630626 100644
--- a/arch/arm64/kernel/vdso.c
+++ b/arch/arm64/kernel/vdso.c
@@ -187,25 +187,6 @@ const char *arch_vma_name(struct vm_area_struct *vma)
 }
 
 /*
- * We define AT_SYSINFO_EHDR, so we need these function stubs to keep
- * Linux happy.
- */
-int in_gate_area_no_mm(unsigned long addr)
-{
-   return 0;
-}
-
-int in_gate_area(struct mm_struct *mm, unsigned long addr)
-{
-   return 0;
-}
-
-struct vm_area_struct *get_gate_vma(struct mm_struct *mm)
-{
-   return NULL;
-}
-
-/*
  * Update the vDSO data page to keep in sync with kernel timekeeping.
  */
 void update_vsyscall(struct timekeeper *tk)
diff --git a/arch/ia64/include/asm/page.h b/arch/ia64/include/asm/page.h
index f1e1b2e..1f1bf14 100644
--- a/arch/ia64/include/asm/page.h
+++ b/arch/ia64/include/asm/page.h
@@ -231,4 +231,6 @@ get_order (unsigned long size)
 #define PERCPU_ADDR(-PERCPU_PAGE_SIZE)
 #define LOAD_OFFSET(KERNEL_START - KERNEL_TR_PAGE_SIZE)
 
+#define __HAVE_ARCH_GATE_AREA  1
+
 #endif /* _ASM_IA64_PAGE_H */
diff --git a/arch/ia64/mm/init.c b/arch/ia64/mm/init.c
index 25c3502..35efaa3 100644
--- a/arch/ia64/mm/init.c
+++ b/arch/ia64/mm/init.c
@@ -278,6 +278,32 @@ setup_gate (void)
ia64_patch_gate();
 }
 
+static struct vm_area_struct gate_vma;
+
+static int __init gate_vma_init(void)
+{
+   gate_vma.vm_mm = NULL;
+   gate_vma.vm_start = FIXADDR_USER_START;
+   gate_vma.vm_end = FIXADDR_USER_END;
+   gate_vma.vm_flags = VM_READ | VM_MAYREAD | VM_EXEC | VM_MAYEXEC;
+   gate_vma.vm_page_prot = __P101;
+
+   return 0;
+}
+__initcall(gate_vma_init);
+
+struct vm_area_struct *get_gate_vma(struct mm_struct *mm)
+{
+   return &gate_vma;
+}
+
+int in_gate_area_no_mm(unsigned long addr)
+{
+   if ((addr >= FIXADDR_USER_START) && 

Re: bit fields && data tearing

2014-07-13 Thread Peter Hurley

On 07/12/2014 07:34 PM, Benjamin Herrenschmidt wrote:

On Sat, 2014-07-12 at 22:51 +0200, Oleg Nesterov wrote:

OK, looks like this is compiler bug,

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=52080

Thanks to Dan who informed me privately.


So yes, there is this compiler bug, which means a bitfield
access can cause a r-m-w access to a neighbouring field, but
in general, I would be wary of bitfields anyway, since accessing
them isn't going to be atomic anyway... it's too easy to get things
wrong and in most cases the benefit is yet to be demonstrated.


I'm not sure I understand your point here, Ben.

Suppose that two different spinlocks are used independently to
protect r-m-w access to adjacent data. In Oleg's example,
suppose spinlock 1 is used for access to the bitfield and
spinlock 2 is used for access to freeze_stop.

What would prevent an accidental write to freeze_stop from the
kt_1 thread?

Regards,
Peter Hurley


In your example, I don't see the point of the bitfield.

Cheers,
Ben.


On 07/12, Oleg Nesterov wrote:


Hello,

I am not sure I should ask here, but since Documentation/memory-barriers.txt
mentions load/store tearing perhaps my question is not completely off-topic...

I am fighting with a mysterious RHEL bug; it can be reproduced on ppc and s390
but not on x86. Finally I seem to understand the problem, and I even wrote a
stupid kernel module to verify it, see it below at the end.

It triggers the problem immediately; kt_2() sees the wrong value in freeze_stop.
(If I turn ->freeze_stop into a "long", the problem goes away.)

So the question is: is this gcc bug or the code below is buggy?

If it is buggy, then probably memory-barriers.txt could mention that you should
be careful with bit fields; even ACCESS_ONCE() obviously can't help.

Or does this just disclose my ignorance, and you need at least an aligned(long)
after a bit field to be thread-safe? I thought that the compiler should take
care and add the necessary alignment if (say) the CPU can't update a single
byte/uint.

gcc version 4.4.7 20120313 (Red Hat 4.4.7-9) (GCC). Asm:

 <.kt_2>:
   0:   7c 08 02 a6 mflr    r0
   4:   fb 81 ff e0 std r28,-32(r1)
   8:   fb a1 ff e8 std r29,-24(r1)
   c:   fb c1 ff f0 std r30,-16(r1)
  10:   fb e1 ff f8 std r31,-8(r1)
  14:   eb c2 00 00 ld  r30,0(r2)
  18:   f8 01 00 10 std r0,16(r1)
  1c:   f8 21 ff 71 stdu    r1,-144(r1)
  20:   7c 7d 1b 78 mr  r29,r3
  24:   3b e0 00 00 li  r31,0
  28:   78 3c 04 64 rldicr  r28,r1,0,49
  2c:   3b 9c 00 80 addir28,r28,128
  30:   48 00 00 2c b   5c <.kt_2+0x5c>
  34:   60 00 00 00 nop
  38:   60 00 00 00 nop
  3c:   60 00 00 00 nop
  40:   93 fd 00 04 stw r31,4(r29)
  44:   e8 9d 00 06 lwa r4,4(r29)
  48:   7f 84 f8 00 cmpw    cr7,r4,r31
  4c:   40 de 00 4c bne-    cr7,98 <.kt_2+0x98>
  50:   e8 1c 00 00 ld  r0,0(r28)
  54:   78 09 f7 e3 rldicl. r9,r0,62,63
  58:   40 c2 00 54 bne-    ac <.kt_2+0xac>
  5c:   48 00 00 01 bl  5c <.kt_2+0x5c>
  60:   60 00 00 00 nop
  64:   3b ff 00 01 addir31,r31,1
  68:   2f a3 00 00 cmpdi   cr7,r3,0
  6c:   7f ff 07 b4 extsw   r31,r31
  70:   41 9e ff d0 beq+    cr7,40 <.kt_2+0x40>
  74:   38 21 00 90 addir1,r1,144
  78:   38 60 00 00 li  r3,0
  7c:   e8 01 00 10 ld  r0,16(r1)
  80:   eb 81 ff e0 ld  r28,-32(r1)
  84:   eb a1 ff e8 ld  r29,-24(r1)
  88:   eb c1 ff f0 ld  r30,-16(r1)
  8c:   eb e1 ff f8 ld  r31,-8(r1)
  90:   7c 08 03 a6 mtlr    r0
  94:   4e 80 00 20 blr
  98:   e8 7e 80 28 ld  r3,-32728(r30)
  9c:   7f e5 fb 78 mr  r5,r31
  a0:   48 00 00 01 bl  a0 <.kt_2+0xa0>
  a4:   60 00 00 00 nop
  a8:   4b ff ff a8 b   50 <.kt_2+0x50>
  ac:   48 00 00 01 bl  ac <.kt_2+0xac>
  b0:   60 00 00 00 nop
  b4:   4b ff ff a8 b   5c <.kt_2+0x5c>
  b8:   60 00 00 00 nop
  bc:   60 00 00 00 nop

00c0 <.kt_1>:
  c0:   7c 08 02 a6 mflr    r0
  c4:   fb 81 ff e0 std r28,-32(r1)
  c8:   fb a1 ff e8 std r29,-24(r1)
  cc:   fb c1 ff f0 std r30,-16(r1)
  d0:   fb e1 ff f8 std r31,-8(r1)
  d4:   eb c2 00 00 ld  r30,0(r2)
  d8:   f8 01 00 10 std r0,16(r1)
  dc:   f8 21 ff 71 stdu    r1,-144(r1)
  e0:   7c 7d 1b 78 mr  r29,r3
  e4:   3b e0 00 00 li  r31,0
  e8:   78 3c 04 64

Re: bit fields && data tearing

2014-07-13 Thread Oleg Nesterov
On 07/13, Benjamin Herrenschmidt wrote:
>
> On Sat, 2014-07-12 at 22:51 +0200, Oleg Nesterov wrote:
> > OK, looks like this is compiler bug,
> >
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=52080
> >
> > Thanks to Dan who informed me privately.
>
> So yes, there is this compiler bug, which means a bitfield
> access can cause a r-m-w access to a neighbouring field

Thanks. So I can forward this all back to bugzilla.

> but
> in general, I would be wary of bitfields anyway, since accessing
> them isn't going to be atomic anyway... it's too easy to get things
> wrong and in most cases the benefit is yet to be demonstrated.

Sure, bit fields should be used with care. But the same arguments apply
to bitmasks; they are often used without the "atomic" set_bit/clear_bit.

> In your example, I don't see the point of the bitfield.

This is just test-case. The real code has more adjacent bit fields, only
the tracee can modify them, and only debugger can change ->freeze_stop.

Thanks,

Oleg.
