Re: [PATCH v2 00/18] Cross-architecture definitions of relaxed MMIO accessors

2014-05-27 Thread Will Deacon
On Tue, May 27, 2014 at 09:23:30PM +0100, Benjamin Herrenschmidt wrote:
> On Tue, 2014-05-27 at 20:34 +0100, Will Deacon wrote:
> 
> > Do you mean the io{read,write} functions? Funnily enough, they're already
> > relaxed on ARM if you go by the semantics I've proposed. That implies we at
> > least need some Documentation to that effect...
> > 
> > What do you do on ppc?
> 
> They are not supposed to be relaxed. If they are, you probably have a
> whole lot of busted drivers :-)

Lucky me!

> They have the same semantics as readl/writel for memory and as inb/outb
> for IO space; they just hide the "type" (memory vs. IO) from most of the
> driver code.
> 
> We probably need to create a set of _relaxed variants.

Ok. I'll try putting together a v3 including this and the mmiowb work.

Thanks for the feedback,

Will
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2 00/18] Cross-architecture definitions of relaxed MMIO accessors

2014-05-27 Thread Will Deacon
On Tue, May 27, 2014 at 09:21:38PM +0100, Benjamin Herrenschmidt wrote:
> On Tue, 2014-05-27 at 20:32 +0100, Will Deacon wrote:
> 
> > Why would you need two barriers? I would have thought an mmiowb() inlined
> > into writel after the store operation would be sufficient. Or is this to
> > ensure a non-relaxed write is ordered with respect to a relaxed write?
> 
> Well, so the non-relaxed writel would have to do:
> 
>   sync
>   store
>   sync
> 
> The first sync is to synchronize with DMAs, so that a sequence of
> 
>   store to mem
>   writel
> 
> Remains ordered vs. the device (ie, when the writel causes the device
> to do a DMA, it will see the previous store to mem).
> 
> The second sync is needed as mmiowb, to order with unlocks.

Ah yeah, thanks. I was so hung up on the ordering against locks that I
completely forgot about DMA!

> At this point, I'm keen on keeping my per-cpu trick to avoid that
> second one in most cases.

Makes sense. The alternative is dropping that requirement and instead
relying on drivers to use mmiowb() even with the non-relaxed accessors,
but I think that's going to be fairly painful (and hence why you have the
trick to start with).

Will


Re: [PATCH v2 00/18] Cross-architecture definitions of relaxed MMIO accessors

2014-05-27 Thread Benjamin Herrenschmidt
On Tue, 2014-05-27 at 20:34 +0100, Will Deacon wrote:

> Do you mean the io{read,write} functions? Funnily enough, they're already
> relaxed on ARM if you go by the semantics I've proposed. That implies we at
> least need some Documentation to that effect...
> 
> What do you do on ppc?

They are not supposed to be relaxed. If they are, you probably have a
whole lot of busted drivers :-)

They have the same semantics as readl/writel for memory and as inb/outb
for IO space; they just hide the "type" (memory vs. IO) from most of the
driver code.

We probably need to create a set of _relaxed variants.
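
To illustrate the point about hiding the "type", here is a minimal userspace sketch of an accessor that forwards to a memory or port backend. Every name in it (io_cookie, model_readl, model_inl, model_ioread32) is hypothetical and exists only for this example; it is not the kernel's iomap implementation, just a model of the idea that the wrapper inherits the semantics of whichever accessor it forwards to.

```c
#include <stdint.h>

/* Hypothetical userspace model: none of these names are kernel APIs. */
enum io_kind { IO_KIND_MMIO, IO_KIND_PORT };

struct io_cookie {
	enum io_kind kind;   /* which address space the cookie refers to */
	uint32_t *reg;       /* stands in for the mapped register */
};

static int mmio_reads, port_reads;   /* count which backend was used */

/* Stand-in for readl(): the memory-space accessor. */
static uint32_t model_readl(const uint32_t *reg)
{
	mmio_reads++;
	return *reg;
}

/* Stand-in for inl(): the IO-space accessor. */
static uint32_t model_inl(const uint32_t *reg)
{
	port_reads++;
	return *reg;
}

/* Stand-in for ioread32(): forwards to the matching backend. */
static uint32_t model_ioread32(const struct io_cookie *c)
{
	return c->kind == IO_KIND_MMIO ? model_readl(c->reg)
				       : model_inl(c->reg);
}
```

In this model the driver only ever calls model_ioread32(), so a _relaxed variant would presumably need its own relaxed backend for the memory case.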

Cheers,
Ben.




Re: [PATCH v2 00/18] Cross-architecture definitions of relaxed MMIO accessors

2014-05-27 Thread Benjamin Herrenschmidt
On Tue, 2014-05-27 at 20:32 +0100, Will Deacon wrote:

> Why would you need two barriers? I would have thought an mmiowb() inlined
> into writel after the store operation would be sufficient. Or is this to
> ensure a non-relaxed write is ordered with respect to a relaxed write?

Well, so the non-relaxed writel would have to do:

sync
store
sync

The first sync is to synchronize with DMAs, so that a sequence of

store to mem
writel

Remains ordered vs. the device (ie, when the writel causes the device
to do a DMA, it will see the previous store to mem).

The second sync is needed as mmiowb, to order with unlocks.
 
At this point, I'm keen on keeping my per-cpu trick to avoid that
second one in most cases.
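
The sync/store/sync sequence can be sketched as a small userspace trace model. This is only an illustration under stated assumptions: the barriers are recorded in an array rather than executed, and the names (record, mem_store, model_writel) are hypothetical, not the powerpc implementation.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical trace model; the kernel's real accessors are arch-specific. */
enum op { OP_SYNC, OP_MEM_STORE, OP_MMIO_STORE };

static enum op trace[8];
static size_t trace_len;

static void record(enum op o)
{
	if (trace_len < sizeof(trace) / sizeof(trace[0]))
		trace[trace_len++] = o;
}

/* Plain store to normal memory, e.g. filling a DMA descriptor. */
static void mem_store(uint32_t *p, uint32_t v)
{
	*p = v;
	record(OP_MEM_STORE);
}

/* Non-relaxed write as described: barrier, store, barrier. */
static void model_writel(uint32_t v, volatile uint32_t *reg)
{
	record(OP_SYNC);        /* orders the earlier memory store (DMA data) */
	*reg = v;
	record(OP_MMIO_STORE);
	record(OP_SYNC);        /* plays the role of mmiowb() before an unlock */
}
```

A "store to mem" followed by model_writel() yields the trace MEM_STORE, SYNC, MMIO_STORE, SYNC: the first barrier sits between the descriptor write and the register write, the second between the register write and whatever follows.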

> Anyway, we may need something similar for other architectures with mmiowb
> implementations:
> 
>   blackfin
>   frv
>   ia64
>   mips
>   sh
> 
> so I'm anticipating some more discussion when I try to push that patch :)
> 
> Cheers,
> 
> Will




Re: [PATCH v2 00/18] Cross-architecture definitions of relaxed MMIO accessors

2014-05-27 Thread Will Deacon
On Sun, May 25, 2014 at 10:47:50PM +0100, Benjamin Herrenschmidt wrote:
> On Thu, 2014-05-22 at 17:47 +0100, Will Deacon wrote:
> > Hi all,
> > 
> > This is version 2 of the series I originally posted here:
> > 
> >   https://lkml.org/lkml/2014/4/17/269
> > 
> > Changes since v1 include:
> > 
> >  - Added relevant acks from arch maintainers
> >  - Fixed potential compiler re-ordering issue for x86 definitions
> > 
> > I'd *really* appreciate some feedback on the proposed semantics here, but
> > acks are still good :)
> > 
> > The original cover letter is duplicated below.
> 
> Question (sorry if I missed an existing explanation...), do we have an
> equivalent bunch for iomap ?

Do you mean the io{read,write} functions? Funnily enough, they're already
relaxed on ARM if you go by the semantics I've proposed. That implies we at
least need some Documentation to that effect...

What do you do on ppc?

Will


Re: [PATCH v2 00/18] Cross-architecture definitions of relaxed MMIO accessors

2014-05-27 Thread Will Deacon
Hi Ben,

On Sun, May 25, 2014 at 10:46:03PM +0100, Benjamin Herrenschmidt wrote:
> On Thu, 2014-05-22 at 17:47 +0100, Will Deacon wrote:
> > A corollary to this is that mmiowb() probably needs rethinking. As it
> > currently stands, an mmiowb() is required to order MMIO writes to a device
> > from multiple CPUs, even if that device is protected by a lock. However,
> > this isn't often used in practice, leading to PowerPC implementing both
> > mmiowb() *and* synchronising I/O in spin_unlock.
> > 
> > I would propose making the non-relaxed I/O accessors ordered with respect to
> > LOCK/UNLOCK, leaving mmiowb() to be used with the relaxed accessors, if
> > required, but would welcome thoughts/suggestions on this topic.
> 
> I agree on the proposed semantics, though for us that does mean we still need
> that per-cpu flag tracking non-relaxed MMIO stores and corresponding added
> barrier in unlock. Eventually, if the use of the relaxed accessors becomes
> pervasive enough I suppose I can just make the ordered ones unconditionally
> do 2 barriers.

Why would you need two barriers? I would have thought an mmiowb() inlined
into writel after the store operation would be sufficient. Or is this to
ensure a non-relaxed write is ordered with respect to a relaxed write?
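
As a rough sketch of the single-barrier idea in the question above: an ordered write with the mmiowb() folded in after the store, next to a driver-style batch of relaxed writes that issues one explicit mmiowb() at the end. This is a userspace model with hypothetical names (model_mmiowb, model_writel, model_writel_relaxed, model_program_ring); barriers are counted rather than executed, and ordering against normal memory (DMA) is not modelled at all.

```c
#include <stdint.h>

/* Hypothetical model: counts barriers instead of issuing them. */
static int mmiowb_calls;

static void model_mmiowb(void)
{
	mmiowb_calls++;
}

/* Relaxed write: just the store. */
static void model_writel_relaxed(uint32_t v, volatile uint32_t *reg)
{
	*reg = v;
}

/* Ordered write with the mmiowb() folded in after the store. */
static void model_writel(uint32_t v, volatile uint32_t *reg)
{
	model_writel_relaxed(v, reg);
	model_mmiowb();
}

/* Batch of relaxed stores, then one explicit mmiowb() before an unlock. */
static void model_program_ring(volatile uint32_t *regs, int n)
{
	for (int i = 0; i < n; i++)
		model_writel_relaxed((uint32_t)i, &regs[i]);
	model_mmiowb();
}
```

The batch pays for one barrier regardless of how many registers it writes, which is the case the relaxed accessors are meant to help.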

Anyway, we may need something similar for other architectures with mmiowb
implementations:

  blackfin
  frv
  ia64
  mips
  sh

so I'm anticipating some more discussion when I try to push that patch :)

Cheers,

Will


Re: [PATCH v2 00/18] Cross-architecture definitions of relaxed MMIO accessors

2014-05-25 Thread Benjamin Herrenschmidt
On Thu, 2014-05-22 at 17:47 +0100, Will Deacon wrote:
> Hi all,
> 
> This is version 2 of the series I originally posted here:
> 
>   https://lkml.org/lkml/2014/4/17/269
> 
> Changes since v1 include:
> 
>  - Added relevant acks from arch maintainers
>  - Fixed potential compiler re-ordering issue for x86 definitions
> 
> I'd *really* appreciate some feedback on the proposed semantics here, but
> acks are still good :)
> 
> The original cover letter is duplicated below.

Question (sorry if I missed an existing explanation...), do we have an
equivalent bunch for iomap ?

Cheers,
Ben.

> Cheers,
> 
> Will
> 
> --->8
> 
> This RFC series attempts to define a portable (i.e. cross-architecture)
> definition of the {readX,writeX}_relaxed MMIO accessor functions. These
> functions are already in widespread use amongst drivers (mainly those
> supporting devices embedded in ARM SoCs), but lack any well-defined semantics
> and, subsequently, any portable definitions to allow these drivers to be
> compiled for other architectures.
> 
> The two main motivations for this series are:
> 
>  (1) To promote use of the _relaxed MMIO accessors on weakly-ordered
>      architectures, where they can bring significant performance improvements
>      over their non-relaxed counterparts.
> 
>  (2) To allow COMPILE_TEST to build drivers using the relaxed accessors across
>      all architectures.
> 
> The proposed semantics largely match exactly those provided by the ARM
> implementation (i.e. no weaker), with one exception (see below).
> 
> Informally:
> 
>   - Relaxed accesses to the same device are ordered with respect to each
>     other.
> 
>   - Relaxed accesses are *not* guaranteed to be ordered with respect to normal
>     memory accesses (e.g. DMA buffers -- this is what gives us the performance
>     boost over the non-relaxed versions).
> 
>   - Relaxed accesses are not guaranteed to be ordered with respect to
>     LOCK/UNLOCK operations.
> 
> In actual fact, the relaxed accessors *are* ordered with respect to
> LOCK/UNLOCK operations on ARM[64], but I have added this constraint for the
> benefit of PowerPC, which has expensive I/O barriers in the spin_unlock path
> for the non-relaxed accessors.
> 
> A corollary to this is that mmiowb() probably needs rethinking. As it
> currently stands, an mmiowb() is required to order MMIO writes to a device
> from multiple CPUs, even if that device is protected by a lock. However, this
> isn't often used in practice, leading to PowerPC implementing both mmiowb()
> *and* synchronising I/O in spin_unlock.
> 
> I would propose making the non-relaxed I/O accessors ordered with respect to
> LOCK/UNLOCK, leaving mmiowb() to be used with the relaxed accessors, if
> required, but would welcome thoughts/suggestions on this topic.
> 
> 
> Will Deacon (18):
>   asm-generic: io: implement relaxed accessor macros as conditional
>     wrappers
>   microblaze: io: remove dummy relaxed accessor macros
>   s390: io: remove dummy relaxed accessor macros for reads
>   xtensa: io: remove dummy relaxed accessor macros for reads
>   alpha: io: implement relaxed accessor macros for writes
>   frv: io: implement dummy relaxed accessor macros for writes
>   cris: io: implement dummy relaxed accessor macros for writes
>   ia64: io: implement dummy relaxed accessor macros for writes
>   m32r: io: implement dummy relaxed accessor macros for writes
>   m68k: io: implement dummy relaxed accessor macros for writes
>   mn10300: io: implement dummy relaxed accessor macros for writes
>   parisc: io: implement dummy relaxed accessor macros for writes
>   powerpc: io: implement dummy relaxed accessor macros for writes
>   sparc: io: implement dummy relaxed accessor macros for writes
>   tile: io: implement dummy relaxed accessor macros for writes
>   x86: io: implement dummy relaxed accessor macros for writes
>   documentation: memory-barriers: clarify relaxed io accessor semantics
>   asm-generic: io: define relaxed accessor macros unconditionally
> 
>  Documentation/memory-barriers.txt | 13 +
>  arch/alpha/include/asm/io.h       | 12
>  arch/cris/include/asm/io.h        |  3 +++
>  arch/frv/include/asm/io.h         |  3 +++
>  arch/ia64/include/asm/io.h        |  4
>  arch/m32r/include/asm/io.h        |  3 +++
>  arch/m68k/include/asm/io.h        |  8
>  arch/m68k/include/asm/io_no.h     |  4
>  arch/microblaze/include/asm/io.h  |  8
>  arch/mn10300/include/asm/io.h     |  4
>  arch/parisc/include/asm/io.h      | 12
>  arch/powerpc/include/asm/io.h     | 12
>  arch/s390/include/asm/io.h        |  5 -
>  arch/sparc/include/asm/io.h       |  9 +
>  arch/sparc/include/asm/io_32.h    |  3 ---
>  arch/sparc/include/asm/io_64.h    | 22 ++
>  arch/tile/include/asm/io.h        |  4
>  arch/x86/include/asm/io.h         | 10 +++---
>  arch/xtensa/include/asm/io.h      |  7 ---
>  inclu

Re: [PATCH v2 00/18] Cross-architecture definitions of relaxed MMIO accessors

2014-05-25 Thread Benjamin Herrenschmidt
On Thu, 2014-05-22 at 17:47 +0100, Will Deacon wrote:
> A corollary to this is that mmiowb() probably needs rethinking. As it
> currently stands, an mmiowb() is required to order MMIO writes to a device
> from multiple CPUs, even if that device is protected by a lock. However, this
> isn't often used in practice, leading to PowerPC implementing both mmiowb()
> *and* synchronising I/O in spin_unlock.
> 
> I would propose making the non-relaxed I/O accessors ordered with respect to
> LOCK/UNLOCK, leaving mmiowb() to be used with the relaxed accessors, if
> required, but would welcome thoughts/suggestions on this topic.

I agree on the proposed semantics, though for us that does mean we still need
that per-cpu flag tracking non-relaxed MMIO stores and corresponding added
barrier in unlock. Eventually, if the use of the relaxed accessors becomes
pervasive enough I suppose I can just make the ordered ones unconditionally
do 2 barriers.
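
For readers unfamiliar with the flag-and-deferred-barrier scheme, a minimal single-CPU sketch of it follows. It assumes one global flag standing in for the per-cpu one and counts barriers instead of executing them; the names (io_sync_pending, barrier_sync, model_writel, model_writel_relaxed, model_spin_unlock) are illustrative only and are not taken from the powerpc sources.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical single-CPU model; names are illustrative only. */
static bool io_sync_pending;   /* stands in for the per-cpu flag */
static int barriers;           /* counts modelled barrier instructions */

static void barrier_sync(void)
{
	barriers++;
}

/* Ordered write: one barrier before the store, the second one deferred. */
static void model_writel(uint32_t v, volatile uint32_t *reg)
{
	barrier_sync();
	*reg = v;
	io_sync_pending = true;
}

/* Relaxed write: no barrier, and nothing for unlock to clean up. */
static void model_writel_relaxed(uint32_t v, volatile uint32_t *reg)
{
	*reg = v;
}

/* Unlock pays for the second barrier only if an ordered write happened. */
static void model_spin_unlock(void)
{
	if (io_sync_pending) {
		barrier_sync();
		io_sync_pending = false;
	}
	/* the actual lock release would follow here */
}
```

In this model, two back-to-back ordered writes followed by an unlock cost three barriers, versus four if each ordered write did both barriers unconditionally; a critical section that uses only relaxed writes costs none.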

Cheers,
Ben.

