Re: [PATCH v2 00/18] Cross-architecture definitions of relaxed MMIO accessors
On Tue, May 27, 2014 at 09:23:30PM +0100, Benjamin Herrenschmidt wrote:
> On Tue, 2014-05-27 at 20:34 +0100, Will Deacon wrote:
>
> > Do you mean the io{read,write} functions? Funnily enough, they're already
> > relaxed on ARM if you go by the semantics I've proposed. That implies we at
> > least need some Documentation to that effect...
> >
> > What do you do on ppc?
>
> They are not supposed to be relaxed. If they are, you probably have a
> whole lot of busted drivers :-)

Lucky me!

> They have the same semantics as readl/writel for memory and as inb/outb
> for IO space, they just allow to hide the "type" (memory vs. IO) from
> most of the driver code.
>
> We probably need to create a set of _relaxed variants.

Ok. I'll try putting together a v3 including this and the mmiowb work.

Thanks for the feedback,

Will
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v2 00/18] Cross-architecture definitions of relaxed MMIO accessors
On Tue, May 27, 2014 at 09:21:38PM +0100, Benjamin Herrenschmidt wrote:
> On Tue, 2014-05-27 at 20:32 +0100, Will Deacon wrote:
>
> > Why would you need two barriers? I would have thought an mmiowb() inlined
> > into writel after the store operation would be sufficient. Or is this to
> > ensure a non-relaxed write is ordered with respect to a relaxed write?
>
> Well, so the non-relaxed writel would have to do:
>
>     sync
>     store
>     sync
>
> The first sync is to synchronize with DMAs, so that a sequence of
>
>     store to mem
>     writel
>
> Remains ordered vs. the device (ie, when the writel causes the device
> to do a DMA, it will see the previous store to mem).
>
> The second sync is needed as mmiowb, to order with unlocks.

Ah yeah, thanks. I was so hung up on the ordering against locks that I
completely forgot about DMA!

> At this point, I'm keen on keeping my per-cpu trick to avoid that
> second one in most cases.

Makes sense. The alternative is dropping that requirement and instead
relying on drivers to use mmiowb() even with the non-relaxed accessors, but
I think that's going to be fairly painful (and hence why you have the trick
to start with).

Will
Re: [PATCH v2 00/18] Cross-architecture definitions of relaxed MMIO accessors
On Tue, 2014-05-27 at 20:34 +0100, Will Deacon wrote:

> Do you mean the io{read,write} functions? Funnily enough, they're already
> relaxed on ARM if you go by the semantics I've proposed. That implies we at
> least need some Documentation to that effect...
>
> What do you do on ppc?

They are not supposed to be relaxed. If they are, you probably have a
whole lot of busted drivers :-)

They have the same semantics as readl/writel for memory and as inb/outb
for IO space, they just allow to hide the "type" (memory vs. IO) from
most of the driver code.

We probably need to create a set of _relaxed variants.

Cheers,
Ben.
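The "hide the type" point above can be pictured with a small user-space
model. This is only a sketch: struct model_iomap, model_readl(), model_inl()
and model_ioread32() are hypothetical names invented for illustration, not
the kernel's implementation, and the "registers" are plain variables.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tag recording which address space a mapping refers to. */
enum model_space { MODEL_SPACE_MEM, MODEL_SPACE_IO };

/* Hypothetical opaque handle a driver would hold after mapping a BAR. */
struct model_iomap {
    enum model_space space;
    uint32_t *cell;          /* plain memory standing in for a register */
};

/* Counters so a caller can see which path was taken. */
unsigned int model_mem_reads;
unsigned int model_port_reads;

/* Stand-in for a memory-space read (readl-like). */
uint32_t model_readl(const uint32_t *cell)
{
    model_mem_reads++;
    return *cell;
}

/* Stand-in for an IO-space read (inl-like). */
uint32_t model_inl(const uint32_t *cell)
{
    model_port_reads++;
    return *cell;
}

/* The driver calls one function; the dispatch on "type" is hidden here. */
uint32_t model_ioread32(const struct model_iomap *map)
{
    assert(map != NULL && map->cell != NULL);
    if (map->space == MODEL_SPACE_MEM)
        return model_readl(map->cell);
    return model_inl(map->cell);
}
```

Because the wrapper simply forwards to one of the two underlying accessors,
it inherits their ordering guarantees, which is why a separate set of
_relaxed variants would be needed to get relaxed behaviour through it.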
Re: [PATCH v2 00/18] Cross-architecture definitions of relaxed MMIO accessors
On Tue, 2014-05-27 at 20:32 +0100, Will Deacon wrote:

> Why would you need two barriers? I would have thought an mmiowb() inlined
> into writel after the store operation would be sufficient. Or is this to
> ensure a non-relaxed write is ordered with respect to a relaxed write?

Well, so the non-relaxed writel would have to do:

    sync
    store
    sync

The first sync is to synchronize with DMAs, so that a sequence of

    store to mem
    writel

Remains ordered vs. the device (ie, when the writel causes the device
to do a DMA, it will see the previous store to mem).

The second sync is needed as mmiowb, to order with unlocks.

At this point, I'm keen on keeping my per-cpu trick to avoid that
second one in most cases.

> Anyway, we may need something similar for other architectures with mmiowb
> implementations:
>
>     blackfin
>     frv
>     ia64
>     mips
>     sh
>
> so I'm anticipating some more discussion when I try to push that patch :)
>
> Cheers,
>
> Will
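The trade-off between the unconditional sync/store/sync form and the per-cpu
trick can be sketched as a user-space model that only records which
operations would be issued. Everything here is an assumption made for
illustration: the model_* names are hypothetical, a single global flag stands
in for the per-cpu flag, and no real barrier or MMIO access is performed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Operations the modelled CPU would issue, recorded for inspection. */
enum model_op { MODEL_SYNC, MODEL_STORE };

#define MODEL_LOG_MAX 32
enum model_op model_log[MODEL_LOG_MAX];
size_t model_log_len;

/* Hypothetical stand-in for the per-cpu "MMIO store pending" flag. */
bool model_io_sync_pending;

void model_reset(void)
{
    model_log_len = 0;
    model_io_sync_pending = false;
}

void model_emit(enum model_op op)
{
    assert(model_log_len < MODEL_LOG_MAX);
    model_log[model_log_len++] = op;
}

/* Ordered write, unconditional form: sync; store; sync. */
void model_writel_two_barriers(void)
{
    model_emit(MODEL_SYNC);   /* order earlier stores to memory (DMA) */
    model_emit(MODEL_STORE);  /* the MMIO store itself */
    model_emit(MODEL_SYNC);   /* acts as mmiowb, orders against unlock */
}

/* Ordered write with the flag trick: the trailing sync is deferred. */
void model_writel_flagged(void)
{
    model_emit(MODEL_SYNC);
    model_emit(MODEL_STORE);
    model_io_sync_pending = true;
}

/* Unlock pays for one sync, and only if an MMIO store happened. */
void model_spin_unlock(void)
{
    if (model_io_sync_pending) {
        model_emit(MODEL_SYNC);
        model_io_sync_pending = false;
    }
    /* the real lock release would follow here */
}
```

In this model, three ordered writes inside one critical section cost nine
recorded operations in the unconditional form and seven with the flag, and an
unlock with no preceding MMIO store records nothing.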
Re: [PATCH v2 00/18] Cross-architecture definitions of relaxed MMIO accessors
On Sun, May 25, 2014 at 10:47:50PM +0100, Benjamin Herrenschmidt wrote:
> On Thu, 2014-05-22 at 17:47 +0100, Will Deacon wrote:
> > Hi all,
> >
> > This is version 2 of the series I originally posted here:
> >
> >   https://lkml.org/lkml/2014/4/17/269
> >
> > Changes since v1 include:
> >
> >   - Added relevant acks from arch maintainers
> >   - Fixed potential compiler re-ordering issue for x86 definitions
> >
> > I'd *really* appreciate some feedback on the proposed semantics here, but
> > acks are still good :)
> >
> > The original cover letter is duplicated below.
>
> Question (sorry if I missed an existing explanation...), do we have an
> equivalent bunch for iomap ?

Do you mean the io{read,write} functions? Funnily enough, they're already
relaxed on ARM if you go by the semantics I've proposed. That implies we at
least need some Documentation to that effect...

What do you do on ppc?

Will
Re: [PATCH v2 00/18] Cross-architecture definitions of relaxed MMIO accessors
Hi Ben,

On Sun, May 25, 2014 at 10:46:03PM +0100, Benjamin Herrenschmidt wrote:
> On Thu, 2014-05-22 at 17:47 +0100, Will Deacon wrote:
> > A corollary to this is that mmiowb() probably needs rethinking. As it
> > currently stands, an mmiowb() is required to order MMIO writes to a
> > device from multiple CPUs, even if that device is protected by a lock.
> > However, this isn't often used in practice, leading to PowerPC
> > implementing both mmiowb() *and* synchronising I/O in spin_unlock.
> >
> > I would propose making the non-relaxed I/O accessors ordered with
> > respect to LOCK/UNLOCK, leaving mmiowb() to be used with the relaxed
> > accessors, if required, but would welcome thoughts/suggestions on this
> > topic.
>
> I agree on the proposed semantics, though for us that does mean we still
> need that per-cpu flag tracking non-relaxed MMIO stores and corresponding
> added barrier in unlock. Eventually, if the use of the relaxed accessors
> becomes pervasive enough I suppose I can just make the ordered ones
> unconditionally do 2 barriers.

Why would you need two barriers? I would have thought an mmiowb() inlined
into writel after the store operation would be sufficient. Or is this to
ensure a non-relaxed write is ordered with respect to a relaxed write?

Anyway, we may need something similar for other architectures with mmiowb
implementations:

    blackfin
    frv
    ia64
    mips
    sh

so I'm anticipating some more discussion when I try to push that patch :)

Cheers,

Will
Re: [PATCH v2 00/18] Cross-architecture definitions of relaxed MMIO accessors
On Thu, 2014-05-22 at 17:47 +0100, Will Deacon wrote:
> Hi all,
>
> This is version 2 of the series I originally posted here:
>
>   https://lkml.org/lkml/2014/4/17/269
>
> Changes since v1 include:
>
>   - Added relevant acks from arch maintainers
>   - Fixed potential compiler re-ordering issue for x86 definitions
>
> I'd *really* appreciate some feedback on the proposed semantics here, but
> acks are still good :)
>
> The original cover letter is duplicated below.

Question (sorry if I missed an existing explanation...), do we have an
equivalent bunch for iomap ?

Cheers,
Ben.

> Cheers,
>
> Will
>
> --->8
>
> This RFC series attempts to define a portable (i.e. cross-architecture)
> definition of the {readX,writeX}_relaxed MMIO accessor functions. These
> functions are already in widespread use amongst drivers (mainly those
> supporting devices embedded in ARM SoCs), but lack any well-defined
> semantics and, consequently, any portable definitions to allow these
> drivers to be compiled for other architectures.
>
> The two main motivations for this series are:
>
>   (1) To promote use of the _relaxed MMIO accessors on weakly-ordered
>       architectures, where they can bring significant performance
>       improvements over their non-relaxed counterparts.
>
>   (2) To allow COMPILE_TEST to build drivers using the relaxed accessors
>       across all architectures.
>
> The proposed semantics largely match those provided by the ARM
> implementation (i.e. no weaker), with one exception (see below).
>
> Informally:
>
>   - Relaxed accesses to the same device are ordered with respect to each
>     other.
>
>   - Relaxed accesses are *not* guaranteed to be ordered with respect to
>     normal memory accesses (e.g. DMA buffers -- this is what gives us the
>     performance boost over the non-relaxed versions).
>
>   - Relaxed accesses are not guaranteed to be ordered with respect to
>     LOCK/UNLOCK operations.
>
> In actual fact, the relaxed accessors *are* ordered with respect to
> LOCK/UNLOCK operations on ARM[64], but I have added this constraint for
> the benefit of PowerPC, which has expensive I/O barriers in the
> spin_unlock path for the non-relaxed accessors.
>
> A corollary to this is that mmiowb() probably needs rethinking. As it
> currently stands, an mmiowb() is required to order MMIO writes to a
> device from multiple CPUs, even if that device is protected by a lock.
> However, this isn't often used in practice, leading to PowerPC
> implementing both mmiowb() *and* synchronising I/O in spin_unlock.
>
> I would propose making the non-relaxed I/O accessors ordered with respect
> to LOCK/UNLOCK, leaving mmiowb() to be used with the relaxed accessors, if
> required, but would welcome thoughts/suggestions on this topic.
>
>
> Will Deacon (18):
>   asm-generic: io: implement relaxed accessor macros as conditional
>     wrappers
>   microblaze: io: remove dummy relaxed accessor macros
>   s390: io: remove dummy relaxed accessor macros for reads
>   xtensa: io: remove dummy relaxed accessor macros for reads
>   alpha: io: implement relaxed accessor macros for writes
>   frv: io: implement dummy relaxed accessor macros for writes
>   cris: io: implement dummy relaxed accessor macros for writes
>   ia64: io: implement dummy relaxed accessor macros for writes
>   m32r: io: implement dummy relaxed accessor macros for writes
>   m68k: io: implement dummy relaxed accessor macros for writes
>   mn10300: io: implement dummy relaxed accessor macros for writes
>   parisc: io: implement dummy relaxed accessor macros for writes
>   powerpc: io: implement dummy relaxed accessor macros for writes
>   sparc: io: implement dummy relaxed accessor macros for writes
>   tile: io: implement dummy relaxed accessor macros for writes
>   x86: io: implement dummy relaxed accessor macros for writes
>   documentation: memory-barriers: clarify relaxed io accessor semantics
>   asm-generic: io: define relaxed accessor macros unconditionally
>
>  Documentation/memory-barriers.txt | 13 +
>  arch/alpha/include/asm/io.h       | 12
>  arch/cris/include/asm/io.h        |  3 +++
>  arch/frv/include/asm/io.h         |  3 +++
>  arch/ia64/include/asm/io.h        |  4
>  arch/m32r/include/asm/io.h        |  3 +++
>  arch/m68k/include/asm/io.h        |  8
>  arch/m68k/include/asm/io_no.h     |  4
>  arch/microblaze/include/asm/io.h  |  8
>  arch/mn10300/include/asm/io.h     |  4
>  arch/parisc/include/asm/io.h      | 12
>  arch/powerpc/include/asm/io.h     | 12
>  arch/s390/include/asm/io.h        |  5 -
>  arch/sparc/include/asm/io.h       |  9 +
>  arch/sparc/include/asm/io_32.h    |  3 ---
>  arch/sparc/include/asm/io_64.h    | 22 ++
>  arch/tile/include/asm/io.h        |  4
>  arch/x86/include/asm/io.h         | 10 +++---
>  arch/xtensa/include/asm/io.h      |  7 ---
>  inclu
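The informal semantics in the cover letter suggest a common driver pattern:
program a descriptor with relaxed writes, then use one non-relaxed write as
the doorbell so that earlier stores to the DMA buffer are ordered before it.
The following is a user-space model of that pattern only. struct fake_dev,
the model_* functions and the bus_addr parameter are hypothetical, the
"registers" are ordinary memory, and the C11 fence merely stands in for
whatever barrier an architecture would really use.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical device registers, backed by ordinary memory for the model. */
struct fake_dev {
    volatile uint32_t desc_addr;
    volatile uint32_t desc_len;
    volatile uint32_t doorbell;
};

/*
 * Relaxed write: ordered only against other accesses to the same device.
 * In this model that ordering comes from the compiler keeping volatile
 * accesses in program order.
 */
void model_writel_relaxed(uint32_t val, volatile uint32_t *reg)
{
    *reg = val;
}

/*
 * Non-relaxed write: additionally ordered after earlier stores to normal
 * memory. The fence is an assumption standing in for the real barrier.
 */
void model_writel(uint32_t val, volatile uint32_t *reg)
{
    atomic_thread_fence(memory_order_seq_cst);
    *reg = val;
}

/* Fill a buffer, describe it to the device, then ring the doorbell. */
uint32_t model_start_dma(struct fake_dev *dev, uint32_t *buf, uint32_t len,
                         uint32_t bus_addr)
{
    assert(dev != NULL && buf != NULL);

    for (uint32_t i = 0; i < len; i++)
        buf[i] = i;                       /* normal memory stores */

    /* No ordering against the buffer stores is needed yet. */
    model_writel_relaxed(bus_addr, &dev->desc_addr);
    model_writel_relaxed(len, &dev->desc_len);

    /* The doorbell must not pass the buffer stores above. */
    model_writel(1, &dev->doorbell);

    return dev->doorbell;
}
```

Only the final write pays for the barrier; the two descriptor writes rely
solely on same-device ordering, which is where the performance benefit
described in motivation (1) would come from.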
Re: [PATCH v2 00/18] Cross-architecture definitions of relaxed MMIO accessors
On Thu, 2014-05-22 at 17:47 +0100, Will Deacon wrote:

> A corollary to this is that mmiowb() probably needs rethinking. As it
> currently stands, an mmiowb() is required to order MMIO writes to a
> device from multiple CPUs, even if that device is protected by a lock.
> However, this isn't often used in practice, leading to PowerPC
> implementing both mmiowb() *and* synchronising I/O in spin_unlock.
>
> I would propose making the non-relaxed I/O accessors ordered with respect
> to LOCK/UNLOCK, leaving mmiowb() to be used with the relaxed accessors, if
> required, but would welcome thoughts/suggestions on this topic.

I agree on the proposed semantics, though for us that does mean we still
need that per-cpu flag tracking non-relaxed MMIO stores and corresponding
added barrier in unlock. Eventually, if the use of the relaxed accessors
becomes pervasive enough I suppose I can just make the ordered ones
unconditionally do 2 barriers.

Cheers,
Ben.