Re: [RFT PATCH v3 08/27] asm-generic/io.h: Add a non-posted variant of ioremap()

2021-03-25 Thread Will Deacon
On Thu, Mar 25, 2021 at 11:07:40PM +0900, Hector Martin wrote:
> On 25/03/2021 04.09, Arnd Bergmann wrote:
> > On Wed, Mar 24, 2021 at 7:12 PM Will Deacon  wrote:
> > > 
> > > > +/*
> > > > + * ioremap_np needs an explicit architecture implementation, as it
> > > > + * requests stronger semantics than regular ioremap(). Portable drivers
> > > > + * should instead use one of the higher-level abstractions, like
> > > > + * devm_ioremap_resource(), to choose the correct variant for any given
> > > > + * device and bus. Portable drivers with a good reason to want 
> > > > non-posted
> > > > + * write semantics should always provide an ioremap() fallback in case
> > > > + * ioremap_np() is not available.
> > > > + */
> > > > +#ifndef ioremap_np
> > > > +#define ioremap_np ioremap_np
> > > > +static inline void __iomem *ioremap_np(phys_addr_t offset, size_t size)
> > > > +{
> > > > + return NULL;
> > > > +}
> > > > +#endif
> > > 
> > > Can we implement the generic pci_remap_cfgspace() in terms of ioremap_np()
> > > if it is supported by the architecture? That way, we could avoid defining
> > > both on arm64.
> > 
> > Good idea. It needs a fallback in case the ioremap_np() fails on most
> > architectures, but that sounds easy enough.
> > 
> > Since pci_remap_cfgspace() only has custom implementations, it sounds like
> > we can actually make the generic implementation unconditional in the end,
> > but that requires adding ioremap_np() on 32-bit as well, and I would keep
> > that separate from this series.
> 
> Sounds good; I'm adding a patch to adjust the generic implementation and
> remove the arm64 one in v4, and we can then complete the cleanup for other
> arches later.

Cheers. Don't forget to update the comment in the generic code which
complains about the lack of an ioremap() API for non-posted writes ;)

Will


Re: [RFT PATCH v3 08/27] asm-generic/io.h: Add a non-posted variant of ioremap()

2021-03-25 Thread Hector Martin

On 25/03/2021 04.09, Arnd Bergmann wrote:

On Wed, Mar 24, 2021 at 7:12 PM Will Deacon  wrote:



+/*
+ * ioremap_np needs an explicit architecture implementation, as it
+ * requests stronger semantics than regular ioremap(). Portable drivers
+ * should instead use one of the higher-level abstractions, like
+ * devm_ioremap_resource(), to choose the correct variant for any given
+ * device and bus. Portable drivers with a good reason to want non-posted
+ * write semantics should always provide an ioremap() fallback in case
+ * ioremap_np() is not available.
+ */
+#ifndef ioremap_np
+#define ioremap_np ioremap_np
+static inline void __iomem *ioremap_np(phys_addr_t offset, size_t size)
+{
+ return NULL;
+}
+#endif


Can we implement the generic pci_remap_cfgspace() in terms of ioremap_np()
if it is supported by the architecture? That way, we could avoid defining
both on arm64.


Good idea. It needs a fallback in case the ioremap_np() fails on most
architectures, but that sounds easy enough.

Since pci_remap_cfgspace() only has custom implementations, it sounds like
we can actually make the generic implementation unconditional in the end,
but that requires adding ioremap_np() on 32-bit as well, and I would keep
that separate from this series.


Sounds good; I'm adding a patch to adjust the generic implementation and 
remove the arm64 one in v4, and we can then complete the cleanup for 
other arches later.


--
Hector Martin (mar...@marcan.st)
Public Key: https://mrcn.st/pub


Re: [RFT PATCH v3 08/27] asm-generic/io.h: Add a non-posted variant of ioremap()

2021-03-24 Thread Arnd Bergmann
On Wed, Mar 24, 2021 at 7:12 PM Will Deacon  wrote:
>
> > +/*
> > + * ioremap_np needs an explicit architecture implementation, as it
> > + * requests stronger semantics than regular ioremap(). Portable drivers
> > + * should instead use one of the higher-level abstractions, like
> > + * devm_ioremap_resource(), to choose the correct variant for any given
> > + * device and bus. Portable drivers with a good reason to want non-posted
> > + * write semantics should always provide an ioremap() fallback in case
> > + * ioremap_np() is not available.
> > + */
> > +#ifndef ioremap_np
> > +#define ioremap_np ioremap_np
> > +static inline void __iomem *ioremap_np(phys_addr_t offset, size_t size)
> > +{
> > + return NULL;
> > +}
> > +#endif
>
> Can we implement the generic pci_remap_cfgspace() in terms of ioremap_np()
> if it is supported by the architecture? That way, we could avoid defining
> both on arm64.

Good idea. It needs a fallback in case the ioremap_np() fails on most
architectures, but that sounds easy enough.

Since pci_remap_cfgspace() only has custom implementations, it sounds like
we can actually make the generic implementation unconditional in the end,
but that requires adding ioremap_np() on 32-bit as well, and I would keep
that separate from this series.

Arnd


Re: [RFT PATCH v3 08/27] asm-generic/io.h: Add a non-posted variant of ioremap()

2021-03-24 Thread Will Deacon
On Fri, Mar 05, 2021 at 06:38:43AM +0900, Hector Martin wrote:
> ARM64 currently defaults to posted MMIO (nGnRnE), but some devices
> require the use of non-posted MMIO (nGnRE). Introduce a new ioremap()
> variant to handle this case. ioremap_np() is aliased to ioremap() by
> default on arches that do not implement this variant.
> 
> sparc64 is the only architecture that needs to be touched directly,
> because it includes neither of the generic io.h or iomap.h headers.
> 
> This adds the IORESOURCE_MEM_NONPOSTED flag, which maps to this
> variant and marks a given resource as requiring non-posted mappings.
> This is implemented in the resource system because it is a SoC-level
> requirement, so existing drivers do not need special-case code to pick
> this ioremap variant.
> 
> Then this is implemented in devres by introducing devm_ioremap_np(),
> and making devm_ioremap_resource() automatically select this variant
> when the resource has the IORESOURCE_MEM_NONPOSTED flag set.
> 
> Signed-off-by: Hector Martin 
> ---
>  .../driver-api/driver-model/devres.rst|  1 +
>  arch/sparc/include/asm/io_64.h|  4 
>  include/asm-generic/io.h  | 22 ++-
>  include/asm-generic/iomap.h   |  9 
>  include/linux/io.h|  2 ++
>  include/linux/ioport.h|  1 +
>  lib/devres.c  | 22 +++
>  7 files changed, 60 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/driver-api/driver-model/devres.rst 
> b/Documentation/driver-api/driver-model/devres.rst
> index cd8b6e657b94..2f45877a539d 100644
> --- a/Documentation/driver-api/driver-model/devres.rst
> +++ b/Documentation/driver-api/driver-model/devres.rst
> @@ -309,6 +309,7 @@ IOMAP
>devm_ioremap()
>devm_ioremap_uc()
>devm_ioremap_wc()
> +  devm_ioremap_np()
>devm_ioremap_resource() : checks resource, requests memory region, ioremaps
>devm_ioremap_resource_wc()
>devm_platform_ioremap_resource() : calls devm_ioremap_resource() for 
> platform device
> diff --git a/arch/sparc/include/asm/io_64.h b/arch/sparc/include/asm/io_64.h
> index 9bb27e5c22f1..9fbfc9574432 100644
> --- a/arch/sparc/include/asm/io_64.h
> +++ b/arch/sparc/include/asm/io_64.h
> @@ -409,6 +409,10 @@ static inline void __iomem *ioremap(unsigned long 
> offset, unsigned long size)
>  #define ioremap_uc(X,Y)  ioremap((X),(Y))
>  #define ioremap_wc(X,Y)  ioremap((X),(Y))
>  #define ioremap_wt(X,Y)  ioremap((X),(Y))
> +static inline void __iomem *ioremap_np(unsigned long offset, unsigned long 
> size)
> +{
> + return NULL;
> +}
>  
>  static inline void iounmap(volatile void __iomem *addr)
>  {
> diff --git a/include/asm-generic/io.h b/include/asm-generic/io.h
> index c6af40ce03be..082e0c96db6e 100644
> --- a/include/asm-generic/io.h
> +++ b/include/asm-generic/io.h
> @@ -942,7 +942,9 @@ static inline void *phys_to_virt(unsigned long address)
>   *
>   * ioremap_wc() and ioremap_wt() can provide more relaxed caching attributes
>   * for specific drivers if the architecture choses to implement them.  If 
> they
> - * are not implemented we fall back to plain ioremap.
> + * are not implemented we fall back to plain ioremap. Conversely, 
> ioremap_np()
> + * can provide stricter non-posted write semantics if the architecture
> + * implements them.
>   */
>  #ifndef CONFIG_MMU
>  #ifndef ioremap
> @@ -993,6 +995,24 @@ static inline void __iomem *ioremap_uc(phys_addr_t 
> offset, size_t size)
>  {
>   return NULL;
>  }
> +
> +/*
> + * ioremap_np needs an explicit architecture implementation, as it
> + * requests stronger semantics than regular ioremap(). Portable drivers
> + * should instead use one of the higher-level abstractions, like
> + * devm_ioremap_resource(), to choose the correct variant for any given
> + * device and bus. Portable drivers with a good reason to want non-posted
> + * write semantics should always provide an ioremap() fallback in case
> + * ioremap_np() is not available.
> + */
> +#ifndef ioremap_np
> +#define ioremap_np ioremap_np
> +static inline void __iomem *ioremap_np(phys_addr_t offset, size_t size)
> +{
> + return NULL;
> +}
> +#endif

Can we implement the generic pci_remap_cfgspace() in terms of ioremap_np()
if it is supported by the architecture? That way, we could avoid defining
both on arm64.

Will


Re: [RFT PATCH v3 08/27] asm-generic/io.h: Add a non-posted variant of ioremap()

2021-03-08 Thread Marc Zyngier
On Thu, 04 Mar 2021 21:38:43 +,
Hector Martin  wrote:
> 
> ARM64 currently defaults to posted MMIO (nGnRnE), but some devices
> require the use of non-posted MMIO (nGnRE). Introduce a new ioremap()
> variant to handle this case. ioremap_np() is aliased to ioremap() by
> default on arches that do not implement this variant.
> 
> sparc64 is the only architecture that needs to be touched directly,
> because it includes neither of the generic io.h or iomap.h headers.
> 
> This adds the IORESOURCE_MEM_NONPOSTED flag, which maps to this
> variant and marks a given resource as requiring non-posted mappings.
> This is implemented in the resource system because it is a SoC-level
> requirement, so existing drivers do not need special-case code to pick
> this ioremap variant.
> 
> Then this is implemented in devres by introducing devm_ioremap_np(),
> and making devm_ioremap_resource() automatically select this variant
> when the resource has the IORESOURCE_MEM_NONPOSTED flag set.
> 
> Signed-off-by: Hector Martin 

Acked-by: Marc Zyngier 

M.

-- 
Without deviation from the norm, progress is not possible.


Re: [RFT PATCH v3 08/27] asm-generic/io.h: Add a non-posted variant of ioremap()

2021-03-05 Thread Hector Martin

On 05/03/2021 23.45, Andy Shevchenko wrote:

On Thu, Mar 4, 2021 at 11:40 PM Hector Martin  wrote:


ARM64 currently defaults to posted MMIO (nGnRnE), but some devices
require the use of non-posted MMIO (nGnRE). Introduce a new ioremap()
variant to handle this case. ioremap_np() is aliased to ioremap() by
default on arches that do not implement this variant.


Hmm... But isn't it basically a requirement to those device drivers to
use readX()/writeX() instead of readX_relaxed() / writeX_relaxed()?


No, the write ops used do not matter. It's just that on these Apple SoCs 
the fabric requires the mappings to be nGnRnE, else it just throws 
SErrors on all writes and ignores them.


The difference between _relaxed and not is barrier behavior with regards 
to DMA/memory accesses; this applies regardless of whether the writes 
are E or nE. You can have relaxed accesses with nGnRnE and then you 
would still have race conditions if you do not have a barrier between 
the MMIO and accessing DMA memory. What nGnRnE buys you (on 
platforms/buses where it works properly) is that you do need a dummy 
read after a write to ensure completion.


All of this is to some extent moot on these SoCs; it's not that we need 
the drivers to use nGnRnE for some correctness reason, it's that the 
SoCs force us to use it or else everything breaks, which was the 
motivation for this change. But since on most other SoCs both are valid 
options, this does allow some other drivers/platforms to opt into nGnRnE 
if they have a good reason to do so.


Though you just made me notice two mistakes in the commit description: 
first, it describes the old v2 version, for v3 I made ioremap_np() just 
return NULL on arches that don't implement it. Second, nGnRnE and nGnRE 
are backwards. Oops. I'll fix it for the next version.



  #define IORESOURCE_MEM_32BIT   (3<<3)
  #define IORESOURCE_MEM_SHADOWABLE  (1<<5)  /* dup: IORESOURCE_SHADOWABLE 
*/
  #define IORESOURCE_MEM_EXPANSIONROM(1<<6)
+#define IORESOURCE_MEM_NONPOSTED   (1<<7)


Not sure it's the right location (in a bit field) for this flag.


Do you have a better suggestion? It seemed logical to put it here, as a 
flag on memory-type I/O resources.


--
Hector Martin (mar...@marcan.st)
Public Key: https://mrcn.st/pub


Re: [RFT PATCH v3 08/27] asm-generic/io.h: Add a non-posted variant of ioremap()

2021-03-05 Thread Andy Shevchenko
On Thu, Mar 4, 2021 at 11:40 PM Hector Martin  wrote:
>
> ARM64 currently defaults to posted MMIO (nGnRnE), but some devices
> require the use of non-posted MMIO (nGnRE). Introduce a new ioremap()
> variant to handle this case. ioremap_np() is aliased to ioremap() by
> default on arches that do not implement this variant.

Hmm... But isn't it basically a requirement to those device drivers to
use readX()/writeX() instead of readX_relaxed() / writeX_relaxed()?

...

>  #define IORESOURCE_MEM_32BIT   (3<<3)
>  #define IORESOURCE_MEM_SHADOWABLE  (1<<5)  /* dup: IORESOURCE_SHADOWABLE 
> */
>  #define IORESOURCE_MEM_EXPANSIONROM(1<<6)
> +#define IORESOURCE_MEM_NONPOSTED   (1<<7)

Not sure it's the right location (in a bit field) for this flag.

-- 
With Best Regards,
Andy Shevchenko


[RFT PATCH v3 08/27] asm-generic/io.h: Add a non-posted variant of ioremap()

2021-03-04 Thread Hector Martin
ARM64 currently defaults to posted MMIO (nGnRnE), but some devices
require the use of non-posted MMIO (nGnRE). Introduce a new ioremap()
variant to handle this case. ioremap_np() is aliased to ioremap() by
default on arches that do not implement this variant.

sparc64 is the only architecture that needs to be touched directly,
because it includes neither of the generic io.h or iomap.h headers.

This adds the IORESOURCE_MEM_NONPOSTED flag, which maps to this
variant and marks a given resource as requiring non-posted mappings.
This is implemented in the resource system because it is a SoC-level
requirement, so existing drivers do not need special-case code to pick
this ioremap variant.

Then this is implemented in devres by introducing devm_ioremap_np(),
and making devm_ioremap_resource() automatically select this variant
when the resource has the IORESOURCE_MEM_NONPOSTED flag set.

Signed-off-by: Hector Martin 
---
 .../driver-api/driver-model/devres.rst|  1 +
 arch/sparc/include/asm/io_64.h|  4 
 include/asm-generic/io.h  | 22 ++-
 include/asm-generic/iomap.h   |  9 
 include/linux/io.h|  2 ++
 include/linux/ioport.h|  1 +
 lib/devres.c  | 22 +++
 7 files changed, 60 insertions(+), 1 deletion(-)

diff --git a/Documentation/driver-api/driver-model/devres.rst 
b/Documentation/driver-api/driver-model/devres.rst
index cd8b6e657b94..2f45877a539d 100644
--- a/Documentation/driver-api/driver-model/devres.rst
+++ b/Documentation/driver-api/driver-model/devres.rst
@@ -309,6 +309,7 @@ IOMAP
   devm_ioremap()
   devm_ioremap_uc()
   devm_ioremap_wc()
+  devm_ioremap_np()
   devm_ioremap_resource() : checks resource, requests memory region, ioremaps
   devm_ioremap_resource_wc()
   devm_platform_ioremap_resource() : calls devm_ioremap_resource() for 
platform device
diff --git a/arch/sparc/include/asm/io_64.h b/arch/sparc/include/asm/io_64.h
index 9bb27e5c22f1..9fbfc9574432 100644
--- a/arch/sparc/include/asm/io_64.h
+++ b/arch/sparc/include/asm/io_64.h
@@ -409,6 +409,10 @@ static inline void __iomem *ioremap(unsigned long offset, 
unsigned long size)
 #define ioremap_uc(X,Y)ioremap((X),(Y))
 #define ioremap_wc(X,Y)ioremap((X),(Y))
 #define ioremap_wt(X,Y)ioremap((X),(Y))
+static inline void __iomem *ioremap_np(unsigned long offset, unsigned long 
size)
+{
+   return NULL;
+}
 
 static inline void iounmap(volatile void __iomem *addr)
 {
diff --git a/include/asm-generic/io.h b/include/asm-generic/io.h
index c6af40ce03be..082e0c96db6e 100644
--- a/include/asm-generic/io.h
+++ b/include/asm-generic/io.h
@@ -942,7 +942,9 @@ static inline void *phys_to_virt(unsigned long address)
  *
  * ioremap_wc() and ioremap_wt() can provide more relaxed caching attributes
  * for specific drivers if the architecture choses to implement them.  If they
- * are not implemented we fall back to plain ioremap.
+ * are not implemented we fall back to plain ioremap. Conversely, ioremap_np()
+ * can provide stricter non-posted write semantics if the architecture
+ * implements them.
  */
 #ifndef CONFIG_MMU
 #ifndef ioremap
@@ -993,6 +995,24 @@ static inline void __iomem *ioremap_uc(phys_addr_t offset, 
size_t size)
 {
return NULL;
 }
+
+/*
+ * ioremap_np needs an explicit architecture implementation, as it
+ * requests stronger semantics than regular ioremap(). Portable drivers
+ * should instead use one of the higher-level abstractions, like
+ * devm_ioremap_resource(), to choose the correct variant for any given
+ * device and bus. Portable drivers with a good reason to want non-posted
+ * write semantics should always provide an ioremap() fallback in case
+ * ioremap_np() is not available.
+ */
+#ifndef ioremap_np
+#define ioremap_np ioremap_np
+static inline void __iomem *ioremap_np(phys_addr_t offset, size_t size)
+{
+   return NULL;
+}
+#endif
+
 #endif
 
 #ifdef CONFIG_HAS_IOPORT_MAP
diff --git a/include/asm-generic/iomap.h b/include/asm-generic/iomap.h
index 649224664969..9b3eb6d86200 100644
--- a/include/asm-generic/iomap.h
+++ b/include/asm-generic/iomap.h
@@ -101,6 +101,15 @@ extern void ioport_unmap(void __iomem *);
 #define ioremap_wt ioremap
 #endif
 
+#ifndef ARCH_HAS_IOREMAP_NP
+/* See the comment in asm-generic/io.h about ioremap_np(). */
+#define ioremap_np ioremap_np
+static inline void __iomem *ioremap_np(phys_addr_t offset, size_t size)
+{
+   return NULL;
+}
+#endif
+
 #ifdef CONFIG_PCI
 /* Destroy a virtual mapping cookie for a PCI BAR (memory or IO) */
 struct pci_dev;
diff --git a/include/linux/io.h b/include/linux/io.h
index 8394c56babc2..d718354ed3e1 100644
--- a/include/linux/io.h
+++ b/include/linux/io.h
@@ -68,6 +68,8 @@ void __iomem *devm_ioremap_uc(struct device *dev, 
resource_size_t offset,