Re: [PATCH 09/21] riscv: dma-mapping: skip invalidation before bidirectional DMA

2023-05-05 Thread Arnd Bergmann
On Fri, May 5, 2023, at 07:47, Guo Ren wrote:
> On Mon, Mar 27, 2023 at 8:15 PM Arnd Bergmann  wrote:

>>
>> riscv also invalidates the caches before the transfer, which does
>> not appear to serve any purpose.
> Yes, we can't guarantee that the CPU won't randomly pre-load cache
> lines while the DMA is in progress.
>
> But I have two reasons to keep the invalidation before the DMA transfer:
>  - We clearly tell the CPU that these cache lines are invalid. The cache
> replacement algorithm would then fill these invalid slots first instead
> of evicting valid ones.
>  - Invalidation is very cheap. Actually, flush and clean have the same
> performance on our machine.

The main purpose of the series was to get consistent behavior on
all machines, so I really don't want a custom optimization on
one architecture. You make a good point about cacheline reuse
after invalidation, but if we do that, I'd suggest doing this
across all architectures.

> So, how about:
>
> diff --git a/arch/riscv/mm/dma-noncoherent.c b/arch/riscv/mm/dma-noncoherent.c
> index d919efab6eba..2c52fbc15064 100644
> --- a/arch/riscv/mm/dma-noncoherent.c
> +++ b/arch/riscv/mm/dma-noncoherent.c
> @@ -22,8 +22,6 @@ void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
> ALT_CMO_OP(clean, vaddr, size, riscv_cbom_block_size);
> break;
> case DMA_FROM_DEVICE:
> -   ALT_CMO_OP(clean, vaddr, size, riscv_cbom_block_size);
> -   break;
> case DMA_BIDIRECTIONAL:
> ALT_CMO_OP(flush, vaddr, size, riscv_cbom_block_size);
> break;

This is something we can consider. Unfortunately, no architecture
(except parisc, which has other problems) does this at the moment,
so we'd probably need to have a proper debate about it.

We already have two conflicting ways to handle DMA_FROM_DEVICE,
either invalidate/invalidate, or clean/invalidate. I can see
that flush/invalidate may be a sensible option as well, but I'd
want to have that discussion after the series is complete, so
we can come to a generic solution that has the same documented
behavior across all architectures.

In particular, if we end up moving arm64 and riscv back to the
traditional invalidate/invalidate for DMA_FROM_DEVICE and
document that drivers must not rely on buffers getting cleaned
before a partial DMA_FROM_DEVICE, the question of clean versus
flush becomes moot as well.

> @@ -42,7 +40,7 @@ void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
> break;
> case DMA_FROM_DEVICE:
> case DMA_BIDIRECTIONAL:
> /* I'm not sure all drivers have guaranteed cacheline alignment. If not, this inval would cause problems */
> -   ALT_CMO_OP(flush, vaddr, size, riscv_cbom_block_size);
> +   ALT_CMO_OP(inval, vaddr, size, riscv_cbom_block_size);
> break;

This is my original patch, and I would not mix it with the other
change. The problem with non-aligned DMA_BIDIRECTIONAL buffers
is that both flush and inval would be wrong if you get simultaneous
writes from the device and the CPU to the same cache line, so there
is no way to win this. Using inval instead of flush would at least
work if the CPU data in the cache line is read-only from the CPU's
point of view, so that seems better than something that is always
wrong.

The documented API is that sharing the cache line is not allowed
at all, so anything that would observe a difference between the
two is also a bug. One idea that we have considered already is
that we could overwrite the unused bits of the cacheline with
poison values and/or mark them as invalid using KASAN for debugging
purposes, to find drivers that already violate this.
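
As a rough illustration of that debugging idea (purely hypothetical,
no such helper exists in the kernel): poison the bytes that share a
cache line with the DMA buffer but lie outside it, so a later check
or KASAN can catch a driver that lets the CPU touch them.

#include <linux/string.h>
#include <linux/types.h>

#define DMA_DEBUG_POISON        0xa5    /* arbitrary poison byte, illustrative */

/*
 * Hypothetical debug-only helper: before handing the buffer to the
 * device, overwrite the bytes that sit in the first and last cache
 * line of the buffer but outside the buffer itself.
 * Assumes cl_size is a power of two.
 */
static void dma_debug_poison_partial_lines(void *vaddr, size_t size,
                                           size_t cl_size)
{
        uintptr_t start = (uintptr_t)vaddr;
        uintptr_t end = start + size;
        uintptr_t line_start = start & ~(cl_size - 1);
        uintptr_t line_end = (end + cl_size - 1) & ~(cl_size - 1);

        /* bytes before the buffer in its first cache line */
        memset((void *)line_start, DMA_DEBUG_POISON, start - line_start);
        /* bytes after the buffer in its last cache line */
        memset((void *)end, DMA_DEBUG_POISON, line_end - end);
}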

  Arnd



Re: [PATCH v4 4/6] fbdev: Include <linux/io.h> instead of <asm/io.h>

2023-05-05 Thread Thomas Zimmermann

Hi

On 04.05.23 at 17:37, Sam Ravnborg wrote:

Hi Thomas,

On Thu, May 04, 2023 at 09:45:37AM +0200, Thomas Zimmermann wrote:

Replace include statements for <asm/io.h> with <linux/io.h>. Fixes
the coding style: if a header is available in asm/ and linux/, it
is preferable to include the header from linux/. This only affects
a few source files, most of which already include <linux/io.h>.
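
For illustration (not part of the quoted patch), the kind of per-driver
change this rule describes:

/* before: architecture-specific header */
#include <asm/io.h>

/* after: preferred wrapper; <linux/io.h> pulls in <asm/io.h> itself */
#include <linux/io.h>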

Suggested-by: Sam Ravnborg 
Signed-off-by: Thomas Zimmermann 


Thanks,
Reviewed-by: Sam Ravnborg 


Thanks for reviewing. I intend to merge this early next week after the
upcoming -rc1 has landed in the DRM misc trees.


Best regards
Thomas

--
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Frankenstrasse 146, 90461 Nuernberg, Germany
GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman
HRB 36809 (AG Nuernberg)




Re: [PATCH v4 0/6] fbdev: Move framebuffer I/O helpers to <asm/fb.h>

2023-05-05 Thread Thomas Zimmermann

Hi

On 04.05.23 at 10:08, Arnd Bergmann wrote:

On Thu, May 4, 2023, at 09:45, Thomas Zimmermann wrote:

Fbdev provides helpers for framebuffer I/O, such as fb_readl(),
fb_writel() or fb_memcpy_to_fb(). The implementation of each helper
depends on the architecture, but they are all equivalent to regular
I/O functions of similar names. So use regular functions instead and
move all helpers into <asm/fb.h>.

The first patch is a simple whitespace cleanup.

Until now, <linux/fb.h> contained an include of <asm/io.h>. As this
will go away, patches 2 to 4 prepare include statements in the various
drivers. Source files that use regular I/O helpers, such as readl(),
now include <linux/io.h>. Source files that use framebuffer I/O
helpers, such as fb_readl(), now include <asm/fb.h>.

Patch 5 replaces the architecture-based if-else branching in
<linux/fb.h> by helpers in <asm/fb.h>. All helpers use Linux's
existing I/O functions.
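
As a rough sketch of what such generic helpers look like (illustration
only, assuming __raw_readl()-style accessors; per the changelog, the
real header also deals with byte order and lets architectures override
each helper):

#include <linux/io.h>
#include <linux/types.h>

/* Illustrative generic wrappers; the real header is more complete. */
static inline u32 fb_readl(const volatile void __iomem *addr)
{
        return __raw_readl(addr);
}

static inline void fb_writel(u32 value, volatile void __iomem *addr)
{
        __raw_writel(value, addr);
}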

Patch 6 harmonizes naming among fbdev and existing I/O functions.

The patchset has been built for a variety of platforms, such as x86-64,
arm, aarch64, ppc64, parisc, m68k, mips and sparc.


The whole series looks good to me now,


This was a bit more effort to untangle than I expected. Thanks for
your help with cleaning this up.


Best regards
Thomas



Reviewed-by: Arnd Bergmann 


--
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Frankenstrasse 146, 90461 Nuernberg, Germany
GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman
HRB 36809 (AG Nuernberg)




Re: [v4,5/6] fbdev: Move framebuffer I/O helpers into <asm/fb.h>

2023-05-05 Thread Thomas Zimmermann

Hi

On 04.05.23 at 13:59, Sui Jingfeng wrote:

Hi,

I tested the whole patch set on an LS3A5000 (LoongArch) machine with
the efifb driver, with both fbtest and the fbdev tests of IGT. The
tests passed and I did not see anything weird happen.



Tested-by: Sui Jingfeng 


Thanks for testing.




On 2023/5/4 15:45, Thomas Zimmermann wrote:

Implement framebuffer I/O helpers, such as fb_read*() and fb_write*(),
in the architecture's <asm/fb.h> header file or the generic one.

The common case has been the use of regular I/O functions, such as
__raw_readb() or memset_io(). A few architectures used plain system-
memory reads and writes. Sparc used helpers for its SBus.

The architectures that used special cases provide the same code in
their __raw_*() I/O helpers. So the patch replaces this code with the
__raw_*() functions and moves it to <asm-generic/fb.h> for all
architectures.

v4:
* ia64, loongarch, sparc64: add fb_mem*() to arch headers
  to keep current semantics (Arnd)
v3:
* implement all architectures with generic helpers
* support reordering and native byte order (Geert, Arnd)

Signed-off-by: Thomas Zimmermann 
---
  arch/ia64/include/asm/fb.h  |  20 +++
  arch/loongarch/include/asm/fb.h |  21 +++
  arch/sparc/include/asm/fb.h |  20 +++
  include/asm-generic/fb.h    | 101 
  include/linux/fb.h  |  53 -
  5 files changed, 162 insertions(+), 53 deletions(-)
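
A note on the pattern visible in the diffs below (editorial
illustration, not part of the patch): each architecture that provides
its own helper also defines a same-named macro, and the generic header
is assumed to supply a default only when that macro is absent, roughly
like this:

#include <linux/io.h>
#include <linux/types.h>

/* Assumed fallback pattern for <asm-generic/fb.h>; illustration only. */
#ifndef fb_memcpy_fromfb
static inline void fb_memcpy_fromfb(void *to,
                                    const volatile void __iomem *from,
                                    size_t n)
{
        memcpy_fromio(to, from, n);     /* generic fallback (assumption) */
}
#define fb_memcpy_fromfb fb_memcpy_fromfb
#endif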

diff --git a/arch/ia64/include/asm/fb.h b/arch/ia64/include/asm/fb.h
index 0208f64a0da0..bcf982043a5c 100644
--- a/arch/ia64/include/asm/fb.h
+++ b/arch/ia64/include/asm/fb.h
@@ -2,7 +2,9 @@
  #ifndef _ASM_FB_H_
  #define _ASM_FB_H_
+#include <linux/compiler.h>
  #include <linux/efi.h>
+#include <linux/string.h>
  #include <asm/page.h>
@@ -18,6 +20,24 @@ static inline void fb_pgprotect(struct file *file, struct vm_area_struct *vma,

  }
  #define fb_pgprotect fb_pgprotect
+static inline void fb_memcpy_fromfb(void *to, const volatile void __iomem *from, size_t n)
+{
+    memcpy(to, (void __force *)from, n);
+}
+#define fb_memcpy_fromfb fb_memcpy_fromfb
+
+static inline void fb_memcpy_tofb(volatile void __iomem *to, const void *from, size_t n)
+{
+    memcpy((void __force *)to, from, n);
+}
+#define fb_memcpy_tofb fb_memcpy_tofb
+
+static inline void fb_memset(volatile void __iomem *addr, int c, size_t n)
+{
+    memset((void __force *)addr, c, n);
+}
+#define fb_memset fb_memset
+
  #include <asm-generic/fb.h>
  #endif /* _ASM_FB_H_ */
diff --git a/arch/loongarch/include/asm/fb.h b/arch/loongarch/include/asm/fb.h
index ff82f20685c8..c6fc7ef374a4 100644
--- a/arch/loongarch/include/asm/fb.h
+++ b/arch/loongarch/include/asm/fb.h
@@ -5,6 +5,27 @@
  #ifndef _ASM_FB_H_
  #define _ASM_FB_H_
+#include <linux/compiler.h>
+#include <linux/string.h>
+
+static inline void fb_memcpy_fromfb(void *to, const volatile void __iomem *from, size_t n)
+{
+    memcpy(to, (void __force *)from, n);
+}
+#define fb_memcpy_fromfb fb_memcpy_fromfb
+
+static inline void fb_memcpy_tofb(volatile void __iomem *to, const void *from, size_t n)
+{
+    memcpy((void __force *)to, from, n);
+}
+#define fb_memcpy_tofb fb_memcpy_tofb
+
+static inline void fb_memset(volatile void __iomem *addr, int c, size_t n)
+{
+    memset((void __force *)addr, c, n);
+}
+#define fb_memset fb_memset
+
  #include <asm-generic/fb.h>


This works the same as before, but why bother casting to (void __force *)?

Why not use __memcpy_fromio, __memcpy_toio and __memset_io directly?

I modified the patch as follows, and it still works.



  static inline void fb_memcpy_fromio(void *to, const volatile void __iomem *from, size_t n)
  {
-   memcpy(to, (void __force *)from, n);
+   __memcpy_fromio(to, from, n);
  }
  #define fb_memcpy_fromio fb_memcpy_fromio

  static inline void fb_memcpy_toio(volatile void __iomem *to, const void *from, size_t n)
  {
-   memcpy((void __force *)to, from, n);
+   __memcpy_toio(to, from, n);
  }
  #define fb_memcpy_toio fb_memcpy_toio

  static inline void fb_memset_io(volatile void __iomem *addr, int c, size_t n)
  {
-   memset((void __force *)addr, c, n);
+   __memset_io(addr, c, n);
  }
  #define fb_memset fb_memset_io


  #endif /* _ASM_FB_H_ */
diff --git a/arch/sparc/include/asm/fb.h b/arch/sparc/include/asm/fb.h
index 689ee5c60054..077da91aeba1 100644
--- a/arch/sparc/include/asm/fb.h
+++ b/arch/sparc/include/asm/fb.h
@@ -2,6 +2,8 @@
  #ifndef _SPARC_FB_H_
  #define _SPARC_FB_H_
+#include <linux/io.h>
+
  struct fb_info;
  struct file;
  struct vm_area_struct;
@@ -16,6 +18,24 @@ static inline void fb_pgprotect(struct file *file, struct vm_area_struct *vma,

  int fb_is_primary_device(struct fb_info *info);
  #define fb_is_primary_device fb_is_primary_device
+static inline void fb_memcpy_fromfb(void *to, const volatile void __iomem *from, size_t n)
+{
+    sbus_memcpy_fromio(to, from, n);
+}
+#define fb_memcpy_fromfb fb_memcpy_fromfb
+
+static inline void fb_memcpy_tofb(volatile void __iomem *to, const void *from, size_t n)
+{
+    sbus_memcpy_toio(to, from, n);
+}
+#define fb_memcpy_tofb