Instead of always falling back to memcpy_fromio() for any size, prefer
using read{b,w,l}(). When reading struct members it's common to read
individual integer variables individually. Going through memcpy_fromio()
for each of them poses a high penalty.

Employ a similar trick as __seqprop() by using _Generic() to generate
only the specific call based on a type-compatible variable.

For a pariticular i915 workload producing GPU context switches,
__get_engine_usage_record() is particularly hot since the engine usage
is read from device local memory with dgfx, possibly multiple times
since it's racy. Test execution time for this test shows a ~12.5%
improvement with DG2:

Before:
        nrepeats = 1000; min = 7.63243e+06; max = 1.01817e+07;
        median = 9.52548e+06; var = 526149;
After:
        nrepeats = 1000; min = 7.03402e+06; max = 8.8832e+06;
        median = 8.33955e+06; var = 333113;

Other things attempted that didn't prove very useful:
1) Change the _Generic() on x86 to just dereference the memory address
2) Change __get_engine_usage_record() to do just 1 read per loop,
   comparing with the previous value read
3) Change __get_engine_usage_record() to access the fields directly as it
   was before the conversion to iosys-map

(3) did gave a small improvement (~3%), but doesn't seem to scale well
to other similar cases in the driver.

Additional test by Chris Wilson using gem_create from igt with some
changes to track object creation time. This happens to accidentally
stress this code path:

        Pre iosys_map conversion of engine busyness:
        lmem0: Creating    262144 4KiB objects took 59274.2ms

        Unpatched:
        lmem0: Creating    262144 4KiB objects took 108830.2ms

        With readl (this patch):
        lmem0: Creating    262144 4KiB objects took 61348.6ms

        s/readl/READ_ONCE/
        lmem0: Creating    262144 4KiB objects took 61333.2ms

So we do take a little bit more time than before the conversion, but
that is due to other factors: bringing the READ_ONCE back would be as
good as just doing this conversion.

Signed-off-by: Lucas De Marchi <lucas.demar...@intel.com>
---
 include/linux/iosys-map.h | 26 ++++++++++++++++++++++----
 1 file changed, 22 insertions(+), 4 deletions(-)

diff --git a/include/linux/iosys-map.h b/include/linux/iosys-map.h
index e69a002d5aa4..cd28c7a1b79c 100644
--- a/include/linux/iosys-map.h
+++ b/include/linux/iosys-map.h
@@ -333,6 +333,20 @@ static inline void iosys_map_memset(struct iosys_map *dst, 
size_t offset,
                memset(dst->vaddr + offset, value, len);
 }
 
+#ifdef CONFIG_64BIT
+#define __iosys_map_rd_io_u64_case(val_, vaddr_iomem_)                 \
+       u64: val_ = readq(vaddr_iomem_),
+#else
+#define __iosys_map_rd_io_u64_case(val_, vaddr_iomem_)
+#endif
+
+#define __iosys_map_rd_io(val__, vaddr_iomem__, type__) _Generic(val__,        
\
+       u8: val__ = readb(vaddr_iomem__),                               \
+       u16: val__ = readw(vaddr_iomem__),                              \
+       u32: val__ = readl(vaddr_iomem__),                              \
+       __iosys_map_rd_io_u64_case(val__, vaddr_iomem__)                \
+       default: memcpy_fromio(&(val__), vaddr_iomem__, sizeof(val__)))
+
 /**
  * iosys_map_rd - Read a C-type value from the iosys_map
  *
@@ -346,10 +360,14 @@ static inline void iosys_map_memset(struct iosys_map 
*dst, size_t offset,
  * Returns:
  * The value read from the mapping.
  */
-#define iosys_map_rd(map__, offset__, type__) ({                       \
-       type__ val;                                                     \
-       iosys_map_memcpy_from(&val, map__, offset__, sizeof(val));      \
-       val;                                                            \
+#define iosys_map_rd(map__, offset__, type__) ({                               
\
+       type__ val;                                                             
\
+       if ((map__)->is_iomem) {                                                
\
+               __iosys_map_rd_io(val, (map__)->vaddr_iomem + offset__, 
type__);\
+       } else {                                                                
\
+               memcpy(&val, (map__)->vaddr + offset__, sizeof(val));           
\
+       }                                                                       
\
+       val;                                                                    
\
 })
 
 /**
-- 
2.36.1

Reply via email to