On 29/12/15 19:18, Chris Wilson wrote:
On Tue, Dec 29, 2015 at 06:59:57PM +0000, Dave Gordon wrote:
In the discussions around commit 373701b1 (Jani Nikula)
     drm: fix potential dangling else problems in for_each_ macros
Daniel Vetter mooted the idea of a for_each_engine(ring, dev_priv)
macro that didn't use or rely on having an integer index variable.

So here it is; and, for backwards compatibility, an updated version
of for_each_ring() implemented using the same internals. As an example
of the use of this new macro, the functions in intel_uncore.c have
been converted to use it in preference to for_each_ring() wherever
possible. A subsequent patch will convert many of the uses in other
source files.

Cc: Daniel Vetter <dan...@ffwll.ch>
Cc: Jani Nikula <jani.nik...@intel.com>
Cc: Chris Wilson <ch...@chris-wilson.co.uk>
Signed-off-by: Dave Gordon <david.s.gor...@intel.com>
---
  drivers/gpu/drm/i915/i915_drv.h     | 32 ++++++++++++++++++++++++++++----
  drivers/gpu/drm/i915/intel_uncore.c |  5 ++---
  2 files changed, 30 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index cf7e0fc..a6457ee 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1968,10 +1968,34 @@ static inline struct drm_i915_private 
*guc_to_i915(struct intel_guc *guc)
        return container_of(guc, struct drm_i915_private, guc);
  }

-/* Iterate over initialised rings */
-#define for_each_ring(ring__, dev_priv__, i__) \
-       for ((i__) = 0; (i__) < I915_NUM_RINGS; (i__)++) \
-               for_each_if ((((ring__) = &(dev_priv__)->ring[(i__)]), 
intel_ring_initialized((ring__))))
+/* Helpers for the for_each_* macros below */
+#define        __INIT_ENGINE_PTR(engine, dev_priv)                             
\
+       ({                                                              \
+               struct intel_engine_cs *ep = (dev_priv)->ring;               \
+               (engine) = --ep;                                        \
+       })
+#define        __NEXT_ACTIVE_ENGINE(engine, dev_priv, nr)                      
\
+       ({                                                              \
+               struct intel_engine_cs *end = &(dev_priv)->ring[nr];     \
+               struct intel_engine_cs *ep = (engine);                  \
+               bool in_range;                                          \
+               do {                                                    \
+                       in_range = ++(ep) < end;                     \
+               } while (in_range && !intel_ring_initialized(ep));      \
+               (engine) = ep;                                          \
+               in_range;                                               \
+       })
+
+/* Iterate over initialised engines */
+#define for_each_engine(engine, dev_priv)                              \
+       for (__INIT_ENGINE_PTR(engine, dev_priv);                       \
+            __NEXT_ACTIVE_ENGINE(engine, dev_priv, I915_NUM_RINGS); )
+
+/* Backwards compatibility: iterate over initialised "rings" */
+#define for_each_ring(engine, dev_priv, ring_id)                       \
+       for (__INIT_ENGINE_PTR(engine, dev_priv);                       \
+            __NEXT_ACTIVE_ENGINE(engine, dev_priv, I915_NUM_RINGS) &&  \
+            ((ring_id) = (engine)->id, true); )

No, that is hard to read and bloats the kernel for no benefit. Even more
so compared against just

Well I agree that __NEXT_ACTIVE_ENGINE() takes a bit of thinking about, but at least it uses clearly named variables that tell you what they stand for, and can be formatted without any line breaks within statements, which IMHO makes it much EASIER to read than then mess of (parentheses), __underscores__, & random->punctuation that we had. And breaking the details out into two helpers makes the actual end-user macro so much clearer; after all, the complicated innards only have to be checked and verified once, here on the mailing list, and then everyone can see what the for_each_engine() macro is doing without having to worry about the GCC magic inside.

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 578f954dba1d..739569458bb7 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1967,8 +1967,8 @@ static inline struct drm_i915_private *guc_to_i915(struct 
intel_guc *guc)

  /* Iterate over initialised rings */
  #define for_each_ring(ring__, dev_priv__, i__) \
-       for ((i__) = 0; (i__) < I915_NUM_RINGS; (i__)++) \
-               for_each_if ((((ring__) = &(dev_priv__)->ring[(i__)]), 
intel_engine_initialized((ring__))))
+       for ((ring__) = &(dev_priv__)->ring[0]; (ring__) < 
&(dev_priv__)->ring[I915_NUM_RINGS]; (ring__)++) \
+               for_each_if (intel_engine_initialized((ring__)))

  enum hdmi_force_audio {
         HDMI_AUDIO_OFF_DVI = -2,        /* no aux data for HDMI-DVI converter 
*/

(which is a net win over the current code, presumably solely due to
eliminating dead locals)
-Chris

Well actually it isn't.

First off, that doesn't even compile, it leaves lots of "variable used uninitialised" type errors in users of the macro, because those users actually require the index, which is why my patches converted only those that didn't.

With the above (with the addition of "i__ = 0" ... "(i__)++") used as the definition of for_each_ring(), and also (but without the parameter "i__") as the definition of for_each_engine(), plus my second patch that converts as many as possible, we find the following:

-rw-r--r-- 30866475 Jan 4 14:55 i915-drm.ko     # baseline
-rw-r--r-- 30871447 Jan 4 14:56 i915-cw.ko      # this version
-rw-r--r-- 30877524 Jan 4 14:56 i915-dsg.ko     # my version

so it looks like this increases the size, and mine increases it even more. But actually, when we look at CODE size rather than file size:

   text    data     bss     dec     hex filename
1092416   24233     512 1117161  110be9 i915-drm.ko     # baseline
1092820   24233     512 1117565  110d7d i915-cw.ko      # this version
1092244   24233     512 1116989  110b3d i915-dsg.ko     # my version

we find just the opposite. This makes the code ~400b larger, whereas my patch made it ~200b smaller. I think it's because GCC knows how to alias variables with different scopes to the same registers, so caching intermediate results in very-local-variables is generally a win. It's definitely not "kernel bloat".

.Dave.
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

Reply via email to