Re: [Intel-gfx] [RFC] drm/i915: store all mmio bases in intel_engines

Tvrtko Ursulin Mon, 12 Mar 2018 04:05:34 -0700


On 09/03/2018 19:44, Tvrtko Ursulin wrote:

On 09/03/2018 18:47, Daniele Ceraolo Spurio wrote:
On 09/03/18 01:53, Tvrtko Ursulin wrote:
On 08/03/2018 18:46, Daniele Ceraolo Spurio wrote:
On 08/03/18 01:31, Tvrtko Ursulin wrote:
On 07/03/2018 19:45, Daniele Ceraolo Spurio wrote:
The mmio bases we're currently storing in the intel_engines array are
only valid for a subset of gens, so we need to ignore them and use
different values in some cases. Instead of doing that, we can have a
table of [starting gen, mmio base] pairs for each engine in
intel_engines and select the correct one based on the gen we'rerunning
on in a consistent way.

Cc: Mika Kuoppala <mika.kuopp...@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursu...@intel.com>
Signed-off-by: Daniele Ceraolo Spurio<daniele.ceraolospu...@intel.com>
---
drivers/gpu/drm/i915/intel_engine_cs.c | 77+++++++++++++++++++++------------
  drivers/gpu/drm/i915/intel_ringbuffer.c |  1 -
  2 files changed, 49 insertions(+), 29 deletions(-)
diff --git a/drivers/gpu/drm/i915/intel_engine_cs.cb/drivers/gpu/drm/i915/intel_engine_cs.c
index 4ba139c27fba..1dd92cac8d67 100644
--- a/drivers/gpu/drm/i915/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/intel_engine_cs.c
@@ -81,12 +81,16 @@ static const struct engine_class_infointel_engine_classes[] = {
      },
  };
+#define MAX_MMIO_BASES 3
  struct engine_info {
      unsigned int hw_id;
      unsigned int uabi_id;
      u8 class;
      u8 instance;
-    u32 mmio_base;
+    struct engine_mmio_base {
+        u32 gen : 8;
+        u32 base : 24;
+    } mmio_bases[MAX_MMIO_BASES];
      unsigned irq_shift;
  };
It's not bad, but I am just wondering if it is too complicatedversus simply duplicating the tables.
Duplicated tables would certainly be much less code and complexity,but what about the size of pure data?
In this patch sizeof(struct engine_info) grows by MAX_MMIO_BASES *sizeof(u32), so 12, to the total of 30 if I am not mistaken. TimesNUM_ENGINES equals 240 bytes for intel_engines[] array with thisscheme.
we remove a u32 (the old mmio base) so we only grow 8 per engine,but the compiler rounds up to a multiple of u32 so 28 per engine,for a total of 224.
Separate tables per gens would be:

sizeof(struct engine_info) is 18 bytes.

For < gen6 we'd need 3 engines * 18 = 54
< gen11 = 5 engines * 18 = 90
 >= gen11 = 8 engines * 18 = 144

54 + 90 + 144 = 288 bytes
So a little bit bigger. Hm. Plus we would need to refactor sointel_engines[] is not indexed by engine->id but is contiguousarray with engine->id in the elements. Code to lookup the compactarray should be simpler than combined new code from this patch,especially if you add the test as Chris suggested. So separateengine info arrays would win I think overall - when consideringsize of text + size of data + size of source code.
Not sure. I would exclude the selftest, which is usually notcompiled for released kernels, which leads to:
Yes, but we cannot exclude its source since selftests for separatetables wouldn't be needed.
add/remove: 0/0 grow/shrink: 3/1 up/down: 100/-7 (93)
Function                                     old     new   delta
intel_engines                                160     224     +64
__func__                                   13891   13910     +19
intel_engines_init_mmio                     1247    1264     +17
intel_init_bsd_ring_buffer                   142     135      -7
Total: Before=1479626, After=1479719, chg +0.01%
Total growth is 93, which is less then your estimation for thegrowth introduced by replicating the table.
But on the other hand your solution might be more future proof. SoI don't know. Use the crystal ball to decide? :)
I think we should expect that the total engine count could grow withfuture gens. In this case to me adding a new entry to a unifiedtable seems much cleaner (and uses less data) than adding acompletely new table each time.
Okay I was off in my estimates. But in general I was coming from theangle of thinking that every new mmio base difference in this schemegrows the size for all engines. So if just VCS grows one new base,impact is multiplied by NUM_ENGINES. But maybe separate tables arealso not a solution. Perhaps pulling out mmio_base arrays to outsidestruct engine_info in another set of static tables, so they could benull terminated would be best of both worlds?
struct engine_mmio_base {
     u32 gen : 8;
     u32 base : 24;
};

static const struct engine_mmio_base vcs0_mmio_bases[] = {
     { .gen = 11, .base = GEN11_BSD_RING_BASE },
     { .gen = 6, .base = GEN6_BSD_RING_BASE },
     { .gen = 4, .base = BSD_RING_BASE },
     { },
};

And then in intel_engines array, for BSD:

    {
     ...
     .mmio_bases = &vcs0_mmio_bases;
     ...
    },
This way we limit data growth only to engines which change their mmiobases frequently.
Just an idea.
I gave this a try and the code actually grows:

add/remove: 8/0 grow/shrink: 2/0 up/down: 120/0 (120)
Function                                     old     new   delta
intel_engines                                224     256     +32
vcs0_mmio_bases                                -      16     +16
vecs1_mmio_bases                               -      12     +12
vecs0_mmio_bases                               -      12     +12
vcs1_mmio_bases                                -      12     +12
vcs3_mmio_bases                                -       8      +8
vcs2_mmio_bases                                -       8      +8
rcs_mmio_bases                                 -       8      +8
intel_engines_init_mmio                     1264    1272      +8
bcs_mmio_bases                                 -       4      +4
Total: Before=1479719, After=1479839, chg +0.01%
I have no idea what the compiler is doing to grow intel_engines, sinceby replacing and array of 3 u32s with a pointer I would expect ashrink of 4 bytes per-engine. Anyway, even without the compilerweirdness with
It probably has to align the pointer to 8 bytes so that creates a hole.Moving mmio_base pointer higher up, either to the top or after twounsigned ints should work. Or packing the struct.
this method we would grow the code size of at least 4 bytes per enginebecause we replace an array of 3 u32 (12B) with a pointer (8B) and anarray of at 2 or more u32 (8B+). I guess we can reconsider if/when oneengine reaches more than 3 mmio bases.
Yeah it's fine. I was thinking that since you are almost there it makessense to future proof it more, since no one will probably remember itlater. But OK.

One idea to compact more, in addition to avoiding alignment holes, couldbe to store the engine_mmio_base directly in the engine_mmio_basepointer when it is a single entry, othewrise use a pointer to nullterminated array. Actually we could store two without table indirectionto be most optimal on 64-bit. Something like:


struct engine_mmio_base {
    u32 base : 24; /* Must be LSB, */
    u32 gen : 8;
};

union engine_mmio {
        struct engine_mmio_base mmio_base[2];
        struct engine_mmio_base *mmio_base_list;
};

#define MMIO_PTR_LIST BIT(0)

   {
    ... render engine ...
    .mmio = { (struct engine_mmio_base){.base = ..., ..gen = ...} },
    ...
   },
   {
    ...
    .mmio = { .mmio_base_list = &vcs0_mmio_bases | MMIO_PTR_LIST }
    ...
   },

This could be the best of both worlds, with up to two bases built-in,and engines with more point to an array.



Regards,

Tvrtko




_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

Re: [Intel-gfx] [RFC] drm/i915: store all mmio bases in intel_engines

Reply via email to