Re: [Mesa-dev] [PATCH 31/33] intel: decoder: decouple decoding from memory pointers

2017-11-01 Thread Lionel Landwerlin

On 01/11/17 15:09, Scott D Phillips wrote:
> Lionel Landwerlin writes:
>
>> On 31/10/17 23:04, Scott D Phillips wrote:
>>> Lionel Landwerlin writes:
>>>
>>>> On 31/10/17 20:54, Scott D Phillips wrote:
>>>>> Lionel Landwerlin writes:
>>>>>
>>>>>> We want to introduce a reader interface for accessing memory, so that
>>>>>> later on we can use different ways of storing the content of the GTT
>>>>>> address space that don't involve a pointer to a linear buffer.
>>>>> I'm kinda sceptical that this is the best way to achieve what you want
>>>>> here. It strikes me as code that we'll look at in a year and wonder
>>>>> what's going on.
>>>>>
>>>>> If I'm understanding, it seems like the essence of what you're going for
>>>>> here is in the one place where you're using the sub_struct_reader. Maybe
>>>>> instead of plumbing the reader object through everywhere, you can add a
>>>>> callback just in gen_print_group for fixing up offsets to pointers, and
>>>>> then leave everywhere else assuming contiguous memory blocks as today.
>>>> First, thanks for your time reviewing this!
>>>>
>>>> I should have stated that in patch 33 I introduce a sparse memory object
>>>> that isn't contiguous.
>>>> It's based on the data structure described here:
>>>> https://en.wikipedia.org/wiki/Hash_array_mapped_trie
>>>>
>>>> The idea is to split the memory into chunks of 4 KiB but still make it
>>>> look like it's a 64-bit address space.
>>>> The trie structure allows for reuse of pages at different points in time
>>>> without having an actual copy of the whole address space.
>>> What I meant was that most dword reads will really be adjacent in a
>>> piece of memory and leaving the simple pointer math there is
>>> clearer. You will only need to call back for indirection when you're
>>> chasing an offset or an address.
>>>
>>>> For example, a couple of pages might have been written by relocations
>>>> associated with the first batch buffer, then 10 batches later you
>>>> overwrite them.
>>>> The amount of memory we need to allocate for storing 2 snapshots is just
>>>> the modified pages (+ ~12 nodes in the trie but those are less than
>>>> 300 bytes).
>>>> That allows the UI to decode 2 batches at the same time as well as all
>>>> the associated memory at a small cost.
>>> Really there's no need to manage any memory for the buffers themselves,
>>> they're immutably stored in the aub file. If you mmap the entire file
>>> then you would just need to have a map of gfx addrs to file addrs that
>>> would help direct your decoding.
>>>
>> Thanks, I'll try that.
>
> Thinking more about it, I remember that intel_aubdump will break up
> buffers into 32KiB chunks. So that would cause problems for this idea for
> buffers bigger than 32KiB. We could try just not doing that splitting in
> aubdump and see if it has any other adverse effects.

I gave your approach a try and it seems to work, but I'm still dealing
with bugs everywhere :(

Still, I like the idea of the trie :)

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 31/33] intel: decoder: decouple decoding from memory pointers

2017-11-01 Thread Scott D Phillips
Lionel Landwerlin  writes:

> On 31/10/17 23:04, Scott D Phillips wrote:
>> Lionel Landwerlin  writes:
>>
>>> On 31/10/17 20:54, Scott D Phillips wrote:
>>>> Lionel Landwerlin writes:
>>>>
>>>>> We want to introduce a reader interface for accessing memory, so that
>>>>> later on we can use different ways of storing the content of the GTT
>>>>> address space that don't involve a pointer to a linear buffer.
>>>> I'm kinda sceptical that this is the best way to achieve what you want
>>>> here. It strikes me as code that we'll look at in a year and wonder
>>>> what's going on.
>>>>
>>>> If I'm understanding, it seems like the essence of what you're going for
>>>> here is in the one place where you're using the sub_struct_reader. Maybe
>>>> instead of plumbing the reader object through everywhere, you can add a
>>>> callback just in gen_print_group for fixing up offsets to pointers, and
>>>> then leave everywhere else assuming contiguous memory blocks as today.
>>> First, thanks for your time reviewing this!
>>>
>>> I should have stated that in patch 33 I introduce a sparse memory object
>>> that isn't contiguous.
>>> It's based on the data structure described here:
>>> https://en.wikipedia.org/wiki/Hash_array_mapped_trie
>>>
>>> The idea is to split the memory into chunks of 4 KiB but still make it
>>> look like it's a 64-bit address space.
>>> The trie structure allows for reuse of pages at different points in time
>>> without having an actual copy of the whole address space.
>> What I meant was that most dword reads will really be adjacent in a
>> piece of memory and leaving the simple pointer math there is
>> clearer. You will only need to call back for indirection when you're
>> chasing an offset or an address.
>>
>>> For example, a couple of pages might have been written by relocations
>>> associated with the first batch buffer, then 10 batches later you
>>> overwrite them.
>>> The amount of memory we need to allocate for storing 2 snapshots is just
>>> the modified pages (+ ~12 nodes in the trie but those are less than
>>> 300 bytes).
>>> That allows the UI to decode 2 batches at the same time as well as all
>>> the associated memory at a small cost.
>> Really there's no need to manage any memory for the buffers themselves,
>> they're immutably stored in the aub file. If you mmap the entire file
>> then you would just need to have a map of gfx addrs to file addrs that
>> would help direct your decoding.
>>
> Thanks, I'll try that.

Thinking more about it, I remember that intel_aubdump will break up
buffers into 32KiB chunks. So that would cause problems for this idea for
buffers bigger than 32KiB. We could try just not doing that splitting in
aubdump and see if it has any other adverse effects.
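
To illustrate the chunking concern, here is a small sketch in C. It assumes (this is not stated in the thread) that chunking starts at offset 0 of each buffer and that two consecutive chunks are not adjacent in the file, so plain pointer math is only safe up to the next chunk boundary. The names `contiguous_bytes` and `fits_in_one_chunk` are hypothetical and not part of aubdump or the decoder.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CHUNK_SIZE (32u * 1024u)  /* chunk size mentioned in the thread */

/* Bytes that can be read contiguously starting at offset within a buffer
 * that was written out in CHUNK_SIZE pieces. Assumes chunking starts at
 * offset 0 of the buffer.
 */
static uint32_t
contiguous_bytes(uint64_t buffer_size, uint64_t offset)
{
   assert(offset < buffer_size);

   uint64_t chunk_end = (offset / CHUNK_SIZE + 1) * CHUNK_SIZE;
   if (chunk_end > buffer_size)
      chunk_end = buffer_size;
   return (uint32_t)(chunk_end - offset);
}

/* True when length bytes at offset can be decoded with plain pointer math. */
static bool
fits_in_one_chunk(uint64_t buffer_size, uint64_t offset, uint32_t length)
{
   return offset < buffer_size &&
          length <= contiguous_bytes(buffer_size, offset);
}
```

A decoder built on the mmap idea could use a check like this to decide when a structure straddles a chunk boundary and needs to be handled some other way.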


Re: [Mesa-dev] [PATCH 31/33] intel: decoder: decouple decoding from memory pointers

2017-10-31 Thread Lionel Landwerlin

On 31/10/17 23:04, Scott D Phillips wrote:
> Lionel Landwerlin writes:
>
>> On 31/10/17 20:54, Scott D Phillips wrote:
>>> Lionel Landwerlin writes:
>>>
>>>> We want to introduce a reader interface for accessing memory, so that
>>>> later on we can use different ways of storing the content of the GTT
>>>> address space that don't involve a pointer to a linear buffer.
>>> I'm kinda sceptical that this is the best way to achieve what you want
>>> here. It strikes me as code that we'll look at in a year and wonder
>>> what's going on.
>>>
>>> If I'm understanding, it seems like the essence of what you're going for
>>> here is in the one place where you're using the sub_struct_reader. Maybe
>>> instead of plumbing the reader object through everywhere, you can add a
>>> callback just in gen_print_group for fixing up offsets to pointers, and
>>> then leave everywhere else assuming contiguous memory blocks as today.
>>
>> First, thanks for your time reviewing this!
>>
>> I should have stated that in patch 33 I introduce a sparse memory object
>> that isn't contiguous.
>> It's based on the data structure described here:
>> https://en.wikipedia.org/wiki/Hash_array_mapped_trie
>>
>> The idea is to split the memory into chunks of 4 KiB but still make it
>> look like it's a 64-bit address space.
>> The trie structure allows for reuse of pages at different points in time
>> without having an actual copy of the whole address space.
>
> What I meant was that most dword reads will really be adjacent in a
> piece of memory and leaving the simple pointer math there is
> clearer. You will only need to call back for indirection when you're
> chasing an offset or an address.
>
>> For example, a couple of pages might have been written by relocations
>> associated with the first batch buffer, then 10 batches later you
>> overwrite them.
>> The amount of memory we need to allocate for storing 2 snapshots is just
>> the modified pages (+ ~12 nodes in the trie but those are less than
>> 300 bytes).
>> That allows the UI to decode 2 batches at the same time as well as all
>> the associated memory at a small cost.
>
> Really there's no need to manage any memory for the buffers themselves,
> they're immutably stored in the aub file. If you mmap the entire file
> then you would just need to have a map of gfx addrs to file addrs that
> would help direct your decoding.

Thanks, I'll try that.



Re: [Mesa-dev] [PATCH 31/33] intel: decoder: decouple decoding from memory pointers

2017-10-31 Thread Scott D Phillips
Lionel Landwerlin  writes:

> On 31/10/17 20:54, Scott D Phillips wrote:
>> Lionel Landwerlin  writes:
>>
>>> We want to introduce a reader interface for accessing memory, so that
>>> later on we can use different ways of storing the content of the GTT
>>> address space that don't involve a pointer to a linear buffer.
>> I'm kinda sceptical that this is the best way to achieve what you want
>> here. It strikes me as code that we'll look at in a year and wonder
>> what's going on.
>>
>> If I'm understanding, it seems like the essence of what you're going for
>> here is in the one place where you're using the sub_struct_reader. Maybe
>> instead of plumbing the reader object through everywhere, you can add a
>> callback just in gen_print_group for fixing up offsets to pointers, and
>> then leave everywhere else assuming contiguous memory blocks as today.
>
> First, thanks for your time reviewing this!
>
> I should have stated that in patch 33 I introduce a sparse memory object
> that isn't contiguous.
> It's based on the data structure described here:
> https://en.wikipedia.org/wiki/Hash_array_mapped_trie
>
> The idea is to split the memory into chunks of 4 KiB but still make it
> look like it's a 64-bit address space.
> The trie structure allows for reuse of pages at different points in time
> without having an actual copy of the whole address space.

What I meant was that most dword reads will really be adjacent in a
piece of memory and leaving the simple pointer math there is
clearer. You will only need to call back for indirection when you're
chasing an offset or an address.

> For example, a couple of pages might have been written by relocations
> associated with the first batch buffer, then 10 batches later you
> overwrite them.
> The amount of memory we need to allocate for storing 2 snapshots is just
> the modified pages (+ ~12 nodes in the trie but those are less than
> 300 bytes).
> That allows the UI to decode 2 batches at the same time as well as all
> the associated memory at a small cost.

Really there's no need to manage any memory for the buffers themselves,
they're immutably stored in the aub file. If you mmap the entire file
then you would just need to have a map of gfx addrs to file addrs that
would help direct your decoding.
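
A minimal sketch of such a map in C, under stated assumptions: the ranges are collected while scanning the file, kept sorted by graphics address, and do not overlap. The struct `gfx_range` and the function `gfx_addr_to_ptr` are hypothetical names, and `file_base` stands for whatever pointer mapping the whole file returns.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One buffer (or chunk) found in the file: where it sits in the graphics
 * address space and where its bytes start in the mapped file.
 * Hypothetical layout.
 */
struct gfx_range {
   uint64_t gfx_addr;
   uint64_t size;
   uint64_t file_offset;
};

/* Translate a graphics address into a pointer inside the mapped file.
 * ranges must be sorted by gfx_addr and must not overlap. Returns NULL
 * when no range covers addr.
 */
static const uint8_t *
gfx_addr_to_ptr(const struct gfx_range *ranges, size_t count,
                const uint8_t *file_base, uint64_t addr)
{
   assert(ranges != NULL || count == 0);

   size_t lo = 0, hi = count;

   /* Binary search for the range that contains addr. */
   while (lo < hi) {
      size_t mid = lo + (hi - lo) / 2;
      const struct gfx_range *r = &ranges[mid];

      if (addr < r->gfx_addr)
         hi = mid;
      else if (addr - r->gfx_addr >= r->size)
         lo = mid + 1;
      else
         return file_base + r->file_offset + (addr - r->gfx_addr);
   }
   return NULL;
}
```

Nothing is copied here: the lookup only redirects the decoder to bytes that already live in the file mapping.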


Re: [Mesa-dev] [PATCH 31/33] intel: decoder: decouple decoding from memory pointers

2017-10-31 Thread Lionel Landwerlin

On 31/10/17 20:54, Scott D Phillips wrote:
> Lionel Landwerlin writes:
>
>> We want to introduce a reader interface for accessing memory, so that
>> later on we can use different ways of storing the content of the GTT
>> address space that don't involve a pointer to a linear buffer.
> I'm kinda sceptical that this is the best way to achieve what you want
> here. It strikes me as code that we'll look at in a year and wonder
> what's going on.
>
> If I'm understanding, it seems like the essence of what you're going for
> here is in the one place where you're using the sub_struct_reader. Maybe
> instead of plumbing the reader object through everywhere, you can add a
> callback just in gen_print_group for fixing up offsets to pointers, and
> then leave everywhere else assuming contiguous memory blocks as today.


First, thanks for your time reviewing this!

I should have stated that in patch 33 I introduce a sparse memory object
that isn't contiguous.
It's based on the data structure described here:
https://en.wikipedia.org/wiki/Hash_array_mapped_trie

The idea is to split the memory into chunks of 4 KiB but still make it
look like it's a 64-bit address space.
The trie structure allows for reuse of pages at different points in time
without having an actual copy of the whole address space.

For example, a couple of pages might have been written by relocations
associated with the first batch buffer, then 10 batches later you
overwrite them.
The amount of memory we need to allocate for storing 2 snapshots is just
the modified pages (+ ~12 nodes in the trie but those are less than
300 bytes).
That allows the UI to decode 2 batches at the same time as well as all
the associated memory at a small cost.
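
To make the snapshot idea concrete, here is a minimal sketch in C. It is not the code from patch 33 and it is not a real hash array mapped trie: as a simplifying assumption it uses a plain fixed-depth radix trie (16-way, 13 levels over the 52-bit page number) with path copying. Writing one page allocates one page copy plus one node per level and shares everything else with the previous snapshot. All names (`trie_node`, `trie_slot`, `trie_set_page`, `trie_read_dword`) are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define PAGE_SHIFT  12                  /* 4 KiB pages */
#define PAGE_SIZE   (1u << PAGE_SHIFT)
#define FANOUT_BITS 4                   /* 16 children per node (assumption) */
#define FANOUT      (1u << FANOUT_BITS)
#define LEVELS      13                  /* 13 * 4 = 52 bits of page number */

/* Inner levels point at trie_node, the last level points at page data. */
struct trie_node {
   void *child[FANOUT];
};

/* Child slot used by addr at the given level (level 0 is the root). */
static unsigned
trie_slot(uint64_t addr, int level)
{
   int shift = PAGE_SHIFT + FANOUT_BITS * (LEVELS - 1 - level);
   return (unsigned)((addr >> shift) & (FANOUT - 1));
}

/* Return a new root in which the page containing addr holds a copy of data.
 * Only the nodes on the path to that page are copied; every other subtree
 * is shared with the old root, which stays valid as an older snapshot.
 * Nothing is ever freed in this sketch.
 */
static struct trie_node *
trie_set_page(const struct trie_node *old, uint64_t addr,
              const uint8_t *data, int level)
{
   assert(level >= 0 && level < LEVELS);

   struct trie_node *node = calloc(1, sizeof(*node));
   if (node == NULL)
      abort();
   if (old != NULL)
      *node = *old;

   unsigned slot = trie_slot(addr, level);
   if (level == LEVELS - 1) {
      uint8_t *page = malloc(PAGE_SIZE);
      if (page == NULL)
         abort();
      memcpy(page, data, PAGE_SIZE);
      node->child[slot] = page;
   } else {
      node->child[slot] =
         trie_set_page(node->child[slot], addr, data, level + 1);
   }
   return node;
}

/* Read the dword containing byte address addr; unmapped memory reads as 0. */
static uint32_t
trie_read_dword(const struct trie_node *root, uint64_t addr)
{
   const struct trie_node *node = root;
   for (int level = 0; level < LEVELS - 1 && node != NULL; level++)
      node = node->child[trie_slot(addr, level)];
   if (node == NULL)
      return 0;

   const uint8_t *page = node->child[trie_slot(addr, LEVELS - 1)];
   if (page == NULL)
      return 0;

   uint32_t value;
   memcpy(&value, page + (addr & (PAGE_SIZE - 1) & ~(uint64_t)3),
          sizeof(value));
   return value;
}
```

Calling `trie_set_page(NULL, addr, data, 0)` builds a first snapshot. Calling it again with that root returns a second root, and both roots can then be read independently, which is the property needed to decode two batches side by side.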





Signed-off-by: Lionel Landwerlin 
---
  src/intel/common/gen_decoder.c| 75 ---
  src/intel/common/gen_decoder.h| 24 +++--
  src/intel/tools/aubinator.c   |  7 ++-
  src/mesa/drivers/dri/i965/intel_batchbuffer.c | 26 +++---
  4 files changed, 101 insertions(+), 31 deletions(-)

diff --git a/src/intel/common/gen_decoder.c b/src/intel/common/gen_decoder.c
index 098ff472b37..c3fa150a6ea 100644
--- a/src/intel/common/gen_decoder.c
+++ b/src/intel/common/gen_decoder.c
@@ -807,12 +807,18 @@ iter_group_offset_bits(const struct gen_field_iterator *iter,
 return iter->group->group_offset + (group_iter * iter->group->group_size);
  }
  
+uint32_t gen_read_dword_from_pointer(void *user_data, uint32_t dword_offset)
+{
+   return ((uint32_t *) user_data)[dword_offset];
+}
+
  static bool
  iter_more_groups(const struct gen_field_iterator *iter)
  {
 if (iter->group->variable) {
return iter_group_offset_bits(iter, iter->group_iter + 1) <
-  (gen_group_get_length(iter->group, iter->p) * 32);
+ (gen_group_get_length(iter->group,
+   gen_read_dword(iter->reader, 0)) * 32);
 } else {
return (iter->group_iter + 1) < iter->group->group_count ||
   iter->group->next != NULL;
@@ -856,17 +862,20 @@ iter_advance_field(struct gen_field_iterator *iter)
  
  static uint64_t
  iter_decode_field_raw(struct gen_field *field,
-  const uint32_t *p,
-  const uint32_t *end)
+  uint32_t dword_offset,
+  uint32_t dword_end,
+  const struct gen_dword_reader *reader)
  {
 uint64_t qw = 0;
  
 if ((field->end - field->start) > 32) {
-  if ((p + 1) < end)
- qw = ((uint64_t) p[1]) << 32;
-  qw |= p[0];
+  if ((dword_offset + 1) < dword_end) {
+ qw = gen_read_dword(reader, dword_offset + 1);
+ qw <<= 32;
+  }
+  qw |= gen_read_dword(reader, dword_offset);
 } else
-  qw = p[0];
+  qw = gen_read_dword(reader, dword_offset);
  
 qw = field_value(qw, field->start, field->end);
  
@@ -895,8 +904,8 @@ iter_decode_field(struct gen_field_iterator *iter)
  
 memset(&v, 0, sizeof(v));
  
-   v.qw = iter_decode_field_raw(iter->field,
-&iter->p[iter->dword], iter->end);
+   v.qw = iter_decode_field_raw(iter->field, iter->dword,
+iter->dword_end, iter->reader);
  
 const char *enum_name = NULL;
  
@@ -966,7 +975,7 @@ iter_decode_field(struct gen_field_iterator *iter)
  void
  gen_field_iterator_init(struct gen_field_iterator *iter,
  struct gen_group *group,
-const uint32_t *p,
+const struct gen_dword_reader *reader,
  bool print_colors)
  {
 memset(iter, 0, sizeof(*iter));
@@ -976,8 +985,9 @@ gen_field_iterator_init(struct gen_field_iterator *iter,
iter->field = group->fields;
 else
iter->field = group->next->fields;
-   iter->p = p;
-   iter->end = &p[gen_group_get_length(iter->group, p[0])];
+   iter->reader = reader;
+   iter->dword_end = gen_group_get_length(iter

Re: [Mesa-dev] [PATCH 31/33] intel: decoder: decouple decoding from memory pointers

2017-10-31 Thread Scott D Phillips
Lionel Landwerlin  writes:

> We want to introduce a reader interface for accessing memory, so that
> later on we can use different ways of storing the content of the GTT
> address space that don't involve a pointer to a linear buffer.

I'm kinda sceptical that this is the best way to achieve what you want
here. It strikes me as code that we'll look at in a year and wonder
what's going on.

If I'm understanding, it seems like the essence of what you're going for
here is in the one place where you're using the sub_struct_reader. Maybe
instead of plumbing the reader object through everywhere, you can add a
callback just in gen_print_group for fixing up offsets to pointers, and
then leave everywhere else assuming contiguous memory blocks as today.
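
A rough sketch in C of what such a callback could look like. This is not the actual `gen_print_group` code: the command layout is a toy, and the names `resolve_addr_fn`, `decode_toy_command`, `linear_memory` and `resolve_linear` are all hypothetical. It only shows the split being suggested, with plain pointer math for the inline dwords and a callback only where an address is chased.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback: turn a graphics address into a pointer to a
 * contiguous block, or NULL if the address is not backed by anything.
 */
typedef const uint32_t *(*resolve_addr_fn)(void *user_data, uint64_t gfx_addr);

/* Toy command layout (an assumption, not a real packet):
 *   dword 0: header
 *   dword 1: low 32 bits of the address of a state structure
 *   dword 2: high 32 bits of that address
 * Returns the first dword of the pointed-to structure, or 0 if unresolved.
 */
static uint32_t
decode_toy_command(const uint32_t *p, resolve_addr_fn resolve, void *user_data)
{
   assert(p != NULL && resolve != NULL);

   /* Inline dwords: plain pointer math on a contiguous block. */
   uint64_t addr = ((uint64_t)p[2] << 32) | p[1];

   /* Indirection: the only place that needs the callback. */
   const uint32_t *state = resolve(user_data, addr);
   return state != NULL ? state[0] : 0;
}

/* Example backing store: one linear buffer placed at a base address. */
struct linear_memory {
   uint64_t base;
   const uint32_t *dwords;
   size_t count;
};

static const uint32_t *
resolve_linear(void *user_data, uint64_t gfx_addr)
{
   const struct linear_memory *mem = user_data;

   if (gfx_addr < mem->base)
      return NULL;
   uint64_t index = (gfx_addr - mem->base) / 4;
   return index < mem->count ? &mem->dwords[index] : NULL;
}
```

A sparse or file-backed store would only need to supply a different resolver; the decoding of the inline dwords stays untouched.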

> Signed-off-by: Lionel Landwerlin 
> ---
>  src/intel/common/gen_decoder.c| 75 ---
>  src/intel/common/gen_decoder.h| 24 +++--
>  src/intel/tools/aubinator.c   |  7 ++-
>  src/mesa/drivers/dri/i965/intel_batchbuffer.c | 26 +++---
>  4 files changed, 101 insertions(+), 31 deletions(-)
>
> diff --git a/src/intel/common/gen_decoder.c b/src/intel/common/gen_decoder.c
> index 098ff472b37..c3fa150a6ea 100644
> --- a/src/intel/common/gen_decoder.c
> +++ b/src/intel/common/gen_decoder.c
> @@ -807,12 +807,18 @@ iter_group_offset_bits(const struct gen_field_iterator *iter,
> return iter->group->group_offset + (group_iter * iter->group->group_size);
>  }
>  
> +uint32_t gen_read_dword_from_pointer(void *user_data, uint32_t dword_offset)
> +{
> +   return ((uint32_t *) user_data)[dword_offset];
> +}
> +
>  static bool
>  iter_more_groups(const struct gen_field_iterator *iter)
>  {
> if (iter->group->variable) {
>return iter_group_offset_bits(iter, iter->group_iter + 1) <
> -  (gen_group_get_length(iter->group, iter->p) * 32);
> + (gen_group_get_length(iter->group,
> +   gen_read_dword(iter->reader, 0)) * 32);
> } else {
>return (iter->group_iter + 1) < iter->group->group_count ||
>   iter->group->next != NULL;
> @@ -856,17 +862,20 @@ iter_advance_field(struct gen_field_iterator *iter)
>  
>  static uint64_t
>  iter_decode_field_raw(struct gen_field *field,
> -  const uint32_t *p,
> -  const uint32_t *end)
> +  uint32_t dword_offset,
> +  uint32_t dword_end,
> +  const struct gen_dword_reader *reader)
>  {
> uint64_t qw = 0;
>  
> if ((field->end - field->start) > 32) {
> -  if ((p + 1) < end)
> - qw = ((uint64_t) p[1]) << 32;
> -  qw |= p[0];
> +  if ((dword_offset + 1) < dword_end) {
> + qw = gen_read_dword(reader, dword_offset + 1);
> + qw <<= 32;
> +  }
> +  qw |= gen_read_dword(reader, dword_offset);
> } else
> -  qw = p[0];
> +  qw = gen_read_dword(reader, dword_offset);
>  
> qw = field_value(qw, field->start, field->end);
>  
> @@ -895,8 +904,8 @@ iter_decode_field(struct gen_field_iterator *iter)
>  
> memset(&v, 0, sizeof(v));
>  
> -   v.qw = iter_decode_field_raw(iter->field,
> -&iter->p[iter->dword], iter->end);
> +   v.qw = iter_decode_field_raw(iter->field, iter->dword,
> +iter->dword_end, iter->reader);
>  
> const char *enum_name = NULL;
>  
> @@ -966,7 +975,7 @@ iter_decode_field(struct gen_field_iterator *iter)
>  void
>  gen_field_iterator_init(struct gen_field_iterator *iter,
>  struct gen_group *group,
> -const uint32_t *p,
> +const struct gen_dword_reader *reader,
>  bool print_colors)
>  {
> memset(iter, 0, sizeof(*iter));
> @@ -976,8 +985,9 @@ gen_field_iterator_init(struct gen_field_iterator *iter,
>iter->field = group->fields;
> else
>iter->field = group->next->fields;
> -   iter->p = p;
> -   iter->end = &p[gen_group_get_length(iter->group, p[0])];
> +   iter->reader = reader;
> +   iter->dword_end = gen_group_get_length(iter->group,
> +  gen_read_dword(reader, 0));
> iter->print_colors = print_colors;
>  
> iter_decode_field(iter);
> @@ -997,10 +1007,12 @@ gen_field_iterator_next(struct gen_field_iterator *iter)
>  static void
>  print_dword_header(FILE *outfile,
> struct gen_field_iterator *iter,
> -   uint64_t offset, uint32_t dword)
> +   uint64_t offset,
> +   uint32_t dword)
>  {
> fprintf(outfile, "0x%08"PRIx64":  0x%08x : Dword %d\n",
> -   offset + 4 * dword, iter->p[dword], dword);
> +   offset + 4 * dword,
> +   gen_read_dword(iter->reader, dword), dword);
>  }
>  
>  bool
> @@ -1018,21 +1030,38 @@ gen_field_is_header(struct gen_field *field)
>  }
>  
>  void gen_fie

[Mesa-dev] [PATCH 31/33] intel: decoder: decouple decoding from memory pointers

2017-10-30 Thread Lionel Landwerlin
We want to introduce a reader interface for accessing memory, so that
later on we can use different ways of storing the content of the GTT
address space that don't involve a pointer to a linear buffer.

Signed-off-by: Lionel Landwerlin 
---
 src/intel/common/gen_decoder.c| 75 ---
 src/intel/common/gen_decoder.h| 24 +++--
 src/intel/tools/aubinator.c   |  7 ++-
 src/mesa/drivers/dri/i965/intel_batchbuffer.c | 26 +++---
 4 files changed, 101 insertions(+), 31 deletions(-)

diff --git a/src/intel/common/gen_decoder.c b/src/intel/common/gen_decoder.c
index 098ff472b37..c3fa150a6ea 100644
--- a/src/intel/common/gen_decoder.c
+++ b/src/intel/common/gen_decoder.c
@@ -807,12 +807,18 @@ iter_group_offset_bits(const struct gen_field_iterator *iter,
return iter->group->group_offset + (group_iter * iter->group->group_size);
 }
 
+uint32_t gen_read_dword_from_pointer(void *user_data, uint32_t dword_offset)
+{
+   return ((uint32_t *) user_data)[dword_offset];
+}
+
 static bool
 iter_more_groups(const struct gen_field_iterator *iter)
 {
if (iter->group->variable) {
   return iter_group_offset_bits(iter, iter->group_iter + 1) <
-  (gen_group_get_length(iter->group, iter->p) * 32);
+ (gen_group_get_length(iter->group,
+   gen_read_dword(iter->reader, 0)) * 32);
} else {
   return (iter->group_iter + 1) < iter->group->group_count ||
  iter->group->next != NULL;
@@ -856,17 +862,20 @@ iter_advance_field(struct gen_field_iterator *iter)
 
 static uint64_t
 iter_decode_field_raw(struct gen_field *field,
-  const uint32_t *p,
-  const uint32_t *end)
+  uint32_t dword_offset,
+  uint32_t dword_end,
+  const struct gen_dword_reader *reader)
 {
uint64_t qw = 0;
 
if ((field->end - field->start) > 32) {
-  if ((p + 1) < end)
- qw = ((uint64_t) p[1]) << 32;
-  qw |= p[0];
+  if ((dword_offset + 1) < dword_end) {
+ qw = gen_read_dword(reader, dword_offset + 1);
+ qw <<= 32;
+  }
+  qw |= gen_read_dword(reader, dword_offset);
} else
-  qw = p[0];
+  qw = gen_read_dword(reader, dword_offset);
 
qw = field_value(qw, field->start, field->end);
 
@@ -895,8 +904,8 @@ iter_decode_field(struct gen_field_iterator *iter)
 
memset(&v, 0, sizeof(v));
 
-   v.qw = iter_decode_field_raw(iter->field,
-&iter->p[iter->dword], iter->end);
+   v.qw = iter_decode_field_raw(iter->field, iter->dword,
+iter->dword_end, iter->reader);
 
const char *enum_name = NULL;
 
@@ -966,7 +975,7 @@ iter_decode_field(struct gen_field_iterator *iter)
 void
 gen_field_iterator_init(struct gen_field_iterator *iter,
 struct gen_group *group,
-const uint32_t *p,
+const struct gen_dword_reader *reader,
 bool print_colors)
 {
memset(iter, 0, sizeof(*iter));
@@ -976,8 +985,9 @@ gen_field_iterator_init(struct gen_field_iterator *iter,
   iter->field = group->fields;
else
   iter->field = group->next->fields;
-   iter->p = p;
-   iter->end = &p[gen_group_get_length(iter->group, p[0])];
+   iter->reader = reader;
+   iter->dword_end = gen_group_get_length(iter->group,
+  gen_read_dword(reader, 0));
iter->print_colors = print_colors;
 
iter_decode_field(iter);
@@ -997,10 +1007,12 @@ gen_field_iterator_next(struct gen_field_iterator *iter)
 static void
 print_dword_header(FILE *outfile,
struct gen_field_iterator *iter,
-   uint64_t offset, uint32_t dword)
+   uint64_t offset,
+   uint32_t dword)
 {
fprintf(outfile, "0x%08"PRIx64":  0x%08x : Dword %d\n",
-   offset + 4 * dword, iter->p[dword], dword);
+   offset + 4 * dword,
+   gen_read_dword(iter->reader, dword), dword);
 }
 
 bool
@@ -1018,21 +1030,38 @@ gen_field_is_header(struct gen_field *field)
 }
 
 void gen_field_decode(struct gen_field *field,
-  const uint32_t *p, const uint32_t *end,
+  const struct gen_dword_reader *reader,
   union gen_field_value *value)
 {
+   uint32_t length = gen_group_get_length(field->parent,
+  gen_read_dword(reader, 0));
uint32_t dword = field->start / 32;
-   value->u64 = iter_decode_field_raw(field, &p[dword], end);
+   value->u64 = iter_decode_field_raw(field, dword, length, reader);
+}
+
+struct sub_struct_reader {
+   struct gen_dword_reader base;
+   const struct gen_dword_reader *reader;
+   uint32_t struct_offset;
+};
+
+static uint32_t
+read_struct_dword(void *user_data, uint32_t dword_offset)
+{
+   struct sub_struct_reader *reader =