On 18.11.22 18:49, Andrew Stubbs wrote:
On 18/11/2022 17:20, Tobias Burnus wrote:

This looks wrong:

+    /* stackbase = (stack_segment_decr & 0x0000ffffffffffff)
+            + stack_wave_offset);
+       seg_size = dispatch_ptr->private_segment_size;
+       stacklimit = stackbase + seg_size*64;
(this should be '*seg_size' not 'seg_size' and the name should be
s/seg_size/seg_size_ptr/.)
+       with segsize = dispatch_ptr + 6*sizeof(int16_t) +
3*sizeof(int32_t);
+       cf. struct hsa_kernel_dispatch_packet_s in the HSA doc. */
+    rtx ptr;
+    if (cfun->machine->args.reg[DISPATCH_PTR_ARG] >= 0
+        && cfun->machine->args.reg[PRIVATE_SEGMENT_BUFFER_ARG] >= 0)
+      {
+        rtx size_rtx = gen_rtx_REG (DImode,
+ cfun->machine->args.reg[DISPATCH_PTR_ARG]);
+        size_rtx = gen_rtx_MEM (DImode,
+                    gen_rtx_PLUS (DImode, size_rtx,
+                          GEN_INT (6*16 + 3*32)));
+        size_rtx = gen_rtx_MULT (DImode, size_rtx, GEN_INT (64));
+
(Reading it, I think it should be '..._MEM(SImode,' and
'..._MULT(SImode' instead of DImode.)
seg_size is calculated from the private_segment_size loaded from the
dispatch_ptr, not calculated from the dispatch_ptr itself.

Isn't this what thee code tries to do? Namely:


My understanding is that

dispatch_ptr->private_segment_size == *((char*)dispatch_ptr + 192)

And the latter is what I attempt to do. I have a very limited knowledge
of insn/rtx/RTL and of GCN assemply; thus, I likely have done something
stupid. Having said this, Here is what I get:

(Where asm("s4") == dispatch_ptr)

        s_add_u32       s2, s4, 192
        s_addc_u32      s3, s5, 0
        v_writelane_b32 v4, s2, 0
        v_writelane_b32 v5, s3, 0
        s_mov_b64       exec, 1
        flat_load_dwordx2       v[4:5], v[4:5]
        s_waitcnt       0
        v_lshlrev_b64   v[4:5], 6, v[4:5]
        v_readlane_b32  s2, v4, 0
        v_readlane_b32  s3, v5, 0

Not that I really understand every line, but at a glance it
looks okay.

The 192 is because of (quoting newlib/libc/machine/amdgcn/getreent.c):

typedef struct hsa_kernel_dispatch_packet_s {
  uint16_t header ;
  uint16_t setup;
  uint16_t workgroup_size_x ;
  uint16_t workgroup_size_y ;
  uint16_t workgroup_size_z;
  uint16_t reserved0;
  uint32_t grid_size_x ;
  uint32_t grid_size_y ;
  uint32_t grid_size_z;
  uint32_t private_segment_size;

i.e. 6*16 + 3*32 = 192 – and we want to read a 32bit unsigned int.

 * * *

Admittedly, there is probably something not quite right as I see with gfx908

  # of expected passes            27476
  # of unexpected failures        317

where 317 FAIL comes from 88 testcase files.

That's not a a very high number but more than the usual fails, which shows that
something is not quite right.

 * * *

I am pretty sure that I missed something - but the question is what.
I hope you can help me pinpoint the place where it goes wrong.

Thanks,

Tobias

-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955

Reply via email to