On 18.11.22 18:49, Andrew Stubbs wrote:
On 18/11/2022 17:20, Tobias Burnus wrote:
This looks wrong:
+ /* stackbase = (stack_segment_decr & 0x0000ffffffffffff)
+ + stack_wave_offset);
+ seg_size = dispatch_ptr->private_segment_size;
+ stacklimit = stackbase + seg_size*64;
(this should be '*seg_size' not 'seg_size' and the name should be
s/seg_size/seg_size_ptr/.)
+ with segsize = dispatch_ptr + 6*sizeof(int16_t) +
3*sizeof(int32_t);
+ cf. struct hsa_kernel_dispatch_packet_s in the HSA doc. */
+ rtx ptr;
+ if (cfun->machine->args.reg[DISPATCH_PTR_ARG] >= 0
+ && cfun->machine->args.reg[PRIVATE_SEGMENT_BUFFER_ARG] >= 0)
+ {
+ rtx size_rtx = gen_rtx_REG (DImode,
+ cfun->machine->args.reg[DISPATCH_PTR_ARG]);
+ size_rtx = gen_rtx_MEM (DImode,
+ gen_rtx_PLUS (DImode, size_rtx,
+ GEN_INT (6*16 + 3*32)));
+ size_rtx = gen_rtx_MULT (DImode, size_rtx, GEN_INT (64));
+
(Reading it, I think it should be '..._MEM(SImode,' and
'..._MULT(SImode' instead of DImode.)
seg_size is calculated from the private_segment_size loaded from the
dispatch_ptr, not calculated from the dispatch_ptr itself.
Isn't this what thee code tries to do? Namely:
My understanding is that
dispatch_ptr->private_segment_size == *((char*)dispatch_ptr + 192)
And the latter is what I attempt to do. I have a very limited knowledge
of insn/rtx/RTL and of GCN assemply; thus, I likely have done something
stupid. Having said this, Here is what I get:
(Where asm("s4") == dispatch_ptr)
s_add_u32 s2, s4, 192
s_addc_u32 s3, s5, 0
v_writelane_b32 v4, s2, 0
v_writelane_b32 v5, s3, 0
s_mov_b64 exec, 1
flat_load_dwordx2 v[4:5], v[4:5]
s_waitcnt 0
v_lshlrev_b64 v[4:5], 6, v[4:5]
v_readlane_b32 s2, v4, 0
v_readlane_b32 s3, v5, 0
Not that I really understand every line, but at a glance it
looks okay.
The 192 is because of (quoting newlib/libc/machine/amdgcn/getreent.c):
typedef struct hsa_kernel_dispatch_packet_s {
uint16_t header ;
uint16_t setup;
uint16_t workgroup_size_x ;
uint16_t workgroup_size_y ;
uint16_t workgroup_size_z;
uint16_t reserved0;
uint32_t grid_size_x ;
uint32_t grid_size_y ;
uint32_t grid_size_z;
uint32_t private_segment_size;
i.e. 6*16 + 3*32 = 192 – and we want to read a 32bit unsigned int.
* * *
Admittedly, there is probably something not quite right as I see with gfx908
# of expected passes 27476
# of unexpected failures 317
where 317 FAIL comes from 88 testcase files.
That's not a a very high number but more than the usual fails, which shows that
something is not quite right.
* * *
I am pretty sure that I missed something - but the question is what.
I hope you can help me pinpoint the place where it goes wrong.
Thanks,
Tobias
-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht
München, HRB 106955