On 12.10.22 19:09, Andrew Stubbs wrote:

On 12/10/2022 15:29, Tobias Burnus wrote:

Right, sorry, the buffer is circular, but the counter is linear. It
simplified reservation that way, but it does mean that there's a limit
to the number of times the buffer can cycle before the counter
saturates. (You'd need to stream out gigabytes of data to hit the
limit though.)
Or in other words, you can have 2^32 = 4,294,967,296 (write chunks +
reverse offloads) per kernel launch.
...
PS: Currently, device stack variables are private and cannot be
accessed from the host; this will change in a separate patch. [...]
So, the patch, as is, is known to be non-functional? How can you have
tested it? For the addrs_sizes_kind data to be accessible the
asm("s8") has to be wrong.

I have tested the non-addrs_sizes_kind part only, which permits to run
reverse-offload functions just fine, but only if they do not use
firstprivate or map. — And I actually also tested with the
addrs_sizes_kind part but that unsurprisingly fails hard when trying to
copy the stack data.

I think the patch looks good, in principle. The use of the existing
ring-buffer is the right way to do it, IMO. Can we get the manually
allocated stacks patch in first and then follow up with these patches
when they actually work?

I stash this patch as: "OK – but ams still want to have a glance once
__builtin_gcn_kernarg_ptr is in".

I terms of having fewer *.diff files around, I of course would prefer to
just change one line in a follow-up commit instead of keeping a full
patch around, but holding off until __builtin_gcn_kernarg_ptr is ready +
the default has changed to non-private stack variables is also fine.

Tobias

-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955

Reply via email to