On 2/12/26 10:13, Arsen Arsenović wrote:
Hi!
Matthew Malcomson <[email protected]> writes:
From: Matthew Malcomson <[email protected]>
I imagine there is a good chunk of performance to be gained in adding
more logic here:
1) Something recording whether the affinity setup has changed since the
last non-nested team and, if it has not, skipping the calculation of
places and the assignment to the `nthr->ts.place_partition_*` variables.
2) I could move more members that are assigned multiple times to `team`
for each secondary thread to take.
- `data` pointer.
- `num_teams`.
- `team_num`.
- Honestly, I should have looked into this in the patch; I noticed it
while writing this cover letter and will look into whether it is
feasible relatively soon.
3) If we can identify that we're re-using a team from the last parallel
region (and affinity ICV has not changed) it seems that we could
avoid re-initialising some of its fields:
- ordered_release[i] should already point to `nthr`?
- ts.team_id should already be set.
Overall if we can identify that we're using the same team as was
cached and we don't need to change anything we should be able to get
away with drastically less work in the serial part of the call to
GOMP_parallel.
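
As a rough illustration of points 1) to 3), here is a minimal C sketch of
such a fast path.  Every name in it (`sketch_team`, `sketch_thread`,
`sketch_affinity_gen`, `sketch_prepare_team`) is a hypothetical stand-in and
not libgomp's real definition; it also assumes the caller passes the same
thread array whenever the cached team is reused.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins; not libgomp's structures.  */
#define SKETCH_MAX_THREADS 8

struct sketch_thread
{
  unsigned team_id;
};

struct sketch_team
{
  bool initialised;
  unsigned nthreads;
  /* Per-region value stored once on the team for secondary threads to
     read, instead of being assigned once per thread (point 2).  */
  void *data;
  struct sketch_thread *ordered_release[SKETCH_MAX_THREADS];
  /* Affinity generation this team was last set up with (point 1).  */
  unsigned long affinity_gen;
};

/* Assumed to be bumped whenever the affinity setup changes.  */
static unsigned long sketch_affinity_gen;

/* Prepare TEAM for a new parallel region and return how many per-thread
   initialisations were performed in the serial part.  */
static unsigned
sketch_prepare_team (struct sketch_team *team, struct sketch_thread *threads,
                     unsigned nthreads, void *data)
{
  assert (nthreads <= SKETCH_MAX_THREADS);
  unsigned per_thread_inits = 0;

  /* Written once per region, regardless of the number of threads.  */
  team->data = data;

  /* Point 3: the cached team is reusable if its shape and the affinity
     setup are unchanged since it was last initialised.  */
  bool reusable = team->initialised
                  && team->nthreads == nthreads
                  && team->affinity_gen == sketch_affinity_gen;
  if (!reusable)
    {
      for (unsigned i = 0; i < nthreads; i++)
        {
          threads[i].team_id = i;
          team->ordered_release[i] = &threads[i];
          per_thread_inits++;
        }
      team->nthreads = nthreads;
      team->affinity_gen = sketch_affinity_gen;
      team->initialised = true;
    }
  return per_thread_inits;
}
```

In this sketch a second call with the same thread count and an unchanged
generation counter does no per-thread work at all; bumping
`sketch_affinity_gen` forces the full setup again.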
I've actually made pretty much this same change for the GCN
configuration of libgomp, resulting in a semantic conflict with this
patch (yet to upstream it, though).
It may make sense to extract the common thread initialization (including
from this cache) into a function so that it can be used in other
libgomp configurations.
Hi Arsen,
Nice to know others see the benefit!
Having a common thread initialisation function sounds like a sensible idea.
One complication would be that (once I've investigated the three
improvements I mentioned above) I expect some of the thread
initialization done in the generic code would not be things that the
targets use (e.g. the `num_teams` member).
I guess if that does turn out to be the case we could accept a little
sub-optimality (it would be in the parallel region, so much less critical)
rather than making a generic function that is strongly tied to the current
implementation details of the backends.
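
To make that trade-off concrete, a shared helper could look roughly like the
sketch below.  The names (`sketch_team_info`, `sketch_worker`,
`sketch_init_worker`) are made up for illustration and are not existing
libgomp interfaces.  Each secondary thread would call it on itself, so the
copies happen in the parallel part, and a target that never reads
`num_teams` simply pays for one redundant store.

```c
#include <stddef.h>

/* Hypothetical per-region values published once on the team.  */
struct sketch_team_info
{
  void *data;
  unsigned num_teams;
  unsigned team_num;
};

/* Hypothetical per-thread state.  */
struct sketch_worker
{
  unsigned team_id;
  void *data;
  unsigned num_teams;
  unsigned team_num;
};

/* Common initialisation that any configuration could share.  It copies
   every field unconditionally, including ones a given target may never
   use, rather than encoding per-backend knowledge here.  */
static void
sketch_init_worker (struct sketch_worker *w,
                    const struct sketch_team_info *team, unsigned team_id)
{
  w->team_id = team_id;
  w->data = team->data;
  w->num_teams = team->num_teams;
  w->team_num = team->team_num;
}
```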
MM