Am 16.08.2018 um 20:43 schrieb Felix Kuehling:
On 2018-08-16 02:26 PM, Christian König wrote:
Am 16.08.2018 um 20:23 schrieb Felix Kuehling:
On 2018-08-16 02:18 PM, Christian König wrote:
Am 16.08.2018 um 18:50 schrieb Felix Kuehling:
On 2018-08-16 02:43 AM, Christian König wrote:
[SNIP]
I mean it could be that in the worst case we race and stop a KFD
process for no good reason.
Right. For a more practical example, a KFD BO can get evicted just
before the application decides to unmap it. The preemption happens
asynchronously, handled by an SDMA job in the GPU scheduler. That job
will have an amdgpu_sync object with the eviction fence in it.
While that SDMA job is pending or in progress, the application decides
to unmap the BO. That removes the eviction fence from that BO's
reservation. But it can't remove the fence from all the sync objects
that were previously created and are still in flight. So the
preemption
will be triggered, and the fence will eventually signal when the KFD
preemption is complete.
I don't think that's something we can prevent. The worst case is
that a
preemption happens unnecessarily if an eviction gets triggered just
before removing the fence. But removing the fence will prevent future
evictions of the BO from triggering a KFD process preemption.
That's the
best we can do.
It's true that you can't drop the SDMA job which wants to evict the
BO, but at this time the fence signaling is already underway and not
stoppable anymore.
Replacing the fence with a new one would just be much more cleaner and
fix quite a bunch of corner cases where the KFD process would be
preempted without good reason.
Replacing the fence cleanly probably also involves a preemption, so you
don't gain anything.
Mhm, why that?
My idea would be to create a new fence, replace the old one with the
new one and then manually signal the old one.
So why should there be a preemption triggered here?
Right maybe you can do this without triggering a preemption, but it
would require a bunch of new code. Currently the only thing that
replaces an old eviction fence with a new one is a preemption+restore.
You'll need to replace the fence in all BOs belonging to the process.
Since removing a fence is something you don't want to do, that really
means adding a new fence, and leaving the old fence in place until it
signals. In the mean time you have two eviction fences active for this
process, and if either one gets triggered, you'll get a preemption. Only
after all BOs have the new eviction fence, you can disarm the old
eviction fence and signal it (without triggering a preemption).
The way I see it, this adds a bunch of CPU overhead (depending on the
number of BOs in the process), and increases the likelihood of
unnecessary preemptions, because it takes that much longer for the
operation to complete.
Yeah, the CPU overhead is indeed a really good point.
Talking about overhead, imagine a process cleaning up at process
termination, which unmaps and frees all BOs. Each BO that gets freed
replaces the eviction fence on all remaining BOs. If you have N BOs,
you'll end up creating N new eviction fences in the process. The
overhead scales with O(N²) of the number of BOs in the process.
Well as I said that is the only case where I think replacing fences is
unproblematic.
E.g. when a BO is freed we already have a mechanism to copy all fences
to a local reservation object (to prevent that any new fences can be
added to the BO if it shares it's reservation object).
At this time it should be really easy to filter out BOs you don't want
to wait for when the BO is destructed.
But anyway, when you can live with the possibility that a stale fence
triggers a process preemption than there isn't much point in cleaning
this up.
Regards,
Christian.
Regards,
Felix
Christian.
Regards,
Felix
It's probably quite a bit of more CPU overhead of doing so, but I
think that this would still be the more fail prove option.
Regards,
Christian.
Regards,
Felix
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx