On 6/10/20 5:16 PM, Daniel Vetter wrote:
On Wed, Jun 10, 2020 at 10:30 PM Thomas Hellström (Intel)
<thomas...@shipmail.org> wrote:
On 6/10/20 5:30 PM, Daniel Vetter wrote:
On Wed, Jun 10, 2020 at 04:05:04PM +0200, Christian König wrote:
Am 10.06.20 um 15:54 schrieb Andrey Grodzovsky:
On 6/10/20 6:15 AM, Thomas Hellström (Intel) wrote:
On 6/9/20 7:21 PM, Koenig, Christian wrote:
Am 09.06.2020 18:37 schrieb "Grodzovsky, Andrey"
<andrey.grodzov...@amd.com>:
On 6/5/20 2:40 PM, Christian König wrote:
> Am 05.06.20 um 16:29 schrieb Andrey Grodzovsky:
>>
>> On 5/11/20 2:45 AM, Christian König wrote:
>>> Am 09.05.20 um 20:51 schrieb Andrey Grodzovsky:
>>>> Signed-off-by: Andrey Grodzovsky <andrey.grodzov...@amd.com>
>>>> ---
>>>> drivers/gpu/drm/ttm/ttm_bo.c | 22 +++++++++++++++++++++-
>>>> include/drm/ttm/ttm_bo_driver.h | 2 ++
>>>> 2 files changed, 23 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c
>>>> b/drivers/gpu/drm/ttm/ttm_bo.c
>>>> index c5b516f..eae61cc 100644
>>>> --- a/drivers/gpu/drm/ttm/ttm_bo.c
>>>> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
>>>> @@ -1750,9 +1750,29 @@ void ttm_bo_unmap_virtual(struct
>>>> ttm_buffer_object *bo)
>>>> ttm_bo_unmap_virtual_locked(bo);
>>>> ttm_mem_io_unlock(man);
>>>> }
>>>> +EXPORT_SYMBOL(ttm_bo_unmap_virtual);
>>>> +void ttm_bo_unmap_virtual_address_space(struct
ttm_bo_device *bdev)
>>>> +{
>>>> + struct ttm_mem_type_manager *man;
>>>> + int i;
>>>> -EXPORT_SYMBOL(ttm_bo_unmap_virtual);
>>>
>>>> + for (i = 0; i < TTM_NUM_MEM_TYPES; i++) {
>>>> + man = &bdev->man[i];
>>>> + if (man->has_type && man->use_type)
>>>> + ttm_mem_io_lock(man, false);
>>>> + }
>>>
>>> You should drop that it will just result in a deadlock
warning for
>>> Nouveau and has no effect at all.
>>>
>>> Apart from that looks good to me,
>>> Christian.
>>
>>
>> As I am considering to re-include this in V2 of the
patchsets, can
>> you clarify please why this will have no effect at all ?
>
> The locks are exclusive for Nouveau to allocate/free the io
address
> space.
>
> Since we don't do this here we don't need the locks.
>
> Christian.
So basically calling unmap_mapping_range doesn't require any extra
locking around it and whatever locks are taken within the function
should be enough ?
I think so, yes.
Christian.
Yes, that's true. However, without the bo reservation, nothing stops
a PTE from being immediately re-faulted back again. Even while
unmap_mapping_range() is running.
Can you explain more on this - specifically, which function to reserve
the BO, why BO reservation would prevent re-fault of the PTE ?
Thomas is talking about ttm_bo_reserver()/ttm_bo_unreserve(), but we don't
need this because we unmap everything because the whole device is gone and
not just manipulate a single BO.
So the device removed flag needs to be advertized before this
function is run,
I indeed intend to call this right after calling drm_dev_unplug from
amdgpu_pci_remove while adding drm_dev_enter/exit in ttm_bo_vm_fault (or
in amdgpu specific wrapper since I don't see how can I access struct
drm_device from ttm_bo_vm_fault) and this in my understanding should
stop a PTE from being re-faulted back as you pointed out - so again I
don't see how bo reservation would prevent it so it looks like I am
missing something...
(perhaps with a memory barrier pair).
drm_dev_unplug and drm_dev_enter/exit are RCU synchronized and so I
don't think require any extra memory barriers for visibility of the
removed flag being set
As far as I can see that should be perfectly sufficient.
Only if you have a drm_dev_enter/exit pair in your fault handler.
Otherwise you're still open to the races Thomas described. But aside from
that the drm_dev_unplug stuff has all the barriers and stuff to make sure
nothing escapes.
Failure to drm_dev_enter could then also trigger the special case where we
put a dummy page in place.
-Daniel
Hmm, Yes, indeed advertizing the flag before the call to
unmap_mapping_range isn't enough, since there might be fault handlers
running that haven't picked up the flag when unmap_mapping_range is
launched.
Hm ... Now I'm not sure drm_dev_enter/exit is actually good enough. I
guess if you use vmf_insert_pfn within the drm_dev_enter/exit critical
section, it should be fine. But I think you can also do fault handlers
that just return the struct page and then let core handle the pte
wrangling, those would indeed race and we can't have that I think.
I think we should try and make sure (as much as possible) that this is
done all done in helpers and not some open coded stuff in drivers, or
we'll just get it all wrong in the details.
Can you please clarify this last paragraph ? Where in your opinion
should I place the drm_dev_enter/exit and the zero page setting to
faulting VA's PTEs ? I was planning to do it in amdgpu specific .fault
handler which in turn calls to ttm_bo_vm_fault.
Andrey
For the special case of syncing a full address-space
unmap_mapping_range() with fault handlers regardless of the reason for
the full address-space unmap_mapping_range() one could either traverse
the address space (drm_vma_manager) and grab *all* bo reservations
around the unmap_mapping_range(), or grab the i_mmap_lock in read mode
in the fault handler. (It's taken in write mode in unmap_mapping_range).
While the latter may seem like a simple solution, one should probably
consider the overhead both in run-time and scaling ability.
drm_dev_enter/exit uses srcu internally, so afaik should scale
ridiculously well and be dirt cheap on the read side. It's horrible on
the flush side in drm_dev_unplug, but hey no one cares about that :-)
-Daniel
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx