Hi Andrey,

I don't have any XGMI machines here; maybe you can reach out to Shaoyun for help.

On 2022/1/29 12:57 AM, Grodzovsky, Andrey wrote:
> Just a gentle ping.
>
> Andrey
> ----------------------------------------
> *From:* Grodzovsky, Andrey
> *Sent:* 26 January 2022 10:52
> *To:* Christian König <ckoenig.leichtzumer...@gmail.com>; Koenig, Christian 
> <christian.koe...@amd.com>; Lazar, Lijo <lijo.la...@amd.com>; 
> dri-de...@lists.freedesktop.org <dri-de...@lists.freedesktop.org>; 
> amd-gfx@lists.freedesktop.org <amd-gfx@lists.freedesktop.org>; Chen, JingWen 
> <jingwen.ch...@amd.com>
> *Cc:* Chen, Horace <horace.c...@amd.com>; Liu, Monk <monk....@amd.com>
> *Subject:* Re: [RFC v2 4/8] drm/amdgpu: Serialize non TDR gpu recovery with 
> TDRs
>  
>
> JingWen, could you maybe give those patches a try on an SRIOV XGMI system? If 
> you see issues, maybe you could let me connect and debug. My SRIOV XGMI system, 
> which Shaoyun kindly arranged for me, does not load the driver with my 
> drm-misc-next branch even without my patches.
>
> Andrey
>
> On 2022-01-17 14:21, Andrey Grodzovsky wrote:
>>
>>
>> On 2022-01-17 2:17 p.m., Christian König wrote:
>>> Am 17.01.22 um 20:14 schrieb Andrey Grodzovsky:
>>>>
>>>> Ping on the question
>>>>
>>>
>>> Oh, my! That was already more than a week ago and is completely swapped out 
>>> of my head again.
>>>
>>>> Andrey
>>>>
>>>> On 2022-01-05 1:11 p.m., Andrey Grodzovsky wrote:
>>>>>>> Also, what about having the reset_active or in_reset flag in the 
>>>>>>> reset_domain itself?
>>>>>>
>>>>>> Offhand, that sounds like a good idea.
>>>>>
>>>>>
>>>>> What about the adev->reset_sem semaphore, then? Should we also move it 
>>>>> to reset_domain? Both moves have functional
>>>>> implications only for the XGMI case, because there will be contention over 
>>>>> accessing those single-instance variables from multiple devices,
>>>>> whereas now each device has its own copy.
>>>
>>> Since this is an rw semaphore, that should be unproblematic, I think. At 
>>> worst, the cache line of the lock could then ping-pong between the 
>>> CPU cores.
>>>
>>>>>
>>>>> What benefit does centralizing into reset_domain give? Is it, for 
>>>>> example, to prevent one device in a hive from accessing another device's
>>>>> VRAM (shared FB memory) through MMIO while the other one goes through 
>>>>> reset?
>>>
>>> I think that this is the killer argument for a centralized lock, yes.
>>
>>
>> np, I will add a patch centralizing both the flag and the semaphore into 
>> reset_domain and resend.
>>
>> Andrey
>>
>>
>>>
>>> Christian.
>>>
>>>>>
>>>>> Andrey 
>>>
