RE: [PATCH 3/6] drm/amdgpu: allocate VM PDs/PTs on demand

2019-03-13 Thread Russell, Kent
Fantastic, I'm integrating it now. Fingers crossed!

 Kent

> -Original Message-
> From: Kuehling, Felix
> Sent: Tuesday, March 12, 2019 7:31 PM
> To: Yang, Philip ; Russell, Kent
> ; Koenig, Christian ;
> amd-gfx@lists.freedesktop.org
> Subject: Re: [PATCH 3/6] drm/amdgpu: allocate VM PDs/PTs on demand
> 
> Never mind. I must have messed up my build. I can't reproduce the problem
> any more. The patch I sent out is still needed and valid. AFAICT it should be
> all that's needed to fix GPUVM for KFD.
> 
> I have not seen any faults with KFDCWSRTest.BasicTest on my system with
> Fiji or Vega10 with that patch applied.
> 
> Regards,
>    Felix
> 
> On 2019-03-12 5:19 p.m., Felix Kuehling wrote:
> > I'm also still seeing VM faults in the eviction test even with my fix,
> > and even with SDMA page table updates. There is still something else
> > going wrong. :/
> >
> > Thanks,
> >   Felix
> >
> > On 2019-03-12 5:13 p.m., Yang, Philip wrote:
> >> vm fault happens about 1/10 for KFDCWSRTest.BasicTest for me. I am
> >> using SDMA for page table update. I don't try CPU page table update.
> >>
> >> Philip
> >>
> >> On 2019-03-12 11:12 a.m., Russell, Kent wrote:
> >>> Peculiar, I hit it immediately when I ran it. Can you try using
> >>> --gtest_filter=KFDCWSRTest.BasicTest? That one hung every time for me.
> >>>
> >>>    Kent
> >>>
> >>>> -----Original Message-
> >>>> From: Christian König 
> >>>> Sent: Tuesday, March 12, 2019 11:09 AM
> >>>> To: Russell, Kent ; Koenig, Christian
> >>>> ; Kuehling, Felix
> >>>> ; amd-gfx@lists.freedesktop.org
> >>>> Subject: Re: [PATCH 3/6] drm/amdgpu: allocate VM PDs/PTs on
> demand
> >>>>
> >>>> Yeah, same problem here.
> >>>>
> >>>> I removed libhsakmt package and installed it manually and now it
> >>>> seems to work.
> >>>>
> >>>> Doing some testing now, but at least of hand I can't seem to
> >>>> reproduce the VM fault on a Vega10.
> >>>>
> >>>> Christian.
> >>>>
> >>>> Am 12.03.19 um 16:01 schrieb Russell, Kent:
> >>>>> Oh right, I remember that issue. I had that happen to me once,
> >>>>> where my
> >>>> installed libhsakmt didn't match up with the latest source code, so
> >>>> I ended up having to remove the libhsakmt package and pointing it
> >>>> to the folders instead.
> >>>>>     Kent
> >>>>>
> >>>>>> -Original Message-
> >>>>>> From: Koenig, Christian
> >>>>>> Sent: Tuesday, March 12, 2019 10:49 AM
> >>>>>> To: Russell, Kent ; Kuehling, Felix
> >>>>>> ; amd-gfx@lists.freedesktop.org
> >>>>>> Subject: Re: [PATCH 3/6] drm/amdgpu: allocate VM PDs/PTs on
> >>>>>> demand
> >>>>>>
> >>>>>> Yeah, the problem is I do have the libhsakmt installed.
> >>>>>>
> >>>>>> Going to give it a try to specify the directory directly.
> >>>>>>
> >>>>>> Christian.
> >>>>>>
> >>>>>> Am 12.03.19 um 15:47 schrieb Russell, Kent:
> >>>>>>> The README.txt file inside the tests/kfdtest folder has
> >>>>>>> instructions on how
> >>>>>> to do it if you don't have the libhsakmt package installed on
> >>>>>> your system:
> >>>>>>> export LIBHSAKMT_PATH=/*your local libhsakmt folder*/ With
> that,
> >>>>>>> the headers and libraries are searched under
> >>>>>>> LIBHSAKMT_PATH/include and LIBHSAKMT_PATH/lib respectively.
> >>>>>>>
> >>>>>>> So if you try export LIBHSAKMT_PATH as the root ROCT folder (the
> >>>>>>> one
> >>>>>> containing include, src, tests, etc), then that should cover it.
> >>>>>>>  Kent
> >>>>>>>
> >>>>>>>
> >>>>>>>> -Original Message-
> >>>>>>>> From: Christian König 
> >>>>>>>> Sent: Tuesday, March 12, 2019 9:13 AM
> >>>>>>>> To: Russell, Kent ;

Re: [PATCH 3/6] drm/amdgpu: allocate VM PDs/PTs on demand

2019-03-12 Thread Kuehling, Felix
Never mind. I must have messed up my build. I can't reproduce the 
problem any more. The patch I sent out is still needed and valid. AFAICT 
it should be all that's needed to fix GPUVM for KFD.

I have not seen any faults with KFDCWSRTest.BasicTest on my system with 
Fiji or Vega10 with that patch applied.

Regards,
   Felix

On 2019-03-12 5:19 p.m., Felix Kuehling wrote:
> I'm also still seeing VM faults in the eviction test even with my fix, 
> and even with SDMA page table updates. There is still something else 
> going wrong. :/
>
> Thanks,
>   Felix
>
> On 2019-03-12 5:13 p.m., Yang, Philip wrote:
>> vm fault happens about 1/10 for KFDCWSRTest.BasicTest for me. I am using
>> SDMA for page table update. I don't try CPU page table update.
>>
>> Philip
>>
>> On 2019-03-12 11:12 a.m., Russell, Kent wrote:
>>> Peculiar, I hit it immediately when I ran it . Can you try use 
>>> --gtest_filter=KFDCWSRTest.BasicTest . That one hung every time for me.
>>>
>>>    Kent
>>>
>>>> -Original Message-
>>>> From: Christian König 
>>>> Sent: Tuesday, March 12, 2019 11:09 AM
>>>> To: Russell, Kent ; Koenig, Christian
>>>> ; Kuehling, Felix ;
>>>> amd-gfx@lists.freedesktop.org
>>>> Subject: Re: [PATCH 3/6] drm/amdgpu: allocate VM PDs/PTs on demand
>>>>
>>>> Yeah, same problem here.
>>>>
>>>> I removed libhsakmt package and installed it manually and now it 
>>>> seems to
>>>> work.
>>>>
>>>> Doing some testing now, but at least of hand I can't seem to 
>>>> reproduce the
>>>> VM fault on a Vega10.
>>>>
>>>> Christian.
>>>>
>>>> Am 12.03.19 um 16:01 schrieb Russell, Kent:
>>>>> Oh right, I remember that issue. I had that happen to me once, 
>>>>> where my
>>>> installed libhsakmt didn't match up with the latest source code, so 
>>>> I ended up
>>>> having to remove the libhsakmt package and pointing it to the folders
>>>> instead.
>>>>>     Kent
>>>>>
>>>>>> -Original Message-
>>>>>> From: Koenig, Christian
>>>>>> Sent: Tuesday, March 12, 2019 10:49 AM
>>>>>> To: Russell, Kent ; Kuehling, Felix
>>>>>> ; amd-gfx@lists.freedesktop.org
>>>>>> Subject: Re: [PATCH 3/6] drm/amdgpu: allocate VM PDs/PTs on demand
>>>>>>
>>>>>> Yeah, the problem is I do have the libhsakmt installed.
>>>>>>
>>>>>> Going to give it a try to specify the directory directly.
>>>>>>
>>>>>> Christian.
>>>>>>
>>>>>> Am 12.03.19 um 15:47 schrieb Russell, Kent:
>>>>>>> The README.txt file inside the tests/kfdtest folder has 
>>>>>>> instructions
>>>>>>> on how
>>>>>> to do it if you don't have the libhsakmt package installed on 
>>>>>> your system:
>>>>>>> export LIBHSAKMT_PATH=/*your local libhsakmt folder*/ With that, 
>>>>>>> the
>>>>>>> headers and libraries are searched under LIBHSAKMT_PATH/include and
>>>>>>> LIBHSAKMT_PATH/lib respectively.
>>>>>>>
>>>>>>> So if you try export LIBHSAKMT_PATH as the root ROCT folder (the 
>>>>>>> one
>>>>>> containing include, src, tests, etc), then that should cover it.
>>>>>>>  Kent
>>>>>>>
>>>>>>>
>>>>>>>> -Original Message-
>>>>>>>> From: Christian König 
>>>>>>>> Sent: Tuesday, March 12, 2019 9:13 AM
>>>>>>>> To: Russell, Kent ; Kuehling, Felix
>>>>>>>> ; Koenig, Christian
>>>>>>>> ; amd-gfx@lists.freedesktop.org
>>>>>>>> Subject: Re: [PATCH 3/6] drm/amdgpu: allocate VM PDs/PTs on
>>>> demand
>>>>>>>> Hi guys,
>>>>>>>>
>>>>>>>> so found a few minutes today to compile kfdtest.
>>>>>>>>
>>>>>>>> Problem is that during the compile I get a lots of this:
>>>>>>>>> CMakeFiles/kfdtest.dir/src/BaseQueue.cpp.o: In Funktion
>>>>>>>>> »BaseQueue::Create(unsigned int, unsigned int, unsigned l

Re: [PATCH 3/6] drm/amdgpu: allocate VM PDs/PTs on demand

2019-03-12 Thread Kuehling, Felix
I'm also still seeing VM faults in the eviction test even with my fix, 
and even with SDMA page table updates. There is still something else 
going wrong. :/

Thanks,
   Felix

On 2019-03-12 5:13 p.m., Yang, Philip wrote:
> vm fault happens about 1/10 for KFDCWSRTest.BasicTest for me. I am using
> SDMA for page table update. I don't try CPU page table update.
>
> Philip
>
> On 2019-03-12 11:12 a.m., Russell, Kent wrote:
>> Peculiar, I hit it immediately when I ran it . Can you try use 
>> --gtest_filter=KFDCWSRTest.BasicTest  . That one hung every time for me.
>>
>>Kent
>>
>>> -Original Message-
>>> From: Christian König 
>>> Sent: Tuesday, March 12, 2019 11:09 AM
>>> To: Russell, Kent ; Koenig, Christian
>>> ; Kuehling, Felix ;
>>> amd-gfx@lists.freedesktop.org
>>> Subject: Re: [PATCH 3/6] drm/amdgpu: allocate VM PDs/PTs on demand
>>>
>>> Yeah, same problem here.
>>>
>>> I removed libhsakmt package and installed it manually and now it seems to
>>> work.
>>>
>>> Doing some testing now, but at least of hand I can't seem to reproduce the
>>> VM fault on a Vega10.
>>>
>>> Christian.
>>>
>>> Am 12.03.19 um 16:01 schrieb Russell, Kent:
>>>> Oh right, I remember that issue. I had that happen to me once, where my
>>> installed libhsakmt didn't match up with the latest source code, so I ended 
>>> up
>>> having to remove the libhsakmt package and pointing it to the folders
>>> instead.
>>>> Kent
>>>>
>>>>> -Original Message-
>>>>> From: Koenig, Christian
>>>>> Sent: Tuesday, March 12, 2019 10:49 AM
>>>>> To: Russell, Kent ; Kuehling, Felix
>>>>> ; amd-gfx@lists.freedesktop.org
>>>>> Subject: Re: [PATCH 3/6] drm/amdgpu: allocate VM PDs/PTs on demand
>>>>>
>>>>> Yeah, the problem is I do have the libhsakmt installed.
>>>>>
>>>>> Going to give it a try to specify the directory directly.
>>>>>
>>>>> Christian.
>>>>>
>>>>> Am 12.03.19 um 15:47 schrieb Russell, Kent:
>>>>>> The README.txt file inside the tests/kfdtest folder has instructions
>>>>>> on how
>>>>> to do it if you don't have the libhsakmt package installed on your system:
>>>>>> export LIBHSAKMT_PATH=/*your local libhsakmt folder*/ With that, the
>>>>>> headers and libraries are searched under LIBHSAKMT_PATH/include and
>>>>>> LIBHSAKMT_PATH/lib respectively.
>>>>>>
>>>>>> So if you try export LIBHSAKMT_PATH as the root ROCT folder (the one
>>>>> containing include, src, tests, etc), then that should cover it.
>>>>>>  Kent
>>>>>>
>>>>>>
>>>>>>> -Original Message-
>>>>>>> From: Christian König 
>>>>>>> Sent: Tuesday, March 12, 2019 9:13 AM
>>>>>>> To: Russell, Kent ; Kuehling, Felix
>>>>>>> ; Koenig, Christian
>>>>>>> ; amd-gfx@lists.freedesktop.org
>>>>>>> Subject: Re: [PATCH 3/6] drm/amdgpu: allocate VM PDs/PTs on
>>> demand
>>>>>>> Hi guys,
>>>>>>>
>>>>>>> so found a few minutes today to compile kfdtest.
>>>>>>>
>>>>>>> Problem is that during the compile I get a lots of this:
>>>>>>>> CMakeFiles/kfdtest.dir/src/BaseQueue.cpp.o: In Funktion
>>>>>>>> »BaseQueue::Create(unsigned int, unsigned int, unsigned long*)«:
>>>>>>>> /usr/src/ROCT-Thunk-Interface/tests/kfdtest/src/BaseQueue.cpp:57:
>>>>>>>> Warnung: undefinierter Verweis auf »hsaKmtCreateQueue«
>>>>>>> Any idea?
>>>>>>>
>>>>>>> Christian.
>>>>>>>
>>>>>>> Am 11.03.19 um 17:55 schrieb Christian König:
>>>>>>>> Hi guys,
>>>>>>>>
>>>>>>>> well it's most likely some missing handling in the KFD, so I'm
>>>>>>>> rather reluctant to revert the change immediately.
>>>>>>>>
>>>>>>>> Problem is that I don't have time right now to look into it
>>>>>>>> immediately. So Kent can you continue to take a look?

Re: [PATCH 3/6] drm/amdgpu: allocate VM PDs/PTs on demand

2019-03-12 Thread Yang, Philip
The VM fault happens in about 1 of 10 runs of KFDCWSRTest.BasicTest for me. I am
using SDMA for page table updates. I haven't tried CPU page table updates.

Philip

On 2019-03-12 11:12 a.m., Russell, Kent wrote:
> Peculiar, I hit it immediately when I ran it . Can you try use 
> --gtest_filter=KFDCWSRTest.BasicTest  . That one hung every time for me.
> 
>   Kent
> 
>> -Original Message-
>> From: Christian König 
>> Sent: Tuesday, March 12, 2019 11:09 AM
>> To: Russell, Kent ; Koenig, Christian
>> ; Kuehling, Felix ;
>> amd-gfx@lists.freedesktop.org
>> Subject: Re: [PATCH 3/6] drm/amdgpu: allocate VM PDs/PTs on demand
>>
>> Yeah, same problem here.
>>
>> I removed libhsakmt package and installed it manually and now it seems to
>> work.
>>
>> Doing some testing now, but at least of hand I can't seem to reproduce the
>> VM fault on a Vega10.
>>
>> Christian.
>>
>> Am 12.03.19 um 16:01 schrieb Russell, Kent:
>>> Oh right, I remember that issue. I had that happen to me once, where my
>> installed libhsakmt didn't match up with the latest source code, so I ended 
>> up
>> having to remove the libhsakmt package and pointing it to the folders
>> instead.
>>>
>>>Kent
>>>
>>>> -----Original Message-----
>>>> From: Koenig, Christian
>>>> Sent: Tuesday, March 12, 2019 10:49 AM
>>>> To: Russell, Kent ; Kuehling, Felix
>>>> ; amd-gfx@lists.freedesktop.org
>>>> Subject: Re: [PATCH 3/6] drm/amdgpu: allocate VM PDs/PTs on demand
>>>>
>>>> Yeah, the problem is I do have the libhsakmt installed.
>>>>
>>>> Going to give it a try to specify the directory directly.
>>>>
>>>> Christian.
>>>>
>>>> Am 12.03.19 um 15:47 schrieb Russell, Kent:
>>>>> The README.txt file inside the tests/kfdtest folder has instructions
>>>>> on how
>>>> to do it if you don't have the libhsakmt package installed on your system:
>>>>> export LIBHSAKMT_PATH=/*your local libhsakmt folder*/ With that, the
>>>>> headers and libraries are searched under LIBHSAKMT_PATH/include and
>>>>> LIBHSAKMT_PATH/lib respectively.
>>>>>
>>>>> So if you try export LIBHSAKMT_PATH as the root ROCT folder (the one
>>>> containing include, src, tests, etc), then that should cover it.
>>>>> Kent
>>>>>
>>>>>
>>>>>> -Original Message-
>>>>>> From: Christian König 
>>>>>> Sent: Tuesday, March 12, 2019 9:13 AM
>>>>>> To: Russell, Kent ; Kuehling, Felix
>>>>>> ; Koenig, Christian
>>>>>> ; amd-gfx@lists.freedesktop.org
>>>>>> Subject: Re: [PATCH 3/6] drm/amdgpu: allocate VM PDs/PTs on
>> demand
>>>>>>
>>>>>> Hi guys,
>>>>>>
>>>>>> so found a few minutes today to compile kfdtest.
>>>>>>
>>>>>> Problem is that during the compile I get a lots of this:
>>>>>>> CMakeFiles/kfdtest.dir/src/BaseQueue.cpp.o: In Funktion
>>>>>>> »BaseQueue::Create(unsigned int, unsigned int, unsigned long*)«:
>>>>>>> /usr/src/ROCT-Thunk-Interface/tests/kfdtest/src/BaseQueue.cpp:57:
>>>>>>> Warnung: undefinierter Verweis auf »hsaKmtCreateQueue«
>>>>>> Any idea?
>>>>>>
>>>>>> Christian.
>>>>>>
>>>>>> Am 11.03.19 um 17:55 schrieb Christian König:
>>>>>>> Hi guys,
>>>>>>>
>>>>>>> well it's most likely some missing handling in the KFD, so I'm
>>>>>>> rather reluctant to revert the change immediately.
>>>>>>>
>>>>>>> Problem is that I don't have time right now to look into it
>>>>>>> immediately. So Kent can you continue to take a look?
>>>>>>>
>>>>>>> Sounds like its crashing immediately, so it should be something
>> obvious.
>>>>>>>
>>>>>>> Christian.
>>>>>>>
>>>>>>> Am 11.03.19 um 10:49 schrieb Russell, Kent:
>>>>>>>> From what I've been able to dig through, the VM Fault seems to
>>>>>>>> occur right after a doorbell mmap, but that's as far as I got. I
>>>>>>>> can try to revert it in 

Re: [PATCH 3/6] drm/amdgpu: allocate VM PDs/PTs on demand

2019-03-12 Thread Kuehling, Felix
The root cause is that we don't wait after calling amdgpu_vm_clear_bo in 
amdgpu_vm_alloc_pts.

Waiting for the page table BOs to be idle for CPU page table updates is 
done in amdgpu_vm_bo_update_mapping. That is now *before* the page 
tables are actually allocated and cleared in amdgpu_vm_update_ptes.

We'll need to move the waiting for page tables to be idle into 
amdgpu_vm_alloc_pts or amdgpu_vm_update_ptes.

Regards,
   Felix
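
To make the ordering problem described above concrete, here is a minimal toy model in Python. It is only a sketch under stated assumptions and not driver code: ToyPageTable, schedule_clear, wait_idle, cpu_write_pte and map_page are hypothetical names invented for illustration. The model assumes the clear of a freshly allocated page table is queued asynchronously, so a CPU write that lands before the clear has run gets wiped when the clear finally executes.

```python
class ToyPageTable:
    """Hypothetical stand-in for a page table BO with one queued clear job."""

    def __init__(self, num_entries=8):
        # Freshly allocated memory holds stale data until it is cleared.
        self.entries = [0xDEADBEEF] * num_entries
        self.clear_pending = False

    def schedule_clear(self):
        # The clear is only queued here; it has not executed yet.
        self.clear_pending = True

    def wait_idle(self):
        # Waiting lets any queued clear run to completion.
        if self.clear_pending:
            self.entries = [0] * len(self.entries)
            self.clear_pending = False

    def cpu_write_pte(self, index, value):
        # A CPU update writes the entry directly and ignores queued jobs.
        self.entries[index] = value


def map_page(wait_after_alloc):
    """Return the entry that is visible once all queued work has run."""
    pt = ToyPageTable()
    pt.schedule_clear()              # allocate on demand and queue the clear
    if wait_after_alloc:
        pt.wait_idle()               # the wait moved to after the allocation
    pt.cpu_write_pte(1, 0x1000 | 0x1)
    pt.wait_idle()                   # the queued clear eventually executes
    return pt.entries[1]


print(hex(map_page(wait_after_alloc=False)))  # 0x0: the late clear wiped the entry
print(hex(map_page(wait_after_alloc=True)))   # 0x1001: the entry survives
```

In this model, waiting only before the allocation is equivalent to not waiting at all, which is why the wait has to happen after the page table has been allocated and its clear has been queued.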

On 2019-03-12 3:02 p.m., Felix Kuehling wrote:
> I find that it's related to CPU page table updates. If I force page 
> table updates with SDMA, I don't get the VM fault.
>
> Regards,
>   Felix
>
> On 2019-03-11 12:55 p.m., Christian König wrote:
>> Hi guys,
>>
>> well it's most likely some missing handling in the KFD, so I'm rather 
>> reluctant to revert the change immediately.
>>
>> Problem is that I don't have time right now to look into it 
>> immediately. So Kent can you continue to take a look?
>>
>> Sounds like its crashing immediately, so it should be something obvious.
>>
>> Christian.
>>
>> Am 11.03.19 um 10:49 schrieb Russell, Kent:
>>>  From what I've been able to dig through, the VM Fault seems to 
>>> occur right after a doorbell mmap, but that's as far as I got. I can 
>>> try to revert it in today's merge and see how things go.
>>>
>>>   Kent
>>>
>>>> -----Original Message-----
>>>> From: Kuehling, Felix
>>>> Sent: Friday, March 08, 2019 11:16 PM
>>>> To: Koenig, Christian ; Russell, Kent
>>>> ; amd-gfx@lists.freedesktop.org
>>>> Subject: RE: [PATCH 3/6] drm/amdgpu: allocate VM PDs/PTs on demand
>>>>
>>>> My concerns were related to eviction fence handing. It would 
>>>> manifest by
>>>> unnecessary eviction callbacks into KFD that aren't cause by real 
>>>> evictions. I
>>>> addressed that with a previous patch series that removed the need to
>>>> remove eviction fences and add them back around page table updates in
>>>> amdgpu_amdkfd_gpuvm.c.
>>>>
>>>> I don't know what's going on here. I can probably take a look on 
>>>> Monday. I
>>>> haven't considered what changed with respect to PD updates.
>>>>
>>>> Kent, can we temporarily revert the offending change in 
>>>> amd-kfd-staging
>>>> just to unblock the merge?
>>>>
>>>> Christian, I think KFD is currently broken on amd-staging-drm-next. 
>>>> If we're
>>>> serious about supporting KFD upstream, you may also want to consider
>>>> reverting your change there for now. Also consider building the 
>>>> Thunk and
>>>> kfdtest so you can do quick smoke tests locally whenever you make
>>>> amdgpu_vm changes that can affect KFD.
>>>> https://github.com/RadeonOpenCompute/ROCT-Thunk-Interface
>>>>
>>>> Regards,
>>>>    Felix
>>>>
>>>> -Original Message-
>>>> From: amd-gfx  On Behalf Of
>>>> Christian König
>>>> Sent: Friday, March 08, 2019 9:14 AM
>>>> To: Russell, Kent ; 
>>>> amd-gfx@lists.freedesktop.org
>>>> Subject: Re: [PATCH 3/6] drm/amdgpu: allocate VM PDs/PTs on demand
>>>>
>>>> My best guess is that we forget somewhere to update the PDs. What
>>>> hardware is that on?
>>>>
>>>> Felix already mentioned that this could be problematic for the KFD.
>>>>
>>>> Maybe he has an idea,
>>>> Christian.
>>>>
>>>> Am 08.03.19 um 15:04 schrieb Russell, Kent:
>>>>> Hi Christian,
>>>>>
>>>>> This patch ended up causing a VM Fault in KFDTest. Reverting just 
>>>>> this
>>>> patch addressed the issue:
>>>>> [   82.703503] amdgpu :0c:00.0: GPU fault detected: 146 
>>>>> 0x480c for
>>>> process  pid 0 thread  pid 0
>>>>> [   82.703512] amdgpu :0c:00.0:
>>>> VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x1000
>>>>> [   82.703516] amdgpu :0c:00.0:
>>>> VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x1004800C
>>>>> [   82.703522] amdgpu :0c:00.0: VM fault (0x0c, vmid 8, pasid 
>>>>> 32769) at
>>>> page 4096, read from 'TC0' (0x54433000) (72)
>>>>> [   82.703585] Evicting PASID 32769 queues
>>>>>
>>>>> I am look

Re: [PATCH 3/6] drm/amdgpu: allocate VM PDs/PTs on demand

2019-03-12 Thread Kuehling, Felix
I find that it's related to CPU page table updates. If I force page 
table updates with SDMA, I don't get the VM fault.

Regards,
   Felix

On 2019-03-11 12:55 p.m., Christian König wrote:
> Hi guys,
>
> well it's most likely some missing handling in the KFD, so I'm rather 
> reluctant to revert the change immediately.
>
> Problem is that I don't have time right now to look into it 
> immediately. So Kent can you continue to take a look?
>
> Sounds like its crashing immediately, so it should be something obvious.
>
> Christian.
>
> Am 11.03.19 um 10:49 schrieb Russell, Kent:
>>  From what I've been able to dig through, the VM Fault seems to occur 
>> right after a doorbell mmap, but that's as far as I got. I can try to 
>> revert it in today's merge and see how things go.
>>
>>   Kent
>>
>>> -Original Message-
>>> From: Kuehling, Felix
>>> Sent: Friday, March 08, 2019 11:16 PM
>>> To: Koenig, Christian ; Russell, Kent
>>> ; amd-gfx@lists.freedesktop.org
>>> Subject: RE: [PATCH 3/6] drm/amdgpu: allocate VM PDs/PTs on demand
>>>
>>> My concerns were related to eviction fence handing. It would 
>>> manifest by
>>> unnecessary eviction callbacks into KFD that aren't cause by real 
>>> evictions. I
>>> addressed that with a previous patch series that removed the need to
>>> remove eviction fences and add them back around page table updates in
>>> amdgpu_amdkfd_gpuvm.c.
>>>
>>> I don't know what's going on here. I can probably take a look on 
>>> Monday. I
>>> haven't considered what changed with respect to PD updates.
>>>
>>> Kent, can we temporarily revert the offending change in amd-kfd-staging
>>> just to unblock the merge?
>>>
>>> Christian, I think KFD is currently broken on amd-staging-drm-next. 
>>> If we're
>>> serious about supporting KFD upstream, you may also want to consider
>>> reverting your change there for now. Also consider building the 
>>> Thunk and
>>> kfdtest so you can do quick smoke tests locally whenever you make
>>> amdgpu_vm changes that can affect KFD.
>>> https://github.com/RadeonOpenCompute/ROCT-Thunk-Interface
>>>
>>> Regards,
>>>    Felix
>>>
>>> -Original Message-
>>> From: amd-gfx  On Behalf Of
>>> Christian König
>>> Sent: Friday, March 08, 2019 9:14 AM
>>> To: Russell, Kent ; amd-gfx@lists.freedesktop.org
>>> Subject: Re: [PATCH 3/6] drm/amdgpu: allocate VM PDs/PTs on demand
>>>
>>> My best guess is that we forget somewhere to update the PDs. What
>>> hardware is that on?
>>>
>>> Felix already mentioned that this could be problematic for the KFD.
>>>
>>> Maybe he has an idea,
>>> Christian.
>>>
>>> Am 08.03.19 um 15:04 schrieb Russell, Kent:
>>>> Hi Christian,
>>>>
>>>> This patch ended up causing a VM Fault in KFDTest. Reverting just this
>>> patch addressed the issue:
>>>> [   82.703503] amdgpu :0c:00.0: GPU fault detected: 146 
>>>> 0x480c for
>>> process  pid 0 thread  pid 0
>>>> [   82.703512] amdgpu :0c:00.0:
>>> VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x1000
>>>> [   82.703516] amdgpu :0c:00.0:
>>> VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x1004800C
>>>> [   82.703522] amdgpu :0c:00.0: VM fault (0x0c, vmid 8, pasid 
>>>> 32769) at
>>> page 4096, read from 'TC0' (0x54433000) (72)
>>>> [   82.703585] Evicting PASID 32769 queues
>>>>
>>>> I am looking into it, but if you have any insight that would be 
>>>> great in
>>> helping to resolve it quickly.
>>>>    Kent
>>>>> -Original Message-
>>>>> From: amd-gfx  On Behalf Of
>>>>> Christian König
>>>>> Sent: Tuesday, February 26, 2019 7:47 AM
>>>>> To: amd-gfx@lists.freedesktop.org
>>>>> Subject: [PATCH 3/6] drm/amdgpu: allocate VM PDs/PTs on demand
>>>>>
>>>>> Let's start to allocate VM PDs/PTs on demand instead of
>>>>> pre-allocating them during mapping.
>>>>>
>>>>> Signed-off-by: Christian König 
>>>>> Reviewed-by: Felix Kuehling 
>>>>> ---
>>>>>    .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  |  10 +-
>>>>>    drivers/gpu/drm/amd/am

RE: [PATCH 3/6] drm/amdgpu: allocate VM PDs/PTs on demand

2019-03-12 Thread Russell, Kent
Peculiar, I hit it immediately when I ran it. Can you try using
--gtest_filter=KFDCWSRTest.BasicTest? That one hung every time for me.

 Kent

> -Original Message-
> From: Christian König 
> Sent: Tuesday, March 12, 2019 11:09 AM
> To: Russell, Kent ; Koenig, Christian
> ; Kuehling, Felix ;
> amd-gfx@lists.freedesktop.org
> Subject: Re: [PATCH 3/6] drm/amdgpu: allocate VM PDs/PTs on demand
> 
> Yeah, same problem here.
> 
> I removed libhsakmt package and installed it manually and now it seems to
> work.
> 
> Doing some testing now, but at least of hand I can't seem to reproduce the
> VM fault on a Vega10.
> 
> Christian.
> 
> Am 12.03.19 um 16:01 schrieb Russell, Kent:
> > Oh right, I remember that issue. I had that happen to me once, where my
> installed libhsakmt didn't match up with the latest source code, so I ended up
> having to remove the libhsakmt package and pointing it to the folders
> instead.
> >
> >   Kent
> >
> >> -Original Message-
> >> From: Koenig, Christian
> >> Sent: Tuesday, March 12, 2019 10:49 AM
> >> To: Russell, Kent ; Kuehling, Felix
> >> ; amd-gfx@lists.freedesktop.org
> >> Subject: Re: [PATCH 3/6] drm/amdgpu: allocate VM PDs/PTs on demand
> >>
> >> Yeah, the problem is I do have the libhsakmt installed.
> >>
> >> Going to give it a try to specify the directory directly.
> >>
> >> Christian.
> >>
> >> Am 12.03.19 um 15:47 schrieb Russell, Kent:
> >>> The README.txt file inside the tests/kfdtest folder has instructions
> >>> on how
> >> to do it if you don't have the libhsakmt package installed on your system:
> >>> export LIBHSAKMT_PATH=/*your local libhsakmt folder*/ With that, the
> >>> headers and libraries are searched under LIBHSAKMT_PATH/include and
> >>> LIBHSAKMT_PATH/lib respectively.
> >>>
> >>> So if you try export LIBHSAKMT_PATH as the root ROCT folder (the one
> >> containing include, src, tests, etc), then that should cover it.
> >>>Kent
> >>>
> >>>
> >>>> -Original Message-
> >>>> From: Christian König 
> >>>> Sent: Tuesday, March 12, 2019 9:13 AM
> >>>> To: Russell, Kent ; Kuehling, Felix
> >>>> ; Koenig, Christian
> >>>> ; amd-gfx@lists.freedesktop.org
> >>>> Subject: Re: [PATCH 3/6] drm/amdgpu: allocate VM PDs/PTs on
> demand
> >>>>
> >>>> Hi guys,
> >>>>
> >>>> so found a few minutes today to compile kfdtest.
> >>>>
> >>>> Problem is that during the compile I get a lots of this:
> >>>>> CMakeFiles/kfdtest.dir/src/BaseQueue.cpp.o: In Funktion
> >>>>> »BaseQueue::Create(unsigned int, unsigned int, unsigned long*)«:
> >>>>> /usr/src/ROCT-Thunk-Interface/tests/kfdtest/src/BaseQueue.cpp:57:
> >>>>> Warnung: undefinierter Verweis auf »hsaKmtCreateQueue«
> >>>> Any idea?
> >>>>
> >>>> Christian.
> >>>>
> >>>> Am 11.03.19 um 17:55 schrieb Christian König:
> >>>>> Hi guys,
> >>>>>
> >>>>> well it's most likely some missing handling in the KFD, so I'm
> >>>>> rather reluctant to revert the change immediately.
> >>>>>
> >>>>> Problem is that I don't have time right now to look into it
> >>>>> immediately. So Kent can you continue to take a look?
> >>>>>
> >>>>> Sounds like its crashing immediately, so it should be something
> obvious.
> >>>>>
> >>>>> Christian.
> >>>>>
> >>>>> Am 11.03.19 um 10:49 schrieb Russell, Kent:
> >>>>>>    From what I've been able to dig through, the VM Fault seems to
> >>>>>> occur right after a doorbell mmap, but that's as far as I got. I
> >>>>>> can try to revert it in today's merge and see how things go.
> >>>>>>
> >>>>>>     Kent
> >>>>>>
> >>>>>>> -Original Message-
> >>>>>>> From: Kuehling, Felix
> >>>>>>> Sent: Friday, March 08, 2019 11:16 PM
> >>>>>>> To: Koenig, Christian ; Russell, Kent
> >>>>>>> ; amd-gfx@lists.freedesktop.org
> >>>>>>> Subje

Re: [PATCH 3/6] drm/amdgpu: allocate VM PDs/PTs on demand

2019-03-12 Thread Christian König

Yeah, same problem here.

I removed the libhsakmt package and installed it manually and now it seems
to work.


Doing some testing now, but at least offhand I can't seem to reproduce
the VM fault on a Vega10.


Christian.

On 12.03.19 at 16:01, Russell, Kent wrote:

Oh right, I remember that issue. I had that happen to me once, where my
installed libhsakmt didn't match up with the latest source code, so I ended up
having to remove the libhsakmt package and point it to the folders instead.

  Kent


-Original Message-
From: Koenig, Christian
Sent: Tuesday, March 12, 2019 10:49 AM
To: Russell, Kent ; Kuehling, Felix
; amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH 3/6] drm/amdgpu: allocate VM PDs/PTs on demand

Yeah, the problem is I do have the libhsakmt installed.

Going to give it a try to specify the directory directly.

Christian.

On 12.03.19 at 15:47, Russell, Kent wrote:

The README.txt file inside the tests/kfdtest folder has instructions on how
to do it if you don't have the libhsakmt package installed on your system:

export LIBHSAKMT_PATH=/*your local libhsakmt folder*/

With that, the headers and libraries are searched under
LIBHSAKMT_PATH/include and LIBHSAKMT_PATH/lib respectively.

So if you try exporting LIBHSAKMT_PATH as the root ROCT folder (the one
containing include, src, tests, etc.), then that should cover it.

   Kent



-Original Message-
From: Christian König 
Sent: Tuesday, March 12, 2019 9:13 AM
To: Russell, Kent ; Kuehling, Felix
; Koenig, Christian
; amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH 3/6] drm/amdgpu: allocate VM PDs/PTs on demand

Hi guys,

so found a few minutes today to compile kfdtest.

Problem is that during the compile I get a lot of this:

CMakeFiles/kfdtest.dir/src/BaseQueue.cpp.o: In function
'BaseQueue::Create(unsigned int, unsigned int, unsigned long*)':
/usr/src/ROCT-Thunk-Interface/tests/kfdtest/src/BaseQueue.cpp:57:
warning: undefined reference to 'hsaKmtCreateQueue'

Any idea?

Christian.

On 11.03.19 at 17:55, Christian König wrote:

Hi guys,

well it's most likely some missing handling in the KFD, so I'm
rather reluctant to revert the change immediately.

Problem is that I don't have time right now to look into it
immediately. So Kent can you continue to take a look?

Sounds like it's crashing immediately, so it should be something obvious.

Christian.

On 11.03.19 at 10:49, Russell, Kent wrote:

   From what I've been able to dig through, the VM Fault seems to
occur right after a doorbell mmap, but that's as far as I got. I
can try to revert it in today's merge and see how things go.

    Kent


-----Original Message-----
From: Kuehling, Felix
Sent: Friday, March 08, 2019 11:16 PM
To: Koenig, Christian ; Russell, Kent
; amd-gfx@lists.freedesktop.org
Subject: RE: [PATCH 3/6] drm/amdgpu: allocate VM PDs/PTs on demand

My concerns were related to eviction fence handling. It would
manifest as unnecessary eviction callbacks into KFD that aren't
caused by real evictions. I addressed that with a previous patch
series that removed the need to remove eviction fences and add
them back around page table updates in amdgpu_amdkfd_gpuvm.c.

I don't know what's going on here. I can probably take a look on
Monday. I haven't considered what changed with respect to PD
updates.

Kent, can we temporarily revert the offending change in
amd-kfd-staging just to unblock the merge?

Christian, I think KFD is currently broken on amd-staging-drm-next.
If we're serious about supporting KFD upstream, you may also want to
consider reverting your change there for now. Also consider
building the Thunk and kfdtest so you can do quick smoke tests
locally whenever you make amdgpu_vm changes that can affect KFD.
https://github.com/RadeonOpenCompute/ROCT-Thunk-Interface

Regards,
     Felix

-----Original Message-----
From: amd-gfx On Behalf Of Christian König
Sent: Friday, March 08, 2019 9:14 AM
To: Russell, Kent ;
amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH 3/6] drm/amdgpu: allocate VM PDs/PTs on demand

My best guess is that we forget somewhere to update the PDs. What
hardware is that on?

Felix already mentioned that this could be problematic for the KFD.

Maybe he has an idea,
Christian.

On 08.03.19 at 15:04, Russell, Kent wrote:

Hi Christian,

This patch ended up causing a VM Fault in KFDTest. Reverting just
this patch addressed the issue:

[   82.703503] amdgpu :0c:00.0: GPU fault detected: 146 0x480c for process  pid 0 thread  pid 0
[   82.703512] amdgpu :0c:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x1000
[   82.703516] amdgpu :0c:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x1004800C
[   82.703522] amdgpu :0c:00.0: VM fault (0x0c, vmid 8, pasid 32769) at page 4096, read from 'TC0' (0x54433000) (72)
[   82.703585] Evicting PASID 32769 queues
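
As a reading aid for the fault log above, the following Python sketch shows how the raw status word relates to the decoded values in the "VM fault" line. The bit positions are assumptions, chosen because they reproduce the numbers printed in this log; they are not taken from the driver source and may differ on other hardware generations. The function names decode_fault_status and decode_client_name are hypothetical.

```python
def decode_fault_status(status):
    """Split a fault status word using assumed bit positions."""
    return {
        "protections": status & 0xFF,            # assumed bits 7:0
        "client_id": (status >> 12) & 0xFF,      # assumed bits 19:12
        "is_write": bool((status >> 24) & 0x1),  # assumed bit 24
        "vmid": (status >> 25) & 0xF,            # assumed bits 28:25
    }


def decode_client_name(mc_client):
    """Read the memory client word as big-endian ASCII, dropping trailing NULs."""
    return mc_client.to_bytes(4, "big").rstrip(b"\x00").decode("ascii")


fields = decode_fault_status(0x1004800C)
print(hex(fields["protections"]))      # 0xc
print(fields["vmid"])                  # 8
print(fields["client_id"])             # 72
print(fields["is_write"])              # False, i.e. a read
print(decode_client_name(0x54433000))  # TC0
print(0x1000)                          # 4096, the faulting page from the ADDR register
```

Under these assumptions the decoded fields line up with the summary line: protection bits 0x0c, vmid 8, a read access, and memory client 'TC0' with id 72.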

I am looking into it, but if you have any insight that would be
great in helping

RE: [PATCH 3/6] drm/amdgpu: allocate VM PDs/PTs on demand

2019-03-12 Thread Russell, Kent
Oh right, I remember that issue. I had that happen to me once, where my
installed libhsakmt didn't match up with the latest source code, so I ended up
having to remove the libhsakmt package and point it to the folders instead.

 Kent

> -Original Message-
> From: Koenig, Christian
> Sent: Tuesday, March 12, 2019 10:49 AM
> To: Russell, Kent ; Kuehling, Felix
> ; amd-gfx@lists.freedesktop.org
> Subject: Re: [PATCH 3/6] drm/amdgpu: allocate VM PDs/PTs on demand
> 
> Yeah, the problem is I do have the libhsakmt installed.
> 
> Going to give it a try to specify the directory directly.
> 
> Christian.
> 
> On 12.03.19 at 15:47, Russell, Kent wrote:
> > The README.txt file inside the tests/kfdtest folder has instructions on
> > how to do it if you don't have the libhsakmt package installed on your
> > system:
> >
> > export LIBHSAKMT_PATH=/*your local libhsakmt folder*/
> > With that, the headers and libraries are searched under
> > LIBHSAKMT_PATH/include and LIBHSAKMT_PATH/lib respectively.
> >
> > So if you try export LIBHSAKMT_PATH as the root ROCT folder (the one
> > containing include, src, tests, etc), then that should cover it.
> >
> >   Kent
> >
> >
> >> -Original Message-
> >> From: Christian König 
> >> Sent: Tuesday, March 12, 2019 9:13 AM
> >> To: Russell, Kent ; Kuehling, Felix
> >> ; Koenig, Christian
> >> ; amd-gfx@lists.freedesktop.org
> >> Subject: Re: [PATCH 3/6] drm/amdgpu: allocate VM PDs/PTs on demand
> >>
> >> Hi guys,
> >>
> >> so found a few minutes today to compile kfdtest.
> >>
> >> Problem is that during the compile I get a lot of this:
> >>> CMakeFiles/kfdtest.dir/src/BaseQueue.cpp.o: In function
> >>> 'BaseQueue::Create(unsigned int, unsigned int, unsigned long*)':
> >>> /usr/src/ROCT-Thunk-Interface/tests/kfdtest/src/BaseQueue.cpp:57:
> >>> warning: undefined reference to 'hsaKmtCreateQueue'
> >> Any idea?
> >>
> >> Christian.
> >>
> >> On 11.03.19 at 17:55, Christian König wrote:
> >>> Hi guys,
> >>>
> >>> well it's most likely some missing handling in the KFD, so I'm
> >>> rather reluctant to revert the change immediately.
> >>>
> >>> Problem is that I don't have time right now to look into it
> >>> immediately. So Kent can you continue to take a look?
> >>>
> >>> Sounds like it's crashing immediately, so it should be something obvious.
> >>>
> >>> Christian.
> >>>
> >>> On 11.03.19 at 10:49, Russell, Kent wrote:
> >>>>   From what I've been able to dig through, the VM Fault seems to
> >>>> occur right after a doorbell mmap, but that's as far as I got. I
> >>>> can try to revert it in today's merge and see how things go.
> >>>>
> >>>>    Kent
> >>>>
> >>>>> -Original Message-
> >>>>> From: Kuehling, Felix
> >>>>> Sent: Friday, March 08, 2019 11:16 PM
> >>>>> To: Koenig, Christian ; Russell, Kent
> >>>>> ; amd-gfx@lists.freedesktop.org
> >>>>> Subject: RE: [PATCH 3/6] drm/amdgpu: allocate VM PDs/PTs on demand
> >>>>>
> >>>>> My concerns were related to eviction fence handling. It would
> >>>>> manifest as unnecessary eviction callbacks into KFD that aren't
> >>>>> caused by real evictions. I addressed that with a previous patch
> >>>>> series that removed the need to remove eviction fences and add
> >>>>> them back around page table updates in amdgpu_amdkfd_gpuvm.c.
> >>>>>
> >>>>> I don't know what's going on here. I can probably take a look on
> >>>>> Monday. I haven't considered what changed with respect to PD
> >>>>> updates.
> >>>>>
> >>>>> Kent, can we temporarily revert the offending change in
> >>>>> amd-kfd-staging just to unblock the merge?
> >>>>>
> >>>>> Christian, I think KFD is currently broken on amd-staging-drm-next.
> >>>>> If we're
> >>>>> serious about supporting KFD upstream, you may also want to
> >>>>> consider reverting your change there for now. Also consider
> >>>>> building the Thunk and kfdtest so you can do quick smoke tests
> >>>>> lo

Re: [PATCH 3/6] drm/amdgpu: allocate VM PDs/PTs on demand

2019-03-12 Thread Koenig, Christian
Yeah, the problem is I do have the libhsakmt installed.

Going to give it a try to specify the directory directly.

Christian.

On 12.03.19 at 15:47, Russell, Kent wrote:
> The README.txt file inside the tests/kfdtest folder has instructions on how 
> to do it if you don't have the libhsakmt package installed on your system:
>
> export LIBHSAKMT_PATH=/*your local libhsakmt folder*/
> With that, the headers and libraries are searched under
> LIBHSAKMT_PATH/include and LIBHSAKMT_PATH/lib respectively.
>
> So if you try export LIBHSAKMT_PATH as the root ROCT folder (the one 
> containing include, src, tests, etc), then that should cover it.
>
>   Kent
>
>
>> -Original Message-
>> From: Christian König 
>> Sent: Tuesday, March 12, 2019 9:13 AM
>> To: Russell, Kent ; Kuehling, Felix
>> ; Koenig, Christian
>> ; amd-gfx@lists.freedesktop.org
>> Subject: Re: [PATCH 3/6] drm/amdgpu: allocate VM PDs/PTs on demand
>>
>> Hi guys,
>>
>> so found a few minutes today to compile kfdtest.
>>
>> Problem is that during the compile I get a lot of this:
>>> CMakeFiles/kfdtest.dir/src/BaseQueue.cpp.o: In function
>>> 'BaseQueue::Create(unsigned int, unsigned int, unsigned long*)':
>>> /usr/src/ROCT-Thunk-Interface/tests/kfdtest/src/BaseQueue.cpp:57:
>>> warning: undefined reference to 'hsaKmtCreateQueue'
>> Any idea?
>>
>> Christian.
>>
>> On 11.03.19 at 17:55, Christian König wrote:
>>> Hi guys,
>>>
>>> well it's most likely some missing handling in the KFD, so I'm rather
>>> reluctant to revert the change immediately.
>>>
>>> Problem is that I don't have time right now to look into it
>>> immediately. So Kent can you continue to take a look?
>>>
>>> Sounds like it's crashing immediately, so it should be something obvious.
>>>
>>> Christian.
>>>
>>> On 11.03.19 at 10:49, Russell, Kent wrote:
>>>>   From what I've been able to dig through, the VM Fault seems to occur
>>>> right after a doorbell mmap, but that's as far as I got. I can try to
>>>> revert it in today's merge and see how things go.
>>>>
>>>>    Kent
>>>>
>>>>> -Original Message-
>>>>> From: Kuehling, Felix
>>>>> Sent: Friday, March 08, 2019 11:16 PM
>>>>> To: Koenig, Christian ; Russell, Kent
>>>>> ; amd-gfx@lists.freedesktop.org
>>>>> Subject: RE: [PATCH 3/6] drm/amdgpu: allocate VM PDs/PTs on demand
>>>>>
>>>>> My concerns were related to eviction fence handling. It would
>>>>> manifest as unnecessary eviction callbacks into KFD that aren't
>>>>> caused by real evictions. I addressed that with a previous patch
>>>>> series that removed the need to remove eviction fences and add them
>>>>> back around page table updates in amdgpu_amdkfd_gpuvm.c.
>>>>>
>>>>> I don't know what's going on here. I can probably take a look on
>>>>> Monday. I haven't considered what changed with respect to PD
>>>>> updates.
>>>>>
>>>>> Kent, can we temporarily revert the offending change in
>>>>> amd-kfd-staging just to unblock the merge?
>>>>>
>>>>> Christian, I think KFD is currently broken on amd-staging-drm-next.
>>>>> If we're
>>>>> serious about supporting KFD upstream, you may also want to consider
>>>>> reverting your change there for now. Also consider building the
>>>>> Thunk and kfdtest so you can do quick smoke tests locally whenever
>>>>> you make amdgpu_vm changes that can affect KFD.
>>>>> https://github.com/RadeonOpenCompute/ROCT-Thunk-Interface
>>>>>
>>>>> Regards,
>>>>>     Felix
>>>>>
>>>>> -Original Message-
>>>>> From: amd-gfx  On Behalf Of
>>>>> Christian König
>>>>> Sent: Friday, March 08, 2019 9:14 AM
>>>>> To: Russell, Kent ;
>>>>> amd-gfx@lists.freedesktop.org
>>>>> Subject: Re: [PATCH 3/6] drm/amdgpu: allocate VM PDs/PTs on demand
>>>>>
>>>>> My best guess is that we forget somewhere to update the PDs. What
>>>>> hardware is that on?
>>>>>
>>>>> Felix already mentioned that this could be problematic for the KFD.
>>>>>
>>>>> Maybe he has an 

RE: [PATCH 3/6] drm/amdgpu: allocate VM PDs/PTs on demand

2019-03-12 Thread Russell, Kent
The README.txt file inside the tests/kfdtest folder has instructions on how to 
do it if you don't have the libhsakmt package installed on your system:

export LIBHSAKMT_PATH=/*your local libhsakmt folder*/
With that, the headers and libraries are searched under
LIBHSAKMT_PATH/include and LIBHSAKMT_PATH/lib respectively.

So if you try export LIBHSAKMT_PATH as the root ROCT folder (the one containing 
include, src, tests, etc), then that should cover it.
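
For reference, a minimal sketch of the lookup rule described above: given a root folder taken from LIBHSAKMT_PATH, headers are expected under include and libraries under lib. This is not the kfdtest build logic itself, and the helper names kmt_subdir and kmt_paths_from_env are made up for illustration.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: write "<root>/<subdir>" into buf.
 * Returns 0 on success, -1 if root is missing or empty, or buf is too small. */
static int kmt_subdir(const char *root, const char *subdir,
		      char *buf, size_t len)
{
	int n;

	if (!root || strlen(root) == 0)
		return -1;
	n = snprintf(buf, len, "%s/%s", root, subdir);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Hypothetical helper: derive both search directories from the
 * LIBHSAKMT_PATH environment variable named in the README. */
static int kmt_paths_from_env(char *inc, size_t inc_len,
			      char *lib, size_t lib_len)
{
	const char *root = getenv("LIBHSAKMT_PATH");

	if (kmt_subdir(root, "include", inc, inc_len))
		return -1;
	return kmt_subdir(root, "lib", lib, lib_len);
}
```

With LIBHSAKMT_PATH pointing at the root ROCT folder, kmt_paths_from_env() would fill in that folder's include and lib subdirectories; if the variable is unset it returns -1, which is the case where an installed libhsakmt package would be used instead.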

 Kent


> -Original Message-
> From: Christian König 
> Sent: Tuesday, March 12, 2019 9:13 AM
> To: Russell, Kent ; Kuehling, Felix
> ; Koenig, Christian
> ; amd-gfx@lists.freedesktop.org
> Subject: Re: [PATCH 3/6] drm/amdgpu: allocate VM PDs/PTs on demand
> 
> Hi guys,
> 
> so found a few minutes today to compile kfdtest.
> 
> Problem is that during the compile I get a lot of this:
> > CMakeFiles/kfdtest.dir/src/BaseQueue.cpp.o: In function
> > 'BaseQueue::Create(unsigned int, unsigned int, unsigned long*)':
> > /usr/src/ROCT-Thunk-Interface/tests/kfdtest/src/BaseQueue.cpp:57:
> > warning: undefined reference to 'hsaKmtCreateQueue'
> 
> Any idea?
> 
> Christian.
> 
> On 11.03.19 at 17:55, Christian König wrote:
> > Hi guys,
> >
> > well it's most likely some missing handling in the KFD, so I'm rather
> > reluctant to revert the change immediately.
> >
> > Problem is that I don't have time right now to look into it
> > immediately. So Kent can you continue to take a look?
> >
> > Sounds like it's crashing immediately, so it should be something obvious.
> >
> > Christian.
> >
> > On 11.03.19 at 10:49, Russell, Kent wrote:
> >>  From what I've been able to dig through, the VM Fault seems to occur
> >> right after a doorbell mmap, but that's as far as I got. I can try to
> >> revert it in today's merge and see how things go.
> >>
> >>   Kent
> >>
> >>> -Original Message-
> >>> From: Kuehling, Felix
> >>> Sent: Friday, March 08, 2019 11:16 PM
> >>> To: Koenig, Christian ; Russell, Kent
> >>> ; amd-gfx@lists.freedesktop.org
> >>> Subject: RE: [PATCH 3/6] drm/amdgpu: allocate VM PDs/PTs on demand
> >>>
> >>> My concerns were related to eviction fence handling. It would
> >>> manifest as unnecessary eviction callbacks into KFD that aren't
> >>> caused by real evictions. I addressed that with a previous patch
> >>> series that removed the need to remove eviction fences and add them
> >>> back around page table updates in amdgpu_amdkfd_gpuvm.c.
> >>>
> >>> I don't know what's going on here. I can probably take a look on
> >>> Monday. I haven't considered what changed with respect to PD
> >>> updates.
> >>>
> >>> Kent, can we temporarily revert the offending change in
> >>> amd-kfd-staging just to unblock the merge?
> >>>
> >>> Christian, I think KFD is currently broken on amd-staging-drm-next.
> >>> If we're
> >>> serious about supporting KFD upstream, you may also want to consider
> >>> reverting your change there for now. Also consider building the
> >>> Thunk and kfdtest so you can do quick smoke tests locally whenever
> >>> you make amdgpu_vm changes that can affect KFD.
> >>> https://github.com/RadeonOpenCompute/ROCT-Thunk-Interface
> >>>
> >>> Regards,
> >>>    Felix
> >>>
> >>> -Original Message-
> >>> From: amd-gfx  On Behalf Of
> >>> Christian König
> >>> Sent: Friday, March 08, 2019 9:14 AM
> >>> To: Russell, Kent ;
> >>> amd-gfx@lists.freedesktop.org
> >>> Subject: Re: [PATCH 3/6] drm/amdgpu: allocate VM PDs/PTs on demand
> >>>
> >>> My best guess is that we forget somewhere to update the PDs. What
> >>> hardware is that on?
> >>>
> >>> Felix already mentioned that this could be problematic for the KFD.
> >>>
> >>> Maybe he has an idea,
> >>> Christian.
> >>>
> >>> On 08.03.19 at 15:04, Russell, Kent wrote:
> >>>> Hi Christian,
> >>>>
> >>>> This patch ended up causing a VM Fault in KFDTest. Reverting just
> >>>> this patch addressed the issue:
> >>>> [   82.703503] amdgpu :0c:00.0: GPU fault detected: 146
> >>>> 0x480c for process  pid 0 thread  pid 0
> >

Re: [PATCH 3/6] drm/amdgpu: allocate VM PDs/PTs on demand

2019-03-12 Thread Christian König

Hi guys,

so found a few minutes today to compile kfdtest.

Problem is that during the compile I get a lot of this:
CMakeFiles/kfdtest.dir/src/BaseQueue.cpp.o: In function 
'BaseQueue::Create(unsigned int, unsigned int, unsigned long*)':
/usr/src/ROCT-Thunk-Interface/tests/kfdtest/src/BaseQueue.cpp:57: 
warning: undefined reference to 'hsaKmtCreateQueue'


Any idea?

Christian.

On 11.03.19 at 17:55, Christian König wrote:

Hi guys,

well it's most likely some missing handling in the KFD, so I'm rather 
reluctant to revert the change immediately.

Problem is that I don't have time right now to look into it 
immediately. So Kent can you continue to take a look?

Sounds like it's crashing immediately, so it should be something obvious.

Christian.

On 11.03.19 at 10:49, Russell, Kent wrote:

From what I've been able to dig through, the VM Fault seems to occur 
right after a doorbell mmap, but that's as far as I got. I can try to 
revert it in today's merge and see how things go.


  Kent


-Original Message-
From: Kuehling, Felix
Sent: Friday, March 08, 2019 11:16 PM
To: Koenig, Christian ; Russell, Kent
; amd-gfx@lists.freedesktop.org
Subject: RE: [PATCH 3/6] drm/amdgpu: allocate VM PDs/PTs on demand

My concerns were related to eviction fence handling. It would manifest as
unnecessary eviction callbacks into KFD that aren't caused by real evictions.
I addressed that with a previous patch series that removed the need to
remove eviction fences and add them back around page table updates in
amdgpu_amdkfd_gpuvm.c.

I don't know what's going on here. I can probably take a look on Monday. I
haven't considered what changed with respect to PD updates.

Kent, can we temporarily revert the offending change in amd-kfd-staging
just to unblock the merge?

Christian, I think KFD is currently broken on amd-staging-drm-next. If we're
serious about supporting KFD upstream, you may also want to consider
reverting your change there for now. Also consider building the Thunk and
kfdtest so you can do quick smoke tests locally whenever you make
amdgpu_vm changes that can affect KFD.
https://github.com/RadeonOpenCompute/ROCT-Thunk-Interface

Regards,
   Felix

-Original Message-
From: amd-gfx  On Behalf Of
Christian König
Sent: Friday, March 08, 2019 9:14 AM
To: Russell, Kent ; amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH 3/6] drm/amdgpu: allocate VM PDs/PTs on demand

My best guess is that we forget somewhere to update the PDs. What
hardware is that on?

Felix already mentioned that this could be problematic for the KFD.

Maybe he has an idea,
Christian.

On 08.03.19 at 15:04, Russell, Kent wrote:

Hi Christian,

This patch ended up causing a VM Fault in KFDTest. Reverting just this
patch addressed the issue:

[   82.703503] amdgpu :0c:00.0: GPU fault detected: 146 0x480c for process  pid 0 thread  pid 0
[   82.703512] amdgpu :0c:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x1000
[   82.703516] amdgpu :0c:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x1004800C
[   82.703522] amdgpu :0c:00.0: VM fault (0x0c, vmid 8, pasid 32769) at page 4096, read from 'TC0' (0x54433000) (72)
[   82.703585] Evicting PASID 32769 queues

I am looking into it, but if you have any insight that would be great in
helping to resolve it quickly.

   Kent

-Original Message-
From: amd-gfx  On Behalf Of
Christian König
Sent: Tuesday, February 26, 2019 7:47 AM
To: amd-gfx@lists.freedesktop.org
Subject: [PATCH 3/6] drm/amdgpu: allocate VM PDs/PTs on demand

Let's start to allocate VM PDs/PTs on demand instead of
pre-allocating them during mapping.

Signed-off-by: Christian König 
Reviewed-by: Felix Kuehling 
---
   .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  |  10 +-
   drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c   |   9 --
   drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c   |  10 --
   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c    | 136 +-
   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h    |   3 -
   5 files changed, 39 insertions(+), 129 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index 31e3953dcb6e..088e9b6b765b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -410,15 +410,7 @@ static int add_bo_to_vm(struct amdgpu_device *adev, struct kgd_mem *mem,
 	if (p_bo_va_entry)
 		*p_bo_va_entry = bo_va_entry;
 
-	/* Allocate new page tables if needed and validate
-	 * them.
-	 */
-	ret = amdgpu_vm_alloc_pts(adev, vm, va, amdgpu_bo_size(bo));
-	if (ret) {
-		pr_err("Failed to allocate pts, err=%d\n", ret);
-		goto err_alloc_pts;
-	}
-
+	/* Allocate validate page tables if needed */
 	ret = vm_validate_pt_pd_bos(vm);
 	if (ret) {
   pr_err("validate_pt_pd_bos() failed\n

Re: [PATCH 3/6] drm/amdgpu: allocate VM PDs/PTs on demand

2019-03-11 Thread Christian König

Hi guys,

well it's most likely some missing handling in the KFD, so I'm rather 
reluctant to revert the change immediately.

Problem is that I don't have time right now to look into it immediately. 
So Kent can you continue to take a look?

Sounds like it's crashing immediately, so it should be something obvious.

Christian.

On 11.03.19 at 10:49, Russell, Kent wrote:

From what I've been able to dig through, the VM Fault seems to occur right 
after a doorbell mmap, but that's as far as I got. I can try to revert it in 
today's merge and see how things go.

  Kent


-Original Message-
From: Kuehling, Felix
Sent: Friday, March 08, 2019 11:16 PM
To: Koenig, Christian ; Russell, Kent
; amd-gfx@lists.freedesktop.org
Subject: RE: [PATCH 3/6] drm/amdgpu: allocate VM PDs/PTs on demand

My concerns were related to eviction fence handling. It would manifest as
unnecessary eviction callbacks into KFD that aren't caused by real evictions. I
addressed that with a previous patch series that removed the need to
remove eviction fences and add them back around page table updates in
amdgpu_amdkfd_gpuvm.c.

I don't know what's going on here. I can probably take a look on Monday. I
haven't considered what changed with respect to PD updates.

Kent, can we temporarily revert the offending change in amd-kfd-staging
just to unblock the merge?

Christian, I think KFD is currently broken on amd-staging-drm-next. If we're
serious about supporting KFD upstream, you may also want to consider
reverting your change there for now. Also consider building the Thunk and
kfdtest so you can do quick smoke tests locally whenever you make
amdgpu_vm changes that can affect KFD.
https://github.com/RadeonOpenCompute/ROCT-Thunk-Interface

Regards,
   Felix

-Original Message-
From: amd-gfx  On Behalf Of
Christian König
Sent: Friday, March 08, 2019 9:14 AM
To: Russell, Kent ; amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH 3/6] drm/amdgpu: allocate VM PDs/PTs on demand

My best guess is that we forget somewhere to update the PDs. What
hardware is that on?

Felix already mentioned that this could be problematic for the KFD.

Maybe he has an idea,
Christian.

On 08.03.19 at 15:04, Russell, Kent wrote:

Hi Christian,

This patch ended up causing a VM Fault in KFDTest. Reverting just this
patch addressed the issue:

[   82.703503] amdgpu :0c:00.0: GPU fault detected: 146 0x480c for process  pid 0 thread  pid 0
[   82.703512] amdgpu :0c:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x1000
[   82.703516] amdgpu :0c:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x1004800C
[   82.703522] amdgpu :0c:00.0: VM fault (0x0c, vmid 8, pasid 32769) at page 4096, read from 'TC0' (0x54433000) (72)
[   82.703585] Evicting PASID 32769 queues
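
As a cross-check of the decoded values in the log above, here is a small sketch that unpacks the raw status word. The bit positions are an assumption based on the GMC v8 (Fiji) register layout; they are not taken from this thread, although they do reproduce the vmid, protection bits, client id and read/write flag printed by the kernel. All names below are made up for the example.

```c
#include <stdint.h>

/* Assumed field layout of VM_CONTEXT1_PROTECTION_FAULT_STATUS (GMC v8).
 * These masks are an assumption for illustration; check the register
 * headers of the ASIC in question before relying on them. */
#define FAULT_PROTECTIONS_MASK	0x000000ffu
#define FAULT_CLIENT_ID_MASK	0x000ff000u
#define FAULT_CLIENT_ID_SHIFT	12
#define FAULT_CLIENT_RW_MASK	0x01000000u
#define FAULT_CLIENT_RW_SHIFT	24
#define FAULT_VMID_MASK		0x1e000000u
#define FAULT_VMID_SHIFT	25

struct vm_fault_info {
	uint32_t protections;	/* protection fault bits */
	uint32_t client_id;	/* memory client id */
	uint32_t is_write;	/* 0 = read, 1 = write */
	uint32_t vmid;		/* VMID that faulted */
};

/* Split the raw status word into its fields. */
static struct vm_fault_info decode_fault_status(uint32_t status)
{
	struct vm_fault_info info;

	info.protections = status & FAULT_PROTECTIONS_MASK;
	info.client_id = (status & FAULT_CLIENT_ID_MASK) >> FAULT_CLIENT_ID_SHIFT;
	info.is_write = (status & FAULT_CLIENT_RW_MASK) >> FAULT_CLIENT_RW_SHIFT;
	info.vmid = (status & FAULT_VMID_MASK) >> FAULT_VMID_SHIFT;
	return info;
}

/* Unpack a four-character client tag such as 0x54433000 into a string. */
static void decode_client_tag(uint32_t tag, char name[5])
{
	name[0] = (char)((tag >> 24) & 0xff);
	name[1] = (char)((tag >> 16) & 0xff);
	name[2] = (char)((tag >> 8) & 0xff);
	name[3] = (char)(tag & 0xff);
	name[4] = '\0';
}
```

Fed the status word 0x1004800C from the log, decode_fault_status() gives protections 0x0c, client id 72, a read access and vmid 8, and decode_client_tag(0x54433000, ...) gives "TC0", which matches the "VM fault" line.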

I am looking into it, but if you have any insight that would be great in
helping to resolve it quickly.

   Kent

-Original Message-
From: amd-gfx  On Behalf Of
Christian König
Sent: Tuesday, February 26, 2019 7:47 AM
To: amd-gfx@lists.freedesktop.org
Subject: [PATCH 3/6] drm/amdgpu: allocate VM PDs/PTs on demand

Let's start to allocate VM PDs/PTs on demand instead of
pre-allocating them during mapping.

Signed-off-by: Christian König 
Reviewed-by: Felix Kuehling 
---
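
To illustrate the idea in the commit message, here is a toy two-level table that allocates a page table only when the first entry in its range is written, rather than when the range is mapped. It is a user-space sketch with made-up names (toy_vm, toy_vm_update_pte) and sizes; it does not mirror the amdgpu_vm code or its locking, eviction or validation rules.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define PT_ENTRIES 512u	/* entries per page table (toy value) */
#define PD_ENTRIES 8u	/* entries in the single page directory (toy value) */

struct toy_vm {
	uint64_t *pts[PD_ENTRIES];	/* NULL until a PT is first needed */
	unsigned int allocated;		/* number of PTs allocated so far */
};

/* Start with an empty directory: nothing is pre-allocated. */
static void toy_vm_init(struct toy_vm *vm)
{
	memset(vm, 0, sizeof(*vm));
}

/* Write one PTE, allocating the covering page table on first use.
 * Returns 0 on success, -1 if pfn is out of range or allocation fails. */
static int toy_vm_update_pte(struct toy_vm *vm, uint64_t pfn, uint64_t value)
{
	uint64_t pd_idx = pfn / PT_ENTRIES;
	uint64_t pt_idx = pfn % PT_ENTRIES;

	if (pd_idx >= PD_ENTRIES)
		return -1;

	if (!vm->pts[pd_idx]) {
		vm->pts[pd_idx] = calloc(PT_ENTRIES, sizeof(uint64_t));
		if (!vm->pts[pd_idx])
			return -1;
		vm->allocated++;
	}

	vm->pts[pd_idx][pt_idx] = value;
	return 0;
}

/* Free every page table that was allocated on demand. */
static void toy_vm_fini(struct toy_vm *vm)
{
	unsigned int i;

	for (i = 0; i < PD_ENTRIES; i++) {
		free(vm->pts[i]);
		vm->pts[i] = NULL;
	}
	vm->allocated = 0;
}
```

In this toy model, two writes that land in the same 512-entry range share one allocation, and a range that is never written costs nothing. The flip side is that an allocation failure now shows up at update time instead of at map time.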
   .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  |  10 +-
   drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c   |   9 --
   drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c   |  10 --
   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c    | 136 +-
   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h    |   3 -
   5 files changed, 39 insertions(+), 129 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index 31e3953dcb6e..088e9b6b765b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -410,15 +410,7 @@ static int add_bo_to_vm(struct amdgpu_device *adev, struct kgd_mem *mem,
 	if (p_bo_va_entry)
 		*p_bo_va_entry = bo_va_entry;
 
-	/* Allocate new page tables if needed and validate
-	 * them.
-	 */
-	ret = amdgpu_vm_alloc_pts(adev, vm, va, amdgpu_bo_size(bo));
-	if (ret) {
-		pr_err("Failed to allocate pts, err=%d\n", ret);
-		goto err_alloc_pts;
-	}
-
+	/* Allocate validate page tables if needed */
 	ret = vm_validate_pt_pd_bos(vm);
 	if (ret) {
 		pr_err("validate_pt_pd_bos() failed\n");
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
index 7e22be7ca68a..54dd02a898b9 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
@@ -92,15 +92,6 @@ int amdgpu_map_static_csa(struct amdgpu_device *adev, struct amdgpu_vm *vm,
 		return -ENOMEM;
 	}
 
-   r = amdgpu_vm_allo

RE: [PATCH 3/6] drm/amdgpu: allocate VM PDs/PTs on demand

2019-03-11 Thread Russell, Kent
From what I've been able to dig through, the VM Fault seems to occur right 
after a doorbell mmap, but that's as far as I got. I can try to revert it in 
today's merge and see how things go.

 Kent

> -Original Message-
> From: Kuehling, Felix
> Sent: Friday, March 08, 2019 11:16 PM
> To: Koenig, Christian ; Russell, Kent
> ; amd-gfx@lists.freedesktop.org
> Subject: RE: [PATCH 3/6] drm/amdgpu: allocate VM PDs/PTs on demand
> 
> My concerns were related to eviction fence handling. It would manifest as
> unnecessary eviction callbacks into KFD that aren't caused by real evictions. I
> addressed that with a previous patch series that removed the need to
> remove eviction fences and add them back around page table updates in
> amdgpu_amdkfd_gpuvm.c.
> 
> I don't know what's going on here. I can probably take a look on Monday. I
> haven't considered what changed with respect to PD updates.
> 
> Kent, can we temporarily revert the offending change in amd-kfd-staging
> just to unblock the merge?
> 
> Christian, I think KFD is currently broken on amd-staging-drm-next. If we're
> serious about supporting KFD upstream, you may also want to consider
> reverting your change there for now. Also consider building the Thunk and
> kfdtest so you can do quick smoke tests locally whenever you make
> amdgpu_vm changes that can affect KFD.
> https://github.com/RadeonOpenCompute/ROCT-Thunk-Interface
> 
> Regards,
>   Felix
> 
> -Original Message-
> From: amd-gfx  On Behalf Of
> Christian König
> Sent: Friday, March 08, 2019 9:14 AM
> To: Russell, Kent ; amd-gfx@lists.freedesktop.org
> Subject: Re: [PATCH 3/6] drm/amdgpu: allocate VM PDs/PTs on demand
> 
> My best guess is that we forget somewhere to update the PDs. What
> hardware is that on?
> 
> Felix already mentioned that this could be problematic for the KFD.
> 
> Maybe he has an idea,
> Christian.
> 
> On 08.03.19 at 15:04, Russell, Kent wrote:
> > Hi Christian,
> >
> > This patch ended up causing a VM Fault in KFDTest. Reverting just this
> > patch addressed the issue:
> > [   82.703503] amdgpu :0c:00.0: GPU fault detected: 146 0x480c for process  pid 0 thread  pid 0
> > [   82.703512] amdgpu :0c:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x1000
> > [   82.703516] amdgpu :0c:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x1004800C
> > [   82.703522] amdgpu :0c:00.0: VM fault (0x0c, vmid 8, pasid 32769) at page 4096, read from 'TC0' (0x54433000) (72)
> > [   82.703585] Evicting PASID 32769 queues
> >
> > I am looking into it, but if you have any insight that would be great in
> > helping to resolve it quickly.
> >
> >   Kent
> >> -Original Message-
> >> From: amd-gfx  On Behalf Of
> >> Christian König
> >> Sent: Tuesday, February 26, 2019 7:47 AM
> >> To: amd-gfx@lists.freedesktop.org
> >> Subject: [PATCH 3/6] drm/amdgpu: allocate VM PDs/PTs on demand
> >>
> >> Let's start to allocate VM PDs/PTs on demand instead of
> >> pre-allocating them during mapping.
> >>
> >> Signed-off-by: Christian König 
> >> Reviewed-by: Felix Kuehling 
> >> ---
> >>   .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  |  10 +-
> >>   drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c   |   9 --
> >>   drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c   |  10 --
> >>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c| 136 +-
> >>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h|   3 -
> >>   5 files changed, 39 insertions(+), 129 deletions(-)
> >>
> >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> >> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> >> index 31e3953dcb6e..088e9b6b765b 100644
> >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> >> @@ -410,15 +410,7 @@ static int add_bo_to_vm(struct amdgpu_device
> >> *adev, struct kgd_mem *mem,
> >>if (p_bo_va_entry)
> >>*p_bo_va_entry = bo_va_entry;
> >>
> >> -  /* Allocate new page tables if needed and validate
> >> -   * them.
> >> -   */
> >> -  ret = amdgpu_vm_alloc_pts(adev, vm, va, amdgpu_bo_size(bo));
> >> -  if (ret) {
> >> -  pr_err("Failed to allocate pts, err=%d\n", ret);
> >> -  goto err_alloc_pts;
> >> -  }
> >> -
> >> +  /* Allocate validate page tables if needed */
> >>ret = vm_validate_pt_pd_bos(vm);
> 

RE: [PATCH 3/6] drm/amdgpu: allocate VM PDs/PTs on demand

2019-03-08 Thread Kuehling, Felix
My concerns were related to eviction fence handling. It would manifest as 
unnecessary eviction callbacks into KFD that aren't caused by real evictions. I 
addressed that with a previous patch series that removed the need to remove 
eviction fences and add them back around page table updates in 
amdgpu_amdkfd_gpuvm.c.

I don't know what's going on here. I can probably take a look on Monday. I 
haven't considered what changed with respect to PD updates.

Kent, can we temporarily revert the offending change in amd-kfd-staging just to 
unblock the merge?

Christian, I think KFD is currently broken on amd-staging-drm-next. If we're 
serious about supporting KFD upstream, you may also want to consider reverting 
your change there for now. Also consider building the Thunk and kfdtest so you 
can do quick smoke tests locally whenever you make amdgpu_vm changes that can 
affect KFD. https://github.com/RadeonOpenCompute/ROCT-Thunk-Interface

Regards,
  Felix

-Original Message-
From: amd-gfx  On Behalf Of Christian 
König
Sent: Friday, March 08, 2019 9:14 AM
To: Russell, Kent ; amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH 3/6] drm/amdgpu: allocate VM PDs/PTs on demand

My best guess is that we forget somewhere to update the PDs. What hardware is 
that on?

Felix already mentioned that this could be problematic for the KFD.

Maybe he has an idea,
Christian.

On 08.03.19 at 15:04, Russell, Kent wrote:
> Hi Christian,
>
> This patch ended up causing a VM Fault in KFDTest. Reverting just this patch 
> addressed the issue:
> [   82.703503] amdgpu :0c:00.0: GPU fault detected: 146 0x480c for 
> process  pid 0 thread  pid 0
> [   82.703512] amdgpu :0c:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   
> 0x1000
> [   82.703516] amdgpu :0c:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 
> 0x1004800C
> [   82.703522] amdgpu :0c:00.0: VM fault (0x0c, vmid 8, pasid 32769) at 
> page 4096, read from 'TC0' (0x54433000) (72)
> [   82.703585] Evicting PASID 32769 queues
>
> I am looking into it, but if you have any insight that would be great in 
> helping to resolve it quickly.
>
>   Kent
>> -Original Message-
>> From: amd-gfx  On Behalf Of 
>> Christian König
>> Sent: Tuesday, February 26, 2019 7:47 AM
>> To: amd-gfx@lists.freedesktop.org
>> Subject: [PATCH 3/6] drm/amdgpu: allocate VM PDs/PTs on demand
>>
>> Let's start to allocate VM PDs/PTs on demand instead of 
>> pre-allocating them during mapping.
>>
>> Signed-off-by: Christian König 
>> Reviewed-by: Felix Kuehling 
>> ---
>>   .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  |  10 +-
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c   |   9 --
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c   |  10 --
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c| 136 +-
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h|   3 -
>>   5 files changed, 39 insertions(+), 129 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>> index 31e3953dcb6e..088e9b6b765b 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>> @@ -410,15 +410,7 @@ static int add_bo_to_vm(struct amdgpu_device 
>> *adev, struct kgd_mem *mem,
>>  if (p_bo_va_entry)
>>  *p_bo_va_entry = bo_va_entry;
>>
>> -/* Allocate new page tables if needed and validate
>> - * them.
>> - */
>> -ret = amdgpu_vm_alloc_pts(adev, vm, va, amdgpu_bo_size(bo));
>> -if (ret) {
>> -pr_err("Failed to allocate pts, err=%d\n", ret);
>> -goto err_alloc_pts;
>> -}
>> -
>> +/* Allocate validate page tables if needed */
>>  ret = vm_validate_pt_pd_bos(vm);
>>  if (ret) {
>>  pr_err("validate_pt_pd_bos() failed\n");
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
>> index 7e22be7ca68a..54dd02a898b9 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
>> @@ -92,15 +92,6 @@ int amdgpu_map_static_csa(struct amdgpu_device 
>> *adev, struct amdgpu_vm *vm,
>>  return -ENOMEM;
>>  }
>>
>> -r = amdgpu_vm_alloc_pts(adev, (*bo_va)->base.vm, csa_addr,
>> -size);
>> -if (r) {
>> -DRM_ERROR("failed to allocate pts for static CSA, err=%d\n",
>> r);
>> -amdgpu_vm_bo_rmv(adev, *bo_va);
>> -ttm_eu_backoff_res

RE: [PATCH 3/6] drm/amdgpu: allocate VM PDs/PTs on demand

2019-03-08 Thread Russell, Kent
That dmesg was from Fiji, but it occurred on Vega10 as well. 

 Kent

> -Original Message-
> From: Christian König 
> Sent: Friday, March 08, 2019 9:14 AM
> To: Russell, Kent ; amd-gfx@lists.freedesktop.org
> Subject: Re: [PATCH 3/6] drm/amdgpu: allocate VM PDs/PTs on demand
> 
> My best guess is that we forget somewhere to update the PDs. What
> hardware is that on?
> 
> Felix already mentioned that this could be problematic for the KFD.
> 
> Maybe he has an idea,
> Christian.
> 
> On 08.03.19 at 15:04, Russell, Kent wrote:
> > Hi Christian,
> >
> > This patch ended up causing a VM Fault in KFDTest. Reverting just this
> > patch addressed the issue:
> > [   82.703503] amdgpu :0c:00.0: GPU fault detected: 146 0x480c for process  pid 0 thread  pid 0
> > [   82.703512] amdgpu :0c:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x1000
> > [   82.703516] amdgpu :0c:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x1004800C
> > [   82.703522] amdgpu :0c:00.0: VM fault (0x0c, vmid 8, pasid 32769) at page 4096, read from 'TC0' (0x54433000) (72)
> > [   82.703585] Evicting PASID 32769 queues
> >
> > I am looking into it, but if you have any insight that would be great in
> > helping to resolve it quickly.
> >
> >   Kent
> >> -Original Message-
> >> From: amd-gfx  On Behalf Of
> >> Christian König
> >> Sent: Tuesday, February 26, 2019 7:47 AM
> >> To: amd-gfx@lists.freedesktop.org
> >> Subject: [PATCH 3/6] drm/amdgpu: allocate VM PDs/PTs on demand
> >>
> >> Let's start to allocate VM PDs/PTs on demand instead of
> >> pre-allocating them during mapping.
> >>
> >> Signed-off-by: Christian König 
> >> Reviewed-by: Felix Kuehling 
> >> ---
> >>   .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  |  10 +-
> >>   drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c   |   9 --
> >>   drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c   |  10 --
> >>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c| 136 +-
> >>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h|   3 -
> >>   5 files changed, 39 insertions(+), 129 deletions(-)
> >>
> >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> >> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> >> index 31e3953dcb6e..088e9b6b765b 100644
> >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> >> @@ -410,15 +410,7 @@ static int add_bo_to_vm(struct amdgpu_device
> >> *adev, struct kgd_mem *mem,
> >>if (p_bo_va_entry)
> >>*p_bo_va_entry = bo_va_entry;
> >>
> >> -  /* Allocate new page tables if needed and validate
> >> -   * them.
> >> -   */
> >> -  ret = amdgpu_vm_alloc_pts(adev, vm, va, amdgpu_bo_size(bo));
> >> -  if (ret) {
> >> -  pr_err("Failed to allocate pts, err=%d\n", ret);
> >> -  goto err_alloc_pts;
> >> -  }
> >> -
> >> +  /* Allocate validate page tables if needed */
> >>ret = vm_validate_pt_pd_bos(vm);
> >>if (ret) {
> >>pr_err("validate_pt_pd_bos() failed\n");
> >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
> >> b/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
> >> index 7e22be7ca68a..54dd02a898b9 100644
> >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
> >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
> >> @@ -92,15 +92,6 @@ int amdgpu_map_static_csa(struct amdgpu_device
> >> *adev, struct amdgpu_vm *vm,
> >>return -ENOMEM;
> >>}
> >>
> >> -  r = amdgpu_vm_alloc_pts(adev, (*bo_va)->base.vm, csa_addr,
> >> -  size);
> >> -  if (r) {
> >> -  DRM_ERROR("failed to allocate pts for static CSA, err=%d\n",
> >> r);
> >> -  amdgpu_vm_bo_rmv(adev, *bo_va);
> >> -  ttm_eu_backoff_reservation(&ticket, &list);
> >> -  return r;
> >> -  }
> >> -
> >>r = amdgpu_vm_bo_map(adev, *bo_va, csa_addr, 0, size,
> >> AMDGPU_PTE_READABLE |
> >> AMDGPU_PTE_WRITEABLE |
> >> AMDGPU_PTE_EXECUTABLE);
> >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> >> b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> >> index 555285e329ed..fcaaac30e84b 100644
> >> --- a/drivers/

Re: [PATCH 3/6] drm/amdgpu: allocate VM PDs/PTs on demand

2019-03-08 Thread Christian König
My best guess is that we forget to update the PDs somewhere. What
hardware is that on?


Felix already mentioned that this could be problematic for the KFD.

Maybe he has an idea,
Christian.
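
[Editor's sketch] To make the hypothesis above concrete, here is a toy two-level table that allocates its leaf tables on first use. It is not amdgpu code: every name (toy_pd, toy_pt, toy_map, toy_lookup, toy_free) and the 512-entry fan-out are assumptions made for illustration. The point is only that, with on-demand allocation, a freshly allocated table is invisible to a walker until the parent directory entry is written.

```c
#include <stdint.h>
#include <stdlib.h>

#define TOY_ENTRIES 512u /* entries per level; an arbitrary choice for the sketch */

struct toy_pt { uint64_t pte[TOY_ENTRIES]; };      /* leaf table */
struct toy_pd { struct toy_pt *pt[TOY_ENTRIES]; }; /* directory of leaf tables */

/* Map one page, allocating the leaf table only when it is first touched. */
static int toy_map(struct toy_pd *pd, uint64_t pfn, uint64_t value)
{
	uint64_t pd_idx = (pfn / TOY_ENTRIES) % TOY_ENTRIES;
	uint64_t pt_idx = pfn % TOY_ENTRIES;

	if (!pd->pt[pd_idx]) {
		struct toy_pt *pt = calloc(1, sizeof(*pt));

		if (!pt)
			return -1;
		/* The directory update: without it the walker never sees pt. */
		pd->pt[pd_idx] = pt;
	}
	pd->pt[pd_idx]->pte[pt_idx] = value;
	return 0;
}

/* Walk the table; a return value of 0 stands in for a VM fault. */
static uint64_t toy_lookup(const struct toy_pd *pd, uint64_t pfn)
{
	const struct toy_pt *pt = pd->pt[(pfn / TOY_ENTRIES) % TOY_ENTRIES];

	return pt ? pt->pte[pfn % TOY_ENTRIES] : 0;
}

/* Release every leaf table that was allocated on demand. */
static void toy_free(struct toy_pd *pd)
{
	for (unsigned int i = 0; i < TOY_ENTRIES; i++) {
		free(pd->pt[i]);
		pd->pt[i] = NULL;
	}
}
```

In this model, a lookup of page 4096 "faults" until toy_map() has both allocated the leaf table and stored it in the directory; dropping the directory store reproduces the kind of miss guessed at above.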

On 08.03.19 at 15:04, Russell, Kent wrote:

Hi Christian,

This patch ended up causing a VM Fault in KFDTest. Reverting just this patch 
addressed the issue:
[   82.703503] amdgpu :0c:00.0: GPU fault detected: 146 0x480c for process  pid 0 thread  pid 0
[   82.703512] amdgpu :0c:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x1000
[   82.703516] amdgpu :0c:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x1004800C
[   82.703522] amdgpu :0c:00.0: VM fault (0x0c, vmid 8, pasid 32769) at page 4096, read from 'TC0' (0x54433000) (72)
[   82.703585] Evicting PASID 32769 queues

I am looking into it, but if you have any insight that would be great in 
helping to resolve it quickly.

  Kent

-----Original Message-----
From: amd-gfx  On Behalf Of
Christian König
Sent: Tuesday, February 26, 2019 7:47 AM
To: amd-gfx@lists.freedesktop.org
Subject: [PATCH 3/6] drm/amdgpu: allocate VM PDs/PTs on demand

Let's start to allocate VM PDs/PTs on demand instead of pre-allocating them
during mapping.

Signed-off-by: Christian König 
Reviewed-by: Felix Kuehling 
---
  .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  |  10 +-
  drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c   |   9 --
  drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c   |  10 --
  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c| 136 +-
  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h|   3 -
  5 files changed, 39 insertions(+), 129 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index 31e3953dcb6e..088e9b6b765b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -410,15 +410,7 @@ static int add_bo_to_vm(struct amdgpu_device
*adev, struct kgd_mem *mem,
if (p_bo_va_entry)
*p_bo_va_entry = bo_va_entry;

-   /* Allocate new page tables if needed and validate
-* them.
-*/
-   ret = amdgpu_vm_alloc_pts(adev, vm, va, amdgpu_bo_size(bo));
-   if (ret) {
-   pr_err("Failed to allocate pts, err=%d\n", ret);
-   goto err_alloc_pts;
-   }
-
+   /* Allocate validate page tables if needed */
ret = vm_validate_pt_pd_bos(vm);
if (ret) {
pr_err("validate_pt_pd_bos() failed\n");
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
index 7e22be7ca68a..54dd02a898b9 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
@@ -92,15 +92,6 @@ int amdgpu_map_static_csa(struct amdgpu_device
*adev, struct amdgpu_vm *vm,
return -ENOMEM;
}

-   r = amdgpu_vm_alloc_pts(adev, (*bo_va)->base.vm, csa_addr,
-   size);
-   if (r) {
-   DRM_ERROR("failed to allocate pts for static CSA, err=%d\n",
r);
-   amdgpu_vm_bo_rmv(adev, *bo_va);
-   ttm_eu_backoff_reservation(&ticket, &list);
-   return r;
-   }
-
r = amdgpu_vm_bo_map(adev, *bo_va, csa_addr, 0, size,
 AMDGPU_PTE_READABLE |
AMDGPU_PTE_WRITEABLE |
 AMDGPU_PTE_EXECUTABLE);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
index 555285e329ed..fcaaac30e84b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
@@ -625,11 +625,6 @@ int amdgpu_gem_va_ioctl(struct drm_device *dev,
void *data,

switch (args->operation) {
case AMDGPU_VA_OP_MAP:
-   r = amdgpu_vm_alloc_pts(adev, bo_va->base.vm, args->va_address,
-   args->map_size);
-   if (r)
-   goto error_backoff;
-
va_flags = amdgpu_gmc_get_pte_flags(adev, args->flags);
r = amdgpu_vm_bo_map(adev, bo_va, args->va_address,
 args->offset_in_bo, args->map_size,
@@ -645,11 +640,6 @@ int amdgpu_gem_va_ioctl(struct drm_device *dev, void *data,
args->map_size);
break;
case AMDGPU_VA_OP_REPLACE:
-   r = amdgpu_vm_alloc_pts(adev, bo_va->base.vm, args->va_address,
-   args->map_size);
-   if (r)
-   goto error_backoff;
-
va_flags = amdgpu_gmc_get_pte_flags(adev, args->flags);
r = amdgpu_vm_bo_replace_map(adev, bo_va, args->va_address,
 args->offset_in_bo, args->map_size,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 362436f4e856..dfad543fc000 1006

RE: [PATCH 3/6] drm/amdgpu: allocate VM PDs/PTs on demand

2019-03-08 Thread Russell, Kent
Hi Christian,

This patch ended up causing a VM Fault in KFDTest. Reverting just this patch 
addressed the issue:
[   82.703503] amdgpu :0c:00.0: GPU fault detected: 146 0x480c for process  pid 0 thread  pid 0
[   82.703512] amdgpu :0c:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x1000
[   82.703516] amdgpu :0c:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x1004800C
[   82.703522] amdgpu :0c:00.0: VM fault (0x0c, vmid 8, pasid 32769) at page 4096, read from 'TC0' (0x54433000) (72)
[   82.703585] Evicting PASID 32769 queues

I am looking into it, but if you have any insight that would be great in 
helping to resolve it quickly.

 Kent
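
[Editor's sketch] For readers following the log above, this is a small decoder for the VM_CONTEXT1_PROTECTION_FAULT_STATUS value. The bit positions are an assumption: they were picked because they reproduce the fields printed in the "VM fault" line (protections 0x0c, vmid 8, a read access, client id 72) and should be checked against the register headers for the ASIC in question. The struct and function names are hypothetical.

```c
#include <stdint.h>

/* Fields of the fault status word, under the assumed layout noted below. */
struct vm_fault_info {
	uint32_t protections; /* assumed bits 7:0   */
	uint32_t client_id;   /* assumed bits 19:12 */
	uint32_t is_write;    /* assumed bit 24: 0 = read, 1 = write */
	uint32_t vmid;        /* assumed bits 28:25 */
};

/* Split a raw status value into its fields (layout is an assumption). */
static struct vm_fault_info decode_fault_status(uint32_t status)
{
	struct vm_fault_info info;

	info.protections = status & 0xffu;
	info.client_id = (status >> 12) & 0xffu;
	info.is_write = (status >> 24) & 0x1u;
	info.vmid = (status >> 25) & 0xfu;
	return info;
}
```

Applied to 0x1004800C, this yields protections 0x0c, client id 72, a read access and vmid 8, which agrees with the decoded line in the log. The fault address register value 0x1000 is likewise the "page 4096" reported there.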
> -----Original Message-----
> From: amd-gfx  On Behalf Of
> Christian König
> Sent: Tuesday, February 26, 2019 7:47 AM
> To: amd-gfx@lists.freedesktop.org
> Subject: [PATCH 3/6] drm/amdgpu: allocate VM PDs/PTs on demand
> 
> Let's start to allocate VM PDs/PTs on demand instead of pre-allocating them
> during mapping.
> 
> Signed-off-by: Christian König 
> Reviewed-by: Felix Kuehling 
> ---
>  .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  |  10 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c   |   9 --
>  drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c   |  10 --
>  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c| 136 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h|   3 -
>  5 files changed, 39 insertions(+), 129 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> index 31e3953dcb6e..088e9b6b765b 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> @@ -410,15 +410,7 @@ static int add_bo_to_vm(struct amdgpu_device
> *adev, struct kgd_mem *mem,
>   if (p_bo_va_entry)
>   *p_bo_va_entry = bo_va_entry;
> 
> - /* Allocate new page tables if needed and validate
> -  * them.
> -  */
> - ret = amdgpu_vm_alloc_pts(adev, vm, va, amdgpu_bo_size(bo));
> - if (ret) {
> - pr_err("Failed to allocate pts, err=%d\n", ret);
> - goto err_alloc_pts;
> - }
> -
> + /* Allocate validate page tables if needed */
>   ret = vm_validate_pt_pd_bos(vm);
>   if (ret) {
>   pr_err("validate_pt_pd_bos() failed\n");
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
> index 7e22be7ca68a..54dd02a898b9 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
> @@ -92,15 +92,6 @@ int amdgpu_map_static_csa(struct amdgpu_device
> *adev, struct amdgpu_vm *vm,
>   return -ENOMEM;
>   }
> 
> - r = amdgpu_vm_alloc_pts(adev, (*bo_va)->base.vm, csa_addr,
> - size);
> - if (r) {
> - DRM_ERROR("failed to allocate pts for static CSA, err=%d\n",
> r);
> - amdgpu_vm_bo_rmv(adev, *bo_va);
> - ttm_eu_backoff_reservation(&ticket, &list);
> - return r;
> - }
> -
>   r = amdgpu_vm_bo_map(adev, *bo_va, csa_addr, 0, size,
>AMDGPU_PTE_READABLE |
> AMDGPU_PTE_WRITEABLE |
>AMDGPU_PTE_EXECUTABLE);
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> index 555285e329ed..fcaaac30e84b 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> @@ -625,11 +625,6 @@ int amdgpu_gem_va_ioctl(struct drm_device *dev,
> void *data,
> 
>   switch (args->operation) {
>   case AMDGPU_VA_OP_MAP:
> - r = amdgpu_vm_alloc_pts(adev, bo_va->base.vm, args->va_address,
> - args->map_size);
> - if (r)
> - goto error_backoff;
> -
>   va_flags = amdgpu_gmc_get_pte_flags(adev, args->flags);
>   r = amdgpu_vm_bo_map(adev, bo_va, args->va_address,
>args->offset_in_bo, args->map_size,
> @@ -645,11 +640,6 @@ int amdgpu_gem_va_ioctl(struct drm_device *dev, void *data,
>   args->map_size);
>   break;
>   case AMDGPU_VA_OP_REPLACE:
> - r = amdgpu_vm_alloc_pts(adev, bo_va->base.vm, args->va_address,
> - args->map_size);
> - if (r)
> - goto error_backoff;
> -
>   va_flags = amdgpu_gmc_get_pte_flags(adev, args->flags);
>   r = amdgpu_vm_bo_replace_map(adev, bo_va, args->va_address,
>args->offset_in_bo, args->map_size,
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> index 362436f4e856..dfad543fc000 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> @@ -504,47 +504,6 @@ static void am