RE: [PATCH 3/3] drm/amdgpu:Limit the resolution for virtual_display

2021-01-06 Thread Deng, Emily
[AMD Official Use Only - Internal Distribution Only]

>-Original Message-
>From: Alex Deucher 
>Sent: Wednesday, January 6, 2021 1:23 AM
>To: Deng, Emily 
>Cc: amd-gfx list 
>Subject: Re: [PATCH 3/3] drm/amdgpu:Limit the resolution for virtual_display
>
>On Tue, Jan 5, 2021 at 3:37 AM Emily.Deng  wrote:
>>
>> Limit the resolution so that it is not bigger than 16384, which means
>> dev->mode_info.num_crtc * common_modes[i].w must not be bigger than 16384.
>>
>> Signed-off-by: Emily.Deng 
>> ---
>>  drivers/gpu/drm/amd/amdgpu/dce_virtual.c | 7 +--
>>  1 file changed, 5 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/dce_virtual.c
>> b/drivers/gpu/drm/amd/amdgpu/dce_virtual.c
>> index 2b16c8faca34..c23d37b02fd7 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/dce_virtual.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/dce_virtual.c
>> @@ -319,6 +319,7 @@ dce_virtual_encoder(struct drm_connector *connector)
>>  static int dce_virtual_get_modes(struct drm_connector *connector)
>>  {
>> struct drm_device *dev = connector->dev;
>> +   struct amdgpu_device *adev = dev->dev_private;
>> struct drm_display_mode *mode = NULL;
>> unsigned i;
>> static const struct mode_size {
>> @@ -350,8 +351,10 @@ static int dce_virtual_get_modes(struct drm_connector *connector)
>> };
>>
>> for (i = 0; i < ARRAY_SIZE(common_modes); i++) {
>> -   mode = drm_cvt_mode(dev, common_modes[i].w, common_modes[i].h, 60, false, false, false);
>> -   drm_mode_probed_add(connector, mode);
>> +   if (adev->mode_info.num_crtc <= 4 ||
>> +       common_modes[i].w <= 2560) {
>
>You are also limiting the number of crtcs here.  Intended?  Won't this break 5
>or 6 crtc configs?
>
>Alex
Yes, it is intended. For num_crtc bigger than 4, we don't support resolutions
bigger than 2560, because the max supported width is 16384 for the xcb protocol.
>
>> +   mode = drm_cvt_mode(dev, common_modes[i].w, common_modes[i].h, 60, false, false, false);
>> +   drm_mode_probed_add(connector, mode);
>> +   }
>> }
>>
>> return 0;
>> --
>> 2.25.1
>>
>> ___
>> amd-gfx mailing list
>> amd-gfx@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH] drm/amdgpu: For sriov multiple VF, set compute timeout to 10s

2021-01-06 Thread Emily . Deng
For multiple VF, after engine hang,as host driver will first
encounter FLR, so has no meanning to set compute to 60s.

Signed-off-by: Emily.Deng 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index b69c34074d8d..ed36bf97df29 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -3117,8 +3117,10 @@ static int amdgpu_device_get_job_timeout_settings(struct amdgpu_device *adev)
 	 */
 	adev->gfx_timeout = msecs_to_jiffies(10000);
 	adev->sdma_timeout = adev->video_timeout = adev->gfx_timeout;
-	if (amdgpu_sriov_vf(adev) || amdgpu_passthrough(adev))
+	if ((amdgpu_sriov_vf(adev) && amdgpu_sriov_is_pp_one_vf(adev)) || amdgpu_passthrough(adev))
 		adev->compute_timeout =  msecs_to_jiffies(60000);
+	else if (amdgpu_sriov_vf(adev))
+		adev->compute_timeout =  msecs_to_jiffies(10000);
 	else
 		adev->compute_timeout = MAX_SCHEDULE_TIMEOUT;
 
-- 
2.25.1
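For readers skimming the diff above, the resulting selection logic would
read roughly as follows (a sketch of the post-patch control flow, not a
literal excerpt):

    /* Compute job timeout policy after this patch:
     *  - one-VF SR-IOV or passthrough: 60s
     *  - multi-VF SR-IOV: 10s, since a host FLR would fire first anyway
     *  - bare metal: no compute timeout
     */
    if ((amdgpu_sriov_vf(adev) && amdgpu_sriov_is_pp_one_vf(adev)) ||
        amdgpu_passthrough(adev))
            adev->compute_timeout = msecs_to_jiffies(60000);
    else if (amdgpu_sriov_vf(adev))
            adev->compute_timeout = msecs_to_jiffies(10000);
    else
            adev->compute_timeout = MAX_SCHEDULE_TIMEOUT;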

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/amdgpu: don't limit gtt size on apus

2021-01-06 Thread Joshua Ashton



On 1/6/21 7:52 AM, Christian König wrote:

On 05.01.21 at 23:31, Joshua Ashton wrote:

On 1/5/21 10:10 PM, Alex Deucher wrote:

On Tue, Jan 5, 2021 at 5:05 PM Joshua Ashton  wrote:


Since commit 24562523688b ("Revert "drm/amd/amdgpu: set gtt size
according to system memory size only""), the GTT size was limited by
3GiB or VRAM size.


The commit in question was to fix a hang with certain tests on APUs.
That should be tested again before we re-enable this.  If it is fixed,
we should just revert the revert rather than special case dGPUs.

Alex



I think the commit before the revert (ba851eed895c) has some 
fundamental problems:


It was always specifying max(3GiB, 3/4ths RAM) of GTT, even if that 
wouldn't fit into say, 1GiB or 2GiB of available RAM.


Limiting GTT to min(max(3GiB, VRAM), 3/4ths RAM) size on dGPUs makes 
sense also and is a sensible limit to avoid silly situations with 
overallocation and potential OOM.


This patch solves both of those issues.


No, Alex is right this approach was already tried and it causes problems.

Additional to that why should this be an issue? Even when VRAM is very 
small on APUs we still use 3GiB of GTT.


Regards,
Christian.


The problem is that 3GiB of GTT isn't enough for most modern games. My 
laptop has a 128MiB carveout which cannot be configured in the BIOS, so I 
am stuck with that size without extra kernel parameters, which shouldn't 
be necessary.


If you dislike the approach of keeping the extra check for dGPUs and 
limiting GTT there, then I would say that we should use

gtt_size = 3/4ths system memory
for all devices instead of
gtt_size = max(3/4ths system memory, 3GiB)
as it was before the revert, as it is problematic on systems with < 3GiB 
of system memory.
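To put numbers on that, here is a small standalone sketch (hypothetical
2GiB machine, not driver code) comparing the two heuristics:

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
            uint64_t gib = 1024ull << 20;
            uint64_t ram = 2 * gib;              /* hypothetical 2GiB system */
            uint64_t three_quarters = ram * 3 / 4;

            /* pre-revert heuristic (ba851eed895c): max(3GiB, 3/4 RAM) */
            uint64_t old_gtt = three_quarters > 3 * gib ? three_quarters : 3 * gib;
            /* proposed APU heuristic: plain 3/4 of RAM */
            uint64_t new_gtt = three_quarters;

            printf("old: %llu MiB (more than RAM!)\n",
                   (unsigned long long)(old_gtt >> 20));
            printf("new: %llu MiB\n", (unsigned long long)(new_gtt >> 20));
            return 0;
    }

On such a machine the old heuristic reports 3072 MiB of GTT against
2048 MiB of RAM, which is the < 3GiB problem described above.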


- Joshie 🐸✨





- Joshie 🐸✨





This is problematic on APUs, especially with a small carveout
which can be as low as a fixed 128MiB, as there would be a very
limited 3GiB available for video memory.
This obviously does not meet the demands of modern applications.

This patch makes it so the GTT size heuristic always uses 3/4ths of
the system memory size on APUs (limiting the size by 3GiB/VRAM size
only on devices with dedicated video memory).

Fixes: 24562523688b ("Revert drm/amd/amdgpu: set gtt size according to
system memory size only")

Signed-off-by: Joshua Ashton 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c |  5 +++--
  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 12 +---
  2 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c

index 72efd579ec5e..a5a41e9272d6 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -192,8 +192,9 @@ module_param_named(gartsize, amdgpu_gart_size, 
uint, 0600);


  /**
   * DOC: gttsize (int)
- * Restrict the size of GTT domain in MiB for testing. The default is -1 (It's VRAM size if 3GB < VRAM < 3/4 RAM,
- * otherwise 3/4 RAM size).
+ * Restrict the size of GTT domain in MiB for testing. The default is -1 (On APUs this is 3/4th
+ * of the system memory; on dGPUs this is 3GiB or VRAM sized, whichever is bigger,
+ * with an upper bound of 3/4th of system memory).
   */
  MODULE_PARM_DESC(gttsize, "Size of the GTT domain in megabytes (-1 
= auto)");

  module_param_named(gttsize, amdgpu_gtt_size, int, 0600);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c

index 4d8f19ab1014..294f26f4f310 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -1865,9 +1865,15 @@ int amdgpu_ttm_init(struct amdgpu_device *adev)
 		struct sysinfo si;
 
 		si_meminfo(&si);
-		gtt_size = min(max((AMDGPU_DEFAULT_GTT_SIZE_MB << 20),
-				   adev->gmc.mc_vram_size),
-			       ((uint64_t)si.totalram * si.mem_unit * 3/4));
+		gtt_size = (uint64_t)si.totalram * si.mem_unit * 3/4;
+		/* If we have dedicated memory, limit our GTT size to
+		 * 3GiB or VRAM size, whichever is bigger
+		 */
+		if (!(adev->flags & AMD_IS_APU)) {
+			gtt_size = min(max(AMDGPU_DEFAULT_GTT_SIZE_MB << 20,
+					   adev->gmc.mc_vram_size),
+				       gtt_size);
+		}
 	}
 	else
 		gtt_size = (uint64_t)amdgpu_gtt_size << 20;
--
2.30.0

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: [PATCH] drm/amdgpu: For sriov multiple VF, set compute timeout to 10s

2021-01-06 Thread Paul Menzel

Dear Emily,


On 06.01.21 at 12:41, Emily.Deng wrote:

Could you please remove the dot from your name in your git configuration?

git config --global user.name "Emily Deng"

For the summary, maybe amend it to:

Decrease compute timeout to 10 s for sriov multiple VF


For multiple VF, after engine hang,as host driver will first


Nit: Please add a space after the comma.


encounter FLR, so has no meanning to set compute to 60s.


meaning

How can this be tested?


Signed-off-by: Emily.Deng 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 +++-
  1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index b69c34074d8d..ed36bf97df29 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -3117,8 +3117,10 @@ static int amdgpu_device_get_job_timeout_settings(struct amdgpu_device *adev)
 	 */
 	adev->gfx_timeout = msecs_to_jiffies(10000);
 	adev->sdma_timeout = adev->video_timeout = adev->gfx_timeout;
-	if (amdgpu_sriov_vf(adev) || amdgpu_passthrough(adev))
+	if ((amdgpu_sriov_vf(adev) && amdgpu_sriov_is_pp_one_vf(adev)) || amdgpu_passthrough(adev))
 		adev->compute_timeout =  msecs_to_jiffies(60000);
+	else if (amdgpu_sriov_vf(adev))
+		adev->compute_timeout =  msecs_to_jiffies(10000);


Maybe split up the first if condition to group the conditions and not the 
timeout values. At least for me that would be less confusing:


if (amdgpu_sriov_vf(adev))
	adev->compute_timeout = amdgpu_sriov_is_pp_one_vf(adev) ?
		msecs_to_jiffies(60000) : msecs_to_jiffies(10000);
else if (amdgpu_passthrough(adev))
	adev->compute_timeout = msecs_to_jiffies(60000);
else
	adev->compute_timeout = MAX_SCHEDULE_TIMEOUT;



Kind regards,

Paul
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/amdgpu: don't limit gtt size on apus

2021-01-06 Thread Christian König

On 06.01.21 at 13:47, Joshua Ashton wrote:



On 1/6/21 7:52 AM, Christian König wrote:

On 05.01.21 at 23:31, Joshua Ashton wrote:

On 1/5/21 10:10 PM, Alex Deucher wrote:

On Tue, Jan 5, 2021 at 5:05 PM Joshua Ashton  wrote:


Since commit 24562523688b ("Revert "drm/amd/amdgpu: set gtt size
according to system memory size only""), the GTT size was limited by
3GiB or VRAM size.


The commit in question was to fix a hang with certain tests on APUs.
That should be tested again before we re-enable this.  If it is fixed,
we should just revert the revert rather than special case dGPUs.

Alex



I think the commit before the revert (ba851eed895c) has some 
fundamental problems:


It was always specifying max(3GiB, 3/4ths RAM) of GTT, even if that 
wouldn't fit into say, 1GiB or 2GiB of available RAM.


Limiting GTT to min(max(3GiB, VRAM), 3/4ths RAM) size on dGPUs makes 
sense also and is a sensible limit to avoid silly situations with 
overallocation and potential OOM.


This patch solves both of those issues.


No, Alex is right this approach was already tried and it causes 
problems.


Additional to that why should this be an issue? Even when VRAM is 
very small on APUs we still use 3GiB of GTT.


Regards,
Christian.


The problem is that 3GiB of GTT isn't enough for most modern games.


You seem to misunderstand what the GTT size means here. This is the 
amount of memory an application can lock down in a single command 
submissions.


It is still possible for the game to use all of system memory for 
textures etc... it can just happen that some buffers are temporarily 
marked as inaccessible for the GPU.


My laptop has a 128MiB carveout which is not possible to be configured 
in the BIOS so I am stuck with that size without extra kernel 
parameters which shouldn't be necessary.


Did you run into problems without the parameter?



If you dislike the approach of keeping the extra check for dGPUs and 
limiting GTT there, then I would say that we should use

gtt_size = 3/4ths system memory
for all devices instead of
gtt_size = max(3/4ths system memory, 3GiB)
as it was before the revert, as it is problematic on systems with < 
3GiB of system memory.


Yeah, that's indeed not a good idea.

Regards,
Christian.



- Joshie 🐸✨





- Joshie 🐸✨





This is problematic on APUs, especially with a small carveout
which can be as low as a fixed 128MiB, as there would be a very
limited 3GiB available for video memory.
This obviously does not meet the demands of modern applications.

This patch makes it so the GTT size heuristic always uses 3/4ths of
the system memory size on APUs (limiting the size by 3GiB/VRAM size
only on devices with dedicated video memory).

Fixes: 24562523688b ("Revert drm/amd/amdgpu: set gtt size 
according to

system memory size only")

Signed-off-by: Joshua Ashton 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c |  5 +++--
  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 12 +---
  2 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c

index 72efd579ec5e..a5a41e9272d6 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -192,8 +192,9 @@ module_param_named(gartsize, amdgpu_gart_size, 
uint, 0600);


  /**
   * DOC: gttsize (int)
- * Restrict the size of GTT domain in MiB for testing. The 
default is -1 (It's VRAM size if 3GB < VRAM < 3/4 RAM,

- * otherwise 3/4 RAM size).
+ * Restrict the size of GTT domain in MiB for testing. The 
default is -1 (On APUs this is 3/4th
+ * of the system memory; on dGPUs this is 3GiB or VRAM sized, 
whichever is bigger,

+ * with an upper bound of 3/4th of system memory.
   */
  MODULE_PARM_DESC(gttsize, "Size of the GTT domain in megabytes 
(-1 = auto)");

  module_param_named(gttsize, amdgpu_gtt_size, int, 0600);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c

index 4d8f19ab1014..294f26f4f310 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -1865,9 +1865,15 @@ int amdgpu_ttm_init(struct amdgpu_device 
*adev)

 struct sysinfo si;

 si_meminfo(&si);
-   gtt_size = min(max((AMDGPU_DEFAULT_GTT_SIZE_MB << 
20),

-  adev->gmc.mc_vram_size),
-  ((uint64_t)si.totalram * 
si.mem_unit * 3/4));

+   gtt_size = (uint64_t)si.totalram * si.mem_unit * 3/4;
+   /* If we have dedicated memory, limit our GTT size to
+    * 3GiB or VRAM size, whichever is bigger
+    */
+   if (!(adev->flags & AMD_IS_APU)) {
+   gtt_size = 
min(max(AMDGPU_DEFAULT_GTT_SIZE_MB << 20,

+ adev->gmc.mc_vram_size),
+   gtt_size);
+   }
 }
 else
 gtt_size = (uint64_t)amdgpu_gtt_size << 20;
--
2.30.0

___

Re: [PATCH] drm/amdgpu: don't limit gtt size on apus

2021-01-06 Thread Bas Nieuwenhuizen
On Wed, Jan 6, 2021 at 1:54 PM Christian König wrote:

> On 06.01.21 at 13:47, Joshua Ashton wrote:
> >
> >
> > On 1/6/21 7:52 AM, Christian König wrote:
> >> On 05.01.21 at 23:31, Joshua Ashton wrote:
> >>> On 1/5/21 10:10 PM, Alex Deucher wrote:
>  On Tue, Jan 5, 2021 at 5:05 PM Joshua Ashton 
> wrote:
> >
> > Since commit 24562523688b ("Revert "drm/amd/amdgpu: set gtt size
> > according to system memory size only""), the GTT size was limited by
> > 3GiB or VRAM size.
> 
>  The commit in question was to fix a hang with certain tests on APUs.
>  That should be tested again before we re-enable this.  If it is fixed,
>  we should just revert the revert rather than special case dGPUs.
> 
>  Alex
> 
> >>>
> >>> I think the commit before the revert (ba851eed895c) has some
> >>> fundamental problems:
> >>>
> >>> It was always specifying max(3GiB, 3/4ths RAM) of GTT, even if that
> >>> wouldn't fit into say, 1GiB or 2GiB of available RAM.
> >>>
> >>> Limiting GTT to min(max(3GiB, VRAM), 3/4ths RAM) size on dGPUs makes
> >>> sense also and is a sensible limit to avoid silly situations with
> >>> overallocation and potential OOM.
> >>>
> >>> This patch solves both of those issues.
> >>
> >> No, Alex is right this approach was already tried and it causes
> >> problems.
> >>
> >> Additional to that why should this be an issue? Even when VRAM is
> >> very small on APUs we still use 3GiB of GTT.
> >>
> >> Regards,
> >> Christian.
> >
> > The problem is that 3GiB of GTT isn't enough for most modern games.
>
> You seem to misunderstand what the GTT size means here. This is the
> amount of memory an application can lock down in a single command
> submissions.
>
> It is still possible for the game to use all of system memory for
> textures etc... it can just happen that some buffers are temporary
> marked as inaccessible for the GPU.
>

For Vulkan we (both RADV and AMDVLK) use GTT as the total size. Usage in
modern games is essentially "bindless" so there is no way to track at a
per-submission level what memory needs to be resident. (and even with
tracking applications are allowed to use all the memory in a single draw
call, which would be unsplittable anyway ...)


> > My laptop has a 128MiB carveout which is not possible to be configured
> > in the BIOS so I am stuck with that size without extra kernel
> > parameters which shouldn't be necessary.
>
> Did you ran into problems without the parameter?
>
> >
> > If you dislike the approach of keeping the extra check for dGPUs and
> > limiting GTT there, then I would say that we should use
> > gtt_size = 3/4ths system memory
> > for all devices instead of
> > gtt_size = max(3/4ths system memory, 3GiB)
> > as it was before the revert, as it is problematic on systems with <
> > 3GiB of system memory.
>
> Yeah, that's indeed not a good idea.
>
> Regards,
> Christian.
>
> >
> > - Joshie 🐸✨
> >
> >>
> >>>
> >>> - Joshie 🐸✨
> >>>
> 
> >
> > This is problematic on APUs, especially with a small carveout
> > which can be as low as a fixed 128MiB, as there would be a very
> > limited 3GiB available for video memory.
> > This obviously does not meet the demands of modern applications.
> >
> > This patch makes it so the GTT size heuristic always uses 3/4ths of
> > the system memory size on APUs (limiting the size by 3GiB/VRAM size
> > only on devices with dedicated video memory).
> >
> > Fixes: 24562523688b ("Revert drm/amd/amdgpu: set gtt size
> > according to
> > system memory size only")
> >
> > Signed-off-by: Joshua Ashton 
> > ---
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c |  5 +++--
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 12 +---
> >   2 files changed, 12 insertions(+), 5 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> > index 72efd579ec5e..a5a41e9272d6 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> > @@ -192,8 +192,9 @@ module_param_named(gartsize, amdgpu_gart_size,
> > uint, 0600);
> >
> >   /**
> >* DOC: gttsize (int)
> > - * Restrict the size of GTT domain in MiB for testing. The
> > default is -1 (It's VRAM size if 3GB < VRAM < 3/4 RAM,
> > - * otherwise 3/4 RAM size).
> > + * Restrict the size of GTT domain in MiB for testing. The
> > default is -1 (On APUs this is 3/4th
> > + * of the system memory; on dGPUs this is 3GiB or VRAM sized,
> > whichever is bigger,
> > + * with an upper bound of 3/4th of system memory.
> >*/
> >   MODULE_PARM_DESC(gttsize, "Size of the GTT domain in megabytes
> > (-1 = auto)");
> >   module_param_named(gttsize, amdgpu_gtt_size, int, 0600);
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> 

Re: [PATCH] drm/amdgpu: don't limit gtt size on apus

2021-01-06 Thread Christian König

On 06.01.21 at 14:02, Bas Nieuwenhuizen wrote:



On Wed, Jan 6, 2021 at 1:54 PM Christian König <christian.koe...@amd.com> wrote:


On 06.01.21 at 13:47, Joshua Ashton wrote:
>
>
> On 1/6/21 7:52 AM, Christian König wrote:
>> On 05.01.21 at 23:31, Joshua Ashton wrote:
>>> On 1/5/21 10:10 PM, Alex Deucher wrote:
 On Tue, Jan 5, 2021 at 5:05 PM Joshua Ashton <jos...@froggi.es> wrote:
>
> Since commit 24562523688b ("Revert "drm/amd/amdgpu: set gtt size
> according to system memory size only""), the GTT size was
limited by
> 3GiB or VRAM size.

 The commit in question was to fix a hang with certain tests
on APUs.
 That should be tested again before we re-enable this.  If it
is fixed,
 we should just revert the revert rather than special case dGPUs.

 Alex

>>>
>>> I think the commit before the revert (ba851eed895c) has some
>>> fundamental problems:
>>>
>>> It was always specifying max(3GiB, 3/4ths RAM) of GTT, even if
that
>>> wouldn't fit into say, 1GiB or 2GiB of available RAM.
>>>
>>> Limiting GTT to min(max(3GiB, VRAM), 3/4ths RAM) size on dGPUs
makes
>>> sense also and is a sensible limit to avoid silly situations with
>>> overallocation and potential OOM.
>>>
>>> This patch solves both of those issues.
>>
>> No, Alex is right this approach was already tried and it causes
>> problems.
>>
>> Additional to that why should this be an issue? Even when VRAM is
>> very small on APUs we still use 3GiB of GTT.
>>
>> Regards,
>> Christian.
>
> The problem is that 3GiB of GTT isn't enough for most modern games.

You seem to misunderstand what the GTT size means here. This is the
amount of memory an application can lock down in a single command
submissions.

It is still possible for the game to use all of system memory for
textures etc... it can just happen that some buffers are temporary
marked as inaccessible for the GPU.


For Vulkan we (both RADV and AMDVLK) use GTT as the total size. Usage 
in modern games is essentially "bindless" so there is no way to track 
at a per-submission level what memory needs to be resident. (and even 
with tracking applications are allowed to use all the memory in a 
single draw call, which would be unsplittable anyway ...)


Yeah, that is a really good point.

The issue is that we need some limitation since 3/4 of system memory is 
way too much and the max texture size test in piglit can cause a system 
crash.


The alternative is a better OOM handling, so that an application which 
uses too much system memory through the driver stack has a more likely 
chance to get killed. Cause currently that is either X or Wayland :(


Christian.




> My laptop has a 128MiB carveout which is not possible to be
configured
> in the BIOS so I am stuck with that size without extra kernel
> parameters which shouldn't be necessary.

Did you ran into problems without the parameter?

>
> If you dislike the approach of keeping the extra check for dGPUs
and
> limiting GTT there, then I would say that we should use
> gtt_size = 3/4ths system memory
> for all devices instead of
> gtt_size = max(3/4ths system memory, 3GiB)
> as it was before the revert, as it is problematic on systems with <
> 3GiB of system memory.

Yeah, that's indeed not a good idea.

Regards,
Christian.

>
> - Joshie 🐸✨
>
>>
>>>
>>> - Joshie 🐸✨
>>>

>
> This is problematic on APUs, especially with a small carveout
> which can be as low as a fixed 128MiB, as there would be a very
> limited 3GiB available for video memory.
> This obviously does not meet the demands of modern applications.
>
> This patch makes it so the GTT size heuristic always uses
3/4ths of
> the system memory size on APUs (limiting the size by
3GiB/VRAM size
> only on devices with dedicated video memory).
>
> Fixes: 24562523688b ("Revert drm/amd/amdgpu: set gtt size
> according to
> system memory size only")
>
> Signed-off-by: Joshua Ashton <jos...@froggi.es>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c |  5 +++--
> drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 12 +---
>   2 files changed, 12 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> index 72efd579ec5e..a5a41e9272d6 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> @@ -192,8 +192,9 @@ module_param_named(gartsize,
amdgpu_gart_size,
> uint, 0600)

Re: [PATCH] drm/amdgpu: don't limit gtt size on apus

2021-01-06 Thread Joshua Ashton



On 1/6/21 12:54 PM, Christian König wrote:

On 06.01.21 at 13:47, Joshua Ashton wrote:



On 1/6/21 7:52 AM, Christian König wrote:

On 05.01.21 at 23:31, Joshua Ashton wrote:

On 1/5/21 10:10 PM, Alex Deucher wrote:

On Tue, Jan 5, 2021 at 5:05 PM Joshua Ashton  wrote:


Since commit 24562523688b ("Revert "drm/amd/amdgpu: set gtt size
according to system memory size only""), the GTT size was limited by
3GiB or VRAM size.


The commit in question was to fix a hang with certain tests on APUs.
That should be tested again before we re-enable this.  If it is fixed,
we should just revert the revert rather than special case dGPUs.

Alex



I think the commit before the revert (ba851eed895c) has some 
fundamental problems:


It was always specifying max(3GiB, 3/4ths RAM) of GTT, even if that 
wouldn't fit into say, 1GiB or 2GiB of available RAM.


Limiting GTT to min(max(3GiB, VRAM), 3/4ths RAM) size on dGPUs makes 
sense also and is a sensible limit to avoid silly situations with 
overallocation and potential OOM.


This patch solves both of those issues.


No, Alex is right this approach was already tried and it causes 
problems.


Additional to that why should this be an issue? Even when VRAM is 
very small on APUs we still use 3GiB of GTT.


Regards,
Christian.


The problem is that 3GiB of GTT isn't enough for most modern games.


You seem to misunderstand what the GTT size means here. This is the 
amount of memory an application can lock down in a single command 
submissions.


It is still possible for the game to use all of system memory for 
textures etc... it can just happen that some buffers are temporary 
marked as inaccessible for the GPU.


In Vulkan, command buffers are explicit and the amount of memory the app 
uses is not trackable at a command buffer level due to bindless.


This means that we can't magically split command buffers like in GL if 
too much memory is being used by a single submission.


This means that the only two visible heaps available to AMD APUs in RADV 
right now are the carveout and GTT. As I understand it there is no other 
way to use more memory in APIs with explicit cmd buffering & bindless.




My laptop has a 128MiB carveout which is not possible to be configured 
in the BIOS so I am stuck with that size without extra kernel 
parameters which shouldn't be necessary.


Did you ran into problems without the parameter?



If you dislike the approach of keeping the extra check for dGPUs and 
limiting GTT there, then I would say that we should use

gtt_size = 3/4ths system memory
for all devices instead of
gtt_size = max(3/4ths system memory, 3GiB)
as it was before the revert, as it is problematic on systems with < 
3GiB of system memory.


Yeah, that's indeed not a good idea.

Regards,
Christian.



- Joshie 🐸✨





- Joshie 🐸✨





This is problematic on APUs, especially with a small carveout
which can be as low as a fixed 128MiB, as there would be a very
limited 3GiB available for video memory.
This obviously does not meet the demands of modern applications.

This patch makes it so the GTT size heuristic always uses 3/4ths of
the system memory size on APUs (limiting the size by 3GiB/VRAM size
only on devices with dedicated video memory).

Fixes: 24562523688b ("Revert drm/amd/amdgpu: set gtt size 
according to

system memory size only")

Signed-off-by: Joshua Ashton 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c |  5 +++--
  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 12 +---
  2 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c

index 72efd579ec5e..a5a41e9272d6 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -192,8 +192,9 @@ module_param_named(gartsize, amdgpu_gart_size, 
uint, 0600);


  /**
   * DOC: gttsize (int)
- * Restrict the size of GTT domain in MiB for testing. The 
default is -1 (It's VRAM size if 3GB < VRAM < 3/4 RAM,

- * otherwise 3/4 RAM size).
+ * Restrict the size of GTT domain in MiB for testing. The 
default is -1 (On APUs this is 3/4th
+ * of the system memory; on dGPUs this is 3GiB or VRAM sized, 
whichever is bigger,

+ * with an upper bound of 3/4th of system memory.
   */
  MODULE_PARM_DESC(gttsize, "Size of the GTT domain in megabytes 
(-1 = auto)");

  module_param_named(gttsize, amdgpu_gtt_size, int, 0600);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c

index 4d8f19ab1014..294f26f4f310 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -1865,9 +1865,15 @@ int amdgpu_ttm_init(struct amdgpu_device 
*adev)

 struct sysinfo si;

 si_meminfo(&si);
-   gtt_size = min(max((AMDGPU_DEFAULT_GTT_SIZE_MB << 
20),

-  adev->gmc.mc_vram_size),
-  ((uint64_t)si.totalram * 
si.mem_unit * 3/4));

+ 

Re: [PATCH] drm/amdgpu: don't limit gtt size on apus

2021-01-06 Thread Joshua Ashton



On 1/6/21 1:05 PM, Christian König wrote:

On 06.01.21 at 14:02, Bas Nieuwenhuizen wrote:



On Wed, Jan 6, 2021 at 1:54 PM Christian König <christian.koe...@amd.com> wrote:


On 06.01.21 at 13:47, Joshua Ashton wrote:
>
>
> On 1/6/21 7:52 AM, Christian König wrote:
>> On 05.01.21 at 23:31, Joshua Ashton wrote:
>>> On 1/5/21 10:10 PM, Alex Deucher wrote:
 On Tue, Jan 5, 2021 at 5:05 PM Joshua Ashton <jos...@froggi.es> wrote:
>
> Since commit 24562523688b ("Revert "drm/amd/amdgpu: set gtt size
> according to system memory size only""), the GTT size was
limited by
> 3GiB or VRAM size.

 The commit in question was to fix a hang with certain tests
on APUs.
 That should be tested again before we re-enable this.  If it
is fixed,
 we should just revert the revert rather than special case dGPUs.

 Alex

>>>
>>> I think the commit before the revert (ba851eed895c) has some
>>> fundamental problems:
>>>
>>> It was always specifying max(3GiB, 3/4ths RAM) of GTT, even if
that
>>> wouldn't fit into say, 1GiB or 2GiB of available RAM.
>>>
>>> Limiting GTT to min(max(3GiB, VRAM), 3/4ths RAM) size on dGPUs
makes
>>> sense also and is a sensible limit to avoid silly situations with
>>> overallocation and potential OOM.
>>>
>>> This patch solves both of those issues.
>>
>> No, Alex is right this approach was already tried and it causes
>> problems.
>>
>> Additional to that why should this be an issue? Even when VRAM is
>> very small on APUs we still use 3GiB of GTT.
>>
>> Regards,
>> Christian.
>
> The problem is that 3GiB of GTT isn't enough for most modern games.

You seem to misunderstand what the GTT size means here. This is the
amount of memory an application can lock down in a single command
submissions.

It is still possible for the game to use all of system memory for
textures etc... it can just happen that some buffers are temporary
marked as inaccessible for the GPU.


For Vulkan we (both RADV and AMDVLK) use GTT as the total size. Usage 
in modern games is essentially "bindless" so there is no way to track 
at a per-submission level what memory needs to be resident. (and even 
with tracking applications are allowed to use all the memory in a 
single draw call, which would be unsplittable anyway ...)


Yeah, that is a really good point.

The issue is that we need some limitation since 3/4 of system memory is 
way to much and the max texture size test in piglit can cause a system 
crash.


The alternative is a better OOM handling, so that an application which 
uses to much system memory through the driver stack has a more likely 
chance to get killed. Cause currently that is either X or Wayland :(


Christian.


As I understand it, what is being exposed right now is essentially 
max(vram size, 3GiB) limited by 3/4ths of the memory. Previously, before 
the revert what was being taken was just max(3GiB, 3/4ths).


If you had < 3GiB of system memory, that seems like a bit of an issue 
that could easily lead to OOM to me?


Are you hitting something smaller than 3/4ths right now? I remember 
the source commit mentioned they only had 1GiB of system memory 
available, so that could be possible if you had a carveout of < 768MiB...


- Joshie 🐸✨






> My laptop has a 128MiB carveout which is not possible to be
configured
> in the BIOS so I am stuck with that size without extra kernel
> parameters which shouldn't be necessary.

Did you ran into problems without the parameter?

>
> If you dislike the approach of keeping the extra check for dGPUs
and
> limiting GTT there, then I would say that we should use
> gtt_size = 3/4ths system memory
> for all devices instead of
> gtt_size = max(3/4ths system memory, 3GiB)
> as it was before the revert, as it is problematic on systems with <
> 3GiB of system memory.

Yeah, that's indeed not a good idea.

Regards,
Christian.

>
> - Joshie 🐸✨
>
>>
>>>
>>> - Joshie 🐸✨
>>>

>
    > This is problematic on APUs, especially with a small carveout
    > which can be as low as a fixed 128MiB, as there would be a very
    > limited 3GiB available for video memory.
    > This obviously does not meet the demands of modern applications.
>
> This patch makes it so the GTT size heuristic always uses
3/4ths of
> the system memory size on APUs (limiting the size by
3GiB/VRAM size
> only on devices with dedicated video memory).
>
> Fixes: 24562523688b ("Revert drm/amd/amdgpu: set gtt size
> according to
> system memory size only")
>
> Signed-off-by: Joshua Ashton <jos...@froggi.es>
>>>

Re: [PATCH] drm/amdgpu: don't limit gtt size on apus

2021-01-06 Thread Christian König

On 06.01.21 at 14:17, Joshua Ashton wrote:



On 1/6/21 1:05 PM, Christian König wrote:

On 06.01.21 at 14:02, Bas Nieuwenhuizen wrote:



On Wed, Jan 6, 2021 at 1:54 PM Christian König <christian.koe...@amd.com> wrote:


    On 06.01.21 at 13:47, Joshua Ashton wrote:
    >
    >
    > On 1/6/21 7:52 AM, Christian König wrote:
    >> On 05.01.21 at 23:31, Joshua Ashton wrote:
    >>> On 1/5/21 10:10 PM, Alex Deucher wrote:
     On Tue, Jan 5, 2021 at 5:05 PM Joshua Ashton <jos...@froggi.es> wrote:
    >
    > Since commit 24562523688b ("Revert "drm/amd/amdgpu: set 
gtt size

    > according to system memory size only""), the GTT size was
    limited by
    > 3GiB or VRAM size.
    
     The commit in question was to fix a hang with certain tests
    on APUs.
     That should be tested again before we re-enable this.  If it
    is fixed,
     we should just revert the revert rather than special case 
dGPUs.

    
     Alex
    
    >>>
    >>> I think the commit before the revert (ba851eed895c) has some
    >>> fundamental problems:
    >>>
    >>> It was always specifying max(3GiB, 3/4ths RAM) of GTT, even if
    that
    >>> wouldn't fit into say, 1GiB or 2GiB of available RAM.
    >>>
    >>> Limiting GTT to min(max(3GiB, VRAM), 3/4ths RAM) size on dGPUs
    makes
    >>> sense also and is a sensible limit to avoid silly situations 
with

    >>> overallocation and potential OOM.
    >>>
    >>> This patch solves both of those issues.
    >>
    >> No, Alex is right this approach was already tried and it causes
    >> problems.
    >>
    >> Additional to that why should this be an issue? Even when 
VRAM is

    >> very small on APUs we still use 3GiB of GTT.
    >>
    >> Regards,
    >> Christian.
    >
    > The problem is that 3GiB of GTT isn't enough for most modern 
games.


    You seem to misunderstand what the GTT size means here. This is the
    amount of memory an application can lock down in a single command
    submissions.

    It is still possible for the game to use all of system memory for
    textures etc... it can just happen that some buffers are temporary
    marked as inaccessible for the GPU.


For Vulkan we (both RADV and AMDVLK) use GTT as the total size. 
Usage in modern games is essentially "bindless" so there is no way 
to track at a per-submission level what memory needs to be resident. 
(and even with tracking applications are allowed to use all the 
memory in a single draw call, which would be unsplittable anyway ...)


Yeah, that is a really good point.

The issue is that we need some limitation since 3/4 of system memory 
is way to much and the max texture size test in piglit can cause a 
system crash.


The alternative is a better OOM handling, so that an application 
which uses to much system memory through the driver stack has a more 
likely chance to get killed. Cause currently that is either X or 
Wayland :(


Christian.


As I understand it, what is being exposed right now is essentially 
max(vram size, 3GiB) limited by 3/4ths of the memory. Previously, 
before the revert what was being taken was just max(3GiB, 3/4ths).


If you had < 3GiB of system memory that seems like a bit of an issue 
that could easily leat to OOM to me?


Not really, as I said GTT is only the memory the GPU can lock at the 
same time. It is perfectly possible to have that larger than the 
available system memory.


In other words this is *not* to prevent using too much system memory, for 
this we have an additional limit inside TTM. But instead to have a 
reasonable limit for applications to not use too much memory at the same 
time.




Are you hitting on something smaller than 3/4ths right now? I remember 
the source commit mentioned they only had 1GiB of system memory 
available, so that could be possible if you had a carveout of < 786MiB...


What do you mean with that? I don't have a test system at hand for this 
if that's what you are asking for.


Regards,
Christian.



- Joshie 🐸✨






    > My laptop has a 128MiB carveout which is not possible to be
    configured
    > in the BIOS so I am stuck with that size without extra kernel
    > parameters which shouldn't be necessary.

    Did you ran into problems without the parameter?

    >
    > If you dislike the approach of keeping the extra check for dGPUs
    and
    > limiting GTT there, then I would say that we should use
    > gtt_size = 3/4ths system memory
    > for all devices instead of
    > gtt_size = max(3/4ths system memory, 3GiB)
    > as it was before the revert, as it is problematic on systems 
with <

    > 3GiB of system memory.

    Yeah, that's indeed not a good idea.

    Regards,
    Christian.

    >
    > - Joshie 🐸✨
    >
    >>
    >>>
    >>> - Joshie 🐸✨
    >>>
    
    >
    > This is problematic on APUs, especially with a small carveout
    > which can be as low as a fixed 128MiB, as there would be 
very a

    

Re: [PATCH] drm/amdkfd: check more client ids in interrupt handler

2021-01-06 Thread Deucher, Alexander
[AMD Official Use Only - Internal Distribution Only]

Reviewed-by: Alex Deucher 

From: Zhou1, Tao 
Sent: Wednesday, January 6, 2021 1:13 AM
To: Deucher, Alexander ; Kuehling, Felix 
; amd-gfx@lists.freedesktop.org 

Cc: Zhou1, Tao 
Subject: [PATCH] drm/amdkfd: check more client ids in interrupt handler

Add check for SExSH clients in kfd interrupt handler.

Signed-off-by: Tao Zhou 
---
 drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c | 12 ++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c
index 0ca0327a39e5..74a460be077b 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c
@@ -56,7 +56,11 @@ static bool event_interrupt_isr_v9(struct kfd_dev *dev,
 client_id != SOC15_IH_CLIENTID_SDMA7 &&
 client_id != SOC15_IH_CLIENTID_VMC &&
 client_id != SOC15_IH_CLIENTID_VMC1 &&
-   client_id != SOC15_IH_CLIENTID_UTCL2)
+   client_id != SOC15_IH_CLIENTID_UTCL2 &&
+   client_id != SOC15_IH_CLIENTID_SE0SH &&
+   client_id != SOC15_IH_CLIENTID_SE1SH &&
+   client_id != SOC15_IH_CLIENTID_SE2SH &&
+   client_id != SOC15_IH_CLIENTID_SE3SH)
 return false;

 /* This is a known issue for gfx9. Under non HWS, pasid is not set
@@ -111,7 +115,11 @@ static void event_interrupt_wq_v9(struct kfd_dev *dev,
 vmid = SOC15_VMID_FROM_IH_ENTRY(ih_ring_entry);
 context_id = SOC15_CONTEXT_ID0_FROM_IH_ENTRY(ih_ring_entry);

-   if (client_id == SOC15_IH_CLIENTID_GRBM_CP) {
+   if (client_id == SOC15_IH_CLIENTID_GRBM_CP ||
+   client_id == SOC15_IH_CLIENTID_SE0SH ||
+   client_id == SOC15_IH_CLIENTID_SE1SH ||
+   client_id == SOC15_IH_CLIENTID_SE2SH ||
+   client_id == SOC15_IH_CLIENTID_SE3SH) {
 if (source_id == SOC15_INTSRC_CP_END_OF_PIPE)
 kfd_signal_event_interrupt(pasid, context_id, 32);
 else if (source_id == SOC15_INTSRC_SQ_INTERRUPT_MSG)
--
2.17.1
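As a readability aside (a sketch, not part of the posted patch), the
growing comparison chains in both functions could be factored into a
helper using the same SOC15_IH_CLIENTID_* values:

    /* Sketch: centralize the "is this a shader-engine SH client?" test
     * used by both the ISR and the worker above.
     */
    static bool kfd_client_is_se_sh(uint16_t client_id)
    {
            switch (client_id) {
            case SOC15_IH_CLIENTID_SE0SH:
            case SOC15_IH_CLIENTID_SE1SH:
            case SOC15_IH_CLIENTID_SE2SH:
            case SOC15_IH_CLIENTID_SE3SH:
                    return true;
            default:
                    return false;
            }
    }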

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/amdgpu: don't limit gtt size on apus

2021-01-06 Thread Joshua Ashton



On 1/6/21 1:45 PM, Christian König wrote:

On 06.01.21 at 14:17, Joshua Ashton wrote:



On 1/6/21 1:05 PM, Christian König wrote:

On 06.01.21 at 14:02, Bas Nieuwenhuizen wrote:



On Wed, Jan 6, 2021 at 1:54 PM Christian König <christian.koe...@amd.com> wrote:


    On 06.01.21 at 13:47, Joshua Ashton wrote:
    >
    >
    > On 1/6/21 7:52 AM, Christian König wrote:
    >> On 05.01.21 at 23:31, Joshua Ashton wrote:
    >>> On 1/5/21 10:10 PM, Alex Deucher wrote:
     On Tue, Jan 5, 2021 at 5:05 PM Joshua Ashton <jos...@froggi.es> wrote:
    >
    > Since commit 24562523688b ("Revert "drm/amd/amdgpu: set 
gtt size

    > according to system memory size only""), the GTT size was
    limited by
    > 3GiB or VRAM size.
    
     The commit in question was to fix a hang with certain tests
    on APUs.
     That should be tested again before we re-enable this.  If it
    is fixed,
     we should just revert the revert rather than special case 
dGPUs.

    
     Alex
    
    >>>
    >>> I think the commit before the revert (ba851eed895c) has some
    >>> fundamental problems:
    >>>
    >>> It was always specifying max(3GiB, 3/4ths RAM) of GTT, even if
    that
    >>> wouldn't fit into say, 1GiB or 2GiB of available RAM.
    >>>
    >>> Limiting GTT to min(max(3GiB, VRAM), 3/4ths RAM) size on dGPUs
    makes
    >>> sense also and is a sensible limit to avoid silly situations 
with

    >>> overallocation and potential OOM.
    >>>
    >>> This patch solves both of those issues.
    >>
    >> No, Alex is right this approach was already tried and it causes
    >> problems.
    >>
    >> Additional to that why should this be an issue? Even when 
VRAM is

    >> very small on APUs we still use 3GiB of GTT.
    >>
    >> Regards,
    >> Christian.
    >
    > The problem is that 3GiB of GTT isn't enough for most modern 
games.


    You seem to misunderstand what the GTT size means here. This is the
    amount of memory an application can lock down in a single command
    submissions.

    It is still possible for the game to use all of system memory for
    textures etc... it can just happen that some buffers are temporary
    marked as inaccessible for the GPU.


For Vulkan we (both RADV and AMDVLK) use GTT as the total size. 
Usage in modern games is essentially "bindless" so there is no way 
to track at a per-submission level what memory needs to be resident. 
(and even with tracking applications are allowed to use all the 
memory in a single draw call, which would be unsplittable anyway ...)


Yeah, that is a really good point.

The issue is that we need some limitation since 3/4 of system memory 
is way to much and the max texture size test in piglit can cause a 
system crash.


The alternative is a better OOM handling, so that an application 
which uses to much system memory through the driver stack has a more 
likely chance to get killed. Cause currently that is either X or 
Wayland :(


Christian.


As I understand it, what is being exposed right now is essentially 
max(vram size, 3GiB) limited by 3/4ths of the memory. Previously, 
before the revert what was being taken was just max(3GiB, 3/4ths).


If you had < 3GiB of system memory that seems like a bit of an issue 
that could easily leat to OOM to me?


Not really, as I said GTT is only the memory the GPU can lock at the 
same time. It is perfectly possible to have that larger than the 
available system memory.


In other words this is *not* to prevent using to much system memory, for 
this we have an additional limit inside TTM. But instead to have a 
reasonable limit for applications to not use to much memory at the same 
time.




Worth noting that this GTT size here also affects the memory reporting 
and budgeting for applications. If the user has 1GiB of total system 
memory and 3GiB set here, then 3GiB will be the budget and size exposed 
to applications too...
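For context, this is roughly how that size surfaces to a Vulkan
application (a sketch using VK_EXT_memory_budget; instance setup and
error handling omitted, and the extension must be available):

    #include <stdio.h>
    #include <vulkan/vulkan.h>

    /* Print per-heap size and budget as a Vulkan app sees them; on an
     * APU one of these heaps is the GTT-backed one discussed here.
     */
    void print_heap_budgets(VkPhysicalDevice phys)
    {
            VkPhysicalDeviceMemoryBudgetPropertiesEXT budget = {
                    .sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_MEMORY_BUDGET_PROPERTIES_EXT,
            };
            VkPhysicalDeviceMemoryProperties2 props = {
                    .sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_MEMORY_PROPERTIES_2,
                    .pNext = &budget,
            };
            uint32_t i;

            vkGetPhysicalDeviceMemoryProperties2(phys, &props);
            for (i = 0; i < props.memoryProperties.memoryHeapCount; i++)
                    printf("heap %u: size %llu MiB, budget %llu MiB\n", i,
                           (unsigned long long)(props.memoryProperties.memoryHeaps[i].size >> 20),
                           (unsigned long long)(budget.heapBudget[i] >> 20));
    }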


(On APUs,) we really don't want to expose more GTT than system memory. 
Apps will eat into it and end up running into OOM or swapping *very* 
quickly. (I imagine this is likely what was being run into before the 
revert.)


Alternatively, in RADV and other user space drivers like AMDVLK, we 
could limit this to the system memory size or 3/4ths ourselves. Although 
that's kinda gross and I don't think that's the correct path...




Are you hitting on something smaller than 3/4ths right now? I remember 
the source commit mentioned they only had 1GiB of system memory 
available, so that could be possible if you had a carveout of < 786MiB...


What do you mean with that? I don't have a test system at hand for this 
if that's what you are asking for.


This was mainly a question to whoever did the revert, to find out some 
extra info about what they were using at the time.


- Joshie 🐸✨



Regards,
Christian.



- Joshie 🐸✨






    > My laptop has a 128MiB carveout which is not possible to 

Re: [PATCH] drm/amdgpu: don't limit gtt size on apus

2021-01-06 Thread Christian König

On 06.01.21 at 15:18, Joshua Ashton wrote:

[SNIP]
For Vulkan we (both RADV and AMDVLK) use GTT as the total size. 
Usage in modern games is essentially "bindless" so there is no way 
to track at a per-submission level what memory needs to be 
resident. (and even with tracking applications are allowed to use 
all the memory in a single draw call, which would be unsplittable 
anyway ...)


Yeah, that is a really good point.

The issue is that we need some limitation since 3/4 of system 
memory is way to much and the max texture size test in piglit can 
cause a system crash.


The alternative is a better OOM handling, so that an application 
which uses to much system memory through the driver stack has a 
more likely chance to get killed. Cause currently that is either X 
or Wayland :(


Christian.


As I understand it, what is being exposed right now is essentially 
max(vram size, 3GiB) limited by 3/4ths of the memory. Previously, 
before the revert what was being taken was just max(3GiB, 3/4ths).


If you had < 3GiB of system memory that seems like a bit of an issue 
that could easily leat to OOM to me?


Not really, as I said GTT is only the memory the GPU can lock at the 
same time. It is perfectly possible to have that larger than the 
available system memory.


In other words this is *not* to prevent using to much system memory, 
for this we have an additional limit inside TTM. But instead to have 
a reasonable limit for applications to not use to much memory at the 
same time.




Worth noting that this GTT size here also affects the memory reporting 
and budgeting for applications. If the user has 1GiB of total system 
memory and 3GiB set here, then 3GiB will be the budget and size 
exposed to applications too...


Yeah, that's indeed problematic.



(On APUs,) we really don't want to expose more GTT than system memory. 
Apps will eat into it and end up swapping or running into OOM or 
swapping *very* quickly. (I imagine this is likely what was being run 
into before the revert.)


No, the issue is that some applications try to allocate textures way 
above some reasonable limit.


Alternatively, in RADV and other user space drivers like AMDVLK, we 
could limit this to the system memory size or 3/4ths ourselves. 
Although that's kinda gross and I don't think that's the correct path...


Ok, let me explain from the other side: We have this limitation because 
otherwise some tests like the maximum texture size test for OpenGL 
crash the system. And this is independent of your system configuration.


We could of course add another limit for the texture size in 
OpenGL/RADV/AMDVLK, but I agree that this is rather awkward.




Are you hitting on something smaller than 3/4ths right now? I 
remember the source commit mentioned they only had 1GiB of system 
memory available, so that could be possible if you had a carveout of 
< 786MiB...


What do you mean with that? I don't have a test system at hand for 
this if that's what you are asking for.


This was mainly a question to whoever did the revert. The question to 
find out some extra info about what they are using at the time.


You don't need a specific system configuration for this, just try to run 
the max texture size test in piglit.
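(For reference, something like the following piglit invocation should
reproduce it; the exact test-name filter may need adjusting:

    piglit run -t max-texture-size gpu results/
)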


Regards,
Christian.



- Joshie 🐸✨


___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


RE: [PATCH] drm: Check actual format for legacy pageflip.

2021-01-06 Thread Liu, Zhan
[AMD Official Use Only - Internal Distribution Only]

> -Original Message-
> From: Liu, Zhan 
> Sent: 2021/January/04, Monday 3:46 PM
> To: Bas Nieuwenhuizen ; Mario Kleiner
> 
> Cc: dri-devel ; amd-gfx list <amd-g...@lists.freedesktop.org>; Deucher, Alexander
> ; Daniel Vetter ;
> Kazlauskas, Nicholas ; Ville Syrjälä
> 
> Subject: Re: [PATCH] drm: Check actual format for legacy pageflip.
>
>
>
> + Ville
>
> On Sat, Jan 2, 2021 at 4:31 PM Mario Kleiner 
> wrote:
> >
> > On Sat, Jan 2, 2021 at 3:02 PM Bas Nieuwenhuizen
> >  wrote:
> > >
> > > With modifiers one can actually have different format_info structs
> > > for the same format, which now matters for AMDGPU since we convert
> > > implicit modifiers to explicit modifiers with multiple planes.
> > >
> > > I checked other drivers and it doesn't look like they end up
> > > triggering this case so I think this is safe to relax.
> > >
> > > Signed-off-by: Bas Nieuwenhuizen 
> > > Fixes: 816853f9dc40 ("drm/amd/display: Set new format info for
> > >converted metadata.")
> > > ---
> > >  drivers/gpu/drm/drm_plane.c | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/gpu/drm/drm_plane.c
> > > b/drivers/gpu/drm/drm_plane.c index e6231947f987..f5085990cfac
> > > 100644
> > > --- a/drivers/gpu/drm/drm_plane.c
> > > +++ b/drivers/gpu/drm/drm_plane.c
> > > @@ -1163,7 +1163,7 @@ int drm_mode_page_flip_ioctl(struct
> drm_device
> > >*dev,
> > > if (ret)
> > > goto out;
> > >
> > > -   if (old_fb->format != fb->format) {
> > > +   if (old_fb->format->format != fb->format->format) {
> >
>
> I agree with this patch, though considering the original way was made by
> Ville, I will wait for Ville's input first. Adding my "Acked-by" here.
>
> This patch is:
> Acked-by: Zhan Liu 

Ping...

>
> > This was btw. the original way before Ville made it more strict about
> > 4 years ago, to catch issues related to tiling, and more complex
> > layouts, like the dcc tiling/retiling introduced by your modifier
> > patches. That's why I hope my alternative patch is a good solution for
> > atomic drivers while keeping the strictness for potential legacy
> > drivers.
> >
> > -mario
> >
> > > DRM_DEBUG_KMS("Page flip is not allowed to change frame buffer format.\n");
> > > ret = -EINVAL;
> > > goto out;
> > > --
> > > 2.29.2
> > >
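For readers skimming the one-line change: drm_framebuffer stores a
pointer to a drm_format_info descriptor, and the descriptor stores the
fourcc. With modifiers, two framebuffers can carry different descriptor
instances for the same fourcc, so the relaxed check compares the fourcc
values instead of the pointers. A sketch of the rule after this patch
(the helper name is made up for illustration):

    /* Only the pixel format (fourcc) has to match across a legacy
     * page-flip, not the drm_format_info pointer.
     */
    static bool page_flip_format_compatible(const struct drm_framebuffer *a,
                                            const struct drm_framebuffer *b)
    {
            return a->format->format == b->format->format;
    }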
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH 3/3] drm/amdgpu:Limit the resolution for virtual_display

2021-01-06 Thread Michel Dänzer

On 2021-01-06 11:40 a.m., Deng, Emily wrote:

From: Alex Deucher 
On Tue, Jan 5, 2021 at 3:37 AM Emily.Deng  wrote:


Limit the resolution so that it is not bigger than 16384, which means
dev->mode_info.num_crtc * common_modes[i].w must not be bigger than 16384.

Signed-off-by: Emily.Deng 
---
  drivers/gpu/drm/amd/amdgpu/dce_virtual.c | 7 +--
  1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/dce_virtual.c
b/drivers/gpu/drm/amd/amdgpu/dce_virtual.c
index 2b16c8faca34..c23d37b02fd7 100644
--- a/drivers/gpu/drm/amd/amdgpu/dce_virtual.c
+++ b/drivers/gpu/drm/amd/amdgpu/dce_virtual.c
@@ -319,6 +319,7 @@ dce_virtual_encoder(struct drm_connector *connector)
 static int dce_virtual_get_modes(struct drm_connector *connector)
 {
 struct drm_device *dev = connector->dev;
+   struct amdgpu_device *adev = dev->dev_private;
 struct drm_display_mode *mode = NULL;
 unsigned i;
 static const struct mode_size {
@@ -350,8 +351,10 @@ static int dce_virtual_get_modes(struct drm_connector *connector)
 };

 for (i = 0; i < ARRAY_SIZE(common_modes); i++) {
-   mode = drm_cvt_mode(dev, common_modes[i].w, common_modes[i].h, 60, false, false, false);
-   drm_mode_probed_add(connector, mode);
+   if (adev->mode_info.num_crtc <= 4 ||
+       common_modes[i].w <= 2560) {


You are also limiting the number of crtcs here.  Intended?  Won't this break 5
or 6 crtc configs?

Alex

Yes, it is intended. For num_crtc bigger than 4, we don't support resolutions
bigger than 2560, because the max supported width is 16384 for the xcb protocol.


There's no such limitation with Wayland. I'd recommend against 
artificially imposing limits from X11 to the kernel.



(As a side note, the X11 protocol limit should actually be 32768; the 
16384 limit exposed in the RANDR extension comes from the kernel driver, 
specifically drmModeGetResources's max_width/height)
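(Those limits can be inspected with libdrm, e.g. this minimal sketch;
the /dev/dri/card0 node is an assumption:

    #include <fcntl.h>
    #include <stdio.h>
    #include <xf86drmMode.h>

    int main(void)
    {
            int fd = open("/dev/dri/card0", O_RDWR);
            drmModeRes *res = drmModeGetResources(fd);

            if (res) {
                    /* the values RANDR reports as its screen size limits */
                    printf("max fb: %ux%u\n", res->max_width, res->max_height);
                    drmModeFreeResources(res);
            }
            return 0;
    }
)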



--
Earthling Michel Dänzer   |   https://redhat.com
Libre software enthusiast | Mesa and X developer
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/amdkfd: check more client ids in interrupt handler

2021-01-06 Thread Felix Kuehling
Thanks for catching and fixing this.

Reviewed-by: Felix Kuehling 

On 2021-01-06 at 1:13 a.m., Tao Zhou wrote:
> Add check for SExSH clients in kfd interrupt handler.
>
> Signed-off-by: Tao Zhou 
> ---
>  drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c | 12 ++--
>  1 file changed, 10 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c 
> b/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c
> index 0ca0327a39e5..74a460be077b 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c
> @@ -56,7 +56,11 @@ static bool event_interrupt_isr_v9(struct kfd_dev *dev,
>   client_id != SOC15_IH_CLIENTID_SDMA7 &&
>   client_id != SOC15_IH_CLIENTID_VMC &&
>   client_id != SOC15_IH_CLIENTID_VMC1 &&
> - client_id != SOC15_IH_CLIENTID_UTCL2)
> + client_id != SOC15_IH_CLIENTID_UTCL2 &&
> + client_id != SOC15_IH_CLIENTID_SE0SH &&
> + client_id != SOC15_IH_CLIENTID_SE1SH &&
> + client_id != SOC15_IH_CLIENTID_SE2SH &&
> + client_id != SOC15_IH_CLIENTID_SE3SH)
>   return false;
>  
>   /* This is a known issue for gfx9. Under non HWS, pasid is not set
> @@ -111,7 +115,11 @@ static void event_interrupt_wq_v9(struct kfd_dev *dev,
>   vmid = SOC15_VMID_FROM_IH_ENTRY(ih_ring_entry);
>   context_id = SOC15_CONTEXT_ID0_FROM_IH_ENTRY(ih_ring_entry);
>  
> - if (client_id == SOC15_IH_CLIENTID_GRBM_CP) {
> + if (client_id == SOC15_IH_CLIENTID_GRBM_CP ||
> + client_id == SOC15_IH_CLIENTID_SE0SH ||
> + client_id == SOC15_IH_CLIENTID_SE1SH ||
> + client_id == SOC15_IH_CLIENTID_SE2SH ||
> + client_id == SOC15_IH_CLIENTID_SE3SH) {
>   if (source_id == SOC15_INTSRC_CP_END_OF_PIPE)
>   kfd_signal_event_interrupt(pasid, context_id, 32);
>   else if (source_id == SOC15_INTSRC_SQ_INTERRUPT_MSG)
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


radeon kernel driver not suppressing ACPI_VIDEO_NOTIFY_PROBE events when it should

2021-01-06 Thread Hans de Goede
Hi All,

I get Cc-ed on all Fedora kernel bugs and this one stood out to me:

https://bugzilla.redhat.com/show_bug.cgi?id=1911763

Since I've done a lot of work on the acpi-video code I thought I should
take a look. I've managed to help the user with a kernel-commandline
option which stops video.ko (the acpi-video kernel module) from emitting
key-press events for ACPI_VIDEO_NOTIFY_PROBE events.

This is on a Dell Vostro laptop with i915/radeon hybrid gfx.

I was thinking about adding a DMI quirk for this, but from the brief time
that I worked on nouveau (and specifically hybrid gfx setups) I know that
these events get fired on hybrid gfx setups when the discrete GPU is
powered down and something happens which requires the discrete GPUs drivers
attention, like an external monitor being plugged into a connector handled
by the dGPU (note that is not the case here).

So I took a quick look at the radeon code and the radeon_atif_handler()
function from drivers/gpu/drm/radeon/radeon_acpi.c. When successful that
returns NOTIFY_BAD which suppresses the key-press.

But in various cases it returns NOTIFY_DONE instead which does not
suppress the key-press event. So I think that the spurious key-press events
which the user is seeing should be avoided by this function returning
NOTIFY_BAD.

Specifically I'm wondering if we should not return
NOTIFY_BAD when count == 0?   I guess this can cause problems if there
are multiple GPUs, but we could check if the acpi-event is for the
pci-device the radeon driver is bound to. This would require changing the
acpi-notify code to also pass the acpi_device pointer as part of the
acpi_bus_event but that should not be a problem.
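
To make the proposal concrete, a minimal sketch — note that passing the originating acpi_device alongside the event is the hypothetical extension described above, not an existing kernel facility:

#include <linux/acpi.h>
#include <linux/notifier.h>
#include <linux/pci.h>

/* Hypothetical helper: what a GPU driver's notifier could return for an
 * ACPI_VIDEO_NOTIFY_PROBE event, given the event's source device. */
static int atif_probe_event_action(struct pci_dev *gpu_pdev,
				   struct acpi_device *event_src,
				   int hotplug_count)
{
	if (hotplug_count > 0)
		return NOTIFY_OK;	/* real work for this driver */
	if (event_src == ACPI_COMPANION(&gpu_pdev->dev))
		return NOTIFY_BAD;	/* ours and spurious: eat the key-press */
	return NOTIFY_DONE;		/* not our device, don't interfere */
}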

Anyways I'm hoping you all have some ideas. If necessary I can build
a Fedora test-kernel with some patches for the reporter to test.

Regards,

Hans



Re: [PATCH] drm/amdgpu: don't limit gtt size on apus

2021-01-06 Thread Joshua Ashton



On 1/6/21 2:59 PM, Christian König wrote:

On 06.01.21 at 15:18, Joshua Ashton wrote:

[SNIP]
For Vulkan we (both RADV and AMDVLK) use GTT as the total size. 
Usage in modern games is essentially "bindless" so there is no way 
to track at a per-submission level what memory needs to be 
resident. (and even with tracking applications are allowed to use 
all the memory in a single draw call, which would be unsplittable 
anyway ...)


Yeah, that is a really good point.

The issue is that we need some limitation, since 3/4 of system 
memory is way too much and the max texture size test in piglit can 
cause a system crash.


The alternative is better OOM handling, so that an application 
which uses too much system memory through the driver stack has a 
more likely chance of getting killed. Because currently that is either X 
or Wayland :(


Christian.


As I understand it, what is being exposed right now is essentially 
max(vram size, 3GiB) limited by 3/4ths of the memory. Previously, 
before the revert what was being taken was just max(3GiB, 3/4ths).


If you had < 3GiB of system memory, that seems like a bit of an issue 
that could easily lead to OOM, to me?
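
For reference, the two sizing rules being compared can be written out — a sketch under the assumptions above, not the actual amdgpu code:

#include <linux/types.h>
#include <linux/sizes.h>
#include <linux/minmax.h>

/* Current rule as described above: max(vram, 3 GiB), clamped to
 * three quarters of system memory. */
static u64 gtt_size_current(u64 vram_size, u64 sysmem_size)
{
	return min_t(u64, max_t(u64, vram_size, 3ULL * SZ_1G),
		     sysmem_size * 3 / 4);
}

/* The proposal under discussion: 3/4 of system memory, always. */
static u64 gtt_size_proposed(u64 sysmem_size)
{
	return sysmem_size * 3 / 4;
}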


Not really, as I said GTT is only the memory the GPU can lock at the 
same time. It is perfectly possible to have that larger than the 
available system memory.


In other words this is *not* to prevent using too much system memory, 
for that we have an additional limit inside TTM. It is instead to have 
a reasonable limit so applications don't use too much memory at the 
same time.




Worth noting that this GTT size here also affects the memory reporting 
and budgeting for applications. If the user has 1GiB of total system 
memory and 3GiB set here, then 3GiB will be the budget and size 
exposed to applications too...


Yeah, that's indeed problematic.



(On APUs,) we really don't want to expose more GTT than system memory. 
Apps will eat into it and end up swapping or running into the OOM killer 
*very* quickly. (I imagine this is likely what was being run 
into before the revert.)


No, the issue is that some applications try to allocate textures way 
above some reasonable limit.


Alternatively, in RADV and other user space drivers like AMDVLK, we 
could limit this to the system memory size or 3/4ths ourselves. 
Although that's kinda gross and I don't think that's the correct path...


Ok, let me explain from the other side: We have this limitation because 
otherwise some tests, like the maximum texture size test for OpenGL, 
crash the system. And this is independent of your system configuration.


We could of course add another limit for the texture size in 
OpenGL/RADV/AMDVLK, but I agree that this is rather awkward.




Are you hitting something smaller than 3/4ths right now? I 
remember the source commit mentioned they only had 1GiB of system 
memory available, so that could be possible if you had a carveout of 
< 768MiB...


What do you mean with that? I don't have a test system at hand for 
this if that's what you are asking for.


This was mainly a question to whoever did the revert, to find out 
some extra info about what they were using at the time.


You don't need a specific system configuration for this, just try to run 
the max texture size test in piglit.


Regards,
Christian.


I see... I have not managed to reproduce a hang as described in the 
revert commit, but I have had a soft crash and delay with the OOM killer 
ending X.org after a little bit when GTT > system memory.


I tested with max-texture-size on both Renoir and Picasso the following 
conditions:

16GiB RAM + 12 GiB GTT -> test works fine
16GiB RAM + 64 GiB GTT -> OOM killer kills X.org after a little bit of 
waiting (piglit died with it)

2 GiB RAM + 1.5GiB GTT -> test works fine

I also tested on my Radeon VII and it worked fine regardless of the GTT 
size there, although that card has more than enough video memory anyway 
for nothing to be an issue there 🐸.
Limiting my system memory to 2GiB, the card's memory and visible memory 
to 1GiB, and the GTT to 1.75GiB, the test works fine.


The only time I ever had problems with a crash or pseudo-hang (waiting 
for the OOM killer but the system was locked up) was whenever GTT was > 
system memory (i.e. in the reverted commit).


If I edited my commit to universally use 3/4ths of system memory for 
GTT on all hardware, would that be considered for merging?


Thanks!
- Joshie 🐸✨










Re: radeon kernel driver not suppressing ACPI_VIDEO_NOTIFY_PROBE events when it should

2021-01-06 Thread Alex Deucher
On Wed, Jan 6, 2021 at 11:25 AM Hans de Goede  wrote:
>
> Hi All,
>
> I get Cc-ed on all Fedora kernel bugs and this one stood out to me:
>
> https://bugzilla.redhat.com/show_bug.cgi?id=1911763
>
> Since I've done a lot of work on the acpi-video code I thought I should
> take a look. I've managed to help the user with a kernel-commandline
> option which stops video.ko (the acpi-video kernel module) from emitting
> key-press events for ACPI_VIDEO_NOTIFY_PROBE events.
>
> This is on a Dell Vostro laptop with i915/radeon hybrid gfx.
>
> I was thinking about adding a DMI quirk for this, but from the brief time
> that I worked on nouveau (and specifically hybrid gfx setups) I know that
> these events get fired on hybrid gfx setups when the discrete GPU is
> powered down and something happens which requires the discrete GPUs drivers
> attention, like an external monitor being plugged into a connector handled
> by the dGPU (note that is not the case here).
>
> So I took a quick look at the radeon code and the radeon_atif_handler()
> function from drivers/gpu/drm/radeon/radeon_acpi.c. When successful that
> returns NOTIFY_BAD which suppresses the key-press.
>
> But in various cases it returns NOTIFY_DONE instead which does not
> suppress the key-press event. So I think that the spurious key-press events
> which the user is seeing should be avoided by this function returning
> NOTIFY_BAD.
>
> Specifically I'm wondering if we should not return
> NOTIFY_BAD when count == 0?   I guess this can cause problems if there
> are multiple GPUs, but we could check if the acpi-event is for the
> pci-device the radeon driver is bound to. This would require changing the
> acpi-notify code to also pass the acpi_device pointer as part of the
> acpi_bus_event but that should not be a problem.
>

For A+A PX/HG systems, we'd want the notifications for both the dGPU
and the APU since some of the events are relevant to one or the other.
ATIF_DGPU_DISPLAY_EVENT is only relevant to the dGPU, while
ATIF_PANEL_BRIGHTNESS_CHANGE_REQUEST would be possibly relevant to
both (if there was a mux), but mainly the APU.
ATIF_SYSTEM_POWER_SOURCE_CHANGE_REQUEST would be relevant to both.
The other events have extended bits to determine which GPU the event
is targeted at.

Alex


> Anyways I'm hoping you all have some ideas. If necessary I can build
> a Fedora test-kernel with some patches for the reporter to test.
>
> Regards,
>
> Hans
>


Patch "Revert "drm/amd/display: Fix memory leaks in S3 resume"" has been added to the 5.10-stable tree

2021-01-06 Thread gregkh


This is a note to let you know that I've just added the patch titled

Revert "drm/amd/display: Fix memory leaks in S3 resume"

to the 5.10-stable tree which can be found at:

http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary

The filename of the patch is:
 revert-drm-amd-display-fix-memory-leaks-in-s3-resume.patch
and it can be found in the queue-5.10 subdirectory.

If you, or anyone else, feels it should not be added to the stable tree,
please let  know about it.


From alexdeuc...@gmail.com  Wed Jan  6 17:47:17 2021
From: Alex Deucher 
Date: Tue,  5 Jan 2021 11:45:45 -0500
Subject: Revert "drm/amd/display: Fix memory leaks in S3 resume"
To: amd-gfx@lists.freedesktop.org
Cc: Alex Deucher , Stylon Wang 
, Harry Wentland , Nicholas 
Kazlauskas , Andre Tomt , 
Oleksandr Natalenko , sta...@vger.kernel.org
Message-ID: <20210105164545.963036-1-alexander.deuc...@amd.com>

From: Alex Deucher 

This reverts commit a135a1b4c4db1f3b8cbed9676a40ede39feb3362.

This leads to blank screens on some boards after replugging a
display.  Revert until we understand the root cause and can
fix both the leak and the blank screen after replug.

Cc: Stylon Wang 
Cc: Harry Wentland 
Cc: Nicholas Kazlauskas 
Cc: Andre Tomt 
Cc: Oleksandr Natalenko 
Signed-off-by: Alex Deucher 
Cc: sta...@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman 
---
 drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c |3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -2278,8 +2278,7 @@ void amdgpu_dm_update_connector_after_de
 
drm_connector_update_edid_property(connector,
   aconnector->edid);
-   aconnector->num_modes = drm_add_edid_modes(connector, 
aconnector->edid);
-   drm_connector_list_update(connector);
+   drm_add_edid_modes(connector, aconnector->edid);
 
if (aconnector->dc_link->aux_mode)
drm_dp_cec_set_edid(&aconnector->dm_dp_aux.aux,


Patches currently in stable-queue which might be from alexdeuc...@gmail.com are

queue-5.10/revert-drm-amd-display-fix-memory-leaks-in-s3-resume.patch


Patch "Revert "drm/amd/display: Fix memory leaks in S3 resume"" has been added to the 5.4-stable tree

2021-01-06 Thread gregkh


This is a note to let you know that I've just added the patch titled

Revert "drm/amd/display: Fix memory leaks in S3 resume"

to the 5.4-stable tree which can be found at:

http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary

The filename of the patch is:
 revert-drm-amd-display-fix-memory-leaks-in-s3-resume.patch
and it can be found in the queue-5.4 subdirectory.

If you, or anyone else, feels it should not be added to the stable tree,
please let  know about it.


From alexdeuc...@gmail.com  Wed Jan  6 17:47:17 2021
From: Alex Deucher 
Date: Tue,  5 Jan 2021 11:45:45 -0500
Subject: Revert "drm/amd/display: Fix memory leaks in S3 resume"
To: amd-gfx@lists.freedesktop.org
Cc: Alex Deucher , Stylon Wang 
, Harry Wentland , Nicholas 
Kazlauskas , Andre Tomt , 
Oleksandr Natalenko , sta...@vger.kernel.org
Message-ID: <20210105164545.963036-1-alexander.deuc...@amd.com>

From: Alex Deucher 

This reverts commit a135a1b4c4db1f3b8cbed9676a40ede39feb3362.

This leads to blank screens on some boards after replugging a
display.  Revert until we understand the root cause and can
fix both the leak and the blank screen after replug.

Cc: Stylon Wang 
Cc: Harry Wentland 
Cc: Nicholas Kazlauskas 
Cc: Andre Tomt 
Cc: Oleksandr Natalenko 
Signed-off-by: Alex Deucher 
Cc: sta...@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman 
---
 drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c |3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -1434,8 +1434,7 @@ amdgpu_dm_update_connector_after_detect(
 
drm_connector_update_edid_property(connector,
   aconnector->edid);
-   aconnector->num_modes = drm_add_edid_modes(connector, 
aconnector->edid);
-   drm_connector_list_update(connector);
+   drm_add_edid_modes(connector, aconnector->edid);
 
if (aconnector->dc_link->aux_mode)
drm_dp_cec_set_edid(&aconnector->dm_dp_aux.aux,


Patches currently in stable-queue which might be from alexdeuc...@gmail.com are

queue-5.4/revert-drm-amd-display-fix-memory-leaks-in-s3-resume.patch


Re: radeon kernel driver not suppressing ACPI_VIDEO_NOTIFY_PROBE events when it should

2021-01-06 Thread Alex Deucher
On Wed, Jan 6, 2021 at 1:10 PM Hans de Goede  wrote:
>
> Hi,
>
> On 1/6/21 6:07 PM, Alex Deucher wrote:
> > On Wed, Jan 6, 2021 at 11:25 AM Hans de Goede  wrote:
> >>
> >> Hi All,
> >>
> >> I get Cc-ed on all Fedora kernel bugs and this one stood out to me:
> >>
> >> https://bugzilla.redhat.com/show_bug.cgi?id=1911763
> >>
> >> Since I've done a lot of work on the acpi-video code I thought I should
> >> take a look. I've managed to help the user with a kernel-commandline
> >> option which stops video.ko (the acpi-video kernel module) from emitting
> >> key-press events for ACPI_VIDEO_NOTIFY_PROBE events.
> >>
> >> This is on a Dell Vostro laptop with i915/radeon hybrid gfx.
> >>
> >> I was thinking about adding a DMI quirk for this, but from the brief time
> >> that I worked on nouveau (and specifically hybrid gfx setups) I know that
> >> these events get fired on hybrid gfx setups when the discrete GPU is
> >> powered down and something happens which requires the discrete GPUs drivers
> >> attention, like an external monitor being plugged into a connector handled
> >> by the dGPU (note that is not the case here).
> >>
> >> So I took a quick look at the radeon code and the radeon_atif_handler()
> >> function from drivers/gpu/drm/radeon/radeon_acpi.c. When successful that
> >> returns NOTIFY_BAD which suppresses the key-press.
> >>
> >> But in various cases it returns NOTIFY_DONE instead which does not
> >> suppress the key-press event. So I think that the spurious key-press events
> >> which the user is seeing should be avoided by this function returning
> >> NOTIFY_BAD.
> >>
> >> Specifically I'm wondering if we should not return
> >> NOTIFY_BAD when count == 0?   I guess this can cause problems if there
> >> are multiple GPUs, but we could check if the acpi-event is for the
> >> pci-device the radeon driver is bound to. This would require changing the
> >> acpi-notify code to also pass the acpi_device pointer as part of the
> >> acpi_bus_event but that should not be a problem.
> >>
> >
> > For A+A PX/HG systems, we'd want the notifications for both the dGPU
> > and the APU since some of the events are relevant to one or the other.
> > ATIF_DGPU_DISPLAY_EVENT is only relevant to the dGPU, while
> > ATIF_PANEL_BRIGHTNESS_CHANGE_REQUEST would be possibly relevant to
> > both (if there was a mux), but mainly the APU.
> > ATIF_SYSTEM_POWER_SOURCE_CHANGE_REQUEST would be relevant to both.
> > The other events have extended bits to determine which GPU the event
> > is targeted at.
>
> Right, but AFAIK on hybrid systems there are 2 ACPI video-bus devices,
> one for each of the iGPU and dGPU which is why I suggested passing
> the video-bus acpi_device as extra data in acpi_bus_event and then
> radeon_atif_handler() could check if the acpi_device is the companion
> device of the GPU. This assumes that events for GPU# will also
> originate from (through an ACPI ASL notify call) the ACPI video-bus
> which belongs to that GPU.

That's not the case.  For PX/HG systems, ATIF is in the iGPU's
namespace; on dGPU-only systems, ATIF is in the dGPU's namespace.

Alex

>
> This all assumes though that the problem is that radeon_atif_handler()
> does not return NOTIFY_BAD when the event count is 0 (in other words
> a spurious event). It is also possible that one of the earlier checks in
> radeon_atif_handler() is failing...
>
> I guess that a first step in debugging this would be to ask the reporter
> to run a kernel with some debugging printk-s added to radeon_atif_handler(),
> to see which code-path in radeon_atif_handler we end up in
> (assuming that radeon_atif_handler() gets called at all).
>
> Any suggestions for other debugging printk-s, before I prepare a Fedora
> kernel for the reporter to test?
>
> Regards,
>
> Hans
>


Re: [PATCH 1/3] drm/amdgpu: Add new mode 2560x1440

2021-01-06 Thread Alex Deucher
On Tue, Jan 5, 2021 at 8:05 PM Emily.Deng  wrote:
>
> Add one more 2k resolution which appears frequently in the market.
>
> Signed-off-by: Emily.Deng 

Acked-by: Alex Deucher 

> ---
>  drivers/gpu/drm/amd/amdgpu/dce_virtual.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/dce_virtual.c 
> b/drivers/gpu/drm/amd/amdgpu/dce_virtual.c
> index ffcc64ec6473..9810af712cc0 100644
> --- a/drivers/gpu/drm/amd/amdgpu/dce_virtual.c
> +++ b/drivers/gpu/drm/amd/amdgpu/dce_virtual.c
> @@ -294,7 +294,7 @@ static int dce_virtual_get_modes(struct drm_connector 
> *connector)
> static const struct mode_size {
> int w;
> int h;
> -   } common_modes[21] = {
> +   } common_modes[] = {
> { 640,  480},
> { 720,  480},
> { 800,  600},
> @@ -312,13 +312,14 @@ static int dce_virtual_get_modes(struct drm_connector 
> *connector)
> {1600, 1200},
> {1920, 1080},
> {1920, 1200},
> +   {2560, 1440},
> {4096, 3112},
> {3656, 2664},
> {3840, 2160},
> {4096, 2160},
> };
>
> -   for (i = 0; i < 21; i++) {
> +   for (i = 0; i < ARRAY_SIZE(common_modes); i++) {
> mode = drm_cvt_mode(dev, common_modes[i].w, 
> common_modes[i].h, 60, false, false, false);
> drm_mode_probed_add(connector, mode);
> }
> --
> 2.25.1
>


Re: [PATCH 2/3] drm/amdgpu: Correct the read sclk for navi10

2021-01-06 Thread Alex Deucher
On Tue, Jan 5, 2021 at 8:05 PM Emily.Deng  wrote:
>
> According to hw, after navi10, it runs in dfll mode, and sclk should
> be read from AverageGfxclkFrequency.
>
> Signed-off-by: Emily.Deng 

Acked-by: Alex Deucher 

> ---
>  drivers/gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c 
> b/drivers/gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c
> index 51e83123f72a..7ebf9588983f 100644
> --- a/drivers/gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c
> +++ b/drivers/gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c
> @@ -1673,7 +1673,7 @@ static int navi10_read_sensor(struct smu_context *smu,
> *size = 4;
> break;
> case AMDGPU_PP_SENSOR_GFX_SCLK:
> -   ret = navi10_get_current_clk_freq_by_table(smu, SMU_GFXCLK, 
> (uint32_t *)data);
> +   ret = navi10_get_smu_metrics_data(smu, 
> METRICS_AVERAGE_GFXCLK, (uint32_t *)data);
> *(uint32_t *)data *= 100;
> *size = 4;
> break;
> --
> 2.25.1
>


Re: [PATCH v3 3/3] drm/amd/display: Skip modeset for front porch change

2021-01-06 Thread Kazlauskas, Nicholas

On 2021-01-04 4:08 p.m., Aurabindo Pillai wrote:

[Why&How]
In order to enable freesync video mode, the driver adds extra
modes based on preferred modes for common freesync frame rates.
When committing these mode changes, a full modeset is not needed.
If the change is only in the front porch timing value, skip the full
modeset and continue using the same stream.
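
The core of that test — all timings equal except the vertical front porch — can be sketched roughly as follows; a simplified illustration, not the exact helper this patch adds:

/* Rough sketch: timings match except possibly the vertical front porch
 * (vsync_start/vtotal may shift together; sync width and back porch
 * stay fixed).  Not the in-tree implementation. */
static bool only_v_front_porch_differs(const struct drm_display_mode *a,
				       const struct drm_display_mode *b)
{
	return a->clock == b->clock &&
	       a->hdisplay == b->hdisplay &&
	       a->hsync_start == b->hsync_start &&
	       a->hsync_end == b->hsync_end &&
	       a->htotal == b->htotal &&
	       a->vdisplay == b->vdisplay &&
	       (a->vsync_end - a->vsync_start) ==
			(b->vsync_end - b->vsync_start) &&
	       (a->vtotal - a->vsync_end) == (b->vtotal - b->vsync_end);
}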

Signed-off-by: Aurabindo Pillai 
Acked-by: Christian König 
---
  .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 219 +++---
  .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h |   1 +
  2 files changed, 188 insertions(+), 32 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c 
b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index aaef2fb528fd..315756207f0f 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -213,6 +213,9 @@ static bool amdgpu_dm_psr_disable_all(struct 
amdgpu_display_manager *dm);
  static const struct drm_format_info *
  amd_get_format_info(const struct drm_mode_fb_cmd2 *cmd);
  
+static bool
+is_timing_unchanged_for_freesync(struct drm_crtc_state *old_crtc_state,
+				 struct drm_crtc_state *new_crtc_state);
  /*
   * dm_vblank_get_counter
   *
@@ -4940,7 +4943,8 @@ static void fill_stream_properties_from_drm_display_mode(
const struct drm_connector *connector,
const struct drm_connector_state *connector_state,
const struct dc_stream_state *old_stream,
-   int requested_bpc)
+   int requested_bpc,
+   bool is_in_modeset)
  {
struct dc_crtc_timing *timing_out = &stream->timing;
const struct drm_display_info *info = &connector->display_info;
@@ -4995,19 +4999,28 @@ static void 
fill_stream_properties_from_drm_display_mode(
timing_out->hdmi_vic = hv_frame.vic;
}
  
-	timing_out->h_addressable = mode_in->crtc_hdisplay;
-	timing_out->h_total = mode_in->crtc_htotal;
-   timing_out->h_sync_width =
-   mode_in->crtc_hsync_end - mode_in->crtc_hsync_start;
-   timing_out->h_front_porch =
-   mode_in->crtc_hsync_start - mode_in->crtc_hdisplay;
-   timing_out->v_total = mode_in->crtc_vtotal;
-   timing_out->v_addressable = mode_in->crtc_vdisplay;
-   timing_out->v_front_porch =
-   mode_in->crtc_vsync_start - mode_in->crtc_vdisplay;
-   timing_out->v_sync_width =
-   mode_in->crtc_vsync_end - mode_in->crtc_vsync_start;
-   timing_out->pix_clk_100hz = mode_in->crtc_clock * 10;
+   if (is_in_modeset) {
+   timing_out->h_addressable = mode_in->hdisplay;
+   timing_out->h_total = mode_in->htotal;
+   timing_out->h_sync_width = mode_in->hsync_end - 
mode_in->hsync_start;
+   timing_out->h_front_porch = mode_in->hsync_start - 
mode_in->hdisplay;
+   timing_out->v_total = mode_in->vtotal;
+   timing_out->v_addressable = mode_in->vdisplay;
+   timing_out->v_front_porch = mode_in->vsync_start - 
mode_in->vdisplay;
+   timing_out->v_sync_width = mode_in->vsync_end - 
mode_in->vsync_start;
+   timing_out->pix_clk_100hz = mode_in->clock * 10;
+   } else {
+   timing_out->h_addressable = mode_in->crtc_hdisplay;
+   timing_out->h_total = mode_in->crtc_htotal;
+   timing_out->h_sync_width = mode_in->crtc_hsync_end - 
mode_in->crtc_hsync_start;
+   timing_out->h_front_porch = mode_in->crtc_hsync_start - 
mode_in->crtc_hdisplay;
+   timing_out->v_total = mode_in->crtc_vtotal;
+   timing_out->v_addressable = mode_in->crtc_vdisplay;
+   timing_out->v_front_porch = mode_in->crtc_vsync_start - 
mode_in->crtc_vdisplay;
+   timing_out->v_sync_width = mode_in->crtc_vsync_end - 
mode_in->crtc_vsync_start;
+   timing_out->pix_clk_100hz = mode_in->crtc_clock * 10;
+   }
+
timing_out->aspect_ratio = get_aspect_ratio(mode_in);
  
  	stream->output_color_space = get_output_color_space(timing_out);

@@ -5227,6 +5240,33 @@ get_highest_refresh_rate_mode(struct amdgpu_dm_connector 
*aconnector,
return m_pref;
  }
  
+static bool is_freesync_video_mode(struct drm_display_mode *mode,
+				   struct amdgpu_dm_connector *aconnector)
+{
+   struct drm_display_mode *high_mode;
+   int timing_diff;
+
+   high_mode = get_highest_refresh_rate_mode(aconnector, false);
+   if (!high_mode || !mode)
+   return false;
+
+   timing_diff = high_mode->vtotal - mode->vtotal;
+
+   if (high_mode->clock == 0 || high_mode->clock != mode->clock ||
+   high_mode->hdisplay != mode->hdisplay ||
+   high_mode->vdisplay != mode->vdisplay ||
+   high_mode->hsync_start != mode->hsync_start ||
+   high_mode->hsync_end != mode->hsync_end ||
+   high_mode->htotal != mode->htotal ||
+

Re: Couple of issues with amdgpu on my WX4100

2021-01-06 Thread Maxim Levitsky
On Mon, 2021-01-04 at 09:45 -0700, Alex Williamson wrote:
> On Mon, 4 Jan 2021 12:34:34 +0100
> Christian König  wrote:
> 
> > Hi Maxim,
> > 
> > I can't help with the display related stuff. Probably best approach to 
> > get this fixes would be to open up a bug tracker for this on FDO.
> > 
> > But I'm the one who implemented the resizeable BAR support and your 
> > analysis of the problem sounds about correct to me.
> > 
> > The reason why this works on Linux is most likely because we restore the 
> > BAR size on resume (and maybe during initial boot as well).
> > 
> > See this patch for reference:
> > 
> > commit d3252ace0bc652a1a24446b6a549f969bf99
> > Author: Christian König 
> > Date:   Fri Jun 29 19:54:55 2018 -0500
> > 
> >  PCI: Restore resized BAR state on resume
> > 
> >  Resize BARs after resume to the expected size again.
> > 
> >  BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=199959
> >  Fixes: d6895ad39f3b ("drm/amdgpu: resize VRAM BAR for CPU access v6")
> >  Fixes: 276b738deb5b ("PCI: Add resizable BAR infrastructure")
> >  Signed-off-by: Christian König 
> >  Signed-off-by: Bjorn Helgaas 
> >  CC: sta...@vger.kernel.org  # v4.15+
> > 
Hi!
Thanks for the feedback!
 
So I went over the QEMU code, and QEMU (as opposed to the kernel,
where I tried to hide PCI_EXT_CAP_ID_REBAR) does indeed hide this
PCI capability from the guest.
 
However, exactly as Alex mentioned, the kernel does indeed restore
the REBAR state, and even with that code patched out I found that the
REBAR state persists across the reset that the vendor_reset module 
does (BACO, I think).
 
Therefore the Linux guest sees the full 4G BAR and happily uses it, 
while the Windows guest's driver apparently has a bug when the BAR
is that large.
 
I patched amdgpu to resize the BAR to various other sizes, and
the Windows driver apparently works with up to a 2GB BAR.
 
So, pretty much: other than a bug in the Windows driver, and the fact 
that VFIO doesn't support resizable BARs, there is nothing wrong here.
 
Since my system does support above 4G decoding and I do have a nice
vfio friendly device that does support a resizable bar, I do volunteer
to add support for this to VFIO as time and resources permit.
 
Also, it would be nice if it were possible to make amdgpu 
(or the whole system) optionally avoid resizing BARs when a 
kernel command line / module parameter is given, or, even better, 
have amdgpu resize the BAR back to its original size when it is 
unloaded, which IMHO is the best solution for this problem.
 
I think I can prepare a patch to make amdgpu restore 
the bar size on unload if you think that
this is the right solution.
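
A rough sketch of that unload-time restore, assuming the boot-time size was recorded at probe — the helper and its parameters are hypothetical; pci_resize_resource() is the existing kernel API:

#include <linux/pci.h>

/* Hypothetical teardown helper: shrink the BAR back to the size seen at
 * boot.  "boot_size" uses the REBAR encoding (log2(bytes) - 20), the
 * same unit pci_resize_resource() expects. */
static void gpu_restore_boot_bar_size(struct pci_dev *pdev, int bar,
				      int boot_size)
{
	int r = pci_resize_resource(pdev, bar, boot_size);

	if (r)
		pci_warn(pdev, "failed to restore BAR %d size: %d\n", bar, r);
}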

> > 
> > It should be trivial to add this to the reset module as well. Most 
> > likely even completely vendor independent since I'm not sure what a bus 
> > reset will do to this configuration and restoring it all the time should 
> > be the most defensive approach.

> 
> Hmm, this should already be used by the bus/slot reset path:
> 
> pci_bus_restore_locked()/pci_slot_restore_locked()
>  pci_dev_restore()
>   pci_restore_state()
>pci_restore_rebar_state()
> 
> VFIO support for resizeable BARs has been on my todo list, but I don't
> have access to any systems that have both a capable device and >4G
> decoding enabled in the BIOS.  If we have a consistent view of the BAR
> size after the BARs are expanded, I'm not sure why it doesn't just
> work.  FWIW, QEMU currently hides the REBAR capability to the guest
> because the kernel driver doesn't support emulation through config
> space (ie. it's read-only, which the spec doesn't support).
> 
> AIUI, resource allocation can fail when enabling REBAR support, which
> is a problem if the failure occurs on the host but not the guest since
> we have no means via the hardware protocol to expose such a condition.
> Therefore the model I was considering for vfio-pci would be to simply
> pre-enable REBAR at the max size.  It might be sufficiently safe to
> test BAR expansion on initialization and then allow user control, but
> I'm concerned that resource availability could change while already in
> use by the user.  Thanks,

As mentioned in other replies in this thread, and as was my first
thought about this, this will indeed break on devices which
don't accurately report the maximum BAR size that they actually need.
Even the spec itself says that it is vendor specific to determine the
optimal BAR size.

We could also allow the guest to resize the BAR and, if that fails,
expose the error via a virtual AER message on the root port
where the device is attached?

I personally don't know if this is possible/worth it.


Best regards,
Maxim Levitsky

> 
> Alex




Re: radeon kernel driver not suppressing ACPI_VIDEO_NOTIFY_PROBE events when it should

2021-01-06 Thread Hans de Goede
Hi,

On 1/6/21 6:07 PM, Alex Deucher wrote:
> On Wed, Jan 6, 2021 at 11:25 AM Hans de Goede  wrote:
>>
>> Hi All,
>>
>> I get Cc-ed on all Fedora kernel bugs and this one stood out to me:
>>
>> https://bugzilla.redhat.com/show_bug.cgi?id=1911763
>>
>> Since I've done a lot of work on the acpi-video code I thought I should
>> take a look. I've managed to help the user with a kernel-commandline
>> option which stops video.ko (the acpi-video kernel module) from emitting
>> key-press events for ACPI_VIDEO_NOTIFY_PROBE events.
>>
>> This is on a Dell Vostro laptop with i915/radeon hybrid gfx.
>>
>> I was thinking about adding a DMI quirk for this, but from the brief time
>> that I worked on nouveau (and specifically hybrid gfx setups) I know that
>> these events get fired on hybrid gfx setups when the discrete GPU is
>> powered down and something happens which requires the discrete GPUs drivers
>> attention, like an external monitor being plugged into a connector handled
>> by the dGPU (note that is not the case here).
>>
>> So I took a quick look at the radeon code and the radeon_atif_handler()
>> function from drivers/gpu/drm/radeon/radeon_acpi.c. When successful that
>> returns NOTIFY_BAD which suppresses the key-press.
>>
>> But in various cases it returns NOTIFY_DONE instead which does not
>> suppress the key-press event. So I think that the spurious key-press events
>> which the user is seeing should be avoided by this function returning
>> NOTIFY_BAD.
>>
>> Specifically I'm wondering if we should not return
>> NOTIFY_BAD when count == 0?   I guess this can cause problems if there
>> are multiple GPUs, but we could check if the acpi-event is for the
>> pci-device the radeon driver is bound to. This would require changing the
>> acpi-notify code to also pass the acpi_device pointer as part of the
>> acpi_bus_event but that should not be a problem.
>>
> 
> For A+A PX/HG systems, we'd want the notifications for both the dGPU
> and the APU since some of the events are relevant to one or the other.
> ATIF_DGPU_DISPLAY_EVENT is only relevant to the dGPU, while
> ATIF_PANEL_BRIGHTNESS_CHANGE_REQUEST would be possibly relevant to
> both (if there was a mux), but mainly the APU.
> ATIF_SYSTEM_POWER_SOURCE_CHANGE_REQUEST would be relevant to both.
> The other events have extended bits to determine which GPU the event
> is targeted at.

Right, but AFAIK on hybrid systems there are 2 ACPI video-bus devices,
one for each of the iGPU and dGPU which is why I suggested passing 
the video-bus acpi_device as extra data in acpi_bus_event and then
radeon_atif_handler() could check if the acpi_device is the companion
device of the GPU. This assumes that events for GPU# will also
originate from (through an ACPI ASL notify call) the ACPI video-bus
which belongs to that GPU.

This all assumes though that the problem is that radeon_atif_handler() 
does not return NOTIFY_BAD when the event count is 0 (in other words
a spurious event). It is also possible that one of the earlier checks in
radeon_atif_handler() is failing...

I guess that a first step in debugging this would be to ask the reporter
to run a kernel with some debugging printk-s added to radeon_atif_handler(),
to see which code-path in radeon_atif_handler we end up in
(assuming that radeon_atif_handler() gets called at all).

Any suggestions for other debugging printk-s, before I prepare a Fedora
kernel for the reporter to test?

Regards,

Hans



Re: radeon kernel driver not suppressing ACPI_VIDEO_NOTIFY_PROBE events when it should

2021-01-06 Thread Hans de Goede
Hi,

On 1/6/21 8:33 PM, Alex Deucher wrote:
> On Wed, Jan 6, 2021 at 1:10 PM Hans de Goede  wrote:
>>
>> Hi,
>>
>> On 1/6/21 6:07 PM, Alex Deucher wrote:
>>> On Wed, Jan 6, 2021 at 11:25 AM Hans de Goede  wrote:

 Hi All,

 I get Cc-ed on all Fedora kernel bugs and this one stood out to me:

 https://bugzilla.redhat.com/show_bug.cgi?id=1911763

 Since I've done a lot of work on the acpi-video code I thought I should
 take a look. I've managed to help the user with a kernel-commandline
 option which stops video.ko (the acpi-video kernel module) from emitting
 key-press events for ACPI_VIDEO_NOTIFY_PROBE events.

 This is on a Dell Vostro laptop with i915/radeon hybrid gfx.

 I was thinking about adding a DMI quirk for this, but from the brief time
 that I worked on nouveau (and specifically hybrid gfx setups) I know that
 these events get fired on hybrid gfx setups when the discrete GPU is
 powered down and something happens which requires the discrete GPUs drivers
 attention, like an external monitor being plugged into a connector handled
 by the dGPU (note that is not the case here).

 So I took a quick look at the radeon code and the radeon_atif_handler()
 function from drivers/gpu/drm/radeon/radeon_acpi.c. When successful that
 returns NOTIFY_BAD which suppresses the key-press.

 But in various cases it returns NOTIFY_DONE instead which does not
 suppress the key-press event. So I think that the spurious key-press events
 which the user is seeing should be avoided by this function returning
 NOTIFY_BAD.

 Specifically I'm wondering if we should not return
 NOTIFY_BAD when count == 0?   I guess this can cause problems if there
 are multiple GPUs, but we could check if the acpi-event is for the
 pci-device the radeon driver is bound to. This would require changing the
 acpi-notify code to also pass the acpi_device pointer as part of the
 acpi_bus_event but that should not be a problem.

>>>
>>> For A+A PX/HG systems, we'd want the notifications for both the dGPU
>>> and the APU since some of the events are relevant to one or the other.
>>> ATIF_DGPU_DISPLAY_EVENT is only relevant to the dGPU, while
>>> ATIF_PANEL_BRIGHTNESS_CHANGE_REQUEST would be possibly relevant to
>>> both (if there was a mux), but mainly the APU.
>>> ATIF_SYSTEM_POWER_SOURCE_CHANGE_REQUEST would be relevant to both.
>>> The other events have extended bits to determine which GPU the event
>>> is targeted at.
>>
>> Right, but AFAIK on hybrid systems there are 2 ACPI video-bus devices,
>> one for each of the iGPU and dGPU which is why I suggested passing
>> the video-bus acpi_device as extra data in acpi_bus_event and then
>> radeon_atif_handler() could check if the acpi_device is the companion
>> device of the GPU. This assumes that events for GPU# will also
>> originate from (through an ACPI ASL notify call) the ACPI video-bus
>> which belongs to that GPU.
> 
> That's not the case.  For PX/HG systems, ATIF is in the iGPU's
> namespace, on dGPU only systems, ATIF is in the dGPU's namespace.

That assumes an AMD iGPU + AMD dGPU, I believe?  The system on
which the spurious ACPI_VIDEO_NOTIFY_PROBE events lead to spurious
KEY_SWITCHVIDEOMODE key-presses being reported uses an Intel iGPU
with an AMD dGPU. I don't have any hybrid gfx systems available for
testing atm, but I believe that in this case there will be 2 ACPI
video-busses, one for each GPU.

Note I'm not saying that that means that checking the originating
ACPI device is the companion of the GPUs PCI-device is the solution
here. But so far all I've heard from you is that that is not the
solution, without you offering any alternative ideas / possible
solutions to try for filtering out these spurious key-presses.

Regards,

Hans



Re: radeon kernel driver not suppressing ACPI_VIDEO_NOTIFY_PROBE events when it should

2021-01-06 Thread Alex Deucher
On Wed, Jan 6, 2021 at 3:04 PM Hans de Goede  wrote:
>
> Hi,
>
> On 1/6/21 8:33 PM, Alex Deucher wrote:
> > On Wed, Jan 6, 2021 at 1:10 PM Hans de Goede  wrote:
> >>
> >> Hi,
> >>
> >> On 1/6/21 6:07 PM, Alex Deucher wrote:
> >>> On Wed, Jan 6, 2021 at 11:25 AM Hans de Goede  wrote:
> 
>  Hi All,
> 
>  I get Cc-ed on all Fedora kernel bugs and this one stood out to me:
> 
>  https://bugzilla.redhat.com/show_bug.cgi?id=1911763
> 
>  Since I've done a lot of work on the acpi-video code I thought I should
>  take a look. I've managed to help the user with a kernel-commandline
>  option which stops video.ko (the acpi-video kernel module) from emitting
>  key-press events for ACPI_VIDEO_NOTIFY_PROBE events.
> 
>  This is on a Dell Vostro laptop with i915/radeon hybrid gfx.
> 
>  I was thinking about adding a DMI quirk for this, but from the brief time
>  that I worked on nouveau (and specifically hybrid gfx setups) I know that
>  these events get fired on hybrid gfx setups when the discrete GPU is
>  powered down and something happens which requires the discrete GPUs 
>  drivers
>  attention, like an external monitor being plugged into a connector 
>  handled
>  by the dGPU (note that is not the case here).
> 
>  So I took a quick look at the radeon code and the radeon_atif_handler()
>  function from drivers/gpu/drm/radeon/radeon_acpi.c. When successful that
>  returns NOTIFY_BAD which suppresses the key-press.
> 
>  But in various cases it returns NOTIFY_DONE instead which does not
>  suppress the key-press event. So I think that the spurious key-press 
>  events
>  which the user is seeing should be avoided by this function returning
>  NOTIFY_BAD.
> 
>  Specifically I'm wondering if we should not return
>  NOTIFY_BAD when count == 0?   I guess this can cause problems if there
>  are multiple GPUs, but we could check if the acpi-event is for the
>  pci-device the radeon driver is bound to. This would require changing the
>  acpi-notify code to also pass the acpi_device pointer as part of the
>  acpi_bus_event but that should not be a problem.
> 
> >>>
> >>> For A+A PX/HG systems, we'd want the notifications for both the dGPU
> >>> and the APU since some of the events are relevant to one or the other.
> >>> ATIF_DGPU_DISPLAY_EVENT is only relevant to the dGPU, while
> >>> ATIF_PANEL_BRIGHTNESS_CHANGE_REQUEST would be possibly relevant to
> >>> both (if there was a mux), but mainly the APU.
> >>> ATIF_SYSTEM_POWER_SOURCE_CHANGE_REQUEST would be relevant to both.
> >>> The other events have extended bits to determine which GPU the event
> >>> is targeted at.
> >>
> >> Right, but AFAIK on hybrid systems there are 2 ACPI video-bus devices,
> >> one for each of the iGPU and dGPU which is why I suggested passing
> >> the video-bus acpi_device as extra data in acpi_bus_event and then
> >> radeon_atif_handler() could check if the acpi_device is the companion
> >> device of the GPU. This assumes that events for GPU# will also
> >> originate from (through an ACPI ASL notify call) the ACPI video-bus
> >> which belongs to that GPU.
> >
> > That's not the case.  For PX/HG systems, ATIF is in the iGPU's
> > namespace, on dGPU only systems, ATIF is in the dGPU's namespace.
>
> That assumes an AMD iGPU + AMD dGPU, I believe?  The system on
> which the spurious ACPI_VIDEO_NOTIFY_PROBE events lead to spurious
> KEY_SWITCHVIDEOMODE key-presses being reported uses an Intel iGPU
> with an AMD dGPU. I don't have any hybrid gfx systems available for
> testing atm, but I believe that in this case there will be 2 ACPI
> video-busses, one for each GPU.

I think the ATIF method will be on the iGPU regardless of whether it's
intel or AMD.

>
> Note I'm not saying that that means that checking the originating
> ACPI device is the companion of the GPUs PCI-device is the solution
> here. But so far all I've heard from you is that that is not the
> solution, without you offering any alternative ideas / possible
> solutions to try for filtering out these spurious key-presses.

Sorry, I'm not really an ACPI expert.  I think returning NOTIFY_BAD is
fine for this specific case, but I don't know if it will break other
platforms.  That said, I don't recall seeing any other similar bugs,
so maybe this is something specific to this particular laptop.

Alex


Re: Couple of issues with amdgpu on my WX4100

2021-01-06 Thread Maxim Levitsky
On Mon, 2021-01-04 at 12:34 +0100, Christian König wrote:
> Hi Maxim,
> 
> I can't help with the display related stuff. Probably best approach to get 
> this fixes would be to open up a bug tracker for this on FDO.

Done, bugs are opened
https://gitlab.freedesktop.org/drm/amd/-/issues/1429
https://gitlab.freedesktop.org/drm/amd/-/issues/1430

About the EDID issue, there do seem to be a few open bugs about it,
but what differs in my case, I think, is that the EDID failure happens
only once in a while, rather than always, and it seems to bring
the whole device down.

Best regards,
Maxim Levitsky



[pull] amdgpu drm-fixes-5.11

2021-01-06 Thread Alex Deucher
Hi Dave, Daniel,

New URL.  FDO ran out of disk space, so I'm attempting to move to gitlab.
Let me know if you run into any issues.

Thanks

The following changes since commit 5b2fc08c455bbf749489254a81baeffdf4c0a693:

  Merge tag 'amd-drm-fixes-5.11-2020-12-23' of 
git://people.freedesktop.org/~agd5f/linux into drm-next (2020-12-24 10:31:16 
+1000)

are available in the Git repository at:

  https://gitlab.freedesktop.org/agd5f/linux.git 
tags/amd-drm-fixes-5.11-2021-01-06

for you to fetch changes up to 5efc1f4b454c6179d35e7b0c3eda0ad5763a00fc:

  Revert "drm/amd/display: Fix memory leaks in S3 resume" (2021-01-06 16:25:06 
-0500)


amd-drm-fixes-5.11-2021-01-06:

amdgpu:
- Telemetry fix for VGH
- Powerplay fixes for RV
- Powerplay fixes for RN
- RAS fixes for Sienna Cichlid
- Blank screen regression fix
- Drop DCN support for aarch64
- Misc other fixes


Alex Deucher (2):
  drm/amdgpu/display: drop DCN support for aarch64
  Revert "drm/amd/display: Fix memory leaks in S3 resume"

Arnd Bergmann (1):
  drm/amd/display: Fix unused variable warning

Dennis Li (3):
  drm/amdgpu: fix a memory protection fault when remove amdgpu device
  drm/amdgpu: fix a GPU hang issue when remove device
  drm/amdgpu: fix no bad_pages issue after umc ue injection

Hawking Zhang (1):
  drm/amdgpu: switched to cached noretry setting for vangogh

Jiawei Gu (1):
  drm/amdgpu: fix potential memory leak during navi12 deinitialization

John Clements (2):
  drm/amd/pm: updated PM to I2C controller port on sienna cichlid
  drm/amdgpu: enable ras eeprom support for sienna cichlid

Kevin Wang (1):
  drm/amd/display: fix sysfs amdgpu_current_backlight_pwm NULL pointer issue

Xiaojian Du (4):
  drm/amd/pm: correct the sensor value of power for vangogh
  drm/amd/pm: improve the fine grain tuning function for RV/RV2/PCO
  drm/amd/pm: fix the failure when change power profile for renoir
  drm/amd/pm: improve the fine grain tuning function for RV/RV2/PCO

 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |   4 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c|  25 +++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c|   8 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c |   8 +-
 drivers/gpu/drm/amd/amdgpu/mmhub_v2_3.c|   2 +-
 drivers/gpu/drm/amd/display/Kconfig|   2 +-
 drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c  |   7 +-
 .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_crc.h  |   2 +-
 drivers/gpu/drm/amd/display/dc/calcs/Makefile  |   4 -
 drivers/gpu/drm/amd/display/dc/clk_mgr/Makefile|  21 ---
 drivers/gpu/drm/amd/display/dc/core/dc_link.c  |   7 +-
 drivers/gpu/drm/amd/display/dc/dcn10/Makefile  |   7 -
 .../gpu/drm/amd/display/dc/dcn10/dcn10_resource.c  |   7 -
 drivers/gpu/drm/amd/display/dc/dcn20/Makefile  |   4 -
 drivers/gpu/drm/amd/display/dc/dcn21/Makefile  |   4 -
 drivers/gpu/drm/amd/display/dc/dcn30/Makefile  |   5 -
 drivers/gpu/drm/amd/display/dc/dcn301/Makefile |   4 -
 drivers/gpu/drm/amd/display/dc/dcn302/Makefile |   4 -
 drivers/gpu/drm/amd/display/dc/dml/Makefile|   4 -
 drivers/gpu/drm/amd/display/dc/dsc/Makefile|   4 -
 drivers/gpu/drm/amd/display/dc/os_types.h  |   4 -
 .../gpu/drm/amd/pm/powerplay/hwmgr/smu10_hwmgr.c   | 166 +++--
 .../gpu/drm/amd/pm/powerplay/hwmgr/smu10_hwmgr.h   |   3 +
 .../drm/amd/pm/swsmu/smu11/sienna_cichlid_ppt.c|   2 +-
 drivers/gpu/drm/amd/pm/swsmu/smu11/vangogh_ppt.c   |   3 +-
 drivers/gpu/drm/amd/pm/swsmu/smu12/renoir_ppt.c|   1 +
 drivers/gpu/drm/amd/pm/swsmu/smu12/smu_v12_0.c |   1 +
 27 files changed, 200 insertions(+), 113 deletions(-)


[PATCH] drm/amd/pm: add swsmu init documentation

2021-01-06 Thread Ryan Taylor
Documents functions used in swsmu initialization.

Signed-off-by: Ryan Taylor 
Reviewed-by: Alex Deucher 
---
 drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c | 94 ++-
 1 file changed, 93 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c 
b/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
index d80f7f8efdcd..82099cb3d00a 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
@@ -376,6 +376,15 @@ static int smu_get_driver_allowed_feature_mask(struct 
smu_context *smu)
return ret;
 }
 
+/**
+ * smu_set_funcs - Set ASIC specific SMU communication tools and data.
+ * @adev: amdgpu_device pointer
+ *
+ * Set hooks (&struct pptable_funcs), maps (&struct cmn2asic_mapping) and
+ * basic ASIC information (is_apu, od_enabled, etc.).
+ *
+ * Returns 0 on success, negative error code on failure.
+ */
 static int smu_set_funcs(struct amdgpu_device *adev)
 {
struct smu_context *smu = &adev->smu;
@@ -417,6 +426,15 @@ static int smu_set_funcs(struct amdgpu_device *adev)
return 0;
 }
 
+/**
+ * smu_early_init - Early init for the SMU IP block.
+ * @handle: amdgpu_device pointer
+ *
+ * Perform basic initialization of &struct smu_context. Set ASIC specific SMU
+ * communication tools and data using smu_set_funcs().
+ *
+ * Return: 0 on success, negative error code on failure.
+ */
 static int smu_early_init(void *handle)
 {
struct amdgpu_device *adev = (struct amdgpu_device *)handle;
@@ -424,10 +442,12 @@ static int smu_early_init(void *handle)
 
smu->adev = adev;
smu->pm_enabled = !!amdgpu_dpm;
+   /* Assume ASIC is not an APU until updated in smu_set_funcs(). */
smu->is_apu = false;
mutex_init(&smu->mutex);
mutex_init(&smu->smu_baco.mutex);
smu->smu_baco.state = SMU_BACO_STATE_EXIT;
+   /* Disable baco support until the SMU engine is running. */
smu->smu_baco.platform_support = false;
 
return smu_set_funcs(adev);
@@ -472,6 +492,17 @@ static int smu_set_default_dpm_table(struct smu_context 
*smu)
return ret;
 }
 
+
+/**
+ * smu_late_init - Finish setting up the SMU IP block.
+ * @adev: amdgpu_device pointer
+ *
+ * Setup SMU tables/values used by other driver subsystems and in userspace
+ * (Overdrive, UMD power states, etc.). Perform final SMU configuration (set
+ * performance level, update display config etc.).
+ *
+ * Returns 0 on success, negative error code on failure.
+ */
 static int smu_late_init(void *handle)
 {
struct amdgpu_device *adev = (struct amdgpu_device *)handle;
@@ -514,6 +545,8 @@ static int smu_late_init(void *handle)
 
smu_get_fan_parameters(smu);
 
+   /* Sets performance level, power profile mode and display
+* configuration. */
smu_handle_task(&adev->smu,
smu->smu_dpm.dpm_level,
AMD_PP_TASK_COMPLETE_INIT,
@@ -601,7 +634,7 @@ static int smu_fini_fb_allocations(struct smu_context *smu)
 /**
  * smu_alloc_memory_pool - allocate memory pool in the system memory
  *
- * @smu: amdgpu_device pointer
+ * @smu: smu_context pointer
  *
  * This memory pool will be used for SMC use and msg SetSystemVirtualDramAddr
  * and DramLogSetDramAddr can notify it changed.
@@ -701,6 +734,15 @@ static void smu_free_dummy_read_table(struct smu_context 
*smu)
memset(dummy_read_1_table, 0, sizeof(struct smu_table));
 }
 
+/**
+ * smu_smc_table_sw_init -  Initialize shared driver/SMU communication tools.
+ * @smu: smu_context pointer
+ *
+ * Allocate VRAM/DRAM for shared memory objects (SMU tables, memory pool, 
etc.).
+ * Initialize i2c.
+ *
+ * Returns 0 on success, negative error code on failure.
+ */
 static int smu_smc_table_sw_init(struct smu_context *smu)
 {
int ret;
@@ -799,6 +841,18 @@ static void smu_interrupt_work_fn(struct work_struct *work)
mutex_unlock(&smu->mutex);
 }
 
+/**
+ * smu_sw_init - Software init for the SMU IP block.
+ * @handle: amdgpu_device pointer
+ *
+ * Configure &struct smu_context with boot default performance profiles (power
+ * profile, workload, etc.) and power savings optimizations (powergate
+ * VCN/JPEG). Request the SMU's firmware from the kernel. Initialize features,
+ * locks, and kernel work queues. Initialize driver/SMU communication tools
+ * using smu_smc_table_sw_init(). Register the interrupt handler.
+ *
+ * Returns 0 on success, negative error code on failure.
+ */
 static int smu_sw_init(void *handle)
 {
struct amdgpu_device *adev = (struct amdgpu_device *)handle;
@@ -820,6 +874,7 @@ static int smu_sw_init(void *handle)
INIT_WORK(&smu->interrupt_work, smu_interrupt_work_fn);
atomic64_set(&smu->throttle_int_counter, 0);
smu->watermarks_bitmap = 0;
+
smu->power_profile_mode = PP_SMC_POWER_PROFILE_BOOTUP_DEFAULT;
smu->default_power_profile_mode = PP_SMC_POWER_PROFILE_BOOTUP_DEFAULT;
 
@@ -914,6 +969,18 @@ static int smu_ge

RE: [PATCH] drm/amd/pm: add swsmu init documentation

2021-01-06 Thread Quan, Evan
[AMD Official Use Only - Internal Distribution Only]

Reviewed-by: Evan Quan 

-Original Message-
From: amd-gfx  On Behalf Of Ryan Taylor
Sent: Thursday, January 7, 2021 7:45 AM
To: amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander ; Taylor, Ryan 

Subject: [PATCH] drm/amd/pm: add swsmu init documentation

Documents functions used in swsmu initialization.

Signed-off-by: Ryan Taylor 
Reviewed-by: Alex Deucher 
---
 drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c | 94 ++-
 1 file changed, 93 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c 
b/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
index d80f7f8efdcd..82099cb3d00a 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
@@ -376,6 +376,15 @@ static int smu_get_driver_allowed_feature_mask(struct 
smu_context *smu)
 return ret;
 }

+/**
+ * smu_set_funcs - Set ASIC specific SMU communication tools and data.
+ * @adev: amdgpu_device pointer
+ *
+ * Set hooks (&struct pptable_funcs), maps (&struct cmn2asic_mapping) and
+ * basic ASIC information (is_apu, od_enabled, etc.).
+ *
+ * Returns 0 on success, negative error code on failure.
+ */
 static int smu_set_funcs(struct amdgpu_device *adev)
 {
 struct smu_context *smu = &adev->smu;
@@ -417,6 +426,15 @@ static int smu_set_funcs(struct amdgpu_device *adev)
 return 0;
 }

+/**
+ * smu_early_init - Early init for the SMU IP block.
+ * @handle: amdgpu_device pointer
+ *
+ * Perform basic initialization of &struct smu_context. Set ASIC specific SMU
+ * communication tools and data using smu_set_funcs().
+ *
+ * Return: 0 on success, negative error code on failure.
+ */
 static int smu_early_init(void *handle)
 {
 struct amdgpu_device *adev = (struct amdgpu_device *)handle;
@@ -424,10 +442,12 @@ static int smu_early_init(void *handle)

 smu->adev = adev;
 smu->pm_enabled = !!amdgpu_dpm;
+/* Assume ASIC is not an APU until updated in smu_set_funcs(). */
 smu->is_apu = false;
 mutex_init(&smu->mutex);
 mutex_init(&smu->smu_baco.mutex);
 smu->smu_baco.state = SMU_BACO_STATE_EXIT;
+/* Disable baco support until the SMU engine is running. */
 smu->smu_baco.platform_support = false;

 return smu_set_funcs(adev);
@@ -472,6 +492,17 @@ static int smu_set_default_dpm_table(struct smu_context 
*smu)
 return ret;
 }

+
+/**
+ * smu_late_init - Finish setting up the SMU IP block.
+ * @adev: amdgpu_device pointer
+ *
+ * Setup SMU tables/values used by other driver subsystems and in userspace
+ * (Overdrive, UMD power states, etc.). Perform final SMU configuration (set
+ * performance level, update display config etc.).
+ *
+ * Returns 0 on success, negative error code on failure.
+ */
 static int smu_late_init(void *handle)
 {
 struct amdgpu_device *adev = (struct amdgpu_device *)handle;
@@ -514,6 +545,8 @@ static int smu_late_init(void *handle)

 smu_get_fan_parameters(smu);

+/* Sets performance level, power profile mode and display
+ * configuration. */
 smu_handle_task(&adev->smu,
 smu->smu_dpm.dpm_level,
 AMD_PP_TASK_COMPLETE_INIT,
@@ -601,7 +634,7 @@ static int smu_fini_fb_allocations(struct smu_context *smu)
 /**
  * smu_alloc_memory_pool - allocate memory pool in the system memory
  *
- * @smu: amdgpu_device pointer
+ * @smu: smu_context pointer
  *
  * This memory pool will be used for SMC use and msg SetSystemVirtualDramAddr
  * and DramLogSetDramAddr can notify it changed.
@@ -701,6 +734,15 @@ static void smu_free_dummy_read_table(struct smu_context 
*smu)
 memset(dummy_read_1_table, 0, sizeof(struct smu_table));
 }

+/**
+ * smu_smc_table_sw_init -  Initialize shared driver/SMU communication tools.
+ * @smu: smu_context pointer
+ *
+ * Allocate VRAM/DRAM for shared memory objects (SMU tables, memory pool, 
etc.).
+ * Initialize i2c.
+ *
+ * Returns 0 on success, negative error code on failure.
+ */
 static int smu_smc_table_sw_init(struct smu_context *smu)
 {
 int ret;
@@ -799,6 +841,18 @@ static void smu_interrupt_work_fn(struct work_struct *work)
 mutex_unlock(&smu->mutex);
 }

+/**
+ * smu_sw_init - Software init for the SMU IP block.
+ * @handle: amdgpu_device pointer
+ *
+ * Configure &struct smu_context with boot default performance profiles (power
+ * profile, workload, etc.) and power savings optimizations (powergate
+ * VCN/JPEG). Request the SMU's firmware from the kernel. Initialize features,
+ * locks, and kernel work queues. Initialize driver/SMU communication tools
+ * using smu_smc_table_sw_init(). Register the interrupt handler.
+ *
+ * Returns 0 on success, negative error code on failure.
+ */
 static int smu_sw_init(void *handle)
 {
 struct amdgpu_device *adev = (struct amdgpu_device *)handle;
@@ -820,6 +874,7 @@ static int smu_sw_init(void *handle)
 INIT_WORK(&smu->interrupt_work, smu_interrupt_work_fn);
 atomic64_set(&smu->throttle_int_counter, 0);
 smu->watermarks_bitmap = 0;
+
 smu->power_profile_mode = PP_SMC_POWER_PROFILE_BOOTUP_DEFAULT;
 smu->default_power_profile_mode = PP_SMC_POWER_PRO

RE: [PATCH 3/3] drm/amdgpu:Limit the resolution for virtual_display

2021-01-06 Thread Deng, Emily
[AMD Official Use Only - Internal Distribution Only]

>-Original Message-
>From: Michel Dänzer 
>Sent: Wednesday, January 6, 2021 11:25 PM
>To: Deng, Emily ; Alex Deucher
>
>Cc: amd-gfx list 
>Subject: Re: [PATCH 3/3] drm/amdgpu:Limit the resolution for virtual_display
>
>On 2021-01-06 11:40 a.m., Deng, Emily wrote:
>>> From: Alex Deucher  On Tue, Jan 5, 2021 at
>>> 3:37 AM Emily.Deng  wrote:

 Limit the resolution not bigger than 16384, which means
 dev->mode_info.num_crtc * common_modes[i].w not bigger than 16384.

 Signed-off-by: Emily.Deng 
 ---
   drivers/gpu/drm/amd/amdgpu/dce_virtual.c | 7 +--
   1 file changed, 5 insertions(+), 2 deletions(-)

 diff --git a/drivers/gpu/drm/amd/amdgpu/dce_virtual.c
 b/drivers/gpu/drm/amd/amdgpu/dce_virtual.c
 index 2b16c8faca34..c23d37b02fd7 100644
 --- a/drivers/gpu/drm/amd/amdgpu/dce_virtual.c
 +++ b/drivers/gpu/drm/amd/amdgpu/dce_virtual.c
 @@ -319,6 +319,7 @@ dce_virtual_encoder(struct drm_connector
 *connector)  static int dce_virtual_get_modes(struct drm_connector
 *connector)  {
  struct drm_device *dev = connector->dev;
 +   struct amdgpu_device *adev = dev->dev_private;
  struct drm_display_mode *mode = NULL;
  unsigned i;
  static const struct mode_size { @@ -350,8 +351,10 @@ static
 int dce_virtual_get_modes(struct drm_connector *connector)
  };

  for (i = 0; i < ARRAY_SIZE(common_modes); i++) {
 -   mode = drm_cvt_mode(dev, common_modes[i].w,
>>> common_modes[i].h, 60, false, false, false);
 -   drm_mode_probed_add(connector, mode);
 +   if (adev->mode_info.num_crtc <= 4 ||
 + common_modes[i].w <= 2560) {
>>>
>>> You are also limiting the number of crtcs here.  Intended?  Won't
>>> this break 5 or 6 crtc configs?
>>>
>>> Alex
>> Yes, it is intended. For num_crtc bigger than 4, we don't support
>> resolutions bigger than 2560, because the max supported width is 16384
>> for the xcb protocol.
>
>There's no such limitation with Wayland. I'd recommend against artificially
>imposing limits from X11 to the kernel.
>
>
>(As a side note, the X11 protocol limit should actually be 32768; the
>16384 limit exposed in the RANDR extension comes from the kernel driver,
>specifically drmModeGetResources's max_width/height)
It is our test and debug result that the following variables only have 16
bits, which will limit the resolution to 16384.
glamor_pixmap_from_fd(ScreenPtr screen,
  int fd,
  CARD16 width,
  CARD16 height,
  CARD16 stride, CARD8 depth, CARD8 bpp)
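A minimal sketch (an illustration, not code from the patch) of the width
arithmetic behind the cap, assuming the 16384 limit above and that the widest
common mode is 4096:

/* 4 CRTCs * 4096 = 16384 still fits, so all common modes are allowed for
 * up to 4 CRTCs; 6 CRTCs * 4096 = 24576 would overflow, while
 * 6 CRTCs * 2560 = 15360 fits, hence the 2560 cap above 4 CRTCs. */
static bool virtual_mode_fits(unsigned int num_crtc, unsigned int width)
{
	return num_crtc * width <= 16384;
}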
>
>
>--
>Earthling Michel Dänzer   |
>https://redhat.com/
>Libre software enthusiast | Mesa and X developer
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


RE: [PATCH] drm/amdgpu: For sriov multiple VF, set compute timeout to 10s

2021-01-06 Thread Deng, Emily
[AMD Official Use Only - Internal Distribution Only]

>-Original Message-
>From: Paul Menzel 
>Sent: Wednesday, January 6, 2021 8:54 PM
>To: Deng, Emily 
>Cc: amd-gfx@lists.freedesktop.org
>Subject: Re: [PATCH] drm/amdgpu: For sriov multiple VF, set compute timeout
>to 10s
>
>Dear Emily,
>
>
>On 06.01.21 at 12:41, Emily.Deng wrote:
>
>Could you please remove the dot in your name in your git configuration?
>
> git config --global user.name "Emily Deng"
Ok, will do this.
>
>For the summary, maybe amend it to:
>
> Decrease compute timeout to 10 s for sriov multiple VF
Ok, thanks, good suggestion.
>
>> For multiple VF, after engine hang,as host driver will first
>
>Nit: Please add a space after the comma.
>
>> encounter FLR, so has no meanning to set compute to 60s.
>
>meaning
>
>How can this be tested?
Set up the environment for SR-IOV with multiple VFs.
>
>> Signed-off-by: Emily.Deng 
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 +++-
>>   1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> index b69c34074d8d..ed36bf97df29 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> @@ -3117,8 +3117,10 @@ static int
>amdgpu_device_get_job_timeout_settings(struct amdgpu_device *adev)
>>*/
>>   adev->gfx_timeout = msecs_to_jiffies(10000);
>>   adev->sdma_timeout = adev->video_timeout = adev->gfx_timeout;
>> -if (amdgpu_sriov_vf(adev) || amdgpu_passthrough(adev))
>> +if ((amdgpu_sriov_vf(adev) && amdgpu_sriov_is_pp_one_vf(adev)) ||
>> +amdgpu_passthrough(adev))
>>   adev->compute_timeout =  msecs_to_jiffies(60000);
>> +else if (amdgpu_sriov_vf(adev))
>> +adev->compute_timeout =  msecs_to_jiffies(10000);
>
>Maybe split up the first if condition to group the conditions and not the
>timeout values. At least for me that would be less confusing:
>
> if (amdgpu_sriov_vf(adev))
> adev->compute_timeout = amdgpu_sriov_is_pp_one_vf(adev) ?
>msecs_to_jiffies(60000) : msecs_to_jiffies(10000)
> else if (amdgpu_passthrough(adev))
> adev->compute_timeout =  msecs_to_jiffies(60000);
>
>>   else
>>   adev->compute_timeout = MAX_SCHEDULE_TIMEOUT;
>
Good suggestion, will send out a v2 patch.
>
>Kind regards,
>
>Paul
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


RE: [PATCH 2/3] drm/amdgpu: Correct the read sclk for navi10

2021-01-06 Thread Quan, Evan
[AMD Official Use Only - Internal Distribution Only]

Reviewed-by: Evan Quan 

-Original Message-
From: amd-gfx  On Behalf Of Emily.Deng
Sent: Wednesday, January 6, 2021 9:05 AM
To: amd-gfx@lists.freedesktop.org
Cc: Deng, Emily 
Subject: [PATCH 2/3] drm/amdgpu: Correct the read sclk for navi10

According to hw, from navi10 onward the gfxclk runs in dfll mode, and sclk
should be read from AverageGfxclkFrequency.
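
For illustration only (a hedged sketch, not part of the patch):
navi10_get_smu_metrics_data and METRICS_AVERAGE_GFXCLK are the helpers visible
in the diff below; the output unit after the existing "*= 100" scaling is an
assumption carried over from the current sensor path:

	uint32_t sclk;	/* MHz as reported by the SMU metrics table */

	if (!navi10_get_smu_metrics_data(smu, METRICS_AVERAGE_GFXCLK, &sclk))
		sclk *= 100;	/* same scaling as the existing sensor path */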

Signed-off-by: Emily.Deng 
---
 drivers/gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c 
b/drivers/gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c
index 51e83123f72a..7ebf9588983f 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c
@@ -1673,7 +1673,7 @@ static int navi10_read_sensor(struct smu_context *smu,
 *size = 4;
 break;
 case AMDGPU_PP_SENSOR_GFX_SCLK:
-ret = navi10_get_current_clk_freq_by_table(smu, SMU_GFXCLK, (uint32_t *)data);
+ret = navi10_get_smu_metrics_data(smu, METRICS_AVERAGE_GFXCLK, (uint32_t 
*)data);
 *(uint32_t *)data *= 100;
 *size = 4;
 break;
--
2.25.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH] drm/amdgpu: Decrease compute timeout to 10 s for sriov multiple VF

2021-01-06 Thread Emily Deng
From: "Emily.Deng" 

For multiple VF, after an engine hang the host driver will first
encounter FLR, so it has no meaning to set the compute timeout to 60s.

Signed-off-by: Emily.Deng 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 5527c549db82..ce07b9b975ff 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -3133,7 +3133,10 @@ static int amdgpu_device_get_job_timeout_settings(struct 
amdgpu_device *adev)
 */
adev->gfx_timeout = msecs_to_jiffies(10000);
adev->sdma_timeout = adev->video_timeout = adev->gfx_timeout;
-   if (amdgpu_sriov_vf(adev) || amdgpu_passthrough(adev))
+   if (amdgpu_sriov_vf(adev))
+   adev->compute_timeout = amdgpu_sriov_is_pp_one_vf(adev) ?
+   msecs_to_jiffies(60000) : msecs_to_jiffies(10000)
+   else if (amdgpu_passthrough(adev))
adev->compute_timeout =  msecs_to_jiffies(60000);
else
adev->compute_timeout = MAX_SCHEDULE_TIMEOUT;
-- 
2.25.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH v2] drm/amdgpu: Decrease compute timeout to 10 s for sriov multiple VF

2021-01-06 Thread Emily Deng
From: "Emily.Deng" 

For multiple VF, after an engine hang the host driver will first
encounter FLR, so it has no meaning to set the compute timeout to 60s.

v2:
   Refine the patch and comment

Signed-off-by: Emily.Deng 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 5527c549db82..35edf58c825d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -3133,7 +3133,10 @@ static int amdgpu_device_get_job_timeout_settings(struct 
amdgpu_device *adev)
 */
adev->gfx_timeout = msecs_to_jiffies(10000);
adev->sdma_timeout = adev->video_timeout = adev->gfx_timeout;
-   if (amdgpu_sriov_vf(adev) || amdgpu_passthrough(adev))
+   if (amdgpu_sriov_vf(adev))
+   adev->compute_timeout = amdgpu_sriov_is_pp_one_vf(adev) ?
+   msecs_to_jiffies(60000) : msecs_to_jiffies(10000);
+   else if (amdgpu_passthrough(adev))
adev->compute_timeout =  msecs_to_jiffies(60000);
else
adev->compute_timeout = MAX_SCHEDULE_TIMEOUT;
-- 
2.25.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


RE: [PATCH 2/3] drm/amdgpu: Correct the read sclk for navi10

2021-01-06 Thread Feng, Kenneth
[AMD Official Use Only - Internal Distribution Only]

Hello Emily,
The average clock value is a little different from the 'current clock' value.
May I know the purpose of this patch displaying the average clock? Is there
an issue or a customer requirement?
Thanks.


Best Regards
Kenneth

-Original Message-
From: amd-gfx  On Behalf Of Emily.Deng
Sent: Wednesday, January 6, 2021 9:05 AM
To: amd-gfx@lists.freedesktop.org
Cc: Deng, Emily 
Subject: [PATCH 2/3] drm/amdgpu: Correct the read sclk for navi10

[CAUTION: External Email]

According to hw, from navi10 onward the gfxclk runs in dfll mode, and sclk
should be read from AverageGfxclkFrequency.

Signed-off-by: Emily.Deng 
---
 drivers/gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c 
b/drivers/gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c
index 51e83123f72a..7ebf9588983f 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c
@@ -1673,7 +1673,7 @@ static int navi10_read_sensor(struct smu_context *smu,
*size = 4;
break;
case AMDGPU_PP_SENSOR_GFX_SCLK:
-   ret = navi10_get_current_clk_freq_by_table(smu, SMU_GFXCLK, 
(uint32_t *)data);
+   ret = navi10_get_smu_metrics_data(smu, 
+ METRICS_AVERAGE_GFXCLK, (uint32_t *)data);
*(uint32_t *)data *= 100;
*size = 4;
break;
--
2.25.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH 00/35] Add HMM-based SVM memory manager to KFD

2021-01-06 Thread Felix Kuehling
This is the first version of our HMM based shared virtual memory manager
for KFD. There are still a number of known issues that we're working through
(see below). This will likely lead to some pretty significant changes in
MMU notifier handling and locking on the migration code paths. So don't
get hung up on those details yet.

But I think this is a good time to start getting feedback. We're pretty
confident about the ioctl API, which is both simple and extensible for the
future. (see patches 4,16) The user mode side of the API can be found here:
https://github.com/RadeonOpenCompute/ROCT-Thunk-Interface/blob/fxkamd/hmm-wip/src/svm.c

I'd also like another pair of eyes on how we're interfacing with the GPU VM
code in amdgpu_vm.c (see patches 12,13), retry page fault handling (24,25),
and some retry IRQ handling changes (32).


Known issues:
* won't work with IOMMU enabled, we need to dma_map all pages properly
* still working on some race conditions and random bugs
* performance is not great yet

Alex Sierra (12):
  drm/amdgpu: replace per_device_list by array
  drm/amdkfd: helper to convert gpu id and idx
  drm/amdkfd: add xnack enabled flag to kfd_process
  drm/amdkfd: add ioctl to configure and query xnack retries
  drm/amdkfd: invalidate tables on page retry fault
  drm/amdkfd: page table restore through svm API
  drm/amdkfd: SVM API call to restore page tables
  drm/amdkfd: add svm_bo reference for eviction fence
  drm/amdgpu: add param bit flag to create SVM BOs
  drm/amdkfd: add svm_bo eviction mechanism support
  drm/amdgpu: svm bo enable_signal call condition
  drm/amdgpu: add svm_bo eviction to enable_signal cb

Philip Yang (23):
  drm/amdkfd: select kernel DEVICE_PRIVATE option
  drm/amdkfd: add svm ioctl API
  drm/amdkfd: Add SVM API support capability bits
  drm/amdkfd: register svm range
  drm/amdkfd: add svm ioctl GET_ATTR op
  drm/amdgpu: add common HMM get pages function
  drm/amdkfd: validate svm range system memory
  drm/amdkfd: register overlap system memory range
  drm/amdkfd: deregister svm range
  drm/amdgpu: export vm update mapping interface
  drm/amdkfd: map svm range to GPUs
  drm/amdkfd: svm range eviction and restore
  drm/amdkfd: register HMM device private zone
  drm/amdkfd: validate vram svm range from TTM
  drm/amdkfd: support xgmi same hive mapping
  drm/amdkfd: copy memory through gart table
  drm/amdkfd: HMM migrate ram to vram
  drm/amdkfd: HMM migrate vram to ram
  drm/amdgpu: reserve fence slot to update page table
  drm/amdgpu: enable retry fault wptr overflow
  drm/amdkfd: refine migration policy with xnack on
  drm/amdkfd: add svm range validate timestamp
  drm/amdkfd: multiple gpu migrate vram to vram

 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c|3 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h|4 +-
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c  |   16 +-
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  |   13 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c|   83 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_mn.h|7 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.h|5 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c   |   90 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c|   47 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h|   10 +
 drivers/gpu/drm/amd/amdgpu/vega10_ih.c|   32 +-
 drivers/gpu/drm/amd/amdgpu/vega20_ih.c|   32 +-
 drivers/gpu/drm/amd/amdkfd/Kconfig|1 +
 drivers/gpu/drm/amd/amdkfd/Makefile   |4 +-
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c  |  170 +-
 drivers/gpu/drm/amd/amdkfd/kfd_iommu.c|8 +-
 drivers/gpu/drm/amd/amdkfd/kfd_migrate.c  |  866 ++
 drivers/gpu/drm/amd/amdkfd/kfd_migrate.h  |   59 +
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h |   52 +-
 drivers/gpu/drm/amd/amdkfd/kfd_process.c  |  200 +-
 .../amd/amdkfd/kfd_process_queue_manager.c|6 +-
 drivers/gpu/drm/amd/amdkfd/kfd_svm.c  | 2564 +
 drivers/gpu/drm/amd/amdkfd/kfd_svm.h  |  135 +
 drivers/gpu/drm/amd/amdkfd/kfd_topology.c |1 +
 drivers/gpu/drm/amd/amdkfd/kfd_topology.h |   10 +-
 include/uapi/linux/kfd_ioctl.h|  169 +-
 26 files changed, 4296 insertions(+), 291 deletions(-)
 create mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
 create mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_migrate.h
 create mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_svm.c
 create mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_svm.h

-- 
2.29.2

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH 01/35] drm/amdkfd: select kernel DEVICE_PRIVATE option

2021-01-06 Thread Felix Kuehling
From: Philip Yang 

DEVICE_PRIVATE kernel config option is required for HMM page migration,
to register vram (GPU device memory) as DEVICE_PRIVATE zone memory.
Enabling this option requires recompiling the kernel.

Signed-off-by: Philip Yang 
Signed-off-by: Felix Kuehling 
---
 drivers/gpu/drm/amd/amdkfd/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/amd/amdkfd/Kconfig 
b/drivers/gpu/drm/amd/amdkfd/Kconfig
index e8fb10c41f16..33f8efadc6f6 100644
--- a/drivers/gpu/drm/amd/amdkfd/Kconfig
+++ b/drivers/gpu/drm/amd/amdkfd/Kconfig
@@ -7,6 +7,7 @@ config HSA_AMD
bool "HSA kernel driver for AMD GPU devices"
depends on DRM_AMDGPU && (X86_64 || ARM64 || PPC64)
imply AMD_IOMMU_V2 if X86_64
+   select DEVICE_PRIVATE
select MMU_NOTIFIER
help
  Enable this if you want to use HSA features on AMD GPU devices.
-- 
2.29.2

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH 02/35] drm/amdgpu: replace per_device_list by array

2021-01-06 Thread Felix Kuehling
From: Alex Sierra 

Remove per_device_list from kfd_process and replace it with a
kfd_process_device pointers array of MAX_GPU_INSTANCES size. This helps
to manage the kfd_process_devices bound to a specific kfd_process.
Also, the functions used by kfd_chardev to iterate over the list were
removed, since they are no longer valid. Instead, they were replaced by
a local loop iterating over the array.

Signed-off-by: Alex Sierra 
Signed-off-by: Felix Kuehling 
---
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c  | 116 --
 drivers/gpu/drm/amd/amdkfd/kfd_iommu.c|   8 +-
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h |  20 +--
 drivers/gpu/drm/amd/amdkfd/kfd_process.c  | 108 
 .../amd/amdkfd/kfd_process_queue_manager.c|   6 +-
 5 files changed, 111 insertions(+), 147 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index 8cc51cec988a..8c87afce12df 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -874,52 +874,47 @@ static int kfd_ioctl_get_process_apertures(struct file 
*filp,
 {
struct kfd_ioctl_get_process_apertures_args *args = data;
struct kfd_process_device_apertures *pAperture;
-   struct kfd_process_device *pdd;
+   int i;
 
dev_dbg(kfd_device, "get apertures for PASID 0x%x", p->pasid);
 
args->num_of_nodes = 0;
 
mutex_lock(&p->mutex);
+   /* Run over all pdd of the process */
+   for (i = 0; i < p->n_pdds; i++) {
+   struct kfd_process_device *pdd = p->pdds[i];
+
+   pAperture =
+   &args->process_apertures[args->num_of_nodes];
+   pAperture->gpu_id = pdd->dev->id;
+   pAperture->lds_base = pdd->lds_base;
+   pAperture->lds_limit = pdd->lds_limit;
+   pAperture->gpuvm_base = pdd->gpuvm_base;
+   pAperture->gpuvm_limit = pdd->gpuvm_limit;
+   pAperture->scratch_base = pdd->scratch_base;
+   pAperture->scratch_limit = pdd->scratch_limit;
 
-   /*if the process-device list isn't empty*/
-   if (kfd_has_process_device_data(p)) {
-   /* Run over all pdd of the process */
-   pdd = kfd_get_first_process_device_data(p);
-   do {
-   pAperture =
-   &args->process_apertures[args->num_of_nodes];
-   pAperture->gpu_id = pdd->dev->id;
-   pAperture->lds_base = pdd->lds_base;
-   pAperture->lds_limit = pdd->lds_limit;
-   pAperture->gpuvm_base = pdd->gpuvm_base;
-   pAperture->gpuvm_limit = pdd->gpuvm_limit;
-   pAperture->scratch_base = pdd->scratch_base;
-   pAperture->scratch_limit = pdd->scratch_limit;
-
-   dev_dbg(kfd_device,
-   "node id %u\n", args->num_of_nodes);
-   dev_dbg(kfd_device,
-   "gpu id %u\n", pdd->dev->id);
-   dev_dbg(kfd_device,
-   "lds_base %llX\n", pdd->lds_base);
-   dev_dbg(kfd_device,
-   "lds_limit %llX\n", pdd->lds_limit);
-   dev_dbg(kfd_device,
-   "gpuvm_base %llX\n", pdd->gpuvm_base);
-   dev_dbg(kfd_device,
-   "gpuvm_limit %llX\n", pdd->gpuvm_limit);
-   dev_dbg(kfd_device,
-   "scratch_base %llX\n", pdd->scratch_base);
-   dev_dbg(kfd_device,
-   "scratch_limit %llX\n", pdd->scratch_limit);
-
-   args->num_of_nodes++;
-
-   pdd = kfd_get_next_process_device_data(p, pdd);
-   } while (pdd && (args->num_of_nodes < NUM_OF_SUPPORTED_GPUS));
-   }
+   dev_dbg(kfd_device,
+   "node id %u\n", args->num_of_nodes);
+   dev_dbg(kfd_device,
+   "gpu id %u\n", pdd->dev->id);
+   dev_dbg(kfd_device,
+   "lds_base %llX\n", pdd->lds_base);
+   dev_dbg(kfd_device,
+   "lds_limit %llX\n", pdd->lds_limit);
+   dev_dbg(kfd_device,
+   "gpuvm_base %llX\n", pdd->gpuvm_base);
+   dev_dbg(kfd_device,
+   "gpuvm_limit %llX\n", pdd->gpuvm_limit);
+   dev_dbg(kfd_device,
+   "scratch_base %llX\n", pdd->scratch_base);
+   dev_dbg(kfd_device,
+   "scratch_limit %llX\n", pdd->scratch_limit);
 
+   if (++args->num_of_nodes >= NUM_OF_SUPPORTED_GPUS)
+   break;
+   }
mutex_unlock(&p->mutex);
 
return 0;
@@ -930,9 +9

[PATCH 04/35] drm/amdkfd: add svm ioctl API

2021-01-06 Thread Felix Kuehling
From: Philip Yang 

Add svm (shared virtual memory) ioctl data structure and API definition.

The svm ioctl API is designed to be extensible in the future. All
operations are provided by a single IOCTL to preserve ioctl number
space. The arguments structure ends with a variable size array of
attributes that can be used to set or get one or multiple attributes.

Signed-off-by: Philip Yang 
Signed-off-by: Alex Sierra 
Signed-off-by: Felix Kuehling 
---
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c |   7 ++
 include/uapi/linux/kfd_ioctl.h   | 128 ++-
 2 files changed, 133 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index 8c87afce12df..c5288a6e45b9 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -1746,6 +1746,11 @@ static int kfd_ioctl_smi_events(struct file *filep,
return kfd_smi_event_open(dev, &args->anon_fd);
 }
 
+static int kfd_ioctl_svm(struct file *filep, struct kfd_process *p, void *data)
+{
+   return -EINVAL;
+}
+
 #define AMDKFD_IOCTL_DEF(ioctl, _func, _flags) \
[_IOC_NR(ioctl)] = {.cmd = ioctl, .func = _func, .flags = _flags, \
.cmd_drv = 0, .name = #ioctl}
@@ -1844,6 +1849,8 @@ static const struct amdkfd_ioctl_desc amdkfd_ioctls[] = {
 
AMDKFD_IOCTL_DEF(AMDKFD_IOC_SMI_EVENTS,
kfd_ioctl_smi_events, 0),
+
+   AMDKFD_IOCTL_DEF(AMDKFD_IOC_SVM, kfd_ioctl_svm, 0),
 };
 
#define AMDKFD_CORE_IOCTL_COUNT	ARRAY_SIZE(amdkfd_ioctls)
diff --git a/include/uapi/linux/kfd_ioctl.h b/include/uapi/linux/kfd_ioctl.h
index 695b606da4b1..5d4a4b3e0b61 100644
--- a/include/uapi/linux/kfd_ioctl.h
+++ b/include/uapi/linux/kfd_ioctl.h
@@ -29,9 +29,10 @@
 /*
  * - 1.1 - initial version
  * - 1.3 - Add SMI events support
+ * - 1.4 - Add SVM API
  */
 #define KFD_IOCTL_MAJOR_VERSION 1
-#define KFD_IOCTL_MINOR_VERSION 3
+#define KFD_IOCTL_MINOR_VERSION 4
 
 struct kfd_ioctl_get_version_args {
__u32 major_version;/* from KFD */
@@ -471,6 +472,127 @@ enum kfd_mmio_remap {
KFD_MMIO_REMAP_HDP_REG_FLUSH_CNTL = 4,
 };
 
+/* Guarantee host access to memory */
+#define KFD_IOCTL_SVM_FLAG_HOST_ACCESS 0x0001
+/* Fine grained coherency between all devices with access */
+#define KFD_IOCTL_SVM_FLAG_COHERENT0x0002
+/* Use any GPU in same hive as preferred device */
+#define KFD_IOCTL_SVM_FLAG_HIVE_LOCAL  0x0004
+/* GPUs only read, allows replication */
+#define KFD_IOCTL_SVM_FLAG_GPU_RO  0x0008
+/* Allow execution on GPU */
+#define KFD_IOCTL_SVM_FLAG_GPU_EXEC0x0010
+
+/**
+ * kfd_ioctl_svm_op - SVM ioctl operations
+ *
+ * @KFD_IOCTL_SVM_OP_SET_ATTR: Modify one or more attributes
+ * @KFD_IOCTL_SVM_OP_GET_ATTR: Query one or more attributes
+ */
+enum kfd_ioctl_svm_op {
+   KFD_IOCTL_SVM_OP_SET_ATTR,
+   KFD_IOCTL_SVM_OP_GET_ATTR
+};
+
+/** kfd_ioctl_svm_location - Enum for preferred and prefetch locations
+ *
+ * GPU IDs are used to specify GPUs as preferred and prefetch locations.
+ * Below definitions are used for system memory or for leaving the preferred
+ * location unspecified.
+ */
+enum kfd_ioctl_svm_location {
+   KFD_IOCTL_SVM_LOCATION_SYSMEM = 0,
+   KFD_IOCTL_SVM_LOCATION_UNDEFINED = 0xffffffff
+};
+
+/**
+ * kfd_ioctl_svm_attr_type - SVM attribute types
+ *
+ * @KFD_IOCTL_SVM_ATTR_PREFERRED_LOC: gpuid of the preferred location, 0 for
+ *system memory
+ * @KFD_IOCTL_SVM_ATTR_PREFETCH_LOC: gpuid of the prefetch location, 0 for
+ *   system memory. Setting this triggers an
+ *   immediate prefetch (migration).
+ * @KFD_IOCTL_SVM_ATTR_ACCESS:
+ * @KFD_IOCTL_SVM_ATTR_ACCESS_IN_PLACE:
+ * @KFD_IOCTL_SVM_ATTR_NO_ACCESS: specify memory access for the gpuid given
+ *by the attribute value
+ * @KFD_IOCTL_SVM_ATTR_SET_FLAGS: bitmask of flags to set (see
+ *KFD_IOCTL_SVM_FLAG_...)
+ * @KFD_IOCTL_SVM_ATTR_CLR_FLAGS: bitmask of flags to clear
+ * @KFD_IOCTL_SVM_ATTR_GRANULARITY: migration granularity
+ *  (log2 num pages)
+ */
+enum kfd_ioctl_svm_attr_type {
+   KFD_IOCTL_SVM_ATTR_PREFERRED_LOC,
+   KFD_IOCTL_SVM_ATTR_PREFETCH_LOC,
+   KFD_IOCTL_SVM_ATTR_ACCESS,
+   KFD_IOCTL_SVM_ATTR_ACCESS_IN_PLACE,
+   KFD_IOCTL_SVM_ATTR_NO_ACCESS,
+   KFD_IOCTL_SVM_ATTR_SET_FLAGS,
+   KFD_IOCTL_SVM_ATTR_CLR_FLAGS,
+   KFD_IOCTL_SVM_ATTR_GRANULARITY
+};
+
+/**
+ * kfd_ioctl_svm_attribute - Attributes as pairs of type and value
+ *
+ * The meaning of the @value depends on the attribute type.
+ *
+ * @type: attribute type (see enum @kfd_ioctl_svm_attr_type)
+ * @value: attribute value
+ */
+struct kfd_ioctl_svm_attribute {
+   __u32 type;
+   __u32 value;
+};
+
+/**
+ * kfd_ioctl_svm_args - Arguments for 

[PATCH 05/35] drm/amdkfd: Add SVM API support capability bits

2021-01-06 Thread Felix Kuehling
From: Philip Yang 

SVMAPISupported property added to HSA_CAPABILITY; the value matches
HSA_CAPABILITY defined in the Thunk spec:

SVMAPISupported: it will not be supported on older kernels that don't
have HMM or on GFXv8 or older GPUs without support for 48-bit virtual
addresses.

CoherentHostAccess property added to HSA_MEMORYPROPERTY; the value
matches HSA_MEMORYPROPERTY defined in the Thunk spec:

CoherentHostAccess: whether or not device memory can be coherently
accessed by the host CPU.
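
A hedged sketch of how user mode is expected to consume the new bit (the
capability value comes from the node's properties file under the topology
sysfs directory; the helper name is ours):

static bool node_supports_svm_api(uint32_t capability)
{
	return capability & HSA_CAP_SVMAPI_SUPPORTED;
}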

Signed-off-by: Philip Yang 
Signed-off-by: Felix Kuehling 
---
 drivers/gpu/drm/amd/amdkfd/kfd_topology.c |  1 +
 drivers/gpu/drm/amd/amdkfd/kfd_topology.h | 10 ++
 2 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
index a3fc23873819..885b8a071717 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
@@ -1380,6 +1380,7 @@ int kfd_topology_add_device(struct kfd_dev *gpu)
dev->node_props.capability |= ((HSA_CAP_DOORBELL_TYPE_2_0 <<
HSA_CAP_DOORBELL_TYPE_TOTALBITS_SHIFT) &
HSA_CAP_DOORBELL_TYPE_TOTALBITS_MASK);
+   dev->node_props.capability |= HSA_CAP_SVMAPI_SUPPORTED;
break;
default:
WARN(1, "Unexpected ASIC family %u",
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.h 
b/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
index 326d9b26b7aa..7c5ea9b4b9d9 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
@@ -52,8 +52,9 @@
#define HSA_CAP_RASEVENTNOTIFY 0x00200000
#define HSA_CAP_ASIC_REVISION_MASK 0x03c00000
#define HSA_CAP_ASIC_REVISION_SHIFT 22
+#define HSA_CAP_SVMAPI_SUPPORTED   0x04000000
 
-#define HSA_CAP_RESERVED   0xfc078000
+#define HSA_CAP_RESERVED   0xf8078000
 
 struct kfd_node_properties {
uint64_t hive_id;
@@ -98,9 +99,10 @@ struct kfd_node_properties {
 #define HSA_MEM_HEAP_TYPE_GPU_LDS  4
 #define HSA_MEM_HEAP_TYPE_GPU_SCRATCH  5
 
-#define HSA_MEM_FLAGS_HOT_PLUGGABLE   0x00000001
-#define HSA_MEM_FLAGS_NON_VOLATILE    0x00000002
-#define HSA_MEM_FLAGS_RESERVED        0xfffffffc
+#define HSA_MEM_FLAGS_HOT_PLUGGABLE   0x00000001
+#define HSA_MEM_FLAGS_NON_VOLATILE    0x00000002
+#define HSA_MEM_FLAGS_COHERENTHOSTACCESS   0x00000004
+#define HSA_MEM_FLAGS_RESERVED        0xfffffff8
 
 struct kfd_mem_properties {
struct list_headlist;
-- 
2.29.2

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH 06/35] drm/amdkfd: register svm range

2021-01-06 Thread Felix Kuehling
From: Philip Yang 

The svm range structure stores the range start address, size, attributes,
flags, prefetch location and a gpu bitmap which indicates which GPUs this
range maps to. The same virtual address is shared by CPU and GPUs.

Each process has an svm range list which uses both an interval tree and a
list to store all svm ranges registered by the process. The interval tree
is used by the GPU vm fault handler and the CPU page fault handler to get
the svm range structure for a specific address. The list is used to scan
all ranges in the eviction restore work.

Apply the attributes preferred location, prefetch location, mapping flags
and migration granularity to the svm range, and store the mapping gpu
index into the bitmap (see the sketch below).
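
A hedged sketch of the lookup described above, as a fault handler might do
it (addr is in page units, matching the tree; p is the kfd_process):

	struct interval_tree_node *node;
	struct svm_range *prange = NULL;

	node = interval_tree_iter_first(&p->svms.objects, addr, addr);
	if (node)
		prange = container_of(node, struct svm_range, it_node);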

Signed-off-by: Philip Yang 
Signed-off-by: Alex Sierra 
Signed-off-by: Felix Kuehling 
---
 drivers/gpu/drm/amd/amdkfd/Makefile  |   3 +-
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c |  21 +-
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h|  14 +
 drivers/gpu/drm/amd/amdkfd/kfd_process.c |   9 +
 drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 603 +++
 drivers/gpu/drm/amd/amdkfd/kfd_svm.h |  93 
 6 files changed, 741 insertions(+), 2 deletions(-)
 create mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_svm.c
 create mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_svm.h

diff --git a/drivers/gpu/drm/amd/amdkfd/Makefile 
b/drivers/gpu/drm/amd/amdkfd/Makefile
index e1e4115dcf78..387ce0217d35 100644
--- a/drivers/gpu/drm/amd/amdkfd/Makefile
+++ b/drivers/gpu/drm/amd/amdkfd/Makefile
@@ -54,7 +54,8 @@ AMDKFD_FILES  := $(AMDKFD_PATH)/kfd_module.o \
$(AMDKFD_PATH)/kfd_dbgdev.o \
$(AMDKFD_PATH)/kfd_dbgmgr.o \
$(AMDKFD_PATH)/kfd_smi_events.o \
-   $(AMDKFD_PATH)/kfd_crat.o
+   $(AMDKFD_PATH)/kfd_crat.o \
+   $(AMDKFD_PATH)/kfd_svm.o
 
 ifneq ($(CONFIG_AMD_IOMMU_V2),)
 AMDKFD_FILES += $(AMDKFD_PATH)/kfd_iommu.o
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index c5288a6e45b9..2d3ba7e806d5 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -38,6 +38,7 @@
 #include "kfd_priv.h"
 #include "kfd_device_queue_manager.h"
 #include "kfd_dbgmgr.h"
+#include "kfd_svm.h"
 #include "amdgpu_amdkfd.h"
 #include "kfd_smi_events.h"
 
@@ -1748,7 +1749,25 @@ static int kfd_ioctl_smi_events(struct file *filep,
 
 static int kfd_ioctl_svm(struct file *filep, struct kfd_process *p, void *data)
 {
-   return -EINVAL;
+   struct kfd_ioctl_svm_args *args = data;
+   int r = 0;
+
+   pr_debug("start 0x%llx size 0x%llx op 0x%x nattr 0x%x\n",
+args->start_addr, args->size, args->op, args->nattr);
+
+   if ((args->start_addr & ~PAGE_MASK) || (args->size & ~PAGE_MASK))
+   return -EINVAL;
+   if (!args->start_addr || !args->size)
+   return -EINVAL;
+
+   mutex_lock(&p->mutex);
+
+   r = svm_ioctl(p, args->op, args->start_addr, args->size, args->nattr,
+ args->attrs);
+
+   mutex_unlock(&p->mutex);
+
+   return r;
 }
 
 #define AMDKFD_IOCTL_DEF(ioctl, _func, _flags) \
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h 
b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index 4ef8804adcf5..cbb2bae1982d 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -726,6 +726,17 @@ struct kfd_process_device {
 
 #define qpd_to_pdd(x) container_of(x, struct kfd_process_device, qpd)
 
+struct svm_range_list {
+   struct mutex            lock; /* use svms_lock/unlock(svms) */
+   unsigned int            saved_flags;
+   struct rb_root_cached   objects;
+   struct list_head        list;
+   struct srcu_struct      srcu;
+   struct work_struct      srcu_free_work;
+   struct list_head        free_list;
+   struct mutex            free_list_lock;
+};
+
 /* Process data */
 struct kfd_process {
/*
@@ -804,6 +815,9 @@ struct kfd_process {
struct kobject *kobj;
struct kobject *kobj_queues;
struct attribute attr_pasid;
+
+   /* shared virtual memory registered by this process */
+   struct svm_range_list svms;
 };
 
 #define KFD_PROCESS_TABLE_SIZE 5 /* bits: 32 entries */
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
index 7396f3a6d0ee..791f17308b1b 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
@@ -35,6 +35,7 @@
 #include 
 #include "amdgpu_amdkfd.h"
 #include "amdgpu.h"
+#include "kfd_svm.h"
 
 struct mm_struct;
 
@@ -42,6 +43,7 @@ struct mm_struct;
 #include "kfd_device_queue_manager.h"
 #include "kfd_dbgmgr.h"
 #include "kfd_iommu.h"
+#include "kfd_svm.h"
 
 /*
  * List of struct kfd_process (field kfd_process).
@@ -997,6 +999,7 @@ static void kfd_process_wq_release(struct work_struct *work)
kfd_iommu_unbind_process(p);
 
k

[PATCH 03/35] drm/amdkfd: helper to convert gpu id and idx

2021-01-06 Thread Felix Kuehling
From: Alex Sierra 

The svm range uses a gpu bitmap to store which GPUs the range maps to.
Applications pass the driver gpu id to specify a GPU; the helpers are
needed to convert a gpu id to a gpu bitmap idx.

Access is through the kfd_process_device pointers array in kfd_process.
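
A hedged usage sketch of the new helpers (prange->bitmap_access comes from
the svm range patches later in this series):

	int gpuidx = kfd_process_gpuidx_from_gpuid(p, gpu_id);

	if (gpuidx < 0)
		return -EINVAL;	/* gpu_id is not bound to this process */
	bitmap_set(prange->bitmap_access, gpuidx, 1);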

Signed-off-by: Alex Sierra 
Signed-off-by: Felix Kuehling 
---
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h|  5 
 drivers/gpu/drm/amd/amdkfd/kfd_process.c | 30 
 2 files changed, 35 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h 
b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index d9f8d3d48aac..4ef8804adcf5 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -837,6 +837,11 @@ struct kfd_process *kfd_create_process(struct file *filep);
 struct kfd_process *kfd_get_process(const struct task_struct *);
 struct kfd_process *kfd_lookup_process_by_pasid(unsigned int pasid);
 struct kfd_process *kfd_lookup_process_by_mm(const struct mm_struct *mm);
+int kfd_process_gpuid_from_gpuidx(struct kfd_process *p,
+   uint32_t gpu_idx, uint32_t *gpuid);
+int kfd_process_gpuidx_from_gpuid(struct kfd_process *p, uint32_t gpu_id);
+int kfd_process_device_from_gpuidx(struct kfd_process *p,
+   uint32_t gpu_idx, struct kfd_dev **gpu);
 void kfd_unref_process(struct kfd_process *p);
 int kfd_process_evict_queues(struct kfd_process *p);
 int kfd_process_restore_queues(struct kfd_process *p);
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
index 031e752e3154..7396f3a6d0ee 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
@@ -1561,6 +1561,36 @@ int kfd_process_restore_queues(struct kfd_process *p)
return ret;
 }
 
+int kfd_process_gpuid_from_gpuidx(struct kfd_process *p,
+   uint32_t gpu_idx, uint32_t *gpuid)
+{
+   if (gpu_idx < p->n_pdds) {
+   *gpuid = p->pdds[gpu_idx]->dev->id;
+   return 0;
+   }
+   return -EINVAL;
+}
+
+int kfd_process_gpuidx_from_gpuid(struct kfd_process *p, uint32_t gpu_id)
+{
+   int i;
+
+   for (i = 0; i < p->n_pdds; i++)
+   if (p->pdds[i] && gpu_id == p->pdds[i]->dev->id)
+   return i;
+   return -EINVAL;
+}
+
+int kfd_process_device_from_gpuidx(struct kfd_process *p,
+   uint32_t gpu_idx, struct kfd_dev **gpu)
+{
+   if (gpu_idx < p->n_pdds) {
+   *gpu = p->pdds[gpu_idx]->dev;
+   return 0;
+   }
+   return -EINVAL;
+}
+
 static void evict_process_worker(struct work_struct *work)
 {
int ret;
-- 
2.29.2

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH 08/35] drm/amdgpu: add common HMM get pages function

2021-01-06 Thread Felix Kuehling
From: Philip Yang 

Move the HMM get pages function from amdgpu_ttm to amdgpu_mn. This
common function will be used by the new svm APIs (see the usage sketch
below).
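
A hedged usage sketch of the helper pair (notifier, mm, pages[], start and
npages belong to the caller; the retry is the expected pattern when the CPU
page tables change underneath):

	struct hmm_range *range;
	int r;

retry:
	r = amdgpu_hmm_range_get_pages(&bo->notifier, mm, pages, start,
				       npages, &range, false, false);
	if (r)
		return r;

	/* ... consume pages[] ... */

	if (amdgpu_hmm_range_get_pages_done(range))
		goto retry;	/* CPU page tables changed, start over */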

Signed-off-by: Philip Yang 
Signed-off-by: Felix Kuehling 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c  | 83 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_mn.h  |  7 +++
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 76 +++---
 3 files changed, 100 insertions(+), 66 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c
index 828b5167ff12..997da4237a10 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c
@@ -155,3 +155,86 @@ void amdgpu_mn_unregister(struct amdgpu_bo *bo)
mmu_interval_notifier_remove(&bo->notifier);
bo->notifier.mm = NULL;
 }
+
+int amdgpu_hmm_range_get_pages(struct mmu_interval_notifier *notifier,
+  struct mm_struct *mm, struct page **pages,
+  uint64_t start, uint64_t npages,
+  struct hmm_range **phmm_range, bool readonly,
+  bool mmap_locked)
+{
+   struct hmm_range *hmm_range;
+   unsigned long timeout;
+   unsigned long i;
+   unsigned long *pfns;
+   int r = 0;
+
+   hmm_range = kzalloc(sizeof(*hmm_range), GFP_KERNEL);
+   if (unlikely(!hmm_range))
+   return -ENOMEM;
+
+   pfns = kvmalloc_array(npages, sizeof(*pfns), GFP_KERNEL);
+   if (unlikely(!pfns)) {
+   r = -ENOMEM;
+   goto out_free_range;
+   }
+
+   hmm_range->notifier = notifier;
+   hmm_range->default_flags = HMM_PFN_REQ_FAULT;
+   if (!readonly)
+   hmm_range->default_flags |= HMM_PFN_REQ_WRITE;
+   hmm_range->hmm_pfns = pfns;
+   hmm_range->start = start;
+   hmm_range->end = start + npages * PAGE_SIZE;
+   timeout = jiffies + msecs_to_jiffies(HMM_RANGE_DEFAULT_TIMEOUT);
+
+retry:
+   hmm_range->notifier_seq = mmu_interval_read_begin(notifier);
+
+   if (likely(!mmap_locked))
+   mmap_read_lock(mm);
+
+   r = hmm_range_fault(hmm_range);
+
+   if (likely(!mmap_locked))
+   mmap_read_unlock(mm);
+   if (unlikely(r)) {
+   /*
+* FIXME: This timeout should encompass the retry from
+* mmu_interval_read_retry() as well.
+*/
+   if (r == -EBUSY && !time_after(jiffies, timeout))
+   goto retry;
+   goto out_free_pfns;
+   }
+
+   /*
+* Due to default_flags, all pages are HMM_PFN_VALID or
+* hmm_range_fault() fails. FIXME: The pages cannot be touched outside
+* the notifier_lock, and mmu_interval_read_retry() must be done first.
+*/
+   for (i = 0; pages && i < npages; i++)
+   pages[i] = hmm_pfn_to_page(pfns[i]);
+
+   *phmm_range = hmm_range;
+
+   return 0;
+
+out_free_pfns:
+   kvfree(pfns);
+out_free_range:
+   kfree(hmm_range);
+
+   return r;
+}
+
+int amdgpu_hmm_range_get_pages_done(struct hmm_range *hmm_range)
+{
+   int r;
+
+   r = mmu_interval_read_retry(hmm_range->notifier,
+   hmm_range->notifier_seq);
+   kvfree(hmm_range->hmm_pfns);
+   kfree(hmm_range);
+
+   return r;
+}
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.h
index a292238f75eb..7f7d37a457c3 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.h
@@ -30,6 +30,13 @@
 #include 
 #include 
 
+int amdgpu_hmm_range_get_pages(struct mmu_interval_notifier *notifier,
+  struct mm_struct *mm, struct page **pages,
+  uint64_t start, uint64_t npages,
+  struct hmm_range **phmm_range, bool readonly,
+  bool mmap_locked);
+int amdgpu_hmm_range_get_pages_done(struct hmm_range *hmm_range);
+
 #if defined(CONFIG_HMM_MIRROR)
 int amdgpu_mn_register(struct amdgpu_bo *bo, unsigned long addr);
 void amdgpu_mn_unregister(struct amdgpu_bo *bo);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index aaad9e304ad9..f423f42cb9b5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -32,7 +32,6 @@
 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -843,10 +842,8 @@ int amdgpu_ttm_tt_get_user_pages(struct amdgpu_bo *bo, 
struct page **pages)
struct amdgpu_ttm_tt *gtt = (void *)ttm;
unsigned long start = gtt->userptr;
struct vm_area_struct *vma;
-   struct hmm_range *range;
-   unsigned long timeout;
struct mm_struct *mm;
-   unsigned long i;
+   bool readonly;
int r = 0;
 
mm = bo->notifier.mm;
@@ -862,76 +859,26 @@ int amdgpu_ttm_tt_get_u

[PATCH 09/35] drm/amdkfd: validate svm range system memory

2021-01-06 Thread Felix Kuehling
From: Philip Yang 

Use HMM to get system memory page addresses, which will be used to
map to GPUs or to migrate to vram.

Signed-off-by: Philip Yang 
Signed-off-by: Felix Kuehling 
---
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h |  1 +
 drivers/gpu/drm/amd/amdkfd/kfd_svm.c  | 88 +++
 drivers/gpu/drm/amd/amdkfd/kfd_svm.h  |  2 +
 3 files changed, 91 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h 
b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index cbb2bae1982d..97cf267b6f51 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -735,6 +735,7 @@ struct svm_range_list {
struct work_struct  srcu_free_work;
struct list_headfree_list;
struct mutexfree_list_lock;
+   struct mmu_interval_notifier notifier;
 };
 
 /* Process data */
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index 017e77e9ae1e..02918faa70d5 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -135,6 +135,65 @@ svm_get_supported_dev_by_id(struct kfd_process *p, 
uint32_t gpu_id,
return dev;
 }
 
+/**
+ * svm_range_validate_ram - get system memory pages of svm range
+ *
+ * @mm: the mm_struct of process
+ * @prange: the range struct
+ *
+ * After mapping system memory to the GPU, system memory may be invalidated at
+ * any time while the application runs. We use the HMM callback to sync the
+ * GPU with CPU page table updates, so we don't need to use a lock to prevent
+ * CPU invalidation, nor to check the hmm_range_get_pages_done return value.
+ *
+ * Return:
+ * 0 - OK, otherwise error code
+ */
+static int
+svm_range_validate_ram(struct mm_struct *mm, struct svm_range *prange)
+{
+   uint64_t i;
+   int r;
+
+   if (!prange->pages_addr) {
+   prange->pages_addr = kvmalloc_array(prange->npages,
+   sizeof(*prange->pages_addr),
+   GFP_KERNEL | __GFP_ZERO);
+   if (!prange->pages_addr)
+   return -ENOMEM;
+   }
+
+   r = amdgpu_hmm_range_get_pages(&prange->svms->notifier, mm, NULL,
+  prange->it_node.start << PAGE_SHIFT,
+  prange->npages, &prange->hmm_range,
+  false, true);
+   if (r) {
+   pr_debug("failed %d to get svm range pages\n", r);
+   return r;
+   }
+
+   for (i = 0; i < prange->npages; i++)
+   prange->pages_addr[i] =
+   PFN_PHYS(prange->hmm_range->hmm_pfns[i]);
+
+   amdgpu_hmm_range_get_pages_done(prange->hmm_range);
+   prange->hmm_range = NULL;
+
+   return 0;
+}
+
+static int
+svm_range_validate(struct mm_struct *mm, struct svm_range *prange)
+{
+   int r = 0;
+
+   pr_debug("actual loc 0x%x\n", prange->actual_loc);
+
+   r = svm_range_validate_ram(mm, prange);
+
+   return r;
+}
+
 static int
 svm_range_apply_attrs(struct kfd_process *p, struct svm_range *prange,
  uint32_t nattr, struct kfd_ioctl_svm_attribute *attrs)
@@ -349,10 +408,28 @@ static void svm_range_srcu_free_work(struct work_struct 
*work_struct)
mutex_unlock(&svms->free_list_lock);
 }
 
+/**
+ * svm_range_cpu_invalidate_pagetables - interval notifier callback
+ *
+ */
+static bool
+svm_range_cpu_invalidate_pagetables(struct mmu_interval_notifier *mni,
+   const struct mmu_notifier_range *range,
+   unsigned long cur_seq)
+{
+   return true;
+}
+
+static const struct mmu_interval_notifier_ops svm_range_mn_ops = {
+   .invalidate = svm_range_cpu_invalidate_pagetables,
+};
+
 void svm_range_list_fini(struct kfd_process *p)
 {
pr_debug("pasid 0x%x svms 0x%p\n", p->pasid, &p->svms);
 
+   mmu_interval_notifier_remove(&p->svms.notifier);
+
/* Ensure srcu free work is finished before process is destroyed */
flush_work(&p->svms.srcu_free_work);
cleanup_srcu_struct(&p->svms.srcu);
@@ -375,6 +452,8 @@ int svm_range_list_init(struct kfd_process *p)
INIT_WORK(&svms->srcu_free_work, svm_range_srcu_free_work);
INIT_LIST_HEAD(&svms->free_list);
mutex_init(&svms->free_list_lock);
+   mmu_interval_notifier_insert(&svms->notifier, current->mm, 0, ~1ULL,
+&svm_range_mn_ops);
 
return 0;
 }
@@ -531,6 +610,15 @@ svm_range_set_attr(struct kfd_process *p, uint64_t start, 
uint64_t size,
r = svm_range_apply_attrs(p, prange, nattr, attrs);
if (r) {
pr_debug("failed %d to apply attrs\n", r);
+   goto out_unlock;
+   }
+
+   r = svm_range_validate(mm, prange);
+   if (r)
+   pr_debug("failed %

[PATCH 10/35] drm/amdkfd: register overlap system memory range

2021-01-06 Thread Felix Kuehling
From: Philip Yang 

No overlapping range interval [start, last] may exist in the svms object
interval tree. If a process registers a new range which overlaps an old
range, the old range is split into 2 ranges, depending on whether the
overlap happens at the head or the tail part of the old range.
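
A hedged illustration of the two split cases (addresses in pages; the
example values are ours):

/*
 * old [0x100, 0x1ff], new registration overlapping the tail from 0x180:
 *	old becomes [0x100, 0x17f], the split-off piece is [0x180, 0x1ff]
 *
 * old [0x100, 0x1ff], new registration overlapping the head up to 0x17f:
 *	old becomes [0x180, 0x1ff], the split-off piece is [0x100, 0x17f]
 */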

Signed-off-by: Philip Yang 
Signed-off-by: Felix Kuehling 
---
 drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 297 ++-
 1 file changed, 294 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index 02918faa70d5..ad007261f54c 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -293,6 +293,278 @@ static void svm_range_debug_dump(struct svm_range_list 
*svms)
}
 }
 
+static bool
+svm_range_is_same_attrs(struct svm_range *old, struct svm_range *new)
+{
+   return (old->prefetch_loc == new->prefetch_loc &&
+   old->flags == new->flags &&
+   old->granularity == new->granularity);
+}
+
+static int
+svm_range_split_pages(struct svm_range *new, struct svm_range *old,
+ uint64_t start, uint64_t last)
+{
+   unsigned long old_start;
+   dma_addr_t *pages_addr;
+   uint64_t d;
+
+   old_start = old->it_node.start;
+   new->pages_addr = kvmalloc_array(new->npages,
+sizeof(*new->pages_addr),
+GFP_KERNEL | __GFP_ZERO);
+   if (!new->pages_addr)
+   return -ENOMEM;
+
+   d = new->it_node.start - old_start;
+   memcpy(new->pages_addr, old->pages_addr + d,
+  new->npages * sizeof(*new->pages_addr));
+
+   old->npages = last - start + 1;
+   old->it_node.start = start;
+   old->it_node.last = last;
+
+   pages_addr = kvmalloc_array(old->npages, sizeof(*pages_addr),
+   GFP_KERNEL);
+   if (!pages_addr) {
+   kvfree(new->pages_addr);
+   return -ENOMEM;
+   }
+
+   d = start - old_start;
+   memcpy(pages_addr, old->pages_addr + d,
+  old->npages * sizeof(*pages_addr));
+
+   kvfree(old->pages_addr);
+   old->pages_addr = pages_addr;
+
+   return 0;
+}
+
+/**
+ * svm_range_split_adjust - split range and adjust
+ *
+ * @new: new range
+ * @old: the old range
+ * @start: the old range adjust to start address in pages
+ * @last: the old range adjust to last address in pages
+ *
+ * Copy system memory pages, pages_addr or vram mm_nodes in old range to new
+ * range from new_start up to size new->npages, the remaining old range is from
+ * start to last
+ *
+ * Return:
+ * 0 - OK, -ENOMEM - out of memory
+ */
+static int
+svm_range_split_adjust(struct svm_range *new, struct svm_range *old,
+ uint64_t start, uint64_t last)
+{
+   int r = -EINVAL;
+
+   pr_debug("svms 0x%p new 0x%lx old [0x%lx 0x%lx] => [0x%llx 0x%llx]\n",
+new->svms, new->it_node.start, old->it_node.start,
+old->it_node.last, start, last);
+
+   if (new->it_node.start < old->it_node.start ||
+   new->it_node.last > old->it_node.last) {
+   WARN_ONCE(1, "invalid new range start or last\n");
+   return -EINVAL;
+   }
+
+   if (old->pages_addr)
+   r = svm_range_split_pages(new, old, start, last);
+   else
+   WARN_ONCE(1, "split adjust invalid pages_addr and nodes\n");
+   if (r)
+   return r;
+
+   new->flags = old->flags;
+   new->preferred_loc = old->preferred_loc;
+   new->prefetch_loc = old->prefetch_loc;
+   new->actual_loc = old->actual_loc;
+   new->granularity = old->granularity;
+   bitmap_copy(new->bitmap_access, old->bitmap_access, MAX_GPU_INSTANCE);
+   bitmap_copy(new->bitmap_aip, old->bitmap_aip, MAX_GPU_INSTANCE);
+
+   return 0;
+}
+
+/**
+ * svm_range_split - split a range in 2 ranges
+ *
+ * @prange: the svm range to split
+ * @start: the remaining range start address in pages
+ * @last: the remaining range last address in pages
+ * @new: the result new range generated
+ *
+ * Two cases only:
+ * case 1: if start == prange->it_node.start
+ * prange ==> prange[start, last]
+ * new range [last + 1, prange->it_node.last]
+ *
+ * case 2: if last == prange->it_node.last
+ * prange ==> prange[start, last]
+ * new range [prange->it_node.start, start - 1]
+ *
+ * Context: Caller holds svms->rw_sem in write mode
+ *
+ * Return:
+ * 0 - OK, -ENOMEM - out of memory, -EINVAL - invalid start, last
+ */
+static int
+svm_range_split(struct svm_range *prange, uint64_t start, uint64_t last,
+   struct svm_range **new)
+{
+   uint64_t old_start = prange->it_node.start;
+   uint64_t old_last = prange->it_node.last;
+   struct svm_range_list *svms;
+   int r = 0;
+
+   pr_debug("svms 0x%p [0x%llx 0x%llx] to [0x%llx 0x%llx]\n", prange->svms,
+

[PATCH 07/35] drm/amdkfd: add svm ioctl GET_ATTR op

2021-01-06 Thread Felix Kuehling
From: Philip Yang 

Get the intersection of attributes over all memory in the given range.
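
A hedged illustration of the "intersection" semantics for the access
bitmaps: start from all-ones and AND in each overlapping range, so a GPU is
reported accessible only if every range in [start, last] grants it access:

	bitmap_fill(bitmap_access, MAX_GPU_INSTANCE);
	/* for each prange overlapping [start, last]: */
	bitmap_and(bitmap_access, bitmap_access,
		   prange->bitmap_access, MAX_GPU_INSTANCE);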

Signed-off-by: Philip Yang 
Signed-off-by: Alex Sierra 
Signed-off-by: Felix Kuehling 
---
 drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 175 ++-
 1 file changed, 173 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index 0b0410837be9..017e77e9ae1e 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -75,8 +75,8 @@ static void
 svm_range_set_default_attributes(int32_t *location, int32_t *prefetch_loc,
 uint8_t *granularity, uint32_t *flags)
 {
-   *location = 0;
-   *prefetch_loc = 0;
+   *location = KFD_IOCTL_SVM_LOCATION_UNDEFINED;
+   *prefetch_loc = KFD_IOCTL_SVM_LOCATION_UNDEFINED;
*granularity = 9;
*flags =
KFD_IOCTL_SVM_FLAG_HOST_ACCESS | KFD_IOCTL_SVM_FLAG_COHERENT;
@@ -581,6 +581,174 @@ svm_range_set_attr(struct kfd_process *p, uint64_t start, 
uint64_t size,
return r;
 }
 
+static int
+svm_range_get_attr(struct kfd_process *p, uint64_t start, uint64_t size,
+  uint32_t nattr, struct kfd_ioctl_svm_attribute *attrs)
+{
+   DECLARE_BITMAP(bitmap_access, MAX_GPU_INSTANCE);
+   DECLARE_BITMAP(bitmap_aip, MAX_GPU_INSTANCE);
+   bool get_preferred_loc = false;
+   bool get_prefetch_loc = false;
+   bool get_granularity = false;
+   bool get_accessible = false;
+   bool get_flags = false;
+   uint64_t last = start + size - 1UL;
+   struct mm_struct *mm = current->mm;
+   uint8_t granularity = 0xff;
+   struct interval_tree_node *node;
+   struct svm_range_list *svms;
+   struct svm_range *prange;
+   uint32_t prefetch_loc = KFD_IOCTL_SVM_LOCATION_UNDEFINED;
+   uint32_t location = KFD_IOCTL_SVM_LOCATION_UNDEFINED;
+   uint32_t flags = 0xffffffff;
+   int gpuidx;
+   uint32_t i;
+
+   pr_debug("svms 0x%p [0x%llx 0x%llx] nattr 0x%x\n", &p->svms, start,
+start + size - 1, nattr);
+
+   mmap_read_lock(mm);
+   if (!svm_range_is_valid(mm, start, size)) {
+   pr_debug("invalid range\n");
+   mmap_read_unlock(mm);
+   return -EINVAL;
+   }
+   mmap_read_unlock(mm);
+
+   for (i = 0; i < nattr; i++) {
+   switch (attrs[i].type) {
+   case KFD_IOCTL_SVM_ATTR_PREFERRED_LOC:
+   get_preferred_loc = true;
+   break;
+   case KFD_IOCTL_SVM_ATTR_PREFETCH_LOC:
+   get_prefetch_loc = true;
+   break;
+   case KFD_IOCTL_SVM_ATTR_ACCESS:
+   if (!svm_get_supported_dev_by_id(
+   p, attrs[i].value, NULL))
+   return -EINVAL;
+   get_accessible = true;
+   break;
+   case KFD_IOCTL_SVM_ATTR_SET_FLAGS:
+   get_flags = true;
+   break;
+   case KFD_IOCTL_SVM_ATTR_GRANULARITY:
+   get_granularity = true;
+   break;
+   case KFD_IOCTL_SVM_ATTR_CLR_FLAGS:
+   case KFD_IOCTL_SVM_ATTR_ACCESS_IN_PLACE:
+   case KFD_IOCTL_SVM_ATTR_NO_ACCESS:
+   fallthrough;
+   default:
+   pr_debug("get invalid attr type 0x%x\n", attrs[i].type);
+   return -EINVAL;
+   }
+   }
+
+   svms = &p->svms;
+
+   svms_lock(svms);
+
+   node = interval_tree_iter_first(&svms->objects, start, last);
+   if (!node) {
+   pr_debug("range attrs not found return default values\n");
+   svm_range_set_default_attributes(&location, &prefetch_loc,
+&granularity, &flags);
+   /* TODO: Automatically create SVM ranges and map them on
+* GPU page faults
+   if (p->xnack_enabled)
+   bitmap_fill(bitmap_access, MAX_GPU_INSTANCE);
+   FIXME: Only set bits for supported GPUs
+   FIXME: I think this should be done inside
+   svm_range_set_default_attributes, so that it will
+   apply to all newly created ranges
+*/
+
+   goto fill_values;
+   }
+   bitmap_fill(bitmap_access, MAX_GPU_INSTANCE);
+   bitmap_fill(bitmap_aip, MAX_GPU_INSTANCE);
+
+   while (node) {
+   struct interval_tree_node *next;
+
+   prange = container_of(node, struct svm_range, it_node);
+   next = interval_tree_iter_next(node, start, last);
+
+   if (get_preferred_loc) {
+   if (prange->preferred_loc ==
+  

[PATCH 12/35] drm/amdgpu: export vm update mapping interface

2021-01-06 Thread Felix Kuehling
From: Philip Yang 

It will be used by kfd to map svm ranges to GPUs. Because an svm range
does not have an amdgpu_bo and bo_va, it cannot use the amdgpu_bo_update
interface; use the amdgpu vm update interface directly.
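
A hedged sketch of calling the now-exported interface directly, without a
bo_va (pages_addr holds per-page system memory addresses; the variable
names are ours):

	r = amdgpu_vm_bo_update_mapping(adev, bo_adev, vm,
					false /* immediate */,
					false /* unlocked */,
					NULL /* resv */, start, last,
					pte_flags, 0 /* offset */,
					NULL /* nodes */, pages_addr,
					&fence);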

Signed-off-by: Philip Yang 
Signed-off-by: Felix Kuehling 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 17 -
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h | 10 ++
 2 files changed, 18 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index fdbe7d4e8b8b..9c557e8bf0e5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -1589,15 +1589,14 @@ static int amdgpu_vm_update_ptes(struct 
amdgpu_vm_update_params *params,
  * Returns:
  * 0 for success, -EINVAL for failure.
  */
-static int amdgpu_vm_bo_update_mapping(struct amdgpu_device *adev,
-  struct amdgpu_device *bo_adev,
-  struct amdgpu_vm *vm, bool immediate,
-  bool unlocked, struct dma_resv *resv,
-  uint64_t start, uint64_t last,
-  uint64_t flags, uint64_t offset,
-  struct drm_mm_node *nodes,
-  dma_addr_t *pages_addr,
-  struct dma_fence **fence)
+int amdgpu_vm_bo_update_mapping(struct amdgpu_device *adev,
+   struct amdgpu_device *bo_adev,
+   struct amdgpu_vm *vm, bool immediate,
+   bool unlocked, struct dma_resv *resv,
+   uint64_t start, uint64_t last, uint64_t flags,
+   uint64_t offset, struct drm_mm_node *nodes,
+   dma_addr_t *pages_addr,
+   struct dma_fence **fence)
 {
struct amdgpu_vm_update_params params;
enum amdgpu_sync_mode sync_mode;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
index 2bf4ef5fb3e1..73ca630520fd 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
@@ -366,6 +366,8 @@ struct amdgpu_vm_manager {
spinlock_t  pasid_lock;
 };
 
+struct amdgpu_bo_va_mapping;
+
 #define amdgpu_vm_copy_pte(adev, ib, pe, src, count) 
((adev)->vm_manager.vm_pte_funcs->copy_pte((ib), (pe), (src), (count)))
 #define amdgpu_vm_write_pte(adev, ib, pe, value, count, incr) 
((adev)->vm_manager.vm_pte_funcs->write_pte((ib), (pe), (value), (count), 
(incr)))
 #define amdgpu_vm_set_pte_pde(adev, ib, pe, addr, count, incr, flags) 
((adev)->vm_manager.vm_pte_funcs->set_pte_pde((ib), (pe), (addr), (count), 
(incr), (flags)))
@@ -397,6 +399,14 @@ int amdgpu_vm_clear_freed(struct amdgpu_device *adev,
  struct dma_fence **fence);
 int amdgpu_vm_handle_moved(struct amdgpu_device *adev,
   struct amdgpu_vm *vm);
+int amdgpu_vm_bo_update_mapping(struct amdgpu_device *adev,
+   struct amdgpu_device *bo_adev,
+   struct amdgpu_vm *vm, bool immediate,
+   bool unlocked, struct dma_resv *resv,
+   uint64_t start, uint64_t last, uint64_t flags,
+   uint64_t offset, struct drm_mm_node *nodes,
+   dma_addr_t *pages_addr,
+   struct dma_fence **fence);
 int amdgpu_vm_bo_update(struct amdgpu_device *adev,
struct amdgpu_bo_va *bo_va,
bool clear);
-- 
2.29.2

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH 13/35] drm/amdkfd: map svm range to GPUs

2021-01-06 Thread Felix Kuehling
From: Philip Yang 

Use amdgpu_vm_bo_update_mapping to update the GPU page tables, mapping or
unmapping the system memory page addresses of an svm range to the GPUs.

Signed-off-by: Philip Yang 
Signed-off-by: Alex Sierra 
Signed-off-by: Felix Kuehling 
---
 drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 232 ++-
 drivers/gpu/drm/amd/amdkfd/kfd_svm.h |   2 +
 2 files changed, 233 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index 55500ec4972f..3c4a036609c4 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -534,6 +534,229 @@ svm_range_split_add_front(struct svm_range *prange, 
struct svm_range *new,
return 0;
 }
 
+static uint64_t
+svm_range_get_pte_flags(struct amdgpu_device *adev, struct svm_range *prange)
+{
+   uint32_t flags = prange->flags;
+   uint32_t mapping_flags;
+   uint64_t pte_flags;
+
+   pte_flags = AMDGPU_PTE_VALID;
+   pte_flags |= AMDGPU_PTE_SYSTEM | AMDGPU_PTE_SNOOPED;
+
+   mapping_flags = AMDGPU_VM_PAGE_READABLE | AMDGPU_VM_PAGE_WRITEABLE;
+
+   if (flags & KFD_IOCTL_SVM_FLAG_GPU_RO)
+   mapping_flags &= ~AMDGPU_VM_PAGE_WRITEABLE;
+   if (flags & KFD_IOCTL_SVM_FLAG_GPU_EXEC)
+   mapping_flags |= AMDGPU_VM_PAGE_EXECUTABLE;
+   if (flags & KFD_IOCTL_SVM_FLAG_COHERENT)
+   mapping_flags |= AMDGPU_VM_MTYPE_UC;
+   else
+   mapping_flags |= AMDGPU_VM_MTYPE_NC;
+
+   /* TODO: add CHIP_ARCTURUS new flags for vram mapping */
+
+   pte_flags |= amdgpu_gem_va_map_flags(adev, mapping_flags);
+
+   /* Apply ASIC specific mapping flags */
+   amdgpu_gmc_get_vm_pte(adev, &prange->mapping, &pte_flags);
+
+   pr_debug("PTE flags 0x%llx\n", pte_flags);
+
+   return pte_flags;
+}
+
+static int
+svm_range_unmap_from_gpu(struct amdgpu_device *adev, struct amdgpu_vm *vm,
+struct svm_range *prange, struct dma_fence **fence)
+{
+   uint64_t init_pte_value = 0;
+   uint64_t start;
+   uint64_t last;
+
+   start = prange->it_node.start;
+   last = prange->it_node.last;
+
+   pr_debug("svms 0x%p [0x%llx 0x%llx]\n", prange->svms, start, last);
+
+   return amdgpu_vm_bo_update_mapping(adev, adev, vm, false, true, NULL,
+  start, last, init_pte_value, 0,
+  NULL, NULL, fence);
+}
+
+static int
+svm_range_unmap_from_gpus(struct svm_range *prange)
+{
+   DECLARE_BITMAP(bitmap, MAX_GPU_INSTANCE);
+   struct kfd_process_device *pdd;
+   struct dma_fence *fence = NULL;
+   struct amdgpu_device *adev;
+   struct kfd_process *p;
+   struct kfd_dev *dev;
+   uint32_t gpuidx;
+   int r = 0;
+
+   bitmap_or(bitmap, prange->bitmap_access, prange->bitmap_aip,
+ MAX_GPU_INSTANCE);
+   p = container_of(prange->svms, struct kfd_process, svms);
+
+   for_each_set_bit(gpuidx, bitmap, MAX_GPU_INSTANCE) {
+   pr_debug("unmap from gpu idx 0x%x\n", gpuidx);
+   r = kfd_process_device_from_gpuidx(p, gpuidx, &dev);
+   if (r) {
+   pr_debug("failed to find device idx %d\n", gpuidx);
+   return -EINVAL;
+   }
+
+   pdd = kfd_bind_process_to_device(dev, p);
+   if (IS_ERR(pdd))
+   return -EINVAL;
+
+   adev = (struct amdgpu_device *)dev->kgd;
+
+   r = svm_range_unmap_from_gpu(adev, pdd->vm, prange, &fence);
+   if (r)
+   break;
+
+   if (fence) {
+   r = dma_fence_wait(fence, false);
+   dma_fence_put(fence);
+   fence = NULL;
+   if (r)
+   break;
+   }
+
+   amdgpu_amdkfd_flush_gpu_tlb_pasid((struct kgd_dev *)adev,
+ p->pasid);
+   }
+
+   return r;
+}
+
+static int svm_range_bo_validate(void *param, struct amdgpu_bo *bo)
+{
+   struct ttm_operation_ctx ctx = { false, false };
+
+   amdgpu_bo_placement_from_domain(bo, AMDGPU_GEM_DOMAIN_VRAM);
+
+   return ttm_bo_validate(&bo->tbo, &bo->placement, &ctx);
+}
+
+static int
+svm_range_map_to_gpu(struct amdgpu_device *adev, struct amdgpu_vm *vm,
+struct svm_range *prange, bool reserve_vm,
+struct dma_fence **fence)
+{
+   struct amdgpu_bo *root;
+   dma_addr_t *pages_addr;
+   uint64_t pte_flags;
+   int r = 0;
+
+   pr_debug("svms 0x%p [0x%lx 0x%lx]\n", prange->svms,
+prange->it_node.start, prange->it_node.last);
+
+   if (reserve_vm) {
+   root = amdgpu_bo_ref(vm->root.base.bo);
+   r = amdgpu_bo_reserve(root, true);
+   if (r) {
+   pr_debug("fai

[PATCH 11/35] drm/amdkfd: deregister svm range

2021-01-06 Thread Felix Kuehling
From: Philip Yang 

When an application explicitly calls unmap, or unmap is called from
mmput on application exit, the driver receives an MMU_NOTIFY_UNMAP
event. It first removes the svm range from the process svms object tree
and list, then unmaps it from the GPUs (in a following patch).

Split the svm ranges to handle unmapping of a partial svm range.

Signed-off-by: Philip Yang 
Signed-off-by: Alex Sierra 
Signed-off-by: Felix Kuehling 
---
 drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 86 
 1 file changed, 86 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index ad007261f54c..55500ec4972f 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -699,15 +699,101 @@ static void svm_range_srcu_free_work(struct work_struct 
*work_struct)
mutex_unlock(&svms->free_list_lock);
 }
 
+static void
+svm_range_unmap_from_cpu(struct mm_struct *mm, unsigned long start,
+unsigned long last)
+{
+   struct list_head remove_list;
+   struct list_head update_list;
+   struct list_head insert_list;
+   struct svm_range_list *svms;
+   struct svm_range new = {0};
+   struct svm_range *prange;
+   struct svm_range *tmp;
+   struct kfd_process *p;
+   int r;
+
+   p = kfd_lookup_process_by_mm(mm);
+   if (!p)
+   return;
+   svms = &p->svms;
+
+   pr_debug("notifier svms 0x%p [0x%lx 0x%lx]\n", svms, start, last);
+
+   svms_lock(svms);
+
+   r = svm_range_handle_overlap(svms, &new, start, last, &update_list,
+&insert_list, &remove_list, NULL);
+   if (r) {
+   svms_unlock(svms);
+   kfd_unref_process(p);
+   return;
+   }
+
+   mutex_lock(&svms->free_list_lock);
+   list_for_each_entry_safe(prange, tmp, &remove_list, remove_list) {
+   pr_debug("remove svms 0x%p [0x%lx 0x%lx]\n", prange->svms,
+prange->it_node.start, prange->it_node.last);
+   svm_range_unlink(prange);
+
+   pr_debug("schedule to free svms 0x%p [0x%lx 0x%lx]\n",
+prange->svms, prange->it_node.start,
+prange->it_node.last);
+   list_add_tail(&prange->remove_list, &svms->free_list);
+   }
+   if (!list_empty(&svms->free_list))
+   schedule_work(&svms->srcu_free_work);
+   mutex_unlock(&svms->free_list_lock);
+
+   /* prange in update_list is being unmapped from the cpu, remove it
+* from the insert list
+*/
+   list_for_each_entry_safe(prange, tmp, &update_list, update_list) {
+   list_del(&prange->list);
+   mutex_lock(&svms->free_list_lock);
+   list_add_tail(&prange->remove_list, &svms->free_list);
+   mutex_unlock(&svms->free_list_lock);
+   }
+   mutex_lock(&svms->free_list_lock);
+   if (!list_empty(&svms->free_list))
+   schedule_work(&svms->srcu_free_work);
+   mutex_unlock(&svms->free_list_lock);
+
+   list_for_each_entry_safe(prange, tmp, &insert_list, list)
+   svm_range_add_to_svms(prange);
+
+   svms_unlock(svms);
+   kfd_unref_process(p);
+}
+
 /**
  * svm_range_cpu_invalidate_pagetables - interval notifier callback
  *
+ * MMU range unmap notifier to remove svm ranges
  */
 static bool
 svm_range_cpu_invalidate_pagetables(struct mmu_interval_notifier *mni,
const struct mmu_notifier_range *range,
unsigned long cur_seq)
 {
+   unsigned long start = range->start >> PAGE_SHIFT;
+   unsigned long last = (range->end - 1) >> PAGE_SHIFT;
+   struct svm_range_list *svms;
+
+   svms = container_of(mni, struct svm_range_list, notifier);
+
+   if (range->event == MMU_NOTIFY_RELEASE) {
+   pr_debug("cpu release range [0x%lx 0x%lx]\n", range->start,
+range->end - 1);
+   return true;
+   }
+   if (range->event == MMU_NOTIFY_UNMAP) {
+   pr_debug("mm 0x%p unmap range [0x%lx 0x%lx]\n", range->mm,
+start, last);
+   svm_range_unmap_from_cpu(mni->mm, start, last);
+   return true;
+   }
+
return true;
 }
 
-- 
2.29.2



[PATCH 16/35] drm/amdkfd: add ioctl to configure and query xnack retries

2021-01-06 Thread Felix Kuehling
From: Alex Sierra 

XNACK retries are used for page fault recovery. Some AMD chip
families support continuous retries while page table entries are invalid.
The driver must handle the page fault interrupt and fill in a valid entry
for the GPU to continue.

This ioctl allows enabling/disabling XNACK retries per KFD process.
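For illustration, a minimal user-space sketch of the query-then-set
pattern, assuming the uapi header is installed as <linux/kfd_ioctl.h>
(error handling abbreviated):

#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <linux/kfd_ioctl.h>

int main(void)
{
	struct kfd_ioctl_set_xnack_mode_args args = { .xnack_enabled = -1 };
	int fd = open("/dev/kfd", O_RDWR);

	if (fd < 0)
		return 1;
	/* negative input queries the current mode without changing it */
	if (ioctl(fd, AMDKFD_IOC_SET_XNACK_MODE, &args) == 0)
		printf("current XNACK mode: %d\n", args.xnack_enabled);

	/* enabling may fail with EPERM (unsupported GPU) or EBUSY (queues) */
	args.xnack_enabled = 1;
	if (ioctl(fd, AMDKFD_IOC_SET_XNACK_MODE, &args) != 0)
		perror("enable XNACK");
	return 0;
}

Per the restriction described in the uapi comment below, this must be
done before any user mode queues are created in the process.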

Signed-off-by: Alex Sierra 
Signed-off-by: Felix Kuehling 
---
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 28 +++
 include/uapi/linux/kfd_ioctl.h   | 43 +++-
 2 files changed, 70 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index 2d3ba7e806d5..a9a6a7c8ff21 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -1747,6 +1747,31 @@ static int kfd_ioctl_smi_events(struct file *filep,
return kfd_smi_event_open(dev, &args->anon_fd);
 }
 
+static int kfd_ioctl_set_xnack_mode(struct file *filep,
+   struct kfd_process *p, void *data)
+{
+   struct kfd_ioctl_set_xnack_mode_args *args = data;
+   int r = 0;
+
+   mutex_lock(&p->mutex);
+   if (args->xnack_enabled >= 0) {
+   if (!list_empty(&p->pqm.queues)) {
+   pr_debug("Process has user queues running\n");
+   mutex_unlock(&p->mutex);
+   return -EBUSY;
+   }
+   if (args->xnack_enabled && !kfd_process_xnack_supported(p))
+   r = -EPERM;
+   else
+   p->xnack_enabled = args->xnack_enabled;
+   } else {
+   args->xnack_enabled = p->xnack_enabled;
+   }
+   mutex_unlock(&p->mutex);
+
+   return r;
+}
+
 static int kfd_ioctl_svm(struct file *filep, struct kfd_process *p, void *data)
 {
struct kfd_ioctl_svm_args *args = data;
@@ -1870,6 +1895,9 @@ static const struct amdkfd_ioctl_desc amdkfd_ioctls[] = {
kfd_ioctl_smi_events, 0),
 
AMDKFD_IOCTL_DEF(AMDKFD_IOC_SVM, kfd_ioctl_svm, 0),
+
+   AMDKFD_IOCTL_DEF(AMDKFD_IOC_SET_XNACK_MODE,
+   kfd_ioctl_set_xnack_mode, 0),
 };
 
 #define AMDKFD_CORE_IOCTL_COUNTARRAY_SIZE(amdkfd_ioctls)
diff --git a/include/uapi/linux/kfd_ioctl.h b/include/uapi/linux/kfd_ioctl.h
index 5d4a4b3e0b61..b1a45cd37ab7 100644
--- a/include/uapi/linux/kfd_ioctl.h
+++ b/include/uapi/linux/kfd_ioctl.h
@@ -593,6 +593,44 @@ struct kfd_ioctl_svm_args {
struct kfd_ioctl_svm_attribute attrs[0];
 };
 
+/**
+ * kfd_ioctl_set_xnack_mode_args - Arguments for set_xnack_mode
+ *
+ * @xnack_enabled:   [in/out] Whether to enable XNACK mode for this process
+ *
+ * @xnack_enabled indicates whether recoverable page faults should be
+ * enabled for the current process. 0 means disabled, positive means
+ * enabled, negative means leave unchanged. If enabled, virtual address
+ * translations on GFXv9 and later AMD GPUs can return XNACK and retry
+ * the access until a valid PTE is available. This is used to implement
+ * device page faults.
+ *
+ * On output, @xnack_enabled returns the (new) current mode (0 or
+ * positive). Therefore, a negative input value can be used to query
+ * the current mode without changing it.
+ *
+ * The XNACK mode fundamentally changes the way SVM managed memory works
+ * in the driver, with subtle effects on application performance and
+ * functionality.
+ *
+ * Enabling XNACK mode requires shader programs to be compiled
+ * differently. Furthermore, not all GPUs support changing the mode
+ * per-process. Therefore changing the mode is only allowed while no
+ * user mode queues exist in the process. This ensures that no shader
+ * code is running that may have been compiled for the wrong mode.
+ * GPUs that cannot change to the requested mode will prevent that
+ * mode from being enabled. All GPUs used by the process must be in the
+ * same XNACK mode.
+ *
+ * GFXv8 or older GPUs do not support 48 bit virtual addresses or SVM.
+ * Therefore those GPUs are not considered for the XNACK mode switch.
+ *
+ * Return: 0 on success, -errno on failure
+ */
+struct kfd_ioctl_set_xnack_mode_args {
+   __s32 xnack_enabled;
+};
+
 #define AMDKFD_IOCTL_BASE 'K'
 #define AMDKFD_IO(nr)  _IO(AMDKFD_IOCTL_BASE, nr)
 #define AMDKFD_IOR(nr, type)   _IOR(AMDKFD_IOCTL_BASE, nr, type)
@@ -695,7 +733,10 @@ struct kfd_ioctl_svm_args {
 
 #define AMDKFD_IOC_SVM AMDKFD_IOWR(0x20, struct kfd_ioctl_svm_args)
 
+#define AMDKFD_IOC_SET_XNACK_MODE  \
+   AMDKFD_IOWR(0x21, struct kfd_ioctl_set_xnack_mode_args)
+
 #define AMDKFD_COMMAND_START   0x01
-#define AMDKFD_COMMAND_END 0x21
+#define AMDKFD_COMMAND_END 0x22
 
 #endif
-- 
2.29.2



[PATCH 14/35] drm/amdkfd: svm range eviction and restore

2021-01-06 Thread Felix Kuehling
From: Philip Yang 

The HMM interval notifier callback signals that the CPU page table will
be updated. Stop the process queues if the updated address belongs to an
svm range registered in the process svms objects tree, and schedule
restore work to update the GPU page table using the new page addresses
of the updated svm range.

The svm restore work uses SRCU to scan the svms list, to avoid a
deadlock between the following two cases:

case 1: svm restore work takes the svm lock to scan the svms list, then
calls hmm_page_fault, which takes mm->mmap_sem.
case 2: the unmap event callback and the set_attr ioctl take
mm->mmap_sem, then take the svm lock to add/remove ranges.

Calling synchronize_srcu in the unmap event callback would deadlock with
the restore work, because the restore work may wait for the unmap event
to finish before taking mm->mmap_sem. Instead, schedule srcu_free_work
to wait until the SRCU read-side critical section in the restore work is
done, and only then free the svm ranges.
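The resulting reader/writer pattern looks roughly like this (a
simplified sketch of the scheme, not the exact driver code):

	/* reader (restore work): list walk without holding the svm lock */
	idx = srcu_read_lock(&svms->srcu);
	list_for_each_entry_rcu(prange, &svms->list, list) {
		/* validate pages and update GPU mappings */
	}
	srcu_read_unlock(&svms->srcu, idx);

	/* writer (unmap callback): unlink now, defer the free to a worker
	 * that can sleep in synchronize_srcu() without holding mmap_sem */
	list_del_rcu(&prange->list);
	list_add_tail(&prange->remove_list, &svms->free_list);
	schedule_work(&svms->srcu_free_work);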

Signed-off-by: Philip Yang 
Signed-off-by: Felix Kuehling 
---
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h|   2 +
 drivers/gpu/drm/amd/amdkfd/kfd_process.c |   1 +
 drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 169 ++-
 drivers/gpu/drm/amd/amdkfd/kfd_svm.h |   2 +
 4 files changed, 169 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h 
b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index 97cf267b6f51..f1e95773e19b 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -736,6 +736,8 @@ struct svm_range_list {
struct list_headfree_list;
struct mutexfree_list_lock;
struct mmu_interval_notifiernotifier;
+   atomic_tevicted_ranges;
+   struct delayed_work restore_work;
 };
 
 /* Process data */
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
index 791f17308b1b..0f31538b2a91 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
@@ -1048,6 +1048,7 @@ static void kfd_process_notifier_release(struct 
mmu_notifier *mn,
 
cancel_delayed_work_sync(&p->eviction_work);
cancel_delayed_work_sync(&p->restore_work);
+   cancel_delayed_work_sync(&p->svms.restore_work);
 
mutex_lock(&p->mutex);
 
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index 3c4a036609c4..e3ba6e7262a7 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -21,6 +21,7 @@
  */
 
 #include 
+#include 
 #include "amdgpu_sync.h"
 #include "amdgpu_object.h"
 #include "amdgpu_vm.h"
@@ -28,6 +29,8 @@
 #include "kfd_priv.h"
 #include "kfd_svm.h"
 
+#define AMDGPU_SVM_RANGE_RESTORE_DELAY_MS 1
+
 /**
  * svm_range_unlink - unlink svm_range from lists and interval tree
  * @prange: svm range structure to be removed
@@ -99,6 +102,7 @@ svm_range *svm_range_new(struct svm_range_list *svms, 
uint64_t start,
INIT_LIST_HEAD(&prange->list);
INIT_LIST_HEAD(&prange->update_list);
INIT_LIST_HEAD(&prange->remove_list);
+   atomic_set(&prange->invalid, 0);
svm_range_set_default_attributes(&prange->preferred_loc,
 &prange->prefetch_loc,
 &prange->granularity, &prange->flags);
@@ -191,6 +195,10 @@ svm_range_validate(struct mm_struct *mm, struct svm_range 
*prange)
 
r = svm_range_validate_ram(mm, prange);
 
+   pr_debug("svms 0x%p [0x%lx 0x%lx] ret %d invalid %d\n", prange->svms,
+prange->it_node.start, prange->it_node.last,
+r, atomic_read(&prange->invalid));
+
return r;
 }
 
@@ -757,6 +765,151 @@ static int svm_range_map_to_gpus(struct svm_range 
*prange, bool reserve_vm)
return r;
 }
 
+static void svm_range_restore_work(struct work_struct *work)
+{
+   struct delayed_work *dwork = to_delayed_work(work);
+   struct amdkfd_process_info *process_info;
+   struct svm_range_list *svms;
+   struct svm_range *prange;
+   struct kfd_process *p;
+   struct mm_struct *mm;
+   int evicted_ranges;
+   int srcu_idx;
+   int invalid;
+   int r;
+
+   svms = container_of(dwork, struct svm_range_list, restore_work);
+   evicted_ranges = atomic_read(&svms->evicted_ranges);
+   if (!evicted_ranges)
+   return;
+
+   pr_debug("restore svm ranges\n");
+
+   /* kfd_process_notifier_release destroys this worker thread. So during
+* the lifetime of this thread, kfd_process and mm will be valid.
+*/
+   p = container_of(svms, struct kfd_process, svms);
+   process_info = p->kgd_process_info;
+   mm = p->mm;
+   if (!mm)
+   return;
+
+   mutex_lock(&process_info->lock);
+   mmap_read_lock(mm);
+   srcu_idx = srcu_read_lock(&svms->srcu);
+
+   list_for_each_entry_rcu(prange, &svms->list, list) {
+   invalid = atomic_read(&prange->in

[PATCH 20/35] drm/amdkfd: copy memory through gart table

2021-01-06 Thread Felix Kuehling
From: Philip Yang 

Use sdma linear copy to migrate data between ram and vram. The sdma
linear copy command uses the kernel buffer function queue to access
system memory through the gart table.

Use reserved gart table window 0 to map system page addresses, while
vram page addresses use direct mapping. The same kernel buffer function
fills in the gart table mapping, so the mapping is serialized with the
memory copy by sdma job submission. For large buffer migrations we only
need to wait for the last memory copy sdma fence.
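Concretely, each SDMA job carries the copy packets followed by the GART
PTEs for the chunk being mapped; a rough sketch of the IB sizing,
mirroring svm_migrate_gart_map in the hunk below:

	/* one 8-byte GART PTE per page, appended after the copy packets */
	num_dw = ALIGN(adev->mman.buffer_funcs->copy_num_dw, 8);
	num_bytes = npages * 8;
	r = amdgpu_job_alloc_with_ib(adev, num_dw * 4 + num_bytes,
				     AMDGPU_IB_POOL_DELAYED, &job);

Because every chunk is submitted to the same buffer function ring, the
GART update, the copy, and the next chunk's GART update execute in
order.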

Signed-off-by: Philip Yang 
Signed-off-by: Felix Kuehling 
---
 drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 172 +++
 drivers/gpu/drm/amd/amdkfd/kfd_migrate.h |   5 +
 2 files changed, 177 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
index 1950b86f1562..f2019c8f0b80 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
@@ -32,6 +32,178 @@
 #include "kfd_svm.h"
 #include "kfd_migrate.h"
 
+static uint64_t
+svm_migrate_direct_mapping_addr(struct amdgpu_device *adev, uint64_t addr)
+{
+   return addr + amdgpu_ttm_domain_start(adev, TTM_PL_VRAM);
+}
+
+static int
+svm_migrate_gart_map(struct amdgpu_ring *ring, uint64_t npages,
+uint64_t *addr, uint64_t *gart_addr, uint64_t flags)
+{
+   struct amdgpu_device *adev = ring->adev;
+   struct amdgpu_job *job;
+   unsigned int num_dw, num_bytes;
+   struct dma_fence *fence;
+   uint64_t src_addr, dst_addr;
+   uint64_t pte_flags;
+   void *cpu_addr;
+   int r;
+
+   /* use gart window 0 */
+   *gart_addr = adev->gmc.gart_start;
+
+   num_dw = ALIGN(adev->mman.buffer_funcs->copy_num_dw, 8);
+   num_bytes = npages * 8;
+
+   r = amdgpu_job_alloc_with_ib(adev, num_dw * 4 + num_bytes,
+AMDGPU_IB_POOL_DELAYED, &job);
+   if (r)
+   return r;
+
+   src_addr = num_dw * 4;
+   src_addr += job->ibs[0].gpu_addr;
+
+   dst_addr = amdgpu_bo_gpu_offset(adev->gart.bo);
+   amdgpu_emit_copy_buffer(adev, &job->ibs[0], src_addr,
+   dst_addr, num_bytes, false);
+
+   amdgpu_ring_pad_ib(ring, &job->ibs[0]);
+   WARN_ON(job->ibs[0].length_dw > num_dw);
+
+   pte_flags = AMDGPU_PTE_VALID | AMDGPU_PTE_READABLE;
+   pte_flags |= AMDGPU_PTE_SYSTEM | AMDGPU_PTE_SNOOPED;
+   if (!(flags & KFD_IOCTL_SVM_FLAG_GPU_RO))
+   pte_flags |= AMDGPU_PTE_WRITEABLE;
+   pte_flags |= adev->gart.gart_pte_flags;
+
+   cpu_addr = &job->ibs[0].ptr[num_dw];
+
+   r = amdgpu_gart_map(adev, 0, npages, addr, pte_flags, cpu_addr);
+   if (r)
+   goto error_free;
+
+   r = amdgpu_job_submit(job, &adev->mman.entity,
+ AMDGPU_FENCE_OWNER_UNDEFINED, &fence);
+   if (r)
+   goto error_free;
+
+   dma_fence_put(fence);
+
+   return r;
+
+error_free:
+   amdgpu_job_free(job);
+   return r;
+}
+
+/**
+ * svm_migrate_copy_memory_gart - sdma copy data between ram and vram
+ *
+ * @adev: amdgpu device the sdma ring running
+ * @src: source page address array
+ * @dst: destination page address array
+ * @npages: number of pages to copy
+ * @direction: enum MIGRATION_COPY_DIR
+ * @mfence: output, sdma fence to signal after sdma is done
+ *
+ * ram address uses GART table continuous entries mapping to ram pages,
+ * vram address uses direct mapping of vram pages, which must have npages
+ * number of continuous pages.
+ * GART updates and sdma copies use the same buffer function ring; the sdma
+ * work is split into multiple GTT_MAX_PAGES transfers and all sdma
+ * operations are serialized, so waiting for the last sdma finish fence,
+ * which is returned, confirms that the memory copy is done.
+ *
+ * Context: Process context, takes and releases gtt_window_lock
+ *
+ * Return:
+ * 0 - OK, otherwise error code
+ */
+
+static int
+svm_migrate_copy_memory_gart(struct amdgpu_device *adev, uint64_t *src,
+uint64_t *dst, uint64_t npages,
+enum MIGRATION_COPY_DIR direction,
+struct dma_fence **mfence)
+{
+   const uint64_t GTT_MAX_PAGES = AMDGPU_GTT_MAX_TRANSFER_SIZE;
+   struct amdgpu_ring *ring = adev->mman.buffer_funcs_ring;
+   uint64_t gart_s, gart_d;
+   struct dma_fence *next;
+   uint64_t size;
+   int r;
+
+   mutex_lock(&adev->mman.gtt_window_lock);
+
+   while (npages) {
+   size = min(GTT_MAX_PAGES, npages);
+
+   if (direction == FROM_VRAM_TO_RAM) {
+   gart_s = svm_migrate_direct_mapping_addr(adev, *src);
+   r = svm_migrate_gart_map(ring, size, dst, &gart_d, 0);
+
+   } else if (direction == FROM_RAM_TO_VRAM) {
+   r = svm_migrate_gart_map(ring, size, src, &gart_s,
+KFD_IOCTL_SVM_FLAG_GPU_RO);

[PATCH 17/35] drm/amdkfd: register HMM device private zone

2021-01-06 Thread Felix Kuehling
From: Philip Yang 

Register vram memory as a MEMORY_DEVICE_PRIVATE type resource, so that
vram backing pages can be allocated for page migration.
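The registration follows the usual device-private memory pattern; a
minimal sketch (the resource handling in the patch below is truncated
in this archive, and the owner field is an assumption):

	res = devm_request_free_mem_region(adev->dev, &iomem_resource, size);
	if (IS_ERR(res))
		return PTR_ERR(res);

	pgmap->type = MEMORY_DEVICE_PRIVATE;
	pgmap->res = *res;			/* covers the VRAM-sized region */
	pgmap->ops = &svm_migrate_pgmap_ops;	/* .page_free/.migrate_to_ram */
	r = devm_memremap_pages(adev->dev, pgmap);	/* ERR_PTR on failure */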

Signed-off-by: Philip Yang 
Signed-off-by: Felix Kuehling 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c |   3 +
 drivers/gpu/drm/amd/amdkfd/Makefile|   3 +-
 drivers/gpu/drm/amd/amdkfd/kfd_migrate.c   | 101 +
 drivers/gpu/drm/amd/amdkfd/kfd_migrate.h   |  48 ++
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h  |   3 +
 5 files changed, 157 insertions(+), 1 deletion(-)
 create mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
 create mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_migrate.h

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
index db96d69eb45e..562bb5b69137 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
@@ -30,6 +30,7 @@
 #include 
 #include "amdgpu_xgmi.h"
 #include 
+#include "kfd_migrate.h"
 
 /* Total memory size in system memory and all GPU VRAM. Used to
  * estimate worst case amount of memory to reserve for page tables
@@ -170,12 +171,14 @@ void amdgpu_amdkfd_device_init(struct amdgpu_device *adev)
}
 
kgd2kfd_device_init(adev->kfd.dev, adev_to_drm(adev), 
&gpu_resources);
+   svm_migrate_init(adev);
}
 }
 
 void amdgpu_amdkfd_device_fini(struct amdgpu_device *adev)
 {
if (adev->kfd.dev) {
+   svm_migrate_fini(adev);
kgd2kfd_device_exit(adev->kfd.dev);
adev->kfd.dev = NULL;
}
diff --git a/drivers/gpu/drm/amd/amdkfd/Makefile 
b/drivers/gpu/drm/amd/amdkfd/Makefile
index 387ce0217d35..a93301dbc464 100644
--- a/drivers/gpu/drm/amd/amdkfd/Makefile
+++ b/drivers/gpu/drm/amd/amdkfd/Makefile
@@ -55,7 +55,8 @@ AMDKFD_FILES  := $(AMDKFD_PATH)/kfd_module.o \
$(AMDKFD_PATH)/kfd_dbgmgr.o \
$(AMDKFD_PATH)/kfd_smi_events.o \
$(AMDKFD_PATH)/kfd_crat.o \
-   $(AMDKFD_PATH)/kfd_svm.o
+   $(AMDKFD_PATH)/kfd_svm.o \
+   $(AMDKFD_PATH)/kfd_migrate.o
 
 ifneq ($(CONFIG_AMD_IOMMU_V2),)
 AMDKFD_FILES += $(AMDKFD_PATH)/kfd_iommu.o
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
new file mode 100644
index ..1950b86f1562
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
@@ -0,0 +1,101 @@
+/*
+ * Copyright 2020 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include "amdgpu_sync.h"
+#include "amdgpu_object.h"
+#include "amdgpu_vm.h"
+#include "amdgpu_mn.h"
+#include "kfd_priv.h"
+#include "kfd_svm.h"
+#include "kfd_migrate.h"
+
+static void svm_migrate_page_free(struct page *page)
+{
+}
+
+/**
+ * svm_migrate_to_ram - CPU page fault handler
+ * @vmf: CPU vm fault vma, address
+ *
+ * Context: vm fault handler, mm->mmap_sem is taken
+ *
+ * Return:
+ * 0 - OK
+ * VM_FAULT_SIGBUS - notify the application with a SIGBUS page fault
+ */
+static vm_fault_t svm_migrate_to_ram(struct vm_fault *vmf)
+{
+   return VM_FAULT_SIGBUS;
+}
+
+static const struct dev_pagemap_ops svm_migrate_pgmap_ops = {
+   .page_free  = svm_migrate_page_free,
+   .migrate_to_ram = svm_migrate_to_ram,
+};
+
+int svm_migrate_init(struct amdgpu_device *adev)
+{
+   struct kfd_dev *kfddev = adev->kfd.dev;
+   struct dev_pagemap *pgmap;
+   struct resource *res;
+   unsigned long size;
+   void *r;
+
+   /* Page migration works on Vega10 or newer */
+   if (kfddev->device_info->asic_family < CHIP_VEGA10)
+   return -EINVAL;
+
+   pgmap = &kfddev->pgmap;
+   memset(pgmap, 0, sizeof(*pgmap));
+
+   /* TODO: register all vram to HMM for now.
+* should remove reserved size
+*/
+   size = ALIGN(adev->gmc.real_vr

[PATCH 15/35] drm/amdkfd: add xnack enabled flag to kfd_process

2021-01-06 Thread Felix Kuehling
From: Alex Sierra 

This flag is used when the CPU invalidates page tables, to decide
between queue eviction and page fault handling (see the sketch below).
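A simplified sketch of the decision this enables in the CPU
invalidation path (the actual wiring lands in a later patch of this
series; the quiesce call is an assumption about how queues are stopped):

	if (p->xnack_enabled) {
		/* retry faults will repopulate mappings on demand */
		svm_range_unmap_from_gpus(prange);
	} else {
		/* no retry: stop the queues and schedule restore work */
		kgd2kfd_quiesce_mm(mm);
		schedule_delayed_work(&svms->restore_work, delay);
	}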

Signed-off-by: Alex Sierra 
Signed-off-by: Felix Kuehling 
---
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h|  4 +++
 drivers/gpu/drm/amd/amdkfd/kfd_process.c | 36 
 2 files changed, 40 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h 
b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index f1e95773e19b..7a4b4b6dcf32 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -821,6 +821,8 @@ struct kfd_process {
 
/* shared virtual memory registered by this process */
struct svm_range_list svms;
+
+   bool xnack_enabled;
 };
 
 #define KFD_PROCESS_TABLE_SIZE 5 /* bits: 32 entries */
@@ -874,6 +876,8 @@ struct kfd_process_device 
*kfd_get_process_device_data(struct kfd_dev *dev,
 struct kfd_process_device *kfd_create_process_device_data(struct kfd_dev *dev,
struct kfd_process *p);
 
+bool kfd_process_xnack_supported(struct kfd_process *p);
+
 int kfd_reserved_mem_mmap(struct kfd_dev *dev, struct kfd_process *process,
  struct vm_area_struct *vma);
 
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
index 0f31538b2a91..f7a50a364d78 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
@@ -1157,6 +1157,39 @@ static int kfd_process_device_init_cwsr_dgpu(struct 
kfd_process_device *pdd)
return 0;
 }
 
+bool kfd_process_xnack_supported(struct kfd_process *p)
+{
+   int i;
+
+   /* On most GFXv9 GPUs, the retry mode in the SQ must match the
+* boot time retry setting. Mixing processes with different
+* XNACK/retry settings can hang the GPU.
+*
+* Different GPUs can have different noretry settings depending
+* on HW bugs or limitations. We need to find at least one
+* XNACK mode for this process that's compatible with all GPUs.
+* Fortunately GPUs with retry enabled (noretry=0) can run code
+* built for XNACK-off. On GFXv9 it may perform slower.
+*
+* Therefore applications built for XNACK-off can always be
+* supported and will be our fallback if any GPU does not
+* support retry.
+*/
+   for (i = 0; i < p->n_pdds; i++) {
+   struct kfd_dev *dev = p->pdds[i]->dev;
+
+   /* Only consider GFXv9 and higher GPUs. Older GPUs don't
+* support the SVM APIs and don't need to be considered
+* for the XNACK mode selection.
+*/
+   if (dev->device_info->asic_family >= CHIP_VEGA10 &&
+   dev->noretry)
+   return false;
+   }
+
+   return true;
+}
+
 /*
  * On return the kfd_process is fully operational and will be freed when the
  * mm is released
@@ -1194,6 +1227,9 @@ static struct kfd_process *create_process(const struct 
task_struct *thread)
if (err != 0)
goto err_init_apertures;
 
+   /* Check XNACK support after PDDs are created in kfd_init_apertures */
+   process->xnack_enabled = kfd_process_xnack_supported(process);
+
err = svm_range_list_init(process);
if (err)
goto err_init_svm_range_list;
-- 
2.29.2



[PATCH 18/35] drm/amdkfd: validate vram svm range from TTM

2021-01-06 Thread Felix Kuehling
From: Philip Yang 

If the svm range prefetch location is not zero, use TTM to allocate
amdgpu_bo vram nodes to validate the svm range, then map the vram nodes
to the GPUs.

Use an offset to sub-allocate from the same amdgpu_bo, to handle
overlapping vram ranges while adding a new range or unmapping a range.

svm_bo has a ref count to track the shared ranges. When all ranges of a
shared amdgpu_bo have migrated to ram, the ref count becomes 0 and the
amdgpu_bo is released; every range's svm_bo pointer is set to NULL.

To migrate a range from ram back to vram, allocate from the same
amdgpu_bo at the previous offset if the range still has an svm_bo.
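In kref terms the sharing works roughly like this (a simplified sketch
of the reuse path in svm_range_vram_node_new below):

	if (svm_bo_ref_unless_zero(prange->svm_bo)) {
		/* BO still alive: reuse it at the range's previous offset */
		prange->mm_nodes = prange->svm_bo->bo->tbo.mem.mm_node;
	} else {
		/* last sharer was released: allocate a fresh amdgpu_bo */
		prange->svm_bo = svm_range_bo_new();
		/* ... amdgpu_bo_create, add prange to svm_bo->range_list ... */
	}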

Signed-off-by: Philip Yang 
Signed-off-by: Alex Sierra 
Signed-off-by: Felix Kuehling 
---
 drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 342 ---
 drivers/gpu/drm/amd/amdkfd/kfd_svm.h |  20 ++
 2 files changed, 335 insertions(+), 27 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index e3ba6e7262a7..7d91dc49a5a9 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -35,7 +35,9 @@
  * svm_range_unlink - unlink svm_range from lists and interval tree
  * @prange: svm range structure to be removed
  *
- * Remove the svm range from svms interval tree and link list
+ * Remove the svm_range from the svms and svm_bo SRCU lists and the svms
+ * interval tree. After this call, synchronize_srcu is needed before the
+ * range can be freed safely.
  *
  * Context: The caller must hold svms_lock
  */
@@ -44,6 +46,12 @@ static void svm_range_unlink(struct svm_range *prange)
pr_debug("prange 0x%p [0x%lx 0x%lx]\n", prange, prange->it_node.start,
 prange->it_node.last);
 
+   if (prange->svm_bo) {
+   spin_lock(&prange->svm_bo->list_lock);
+   list_del(&prange->svm_bo_list);
+   spin_unlock(&prange->svm_bo->list_lock);
+   }
+
list_del_rcu(&prange->list);
interval_tree_remove(&prange->it_node, &prange->svms->objects);
 }
@@ -70,6 +78,12 @@ static void svm_range_remove(struct svm_range *prange)
pr_debug("svms 0x%p [0x%lx 0x%lx]\n", prange->svms,
 prange->it_node.start, prange->it_node.last);
 
+   if (prange->mm_nodes) {
+   pr_debug("vram prange svms 0x%p [0x%lx 0x%lx]\n", prange->svms,
+prange->it_node.start, prange->it_node.last);
+   svm_range_vram_node_free(prange);
+   }
+
kvfree(prange->pages_addr);
kfree(prange);
 }
@@ -102,7 +116,9 @@ svm_range *svm_range_new(struct svm_range_list *svms, 
uint64_t start,
INIT_LIST_HEAD(&prange->list);
INIT_LIST_HEAD(&prange->update_list);
INIT_LIST_HEAD(&prange->remove_list);
+   INIT_LIST_HEAD(&prange->svm_bo_list);
atomic_set(&prange->invalid, 0);
+   spin_lock_init(&prange->svm_bo_lock);
svm_range_set_default_attributes(&prange->preferred_loc,
 &prange->prefetch_loc,
 &prange->granularity, &prange->flags);
@@ -139,6 +155,16 @@ svm_get_supported_dev_by_id(struct kfd_process *p, 
uint32_t gpu_id,
return dev;
 }
 
+struct amdgpu_device *
+svm_range_get_adev_by_id(struct svm_range *prange, uint32_t gpu_id)
+{
+   struct kfd_process *p =
+   container_of(prange->svms, struct kfd_process, svms);
+   struct kfd_dev *dev = svm_get_supported_dev_by_id(p, gpu_id, NULL);
+
+   return dev ? (struct amdgpu_device *)dev->kgd : NULL;
+}
+
 /**
  * svm_range_validate_ram - get system memory pages of svm range
  *
@@ -186,14 +212,226 @@ svm_range_validate_ram(struct mm_struct *mm, struct 
svm_range *prange)
return 0;
 }
 
+static bool svm_bo_ref_unless_zero(struct svm_range_bo *svm_bo)
+{
+   if (!svm_bo || !kref_get_unless_zero(&svm_bo->kref))
+   return false;
+
+   return true;
+}
+
+static struct svm_range_bo *svm_range_bo_ref(struct svm_range_bo *svm_bo)
+{
+   if (svm_bo)
+   kref_get(&svm_bo->kref);
+
+   return svm_bo;
+}
+
+static void svm_range_bo_release(struct kref *kref)
+{
+   struct svm_range_bo *svm_bo;
+
+   svm_bo = container_of(kref, struct svm_range_bo, kref);
+   /* This cleanup loop does not need to be SRCU safe because there
+* should be no SRCU readers while the ref count is 0. Any SRCU
+* reader that has a chance of reducing the ref count must take
+* an extra reference before srcu_read_lock and release it after
+* srcu_read_unlock.
+*/
+   spin_lock(&svm_bo->list_lock);
+   while (!list_empty(&svm_bo->range_list)) {
+   struct svm_range *prange =
+   list_first_entry(&svm_bo->range_list,
+   struct svm_range, svm_bo_list);
+   pr_debug("svms 0x%p [0x%lx 0x%lx]\n", prange->svms,
+prange->it_node.start, prange->it_node.last)

[PATCH 22/35] drm/amdkfd: HMM migrate vram to ram

2021-01-06 Thread Felix Kuehling
From: Philip Yang 

If a CPU page fault happens, the HMM pgmap_ops callback migrate_to_ram
migrates memory from vram to ram in these steps (see the sketch after
the list):

1. migrate_vma_pages gets the vram pages and notifies HMM to invalidate
the pages; the HMM interval notifier callback evicts the process queues
2. Allocate system memory pages
3. Use svm copy memory to migrate data from vram to ram
4. migrate_vma_pages copies the page structures from vram pages to ram
pages
5. Return VM_FAULT_SIGBUS if migration failed, to notify the application
6. migrate_vma_finalize puts the vram pages; the page_free callback
frees the vram pages and vram nodes
7. Restore work waits until migration is finished, then updates the GPU
page table mapping to system memory and resumes the process queues
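A skeleton of that flow in migrate_vma terms (simplified; the copy
helper name matches the one added in this patch, and the pgmap_owner
filtering is an assumption):

	struct migrate_vma migrate = {
		.vma	= vmf->vma,
		.start	= start,
		.end	= end,
		.src	= src_pfns,
		.dst	= dst_pfns,
		.pgmap_owner	= adev,	/* assumed owner used for filtering */
		.flags	= MIGRATE_VMA_SELECT_DEVICE_PRIVATE,
	};

	r = migrate_vma_setup(&migrate);	/* collect + invalidate (step 1) */
	if (!r && migrate.cpages) {
		r = svm_migrate_copy_to_ram(adev, prange, &migrate, &mfence);
		migrate_vma_pages(&migrate);	/* step 4 */
		/* wait for the last sdma fence before finalizing */
		migrate_vma_finalize(&migrate);	/* step 6 */
	}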

Signed-off-by: Philip Yang 
Signed-off-by: Felix Kuehling 
---
 drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 274 ++-
 drivers/gpu/drm/amd/amdkfd/kfd_migrate.h |   3 +
 drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 116 +-
 drivers/gpu/drm/amd/amdkfd/kfd_svm.h |   4 +
 4 files changed, 392 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
index af23f0be7eaf..d33a4cc63495 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
@@ -259,6 +259,35 @@ svm_migrate_put_vram_page(struct amdgpu_device *adev, 
unsigned long addr)
put_page(page);
 }
 
+static unsigned long
+svm_migrate_addr(struct amdgpu_device *adev, struct page *page)
+{
+   unsigned long addr;
+
+   addr = page_to_pfn(page) << PAGE_SHIFT;
+   return (addr - adev->kfd.dev->pgmap.res.start);
+}
+
+static struct page *
+svm_migrate_get_sys_page(struct vm_area_struct *vma, unsigned long addr)
+{
+   struct page *page;
+
+   page = alloc_page_vma(GFP_HIGHUSER, vma, addr);
+   if (page)
+   lock_page(page);
+
+   return page;
+}
+
+void svm_migrate_put_sys_page(unsigned long addr)
+{
+   struct page *page;
+
+   page = pfn_to_page(addr >> PAGE_SHIFT);
+   unlock_page(page);
+   put_page(page);
+}
 
 static int
 svm_migrate_copy_to_vram(struct amdgpu_device *adev, struct svm_range *prange,
@@ -471,13 +500,208 @@ int svm_migrate_ram_to_vram(struct svm_range *prange, 
uint32_t best_loc)
 
 static void svm_migrate_page_free(struct page *page)
 {
+   /* Keep this function to avoid warning */
+}
+
+static int
+svm_migrate_copy_to_ram(struct amdgpu_device *adev, struct svm_range *prange,
+   struct migrate_vma *migrate,
+   struct dma_fence **mfence)
+{
+   uint64_t npages = migrate->cpages;
+   uint64_t *src, *dst;
+   struct page *dpage;
+   uint64_t i = 0, j;
+   uint64_t addr;
+   int r = 0;
+
+   pr_debug("svms 0x%p [0x%lx 0x%lx]\n", prange->svms,
+prange->it_node.start, prange->it_node.last);
+
+   addr = prange->it_node.start << PAGE_SHIFT;
+
+   src = kvmalloc_array(npages << 1, sizeof(*src), GFP_KERNEL);
+   if (!src)
+   return -ENOMEM;
+
+   dst = src + npages;
+
+   prange->pages_addr = kvmalloc_array(npages, sizeof(*prange->pages_addr),
+   GFP_KERNEL | __GFP_ZERO);
+   if (!prange->pages_addr) {
+   r = -ENOMEM;
+   goto out_oom;
+   }
+
+   for (i = 0, j = 0; i < npages; i++, j++, addr += PAGE_SIZE) {
+   struct page *spage;
+
+   spage = migrate_pfn_to_page(migrate->src[i]);
+   if (!spage) {
+   pr_debug("failed get spage svms 0x%p [0x%lx 0x%lx]\n",
+prange->svms, prange->it_node.start,
+prange->it_node.last);
+   r = -ENOMEM;
+   goto out_oom;
+   }
+   src[i] = svm_migrate_addr(adev, spage);
+   if (i > 0 && src[i] != src[i - 1] + PAGE_SIZE) {
+   r = svm_migrate_copy_memory_gart(adev, src + i - j,
+dst + i - j, j,
+FROM_VRAM_TO_RAM,
+mfence);
+   if (r)
+   goto out_oom;
+   j = 0;
+   }
+
+   dpage = svm_migrate_get_sys_page(migrate->vma, addr);
+   if (!dpage) {
+   pr_debug("failed get page svms 0x%p [0x%lx 0x%lx]\n",
+prange->svms, prange->it_node.start,
+prange->it_node.last);
+   r = -ENOMEM;
+   goto out_oom;
+   }
+
+   dst[i] = page_to_pfn(dpage) << PAGE_SHIFT;
+   *(prange->pages_addr + i) = dst[i];
+
+   migrate->dst[i] = migrate_pfn(page_to_pfn(dpage));
+   migrate->dst[i] 

[PATCH 19/35] drm/amdkfd: support xgmi same hive mapping

2021-01-06 Thread Felix Kuehling
From: Philip Yang 

amdgpu_gmc_get_vm_pte uses the bo_va->is_xgmi same-hive information to
set pte flags when updating the GPU mapping. Add a local structure
variable bo_va, update bo_va.is_xgmi, and pass it to mapping->bo_va
while mapping to the GPU.

This assumes the xgmi pstate is high after boot.

Signed-off-by: Philip Yang 
Signed-off-by: Felix Kuehling 
---
 drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 27 ---
 1 file changed, 24 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index 7d91dc49a5a9..8a4d0a3935b6 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -26,6 +26,8 @@
 #include "amdgpu_object.h"
 #include "amdgpu_vm.h"
 #include "amdgpu_mn.h"
+#include "amdgpu.h"
+#include "amdgpu_xgmi.h"
 #include "kfd_priv.h"
 #include "kfd_svm.h"
 
@@ -923,10 +925,11 @@ static int svm_range_bo_validate(void *param, struct 
amdgpu_bo *bo)
 static int
 svm_range_map_to_gpu(struct amdgpu_device *adev, struct amdgpu_vm *vm,
 struct svm_range *prange, bool reserve_vm,
-struct dma_fence **fence)
+struct amdgpu_device *bo_adev, struct dma_fence **fence)
 {
struct ttm_validate_buffer tv[2];
struct ww_acquire_ctx ticket;
+   struct amdgpu_bo_va bo_va;
struct list_head list;
dma_addr_t *pages_addr;
uint64_t pte_flags;
@@ -963,6 +966,11 @@ svm_range_map_to_gpu(struct amdgpu_device *adev, struct 
amdgpu_vm *vm,
}
}
 
+   if (prange->svm_bo && prange->mm_nodes) {
+   bo_va.is_xgmi = amdgpu_xgmi_same_hive(adev, bo_adev);
+   prange->mapping.bo_va = &bo_va;
+   }
+
prange->mapping.start = prange->it_node.start;
prange->mapping.last = prange->it_node.last;
prange->mapping.offset = prange->offset;
@@ -970,7 +978,7 @@ svm_range_map_to_gpu(struct amdgpu_device *adev, struct 
amdgpu_vm *vm,
prange->mapping.flags = pte_flags;
pages_addr = prange->pages_addr;
 
-   r = amdgpu_vm_bo_update_mapping(adev, adev, vm, false, false, NULL,
+   r = amdgpu_vm_bo_update_mapping(adev, bo_adev, vm, false, false, NULL,
prange->mapping.start,
prange->mapping.last, pte_flags,
prange->mapping.offset,
@@ -994,6 +1002,7 @@ svm_range_map_to_gpu(struct amdgpu_device *adev, struct 
amdgpu_vm *vm,
*fence = dma_fence_get(vm->last_update);
 
 unreserve_out:
+   prange->mapping.bo_va = NULL;
if (reserve_vm)
ttm_eu_backoff_reservation(&ticket, &list);
 out:
@@ -1004,6 +1013,7 @@ static int svm_range_map_to_gpus(struct svm_range 
*prange, bool reserve_vm)
 {
DECLARE_BITMAP(bitmap, MAX_GPU_INSTANCE);
struct kfd_process_device *pdd;
+   struct amdgpu_device *bo_adev;
struct amdgpu_device *adev;
struct kfd_process *p;
struct kfd_dev *dev;
@@ -1011,6 +1021,11 @@ static int svm_range_map_to_gpus(struct svm_range 
*prange, bool reserve_vm)
uint32_t gpuidx;
int r = 0;
 
+   if (prange->svm_bo && prange->mm_nodes)
+   bo_adev = amdgpu_ttm_adev(prange->svm_bo->bo->tbo.bdev);
+   else
+   bo_adev = NULL;
+
bitmap_or(bitmap, prange->bitmap_access, prange->bitmap_aip,
  MAX_GPU_INSTANCE);
p = container_of(prange->svms, struct kfd_process, svms);
@@ -1027,8 +1042,14 @@ static int svm_range_map_to_gpus(struct svm_range 
*prange, bool reserve_vm)
return -EINVAL;
adev = (struct amdgpu_device *)dev->kgd;
 
+   if (bo_adev && adev != bo_adev &&
+   !amdgpu_xgmi_same_hive(adev, bo_adev)) {
+   pr_debug("cannot map to device idx %d\n", gpuidx);
+   continue;
+   }
+
r = svm_range_map_to_gpu(adev, pdd->vm, prange, reserve_vm,
-&fence);
+bo_adev, &fence);
if (r)
break;
 
-- 
2.29.2



[PATCH 26/35] drm/amdkfd: add svm_bo reference for eviction fence

2021-01-06 Thread Felix Kuehling
From: Alex Sierra 

[why]
As part of the SVM functionality, the eviction mechanism used for
SVM_BOs is different. This mechanism uses one eviction fence per prange,
instead of one fence per kfd_process.

[how]
Add an svm_bo reference to amdgpu_amdkfd_fence to allow differentiating
between SVM_BO and regular BO evictions. This also includes modifications
to set the reference at the fence creation call.

Signed-off-by: Alex Sierra 
Signed-off-by: Felix Kuehling 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h   | 4 +++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c | 5 +++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 6 --
 3 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
index bc9f0e42e0a2..fb8be788ac1b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
@@ -75,6 +75,7 @@ struct amdgpu_amdkfd_fence {
struct mm_struct *mm;
spinlock_t lock;
char timeline_name[TASK_COMM_LEN];
+   struct svm_range_bo *svm_bo;
 };
 
 struct amdgpu_kfd_dev {
@@ -95,7 +96,8 @@ enum kgd_engine_type {
 };
 
 struct amdgpu_amdkfd_fence *amdgpu_amdkfd_fence_create(u64 context,
-  struct mm_struct *mm);
+   struct mm_struct *mm,
+   struct svm_range_bo *svm_bo);
 bool amdkfd_fence_check_mm(struct dma_fence *f, struct mm_struct *mm);
 struct amdgpu_amdkfd_fence *to_amdgpu_amdkfd_fence(struct dma_fence *f);
 int amdgpu_amdkfd_remove_fence_on_pt_pd_bos(struct amdgpu_bo *bo);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c
index 3107b9575929..9cc85efa4ed5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c
@@ -60,7 +60,8 @@ static atomic_t fence_seq = ATOMIC_INIT(0);
  */
 
 struct amdgpu_amdkfd_fence *amdgpu_amdkfd_fence_create(u64 context,
-  struct mm_struct *mm)
+   struct mm_struct *mm,
+   struct svm_range_bo *svm_bo)
 {
struct amdgpu_amdkfd_fence *fence;
 
@@ -73,7 +74,7 @@ struct amdgpu_amdkfd_fence *amdgpu_amdkfd_fence_create(u64 
context,
fence->mm = mm;
get_task_comm(fence->timeline_name, current);
spin_lock_init(&fence->lock);
-
+   fence->svm_bo = svm_bo;
dma_fence_init(&fence->base, &amdkfd_fence_ops, &fence->lock,
   context, atomic_inc_return(&fence_seq));
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index 99ad4e1d0896..8a43f3880022 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -928,7 +928,8 @@ static int init_kfd_vm(struct amdgpu_vm *vm, void 
**process_info,
 
info->eviction_fence =
amdgpu_amdkfd_fence_create(dma_fence_context_alloc(1),
-  current->mm);
+  current->mm,
+  NULL);
if (!info->eviction_fence) {
pr_err("Failed to create eviction fence\n");
ret = -ENOMEM;
@@ -2150,7 +2151,8 @@ int amdgpu_amdkfd_gpuvm_restore_process_bos(void *info, 
struct dma_fence **ef)
 */
new_fence = amdgpu_amdkfd_fence_create(
process_info->eviction_fence->base.context,
-   process_info->eviction_fence->mm);
+   process_info->eviction_fence->mm,
+   NULL);
if (!new_fence) {
pr_err("Failed to create eviction fence\n");
ret = -ENOMEM;
-- 
2.29.2



[PATCH 25/35] drm/amdkfd: SVM API call to restore page tables

2021-01-06 Thread Felix Kuehling
From: Alex Sierra 

Use the SVM API to restore page tables when retry faults are enabled
and the VM is a compute context.

Signed-off-by: Alex Sierra 
Signed-off-by: Felix Kuehling 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 20 +++-
 1 file changed, 15 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 9c557e8bf0e5..abdd4e7b4c3b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -37,6 +37,7 @@
 #include "amdgpu_gmc.h"
 #include "amdgpu_xgmi.h"
 #include "amdgpu_dma_buf.h"
+#include "kfd_svm.h"
 
 /**
  * DOC: GPUVM
@@ -3301,18 +3302,29 @@ bool amdgpu_vm_handle_fault(struct amdgpu_device *adev, 
unsigned int pasid,
uint64_t value, flags;
struct amdgpu_vm *vm;
long r;
+   bool is_compute_context = false;
 
spin_lock(&adev->vm_manager.pasid_lock);
vm = idr_find(&adev->vm_manager.pasid_idr, pasid);
-   if (vm)
+   if (vm) {
root = amdgpu_bo_ref(vm->root.base.bo);
-   else
+   is_compute_context = vm->is_compute_context;
+   } else {
root = NULL;
+   }
spin_unlock(&adev->vm_manager.pasid_lock);
 
if (!root)
return false;
 
+   addr /= AMDGPU_GPU_PAGE_SIZE;
+
+   if (!amdgpu_noretry && is_compute_context &&
+   !svm_range_restore_pages(adev, pasid, addr)) {
+   amdgpu_bo_unref(&root);
+   return true;
+   }
+
r = amdgpu_bo_reserve(root, true);
if (r)
goto error_unref;
@@ -3326,18 +3338,16 @@ bool amdgpu_vm_handle_fault(struct amdgpu_device *adev, 
unsigned int pasid,
if (!vm)
goto error_unlock;
 
-   addr /= AMDGPU_GPU_PAGE_SIZE;
flags = AMDGPU_PTE_VALID | AMDGPU_PTE_SNOOPED |
AMDGPU_PTE_SYSTEM;
 
-   if (vm->is_compute_context) {
+   if (is_compute_context) {
/* Intentionally setting invalid PTE flag
 * combination to force a no-retry-fault
 */
flags = AMDGPU_PTE_EXECUTABLE | AMDGPU_PDE_PTE |
AMDGPU_PTE_TF;
value = 0;
-
} else if (amdgpu_vm_fault_stop == AMDGPU_VM_FAULT_STOP_NEVER) {
/* Redirect the access to the dummy page */
value = adev->dummy_page_addr;
-- 
2.29.2



[PATCH 28/35] drm/amdkfd: add svm_bo eviction mechanism support

2021-01-06 Thread Felix Kuehling
From: Alex Sierra 

The svm_bo eviction mechanism is different from that of regular BOs.
Every SVM_BO created contains one eviction fence and one worker item
for the eviction process. SVM_BOs can be attached to one or more
pranges. For the SVM_BO eviction mechanism, TTM calls the
enable_signaling callback for every SVM_BO until VRAM space becomes
available. All ttm_evict calls here are synchronous; this guarantees
that each eviction has completed and the fence has signaled before
the call returns.
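An outline of the per-BO eviction worker (heavily simplified; the
migration helper name is an assumption, and locking and error handling
are elided):

	static void svm_range_evict_svm_bo_worker(struct work_struct *work)
	{
		struct svm_range_bo *svm_bo =
			container_of(work, struct svm_range_bo, eviction_work);
		struct svm_range *prange;

		/* migrate every attached range back to system memory */
		list_for_each_entry(prange, &svm_bo->range_list, svm_bo_list)
			svm_migrate_vram_to_ram(prange);	/* assumed helper */

		/* only now may TTM reclaim the VRAM backing this BO */
		dma_fence_signal(&svm_bo->eviction_fence->base);
		svm_range_bo_unref(svm_bo);	/* drop the worker's reference */
	}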

Signed-off-by: Alex Sierra 
Signed-off-by: Philip Yang 
Signed-off-by: Felix Kuehling 
---
 drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 197 ---
 drivers/gpu/drm/amd/amdkfd/kfd_svm.h |  13 +-
 2 files changed, 160 insertions(+), 50 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index 7346255f7c27..63b745a06740 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -34,6 +34,7 @@
 
 #define AMDGPU_SVM_RANGE_RESTORE_DELAY_MS 1
 
+static void svm_range_evict_svm_bo_worker(struct work_struct *work);
 /**
  * svm_range_unlink - unlink svm_range from lists and interval tree
  * @prange: svm range structure to be removed
@@ -260,7 +261,15 @@ static void svm_range_bo_release(struct kref *kref)
list_del_init(&prange->svm_bo_list);
}
spin_unlock(&svm_bo->list_lock);
-
+   if (!dma_fence_is_signaled(&svm_bo->eviction_fence->base)) {
+   /* We're not in the eviction worker.
+* Signal the fence and synchronize with any
+* pending eviction work.
+*/
+   dma_fence_signal(&svm_bo->eviction_fence->base);
+   cancel_work_sync(&svm_bo->eviction_work);
+   }
+   dma_fence_put(&svm_bo->eviction_fence->base);
amdgpu_bo_unref(&svm_bo->bo);
kfree(svm_bo);
 }
@@ -273,6 +282,62 @@ static void svm_range_bo_unref(struct svm_range_bo *svm_bo)
kref_put(&svm_bo->kref, svm_range_bo_release);
 }
 
+static bool svm_range_validate_svm_bo(struct svm_range *prange)
+{
+   spin_lock(&prange->svm_bo_lock);
+   if (!prange->svm_bo) {
+   spin_unlock(&prange->svm_bo_lock);
+   return false;
+   }
+   if (prange->mm_nodes) {
+   /* We still have a reference, all is well */
+   spin_unlock(&prange->svm_bo_lock);
+   return true;
+   }
+   if (svm_bo_ref_unless_zero(prange->svm_bo)) {
+   if (READ_ONCE(prange->svm_bo->evicting)) {
+   struct dma_fence *f;
+   struct svm_range_bo *svm_bo;
+   /* The BO is getting evicted,
+* we need to get a new one
+*/
+   spin_unlock(&prange->svm_bo_lock);
+   svm_bo = prange->svm_bo;
+   f = dma_fence_get(&svm_bo->eviction_fence->base);
+   svm_range_bo_unref(prange->svm_bo);
+   /* wait for the fence to avoid long spin-loop
+* at list_empty_careful
+*/
+   dma_fence_wait(f, false);
+   dma_fence_put(f);
+   } else {
+   /* The BO was still around and we got
+* a new reference to it
+*/
+   spin_unlock(&prange->svm_bo_lock);
+   pr_debug("reuse old bo svms 0x%p [0x%lx 0x%lx]\n",
+prange->svms, prange->it_node.start,
+prange->it_node.last);
+
+   prange->mm_nodes = prange->svm_bo->bo->tbo.mem.mm_node;
+   return true;
+   }
+
+   } else {
+   spin_unlock(&prange->svm_bo_lock);
+   }
+
+   /* We need a new svm_bo. Spin-loop to wait for concurrent
+* svm_range_bo_release to finish removing this range from
+* its range list. After this, it is safe to reuse the
+* svm_bo pointer and svm_bo_list head.
+*/
+   while (!list_empty_careful(&prange->svm_bo_list))
+   ;
+
+   return false;
+}
+
 static struct svm_range_bo *svm_range_bo_new(void)
 {
struct svm_range_bo *svm_bo;
@@ -292,71 +357,54 @@ int
 svm_range_vram_node_new(struct amdgpu_device *adev, struct svm_range *prange,
bool clear)
 {
-   struct amdkfd_process_info *process_info;
struct amdgpu_bo_param bp;
struct svm_range_bo *svm_bo;
struct amdgpu_bo *bo;
struct kfd_process *p;
+   struct mm_struct *mm;
int r;
 
-   pr_debug("[0x%lx 0x%lx]\n", prange->it_node.start,
-prange->it_node.last);
-   spin_lock(&prange->svm_bo_lock);
-   if (prange->svm_bo) {
-   if (prange->mm_nodes) {
-   /* We still have 

[PATCH 29/35] drm/amdgpu: svm bo enable_signal call condition

2021-01-06 Thread Felix Kuehling
From: Alex Sierra 

[why]
To support svm bo eviction mechanism.

[how]
If the BO created has the AMDGPU_AMDKFD_CREATE_SVM_BO flag set,
signaling is enabled on its eviction fence inside amdgpu_evict_flags.
This also guts the BO by removing all placements, so that TTM won't
actually do an eviction. Instead it will discard the memory held by
the BO. This is needed for HMM migration to user mode system memory
pages.

Signed-off-by: Alex Sierra 
Signed-off-by: Felix Kuehling 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index f423f42cb9b5..62d4da95d22d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -107,6 +107,20 @@ static void amdgpu_evict_flags(struct ttm_buffer_object 
*bo,
}
 
abo = ttm_to_amdgpu_bo(bo);
+   if (abo->flags & AMDGPU_AMDKFD_CREATE_SVM_BO) {
+   struct dma_fence *fence;
+   struct dma_resv *resv = &bo->base._resv;
+
+   rcu_read_lock();
+   fence = rcu_dereference(resv->fence_excl);
+   if (fence && !fence->ops->signaled)
+   dma_fence_enable_sw_signaling(fence);
+
+   placement->num_placement = 0;
+   placement->num_busy_placement = 0;
+   rcu_read_unlock();
+   return;
+   }
switch (bo->mem.mem_type) {
case AMDGPU_PL_GDS:
case AMDGPU_PL_GWS:
-- 
2.29.2



[PATCH 27/35] drm/amdgpu: add param bit flag to create SVM BOs

2021-01-06 Thread Felix Kuehling
From: Alex Sierra 

Add CREATE_SVM_BO define bit for SVM BOs.
Another define flag was moved to concentrate these
KFD type flags in one include file.

Signed-off-by: Alex Sierra 
Signed-off-by: Felix Kuehling 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 7 ++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.h   | 5 +
 2 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index 8a43f3880022..5982d09b6c3d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -31,9 +31,6 @@
 #include "amdgpu_dma_buf.h"
 #include 
 
-/* BO flag to indicate a KFD userptr BO */
-#define AMDGPU_AMDKFD_USERPTR_BO (1ULL << 63)
-
 /* Userptr restore delay, just long enough to allow consecutive VM
  * changes to accumulate
  */
@@ -207,7 +204,7 @@ void amdgpu_amdkfd_unreserve_memory_limit(struct amdgpu_bo 
*bo)
u32 domain = bo->preferred_domains;
bool sg = (bo->preferred_domains == AMDGPU_GEM_DOMAIN_CPU);
 
-   if (bo->flags & AMDGPU_AMDKFD_USERPTR_BO) {
+   if (bo->flags & AMDGPU_AMDKFD_CREATE_USERPTR_BO) {
domain = AMDGPU_GEM_DOMAIN_CPU;
sg = false;
}
@@ -1241,7 +1238,7 @@ int amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu(
bo->kfd_bo = *mem;
(*mem)->bo = bo;
if (user_addr)
-   bo->flags |= AMDGPU_AMDKFD_USERPTR_BO;
+   bo->flags |= AMDGPU_AMDKFD_CREATE_USERPTR_BO;
 
(*mem)->va = va;
(*mem)->domain = domain;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
index adbefd6a655d..b72772ab93fb 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
@@ -37,6 +37,11 @@
 #define AMDGPU_BO_INVALID_OFFSET   LONG_MAX
 #define AMDGPU_BO_MAX_PLACEMENTS   3
 
+/* BO flag to indicate a KFD userptr BO */
+#define AMDGPU_AMDKFD_CREATE_USERPTR_BO(1ULL << 63)
+#define AMDGPU_AMDKFD_CREATE_SVM_BO(1ULL << 62)
+
+
 struct amdgpu_bo_param {
unsigned long   size;
int byte_align;
-- 
2.29.2



[PATCH 23/35] drm/amdkfd: invalidate tables on page retry fault

2021-01-06 Thread Felix Kuehling
From: Alex Sierra 

GPU page tables are invalidated by unmapping the prange directly in
the mmu notifier when page fault retry is enabled through the
amdgpu_noretry global parameter. Page table restore is performed in
the page fault handler.

If xnack is on, we need to update the GPU mapping after prefetch
migration to avoid a GPU vm fault, because range migration unmaps the
range from the GPUs and no restore work is scheduled to update the GPU
mapping.

Signed-off-by: Alex Sierra 
Signed-off-by: Philip Yang 
Signed-off-by: Felix Kuehling 
---
 drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 20 +---
 1 file changed, 17 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index 37f35f986930..ea27c5ed4ef3 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -1279,7 +1279,9 @@ svm_range_evict(struct svm_range_list *svms, struct 
mm_struct *mm,
int r = 0;
struct interval_tree_node *node;
struct svm_range *prange;
+   struct kfd_process *p;
 
+   p = container_of(svms, struct kfd_process, svms);
svms_lock(svms);
 
pr_debug("invalidate svms 0x%p [0x%lx 0x%lx]\n", svms, start, last);
@@ -1292,8 +1294,13 @@ svm_range_evict(struct svm_range_list *svms, struct 
mm_struct *mm,
next = interval_tree_iter_next(node, start, last);
 
invalid = atomic_inc_return(&prange->invalid);
-   evicted_ranges = atomic_inc_return(&svms->evicted_ranges);
-   if (evicted_ranges == 1) {
+
+   if (!p->xnack_enabled) {
+   evicted_ranges =
+   atomic_inc_return(&svms->evicted_ranges);
+   if (evicted_ranges != 1)
+   goto next_node;
+
pr_debug("evicting svms 0x%p range [0x%lx 0x%lx]\n",
 prange->svms, prange->it_node.start,
 prange->it_node.last);
@@ -1306,7 +1313,14 @@ svm_range_evict(struct svm_range_list *svms, struct 
mm_struct *mm,
pr_debug("schedule to restore svm %p ranges\n", svms);
schedule_delayed_work(&svms->restore_work,
   msecs_to_jiffies(AMDGPU_SVM_RANGE_RESTORE_DELAY_MS));
+   } else {
+   pr_debug("invalidate svms 0x%p [0x%lx 0x%lx] %d\n",
+prange->svms, prange->it_node.start,
+prange->it_node.last, invalid);
+   if (invalid == 1)
+   svm_range_unmap_from_gpus(prange);
}
+next_node:
node = next;
}
 
@@ -1944,7 +1958,7 @@ svm_range_set_attr(struct kfd_process *p, uint64_t start, 
uint64_t size,
if (r)
goto out_unlock;
 
-   if (migrated) {
+   if (migrated && !p->xnack_enabled) {
pr_debug("restore_work will update mappings of GPUs\n");
mutex_unlock(&prange->mutex);
continue;
-- 
2.29.2

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH 24/35] drm/amdkfd: page table restore through svm API

2021-01-06 Thread Felix Kuehling
From: Alex Sierra 

Implement page table restore in the SVM API. It is called from the
fault handler in amdgpu_vm to update page tables through the page
fault retry IH.
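
Condensed, the restore path added here performs the following sequence
(a sketch only; srcu/locking and error handling are trimmed):

  p = kfd_lookup_process_by_pasid(pasid);        /* fault -> process */
  prange = svm_range_from_addr(&p->svms, addr);  /* fault -> range */
  if (atomic_read(&prange->invalid)) {
          mm = get_task_mm(p->lead_thread);
          mmap_read_lock(mm);
          r = svm_range_validate(mm, prange);    /* bring pages back */
          if (!r)
                  r = svm_range_map_to_gpus(prange, true); /* redo mapping */
          mmap_read_unlock(mm);
          mmput(mm);
  }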

Signed-off-by: Alex Sierra 
Signed-off-by: Philip Yang 
Signed-off-by: Felix Kuehling 
---
 drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 78 
 drivers/gpu/drm/amd/amdkfd/kfd_svm.h |  2 +
 2 files changed, 80 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index ea27c5ed4ef3..7346255f7c27 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -1629,6 +1629,84 @@ svm_range_from_addr(struct svm_range_list *svms, 
unsigned long addr)
return container_of(node, struct svm_range, it_node);
 }
 
+int
+svm_range_restore_pages(struct amdgpu_device *adev, unsigned int pasid,
+   uint64_t addr)
+{
+   int r = 0;
+   int srcu_idx;
+   struct mm_struct *mm = NULL;
+   struct svm_range *prange;
+   struct svm_range_list *svms;
+   struct kfd_process *p;
+
+   p = kfd_lookup_process_by_pasid(pasid);
+   if (!p) {
+   pr_debug("kfd process not founded pasid 0x%x\n", pasid);
+   return -ESRCH;
+   }
+   svms = &p->svms;
+   srcu_idx = srcu_read_lock(&svms->srcu);
+
+   pr_debug("restoring svms 0x%p fault address 0x%llx\n", svms, addr);
+
+   svms_lock(svms);
+   prange = svm_range_from_addr(svms, addr);
+   svms_unlock(svms);
+   if (!prange) {
+   pr_debug("failed to find prange svms 0x%p address [0x%llx]\n",
+svms, addr);
+   r = -EFAULT;
+   goto unlock_out;
+   }
+
+   if (!atomic_read(&prange->invalid)) {
+   pr_debug("svms 0x%p [0x%lx %lx] already restored\n",
+svms, prange->it_node.start, prange->it_node.last);
+   goto unlock_out;
+   }
+
+   mm = get_task_mm(p->lead_thread);
+   if (!mm) {
+   pr_debug("svms 0x%p failed to get mm\n", svms);
+   r = -ESRCH;
+   goto unlock_out;
+   }
+
+   mmap_read_lock(mm);
+
+   /*
+* If range is migrating, wait until migration is done.
+*/
+   mutex_lock(&prange->mutex);
+
+   r = svm_range_validate(mm, prange);
+   if (r) {
+   pr_debug("failed %d to validate svms 0x%p [0x%lx 0x%lx]\n", r,
+svms, prange->it_node.start, prange->it_node.last);
+
+   goto mmput_out;
+   }
+
+   pr_debug("restoring svms 0x%p [0x%lx %lx] mapping\n",
+svms, prange->it_node.start, prange->it_node.last);
+
+   r = svm_range_map_to_gpus(prange, true);
+   if (r)
+   pr_debug("failed %d to map svms 0x%p [0x%lx 0x%lx] to gpu\n", r,
+svms, prange->it_node.start, prange->it_node.last);
+
+mmput_out:
+   mutex_unlock(&prange->mutex);
+   mmap_read_unlock(mm);
+   mmput(mm);
+unlock_out:
+   srcu_read_unlock(&svms->srcu, srcu_idx);
+   kfd_unref_process(p);
+
+   return r;
+}
+
 void svm_range_list_fini(struct kfd_process *p)
 {
pr_debug("pasid 0x%x svms 0x%p\n", p->pasid, &p->svms);
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.h 
b/drivers/gpu/drm/amd/amdkfd/kfd_svm.h
index c67e96f764fe..e546f36ef709 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.h
@@ -121,5 +121,7 @@ int svm_range_vram_node_new(struct amdgpu_device *adev,
 void svm_range_vram_node_free(struct svm_range *prange);
 int svm_range_split_by_granularity(struct kfd_process *p, unsigned long addr,
   struct list_head *list);
+int svm_range_restore_pages(struct amdgpu_device *adev,
+   unsigned int pasid, uint64_t addr);
 
 #endif /* KFD_SVM_H_ */
-- 
2.29.2

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH 21/35] drm/amdkfd: HMM migrate ram to vram

2021-01-06 Thread Felix Kuehling
From: Philip Yang 

Registering an svm range with the same address and size but with
preferred_location changed from CPU to GPU or from GPU to CPU triggers
migration of the svm range from ram to vram or from vram to ram.

If the svm range prefetch location is a GPU and the range has the
KFD_IOCTL_SVM_FLAG_HOST_ACCESS flag, validate the svm range on ram
first, then migrate it from ram to vram.

After migration to vram is done, CPU access causes a cpu page fault,
and the page fault handler migrates the range back to ram and resumes
cpu access.

Migration steps:

1. migrate_vma_pages gets the svm range's ram pages, notifies that the
interval is invalidated and unmaps it from the CPU page table; the HMM
interval notifier callback evicts the process queues
2. Allocate new pages in vram using TTM
3. Use svm copy memory to sdma-copy the data from ram to vram
4. migrate_vma_pages copies the ram page structures to vram page
structures
5. migrate_vma_finalize puts the ram pages back to free ram pages and
memory
6. Restore work waits until migration is finished, then updates the
GPUs' page table mapping to the new vram pages and resumes the process
queues

If migrate_vma_setup fails to collect all the ram pages of the range,
retry up to 3 times until it succeeds before starting the migration;
the overall flow is sketched below.
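
In terms of the core mm API, the flow follows the usual migrate_vma
pattern (a minimal sketch; the src/dst pfn arrays and pgmap_owner
setup are omitted, and the copies use the sdma helpers added in this
patch):

  struct migrate_vma migrate = {
          .vma   = vma,
          .start = start,
          .end   = end,
          /* .src, .dst and .pgmap_owner omitted in this sketch */
  };

  r = migrate_vma_setup(&migrate);          /* collect and unmap ram pages */
  if (!r && migrate.cpages) {
          svm_migrate_copy_to_vram(adev, prange, &migrate, &mfence);
          migrate_vma_pages(&migrate);      /* switch page structs to vram */
          svm_migrate_copy_done(adev, mfence);
          migrate_vma_finalize(&migrate);   /* release the old ram pages */
  }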

Signed-off-by: Philip Yang 
Signed-off-by: Felix Kuehling 
---
 drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 265 +++
 drivers/gpu/drm/amd/amdkfd/kfd_migrate.h |   2 +
 drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 175 ++-
 drivers/gpu/drm/amd/amdkfd/kfd_svm.h |   2 +
 4 files changed, 436 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
index f2019c8f0b80..af23f0be7eaf 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
@@ -204,6 +204,271 @@ svm_migrate_copy_done(struct amdgpu_device *adev, struct 
dma_fence *mfence)
return r;
 }
 
+static uint64_t
+svm_migrate_node_physical_addr(struct amdgpu_device *adev,
+  struct drm_mm_node **mm_node, uint64_t *offset)
+{
+   struct drm_mm_node *node = *mm_node;
+   uint64_t pos = *offset;
+
+   if (node->start == AMDGPU_BO_INVALID_OFFSET) {
+   pr_debug("drm node is not validated\n");
+   return 0;
+   }
+
+   pr_debug("vram node start 0x%llx npages 0x%llx\n", node->start,
+node->size);
+
+   if (pos >= node->size) {
+   do  {
+   pos -= node->size;
+   node++;
+   } while (pos >= node->size);
+
+   *mm_node = node;
+   *offset = pos;
+   }
+
+   return (node->start + pos) << PAGE_SHIFT;
+}
+
+unsigned long
+svm_migrate_addr_to_pfn(struct amdgpu_device *adev, unsigned long addr)
+{
+   return (addr + adev->kfd.dev->pgmap.res.start) >> PAGE_SHIFT;
+}
+
+static void
+svm_migrate_get_vram_page(struct svm_range *prange, unsigned long pfn)
+{
+   struct page *page;
+
+   page = pfn_to_page(pfn);
+   page->zone_device_data = prange;
+   get_page(page);
+   lock_page(page);
+}
+
+static void
+svm_migrate_put_vram_page(struct amdgpu_device *adev, unsigned long addr)
+{
+   struct page *page;
+
+   page = pfn_to_page(svm_migrate_addr_to_pfn(adev, addr));
+   unlock_page(page);
+   put_page(page);
+}
+
+
+static int
+svm_migrate_copy_to_vram(struct amdgpu_device *adev, struct svm_range *prange,
+struct migrate_vma *migrate,
+struct dma_fence **mfence)
+{
+   uint64_t npages = migrate->cpages;
+   struct drm_mm_node *node;
+   uint64_t *src, *dst;
+   uint64_t vram_addr;
+   uint64_t offset;
+   uint64_t i, j;
+   int r = -ENOMEM;
+
+   pr_debug("svms 0x%p [0x%lx 0x%lx]\n", prange->svms,
+prange->it_node.start, prange->it_node.last);
+
+   src = kvmalloc_array(npages << 1, sizeof(*src), GFP_KERNEL);
+   if (!src)
+   goto out;
+   dst = src + npages;
+
+   r = svm_range_vram_node_new(adev, prange, false);
+   if (r) {
+   pr_debug("failed %d get 0x%llx pages from vram\n", r, npages);
+   goto out_free;
+   }
+
+   node = prange->mm_nodes;
+   offset = prange->offset;
+   vram_addr = svm_migrate_node_physical_addr(adev, &node, &offset);
+   if (!vram_addr) {
+   WARN_ONCE(1, "vram node address is 0\n");
+   r = -ENOMEM;
+   goto out_free;
+   }
+
+   for (i = j = 0; i < npages; i++) {
+   struct page *spage;
+
+   spage = migrate_pfn_to_page(migrate->src[i]);
+   src[i] = page_to_pfn(spage) << PAGE_SHIFT;
+
+   dst[i] = vram_addr + (j << PAGE_SHIFT);
+   migrate->dst[i] = svm_migrate_addr_to_pfn(adev, dst[i]);
+   svm_migrate_get_vram_page(prange, migrate->dst[i]);
+
+   migrate->dst[i] = migrate_pfn(migrate->dst[i]);
+   migrate->dst[i] |= MIGRATE_PFN_LOC

[PATCH 30/35] drm/amdgpu: add svm_bo eviction to enable_signal cb

2021-01-06 Thread Felix Kuehling
From: Alex Sierra 

Add support for svm_bo fence eviction to the
amdgpu_amdkfd_fence.enable_signal callback.

Signed-off-by: Alex Sierra 
Signed-off-by: Felix Kuehling 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c | 11 ---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c
index 9cc85efa4ed5..98d6e08f22d8 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c
@@ -28,6 +28,7 @@
 #include 
 #include 
 #include "amdgpu_amdkfd.h"
+#include "kfd_svm.h"
 
 static const struct dma_fence_ops amdkfd_fence_ops;
 static atomic_t fence_seq = ATOMIC_INIT(0);
@@ -123,9 +124,13 @@ static bool amdkfd_fence_enable_signaling(struct dma_fence 
*f)
if (dma_fence_is_signaled(f))
return true;
 
-   if (!kgd2kfd_schedule_evict_and_restore_process(fence->mm, f))
-   return true;
-
+   if (!fence->svm_bo) {
+   if (!kgd2kfd_schedule_evict_and_restore_process(fence->mm, f))
+   return true;
+   } else {
+   if (!svm_range_schedule_evict_svm_bo(fence))
+   return true;
+   }
return false;
 }
 
-- 
2.29.2

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH 31/35] drm/amdgpu: reserve fence slot to update page table

2021-01-06 Thread Felix Kuehling
From: Philip Yang 

We forgot to reserve a fence slot before using sdma to update the page
table, causing the kernel BUG backtrace below when handling a vm retry
fault while the application is exiting.

[  133.048143] kernel BUG at 
/home/yangp/git/compute_staging/kernel/drivers/dma-buf/dma-resv.c:281!
[  133.048487] Workqueue: events amdgpu_irq_handle_ih1 [amdgpu]
[  133.048506] RIP: 0010:dma_resv_add_shared_fence+0x204/0x280
[  133.048672]  amdgpu_vm_sdma_commit+0x134/0x220 [amdgpu]
[  133.048788]  amdgpu_vm_bo_update_range+0x220/0x250 [amdgpu]
[  133.048905]  amdgpu_vm_handle_fault+0x202/0x370 [amdgpu]
[  133.049031]  gmc_v9_0_process_interrupt+0x1ab/0x310 [amdgpu]
[  133.049165]  ? kgd2kfd_interrupt+0x9a/0x180 [amdgpu]
[  133.049289]  ? amdgpu_irq_dispatch+0xb6/0x240 [amdgpu]
[  133.049408]  amdgpu_irq_dispatch+0xb6/0x240 [amdgpu]
[  133.049534]  amdgpu_ih_process+0x9b/0x1c0 [amdgpu]
[  133.049657]  amdgpu_irq_handle_ih1+0x21/0x60 [amdgpu]
[  133.049669]  process_one_work+0x29f/0x640
[  133.049678]  worker_thread+0x39/0x3f0
[  133.049685]  ? process_one_work+0x640/0x640
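
The pattern the fix follows: reserve a shared fence slot on the root
PD's reservation object before the sdma update adds its fence (a
minimal sketch of the hunk below):

  r = dma_resv_reserve_shared(root->tbo.base.resv, 1);
  if (r)
          goto error_unlock;    /* cannot safely add the sdma fence */

  r = amdgpu_vm_bo_update_mapping(adev, adev, vm, true, false, NULL,
                                  addr, addr, flags, value, NULL, NULL,
                                  NULL);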

Signed-off-by: Philip Yang 
Signed-off-by: Felix Kuehling 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index abdd4e7b4c3b..bd9de870f8f1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -3301,7 +3301,7 @@ bool amdgpu_vm_handle_fault(struct amdgpu_device *adev, 
unsigned int pasid,
struct amdgpu_bo *root;
uint64_t value, flags;
struct amdgpu_vm *vm;
-   long r;
+   int r;
bool is_compute_context = false;
 
spin_lock(&adev->vm_manager.pasid_lock);
@@ -3359,6 +3359,12 @@ bool amdgpu_vm_handle_fault(struct amdgpu_device *adev, 
unsigned int pasid,
value = 0;
}
 
+   r = dma_resv_reserve_shared(root->tbo.base.resv, 1);
+   if (r) {
+   pr_debug("failed %d to reserve fence slot\n", r);
+   goto error_unlock;
+   }
+
r = amdgpu_vm_bo_update_mapping(adev, adev, vm, true, false, NULL, addr,
addr, flags, value, NULL, NULL,
NULL);
@@ -3370,7 +3376,7 @@ bool amdgpu_vm_handle_fault(struct amdgpu_device *adev, 
unsigned int pasid,
 error_unlock:
amdgpu_bo_unreserve(root);
if (r < 0)
-   DRM_ERROR("Can't handle page fault (%ld)\n", r);
+   DRM_ERROR("Can't handle page fault (%d)\n", r);
 
 error_unref:
amdgpu_bo_unref(&root);
-- 
2.29.2

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH 32/35] drm/amdgpu: enable retry fault wptr overflow

2021-01-06 Thread Felix Kuehling
From: Philip Yang 

If xnack is on, VM retry fault interrupts are sent to IH ring1, and
ring1 fills up quickly. IH then cannot receive other interrupts, which
causes a deadlock when a buffer is being migrated with sdma and we
wait for sdma to finish while handling a retry fault.

Remove VMC from the IH storm client list and enable ring1 write
pointer overflow; IH will then drop retry fault interrupts and still
be able to receive other interrupts while the driver handles a retry
fault.

IH does not write the ring1 write pointer back to memory, and the
ring1 write pointer recorded by self-irq is not updated, so always
read the latest ring1 write pointer from the register.
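
After the change, the write pointer read takes this shape (condensed
from the vega10_ih_get_wptr hunk below):

  if (ih == &adev->irq.ih) {
          /* only ring0 supports writeback */
          wptr = le32_to_cpu(*ih->wptr_cpu);
          if (!REG_GET_FIELD(wptr, IH_RB_WPTR, RB_OVERFLOW))
                  goto out;
  }
  /* ring1/ring2, or an overflowed ring0: read the register */
  wptr = RREG32_NO_KIQ(ih->ih_regs.ih_rb_wptr);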

Signed-off-by: Philip Yang 
Signed-off-by: Felix Kuehling 
---
 drivers/gpu/drm/amd/amdgpu/vega10_ih.c | 32 +-
 drivers/gpu/drm/amd/amdgpu/vega20_ih.c | 32 +-
 2 files changed, 22 insertions(+), 42 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/vega10_ih.c 
b/drivers/gpu/drm/amd/amdgpu/vega10_ih.c
index 88626d83e07b..ca8efa5c6978 100644
--- a/drivers/gpu/drm/amd/amdgpu/vega10_ih.c
+++ b/drivers/gpu/drm/amd/amdgpu/vega10_ih.c
@@ -220,10 +220,8 @@ static int vega10_ih_enable_ring(struct amdgpu_device 
*adev,
tmp = vega10_ih_rb_cntl(ih, tmp);
if (ih == &adev->irq.ih)
tmp = REG_SET_FIELD(tmp, IH_RB_CNTL, RPTR_REARM, !!adev->irq.msi_enabled);
-   if (ih == &adev->irq.ih1) {
-   tmp = REG_SET_FIELD(tmp, IH_RB_CNTL, WPTR_OVERFLOW_ENABLE, 0);
+   if (ih == &adev->irq.ih1)
tmp = REG_SET_FIELD(tmp, IH_RB_CNTL, RB_FULL_DRAIN_ENABLE, 1);
-   }
if (amdgpu_sriov_vf(adev)) {
if (psp_reg_program(&adev->psp, ih_regs->psp_reg_id, tmp)) {
dev_err(adev->dev, "PSP program IH_RB_CNTL failed!\n");
@@ -265,7 +263,6 @@ static int vega10_ih_irq_init(struct amdgpu_device *adev)
u32 ih_chicken;
int ret;
int i;
-   u32 tmp;
 
/* disable irqs */
ret = vega10_ih_toggle_interrupts(adev, false);
@@ -291,15 +288,6 @@ static int vega10_ih_irq_init(struct amdgpu_device *adev)
}
}
 
-   tmp = RREG32_SOC15(OSSSYS, 0, mmIH_STORM_CLIENT_LIST_CNTL);
-   tmp = REG_SET_FIELD(tmp, IH_STORM_CLIENT_LIST_CNTL,
-   CLIENT18_IS_STORM_CLIENT, 1);
-   WREG32_SOC15(OSSSYS, 0, mmIH_STORM_CLIENT_LIST_CNTL, tmp);
-
-   tmp = RREG32_SOC15(OSSSYS, 0, mmIH_INT_FLOOD_CNTL);
-   tmp = REG_SET_FIELD(tmp, IH_INT_FLOOD_CNTL, FLOOD_CNTL_ENABLE, 1);
-   WREG32_SOC15(OSSSYS, 0, mmIH_INT_FLOOD_CNTL, tmp);
-
pci_set_master(adev->pdev);
 
/* enable interrupts */
@@ -345,11 +333,17 @@ static u32 vega10_ih_get_wptr(struct amdgpu_device *adev,
u32 wptr, tmp;
struct amdgpu_ih_regs *ih_regs;
 
-   wptr = le32_to_cpu(*ih->wptr_cpu);
-   ih_regs = &ih->ih_regs;
+   if (ih == &adev->irq.ih) {
+   /* Only ring0 supports writeback. On other rings fall back
+* to register-based code with overflow checking below.
+*/
+   wptr = le32_to_cpu(*ih->wptr_cpu);
 
-   if (!REG_GET_FIELD(wptr, IH_RB_WPTR, RB_OVERFLOW))
-   goto out;
+   if (!REG_GET_FIELD(wptr, IH_RB_WPTR, RB_OVERFLOW))
+   goto out;
+   }
+
+   ih_regs = &ih->ih_regs;
 
/* Double check that the overflow wasn't already cleared. */
wptr = RREG32_NO_KIQ(ih_regs->ih_rb_wptr);
@@ -440,15 +434,11 @@ static int vega10_ih_self_irq(struct amdgpu_device *adev,
  struct amdgpu_irq_src *source,
  struct amdgpu_iv_entry *entry)
 {
-   uint32_t wptr = cpu_to_le32(entry->src_data[0]);
-
switch (entry->ring_id) {
case 1:
-   *adev->irq.ih1.wptr_cpu = wptr;
schedule_work(&adev->irq.ih1_work);
break;
case 2:
-   *adev->irq.ih2.wptr_cpu = wptr;
schedule_work(&adev->irq.ih2_work);
break;
default: break;
diff --git a/drivers/gpu/drm/amd/amdgpu/vega20_ih.c 
b/drivers/gpu/drm/amd/amdgpu/vega20_ih.c
index 42032ca380cc..60d1bd51781e 100644
--- a/drivers/gpu/drm/amd/amdgpu/vega20_ih.c
+++ b/drivers/gpu/drm/amd/amdgpu/vega20_ih.c
@@ -220,10 +220,8 @@ static int vega20_ih_enable_ring(struct amdgpu_device 
*adev,
tmp = vega20_ih_rb_cntl(ih, tmp);
if (ih == &adev->irq.ih)
tmp = REG_SET_FIELD(tmp, IH_RB_CNTL, RPTR_REARM, !!adev->irq.msi_enabled);
-   if (ih == &adev->irq.ih1) {
-   tmp = REG_SET_FIELD(tmp, IH_RB_CNTL, WPTR_OVERFLOW_ENABLE, 0);
+   if (ih == &adev->irq.ih1)
tmp = REG_SET_FIELD(tmp, IH_RB_CNTL, RB_FULL_DRAIN_ENABLE, 1);
-   }
if (amdgpu_sriov_vf(adev)) {
if (psp_reg_program(&adev->psp, ih_regs->psp_reg_id, tmp)) {
dev_err(adev->dev, "PSP program IH_RB_CNTL failed!\n");
@@ -297,7

[PATCH 33/35] drm/amdkfd: refine migration policy with xnack on

2021-01-06 Thread Felix Kuehling
From: Philip Yang 

With xnack on, the GPU vm fault handler decides the best restore
location, migrates the range there, and updates the GPU mapping to
recover from the GPU vm fault.
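
Conceptually, the xnack-on recovery in the fault handler becomes the
following (a rough sketch; svm_range_best_restore_location is an
assumed helper name for the best-location decision, which is not shown
in full in this excerpt):

  best_loc = svm_range_best_restore_location(prange, adev, &gpuidx); /* assumed */
  if (best_loc && best_loc != prange->actual_loc)
          r = svm_migrate_ram_to_vram(prange, best_loc, mm);
  else if (!best_loc && prange->actual_loc)
          r = svm_migrate_vram_to_ram(prange, mm);
  if (!r)
          r = svm_range_validate(mm, prange);
  if (!r)
          r = svm_range_map_to_gpus(prange, true);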

Signed-off-by: Philip Yang 
Signed-off-by: Alex Sierra 
Signed-off-by: Felix Kuehling 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c   |   2 +-
 drivers/gpu/drm/amd/amdkfd/kfd_migrate.c |  25 +++-
 drivers/gpu/drm/amd/amdkfd/kfd_migrate.h |   3 +-
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h|   3 +
 drivers/gpu/drm/amd/amdkfd/kfd_process.c |  16 +++
 drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 162 +++
 drivers/gpu/drm/amd/amdkfd/kfd_svm.h |   3 +-
 7 files changed, 180 insertions(+), 34 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index bd9de870f8f1..50a8f4db22f6 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -3320,7 +3320,7 @@ bool amdgpu_vm_handle_fault(struct amdgpu_device *adev, 
unsigned int pasid,
addr /= AMDGPU_GPU_PAGE_SIZE;
 
if (!amdgpu_noretry && is_compute_context &&
-   !svm_range_restore_pages(adev, pasid, addr)) {
+   !svm_range_restore_pages(adev, vm, pasid, addr)) {
amdgpu_bo_unref(&root);
return true;
}
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
index d33a4cc63495..2095417c7846 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
@@ -441,6 +441,7 @@ svm_migrate_vma_to_vram(struct amdgpu_device *adev, struct 
svm_range *prange,
  * svm_migrate_ram_to_vram - migrate svm range from system to device
  * @prange: range structure
  * @best_loc: the device to migrate to
+ * @mm: the process mm structure
  *
  * Context: Process context, caller hold mm->mmap_sem and prange->lock and take
  *  svms srcu read lock.
@@ -448,12 +449,12 @@ svm_migrate_vma_to_vram(struct amdgpu_device *adev, 
struct svm_range *prange,
  * Return:
  * 0 - OK, otherwise error code
  */
-int svm_migrate_ram_to_vram(struct svm_range *prange, uint32_t best_loc)
+int svm_migrate_ram_to_vram(struct svm_range *prange, uint32_t best_loc,
+   struct mm_struct *mm)
 {
unsigned long addr, start, end;
struct vm_area_struct *vma;
struct amdgpu_device *adev;
-   struct mm_struct *mm;
int r = 0;
 
if (prange->actual_loc == best_loc) {
@@ -475,8 +476,6 @@ int svm_migrate_ram_to_vram(struct svm_range *prange, 
uint32_t best_loc)
start = prange->it_node.start << PAGE_SHIFT;
end = (prange->it_node.last + 1) << PAGE_SHIFT;
 
-   mm = current->mm;
-
for (addr = start; addr < end;) {
unsigned long next;
 
@@ -740,12 +739,26 @@ static vm_fault_t svm_migrate_to_ram(struct vm_fault *vmf)
list_for_each_entry(prange, &list, update_list) {
mutex_lock(&prange->mutex);
r = svm_migrate_vram_to_ram(prange, vma->vm_mm);
-   mutex_unlock(&prange->mutex);
if (r) {
pr_debug("failed %d migrate [0x%lx 0x%lx] to ram\n", r,
 prange->it_node.start, prange->it_node.last);
-   goto out_srcu;
+   goto next;
}
+
+   /* xnack off, svm_range_restore_work will update GPU mapping */
+   if (!p->xnack_enabled)
+   goto next;
+
+   /* xnack on, update mapping on GPUs with ACCESS_IN_PLACE */
+   r = svm_range_map_to_gpus(prange, true);
+   if (r)
+   pr_debug("failed %d to map svms 0x%p [0x%lx 0x%lx]\n",
+r, prange->svms, prange->it_node.start,
+prange->it_node.last);
+next:
+   mutex_unlock(&prange->mutex);
+   if (r)
+   break;
}
 
 out_srcu:
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.h 
b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.h
index 95fd7b21791f..9949b55d3b6a 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.h
@@ -37,7 +37,8 @@ enum MIGRATION_COPY_DIR {
FROM_VRAM_TO_RAM
 };
 
-int svm_migrate_ram_to_vram(struct svm_range *prange,  uint32_t best_loc);
+int svm_migrate_ram_to_vram(struct svm_range *prange,  uint32_t best_loc,
+   struct mm_struct *mm);
 int svm_migrate_vram_to_ram(struct svm_range *prange, struct mm_struct *mm);
 unsigned long
 svm_migrate_addr_to_pfn(struct amdgpu_device *adev, unsigned long addr);
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h 
b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index d5367e770b39..db94f963eb7e 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -864,6 +864,9 @@ int kfd_process_gpuid_f

[PATCH 35/35] drm/amdkfd: multiple gpu migrate vram to vram

2021-01-06 Thread Felix Kuehling
From: Philip Yang 

If a range is prefetched to a gpu while its actual location is another
gpu, or a GPU retry fault restores pages of a range whose actual
location is another gpu, migrate the range from one gpu to the other.

Use system memory as a bridge because the sdma engine may not be able
to access the other gpu's vram: use the source gpu's sdma to migrate
to system memory, then the destination gpu's sdma to migrate from
system memory to the destination gpu.

Print out gpuid or gpuidx in debug messages.
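
The resulting dispatch, condensed (error handling dropped; the full
versions are in the hunks below):

  if (!prange->actual_loc) {
          /* currently in ram: migrate directly */
          svm_migrate_ram_to_vram(prange, best_loc, mm);
  } else {
          /* currently in a gpu's vram: hop through system memory */
          svm_migrate_vram_to_ram(prange, mm);
          svm_migrate_ram_to_vram(prange, best_loc, mm);
  }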

Signed-off-by: Philip Yang 
Signed-off-by: Felix Kuehling 
---
 drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 57 +--
 drivers/gpu/drm/amd/amdkfd/kfd_migrate.h |  4 +-
 drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 70 +---
 3 files changed, 103 insertions(+), 28 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
index 2095417c7846..6c644472cead 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
@@ -449,8 +449,9 @@ svm_migrate_vma_to_vram(struct amdgpu_device *adev, struct 
svm_range *prange,
  * Return:
  * 0 - OK, otherwise error code
  */
-int svm_migrate_ram_to_vram(struct svm_range *prange, uint32_t best_loc,
-   struct mm_struct *mm)
+static int
+svm_migrate_ram_to_vram(struct svm_range *prange, uint32_t best_loc,
+   struct mm_struct *mm)
 {
unsigned long addr, start, end;
struct vm_area_struct *vma;
@@ -470,8 +471,8 @@ int svm_migrate_ram_to_vram(struct svm_range *prange, 
uint32_t best_loc,
return -ENODEV;
}
 
-   pr_debug("svms 0x%p [0x%lx 0x%lx]\n", prange->svms,
-prange->it_node.start, prange->it_node.last);
+   pr_debug("svms 0x%p [0x%lx 0x%lx] to gpu 0x%x\n", prange->svms,
+prange->it_node.start, prange->it_node.last, best_loc);
 
start = prange->it_node.start << PAGE_SHIFT;
end = (prange->it_node.last + 1) << PAGE_SHIFT;
@@ -668,8 +669,9 @@ int svm_migrate_vram_to_ram(struct svm_range *prange, 
struct mm_struct *mm)
return -ENODEV;
}
 
-   pr_debug("svms 0x%p [0x%lx 0x%lx]\n", prange->svms,
-prange->it_node.start, prange->it_node.last);
+   pr_debug("svms 0x%p [0x%lx 0x%lx] from gpu 0x%x to ram\n", prange->svms,
+prange->it_node.start, prange->it_node.last,
+prange->actual_loc);
 
start = prange->it_node.start << PAGE_SHIFT;
end = (prange->it_node.last + 1) << PAGE_SHIFT;
@@ -696,6 +698,49 @@ int svm_migrate_vram_to_ram(struct svm_range *prange, 
struct mm_struct *mm)
return r;
 }
 
+/**
+ * svm_migrate_vram_to_vram - migrate svm range from device to device
+ * @prange: range structure
+ * @best_loc: the device to migrate to
+ * @mm: process mm, use current->mm if NULL
+ *
+ * Context: Process context, caller hold mm->mmap_sem and prange->lock and take
+ *  svms srcu read lock
+ *
+ * Return:
+ * 0 - OK, otherwise error code
+ */
+static int
+svm_migrate_vram_to_vram(struct svm_range *prange, uint32_t best_loc,
+struct mm_struct *mm)
+{
+   int r;
+
+   /*
+* TODO: for both devices with PCIe large bar or on same xgmi hive, skip
+* system memory as migration bridge
+*/
+
+   pr_debug("from gpu 0x%x to gpu 0x%x\n", prange->actual_loc, best_loc);
+
+   r = svm_migrate_vram_to_ram(prange, mm);
+   if (r)
+   return r;
+
+   return svm_migrate_ram_to_vram(prange, best_loc, mm);
+}
+
+int
+svm_migrate_to_vram(struct svm_range *prange, uint32_t best_loc,
+   struct mm_struct *mm)
+{
+   if  (!prange->actual_loc)
+   return svm_migrate_ram_to_vram(prange, best_loc, mm);
+   else
+   return svm_migrate_vram_to_vram(prange, best_loc, mm);
+
+}
+
 /**
  * svm_migrate_to_ram - CPU page fault handler
  * @vmf: CPU vm fault vma, address
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.h 
b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.h
index 9949b55d3b6a..bc680619d135 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.h
@@ -37,8 +37,8 @@ enum MIGRATION_COPY_DIR {
FROM_VRAM_TO_RAM
 };
 
-int svm_migrate_ram_to_vram(struct svm_range *prange,  uint32_t best_loc,
-   struct mm_struct *mm);
+int svm_migrate_to_vram(struct svm_range *prange,  uint32_t best_loc,
+   struct mm_struct *mm);
 int svm_migrate_vram_to_ram(struct svm_range *prange, struct mm_struct *mm);
 unsigned long
 svm_migrate_addr_to_pfn(struct amdgpu_device *adev, unsigned long addr);
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index 65f20a72ddcb..d029fce94db0 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -288,8 +288,11 @@ static void svm_range_bo_unref(struct svm_range_bo *svm_bo)
  

[PATCH 34/35] drm/amdkfd: add svm range validate timestamp

2021-01-06 Thread Felix Kuehling
From: Philip Yang 

With xnack on, add a validate timestamp in order to handle GPU vm
faults from multiple GPUs.

If a GPU retry fault requires migrating the range to the best restore
location, use the range validate timestamp, which records the system
timestamp taken after the range was restored and the GPU page table
updated.

Because multiple pages of the same range generate multiple retry
faults, define AMDGPU_SVM_RANGE_RETRY_FAULT_PENDING as the long period
during which pending retry faults may still arrive after a page table
update, and use it to skip duplicate retry faults of the same range.

If the difference between the system timestamp and the range's last
validate timestamp is bigger than
AMDGPU_SVM_RANGE_RETRY_FAULT_PENDING, the retry fault is from another
GPU, so continue with the retry fault recovery.
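
The duplicate-fault filter, condensed from the hunk below:

  timestamp = ktime_to_us(ktime_get()) - prange->validate_timestamp;
  /* skip duplicate vm faults on different pages of the same range */
  if (timestamp < AMDGPU_SVM_RANGE_RETRY_FAULT_PENDING)
          return 0;   /* restored recently, likely the same GPU */
  /* otherwise the fault is likely from another GPU: restore again */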

Signed-off-by: Philip Yang 
Signed-off-by: Felix Kuehling 
---
 drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 27 +++
 drivers/gpu/drm/amd/amdkfd/kfd_svm.h |  2 ++
 2 files changed, 25 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index 8b57f5a471bd..65f20a72ddcb 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -34,6 +34,11 @@
 
 #define AMDGPU_SVM_RANGE_RESTORE_DELAY_MS 1
 
+/* Long enough to ensure no retry fault comes after svm range is restored and
+ * page table is updated.
+ */
+#define AMDGPU_SVM_RANGE_RETRY_FAULT_PENDING   2000
+
 static void svm_range_evict_svm_bo_worker(struct work_struct *work);
 /**
  * svm_range_unlink - unlink svm_range from lists and interval tree
@@ -122,6 +127,7 @@ svm_range *svm_range_new(struct svm_range_list *svms, 
uint64_t start,
INIT_LIST_HEAD(&prange->remove_list);
INIT_LIST_HEAD(&prange->svm_bo_list);
atomic_set(&prange->invalid, 0);
+   prange->validate_timestamp = ktime_to_us(ktime_get());
mutex_init(&prange->mutex);
spin_lock_init(&prange->svm_bo_lock);
svm_range_set_default_attributes(&prange->preferred_loc,
@@ -482,20 +488,28 @@ static int svm_range_validate_vram(struct svm_range 
*prange)
 static int
 svm_range_validate(struct mm_struct *mm, struct svm_range *prange)
 {
+   struct kfd_process *p;
int r;
 
pr_debug("svms 0x%p [0x%lx 0x%lx] actual loc 0x%x\n", prange->svms,
 prange->it_node.start, prange->it_node.last,
 prange->actual_loc);
 
+   p = container_of(prange->svms, struct kfd_process, svms);
+
if (!prange->actual_loc)
r = svm_range_validate_ram(mm, prange);
else
r = svm_range_validate_vram(prange);
 
-   pr_debug("svms 0x%p [0x%lx 0x%lx] ret %d invalid %d\n", prange->svms,
-prange->it_node.start, prange->it_node.last,
-r, atomic_read(&prange->invalid));
+   if (!r) {
+   if (p->xnack_enabled)
+   atomic_set(&prange->invalid, 0);
+   prange->validate_timestamp = ktime_to_us(ktime_get());
+   }
+
+   pr_debug("svms 0x%p [0x%lx 0x%lx] ret %d\n", prange->svms,
+prange->it_node.start, prange->it_node.last, r);
 
return r;
 }
@@ -1766,6 +1780,7 @@ svm_range_restore_pages(struct amdgpu_device *adev, 
struct amdgpu_vm *vm,
struct svm_range_list *svms;
struct svm_range *prange;
struct kfd_process *p;
+   uint64_t timestamp;
int32_t best_loc;
int srcu_idx;
int r = 0;
@@ -1790,7 +1805,11 @@ svm_range_restore_pages(struct amdgpu_device *adev, 
struct amdgpu_vm *vm,
goto out_srcu_unlock;
}
 
-   if (!atomic_read(&prange->invalid)) {
+   mutex_lock(&prange->mutex);
+   timestamp = ktime_to_us(ktime_get()) - prange->validate_timestamp;
+   mutex_unlock(&prange->mutex);
+   /* skip duplicate vm fault on different pages of same range */
+   if (timestamp < AMDGPU_SVM_RANGE_RETRY_FAULT_PENDING) {
pr_debug("svms 0x%p [0x%lx %lx] already restored\n",
 svms, prange->it_node.start, prange->it_node.last);
goto out_srcu_unlock;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.h 
b/drivers/gpu/drm/amd/amdkfd/kfd_svm.h
index 0685eb04b87c..466ec5537bbb 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.h
@@ -66,6 +66,7 @@ struct svm_range_bo {
  * @actual_loc: the actual location, 0 for CPU, or GPU id
  * @granularity:migration granularity, log2 num pages
  * @invalid:not 0 means cpu page table is invalidated
+ * @validate_timestamp: system timestamp when range is validated
  * @bitmap_access: index bitmap of GPUs which can access the range
  * @bitmap_aip: index bitmap of GPUs which can access the range in place
  *
@@ -95,6 +96,7 @@ struct svm_range {
uint32_tactual_loc;
uint8_t granularity;
atomic_tinvalid;
+   uint64_t   

Re: radeon kernel driver not suppressing ACPI_VIDEO_NOTIFY_PROBE events when it should

2021-01-06 Thread Hans de Goede
Hi,

On 1/6/21 9:38 PM, Alex Deucher wrote:
> On Wed, Jan 6, 2021 at 3:04 PM Hans de Goede  wrote:
>>
>> Hi,
>>
>> On 1/6/21 8:33 PM, Alex Deucher wrote:
>>> On Wed, Jan 6, 2021 at 1:10 PM Hans de Goede  wrote:

 Hi,

 On 1/6/21 6:07 PM, Alex Deucher wrote:
> On Wed, Jan 6, 2021 at 11:25 AM Hans de Goede  wrote:
>>
>> Hi All,
>>
>> I get Cc-ed on all Fedora kernel bugs and this one stood out to me:
>>
>> https://bugzilla.redhat.com/show_bug.cgi?id=1911763
>>
>> Since I've done a lot of work on the acpi-video code I thought I should
>> take a look. I've managed to help the user with a kernel-commandline
>> option which stops video.ko (the acpi-video kernel module) from emitting
>> key-press events for ACPI_VIDEO_NOTIFY_PROBE events.
>>
>> This is on a Dell Vostro laptop with i915/radeon hybrid gfx.
>>
>> I was thinking about adding a DMI quirk for this, but from the brief time
>> that I worked on nouveau (and specifically hybrid gfx setups) I know that
>> these events get fired on hybrid gfx setups when the discrete GPU is
>> powered down and something happens which requires the discrete GPUs 
>> drivers
>> attention, like an external monitor being plugged into a connector 
>> handled
>> by the dGPU (note that is not the case here).
>>
>> So I took a quick look at the radeon code and the radeon_atif_handler()
>> function from drivers/gpu/drm/radeon/radeon_acpi.c. When successful that
>> returns NOTIFY_BAD which suppresses the key-press.
>>
>> But in various cases it returns NOTIFY_DONE instead which does not
>> suppress the key-press event. So I think that the spurious key-press 
>> events
>> which the user is seeing should be avoided by this function returning
>> NOTIFY_BAD.
>>
>> Specifically I'm wondering if we should not return
>> NOTIFY_BAD when count == 0?   I guess this can cause problems if there
>> are multiple GPUs, but we could check if the acpi-event is for the
>> pci-device the radeon driver is bound to. This would require changing the
>> acpi-notify code to also pass the acpi_device pointer as part of the
>> acpi_bus_event but that should not be a problem.
>>
>
> For A+A PX/HG systems, we'd want the notifications for both the dGPU
> and the APU since some of the events are relevant to one or the other.
> ATIF_DGPU_DISPLAY_EVENT is only relevant to the dGPU, while
> ATIF_PANEL_BRIGHTNESS_CHANGE_REQUEST would be possibly relevant to
> both (if there was a mux), but mainly the APU.
> ATIF_SYSTEM_POWER_SOURCE_CHANGE_REQUEST would be relevant to both.
> The other events have extended bits to determine which GPU the event
> is targeted at.

 Right, but AFAIK on hybrid systems there are 2 ACPI video-bus devices,
 one for each of the iGPU and dGPU which is why I suggested passing
 the video-bus acpi_device as extra data in acpi_bus_event and then
 radeon_atif_handler() could check if the acpi_device is the companion
 device of the GPU. This assumes that events for GPU# will also
 originate from (through an ACPI ASL notify call) the ACPI video-bus
 which belongs to that GPU.
>>>
>>> That's not the case.  For PX/HG systems, ATIF is in the iGPU's
>>> namespace, on dGPU only systems, ATIF is in the dGPU's namespace.
>>
>> That assumes and AMD iGPU + AMD dGPU I believe ?  The system on
>> which the spurious ACPI_VIDEO_NOTIFY_PROBE events lead to spurious
>> KEY_SWITCHVIDEOMODE key-presses being reported uses an Intel iGPU
>> with an AMD dGPU. I don't have any hybrid gfx systems available for
>> testing atm, but I believe that in this case there will be 2 ACPI
>> video-busses, one for each GPU.
> 
> I think the ATIF method will be on the iGPU regardless of whether it's
> intel or AMD.

Ok.

>> Note I'm not saying that that means that checking the originating
>> ACPI device is the companion of the GPUs PCI-device is the solution
>> here. But so far all I've heard from you is that that is not the
>> solution, without you offering any alternative ideas / possible
>> solutions to try for filtering out these spurious key-presses.
> 
> Sorry, I'm not really an ACPI expert.  I think returning NOTIFY_BAD is
> fine for this specific case, but I don't know if it will break other
> platforms.

Yes, I'm worried too that it might break other platforms, so that
option is off the table then.

> That said, I don't recall seeing any other similar bugs,
> so maybe this is something specific to this particular laptop.

Ok, the acpi_video.c code already has the option to suppress
key-press reporting based on either a cmdline option or a DMI quirk,
and the reporter of the issue has already confirmed that the kernel
cmdline option works around this. So I will submit a patch for
acpi_video.c to add a DMI quirk for this then. This seems more of a
workaround than a real solution, but it lo

[PATCH v2] drm/amdgpu:Limit the resolution for virtual_display

2021-01-06 Thread Emily Deng
From: "Emily.Deng" 

Limit the total resolution to no more than 16384 pixels wide, which
means dev->mode_info.num_crtc * common_modes[i].w must not exceed
16384.
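
A quick sanity check of the new predicate, assuming the common mode
list contains 2560- and 4096-wide entries:

  num_crtc = 4:  4 * 4096 = 16384 <= 16384  (mode kept)
  num_crtc = 6:  6 * 2560 = 15360 <= 16384  (mode kept)
                 6 * 4096 = 24576 >  16384  (mode dropped)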

v2:
  Refine the code

Signed-off-by: Emily.Deng 
---
 drivers/gpu/drm/amd/amdgpu/dce_virtual.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/dce_virtual.c 
b/drivers/gpu/drm/amd/amdgpu/dce_virtual.c
index 2b16c8faca34..fd2b3a6dfd60 100644
--- a/drivers/gpu/drm/amd/amdgpu/dce_virtual.c
+++ b/drivers/gpu/drm/amd/amdgpu/dce_virtual.c
@@ -319,6 +319,7 @@ dce_virtual_encoder(struct drm_connector *connector)
 static int dce_virtual_get_modes(struct drm_connector *connector)
 {
struct drm_device *dev = connector->dev;
+   struct amdgpu_device *adev = dev->dev_private;
struct drm_display_mode *mode = NULL;
unsigned i;
static const struct mode_size {
@@ -350,8 +351,10 @@ static int dce_virtual_get_modes(struct drm_connector 
*connector)
};
 
for (i = 0; i < ARRAY_SIZE(common_modes); i++) {
-   mode = drm_cvt_mode(dev, common_modes[i].w, common_modes[i].h, 60, false, false, false);
-   drm_mode_probed_add(connector, mode);
+   if (adev->mode_info.num_crtc * common_modes[i].w <= 16384) {
+   mode = drm_cvt_mode(dev, common_modes[i].w, common_modes[i].h, 60, false, false, false);
+   drm_mode_probed_add(connector, mode);
+   }
}
 
return 0;
-- 
2.25.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


RE: [PATCH v2] drm/amdgpu/psp: fix psp gfx ctrl cmds

2021-01-06 Thread Deng, Emily
[AMD Official Use Only - Internal Distribution Only]

Reviewed-by: Evan Quan 

>-Original Message-
>From: amd-gfx  On Behalf Of Victor
>Zhao
>Sent: Tuesday, January 5, 2021 3:51 PM
>To: amd-gfx@lists.freedesktop.org
>Cc: Zhao, Victor 
>Subject: [PATCH v2] drm/amdgpu/psp: fix psp gfx ctrl cmds
>
>psp GFX_CTRL_CMD_ID_CONSUME_CMD different for windows and linux,
>according to psp, linux cmds are not correct.
>
>v2: only correct GFX_CTRL_CMD_ID_CONSUME_CMD.
>
>Signed-off-by: Victor Zhao 
>---
> drivers/gpu/drm/amd/amdgpu/psp_gfx_if.h | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
>diff --git a/drivers/gpu/drm/amd/amdgpu/psp_gfx_if.h
>b/drivers/gpu/drm/amd/amdgpu/psp_gfx_if.h
>index d65a5339d354..3ba7bdfde65d 100644
>--- a/drivers/gpu/drm/amd/amdgpu/psp_gfx_if.h
>+++ b/drivers/gpu/drm/amd/amdgpu/psp_gfx_if.h
>@@ -47,7 +47,7 @@ enum psp_gfx_crtl_cmd_id
> GFX_CTRL_CMD_ID_DISABLE_INT = 0x0006,   /* disable PSP-to-Gfx
>interrupt */
> GFX_CTRL_CMD_ID_MODE1_RST   = 0x0007,   /* trigger the Mode 1
>reset */
> GFX_CTRL_CMD_ID_GBR_IH_SET  = 0x0008,   /* set Gbr
>IH_RB_CNTL registers */
>-GFX_CTRL_CMD_ID_CONSUME_CMD = 0x000A,   /* send interrupt
>to psp for updating write pointer of vf */
>+GFX_CTRL_CMD_ID_CONSUME_CMD = 0x0009,   /* send interrupt
>to psp for updating write pointer of vf */
> GFX_CTRL_CMD_ID_DESTROY_GPCOM_RING = 0x000C, /* destroy
>GPCOM ring */
>
> GFX_CTRL_CMD_ID_MAX = 0x000F,   /* max command ID */
>--
>2.25.1
>
>___
>amd-gfx mailing list
>amd-gfx@lists.freedesktop.org
>https://lists.freedesktop.org/mailman/listinfo/amd-gfx
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


RE: [PATCH v2] drm/amdgpu/psp: fix psp gfx ctrl cmds

2021-01-06 Thread Deng, Emily
[AMD Official Use Only - Internal Distribution Only]

Sorry, I replied to the wrong message.
Reviewed-by: Emily.Deng 

>-Original Message-
>From: amd-gfx  On Behalf Of Deng,
>Emily
>Sent: Thursday, January 7, 2021 2:26 PM
>To: Zhao, Victor ; amd-gfx@lists.freedesktop.org
>Cc: Zhao, Victor 
>Subject: RE: [PATCH v2] drm/amdgpu/psp: fix psp gfx ctrl cmds
>
>[AMD Official Use Only - Internal Distribution Only]
>
>[AMD Official Use Only - Internal Distribution Only]
>
>Reviewed-by: Evan Quan 
>
>>-Original Message-
>>From: amd-gfx  On Behalf Of
>>Victor Zhao
>>Sent: Tuesday, January 5, 2021 3:51 PM
>>To: amd-gfx@lists.freedesktop.org
>>Cc: Zhao, Victor 
>>Subject: [PATCH v2] drm/amdgpu/psp: fix psp gfx ctrl cmds
>>
>>psp GFX_CTRL_CMD_ID_CONSUME_CMD different for windows and linux,
>>according to psp, linux cmds are not correct.
>>
>>v2: only correct GFX_CTRL_CMD_ID_CONSUME_CMD.
>>
>>Signed-off-by: Victor Zhao 
>>---
>> drivers/gpu/drm/amd/amdgpu/psp_gfx_if.h | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>>diff --git a/drivers/gpu/drm/amd/amdgpu/psp_gfx_if.h
>>b/drivers/gpu/drm/amd/amdgpu/psp_gfx_if.h
>>index d65a5339d354..3ba7bdfde65d 100644
>>--- a/drivers/gpu/drm/amd/amdgpu/psp_gfx_if.h
>>+++ b/drivers/gpu/drm/amd/amdgpu/psp_gfx_if.h
>>@@ -47,7 +47,7 @@ enum psp_gfx_crtl_cmd_id
>> GFX_CTRL_CMD_ID_DISABLE_INT = 0x0006,   /* disable PSP-to-Gfx
>>interrupt */
>> GFX_CTRL_CMD_ID_MODE1_RST   = 0x0007,   /* trigger the Mode
>1
>>reset */
>> GFX_CTRL_CMD_ID_GBR_IH_SET  = 0x0008,   /* set Gbr
>>IH_RB_CNTL registers */
>>-GFX_CTRL_CMD_ID_CONSUME_CMD = 0x000A,   /* send interrupt
>>to psp for updating write pointer of vf */
>>+GFX_CTRL_CMD_ID_CONSUME_CMD = 0x0009,   /* send interrupt
>>to psp for updating write pointer of vf */
>> GFX_CTRL_CMD_ID_DESTROY_GPCOM_RING = 0x000C, /* destroy
>GPCOM
>>ring */
>>
>> GFX_CTRL_CMD_ID_MAX = 0x000F,   /* max command ID */
>>--
>>2.25.1
>>
>>___
>>amd-gfx mailing list
>>amd-gfx@lists.freedesktop.org
>>https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>___
>amd-gfx mailing list
>amd-gfx@lists.freedesktop.org
>https://lists.freedesktop.org/mailman/listinfo/amd-gfx
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx