[PATCH] Revert "drm/amdgpu: init iommu after amdkfd device init"

2024-05-24 Thread Armin Wolf
This reverts commit 56b522f4668167096a50c39446d6263c96219f5f.

A user reported that this commit breaks the integrated gpu of his
notebook, causing a black screen. He was able to bisect the problematic
commit and verified that by reverting it the notebook works again.
He also confirmed that kernel 6.8.1 also works on his device, so the
upstream commit itself seems to be ok.

An amdgpu developer (Alex Deucher) confirmed that this patch should
have never been ported to 5.15 in the first place, so revert this
commit from the 5.15 stable series.

Reported-by: Barry Kauler 
Signed-off-by: Armin Wolf 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 222a1d9ecf16..5f6c32ec674d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -2487,6 +2487,10 @@ static int amdgpu_device_ip_init(struct amdgpu_device *adev)
 	if (r)
 		goto init_failed;

+	r = amdgpu_amdkfd_resume_iommu(adev);
+	if (r)
+		goto init_failed;
+
 	r = amdgpu_device_ip_hw_init_phase1(adev);
 	if (r)
 		goto init_failed;
@@ -2525,10 +2529,6 @@ static int amdgpu_device_ip_init(struct amdgpu_device *adev)
 	if (!adev->gmc.xgmi.pending_reset)
 		amdgpu_amdkfd_device_init(adev);

-	r = amdgpu_amdkfd_resume_iommu(adev);
-	if (r)
-		goto init_failed;
-
 	amdgpu_fru_get_product_info(adev);

 init_failed:
--
2.39.2
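
For readers skimming the diff, the reordering it performs can be modelled
as a toy program. This is only a sketch and not driver code: step(), trace
and the two order_*() functions are hypothetical names made up for
illustration, and every init step between hardware init phase 1 and the
amdkfd device init is left out. It shows nothing more than the relative
position of the IOMMU resume call before and after the revert.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical trace buffer: records the order in which steps run. */
static char trace[128];

/* Stand-in for one init step; returns 0 on success like the kernel helpers. */
static int step(const char *name)
{
	/* 4 bytes for the " -> " separator plus 1 for the terminating NUL. */
	assert(strlen(trace) + strlen(name) + 5 <= sizeof(trace));
	if (trace[0])
		strcat(trace, " -> ");
	strcat(trace, name);
	return 0;
}

/* Order restored by the revert: IOMMU resume runs before hw init phase 1. */
static const char *order_after_revert(void)
{
	int r;

	trace[0] = '\0';
	r = step("resume_iommu");
	if (r)
		goto init_failed;
	r = step("hw_init_phase1");
	if (r)
		goto init_failed;
	step("kfd_device_init");
init_failed:
	return trace;
}

/* Order with the backport applied: IOMMU resume runs after amdkfd device init. */
static const char *order_with_backport(void)
{
	int r;

	trace[0] = '\0';
	r = step("hw_init_phase1");
	if (r)
		goto init_failed;
	step("kfd_device_init");
	r = step("resume_iommu");
	if (r)
		goto init_failed;
init_failed:
	return trace;
}
```

Both functions return a pointer to the same static buffer, so print or
compare the result of one call before making the next.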



Re: [PATCH] Revert "drm/amdgpu: init iommu after amdkfd device init"

2024-06-04 Thread Armin Wolf

On 23.05.24 at 19:30, Armin Wolf wrote:

> This reverts commit 56b522f4668167096a50c39446d6263c96219f5f.
>
> A user reported that this commit breaks the integrated gpu of his
> notebook, causing a black screen. He was able to bisect the problematic
> commit and verified that by reverting it the notebook works again.
> He also confirmed that kernel 6.8.1 also works on his device, so the
> upstream commit itself seems to be ok.
>
> An amdgpu developer (Alex Deucher) confirmed that this patch should
> have never been ported to 5.15 in the first place, so revert this
> commit from the 5.15 stable series.

Hi,

what is the status of this?

Armin Wolf







Re: [PATCH] Revert "drm/amdgpu: init iommu after amdkfd device init"

2024-06-04 Thread Felix Kuehling



On 2024-06-03 18:19, Armin Wolf wrote:
> On 23.05.24 at 19:30, Armin Wolf wrote:
>
>> This reverts commit 56b522f4668167096a50c39446d6263c96219f5f.
>>
>> A user reported that this commit breaks the integrated gpu of his
>> notebook, causing a black screen. He was able to bisect the problematic
>> commit and verified that by reverting it the notebook works again.
>> He also confirmed that kernel 6.8.1 also works on his device, so the
>> upstream commit itself seems to be ok.
>>
>> An amdgpu developer (Alex Deucher) confirmed that this patch should
>> have never been ported to 5.15 in the first place, so revert this
>> commit from the 5.15 stable series.
>
> Hi,
>
> what is the status of this?

Which branch is this for? This patch won't apply to anything after Linux
6.5. Support for IOMMUv2 was removed from amdgpu in Linux 6.6 by:

commit c99a2e7ae291e5b19b60443eb6397320ef9e8571
Author: Alex Deucher
Date:   Fri Jul 28 12:20:12 2023 -0400

    drm/amdkfd: drop IOMMUv2 support

    Now that we use the dGPU path for all APUs, drop the
    IOMMUv2 support.

    v2: drop the now unused queue manager functions for gfx7/8 APUs

    Reviewed-by: Felix Kuehling
    Acked-by: Christian König
    Tested-by: Mike Lothian
    Signed-off-by: Alex Deucher

Regards,
  Felix

> Armin Wolf







RE: [PATCH] Revert "drm/amdgpu: init iommu after amdkfd device init"

2024-06-04 Thread Deucher, Alexander
[AMD Official Use Only - AMD Internal Distribution Only]

> -----Original Message-----
> From: Kuehling, Felix
> Sent: Tuesday, June 4, 2024 2:25 PM
> To: Armin Wolf; Deucher, Alexander; Koenig, Christian; Pan, Xinhui;
> gre...@linuxfoundation.org; sas...@kernel.org
> Cc: sta...@vger.kernel.org; bkau...@gmail.com; Zhang, Yifan; Liang, Prike;
> dri-de...@lists.freedesktop.org; amd-gfx@lists.freedesktop.org
> Subject: Re: [PATCH] Revert "drm/amdgpu: init iommu after amdkfd device
> init"
>
>
> On 2024-06-03 18:19, Armin Wolf wrote:
> > On 23.05.24 at 19:30, Armin Wolf wrote:
> >
> >> This reverts commit 56b522f4668167096a50c39446d6263c96219f5f.
> >>
> >> A user reported that this commit breaks the integrated gpu of his
> >> notebook, causing a black screen. He was able to bisect the
> >> problematic commit and verified that by reverting it the notebook works
> >> again.
> >> He also confirmed that kernel 6.8.1 also works on his device, so the
> >> upstream commit itself seems to be ok.
> >>
> >> An amdgpu developer (Alex Deucher) confirmed that this patch should
> >> have never been ported to 5.15 in the first place, so revert this
> >> commit from the 5.15 stable series.
> >
> > Hi,
> >
> > what is the status of this?
>
> Which branch is this for? This patch won't apply to anything after Linux 6.5.

It's applicable to 5.15 stable only.  The original patch caused a regression on 
5.15 so probably should not have been applied there.

Alex


> Support for IOMMUv2 was removed from amdgpu in Linux 6.6 by:
>
> commit c99a2e7ae291e5b19b60443eb6397320ef9e8571
> Author: Alex Deucher 
> Date:   Fri Jul 28 12:20:12 2023 -0400
>
>  drm/amdkfd: drop IOMMUv2 support
>
>  Now that we use the dGPU path for all APUs, drop the
>  IOMMUv2 support.
>
>  v2: drop the now unused queue manager functions for gfx7/8 APUs
>
>  Reviewed-by: Felix Kuehling 
>  Acked-by: Christian König 
>  Tested-by: Mike Lothian 
>  Signed-off-by: Alex Deucher 
>
> Regards,
>   Felix
>
>
> >
> > Armin Wolf
> >


Re: [PATCH] Revert "drm/amdgpu: init iommu after amdkfd device init"

2024-06-11 Thread Armin Wolf

On 04.06.24 at 20:28, Deucher, Alexander wrote:

> [AMD Official Use Only - AMD Internal Distribution Only]
>
>> -----Original Message-----
>> From: Kuehling, Felix
>> Sent: Tuesday, June 4, 2024 2:25 PM
>> To: Armin Wolf; Deucher, Alexander; Koenig, Christian; Pan, Xinhui;
>> gre...@linuxfoundation.org; sas...@kernel.org
>> Cc: sta...@vger.kernel.org; bkau...@gmail.com; Zhang, Yifan; Liang, Prike;
>> dri-de...@lists.freedesktop.org; amd-gfx@lists.freedesktop.org
>> Subject: Re: [PATCH] Revert "drm/amdgpu: init iommu after amdkfd device
>> init"
>>
>> On 2024-06-03 18:19, Armin Wolf wrote:
>>> On 23.05.24 at 19:30, Armin Wolf wrote:
>>>
>>>> This reverts commit 56b522f4668167096a50c39446d6263c96219f5f.
>>>>
>>>> A user reported that this commit breaks the integrated gpu of his
>>>> notebook, causing a black screen. He was able to bisect the
>>>> problematic commit and verified that by reverting it the notebook works
>>>> again.
>>>> He also confirmed that kernel 6.8.1 also works on his device, so the
>>>> upstream commit itself seems to be ok.
>>>>
>>>> An amdgpu developer (Alex Deucher) confirmed that this patch should
>>>> have never been ported to 5.15 in the first place, so revert this
>>>> commit from the 5.15 stable series.
>>>
>>> Hi,
>>>
>>> what is the status of this?
>>
>> Which branch is this for? This patch won't apply to anything after Linux 6.5.
>
> It's applicable to 5.15 stable only.  The original patch caused a regression on
> 5.15 so probably should not have been applied there.
>
> Alex

Correct, and I would be very grateful if this regression could be resolved in
the near future.
The user already wrote a blog post about the whole issue, see here:

https://bkhome.org/news/202405/kernel-amd-gpu-disaster-fixed.html

Thanks,
Armin Wolf






Re: [PATCH] Revert "drm/amdgpu: init iommu after amdkfd device init"

2024-06-12 Thread Matthew Ruffell
Hi Greg KH, Sasha,

Please pick up this patch for 5.15 stable tree. I have built a test kernel and
can confirm that it fixes affected users.

Downstream bug:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2068738

Thanks,
Matthew


Re: [PATCH] Revert "drm/amdgpu: init iommu after amdkfd device init"

2024-06-13 Thread Greg KH
On Wed, Jun 12, 2024 at 12:10:37PM +1200, Matthew Ruffell wrote:
> Hi Greg KH, Sasha,
> 
> Please pick up this patch for 5.15 stable tree. I have built a test kernel and
> can confirm that it fixes affected users.
> 
> Downstream bug:
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2068738

Sorry for the delay, now picked up.

greg k-h


Patch "Revert "drm/amdgpu: init iommu after amdkfd device init"" has been added to the 5.15-stable tree

2024-06-13 Thread gregkh


This is a note to let you know that I've just added the patch titled

Revert "drm/amdgpu: init iommu after amdkfd device init"

to the 5.15-stable tree which can be found at:

http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary

The filename of the patch is:
 revert-drm-amdgpu-init-iommu-after-amdkfd-device-init.patch
and it can be found in the queue-5.15 subdirectory.

If you, or anyone else, feels it should not be added to the stable tree,
please let sta...@vger.kernel.org know about it.


From w_ar...@gmx.de  Wed Jun 12 14:43:21 2024
From: Armin Wolf 
Date: Thu, 23 May 2024 19:30:31 +0200
Subject: Revert "drm/amdgpu: init iommu after amdkfd device init"
To: alexander.deuc...@amd.com, christian.koe...@amd.com, xinhui@amd.com, 
gre...@linuxfoundation.org, sas...@kernel.org
Cc: sta...@vger.kernel.org, bkau...@gmail.com, yifan1.zh...@amd.com, 
prike.li...@amd.com, dri-de...@lists.freedesktop.org, 
amd-gfx@lists.freedesktop.org
Message-ID: <20240523173031.4212-1-w_ar...@gmx.de>

From: Armin Wolf 

This reverts commit 56b522f4668167096a50c39446d6263c96219f5f.

A user reported that this commit breaks the integrated gpu of his
notebook, causing a black screen. He was able to bisect the problematic
commit and verified that by reverting it the notebook works again.
He also confirmed that kernel 6.8.1 also works on his device, so the
upstream commit itself seems to be ok.

An amdgpu developer (Alex Deucher) confirmed that this patch should
have never been ported to 5.15 in the first place, so revert this
commit from the 5.15 stable series.

Reported-by: Barry Kauler 
Signed-off-by: Armin Wolf 
Link: https://lore.kernel.org/r/20240523173031.4212-1-w_ar...@gmx.de
Signed-off-by: Greg Kroah-Hartman 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |    8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -2487,6 +2487,10 @@ static int amdgpu_device_ip_init(struct
 	if (r)
 		goto init_failed;

+	r = amdgpu_amdkfd_resume_iommu(adev);
+	if (r)
+		goto init_failed;
+
 	r = amdgpu_device_ip_hw_init_phase1(adev);
 	if (r)
 		goto init_failed;
@@ -2525,10 +2529,6 @@ static int amdgpu_device_ip_init(struct
 	if (!adev->gmc.xgmi.pending_reset)
 		amdgpu_amdkfd_device_init(adev);

-	r = amdgpu_amdkfd_resume_iommu(adev);
-	if (r)
-		goto init_failed;
-
 	amdgpu_fru_get_product_info(adev);

 init_failed:


Patches currently in stable-queue which might be from w_ar...@gmx.de are

queue-5.15/revert-drm-amdgpu-init-iommu-after-amdkfd-device-init.patch


Re: Patch "Revert "drm/amdgpu: init iommu after amdkfd device init"" has been added to the 5.15-stable tree

2024-06-14 Thread Armin Wolf

On 12.06.24 at 14:45, gre...@linuxfoundation.org wrote:

> This is a note to let you know that I've just added the patch titled
>
>     Revert "drm/amdgpu: init iommu after amdkfd device init"
>
> to the 5.15-stable tree which can be found at:
>
>     http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary
>
> The filename of the patch is:
>     revert-drm-amdgpu-init-iommu-after-amdkfd-device-init.patch
> and it can be found in the queue-5.15 subdirectory.

Thank you :)

> If you, or anyone else, feels it should not be added to the stable tree,
> please let sta...@vger.kernel.org know about it.

