Re: [PATCH] Revert "drm/radeon: handle PCIe root ports with addressing limitations"

2020-09-16 Thread Alex Deucher
On Wed, Sep 16, 2020 at 3:04 AM Christoph Hellwig  wrote:
>
> On Tue, Sep 15, 2020 at 02:46:07PM -0400, Alex Deucher wrote:
> > This change breaks tons of systems.
>
> Did you do at least some basic root causing on why?  Do GPUs get
> fed address they can't deal with?  Any examples?
>
> Bug 1 doesn't seem to contain any analysis and was reported against
> a very old kernel that had all kind of fixes since.
>
> Bug 2 seems to imply a drm kthread is accessing some structure it
> shouldn't, which would imply a mismatch between pools used by radeon
> now and those actually provided by the core.  Something that should
> be pretty to trivial to fix for someone understanding the whole ttm
> pool maze.
>
> Bug 3: same as 1, but an even older kernel.
>
> Bug 4: looks like 1 and 3, and actually verified to work properly
> in 5.9-rc.  Did you try to get the other reporters test this as well?

It would appear that the change in 5.9 to disable AGP on radeon fixed
the issue.  I'm following up on the other tickets to see if I can get
confirmation.  On another thread[1], the user was able to avoid the
issue by disabling HIMEM.  Looks like some issue with HIMEM and/or
AGP.

Alex

[1] https://lkml.org/lkml/2019/12/14/263
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH] Revert "drm/radeon: handle PCIe root ports with addressing limitations"

2020-09-16 Thread Alex Deucher
On Wed, Sep 16, 2020 at 2:32 AM Greg KH  wrote:
>
> On Tue, Sep 15, 2020 at 02:46:07PM -0400, Alex Deucher wrote:
> > This change breaks tons of systems.
>
> Very vague :(

Screen corruption making the system unusable.

>
> This commit has also been merged for over a year, why the sudden
> problem now?
>

It was noticed by several people closer to when the change went in as
well.  If you notice, most of the bugs date back quite a while.  We
looked into it a bit at the time but couldn't determine the problem.
It only seems to affect really old chips (like 15-20 years old) which
makes it hard to reproduce if you don't have an old system.  There
were a couple of threads at the time, but nothing was resolved.  I was
able to find one of them:
https://lkml.org/lkml/2019/12/14/263

There were several new bugs filed which brought the issue back to my
attention recently.

> > This reverts commit 33b3ad3788aba846fc8b9a065fe2685a0b64f713.
>
> You mean "33b3ad3788ab ("drm/radeon: handle PCIe root ports with
> addressing limitations")"?
>
> That's the proper way to reference commits in changelogs please.  It's
> even documented that way...

When you revert a patch with git, that is what it does.  Maybe we
should fix git to change the formatting.

>
> >
> > Bug: https://bugzilla.kernel.org/show_bug.cgi?id=206973
> > Bug: https://bugzilla.kernel.org/show_bug.cgi?id=206697
> > Bug: https://bugzilla.kernel.org/show_bug.cgi?id=207763
> > Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1140
> > Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1287
> > Signed-off-by: Alex Deucher 
> > Cc: sta...@vger.kernel.org
> > Cc: Christoph Hellwig 
> > Cc: christian.koe...@amd.com
>
> Fixes: 33b3ad3788ab ("drm/radeon: handle PCIe root ports with addressing 
> limitations")
>

Sure, I can add that, but it doesn't really fix it, it reverts it.
But point taken, it does fix the commit by removing it.

Alex

> as well?
>
> thanks,
>
> greg k-h
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH] Revert "drm/radeon: handle PCIe root ports with addressing limitations"

2020-09-16 Thread Daniel Vetter
On Wed, Sep 16, 2020 at 08:33:00AM +0200, Greg KH wrote:
> On Tue, Sep 15, 2020 at 02:46:07PM -0400, Alex Deucher wrote:
> > This change breaks tons of systems.
> 
> Very vague :(
> 
> This commit has also been merged for over a year, why the sudden
> problem now?

Unrelated rant, but one year is generally what it takes for most users to
upgrade to new kernels, through their distro updates. Especially for older
hw like the radeon drivers (since 5 years or so amd gpus switched over to
amdgpu.ko).

So surprise that bugs only show up after 1+ year shouldn't be a surprise
:-) My personal rule is that I put a 1 year spacer between a risky change
and any cleanup that enables. Too many regrets in the past.

Cheers, Daniel

> 
> > This reverts commit 33b3ad3788aba846fc8b9a065fe2685a0b64f713.
> 
> You mean "33b3ad3788ab ("drm/radeon: handle PCIe root ports with
> addressing limitations")"?
> 
> That's the proper way to reference commits in changelogs please.  It's
> even documented that way...
> 
> > 
> > Bug: https://bugzilla.kernel.org/show_bug.cgi?id=206973
> > Bug: https://bugzilla.kernel.org/show_bug.cgi?id=206697
> > Bug: https://bugzilla.kernel.org/show_bug.cgi?id=207763
> > Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1140
> > Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1287
> > Signed-off-by: Alex Deucher 
> > Cc: sta...@vger.kernel.org
> > Cc: Christoph Hellwig 
> > Cc: christian.koe...@amd.com
> 
> Fixes: 33b3ad3788ab ("drm/radeon: handle PCIe root ports with addressing 
> limitations")
> 
> as well?
> 
> thanks,
> 
> greg k-h
> ___
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH] Revert "drm/radeon: handle PCIe root ports with addressing limitations"

2020-09-15 Thread Greg KH
On Tue, Sep 15, 2020 at 02:46:07PM -0400, Alex Deucher wrote:
> This change breaks tons of systems.

Very vague :(

This commit has also been merged for over a year, why the sudden
problem now?

> This reverts commit 33b3ad3788aba846fc8b9a065fe2685a0b64f713.

You mean "33b3ad3788ab ("drm/radeon: handle PCIe root ports with
addressing limitations")"?

That's the proper way to reference commits in changelogs please.  It's
even documented that way...

> 
> Bug: https://bugzilla.kernel.org/show_bug.cgi?id=206973
> Bug: https://bugzilla.kernel.org/show_bug.cgi?id=206697
> Bug: https://bugzilla.kernel.org/show_bug.cgi?id=207763
> Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1140
> Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1287
> Signed-off-by: Alex Deucher 
> Cc: sta...@vger.kernel.org
> Cc: Christoph Hellwig 
> Cc: christian.koe...@amd.com

Fixes: 33b3ad3788ab ("drm/radeon: handle PCIe root ports with addressing 
limitations")

as well?

thanks,

greg k-h
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


[PATCH] Revert "drm/radeon: handle PCIe root ports with addressing limitations"

2020-09-15 Thread Alex Deucher
This change breaks tons of systems.

This reverts commit 33b3ad3788aba846fc8b9a065fe2685a0b64f713.

Bug: https://bugzilla.kernel.org/show_bug.cgi?id=206973
Bug: https://bugzilla.kernel.org/show_bug.cgi?id=206697
Bug: https://bugzilla.kernel.org/show_bug.cgi?id=207763
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1140
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1287
Signed-off-by: Alex Deucher 
Cc: sta...@vger.kernel.org
Cc: Christoph Hellwig 
Cc: christian.koe...@amd.com
---
 drivers/gpu/drm/radeon/radeon.h|  1 +
 drivers/gpu/drm/radeon/radeon_device.c | 13 -
 drivers/gpu/drm/radeon/radeon_ttm.c|  2 +-
 3 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
index b7c3fb2bfb54..eed23dffccf4 100644
--- a/drivers/gpu/drm/radeon/radeon.h
+++ b/drivers/gpu/drm/radeon/radeon.h
@@ -2391,6 +2391,7 @@ struct radeon_device {
struct radeon_wbwb;
struct radeon_dummy_pagedummy_page;
boolshutdown;
+   boolneed_dma32;
boolneed_swiotlb;
boolaccel_working;
boolfastfb_working; /* IGP feature*/
diff --git a/drivers/gpu/drm/radeon/radeon_device.c 
b/drivers/gpu/drm/radeon/radeon_device.c
index 266e3cbbd09b..f74c74ad8b5d 100644
--- a/drivers/gpu/drm/radeon/radeon_device.c
+++ b/drivers/gpu/drm/radeon/radeon_device.c
@@ -1363,25 +1363,28 @@ int radeon_device_init(struct radeon_device *rdev,
else
rdev->mc.mc_mask = 0xULL; /* 32 bit MC */
 
-   /* set DMA mask.
+   /* set DMA mask + need_dma32 flags.
 * PCIE - can handle 40-bits.
 * IGP - can handle 40-bits
 * AGP - generally dma32 is safest
 * PCI - dma32 for legacy pci gart, 40 bits on newer asics
 */
-   dma_bits = 40;
+   rdev->need_dma32 = false;
if (rdev->flags & RADEON_IS_AGP)
-   dma_bits = 32;
+   rdev->need_dma32 = true;
if ((rdev->flags & RADEON_IS_PCI) &&
(rdev->family <= CHIP_RS740))
-   dma_bits = 32;
+   rdev->need_dma32 = true;
 #ifdef CONFIG_PPC64
if (rdev->family == CHIP_CEDAR)
-   dma_bits = 32;
+   rdev->need_dma32 = true;
 #endif
 
+   dma_bits = rdev->need_dma32 ? 32 : 40;
r = dma_set_mask_and_coherent(&rdev->pdev->dev, DMA_BIT_MASK(dma_bits));
if (r) {
+   rdev->need_dma32 = true;
+   dma_bits = 32;
pr_warn("radeon: No suitable DMA available\n");
return r;
}
diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c 
b/drivers/gpu/drm/radeon/radeon_ttm.c
index 357e8e98cca9..d2550862313e 100644
--- a/drivers/gpu/drm/radeon/radeon_ttm.c
+++ b/drivers/gpu/drm/radeon/radeon_ttm.c
@@ -787,7 +787,7 @@ int radeon_ttm_init(struct radeon_device *rdev)
   &radeon_bo_driver,
   rdev->ddev->anon_inode->i_mapping,
   rdev->ddev->vma_offset_manager,
-  dma_addressing_limited(&rdev->pdev->dev));
+  rdev->need_dma32);
if (r) {
DRM_ERROR("failed initializing buffer object driver(%d).\n", r);
return r;
-- 
2.25.4

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel