[amdgpu] ASSERT()'s in write_i2c*retimer_setting() functions

2019-10-11 Thread Gabriel C
Hello,

I recently built a new box with a Ryzen3 2200G APU.

Each time I plug in an HDMI cable (to a TV or monitor),
or boot with HDMI connected, a lot of ASSERT()s from the
write_i2c*retimer_setting() functions are triggered.

I see the same on a laptop with a Ryzen7 3750H in a
hybrid GPU configuration.

Besides the noise in dmesg and the delay on boot,
everything is working fine. I cannot find anything wrong
or broken.

Since the write errors do not seem to be fatal, I think a friendly message
would help more than flooding dmesg with dumps while
everything is working properly.
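A minimal userspace sketch of the behaviour suggested above, assuming a hypothetical `report_retimer_write_failure()` helper and `struct retimer_log` (neither is taken from the amdgpu DC code): the first failed write produces one readable line, later failures are only counted, so a non-fatal I2C error no longer floods the log with backtraces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical per-retimer log state; not taken from the amdgpu DC code. */
struct retimer_log {
    bool warned;            /* has the one-time message been printed? */
    unsigned int failures;  /* total number of failed I2C writes */
};

/*
 * Record one failed retimer register write. Instead of asserting (and
 * dumping a backtrace) on every failure, print a single readable warning
 * the first time and only count later failures.
 * Returns true if a message was printed by this call.
 */
bool report_retimer_write_failure(struct retimer_log *state,
                                  unsigned int slave_addr,
                                  unsigned int offset)
{
    assert(state != NULL);
    state->failures++;
    if (state->warned)
        return false;
    state->warned = true;
    fprintf(stderr,
            "retimer: I2C write to 0x%02x, offset 0x%02x failed; "
            "output may still work, further failures are not logged\n",
            slave_addr, offset);
    return true;
}
```

In kernel code the same effect would normally come from the existing `*_once` printing helpers such as `dev_warn_once()` rather than a hand-rolled flag.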

Why is ASSERT() used there?

I have uploaded a dmesg from the Ryzen3 box with drm.debug, and a snippet
from the laptop (not near me right now), here:

https://crazy.dev.frugalware.org/amdgpu/

Please let me know if you need more information.
If needed, I can upload a dmesg from the laptop with drm.debug too.


Best Regards,

Gabriel C
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

[Regression] 5.3-rc8 suspending from X broken with amdgpu

2019-09-12 Thread Gabriel C
Hello,

I am testing the latest rc8/Linus git tree on my new
Acer Nitro 5 (AN515-43-R8BF) laptop.

The box has a Ryzen7 3750H APU + RX 560X hybrid GPU setup.

Suspending (closing the lid) from a tty without X running
works fine; however, doing the same with X running
does not work. The display remains black.

It seems to be triggered from

 .. dcn10_hw_sequencer.c:932
dcn10_verify_allow_pstate_change_high.cold+0xc/0x229 [amdgpu]

The dmesg is way too big to post here, so I uploaded it:

dmesg:
 http://crazy.dev.frugalware.org/Nitro5/dmesg.one.txt
lspci:
 http://crazy.dev.frugalware.org/Nitro5/lspci.nnvv.txt
 http://crazy.dev.frugalware.org/Nitro5/lspci.txt
config:
 http://crazy.dev.frugalware.org/Nitro5/config.nitro5-5.3-r8git

I didn't test any other rcX kernels, so I cannot tell whether all are affected,
but 5.2.x kernels work fine on this box.


The build is marked dirty because of this patch, which fixes the
NVMe device on that box:
 https://lkml.org/lkml/2019/9/11/569

If you need more info, please let me know.
I can also test any kind of patches.

Best Regards,

Gabriel C

Re: Kernel and ADM hardware roulette ( was AMD graphics performance regression in 4.15 and later )

2018-06-11 Thread Gabriel C
2018-06-08 8:52 GMT+02:00 Christian König :
> Am 08.06.2018 um 08:02 schrieb Christoph Hellwig:
>>
>> On Thu, Jun 07, 2018 at 02:32:46PM +0200, Gabriel C wrote:
>>>
>>> Ok done.. bisect points to:
>>
>> What is the failure mode you are seeing?  Can't find anything in the
>> mail unfortunately.
>
>
> As far as I analyzed it we now get an -ENOMEM from dma_alloc_attrs() in
> drivers/gpu/drm/ttm/ttm_page_alloc_dma.c when IOMMU is enabled.
>
> Still need to figure out which parameters we want to use for the allocation,
> but I think it is only 4k or 8k.

If you need me to test something, or run debug patches
or patches of any sort, just let me know.

>
> Regards,
> Christian.

BR


Re: Kernel and ADM hardware roulette ( was AMD graphics performance regression in 4.15 and later )

2018-06-08 Thread Gabriel C
2018-06-07 9:07 GMT+02:00 Christian König :
> Am 06.06.2018 um 17:44 schrieb Gabriel C:
>>
>> 2018-06-06 17:03 GMT+02:00 Michel Dänzer :
>>>
>>> On 2018-06-06 04:44 PM, Christian König wrote:
>>>>
>>>> Am 06.06.2018 um 16:12 schrieb Michel Dänzer:
>>>> [SNIP]
>>>> At least in theory it should work when we use the coherent DMA
>>>> allocator.
>>>>
>>>> When that really worked before, so the most likely commit which broke
>>>> this is:
>>>>
>>>> commit fd5fd480dd8fe4910546e7b080b3ae345e57fe9f
>>>> Author: Chunming Zhou 
>>>> Date:   Fri Feb 9 10:44:09 2018 +0800
>>>>
>>>>  drm/amdgpu: only enable swiotlb alloc when need v2
>>>>
>>>>  get the max io mapping address of system memory to see if it is
>>>> over
>>>>  our card accessing range.
>>>>  v2: move checking later
>>>>
>>>>  Signed-off-by: Chunming Zhou 
>>>>  Reviewed-by: Monk Liu 
>>>>  Reviewed-by: Christian König 
>>>>  Signed-off-by: Alex Deucher 
>>>>
>>>> Currently looking into how we could somehow improve this detection.
>>>
>>> I guess this could fit for Gabriel, but e.g.
>>> https://bugs.freedesktop.org/104437 says amdgpu was already broken with
>>> SME in 4.15, if not 4.14 (I suspect there was simply no SME support
>>> earlier).
>
>
> And what I totally missed is that Gabriel is using radeon and not amdgpu.
>
> So Gabriel you need to revert this one for testing:
> commit 1bc3d3cce8c3b44c2b5ac6cee98c830bb40e6b0f
> Author: Chunming Zhou 
> Date:   Fri Feb 9 10:44:10 2018 +0800
>
> drm/radeon: only enable swiotlb path when need v2
>
> swiotlb expands our card accessing range, but its path always is slower
> than ttm pool allocation.
> So add condition to use it.
> v2: move a bit later
>
> Signed-off-by: Chunming Zhou 
> Reviewed-by: Monk Liu 
> Reviewed-by: Christian König 
> Signed-off-by: Alex Deucher 
> Link:
> https://patchwork.freedesktop.org/patch/msgid/20180209024410.1469-3-david1.z...@amd.com
>
>> I got strange performance issues with 4.15 and 4.16, but SME was ON
>> on that setup (even before it hit mainline) and it never broke the GPU
>> like this.
>
>
> Well that is very interesting, you are the first one who reports that SME +
> GFX works in some way. So far we only got negative reports for that.
>
>> There is a 4.16.13 boot dmesg which has no such issue:
>>
>>
>> http://ftp.frugalware.org/pub/other/people/crazy/radeon/dmesg-radeon-SME-ON-kernel-4.16.txt
>>
>> With the setup as is, booting 4.16.x works, while 4.17 throws the errors.
>
>
> Please do the bisect if the patch I've mentioned above doesn't help.

OK, done. The bisect points to:

b468620f2a1dfdcfddfd6fa54367b8bcc1b51248 is the first bad commit
commit b468620f2a1dfdcfddfd6fa54367b8bcc1b51248
Author: Christoph Hellwig 
Date:   Mon Mar 19 11:38:19 2018 +0100

   iommu/amd_iommu: Use CONFIG_DMA_DIRECT_OPS=y and dma_direct_{alloc,free}()

   This cleans up the code a lot by removing duplicate logic.

   Tested-by: Tom Lendacky 
   Tested-by: Joerg Roedel 
   Signed-off-by: Christoph Hellwig 
   Reviewed-by: Thomas Gleixner 
   Acked-by: Joerg Roedel 
   Cc: David Woodhouse 
   Cc: Joerg Roedel 
   Cc: Jon Mason 
   Cc: Konrad Rzeszutek Wilk 
   Cc: Linus Torvalds 
   Cc: Muli Ben-Yehuda 
   Cc: Peter Zijlstra 
   Cc: io...@lists.linux-foundation.org
   Link: http://lkml.kernel.org/r/20180319103826.12853-8-...@lst.de
   Signed-off-by: Ingo Molnar 


I'll try to revert this once I'm home.

BR


Re: Kernel and ADM hardware roulette ( was AMD graphics performance regression in 4.15 and later )

2018-06-08 Thread Gabriel C
>> Well that is very interesting, you are the first one who reports that SME +
>> GFX works in some way. So far we only got negative reports for that.
>>
>>> There is a 4.16.13 boot dmesg which has no such issue:
>>>
>>>
>>> http://ftp.frugalware.org/pub/other/people/crazy/radeon/dmesg-radeon-SME-ON-kernel-4.16.txt
>>>
>>> With the setup as is, booting 4.16.x works, while 4.17 throws the errors.
>>
>>
>> Please do the bisect if the patch I've mentioned above doesn't help.
>
> OK, done. The bisect points to:
>
> b468620f2a1dfdcfddfd6fa54367b8bcc1b51248 is the first bad commit
> commit b468620f2a1dfdcfddfd6fa54367b8bcc1b51248
> Author: Christoph Hellwig 
> Date:   Mon Mar 19 11:38:19 2018 +0100
>
>    iommu/amd_iommu: Use CONFIG_DMA_DIRECT_OPS=y and dma_direct_{alloc,free}()
>
>    This cleans up the code a lot by removing duplicate logic.
>
>    Tested-by: Tom Lendacky
>    Tested-by: Joerg Roedel
>    Signed-off-by: Christoph Hellwig
>    Reviewed-by: Thomas Gleixner
>    Acked-by: Joerg Roedel
>    Cc: David Woodhouse
>    Cc: Joerg Roedel
>    Cc: Jon Mason
>    Cc: Konrad Rzeszutek Wilk
>    Cc: Linus Torvalds
>    Cc: Muli Ben-Yehuda
>    Cc: Peter Zijlstra
>    Cc: io...@lists.linux-foundation.org
>    Link: http://lkml.kernel.org/r/20180319103826.12853-8-...@lst.de
>    Signed-off-by: Ingo Molnar
>
>
> I'll try to revert this once I'm home.

I can confirm that reverting b468620f2a1dfdcfddfd6fa54367b8bcc1b51248
fixes the issue for me.

The GPU is working fine with SME enabled.

Now, with a working GPU :), I can also confirm that performance is back to
normal without any other workarounds.

The only app still acting up a bit is Firefox, with minor frame drops,
but nothing too bad (probably a Firefox bug too).

Chromium/Chrome is fine, even with 10 tabs open, each one playing
a video on YouTube, with no glitches at all.

The desktop is also fine now; I could not find anything wrong.


BR


Re: AMD graphics performance regression in 4.15 and later

2018-06-07 Thread Gabriel C
2018-04-11 7:02 GMT+02:00 Gabriel C :
>>2018-04-11 6:00 GMT+02:00 Gabriel C :
>> 2018-04-09 11:42 GMT+02:00 Christian König 
>> :
>>> Am 07.04.2018 um 00:00 schrieb Jean-Marc Valin:
> ...
>> I can help test code for 4.17/++ if you wish, but that is a *different*
>> story.
>>
>
> I quickly tested 4.16.0-11490-gb284d4d5a678; the amdgpu and radeon drivers
> are broken in this one.
>
> radeon tells:
>
> ...
>
> [6.337838] [drm] PCIE GART of 2048M enabled (table at 0x001D6000).
> [6.338210] radeon :21:00.0: (-12) create WB bo failed
> [6.338214] radeon :21:00.0: disabling GPU acceleration
>
> ...
>

I have the same issue now on the final 4.17.

I also played with BIOS options, which does not fix anything but
changes the error message.

With IOMMU and SR-IOV disabled, the error changes to this:

[7.092044] [drm:r600_ring_test [radeon]] *ERROR* radeon: ring 0 test failed (scratch(0x850C)=0xCAFEDEAD)
[7.092059] radeon :21:00.0: disabling GPU acceleration


While I could work around the SWIOTLB bugs in 4.15 and 4.16, 4.17 seems to
kill the GPU with no way for me to make it work (at least I have not found
any workaround so far).
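As I understand radeon's `r600_ring_test()`, it writes 0xCAFEDEAD to a scratch register, submits a command on the ring telling the GPU to store 0xDEADBEEF there, and then polls for the new value. Reading back 0xCAFEDEAD, as in the ring test error above, therefore suggests the GPU never executed the submitted command. The sketch below is only a userspace model of that check: the GPU is replaced by hypothetical callbacks (`gpu_working`, `gpu_hung`) and nothing here touches real hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RING_TEST_SEED  0xCAFEDEADu  /* written by the driver before submitting */
#define RING_TEST_MAGIC 0xDEADBEEFu  /* value the submitted command should store */

/* Hypothetical stand-in for the GPU executing one "write register" command. */
typedef void (*gpu_exec_fn)(volatile uint32_t *scratch, uint32_t value);

/* Model of a GPU whose ring works: the command is executed. */
void gpu_working(volatile uint32_t *scratch, uint32_t value)
{
    *scratch = value;
}

/* Model of a GPU whose ring never runs: the command is silently lost. */
void gpu_hung(volatile uint32_t *scratch, uint32_t value)
{
    (void)scratch;
    (void)value;
}

/*
 * Seed the scratch register, "submit" a write of the magic value, then poll
 * up to max_polls times. Returns true on success. *last_read always holds
 * the last value read back, so after a failure it shows what the register
 * still contained (RING_TEST_SEED if the command never ran).
 */
bool ring_test(gpu_exec_fn gpu, unsigned int max_polls, uint32_t *last_read)
{
    volatile uint32_t scratch = RING_TEST_SEED;

    assert(gpu != NULL && last_read != NULL);
    gpu(&scratch, RING_TEST_MAGIC);  /* in the driver this goes through ring 0 */
    *last_read = scratch;
    for (unsigned int i = 0; i < max_polls; i++) {
        *last_read = scratch;
        if (*last_read == RING_TEST_MAGIC)
            return true;
    }
    return false;
}
```

The model deliberately ignores timing: in the real driver the poll loop waits between reads, whereas here the callback runs synchronously before polling starts.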

BR


Re: Kernel and ADM hardware roulette ( was AMD graphics performance regression in 4.15 and later )

2018-06-07 Thread Gabriel C
2018-06-06 17:03 GMT+02:00 Michel Dänzer :
> On 2018-06-06 04:44 PM, Christian König wrote:
>> Am 06.06.2018 um 16:12 schrieb Michel Dänzer:
>>> On 2018-06-06 03:33 PM, Gabriel C wrote:
>>>> 2018-06-06 14:19 GMT+02:00 Christian König :
>>>>> Am 06.06.2018 um 14:08 schrieb Gabriel C:
>>>>>> 2018-06-06 13:33 GMT+02:00 Christian König :
>>>>>>> Am 06.06.2018 um 13:28 schrieb Gabriel C:
>>>
>>>>>> http://ftp.frugalware.org/pub/other/people/crazy/radeon/dmesg-iommu-sr-iov-off.txt
>>>>>>
>>>>>>
>>>>>> http://ftp.frugalware.org/pub/other/people/crazy/radeon/dmesg-iommu-sr-iov-on.txt
>>>>>>
>>>>>>
>>>>>> Also, nothing else changed in that setup; I was just testing kernel 4.17.
>>>>>
>>>>>
>>>>> That has nothing to do with the driver nor the original bug you
>>>>> reported. The problem is that SME is active and that is currently not
>>>>> supported at all with that hardware.
>>>>
>>>> OK, so are we now playing kernel and AMD hardware roulette on each
>>>> release?
>>>>
>>>> SME was set up the same way with kernel 4.16.x here and everything worked.
>>>
>>> If that is true, again please bisect which commit broke it.
>>>
>>> All the reports I've seen before this indicated that at least amdgpu
>>> has never worked with SME (which BTW doesn't mean it's never going to
>>> work or that we don't want to support it, just that as far as we know
>>> it's currently not working).
>>
>> At least in theory it should work when we use the coherent DMA allocator.
>>
>> When that really worked before, so the most likely commit which broke
>> this is:
>>
>> commit fd5fd480dd8fe4910546e7b080b3ae345e57fe9f
>> Author: Chunming Zhou 
>> Date:   Fri Feb 9 10:44:09 2018 +0800
>>
>> drm/amdgpu: only enable swiotlb alloc when need v2
>>
>> get the max io mapping address of system memory to see if it is over
>> our card accessing range.
>> v2: move checking later
>>
>> Signed-off-by: Chunming Zhou 
>> Reviewed-by: Monk Liu 
>> Reviewed-by: Christian König 
>> Signed-off-by: Alex Deucher 
>>
>> Currently looking into how we could somehow improve this detection.
>
> I guess this could fit for Gabriel, but e.g.
> https://bugs.freedesktop.org/104437 says amdgpu was already broken with
> SME in 4.15, if not 4.14 (I suspect there was simply no SME support
> earlier).

I got strange performance issues with 4.15 and 4.16, but SME was ON
on that setup (even before it hit mainline) and it never broke the GPU like this.

Here is a 4.16.13 boot dmesg which has no such issue:

http://ftp.frugalware.org/pub/other/people/crazy/radeon/dmesg-radeon-SME-ON-kernel-4.16.txt

With the setup as is, booting 4.16.x works, while 4.17 throws the errors.

>
>
> --
> Earthling Michel Dänzer   |   http://www.amd.com
> Libre software enthusiast | Mesa and X developer


Re: Kernel and ADM hardware roulette ( was AMD graphics performance regression in 4.15 and later )

2018-06-07 Thread Gabriel C
2018-06-06 14:19 GMT+02:00 Christian König :
> Am 06.06.2018 um 14:08 schrieb Gabriel C:
>>
>> 2018-06-06 13:33 GMT+02:00 Christian König :
>>>
>>> Am 06.06.2018 um 13:28 schrieb Gabriel C:
>>>>
>>>> 2018-04-11 7:02 GMT+02:00 Gabriel C :
>>>>>>
>>>>>> 2018-04-11 6:00 GMT+02:00 Gabriel C :
>>>>>> 2018-04-09 11:42 GMT+02:00 Christian König
>>>>>> :
>>>>>>>
>>>>>>> Am 07.04.2018 um 00:00 schrieb Jean-Marc Valin:
>>>>>
>>>>> ...
>>>>>>
>>>>>> I can help test code for 4.17/++ if you wish, but that is a
>>>>>> *different* story.
>>>>>>
>>>>> I quickly tested 4.16.0-11490-gb284d4d5a678; the amdgpu and radeon drivers
>>>>> are broken in this one.
>>>>>
>>>>> radeon tells:
>>>>>
>>>>> ...
>>>>>
>>>>> [6.337838] [drm] PCIE GART of 2048M enabled (table at
>>>>> 0x001D6000).
>>>>> [6.338210] radeon :21:00.0: (-12) create WB bo failed
>>>>> [6.338214] radeon :21:00.0: disabling GPU acceleration
>>>>>
>>>>> ...
>>>>>
>>>> I have the same issue now on the final 4.17.
>>>
>>>
>>> Actually Michel came up with a fix for the performance regression which
>>> is now backported to older kernels as well.
>>>
>>> So the original issue of this mail thread should be fixed by now.
>>
>> OK, I will test as soon as I get the GPU to work :))
>>
>>>> I also played with BIOS options, which does not fix anything but
>>>> changes the error message.
>>>>
>>>> With IOMMU and SR-IOV disabled, the error changes to this:
>>>>
>>>> [7.092044] [drm:r600_ring_test [radeon]] *ERROR* radeon: ring 0 test failed (scratch(0x850C)=0xCAFEDEAD)
>>>> [7.092059] radeon :21:00.0: disabling GPU acceleration
>>>>
>>>>
>>>> While I could work around the SWIOTLB bugs in 4.15 and 4.16, 4.17 seems to
>>>> kill the GPU with no way for me to make it work (at least I have not
>>>> found any workaround so far).
>>>
>>>
>>> That actually sounds like something completely different. Can you provide
>>> a full dmesg of radeon and/or amdgpu?
>>
>> Sure, here it is from boot with IOMMU/SR-IOV ON/OFF in the BIOS:
>>
>>
>> http://ftp.frugalware.org/pub/other/people/crazy/radeon/dmesg-iommu-sr-iov-off.txt
>>
>> http://ftp.frugalware.org/pub/other/people/crazy/radeon/dmesg-iommu-sr-iov-on.txt
>>
>> Also, nothing else changed in that setup; I was just testing kernel 4.17.
>
>
> That has nothing to do with the driver nor the original bug you reported. The
> problem is that SME is active and that is currently not supported at all
> with that hardware.

OK, so are we now playing kernel and AMD hardware roulette on each release?

SME was set up the same way with kernel 4.16.x here and everything worked.

Also, if you no longer support SME at all on hardware where it worked before,
please add proper error handling and proper dmesg messages
letting the user know, e.g.:

radeon:  : SME is not supported on this hardware anymore, please
disable SME...
radeon: : Update your GPU < or whatever >

How hard would that be?

No one but developers can guess from these error messages why their
hardware suddenly stopped working just because they updated the kernel.
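A rough sketch of the kind of early check being asked for, modelled in plain C so it compiles on its own. The inputs `sme_active` and `gpu_dma_ok_with_sme` and the function `check_sme_compat()` are hypothetical: in the kernel the first would come from the memory-encryption helpers and the second from the driver's own knowledge of its DMA path, and the message would go through `dev_err()`. The advice in the message uses the real `mem_encrypt=off` kernel parameter.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical probe-time check: refuse early, with a message a user can act
 * on, instead of failing later with an opaque "(-12) create WB bo failed".
 * Returns 0 if probing may continue, -ENODEV otherwise.
 */
int check_sme_compat(const char *drv, bool sme_active, bool gpu_dma_ok_with_sme)
{
    assert(drv != NULL);
    if (sme_active && !gpu_dma_ok_with_sme) {
        fprintf(stderr,
                "%s: SME is active but not supported with this GPU; "
                "disable SME in the BIOS or boot with mem_encrypt=off\n",
                drv);
        return -ENODEV;
    }
    return 0;
}
```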


>
> Try to disable SME either in the BIOS or on the kernel command line.

Yes, that works, but that is not the point.

Really, you just can't break users' setups like this.


Re: Kernel and ADM hardware roulette ( was AMD graphics performance regression in 4.15 and later )

2018-06-07 Thread Gabriel C
2018-06-06 16:44 GMT+02:00 Christian König :
> Am 06.06.2018 um 16:12 schrieb Michel Dänzer:
>>
>> On 2018-06-06 03:33 PM, Gabriel C wrote:
>>>
>>> 2018-06-06 14:19 GMT+02:00 Christian König :
>>>>
>>>> Am 06.06.2018 um 14:08 schrieb Gabriel C:
>>>>>
>>>>> 2018-06-06 13:33 GMT+02:00 Christian König :
>>>>>>
>>>>>> Am 06.06.2018 um 13:28 schrieb Gabriel C:
>>>>>>>
>>>>>>> 2018-04-11 7:02 GMT+02:00 Gabriel C :
>>>>>>>>
>>>>>>>>
>>>>>>>> [6.337838] [drm] PCIE GART of 2048M enabled (table at
>>>>>>>> 0x001D6000).
>>>>>>>> [6.338210] radeon :21:00.0: (-12) create WB bo failed
>>>>>>>> [6.338214] radeon :21:00.0: disabling GPU acceleration
>>>>>>>>
>>>>>>>> ...
>>>>>>>>
>>>>>>> I have the same issue now on the final 4.17.
>>
>>
>> Please file a bug report, and ideally bisect which commit(s) introduced
>> the issue(s).
>>
>>
>>>>>
>>>>> http://ftp.frugalware.org/pub/other/people/crazy/radeon/dmesg-iommu-sr-iov-off.txt
>>>>>
>>>>>
>>>>> http://ftp.frugalware.org/pub/other/people/crazy/radeon/dmesg-iommu-sr-iov-on.txt
>>>>>
>>>>> Also, nothing else changed in that setup; I was just testing kernel 4.17.
>>>>
>>>>
>>>>
>>>> That has nothing to do with the driver nor the original bug you reported.
>>>> The problem is that SME is active and that is currently not supported at
>>>> all with that hardware.
>>>
>>>
>>> OK, so are we now playing kernel and AMD hardware roulette on each
>>> release?
>>>
>>> SME was set up the same way with kernel 4.16.x here and everything worked.
>>
>>
>> If that is true, again please bisect which commit broke it.
>>
>> All the reports I've seen before this indicated that at least amdgpu has
>> never worked with SME (which BTW doesn't mean it's never going to work or
>> that we don't want to support it, just that as far as we know it's currently
>> not working).
>
>
> At least in theory it should work when we use the coherent DMA allocator.
>
> When that really worked before, so the most likely commit which broke this
> is:
>
> commit fd5fd480dd8fe4910546e7b080b3ae345e57fe9f
> Author: Chunming Zhou 
> Date:   Fri Feb 9 10:44:09 2018 +0800
>
> drm/amdgpu: only enable swiotlb alloc when need v2
>
> get the max io mapping address of system memory to see if it is over
> our card accessing range.
> v2: move checking later
>
> Signed-off-by: Chunming Zhou 
> Reviewed-by: Monk Liu 
> Reviewed-by: Christian König 
> Signed-off-by: Alex Deucher 
>
> Currently looking into how we could somehow improve this detection.

It is not this one; I've built a kernel with it reverted.

I'll do a bisect tonight or tomorrow.
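`git bisect` does a binary search over the commits between a known-good kernel (4.16.x here) and a known-bad one (4.17). The sketch below models only that search, over an array of already-known test results; it assumes a single good-to-bad transition, as bisecting does, and ignores merges and skipped commits. `first_bad_commit()` is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Find the first "bad" entry in a linear history where every commit before
 * the regression is good and every commit from it onward is bad.
 * is_bad[0] must be good and is_bad[n - 1] bad, mirroring
 * "git bisect good v4.16" / "git bisect bad v4.17".
 * Returns the index of the first bad commit.
 */
size_t first_bad_commit(const bool *is_bad, size_t n)
{
    assert(n >= 2 && !is_bad[0] && is_bad[n - 1]);
    size_t good = 0;     /* invariant: is_bad[good] is false */
    size_t bad = n - 1;  /* invariant: is_bad[bad] is true */

    while (bad - good > 1) {
        size_t mid = good + (bad - good) / 2;  /* "build and boot this one" */
        if (is_bad[mid])
            bad = mid;
        else
            good = mid;
    }
    return bad;
}
```

Each loop iteration corresponds to building and booting one kernel, so about log2(N) test boots are enough for N candidate commits.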

>
> Regards,
> Christian.


Re: AMD graphics performance regression in 4.15 and later

2018-06-07 Thread Gabriel C
2018-06-06 13:33 GMT+02:00 Christian König :
> Am 06.06.2018 um 13:28 schrieb Gabriel C:
>>
>> 2018-04-11 7:02 GMT+02:00 Gabriel C :
>>>>
>>>> 2018-04-11 6:00 GMT+02:00 Gabriel C :
>>>> 2018-04-09 11:42 GMT+02:00 Christian König
>>>> :
>>>>>
>>>>> Am 07.04.2018 um 00:00 schrieb Jean-Marc Valin:
>>>
>>> ...
>>>>
>>>> I can help test code for 4.17/++ if you wish, but that is a *different*
>>>> story.
>>>>
>>> I quickly tested 4.16.0-11490-gb284d4d5a678; the amdgpu and radeon drivers
>>> are broken in this one.
>>>
>>> radeon tells:
>>>
>>> ...
>>>
>>> [6.337838] [drm] PCIE GART of 2048M enabled (table at
>>> 0x001D6000).
>>> [6.338210] radeon :21:00.0: (-12) create WB bo failed
>>> [6.338214] radeon :21:00.0: disabling GPU acceleration
>>>
>>> ...
>>>
>> I have the same issue now on the final 4.17.
>
>
> Actually Michel came up with a fix for the performance regression which is
> now backported to older kernels as well.
>
> So the original issue of this mail thread should be fixed by now.

OK, I will test as soon as I get the GPU to work :))

>
>> I also played with BIOS options, which does not fix anything but
>> changes the error message.
>>
>> With IOMMU and SR-IOV disabled, the error changes to this:
>>
>> [7.092044] [drm:r600_ring_test [radeon]] *ERROR* radeon: ring 0 test failed (scratch(0x850C)=0xCAFEDEAD)
>> [7.092059] radeon :21:00.0: disabling GPU acceleration
>>
>>
>> While I could work around the SWIOTLB bugs in 4.15 and 4.16, 4.17 seems to
>> kill the GPU with no way for me to make it work (at least I have not found
>> any workaround so far).
>
>
> That actually sounds like something completely different. Can you provide a
> full dmesg of radeon and/or amdgpu?

Sure, here it is from boot with IOMMU/SR-IOV ON/OFF in the BIOS:

http://ftp.frugalware.org/pub/other/people/crazy/radeon/dmesg-iommu-sr-iov-off.txt
http://ftp.frugalware.org/pub/other/people/crazy/radeon/dmesg-iommu-sr-iov-on.txt

Also, nothing else changed in that setup; I was just testing kernel 4.17.

I can force the GPU to use amdgpu if you wish and post dmesgs from that too.
Just let me know.


Re: AMD graphics performance regression in 4.15 and later

2018-04-12 Thread Gabriel C
2018-04-12 0:20 GMT+02:00 Gabriel C <nix.or@gmail.com>:
> 2018-04-11 20:35 GMT+02:00 Jean-Marc Valin <jmva...@mozilla.com>:
>> On 04/11/2018 05:37 AM, Christian König wrote:
>>>> With your patches my EPYC box is unusable with 4.15++ kernels.
>>>> The whole desktop is acting weird. This one is using
>>>> a Cape Verde PRO [Radeon HD 7750/8740 / R7 250E] GPU.
>>>>
>>>> The box is 2 * EPYC 7281 with 128 GB ECC RAM.
>>>>
>>>> Also, a 14C Xeon box with an HD7700 is broken the same way.
>>>
>>> The hardware is irrelevant for this. We need to know what software stack
>>> you use on top of it.
>>
>> Well, the hardware appears to be part of the issue too. I don't think
>> it's a coincidence that Gabriel has the problem on 2xEPYC, I have it on
>> 2xXeon and the previous reporter had it on a Core 2 Quad that internally
>> has two dies.
>>
>> I've not tested your disable CONFIG_SWIOTLB fix yet -- might try it
>> over the weekend and report what happens.
>>
>
> To get that right: is it only a matter of disabling the SWIOTLB *code*
> while CONFIG_SWIOTLB is still set?

OK, I tested that on 4.16.1 and yes, it does work. However, I didn't like the
#if 0 method, since it means compiling a kernel twice just to compare and test.

I created a small patch that adds a swiotlb option to amdgpu and radeon,
so I can boot and compare/test with and without the SWIOTLB code.

(not meant for upstream)

http://ftp.frugalware.org/pub/other/people/crazy/0001-Make-it-possible-to-disable-SWIOTLB-code-on-admgpu-a.patch

With the SWIOTLB code off everything works fine, while all hell breaks loose
when it is turned on.

Maybe similar options should be added upstream until the code is more
stable in 4.17/4.18.
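The idea behind such an option, as a minimal sketch with hypothetical names (this is not the contents of the linked patch): `#ifdef CONFIG_SWIOTLB` only says whether SWIOTLB support is compiled in, so a separate boot-time parameter can decide whether the driver actually takes that path. Here `-1` means auto, i.e. keep the driver's own detection.

```c
#include <assert.h>
#include <stdbool.h>

/* Pretend the kernel was built with SWIOTLB support. */
#define CONFIG_SWIOTLB 1

/*
 * Decide whether to use the SWIOTLB allocation path.
 * param is a hypothetical boot-time option: -1 = auto, 0 = off, 1 = on.
 * The compile-time option only makes the path available; param picks
 * whether it is used on this boot.
 */
bool use_swiotlb_path(int param, bool driver_detects_need)
{
    assert(param >= -1 && param <= 1);
#ifdef CONFIG_SWIOTLB
    if (param == 0)
        return false;            /* forced off */
    if (param == 1)
        return true;             /* forced on */
    return driver_detects_need;  /* auto: keep the driver's detection */
#else
    (void)param;
    (void)driver_detects_need;
    return false;                /* path not compiled in at all */
#endif
}
```

Booting with the option set to 0 then gives the same result as the `#if 0` edit, without rebuilding the kernel.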

Regards


Re: AMD graphics performance regression in 4.15 and later

2018-04-12 Thread Gabriel C
2018-04-11 11:37 GMT+02:00 Christian König <christian.koe...@amd.com>:
> Am 11.04.2018 um 06:00 schrieb Gabriel C:
>>
>> 2018-04-09 11:42 GMT+02:00 Christian König
>> <ckoenig.leichtzumer...@gmail.com>:
>>>
>>> Am 07.04.2018 um 00:00 schrieb Jean-Marc Valin:
>>>>
>>>> Hi Christian,
>>>>
>>>> Thanks for the info. FYI, I've also opened a Firefox bug for that at:
>>>> https://bugzilla.mozilla.org/show_bug.cgi?id=1448778
>>>> Feel free to comment since you have a better understanding of what's
>>>> going on.
>>>>
>>>> One last question: right now I'm running 4.15.0 with the "offending"
>>>> patch reverted. Is that safe to run or are there possible bad
>>>> interactions with other changes.
>>>
>>>
>>> That should work without problems.
>>>
>>> But I just had another idea as well, if you want you could still test the
>>> new code path which will be used in 4.17.
>>>
>> While Firefox may do some strange things, this is not only about Firefox.
>>
>> With your patches my EPYC box is unusable with 4.15++ kernels.
>> The whole desktop is acting weird. This one is using
>> a Cape Verde PRO [Radeon HD 7750/8740 / R7 250E] GPU.
>>
>> The box is 2 * EPYC 7281 with 128 GB ECC RAM.
>>
>> Also, a 14C Xeon box with an HD7700 is broken the same way.
>
>
> The hardware is irrelevant for this. We need to know what software stack you
> use on top of it.
>
> E.g. desktop environment/Mesa and DDX version etc...

Plasma 5.12.4 compiled with Frameworks 5.44.0, Qt5 5.10.1
Mesa 18.0.0 (and Mesa 17.3.7 on the other box)
Xorg 1.19.6
xf86-video-amdgpu and xf86-video-ati, both 18.0.1

>
>>
>> Everything breaks in X: scrolling, moving windows, flickering, etc.
>>
>>
>> Reverting f4c809914a7c3e4a59cf543da6c2a15d0f75ee38 and
>> 648bc3574716400acc06f99915815f80d9563783
>> from a 4.15 kernel makes things work again.
>>
>>
>>> Backporting all the detection logic is too invasive, but you could just go
>>> into drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c and forcefully use the other
>>> code path.
>>>
>>> Just look out for "#ifdef CONFIG_SWIOTLB" checks and disable those.
>>>
>> Well, you can't really be serious about these suggestions, can you?
>>
>> Telling people to #if 0 random code is not a solution.
>
>
> That is for testing and not a permanent solution.
>
>> You broke existing working userland with your patches, so at least
>> please fix that for 4.16.
>>
>> I can help test code for 4.17/++ if you wish, but that is a *different*
>> story.
>
>
> Please test Alex's amd-staging-drm-next branch from
> git://people.freedesktop.org/~agd5f/linux.

I'm on it, but the connection to freedesktop.org is slow as hell.
It will take a while to get that branch at 62KiB/s :)

Regards


Re: AMD graphics performance regression in 4.15 and later

2018-04-12 Thread Gabriel C
2018-04-11 20:35 GMT+02:00 Jean-Marc Valin :
> On 04/11/2018 05:37 AM, Christian König wrote:
>>> With your patches my EPYC box is unusable with 4.15++ kernels.
>>> The whole desktop is acting weird. This one is using
>>> a Cape Verde PRO [Radeon HD 7750/8740 / R7 250E] GPU.
>>>
>>> The box is 2 * EPYC 7281 with 128 GB ECC RAM.
>>>
>>> Also, a 14C Xeon box with an HD7700 is broken the same way.
>>
>> The hardware is irrelevant for this. We need to know what software stack
>> you use on top of it.
>
> Well, the hardware appears to be part of the issue too. I don't think
> it's a coincidence that Gabriel has the problem on 2xEPYC, I have it on
> 2xXeon and the previous reporter had it on a Core 2 Quad that internally
> has two dies.
>
> I've not tested your disable CONFIG_SWIOTLB fix yet -- might try it
> over the weekend and report what happens.
>

To get that right: is it only a matter of disabling the SWIOTLB *code*
while CONFIG_SWIOTLB is still set?


Re: AMD graphics performance regression in 4.15 and later

2018-04-12 Thread Gabriel C
2018-04-11 16:26 GMT+02:00 Gabriel C <nix.or@gmail.com>:
> 2018-04-11 11:37 GMT+02:00 Christian König <christian.koe...@amd.com>:
>> Am 11.04.2018 um 06:00 schrieb Gabriel C:

...
>>
>> Please test Alex's amd-staging-drm-next branch from
>> git://people.freedesktop.org/~agd5f/linux.
>
> I'm on it, but the connection to freedesktop.org is slow as hell.
> It will take a while to get that branch at 62KiB/s :)
>

I tested that branch at commit 24110c70630998dc83da23cae910a9538f54ef64.

With the default Plasma OpenGL 2.0 profile, things are still laggy but a lot better.
With OpenGL 3.1, things work much better, with just minor glitches when
maximizing/minimizing windows.

Firefox is still broken: frame drops, video stops, etc.
chromium-browser works fine
Otter Browser does not work at all
QupZilla/Falkon has Firefox-like issues too

Something I noticed while testing Firefox or QupZilla:
once these start acting up, it affects the whole desktop;
for some seconds scrolling lags, the mouse is slow, etc.
Once they are closed, the desktop starts working again after a few seconds.


Do you want me to test any Mesa/xorg-server/driver git branches too?

If so, just let me know.

Regards


Re: AMD graphics performance regression in 4.15 and later

2018-04-11 Thread Gabriel C
2018-04-09 11:42 GMT+02:00 Christian König <ckoenig.leichtzumer...@gmail.com>:
> Am 07.04.2018 um 00:00 schrieb Jean-Marc Valin:
>>
>> Hi Christian,
>>
>> Thanks for the info. FYI, I've also opened a Firefox bug for that at:
>> https://bugzilla.mozilla.org/show_bug.cgi?id=1448778
>> Feel free to comment since you have a better understanding of what's
>> going on.
>>
>> One last question: right now I'm running 4.15.0 with the "offending"
>> patch reverted. Is that safe to run or are there possible bad
>> interactions with other changes.
>
>
> That should work without problems.
>
> But I just had another idea as well, if you want you could still test the
> new code path which will be used in 4.17.
>

While Firefox may do some strange things, this is not only about Firefox.

With your patches my EPYC box is unusable with 4.15++ kernels.
The whole desktop is acting weird. This one is using
a Cape Verde PRO [Radeon HD 7750/8740 / R7 250E] GPU.

The box is 2 * EPYC 7281 with 128 GB ECC RAM.

Also, a 14C Xeon box with an HD7700 is broken the same way.

Everything breaks in X: scrolling, moving windows, flickering, etc.


Reverting f4c809914a7c3e4a59cf543da6c2a15d0f75ee38 and
648bc3574716400acc06f99915815f80d9563783
from a 4.15 kernel makes things work again.


> Backporting all the detection logic is too invasive, but you could just go
> into drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c and forcefully use the other
> code path.
>
> Just look out for "#ifdef CONFIG_SWIOTLB" checks and disable those.
>

Well, you can't really be serious about these suggestions, can you?

Telling people to #if 0 random code is not a solution.

You broke existing working userland with your patches, so at least
please fix that for 4.16.

I can help test code for 4.17/++ if you wish, but that is a *different* story.

Regards,

Gabriel C


Re: AMD graphics performance regression in 4.15 and later

2018-04-11 Thread Gabriel C
>2018-04-11 6:00 GMT+02:00 Gabriel C <nix.or@gmail.com>:
> 2018-04-09 11:42 GMT+02:00 Christian König <ckoenig.leichtzumer...@gmail.com>:
>> Am 07.04.2018 um 00:00 schrieb Jean-Marc Valin:
...
> I can help test code for 4.17/++ if you wish, but that is a *different*
> story.
>

I quickly tested 4.16.0-11490-gb284d4d5a678; the amdgpu and radeon drivers
are broken in this one.

radeon tells:

...

[6.337838] [drm] PCIE GART of 2048M enabled (table at 0x001D6000).
[6.338210] radeon :21:00.0: (-12) create WB bo failed
[6.338214] radeon :21:00.0: disabling GPU acceleration

...

And there is no way to start X: flickering and hangs.
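The `(-12)` in the radeon messages above is a negative errno value, the usual way kernel functions report errors; on Linux errno 12 is `ENOMEM`, which is consistent with the `-ENOMEM` from `dma_alloc_attrs()` that Christian later pointed to. A tiny userspace helper for decoding such values when reading logs, with hypothetical names `kernel_err_name()` and `print_kernel_err()`:

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Kernel functions report failure as a negative errno value, so "(-12)" in
 * a log line is -ENOMEM on Linux. Map a few such values back to their
 * symbolic names.
 */
const char *kernel_err_name(int ret)
{
    assert(ret < 0);
    switch (-ret) {
    case ENOMEM: return "ENOMEM";
    case EINVAL: return "EINVAL";
    case ENODEV: return "ENODEV";
    default:     return "unknown";
    }
}

/* Print the value, its symbolic name and the C library's description of it. */
void print_kernel_err(int ret)
{
    printf("%d = -%s (%s)\n", ret, kernel_err_name(ret), strerror(-ret));
}
```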

amdgpu hits a bug:

http://ftp.frugalware.org/pub/other/people/crazy/trace.txt


Do you have a git tree I can test from?

Also, if you need full logs or any other info, just let me know.

Regards