Re: Re: Re: Re: Reply: Re: [PATCH] drm/amdgpu: fixup bad vram size on gmc v8

2021-12-21 Thread Alex Deucher
Yes, you can either do that, or if amdgpu is loaded, just read the data
from /sys/kernel/debug/dri/0/amdgpu_vbios
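
For anyone following along, the debugfs route can be sketched as a small
shell helper. This is only a sketch under assumptions: debugfs is mounted at
/sys/kernel/debug, the card is DRM minor 0, and you are root. `dump_vbios` is
a made-up helper name, not an existing tool; the only check it does is the
standard 0x55 0xAA option ROM signature at the start of the image.

```sh
#!/bin/sh
# dump_vbios SRC OUT: copy a VBIOS image and sanity-check it.
# dump_vbios is a hypothetical helper name used only for this sketch.
dump_vbios() {
    src=$1
    out=$2

    # Copy the image; debugfs files are normally readable by root only.
    cat "$src" > "$out" || return 1

    # A PCI option ROM image starts with the two bytes 0x55 0xAA.
    sig=$(od -An -tx1 -N2 "$out" | tr -d ' \n')
    if [ "$sig" != "55aa" ]; then
        echo "unexpected ROM signature: $sig" >&2
        return 1
    fi

    echo "wrote $out ($(wc -c < "$out" | tr -d ' ') bytes)"
}

# Usage, as root, with amdgpu loaded:
#   dump_vbios /sys/kernel/debug/dri/0/amdgpu_vbios vbios.rom
```

The source path is a parameter so the same helper can be pointed at a
different DRM minor if the board is not card 0.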

Alex


On Mon, Dec 20, 2021 at 3:06 AM 周宗敏  wrote:

>
> Dear Alex,
>
> I've never extracted a VBIOS before, so could you tell me how to get a
> copy of the vbios image for you?
>
> I searched online and only found that it might be possible this way:
>
> echo 1 > /sys/devices/pci:00/:00:02.0/rom
> cat /sys/devices/pci:00/:00:02.0/rom > vbios.dump
> echo 0 > /sys/devices/pci:00/:00:02.0/rom
>
> Is that right?
>
> Thanks very much.
>
> ----
>
> *Subject:* Re: Re: Re: Reply: Re: [PATCH] drm/amdgpu: fixup bad vram size on gmc
> v8
> *Date:* 2021-12-18 05:19
> *From:* Alex Deucher
> *To:* 周宗敏
>
>
> If you could get me a copy of the vbios image from a problematic board,
> that would be helpful.  In the meantime, I've applied the patch.
>
> Alex
>
>
> On Thu, Dec 16, 2021 at 9:38 PM 周宗敏  wrote:
>
>> Dear Alex,
>>
>> > Is the issue reproducible with the same board in bare metal on x86? Or
>> > does it only happen with passthrough on ARM?
>>
>> Unfortunately, my current environment makes it inconvenient to test this
>> GPU board on an x86 platform, but I can tell you that the problem still
>> occurs on ARM without passthrough to a virtual machine.
>>
>> In addition, at the end of 2020 my colleagues found similar problems on
>> MIPS platforms with Radeon R7 340 graphics chips.
>>
>> So I think it can happen regardless of whether the platform is x86, ARM
>> or MIPS.
>>
>> I hope the above information is helpful to you. I also think it would be
>> better for users if this issue can be root caused.
>>
>> Best regards.
>>
>> *Subject:* Re: Re: Reply: Re: [PATCH] drm/amdgpu: fixup bad vram size on gmc v8
>> *Date:* 2021-12-16 23:28
>> *From:* Alex Deucher
>> *To:* 周宗敏
>>
>>
>> Is the issue reproducible with the same board in bare metal on x86?  Or
>> does it only happen with passthrough on ARM?  Looking through the archives,
>> the SI patch I made was for an x86 laptop.  It would be nice to root cause
>> this, but there weren't any gfx8 boards with more than 64G of vram, so I
>> think it's safe.  That said, if you see similar issues with newer gfx IPs
>> then we have an issue since the upper bit will be meaningful, so it would
>> be nice to root cause this.
>>
>> Alex
>>
>>
>> On Thu, Dec 16, 2021 at 4:36 AM 周宗敏  wrote:
>>
>>> Hi Christian,
>>>
>>> I'm testing the GPU passthrough feature, so I passed this GPU through to
>>> a virtual machine. It is based on an arm64 system.
>>>
>>> As far as I know, Alex dealt with a similar problem in
>>> drivers/gpu/drm/radeon/si.c. Maybe they have the same cause?
>>>
>>> The earlier commit is here:
>>>
>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0ca223b029a261e82fb2f50c52eb85d510f4260e
>>>
>>> [image: image.png]
>>>
>>> Thanks very much.
>>>
>>> *Subject:* Re: Reply: Re: [PATCH] drm/amdgpu: fixup bad vram size on gmc v8
>>> *Date:* 2021-12-16 16:15
>>> *From:* Christian König
>>> *To:* 周宗敏, Alex Deucher
>>>
>>> Hi Zongmin,
>>>
>>> that strongly sounds like the ASIC is not correctly initialized when
>>> trying to read the register.
>>>
>>> What board and environment are you using this GPU with? Is that a
>>> normal x86 system?
>>>
>>> Regards,
>>> Christian.
>>>
>>> On 16.12.21 at 04:11, 周宗敏 wrote:
>>>
>>> 1. The problematic board that I have tested is [AMD/ATI] Lexa PRO
>>>    [Radeon RX 550/550X], with vbios version 113-RXF9310-C09-BT.
>>>
>>> 2. When the problem occurs, I can see the following changes in the
>>>    vram size value read via RREG32(mmCONFIG_MEMSIZE); it seems to
>>>    have garbage in the upper 16 bits.
>>>
>>> [image: ima

Reply: Re: Re: Re: Reply: Re: [PATCH] drm/amdgpu: fixup bad vram size on gmc v8

2021-12-20 Thread 周宗敏
Dear Alex,

I've never extracted a VBIOS before, so could you tell me how to get a
copy of the vbios image for you?

I searched online and only found that it might be possible this way:

echo 1 > /sys/devices/pci:00/:00:02.0/rom
cat /sys/devices/pci:00/:00:02.0/rom > vbios.dump
echo 0 > /sys/devices/pci:00/:00:02.0/rom

Is that right?

Thanks very much.

----
    Subject: Re: Re: Re: Reply: Re: [PATCH] drm/amdgpu: fixup bad vram size on gmc v8
    Date: 2021-12-18 05:19
    From: Alex Deucher
    To: 周宗敏

    If you could get me a copy of the vbios image from a problematic board,
    that would be helpful.  In the meantime, I've applied the patch.

    Alex

    On Thu, Dec 16, 2021 at 9:38 PM 周宗敏 <zhouzong...@kylinos.cn> wrote:

    Dear Alex,

    > Is the issue reproducible with the same board in bare metal on x86? Or
    > does it only happen with passthrough on ARM?

    Unfortunately, my current environment makes it inconvenient to test this
    GPU board on an x86 platform, but I can tell you that the problem still
    occurs on ARM without passthrough to a virtual machine.

    In addition, at the end of 2020 my colleagues found similar problems on
    MIPS platforms with Radeon R7 340 graphics chips.

    So I think it can happen regardless of whether the platform is x86, ARM
    or MIPS.

    I hope the above information is helpful to you. I also think it would be
    better for users if this issue can be root caused.

    Best regards.

    
    Subject: Re: Re: Reply: Re: [PATCH] drm/amdgpu: fixup bad vram size on gmc v8
    Date: 2021-12-16 23:28
    From: Alex Deucher
    To: 周宗敏

    Is the issue reproducible with the same board in bare metal on x86?  Or
    does it only happen with passthrough on ARM?  Looking through the archives,
    the SI patch I made was for an x86 laptop.  It would be nice to root cause
    this, but there weren't any gfx8 boards with more than 64G of vram, so I
    think it's safe.  That said, if you see similar issues with newer gfx IPs
    then we have an issue since the upper bit will be meaningful, so it would
    be nice to root cause this.

    Alex

    On Thu, Dec 16, 2021 at 4:36 AM 周宗敏 <zhouzong...@kylinos.cn> wrote:

    Hi Christian,

    I'm testing the GPU passthrough feature, so I passed this GPU through to
    a virtual machine. It is based on an arm64 system.

    As far as I know, Alex dealt with a similar problem in
    drivers/gpu/drm/radeon/si.c. Maybe they have the same cause?

    The earlier commit is here:

    https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0ca223b029a261e82fb2f50c52eb85d510f4260e

    Thanks very much.

    Subject: Re: Reply: Re: [PATCH] drm/amdgpu: fixup bad vram size on gmc v8
    Date: 2021-12-16 16:15
    From: Christian König
    To: 周宗敏, Alex Deucher

    Hi Zongmin,

    that strongly sounds like the ASIC is not correctly initialized when
    trying to read the register.

    What board and environment are you using this GPU with? Is that a
    normal x86 system?

    Regards,
    Christian.

    On 16.12.21 at 04:11, 周宗敏 wrote:

    1. The problematic board that I have tested is [AMD/ATI] Lexa PRO
       [Radeon RX 550/550X], with vbios version 113-RXF9310-C09-BT.

    2. When the problem occurs, I can see the following changes in the
       vram size value read via RREG32(mmCONFIG_MEMSIZE); it seems to
       have garbage in the upper 16 bits.

    3. I can also see dmesg output like the following. When the vram size
       register has garbage, we may see an error message like this:

       amdgpu :09:00.0: VRAM: 4286582784M 0x00F4 - 0x000FF8F4 (4286582784M used)

       The correct message should look like this:

       amdgpu :09:00.0: VRAM: 4096M 0x00F4 - 0x00F4 (4096M used)

    If you have any questions, please send me mail.

    Thanks very much.

    Subject: Re: [PATCH] drm/amdgpu: fixup bad vram size on gmc v8
    Date: 2021-12-16 04:23
    From: Alex Deucher
    To: Zongmin Zhou

    On Wed, Dec 15, 2021 at 10:31 AM Zongmin Zhou wrote:
    >
    > Some boards (like RX550) seem to have garbage in the upper
    > 16 bits of the vram size register.  Check for
    > this and clamp the size properly.  Fixes
    > boards reporting bogus amounts of vram.
    >
    > After adding this patch, the maximum GPU VRAM size is 64GB;
    > otherwise only 64GB of vram will be used.

    Can you provide some examples of problematic boards and possibly a
    vbios image from the problematic board?  What values are you seeing?
    It would be nice to see what the boards are reporting and whether the
    lower 16 bits are actually correct or i

Re: Re: Re: Reply: Re: [PATCH] drm/amdgpu: fixup bad vram size on gmc v8

2021-12-17 Thread Alex Deucher
If you could get me a copy of the vbios image from a problematic board,
that would be helpful.  In the meantime, I've applied the patch.
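
To make the clamp concrete with the numbers reported in this thread: assuming
the size register counts MB (which the dmesg values suggest), the bogus
4286582784M is 0xFF801000, and its lower 16 bits are 0x1000 = 4096, the
correct size. The sketch below just redoes that arithmetic in shell;
`clamp_vram_mb` is a hypothetical name and this is not the driver code from
the patch. It also shows why 16 bits of MB tops out just under 64G.

```sh
#!/bin/sh
# clamp_vram_mb REG: print the vram size in MB after discarding garbage
# in the upper 16 bits of a 32-bit register value.
# clamp_vram_mb is a hypothetical name; this is not the driver code.
clamp_vram_mb() {
    reg=$(( $1 ))
    if [ $(( reg & 0xffff0000 )) -ne 0 ]; then
        # Upper 16 bits set: treat them as garbage, keep the lower 16.
        reg=$(( reg & 0xffff ))
    fi
    echo "$reg"
}

clamp_vram_mb 4286582784   # bogus size from the reported dmesg; prints 4096
clamp_vram_mb 4096         # a sane value passes through; prints 4096
clamp_vram_mb 0xffffffff   # largest possible result; prints 65535
```

A mask like this can only be safe while no supported board has 64G or more
of vram, which is the caveat raised for newer gfx IPs.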

Alex


On Thu, Dec 16, 2021 at 9:38 PM 周宗敏  wrote:

> Dear Alex,
>
> > Is the issue reproducible with the same board in bare metal on x86? Or
> > does it only happen with passthrough on ARM?
>
> Unfortunately, my current environment makes it inconvenient to test this
> GPU board on an x86 platform, but I can tell you that the problem still
> occurs on ARM without passthrough to a virtual machine.
>
> In addition, at the end of 2020 my colleagues found similar problems on
> MIPS platforms with Radeon R7 340 graphics chips.
>
> So I think it can happen regardless of whether the platform is x86, ARM
> or MIPS.
>
> I hope the above information is helpful to you. I also think it would be
> better for users if this issue can be root caused.
>
> Best regards.
>
> *Subject:* Re: Re: Reply: Re: [PATCH] drm/amdgpu: fixup bad vram size on gmc v8
> *Date:* 2021-12-16 23:28
> *From:* Alex Deucher
> *To:* 周宗敏
>
>
> Is the issue reproducible with the same board in bare metal on x86?  Or
> does it only happen with passthrough on ARM?  Looking through the archives,
> the SI patch I made was for an x86 laptop.  It would be nice to root
> cause this, but there weren't any gfx8 boards with more than 64G of vram,
> so I think it's safe.  That said, if you see similar issues with newer gfx
> IPs then we have an issue since the upper bit will be meaningful, so it
> would be nice to root cause this.
>
> Alex
>
>
> On Thu, Dec 16, 2021 at 4:36 AM 周宗敏  wrote:
>
>> Hi Christian,
>>
>> I'm testing the GPU passthrough feature, so I passed this GPU through to
>> a virtual machine. It is based on an arm64 system.
>>
>> As far as I know, Alex dealt with a similar problem in
>> drivers/gpu/drm/radeon/si.c. Maybe they have the same cause?
>>
>> The earlier commit is here:
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0ca223b029a261e82fb2f50c52eb85d510f4260e
>>
>> [image: image.png]
>>
>> Thanks very much.
>>
>> *Subject:* Re: Reply: Re: [PATCH] drm/amdgpu: fixup bad vram size on gmc v8
>> *Date:* 2021-12-16 16:15
>> *From:* Christian König
>> *To:* 周宗敏, Alex Deucher
>>
>> Hi Zongmin,
>>
>> that strongly sounds like the ASIC is not correctly initialized when
>> trying to read the register.
>>
>> What board and environment are you using this GPU with? Is that a
>> normal x86 system?
>>
>> Regards,
>> Christian.
>>
>> On 16.12.21 at 04:11, 周宗敏 wrote:
>>
>> 1. The problematic board that I have tested is [AMD/ATI] Lexa PRO
>>    [Radeon RX 550/550X], with vbios version 113-RXF9310-C09-BT.
>>
>> 2. When the problem occurs, I can see the following changes in the
>>    vram size value read via RREG32(mmCONFIG_MEMSIZE); it seems to
>>    have garbage in the upper 16 bits.
>>
>>    [image: image.png]
>>
>> 3. I can also see dmesg output like the following. When the vram size
>>    register has garbage, we may see an error message like this:
>>
>>    amdgpu :09:00.0: VRAM: 4286582784M 0x00F4 - 0x000FF8F4 (4286582784M used)
>>
>>    The correct message should look like this:
>>
>>    amdgpu :09:00.0: VRAM: 4096M 0x00F4 - 0x00F4 (4096M used)
>>
>> If you have any questions, please send me mail.
>>
>> Thanks very much.
>>
>> *Subject:* Re: [PATCH] drm/amdgpu: fixup bad vram size on gmc v8
>> *Date:* 2021-12-16 04:23
>> *From:* Alex Deucher
>> *To:* Zongmin Zhou
>>
>> On Wed, Dec 15, 2021 at 10:31 AM Zongmin Zhou wrote:
>>  >
>>  > Some boards (like RX550) seem to have garbage in the upper
>>  > 16 bits of the vram size register.  Check for
>>  > this and clamp the size properly.  Fixes
>>  > boards reporting bogus amounts of vram.
>>  >
>>  > After adding this patch, the maximum GPU VRAM size is 64GB;
>>  > otherwise only 64GB of vram will be used.
>>
>>  Can you provide some examples of problematic boards and possibly a
>>  vbios image from the problematic board?  What values are you seeing?
>>  It would be nice to see what the boards are reporting and whether the
>>  lower 16 bits are actually correct or if it is some other issue.  This
>>  register is undefined until the asic has been initialized.  The vbios
>>  programs it as part of its asic init sequence (either via vesa/gop or
>>  the OS driver).
>>
>>  Alex
>>
>>
>>  >
>>  > Signed-off-by: Zongmin Zhou
>>  > ---
>>  >  drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c | 13 ++---
>>  >  1 file changed, 10 insertions(+), 3 deletions(-)
>>  >
>>