RE: deprecated register issues

Liu, Monk Thu, 08 Mar 2018 02:04:15 -0800

Yeah, I agree with you that we’d better find all of those registers.

Stitching together the REQ and ACK parts only avoids this issue for the vm flush, 
and we may still hit the issue in other places; I know that.

The frustrating part is: how can we find all of those registers?
On top of that, this CC_RB_BACKEND_DISABLE register is not deprecated 
(confirmed with David M), and the driver does need to read it.
How can we keep this read from breaking the vm hub?

I believe that, as you said, there are a bunch of registers (not really deprecated) 
whose reads would break the vm hub. How can we still read
them when we need to without triggering the world switch issue?

It looks to me like there is no way to do that: even if you have already found all of 
those registers, we still need to access them, so the world switch will still fail 
(or other issues will still occur).
That’s why I think plan B at least has its reason to stand.

Any thoughts?
/Monk

From: Christian König [mailto:ckoenig.leichtzumer...@gmail.com]
Sent: March 8, 2018 17:41
To: Liu, Monk <monk....@amd.com>; Deucher, Alexander 
<alexander.deuc...@amd.com>; Koenig, Christian <christian.koe...@amd.com>; Mao, 
David <david....@amd.com>
Cc: Jin, Jian-Rong <jian-rong....@amd.com>; amd-gfx@lists.freedesktop.org
Subject: Re: deprecated register issues

Hi Monk,


We can avoid this vm flush failure by stitching together the REQ-sending 
and ACK-reading parts; at least for the compute ring this is confirmed.
Well, there are two misunderstandings here.

First of all this solution doesn't really work, it just hides the problem 
because we don't do a world switch in between those two packets any more. And 
while we could change the SDMA, UVD and VCE firmware to do something similar, 
you can't apply this solution to CPU based flushes.

The second issue is that this isn't related to VMHUB flushing at all, it's just 
that VMHUB flushing is the first thing where you notice that something is wrong.

The real problem is that when you access CC_RB_BACKEND_DISABLE and a bunch of 
other registers, the bus on Vega10 sometimes gets a hiccup and drops other reads 
and writes.

So we need to identify those registers and remove all accesses to them, 
otherwise working with the hardware will just be horribly unreliable in general.

Regards,
Christian.

Am 08.03.2018 um 04:05 schrieb Liu, Monk:
Hi Alex,

We can avoid this vm flush failure by stitching together the REQ-sending 
and ACK-reading parts; at least for the compute ring this is confirmed.
I believe it could also be achieved for the SDMA ring (even the UVD/VCE rings).

But @Koenig, Christian insists that stitching 
together the REQ and ACK parts is not a proper way to fix the issue, 
just a workaround, and I cannot argue with that.

What worries me more is: what if, as Alex said, there are more registers that 
behave like this CC_RB_BACKEND_DISABLE?
Since we don’t know their names (too hard to filter them out!), we couldn’t 
remove them all from the SR list.
So I still think we need plan B to handle the above case, a.k.a. using one packet 
for the REQ and ACK job.

/Monk

From: Deucher, Alexander
Sent: March 8, 2018 10:53
To: Liu, Monk <monk....@amd.com>; Koenig, Christian <christian.koe...@amd.com>; 
Mao, David <david....@amd.com>
Cc: amd-gfx@lists.freedesktop.org; Jin, Jian-Rong <jian-rong....@amd.com>
Subject: Re: deprecated register issues


I think there are more than just CC_RB_BACKEND_DISABLE that could cause this 
problem.  IIRC, some entire class of gfx registers could cause it, it just 
happened that this was one of the only ones we readback via mmio.  Also for the 
save and restore list, I think the RLC uses a different interface to read back 
the registers so it may not be affected the same way.



Alex

________________________________
From: Liu, Monk
Sent: Wednesday, March 7, 2018 9:42:41 PM
To: Deucher, Alexander; Koenig, Christian; Mao, David
Cc: amd-gfx@lists.freedesktop.org; Jin, Jian-Rong
Subject: RE: deprecated register issues


Hi guys



According to Christian’s findings, reading this register can make the vm hub fail 
to finish a vm flush request. For example, sdma does a vm flush by first writing 
data to vm_invalidate_req and then reading the result from vm_invalidate_ack, but 
we found that the driver never gets the correct value from vm_invalidate_ack if 
BIF is reading this CC_RB_BACKEND_DISABLE register in the meantime.
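
The REQ/ACK handshake described above can be sketched in plain C against a simulated hub. This is only an illustration, not driver code: `struct fake_vmhub`, `hub_write_req` and `vm_flush` are hypothetical names, and modelling the failure as a single dropped request write is an assumption about how the problem shows up.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the VM hub: one request and one ack register. */
struct fake_vmhub {
    uint32_t invalidate_req;
    uint32_t invalidate_ack;
    bool drop_next_write;   /* assumption: models the bus dropping one write */
};

/* Write the request bit; the simulated hardware acks it immediately. */
static void hub_write_req(struct fake_vmhub *hub, uint32_t vmid_bit)
{
    if (hub->drop_next_write) {
        hub->drop_next_write = false;
        return;                      /* request lost, so no ack will follow */
    }
    hub->invalidate_req = vmid_bit;
    hub->invalidate_ack |= vmid_bit;
}

/*
 * Send the invalidate request for one VMID (0..31), then poll the ack bit.
 * Returns false if the ack never shows up within max_polls reads.
 */
static bool vm_flush(struct fake_vmhub *hub, unsigned vmid, unsigned max_polls)
{
    uint32_t bit = 1u << vmid;

    hub->invalidate_ack &= ~bit;     /* start from a clean ack bit */
    hub_write_req(hub, bit);

    for (unsigned i = 0; i < max_polls; i++) {
        if (hub->invalidate_ack & bit)
            return true;
    }
    return false;
}
```

With `drop_next_write` set, `vm_flush` polls until `max_polls` runs out and returns false, which is the "never gets the correct value" case; a real driver would poll without a bound or with a timeout of its own choosing.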



Now the SR-IOV world switch may run into similar trouble; see the 
save_restore_list below (during world_switch, RLCV saves the current VF’s registers 
according to this list and restores all of them when loading this 
VF back):



uint32 register_restore[] = {
       (uint32)((0x3000 << 18) | mmPA_SC_FIFO_SIZE),   /* SC   */
       0x00000001,
       (uint32)((0x3000 << 18) | mmCC_RB_BACKEND_DISABLE),   /* SC SC PER_SE  */
       0x00000000,
       (uint32)((0x3400 << 18) | mmCC_RB_BACKEND_DISABLE),   /* SC SC PER_SE  */
       0x00000000,
       (uint32)((0x3800 << 18) | mmCC_RB_BACKEND_DISABLE),   /* SC SC PER_SE  */
       0x00000000,
       (uint32)((0x3c00 << 18) | mmCC_RB_BACKEND_DISABLE),   /* SC SC PER_SE  */
       0x00000000,
       (uint32)((0x3000 << 18) | mmVGT_VTX_VECT_EJECT_REG),
       0x00000001,
       (uint32)((0x3000 << 18) | mmVGT_DMA_DATA_FIFO_DEPTH),   /* IA WD  */
       0x00000001,
       (uint32)((0x3000 << 18) | mmVGT_DMA_REQ_FIFO_DEPTH),   /* WD   */
       0x00000001,
       (uint32)((0x3000 << 18) | mmVGT_DRAW_INIT_FIFO_DEPTH),   /* WD   */
       0x00000001,
       (uint32)((0x3000 << 18) | mmVGT_CACHE_INVALIDATION),   /*  IA  */
       0x00000001,
       (uint32)((0x3000 << 18) | mmVGT_RESET_DEBUG),   /*  WD  */
       0x00000001,
       (uint32)((0x3000 << 18) | mmVGT_FIFO_DEPTHS),



I will run some tests against this CC_RB_BACKEND_DISABLE register to see whether the vm 
flush failure can be avoided by removing those four entries from the SR 
list.
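
Removing the entries for one register from such a list could look roughly like the sketch below. It assumes the layout visible in the list above: each entry is a pair of words, with the register offset in the low 18 bits of the first word and a value in the second. `sr_list_drop_reg`, `SR_REG_MASK` and the offset `EXAMPLE_CC_RB_BACKEND_DISABLE` are hypothetical; the real offset comes from the gfx9 register headers.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: the low 18 bits of the first word hold the register offset. */
#define SR_REG_MASK ((1u << 18) - 1)

/* Hypothetical offset, for illustration only. */
#define EXAMPLE_CC_RB_BACKEND_DISABLE 0x063du

/*
 * Compact the list in place, dropping every (encoded register, value) pair
 * whose register offset equals reg. n is the number of 32-bit words; the
 * return value is the new word count.
 */
static size_t sr_list_drop_reg(uint32_t *list, size_t n, uint32_t reg)
{
    size_t out = 0;

    for (size_t i = 0; i + 1 < n; i += 2) {
        if ((list[i] & SR_REG_MASK) == reg)
            continue;                /* skip this pair entirely */
        list[out++] = list[i];
        list[out++] = list[i + 1];
    }
    return out;
}
```

Because the match is on the low 18 bits only, all four CC_RB_BACKEND_DISABLE entries (upper bits 0x3000, 0x3400, 0x3800, 0x3c00) would be dropped by a single call.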



Thanks



/Monk



From: Deucher, Alexander
Sent: March 7, 2018 23:13
To: Koenig, Christian <christian.koe...@amd.com>; Mao, David <david....@amd.com>; 
Liu, Monk <monk....@amd.com>
Cc: amd-gfx@lists.freedesktop.org; Jin, Jian-Rong <jian-rong....@amd.com>
Subject: Re: deprecated register issues



Right.  We ran into issues with reading back that register at runtime when UMDs 
queried it when other stuff was in flight, so we just read it once at startup 
and cache the results. Now when UMDs request it, we return the cached value.
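
The read-once-and-cache pattern described here can be sketched as follows. This is a minimal standalone illustration, not the amdgpu code: `struct rb_config_cache`, `rb_config_init`, `rb_config_query` and the `counting_read` stub are hypothetical names, and the register read is passed in as a callback so the example runs without hardware.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical cache for a register that must not be read at runtime. */
struct rb_config_cache {
    bool valid;
    uint32_t backend_disable;
};

/* Callback standing in for the actual MMIO read. */
typedef uint32_t (*mmio_read_fn)(void *ctx);

/* Called once at startup, while nothing else is in flight. */
static void rb_config_init(struct rb_config_cache *c, mmio_read_fn rd, void *ctx)
{
    c->backend_disable = rd(ctx);
    c->valid = true;
}

/* Called for every later query; never touches the hardware. */
static bool rb_config_query(const struct rb_config_cache *c, uint32_t *out)
{
    if (!c->valid)
        return false;
    *out = c->backend_disable;
    return true;
}

/* Demo reader that counts how often the "register" is actually read. */
static uint32_t counting_read(void *ctx)
{
    unsigned *reads = ctx;

    (*reads)++;
    return 0x00030000u;              /* arbitrary example value */
}
```

However many times `rb_config_query` is called after `rb_config_init`, the counter behind `counting_read` stays at one, which is the property that keeps the problematic read out of the runtime path.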



Alex

________________________________

From: Koenig, Christian
Sent: Wednesday, March 7, 2018 9:31:13 AM
To: Mao, David; Liu, Monk
Cc: Deucher, Alexander; amd-gfx@lists.freedesktop.org; Jin, Jian-Rong
Subject: Re: deprecated register issues



Hi David,

Well, I just figured out that this is a misunderstanding.

Accessing this register and some other deprecated registers can cause problems 
when invalidating VMHUBs.

This register itself isn't deprecated, the wording in a patch fixing things is 
just a bit unclear.

The question is: is that register still accessed regularly, or is its value cached 
after startup?

Regards,
Christian.

Am 07.03.2018 um 15:25 schrieb Mao, David:

We require the base driver to provide the mask of disabled RBs.

This is why the kernel reads CC_RB_BACKEND_DISABLE to collect the harvest 
configuration.

Where did you learn that the register is deprecated?

I think it should still be there.



Best Regards,

David



On Mar 7, 2018, at 9:49 PM, Liu, Monk <monk....@amd.com> wrote:



+ UMD guys



Hi David



Do you know if GC_USER_RB_BACKEND_DISABLE still exists for gfx9/vega10?

We found that CC_RB_BACKEND_DISABLE was deprecated, but it looks like it is still 
in use in the kmd, so I want to check both of the above registers with you.



Thanks

/Monk



From: amd-gfx [mailto:amd-gfx-boun...@lists.freedesktop.org] On Behalf Of 
Christian König
Sent: March 7, 2018 20:26
To: Liu, Monk <monk....@amd.com>; Deucher, Alexander <alexander.deuc...@amd.com>
Cc: amd-gfx@lists.freedesktop.org
Subject: Re: deprecated register issues



Hi Monk,

I honestly don't have the slightest idea why we are still accessing 
CC_RB_BACKEND_DISABLE. Maybe it still contains some useful values?

Key point was that we needed to stop accessing it all the time to avoid 
triggering problems.

Regards,
Christian.

Am 07.03.2018 um 13:11 schrieb Liu, Monk:

Hi Christian



I remember you and AlexD mentioned that a handful of registers are deprecated for 
greenland (gfx9), e.g. CC_RB_BACKEND_DISABLE.

Do you know why we still have this routine?



static u32
gfx_v9_0_get_rb_active_bitmap(struct amdgpu_device *adev)
{
    u32 data, mask;

    data = RREG32_SOC15(GC, 0, mmCC_RB_BACKEND_DISABLE);
    data |= RREG32_SOC15(GC, 0, mmGC_USER_RB_BACKEND_DISABLE);

    data &= CC_RB_BACKEND_DISABLE__BACKEND_DISABLE_MASK;
    data >>= GC_USER_RB_BACKEND_DISABLE__BACKEND_DISABLE__SHIFT;

    mask = amdgpu_gfx_create_bitmask(adev->gfx.config.max_backends_per_se /
                     adev->gfx.config.max_sh_per_se);

    return (~data) & mask;
}
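
To make the bit arithmetic in this routine concrete, here is a standalone sketch that takes the two register values as plain arguments instead of reading them over MMIO. The field layout (an 8-bit BACKEND_DISABLE field at bits 23:16) is an assumption for illustration, and the `EX_`/`ex_` names are hypothetical stand-ins for the real macros and helper.

```c
#include <stdint.h>

/* Assumed field layout, for illustration: 8-bit field at bits 23:16. */
#define EX_BACKEND_DISABLE_MASK  0x00FF0000u
#define EX_BACKEND_DISABLE_SHIFT 16

/* Bitmask with the low bit_width bits set (bit_width must be at most 32). */
static uint32_t ex_create_bitmask(unsigned bit_width)
{
    return (uint32_t)((1ULL << bit_width) - 1);
}

/*
 * Combine the fused (CC) and user (GC_USER) disable fields, then invert
 * them so that a set bit means "this render backend is active".
 */
static uint32_t ex_rb_active_bitmap(uint32_t cc_val, uint32_t user_val,
                                    unsigned rb_per_sh)
{
    uint32_t data = cc_val | user_val;

    data &= EX_BACKEND_DISABLE_MASK;
    data >>= EX_BACKEND_DISABLE_SHIFT;

    return ~data & ex_create_bitmask(rb_per_sh);
}
```

For example, with four backends per shader array, RB0 disabled in the CC value (0x00010000) and RB2 disabled in the user value (0x00040000), the combined disable field is 0x5 and the active bitmap is 0xA.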





See that it still reads CC_RB_BACKEND_DISABLE.



thanks



/Monk












_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
