> Merging this code as well as maintaining a trust relationship with 
> Linus, also maintains a trust relationship with the Linux graphics 
> community and other drm contributors. There have been countless 
> requests from various companies and contributors to merge unsavoury 
> things over the years and we've denied them. They've all had the same 
> reasons behind why they couldn't do what we want and why we were 
> wrong, but lots of people have shown up who do get what we are at and 
> have joined the community and contributed drivers that conform to the 
> standards.
> Turning around now and saying well AMD ignored our directions, so 
> we'll give them a free pass even though we've denied you all the same 
> thing over time.

I'd like to say that I acknowledge the good and hard work maintainers are 
doing.  Neither you nor the community is wrong to say no.  I understand where 
the no comes from.  If somebody wanted to throw 100k lines into DAL I would 
say no as well.

> If I'd given in and merged every vendor coded driver as-is we'd never 
> have progressed to having atomic modesetting, there would have been  
> too many vendor HALs and abstractions that would have blocked forward 
> progression. Merging one HAL or abstraction is going to cause  pain, 
> but setting a precedent to merge more would be just downright stupid 
> maintainership.

> Here's the thing, we want AMD to join the graphics community not hang 
> out inside the company in silos. We need to enable FreeSync on Linux, 
> go ask the community how would be best to do it, don't shove it inside 
> the driver hidden in a special ioctl. Got some new HDMI features that 
> are secret, talk to other ppl in the same position and work out a plan 
> for moving forward. At the moment there is no engaging with the Linux 
> stack because you aren't really using it, as long as you hide behind 
> the abstraction there won't be much engagement, and neither side 
> benefits, so why should we merge the code if nobody benefits?


> The platform problem/Windows mindset is scary and makes a lot of 
> decisions for you, open source doesn't have those restrictions, and I 
> don't accept drivers that try and push those development model 
> problems into our codebase.

I would like to share how the platform problem/Windows mindset looks from our 
side.  We are dealing with ever more complex hardware, with a push to reduce 
power while driving more pixels through.  It is the power reduction that is 
causing us driver developers the most pain.  Display is a high-bandwidth, 
real-time memory-fetch subsystem which is always on, even when the system is 
idle.  When the system is idle, pretty much all power consumption comes from 
display.  Can we use the existing DRM infrastructure?  Definitely yes, if we 
are talking about modes up to 300 Mpix/s while leaving a lot of voltage and 
clock margin on the table.  How hard is it to set up a timing while bypassing 
most of the pixel processing pipeline to light up a display?  How about adding 
all the power optimizations, such as burst reads to fill the display cache and 
keep DRAM in self-refresh as much as possible?  How about powering off some of 
the cache or pixel processing pipeline when we are not using them?  We need to 
manage and maximize valuable resources like cache (cache == silicon area == 
$$) and clocks (== power) and optimize memory request patterns at different 
memory clock speeds, while DPM is running, in real time on the system.  This 
is why there is so much code to program registers, track our state, and manage 
resources, and it's getting more complex: HW would prefer SW to program the 
same value into 5 different registers in different sub-blocks to save a few 
cross-tile wires on silicon, and to do complex calculations to find the 
magical optimal settings (the hated bandwidth_cals.c).  There are a lot of 
registers that need to be programmed to the correct values in the right 
situations if we enable all these power/performance optimizations.
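To illustrate the fan-out described above, here is a minimal sketch in C. The register names, offsets, and the `program_urgent_watermark()` helper are all hypothetical, not actual AMD hardware or DC definitions; the point is only that one logical setting must be mirrored into several sub-blocks:

```c
#include <stdint.h>

/* Hypothetical MMIO register map: each sub-block keeps its own copy of
 * the urgent watermark, because mirroring the value in SW saves the HW
 * a few cross-tile wires on silicon. */
enum {
    REG_DCHUB_URGENT_WM = 0x0100,
    REG_MPC_URGENT_WM   = 0x0200,
    REG_OPTC_URGENT_WM  = 0x0300,
    REG_DPP_URGENT_WM   = 0x0400,
    REG_DCCG_URGENT_WM  = 0x0500,
    REG_SPACE_SIZE      = 0x0600,
};

static uint32_t mmio[REG_SPACE_SIZE];   /* stand-in for real MMIO */

static void reg_write(uint32_t reg, uint32_t val)
{
    mmio[reg] = val;
}

/* One logical setting fans out to five physical registers; miss one
 * and you get a hang or a power regression that only shows up in some
 * scenarios. */
static void program_urgent_watermark(uint32_t wm)
{
    static const uint32_t mirrors[] = {
        REG_DCHUB_URGENT_WM, REG_MPC_URGENT_WM, REG_OPTC_URGENT_WM,
        REG_DPP_URGENT_WM,   REG_DCCG_URGENT_WM,
    };
    for (unsigned int i = 0; i < sizeof(mirrors) / sizeof(mirrors[0]); i++)
        reg_write(mirrors[i], wm);
}
```

Multiply this pattern by the dozens of fields per block and the per-clock-speed calculations, and the code volume follows.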

It's really not a problem of a Windows mindset; rather, it is a question of 
what the bring-up platform is when silicon is in the lab with HW designer 
support.  Today, no surprise, we do that almost exclusively on Windows.  The 
display team is working hard to change that and get Linux into the mix while 
we have the attention of the HW designers.  We made a recent effort to enable 
all the power features on Stoney (our current-gen low-power APU) to match its 
idle power on Windows, after Stoney had already shipped.  The Linux driver 
guys have been working hard on it for 4+ months and are still having a hard 
time getting over the hurdle without support from the HW designers, because 
the designers are tied up with the next-generation silicon currently in the 
lab and the rest of them have already moved on to the generation after that.  
To me, I would rather have everything built on top of DC, including the HW 
diagnostic test suites.  Even if I have to build DC on top of DRM mode 
setting, I would prefer that over trying to do another bring-up without HW 
support.  After all, as a driver developer, refactoring and changing code is 
more fun than digging through documents and email and experimenting with 
different combinations of register settings through countless reboots, trying 
to get past some random hang.

FYI, just dce_mem_input.c programs over 50 distinct register fields, and DC 
for the current generation of ASICs doesn't yet support all features and power 
optimizations.  This doesn't even include the more complex programming model 
of future generations, with HW IP getting more modular.  We are already making 
progress on bring-up with shared DC code for the next-gen ASIC in the lab.  DC 
HW programming / resource management / power optimization will be fully 
validated on all platforms including Linux, and that will benefit the Linux 
driver running on AMD HW, especially in battery life.

Just in case you are wondering: the Polaris Windows driver isn't using DC and 
was on a "Windows architecture" code base.  We understand that from the 
community's point of view you are not getting much feature / power benefit 
yet, because the CI/VI/CZ/Polaris driver with DC is only used on Linux and we 
don't have the manpower to make it fully optimized yet.  The next generation 
will be performance- and power-optimized at launch.  I acknowledge that we 
don't have full features on Linux yet, and we still need to work with the 
community to amend DRM to enable FreeSync, HDR, next-gen resolutions, and the 
other display features just made available in Crimson ReLive.  However, it's 
not realistic to engage with the community early on in these efforts: up to 1 
month prior to release we were still experimenting with different solutions to 
make the features better, and half a year ago we wouldn't have known what we 
would end up building.  And of course marketing wouldn't let us leak these 
features before the Crimson launch.

I would like to work with the community, and I think we have shown that we 
welcome, appreciate, and take feedback seriously.  Plenty of work has been 
done in DC addressing some of the easier-to-fix problems, even while the 
next-gen ASIC in the lab remains our top priority.  We are already down to 66k 
lines of code from 93k through refactoring and removing numerous abstractions. 
We can't just tear apart the "mid layer" or "HAL" overnight.  Plenty of work 
needs to be done to understand if/how we can fit the resource-optimization 
complexity into the existing DRM framework.

If you look at the DC structures closely, we created them to plug into DRM 
structures (i.e. dc_surface == FB/plane, dc_stream ~= CRTC, dc_link + dc_sink 
== encoder + connector), but we need a resource layer to decide how to realize 
a given "state" with our HW.  The problem is not getting simpler: on top of 
multi-plane combining and shared encoder and clock resources, compression is 
starting to get into the display domain.  By the way, the existing DRM 
structures do fit nicely for HW of 4 generations ago, and in the current 
Windows driver we do have the concepts of CRTCs, encoders, and connectors.  
Over the years, however, complexity has grown and resource management has 
become a problem, which led us to the design of putting in a resource 
management layer.

We might not be supporting the full range of what atomic can do, and our 
semantics may be different at this stage of development, but saying 
dc_validate breaks atomic only tells me you haven't taken a close look at our 
DC code.  For us, all validation runs the same topology/resource algorithm in 
both check and commit.  It's not optimal yet, as today we end up running this 
algorithm twice on a commit, but we do intend to fix that over time.  I 
welcome any concrete suggestions on using the existing framework to solve the 
resource/topology management issue.  It's not too late to change DC now, but 
in a couple of years, after more OSes and ASICs are built on top of DC, it 
will be very difficult to change.

> Now the reason I bring this up (and we've discussed it at length in
> private) is that DC still suffers from a massive abstraction midlayer. 
> A lot of the back-end stuff (dp aux, i2c, abstractions for allocation, 
> timers, irq, ...) have been cleaned up, but the midlayer is still there.
> And I understand why you have it, and why it's there - without some OS 
> abstraction your grand plan of a unified driver across everything 
> doesn't work out so well.
>
> But in a way the backend stuff isn't such a big deal. It's annoying 
> since lots of code, and bugfixes have to be duplicated and all that, 
> but it's fairly easy to fix case-by-case, and as long as AMD folks 
> stick around (which I fully expect) not a maintainance issue. It makes 
> it harder for others to contribute, but then since it's mostly the 
> leaf it's generally easy to just improve the part you want to change 
> (as an outsider). And if you want to improve shared code the only 
> downside is that you can't also improve amd, but that's not so much a 
> problem for non-amd folks ;-)

Unfortunately, duplicating bug fixes is not trivial, and if the code bases 
diverge, some of the fixes will be different.  Surprisingly, if you track 
where we spend our time, < 20% is writing code.  Probably 50% is trying to 
figure out which register needs a different value programmed in which 
situation, and the other 30% is trying to make sure the change doesn't break 
other stuff in different scenarios.  If the power and performance 
optimizations remain off in Linux, then I would agree with your assessment.

> I've only got one true power as a maintainer, and that is to say No.

We AMD driver developers only have 2 true powers over the community, and those 
are access to internal documentation and to the HW designers.  Not pulling 
Linux into the mix while silicon is still in the lab means we lose half of our 
power (HW designer support).

> I've also wondered if the DC code is ready for being part of the kernel 
> anyways, what happens if I merge this, and some external 
> contributor rewrites 50% of it and removes a bunch of stuff that the 
> kernel doesn't need. By any kernel standards I'll merge that sort of 
> change over your heads if Alex doesn't, it might mean you have to 
> rewrite a chunk of your internal validation code, or some other 
> interactions, but those won't be reasons to block the changes from 
> my POV. I'd like some serious introspection on your team's part on 
> how you got into this situation and how even if I was feeling like 
> merging this (which I'm not) how you'd actually deal with being part 
> of the Linux kernel and not hiding in nicely framed orgchart silo 
> behind a HAL. 

We have come a long way compared to how Windows-centric we used to be, and I 
am sure there is plenty of work remaining before we are ready to be part of 
the kernel.  If the community has a clever and clean solution that doesn't 
break our ASICs, we'll take it internally with open arms.  We merged Dave's 
and Jerome's cleanups removing abstractions, and we have had lots of patches 
following Dave's and Jerome's lead in different areas.

Again, this is not about the orgchart.  It's about what's validated when 
samples are in the lab.

God, I miss the days when everything was plugged into the wall and dual-link 
DVI was cutting edge.  At least back then most of our problems could be solved 
by diffing register dumps between the good and bad cases.

Tony
