Re: [Freedreno] [PATCH v6 06/15] drm/msm/a6xx: Introduce GMU wrapper support

2023-05-09 Thread Konrad Dybcio



On 8.05.2023 23:15, Akhil P Oommen wrote:
> On Mon, May 08, 2023 at 10:59:24AM +0200, Konrad Dybcio wrote:
>>
>>
>> On 6.05.2023 16:46, Akhil P Oommen wrote:
>>> On Fri, May 05, 2023 at 12:35:18PM +0200, Konrad Dybcio wrote:


 On 5.05.2023 10:46, Akhil P Oommen wrote:
> On Thu, May 04, 2023 at 08:34:07AM +0200, Konrad Dybcio wrote:
>>
>>
>> On 3.05.2023 22:32, Akhil P Oommen wrote:
>>> On Tue, May 02, 2023 at 11:40:26AM +0200, Konrad Dybcio wrote:


 On 2.05.2023 09:49, Akhil P Oommen wrote:
> On Sat, Apr 01, 2023 at 01:54:43PM +0200, Konrad Dybcio wrote:
>> Some (particularly SMD_RPM, a.k.a non-RPMh) SoCs implement A6XX GPUs
>> but don't implement the associated GMUs. This is due to the fact that
>> the GMU directly pokes at RPMh. Sadly, this means we have to take 
>> care
>> of enabling & scaling power rails, clocks and bandwidth ourselves.
>>
>> Reuse existing Adreno-common code and modify the deeply-GMU-infused
>> A6XX code to facilitate these GPUs. This involves if-ing out lots
>> of GMU callbacks and introducing a new type of GMU - GMU wrapper 
>> (it's
>> the actual name that Qualcomm uses in their downstream kernels).
>>
>> This is essentially a register region which is convenient to model
>> as a device. We'll use it for managing the GDSCs. The register
>> layout matches the actual GMU_CX/GX regions on the "real GMU" devices
>> and lets us reuse quite a bit of gmu_read/write/rmw calls.
> << I sent a reply to this patch earlier, but not sure where it went.
> Still figuring out Mutt... >>
 Answered it here:

 https://lore.kernel.org/linux-arm-msm/4d3000c1-c3f9-0bfd-3eb3-23393f9a8...@linaro.org/
>>>
>>> Thanks. Will check and respond there if needed.
>>>

 I don't think I see any new comments in this "reply revision" (heh), 
 so please
 check that one out.

>
> Only convenience I found is that we can reuse gmu register ops in a 
> few
> places (< 10 I think). If we just model this as another gpu memory
> region, I think it will help to keep gmu vs gmu-wrapper/no-gmu
> architecture code with clean separation. Also, it looks like we need 
> to
> keep a dummy gmu platform device in the devicetree with the current
> approach. That doesn't sound right.
 That's correct, but.. if we switch away from that, VDD_GX/VDD_CX will
 need additional, gmuwrapper-configuration specific code anyway, as
 OPP & genpd will no longer make use of the default behavior which
 only gets triggered if there's a single power-domains=<> entry, afaicu.
>>> Can you please tell me which specific *default behaviour* do you mean 
>>> here?
>>> I am curious to know what I am overlooking here. We can always get a 
>>> cxpd/gxpd device
>>> and vote for the gdscs directly from the driver. Anything related to
>>> OPP?
>> I *believe* this is true:
>>
>> if (ARRAY_SIZE(power-domains) == 1) {
>>  of generic code will enable the power domain at .probe time
> we need to handle the voting directly. I recently shared a patch to
> vote cx gdsc from gpu driver. Maybe we can ignore this when gpu has
> only cx rail due to this logic you quoted here.
>
> I see that you have handled it mostly correctly from the gpu driver in 
> the updated
> a6xx_pm_suspend() callback. Just the power domain device ptrs should be 
> moved to
> gpu from gmu.
>
>>
>>  opp APIs will default to scaling that domain with required-opps
>
>> }
>>
>> and we do need to put GX/CX (with an MX parent to match) there, as the
>> AP is responsible for voting in this configuration
>
> We should vote to turn ON gx/cx headswitches through genpd from gpu 
> driver. When you vote for
> core clk frequency, *clock driver is supposed to scale* all the necessary
> regulators. At least that is how downstream works. You can refer the 
> downstream
> gpucc clk driver of these SoCs. I am not sure how much of that can be 
> easily converted to
> upstream.
>
> Also, how does having a gmu dt node help in this regard? Feel free to
> elaborate, I am not very familiar with clk/regulator implementations.
 Okay so I think we have a bit of a confusion here.

 Currently, with this patchset we manage things like this:

 1. GPU has a VDD_GX (or equivalent[1]) line passed in power-domains=<>, 
 which
is then used with OPP APIs to ensure it's being scaled on freq change 
 [2].
The VDD_lines coming from RPM(h) are described as power domains upstream
*unlike downstream*, which represents them as regulators with preset 
 

Re: [Freedreno] [PATCH v6 06/15] drm/msm/a6xx: Introduce GMU wrapper support

2023-05-08 Thread Akhil P Oommen
On Mon, May 08, 2023 at 10:59:24AM +0200, Konrad Dybcio wrote:
> 
> 
> On 6.05.2023 16:46, Akhil P Oommen wrote:
> > On Fri, May 05, 2023 at 12:35:18PM +0200, Konrad Dybcio wrote:
> >>
> >>
> >> On 5.05.2023 10:46, Akhil P Oommen wrote:
> >>> On Thu, May 04, 2023 at 08:34:07AM +0200, Konrad Dybcio wrote:
> 
> 
>  On 3.05.2023 22:32, Akhil P Oommen wrote:
> > On Tue, May 02, 2023 at 11:40:26AM +0200, Konrad Dybcio wrote:
> >>
> >>
> >> On 2.05.2023 09:49, Akhil P Oommen wrote:
> >>> On Sat, Apr 01, 2023 at 01:54:43PM +0200, Konrad Dybcio wrote:
>  Some (particularly SMD_RPM, a.k.a non-RPMh) SoCs implement A6XX GPUs
>  but don't implement the associated GMUs. This is due to the fact that
>  the GMU directly pokes at RPMh. Sadly, this means we have to take 
>  care
>  of enabling & scaling power rails, clocks and bandwidth ourselves.
> 
>  Reuse existing Adreno-common code and modify the deeply-GMU-infused
>  A6XX code to facilitate these GPUs. This involves if-ing out lots
>  of GMU callbacks and introducing a new type of GMU - GMU wrapper 
>  (it's
>  the actual name that Qualcomm uses in their downstream kernels).
> 
>  This is essentially a register region which is convenient to model
>  as a device. We'll use it for managing the GDSCs. The register
>  layout matches the actual GMU_CX/GX regions on the "real GMU" devices
>  and lets us reuse quite a bit of gmu_read/write/rmw calls.
> >>> << I sent a reply to this patch earlier, but not sure where it went.
> >>> Still figuring out Mutt... >>
> >> Answered it here:
> >>
> >> https://lore.kernel.org/linux-arm-msm/4d3000c1-c3f9-0bfd-3eb3-23393f9a8...@linaro.org/
> >
> > Thanks. Will check and respond there if needed.
> >
> >>
> >> I don't think I see any new comments in this "reply revision" (heh), 
> >> so please
> >> check that one out.
> >>
> >>>
> >>> Only convenience I found is that we can reuse gmu register ops in a 
> >>> few
> >>> places (< 10 I think). If we just model this as another gpu memory
> >>> region, I think it will help to keep gmu vs gmu-wrapper/no-gmu
> >>> architecture code with clean separation. Also, it looks like we need 
> >>> to
> >>> keep a dummy gmu platform device in the devicetree with the current
> >>> approach. That doesn't sound right.
> >> That's correct, but.. if we switch away from that, VDD_GX/VDD_CX will
> >> need additional, gmuwrapper-configuration specific code anyway, as
> >> OPP & genpd will no longer make use of the default behavior which
> >> only gets triggered if there's a single power-domains=<> entry, afaicu.
> > Can you please tell me which specific *default behaviour* do you mean 
> > here?
> > I am curious to know what I am overlooking here. We can always get a 
> > cxpd/gxpd device
> > and vote for the gdscs directly from the driver. Anything related to
> > OPP?
>  I *believe* this is true:
> 
>  if (ARRAY_SIZE(power-domains) == 1) {
>   of generic code will enable the power domain at .probe time
> >>> we need to handle the voting directly. I recently shared a patch to
> >>> vote cx gdsc from gpu driver. Maybe we can ignore this when gpu has
> >>> only cx rail due to this logic you quoted here.
> >>>
> >>> I see that you have handled it mostly correctly from the gpu driver in 
> >>> the updated
> >>> a6xx_pm_suspend() callback. Just the power domain device ptrs should be 
> >>> moved to
> >>> gpu from gmu.
> >>>
> 
>   opp APIs will default to scaling that domain with required-opps
> >>>
>  }
> 
>  and we do need to put GX/CX (with an MX parent to match) there, as the
>  AP is responsible for voting in this configuration
> >>>
> >>> We should vote to turn ON gx/cx headswitches through genpd from gpu 
> >>> driver. When you vote for
> >>> core clk frequency, *clock driver is supposed to scale* all the necessary
> >>> regulators. At least that is how downstream works. You can refer the 
> >>> downstream
> >>> gpucc clk driver of these SoCs. I am not sure how much of that can be 
> >>> easily converted to
> >>> upstream.
> >>>
> >>> Also, how does having a gmu dt node help in this regard? Feel free to
> >>> elaborate, I am not very familiar with clk/regulator implementations.
> >> Okay so I think we have a bit of a confusion here.
> >>
> >> Currently, with this patchset we manage things like this:
> >>
> >> 1. GPU has a VDD_GX (or equivalent[1]) line passed in power-domains=<>, 
> >> which
> >>is then used with OPP APIs to ensure it's being scaled on freq change 
> >> [2].
> >>The VDD_lines coming from RPM(h) are described as power domains upstream
> >>*unlike downstream*, which represents them as regulators with preset 
> >> voltage
> >>steps (and perhaps that's what had 

Re: [Freedreno] [PATCH v6 06/15] drm/msm/a6xx: Introduce GMU wrapper support

2023-05-08 Thread Konrad Dybcio



On 6.05.2023 16:46, Akhil P Oommen wrote:
> On Fri, May 05, 2023 at 12:35:18PM +0200, Konrad Dybcio wrote:
>>
>>
>> On 5.05.2023 10:46, Akhil P Oommen wrote:
>>> On Thu, May 04, 2023 at 08:34:07AM +0200, Konrad Dybcio wrote:


 On 3.05.2023 22:32, Akhil P Oommen wrote:
> On Tue, May 02, 2023 at 11:40:26AM +0200, Konrad Dybcio wrote:
>>
>>
>> On 2.05.2023 09:49, Akhil P Oommen wrote:
>>> On Sat, Apr 01, 2023 at 01:54:43PM +0200, Konrad Dybcio wrote:
 Some (particularly SMD_RPM, a.k.a non-RPMh) SoCs implement A6XX GPUs
 but don't implement the associated GMUs. This is due to the fact that
 the GMU directly pokes at RPMh. Sadly, this means we have to take care
 of enabling & scaling power rails, clocks and bandwidth ourselves.

 Reuse existing Adreno-common code and modify the deeply-GMU-infused
 A6XX code to facilitate these GPUs. This involves if-ing out lots
 of GMU callbacks and introducing a new type of GMU - GMU wrapper (it's
 the actual name that Qualcomm uses in their downstream kernels).

 This is essentially a register region which is convenient to model
 as a device. We'll use it for managing the GDSCs. The register
 layout matches the actual GMU_CX/GX regions on the "real GMU" devices
 and lets us reuse quite a bit of gmu_read/write/rmw calls.
>>> << I sent a reply to this patch earlier, but not sure where it went.
>>> Still figuring out Mutt... >>
>> Answered it here:
>>
>> https://lore.kernel.org/linux-arm-msm/4d3000c1-c3f9-0bfd-3eb3-23393f9a8...@linaro.org/
>
> Thanks. Will check and respond there if needed.
>
>>
>> I don't think I see any new comments in this "reply revision" (heh), so 
>> please
>> check that one out.
>>
>>>
>>> Only convenience I found is that we can reuse gmu register ops in a few
>>> places (< 10 I think). If we just model this as another gpu memory
>>> region, I think it will help to keep gmu vs gmu-wrapper/no-gmu
>>> architecture code with clean separation. Also, it looks like we need to
>>> keep a dummy gmu platform device in the devicetree with the current
>>> approach. That doesn't sound right.
>> That's correct, but.. if we switch away from that, VDD_GX/VDD_CX will
>> need additional, gmuwrapper-configuration specific code anyway, as
>> OPP & genpd will no longer make use of the default behavior which
>> only gets triggered if there's a single power-domains=<> entry, afaicu.
> Can you please tell me which specific *default behaviour* do you mean here?
> I am curious to know what I am overlooking here. We can always get a 
> cxpd/gxpd device
> and vote for the gdscs directly from the driver. Anything related to
> OPP?
 I *believe* this is true:

 if (ARRAY_SIZE(power-domains) == 1) {
of generic code will enable the power domain at .probe time
>>> we need to handle the voting directly. I recently shared a patch to
>>> vote cx gdsc from gpu driver. Maybe we can ignore this when gpu has
>>> only cx rail due to this logic you quoted here.
>>>
>>> I see that you have handled it mostly correctly from the gpu driver in the 
>>> updated
>>> a6xx_pm_suspend() callback. Just the power domain device ptrs should be 
>>> moved to
>>> gpu from gmu.
>>>

opp APIs will default to scaling that domain with required-opps
>>>
 }

 and we do need to put GX/CX (with an MX parent to match) there, as the
 AP is responsible for voting in this configuration
>>>
>>> We should vote to turn ON gx/cx headswitches through genpd from gpu driver. 
>>> When you vote for
>>> core clk frequency, *clock driver is supposed to scale* all the necessary
>>> regulators. At least that is how downstream works. You can refer the 
>>> downstream
>>> gpucc clk driver of these SoCs. I am not sure how much of that can be 
>>> easily converted to
>>> upstream.
>>>
>>> Also, how does having a gmu dt node help in this regard? Feel free to
>>> elaborate, I am not very familiar with clk/regulator implementations.
>> Okay so I think we have a bit of a confusion here.
>>
>> Currently, with this patchset we manage things like this:
>>
>> 1. GPU has a VDD_GX (or equivalent[1]) line passed in power-domains=<>, which
>>is then used with OPP APIs to ensure it's being scaled on freq change [2].
>>The VDD_lines coming from RPM(h) are described as power domains upstream
>>*unlike downstream*, which represents them as regulators with preset 
>> voltage
>>steps (and perhaps that's what had you confused). What's more is that 
>> GDSCs
>>are also modeled as genpds instead of regulators, hence they sort of 
>> "fight"
>>for the spot in power-domains=<> of a given node.
> 
> Thanks for clarifying. I didn't get this part "hence they sort of "fight" for 
> the spot in power-domains".
> What spot exactly did 

Re: [Freedreno] [PATCH v6 06/15] drm/msm/a6xx: Introduce GMU wrapper support

2023-05-06 Thread Akhil P Oommen
On Sun, May 07, 2023 at 02:16:36AM +0530, Akhil P Oommen wrote:
> On Sat, May 06, 2023 at 08:16:21PM +0530, Akhil P Oommen wrote:
> > On Fri, May 05, 2023 at 12:35:18PM +0200, Konrad Dybcio wrote:
> > > 
> > > 
> > > On 5.05.2023 10:46, Akhil P Oommen wrote:
> > > > On Thu, May 04, 2023 at 08:34:07AM +0200, Konrad Dybcio wrote:
> > > >>
> > > >>
> > > >> On 3.05.2023 22:32, Akhil P Oommen wrote:
> > > >>> On Tue, May 02, 2023 at 11:40:26AM +0200, Konrad Dybcio wrote:
> > > 
> > > 
> > >  On 2.05.2023 09:49, Akhil P Oommen wrote:
> > > > On Sat, Apr 01, 2023 at 01:54:43PM +0200, Konrad Dybcio wrote:
> > > >> Some (particularly SMD_RPM, a.k.a non-RPMh) SoCs implement A6XX 
> > > >> GPUs
> > > >> but don't implement the associated GMUs. This is due to the fact 
> > > >> that
> > > >> the GMU directly pokes at RPMh. Sadly, this means we have to take 
> > > >> care
> > > >> of enabling & scaling power rails, clocks and bandwidth ourselves.
> > > >>
> > > >> Reuse existing Adreno-common code and modify the deeply-GMU-infused
> > > >> A6XX code to facilitate these GPUs. This involves if-ing out lots
> > > >> of GMU callbacks and introducing a new type of GMU - GMU wrapper 
> > > >> (it's
> > > >> the actual name that Qualcomm uses in their downstream kernels).
> > > >>
> > > >> This is essentially a register region which is convenient to model
> > > >> as a device. We'll use it for managing the GDSCs. The register
> > > >> layout matches the actual GMU_CX/GX regions on the "real GMU" 
> > > >> devices
> > > >> and lets us reuse quite a bit of gmu_read/write/rmw calls.
> > > > << I sent a reply to this patch earlier, but not sure where it went.
> > > > Still figuring out Mutt... >>
> > >  Answered it here:
> > > 
> > >  https://lore.kernel.org/linux-arm-msm/4d3000c1-c3f9-0bfd-3eb3-23393f9a8...@linaro.org/
> > > >>>
> > > >>> Thanks. Will check and respond there if needed.
> > > >>>
> > > 
> > >  I don't think I see any new comments in this "reply revision" (heh), 
> > >  so please
> > >  check that one out.
> > > 
> > > >
> > > > Only convenience I found is that we can reuse gmu register ops in a 
> > > > few
> > > > places (< 10 I think). If we just model this as another gpu memory
> > > > region, I think it will help to keep gmu vs gmu-wrapper/no-gmu
> > > > architecture code with clean separation. Also, it looks like we 
> > > > need to
> > > > keep a dummy gmu platform device in the devicetree with the current
> > > > approach. That doesn't sound right.
> > >  That's correct, but.. if we switch away from that, VDD_GX/VDD_CX will
> > >  need additional, gmuwrapper-configuration specific code anyway, as
> > >  OPP & genpd will no longer make use of the default behavior which
> > >  only gets triggered if there's a single power-domains=<> entry, 
> > >  afaicu.
> > > >>> Can you please tell me which specific *default behaviour* do you mean 
> > > >>> here?
> > > >>> I am curious to know what I am overlooking here. We can always get a 
> > > >>> cxpd/gxpd device
> > > >>> and vote for the gdscs directly from the driver. Anything related to
> > > >>> OPP?
> > > >> I *believe* this is true:
> > > >>
> > > >> if (ARRAY_SIZE(power-domains) == 1) {
> > > >>of generic code will enable the power domain at .probe time
> > > > we need to handle the voting directly. I recently shared a patch to
> > > > vote cx gdsc from gpu driver. Maybe we can ignore this when gpu has
> > > > only cx rail due to this logic you quoted here.
> > > > 
> > > > I see that you have handled it mostly correctly from the gpu driver in 
> > > > the updated
> > > > a6xx_pm_suspend() callback. Just the power domain device ptrs should be 
> > > > moved to
> > > > gpu from gmu.
> > > > 
> > > >>
> > > >>opp APIs will default to scaling that domain with required-opps
> > > > 
> > > >> }
> > > >>
> > > >> and we do need to put GX/CX (with an MX parent to match) there, as the
> > > >> AP is responsible for voting in this configuration
> > > > 
> > > > We should vote to turn ON gx/cx headswitches through genpd from gpu 
> > > > driver. When you vote for
> > > > core clk frequency, *clock driver is supposed to scale* all the 
> > > > necessary
> > > > regulators. At least that is how downstream works. You can refer the 
> > > > downstream
> > > > gpucc clk driver of these SoCs. I am not sure how much of that can be 
> > > > easily converted to
> > > > upstream.
> > > > 
> > > > Also, how does having a gmu dt node help in this regard? Feel free to
> > > > elaborate, I am not very familiar with clk/regulator implementations.
> > > Okay so I think we have a bit of a confusion here.
> > > 
> > > Currently, with this patchset we manage things like this:
> > > 
> > > 1. GPU has a VDD_GX (or equivalent[1]) line passed in power-domains=<>, 
> > > which
> > >is 

Re: [Freedreno] [PATCH v6 06/15] drm/msm/a6xx: Introduce GMU wrapper support

2023-05-06 Thread Akhil P Oommen
On Sat, May 06, 2023 at 08:16:21PM +0530, Akhil P Oommen wrote:
> On Fri, May 05, 2023 at 12:35:18PM +0200, Konrad Dybcio wrote:
> > 
> > 
> > On 5.05.2023 10:46, Akhil P Oommen wrote:
> > > On Thu, May 04, 2023 at 08:34:07AM +0200, Konrad Dybcio wrote:
> > >>
> > >>
> > >> On 3.05.2023 22:32, Akhil P Oommen wrote:
> > >>> On Tue, May 02, 2023 at 11:40:26AM +0200, Konrad Dybcio wrote:
> > 
> > 
> >  On 2.05.2023 09:49, Akhil P Oommen wrote:
> > > On Sat, Apr 01, 2023 at 01:54:43PM +0200, Konrad Dybcio wrote:
> > >> Some (particularly SMD_RPM, a.k.a non-RPMh) SoCs implement A6XX GPUs
> > >> but don't implement the associated GMUs. This is due to the fact that
> > >> the GMU directly pokes at RPMh. Sadly, this means we have to take 
> > >> care
> > >> of enabling & scaling power rails, clocks and bandwidth ourselves.
> > >>
> > >> Reuse existing Adreno-common code and modify the deeply-GMU-infused
> > >> A6XX code to facilitate these GPUs. This involves if-ing out lots
> > >> of GMU callbacks and introducing a new type of GMU - GMU wrapper 
> > >> (it's
> > >> the actual name that Qualcomm uses in their downstream kernels).
> > >>
> > >> This is essentially a register region which is convenient to model
> > >> as a device. We'll use it for managing the GDSCs. The register
> > >> layout matches the actual GMU_CX/GX regions on the "real GMU" devices
> > >> and lets us reuse quite a bit of gmu_read/write/rmw calls.
> > > << I sent a reply to this patch earlier, but not sure where it went.
> > > Still figuring out Mutt... >>
> >  Answered it here:
> > 
> >  https://lore.kernel.org/linux-arm-msm/4d3000c1-c3f9-0bfd-3eb3-23393f9a8...@linaro.org/
> > >>>
> > >>> Thanks. Will check and respond there if needed.
> > >>>
> > 
> >  I don't think I see any new comments in this "reply revision" (heh), 
> >  so please
> >  check that one out.
> > 
> > >
> > > Only convenience I found is that we can reuse gmu register ops in a 
> > > few
> > > places (< 10 I think). If we just model this as another gpu memory
> > > region, I think it will help to keep gmu vs gmu-wrapper/no-gmu
> > > architecture code with clean separation. Also, it looks like we need 
> > > to
> > > keep a dummy gmu platform device in the devicetree with the current
> > > approach. That doesn't sound right.
> >  That's correct, but.. if we switch away from that, VDD_GX/VDD_CX will
> >  need additional, gmuwrapper-configuration specific code anyway, as
> >  OPP & genpd will no longer make use of the default behavior which
> >  only gets triggered if there's a single power-domains=<> entry, afaicu.
> > >>> Can you please tell me which specific *default behaviour* do you mean 
> > >>> here?
> > >>> I am curious to know what I am overlooking here. We can always get a 
> > >>> cxpd/gxpd device
> > >>> and vote for the gdscs directly from the driver. Anything related to
> > >>> OPP?
> > >> I *believe* this is true:
> > >>
> > >> if (ARRAY_SIZE(power-domains) == 1) {
> > >>  of generic code will enable the power domain at .probe time
> > > we need to handle the voting directly. I recently shared a patch to
> > > vote cx gdsc from gpu driver. Maybe we can ignore this when gpu has
> > > only cx rail due to this logic you quoted here.
> > > 
> > > I see that you have handled it mostly correctly from the gpu driver in 
> > > the updated
> > > a6xx_pm_suspend() callback. Just the power domain device ptrs should be 
> > > moved to
> > > gpu from gmu.
> > > 
> > >>
> > >>  opp APIs will default to scaling that domain with required-opps
> > > 
> > >> }
> > >>
> > >> and we do need to put GX/CX (with an MX parent to match) there, as the
> > >> AP is responsible for voting in this configuration
> > > 
> > > We should vote to turn ON gx/cx headswitches through genpd from gpu 
> > > driver. When you vote for
> > > core clk frequency, *clock driver is supposed to scale* all the necessary
> > > regulators. At least that is how downstream works. You can refer the 
> > > downstream
> > > gpucc clk driver of these SoCs. I am not sure how much of that can be 
> > > easily converted to
> > > upstream.
> > > 
> > > Also, how does having a gmu dt node help in this regard? Feel free to
> > > elaborate, I am not very familiar with clk/regulator implementations.
> > Okay so I think we have a bit of a confusion here.
> > 
> > Currently, with this patchset we manage things like this:
> > 
> > 1. GPU has a VDD_GX (or equivalent[1]) line passed in power-domains=<>, 
> > which
> >is then used with OPP APIs to ensure it's being scaled on freq change 
> > [2].
> >The VDD_lines coming from RPM(h) are described as power domains upstream
> >*unlike downstream*, which represents them as regulators with preset 
> > voltage
> >steps (and perhaps that's what had you confused). What's more is that 
> > GDSCs
> >are 

Re: [Freedreno] [PATCH v6 06/15] drm/msm/a6xx: Introduce GMU wrapper support

2023-05-06 Thread Akhil P Oommen
On Fri, May 05, 2023 at 12:35:18PM +0200, Konrad Dybcio wrote:
> 
> 
> On 5.05.2023 10:46, Akhil P Oommen wrote:
> > On Thu, May 04, 2023 at 08:34:07AM +0200, Konrad Dybcio wrote:
> >>
> >>
> >> On 3.05.2023 22:32, Akhil P Oommen wrote:
> >>> On Tue, May 02, 2023 at 11:40:26AM +0200, Konrad Dybcio wrote:
> 
> 
>  On 2.05.2023 09:49, Akhil P Oommen wrote:
> > On Sat, Apr 01, 2023 at 01:54:43PM +0200, Konrad Dybcio wrote:
> >> Some (particularly SMD_RPM, a.k.a non-RPMh) SoCs implement A6XX GPUs
> >> but don't implement the associated GMUs. This is due to the fact that
> >> the GMU directly pokes at RPMh. Sadly, this means we have to take care
> >> of enabling & scaling power rails, clocks and bandwidth ourselves.
> >>
> >> Reuse existing Adreno-common code and modify the deeply-GMU-infused
> >> A6XX code to facilitate these GPUs. This involves if-ing out lots
> >> of GMU callbacks and introducing a new type of GMU - GMU wrapper (it's
> >> the actual name that Qualcomm uses in their downstream kernels).
> >>
> >> This is essentially a register region which is convenient to model
> >> as a device. We'll use it for managing the GDSCs. The register
> >> layout matches the actual GMU_CX/GX regions on the "real GMU" devices
> >> and lets us reuse quite a bit of gmu_read/write/rmw calls.
> > << I sent a reply to this patch earlier, but not sure where it went.
> > Still figuring out Mutt... >>
>  Answered it here:
> 
>  https://lore.kernel.org/linux-arm-msm/4d3000c1-c3f9-0bfd-3eb3-23393f9a8...@linaro.org/
> >>>
> >>> Thanks. Will check and respond there if needed.
> >>>
> 
>  I don't think I see any new comments in this "reply revision" (heh), so 
>  please
>  check that one out.
> 
> >
> > Only convenience I found is that we can reuse gmu register ops in a few
> > places (< 10 I think). If we just model this as another gpu memory
> > region, I think it will help to keep gmu vs gmu-wrapper/no-gmu
> > architecture code with clean separation. Also, it looks like we need to
> > keep a dummy gmu platform device in the devicetree with the current
> > approach. That doesn't sound right.
>  That's correct, but.. if we switch away from that, VDD_GX/VDD_CX will
>  need additional, gmuwrapper-configuration specific code anyway, as
>  OPP & genpd will no longer make use of the default behavior which
>  only gets triggered if there's a single power-domains=<> entry, afaicu.
> >>> Can you please tell me which specific *default behaviour* do you mean here?
> >>> I am curious to know what I am overlooking here. We can always get a 
> >>> cxpd/gxpd device
> >>> and vote for the gdscs directly from the driver. Anything related to
> >>> OPP?
> >> I *believe* this is true:
> >>
> >> if (ARRAY_SIZE(power-domains) == 1) {
> >>of generic code will enable the power domain at .probe time
> > we need to handle the voting directly. I recently shared a patch to
> > vote cx gdsc from gpu driver. Maybe we can ignore this when gpu has
> > only cx rail due to this logic you quoted here.
> > 
> > I see that you have handled it mostly correctly from the gpu driver in the 
> > updated
> > a6xx_pm_suspend() callback. Just the power domain device ptrs should be 
> > moved to
> > gpu from gmu.
> > 
> >>
> >>opp APIs will default to scaling that domain with required-opps
> > 
> >> }
> >>
> >> and we do need to put GX/CX (with an MX parent to match) there, as the
> >> AP is responsible for voting in this configuration
> > 
> > We should vote to turn ON gx/cx headswitches through genpd from gpu driver. 
> > When you vote for
> > core clk frequency, *clock driver is supposed to scale* all the necessary
> > regulators. At least that is how downstream works. You can refer the 
> > downstream
> > gpucc clk driver of these SoCs. I am not sure how much of that can be 
> > easily converted to
> > upstream.
> > 
> > Also, how does having a gmu dt node help in this regard? Feel free to
> > elaborate, I am not very familiar with clk/regulator implementations.
> Okay so I think we have a bit of a confusion here.
> 
> Currently, with this patchset we manage things like this:
> 
> 1. GPU has a VDD_GX (or equivalent[1]) line passed in power-domains=<>, which
>is then used with OPP APIs to ensure it's being scaled on freq change [2].
>The VDD_lines coming from RPM(h) are described as power domains upstream
>*unlike downstream*, which represents them as regulators with preset 
> voltage
>steps (and perhaps that's what had you confused). What's more is that GDSCs
>are also modeled as genpds instead of regulators, hence they sort of 
> "fight"
>for the spot in power-domains=<> of a given node.

Thanks for clarifying. I didn't get this part "hence they sort of "fight" for 
the spot in power-domains".
What spot exactly did you mean here? The spot for PD to be used during scaling?

It seems 

Re: [Freedreno] [PATCH v6 06/15] drm/msm/a6xx: Introduce GMU wrapper support

2023-05-05 Thread Konrad Dybcio



On 5.05.2023 10:46, Akhil P Oommen wrote:
> On Thu, May 04, 2023 at 08:34:07AM +0200, Konrad Dybcio wrote:
>>
>>
>> On 3.05.2023 22:32, Akhil P Oommen wrote:
>>> On Tue, May 02, 2023 at 11:40:26AM +0200, Konrad Dybcio wrote:


 On 2.05.2023 09:49, Akhil P Oommen wrote:
> On Sat, Apr 01, 2023 at 01:54:43PM +0200, Konrad Dybcio wrote:
>> Some (particularly SMD_RPM, a.k.a non-RPMh) SoCs implement A6XX GPUs
>> but don't implement the associated GMUs. This is due to the fact that
>> the GMU directly pokes at RPMh. Sadly, this means we have to take care
>> of enabling & scaling power rails, clocks and bandwidth ourselves.
>>
>> Reuse existing Adreno-common code and modify the deeply-GMU-infused
>> A6XX code to facilitate these GPUs. This involves if-ing out lots
>> of GMU callbacks and introducing a new type of GMU - GMU wrapper (it's
>> the actual name that Qualcomm uses in their downstream kernels).
>>
>> This is essentially a register region which is convenient to model
>> as a device. We'll use it for managing the GDSCs. The register
>> layout matches the actual GMU_CX/GX regions on the "real GMU" devices
>> and lets us reuse quite a bit of gmu_read/write/rmw calls.
> << I sent a reply to this patch earlier, but not sure where it went.
> Still figuring out Mutt... >>
 Answered it here:

 https://lore.kernel.org/linux-arm-msm/4d3000c1-c3f9-0bfd-3eb3-23393f9a8...@linaro.org/
>>>
>>> Thanks. Will check and respond there if needed.
>>>

 I don't think I see any new comments in this "reply revision" (heh), so 
 please
 check that one out.

>
> Only convenience I found is that we can reuse gmu register ops in a few
> places (< 10 I think). If we just model this as another gpu memory
> region, I think it will help to keep gmu vs gmu-wrapper/no-gmu
> architecture code with clean separation. Also, it looks like we need to
> keep a dummy gmu platform device in the devicetree with the current
> approach. That doesn't sound right.
 That's correct, but.. if we switch away from that, VDD_GX/VDD_CX will
 need additional, gmuwrapper-configuration specific code anyway, as
 OPP & genpd will no longer make use of the default behavior which
 only gets triggered if there's a single power-domains=<> entry, afaicu.
>>> Can you please tell me which specific *default behaviour* do you mean here?
>>> I am curious to know what I am overlooking here. We can always get a 
>>> cxpd/gxpd device
>>> and vote for the gdscs directly from the driver. Anything related to
>>> OPP?
>> I *believe* this is true:
>>
>> if (ARRAY_SIZE(power-domains) == 1) {
>>  of generic code will enable the power domain at .probe time
> we need to handle the voting directly. I recently shared a patch to
> vote cx gdsc from gpu driver. Maybe we can ignore this when gpu has
> only cx rail due to this logic you quoted here.
> 
> I see that you have handled it mostly correctly from the gpu driver in the 
> updated
> a6xx_pm_suspend() callback. Just the power domain device ptrs should be moved 
> to
> gpu from gmu.
> 
>>
>>  opp APIs will default to scaling that domain with required-opps
> 
>> }
>>
>> and we do need to put GX/CX (with an MX parent to match) there, as the
>> AP is responsible for voting in this configuration
> 
> We should vote to turn ON gx/cx headswitches through genpd from gpu driver. 
> When you vote for
> core clk frequency, *clock driver is supposed to scale* all the necessary
> regulators. At least that is how downstream works. You can refer the 
> downstream
> gpucc clk driver of these SoCs. I am not sure how much of that can be easily 
> converted to
> upstream.
> 
> Also, how does having a gmu dt node help in this regard? Feel free to
> elaborate, I am not very familiar with clk/regulator implementations.
Okay so I think we have a bit of a confusion here.

Currently, with this patchset we manage things like this:

1. GPU has a VDD_GX (or equivalent[1]) line passed in power-domains=<>, which
   is then used with OPP APIs to ensure it's being scaled on freq change [2].
   The VDD_lines coming from RPM(h) are described as power domains upstream
   *unlike downstream*, which represents them as regulators with preset voltage
   steps (and perhaps that's what had you confused). What's more is that GDSCs
   are also modeled as genpds instead of regulators, hence they sort of "fight"
   for the spot in power-domains=<> of a given node.

2. GMU wrapper gets CX_GDSC & GX_GDSC handles in power-domains=<> (just like
   the real GMU in the current state of upstream [3]), which are then governed
   through explicit genpd calls to turn them on/off when the GPU resume/suspend/
   crash recovery functions are called.

3. GPUs with GMU, like A630, don't get any power-domains=<> entries in DT,
   instead relying on the GMU firmware to communicate necessary requests
   to the VDD_xyz resources 
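For illustration, here is a minimal sketch of what point 1 boils down to on the
consumer side. This is not code from the patch: the function name is made up,
error unwinding is trimmed, and it assumes the GPU node carries a single
power-domains=<> entry plus an operating-points-v2 table with required-opps.

#include <linux/device.h>
#include <linux/pm_opp.h>

/*
 * Sketch only: with one power-domains=<> entry, the genpd core attaches
 * that domain to the GPU device at probe time, so a plain OPP rate change
 * also propagates the required-opps performance state to the VDD_GX rail.
 */
static int example_gpu_set_freq(struct device *dev, unsigned long freq_hz)
{
	int ret;

	/* Parse the operating-points-v2 table from the GPU's DT node
	 * (normally done once at probe; inlined here to keep the sketch
	 * self-contained) */
	ret = devm_pm_opp_of_add_table(dev);
	if (ret)
		return ret;

	/* Set the core clock and vote the matching VDD_GX level */
	return dev_pm_opp_set_rate(dev, freq_hz);
}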

Re: [Freedreno] [PATCH v6 06/15] drm/msm/a6xx: Introduce GMU wrapper support

2023-05-05 Thread Akhil P Oommen
On Thu, May 04, 2023 at 08:34:07AM +0200, Konrad Dybcio wrote:
> 
> 
> On 3.05.2023 22:32, Akhil P Oommen wrote:
> > On Tue, May 02, 2023 at 11:40:26AM +0200, Konrad Dybcio wrote:
> >>
> >>
> >> On 2.05.2023 09:49, Akhil P Oommen wrote:
> >>> On Sat, Apr 01, 2023 at 01:54:43PM +0200, Konrad Dybcio wrote:
>  Some (particularly SMD_RPM, a.k.a non-RPMh) SoCs implement A6XX GPUs
>  but don't implement the associated GMUs. This is due to the fact that
>  the GMU directly pokes at RPMh. Sadly, this means we have to take care
>  of enabling & scaling power rails, clocks and bandwidth ourselves.
> 
>  Reuse existing Adreno-common code and modify the deeply-GMU-infused
>  A6XX code to facilitate these GPUs. This involves if-ing out lots
>  of GMU callbacks and introducing a new type of GMU - GMU wrapper (it's
>  the actual name that Qualcomm uses in their downstream kernels).
> 
>  This is essentially a register region which is convenient to model
>  as a device. We'll use it for managing the GDSCs. The register
>  layout matches the actual GMU_CX/GX regions on the "real GMU" devices
>  and lets us reuse quite a bit of gmu_read/write/rmw calls.
> >>> << I sent a reply to this patch earlier, but not sure where it went.
> >>> Still figuring out Mutt... >>
> >> Answered it here:
> >>
> >> https://lore.kernel.org/linux-arm-msm/4d3000c1-c3f9-0bfd-3eb3-23393f9a8...@linaro.org/
> > 
> > Thanks. Will check and respond there if needed.
> > 
> >>
> >> I don't think I see any new comments in this "reply revision" (heh), so 
> >> please
> >> check that one out.
> >>
> >>>
> >>> Only convenience I found is that we can reuse gmu register ops in a few
> >>> places (< 10 I think). If we just model this as another gpu memory
> >>> region, I think it will help to keep gmu vs gmu-wrapper/no-gmu
> >>> architecture code with clean separation. Also, it looks like we need to
> >>> keep a dummy gmu platform device in the devicetree with the current
> >>> approach. That doesn't sound right.
> >> That's correct, but.. if we switch away from that, VDD_GX/VDD_CX will
> >> need additional, gmuwrapper-configuration specific code anyway, as
> >> OPP & genpd will no longer make use of the default behavior which
> >> only gets triggered if there's a single power-domains=<> entry, afaicu.
> > Can you please tell me which specific *default behaviour* do you mean here?
> > I am curious to know what I am overlooking here. We can always get a 
> > cxpd/gxpd device
> > and vote for the gdscs directly from the driver. Anything related to
> > OPP?
> I *believe* this is true:
> 
> if (ARRAY_SIZE(power-domains) == 1) {
>   of generic code will enable the power domain at .probe time
we need to handle the voting directly. I recently shared a patch to
vote cx gdsc from gpu driver. Maybe we can ignore this when gpu has
only cx rail due to this logic you quoted here.

I see that you have handled it mostly correctly from the gpu driver in the 
updated
a6xx_pm_suspend() callback. Just the power domain device ptrs should be moved to
gpu from gmu.

> 
>   opp APIs will default to scaling that domain with required-opps

> }
> 
> and we do need to put GX/CX (with an MX parent to match) there, as the
> AP is responsible for voting in this configuration

We should vote to turn ON gx/cx headswitches through genpd from gpu driver. 
When you vote for
core clk frequency, *clock driver is supposed to scale* all the necessary
regulators. At least that is how downstream works. You can refer to the downstream
gpucc clk driver of these SoCs. I am not sure how much of that can be easily 
converted to
upstream.
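As a rough reference for what voting the headswitches on through genpd can look
like upstream (not code from the patch; cxpd/gxpd are assumed to be the virtual
devices returned earlier by dev_pm_domain_attach_by_name(), and the function
name is made up):

#include <linux/device.h>
#include <linux/pm_runtime.h>

/* Sketch only: runtime-PM gets/puts on the virtual genpd devices turn the
 * CX/GX GDSCs on and allow them to turn back off. */
static int example_gpu_headswitches_on(struct device *cxpd, struct device *gxpd)
{
	int ret;

	ret = pm_runtime_resume_and_get(cxpd);	/* CX GDSC on */
	if (ret < 0)
		return ret;

	ret = pm_runtime_resume_and_get(gxpd);	/* GX GDSC on */
	if (ret < 0)
		pm_runtime_put(cxpd);		/* undo the CX vote on failure */

	return ret;
}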

Also, how does having a gmu dt node help in this regard? Feel free to
elaborate, I am not very familiar with clk/regulator implementations.

-Akhil.
> 
> Konrad
> > 
> > -Akhil
> >>
> >> If nothing else, this is a very convenient way to model a part of the
> >> GPU (as that's essentially what GMU_CX is, to my understanding) and
> >> the bindings people didn't shoot me in the head for proposing this, so
> >> I assume it'd be cool to pursue this..
> >>
> >> Konrad
> 
>  Signed-off-by: Konrad Dybcio 
>  ---
>   drivers/gpu/drm/msm/adreno/a6xx_gmu.c   |  72 +++-
>   drivers/gpu/drm/msm/adreno/a6xx_gpu.c   | 255 
>  +---
>   drivers/gpu/drm/msm/adreno/a6xx_gpu.h   |   1 +
>   drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c |  14 +-
>   drivers/gpu/drm/msm/adreno/adreno_gpu.c |   8 +-
>   drivers/gpu/drm/msm/adreno/adreno_gpu.h |   6 +
>   6 files changed, 318 insertions(+), 38 deletions(-)
> 
>  diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c 
>  b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
>  index 87babbb2a19f..b1acdb027205 100644
>  --- a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
>  +++ b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
>  @@ -1469,6 +1469,7 @@ static int a6xx_gmu_get_irq(struct a6xx_gmu *gmu, 
>  

Re: [Freedreno] [PATCH v6 06/15] drm/msm/a6xx: Introduce GMU wrapper support

2023-05-04 Thread Konrad Dybcio



On 3.05.2023 22:32, Akhil P Oommen wrote:
> On Tue, May 02, 2023 at 11:40:26AM +0200, Konrad Dybcio wrote:
>>
>>
>> On 2.05.2023 09:49, Akhil P Oommen wrote:
>>> On Sat, Apr 01, 2023 at 01:54:43PM +0200, Konrad Dybcio wrote:
 Some (particularly SMD_RPM, a.k.a non-RPMh) SoCs implement A6XX GPUs
 but don't implement the associated GMUs. This is due to the fact that
 the GMU directly pokes at RPMh. Sadly, this means we have to take care
 of enabling & scaling power rails, clocks and bandwidth ourselves.

 Reuse existing Adreno-common code and modify the deeply-GMU-infused
 A6XX code to facilitate these GPUs. This involves if-ing out lots
 of GMU callbacks and introducing a new type of GMU - GMU wrapper (it's
 the actual name that Qualcomm uses in their downstream kernels).

 This is essentially a register region which is convenient to model
 as a device. We'll use it for managing the GDSCs. The register
 layout matches the actual GMU_CX/GX regions on the "real GMU" devices
 and lets us reuse quite a bit of gmu_read/write/rmw calls.
>>> << I sent a reply to this patch earlier, but not sure where it went.
>>> Still figuring out Mutt... >>
>> Answered it here:
>>
>> https://lore.kernel.org/linux-arm-msm/4d3000c1-c3f9-0bfd-3eb3-23393f9a8...@linaro.org/
> 
> Thanks. Will check and respond there if needed.
> 
>>
>> I don't think I see any new comments in this "reply revision" (heh), so 
>> please
>> check that one out.
>>
>>>
>>> Only convenience I found is that we can reuse gmu register ops in a few
>>> places (< 10 I think). If we just model this as another gpu memory
>>> region, I think it will help to keep gmu vs gmu-wrapper/no-gmu
>>> architecture code with clean separation. Also, it looks like we need to
>>> keep a dummy gmu platform device in the devicetree with the current
>>> approach. That doesn't sound right.
>> That's correct, but.. if we switch away from that, VDD_GX/VDD_CX will
>> need additional, gmuwrapper-configuration specific code anyway, as
>> OPP & genpd will no longer make use of the default behavior which
>> only gets triggered if there's a single power-domains=<> entry, afaicu.
> Can you please tell me which specific *default behaviour* do you mean here?
> I am curious to know what I am overlooking here. We can always get a 
> cxpd/gxpd device
> and vote for the gdscs directly from the driver. Anything related to
> OPP?
I *believe* this is true:

if (ARRAY_SIZE(power-domains) == 1) {
	OF generic code will enable the power domain at .probe time

	OPP APIs will default to scaling that domain with required-opps
}

and we do need to put GX/CX (with an MX parent to match) there, as the
AP is responsible for voting in this configuration
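For contrast, a rough sketch (not from this patch) of the extra plumbing needed
once power-domains=<> grows beyond one entry and the implicit attach described
above no longer happens. The "cx"/"gx" names follow the existing GMU bindings;
the function name is made up and detach-on-error is omitted for brevity.

#include <linux/device.h>
#include <linux/err.h>
#include <linux/pm_domain.h>

/* Sketch only: with multiple power-domains=<> entries the driver must
 * attach each domain by name and manage it explicitly. */
static int example_attach_gpu_pds(struct device *dev)
{
	struct device *cxpd, *gxpd;

	cxpd = dev_pm_domain_attach_by_name(dev, "cx");
	if (IS_ERR_OR_NULL(cxpd))
		return cxpd ? PTR_ERR(cxpd) : -ENODEV;

	gxpd = dev_pm_domain_attach_by_name(dev, "gx");
	if (IS_ERR_OR_NULL(gxpd))
		return gxpd ? PTR_ERR(gxpd) : -ENODEV;

	/* Keep the CX GDSC up whenever the GPU device is runtime-active */
	if (!device_link_add(dev, cxpd, DL_FLAG_PM_RUNTIME))
		return -ENODEV;

	return 0;
}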

Konrad
> 
> -Akhil
>>
>> If nothing else, this is a very convenient way to model a part of the
>> GPU (as that's essentially what GMU_CX is, to my understanding) and
>> the bindings people didn't shoot me in the head for proposing this, so
>> I assume it'd be cool to pursue this..
>>
>> Konrad

 Signed-off-by: Konrad Dybcio 
 ---
  drivers/gpu/drm/msm/adreno/a6xx_gmu.c   |  72 +++-
  drivers/gpu/drm/msm/adreno/a6xx_gpu.c   | 255 
 +---
  drivers/gpu/drm/msm/adreno/a6xx_gpu.h   |   1 +
  drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c |  14 +-
  drivers/gpu/drm/msm/adreno/adreno_gpu.c |   8 +-
  drivers/gpu/drm/msm/adreno/adreno_gpu.h |   6 +
  6 files changed, 318 insertions(+), 38 deletions(-)

 diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c 
 b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
 index 87babbb2a19f..b1acdb027205 100644
 --- a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
 +++ b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
 @@ -1469,6 +1469,7 @@ static int a6xx_gmu_get_irq(struct a6xx_gmu *gmu, 
 struct platform_device *pdev,
  
  void a6xx_gmu_remove(struct a6xx_gpu *a6xx_gpu)
  {
 +  struct adreno_gpu *adreno_gpu = &a6xx_gpu->base;
    struct a6xx_gmu *gmu = &a6xx_gpu->gmu;
struct platform_device *pdev = to_platform_device(gmu->dev);
  
 @@ -1494,10 +1495,12 @@ void a6xx_gmu_remove(struct a6xx_gpu *a6xx_gpu)
gmu->mmio = NULL;
gmu->rscc = NULL;
  
 -  a6xx_gmu_memory_free(gmu);
 +  if (!adreno_has_gmu_wrapper(adreno_gpu)) {
 +  a6xx_gmu_memory_free(gmu);
  
 -  free_irq(gmu->gmu_irq, gmu);
 -  free_irq(gmu->hfi_irq, gmu);
 +  free_irq(gmu->gmu_irq, gmu);
 +  free_irq(gmu->hfi_irq, gmu);
 +  }
  
/* Drop reference taken in of_find_device_by_node */
put_device(gmu->dev);
 @@ -1516,6 +1519,69 @@ static int cxpd_notifier_cb(struct notifier_block 
 *nb,
return 0;
  }
  
 +int a6xx_gmu_wrapper_init(struct a6xx_gpu *a6xx_gpu, struct device_node 
 *node)
 +{
 +  struct platform_device *pdev = of_find_device_by_node(node);

Re: [Freedreno] [PATCH v6 06/15] drm/msm/a6xx: Introduce GMU wrapper support

2023-05-03 Thread Akhil P Oommen
On Tue, May 02, 2023 at 11:40:26AM +0200, Konrad Dybcio wrote:
> 
> 
> On 2.05.2023 09:49, Akhil P Oommen wrote:
> > On Sat, Apr 01, 2023 at 01:54:43PM +0200, Konrad Dybcio wrote:
> >> Some (particularly SMD_RPM, a.k.a non-RPMh) SoCs implement A6XX GPUs
> >> but don't implement the associated GMUs. This is due to the fact that
> >> the GMU directly pokes at RPMh. Sadly, this means we have to take care
> >> of enabling & scaling power rails, clocks and bandwidth ourselves.
> >>
> >> Reuse existing Adreno-common code and modify the deeply-GMU-infused
> >> A6XX code to facilitate these GPUs. This involves if-ing out lots
> >> of GMU callbacks and introducing a new type of GMU - GMU wrapper (it's
> >> the actual name that Qualcomm uses in their downstream kernels).
> >>
> >> This is essentially a register region which is convenient to model
> >> as a device. We'll use it for managing the GDSCs. The register
> >> layout matches the actual GMU_CX/GX regions on the "real GMU" devices
> >> and lets us reuse quite a bit of gmu_read/write/rmw calls.
> > << I sent a reply to this patch earlier, but not sure where it went.
> > Still figuring out Mutt... >>
> Answered it here:
> 
> https://lore.kernel.org/linux-arm-msm/4d3000c1-c3f9-0bfd-3eb3-23393f9a8...@linaro.org/

Thanks. Will check and respond there if needed.

> 
> I don't think I see any new comments in this "reply revision" (heh), so please
> check that one out.
> 
> > 
> > Only convenience I found is that we can reuse gmu register ops in a few
> > places (< 10 I think). If we just model this as another gpu memory
> > region, I think it will help to keep gmu vs gmu-wrapper/no-gmu
> > architecture code with clean separation. Also, it looks like we need to
> > keep a dummy gmu platform device in the devicetree with the current
> > approach. That doesn't sound right.
> That's correct, but.. if we switch away from that, VDD_GX/VDD_CX will
> need additional, gmuwrapper-configuration specific code anyway, as
> OPP & genpd will no longer make use of the default behavior which
> only gets triggered if there's a single power-domains=<> entry, afaicu.
Can you please tell me which specific *default behaviour* do you mean here?
I am curious to know what I am overlooking here. We can always get a cxpd/gxpd
device and vote for the gdscs directly from the driver. Anything related to
OPP?

-Akhil
> 
> If nothing else, this is a very convenient way to model a part of the
> GPU (as that's essentially what GMU_CX is, to my understanding) and
> the bindings people didn't shoot me in the head for proposing this, so
> I assume it'd be cool to pursue this..
> 
> Konrad
> >>
> >> Signed-off-by: Konrad Dybcio 
> >> ---
> >>  drivers/gpu/drm/msm/adreno/a6xx_gmu.c   |  72 +++-
> >>  drivers/gpu/drm/msm/adreno/a6xx_gpu.c   | 255 
> >> +---
> >>  drivers/gpu/drm/msm/adreno/a6xx_gpu.h   |   1 +
> >>  drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c |  14 +-
> >>  drivers/gpu/drm/msm/adreno/adreno_gpu.c |   8 +-
> >>  drivers/gpu/drm/msm/adreno/adreno_gpu.h |   6 +
> >>  6 files changed, 318 insertions(+), 38 deletions(-)
> >>
> >> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c 
> >> b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> >> index 87babbb2a19f..b1acdb027205 100644
> >> --- a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> >> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> >> @@ -1469,6 +1469,7 @@ static int a6xx_gmu_get_irq(struct a6xx_gmu *gmu, 
> >> struct platform_device *pdev,
> >>  
> >>  void a6xx_gmu_remove(struct a6xx_gpu *a6xx_gpu)
> >>  {
> >> +  struct adreno_gpu *adreno_gpu = &a6xx_gpu->base;
> >>    struct a6xx_gmu *gmu = &a6xx_gpu->gmu;
> >>struct platform_device *pdev = to_platform_device(gmu->dev);
> >>  
> >> @@ -1494,10 +1495,12 @@ void a6xx_gmu_remove(struct a6xx_gpu *a6xx_gpu)
> >>gmu->mmio = NULL;
> >>gmu->rscc = NULL;
> >>  
> >> -  a6xx_gmu_memory_free(gmu);
> >> +  if (!adreno_has_gmu_wrapper(adreno_gpu)) {
> >> +  a6xx_gmu_memory_free(gmu);
> >>  
> >> -  free_irq(gmu->gmu_irq, gmu);
> >> -  free_irq(gmu->hfi_irq, gmu);
> >> +  free_irq(gmu->gmu_irq, gmu);
> >> +  free_irq(gmu->hfi_irq, gmu);
> >> +  }
> >>  
> >>/* Drop reference taken in of_find_device_by_node */
> >>put_device(gmu->dev);
> >> @@ -1516,6 +1519,69 @@ static int cxpd_notifier_cb(struct notifier_block 
> >> *nb,
> >>return 0;
> >>  }
> >>  
> >> +int a6xx_gmu_wrapper_init(struct a6xx_gpu *a6xx_gpu, struct device_node 
> >> *node)
> >> +{
> >> +  struct platform_device *pdev = of_find_device_by_node(node);
> >> +  struct a6xx_gmu *gmu = &a6xx_gpu->gmu;
> >> +  int ret;
> >> +
> >> +  if (!pdev)
> >> +  return -ENODEV;
> >> +
> >> +  gmu->dev = &pdev->dev;
> >> +
> >> +  of_dma_configure(gmu->dev, node, true);
> > why setup dma for a device that is not actually present?
> >> +
> >> +  pm_runtime_enable(gmu->dev);
> >> +
> >> +  /* Mark legacy for manual SPTPRAC control */
> >> +  gmu->legacy = true;
> >> +
> >> +  /* Map the GMU 

Re: [Freedreno] [PATCH v6 06/15] drm/msm/a6xx: Introduce GMU wrapper support

2023-05-02 Thread Konrad Dybcio



On 2.05.2023 09:49, Akhil P Oommen wrote:
> On Sat, Apr 01, 2023 at 01:54:43PM +0200, Konrad Dybcio wrote:
>> Some (particularly SMD_RPM, a.k.a non-RPMh) SoCs implement A6XX GPUs
>> but don't implement the associated GMUs. This is due to the fact that
>> the GMU directly pokes at RPMh. Sadly, this means we have to take care
>> of enabling & scaling power rails, clocks and bandwidth ourselves.
>>
>> Reuse existing Adreno-common code and modify the deeply-GMU-infused
>> A6XX code to facilitate these GPUs. This involves if-ing out lots
>> of GMU callbacks and introducing a new type of GMU - GMU wrapper (it's
>> the actual name that Qualcomm uses in their downstream kernels).
>>
>> This is essentially a register region which is convenient to model
>> as a device. We'll use it for managing the GDSCs. The register
>> layout matches the actual GMU_CX/GX regions on the "real GMU" devices
>> and lets us reuse quite a bit of gmu_read/write/rmw calls.
> << I sent a reply to this patch earlier, but not sure where it went.
> Still figuring out Mutt... >>
Answered it here:

https://lore.kernel.org/linux-arm-msm/4d3000c1-c3f9-0bfd-3eb3-23393f9a8...@linaro.org/

I don't think I see any new comments in this "reply revision" (heh), so please
check that one out.

> 
> Only convenience I found is that we can reuse gmu register ops in a few
> places (< 10 I think). If we just model this as another gpu memory
> region, I think it will help to keep gmu vs gmu-wrapper/no-gmu
> architecture code with clean separation. Also, it looks like we need to
> keep a dummy gmu platform device in the devicetree with the current
> approach. That doesn't sound right.
That's correct, but.. if we switch away from that, VDD_GX/VDD_CX will
need additional, gmuwrapper-configuration specific code anyway, as
OPP & genpd will no longer make use of the default behavior which
only gets triggered if there's a single power-domains=<> entry, afaicu.

If nothing else, this is a very convenient way to model a part of the
GPU (as that's essentially what GMU_CX is, to my understanding) and
the bindings people didn't shoot me in the head for proposing this, so
I assume it'd be cool to pursue this..

Konrad
>>
>> Signed-off-by: Konrad Dybcio 
>> ---
>>  drivers/gpu/drm/msm/adreno/a6xx_gmu.c   |  72 +++-
>>  drivers/gpu/drm/msm/adreno/a6xx_gpu.c   | 255 
>> +---
>>  drivers/gpu/drm/msm/adreno/a6xx_gpu.h   |   1 +
>>  drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c |  14 +-
>>  drivers/gpu/drm/msm/adreno/adreno_gpu.c |   8 +-
>>  drivers/gpu/drm/msm/adreno/adreno_gpu.h |   6 +
>>  6 files changed, 318 insertions(+), 38 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c 
>> b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
>> index 87babbb2a19f..b1acdb027205 100644
>> --- a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
>> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
>> @@ -1469,6 +1469,7 @@ static int a6xx_gmu_get_irq(struct a6xx_gmu *gmu, 
>> struct platform_device *pdev,
>>  
>>  void a6xx_gmu_remove(struct a6xx_gpu *a6xx_gpu)
>>  {
>> +struct adreno_gpu *adreno_gpu = &a6xx_gpu->base;
>>  struct a6xx_gmu *gmu = &a6xx_gpu->gmu;
>>  struct platform_device *pdev = to_platform_device(gmu->dev);
>>  
>> @@ -1494,10 +1495,12 @@ void a6xx_gmu_remove(struct a6xx_gpu *a6xx_gpu)
>>  gmu->mmio = NULL;
>>  gmu->rscc = NULL;
>>  
>> -a6xx_gmu_memory_free(gmu);
>> +if (!adreno_has_gmu_wrapper(adreno_gpu)) {
>> +a6xx_gmu_memory_free(gmu);
>>  
>> -free_irq(gmu->gmu_irq, gmu);
>> -free_irq(gmu->hfi_irq, gmu);
>> +free_irq(gmu->gmu_irq, gmu);
>> +free_irq(gmu->hfi_irq, gmu);
>> +}
>>  
>>  /* Drop reference taken in of_find_device_by_node */
>>  put_device(gmu->dev);
>> @@ -1516,6 +1519,69 @@ static int cxpd_notifier_cb(struct notifier_block *nb,
>>  return 0;
>>  }
>>  
>> +int a6xx_gmu_wrapper_init(struct a6xx_gpu *a6xx_gpu, struct device_node 
>> *node)
>> +{
>> +struct platform_device *pdev = of_find_device_by_node(node);
>> +struct a6xx_gmu *gmu = &a6xx_gpu->gmu;
>> +int ret;
>> +
>> +if (!pdev)
>> +return -ENODEV;
>> +
>> +gmu->dev = &pdev->dev;
>> +
>> +of_dma_configure(gmu->dev, node, true);
> why setup dma for a device that is not actually present?
>> +
>> +pm_runtime_enable(gmu->dev);
>> +
>> +/* Mark legacy for manual SPTPRAC control */
>> +gmu->legacy = true;
>> +
>> +/* Map the GMU registers */
>> +gmu->mmio = a6xx_gmu_get_mmio(pdev, "gmu");
>> +if (IS_ERR(gmu->mmio)) {
>> +ret = PTR_ERR(gmu->mmio);
>> +goto err_mmio;
>> +}
>> +
>> +gmu->cxpd = dev_pm_domain_attach_by_name(gmu->dev, "cx");
>> +if (IS_ERR(gmu->cxpd)) {
>> +ret = PTR_ERR(gmu->cxpd);
>> +goto err_mmio;
>> +}
>> +
>> +if (!device_link_add(gmu->dev, gmu->cxpd, DL_FLAG_PM_RUNTIME)) {
>> +ret = -ENODEV;
>> +goto detach_cxpd;
>> +}
>> +
>> +  

Re: [Freedreno] [PATCH v6 06/15] drm/msm/a6xx: Introduce GMU wrapper support

2023-05-02 Thread Akhil P Oommen
On Sat, Apr 01, 2023 at 01:54:43PM +0200, Konrad Dybcio wrote:
> Some (particularly SMD_RPM, a.k.a non-RPMh) SoCs implement A6XX GPUs
> but don't implement the associated GMUs. This is due to the fact that
> the GMU directly pokes at RPMh. Sadly, this means we have to take care
> of enabling & scaling power rails, clocks and bandwidth ourselves.
> 
> Reuse existing Adreno-common code and modify the deeply-GMU-infused
> A6XX code to facilitate these GPUs. This involves if-ing out lots
> of GMU callbacks and introducing a new type of GMU - GMU wrapper (it's
> the actual name that Qualcomm uses in their downstream kernels).
> 
> This is essentially a register region which is convenient to model
> as a device. We'll use it for managing the GDSCs. The register
> layout matches the actual GMU_CX/GX regions on the "real GMU" devices
> and lets us reuse quite a bit of gmu_read/write/rmw calls.
<< I sent a reply to this patch earlier, but not sure where it went.
Still figuring out Mutt... >>

Only convenience I found is that we can reuse gmu register ops in a few
places (< 10 I think). If we just model this as another gpu memory
region, I think it will help to keep gmu vs gmu-wrapper/no-gmu
architecture code with clean separation. Also, it looks like we need to
keep a dummy gmu platform device in the devicetree with the current
approach. That doesn't sound right. (A rough sketch of this alternative follows below.)
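For reference, a rough sketch of the alternative suggested above, i.e. treating
the wrapper as just another register region of the GPU's own platform device.
This is purely illustrative: the "gmu_cx" reg-names entry and the function name
are hypothetical and not part of any existing binding or driver.

#include <linux/io.h>
#include <linux/platform_device.h>

/* Sketch only: map a hypothetical extra "gmu_cx" reg entry on the GPU node
 * instead of instantiating a separate gmu-wrapper platform device. */
static void __iomem *example_map_gmu_cx(struct platform_device *gpu_pdev)
{
	return devm_platform_ioremap_resource_byname(gpu_pdev, "gmu_cx");
}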
> 
> Signed-off-by: Konrad Dybcio 
> ---
>  drivers/gpu/drm/msm/adreno/a6xx_gmu.c   |  72 +++-
>  drivers/gpu/drm/msm/adreno/a6xx_gpu.c   | 255 
> +---
>  drivers/gpu/drm/msm/adreno/a6xx_gpu.h   |   1 +
>  drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c |  14 +-
>  drivers/gpu/drm/msm/adreno/adreno_gpu.c |   8 +-
>  drivers/gpu/drm/msm/adreno/adreno_gpu.h |   6 +
>  6 files changed, 318 insertions(+), 38 deletions(-)
> 
> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c 
> b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> index 87babbb2a19f..b1acdb027205 100644
> --- a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> @@ -1469,6 +1469,7 @@ static int a6xx_gmu_get_irq(struct a6xx_gmu *gmu, 
> struct platform_device *pdev,
>  
>  void a6xx_gmu_remove(struct a6xx_gpu *a6xx_gpu)
>  {
> + struct adreno_gpu *adreno_gpu = &a6xx_gpu->base;
>   struct a6xx_gmu *gmu = &a6xx_gpu->gmu;
>   struct platform_device *pdev = to_platform_device(gmu->dev);
>  
> @@ -1494,10 +1495,12 @@ void a6xx_gmu_remove(struct a6xx_gpu *a6xx_gpu)
>   gmu->mmio = NULL;
>   gmu->rscc = NULL;
>  
> - a6xx_gmu_memory_free(gmu);
> + if (!adreno_has_gmu_wrapper(adreno_gpu)) {
> + a6xx_gmu_memory_free(gmu);
>  
> - free_irq(gmu->gmu_irq, gmu);
> - free_irq(gmu->hfi_irq, gmu);
> + free_irq(gmu->gmu_irq, gmu);
> + free_irq(gmu->hfi_irq, gmu);
> + }
>  
>   /* Drop reference taken in of_find_device_by_node */
>   put_device(gmu->dev);
> @@ -1516,6 +1519,69 @@ static int cxpd_notifier_cb(struct notifier_block *nb,
>   return 0;
>  }
>  
> +int a6xx_gmu_wrapper_init(struct a6xx_gpu *a6xx_gpu, struct device_node 
> *node)
> +{
> + struct platform_device *pdev = of_find_device_by_node(node);
> + struct a6xx_gmu *gmu = &a6xx_gpu->gmu;
> + int ret;
> +
> + if (!pdev)
> + return -ENODEV;
> +
> + gmu->dev = &pdev->dev;
> +
> + of_dma_configure(gmu->dev, node, true);
why set up dma for a device that is not actually present?
> +
> + pm_runtime_enable(gmu->dev);
> +
> + /* Mark legacy for manual SPTPRAC control */
> + gmu->legacy = true;
> +
> + /* Map the GMU registers */
> + gmu->mmio = a6xx_gmu_get_mmio(pdev, "gmu");
> + if (IS_ERR(gmu->mmio)) {
> + ret = PTR_ERR(gmu->mmio);
> + goto err_mmio;
> + }
> +
> + gmu->cxpd = dev_pm_domain_attach_by_name(gmu->dev, "cx");
> + if (IS_ERR(gmu->cxpd)) {
> + ret = PTR_ERR(gmu->cxpd);
> + goto err_mmio;
> + }
> +
> + if (!device_link_add(gmu->dev, gmu->cxpd, DL_FLAG_PM_RUNTIME)) {
> + ret = -ENODEV;
> + goto detach_cxpd;
> + }
> +
> + init_completion(&gmu->pd_gate);
> + complete_all(&gmu->pd_gate);
> + gmu->pd_nb.notifier_call = cxpd_notifier_cb;
> +
> + /* Get a link to the GX power domain to reset the GPU */
> + gmu->gxpd = dev_pm_domain_attach_by_name(gmu->dev, "gx");
> + if (IS_ERR(gmu->gxpd)) {
> + ret = PTR_ERR(gmu->gxpd);
> + goto err_mmio;
> + }
> +
> + gmu->initialized = true;
> +
> + return 0;
> +
> +detach_cxpd:
> + dev_pm_domain_detach(gmu->cxpd, false);
> +
> +err_mmio:
> + iounmap(gmu->mmio);
> +
> + /* Drop reference taken in of_find_device_by_node */
> + put_device(gmu->dev);
> +
> + return ret;
> +}
> +
>  int a6xx_gmu_init(struct a6xx_gpu *a6xx_gpu, struct device_node *node)
>  {
>   struct adreno_gpu *adreno_gpu = &a6xx_gpu->base;
> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c 
>