Re: [PATCH 1/2] Mips: ls2k1000: dts: add the display controller device node
On 23/02/2023 04:19, Sui jingfeng wrote:
> Hi,
>
> On 2023/2/23 02:32, Krzysztof Kozlowski wrote:
>> On 22/02/2023 17:55, suijingfeng wrote:
>>> The display controller is a pci device, its pci vendor id is
>>> 0x0014, its pci device id is 0x7a06.
>>>
>>> Signed-off-by: suijingfeng
>>> ---
>>>  .../boot/dts/loongson/loongson64-2k1000.dtsi | 21 +++
>>>  1 file changed, 21 insertions(+)
>>>
>>> diff --git a/arch/mips/boot/dts/loongson/loongson64-2k1000.dtsi
>>> b/arch/mips/boot/dts/loongson/loongson64-2k1000.dtsi
>>> index 8143a6e3..a528af3977d9 100644
>>> --- a/arch/mips/boot/dts/loongson/loongson64-2k1000.dtsi
>>> +++ b/arch/mips/boot/dts/loongson/loongson64-2k1000.dtsi
>>> @@ -31,6 +31,18 @@ memory@20 {
>>>			<0x0001 0x1000 0x0001 0xb000>; /* 6912 MB at 4352MB */
>>>	};
>>>
>>> +	reserved-memory {
>>> +		#address-cells = <2>;
>>> +		#size-cells = <2>;
>>> +		ranges;
>>> +
>>> +		display_reserved: framebuffer@3000 {
>>> +			compatible = "shared-dma-pool";
>>> +			reg = <0x0 0x3000 0x0 0x0400>; /* 64M */
>>> +			linux,cma-default;
>>> +		};
>>> +	};
>>> +
>>>	cpu_clk: cpu_clk {
>>>		#clock-cells = <0>;
>>>		compatible = "fixed-clock";
>>> @@ -198,6 +210,15 @@ sata@8,0 {
>>>		interrupt-parent = <&liointc0>;
>>>	};
>>>
>>> +	display-controller@6,0 {
>>> +		compatible = "loongson,ls2k1000-dc";
>>> +
>>> +		reg = <0x3000 0x0 0x0 0x0 0x0>;
>>> +		interrupts = <28 IRQ_TYPE_LEVEL_LOW>;
>>> +		interrupt-parent = <&liointc0>;
>>> +		memory-region = <&display_reserved>;
>>
>> NAK.
>
> Err :(, please give me a chance to explain
>
>> Test your code against the bindings you send.
>
> I can guarantee to you that I test my code more than twice. The code
> used for testing is listed at link [1].

I wrote - test against the bindings. I don't believe that it was tested.
Please paste the output of the testing (dtbs_check).

> This patchset is mainly used to illustrate how we made the driver in [1]
> usable on our SoC platform.
>
>> It's the same patchset. You basically send something which the same
>> moment is incorrect.
>
> The Loongson display controller IP has been integrated in both Loongson
> North Bridge chipsets (ls7a1000 and ls7a2000) and Loongson SoCs (ls2k1000
> and ls2k2000 etc); it has even been included in Loongson BMC (ls2k0500 bmc)
> products.

I don't understand how your reply here is relevant to incorrect bindings
or incorrect DTS according to bindings.

Best regards,
Krzysztof
Re: [PATCH drm-next v2 05/16] drm: manager to keep track of GPUs VA mappings
On 22.02.23 at 17:40, Danilo Krummrich wrote:
On 2/22/23 16:14, Christian König wrote:
On 22.02.23 at 16:07, Danilo Krummrich wrote:
On 2/22/23 11:25, Christian König wrote:
On 17.02.23 at 14:44, Danilo Krummrich wrote:

+/**
+ * DOC: Overview
+ *
+ * The DRM GPU VA Manager, represented by struct drm_gpuva_manager keeps track
+ * of a GPU's virtual address (VA) space and manages the corresponding virtual
+ * mappings represented by &drm_gpuva objects. It also keeps track of the
+ * mapping's backing &drm_gem_object buffers.
+ *
+ * &drm_gem_object buffers maintain a list (and a corresponding list lock) of
+ * &drm_gpuva objects representing all existent GPU VA mappings using this
+ * &drm_gem_object as backing buffer.
+ *
+ * If the &DRM_GPUVA_MANAGER_REGIONS feature is enabled, a GPU VA mapping can
+ * only be created within a previously allocated &drm_gpuva_region, which
+ * represents a reserved portion of the GPU VA space. GPU VA mappings are not
+ * allowed to span over a &drm_gpuva_region's boundary.
+ *
+ * GPU VA regions can also be flagged as sparse, which allows drivers to create
+ * sparse mappings for a whole GPU VA region in order to support Vulkan
+ * 'Sparse Resources'.

Well since we have now found that there is absolutely no technical reason
for having those regions could we please drop them?

I disagree this was the outcome of our previous discussion.

In nouveau I still need them to track the separate sparse page tables and,
as you confirmed previously, Nvidia cards are not the only cards supporting
this feature.

The second reason is that with regions we can avoid merging between
buffers, which saves some effort. However, I agree that this argument by
itself probably doesn't hold too much, since you've pointed out in a
previous mail that:

1) If we merge and decide to only do that inside certain boundaries then
those boundaries needs to be provided and checked against. This burns quite
some CPU cycles

2) If we just merge what we can we might have extra page table updates
which cost time and could result in undesired side effects.

3) If we don't merge at all we have additional housekeeping for the
mappings and maybe hw restrictions.

However, if a driver uses regions to track its separate sparse page tables
anyway it gets 1) for free, which is a nice synergy.

I totally agree that regions aren't for everyone though. Hence, I made them
an optional feature and by default regions are disabled. In order to use
them drm_gpuva_manager_init() must be called with the
DRM_GPUVA_MANAGER_REGIONS feature flag.

I really would not want to open code regions or have two GPUVA manager
instances in nouveau to track sparse page tables. That would be really
messy, hence I hope we can agree on this to be an optional feature.

I absolutely don't think that this is a good idea then. This separate
handling of sparse page tables is completely Nouveau specific.

Actually, I rely on what you said in a previous mail when I say it's,
potentially, not specific to nouveau.

This sounds similar to what AMD hw used to have up until gfx8 (I think),
basically sparse resources where defined through a separate mechanism to
the address resolution of the page tables. I won't rule out that other
hardware has similar approaches.

Ok, sounds like I didn't make my point here clear: AMD does have that same
mechanism for older hw you try to implement here for Nouveau, but we have
*abandoned* it because it is too much trouble and especially overhead to
support!

In other words we have said "Ok we would need two separate components to
cleanly handle that, one for newer hw and one for older hw.". What you now
try to do is to write one component which works for both. We have already
exercised this idea and came to the conclusion that it's not a good path
to go down. So you're basically just repeating our mistake.

I mean if it's just for Nouveau then I would say feel free to do whatever
you want, but since this component is supposed to be used by more drivers
then I strongly think we need to tackle this from a different side.

Even when it's an optional feature, mixing this into the common handling
is exactly what I pointed out as not properly separating between hardware
specific and hardware agnostic functionality.

Optionally having regions is *not* a hardware specific concept, drivers
might use it for a hardware specific purpose though. Which potentially is
the case for almost every DRM helper.

Drivers can use regions only for the sake of not merging between buffer
boundaries as well. Some drivers might prefer this over "never merge" or
"always merge", depending on the cost of re-organizing page tables for
unnecessary splits/merges, without having the need of tracking separate
sparse page tables. It's just that I think *if* a driver needs to track
separate sparse page tables anyways it's a nice synergy since then there
is no extra cost for g
Re: [Intel-gfx] [PATCH 2/2] Apply quirk to disable PSR 2 on Tongfang PHxTxX1 and PHxTQx1
On Wed, 2023-02-22 at 15:13 -0500, Rodrigo Vivi wrote:
> On Wed, Feb 22, 2023 at 03:17:55PM +0100, Werner Sembach wrote:
> > On these Barebones PSR 2 is recognized as supported but is very
> > buggy:
> > - Upper third of screen does sometimes not update, resulting in
> >   disappearing cursors or ghosts of already closed windows staying
> >   behind.
> > - Approximately 40 px from the bottom edge a 3 pixel wide strip of
> >   randomly colored pixels is flickering.
> >
> > PSR 1 is working fine however.
>
> I wonder if this is really about the panel's PSR2 or about the
> userspace there not marking the dirtyfb? I know I know... it is not
> userspace fault...
>
> But I wonder if the case you got here highlights the fact that we
> have a substantial bug in the i915 itself in regards to PSR2 API.
>
> Jose, Jouni, ideas on how to check what could be happening here?

There is already a fix for this (Thanks to Werner Sembach for testing
the patch): https://patchwork.freedesktop.org/series/114217/

> oh, btw, Werner, do we have an open gitlab issue for this?

https://gitlab.freedesktop.org/drm/intel/-/issues/7347

> Thanks,
> Rodrigo.
>
> > Signed-off-by: Werner Sembach
> > Cc:
> > ---
> >  drivers/gpu/drm/i915/display/intel_quirks.c | 8
> >  1 file changed, 8 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/i915/display/intel_quirks.c
> > b/drivers/gpu/drm/i915/display/intel_quirks.c
> > index ce6d0fe6448f5..eeb32d3189f5c 100644
> > --- a/drivers/gpu/drm/i915/display/intel_quirks.c
> > +++ b/drivers/gpu/drm/i915/display/intel_quirks.c
> > @@ -65,6 +65,10 @@ static void quirk_no_pps_backlight_power_hook(struct drm_i915_private *i915)
> >  	drm_info(&i915->drm, "Applying no pps backlight power quirk\n");
> >  }
> >
> > +/*
> > + * Tongfang PHxTxX1 and PHxTQx1 devices have support for PSR 2 but it is broken
> > + * on Linux. PSR 1 however works just fine.
> > + */
> >  static void quirk_no_psr2(struct drm_i915_private *i915)
> >  {
> >  	intel_set_quirk(i915, QUIRK_NO_PSR2);
> > @@ -205,6 +209,10 @@ static struct intel_quirk intel_quirks[] = {
> >  	/* ECS Liva Q2 */
> >  	{ 0x3185, 0x1019, 0xa94d, quirk_increase_ddi_disabled_time },
> >  	{ 0x3184, 0x1019, 0xa94d, quirk_increase_ddi_disabled_time },
> > +
> > +	/* Tongfang PHxTxX1 and PHxTQx1/TUXEDO InfinityBook 14 Gen6 */
> > +	{ 0x9a49, 0x1d05, 0x1105, quirk_no_psr2 },
> > +	{ 0x9a49, 0x1d05, 0x114c, quirk_no_psr2 },
> >  };
> >
> >  void intel_init_quirks(struct drm_i915_private *i915)
> > --
> > 2.34.1
Re: [PATCH][next] i915/gvt: Fix spelling mistake "vender" -> "vendor"
On 2023.02.02 12:50:18 +, Colin Ian King wrote:
> There is a spelling mistake in a literal string. Fix it.
>
> Signed-off-by: Colin Ian King
> ---
>  drivers/gpu/drm/i915/gvt/firmware.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/i915/gvt/firmware.c
> b/drivers/gpu/drm/i915/gvt/firmware.c
> index dce93738e98a..4dd52ac2043e 100644
> --- a/drivers/gpu/drm/i915/gvt/firmware.c
> +++ b/drivers/gpu/drm/i915/gvt/firmware.c
> @@ -171,7 +171,7 @@ static int verify_firmware(struct intel_gvt *gvt,
>  	mem = (fw->data + h->cfg_space_offset);
>
>  	id = *(u16 *)(mem + PCI_VENDOR_ID);
> -	VERIFY("vender id", id, pdev->vendor);
> +	VERIFY("vendor id", id, pdev->vendor);
>
>  	id = *(u16 *)(mem + PCI_DEVICE_ID);
>  	VERIFY("device id", id, pdev->device);
> --

Thanks, Colin.

Acked-by: Zhenyu Wang
Re: [Intel-gfx] [PATCH] i915: fix memory leak with using debugfs_lookup()
On 2023.02.06 15:23:55 -0500, Rodrigo Vivi wrote:
> On Thu, Feb 02, 2023 at 03:13:09PM +0100, Greg Kroah-Hartman wrote:
> > When calling debugfs_lookup() the result must have dput() called on it,
> > otherwise the memory will leak over time. To make things simpler, just
> > call debugfs_lookup_and_remove() instead which handles all of the logic
> > at once.
> >
> > Cc: Zhenyu Wang
> > Cc: Zhi Wang
> > Cc: Jani Nikula
> > Cc: Joonas Lahtinen
> > Cc: Rodrigo Vivi
> > Cc: Tvrtko Ursulin
> > Cc: David Airlie
> > Cc: Daniel Vetter
> > Cc: intel-gvt-...@lists.freedesktop.org
> > Cc: intel-...@lists.freedesktop.org
> > Cc: dri-devel@lists.freedesktop.org
> > Signed-off-by: Greg Kroah-Hartman
>
> Reviewed-by: Rodrigo Vivi
>
> Zhenyu or Zhi, could you please propagate this through your gvt branch?

Sorry I missed this one after I migrated my mail stuff onto a new machine..

Looks good to me.

Reviewed-by: Zhenyu Wang

> >  drivers/gpu/drm/i915/gvt/kvmgt.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c
> > b/drivers/gpu/drm/i915/gvt/kvmgt.c
> > index 8ae7039b3683..de675d799c7d 100644
> > --- a/drivers/gpu/drm/i915/gvt/kvmgt.c
> > +++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
> > @@ -699,7 +699,7 @@ static void intel_vgpu_close_device(struct vfio_device *vfio_dev)
> >
> >  	clear_bit(INTEL_VGPU_STATUS_ATTACHED, vgpu->status);
> >
> > -	debugfs_remove(debugfs_lookup(KVMGT_DEBUGFS_FILENAME, vgpu->debugfs));
> > +	debugfs_lookup_and_remove(KVMGT_DEBUGFS_FILENAME, vgpu->debugfs);
> >
> >  	kvm_page_track_unregister_notifier(vgpu->vfio_device.kvm,
> >  					   &vgpu->track_node);
> > --
> > 2.39.1
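For context, the leak the one-liner fixes: debugfs_lookup() returns a
dentry with an extra reference that the caller must drop with dput().
An equivalent open-coded fix would look roughly like this (illustration
only; the helper above is the preferred form):

	struct dentry *dentry;

	dentry = debugfs_lookup(KVMGT_DEBUGFS_FILENAME, vgpu->debugfs);
	if (dentry) {
		debugfs_remove(dentry);
		dput(dentry);	/* drop the reference debugfs_lookup() took */
	}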
Re: [PATCH] Documentation: gpu: add acceleration node section
On Thu, Feb 23, 2023 at 02:52:52AM +, Dylan Le wrote:
>
> This patch was initially written for the Linux Kernel Bug Fixing Mentorship
> program. The patch adds a temporarily stubbed section on Acceleration Nodes
> to resolve a documentation warning.
>
> This resolves the warning:
> ./Documentation/gpu/drm-internals:179: ./include/drm/drm_file.h:411: WARNING:
> undefined label: drm_accel_node

Please write the patch description in imperative mood ("Do foo" instead
of "This patch does foo").

> I would appreciate any feedback on what should be documented here.

I think the above is better placed between the three dashes and diffstat ...

> ---

like here.

>  Documentation/gpu/drm-uapi.rst | 9 +
>  1 file changed, 9 insertions(+)
>
> +.. _drm_accel_node:
> +
> +Acceleration nodes
> +==
> +
> +.. note::
> +   There is not any documentation yet need to figure out what this is.

I'd like to write this stub as a generic .. admonition:: block [1] instead,
with the content which is "This section is empty, add appropriate
documentation here." or similar.

[1]: https://docutils.sourceforge.io/docs/ref/rst/directives.html#generic-admonition

Thanks.

--
An old man doll... just what I always wanted! - Clara
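For reference, the generic admonition Bagas suggests would look roughly
like this in the .rst file (the title and exact wording are illustrative,
not taken from the thread):

	.. admonition:: TODO

	   This section is empty, add appropriate documentation here.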
Re: [PATCH 1/2] Mips: ls2k1000: dts: add the display controller device node
Hi,

On 2023/2/23 02:32, Krzysztof Kozlowski wrote:
On 22/02/2023 17:55, suijingfeng wrote:

The display controller is a pci device, its pci vendor id is 0x0014,
its pci device id is 0x7a06.

Signed-off-by: suijingfeng
---
 .../boot/dts/loongson/loongson64-2k1000.dtsi | 21 +++
 1 file changed, 21 insertions(+)

diff --git a/arch/mips/boot/dts/loongson/loongson64-2k1000.dtsi
b/arch/mips/boot/dts/loongson/loongson64-2k1000.dtsi
index 8143a6e3..a528af3977d9 100644
--- a/arch/mips/boot/dts/loongson/loongson64-2k1000.dtsi
+++ b/arch/mips/boot/dts/loongson/loongson64-2k1000.dtsi
@@ -31,6 +31,18 @@ memory@20 {
		<0x0001 0x1000 0x0001 0xb000>; /* 6912 MB at 4352MB */
	};

+	reserved-memory {
+		#address-cells = <2>;
+		#size-cells = <2>;
+		ranges;
+
+		display_reserved: framebuffer@3000 {
+			compatible = "shared-dma-pool";
+			reg = <0x0 0x3000 0x0 0x0400>; /* 64M */
+			linux,cma-default;
+		};
+	};
+
	cpu_clk: cpu_clk {
		#clock-cells = <0>;
		compatible = "fixed-clock";
@@ -198,6 +210,15 @@ sata@8,0 {
		interrupt-parent = <&liointc0>;
	};

+	display-controller@6,0 {
+		compatible = "loongson,ls2k1000-dc";
+
+		reg = <0x3000 0x0 0x0 0x0 0x0>;
+		interrupts = <28 IRQ_TYPE_LEVEL_LOW>;
+		interrupt-parent = <&liointc0>;
+		memory-region = <&display_reserved>;

NAK.

Err :(, please give me a chance to explain

Test your code against the bindings you send.

I can guarantee to you that I test my code more than twice. The code
used for testing is listed at link [1].

This patchset is mainly used to illustrate how we made the driver in [1]
usable on our SoC platform.

It's the same patchset. You basically send something which the same
moment is incorrect.

The Loongson display controller IP has been integrated in both Loongson
North Bridge chipsets (ls7a1000 and ls7a2000) and Loongson SoCs (ls2k1000
and ls2k2000 etc); it has even been included in Loongson BMC (ls2k0500 bmc)
products.

When using this driver on Loongson embedded platforms (say ls2k2000,
ls2k1000 and ls2k0500), the PMON/Uboot firmware (my company uses pmon most
of the time) will pass a DT to the kernel. Different boards will pass
different DTs.

But when using this driver on Loongson server and PC platforms
(ls3c5000/ls3a5000+ls7a1000/ls7a2000), there will be no DT supplied. The
firmware and kernel side developers of Loongson chose ACPI+UEFI for such
platforms; more discussion can be found at [2].

Therefore, in such a situation we decided to send the patch as a separate
patchset. It is not like arm and risc-v, as the binding would not always
exist. If we put those patches into the same patchset, some reviewers
would suggest that we revise our code to a form where the code is *ALWAYS*
probed from the DT, which is not desired. Besides, the driver code + dt
support is pretty large; separating it makes it easier to review and
manage.

Finally, thanks for your kind guidance and valuable reviews.

[1] https://patchwork.freedesktop.org/patch/523409/?series=113566&rev=4
[2] https://lkml.org/lkml/2022/7/15/135

Best regards,
Krzysztof
Re: [PATCH v2 06/10] dt-bindings: gpu: mali-bifrost: Add a compatible for MediaTek MT8186
On Wed, Feb 22, 2023 at 5:13 PM AngeloGioacchino Del Regno wrote:
>
> On 22/02/23 09:37, Chen-Yu Tsai wrote:
> > On Tue, Feb 21, 2023 at 11:37 PM AngeloGioacchino Del Regno wrote:
> >>
> >> Get GPU support on MT8186 by adding its compatible.
> >>
> >> Signed-off-by: AngeloGioacchino Del Regno
> >>
> >> ---
> >>  Documentation/devicetree/bindings/gpu/arm,mali-bifrost.yaml | 5 +
> >>  1 file changed, 5 insertions(+)
> >>
> >> diff --git a/Documentation/devicetree/bindings/gpu/arm,mali-bifrost.yaml
> >> b/Documentation/devicetree/bindings/gpu/arm,mali-bifrost.yaml
> >> index be18b161959b..43a841d4e94d 100644
> >> --- a/Documentation/devicetree/bindings/gpu/arm,mali-bifrost.yaml
> >> +++ b/Documentation/devicetree/bindings/gpu/arm,mali-bifrost.yaml
> >> @@ -15,6 +15,11 @@ properties:
> >>
> >>    compatible:
> >>      oneOf:
> >> +      - items:
> >> +          - enum:
> >> +              - mediatek,mt8186-mali
> >> +          - const: mediatek,mt8183b-mali
> >> +          - const: arm,mali-bifrost # Mali Bifrost GPU model/revision is fully discoverable
> >
> > The MT8186 has Mali-G52 MC2 2EE, while the MT8183 has Mali-G72 MP3.
>
> Keeping in mind the obvious - which is that G52 and G72 are both Bifrost
>
> > So we actually need a new entry with two power domains.
>
> ...This is my node for MT8186:
>
>         gpu: gpu@1304 {
>                 compatible = "mediatek,mt8186-mali",
>                              "mediatek,mt8183b-mali",
>                              "arm,mali-bifrost";
>                 reg = <0 0x1304 0 0x4000>;
>
>                 clocks = <&mfgsys CLK_MFG_BG3D>;
>                 interrupts = , , ;
>                 interrupt-names = "job", "mmu", "gpu";
>                 power-domains = <&spm MT8186_POWER_DOMAIN_MFG1>,
>                                 <&spm MT8186_POWER_DOMAIN_MFG2>,
>                                 <&spm MT8186_POWER_DOMAIN_MFG3>;
>                 power-domain-names = "core0", "core1", "core2";
>
>                 /* Please ignore speedbin, that's for another time :-) */
>                 nvmem-cells = <&gpu_volt_bin>;
>                 nvmem-cell-names = "speed-bin";
>                 #cooling-cells = <2>;
>         };
>
> There are three MFG power domains... MFG2 and MFG3 are parents of MFG1, on
> that I agree, but we can avoid adding a new entry just for MT8186 and use
> the MT8183-b one while still being technically correct.
>
> Besides, Mali G52 and Mali G72 are both Bifrost... so I don't think that
> this commit is incorrect. For the sake of simplicity, I would push on
> getting this one picked.

I'm aware. In case it wasn't obvious, Mali-G52 MC2 2EE has 2 cores, while
Mali-G72 MP3 has 3 cores. I think that is reason enough to do a new entry.
Otherwise you are describing power domains for 3 cores for a GPU that only
has two.

> Unless there are any real-strong opinions against...

Yes.

ChenYu
Re: [PATCH v2 6/6] drm/msm/adreno: Enable optional icc voting from OPP tables
On Thu, 23 Feb 2023 at 03:47, Konrad Dybcio wrote:
>
> Add the dev_pm_opp_of_find_icc_paths() call to let the OPP framework
> handle bus voting as part of power level setting.
>
> Signed-off-by: Konrad Dybcio

Reviewed-by: Dmitry Baryshkov

--
With best wishes
Dmitry
Re: [PATCH v2 3/6] drm/msm/a2xx: Implement .gpu_busy
On Thu, 23 Feb 2023 at 03:47, Konrad Dybcio wrote:
>
> Implement gpu_busy based on the downstream msm-3.4 code [1]. This
> allows us to use devfreq on this old old old hardware!
>
> [1] https://github.com/LineageOS/android_kernel_sony_apq8064/blob/lineage-16.0/drivers/gpu/msm/adreno_a2xx.c#L1975
>
> Signed-off-by: Konrad Dybcio

Reviewed-by: Dmitry Baryshkov

Small nit below

> ---
>  drivers/gpu/drm/msm/adreno/a2xx_gpu.c | 24
>  1 file changed, 24 insertions(+)
>
> diff --git a/drivers/gpu/drm/msm/adreno/a2xx_gpu.c
> b/drivers/gpu/drm/msm/adreno/a2xx_gpu.c
> index c67089a7ebc1..6f9876b37db5 100644
> --- a/drivers/gpu/drm/msm/adreno/a2xx_gpu.c
> +++ b/drivers/gpu/drm/msm/adreno/a2xx_gpu.c
> @@ -481,6 +481,29 @@ a2xx_create_address_space(struct msm_gpu *gpu, struct platform_device *pdev)
>         return aspace;
>  }
>
> +/* While the precise size of this field is unknown, it holds at least these three values.. */
> +static u64 a2xx_gpu_busy(struct msm_gpu *gpu, unsigned long *out_sample_rate)
> +{
> +       u64 busy_cycles;
> +
> +       /* Freeze the counter */
> +       gpu_write(gpu, REG_A2XX_CP_PERFMON_CNTL, PERF_STATE_FREEZE);
> +
> +       busy_cycles = gpu_read64(gpu, REG_A2XX_RBBM_PERFCOUNTER1_LO);
> +
> +       /* Reset the counter */
> +       gpu_write(gpu, REG_A2XX_CP_PERFMON_CNTL, PERF_STATE_RESET);
> +
> +       /* Re-enable the performance monitors */
> +       gpu_rmw(gpu, REG_A2XX_RBBM_PM_OVERRIDE2, BIT(6), BIT(6));

It's DEBUG_PERF_SCLK_PM_OVERRIDE

See https://github.com/genesi/linux-legacy/blob/master/drivers/mxc/amd-gpu/include/reg/yamato/10/yamato_mask.h#L4428

> +       gpu_write(gpu, REG_A2XX_RBBM_PERFCOUNTER1_SELECT, 1);
> +       gpu_write(gpu, REG_A2XX_CP_PERFMON_CNTL, PERF_STATE_ENABLE);
> +
> +       *out_sample_rate = clk_get_rate(gpu->core_clk);
> +
> +       return busy_cycles;
> +}
> +
>  static u32 a2xx_get_rptr(struct msm_gpu *gpu, struct msm_ringbuffer *ring)
>  {
>         ring->memptrs->rptr = gpu_read(gpu, REG_AXXX_CP_RB_RPTR);
> @@ -502,6 +525,7 @@ static const struct adreno_gpu_funcs funcs = {
>  #if defined(CONFIG_DEBUG_FS) || defined(CONFIG_DEV_COREDUMP)
>         .show = adreno_show,
>  #endif
> +       .gpu_busy = a2xx_gpu_busy,
>         .gpu_state_get = a2xx_gpu_state_get,
>         .gpu_state_put = adreno_gpu_state_put,
>         .create_address_space = a2xx_create_address_space,
>
> --
> 2.39.2

--
With best wishes
Dmitry
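A minimal sketch of the rename Dmitry suggests, assuming the bit position
from the linked yamato_mask.h header; the macro name is illustrative and
not currently defined in the kernel tree:

	#define A2XX_RBBM_PM_OVERRIDE2_DEBUG_PERF_SCLK_PM_OVERRIDE	BIT(6)

	/* Re-enable the performance monitors */
	gpu_rmw(gpu, REG_A2XX_RBBM_PM_OVERRIDE2,
		A2XX_RBBM_PM_OVERRIDE2_DEBUG_PERF_SCLK_PM_OVERRIDE,
		A2XX_RBBM_PM_OVERRIDE2_DEBUG_PERF_SCLK_PM_OVERRIDE);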
Re: [PATCH v2 2/6] drm/msm/adreno: Use OPP for every GPU generation
On Thu, 23 Feb 2023 at 03:47, Konrad Dybcio wrote:
>
> Some older GPUs (namely a2xx with no opp tables at all and a320 with
> downstream-remnants gpu pwrlevels) used not to have OPP tables. They
> both however had just one frequency defined, making it extremely easy
> to construct such an OPP table from within the driver if need be.
>
> Do so and switch all clk_set_rate calls on core_clk to their OPP
> counterparts.
>
> Signed-off-by: Konrad Dybcio

Reviewed-by: Dmitry Baryshkov

Minor nit below.

> ---
>  drivers/gpu/drm/msm/adreno/adreno_gpu.c | 99 +++--
>  drivers/gpu/drm/msm/msm_gpu.c           |  4 +-
>  drivers/gpu/drm/msm/msm_gpu_devfreq.c   |  2 +-
>  3 files changed, 48 insertions(+), 57 deletions(-)
>
> diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c
> b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
> index ce6b76c45b6f..8721e3d6231a 100644
> --- a/drivers/gpu/drm/msm/adreno/adreno_gpu.c
> +++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
> @@ -922,73 +922,48 @@ void adreno_wait_ring(struct msm_ringbuffer *ring, uint32_t ndwords)
>                 ring->id);
>  }
>
> -/* Get legacy powerlevels from qcom,gpu-pwrlevels and populate the opp table */
> -static int adreno_get_legacy_pwrlevels(struct device *dev)
> -{
> -       struct device_node *child, *node;
> -       int ret;
> -
> -       node = of_get_compatible_child(dev->of_node, "qcom,gpu-pwrlevels");
> -       if (!node) {
> -               DRM_DEV_DEBUG(dev, "Could not find the GPU powerlevels\n");
> -               return -ENXIO;
> -       }
> -
> -       for_each_child_of_node(node, child) {
> -               unsigned int val;
> -
> -               ret = of_property_read_u32(child, "qcom,gpu-freq", &val);
> -               if (ret)
> -                       continue;
> -
> -               /*
> -                * Skip the intentionally bogus clock value found at the bottom
> -                * of most legacy frequency tables
> -                */
> -               if (val != 2700)
> -                       dev_pm_opp_add(dev, val, 0);
> -       }
> -
> -       of_node_put(node);
> -
> -       return 0;
> -}
> -
> -static void adreno_get_pwrlevels(struct device *dev,
> +static int adreno_get_pwrlevels(struct device *dev,
>                 struct msm_gpu *gpu)
>  {
> +       struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
>         unsigned long freq = ULONG_MAX;
>         struct dev_pm_opp *opp;
>         int ret;
>
>         gpu->fast_rate = 0;
>
> -       /* You down with OPP? */
> -       if (!of_find_property(dev->of_node, "operating-points-v2", NULL))
> -               ret = adreno_get_legacy_pwrlevels(dev);
> -       else {
> -               ret = devm_pm_opp_of_add_table(dev);
> -               if (ret)
> -                       DRM_DEV_ERROR(dev, "Unable to set the OPP table\n");
> -       }
> -
> -       if (!ret) {
> -               /* Find the fastest defined rate */
> -               opp = dev_pm_opp_find_freq_floor(dev, &freq);
> -               if (!IS_ERR(opp)) {
> -                       gpu->fast_rate = freq;
> -                       dev_pm_opp_put(opp);
> +       /* devm_pm_opp_of_add_table may error out but will still create an OPP table */
> +       ret = devm_pm_opp_of_add_table(dev);
> +       if (ret == -ENODEV) {
> +               /* Special cases for ancient hw with ancient DT bindings */
> +               if (adreno_is_a2xx(adreno_gpu)) {
> +                       dev_warn(dev, "Unable to find the OPP table. Falling back to 200 MHz.\n");
> +                       dev_pm_opp_add(dev, 2, 0);
> +               } else if (adreno_is_a320(adreno_gpu)) {
> +                       dev_warn(dev, "Unable to find the OPP table. Falling back to 450 MHz.\n");
> +                       dev_pm_opp_add(dev, 45000, 0);
> +               } else {
> +                       DRM_DEV_ERROR(dev, "Unable to find the OPP table\n");
> +                       return -ENODEV;
>                 }
> +       } else if (ret) {
> +               DRM_DEV_ERROR(dev, "Unable to set the OPP table\n");
> +               return ret;
>         }
>
> -       if (!gpu->fast_rate) {
> -               dev_warn(dev,
> -                       "Could not find a clock rate. Using a reasonable default\n");
> -               /* Pick a suitably safe clock speed for any target */
> -               gpu->fast_rate = 2;
> +       /* Find the fastest defined rate */
> +       opp = dev_pm_opp_find_freq_floor(dev, &freq);
> +
> +       if (IS_ERR(opp))
> +               return PTR_ERR(opp);
> +       else {
> +               gpu->fast_rate = freq;
> +               dev_pm_opp_put(opp);
>         }

Nit: you can drop else {} here.

>
>         DBG("fast_rate=%u, slow_rate=2700", gpu->fast_rate);
> +
> +       return 0;
>  }
>
>  int adreno_gpu_ocmem_init(struct device *dev, struct adreno_gpu *adreno_gpu,
> @@ -1046,6 +1021,20 @@ int adreno_gpu_init(struct drm_device *drm, struct
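For clarity, a minimal sketch of the suggested cleanup, based only on the
quoted hunk: the early return on error makes the else branch redundant.

	/* Find the fastest defined rate */
	opp = dev_pm_opp_find_freq_floor(dev, &freq);
	if (IS_ERR(opp))
		return PTR_ERR(opp);

	gpu->fast_rate = freq;
	dev_pm_opp_put(opp);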
Re: [Intel-gfx] How is the progress for removing flush_scheduled_work() callers?
On 2022/11/16 22:06, Ville Syrjälä wrote:
> On Wed, Nov 16, 2022 at 12:08:27PM +0200, Jani Nikula wrote:
>> On Sun, 06 Nov 2022, Tetsuo Handa wrote:
>>> Like commit c4f135d643823a86 ("workqueue: Wrap flush_workqueue() using a
>>> macro") says, flush_scheduled_work() is dangerous and will be forbidden.
>>> We are on the way for removing all flush_scheduled_work() callers from
>>> the kernel, and there are only 4 callers remaining as of linux-20221104.
>>>
>>> drivers/gpu/drm/i915/display/intel_display.c:8997:
>>> flush_scheduled_work();
>>
>> Thanks for the reminder, I've pinged folks to get someone working on
>> this. We do schedule quite a bunch of work, so it's not immediately
>> obvious (at least to me) what exactly needs flushing.
>
> Here's my earlier cursory analysis of the subject:
> https://lore.kernel.org/intel-gfx/yy3byxfrfafql...@intel.com/

Now that a patch for mptscsih.c was proposed as
https://lkml.kernel.org/r/0b9ebcfb-b647-1381-0653-b54528a64...@i-love.sakura.ne.jp ,
intel_display.c is going to become the last flush_scheduled_work() user.

If fixing the hpd disable path takes more time, should we start with
moving related works from system_wq to a local workqueue dedicated to
intel_display.c ?

>
>> https://gitlab.freedesktop.org/drm/intel/-/issues/7546
>>
>>> drivers/gpu/drm/i915/gt/selftest_execlists.c:88:
>>> flush_scheduled_work();
>>
>> Removed by commit 7d33fd02dd94 ("drm/i915/selftests: Remove
>> flush_scheduled_work() from live_execlists") in drm-next.
>>
>> BR,
>> Jani.
>>
>> --
>> Jani Nikula, Intel Open Source Graphics Center
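A rough sketch of that interim step, assuming a driver-private workqueue
stored in a hypothetical i915->display_wq field (the field name and queue
name are illustrative, not from the thread):

	/* at driver init: a plain private workqueue */
	i915->display_wq = alloc_workqueue("i915-display", 0, 0);

	/* callers then switch from schedule_work(&w) to: */
	queue_work(i915->display_wq, &w);

	/* and the path that used flush_scheduled_work() can safely do: */
	flush_workqueue(i915->display_wq);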
[PATCH v2 6/6] drm/msm/adreno: Enable optional icc voting from OPP tables
Add the dev_pm_opp_of_find_icc_paths() call to let the OPP framework
handle bus voting as part of power level setting.

Signed-off-by: Konrad Dybcio
---
 drivers/gpu/drm/msm/adreno/adreno_device.c | 4
 1 file changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/msm/adreno/adreno_device.c
b/drivers/gpu/drm/msm/adreno/adreno_device.c
index 36f062c7582f..5142a4c72cfc 100644
--- a/drivers/gpu/drm/msm/adreno/adreno_device.c
+++ b/drivers/gpu/drm/msm/adreno/adreno_device.c
@@ -548,6 +548,10 @@ static int adreno_bind(struct device *dev, struct device *master, void *data)
 		return PTR_ERR(gpu);
 	}

+	ret = dev_pm_opp_of_find_icc_paths(dev, NULL);
+	if (ret)
+		return ret;
+
 	return 0;
 }

--
2.39.2
[PATCH v2 3/6] drm/msm/a2xx: Implement .gpu_busy
Implement gpu_busy based on the downstream msm-3.4 code [1]. This
allows us to use devfreq on this old old old hardware!

[1] https://github.com/LineageOS/android_kernel_sony_apq8064/blob/lineage-16.0/drivers/gpu/msm/adreno_a2xx.c#L1975

Signed-off-by: Konrad Dybcio
---
 drivers/gpu/drm/msm/adreno/a2xx_gpu.c | 24
 1 file changed, 24 insertions(+)

diff --git a/drivers/gpu/drm/msm/adreno/a2xx_gpu.c
b/drivers/gpu/drm/msm/adreno/a2xx_gpu.c
index c67089a7ebc1..6f9876b37db5 100644
--- a/drivers/gpu/drm/msm/adreno/a2xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a2xx_gpu.c
@@ -481,6 +481,29 @@ a2xx_create_address_space(struct msm_gpu *gpu, struct platform_device *pdev)
 	return aspace;
 }

+/* While the precise size of this field is unknown, it holds at least these three values.. */
+static u64 a2xx_gpu_busy(struct msm_gpu *gpu, unsigned long *out_sample_rate)
+{
+	u64 busy_cycles;
+
+	/* Freeze the counter */
+	gpu_write(gpu, REG_A2XX_CP_PERFMON_CNTL, PERF_STATE_FREEZE);
+
+	busy_cycles = gpu_read64(gpu, REG_A2XX_RBBM_PERFCOUNTER1_LO);
+
+	/* Reset the counter */
+	gpu_write(gpu, REG_A2XX_CP_PERFMON_CNTL, PERF_STATE_RESET);
+
+	/* Re-enable the performance monitors */
+	gpu_rmw(gpu, REG_A2XX_RBBM_PM_OVERRIDE2, BIT(6), BIT(6));
+	gpu_write(gpu, REG_A2XX_RBBM_PERFCOUNTER1_SELECT, 1);
+	gpu_write(gpu, REG_A2XX_CP_PERFMON_CNTL, PERF_STATE_ENABLE);
+
+	*out_sample_rate = clk_get_rate(gpu->core_clk);
+
+	return busy_cycles;
+}
+
 static u32 a2xx_get_rptr(struct msm_gpu *gpu, struct msm_ringbuffer *ring)
 {
 	ring->memptrs->rptr = gpu_read(gpu, REG_AXXX_CP_RB_RPTR);
@@ -502,6 +525,7 @@ static const struct adreno_gpu_funcs funcs = {
 #if defined(CONFIG_DEBUG_FS) || defined(CONFIG_DEV_COREDUMP)
 	.show = adreno_show,
 #endif
+	.gpu_busy = a2xx_gpu_busy,
 	.gpu_state_get = a2xx_gpu_state_get,
 	.gpu_state_put = adreno_gpu_state_put,
 	.create_address_space = a2xx_create_address_space,

--
2.39.2
[PATCH v2 5/6] drm/msm/a4xx: Implement .gpu_busy
Add support for gpu_busy on a4xx, which is required for devfreq support.

Reviewed-by: Dmitry Baryshkov
Signed-off-by: Konrad Dybcio
---
 drivers/gpu/drm/msm/adreno/a4xx_gpu.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/drivers/gpu/drm/msm/adreno/a4xx_gpu.c
b/drivers/gpu/drm/msm/adreno/a4xx_gpu.c
index 3e09d3a7a0ac..715436cb3996 100644
--- a/drivers/gpu/drm/msm/adreno/a4xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a4xx_gpu.c
@@ -611,6 +611,16 @@ static int a4xx_get_timestamp(struct msm_gpu *gpu, uint64_t *value)
 	return 0;
 }

+static u64 a4xx_gpu_busy(struct msm_gpu *gpu, unsigned long *out_sample_rate)
+{
+	u64 busy_cycles;
+
+	busy_cycles = gpu_read64(gpu, REG_A4XX_RBBM_PERFCTR_RBBM_1_LO);
+	*out_sample_rate = clk_get_rate(gpu->core_clk);
+
+	return busy_cycles;
+}
+
 static u32 a4xx_get_rptr(struct msm_gpu *gpu, struct msm_ringbuffer *ring)
 {
 	ring->memptrs->rptr = gpu_read(gpu, REG_A4XX_CP_RB_RPTR);
@@ -632,6 +642,7 @@ static const struct adreno_gpu_funcs funcs = {
 #if defined(CONFIG_DEBUG_FS) || defined(CONFIG_DEV_COREDUMP)
 	.show = adreno_show,
 #endif
+	.gpu_busy = a4xx_gpu_busy,
 	.gpu_state_get = a4xx_gpu_state_get,
 	.gpu_state_put = adreno_gpu_state_put,
 	.create_address_space = adreno_create_address_space,

--
2.39.2
[PATCH v2 2/6] drm/msm/adreno: Use OPP for every GPU generation
Some older GPUs (namely a2xx with no opp tables at all and a320 with
downstream-remnants gpu pwrlevels) used not to have OPP tables. They
both however had just one frequency defined, making it extremely easy
to construct such an OPP table from within the driver if need be.

Do so and switch all clk_set_rate calls on core_clk to their OPP
counterparts.

Signed-off-by: Konrad Dybcio
---
 drivers/gpu/drm/msm/adreno/adreno_gpu.c | 99 +++--
 drivers/gpu/drm/msm/msm_gpu.c           |  4 +-
 drivers/gpu/drm/msm/msm_gpu_devfreq.c   |  2 +-
 3 files changed, 48 insertions(+), 57 deletions(-)

diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c
b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
index ce6b76c45b6f..8721e3d6231a 100644
--- a/drivers/gpu/drm/msm/adreno/adreno_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
@@ -922,73 +922,48 @@ void adreno_wait_ring(struct msm_ringbuffer *ring, uint32_t ndwords)
 		ring->id);
 }

-/* Get legacy powerlevels from qcom,gpu-pwrlevels and populate the opp table */
-static int adreno_get_legacy_pwrlevels(struct device *dev)
-{
-	struct device_node *child, *node;
-	int ret;
-
-	node = of_get_compatible_child(dev->of_node, "qcom,gpu-pwrlevels");
-	if (!node) {
-		DRM_DEV_DEBUG(dev, "Could not find the GPU powerlevels\n");
-		return -ENXIO;
-	}
-
-	for_each_child_of_node(node, child) {
-		unsigned int val;
-
-		ret = of_property_read_u32(child, "qcom,gpu-freq", &val);
-		if (ret)
-			continue;
-
-		/*
-		 * Skip the intentionally bogus clock value found at the bottom
-		 * of most legacy frequency tables
-		 */
-		if (val != 2700)
-			dev_pm_opp_add(dev, val, 0);
-	}
-
-	of_node_put(node);
-
-	return 0;
-}
-
-static void adreno_get_pwrlevels(struct device *dev,
+static int adreno_get_pwrlevels(struct device *dev,
 		struct msm_gpu *gpu)
 {
+	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
 	unsigned long freq = ULONG_MAX;
 	struct dev_pm_opp *opp;
 	int ret;

 	gpu->fast_rate = 0;

-	/* You down with OPP? */
-	if (!of_find_property(dev->of_node, "operating-points-v2", NULL))
-		ret = adreno_get_legacy_pwrlevels(dev);
-	else {
-		ret = devm_pm_opp_of_add_table(dev);
-		if (ret)
-			DRM_DEV_ERROR(dev, "Unable to set the OPP table\n");
-	}
-
-	if (!ret) {
-		/* Find the fastest defined rate */
-		opp = dev_pm_opp_find_freq_floor(dev, &freq);
-		if (!IS_ERR(opp)) {
-			gpu->fast_rate = freq;
-			dev_pm_opp_put(opp);
+	/* devm_pm_opp_of_add_table may error out but will still create an OPP table */
+	ret = devm_pm_opp_of_add_table(dev);
+	if (ret == -ENODEV) {
+		/* Special cases for ancient hw with ancient DT bindings */
+		if (adreno_is_a2xx(adreno_gpu)) {
+			dev_warn(dev, "Unable to find the OPP table. Falling back to 200 MHz.\n");
+			dev_pm_opp_add(dev, 2, 0);
+		} else if (adreno_is_a320(adreno_gpu)) {
+			dev_warn(dev, "Unable to find the OPP table. Falling back to 450 MHz.\n");
+			dev_pm_opp_add(dev, 45000, 0);
+		} else {
+			DRM_DEV_ERROR(dev, "Unable to find the OPP table\n");
+			return -ENODEV;
 		}
+	} else if (ret) {
+		DRM_DEV_ERROR(dev, "Unable to set the OPP table\n");
+		return ret;
 	}

-	if (!gpu->fast_rate) {
-		dev_warn(dev,
-			"Could not find a clock rate. Using a reasonable default\n");
-		/* Pick a suitably safe clock speed for any target */
-		gpu->fast_rate = 2;
+	/* Find the fastest defined rate */
+	opp = dev_pm_opp_find_freq_floor(dev, &freq);
+
+	if (IS_ERR(opp))
+		return PTR_ERR(opp);
+	else {
+		gpu->fast_rate = freq;
+		dev_pm_opp_put(opp);
 	}

 	DBG("fast_rate=%u, slow_rate=2700", gpu->fast_rate);
+
+	return 0;
 }

 int adreno_gpu_ocmem_init(struct device *dev, struct adreno_gpu *adreno_gpu,
@@ -1046,6 +1021,20 @@ int adreno_gpu_init(struct drm_device *drm, struct platform_device *pdev,
 	struct adreno_rev *rev = &config->rev;
 	const char *gpu_name;
 	u32 speedbin;
+	int ret;
+
+	/*
+	 * This can only be done before devm_pm_opp_of_add_table(), or
+	 * dev_pm_opp_set_config() will WARN_ON()
+	 */
+	if (IS_ERR(devm_clk_get(dev, "core"))) {
+		/*
+		 * If "core" is absent
[PATCH v2 4/6] drm/msm/a3xx: Implement .gpu_busy
Add support for gpu_busy on a3xx, which is required for devfreq support.

Tested-by: Dmitry Baryshkov #ifc6410
Reviewed-by: Dmitry Baryshkov
Signed-off-by: Konrad Dybcio
---
 drivers/gpu/drm/msm/adreno/a3xx_gpu.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/drivers/gpu/drm/msm/adreno/a3xx_gpu.c
b/drivers/gpu/drm/msm/adreno/a3xx_gpu.c
index 948785ed07bb..c86b377f6f0d 100644
--- a/drivers/gpu/drm/msm/adreno/a3xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a3xx_gpu.c
@@ -477,6 +477,16 @@ static struct msm_gpu_state *a3xx_gpu_state_get(struct msm_gpu *gpu)
 	return state;
 }

+static u64 a3xx_gpu_busy(struct msm_gpu *gpu, unsigned long *out_sample_rate)
+{
+	u64 busy_cycles;
+
+	busy_cycles = gpu_read64(gpu, REG_A3XX_RBBM_PERFCTR_RBBM_1_LO);
+	*out_sample_rate = clk_get_rate(gpu->core_clk);
+
+	return busy_cycles;
+}
+
 static u32 a3xx_get_rptr(struct msm_gpu *gpu, struct msm_ringbuffer *ring)
 {
 	ring->memptrs->rptr = gpu_read(gpu, REG_AXXX_CP_RB_RPTR);
@@ -498,6 +508,7 @@ static const struct adreno_gpu_funcs funcs = {
 #if defined(CONFIG_DEBUG_FS) || defined(CONFIG_DEV_COREDUMP)
 	.show = adreno_show,
 #endif
+	.gpu_busy = a3xx_gpu_busy,
 	.gpu_state_get = a3xx_gpu_state_get,
 	.gpu_state_put = adreno_gpu_state_put,
 	.create_address_space = adreno_create_address_space,

--
2.39.2
[PATCH v2 0/6] OPP and devfreq for all Adrenos
v1 -> v2:
- Move a2xx #defines to XML
- Use dev_pm_opp_find_freq_floor in the common path in [2/6]
- Clarify a comment in [2/6]
- Move voting from a5xx to Adreno-wide [6/6]
- Pick up tags

v1: https://lore.kernel.org/linux-arm-msm/20230222-konrad-longbois-next-v1-0-010214257...@linaro.org

This series is a combination of [1] and a subset of [2] and some new
stuff. With it, devfreq is used on all a2xx-a6xx (including gmu and
gmu-wrapper) and all clk_set_rate(core clock) calls are dropped in
favour of dev_pm_opp_set_rate, which - drumroll - lets us scale the
voltage domain. DT patches making use of that will be sent separately.

On top of that, a5xx gets a call to enable icc scaling from the OPP
tables. No SoCs implementing a2xx have icc support yet and a3/4xx SoCs
have separate logic for that, which will be updated at a later time.

Getting this in for 6.4 early would be appreciated, as that would allow
for getting GMU wrapper GPUs up (without VDD&icc scaling they can only
run at lower freqs, which is.. ehhh..)

Changes:
- a3xx busy: use the _1 counter as per msm-3.x instead of _0
- a6xx-series-opp: basically rewrite, ensure compat with all gens
- a2/4xx busy: new patch
- a5xx icc: new patch

[1] https://lore.kernel.org/linux-arm-msm/20230130093809.2079314-1-konrad.dyb...@linaro.org/
[2] https://lore.kernel.org/linux-arm-msm/20230214173145.2482651-1-konrad.dyb...@linaro.org/

Signed-off-by: Konrad Dybcio
---
Konrad Dybcio (6):
      drm/msm/a2xx: Include perf counter reg values in XML
      drm/msm/adreno: Use OPP for every GPU generation
      drm/msm/a2xx: Implement .gpu_busy
      drm/msm/a3xx: Implement .gpu_busy
      drm/msm/a4xx: Implement .gpu_busy
      drm/msm/adreno: Enable optional icc voting from OPP tables

 drivers/gpu/drm/msm/adreno/a2xx.xml.h      |  6 ++
 drivers/gpu/drm/msm/adreno/a2xx_gpu.c      | 24
 drivers/gpu/drm/msm/adreno/a3xx_gpu.c      | 11 +++
 drivers/gpu/drm/msm/adreno/a4xx_gpu.c      | 11 +++
 drivers/gpu/drm/msm/adreno/adreno_device.c |  4 ++
 drivers/gpu/drm/msm/adreno/adreno_gpu.c    | 99 ++
 drivers/gpu/drm/msm/msm_gpu.c              |  4 +-
 drivers/gpu/drm/msm/msm_gpu_devfreq.c      |  2 +-
 8 files changed, 104 insertions(+), 57 deletions(-)
---
base-commit: aaf70d5ad5e2b06a8050c51e278b0c3a14fabef5
change-id: 20230223-topic-opp-01e7112b867d

Best regards,
--
Konrad Dybcio
[PATCH v2 1/6] drm/msm/a2xx: Include perf counter reg values in XML
This is a partial merge of [1], subject to be dropped if a header update
is executed.

[1] https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21480/

Signed-off-by: Konrad Dybcio
---
 drivers/gpu/drm/msm/adreno/a2xx.xml.h | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/drivers/gpu/drm/msm/adreno/a2xx.xml.h
b/drivers/gpu/drm/msm/adreno/a2xx.xml.h
index afa6023346c4..b85fdc082bc1 100644
--- a/drivers/gpu/drm/msm/adreno/a2xx.xml.h
+++ b/drivers/gpu/drm/msm/adreno/a2xx.xml.h
@@ -1060,6 +1060,12 @@ enum a2xx_mh_perfcnt_select {
 	AXI_TOTAL_READ_REQUEST_DATA_BEATS = 181,
 };

+enum perf_mode_cnt {
+	PERF_STATE_RESET = 0,
+	PERF_STATE_ENABLE = 1,
+	PERF_STATE_FREEZE = 2,
+};
+
 enum adreno_mmu_clnt_beh {
 	BEH_NEVR = 0,
 	BEH_TRAN_RNG = 1,

--
2.39.2
Re: [Intel-gfx] [PATCH] drm/i915/gsc: Fix the Driver-FLR completion
On 2/22/2023 1:01 PM, Alan Previn wrote:

The Driver-FLR flow may inadvertently exit early before the full
completion of the re-init of the internal HW state if we only poll
GU_DEBUG Bit31 (polling for it to toggle from 0 -> 1). Instead we need
a two-step wait-for-completion flow that also involves GU_CNTL. See the
patch and new code comments for detail. This is new direction from HW
architecture folks.

v2:
- Add error message for the teardown timeout (Anshuman)
- Don't duplicate code in comments (Jani)

LGTM, Tested-by: Vinay Belgaumkar

Signed-off-by: Alan Previn
Fixes: 5a44fcd73498 ("drm/i915/gsc: Do a driver-FLR on unload if GSC was loaded")
---
 drivers/gpu/drm/i915/intel_uncore.c | 13 -
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/intel_uncore.c
b/drivers/gpu/drm/i915/intel_uncore.c
index f018da7ebaac..f3c46352db89 100644
--- a/drivers/gpu/drm/i915/intel_uncore.c
+++ b/drivers/gpu/drm/i915/intel_uncore.c
@@ -2749,14 +2749,25 @@ static void driver_initiated_flr(struct intel_uncore *uncore)
 	/* Trigger the actual Driver-FLR */
 	intel_uncore_rmw_fw(uncore, GU_CNTL, 0, DRIVERFLR);

+	/* Wait for hardware teardown to complete */
+	ret = intel_wait_for_register_fw(uncore, GU_CNTL,
+					 DRIVERFLR_STATUS, 0,
+					 flr_timeout_ms);
+	if (ret) {
+		drm_err(&i915->drm, "Driver-FLR-teardown wait completion failed! %d\n", ret);
+		return;
+	}
+
+	/* Wait for hardware/firmware re-init to complete */
 	ret = intel_wait_for_register_fw(uncore, GU_DEBUG,
 					 DRIVERFLR_STATUS, DRIVERFLR_STATUS,
 					 flr_timeout_ms);
 	if (ret) {
-		drm_err(&i915->drm, "wait for Driver-FLR completion failed! %d\n", ret);
+		drm_err(&i915->drm, "Driver-FLR-reinit wait completion failed! %d\n", ret);
 		return;
 	}

+	/* Clear sticky completion status */
 	intel_uncore_write_fw(uncore, GU_DEBUG, DRIVERFLR_STATUS);
 }
[PATCH] drm/rockchip: use struct_size() in vop2_bind
Use the overflow-protected struct_size() helper macro to compute the
allocation size of the vop2 data structure.

Signed-off-by: Jacob Keller
Cc: Sandy Huang
Cc: Heiko Stübner
Cc: David Airlie
---
I found this while developing a coccinelle script to detect potential
places where struct_size() would be appropriate.

 drivers/gpu/drm/rockchip/rockchip_drm_vop2.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/rockchip/rockchip_drm_vop2.c
b/drivers/gpu/drm/rockchip/rockchip_drm_vop2.c
index 0e0012368976..3e5861653b69 100644
--- a/drivers/gpu/drm/rockchip/rockchip_drm_vop2.c
+++ b/drivers/gpu/drm/rockchip/rockchip_drm_vop2.c
@@ -2655,7 +2655,7 @@ static int vop2_bind(struct device *dev, struct device *master, void *data)
 		return -ENODEV;

 	/* Allocate vop2 struct and its vop2_win array */
-	alloc_size = sizeof(*vop2) + sizeof(*vop2->win) * vop2_data->win_size;
+	alloc_size = struct_size(vop2, win, vop2_data->win_size);
 	vop2 = devm_kzalloc(dev, alloc_size, GFP_KERNEL);
 	if (!vop2)
 		return -ENOMEM;

base-commit: 4d5a2cce47a8e1e7119a881a4714f0eee557c1cd
--
2.39.1.405.gd4c25cc71f83
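For reviewers unfamiliar with the helper, a simplified model of its
semantics (the real macro lives in include/linux/overflow.h and goes
through overflow-checked multiply/add helpers):

	/*
	 * struct_size(vop2, win, n) evaluates to
	 * sizeof(*vop2) + n * sizeof(vop2->win[0]), saturating to SIZE_MAX
	 * on overflow, so devm_kzalloc() fails cleanly instead of the
	 * open-coded expression wrapping around and silently allocating an
	 * undersized buffer.
	 */
	alloc_size = struct_size(vop2, win, vop2_data->win_size);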
Re: [PATCH 1/5] drm/msm/adreno: Use OPP for every GPU generation
On 23.02.2023 00:16, Dmitry Baryshkov wrote:
> On 23/02/2023 00:40, Konrad Dybcio wrote:
>>
>> On 22.02.2023 23:38, Dmitry Baryshkov wrote:
>>> On 22/02/2023 23:47, Konrad Dybcio wrote:
>>>> Some older GPUs (namely a2xx with no opp tables at all and a320 with
>>>> downstream-remnants gpu pwrlevels) used not to have OPP tables. They
>>>> both however had just one frequency defined, making it extremely easy
>>>> to construct such an OPP table from within the driver if need be.
>>>>
>>>> Do so and switch all clk_set_rate calls on core_clk to their OPP
>>>> counterparts.
>>>>
>>>> Signed-off-by: Konrad Dybcio
>>>> ---
>>>>   drivers/gpu/drm/msm/adreno/adreno_gpu.c | 94 +++--
>>>>   drivers/gpu/drm/msm/msm_gpu.c           |  4 +-
>>>>   drivers/gpu/drm/msm/msm_gpu_devfreq.c   |  2 +-
>>>>   3 files changed, 45 insertions(+), 55 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c
>>>> b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
>>>> index ce6b76c45b6f..9b940c0f063f 100644
>>>> --- a/drivers/gpu/drm/msm/adreno/adreno_gpu.c
>>>> +++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
>>>> @@ -922,73 +922,50 @@ void adreno_wait_ring(struct msm_ringbuffer *ring, uint32_t ndwords)
>>>>              ring->id);
>>>>   }
>>>>
>>>> -/* Get legacy powerlevels from qcom,gpu-pwrlevels and populate the opp table */
>>>> -static int adreno_get_legacy_pwrlevels(struct device *dev)
>>>> -{
>>>> -      struct device_node *child, *node;
>>>> -      int ret;
>>>> -
>>>> -      node = of_get_compatible_child(dev->of_node, "qcom,gpu-pwrlevels");
>>>> -      if (!node) {
>>>> -              DRM_DEV_DEBUG(dev, "Could not find the GPU powerlevels\n");
>>>> -              return -ENXIO;
>>>> -      }
>>>> -
>>>> -      for_each_child_of_node(node, child) {
>>>> -              unsigned int val;
>>>> -
>>>> -              ret = of_property_read_u32(child, "qcom,gpu-freq", &val);
>>>> -              if (ret)
>>>> -                      continue;
>>>> -
>>>> -              /*
>>>> -               * Skip the intentionally bogus clock value found at the bottom
>>>> -               * of most legacy frequency tables
>>>> -               */
>>>> -              if (val != 2700)
>>>> -                      dev_pm_opp_add(dev, val, 0);
>>>> -      }
>>>> -
>>>> -      of_node_put(node);
>>>> -
>>>> -      return 0;
>>>> -}
>>>> -
>>>> -static void adreno_get_pwrlevels(struct device *dev,
>>>> +static int adreno_get_pwrlevels(struct device *dev,
>>>>              struct msm_gpu *gpu)
>>>>   {
>>>> +      struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
>>>>       unsigned long freq = ULONG_MAX;
>>>>       struct dev_pm_opp *opp;
>>>>       int ret;
>>>>
>>>>       gpu->fast_rate = 0;
>>>>
>>>> -      /* You down with OPP? */
>>>> -      if (!of_find_property(dev->of_node, "operating-points-v2", NULL))
>>>> -              ret = adreno_get_legacy_pwrlevels(dev);
>>>> -      else {
>>>> -              ret = devm_pm_opp_of_add_table(dev);
>>>> -              if (ret)
>>>> -                      DRM_DEV_ERROR(dev, "Unable to set the OPP table\n");
>>>> -      }
>>>> -
>>>> -      if (!ret) {
>>>> +      /* devm_pm_opp_of_add_table may error out but will still create an OPP table */
>>>> +      ret = devm_pm_opp_of_add_table(dev);
>>>> +      if (ret == -ENODEV) {
>>>> +              /* Special cases for ancient hw with ancient DT bindings */
>>>> +              if (adreno_is_a2xx(adreno_gpu)) {
>>>> +                      dev_warn(dev, "Unable to find the OPP table. Falling back to 200 MHz.\n");
>>>> +                      dev_pm_opp_add(dev, 2, 0);
>>>> +                      gpu->fast_rate = 2;
>>>
>>> We can skip setting the fast_rate, dev_pm_opp_find_freq_floor below will
>>> get it from our freshly generated opp table.
>>
>> It's not reached in this code path.
>
> I see. I got lost in all the ifs. What do you think about turning it into
> the main code path, since after this code block we always have a valid OPP
> table?

Sounds good!

Konrad

>
>>
>>>> +              } else if (adreno_is_a320(adreno_gpu)) {
>>>> +                      dev_warn(dev, "Unable to find the OPP table. Falling back to 450 MHz.\n");
>>>> +                      dev_pm_opp_add(dev, 45000, 0);
>>>> +                      gpu->fast_rate = 45000;
>>>> +              } else {
>>>> +                      DRM_DEV_ERROR(dev, "Unable to find the OPP table\n");
>>>> +                      return -ENODEV;
>>>> +              }
>>>> +      } else if (ret) {
>>>> +              DRM_DEV_ERROR(dev, "Unable to set the OPP table\n");
>>>> +              return ret;
>>>> +      } else {
>>>>              /* Find the fastest defined rate */
>>>>              opp = dev_pm_opp_find_freq_floor(dev, &freq);
>>>> -              if (!IS_ERR(opp)) {
>>>> +
>>>> +              if (IS_ERR(opp))
>>>> +                      return PTR_ERR(opp);
>>>> +              else {
>>>>                      gpu->fast_rate = freq;
>>>>                      dev_pm_opp_put(opp);
>>>>              }
>>>>       }
>>>>
>>>> -      if (!gpu->fast_rate) {
>>>> -              dev_warn(dev,
>>>> -                      "Could not find a clock rate. Using a reasonable default\n");
>>>> -              /* Pick a suitably safe clock speed for any
[PATCH v2 2/2] drm: rcar-du: Write correct values in DORCR reserved fields
The DORCR register controls the routing of clocks and data between DU
channels within a group. For groups that contain a single channel,
there's no routing option to control, and some fields of the register
are then reserved. On Gen2 those reserved fields are documented as
required to be set to 0, while on Gen3 and newer the PG1T, DK1S and
PG1D reserved fields must be set to 1.

The DU driver initializes the DORCR register in rcar_du_group_setup(),
where it ignores the PG1T, DK1S and PG1D, and then configures those
fields to the correct value in rcar_du_group_set_routing(). This hasn't
been shown to cause any issue, but prevents certifying that the driver
complies with the documentation in safety-critical use cases. As there
is no reasonable chance that the documentation will be updated to
clarify that those reserved fields can be written to 0 temporarily
before starting the hardware, make sure that the registers are always
set to valid values.

Signed-off-by: Laurent Pinchart
---
 drivers/gpu/drm/rcar-du/rcar_du_group.c | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/rcar-du/rcar_du_group.c
b/drivers/gpu/drm/rcar-du/rcar_du_group.c
index b5950749d68a..2ccd2581f544 100644
--- a/drivers/gpu/drm/rcar-du/rcar_du_group.c
+++ b/drivers/gpu/drm/rcar-du/rcar_du_group.c
@@ -138,6 +138,7 @@ static void rcar_du_group_setup(struct rcar_du_group *rgrp)
 {
 	struct rcar_du_device *rcdu = rgrp->dev;
 	u32 defr7 = DEFR7_CODE;
+	u32 dorcr;

 	/* Enable extended features */
 	rcar_du_group_write(rgrp, DEFR, DEFR_CODE | DEFR_DEFE);
@@ -174,8 +175,15 @@ static void rcar_du_group_setup(struct rcar_du_group *rgrp)
 	/*
 	 * Use DS1PR and DS2PR to configure planes priorities and connects the
 	 * superposition 0 to DU0 pins. DU1 pins will be configured dynamically.
+	 *
+	 * Groups that have a single channel have a hardcoded configuration. On
+	 * Gen3 and newer, the documentation requires PG1T, DK1S and PG1D_DS1 to
+	 * always be set in this case.
 	 */
-	rcar_du_group_write(rgrp, DORCR, DORCR_PG0D_DS0 | DORCR_DPRS);
+	dorcr = DORCR_PG0D_DS0 | DORCR_DPRS;
+	if (rcdu->info->gen >= 3 && rgrp->num_crtcs == 1)
+		dorcr |= DORCR_PG1T | DORCR_DK1S | DORCR_PG1D_DS1;
+	rcar_du_group_write(rgrp, DORCR, dorcr);

 	/* Apply planes to CRTCs association. */
 	mutex_lock(&rgrp->lock);

--
Regards,
Laurent Pinchart
[PATCH v2 1/2] drm: rcar-du: Rename DORCR fields to make them 0-based
The DORCR fields were documented in the R-Car H1 datasheet with 1-based
names, and then got renamed to 0-based in Gen2. The 0-based names are
used for Gen3 and Gen4, making H1 an outlier. Rename the field macros
to make them 0-based, in order to increase readability of the code when
comparing it with the documentation.

Signed-off-by: Laurent Pinchart
---
 drivers/gpu/drm/rcar-du/rcar_du_group.c |  8
 drivers/gpu/drm/rcar-du/rcar_du_regs.h  | 26 -
 2 files changed, 17 insertions(+), 17 deletions(-)

diff --git a/drivers/gpu/drm/rcar-du/rcar_du_group.c
b/drivers/gpu/drm/rcar-du/rcar_du_group.c
index 152602236377..b5950749d68a 100644
--- a/drivers/gpu/drm/rcar-du/rcar_du_group.c
+++ b/drivers/gpu/drm/rcar-du/rcar_du_group.c
@@ -175,7 +175,7 @@ static void rcar_du_group_setup(struct rcar_du_group *rgrp)
 	 * Use DS1PR and DS2PR to configure planes priorities and connects the
 	 * superposition 0 to DU0 pins. DU1 pins will be configured dynamically.
 	 */
-	rcar_du_group_write(rgrp, DORCR, DORCR_PG1D_DS1 | DORCR_DPRS);
+	rcar_du_group_write(rgrp, DORCR, DORCR_PG0D_DS0 | DORCR_DPRS);

 	/* Apply planes to CRTCs association. */
 	mutex_lock(&rgrp->lock);
@@ -349,7 +349,7 @@ int rcar_du_group_set_routing(struct rcar_du_group *rgrp)
 	struct rcar_du_device *rcdu = rgrp->dev;
 	u32 dorcr = rcar_du_group_read(rgrp, DORCR);

-	dorcr &= ~(DORCR_PG2T | DORCR_DK2S | DORCR_PG2D_MASK);
+	dorcr &= ~(DORCR_PG1T | DORCR_DK1S | DORCR_PG1D_MASK);

 	/*
 	 * Set the DPAD1 pins sources. Select CRTC 0 if explicitly requested and
 	 * by default.
 	 */
 	if (rcdu->dpad1_source == rgrp->index * 2)
-		dorcr |= DORCR_PG2D_DS1;
+		dorcr |= DORCR_PG1D_DS0;
 	else
-		dorcr |= DORCR_PG2T | DORCR_DK2S | DORCR_PG2D_DS2;
+		dorcr |= DORCR_PG1T | DORCR_DK1S | DORCR_PG1D_DS1;

 	rcar_du_group_write(rgrp, DORCR, dorcr);

diff --git a/drivers/gpu/drm/rcar-du/rcar_du_regs.h
b/drivers/gpu/drm/rcar-du/rcar_du_regs.h
index 789ae9285108..6c750fab6ebb 100644
--- a/drivers/gpu/drm/rcar-du/rcar_du_regs.h
+++ b/drivers/gpu/drm/rcar-du/rcar_du_regs.h
@@ -511,19 +511,19 @@
  */

 #define DORCR			0x11000
-#define DORCR_PG2T		(1 << 30)
-#define DORCR_DK2S		(1 << 28)
-#define DORCR_PG2D_DS1		(0 << 24)
-#define DORCR_PG2D_DS2		(1 << 24)
-#define DORCR_PG2D_FIX0		(2 << 24)
-#define DORCR_PG2D_DOOR		(3 << 24)
-#define DORCR_PG2D_MASK		(3 << 24)
-#define DORCR_DR1D		(1 << 21)
-#define DORCR_PG1D_DS1		(0 << 16)
-#define DORCR_PG1D_DS2		(1 << 16)
-#define DORCR_PG1D_FIX0		(2 << 16)
-#define DORCR_PG1D_DOOR		(3 << 16)
-#define DORCR_PG1D_MASK		(3 << 16)
+#define DORCR_PG1T		(1 << 30)
+#define DORCR_DK1S		(1 << 28)
+#define DORCR_PG1D_DS0		(0 << 24)
+#define DORCR_PG1D_DS1		(1 << 24)
+#define DORCR_PG1D_FIX0		(2 << 24)
+#define DORCR_PG1D_DOOR		(3 << 24)
+#define DORCR_PG1D_MASK		(3 << 24)
+#define DORCR_DR0D		(1 << 21)
+#define DORCR_PG0D_DS0		(0 << 16)
+#define DORCR_PG0D_DS1		(1 << 16)
+#define DORCR_PG0D_FIX0		(2 << 16)
+#define DORCR_PG0D_DOOR		(3 << 16)
+#define DORCR_PG0D_MASK		(3 << 16)
 #define DORCR_RGPV		(1 << 4)
 #define DORCR_DPRS		(1 << 0)

--
Regards,
Laurent Pinchart
[PATCH v2 0/2] drm: rcar-du: Fix more invalid register writes
Hello,

Following the "[PATCH 0/2] drm: rcar-du: Avoid writing reserved register
fields" series ([1]), this series addresses more invalid register writes
in the R-Car DU driver. Patch 1/2 first renames some register field
macros to increase readability, and patch 2/2 fixes the invalid writes.

The rationale is the same as for the previous series: the current
implementation is likely fine, but doesn't pass the functional safety
requirements as it doesn't match the documentation.

The series supersedes the patch "[PATCH] drm: rcar-du: Write correct
values in DORCR reserved fields" ([2]) that I have just sent, which was
missing the 1/2 dependency. Patch 2/2 is otherwise identical to [2].

[1] https://lore.kernel.org/dri-devel/20230222050623.29080-1-laurent.pinchart+rene...@ideasonboard.com/T/#t
[2] https://lore.kernel.org/dri-devel/2023033113.4737-1-laurent.pinchart+rene...@ideasonboard.com/T/#u

Laurent Pinchart (2):
  drm: rcar-du: Rename DORCR fields to make them 0-based
  drm: rcar-du: Write correct values in DORCR reserved fields

 drivers/gpu/drm/rcar-du/rcar_du_group.c | 16 +++
 drivers/gpu/drm/rcar-du/rcar_du_regs.h  | 26 -
 2 files changed, 25 insertions(+), 17 deletions(-)

--
Regards,
Laurent Pinchart
Re: [PATCH v5 3/8] drm/i915/pxp: Add MTL helpers to submit Heci-Cmd-Packet to GSC
On Tue, 2023-02-14 at 13:38 -0800, Teres Alexis, Alan Previn wrote:
> Add helper functions into a new file for heci-packet-submission.
> The helpers will handle generating the MTL GSC-CS Memory-Header
> and submission of the Heci-Cmd-Packet instructions to the engine.
>
> NOTE1: These common functions for heci-packet-submission will be used
> by different i915 callers:
> 1- GSC-SW-Proxy: This is pending upstream publication awaiting
>    a few remaining opens
> 2- MTL-HDCP: An equivalent patch has also been published at:
>    https://patchwork.freedesktop.org/series/111876/. (Patch 1)
> 3- PXP: This series.
>
> NOTE2: A difference in this patch vs what is appearing in bullet 2
> above is that HDCP (and SW-Proxy) will be using privileged submission
> (GGTT and common gsc-uc-context) while PXP will be using non-privileged
> PPGTT, context and batch buffer. Therefore this patch will only slightly
> overlap with the MTL-HDCP patches despite having very similar function
> names (emit_foo vs emit_nonpriv_foo). This is because HECI_CMD_PKT
> instructions require different flows and hw-specific code when done
> via PPGTT based submission (not different from other engines). MTL-HDCP
> contains the same intel_gsc_mtl_header_t structures as this but the
> helpers there are different. Both add the same new file names.

alan: snip

> +int
> +intel_gsc_uc_heci_cmd_submit_nonpriv(struct intel_gsc_uc *gsc,
> +				     struct intel_context *ce,
> +				     struct intel_gsc_heci_non_priv_pkt *pkt,
> +				     u32 *cmd, int timeout_ms)
> +{
> +	struct intel_engine_cs *eng;
> +	struct i915_request *rq;
> +	int err;
> +
> +	rq = intel_context_create_request(ce);

alan: i need to move this to below the vma-lock-unlock pairs below, to avoid
any kind of lockdep warning because of the expected primed ordering of calls
across the driver

> +	if (IS_ERR(rq))
> +		return PTR_ERR(rq);
> +
> +	emit_gsc_heci_pkt_nonpriv(cmd, pkt);
> +
> +	i915_vma_lock(pkt->bb_vma);
> +	err = i915_vma_move_to_active(pkt->bb_vma, rq, EXEC_OBJECT_WRITE);
> +	i915_vma_unlock(pkt->bb_vma);
> +	if (err)
> +		return err;
> +
> +	i915_vma_lock(pkt->heci_pkt_vma);
> +	err = i915_vma_move_to_active(pkt->heci_pkt_vma, rq, EXEC_OBJECT_WRITE);
> +	i915_vma_unlock(pkt->heci_pkt_vma);
> +	if (err)
> +		return err;
> +
> +	eng = rq->context->engine;
> +	if (eng->emit_init_breadcrumb) {
> +		err = eng->emit_init_breadcrumb(rq);
> +		if (err)
> +			goto out_rq;
> +	}

alan: snip
[PATCH] drm: rcar-du: Write correct values in DORCR reserved fields
The DORCR register controls the routing of clocks and data between DU
channels within a group. For groups that contain a single channel,
there's no routing option to control, and some fields of the register
are then reserved. On Gen2 those reserved fields are documented as
required to be set to 0, while on Gen3 and newer the PG1T, DK1S and
PG1D reserved fields must be set to 1.

The DU driver initializes the DORCR register in rcar_du_group_setup(),
where it ignores the PG1T, DK1S and PG1D, and then configures those
fields to the correct value in rcar_du_group_set_routing(). This hasn't
been shown to cause any issue, but prevents certifying that the driver
complies with the documentation in safety-critical use cases. As there
is no reasonable chance that the documentation will be updated to
clarify that those reserved fields can be written to 0 temporarily
before starting the hardware, make sure that the registers are always
set to valid values.

Signed-off-by: Laurent Pinchart
---
 drivers/gpu/drm/rcar-du/rcar_du_group.c | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/rcar-du/rcar_du_group.c
b/drivers/gpu/drm/rcar-du/rcar_du_group.c
index b5950749d68a..2ccd2581f544 100644
--- a/drivers/gpu/drm/rcar-du/rcar_du_group.c
+++ b/drivers/gpu/drm/rcar-du/rcar_du_group.c
@@ -138,6 +138,7 @@ static void rcar_du_group_setup(struct rcar_du_group *rgrp)
 {
 	struct rcar_du_device *rcdu = rgrp->dev;
 	u32 defr7 = DEFR7_CODE;
+	u32 dorcr;

 	/* Enable extended features */
 	rcar_du_group_write(rgrp, DEFR, DEFR_CODE | DEFR_DEFE);
@@ -174,8 +175,15 @@ static void rcar_du_group_setup(struct rcar_du_group *rgrp)
 	/*
 	 * Use DS1PR and DS2PR to configure planes priorities and connects the
 	 * superposition 0 to DU0 pins. DU1 pins will be configured dynamically.
+	 *
+	 * Groups that have a single channel have a hardcoded configuration. On
+	 * Gen3 and newer, the documentation requires PG1T, DK1S and PG1D_DS1 to
+	 * always be set in this case.
 	 */
-	rcar_du_group_write(rgrp, DORCR, DORCR_PG0D_DS0 | DORCR_DPRS);
+	dorcr = DORCR_PG0D_DS0 | DORCR_DPRS;
+	if (rcdu->info->gen >= 3 && rgrp->num_crtcs == 1)
+		dorcr |= DORCR_PG1T | DORCR_DK1S | DORCR_PG1D_DS1;
+	rcar_du_group_write(rgrp, DORCR, dorcr);

 	/* Apply planes to CRTCs association. */
 	mutex_lock(&rgrp->lock);

--
Regards,
Laurent Pinchart
Re: [PATCH] drm/atomic-helpers: remove legacy_cursor_update hacks
On Wed, Feb 22, 2023 at 3:14 PM Rob Clark wrote: > > On Thu, Feb 16, 2023 at 3:12 AM Daniel Vetter wrote: > > > > The stuff never really worked, and leads to lots of fun because it > > out-of-order frees atomic states. Which upsets KASAN, among other > > things. > > > > For async updates we now have a more solid solution with the > > ->atomic_async_check and ->atomic_async_commit hooks. Support for that > > for msm and vc4 landed. nouveau and i915 have their own commit > > routines, doing something similar. > > > > For everyone else it's probably better to remove the use-after-free > > bug, and encourage folks to use the async support instead. The > > affected drivers which register a legacy cursor plane and don't either > > use the new async stuff or their own commit routine are: amdgpu, > > atmel, mediatek, qxl, rockchip, sti, sun4i, tegra, virtio, and vmwgfx. > > > > Inspired by an amdgpu bug report. > > > > v2: Drop RFC, I think with amdgpu converted over to use > > atomic_async_check/commit done in > > > > commit 674e78acae0dfb4beb56132e41cbae5b60f7d662 > > Author: Nicholas Kazlauskas > > Date: Wed Dec 5 14:59:07 2018 -0500 > > > > drm/amd/display: Add fast path for cursor plane updates > > > > we don't have any driver anymore where we have userspace expecting > > solid legacy cursor support _and_ they are using the atomic helpers in > > their fully glory. So we can retire this. > > > > v3: Paper over msm and i915 regression. The complete_all is the only > > thing missing afaict. > > > > v4: Fixup i915 fixup ... > > > > v5: Unallocate the crtc->event in msm to avoid hitting a WARN_ON in > > dpu_crtc_atomic_flush(). This is a bit a hack, but simplest way to > > untangle this all. Thanks to Abhinav Kumar for the debug help. > > Hmm, are you sure about that double-put? > > [ +0.501263] [ cut here ] > [ +0.32] refcount_t: underflow; use-after-free. 
> [ +0.33] WARNING: CPU: 6 PID: 1854 at lib/refcount.c:28 > refcount_warn_saturate+0xf8/0x134 > [ +0.43] Modules linked in: uinput rfcomm algif_hash > algif_skcipher af_alg veth venus_dec venus_enc xt_cgroup xt_MASQUERADE > qcom_spmi_temp_alarm qcom_spmi_adc_tm5 qcom_spmi_adc5 qcom_vadc_common > cros_ec_typec typec 8021q hci_uart btqca qcom_stats venus_core > coresight_etm4x coresight_tmc snd_soc_lpass_sc7180 > coresight_replicator coresight_funnel coresight snd_soc_sc7180 > ip6table_nat fuse ath10k_snoc ath10k_core ath mac80211 iio_trig_sysfs > bluetooth cros_ec_sensors cfg80211 cros_ec_sensors_core > industrialio_triggered_buffer kfifo_buf ecdh_generic ecc > cros_ec_sensorhub lzo_rle lzo_compress r8153_ecm cdc_ether usbnet > r8152 mii zram hid_vivaldi hid_google_hammer hid_vivaldi_common joydev > [ +0.000189] CPU: 6 PID: 1854 Comm: DrmThread Not tainted > 5.15.93-16271-g5ecce40dbcd4 #46 > cf9752a1c9e5b13fd13216094f52d77fa5a5f8f3 > [ +0.16] Hardware name: Google Wormdingler rev1+ INX panel board (DT) > [ +0.08] pstate: 6049 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) > [ +0.13] pc : refcount_warn_saturate+0xf8/0x134 > [ +0.11] lr : refcount_warn_saturate+0xf8/0x134 > [ +0.11] sp : ffc012e43930 > [ +0.08] x29: ffc012e43930 x28: ff80d31aa300 x27: > 024e > [ +0.17] x26: 03bd x25: 0040 x24: > 0040 > [ +0.14] x23: ff8083eb1000 x22: 0002 x21: > ff80845bc800 > [ +0.13] x20: 0040 x19: ff80d0cecb00 x18: > 60014024 > [ +0.12] x17: x16: 003c x15: > ffd97e21a1c0 > [ +0.12] x14: 0003 x13: 0004 x12: > 0001 > [ +0.14] x11: c000dfff x10: ffd97f560f50 x9 : > 5749cdb403550d00 > [ +0.14] x8 : 5749cdb403550d00 x7 : x6 : > 372e31332020205b > [ +0.12] x5 : ffd97f7b8b24 x4 : x3 : > ffc012e43588 > [ +0.13] x2 : ffc012e43590 x1 : dfff x0 : > 0026 > [ +0.14] Call trace: > [ +0.08] refcount_warn_saturate+0xf8/0x134 > [ +0.13] drm_crtc_commit_put+0x54/0x74 > [ +0.13] __drm_atomic_helper_plane_destroy_state+0x64/0x68 > [ +0.13] dpu_plane_destroy_state+0x24/0x3c > [ +0.17] drm_atomic_state_default_clear+0x13c/0x2d8 > [ +0.15] __drm_atomic_state_free+0x88/0xa0 > [ +0.15] drm_atomic_helper_update_plane+0x158/0x188 > [ +0.14] __setplane_atomic+0xf4/0x138 > [ +0.12] drm_mode_cursor_common+0x2e8/0x40c > [ +0.09] drm_mode_cursor_ioctl+0x48/0x70 > [ +0.08] drm_ioctl_kernel+0xe0/0x158 > [ +0.14] drm_ioctl+0x214/0x480 > [ +0.12] __arm64_sys_ioctl+0x94/0xd4 > [ +0.10] invoke_syscall+0x4c/0x100 > [ +0.13] do_el0_svc+0xa4/0x168 > [ +0.12] el0_svc+0x20/0x50 > [ +0.09] el0t_64_sync_handler+0x20/0x110 > [ +0.08] el0t_64_sync+0x1a4/0x1a8 > [ +0.10] ---[ end trace 35bb2d245a684c9a ]--- > without the
Re: [PATCH 1/5] drm/msm/adreno: Use OPP for every GPU generation
On 23/02/2023 00:40, Konrad Dybcio wrote: On 22.02.2023 23:38, Dmitry Baryshkov wrote: On 22/02/2023 23:47, Konrad Dybcio wrote: Some older GPUs (namely a2xx with no opp tables at all and a320 with downstream-remnants gpu pwrlevels) used not to have OPP tables. They both however had just one frequency defined, making it extremely easy to construct such an OPP table from within the driver if need be. Do so and switch all clk_set_rate calls on core_clk to their OPP counterparts. Signed-off-by: Konrad Dybcio --- drivers/gpu/drm/msm/adreno/adreno_gpu.c | 94 +++-- drivers/gpu/drm/msm/msm_gpu.c | 4 +- drivers/gpu/drm/msm/msm_gpu_devfreq.c | 2 +- 3 files changed, 45 insertions(+), 55 deletions(-) diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c b/drivers/gpu/drm/msm/adreno/adreno_gpu.c index ce6b76c45b6f..9b940c0f063f 100644 --- a/drivers/gpu/drm/msm/adreno/adreno_gpu.c +++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.c @@ -922,73 +922,50 @@ void adreno_wait_ring(struct msm_ringbuffer *ring, uint32_t ndwords) ring->id); } -/* Get legacy powerlevels from qcom,gpu-pwrlevels and populate the opp table */ -static int adreno_get_legacy_pwrlevels(struct device *dev) -{ - struct device_node *child, *node; - int ret; - - node = of_get_compatible_child(dev->of_node, "qcom,gpu-pwrlevels"); - if (!node) { - DRM_DEV_DEBUG(dev, "Could not find the GPU powerlevels\n"); - return -ENXIO; - } - - for_each_child_of_node(node, child) { - unsigned int val; - - ret = of_property_read_u32(child, "qcom,gpu-freq", &val); - if (ret) - continue; - - /* - * Skip the intentionally bogus clock value found at the bottom - * of most legacy frequency tables - */ - if (val != 27000000) - dev_pm_opp_add(dev, val, 0); - } - - of_node_put(node); - - return 0; -} - -static void adreno_get_pwrlevels(struct device *dev, +static int adreno_get_pwrlevels(struct device *dev, struct msm_gpu *gpu) { + struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu); unsigned long freq = ULONG_MAX; struct dev_pm_opp *opp; int ret; gpu->fast_rate = 0; - /* You down with OPP? */ - if (!of_find_property(dev->of_node, "operating-points-v2", NULL)) - ret = adreno_get_legacy_pwrlevels(dev); - else { - ret = devm_pm_opp_of_add_table(dev); - if (ret) - DRM_DEV_ERROR(dev, "Unable to set the OPP table\n"); - } - - if (!ret) { + /* devm_pm_opp_of_add_table may error out but will still create an OPP table */ + ret = devm_pm_opp_of_add_table(dev); + if (ret == -ENODEV) { + /* Special cases for ancient hw with ancient DT bindings */ + if (adreno_is_a2xx(adreno_gpu)) { + dev_warn(dev, "Unable to find the OPP table. Falling back to 200 MHz.\n"); + dev_pm_opp_add(dev, 200000000, 0); + gpu->fast_rate = 200000000; We can skip setting the fast_rate, dev_pm_opp_find_freq_floor below will get it from our freshly generated opp table. It's not reached in this code path. I see. I got lost in all the ifs. What do you think about turning it into the main code path, since after this code block we always have a valid OPP table? + } else if (adreno_is_a320(adreno_gpu)) { + dev_warn(dev, "Unable to find the OPP table. Falling back to 450 MHz.\n"); + dev_pm_opp_add(dev, 450000000, 0); + gpu->fast_rate = 450000000; + } else { + DRM_DEV_ERROR(dev, "Unable to find the OPP table\n"); + return -ENODEV; + } + } else if (ret) { + DRM_DEV_ERROR(dev, "Unable to set the OPP table\n"); + return ret; + } else { /* Find the fastest defined rate */ opp = dev_pm_opp_find_freq_floor(dev, &freq); - if (!IS_ERR(opp)) { + + if (IS_ERR(opp)) + return PTR_ERR(opp); + else { gpu->fast_rate = freq; dev_pm_opp_put(opp); } } - if (!gpu->fast_rate) { - dev_warn(dev, - "Could not find a clock rate. Using a reasonable default\n"); - /* Pick a suitably safe clock speed for any target */ - gpu->fast_rate = 200000000; - } - DBG("fast_rate=%u, slow_rate=27000000", gpu->fast_rate); + + return 0; } int adreno_gpu_ocmem_init(struct device *dev, struct adreno_gpu *adreno_gpu, -- With best wishes Dmitry
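Dmitry's suggestion would roughly amount to making the floor lookup the single exit path. A sketch under the assumption that, after the -ENODEV fallbacks, the OPP table is always populated (names taken from the quoted patch; not a tested change):

	static int adreno_get_pwrlevels(struct device *dev, struct msm_gpu *gpu)
	{
		struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
		unsigned long freq = ULONG_MAX;
		struct dev_pm_opp *opp;
		int ret;

		ret = devm_pm_opp_of_add_table(dev);
		if (ret == -ENODEV) {
			/* Ancient hw, ancient bindings: synthesize a one-entry table */
			if (adreno_is_a2xx(adreno_gpu))
				dev_pm_opp_add(dev, 200000000, 0);
			else if (adreno_is_a320(adreno_gpu))
				dev_pm_opp_add(dev, 450000000, 0);
			else
				return -ENODEV;
		} else if (ret) {
			return ret;
		}

		/* Single code path: works for DT tables and the fallbacks alike */
		opp = dev_pm_opp_find_freq_floor(dev, &freq);
		if (IS_ERR(opp))
			return PTR_ERR(opp);

		gpu->fast_rate = freq;
		dev_pm_opp_put(opp);

		return 0;
	}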
Re: [PATCH] drm/atomic-helpers: remove legacy_cursor_update hacks
On Thu, Feb 16, 2023 at 3:12 AM Daniel Vetter wrote: > > The stuff never really worked, and leads to lots of fun because it > out-of-order frees atomic states. Which upsets KASAN, among other > things. > > For async updates we now have a more solid solution with the > ->atomic_async_check and ->atomic_async_commit hooks. Support for that > for msm and vc4 landed. nouveau and i915 have their own commit > routines, doing something similar. > > For everyone else it's probably better to remove the use-after-free > bug, and encourage folks to use the async support instead. The > affected drivers which register a legacy cursor plane and don't either > use the new async stuff or their own commit routine are: amdgpu, > atmel, mediatek, qxl, rockchip, sti, sun4i, tegra, virtio, and vmwgfx. > > Inspired by an amdgpu bug report. > > v2: Drop RFC, I think with amdgpu converted over to use > atomic_async_check/commit done in > > commit 674e78acae0dfb4beb56132e41cbae5b60f7d662 > Author: Nicholas Kazlauskas > Date: Wed Dec 5 14:59:07 2018 -0500 > > drm/amd/display: Add fast path for cursor plane updates > > we don't have any driver anymore where we have userspace expecting > solid legacy cursor support _and_ they are using the atomic helpers in > their fully glory. So we can retire this. > > v3: Paper over msm and i915 regression. The complete_all is the only > thing missing afaict. > > v4: Fixup i915 fixup ... > > v5: Unallocate the crtc->event in msm to avoid hitting a WARN_ON in > dpu_crtc_atomic_flush(). This is a bit a hack, but simplest way to > untangle this all. Thanks to Abhinav Kumar for the debug help. Hmm, are you sure about that double-put? [ +0.501263] [ cut here ] [ +0.32] refcount_t: underflow; use-after-free. [ +0.33] WARNING: CPU: 6 PID: 1854 at lib/refcount.c:28 refcount_warn_saturate+0xf8/0x134 [ +0.43] Modules linked in: uinput rfcomm algif_hash algif_skcipher af_alg veth venus_dec venus_enc xt_cgroup xt_MASQUERADE qcom_spmi_temp_alarm qcom_spmi_adc_tm5 qcom_spmi_adc5 qcom_vadc_common cros_ec_typec typec 8021q hci_uart btqca qcom_stats venus_core coresight_etm4x coresight_tmc snd_soc_lpass_sc7180 coresight_replicator coresight_funnel coresight snd_soc_sc7180 ip6table_nat fuse ath10k_snoc ath10k_core ath mac80211 iio_trig_sysfs bluetooth cros_ec_sensors cfg80211 cros_ec_sensors_core industrialio_triggered_buffer kfifo_buf ecdh_generic ecc cros_ec_sensorhub lzo_rle lzo_compress r8153_ecm cdc_ether usbnet r8152 mii zram hid_vivaldi hid_google_hammer hid_vivaldi_common joydev [ +0.000189] CPU: 6 PID: 1854 Comm: DrmThread Not tainted 5.15.93-16271-g5ecce40dbcd4 #46 cf9752a1c9e5b13fd13216094f52d77fa5a5f8f3 [ +0.16] Hardware name: Google Wormdingler rev1+ INX panel board (DT) [ +0.08] pstate: 6049 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ +0.13] pc : refcount_warn_saturate+0xf8/0x134 [ +0.11] lr : refcount_warn_saturate+0xf8/0x134 [ +0.11] sp : ffc012e43930 [ +0.08] x29: ffc012e43930 x28: ff80d31aa300 x27: 024e [ +0.17] x26: 03bd x25: 0040 x24: 0040 [ +0.14] x23: ff8083eb1000 x22: 0002 x21: ff80845bc800 [ +0.13] x20: 0040 x19: ff80d0cecb00 x18: 60014024 [ +0.12] x17: x16: 003c x15: ffd97e21a1c0 [ +0.12] x14: 0003 x13: 0004 x12: 0001 [ +0.14] x11: c000dfff x10: ffd97f560f50 x9 : 5749cdb403550d00 [ +0.14] x8 : 5749cdb403550d00 x7 : x6 : 372e31332020205b [ +0.12] x5 : ffd97f7b8b24 x4 : x3 : ffc012e43588 [ +0.13] x2 : ffc012e43590 x1 : dfff x0 : 0026 [ +0.14] Call trace: [ +0.08] refcount_warn_saturate+0xf8/0x134 [ +0.13] drm_crtc_commit_put+0x54/0x74 [ +0.13] 
__drm_atomic_helper_plane_destroy_state+0x64/0x68 [ +0.13] dpu_plane_destroy_state+0x24/0x3c [ +0.17] drm_atomic_state_default_clear+0x13c/0x2d8 [ +0.15] __drm_atomic_state_free+0x88/0xa0 [ +0.15] drm_atomic_helper_update_plane+0x158/0x188 [ +0.14] __setplane_atomic+0xf4/0x138 [ +0.12] drm_mode_cursor_common+0x2e8/0x40c [ +0.09] drm_mode_cursor_ioctl+0x48/0x70 [ +0.08] drm_ioctl_kernel+0xe0/0x158 [ +0.14] drm_ioctl+0x214/0x480 [ +0.12] __arm64_sys_ioctl+0x94/0xd4 [ +0.10] invoke_syscall+0x4c/0x100 [ +0.13] do_el0_svc+0xa4/0x168 [ +0.12] el0_svc+0x20/0x50 [ +0.09] el0t_64_sync_handler+0x20/0x110 [ +0.08] el0t_64_sync+0x1a4/0x1a8 [ +0.10] ---[ end trace 35bb2d245a684c9a ]--- BR, -R > Cc: Abhinav Kumar > Cc: Thomas Zimmermann > Cc: Maxime Ripard > References: https://bugzilla.kernel.org/show_bug.cgi?id=199425 > References: > https://lore.kernel.org/all/20220221134155.125447-9-max...@cerno.tech/ > References: https://bugzilla.kernel.org/show_bug.cgi?
Re: [PATCH 1/5] drm/msm/adreno: Use OPP for every GPU generation
On 22.02.2023 23:38, Dmitry Baryshkov wrote: > On 22/02/2023 23:47, Konrad Dybcio wrote: >> Some older GPUs (namely a2xx with no opp tables at all and a320 with >> downstream-remnants gpu pwrlevels) used not to have OPP tables. They >> both however had just one frequency defined, making it extremely easy >> to construct such an OPP table from within the driver if need be. >> >> Do so and switch all clk_set_rate calls on core_clk to their OPP >> counterparts. >> >> Signed-off-by: Konrad Dybcio >> --- >> drivers/gpu/drm/msm/adreno/adreno_gpu.c | 94 >> +++-- >> drivers/gpu/drm/msm/msm_gpu.c | 4 +- >> drivers/gpu/drm/msm/msm_gpu_devfreq.c | 2 +- >> 3 files changed, 45 insertions(+), 55 deletions(-) >> >> diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c >> b/drivers/gpu/drm/msm/adreno/adreno_gpu.c >> index ce6b76c45b6f..9b940c0f063f 100644 >> --- a/drivers/gpu/drm/msm/adreno/adreno_gpu.c >> +++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.c >> @@ -922,73 +922,50 @@ void adreno_wait_ring(struct msm_ringbuffer *ring, >> uint32_t ndwords) >> ring->id); >> } >> -/* Get legacy powerlevels from qcom,gpu-pwrlevels and populate the opp >> table */ >> -static int adreno_get_legacy_pwrlevels(struct device *dev) >> -{ >> - struct device_node *child, *node; >> - int ret; >> - >> - node = of_get_compatible_child(dev->of_node, "qcom,gpu-pwrlevels"); >> - if (!node) { >> - DRM_DEV_DEBUG(dev, "Could not find the GPU powerlevels\n"); >> - return -ENXIO; >> - } >> - >> - for_each_child_of_node(node, child) { >> - unsigned int val; >> - >> - ret = of_property_read_u32(child, "qcom,gpu-freq", &val); >> - if (ret) >> - continue; >> - >> - /* >> - * Skip the intentionally bogus clock value found at the bottom >> - * of most legacy frequency tables >> - */ >> - if (val != 27000000) >> - dev_pm_opp_add(dev, val, 0); >> - } >> - >> - of_node_put(node); >> - >> - return 0; >> -} >> - >> -static void adreno_get_pwrlevels(struct device *dev, >> +static int adreno_get_pwrlevels(struct device *dev, >> struct msm_gpu *gpu) >> { >> + struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu); >> unsigned long freq = ULONG_MAX; >> struct dev_pm_opp *opp; >> int ret; >> gpu->fast_rate = 0; >> - /* You down with OPP? */ >> - if (!of_find_property(dev->of_node, "operating-points-v2", NULL)) >> - ret = adreno_get_legacy_pwrlevels(dev); >> - else { >> - ret = devm_pm_opp_of_add_table(dev); >> - if (ret) >> - DRM_DEV_ERROR(dev, "Unable to set the OPP table\n"); >> - } >> - >> - if (!ret) { >> + /* devm_pm_opp_of_add_table may error out but will still create an OPP >> table */ >> + ret = devm_pm_opp_of_add_table(dev); >> + if (ret == -ENODEV) { >> + /* Special cases for ancient hw with ancient DT bindings */ >> + if (adreno_is_a2xx(adreno_gpu)) { >> + dev_warn(dev, "Unable to find the OPP table. Falling back to >> 200 MHz.\n"); >> + dev_pm_opp_add(dev, 200000000, 0); >> + gpu->fast_rate = 200000000; > > We can skip setting the fast_rate, dev_pm_opp_find_freq_floor below will get > it from our freshly generated opp table. It's not reached in this code path. > >> + } else if (adreno_is_a320(adreno_gpu)) { >> + dev_warn(dev, "Unable to find the OPP table. Falling back to >> 450 MHz.\n"); >> + dev_pm_opp_add(dev, 450000000, 0); >> + gpu->fast_rate = 450000000; >> + } else { >> + DRM_DEV_ERROR(dev, "Unable to find the OPP table\n"); >> + return -ENODEV; >> + } >> + } else if (ret) { >> + DRM_DEV_ERROR(dev, "Unable to set the OPP table\n"); >> + return ret; >> + } else { >> /* Find the fastest defined rate */ >> opp = dev_pm_opp_find_freq_floor(dev, &freq); >> - if (!IS_ERR(opp)) { >> + >> + if (IS_ERR(opp)) >> + return PTR_ERR(opp); >> + else { >> gpu->fast_rate = freq; >> dev_pm_opp_put(opp); >> } >> } >> - if (!gpu->fast_rate) { >> - dev_warn(dev, >> - "Could not find a clock rate. Using a reasonable default\n"); >> - /* Pick a suitably safe clock speed for any target */ >> - gpu->fast_rate = 200000000; >> - } >> - >> DBG("fast_rate=%u, slow_rate=27000000", gpu->fast_rate); >> + >> + return 0; >> } >> int adreno_gpu_ocmem_init(struct device *dev, struct adreno_gpu >> *adreno_gpu, >> @@ -1046,6 +1023,17 @@ int adreno_gpu_init(struct drm_device *drm, struct >> platform_device *pdev, >> struct adreno_rev *rev = &config->rev; >> const char *gpu_name; >> u32 speedbin; >> + int ret; >> + >> + /* This can only be done here, or devm_pm_opp_set_supported_hw w
Re: [PATCH 1/5] drm/msm/adreno: Use OPP for every GPU generation
On 22/02/2023 23:47, Konrad Dybcio wrote: Some older GPUs (namely a2xx with no opp tables at all and a320 with downstream-remnants gpu pwrlevels) used not to have OPP tables. They both however had just one frequency defined, making it extremely easy to construct such an OPP table from within the driver if need be. Do so and switch all clk_set_rate calls on core_clk to their OPP counterparts. Signed-off-by: Konrad Dybcio --- drivers/gpu/drm/msm/adreno/adreno_gpu.c | 94 +++-- drivers/gpu/drm/msm/msm_gpu.c | 4 +- drivers/gpu/drm/msm/msm_gpu_devfreq.c | 2 +- 3 files changed, 45 insertions(+), 55 deletions(-) diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c b/drivers/gpu/drm/msm/adreno/adreno_gpu.c index ce6b76c45b6f..9b940c0f063f 100644 --- a/drivers/gpu/drm/msm/adreno/adreno_gpu.c +++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.c @@ -922,73 +922,50 @@ void adreno_wait_ring(struct msm_ringbuffer *ring, uint32_t ndwords) ring->id); } -/* Get legacy powerlevels from qcom,gpu-pwrlevels and populate the opp table */ -static int adreno_get_legacy_pwrlevels(struct device *dev) -{ - struct device_node *child, *node; - int ret; - - node = of_get_compatible_child(dev->of_node, "qcom,gpu-pwrlevels"); - if (!node) { - DRM_DEV_DEBUG(dev, "Could not find the GPU powerlevels\n"); - return -ENXIO; - } - - for_each_child_of_node(node, child) { - unsigned int val; - - ret = of_property_read_u32(child, "qcom,gpu-freq", &val); - if (ret) - continue; - - /* - * Skip the intentionally bogus clock value found at the bottom - * of most legacy frequency tables - */ - if (val != 27000000) - dev_pm_opp_add(dev, val, 0); - } - - of_node_put(node); - - return 0; -} - -static void adreno_get_pwrlevels(struct device *dev, +static int adreno_get_pwrlevels(struct device *dev, struct msm_gpu *gpu) { + struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu); unsigned long freq = ULONG_MAX; struct dev_pm_opp *opp; int ret; gpu->fast_rate = 0; - /* You down with OPP? */ - if (!of_find_property(dev->of_node, "operating-points-v2", NULL)) - ret = adreno_get_legacy_pwrlevels(dev); - else { - ret = devm_pm_opp_of_add_table(dev); - if (ret) - DRM_DEV_ERROR(dev, "Unable to set the OPP table\n"); - } - - if (!ret) { + /* devm_pm_opp_of_add_table may error out but will still create an OPP table */ + ret = devm_pm_opp_of_add_table(dev); + if (ret == -ENODEV) { + /* Special cases for ancient hw with ancient DT bindings */ + if (adreno_is_a2xx(adreno_gpu)) { + dev_warn(dev, "Unable to find the OPP table. Falling back to 200 MHz.\n"); + dev_pm_opp_add(dev, 200000000, 0); + gpu->fast_rate = 200000000; We can skip setting the fast_rate, dev_pm_opp_find_freq_floor below will get it from our freshly generated opp table. + } else if (adreno_is_a320(adreno_gpu)) { + dev_warn(dev, "Unable to find the OPP table. Falling back to 450 MHz.\n"); + dev_pm_opp_add(dev, 450000000, 0); + gpu->fast_rate = 450000000; + } else { + DRM_DEV_ERROR(dev, "Unable to find the OPP table\n"); + return -ENODEV; + } + } else if (ret) { + DRM_DEV_ERROR(dev, "Unable to set the OPP table\n"); + return ret; + } else { /* Find the fastest defined rate */ opp = dev_pm_opp_find_freq_floor(dev, &freq); - if (!IS_ERR(opp)) { + + if (IS_ERR(opp)) + return PTR_ERR(opp); + else { gpu->fast_rate = freq; dev_pm_opp_put(opp); } } - if (!gpu->fast_rate) { - dev_warn(dev, - "Could not find a clock rate. Using a reasonable default\n"); - /* Pick a suitably safe clock speed for any target */ - gpu->fast_rate = 200000000; - } - DBG("fast_rate=%u, slow_rate=27000000", gpu->fast_rate); + + return 0; } int adreno_gpu_ocmem_init(struct device *dev, struct adreno_gpu *adreno_gpu, @@ -1046,6 +1023,17 @@ int adreno_gpu_init(struct drm_device *drm, struct platform_device *pdev, struct adreno_rev *rev = &config->rev; const char *gpu_name; u32 speedbin; + int ret; + + /* This can only be done here, or devm_pm_opp_set_supported_hw will WARN_ON() */ I
Re: [PATCH 2/5] drm/msm/a2xx: Implement .gpu_busy
On 22/02/2023 23:47, Konrad Dybcio wrote: Implement gpu_busy based on the downstream msm-3.4 code [1]. This allows us to use devfreq on this old old old hardware! [1] https://github.com/LineageOS/android_kernel_sony_apq8064/blob/lineage-16.0/drivers/gpu/msm/adreno_a2xx.c#L1975 Signed-off-by: Konrad Dybcio --- drivers/gpu/drm/msm/adreno/a2xx_gpu.c | 28 1 file changed, 28 insertions(+) diff --git a/drivers/gpu/drm/msm/adreno/a2xx_gpu.c b/drivers/gpu/drm/msm/adreno/a2xx_gpu.c index c67089a7ebc1..6258c98e5a88 100644 --- a/drivers/gpu/drm/msm/adreno/a2xx_gpu.c +++ b/drivers/gpu/drm/msm/adreno/a2xx_gpu.c @@ -481,6 +481,33 @@ a2xx_create_address_space(struct msm_gpu *gpu, struct platform_device *pdev) return aspace; } +/* While the precise size of this field is unknown, it holds at least these three values.. */ +#define PERF_MODE_CNT GENMASK(2, 0) + #define PERF_STATE_RESET 0x0 + #define PERF_STATE_ENABLE 0x1 + #define PERF_STATE_FREEZE 0x2 These should go into a2xx.xml.h LGTM otherwise. +static u64 a2xx_gpu_busy(struct msm_gpu *gpu, unsigned long *out_sample_rate) +{ + u64 busy_cycles; + + /* Freeze the counter */ + gpu_write(gpu, REG_A2XX_CP_PERFMON_CNTL, FIELD_PREP(PERF_MODE_CNT, PERF_STATE_FREEZE)); + + busy_cycles = gpu_read64(gpu, REG_A2XX_RBBM_PERFCOUNTER1_LO); + + /* Reset the counter */ + gpu_write(gpu, REG_A2XX_CP_PERFMON_CNTL, FIELD_PREP(PERF_MODE_CNT, PERF_STATE_RESET)); + + /* Re-enable the performance monitors */ + gpu_rmw(gpu, REG_A2XX_RBBM_PM_OVERRIDE2, BIT(6), BIT(6)); + gpu_write(gpu, REG_A2XX_RBBM_PERFCOUNTER1_SELECT, 1); + gpu_write(gpu, REG_A2XX_CP_PERFMON_CNTL, FIELD_PREP(PERF_MODE_CNT, PERF_STATE_ENABLE)); + + *out_sample_rate = clk_get_rate(gpu->core_clk); + + return busy_cycles; +} + static u32 a2xx_get_rptr(struct msm_gpu *gpu, struct msm_ringbuffer *ring) { ring->memptrs->rptr = gpu_read(gpu, REG_AXXX_CP_RB_RPTR); @@ -502,6 +529,7 @@ static const struct adreno_gpu_funcs funcs = { #if defined(CONFIG_DEBUG_FS) || defined(CONFIG_DEV_COREDUMP) .show = adreno_show, #endif + .gpu_busy = a2xx_gpu_busy, .gpu_state_get = a2xx_gpu_state_get, .gpu_state_put = adreno_gpu_state_put, .create_address_space = a2xx_create_address_space, -- With best wishes Dmitry
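If the field definitions move into a2xx.xml.h as requested, they would presumably follow the generated-header conventions used by the other msm register headers, something like the sketch below (hypothetical names; the real ones are generated from the rnndb XML, so treat this purely as an illustration):

	#define A2XX_CP_PERFMON_CNTL_PERF_MODE_CNT__MASK	0x00000007
	#define A2XX_CP_PERFMON_CNTL_PERF_MODE_CNT__SHIFT	0
	static inline uint32_t A2XX_CP_PERFMON_CNTL_PERF_MODE_CNT(uint32_t val)
	{
		return ((val) << A2XX_CP_PERFMON_CNTL_PERF_MODE_CNT__SHIFT) &
		       A2XX_CP_PERFMON_CNTL_PERF_MODE_CNT__MASK;
	}

with the RESET/ENABLE/FREEZE values defined as an enum next to it.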
Re: [PATCH 4/5] drm/msm/a4xx: Implement .gpu_busy
On 22/02/2023 23:47, Konrad Dybcio wrote: Add support for gpu_busy on a4xx, which is required for devfreq support. Signed-off-by: Konrad Dybcio --- drivers/gpu/drm/msm/adreno/a4xx_gpu.c | 11 +++ 1 file changed, 11 insertions(+) Reviewed-by: Dmitry Baryshkov -- With best wishes Dmitry
Re: [PATCH 5/5] drm/msm/a5xx: Enable optional icc voting from OPP tables
On 23/02/2023 00:14, Konrad Dybcio wrote: On 22.02.2023 23:12, Dmitry Baryshkov wrote: On 22/02/2023 23:47, Konrad Dybcio wrote: Add the dev_pm_opp_of_find_icc_paths() call to let the OPP framework handle bus voting as part of power level setting. This can probably go to the generic code path rather than sticking it into a5xx only. The reasoning is explained in the cover letter: a3xx/a4xx already have "raw" icc set calls which would require more work (and above all, testing) to untangle while keeping backwards compat, so this is a midterm solution that would allow getting scaling to work earlier. Those two platforms call icc_set_bw() during setup, however their opp tables do not contain BW settings, making dev_pm_opp_of_find_icc_paths() a nop. So, I think, we might as well call this function on a3xx/a4xx, making the code future proof. Konrad Signed-off-by: Konrad Dybcio --- drivers/gpu/drm/msm/adreno/a5xx_gpu.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c index d09221f97f71..a33af0cc27b6 100644 --- a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c +++ b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c @@ -1775,5 +1775,9 @@ struct msm_gpu *a5xx_gpu_init(struct drm_device *dev) /* Set up the preemption specific bits and pieces for each ringbuffer */ a5xx_preempt_init(gpu); + ret = dev_pm_opp_of_find_icc_paths(&pdev->dev, NULL); + if (ret) + return ERR_PTR(ret); + return gpu; } -- With best wishes Dmitry
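Moving the call into the shared path would be a small change in the common init code, roughly along these lines (a sketch assuming the platform device is reachable from adreno_gpu_init(); untested):

	/*
	 * Harmless where the OPP table carries no bandwidth values
	 * (a3xx/a4xx today), useful where it does (a5xx and newer).
	 */
	ret = dev_pm_opp_of_find_icc_paths(&pdev->dev, NULL);
	if (ret)
		return ret;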
Re: [PATCH 5/5] drm/msm/a5xx: Enable optional icc voting from OPP tables
On 22.02.2023 23:12, Dmitry Baryshkov wrote: > On 22/02/2023 23:47, Konrad Dybcio wrote: >> Add the dev_pm_opp_of_find_icc_paths() call to let the OPP framework >> handle bus voting as part of power level setting. > > This can probably go to the generic code path rather than sticking it into > a5xx only. The reasoning is explained in the cover letter, a3xx/a4xx already have "raw" icc set calls which would require more work (and above all, testing) to untangle while keeping backwards compat, this is a midterm solution that would allow getting scaling to work earlier. Konrad > >> >> Signed-off-by: Konrad Dybcio >> --- >> drivers/gpu/drm/msm/adreno/a5xx_gpu.c | 4 >> 1 file changed, 4 insertions(+) >> >> diff --git a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c >> b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c >> index d09221f97f71..a33af0cc27b6 100644 >> --- a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c >> +++ b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c >> @@ -1775,5 +1775,9 @@ struct msm_gpu *a5xx_gpu_init(struct drm_device *dev) >> /* Set up the preemption specific bits and pieces for each ringbuffer >> */ >> a5xx_preempt_init(gpu); >> + ret = dev_pm_opp_of_find_icc_paths(&pdev->dev, NULL); >> + if (ret) >> + return ERR_PTR(ret); >> + >> return gpu; >> } >> >
Re: [PATCH 5/5] drm/msm/a5xx: Enable optional icc voting from OPP tables
On 22/02/2023 23:47, Konrad Dybcio wrote: Add the dev_pm_opp_of_find_icc_paths() call to let the OPP framework handle bus voting as part of power level setting. This can probably go to the generic code path rather than sticking it into a5xx only. Signed-off-by: Konrad Dybcio --- drivers/gpu/drm/msm/adreno/a5xx_gpu.c | 4 1 file changed, 4 insertions(+) diff --git a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c index d09221f97f71..a33af0cc27b6 100644 --- a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c +++ b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c @@ -1775,5 +1775,9 @@ struct msm_gpu *a5xx_gpu_init(struct drm_device *dev) /* Set up the preemption specific bits and pieces for each ringbuffer */ a5xx_preempt_init(gpu); + ret = dev_pm_opp_of_find_icc_paths(&pdev->dev, NULL); + if (ret) + return ERR_PTR(ret); + return gpu; } -- With best wishes Dmitry
Re: [PATCH 3/5] drm/msm/a3xx: Implement .gpu_busy
On 22/02/2023 23:47, Konrad Dybcio wrote: Add support for gpu_busy on a3xx, which is required for devfreq support. Signed-off-by: Konrad Dybcio Tested-by: Dmitry Baryshkov #ifc6410 Reviewed-by: Dmitry Baryshkov -- With best wishes Dmitry
[PATCH 3/5] drm/msm/a3xx: Implement .gpu_busy
Add support for gpu_busy on a3xx, which is required for devfreq support. Signed-off-by: Konrad Dybcio --- drivers/gpu/drm/msm/adreno/a3xx_gpu.c | 11 +++ 1 file changed, 11 insertions(+) diff --git a/drivers/gpu/drm/msm/adreno/a3xx_gpu.c b/drivers/gpu/drm/msm/adreno/a3xx_gpu.c index 948785ed07bb..c86b377f6f0d 100644 --- a/drivers/gpu/drm/msm/adreno/a3xx_gpu.c +++ b/drivers/gpu/drm/msm/adreno/a3xx_gpu.c @@ -477,6 +477,16 @@ static struct msm_gpu_state *a3xx_gpu_state_get(struct msm_gpu *gpu) return state; } +static u64 a3xx_gpu_busy(struct msm_gpu *gpu, unsigned long *out_sample_rate) +{ + u64 busy_cycles; + + busy_cycles = gpu_read64(gpu, REG_A3XX_RBBM_PERFCTR_RBBM_1_LO); + *out_sample_rate = clk_get_rate(gpu->core_clk); + + return busy_cycles; +} + static u32 a3xx_get_rptr(struct msm_gpu *gpu, struct msm_ringbuffer *ring) { ring->memptrs->rptr = gpu_read(gpu, REG_AXXX_CP_RB_RPTR); @@ -498,6 +508,7 @@ static const struct adreno_gpu_funcs funcs = { #if defined(CONFIG_DEBUG_FS) || defined(CONFIG_DEV_COREDUMP) .show = adreno_show, #endif + .gpu_busy = a3xx_gpu_busy, .gpu_state_get = a3xx_gpu_state_get, .gpu_state_put = adreno_gpu_state_put, .create_address_space = adreno_create_address_space, -- 2.39.2
[PATCH 5/5] drm/msm/a5xx: Enable optional icc voting from OPP tables
Add the dev_pm_opp_of_find_icc_paths() call to let the OPP framework handle bus voting as part of power level setting. Signed-off-by: Konrad Dybcio --- drivers/gpu/drm/msm/adreno/a5xx_gpu.c | 4 1 file changed, 4 insertions(+) diff --git a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c index d09221f97f71..a33af0cc27b6 100644 --- a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c +++ b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c @@ -1775,5 +1775,9 @@ struct msm_gpu *a5xx_gpu_init(struct drm_device *dev) /* Set up the preemption specific bits and pieces for each ringbuffer */ a5xx_preempt_init(gpu); + ret = dev_pm_opp_of_find_icc_paths(&pdev->dev, NULL); + if (ret) + return ERR_PTR(ret); + return gpu; } -- 2.39.2
[PATCH 2/5] drm/msm/a2xx: Implement .gpu_busy
Implement gpu_busy based on the downstream msm-3.4 code [1]. This allows us to use devfreq on this old old old hardware! [1] https://github.com/LineageOS/android_kernel_sony_apq8064/blob/lineage-16.0/drivers/gpu/msm/adreno_a2xx.c#L1975 Signed-off-by: Konrad Dybcio --- drivers/gpu/drm/msm/adreno/a2xx_gpu.c | 28 1 file changed, 28 insertions(+) diff --git a/drivers/gpu/drm/msm/adreno/a2xx_gpu.c b/drivers/gpu/drm/msm/adreno/a2xx_gpu.c index c67089a7ebc1..6258c98e5a88 100644 --- a/drivers/gpu/drm/msm/adreno/a2xx_gpu.c +++ b/drivers/gpu/drm/msm/adreno/a2xx_gpu.c @@ -481,6 +481,33 @@ a2xx_create_address_space(struct msm_gpu *gpu, struct platform_device *pdev) return aspace; } +/* While the precise size of this field is unknown, it holds at least these three values.. */ +#define PERF_MODE_CNT GENMASK(2, 0) + #define PERF_STATE_RESET 0x0 + #define PERF_STATE_ENABLE 0x1 + #define PERF_STATE_FREEZE 0x2 +static u64 a2xx_gpu_busy(struct msm_gpu *gpu, unsigned long *out_sample_rate) +{ + u64 busy_cycles; + + /* Freeze the counter */ + gpu_write(gpu, REG_A2XX_CP_PERFMON_CNTL, FIELD_PREP(PERF_MODE_CNT, PERF_STATE_FREEZE)); + + busy_cycles = gpu_read64(gpu, REG_A2XX_RBBM_PERFCOUNTER1_LO); + + /* Reset the counter */ + gpu_write(gpu, REG_A2XX_CP_PERFMON_CNTL, FIELD_PREP(PERF_MODE_CNT, PERF_STATE_RESET)); + + /* Re-enable the performance monitors */ + gpu_rmw(gpu, REG_A2XX_RBBM_PM_OVERRIDE2, BIT(6), BIT(6)); + gpu_write(gpu, REG_A2XX_RBBM_PERFCOUNTER1_SELECT, 1); + gpu_write(gpu, REG_A2XX_CP_PERFMON_CNTL, FIELD_PREP(PERF_MODE_CNT, PERF_STATE_ENABLE)); + + *out_sample_rate = clk_get_rate(gpu->core_clk); + + return busy_cycles; +} + static u32 a2xx_get_rptr(struct msm_gpu *gpu, struct msm_ringbuffer *ring) { ring->memptrs->rptr = gpu_read(gpu, REG_AXXX_CP_RB_RPTR); @@ -502,6 +529,7 @@ static const struct adreno_gpu_funcs funcs = { #if defined(CONFIG_DEBUG_FS) || defined(CONFIG_DEV_COREDUMP) .show = adreno_show, #endif + .gpu_busy = a2xx_gpu_busy, .gpu_state_get = a2xx_gpu_state_get, .gpu_state_put = adreno_gpu_state_put, .create_address_space = a2xx_create_address_space, -- 2.39.2
[PATCH 4/5] drm/msm/a4xx: Implement .gpu_busy
Add support for gpu_busy on a4xx, which is required for devfreq support. Signed-off-by: Konrad Dybcio --- drivers/gpu/drm/msm/adreno/a4xx_gpu.c | 11 +++ 1 file changed, 11 insertions(+) diff --git a/drivers/gpu/drm/msm/adreno/a4xx_gpu.c b/drivers/gpu/drm/msm/adreno/a4xx_gpu.c index 3e09d3a7a0ac..715436cb3996 100644 --- a/drivers/gpu/drm/msm/adreno/a4xx_gpu.c +++ b/drivers/gpu/drm/msm/adreno/a4xx_gpu.c @@ -611,6 +611,16 @@ static int a4xx_get_timestamp(struct msm_gpu *gpu, uint64_t *value) return 0; } +static u64 a4xx_gpu_busy(struct msm_gpu *gpu, unsigned long *out_sample_rate) +{ + u64 busy_cycles; + + busy_cycles = gpu_read64(gpu, REG_A4XX_RBBM_PERFCTR_RBBM_1_LO); + *out_sample_rate = clk_get_rate(gpu->core_clk); + + return busy_cycles; +} + static u32 a4xx_get_rptr(struct msm_gpu *gpu, struct msm_ringbuffer *ring) { ring->memptrs->rptr = gpu_read(gpu, REG_A4XX_CP_RB_RPTR); @@ -632,6 +642,7 @@ static const struct adreno_gpu_funcs funcs = { #if defined(CONFIG_DEBUG_FS) || defined(CONFIG_DEV_COREDUMP) .show = adreno_show, #endif + .gpu_busy = a4xx_gpu_busy, .gpu_state_get = a4xx_gpu_state_get, .gpu_state_put = adreno_gpu_state_put, .create_address_space = adreno_create_address_space, -- 2.39.2
[PATCH 1/5] drm/msm/adreno: Use OPP for every GPU generation
Some older GPUs (namely a2xx with no opp tables at all and a320 with downstream-remnants gpu pwrlevels) used not to have OPP tables. They both however had just one frequency defined, making it extremely easy to construct such an OPP table from within the driver if need be. Do so and switch all clk_set_rate calls on core_clk to their OPP counterparts. Signed-off-by: Konrad Dybcio --- drivers/gpu/drm/msm/adreno/adreno_gpu.c | 94 +++-- drivers/gpu/drm/msm/msm_gpu.c | 4 +- drivers/gpu/drm/msm/msm_gpu_devfreq.c | 2 +- 3 files changed, 45 insertions(+), 55 deletions(-) diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c b/drivers/gpu/drm/msm/adreno/adreno_gpu.c index ce6b76c45b6f..9b940c0f063f 100644 --- a/drivers/gpu/drm/msm/adreno/adreno_gpu.c +++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.c @@ -922,73 +922,50 @@ void adreno_wait_ring(struct msm_ringbuffer *ring, uint32_t ndwords) ring->id); } -/* Get legacy powerlevels from qcom,gpu-pwrlevels and populate the opp table */ -static int adreno_get_legacy_pwrlevels(struct device *dev) -{ - struct device_node *child, *node; - int ret; - - node = of_get_compatible_child(dev->of_node, "qcom,gpu-pwrlevels"); - if (!node) { - DRM_DEV_DEBUG(dev, "Could not find the GPU powerlevels\n"); - return -ENXIO; - } - - for_each_child_of_node(node, child) { - unsigned int val; - - ret = of_property_read_u32(child, "qcom,gpu-freq", &val); - if (ret) - continue; - - /* - * Skip the intentionally bogus clock value found at the bottom - * of most legacy frequency tables - */ - if (val != 27000000) - dev_pm_opp_add(dev, val, 0); - } - - of_node_put(node); - - return 0; -} - -static void adreno_get_pwrlevels(struct device *dev, +static int adreno_get_pwrlevels(struct device *dev, struct msm_gpu *gpu) { + struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu); unsigned long freq = ULONG_MAX; struct dev_pm_opp *opp; int ret; gpu->fast_rate = 0; - /* You down with OPP? */ - if (!of_find_property(dev->of_node, "operating-points-v2", NULL)) - ret = adreno_get_legacy_pwrlevels(dev); - else { - ret = devm_pm_opp_of_add_table(dev); - if (ret) - DRM_DEV_ERROR(dev, "Unable to set the OPP table\n"); - } - - if (!ret) { + /* devm_pm_opp_of_add_table may error out but will still create an OPP table */ + ret = devm_pm_opp_of_add_table(dev); + if (ret == -ENODEV) { + /* Special cases for ancient hw with ancient DT bindings */ + if (adreno_is_a2xx(adreno_gpu)) { + dev_warn(dev, "Unable to find the OPP table. Falling back to 200 MHz.\n"); + dev_pm_opp_add(dev, 200000000, 0); + gpu->fast_rate = 200000000; + } else if (adreno_is_a320(adreno_gpu)) { + dev_warn(dev, "Unable to find the OPP table. Falling back to 450 MHz.\n"); + dev_pm_opp_add(dev, 450000000, 0); + gpu->fast_rate = 450000000; + } else { + DRM_DEV_ERROR(dev, "Unable to find the OPP table\n"); + return -ENODEV; + } + } else if (ret) { + DRM_DEV_ERROR(dev, "Unable to set the OPP table\n"); + return ret; + } else { /* Find the fastest defined rate */ opp = dev_pm_opp_find_freq_floor(dev, &freq); - if (!IS_ERR(opp)) { + + if (IS_ERR(opp)) + return PTR_ERR(opp); + else { gpu->fast_rate = freq; dev_pm_opp_put(opp); } } - if (!gpu->fast_rate) { - dev_warn(dev, - "Could not find a clock rate. Using a reasonable default\n"); - /* Pick a suitably safe clock speed for any target */ - gpu->fast_rate = 200000000; - } - DBG("fast_rate=%u, slow_rate=27000000", gpu->fast_rate); + + return 0; } int adreno_gpu_ocmem_init(struct device *dev, struct adreno_gpu *adreno_gpu, @@ -1046,6 +1023,17 @@ int adreno_gpu_init(struct drm_device *drm, struct platform_device *pdev, struct adreno_rev *rev = &config->rev; const char *gpu_name; u32 speedbin; + int ret; + + /* This can only be done here, or devm_pm_opp_set_supported_hw will WARN_ON() */ + if (IS_ERR(devm_clk_get(dev, "core"))) { + /* + * If "core" is absent, go for the legacy clock name. + * If we got this
[PATCH 0/5] OPP and devfreq for all Adrenos
This series is a combination of [1] and a subset of [2] and some new stuff. With it, devfreq is used on all a2xx-a6xx (including gmu and gmu-wrapper) and all clk_set_rate(core clock) calls are dropped in favour of dev_pm_opp_set_rate, which - drumroll - lets us scale the voltage domain. DT patches making use of that will be sent separately. On top of that, a5xx gets a call to enable icc scaling from the OPP tables. No SoCs implementing a2xx have icc support yet and a3/4xx SoCs have separate logic for that, which will be updated at a later time. Getting this in for 6.4 early would be appreciated, as that would allow for getting GMU wrapper GPUs up (without VDD&icc scaling they can only run at lower freqs, which is.. ehhh..) Changes: - a3xx busy: use the _1 counter as per msm-3.x instead of _0 - a6xx-series-opp: basically rewrite, ensure compat with all gens - a2/4xx busy: new patch - a5xx icc: new patch [1] https://lore.kernel.org/linux-arm-msm/20230130093809.2079314-1-konrad.dyb...@linaro.org/ [2] https://lore.kernel.org/linux-arm-msm/20230214173145.2482651-1-konrad.dyb...@linaro.org/ Signed-off-by: Konrad Dybcio --- Konrad Dybcio (5): drm/msm/adreno: Use OPP for every GPU generation drm/msm/a2xx: Implement .gpu_busy drm/msm/a3xx: Implement .gpu_busy drm/msm/a4xx: Implement .gpu_busy drm/msm/a5xx: Enable optional icc voting from OPP tables drivers/gpu/drm/msm/adreno/a2xx_gpu.c | 28 ++ drivers/gpu/drm/msm/adreno/a3xx_gpu.c | 11 drivers/gpu/drm/msm/adreno/a4xx_gpu.c | 11 drivers/gpu/drm/msm/adreno/a5xx_gpu.c | 4 ++ drivers/gpu/drm/msm/adreno/adreno_gpu.c | 94 +++-- drivers/gpu/drm/msm/msm_gpu.c | 4 +- drivers/gpu/drm/msm/msm_gpu_devfreq.c | 2 +- 7 files changed, 99 insertions(+), 55 deletions(-) --- base-commit: f4ed0868966d96203fee6f2782508746ded2ce3f change-id: 20230222-konrad-longbois-next-86d1a69532c2 Best regards, -- Konrad Dybcio
[PATCH] drm/i915/gsc: Fix the Driver-FLR completion
The Driver-FLR flow may inadvertently exit early before the full completion of the re-init of the internal HW state if we only poll GU_DEBUG Bit31 (polling for it to toggle from 0 -> 1). Instead we need a two-step wait-for-completion flow that also involves GU_CNTL. See the patch and new code comments for detail. This is a new direction from the HW architecture folks. v2: - Add error message for the teardown timeout (Anshuman) - Don't duplicate code in comments (Jani) Signed-off-by: Alan Previn Fixes: 5a44fcd73498 ("drm/i915/gsc: Do a driver-FLR on unload if GSC was loaded") --- drivers/gpu/drm/i915/intel_uncore.c | 13 ++++++++++++- 1 file changed, 12 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c index f018da7ebaac..f3c46352db89 100644 --- a/drivers/gpu/drm/i915/intel_uncore.c +++ b/drivers/gpu/drm/i915/intel_uncore.c @@ -2749,14 +2749,25 @@ static void driver_initiated_flr(struct intel_uncore *uncore) /* Trigger the actual Driver-FLR */ intel_uncore_rmw_fw(uncore, GU_CNTL, 0, DRIVERFLR); + /* Wait for hardware teardown to complete */ + ret = intel_wait_for_register_fw(uncore, GU_CNTL, + DRIVERFLR_STATUS, 0, + flr_timeout_ms); + if (ret) { + drm_err(&i915->drm, "Driver-FLR-teardown wait completion failed! %d\n", ret); + return; + } + + /* Wait for hardware/firmware re-init to complete */ ret = intel_wait_for_register_fw(uncore, GU_DEBUG, DRIVERFLR_STATUS, DRIVERFLR_STATUS, flr_timeout_ms); if (ret) { - drm_err(&i915->drm, "wait for Driver-FLR completion failed! %d\n", ret); + drm_err(&i915->drm, "Driver-FLR-reinit wait completion failed! %d\n", ret); return; } + /* Clear sticky completion status */ intel_uncore_write_fw(uncore, GU_DEBUG, DRIVERFLR_STATUS); } -- 2.39.0
Re: [Intel-gfx] [PATCH 2/2] Apply quirk to disable PSR 2 on Tongfang PHxTxX1 and PHxTQx1
On Wed, Feb 22, 2023 at 03:17:55PM +0100, Werner Sembach wrote: > On these Barebones PSR 2 is recognized as supported but is very buggy: > - Upper third of screen is sometimes not updated, resulting in > disappearing cursors or ghosts of already closed windows staying behind. > - Approximately 40 px from the bottom edge a 3 pixel wide strip of randomly > colored pixels is flickering. > > PSR 1 is working fine however. I wonder if this is really about the panel's PSR2 or about the userspace there not marking the dirtyfb? I know I know... it is not userspace's fault... But I wonder if the case you got here highlights the fact that we have a substantial bug in the i915 itself in regards to the PSR2 API. Jose, Jouni, ideas on how to check what could be happening here? oh, btw, Werner, do we have an open gitlab issue for this? Thanks, Rodrigo. > > Signed-off-by: Werner Sembach > Cc: > --- > drivers/gpu/drm/i915/display/intel_quirks.c | 8 ++++++++ > 1 file changed, 8 insertions(+) > > diff --git a/drivers/gpu/drm/i915/display/intel_quirks.c > b/drivers/gpu/drm/i915/display/intel_quirks.c > index ce6d0fe6448f5..eeb32d3189f5c 100644 > --- a/drivers/gpu/drm/i915/display/intel_quirks.c > +++ b/drivers/gpu/drm/i915/display/intel_quirks.c > @@ -65,6 +65,10 @@ static void quirk_no_pps_backlight_power_hook(struct > drm_i915_private *i915) > drm_info(&i915->drm, "Applying no pps backlight power quirk\n"); > } > > +/* > + * Tongfang PHxTxX1 and PHxTQx1 devices have support for PSR 2 but it is > broken > + * on Linux. PSR 1 however works just fine. > + */ > static void quirk_no_psr2(struct drm_i915_private *i915) > { > intel_set_quirk(i915, QUIRK_NO_PSR2); > @@ -205,6 +209,10 @@ static struct intel_quirk intel_quirks[] = { > /* ECS Liva Q2 */ > { 0x3185, 0x1019, 0xa94d, quirk_increase_ddi_disabled_time }, > { 0x3184, 0x1019, 0xa94d, quirk_increase_ddi_disabled_time }, > + > + /* Tongfang PHxTxX1 and PHxTQx1/TUXEDO InfinityBook 14 Gen6 */ > + { 0x9a49, 0x1d05, 0x1105, quirk_no_psr2 }, > + { 0x9a49, 0x1d05, 0x114c, quirk_no_psr2 }, > }; > > void intel_init_quirks(struct drm_i915_private *i915) > -- > 2.34.1 >
Re: [PATCH 3/4] drm/msm/adreno: drop redundant pm_runtime_disable()
On Tue, Feb 21, 2023 at 2:16 AM Johan Hovold wrote: > > Since commit 4b18299b3365 ("drm/msm/adreno: Defer enabling runpm until > hw_init()") runtime PM is no longer enabled at adreno_gpu_init(), which > means that there are no longer any bind() error paths for which > adreno_gpu_cleanup() is called with runtime PM enabled. > > As the runtime PM enable on first open() is balanced by the > pm_runtime_force_suspend() call at unbind(), adreno_gpu_cleanup() is now > always called with runtime PM disabled so that its pm_runtime_disable() > call can be removed. > > Signed-off-by: Johan Hovold > --- > drivers/gpu/drm/msm/adreno/adreno_gpu.c | 5 - > 1 file changed, 5 deletions(-) > > diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c > b/drivers/gpu/drm/msm/adreno/adreno_gpu.c > index ce6b76c45b6f..1101b8234b49 100644 > --- a/drivers/gpu/drm/msm/adreno/adreno_gpu.c > +++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.c > @@ -1082,15 +1082,10 @@ int adreno_gpu_init(struct drm_device *drm, struct > platform_device *pdev, > > void adreno_gpu_cleanup(struct adreno_gpu *adreno_gpu) > { > - struct msm_gpu *gpu = &adreno_gpu->base; > - struct msm_drm_private *priv = gpu->dev ? gpu->dev->dev_private : > NULL; > unsigned int i; > > for (i = 0; i < ARRAY_SIZE(adreno_gpu->info->fw); i++) > release_firmware(adreno_gpu->fw[i]); > > - if (priv && pm_runtime_enabled(&priv->gpu_pdev->dev)) > - pm_runtime_disable(&priv->gpu_pdev->dev); > - Maybe WARN_ON(priv && pm_runtime_enabled(&priv->gpu_pdev->dev))? BR, -R > msm_gpu_cleanup(&adreno_gpu->base); > } > -- > 2.39.2 >
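Rob's suggestion would keep the check as a sanity assertion instead of dropping it entirely, roughly (a sketch based on the quoted function, not a posted patch):

	void adreno_gpu_cleanup(struct adreno_gpu *adreno_gpu)
	{
		struct msm_gpu *gpu = &adreno_gpu->base;
		struct msm_drm_private *priv = gpu->dev ? gpu->dev->dev_private : NULL;
		unsigned int i;

		for (i = 0; i < ARRAY_SIZE(adreno_gpu->info->fw); i++)
			release_firmware(adreno_gpu->fw[i]);

		/* Runtime PM should already be disabled by unbind(); scream if not */
		WARN_ON(priv && pm_runtime_enabled(&priv->gpu_pdev->dev));

		msm_gpu_cleanup(&adreno_gpu->base);
	}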
Re: [PATCH v4 01/14] dma-buf/dma-fence: Add deadline awareness
On Wed, Feb 22, 2023 at 9:33 AM Tvrtko Ursulin wrote: > > > On 22/02/2023 17:16, Rob Clark wrote: > > On Wed, Feb 22, 2023 at 9:05 AM Tvrtko Ursulin > > wrote: > >> > >> > >> On 22/02/2023 15:28, Christian König wrote: > >>> Am 22.02.23 um 11:23 schrieb Tvrtko Ursulin: > > On 18/02/2023 21:15, Rob Clark wrote: > > From: Rob Clark > > > > Add a way to hint to the fence signaler of an upcoming deadline, such as > > vblank, which the fence waiter would prefer not to miss. This is to aid > > the fence signaler in making power management decisions, like boosting > > frequency as the deadline approaches and awareness of missing deadlines > > so that can be factored in to the frequency scaling. > > > > v2: Drop dma_fence::deadline and related logic to filter duplicate > > deadlines, to avoid increasing dma_fence size. The fence-context > > implementation will need similar logic to track deadlines of all > > the fences on the same timeline. [ckoenig] > > v3: Clarify locking wrt. set_deadline callback > > > > Signed-off-by: Rob Clark > > Reviewed-by: Christian König > > --- > >drivers/dma-buf/dma-fence.c | 20 > >include/linux/dma-fence.h | 20 > >2 files changed, 40 insertions(+) > > > > diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c > > index 0de0482cd36e..763b32627684 100644 > > --- a/drivers/dma-buf/dma-fence.c > > +++ b/drivers/dma-buf/dma-fence.c > > @@ -912,6 +912,26 @@ dma_fence_wait_any_timeout(struct dma_fence > > **fences, uint32_t count, > >} > >EXPORT_SYMBOL(dma_fence_wait_any_timeout); > >+ > > +/** > > + * dma_fence_set_deadline - set desired fence-wait deadline > > + * @fence:the fence that is to be waited on > > + * @deadline: the time by which the waiter hopes for the fence to be > > + *signaled > > + * > > + * Inform the fence signaler of an upcoming deadline, such as > > vblank, by > > + * which point the waiter would prefer the fence to be signaled by. > > This > > + * is intended to give feedback to the fence signaler to aid in power > > + * management decisions, such as boosting GPU frequency if a periodic > > + * vblank deadline is approaching. > > + */ > > +void dma_fence_set_deadline(struct dma_fence *fence, ktime_t deadline) > > +{ > > +if (fence->ops->set_deadline && !dma_fence_is_signaled(fence)) > > +fence->ops->set_deadline(fence, deadline); > > +} > > +EXPORT_SYMBOL(dma_fence_set_deadline); > > + > >/** > > * dma_fence_describe - Dump fence describtion into seq_file > > * @fence: the 6fence to describe > > diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h > > index 775cdc0b4f24..d77f6591c453 100644 > > --- a/include/linux/dma-fence.h > > +++ b/include/linux/dma-fence.h > > @@ -99,6 +99,7 @@ enum dma_fence_flag_bits { > >DMA_FENCE_FLAG_SIGNALED_BIT, > >DMA_FENCE_FLAG_TIMESTAMP_BIT, > >DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT, > > +DMA_FENCE_FLAG_HAS_DEADLINE_BIT, > > Would this bit be better left out from core implementation, given how > the approach is the component which implements dma-fence has to track > the actual deadline and all? > > Also taking a step back - are we all okay with starting to expand the > relatively simple core synchronisation primitive with side channel > data like this? What would be the criteria for what side channel data > would be acceptable? Taking note the thing lives outside drivers/gpu/. > >>> > >>> I had similar concerns and it took me a moment as well to understand the > >>> background why this is necessary. I essentially don't see much other > >>> approach we could do. 
> >>> > >>> Yes, this is GPU/CRTC specific, but we somehow need a common interface > >>> for communicating it between drivers and that's the dma_fence object as > >>> far as I can see. > >> > >> Yeah I also don't see any other easy options. Just wanted to raise this > >> as something which probably needs some wider acks. > >> > >> Also what about the "low level" part of my question about the reason, or > >> benefits, of defining the deadline bit in the common layer? > > > > We could leave DMA_FENCE_FLAG_HAS_DEADLINE_BIT out, but OTOH managing > > a bitmask that is partially defined in core enum and partially in > > backend-driver has it's own drawbacks, and it isn't like we are > > running out of bits.. :shrug: > > There is DMA_FENCE_FLAG_USER_BITS onwards which implementations could > use to store their stuff? > > And if we skip forward to "drm/scheduler: Add fence deadline support" > that's the only place bit is used, right? Would it simply work to look > at drm_sched_fence->deadline == 0 as b
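For a sense of what a fence provider does with the hook being discussed: it might track only the earliest requested deadline and kick its frequency governor when the deadline tightens. A minimal sketch (struct example_fence and its fields are hypothetical; only the set_deadline hook signature comes from the patch under review):

	static void example_fence_set_deadline(struct dma_fence *f, ktime_t deadline)
	{
		struct example_fence *ef = container_of(f, struct example_fence, base);
		unsigned long flags;

		spin_lock_irqsave(f->lock, flags);
		/* Only ever tighten the deadline, never relax it */
		if (!ef->has_deadline || ktime_before(deadline, ef->deadline)) {
			ef->deadline = deadline;
			ef->has_deadline = true;
			/* e.g. queue work here to bump the GPU clocks */
		}
		spin_unlock_irqrestore(f->lock, flags);
	}

Whether the "seen a deadline" state lives in a core flag bit, in DMA_FENCE_FLAG_USER_BITS, or in a driver-private field like the one above is exactly the open question in this subthread.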
Re: [PATCH v2 1/2] drm: Introduce plane SIZE_HINTS property
On Tue, Feb 14, 2023 at 01:19:51PM +0200, Ville Syrjälä wrote: > On Tue, Feb 14, 2023 at 12:01:49PM +0100, Jonas Ådahl wrote: > > On Tue, Feb 14, 2023 at 12:28:44PM +0200, Ville Syrjälä wrote: > > > On Tue, Feb 14, 2023 at 10:54:27AM +0100, Jonas Ådahl wrote: > > > > On Tue, Feb 14, 2023 at 11:25:56AM +0200, Ville Syrjälä wrote: > > > > > On Thu, Feb 09, 2023 at 03:16:23PM +0100, Jonas Ådahl wrote: > > > > > > On Wed, Feb 08, 2023 at 11:10:16PM +0200, Ville Syrjala wrote: > > > > > > > From: Ville Syrjälä > > > > > > > > > > > > > > Add a new immutable plane property by which a plane can advertise > > > > > > > a handful of recommended plane sizes. This would be mostly exposed > > > > > > > by cursor planes as a slightly more capable replacement for > > > > > > > the DRM_CAP_CURSOR_WIDTH/HEIGHT caps, which can only declare > > > > > > > a one size fits all limit for the whole device. > > > > > > > > > > > > > > Currently eg. amdgpu/i915/nouveau just advertise the max cursor > > > > > > > size via the cursor size caps. But always using the max sized > > > > > > > cursor can waste a surprising amount of power, so a better > > > > > > > strategy is desirable. > > > > > > > > > > > > > > Most other drivers don't specify any cursor size at all, in > > > > > > > which case the ioctl code just claims that 64x64 is a great > > > > > > > choice. Whether that is actually true is debatable. > > > > > > > > > > > > > > A poll of various compositor developers informs us that > > > > > > > blindly probing with setcursor/atomic ioctl to determine > > > > > > > suitable cursor sizes is not acceptable, thus the > > > > > > > introduction of the new property to supplant the cursor > > > > > > > size caps. The compositor will now be free to select a > > > > > > > more optimal cursor size from the short list of options. > > > > > > > > > > > > > > Note that the reported sizes (either via the property or the > > > > > > > caps) make no claims about things such as plane scaling. So > > > > > > > these things should only really be consulted for simple > > > > > > > "cursor like" use cases. 
> > > > > > > > > > > > > > v2: Try to add some docs > > > > > > > > > > > > > > Cc: Simon Ser > > > > > > > Cc: Jonas Ådahl > > > > > > > Cc: Daniel Stone > > > > > > > Cc: Pekka Paalanen > > > > > > > Acked-by: Harry Wentland > > > > > > > Signed-off-by: Ville Syrjälä > > > > > > > --- > > > > > > > drivers/gpu/drm/drm_mode_config.c | 7 + > > > > > > > drivers/gpu/drm/drm_plane.c | 48 > > > > > > > +++ > > > > > > > include/drm/drm_mode_config.h | 5 > > > > > > > include/drm/drm_plane.h | 4 +++ > > > > > > > include/uapi/drm/drm_mode.h | 11 +++ > > > > > > > 5 files changed, 75 insertions(+) > > > > > > > > > > > > > > diff --git a/drivers/gpu/drm/drm_mode_config.c > > > > > > > b/drivers/gpu/drm/drm_mode_config.c > > > > > > > index 87eb591fe9b5..21860f94a18c 100644 > > > > > > > --- a/drivers/gpu/drm/drm_mode_config.c > > > > > > > +++ b/drivers/gpu/drm/drm_mode_config.c > > > > > > > @@ -374,6 +374,13 @@ static int > > > > > > > drm_mode_create_standard_properties(struct drm_device *dev) > > > > > > > return -ENOMEM; > > > > > > > dev->mode_config.modifiers_property = prop; > > > > > > > > > > > > > > + prop = drm_property_create(dev, > > > > > > > +DRM_MODE_PROP_IMMUTABLE | > > > > > > > DRM_MODE_PROP_BLOB, > > > > > > > +"SIZE_HINTS", 0); > > > > > > > + if (!prop) > > > > > > > + return -ENOMEM; > > > > > > > + dev->mode_config.size_hints_property = prop; > > > > > > > + > > > > > > > return 0; > > > > > > > } > > > > > > > > > > > > > > diff --git a/drivers/gpu/drm/drm_plane.c > > > > > > > b/drivers/gpu/drm/drm_plane.c > > > > > > > index 24e7998d1731..ae51b1f83755 100644 > > > > > > > --- a/drivers/gpu/drm/drm_plane.c > > > > > > > +++ b/drivers/gpu/drm/drm_plane.c > > > > > > > @@ -140,6 +140,21 @@ > > > > > > > * DRM_FORMAT_MOD_LINEAR. Before linux kernel release v5.1 > > > > > > > there have been > > > > > > > * various bugs in this area with inconsistencies between > > > > > > > the capability > > > > > > > * flag and per-plane properties. > > > > > > > + * > > > > > > > + * SIZE_HINTS: > > > > > > > + * Blob property which contains the set of recommended plane > > > > > > > size > > > > > > > + * which can used for simple "cursor like" use cases (eg. no > > > > > > > scaling). > > > > > > > + * Using these hints frees userspace from extensive probing > > > > > > > of > > > > > > > + * supported plane sizes through atomic/setcursor ioctls. > > > > > > > + * > > > > > > > + * For optimal usage userspace should pick the smallest size > > > > > > > + * that satisfies its own requirements. > > > > > > > + * > > > > > > > + * The blob contains an array of struct drm_plane_size_hint. > > > > > > > + * > > > >
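On the consumer side, a compositor reading the blob would pick the smallest advertised hint that fits its cursor image, something along these lines (a userspace sketch; the two __u16-style fields of struct drm_plane_size_hint are an assumption based on the quoted diffstat, and the helper around them is hypothetical):

	#include <stdint.h>
	#include <stddef.h>

	struct drm_plane_size_hint {
		uint16_t width;
		uint16_t height;
	};

	/* hints: blob data; count: blob length / sizeof(*hints) */
	static int pick_cursor_size(const struct drm_plane_size_hint *hints,
				    size_t count, uint16_t need_w, uint16_t need_h,
				    uint16_t *out_w, uint16_t *out_h)
	{
		size_t i, best = count;

		for (i = 0; i < count; i++) {
			if (hints[i].width < need_w || hints[i].height < need_h)
				continue;
			/* No ordering is guaranteed, so scan for the smallest fit */
			if (best == count ||
			    (uint32_t)hints[i].width * hints[i].height <
			    (uint32_t)hints[best].width * hints[best].height)
				best = i;
		}
		if (best == count)
			return -1; /* fall back to DRM_CAP_CURSOR_WIDTH/HEIGHT */
		*out_w = hints[best].width;
		*out_h = hints[best].height;
		return 0;
	}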
Re: [PATCH 1/2] Mips: ls2k1000: dts: add the display controller device node
On 22/02/2023 17:55, suijingfeng wrote: > The display controller is a pci device, its pci vendor id is > 0x0014, its pci device id is 0x7a06. > > Signed-off-by: suijingfeng > --- > .../boot/dts/loongson/loongson64-2k1000.dtsi | 21 +++ > 1 file changed, 21 insertions(+) > > diff --git a/arch/mips/boot/dts/loongson/loongson64-2k1000.dtsi > b/arch/mips/boot/dts/loongson/loongson64-2k1000.dtsi > index 8143a6e3..a528af3977d9 100644 > --- a/arch/mips/boot/dts/loongson/loongson64-2k1000.dtsi > +++ b/arch/mips/boot/dts/loongson/loongson64-2k1000.dtsi > @@ -31,6 +31,18 @@ memory@200000 { > <0x00000001 0x10000000 0x00000001 0xb0000000>; /* 6912 > MB at 4352MB */ > }; > > + reserved-memory { > + #address-cells = <2>; > + #size-cells = <2>; > + ranges; > + > + display_reserved: framebuffer@30000000 { > + compatible = "shared-dma-pool"; > + reg = <0x0 0x30000000 0x0 0x04000000>; /* 64M */ > + linux,cma-default; > + }; > + }; > + > cpu_clk: cpu_clk { > #clock-cells = <0>; > compatible = "fixed-clock"; > @@ -198,6 +210,15 @@ sata@8,0 { > interrupt-parent = <&liointc0>; > }; > > + display-controller@6,0 { > + compatible = "loongson,ls2k1000-dc"; > + > + reg = <0x3000 0x0 0x0 0x0 0x0>; > + interrupts = <28 IRQ_TYPE_LEVEL_LOW>; > + interrupt-parent = <&liointc0>; > + memory-region = <&display_reserved>; NAK. Test your code against the bindings you send. It's the same patchset. You basically send something which at the same moment is incorrect. Best regards, Krzysztof
Re: [PATCH 2/2] dt-bindings: display: Add Loongson display controller
On 22/02/2023 17:55, suijingfeng wrote: > This patch adds a trivial DT usage for the Loongson display controller found > in the LS2K1000 SoC. Trivial yet so many things to improve... if you only started from a recent kernel tree (since you Cced the wrong address, I doubt you did) and bindings you would avoid half of these comments. > > Signed-off-by: suijingfeng > --- > .../loongson/loongson,display-controller.yaml | 58 +++ > 1 file changed, 58 insertions(+) > create mode 100644 > Documentation/devicetree/bindings/display/loongson/loongson,display-controller.yaml > > diff --git > a/Documentation/devicetree/bindings/display/loongson/loongson,display-controller.yaml > > b/Documentation/devicetree/bindings/display/loongson/loongson,display-controller.yaml > new file mode 100644 > index ..98b78f449a80 > --- /dev/null > +++ > b/Documentation/devicetree/bindings/display/loongson/loongson,display-controller.yaml Filename based on compatible, so "loongson,ls2k1000-dc.yaml" > @@ -0,0 +1,58 @@ > +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) > +%YAML 1.2 > +--- > +$id: > http://devicetree.org/schemas/display/loongson/loongson,display-controller.yaml# > +$schema: http://devicetree.org/meta-schemas/core.yaml# > + > +title: Loongson Display Controller Device Tree Bindings Drop "Device Tree Bindings" > + > +maintainers: > + - Sui Jingfeng > + > +description: |+ Drop |+ > + No need for blank line. Do you see it anywhere else in the bindings? > + The display controller is a PCI device, it has two display pipes. > + For the DC in LS2K1000 each way has a DVO output interface which > + provides RGB888 signals, vertical & horizontal synchronisations > + and the pixel clock. Each CRTC is able to support 1920x1080@60Hz, > + the maximum resolution is 2048x2048 according to the hardware spec. > + > +properties: > + $nodename: > +pattern: "^display-controller@[0-9a-f],[0-9a-f]$" Drop nodename. > + > + compatible: > +oneOf: Drop oneOf > + - items: and items... > + - enum: > + - loongson,ls2k1000-dc > + reg: > +maxItems: 1 > + > + interrupts: > +maxItems: 1 > + > +required: > + - compatible > + - reg > + - interrupts > + > +additionalProperties: false > + > +examples: > + - | > +#include > +bus { > + Drop blank line. > +#address-cells = <3>; > +#size-cells = <2>; > +#interrupt-cells = <2>; Why do you need interrupt-cells? > + > +display-controller@6,0 { > +compatible = "loongson,ls2k1000-dc"; > +reg = <0x3000 0x0 0x0 0x0 0x0>; > +interrupts = <28 IRQ_TYPE_LEVEL_LOW>; > +}; > +}; > + > +... Best regards, Krzysztof
Re: [PATCH] drm/msm: return early when allocating fbdev fails
On Wed, Feb 22, 2023 at 05:09:40PM +0100, Thomas Zimmermann wrote: > Hi > > Am 22.02.23 um 16:56 schrieb Tom Rix: > > building with clang and W=1 reports > > drivers/gpu/drm/msm/msm_fbdev.c:144:6: error: variable 'helper' is used > >uninitialized whenever 'if' condition is true > > [-Werror,-Wsometimes-uninitialized] > >if (!fbdev) > >^~ > > > > helper is only initialized after fbdev succeeds, so is in a garbage state at > > the fail: label. There is nothing to unwinded if fbdev alloaction fails, > > return NULL. > > > > Fixes: 3fb1f62f80a1 ("drm/fb-helper: Remove drm_fb_helper_unprepare() from > > drm_fb_helper_fini()") > > Signed-off-by: Tom Rix > > Already fixed here: > https://lore.kernel.org/dri-devel/08e3340e-b459-0e60-4bba-30716b675...@suse.de/T/#t There is also: ../drivers/gpu/drm/omapdrm/omap_fbdev.c:235:6: error: variable 'helper' is used uninitialized whenever 'if' condition is true [-Werror,-Wsometimes-uninitialized] if (!fbdev) ^~ ../drivers/gpu/drm/omapdrm/omap_fbdev.c:259:26: note: uninitialized use occurs here drm_fb_helper_unprepare(helper); ^~ ../drivers/gpu/drm/omapdrm/omap_fbdev.c:235:2: note: remove the 'if' if its condition is always false if (!fbdev) ^~~ ../drivers/gpu/drm/omapdrm/omap_fbdev.c:228:30: note: initialize the variable 'helper' to silence this warning struct drm_fb_helper *helper; ^ = NULL 1 error generated. Is the fix the same as the one you have linked above? Cheers, Nathan
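For what it's worth, the shape of the fix both reports converge on reduces to the following sketch. example_fbdev and example_fbdev_setup are hypothetical names; only drm_fb_helper_unprepare() is the real helper from the Fixes commit. The point is that the unwind label must only be reachable once helper has been initialized:

#include <linux/slab.h>
#include <drm/drm_fb_helper.h>

struct example_fbdev {
	struct drm_fb_helper base;
};

/* Hypothetical stand-in for whatever setup may fail after allocation. */
static int example_fbdev_setup(struct drm_fb_helper *helper)
{
	return -ENODEV;
}

static struct drm_fb_helper *example_fbdev_init(void)
{
	struct example_fbdev *fbdev;
	struct drm_fb_helper *helper;

	fbdev = kzalloc(sizeof(*fbdev), GFP_KERNEL);
	if (!fbdev)
		return NULL;	/* nothing allocated yet, nothing to unwind */

	helper = &fbdev->base;
	if (example_fbdev_setup(helper))
		goto fail;

	return helper;

fail:
	drm_fb_helper_unprepare(helper);	/* helper is initialized here */
	kfree(fbdev);
	return NULL;
}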
Re: [PATCH drm-next v2 05/16] drm: manager to keep track of GPUs VA mappings
On 2/21/23 19:20, Liam R. Howlett wrote: * Danilo Krummrich [230217 08:45]: Add infrastructure to keep track of GPU virtual address (VA) mappings with a decicated VA space manager implementation. New UAPIs, motivated by Vulkan sparse memory bindings graphics drivers start implementing, allow userspace applications to request multiple and arbitrary GPU VA mappings of buffer objects. The DRM GPU VA manager is intended to serve the following purposes in this context. 1) Provide infrastructure to track GPU VA allocations and mappings, making use of the maple_tree. 2) Generically connect GPU VA mappings to their backing buffers, in particular DRM GEM objects. 3) Provide a common implementation to perform more complex mapping operations on the GPU VA space. In particular splitting and merging of GPU VA mappings, e.g. for intersecting mapping requests or partial unmap requests. Suggested-by: Dave Airlie Signed-off-by: Danilo Krummrich --- Documentation/gpu/drm-mm.rst| 31 + drivers/gpu/drm/Makefile|1 + drivers/gpu/drm/drm_gem.c |3 + drivers/gpu/drm/drm_gpuva_mgr.c | 1704 +++ include/drm/drm_drv.h |6 + include/drm/drm_gem.h | 75 ++ include/drm/drm_gpuva_mgr.h | 714 + 7 files changed, 2534 insertions(+) create mode 100644 drivers/gpu/drm/drm_gpuva_mgr.c create mode 100644 include/drm/drm_gpuva_mgr.h diff --git a/Documentation/gpu/drm-mm.rst b/Documentation/gpu/drm-mm.rst index a52e6f4117d6..c9f120cfe730 100644 --- a/Documentation/gpu/drm-mm.rst +++ b/Documentation/gpu/drm-mm.rst @@ -466,6 +466,37 @@ DRM MM Range Allocator Function References .. kernel-doc:: drivers/gpu/drm/drm_mm.c :export: ... + +/** + * drm_gpuva_remove_iter - removes the iterators current element + * @it: the &drm_gpuva_iterator + * + * This removes the element the iterator currently points to. + */ +void +drm_gpuva_iter_remove(struct drm_gpuva_iterator *it) +{ + mas_erase(&it->mas); +} +EXPORT_SYMBOL(drm_gpuva_iter_remove); + +/** + * drm_gpuva_insert - insert a &drm_gpuva + * @mgr: the &drm_gpuva_manager to insert the &drm_gpuva in + * @va: the &drm_gpuva to insert + * @addr: the start address of the GPU VA + * @range: the range of the GPU VA + * + * Insert a &drm_gpuva with a given address and range into a + * &drm_gpuva_manager. + * + * Returns: 0 on success, negative error code on failure. + */ +int +drm_gpuva_insert(struct drm_gpuva_manager *mgr, +struct drm_gpuva *va) +{ + u64 addr = va->va.addr; + u64 range = va->va.range; + MA_STATE(mas, &mgr->va_mt, addr, addr + range - 1); + struct drm_gpuva_region *reg = NULL; + int ret; + + if (unlikely(!drm_gpuva_in_mm_range(mgr, addr, range))) + return -EINVAL; + + if (unlikely(drm_gpuva_in_kernel_region(mgr, addr, range))) + return -EINVAL; + + if (mgr->flags & DRM_GPUVA_MANAGER_REGIONS) { + reg = drm_gpuva_in_region(mgr, addr, range); + if (unlikely(!reg)) + return -EINVAL; + } + - + if (unlikely(drm_gpuva_find_first(mgr, addr, range))) + return -EEXIST; + + ret = mas_store_gfp(&mas, va, GFP_KERNEL); mas_walk() will set the internal maple state to the limits to what it finds. 
So, instead of an iterator, you can use the walk function and ensure there is a large enough area in the existing NULL:

	/*
	 * Nothing at addr, mas now points to the location where the store
	 * would happen
	 */
	if (mas_walk(&mas))
		return -EEXIST;

	/* The NULL entry ends at mas.last, make sure there is room */
	if (mas.last < (addr + range - 1))
		return -EEXIST;

	/* Limit the store size to the correct end address, and store */
	mas.last = addr + range - 1;
	ret = mas_store_gfp(&mas, va, GFP_KERNEL);

Would this variant be significantly more efficient? Also, would this also work while already walking the tree? To remove an entry while walking the tree I have a separate function drm_gpuva_iter_remove(). Would I need something similar for inserting entries? I already provided this example in a separate mail thread, but it may make sense to move this to the mailing list: In __drm_gpuva_sm_map() we're iterating a given range of the tree, where the given range is the size of the newly requested mapping. __drm_gpuva_sm_map() invokes a callback for each sub-operation that needs to be taken in order to fulfill this mapping request. In most cases such a callback just creates a drm_gpuva_op object and stores it in a list. However, drivers can also implement the callback, such that they directly execute this operation within the callback. Let's have a look at the following example:

	      0     a     2
	old:  |-----------|       (bo_offset=n)

	            1     b     3
	req:        |-----------| (bo_offset=m)

	      0  a' 1     b     3
	new:  |-----|-----------|
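As an aside for readers, the remove-while-iterating pattern that drm_gpuva_iter_remove() wraps looks roughly like this when spelled out against the advanced API. This is a sketch: should_drop() is a placeholder predicate, and the tree's internal lock is taken here purely for illustration:

#include <linux/maple_tree.h>

static bool should_drop(void *entry)
{
	return false;	/* placeholder predicate */
}

static void drop_matching(struct maple_tree *mt)
{
	MA_STATE(mas, mt, 0, 0);
	void *entry;

	mas_lock(&mas);
	mas_for_each(&mas, entry, ULONG_MAX) {
		if (should_drop(entry))
			mas_erase(&mas);	/* erase what the state points at */
	}
	mas_unlock(&mas);
}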
[ANNOUNCE] xf86-video-amdgpu 23.0.0
Alan Coopersmith (2): Update URLs to reflect gitlab migration gitlab CI: enable gitlab's builtin static analysis Kai-Heng Feng (1): Initialize present extension for GPU screen Lukasz Spintzyk (3): Use randr_crtc_covering_drawable used in modesetting Prefer crtc of primary output for synchronization when screen has to crtcs with the same coverage Do not consider disabled crtc anymore when looking for xf86crtc covering drawable. Mario Kleiner (1): Fix primary output handling in amdgpu_crtc_covering_box(). Pierre-Eric Pelloux-Prayer (1): Use DRM_CAP_CURSOR_WIDTH/HEIGHT if possible Shashank Sharma (2): config: Add hotplug driver name Bump version for the 23.0.0 release tiancyin (1): Fix screen corruption on secondary GPU Łukasz Spintzyk (1): amdgpu: fixup driver for new X server ABI git tag: xf86-video-amdgpu-23.0.0 https://xorg.freedesktop.org/archive/individual/driver/xf86-video-amdgpu-23.0.0.tar.gz SHA256: 08c38287d39b999fd61ecb6e7b23d5079762e2b4b2179b3567973ed9aaf71222 xf86-video-amdgpu-23.0.0.tar.gz SHA512: ea4843bf10e12ede71a338f84bc667fe80b75de88e677161811a3de4647382bbfe44b720439e2ea7768961b8a7f9cc31362da454038921ec2c577f4955476eec xf86-video-amdgpu-23.0.0.tar.gz PGP: https://xorg.freedesktop.org/archive/individual/driver/xf86-video-amdgpu-23.0.0.tar.gz.sig https://xorg.freedesktop.org/archive/individual/driver/xf86-video-amdgpu-23.0.0.tar.xz SHA256: 4f04f0ea66f3ced0dcc58f617409860163a19c4e8c285cfb5285f36ba09cc061 xf86-video-amdgpu-23.0.0.tar.xz SHA512: bf26f147629a34e84a0ae8435119e170b9c95edafcab1995b63bb8f55abef32f2efbf4536eb070e64b2ae1460424b1b27a4206cb9836d33ddc6dfbee404f718b xf86-video-amdgpu-23.0.0.tar.xz PGP: https://xorg.freedesktop.org/archive/individual/driver/xf86-video-amdgpu-23.0.0.tar.xz.sig
Re: [PATCH 1/2] drm/msm: drop unused ring variable in msm_ioctl_gem_submit()
On Fri, Feb 17, 2023 at 5:25 PM Dmitry Baryshkov wrote: > > The variable ring is not used by msm_parse_deps() and > msm_ioctl_gem_submit() and thus can be dropped. > > Signed-off-by: Dmitry Baryshkov > --- > drivers/gpu/drm/msm/msm_gem_submit.c | 10 +++--- > drivers/gpu/drm/msm/msm_gpu_trace.h | 10 -- > 2 files changed, 7 insertions(+), 13 deletions(-) > > diff --git a/drivers/gpu/drm/msm/msm_gem_submit.c > b/drivers/gpu/drm/msm/msm_gem_submit.c > index ac8ed731f76d..a539eb31042f 100644 > --- a/drivers/gpu/drm/msm/msm_gem_submit.c > +++ b/drivers/gpu/drm/msm/msm_gem_submit.c > @@ -550,8 +550,7 @@ static struct drm_syncobj **msm_parse_deps(struct > msm_gem_submit *submit, > struct drm_file *file, > uint64_t in_syncobjs_addr, > uint32_t nr_in_syncobjs, > - size_t syncobj_stride, > - struct msm_ringbuffer *ring) > + size_t syncobj_stride) > { > struct drm_syncobj **syncobjs = NULL; > struct drm_msm_gem_submit_syncobj syncobj_desc = {0}; > @@ -722,7 +721,6 @@ int msm_ioctl_gem_submit(struct drm_device *dev, void > *data, > struct msm_gem_submit *submit; > struct msm_gpu *gpu = priv->gpu; > struct msm_gpu_submitqueue *queue; > - struct msm_ringbuffer *ring; > struct msm_submit_post_dep *post_deps = NULL; > struct drm_syncobj **syncobjs_to_reset = NULL; > int out_fence_fd = -1; > @@ -760,8 +758,6 @@ int msm_ioctl_gem_submit(struct drm_device *dev, void > *data, > if (!queue) > return -ENOENT; > > - ring = gpu->rb[queue->ring_nr]; > - > if (args->flags & MSM_SUBMIT_FENCE_FD_OUT) { > out_fence_fd = get_unused_fd_flags(O_CLOEXEC); > if (out_fence_fd < 0) { > @@ -774,7 +770,7 @@ int msm_ioctl_gem_submit(struct drm_device *dev, void > *data, > if (IS_ERR(submit)) > return PTR_ERR(submit); > > - trace_msm_gpu_submit(pid_nr(submit->pid), ring->id, submit->ident, > + trace_msm_gpu_submit(pid_nr(submit->pid), submit->ident, > args->nr_bos, args->nr_cmds); Please don't remove things from the tracepoint, we have userspace tools that use the tracepoints.. Other parts look ok. BR, -R > > ret = mutex_lock_interruptible(&queue->lock); > @@ -803,7 +799,7 @@ int msm_ioctl_gem_submit(struct drm_device *dev, void > *data, > syncobjs_to_reset = msm_parse_deps(submit, file, >args->in_syncobjs, >args->nr_in_syncobjs, > - args->syncobj_stride, > ring); > + args->syncobj_stride); > if (IS_ERR(syncobjs_to_reset)) { > ret = PTR_ERR(syncobjs_to_reset); > goto out_unlock; > diff --git a/drivers/gpu/drm/msm/msm_gpu_trace.h > b/drivers/gpu/drm/msm/msm_gpu_trace.h > index ac40d857bc45..12ef10f1de4c 100644 > --- a/drivers/gpu/drm/msm/msm_gpu_trace.h > +++ b/drivers/gpu/drm/msm/msm_gpu_trace.h > @@ -9,24 +9,22 @@ > #define TRACE_INCLUDE_FILE msm_gpu_trace > > TRACE_EVENT(msm_gpu_submit, > - TP_PROTO(pid_t pid, u32 ringid, u32 id, u32 nr_bos, u32 nr_cmds), > - TP_ARGS(pid, ringid, id, nr_bos, nr_cmds), > + TP_PROTO(pid_t pid, u32 id, u32 nr_bos, u32 nr_cmds), > + TP_ARGS(pid, id, nr_bos, nr_cmds), > TP_STRUCT__entry( > __field(pid_t, pid) > __field(u32, id) > - __field(u32, ringid) > __field(u32, nr_cmds) > __field(u32, nr_bos) > ), > TP_fast_assign( > __entry->pid = pid; > __entry->id = id; > - __entry->ringid = ringid; > __entry->nr_bos = nr_bos; > __entry->nr_cmds = nr_cmds > ), > - TP_printk("id=%d pid=%d ring=%d bos=%d cmds=%d", > - __entry->id, __entry->pid, __entry->ringid, > + TP_printk("id=%d pid=%d bos=%d cmds=%d", > + __entry->id, __entry->pid, > __entry->nr_bos, __entry->nr_cmds) > ); > > -- > 2.39.1 >
Re: [PATCH v4 01/14] dma-buf/dma-fence: Add deadline awareness
On 22/02/2023 17:16, Rob Clark wrote: On Wed, Feb 22, 2023 at 9:05 AM Tvrtko Ursulin wrote: On 22/02/2023 15:28, Christian König wrote: Am 22.02.23 um 11:23 schrieb Tvrtko Ursulin: On 18/02/2023 21:15, Rob Clark wrote: From: Rob Clark Add a way to hint to the fence signaler of an upcoming deadline, such as vblank, which the fence waiter would prefer not to miss. This is to aid the fence signaler in making power management decisions, like boosting frequency as the deadline approaches and awareness of missing deadlines so that can be factored in to the frequency scaling. v2: Drop dma_fence::deadline and related logic to filter duplicate deadlines, to avoid increasing dma_fence size. The fence-context implementation will need similar logic to track deadlines of all the fences on the same timeline. [ckoenig] v3: Clarify locking wrt. set_deadline callback Signed-off-by: Rob Clark Reviewed-by: Christian König --- drivers/dma-buf/dma-fence.c | 20 include/linux/dma-fence.h | 20 2 files changed, 40 insertions(+) diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c index 0de0482cd36e..763b32627684 100644 --- a/drivers/dma-buf/dma-fence.c +++ b/drivers/dma-buf/dma-fence.c @@ -912,6 +912,26 @@ dma_fence_wait_any_timeout(struct dma_fence **fences, uint32_t count, } EXPORT_SYMBOL(dma_fence_wait_any_timeout); + +/** + * dma_fence_set_deadline - set desired fence-wait deadline + * @fence:the fence that is to be waited on + * @deadline: the time by which the waiter hopes for the fence to be + *signaled + * + * Inform the fence signaler of an upcoming deadline, such as vblank, by + * which point the waiter would prefer the fence to be signaled by. This + * is intended to give feedback to the fence signaler to aid in power + * management decisions, such as boosting GPU frequency if a periodic + * vblank deadline is approaching. + */ +void dma_fence_set_deadline(struct dma_fence *fence, ktime_t deadline) +{ +if (fence->ops->set_deadline && !dma_fence_is_signaled(fence)) +fence->ops->set_deadline(fence, deadline); +} +EXPORT_SYMBOL(dma_fence_set_deadline); + /** * dma_fence_describe - Dump fence describtion into seq_file * @fence: the 6fence to describe diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h index 775cdc0b4f24..d77f6591c453 100644 --- a/include/linux/dma-fence.h +++ b/include/linux/dma-fence.h @@ -99,6 +99,7 @@ enum dma_fence_flag_bits { DMA_FENCE_FLAG_SIGNALED_BIT, DMA_FENCE_FLAG_TIMESTAMP_BIT, DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT, +DMA_FENCE_FLAG_HAS_DEADLINE_BIT, Would this bit be better left out from core implementation, given how the approach is the component which implements dma-fence has to track the actual deadline and all? Also taking a step back - are we all okay with starting to expand the relatively simple core synchronisation primitive with side channel data like this? What would be the criteria for what side channel data would be acceptable? Taking note the thing lives outside drivers/gpu/. I had similar concerns and it took me a moment as well to understand the background why this is necessary. I essentially don't see much other approach we could do. Yes, this is GPU/CRTC specific, but we somehow need a common interface for communicating it between drivers and that's the dma_fence object as far as I can see. Yeah I also don't see any other easy options. Just wanted to raise this as something which probably needs some wider acks. 
Also what about the "low level" part of my question about the reason, or benefits, of defining the deadline bit in the common layer? We could leave DMA_FENCE_FLAG_HAS_DEADLINE_BIT out, but OTOH managing a bitmask that is partially defined in core enum and partially in backend-driver has it's own drawbacks, and it isn't like we are running out of bits.. :shrug: There is DMA_FENCE_FLAG_USER_BITS onwards which implementations could use to store their stuff? And if we skip forward to "drm/scheduler: Add fence deadline support" that's the only place bit is used, right? Would it simply work to look at drm_sched_fence->deadline == 0 as bit not set? Or you see a need to interoperate with other fence implementations via that bit somehow? Regards, Tvrtko
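A rough sketch of the driver-private variant suggested above, building on DMA_FENCE_FLAG_USER_BITS. Here my_fence, its deadline field and the earliest-deadline policy are illustrative assumptions, not code from the series:

#include <linux/dma-fence.h>
#include <linux/ktime.h>

#define MY_FENCE_FLAG_HAS_DEADLINE_BIT	DMA_FENCE_FLAG_USER_BITS

struct my_fence {
	struct dma_fence base;
	ktime_t deadline;
};

static void my_fence_set_deadline(struct dma_fence *f, ktime_t deadline)
{
	struct my_fence *mf = container_of(f, struct my_fence, base);
	unsigned long flags;

	spin_lock_irqsave(f->lock, flags);
	/* Keep only the earliest deadline seen so far. */
	if (!test_bit(MY_FENCE_FLAG_HAS_DEADLINE_BIT, &f->flags) ||
	    ktime_before(deadline, mf->deadline)) {
		mf->deadline = deadline;
		set_bit(MY_FENCE_FLAG_HAS_DEADLINE_BIT, &f->flags);
	}
	spin_unlock_irqrestore(f->lock, flags);
}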
Re: [PATCH] dt-bindings: display: Start the info graphics with HS/VS change
Hi Marek. On Tue, Feb 21, 2023 at 09:04:07PM +0100, Marek Vasut wrote: > The VS signal change is synchronized to HS signal change, start the > info graphics with that event, instead of having that event occur in > the middle of it. > > Scope trace of DPI bus with HS/VS active HIGH looks as follows: > ...__ > VS...___/__ __\__... > HS...___/ \___/ \__...__/ \___... > ^^ > || > |Used to start here -' > | > '--- Start info graphics here > > Signed-off-by: Marek Vasut I recall being annoyed about this before. Reviewed-by: Sam Ravnborg > --- > Cc: Daniel Vetter > Cc: David Airlie > Cc: Krzysztof Kozlowski > Cc: Rob Herring > Cc: Sam Ravnborg > Cc: Thierry Reding > Cc: devicet...@vger.kernel.org > Cc: dri-devel@lists.freedesktop.org > --- > .../bindings/display/panel/panel-timing.yaml | 46 +-- > 1 file changed, 23 insertions(+), 23 deletions(-) > > diff --git > a/Documentation/devicetree/bindings/display/panel/panel-timing.yaml > b/Documentation/devicetree/bindings/display/panel/panel-timing.yaml > index 0d317e61edd8f..aea69b84ca5d8 100644 > --- a/Documentation/devicetree/bindings/display/panel/panel-timing.yaml > +++ b/Documentation/devicetree/bindings/display/panel/panel-timing.yaml > @@ -17,29 +17,29 @@ description: | > >The parameters are defined as seen in the following illustration. > > - +--+-+--+---+ > - | |^| | | > - | ||vback_porch | | | > - | |v| | | > - +--###--+---+ > - | #^# | | > - | #|# | | > - | hback #|# hfront | hsync | > - | porch #| hactive # porch | len | > - |<>#<---+--->#<>|<->| > - | #|# | | > - | #|vactive # | | > - | #|# | | > - | #v# | | > - +--###--+---+ > - | |^| | | > - | ||vfront_porch| | | > - | |v| | | > - +--+-+--+---+ > - | |^| | | > - | ||vsync_len | | | > - | |v| | | > - +--+-+--+---+ > + +---+--+-+--+ > + | | |^| | > + | | ||vsync_len | | > + | | |v| | > + +---+--+-+--+ > + | | |^| | > + | | ||vback_porch | | > + | | |v| | > + +---+--###--+ > + | | #^# | > + | | #|# | > + | hsync | hback #|# hfront | > + | len | porch #| hactive # porch | > + |<->|<>#<---+--->#<>| > + | | #|# | > + | | #|vactive # | > + | | #|# | > + | | #v# | > + +---+--###--+ > + | | |^| | > + | | ||vfront_porch| | > + | | |v| | > + +---+--+-+--+ > > >The following is the panel timings shown with tim
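Independent of where the info graphic starts, the sync/porch/active segments always sum up the same way; for illustration, using the kernel's struct videomode (standard timing accounting, not part of this patch):

#include <video/videomode.h>

static u32 dpi_htotal(const struct videomode *vm)
{
	return vm->hsync_len + vm->hback_porch +
	       vm->hactive + vm->hfront_porch;
}

static u32 dpi_vtotal(const struct videomode *vm)
{
	return vm->vsync_len + vm->vback_porch +
	       vm->vactive + vm->vfront_porch;
}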
Re: [PATCH drm-next v2 04/16] maple_tree: add flag MT_FLAGS_LOCK_NONE
On 2/22/23 17:32, Matthew Wilcox wrote:

On Wed, Feb 22, 2023 at 05:11:34PM +0100, Danilo Krummrich wrote:

On 2/21/23 19:31, Matthew Wilcox wrote:

On Tue, Feb 21, 2023 at 03:37:49PM +0100, Danilo Krummrich wrote: It feels a bit weird that I, as a user of the API, would need to lock certain (or all?) mas_*() functions with the internal spinlock in order to protect (future) internal features of the tree, such as the slab cache defragmentation you mentioned. Because from my perspective, as the generic component that tells its users (the drivers) to take care of locking VA space operations (and hence tree operations), I don't have a purpose of my own for this internal spinlock, right?

You don't ... but we can't know that.

Thanks for the clarification. I think I should now know what to do for the GPUVA manager in terms of locking the maple tree in general. Though I still have very limited insight into the maple tree, I want to share some further thoughts. From what I got so far, it really seems to me that it would be better to just take the internal spinlock for both APIs (normal and advanced) whenever you need to internally.

No. Really, no. The point of the advanced API is that it's a toolbox for doing the operation you want, but isn't a generic enough operation to be part of the normal API.

Again the disclaimer, I'm just sharing my thoughts from the perspective of a user of a generic tree API. For me it feels like - and this purely is an assumption, hence please correct me if I'm wrong on that - you consider the advanced API to be more of a collection of internal functions not *really* meant to be used by arbitrary users, and maybe even slightly tied to mm, since it originated there?

However, from my external perspective I see it the following way. Even if an operation is not part of the 'normal API', but of an API called 'advanced API', it still is a generic API operation being exposed to arbitrary users. However, my point is not (at least not exclusively) that I do not consider this to be safe enough or something. It's just that I think that when the API *enforces* the user to take an internal lock at certain places, it can also just take the lock itself, no matter which API is being called. Especially when one can't rely on this lock at all for other (external) purposes anyway, because the implementation behind the API is free to drop the lock whenever it needs to.

To take an example from the radix tree days, in the page cache, we need to walk a range of the tree, looking for any entries that are marked as DIRTY, clear the DIRTY mark and set the TOWRITE mark. There was a horrendous function called radix_tree_range_tag_if_tagged() which did exactly this. Now look at the implementation of tag_pages_for_writeback(); it's a simple loop over a range with an occasional pause to check whether we need to reschedule. But that means you need to know how to use the toolbox. Some of the tools are dangerous and you can cut yourself on them.

Another plus would probably be maintainability. Once you got quite a few maple tree users using external locks (either in the sense of calling

I don't want maple tree users using external locks. That exists because it was the only reasonable way of converting the VMA tree from the rbtree to the maple tree. I intend to get rid of mt_set_external_lock(). The VMAs are eventually going to be protected by the internal spinlock.

But the argument also holds for the case of using the advanced API and using the internal spinlock.
If your requirements on locking change in the future, every user implementation must be re-validated.
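To make the contrast concrete, a minimal sketch of the two usage styles (error handling trimmed; note that even under mas_lock() the implementation may drop and reacquire the internal lock to allocate, which is exactly the behaviour discussed above):

#include <linux/maple_tree.h>

static DEFINE_MTREE(tree);

static int store_both_ways(void *entry)
{
	MA_STATE(mas, &tree, 0x2000, 0x2fff);
	int ret;

	/* Normal API: the tree takes its internal spinlock itself. */
	ret = mtree_store_range(&tree, 0x1000, 0x1fff, entry, GFP_KERNEL);
	if (ret)
		return ret;

	/* Advanced API: the caller wraps the operation in the internal lock. */
	mas_lock(&mas);
	ret = mas_store_gfp(&mas, entry, GFP_KERNEL);
	mas_unlock(&mas);

	return ret;
}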
Re: [PATCH v4 01/14] dma-buf/dma-fence: Add deadline awareness
On Wed, Feb 22, 2023 at 9:05 AM Tvrtko Ursulin wrote: > > > On 22/02/2023 15:28, Christian König wrote: > > Am 22.02.23 um 11:23 schrieb Tvrtko Ursulin: > >> > >> On 18/02/2023 21:15, Rob Clark wrote: > >>> From: Rob Clark > >>> > >>> Add a way to hint to the fence signaler of an upcoming deadline, such as > >>> vblank, which the fence waiter would prefer not to miss. This is to aid > >>> the fence signaler in making power management decisions, like boosting > >>> frequency as the deadline approaches and awareness of missing deadlines > >>> so that can be factored in to the frequency scaling. > >>> > >>> v2: Drop dma_fence::deadline and related logic to filter duplicate > >>> deadlines, to avoid increasing dma_fence size. The fence-context > >>> implementation will need similar logic to track deadlines of all > >>> the fences on the same timeline. [ckoenig] > >>> v3: Clarify locking wrt. set_deadline callback > >>> > >>> Signed-off-by: Rob Clark > >>> Reviewed-by: Christian König > >>> --- > >>> drivers/dma-buf/dma-fence.c | 20 > >>> include/linux/dma-fence.h | 20 > >>> 2 files changed, 40 insertions(+) > >>> > >>> diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c > >>> index 0de0482cd36e..763b32627684 100644 > >>> --- a/drivers/dma-buf/dma-fence.c > >>> +++ b/drivers/dma-buf/dma-fence.c > >>> @@ -912,6 +912,26 @@ dma_fence_wait_any_timeout(struct dma_fence > >>> **fences, uint32_t count, > >>> } > >>> EXPORT_SYMBOL(dma_fence_wait_any_timeout); > >>> + > >>> +/** > >>> + * dma_fence_set_deadline - set desired fence-wait deadline > >>> + * @fence:the fence that is to be waited on > >>> + * @deadline: the time by which the waiter hopes for the fence to be > >>> + *signaled > >>> + * > >>> + * Inform the fence signaler of an upcoming deadline, such as > >>> vblank, by > >>> + * which point the waiter would prefer the fence to be signaled by. > >>> This > >>> + * is intended to give feedback to the fence signaler to aid in power > >>> + * management decisions, such as boosting GPU frequency if a periodic > >>> + * vblank deadline is approaching. > >>> + */ > >>> +void dma_fence_set_deadline(struct dma_fence *fence, ktime_t deadline) > >>> +{ > >>> +if (fence->ops->set_deadline && !dma_fence_is_signaled(fence)) > >>> +fence->ops->set_deadline(fence, deadline); > >>> +} > >>> +EXPORT_SYMBOL(dma_fence_set_deadline); > >>> + > >>> /** > >>>* dma_fence_describe - Dump fence describtion into seq_file > >>>* @fence: the 6fence to describe > >>> diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h > >>> index 775cdc0b4f24..d77f6591c453 100644 > >>> --- a/include/linux/dma-fence.h > >>> +++ b/include/linux/dma-fence.h > >>> @@ -99,6 +99,7 @@ enum dma_fence_flag_bits { > >>> DMA_FENCE_FLAG_SIGNALED_BIT, > >>> DMA_FENCE_FLAG_TIMESTAMP_BIT, > >>> DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT, > >>> +DMA_FENCE_FLAG_HAS_DEADLINE_BIT, > >> > >> Would this bit be better left out from core implementation, given how > >> the approach is the component which implements dma-fence has to track > >> the actual deadline and all? > >> > >> Also taking a step back - are we all okay with starting to expand the > >> relatively simple core synchronisation primitive with side channel > >> data like this? What would be the criteria for what side channel data > >> would be acceptable? Taking note the thing lives outside drivers/gpu/. > > > > I had similar concerns and it took me a moment as well to understand the > > background why this is necessary. 
I essentially don't see much other > > approach we could do. > > > > Yes, this is GPU/CRTC specific, but we somehow need a common interface > > for communicating it between drivers and that's the dma_fence object as > > far as I can see. > > Yeah I also don't see any other easy options. Just wanted to raise this > as something which probably needs some wider acks. > > Also what about the "low level" part of my question about the reason, or > benefits, of defining the deadline bit in the common layer? We could leave DMA_FENCE_FLAG_HAS_DEADLINE_BIT out, but OTOH managing a bitmask that is partially defined in core enum and partially in backend-driver has it's own drawbacks, and it isn't like we are running out of bits.. :shrug: BR, -R > Regards, > > Tvrtko
Re: [PATCH v2 2/8] accel/qaic: Add uapi and core driver file
On 17.02.2023 11:15, Jeffrey Hugo wrote: On 2/16/2023 7:13 AM, Jacek Lawrynowicz wrote: Hi, On 06.02.2023 16:41, Jeffrey Hugo wrote: Add the QAIC driver uapi file and core driver file that binds to the PCIe device. The core driver file also creates the accel device and manages all the interconnections between the different parts of the driver. The driver can be built as a module. If so, it will be called "qaic.ko". Signed-off-by: Jeffrey Hugo Reviewed-by: Carl Vanderlip --- drivers/accel/qaic/qaic.h | 321 ++ drivers/accel/qaic/qaic_drv.c | 771 ++ include/uapi/drm/qaic_accel.h | 283 3 files changed, 1375 insertions(+) create mode 100644 drivers/accel/qaic/qaic.h create mode 100644 drivers/accel/qaic/qaic_drv.c create mode 100644 include/uapi/drm/qaic_accel.h diff --git a/drivers/accel/qaic/qaic.h b/drivers/accel/qaic/qaic.h new file mode 100644 index 000..3f7ea76 --- /dev/null +++ b/drivers/accel/qaic/qaic.h @@ -0,0 +1,321 @@ +/* SPDX-License-Identifier: GPL-2.0-only + * + * Copyright (c) 2019-2021, The Linux Foundation. All rights reserved. + * Copyright (c) 2021-2023 Qualcomm Innovation Center, Inc. All rights reserved. + */ + +#ifndef QAICINTERNAL_H_ Please use guard macro that matches the file name: _QAIC_H_ Before moving to DRM/ACCEL, this conflicted with the uapi file. However, that is no longer the case, so yes, this should be changed. Will do. +#define QAICINTERNAL_H_ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#define QAIC_DBC_BASE 0x2 +#define QAIC_DBC_SIZE 0x1000 + +#define QAIC_NO_PARTITION -1 + +#define QAIC_DBC_OFF(i)((i) * QAIC_DBC_SIZE + QAIC_DBC_BASE) + +#define to_qaic_bo(obj) container_of(obj, struct qaic_bo, base) + +extern bool poll_datapath; + +struct qaic_user { + /* Uniquely identifies this user for the device */ + int handle; + struct kref ref_count; + /* Char device opened by this user */ + struct qaic_drm_device *qddev; + /* Node in list of users that opened this drm device */ + struct list_headnode; + /* SRCU used to synchronize this user during cleanup */ + struct srcu_struct qddev_lock; + atomic_tchunk_id; +}; + +struct dma_bridge_chan { + /* Pointer to device strcut maintained by driver */ + struct qaic_device *qdev; + /* ID of this DMA bridge channel(DBC) */ + unsigned intid; + /* Synchronizes access to xfer_list */ + spinlock_t xfer_lock; + /* Base address of request queue */ + void*req_q_base; + /* Base address of response queue */ + void*rsp_q_base; + /* +* Base bus address of request queue. Response queue bus address can be +* calculated by adding request queue size to this variable +*/ + dma_addr_t dma_addr; + /* Total size of request and response queue in byte */ + u32 total_size; + /* Capacity of request/response queue */ + u32 nelem; + /* The user that opened this DBC */ + struct qaic_user*usr; + /* +* Request ID of next memory handle that goes in request queue. One +* memory handle can enqueue more than one request elements, all +* this requests that belong to same memory handle have same request ID +*/ + u16 next_req_id; + /* TRUE: DBC is in use; FALSE: DBC not in use */ Use standard "true"/"false" instead of custom "TRUE"/"FALSE" macros. This applies here and in multiple other places in the driver. I think you are getting at that the documentation could be confusing. I don't appear to see custom macro use in the code. Will try to clarify that here. + boolin_use; + /* +* Base address of device registers. 
Used to read/write request and +* response queue's head and tail pointer of this DBC. +*/ + void __iomem*dbc_base; + /* Head of list where each node is a memory handle queued in request queue */ + struct list_headxfer_list; + /* Synchronizes DBC readers during cleanup */ + struct srcu_struct ch_lock; + /* +* When this DBC is released, any thread waiting on this wait queue is +* woken up +*/ + wait_queue_head_t dbc_release; + /* Head of list where each node is a bo associated with this DBC */ + struct list_headbo_lists; + /* The irq line for this DBC. Used for polling */ + unsigned intirq; + /* Polling work item to simulate interrupts */ + struct work_struct poll_wo
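For reference, the include-guard rename requested near the top of this review is simply (a sketch):

#ifndef _QAIC_H_
#define _QAIC_H_

/* driver declarations go here */

#endif /* _QAIC_H_ */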
[PATCH 22/27] kbuild, dma-buf: heaps: remove MODULE_LICENSE in non-modules
Since commit 8b41fc4454e ("kbuild: create modules.builtin without Makefile.modbuiltin or tristate.conf"), MODULE_LICENSE declarations are used to identify modules. As a consequence, uses of the macro in non-modules will cause modprobe to misidentify their containing object file as a module when it is not (false positives), and modprobe might succeed rather than failing with a suitable error message. So remove it in the files in this commit, none of which can be built as modules. Signed-off-by: Nick Alcock Suggested-by: Luis Chamberlain Cc: Luis Chamberlain Cc: linux-modu...@vger.kernel.org Cc: linux-ker...@vger.kernel.org Cc: Hitomi Hasegawa Cc: Sumit Semwal Cc: "Christian König" Cc: linux-me...@vger.kernel.org Cc: dri-devel@lists.freedesktop.org Cc: linaro-mm-...@lists.linaro.org --- drivers/dma-buf/heaps/cma_heap.c| 1 - drivers/dma-buf/heaps/system_heap.c | 1 - 2 files changed, 2 deletions(-) diff --git a/drivers/dma-buf/heaps/cma_heap.c b/drivers/dma-buf/heaps/cma_heap.c index 1131fb943992..a7f048048864 100644 --- a/drivers/dma-buf/heaps/cma_heap.c +++ b/drivers/dma-buf/heaps/cma_heap.c @@ -407,4 +407,3 @@ static int add_default_cma_heap(void) } module_init(add_default_cma_heap); MODULE_DESCRIPTION("DMA-BUF CMA Heap"); -MODULE_LICENSE("GPL v2"); diff --git a/drivers/dma-buf/heaps/system_heap.c b/drivers/dma-buf/heaps/system_heap.c index e8bd10e60998..79c03f5b4e28 100644 --- a/drivers/dma-buf/heaps/system_heap.c +++ b/drivers/dma-buf/heaps/system_heap.c @@ -440,4 +440,3 @@ static int system_heap_create(void) return 0; } module_init(system_heap_create); -MODULE_LICENSE("GPL v2"); -- 2.39.1.268.g9de2f9a303
[PATCH 25/27] kbuild, video: fbdev: remove MODULE_LICENSE in non-modules
Since commit 8b41fc4454e ("kbuild: create modules.builtin without Makefile.modbuiltin or tristate.conf"), MODULE_LICENSE declarations are used to identify modules. As a consequence, uses of the macro in non-modules will cause modprobe to misidentify their containing object file as a module when it is not (false positives), and modprobe might succeed rather than failing with a suitable error message. So remove it in the files in this commit, none of which can be built as modules. Signed-off-by: Nick Alcock Suggested-by: Luis Chamberlain Cc: Luis Chamberlain Cc: linux-modu...@vger.kernel.org Cc: linux-ker...@vger.kernel.org Cc: Hitomi Hasegawa Cc: Helge Deller Cc: linux-fb...@vger.kernel.org Cc: dri-devel@lists.freedesktop.org --- drivers/video/fbdev/asiliantfb.c| 1 - drivers/video/fbdev/gbefb.c | 1 - drivers/video/fbdev/mmp/hw/mmp_ctrl.c | 1 - drivers/video/fbdev/mmp/panel/tpo_tj032md01bw.c | 1 - drivers/video/fbdev/vesafb.c| 1 - 5 files changed, 5 deletions(-) diff --git a/drivers/video/fbdev/asiliantfb.c b/drivers/video/fbdev/asiliantfb.c index 8383468f5577..0d33c75afc48 100644 --- a/drivers/video/fbdev/asiliantfb.c +++ b/drivers/video/fbdev/asiliantfb.c @@ -632,4 +632,3 @@ static void __exit asiliantfb_exit(void) pci_unregister_driver(&asiliantfb_driver); } -MODULE_LICENSE("GPL"); diff --git a/drivers/video/fbdev/gbefb.c b/drivers/video/fbdev/gbefb.c index 000b4aa44241..39e89b9f8dca 100644 --- a/drivers/video/fbdev/gbefb.c +++ b/drivers/video/fbdev/gbefb.c @@ -1285,4 +1285,3 @@ static void __exit gbefb_exit(void) module_init(gbefb_init); module_exit(gbefb_exit); -MODULE_LICENSE("GPL"); diff --git a/drivers/video/fbdev/mmp/hw/mmp_ctrl.c b/drivers/video/fbdev/mmp/hw/mmp_ctrl.c index a9df8ee79810..8b2838a67b76 100644 --- a/drivers/video/fbdev/mmp/hw/mmp_ctrl.c +++ b/drivers/video/fbdev/mmp/hw/mmp_ctrl.c @@ -576,4 +576,3 @@ module_init(mmphw_init); MODULE_AUTHOR("Li Guoqing"); MODULE_DESCRIPTION("Framebuffer driver for mmp"); -MODULE_LICENSE("GPL"); diff --git a/drivers/video/fbdev/mmp/panel/tpo_tj032md01bw.c b/drivers/video/fbdev/mmp/panel/tpo_tj032md01bw.c index 34fae588e202..84ed83ee2366 100644 --- a/drivers/video/fbdev/mmp/panel/tpo_tj032md01bw.c +++ b/drivers/video/fbdev/mmp/panel/tpo_tj032md01bw.c @@ -169,4 +169,3 @@ module_spi_driver(panel_tpohvga_driver); MODULE_AUTHOR("Lisa Du"); MODULE_DESCRIPTION("Panel driver for tpohvga"); -MODULE_LICENSE("GPL"); diff --git a/drivers/video/fbdev/vesafb.c b/drivers/video/fbdev/vesafb.c index 929d4775cb4b..73b35fc67d8b 100644 --- a/drivers/video/fbdev/vesafb.c +++ b/drivers/video/fbdev/vesafb.c @@ -508,4 +508,3 @@ static struct platform_driver vesafb_driver = { }; module_platform_driver(vesafb_driver); -MODULE_LICENSE("GPL"); -- 2.39.1.268.g9de2f9a303
[PATCH 02/27] kbuild, video: fbdev: remove MODULE_LICENSE in non-modules
Since commit 8b41fc4454e ("kbuild: create modules.builtin without Makefile.modbuiltin or tristate.conf"), MODULE_LICENSE declarations are used to identify modules. As a consequence, uses of the macro in non-modules will cause modprobe to misidentify their containing object file as a module when it is not (false positives), and modprobe might succeed rather than failing with a suitable error message. So remove it in the files in this commit, none of which can be built as modules. Signed-off-by: Nick Alcock Suggested-by: Luis Chamberlain Cc: Luis Chamberlain Cc: linux-modu...@vger.kernel.org Cc: linux-ker...@vger.kernel.org Cc: Hitomi Hasegawa Cc: Helge Deller Cc: linux-arm-ker...@lists.infradead.org Cc: linux-fb...@vger.kernel.org Cc: dri-devel@lists.freedesktop.org --- drivers/video/fbdev/wm8505fb.c| 1 - drivers/video/fbdev/wmt_ge_rops.c | 1 - 2 files changed, 2 deletions(-) diff --git a/drivers/video/fbdev/wm8505fb.c b/drivers/video/fbdev/wm8505fb.c index 8f4d674fa0d0..2a2997c647f7 100644 --- a/drivers/video/fbdev/wm8505fb.c +++ b/drivers/video/fbdev/wm8505fb.c @@ -407,5 +407,4 @@ module_platform_driver(wm8505fb_driver); MODULE_AUTHOR("Ed Spiridonov "); MODULE_DESCRIPTION("Framebuffer driver for WMT WM8505"); -MODULE_LICENSE("GPL v2"); MODULE_DEVICE_TABLE(of, wmt_dt_ids); diff --git a/drivers/video/fbdev/wmt_ge_rops.c b/drivers/video/fbdev/wmt_ge_rops.c index 42255d27a1db..c207fd917dce 100644 --- a/drivers/video/fbdev/wmt_ge_rops.c +++ b/drivers/video/fbdev/wmt_ge_rops.c @@ -170,5 +170,4 @@ module_platform_driver(wmt_ge_rops_driver); MODULE_AUTHOR("Alexey Charkov "); MODULE_DESCRIPTION("Accelerators for raster operations using " "WonderMedia Graphics Engine"); -MODULE_LICENSE("GPL v2"); MODULE_DEVICE_TABLE(of, wmt_dt_ids); -- 2.39.1.268.g9de2f9a303
Re: [PATCH] media: Fix building pdfdocs
On Thu, 9 Feb 2023 12:11:19 +0200, Tomi Valkeinen wrote: > Hi Daniel, Dave, > > Could you pick this fix to drm-next? The offending commit is there, it was > merged via Laurent's "[GIT PULL FOR v6.3] R-Car DU fixes and improvements". > > I can also push this to drm-misc-fixes, but the offending commit is not there > yet. Hi, I don't see this fix in next-20230220. I'd really appreciate it if it could be upstreamed during the current merge window. Does anybody care to pick this up? Thanks, Akira > > Tomi > > On 08/02/2023 13:17, Mauro Carvalho Chehab wrote: >> Em Wed, 8 Feb 2023 11:11:34 +0200 >> Laurent Pinchart escreveu: >> >>> Hi Tomi, >>> >>> Thank you for the patch. >>> >>> On Wed, Feb 08, 2023 at 10:29:16AM +0200, Tomi Valkeinen wrote: Commit 8d0e3fc61abd ("media: Add 2-10-10-10 RGB formats") added documatation for a few new RGB formats. For some reason these break the >>> >>> s/documatation/documentation/ >>> pdfdocs build, even if the same style seems to work elsewhere in the file. Remove the trailing empty dash lines, which seems to fix the issue. Fixes: 8d0e3fc61abd ("media: Add 2-10-10-10 RGB formats") Reported-by: Akira Yokosawa Signed-off-by: Tomi Valkeinen >>> >>> Reviewed-by: Laurent Pinchart >>> --- Note: the offending patch was merged via drm tree, so we may want to apply the fix to the drm tree also. >>> >>> Sounds good to me. Mauro, could you ack this patch ? >> >> Acked-by: Mauro Carvalho Chehab >> >>> Documentation/userspace-api/media/v4l/pixfmt-rgb.rst | 3 --- 1 file changed, 3 deletions(-) diff --git a/Documentation/userspace-api/media/v4l/pixfmt-rgb.rst b/Documentation/userspace-api/media/v4l/pixfmt-rgb.rst index d330aeb4d3eb..ea545ed1aeaa 100644 --- a/Documentation/userspace-api/media/v4l/pixfmt-rgb.rst +++ b/Documentation/userspace-api/media/v4l/pixfmt-rgb.rst @@ -868,7 +868,6 @@ number of bits for each component. - r\ :sub:`4` - r\ :sub:`3` - r\ :sub:`2` - - * .. _V4L2-PIX-FMT-RGBA1010102: - ``V4L2_PIX_FMT_RGBA1010102`` @@ -909,7 +908,6 @@ number of bits for each component. - r\ :sub:`4` - r\ :sub:`3` - r\ :sub:`2` - - * .. _V4L2-PIX-FMT-ARGB2101010: - ``V4L2_PIX_FMT_ARGB2101010`` @@ -950,7 +948,6 @@ number of bits for each component. - r\ :sub:`6` - r\ :sub:`5` - r\ :sub:`4` - - .. raw:: latex >>> >> >> >> >> Thanks, >> Mauro >
[PATCH v2] drm/amdkfd: Fix an illegal memory access
In the kfd_wait_on_events() function, the kfd_event_waiter structure is allocated by alloc_event_waiters(), but the event field of the waiter structure is not initialized. When copy_from_user() fails in the kfd_wait_on_events() function, it will enter exception handling to release the previously allocated memory of the waiter structure. Due to the event field of the waiters structure being accessed in the free_waiters() function, this results in illegal memory access and system crash, here is the crash log:

localhost kernel: RIP: 0010:native_queued_spin_lock_slowpath+0x185/0x1e0
localhost kernel: RSP: 0018:aa53c362bd60 EFLAGS: 00010082
localhost kernel: RAX: ff3d3d6bff4007cb RBX: 0282 RCX: 002c
localhost kernel: RDX: 9e855eeacb80 RSI: 279c RDI: e7088f6a21d0
localhost kernel: RBP: e7088f6a21d0 R08: 002c R09: aa53c362be64
localhost kernel: R10: aa53c362bbd8 R11: 0001 R12: 0002
localhost kernel: R13: 9e7ead15d600 R14: R15: 9e7ead15d698
localhost kernel: FS: 152a3d111700() GS:9e855ee8() knlGS:
localhost kernel: CS: 0010 DS: ES: CR0: 80050033
localhost kernel: CR2: 15293810 CR3: 00044d7a4000 CR4: 003506e0
localhost kernel: Call Trace:
localhost kernel: _raw_spin_lock_irqsave+0x30/0x40
localhost kernel: remove_wait_queue+0x12/0x50
localhost kernel: kfd_wait_on_events+0x1b6/0x490 [hydcu]
localhost kernel: ? ftrace_graph_caller+0xa0/0xa0
localhost kernel: kfd_ioctl+0x38c/0x4a0 [hydcu]
localhost kernel: ? kfd_ioctl_set_trap_handler+0x70/0x70 [hydcu]
localhost kernel: ? kfd_ioctl_create_queue+0x5a0/0x5a0 [hydcu]
localhost kernel: ? ftrace_graph_caller+0xa0/0xa0
localhost kernel: __x64_sys_ioctl+0x8e/0xd0
localhost kernel: ? syscall_trace_enter.isra.18+0x143/0x1b0
localhost kernel: do_syscall_64+0x33/0x80
localhost kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
localhost kernel: RIP: 0033:0x152a4dff68d7

Changes since v1:
* Allocate the waiter structures zeroed, using kmalloc_array() with __GFP_ZERO, removing the explicit initialization of activated;
* '(event_waiters) &&' in the 'for' loop has also been removed.

Signed-off-by: Qu Huang
---
 drivers/gpu/drm/amd/amdkfd/kfd_events.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_events.c b/drivers/gpu/drm/amd/amdkfd/kfd_events.c
index 729d26d..bb54f6c 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_events.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_events.c
@@ -780,13 +780,12 @@ static struct kfd_event_waiter *alloc_event_waiters(uint32_t num_events)
 
 	event_waiters = kmalloc_array(num_events,
 				      sizeof(struct kfd_event_waiter),
-				      GFP_KERNEL);
+				      GFP_KERNEL | __GFP_ZERO);
 	if (!event_waiters)
 		return NULL;
 
-	for (i = 0; (event_waiters) && (i < num_events) ; i++) {
+	for (i = 0; i < num_events; i++) {
 		init_wait(&event_waiters[i].wait);
-		event_waiters[i].activated = false;
 	}
 
 	return event_waiters;
--
1.8.3.1
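For comparison, the kcalloc() form Felix suggested in the v1 review would look roughly like this (a sketch; the struct definition is abbreviated from kfd_events.c, and the posted v2 keeps kmalloc_array() with __GFP_ZERO, which has the same effect):

#include <linux/slab.h>
#include <linux/wait.h>

/* Abbreviated from kfd_events.c for the sketch. */
struct kfd_event_waiter {
	wait_queue_entry_t wait;
	struct kfd_event *event;	/* event to wait for */
	bool activated;			/* flipped when the event signals */
};

static struct kfd_event_waiter *alloc_event_waiters_kcalloc(uint32_t num_events)
{
	struct kfd_event_waiter *event_waiters;
	uint32_t i;

	/*
	 * kcalloc() zero-fills (covering .event and .activated) and also
	 * guards the num_events * size multiplication against overflow.
	 */
	event_waiters = kcalloc(num_events, sizeof(*event_waiters),
				GFP_KERNEL);
	if (!event_waiters)
		return NULL;

	for (i = 0; i < num_events; i++)
		init_wait(&event_waiters[i].wait);

	return event_waiters;
}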
[PATCH] drm/i915/sseu: fix max_subslices array-index-out-of-bounds access
It seems that commit bc3c5e0809ae ("drm/i915/sseu: Don't try to store EU mask internally in UAPI format") exposed a potential out-of-bounds access, reported by UBSAN as following on a laptop with a gen 11 i915 card: UBSAN: array-index-out-of-bounds in drivers/gpu/drm/i915/gt/intel_sseu.c:65:27 index 6 is out of range for type 'u16 [6]' CPU: 2 PID: 165 Comm: systemd-udevd Not tainted 6.2.0-9-generic #9-Ubuntu Hardware name: Dell Inc. XPS 13 9300/077Y9N, BIOS 1.11.0 03/22/2022 Call Trace: show_stack+0x4e/0x61 dump_stack_lvl+0x4a/0x6f dump_stack+0x10/0x18 ubsan_epilogue+0x9/0x3a __ubsan_handle_out_of_bounds.cold+0x42/0x47 gen11_compute_sseu_info+0x121/0x130 [i915] intel_sseu_info_init+0x15d/0x2b0 [i915] intel_gt_init_mmio+0x23/0x40 [i915] i915_driver_mmio_probe+0x129/0x400 [i915] ? intel_gt_probe_all+0x91/0x2e0 [i915] i915_driver_probe+0xe1/0x3f0 [i915] ? drm_privacy_screen_get+0x16d/0x190 [drm] ? acpi_dev_found+0x64/0x80 i915_pci_probe+0xac/0x1b0 [i915] ... According to the definition of sseu_dev_info, eu_mask->hsw is limited to a maximum of GEN_MAX_SS_PER_HSW_SLICE (6) sub-slices, but gen11_sseu_info_init() can potentially set 8 sub-slices, in the !IS_JSL_EHL(gt->i915) case. Fix this by reserving up to 8 slots for max_subslices in the eu_mask struct. Reported-by: Emil Renner Berthing Signed-off-by: Andrea Righi --- drivers/gpu/drm/i915/gt/intel_sseu.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/i915/gt/intel_sseu.h b/drivers/gpu/drm/i915/gt/intel_sseu.h index aa87d3832d60..d7e8c374f153 100644 --- a/drivers/gpu/drm/i915/gt/intel_sseu.h +++ b/drivers/gpu/drm/i915/gt/intel_sseu.h @@ -27,7 +27,7 @@ struct drm_printer; * is only relevant to pre-Xe_HP platforms (Xe_HP and beyond use the * I915_MAX_SS_FUSE_BITS value below). */ -#define GEN_MAX_SS_PER_HSW_SLICE 6 +#define GEN_MAX_SS_PER_HSW_SLICE 8 /* * Maximum number of 32-bit registers used by hardware to express the -- 2.38.1
[PATCH 13/27] kbuild, vgacon: remove MODULE_LICENSE in non-modules
Since commit 8b41fc4454e ("kbuild: create modules.builtin without Makefile.modbuiltin or tristate.conf"), MODULE_LICENSE declarations are used to identify modules. As a consequence, uses of the macro in non-modules will cause modprobe to misidentify their containing object file as a module when it is not (false positives), and modprobe might succeed rather than failing with a suitable error message. So remove it in the files in this commit, none of which can be built as modules. Signed-off-by: Nick Alcock Suggested-by: Luis Chamberlain Cc: Luis Chamberlain Cc: linux-modu...@vger.kernel.org Cc: linux-ker...@vger.kernel.org Cc: Hitomi Hasegawa Cc: Greg Kroah-Hartman Cc: Helge Deller Cc: linux-fb...@vger.kernel.org Cc: dri-devel@lists.freedesktop.org --- drivers/video/console/vgacon.c | 1 - 1 file changed, 1 deletion(-) diff --git a/drivers/video/console/vgacon.c b/drivers/video/console/vgacon.c index fcdf017e2665..8e13af1f9042 100644 --- a/drivers/video/console/vgacon.c +++ b/drivers/video/console/vgacon.c @@ -1204,4 +1204,3 @@ const struct consw vga_con = { }; EXPORT_SYMBOL(vga_con); -MODULE_LICENSE("GPL"); -- 2.39.1.268.g9de2f9a303
[PATCH] drm/amdkfd: Fix an illegal memory access
From: Qu Huang In the kfd_wait_on_events() function, the kfd_event_waiter structure is allocated by alloc_event_waiters(), but the event field of the waiter structure is not initialized; When copy_from_user() fails in the kfd_wait_on_events() function, it will enter exception handling to release the previously allocated memory of the waiter structure; Due to the event field of the waiters structure being accessed in the free_waiters() function, this results in illegal memory access and system crash, here is the crash log: localhost kernel: RIP: 0010:native_queued_spin_lock_slowpath+0x185/0x1e0 localhost kernel: RSP: 0018:aa53c362bd60 EFLAGS: 00010082 localhost kernel: RAX: ff3d3d6bff4007cb RBX: 0282 RCX: 002c localhost kernel: RDX: 9e855eeacb80 RSI: 279c RDI: e7088f6a21d0 localhost kernel: RBP: e7088f6a21d0 R08: 002c R09: aa53c362be64 localhost kernel: R10: aa53c362bbd8 R11: 0001 R12: 0002 localhost kernel: R13: 9e7ead15d600 R14: R15: 9e7ead15d698 localhost kernel: FS: 152a3d111700() GS:9e855ee8() knlGS: localhost kernel: CS: 0010 DS: ES: CR0: 80050033 localhost kernel: CR2: 15293810 CR3: 00044d7a4000 CR4: 003506e0 localhost kernel: Call Trace: localhost kernel: _raw_spin_lock_irqsave+0x30/0x40 localhost kernel: remove_wait_queue+0x12/0x50 localhost kernel: kfd_wait_on_events+0x1b6/0x490 [hydcu] localhost kernel: ? ftrace_graph_caller+0xa0/0xa0 localhost kernel: kfd_ioctl+0x38c/0x4a0 [hydcu] localhost kernel: ? kfd_ioctl_set_trap_handler+0x70/0x70 [hydcu] localhost kernel: ? kfd_ioctl_create_queue+0x5a0/0x5a0 [hydcu] localhost kernel: ? ftrace_graph_caller+0xa0/0xa0 localhost kernel: __x64_sys_ioctl+0x8e/0xd0 localhost kernel: ? syscall_trace_enter.isra.18+0x143/0x1b0 localhost kernel: do_syscall_64+0x33/0x80 localhost kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9 localhost kernel: RIP: 0033:0x152a4dff68d7 Signed-off-by: Qu Huang --- drivers/gpu/drm/amd/amdkfd/kfd_events.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_events.c b/drivers/gpu/drm/amd/amdkfd/kfd_events.c index 729d26d..e5faaad 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_events.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_events.c @@ -787,6 +787,7 @@ static struct kfd_event_waiter *alloc_event_waiters(uint32_t num_events) for (i = 0; (event_waiters) && (i < num_events) ; i++) { init_wait(&event_waiters[i].wait); event_waiters[i].activated = false; + event_waiters[i].event = NULL; } return event_waiters; -- 1.8.3.1
Re: [PATCH] drm/amdkfd: Fix an illegal memory access
On 2023/2/22 3:17, Christophe JAILLET wrote: > Le 21/02/2023 à 17:26, Felix Kuehling a écrit : >> >> On 2023-02-21 06:35, qu.huang-fxuvxftifdnyg1zeobx...@public.gmane.org wrote: >>> From: Qu Huang >>> >>> In the kfd_wait_on_events() function, the kfd_event_waiter structure is >>> allocated by alloc_event_waiters(), but the event field of the waiter >>> structure is not initialized; When copy_from_user() fails in the >>> kfd_wait_on_events() function, it will enter exception handling to >>> release the previously allocated memory of the waiter structure; >>> Due to the event field of the waiters structure being accessed >>> in the free_waiters() function, this results in illegal memory access >>> and system crash, here is the crash log: >>> >>> localhost kernel: RIP: 0010:native_queued_spin_lock_slowpath+0x185/0x1e0 >>> localhost kernel: RSP: 0018:aa53c362bd60 EFLAGS: 00010082 >>> localhost kernel: RAX: ff3d3d6bff4007cb RBX: 0282 RCX: >>> 002c >>> localhost kernel: RDX: 9e855eeacb80 RSI: 279c RDI: >>> e7088f6a21d0 >>> localhost kernel: RBP: e7088f6a21d0 R08: 002c R09: >>> aa53c362be64 >>> localhost kernel: R10: aa53c362bbd8 R11: 0001 R12: >>> 0002 >>> localhost kernel: R13: 9e7ead15d600 R14: R15: >>> 9e7ead15d698 >>> localhost kernel: FS: 152a3d111700() GS:9e855ee8() >>> knlGS: >>> localhost kernel: CS: 0010 DS: ES: CR0: 80050033 >>> localhost kernel: CR2: 15293810 CR3: 00044d7a4000 CR4: >>> 003506e0 >>> localhost kernel: Call Trace: >>> localhost kernel: _raw_spin_lock_irqsave+0x30/0x40 >>> localhost kernel: remove_wait_queue+0x12/0x50 >>> localhost kernel: kfd_wait_on_events+0x1b6/0x490 [hydcu] >>> localhost kernel: ? ftrace_graph_caller+0xa0/0xa0 >>> localhost kernel: kfd_ioctl+0x38c/0x4a0 [hydcu] >>> localhost kernel: ? kfd_ioctl_set_trap_handler+0x70/0x70 [hydcu] >>> localhost kernel: ? kfd_ioctl_create_queue+0x5a0/0x5a0 [hydcu] >>> localhost kernel: ? ftrace_graph_caller+0xa0/0xa0 >>> localhost kernel: __x64_sys_ioctl+0x8e/0xd0 >>> localhost kernel: ? syscall_trace_enter.isra.18+0x143/0x1b0 >>> localhost kernel: do_syscall_64+0x33/0x80 >>> localhost kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9 >>> localhost kernel: RIP: 0033:0x152a4dff68d7 >>> >>> Signed-off-by: Qu Huang >>> --- >>> drivers/gpu/drm/amd/amdkfd/kfd_events.c | 1 + >>> 1 file changed, 1 insertion(+) >>> >>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_events.c >>> b/drivers/gpu/drm/amd/amdkfd/kfd_events.c >>> index 729d26d..e5faaad 100644 >>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_events.c >>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_events.c >>> @@ -787,6 +787,7 @@ static struct kfd_event_waiter >>> *alloc_event_waiters(uint32_t num_events) >>> for (i = 0; (event_waiters) && (i < num_events) ; i++) { >>> init_wait(&event_waiters[i].wait); >>> event_waiters[i].activated = false; >>> + event_waiters[i].event = NULL; >> >> Thank you for catching this. We're often lazy about initializing things to >> NULL or 0 because most of our data structures are allocated with kzalloc or >> similar. I'm not sure why we're not doing this here. If we allocated >> event_waiters with kcalloc, we could also remove the initialization of >> activated. I think that would be the cleaner and safer solution. > > Hi, > > I think that the '(event_waiters) &&' in the 'for' can also be removed. > 'event_waiters' is already NULL tested a few lines above > > > Just my 2c. 
> > CJ > Thanks for the suggestions from Felix and CJ, I have re-submitted patch v2, please review it: https://lore.kernel.org/all/ea5b997309825b21e406f9bad2ce8...@linux.dev/ Regards, Qu >> >> Regards, >> Felix >> >> >>> } >>> >>> return event_waiters; >>> -- >>> 1.8.3.1 >> >
[PATCH 23/27] kbuild, drm/dsi: remove MODULE_LICENSE in non-modules
Since commit 8b41fc4454e ("kbuild: create modules.builtin without Makefile.modbuiltin or tristate.conf"), MODULE_LICENSE declarations are used to identify modules. As a consequence, uses of the macro in non-modules will cause modprobe to misidentify their containing object file as a module when it is not (false positives), and modprobe might succeed rather than failing with a suitable error message. So remove it in the files in this commit, none of which can be built as modules. Signed-off-by: Nick Alcock Suggested-by: Luis Chamberlain Cc: Luis Chamberlain Cc: linux-modu...@vger.kernel.org Cc: linux-ker...@vger.kernel.org Cc: Hitomi Hasegawa Cc: Maarten Lankhorst Cc: Maxime Ripard Cc: Thomas Zimmermann Cc: David Airlie Cc: Daniel Vetter Cc: dri-devel@lists.freedesktop.org --- drivers/gpu/drm/drm_mipi_dsi.c | 1 - 1 file changed, 1 deletion(-) diff --git a/drivers/gpu/drm/drm_mipi_dsi.c b/drivers/gpu/drm/drm_mipi_dsi.c index 497ef4b6a90a..4101d9780059 100644 --- a/drivers/gpu/drm/drm_mipi_dsi.c +++ b/drivers/gpu/drm/drm_mipi_dsi.c @@ -1294,4 +1294,3 @@ postcore_initcall(mipi_dsi_bus_init); MODULE_AUTHOR("Andrzej Hajda "); MODULE_DESCRIPTION("MIPI DSI Bus"); -MODULE_LICENSE("GPL and additional rights"); -- 2.39.1.268.g9de2f9a303
Re: [PATCH v4 01/14] dma-buf/dma-fence: Add deadline awareness
On 22/02/2023 15:28, Christian König wrote: Am 22.02.23 um 11:23 schrieb Tvrtko Ursulin: On 18/02/2023 21:15, Rob Clark wrote: From: Rob Clark Add a way to hint to the fence signaler of an upcoming deadline, such as vblank, which the fence waiter would prefer not to miss. This is to aid the fence signaler in making power management decisions, like boosting frequency as the deadline approaches and awareness of missing deadlines so that can be factored in to the frequency scaling. v2: Drop dma_fence::deadline and related logic to filter duplicate deadlines, to avoid increasing dma_fence size. The fence-context implementation will need similar logic to track deadlines of all the fences on the same timeline. [ckoenig] v3: Clarify locking wrt. set_deadline callback Signed-off-by: Rob Clark Reviewed-by: Christian König --- drivers/dma-buf/dma-fence.c | 20 include/linux/dma-fence.h | 20 2 files changed, 40 insertions(+) diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c index 0de0482cd36e..763b32627684 100644 --- a/drivers/dma-buf/dma-fence.c +++ b/drivers/dma-buf/dma-fence.c @@ -912,6 +912,26 @@ dma_fence_wait_any_timeout(struct dma_fence **fences, uint32_t count, } EXPORT_SYMBOL(dma_fence_wait_any_timeout); + +/** + * dma_fence_set_deadline - set desired fence-wait deadline + * @fence: the fence that is to be waited on + * @deadline: the time by which the waiter hopes for the fence to be + * signaled + * + * Inform the fence signaler of an upcoming deadline, such as vblank, by + * which point the waiter would prefer the fence to be signaled by. This + * is intended to give feedback to the fence signaler to aid in power + * management decisions, such as boosting GPU frequency if a periodic + * vblank deadline is approaching. + */ +void dma_fence_set_deadline(struct dma_fence *fence, ktime_t deadline) +{ + if (fence->ops->set_deadline && !dma_fence_is_signaled(fence)) + fence->ops->set_deadline(fence, deadline); +} +EXPORT_SYMBOL(dma_fence_set_deadline); + /** * dma_fence_describe - Dump fence describtion into seq_file * @fence: the 6fence to describe diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h index 775cdc0b4f24..d77f6591c453 100644 --- a/include/linux/dma-fence.h +++ b/include/linux/dma-fence.h @@ -99,6 +99,7 @@ enum dma_fence_flag_bits { DMA_FENCE_FLAG_SIGNALED_BIT, DMA_FENCE_FLAG_TIMESTAMP_BIT, DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT, + DMA_FENCE_FLAG_HAS_DEADLINE_BIT, Would this bit be better left out from core implementation, given how the approach is the component which implements dma-fence has to track the actual deadline and all? Also taking a step back - are we all okay with starting to expand the relatively simple core synchronisation primitive with side channel data like this? What would be the criteria for what side channel data would be acceptable? Taking note the thing lives outside drivers/gpu/. I had similar concerns and it took me a moment as well to understand the background why this is necessary. I essentially don't see much other approach we could do. Yes, this is GPU/CRTC specific, but we somehow need a common interface for communicating it between drivers and that's the dma_fence object as far as I can see. Yeah I also don't see any other easy options. Just wanted to raise this as something which probably needs some wider acks. Also what about the "low level" part of my question about the reason, or benefits, of defining the deadline bit in the common layer? Regards, Tvrtko
Re: [PATCH v5 0/7] Introduce __xchg, non-atomic xchg
On Wed, Jan 18, 2023 at 04:35:22PM +0100, Andrzej Hajda wrote:
> Andrzej Hajda (7):
>   arch: rename all internal names __xchg to __arch_xchg
>   linux/include: add non-atomic version of xchg
>   arch/*/uprobes: simplify arch_uretprobe_hijack_return_addr
>   llist: simplify __llist_del_all
>   io_uring: use __xchg if possible
>   qed: use __xchg if possible
>   drm/i915/gt: use __xchg instead of internal helper

Nothing crazy in here I suppose, I somewhat wonder why you went through the trouble, but meh. You want me to take this through the locking tree (for the next cycle, not this one) where I normally take atomic things or does someone else want this?
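For readers outside the thread: the series frees up the plain __xchg name by renaming the arch-internal atomic helpers to __arch_xchg, and introduces __xchg as a non-atomic exchange for callers that already hold the relevant lock. Conceptually it is just the following (an illustrative definition, not the exact macro from the series):

#define __xchg(ptr, val)			\
({						\
	typeof(ptr) __ptr = (ptr);		\
	typeof(*__ptr) __old = *__ptr;		\
						\
	*__ptr = (val);				\
	__old;					\
})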
[PATCH 2/2] dt-bindings: display: Add Loongson display controller
This patch adds a trivial DT usage for the Loongson display controller found in the LS2K1000 SoC.

Signed-off-by: suijingfeng
---
 .../loongson/loongson,display-controller.yaml | 58 +++
 1 file changed, 58 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/display/loongson/loongson,display-controller.yaml

diff --git a/Documentation/devicetree/bindings/display/loongson/loongson,display-controller.yaml b/Documentation/devicetree/bindings/display/loongson/loongson,display-controller.yaml
new file mode 100644
index ..98b78f449a80
--- /dev/null
+++ b/Documentation/devicetree/bindings/display/loongson/loongson,display-controller.yaml
@@ -0,0 +1,58 @@
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/display/loongson/loongson,display-controller.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Loongson Display Controller Device Tree Bindings
+
+maintainers:
+  - Sui Jingfeng
+
+description: |+
+
+  The display controller is a PCI device, it has two display pipe.
+  For the DC in LS2K1000 each way has a DVO output interface which
+  provide RGB888 signals, vertical & horizontal synchronisations
+  and the pixel clock. Each CRTC is able to support 1920x1080@60Hz,
+  the maximum resolution is 2048x2048 according to the hardware spec.
+
+properties:
+  $nodename:
+    pattern: "^display-controller@[0-9a-f],[0-9a-f]$"
+
+  compatible:
+    oneOf:
+      - items:
+          - enum:
+              - loongson,ls2k1000-dc
+  reg:
+    maxItems: 1
+
+  interrupts:
+    maxItems: 1
+
+required:
+  - compatible
+  - reg
+  - interrupts
+
+additionalProperties: false
+
+examples:
+  - |
+    #include <dt-bindings/interrupt-controller/irq.h>
+    bus {
+
+        #address-cells = <3>;
+        #size-cells = <2>;
+        #interrupt-cells = <2>;
+
+        display-controller@6,0 {
+            compatible = "loongson,ls2k1000-dc";
+            reg = <0x3000 0x0 0x0 0x0 0x0>;
+            interrupts = <28 IRQ_TYPE_LEVEL_LOW>;
+        };
+    };
+
+...
-- 
2.34.1
[PATCH 1/2] Mips: ls2k1000: dts: add the display controller device node
The display controller is a PCI device; its PCI vendor ID is 0x0014, its PCI device ID is 0x7a06. Signed-off-by: suijingfeng --- .../boot/dts/loongson/loongson64-2k1000.dtsi | 21 +++ 1 file changed, 21 insertions(+) diff --git a/arch/mips/boot/dts/loongson/loongson64-2k1000.dtsi b/arch/mips/boot/dts/loongson/loongson64-2k1000.dtsi index 8143a6e3..a528af3977d9 100644 --- a/arch/mips/boot/dts/loongson/loongson64-2k1000.dtsi +++ b/arch/mips/boot/dts/loongson/loongson64-2k1000.dtsi @@ -31,6 +31,18 @@ memory@20 { <0x0001 0x1000 0x0001 0xb000>; /* 6912 MB at 4352MB */ }; + reserved-memory { + #address-cells = <2>; + #size-cells = <2>; + ranges; + + display_reserved: framebuffer@3000 { + compatible = "shared-dma-pool"; + reg = <0x0 0x3000 0x0 0x0400>; /* 64M */ + linux,cma-default; + }; + }; + cpu_clk: cpu_clk { #clock-cells = <0>; compatible = "fixed-clock"; @@ -198,6 +210,15 @@ sata@8,0 { interrupt-parent = <&liointc0>; }; + display-controller@6,0 { + compatible = "loongson,ls2k1000-dc"; + + reg = <0x3000 0x0 0x0 0x0 0x0>; + interrupts = <28 IRQ_TYPE_LEVEL_LOW>; + interrupt-parent = <&liointc0>; + memory-region = <&display_reserved>; + }; + pci_bridge@9,0 { compatible = "pci0014,7a19.0", "pci0014,7a19", -- 2.34.1
Re: [PATCH drm-next v2 05/16] drm: manager to keep track of GPUs VA mappings
On 2/22/23 16:14, Christian König wrote: Am 22.02.23 um 16:07 schrieb Danilo Krummrich: On 2/22/23 11:25, Christian König wrote: Am 17.02.23 um 14:44 schrieb Danilo Krummrich: +/** + * DOC: Overview + * + * The DRM GPU VA Manager, represented by struct drm_gpuva_manager, keeps track + * of a GPU's virtual address (VA) space and manages the corresponding virtual + * mappings represented by &drm_gpuva objects. It also keeps track of the + * mapping's backing &drm_gem_object buffers. + * + * &drm_gem_object buffers maintain a list (and a corresponding list lock) of + * &drm_gpuva objects representing all existent GPU VA mappings using this + * &drm_gem_object as backing buffer. + * + * If the &DRM_GPUVA_MANAGER_REGIONS feature is enabled, a GPU VA mapping can + * only be created within a previously allocated &drm_gpuva_region, which + * represents a reserved portion of the GPU VA space. GPU VA mappings are not + * allowed to span over a &drm_gpuva_region's boundary. + * + * GPU VA regions can also be flagged as sparse, which allows drivers to create + * sparse mappings for a whole GPU VA region in order to support Vulkan + * 'Sparse Resources'. Well, since we have now found that there is absolutely no technical reason for having those regions, could we please drop them? I disagree that this was the outcome of our previous discussion. In nouveau I still need them to track the separate sparse page tables and, as you confirmed previously, Nvidia cards are not the only cards supporting this feature. The second reason is that with regions we can avoid merging between buffers, which saves some effort. However, I agree that this argument by itself probably doesn't hold too much, since you've pointed out in a previous mail that: 1) If we merge and decide to only do that inside certain boundaries then those boundaries need to be provided and checked against. This burns quite some CPU cycles. 2) If we just merge what we can we might have extra page table updates which cost time and could result in undesired side effects. 3) If we don't merge at all we have additional housekeeping for the mappings and maybe hw restrictions. However, if a driver uses regions to track its separate sparse page tables anyway it gets 1) for free, which is a nice synergy. I totally agree that regions aren't for everyone though. Hence, I made them an optional feature and by default regions are disabled. In order to use them, drm_gpuva_manager_init() must be called with the DRM_GPUVA_MANAGER_REGIONS feature flag. I really would not want to open code regions or have two GPUVA manager instances in nouveau to track sparse page tables. That would be really messy, hence I hope we can agree on this being an optional feature. I absolutely don't think that this is a good idea then. This separate handling of sparse page tables is completely Nouveau specific. Actually, I rely on what you said in a previous mail when I say it's, potentially, not specific to nouveau. This sounds similar to what AMD hw used to have up until gfx8 (I think); basically sparse resources were defined through a separate mechanism to the address resolution of the page tables. I won't rule out that other hardware has similar approaches. Even when it's an optional feature, mixing this into the common handling is exactly what I pointed out as not properly separating between hardware specific and hardware agnostic functionality. Optionally having regions is *not* a hardware specific concept; drivers might use it for a hardware specific purpose though.
Which potentially is the case for almost every DRM helper. Drivers can use regions only for the sake of not merging between buffer boundaries as well. Some drivers might prefer this over "never merge" or "always merge", depending on the cost of re-organizing page tables for unnecessary splits/merges, without having the need to track separate sparse page tables. It's just that I think *if* a driver needs to track separate sparse page tables anyway, it's a nice synergy, since then there is no extra cost for getting this optimization. This is exactly the problem we ran into with TTM as well and I've spent a massive amount of time to clean that up again. > Admittedly, I don't know what problems you are referring to. However, I don't see what kind of trouble it could cause to allow drivers to track regions optionally. Regards, Christian. I don't really see a need for them any more. Regards, Christian.
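To picture the opt-in: regions only come into existence if a driver asks for them at init time. A rough sketch - the exact drm_gpuva_manager_init() parameter list in the posted series is elided here, and the driver struct is hypothetical:

/* Driver that tracks sparse page tables per region: opt in. */
drm_gpuva_manager_init(&drv->va_mgr, /* ... name, VA range, ops ... */
                       DRM_GPUVA_MANAGER_REGIONS);

/* Driver that wants neither sparse regions nor merge boundaries:
 * pass no feature flags and regions stay disabled (the default). */
drm_gpuva_manager_init(&drv->va_mgr, /* ... name, VA range, ops ... */ 0);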
Re: [PATCH drm-next v2 04/16] maple_tree: add flag MT_FLAGS_LOCK_NONE
On Wed, Feb 22, 2023 at 05:11:34PM +0100, Danilo Krummrich wrote: > On 2/21/23 19:31, Matthew Wilcox wrote: > > On Tue, Feb 21, 2023 at 03:37:49PM +0100, Danilo Krummrich wrote: > > > It feels a bit weird that I, as a user of the API, would need to lock > > > certain > > > (or all?) mas_*() functions with the internal spinlock in order to protect > > > (future) internal features of the tree, such as the slab cache > > > defragmentation > > > you mentioned. Because from my perspective, as the generic component that > > > tells > > > its users (the drivers) to take care of locking VA space operations (and > > > hence > > > tree operations) I don't have a purpose of my own for this internal spinlock, > > > right? > > > > You don't ... but we can't know that. > > Thanks for the clarification. I think I should now know what to do for the > GPUVA manager in terms of locking the maple tree in general. > > Though I still have very limited insight into the maple tree, I want to share > some further thoughts. > > From what I got so far it really seems to me that it would be better to just > take the internal spinlock for both APIs (normal and advanced) whenever you > need to internally. No. Really, no. The point of the advanced API is that it's a toolbox for doing the operation you want, but isn't a generic enough operation to be part of the normal API. To take an example from the radix tree days, in the page cache, we need to walk a range of the tree, looking for any entries that are marked as DIRTY, clear the DIRTY mark and set the TOWRITE mark. There was a horrendous function called radix_tree_range_tag_if_tagged() which did exactly this. Now look at the implementation of tag_pages_for_writeback(); it's a simple loop over a range with an occasional pause to check whether we need to reschedule. But that means you need to know how to use the toolbox. Some of the tools are dangerous and you can cut yourself on them. > Another plus would probably be maintainability. Once you got quite a few > maple tree users using external locks (either in the sense of calling I don't want maple tree users using external locks. That exists because it was the only reasonable way of converting the VMA tree from the rbtree to the maple tree. I intend to get rid of mt_set_external_lock(). The VMAs are eventually going to be protected by the internal spinlock.
Re: [PATCH drm-next v2 04/16] maple_tree: add flag MT_FLAGS_LOCK_NONE
On 2/21/23 19:31, Matthew Wilcox wrote: On Tue, Feb 21, 2023 at 03:37:49PM +0100, Danilo Krummrich wrote: On Mon, Feb 20, 2023 at 08:33:35PM +, Matthew Wilcox wrote: On Mon, Feb 20, 2023 at 06:06:03PM +0100, Danilo Krummrich wrote: On 2/20/23 16:10, Matthew Wilcox wrote: This is why we like people to use the spinlock embedded in the tree. There's nothing for the user to care about. If the access really is serialised, acquiring/releasing the uncontended spinlock is a minimal cost compared to all the other things that will happen while modifying the tree. I think as for the users of the GPUVA manager we'd have two cases: 1) Accesses to the manager (and hence the tree) are serialized, no lock needed. 2) Multiple operations on the tree must be locked in order to make them appear atomic. Could you give an example here of what you'd like to do? Ideally something complicated so I don't say "Oh, you can just do this" when there's a more complex example for which "this" won't work. I'm sure that's embedded somewhere in the next 20-odd patches, but it's probably quicker for you to describe in terms of tree operations that have to appear atomic than for me to try to figure it out. Absolutely, not gonna ask you to read all of that. :-) One thing the GPUVA manager does is to provide drivers the (sub-)operations that need to be processed in order to fulfill a map or unmap request from userspace. For instance, when userspace asks the driver to map some memory the GPUVA manager calculates which existing mappings must be removed, split up or can be merged with the newly requested mapping. A driver has two ways to fetch those operations from the GPUVA manager. It can either obtain a list of operations or receive a callback for each operation generated by the GPUVA manager. In both cases the GPUVA manager walks the maple tree, which keeps track of existing mappings, for the given range in __drm_gpuva_sm_map() (only considering the map case, since the unmap case is a subset basically). For each mapping found in the given range the driver, as mentioned, either receives a callback or a list entry is added to the list of operations. Typically, for each operation / callback one entry within the maple tree is removed and, optionally at the beginning and end of a new mapping's range, a new entry is inserted. And of course, as the last operation, there is the new mapping itself to insert. The GPUVA manager delegates locking responsibility to the drivers. Typically, a driver either serializes access to the VA space managed by the GPUVA manager (no lock needed) or needs to lock the processing of a full set of operations generated by the GPUVA manager. OK, that all makes sense. It does make sense to have the driver use its own mutex and then take the spinlock inside the maple tree code. It shouldn't ever be contended. In either case the embedded spinlock wouldn't be useful, we'd either need an external lock or no lock at all. If there are any internal reasons why specific tree operations must be mutually excluded (such as those you explain below), wouldn't it make more sense to always have the internal lock and, optionally, allow users to specify an external lock additionally? So the way this works for the XArray, which is a little older than the Maple tree, is that we always use the internal spinlock for modifications (possibly BH or IRQ safe), and if someone wants to use an external mutex to make some callers atomic with respect to each other, they're free to do so. 
In that case, the XArray doesn't check the user's external locking at all, because it really can't know. I'd advise taking that approach; if there's really no way to use the internal spinlock to make your complicated updates appear atomic then just let the maple tree use its internal spinlock, and you can also use your external mutex however you like. That sounds like the right thing to do. However, I'm using the advanced API of the maple tree (and that's the reason why the above example appears a little more detailed than needed) because I think with the normal API I can't insert / remove tree entries while walking the tree, right? Right. The normal API is for simple operations while the advanced API is for doing compound operations. As per the documentation the advanced API, however, doesn't take care of locking itself, hence just letting the maple tree use its internal spinlock doesn't really work - I need to take care of that myself, right? Yes; once you're using the advanced API, you get to compose the entire operation yourself. It feels a bit weird that I, as a user of the API, would need to lock certain (or all?) mas_*() functions with the internal spinlock in order to protect (future) internal features of the tree, such as the slab cache defragmentation you mentioned. Because from my perspective, as the generic component that tells its users (the drivers) to take care of locking VA space operations (and hence tree operations) I don't have a purpose of my own for this internal spinlock, right?
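The locking scheme Matthew describes is easy to show with the normal API, where each mtree_*() call takes the internal spinlock by itself and an external mutex only groups calls into an atomic-looking whole. A sketch:

#include <linux/maple_tree.h>
#include <linux/mutex.h>

static DEFINE_MTREE(va_tree);           /* internal spinlock lives in the tree */
static DEFINE_MUTEX(va_mutex);          /* external lock for compound updates */

/* Replace all entries overlapping [start, last] with one new entry.
 * The external mutex is what makes the erase-then-store sequence appear
 * atomic to other VA space users; the tree's own spinlock still guards
 * each individual modification. */
static int replace_range(unsigned long start, unsigned long last, void *new)
{
        unsigned long index = start;
        void *entry;
        int err;

        mutex_lock(&va_mutex);
        mt_for_each(&va_tree, entry, index, last)
                mtree_erase(&va_tree, index);
        err = mtree_store_range(&va_tree, start, last, new, GFP_KERNEL);
        mutex_unlock(&va_mutex);
        return err;
}

With the advanced mas_*() API the composition is the same, except the caller takes the internal spinlock explicitly (mas_lock()/mas_unlock()) around the compound walk, which is exactly the obligation being discussed above.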
Re: [PATCH] drm/msm: return early when allocating fbdev fails
Hi Am 22.02.23 um 16:56 schrieb Tom Rix: building with clang and W=1 reports drivers/gpu/drm/msm/msm_fbdev.c:144:6: error: variable 'helper' is used uninitialized whenever 'if' condition is true [-Werror,-Wsometimes-uninitialized] if (!fbdev) ^~ helper is only initialized after the fbdev allocation succeeds, so it is in a garbage state at the fail: label. There is nothing to unwind if the fbdev allocation fails, so return NULL. Fixes: 3fb1f62f80a1 ("drm/fb-helper: Remove drm_fb_helper_unprepare() from drm_fb_helper_fini()") Signed-off-by: Tom Rix Already fixed here: https://lore.kernel.org/dri-devel/08e3340e-b459-0e60-4bba-30716b675...@suse.de/T/#t Best regards Thomas --- drivers/gpu/drm/msm/msm_fbdev.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/msm/msm_fbdev.c b/drivers/gpu/drm/msm/msm_fbdev.c index c804e5ba682a..c1356aff87da 100644 --- a/drivers/gpu/drm/msm/msm_fbdev.c +++ b/drivers/gpu/drm/msm/msm_fbdev.c @@ -142,7 +142,7 @@ struct drm_fb_helper *msm_fbdev_init(struct drm_device *dev) fbdev = kzalloc(sizeof(*fbdev), GFP_KERNEL); if (!fbdev) - goto fail; + return NULL; helper = &fbdev->base; -- Thomas Zimmermann Graphics Driver Developer SUSE Software Solutions Germany GmbH Maxfeldstr. 5, 90409 Nürnberg, Germany (HRB 36809, AG Nürnberg) Geschäftsführer: Ivo Totev
[PATCH] drm/vgem: Drop struct drm_vgem_gem_object
Commit 45d9c8dde4cd ("drm/vgem: use shmem helpers") introduced shmem helpers to vgem and with that, removed all uses of the struct drm_vgem_gem_object. So, as the struct is no longer used, delete it. Signed-off-by: Maíra Canal --- drivers/gpu/drm/vgem/vgem_drv.h | 11 --- 1 file changed, 11 deletions(-) diff --git a/drivers/gpu/drm/vgem/vgem_drv.h b/drivers/gpu/drm/vgem/vgem_drv.h index 0ed300317f87..34cf63e6fb3d 100644 --- a/drivers/gpu/drm/vgem/vgem_drv.h +++ b/drivers/gpu/drm/vgem/vgem_drv.h @@ -39,17 +39,6 @@ struct vgem_file { struct mutex fence_mutex; }; -#define to_vgem_bo(x) container_of(x, struct drm_vgem_gem_object, base) -struct drm_vgem_gem_object { - struct drm_gem_object base; - - struct page **pages; - unsigned int pages_pin_count; - struct mutex pages_lock; - - struct sg_table *table; -}; - int vgem_fence_open(struct vgem_file *file); int vgem_fence_attach_ioctl(struct drm_device *dev, void *data, -- 2.39.2
Re: [Intel-xe] [PATCH 1/3] drm/suballoc: Introduce a generic suballocation manager
On 2/22/23 15:20, Christian König wrote: Am 22.02.23 um 14:54 schrieb Thomas Hellström: Hi, On 2/22/23 12:39, Christian König wrote: Hi Thomas, Am 22.02.23 um 12:00 schrieb Thomas Hellström: Hi, Christian, So I resurrected Maarten's previous patch series around this (the amdgpu suballocator), slightly modified the code to match the API of this patch series, re-introduced the per-allocation alignment as per a previous review comment from you on that series, made checkpatch.pl pass mostly, except for pre-existing style problems, and added / fixed some comments. No memory corruption seen so far on limited Xe testing. To move this forward I suggest starting with that as a common drm suballocator. I'll post the series later today. We can follow up with potential simplifications if needed. I also made a kunit test that reports some timing information. Will post that as a follow up. Some interesting preliminary conclusions: * drm_mm is per se not a CPU hog. If the rb tree processing is disabled and the EVICT algorithm is changed from MRU to ring-like LRU traversal, it's more or less just as fast as the ring suballocator. * With a single ring, and the suballocation buffer never completely filled (no sleeps) the amd suballocator is a bit faster per allocation / free. (Around 250 ns instead of 350). Allocation is slightly slower on the amdgpu one, freeing is faster, mostly due to the locking overhead incurred when setting up the fence callbacks, and for avoiding irq-disabled processing on the one I proposed. For some more realistic numbers try to signal the fence from another CPU. Alternatively you can invalidate all the CPU read cache lines touched by the fence callback so that they need to be read in again from the allocating CPU. Fences are signalled using hr-timer driven fake "ring"s, so should probably be distributed among CPUs in a pretty realistic way. But anyway I agree results obtained from that kunit test can and should be challenged before we actually use them for improvements. I would double check that. My expectation is that hr-timers execute by default on the CPU from which they are started. Hmm, since not using the _PINNED hrtimer flag, I'd expect them to be more distributed, but you're right, they weren't - only rather few timer_expires came from other CPUs. So figures for signalling on other CPUs are around 500 ns for the amdgpu variant and around 900 ns for the fence-callback one. Still, sleeping starts around 50-75% fill with the amdgpu variant. * With multiple rings and varying allocation sizes and signalling times creating fragmentation, the picture becomes different, as the amdgpu allocator starts to sleep/throttle already around 50%-75% fill; the one I proposed between 75% and 90% fill, and once that happens, the CPU cost of putting to sleep and waking up should really overshadow the above numbers. So it's really a tradeoff, where IMO also code size and maintainability should play a role. Also I looked at the history of the amdgpu allocator originating back to Radeon 2012-ish, but couldn't find any commits mentioning fence callbacks nor problems with those. Could you point me to that discussion? Uff, that was ~10 years ago. I don't think I can find that again. OK, fair enough. But what was the objective reasoning against using fence callbacks for this sort of stuff - was it unforeseen locking problems, caching issues or something else? Well, cache line bouncing is one major problem. 
Also take a look at the discussion about using list_head in interrupt handlers; that should be easy to find on LWN. The allocator usually manages enough memory so that it never runs into waiting for anything; only in extreme cases like GPU resets do we actually wait for allocations to be freed. I guess this varies with the application, but can be remedied with just adding more managed memory if needed. /Thomas So the only cache line which is accessed from more than one CPU should be the signaled flag of the fence. With moving list work into the interrupt handler you have at least 3 cache lines which start to bounce between different CPUs. Regards, Christian. Thanks, Thomas Regards, Christian. Thanks, Thomas On 2/17/23 14:51, Thomas Hellström wrote: On 2/17/23 14:18, Christian König wrote: Am 17.02.23 um 14:10 schrieb Thomas Hellström: [SNIP] Any chance you could do a quick performance comparison? If not, anything against merging this without the amd / radeon changes until we can land a simpler allocator? Only if you can stick the allocator inside Xe and not drm, because this seems to be for a different use case than the allocators inside radeon/amdgpu. Hmm. No, it's allocating in a ring-like fashion as well. Let me put together a unit test for benchmarking. I think it would be a failure for the community to end up with three separate suballocators doing the exact same thing for the same problem.
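The cache line argument is easiest to see by looking at where the reclaim work runs. With lazy, allocation-time reclaim - roughly the amdgpu-style scheme debated above - all bookkeeping stays on the allocating CPU and only the fence's signaled state is read cross-CPU; a fence callback instead pulls the free-list manipulation onto whichever CPU signals the fence. A sketch with hypothetical bookkeeping structures:

#include <linux/dma-fence.h>
#include <linux/list.h>

/* Hypothetical per-hole bookkeeping for a ring suballocator. */
struct sa_hole {
        struct list_head list;
        struct dma_fence *fence;        /* guards reuse of this range */
        unsigned int offset, size;
};

/* Lazy reclaim at allocation time: no callback, no irq-context list
 * work - the allocating CPU polls fence state and only its own cache
 * lines move. */
static bool sa_hole_reusable(struct sa_hole *hole)
{
        if (!hole->fence)
                return true;
        if (!dma_fence_is_signaled(hole->fence))
                return false;
        dma_fence_put(hole->fence);
        hole->fence = NULL;
        return true;
}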
[PATCH] drm/msm: return early when allocating fbdev fails
building with clang and W=1 reports drivers/gpu/drm/msm/msm_fbdev.c:144:6: error: variable 'helper' is used uninitialized whenever 'if' condition is true [-Werror,-Wsometimes-uninitialized] if (!fbdev) ^~ helper is only initialized after the fbdev allocation succeeds, so it is in a garbage state at the fail: label. There is nothing to unwind if the fbdev allocation fails, so return NULL. Fixes: 3fb1f62f80a1 ("drm/fb-helper: Remove drm_fb_helper_unprepare() from drm_fb_helper_fini()") Signed-off-by: Tom Rix --- drivers/gpu/drm/msm/msm_fbdev.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/msm/msm_fbdev.c b/drivers/gpu/drm/msm/msm_fbdev.c index c804e5ba682a..c1356aff87da 100644 --- a/drivers/gpu/drm/msm/msm_fbdev.c +++ b/drivers/gpu/drm/msm/msm_fbdev.c @@ -142,7 +142,7 @@ struct drm_fb_helper *msm_fbdev_init(struct drm_device *dev) fbdev = kzalloc(sizeof(*fbdev), GFP_KERNEL); if (!fbdev) - goto fail; + return NULL; helper = &fbdev->base; -- 2.27.0
Re: [PATCH v4 10/14] drm/vblank: Add helper to get next vblank time
On Wed, Feb 22, 2023 at 07:44:42AM -0800, Rob Clark wrote: > On Wed, Feb 22, 2023 at 1:57 AM Pekka Paalanen wrote: > > > > On Tue, 21 Feb 2023 09:50:20 -0800 > > Rob Clark wrote: > > > > > On Tue, Feb 21, 2023 at 5:01 AM Ville Syrjälä > > > wrote: > > > > > > > > On Tue, Feb 21, 2023 at 10:45:51AM +0200, Pekka Paalanen wrote: > > > > > On Mon, 20 Feb 2023 07:55:41 -0800 > > > > > Rob Clark wrote: > > > > > > > > > > > On Mon, Feb 20, 2023 at 1:08 AM Pekka Paalanen > > > > > > wrote: > > > > > > > > > > > > > > On Sat, 18 Feb 2023 13:15:53 -0800 > > > > > > > Rob Clark wrote: > > > > > > > > > > > > > > > From: Rob Clark > > > > > > > > > > > > > > > > Will be used in the next commit to set a deadline on fences > > > > > > > > that an > > > > > > > > atomic update is waiting on. > > > > > > > > > > > > > > > > Signed-off-by: Rob Clark > > > > > > > > --- > > > > > > > > drivers/gpu/drm/drm_vblank.c | 32 > > > > > > > > > > > > > > > > include/drm/drm_vblank.h | 1 + > > > > > > > > 2 files changed, 33 insertions(+) > > > > > > > > > > > > > > > > diff --git a/drivers/gpu/drm/drm_vblank.c > > > > > > > > b/drivers/gpu/drm/drm_vblank.c > > > > > > > > index 2ff31717a3de..caf25ebb34c5 100644 > > > > > > > > --- a/drivers/gpu/drm/drm_vblank.c > > > > > > > > +++ b/drivers/gpu/drm/drm_vblank.c > > > > > > > > @@ -980,6 +980,38 @@ u64 drm_crtc_vblank_count_and_time(struct > > > > > > > > drm_crtc *crtc, > > > > > > > > } > > > > > > > > EXPORT_SYMBOL(drm_crtc_vblank_count_and_time); > > > > > > > > > > > > > > > > +/** > > > > > > > > + * drm_crtc_next_vblank_time - calculate the time of the next > > > > > > > > vblank > > > > > > > > + * @crtc: the crtc for which to calculate next vblank time > > > > > > > > + * @vblanktime: pointer to time to receive the next vblank > > > > > > > > timestamp. > > > > > > > > + * > > > > > > > > + * Calculate the expected time of the next vblank based on > > > > > > > > time of previous > > > > > > > > + * vblank and frame duration > > > > > > > > > > > > > > Hi, > > > > > > > > > > > > > > for VRR this targets the highest frame rate possible for the > > > > > > > current > > > > > > > VRR mode, right? > > > > > > > > > > > > > > > > > > > It is based on vblank->framedur_ns which is in turn based on > > > > > > mode->crtc_clock. Presumably for VRR that ends up being a maximum? > > > > > > > > > > I don't know. :-) > > > > > > > > At least for i915 this will give you the maximum frame > > > > duration. > > > > > > I suppose one could argue that maximum frame duration is the actual > > > deadline. Anything less is just moar fps, but not going to involve > > > stalling until vblank N+1, AFAIU > > > > > > > Also this does not calculate the the start of vblank, it > > > > calculates the start of active video. > > > > > > Probably something like end of previous frame's video.. might not be > > > _exactly_ correct (because some buffering involved), but OTOH on the > > > GPU side, I expect the driver to set a timer for a few ms or so before > > > the deadline. So there is some wiggle room. > > > > The vblank timestamp is defined to be the time of the first active > > pixel of the frame in the video signal. At least that's the one that > > UAPI carries (when not tearing?). It is not the start of vblank period. > > > > With VRR, the front porch before the first active pixel can be multiple > > milliseconds. The difference between 144 Hz and 60 Hz is 9.7 ms for > > example. > > What we really want is the deadline for the hw to latch for the next > frame.. 
which as Ville pointed out is definitely before the end of > vblank. > > Honestly this sort of feature is a lot more critical for the non-VRR > case, and VRR is kind of a minority edge case. So I'd prefer not to > get too hung up on VRR. If there is an easy way for the helpers to > detect VRR, I'd be perfectly fine not setting a deadline hint in that > case, and let someone who actually has a VRR display figure out how to > handle that case. The formula I gave you earlier works for both VRR and non-VRR. -- Ville Syrjälä Intel
Re: [PATCH v4 03/14] dma-buf/fence-chain: Add fence deadline support
On Wed, Feb 22, 2023 at 2:27 AM Tvrtko Ursulin wrote: > > > On 18/02/2023 21:15, Rob Clark wrote: > > From: Rob Clark > > > > Propagate the deadline to all the fences in the chain. > > > > Signed-off-by: Rob Clark > > Reviewed-by: Christian König for this one. > > --- > > drivers/dma-buf/dma-fence-chain.c | 13 + > > 1 file changed, 13 insertions(+) > > > > diff --git a/drivers/dma-buf/dma-fence-chain.c > > b/drivers/dma-buf/dma-fence-chain.c > > index a0d920576ba6..4684874af612 100644 > > --- a/drivers/dma-buf/dma-fence-chain.c > > +++ b/drivers/dma-buf/dma-fence-chain.c > > @@ -206,6 +206,18 @@ static void dma_fence_chain_release(struct dma_fence > > *fence) > > dma_fence_free(fence); > > } > > > > + > > +static void dma_fence_chain_set_deadline(struct dma_fence *fence, > > + ktime_t deadline) > > +{ > > + dma_fence_chain_for_each(fence, fence) { > > + struct dma_fence_chain *chain = to_dma_fence_chain(fence); > > + struct dma_fence *f = chain ? chain->fence : fence; > > Low level comment - above two lines could be replaced with: > > struct dma_fence *f = dma_fence_chain_contained(fence); > > Although to be fair I am not sure that wouldn't be making it less > readable. From the point of view that fence might not be a chain, so > dma_fence_chain_contained() reads a bit dodgy as an API. It does seem to be the canonical way to do it since 18f5fad275ef ("dma-buf: add dma_fence_chain_contained helper").. looks like I missed that when I rebased. I guess for consistency it's best that I use the helper. BR, -R > Regards, > > Tvrtko > > > + > > + dma_fence_set_deadline(f, deadline); > > + } > > +} > > + > > const struct dma_fence_ops dma_fence_chain_ops = { > > .use_64bit_seqno = true, > > .get_driver_name = dma_fence_chain_get_driver_name, > > @@ -213,6 +225,7 @@ const struct dma_fence_ops dma_fence_chain_ops = { > > .enable_signaling = dma_fence_chain_enable_signaling, > > .signaled = dma_fence_chain_signaled, > > .release = dma_fence_chain_release, > > + .set_deadline = dma_fence_chain_set_deadline, > > }; > > EXPORT_SYMBOL(dma_fence_chain_ops); > >
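For reference, the loop rewritten with the helper Tvrtko points at would look roughly as follows; dma_fence_chain_contained() returns the fence wrapped by a chain link, or the fence itself when it is not a chain link:

static void dma_fence_chain_set_deadline(struct dma_fence *fence,
                                         ktime_t deadline)
{
        dma_fence_chain_for_each(fence, fence) {
                /* Unwrap chain links; plain fences pass through unchanged. */
                dma_fence_set_deadline(dma_fence_chain_contained(fence),
                                       deadline);
        }
}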
Re: [PATCH v4 11/14] drm/atomic-helper: Set fence deadline for vblank
On Wed, Feb 22, 2023 at 2:46 AM Luben Tuikov wrote: > > On 2023-02-18 16:15, Rob Clark wrote: > > From: Rob Clark > > > > For an atomic commit updating a single CRTC (ie. a pageflip) calculate > > the next vblank time, and inform the fence(s) of that deadline. > > > > v2: Comment typo fix (danvet) > > > > Signed-off-by: Rob Clark > > Reviewed-by: Daniel Vetter > > Signed-off-by: Rob Clark > > --- > > drivers/gpu/drm/drm_atomic_helper.c | 36 + > > 1 file changed, 36 insertions(+) > > > > diff --git a/drivers/gpu/drm/drm_atomic_helper.c > > b/drivers/gpu/drm/drm_atomic_helper.c > > index d579fd8f7cb8..35a4dc714920 100644 > > --- a/drivers/gpu/drm/drm_atomic_helper.c > > +++ b/drivers/gpu/drm/drm_atomic_helper.c > > @@ -1511,6 +1511,40 @@ void drm_atomic_helper_commit_modeset_enables(struct > > drm_device *dev, > > } > > EXPORT_SYMBOL(drm_atomic_helper_commit_modeset_enables); > > > > +/* > > + * For atomic updates which touch just a single CRTC, calculate the time > > of the > > + * next vblank, and inform all the fences of the deadline. > > + */ > > +static void set_fence_deadline(struct drm_device *dev, > > +struct drm_atomic_state *state) > > +{ > > + struct drm_crtc *crtc, *wait_crtc = NULL; > > + struct drm_crtc_state *new_crtc_state; > > + struct drm_plane *plane; > > + struct drm_plane_state *new_plane_state; > > + ktime_t vbltime; > > I've not looked at the latest language spec, but AFAIR "vbltime" > would be uninitialized here. Has this changed? > > > + int i; > > + > > + for_each_new_crtc_in_state (state, crtc, new_crtc_state, i) { > > + if (wait_crtc) > > + return; > > + wait_crtc = crtc; > > + } > > + > > + /* If no CRTCs updated, then nothing to do: */ > > + if (!wait_crtc) > > + return; > > + > > + if (drm_crtc_next_vblank_time(wait_crtc, &vbltime)) > > + return; > > We have a problem here in that we're adding the time remaining to the next > vblank event to an uninitialized local variable. As per my comment on patch > 10, > I'd rather drm_crtc_next_vblank_time() yield the time remaining to the vblank > event, > and we can do the arithmetic locally here in this function. if drm_crtc_next_vblank_time() returns 0 then it has initialized vbltime, so no problem here BR, -R > -- > Regards, > Luben > > > + > > + for_each_new_plane_in_state (state, plane, new_plane_state, i) { > > + if (!new_plane_state->fence) > > + continue; > > + dma_fence_set_deadline(new_plane_state->fence, vbltime); > > + } > > +} > > + > > /** > > * drm_atomic_helper_wait_for_fences - wait for fences stashed in plane > > state > > * @dev: DRM device > > @@ -1540,6 +1574,8 @@ int drm_atomic_helper_wait_for_fences(struct > > drm_device *dev, > > struct drm_plane_state *new_plane_state; > > int i, ret; > > > > + set_fence_deadline(dev, state); > > + > > for_each_new_plane_in_state(state, plane, new_plane_state, i) { > > if (!new_plane_state->fence) > > continue; >
Re: [PATCH v4 10/14] drm/vblank: Add helper to get next vblank time
On Wed, Feb 22, 2023 at 2:37 AM Luben Tuikov wrote: > > On 2023-02-18 16:15, Rob Clark wrote: > > From: Rob Clark > > > > Will be used in the next commit to set a deadline on fences that an > > atomic update is waiting on. > > > > Signed-off-by: Rob Clark > > --- > > drivers/gpu/drm/drm_vblank.c | 32 > > include/drm/drm_vblank.h | 1 + > > 2 files changed, 33 insertions(+) > > > > diff --git a/drivers/gpu/drm/drm_vblank.c b/drivers/gpu/drm/drm_vblank.c > > index 2ff31717a3de..caf25ebb34c5 100644 > > --- a/drivers/gpu/drm/drm_vblank.c > > +++ b/drivers/gpu/drm/drm_vblank.c > > @@ -980,6 +980,38 @@ u64 drm_crtc_vblank_count_and_time(struct drm_crtc > > *crtc, > > } > > EXPORT_SYMBOL(drm_crtc_vblank_count_and_time); > > > > +/** > > + * drm_crtc_next_vblank_time - calculate the time of the next vblank > > + * @crtc: the crtc for which to calculate next vblank time > > + * @vblanktime: pointer to time to receive the next vblank timestamp. > > + * > > + * Calculate the expected time of the next vblank based on time of previous > > + * vblank and frame duration > > + */ > > +int drm_crtc_next_vblank_time(struct drm_crtc *crtc, ktime_t *vblanktime) > > +{ > > + unsigned int pipe = drm_crtc_index(crtc); > > + struct drm_vblank_crtc *vblank = &crtc->dev->vblank[pipe]; > > + u64 count; > > + > > + if (!vblank->framedur_ns) > > + return -EINVAL; > > + > > + count = drm_vblank_count_and_time(crtc->dev, pipe, vblanktime); > > + > > + /* > > + * If we don't get a valid count, then we probably also don't > > + * have a valid time: > > + */ > > + if (!count) > > + return -EINVAL; > > + > > + *vblanktime = ktime_add(*vblanktime, > > ns_to_ktime(vblank->framedur_ns)); > > I'd rather this not do any arithmetic, i.e. add, and simply return the > calculated > remaining time, i.e. time left--so instead of add, it would simply assign > the remaining time, and possibly rename the vblanktime to something like > "time_to_vblank." > Note that since I sent the last iteration, I've renamed it to drm_crtc_next_vblank_start(). I would prefer to keep the arithmetic, because I have another use for this helper in drm/msm (for async/cursor updates, where we want to set an hrtimer for start of vblank). It is a bit off the topic of this series, but I can include the patch when I repost. BR, -R > Changing the top comment to "calculate the time remaining to the next vblank". > -- > Regards, > Luben > > > + > > + return 0; > > +} > > +EXPORT_SYMBOL(drm_crtc_next_vblank_time); > > + > > static void send_vblank_event(struct drm_device *dev, > > struct drm_pending_vblank_event *e, > > u64 seq, ktime_t now) > > diff --git a/include/drm/drm_vblank.h b/include/drm/drm_vblank.h > > index 733a3e2d1d10..a63bc2c92f3c 100644 > > --- a/include/drm/drm_vblank.h > > +++ b/include/drm/drm_vblank.h > > @@ -230,6 +230,7 @@ bool drm_dev_has_vblank(const struct drm_device *dev); > > u64 drm_crtc_vblank_count(struct drm_crtc *crtc); > > u64 drm_crtc_vblank_count_and_time(struct drm_crtc *crtc, > > ktime_t *vblanktime); > > +int drm_crtc_next_vblank_time(struct drm_crtc *crtc, ktime_t *vblanktime); > > void drm_crtc_send_vblank_event(struct drm_crtc *crtc, > > struct drm_pending_vblank_event *e); > > void drm_crtc_arm_vblank_event(struct drm_crtc *crtc, >
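The msm use case mentioned above also shows why returning an absolute ktime_t is convenient: it can be fed straight into an absolute-mode hrtimer. A sketch, with the timer being assumed driver state and using this revision's drm_crtc_next_vblank_time() name:

#include <drm/drm_vblank.h>
#include <linux/hrtimer.h>

static struct hrtimer frame_timer;      /* assumed: callback kicks the update */

static void arm_for_next_vblank(struct drm_crtc *crtc)
{
        ktime_t vbltime;

        if (drm_crtc_next_vblank_time(crtc, &vbltime))
                return;         /* vblank machinery not running */

        /* Absolute-time timer firing at the predicted next vblank. */
        hrtimer_start(&frame_timer, vbltime, HRTIMER_MODE_ABS);
}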
Re: [PATCH v4 10/14] drm/vblank: Add helper to get next vblank time
On Wed, Feb 22, 2023 at 1:57 AM Pekka Paalanen wrote: > > On Tue, 21 Feb 2023 09:50:20 -0800 > Rob Clark wrote: > > > On Tue, Feb 21, 2023 at 5:01 AM Ville Syrjälä > > wrote: > > > > > > On Tue, Feb 21, 2023 at 10:45:51AM +0200, Pekka Paalanen wrote: > > > > On Mon, 20 Feb 2023 07:55:41 -0800 > > > > Rob Clark wrote: > > > > > > > > > On Mon, Feb 20, 2023 at 1:08 AM Pekka Paalanen > > > > > wrote: > > > > > > > > > > > > On Sat, 18 Feb 2023 13:15:53 -0800 > > > > > > Rob Clark wrote: > > > > > > > > > > > > > From: Rob Clark > > > > > > > > > > > > > > Will be used in the next commit to set a deadline on fences that > > > > > > > an > > > > > > > atomic update is waiting on. > > > > > > > > > > > > > > Signed-off-by: Rob Clark > > > > > > > --- > > > > > > > drivers/gpu/drm/drm_vblank.c | 32 > > > > > > > > > > > > > > include/drm/drm_vblank.h | 1 + > > > > > > > 2 files changed, 33 insertions(+) > > > > > > > > > > > > > > diff --git a/drivers/gpu/drm/drm_vblank.c > > > > > > > b/drivers/gpu/drm/drm_vblank.c > > > > > > > index 2ff31717a3de..caf25ebb34c5 100644 > > > > > > > --- a/drivers/gpu/drm/drm_vblank.c > > > > > > > +++ b/drivers/gpu/drm/drm_vblank.c > > > > > > > @@ -980,6 +980,38 @@ u64 drm_crtc_vblank_count_and_time(struct > > > > > > > drm_crtc *crtc, > > > > > > > } > > > > > > > EXPORT_SYMBOL(drm_crtc_vblank_count_and_time); > > > > > > > > > > > > > > +/** > > > > > > > + * drm_crtc_next_vblank_time - calculate the time of the next > > > > > > > vblank > > > > > > > + * @crtc: the crtc for which to calculate next vblank time > > > > > > > + * @vblanktime: pointer to time to receive the next vblank > > > > > > > timestamp. > > > > > > > + * > > > > > > > + * Calculate the expected time of the next vblank based on time > > > > > > > of previous > > > > > > > + * vblank and frame duration > > > > > > > > > > > > Hi, > > > > > > > > > > > > for VRR this targets the highest frame rate possible for the current > > > > > > VRR mode, right? > > > > > > > > > > > > > > > > It is based on vblank->framedur_ns which is in turn based on > > > > > mode->crtc_clock. Presumably for VRR that ends up being a maximum? > > > > > > > > I don't know. :-) > > > > > > At least for i915 this will give you the maximum frame > > > duration. > > > > I suppose one could argue that maximum frame duration is the actual > > deadline. Anything less is just moar fps, but not going to involve > > stalling until vblank N+1, AFAIU > > > > > Also this does not calculate the the start of vblank, it > > > calculates the start of active video. > > > > Probably something like end of previous frame's video.. might not be > > _exactly_ correct (because some buffering involved), but OTOH on the > > GPU side, I expect the driver to set a timer for a few ms or so before > > the deadline. So there is some wiggle room. > > The vblank timestamp is defined to be the time of the first active > pixel of the frame in the video signal. At least that's the one that > UAPI carries (when not tearing?). It is not the start of vblank period. > > With VRR, the front porch before the first active pixel can be multiple > milliseconds. The difference between 144 Hz and 60 Hz is 9.7 ms for > example. What we really want is the deadline for the hw to latch for the next frame.. which as Ville pointed out is definitely before the end of vblank. Honestly this sort of feature is a lot more critical for the non-VRR case, and VRR is kind of a minority edge case. So I'd prefer not to get too hung up on VRR. 
If there is an easy way for the helpers to detect VRR, I'd be perfectly fine not setting a deadline hint in that case, and let someone who actually has a VRR display figure out how to handle that case. BR, -R
Re: [PATCH 0/2] Add quirk to disable PSR 2 on Tongfang PHxTxX1 and PHxTQx1
Am 22.02.23 um 15:17 schrieb Werner Sembach: On these barebones PSR 2 is recognized as supported but is very buggy: - The upper third of the screen does sometimes not update, resulting in disappearing cursors or ghosts of already closed windows staying behind. - Approximately 40 px from the bottom edge, a 3 pixel wide strip of randomly colored pixels is flickering. PSR 1 is working fine however. This patchset introduces a new quirk to disable PSR 2 specifically on known buggy devices and applies it to the Tongfang PHxTxX1 and PHxTQx1 barebones. Signed-off-by: Werner Sembach Cc: Parallel to this there is a patch fixing the root cause of this issue: https://gitlab.freedesktop.org/drm/intel/-/issues/7347#note_1785094 So this quirk might only be relevant for stable kernels, depending on when that other patch gets merged.
Re: [PATCH v4 06/14] dma-buf/sync_file: Support (E)POLLPRI
On Wed, Feb 22, 2023 at 1:49 AM Pekka Paalanen wrote: > > On Tue, 21 Feb 2023 09:53:56 -0800 > Rob Clark wrote: > > > On Tue, Feb 21, 2023 at 8:48 AM Luben Tuikov wrote: > > > > > > On 2023-02-20 11:14, Rob Clark wrote: > > > > On Mon, Feb 20, 2023 at 12:53 AM Pekka Paalanen > > > > wrote: > > > >> > > > >> On Sat, 18 Feb 2023 13:15:49 -0800 > > > >> Rob Clark wrote: > > > >> > > > >>> From: Rob Clark > > > >>> > > > >>> Allow userspace to use the EPOLLPRI/POLLPRI flag to indicate an urgent > > > >>> wait (as opposed to a "housekeeping" wait to know when to cleanup > > > >>> after > > > >>> some work has completed). Usermode components of GPU driver stacks > > > >>> often poll() on fence fd's to know when it is safe to do things like > > > >>> free or reuse a buffer, but they can also poll() on a fence fd when > > > >>> waiting to read back results from the GPU. The EPOLLPRI/POLLPRI flag > > > >>> lets the kernel differentiate these two cases. > > > >>> > > > >>> Signed-off-by: Rob Clark > > > >> > > > >> Hi, > > > >> > > > >> where would the UAPI documentation of this go? > > > >> It seems to be missing. > > > > > > > > Good question, I am not sure. The poll() man page has a description, > > > > but my usage doesn't fit that _exactly_ (but OTOH the description is a > > > > bit vague). > > > > > > > >> If a Wayland compositor is polling application fences to know which > > > >> client buffer to use in its rendering, should the compositor poll with > > > >> PRI or not? If a compositor polls with PRI, then all fences from all > > > >> applications would always be PRI. Would that be harmful somehow or > > > >> would it be beneficial? > > > > > > > > I think a compositor would rather use the deadline ioctl and then poll > > > > without PRI. Otherwise you are giving an urgency signal to the fence > > > > signaller which might not necessarily be needed. > > > > > > > > The places where I expect PRI to be useful is more in mesa (things > > > > like glFinish(), readpix, and other similar sorts of blocking APIs) > > > Hi, > > > > > > Hmm, but then user-space could do the opposite, namely, submit work as > > > usual--never > > > using the SET_DEADLINE ioctl, and then at the end, poll using (E)POLLPRI. > > > That seems > > > like a possible usage pattern, unintended--maybe, but possible. Do we > > > want to discourage > > > this? Wouldn't SET_DEADLINE be enough? I mean, one can call SET_DEADLINE > > > with the current > > > time, and then wouldn't that be equivalent to (E)POLLPRI? > > > > Yeah, (E)POLLPRI isn't strictly needed if we have SET_DEADLINE. It is > > slightly more convenient if you want an immediate deadline (single > > syscall instead of two), but not strictly needed. OTOH it piggy-backs > > on existing UABI. > > In that case, I would be conservative, and not add the POLLPRI > semantics. An UAPI addition that is not strictly needed and somewhat > unclear if it violates any design principles is best not done, until it > is proven to be beneficial. > > Besides, a Wayland compositor does not necessary need to add the fd > to its main event loop for poll. It could just SET_DEADLINE, and then > when it renders simply check if the fence passed or not already. Not > polling means the compositor does not need to wake up at the moment the > fence signals to just record a flag. poll(POLLPRI) isn't intended for wayland.. but is a thing I want in mesa for fence waits. 
I _could_ use SET_DEADLINE but it is two syscalls and correspondingly more code ;-) > On another matter, if the application uses SET_DEADLINE with one > timestamp, and the compositor uses SET_DEADLINE on the same thing with > another timestamp, what should happen? The expectation is that many deadline hints can be set on a fence. The fence signaller should track the soonest deadline. BR, -R > Maybe it's a soft-realtime app whose primary goal is not display, and > it needs the result faster than the window server? > > Maybe SET_DEADLINE should set the deadline only to an earlier timestamp > and never later? > > > Thanks, > pq
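To spell out the "one syscall instead of two" point, the userspace side of an urgent wait is plain poll(2) on the sync_file fd; under the semantics proposed in this series, adding POLLPRI marks the wait as urgent:

#include <poll.h>

/* Block until the fence fd signals, flagging the wait as urgent so the
 * kernel can propagate an immediate deadline to the signaler. POLLIN is
 * the ordinary sync_file wakeup; POLLPRI is the new urgency hint. */
static int wait_fence_urgent(int fence_fd, int timeout_ms)
{
        struct pollfd pfd = {
                .fd = fence_fd,
                .events = POLLPRI | POLLIN,
        };

        return poll(&pfd, 1, timeout_ms);
}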
Re: [PATCH v5 09/14] drm/syncobj: Add deadline support for syncobj waits
On Wed, Feb 22, 2023 at 6:06 AM Rodrigo Vivi wrote: > > On Wed, Feb 22, 2023 at 12:09:04PM +0200, Pekka Paalanen wrote: > > On Tue, 21 Feb 2023 09:25:18 -0800 > > Rob Clark wrote: > > > > > On Tue, Feb 21, 2023 at 12:53 AM Pekka Paalanen > > > wrote: > > > > > > > > On Mon, 20 Feb 2023 12:18:56 -0800 > > > > Rob Clark wrote: > > > > > > > > > From: Rob Clark > > > > > > > > > > Add a new flag to let userspace provide a deadline as a hint for > > > > > syncobj > > > > > and timeline waits. This gives a hint to the driver signaling the > > > > > backing fences about how soon userspace needs it to complete work, so > > > > > it > > > > > can adjust GPU frequency accordingly. An immediate deadline can be > > > > > given to provide something equivalent to i915 "wait boost". > > > > > > > > > > v2: Use absolute u64 ns value for deadline hint, drop cap and driver > > > > > feature flag in favor of allowing count_handles==0 as a way for > > > > > userspace to probe kernel for support of new flag > > > > > > > > > > Signed-off-by: Rob Clark > > > > > --- > > > > > drivers/gpu/drm/drm_syncobj.c | 59 > > > > > +++ > > > > > include/uapi/drm/drm.h| 5 +++ > > > > > 2 files changed, 51 insertions(+), 13 deletions(-) > > > > > > > > ... > > > > > > > > > diff --git a/include/uapi/drm/drm.h b/include/uapi/drm/drm.h > > > > > index 642808520d92..aefc8cc743e0 100644 > > > > > --- a/include/uapi/drm/drm.h > > > > > +++ b/include/uapi/drm/drm.h > > > > > @@ -887,6 +887,7 @@ struct drm_syncobj_transfer { > > > > > #define DRM_SYNCOBJ_WAIT_FLAGS_WAIT_ALL (1 << 0) > > > > > #define DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT (1 << 1) > > > > > #define DRM_SYNCOBJ_WAIT_FLAGS_WAIT_AVAILABLE (1 << 2) /* wait for > > > > > time point to become available */ > > > > > +#define DRM_SYNCOBJ_WAIT_FLAGS_WAIT_DEADLINE (1 << 3) /* set fence > > > > > deadline based to deadline_nsec/sec */ > > > > > > > > Hi, > > > > > > > > where is the UAPI documentation explaining what is a "fence deadline" > > > > and what setting it does here? > > > > > > It's with the rest of the drm_syncobj UAPI docs ;-) > > > > Is that > > https://www.kernel.org/doc/html/latest/driver-api/dma-buf.html#dma-fence-uabi-sync-file > > ? > > > > That whole page never mentions e.g. WAIT_AVAILABLE, so at least the > > flags are not there. Does not mention syncobj_wait either. Sorry, that was a snarky reference to thin UABI docs about syncobj. The kernel side of it is better documented. > probably this: > https://docs.kernel.org/gpu/drm-mm.html > > the new one needs to be added there as well. > > > > > I could ask where the real non-IGT userspace is or the plan for it, > > too, since this is new DRM UAPI. > > yeap, it looks like we need to close on this... > https://gitlab.freedesktop.org/drm/intel/-/issues/8014 > > I confess I got lost on the many discussions and on how this will > be used. Is mesa going to set the deadline based on the vk priority? > > Will this continue to be internal? I didn't get the broader picture > I'm afraid... Yes, the plan is to use it from mesa vk-common helpers (and elsewhere in mesa if needed). There is a separate discussion[1] about limiting/allowing boost (perhaps based on ctx flag or cgroups) but that is more about how drivers react to the deadline hint. 
The immediate goal of this patch is just to fix the regression mentioned in that gitlab issue when using syncobj waits instead of DRM_IOCTL_I915_GEM_WAIT BR, -R [1] https://www.spinics.net/lists/dri-devel/msg383075.html > > > > > > Thanks, > > pq > > > > > > > > BR, > > > -R > > > > > > > btw. no nsec/sec anymore. > > > > > > > > > > > > Thanks, > > > > pq > > > > > > > > > > > > > struct drm_syncobj_wait { > > > > > __u64 handles; > > > > > /* absolute timeout */ > > > > > @@ -895,6 +896,8 @@ struct drm_syncobj_wait { > > > > > __u32 flags; > > > > > __u32 first_signaled; /* only valid when not waiting all */ > > > > > __u32 pad; > > > > > + /* Deadline hint to set on backing fence(s) in CLOCK_MONOTONIC: > > > > > */ > > > > > + __u64 deadline_ns; > > > > > }; > > > > > > > > > > struct drm_syncobj_timeline_wait { > > > > > @@ -907,6 +910,8 @@ struct drm_syncobj_timeline_wait { > > > > > __u32 flags; > > > > > __u32 first_signaled; /* only valid when not waiting all */ > > > > > __u32 pad; > > > > > + /* Deadline hint to set on backing fence(s) in CLOCK_MONOTONIC: > > > > > */ > > > > > + __u64 deadline_ns; > > > > > }; > > > > > > > > > > > > > > > > > >
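Going by the v2 uapi quoted earlier in the thread (absolute CLOCK_MONOTONIC nanoseconds in deadline_ns, support probed via a count_handles == 0 call), the userspace side would look roughly like this; the fd and handle are placeholders:

#include <stdint.h>
#include <sys/ioctl.h>
#include <drm/drm.h>

static int syncobj_wait_with_deadline(int drm_fd, uint32_t handle,
                                      int64_t timeout_ns, uint64_t deadline_ns)
{
        struct drm_syncobj_wait wait = {
                .handles = (uintptr_t)&handle,
                .timeout_nsec = timeout_ns,
                .count_handles = 1,
                .flags = DRM_SYNCOBJ_WAIT_FLAGS_WAIT_DEADLINE,
                .deadline_ns = deadline_ns,     /* hint for the signaler */
        };

        return ioctl(drm_fd, DRM_IOCTL_SYNCOBJ_WAIT, &wait);
}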
Re: [PATCH v4 01/14] dma-buf/dma-fence: Add deadline awareness
Am 22.02.23 um 11:23 schrieb Tvrtko Ursulin: On 18/02/2023 21:15, Rob Clark wrote: From: Rob Clark Add a way to hint to the fence signaler of an upcoming deadline, such as vblank, which the fence waiter would prefer not to miss. This is to aid the fence signaler in making power management decisions, like boosting frequency as the deadline approaches, and awareness of missed deadlines so that it can be factored into the frequency scaling. v2: Drop dma_fence::deadline and related logic to filter duplicate deadlines, to avoid increasing dma_fence size. The fence-context implementation will need similar logic to track deadlines of all the fences on the same timeline. [ckoenig] v3: Clarify locking wrt. set_deadline callback Signed-off-by: Rob Clark Reviewed-by: Christian König --- drivers/dma-buf/dma-fence.c | 20 include/linux/dma-fence.h | 20 2 files changed, 40 insertions(+) diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c index 0de0482cd36e..763b32627684 100644 --- a/drivers/dma-buf/dma-fence.c +++ b/drivers/dma-buf/dma-fence.c @@ -912,6 +912,26 @@ dma_fence_wait_any_timeout(struct dma_fence **fences, uint32_t count, } EXPORT_SYMBOL(dma_fence_wait_any_timeout); + +/** + * dma_fence_set_deadline - set desired fence-wait deadline + * @fence: the fence that is to be waited on + * @deadline: the time by which the waiter hopes for the fence to be + * signaled + * + * Inform the fence signaler of an upcoming deadline, such as vblank, by + * which point the waiter would prefer the fence to be signaled. This + * is intended to give feedback to the fence signaler to aid in power + * management decisions, such as boosting GPU frequency if a periodic + * vblank deadline is approaching. + */ +void dma_fence_set_deadline(struct dma_fence *fence, ktime_t deadline) +{ + if (fence->ops->set_deadline && !dma_fence_is_signaled(fence)) + fence->ops->set_deadline(fence, deadline); +} +EXPORT_SYMBOL(dma_fence_set_deadline); + /** * dma_fence_describe - Dump fence description into seq_file * @fence: the fence to describe diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h index 775cdc0b4f24..d77f6591c453 100644 --- a/include/linux/dma-fence.h +++ b/include/linux/dma-fence.h @@ -99,6 +99,7 @@ enum dma_fence_flag_bits { DMA_FENCE_FLAG_SIGNALED_BIT, DMA_FENCE_FLAG_TIMESTAMP_BIT, DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT, + DMA_FENCE_FLAG_HAS_DEADLINE_BIT, Would this bit be better left out from the core implementation, given that the approach is for the component which implements dma-fence to track the actual deadline and all? Also taking a step back - are we all okay with starting to expand the relatively simple core synchronisation primitive with side channel data like this? What would be the criteria for what side channel data would be acceptable? Taking note that the thing lives outside drivers/gpu/. I had similar concerns and it took me a moment as well to understand the background why this is necessary. I essentially don't see any other approach we could take. Yes, this is GPU/CRTC specific, but we somehow need a common interface for communicating it between drivers and that's the dma_fence object as far as I can see. Regards, Christian. 
Regards, Tvrtko DMA_FENCE_FLAG_USER_BITS, /* must always be last member */ }; @@ -257,6 +258,23 @@ struct dma_fence_ops { */ void (*timeline_value_str)(struct dma_fence *fence, char *str, int size); + + /** + * @set_deadline: + * + * Callback to allow a fence waiter to inform the fence signaler of + * an upcoming deadline, such as vblank, by which point the waiter + * would prefer the fence to be signaled. This is intended to + * give feedback to the fence signaler to aid in power management + * decisions, such as boosting GPU frequency. + * + * This is called without &dma_fence.lock held; it can be called + * multiple times and from any context. Locking is up to the callee + * if it has some state to manage. + * + * This callback is optional. + */ + void (*set_deadline)(struct dma_fence *fence, ktime_t deadline); }; void dma_fence_init(struct dma_fence *fence, const struct dma_fence_ops *ops, @@ -583,6 +601,8 @@ static inline signed long dma_fence_wait(struct dma_fence *fence, bool intr) return ret < 0 ? ret : 0; } +void dma_fence_set_deadline(struct dma_fence *fence, ktime_t deadline); + struct dma_fence *dma_fence_get_stub(void); struct dma_fence *dma_fence_allocate_private_stub(void); u64 dma_fence_context_alloc(unsigned num);
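On the v3 locking clarification quoted above (the callback runs without &dma_fence.lock held and may be invoked repeatedly from any context), the callee manages its own state; a natural policy is to remember only the earliest deadline seen. A hypothetical signaler-side sketch, wired up via .set_deadline in the driver's dma_fence_ops:

#include <linux/dma-fence.h>
#include <linux/spinlock.h>

/* Hypothetical fence wrapper with callee-managed deadline state. */
struct example_fence {
        struct dma_fence base;
        spinlock_t deadline_lock;
        ktime_t deadline;               /* earliest deadline seen so far */
        bool has_deadline;
};

static void example_fence_set_deadline(struct dma_fence *f, ktime_t deadline)
{
        struct example_fence *ef = container_of(f, struct example_fence, base);
        unsigned long flags;

        spin_lock_irqsave(&ef->deadline_lock, flags);
        if (!ef->has_deadline || ktime_before(deadline, ef->deadline)) {
                ef->deadline = deadline;
                ef->has_deadline = true;
                /* e.g. queue work here to raise GPU clocks */
        }
        spin_unlock_irqrestore(&ef->deadline_lock, flags);
}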
Re: [PATCH v3 2/7] media: Add Y210, Y212 and Y216 formats
Hi Tomi, On Wednesday, 21 December 2022 at 11:24 +0200, Tomi Valkeinen wrote: > Add Y210, Y212 and Y216 formats. > > Signed-off-by: Tomi Valkeinen > --- > .../media/v4l/pixfmt-packed-yuv.rst | 49 ++- > drivers/media/v4l2-core/v4l2-ioctl.c | 3 ++ > include/uapi/linux/videodev2.h| 8 +++ > 3 files changed, 58 insertions(+), 2 deletions(-) It seems you omitted to update v4l2-common.c; Ming Qian had made a duplicate commit for this, I'll ask him if he can keep the -common changes you forgot. > > diff --git a/Documentation/userspace-api/media/v4l/pixfmt-packed-yuv.rst > b/Documentation/userspace-api/media/v4l/pixfmt-packed-yuv.rst > index bf283a1b5581..24a771542059 100644 > --- a/Documentation/userspace-api/media/v4l/pixfmt-packed-yuv.rst > +++ b/Documentation/userspace-api/media/v4l/pixfmt-packed-yuv.rst > @@ -262,7 +262,12 @@ the second byte and Y'\ :sub:`7-0` in the third byte. > = > > These formats, commonly referred to as YUYV or YUY2, subsample the chroma > -components horizontally by 2, storing 2 pixels in 4 bytes. > +components horizontally by 2, storing 2 pixels in a container. The container > +is 32-bits for 8-bit formats, and 64-bits for 10+-bit formats. > + > +The packed YUYV formats with more than 8 bits per component are stored as > four > +16-bit little-endian words. Each word's most significant bits contain one > +component, and the least significant bits are zero padding. > > .. raw:: latex > > @@ -270,7 +275,7 @@ components horizontally by 2, storing 2 pixels in 4 bytes. > > .. tabularcolumns:: > |p{3.4cm}|p{1.2cm}|p{0.8cm}|p{0.8cm}|p{0.8cm}|p{0.8cm}|p{0.8cm}|p{0.8cm}|p{0.8cm}|p{0.8cm}| > > -.. flat-table:: Packed YUV 4:2:2 Formats > +.. flat-table:: Packed YUV 4:2:2 Formats in 32-bit container > :header-rows: 1 > :stub-columns: 0 > > @@ -337,6 +342,46 @@ components horizontally by 2, storing 2 pixels in 4 > bytes. >- Y'\ :sub:`3` >- Cb\ :sub:`2` > > +.. tabularcolumns:: > |p{3.4cm}|p{1.2cm}|p{0.8cm}|p{0.8cm}|p{0.8cm}|p{0.8cm}|p{0.8cm}|p{0.8cm}|p{0.8cm}|p{0.8cm}| > + > +.. flat-table:: Packed YUV 4:2:2 Formats in 64-bit container > +:header-rows: 1 > +:stub-columns: 0 > + > +* - Identifier > + - Code > + - Word 0 > + - Word 1 > + - Word 2 > + - Word 3 > +* .. _V4L2-PIX-FMT-Y210: > + > + - ``V4L2_PIX_FMT_Y210`` > + - 'Y210' > + > + - Y'\ :sub:`0` (bits 15-6) > + - Cb\ :sub:`0` (bits 15-6) > + - Y'\ :sub:`1` (bits 15-6) > + - Cr\ :sub:`0` (bits 15-6) > +* .. _V4L2-PIX-FMT-Y212: > + > + - ``V4L2_PIX_FMT_Y212`` > + - 'Y212' > + > + - Y'\ :sub:`0` (bits 15-4) > + - Cb\ :sub:`0` (bits 15-4) > + - Y'\ :sub:`1` (bits 15-4) > + - Cr\ :sub:`0` (bits 15-4) > +* .. _V4L2-PIX-FMT-Y216: > + > + - ``V4L2_PIX_FMT_Y216`` > + - 'Y216' > + > + - Y'\ :sub:`0` (bits 15-0) > + - Cb\ :sub:`0` (bits 15-0) > + - Y'\ :sub:`1` (bits 15-0) > + - Cr\ :sub:`0` (bits 15-0) > + > .. 
raw:: latex > > \normalsize > diff --git a/drivers/media/v4l2-core/v4l2-ioctl.c > b/drivers/media/v4l2-core/v4l2-ioctl.c > index 875b9a95e3c8..a244d5181120 100644 > --- a/drivers/media/v4l2-core/v4l2-ioctl.c > +++ b/drivers/media/v4l2-core/v4l2-ioctl.c > @@ -1449,6 +1449,9 @@ static void v4l_fill_fmtdesc(struct v4l2_fmtdesc *fmt) > case V4L2_META_FMT_RK_ISP1_STAT_3A: descr = "Rockchip ISP1 3A > Statistics"; break; > case V4L2_PIX_FMT_NV12M_8L128: descr = "NV12M (8x128 Linear)"; break; > case V4L2_PIX_FMT_NV12M_10BE_8L128: descr = "10-bit NV12M (8x128 > Linear, BE)"; break; > + case V4L2_PIX_FMT_Y210: descr = "10-bit YUYV Packed"; break; > + case V4L2_PIX_FMT_Y212: descr = "12-bit YUYV Packed"; break; > + case V4L2_PIX_FMT_Y216: descr = "16-bit YUYV Packed"; break; > > default: > /* Compressed formats */ > diff --git a/include/uapi/linux/videodev2.h b/include/uapi/linux/videodev2.h > index 51d6a8aa4e17..403db3fb5cfa 100644 > --- a/include/uapi/linux/videodev2.h > +++ b/include/uapi/linux/videodev2.h > @@ -621,6 +621,14 @@ struct v4l2_pix_format { > #define V4L2_PIX_FMT_YUVX32 v4l2_fourcc('Y', 'U', 'V', 'X') /* 32 > YUVX-8-8-8-8 */ > #define V4L2_PIX_FMT_M420v4l2_fourcc('M', '4', '2', '0') /* 12 YUV > 4:2:0 2 lines y, 1 line uv interleaved */ > > +/* > + * YCbCr packed format. For each Y2xx format, xx bits of valid data occupy > the MSBs > + * of the 16 bit components, and 16-xx bits of zero padding occupy the LSBs. > + */ > +#define V4L2_PIX_FMT_Y210v4l2_fourcc('Y', '2', '1', '0') /* 32 YUYV > 4:2:2 */ > +#define V4L2_PIX_FMT_Y212v4l2_fourcc('Y', '2', '1', '2') /* 32 YUYV > 4:2:2 */ > +#define V4L2_PIX_FMT_Y216v4l2_fourcc('Y', '2', '1', '6') /* 32 YUYV > 4:2:2 */ > + > /* two planes -- one Y, one Cr + Cb interleaved */ > #define V4L2_PIX_FMT_NV12v4l2_f
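The MSB-justified layout described in the tables is compact enough to state in code. A sketch for Y210 - 10 valid bits per component in bits 15-6 of each little-endian 16-bit word, zero padding below:

#include <stdint.h>

/* Pack two horizontally adjacent pixels (Y0 and Y1 share one Cb/Cr pair)
 * into the four 16-bit words of a Y210 container. Inputs are 10-bit
 * component values. */
static void pack_y210(uint16_t out[4], uint16_t y0, uint16_t cb,
                      uint16_t y1, uint16_t cr)
{
        out[0] = (uint16_t)(y0 << 6);
        out[1] = (uint16_t)(cb << 6);
        out[2] = (uint16_t)(y1 << 6);
        out[3] = (uint16_t)(cr << 6);
}

/* Unpack: drop the zero padding to recover the 10-bit component. */
static inline uint16_t y210_component(uint16_t word)
{
        return word >> 6;
}

Y212 and Y216 follow the same scheme with shifts of 4 and 0 respectively.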
Re: [PATCH] backlight: ktz8866: Convert to i2c's .probe_new()
On Sun, 12 Feb 2023, Uwe Kleine-König wrote: > Hello Lee, > > On Mon, Jan 30, 2023 at 09:42:01AM +, Lee Jones wrote: > > On Fri, 27 Jan 2023, Uwe Kleine-König wrote: > > > > > The probe function doesn't make use of the i2c_device_id * parameter so > > > it can be trivially converted. > > > > > > Signed-off-by: Uwe Kleine-König > > > --- > > > Hello, > > > > > > there is an ongoing effort to convert all drivers to .probe_new to > > > eventually drop .probe with the i2c_device_id parameter. This driver > > > currently sits in next so wasn't on my radar before. > > > > > > My plan is to tackle that after the next merge window. So I ask you to > > > either apply this patch during the next merge window or accept that it > > > will go in via the i2c tree together with the patch that drops .probe(). > > > > > > drivers/video/backlight/ktz8866.c | 5 ++--- > > > 1 file changed, 2 insertions(+), 3 deletions(-) > > > > Applied, thanks > > Thanks, this patch didn't make it into next yet though. Is there a > problem other than pending tests before you publish it in your tree that > gets pulled into next? It's queued, ready to be submitted tomorrow. Don't worry. -- Lee Jones [李琼斯]
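For readers unfamiliar with the conversion, it is mechanical: the unused i2c_device_id parameter goes away and the driver switches to the .probe_new member. A generic sketch, not the actual ktz8866 code:

#include <linux/i2c.h>

/* After the conversion: .probe_new drops the unused id parameter. */
static int example_probe(struct i2c_client *client)
{
        /* ... device setup as before, minus the id argument ... */
        return 0;
}

static struct i2c_driver example_driver = {
        .driver         = { .name = "example" },
        .probe_new      = example_probe,
};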
Re: [PATCH drm-next v2 05/16] drm: manager to keep track of GPUs VA mappings
On 22.02.23 16:07, Danilo Krummrich wrote: On 2/22/23 11:25, Christian König wrote: On 17.02.23 14:44, Danilo Krummrich wrote: +/** + * DOC: Overview + * + * The DRM GPU VA Manager, represented by struct drm_gpuva_manager keeps track + * of a GPU's virtual address (VA) space and manages the corresponding virtual + * mappings represented by &drm_gpuva objects. It also keeps track of the + * mapping's backing &drm_gem_object buffers. + * + * &drm_gem_object buffers maintain a list (and a corresponding list lock) of + * &drm_gpuva objects representing all existent GPU VA mappings using this + * &drm_gem_object as backing buffer. + * + * If the &DRM_GPUVA_MANAGER_REGIONS feature is enabled, a GPU VA mapping can + * only be created within a previously allocated &drm_gpuva_region, which + * represents a reserved portion of the GPU VA space. GPU VA mappings are not + * allowed to span over a &drm_gpuva_region's boundary. + * + * GPU VA regions can also be flagged as sparse, which allows drivers to create + * sparse mappings for a whole GPU VA region in order to support Vulkan + * 'Sparse Resources'. Well since we have now found that there is absolutely no technical reason for having those regions could we please drop them? I disagree that this was the outcome of our previous discussion. In nouveau I still need them to track the separate sparse page tables and, as you confirmed previously, Nvidia cards are not the only cards supporting this feature. The second reason is that with regions we can avoid merging between buffers, which saves some effort. However, I agree that this argument by itself probably doesn't hold too much, since you've pointed out in a previous mail that: 1) If we merge and decide to only do that inside certain boundaries then those boundaries need to be provided and checked against. This burns quite some CPU cycles. 2) If we just merge what we can we might have extra page table updates which cost time and could result in undesired side effects. 3) If we don't merge at all we have additional housekeeping for the mappings and maybe hw restrictions. However, if a driver uses regions to track its separate sparse page tables anyway it gets 1) for free, which is a nice synergy. I totally agree that regions aren't for everyone though. Hence, I made them an optional feature and by default regions are disabled. In order to use them, drm_gpuva_manager_init() must be called with the DRM_GPUVA_MANAGER_REGIONS feature flag. I really would not want to open-code regions or have two GPUVA manager instances in nouveau to track sparse page tables. That would be really messy, hence I hope we can agree on this to be an optional feature. I absolutely don't think that this is a good idea then. This separate handling of sparse page tables is completely Nouveau specific. Even when it's an optional feature, mixing this into the common handling is exactly what I pointed out as not properly separating between hardware-specific and hardware-agnostic functionality. This is exactly the problem we ran into with TTM as well, and I've spent a massive amount of time to clean that up again. Regards, Christian. I don't really see a need for them any more. Regards, Christian.
Re: [PATCH drm-next v2 05/16] drm: manager to keep track of GPUs VA mappings
On 2/22/23 11:25, Christian König wrote: On 17.02.23 14:44, Danilo Krummrich wrote: +/** + * DOC: Overview + * + * The DRM GPU VA Manager, represented by struct drm_gpuva_manager keeps track + * of a GPU's virtual address (VA) space and manages the corresponding virtual + * mappings represented by &drm_gpuva objects. It also keeps track of the + * mapping's backing &drm_gem_object buffers. + * + * &drm_gem_object buffers maintain a list (and a corresponding list lock) of + * &drm_gpuva objects representing all existent GPU VA mappings using this + * &drm_gem_object as backing buffer. + * + * If the &DRM_GPUVA_MANAGER_REGIONS feature is enabled, a GPU VA mapping can + * only be created within a previously allocated &drm_gpuva_region, which + * represents a reserved portion of the GPU VA space. GPU VA mappings are not + * allowed to span over a &drm_gpuva_region's boundary. + * + * GPU VA regions can also be flagged as sparse, which allows drivers to create + * sparse mappings for a whole GPU VA region in order to support Vulkan + * 'Sparse Resources'. Well since we have now found that there is absolutely no technical reason for having those regions could we please drop them? I disagree that this was the outcome of our previous discussion. In nouveau I still need them to track the separate sparse page tables and, as you confirmed previously, Nvidia cards are not the only cards supporting this feature. The second reason is that with regions we can avoid merging between buffers, which saves some effort. However, I agree that this argument by itself probably doesn't hold too much, since you've pointed out in a previous mail that: 1) If we merge and decide to only do that inside certain boundaries then those boundaries need to be provided and checked against. This burns quite some CPU cycles. 2) If we just merge what we can we might have extra page table updates which cost time and could result in undesired side effects. 3) If we don't merge at all we have additional housekeeping for the mappings and maybe hw restrictions. However, if a driver uses regions to track its separate sparse page tables anyway it gets 1) for free, which is a nice synergy. I totally agree that regions aren't for everyone though. Hence, I made them an optional feature and by default regions are disabled. In order to use them, drm_gpuva_manager_init() must be called with the DRM_GPUVA_MANAGER_REGIONS feature flag. I really would not want to open-code regions or have two GPUVA manager instances in nouveau to track sparse page tables. That would be really messy, hence I hope we can agree on this to be an optional feature. I don't really see a need for them any more. Regards, Christian.
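Since the opt-in nature of the feature is central to this disagreement, here is a purely hypothetical sketch based only on the names used in this thread; the series is still under review, so the real drm_gpuva_manager_init() signature may well differ, and the elided arguments are placeholders.

	struct drm_gpuva_manager mgr;

	/* A driver tracking sparse page tables via regions opts in: */
	drm_gpuva_manager_init(&mgr, /* ... driver-specific arguments ... */
			       DRM_GPUVA_MANAGER_REGIONS);

	/* Any other driver passes no feature flags and never sees regions: */
	drm_gpuva_manager_init(&mgr, /* ... driver-specific arguments ... */ 0);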
[PATCH 2/2] Apply quirk to disable PSR 2 on Tongfang PHxTxX1 and PHxTQx1
On these barebones, PSR 2 is recognized as supported but is very buggy: - The upper third of the screen is sometimes not updated, resulting in disappearing cursors or ghosts of already-closed windows staying behind. - Approximately 40 px from the bottom edge, a 3-pixel-wide strip of randomly colored pixels is flickering. PSR 1, however, works fine. Signed-off-by: Werner Sembach Cc: --- drivers/gpu/drm/i915/display/intel_quirks.c | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/drivers/gpu/drm/i915/display/intel_quirks.c b/drivers/gpu/drm/i915/display/intel_quirks.c index ce6d0fe6448f5..eeb32d3189f5c 100644 --- a/drivers/gpu/drm/i915/display/intel_quirks.c +++ b/drivers/gpu/drm/i915/display/intel_quirks.c @@ -65,6 +65,10 @@ static void quirk_no_pps_backlight_power_hook(struct drm_i915_private *i915) drm_info(&i915->drm, "Applying no pps backlight power quirk\n"); } +/* + * Tongfang PHxTxX1 and PHxTQx1 devices have support for PSR 2 but it is broken + * on Linux. PSR 1 however works just fine. + */ static void quirk_no_psr2(struct drm_i915_private *i915) { intel_set_quirk(i915, QUIRK_NO_PSR2); @@ -205,6 +209,10 @@ static struct intel_quirk intel_quirks[] = { /* ECS Liva Q2 */ { 0x3185, 0x1019, 0xa94d, quirk_increase_ddi_disabled_time }, { 0x3184, 0x1019, 0xa94d, quirk_increase_ddi_disabled_time }, + + /* Tongfang PHxTxX1 and PHxTQx1/TUXEDO InfinityBook 14 Gen6 */ + { 0x9a49, 0x1d05, 0x1105, quirk_no_psr2 }, + { 0x9a49, 0x1d05, 0x114c, quirk_no_psr2 }, }; void intel_init_quirks(struct drm_i915_private *i915) -- 2.34.1
[PATCH 0/2] Add quirk to disable PSR 2 on Tongfang PHxTxX1 and PHxTQx1
On these barebones, PSR 2 is recognized as supported but is very buggy: - The upper third of the screen is sometimes not updated, resulting in disappearing cursors or ghosts of already-closed windows staying behind. - Approximately 40 px from the bottom edge, a 3-pixel-wide strip of randomly colored pixels is flickering. PSR 1, however, works fine. This patchset introduces a new quirk to disable PSR 2 specifically on known-buggy devices and applies it to the Tongfang PHxTxX1 and PHxTQx1 barebones. Signed-off-by: Werner Sembach Cc:
[PATCH 1/2] Add quirk to disable PSR 2 on a per device basis
This adds the possibility to disable PSR 2 explicitly using the intel quirk table for devices where PSR 2 is borked, but PSR 1 works fine. Signed-off-by: Werner Sembach Cc: --- drivers/gpu/drm/i915/display/intel_psr.c| 4 +++- drivers/gpu/drm/i915/display/intel_quirks.c | 6 ++ drivers/gpu/drm/i915/display/intel_quirks.h | 1 + 3 files changed, 10 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/i915/display/intel_psr.c b/drivers/gpu/drm/i915/display/intel_psr.c index 5b678916e6db5..4607f3c4cf592 100644 --- a/drivers/gpu/drm/i915/display/intel_psr.c +++ b/drivers/gpu/drm/i915/display/intel_psr.c @@ -37,6 +37,7 @@ #include "intel_psr.h" #include "intel_snps_phy.h" #include "skl_universal_plane.h" +#include "intel_quirks.h" /** * DOC: Panel Self Refresh (PSR/SRD) @@ -851,7 +852,8 @@ static bool intel_psr2_config_valid(struct intel_dp *intel_dp, int crtc_vdisplay = crtc_state->hw.adjusted_mode.crtc_vdisplay; int psr_max_h = 0, psr_max_v = 0, max_bpp = 0; - if (!intel_dp->psr.sink_psr2_support) + if (!intel_dp->psr.sink_psr2_support || + intel_has_quirk(dev_priv, QUIRK_NO_PSR2)) return false; /* JSL and EHL only supports eDP 1.3 */ diff --git a/drivers/gpu/drm/i915/display/intel_quirks.c b/drivers/gpu/drm/i915/display/intel_quirks.c index 6e48d3bcdfec5..ce6d0fe6448f5 100644 --- a/drivers/gpu/drm/i915/display/intel_quirks.c +++ b/drivers/gpu/drm/i915/display/intel_quirks.c @@ -65,6 +65,12 @@ static void quirk_no_pps_backlight_power_hook(struct drm_i915_private *i915) drm_info(&i915->drm, "Applying no pps backlight power quirk\n"); } +static void quirk_no_psr2(struct drm_i915_private *i915) +{ + intel_set_quirk(i915, QUIRK_NO_PSR2); + drm_info(&i915->drm, "Applying No PSR 2 quirk\n"); +} + struct intel_quirk { int device; int subsystem_vendor; diff --git a/drivers/gpu/drm/i915/display/intel_quirks.h b/drivers/gpu/drm/i915/display/intel_quirks.h index 10a4d163149fd..2e0b788a44a1e 100644 --- a/drivers/gpu/drm/i915/display/intel_quirks.h +++ b/drivers/gpu/drm/i915/display/intel_quirks.h @@ -17,6 +17,7 @@ enum intel_quirk_id { QUIRK_INVERT_BRIGHTNESS, QUIRK_LVDS_SSC_DISABLE, QUIRK_NO_PPS_BACKLIGHT_POWER_HOOK, + QUIRK_NO_PSR2, }; void intel_init_quirks(struct drm_i915_private *i915); -- 2.34.1
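For context on how such an entry takes effect, the table is consumed at init time by intel_init_quirks(), which matches the PCI device and subsystem IDs roughly as in the abbreviated sketch below (paraphrased, not the exact upstream code); the Tongfang entries from patch 2/2 then select quirk_no_psr2() this way.

/* Abbreviated sketch of the matching loop in intel_init_quirks(). */
struct pci_dev *d = to_pci_dev(i915->drm.dev);
int i;

for (i = 0; i < ARRAY_SIZE(intel_quirks); i++) {
	struct intel_quirk *q = &intel_quirks[i];

	if (d->device == q->device &&
	    (d->subsystem_vendor == q->subsystem_vendor ||
	     q->subsystem_vendor == PCI_ANY_ID) &&
	    (d->subsystem_device == q->subsystem_device ||
	     q->subsystem_device == PCI_ANY_ID))
		q->hook(i915);	/* e.g. quirk_no_psr2() sets QUIRK_NO_PSR2 */
}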
Re: [Intel-xe] [PATCH 1/3] drm/suballoc: Introduce a generic suballocation manager
On 22.02.23 14:54, Thomas Hellström wrote: Hi, On 2/22/23 12:39, Christian König wrote: Hi Thomas, On 22.02.23 12:00, Thomas Hellström wrote: Hi, Christian, So I resurrected Maarten's previous patch series around this (the amdgpu suballocator), slightly modified the code to match the API of this patch series, re-introduced the per-allocation alignment as per a previous review comment from you on that series, and made checkpatch.pl pass mostly, except for pre-existing style problems, and added / fixed some comments. No memory corruption seen so far on limited Xe testing. To move this forward I suggest starting with that as a common drm suballocator. I'll post the series later today. We can follow up with potential simplifications if needed. I also made a kunit test reporting some timing information. Will post that as a follow-up. Some interesting preliminary conclusions: * drm_mm is per se not a CPU hog. If the rb-tree processing is disabled and the EVICT algorithm is changed from MRU to ring-like LRU traversal, it's more or less just as fast as the ring suballocator. * With a single ring, and the suballocation buffer never completely filled (no sleeps), the amd suballocator is a bit faster per allocation / free. (Around 250 ns instead of 350.) Allocation is slightly slower on the amdgpu one, freeing is faster, mostly due to the locking overhead incurred when setting up the fence callbacks, and for avoiding irq-disabled processing on the one I proposed. For some more realistic numbers try to signal the fence from another CPU. Alternatively, you can invalidate all the CPU read cache lines touched by the fence callback so that they need to be read in again from the allocating CPU. Fences are signalled using hr-timer driven fake "ring"s, so should probably be distributed among cpus in a pretty realistic way. But anyway I agree results obtained from that kunit test can and should be challenged before we actually use them for improvements. I would double-check that. My expectation is that hr-timers execute by default on the CPU from which they are started. * With multiple rings and varying allocation sizes and signalling times creating fragmentation, the picture becomes different, as the amdgpu allocator starts to sleep/throttle already around 50%-75% fill, the one I proposed between 75% and 90% fill, and once that happens, the CPU cost of putting to sleep and waking up should really shadow the above numbers. So it's really a tradeoff, where IMO also code size and maintainability should play a role. Also I looked at the history of the amdgpu allocator originating back to Radeon 2012-ish, but couldn't find any commits mentioning fence callbacks nor problems with those. Could you point me to that discussion? Uff, that was ~10 years ago. I don't think I can find that again. OK, fair enough. But what was the objective reasoning against using fence callbacks for this sort of stuff, was it unforeseen locking problems, caching issues or something else? Well, cache line bouncing is one major problem. Also take a look at the discussion about using list_head in interrupt handlers, that should be easy to find on LWN. The allocator usually manages enough memory so that it never runs into waiting for anything; only in extreme cases like GPU resets do we actually wait for allocations to be freed. So the only cache line which is accessed from more than one CPU should be the signaled flag of the fence. 
With moving list work into the interrupt handler you have at least 3 cache lines which start to bounce between different CPUs. Regards, Christian. Thanks, Thomas Regards, Christian. Thanks, Thomas On 2/17/23 14:51, Thomas Hellström wrote: On 2/17/23 14:18, Christian König wrote: On 17.02.23 14:10, Thomas Hellström wrote: [SNIP] Any chance you could do a quick performance comparison? If not, anything against merging this without the amd / radeon changes until we can land a simpler allocator? Only if you can stick the allocator inside Xe and not drm, because this seems to be for a different use case than the allocators inside radeon/amdgpu. Hmm. No, it's allocating in a ring-like fashion as well. Let me put together a unit test for benchmarking. I think it would be a failure for the community to end up with three separate suballocators doing the exact same thing for the same problem, really. Well, exactly that's the point. Those allocators aren't the same because they handle different problems. The allocator in radeon is simpler because it only had to deal with a limited number of fence timelines. The one in amdgpu is a bit more complex because of the added complexity for more fence timelines. We could take the one from amdgpu and use it for radeon and others as well, but the allocator proposed here doesn't even remotely match the requirements. But again, what *are* those missing requirements
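To make the "ring-like" allocation pattern under discussion concrete, below is a minimal illustrative sketch (neither the amdgpu nor the proposed drm code; all names are invented). It assumes allocations are retired strictly in submission order, i.e. the fence of the oldest allocation signals first, which is what lets the scheme avoid any search structure and keep per-allocation cost in the few-hundred-nanosecond ballpark quoted above.

struct ring_sa {
	unsigned int size;	/* bytes in the backing buffer */
	unsigned int head;	/* next byte to hand out */
	unsigned int tail;	/* start of the oldest live allocation */
};

/*
 * Carve 'len' bytes out of the ring; returns the offset, or -1 when
 * the caller must wait for the oldest fence and retire its allocation
 * first (the sleep/throttle case measured above).
 */
static int ring_sa_alloc(struct ring_sa *sa, unsigned int len)
{
	if (sa->head >= sa->tail) {
		/* Unwrapped: free space is [head, size) plus [0, tail). */
		if (sa->head + len <= sa->size) {
			sa->head += len;
			return sa->head - len;
		}
		/* Wrap around; [head, size) becomes wasted padding. */
		if (len < sa->tail) {
			sa->head = len;
			return 0;
		}
	} else if (sa->head + len < sa->tail) {
		/* Wrapped: free space is the gap [head, tail). */
		sa->head += len;
		return sa->head - len;
	}
	return -1;
}

/* Called in FIFO order once an allocation's fence has signalled. */
static void ring_sa_retire(struct ring_sa *sa, unsigned int offset,
			   unsigned int len)
{
	sa->tail = offset + len;
	if (sa->tail == sa->head)
		sa->head = sa->tail = 0;	/* ring is empty again */
	else if (sa->tail >= sa->size)
		sa->tail = 0;
}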
Re: [PATCH v12 03/10] drm/display: Add Type-C switch helpers
On Wed, Feb 22, 2023 at 04:53:41PM +0800, Pin-yen Lin wrote: > On Tue, Feb 21, 2023 at 7:48 PM Andy Shevchenko > > On Tue, Feb 21, 2023 at 05:50:47PM +0800, Pin-yen Lin wrote: ... > > > #include > > > #include > > > +#include > > > > I don't see users of this. > > But a few forward declarations are missing. > > I can put a `struct typec_mux_dev;` here, but there is also a usage of > typec_mux_set_fn_t. > > Should I add the typedef into this header file as well? No, inclusion is fine, it's me who didn't notice the type in use. -- With Best Regards, Andy Shevchenko
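For readers skimming the thread, the header-hygiene point being settled here looks like this in practice (illustrative snippet; the struct and field names below are hypothetical):

/* A type used only through pointers can be forward declared: */
struct typec_mux_dev;

struct port_data {
	struct typec_mux_dev *mux;	/* pointer member: no include needed */
};

/*
 * A typedef such as typec_mux_set_fn_t cannot be forward declared, so a
 * header that actually uses it must pull in the real header, which is
 * why keeping the inclusion is the right call here.
 */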
Re: [PATCH v5 09/14] drm/syncobj: Add deadline support for syncobj waits
On Wed, Feb 22, 2023 at 12:09:04PM +0200, Pekka Paalanen wrote: > On Tue, 21 Feb 2023 09:25:18 -0800 > Rob Clark wrote: > > > On Tue, Feb 21, 2023 at 12:53 AM Pekka Paalanen wrote: > > > > > > On Mon, 20 Feb 2023 12:18:56 -0800 > > > Rob Clark wrote: > > > > > > > From: Rob Clark > > > > > > > > Add a new flag to let userspace provide a deadline as a hint for syncobj > > > > and timeline waits. This gives a hint to the driver signaling the > > > > backing fences about how soon userspace needs it to complete work, so it > > > > can adjust GPU frequency accordingly. An immediate deadline can be > > > > given to provide something equivalent to i915 "wait boost". > > > > > > > > v2: Use absolute u64 ns value for deadline hint, drop cap and driver > > > > feature flag in favor of allowing count_handles==0 as a way for > > > > userspace to probe kernel for support of new flag > > > > > > > > Signed-off-by: Rob Clark > > > > --- > > > > drivers/gpu/drm/drm_syncobj.c | 59 +++ > > > > include/uapi/drm/drm.h| 5 +++ > > > > 2 files changed, 51 insertions(+), 13 deletions(-) > > > > > > ... > > > > > > > diff --git a/include/uapi/drm/drm.h b/include/uapi/drm/drm.h > > > > index 642808520d92..aefc8cc743e0 100644 > > > > --- a/include/uapi/drm/drm.h > > > > +++ b/include/uapi/drm/drm.h > > > > @@ -887,6 +887,7 @@ struct drm_syncobj_transfer { > > > > #define DRM_SYNCOBJ_WAIT_FLAGS_WAIT_ALL (1 << 0) > > > > #define DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT (1 << 1) > > > > #define DRM_SYNCOBJ_WAIT_FLAGS_WAIT_AVAILABLE (1 << 2) /* wait for > > > > time point to become available */ > > > > +#define DRM_SYNCOBJ_WAIT_FLAGS_WAIT_DEADLINE (1 << 3) /* set fence > > > > deadline based to deadline_nsec/sec */ > > > > > > Hi, > > > > > > where is the UAPI documentation explaining what is a "fence deadline" > > > and what setting it does here? > > > > It's with the rest of the drm_syncobj UAPI docs ;-) > > Is that > https://www.kernel.org/doc/html/latest/driver-api/dma-buf.html#dma-fence-uabi-sync-file > ? > > That whole page never mentions e.g. WAIT_AVAILABLE, so at least the > flags are not there. Does not mention syncobj_wait either. probably this: https://docs.kernel.org/gpu/drm-mm.html the new one needs to be added there as well. > > I could ask where the real non-IGT userspace is or the plan for it, > too, since this is new DRM UAPI. yeap, it looks like we need to close on this... https://gitlab.freedesktop.org/drm/intel/-/issues/8014 I confess I got lost on the many discussions and on how this will be used. Is mesa going to set the deadline based on the vk priority? Will this continue to be internal? I didn't get the broader picture I'm afraid... > > > Thanks, > pq > > > > > BR, > > -R > > > > > btw. no nsec/sec anymore. > > > > > > > > > Thanks, > > > pq > > > > > > > > > > struct drm_syncobj_wait { > > > > __u64 handles; > > > > /* absolute timeout */ > > > > @@ -895,6 +896,8 @@ struct drm_syncobj_wait { > > > > __u32 flags; > > > > __u32 first_signaled; /* only valid when not waiting all */ > > > > __u32 pad; > > > > + /* Deadline hint to set on backing fence(s) in CLOCK_MONOTONIC: */ > > > > + __u64 deadline_ns; > > > > }; > > > > > > > > struct drm_syncobj_timeline_wait { > > > > @@ -907,6 +910,8 @@ struct drm_syncobj_timeline_wait { > > > > __u32 flags; > > > > __u32 first_signaled; /* only valid when not waiting all */ > > > > __u32 pad; > > > > + /* Deadline hint to set on backing fence(s) in CLOCK_MONOTONIC: */ > > > > + __u64 deadline_ns; > > > > }; > > > > > > > >
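For the broader picture being asked about, a hypothetical userspace sketch of the proposed flag follows. The UAPI is still under review, so DRM_SYNCOBJ_WAIT_FLAGS_WAIT_DEADLINE and the deadline_ns field come from the patch above and may still change; drmIoctl() is the usual libdrm wrapper, and per the v2 note, calling in first with count_handles == 0 would probe whether the kernel supports the new flag at all.

#include <stdint.h>
#include <xf86drm.h>	/* drmIoctl(), DRM_IOCTL_SYNCOBJ_WAIT, struct drm_syncobj_wait */

/*
 * Wait on one syncobj while hinting that the backing fence should be
 * signalled by 'deadline_ns' (absolute CLOCK_MONOTONIC). Both the flag
 * and the deadline_ns field are from the patch under review, not from
 * released kernel headers.
 */
static int syncobj_wait_with_deadline(int fd, uint32_t handle,
				      int64_t abs_timeout_ns,
				      uint64_t deadline_ns)
{
	struct drm_syncobj_wait wait = {
		.handles = (uint64_t)(uintptr_t)&handle,
		.timeout_nsec = abs_timeout_ns,	/* absolute timeout */
		.count_handles = 1,
		.flags = DRM_SYNCOBJ_WAIT_FLAGS_WAIT_DEADLINE,
		.deadline_ns = deadline_ns,
	};

	return drmIoctl(fd, DRM_IOCTL_SYNCOBJ_WAIT, &wait);
}

Passing a deadline equal to the current CLOCK_MONOTONIC time would then give the "immediate deadline" wait-boost behaviour the commit message describes.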