Re: [PATCH 1/2] Mips: ls2k1000: dts: add the display controller device node
On 23/02/2023 04:19, Sui jingfeng wrote:
> Hi,
>
> On 2023/2/23 02:32, Krzysztof Kozlowski wrote:
>> On 22/02/2023 17:55, suijingfeng wrote:
>>> The display controller is a pci device, its pci vendor id is
>>> 0x0014, its pci device id is 0x7a06.
>>>
>>> Signed-off-by: suijingfeng
>>> ---
>>>  .../boot/dts/loongson/loongson64-2k1000.dtsi | 21 +++
>>>  1 file changed, 21 insertions(+)
>>>
>>> diff --git a/arch/mips/boot/dts/loongson/loongson64-2k1000.dtsi
>>> b/arch/mips/boot/dts/loongson/loongson64-2k1000.dtsi
>>> index 8143a6e3..a528af3977d9 100644
>>> --- a/arch/mips/boot/dts/loongson/loongson64-2k1000.dtsi
>>> +++ b/arch/mips/boot/dts/loongson/loongson64-2k1000.dtsi
>>> @@ -31,6 +31,18 @@ memory@20 {
>>>			<0x0001 0x1000 0x0001 0xb000>; /* 6912 MB at 4352MB */
>>>	};
>>>
>>> +	reserved-memory {
>>> +		#address-cells = <2>;
>>> +		#size-cells = <2>;
>>> +		ranges;
>>> +
>>> +		display_reserved: framebuffer@3000 {
>>> +			compatible = "shared-dma-pool";
>>> +			reg = <0x0 0x3000 0x0 0x0400>; /* 64M */
>>> +			linux,cma-default;
>>> +		};
>>> +	};
>>> +
>>>	cpu_clk: cpu_clk {
>>>		#clock-cells = <0>;
>>>		compatible = "fixed-clock";
>>> @@ -198,6 +210,15 @@ sata@8,0 {
>>>		interrupt-parent = <&liointc0>;
>>>	};
>>>
>>> +	display-controller@6,0 {
>>> +		compatible = "loongson,ls2k1000-dc";
>>> +
>>> +		reg = <0x3000 0x0 0x0 0x0 0x0>;
>>> +		interrupts = <28 IRQ_TYPE_LEVEL_LOW>;
>>> +		interrupt-parent = <&liointc0>;
>>> +		memory-region = <&display_reserved>;
>>
>> NAK.
>
> Err :(, please give me a chance to explain
>
>> Test your code against the bindings you send.
>
> I can guarantee to you that I test my code more than twice. The code
> used for testing is listed at link [1].

I wrote - test against the bindings. I don't believe that it was tested.
Please paste the output of the testing (dtbs_check).

> This patchset is mainly used to illustrate how we made the driver in [1]
> usable on our SoC platform.
>
>> It's the same patchset. You basically send something which the same
>> moment is incorrect.
>
> The Loongson display controller IP has been integrated in both Loongson
> North Bridge chipsets (ls7a1000 and ls7a2000) and Loongson SoCs (ls2k1000
> and ls2k2000 etc); it has even been included in Loongson BMC (ls2k0500 bmc)
> products.

I don't understand how your reply here is relevant to incorrect bindings
or incorrect DTS according to bindings.

Best regards,
Krzysztof
Re: [PATCH drm-next v2 05/16] drm: manager to keep track of GPUs VA mappings
On 22.02.23 at 17:40, Danilo Krummrich wrote:
On 2/22/23 16:14, Christian König wrote:
On 22.02.23 at 16:07, Danilo Krummrich wrote:
On 2/22/23 11:25, Christian König wrote:
On 17.02.23 at 14:44, Danilo Krummrich wrote:

+/**
+ * DOC: Overview
+ *
+ * The DRM GPU VA Manager, represented by struct drm_gpuva_manager keeps track
+ * of a GPU's virtual address (VA) space and manages the corresponding virtual
+ * mappings represented by &drm_gpuva objects. It also keeps track of the
+ * mapping's backing &drm_gem_object buffers.
+ *
+ * &drm_gem_object buffers maintain a list (and a corresponding list lock) of
+ * &drm_gpuva objects representing all existent GPU VA mappings using this
+ * &drm_gem_object as backing buffer.
+ *
+ * If the &DRM_GPUVA_MANAGER_REGIONS feature is enabled, a GPU VA mapping can
+ * only be created within a previously allocated &drm_gpuva_region, which
+ * represents a reserved portion of the GPU VA space. GPU VA mappings are not
+ * allowed to span over a &drm_gpuva_region's boundary.
+ *
+ * GPU VA regions can also be flagged as sparse, which allows drivers to create
+ * sparse mappings for a whole GPU VA region in order to support Vulkan
+ * 'Sparse Resources'.

Well since we have now found that there is absolutely no technical reason
for having those regions could we please drop them?

I disagree this was the outcome of our previous discussion.

In nouveau I still need them to track the separate sparse page tables and,
as you confirmed previously, Nvidia cards are not the only cards supporting
this feature.

The second reason is that with regions we can avoid merging between
buffers, which saves some effort. However, I agree that this argument by
itself probably doesn't hold too much, since you've pointed out in a
previous mail that:

1) If we merge and decide to only do that inside certain boundaries then
those boundaries needs to be provided and checked against. This burns quite
some CPU cycles

2) If we just merge what we can we might have extra page table updates
which cost time and could result in undesired side effects.

3) If we don't merge at all we have additional housekeeping for the
mappings and maybe hw restrictions.

However, if a driver uses regions to track its separate sparse page tables
anyway it gets 1) for free, which is a nice synergy.

I totally agree that regions aren't for everyone though. Hence, I made them
an optional feature and by default regions are disabled. In order to use
them drm_gpuva_manager_init() must be called with the
DRM_GPUVA_MANAGER_REGIONS feature flag.

I really would not want to open code regions or have two GPUVA manager
instances in nouveau to track sparse page tables. That would be really
messy, hence I hope we can agree on this to be an optional feature.

I absolutely don't think that this is a good idea then. This separate
handling of sparse page tables is completely Nouveau specific.

Actually, I rely on what you said in a previous mail when I say it's,
potentially, not specific to nouveau.

This sounds similar to what AMD hw used to have up until gfx8 (I think),
basically sparse resources where defined through a separate mechanism to
the address resolution of the page tables. I won't rule out that other
hardware has similar approaches.

Ok, sounds like I didn't make my point here clear: AMD does have that same
mechanism for older hw you try to implement here for Nouveau, but we have
*abandoned* it because it is too much trouble and especially overhead to
support!

In other words we have said "Ok we would need two separate components to
cleanly handle that, one for newer hw and one for older hw.". What you now
try to do is to write one component which works for both. We have already
exercised this idea and came to the conclusion that it's not a good path
to go down. So you're basically just repeating our mistake.

I mean if it's just for Nouveau then I would say feel free to do whatever
you want, but since this component is supposed to be used by more drivers
then I strongly think we need to tackle this from a different side.

Even when it's an optional feature, mixing this into the common handling
is exactly what I pointed out as not properly separating between hardware
specific and hardware agnostic functionality.

Optionally having regions is *not* a hardware specific concept, drivers
might use it for a hardware specific purpose though. Which potentially is
the case for almost every DRM helper.

Drivers can use regions only for the sake of not merging between buffer
boundaries as well. Some drivers might prefer this over "never merge" or
"always merge", depending on the cost of re-organizing page tables for
unnecessary splits/merges, without having the need of tracking separate
sparse page tables. It's just that I think *if* a driver needs to track
separate sparse page tables anyways it's a nice synergy since then there
is no extra cost for g
Re: [Intel-gfx] [PATCH 2/2] Apply quirk to disable PSR 2 on Tongfang PHxTxX1 and PHxTQx1
On Wed, 2023-02-22 at 15:13 -0500, Rodrigo Vivi wrote:
> On Wed, Feb 22, 2023 at 03:17:55PM +0100, Werner Sembach wrote:
> > On these Barebones PSR 2 is recognized as supported but is very
> > buggy:
> > - Upper third of screen does sometimes not update, resulting in
> >   disappearing cursors or ghosts of already closed windows staying
> >   behind.
> > - Approximately 40 px from the bottom edge a 3 pixel wide strip of
> >   randomly colored pixels is flickering.
> >
> > PSR 1 is working fine however.
>
> I wonder if this is really about the panel's PSR2 or about the
> userspace there not marking the dirtyfb? I know I know... it is not
> userspace fault...
>
> But I wonder if the case you got here highlights the fact that we
> have a substantial bug in the i915 itself in regards to PSR2 API.
>
> Jose, Jouni, ideas on how to check what could be happening here?

There is already a fix for this (Thanks to Werner Sembach for testing
the patch): https://patchwork.freedesktop.org/series/114217/

> oh, btw, Werner, do we have an open gitlab issue for this?

https://gitlab.freedesktop.org/drm/intel/-/issues/7347

> Thanks,
> Rodrigo.
>
> > Signed-off-by: Werner Sembach
> > Cc:
> > ---
> >  drivers/gpu/drm/i915/display/intel_quirks.c | 8
> >  1 file changed, 8 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/i915/display/intel_quirks.c
> > b/drivers/gpu/drm/i915/display/intel_quirks.c
> > index ce6d0fe6448f5..eeb32d3189f5c 100644
> > --- a/drivers/gpu/drm/i915/display/intel_quirks.c
> > +++ b/drivers/gpu/drm/i915/display/intel_quirks.c
> > @@ -65,6 +65,10 @@ static void quirk_no_pps_backlight_power_hook(struct drm_i915_private *i915)
> >  	drm_info(&i915->drm, "Applying no pps backlight power quirk\n");
> >  }
> >
> > +/*
> > + * Tongfang PHxTxX1 and PHxTQx1 devices have support for PSR 2 but it is broken
> > + * on Linux. PSR 1 however works just fine.
> > + */
> >  static void quirk_no_psr2(struct drm_i915_private *i915)
> >  {
> >  	intel_set_quirk(i915, QUIRK_NO_PSR2);
> > @@ -205,6 +209,10 @@ static struct intel_quirk intel_quirks[] = {
> >  	/* ECS Liva Q2 */
> >  	{ 0x3185, 0x1019, 0xa94d, quirk_increase_ddi_disabled_time },
> >  	{ 0x3184, 0x1019, 0xa94d, quirk_increase_ddi_disabled_time },
> > +
> > +	/* Tongfang PHxTxX1 and PHxTQx1/TUXEDO InfinityBook 14 Gen6 */
> > +	{ 0x9a49, 0x1d05, 0x1105, quirk_no_psr2 },
> > +	{ 0x9a49, 0x1d05, 0x114c, quirk_no_psr2 },
> >  };
> >
> >  void intel_init_quirks(struct drm_i915_private *i915)
> > --
> > 2.34.1
Re: [PATCH][next] i915/gvt: Fix spelling mistake "vender" -> "vendor"
On 2023.02.02 12:50:18 +, Colin Ian King wrote:
> There is a spelling mistake in a literal string. Fix it.
>
> Signed-off-by: Colin Ian King
> ---
>  drivers/gpu/drm/i915/gvt/firmware.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/i915/gvt/firmware.c
> b/drivers/gpu/drm/i915/gvt/firmware.c
> index dce93738e98a..4dd52ac2043e 100644
> --- a/drivers/gpu/drm/i915/gvt/firmware.c
> +++ b/drivers/gpu/drm/i915/gvt/firmware.c
> @@ -171,7 +171,7 @@ static int verify_firmware(struct intel_gvt *gvt,
>  	mem = (fw->data + h->cfg_space_offset);
>
>  	id = *(u16 *)(mem + PCI_VENDOR_ID);
> -	VERIFY("vender id", id, pdev->vendor);
> +	VERIFY("vendor id", id, pdev->vendor);
>
>  	id = *(u16 *)(mem + PCI_DEVICE_ID);
>  	VERIFY("device id", id, pdev->device);
> --

Thanks, Colin.

Acked-by: Zhenyu Wang
Re: [Intel-gfx] [PATCH] i915: fix memory leak with using debugfs_lookup()
On 2023.02.06 15:23:55 -0500, Rodrigo Vivi wrote:
> On Thu, Feb 02, 2023 at 03:13:09PM +0100, Greg Kroah-Hartman wrote:
> > When calling debugfs_lookup() the result must have dput() called on it,
> > otherwise the memory will leak over time. To make things simpler, just
> > call debugfs_lookup_and_remove() instead which handles all of the logic
> > at once.
> >
> > Cc: Zhenyu Wang
> > Cc: Zhi Wang
> > Cc: Jani Nikula
> > Cc: Joonas Lahtinen
> > Cc: Rodrigo Vivi
> > Cc: Tvrtko Ursulin
> > Cc: David Airlie
> > Cc: Daniel Vetter
> > Cc: intel-gvt-...@lists.freedesktop.org
> > Cc: intel-...@lists.freedesktop.org
> > Cc: dri-devel@lists.freedesktop.org
> > Signed-off-by: Greg Kroah-Hartman
>
> Reviewed-by: Rodrigo Vivi
>
> Zhenyu or Zhi, could you please propagate this through your gvt branch?

Sorry I missed this one after I migrated my mail stuff onto a new machine..

Looks good to me.

Reviewed-by: Zhenyu Wang

> >  drivers/gpu/drm/i915/gvt/kvmgt.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c
> > b/drivers/gpu/drm/i915/gvt/kvmgt.c
> > index 8ae7039b3683..de675d799c7d 100644
> > --- a/drivers/gpu/drm/i915/gvt/kvmgt.c
> > +++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
> > @@ -699,7 +699,7 @@ static void intel_vgpu_close_device(struct vfio_device *vfio_dev)
> >
> >  	clear_bit(INTEL_VGPU_STATUS_ATTACHED, vgpu->status);
> >
> > -	debugfs_remove(debugfs_lookup(KVMGT_DEBUGFS_FILENAME, vgpu->debugfs));
> > +	debugfs_lookup_and_remove(KVMGT_DEBUGFS_FILENAME, vgpu->debugfs);
> >
> >  	kvm_page_track_unregister_notifier(vgpu->vfio_device.kvm,
> >  					   &vgpu->track_node);
> > --
> > 2.39.1
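For context, the leak the one-liner fixes: debugfs_lookup() returns a
dentry with an extra reference that the caller must drop with dput().
An equivalent open-coded fix would look roughly like this (illustration
only; the helper above is the preferred form):

	struct dentry *dentry;

	dentry = debugfs_lookup(KVMGT_DEBUGFS_FILENAME, vgpu->debugfs);
	if (dentry) {
		debugfs_remove(dentry);
		dput(dentry);	/* drop the reference debugfs_lookup() took */
	}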
Re: [PATCH] Documentation: gpu: add acceleration node section
On Thu, Feb 23, 2023 at 02:52:52AM +, Dylan Le wrote:
>
> This patch was initially written for the Linux Kernel Bug Fixing Mentorship
> program. The patch adds a temporarily stubbed section on Acceleration Nodes
> to resolve a documentation warning.
>
> This resolves the warning:
> ./Documentation/gpu/drm-internals:179: ./include/drm/drm_file.h:411: WARNING:
> undefined label: drm_accel_node

Please write the patch description in imperative mood ("Do foo" instead
of "This patch does foo").

> I would appreciate any feedback on what should be documented here.

I think the above is better placed between the three dashes and diffstat ...

> ---

like here.

>  Documentation/gpu/drm-uapi.rst | 9 +
>  1 file changed, 9 insertions(+)
>
> +.. _drm_accel_node:
> +
> +Acceleration nodes
> +==
> +
> +.. note::
> +   There is not any documentation yet need to figure out what this is.

I'd like to write this stub as a generic .. admonition:: block [1] instead,
with the content which is "This section is empty, add appropriate
documentation here." or similar.

[1]: https://docutils.sourceforge.io/docs/ref/rst/directives.html#generic-admonition

Thanks.

--
An old man doll... just what I always wanted! - Clara
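For reference, the generic admonition Bagas suggests would look roughly
like this in the .rst file (the title and exact wording are illustrative,
not taken from the thread):

	.. admonition:: TODO

	   This section is empty, add appropriate documentation here.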
Re: [PATCH 1/2] Mips: ls2k1000: dts: add the display controller device node
Hi,

On 2023/2/23 02:32, Krzysztof Kozlowski wrote:
On 22/02/2023 17:55, suijingfeng wrote:

The display controller is a pci device, its pci vendor id is 0x0014,
its pci device id is 0x7a06.

Signed-off-by: suijingfeng
---
 .../boot/dts/loongson/loongson64-2k1000.dtsi | 21 +++
 1 file changed, 21 insertions(+)

diff --git a/arch/mips/boot/dts/loongson/loongson64-2k1000.dtsi
b/arch/mips/boot/dts/loongson/loongson64-2k1000.dtsi
index 8143a6e3..a528af3977d9 100644
--- a/arch/mips/boot/dts/loongson/loongson64-2k1000.dtsi
+++ b/arch/mips/boot/dts/loongson/loongson64-2k1000.dtsi
@@ -31,6 +31,18 @@ memory@20 {
		<0x0001 0x1000 0x0001 0xb000>; /* 6912 MB at 4352MB */
	};

+	reserved-memory {
+		#address-cells = <2>;
+		#size-cells = <2>;
+		ranges;
+
+		display_reserved: framebuffer@3000 {
+			compatible = "shared-dma-pool";
+			reg = <0x0 0x3000 0x0 0x0400>; /* 64M */
+			linux,cma-default;
+		};
+	};
+
	cpu_clk: cpu_clk {
		#clock-cells = <0>;
		compatible = "fixed-clock";
@@ -198,6 +210,15 @@ sata@8,0 {
		interrupt-parent = <&liointc0>;
	};

+	display-controller@6,0 {
+		compatible = "loongson,ls2k1000-dc";
+
+		reg = <0x3000 0x0 0x0 0x0 0x0>;
+		interrupts = <28 IRQ_TYPE_LEVEL_LOW>;
+		interrupt-parent = <&liointc0>;
+		memory-region = <&display_reserved>;

NAK.

Err :(, please give me a chance to explain

Test your code against the bindings you send.

I can guarantee to you that I test my code more than twice. The code
used for testing is listed at link [1].

This patchset is mainly used to illustrate how we made the driver in [1]
usable on our SoC platform.

It's the same patchset. You basically send something which the same
moment is incorrect.

The Loongson display controller IP has been integrated in both Loongson
North Bridge chipsets (ls7a1000 and ls7a2000) and Loongson SoCs (ls2k1000
and ls2k2000 etc); it has even been included in Loongson BMC (ls2k0500 bmc)
products.

When using this driver on Loongson embedded platforms (say ls2k2000,
ls2k1000 and ls2k0500), the PMON/Uboot firmware (my company uses pmon most
of the time) will pass a DT to the kernel. Different boards will pass
different DTs.

But when using this driver on Loongson server and PC platforms
(ls3c5000/ls3a5000+ls7a1000/ls7a2000), there will be no DT supplied. The
firmware and kernel side developers of Loongson chose ACPI+UEFI for such
platforms; more discussion can be found at [2].

Therefore, in such a situation we decided to send the patch as a separate
patchset. It is not like arm and risc-v, as the binding would not always
exist. If we put those patches into the same patchset, some reviewers
would suggest that we revise our code to a form where the code is *ALWAYS*
probed from the DT, which is not desired. Besides, the driver code + dt
support is pretty large; separating it makes it easier to review and
manage.

Finally, thanks for your kind guidance and valuable reviews.

[1] https://patchwork.freedesktop.org/patch/523409/?series=113566&rev=4
[2] https://lkml.org/lkml/2022/7/15/135

Best regards,
Krzysztof
Re: [PATCH v2 06/10] dt-bindings: gpu: mali-bifrost: Add a compatible for MediaTek MT8186
On Wed, Feb 22, 2023 at 5:13 PM AngeloGioacchino Del Regno wrote:
>
> On 22/02/23 09:37, Chen-Yu Tsai wrote:
> > On Tue, Feb 21, 2023 at 11:37 PM AngeloGioacchino Del Regno wrote:
> >>
> >> Get GPU support on MT8186 by adding its compatible.
> >>
> >> Signed-off-by: AngeloGioacchino Del Regno
> >>
> >> ---
> >>  Documentation/devicetree/bindings/gpu/arm,mali-bifrost.yaml | 5 +
> >>  1 file changed, 5 insertions(+)
> >>
> >> diff --git a/Documentation/devicetree/bindings/gpu/arm,mali-bifrost.yaml
> >> b/Documentation/devicetree/bindings/gpu/arm,mali-bifrost.yaml
> >> index be18b161959b..43a841d4e94d 100644
> >> --- a/Documentation/devicetree/bindings/gpu/arm,mali-bifrost.yaml
> >> +++ b/Documentation/devicetree/bindings/gpu/arm,mali-bifrost.yaml
> >> @@ -15,6 +15,11 @@ properties:
> >>
> >>    compatible:
> >>      oneOf:
> >> +      - items:
> >> +          - enum:
> >> +              - mediatek,mt8186-mali
> >> +          - const: mediatek,mt8183b-mali
> >> +          - const: arm,mali-bifrost # Mali Bifrost GPU model/revision is fully discoverable
> >
> > The MT8186 has Mali-G52 MC2 2EE, while the MT8183 has Mali-G72 MP3.
>
> Keeping in mind the obvious - which is that G52 and G72 are both Bifrost
>
> > So we actually need a new entry with two power domains.
>
> ...This is my node for MT8186:
>
>         gpu: gpu@1304 {
>                 compatible = "mediatek,mt8186-mali",
>                              "mediatek,mt8183b-mali",
>                              "arm,mali-bifrost";
>                 reg = <0 0x1304 0 0x4000>;
>
>                 clocks = <&mfgsys CLK_MFG_BG3D>;
>                 interrupts = , , ;
>                 interrupt-names = "job", "mmu", "gpu";
>                 power-domains = <&spm MT8186_POWER_DOMAIN_MFG1>,
>                                 <&spm MT8186_POWER_DOMAIN_MFG2>,
>                                 <&spm MT8186_POWER_DOMAIN_MFG3>;
>                 power-domain-names = "core0", "core1", "core2";
>
>                 /* Please ignore speedbin, that's for another time :-) */
>                 nvmem-cells = <&gpu_volt_bin>;
>                 nvmem-cell-names = "speed-bin";
>                 #cooling-cells = <2>;
>         };
>
> There are three MFG power domains... MFG2 and MFG3 are parents of MFG1, on
> that I agree, but we can avoid adding a new entry just for MT8186 and use
> the MT8183-b one while still being technically correct.
>
> Besides, Mali G52 and Mali G72 are both Bifrost... so I don't think that
> this commit is incorrect. For the sake of simplicity, I would push on
> getting this one picked.

I'm aware. In case it wasn't obvious, Mali-G52 MC2 2EE has 2 cores, while
Mali-G72 MP3 has 3 cores. I think that is reason enough to do a new entry.
Otherwise you are describing power domains for 3 cores for a GPU that only
has two.

> Unless there are any real-strong opinions against...

Yes.

ChenYu
Re: [PATCH v2 6/6] drm/msm/adreno: Enable optional icc voting from OPP tables
On Thu, 23 Feb 2023 at 03:47, Konrad Dybcio wrote:
>
> Add the dev_pm_opp_of_find_icc_paths() call to let the OPP framework
> handle bus voting as part of power level setting.
>
> Signed-off-by: Konrad Dybcio

Reviewed-by: Dmitry Baryshkov

--
With best wishes
Dmitry
Re: [PATCH v2 3/6] drm/msm/a2xx: Implement .gpu_busy
On Thu, 23 Feb 2023 at 03:47, Konrad Dybcio wrote:
>
> Implement gpu_busy based on the downstream msm-3.4 code [1]. This
> allows us to use devfreq on this old old old hardware!
>
> [1] https://github.com/LineageOS/android_kernel_sony_apq8064/blob/lineage-16.0/drivers/gpu/msm/adreno_a2xx.c#L1975
>
> Signed-off-by: Konrad Dybcio

Reviewed-by: Dmitry Baryshkov

Small nit below

> ---
>  drivers/gpu/drm/msm/adreno/a2xx_gpu.c | 24
>  1 file changed, 24 insertions(+)
>
> diff --git a/drivers/gpu/drm/msm/adreno/a2xx_gpu.c
> b/drivers/gpu/drm/msm/adreno/a2xx_gpu.c
> index c67089a7ebc1..6f9876b37db5 100644
> --- a/drivers/gpu/drm/msm/adreno/a2xx_gpu.c
> +++ b/drivers/gpu/drm/msm/adreno/a2xx_gpu.c
> @@ -481,6 +481,29 @@ a2xx_create_address_space(struct msm_gpu *gpu, struct platform_device *pdev)
>         return aspace;
>  }
>
> +/* While the precise size of this field is unknown, it holds at least these three values.. */
> +static u64 a2xx_gpu_busy(struct msm_gpu *gpu, unsigned long *out_sample_rate)
> +{
> +       u64 busy_cycles;
> +
> +       /* Freeze the counter */
> +       gpu_write(gpu, REG_A2XX_CP_PERFMON_CNTL, PERF_STATE_FREEZE);
> +
> +       busy_cycles = gpu_read64(gpu, REG_A2XX_RBBM_PERFCOUNTER1_LO);
> +
> +       /* Reset the counter */
> +       gpu_write(gpu, REG_A2XX_CP_PERFMON_CNTL, PERF_STATE_RESET);
> +
> +       /* Re-enable the performance monitors */
> +       gpu_rmw(gpu, REG_A2XX_RBBM_PM_OVERRIDE2, BIT(6), BIT(6));

It's DEBUG_PERF_SCLK_PM_OVERRIDE

See https://github.com/genesi/linux-legacy/blob/master/drivers/mxc/amd-gpu/include/reg/yamato/10/yamato_mask.h#L4428

> +       gpu_write(gpu, REG_A2XX_RBBM_PERFCOUNTER1_SELECT, 1);
> +       gpu_write(gpu, REG_A2XX_CP_PERFMON_CNTL, PERF_STATE_ENABLE);
> +
> +       *out_sample_rate = clk_get_rate(gpu->core_clk);
> +
> +       return busy_cycles;
> +}
> +
>  static u32 a2xx_get_rptr(struct msm_gpu *gpu, struct msm_ringbuffer *ring)
>  {
>         ring->memptrs->rptr = gpu_read(gpu, REG_AXXX_CP_RB_RPTR);
> @@ -502,6 +525,7 @@ static const struct adreno_gpu_funcs funcs = {
>  #if defined(CONFIG_DEBUG_FS) || defined(CONFIG_DEV_COREDUMP)
>         .show = adreno_show,
>  #endif
> +       .gpu_busy = a2xx_gpu_busy,
>         .gpu_state_get = a2xx_gpu_state_get,
>         .gpu_state_put = adreno_gpu_state_put,
>         .create_address_space = a2xx_create_address_space,
>
> --
> 2.39.2

--
With best wishes
Dmitry
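A minimal sketch of the rename Dmitry suggests, assuming the bit position
from the linked yamato_mask.h header; the macro name is illustrative and
not currently defined in the kernel tree:

	#define A2XX_RBBM_PM_OVERRIDE2_DEBUG_PERF_SCLK_PM_OVERRIDE	BIT(6)

	/* Re-enable the performance monitors */
	gpu_rmw(gpu, REG_A2XX_RBBM_PM_OVERRIDE2,
		A2XX_RBBM_PM_OVERRIDE2_DEBUG_PERF_SCLK_PM_OVERRIDE,
		A2XX_RBBM_PM_OVERRIDE2_DEBUG_PERF_SCLK_PM_OVERRIDE);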
Re: [PATCH v2 2/6] drm/msm/adreno: Use OPP for every GPU generation
On Thu, 23 Feb 2023 at 03:47, Konrad Dybcio wrote:
>
> Some older GPUs (namely a2xx with no opp tables at all and a320 with
> downstream-remnants gpu pwrlevels) used not to have OPP tables. They
> both however had just one frequency defined, making it extremely easy
> to construct such an OPP table from within the driver if need be.
>
> Do so and switch all clk_set_rate calls on core_clk to their OPP
> counterparts.
>
> Signed-off-by: Konrad Dybcio

Reviewed-by: Dmitry Baryshkov

Minor nit below.

> ---
>  drivers/gpu/drm/msm/adreno/adreno_gpu.c | 99 +++--
>  drivers/gpu/drm/msm/msm_gpu.c           |  4 +-
>  drivers/gpu/drm/msm/msm_gpu_devfreq.c   |  2 +-
>  3 files changed, 48 insertions(+), 57 deletions(-)
>
> diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c
> b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
> index ce6b76c45b6f..8721e3d6231a 100644
> --- a/drivers/gpu/drm/msm/adreno/adreno_gpu.c
> +++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
> @@ -922,73 +922,48 @@ void adreno_wait_ring(struct msm_ringbuffer *ring, uint32_t ndwords)
>                 ring->id);
>  }
>
> -/* Get legacy powerlevels from qcom,gpu-pwrlevels and populate the opp table */
> -static int adreno_get_legacy_pwrlevels(struct device *dev)
> -{
> -       struct device_node *child, *node;
> -       int ret;
> -
> -       node = of_get_compatible_child(dev->of_node, "qcom,gpu-pwrlevels");
> -       if (!node) {
> -               DRM_DEV_DEBUG(dev, "Could not find the GPU powerlevels\n");
> -               return -ENXIO;
> -       }
> -
> -       for_each_child_of_node(node, child) {
> -               unsigned int val;
> -
> -               ret = of_property_read_u32(child, "qcom,gpu-freq", &val);
> -               if (ret)
> -                       continue;
> -
> -               /*
> -                * Skip the intentionally bogus clock value found at the bottom
> -                * of most legacy frequency tables
> -                */
> -               if (val != 2700)
> -                       dev_pm_opp_add(dev, val, 0);
> -       }
> -
> -       of_node_put(node);
> -
> -       return 0;
> -}
> -
> -static void adreno_get_pwrlevels(struct device *dev,
> +static int adreno_get_pwrlevels(struct device *dev,
>                 struct msm_gpu *gpu)
>  {
> +       struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
>         unsigned long freq = ULONG_MAX;
>         struct dev_pm_opp *opp;
>         int ret;
>
>         gpu->fast_rate = 0;
>
> -       /* You down with OPP? */
> -       if (!of_find_property(dev->of_node, "operating-points-v2", NULL))
> -               ret = adreno_get_legacy_pwrlevels(dev);
> -       else {
> -               ret = devm_pm_opp_of_add_table(dev);
> -               if (ret)
> -                       DRM_DEV_ERROR(dev, "Unable to set the OPP table\n");
> -       }
> -
> -       if (!ret) {
> -               /* Find the fastest defined rate */
> -               opp = dev_pm_opp_find_freq_floor(dev, &freq);
> -               if (!IS_ERR(opp)) {
> -                       gpu->fast_rate = freq;
> -                       dev_pm_opp_put(opp);
> +       /* devm_pm_opp_of_add_table may error out but will still create an OPP table */
> +       ret = devm_pm_opp_of_add_table(dev);
> +       if (ret == -ENODEV) {
> +               /* Special cases for ancient hw with ancient DT bindings */
> +               if (adreno_is_a2xx(adreno_gpu)) {
> +                       dev_warn(dev, "Unable to find the OPP table. Falling back to 200 MHz.\n");
> +                       dev_pm_opp_add(dev, 2, 0);
> +               } else if (adreno_is_a320(adreno_gpu)) {
> +                       dev_warn(dev, "Unable to find the OPP table. Falling back to 450 MHz.\n");
> +                       dev_pm_opp_add(dev, 45000, 0);
> +               } else {
> +                       DRM_DEV_ERROR(dev, "Unable to find the OPP table\n");
> +                       return -ENODEV;
>                 }
> +       } else if (ret) {
> +               DRM_DEV_ERROR(dev, "Unable to set the OPP table\n");
> +               return ret;
>         }
>
> -       if (!gpu->fast_rate) {
> -               dev_warn(dev,
> -                       "Could not find a clock rate. Using a reasonable default\n");
> -               /* Pick a suitably safe clock speed for any target */
> -               gpu->fast_rate = 2;
> +       /* Find the fastest defined rate */
> +       opp = dev_pm_opp_find_freq_floor(dev, &freq);
> +
> +       if (IS_ERR(opp))
> +               return PTR_ERR(opp);
> +       else {
> +               gpu->fast_rate = freq;
> +               dev_pm_opp_put(opp);
>         }

Nit: you can drop else {} here.

>
>         DBG("fast_rate=%u, slow_rate=2700", gpu->fast_rate);
> +
> +       return 0;
>  }
>
>  int adreno_gpu_ocmem_init(struct device *dev, struct adreno_gpu *adreno_gpu,
> @@ -1046,6 +1021,20 @@ int adreno_gpu_init(struct drm_device *drm, struct
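For clarity, a minimal sketch of the suggested cleanup, based only on the
quoted hunk: the early return on error makes the else branch redundant.

	/* Find the fastest defined rate */
	opp = dev_pm_opp_find_freq_floor(dev, &freq);
	if (IS_ERR(opp))
		return PTR_ERR(opp);

	gpu->fast_rate = freq;
	dev_pm_opp_put(opp);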
Re: [Intel-gfx] How is the progress for removing flush_scheduled_work() callers?
On 2022/11/16 22:06, Ville Syrjälä wrote:
> On Wed, Nov 16, 2022 at 12:08:27PM +0200, Jani Nikula wrote:
>> On Sun, 06 Nov 2022, Tetsuo Handa wrote:
>>> Like commit c4f135d643823a86 ("workqueue: Wrap flush_workqueue() using a
>>> macro") says, flush_scheduled_work() is dangerous and will be forbidden.
>>> We are on the way for removing all flush_scheduled_work() callers from
>>> the kernel, and there are only 4 callers remaining as of linux-20221104.
>>>
>>> drivers/gpu/drm/i915/display/intel_display.c:8997:
>>> flush_scheduled_work();
>>
>> Thanks for the reminder, I've pinged folks to get someone working on
>> this. We do schedule quite a bunch of work, so it's not immediately
>> obvious (at least to me) what exactly needs flushing.
>
> Here's my earlier cursory analysis of the subject:
> https://lore.kernel.org/intel-gfx/yy3byxfrfafql...@intel.com/

Now that a patch for mptscsih.c was proposed as
https://lkml.kernel.org/r/0b9ebcfb-b647-1381-0653-b54528a64...@i-love.sakura.ne.jp ,
intel_display.c is going to become the last flush_scheduled_work() user.

If fixing the hpd disable path takes more time, should we start with
moving related works from system_wq to a local workqueue dedicated to
intel_display.c ?

>
>> https://gitlab.freedesktop.org/drm/intel/-/issues/7546
>>
>>> drivers/gpu/drm/i915/gt/selftest_execlists.c:88:
>>> flush_scheduled_work();
>>
>> Removed by commit 7d33fd02dd94 ("drm/i915/selftests: Remove
>> flush_scheduled_work() from live_execlists") in drm-next.
>>
>> BR,
>> Jani.
>>
>> --
>> Jani Nikula, Intel Open Source Graphics Center
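A rough sketch of that interim step, assuming a driver-private workqueue
stored in a hypothetical i915->display_wq field (the field name and queue
name are illustrative, not from the thread):

	/* at driver init: a plain private workqueue */
	i915->display_wq = alloc_workqueue("i915-display", 0, 0);

	/* callers then switch from schedule_work(&w) to: */
	queue_work(i915->display_wq, &w);

	/* and the path that used flush_scheduled_work() can safely do: */
	flush_workqueue(i915->display_wq);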
[PATCH v2 6/6] drm/msm/adreno: Enable optional icc voting from OPP tables
Add the dev_pm_opp_of_find_icc_paths() call to let the OPP framework
handle bus voting as part of power level setting.

Signed-off-by: Konrad Dybcio
---
 drivers/gpu/drm/msm/adreno/adreno_device.c | 4
 1 file changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/msm/adreno/adreno_device.c
b/drivers/gpu/drm/msm/adreno/adreno_device.c
index 36f062c7582f..5142a4c72cfc 100644
--- a/drivers/gpu/drm/msm/adreno/adreno_device.c
+++ b/drivers/gpu/drm/msm/adreno/adreno_device.c
@@ -548,6 +548,10 @@ static int adreno_bind(struct device *dev, struct device *master, void *data)
 		return PTR_ERR(gpu);
 	}

+	ret = dev_pm_opp_of_find_icc_paths(dev, NULL);
+	if (ret)
+		return ret;
+
 	return 0;
 }

--
2.39.2
[PATCH v2 3/6] drm/msm/a2xx: Implement .gpu_busy
Implement gpu_busy based on the downstream msm-3.4 code [1]. This
allows us to use devfreq on this old old old hardware!

[1] https://github.com/LineageOS/android_kernel_sony_apq8064/blob/lineage-16.0/drivers/gpu/msm/adreno_a2xx.c#L1975

Signed-off-by: Konrad Dybcio
---
 drivers/gpu/drm/msm/adreno/a2xx_gpu.c | 24
 1 file changed, 24 insertions(+)

diff --git a/drivers/gpu/drm/msm/adreno/a2xx_gpu.c
b/drivers/gpu/drm/msm/adreno/a2xx_gpu.c
index c67089a7ebc1..6f9876b37db5 100644
--- a/drivers/gpu/drm/msm/adreno/a2xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a2xx_gpu.c
@@ -481,6 +481,29 @@ a2xx_create_address_space(struct msm_gpu *gpu, struct platform_device *pdev)
 	return aspace;
 }

+/* While the precise size of this field is unknown, it holds at least these three values.. */
+static u64 a2xx_gpu_busy(struct msm_gpu *gpu, unsigned long *out_sample_rate)
+{
+	u64 busy_cycles;
+
+	/* Freeze the counter */
+	gpu_write(gpu, REG_A2XX_CP_PERFMON_CNTL, PERF_STATE_FREEZE);
+
+	busy_cycles = gpu_read64(gpu, REG_A2XX_RBBM_PERFCOUNTER1_LO);
+
+	/* Reset the counter */
+	gpu_write(gpu, REG_A2XX_CP_PERFMON_CNTL, PERF_STATE_RESET);
+
+	/* Re-enable the performance monitors */
+	gpu_rmw(gpu, REG_A2XX_RBBM_PM_OVERRIDE2, BIT(6), BIT(6));
+	gpu_write(gpu, REG_A2XX_RBBM_PERFCOUNTER1_SELECT, 1);
+	gpu_write(gpu, REG_A2XX_CP_PERFMON_CNTL, PERF_STATE_ENABLE);
+
+	*out_sample_rate = clk_get_rate(gpu->core_clk);
+
+	return busy_cycles;
+}
+
 static u32 a2xx_get_rptr(struct msm_gpu *gpu, struct msm_ringbuffer *ring)
 {
 	ring->memptrs->rptr = gpu_read(gpu, REG_AXXX_CP_RB_RPTR);
@@ -502,6 +525,7 @@ static const struct adreno_gpu_funcs funcs = {
 #if defined(CONFIG_DEBUG_FS) || defined(CONFIG_DEV_COREDUMP)
 	.show = adreno_show,
 #endif
+	.gpu_busy = a2xx_gpu_busy,
 	.gpu_state_get = a2xx_gpu_state_get,
 	.gpu_state_put = adreno_gpu_state_put,
 	.create_address_space = a2xx_create_address_space,

--
2.39.2
[PATCH v2 5/6] drm/msm/a4xx: Implement .gpu_busy
Add support for gpu_busy on a4xx, which is required for devfreq support.

Reviewed-by: Dmitry Baryshkov
Signed-off-by: Konrad Dybcio
---
 drivers/gpu/drm/msm/adreno/a4xx_gpu.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/drivers/gpu/drm/msm/adreno/a4xx_gpu.c
b/drivers/gpu/drm/msm/adreno/a4xx_gpu.c
index 3e09d3a7a0ac..715436cb3996 100644
--- a/drivers/gpu/drm/msm/adreno/a4xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a4xx_gpu.c
@@ -611,6 +611,16 @@ static int a4xx_get_timestamp(struct msm_gpu *gpu, uint64_t *value)
 	return 0;
 }

+static u64 a4xx_gpu_busy(struct msm_gpu *gpu, unsigned long *out_sample_rate)
+{
+	u64 busy_cycles;
+
+	busy_cycles = gpu_read64(gpu, REG_A4XX_RBBM_PERFCTR_RBBM_1_LO);
+	*out_sample_rate = clk_get_rate(gpu->core_clk);
+
+	return busy_cycles;
+}
+
 static u32 a4xx_get_rptr(struct msm_gpu *gpu, struct msm_ringbuffer *ring)
 {
 	ring->memptrs->rptr = gpu_read(gpu, REG_A4XX_CP_RB_RPTR);
@@ -632,6 +642,7 @@ static const struct adreno_gpu_funcs funcs = {
 #if defined(CONFIG_DEBUG_FS) || defined(CONFIG_DEV_COREDUMP)
 	.show = adreno_show,
 #endif
+	.gpu_busy = a4xx_gpu_busy,
 	.gpu_state_get = a4xx_gpu_state_get,
 	.gpu_state_put = adreno_gpu_state_put,
 	.create_address_space = adreno_create_address_space,

--
2.39.2
[PATCH v2 2/6] drm/msm/adreno: Use OPP for every GPU generation
Some older GPUs (namely a2xx with no opp tables at all and a320 with
downstream-remnants gpu pwrlevels) used not to have OPP tables. They
both however had just one frequency defined, making it extremely easy
to construct such an OPP table from within the driver if need be.

Do so and switch all clk_set_rate calls on core_clk to their OPP
counterparts.

Signed-off-by: Konrad Dybcio
---
 drivers/gpu/drm/msm/adreno/adreno_gpu.c | 99 +++--
 drivers/gpu/drm/msm/msm_gpu.c           |  4 +-
 drivers/gpu/drm/msm/msm_gpu_devfreq.c   |  2 +-
 3 files changed, 48 insertions(+), 57 deletions(-)

diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c
b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
index ce6b76c45b6f..8721e3d6231a 100644
--- a/drivers/gpu/drm/msm/adreno/adreno_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
@@ -922,73 +922,48 @@ void adreno_wait_ring(struct msm_ringbuffer *ring, uint32_t ndwords)
 		ring->id);
 }

-/* Get legacy powerlevels from qcom,gpu-pwrlevels and populate the opp table */
-static int adreno_get_legacy_pwrlevels(struct device *dev)
-{
-	struct device_node *child, *node;
-	int ret;
-
-	node = of_get_compatible_child(dev->of_node, "qcom,gpu-pwrlevels");
-	if (!node) {
-		DRM_DEV_DEBUG(dev, "Could not find the GPU powerlevels\n");
-		return -ENXIO;
-	}
-
-	for_each_child_of_node(node, child) {
-		unsigned int val;
-
-		ret = of_property_read_u32(child, "qcom,gpu-freq", &val);
-		if (ret)
-			continue;
-
-		/*
-		 * Skip the intentionally bogus clock value found at the bottom
-		 * of most legacy frequency tables
-		 */
-		if (val != 2700)
-			dev_pm_opp_add(dev, val, 0);
-	}
-
-	of_node_put(node);
-
-	return 0;
-}
-
-static void adreno_get_pwrlevels(struct device *dev,
+static int adreno_get_pwrlevels(struct device *dev,
 		struct msm_gpu *gpu)
 {
+	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
 	unsigned long freq = ULONG_MAX;
 	struct dev_pm_opp *opp;
 	int ret;

 	gpu->fast_rate = 0;

-	/* You down with OPP? */
-	if (!of_find_property(dev->of_node, "operating-points-v2", NULL))
-		ret = adreno_get_legacy_pwrlevels(dev);
-	else {
-		ret = devm_pm_opp_of_add_table(dev);
-		if (ret)
-			DRM_DEV_ERROR(dev, "Unable to set the OPP table\n");
-	}
-
-	if (!ret) {
-		/* Find the fastest defined rate */
-		opp = dev_pm_opp_find_freq_floor(dev, &freq);
-		if (!IS_ERR(opp)) {
-			gpu->fast_rate = freq;
-			dev_pm_opp_put(opp);
+	/* devm_pm_opp_of_add_table may error out but will still create an OPP table */
+	ret = devm_pm_opp_of_add_table(dev);
+	if (ret == -ENODEV) {
+		/* Special cases for ancient hw with ancient DT bindings */
+		if (adreno_is_a2xx(adreno_gpu)) {
+			dev_warn(dev, "Unable to find the OPP table. Falling back to 200 MHz.\n");
+			dev_pm_opp_add(dev, 2, 0);
+		} else if (adreno_is_a320(adreno_gpu)) {
+			dev_warn(dev, "Unable to find the OPP table. Falling back to 450 MHz.\n");
+			dev_pm_opp_add(dev, 45000, 0);
+		} else {
+			DRM_DEV_ERROR(dev, "Unable to find the OPP table\n");
+			return -ENODEV;
 		}
+	} else if (ret) {
+		DRM_DEV_ERROR(dev, "Unable to set the OPP table\n");
+		return ret;
 	}

-	if (!gpu->fast_rate) {
-		dev_warn(dev,
-			"Could not find a clock rate. Using a reasonable default\n");
-		/* Pick a suitably safe clock speed for any target */
-		gpu->fast_rate = 2;
+	/* Find the fastest defined rate */
+	opp = dev_pm_opp_find_freq_floor(dev, &freq);
+
+	if (IS_ERR(opp))
+		return PTR_ERR(opp);
+	else {
+		gpu->fast_rate = freq;
+		dev_pm_opp_put(opp);
 	}

 	DBG("fast_rate=%u, slow_rate=2700", gpu->fast_rate);
+
+	return 0;
 }

 int adreno_gpu_ocmem_init(struct device *dev, struct adreno_gpu *adreno_gpu,
@@ -1046,6 +1021,20 @@ int adreno_gpu_init(struct drm_device *drm, struct platform_device *pdev,
 	struct adreno_rev *rev = &config->rev;
 	const char *gpu_name;
 	u32 speedbin;
+	int ret;
+
+	/*
+	 * This can only be done before devm_pm_opp_of_add_table(), or
+	 * dev_pm_opp_set_config() will WARN_ON()
+	 */
+	if (IS_ERR(devm_clk_get(dev, "core"))) {
+		/*
+		 * If "core" is absent
[PATCH v2 4/6] drm/msm/a3xx: Implement .gpu_busy
Add support for gpu_busy on a3xx, which is required for devfreq support.

Tested-by: Dmitry Baryshkov #ifc6410
Reviewed-by: Dmitry Baryshkov
Signed-off-by: Konrad Dybcio
---
 drivers/gpu/drm/msm/adreno/a3xx_gpu.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/drivers/gpu/drm/msm/adreno/a3xx_gpu.c
b/drivers/gpu/drm/msm/adreno/a3xx_gpu.c
index 948785ed07bb..c86b377f6f0d 100644
--- a/drivers/gpu/drm/msm/adreno/a3xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a3xx_gpu.c
@@ -477,6 +477,16 @@ static struct msm_gpu_state *a3xx_gpu_state_get(struct msm_gpu *gpu)
 	return state;
 }

+static u64 a3xx_gpu_busy(struct msm_gpu *gpu, unsigned long *out_sample_rate)
+{
+	u64 busy_cycles;
+
+	busy_cycles = gpu_read64(gpu, REG_A3XX_RBBM_PERFCTR_RBBM_1_LO);
+	*out_sample_rate = clk_get_rate(gpu->core_clk);
+
+	return busy_cycles;
+}
+
 static u32 a3xx_get_rptr(struct msm_gpu *gpu, struct msm_ringbuffer *ring)
 {
 	ring->memptrs->rptr = gpu_read(gpu, REG_AXXX_CP_RB_RPTR);
@@ -498,6 +508,7 @@ static const struct adreno_gpu_funcs funcs = {
 #if defined(CONFIG_DEBUG_FS) || defined(CONFIG_DEV_COREDUMP)
 	.show = adreno_show,
 #endif
+	.gpu_busy = a3xx_gpu_busy,
 	.gpu_state_get = a3xx_gpu_state_get,
 	.gpu_state_put = adreno_gpu_state_put,
 	.create_address_space = adreno_create_address_space,

--
2.39.2
[PATCH v2 0/6] OPP and devfreq for all Adrenos
v1 -> v2:
- Move a2xx #defines to XML
- Use dev_pm_opp_find_freq_floor in the common path in [2/6]
- Clarify a comment in [2/6]
- Move voting from a5xx to Adreno-wide [6/6]
- Pick up tags

v1: https://lore.kernel.org/linux-arm-msm/20230222-konrad-longbois-next-v1-0-010214257...@linaro.org

This series is a combination of [1] and a subset of [2] and some new
stuff. With it, devfreq is used on all a2xx-a6xx (including gmu and
gmu-wrapper) and all clk_set_rate(core clock) calls are dropped in
favour of dev_pm_opp_set_rate, which - drumroll - lets us scale the
voltage domain. DT patches making use of that will be sent separately.

On top of that, a5xx gets a call to enable icc scaling from the OPP
tables. No SoCs implementing a2xx have icc support yet and a3/4xx SoCs
have separate logic for that, which will be updated at a later time.

Getting this in for 6.4 early would be appreciated, as that would allow
for getting GMU wrapper GPUs up (without VDD&icc scaling they can only
run at lower freqs, which is.. ehhh..)

Changes:
- a3xx busy: use the _1 counter as per msm-3.x instead of _0
- a6xx-series-opp: basically rewrite, ensure compat with all gens
- a2/4xx busy: new patch
- a5xx icc: new patch

[1] https://lore.kernel.org/linux-arm-msm/20230130093809.2079314-1-konrad.dyb...@linaro.org/
[2] https://lore.kernel.org/linux-arm-msm/20230214173145.2482651-1-konrad.dyb...@linaro.org/

Signed-off-by: Konrad Dybcio
---
Konrad Dybcio (6):
      drm/msm/a2xx: Include perf counter reg values in XML
      drm/msm/adreno: Use OPP for every GPU generation
      drm/msm/a2xx: Implement .gpu_busy
      drm/msm/a3xx: Implement .gpu_busy
      drm/msm/a4xx: Implement .gpu_busy
      drm/msm/adreno: Enable optional icc voting from OPP tables

 drivers/gpu/drm/msm/adreno/a2xx.xml.h      |  6 ++
 drivers/gpu/drm/msm/adreno/a2xx_gpu.c      | 24
 drivers/gpu/drm/msm/adreno/a3xx_gpu.c      | 11 +++
 drivers/gpu/drm/msm/adreno/a4xx_gpu.c      | 11 +++
 drivers/gpu/drm/msm/adreno/adreno_device.c |  4 ++
 drivers/gpu/drm/msm/adreno/adreno_gpu.c    | 99 ++
 drivers/gpu/drm/msm/msm_gpu.c              |  4 +-
 drivers/gpu/drm/msm/msm_gpu_devfreq.c      |  2 +-
 8 files changed, 104 insertions(+), 57 deletions(-)
---
base-commit: aaf70d5ad5e2b06a8050c51e278b0c3a14fabef5
change-id: 20230223-topic-opp-01e7112b867d

Best regards,
--
Konrad Dybcio
[PATCH v2 1/6] drm/msm/a2xx: Include perf counter reg values in XML
This is a partial merge of [1], subject to be dropped if a header update
is executed.

[1] https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21480/

Signed-off-by: Konrad Dybcio
---
 drivers/gpu/drm/msm/adreno/a2xx.xml.h | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/drivers/gpu/drm/msm/adreno/a2xx.xml.h
b/drivers/gpu/drm/msm/adreno/a2xx.xml.h
index afa6023346c4..b85fdc082bc1 100644
--- a/drivers/gpu/drm/msm/adreno/a2xx.xml.h
+++ b/drivers/gpu/drm/msm/adreno/a2xx.xml.h
@@ -1060,6 +1060,12 @@ enum a2xx_mh_perfcnt_select {
 	AXI_TOTAL_READ_REQUEST_DATA_BEATS = 181,
 };

+enum perf_mode_cnt {
+	PERF_STATE_RESET = 0,
+	PERF_STATE_ENABLE = 1,
+	PERF_STATE_FREEZE = 2,
+};
+
 enum adreno_mmu_clnt_beh {
 	BEH_NEVR = 0,
 	BEH_TRAN_RNG = 1,

--
2.39.2
Re: [Intel-gfx] [PATCH] drm/i915/gsc: Fix the Driver-FLR completion
On 2/22/2023 1:01 PM, Alan Previn wrote:

The Driver-FLR flow may inadvertently exit early before the full
completion of the re-init of the internal HW state if we only poll
GU_DEBUG Bit31 (polling for it to toggle from 0 -> 1). Instead we need
a two-step wait-for-completion flow that also involves GU_CNTL. See the
patch and new code comments for detail. This is new direction from HW
architecture folks.

v2:
- Add error message for the teardown timeout (Anshuman)
- Don't duplicate code in comments (Jani)

LGTM, Tested-by: Vinay Belgaumkar

Signed-off-by: Alan Previn
Fixes: 5a44fcd73498 ("drm/i915/gsc: Do a driver-FLR on unload if GSC was loaded")
---
 drivers/gpu/drm/i915/intel_uncore.c | 13 -
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/intel_uncore.c
b/drivers/gpu/drm/i915/intel_uncore.c
index f018da7ebaac..f3c46352db89 100644
--- a/drivers/gpu/drm/i915/intel_uncore.c
+++ b/drivers/gpu/drm/i915/intel_uncore.c
@@ -2749,14 +2749,25 @@ static void driver_initiated_flr(struct intel_uncore *uncore)
 	/* Trigger the actual Driver-FLR */
 	intel_uncore_rmw_fw(uncore, GU_CNTL, 0, DRIVERFLR);

+	/* Wait for hardware teardown to complete */
+	ret = intel_wait_for_register_fw(uncore, GU_CNTL,
+					 DRIVERFLR_STATUS, 0,
+					 flr_timeout_ms);
+	if (ret) {
+		drm_err(&i915->drm, "Driver-FLR-teardown wait completion failed! %d\n", ret);
+		return;
+	}
+
+	/* Wait for hardware/firmware re-init to complete */
 	ret = intel_wait_for_register_fw(uncore, GU_DEBUG,
 					 DRIVERFLR_STATUS, DRIVERFLR_STATUS,
 					 flr_timeout_ms);
 	if (ret) {
-		drm_err(&i915->drm, "wait for Driver-FLR completion failed! %d\n", ret);
+		drm_err(&i915->drm, "Driver-FLR-reinit wait completion failed! %d\n", ret);
 		return;
 	}

+	/* Clear sticky completion status */
 	intel_uncore_write_fw(uncore, GU_DEBUG, DRIVERFLR_STATUS);
 }
[PATCH] drm/rockchip: use struct_size() in vop2_bind
Use the overflow-protected struct_size() helper macro to compute the
allocation size of the vop2 data structure.

Signed-off-by: Jacob Keller
Cc: Sandy Huang
Cc: Heiko Stübner
Cc: David Airlie
---
I found this while developing a coccinelle script to detect potential
places where struct_size() would be appropriate.

 drivers/gpu/drm/rockchip/rockchip_drm_vop2.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/rockchip/rockchip_drm_vop2.c
b/drivers/gpu/drm/rockchip/rockchip_drm_vop2.c
index 0e0012368976..3e5861653b69 100644
--- a/drivers/gpu/drm/rockchip/rockchip_drm_vop2.c
+++ b/drivers/gpu/drm/rockchip/rockchip_drm_vop2.c
@@ -2655,7 +2655,7 @@ static int vop2_bind(struct device *dev, struct device *master, void *data)
 		return -ENODEV;

 	/* Allocate vop2 struct and its vop2_win array */
-	alloc_size = sizeof(*vop2) + sizeof(*vop2->win) * vop2_data->win_size;
+	alloc_size = struct_size(vop2, win, vop2_data->win_size);
 	vop2 = devm_kzalloc(dev, alloc_size, GFP_KERNEL);
 	if (!vop2)
 		return -ENOMEM;

base-commit: 4d5a2cce47a8e1e7119a881a4714f0eee557c1cd
--
2.39.1.405.gd4c25cc71f83
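For reviewers unfamiliar with the helper, a simplified model of its
semantics (the real macro lives in include/linux/overflow.h and goes
through overflow-checked multiply/add helpers):

	/*
	 * struct_size(vop2, win, n) evaluates to
	 * sizeof(*vop2) + n * sizeof(vop2->win[0]), saturating to SIZE_MAX
	 * on overflow, so devm_kzalloc() fails cleanly instead of the
	 * open-coded expression wrapping around and silently allocating an
	 * undersized buffer.
	 */
	alloc_size = struct_size(vop2, win, vop2_data->win_size);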
Re: [PATCH 1/5] drm/msm/adreno: Use OPP for every GPU generation
On 23.02.2023 00:16, Dmitry Baryshkov wrote:
> On 23/02/2023 00:40, Konrad Dybcio wrote:
>>
>> On 22.02.2023 23:38, Dmitry Baryshkov wrote:
>>> On 22/02/2023 23:47, Konrad Dybcio wrote:
>>>> Some older GPUs (namely a2xx with no opp tables at all and a320 with
>>>> downstream-remnants gpu pwrlevels) used not to have OPP tables. They
>>>> both however had just one frequency defined, making it extremely easy
>>>> to construct such an OPP table from within the driver if need be.
>>>>
>>>> Do so and switch all clk_set_rate calls on core_clk to their OPP
>>>> counterparts.
>>>>
>>>> Signed-off-by: Konrad Dybcio
>>>> ---
>>>>   drivers/gpu/drm/msm/adreno/adreno_gpu.c | 94 +++--
>>>>   drivers/gpu/drm/msm/msm_gpu.c           |  4 +-
>>>>   drivers/gpu/drm/msm/msm_gpu_devfreq.c   |  2 +-
>>>>   3 files changed, 45 insertions(+), 55 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c
>>>> b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
>>>> index ce6b76c45b6f..9b940c0f063f 100644
>>>> --- a/drivers/gpu/drm/msm/adreno/adreno_gpu.c
>>>> +++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
>>>> @@ -922,73 +922,50 @@ void adreno_wait_ring(struct msm_ringbuffer *ring, uint32_t ndwords)
>>>>              ring->id);
>>>>   }
>>>>
>>>> -/* Get legacy powerlevels from qcom,gpu-pwrlevels and populate the opp table */
>>>> -static int adreno_get_legacy_pwrlevels(struct device *dev)
>>>> -{
>>>> -      struct device_node *child, *node;
>>>> -      int ret;
>>>> -
>>>> -      node = of_get_compatible_child(dev->of_node, "qcom,gpu-pwrlevels");
>>>> -      if (!node) {
>>>> -              DRM_DEV_DEBUG(dev, "Could not find the GPU powerlevels\n");
>>>> -              return -ENXIO;
>>>> -      }
>>>> -
>>>> -      for_each_child_of_node(node, child) {
>>>> -              unsigned int val;
>>>> -
>>>> -              ret = of_property_read_u32(child, "qcom,gpu-freq", &val);
>>>> -              if (ret)
>>>> -                      continue;
>>>> -
>>>> -              /*
>>>> -               * Skip the intentionally bogus clock value found at the bottom
>>>> -               * of most legacy frequency tables
>>>> -               */
>>>> -              if (val != 2700)
>>>> -                      dev_pm_opp_add(dev, val, 0);
>>>> -      }
>>>> -
>>>> -      of_node_put(node);
>>>> -
>>>> -      return 0;
>>>> -}
>>>> -
>>>> -static void adreno_get_pwrlevels(struct device *dev,
>>>> +static int adreno_get_pwrlevels(struct device *dev,
>>>>              struct msm_gpu *gpu)
>>>>   {
>>>> +      struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
>>>>       unsigned long freq = ULONG_MAX;
>>>>       struct dev_pm_opp *opp;
>>>>       int ret;
>>>>
>>>>       gpu->fast_rate = 0;
>>>>
>>>> -      /* You down with OPP? */
>>>> -      if (!of_find_property(dev->of_node, "operating-points-v2", NULL))
>>>> -              ret = adreno_get_legacy_pwrlevels(dev);
>>>> -      else {
>>>> -              ret = devm_pm_opp_of_add_table(dev);
>>>> -              if (ret)
>>>> -                      DRM_DEV_ERROR(dev, "Unable to set the OPP table\n");
>>>> -      }
>>>> -
>>>> -      if (!ret) {
>>>> +      /* devm_pm_opp_of_add_table may error out but will still create an OPP table */
>>>> +      ret = devm_pm_opp_of_add_table(dev);
>>>> +      if (ret == -ENODEV) {
>>>> +              /* Special cases for ancient hw with ancient DT bindings */
>>>> +              if (adreno_is_a2xx(adreno_gpu)) {
>>>> +                      dev_warn(dev, "Unable to find the OPP table. Falling back to 200 MHz.\n");
>>>> +                      dev_pm_opp_add(dev, 2, 0);
>>>> +                      gpu->fast_rate = 2;
>>>
>>> We can skip setting the fast_rate, dev_pm_opp_find_freq_floor below will
>>> get it from our freshly generated opp table.
>>
>> It's not reached in this code path.
>
> I see. I got lost in all the ifs. What do you think about turning it into
> the main code path, since after this code block we always have a valid OPP
> table?

Sounds good!

Konrad

>
>>
>>>> +              } else if (adreno_is_a320(adreno_gpu)) {
>>>> +                      dev_warn(dev, "Unable to find the OPP table. Falling back to 450 MHz.\n");
>>>> +                      dev_pm_opp_add(dev, 45000, 0);
>>>> +                      gpu->fast_rate = 45000;
>>>> +              } else {
>>>> +                      DRM_DEV_ERROR(dev, "Unable to find the OPP table\n");
>>>> +                      return -ENODEV;
>>>> +              }
>>>> +      } else if (ret) {
>>>> +              DRM_DEV_ERROR(dev, "Unable to set the OPP table\n");
>>>> +              return ret;
>>>> +      } else {
>>>>              /* Find the fastest defined rate */
>>>>              opp = dev_pm_opp_find_freq_floor(dev, &freq);
>>>> -              if (!IS_ERR(opp)) {
>>>> +
>>>> +              if (IS_ERR(opp))
>>>> +                      return PTR_ERR(opp);
>>>> +              else {
>>>>                      gpu->fast_rate = freq;
>>>>                      dev_pm_opp_put(opp);
>>>>              }
>>>>       }
>>>>
>>>> -      if (!gpu->fast_rate) {
>>>> -              dev_warn(dev,
>>>> -                      "Could not find a clock rate. Using a reasonable default\n");
>>>> -              /* Pick a suitably safe clock speed for any
[PATCH v2 2/2] drm: rcar-du: Write correct values in DORCR reserved fields
The DORCR register controls the routing of clocks and data between DU
channels within a group. For groups that contain a single channel,
there's no routing option to control, and some fields of the register
are then reserved. On Gen2 those reserved fields are documented as
required to be set to 0, while on Gen3 and newer the PG1T, DK1S and
PG1D reserved fields must be set to 1.

The DU driver initializes the DORCR register in rcar_du_group_setup(),
where it ignores the PG1T, DK1S and PG1D, and then configures those
fields to the correct value in rcar_du_group_set_routing(). This hasn't
been shown to cause any issue, but prevents certifying that the driver
complies with the documentation in safety-critical use cases. As there
is no reasonable chance that the documentation will be updated to
clarify that those reserved fields can be written to 0 temporarily
before starting the hardware, make sure that the registers are always
set to valid values.

Signed-off-by: Laurent Pinchart
---
 drivers/gpu/drm/rcar-du/rcar_du_group.c | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/rcar-du/rcar_du_group.c
b/drivers/gpu/drm/rcar-du/rcar_du_group.c
index b5950749d68a..2ccd2581f544 100644
--- a/drivers/gpu/drm/rcar-du/rcar_du_group.c
+++ b/drivers/gpu/drm/rcar-du/rcar_du_group.c
@@ -138,6 +138,7 @@ static void rcar_du_group_setup(struct rcar_du_group *rgrp)
 {
 	struct rcar_du_device *rcdu = rgrp->dev;
 	u32 defr7 = DEFR7_CODE;
+	u32 dorcr;

 	/* Enable extended features */
 	rcar_du_group_write(rgrp, DEFR, DEFR_CODE | DEFR_DEFE);
@@ -174,8 +175,15 @@ static void rcar_du_group_setup(struct rcar_du_group *rgrp)
 	/*
 	 * Use DS1PR and DS2PR to configure planes priorities and connects the
 	 * superposition 0 to DU0 pins. DU1 pins will be configured dynamically.
+	 *
+	 * Groups that have a single channel have a hardcoded configuration. On
+	 * Gen3 and newer, the documentation requires PG1T, DK1S and PG1D_DS1 to
+	 * always be set in this case.
 	 */
-	rcar_du_group_write(rgrp, DORCR, DORCR_PG0D_DS0 | DORCR_DPRS);
+	dorcr = DORCR_PG0D_DS0 | DORCR_DPRS;
+	if (rcdu->info->gen >= 3 && rgrp->num_crtcs == 1)
+		dorcr |= DORCR_PG1T | DORCR_DK1S | DORCR_PG1D_DS1;
+	rcar_du_group_write(rgrp, DORCR, dorcr);

 	/* Apply planes to CRTCs association. */
 	mutex_lock(&rgrp->lock);

--
Regards,
Laurent Pinchart
[PATCH v2 1/2] drm: rcar-du: Rename DORCR fields to make them 0-based
The DORCR fields were documented in the R-Car H1 datasheet with 1-based
names, and then got renamed to 0-based in Gen2. The 0-based names are
used for Gen3 and Gen4, making H1 an outlier. Rename the field macros
to make them 0-based, in order to increase readability of the code when
comparing it with the documentation.

Signed-off-by: Laurent Pinchart
---
 drivers/gpu/drm/rcar-du/rcar_du_group.c |  8
 drivers/gpu/drm/rcar-du/rcar_du_regs.h  | 26 -
 2 files changed, 17 insertions(+), 17 deletions(-)

diff --git a/drivers/gpu/drm/rcar-du/rcar_du_group.c
b/drivers/gpu/drm/rcar-du/rcar_du_group.c
index 152602236377..b5950749d68a 100644
--- a/drivers/gpu/drm/rcar-du/rcar_du_group.c
+++ b/drivers/gpu/drm/rcar-du/rcar_du_group.c
@@ -175,7 +175,7 @@ static void rcar_du_group_setup(struct rcar_du_group *rgrp)
 	 * Use DS1PR and DS2PR to configure planes priorities and connects the
 	 * superposition 0 to DU0 pins. DU1 pins will be configured dynamically.
 	 */
-	rcar_du_group_write(rgrp, DORCR, DORCR_PG1D_DS1 | DORCR_DPRS);
+	rcar_du_group_write(rgrp, DORCR, DORCR_PG0D_DS0 | DORCR_DPRS);

 	/* Apply planes to CRTCs association. */
 	mutex_lock(&rgrp->lock);
@@ -349,7 +349,7 @@ int rcar_du_group_set_routing(struct rcar_du_group *rgrp)
 	struct rcar_du_device *rcdu = rgrp->dev;
 	u32 dorcr = rcar_du_group_read(rgrp, DORCR);

-	dorcr &= ~(DORCR_PG2T | DORCR_DK2S | DORCR_PG2D_MASK);
+	dorcr &= ~(DORCR_PG1T | DORCR_DK1S | DORCR_PG1D_MASK);

 	/*
 	 * Set the DPAD1 pins sources. Select CRTC 0 if explicitly requested and
 	 * by default.
 	 */
 	if (rcdu->dpad1_source == rgrp->index * 2)
-		dorcr |= DORCR_PG2D_DS1;
+		dorcr |= DORCR_PG1D_DS0;
 	else
-		dorcr |= DORCR_PG2T | DORCR_DK2S | DORCR_PG2D_DS2;
+		dorcr |= DORCR_PG1T | DORCR_DK1S | DORCR_PG1D_DS1;

 	rcar_du_group_write(rgrp, DORCR, dorcr);

diff --git a/drivers/gpu/drm/rcar-du/rcar_du_regs.h
b/drivers/gpu/drm/rcar-du/rcar_du_regs.h
index 789ae9285108..6c750fab6ebb 100644
--- a/drivers/gpu/drm/rcar-du/rcar_du_regs.h
+++ b/drivers/gpu/drm/rcar-du/rcar_du_regs.h
@@ -511,19 +511,19 @@
  */

 #define DORCR			0x11000
-#define DORCR_PG2T		(1 << 30)
-#define DORCR_DK2S		(1 << 28)
-#define DORCR_PG2D_DS1		(0 << 24)
-#define DORCR_PG2D_DS2		(1 << 24)
-#define DORCR_PG2D_FIX0		(2 << 24)
-#define DORCR_PG2D_DOOR		(3 << 24)
-#define DORCR_PG2D_MASK		(3 << 24)
-#define DORCR_DR1D		(1 << 21)
-#define DORCR_PG1D_DS1		(0 << 16)
-#define DORCR_PG1D_DS2		(1 << 16)
-#define DORCR_PG1D_FIX0		(2 << 16)
-#define DORCR_PG1D_DOOR		(3 << 16)
-#define DORCR_PG1D_MASK		(3 << 16)
+#define DORCR_PG1T		(1 << 30)
+#define DORCR_DK1S		(1 << 28)
+#define DORCR_PG1D_DS0		(0 << 24)
+#define DORCR_PG1D_DS1		(1 << 24)
+#define DORCR_PG1D_FIX0		(2 << 24)
+#define DORCR_PG1D_DOOR		(3 << 24)
+#define DORCR_PG1D_MASK		(3 << 24)
+#define DORCR_DR0D		(1 << 21)
+#define DORCR_PG0D_DS0		(0 << 16)
+#define DORCR_PG0D_DS1		(1 << 16)
+#define DORCR_PG0D_FIX0		(2 << 16)
+#define DORCR_PG0D_DOOR		(3 << 16)
+#define DORCR_PG0D_MASK		(3 << 16)
 #define DORCR_RGPV		(1 << 4)
 #define DORCR_DPRS		(1 << 0)

--
Regards,
Laurent Pinchart
[PATCH v2 0/2] drm: rcar-du: Fix more invalid register writes
Hello,

Following the "[PATCH 0/2] drm: rcar-du: Avoid writing reserved register
fields" series ([1]), this series addresses more invalid register writes
in the R-Car DU driver. Patch 1/2 first renames some register field
macros to increase readability, and patch 2/2 fixes the invalid writes.

The rationale is the same as for the previous series: the current
implementation is likely fine, but doesn't pass the functional safety
requirements as it doesn't match the documentation.

The series supersedes the patch "[PATCH] drm: rcar-du: Write correct
values in DORCR reserved fields" ([2]) that I have just sent, which was
missing the 1/2 dependency. Patch 2/2 is otherwise identical to [2].

[1] https://lore.kernel.org/dri-devel/20230222050623.29080-1-laurent.pinchart+rene...@ideasonboard.com/T/#t
[2] https://lore.kernel.org/dri-devel/2023033113.4737-1-laurent.pinchart+rene...@ideasonboard.com/T/#u

Laurent Pinchart (2):
  drm: rcar-du: Rename DORCR fields to make them 0-based
  drm: rcar-du: Write correct values in DORCR reserved fields

 drivers/gpu/drm/rcar-du/rcar_du_group.c | 16 +++
 drivers/gpu/drm/rcar-du/rcar_du_regs.h  | 26 -
 2 files changed, 25 insertions(+), 17 deletions(-)

--
Regards,
Laurent Pinchart
Re: [PATCH v5 3/8] drm/i915/pxp: Add MTL helpers to submit Heci-Cmd-Packet to GSC
On Tue, 2023-02-14 at 13:38 -0800, Teres Alexis, Alan Previn wrote:
> Add helper functions into a new file for heci-packet-submission.
> The helpers will handle generating the MTL GSC-CS Memory-Header
> and submission of the Heci-Cmd-Packet instructions to the engine.
>
> NOTE1: These common functions for heci-packet-submission will be used
> by different i915 callers:
> 1- GSC-SW-Proxy: This is pending upstream publication awaiting
>    a few remaining opens
> 2- MTL-HDCP: An equivalent patch has also been published at:
>    https://patchwork.freedesktop.org/series/111876/. (Patch 1)
> 3- PXP: This series.
>
> NOTE2: A difference in this patch vs what is appearing in bullet 2
> above is that HDCP (and SW-Proxy) will be using privileged submission
> (GGTT and common gsc-uc-context) while PXP will be using non-privileged
> PPGTT, context and batch buffer. Therefore this patch will only slightly
> overlap with the MTL-HDCP patches despite having very similar function
> names (emit_foo vs emit_nonpriv_foo). This is because HECI_CMD_PKT
> instructions require different flows and hw-specific code when done
> via PPGTT based submission (not different from other engines). MTL-HDCP
> contains the same intel_gsc_mtl_header_t structures as this but the
> helpers there are different. Both add the same new file names.

alan: snip

> +int
> +intel_gsc_uc_heci_cmd_submit_nonpriv(struct intel_gsc_uc *gsc,
> +				     struct intel_context *ce,
> +				     struct intel_gsc_heci_non_priv_pkt *pkt,
> +				     u32 *cmd, int timeout_ms)
> +{
> +	struct intel_engine_cs *eng;
> +	struct i915_request *rq;
> +	int err;
> +
> +	rq = intel_context_create_request(ce);

alan: i need to move this to below the vma-lock-unlock pairs below, to avoid
any kind of lockdep warning because of the expected primed ordering of calls
across the driver

> +	if (IS_ERR(rq))
> +		return PTR_ERR(rq);
> +
> +	emit_gsc_heci_pkt_nonpriv(cmd, pkt);
> +
> +	i915_vma_lock(pkt->bb_vma);
> +	err = i915_vma_move_to_active(pkt->bb_vma, rq, EXEC_OBJECT_WRITE);
> +	i915_vma_unlock(pkt->bb_vma);
> +	if (err)
> +		return err;
> +
> +	i915_vma_lock(pkt->heci_pkt_vma);
> +	err = i915_vma_move_to_active(pkt->heci_pkt_vma, rq, EXEC_OBJECT_WRITE);
> +	i915_vma_unlock(pkt->heci_pkt_vma);
> +	if (err)
> +		return err;
> +
> +	eng = rq->context->engine;
> +	if (eng->emit_init_breadcrumb) {
> +		err = eng->emit_init_breadcrumb(rq);
> +		if (err)
> +			goto out_rq;
> +	}

alan: snip
[PATCH] drm: rcar-du: Write correct values in DORCR reserved fields
The DORCR register controls the routing of clocks and data between DU
channels within a group. For groups that contain a single channel,
there's no routing option to control, and some fields of the register
are then reserved. On Gen2 those reserved fields are documented as
required to be set to 0, while on Gen3 and newer the PG1T, DK1S and
PG1D reserved fields must be set to 1.

The DU driver initializes the DORCR register in rcar_du_group_setup(),
where it ignores the PG1T, DK1S and PG1D, and then configures those
fields to the correct value in rcar_du_group_set_routing(). This hasn't
been shown to cause any issue, but prevents certifying that the driver
complies with the documentation in safety-critical use cases. As there
is no reasonable chance that the documentation will be updated to
clarify that those reserved fields can be written to 0 temporarily
before starting the hardware, make sure that the registers are always
set to valid values.

Signed-off-by: Laurent Pinchart
---
 drivers/gpu/drm/rcar-du/rcar_du_group.c | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/rcar-du/rcar_du_group.c
b/drivers/gpu/drm/rcar-du/rcar_du_group.c
index b5950749d68a..2ccd2581f544 100644
--- a/drivers/gpu/drm/rcar-du/rcar_du_group.c
+++ b/drivers/gpu/drm/rcar-du/rcar_du_group.c
@@ -138,6 +138,7 @@ static void rcar_du_group_setup(struct rcar_du_group *rgrp)
 {
 	struct rcar_du_device *rcdu = rgrp->dev;
 	u32 defr7 = DEFR7_CODE;
+	u32 dorcr;

 	/* Enable extended features */
 	rcar_du_group_write(rgrp, DEFR, DEFR_CODE | DEFR_DEFE);
@@ -174,8 +175,15 @@ static void rcar_du_group_setup(struct rcar_du_group *rgrp)
 	/*
 	 * Use DS1PR and DS2PR to configure planes priorities and connects the
 	 * superposition 0 to DU0 pins. DU1 pins will be configured dynamically.
+	 *
+	 * Groups that have a single channel have a hardcoded configuration. On
+	 * Gen3 and newer, the documentation requires PG1T, DK1S and PG1D_DS1 to
+	 * always be set in this case.
 	 */
-	rcar_du_group_write(rgrp, DORCR, DORCR_PG0D_DS0 | DORCR_DPRS);
+	dorcr = DORCR_PG0D_DS0 | DORCR_DPRS;
+	if (rcdu->info->gen >= 3 && rgrp->num_crtcs == 1)
+		dorcr |= DORCR_PG1T | DORCR_DK1S | DORCR_PG1D_DS1;
+	rcar_du_group_write(rgrp, DORCR, dorcr);

 	/* Apply planes to CRTCs association. */
 	mutex_lock(&rgrp->lock);

--
Regards,
Laurent Pinchart
Re: [PATCH] drm/atomic-helpers: remove legacy_cursor_update hacks
On Wed, Feb 22, 2023 at 3:14 PM Rob Clark wrote: > > On Thu, Feb 16, 2023 at 3:12 AM Daniel Vetter wrote: > > > > The stuff never really worked, and leads to lots of fun because it > > out-of-order frees atomic states. Which upsets KASAN, among other > > things. > > > > For async updates we now have a more solid solution with the > > ->atomic_async_check and ->atomic_async_commit hooks. Support for that > > for msm and vc4 landed. nouveau and i915 have their own commit > > routines, doing something similar. > > > > For everyone else it's probably better to remove the use-after-free > > bug, and encourage folks to use the async support instead. The > > affected drivers which register a legacy cursor plane and don't either > > use the new async stuff or their own commit routine are: amdgpu, > > atmel, mediatek, qxl, rockchip, sti, sun4i, tegra, virtio, and vmwgfx. > > > > Inspired by an amdgpu bug report. > > > > v2: Drop RFC, I think with amdgpu converted over to use > > atomic_async_check/commit done in > > > > commit 674e78acae0dfb4beb56132e41cbae5b60f7d662 > > Author: Nicholas Kazlauskas > > Date: Wed Dec 5 14:59:07 2018 -0500 > > > > drm/amd/display: Add fast path for cursor plane updates > > > > we don't have any driver anymore where we have userspace expecting > > solid legacy cursor support _and_ they are using the atomic helpers in > > their fully glory. So we can retire this. > > > > v3: Paper over msm and i915 regression. The complete_all is the only > > thing missing afaict. > > > > v4: Fixup i915 fixup ... > > > > v5: Unallocate the crtc->event in msm to avoid hitting a WARN_ON in > > dpu_crtc_atomic_flush(). This is a bit a hack, but simplest way to > > untangle this all. Thanks to Abhinav Kumar for the debug help. > > Hmm, are you sure about that double-put? > > [ +0.501263] [ cut here ] > [ +0.32] refcount_t: underflow; use-after-free. 
> [ +0.33] WARNING: CPU: 6 PID: 1854 at lib/refcount.c:28 > refcount_warn_saturate+0xf8/0x134 > [ +0.43] Modules linked in: uinput rfcomm algif_hash > algif_skcipher af_alg veth venus_dec venus_enc xt_cgroup xt_MASQUERADE > qcom_spmi_temp_alarm qcom_spmi_adc_tm5 qcom_spmi_adc5 qcom_vadc_common > cros_ec_typec typec 8021q hci_uart btqca qcom_stats venus_core > coresight_etm4x coresight_tmc snd_soc_lpass_sc7180 > coresight_replicator coresight_funnel coresight snd_soc_sc7180 > ip6table_nat fuse ath10k_snoc ath10k_core ath mac80211 iio_trig_sysfs > bluetooth cros_ec_sensors cfg80211 cros_ec_sensors_core > industrialio_triggered_buffer kfifo_buf ecdh_generic ecc > cros_ec_sensorhub lzo_rle lzo_compress r8153_ecm cdc_ether usbnet > r8152 mii zram hid_vivaldi hid_google_hammer hid_vivaldi_common joydev > [ +0.000189] CPU: 6 PID: 1854 Comm: DrmThread Not tainted > 5.15.93-16271-g5ecce40dbcd4 #46 > cf9752a1c9e5b13fd13216094f52d77fa5a5f8f3 > [ +0.16] Hardware name: Google Wormdingler rev1+ INX panel board (DT) > [ +0.08] pstate: 6049 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) > [ +0.13] pc : refcount_warn_saturate+0xf8/0x134 > [ +0.11] lr : refcount_warn_saturate+0xf8/0x134 > [ +0.11] sp : ffc012e43930 > [ +0.08] x29: ffc012e43930 x28: ff80d31aa300 x27: > 024e > [ +0.17] x26: 03bd x25: 0040 x24: > 0040 > [ +0.14] x23: ff8083eb1000 x22: 0002 x21: > ff80845bc800 > [ +0.13] x20: 0040 x19: ff80d0cecb00 x18: > 60014024 > [ +0.12] x17: x16: 003c x15: > ffd97e21a1c0 > [ +0.12] x14: 0003 x13: 0004 x12: > 0001 > [ +0.14] x11: c000dfff x10: ffd97f560f50 x9 : > 5749cdb403550d00 > [ +0.14] x8 : 5749cdb403550d00 x7 : x6 : > 372e31332020205b > [ +0.12] x5 : ffd97f7b8b24 x4 : x3 : > ffc012e43588 > [ +0.13] x2 : ffc012e43590 x1 : dfff x0 : > 0026 > [ +0.14] Call trace: > [ +0.08] refcount_warn_saturate+0xf8/0x134 > [ +0.13] drm_crtc_commit_put+0x54/0x74 > [ +0.13] __drm_atomic_helper_plane_destroy_state+0x64/0x68 > [ +0.13] dpu_plane_destroy_state+0x24/0x3c > [ +0.17] drm_atomic_state_default_clear+0x13c/0x2d8 > [ +0.15] __drm_atomic_state_free+0x88/0xa0 > [ +0.15] drm_atomic_helper_update_plane+0x158/0x188 > [ +0.14] __setplane_atomic+0xf4/0x138 > [ +0.12] drm_mode_cursor_common+0x2e8/0x40c > [ +0.09] drm_mode_cursor_ioctl+0x48/0x70 > [ +0.08] drm_ioctl_kernel+0xe0/0x158 > [ +0.14] drm_ioctl+0x214/0x480 > [ +0.12] __arm64_sys_ioctl+0x94/0xd4 > [ +0.10] invoke_syscall+0x4c/0x100 > [ +0.13] do_el0_svc+0xa4/0x168 > [ +0.12] el0_svc+0x20/0x50 > [ +0.09] el0t_64_sync_handler+0x20/0x110 > [ +0.08] el0t_64_sync+0x1a4/0x1a8 > [ +0.10] ---[ end trace 35bb2d245a684c9a ]--- > without the
Re: [PATCH 1/5] drm/msm/adreno: Use OPP for every GPU generation
On 23/02/2023 00:40, Konrad Dybcio wrote: On 22.02.2023 23:38, Dmitry Baryshkov wrote: On 22/02/2023 23:47, Konrad Dybcio wrote: Some older GPUs (namely a2xx with no opp tables at all and a320 with downstream-remnants gpu pwrlevels) used not to have OPP tables. They both however had just one frequency defined, making it extremely easy to construct such an OPP table from within the driver if need be. Do so and switch all clk_set_rate calls on core_clk to their OPP counterparts. Signed-off-by: Konrad Dybcio --- drivers/gpu/drm/msm/adreno/adreno_gpu.c | 94 +++-- drivers/gpu/drm/msm/msm_gpu.c | 4 +- drivers/gpu/drm/msm/msm_gpu_devfreq.c | 2 +- 3 files changed, 45 insertions(+), 55 deletions(-) diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c b/drivers/gpu/drm/msm/adreno/adreno_gpu.c index ce6b76c45b6f..9b940c0f063f 100644 --- a/drivers/gpu/drm/msm/adreno/adreno_gpu.c +++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.c @@ -922,73 +922,50 @@ void adreno_wait_ring(struct msm_ringbuffer *ring, uint32_t ndwords) ring->id); } -/* Get legacy powerlevels from qcom,gpu-pwrlevels and populate the opp table */ -static int adreno_get_legacy_pwrlevels(struct device *dev) -{ - struct device_node *child, *node; - int ret; - - node = of_get_compatible_child(dev->of_node, "qcom,gpu-pwrlevels"); - if (!node) { - DRM_DEV_DEBUG(dev, "Could not find the GPU powerlevels\n"); - return -ENXIO; - } - - for_each_child_of_node(node, child) { - unsigned int val; - - ret = of_property_read_u32(child, "qcom,gpu-freq", &val); - if (ret) - continue; - - /* - * Skip the intentionally bogus clock value found at the bottom - * of most legacy frequency tables - */ - if (val != 27000000) - dev_pm_opp_add(dev, val, 0); - } - - of_node_put(node); - - return 0; -} - -static void adreno_get_pwrlevels(struct device *dev, +static int adreno_get_pwrlevels(struct device *dev, struct msm_gpu *gpu) { + struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu); unsigned long freq = ULONG_MAX; struct dev_pm_opp *opp; int ret; gpu->fast_rate = 0; - /* You down with OPP? */ - if (!of_find_property(dev->of_node, "operating-points-v2", NULL)) - ret = adreno_get_legacy_pwrlevels(dev); - else { - ret = devm_pm_opp_of_add_table(dev); - if (ret) - DRM_DEV_ERROR(dev, "Unable to set the OPP table\n"); - } - - if (!ret) { + /* devm_pm_opp_of_add_table may error out but will still create an OPP table */ + ret = devm_pm_opp_of_add_table(dev); + if (ret == -ENODEV) { + /* Special cases for ancient hw with ancient DT bindings */ + if (adreno_is_a2xx(adreno_gpu)) { + dev_warn(dev, "Unable to find the OPP table. Falling back to 200 MHz.\n"); + dev_pm_opp_add(dev, 200000000, 0); + gpu->fast_rate = 200000000; We can skip setting the fast_rate, dev_pm_opp_find_freq_floor below will get it from our freshly generated opp table. It's not reached in this code path. I see. I got lost in all the ifs. What do you think about turning it into the main code path, since after this code block we always have a valid OPP table? + } else if (adreno_is_a320(adreno_gpu)) { + dev_warn(dev, "Unable to find the OPP table. Falling back to 450 MHz.\n"); + dev_pm_opp_add(dev, 450000000, 0); + gpu->fast_rate = 450000000; + } else { + DRM_DEV_ERROR(dev, "Unable to find the OPP table\n"); + return -ENODEV; + } + } else if (ret) { + DRM_DEV_ERROR(dev, "Unable to set the OPP table\n"); + return ret; + } else { /* Find the fastest defined rate */ opp = dev_pm_opp_find_freq_floor(dev, &freq); - if (!IS_ERR(opp)) { + + if (IS_ERR(opp)) + return PTR_ERR(opp); + else { gpu->fast_rate = freq; dev_pm_opp_put(opp); } } - if (!gpu->fast_rate) { - dev_warn(dev, - "Could not find a clock rate. Using a reasonable default\n"); - /* Pick a suitably safe clock speed for any target */ - gpu->fast_rate = 200000000; - } - DBG("fast_rate=%u, slow_rate=27000000", gpu->fast_rate); + + return 0; } int adreno_gpu_ocmem_init(struct device *dev, struct adreno_gpu *adreno_gpu, -- With best wishes Dmitry
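Dmitry's suggestion would roughly amount to making the floor lookup the single exit path. A sketch under the assumption that, after the -ENODEV fallbacks, the OPP table is always populated (names taken from the quoted patch; not a tested change):

	static int adreno_get_pwrlevels(struct device *dev, struct msm_gpu *gpu)
	{
		struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
		unsigned long freq = ULONG_MAX;
		struct dev_pm_opp *opp;
		int ret;

		ret = devm_pm_opp_of_add_table(dev);
		if (ret == -ENODEV) {
			/* Ancient hw, ancient bindings: synthesize a one-entry table */
			if (adreno_is_a2xx(adreno_gpu))
				dev_pm_opp_add(dev, 200000000, 0);
			else if (adreno_is_a320(adreno_gpu))
				dev_pm_opp_add(dev, 450000000, 0);
			else
				return -ENODEV;
		} else if (ret) {
			return ret;
		}

		/* Single code path: works for DT tables and the fallbacks alike */
		opp = dev_pm_opp_find_freq_floor(dev, &freq);
		if (IS_ERR(opp))
			return PTR_ERR(opp);

		gpu->fast_rate = freq;
		dev_pm_opp_put(opp);

		return 0;
	}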
Re: [PATCH] drm/atomic-helpers: remove legacy_cursor_update hacks
On Thu, Feb 16, 2023 at 3:12 AM Daniel Vetter wrote: > > The stuff never really worked, and leads to lots of fun because it > out-of-order frees atomic states. Which upsets KASAN, among other > things. > > For async updates we now have a more solid solution with the > ->atomic_async_check and ->atomic_async_commit hooks. Support for that > for msm and vc4 landed. nouveau and i915 have their own commit > routines, doing something similar. > > For everyone else it's probably better to remove the use-after-free > bug, and encourage folks to use the async support instead. The > affected drivers which register a legacy cursor plane and don't either > use the new async stuff or their own commit routine are: amdgpu, > atmel, mediatek, qxl, rockchip, sti, sun4i, tegra, virtio, and vmwgfx. > > Inspired by an amdgpu bug report. > > v2: Drop RFC, I think with amdgpu converted over to use > atomic_async_check/commit done in > > commit 674e78acae0dfb4beb56132e41cbae5b60f7d662 > Author: Nicholas Kazlauskas > Date: Wed Dec 5 14:59:07 2018 -0500 > > drm/amd/display: Add fast path for cursor plane updates > > we don't have any driver anymore where we have userspace expecting > solid legacy cursor support _and_ they are using the atomic helpers in > their fully glory. So we can retire this. > > v3: Paper over msm and i915 regression. The complete_all is the only > thing missing afaict. > > v4: Fixup i915 fixup ... > > v5: Unallocate the crtc->event in msm to avoid hitting a WARN_ON in > dpu_crtc_atomic_flush(). This is a bit a hack, but simplest way to > untangle this all. Thanks to Abhinav Kumar for the debug help. Hmm, are you sure about that double-put? [ +0.501263] [ cut here ] [ +0.32] refcount_t: underflow; use-after-free. [ +0.33] WARNING: CPU: 6 PID: 1854 at lib/refcount.c:28 refcount_warn_saturate+0xf8/0x134 [ +0.43] Modules linked in: uinput rfcomm algif_hash algif_skcipher af_alg veth venus_dec venus_enc xt_cgroup xt_MASQUERADE qcom_spmi_temp_alarm qcom_spmi_adc_tm5 qcom_spmi_adc5 qcom_vadc_common cros_ec_typec typec 8021q hci_uart btqca qcom_stats venus_core coresight_etm4x coresight_tmc snd_soc_lpass_sc7180 coresight_replicator coresight_funnel coresight snd_soc_sc7180 ip6table_nat fuse ath10k_snoc ath10k_core ath mac80211 iio_trig_sysfs bluetooth cros_ec_sensors cfg80211 cros_ec_sensors_core industrialio_triggered_buffer kfifo_buf ecdh_generic ecc cros_ec_sensorhub lzo_rle lzo_compress r8153_ecm cdc_ether usbnet r8152 mii zram hid_vivaldi hid_google_hammer hid_vivaldi_common joydev [ +0.000189] CPU: 6 PID: 1854 Comm: DrmThread Not tainted 5.15.93-16271-g5ecce40dbcd4 #46 cf9752a1c9e5b13fd13216094f52d77fa5a5f8f3 [ +0.16] Hardware name: Google Wormdingler rev1+ INX panel board (DT) [ +0.08] pstate: 6049 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ +0.13] pc : refcount_warn_saturate+0xf8/0x134 [ +0.11] lr : refcount_warn_saturate+0xf8/0x134 [ +0.11] sp : ffc012e43930 [ +0.08] x29: ffc012e43930 x28: ff80d31aa300 x27: 024e [ +0.17] x26: 03bd x25: 0040 x24: 0040 [ +0.14] x23: ff8083eb1000 x22: 0002 x21: ff80845bc800 [ +0.13] x20: 0040 x19: ff80d0cecb00 x18: 60014024 [ +0.12] x17: x16: 003c x15: ffd97e21a1c0 [ +0.12] x14: 0003 x13: 0004 x12: 0001 [ +0.14] x11: c000dfff x10: ffd97f560f50 x9 : 5749cdb403550d00 [ +0.14] x8 : 5749cdb403550d00 x7 : x6 : 372e31332020205b [ +0.12] x5 : ffd97f7b8b24 x4 : x3 : ffc012e43588 [ +0.13] x2 : ffc012e43590 x1 : dfff x0 : 0026 [ +0.14] Call trace: [ +0.08] refcount_warn_saturate+0xf8/0x134 [ +0.13] drm_crtc_commit_put+0x54/0x74 [ +0.13] 
__drm_atomic_helper_plane_destroy_state+0x64/0x68 [ +0.13] dpu_plane_destroy_state+0x24/0x3c [ +0.17] drm_atomic_state_default_clear+0x13c/0x2d8 [ +0.15] __drm_atomic_state_free+0x88/0xa0 [ +0.15] drm_atomic_helper_update_plane+0x158/0x188 [ +0.14] __setplane_atomic+0xf4/0x138 [ +0.12] drm_mode_cursor_common+0x2e8/0x40c [ +0.09] drm_mode_cursor_ioctl+0x48/0x70 [ +0.08] drm_ioctl_kernel+0xe0/0x158 [ +0.14] drm_ioctl+0x214/0x480 [ +0.12] __arm64_sys_ioctl+0x94/0xd4 [ +0.10] invoke_syscall+0x4c/0x100 [ +0.13] do_el0_svc+0xa4/0x168 [ +0.12] el0_svc+0x20/0x50 [ +0.09] el0t_64_sync_handler+0x20/0x110 [ +0.08] el0t_64_sync+0x1a4/0x1a8 [ +0.10] ---[ end trace 35bb2d245a684c9a ]--- BR, -R > Cc: Abhinav Kumar > Cc: Thomas Zimmermann > Cc: Maxime Ripard > References: https://bugzilla.kernel.org/show_bug.cgi?id=199425 > References: > https://lore.kernel.org/all/20220221134155.125447-9-max...@cerno.tech/ > References: https://bugzilla.kernel.org/show_bug.cgi?
Re: [PATCH 1/5] drm/msm/adreno: Use OPP for every GPU generation
On 22.02.2023 23:38, Dmitry Baryshkov wrote: > On 22/02/2023 23:47, Konrad Dybcio wrote: >> Some older GPUs (namely a2xx with no opp tables at all and a320 with >> downstream-remnants gpu pwrlevels) used not to have OPP tables. They >> both however had just one frequency defined, making it extremely easy >> to construct such an OPP table from within the driver if need be. >> >> Do so and switch all clk_set_rate calls on core_clk to their OPP >> counterparts. >> >> Signed-off-by: Konrad Dybcio >> --- >> drivers/gpu/drm/msm/adreno/adreno_gpu.c | 94 >> +++-- >> drivers/gpu/drm/msm/msm_gpu.c | 4 +- >> drivers/gpu/drm/msm/msm_gpu_devfreq.c | 2 +- >> 3 files changed, 45 insertions(+), 55 deletions(-) >> >> diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c >> b/drivers/gpu/drm/msm/adreno/adreno_gpu.c >> index ce6b76c45b6f..9b940c0f063f 100644 >> --- a/drivers/gpu/drm/msm/adreno/adreno_gpu.c >> +++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.c >> @@ -922,73 +922,50 @@ void adreno_wait_ring(struct msm_ringbuffer *ring, >> uint32_t ndwords) >> ring->id); >> } >> -/* Get legacy powerlevels from qcom,gpu-pwrlevels and populate the opp >> table */ >> -static int adreno_get_legacy_pwrlevels(struct device *dev) >> -{ >> - struct device_node *child, *node; >> - int ret; >> - >> - node = of_get_compatible_child(dev->of_node, "qcom,gpu-pwrlevels"); >> - if (!node) { >> - DRM_DEV_DEBUG(dev, "Could not find the GPU powerlevels\n"); >> - return -ENXIO; >> - } >> - >> - for_each_child_of_node(node, child) { >> - unsigned int val; >> - >> - ret = of_property_read_u32(child, "qcom,gpu-freq", &val); >> - if (ret) >> - continue; >> - >> - /* >> - * Skip the intentionally bogus clock value found at the bottom >> - * of most legacy frequency tables >> - */ >> - if (val != 27000000) >> - dev_pm_opp_add(dev, val, 0); >> - } >> - >> - of_node_put(node); >> - >> - return 0; >> -} >> - >> -static void adreno_get_pwrlevels(struct device *dev, >> +static int adreno_get_pwrlevels(struct device *dev, >> struct msm_gpu *gpu) >> { >> + struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu); >> unsigned long freq = ULONG_MAX; >> struct dev_pm_opp *opp; >> int ret; >> gpu->fast_rate = 0; >> - /* You down with OPP? */ >> - if (!of_find_property(dev->of_node, "operating-points-v2", NULL)) >> - ret = adreno_get_legacy_pwrlevels(dev); >> - else { >> - ret = devm_pm_opp_of_add_table(dev); >> - if (ret) >> - DRM_DEV_ERROR(dev, "Unable to set the OPP table\n"); >> - } >> - >> - if (!ret) { >> + /* devm_pm_opp_of_add_table may error out but will still create an OPP >> table */ >> + ret = devm_pm_opp_of_add_table(dev); >> + if (ret == -ENODEV) { >> + /* Special cases for ancient hw with ancient DT bindings */ >> + if (adreno_is_a2xx(adreno_gpu)) { >> + dev_warn(dev, "Unable to find the OPP table. Falling back to >> 200 MHz.\n"); >> + dev_pm_opp_add(dev, 200000000, 0); >> + gpu->fast_rate = 200000000; > > We can skip setting the fast_rate, dev_pm_opp_find_freq_floor below will get > it from our freshly generated opp table. It's not reached in this code path. > >> + } else if (adreno_is_a320(adreno_gpu)) { >> + dev_warn(dev, "Unable to find the OPP table. Falling back to >> 450 MHz.\n"); >> + dev_pm_opp_add(dev, 450000000, 0); >> + gpu->fast_rate = 450000000; >> + } else { >> + DRM_DEV_ERROR(dev, "Unable to find the OPP table\n"); >> + return -ENODEV; >> + } >> + } else if (ret) { >> + DRM_DEV_ERROR(dev, "Unable to set the OPP table\n"); >> + return ret; >> + } else { >> /* Find the fastest defined rate */ >> opp = dev_pm_opp_find_freq_floor(dev, &freq); >> - if (!IS_ERR(opp)) { >> + >> + if (IS_ERR(opp)) >> + return PTR_ERR(opp); >> + else { >> gpu->fast_rate = freq; >> dev_pm_opp_put(opp); >> } >> } >> - if (!gpu->fast_rate) { >> - dev_warn(dev, >> - "Could not find a clock rate. Using a reasonable default\n"); >> - /* Pick a suitably safe clock speed for any target */ >> - gpu->fast_rate = 200000000; >> - } >> - >> DBG("fast_rate=%u, slow_rate=27000000", gpu->fast_rate); >> + >> + return 0; >> } >> int adreno_gpu_ocmem_init(struct device *dev, struct adreno_gpu >> *adreno_gpu, >> @@ -1046,6 +1023,17 @@ int adreno_gpu_init(struct drm_device *drm, struct >> platform_device *pdev, >> struct adreno_rev *rev = &config->rev; >> const char *gpu_name; >> u32 speedbin; >> + int ret; >> + >> + /* This can only be done here, or devm_pm_opp_set_supported_hw w
Re: [PATCH 1/5] drm/msm/adreno: Use OPP for every GPU generation
On 22/02/2023 23:47, Konrad Dybcio wrote: Some older GPUs (namely a2xx with no opp tables at all and a320 with downstream-remnants gpu pwrlevels) used not to have OPP tables. They both however had just one frequency defined, making it extremely easy to construct such an OPP table from within the driver if need be. Do so and switch all clk_set_rate calls on core_clk to their OPP counterparts. Signed-off-by: Konrad Dybcio --- drivers/gpu/drm/msm/adreno/adreno_gpu.c | 94 +++-- drivers/gpu/drm/msm/msm_gpu.c | 4 +- drivers/gpu/drm/msm/msm_gpu_devfreq.c | 2 +- 3 files changed, 45 insertions(+), 55 deletions(-) diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c b/drivers/gpu/drm/msm/adreno/adreno_gpu.c index ce6b76c45b6f..9b940c0f063f 100644 --- a/drivers/gpu/drm/msm/adreno/adreno_gpu.c +++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.c @@ -922,73 +922,50 @@ void adreno_wait_ring(struct msm_ringbuffer *ring, uint32_t ndwords) ring->id); } -/* Get legacy powerlevels from qcom,gpu-pwrlevels and populate the opp table */ -static int adreno_get_legacy_pwrlevels(struct device *dev) -{ - struct device_node *child, *node; - int ret; - - node = of_get_compatible_child(dev->of_node, "qcom,gpu-pwrlevels"); - if (!node) { - DRM_DEV_DEBUG(dev, "Could not find the GPU powerlevels\n"); - return -ENXIO; - } - - for_each_child_of_node(node, child) { - unsigned int val; - - ret = of_property_read_u32(child, "qcom,gpu-freq", &val); - if (ret) - continue; - - /* - * Skip the intentionally bogus clock value found at the bottom - * of most legacy frequency tables - */ - if (val != 27000000) - dev_pm_opp_add(dev, val, 0); - } - - of_node_put(node); - - return 0; -} - -static void adreno_get_pwrlevels(struct device *dev, +static int adreno_get_pwrlevels(struct device *dev, struct msm_gpu *gpu) { + struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu); unsigned long freq = ULONG_MAX; struct dev_pm_opp *opp; int ret; gpu->fast_rate = 0; - /* You down with OPP? */ - if (!of_find_property(dev->of_node, "operating-points-v2", NULL)) - ret = adreno_get_legacy_pwrlevels(dev); - else { - ret = devm_pm_opp_of_add_table(dev); - if (ret) - DRM_DEV_ERROR(dev, "Unable to set the OPP table\n"); - } - - if (!ret) { + /* devm_pm_opp_of_add_table may error out but will still create an OPP table */ + ret = devm_pm_opp_of_add_table(dev); + if (ret == -ENODEV) { + /* Special cases for ancient hw with ancient DT bindings */ + if (adreno_is_a2xx(adreno_gpu)) { + dev_warn(dev, "Unable to find the OPP table. Falling back to 200 MHz.\n"); + dev_pm_opp_add(dev, 200000000, 0); + gpu->fast_rate = 200000000; We can skip setting the fast_rate, dev_pm_opp_find_freq_floor below will get it from our freshly generated opp table. + } else if (adreno_is_a320(adreno_gpu)) { + dev_warn(dev, "Unable to find the OPP table. Falling back to 450 MHz.\n"); + dev_pm_opp_add(dev, 450000000, 0); + gpu->fast_rate = 450000000; + } else { + DRM_DEV_ERROR(dev, "Unable to find the OPP table\n"); + return -ENODEV; + } + } else if (ret) { + DRM_DEV_ERROR(dev, "Unable to set the OPP table\n"); + return ret; + } else { /* Find the fastest defined rate */ opp = dev_pm_opp_find_freq_floor(dev, &freq); - if (!IS_ERR(opp)) { + + if (IS_ERR(opp)) + return PTR_ERR(opp); + else { gpu->fast_rate = freq; dev_pm_opp_put(opp); } } - if (!gpu->fast_rate) { - dev_warn(dev, - "Could not find a clock rate. Using a reasonable default\n"); - /* Pick a suitably safe clock speed for any target */ - gpu->fast_rate = 200000000; - } - DBG("fast_rate=%u, slow_rate=27000000", gpu->fast_rate); + + return 0; } int adreno_gpu_ocmem_init(struct device *dev, struct adreno_gpu *adreno_gpu, @@ -1046,6 +1023,17 @@ int adreno_gpu_init(struct drm_device *drm, struct platform_device *pdev, struct adreno_rev *rev = &config->rev; const char *gpu_name; u32 speedbin; + int ret; + + /* This can only be done here, or devm_pm_opp_set_supported_hw will WARN_ON() */ I
Re: [PATCH 2/5] drm/msm/a2xx: Implement .gpu_busy
On 22/02/2023 23:47, Konrad Dybcio wrote: Implement gpu_busy based on the downstream msm-3.4 code [1]. This allows us to use devfreq on this old old old hardware! [1] https://github.com/LineageOS/android_kernel_sony_apq8064/blob/lineage-16.0/drivers/gpu/msm/adreno_a2xx.c#L1975 Signed-off-by: Konrad Dybcio --- drivers/gpu/drm/msm/adreno/a2xx_gpu.c | 28 1 file changed, 28 insertions(+) diff --git a/drivers/gpu/drm/msm/adreno/a2xx_gpu.c b/drivers/gpu/drm/msm/adreno/a2xx_gpu.c index c67089a7ebc1..6258c98e5a88 100644 --- a/drivers/gpu/drm/msm/adreno/a2xx_gpu.c +++ b/drivers/gpu/drm/msm/adreno/a2xx_gpu.c @@ -481,6 +481,33 @@ a2xx_create_address_space(struct msm_gpu *gpu, struct platform_device *pdev) return aspace; } +/* While the precise size of this field is unknown, it holds at least these three values.. */ +#define PERF_MODE_CNT GENMASK(2, 0) + #define PERF_STATE_RESET 0x0 + #define PERF_STATE_ENABLE 0x1 + #define PERF_STATE_FREEZE 0x2 These should go into a2xx.xml.h LGTM otherwise. +static u64 a2xx_gpu_busy(struct msm_gpu *gpu, unsigned long *out_sample_rate) +{ + u64 busy_cycles; + + /* Freeze the counter */ + gpu_write(gpu, REG_A2XX_CP_PERFMON_CNTL, FIELD_PREP(PERF_MODE_CNT, PERF_STATE_FREEZE)); + + busy_cycles = gpu_read64(gpu, REG_A2XX_RBBM_PERFCOUNTER1_LO); + + /* Reset the counter */ + gpu_write(gpu, REG_A2XX_CP_PERFMON_CNTL, FIELD_PREP(PERF_MODE_CNT, PERF_STATE_RESET)); + + /* Re-enable the performance monitors */ + gpu_rmw(gpu, REG_A2XX_RBBM_PM_OVERRIDE2, BIT(6), BIT(6)); + gpu_write(gpu, REG_A2XX_RBBM_PERFCOUNTER1_SELECT, 1); + gpu_write(gpu, REG_A2XX_CP_PERFMON_CNTL, FIELD_PREP(PERF_MODE_CNT, PERF_STATE_ENABLE)); + + *out_sample_rate = clk_get_rate(gpu->core_clk); + + return busy_cycles; +} + static u32 a2xx_get_rptr(struct msm_gpu *gpu, struct msm_ringbuffer *ring) { ring->memptrs->rptr = gpu_read(gpu, REG_AXXX_CP_RB_RPTR); @@ -502,6 +529,7 @@ static const struct adreno_gpu_funcs funcs = { #if defined(CONFIG_DEBUG_FS) || defined(CONFIG_DEV_COREDUMP) .show = adreno_show, #endif + .gpu_busy = a2xx_gpu_busy, .gpu_state_get = a2xx_gpu_state_get, .gpu_state_put = adreno_gpu_state_put, .create_address_space = a2xx_create_address_space, -- With best wishes Dmitry
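If the field definitions move into a2xx.xml.h as requested, they would presumably follow the generated-header conventions used by the other msm register headers, something like the sketch below (hypothetical names; the real ones are generated from the rnndb XML, so treat this purely as an illustration):

	#define A2XX_CP_PERFMON_CNTL_PERF_MODE_CNT__MASK	0x00000007
	#define A2XX_CP_PERFMON_CNTL_PERF_MODE_CNT__SHIFT	0
	static inline uint32_t A2XX_CP_PERFMON_CNTL_PERF_MODE_CNT(uint32_t val)
	{
		return ((val) << A2XX_CP_PERFMON_CNTL_PERF_MODE_CNT__SHIFT) &
		       A2XX_CP_PERFMON_CNTL_PERF_MODE_CNT__MASK;
	}

with the RESET/ENABLE/FREEZE values defined as an enum next to it.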
Re: [PATCH 4/5] drm/msm/a4xx: Implement .gpu_busy
On 22/02/2023 23:47, Konrad Dybcio wrote: Add support for gpu_busy on a4xx, which is required for devfreq support. Signed-off-by: Konrad Dybcio --- drivers/gpu/drm/msm/adreno/a4xx_gpu.c | 11 +++ 1 file changed, 11 insertions(+) Reviewed-by: Dmitry Baryshkov -- With best wishes Dmitry
Re: [PATCH 5/5] drm/msm/a5xx: Enable optional icc voting from OPP tables
On 23/02/2023 00:14, Konrad Dybcio wrote: On 22.02.2023 23:12, Dmitry Baryshkov wrote: On 22/02/2023 23:47, Konrad Dybcio wrote: Add the dev_pm_opp_of_find_icc_paths() call to let the OPP framework handle bus voting as part of power level setting. This can probably go to the generic code path rather than sticking it into a5xx only. The reasoning is explained in the cover letter: a3xx/a4xx already have "raw" icc set calls which would require more work (and above all, testing) to untangle while keeping backwards compat, so this is a midterm solution that would allow getting scaling to work earlier. Those two platforms call icc_set_bw() during setup, however their opp tables do not contain BW settings, making dev_pm_opp_of_find_icc_paths() a nop. So, I think, we might as well call this function on a3xx/a4xx, making the code future proof. Konrad Signed-off-by: Konrad Dybcio --- drivers/gpu/drm/msm/adreno/a5xx_gpu.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c index d09221f97f71..a33af0cc27b6 100644 --- a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c +++ b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c @@ -1775,5 +1775,9 @@ struct msm_gpu *a5xx_gpu_init(struct drm_device *dev) /* Set up the preemption specific bits and pieces for each ringbuffer */ a5xx_preempt_init(gpu); + ret = dev_pm_opp_of_find_icc_paths(&pdev->dev, NULL); + if (ret) + return ERR_PTR(ret); + return gpu; } -- With best wishes Dmitry
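Moving the call into the shared path would be a small change in the common init code, roughly along these lines (a sketch assuming the platform device is reachable from adreno_gpu_init(); untested):

	/*
	 * Harmless where the OPP table carries no bandwidth values
	 * (a3xx/a4xx today), useful where it does (a5xx and newer).
	 */
	ret = dev_pm_opp_of_find_icc_paths(&pdev->dev, NULL);
	if (ret)
		return ret;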
Re: [PATCH 5/5] drm/msm/a5xx: Enable optional icc voting from OPP tables
On 22.02.2023 23:12, Dmitry Baryshkov wrote: > On 22/02/2023 23:47, Konrad Dybcio wrote: >> Add the dev_pm_opp_of_find_icc_paths() call to let the OPP framework >> handle bus voting as part of power level setting. > > This can probably go to the generic code path rather than sticking it into > a5xx only. The reasoning is explained in the cover letter, a3xx/a4xx already have "raw" icc set calls which would require more work (and above all, testing) to untangle while keeping backwards compat, this is a midterm solution that would allow getting scaling to work earlier. Konrad > >> >> Signed-off-by: Konrad Dybcio >> --- >> drivers/gpu/drm/msm/adreno/a5xx_gpu.c | 4 >> 1 file changed, 4 insertions(+) >> >> diff --git a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c >> b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c >> index d09221f97f71..a33af0cc27b6 100644 >> --- a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c >> +++ b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c >> @@ -1775,5 +1775,9 @@ struct msm_gpu *a5xx_gpu_init(struct drm_device *dev) >> /* Set up the preemption specific bits and pieces for each ringbuffer >> */ >> a5xx_preempt_init(gpu); >> + ret = dev_pm_opp_of_find_icc_paths(&pdev->dev, NULL); >> + if (ret) >> + return ERR_PTR(ret); >> + >> return gpu; >> } >> >
Re: [PATCH 5/5] drm/msm/a5xx: Enable optional icc voting from OPP tables
On 22/02/2023 23:47, Konrad Dybcio wrote: Add the dev_pm_opp_of_find_icc_paths() call to let the OPP framework handle bus voting as part of power level setting. This can probably go to the generic code path rather than sticking it into a5xx only. Signed-off-by: Konrad Dybcio --- drivers/gpu/drm/msm/adreno/a5xx_gpu.c | 4 1 file changed, 4 insertions(+) diff --git a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c index d09221f97f71..a33af0cc27b6 100644 --- a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c +++ b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c @@ -1775,5 +1775,9 @@ struct msm_gpu *a5xx_gpu_init(struct drm_device *dev) /* Set up the preemption specific bits and pieces for each ringbuffer */ a5xx_preempt_init(gpu); + ret = dev_pm_opp_of_find_icc_paths(&pdev->dev, NULL); + if (ret) + return ERR_PTR(ret); + return gpu; } -- With best wishes Dmitry
Re: [PATCH 3/5] drm/msm/a3xx: Implement .gpu_busy
On 22/02/2023 23:47, Konrad Dybcio wrote: Add support for gpu_busy on a3xx, which is required for devfreq support. Signed-off-by: Konrad Dybcio Tested-by: Dmitry Baryshkov #ifc6410 Reviewed-by: Dmitry Baryshkov -- With best wishes Dmitry
[PATCH 3/5] drm/msm/a3xx: Implement .gpu_busy
Add support for gpu_busy on a3xx, which is required for devfreq support. Signed-off-by: Konrad Dybcio --- drivers/gpu/drm/msm/adreno/a3xx_gpu.c | 11 +++ 1 file changed, 11 insertions(+) diff --git a/drivers/gpu/drm/msm/adreno/a3xx_gpu.c b/drivers/gpu/drm/msm/adreno/a3xx_gpu.c index 948785ed07bb..c86b377f6f0d 100644 --- a/drivers/gpu/drm/msm/adreno/a3xx_gpu.c +++ b/drivers/gpu/drm/msm/adreno/a3xx_gpu.c @@ -477,6 +477,16 @@ static struct msm_gpu_state *a3xx_gpu_state_get(struct msm_gpu *gpu) return state; } +static u64 a3xx_gpu_busy(struct msm_gpu *gpu, unsigned long *out_sample_rate) +{ + u64 busy_cycles; + + busy_cycles = gpu_read64(gpu, REG_A3XX_RBBM_PERFCTR_RBBM_1_LO); + *out_sample_rate = clk_get_rate(gpu->core_clk); + + return busy_cycles; +} + static u32 a3xx_get_rptr(struct msm_gpu *gpu, struct msm_ringbuffer *ring) { ring->memptrs->rptr = gpu_read(gpu, REG_AXXX_CP_RB_RPTR); @@ -498,6 +508,7 @@ static const struct adreno_gpu_funcs funcs = { #if defined(CONFIG_DEBUG_FS) || defined(CONFIG_DEV_COREDUMP) .show = adreno_show, #endif + .gpu_busy = a3xx_gpu_busy, .gpu_state_get = a3xx_gpu_state_get, .gpu_state_put = adreno_gpu_state_put, .create_address_space = adreno_create_address_space, -- 2.39.2
[PATCH 5/5] drm/msm/a5xx: Enable optional icc voting from OPP tables
Add the dev_pm_opp_of_find_icc_paths() call to let the OPP framework handle bus voting as part of power level setting. Signed-off-by: Konrad Dybcio --- drivers/gpu/drm/msm/adreno/a5xx_gpu.c | 4 1 file changed, 4 insertions(+) diff --git a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c index d09221f97f71..a33af0cc27b6 100644 --- a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c +++ b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c @@ -1775,5 +1775,9 @@ struct msm_gpu *a5xx_gpu_init(struct drm_device *dev) /* Set up the preemption specific bits and pieces for each ringbuffer */ a5xx_preempt_init(gpu); + ret = dev_pm_opp_of_find_icc_paths(&pdev->dev, NULL); + if (ret) + return ERR_PTR(ret); + return gpu; } -- 2.39.2
[PATCH 2/5] drm/msm/a2xx: Implement .gpu_busy
Implement gpu_busy based on the downstream msm-3.4 code [1]. This allows us to use devfreq on this old old old hardware! [1] https://github.com/LineageOS/android_kernel_sony_apq8064/blob/lineage-16.0/drivers/gpu/msm/adreno_a2xx.c#L1975 Signed-off-by: Konrad Dybcio --- drivers/gpu/drm/msm/adreno/a2xx_gpu.c | 28 1 file changed, 28 insertions(+) diff --git a/drivers/gpu/drm/msm/adreno/a2xx_gpu.c b/drivers/gpu/drm/msm/adreno/a2xx_gpu.c index c67089a7ebc1..6258c98e5a88 100644 --- a/drivers/gpu/drm/msm/adreno/a2xx_gpu.c +++ b/drivers/gpu/drm/msm/adreno/a2xx_gpu.c @@ -481,6 +481,33 @@ a2xx_create_address_space(struct msm_gpu *gpu, struct platform_device *pdev) return aspace; } +/* While the precise size of this field is unknown, it holds at least these three values.. */ +#define PERF_MODE_CNT GENMASK(2, 0) + #define PERF_STATE_RESET 0x0 + #define PERF_STATE_ENABLE 0x1 + #define PERF_STATE_FREEZE 0x2 +static u64 a2xx_gpu_busy(struct msm_gpu *gpu, unsigned long *out_sample_rate) +{ + u64 busy_cycles; + + /* Freeze the counter */ + gpu_write(gpu, REG_A2XX_CP_PERFMON_CNTL, FIELD_PREP(PERF_MODE_CNT, PERF_STATE_FREEZE)); + + busy_cycles = gpu_read64(gpu, REG_A2XX_RBBM_PERFCOUNTER1_LO); + + /* Reset the counter */ + gpu_write(gpu, REG_A2XX_CP_PERFMON_CNTL, FIELD_PREP(PERF_MODE_CNT, PERF_STATE_RESET)); + + /* Re-enable the performance monitors */ + gpu_rmw(gpu, REG_A2XX_RBBM_PM_OVERRIDE2, BIT(6), BIT(6)); + gpu_write(gpu, REG_A2XX_RBBM_PERFCOUNTER1_SELECT, 1); + gpu_write(gpu, REG_A2XX_CP_PERFMON_CNTL, FIELD_PREP(PERF_MODE_CNT, PERF_STATE_ENABLE)); + + *out_sample_rate = clk_get_rate(gpu->core_clk); + + return busy_cycles; +} + static u32 a2xx_get_rptr(struct msm_gpu *gpu, struct msm_ringbuffer *ring) { ring->memptrs->rptr = gpu_read(gpu, REG_AXXX_CP_RB_RPTR); @@ -502,6 +529,7 @@ static const struct adreno_gpu_funcs funcs = { #if defined(CONFIG_DEBUG_FS) || defined(CONFIG_DEV_COREDUMP) .show = adreno_show, #endif + .gpu_busy = a2xx_gpu_busy, .gpu_state_get = a2xx_gpu_state_get, .gpu_state_put = adreno_gpu_state_put, .create_address_space = a2xx_create_address_space, -- 2.39.2
[PATCH 4/5] drm/msm/a4xx: Implement .gpu_busy
Add support for gpu_busy on a4xx, which is required for devfreq support. Signed-off-by: Konrad Dybcio --- drivers/gpu/drm/msm/adreno/a4xx_gpu.c | 11 +++ 1 file changed, 11 insertions(+) diff --git a/drivers/gpu/drm/msm/adreno/a4xx_gpu.c b/drivers/gpu/drm/msm/adreno/a4xx_gpu.c index 3e09d3a7a0ac..715436cb3996 100644 --- a/drivers/gpu/drm/msm/adreno/a4xx_gpu.c +++ b/drivers/gpu/drm/msm/adreno/a4xx_gpu.c @@ -611,6 +611,16 @@ static int a4xx_get_timestamp(struct msm_gpu *gpu, uint64_t *value) return 0; } +static u64 a4xx_gpu_busy(struct msm_gpu *gpu, unsigned long *out_sample_rate) +{ + u64 busy_cycles; + + busy_cycles = gpu_read64(gpu, REG_A4XX_RBBM_PERFCTR_RBBM_1_LO); + *out_sample_rate = clk_get_rate(gpu->core_clk); + + return busy_cycles; +} + static u32 a4xx_get_rptr(struct msm_gpu *gpu, struct msm_ringbuffer *ring) { ring->memptrs->rptr = gpu_read(gpu, REG_A4XX_CP_RB_RPTR); @@ -632,6 +642,7 @@ static const struct adreno_gpu_funcs funcs = { #if defined(CONFIG_DEBUG_FS) || defined(CONFIG_DEV_COREDUMP) .show = adreno_show, #endif + .gpu_busy = a4xx_gpu_busy, .gpu_state_get = a4xx_gpu_state_get, .gpu_state_put = adreno_gpu_state_put, .create_address_space = adreno_create_address_space, -- 2.39.2
[PATCH 1/5] drm/msm/adreno: Use OPP for every GPU generation
Some older GPUs (namely a2xx with no opp tables at all and a320 with downstream-remnants gpu pwrlevels) used not to have OPP tables. They both however had just one frequency defined, making it extremely easy to construct such an OPP table from within the driver if need be. Do so and switch all clk_set_rate calls on core_clk to their OPP counterparts. Signed-off-by: Konrad Dybcio --- drivers/gpu/drm/msm/adreno/adreno_gpu.c | 94 +++-- drivers/gpu/drm/msm/msm_gpu.c | 4 +- drivers/gpu/drm/msm/msm_gpu_devfreq.c | 2 +- 3 files changed, 45 insertions(+), 55 deletions(-) diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c b/drivers/gpu/drm/msm/adreno/adreno_gpu.c index ce6b76c45b6f..9b940c0f063f 100644 --- a/drivers/gpu/drm/msm/adreno/adreno_gpu.c +++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.c @@ -922,73 +922,50 @@ void adreno_wait_ring(struct msm_ringbuffer *ring, uint32_t ndwords) ring->id); } -/* Get legacy powerlevels from qcom,gpu-pwrlevels and populate the opp table */ -static int adreno_get_legacy_pwrlevels(struct device *dev) -{ - struct device_node *child, *node; - int ret; - - node = of_get_compatible_child(dev->of_node, "qcom,gpu-pwrlevels"); - if (!node) { - DRM_DEV_DEBUG(dev, "Could not find the GPU powerlevels\n"); - return -ENXIO; - } - - for_each_child_of_node(node, child) { - unsigned int val; - - ret = of_property_read_u32(child, "qcom,gpu-freq", &val); - if (ret) - continue; - - /* - * Skip the intentionally bogus clock value found at the bottom - * of most legacy frequency tables - */ - if (val != 27000000) - dev_pm_opp_add(dev, val, 0); - } - - of_node_put(node); - - return 0; -} - -static void adreno_get_pwrlevels(struct device *dev, +static int adreno_get_pwrlevels(struct device *dev, struct msm_gpu *gpu) { + struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu); unsigned long freq = ULONG_MAX; struct dev_pm_opp *opp; int ret; gpu->fast_rate = 0; - /* You down with OPP? */ - if (!of_find_property(dev->of_node, "operating-points-v2", NULL)) - ret = adreno_get_legacy_pwrlevels(dev); - else { - ret = devm_pm_opp_of_add_table(dev); - if (ret) - DRM_DEV_ERROR(dev, "Unable to set the OPP table\n"); - } - - if (!ret) { + /* devm_pm_opp_of_add_table may error out but will still create an OPP table */ + ret = devm_pm_opp_of_add_table(dev); + if (ret == -ENODEV) { + /* Special cases for ancient hw with ancient DT bindings */ + if (adreno_is_a2xx(adreno_gpu)) { + dev_warn(dev, "Unable to find the OPP table. Falling back to 200 MHz.\n"); + dev_pm_opp_add(dev, 200000000, 0); + gpu->fast_rate = 200000000; + } else if (adreno_is_a320(adreno_gpu)) { + dev_warn(dev, "Unable to find the OPP table. Falling back to 450 MHz.\n"); + dev_pm_opp_add(dev, 450000000, 0); + gpu->fast_rate = 450000000; + } else { + DRM_DEV_ERROR(dev, "Unable to find the OPP table\n"); + return -ENODEV; + } + } else if (ret) { + DRM_DEV_ERROR(dev, "Unable to set the OPP table\n"); + return ret; + } else { /* Find the fastest defined rate */ opp = dev_pm_opp_find_freq_floor(dev, &freq); - if (!IS_ERR(opp)) { + + if (IS_ERR(opp)) + return PTR_ERR(opp); + else { gpu->fast_rate = freq; dev_pm_opp_put(opp); } } - if (!gpu->fast_rate) { - dev_warn(dev, - "Could not find a clock rate. Using a reasonable default\n"); - /* Pick a suitably safe clock speed for any target */ - gpu->fast_rate = 200000000; - } - DBG("fast_rate=%u, slow_rate=27000000", gpu->fast_rate); + + return 0; } int adreno_gpu_ocmem_init(struct device *dev, struct adreno_gpu *adreno_gpu, @@ -1046,6 +1023,17 @@ int adreno_gpu_init(struct drm_device *drm, struct platform_device *pdev, struct adreno_rev *rev = &config->rev; const char *gpu_name; u32 speedbin; + int ret; + + /* This can only be done here, or devm_pm_opp_set_supported_hw will WARN_ON() */ + if (IS_ERR(devm_clk_get(dev, "core"))) { + /* + * If "core" is absent, go for the legacy clock name. + * If we got this
[PATCH 0/5] OPP and devfreq for all Adrenos
This series is a combination of [1] and a subset of [2] and some new stuff. With it, devfreq is used on all a2xx-a6xx (including gmu and gmu-wrapper) and all clk_set_rate(core clock) calls are dropped in favour of dev_pm_opp_set_rate, which - drumroll - lets us scale the voltage domain. DT patches making use of that will be sent separately. On top of that, a5xx gets a call to enable icc scaling from the OPP tables. No SoCs implementing a2xx have icc support yet and a3/4xx SoCs have separate logic for that, which will be updated at a later time. Getting this in for 6.4 early would be appreciated, as that would allow for getting GMU wrapper GPUs up (without VDD&icc scaling they can only run at lower freqs, which is.. ehhh..) Changes: - a3xx busy: use the _1 counter as per msm-3.x instead of _0 - a6xx-series-opp: basically rewrite, ensure compat with all gens - a2/4xx busy: new patch - a5xx icc: new patch [1] https://lore.kernel.org/linux-arm-msm/20230130093809.2079314-1-konrad.dyb...@linaro.org/ [2] https://lore.kernel.org/linux-arm-msm/20230214173145.2482651-1-konrad.dyb...@linaro.org/ Signed-off-by: Konrad Dybcio --- Konrad Dybcio (5): drm/msm/adreno: Use OPP for every GPU generation drm/msm/a2xx: Implement .gpu_busy drm/msm/a3xx: Implement .gpu_busy drm/msm/a4xx: Implement .gpu_busy drm/msm/a5xx: Enable optional icc voting from OPP tables drivers/gpu/drm/msm/adreno/a2xx_gpu.c | 28 ++ drivers/gpu/drm/msm/adreno/a3xx_gpu.c | 11 drivers/gpu/drm/msm/adreno/a4xx_gpu.c | 11 drivers/gpu/drm/msm/adreno/a5xx_gpu.c | 4 ++ drivers/gpu/drm/msm/adreno/adreno_gpu.c | 94 +++-- drivers/gpu/drm/msm/msm_gpu.c | 4 +- drivers/gpu/drm/msm/msm_gpu_devfreq.c | 2 +- 7 files changed, 99 insertions(+), 55 deletions(-) --- base-commit: f4ed0868966d96203fee6f2782508746ded2ce3f change-id: 20230222-konrad-longbois-next-86d1a69532c2 Best regards, -- Konrad Dybcio
[PATCH] drm/i915/gsc: Fix the Driver-FLR completion
The Driver-FLR flow may inadvertently exit early before the full completion of the re-init of the internal HW state if we only poll GU_DEBUG Bit31 (polling for it to toggle from 0 -> 1). Instead we need a two-step wait-for-completion flow that also involves GU_CNTL. See the patch and new code comments for detail. This is a new direction from the HW architecture folks. v2: - Add error message for the teardown timeout (Anshuman) - Don't duplicate code in comments (Jani) Signed-off-by: Alan Previn Fixes: 5a44fcd73498 ("drm/i915/gsc: Do a driver-FLR on unload if GSC was loaded") --- drivers/gpu/drm/i915/intel_uncore.c | 13 ++++++++++++- 1 file changed, 12 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c index f018da7ebaac..f3c46352db89 100644 --- a/drivers/gpu/drm/i915/intel_uncore.c +++ b/drivers/gpu/drm/i915/intel_uncore.c @@ -2749,14 +2749,25 @@ static void driver_initiated_flr(struct intel_uncore *uncore) /* Trigger the actual Driver-FLR */ intel_uncore_rmw_fw(uncore, GU_CNTL, 0, DRIVERFLR); + /* Wait for hardware teardown to complete */ + ret = intel_wait_for_register_fw(uncore, GU_CNTL, + DRIVERFLR_STATUS, 0, + flr_timeout_ms); + if (ret) { + drm_err(&i915->drm, "Driver-FLR-teardown wait completion failed! %d\n", ret); + return; + } + + /* Wait for hardware/firmware re-init to complete */ ret = intel_wait_for_register_fw(uncore, GU_DEBUG, DRIVERFLR_STATUS, DRIVERFLR_STATUS, flr_timeout_ms); if (ret) { - drm_err(&i915->drm, "wait for Driver-FLR completion failed! %d\n", ret); + drm_err(&i915->drm, "Driver-FLR-reinit wait completion failed! %d\n", ret); return; } + /* Clear sticky completion status */ intel_uncore_write_fw(uncore, GU_DEBUG, DRIVERFLR_STATUS); } -- 2.39.0
Re: [Intel-gfx] [PATCH 2/2] Apply quirk to disable PSR 2 on Tongfang PHxTxX1 and PHxTQx1
On Wed, Feb 22, 2023 at 03:17:55PM +0100, Werner Sembach wrote: > On these Barebones PSR 2 is recognized as supported but is very buggy: > - Upper third of screen is sometimes not updated, resulting in > disappearing cursors or ghosts of already closed windows staying behind. > - Approximately 40 px from the bottom edge a 3 pixel wide strip of randomly > colored pixels is flickering. > > PSR 1 is working fine however. I wonder if this is really about the panel's PSR2 or about the userspace there not marking the dirtyfb? I know I know... it is not userspace's fault... But I wonder if the case you got here highlights the fact that we have a substantial bug in the i915 itself in regards to the PSR2 API. Jose, Jouni, ideas on how to check what could be happening here? oh, btw, Werner, do we have an open gitlab issue for this? Thanks, Rodrigo. > > Signed-off-by: Werner Sembach > Cc: > --- > drivers/gpu/drm/i915/display/intel_quirks.c | 8 ++++++++ > 1 file changed, 8 insertions(+) > > diff --git a/drivers/gpu/drm/i915/display/intel_quirks.c > b/drivers/gpu/drm/i915/display/intel_quirks.c > index ce6d0fe6448f5..eeb32d3189f5c 100644 > --- a/drivers/gpu/drm/i915/display/intel_quirks.c > +++ b/drivers/gpu/drm/i915/display/intel_quirks.c > @@ -65,6 +65,10 @@ static void quirk_no_pps_backlight_power_hook(struct > drm_i915_private *i915) > drm_info(&i915->drm, "Applying no pps backlight power quirk\n"); > } > > +/* > + * Tongfang PHxTxX1 and PHxTQx1 devices have support for PSR 2 but it is > broken > + * on Linux. PSR 1 however works just fine. > + */ > static void quirk_no_psr2(struct drm_i915_private *i915) > { > intel_set_quirk(i915, QUIRK_NO_PSR2); > @@ -205,6 +209,10 @@ static struct intel_quirk intel_quirks[] = { > /* ECS Liva Q2 */ > { 0x3185, 0x1019, 0xa94d, quirk_increase_ddi_disabled_time }, > { 0x3184, 0x1019, 0xa94d, quirk_increase_ddi_disabled_time }, > + > + /* Tongfang PHxTxX1 and PHxTQx1/TUXEDO InfinityBook 14 Gen6 */ > + { 0x9a49, 0x1d05, 0x1105, quirk_no_psr2 }, > + { 0x9a49, 0x1d05, 0x114c, quirk_no_psr2 }, > }; > > void intel_init_quirks(struct drm_i915_private *i915) > -- > 2.34.1 >
Re: [PATCH 3/4] drm/msm/adreno: drop redundant pm_runtime_disable()
On Tue, Feb 21, 2023 at 2:16 AM Johan Hovold wrote: > > Since commit 4b18299b3365 ("drm/msm/adreno: Defer enabling runpm until > hw_init()") runtime PM is no longer enabled at adreno_gpu_init(), which > means that there are no longer any bind() error paths for which > adreno_gpu_cleanup() is called with runtime PM enabled. > > As the runtime PM enable on first open() is balanced by the > pm_runtime_force_suspend() call at unbind(), adreno_gpu_cleanup() is now > always called with runtime PM disabled so that its pm_runtime_disable() > call can be removed. > > Signed-off-by: Johan Hovold > --- > drivers/gpu/drm/msm/adreno/adreno_gpu.c | 5 - > 1 file changed, 5 deletions(-) > > diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c > b/drivers/gpu/drm/msm/adreno/adreno_gpu.c > index ce6b76c45b6f..1101b8234b49 100644 > --- a/drivers/gpu/drm/msm/adreno/adreno_gpu.c > +++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.c > @@ -1082,15 +1082,10 @@ int adreno_gpu_init(struct drm_device *drm, struct > platform_device *pdev, > > void adreno_gpu_cleanup(struct adreno_gpu *adreno_gpu) > { > - struct msm_gpu *gpu = &adreno_gpu->base; > - struct msm_drm_private *priv = gpu->dev ? gpu->dev->dev_private : > NULL; > unsigned int i; > > for (i = 0; i < ARRAY_SIZE(adreno_gpu->info->fw); i++) > release_firmware(adreno_gpu->fw[i]); > > - if (priv && pm_runtime_enabled(&priv->gpu_pdev->dev)) > - pm_runtime_disable(&priv->gpu_pdev->dev); > - Maybe WARN_ON(priv && pm_runtime_enabled(&priv->gpu_pdev->dev))? BR, -R > msm_gpu_cleanup(&adreno_gpu->base); > } > -- > 2.39.2 >
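Rob's suggestion would keep the check as a sanity assertion instead of dropping it entirely, roughly (a sketch based on the quoted function, not a posted patch):

	void adreno_gpu_cleanup(struct adreno_gpu *adreno_gpu)
	{
		struct msm_gpu *gpu = &adreno_gpu->base;
		struct msm_drm_private *priv = gpu->dev ? gpu->dev->dev_private : NULL;
		unsigned int i;

		for (i = 0; i < ARRAY_SIZE(adreno_gpu->info->fw); i++)
			release_firmware(adreno_gpu->fw[i]);

		/* Runtime PM should already be disabled by unbind(); scream if not */
		WARN_ON(priv && pm_runtime_enabled(&priv->gpu_pdev->dev));

		msm_gpu_cleanup(&adreno_gpu->base);
	}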
Re: [PATCH v4 01/14] dma-buf/dma-fence: Add deadline awareness
On Wed, Feb 22, 2023 at 9:33 AM Tvrtko Ursulin wrote: > > > On 22/02/2023 17:16, Rob Clark wrote: > > On Wed, Feb 22, 2023 at 9:05 AM Tvrtko Ursulin > > wrote: > >> > >> > >> On 22/02/2023 15:28, Christian König wrote: > >>> Am 22.02.23 um 11:23 schrieb Tvrtko Ursulin: > > On 18/02/2023 21:15, Rob Clark wrote: > > From: Rob Clark > > > > Add a way to hint to the fence signaler of an upcoming deadline, such as > > vblank, which the fence waiter would prefer not to miss. This is to aid > > the fence signaler in making power management decisions, like boosting > > frequency as the deadline approaches and awareness of missing deadlines > > so that can be factored in to the frequency scaling. > > > > v2: Drop dma_fence::deadline and related logic to filter duplicate > > deadlines, to avoid increasing dma_fence size. The fence-context > > implementation will need similar logic to track deadlines of all > > the fences on the same timeline. [ckoenig] > > v3: Clarify locking wrt. set_deadline callback > > > > Signed-off-by: Rob Clark > > Reviewed-by: Christian König > > --- > >drivers/dma-buf/dma-fence.c | 20 > >include/linux/dma-fence.h | 20 > >2 files changed, 40 insertions(+) > > > > diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c > > index 0de0482cd36e..763b32627684 100644 > > --- a/drivers/dma-buf/dma-fence.c > > +++ b/drivers/dma-buf/dma-fence.c > > @@ -912,6 +912,26 @@ dma_fence_wait_any_timeout(struct dma_fence > > **fences, uint32_t count, > >} > >EXPORT_SYMBOL(dma_fence_wait_any_timeout); > >+ > > +/** > > + * dma_fence_set_deadline - set desired fence-wait deadline > > + * @fence:the fence that is to be waited on > > + * @deadline: the time by which the waiter hopes for the fence to be > > + *signaled > > + * > > + * Inform the fence signaler of an upcoming deadline, such as > > vblank, by > > + * which point the waiter would prefer the fence to be signaled by. > > This > > + * is intended to give feedback to the fence signaler to aid in power > > + * management decisions, such as boosting GPU frequency if a periodic > > + * vblank deadline is approaching. > > + */ > > +void dma_fence_set_deadline(struct dma_fence *fence, ktime_t deadline) > > +{ > > +if (fence->ops->set_deadline && !dma_fence_is_signaled(fence)) > > +fence->ops->set_deadline(fence, deadline); > > +} > > +EXPORT_SYMBOL(dma_fence_set_deadline); > > + > >/** > > * dma_fence_describe - Dump fence describtion into seq_file > > * @fence: the 6fence to describe > > diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h > > index 775cdc0b4f24..d77f6591c453 100644 > > --- a/include/linux/dma-fence.h > > +++ b/include/linux/dma-fence.h > > @@ -99,6 +99,7 @@ enum dma_fence_flag_bits { > >DMA_FENCE_FLAG_SIGNALED_BIT, > >DMA_FENCE_FLAG_TIMESTAMP_BIT, > >DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT, > > +DMA_FENCE_FLAG_HAS_DEADLINE_BIT, > > Would this bit be better left out from core implementation, given how > the approach is the component which implements dma-fence has to track > the actual deadline and all? > > Also taking a step back - are we all okay with starting to expand the > relatively simple core synchronisation primitive with side channel > data like this? What would be the criteria for what side channel data > would be acceptable? Taking note the thing lives outside drivers/gpu/. > >>> > >>> I had similar concerns and it took me a moment as well to understand the > >>> background why this is necessary. I essentially don't see much other > >>> approach we could do. 
> >>> > >>> Yes, this is GPU/CRTC specific, but we somehow need a common interface > >>> for communicating it between drivers and that's the dma_fence object as > >>> far as I can see. > >> > >> Yeah I also don't see any other easy options. Just wanted to raise this > >> as something which probably needs some wider acks. > >> > >> Also what about the "low level" part of my question about the reason, or > >> benefits, of defining the deadline bit in the common layer? > > > > We could leave DMA_FENCE_FLAG_HAS_DEADLINE_BIT out, but OTOH managing > > a bitmask that is partially defined in core enum and partially in > > backend-driver has it's own drawbacks, and it isn't like we are > > running out of bits.. :shrug: > > There is DMA_FENCE_FLAG_USER_BITS onwards which implementations could > use to store their stuff? > > And if we skip forward to "drm/scheduler: Add fence deadline support" > that's the only place bit is used, right? Would it simply work to look > at drm_sched_fence->deadline == 0 as b
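For a sense of what a fence provider does with the hook being discussed: it might track only the earliest requested deadline and kick its frequency governor when the deadline tightens. A minimal sketch (struct example_fence and its fields are hypothetical; only the set_deadline hook signature comes from the patch under review):

	static void example_fence_set_deadline(struct dma_fence *f, ktime_t deadline)
	{
		struct example_fence *ef = container_of(f, struct example_fence, base);
		unsigned long flags;

		spin_lock_irqsave(f->lock, flags);
		/* Only ever tighten the deadline, never relax it */
		if (!ef->has_deadline || ktime_before(deadline, ef->deadline)) {
			ef->deadline = deadline;
			ef->has_deadline = true;
			/* e.g. queue work here to bump the GPU clocks */
		}
		spin_unlock_irqrestore(f->lock, flags);
	}

Whether the "seen a deadline" state lives in a core flag bit, in DMA_FENCE_FLAG_USER_BITS, or in a driver-private field like the one above is exactly the open question in this subthread.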
Re: [PATCH v2 1/2] drm: Introduce plane SIZE_HINTS property
On Tue, Feb 14, 2023 at 01:19:51PM +0200, Ville Syrjälä wrote: > On Tue, Feb 14, 2023 at 12:01:49PM +0100, Jonas Ådahl wrote: > > On Tue, Feb 14, 2023 at 12:28:44PM +0200, Ville Syrjälä wrote: > > > On Tue, Feb 14, 2023 at 10:54:27AM +0100, Jonas Ådahl wrote: > > > > On Tue, Feb 14, 2023 at 11:25:56AM +0200, Ville Syrjälä wrote: > > > > > On Thu, Feb 09, 2023 at 03:16:23PM +0100, Jonas Ådahl wrote: > > > > > > On Wed, Feb 08, 2023 at 11:10:16PM +0200, Ville Syrjala wrote: > > > > > > > From: Ville Syrjälä > > > > > > > > > > > > > > Add a new immutable plane property by which a plane can advertise > > > > > > > a handful of recommended plane sizes. This would be mostly exposed > > > > > > > by cursor planes as a slightly more capable replacement for > > > > > > > the DRM_CAP_CURSOR_WIDTH/HEIGHT caps, which can only declare > > > > > > > a one size fits all limit for the whole device. > > > > > > > > > > > > > > Currently eg. amdgpu/i915/nouveau just advertise the max cursor > > > > > > > size via the cursor size caps. But always using the max sized > > > > > > > cursor can waste a surprising amount of power, so a better > > > > > > > strategy is desirable. > > > > > > > > > > > > > > Most other drivers don't specify any cursor size at all, in > > > > > > > which case the ioctl code just claims that 64x64 is a great > > > > > > > choice. Whether that is actually true is debatable. > > > > > > > > > > > > > > A poll of various compositor developers informs us that > > > > > > > blindly probing with setcursor/atomic ioctl to determine > > > > > > > suitable cursor sizes is not acceptable, thus the > > > > > > > introduction of the new property to supplant the cursor > > > > > > > size caps. The compositor will now be free to select a > > > > > > > more optimal cursor size from the short list of options. > > > > > > > > > > > > > > Note that the reported sizes (either via the property or the > > > > > > > caps) make no claims about things such as plane scaling. So > > > > > > > these things should only really be consulted for simple > > > > > > > "cursor like" use cases. 
> > > > > > > > > > > > > > v2: Try to add some docs > > > > > > > > > > > > > > Cc: Simon Ser > > > > > > > Cc: Jonas Ådahl > > > > > > > Cc: Daniel Stone > > > > > > > Cc: Pekka Paalanen > > > > > > > Acked-by: Harry Wentland > > > > > > > Signed-off-by: Ville Syrjälä > > > > > > > --- > > > > > > > drivers/gpu/drm/drm_mode_config.c | 7 + > > > > > > > drivers/gpu/drm/drm_plane.c | 48 > > > > > > > +++ > > > > > > > include/drm/drm_mode_config.h | 5 > > > > > > > include/drm/drm_plane.h | 4 +++ > > > > > > > include/uapi/drm/drm_mode.h | 11 +++ > > > > > > > 5 files changed, 75 insertions(+) > > > > > > > > > > > > > > diff --git a/drivers/gpu/drm/drm_mode_config.c > > > > > > > b/drivers/gpu/drm/drm_mode_config.c > > > > > > > index 87eb591fe9b5..21860f94a18c 100644 > > > > > > > --- a/drivers/gpu/drm/drm_mode_config.c > > > > > > > +++ b/drivers/gpu/drm/drm_mode_config.c > > > > > > > @@ -374,6 +374,13 @@ static int > > > > > > > drm_mode_create_standard_properties(struct drm_device *dev) > > > > > > > return -ENOMEM; > > > > > > > dev->mode_config.modifiers_property = prop; > > > > > > > > > > > > > > + prop = drm_property_create(dev, > > > > > > > +DRM_MODE_PROP_IMMUTABLE | > > > > > > > DRM_MODE_PROP_BLOB, > > > > > > > +"SIZE_HINTS", 0); > > > > > > > + if (!prop) > > > > > > > + return -ENOMEM; > > > > > > > + dev->mode_config.size_hints_property = prop; > > > > > > > + > > > > > > > return 0; > > > > > > > } > > > > > > > > > > > > > > diff --git a/drivers/gpu/drm/drm_plane.c > > > > > > > b/drivers/gpu/drm/drm_plane.c > > > > > > > index 24e7998d1731..ae51b1f83755 100644 > > > > > > > --- a/drivers/gpu/drm/drm_plane.c > > > > > > > +++ b/drivers/gpu/drm/drm_plane.c > > > > > > > @@ -140,6 +140,21 @@ > > > > > > > * DRM_FORMAT_MOD_LINEAR. Before linux kernel release v5.1 > > > > > > > there have been > > > > > > > * various bugs in this area with inconsistencies between > > > > > > > the capability > > > > > > > * flag and per-plane properties. > > > > > > > + * > > > > > > > + * SIZE_HINTS: > > > > > > > + * Blob property which contains the set of recommended plane > > > > > > > size > > > > > > > + * which can used for simple "cursor like" use cases (eg. no > > > > > > > scaling). > > > > > > > + * Using these hints frees userspace from extensive probing > > > > > > > of > > > > > > > + * supported plane sizes through atomic/setcursor ioctls. > > > > > > > + * > > > > > > > + * For optimal usage userspace should pick the smallest size > > > > > > > + * that satisfies its own requirements. > > > > > > > + * > > > > > > > + * The blob contains an array of struct drm_plane_size_hint. > > > > > > > + * > > > >
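On the consumer side, a compositor reading the blob would pick the smallest advertised hint that fits its cursor image, something along these lines (a userspace sketch; the two __u16-style fields of struct drm_plane_size_hint are an assumption based on the quoted diffstat, and the helper around them is hypothetical):

	#include <stdint.h>
	#include <stddef.h>

	struct drm_plane_size_hint {
		uint16_t width;
		uint16_t height;
	};

	/* hints: blob data; count: blob length / sizeof(*hints) */
	static int pick_cursor_size(const struct drm_plane_size_hint *hints,
				    size_t count, uint16_t need_w, uint16_t need_h,
				    uint16_t *out_w, uint16_t *out_h)
	{
		size_t i, best = count;

		for (i = 0; i < count; i++) {
			if (hints[i].width < need_w || hints[i].height < need_h)
				continue;
			/* No ordering is guaranteed, so scan for the smallest fit */
			if (best == count ||
			    (uint32_t)hints[i].width * hints[i].height <
			    (uint32_t)hints[best].width * hints[best].height)
				best = i;
		}
		if (best == count)
			return -1; /* fall back to DRM_CAP_CURSOR_WIDTH/HEIGHT */
		*out_w = hints[best].width;
		*out_h = hints[best].height;
		return 0;
	}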
Re: [PATCH 1/2] Mips: ls2k1000: dts: add the display controller device node
On 22/02/2023 17:55, suijingfeng wrote: > The display controller is a pci device, its pci vendor id is > 0x0014, its pci device id is 0x7a06. > > Signed-off-by: suijingfeng > --- > .../boot/dts/loongson/loongson64-2k1000.dtsi | 21 +++ > 1 file changed, 21 insertions(+) > > diff --git a/arch/mips/boot/dts/loongson/loongson64-2k1000.dtsi > b/arch/mips/boot/dts/loongson/loongson64-2k1000.dtsi > index 8143a6e3..a528af3977d9 100644 > --- a/arch/mips/boot/dts/loongson/loongson64-2k1000.dtsi > +++ b/arch/mips/boot/dts/loongson/loongson64-2k1000.dtsi > @@ -31,6 +31,18 @@ memory@200000 { > <0x00000001 0x10000000 0x00000001 0xb0000000>; /* 6912 > MB at 4352MB */ > }; > > + reserved-memory { > + #address-cells = <2>; > + #size-cells = <2>; > + ranges; > + > + display_reserved: framebuffer@30000000 { > + compatible = "shared-dma-pool"; > + reg = <0x0 0x30000000 0x0 0x04000000>; /* 64M */ > + linux,cma-default; > + }; > + }; > + > cpu_clk: cpu_clk { > #clock-cells = <0>; > compatible = "fixed-clock"; > @@ -198,6 +210,15 @@ sata@8,0 { > interrupt-parent = <&liointc0>; > }; > > + display-controller@6,0 { > + compatible = "loongson,ls2k1000-dc"; > + > + reg = <0x3000 0x0 0x0 0x0 0x0>; > + interrupts = <28 IRQ_TYPE_LEVEL_LOW>; > + interrupt-parent = <&liointc0>; > + memory-region = <&display_reserved>; NAK. Test your code against the bindings you send. It's the same patchset. You basically send something which at the same moment is incorrect. Best regards, Krzysztof
Re: [PATCH 2/2] dt-bindings: display: Add Loongson display controller
On 22/02/2023 17:55, suijingfeng wrote: > This patch adds a trivial DT usage for the Loongson display controller found > in the LS2K1000 SoC. Trivial yet so many things to improve... if you only started from a recent kernel tree (since you Cced the wrong address, I doubt you did) and bindings you would avoid half of these comments. > > Signed-off-by: suijingfeng > --- > .../loongson/loongson,display-controller.yaml | 58 +++ > 1 file changed, 58 insertions(+) > create mode 100644 > Documentation/devicetree/bindings/display/loongson/loongson,display-controller.yaml > > diff --git > a/Documentation/devicetree/bindings/display/loongson/loongson,display-controller.yaml > > b/Documentation/devicetree/bindings/display/loongson/loongson,display-controller.yaml > new file mode 100644 > index ..98b78f449a80 > --- /dev/null > +++ > b/Documentation/devicetree/bindings/display/loongson/loongson,display-controller.yaml Filename based on compatible, so "loongson,ls2k1000-dc.yaml" > @@ -0,0 +1,58 @@ > +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) > +%YAML 1.2 > +--- > +$id: > http://devicetree.org/schemas/display/loongson/loongson,display-controller.yaml# > +$schema: http://devicetree.org/meta-schemas/core.yaml# > + > +title: Loongson Display Controller Device Tree Bindings Drop "Device Tree Bindings" > + > +maintainers: > + - Sui Jingfeng > + > +description: |+ Drop |+ > + No need for blank line. Do you see it anywhere else in the bindings? > + The display controller is a PCI device, it has two display pipes. > + For the DC in LS2K1000 each way has a DVO output interface which > + provides RGB888 signals, vertical & horizontal synchronisations > + and the pixel clock. Each CRTC is able to support 1920x1080@60Hz, > + the maximum resolution is 2048x2048 according to the hardware spec. > + > +properties: > + $nodename: > +pattern: "^display-controller@[0-9a-f],[0-9a-f]$" Drop nodename. > + > + compatible: > +oneOf: Drop oneOf > + - items: and items... > + - enum: > + - loongson,ls2k1000-dc > + reg: > +maxItems: 1 > + > + interrupts: > +maxItems: 1 > + > +required: > + - compatible > + - reg > + - interrupts > + > +additionalProperties: false > + > +examples: > + - | > +#include > +bus { > + Drop blank line. > +#address-cells = <3>; > +#size-cells = <2>; > +#interrupt-cells = <2>; Why do you need interrupt-cells? > + > +display-controller@6,0 { > +compatible = "loongson,ls2k1000-dc"; > +reg = <0x3000 0x0 0x0 0x0 0x0>; > +interrupts = <28 IRQ_TYPE_LEVEL_LOW>; > +}; > +}; > + > +... Best regards, Krzysztof
Re: [PATCH] drm/msm: return early when allocating fbdev fails
On Wed, Feb 22, 2023 at 05:09:40PM +0100, Thomas Zimmermann wrote: > Hi > > Am 22.02.23 um 16:56 schrieb Tom Rix: > > building with clang and W=1 reports > > drivers/gpu/drm/msm/msm_fbdev.c:144:6: error: variable 'helper' is used > >uninitialized whenever 'if' condition is true > > [-Werror,-Wsometimes-uninitialized] > >if (!fbdev) > >^~ > > > > helper is only initialized after fbdev succeeds, so is in a garbage state at > > the fail: label. There is nothing to unwinded if fbdev alloaction fails, > > return NULL. > > > > Fixes: 3fb1f62f80a1 ("drm/fb-helper: Remove drm_fb_helper_unprepare() from > > drm_fb_helper_fini()") > > Signed-off-by: Tom Rix > > Already fixed here: > https://lore.kernel.org/dri-devel/08e3340e-b459-0e60-4bba-30716b675...@suse.de/T/#t There is also: ../drivers/gpu/drm/omapdrm/omap_fbdev.c:235:6: error: variable 'helper' is used uninitialized whenever 'if' condition is true [-Werror,-Wsometimes-uninitialized] if (!fbdev) ^~ ../drivers/gpu/drm/omapdrm/omap_fbdev.c:259:26: note: uninitialized use occurs here drm_fb_helper_unprepare(helper); ^~ ../drivers/gpu/drm/omapdrm/omap_fbdev.c:235:2: note: remove the 'if' if its condition is always false if (!fbdev) ^~~ ../drivers/gpu/drm/omapdrm/omap_fbdev.c:228:30: note: initialize the variable 'helper' to silence this warning struct drm_fb_helper *helper; ^ = NULL 1 error generated. Is the fix the same as the one you have linked above? Cheers, Nathan
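For what it's worth, the shape of the fix both reports converge on reduces to the following sketch. example_fbdev and example_fbdev_setup are hypothetical names; only drm_fb_helper_unprepare() is the real helper from the Fixes commit. The point is that the unwind label must only be reachable once helper has been initialized:

#include <linux/slab.h>
#include <drm/drm_fb_helper.h>

struct example_fbdev {
	struct drm_fb_helper base;
};

/* Hypothetical stand-in for whatever setup may fail after allocation. */
static int example_fbdev_setup(struct drm_fb_helper *helper)
{
	return -ENODEV;
}

static struct drm_fb_helper *example_fbdev_init(void)
{
	struct example_fbdev *fbdev;
	struct drm_fb_helper *helper;

	fbdev = kzalloc(sizeof(*fbdev), GFP_KERNEL);
	if (!fbdev)
		return NULL;	/* nothing allocated yet, nothing to unwind */

	helper = &fbdev->base;
	if (example_fbdev_setup(helper))
		goto fail;

	return helper;

fail:
	drm_fb_helper_unprepare(helper);	/* helper is initialized here */
	kfree(fbdev);
	return NULL;
}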
Re: [PATCH drm-next v2 05/16] drm: manager to keep track of GPUs VA mappings
On 2/21/23 19:20, Liam R. Howlett wrote: * Danilo Krummrich [230217 08:45]: Add infrastructure to keep track of GPU virtual address (VA) mappings with a decicated VA space manager implementation. New UAPIs, motivated by Vulkan sparse memory bindings graphics drivers start implementing, allow userspace applications to request multiple and arbitrary GPU VA mappings of buffer objects. The DRM GPU VA manager is intended to serve the following purposes in this context. 1) Provide infrastructure to track GPU VA allocations and mappings, making use of the maple_tree. 2) Generically connect GPU VA mappings to their backing buffers, in particular DRM GEM objects. 3) Provide a common implementation to perform more complex mapping operations on the GPU VA space. In particular splitting and merging of GPU VA mappings, e.g. for intersecting mapping requests or partial unmap requests. Suggested-by: Dave Airlie Signed-off-by: Danilo Krummrich --- Documentation/gpu/drm-mm.rst| 31 + drivers/gpu/drm/Makefile|1 + drivers/gpu/drm/drm_gem.c |3 + drivers/gpu/drm/drm_gpuva_mgr.c | 1704 +++ include/drm/drm_drv.h |6 + include/drm/drm_gem.h | 75 ++ include/drm/drm_gpuva_mgr.h | 714 + 7 files changed, 2534 insertions(+) create mode 100644 drivers/gpu/drm/drm_gpuva_mgr.c create mode 100644 include/drm/drm_gpuva_mgr.h diff --git a/Documentation/gpu/drm-mm.rst b/Documentation/gpu/drm-mm.rst index a52e6f4117d6..c9f120cfe730 100644 --- a/Documentation/gpu/drm-mm.rst +++ b/Documentation/gpu/drm-mm.rst @@ -466,6 +466,37 @@ DRM MM Range Allocator Function References .. kernel-doc:: drivers/gpu/drm/drm_mm.c :export: ... + +/** + * drm_gpuva_remove_iter - removes the iterators current element + * @it: the &drm_gpuva_iterator + * + * This removes the element the iterator currently points to. + */ +void +drm_gpuva_iter_remove(struct drm_gpuva_iterator *it) +{ + mas_erase(&it->mas); +} +EXPORT_SYMBOL(drm_gpuva_iter_remove); + +/** + * drm_gpuva_insert - insert a &drm_gpuva + * @mgr: the &drm_gpuva_manager to insert the &drm_gpuva in + * @va: the &drm_gpuva to insert + * @addr: the start address of the GPU VA + * @range: the range of the GPU VA + * + * Insert a &drm_gpuva with a given address and range into a + * &drm_gpuva_manager. + * + * Returns: 0 on success, negative error code on failure. + */ +int +drm_gpuva_insert(struct drm_gpuva_manager *mgr, +struct drm_gpuva *va) +{ + u64 addr = va->va.addr; + u64 range = va->va.range; + MA_STATE(mas, &mgr->va_mt, addr, addr + range - 1); + struct drm_gpuva_region *reg = NULL; + int ret; + + if (unlikely(!drm_gpuva_in_mm_range(mgr, addr, range))) + return -EINVAL; + + if (unlikely(drm_gpuva_in_kernel_region(mgr, addr, range))) + return -EINVAL; + + if (mgr->flags & DRM_GPUVA_MANAGER_REGIONS) { + reg = drm_gpuva_in_region(mgr, addr, range); + if (unlikely(!reg)) + return -EINVAL; + } + - + if (unlikely(drm_gpuva_find_first(mgr, addr, range))) + return -EEXIST; + + ret = mas_store_gfp(&mas, va, GFP_KERNEL); mas_walk() will set the internal maple state to the limits to what it finds. 
So, instead of an iterator, you can use the walk function and ensure there is a large enough area in the existing NULL:

	/*
	 * Nothing at addr, mas now points to the location where the store
	 * would happen
	 */
	if (mas_walk(&mas))
		return -EEXIST;

	/* The NULL entry ends at mas.last, make sure there is room */
	if (mas.last < (addr + range - 1))
		return -EEXIST;

	/* Limit the store size to the correct end address, and store */
	mas.last = addr + range - 1;
	ret = mas_store_gfp(&mas, va, GFP_KERNEL);

Would this variant be significantly more efficient? Also, would this also work while already walking the tree? To remove an entry while walking the tree I have a separate function drm_gpuva_iter_remove(). Would I need something similar for inserting entries? I already provided this example in a separate mail thread, but it may make sense to move this to the mailing list: In __drm_gpuva_sm_map() we're iterating a given range of the tree, where the given range is the size of the newly requested mapping. __drm_gpuva_sm_map() invokes a callback for each sub-operation that needs to be taken in order to fulfill this mapping request. In most cases such a callback just creates a drm_gpuva_op object and stores it in a list. However, drivers can also implement the callback, such that they directly execute this operation within the callback. Let's have a look at the following example:

	      0     a     2
	old:  |-----------|       (bo_offset=n)

	            1     b     3
	req:        |-----------| (bo_offset=m)

	      0  a' 1     b     3
	new:  |-----|-----------|
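As an aside for readers, the remove-while-iterating pattern that drm_gpuva_iter_remove() wraps looks roughly like this when spelled out against the advanced API. This is a sketch: should_drop() is a placeholder predicate, and the tree's internal lock is taken here purely for illustration:

#include <linux/maple_tree.h>

static bool should_drop(void *entry)
{
	return false;	/* placeholder predicate */
}

static void drop_matching(struct maple_tree *mt)
{
	MA_STATE(mas, mt, 0, 0);
	void *entry;

	mas_lock(&mas);
	mas_for_each(&mas, entry, ULONG_MAX) {
		if (should_drop(entry))
			mas_erase(&mas);	/* erase what the state points at */
	}
	mas_unlock(&mas);
}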
[ANNOUNCE] xf86-video-amdgpu 23.0.0
Alan Coopersmith (2): Update URLs to reflect gitlab migration gitlab CI: enable gitlab's builtin static analysis Kai-Heng Feng (1): Initialize present extension for GPU screen Lukasz Spintzyk (3): Use randr_crtc_covering_drawable used in modesetting Prefer crtc of primary output for synchronization when screen has to crtcs with the same coverage Do not consider disabled crtc anymore when looking for xf86crtc covering drawable. Mario Kleiner (1): Fix primary output handling in amdgpu_crtc_covering_box(). Pierre-Eric Pelloux-Prayer (1): Use DRM_CAP_CURSOR_WIDTH/HEIGHT if possible Shashank Sharma (2): config: Add hotplug driver name Bump version for the 23.0.0 release tiancyin (1): Fix screen corruption on secondary GPU Łukasz Spintzyk (1): amdgpu: fixup driver for new X server ABI git tag: xf86-video-amdgpu-23.0.0 https://xorg.freedesktop.org/archive/individual/driver/xf86-video-amdgpu-23.0.0.tar.gz SHA256: 08c38287d39b999fd61ecb6e7b23d5079762e2b4b2179b3567973ed9aaf71222 xf86-video-amdgpu-23.0.0.tar.gz SHA512: ea4843bf10e12ede71a338f84bc667fe80b75de88e677161811a3de4647382bbfe44b720439e2ea7768961b8a7f9cc31362da454038921ec2c577f4955476eec xf86-video-amdgpu-23.0.0.tar.gz PGP: https://xorg.freedesktop.org/archive/individual/driver/xf86-video-amdgpu-23.0.0.tar.gz.sig https://xorg.freedesktop.org/archive/individual/driver/xf86-video-amdgpu-23.0.0.tar.xz SHA256: 4f04f0ea66f3ced0dcc58f617409860163a19c4e8c285cfb5285f36ba09cc061 xf86-video-amdgpu-23.0.0.tar.xz SHA512: bf26f147629a34e84a0ae8435119e170b9c95edafcab1995b63bb8f55abef32f2efbf4536eb070e64b2ae1460424b1b27a4206cb9836d33ddc6dfbee404f718b xf86-video-amdgpu-23.0.0.tar.xz PGP: https://xorg.freedesktop.org/archive/individual/driver/xf86-video-amdgpu-23.0.0.tar.xz.sig
Re: [PATCH 1/2] drm/msm: drop unused ring variable in msm_ioctl_gem_submit()
On Fri, Feb 17, 2023 at 5:25 PM Dmitry Baryshkov wrote: > > The variable ring is not used by msm_parse_deps() and > msm_ioctl_gem_submit() and thus can be dropped. > > Signed-off-by: Dmitry Baryshkov > --- > drivers/gpu/drm/msm/msm_gem_submit.c | 10 +++--- > drivers/gpu/drm/msm/msm_gpu_trace.h | 10 -- > 2 files changed, 7 insertions(+), 13 deletions(-) > > diff --git a/drivers/gpu/drm/msm/msm_gem_submit.c > b/drivers/gpu/drm/msm/msm_gem_submit.c > index ac8ed731f76d..a539eb31042f 100644 > --- a/drivers/gpu/drm/msm/msm_gem_submit.c > +++ b/drivers/gpu/drm/msm/msm_gem_submit.c > @@ -550,8 +550,7 @@ static struct drm_syncobj **msm_parse_deps(struct > msm_gem_submit *submit, > struct drm_file *file, > uint64_t in_syncobjs_addr, > uint32_t nr_in_syncobjs, > - size_t syncobj_stride, > - struct msm_ringbuffer *ring) > + size_t syncobj_stride) > { > struct drm_syncobj **syncobjs = NULL; > struct drm_msm_gem_submit_syncobj syncobj_desc = {0}; > @@ -722,7 +721,6 @@ int msm_ioctl_gem_submit(struct drm_device *dev, void > *data, > struct msm_gem_submit *submit; > struct msm_gpu *gpu = priv->gpu; > struct msm_gpu_submitqueue *queue; > - struct msm_ringbuffer *ring; > struct msm_submit_post_dep *post_deps = NULL; > struct drm_syncobj **syncobjs_to_reset = NULL; > int out_fence_fd = -1; > @@ -760,8 +758,6 @@ int msm_ioctl_gem_submit(struct drm_device *dev, void > *data, > if (!queue) > return -ENOENT; > > - ring = gpu->rb[queue->ring_nr]; > - > if (args->flags & MSM_SUBMIT_FENCE_FD_OUT) { > out_fence_fd = get_unused_fd_flags(O_CLOEXEC); > if (out_fence_fd < 0) { > @@ -774,7 +770,7 @@ int msm_ioctl_gem_submit(struct drm_device *dev, void > *data, > if (IS_ERR(submit)) > return PTR_ERR(submit); > > - trace_msm_gpu_submit(pid_nr(submit->pid), ring->id, submit->ident, > + trace_msm_gpu_submit(pid_nr(submit->pid), submit->ident, > args->nr_bos, args->nr_cmds); Please don't remove things from the tracepoint, we have userspace tools that use the tracepoints.. Other parts look ok. BR, -R > > ret = mutex_lock_interruptible(&queue->lock); > @@ -803,7 +799,7 @@ int msm_ioctl_gem_submit(struct drm_device *dev, void > *data, > syncobjs_to_reset = msm_parse_deps(submit, file, >args->in_syncobjs, >args->nr_in_syncobjs, > - args->syncobj_stride, > ring); > + args->syncobj_stride); > if (IS_ERR(syncobjs_to_reset)) { > ret = PTR_ERR(syncobjs_to_reset); > goto out_unlock; > diff --git a/drivers/gpu/drm/msm/msm_gpu_trace.h > b/drivers/gpu/drm/msm/msm_gpu_trace.h > index ac40d857bc45..12ef10f1de4c 100644 > --- a/drivers/gpu/drm/msm/msm_gpu_trace.h > +++ b/drivers/gpu/drm/msm/msm_gpu_trace.h > @@ -9,24 +9,22 @@ > #define TRACE_INCLUDE_FILE msm_gpu_trace > > TRACE_EVENT(msm_gpu_submit, > - TP_PROTO(pid_t pid, u32 ringid, u32 id, u32 nr_bos, u32 nr_cmds), > - TP_ARGS(pid, ringid, id, nr_bos, nr_cmds), > + TP_PROTO(pid_t pid, u32 id, u32 nr_bos, u32 nr_cmds), > + TP_ARGS(pid, id, nr_bos, nr_cmds), > TP_STRUCT__entry( > __field(pid_t, pid) > __field(u32, id) > - __field(u32, ringid) > __field(u32, nr_cmds) > __field(u32, nr_bos) > ), > TP_fast_assign( > __entry->pid = pid; > __entry->id = id; > - __entry->ringid = ringid; > __entry->nr_bos = nr_bos; > __entry->nr_cmds = nr_cmds > ), > - TP_printk("id=%d pid=%d ring=%d bos=%d cmds=%d", > - __entry->id, __entry->pid, __entry->ringid, > + TP_printk("id=%d pid=%d bos=%d cmds=%d", > + __entry->id, __entry->pid, > __entry->nr_bos, __entry->nr_cmds) > ); > > -- > 2.39.1 >
Re: [PATCH v4 01/14] dma-buf/dma-fence: Add deadline awareness
On 22/02/2023 17:16, Rob Clark wrote: On Wed, Feb 22, 2023 at 9:05 AM Tvrtko Ursulin wrote: On 22/02/2023 15:28, Christian König wrote: Am 22.02.23 um 11:23 schrieb Tvrtko Ursulin: On 18/02/2023 21:15, Rob Clark wrote: From: Rob Clark Add a way to hint to the fence signaler of an upcoming deadline, such as vblank, which the fence waiter would prefer not to miss. This is to aid the fence signaler in making power management decisions, like boosting frequency as the deadline approaches and awareness of missing deadlines so that can be factored in to the frequency scaling. v2: Drop dma_fence::deadline and related logic to filter duplicate deadlines, to avoid increasing dma_fence size. The fence-context implementation will need similar logic to track deadlines of all the fences on the same timeline. [ckoenig] v3: Clarify locking wrt. set_deadline callback Signed-off-by: Rob Clark Reviewed-by: Christian König --- drivers/dma-buf/dma-fence.c | 20 include/linux/dma-fence.h | 20 2 files changed, 40 insertions(+) diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c index 0de0482cd36e..763b32627684 100644 --- a/drivers/dma-buf/dma-fence.c +++ b/drivers/dma-buf/dma-fence.c @@ -912,6 +912,26 @@ dma_fence_wait_any_timeout(struct dma_fence **fences, uint32_t count, } EXPORT_SYMBOL(dma_fence_wait_any_timeout); + +/** + * dma_fence_set_deadline - set desired fence-wait deadline + * @fence:the fence that is to be waited on + * @deadline: the time by which the waiter hopes for the fence to be + *signaled + * + * Inform the fence signaler of an upcoming deadline, such as vblank, by + * which point the waiter would prefer the fence to be signaled by. This + * is intended to give feedback to the fence signaler to aid in power + * management decisions, such as boosting GPU frequency if a periodic + * vblank deadline is approaching. + */ +void dma_fence_set_deadline(struct dma_fence *fence, ktime_t deadline) +{ +if (fence->ops->set_deadline && !dma_fence_is_signaled(fence)) +fence->ops->set_deadline(fence, deadline); +} +EXPORT_SYMBOL(dma_fence_set_deadline); + /** * dma_fence_describe - Dump fence describtion into seq_file * @fence: the 6fence to describe diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h index 775cdc0b4f24..d77f6591c453 100644 --- a/include/linux/dma-fence.h +++ b/include/linux/dma-fence.h @@ -99,6 +99,7 @@ enum dma_fence_flag_bits { DMA_FENCE_FLAG_SIGNALED_BIT, DMA_FENCE_FLAG_TIMESTAMP_BIT, DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT, +DMA_FENCE_FLAG_HAS_DEADLINE_BIT, Would this bit be better left out from core implementation, given how the approach is the component which implements dma-fence has to track the actual deadline and all? Also taking a step back - are we all okay with starting to expand the relatively simple core synchronisation primitive with side channel data like this? What would be the criteria for what side channel data would be acceptable? Taking note the thing lives outside drivers/gpu/. I had similar concerns and it took me a moment as well to understand the background why this is necessary. I essentially don't see much other approach we could do. Yes, this is GPU/CRTC specific, but we somehow need a common interface for communicating it between drivers and that's the dma_fence object as far as I can see. Yeah I also don't see any other easy options. Just wanted to raise this as something which probably needs some wider acks. 
Also what about the "low level" part of my question about the reason, or benefits, of defining the deadline bit in the common layer? We could leave DMA_FENCE_FLAG_HAS_DEADLINE_BIT out, but OTOH managing a bitmask that is partially defined in core enum and partially in backend-driver has it's own drawbacks, and it isn't like we are running out of bits.. :shrug: There is DMA_FENCE_FLAG_USER_BITS onwards which implementations could use to store their stuff? And if we skip forward to "drm/scheduler: Add fence deadline support" that's the only place bit is used, right? Would it simply work to look at drm_sched_fence->deadline == 0 as bit not set? Or you see a need to interoperate with other fence implementations via that bit somehow? Regards, Tvrtko
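A rough sketch of the driver-private variant suggested above, building on DMA_FENCE_FLAG_USER_BITS. Here my_fence, its deadline field and the earliest-deadline policy are illustrative assumptions, not code from the series:

#include <linux/dma-fence.h>
#include <linux/ktime.h>

#define MY_FENCE_FLAG_HAS_DEADLINE_BIT	DMA_FENCE_FLAG_USER_BITS

struct my_fence {
	struct dma_fence base;
	ktime_t deadline;
};

static void my_fence_set_deadline(struct dma_fence *f, ktime_t deadline)
{
	struct my_fence *mf = container_of(f, struct my_fence, base);
	unsigned long flags;

	spin_lock_irqsave(f->lock, flags);
	/* Keep only the earliest deadline seen so far. */
	if (!test_bit(MY_FENCE_FLAG_HAS_DEADLINE_BIT, &f->flags) ||
	    ktime_before(deadline, mf->deadline)) {
		mf->deadline = deadline;
		set_bit(MY_FENCE_FLAG_HAS_DEADLINE_BIT, &f->flags);
	}
	spin_unlock_irqrestore(f->lock, flags);
}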
Re: [PATCH] dt-bindings: display: Start the info graphics with HS/VS change
Hi Marek. On Tue, Feb 21, 2023 at 09:04:07PM +0100, Marek Vasut wrote: > The VS signal change is synchronized to HS signal change, start the > info graphics with that event, instead of having that event occur in > the middle of it. > > Scope trace of DPI bus with HS/VS active HIGH looks as follows: > ...__ > VS...___/__ __\__... > HS...___/ \___/ \__...__/ \___... > ^^ > || > |Used to start here -' > | > '--- Start info graphics here > > Signed-off-by: Marek Vasut I recall being annoyed about this before. Reviewed-by: Sam Ravnborg > --- > Cc: Daniel Vetter > Cc: David Airlie > Cc: Krzysztof Kozlowski > Cc: Rob Herring > Cc: Sam Ravnborg > Cc: Thierry Reding > Cc: devicet...@vger.kernel.org > Cc: dri-devel@lists.freedesktop.org > --- > .../bindings/display/panel/panel-timing.yaml | 46 +-- > 1 file changed, 23 insertions(+), 23 deletions(-) > > diff --git > a/Documentation/devicetree/bindings/display/panel/panel-timing.yaml > b/Documentation/devicetree/bindings/display/panel/panel-timing.yaml > index 0d317e61edd8f..aea69b84ca5d8 100644 > --- a/Documentation/devicetree/bindings/display/panel/panel-timing.yaml > +++ b/Documentation/devicetree/bindings/display/panel/panel-timing.yaml > @@ -17,29 +17,29 @@ description: | > >The parameters are defined as seen in the following illustration. > > - +--+-+--+---+ > - | |^| | | > - | ||vback_porch | | | > - | |v| | | > - +--###--+---+ > - | #^# | | > - | #|# | | > - | hback #|# hfront | hsync | > - | porch #| hactive # porch | len | > - |<>#<---+--->#<>|<->| > - | #|# | | > - | #|vactive # | | > - | #|# | | > - | #v# | | > - +--###--+---+ > - | |^| | | > - | ||vfront_porch| | | > - | |v| | | > - +--+-+--+---+ > - | |^| | | > - | ||vsync_len | | | > - | |v| | | > - +--+-+--+---+ > + +---+--+-+--+ > + | | |^| | > + | | ||vsync_len | | > + | | |v| | > + +---+--+-+--+ > + | | |^| | > + | | ||vback_porch | | > + | | |v| | > + +---+--###--+ > + | | #^# | > + | | #|# | > + | hsync | hback #|# hfront | > + | len | porch #| hactive # porch | > + |<->|<>#<---+--->#<>| > + | | #|# | > + | | #|vactive # | > + | | #|# | > + | | #v# | > + +---+--###--+ > + | | |^| | > + | | ||vfront_porch| | > + | | |v| | > + +---+--+-+--+ > > >The following is the panel timings shown with tim
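Independent of where the info graphic starts, the sync/porch/active segments always sum up the same way; for illustration, using the kernel's struct videomode (standard timing accounting, not part of this patch):

#include <video/videomode.h>

static u32 dpi_htotal(const struct videomode *vm)
{
	return vm->hsync_len + vm->hback_porch +
	       vm->hactive + vm->hfront_porch;
}

static u32 dpi_vtotal(const struct videomode *vm)
{
	return vm->vsync_len + vm->vback_porch +
	       vm->vactive + vm->vfront_porch;
}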
Re: [PATCH drm-next v2 04/16] maple_tree: add flag MT_FLAGS_LOCK_NONE
On 2/22/23 17:32, Matthew Wilcox wrote:

On Wed, Feb 22, 2023 at 05:11:34PM +0100, Danilo Krummrich wrote:

On 2/21/23 19:31, Matthew Wilcox wrote:

On Tue, Feb 21, 2023 at 03:37:49PM +0100, Danilo Krummrich wrote: It feels a bit weird that I, as a user of the API, would need to lock certain (or all?) mas_*() functions with the internal spinlock in order to protect (future) internal features of the tree, such as the slab cache defragmentation you mentioned. Because from my perspective, as the generic component that tells its users (the drivers) to take care of locking VA space operations (and hence tree operations), I don't have a purpose of my own for this internal spinlock, right?

You don't ... but we can't know that.

Thanks for the clarification. I think I should now know what to do for the GPUVA manager in terms of locking the maple tree in general. Though I still have very limited insight into the maple tree, I want to share some further thoughts. From what I got so far, it really seems to me that it would be better to just take the internal spinlock for both APIs (normal and advanced) whenever you need to internally.

No. Really, no. The point of the advanced API is that it's a toolbox for doing the operation you want, but isn't a generic enough operation to be part of the normal API.

Again the disclaimer, I'm just sharing my thoughts from the perspective of a user of a generic tree API. For me it feels like - and this purely is an assumption, hence please correct me if I'm wrong on that - you consider the advanced API to be more of a collection of internal functions not *really* meant to be used by arbitrary users, and maybe even slightly tied to mm, since it originated there?

However, from my external perspective I see it the following way. Even if an operation is not part of the 'normal API', but of an API called 'advanced API', it still is a generic API operation being exposed to arbitrary users. However, my point is not (at least not exclusively) that I do not consider this to be safe enough or something. It's just that I think that when the API *enforces* the user to take an internal lock at certain places, it can also just take the lock itself, no matter which API is being called. Especially when one can't rely on this lock at all for other (external) purposes anyway, because the implementation behind the API is free to drop the lock whenever it needs to.

To take an example from the radix tree days, in the page cache, we need to walk a range of the tree, looking for any entries that are marked as DIRTY, clear the DIRTY mark and set the TOWRITE mark. There was a horrendous function called radix_tree_range_tag_if_tagged() which did exactly this. Now look at the implementation of tag_pages_for_writeback(); it's a simple loop over a range with an occasional pause to check whether we need to reschedule. But that means you need to know how to use the toolbox. Some of the tools are dangerous and you can cut yourself on them.

Another plus would probably be maintainability. Once you got quite a few maple tree users using external locks (either in the sense of calling

I don't want maple tree users using external locks. That exists because it was the only reasonable way of converting the VMA tree from the rbtree to the maple tree. I intend to get rid of mt_set_external_lock(). The VMAs are eventually going to be protected by the internal spinlock.

But the argument also holds for the case of using the advanced API and using the internal spinlock.
If your requirements on locking change in the future, every user implementation must be re-validated.
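To make the contrast concrete, a minimal sketch of the two usage styles (error handling trimmed; note that even under mas_lock() the implementation may drop and reacquire the internal lock to allocate, which is exactly the behaviour discussed above):

#include <linux/maple_tree.h>

static DEFINE_MTREE(tree);

static int store_both_ways(void *entry)
{
	MA_STATE(mas, &tree, 0x2000, 0x2fff);
	int ret;

	/* Normal API: the tree takes its internal spinlock itself. */
	ret = mtree_store_range(&tree, 0x1000, 0x1fff, entry, GFP_KERNEL);
	if (ret)
		return ret;

	/* Advanced API: the caller wraps the operation in the internal lock. */
	mas_lock(&mas);
	ret = mas_store_gfp(&mas, entry, GFP_KERNEL);
	mas_unlock(&mas);

	return ret;
}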
Re: [PATCH v4 01/14] dma-buf/dma-fence: Add deadline awareness
On Wed, Feb 22, 2023 at 9:05 AM Tvrtko Ursulin wrote: > > > On 22/02/2023 15:28, Christian König wrote: > > Am 22.02.23 um 11:23 schrieb Tvrtko Ursulin: > >> > >> On 18/02/2023 21:15, Rob Clark wrote: > >>> From: Rob Clark > >>> > >>> Add a way to hint to the fence signaler of an upcoming deadline, such as > >>> vblank, which the fence waiter would prefer not to miss. This is to aid > >>> the fence signaler in making power management decisions, like boosting > >>> frequency as the deadline approaches and awareness of missing deadlines > >>> so that can be factored in to the frequency scaling. > >>> > >>> v2: Drop dma_fence::deadline and related logic to filter duplicate > >>> deadlines, to avoid increasing dma_fence size. The fence-context > >>> implementation will need similar logic to track deadlines of all > >>> the fences on the same timeline. [ckoenig] > >>> v3: Clarify locking wrt. set_deadline callback > >>> > >>> Signed-off-by: Rob Clark > >>> Reviewed-by: Christian König > >>> --- > >>> drivers/dma-buf/dma-fence.c | 20 > >>> include/linux/dma-fence.h | 20 > >>> 2 files changed, 40 insertions(+) > >>> > >>> diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c > >>> index 0de0482cd36e..763b32627684 100644 > >>> --- a/drivers/dma-buf/dma-fence.c > >>> +++ b/drivers/dma-buf/dma-fence.c > >>> @@ -912,6 +912,26 @@ dma_fence_wait_any_timeout(struct dma_fence > >>> **fences, uint32_t count, > >>> } > >>> EXPORT_SYMBOL(dma_fence_wait_any_timeout); > >>> + > >>> +/** > >>> + * dma_fence_set_deadline - set desired fence-wait deadline > >>> + * @fence:the fence that is to be waited on > >>> + * @deadline: the time by which the waiter hopes for the fence to be > >>> + *signaled > >>> + * > >>> + * Inform the fence signaler of an upcoming deadline, such as > >>> vblank, by > >>> + * which point the waiter would prefer the fence to be signaled by. > >>> This > >>> + * is intended to give feedback to the fence signaler to aid in power > >>> + * management decisions, such as boosting GPU frequency if a periodic > >>> + * vblank deadline is approaching. > >>> + */ > >>> +void dma_fence_set_deadline(struct dma_fence *fence, ktime_t deadline) > >>> +{ > >>> +if (fence->ops->set_deadline && !dma_fence_is_signaled(fence)) > >>> +fence->ops->set_deadline(fence, deadline); > >>> +} > >>> +EXPORT_SYMBOL(dma_fence_set_deadline); > >>> + > >>> /** > >>>* dma_fence_describe - Dump fence describtion into seq_file > >>>* @fence: the 6fence to describe > >>> diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h > >>> index 775cdc0b4f24..d77f6591c453 100644 > >>> --- a/include/linux/dma-fence.h > >>> +++ b/include/linux/dma-fence.h > >>> @@ -99,6 +99,7 @@ enum dma_fence_flag_bits { > >>> DMA_FENCE_FLAG_SIGNALED_BIT, > >>> DMA_FENCE_FLAG_TIMESTAMP_BIT, > >>> DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT, > >>> +DMA_FENCE_FLAG_HAS_DEADLINE_BIT, > >> > >> Would this bit be better left out from core implementation, given how > >> the approach is the component which implements dma-fence has to track > >> the actual deadline and all? > >> > >> Also taking a step back - are we all okay with starting to expand the > >> relatively simple core synchronisation primitive with side channel > >> data like this? What would be the criteria for what side channel data > >> would be acceptable? Taking note the thing lives outside drivers/gpu/. > > > > I had similar concerns and it took me a moment as well to understand the > > background why this is necessary. 
I essentially don't see much other > > approach we could do. > > > > Yes, this is GPU/CRTC specific, but we somehow need a common interface > > for communicating it between drivers and that's the dma_fence object as > > far as I can see. > > Yeah I also don't see any other easy options. Just wanted to raise this > as something which probably needs some wider acks. > > Also what about the "low level" part of my question about the reason, or > benefits, of defining the deadline bit in the common layer? We could leave DMA_FENCE_FLAG_HAS_DEADLINE_BIT out, but OTOH managing a bitmask that is partially defined in core enum and partially in backend-driver has it's own drawbacks, and it isn't like we are running out of bits.. :shrug: BR, -R > Regards, > > Tvrtko
Re: [PATCH v2 2/8] accel/qaic: Add uapi and core driver file
On 17.02.2023 11:15, Jeffrey Hugo wrote: On 2/16/2023 7:13 AM, Jacek Lawrynowicz wrote: Hi, On 06.02.2023 16:41, Jeffrey Hugo wrote: Add the QAIC driver uapi file and core driver file that binds to the PCIe device. The core driver file also creates the accel device and manages all the interconnections between the different parts of the driver. The driver can be built as a module. If so, it will be called "qaic.ko". Signed-off-by: Jeffrey Hugo Reviewed-by: Carl Vanderlip --- drivers/accel/qaic/qaic.h | 321 ++ drivers/accel/qaic/qaic_drv.c | 771 ++ include/uapi/drm/qaic_accel.h | 283 3 files changed, 1375 insertions(+) create mode 100644 drivers/accel/qaic/qaic.h create mode 100644 drivers/accel/qaic/qaic_drv.c create mode 100644 include/uapi/drm/qaic_accel.h diff --git a/drivers/accel/qaic/qaic.h b/drivers/accel/qaic/qaic.h new file mode 100644 index 000..3f7ea76 --- /dev/null +++ b/drivers/accel/qaic/qaic.h @@ -0,0 +1,321 @@ +/* SPDX-License-Identifier: GPL-2.0-only + * + * Copyright (c) 2019-2021, The Linux Foundation. All rights reserved. + * Copyright (c) 2021-2023 Qualcomm Innovation Center, Inc. All rights reserved. + */ + +#ifndef QAICINTERNAL_H_ Please use guard macro that matches the file name: _QAIC_H_ Before moving to DRM/ACCEL, this conflicted with the uapi file. However, that is no longer the case, so yes, this should be changed. Will do. +#define QAICINTERNAL_H_ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#define QAIC_DBC_BASE 0x2 +#define QAIC_DBC_SIZE 0x1000 + +#define QAIC_NO_PARTITION -1 + +#define QAIC_DBC_OFF(i)((i) * QAIC_DBC_SIZE + QAIC_DBC_BASE) + +#define to_qaic_bo(obj) container_of(obj, struct qaic_bo, base) + +extern bool poll_datapath; + +struct qaic_user { + /* Uniquely identifies this user for the device */ + int handle; + struct kref ref_count; + /* Char device opened by this user */ + struct qaic_drm_device *qddev; + /* Node in list of users that opened this drm device */ + struct list_headnode; + /* SRCU used to synchronize this user during cleanup */ + struct srcu_struct qddev_lock; + atomic_tchunk_id; +}; + +struct dma_bridge_chan { + /* Pointer to device strcut maintained by driver */ + struct qaic_device *qdev; + /* ID of this DMA bridge channel(DBC) */ + unsigned intid; + /* Synchronizes access to xfer_list */ + spinlock_t xfer_lock; + /* Base address of request queue */ + void*req_q_base; + /* Base address of response queue */ + void*rsp_q_base; + /* +* Base bus address of request queue. Response queue bus address can be +* calculated by adding request queue size to this variable +*/ + dma_addr_t dma_addr; + /* Total size of request and response queue in byte */ + u32 total_size; + /* Capacity of request/response queue */ + u32 nelem; + /* The user that opened this DBC */ + struct qaic_user*usr; + /* +* Request ID of next memory handle that goes in request queue. One +* memory handle can enqueue more than one request elements, all +* this requests that belong to same memory handle have same request ID +*/ + u16 next_req_id; + /* TRUE: DBC is in use; FALSE: DBC not in use */ Use standard "true"/"false" instead of custom "TRUE"/"FALSE" macros. This applies here and in multiple other places in the driver. I think you are getting at that the documentation could be confusing. I don't appear to see custom macro use in the code. Will try to clarify that here. + boolin_use; + /* +* Base address of device registers. 
Used to read/write request and +* response queue's head and tail pointer of this DBC. +*/ + void __iomem*dbc_base; + /* Head of list where each node is a memory handle queued in request queue */ + struct list_headxfer_list; + /* Synchronizes DBC readers during cleanup */ + struct srcu_struct ch_lock; + /* +* When this DBC is released, any thread waiting on this wait queue is +* woken up +*/ + wait_queue_head_t dbc_release; + /* Head of list where each node is a bo associated with this DBC */ + struct list_headbo_lists; + /* The irq line for this DBC. Used for polling */ + unsigned intirq; + /* Polling work item to simulate interrupts */ + struct work_struct poll_wo
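For reference, the include-guard rename requested near the top of this review is simply (a sketch):

#ifndef _QAIC_H_
#define _QAIC_H_

/* driver declarations go here */

#endif /* _QAIC_H_ */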
[PATCH 22/27] kbuild, dma-buf: heaps: remove MODULE_LICENSE in non-modules
Since commit 8b41fc4454e ("kbuild: create modules.builtin without Makefile.modbuiltin or tristate.conf"), MODULE_LICENSE declarations are used to identify modules. As a consequence, uses of the macro in non-modules will cause modprobe to misidentify their containing object file as a module when it is not (false positives), and modprobe might succeed rather than failing with a suitable error message. So remove it in the files in this commit, none of which can be built as modules. Signed-off-by: Nick Alcock Suggested-by: Luis Chamberlain Cc: Luis Chamberlain Cc: linux-modu...@vger.kernel.org Cc: linux-ker...@vger.kernel.org Cc: Hitomi Hasegawa Cc: Sumit Semwal Cc: "Christian König" Cc: linux-me...@vger.kernel.org Cc: dri-devel@lists.freedesktop.org Cc: linaro-mm-...@lists.linaro.org --- drivers/dma-buf/heaps/cma_heap.c| 1 - drivers/dma-buf/heaps/system_heap.c | 1 - 2 files changed, 2 deletions(-) diff --git a/drivers/dma-buf/heaps/cma_heap.c b/drivers/dma-buf/heaps/cma_heap.c index 1131fb943992..a7f048048864 100644 --- a/drivers/dma-buf/heaps/cma_heap.c +++ b/drivers/dma-buf/heaps/cma_heap.c @@ -407,4 +407,3 @@ static int add_default_cma_heap(void) } module_init(add_default_cma_heap); MODULE_DESCRIPTION("DMA-BUF CMA Heap"); -MODULE_LICENSE("GPL v2"); diff --git a/drivers/dma-buf/heaps/system_heap.c b/drivers/dma-buf/heaps/system_heap.c index e8bd10e60998..79c03f5b4e28 100644 --- a/drivers/dma-buf/heaps/system_heap.c +++ b/drivers/dma-buf/heaps/system_heap.c @@ -440,4 +440,3 @@ static int system_heap_create(void) return 0; } module_init(system_heap_create); -MODULE_LICENSE("GPL v2"); -- 2.39.1.268.g9de2f9a303
[PATCH 25/27] kbuild, video: fbdev: remove MODULE_LICENSE in non-modules
Since commit 8b41fc4454e ("kbuild: create modules.builtin without Makefile.modbuiltin or tristate.conf"), MODULE_LICENSE declarations are used to identify modules. As a consequence, uses of the macro in non-modules will cause modprobe to misidentify their containing object file as a module when it is not (false positives), and modprobe might succeed rather than failing with a suitable error message. So remove it in the files in this commit, none of which can be built as modules. Signed-off-by: Nick Alcock Suggested-by: Luis Chamberlain Cc: Luis Chamberlain Cc: linux-modu...@vger.kernel.org Cc: linux-ker...@vger.kernel.org Cc: Hitomi Hasegawa Cc: Helge Deller Cc: linux-fb...@vger.kernel.org Cc: dri-devel@lists.freedesktop.org --- drivers/video/fbdev/asiliantfb.c| 1 - drivers/video/fbdev/gbefb.c | 1 - drivers/video/fbdev/mmp/hw/mmp_ctrl.c | 1 - drivers/video/fbdev/mmp/panel/tpo_tj032md01bw.c | 1 - drivers/video/fbdev/vesafb.c| 1 - 5 files changed, 5 deletions(-) diff --git a/drivers/video/fbdev/asiliantfb.c b/drivers/video/fbdev/asiliantfb.c index 8383468f5577..0d33c75afc48 100644 --- a/drivers/video/fbdev/asiliantfb.c +++ b/drivers/video/fbdev/asiliantfb.c @@ -632,4 +632,3 @@ static void __exit asiliantfb_exit(void) pci_unregister_driver(&asiliantfb_driver); } -MODULE_LICENSE("GPL"); diff --git a/drivers/video/fbdev/gbefb.c b/drivers/video/fbdev/gbefb.c index 000b4aa44241..39e89b9f8dca 100644 --- a/drivers/video/fbdev/gbefb.c +++ b/drivers/video/fbdev/gbefb.c @@ -1285,4 +1285,3 @@ static void __exit gbefb_exit(void) module_init(gbefb_init); module_exit(gbefb_exit); -MODULE_LICENSE("GPL"); diff --git a/drivers/video/fbdev/mmp/hw/mmp_ctrl.c b/drivers/video/fbdev/mmp/hw/mmp_ctrl.c index a9df8ee79810..8b2838a67b76 100644 --- a/drivers/video/fbdev/mmp/hw/mmp_ctrl.c +++ b/drivers/video/fbdev/mmp/hw/mmp_ctrl.c @@ -576,4 +576,3 @@ module_init(mmphw_init); MODULE_AUTHOR("Li Guoqing"); MODULE_DESCRIPTION("Framebuffer driver for mmp"); -MODULE_LICENSE("GPL"); diff --git a/drivers/video/fbdev/mmp/panel/tpo_tj032md01bw.c b/drivers/video/fbdev/mmp/panel/tpo_tj032md01bw.c index 34fae588e202..84ed83ee2366 100644 --- a/drivers/video/fbdev/mmp/panel/tpo_tj032md01bw.c +++ b/drivers/video/fbdev/mmp/panel/tpo_tj032md01bw.c @@ -169,4 +169,3 @@ module_spi_driver(panel_tpohvga_driver); MODULE_AUTHOR("Lisa Du"); MODULE_DESCRIPTION("Panel driver for tpohvga"); -MODULE_LICENSE("GPL"); diff --git a/drivers/video/fbdev/vesafb.c b/drivers/video/fbdev/vesafb.c index 929d4775cb4b..73b35fc67d8b 100644 --- a/drivers/video/fbdev/vesafb.c +++ b/drivers/video/fbdev/vesafb.c @@ -508,4 +508,3 @@ static struct platform_driver vesafb_driver = { }; module_platform_driver(vesafb_driver); -MODULE_LICENSE("GPL"); -- 2.39.1.268.g9de2f9a303
[PATCH 02/27] kbuild, video: fbdev: remove MODULE_LICENSE in non-modules
Since commit 8b41fc4454e ("kbuild: create modules.builtin without Makefile.modbuiltin or tristate.conf"), MODULE_LICENSE declarations are used to identify modules. As a consequence, uses of the macro in non-modules will cause modprobe to misidentify their containing object file as a module when it is not (false positives), and modprobe might succeed rather than failing with a suitable error message. So remove it in the files in this commit, none of which can be built as modules. Signed-off-by: Nick Alcock Suggested-by: Luis Chamberlain Cc: Luis Chamberlain Cc: linux-modu...@vger.kernel.org Cc: linux-ker...@vger.kernel.org Cc: Hitomi Hasegawa Cc: Helge Deller Cc: linux-arm-ker...@lists.infradead.org Cc: linux-fb...@vger.kernel.org Cc: dri-devel@lists.freedesktop.org --- drivers/video/fbdev/wm8505fb.c| 1 - drivers/video/fbdev/wmt_ge_rops.c | 1 - 2 files changed, 2 deletions(-) diff --git a/drivers/video/fbdev/wm8505fb.c b/drivers/video/fbdev/wm8505fb.c index 8f4d674fa0d0..2a2997c647f7 100644 --- a/drivers/video/fbdev/wm8505fb.c +++ b/drivers/video/fbdev/wm8505fb.c @@ -407,5 +407,4 @@ module_platform_driver(wm8505fb_driver); MODULE_AUTHOR("Ed Spiridonov "); MODULE_DESCRIPTION("Framebuffer driver for WMT WM8505"); -MODULE_LICENSE("GPL v2"); MODULE_DEVICE_TABLE(of, wmt_dt_ids); diff --git a/drivers/video/fbdev/wmt_ge_rops.c b/drivers/video/fbdev/wmt_ge_rops.c index 42255d27a1db..c207fd917dce 100644 --- a/drivers/video/fbdev/wmt_ge_rops.c +++ b/drivers/video/fbdev/wmt_ge_rops.c @@ -170,5 +170,4 @@ module_platform_driver(wmt_ge_rops_driver); MODULE_AUTHOR("Alexey Charkov "); MODULE_DESCRIPTION("Accelerators for raster operations using " "WonderMedia Graphics Engine"); -MODULE_LICENSE("GPL v2"); MODULE_DEVICE_TABLE(of, wmt_dt_ids); -- 2.39.1.268.g9de2f9a303
Re: [PATCH] media: Fix building pdfdocs
On Thu, 9 Feb 2023 12:11:19 +0200, Tomi Valkeinen wrote: > Hi Daniel, Dave, > > Could you pick this fix to drm-next? The offending commit is there, it was > merged via Laurent's "[GIT PULL FOR v6.3] R-Car DU fixes and improvements". > > I can also push this to drm-misc-fixes, but the offending commit is not there > yet. Hi, I don't see this fix in next-20230220. I'd really appreciate it if it could be upstreamed during the current merge window. Does anybody care to pick this up? Thanks, Akira > > Tomi > > On 08/02/2023 13:17, Mauro Carvalho Chehab wrote: >> Em Wed, 8 Feb 2023 11:11:34 +0200 >> Laurent Pinchart escreveu: >> >>> Hi Tomi, >>> >>> Thank you for the patch. >>> >>> On Wed, Feb 08, 2023 at 10:29:16AM +0200, Tomi Valkeinen wrote: Commit 8d0e3fc61abd ("media: Add 2-10-10-10 RGB formats") added documatation for a few new RGB formats. For some reason these break the >>> >>> s/documatation/documentation/ >>> pdfdocs build, even if the same style seems to work elsewhere in the file. Remove the trailing empty dash lines, which seems to fix the issue. Fixes: 8d0e3fc61abd ("media: Add 2-10-10-10 RGB formats") Reported-by: Akira Yokosawa Signed-off-by: Tomi Valkeinen >>> >>> Reviewed-by: Laurent Pinchart >>> --- Note: the offending patch was merged via drm tree, so we may want to apply the fix to the drm tree also. >>> >>> Sounds good to me. Mauro, could you ack this patch ? >> >> Acked-by: Mauro Carvalho Chehab >> >>> Documentation/userspace-api/media/v4l/pixfmt-rgb.rst | 3 --- 1 file changed, 3 deletions(-) diff --git a/Documentation/userspace-api/media/v4l/pixfmt-rgb.rst b/Documentation/userspace-api/media/v4l/pixfmt-rgb.rst index d330aeb4d3eb..ea545ed1aeaa 100644 --- a/Documentation/userspace-api/media/v4l/pixfmt-rgb.rst +++ b/Documentation/userspace-api/media/v4l/pixfmt-rgb.rst @@ -868,7 +868,6 @@ number of bits for each component. - r\ :sub:`4` - r\ :sub:`3` - r\ :sub:`2` - - * .. _V4L2-PIX-FMT-RGBA1010102: - ``V4L2_PIX_FMT_RGBA1010102`` @@ -909,7 +908,6 @@ number of bits for each component. - r\ :sub:`4` - r\ :sub:`3` - r\ :sub:`2` - - * .. _V4L2-PIX-FMT-ARGB2101010: - ``V4L2_PIX_FMT_ARGB2101010`` @@ -950,7 +948,6 @@ number of bits for each component. - r\ :sub:`6` - r\ :sub:`5` - r\ :sub:`4` - - .. raw:: latex >>> >> >> >> >> Thanks, >> Mauro >
[PATCH v2] drm/amdkfd: Fix an illegal memory access
In the kfd_wait_on_events() function, the kfd_event_waiter structure is allocated by alloc_event_waiters(), but the event field of the waiter structure is not initialized. When copy_from_user() fails in the kfd_wait_on_events() function, it will enter exception handling to release the previously allocated memory of the waiter structure. Due to the event field of the waiters structure being accessed in the free_waiters() function, this results in illegal memory access and system crash, here is the crash log:

localhost kernel: RIP: 0010:native_queued_spin_lock_slowpath+0x185/0x1e0
localhost kernel: RSP: 0018:aa53c362bd60 EFLAGS: 00010082
localhost kernel: RAX: ff3d3d6bff4007cb RBX: 0282 RCX: 002c
localhost kernel: RDX: 9e855eeacb80 RSI: 279c RDI: e7088f6a21d0
localhost kernel: RBP: e7088f6a21d0 R08: 002c R09: aa53c362be64
localhost kernel: R10: aa53c362bbd8 R11: 0001 R12: 0002
localhost kernel: R13: 9e7ead15d600 R14: R15: 9e7ead15d698
localhost kernel: FS: 152a3d111700() GS:9e855ee8() knlGS:
localhost kernel: CS: 0010 DS: ES: CR0: 80050033
localhost kernel: CR2: 15293810 CR3: 00044d7a4000 CR4: 003506e0
localhost kernel: Call Trace:
localhost kernel: _raw_spin_lock_irqsave+0x30/0x40
localhost kernel: remove_wait_queue+0x12/0x50
localhost kernel: kfd_wait_on_events+0x1b6/0x490 [hydcu]
localhost kernel: ? ftrace_graph_caller+0xa0/0xa0
localhost kernel: kfd_ioctl+0x38c/0x4a0 [hydcu]
localhost kernel: ? kfd_ioctl_set_trap_handler+0x70/0x70 [hydcu]
localhost kernel: ? kfd_ioctl_create_queue+0x5a0/0x5a0 [hydcu]
localhost kernel: ? ftrace_graph_caller+0xa0/0xa0
localhost kernel: __x64_sys_ioctl+0x8e/0xd0
localhost kernel: ? syscall_trace_enter.isra.18+0x143/0x1b0
localhost kernel: do_syscall_64+0x33/0x80
localhost kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
localhost kernel: RIP: 0033:0x152a4dff68d7

Changes since v1:
* Allocate the waiter structures zeroed, using kmalloc_array() with __GFP_ZERO, removing the explicit initialization of activated;
* '(event_waiters) &&' in the 'for' loop has also been removed.

Signed-off-by: Qu Huang
---
 drivers/gpu/drm/amd/amdkfd/kfd_events.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_events.c b/drivers/gpu/drm/amd/amdkfd/kfd_events.c
index 729d26d..bb54f6c 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_events.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_events.c
@@ -780,13 +780,12 @@ static struct kfd_event_waiter *alloc_event_waiters(uint32_t num_events)
 
 	event_waiters = kmalloc_array(num_events,
 				      sizeof(struct kfd_event_waiter),
-				      GFP_KERNEL);
+				      GFP_KERNEL | __GFP_ZERO);
 	if (!event_waiters)
 		return NULL;
 
-	for (i = 0; (event_waiters) && (i < num_events) ; i++) {
+	for (i = 0; i < num_events; i++) {
 		init_wait(&event_waiters[i].wait);
-		event_waiters[i].activated = false;
 	}
 
 	return event_waiters;
--
1.8.3.1
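For comparison, the kcalloc() form Felix suggested in the v1 review would look roughly like this (a sketch; the struct definition is abbreviated from kfd_events.c, and the posted v2 keeps kmalloc_array() with __GFP_ZERO, which has the same effect):

#include <linux/slab.h>
#include <linux/wait.h>

/* Abbreviated from kfd_events.c for the sketch. */
struct kfd_event_waiter {
	wait_queue_entry_t wait;
	struct kfd_event *event;	/* event to wait for */
	bool activated;			/* flipped when the event signals */
};

static struct kfd_event_waiter *alloc_event_waiters_kcalloc(uint32_t num_events)
{
	struct kfd_event_waiter *event_waiters;
	uint32_t i;

	/*
	 * kcalloc() zero-fills (covering .event and .activated) and also
	 * guards the num_events * size multiplication against overflow.
	 */
	event_waiters = kcalloc(num_events, sizeof(*event_waiters),
				GFP_KERNEL);
	if (!event_waiters)
		return NULL;

	for (i = 0; i < num_events; i++)
		init_wait(&event_waiters[i].wait);

	return event_waiters;
}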
[PATCH] drm/i915/sseu: fix max_subslices array-index-out-of-bounds access
It seems that commit bc3c5e0809ae ("drm/i915/sseu: Don't try to store EU mask internally in UAPI format") exposed a potential out-of-bounds access, reported by UBSAN as following on a laptop with a gen 11 i915 card: UBSAN: array-index-out-of-bounds in drivers/gpu/drm/i915/gt/intel_sseu.c:65:27 index 6 is out of range for type 'u16 [6]' CPU: 2 PID: 165 Comm: systemd-udevd Not tainted 6.2.0-9-generic #9-Ubuntu Hardware name: Dell Inc. XPS 13 9300/077Y9N, BIOS 1.11.0 03/22/2022 Call Trace: show_stack+0x4e/0x61 dump_stack_lvl+0x4a/0x6f dump_stack+0x10/0x18 ubsan_epilogue+0x9/0x3a __ubsan_handle_out_of_bounds.cold+0x42/0x47 gen11_compute_sseu_info+0x121/0x130 [i915] intel_sseu_info_init+0x15d/0x2b0 [i915] intel_gt_init_mmio+0x23/0x40 [i915] i915_driver_mmio_probe+0x129/0x400 [i915] ? intel_gt_probe_all+0x91/0x2e0 [i915] i915_driver_probe+0xe1/0x3f0 [i915] ? drm_privacy_screen_get+0x16d/0x190 [drm] ? acpi_dev_found+0x64/0x80 i915_pci_probe+0xac/0x1b0 [i915] ... According to the definition of sseu_dev_info, eu_mask->hsw is limited to a maximum of GEN_MAX_SS_PER_HSW_SLICE (6) sub-slices, but gen11_sseu_info_init() can potentially set 8 sub-slices, in the !IS_JSL_EHL(gt->i915) case. Fix this by reserving up to 8 slots for max_subslices in the eu_mask struct. Reported-by: Emil Renner Berthing Signed-off-by: Andrea Righi --- drivers/gpu/drm/i915/gt/intel_sseu.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/i915/gt/intel_sseu.h b/drivers/gpu/drm/i915/gt/intel_sseu.h index aa87d3832d60..d7e8c374f153 100644 --- a/drivers/gpu/drm/i915/gt/intel_sseu.h +++ b/drivers/gpu/drm/i915/gt/intel_sseu.h @@ -27,7 +27,7 @@ struct drm_printer; * is only relevant to pre-Xe_HP platforms (Xe_HP and beyond use the * I915_MAX_SS_FUSE_BITS value below). */ -#define GEN_MAX_SS_PER_HSW_SLICE 6 +#define GEN_MAX_SS_PER_HSW_SLICE 8 /* * Maximum number of 32-bit registers used by hardware to express the -- 2.38.1
[PATCH 13/27] kbuild, vgacon: remove MODULE_LICENSE in non-modules
Since commit 8b41fc4454e ("kbuild: create modules.builtin without Makefile.modbuiltin or tristate.conf"), MODULE_LICENSE declarations are used to identify modules. As a consequence, uses of the macro in non-modules will cause modprobe to misidentify their containing object file as a module when it is not (false positives), and modprobe might succeed rather than failing with a suitable error message. So remove it in the files in this commit, none of which can be built as modules. Signed-off-by: Nick Alcock Suggested-by: Luis Chamberlain Cc: Luis Chamberlain Cc: linux-modu...@vger.kernel.org Cc: linux-ker...@vger.kernel.org Cc: Hitomi Hasegawa Cc: Greg Kroah-Hartman Cc: Helge Deller Cc: linux-fb...@vger.kernel.org Cc: dri-devel@lists.freedesktop.org --- drivers/video/console/vgacon.c | 1 - 1 file changed, 1 deletion(-) diff --git a/drivers/video/console/vgacon.c b/drivers/video/console/vgacon.c index fcdf017e2665..8e13af1f9042 100644 --- a/drivers/video/console/vgacon.c +++ b/drivers/video/console/vgacon.c @@ -1204,4 +1204,3 @@ const struct consw vga_con = { }; EXPORT_SYMBOL(vga_con); -MODULE_LICENSE("GPL"); -- 2.39.1.268.g9de2f9a303
[PATCH] drm/amdkfd: Fix an illegal memory access
From: Qu Huang In the kfd_wait_on_events() function, the kfd_event_waiter structure is allocated by alloc_event_waiters(), but the event field of the waiter structure is not initialized; When copy_from_user() fails in the kfd_wait_on_events() function, it will enter exception handling to release the previously allocated memory of the waiter structure; Due to the event field of the waiters structure being accessed in the free_waiters() function, this results in illegal memory access and system crash, here is the crash log: localhost kernel: RIP: 0010:native_queued_spin_lock_slowpath+0x185/0x1e0 localhost kernel: RSP: 0018:aa53c362bd60 EFLAGS: 00010082 localhost kernel: RAX: ff3d3d6bff4007cb RBX: 0282 RCX: 002c localhost kernel: RDX: 9e855eeacb80 RSI: 279c RDI: e7088f6a21d0 localhost kernel: RBP: e7088f6a21d0 R08: 002c R09: aa53c362be64 localhost kernel: R10: aa53c362bbd8 R11: 0001 R12: 0002 localhost kernel: R13: 9e7ead15d600 R14: R15: 9e7ead15d698 localhost kernel: FS: 152a3d111700() GS:9e855ee8() knlGS: localhost kernel: CS: 0010 DS: ES: CR0: 80050033 localhost kernel: CR2: 15293810 CR3: 00044d7a4000 CR4: 003506e0 localhost kernel: Call Trace: localhost kernel: _raw_spin_lock_irqsave+0x30/0x40 localhost kernel: remove_wait_queue+0x12/0x50 localhost kernel: kfd_wait_on_events+0x1b6/0x490 [hydcu] localhost kernel: ? ftrace_graph_caller+0xa0/0xa0 localhost kernel: kfd_ioctl+0x38c/0x4a0 [hydcu] localhost kernel: ? kfd_ioctl_set_trap_handler+0x70/0x70 [hydcu] localhost kernel: ? kfd_ioctl_create_queue+0x5a0/0x5a0 [hydcu] localhost kernel: ? ftrace_graph_caller+0xa0/0xa0 localhost kernel: __x64_sys_ioctl+0x8e/0xd0 localhost kernel: ? syscall_trace_enter.isra.18+0x143/0x1b0 localhost kernel: do_syscall_64+0x33/0x80 localhost kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9 localhost kernel: RIP: 0033:0x152a4dff68d7 Signed-off-by: Qu Huang --- drivers/gpu/drm/amd/amdkfd/kfd_events.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_events.c b/drivers/gpu/drm/amd/amdkfd/kfd_events.c index 729d26d..e5faaad 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_events.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_events.c @@ -787,6 +787,7 @@ static struct kfd_event_waiter *alloc_event_waiters(uint32_t num_events) for (i = 0; (event_waiters) && (i < num_events) ; i++) { init_wait(&event_waiters[i].wait); event_waiters[i].activated = false; + event_waiters[i].event = NULL; } return event_waiters; -- 1.8.3.1
Re: [PATCH] drm/amdkfd: Fix an illegal memory access
On 2023/2/22 3:17, Christophe JAILLET wrote: > Le 21/02/2023 à 17:26, Felix Kuehling a écrit : >> >> On 2023-02-21 06:35, qu.huang-fxuvxftifdnyg1zeobx...@public.gmane.org wrote: >>> From: Qu Huang >>> >>> In the kfd_wait_on_events() function, the kfd_event_waiter structure is >>> allocated by alloc_event_waiters(), but the event field of the waiter >>> structure is not initialized; When copy_from_user() fails in the >>> kfd_wait_on_events() function, it will enter exception handling to >>> release the previously allocated memory of the waiter structure; >>> Due to the event field of the waiters structure being accessed >>> in the free_waiters() function, this results in illegal memory access >>> and system crash, here is the crash log: >>> >>> localhost kernel: RIP: 0010:native_queued_spin_lock_slowpath+0x185/0x1e0 >>> localhost kernel: RSP: 0018:aa53c362bd60 EFLAGS: 00010082 >>> localhost kernel: RAX: ff3d3d6bff4007cb RBX: 0282 RCX: >>> 002c >>> localhost kernel: RDX: 9e855eeacb80 RSI: 279c RDI: >>> e7088f6a21d0 >>> localhost kernel: RBP: e7088f6a21d0 R08: 002c R09: >>> aa53c362be64 >>> localhost kernel: R10: aa53c362bbd8 R11: 0001 R12: >>> 0002 >>> localhost kernel: R13: 9e7ead15d600 R14: R15: >>> 9e7ead15d698 >>> localhost kernel: FS: 152a3d111700() GS:9e855ee8() >>> knlGS: >>> localhost kernel: CS: 0010 DS: ES: CR0: 80050033 >>> localhost kernel: CR2: 15293810 CR3: 00044d7a4000 CR4: >>> 003506e0 >>> localhost kernel: Call Trace: >>> localhost kernel: _raw_spin_lock_irqsave+0x30/0x40 >>> localhost kernel: remove_wait_queue+0x12/0x50 >>> localhost kernel: kfd_wait_on_events+0x1b6/0x490 [hydcu] >>> localhost kernel: ? ftrace_graph_caller+0xa0/0xa0 >>> localhost kernel: kfd_ioctl+0x38c/0x4a0 [hydcu] >>> localhost kernel: ? kfd_ioctl_set_trap_handler+0x70/0x70 [hydcu] >>> localhost kernel: ? kfd_ioctl_create_queue+0x5a0/0x5a0 [hydcu] >>> localhost kernel: ? ftrace_graph_caller+0xa0/0xa0 >>> localhost kernel: __x64_sys_ioctl+0x8e/0xd0 >>> localhost kernel: ? syscall_trace_enter.isra.18+0x143/0x1b0 >>> localhost kernel: do_syscall_64+0x33/0x80 >>> localhost kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9 >>> localhost kernel: RIP: 0033:0x152a4dff68d7 >>> >>> Signed-off-by: Qu Huang >>> --- >>> drivers/gpu/drm/amd/amdkfd/kfd_events.c | 1 + >>> 1 file changed, 1 insertion(+) >>> >>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_events.c >>> b/drivers/gpu/drm/amd/amdkfd/kfd_events.c >>> index 729d26d..e5faaad 100644 >>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_events.c >>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_events.c >>> @@ -787,6 +787,7 @@ static struct kfd_event_waiter >>> *alloc_event_waiters(uint32_t num_events) >>> for (i = 0; (event_waiters) && (i < num_events) ; i++) { >>> init_wait(&event_waiters[i].wait); >>> event_waiters[i].activated = false; >>> + event_waiters[i].event = NULL; >> >> Thank you for catching this. We're often lazy about initializing things to >> NULL or 0 because most of our data structures are allocated with kzalloc or >> similar. I'm not sure why we're not doing this here. If we allocated >> event_waiters with kcalloc, we could also remove the initialization of >> activated. I think that would be the cleaner and safer solution. > > Hi, > > I think that the '(event_waiters) &&' in the 'for' can also be removed. > 'event_waiters' is already NULL tested a few lines above > > > Just my 2c. 
> > CJ > Thanks for the suggestions from Felix and CJ, I have re-submitted patch v2, please review it: https://lore.kernel.org/all/ea5b997309825b21e406f9bad2ce8...@linux.dev/ Regards, Qu >> >> Regards, >> Felix >> >> >>> } >>> >>> return event_waiters; >>> -- >>> 1.8.3.1 >> >
[PATCH 23/27] kbuild, drm/dsi: remove MODULE_LICENSE in non-modules
Since commit 8b41fc4454e ("kbuild: create modules.builtin without Makefile.modbuiltin or tristate.conf"), MODULE_LICENSE declarations are used to identify modules. As a consequence, uses of the macro in non-modules will cause modprobe to misidentify their containing object file as a module when it is not (false positives), and modprobe might succeed rather than failing with a suitable error message. So remove it in the files in this commit, none of which can be built as modules. Signed-off-by: Nick Alcock Suggested-by: Luis Chamberlain Cc: Luis Chamberlain Cc: linux-modu...@vger.kernel.org Cc: linux-ker...@vger.kernel.org Cc: Hitomi Hasegawa Cc: Maarten Lankhorst Cc: Maxime Ripard Cc: Thomas Zimmermann Cc: David Airlie Cc: Daniel Vetter Cc: dri-devel@lists.freedesktop.org --- drivers/gpu/drm/drm_mipi_dsi.c | 1 - 1 file changed, 1 deletion(-) diff --git a/drivers/gpu/drm/drm_mipi_dsi.c b/drivers/gpu/drm/drm_mipi_dsi.c index 497ef4b6a90a..4101d9780059 100644 --- a/drivers/gpu/drm/drm_mipi_dsi.c +++ b/drivers/gpu/drm/drm_mipi_dsi.c @@ -1294,4 +1294,3 @@ postcore_initcall(mipi_dsi_bus_init); MODULE_AUTHOR("Andrzej Hajda "); MODULE_DESCRIPTION("MIPI DSI Bus"); -MODULE_LICENSE("GPL and additional rights"); -- 2.39.1.268.g9de2f9a303
Re: [PATCH v4 01/14] dma-buf/dma-fence: Add deadline awareness
On 22/02/2023 15:28, Christian König wrote: Am 22.02.23 um 11:23 schrieb Tvrtko Ursulin: On 18/02/2023 21:15, Rob Clark wrote: From: Rob Clark Add a way to hint to the fence signaler of an upcoming deadline, such as vblank, which the fence waiter would prefer not to miss. This is to aid the fence signaler in making power management decisions, like boosting frequency as the deadline approaches and awareness of missing deadlines so that can be factored in to the frequency scaling. v2: Drop dma_fence::deadline and related logic to filter duplicate deadlines, to avoid increasing dma_fence size. The fence-context implementation will need similar logic to track deadlines of all the fences on the same timeline. [ckoenig] v3: Clarify locking wrt. set_deadline callback Signed-off-by: Rob Clark Reviewed-by: Christian König --- drivers/dma-buf/dma-fence.c | 20 include/linux/dma-fence.h | 20 2 files changed, 40 insertions(+) diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c index 0de0482cd36e..763b32627684 100644 --- a/drivers/dma-buf/dma-fence.c +++ b/drivers/dma-buf/dma-fence.c @@ -912,6 +912,26 @@ dma_fence_wait_any_timeout(struct dma_fence **fences, uint32_t count, } EXPORT_SYMBOL(dma_fence_wait_any_timeout); + +/** + * dma_fence_set_deadline - set desired fence-wait deadline + * @fence: the fence that is to be waited on + * @deadline: the time by which the waiter hopes for the fence to be + * signaled + * + * Inform the fence signaler of an upcoming deadline, such as vblank, by + * which point the waiter would prefer the fence to be signaled by. This + * is intended to give feedback to the fence signaler to aid in power + * management decisions, such as boosting GPU frequency if a periodic + * vblank deadline is approaching. + */ +void dma_fence_set_deadline(struct dma_fence *fence, ktime_t deadline) +{ + if (fence->ops->set_deadline && !dma_fence_is_signaled(fence)) + fence->ops->set_deadline(fence, deadline); +} +EXPORT_SYMBOL(dma_fence_set_deadline); + /** * dma_fence_describe - Dump fence describtion into seq_file * @fence: the 6fence to describe diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h index 775cdc0b4f24..d77f6591c453 100644 --- a/include/linux/dma-fence.h +++ b/include/linux/dma-fence.h @@ -99,6 +99,7 @@ enum dma_fence_flag_bits { DMA_FENCE_FLAG_SIGNALED_BIT, DMA_FENCE_FLAG_TIMESTAMP_BIT, DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT, + DMA_FENCE_FLAG_HAS_DEADLINE_BIT, Would this bit be better left out from core implementation, given how the approach is the component which implements dma-fence has to track the actual deadline and all? Also taking a step back - are we all okay with starting to expand the relatively simple core synchronisation primitive with side channel data like this? What would be the criteria for what side channel data would be acceptable? Taking note the thing lives outside drivers/gpu/. I had similar concerns and it took me a moment as well to understand the background why this is necessary. I essentially don't see much other approach we could do. Yes, this is GPU/CRTC specific, but we somehow need a common interface for communicating it between drivers and that's the dma_fence object as far as I can see. Yeah I also don't see any other easy options. Just wanted to raise this as something which probably needs some wider acks. Also what about the "low level" part of my question about the reason, or benefits, of defining the deadline bit in the common layer? Regards, Tvrtko
Re: [PATCH v5 0/7] Introduce __xchg, non-atomic xchg
On Wed, Jan 18, 2023 at 04:35:22PM +0100, Andrzej Hajda wrote:
> Andrzej Hajda (7):
>   arch: rename all internal names __xchg to __arch_xchg
>   linux/include: add non-atomic version of xchg
>   arch/*/uprobes: simplify arch_uretprobe_hijack_return_addr
>   llist: simplify __llist_del_all
>   io_uring: use __xchg if possible
>   qed: use __xchg if possible
>   drm/i915/gt: use __xchg instead of internal helper

Nothing crazy in here I suppose, I somewhat wonder why you went through the trouble, but meh. You want me to take this through the locking tree (for the next cycle, not this one) where I normally take atomic things or does someone else want this?
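For readers outside the thread: the series frees up the plain __xchg name by renaming the arch-internal atomic helpers to __arch_xchg, and introduces __xchg as a non-atomic exchange for callers that already hold the relevant lock. Conceptually it is just the following (an illustrative definition, not the exact macro from the series):

#define __xchg(ptr, val)			\
({						\
	typeof(ptr) __ptr = (ptr);		\
	typeof(*__ptr) __old = *__ptr;		\
						\
	*__ptr = (val);				\
	__old;					\
})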
[PATCH 2/2] dt-bindings: display: Add Loongson display controller
This patch adds a trivial DT usage for the Loongson display controller found in the LS2K1000 SoC.

Signed-off-by: suijingfeng
---
 .../loongson/loongson,display-controller.yaml | 58 +++
 1 file changed, 58 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/display/loongson/loongson,display-controller.yaml

diff --git a/Documentation/devicetree/bindings/display/loongson/loongson,display-controller.yaml b/Documentation/devicetree/bindings/display/loongson/loongson,display-controller.yaml
new file mode 100644
index ..98b78f449a80
--- /dev/null
+++ b/Documentation/devicetree/bindings/display/loongson/loongson,display-controller.yaml
@@ -0,0 +1,58 @@
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/display/loongson/loongson,display-controller.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Loongson Display Controller Device Tree Bindings
+
+maintainers:
+  - Sui Jingfeng
+
+description: |+
+
+  The display controller is a PCI device, it has two display pipe.
+  For the DC in LS2K1000 each way has a DVO output interface which
+  provide RGB888 signals, vertical & horizontal synchronisations
+  and the pixel clock. Each CRTC is able to support 1920x1080@60Hz,
+  the maximum resolution is 2048x2048 according to the hardware spec.
+
+properties:
+  $nodename:
+    pattern: "^display-controller@[0-9a-f],[0-9a-f]$"
+
+  compatible:
+    oneOf:
+      - items:
+          - enum:
+              - loongson,ls2k1000-dc
+  reg:
+    maxItems: 1
+
+  interrupts:
+    maxItems: 1
+
+required:
+  - compatible
+  - reg
+  - interrupts
+
+additionalProperties: false
+
+examples:
+  - |
+    #include <dt-bindings/interrupt-controller/irq.h>
+    bus {
+
+        #address-cells = <3>;
+        #size-cells = <2>;
+        #interrupt-cells = <2>;
+
+        display-controller@6,0 {
+            compatible = "loongson,ls2k1000-dc";
+            reg = <0x3000 0x0 0x0 0x0 0x0>;
+            interrupts = <28 IRQ_TYPE_LEVEL_LOW>;
+        };
+    };
+
+...
-- 
2.34.1
[PATCH 1/2] Mips: ls2k1000: dts: add the display controller device node
The display controller is a PCI device; its PCI vendor ID is 0x0014, its PCI device ID is 0x7a06. Signed-off-by: suijingfeng --- .../boot/dts/loongson/loongson64-2k1000.dtsi | 21 +++ 1 file changed, 21 insertions(+) diff --git a/arch/mips/boot/dts/loongson/loongson64-2k1000.dtsi b/arch/mips/boot/dts/loongson/loongson64-2k1000.dtsi index 8143a6e3..a528af3977d9 100644 --- a/arch/mips/boot/dts/loongson/loongson64-2k1000.dtsi +++ b/arch/mips/boot/dts/loongson/loongson64-2k1000.dtsi @@ -31,6 +31,18 @@ memory@20 { <0x0001 0x1000 0x0001 0xb000>; /* 6912 MB at 4352MB */ }; + reserved-memory { + #address-cells = <2>; + #size-cells = <2>; + ranges; + + display_reserved: framebuffer@3000 { + compatible = "shared-dma-pool"; + reg = <0x0 0x3000 0x0 0x0400>; /* 64M */ + linux,cma-default; + }; + }; + cpu_clk: cpu_clk { #clock-cells = <0>; compatible = "fixed-clock"; @@ -198,6 +210,15 @@ sata@8,0 { interrupt-parent = <&liointc0>; }; + display-controller@6,0 { + compatible = "loongson,ls2k1000-dc"; + + reg = <0x3000 0x0 0x0 0x0 0x0>; + interrupts = <28 IRQ_TYPE_LEVEL_LOW>; + interrupt-parent = <&liointc0>; + memory-region = <&display_reserved>; + }; + pci_bridge@9,0 { compatible = "pci0014,7a19.0", "pci0014,7a19", -- 2.34.1
Re: [PATCH drm-next v2 05/16] drm: manager to keep track of GPUs VA mappings
On 2/22/23 16:14, Christian König wrote: Am 22.02.23 um 16:07 schrieb Danilo Krummrich: On 2/22/23 11:25, Christian König wrote: Am 17.02.23 um 14:44 schrieb Danilo Krummrich: +/** + * DOC: Overview + * + * The DRM GPU VA Manager, represented by struct drm_gpuva_manager, keeps track + * of a GPU's virtual address (VA) space and manages the corresponding virtual + * mappings represented by &drm_gpuva objects. It also keeps track of the + * mapping's backing &drm_gem_object buffers. + * + * &drm_gem_object buffers maintain a list (and a corresponding list lock) of + * &drm_gpuva objects representing all existent GPU VA mappings using this + * &drm_gem_object as backing buffer. + * + * If the &DRM_GPUVA_MANAGER_REGIONS feature is enabled, a GPU VA mapping can + * only be created within a previously allocated &drm_gpuva_region, which + * represents a reserved portion of the GPU VA space. GPU VA mappings are not + * allowed to span over a &drm_gpuva_region's boundary. + * + * GPU VA regions can also be flagged as sparse, which allows drivers to create + * sparse mappings for a whole GPU VA region in order to support Vulkan + * 'Sparse Resources'. Well, since we have now found that there is absolutely no technical reason for having those regions, could we please drop them? I disagree that this was the outcome of our previous discussion. In nouveau I still need them to track the separate sparse page tables and, as you confirmed previously, Nvidia cards are not the only cards supporting this feature. The second reason is that with regions we can avoid merging between buffers, which saves some effort. However, I agree that this argument by itself probably doesn't hold too much, since you've pointed out in a previous mail that: 1) If we merge and decide to only do that inside certain boundaries then those boundaries need to be provided and checked against. This burns quite some CPU cycles. 2) If we just merge what we can we might have extra page table updates which cost time and could result in undesired side effects. 3) If we don't merge at all we have additional housekeeping for the mappings and maybe hw restrictions. However, if a driver uses regions to track its separate sparse page tables anyway it gets 1) for free, which is a nice synergy. I totally agree that regions aren't for everyone though. Hence, I made them an optional feature and by default regions are disabled. In order to use them, drm_gpuva_manager_init() must be called with the DRM_GPUVA_MANAGER_REGIONS feature flag. I really would not want to open code regions or have two GPUVA manager instances in nouveau to track sparse page tables. That would be really messy, hence I hope we can agree on this being an optional feature. I absolutely don't think that this is a good idea then. This separate handling of sparse page tables is completely Nouveau specific. Actually, I rely on what you said in a previous mail when I say it's, potentially, not specific to nouveau. This sounds similar to what AMD hw used to have up until gfx8 (I think); basically sparse resources were defined through a separate mechanism to the address resolution of the page tables. I won't rule out that other hardware has similar approaches. Even when it's an optional feature, mixing this into the common handling is exactly what I pointed out as not properly separating between hardware specific and hardware agnostic functionality. Optionally having regions is *not* a hardware specific concept; drivers might use it for a hardware specific purpose though.
Which potentially is the case for almost every DRM helper. Drivers can use regions only for the sake of not merging between buffer boundaries as well. Some drivers might prefer this over "never merge" or "always merge", depending on the cost of re-organizing page tables for unnecessary splits/merges, without having the need to track separate sparse page tables. It's just that I think *if* a driver needs to track separate sparse page tables anyway, it's a nice synergy, since then there is no extra cost for getting this optimization. This is exactly the problem we ran into with TTM as well and I've spent a massive amount of time to clean that up again. > Admittedly, I don't know what problems you are referring to. However, I don't see what kind of trouble it could cause to allow drivers to track regions optionally. Regards, Christian. I don't really see a need for them any more. Regards, Christian.
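To picture the opt-in: regions only come into existence if a driver asks for them at init time. A rough sketch - the exact drm_gpuva_manager_init() parameter list in the posted series is elided here, and the driver struct is hypothetical:

/* Driver that tracks sparse page tables per region: opt in. */
drm_gpuva_manager_init(&drv->va_mgr, /* ... name, VA range, ops ... */
                       DRM_GPUVA_MANAGER_REGIONS);

/* Driver that wants neither sparse regions nor merge boundaries:
 * pass no feature flags and regions stay disabled (the default). */
drm_gpuva_manager_init(&drv->va_mgr, /* ... name, VA range, ops ... */ 0);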
Re: [PATCH drm-next v2 04/16] maple_tree: add flag MT_FLAGS_LOCK_NONE
On Wed, Feb 22, 2023 at 05:11:34PM +0100, Danilo Krummrich wrote: > On 2/21/23 19:31, Matthew Wilcox wrote: > > On Tue, Feb 21, 2023 at 03:37:49PM +0100, Danilo Krummrich wrote: > > > It feels a bit weird that I, as a user of the API, would need to lock > > > certain > > > (or all?) mas_*() functions with the internal spinlock in order to protect > > > (future) internal features of the tree, such as the slab cache > > > defragmentation > > > you mentioned. Because from my perspective, as the generic component that > > > tells > > > its users (the drivers) to take care of locking VA space operations (and > > > hence > > > tree operations) I don't have a purpose of my own for this internal spinlock, > > > right? > > > > You don't ... but we can't know that. > > Thanks for the clarification. I think I should now know what to do for the > GPUVA manager in terms of locking the maple tree in general. > > Though I still have very limited insight into the maple tree, I want to share > some further thoughts. > > From what I got so far it really seems to me that it would be better to just > take the internal spinlock for both APIs (normal and advanced) whenever you > need to internally. No. Really, no. The point of the advanced API is that it's a toolbox for doing the operation you want, but isn't a generic enough operation to be part of the normal API. To take an example from the radix tree days, in the page cache, we need to walk a range of the tree, looking for any entries that are marked as DIRTY, clear the DIRTY mark and set the TOWRITE mark. There was a horrendous function called radix_tree_range_tag_if_tagged() which did exactly this. Now look at the implementation of tag_pages_for_writeback(); it's a simple loop over a range with an occasional pause to check whether we need to reschedule. But that means you need to know how to use the toolbox. Some of the tools are dangerous and you can cut yourself on them. > Another plus would probably be maintainability. Once you got quite a few > maple tree users using external locks (either in the sense of calling I don't want maple tree users using external locks. That exists because it was the only reasonable way of converting the VMA tree from the rbtree to the maple tree. I intend to get rid of mt_set_external_lock(). The VMAs are eventually going to be protected by the internal spinlock.
Re: [PATCH drm-next v2 04/16] maple_tree: add flag MT_FLAGS_LOCK_NONE
On 2/21/23 19:31, Matthew Wilcox wrote: On Tue, Feb 21, 2023 at 03:37:49PM +0100, Danilo Krummrich wrote: On Mon, Feb 20, 2023 at 08:33:35PM +, Matthew Wilcox wrote: On Mon, Feb 20, 2023 at 06:06:03PM +0100, Danilo Krummrich wrote: On 2/20/23 16:10, Matthew Wilcox wrote: This is why we like people to use the spinlock embedded in the tree. There's nothing for the user to care about. If the access really is serialised, acquiring/releasing the uncontended spinlock is a minimal cost compared to all the other things that will happen while modifying the tree. I think as for the users of the GPUVA manager we'd have two cases: 1) Accesses to the manager (and hence the tree) are serialized, no lock needed. 2) Multiple operations on the tree must be locked in order to make them appear atomic. Could you give an example here of what you'd like to do? Ideally something complicated so I don't say "Oh, you can just do this" when there's a more complex example for which "this" won't work. I'm sure that's embedded somewhere in the next 20-odd patches, but it's probably quicker for you to describe in terms of tree operations that have to appear atomic than for me to try to figure it out. Absolutely, not gonna ask you to read all of that. :-) One thing the GPUVA manager does is to provide drivers the (sub-)operations that need to be processed in order to fulfill a map or unmap request from userspace. For instance, when userspace asks the driver to map some memory the GPUVA manager calculates which existing mappings must be removed, split up or can be merged with the newly requested mapping. A driver has two ways to fetch those operations from the GPUVA manager. It can either obtain a list of operations or receive a callback for each operation generated by the GPUVA manager. In both cases the GPUVA manager walks the maple tree, which keeps track of existing mappings, for the given range in __drm_gpuva_sm_map() (only considering the map case, since the unmap case is a subset basically). For each mapping found in the given range the driver, as mentioned, either receives a callback or a list entry is added to the list of operations. Typically, for each operation / callback one entry within the maple tree is removed and, optionally at the beginning and end of a new mapping's range, a new entry is inserted. And of course, as the last operation, there is the new mapping itself to insert. The GPUVA manager delegates locking responsibility to the drivers. Typically, a driver either serializes access to the VA space managed by the GPUVA manager (no lock needed) or needs to lock the processing of a full set of operations generated by the GPUVA manager. OK, that all makes sense. It does make sense to have the driver use its own mutex and then take the spinlock inside the maple tree code. It shouldn't ever be contended. In either case the embedded spinlock wouldn't be useful, we'd either need an external lock or no lock at all. If there are any internal reasons why specific tree operations must be mutually excluded (such as those you explain below), wouldn't it make more sense to always have the internal lock and, optionally, allow users to specify an external lock additionally? So the way this works for the XArray, which is a little older than the Maple tree, is that we always use the internal spinlock for modifications (possibly BH or IRQ safe), and if someone wants to use an external mutex to make some callers atomic with respect to each other, they're free to do so. 
In that case, the XArray doesn't check the user's external locking at all, because it really can't know. I'd advise taking that approach; if there's really no way to use the internal spinlock to make your complicated updates appear atomic then just let the maple tree use its internal spinlock, and you can also use your external mutex however you like. That sounds like the right thing to do. However, I'm using the advanced API of the maple tree (and that's the reason why the above example appears a little more detailed than needed) because I think with the normal API I can't insert / remove tree entries while walking the tree, right? Right. The normal API is for simple operations while the advanced API is for doing compound operations. As per the documentation the advanced API, however, doesn't take care of locking itself, hence just letting the maple tree use its internal spinlock doesn't really work - I need to take care of that myself, right? Yes; once you're using the advanced API, you get to compose the entire operation yourself. It feels a bit weird that I, as a user of the API, would need to lock certain (or all?) mas_*() functions with the internal spinlock in order to protect (future) internal features of the tree, such as the slab cache defragmentation you mentioned. Because from my perspective, as the generic component that tells its users (the drivers) to take care of locking VA space operations (and hence tree operations) I don't have a purpose of my own for this internal spinlock, right?
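The locking scheme Matthew describes is easy to show with the normal API, where each mtree_*() call takes the internal spinlock by itself and an external mutex only groups calls into an atomic-looking whole. A sketch:

#include <linux/maple_tree.h>
#include <linux/mutex.h>

static DEFINE_MTREE(va_tree);           /* internal spinlock lives in the tree */
static DEFINE_MUTEX(va_mutex);          /* external lock for compound updates */

/* Replace all entries overlapping [start, last] with one new entry.
 * The external mutex is what makes the erase-then-store sequence appear
 * atomic to other VA space users; the tree's own spinlock still guards
 * each individual modification. */
static int replace_range(unsigned long start, unsigned long last, void *new)
{
        unsigned long index = start;
        void *entry;
        int err;

        mutex_lock(&va_mutex);
        mt_for_each(&va_tree, entry, index, last)
                mtree_erase(&va_tree, index);
        err = mtree_store_range(&va_tree, start, last, new, GFP_KERNEL);
        mutex_unlock(&va_mutex);
        return err;
}

With the advanced mas_*() API the composition is the same, except the caller takes the internal spinlock explicitly (mas_lock()/mas_unlock()) around the compound walk, which is exactly the obligation being discussed above.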
Re: [PATCH] drm/msm: return early when allocating fbdev fails
Hi Am 22.02.23 um 16:56 schrieb Tom Rix: building with clang and W=1 reports drivers/gpu/drm/msm/msm_fbdev.c:144:6: error: variable 'helper' is used uninitialized whenever 'if' condition is true [-Werror,-Wsometimes-uninitialized] if (!fbdev) ^~ helper is only initialized after the fbdev allocation succeeds, so it is in a garbage state at the fail: label. There is nothing to unwind if the fbdev allocation fails, so return NULL. Fixes: 3fb1f62f80a1 ("drm/fb-helper: Remove drm_fb_helper_unprepare() from drm_fb_helper_fini()") Signed-off-by: Tom Rix Already fixed here: https://lore.kernel.org/dri-devel/08e3340e-b459-0e60-4bba-30716b675...@suse.de/T/#t Best regards Thomas --- drivers/gpu/drm/msm/msm_fbdev.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/msm/msm_fbdev.c b/drivers/gpu/drm/msm/msm_fbdev.c index c804e5ba682a..c1356aff87da 100644 --- a/drivers/gpu/drm/msm/msm_fbdev.c +++ b/drivers/gpu/drm/msm/msm_fbdev.c @@ -142,7 +142,7 @@ struct drm_fb_helper *msm_fbdev_init(struct drm_device *dev) fbdev = kzalloc(sizeof(*fbdev), GFP_KERNEL); if (!fbdev) - goto fail; + return NULL; helper = &fbdev->base; -- Thomas Zimmermann Graphics Driver Developer SUSE Software Solutions Germany GmbH Maxfeldstr. 5, 90409 Nürnberg, Germany (HRB 36809, AG Nürnberg) Geschäftsführer: Ivo Totev
[PATCH] drm/vgem: Drop struct drm_vgem_gem_object
Commit 45d9c8dde4cd ("drm/vgem: use shmem helpers") introduced shmem helpers to vgem and with that, removed all uses of the struct drm_vgem_gem_object. So, as the struct is no longer used, delete it. Signed-off-by: Maíra Canal --- drivers/gpu/drm/vgem/vgem_drv.h | 11 --- 1 file changed, 11 deletions(-) diff --git a/drivers/gpu/drm/vgem/vgem_drv.h b/drivers/gpu/drm/vgem/vgem_drv.h index 0ed300317f87..34cf63e6fb3d 100644 --- a/drivers/gpu/drm/vgem/vgem_drv.h +++ b/drivers/gpu/drm/vgem/vgem_drv.h @@ -39,17 +39,6 @@ struct vgem_file { struct mutex fence_mutex; }; -#define to_vgem_bo(x) container_of(x, struct drm_vgem_gem_object, base) -struct drm_vgem_gem_object { - struct drm_gem_object base; - - struct page **pages; - unsigned int pages_pin_count; - struct mutex pages_lock; - - struct sg_table *table; -}; - int vgem_fence_open(struct vgem_file *file); int vgem_fence_attach_ioctl(struct drm_device *dev, void *data, -- 2.39.2
Re: [Intel-xe] [PATCH 1/3] drm/suballoc: Introduce a generic suballocation manager
On 2/22/23 15:20, Christian König wrote: Am 22.02.23 um 14:54 schrieb Thomas Hellström: Hi, On 2/22/23 12:39, Christian König wrote: Hi Thomas, Am 22.02.23 um 12:00 schrieb Thomas Hellström: Hi, Christian, So I resurrected Maarten's previous patch series around this (the amdgpu suballocator), slightly modified the code to match the API of this patch series, re-introduced the per-allocation alignment as per a previous review comment from you on that series, made checkpatch.pl pass mostly, except for pre-existing style problems, and added / fixed some comments. No memory corruption seen so far on limited Xe testing. To move this forward I suggest starting with that as a common drm suballocator. I'll post the series later today. We can follow up with potential simplifications if needed. I also made a kunit test that reports some timing information. Will post that as a follow up. Some interesting preliminary conclusions: * drm_mm is per se not a CPU hog. If the rb tree processing is disabled and the EVICT algorithm is changed from MRU to ring-like LRU traversal, it's more or less just as fast as the ring suballocator. * With a single ring, and the suballocation buffer never completely filled (no sleeps) the amd suballocator is a bit faster per allocation / free. (Around 250 ns instead of 350). Allocation is slightly slower on the amdgpu one, freeing is faster, mostly due to the locking overhead incurred when setting up the fence callbacks, and for avoiding irq-disabled processing on the one I proposed. For some more realistic numbers try to signal the fence from another CPU. Alternatively you can invalidate all the CPU read cache lines touched by the fence callback so that they need to be read in again from the allocating CPU. Fences are signalled using hr-timer driven fake "ring"s, so should probably be distributed among CPUs in a pretty realistic way. But anyway I agree results obtained from that kunit test can and should be challenged before we actually use them for improvements. I would double check that. My expectation is that hr-timers execute by default on the CPU from which they are started. Hmm, since not using the _PINNED hrtimer flag, I'd expect them to be more distributed, but you're right, they weren't - only rather few timer_expires came from other CPUs. So figures for signalling on other CPUs are around 500 ns for the amdgpu variant and around 900 ns for the fence-callback one. Still, sleeping starts around 50-75% fill with the amdgpu variant. * With multiple rings and varying allocation sizes and signalling times creating fragmentation, the picture becomes different, as the amdgpu allocator starts to sleep/throttle already around 50%-75% fill; the one I proposed between 75% and 90% fill, and once that happens, the CPU cost of putting to sleep and waking up should really overshadow the above numbers. So it's really a tradeoff, where IMO also code size and maintainability should play a role. Also I looked at the history of the amdgpu allocator originating back to Radeon 2012-ish, but couldn't find any commits mentioning fence callbacks nor problems with those. Could you point me to that discussion? Uff, that was ~10 years ago. I don't think I can find that again. OK, fair enough. But what was the objective reasoning against using fence callbacks for this sort of stuff - was it unforeseen locking problems, caching issues or something else? Well, cache line bouncing is one major problem. 
Also take a look at the discussion about using list_head in interrupt handlers; that should be easy to find on LWN. The allocator usually manages enough memory so that it never runs into waiting for anything; only in extreme cases like GPU resets do we actually wait for allocations to be freed. I guess this varies with the application, but can be remedied with just adding more managed memory if needed. /Thomas So the only cache line which is accessed from more than one CPU should be the signaled flag of the fence. With moving list work into the interrupt handler you have at least 3 cache lines which start to bounce between different CPUs. Regards, Christian. Thanks, Thomas Regards, Christian. Thanks, Thomas On 2/17/23 14:51, Thomas Hellström wrote: On 2/17/23 14:18, Christian König wrote: Am 17.02.23 um 14:10 schrieb Thomas Hellström: [SNIP] Any chance you could do a quick performance comparison? If not, anything against merging this without the amd / radeon changes until we can land a simpler allocator? Only if you can stick the allocator inside Xe and not drm, because this seems to be for a different use case than the allocators inside radeon/amdgpu. Hmm. No, it's allocating in a ring-like fashion as well. Let me put together a unit test for benchmarking. I think it would be a failure for the community to end up with three separate suballocators doing the exact same thing for the same problem.
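The cache line argument is easiest to see by looking at where the reclaim work runs. With lazy, allocation-time reclaim - roughly the amdgpu-style scheme debated above - all bookkeeping stays on the allocating CPU and only the fence's signaled state is read cross-CPU; a fence callback instead pulls the free-list manipulation onto whichever CPU signals the fence. A sketch with hypothetical bookkeeping structures:

#include <linux/dma-fence.h>
#include <linux/list.h>

/* Hypothetical per-hole bookkeeping for a ring suballocator. */
struct sa_hole {
        struct list_head list;
        struct dma_fence *fence;        /* guards reuse of this range */
        unsigned int offset, size;
};

/* Lazy reclaim at allocation time: no callback, no irq-context list
 * work - the allocating CPU polls fence state and only its own cache
 * lines move. */
static bool sa_hole_reusable(struct sa_hole *hole)
{
        if (!hole->fence)
                return true;
        if (!dma_fence_is_signaled(hole->fence))
                return false;
        dma_fence_put(hole->fence);
        hole->fence = NULL;
        return true;
}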
[PATCH] drm/msm: return early when allocating fbdev fails
building with clang and W=1 reports drivers/gpu/drm/msm/msm_fbdev.c:144:6: error: variable 'helper' is used uninitialized whenever 'if' condition is true [-Werror,-Wsometimes-uninitialized] if (!fbdev) ^~ helper is only initialized after the fbdev allocation succeeds, so it is in a garbage state at the fail: label. There is nothing to unwind if the fbdev allocation fails, so return NULL. Fixes: 3fb1f62f80a1 ("drm/fb-helper: Remove drm_fb_helper_unprepare() from drm_fb_helper_fini()") Signed-off-by: Tom Rix --- drivers/gpu/drm/msm/msm_fbdev.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/msm/msm_fbdev.c b/drivers/gpu/drm/msm/msm_fbdev.c index c804e5ba682a..c1356aff87da 100644 --- a/drivers/gpu/drm/msm/msm_fbdev.c +++ b/drivers/gpu/drm/msm/msm_fbdev.c @@ -142,7 +142,7 @@ struct drm_fb_helper *msm_fbdev_init(struct drm_device *dev) fbdev = kzalloc(sizeof(*fbdev), GFP_KERNEL); if (!fbdev) - goto fail; + return NULL; helper = &fbdev->base; -- 2.27.0
Re: [PATCH v4 10/14] drm/vblank: Add helper to get next vblank time
On Wed, Feb 22, 2023 at 07:44:42AM -0800, Rob Clark wrote: > On Wed, Feb 22, 2023 at 1:57 AM Pekka Paalanen wrote: > > > > On Tue, 21 Feb 2023 09:50:20 -0800 > > Rob Clark wrote: > > > > > On Tue, Feb 21, 2023 at 5:01 AM Ville Syrjälä > > > wrote: > > > > > > > > On Tue, Feb 21, 2023 at 10:45:51AM +0200, Pekka Paalanen wrote: > > > > > On Mon, 20 Feb 2023 07:55:41 -0800 > > > > > Rob Clark wrote: > > > > > > > > > > > On Mon, Feb 20, 2023 at 1:08 AM Pekka Paalanen > > > > > > wrote: > > > > > > > > > > > > > > On Sat, 18 Feb 2023 13:15:53 -0800 > > > > > > > Rob Clark wrote: > > > > > > > > > > > > > > > From: Rob Clark > > > > > > > > > > > > > > > > Will be used in the next commit to set a deadline on fences > > > > > > > > that an > > > > > > > > atomic update is waiting on. > > > > > > > > > > > > > > > > Signed-off-by: Rob Clark > > > > > > > > --- > > > > > > > > drivers/gpu/drm/drm_vblank.c | 32 > > > > > > > > > > > > > > > > include/drm/drm_vblank.h | 1 + > > > > > > > > 2 files changed, 33 insertions(+) > > > > > > > > > > > > > > > > diff --git a/drivers/gpu/drm/drm_vblank.c > > > > > > > > b/drivers/gpu/drm/drm_vblank.c > > > > > > > > index 2ff31717a3de..caf25ebb34c5 100644 > > > > > > > > --- a/drivers/gpu/drm/drm_vblank.c > > > > > > > > +++ b/drivers/gpu/drm/drm_vblank.c > > > > > > > > @@ -980,6 +980,38 @@ u64 drm_crtc_vblank_count_and_time(struct > > > > > > > > drm_crtc *crtc, > > > > > > > > } > > > > > > > > EXPORT_SYMBOL(drm_crtc_vblank_count_and_time); > > > > > > > > > > > > > > > > +/** > > > > > > > > + * drm_crtc_next_vblank_time - calculate the time of the next > > > > > > > > vblank > > > > > > > > + * @crtc: the crtc for which to calculate next vblank time > > > > > > > > + * @vblanktime: pointer to time to receive the next vblank > > > > > > > > timestamp. > > > > > > > > + * > > > > > > > > + * Calculate the expected time of the next vblank based on > > > > > > > > time of previous > > > > > > > > + * vblank and frame duration > > > > > > > > > > > > > > Hi, > > > > > > > > > > > > > > for VRR this targets the highest frame rate possible for the > > > > > > > current > > > > > > > VRR mode, right? > > > > > > > > > > > > > > > > > > > It is based on vblank->framedur_ns which is in turn based on > > > > > > mode->crtc_clock. Presumably for VRR that ends up being a maximum? > > > > > > > > > > I don't know. :-) > > > > > > > > At least for i915 this will give you the maximum frame > > > > duration. > > > > > > I suppose one could argue that maximum frame duration is the actual > > > deadline. Anything less is just moar fps, but not going to involve > > > stalling until vblank N+1, AFAIU > > > > > > > Also this does not calculate the the start of vblank, it > > > > calculates the start of active video. > > > > > > Probably something like end of previous frame's video.. might not be > > > _exactly_ correct (because some buffering involved), but OTOH on the > > > GPU side, I expect the driver to set a timer for a few ms or so before > > > the deadline. So there is some wiggle room. > > > > The vblank timestamp is defined to be the time of the first active > > pixel of the frame in the video signal. At least that's the one that > > UAPI carries (when not tearing?). It is not the start of vblank period. > > > > With VRR, the front porch before the first active pixel can be multiple > > milliseconds. The difference between 144 Hz and 60 Hz is 9.7 ms for > > example. > > What we really want is the deadline for the hw to latch for the next > frame.. 
which as Ville pointed out is definitely before the end of > vblank. > > Honestly this sort of feature is a lot more critical for the non-VRR > case, and VRR is kind of a minority edge case. So I'd prefer not to > get too hung up on VRR. If there is an easy way for the helpers to > detect VRR, I'd be perfectly fine not setting a deadline hint in that > case, and let someone who actually has a VRR display figure out how to > handle that case. The formula I gave you earlier works for both VRR and non-VRR. -- Ville Syrjälä Intel
Re: [PATCH v4 03/14] dma-buf/fence-chain: Add fence deadline support
On Wed, Feb 22, 2023 at 2:27 AM Tvrtko Ursulin wrote: > > > On 18/02/2023 21:15, Rob Clark wrote: > > From: Rob Clark > > > > Propagate the deadline to all the fences in the chain. > > > > Signed-off-by: Rob Clark > > Reviewed-by: Christian König for this one. > > --- > > drivers/dma-buf/dma-fence-chain.c | 13 + > > 1 file changed, 13 insertions(+) > > > > diff --git a/drivers/dma-buf/dma-fence-chain.c > > b/drivers/dma-buf/dma-fence-chain.c > > index a0d920576ba6..4684874af612 100644 > > --- a/drivers/dma-buf/dma-fence-chain.c > > +++ b/drivers/dma-buf/dma-fence-chain.c > > @@ -206,6 +206,18 @@ static void dma_fence_chain_release(struct dma_fence > > *fence) > > dma_fence_free(fence); > > } > > > > + > > +static void dma_fence_chain_set_deadline(struct dma_fence *fence, > > + ktime_t deadline) > > +{ > > + dma_fence_chain_for_each(fence, fence) { > > + struct dma_fence_chain *chain = to_dma_fence_chain(fence); > > + struct dma_fence *f = chain ? chain->fence : fence; > > Low level comment - above two lines could be replaced with: > > struct dma_fence *f = dma_fence_chain_contained(fence); > > Although to be fair I am not sure that wouldn't be making it less > readable. From the point of view that fence might not be a chain, so > dma_fence_chain_contained() reads a bit dodgy as an API. It does seem to be the canonical way to do it since 18f5fad275ef ("dma-buf: add dma_fence_chain_contained helper").. looks like I missed that when I rebased. I guess for consistency it's best that I use the helper. BR, -R > Regards, > > Tvrtko > > > + > > + dma_fence_set_deadline(f, deadline); > > + } > > +} > > + > > const struct dma_fence_ops dma_fence_chain_ops = { > > .use_64bit_seqno = true, > > .get_driver_name = dma_fence_chain_get_driver_name, > > @@ -213,6 +225,7 @@ const struct dma_fence_ops dma_fence_chain_ops = { > > .enable_signaling = dma_fence_chain_enable_signaling, > > .signaled = dma_fence_chain_signaled, > > .release = dma_fence_chain_release, > > + .set_deadline = dma_fence_chain_set_deadline, > > }; > > EXPORT_SYMBOL(dma_fence_chain_ops); > >
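For reference, the loop rewritten with the helper Tvrtko points at would look roughly as follows; dma_fence_chain_contained() returns the fence wrapped by a chain link, or the fence itself when it is not a chain link:

static void dma_fence_chain_set_deadline(struct dma_fence *fence,
                                         ktime_t deadline)
{
        dma_fence_chain_for_each(fence, fence) {
                /* Unwrap chain links; plain fences pass through unchanged. */
                dma_fence_set_deadline(dma_fence_chain_contained(fence),
                                       deadline);
        }
}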
Re: [PATCH v4 11/14] drm/atomic-helper: Set fence deadline for vblank
On Wed, Feb 22, 2023 at 2:46 AM Luben Tuikov wrote: > > On 2023-02-18 16:15, Rob Clark wrote: > > From: Rob Clark > > > > For an atomic commit updating a single CRTC (ie. a pageflip) calculate > > the next vblank time, and inform the fence(s) of that deadline. > > > > v2: Comment typo fix (danvet) > > > > Signed-off-by: Rob Clark > > Reviewed-by: Daniel Vetter > > Signed-off-by: Rob Clark > > --- > > drivers/gpu/drm/drm_atomic_helper.c | 36 + > > 1 file changed, 36 insertions(+) > > > > diff --git a/drivers/gpu/drm/drm_atomic_helper.c > > b/drivers/gpu/drm/drm_atomic_helper.c > > index d579fd8f7cb8..35a4dc714920 100644 > > --- a/drivers/gpu/drm/drm_atomic_helper.c > > +++ b/drivers/gpu/drm/drm_atomic_helper.c > > @@ -1511,6 +1511,40 @@ void drm_atomic_helper_commit_modeset_enables(struct > > drm_device *dev, > > } > > EXPORT_SYMBOL(drm_atomic_helper_commit_modeset_enables); > > > > +/* > > + * For atomic updates which touch just a single CRTC, calculate the time > > of the > > + * next vblank, and inform all the fences of the deadline. > > + */ > > +static void set_fence_deadline(struct drm_device *dev, > > +struct drm_atomic_state *state) > > +{ > > + struct drm_crtc *crtc, *wait_crtc = NULL; > > + struct drm_crtc_state *new_crtc_state; > > + struct drm_plane *plane; > > + struct drm_plane_state *new_plane_state; > > + ktime_t vbltime; > > I've not looked at the latest language spec, but AFAIR "vbltime" > would be uninitialized here. Has this changed? > > > + int i; > > + > > + for_each_new_crtc_in_state (state, crtc, new_crtc_state, i) { > > + if (wait_crtc) > > + return; > > + wait_crtc = crtc; > > + } > > + > > + /* If no CRTCs updated, then nothing to do: */ > > + if (!wait_crtc) > > + return; > > + > > + if (drm_crtc_next_vblank_time(wait_crtc, &vbltime)) > > + return; > > We have a problem here in that we're adding the time remaining to the next > vblank event to an uninitialized local variable. As per my comment on patch > 10, > I'd rather drm_crtc_next_vblank_time() yield the time remaining to the vblank > event, > and we can do the arithmetic locally here in this function. if drm_crtc_next_vblank_time() returns 0 then it has initialized vbltime, so no problem here BR, -R > -- > Regards, > Luben > > > + > > + for_each_new_plane_in_state (state, plane, new_plane_state, i) { > > + if (!new_plane_state->fence) > > + continue; > > + dma_fence_set_deadline(new_plane_state->fence, vbltime); > > + } > > +} > > + > > /** > > * drm_atomic_helper_wait_for_fences - wait for fences stashed in plane > > state > > * @dev: DRM device > > @@ -1540,6 +1574,8 @@ int drm_atomic_helper_wait_for_fences(struct > > drm_device *dev, > > struct drm_plane_state *new_plane_state; > > int i, ret; > > > > + set_fence_deadline(dev, state); > > + > > for_each_new_plane_in_state(state, plane, new_plane_state, i) { > > if (!new_plane_state->fence) > > continue; >
Re: [PATCH v4 10/14] drm/vblank: Add helper to get next vblank time
On Wed, Feb 22, 2023 at 2:37 AM Luben Tuikov wrote: > > On 2023-02-18 16:15, Rob Clark wrote: > > From: Rob Clark > > > > Will be used in the next commit to set a deadline on fences that an > > atomic update is waiting on. > > > > Signed-off-by: Rob Clark > > --- > > drivers/gpu/drm/drm_vblank.c | 32 > > include/drm/drm_vblank.h | 1 + > > 2 files changed, 33 insertions(+) > > > > diff --git a/drivers/gpu/drm/drm_vblank.c b/drivers/gpu/drm/drm_vblank.c > > index 2ff31717a3de..caf25ebb34c5 100644 > > --- a/drivers/gpu/drm/drm_vblank.c > > +++ b/drivers/gpu/drm/drm_vblank.c > > @@ -980,6 +980,38 @@ u64 drm_crtc_vblank_count_and_time(struct drm_crtc > > *crtc, > > } > > EXPORT_SYMBOL(drm_crtc_vblank_count_and_time); > > > > +/** > > + * drm_crtc_next_vblank_time - calculate the time of the next vblank > > + * @crtc: the crtc for which to calculate next vblank time > > + * @vblanktime: pointer to time to receive the next vblank timestamp. > > + * > > + * Calculate the expected time of the next vblank based on time of previous > > + * vblank and frame duration > > + */ > > +int drm_crtc_next_vblank_time(struct drm_crtc *crtc, ktime_t *vblanktime) > > +{ > > + unsigned int pipe = drm_crtc_index(crtc); > > + struct drm_vblank_crtc *vblank = &crtc->dev->vblank[pipe]; > > + u64 count; > > + > > + if (!vblank->framedur_ns) > > + return -EINVAL; > > + > > + count = drm_vblank_count_and_time(crtc->dev, pipe, vblanktime); > > + > > + /* > > + * If we don't get a valid count, then we probably also don't > > + * have a valid time: > > + */ > > + if (!count) > > + return -EINVAL; > > + > > + *vblanktime = ktime_add(*vblanktime, > > ns_to_ktime(vblank->framedur_ns)); > > I'd rather this not do any arithmetic, i.e. add, and simply return the > calculated > remaining time, i.e. time left--so instead of add, it would simply assign > the remaining time, and possibly rename the vblanktime to something like > "time_to_vblank." > Note that since I sent the last iteration, I've renamed it to drm_crtc_next_vblank_start(). I would prefer to keep the arithmetic, because I have another use for this helper in drm/msm (for async/cursor updates, where we want to set an hrtimer for start of vblank). It is a bit off the topic of this series, but I can include the patch when I repost. BR, -R > Changing the top comment to "calculate the time remaining to the next vblank". > -- > Regards, > Luben > > > + > > + return 0; > > +} > > +EXPORT_SYMBOL(drm_crtc_next_vblank_time); > > + > > static void send_vblank_event(struct drm_device *dev, > > struct drm_pending_vblank_event *e, > > u64 seq, ktime_t now) > > diff --git a/include/drm/drm_vblank.h b/include/drm/drm_vblank.h > > index 733a3e2d1d10..a63bc2c92f3c 100644 > > --- a/include/drm/drm_vblank.h > > +++ b/include/drm/drm_vblank.h > > @@ -230,6 +230,7 @@ bool drm_dev_has_vblank(const struct drm_device *dev); > > u64 drm_crtc_vblank_count(struct drm_crtc *crtc); > > u64 drm_crtc_vblank_count_and_time(struct drm_crtc *crtc, > > ktime_t *vblanktime); > > +int drm_crtc_next_vblank_time(struct drm_crtc *crtc, ktime_t *vblanktime); > > void drm_crtc_send_vblank_event(struct drm_crtc *crtc, > > struct drm_pending_vblank_event *e); > > void drm_crtc_arm_vblank_event(struct drm_crtc *crtc, >
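The msm use case mentioned above also shows why returning an absolute ktime_t is convenient: it can be fed straight into an absolute-mode hrtimer. A sketch, with the timer being assumed driver state and using this revision's drm_crtc_next_vblank_time() name:

#include <drm/drm_vblank.h>
#include <linux/hrtimer.h>

static struct hrtimer frame_timer;      /* assumed: callback kicks the update */

static void arm_for_next_vblank(struct drm_crtc *crtc)
{
        ktime_t vbltime;

        if (drm_crtc_next_vblank_time(crtc, &vbltime))
                return;         /* vblank machinery not running */

        /* Absolute-time timer firing at the predicted next vblank. */
        hrtimer_start(&frame_timer, vbltime, HRTIMER_MODE_ABS);
}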
Re: [PATCH v4 10/14] drm/vblank: Add helper to get next vblank time
On Wed, Feb 22, 2023 at 1:57 AM Pekka Paalanen wrote: > > On Tue, 21 Feb 2023 09:50:20 -0800 > Rob Clark wrote: > > > On Tue, Feb 21, 2023 at 5:01 AM Ville Syrjälä > > wrote: > > > > > > On Tue, Feb 21, 2023 at 10:45:51AM +0200, Pekka Paalanen wrote: > > > > On Mon, 20 Feb 2023 07:55:41 -0800 > > > > Rob Clark wrote: > > > > > > > > > On Mon, Feb 20, 2023 at 1:08 AM Pekka Paalanen > > > > > wrote: > > > > > > > > > > > > On Sat, 18 Feb 2023 13:15:53 -0800 > > > > > > Rob Clark wrote: > > > > > > > > > > > > > From: Rob Clark > > > > > > > > > > > > > > Will be used in the next commit to set a deadline on fences that > > > > > > > an > > > > > > > atomic update is waiting on. > > > > > > > > > > > > > > Signed-off-by: Rob Clark > > > > > > > --- > > > > > > > drivers/gpu/drm/drm_vblank.c | 32 > > > > > > > > > > > > > > include/drm/drm_vblank.h | 1 + > > > > > > > 2 files changed, 33 insertions(+) > > > > > > > > > > > > > > diff --git a/drivers/gpu/drm/drm_vblank.c > > > > > > > b/drivers/gpu/drm/drm_vblank.c > > > > > > > index 2ff31717a3de..caf25ebb34c5 100644 > > > > > > > --- a/drivers/gpu/drm/drm_vblank.c > > > > > > > +++ b/drivers/gpu/drm/drm_vblank.c > > > > > > > @@ -980,6 +980,38 @@ u64 drm_crtc_vblank_count_and_time(struct > > > > > > > drm_crtc *crtc, > > > > > > > } > > > > > > > EXPORT_SYMBOL(drm_crtc_vblank_count_and_time); > > > > > > > > > > > > > > +/** > > > > > > > + * drm_crtc_next_vblank_time - calculate the time of the next > > > > > > > vblank > > > > > > > + * @crtc: the crtc for which to calculate next vblank time > > > > > > > + * @vblanktime: pointer to time to receive the next vblank > > > > > > > timestamp. > > > > > > > + * > > > > > > > + * Calculate the expected time of the next vblank based on time > > > > > > > of previous > > > > > > > + * vblank and frame duration > > > > > > > > > > > > Hi, > > > > > > > > > > > > for VRR this targets the highest frame rate possible for the current > > > > > > VRR mode, right? > > > > > > > > > > > > > > > > It is based on vblank->framedur_ns which is in turn based on > > > > > mode->crtc_clock. Presumably for VRR that ends up being a maximum? > > > > > > > > I don't know. :-) > > > > > > At least for i915 this will give you the maximum frame > > > duration. > > > > I suppose one could argue that maximum frame duration is the actual > > deadline. Anything less is just moar fps, but not going to involve > > stalling until vblank N+1, AFAIU > > > > > Also this does not calculate the the start of vblank, it > > > calculates the start of active video. > > > > Probably something like end of previous frame's video.. might not be > > _exactly_ correct (because some buffering involved), but OTOH on the > > GPU side, I expect the driver to set a timer for a few ms or so before > > the deadline. So there is some wiggle room. > > The vblank timestamp is defined to be the time of the first active > pixel of the frame in the video signal. At least that's the one that > UAPI carries (when not tearing?). It is not the start of vblank period. > > With VRR, the front porch before the first active pixel can be multiple > milliseconds. The difference between 144 Hz and 60 Hz is 9.7 ms for > example. What we really want is the deadline for the hw to latch for the next frame.. which as Ville pointed out is definitely before the end of vblank. Honestly this sort of feature is a lot more critical for the non-VRR case, and VRR is kind of a minority edge case. So I'd prefer not to get too hung up on VRR. 
If there is an easy way for the helpers to detect VRR, I'd be perfectly fine not setting a deadline hint in that case, and let someone who actually has a VRR display figure out how to handle that case. BR, -R
Re: [PATCH 0/2] Add quirk to disable PSR 2 on Tongfang PHxTxX1 and PHxTQx1
Am 22.02.23 um 15:17 schrieb Werner Sembach: On these barebones PSR 2 is recognized as supported but is very buggy: - The upper third of the screen does sometimes not update, resulting in disappearing cursors or ghosts of already closed windows staying behind. - Approximately 40 px from the bottom edge, a 3 pixel wide strip of randomly colored pixels is flickering. PSR 1 is working fine however. This patchset introduces a new quirk to disable PSR 2 specifically on known buggy devices and applies it to the Tongfang PHxTxX1 and PHxTQx1 barebones. Signed-off-by: Werner Sembach Cc: Parallel to this there is a patch fixing the root cause of this issue: https://gitlab.freedesktop.org/drm/intel/-/issues/7347#note_1785094 So this quirk might only be relevant for stable kernels, depending on when that other patch gets merged.
Re: [PATCH v4 06/14] dma-buf/sync_file: Support (E)POLLPRI
On Wed, Feb 22, 2023 at 1:49 AM Pekka Paalanen wrote: > > On Tue, 21 Feb 2023 09:53:56 -0800 > Rob Clark wrote: > > > On Tue, Feb 21, 2023 at 8:48 AM Luben Tuikov wrote: > > > > > > On 2023-02-20 11:14, Rob Clark wrote: > > > > On Mon, Feb 20, 2023 at 12:53 AM Pekka Paalanen > > > > wrote: > > > >> > > > >> On Sat, 18 Feb 2023 13:15:49 -0800 > > > >> Rob Clark wrote: > > > >> > > > >>> From: Rob Clark > > > >>> > > > >>> Allow userspace to use the EPOLLPRI/POLLPRI flag to indicate an urgent > > > >>> wait (as opposed to a "housekeeping" wait to know when to cleanup > > > >>> after > > > >>> some work has completed). Usermode components of GPU driver stacks > > > >>> often poll() on fence fd's to know when it is safe to do things like > > > >>> free or reuse a buffer, but they can also poll() on a fence fd when > > > >>> waiting to read back results from the GPU. The EPOLLPRI/POLLPRI flag > > > >>> lets the kernel differentiate these two cases. > > > >>> > > > >>> Signed-off-by: Rob Clark > > > >> > > > >> Hi, > > > >> > > > >> where would the UAPI documentation of this go? > > > >> It seems to be missing. > > > > > > > > Good question, I am not sure. The poll() man page has a description, > > > > but my usage doesn't fit that _exactly_ (but OTOH the description is a > > > > bit vague). > > > > > > > >> If a Wayland compositor is polling application fences to know which > > > >> client buffer to use in its rendering, should the compositor poll with > > > >> PRI or not? If a compositor polls with PRI, then all fences from all > > > >> applications would always be PRI. Would that be harmful somehow or > > > >> would it be beneficial? > > > > > > > > I think a compositor would rather use the deadline ioctl and then poll > > > > without PRI. Otherwise you are giving an urgency signal to the fence > > > > signaller which might not necessarily be needed. > > > > > > > > The places where I expect PRI to be useful is more in mesa (things > > > > like glFinish(), readpix, and other similar sorts of blocking APIs) > > > Hi, > > > > > > Hmm, but then user-space could do the opposite, namely, submit work as > > > usual--never > > > using the SET_DEADLINE ioctl, and then at the end, poll using (E)POLLPRI. > > > That seems > > > like a possible usage pattern, unintended--maybe, but possible. Do we > > > want to discourage > > > this? Wouldn't SET_DEADLINE be enough? I mean, one can call SET_DEADLINE > > > with the current > > > time, and then wouldn't that be equivalent to (E)POLLPRI? > > > > Yeah, (E)POLLPRI isn't strictly needed if we have SET_DEADLINE. It is > > slightly more convenient if you want an immediate deadline (single > > syscall instead of two), but not strictly needed. OTOH it piggy-backs > > on existing UABI. > > In that case, I would be conservative, and not add the POLLPRI > semantics. An UAPI addition that is not strictly needed and somewhat > unclear if it violates any design principles is best not done, until it > is proven to be beneficial. > > Besides, a Wayland compositor does not necessary need to add the fd > to its main event loop for poll. It could just SET_DEADLINE, and then > when it renders simply check if the fence passed or not already. Not > polling means the compositor does not need to wake up at the moment the > fence signals to just record a flag. poll(POLLPRI) isn't intended for wayland.. but is a thing I want in mesa for fence waits. 
I _could_ use SET_DEADLINE but it is two syscalls and correspondingly more code ;-) > On another matter, if the application uses SET_DEADLINE with one > timestamp, and the compositor uses SET_DEADLINE on the same thing with > another timestamp, what should happen? The expectation is that many deadline hints can be set on a fence. The fence signaller should track the soonest deadline. BR, -R > Maybe it's a soft-realtime app whose primary goal is not display, and > it needs the result faster than the window server? > > Maybe SET_DEADLINE should set the deadline only to an earlier timestamp > and never later? > > > Thanks, > pq
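To spell out the "one syscall instead of two" point, the userspace side of an urgent wait is plain poll(2) on the sync_file fd; under the semantics proposed in this series, adding POLLPRI marks the wait as urgent:

#include <poll.h>

/* Block until the fence fd signals, flagging the wait as urgent so the
 * kernel can propagate an immediate deadline to the signaler. POLLIN is
 * the ordinary sync_file wakeup; POLLPRI is the new urgency hint. */
static int wait_fence_urgent(int fence_fd, int timeout_ms)
{
        struct pollfd pfd = {
                .fd = fence_fd,
                .events = POLLPRI | POLLIN,
        };

        return poll(&pfd, 1, timeout_ms);
}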
Re: [PATCH v5 09/14] drm/syncobj: Add deadline support for syncobj waits
On Wed, Feb 22, 2023 at 6:06 AM Rodrigo Vivi wrote: > > On Wed, Feb 22, 2023 at 12:09:04PM +0200, Pekka Paalanen wrote: > > On Tue, 21 Feb 2023 09:25:18 -0800 > > Rob Clark wrote: > > > > > On Tue, Feb 21, 2023 at 12:53 AM Pekka Paalanen > > > wrote: > > > > > > > > On Mon, 20 Feb 2023 12:18:56 -0800 > > > > Rob Clark wrote: > > > > > > > > > From: Rob Clark > > > > > > > > > > Add a new flag to let userspace provide a deadline as a hint for > > > > > syncobj > > > > > and timeline waits. This gives a hint to the driver signaling the > > > > > backing fences about how soon userspace needs it to complete work, so > > > > > it > > > > > can adjust GPU frequency accordingly. An immediate deadline can be > > > > > given to provide something equivalent to i915 "wait boost". > > > > > > > > > > v2: Use absolute u64 ns value for deadline hint, drop cap and driver > > > > > feature flag in favor of allowing count_handles==0 as a way for > > > > > userspace to probe kernel for support of new flag > > > > > > > > > > Signed-off-by: Rob Clark > > > > > --- > > > > > drivers/gpu/drm/drm_syncobj.c | 59 > > > > > +++ > > > > > include/uapi/drm/drm.h| 5 +++ > > > > > 2 files changed, 51 insertions(+), 13 deletions(-) > > > > > > > > ... > > > > > > > > > diff --git a/include/uapi/drm/drm.h b/include/uapi/drm/drm.h > > > > > index 642808520d92..aefc8cc743e0 100644 > > > > > --- a/include/uapi/drm/drm.h > > > > > +++ b/include/uapi/drm/drm.h > > > > > @@ -887,6 +887,7 @@ struct drm_syncobj_transfer { > > > > > #define DRM_SYNCOBJ_WAIT_FLAGS_WAIT_ALL (1 << 0) > > > > > #define DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT (1 << 1) > > > > > #define DRM_SYNCOBJ_WAIT_FLAGS_WAIT_AVAILABLE (1 << 2) /* wait for > > > > > time point to become available */ > > > > > +#define DRM_SYNCOBJ_WAIT_FLAGS_WAIT_DEADLINE (1 << 3) /* set fence > > > > > deadline based to deadline_nsec/sec */ > > > > > > > > Hi, > > > > > > > > where is the UAPI documentation explaining what is a "fence deadline" > > > > and what setting it does here? > > > > > > It's with the rest of the drm_syncobj UAPI docs ;-) > > > > Is that > > https://www.kernel.org/doc/html/latest/driver-api/dma-buf.html#dma-fence-uabi-sync-file > > ? > > > > That whole page never mentions e.g. WAIT_AVAILABLE, so at least the > > flags are not there. Does not mention syncobj_wait either. Sorry, that was a snarky reference to thin UABI docs about syncobj. The kernel side of it is better documented. > probably this: > https://docs.kernel.org/gpu/drm-mm.html > > the new one needs to be added there as well. > > > > > I could ask where the real non-IGT userspace is or the plan for it, > > too, since this is new DRM UAPI. > > yeap, it looks like we need to close on this... > https://gitlab.freedesktop.org/drm/intel/-/issues/8014 > > I confess I got lost on the many discussions and on how this will > be used. Is mesa going to set the deadline based on the vk priority? > > Will this continue to be internal? I didn't get the broader picture > I'm afraid... Yes, the plan is to use it from mesa vk-common helpers (and elsewhere in mesa if needed). There is a separate discussion[1] about limiting/allowing boost (perhaps based on ctx flag or cgroups) but that is more about how drivers react to the deadline hint. 
The immediate goal of this patch is just to fix the regression mentioned in that gitlab issue when using syncobj waits instead of DRM_IOCTL_I915_GEM_WAIT BR, -R [1] https://www.spinics.net/lists/dri-devel/msg383075.html > > > > > > Thanks, > > pq > > > > > > > > BR, > > > -R > > > > > > > btw. no nsec/sec anymore. > > > > > > > > > > > > Thanks, > > > > pq > > > > > > > > > > > > > struct drm_syncobj_wait { > > > > > __u64 handles; > > > > > /* absolute timeout */ > > > > > @@ -895,6 +896,8 @@ struct drm_syncobj_wait { > > > > > __u32 flags; > > > > > __u32 first_signaled; /* only valid when not waiting all */ > > > > > __u32 pad; > > > > > + /* Deadline hint to set on backing fence(s) in CLOCK_MONOTONIC: > > > > > */ > > > > > + __u64 deadline_ns; > > > > > }; > > > > > > > > > > struct drm_syncobj_timeline_wait { > > > > > @@ -907,6 +910,8 @@ struct drm_syncobj_timeline_wait { > > > > > __u32 flags; > > > > > __u32 first_signaled; /* only valid when not waiting all */ > > > > > __u32 pad; > > > > > + /* Deadline hint to set on backing fence(s) in CLOCK_MONOTONIC: > > > > > */ > > > > > + __u64 deadline_ns; > > > > > }; > > > > > > > > > > > > > > > > > >
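Going by the v2 uapi quoted earlier in the thread (absolute CLOCK_MONOTONIC nanoseconds in deadline_ns, support probed via a count_handles == 0 call), the userspace side would look roughly like this; the fd and handle are placeholders:

#include <stdint.h>
#include <sys/ioctl.h>
#include <drm/drm.h>

static int syncobj_wait_with_deadline(int drm_fd, uint32_t handle,
                                      int64_t timeout_ns, uint64_t deadline_ns)
{
        struct drm_syncobj_wait wait = {
                .handles = (uintptr_t)&handle,
                .timeout_nsec = timeout_ns,
                .count_handles = 1,
                .flags = DRM_SYNCOBJ_WAIT_FLAGS_WAIT_DEADLINE,
                .deadline_ns = deadline_ns,     /* hint for the signaler */
        };

        return ioctl(drm_fd, DRM_IOCTL_SYNCOBJ_WAIT, &wait);
}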
Re: [PATCH v4 01/14] dma-buf/dma-fence: Add deadline awareness
Am 22.02.23 um 11:23 schrieb Tvrtko Ursulin: On 18/02/2023 21:15, Rob Clark wrote: From: Rob Clark Add a way to hint to the fence signaler of an upcoming deadline, such as vblank, which the fence waiter would prefer not to miss. This is to aid the fence signaler in making power management decisions, like boosting frequency as the deadline approaches, and awareness of missed deadlines so that it can be factored into the frequency scaling. v2: Drop dma_fence::deadline and related logic to filter duplicate deadlines, to avoid increasing dma_fence size. The fence-context implementation will need similar logic to track deadlines of all the fences on the same timeline. [ckoenig] v3: Clarify locking wrt. set_deadline callback Signed-off-by: Rob Clark Reviewed-by: Christian König --- drivers/dma-buf/dma-fence.c | 20 include/linux/dma-fence.h | 20 2 files changed, 40 insertions(+) diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c index 0de0482cd36e..763b32627684 100644 --- a/drivers/dma-buf/dma-fence.c +++ b/drivers/dma-buf/dma-fence.c @@ -912,6 +912,26 @@ dma_fence_wait_any_timeout(struct dma_fence **fences, uint32_t count, } EXPORT_SYMBOL(dma_fence_wait_any_timeout); + +/** + * dma_fence_set_deadline - set desired fence-wait deadline + * @fence: the fence that is to be waited on + * @deadline: the time by which the waiter hopes for the fence to be + * signaled + * + * Inform the fence signaler of an upcoming deadline, such as vblank, by + * which point the waiter would prefer the fence to be signaled. This + * is intended to give feedback to the fence signaler to aid in power + * management decisions, such as boosting GPU frequency if a periodic + * vblank deadline is approaching. + */ +void dma_fence_set_deadline(struct dma_fence *fence, ktime_t deadline) +{ + if (fence->ops->set_deadline && !dma_fence_is_signaled(fence)) + fence->ops->set_deadline(fence, deadline); +} +EXPORT_SYMBOL(dma_fence_set_deadline); + /** * dma_fence_describe - Dump fence description into seq_file * @fence: the fence to describe diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h index 775cdc0b4f24..d77f6591c453 100644 --- a/include/linux/dma-fence.h +++ b/include/linux/dma-fence.h @@ -99,6 +99,7 @@ enum dma_fence_flag_bits { DMA_FENCE_FLAG_SIGNALED_BIT, DMA_FENCE_FLAG_TIMESTAMP_BIT, DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT, + DMA_FENCE_FLAG_HAS_DEADLINE_BIT, Would this bit be better left out from the core implementation, given that the approach is for the component which implements dma-fence to track the actual deadline and all? Also taking a step back - are we all okay with starting to expand the relatively simple core synchronisation primitive with side channel data like this? What would be the criteria for what side channel data would be acceptable? Taking note that the thing lives outside drivers/gpu/. I had similar concerns and it took me a moment as well to understand the background why this is necessary. I essentially don't see any other approach we could take. Yes, this is GPU/CRTC specific, but we somehow need a common interface for communicating it between drivers and that's the dma_fence object as far as I can see. Regards, Christian. 
Regards, Tvrtko DMA_FENCE_FLAG_USER_BITS, /* must always be last member */ }; @@ -257,6 +258,23 @@ struct dma_fence_ops { */ void (*timeline_value_str)(struct dma_fence *fence, char *str, int size); + + /** + * @set_deadline: + * + * Callback to allow a fence waiter to inform the fence signaler of + * an upcoming deadline, such as vblank, by which point the waiter + * would prefer the fence to be signaled. This is intended to + * give feedback to the fence signaler to aid in power management + * decisions, such as boosting GPU frequency. + * + * This is called without &dma_fence.lock held; it can be called + * multiple times and from any context. Locking is up to the callee + * if it has some state to manage. + * + * This callback is optional. + */ + void (*set_deadline)(struct dma_fence *fence, ktime_t deadline); }; void dma_fence_init(struct dma_fence *fence, const struct dma_fence_ops *ops, @@ -583,6 +601,8 @@ static inline signed long dma_fence_wait(struct dma_fence *fence, bool intr) return ret < 0 ? ret : 0; } +void dma_fence_set_deadline(struct dma_fence *fence, ktime_t deadline); + struct dma_fence *dma_fence_get_stub(void); struct dma_fence *dma_fence_allocate_private_stub(void); u64 dma_fence_context_alloc(unsigned num);
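On the v3 locking clarification quoted above (the callback runs without &dma_fence.lock held and may be invoked repeatedly from any context), the callee manages its own state; a natural policy is to remember only the earliest deadline seen. A hypothetical signaler-side sketch, wired up via .set_deadline in the driver's dma_fence_ops:

#include <linux/dma-fence.h>
#include <linux/spinlock.h>

/* Hypothetical fence wrapper with callee-managed deadline state. */
struct example_fence {
        struct dma_fence base;
        spinlock_t deadline_lock;
        ktime_t deadline;               /* earliest deadline seen so far */
        bool has_deadline;
};

static void example_fence_set_deadline(struct dma_fence *f, ktime_t deadline)
{
        struct example_fence *ef = container_of(f, struct example_fence, base);
        unsigned long flags;

        spin_lock_irqsave(&ef->deadline_lock, flags);
        if (!ef->has_deadline || ktime_before(deadline, ef->deadline)) {
                ef->deadline = deadline;
                ef->has_deadline = true;
                /* e.g. queue work here to raise GPU clocks */
        }
        spin_unlock_irqrestore(&ef->deadline_lock, flags);
}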
Re: [PATCH v3 2/7] media: Add Y210, Y212 and Y216 formats
Hi Tomi, On Wednesday, 21 December 2022 at 11:24 +0200, Tomi Valkeinen wrote: > Add Y210, Y212 and Y216 formats. > > Signed-off-by: Tomi Valkeinen > --- > .../media/v4l/pixfmt-packed-yuv.rst | 49 ++- > drivers/media/v4l2-core/v4l2-ioctl.c | 3 ++ > include/uapi/linux/videodev2.h| 8 +++ > 3 files changed, 58 insertions(+), 2 deletions(-) It seems you omitted to update v4l2-common.c; Ming Qian had made a duplicate commit for this, I'll ask him if he can keep the -common changes you forgot. > > diff --git a/Documentation/userspace-api/media/v4l/pixfmt-packed-yuv.rst > b/Documentation/userspace-api/media/v4l/pixfmt-packed-yuv.rst > index bf283a1b5581..24a771542059 100644 > --- a/Documentation/userspace-api/media/v4l/pixfmt-packed-yuv.rst > +++ b/Documentation/userspace-api/media/v4l/pixfmt-packed-yuv.rst > @@ -262,7 +262,12 @@ the second byte and Y'\ :sub:`7-0` in the third byte. > = > > These formats, commonly referred to as YUYV or YUY2, subsample the chroma > -components horizontally by 2, storing 2 pixels in 4 bytes. > +components horizontally by 2, storing 2 pixels in a container. The container > +is 32-bits for 8-bit formats, and 64-bits for 10+-bit formats. > + > +The packed YUYV formats with more than 8 bits per component are stored as > four > +16-bit little-endian words. Each word's most significant bits contain one > +component, and the least significant bits are zero padding. > > .. raw:: latex > > @@ -270,7 +275,7 @@ components horizontally by 2, storing 2 pixels in 4 bytes. > > .. tabularcolumns:: > |p{3.4cm}|p{1.2cm}|p{0.8cm}|p{0.8cm}|p{0.8cm}|p{0.8cm}|p{0.8cm}|p{0.8cm}|p{0.8cm}|p{0.8cm}| > > -.. flat-table:: Packed YUV 4:2:2 Formats > +.. flat-table:: Packed YUV 4:2:2 Formats in 32-bit container > :header-rows: 1 > :stub-columns: 0 > > @@ -337,6 +342,46 @@ components horizontally by 2, storing 2 pixels in 4 > bytes. >- Y'\ :sub:`3` >- Cb\ :sub:`2` > > +.. tabularcolumns:: > |p{3.4cm}|p{1.2cm}|p{0.8cm}|p{0.8cm}|p{0.8cm}|p{0.8cm}|p{0.8cm}|p{0.8cm}|p{0.8cm}|p{0.8cm}| > + > +.. flat-table:: Packed YUV 4:2:2 Formats in 64-bit container > +:header-rows: 1 > +:stub-columns: 0 > + > +* - Identifier > + - Code > + - Word 0 > + - Word 1 > + - Word 2 > + - Word 3 > +* .. _V4L2-PIX-FMT-Y210: > + > + - ``V4L2_PIX_FMT_Y210`` > + - 'Y210' > + > + - Y'\ :sub:`0` (bits 15-6) > + - Cb\ :sub:`0` (bits 15-6) > + - Y'\ :sub:`1` (bits 15-6) > + - Cr\ :sub:`0` (bits 15-6) > +* .. _V4L2-PIX-FMT-Y212: > + > + - ``V4L2_PIX_FMT_Y212`` > + - 'Y212' > + > + - Y'\ :sub:`0` (bits 15-4) > + - Cb\ :sub:`0` (bits 15-4) > + - Y'\ :sub:`1` (bits 15-4) > + - Cr\ :sub:`0` (bits 15-4) > +* .. _V4L2-PIX-FMT-Y216: > + > + - ``V4L2_PIX_FMT_Y216`` > + - 'Y216' > + > + - Y'\ :sub:`0` (bits 15-0) > + - Cb\ :sub:`0` (bits 15-0) > + - Y'\ :sub:`1` (bits 15-0) > + - Cr\ :sub:`0` (bits 15-0) > + > .. 
raw:: latex > > \normalsize > diff --git a/drivers/media/v4l2-core/v4l2-ioctl.c > b/drivers/media/v4l2-core/v4l2-ioctl.c > index 875b9a95e3c8..a244d5181120 100644 > --- a/drivers/media/v4l2-core/v4l2-ioctl.c > +++ b/drivers/media/v4l2-core/v4l2-ioctl.c > @@ -1449,6 +1449,9 @@ static void v4l_fill_fmtdesc(struct v4l2_fmtdesc *fmt) > case V4L2_META_FMT_RK_ISP1_STAT_3A: descr = "Rockchip ISP1 3A > Statistics"; break; > case V4L2_PIX_FMT_NV12M_8L128: descr = "NV12M (8x128 Linear)"; break; > case V4L2_PIX_FMT_NV12M_10BE_8L128: descr = "10-bit NV12M (8x128 > Linear, BE)"; break; > + case V4L2_PIX_FMT_Y210: descr = "10-bit YUYV Packed"; break; > + case V4L2_PIX_FMT_Y212: descr = "12-bit YUYV Packed"; break; > + case V4L2_PIX_FMT_Y216: descr = "16-bit YUYV Packed"; break; > > default: > /* Compressed formats */ > diff --git a/include/uapi/linux/videodev2.h b/include/uapi/linux/videodev2.h > index 51d6a8aa4e17..403db3fb5cfa 100644 > --- a/include/uapi/linux/videodev2.h > +++ b/include/uapi/linux/videodev2.h > @@ -621,6 +621,14 @@ struct v4l2_pix_format { > #define V4L2_PIX_FMT_YUVX32 v4l2_fourcc('Y', 'U', 'V', 'X') /* 32 > YUVX-8-8-8-8 */ > #define V4L2_PIX_FMT_M420v4l2_fourcc('M', '4', '2', '0') /* 12 YUV > 4:2:0 2 lines y, 1 line uv interleaved */ > > +/* > + * YCbCr packed format. For each Y2xx format, xx bits of valid data occupy > the MSBs > + * of the 16 bit components, and 16-xx bits of zero padding occupy the LSBs. > + */ > +#define V4L2_PIX_FMT_Y210v4l2_fourcc('Y', '2', '1', '0') /* 32 YUYV > 4:2:2 */ > +#define V4L2_PIX_FMT_Y212v4l2_fourcc('Y', '2', '1', '2') /* 32 YUYV > 4:2:2 */ > +#define V4L2_PIX_FMT_Y216v4l2_fourcc('Y', '2', '1', '6') /* 32 YUYV > 4:2:2 */ > + > /* two planes -- one Y, one Cr + Cb interleaved */ > #define V4L2_PIX_FMT_NV12v4l2_f
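The MSB-justified layout described in the tables is compact enough to state in code. A sketch for Y210 - 10 valid bits per component in bits 15-6 of each little-endian 16-bit word, zero padding below:

#include <stdint.h>

/* Pack two horizontally adjacent pixels (Y0 and Y1 share one Cb/Cr pair)
 * into the four 16-bit words of a Y210 container. Inputs are 10-bit
 * component values. */
static void pack_y210(uint16_t out[4], uint16_t y0, uint16_t cb,
                      uint16_t y1, uint16_t cr)
{
        out[0] = (uint16_t)(y0 << 6);
        out[1] = (uint16_t)(cb << 6);
        out[2] = (uint16_t)(y1 << 6);
        out[3] = (uint16_t)(cr << 6);
}

/* Unpack: drop the zero padding to recover the 10-bit component. */
static inline uint16_t y210_component(uint16_t word)
{
        return word >> 6;
}

Y212 and Y216 follow the same scheme with shifts of 4 and 0 respectively.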
Re: [PATCH] backlight: ktz8866: Convert to i2c's .probe_new()
On Sun, 12 Feb 2023, Uwe Kleine-König wrote: > Hello Lee, > > On Mon, Jan 30, 2023 at 09:42:01AM +, Lee Jones wrote: > > On Fri, 27 Jan 2023, Uwe Kleine-König wrote: > > > > > The probe function doesn't make use of the i2c_device_id * parameter so > > > it can be trivially converted. > > > > > > Signed-off-by: Uwe Kleine-König > > > --- > > > Hello, > > > > > > there is an ongoing effort to convert all drivers to .probe_new to > > > eventually drop .probe with the i2c_device_id parameter. This driver > > > currently sits in next so wasn't on my radar before. > > > > > > My plan is to tackle that after the next merge window. So I ask you to > > > either apply this patch during the next merge window or accept that it > > > will go in via the i2c tree together with the patch that drops .probe(). > > > > > > drivers/video/backlight/ktz8866.c | 5 ++--- > > > 1 file changed, 2 insertions(+), 3 deletions(-) > > > > Applied, thanks > > Thanks, this patch didn't make it into next yet though. Is there a > problem other than pending tests before you publish it in your tree that > gets pulled into next? It's queued, ready to be submitted tomorrow. Don't worry. -- Lee Jones [李琼斯]
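For readers unfamiliar with the conversion, it is mechanical: the unused i2c_device_id parameter goes away and the driver switches to the .probe_new member. A generic sketch, not the actual ktz8866 code:

#include <linux/i2c.h>

/* After the conversion: .probe_new drops the unused id parameter. */
static int example_probe(struct i2c_client *client)
{
        /* ... device setup as before, minus the id argument ... */
        return 0;
}

static struct i2c_driver example_driver = {
        .driver         = { .name = "example" },
        .probe_new      = example_probe,
};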
Re: [PATCH drm-next v2 05/16] drm: manager to keep track of GPUs VA mappings
On 22.02.23 16:07, Danilo Krummrich wrote: On 2/22/23 11:25, Christian König wrote: On 17.02.23 14:44, Danilo Krummrich wrote: +/** + * DOC: Overview + * + * The DRM GPU VA Manager, represented by struct drm_gpuva_manager keeps track + * of a GPU's virtual address (VA) space and manages the corresponding virtual + * mappings represented by &drm_gpuva objects. It also keeps track of the + * mapping's backing &drm_gem_object buffers. + * + * &drm_gem_object buffers maintain a list (and a corresponding list lock) of + * &drm_gpuva objects representing all existent GPU VA mappings using this + * &drm_gem_object as backing buffer. + * + * If the &DRM_GPUVA_MANAGER_REGIONS feature is enabled, a GPU VA mapping can + * only be created within a previously allocated &drm_gpuva_region, which + * represents a reserved portion of the GPU VA space. GPU VA mappings are not + * allowed to span over a &drm_gpuva_region's boundary. + * + * GPU VA regions can also be flagged as sparse, which allows drivers to create + * sparse mappings for a whole GPU VA region in order to support Vulkan + * 'Sparse Resources'. Well since we have now found that there is absolutely no technical reason for having those regions could we please drop them? I disagree that this was the outcome of our previous discussion. In nouveau I still need them to track the separate sparse page tables and, as you confirmed previously, Nvidia cards are not the only cards supporting this feature. The second reason is that with regions we can avoid merging between buffers, which saves some effort. However, I agree that this argument by itself probably doesn't hold too much, since you've pointed out in a previous mail that: 1) If we merge and decide to only do that inside certain boundaries then those boundaries need to be provided and checked against. This burns quite some CPU cycles. 2) If we just merge what we can we might have extra page table updates which cost time and could result in undesired side effects. 3) If we don't merge at all we have additional housekeeping for the mappings and maybe hw restrictions. However, if a driver uses regions to track its separate sparse page tables anyway it gets 1) for free, which is a nice synergy. I totally agree that regions aren't for everyone though. Hence, I made them an optional feature and by default regions are disabled. In order to use them, drm_gpuva_manager_init() must be called with the DRM_GPUVA_MANAGER_REGIONS feature flag. I really would not want to open-code regions or have two GPUVA manager instances in nouveau to track sparse page tables. That would be really messy, hence I hope we can agree on this to be an optional feature. I absolutely don't think that this is a good idea then. This separate handling of sparse page tables is completely Nouveau specific. Even when it's an optional feature, mixing this into the common handling is exactly what I pointed out as not properly separating between hardware-specific and hardware-agnostic functionality. This is exactly the problem we ran into with TTM as well, and I've spent a massive amount of time to clean that up again. Regards, Christian. I don't really see a need for them any more. Regards, Christian.
Re: [PATCH drm-next v2 05/16] drm: manager to keep track of GPUs VA mappings
On 2/22/23 11:25, Christian König wrote: On 17.02.23 14:44, Danilo Krummrich wrote: +/** + * DOC: Overview + * + * The DRM GPU VA Manager, represented by struct drm_gpuva_manager keeps track + * of a GPU's virtual address (VA) space and manages the corresponding virtual + * mappings represented by &drm_gpuva objects. It also keeps track of the + * mapping's backing &drm_gem_object buffers. + * + * &drm_gem_object buffers maintain a list (and a corresponding list lock) of + * &drm_gpuva objects representing all existent GPU VA mappings using this + * &drm_gem_object as backing buffer. + * + * If the &DRM_GPUVA_MANAGER_REGIONS feature is enabled, a GPU VA mapping can + * only be created within a previously allocated &drm_gpuva_region, which + * represents a reserved portion of the GPU VA space. GPU VA mappings are not + * allowed to span over a &drm_gpuva_region's boundary. + * + * GPU VA regions can also be flagged as sparse, which allows drivers to create + * sparse mappings for a whole GPU VA region in order to support Vulkan + * 'Sparse Resources'. Well since we have now found that there is absolutely no technical reason for having those regions could we please drop them? I disagree that this was the outcome of our previous discussion. In nouveau I still need them to track the separate sparse page tables and, as you confirmed previously, Nvidia cards are not the only cards supporting this feature. The second reason is that with regions we can avoid merging between buffers, which saves some effort. However, I agree that this argument by itself probably doesn't hold too much, since you've pointed out in a previous mail that: 1) If we merge and decide to only do that inside certain boundaries then those boundaries need to be provided and checked against. This burns quite some CPU cycles. 2) If we just merge what we can we might have extra page table updates which cost time and could result in undesired side effects. 3) If we don't merge at all we have additional housekeeping for the mappings and maybe hw restrictions. However, if a driver uses regions to track its separate sparse page tables anyway it gets 1) for free, which is a nice synergy. I totally agree that regions aren't for everyone though. Hence, I made them an optional feature and by default regions are disabled. In order to use them, drm_gpuva_manager_init() must be called with the DRM_GPUVA_MANAGER_REGIONS feature flag. I really would not want to open-code regions or have two GPUVA manager instances in nouveau to track sparse page tables. That would be really messy, hence I hope we can agree on this to be an optional feature. I don't really see a need for them any more. Regards, Christian.
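Since the opt-in nature of the feature is central to this disagreement, here is a purely hypothetical sketch based only on the names used in this thread; the series is still under review, so the real drm_gpuva_manager_init() signature may well differ, and the elided arguments are placeholders.

	struct drm_gpuva_manager mgr;

	/* A driver tracking sparse page tables via regions opts in: */
	drm_gpuva_manager_init(&mgr, /* ... driver-specific arguments ... */
			       DRM_GPUVA_MANAGER_REGIONS);

	/* Any other driver passes no feature flags and never sees regions: */
	drm_gpuva_manager_init(&mgr, /* ... driver-specific arguments ... */ 0);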
[PATCH 2/2] Apply quirk to disable PSR 2 on Tongfang PHxTxX1 and PHxTQx1
On these barebones, PSR 2 is recognized as supported but is very buggy: - The upper third of the screen is sometimes not updated, resulting in disappearing cursors or ghosts of already-closed windows staying behind. - Approximately 40 px from the bottom edge, a 3-pixel-wide strip of randomly colored pixels is flickering. PSR 1, however, works fine. Signed-off-by: Werner Sembach Cc: --- drivers/gpu/drm/i915/display/intel_quirks.c | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/drivers/gpu/drm/i915/display/intel_quirks.c b/drivers/gpu/drm/i915/display/intel_quirks.c index ce6d0fe6448f5..eeb32d3189f5c 100644 --- a/drivers/gpu/drm/i915/display/intel_quirks.c +++ b/drivers/gpu/drm/i915/display/intel_quirks.c @@ -65,6 +65,10 @@ static void quirk_no_pps_backlight_power_hook(struct drm_i915_private *i915) drm_info(&i915->drm, "Applying no pps backlight power quirk\n"); } +/* + * Tongfang PHxTxX1 and PHxTQx1 devices have support for PSR 2 but it is broken + * on Linux. PSR 1 however works just fine. + */ static void quirk_no_psr2(struct drm_i915_private *i915) { intel_set_quirk(i915, QUIRK_NO_PSR2); @@ -205,6 +209,10 @@ static struct intel_quirk intel_quirks[] = { /* ECS Liva Q2 */ { 0x3185, 0x1019, 0xa94d, quirk_increase_ddi_disabled_time }, { 0x3184, 0x1019, 0xa94d, quirk_increase_ddi_disabled_time }, + + /* Tongfang PHxTxX1 and PHxTQx1/TUXEDO InfinityBook 14 Gen6 */ + { 0x9a49, 0x1d05, 0x1105, quirk_no_psr2 }, + { 0x9a49, 0x1d05, 0x114c, quirk_no_psr2 }, }; void intel_init_quirks(struct drm_i915_private *i915) -- 2.34.1
[PATCH 0/2] Add quirk to disable PSR 2 on Tongfang PHxTxX1 and PHxTQx1
On these barebones, PSR 2 is recognized as supported but is very buggy: - The upper third of the screen is sometimes not updated, resulting in disappearing cursors or ghosts of already-closed windows staying behind. - Approximately 40 px from the bottom edge, a 3-pixel-wide strip of randomly colored pixels is flickering. PSR 1, however, works fine. This patchset introduces a new quirk to disable PSR 2 specifically on known-buggy devices and applies it to the Tongfang PHxTxX1 and PHxTQx1 barebones. Signed-off-by: Werner Sembach Cc:
[PATCH 1/2] Add quirk to disable PSR 2 on a per device basis
This adds the possibility to disable PSR 2 explicitly using the intel quirk table for devices where PSR 2 is borked, but PSR 1 works fine. Signed-off-by: Werner Sembach Cc: --- drivers/gpu/drm/i915/display/intel_psr.c| 4 +++- drivers/gpu/drm/i915/display/intel_quirks.c | 6 ++ drivers/gpu/drm/i915/display/intel_quirks.h | 1 + 3 files changed, 10 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/i915/display/intel_psr.c b/drivers/gpu/drm/i915/display/intel_psr.c index 5b678916e6db5..4607f3c4cf592 100644 --- a/drivers/gpu/drm/i915/display/intel_psr.c +++ b/drivers/gpu/drm/i915/display/intel_psr.c @@ -37,6 +37,7 @@ #include "intel_psr.h" #include "intel_snps_phy.h" #include "skl_universal_plane.h" +#include "intel_quirks.h" /** * DOC: Panel Self Refresh (PSR/SRD) @@ -851,7 +852,8 @@ static bool intel_psr2_config_valid(struct intel_dp *intel_dp, int crtc_vdisplay = crtc_state->hw.adjusted_mode.crtc_vdisplay; int psr_max_h = 0, psr_max_v = 0, max_bpp = 0; - if (!intel_dp->psr.sink_psr2_support) + if (!intel_dp->psr.sink_psr2_support || + intel_has_quirk(dev_priv, QUIRK_NO_PSR2)) return false; /* JSL and EHL only supports eDP 1.3 */ diff --git a/drivers/gpu/drm/i915/display/intel_quirks.c b/drivers/gpu/drm/i915/display/intel_quirks.c index 6e48d3bcdfec5..ce6d0fe6448f5 100644 --- a/drivers/gpu/drm/i915/display/intel_quirks.c +++ b/drivers/gpu/drm/i915/display/intel_quirks.c @@ -65,6 +65,12 @@ static void quirk_no_pps_backlight_power_hook(struct drm_i915_private *i915) drm_info(&i915->drm, "Applying no pps backlight power quirk\n"); } +static void quirk_no_psr2(struct drm_i915_private *i915) +{ + intel_set_quirk(i915, QUIRK_NO_PSR2); + drm_info(&i915->drm, "Applying No PSR 2 quirk\n"); +} + struct intel_quirk { int device; int subsystem_vendor; diff --git a/drivers/gpu/drm/i915/display/intel_quirks.h b/drivers/gpu/drm/i915/display/intel_quirks.h index 10a4d163149fd..2e0b788a44a1e 100644 --- a/drivers/gpu/drm/i915/display/intel_quirks.h +++ b/drivers/gpu/drm/i915/display/intel_quirks.h @@ -17,6 +17,7 @@ enum intel_quirk_id { QUIRK_INVERT_BRIGHTNESS, QUIRK_LVDS_SSC_DISABLE, QUIRK_NO_PPS_BACKLIGHT_POWER_HOOK, + QUIRK_NO_PSR2, }; void intel_init_quirks(struct drm_i915_private *i915); -- 2.34.1
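For context on how such an entry takes effect, the table is consumed at init time by intel_init_quirks(), which matches the PCI device and subsystem IDs roughly as in the abbreviated sketch below (paraphrased, not the exact upstream code); the Tongfang entries from patch 2/2 then select quirk_no_psr2() this way.

/* Abbreviated sketch of the matching loop in intel_init_quirks(). */
struct pci_dev *d = to_pci_dev(i915->drm.dev);
int i;

for (i = 0; i < ARRAY_SIZE(intel_quirks); i++) {
	struct intel_quirk *q = &intel_quirks[i];

	if (d->device == q->device &&
	    (d->subsystem_vendor == q->subsystem_vendor ||
	     q->subsystem_vendor == PCI_ANY_ID) &&
	    (d->subsystem_device == q->subsystem_device ||
	     q->subsystem_device == PCI_ANY_ID))
		q->hook(i915);	/* e.g. quirk_no_psr2() sets QUIRK_NO_PSR2 */
}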
Re: [Intel-xe] [PATCH 1/3] drm/suballoc: Introduce a generic suballocation manager
On 22.02.23 14:54, Thomas Hellström wrote: Hi, On 2/22/23 12:39, Christian König wrote: Hi Thomas, On 22.02.23 12:00, Thomas Hellström wrote: Hi, Christian, So I resurrected Maarten's previous patch series around this (the amdgpu suballocator), slightly modified the code to match the API of this patch series, re-introduced the per-allocation alignment as per a previous review comment from you on that series, and made checkpatch.pl pass mostly, except for pre-existing style problems, and added / fixed some comments. No memory corruption seen so far on limited Xe testing. To move this forward I suggest starting with that as a common drm suballocator. I'll post the series later today. We can follow up with potential simplifications if needed. I also made a kunit test reporting some timing information. Will post that as a follow-up. Some interesting preliminary conclusions: * drm_mm is per se not a CPU hog. If the rb-tree processing is disabled and the EVICT algorithm is changed from MRU to ring-like LRU traversal, it's more or less just as fast as the ring suballocator. * With a single ring, and the suballocation buffer never completely filled (no sleeps), the amd suballocator is a bit faster per allocation / free. (Around 250 ns instead of 350.) Allocation is slightly slower on the amdgpu one, freeing is faster, mostly due to the locking overhead incurred when setting up the fence callbacks, and for avoiding irq-disabled processing on the one I proposed. For some more realistic numbers try to signal the fence from another CPU. Alternatively, you can invalidate all the CPU read cache lines touched by the fence callback so that they need to be read in again from the allocating CPU. Fences are signalled using hr-timer driven fake "ring"s, so should probably be distributed among cpus in a pretty realistic way. But anyway I agree results obtained from that kunit test can and should be challenged before we actually use them for improvements. I would double-check that. My expectation is that hr-timers execute by default on the CPU from which they are started. * With multiple rings and varying allocation sizes and signalling times creating fragmentation, the picture becomes different, as the amdgpu allocator starts to sleep/throttle already around 50%-75% fill, the one I proposed between 75% and 90% fill, and once that happens, the CPU cost of putting to sleep and waking up should really shadow the above numbers. So it's really a tradeoff, where IMO also code size and maintainability should play a role. Also I looked at the history of the amdgpu allocator originating back to Radeon 2012-ish, but couldn't find any commits mentioning fence callbacks nor problems with those. Could you point me to that discussion? Uff, that was ~10 years ago. I don't think I can find that again. OK, fair enough. But what was the objective reasoning against using fence callbacks for this sort of stuff, was it unforeseen locking problems, caching issues or something else? Well, cache line bouncing is one major problem. Also take a look at the discussion about using list_head in interrupt handlers, that should be easy to find on LWN. The allocator usually manages enough memory so that it never runs into waiting for anything; only in extreme cases like GPU resets do we actually wait for allocations to be freed. So the only cache line which is accessed from more than one CPU should be the signaled flag of the fence. 
With moving list work into the interrupt handler you have at least 3 cache lines which start to bounce between different CPUs. Regards, Christian. Thanks, Thomas Regards, Christian. Thanks, Thomas On 2/17/23 14:51, Thomas Hellström wrote: On 2/17/23 14:18, Christian König wrote: On 17.02.23 14:10, Thomas Hellström wrote: [SNIP] Any chance you could do a quick performance comparison? If not, anything against merging this without the amd / radeon changes until we can land a simpler allocator? Only if you can stick the allocator inside Xe and not drm, because this seems to be for a different use case than the allocators inside radeon/amdgpu. Hmm. No, it's allocating in a ring-like fashion as well. Let me put together a unit test for benchmarking. I think it would be a failure for the community to end up with three separate suballocators doing the exact same thing for the same problem, really. Well, exactly that's the point. Those allocators aren't the same because they handle different problems. The allocator in radeon is simpler because it only had to deal with a limited number of fence timelines. The one in amdgpu is a bit more complex because of the added complexity for more fence timelines. We could take the one from amdgpu and use it for radeon and others as well, but the allocator proposed here doesn't even remotely match the requirements. But again, what *are* those missing requirements
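To make the "ring-like" allocation pattern under discussion concrete, below is a minimal illustrative sketch (neither the amdgpu nor the proposed drm code; all names are invented). It assumes allocations are retired strictly in submission order, i.e. the fence of the oldest allocation signals first, which is what lets the scheme avoid any search structure and keep per-allocation cost in the few-hundred-nanosecond ballpark quoted above.

struct ring_sa {
	unsigned int size;	/* bytes in the backing buffer */
	unsigned int head;	/* next byte to hand out */
	unsigned int tail;	/* start of the oldest live allocation */
};

/*
 * Carve 'len' bytes out of the ring; returns the offset, or -1 when
 * the caller must wait for the oldest fence and retire its allocation
 * first (the sleep/throttle case measured above).
 */
static int ring_sa_alloc(struct ring_sa *sa, unsigned int len)
{
	if (sa->head >= sa->tail) {
		/* Unwrapped: free space is [head, size) plus [0, tail). */
		if (sa->head + len <= sa->size) {
			sa->head += len;
			return sa->head - len;
		}
		/* Wrap around; [head, size) becomes wasted padding. */
		if (len < sa->tail) {
			sa->head = len;
			return 0;
		}
	} else if (sa->head + len < sa->tail) {
		/* Wrapped: free space is the gap [head, tail). */
		sa->head += len;
		return sa->head - len;
	}
	return -1;
}

/* Called in FIFO order once an allocation's fence has signalled. */
static void ring_sa_retire(struct ring_sa *sa, unsigned int offset,
			   unsigned int len)
{
	sa->tail = offset + len;
	if (sa->tail == sa->head)
		sa->head = sa->tail = 0;	/* ring is empty again */
	else if (sa->tail >= sa->size)
		sa->tail = 0;
}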
Re: [PATCH v12 03/10] drm/display: Add Type-C switch helpers
On Wed, Feb 22, 2023 at 04:53:41PM +0800, Pin-yen Lin wrote: > On Tue, Feb 21, 2023 at 7:48 PM Andy Shevchenko > > On Tue, Feb 21, 2023 at 05:50:47PM +0800, Pin-yen Lin wrote: ... > > > #include > > > #include > > > +#include > > > > I don't see users of this. > > But a few forward declarations are missing. > > I can put a `struct typec_mux_dev;` here, but there is also a usage of > typec_mux_set_fn_t. > > Should I add the typedef into this header file as well? No, inclusion is fine, it's me who didn't notice the type in use. -- With Best Regards, Andy Shevchenko
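For readers skimming the thread, the header-hygiene point being settled here looks like this in practice (illustrative snippet; the struct and field names below are hypothetical):

/* A type used only through pointers can be forward declared: */
struct typec_mux_dev;

struct port_data {
	struct typec_mux_dev *mux;	/* pointer member: no include needed */
};

/*
 * A typedef such as typec_mux_set_fn_t cannot be forward declared, so a
 * header that actually uses it must pull in the real header, which is
 * why keeping the inclusion is the right call here.
 */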
Re: [PATCH v5 09/14] drm/syncobj: Add deadline support for syncobj waits
On Wed, Feb 22, 2023 at 12:09:04PM +0200, Pekka Paalanen wrote: > On Tue, 21 Feb 2023 09:25:18 -0800 > Rob Clark wrote: > > > On Tue, Feb 21, 2023 at 12:53 AM Pekka Paalanen wrote: > > > > > > On Mon, 20 Feb 2023 12:18:56 -0800 > > > Rob Clark wrote: > > > > > > > From: Rob Clark > > > > > > > > Add a new flag to let userspace provide a deadline as a hint for syncobj > > > > and timeline waits. This gives a hint to the driver signaling the > > > > backing fences about how soon userspace needs it to complete work, so it > > > > can adjust GPU frequency accordingly. An immediate deadline can be > > > > given to provide something equivalent to i915 "wait boost". > > > > > > > > v2: Use absolute u64 ns value for deadline hint, drop cap and driver > > > > feature flag in favor of allowing count_handles==0 as a way for > > > > userspace to probe kernel for support of new flag > > > > > > > > Signed-off-by: Rob Clark > > > > --- > > > > drivers/gpu/drm/drm_syncobj.c | 59 +++ > > > > include/uapi/drm/drm.h| 5 +++ > > > > 2 files changed, 51 insertions(+), 13 deletions(-) > > > > > > ... > > > > > > > diff --git a/include/uapi/drm/drm.h b/include/uapi/drm/drm.h > > > > index 642808520d92..aefc8cc743e0 100644 > > > > --- a/include/uapi/drm/drm.h > > > > +++ b/include/uapi/drm/drm.h > > > > @@ -887,6 +887,7 @@ struct drm_syncobj_transfer { > > > > #define DRM_SYNCOBJ_WAIT_FLAGS_WAIT_ALL (1 << 0) > > > > #define DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT (1 << 1) > > > > #define DRM_SYNCOBJ_WAIT_FLAGS_WAIT_AVAILABLE (1 << 2) /* wait for > > > > time point to become available */ > > > > +#define DRM_SYNCOBJ_WAIT_FLAGS_WAIT_DEADLINE (1 << 3) /* set fence > > > > deadline based to deadline_nsec/sec */ > > > > > > Hi, > > > > > > where is the UAPI documentation explaining what is a "fence deadline" > > > and what setting it does here? > > > > It's with the rest of the drm_syncobj UAPI docs ;-) > > Is that > https://www.kernel.org/doc/html/latest/driver-api/dma-buf.html#dma-fence-uabi-sync-file > ? > > That whole page never mentions e.g. WAIT_AVAILABLE, so at least the > flags are not there. Does not mention syncobj_wait either. probably this: https://docs.kernel.org/gpu/drm-mm.html the new one needs to be added there as well. > > I could ask where the real non-IGT userspace is or the plan for it, > too, since this is new DRM UAPI. yeap, it looks like we need to close on this... https://gitlab.freedesktop.org/drm/intel/-/issues/8014 I confess I got lost on the many discussions and on how this will be used. Is mesa going to set the deadline based on the vk priority? Will this continue to be internal? I didn't get the broader picture I'm afraid... > > > Thanks, > pq > > > > > BR, > > -R > > > > > btw. no nsec/sec anymore. > > > > > > > > > Thanks, > > > pq > > > > > > > > > > struct drm_syncobj_wait { > > > > __u64 handles; > > > > /* absolute timeout */ > > > > @@ -895,6 +896,8 @@ struct drm_syncobj_wait { > > > > __u32 flags; > > > > __u32 first_signaled; /* only valid when not waiting all */ > > > > __u32 pad; > > > > + /* Deadline hint to set on backing fence(s) in CLOCK_MONOTONIC: */ > > > > + __u64 deadline_ns; > > > > }; > > > > > > > > struct drm_syncobj_timeline_wait { > > > > @@ -907,6 +910,8 @@ struct drm_syncobj_timeline_wait { > > > > __u32 flags; > > > > __u32 first_signaled; /* only valid when not waiting all */ > > > > __u32 pad; > > > > + /* Deadline hint to set on backing fence(s) in CLOCK_MONOTONIC: */ > > > > + __u64 deadline_ns; > > > > }; > > > > > > > >
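For the broader picture being asked about, a hypothetical userspace sketch of the proposed flag follows. The UAPI is still under review, so DRM_SYNCOBJ_WAIT_FLAGS_WAIT_DEADLINE and the deadline_ns field come from the patch above and may still change; drmIoctl() is the usual libdrm wrapper, and per the v2 note, calling in first with count_handles == 0 would probe whether the kernel supports the new flag at all.

#include <stdint.h>
#include <xf86drm.h>	/* drmIoctl(), DRM_IOCTL_SYNCOBJ_WAIT, struct drm_syncobj_wait */

/*
 * Wait on one syncobj while hinting that the backing fence should be
 * signalled by 'deadline_ns' (absolute CLOCK_MONOTONIC). Both the flag
 * and the deadline_ns field are from the patch under review, not from
 * released kernel headers.
 */
static int syncobj_wait_with_deadline(int fd, uint32_t handle,
				      int64_t abs_timeout_ns,
				      uint64_t deadline_ns)
{
	struct drm_syncobj_wait wait = {
		.handles = (uint64_t)(uintptr_t)&handle,
		.timeout_nsec = abs_timeout_ns,	/* absolute timeout */
		.count_handles = 1,
		.flags = DRM_SYNCOBJ_WAIT_FLAGS_WAIT_DEADLINE,
		.deadline_ns = deadline_ns,
	};

	return drmIoctl(fd, DRM_IOCTL_SYNCOBJ_WAIT, &wait);
}

Passing a deadline equal to the current CLOCK_MONOTONIC time would then give the "immediate deadline" wait-boost behaviour the commit message describes.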