Re: [Linaro-mm-sig] Re: [PATCH] epoll: try to be a _bit_ better about file lifetimes
Am 07.05.24 um 21:07 schrieb Linus Torvalds: On Tue, 7 May 2024 at 11:04, Daniel Vetter wrote: On Tue, May 07, 2024 at 09:46:31AM -0700, Linus Torvalds wrote: I'd be perfectly ok with adding a generic "FISAME" VFS level ioctl too, if this is possibly a more common thing. and not just DRM wants it. Would something like that work for you? Yes. Adding Simon and Pekka as two of the usual suspects for this kind of stuff. Also example code (the int return value is just so that callers know when kcmp isn't available, they all only care about equality): https://gitlab.freedesktop.org/mesa/mesa/-/blob/main/src/util/os_file.c#L239 That example thing shows that we shouldn't make it a FISAME ioctl - we should make it a fcntl() instead, and it would just be a companion to F_DUPFD. Doesn't that strike everybody as a *much* cleaner interface? I think F_ISDUP would work very naturally indeed with F_DUPFD. Yes? No? Sounds absolutely sane to me. Christian. Linus
Re: Safety of opening up /dev/dma_heap/* to physically present users (udev uaccess tag) ?
Hi, On Tue, 7 May 2024 at 12:15, Daniel Vetter wrote: > On Mon, May 06, 2024 at 04:01:42PM +0200, Hans de Goede wrote: > > On 5/6/24 3:38 PM, Daniel Vetter wrote: > > I agree that bad applications are an issue, but not for the flathub / snaps > > case. Flatpacks / snaps run sandboxed and don't have access to a full /dev > > so those should not be able to open /dev/dma_heap/* independent of > > the ACLs on /dev/dma_heap/*. The plan is for cameras using the > > libcamera software ISP to always be accessed through pipewire and > > the camera portal, so in this case pipewere is taking the place of > > the compositor in your kms vs render node example. > > Yeah essentially if you clarify to "set the permissions such that pipewire > can do allocations", then I think that makes sense. And is at the same > level as e.g. drm kms giving compsitors (but _only_ compositors) special > access rights. That would have the unfortunate side effect of making sandboxed apps less efficient on some platforms, since they wouldn't be able to do direct scanout anymore ... Cheers, Daniel
[linux-next:master] BUILD REGRESSION 93a39e4766083050ca0ecd6a3548093a3b9eb60c
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master branch HEAD: 93a39e4766083050ca0ecd6a3548093a3b9eb60c Add linux-next specific files for 20240507 Error/Warning reports: https://lore.kernel.org/oe-kbuild-all/202405080133.16zi5eay-...@intel.com https://lore.kernel.org/oe-kbuild-all/202405080300.itzpe1xh-...@intel.com https://lore.kernel.org/oe-kbuild-all/202405080553.tfh9ems8-...@intel.com https://lore.kernel.org/oe-kbuild-all/202405080801.qqazymz6-...@intel.com Error/Warning: (recently discovered and may have been fixed) /usr/bin/ld: drivers/net/dsa/realtek/rtl8366rb.c:980:(.text+0x2808): undefined reference to `devm_led_classdev_register_ext' ERROR: modpost: "drm_dsc_pps_payload_pack" [drivers/gpu/drm/panel/panel-lg-sw43408.ko] undefined! aarch64-linux-ld: drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-test.c:320:(.text+0x850): undefined reference to `kunit_binary_assert_format' aarch64-linux-ld: drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-test.c:322:(.text+0x734): undefined reference to `kunit_mem_assert_format' aarch64-linux-ld: drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-test.c:58:(.text+0x5ec): undefined reference to `kunit_binary_ptr_assert_format' aarch64-linux-ld: drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-test.c:58:(.text+0x60c): undefined reference to `__kunit_abort' aarch64-linux-ld: drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-test.c:77:(.text+0x5a0): undefined reference to `kunit_unary_assert_format' aarch64-linux-ld: drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-test.c:77:(.text+0x5b0): undefined reference to `__kunit_do_failed_assertion' csky-linux-ld: panel-lg-sw43408.c:(.text+0x310): undefined reference to `drm_dsc_pps_payload_pack' drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c:1734:62: error: bitwise operation between different enumeration types ('enum amd_asic_type' and 'enum amd_chip_flags') [-Werror,-Wenum-enum-conversion] drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c:723:42: error: arithmetic between different enumeration types ('enum amdgpu_ras_block' and 'enum amdgpu_ras_mca_block') [-Werror,-Wenum-enum-conversion] drivers/gpu/drm/arm/display/komeda/komeda_dev.c:26:37: error: invalid use of undefined type 'struct seq_file' drivers/gpu/drm/arm/display/komeda/komeda_dev.c:29:9: error: implicit declaration of function 'seq_puts' [-Werror=implicit-function-declaration] drivers/gpu/drm/arm/display/komeda/komeda_dev.c:44:1: error: type defaults to 'int' in declaration of 'DEFINE_SHOW_ATTRIBUTE' [-Werror=implicit-int] drivers/gpu/drm/drm_mm.c:614:20: error: function 'drm_mm_node_scanned_block' is not needed and will not be emitted [-Werror,-Wunneeded-internal-declaration] drivers/gpu/drm/imx/ipuv3/imx-ldb.c:658:57: error: '_sel' directive output may be truncated writing 4 bytes into a region of size between 3 and 13 [-Werror=format-truncation=] drivers/gpu/drm/nouveau/nouveau_backlight.c:56:69: error: '%d' directive output may be truncated writing between 1 and 10 bytes into a region of size 3 [-Werror=format-truncation=] drivers/gpu/drm/nouveau/nvif/object.c:161:9: error: 'memcpy' accessing 4294967264 or more bytes at offsets 0 and 32 overlaps 6442450881 bytes at offset -2147483617 [-Werror=restrict] drivers/gpu/drm/pl111/pl111_versatile.c:488:24: error: cast to smaller integer type 'enum versatile_clcd' from 'const void *' [-Werror,-Wvoid-pointer-to-enum-cast] drivers/gpu/drm/radeon/radeon_drv.c:251:2: error: bitwise operation between different enumeration types ('enum radeon_family' and 'enum radeon_chip_flags') [-Werror,-Wenum-enum-conversion] drivers/gpu/drm/renesas/rcar-du/rcar_cmm.c:35:19: error: unused function 'rcar_cmm_read' [-Werror,-Wunused-function] drivers/gpu/drm/xlnx/zynqmp_disp.c:949:14: error: logical not is only applied to the left hand side of this comparison [-Werror,-Wlogical-not-parentheses] drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-test.c:322:(.text+0x728): undefined reference to `kunit_mem_assert_format' drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-test.c:77:(.text+0x598): dangerous relocation: unsupported relocation drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-test.c:77:(.text+0x598): undefined reference to `kunit_unary_assert_format' include/asm-generic/io.h:547:31: error: performing pointer arithmetic on a null pointer has undefined behavior [-Werror,-Wnull-pointer-arithmetic] include/kunit/test.h:444:(.text+0x440): undefined reference to `kunit_kmalloc_array' kernel/bpf/verifier.c:20086:85: error: 'pcpu_hot' undeclared (first use in this function) ld.lld: error: undefined symbol: drm_dsc_pps_payload_pack panel-lg-sw43408.c:(.text+0x278): undefined reference to `drm_dsc_pps_payload_pack' panel-lg-sw43408.c:(.text+0x934): undefined reference to `.drm_dsc_pps_payload_pack' Unverified Error/Warning (likely false positive, please contact us if interested): drivers/net/dsa/realtek/rtl8366rb.c:1032:4-15: ERROR: probable double put. Error/Warning ids grouped by kconfigs: gcc_recent_errors |-- arc-al
Re: [PATCH v3 5/6] drm/xe: Add helper to accumulate exec queue runtime
On Tue, May 07, 2024 at 04:45:36PM GMT, Umesh Nerlige Ramappa wrote: On Tue, May 07, 2024 at 03:45:09PM -0700, Lucas De Marchi wrote: From: Umesh Nerlige Ramappa Add a helper to accumulate per-client runtime of all its exec queues. This is called every time a sched job is finished. v2: - Use guc_exec_queue_free_job() and execlist_job_free() to accumulate runtime when job is finished since xe_sched_job_completed() is not a notification that job finished. - Stop trying to update runtime from xe_exec_queue_fini() - that is redundant and may happen after xef is closed, leading to a use-after-free - Do not special case the first timestamp read: the default LRC sets CTX_TIMESTAMP to zero, so even the first sample should be a valid one. - Handle the parallel submission case by multiplying the runtime by width. Signed-off-by: Umesh Nerlige Ramappa Signed-off-by: Lucas De Marchi --- drivers/gpu/drm/xe/xe_device_types.h | 9 +++ drivers/gpu/drm/xe/xe_exec_queue.c | 35 drivers/gpu/drm/xe/xe_exec_queue.h | 1 + drivers/gpu/drm/xe/xe_execlist.c | 1 + drivers/gpu/drm/xe/xe_guc_submit.c | 2 ++ 5 files changed, 48 insertions(+) diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h index 906b98fb973b..de078bdf0ab9 100644 --- a/drivers/gpu/drm/xe/xe_device_types.h +++ b/drivers/gpu/drm/xe/xe_device_types.h @@ -560,6 +560,15 @@ struct xe_file { struct mutex lock; } exec_queue; + /** +* @runtime: hw engine class runtime in ticks for this drm client +* +* Only stats from xe_exec_queue->lrc[0] are accumulated. For multi-lrc +* case, since all jobs run in parallel on the engines, only the stats +* from lrc[0] are sufficient. Maybe we can drop the above comment altogether after the multi-lrc update. right... I removed it from the middle of the function and this was a leftover. I will remove it on next version. thanks Lucas De Marchi Umesh
Re: [PATCH v3 6/6] drm/xe/client: Print runtime to fdinfo
On Tue, May 07, 2024 at 03:45:10PM -0700, Lucas De Marchi wrote: Print the accumulated runtime for client when printing fdinfo. Each time a query is done it first does 2 things: 1) loop through all the exec queues for the current client and accumulate the runtime, per engine class. CTX_TIMESTAMP is used for that, being read from the context image. 2) Read a "GPU timestamp" that can be used for considering "how much GPU time has passed" and that has the same unit/refclock as the one recording the runtime. RING_TIMESTAMP is used for that via MMIO. Since for all current platforms RING_TIMESTAMP follows the same refclock, just read it once, using any first engine. This is exported to userspace as 2 numbers in fdinfo: drm-cycles-: drm-total-cycles-: Userspace is expected to collect at least 2 samples, which allows to know the client engine busyness as per: RUNTIME1 - RUNTIME0 busyness = - T1 - T0 Another thing to point out is that it's expected that userspace will read any 2 samples every few seconds. Given the update frequency of the counters involved and that CTX_TIMESTAMP is 32-bits, the counter for each exec_queue can wrap around (assuming 100% utilization) after ~200s. The wraparound is not perceived by userspace since it's just accumulated for all the exec_queues in a 64-bit counter), but the measurement will not be accurate if the samples are too far apart. This could be mitigated by adding a workqueue to accumulate the counters every so often, but it's additional complexity for something that is done already by userspace every few seconds in tools like gputop (from igt), htop, nvtop, etc with none of them really defaulting to 1 sample per minute or more. Signed-off-by: Lucas De Marchi --- Documentation/gpu/drm-usage-stats.rst | 16 ++- Documentation/gpu/xe/index.rst | 1 + Documentation/gpu/xe/xe-drm-usage-stats.rst | 10 ++ drivers/gpu/drm/xe/xe_drm_client.c | 136 +++- 4 files changed, 160 insertions(+), 3 deletions(-) create mode 100644 Documentation/gpu/xe/xe-drm-usage-stats.rst diff --git a/Documentation/gpu/drm-usage-stats.rst b/Documentation/gpu/drm-usage-stats.rst index 6dc299343b48..421766289b78 100644 --- a/Documentation/gpu/drm-usage-stats.rst +++ b/Documentation/gpu/drm-usage-stats.rst @@ -112,6 +112,17 @@ larger value within a reasonable period. Upon observing a value lower than what was previously read, userspace is expected to stay with that larger previous value until a monotonic update is seen. +- drm-total-cycles-: + +Engine identifier string must be the same as the one specified in the +drm-cycles- tag and shall contain the total number cycles for the given +engine. + +This is a timestamp in GPU unspecified unit that matches the update rate +of drm-cycles-. For drivers that implement this interface, the engine +utilization can be calculated entirely on the GPU clock domain, without +considering the CPU sleep time between 2 samples. + - drm-maxfreq-: [Hz|MHz|KHz] Engine identifier string must be the same as the one specified in the @@ -168,5 +179,6 @@ be documented above and where possible, aligned with other drivers. Driver specific implementations --- -:ref:`i915-usage-stats` -:ref:`panfrost-usage-stats` +* :ref:`i915-usage-stats` +* :ref:`panfrost-usage-stats` +* :ref:`xe-usage-stats` diff --git a/Documentation/gpu/xe/index.rst b/Documentation/gpu/xe/index.rst index c224ecaee81e..3f07aa3b5432 100644 --- a/Documentation/gpu/xe/index.rst +++ b/Documentation/gpu/xe/index.rst @@ -23,3 +23,4 @@ DG2, etc is provided to prototype the driver. xe_firmware xe_tile xe_debugging + xe-drm-usage-stats.rst diff --git a/Documentation/gpu/xe/xe-drm-usage-stats.rst b/Documentation/gpu/xe/xe-drm-usage-stats.rst new file mode 100644 index ..482d503ae68a --- /dev/null +++ b/Documentation/gpu/xe/xe-drm-usage-stats.rst @@ -0,0 +1,10 @@ +.. SPDX-License-Identifier: GPL-2.0+ + +.. _xe-usage-stats: + + +Xe DRM client usage stats implementation + + +.. kernel-doc:: drivers/gpu/drm/xe/xe_drm_client.c + :doc: DRM Client usage stats diff --git a/drivers/gpu/drm/xe/xe_drm_client.c b/drivers/gpu/drm/xe/xe_drm_client.c index 08f0b7c95901..27494839d586 100644 --- a/drivers/gpu/drm/xe/xe_drm_client.c +++ b/drivers/gpu/drm/xe/xe_drm_client.c @@ -2,6 +2,7 @@ /* * Copyright © 2023 Intel Corporation */ +#include "xe_drm_client.h" #include #include @@ -12,9 +13,65 @@ #include "xe_bo.h" #include "xe_bo_types.h" #include "xe_device_types.h" -#include "xe_drm_client.h" +#include "xe_exec_queue.h" +#include "xe_gt.h" +#include "xe_hw_engine.h" +#include "xe_pm.h" #include "xe_trace.h" +/** + * DOC: DRM Client usage stats + * + * The drm/xe driver implements the DRM client usage stats specification as + * documented in
Re: [PATCH v1 2/2] vfio/pci: Allow MMIO regions to be exported through dma-buf
On Thu, May 02, 2024 at 07:50:36AM +, Kasireddy, Vivek wrote: > Hi Jason, > > > > > On Tue, Apr 30, 2024 at 04:24:50PM -0600, Alex Williamson wrote: > > > > +static vm_fault_t vfio_pci_dma_buf_fault(struct vm_fault *vmf) > > > > +{ > > > > + struct vm_area_struct *vma = vmf->vma; > > > > + struct vfio_pci_dma_buf *priv = vma->vm_private_data; > > > > + pgoff_t pgoff = vmf->pgoff; > > > > + > > > > + if (pgoff >= priv->nr_pages) > > > > + return VM_FAULT_SIGBUS; > > > > + > > > > + return vmf_insert_pfn(vma, vmf->address, > > > > + page_to_pfn(priv->pages[pgoff])); > > > > +} > > > > > > How does this prevent the MMIO space from being mmap'd when disabled > > at > > > the device? How is the mmap revoked when the MMIO becomes disabled? > > > Is it part of the move protocol? > In this case, I think the importers that mmap'd the dmabuf need to be tracked > separately and their VMA PTEs need to be zapped when MMIO access is revoked. Which, as we know, is quite hard. > > Yes, we should not have a mmap handler for dmabuf. vfio memory must be > > mmapped in the normal way. > Although optional, I think most dmabuf exporters (drm ones) provide a mmap > handler. Otherwise, there is no easy way to provide CPU access (backup slow > path) > to the dmabuf for the importer. Here we should not, there is no reason since VFIO already provides a mmap mechanism itself. Anything using this API should just call the native VFIO function instead of trying to mmap the DMABUF. Yes, it will be inconvient for the scatterlist case you have, but the kernel side implementation is much easier .. > > > > +/** > > > > + * Upon VFIO_DEVICE_FEATURE_GET create a dma_buf fd for the > > > > + * region selected. > > > > + * > > > > + * open_flags are the typical flags passed to open(2), eg O_RDWR, > > O_CLOEXEC, > > > > + * etc. offset/length specify a slice of the region to create the > > > > dmabuf > > from. > > > > + * If both are 0 then the whole region is used. > > > > + * > > > > + * Return: The fd number on success, -1 and errno is set on failure. > > > > + */ > > > > +#define VFIO_DEVICE_FEATURE_DMA_BUF 11 > > > > + > > > > +struct vfio_region_p2p_area { > > > > + __u32 region_index; > > > > + __u32 __pad; > > > > + __u64 offset; > > > > + __u64 length; > > > > +}; > > > > + > > > > +struct vfio_device_feature_dma_buf { > > > > + __u32 open_flags; > > > > + __u32 nr_areas; > > > > + struct vfio_region_p2p_area p2p_areas[]; > > > > +}; > > > > Still have no clue what this p2p areas is. You want to create a dmabuf > > out of a scatterlist? Why?? > Because the data associated with a buffer that needs to be shared can > come from multiple ranges. I probably should have used the terms ranges > or slices or chunks to make it more clear instead of p2p areas. Yes, please pick a better name. > > I'm also not sure of the use of the pci_p2pdma family of functions, it > > is a bold step to make struct pages, that isn't going to work in quite > I guess things may have changed since the last discussion on this topic or > maybe I misunderstood but I thought Christoph's suggestion was to use > struct pages to populate the scatterlist instead of using DMA addresses > and, I figured pci_p2pdma APIs can easily help with that. It was, but it doesn't really work for VFIO. You can only create struct pages for large MMIO spaces, and only on some architectures. Requiring them will make VFIO unusable in alot of places. Requiring them just for DMABUF will make that feature unusable. Creating them on first use, and then ignoring how broken it is perhaps reasonable for now, but I'm unhappy to see it so poorly usable. > > > alot of cases. It is really hacky/wrong to immediately call > > pci_alloc_p2pmem() to defeat the internal genalloc. > In my use-case, I need to use all the pages from the pool and I don't see any > better way to do it. You have to fix the P2P API to work properly for this kind of use case. There should be no genalloc at all. > > I'd rather we stick with the original design. Leon is working on DMA > > API changes that should address half the issue. > > Ok, I'll keep an eye out for Leon's work. It saves from messing with the P2P stuff, but you get the other issues back. Jason
Re: [PATCH v3 5/6] drm/xe: Add helper to accumulate exec queue runtime
On Tue, May 07, 2024 at 03:45:09PM -0700, Lucas De Marchi wrote: From: Umesh Nerlige Ramappa Add a helper to accumulate per-client runtime of all its exec queues. This is called every time a sched job is finished. v2: - Use guc_exec_queue_free_job() and execlist_job_free() to accumulate runtime when job is finished since xe_sched_job_completed() is not a notification that job finished. - Stop trying to update runtime from xe_exec_queue_fini() - that is redundant and may happen after xef is closed, leading to a use-after-free - Do not special case the first timestamp read: the default LRC sets CTX_TIMESTAMP to zero, so even the first sample should be a valid one. - Handle the parallel submission case by multiplying the runtime by width. Signed-off-by: Umesh Nerlige Ramappa Signed-off-by: Lucas De Marchi --- drivers/gpu/drm/xe/xe_device_types.h | 9 +++ drivers/gpu/drm/xe/xe_exec_queue.c | 35 drivers/gpu/drm/xe/xe_exec_queue.h | 1 + drivers/gpu/drm/xe/xe_execlist.c | 1 + drivers/gpu/drm/xe/xe_guc_submit.c | 2 ++ 5 files changed, 48 insertions(+) diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h index 906b98fb973b..de078bdf0ab9 100644 --- a/drivers/gpu/drm/xe/xe_device_types.h +++ b/drivers/gpu/drm/xe/xe_device_types.h @@ -560,6 +560,15 @@ struct xe_file { struct mutex lock; } exec_queue; + /** +* @runtime: hw engine class runtime in ticks for this drm client +* +* Only stats from xe_exec_queue->lrc[0] are accumulated. For multi-lrc +* case, since all jobs run in parallel on the engines, only the stats +* from lrc[0] are sufficient. Maybe we can drop the above comment altogether after the multi-lrc update. Umesh
Re: [PATCH v4 5/7] drm/panel: himax-hx83102: Support for BOE nv110wum-l60 MIPI-DSI panel
Hi, On Tue, May 7, 2024 at 6:53 AM Cong Yang wrote: > > +static int boe_nv110wum_init(struct hx83102 *ctx) > +{ > + struct mipi_dsi_multi_context dsi_ctx = { .dsi = ctx->dsi }; > + > + msleep(60); > + > + hx83102_enable_extended_cmds(ctx, true); > + mipi_dsi_dcs_write_seq_multi(_ctx, HX83102_SETPOWER, 0x2c, 0xaf, > 0xaf, 0x2b, 0xeb, 0x42, > +0xe1, 0x4d, 0x36, 0x36, 0x36, 0x36, > 0x1a, 0x8b, 0x11, 0x65, 0x00, > +0x88, 0xfa, 0xff, 0xff, 0x8f, 0xff, > 0x08, 0x9a, 0x33); > + mipi_dsi_dcs_write_seq_multi(_ctx, HX83102_SETDISP, 0x00, 0x47, > 0xb0, 0x80, 0x00, 0x12, > +0x71, 0x3c, 0xa3, 0x11, 0x00, 0x00, > 0x00, 0x88, 0xf5, 0x22, 0x8f); > + mipi_dsi_dcs_write_seq_multi(_ctx, HX83102_SETCYC, 0x49, 0x49, > 0x32, 0x32, 0x14, 0x32, > +0x84, 0x6e, 0x84, 0x6e, 0x01, 0x9c); > + mipi_dsi_dcs_write_seq_multi(_ctx, HX83102_SETSPCCMD, 0xcd); > + mipi_dsi_dcs_write_seq_multi(_ctx, HX83102_SETMIPI, 0x84); > + mipi_dsi_dcs_write_seq_multi(_ctx, HX83102_SETSPCCMD, 0x3f); > + mipi_dsi_dcs_write_seq_multi(_ctx, HX83102_SETVDC, 0x1b, 0x04); > + mipi_dsi_dcs_write_seq_multi(_ctx, HX83102_UNKNOWN_BE, 0x20); > + mipi_dsi_dcs_write_seq_multi(_ctx, HX83102_SETPTBA, 0xfc, 0x84); > + mipi_dsi_dcs_write_seq_multi(_ctx, HX83102_SETSTBA, 0x36, 0x36, > 0x22, 0x00, 0x00, 0xa0, > +0x61, 0x08, 0xf5, 0x03); > + mipi_dsi_dcs_write_seq_multi(_ctx, HX83102_SETSPCCMD, 0xcc); > + mipi_dsi_dcs_write_seq_multi(_ctx, HX83102_SETTCON, 0x80); > + mipi_dsi_dcs_write_seq_multi(_ctx, HX83102_SETSPCCMD, 0x3f); > + mipi_dsi_dcs_write_seq_multi(_ctx, HX83102_SETSPCCMD, 0xc6); > + mipi_dsi_dcs_write_seq_multi(_ctx, HX83102_SETRAMDMY, 0x97); > + mipi_dsi_dcs_write_seq_multi(_ctx, HX83102_SETSPCCMD, 0x3f); > + mipi_dsi_dcs_write_seq_multi(_ctx, HX83102_SETPWM, 0x00, 0x1e, > 0x30, 0xd4, 0x01); > + mipi_dsi_dcs_write_seq_multi(_ctx, HX83102_SETCLOCK, 0x08, 0x13, > 0x07, 0x00, 0x0f, 0x34); > + mipi_dsi_dcs_write_seq_multi(_ctx, HX83102_SETPANEL, 0x02, 0x03, > 0x44); > + mipi_dsi_dcs_write_seq_multi(_ctx, HX83102_SETSPCCMD, 0xc4); > + mipi_dsi_dcs_write_seq_multi(_ctx, HX83102_SETCASCADE, 0x03); > + mipi_dsi_dcs_write_seq_multi(_ctx, HX83102_SETSPCCMD, 0x3f); > + mipi_dsi_dcs_write_seq_multi(_ctx, HX83102_SETPCTRL, 0x37, 0x06, > 0x00, 0x02, 0x04, 0x0c, 0xff); > + mipi_dsi_dcs_write_seq_multi(_ctx, HX83102_UNKNOWN_D2, 0x1f, > 0x11, 0x1f, 0x11); > + mipi_dsi_dcs_write_seq_multi(_ctx, HX83102_SETGIP0, 0x06, 0x00, > 0x00, 0x00, 0x00, 0x04, > +0x08, 0x04, 0x08, 0x37, 0x37, 0x64, > 0x4b, 0x11, 0x11, 0x03, 0x03, 0x32, > +0x10, 0x0e, 0x00, 0x0e, 0x32, 0x10, > 0x0a, 0x00, 0x0a, 0x32, 0x17, 0x98, > +0x07, 0x98, 0x00, 0x00); > + mipi_dsi_dcs_write_seq_multi(_ctx, HX83102_SETGIP1, 0x18, 0x18, > 0x18, 0x18, 0x1e, 0x1e, > +0x1e, 0x1e, 0x1f, 0x1f, 0x1f, 0x1f, > 0x24, 0x24, 0x24, 0x24, 0x07, 0x06, > +0x07, 0x06, 0x05, 0x04, 0x05, 0x04, > 0x03, 0x02, 0x03, 0x02, 0x01, 0x00, > +0x01, 0x00, 0x21, 0x20, 0x21, 0x20, > 0x18, 0x18, 0x18, 0x18, 0x18, 0x18, > +0x18, 0x18); > + mipi_dsi_dcs_write_seq_multi(_ctx, HX83102_SETGIP3, 0xaf, 0xaa, > 0xaa, 0xaa, 0xaa, 0xa0, > +0xaf, 0xaa, 0xaa, 0xaa, 0xaa, 0xa0); > + mipi_dsi_dcs_write_seq_multi(_ctx, HX83102_SETGMA, 0x00, 0x05, > 0x0d, 0x14, 0x1b, 0x2c, > +0x44, 0x49, 0x51, 0x4c, 0x67, 0x6c, > 0x71, 0x80, 0x7d, 0x84, 0x8d, 0xa0, > +0xa0, 0x4f, 0x58, 0x64, 0x73, 0x00, > 0x05, 0x0d, 0x14, 0x1b, 0x2c, 0x44, > +0x49, 0x51, 0x4c, 0x67, 0x6c, 0x71, > 0x80, 0x7d, 0x84, 0x8d, 0xa0, 0xa0, > +0x4f, 0x58, 0x64, 0x73); > + mipi_dsi_dcs_write_seq_multi(_ctx, HX83102_SETTP1, 0x07, 0x10, > 0x10, 0x1a, 0x26, 0x9e, > +0x00, 0x53, 0x9b, 0x14, 0x14); > + mipi_dsi_dcs_write_seq_multi(_ctx, HX83102_UNKNOWN_E1, 0x11, > 0x00, 0x00, 0x89, 0x30, 0x80, > +0x07, 0x80, 0x02, 0x58, 0x00, 0x14, > 0x02, 0x58, 0x02, 0x58, 0x02, 0x00, > +0x02, 0x2c, 0x00, 0x20, 0x02, 0x02, > 0x00, 0x08, 0x00, 0x0c, 0x05, 0x0e, > +0x04, 0x94, 0x18, 0x00, 0x10, 0xf0, > 0x03, 0x0c, 0x20, 0x00, 0x06, 0x0b, > +0x0b,
Re: [PATCH v4 2/7] drm/panel: himax-hx83102: Break out as separate driver
Hi, On Tue, May 7, 2024 at 6:53 AM Cong Yang wrote: > > +static int hx83102_enable_extended_cmds(struct hx83102 *ctx, bool enable) > +{ > + struct mipi_dsi_multi_context dsi_ctx = { .dsi = ctx->dsi }; > + > + if (enable) > + mipi_dsi_dcs_write_seq_multi(_ctx, HX83102_SETEXTC, 0x83, > 0x10, 0x21, 0x55, 0x00); > + else > + mipi_dsi_dcs_write_seq_multi(_ctx, HX83102_SETEXTC, 0x00, > 0x00, 0x00); > + > + return 0; You're throwing away the error codes returned by the mipi_dsi_dcs_write_seq_multi(), which you shouldn't do. You have two options: Option #1: return dsi_ctx.accum_err here and then check the return value in callers. Option #2: instead of having this function take "struct hx83102 *ctx", just have it take "struct mipi_dsi_multi_context *dsi_ctx". Then it can return void and everything will be fine. I'd prefer option #2 but either is OK w/ me. > +static int starry_himax83102_j02_init(struct hx83102 *ctx) > +{ > + struct mipi_dsi_multi_context dsi_ctx = { .dsi = ctx->dsi }; > + > + hx83102_enable_extended_cmds(ctx, true); > + mipi_dsi_dcs_write_seq_multi(_ctx, HX83102_SETPOWER, 0x2c, 0xb5, > 0xb5, 0x31, 0xf1, > +0x31, 0xd7, 0x2f, 0x36, 0x36, 0x36, > 0x36, 0x1a, 0x8b, 0x11, > +0x65, 0x00, 0x88, 0xfa, 0xff, 0xff, > 0x8f, 0xff, 0x08, 0x74, > +0x33); The indentation is still off here. You have 5 tabs followed by a space. To make things line up with the opening brace I think it should be 4 tabs followed by 5 spaces. > +static int hx83102_enable(struct drm_panel *panel) > +{ > + struct hx83102 *ctx = panel_to_hx83102(panel); > + struct mipi_dsi_device *dsi = ctx->dsi; > + struct device *dev = >dev; > + int ret; > + > + ret = ctx->desc->init(ctx); > + if (ret) > + return ret; You're still changing behavior here. In the old boe-tv101wum-nl6 driver the init() function was invoked at the end of prepare(). Now you've got it at the beginning of enable(). If this change is important it should be in a separate commit and explained. > + ret = mipi_dsi_dcs_exit_sleep_mode(dsi); > + if (ret) { > + dev_err(dev, "Failed to exit sleep mode: %d\n", ret); > + return ret; > + } > + > + msleep(120); > + > + ret = mipi_dsi_dcs_set_display_on(dsi); > + if (ret) { > + dev_err(dev, "Failed to turn on the display: %d\n", ret); > + } The old boe-tv101wum-nl6 driver didn't call mipi_dsi_dcs_exit_sleep_mode() nor mipi_dsi_dcs_set_display_on() in its enable routine, did it? If this change is important please put it in a separate change and justify it. -Doug
Re: [RFC PATCH net-next v8 02/14] net: page_pool: create hooks for custom page providers
On Tue, May 07, 2024 at 08:35:37PM +0100, Pavel Begunkov wrote: > On 5/7/24 18:56, Jason Gunthorpe wrote: > > On Tue, May 07, 2024 at 06:25:52PM +0100, Pavel Begunkov wrote: > > > On 5/7/24 17:48, Jason Gunthorpe wrote: > > > > On Tue, May 07, 2024 at 09:42:05AM -0700, Mina Almasry wrote: > > > > > > > > > 1. Align with devmem TCP to use udmabuf for your io_uring memory. I > > > > > think in the past you said it's a uapi you don't link but in the face > > > > > of this pushback you may want to reconsider. > > > > > > > > dmabuf does not force a uapi, you can acquire your pages however you > > > > want and wrap them up in a dmabuf. No uapi at all. > > > > > > > > The point is that dmabuf already provides ops that do basically what > > > > is needed here. We don't need ops calling ops just because dmabuf's > > > > ops are not understsood or not perfect. Fixup dmabuf. > > > > > > Those ops, for example, are used to efficiently return used buffers > > > back to the kernel, which is uapi, I don't see how dmabuf can be > > > fixed up to cover it. > > > > Sure, but that doesn't mean you can't use dma buf for the other parts > > of the flow. The per-page lifetime is a different topic than the > > refcounting and access of the entire bulk of memory. > > Ok, so if we're leaving uapi (and ops) and keep per page/sub-buffer as > is, the rest is resolving uptr -> pages, and passing it to page pool in > a convenient to page pool format (net_iov). I'm not going to pretend to know about page pool details, but dmabuf is the way to get the bulk of pages into a pool within the net stack's allocator and keep that bulk properly refcounted while. An object like dmabuf is needed for the general case because there are not going to be per-page references or otherwise available. What you seem to want is to alter how the actual allocation flow works from that bulk of memory and delay the free. It seems like a different topic to me, and honestly hacking into the allocator free function seems a bit weird.. Jason
[PATCH AUTOSEL 5.4 5/6] drm/amdkfd: Flush the process wq before creating a kfd_process
From: Lancelot SIX [ Upstream commit f5b9053398e70a0c10aa9cb4dd5910ab6bc457c5 ] There is a race condition when re-creating a kfd_process for a process. This has been observed when a process under the debugger executes exec(3). In this scenario: - The process executes exec. - This will eventually release the process's mm, which will cause the kfd_process object associated with the process to be freed (kfd_process_free_notifier decrements the reference count to the kfd_process to 0). This causes kfd_process_ref_release to enqueue kfd_process_wq_release to the kfd_process_wq. - The debugger receives the PTRACE_EVENT_EXEC notification, and tries to re-enable AMDGPU traps (KFD_IOC_DBG_TRAP_ENABLE). - When handling this request, KFD tries to re-create a kfd_process. This eventually calls kfd_create_process and kobject_init_and_add. At this point the call to kobject_init_and_add can fail because the old kfd_process.kobj has not been freed yet by kfd_process_wq_release. This patch proposes to avoid this race by making sure to drain kfd_process_wq before creating a new kfd_process object. This way, we know that any cleanup task is done executing when we reach kobject_init_and_add. Signed-off-by: Lancelot SIX Reviewed-by: Felix Kuehling Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin --- drivers/gpu/drm/amd/amdkfd/kfd_process.c | 8 1 file changed, 8 insertions(+) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd/kfd_process.c index aa0a617b8d445..662e4d973f13a 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c @@ -289,6 +289,14 @@ struct kfd_process *kfd_create_process(struct file *filep) if (process) { pr_debug("Process already found\n"); } else { + /* If the process just called exec(3), it is possible that the +* cleanup of the kfd_process (following the release of the mm +* of the old process image) is still in the cleanup work queue. +* Make sure to drain any job before trying to recreate any +* resource for this process. +*/ + flush_workqueue(kfd_process_wq); + process = create_process(thread); if (IS_ERR(process)) goto out; -- 2.43.0
[PATCH AUTOSEL 5.10 8/9] drm/amdkfd: Flush the process wq before creating a kfd_process
From: Lancelot SIX [ Upstream commit f5b9053398e70a0c10aa9cb4dd5910ab6bc457c5 ] There is a race condition when re-creating a kfd_process for a process. This has been observed when a process under the debugger executes exec(3). In this scenario: - The process executes exec. - This will eventually release the process's mm, which will cause the kfd_process object associated with the process to be freed (kfd_process_free_notifier decrements the reference count to the kfd_process to 0). This causes kfd_process_ref_release to enqueue kfd_process_wq_release to the kfd_process_wq. - The debugger receives the PTRACE_EVENT_EXEC notification, and tries to re-enable AMDGPU traps (KFD_IOC_DBG_TRAP_ENABLE). - When handling this request, KFD tries to re-create a kfd_process. This eventually calls kfd_create_process and kobject_init_and_add. At this point the call to kobject_init_and_add can fail because the old kfd_process.kobj has not been freed yet by kfd_process_wq_release. This patch proposes to avoid this race by making sure to drain kfd_process_wq before creating a new kfd_process object. This way, we know that any cleanup task is done executing when we reach kobject_init_and_add. Signed-off-by: Lancelot SIX Reviewed-by: Felix Kuehling Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin --- drivers/gpu/drm/amd/amdkfd/kfd_process.c | 8 1 file changed, 8 insertions(+) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd/kfd_process.c index d243e60c6eef7..534f2dec6356f 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c @@ -766,6 +766,14 @@ struct kfd_process *kfd_create_process(struct file *filep) if (process) { pr_debug("Process already found\n"); } else { + /* If the process just called exec(3), it is possible that the +* cleanup of the kfd_process (following the release of the mm +* of the old process image) is still in the cleanup work queue. +* Make sure to drain any job before trying to recreate any +* resource for this process. +*/ + flush_workqueue(kfd_process_wq); + process = create_process(thread); if (IS_ERR(process)) goto out; -- 2.43.0
[PATCH AUTOSEL 5.15 12/15] drm/amdkfd: Flush the process wq before creating a kfd_process
From: Lancelot SIX [ Upstream commit f5b9053398e70a0c10aa9cb4dd5910ab6bc457c5 ] There is a race condition when re-creating a kfd_process for a process. This has been observed when a process under the debugger executes exec(3). In this scenario: - The process executes exec. - This will eventually release the process's mm, which will cause the kfd_process object associated with the process to be freed (kfd_process_free_notifier decrements the reference count to the kfd_process to 0). This causes kfd_process_ref_release to enqueue kfd_process_wq_release to the kfd_process_wq. - The debugger receives the PTRACE_EVENT_EXEC notification, and tries to re-enable AMDGPU traps (KFD_IOC_DBG_TRAP_ENABLE). - When handling this request, KFD tries to re-create a kfd_process. This eventually calls kfd_create_process and kobject_init_and_add. At this point the call to kobject_init_and_add can fail because the old kfd_process.kobj has not been freed yet by kfd_process_wq_release. This patch proposes to avoid this race by making sure to drain kfd_process_wq before creating a new kfd_process object. This way, we know that any cleanup task is done executing when we reach kobject_init_and_add. Signed-off-by: Lancelot SIX Reviewed-by: Felix Kuehling Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin --- drivers/gpu/drm/amd/amdkfd/kfd_process.c | 8 1 file changed, 8 insertions(+) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd/kfd_process.c index 21ec8a18cad29..7f69031f2b61a 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c @@ -818,6 +818,14 @@ struct kfd_process *kfd_create_process(struct file *filep) if (process) { pr_debug("Process already found\n"); } else { + /* If the process just called exec(3), it is possible that the +* cleanup of the kfd_process (following the release of the mm +* of the old process image) is still in the cleanup work queue. +* Make sure to drain any job before trying to recreate any +* resource for this process. +*/ + flush_workqueue(kfd_process_wq); + process = create_process(thread); if (IS_ERR(process)) goto out; -- 2.43.0
[PATCH AUTOSEL 5.15 11/15] drm/amd/display: Atom Integrated System Info v2_2 for DCN35
From: Gabe Teeger [ Upstream commit 9a35d205f466501dcfe5625ca313d944d0ac2d60 ] New request from KMD/VBIOS in order to support new UMA carveout model. This fixes a null dereference from accessing Ctx->dc_bios->integrated_info while it was NULL. DAL parses through the BIOS and extracts the necessary integrated_info but was missing a case for the new BIOS version 2.3. Reviewed-by: Nicholas Kazlauskas Acked-by: Aurabindo Pillai Signed-off-by: Gabe Teeger Tested-by: Daniel Wheeler Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin --- drivers/gpu/drm/amd/display/dc/bios/bios_parser2.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/gpu/drm/amd/display/dc/bios/bios_parser2.c b/drivers/gpu/drm/amd/display/dc/bios/bios_parser2.c index 228f098e5d88f..6bc8c6bee411e 100644 --- a/drivers/gpu/drm/amd/display/dc/bios/bios_parser2.c +++ b/drivers/gpu/drm/amd/display/dc/bios/bios_parser2.c @@ -2303,6 +2303,7 @@ static enum bp_result construct_integrated_info( result = get_integrated_info_v2_1(bp, info); break; case 2: + case 3: result = get_integrated_info_v2_2(bp, info); break; default: -- 2.43.0
[PATCH AUTOSEL 6.1 18/25] drm/amdkfd: Flush the process wq before creating a kfd_process
From: Lancelot SIX [ Upstream commit f5b9053398e70a0c10aa9cb4dd5910ab6bc457c5 ] There is a race condition when re-creating a kfd_process for a process. This has been observed when a process under the debugger executes exec(3). In this scenario: - The process executes exec. - This will eventually release the process's mm, which will cause the kfd_process object associated with the process to be freed (kfd_process_free_notifier decrements the reference count to the kfd_process to 0). This causes kfd_process_ref_release to enqueue kfd_process_wq_release to the kfd_process_wq. - The debugger receives the PTRACE_EVENT_EXEC notification, and tries to re-enable AMDGPU traps (KFD_IOC_DBG_TRAP_ENABLE). - When handling this request, KFD tries to re-create a kfd_process. This eventually calls kfd_create_process and kobject_init_and_add. At this point the call to kobject_init_and_add can fail because the old kfd_process.kobj has not been freed yet by kfd_process_wq_release. This patch proposes to avoid this race by making sure to drain kfd_process_wq before creating a new kfd_process object. This way, we know that any cleanup task is done executing when we reach kobject_init_and_add. Signed-off-by: Lancelot SIX Reviewed-by: Felix Kuehling Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin --- drivers/gpu/drm/amd/amdkfd/kfd_process.c | 8 1 file changed, 8 insertions(+) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd/kfd_process.c index 7f68d51541e8e..5bca6abd55aef 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c @@ -823,6 +823,14 @@ struct kfd_process *kfd_create_process(struct file *filep) if (process) { pr_debug("Process already found\n"); } else { + /* If the process just called exec(3), it is possible that the +* cleanup of the kfd_process (following the release of the mm +* of the old process image) is still in the cleanup work queue. +* Make sure to drain any job before trying to recreate any +* resource for this process. +*/ + flush_workqueue(kfd_process_wq); + process = create_process(thread); if (IS_ERR(process)) goto out; -- 2.43.0
[PATCH AUTOSEL 6.1 17/25] drm/amd/display: Add VCO speed parameter for DCN31 FPU
From: Rodrigo Siqueira [ Upstream commit 0e62103bdcbc88281e16add299a946fb3bd02fbe ] Add VCO speed parameters in the bounding box array. Acked-by: Wayne Lin Signed-off-by: Rodrigo Siqueira Tested-by: Daniel Wheeler Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin --- drivers/gpu/drm/amd/display/dc/dml/dcn31/dcn31_fpu.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/gpu/drm/amd/display/dc/dml/dcn31/dcn31_fpu.c b/drivers/gpu/drm/amd/display/dc/dml/dcn31/dcn31_fpu.c index 19d034341e640..cb2f6cd73af54 100644 --- a/drivers/gpu/drm/amd/display/dc/dml/dcn31/dcn31_fpu.c +++ b/drivers/gpu/drm/amd/display/dc/dml/dcn31/dcn31_fpu.c @@ -291,6 +291,7 @@ static struct _vcs_dpi_soc_bounding_box_st dcn3_15_soc = { .do_urgent_latency_adjustment = false, .urgent_latency_adjustment_fabric_clock_component_us = 0, .urgent_latency_adjustment_fabric_clock_reference_mhz = 0, + .dispclk_dppclk_vco_speed_mhz = 2400.0, .num_chans = 4, .dummy_pstate_latency_us = 10.0 }; @@ -438,6 +439,7 @@ static struct _vcs_dpi_soc_bounding_box_st dcn3_16_soc = { .do_urgent_latency_adjustment = false, .urgent_latency_adjustment_fabric_clock_component_us = 0, .urgent_latency_adjustment_fabric_clock_reference_mhz = 0, + .dispclk_dppclk_vco_speed_mhz = 2500.0, }; void dcn31_zero_pipe_dcc_fraction(display_e2e_pipe_params_st *pipes, -- 2.43.0
[PATCH AUTOSEL 6.1 16/25] drm/amd/display: Atom Integrated System Info v2_2 for DCN35
From: Gabe Teeger [ Upstream commit 9a35d205f466501dcfe5625ca313d944d0ac2d60 ] New request from KMD/VBIOS in order to support new UMA carveout model. This fixes a null dereference from accessing Ctx->dc_bios->integrated_info while it was NULL. DAL parses through the BIOS and extracts the necessary integrated_info but was missing a case for the new BIOS version 2.3. Reviewed-by: Nicholas Kazlauskas Acked-by: Aurabindo Pillai Signed-off-by: Gabe Teeger Tested-by: Daniel Wheeler Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin --- drivers/gpu/drm/amd/display/dc/bios/bios_parser2.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/gpu/drm/amd/display/dc/bios/bios_parser2.c b/drivers/gpu/drm/amd/display/dc/bios/bios_parser2.c index 93e40e0a15087..4d2590964a204 100644 --- a/drivers/gpu/drm/amd/display/dc/bios/bios_parser2.c +++ b/drivers/gpu/drm/amd/display/dc/bios/bios_parser2.c @@ -2962,6 +2962,7 @@ static enum bp_result construct_integrated_info( result = get_integrated_info_v2_1(bp, info); break; case 2: + case 3: result = get_integrated_info_v2_2(bp, info); break; default: -- 2.43.0
[PATCH AUTOSEL 6.1 15/25] drm/amd/display: Add dtbclk access to dcn315
From: Swapnil Patel [ Upstream commit a01b64f31d65bdc917d1afb4cec9915beb6931be ] [Why & How] Currently DCN315 clk manager is missing code to enable/disable dtbclk. Because of this, "optimized_required" flag is constantly set and this prevents FreeSync from engaging for certain high bandwidth display Modes which require DTBCLK. Reviewed-by: Dmytro Laktyushkin Acked-by: Aurabindo Pillai Signed-off-by: Swapnil Patel Tested-by: Daniel Wheeler Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin --- .../drm/amd/display/dc/clk_mgr/dcn315/dcn315_clk_mgr.c| 8 1 file changed, 8 insertions(+) diff --git a/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn315/dcn315_clk_mgr.c b/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn315/dcn315_clk_mgr.c index 28b83133db910..09eb1bc9aa030 100644 --- a/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn315/dcn315_clk_mgr.c +++ b/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn315/dcn315_clk_mgr.c @@ -131,6 +131,10 @@ static void dcn315_update_clocks(struct clk_mgr *clk_mgr_base, */ clk_mgr_base->clks.zstate_support = new_clocks->zstate_support; if (safe_to_lower) { + if (clk_mgr_base->clks.dtbclk_en && !new_clocks->dtbclk_en) { + dcn315_smu_set_dtbclk(clk_mgr, false); + clk_mgr_base->clks.dtbclk_en = new_clocks->dtbclk_en; + } /* check that we're not already in lower */ if (clk_mgr_base->clks.pwr_state != DCN_PWR_STATE_LOW_POWER) { display_count = dcn315_get_active_display_cnt_wa(dc, context); @@ -146,6 +150,10 @@ static void dcn315_update_clocks(struct clk_mgr *clk_mgr_base, } } } else { + if (!clk_mgr_base->clks.dtbclk_en && new_clocks->dtbclk_en) { + dcn315_smu_set_dtbclk(clk_mgr, true); + clk_mgr_base->clks.dtbclk_en = new_clocks->dtbclk_en; + } /* check that we're not already in D0 */ if (clk_mgr_base->clks.pwr_state != DCN_PWR_STATE_MISSION_MODE) { union display_idle_optimization_u idle_info = { 0 }; -- 2.43.0
[PATCH AUTOSEL 6.6 35/43] drm/amdkfd: Flush the process wq before creating a kfd_process
From: Lancelot SIX [ Upstream commit f5b9053398e70a0c10aa9cb4dd5910ab6bc457c5 ] There is a race condition when re-creating a kfd_process for a process. This has been observed when a process under the debugger executes exec(3). In this scenario: - The process executes exec. - This will eventually release the process's mm, which will cause the kfd_process object associated with the process to be freed (kfd_process_free_notifier decrements the reference count to the kfd_process to 0). This causes kfd_process_ref_release to enqueue kfd_process_wq_release to the kfd_process_wq. - The debugger receives the PTRACE_EVENT_EXEC notification, and tries to re-enable AMDGPU traps (KFD_IOC_DBG_TRAP_ENABLE). - When handling this request, KFD tries to re-create a kfd_process. This eventually calls kfd_create_process and kobject_init_and_add. At this point the call to kobject_init_and_add can fail because the old kfd_process.kobj has not been freed yet by kfd_process_wq_release. This patch proposes to avoid this race by making sure to drain kfd_process_wq before creating a new kfd_process object. This way, we know that any cleanup task is done executing when we reach kobject_init_and_add. Signed-off-by: Lancelot SIX Reviewed-by: Felix Kuehling Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin --- drivers/gpu/drm/amd/amdkfd/kfd_process.c | 8 1 file changed, 8 insertions(+) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd/kfd_process.c index 7a1a574106fac..d98e45aec76b4 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c @@ -828,6 +828,14 @@ struct kfd_process *kfd_create_process(struct task_struct *thread) if (process) { pr_debug("Process already found\n"); } else { + /* If the process just called exec(3), it is possible that the +* cleanup of the kfd_process (following the release of the mm +* of the old process image) is still in the cleanup work queue. +* Make sure to drain any job before trying to recreate any +* resource for this process. +*/ + flush_workqueue(kfd_process_wq); + process = create_process(thread); if (IS_ERR(process)) goto out; -- 2.43.0
[PATCH AUTOSEL 6.6 34/43] drm/amd/display: Disable seamless boot on 128b/132b encoding
From: Sung Joon Kim [ Upstream commit 6f0c228ed9184287031a66b46a79e5a3d2e73a86 ] [why] preOS will not support display mode programming and link training for UHBR rates. [how] If we detect a sink that's UHBR capable, disable seamless boot Reviewed-by: Anthony Koo Acked-by: Wayne Lin Signed-off-by: Sung Joon Kim Tested-by: Daniel Wheeler Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin --- drivers/gpu/drm/amd/display/dc/core/dc.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/gpu/drm/amd/display/dc/core/dc.c b/drivers/gpu/drm/amd/display/dc/core/dc.c index 46b10ff8f6d41..72db370e2f21f 100644 --- a/drivers/gpu/drm/amd/display/dc/core/dc.c +++ b/drivers/gpu/drm/amd/display/dc/core/dc.c @@ -1710,6 +1710,9 @@ bool dc_validate_boot_timing(const struct dc *dc, return false; } + if (link->dpcd_caps.channel_coding_cap.bits.DP_128b_132b_SUPPORTED) + return false; + if (dc->link_srv->edp_is_ilr_optimization_required(link, crtc_timing)) { DC_LOG_EVENT_LINK_TRAINING("Seamless boot disabled to optimize eDP link rate\n"); return false; -- 2.43.0
[PATCH AUTOSEL 6.6 33/43] drm/amd/display: Fix DC mode screen flickering on DCN321
From: Leo Ma [ Upstream commit ce649bd2d834db83ecc2756a362c9a1ec61658a5 ] [Why && How] Screen flickering saw on 4K@60 eDP with high refresh rate external monitor when booting up in DC mode. DC Mode Capping is disabled which caused wrong UCLK being used. Reviewed-by: Alvin Lee Acked-by: Wayne Lin Signed-off-by: Leo Ma Tested-by: Daniel Wheeler Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin --- .../amd/display/dc/clk_mgr/dcn32/dcn32_clk_mgr.c | 15 --- 1 file changed, 12 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn32/dcn32_clk_mgr.c b/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn32/dcn32_clk_mgr.c index e9345f6554dbc..2428a4763b85f 100644 --- a/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn32/dcn32_clk_mgr.c +++ b/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn32/dcn32_clk_mgr.c @@ -547,8 +547,12 @@ static void dcn32_update_clocks(struct clk_mgr *clk_mgr_base, * since we calculate mode support based on softmax being the max UCLK * frequency. */ - dcn32_smu_set_hard_min_by_freq(clk_mgr, PPCLK_UCLK, - dc->clk_mgr->bw_params->dc_mode_softmax_memclk); + if (dc->debug.disable_dc_mode_overwrite) { + dcn30_smu_set_hard_max_by_freq(clk_mgr, PPCLK_UCLK, dc->clk_mgr->bw_params->max_memclk_mhz); + dcn32_smu_set_hard_min_by_freq(clk_mgr, PPCLK_UCLK, dc->clk_mgr->bw_params->max_memclk_mhz); + } else + dcn32_smu_set_hard_min_by_freq(clk_mgr, PPCLK_UCLK, + dc->clk_mgr->bw_params->dc_mode_softmax_memclk); } else { dcn32_smu_set_hard_min_by_freq(clk_mgr, PPCLK_UCLK, dc->clk_mgr->bw_params->max_memclk_mhz); } @@ -581,8 +585,13 @@ static void dcn32_update_clocks(struct clk_mgr *clk_mgr_base, /* set UCLK to requested value if P-State switching is supported, or to re-enable P-State switching */ if (clk_mgr_base->clks.p_state_change_support && (update_uclk || !clk_mgr_base->clks.prev_p_state_change_support) && - !dc->work_arounds.clock_update_disable_mask.uclk) + !dc->work_arounds.clock_update_disable_mask.uclk) { + if (dc->clk_mgr->dc_mode_softmax_enabled && dc->debug.disable_dc_mode_overwrite) + dcn30_smu_set_hard_max_by_freq(clk_mgr, PPCLK_UCLK, + max((int)dc->clk_mgr->bw_params->dc_mode_softmax_memclk, khz_to_mhz_ceil(clk_mgr_base->clks.dramclk_khz))); + dcn32_smu_set_hard_min_by_freq(clk_mgr, PPCLK_UCLK, khz_to_mhz_ceil(clk_mgr_base->clks.dramclk_khz)); + } if (clk_mgr_base->clks.num_ways != new_clocks->num_ways && clk_mgr_base->clks.num_ways > new_clocks->num_ways) { -- 2.43.0
[PATCH AUTOSEL 6.6 32/43] drm/amd/display: Add VCO speed parameter for DCN31 FPU
From: Rodrigo Siqueira [ Upstream commit 0e62103bdcbc88281e16add299a946fb3bd02fbe ] Add VCO speed parameters in the bounding box array. Acked-by: Wayne Lin Signed-off-by: Rodrigo Siqueira Tested-by: Daniel Wheeler Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin --- drivers/gpu/drm/amd/display/dc/dml/dcn31/dcn31_fpu.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/gpu/drm/amd/display/dc/dml/dcn31/dcn31_fpu.c b/drivers/gpu/drm/amd/display/dc/dml/dcn31/dcn31_fpu.c index deb6d162a2d5c..7307b7b8d8ad7 100644 --- a/drivers/gpu/drm/amd/display/dc/dml/dcn31/dcn31_fpu.c +++ b/drivers/gpu/drm/amd/display/dc/dml/dcn31/dcn31_fpu.c @@ -291,6 +291,7 @@ static struct _vcs_dpi_soc_bounding_box_st dcn3_15_soc = { .do_urgent_latency_adjustment = false, .urgent_latency_adjustment_fabric_clock_component_us = 0, .urgent_latency_adjustment_fabric_clock_reference_mhz = 0, + .dispclk_dppclk_vco_speed_mhz = 2400.0, .num_chans = 4, .dummy_pstate_latency_us = 10.0 }; @@ -438,6 +439,7 @@ static struct _vcs_dpi_soc_bounding_box_st dcn3_16_soc = { .do_urgent_latency_adjustment = false, .urgent_latency_adjustment_fabric_clock_component_us = 0, .urgent_latency_adjustment_fabric_clock_reference_mhz = 0, + .dispclk_dppclk_vco_speed_mhz = 2500.0, }; void dcn31_zero_pipe_dcc_fraction(display_e2e_pipe_params_st *pipes, -- 2.43.0
[PATCH AUTOSEL 6.6 31/43] drm/amd/display: Allocate zero bw after bw alloc enable
From: Meenakshikumar Somasundaram [ Upstream commit 46fe9cb1a9e62f4e6229f48ae303ef8e6c1fdc64 ] [Why] During DP tunnel creation, CM preallocates BW and reduces estimated BW of other DPIA. CM release preallocation only when allocation is complete. Display mode validation logic validates timings based on bw available per host router. In multi display setup, this causes bw allocation failure when allocation greater than estimated bw. [How] Do zero alloc to make the CM to release preallocation and update estimated BW correctly for all DPIAs per host router. Reviewed-by: PeiChen Huang Acked-by: Aurabindo Pillai Signed-off-by: Meenakshikumar Somasundaram Tested-by: Daniel Wheeler Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin --- .../amd/display/dc/link/protocols/link_dp_dpia_bw.c| 10 +- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/display/dc/link/protocols/link_dp_dpia_bw.c b/drivers/gpu/drm/amd/display/dc/link/protocols/link_dp_dpia_bw.c index 5491b707cec88..5a965c26bf209 100644 --- a/drivers/gpu/drm/amd/display/dc/link/protocols/link_dp_dpia_bw.c +++ b/drivers/gpu/drm/amd/display/dc/link/protocols/link_dp_dpia_bw.c @@ -270,7 +270,7 @@ static void set_usb4_req_bw_req(struct dc_link *link, int req_bw) /* Error check whether requested and allocated are equal */ req_bw = requested_bw * (Kbps_TO_Gbps / link->dpia_bw_alloc_config.bw_granularity); - if (req_bw == link->dpia_bw_alloc_config.allocated_bw) { + if (req_bw && (req_bw == link->dpia_bw_alloc_config.allocated_bw)) { DC_LOG_ERROR("%s: Request bw equals to allocated bw for link(%d)\n", __func__, link->link_index); } @@ -341,6 +341,14 @@ bool link_dp_dpia_set_dptx_usb4_bw_alloc_support(struct dc_link *link) ret = true; init_usb4_bw_struct(link); link->dpia_bw_alloc_config.bw_alloc_enabled = true; + + /* +* During DP tunnel creation, CM preallocates BW and reduces estimated BW of other +* DPIA. CM release preallocation only when allocation is complete. Do zero alloc +* to make the CM to release preallocation and update estimated BW correctly for +* all DPIAs per host router +*/ + link_dp_dpia_allocate_usb4_bandwidth_for_stream(link, 0); } } -- 2.43.0
[PATCH AUTOSEL 6.6 30/43] drm/amd/display: Atom Integrated System Info v2_2 for DCN35
From: Gabe Teeger [ Upstream commit 9a35d205f466501dcfe5625ca313d944d0ac2d60 ] New request from KMD/VBIOS in order to support new UMA carveout model. This fixes a null dereference from accessing Ctx->dc_bios->integrated_info while it was NULL. DAL parses through the BIOS and extracts the necessary integrated_info but was missing a case for the new BIOS version 2.3. Reviewed-by: Nicholas Kazlauskas Acked-by: Aurabindo Pillai Signed-off-by: Gabe Teeger Tested-by: Daniel Wheeler Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin --- drivers/gpu/drm/amd/display/dc/bios/bios_parser2.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/gpu/drm/amd/display/dc/bios/bios_parser2.c b/drivers/gpu/drm/amd/display/dc/bios/bios_parser2.c index 4c3c4c8de1cfc..93720cf069d7c 100644 --- a/drivers/gpu/drm/amd/display/dc/bios/bios_parser2.c +++ b/drivers/gpu/drm/amd/display/dc/bios/bios_parser2.c @@ -2961,6 +2961,7 @@ static enum bp_result construct_integrated_info( result = get_integrated_info_v2_1(bp, info); break; case 2: + case 3: result = get_integrated_info_v2_2(bp, info); break; default: -- 2.43.0
[PATCH AUTOSEL 6.6 29/43] drm/amd/display: Add dtbclk access to dcn315
From: Swapnil Patel [ Upstream commit a01b64f31d65bdc917d1afb4cec9915beb6931be ] [Why & How] Currently DCN315 clk manager is missing code to enable/disable dtbclk. Because of this, "optimized_required" flag is constantly set and this prevents FreeSync from engaging for certain high bandwidth display Modes which require DTBCLK. Reviewed-by: Dmytro Laktyushkin Acked-by: Aurabindo Pillai Signed-off-by: Swapnil Patel Tested-by: Daniel Wheeler Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin --- .../drm/amd/display/dc/clk_mgr/dcn315/dcn315_clk_mgr.c| 8 1 file changed, 8 insertions(+) diff --git a/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn315/dcn315_clk_mgr.c b/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn315/dcn315_clk_mgr.c index 8776055bbeaae..d4d3f58a613f7 100644 --- a/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn315/dcn315_clk_mgr.c +++ b/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn315/dcn315_clk_mgr.c @@ -145,6 +145,10 @@ static void dcn315_update_clocks(struct clk_mgr *clk_mgr_base, */ clk_mgr_base->clks.zstate_support = new_clocks->zstate_support; if (safe_to_lower) { + if (clk_mgr_base->clks.dtbclk_en && !new_clocks->dtbclk_en) { + dcn315_smu_set_dtbclk(clk_mgr, false); + clk_mgr_base->clks.dtbclk_en = new_clocks->dtbclk_en; + } /* check that we're not already in lower */ if (clk_mgr_base->clks.pwr_state != DCN_PWR_STATE_LOW_POWER) { display_count = dcn315_get_active_display_cnt_wa(dc, context); @@ -160,6 +164,10 @@ static void dcn315_update_clocks(struct clk_mgr *clk_mgr_base, } } } else { + if (!clk_mgr_base->clks.dtbclk_en && new_clocks->dtbclk_en) { + dcn315_smu_set_dtbclk(clk_mgr, true); + clk_mgr_base->clks.dtbclk_en = new_clocks->dtbclk_en; + } /* check that we're not already in D0 */ if (clk_mgr_base->clks.pwr_state != DCN_PWR_STATE_MISSION_MODE) { union display_idle_optimization_u idle_info = { 0 }; -- 2.43.0
[PATCH AUTOSEL 6.6 28/43] drm/amdgpu: Fix VRAM memory accounting
From: Mukul Joshi [ Upstream commit f06446ef23216090d1ee8ede1a7d7ae430c22dcc ] Subtract the VRAM pinned memory when checking for available memory in amdgpu_amdkfd_reserve_mem_limit function since that memory is not available for use. Signed-off-by: Mukul Joshi Reviewed-by: Felix Kuehling Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c index 15c5a2533ba60..704567885c7a4 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c @@ -213,7 +213,7 @@ int amdgpu_amdkfd_reserve_mem_limit(struct amdgpu_device *adev, (kfd_mem_limit.ttm_mem_used + ttm_mem_needed > kfd_mem_limit.max_ttm_mem_limit) || (adev && xcp_id >= 0 && adev->kfd.vram_used[xcp_id] + vram_needed > -vram_size - reserved_for_pt)) { +vram_size - reserved_for_pt - atomic64_read(>vram_pin_size))) { ret = -ENOMEM; goto release; } -- 2.43.0
[PATCH AUTOSEL 6.8 43/52] drm/amdkfd: Flush the process wq before creating a kfd_process
From: Lancelot SIX [ Upstream commit f5b9053398e70a0c10aa9cb4dd5910ab6bc457c5 ] There is a race condition when re-creating a kfd_process for a process. This has been observed when a process under the debugger executes exec(3). In this scenario: - The process executes exec. - This will eventually release the process's mm, which will cause the kfd_process object associated with the process to be freed (kfd_process_free_notifier decrements the reference count to the kfd_process to 0). This causes kfd_process_ref_release to enqueue kfd_process_wq_release to the kfd_process_wq. - The debugger receives the PTRACE_EVENT_EXEC notification, and tries to re-enable AMDGPU traps (KFD_IOC_DBG_TRAP_ENABLE). - When handling this request, KFD tries to re-create a kfd_process. This eventually calls kfd_create_process and kobject_init_and_add. At this point the call to kobject_init_and_add can fail because the old kfd_process.kobj has not been freed yet by kfd_process_wq_release. This patch proposes to avoid this race by making sure to drain kfd_process_wq before creating a new kfd_process object. This way, we know that any cleanup task is done executing when we reach kobject_init_and_add. Signed-off-by: Lancelot SIX Reviewed-by: Felix Kuehling Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin --- drivers/gpu/drm/amd/amdkfd/kfd_process.c | 8 1 file changed, 8 insertions(+) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd/kfd_process.c index 58c1fe5421934..451bb058cc620 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c @@ -829,6 +829,14 @@ struct kfd_process *kfd_create_process(struct task_struct *thread) if (process) { pr_debug("Process already found\n"); } else { + /* If the process just called exec(3), it is possible that the +* cleanup of the kfd_process (following the release of the mm +* of the old process image) is still in the cleanup work queue. +* Make sure to drain any job before trying to recreate any +* resource for this process. +*/ + flush_workqueue(kfd_process_wq); + process = create_process(thread); if (IS_ERR(process)) goto out; -- 2.43.0
[PATCH AUTOSEL 6.8 42/52] drm/amd/display: Disable seamless boot on 128b/132b encoding
From: Sung Joon Kim [ Upstream commit 6f0c228ed9184287031a66b46a79e5a3d2e73a86 ] [why] preOS will not support display mode programming and link training for UHBR rates. [how] If we detect a sink that's UHBR capable, disable seamless boot Reviewed-by: Anthony Koo Acked-by: Wayne Lin Signed-off-by: Sung Joon Kim Tested-by: Daniel Wheeler Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin --- drivers/gpu/drm/amd/display/dc/core/dc.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/gpu/drm/amd/display/dc/core/dc.c b/drivers/gpu/drm/amd/display/dc/core/dc.c index 3c3d613c5f00e..040b5c2a57586 100644 --- a/drivers/gpu/drm/amd/display/dc/core/dc.c +++ b/drivers/gpu/drm/amd/display/dc/core/dc.c @@ -1804,6 +1804,9 @@ bool dc_validate_boot_timing(const struct dc *dc, return false; } + if (link->dpcd_caps.channel_coding_cap.bits.DP_128b_132b_SUPPORTED) + return false; + if (dc->link_srv->edp_is_ilr_optimization_required(link, crtc_timing)) { DC_LOG_EVENT_LINK_TRAINING("Seamless boot disabled to optimize eDP link rate\n"); return false; -- 2.43.0
[PATCH AUTOSEL 6.8 41/52] drm/amd/display: Fix DC mode screen flickering on DCN321
From: Leo Ma [ Upstream commit ce649bd2d834db83ecc2756a362c9a1ec61658a5 ] [Why && How] Screen flickering saw on 4K@60 eDP with high refresh rate external monitor when booting up in DC mode. DC Mode Capping is disabled which caused wrong UCLK being used. Reviewed-by: Alvin Lee Acked-by: Wayne Lin Signed-off-by: Leo Ma Tested-by: Daniel Wheeler Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin --- .../amd/display/dc/clk_mgr/dcn32/dcn32_clk_mgr.c | 15 --- 1 file changed, 12 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn32/dcn32_clk_mgr.c b/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn32/dcn32_clk_mgr.c index bbdbc78161a00..39c63565baa9a 100644 --- a/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn32/dcn32_clk_mgr.c +++ b/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn32/dcn32_clk_mgr.c @@ -696,8 +696,12 @@ static void dcn32_update_clocks(struct clk_mgr *clk_mgr_base, * since we calculate mode support based on softmax being the max UCLK * frequency. */ - dcn32_smu_set_hard_min_by_freq(clk_mgr, PPCLK_UCLK, - dc->clk_mgr->bw_params->dc_mode_softmax_memclk); + if (dc->debug.disable_dc_mode_overwrite) { + dcn30_smu_set_hard_max_by_freq(clk_mgr, PPCLK_UCLK, dc->clk_mgr->bw_params->max_memclk_mhz); + dcn32_smu_set_hard_min_by_freq(clk_mgr, PPCLK_UCLK, dc->clk_mgr->bw_params->max_memclk_mhz); + } else + dcn32_smu_set_hard_min_by_freq(clk_mgr, PPCLK_UCLK, + dc->clk_mgr->bw_params->dc_mode_softmax_memclk); } else { dcn32_smu_set_hard_min_by_freq(clk_mgr, PPCLK_UCLK, dc->clk_mgr->bw_params->max_memclk_mhz); } @@ -730,8 +734,13 @@ static void dcn32_update_clocks(struct clk_mgr *clk_mgr_base, /* set UCLK to requested value if P-State switching is supported, or to re-enable P-State switching */ if (clk_mgr_base->clks.p_state_change_support && (update_uclk || !clk_mgr_base->clks.prev_p_state_change_support) && - !dc->work_arounds.clock_update_disable_mask.uclk) + !dc->work_arounds.clock_update_disable_mask.uclk) { + if (dc->clk_mgr->dc_mode_softmax_enabled && dc->debug.disable_dc_mode_overwrite) + dcn30_smu_set_hard_max_by_freq(clk_mgr, PPCLK_UCLK, + max((int)dc->clk_mgr->bw_params->dc_mode_softmax_memclk, khz_to_mhz_ceil(clk_mgr_base->clks.dramclk_khz))); + dcn32_smu_set_hard_min_by_freq(clk_mgr, PPCLK_UCLK, khz_to_mhz_ceil(clk_mgr_base->clks.dramclk_khz)); + } if (clk_mgr_base->clks.num_ways != new_clocks->num_ways && clk_mgr_base->clks.num_ways > new_clocks->num_ways) { -- 2.43.0
[PATCH AUTOSEL 6.8 40/52] drm/amd/display: Add VCO speed parameter for DCN31 FPU
From: Rodrigo Siqueira [ Upstream commit 0e62103bdcbc88281e16add299a946fb3bd02fbe ] Add VCO speed parameters in the bounding box array. Acked-by: Wayne Lin Signed-off-by: Rodrigo Siqueira Tested-by: Daniel Wheeler Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin --- drivers/gpu/drm/amd/display/dc/dml/dcn31/dcn31_fpu.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/gpu/drm/amd/display/dc/dml/dcn31/dcn31_fpu.c b/drivers/gpu/drm/amd/display/dc/dml/dcn31/dcn31_fpu.c index deb6d162a2d5c..7307b7b8d8ad7 100644 --- a/drivers/gpu/drm/amd/display/dc/dml/dcn31/dcn31_fpu.c +++ b/drivers/gpu/drm/amd/display/dc/dml/dcn31/dcn31_fpu.c @@ -291,6 +291,7 @@ static struct _vcs_dpi_soc_bounding_box_st dcn3_15_soc = { .do_urgent_latency_adjustment = false, .urgent_latency_adjustment_fabric_clock_component_us = 0, .urgent_latency_adjustment_fabric_clock_reference_mhz = 0, + .dispclk_dppclk_vco_speed_mhz = 2400.0, .num_chans = 4, .dummy_pstate_latency_us = 10.0 }; @@ -438,6 +439,7 @@ static struct _vcs_dpi_soc_bounding_box_st dcn3_16_soc = { .do_urgent_latency_adjustment = false, .urgent_latency_adjustment_fabric_clock_component_us = 0, .urgent_latency_adjustment_fabric_clock_reference_mhz = 0, + .dispclk_dppclk_vco_speed_mhz = 2500.0, }; void dcn31_zero_pipe_dcc_fraction(display_e2e_pipe_params_st *pipes, -- 2.43.0
[PATCH AUTOSEL 6.8 39/52] drm/amd/display: Allocate zero bw after bw alloc enable
From: Meenakshikumar Somasundaram [ Upstream commit 46fe9cb1a9e62f4e6229f48ae303ef8e6c1fdc64 ] [Why] During DP tunnel creation, CM preallocates BW and reduces estimated BW of other DPIA. CM release preallocation only when allocation is complete. Display mode validation logic validates timings based on bw available per host router. In multi display setup, this causes bw allocation failure when allocation greater than estimated bw. [How] Do zero alloc to make the CM to release preallocation and update estimated BW correctly for all DPIAs per host router. Reviewed-by: PeiChen Huang Acked-by: Aurabindo Pillai Signed-off-by: Meenakshikumar Somasundaram Tested-by: Daniel Wheeler Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin --- .../amd/display/dc/link/protocols/link_dp_dpia_bw.c| 10 +- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/display/dc/link/protocols/link_dp_dpia_bw.c b/drivers/gpu/drm/amd/display/dc/link/protocols/link_dp_dpia_bw.c index 5491b707cec88..5a965c26bf209 100644 --- a/drivers/gpu/drm/amd/display/dc/link/protocols/link_dp_dpia_bw.c +++ b/drivers/gpu/drm/amd/display/dc/link/protocols/link_dp_dpia_bw.c @@ -270,7 +270,7 @@ static void set_usb4_req_bw_req(struct dc_link *link, int req_bw) /* Error check whether requested and allocated are equal */ req_bw = requested_bw * (Kbps_TO_Gbps / link->dpia_bw_alloc_config.bw_granularity); - if (req_bw == link->dpia_bw_alloc_config.allocated_bw) { + if (req_bw && (req_bw == link->dpia_bw_alloc_config.allocated_bw)) { DC_LOG_ERROR("%s: Request bw equals to allocated bw for link(%d)\n", __func__, link->link_index); } @@ -341,6 +341,14 @@ bool link_dp_dpia_set_dptx_usb4_bw_alloc_support(struct dc_link *link) ret = true; init_usb4_bw_struct(link); link->dpia_bw_alloc_config.bw_alloc_enabled = true; + + /* +* During DP tunnel creation, CM preallocates BW and reduces estimated BW of other +* DPIA. CM release preallocation only when allocation is complete. Do zero alloc +* to make the CM to release preallocation and update estimated BW correctly for +* all DPIAs per host router +*/ + link_dp_dpia_allocate_usb4_bandwidth_for_stream(link, 0); } } -- 2.43.0
[PATCH AUTOSEL 6.8 38/52] drm/amd/display: Atom Integrated System Info v2_2 for DCN35
From: Gabe Teeger [ Upstream commit 9a35d205f466501dcfe5625ca313d944d0ac2d60 ] New request from KMD/VBIOS in order to support new UMA carveout model. This fixes a null dereference from accessing Ctx->dc_bios->integrated_info while it was NULL. DAL parses through the BIOS and extracts the necessary integrated_info but was missing a case for the new BIOS version 2.3. Reviewed-by: Nicholas Kazlauskas Acked-by: Aurabindo Pillai Signed-off-by: Gabe Teeger Tested-by: Daniel Wheeler Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin --- drivers/gpu/drm/amd/display/dc/bios/bios_parser2.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/gpu/drm/amd/display/dc/bios/bios_parser2.c b/drivers/gpu/drm/amd/display/dc/bios/bios_parser2.c index 05f392501c0ae..ab31643b10969 100644 --- a/drivers/gpu/drm/amd/display/dc/bios/bios_parser2.c +++ b/drivers/gpu/drm/amd/display/dc/bios/bios_parser2.c @@ -2948,6 +2948,7 @@ static enum bp_result construct_integrated_info( result = get_integrated_info_v2_1(bp, info); break; case 2: + case 3: result = get_integrated_info_v2_2(bp, info); break; default: -- 2.43.0
[PATCH AUTOSEL 6.8 37/52] drm/amd/display: Add dtbclk access to dcn315
From: Swapnil Patel [ Upstream commit a01b64f31d65bdc917d1afb4cec9915beb6931be ] [Why & How] Currently DCN315 clk manager is missing code to enable/disable dtbclk. Because of this, "optimized_required" flag is constantly set and this prevents FreeSync from engaging for certain high bandwidth display Modes which require DTBCLK. Reviewed-by: Dmytro Laktyushkin Acked-by: Aurabindo Pillai Signed-off-by: Swapnil Patel Tested-by: Daniel Wheeler Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin --- .../drm/amd/display/dc/clk_mgr/dcn315/dcn315_clk_mgr.c| 8 1 file changed, 8 insertions(+) diff --git a/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn315/dcn315_clk_mgr.c b/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn315/dcn315_clk_mgr.c index 644da46373209..5506cf9b3672f 100644 --- a/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn315/dcn315_clk_mgr.c +++ b/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn315/dcn315_clk_mgr.c @@ -145,6 +145,10 @@ static void dcn315_update_clocks(struct clk_mgr *clk_mgr_base, */ clk_mgr_base->clks.zstate_support = new_clocks->zstate_support; if (safe_to_lower) { + if (clk_mgr_base->clks.dtbclk_en && !new_clocks->dtbclk_en) { + dcn315_smu_set_dtbclk(clk_mgr, false); + clk_mgr_base->clks.dtbclk_en = new_clocks->dtbclk_en; + } /* check that we're not already in lower */ if (clk_mgr_base->clks.pwr_state != DCN_PWR_STATE_LOW_POWER) { display_count = dcn315_get_active_display_cnt_wa(dc, context); @@ -160,6 +164,10 @@ static void dcn315_update_clocks(struct clk_mgr *clk_mgr_base, } } } else { + if (!clk_mgr_base->clks.dtbclk_en && new_clocks->dtbclk_en) { + dcn315_smu_set_dtbclk(clk_mgr, true); + clk_mgr_base->clks.dtbclk_en = new_clocks->dtbclk_en; + } /* check that we're not already in D0 */ if (clk_mgr_base->clks.pwr_state != DCN_PWR_STATE_MISSION_MODE) { union display_idle_optimization_u idle_info = { 0 }; -- 2.43.0
[PATCH AUTOSEL 6.8 36/52] drm/amd/display: Ensure that dmcub support flag is set for DCN20
From: Rodrigo Siqueira [ Upstream commit be53bd4f00aa4c7db9f41116224c027b4cfce8e3 ] In the DCN20 resource initialization, ensure that DMCUB support starts configured as true. Signed-off-by: Rodrigo Siqueira Acked-by: Aurabindo Pillai Tested-by: Daniel Wheeler Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin --- drivers/gpu/drm/amd/display/dc/resource/dcn20/dcn20_resource.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/gpu/drm/amd/display/dc/resource/dcn20/dcn20_resource.c b/drivers/gpu/drm/amd/display/dc/resource/dcn20/dcn20_resource.c index f9c5bc624be30..f81f6110913d3 100644 --- a/drivers/gpu/drm/amd/display/dc/resource/dcn20/dcn20_resource.c +++ b/drivers/gpu/drm/amd/display/dc/resource/dcn20/dcn20_resource.c @@ -2451,6 +2451,7 @@ static bool dcn20_resource_construct( dc->caps.post_blend_color_processing = true; dc->caps.force_dp_tps4_for_cp2520 = true; dc->caps.extended_aux_timeout_support = true; + dc->caps.dmcub_support = true; /* Color pipeline capabilities */ dc->caps.color.dpp.dcn_arch = 1; -- 2.43.0
[PATCH AUTOSEL 6.8 35/52] drm/amdgpu: Fix VRAM memory accounting
From: Mukul Joshi [ Upstream commit f06446ef23216090d1ee8ede1a7d7ae430c22dcc ] Subtract the VRAM pinned memory when checking for available memory in amdgpu_amdkfd_reserve_mem_limit function since that memory is not available for use. Signed-off-by: Mukul Joshi Reviewed-by: Felix Kuehling Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c index daa66eb4f722b..b1e2dd52e643d 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c @@ -220,7 +220,7 @@ int amdgpu_amdkfd_reserve_mem_limit(struct amdgpu_device *adev, (kfd_mem_limit.ttm_mem_used + ttm_mem_needed > kfd_mem_limit.max_ttm_mem_limit) || (adev && xcp_id >= 0 && adev->kfd.vram_used[xcp_id] + vram_needed > -vram_size - reserved_for_pt)) { +vram_size - reserved_for_pt - atomic64_read(>vram_pin_size))) { ret = -ENOMEM; goto release; } -- 2.43.0
[PATCH] drm/msm: remove python 3.9 dependency for compiling msm
Since commit 5acf49119630 ("drm/msm: import gen_header.py script from Mesa"), compilation is broken on machines having python versions older than 3.9 due to dependency on argparse.BooleanOptionalAction. Switch to use simple bool for the validate flag to remove the dependency. Fixes: 5acf49119630 ("drm/msm: import gen_header.py script from Mesa") Signed-off-by: Abhinav Kumar --- drivers/gpu/drm/msm/registers/gen_header.py | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/msm/registers/gen_header.py b/drivers/gpu/drm/msm/registers/gen_header.py index fc3bfdc991d2..3926485bb197 100644 --- a/drivers/gpu/drm/msm/registers/gen_header.py +++ b/drivers/gpu/drm/msm/registers/gen_header.py @@ -538,7 +538,7 @@ class Parser(object): self.variants.add(reg.domain) def do_validate(self, schemafile): - if self.validate == False: + if not self.validate: return try: @@ -948,7 +948,8 @@ def main(): parser = argparse.ArgumentParser() parser.add_argument('--rnn', type=str, required=True) parser.add_argument('--xml', type=str, required=True) - parser.add_argument('--validate', action=argparse.BooleanOptionalAction) + parser.add_argument('--validate', default=False, action='store_true') + parser.add_argument('--no-validate', dest='validate', action='store_false') subparsers = parser.add_subparsers() subparsers.required = True -- 2.44.0
[PATCH AUTOSEL 4.19 3/3] drm/amd/display: Set color_mgmt_changed to true on unsuspend
From: Joshua Ashton [ Upstream commit 2eb9dd497a698dc384c0dd3e0311d541eb2e13dd ] Otherwise we can end up with a frame on unsuspend where color management is not applied when userspace has not committed themselves. Fixes re-applying color management on Steam Deck/Gamescope on S3 resume. Signed-off-by: Joshua Ashton Reviewed-by: Harry Wentland Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin --- drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c index 98d51bc204172..e4139723c473c 100644 --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c @@ -816,6 +816,7 @@ static int dm_resume(void *handle) dc_stream_release(dm_new_crtc_state->stream); dm_new_crtc_state->stream = NULL; } + dm_new_crtc_state->base.color_mgmt_changed = true; } for_each_new_plane_in_state(dm->cached_state, plane, new_plane_state, i) { -- 2.43.0
[PATCH AUTOSEL 5.4 3/3] drm/amd/display: Set color_mgmt_changed to true on unsuspend
From: Joshua Ashton [ Upstream commit 2eb9dd497a698dc384c0dd3e0311d541eb2e13dd ] Otherwise we can end up with a frame on unsuspend where color management is not applied when userspace has not committed themselves. Fixes re-applying color management on Steam Deck/Gamescope on S3 resume. Signed-off-by: Joshua Ashton Reviewed-by: Harry Wentland Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin --- drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c index 3f3242783e1c3..3bfc4aa328c6f 100644 --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c @@ -1251,6 +1251,7 @@ static int dm_resume(void *handle) dc_stream_release(dm_new_crtc_state->stream); dm_new_crtc_state->stream = NULL; } + dm_new_crtc_state->base.color_mgmt_changed = true; } for_each_new_plane_in_state(dm->cached_state, plane, new_plane_state, i) { -- 2.43.0
[PATCH AUTOSEL 5.10 3/3] drm/amd/display: Set color_mgmt_changed to true on unsuspend
From: Joshua Ashton [ Upstream commit 2eb9dd497a698dc384c0dd3e0311d541eb2e13dd ] Otherwise we can end up with a frame on unsuspend where color management is not applied when userspace has not committed themselves. Fixes re-applying color management on Steam Deck/Gamescope on S3 resume. Signed-off-by: Joshua Ashton Reviewed-by: Harry Wentland Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin --- drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c index 3578e3b3536e3..29ef0ed44d5f4 100644 --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c @@ -2099,6 +2099,7 @@ static int dm_resume(void *handle) dc_stream_release(dm_new_crtc_state->stream); dm_new_crtc_state->stream = NULL; } + dm_new_crtc_state->base.color_mgmt_changed = true; } for_each_new_plane_in_state(dm->cached_state, plane, new_plane_state, i) { -- 2.43.0
[PATCH AUTOSEL 5.15 4/5] drm/amd/display: Set color_mgmt_changed to true on unsuspend
From: Joshua Ashton [ Upstream commit 2eb9dd497a698dc384c0dd3e0311d541eb2e13dd ] Otherwise we can end up with a frame on unsuspend where color management is not applied when userspace has not committed themselves. Fixes re-applying color management on Steam Deck/Gamescope on S3 resume. Signed-off-by: Joshua Ashton Reviewed-by: Harry Wentland Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin --- drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c index b7b8a2d77da67..b821abb56ac3b 100644 --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c @@ -2772,6 +2772,7 @@ static int dm_resume(void *handle) dc_stream_release(dm_new_crtc_state->stream); dm_new_crtc_state->stream = NULL; } + dm_new_crtc_state->base.color_mgmt_changed = true; } for_each_new_plane_in_state(dm->cached_state, plane, new_plane_state, i) { -- 2.43.0
[PATCH AUTOSEL 6.1 09/12] drm/amdgpu/mes: fix use-after-free issue
From: Jack Xiao [ Upstream commit 948255282074d9367e01908b3f5dcf8c10fc9c3d ] Delete fence fallback timer to fix the ramdom use-after-free issue. v2: move to amdgpu_mes.c Signed-off-by: Jack Xiao Acked-by: Lijo Lazar Acked-by: Christian König Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin --- drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c index bebd136ed5444..9a4cbfbd5d9e5 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c @@ -1083,6 +1083,7 @@ void amdgpu_mes_remove_ring(struct amdgpu_device *adev, return; amdgpu_mes_remove_hw_queue(adev, ring->hw_queue_id); + del_timer_sync(>fence_drv.fallback_timer); amdgpu_ring_fini(ring); kfree(ring); } -- 2.43.0
[PATCH AUTOSEL 6.1 08/12] drm/amdgpu: Fix the ring buffer size for queue VM flush
From: Prike Liang [ Upstream commit fe93b0927bc58cb1d64230f45744e527d9d8482c ] Here are the corrections needed for the queue ring buffer size calculation for the following cases: - Remove the KIQ VM flush ring usage. - Add the invalidate TLBs packet for gfx10 and gfx11 queue. - There's no VM flush and PFP sync, so remove the gfx9 real ring and compute ring buffer usage. Signed-off-by: Prike Liang Reviewed-by: Christian König Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin --- drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 3 +-- drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c | 3 +-- drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 2 -- 3 files changed, 2 insertions(+), 6 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c index 84a36b50ddd87..f8382b227ad46 100644 --- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c @@ -9352,7 +9352,7 @@ static const struct amdgpu_ring_funcs gfx_v10_0_ring_funcs_gfx = { 7 + /* PIPELINE_SYNC */ SOC15_FLUSH_GPU_TLB_NUM_WREG * 5 + SOC15_FLUSH_GPU_TLB_NUM_REG_WAIT * 7 + - 2 + /* VM_FLUSH */ + 4 + /* VM_FLUSH */ 8 + /* FENCE for VM_FLUSH */ 20 + /* GDS switch */ 4 + /* double SWITCH_BUFFER, @@ -9445,7 +9445,6 @@ static const struct amdgpu_ring_funcs gfx_v10_0_ring_funcs_kiq = { 7 + /* gfx_v10_0_ring_emit_pipeline_sync */ SOC15_FLUSH_GPU_TLB_NUM_WREG * 5 + SOC15_FLUSH_GPU_TLB_NUM_REG_WAIT * 7 + - 2 + /* gfx_v10_0_ring_emit_vm_flush */ 8 + 8 + 8, /* gfx_v10_0_ring_emit_fence_kiq x3 for user fence, vm fence */ .emit_ib_size = 7, /* gfx_v10_0_ring_emit_ib_compute */ .emit_ib = gfx_v10_0_ring_emit_ib_compute, diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c index 5a5787bfbce7f..1f9f7fdd4b8e0 100644 --- a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c @@ -6157,7 +6157,7 @@ static const struct amdgpu_ring_funcs gfx_v11_0_ring_funcs_gfx = { 7 + /* PIPELINE_SYNC */ SOC15_FLUSH_GPU_TLB_NUM_WREG * 5 + SOC15_FLUSH_GPU_TLB_NUM_REG_WAIT * 7 + - 2 + /* VM_FLUSH */ + 4 + /* VM_FLUSH */ 8 + /* FENCE for VM_FLUSH */ 20 + /* GDS switch */ 5 + /* COND_EXEC */ @@ -6243,7 +6243,6 @@ static const struct amdgpu_ring_funcs gfx_v11_0_ring_funcs_kiq = { 7 + /* gfx_v11_0_ring_emit_pipeline_sync */ SOC15_FLUSH_GPU_TLB_NUM_WREG * 5 + SOC15_FLUSH_GPU_TLB_NUM_REG_WAIT * 7 + - 2 + /* gfx_v11_0_ring_emit_vm_flush */ 8 + 8 + 8, /* gfx_v11_0_ring_emit_fence_kiq x3 for user fence, vm fence */ .emit_ib_size = 7, /* gfx_v11_0_ring_emit_ib_compute */ .emit_ib = gfx_v11_0_ring_emit_ib_compute, diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c index 195b298923543..6a1fe21685149 100644 --- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c @@ -6742,7 +6742,6 @@ static const struct amdgpu_ring_funcs gfx_v9_0_ring_funcs_compute = { 7 + /* gfx_v9_0_ring_emit_pipeline_sync */ SOC15_FLUSH_GPU_TLB_NUM_WREG * 5 + SOC15_FLUSH_GPU_TLB_NUM_REG_WAIT * 7 + - 2 + /* gfx_v9_0_ring_emit_vm_flush */ 8 + 8 + 8 + /* gfx_v9_0_ring_emit_fence x3 for user fence, vm fence */ 7 + /* gfx_v9_0_emit_mem_sync */ 5 + /* gfx_v9_0_emit_wave_limit for updating mmSPI_WCL_PIPE_PERCENT_GFX register */ @@ -6781,7 +6780,6 @@ static const struct amdgpu_ring_funcs gfx_v9_0_ring_funcs_kiq = { 7 + /* gfx_v9_0_ring_emit_pipeline_sync */ SOC15_FLUSH_GPU_TLB_NUM_WREG * 5 + SOC15_FLUSH_GPU_TLB_NUM_REG_WAIT * 7 + - 2 + /* gfx_v9_0_ring_emit_vm_flush */ 8 + 8 + 8, /* gfx_v9_0_ring_emit_fence_kiq x3 for user fence, vm fence */ .emit_ib_size = 7, /* gfx_v9_0_ring_emit_ib_compute */ .emit_fence = gfx_v9_0_ring_emit_fence_kiq, -- 2.43.0
[PATCH AUTOSEL 6.1 07/12] drm/amdgpu: Update BO eviction priorities
From: Felix Kuehling [ Upstream commit b0b13d532105e0e682d95214933bb8483a063184 ] Make SVM BOs more likely to get evicted than other BOs. These BOs opportunistically use available VRAM, but can fall back relatively seamlessly to system memory. It also avoids SVM migrations evicting other, more important BOs as they will evict other SVM allocations first. Signed-off-by: Felix Kuehling Acked-by: Mukul Joshi Tested-by: Mukul Joshi Reviewed-by: Christian König Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin --- drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c index cde2fd2f71171..a5adae8b43d47 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c @@ -585,6 +585,8 @@ int amdgpu_bo_create(struct amdgpu_device *adev, else amdgpu_bo_placement_from_domain(bo, bp->domain); if (bp->type == ttm_bo_type_kernel) + bo->tbo.priority = 2; + else if (!(bp->flags & AMDGPU_GEM_CREATE_DISCARDABLE)) bo->tbo.priority = 1; if (!bp->destroy) -- 2.43.0
[PATCH AUTOSEL 6.1 06/12] drm/amd/display: Set color_mgmt_changed to true on unsuspend
From: Joshua Ashton [ Upstream commit 2eb9dd497a698dc384c0dd3e0311d541eb2e13dd ] Otherwise we can end up with a frame on unsuspend where color management is not applied when userspace has not committed themselves. Fixes re-applying color management on Steam Deck/Gamescope on S3 resume. Signed-off-by: Joshua Ashton Reviewed-by: Harry Wentland Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin --- drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c index ff460c9802eb2..31bae620aeffc 100644 --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c @@ -2964,6 +2964,7 @@ static int dm_resume(void *handle) dc_stream_release(dm_new_crtc_state->stream); dm_new_crtc_state->stream = NULL; } + dm_new_crtc_state->base.color_mgmt_changed = true; } for_each_new_plane_in_state(dm->cached_state, plane, new_plane_state, i) { -- 2.43.0
[PATCH AUTOSEL 6.6 17/19] drm/etnaviv: fix tx clock gating on some GC7000 variants
From: Derek Foreman [ Upstream commit d7a5c9de99b3a9a43dce49f2084eb69b5f6a9752 ] commit 4bce244272513 ("drm/etnaviv: disable tx clock gating for GC7000 rev6203") accidentally applied the fix for i.MX8MN errata ERR050226 to GC2000 instead of GC7000, failing to disable tx clock gating for GC7000 rev 0x6023 as intended. Additional clean-up further propagated this issue, partially breaking the clock gating fixes added for GC7000 rev 6202 in commit 432f51e7deeda ("drm/etnaviv: add clock gating workaround for GC7000 r6202"). Signed-off-by: Derek Foreman Signed-off-by: Lucas Stach Signed-off-by: Sasha Levin --- drivers/gpu/drm/etnaviv/etnaviv_gpu.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gpu.c b/drivers/gpu/drm/etnaviv/etnaviv_gpu.c index 9276756e1397d..371e1f2733f6f 100644 --- a/drivers/gpu/drm/etnaviv/etnaviv_gpu.c +++ b/drivers/gpu/drm/etnaviv/etnaviv_gpu.c @@ -632,8 +632,8 @@ static void etnaviv_gpu_enable_mlcg(struct etnaviv_gpu *gpu) /* Disable TX clock gating on affected core revisions. */ if (etnaviv_is_model_rev(gpu, GC4000, 0x5222) || etnaviv_is_model_rev(gpu, GC2000, 0x5108) || - etnaviv_is_model_rev(gpu, GC2000, 0x6202) || - etnaviv_is_model_rev(gpu, GC2000, 0x6203)) + etnaviv_is_model_rev(gpu, GC7000, 0x6202) || + etnaviv_is_model_rev(gpu, GC7000, 0x6203)) pmc |= VIVS_PM_MODULE_CONTROLS_DISABLE_MODULE_CLOCK_GATING_TX; /* Disable SE and RA clock gating on affected core revisions. */ -- 2.43.0
[PATCH AUTOSEL 6.6 12/19] drm/amdgpu/mes: fix use-after-free issue
From: Jack Xiao [ Upstream commit 948255282074d9367e01908b3f5dcf8c10fc9c3d ] Delete fence fallback timer to fix the ramdom use-after-free issue. v2: move to amdgpu_mes.c Signed-off-by: Jack Xiao Acked-by: Lijo Lazar Acked-by: Christian König Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin --- drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c index 15c67fa404ff9..c5c55e132af21 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c @@ -1098,6 +1098,7 @@ void amdgpu_mes_remove_ring(struct amdgpu_device *adev, return; amdgpu_mes_remove_hw_queue(adev, ring->hw_queue_id); + del_timer_sync(>fence_drv.fallback_timer); amdgpu_ring_fini(ring); kfree(ring); } -- 2.43.0
[PATCH AUTOSEL 6.6 11/19] drm/amdgpu: Fix the ring buffer size for queue VM flush
From: Prike Liang [ Upstream commit fe93b0927bc58cb1d64230f45744e527d9d8482c ] Here are the corrections needed for the queue ring buffer size calculation for the following cases: - Remove the KIQ VM flush ring usage. - Add the invalidate TLBs packet for gfx10 and gfx11 queue. - There's no VM flush and PFP sync, so remove the gfx9 real ring and compute ring buffer usage. Signed-off-by: Prike Liang Reviewed-by: Christian König Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin --- drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 3 +-- drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c | 3 +-- drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 2 -- 3 files changed, 2 insertions(+), 6 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c index 495eb4cad0e1a..3560a3f2c848e 100644 --- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c @@ -9157,7 +9157,7 @@ static const struct amdgpu_ring_funcs gfx_v10_0_ring_funcs_gfx = { 7 + /* PIPELINE_SYNC */ SOC15_FLUSH_GPU_TLB_NUM_WREG * 5 + SOC15_FLUSH_GPU_TLB_NUM_REG_WAIT * 7 + - 2 + /* VM_FLUSH */ + 4 + /* VM_FLUSH */ 8 + /* FENCE for VM_FLUSH */ 20 + /* GDS switch */ 4 + /* double SWITCH_BUFFER, @@ -9248,7 +9248,6 @@ static const struct amdgpu_ring_funcs gfx_v10_0_ring_funcs_kiq = { 7 + /* gfx_v10_0_ring_emit_pipeline_sync */ SOC15_FLUSH_GPU_TLB_NUM_WREG * 5 + SOC15_FLUSH_GPU_TLB_NUM_REG_WAIT * 7 + - 2 + /* gfx_v10_0_ring_emit_vm_flush */ 8 + 8 + 8, /* gfx_v10_0_ring_emit_fence_kiq x3 for user fence, vm fence */ .emit_ib_size = 7, /* gfx_v10_0_ring_emit_ib_compute */ .emit_ib = gfx_v10_0_ring_emit_ib_compute, diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c index c9058d58c95a7..daab4c7a073ac 100644 --- a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c @@ -6102,7 +6102,7 @@ static const struct amdgpu_ring_funcs gfx_v11_0_ring_funcs_gfx = { 7 + /* PIPELINE_SYNC */ SOC15_FLUSH_GPU_TLB_NUM_WREG * 5 + SOC15_FLUSH_GPU_TLB_NUM_REG_WAIT * 7 + - 2 + /* VM_FLUSH */ + 4 + /* VM_FLUSH */ 8 + /* FENCE for VM_FLUSH */ 20 + /* GDS switch */ 5 + /* COND_EXEC */ @@ -6187,7 +6187,6 @@ static const struct amdgpu_ring_funcs gfx_v11_0_ring_funcs_kiq = { 7 + /* gfx_v11_0_ring_emit_pipeline_sync */ SOC15_FLUSH_GPU_TLB_NUM_WREG * 5 + SOC15_FLUSH_GPU_TLB_NUM_REG_WAIT * 7 + - 2 + /* gfx_v11_0_ring_emit_vm_flush */ 8 + 8 + 8, /* gfx_v11_0_ring_emit_fence_kiq x3 for user fence, vm fence */ .emit_ib_size = 7, /* gfx_v11_0_ring_emit_ib_compute */ .emit_ib = gfx_v11_0_ring_emit_ib_compute, diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c index d7d15b618c374..8168836a08d2e 100644 --- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c @@ -6988,7 +6988,6 @@ static const struct amdgpu_ring_funcs gfx_v9_0_ring_funcs_compute = { 7 + /* gfx_v9_0_ring_emit_pipeline_sync */ SOC15_FLUSH_GPU_TLB_NUM_WREG * 5 + SOC15_FLUSH_GPU_TLB_NUM_REG_WAIT * 7 + - 2 + /* gfx_v9_0_ring_emit_vm_flush */ 8 + 8 + 8 + /* gfx_v9_0_ring_emit_fence x3 for user fence, vm fence */ 7 + /* gfx_v9_0_emit_mem_sync */ 5 + /* gfx_v9_0_emit_wave_limit for updating mmSPI_WCL_PIPE_PERCENT_GFX register */ @@ -7026,7 +7025,6 @@ static const struct amdgpu_ring_funcs gfx_v9_0_ring_funcs_kiq = { 7 + /* gfx_v9_0_ring_emit_pipeline_sync */ SOC15_FLUSH_GPU_TLB_NUM_WREG * 5 + SOC15_FLUSH_GPU_TLB_NUM_REG_WAIT * 7 + - 2 + /* gfx_v9_0_ring_emit_vm_flush */ 8 + 8 + 8, /* gfx_v9_0_ring_emit_fence_kiq x3 for user fence, vm fence */ .emit_ib_size = 7, /* gfx_v9_0_ring_emit_ib_compute */ .emit_fence = gfx_v9_0_ring_emit_fence_kiq, -- 2.43.0
[PATCH AUTOSEL 6.6 10/19] drm/amdkfd: Add VRAM accounting for SVM migration
From: Mukul Joshi [ Upstream commit 1e214f7faaf5d842754cd5cfcd76308bfedab3b5 ] Do VRAM accounting when doing migrations to vram to make sure there is enough available VRAM and migrating to VRAM doesn't evict other possible non-unified memory BOs. If migrating to VRAM fails, driver can fall back to using system memory seamlessly. Signed-off-by: Mukul Joshi Reviewed-by: Felix Kuehling Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin --- drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 16 +++- drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 2 +- 2 files changed, 16 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c index 659313648b200..3263b5fa182d2 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c @@ -516,10 +516,19 @@ svm_migrate_ram_to_vram(struct svm_range *prange, uint32_t best_loc, start = prange->start << PAGE_SHIFT; end = (prange->last + 1) << PAGE_SHIFT; + r = amdgpu_amdkfd_reserve_mem_limit(node->adev, + prange->npages * PAGE_SIZE, + KFD_IOC_ALLOC_MEM_FLAGS_VRAM, + node->xcp ? node->xcp->id : 0); + if (r) { + dev_dbg(node->adev->dev, "failed to reserve VRAM, r: %ld\n", r); + return -ENOSPC; + } + r = svm_range_vram_node_new(node, prange, true); if (r) { dev_dbg(node->adev->dev, "fail %ld to alloc vram\n", r); - return r; + goto out; } ttm_res_offset = prange->offset << PAGE_SHIFT; @@ -549,6 +558,11 @@ svm_migrate_ram_to_vram(struct svm_range *prange, uint32_t best_loc, svm_range_vram_node_free(prange); } +out: + amdgpu_amdkfd_unreserve_mem_limit(node->adev, + prange->npages * PAGE_SIZE, + KFD_IOC_ALLOC_MEM_FLAGS_VRAM, + node->xcp ? node->xcp->id : 0); return r < 0 ? r : 0; } diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c index 87e9ca65e58e0..ce76d45549984 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c @@ -3416,7 +3416,7 @@ svm_range_trigger_migration(struct mm_struct *mm, struct svm_range *prange, r = svm_migrate_to_vram(prange, best_loc, mm, KFD_MIGRATE_TRIGGER_PREFETCH); *migrated = !r; - return r; + return 0; } int svm_range_schedule_evict_svm_bo(struct amdgpu_amdkfd_fence *fence) -- 2.43.0
[PATCH AUTOSEL 6.6 09/19] drm/amd/pm: Restore config space after reset
From: Lijo Lazar [ Upstream commit 30d1cda8ce31ab49051ff7159280c542a738b23d ] During mode-2 reset, pci config space registers are affected at device side. However, certain platforms have switches which assign virtual BAR addresses and returns the same even after device is reset. This affects pci_restore_state() as it doesn't issue another config write, if the value read is same as the saved value. Add a workaround to write saved config space values from driver side. Presently, these switches are in platforms with SMU v13.0.6 SOCs, hence restrict the workaround only to those. Signed-off-by: Lijo Lazar Reviewed-by: Asad Kamal Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin --- .../drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c | 25 +++ 1 file changed, 25 insertions(+) diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c index 6a28f8d5bff7d..be4b7b64f8785 100644 --- a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c +++ b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c @@ -2039,6 +2039,17 @@ static ssize_t smu_v13_0_6_get_gpu_metrics(struct smu_context *smu, void **table return sizeof(struct gpu_metrics_v1_3); } +static void smu_v13_0_6_restore_pci_config(struct smu_context *smu) +{ + struct amdgpu_device *adev = smu->adev; + int i; + + for (i = 0; i < 16; i++) + pci_write_config_dword(adev->pdev, i * 4, + adev->pdev->saved_config_space[i]); + pci_restore_msi_state(adev->pdev); +} + static int smu_v13_0_6_mode2_reset(struct smu_context *smu) { int ret = 0, index; @@ -2060,6 +2071,20 @@ static int smu_v13_0_6_mode2_reset(struct smu_context *smu) /* Restore the config space saved during init */ amdgpu_device_load_pci_state(adev->pdev); + /* Certain platforms have switches which assign virtual BAR values to +* devices. OS uses the virtual BAR values and device behind the switch +* is assgined another BAR value. When device's config space registers +* are queried, switch returns the virtual BAR values. When mode-2 reset +* is performed, switch is unaware of it, and will continue to return +* the same virtual values to the OS.This affects +* pci_restore_config_space() API as it doesn't write the value saved if +* the current value read from config space is the same as what is +* saved. As a workaround, make sure the config space is restored +* always. +*/ + if (!(adev->flags & AMD_IS_APU)) + smu_v13_0_6_restore_pci_config(smu); + dev_dbg(smu->adev->dev, "wait for reset ack\n"); do { ret = smu_cmn_wait_for_response(smu); -- 2.43.0
[PATCH AUTOSEL 6.6 08/19] drm/amdgpu: Update BO eviction priorities
From: Felix Kuehling [ Upstream commit b0b13d532105e0e682d95214933bb8483a063184 ] Make SVM BOs more likely to get evicted than other BOs. These BOs opportunistically use available VRAM, but can fall back relatively seamlessly to system memory. It also avoids SVM migrations evicting other, more important BOs as they will evict other SVM allocations first. Signed-off-by: Felix Kuehling Acked-by: Mukul Joshi Tested-by: Mukul Joshi Reviewed-by: Christian König Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin --- drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c index 361f2cc94e8e5..1e33e82531f58 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c @@ -613,6 +613,8 @@ int amdgpu_bo_create(struct amdgpu_device *adev, else amdgpu_bo_placement_from_domain(bo, bp->domain); if (bp->type == ttm_bo_type_kernel) + bo->tbo.priority = 2; + else if (!(bp->flags & AMDGPU_GEM_CREATE_DISCARDABLE)) bo->tbo.priority = 1; if (!bp->destroy) -- 2.43.0
[PATCH AUTOSEL 6.6 07/19] drm/amd/display: Set color_mgmt_changed to true on unsuspend
From: Joshua Ashton [ Upstream commit 2eb9dd497a698dc384c0dd3e0311d541eb2e13dd ] Otherwise we can end up with a frame on unsuspend where color management is not applied when userspace has not committed themselves. Fixes re-applying color management on Steam Deck/Gamescope on S3 resume. Signed-off-by: Joshua Ashton Reviewed-by: Harry Wentland Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin --- drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c index 3442e08f47876..dce9a4599174c 100644 --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c @@ -2956,6 +2956,7 @@ static int dm_resume(void *handle) dc_stream_release(dm_new_crtc_state->stream); dm_new_crtc_state->stream = NULL; } + dm_new_crtc_state->base.color_mgmt_changed = true; } for_each_new_plane_in_state(dm->cached_state, plane, new_plane_state, i) { -- 2.43.0
[PATCH AUTOSEL 6.8 19/23] drm/etnaviv: fix tx clock gating on some GC7000 variants
From: Derek Foreman [ Upstream commit d7a5c9de99b3a9a43dce49f2084eb69b5f6a9752 ] commit 4bce244272513 ("drm/etnaviv: disable tx clock gating for GC7000 rev6203") accidentally applied the fix for i.MX8MN errata ERR050226 to GC2000 instead of GC7000, failing to disable tx clock gating for GC7000 rev 0x6023 as intended. Additional clean-up further propagated this issue, partially breaking the clock gating fixes added for GC7000 rev 6202 in commit 432f51e7deeda ("drm/etnaviv: add clock gating workaround for GC7000 r6202"). Signed-off-by: Derek Foreman Signed-off-by: Lucas Stach Signed-off-by: Sasha Levin --- drivers/gpu/drm/etnaviv/etnaviv_gpu.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gpu.c b/drivers/gpu/drm/etnaviv/etnaviv_gpu.c index 9b8445d2a128f..89cb6799b547f 100644 --- a/drivers/gpu/drm/etnaviv/etnaviv_gpu.c +++ b/drivers/gpu/drm/etnaviv/etnaviv_gpu.c @@ -632,8 +632,8 @@ static void etnaviv_gpu_enable_mlcg(struct etnaviv_gpu *gpu) /* Disable TX clock gating on affected core revisions. */ if (etnaviv_is_model_rev(gpu, GC4000, 0x5222) || etnaviv_is_model_rev(gpu, GC2000, 0x5108) || - etnaviv_is_model_rev(gpu, GC2000, 0x6202) || - etnaviv_is_model_rev(gpu, GC2000, 0x6203)) + etnaviv_is_model_rev(gpu, GC7000, 0x6202) || + etnaviv_is_model_rev(gpu, GC7000, 0x6203)) pmc |= VIVS_PM_MODULE_CONTROLS_DISABLE_MODULE_CLOCK_GATING_TX; /* Disable SE and RA clock gating on affected core revisions. */ -- 2.43.0
[PATCH AUTOSEL 6.8 14/23] drm/amdgpu/mes: fix use-after-free issue
From: Jack Xiao [ Upstream commit 948255282074d9367e01908b3f5dcf8c10fc9c3d ] Delete fence fallback timer to fix the ramdom use-after-free issue. v2: move to amdgpu_mes.c Signed-off-by: Jack Xiao Acked-by: Lijo Lazar Acked-by: Christian König Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin --- drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c index da48b6da01072..420e5bc44e306 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c @@ -1129,6 +1129,7 @@ void amdgpu_mes_remove_ring(struct amdgpu_device *adev, return; amdgpu_mes_remove_hw_queue(adev, ring->hw_queue_id); + del_timer_sync(>fence_drv.fallback_timer); amdgpu_ring_fini(ring); kfree(ring); } -- 2.43.0
[PATCH AUTOSEL 6.8 13/23] drm/amdgpu: Fix the ring buffer size for queue VM flush
From: Prike Liang [ Upstream commit fe93b0927bc58cb1d64230f45744e527d9d8482c ] Here are the corrections needed for the queue ring buffer size calculation for the following cases: - Remove the KIQ VM flush ring usage. - Add the invalidate TLBs packet for gfx10 and gfx11 queue. - There's no VM flush and PFP sync, so remove the gfx9 real ring and compute ring buffer usage. Signed-off-by: Prike Liang Reviewed-by: Christian König Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin --- drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 3 +-- drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c | 3 +-- drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 2 -- 3 files changed, 2 insertions(+), 6 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c index dcdecb18b2306..42392a97daff2 100644 --- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c @@ -9194,7 +9194,7 @@ static const struct amdgpu_ring_funcs gfx_v10_0_ring_funcs_gfx = { 7 + /* PIPELINE_SYNC */ SOC15_FLUSH_GPU_TLB_NUM_WREG * 5 + SOC15_FLUSH_GPU_TLB_NUM_REG_WAIT * 7 + - 2 + /* VM_FLUSH */ + 4 + /* VM_FLUSH */ 8 + /* FENCE for VM_FLUSH */ 20 + /* GDS switch */ 4 + /* double SWITCH_BUFFER, @@ -9285,7 +9285,6 @@ static const struct amdgpu_ring_funcs gfx_v10_0_ring_funcs_kiq = { 7 + /* gfx_v10_0_ring_emit_pipeline_sync */ SOC15_FLUSH_GPU_TLB_NUM_WREG * 5 + SOC15_FLUSH_GPU_TLB_NUM_REG_WAIT * 7 + - 2 + /* gfx_v10_0_ring_emit_vm_flush */ 8 + 8 + 8, /* gfx_v10_0_ring_emit_fence_kiq x3 for user fence, vm fence */ .emit_ib_size = 7, /* gfx_v10_0_ring_emit_ib_compute */ .emit_ib = gfx_v10_0_ring_emit_ib_compute, diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c index 0afe86bcc932b..6a6fc422e44da 100644 --- a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c @@ -6110,7 +6110,7 @@ static const struct amdgpu_ring_funcs gfx_v11_0_ring_funcs_gfx = { 7 + /* PIPELINE_SYNC */ SOC15_FLUSH_GPU_TLB_NUM_WREG * 5 + SOC15_FLUSH_GPU_TLB_NUM_REG_WAIT * 7 + - 2 + /* VM_FLUSH */ + 4 + /* VM_FLUSH */ 8 + /* FENCE for VM_FLUSH */ 20 + /* GDS switch */ 5 + /* COND_EXEC */ @@ -6195,7 +6195,6 @@ static const struct amdgpu_ring_funcs gfx_v11_0_ring_funcs_kiq = { 7 + /* gfx_v11_0_ring_emit_pipeline_sync */ SOC15_FLUSH_GPU_TLB_NUM_WREG * 5 + SOC15_FLUSH_GPU_TLB_NUM_REG_WAIT * 7 + - 2 + /* gfx_v11_0_ring_emit_vm_flush */ 8 + 8 + 8, /* gfx_v11_0_ring_emit_fence_kiq x3 for user fence, vm fence */ .emit_ib_size = 7, /* gfx_v11_0_ring_emit_ib_compute */ .emit_ib = gfx_v11_0_ring_emit_ib_compute, diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c index 3bc6943365a4f..153932c1f64f3 100644 --- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c @@ -6991,7 +6991,6 @@ static const struct amdgpu_ring_funcs gfx_v9_0_ring_funcs_compute = { 7 + /* gfx_v9_0_ring_emit_pipeline_sync */ SOC15_FLUSH_GPU_TLB_NUM_WREG * 5 + SOC15_FLUSH_GPU_TLB_NUM_REG_WAIT * 7 + - 2 + /* gfx_v9_0_ring_emit_vm_flush */ 8 + 8 + 8 + /* gfx_v9_0_ring_emit_fence x3 for user fence, vm fence */ 7 + /* gfx_v9_0_emit_mem_sync */ 5 + /* gfx_v9_0_emit_wave_limit for updating mmSPI_WCL_PIPE_PERCENT_GFX register */ @@ -7029,7 +7028,6 @@ static const struct amdgpu_ring_funcs gfx_v9_0_ring_funcs_kiq = { 7 + /* gfx_v9_0_ring_emit_pipeline_sync */ SOC15_FLUSH_GPU_TLB_NUM_WREG * 5 + SOC15_FLUSH_GPU_TLB_NUM_REG_WAIT * 7 + - 2 + /* gfx_v9_0_ring_emit_vm_flush */ 8 + 8 + 8, /* gfx_v9_0_ring_emit_fence_kiq x3 for user fence, vm fence */ .emit_ib_size = 7, /* gfx_v9_0_ring_emit_ib_compute */ .emit_fence = gfx_v9_0_ring_emit_fence_kiq, -- 2.43.0
[PATCH AUTOSEL 6.8 11/23] drm/amd/pm: Restore config space after reset
From: Lijo Lazar [ Upstream commit 30d1cda8ce31ab49051ff7159280c542a738b23d ] During mode-2 reset, pci config space registers are affected at device side. However, certain platforms have switches which assign virtual BAR addresses and returns the same even after device is reset. This affects pci_restore_state() as it doesn't issue another config write, if the value read is same as the saved value. Add a workaround to write saved config space values from driver side. Presently, these switches are in platforms with SMU v13.0.6 SOCs, hence restrict the workaround only to those. Signed-off-by: Lijo Lazar Reviewed-by: Asad Kamal Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin --- .../drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c | 25 +++ 1 file changed, 25 insertions(+) diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c index 78491b04df108..ddb11eb8c3f53 100644 --- a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c +++ b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c @@ -2205,6 +2205,17 @@ static ssize_t smu_v13_0_6_get_gpu_metrics(struct smu_context *smu, void **table return sizeof(*gpu_metrics); } +static void smu_v13_0_6_restore_pci_config(struct smu_context *smu) +{ + struct amdgpu_device *adev = smu->adev; + int i; + + for (i = 0; i < 16; i++) + pci_write_config_dword(adev->pdev, i * 4, + adev->pdev->saved_config_space[i]); + pci_restore_msi_state(adev->pdev); +} + static int smu_v13_0_6_mode2_reset(struct smu_context *smu) { int ret = 0, index; @@ -2226,6 +2237,20 @@ static int smu_v13_0_6_mode2_reset(struct smu_context *smu) /* Restore the config space saved during init */ amdgpu_device_load_pci_state(adev->pdev); + /* Certain platforms have switches which assign virtual BAR values to +* devices. OS uses the virtual BAR values and device behind the switch +* is assgined another BAR value. When device's config space registers +* are queried, switch returns the virtual BAR values. When mode-2 reset +* is performed, switch is unaware of it, and will continue to return +* the same virtual values to the OS.This affects +* pci_restore_config_space() API as it doesn't write the value saved if +* the current value read from config space is the same as what is +* saved. As a workaround, make sure the config space is restored +* always. +*/ + if (!(adev->flags & AMD_IS_APU)) + smu_v13_0_6_restore_pci_config(smu); + dev_dbg(smu->adev->dev, "wait for reset ack\n"); do { ret = smu_cmn_wait_for_response(smu); -- 2.43.0
[PATCH AUTOSEL 6.8 12/23] drm/amdkfd: Add VRAM accounting for SVM migration
From: Mukul Joshi [ Upstream commit 1e214f7faaf5d842754cd5cfcd76308bfedab3b5 ] Do VRAM accounting when doing migrations to vram to make sure there is enough available VRAM and migrating to VRAM doesn't evict other possible non-unified memory BOs. If migrating to VRAM fails, driver can fall back to using system memory seamlessly. Signed-off-by: Mukul Joshi Reviewed-by: Felix Kuehling Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin --- drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 16 +++- drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 2 +- 2 files changed, 16 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c index bdc01ca9609a7..5c8d81bfce7ab 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c @@ -509,10 +509,19 @@ svm_migrate_ram_to_vram(struct svm_range *prange, uint32_t best_loc, start = start_mgr << PAGE_SHIFT; end = (last_mgr + 1) << PAGE_SHIFT; + r = amdgpu_amdkfd_reserve_mem_limit(node->adev, + prange->npages * PAGE_SIZE, + KFD_IOC_ALLOC_MEM_FLAGS_VRAM, + node->xcp ? node->xcp->id : 0); + if (r) { + dev_dbg(node->adev->dev, "failed to reserve VRAM, r: %ld\n", r); + return -ENOSPC; + } + r = svm_range_vram_node_new(node, prange, true); if (r) { dev_dbg(node->adev->dev, "fail %ld to alloc vram\n", r); - return r; + goto out; } ttm_res_offset = (start_mgr - prange->start + prange->offset) << PAGE_SHIFT; @@ -545,6 +554,11 @@ svm_migrate_ram_to_vram(struct svm_range *prange, uint32_t best_loc, svm_range_vram_node_free(prange); } +out: + amdgpu_amdkfd_unreserve_mem_limit(node->adev, + prange->npages * PAGE_SIZE, + KFD_IOC_ALLOC_MEM_FLAGS_VRAM, + node->xcp ? node->xcp->id : 0); return r < 0 ? r : 0; } diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c index c50a0dc9c9c07..33205078202b5 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c @@ -3424,7 +3424,7 @@ svm_range_trigger_migration(struct mm_struct *mm, struct svm_range *prange, mm, KFD_MIGRATE_TRIGGER_PREFETCH); *migrated = !r; - return r; + return 0; } int svm_range_schedule_evict_svm_bo(struct amdgpu_amdkfd_fence *fence) -- 2.43.0
[PATCH AUTOSEL 6.8 10/23] drm/amdgpu: Update BO eviction priorities
From: Felix Kuehling [ Upstream commit b0b13d532105e0e682d95214933bb8483a063184 ] Make SVM BOs more likely to get evicted than other BOs. These BOs opportunistically use available VRAM, but can fall back relatively seamlessly to system memory. It also avoids SVM migrations evicting other, more important BOs as they will evict other SVM allocations first. Signed-off-by: Felix Kuehling Acked-by: Mukul Joshi Tested-by: Mukul Joshi Reviewed-by: Christian König Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin --- drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c index 866bfde1ca6f9..e7deb13ca4090 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c @@ -608,6 +608,8 @@ int amdgpu_bo_create(struct amdgpu_device *adev, else amdgpu_bo_placement_from_domain(bo, bp->domain); if (bp->type == ttm_bo_type_kernel) + bo->tbo.priority = 2; + else if (!(bp->flags & AMDGPU_GEM_CREATE_DISCARDABLE)) bo->tbo.priority = 1; if (!bp->destroy) -- 2.43.0
[PATCH AUTOSEL 6.8 09/23] drm/amd/display: Set color_mgmt_changed to true on unsuspend
From: Joshua Ashton [ Upstream commit 2eb9dd497a698dc384c0dd3e0311d541eb2e13dd ] Otherwise we can end up with a frame on unsuspend where color management is not applied when userspace has not committed themselves. Fixes re-applying color management on Steam Deck/Gamescope on S3 resume. Signed-off-by: Joshua Ashton Reviewed-by: Harry Wentland Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin --- drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c index 718e533ab46dd..f33b7f09c3f3d 100644 --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c @@ -3007,6 +3007,7 @@ static int dm_resume(void *handle) dc_stream_release(dm_new_crtc_state->stream); dm_new_crtc_state->stream = NULL; } + dm_new_crtc_state->base.color_mgmt_changed = true; } for_each_new_plane_in_state(dm->cached_state, plane, new_plane_state, i) { -- 2.43.0
[PATCH v3 6/6] drm/xe/client: Print runtime to fdinfo
Print the accumulated runtime for client when printing fdinfo. Each time a query is done it first does 2 things: 1) loop through all the exec queues for the current client and accumulate the runtime, per engine class. CTX_TIMESTAMP is used for that, being read from the context image. 2) Read a "GPU timestamp" that can be used for considering "how much GPU time has passed" and that has the same unit/refclock as the one recording the runtime. RING_TIMESTAMP is used for that via MMIO. Since for all current platforms RING_TIMESTAMP follows the same refclock, just read it once, using any first engine. This is exported to userspace as 2 numbers in fdinfo: drm-cycles-: drm-total-cycles-: Userspace is expected to collect at least 2 samples, which allows to know the client engine busyness as per: RUNTIME1 - RUNTIME0 busyness = - T1 - T0 Another thing to point out is that it's expected that userspace will read any 2 samples every few seconds. Given the update frequency of the counters involved and that CTX_TIMESTAMP is 32-bits, the counter for each exec_queue can wrap around (assuming 100% utilization) after ~200s. The wraparound is not perceived by userspace since it's just accumulated for all the exec_queues in a 64-bit counter), but the measurement will not be accurate if the samples are too far apart. This could be mitigated by adding a workqueue to accumulate the counters every so often, but it's additional complexity for something that is done already by userspace every few seconds in tools like gputop (from igt), htop, nvtop, etc with none of them really defaulting to 1 sample per minute or more. Signed-off-by: Lucas De Marchi --- Documentation/gpu/drm-usage-stats.rst | 16 ++- Documentation/gpu/xe/index.rst | 1 + Documentation/gpu/xe/xe-drm-usage-stats.rst | 10 ++ drivers/gpu/drm/xe/xe_drm_client.c | 136 +++- 4 files changed, 160 insertions(+), 3 deletions(-) create mode 100644 Documentation/gpu/xe/xe-drm-usage-stats.rst diff --git a/Documentation/gpu/drm-usage-stats.rst b/Documentation/gpu/drm-usage-stats.rst index 6dc299343b48..421766289b78 100644 --- a/Documentation/gpu/drm-usage-stats.rst +++ b/Documentation/gpu/drm-usage-stats.rst @@ -112,6 +112,17 @@ larger value within a reasonable period. Upon observing a value lower than what was previously read, userspace is expected to stay with that larger previous value until a monotonic update is seen. +- drm-total-cycles-: + +Engine identifier string must be the same as the one specified in the +drm-cycles- tag and shall contain the total number cycles for the given +engine. + +This is a timestamp in GPU unspecified unit that matches the update rate +of drm-cycles-. For drivers that implement this interface, the engine +utilization can be calculated entirely on the GPU clock domain, without +considering the CPU sleep time between 2 samples. + - drm-maxfreq-: [Hz|MHz|KHz] Engine identifier string must be the same as the one specified in the @@ -168,5 +179,6 @@ be documented above and where possible, aligned with other drivers. Driver specific implementations --- -:ref:`i915-usage-stats` -:ref:`panfrost-usage-stats` +* :ref:`i915-usage-stats` +* :ref:`panfrost-usage-stats` +* :ref:`xe-usage-stats` diff --git a/Documentation/gpu/xe/index.rst b/Documentation/gpu/xe/index.rst index c224ecaee81e..3f07aa3b5432 100644 --- a/Documentation/gpu/xe/index.rst +++ b/Documentation/gpu/xe/index.rst @@ -23,3 +23,4 @@ DG2, etc is provided to prototype the driver. xe_firmware xe_tile xe_debugging + xe-drm-usage-stats.rst diff --git a/Documentation/gpu/xe/xe-drm-usage-stats.rst b/Documentation/gpu/xe/xe-drm-usage-stats.rst new file mode 100644 index ..482d503ae68a --- /dev/null +++ b/Documentation/gpu/xe/xe-drm-usage-stats.rst @@ -0,0 +1,10 @@ +.. SPDX-License-Identifier: GPL-2.0+ + +.. _xe-usage-stats: + + +Xe DRM client usage stats implementation + + +.. kernel-doc:: drivers/gpu/drm/xe/xe_drm_client.c + :doc: DRM Client usage stats diff --git a/drivers/gpu/drm/xe/xe_drm_client.c b/drivers/gpu/drm/xe/xe_drm_client.c index 08f0b7c95901..27494839d586 100644 --- a/drivers/gpu/drm/xe/xe_drm_client.c +++ b/drivers/gpu/drm/xe/xe_drm_client.c @@ -2,6 +2,7 @@ /* * Copyright © 2023 Intel Corporation */ +#include "xe_drm_client.h" #include #include @@ -12,9 +13,65 @@ #include "xe_bo.h" #include "xe_bo_types.h" #include "xe_device_types.h" -#include "xe_drm_client.h" +#include "xe_exec_queue.h" +#include "xe_gt.h" +#include "xe_hw_engine.h" +#include "xe_pm.h" #include "xe_trace.h" +/** + * DOC: DRM Client usage stats + * + * The drm/xe driver implements the DRM client usage stats specification as + * documented in :ref:`drm-client-usage-stats`. + * + *
[PATCH v3 3/6] drm/xe/lrc: Add helper to capture context timestamp
From: Umesh Nerlige Ramappa Add a helper to capture CTX_TIMESTAMP from the context image so it can be used to calculate the runtime. v2: Add kernel-doc to clarify expectation from caller Signed-off-by: Umesh Nerlige Ramappa Signed-off-by: Lucas De Marchi --- drivers/gpu/drm/xe/regs/xe_lrc_layout.h | 1 + drivers/gpu/drm/xe/xe_lrc.c | 11 +++ drivers/gpu/drm/xe/xe_lrc.h | 14 ++ drivers/gpu/drm/xe/xe_lrc_types.h | 3 +++ 4 files changed, 29 insertions(+) diff --git a/drivers/gpu/drm/xe/regs/xe_lrc_layout.h b/drivers/gpu/drm/xe/regs/xe_lrc_layout.h index 1825d8f79db6..8780e6c6b649 100644 --- a/drivers/gpu/drm/xe/regs/xe_lrc_layout.h +++ b/drivers/gpu/drm/xe/regs/xe_lrc_layout.h @@ -11,6 +11,7 @@ #define CTX_RING_TAIL (0x06 + 1) #define CTX_RING_START (0x08 + 1) #define CTX_RING_CTL (0x0a + 1) +#define CTX_TIMESTAMP (0x22 + 1) #define CTX_PDP0_UDW (0x30 + 1) #define CTX_PDP0_LDW (0x32 + 1) diff --git a/drivers/gpu/drm/xe/xe_lrc.c b/drivers/gpu/drm/xe/xe_lrc.c index 2066d34ddf0b..8c540f2f43e3 100644 --- a/drivers/gpu/drm/xe/xe_lrc.c +++ b/drivers/gpu/drm/xe/xe_lrc.c @@ -751,6 +751,7 @@ int xe_lrc_init(struct xe_lrc *lrc, struct xe_hw_engine *hwe, lrc->tile = gt_to_tile(hwe->gt); lrc->ring.size = ring_size; lrc->ring.tail = 0; + lrc->ctx_timestamp = 0; xe_hw_fence_ctx_init(>fence_ctx, hwe->gt, hwe->fence_irq, hwe->name); @@ -786,6 +787,7 @@ int xe_lrc_init(struct xe_lrc *lrc, struct xe_hw_engine *hwe, xe_drm_client_add_bo(vm->xef->client, lrc->bo); } + xe_lrc_write_ctx_reg(lrc, CTX_TIMESTAMP, 0); xe_lrc_write_ctx_reg(lrc, CTX_RING_START, __xe_lrc_ring_ggtt_addr(lrc)); xe_lrc_write_ctx_reg(lrc, CTX_RING_HEAD, 0); xe_lrc_write_ctx_reg(lrc, CTX_RING_TAIL, lrc->ring.tail); @@ -1444,3 +1446,12 @@ void xe_lrc_snapshot_free(struct xe_lrc_snapshot *snapshot) xe_bo_put(snapshot->lrc_bo); kfree(snapshot); } + +u32 xe_lrc_update_timestamp(struct xe_lrc *lrc, u32 *old_ts) +{ + *old_ts = lrc->ctx_timestamp; + + lrc->ctx_timestamp = xe_lrc_read_ctx_reg(lrc, CTX_TIMESTAMP); + + return lrc->ctx_timestamp; +} diff --git a/drivers/gpu/drm/xe/xe_lrc.h b/drivers/gpu/drm/xe/xe_lrc.h index d32fa31faa2c..dcbc6edd80da 100644 --- a/drivers/gpu/drm/xe/xe_lrc.h +++ b/drivers/gpu/drm/xe/xe_lrc.h @@ -60,4 +60,18 @@ void xe_lrc_snapshot_capture_delayed(struct xe_lrc_snapshot *snapshot); void xe_lrc_snapshot_print(struct xe_lrc_snapshot *snapshot, struct drm_printer *p); void xe_lrc_snapshot_free(struct xe_lrc_snapshot *snapshot); +/** + * xe_lrc_update_timestamp - readout LRC timestamp and update cached value + * @lrc: logical ring context for this exec queue + * @old_ts: pointer where to save the previous timestamp + * + * Read the current timestamp for this LRC and update the cached value. The + * previous cached value is also returned in @old_ts so the caller can calculate + * the delta between 2 updates. Note that this is not intended to be called from + * any place, but just by the paths updating the drm client utilization. + * + * Returns the current LRC timestamp + */ +u32 xe_lrc_update_timestamp(struct xe_lrc *lrc, u32 *old_ts); + #endif diff --git a/drivers/gpu/drm/xe/xe_lrc_types.h b/drivers/gpu/drm/xe/xe_lrc_types.h index b716df0dfb4e..5765d771b901 100644 --- a/drivers/gpu/drm/xe/xe_lrc_types.h +++ b/drivers/gpu/drm/xe/xe_lrc_types.h @@ -41,6 +41,9 @@ struct xe_lrc { /** @fence_ctx: context for hw fence */ struct xe_hw_fence_ctx fence_ctx; + + /** @ctx_timestamp: readout value of CTX_TIMESTAMP on last update */ + u32 ctx_timestamp; }; struct xe_lrc_snapshot; -- 2.43.0
[PATCH v3 0/6] drm/xe: Per client usage
Add per-client usage statistics to xe. This ports xe to use the common method in drm to export the usage to userspace per client (where 1 client == 1 drm fd open). However instead of using the current format measured in nsec, this creates a new one. The intention here is not to mix the GPU clock domain with the CPU clock. It allows to cover a few more use cases without extra complications. I tested this on DG2 and also checked gputop with i915 to make sure not regressed. Last patch also contains the documentation for the new key and sample output as requested in v1. Reproducing it partially here: - drm-total-cycles-: Engine identifier string must be the same as the one specified in the drm-cycles- tag and shall contain the total number cycles for the given engine. This is a timestamp in GPU unspecified unit that matches the update rate of drm-cycles-. For drivers that implement this interface, the engine utilization can be calculated entirely on the GPU clock domain, without considering the CPU sleep time between 2 samples. The pre-existent drm-cycles- is used as is, which allows gputop to work with xe. This last patch still has some open discussion from v2, so we may need to hold it a little more. Test-with: 20240504064643.25863-1-lucas.demar...@intel.com v2: - Create a new drm-total-cycles instead of re-using drm-engine with a different unit - Add documentation for the new interface and clarify usage of xe_lrc_update_timestamp() v3: - Fix bugs in "drm/xe: Add helper to accumulate exec queue runtime" - see commit message - Reorder commits so the ones that are useful in other patch series come first Lucas De Marchi (4): drm/xe: Promote xe_hw_engine_class_to_str() drm/xe: Add XE_ENGINE_CLASS_OTHER to str conversion drm/xe: Add helper to capture engine timestamp drm/xe/client: Print runtime to fdinfo Umesh Nerlige Ramappa (2): drm/xe/lrc: Add helper to capture context timestamp drm/xe: Add helper to accumulate exec queue runtime Documentation/gpu/drm-usage-stats.rst | 16 ++- Documentation/gpu/xe/index.rst| 1 + Documentation/gpu/xe/xe-drm-usage-stats.rst | 10 ++ drivers/gpu/drm/xe/regs/xe_lrc_layout.h | 1 + drivers/gpu/drm/xe/xe_device_types.h | 9 ++ drivers/gpu/drm/xe/xe_drm_client.c| 136 +- drivers/gpu/drm/xe/xe_exec_queue.c| 35 + drivers/gpu/drm/xe/xe_exec_queue.h| 1 + drivers/gpu/drm/xe/xe_execlist.c | 1 + drivers/gpu/drm/xe/xe_guc_submit.c| 2 + drivers/gpu/drm/xe/xe_hw_engine.c | 27 drivers/gpu/drm/xe/xe_hw_engine.h | 3 + drivers/gpu/drm/xe/xe_hw_engine_class_sysfs.c | 18 --- drivers/gpu/drm/xe/xe_lrc.c | 11 ++ drivers/gpu/drm/xe/xe_lrc.h | 14 ++ drivers/gpu/drm/xe/xe_lrc_types.h | 3 + 16 files changed, 267 insertions(+), 21 deletions(-) create mode 100644 Documentation/gpu/xe/xe-drm-usage-stats.rst -- 2.43.0
[PATCH v3 4/6] drm/xe: Add helper to capture engine timestamp
Just like CTX_TIMESTAMP is used to calculate runtime, add a helper to get the timestamp for the engine so it can be used to calculate the "engine time" with the same unit as the runtime is recorded. Reviewed-by: Umesh Nerlige Ramappa Signed-off-by: Lucas De Marchi --- drivers/gpu/drm/xe/xe_hw_engine.c | 5 + drivers/gpu/drm/xe/xe_hw_engine.h | 1 + 2 files changed, 6 insertions(+) diff --git a/drivers/gpu/drm/xe/xe_hw_engine.c b/drivers/gpu/drm/xe/xe_hw_engine.c index de30b74bf253..734f43a35b48 100644 --- a/drivers/gpu/drm/xe/xe_hw_engine.c +++ b/drivers/gpu/drm/xe/xe_hw_engine.c @@ -1110,3 +1110,8 @@ const char *xe_hw_engine_class_to_str(enum xe_engine_class class) return NULL; } + +u64 xe_hw_engine_read_timestamp(struct xe_hw_engine *hwe) +{ + return xe_mmio_read64_2x32(hwe->gt, RING_TIMESTAMP(hwe->mmio_base)); +} diff --git a/drivers/gpu/drm/xe/xe_hw_engine.h b/drivers/gpu/drm/xe/xe_hw_engine.h index 843de159e47c..7f2d27c0ba1a 100644 --- a/drivers/gpu/drm/xe/xe_hw_engine.h +++ b/drivers/gpu/drm/xe/xe_hw_engine.h @@ -68,5 +68,6 @@ static inline bool xe_hw_engine_is_valid(struct xe_hw_engine *hwe) } const char *xe_hw_engine_class_to_str(enum xe_engine_class class); +u64 xe_hw_engine_read_timestamp(struct xe_hw_engine *hwe); #endif -- 2.43.0
[PATCH v3 1/6] drm/xe: Promote xe_hw_engine_class_to_str()
Move it out of the sysfs compilation unit so it can be re-used in other places. Reviewed-by: Nirmoy Das Reviewed-by: Oak Zeng Signed-off-by: Lucas De Marchi --- drivers/gpu/drm/xe/xe_hw_engine.c | 18 ++ drivers/gpu/drm/xe/xe_hw_engine.h | 2 ++ drivers/gpu/drm/xe/xe_hw_engine_class_sysfs.c | 18 -- 3 files changed, 20 insertions(+), 18 deletions(-) diff --git a/drivers/gpu/drm/xe/xe_hw_engine.c b/drivers/gpu/drm/xe/xe_hw_engine.c index ec69803152a2..85712650be22 100644 --- a/drivers/gpu/drm/xe/xe_hw_engine.c +++ b/drivers/gpu/drm/xe/xe_hw_engine.c @@ -1088,3 +1088,21 @@ bool xe_hw_engine_is_reserved(struct xe_hw_engine *hwe) return xe->info.has_usm && hwe->class == XE_ENGINE_CLASS_COPY && hwe->instance == gt->usm.reserved_bcs_instance; } + +const char *xe_hw_engine_class_to_str(enum xe_engine_class class) +{ + switch (class) { + case XE_ENGINE_CLASS_RENDER: + return "rcs"; + case XE_ENGINE_CLASS_VIDEO_DECODE: + return "vcs"; + case XE_ENGINE_CLASS_VIDEO_ENHANCE: + return "vecs"; + case XE_ENGINE_CLASS_COPY: + return "bcs"; + case XE_ENGINE_CLASS_COMPUTE: + return "ccs"; + default: + return NULL; + } +} diff --git a/drivers/gpu/drm/xe/xe_hw_engine.h b/drivers/gpu/drm/xe/xe_hw_engine.h index 71968ee2f600..843de159e47c 100644 --- a/drivers/gpu/drm/xe/xe_hw_engine.h +++ b/drivers/gpu/drm/xe/xe_hw_engine.h @@ -67,4 +67,6 @@ static inline bool xe_hw_engine_is_valid(struct xe_hw_engine *hwe) return hwe->name; } +const char *xe_hw_engine_class_to_str(enum xe_engine_class class); + #endif diff --git a/drivers/gpu/drm/xe/xe_hw_engine_class_sysfs.c b/drivers/gpu/drm/xe/xe_hw_engine_class_sysfs.c index 844ec68cbbb8..efce6c7dd2a2 100644 --- a/drivers/gpu/drm/xe/xe_hw_engine_class_sysfs.c +++ b/drivers/gpu/drm/xe/xe_hw_engine_class_sysfs.c @@ -618,24 +618,6 @@ static void hw_engine_class_sysfs_fini(struct drm_device *drm, void *arg) kobject_put(kobj); } -static const char *xe_hw_engine_class_to_str(enum xe_engine_class class) -{ - switch (class) { - case XE_ENGINE_CLASS_RENDER: - return "rcs"; - case XE_ENGINE_CLASS_VIDEO_DECODE: - return "vcs"; - case XE_ENGINE_CLASS_VIDEO_ENHANCE: - return "vecs"; - case XE_ENGINE_CLASS_COPY: - return "bcs"; - case XE_ENGINE_CLASS_COMPUTE: - return "ccs"; - default: - return NULL; - } -} - /** * xe_hw_engine_class_sysfs_init - Init HW engine classes on GT. * @gt: Xe GT. -- 2.43.0
[PATCH v3 5/6] drm/xe: Add helper to accumulate exec queue runtime
From: Umesh Nerlige Ramappa Add a helper to accumulate per-client runtime of all its exec queues. This is called every time a sched job is finished. v2: - Use guc_exec_queue_free_job() and execlist_job_free() to accumulate runtime when job is finished since xe_sched_job_completed() is not a notification that job finished. - Stop trying to update runtime from xe_exec_queue_fini() - that is redundant and may happen after xef is closed, leading to a use-after-free - Do not special case the first timestamp read: the default LRC sets CTX_TIMESTAMP to zero, so even the first sample should be a valid one. - Handle the parallel submission case by multiplying the runtime by width. Signed-off-by: Umesh Nerlige Ramappa Signed-off-by: Lucas De Marchi --- drivers/gpu/drm/xe/xe_device_types.h | 9 +++ drivers/gpu/drm/xe/xe_exec_queue.c | 35 drivers/gpu/drm/xe/xe_exec_queue.h | 1 + drivers/gpu/drm/xe/xe_execlist.c | 1 + drivers/gpu/drm/xe/xe_guc_submit.c | 2 ++ 5 files changed, 48 insertions(+) diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h index 906b98fb973b..de078bdf0ab9 100644 --- a/drivers/gpu/drm/xe/xe_device_types.h +++ b/drivers/gpu/drm/xe/xe_device_types.h @@ -560,6 +560,15 @@ struct xe_file { struct mutex lock; } exec_queue; + /** +* @runtime: hw engine class runtime in ticks for this drm client +* +* Only stats from xe_exec_queue->lrc[0] are accumulated. For multi-lrc +* case, since all jobs run in parallel on the engines, only the stats +* from lrc[0] are sufficient. +*/ + u64 runtime[XE_ENGINE_CLASS_MAX]; + /** @client: drm client */ struct xe_drm_client *client; }; diff --git a/drivers/gpu/drm/xe/xe_exec_queue.c b/drivers/gpu/drm/xe/xe_exec_queue.c index 395de93579fa..86eb22e22c95 100644 --- a/drivers/gpu/drm/xe/xe_exec_queue.c +++ b/drivers/gpu/drm/xe/xe_exec_queue.c @@ -769,6 +769,41 @@ bool xe_exec_queue_is_idle(struct xe_exec_queue *q) q->lrc[0].fence_ctx.next_seqno - 1; } +/** + * xe_exec_queue_update_runtime() - Update runtime for this exec queue from hw + * @q: The exec queue + * + * Update the timestamp saved by HW for this exec queue and save runtime + * calculated by using the delta from last update. On multi-lrc case, only the + * first is considered. + */ +void xe_exec_queue_update_runtime(struct xe_exec_queue *q) +{ + struct xe_file *xef; + struct xe_lrc *lrc; + u32 old_ts, new_ts; + + /* +* Jobs that are run during driver load may use an exec_queue, but are +* not associated with a user xe file, so avoid accumulating busyness +* for kernel specific work. +*/ + if (!q->vm || !q->vm->xef) + return; + + xef = q->vm->xef; + + /* +* Only sample the first LRC. For parallel submission, all of them are +* scheduled together and we compensate that below by multiplying by +* width +*/ + lrc = >lrc[0]; + + new_ts = xe_lrc_update_timestamp(lrc, _ts); + xef->runtime[q->class] += (new_ts - old_ts) * q->width; +} + void xe_exec_queue_kill(struct xe_exec_queue *q) { struct xe_exec_queue *eq = q, *next; diff --git a/drivers/gpu/drm/xe/xe_exec_queue.h b/drivers/gpu/drm/xe/xe_exec_queue.h index 48f6da53a292..e0f07d28ee1a 100644 --- a/drivers/gpu/drm/xe/xe_exec_queue.h +++ b/drivers/gpu/drm/xe/xe_exec_queue.h @@ -75,5 +75,6 @@ struct dma_fence *xe_exec_queue_last_fence_get(struct xe_exec_queue *e, struct xe_vm *vm); void xe_exec_queue_last_fence_set(struct xe_exec_queue *e, struct xe_vm *vm, struct dma_fence *fence); +void xe_exec_queue_update_runtime(struct xe_exec_queue *q); #endif diff --git a/drivers/gpu/drm/xe/xe_execlist.c b/drivers/gpu/drm/xe/xe_execlist.c index dece2785933c..a316431025c7 100644 --- a/drivers/gpu/drm/xe/xe_execlist.c +++ b/drivers/gpu/drm/xe/xe_execlist.c @@ -307,6 +307,7 @@ static void execlist_job_free(struct drm_sched_job *drm_job) { struct xe_sched_job *job = to_xe_sched_job(drm_job); + xe_exec_queue_update_runtime(job->q); xe_sched_job_put(job); } diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c index d274a139010b..e0ebfe83dfe8 100644 --- a/drivers/gpu/drm/xe/xe_guc_submit.c +++ b/drivers/gpu/drm/xe/xe_guc_submit.c @@ -749,6 +749,8 @@ static void guc_exec_queue_free_job(struct drm_sched_job *drm_job) { struct xe_sched_job *job = to_xe_sched_job(drm_job); + xe_exec_queue_update_runtime(job->q); + trace_xe_sched_job_free(job); xe_sched_job_put(job); } -- 2.43.0
[PATCH v3 2/6] drm/xe: Add XE_ENGINE_CLASS_OTHER to str conversion
XE_ENGINE_CLASS_OTHER was missing from the str conversion. Add it and remove the default handling so it's protected by -Wswitch. Currently the only user is xe_hw_engine_class_sysfs_init(), which already skips XE_ENGINE_CLASS_OTHER, so there's no change in behavior. Reviewed-by: Nirmoy Das Signed-off-by: Lucas De Marchi --- drivers/gpu/drm/xe/xe_hw_engine.c | 8 ++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/xe/xe_hw_engine.c b/drivers/gpu/drm/xe/xe_hw_engine.c index 85712650be22..de30b74bf253 100644 --- a/drivers/gpu/drm/xe/xe_hw_engine.c +++ b/drivers/gpu/drm/xe/xe_hw_engine.c @@ -1100,9 +1100,13 @@ const char *xe_hw_engine_class_to_str(enum xe_engine_class class) return "vecs"; case XE_ENGINE_CLASS_COPY: return "bcs"; + case XE_ENGINE_CLASS_OTHER: + return "other"; case XE_ENGINE_CLASS_COMPUTE: return "ccs"; - default: - return NULL; + case XE_ENGINE_CLASS_MAX: + break; } + + return NULL; } -- 2.43.0
Re: [PATCH v2 6/6] drm/xe/client: Print runtime to fdinfo
On Fri, Apr 26, 2024 at 11:47:37AM GMT, Tvrtko Ursulin wrote: On 24/04/2024 00:56, Lucas De Marchi wrote: Print the accumulated runtime for client when printing fdinfo. Each time a query is done it first does 2 things: 1) loop through all the exec queues for the current client and accumulate the runtime, per engine class. CTX_TIMESTAMP is used for that, being read from the context image. 2) Read a "GPU timestamp" that can be used for considering "how much GPU time has passed" and that has the same unit/refclock as the one recording the runtime. RING_TIMESTAMP is used for that via MMIO. Since for all current platforms RING_TIMESTAMP follows the same refclock, just read it once, using any first engine. This is exported to userspace as 2 numbers in fdinfo: drm-cycles-: drm-total-cycles-: Userspace is expected to collect at least 2 samples, which allows to know the client engine busyness as per: RUNTIME1 - RUNTIME0 busyness = - T1 - T0 Another thing to point out is that it's expected that userspace will read any 2 samples every few seconds. Given the update frequency of the counters involved and that CTX_TIMESTAMP is 32-bits, the counter for each exec_queue can wrap around (assuming 100% utilization) after ~200s. The wraparound is not perceived by userspace since it's just accumulated for all the exec_queues in a 64-bit counter), but the measurement will not be accurate if the samples are too far apart. This could be mitigated by adding a workqueue to accumulate the counters every so often, but it's additional complexity for something that is done already by userspace every few seconds in tools like gputop (from igt), htop, nvtop, etc with none of them really defaulting to 1 sample per minute or more. Signed-off-by: Lucas De Marchi --- Documentation/gpu/drm-usage-stats.rst | 16 ++- Documentation/gpu/xe/index.rst | 1 + Documentation/gpu/xe/xe-drm-usage-stats.rst | 10 ++ drivers/gpu/drm/xe/xe_drm_client.c | 138 +++- 4 files changed, 162 insertions(+), 3 deletions(-) create mode 100644 Documentation/gpu/xe/xe-drm-usage-stats.rst diff --git a/Documentation/gpu/drm-usage-stats.rst b/Documentation/gpu/drm-usage-stats.rst index 6dc299343b48..421766289b78 100644 --- a/Documentation/gpu/drm-usage-stats.rst +++ b/Documentation/gpu/drm-usage-stats.rst @@ -112,6 +112,17 @@ larger value within a reasonable period. Upon observing a value lower than what was previously read, userspace is expected to stay with that larger previous value until a monotonic update is seen. +- drm-total-cycles-: + +Engine identifier string must be the same as the one specified in the +drm-cycles- tag and shall contain the total number cycles for the given +engine. + +This is a timestamp in GPU unspecified unit that matches the update rate +of drm-cycles-. For drivers that implement this interface, the engine +utilization can be calculated entirely on the GPU clock domain, without +considering the CPU sleep time between 2 samples. Two opens. 1) Do we need to explicity document that drm-total-cycles and drm-maxfreq are mutually exclusive? so userspace has a fallback mechanism to calculate utilization depending on what keys are available? 2) Should drm-total-cycles for now be documents as driver specific? you mean to call it xe-total-cycles? I have added some more poeple in the cc who were involved with driver fdinfo implementations if they will have an opinion. I would say potentially yes, and promote it to common if more than one driver would use it. For instance I see panfrost has the driver specific drm-curfreq (although isn't documenting it fully in panfrost.rst). And I have to say it is somewhat questionable to expose the current frequency per fdinfo per engine but not my call. aren't all of Documentation/gpu/drm-usage-stats.rst optional that driver may or may not implement? When you say driver-specific I'd think more of the ones not using as prefix as e.g. amd-*. I think drm-cycles + drm-total-cycles is just an alternative implementation for engine utilization. Like drm-cycles + drm-maxfreq already is an alternative to drm-engine and is not implemented by e.g. amdgpu/i915. I will submit a new version of the entire patch series to get the ball rolling, but let's keep this open for now. <...> +static void show_runtime(struct drm_printer *p, struct drm_file *file) +{ + struct xe_file *xef = file->driver_priv; + struct xe_device *xe = xef->xe; + struct xe_gt *gt; + struct xe_hw_engine *hwe; + struct xe_exec_queue *q; + unsigned long i, id_hwe, id_gt, capacity[XE_ENGINE_CLASS_MAX] = { }; + u64 gpu_timestamp, engine_mask = 0; + bool gpu_stamp = false; + + xe_pm_runtime_get(xe); + + /* Accumulate all the exec queues from this client */ + mutex_lock(>exec_queue.lock); +
Re: [PATCH] Revert "drm/i915: Remove extra multi-gt pm-references"
On 5/7/2024 7:10 PM, Rodrigo Vivi wrote: On Tue, May 07, 2024 at 10:54:11AM +0200, Janusz Krzysztofik wrote: On Tuesday, 7 May 2024 09:30:15 GMT+2 Nirmoy Das wrote: Hi Janusz, Just realized we need Fixes tag for this. Fixes: 1f33dc0c1189 ("drm/i915: Remove extra multi-gt pm-references") Whoever is going to push this patch, please feel free to add this tag. dim b4-shazam gets that automagically, now it was sent in reply ;) Nice! I just pushed the patch. thanks for the patch and reviews. Thanks, Nirmoy Thanks, Janusz Regards, Nirmoy On 5/6/2024 8:02 PM, Janusz Krzysztofik wrote: This reverts commit 1f33dc0c1189efb9ae19c6fc22b64dd3e26261fb. There was a patch supposed to fix an issue of illegal attempts to free a still active i915 VMA object when parking a GT believed to be idle, reported by CI on 2-GT Meteor Lake. As a solution, an extra wakeref for a Primary GT was acquired from i915_gem_do_execbuffer() -- see commit f56fe3e91787 ("drm/i915: Fix a VMA UAF for multi-gt platform"). However, that fix occurred insufficient -- the issue was still reported by CI. That wakeref was released on exit from i915_gem_do_execbuffer(), then potentially before completion of the request and deactivation of its associated VMAs. Moreover, CI reports indicated that single-GT platforms also suffered sporadically from the same race. Since that issue was fixed by another commit f3c71b2ded5c ("drm/i915/vma: Fix UAF on destroy against retire race"), the changes introduced by that insufficient fix were dropped as no longer useful. However, that series resulted in another VMA UAF scenario now being triggered in CI. <4> [260.290809] [ cut here ] <4> [260.290988] list_del corruption. prev->next should be 888118c5d990, but was 888118c5a510. (prev=888118c5a510) <4> [260.291004] WARNING: CPU: 2 PID: 1143 at lib/list_debug.c:62 __list_del_entry_valid_or_report+0xb7/0xe0 .. <4> [260.291055] CPU: 2 PID: 1143 Comm: kms_plane Not tainted 6.9.0-rc2-CI_DRM_14524-ga25d180c6853+ #1 <4> [260.291058] Hardware name: Intel Corporation Meteor Lake Client Platform/MTL-P LP5x T3 RVP, BIOS MTLPFWI1.R00.3471.D91.2401310918 01/31/2024 <4> [260.291060] RIP: 0010:__list_del_entry_valid_or_report+0xb7/0xe0 ... <4> [260.291087] Call Trace: <4> [260.291089] <4> [260.291124] i915_vma_reopen+0x43/0x80 [i915] <4> [260.291298] eb_lookup_vmas+0x9cb/0xcc0 [i915] <4> [260.291579] i915_gem_do_execbuffer+0xc9a/0x26d0 [i915] <4> [260.291883] i915_gem_execbuffer2_ioctl+0x123/0x2a0 [i915] ... <4> [260.292301] ... <4> [260.292506] ---[ end trace ]--- <4> [260.292782] general protection fault, probably for non-canonical address 0x6b6b6b6b6b6b6ca3: [#1] PREEMPT SMP NOPTI <4> [260.303575] CPU: 2 PID: 1143 Comm: kms_plane Tainted: GW 6.9.0-rc2-CI_DRM_14524-ga25d180c6853+ #1 <4> [260.313851] Hardware name: Intel Corporation Meteor Lake Client Platform/MTL-P LP5x T3 RVP, BIOS MTLPFWI1.R00.3471.D91.2401310918 01/31/2024 <4> [260.326359] RIP: 0010:eb_validate_vmas+0x114/0xd80 [i915] ... <4> [260.428756] Call Trace: <4> [260.431192] <4> [639.283393] i915_gem_do_execbuffer+0xd05/0x26d0 [i915] <4> [639.305245] i915_gem_execbuffer2_ioctl+0x123/0x2a0 [i915] ... <4> [639.411134] ... <4> [639.449979] ---[ end trace ]--- We defer actually closing, unbinding and destroying a VMA until next idle point, or until the object is freed in the meantime. By postponing the unbind, we allow for the VMA to be reopened by the client, avoiding the work required to rebind the VMA. Starting from commit b0647a5e79b1 ("drm/i915: Avoid live-lock with i915_vma_parked()"), we assume that as long as a GT is held idle, no VMA would be reopened while we destroy them. That assumption is no longer true in multi-GT configurations, where a VMA we reopen may be handled by a GT different from the one that we already keep active via its engine while we set up an execbuf request. Restoring the extra GT0 PM wakeref removed from i915_gem_do_execbuffer() processing path seems to fix this issue. Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/10608 Signed-off-by: Janusz Krzysztofik Cc: Rodrigo Vivi Cc: Nirmoy Das --- drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c | 18 ++ 1 file changed, 18 insertions(+) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c index 42619fc05de48..090724fa766c9 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c @@ -255,6 +255,7 @@ struct i915_execbuffer { struct intel_context *context; /* logical state for the request */ struct i915_gem_context *gem_context; /** caller's context */ intel_wakeref_t wakeref; + intel_wakeref_t wakeref_gt0; /** our requests to build */ struct i915_request *requests[MAX_ENGINE_INSTANCE + 1]; @@ -2685,6 +2686,7 @@
RE: [PATCH] epoll: try to be a _bit_ better about file lifetimes
From: Christian Brauner > Sent: 06 May 2024 09:45 > > > The fact is, it's not dma-buf that is violating any rules. It's epoll. > > I agree that epoll() not taking a reference on the file is at least > unexpected and contradicts the usual code patterns for the sake of > performance and that it very likely is the case that most callers of > f_op->poll() don't know this. > > Note, I cleary wrote upthread that I'm ok to do it like you suggested > but raised two concerns a) there's currently only one instance of > prolonged @file lifetime in f_op->poll() afaict and b) that there's > possibly going to be some performance impact on epoll(). > > So it's at least worth discussing what's more important because epoll() > is very widely used and it's not that we haven't favored performance > before. > > But you've already said that you aren't concerned with performance on > epoll() upthread. So afaict then there's really not a lot more to > discuss other than take the patch and see whether we get any complaints. Surely there isn't a problem with epoll holding a reference to the file structure - it isn't really any different to a dup(). 'All' that needs to happen is that the 'magic' that makes epoll() remove files on the last fput happen when the close is done. I'm sure there are horrid locking issues it that code (separate from it calling ->poll() after ->release()) eg if you call close() concurrently with EPOLL_CTL_ADD. I'm not at all sure it would have mattered if epoll kept the file open. But it can't do that because it is documented not to. As well as poll/select holding a reference to all their fd for the duration of the system call, a successful mmap() holds a reference until the pages are all unmapped - usually by process exit. We (dayjob) have code that uses epoll() to monitor large numbers of UDP sockets. I was doing some tests (trying to) receive RTP (audio) data concurrently on 1 sockets with typically one packet every 20ms. There are 1 associated RCTP sockets that are usually idle. A more normal limit would be 1000 RTP sockets. All the data needs to go into a single (multithreaded) process. Just getting all the packets queued on the sockets was non-trivial. epoll is about the only way to actually read the data. (That needed multiple epoll fd so each thread could process all the events from one epoll fd then look for another unprocessed fd.) David - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)
Re: [PATCH 11/11] drm/tegra: Use fbdev client helpers
On 2024-05-07 07:58, Thomas Zimmermann wrote: Implement struct drm_client_funcs with the respective helpers and remove the custom code from the emulation. The generic helpers are equivalent in functionality. Signed-off-by: Thomas Zimmermann --- drivers/gpu/drm/radeon/radeon_fbdev.c | 66 ++- Was radeon meant to be a separate patch? Regards, Felix drivers/gpu/drm/tegra/fbdev.c | 58 ++- 2 files changed, 6 insertions(+), 118 deletions(-) diff --git a/drivers/gpu/drm/radeon/radeon_fbdev.c b/drivers/gpu/drm/radeon/radeon_fbdev.c index 02bf25759059a..cf790922174ea 100644 --- a/drivers/gpu/drm/radeon/radeon_fbdev.c +++ b/drivers/gpu/drm/radeon/radeon_fbdev.c @@ -29,7 +29,6 @@ #include #include -#include #include #include #include @@ -293,71 +292,12 @@ static const struct drm_fb_helper_funcs radeon_fbdev_fb_helper_funcs = { }; /* - * Fbdev client and struct drm_client_funcs + * struct drm_client_funcs */ -static void radeon_fbdev_client_unregister(struct drm_client_dev *client) -{ - struct drm_fb_helper *fb_helper = drm_fb_helper_from_client(client); - struct drm_device *dev = fb_helper->dev; - struct radeon_device *rdev = dev->dev_private; - - if (fb_helper->info) { - vga_switcheroo_client_fb_set(rdev->pdev, NULL); - drm_helper_force_disable_all(dev); - drm_fb_helper_unregister_info(fb_helper); - } else { - drm_client_release(_helper->client); - drm_fb_helper_unprepare(fb_helper); - kfree(fb_helper); - } -} - -static int radeon_fbdev_client_restore(struct drm_client_dev *client) -{ - drm_fb_helper_lastclose(client->dev); - vga_switcheroo_process_delayed_switch(); - - return 0; -} - -static int radeon_fbdev_client_hotplug(struct drm_client_dev *client) -{ - struct drm_fb_helper *fb_helper = drm_fb_helper_from_client(client); - struct drm_device *dev = client->dev; - struct radeon_device *rdev = dev->dev_private; - int ret; - - if (dev->fb_helper) - return drm_fb_helper_hotplug_event(dev->fb_helper); - - ret = drm_fb_helper_init(dev, fb_helper); - if (ret) - goto err_drm_err; - - if (!drm_drv_uses_atomic_modeset(dev)) - drm_helper_disable_unused_functions(dev); - - ret = drm_fb_helper_initial_config(fb_helper); - if (ret) - goto err_drm_fb_helper_fini; - - vga_switcheroo_client_fb_set(rdev->pdev, fb_helper->info); - - return 0; - -err_drm_fb_helper_fini: - drm_fb_helper_fini(fb_helper); -err_drm_err: - drm_err(dev, "Failed to setup radeon fbdev emulation (ret=%d)\n", ret); - return ret; -} - static const struct drm_client_funcs radeon_fbdev_client_funcs = { - .owner = THIS_MODULE, - .unregister = radeon_fbdev_client_unregister, - .restore= radeon_fbdev_client_restore, - .hotplug= radeon_fbdev_client_hotplug, + .owner = THIS_MODULE, + DRM_FBDEV_HELPER_CLIENT_FUNCS, }; void radeon_fbdev_setup(struct radeon_device *rdev) diff --git a/drivers/gpu/drm/tegra/fbdev.c b/drivers/gpu/drm/tegra/fbdev.c index db6eaac3d30e6..f9cc365cfed94 100644 --- a/drivers/gpu/drm/tegra/fbdev.c +++ b/drivers/gpu/drm/tegra/fbdev.c @@ -12,7 +12,6 @@ #include #include -#include #include #include #include @@ -150,63 +149,12 @@ static const struct drm_fb_helper_funcs tegra_fb_helper_funcs = { }; /* - * struct drm_client + * struct drm_client_funcs */ -static void tegra_fbdev_client_unregister(struct drm_client_dev *client) -{ - struct drm_fb_helper *fb_helper = drm_fb_helper_from_client(client); - - if (fb_helper->info) { - drm_fb_helper_unregister_info(fb_helper); - } else { - drm_client_release(_helper->client); - drm_fb_helper_unprepare(fb_helper); - kfree(fb_helper); - } -} - -static int tegra_fbdev_client_restore(struct drm_client_dev *client) -{ - drm_fb_helper_lastclose(client->dev); - - return 0; -} - -static int tegra_fbdev_client_hotplug(struct drm_client_dev *client) -{ - struct drm_fb_helper *fb_helper = drm_fb_helper_from_client(client); - struct drm_device *dev = client->dev; - int ret; - - if (dev->fb_helper) - return drm_fb_helper_hotplug_event(dev->fb_helper); - - ret = drm_fb_helper_init(dev, fb_helper); - if (ret) - goto err_drm_err; - - if (!drm_drv_uses_atomic_modeset(dev)) - drm_helper_disable_unused_functions(dev); - - ret = drm_fb_helper_initial_config(fb_helper); - if (ret) - goto err_drm_fb_helper_fini; - - return 0; - -err_drm_fb_helper_fini: - drm_fb_helper_fini(fb_helper); -err_drm_err: - drm_err(dev,
Re: [PATCH 1/5] dt-bindings: display: panel: mipi-dbi-spi: Add a pixel format property
On Tue, May 07, 2024 at 11:57:26AM +0200, Noralf Trønnes wrote: > The MIPI DBI 2.0 specification (2005) lists only two pixel formats for > the Type C Interface (SPI) and that is 3-bits/pixel RGB111 with > 2 options for bit layout. > > For Type A and B (parallel) the following formats are listed: RGB332, > RGB444, RGB565, RGB666 and RGB888 (some have 2 options for the bit layout). > > Many MIPI DBI compatible controllers support all interface types on the > same chip and often the manufacturers have chosen to provide support for > the Type A/B interface pixel formats also on the Type C interface. > > Some chips provide many pixel formats with optional bit layouts over SPI, > but the most common by far are RGB565 and RGB666. So even if the > specification doesn't list these formats for the Type C interface, the > industry has chosen to include them. > > The MIPI DCS specification lists the standard commands that can be sent > over the MIPI DBI interface. The set_address_mode (36h) command has one > bit in the parameter that controls RGB/BGR order: > This bit controls the RGB data latching order transferred from the > peripheral’s frame memory to the display device. > This means that each supported RGB format also has a BGR variant. > > Based on this rationale document the following pixel formats describing > the bit layout going over the wire: > - RGB111 (option 1): x2r1g1b1r1g1b1 (2 pixels per byte) > - BGR111 (option 1): x2b1g1r1b1g1r1 (2 pixels per byte) > - RGB111 (option 2): x1r1g1b1x1r1g1b1 (2 pixels per byte) > - BGR111 (option 2): x1b1g1r1x1b1g1r1 (2 pixels per byte) > - RGB565: r5g6b5 (2 bytes) > - BGR565: b5g6r5 (2 bytes) > - RGB666: r6x2g6x2b6x2 (3 bytes) > - BGR666: b6x2g6x2r6x2 (3 bytes) > (x: don't care) > > Signed-off-by: Noralf Trønnes > --- > .../bindings/display/panel/panel-mipi-dbi-spi.yaml | 31 > ++ > 1 file changed, 31 insertions(+) > > diff --git > a/Documentation/devicetree/bindings/display/panel/panel-mipi-dbi-spi.yaml > b/Documentation/devicetree/bindings/display/panel/panel-mipi-dbi-spi.yaml > index e808215cb39e..dac8f9af100e 100644 > --- a/Documentation/devicetree/bindings/display/panel/panel-mipi-dbi-spi.yaml > +++ b/Documentation/devicetree/bindings/display/panel/panel-mipi-dbi-spi.yaml > @@ -50,6 +50,12 @@ description: | >|Command or data | >|| > > + The standard defines one pixel format for type C: RGB111. The industry > + however has decided to provide the type A/B interface pixel formats also on > + the Type C interface and most common among these are RGB565 and RGB666. > + The MIPI DCS command set_address_mode (36h) has one bit that controls > RGB/BGR > + order. This gives each supported RGB format a BGR variant. > + >The panel resolution is specified using the panel-timing node properties >hactive (width) and vactive (height). The other mandatory panel-timing >properties should be set to zero except clock-frequency which can be > @@ -90,6 +96,29 @@ properties: > >spi-3wire: true > > + format: > +description: > > + Pixel format in bit order as going on the wire: > +* `x2r1g1b1r1g1b1` - RGB111, 2 pixels per byte > +* `x2b1g1r1b1g1r1` - BGR111, 2 pixels per byte > +* `x1r1g1b1x1r1g1b1` - RGB111, 2 pixels per byte > +* `x1b1g1r1x1b1g1r1` - BGR111, 2 pixels per byte > +* `r5g6b5` - RGB565, 2 bytes > +* `b5g6r5` - BGR565, 2 bytes > +* `r6x2g6x2b6x2` - RGB666, 3 bytes > +* `b6x2g6x2r6x2` - BGR666, 3 bytes > + This property is optional for backwards compatibility and `r5g6b5` is > + assumed in its absence. Use schemas, not free form text: default: r5g6b5 > +enum: > + - x2r1g1b1r1g1b1 > + - x2b1g1r1b1g1r1 > + - x1r1g1b1x1r1g1b1 > + - x1b1g1r1x1b1g1r1 > + - r5g6b5 > + - b5g6r5 > + - r6x2g6x2b6x2 > + - b6x2g6x2r6x2 > + > required: >- compatible >- reg > @@ -116,6 +145,8 @@ examples: > reset-gpios = < 25 GPIO_ACTIVE_HIGH>; > write-only; > > +format = "r5g6b5"; > + > backlight = <>; > > width-mm = <35>; > > -- > 2.45.0 >
Re: [PATCH] drm/drm-bridge.c: Drop conditionals around of_node pointers
On Wed, May 08, 2024 at 02:00:00AM +0800, Sui Jingfeng wrote: > Having conditional around the of_node pointer of the drm_bridge structure > is not necessary, since drm_bridge structure always has the of_node as its > member. > > Let's drop the conditional to get a better looks, please also note that > this is following the already accepted commitments. see commit d8dfccde2709 > ("drm/bridge: Drop conditionals around of_node pointers") for reference. > > Signed-off-by: Sui Jingfeng It looks like this was forgotten in commit d8dfccde2709 ("drm/bridge: Drop conditionals around of_node pointers"). Reviewed-by: Laurent Pinchart > --- > drivers/gpu/drm/drm_bridge.c | 5 - > 1 file changed, 5 deletions(-) > > diff --git a/drivers/gpu/drm/drm_bridge.c b/drivers/gpu/drm/drm_bridge.c > index 30d66bee0ec6..a6dbe1751e88 100644 > --- a/drivers/gpu/drm/drm_bridge.c > +++ b/drivers/gpu/drm/drm_bridge.c > @@ -352,13 +352,8 @@ int drm_bridge_attach(struct drm_encoder *encoder, > struct drm_bridge *bridge, > bridge->encoder = NULL; > list_del(>chain_node); > > -#ifdef CONFIG_OF > DRM_ERROR("failed to attach bridge %pOF to encoder %s: %d\n", > bridge->of_node, encoder->name, ret); > -#else > - DRM_ERROR("failed to attach bridge to encoder %s: %d\n", > - encoder->name, ret); > -#endif > > return ret; > } -- Regards, Laurent Pinchart
Re: [PATCH] dmabuf: fix dmabuf file poll uaf issue
On Tue, May 7, 2024 at 11:17 AM T.J. Mercier wrote: > > On Tue, May 7, 2024 at 7:04 AM Christian König > wrote: > > > > Am 07.05.24 um 15:39 schrieb Daniel Vetter: > > > On Tue, May 07, 2024 at 12:10:07PM +0200, Christian König wrote: > > >> Am 06.05.24 um 21:04 schrieb T.J. Mercier: > > >>> On Mon, May 6, 2024 at 2:30 AM Charan Teja Kalla > > >>> wrote: > > Hi TJ, > > > > Seems I have got answers from [1], where it is agreed upon epoll() is > > the source of issue. > > > > Thanks a lot for the discussion. > > > > [1] > > https://lore.kernel.org/lkml/2d631f0615918...@google.com/ > > > > Thanks > > Charan > > >>> Oh man, quite a set of threads on this over the weekend. Thanks for the > > >>> link. > > >> Yeah and it also has some interesting side conclusion: We should probably > > >> tell people to stop using DMA-buf with epoll. > > >> > > >> The background is that the mutex approach epoll uses to make files > > >> disappear > > >> from the interest list on close results in the fact that each file can > > >> only > > >> be part of a single epoll at a time. > > >> > > >> Now since DMA-buf is build around the idea that we share the buffer > > >> representation as file between processes it means that only one process > > >> at a > > >> time can use epoll with each DMA-buf. > > >> > > >> So for example if a window manager uses epoll everything is fine. If a > > >> client is using epoll everything is fine as well. But if *both* use > > >> epoll at > > >> the same time it won't work. > > >> > > >> This can lead to rather funny and hard to debug combinations of failures > > >> and > > >> I think we need to document this limitation and explicitly point it out. > > > Ok, I tested this with a small C program, and you're mixing things up. > > > Here's what I got > > > > > > - You cannot add a file twice to the same epoll file/fd. So that part is > > >correct, and also my understanding from reading the kernel code. > > > > > > - You can add the same file to two different epoll file instaces. Which > > >means it's totally fine to use epoll on a dma_buf in different > > > processes > > >like both in the compositor and in clients. > > > > Ah! Than I misunderstood that comment in the discussion. Thanks for > > clarifying that. > > > > > > > > - Substantially more entertaining, you can nest epoll instances, and e.g. > > >add a 2nd epoll file as an event to the first one. That way you can add > > >the same file to both epoll fds, and so end up with the same file > > >essentially being added twice to the top-level epoll file. So even > > >within one application there's no real issue when e.g. different > > >userspace drivers all want to use epoll on the same fd, because you can > > >just throw in another level of epoll and it's fine again and you won't > > >get an EEXISTS on EPOLL_CTL_ADD. > > > > > >But I also don't think we have this issue right now anywhere, since > > > it's > > >kinda a general epoll issue that happens with any duplicated file. > > > > I actually have been telling people to (ab)use the epoll behavior to > > check if two file descriptors point to the same underlying file when > > KCMP isn't available. > > > > Some environments (Android?) disable KCMP because they see it as > > security problem. > > > Didn't know about this so I checked. Our kernel has CONFIG_KCMP, but > seccomp does look like it's blocking kcmp for apps. > https://cs.android.com/android/platform/superproject/main/+/main:bionic/libc/SYSCALLS.TXT Getting a bit off the original topic, but fwiw this is what CrOS did for CONFIG_KCMP: https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/3501193 Ie. allow the kcmp syscall, but block type == KCMP_SYSVSEM, which was the more specific thing that android wanted to block. BR, -R
Re: Safety of opening up /dev/dma_heap/* to physically present users (udev uaccess tag) ?
On Tue, May 07, 2024 at 10:59:42PM +0300, Dmitry Baryshkov wrote: > On Tue, 7 May 2024 at 21:40, Laurent Pinchart wrote: > > On Tue, May 07, 2024 at 06:19:18PM +0300, Dmitry Baryshkov wrote: > > > On Tue, 7 May 2024 at 18:15, Bryan O'Donoghue wrote: > > > > On 07/05/2024 16:09, Dmitry Baryshkov wrote: > > > > > Ah, I see. Then why do you require the DMA-ble buffer at all? If you > > > > > are > > > > > providing data to VPU or DRM, then you should be able to get the > > > > > buffer > > > > > from the data-consuming device. > > > > > > > > Because we don't necessarily know what the consuming device is, if any. > > > > > > > > Could be VPU, could be Zoom/Hangouts via pipewire, could for argument > > > > sake be GPU or DSP. > > > > > > > > Also if we introduce a dependency on another device to allocate the > > > > output buffers - say always taking the output buffer from the GPU, then > > > > we've added another dependency which is more difficult to guarantee > > > > across different arches. > > > > > > Yes. And it should be expected. It's a consumer who knows the > > > restrictions on the buffer. As I wrote, Zoom/Hangouts should not > > > require a DMA buffer at all. > > > > Why not ? If you want to capture to a buffer that you then compose on > > the screen without copying data, dma-buf is the way to go. That's the > > Linux solution for buffer sharing. > > Yes. But it should be allocated by the DRM driver. As Sima wrote, > there is no guarantee that the buffer allocated from dma-heaps is > accessible to the GPU. No disagreement there. From a libcamera point of view, we want to import buffers allocated externally. It's for use cases where applications can't provide dma buf instances easily that we need to allocate them through the libcamera buffer allocator helper. That helper has to a) provide dma buf fds, and b) make a best effort to allocate buffers that will have a decent chance of being usable by other devices. We're open to exploring other solutions than dma heaps, although I wonder what the dma heaps are for if nobody enables them :-) > > > Applications should be able to allocate > > > the buffer out of the generic memory. > > > > If applications really want to copy data and degrade performance, they > > are free to shoot themselves in the foot of course. Applications (or > > compositors) need to support copying as a fallback in the worst case, > > but all components should at least aim for the zero-copy case. > > I'd say that they should aim for the optimal case. It might include > both zero-copying access from another DMA master or simple software > processing of some kind. > > > > GPUs might also have different > > > requirements. Consider GPUs with VRAM. It might be beneficial to > > > allocate a buffer out of VRAM rather than generic DMA mem. > > > > Absolutely. For that we need a centralized device memory allocator in > > userspace. An effort was started by James Jones in 2016, see [1]. It has > > unfortunately stalled. If I didn't have a camera framework to develop, I > > would try to tackle that issue :-) > > I'll review the talk. However the fact that the effort has stalled > most likely means that 'one fits them all' approach didn't really fly > well. We have too many usecases. > > > [1] > > https://www.x.org/wiki/Events/XDC2016/Program/Unix_Device_Memory_Allocation.pdf -- Regards, Laurent Pinchart
[RFC PATCH] drm/amd/display: Disable panel_power_savings sysfs entry for OLED displays
The panel_power_savings sysfs entry sets the Adaptive Backlight Management level (abm_level). OLED displays work without backlight, so it is unnecessary for them. Before creating the sysfs entry, make sure the display is not an OLED display. Signed-off-by: Gergo Koteles --- drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 8 ++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c index d6e71aa808d8..d54065a76f63 100644 --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c @@ -6535,9 +6535,11 @@ static const struct attribute_group amdgpu_group = { static void amdgpu_dm_connector_unregister(struct drm_connector *connector) { struct amdgpu_dm_connector *amdgpu_dm_connector = to_amdgpu_dm_connector(connector); + union dpcd_sink_ext_caps *ext_caps = + _dm_connector->dc_link->dpcd_sink_ext_caps; if (connector->connector_type == DRM_MODE_CONNECTOR_eDP && - amdgpu_dm_abm_level < 0) + amdgpu_dm_abm_level < 0 && !ext_caps->bits.oled) sysfs_remove_group(>kdev->kobj, _group); drm_dp_aux_unregister(_dm_connector->dm_dp_aux.aux); @@ -6642,10 +6644,12 @@ amdgpu_dm_connector_late_register(struct drm_connector *connector) { struct amdgpu_dm_connector *amdgpu_dm_connector = to_amdgpu_dm_connector(connector); + union dpcd_sink_ext_caps *ext_caps = + _dm_connector->dc_link->dpcd_sink_ext_caps; int r; if (connector->connector_type == DRM_MODE_CONNECTOR_eDP && - amdgpu_dm_abm_level < 0) { + amdgpu_dm_abm_level < 0 && !ext_caps->bits.oled) { r = sysfs_create_group(>kdev->kobj, _group); if (r) base-commit: dccb07f2914cdab2ac3a5b6c98406f765acab803 -- 2.45.0
Re: Safety of opening up /dev/dma_heap/* to physically present users (udev uaccess tag) ?
Hi, Le mardi 07 mai 2024 à 21:36 +0300, Laurent Pinchart a écrit : > Shorter term, we have a problem to solve, and the best option we have > found so far is to rely on dma-buf heaps as a backend for the frame > buffer allocatro helper in libcamera for the use case described above. > This won't work in 100% of the cases, clearly. It's a stop-gap measure > until we can do better. Considering the security concerned raised on this thread with dmabuf heap allocation not be restricted by quotas, you'd get what you want quickly with memfd + udmabuf instead (which is accounted already). It was raised that distro don't enable udmabuf, but as stated there by Hans, in any cases distro needs to take action to make the softISP works. This alternative is easy and does not interfere in anyway with your future plan or the libcamera API. You could even have both dmabuf heap (for Raspbian) and the safer memfd+udmabuf for the distro with security concerns. And for the long term plan, we can certainly get closer by fixing that issue with accounting. This issue also applied to v4l2 io-ops, so it would be nice to find common set of helpers to fix these exporters. regards, Nicolas
Re: Safety of opening up /dev/dma_heap/* to physically present users (udev uaccess tag) ?
On Tue, 7 May 2024 at 21:40, Laurent Pinchart wrote: > > On Tue, May 07, 2024 at 06:19:18PM +0300, Dmitry Baryshkov wrote: > > On Tue, 7 May 2024 at 18:15, Bryan O'Donoghue wrote: > > > On 07/05/2024 16:09, Dmitry Baryshkov wrote: > > > > Ah, I see. Then why do you require the DMA-ble buffer at all? If you are > > > > providing data to VPU or DRM, then you should be able to get the buffer > > > > from the data-consuming device. > > > > > > Because we don't necessarily know what the consuming device is, if any. > > > > > > Could be VPU, could be Zoom/Hangouts via pipewire, could for argument > > > sake be GPU or DSP. > > > > > > Also if we introduce a dependency on another device to allocate the > > > output buffers - say always taking the output buffer from the GPU, then > > > we've added another dependency which is more difficult to guarantee > > > across different arches. > > > > Yes. And it should be expected. It's a consumer who knows the > > restrictions on the buffer. As I wrote, Zoom/Hangouts should not > > require a DMA buffer at all. > > Why not ? If you want to capture to a buffer that you then compose on > the screen without copying data, dma-buf is the way to go. That's the > Linux solution for buffer sharing. Yes. But it should be allocated by the DRM driver. As Sima wrote, there is no guarantee that the buffer allocated from dma-heaps is accessible to the GPU. > > > Applications should be able to allocate > > the buffer out of the generic memory. > > If applications really want to copy data and degrade performance, they > are free to shoot themselves in the foot of course. Applications (or > compositors) need to support copying as a fallback in the worst case, > but all components should at least aim for the zero-copy case. I'd say that they should aim for the optimal case. It might include both zero-copying access from another DMA master or simple software processing of some kind. > > GPUs might also have different > > requirements. Consider GPUs with VRAM. It might be beneficial to > > allocate a buffer out of VRAM rather than generic DMA mem. > > Absolutely. For that we need a centralized device memory allocator in > userspace. An effort was started by James Jones in 2016, see [1]. It has > unfortunately stalled. If I didn't have a camera framework to develop, I > would try to tackle that issue :-) I'll review the talk. However the fact that the effort has stalled most likely means that 'one fits them all' approach didn't really fly well. We have too many usecases. > > [1] > https://www.x.org/wiki/Events/XDC2016/Program/Unix_Device_Memory_Allocation.pdf -- With best wishes Dmitry
Re: [RFC PATCH net-next v8 02/14] net: page_pool: create hooks for custom page providers
On 5/7/24 18:56, Jason Gunthorpe wrote: On Tue, May 07, 2024 at 06:25:52PM +0100, Pavel Begunkov wrote: On 5/7/24 17:48, Jason Gunthorpe wrote: On Tue, May 07, 2024 at 09:42:05AM -0700, Mina Almasry wrote: 1. Align with devmem TCP to use udmabuf for your io_uring memory. I think in the past you said it's a uapi you don't link but in the face of this pushback you may want to reconsider. dmabuf does not force a uapi, you can acquire your pages however you want and wrap them up in a dmabuf. No uapi at all. The point is that dmabuf already provides ops that do basically what is needed here. We don't need ops calling ops just because dmabuf's ops are not understsood or not perfect. Fixup dmabuf. Those ops, for example, are used to efficiently return used buffers back to the kernel, which is uapi, I don't see how dmabuf can be fixed up to cover it. Sure, but that doesn't mean you can't use dma buf for the other parts of the flow. The per-page lifetime is a different topic than the refcounting and access of the entire bulk of memory. Ok, so if we're leaving uapi (and ops) and keep per page/sub-buffer as is, the rest is resolving uptr -> pages, and passing it to page pool in a convenient to page pool format (net_iov). I don't see how dmabuf would help here. Adding dmabuf in the middle (internally wrapping pages) would add more setup code with the same final result, that is a format that page pool can work with. And for io_uring it's normal user memory. We'll have to use dmabuf when we'd want to extend to peer-to-peer and all that fun, but that's a small fraction of it, and we'll hopefully reuse some setup helpers from devmem tcp. -- Pavel Begunkov
Re: [Linaro-mm-sig] Re: [PATCH] epoll: try to be a _bit_ better about file lifetimes
On Tue, 7 May 2024 at 11:04, Daniel Vetter wrote: > > On Tue, May 07, 2024 at 09:46:31AM -0700, Linus Torvalds wrote: > > > I'd be perfectly ok with adding a generic "FISAME" VFS level ioctl > > too, if this is possibly a more common thing. and not just DRM wants > > it. > > > > Would something like that work for you? > > Yes. > > Adding Simon and Pekka as two of the usual suspects for this kind of > stuff. Also example code (the int return value is just so that callers know > when kcmp isn't available, they all only care about equality): > > https://gitlab.freedesktop.org/mesa/mesa/-/blob/main/src/util/os_file.c#L239 That example thing shows that we shouldn't make it a FISAME ioctl - we should make it a fcntl() instead, and it would just be a companion to F_DUPFD. Doesn't that strike everybody as a *much* cleaner interface? I think F_ISDUP would work very naturally indeed with F_DUPFD. Yes? No? Linus
Re: Safety of opening up /dev/dma_heap/* to physically present users (udev uaccess tag) ?
On Mon, May 06, 2024 at 03:38:24PM +0200, Daniel Vetter wrote: > On Mon, May 06, 2024 at 02:05:12PM +0200, Maxime Ripard wrote: > > On Mon, May 06, 2024 at 01:49:17PM GMT, Hans de Goede wrote: > > > Hi dma-buf maintainers, et.al., > > > > > > Various people have been working on making complex/MIPI cameras work OOTB > > > with mainline Linux kernels and an opensource userspace stack. > > > > > > The generic solution adds a software ISP (for Debayering and 3A) to > > > libcamera. Libcamera's API guarantees that buffers handed to applications > > > using it are dma-bufs so that these can be passed to e.g. a video encoder. > > > > > > In order to meet this API guarantee the libcamera software ISP allocates > > > dma-bufs from userspace through one of the /dev/dma_heap/* heaps. For > > > the Fedora COPR repo for the PoC of this: > > > https://hansdegoede.dreamwidth.org/28153.html > > > > For the record, we're also considering using them for ARM KMS devices, > > so it would be better if the solution wasn't only considering v4l2 > > devices. > > > > > I have added a simple udev rule to give physically present users access > > > to the dma_heap-s: > > > > > > KERNEL=="system", SUBSYSTEM=="dma_heap", TAG+="uaccess" > > > > > > (and on Rasperry Pi devices any users in the video group get access) > > > > > > This was just a quick fix for the PoC. Now that we are ready to move out > > > of the PoC phase and start actually integrating this into distributions > > > the question becomes if this is an acceptable solution; or if we need some > > > other way to deal with this ? > > > > > > Specifically the question is if this will have any negative security > > > implications? I can certainly see this being used to do some sort of > > > denial of service attack on the system (1). This is especially true for > > > the cma heap which generally speaking is a limited resource. > > > > There's plenty of other ways to exhaust CMA, like allocating too much > > KMS or v4l2 buffers. I'm not sure we should consider dma-heaps > > differently than those if it's part of our threat model. > > So generally for an arm soc where your display needs cma, your render node > doesn't. And user applications only have access to the later, while only > the compositor gets a kms fd through logind. At least in drm aside from > vc4 there's really no render driver that just gives you access to cma and > allows you to exhaust that, you need to be a compositor with drm master > access to the display. > > Which means we're mostly protected against bad applications, and that's > not a threat the "user physically sits in front of the machine accounts > for", and which giving cma access to everyone would open up. And with > flathub/snaps/... this is very much an issue. > > So you need more, either: > > - cgroups limits on dma-buf and dma-buf heaps. This has been bikeshedded > for years and is just not really moving. > > - An allocator service which checks whether you're allowed to allocate > these special buffers. Android does that through binder. I would split the latter into a centralized frame buffer allocator library for userspace (James Jones' Unix device memory allocator comes to mind), providing dma-buf instances using whatever backend is available and suitable (DMA heaps would likely be one of them), and a safe way to make this allocator available to applications. Exposing it through some system services could be useful, but with proper tracking and accounting of memory allocated through DMA heaps (and DRM, and V4L2) we could possibly do with just a library. What I really want is to move memory allocation for devices to a separate component, and make everything else a consumer of those buffers. > Probably also some way to nuke applications that refuse to release buffers > when they're no longer the right application. cgroups is a lot more > convenient for that. > > > > But devices tagged for uaccess are only opened up to users who are > > > physcially present behind the machine and those can just hit > > > the powerbutton, so I don't believe that any *on purpose* DOS is part of > > > the thread model. > > > > How would that work for headless devices? -- Regards, Laurent Pinchart
Re: Safety of opening up /dev/dma_heap/* to physically present users (udev uaccess tag) ?
On Tue, May 07, 2024 at 06:19:18PM +0300, Dmitry Baryshkov wrote: > On Tue, 7 May 2024 at 18:15, Bryan O'Donoghue wrote: > > On 07/05/2024 16:09, Dmitry Baryshkov wrote: > > > Ah, I see. Then why do you require the DMA-ble buffer at all? If you are > > > providing data to VPU or DRM, then you should be able to get the buffer > > > from the data-consuming device. > > > > Because we don't necessarily know what the consuming device is, if any. > > > > Could be VPU, could be Zoom/Hangouts via pipewire, could for argument > > sake be GPU or DSP. > > > > Also if we introduce a dependency on another device to allocate the > > output buffers - say always taking the output buffer from the GPU, then > > we've added another dependency which is more difficult to guarantee > > across different arches. > > Yes. And it should be expected. It's a consumer who knows the > restrictions on the buffer. As I wrote, Zoom/Hangouts should not > require a DMA buffer at all. Why not ? If you want to capture to a buffer that you then compose on the screen without copying data, dma-buf is the way to go. That's the Linux solution for buffer sharing. > Applications should be able to allocate > the buffer out of the generic memory. If applications really want to copy data and degrade performance, they are free to shoot themselves in the foot of course. Applications (or compositors) need to support copying as a fallback in the worst case, but all components should at least aim for the zero-copy case. > GPUs might also have different > requirements. Consider GPUs with VRAM. It might be beneficial to > allocate a buffer out of VRAM rather than generic DMA mem. Absolutely. For that we need a centralized device memory allocator in userspace. An effort was started by James Jones in 2016, see [1]. It has unfortunately stalled. If I didn't have a camera framework to develop, I would try to tackle that issue :-) [1] https://www.x.org/wiki/Events/XDC2016/Program/Unix_Device_Memory_Allocation.pdf -- Regards, Laurent Pinchart
Re: Safety of opening up /dev/dma_heap/* to physically present users (udev uaccess tag) ?
On Tue, May 07, 2024 at 07:36:59PM +0200, Daniel Vetter wrote: > On Tue, May 07, 2024 at 04:15:05PM +0100, Bryan O'Donoghue wrote: > > On 07/05/2024 16:09, Dmitry Baryshkov wrote: > > > Ah, I see. Then why do you require the DMA-ble buffer at all? If you are > > > providing data to VPU or DRM, then you should be able to get the buffer > > > from the data-consuming device. > > > > Because we don't necessarily know what the consuming device is, if any. > > Well ... that's an entirely different issue. And it's unsolved. > > Currently the approach is to allocate where the constraints are usually > most severe (like display, if you need that, or the camera module for > input) and then just pray the stack works out without too much copying. > All userspace (whether the generic glue or the userspace driver depends a > bit upon the exact api) does need to have a copy fallback for these > sharing cases, ideally with the copying accelerated by hw. > > If you try to solve this by just preemptive allocating everything as cma > buffers, then you'll make the situation substantially worse (because now > you're wasting tons of cma memory where you might not even need it). > And without really solving the problem, since for some gpus that memory > might be unusable (because you cannot scan that out on any discrete gpu, > and sometimes not even on an integrated one). I think we have a general agreement that the proposed solution is a stop-gap measure for an unsolved issue. Note that libcamera is already designed that way. The API is designed to import buffers, using dma-buf file handles. If an application has a way to allocate dma-buf instances through another means (from the display or from a video encoder for instance), it should do so, and use those buffers with libcamera. For applications that don't have an easy way to get hold of dma-buf instances, we have a buffer allocator helper as a side component. That allocator uses the underlying camera capture device, and allocates buffers from the V4L2 video device. It's only on platforms where we have no hardware camera processing (or, rather, platforms where the hardware vendors doesn't give us access to the camera hardware, such as recent Intel SoCs, or Qualcomm SoCs used in ARM laptops) that we need to allocate memory elsewhere. In the long run, I want a centralized memory allocator accessible by userspace applications (something similar in purpose to gralloc on Android), and I want to get rid of buffer allocation in libcamera (and even in V4L2, in the even longer term). That's the long run. Shorter term, we have a problem to solve, and the best option we have found so far is to rely on dma-buf heaps as a backend for the frame buffer allocatro helper in libcamera for the use case described above. This won't work in 100% of the cases, clearly. It's a stop-gap measure until we can do better. > > Could be VPU, could be Zoom/Hangouts via pipewire, could for argument sake > > be GPU or DSP. > > > > Also if we introduce a dependency on another device to allocate the output > > buffers - say always taking the output buffer from the GPU, then we've added > > another dependency which is more difficult to guarantee across different > > arches. -- Regards, Laurent Pinchart
NXP i.MX8MM GPU performances
Hello all, I did run some benchmark on i.MX8MM GPU and I have some concerns on the differences between mainline Linux/etnaviv/Mesa and the proprietary NXP/Vivante solution. The tests were executed comparing glmark2 results between a mainline kernel (6.9.0-rc6) running Mesa 24.0.3 and NXP provided galcore driver 6.4.3.p4.398061 running with a 5.15 based NXP downstream kernel. The GPU is running in overdrive mode (see [1]). mainline infos (etnaviv): > dmesg | grep -i -E '(gpu|etnaviv)' [9.113389] etnaviv-gpu 3800.gpu: model: GC600, revision: 4653 [9.120939] etnaviv-gpu 3800.gpu: Need to move linear window on MC1.0, disabling TS [9.129238] etnaviv-gpu 38008000.gpu: model: GC520, revision: 5341 [9.138463] [drm] Initialized etnaviv 1.4.0 20151214 for etnaviv on minor 1 glmark2-es2-wayland info output: === glmark2 2023.01 === OpenGL Information GL_VENDOR: Mesa GL_RENDERER:Vivante GC600 rev 4653 GL_VERSION: OpenGL ES 2.0 Mesa 24.0.3 Surface Config: buf=32 r=8 g=8 b=8 a=8 depth=24 stencil=0 samples=0 Surface Size: 640x480 windowed === galcore infos (vivante): > dmesg | grep -i -E '(gpu|vivante|gal)' [4.524977] Galcore version 6.4.3.p4.398061 [4.587654] [drm] Initialized vivante 1.0.0 20170808 for 3800.gpu on minor 0 glmark2-es2-wayland info output: === glmark2 2023.01 === OpenGL Information GL_VENDOR: Vivante Corporation GL_RENDERER:Vivante GC7000NanoUltra GL_VERSION: OpenGL ES 2.0 V6.4.3.p4.398061 Surface Config: buf=32 r=8 g=8 b=8 a=8 depth=24 stencil=0 samples=0 Surface Size: 640x480 windowed === In screen (weston + DSI) test results: glmark2 command: > glmark2-es2-wayland -b shading:duration=5.0 -b refract -b build -b texture -b > shadow -b bump -s 640x480 2>&1 | | glmark2 tests | | sw ver |shading|build|texture|refract|shadow|bump| |-|---|-|---|---|--|| | etnaviv | 263 | 334 | 291 | 22| 63 | 328| | vivante | 544 | 956 | 790 | 26| 225 | 894| we have 50-60% smaller number with etnaviv. Offscreen test results: glmark2 command: > glmark2-es2-wayland --off-screen -b shading:duration=5.0 -b refract -b build > -b texture -b shadow -b bump -s 640x480 2>&1 | | glmark2 tests | | sw ver |shading|build|texture|refract|shadow|bump| |-|---|-|---|---|--|| | etnaviv | 348 | 541 | 466 | 24| 81 | 498| | vivante | 402 | 624 | 520 | 26| 177 | 557| we have a 10~13% smaller number with etnaviv. Do anybody did run similar benchmark in the past on NXP i.MX8MM? With what results? Is it expected such a difference in the glmark2 tests results? Any idea on this big difference between running the test offscreen or not? João Paulo [1] https://lore.kernel.org/all/20240507143555.471025-1-jpaulo.silvagoncal...@gmail.com/
Re: [PATCH v2 01/12] drm/amdgpu, drm/radeon: Make I2C terminology more inclusive
On 5/3/2024 11:13 AM, Easwar Hariharan wrote: > I2C v7, SMBus 3.2, and I3C 1.1.1 specifications have replaced "master/slave" > with more appropriate terms. Inspired by and following on to Wolfram's > series to fix drivers/i2c/[1], fix the terminology for users of > I2C_ALGOBIT bitbanging interface, now that the approved verbiage exists > in the specification. > > Compile tested, no functionality changes intended > > [1]: > https://lore.kernel.org/all/20240322132619.6389-1-wsa+rene...@sang-engineering.com/ > > Signed-off-by: Easwar Hariharan > --- > .../gpu/drm/amd/amdgpu/amdgpu_atomfirmware.c | 8 +++--- > drivers/gpu/drm/amd/amdgpu/amdgpu_i2c.c | 10 +++ > drivers/gpu/drm/amd/amdgpu/atombios_i2c.c | 8 +++--- > drivers/gpu/drm/amd/amdgpu/atombios_i2c.h | 2 +- > drivers/gpu/drm/amd/amdgpu/smu_v11_0_i2c.c| 20 ++--- > .../gpu/drm/amd/display/dc/bios/bios_parser.c | 2 +- > .../drm/amd/display/dc/bios/bios_parser2.c| 2 +- > .../drm/amd/display/dc/core/dc_link_exports.c | 4 +-- > drivers/gpu/drm/amd/display/dc/dc.h | 2 +- > drivers/gpu/drm/amd/display/dc/dce/dce_i2c.c | 4 +-- > .../display/include/grph_object_ctrl_defs.h | 2 +- > drivers/gpu/drm/amd/include/atombios.h| 2 +- > drivers/gpu/drm/amd/include/atomfirmware.h| 26 - > .../powerplay/hwmgr/vega20_processpptables.c | 4 +-- > .../amd/pm/powerplay/inc/smu11_driver_if.h| 2 +- > .../inc/pmfw_if/smu11_driver_if_arcturus.h| 2 +- > .../inc/pmfw_if/smu11_driver_if_navi10.h | 2 +- > .../pmfw_if/smu11_driver_if_sienna_cichlid.h | 2 +- > .../inc/pmfw_if/smu13_driver_if_aldebaran.h | 2 +- > .../inc/pmfw_if/smu13_driver_if_v13_0_0.h | 2 +- > .../inc/pmfw_if/smu13_driver_if_v13_0_7.h | 2 +- > .../gpu/drm/amd/pm/swsmu/smu11/arcturus_ppt.c | 4 +-- > .../amd/pm/swsmu/smu11/sienna_cichlid_ppt.c | 8 +++--- > drivers/gpu/drm/radeon/atombios.h | 16 +-- > drivers/gpu/drm/radeon/atombios_i2c.c | 4 +-- > drivers/gpu/drm/radeon/radeon_combios.c | 28 +-- > drivers/gpu/drm/radeon/radeon_i2c.c | 10 +++ > drivers/gpu/drm/radeon/radeon_mode.h | 6 ++-- > 28 files changed, 93 insertions(+), 93 deletions(-) > Hello Christian, Daniel, David, others, Could you re-review v2 since the feedback provided in v0 [1] has now been addressed? I can send v3 with all other feedback and signoffs from the other maintainers incorporated when I have something for amdgpu and radeon. Thanks, Easwar [1] https://lore.kernel.org/all/53f3afba-4759-4ea1-b408-8a929b262...@amd.com/
Re: [Linaro-mm-sig] Re: [PATCH] epoll: try to be a _bit_ better about file lifetimes
On Tue, May 07, 2024 at 09:46:31AM -0700, Linus Torvalds wrote: > On Tue, 7 May 2024 at 04:03, Daniel Vetter wrote: > > > > It's really annoying that on some distros/builds we don't have that, and > > for gpu driver stack reasons we _really_ need to know whether a fd is the > > same as another, due to some messy uniqueness requirements on buffer > > objects various drivers have. > > It's sad that such a simple thing would require two other horrid > models (EPOLL or KCMP). > > There'[s a reason that KCMP is a config option - *some* of that is > horrible code - but the "compare file descriptors for equality" is not > that reason. > > Note that KCMP really is a broken mess. It's also a potential security > hole, even for the simple things, because of how it ends up comparing > kernel pointers (ie it doesn't just say "same file descriptor", it > gives an ordering of them, so you can use KCMP to sort things in > kernel space). > > And yes, it orders them after obfuscating the pointer, but it's still > not something I would consider sane as a baseline interface. It was > designed for checkpoint-restore, it's the wrong thing to use for some > "are these file descriptors the same". > > The same argument goes for using EPOLL for that. Disgusting hack. > > Just what are the requirements for the GPU stack? Is one of the file > descriptors "trusted", IOW, you know what kind it is? > > Because dammit, it's *so* easy to do. You could just add a core DRM > ioctl for it. Literally just > > struct fd f1 = fdget(fd1); > struct fd f2 = fdget(fd2); > int same; > > same = f1.file && f1.file == f2.file; > fdput(fd1); > fdput(fd2); > return same; > > where the only question is if you also woudl want to deal with O_PATH > fd's, in which case the "fdget()" would be "fdget_raw()". > > Honestly, adding some DRM ioctl for this sounds hacky, but it sounds > less hacky than relying on EPOLL or KCMP. Well, in slightly more code (because it's part of the "import this dma-buf/dma-fence/whatever fd into a driver object" ioctl) this is what we do. The issue is that there's generic userspace (like compositors) that sees these things fly by and would also like to know whether the other side they receive them from is doing nasty stuff/buggy/evil. And they don't have access to the device drm fd (since those are a handful of layers away behind the opengl/vulkan userspace drivers even if the compositor could get at them, and in some cases not even that). So if we do this in drm we'd essentially have to create a special drm_compare_files chardev, put the ioctl there and then tell everyone to make that thing world-accessible. Which is just too close to a real syscall that it's offensive, and hey kcmp does what we want already (but unfortunately also way more). So we rejected adding that to drm. But we did think about it. > I'd be perfectly ok with adding a generic "FISAME" VFS level ioctl > too, if this is possibly a more common thing. and not just DRM wants > it. > > Would something like that work for you? Yes. Adding Simon and Pekka as two of the usual suspects for this kind of stuff. Also example code (the int return value is just so that callers know when kcmp isn't available, they all only care about equality): https://gitlab.freedesktop.org/mesa/mesa/-/blob/main/src/util/os_file.c#L239 -Sima -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch
[PATCH] drm/drm-bridge.c: Drop conditionals around of_node pointers
Having conditional around the of_node pointer of the drm_bridge structure is not necessary, since drm_bridge structure always has the of_node as its member. Let's drop the conditional to get a better looks, please also note that this is following the already accepted commitments. see commit d8dfccde2709 ("drm/bridge: Drop conditionals around of_node pointers") for reference. Signed-off-by: Sui Jingfeng --- drivers/gpu/drm/drm_bridge.c | 5 - 1 file changed, 5 deletions(-) diff --git a/drivers/gpu/drm/drm_bridge.c b/drivers/gpu/drm/drm_bridge.c index 30d66bee0ec6..a6dbe1751e88 100644 --- a/drivers/gpu/drm/drm_bridge.c +++ b/drivers/gpu/drm/drm_bridge.c @@ -352,13 +352,8 @@ int drm_bridge_attach(struct drm_encoder *encoder, struct drm_bridge *bridge, bridge->encoder = NULL; list_del(>chain_node); -#ifdef CONFIG_OF DRM_ERROR("failed to attach bridge %pOF to encoder %s: %d\n", bridge->of_node, encoder->name, ret); -#else - DRM_ERROR("failed to attach bridge to encoder %s: %d\n", - encoder->name, ret); -#endif return ret; } -- 2.34.1
Re: [PATCH] dmabuf: fix dmabuf file poll uaf issue
On Tue, May 7, 2024 at 7:04 AM Christian König wrote: > > Am 07.05.24 um 15:39 schrieb Daniel Vetter: > > On Tue, May 07, 2024 at 12:10:07PM +0200, Christian König wrote: > >> Am 06.05.24 um 21:04 schrieb T.J. Mercier: > >>> On Mon, May 6, 2024 at 2:30 AM Charan Teja Kalla > >>> wrote: > Hi TJ, > > Seems I have got answers from [1], where it is agreed upon epoll() is > the source of issue. > > Thanks a lot for the discussion. > > [1] https://lore.kernel.org/lkml/2d631f0615918...@google.com/ > > Thanks > Charan > >>> Oh man, quite a set of threads on this over the weekend. Thanks for the > >>> link. > >> Yeah and it also has some interesting side conclusion: We should probably > >> tell people to stop using DMA-buf with epoll. > >> > >> The background is that the mutex approach epoll uses to make files > >> disappear > >> from the interest list on close results in the fact that each file can only > >> be part of a single epoll at a time. > >> > >> Now since DMA-buf is build around the idea that we share the buffer > >> representation as file between processes it means that only one process at > >> a > >> time can use epoll with each DMA-buf. > >> > >> So for example if a window manager uses epoll everything is fine. If a > >> client is using epoll everything is fine as well. But if *both* use epoll > >> at > >> the same time it won't work. > >> > >> This can lead to rather funny and hard to debug combinations of failures > >> and > >> I think we need to document this limitation and explicitly point it out. > > Ok, I tested this with a small C program, and you're mixing things up. > > Here's what I got > > > > - You cannot add a file twice to the same epoll file/fd. So that part is > >correct, and also my understanding from reading the kernel code. > > > > - You can add the same file to two different epoll file instaces. Which > >means it's totally fine to use epoll on a dma_buf in different processes > >like both in the compositor and in clients. > > Ah! Than I misunderstood that comment in the discussion. Thanks for > clarifying that. > > > > > - Substantially more entertaining, you can nest epoll instances, and e.g. > >add a 2nd epoll file as an event to the first one. That way you can add > >the same file to both epoll fds, and so end up with the same file > >essentially being added twice to the top-level epoll file. So even > >within one application there's no real issue when e.g. different > >userspace drivers all want to use epoll on the same fd, because you can > >just throw in another level of epoll and it's fine again and you won't > >get an EEXISTS on EPOLL_CTL_ADD. > > > >But I also don't think we have this issue right now anywhere, since it's > >kinda a general epoll issue that happens with any duplicated file. > > I actually have been telling people to (ab)use the epoll behavior to > check if two file descriptors point to the same underlying file when > KCMP isn't available. > > Some environments (Android?) disable KCMP because they see it as > security problem. > Didn't know about this so I checked. Our kernel has CONFIG_KCMP, but seccomp does look like it's blocking kcmp for apps. https://cs.android.com/android/platform/superproject/main/+/main:bionic/libc/SYSCALLS.TXT
Re: [RFC PATCH net-next v8 02/14] net: page_pool: create hooks for custom page providers
On Tue, May 07, 2024 at 06:25:52PM +0100, Pavel Begunkov wrote: > On 5/7/24 17:48, Jason Gunthorpe wrote: > > On Tue, May 07, 2024 at 09:42:05AM -0700, Mina Almasry wrote: > > > > > 1. Align with devmem TCP to use udmabuf for your io_uring memory. I > > > think in the past you said it's a uapi you don't link but in the face > > > of this pushback you may want to reconsider. > > > > dmabuf does not force a uapi, you can acquire your pages however you > > want and wrap them up in a dmabuf. No uapi at all. > > > > The point is that dmabuf already provides ops that do basically what > > is needed here. We don't need ops calling ops just because dmabuf's > > ops are not understsood or not perfect. Fixup dmabuf. > > Those ops, for example, are used to efficiently return used buffers > back to the kernel, which is uapi, I don't see how dmabuf can be > fixed up to cover it. Sure, but that doesn't mean you can't use dma buf for the other parts of the flow. The per-page lifetime is a different topic than the refcounting and access of the entire bulk of memory. Jason
Re: [Linaro-mm-sig] Re: [PATCH] epoll: try to be a _bit_ better about file lifetimes
Am 07.05.24 um 18:46 schrieb Linus Torvalds: On Tue, 7 May 2024 at 04:03, Daniel Vetter wrote: It's really annoying that on some distros/builds we don't have that, and for gpu driver stack reasons we _really_ need to know whether a fd is the same as another, due to some messy uniqueness requirements on buffer objects various drivers have. It's sad that such a simple thing would require two other horrid models (EPOLL or KCMP). There'[s a reason that KCMP is a config option - *some* of that is horrible code - but the "compare file descriptors for equality" is not that reason. Note that KCMP really is a broken mess. It's also a potential security hole, even for the simple things, because of how it ends up comparing kernel pointers (ie it doesn't just say "same file descriptor", it gives an ordering of them, so you can use KCMP to sort things in kernel space). And yes, it orders them after obfuscating the pointer, but it's still not something I would consider sane as a baseline interface. It was designed for checkpoint-restore, it's the wrong thing to use for some "are these file descriptors the same". The same argument goes for using EPOLL for that. Disgusting hack. Just what are the requirements for the GPU stack? Is one of the file descriptors "trusted", IOW, you know what kind it is? Because dammit, it's *so* easy to do. You could just add a core DRM ioctl for it. Literally just struct fd f1 = fdget(fd1); struct fd f2 = fdget(fd2); int same; same = f1.file && f1.file == f2.file; fdput(fd1); fdput(fd2); return same; where the only question is if you also woudl want to deal with O_PATH fd's, in which case the "fdget()" would be "fdget_raw()". Honestly, adding some DRM ioctl for this sounds hacky, but it sounds less hacky than relying on EPOLL or KCMP. I'd be perfectly ok with adding a generic "FISAME" VFS level ioctl too, if this is possibly a more common thing. and not just DRM wants it. Would something like that work for you? Well the generic approach yes, the DRM specific one maybe. IIRC we need to be able to compare both DRM as well as DMA-buf file descriptors. The basic problem userspace tries to solve is that drivers might get the same fd through two different code paths. For example application using OpenGL/Vulkan for rendering and VA-API for video decoding/encoding at the same time. Both APIs get a fd which identifies the device to use. It can be the same, but it doesn't have to. If it's the same device driver connection (or in kernel speak underlying struct file) then you can optimize away importing and exporting of buffers for example. Additional to that it makes cgroup accounting much easier because you don't count things twice because they are shared etc... Regards, Christian. Linus
Re: Safety of opening up /dev/dma_heap/* to physically present users (udev uaccess tag) ?
On Tue, May 07, 2024 at 04:15:05PM +0100, Bryan O'Donoghue wrote: > On 07/05/2024 16:09, Dmitry Baryshkov wrote: > > Ah, I see. Then why do you require the DMA-ble buffer at all? If you are > > providing data to VPU or DRM, then you should be able to get the buffer > > from the data-consuming device. > > Because we don't necessarily know what the consuming device is, if any. Well ... that's an entirely different issue. And it's unsolved. Currently the approach is to allocate where the constraints are usually most severe (like display, if you need that, or the camera module for input) and then just pray the stack works out without too much copying. All userspace (whether the generic glue or the userspace driver depends a bit upon the exact api) does need to have a copy fallback for these sharing cases, ideally with the copying accelerated by hw. If you try to solve this by just preemptive allocating everything as cma buffers, then you'll make the situation substantially worse (because now you're wasting tons of cma memory where you might not even need it). And without really solving the problem, since for some gpus that memory might be unusable (because you cannot scan that out on any discrete gpu, and sometimes not even on an integrated one). -Sima > Could be VPU, could be Zoom/Hangouts via pipewire, could for argument sake > be GPU or DSP. > > Also if we introduce a dependency on another device to allocate the output > buffers - say always taking the output buffer from the GPU, then we've added > another dependency which is more difficult to guarantee across different > arches. > > --- > bod -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch
Re: [RFC PATCH net-next v8 02/14] net: page_pool: create hooks for custom page providers
On 5/7/24 18:15, Mina Almasry wrote: On Tue, May 7, 2024 at 9:55 AM Pavel Begunkov wrote: On 5/7/24 17:23, Christoph Hellwig wrote: On Tue, May 07, 2024 at 01:18:57PM -0300, Jason Gunthorpe wrote: On Tue, May 07, 2024 at 05:05:12PM +0100, Pavel Begunkov wrote: even in tree if you give them enough rope, and they should not have that rope when the only sensible options are page/folio based kernel memory (incuding large/huge folios) and dmabuf. I believe there is at least one deep confusion here, considering you previously mentioned Keith's pre-mapping patches. The "hooks" are not that about in what format you pass memory, it's arguably the least interesting part for page pool, more or less it'd circulate whatever is given. It's more of how to have a better control over buffer lifetime and implement a buffer pool passing data to users and empty buffers back. Isn't that more or less exactly what dmabuf is? Why do you need another almost dma-buf thing for another project? That's the exact point I've been making since the last round of the series. We don't need to reinvent dmabuf poorly in every subsystem, but instead fix the odd parts in it and make it suitable for everyone. Someone would need to elaborate how dma-buf is like that addition to page pool infra. I think I understand what Jason is requesting here, and I'll take a shot at elaborating. AFAICT what he's saying is technically feasible and addresses the nack while giving you the uapi you want. It just requires a bit (a lot?) of work on your end unfortunately. CONFIG_UDMABUF takes in a memfd, converts it to a dmabuf, and returns it to userspace. See udmabuf_create(). I think what Jason is saying here, is that you can write similar code to udmabuf_creat() that takes in a io_uring memory region, and converts it to a dmabuf inside the kernel. I haven't looked at your series yet too closely (sorry!), but I assume you currently have a netlink API that binds an io_uring memory region to the NIC rx-queue page_pool, right? That netlink API would need to be changed to: No, it's different, I'll skip details, but the main problem is that those callbacks are used to implement the user api returning buffers via a ring, where the callback grabs them (in napi context) and feeds into page pool. That replaces SO_DEVMEM_DONTNEED and the need for ioctl/setsockopt. 1. Take in the io_uring memory. 2. Convert it to a dmabuf like udmabuf_create() does. 3. Bind the resulting dmabuf to the rx-queue page_pool. There would be more changes needed vis-a-vis the clean up path and lifetime management, but I think this is the general idea. This would give you the uapi you want, while the page_pool never seen non-dmabuf memory (addresses the nack, I think). -- Pavel Begunkov
Re: [RFC PATCH net-next v8 02/14] net: page_pool: create hooks for custom page providers
On 5/7/24 17:48, Jason Gunthorpe wrote: On Tue, May 07, 2024 at 09:42:05AM -0700, Mina Almasry wrote: 1. Align with devmem TCP to use udmabuf for your io_uring memory. I think in the past you said it's a uapi you don't link but in the face of this pushback you may want to reconsider. dmabuf does not force a uapi, you can acquire your pages however you want and wrap them up in a dmabuf. No uapi at all. The point is that dmabuf already provides ops that do basically what is needed here. We don't need ops calling ops just because dmabuf's ops are not understsood or not perfect. Fixup dmabuf. Those ops, for example, are used to efficiently return used buffers back to the kernel, which is uapi, I don't see how dmabuf can be fixed up to cover it. If io_uring wants to take its existing memory pre-registration it can wrap that in a dmbauf, and somehow pass it to the netstack. Userspace doesn't need to know a dmabuf is being used in the background. io_uring's pre-registered memory is just pages, but even that is going to be replaced with just a normal user buffer pointer. Regardless, io_uring can wrap pages into a dmabuf, but it's not a direct replacement for the ops, it'd mandate uapi change in a not desirable way. -- Pavel Begunkov
Re: [RFC PATCH net-next v8 02/14] net: page_pool: create hooks for custom page providers
On Tue, May 07, 2024 at 01:48:38PM -0300, Jason Gunthorpe wrote: > On Tue, May 07, 2024 at 09:42:05AM -0700, Mina Almasry wrote: > > > 1. Align with devmem TCP to use udmabuf for your io_uring memory. I > > think in the past you said it's a uapi you don't link but in the face > > of this pushback you may want to reconsider. > > dmabuf does not force a uapi, you can acquire your pages however you > want and wrap them up in a dmabuf. No uapi at all. > > The point is that dmabuf already provides ops that do basically what > is needed here. We don't need ops calling ops just because dmabuf's > ops are not understsood or not perfect. Fixup dmabuf. > > If io_uring wants to take its existing memory pre-registration it can > wrap that in a dmbauf, and somehow pass it to the netstack. Userspace > doesn't need to know a dmabuf is being used in the background. So roughly the current dma-buf design considerations for the users of the dma-api interfaces: - It's a memory buffer of fixed length. - Either that memory is permanently nailed into place with dma_buf_pin (and if we add more users than just drm display then we should probably figure out the mlock account question for these). For locking hierarchy dma_buf_pin uses dma_resv_lock which nests within mmap_sem/vma locks but outside of any reclaim/alloc contexts. - Or the memory is more dynamic, in which case case you need to be able to dma_resv_lock when you need the memory and make a promise (as a dma_fence) that you'll release the memory within finite time and without any further allocations once you've unlocked the dma_buf (because dma_fence is in GFP_NORECLAIM). That promise can be waiting for memory access to finish, but it can also be a pte invalidate+tlb flush, or some kind of preemption, or whatever your hw can do really. Also, if you do this dynamic model and need to atomically reserve more than one dma_buf, you get to do the wait/wound mutex dance, but that's really just a bunch of funny looking error handling code and not really impacting the overall design or locking hierarchy. Everything else we can adjust, but I think the above three are not really changeable or dma-buf becomes unuseable for gpu drivers. Note that exporters of dma-buf can pretty much do whatever they feel like, including rejecting all the generic interfaces/ops, because we also use dma-buf as userspace handles for some really special memory. -Sima -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch
Re: [RFC PATCH net-next v8 02/14] net: page_pool: create hooks for custom page providers
On 5/7/24 17:42, Mina Almasry wrote: On Tue, May 7, 2024 at 9:24 AM Christoph Hellwig wrote: On Tue, May 07, 2024 at 01:18:57PM -0300, Jason Gunthorpe wrote: On Tue, May 07, 2024 at 05:05:12PM +0100, Pavel Begunkov wrote: even in tree if you give them enough rope, and they should not have that rope when the only sensible options are page/folio based kernel memory (incuding large/huge folios) and dmabuf. I believe there is at least one deep confusion here, considering you previously mentioned Keith's pre-mapping patches. The "hooks" are not that about in what format you pass memory, it's arguably the least interesting part for page pool, more or less it'd circulate whatever is given. It's more of how to have a better control over buffer lifetime and implement a buffer pool passing data to users and empty buffers back. Isn't that more or less exactly what dmabuf is? Why do you need another almost dma-buf thing for another project? That's the exact point I've been making since the last round of the series. We don't need to reinvent dmabuf poorly in every subsystem, but instead fix the odd parts in it and make it suitable for everyone. FWIW the change Christoph is requesting is straight forward from my POV and doesn't really hurt the devmem use case. I'd basically remove the ops and add an if statement in the slow path where the ops are being used to alloc/free from dmabuf instead of alloc_pages(). Something like (very rough, doesn't compile): diff --git a/net/core/page_pool.c b/net/core/page_pool.c index 92be1aaf18ccc..2cc986455bce6 100644 --- a/net/core/page_pool.c +++ b/net/core/page_pool.c @@ -557,8 +557,8 @@ netmem_ref page_pool_alloc_netmem(struct page_pool *pool, gfp_t gfp) return netmem; /* Slow-path: cache empty, do real allocation */ - if (static_branch_unlikely(_pool_mem_providers) && pool->mp_ops) - netmem = pool->mp_ops->alloc_pages(pool, gfp); + if (page_pool_is_dmabuf(pool)) + netmem = mp_dmabuf_devmem_alloc_pages(): else netmem = __page_pool_alloc_pages_slow(pool, gfp); return netmem; The folks that will be negatively impacted by this are Jakub/Pavel/David. I think all were planning to extend the hooks for io_uring or other memory types. Pavel/David, AFAICT you have these options here (but maybe you can think of more): 1. Align with devmem TCP to use udmabuf for your io_uring memory. I think in the past you said it's a uapi you don't link but in the face of this pushback you may want to reconsider. If the argument would be that we have to switch to a less efficient and less consistent api for io_uring (fast path handling used buffers back to kernel) just because it has to has dmabuf and without direct relation to dmabuf, then no, it's not the way anything can be sanely developed. 2. Follow the example of devmem TCP and add another if statement to alloc from io_uring, so something like: diff --git a/net/core/page_pool.c b/net/core/page_pool.c index 92be1aaf18ccc..3545bb82c7d05 100644 --- a/net/core/page_pool.c +++ b/net/core/page_pool.c @@ -557,8 +557,10 @@ netmem_ref page_pool_alloc_netmem(struct page_pool *pool, gfp_t gfp) return netmem; /* Slow-path: cache empty, do real allocation */ - if (static_branch_unlikely(_pool_mem_providers) && pool->mp_ops) - netmem = pool->mp_ops->alloc_pages(pool, gfp); + if (page_pool_is_dmabuf(pool)) + netmem = mp_dmabuf_devmem_alloc_pages(): + else if (page_pool_is_io_uring(pool)) + netmem = mp_io_uring_alloc_pages(): else netmem = __page_pool_alloc_pages_slow(pool, gfp); I don't see why we'd do that instead instead of having a well made function table, which is equivalent. return netmem; Note that Christoph/Jason may not like you adding non-dmabuf io_uring backing memory in the first place, so there may be pushback against this approach. Christoph mentioned pages, we're using pages, I don't think it's too fancy. I don't believe that's it, which would be equivalent to "let's remove user pointers from the kernel and mandate passing dmabuf only". 3. Pushback on the nack on this thread. It seems you're already discussing this. I'll see what happens. To be honest the GVE queue-API has just been merged I think, so I'm now unblocked on sending non-RFCs of this work and I'm hoping to send the next version soon. I may apply these changes on the next version for more discussion or leave as is and carry the nack until the conversation converges. -- Pavel Begunkov
Re: [RFC PATCH net-next v8 02/14] net: page_pool: create hooks for custom page providers
On Tue, May 7, 2024 at 9:55 AM Pavel Begunkov wrote: > > On 5/7/24 17:23, Christoph Hellwig wrote: > > On Tue, May 07, 2024 at 01:18:57PM -0300, Jason Gunthorpe wrote: > >> On Tue, May 07, 2024 at 05:05:12PM +0100, Pavel Begunkov wrote: > even in tree if you give them enough rope, and they should not have > that rope when the only sensible options are page/folio based kernel > memory (incuding large/huge folios) and dmabuf. > >>> > >>> I believe there is at least one deep confusion here, considering you > >>> previously mentioned Keith's pre-mapping patches. The "hooks" are not > >>> that about in what format you pass memory, it's arguably the least > >>> interesting part for page pool, more or less it'd circulate whatever > >>> is given. It's more of how to have a better control over buffer lifetime > >>> and implement a buffer pool passing data to users and empty buffers > >>> back. > >> > >> Isn't that more or less exactly what dmabuf is? Why do you need > >> another almost dma-buf thing for another project? > > > > That's the exact point I've been making since the last round of > > the series. We don't need to reinvent dmabuf poorly in every > > subsystem, but instead fix the odd parts in it and make it suitable > > for everyone. > > Someone would need to elaborate how dma-buf is like that addition > to page pool infra. I think I understand what Jason is requesting here, and I'll take a shot at elaborating. AFAICT what he's saying is technically feasible and addresses the nack while giving you the uapi you want. It just requires a bit (a lot?) of work on your end unfortunately. CONFIG_UDMABUF takes in a memfd, converts it to a dmabuf, and returns it to userspace. See udmabuf_create(). I think what Jason is saying here, is that you can write similar code to udmabuf_creat() that takes in a io_uring memory region, and converts it to a dmabuf inside the kernel. I haven't looked at your series yet too closely (sorry!), but I assume you currently have a netlink API that binds an io_uring memory region to the NIC rx-queue page_pool, right? That netlink API would need to be changed to: 1. Take in the io_uring memory. 2. Convert it to a dmabuf like udmabuf_create() does. 3. Bind the resulting dmabuf to the rx-queue page_pool. There would be more changes needed vis-a-vis the clean up path and lifetime management, but I think this is the general idea. This would give you the uapi you want, while the page_pool never seen non-dmabuf memory (addresses the nack, I think). -- Thanks, Mina
Re: [PATCH] Revert "drm/i915: Remove extra multi-gt pm-references"
On Tue, May 07, 2024 at 10:54:11AM +0200, Janusz Krzysztofik wrote: > On Tuesday, 7 May 2024 09:30:15 GMT+2 Nirmoy Das wrote: > > Hi Janusz, > > > > > > Just realized we need Fixes tag for this. > > > > Fixes: 1f33dc0c1189 ("drm/i915: Remove extra multi-gt pm-references") > > Whoever is going to push this patch, please feel free to add this tag. dim b4-shazam gets that automagically, now it was sent in reply ;) I just pushed the patch. thanks for the patch and reviews. > > Thanks, > Janusz > > > > > > > Regards, > > > > Nirmoy > > > > On 5/6/2024 8:02 PM, Janusz Krzysztofik wrote: > > > This reverts commit 1f33dc0c1189efb9ae19c6fc22b64dd3e26261fb. > > > > > > There was a patch supposed to fix an issue of illegal attempts to free a > > > still active i915 VMA object when parking a GT believed to be idle, > > > reported by CI on 2-GT Meteor Lake. As a solution, an extra wakeref for > > > a Primary GT was acquired from i915_gem_do_execbuffer() -- see commit > > > f56fe3e91787 ("drm/i915: Fix a VMA UAF for multi-gt platform"). > > > > > > However, that fix occurred insufficient -- the issue was still reported by > > > CI. That wakeref was released on exit from i915_gem_do_execbuffer(), then > > > potentially before completion of the request and deactivation of its > > > associated VMAs. Moreover, CI reports indicated that single-GT platforms > > > also suffered sporadically from the same race. > > > > > > Since that issue was fixed by another commit f3c71b2ded5c ("drm/i915/vma: > > > Fix UAF on destroy against retire race"), the changes introduced by that > > > insufficient fix were dropped as no longer useful. However, that series > > > resulted in another VMA UAF scenario now being triggered in CI. > > > > > > <4> [260.290809] [ cut here ] > > > <4> [260.290988] list_del corruption. prev->next should be > > > 888118c5d990, but was 888118c5a510. (prev=888118c5a510) > > > <4> [260.291004] WARNING: CPU: 2 PID: 1143 at lib/list_debug.c:62 > > > __list_del_entry_valid_or_report+0xb7/0xe0 > > > .. > > > <4> [260.291055] CPU: 2 PID: 1143 Comm: kms_plane Not tainted > > > 6.9.0-rc2-CI_DRM_14524-ga25d180c6853+ #1 > > > <4> [260.291058] Hardware name: Intel Corporation Meteor Lake Client > > > Platform/MTL-P LP5x T3 RVP, BIOS MTLPFWI1.R00.3471.D91.2401310918 > > > 01/31/2024 > > > <4> [260.291060] RIP: 0010:__list_del_entry_valid_or_report+0xb7/0xe0 > > > ... > > > <4> [260.291087] Call Trace: > > > <4> [260.291089] > > > <4> [260.291124] i915_vma_reopen+0x43/0x80 [i915] > > > <4> [260.291298] eb_lookup_vmas+0x9cb/0xcc0 [i915] > > > <4> [260.291579] i915_gem_do_execbuffer+0xc9a/0x26d0 [i915] > > > <4> [260.291883] i915_gem_execbuffer2_ioctl+0x123/0x2a0 [i915] > > > ... > > > <4> [260.292301] > > > ... > > > <4> [260.292506] ---[ end trace ]--- > > > <4> [260.292782] general protection fault, probably for non-canonical > > > address 0x6b6b6b6b6b6b6ca3: [#1] PREEMPT SMP NOPTI > > > <4> [260.303575] CPU: 2 PID: 1143 Comm: kms_plane Tainted: GW > > > 6.9.0-rc2-CI_DRM_14524-ga25d180c6853+ #1 > > > <4> [260.313851] Hardware name: Intel Corporation Meteor Lake Client > > > Platform/MTL-P LP5x T3 RVP, BIOS MTLPFWI1.R00.3471.D91.2401310918 > > > 01/31/2024 > > > <4> [260.326359] RIP: 0010:eb_validate_vmas+0x114/0xd80 [i915] > > > ... > > > <4> [260.428756] Call Trace: > > > <4> [260.431192] > > > <4> [639.283393] i915_gem_do_execbuffer+0xd05/0x26d0 [i915] > > > <4> [639.305245] i915_gem_execbuffer2_ioctl+0x123/0x2a0 [i915] > > > ... > > > <4> [639.411134] > > > ... > > > <4> [639.449979] ---[ end trace ]--- > > > > > > We defer actually closing, unbinding and destroying a VMA until next idle > > > point, or until the object is freed in the meantime. By postponing the > > > unbind, we allow for the VMA to be reopened by the client, avoiding the > > > work required to rebind the VMA. > > > > > > Starting from commit b0647a5e79b1 ("drm/i915: Avoid live-lock with > > > i915_vma_parked()"), we assume that as long as a GT is held idle, no VMA > > > would be reopened while we destroy them. That assumption is no longer > > > true in multi-GT configurations, where a VMA we reopen may be handled by a > > > GT different from the one that we already keep active via its engine while > > > we set up an execbuf request. > > > > > > Restoring the extra GT0 PM wakeref removed from i915_gem_do_execbuffer() > > > processing path seems to fix this issue. > > > > > > Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/10608 > > > Signed-off-by: Janusz Krzysztofik > > > Cc: Rodrigo Vivi > > > Cc: Nirmoy Das > > > --- > > > drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c | 18 ++ > > > 1 file changed, 18 insertions(+) > > > > > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c > > > b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c > > > index 42619fc05de48..090724fa766c9 100644
Re: [RFC PATCH net-next v8 02/14] net: page_pool: create hooks for custom page providers
On 5/7/24 17:23, Christoph Hellwig wrote: On Tue, May 07, 2024 at 01:18:57PM -0300, Jason Gunthorpe wrote: On Tue, May 07, 2024 at 05:05:12PM +0100, Pavel Begunkov wrote: even in tree if you give them enough rope, and they should not have that rope when the only sensible options are page/folio based kernel memory (incuding large/huge folios) and dmabuf. I believe there is at least one deep confusion here, considering you previously mentioned Keith's pre-mapping patches. The "hooks" are not that about in what format you pass memory, it's arguably the least interesting part for page pool, more or less it'd circulate whatever is given. It's more of how to have a better control over buffer lifetime and implement a buffer pool passing data to users and empty buffers back. Isn't that more or less exactly what dmabuf is? Why do you need another almost dma-buf thing for another project? That's the exact point I've been making since the last round of the series. We don't need to reinvent dmabuf poorly in every subsystem, but instead fix the odd parts in it and make it suitable for everyone. Someone would need to elaborate how dma-buf is like that addition to page pool infra. The granularity here is usually 4K and less (hw dictated), what user receives cannot be guaranteed to be contiguous in memory. Having thousands of dma-buf instances is not an option, so a completion would need to include a range where data sits. Then who controls lifetime of buffers? If it's dma-buf, then at least it needs to track what sub-buffers are handed to user and what are currently in the kernel. How it would be accounted? ioctl_return_subrange(dmabuf, [range]), sounds like a bad idea for performance. To cover user memory it'd also need to be read from userspace, ioctl here wouldn't be an option, but let's say it's somehow done in the kernel. That's not all the list, but in short, even though I haven't been following dma-buf developments too closely, I have hard time seeing how it can be a replacement here. -- Pavel Begunkov
Re: [Linaro-mm-sig] Re: [PATCH] epoll: try to be a _bit_ better about file lifetimes
On Tue, 7 May 2024 at 04:03, Daniel Vetter wrote: > > It's really annoying that on some distros/builds we don't have that, and > for gpu driver stack reasons we _really_ need to know whether a fd is the > same as another, due to some messy uniqueness requirements on buffer > objects various drivers have. It's sad that such a simple thing would require two other horrid models (EPOLL or KCMP). There'[s a reason that KCMP is a config option - *some* of that is horrible code - but the "compare file descriptors for equality" is not that reason. Note that KCMP really is a broken mess. It's also a potential security hole, even for the simple things, because of how it ends up comparing kernel pointers (ie it doesn't just say "same file descriptor", it gives an ordering of them, so you can use KCMP to sort things in kernel space). And yes, it orders them after obfuscating the pointer, but it's still not something I would consider sane as a baseline interface. It was designed for checkpoint-restore, it's the wrong thing to use for some "are these file descriptors the same". The same argument goes for using EPOLL for that. Disgusting hack. Just what are the requirements for the GPU stack? Is one of the file descriptors "trusted", IOW, you know what kind it is? Because dammit, it's *so* easy to do. You could just add a core DRM ioctl for it. Literally just struct fd f1 = fdget(fd1); struct fd f2 = fdget(fd2); int same; same = f1.file && f1.file == f2.file; fdput(fd1); fdput(fd2); return same; where the only question is if you also woudl want to deal with O_PATH fd's, in which case the "fdget()" would be "fdget_raw()". Honestly, adding some DRM ioctl for this sounds hacky, but it sounds less hacky than relying on EPOLL or KCMP. I'd be perfectly ok with adding a generic "FISAME" VFS level ioctl too, if this is possibly a more common thing. and not just DRM wants it. Would something like that work for you? Linus
Re: [PATCH 1/7] dt-bindings: display: rockchip, dw-mipi-dsi: Document RK3128 DSI
On Mon, May 06, 2024 at 09:43:36PM +0200, Alex Bee wrote: > Document the MIPI DSI controller for Rockchip RK3128. The integration is > similar to PX30 so it's bindings-constraints can be re-used. > > Signed-off-by: Alex Bee Acked-by: Conor Dooley Cheers, Conor. signature.asc Description: PGP signature
Re: [PATCH 2/7] dt-bindings: clock: rk3128: Add PCLK_MIPIPHY
On Mon, May 06, 2024 at 09:43:37PM +0200, Alex Bee wrote: > The DPHY's APB clock is required to be exposed in order to be able to > enable it and access the phy's registers. > > Signed-off-by: Alex Bee Acked-by: Conor Dooley signature.asc Description: PGP signature
Re: [RFC PATCH net-next v8 02/14] net: page_pool: create hooks for custom page providers
On Tue, May 07, 2024 at 09:42:05AM -0700, Mina Almasry wrote: > 1. Align with devmem TCP to use udmabuf for your io_uring memory. I > think in the past you said it's a uapi you don't link but in the face > of this pushback you may want to reconsider. dmabuf does not force a uapi, you can acquire your pages however you want and wrap them up in a dmabuf. No uapi at all. The point is that dmabuf already provides ops that do basically what is needed here. We don't need ops calling ops just because dmabuf's ops are not understsood or not perfect. Fixup dmabuf. If io_uring wants to take its existing memory pre-registration it can wrap that in a dmbauf, and somehow pass it to the netstack. Userspace doesn't need to know a dmabuf is being used in the background. Jason