Re: [PATCH] drm/edid: add a quirk for two 240Hz Samsung monitors
On Thu, Nov 2, 2023 at 8:06 PM Alex Deucher wrote: > > On Thu, Nov 2, 2023 at 3:00 PM Hamza Mahfooz wrote: > > > > On 11/1/23 17:36, Alex Deucher wrote: > > > On Wed, Nov 1, 2023 at 5:01 PM Hamza Mahfooz > > > wrote: > > >> > > >> Without this fix the 5120x1440@240 timing of these monitors > > >> leads to screen flickering. > > >> > > >> Cc: sta...@vger.kernel.org # 6.1+ > > >> Link: https://gitlab.freedesktop.org/drm/amd/-/issues/1442 > > >> Co-developed-by: Harry Wentland > > >> Signed-off-by: Harry Wentland > > >> Signed-off-by: Hamza Mahfooz > > >> --- > > >> drivers/gpu/drm/drm_edid.c | 47 +++--- > > >> 1 file changed, 44 insertions(+), 3 deletions(-) > > >> > > >> diff --git a/drivers/gpu/drm/drm_edid.c b/drivers/gpu/drm/drm_edid.c > > >> index bca2af4fe1fc..3fdb8907f66b 100644 > > >> --- a/drivers/gpu/drm/drm_edid.c > > >> +++ b/drivers/gpu/drm/drm_edid.c > > >> @@ -89,6 +89,8 @@ static int oui(u8 first, u8 second, u8 third) > > >> #define EDID_QUIRK_NON_DESKTOP (1 << 12) > > >> /* Cap the DSC target bitrate to 15bpp */ > > >> #define EDID_QUIRK_CAP_DSC_15BPP (1 << 13) > > >> +/* Fix up a particular 5120x1440@240Hz timing */ > > >> +#define EDID_QUIRK_FIXUP_5120_1440_240 (1 << 14) > > > > > > What is wrong with the original timing that needs to be fixed? > > > > Apparently, all of timing values for the 5120x1440@240 mode of these > > monitors aren't set correctly (they are all lower than they should be) > > in their EDIDs. For what it's worth, the windows driver has had a quirk > > similar the one proposed in this patch for ~2 years. > > It would be good to at least include the original mode timings from > the EDID and the new ones added by the quirk in the commit message and > a description of why they are problematic and why the new ones work. > > Alex > I think this part what nvidia is doing in their driver is missing: https://github.com/NVIDIA/open-gpu-kernel-modules/blob/main/src/common/modeset/timing/nvt_edidext_861.c#L342 A nouveau user hit this and I think the edid parser in the kernel is just lacking whatever that "RID_MODE" stuff is all about. > > > > > > > > > Alex > > > > > > > > >> > > >> #define MICROSOFT_IEEE_OUI 0xca125c > > >> > > >> @@ -170,6 +172,12 @@ static const struct edid_quirk { > > >> EDID_QUIRK('S', 'A', 'M', 596, EDID_QUIRK_PREFER_LARGE_60), > > >> EDID_QUIRK('S', 'A', 'M', 638, EDID_QUIRK_PREFER_LARGE_60), > > >> > > >> + /* Samsung C49G95T */ > > >> + EDID_QUIRK('S', 'A', 'M', 0x7053, > > >> EDID_QUIRK_FIXUP_5120_1440_240), > > >> + > > >> + /* Samsung S49AG95 */ > > >> + EDID_QUIRK('S', 'A', 'M', 0x71ac, > > >> EDID_QUIRK_FIXUP_5120_1440_240), > > >> + > > >> /* Sony PVM-2541A does up to 12 bpc, but only reports max 8 bpc > > >> */ > > >> EDID_QUIRK('S', 'N', 'Y', 0x2541, EDID_QUIRK_FORCE_12BPC), > > >> > > >> @@ -6586,7 +6594,37 @@ static void update_display_info(struct > > >> drm_connector *connector, > > >> drm_edid_to_eld(connector, drm_edid); > > >> } > > >> > > >> -static struct drm_display_mode *drm_mode_displayid_detailed(struct > > >> drm_device *dev, > > >> +static void drm_mode_displayid_detailed_edid_quirks(struct > > >> drm_connector *connector, > > >> + struct > > >> drm_display_mode *mode) > > >> +{ > > >> + unsigned int hsync_width; > > >> + unsigned int vsync_width; > > >> + > > >> + if (connector->display_info.quirks & > > >> EDID_QUIRK_FIXUP_5120_1440_240) { > > >> + if (mode->hdisplay == 5120 && mode->vdisplay == 1440 && > > >> + mode->clock == 1939490) { > > >> + hsync_width = mode->hsync_end - > > >> mode->hsync_start; > > >> + vsync_width = mode->vsync_end - > > >> mode->vsync_start; > > >> + > > >> + mode->clock = 2018490; > > >> + mode->hdisplay = 5120; > > >> + mode->hsync_start = 5120 + 8; > > >> + mode->hsync_end = 5120 + 8 + hsync_width; > > >> + mode->htotal = 5200; > > >> + > > >> + mode->vdisplay = 1440; > > >> + mode->vsync_start = 1440 + 165; > > >> + mode->vsync_end = 1440 + 165 + vsync_width; > > >> + mode->vtotal = 1619; > > >> + > > >> + drm_dbg_kms(connector->dev, > > >> + "[CONNECTOR:%d:%s] Samsung 240Hz > > >> mode quirk applied\n", > > >> + connector->base.id, connector->name); > > >> + } > > >> + } > > >> +} > > >> + > > >> +static struct drm_display_mode *drm_mode_displayid_detailed(struct > > >> drm_connector *connector, > > >> struct > > >> di
Re: [PATCH 3/9] drm/ttm: use per BO cleanup workers
On Thu, Jun 15, 2023 at 1:19 PM Christian König wrote: > > Am 13.06.23 um 16:18 schrieb Karol Herbst: > > On Tue, Jun 13, 2023 at 3:59 PM Christian König > > wrote: > >> Am 13.06.23 um 15:05 schrieb Karol Herbst: > >>> On Mon, Dec 5, 2022 at 2:40 PM Christian König > >>> wrote: > >>>> Am 29.11.22 um 22:14 schrieb Felix Kuehling: > >>>>> On 2022-11-25 05:21, Christian König wrote: > >>>>>> Instead of a single worker going over the list of delete BOs in regular > >>>>>> intervals use a per BO worker which blocks for the resv object and > >>>>>> locking of the BO. > >>>>>> > >>>>>> This not only simplifies the handling massively, but also results in > >>>>>> much better response time when cleaning up buffers. > >>>>>> > >>>>>> Signed-off-by: Christian König > >>>>> Just thinking out loud: If I understand it correctly, this can cause a > >>>>> lot of sleeping worker threads when > >>>>> AMDGPU_GEM_CREATE_VRAM_WIPE_ON_RELEASE is used and many BOs are freed > >>>>> at the same time. This happens e.g. when a KFD process terminates or > >>>>> crashes. I guess with a concurrency-managed workqueue this isn't going > >>>>> to be excessive. And since it's on a per device workqueue, it doesn't > >>>>> stall work items on the system work queue or from other devices. > >>>> Yes, exactly that. The last parameter to alloc_workqueue() limits how > >>>> many work items can be sleeping. > >>>> > >>>>> I'm trying to understand why you set WQ_MEM_RECLAIM. This work queue > >>>>> is not about freeing ttm_resources but about freeing the BOs. But it > >>>>> affects freeing of ghost_objs that are holding the ttm_resources being > >>>>> freed. > >>>> Well if the BO is idle, but not immediately lockable we delegate freeing > >>>> the backing pages in the TT object to those workers as well. It might > >>>> even be a good idea to use a separate wq for this case. > >>>> > >>>>> If those assumptions all make sense, patches 1-3 are > >>>>> > >>>>> Reviewed-by: Felix Kuehling > >>>> Thanks, > >>>> Christian. > >>>> > >>> This patch causes a heap use-after-free when using nouveau with the > >>> potential of trashing filesystems, is there a way to revert it until > >>> we figure out a proper solution to the problem? > >> Uff I don't think so, we have quite some work based on top of this. But > >> let me double check. > >> > > yeah.. I already talked with Dave about fixing this issue as Dave has > > more knowledge on this part of the driver (I hope), so we might have a > > fix soonish, but the concerning part is, that it's already out to > > users, so might be better to be able to revert it if the fix takes a > > while to emerge. > > We at least can't revert easily. This was fixing issues where we have > seen OOM situations because TTM wasn't releasing resources fast enough. > > >> On the other hand have you tried running this with KASAN to catch use > >> after free errors? > > yes: https://gitlab.freedesktop.org/drm/nouveau/-/issues/213#note_1942777 > > > >> Since we now block for work to finish and not check every few > >> milliseconds to garbage collect memory will now be reclaimed much faster > >> after freeing it. > > yeah, that kinda makes sense. This entire issue feels like a race > > happening as I need to run the OpenGL CTS in parallel with 8+ threads > > to trigger it reliably. > > Yeah, from the bug report that's a classic use after free because of a race. > > Easiest would be to make sure everybody has a reference to the fence. > turns out nouveau was using `dma_fence_is_signaled_locked` without taking the lock. Fixing that seems to improve the situation by a lot, so I think we have a fix to resolve this problem. > Regards, > Christian. > > > > >> Regards, > >> Christian. > >> > >>> Bug: https://gitlab.freedesktop.org/drm/nouveau/-/issues/213 > >>> > >>> example trace on affected systems: > >>> > >>> [ 4102.946946] general protection fault, probably for non-canonical > >>> address 0x5f775ce3bd949b45: [#3] PREEMPT SMP NOPTI
Re: [PATCH 3/9] drm/ttm: use per BO cleanup workers
On Tue, Jun 13, 2023 at 3:59 PM Christian König wrote: > > Am 13.06.23 um 15:05 schrieb Karol Herbst: > > On Mon, Dec 5, 2022 at 2:40 PM Christian König > > wrote: > >> Am 29.11.22 um 22:14 schrieb Felix Kuehling: > >>> On 2022-11-25 05:21, Christian König wrote: > >>>> Instead of a single worker going over the list of delete BOs in regular > >>>> intervals use a per BO worker which blocks for the resv object and > >>>> locking of the BO. > >>>> > >>>> This not only simplifies the handling massively, but also results in > >>>> much better response time when cleaning up buffers. > >>>> > >>>> Signed-off-by: Christian König > >>> Just thinking out loud: If I understand it correctly, this can cause a > >>> lot of sleeping worker threads when > >>> AMDGPU_GEM_CREATE_VRAM_WIPE_ON_RELEASE is used and many BOs are freed > >>> at the same time. This happens e.g. when a KFD process terminates or > >>> crashes. I guess with a concurrency-managed workqueue this isn't going > >>> to be excessive. And since it's on a per device workqueue, it doesn't > >>> stall work items on the system work queue or from other devices. > >> Yes, exactly that. The last parameter to alloc_workqueue() limits how > >> many work items can be sleeping. > >> > >>> I'm trying to understand why you set WQ_MEM_RECLAIM. This work queue > >>> is not about freeing ttm_resources but about freeing the BOs. But it > >>> affects freeing of ghost_objs that are holding the ttm_resources being > >>> freed. > >> Well if the BO is idle, but not immediately lockable we delegate freeing > >> the backing pages in the TT object to those workers as well. It might > >> even be a good idea to use a separate wq for this case. > >> > >>> If those assumptions all make sense, patches 1-3 are > >>> > >>> Reviewed-by: Felix Kuehling > >> Thanks, > >> Christian. > >> > > This patch causes a heap use-after-free when using nouveau with the > > potential of trashing filesystems, is there a way to revert it until > > we figure out a proper solution to the problem? > > Uff I don't think so, we have quite some work based on top of this. But > let me double check. > yeah.. I already talked with Dave about fixing this issue as Dave has more knowledge on this part of the driver (I hope), so we might have a fix soonish, but the concerning part is, that it's already out to users, so might be better to be able to revert it if the fix takes a while to emerge. > On the other hand have you tried running this with KASAN to catch use > after free errors? yes: https://gitlab.freedesktop.org/drm/nouveau/-/issues/213#note_1942777 > > Since we now block for work to finish and not check every few > milliseconds to garbage collect memory will now be reclaimed much faster > after freeing it. yeah, that kinda makes sense. This entire issue feels like a race happening as I need to run the OpenGL CTS in parallel with 8+ threads to trigger it reliably. > > Regards, > Christian. > > > > > Bug: https://gitlab.freedesktop.org/drm/nouveau/-/issues/213 > > > > example trace on affected systems: > > > > [ 4102.946946] general protection fault, probably for non-canonical > > address 0x5f775ce3bd949b45: [#3] PREEMPT SMP NOPTI > > [ 4102.957794] CPU: 12 PID: 89561 Comm: glcts Tainted: G D > > 6.3.5-200.fc38.x86_64 #1 > > [ 4102.966556] Hardware name: ASUS System Product Name/PRIME B660-PLUS > > D4, BIOS 0418 10/13/2021 > > [ 4102.974972] RIP: 0010:__kmem_cache_alloc_node+0x1ba/0x320 > > [ 4102.980362] Code: 2b 14 25 28 00 00 00 0f 85 74 01 00 00 48 83 c4 > > 18 5b 5d 41 5c 41 5d 41 5e 41 5f c3 cc cc cc cc 41 8b 47 28 4d 8b 07 > > 48 01 f8 <48> 8b 18 48 89 c1 49 33 9f b8 00 00 00 48 0f c9 48 31 cb 41 > > f6 c0 > > [ 4102.999073] RSP: 0018:9764e0057b40 EFLAGS: 00010202 > > [ 4103.004291] RAX: 5f775ce3bd949b45 RBX: 0dc0 RCX: > > 0046 > > [ 4103.011408] RDX: 0002cf87600c RSI: 0dc0 RDI: > > 5f775ce3bd949b15 > > [ 4103.018528] RBP: 0dc0 R08: 000390c0 R09: > > 30302d6d > > [ 4103.025649] R10: 756c7473 R11: 20090298 R12: > > > > [ 4103.032767] R13: R14: 0046 R15: > > 8bda80042600 > > [ 4103.039887] FS: 7f386a85ef00() GS:8be1df70() > > knlGS
Re: [PATCH 3/9] drm/ttm: use per BO cleanup workers
On Mon, Dec 5, 2022 at 2:40 PM Christian König wrote: > > Am 29.11.22 um 22:14 schrieb Felix Kuehling: > > On 2022-11-25 05:21, Christian König wrote: > >> Instead of a single worker going over the list of delete BOs in regular > >> intervals use a per BO worker which blocks for the resv object and > >> locking of the BO. > >> > >> This not only simplifies the handling massively, but also results in > >> much better response time when cleaning up buffers. > >> > >> Signed-off-by: Christian König > > > > Just thinking out loud: If I understand it correctly, this can cause a > > lot of sleeping worker threads when > > AMDGPU_GEM_CREATE_VRAM_WIPE_ON_RELEASE is used and many BOs are freed > > at the same time. This happens e.g. when a KFD process terminates or > > crashes. I guess with a concurrency-managed workqueue this isn't going > > to be excessive. And since it's on a per device workqueue, it doesn't > > stall work items on the system work queue or from other devices. > > Yes, exactly that. The last parameter to alloc_workqueue() limits how > many work items can be sleeping. > > > I'm trying to understand why you set WQ_MEM_RECLAIM. This work queue > > is not about freeing ttm_resources but about freeing the BOs. But it > > affects freeing of ghost_objs that are holding the ttm_resources being > > freed. > > Well if the BO is idle, but not immediately lockable we delegate freeing > the backing pages in the TT object to those workers as well. It might > even be a good idea to use a separate wq for this case. > > > > > If those assumptions all make sense, patches 1-3 are > > > > Reviewed-by: Felix Kuehling > > Thanks, > Christian. > This patch causes a heap use-after-free when using nouveau with the potential of trashing filesystems, is there a way to revert it until we figure out a proper solution to the problem? Bug: https://gitlab.freedesktop.org/drm/nouveau/-/issues/213 example trace on affected systems: [ 4102.946946] general protection fault, probably for non-canonical address 0x5f775ce3bd949b45: [#3] PREEMPT SMP NOPTI [ 4102.957794] CPU: 12 PID: 89561 Comm: glcts Tainted: G D 6.3.5-200.fc38.x86_64 #1 [ 4102.966556] Hardware name: ASUS System Product Name/PRIME B660-PLUS D4, BIOS 0418 10/13/2021 [ 4102.974972] RIP: 0010:__kmem_cache_alloc_node+0x1ba/0x320 [ 4102.980362] Code: 2b 14 25 28 00 00 00 0f 85 74 01 00 00 48 83 c4 18 5b 5d 41 5c 41 5d 41 5e 41 5f c3 cc cc cc cc 41 8b 47 28 4d 8b 07 48 01 f8 <48> 8b 18 48 89 c1 49 33 9f b8 00 00 00 48 0f c9 48 31 cb 41 f6 c0 [ 4102.999073] RSP: 0018:9764e0057b40 EFLAGS: 00010202 [ 4103.004291] RAX: 5f775ce3bd949b45 RBX: 0dc0 RCX: 0046 [ 4103.011408] RDX: 0002cf87600c RSI: 0dc0 RDI: 5f775ce3bd949b15 [ 4103.018528] RBP: 0dc0 R08: 000390c0 R09: 30302d6d [ 4103.025649] R10: 756c7473 R11: 20090298 R12: [ 4103.032767] R13: R14: 0046 R15: 8bda80042600 [ 4103.039887] FS: 7f386a85ef00() GS:8be1df70() knlGS: [ 4103.047958] CS: 0010 DS: ES: CR0: 80050033 [ 4103.053692] CR2: 0493b868 CR3: 00014c3ba000 CR4: 00f50ee0 [ 4103.060812] PKRU: 5554 [ 4103.063520] Call Trace: [ 4103.065970] [ 4103.068071] ? die_addr+0x36/0x90 [ 4103.071384] ? exc_general_protection+0x1be/0x420 [ 4103.076081] ? asm_exc_general_protection+0x26/0x30 [ 4103.080952] ? __kmem_cache_alloc_node+0x1ba/0x320 [ 4103.085734] ? ext4_htree_store_dirent+0x42/0x180 [ 4103.090431] ? ext4_htree_store_dirent+0x42/0x180 [ 4103.095132] __kmalloc+0x4d/0x150 [ 4103.098444] ext4_htree_store_dirent+0x42/0x180 [ 4103.102970] htree_dirblock_to_tree+0x1ed/0x370 [ 4103.107494] ext4_htree_fill_tree+0x109/0x3d0 [ 4103.111846] ext4_readdir+0x6d4/0xa80 [ 4103.115505] iterate_dir+0x178/0x1c0 [ 4103.119076] __x64_sys_getdents64+0x88/0x130 [ 4103.123341] ? __pfx_filldir64+0x10/0x10 [ 4103.127260] do_syscall_64+0x5d/0x90 [ 4103.130835] ? handle_mm_fault+0x11e/0x310 [ 4103.134927] ? do_user_addr_fault+0x1e0/0x720 [ 4103.139278] ? exc_page_fault+0x7c/0x180 [ 4103.143195] entry_SYSCALL_64_after_hwframe+0x72/0xdc [ 4103.148240] RIP: 0033:0x7f386a418047 [ 4103.151828] Code: 24 fb ff 4c 89 e0 5b 41 5c 5d c3 0f 1f 84 00 00 00 00 00 f3 0f 1e fa b8 ff ff ff 7f 48 39 c2 48 0f 47 d0 b8 d9 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 01 c3 48 8b 15 91 cd 0f 00 f7 d8 64 89 02 48 [ 4103.170543] RSP: 002b:7ffd4793ff38 EFLAGS: 0293 ORIG_RAX: 00d9 [ 4103.178095] RAX: ffda RBX: 04933830 RCX: 7f386a418047 [ 4103.185214] RDX: 8000 RSI: 04933860 RDI: 0006 [ 4103.192335] RBP: 7ffd4793ff70 R08: R09: 0001 [ 4103.199454] R10: 0004 R11: 0293 R12: 04933834 [ 4103.206573] R13: 04933860 R14: ff60 R15: [ 4103.213695] [ 4103.215883] Modul
Re: [PATCH 000/141] Fix fall-through warnings for Clang
On Thu, Nov 26, 2020 at 4:28 PM Geert Uytterhoeven wrote: > > Hi Miguel, > > On Thu, Nov 26, 2020 at 3:54 PM Miguel Ojeda > wrote: > > On Wed, Nov 25, 2020 at 11:44 PM Edward Cree wrote: > > > To make the intent clear, you have to first be certain that you > > > understand the intent; otherwise by adding either a break or a > > > fallthrough to suppress the warning you are just destroying the > > > information that "the intent of this code is unknown". > > > > If you don't know what the intent of your own code is, then you > > *already* have a problem in your hands. > > The maintainer is not necessarily the owner/author of the code, and > thus may not know the intent of the code. > > > > or does it flag up code > > > that can be mindlessly "fixed" (in which case the warning is > > > worthless)? Proponents in this thread seem to be trying to > > > have it both ways. > > > > A warning is not worthless just because you can mindlessly fix it. > > There are many counterexamples, e.g. many > > checkpatch/lint/lang-format/indentation warnings, functional ones like > > the `if (a = b)` warning... > > BTW, you cannot mindlessly fix the latter, as you cannot know if > "(a == b)" or "((a = b))" was intended, without understanding the code > (and the (possibly unavailable) data sheet, and the hardware, ...). > to allow assignments in if statements was clearly a mistake and if you need outside information to understand the code, your code is the issue already. > P.S. So far I've stayed out of this thread, as I like it if the compiler > flags possible mistakes. After all I was the one fixing new > "may be used uninitialized" warnings thrown up by gcc-4.1, until > (a bit later than) support for that compiler was removed... > > Gr{oetje,eeting}s, > > Geert > > -- > Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- > ge...@linux-m68k.org > > In personal conversations with technical people, I call myself a hacker. But > when I'm talking to journalists I just say "programmer" or something like > that. > -- Linus Torvalds > ___ > dri-devel mailing list > dri-de...@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/dri-devel > ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx