Re: [Nouveau] [PATCH] drm/nouveau: Accept 'legacy' format modifiers
Did you just cherry-pick my change, or were you running the latest drm-next or drm-fixes code? There do appear to be various MM-related fixes that may be related to this in drm-fixes when I scroll down the log looking for nouveau stuff. Shot in the dark, but might be worth trying with Dave's tree if you weren't already. I was testing with drm-fixes-2020-07-17-1 from here: git://anongit.freedesktop.org/drm/drm Thanks, -James On 7/17/20 8:13 PM, James Jones wrote: This doesn't look related. -James On 7/17/20 2:30 PM, Kirill A. Shutemov wrote: On Fri, Jul 17, 2020 at 11:57:57AM -0700, James Jones wrote: Accept the DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK() family of modifiers to handle broken userspace Xorg modesetting and Mesa drivers. Tested with Xorg 1.20 modesetting driver, weston@c46c70dac84a4b3030cd05b380f9f410536690fc, gnome & KDE wayland desktops from Ubuntu 18.04, and sway 1.5 Signed-off-by: James Jones I tried and it crashes. Not sure if it's related. [drm:drm_ioctl] pid=3327, dev=0xe200, auth=1, NOUVEAU_GEM_PUSHBUF [drm:vblank_disable_fn] disabling vblank on crtc 0 [drm:drm_ioctl] pid=3351, dev=0xe200, auth=1, NOUVEAU_GEM_PUSHBUF [drm:drm_ioctl] pid=3351, dev=0xe200, auth=1, NOUVEAU_GEM_CPU_PREP [drm:drm_ioctl] pid=3351, dev=0xe200, auth=1, NOUVEAU_GEM_PUSHBUF [drm:drm_ioctl] pid=3351, dev=0xe200, auth=1, NOUVEAU_GEM_PUSHBUF BUG: unable to handle page fault for address: 059c #PF: supervisor read access in kernel mode #PF: error_code(0x) - not-present page PGD 0 P4D 0 Oops: [#1] PREEMPT SMP PTI CPU: 13 PID: 3351 Comm: alacritty Tainted: G I 5.8.0-rc5-00191-g086f86c033f9 #53 Hardware name: Gigabyte Technology Co., Ltd. X299 AORUS Gaming 3 Pro/X299 AORUS Gaming 3 Pro-CF, BIOS F5d 11/28/2019 RIP: 0010:kmem_cache_alloc_trace (/home/kas/linux/torvalds/mm/slub.c:272 /home/kas/linux/torvalds/mm/slub.c:278 /home/kas/linux/torvalds/mm/slub.c:292 /home/kas/linux/torvalds/mm/slub.c:2791 /home/kas/linux/torvalds/mm/slub.c:2832 /home/kas/linux/torvalds/mm/slub.c:2849) Code: 8b 51 08 48 89 c8 65 48 03 05 d4 0e ca 70 48 8b 70 08 48 39 f2 75 e7 4c 8b 38 4d 85 ff 0f 84 8f 01 00 00 8b 45 20 48 8b 7d 00 <49> 8b 1c 07 40 f6 c7 0f 0f 85 95 01 00 00 48 8d 8a 80 00 00 00 4c All code 0: 8b 51 08 mov 0x8(%rcx),%edx 3: 48 89 c8 mov %rcx,%rax 6: 65 48 03 05 d4 0e ca add %gs:0x70ca0ed4(%rip),%rax # 0x70ca0ee2 d: 70 e: 48 8b 70 08 mov 0x8(%rax),%rsi 12: 48 39 f2 cmp %rsi,%rdx 15: 75 e7 jne 0xfffe 17: 4c 8b 38 mov (%rax),%r15 1a: 4d 85 ff test %r15,%r15 1d: 0f 84 8f 01 00 00 je 0x1b2 23: 8b 45 20 mov 0x20(%rbp),%eax 26: 48 8b 7d 00 mov 0x0(%rbp),%rdi 2a:* 49 8b 1c 07 mov (%r15,%rax,1),%rbx <-- trapping instruction 2e: 40 f6 c7 0f test $0xf,%dil 32: 0f 85 95 01 00 00 jne 0x1cd 38: 48 8d 8a 80 00 00 00 lea 0x80(%rdx),%rcx 3f: 4c rex.WR Code starting with the faulting instruction === 0: 49 8b 1c 07 mov (%r15,%rax,1),%rbx 4: 40 f6 c7 0f test $0xf,%dil 8: 0f 85 95 01 00 00 jne 0x1a3 e: 48 8d 8a 80 00 00 00 lea 0x80(%rdx),%rcx 15: 4c rex.WR RSP: 0018:a8a381bcfba0 EFLAGS: 00010206 RAX: 0030 RBX: 9c0b15e05e00 RCX: 0003fe50 RDX: fc8d RSI: fc8d RDI: 0003fe50 RBP: 9c0b18407840 R08: R09: 0001 R10: 9c0b06c28000 R11: 0001 R12: 0dc0 [drm:drm_ioctl] pid=3327, dev=0xe200, auth=1, DRM_IOCTL_CRTC_GET_SEQUENCE R13: 0060 R14: 8fa35a47 R15: 056c FS: 7fbe7a8e3900() GS:9c0b1f88() knlGS: CS: 0010 DS: ES: CR0: 80050033 CR2: 059c CR3: 00103c7fe004 CR4: 003606e0 DR0: DR1: DR2: DR3: DR6: fffe0ff0 DR7: 0400 Call Trace: nouveau_fence_new (/home/kas/linux/torvalds/include/linux/slab.h:555 /home/kas/linux/torvalds/include/linux/slab.h:669 /home/kas/linux/torvalds/drivers/gpu/drm/nouveau/nouveau_fence.c:423) [drm:drm_vblank_enable] enabling vblank on crtc 0, ret: 0 nouveau_gem_ioctl_pushbuf (/home/kas/linux/torvalds/drivers/gpu/drm/nouveau/nouveau_gem.c:852) [drm:drm_ioctl] pid=3327, dev=0xe200, auth=1, DRM_IOCTL_CRTC_QUEUE_SEQUENCE ? nouveau_gem_ioctl_new (/home/kas/linux/torvalds/drivers/gpu/drm/nouveau/nouveau_gem.c:680) ? drm_ioctl_kernel (/home/kas/linux/torvalds/drivers/gpu/drm/drm_ioctl.c:793) ? nouveau_gem_ioctl_new
Re: [Nouveau] [PATCH] drm/nouveau: Accept 'legacy' format modifiers
On 7/17/20 12:47 PM, Daniel Vetter wrote: On Fri, Jul 17, 2020 at 11:57:57AM -0700, James Jones wrote: Accept the DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK() family of modifiers to handle broken userspace Xorg modesetting and Mesa drivers. Tested with Xorg 1.20 modesetting driver, weston@c46c70dac84a4b3030cd05b380f9f410536690fc, gnome & KDE wayland desktops from Ubuntu 18.04, and sway 1.5 Just bikeshed, but maybe a few more words on what exactly is broken and how this works around it. Specifically why we only accept these, but don't advertise them. Added quite a few words. Signed-off-by: James Jones Needs Fixes: line here. Also nice to mention the bug reporter/link. Done in v2. --- drivers/gpu/drm/nouveau/nouveau_display.c | 26 +-- 1 file changed, 24 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/nouveau/nouveau_display.c b/drivers/gpu/drm/nouveau/nouveau_display.c index 496c4621cc78..31543086254b 100644 --- a/drivers/gpu/drm/nouveau/nouveau_display.c +++ b/drivers/gpu/drm/nouveau/nouveau_display.c @@ -191,8 +191,14 @@ nouveau_decode_mod(struct nouveau_drm *drm, uint32_t *tile_mode, uint8_t *kind) { + struct nouveau_display *disp = nouveau_display(drm->dev); BUG_ON(!tile_mode || !kind); + if ((modifier & (0xffull << 12)) == 0ull) { + /* Legacy modifier. Translate to this device's 'kind.' */ + modifier |= disp->format_modifiers[0] & (0xffull << 12); + } Hm I tried to understand what this magic does by looking at drm_fourcc.h, but the drm_fourcc_canonicalize_nvidia_format_mod() in there implements something else. Is that function wrong, or should we use it here instead? > Or is there something else going on entirely? This may be slightly clearer with the expanded change description: Canonicalize assumes the old modifiers are only used by certain Tegra revisions, because the Mesa patches were supposed to land and obliterate all uses beyond that. That assumption means it can assume the specific page kind (0xfe) used by the display-engine-compatible layout on those specific devices. There is no way to generally canonicalize a legacy modifier without referencing a specific device type, as is indirectly done here. This code does a limited device-specific canonicalization: It substitutes the display-appropriate page kind used by this specific device, ensuring we derive this correct page kind later in the function. I iterated on the best way to accomplish this a few times, and this was the least-invasive thing I came up with, but it does require a pretty thorough understanding of the NVIDIA modifier macros. Thanks for the quick review. -James Cheers, Daniel + if (modifier == DRM_FORMAT_MOD_LINEAR) { /* tile_mode will not be used in this case */ *tile_mode = 0; @@ -227,6 +233,16 @@ nouveau_framebuffer_get_layout(struct drm_framebuffer *fb, } } +static const u64 legacy_modifiers[] = { + DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(0), + DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(1), + DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(2), + DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(3), + DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(4), + DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(5), + DRM_FORMAT_MOD_INVALID +}; + static int nouveau_validate_decode_mod(struct nouveau_drm *drm, uint64_t modifier, @@ -247,8 +263,14 @@ nouveau_validate_decode_mod(struct nouveau_drm *drm, (disp->format_modifiers[mod] != modifier); mod++); - if (disp->format_modifiers[mod] == DRM_FORMAT_MOD_INVALID) - return -EINVAL; + if (disp->format_modifiers[mod] == DRM_FORMAT_MOD_INVALID) { + for (mod = 0; +(legacy_modifiers[mod] != DRM_FORMAT_MOD_INVALID) && +(legacy_modifiers[mod] != modifier); +mod++); + if (legacy_modifiers[mod] == DRM_FORMAT_MOD_INVALID) + return -EINVAL; + } nouveau_decode_mod(drm, modifier, tile_mode, kind); -- 2.17.1 ___ Nouveau mailing list Nouveau@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/nouveau
[Nouveau] [PATCH v2] drm/nouveau: Accept 'legacy' format modifiers
Accept the DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK() family of modifiers to handle broken userspace Xorg modesetting and Mesa drivers. Existing Mesa drivers are still aware of only these older format modifiers which do not differentiate between different variations of the block linear layout. When the format modifier support flag was flipped in the nouveau kernel driver, the X.org modesetting driver began attempting to use its format modifier-enabled framebuffer path. Because the set of format modifiers advertised by the kernel prior to this change do not intersect with the set of format modifiers advertised by Mesa, allocating GBM buffers using format modifiers fails and the modesetting driver falls back to non-modifier allocation. However, it still later queries the modifier of the GBM buffer when creating its DRM-KMS framebuffer object, receives the old-format modifier from Mesa, and attempts to create a framebuffer with it. Since the kernel is still not aware of these formats, this fails. Userspace should not be attempting to query format modifiers of GBM buffers allocated with a non- format-modifier-aware allocation path, but to avoid breaking existing userspace behavior, this change accepts the old-style format modifiers when creating framebuffers and applying them to planes by translating them to the equivalent new-style modifier. To accomplish this, some layout parameters must be assumed to match properties of the device targeted by the relevant ioctls. To avoid perpetuating misuse of the old-style modifiers, this change does not advertise support for them. Doing so would imply compatibility between devices with incompatible memory layouts. Tested with Xorg 1.20 modesetting driver, weston@c46c70dac84a4b3030cd05b380f9f410536690fc, gnome & KDE wayland desktops from Ubuntu 18.04, and sway 1.5 Reported-by: Kirill A. Shutemov Fixes: fa4f4c213f5f ("drm/nouveau/kms: Support NVIDIA format modifiers") Link: https://lkml.org/lkml/2020/6/30/1251 Signed-off-by: James Jones --- drivers/gpu/drm/nouveau/nouveau_display.c | 26 +-- 1 file changed, 24 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/nouveau/nouveau_display.c b/drivers/gpu/drm/nouveau/nouveau_display.c index 496c4621cc78..31543086254b 100644 --- a/drivers/gpu/drm/nouveau/nouveau_display.c +++ b/drivers/gpu/drm/nouveau/nouveau_display.c @@ -191,8 +191,14 @@ nouveau_decode_mod(struct nouveau_drm *drm, uint32_t *tile_mode, uint8_t *kind) { + struct nouveau_display *disp = nouveau_display(drm->dev); BUG_ON(!tile_mode || !kind); + if ((modifier & (0xffull << 12)) == 0ull) { + /* Legacy modifier. Translate to this device's 'kind.' */ + modifier |= disp->format_modifiers[0] & (0xffull << 12); + } + if (modifier == DRM_FORMAT_MOD_LINEAR) { /* tile_mode will not be used in this case */ *tile_mode = 0; @@ -227,6 +233,16 @@ nouveau_framebuffer_get_layout(struct drm_framebuffer *fb, } } +static const u64 legacy_modifiers[] = { + DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(0), + DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(1), + DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(2), + DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(3), + DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(4), + DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(5), + DRM_FORMAT_MOD_INVALID +}; + static int nouveau_validate_decode_mod(struct nouveau_drm *drm, uint64_t modifier, @@ -247,8 +263,14 @@ nouveau_validate_decode_mod(struct nouveau_drm *drm, (disp->format_modifiers[mod] != modifier); mod++); - if (disp->format_modifiers[mod] == DRM_FORMAT_MOD_INVALID) - return -EINVAL; + if (disp->format_modifiers[mod] == DRM_FORMAT_MOD_INVALID) { + for (mod = 0; +(legacy_modifiers[mod] != DRM_FORMAT_MOD_INVALID) && +(legacy_modifiers[mod] != modifier); +mod++); + if (legacy_modifiers[mod] == DRM_FORMAT_MOD_INVALID) + return -EINVAL; + } nouveau_decode_mod(drm, modifier, tile_mode, kind); -- 2.17.1 ___ Nouveau mailing list Nouveau@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/nouveau
Re: [Nouveau] [PATCH] drm/nouveau: Accept 'legacy' format modifiers
This doesn't look related. -James On 7/17/20 2:30 PM, Kirill A. Shutemov wrote: On Fri, Jul 17, 2020 at 11:57:57AM -0700, James Jones wrote: Accept the DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK() family of modifiers to handle broken userspace Xorg modesetting and Mesa drivers. Tested with Xorg 1.20 modesetting driver, weston@c46c70dac84a4b3030cd05b380f9f410536690fc, gnome & KDE wayland desktops from Ubuntu 18.04, and sway 1.5 Signed-off-by: James Jones I tried and it crashes. Not sure if it's related. [drm:drm_ioctl] pid=3327, dev=0xe200, auth=1, NOUVEAU_GEM_PUSHBUF [drm:vblank_disable_fn] disabling vblank on crtc 0 [drm:drm_ioctl] pid=3351, dev=0xe200, auth=1, NOUVEAU_GEM_PUSHBUF [drm:drm_ioctl] pid=3351, dev=0xe200, auth=1, NOUVEAU_GEM_CPU_PREP [drm:drm_ioctl] pid=3351, dev=0xe200, auth=1, NOUVEAU_GEM_PUSHBUF [drm:drm_ioctl] pid=3351, dev=0xe200, auth=1, NOUVEAU_GEM_PUSHBUF BUG: unable to handle page fault for address: 059c #PF: supervisor read access in kernel mode #PF: error_code(0x) - not-present page PGD 0 P4D 0 Oops: [#1] PREEMPT SMP PTI CPU: 13 PID: 3351 Comm: alacritty Tainted: G I 5.8.0-rc5-00191-g086f86c033f9 #53 Hardware name: Gigabyte Technology Co., Ltd. X299 AORUS Gaming 3 Pro/X299 AORUS Gaming 3 Pro-CF, BIOS F5d 11/28/2019 RIP: 0010:kmem_cache_alloc_trace (/home/kas/linux/torvalds/mm/slub.c:272 /home/kas/linux/torvalds/mm/slub.c:278 /home/kas/linux/torvalds/mm/slub.c:292 /home/kas/linux/torvalds/mm/slub.c:2791 /home/kas/linux/torvalds/mm/slub.c:2832 /home/kas/linux/torvalds/mm/slub.c:2849) Code: 8b 51 08 48 89 c8 65 48 03 05 d4 0e ca 70 48 8b 70 08 48 39 f2 75 e7 4c 8b 38 4d 85 ff 0f 84 8f 01 00 00 8b 45 20 48 8b 7d 00 <49> 8b 1c 07 40 f6 c7 0f 0f 85 95 01 00 00 48 8d 8a 80 00 00 00 4c All code 0: 8b 51 08mov0x8(%rcx),%edx 3: 48 89 c8mov%rcx,%rax 6: 65 48 03 05 d4 0e caadd%gs:0x70ca0ed4(%rip),%rax# 0x70ca0ee2 d: 70 e: 48 8b 70 08 mov0x8(%rax),%rsi 12: 48 39 f2cmp%rsi,%rdx 15: 75 e7 jne0xfffe 17: 4c 8b 38mov(%rax),%r15 1a: 4d 85 fftest %r15,%r15 1d: 0f 84 8f 01 00 00 je 0x1b2 23: 8b 45 20mov0x20(%rbp),%eax 26: 48 8b 7d 00 mov0x0(%rbp),%rdi 2a:* 49 8b 1c 07 mov(%r15,%rax,1),%rbx <-- trapping instruction 2e: 40 f6 c7 0f test $0xf,%dil 32: 0f 85 95 01 00 00 jne0x1cd 38: 48 8d 8a 80 00 00 00lea0x80(%rdx),%rcx 3f: 4c rex.WR Code starting with the faulting instruction === 0: 49 8b 1c 07 mov(%r15,%rax,1),%rbx 4: 40 f6 c7 0f test $0xf,%dil 8: 0f 85 95 01 00 00 jne0x1a3 e: 48 8d 8a 80 00 00 00lea0x80(%rdx),%rcx 15: 4c rex.WR RSP: 0018:a8a381bcfba0 EFLAGS: 00010206 RAX: 0030 RBX: 9c0b15e05e00 RCX: 0003fe50 RDX: fc8d RSI: fc8d RDI: 0003fe50 RBP: 9c0b18407840 R08: R09: 0001 R10: 9c0b06c28000 R11: 0001 R12: 0dc0 [drm:drm_ioctl] pid=3327, dev=0xe200, auth=1, DRM_IOCTL_CRTC_GET_SEQUENCE R13: 0060 R14: 8fa35a47 R15: 056c FS: 7fbe7a8e3900() GS:9c0b1f88() knlGS: CS: 0010 DS: ES: CR0: 80050033 CR2: 059c CR3: 00103c7fe004 CR4: 003606e0 DR0: DR1: DR2: DR3: DR6: fffe0ff0 DR7: 0400 Call Trace: nouveau_fence_new (/home/kas/linux/torvalds/include/linux/slab.h:555 /home/kas/linux/torvalds/include/linux/slab.h:669 /home/kas/linux/torvalds/drivers/gpu/drm/nouveau/nouveau_fence.c:423) [drm:drm_vblank_enable] enabling vblank on crtc 0, ret: 0 nouveau_gem_ioctl_pushbuf (/home/kas/linux/torvalds/drivers/gpu/drm/nouveau/nouveau_gem.c:852) [drm:drm_ioctl] pid=3327, dev=0xe200, auth=1, DRM_IOCTL_CRTC_QUEUE_SEQUENCE ? nouveau_gem_ioctl_new (/home/kas/linux/torvalds/drivers/gpu/drm/nouveau/nouveau_gem.c:680) ? drm_ioctl_kernel (/home/kas/linux/torvalds/drivers/gpu/drm/drm_ioctl.c:793) ? nouveau_gem_ioctl_new (/home/kas/linux/torvalds/drivers/gpu/drm/nouveau/nouveau_gem.c:680) drm_ioctl_kernel (/home/kas/linux/torvalds/drivers/gpu/drm/drm_ioctl.c:793) drm_ioctl (/home/kas/linux/torvalds/drivers/gpu/drm/drm_ioctl.c:888) ? nouveau_gem_ioctl_new (/home/kas/linux/torvalds/drivers/gpu/drm/nouveau/nouveau_gem.c:680) ? _raw_spin_unlock_irqrestore (/home/kas/linux/torvalds/arch/x86/include/asm/irqflags.h:41 /home/kas/linux/torvalds/arch/x86/include/asm/irqflags.h:84 /home/kas/linux/torvalds/include/linux/spinlock_api_smp.h:160
Re: [Nouveau] nouveau regression with 5.7 caused by "PCI/PM: Assume ports without DLL Link Active train links in 100 ms"
On Fri, Jul 17, 2020 at 10:43:18AM -0400, Sasha Levin wrote: > On Fri, Jul 17, 2020 at 02:43:52AM +0200, Karol Herbst wrote: > > On Fri, Jul 17, 2020 at 1:54 AM Bjorn Helgaas wrote: > > > On Fri, Jul 17, 2020 at 12:10:39AM +0200, Karol Herbst wrote: > > > > On Tue, Jul 7, 2020 at 9:30 PM Karol Herbst wrote: > > > > > > > > > > Hi everybody, > > > > > > > > > > with the mentioned commit Nouveau isn't able to load firmware onto the > > > > > GPU on one of my systems here. Even though the issue doesn't always > > > > > happen I am quite confident this is the commit breaking it. > > > > > > > > > > I am still digging into the issue and trying to figure out what > > > > > exactly breaks, but it shows up in different ways. Either we are not > > > > > able to boot the engines on the GPU or the GPU becomes unresponsive. > > > > > Btw, this is also a system where our runtime power management issue > > > > > shows up, so maybe there is indeed something funky with the bridge > > > > > controller. > > > > > > > > > > Just pinging you in case you have an idea on how this could break > > > > > Nouveau > > > > > > > > > > most of the times it shows up like this: > > > > > nouveau :01:00.0: acr: AHESASC binary failed > > > > > > > > > > Sometimes it works at boot and fails at runtime resuming with random > > > > > faults. So I will be investigating a bit more, but yeah... I am super > > > > > sure the commit triggered this issue, no idea if it actually causes > > > > > it. > > > > > > > > so yeah.. I reverted that locally and never ran into issues again. > > > > Still valid on latest 5.7. So can we get this reverted or properly > > > > fixed? This breaks runtime pm for us on at least some hardware. > > > > > > Yeah, that stinks. We had another similar report from Patrick: > > > > > > > > > https://lore.kernel.org/r/caerspo5stek_my1dehwp7ahd0xop87+ohywktjbl7algdbx...@mail.gmail.com > > > > > > Apparently the problem is ec411e02b7a2 ("PCI/PM: Assume ports without > > > DLL Link Active train links in 100 ms"), which Patrick found was > > > backported to v5.4.49 as 828b192c57e8, and you found was backported to > > > v5.7.6 as afaff825e3a4. > > > > > > Oddly, Patrick reported that v5.7.7 worked correctly, even though it > > > still contains afaff825e3a4. > > > > > > I guess in the absence of any other clues we'll have to revert it. > > > I hate to do that because that means we'll have slow resume of > > > Thunderbolt-connected devices again, but that's better than having > > > GPUs completely broken. > > > > > > Could you and Patrick open bugzilla.kernel.org reports, attach dmesg > > > logs and "sudo lspci -vv" output, and add the URLs to Kai-Heng's > > > original report at https://bugzilla.kernel.org/show_bug.cgi?id=206837 > > > and to this thread? > > > > > > There must be a way to fix the slow resume problem without breaking > > > the GPUs. > > > > > > > I wouldn't be surprised if this is related to the Intel bridge we > > check against for Nouveau.. I still have to check on another laptop > > with the same bridge our workaround was required as well but wouldn't > > be surprised if it shows the same problem. Will get you the > > information from both systems tomorrow then. > > I take it that ec411e02b7a2 will be reverted upstream? Yes, unless we have a better fix soon. I applied the revert to my for-linus branch, so it will appear in -next soon. I think it's a little late to get it in -rc5, so I'll probably ask Linus to pull it next week for -rc6. Bjorn ___ Nouveau mailing list Nouveau@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/nouveau
Re: [Nouveau] nouveau regression with 5.7 caused by "PCI/PM: Assume ports without DLL Link Active train links in 100 ms"
On Fri, Jul 17, 2020 at 03:04:10PM -0400, Lyude Paul wrote: > Isn't it possible to tell whether a PCI device is connected through > thunderbolt or not? We could probably get away with just defaulting > to 100ms for thunderbolt devices without DLL Link Active specified, > and then default to the old delay value for non-thunderbolt devices. pci_is_thunderbolt_attached() ___ Nouveau mailing list Nouveau@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/nouveau
Re: [Nouveau] [PATCH] RFC: ACPI / OSI: remove workarounds for hybrid graphics laptops
On 2020-07-17 1:05 p.m., Karol Herbst wrote: > It's hard to figure out what systems are actually affected and right now I > don't see a good way of removing those... > > But I'd like to see thos getting removed and drivers fixed instead (which > happened at least for nouveau). > > And as mentioned before, I prefer people working on fixing issues instead > of spending time to add firmware level workarounds which are hard to know > to which systems they apply to, hard to remove and basically a big huge > pain to work with.> In the end I have no idea how to even figure out what > systems are affected > and which not by this, so I have no idea how to even verify we can safely > remove this (which just means those are impossible to remove unless we risk > breaking systems, which again makes those supper annoying to deal with). > > Also from the comments it's hard to get what those bits really do. Are they > just preventing runtime pm or do the devices are powered down when booting? > I am sure it's the former, still... > > Please, don't do this again. > > For now, those workaround prevent power savings on systems those workaround > applies to, which might be any so those should get removed asap and if > new issues arrise removing those please do a proper bug report and we can > look into it and come up with a proper fix (and keep this patch out until > we resolve all of those). > > Signed-off-by: Karol Herbst > CC: Alex Hung > CC: "Rafael J. Wysocki" > CC: Len Brown > CC: Lyude Paul > CC: linux-ker...@vger.kernel.org > CC: dri-de...@lists.freedesktop.org > CC: nouveau@lists.freedesktop.org > --- > drivers/acpi/osi.c | 24 > 1 file changed, 24 deletions(-) > > diff --git a/drivers/acpi/osi.c b/drivers/acpi/osi.c > index 9f68538091384..d4405e1ca9b97 100644 > --- a/drivers/acpi/osi.c > +++ b/drivers/acpi/osi.c > @@ -44,30 +44,6 @@ osi_setup_entries[OSI_STRING_ENTRIES_MAX] __initdata = { > {"Processor Device", true}, > {"3.0 _SCP Extensions", true}, > {"Processor Aggregator Device", true}, > - /* > - * Linux-Dell-Video is used by BIOS to disable RTD3 for NVidia graphics > - * cards as RTD3 is not supported by drivers now. Systems with NVidia > - * cards will hang without RTD3 disabled. > - * > - * Once NVidia drivers officially support RTD3, this _OSI strings can > - * be removed if both new and old graphics cards are supported. > - */ > - {"Linux-Dell-Video", true}, > - /* > - * Linux-Lenovo-NV-HDMI-Audio is used by BIOS to power on NVidia's HDMI > - * audio device which is turned off for power-saving in Windows OS. > - * This power management feature observed on some Lenovo Thinkpad > - * systems which will not be able to output audio via HDMI without > - * a BIOS workaround. > - */ > - {"Linux-Lenovo-NV-HDMI-Audio", true}, > - /* > - * Linux-HPI-Hybrid-Graphics is used by BIOS to enable dGPU to > - * output video directly to external monitors on HP Inc. mobile > - * workstations as Nvidia and AMD VGA drivers provide limited > - * hybrid graphics supports. > - */ > - {"Linux-HPI-Hybrid-Graphics", true}, > }; > > static u32 acpi_osi_handler(acpi_string interface, u32 supported) > The changes were discussed and tested a while ago, and no crashes were observed. Thanks for solving PM issues in nouveau. Acked-by: Alex Hung ___ Nouveau mailing list Nouveau@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/nouveau
Re: [Nouveau] [PATCH] drm/nouveau: Accept 'legacy' format modifiers
On Fri, Jul 17, 2020 at 11:57:57AM -0700, James Jones wrote: > Accept the DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK() > family of modifiers to handle broken userspace > Xorg modesetting and Mesa drivers. > > Tested with Xorg 1.20 modesetting driver, > weston@c46c70dac84a4b3030cd05b380f9f410536690fc, > gnome & KDE wayland desktops from Ubuntu 18.04, > and sway 1.5 Just bikeshed, but maybe a few more words on what exactly is broken and how this works around it. Specifically why we only accept these, but don't advertise them. > > Signed-off-by: James Jones Needs Fixes: line here. Also nice to mention the bug reporter/link. > --- > drivers/gpu/drm/nouveau/nouveau_display.c | 26 +-- > 1 file changed, 24 insertions(+), 2 deletions(-) > > diff --git a/drivers/gpu/drm/nouveau/nouveau_display.c > b/drivers/gpu/drm/nouveau/nouveau_display.c > index 496c4621cc78..31543086254b 100644 > --- a/drivers/gpu/drm/nouveau/nouveau_display.c > +++ b/drivers/gpu/drm/nouveau/nouveau_display.c > @@ -191,8 +191,14 @@ nouveau_decode_mod(struct nouveau_drm *drm, > uint32_t *tile_mode, > uint8_t *kind) > { > + struct nouveau_display *disp = nouveau_display(drm->dev); > BUG_ON(!tile_mode || !kind); > > + if ((modifier & (0xffull << 12)) == 0ull) { > + /* Legacy modifier. Translate to this device's 'kind.' */ > + modifier |= disp->format_modifiers[0] & (0xffull << 12); > + } Hm I tried to understand what this magic does by looking at drm_fourcc.h, but the drm_fourcc_canonicalize_nvidia_format_mod() in there implements something else. Is that function wrong, or should we use it here instead? Or is there something else going on entirely? Cheers, Daniel > + > if (modifier == DRM_FORMAT_MOD_LINEAR) { > /* tile_mode will not be used in this case */ > *tile_mode = 0; > @@ -227,6 +233,16 @@ nouveau_framebuffer_get_layout(struct drm_framebuffer > *fb, > } > } > > +static const u64 legacy_modifiers[] = { > + DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(0), > + DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(1), > + DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(2), > + DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(3), > + DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(4), > + DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(5), > + DRM_FORMAT_MOD_INVALID > +}; > + > static int > nouveau_validate_decode_mod(struct nouveau_drm *drm, > uint64_t modifier, > @@ -247,8 +263,14 @@ nouveau_validate_decode_mod(struct nouveau_drm *drm, >(disp->format_modifiers[mod] != modifier); >mod++); > > - if (disp->format_modifiers[mod] == DRM_FORMAT_MOD_INVALID) > - return -EINVAL; > + if (disp->format_modifiers[mod] == DRM_FORMAT_MOD_INVALID) { > + for (mod = 0; > + (legacy_modifiers[mod] != DRM_FORMAT_MOD_INVALID) && > + (legacy_modifiers[mod] != modifier); > + mod++); > + if (legacy_modifiers[mod] == DRM_FORMAT_MOD_INVALID) > + return -EINVAL; > + } > > nouveau_decode_mod(drm, modifier, tile_mode, kind); > > -- > 2.17.1 > -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch ___ Nouveau mailing list Nouveau@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/nouveau
Re: [Nouveau] [PATCH] RFC: ACPI / OSI: remove workarounds for hybrid graphics laptops
Hey-just wanted to give some additional context I think Karol missed here. It should be noted that since the last time me and Karol looked at reverting these, we've already fixed all of the nasty runtime PM (e.g. runtime D3) issues we could find. This includes the infamous one that was affecting nearly all of the nvidia pascal (+ some maxwell 2 and turing, it ended up being that the intel PCI bridge was the culprit) machines on the market. Right now I'm only aware of one major issue we have, which is the result of a recent PCI core change that we're already in the process of getting reverted: https://lkml.org/lkml/2020/7/16/1288 [if you do any testing of runtime PM, you may need to temporarily revert this patch to make things work] So, really-runtime D3 is very much supported with nouveau. And we've put a _lot_ of effort into making sure of that, and are happy to fix any issues you're still finding. It also works just fine in AMD, but if you're finding systems it doesn't work with amd on then please let the amdgpu folks know upstream so they can properly fix it. Otherwise, these ACPI workarounds are likely making the power consumption on these systems (for nouveau, amdgpu, and radeon) dramatically higher then it needs to be. On Fri, 2020-07-17 at 21:05 +0200, Karol Herbst wrote: > It's hard to figure out what systems are actually affected and right now I > don't see a good way of removing those... > > But I'd like to see thos getting removed and drivers fixed instead (which > happened at least for nouveau). > > And as mentioned before, I prefer people working on fixing issues instead > of spending time to add firmware level workarounds which are hard to know > to which systems they apply to, hard to remove and basically a big huge > pain to work with. > In the end I have no idea how to even figure out what systems are affected > and which not by this, so I have no idea how to even verify we can safely > remove this (which just means those are impossible to remove unless we risk > breaking systems, which again makes those supper annoying to deal with). > > Also from the comments it's hard to get what those bits really do. Are they > just preventing runtime pm or do the devices are powered down when booting? > I am sure it's the former, still... > > Please, don't do this again. > > For now, those workaround prevent power savings on systems those workaround > applies to, which might be any so those should get removed asap and if > new issues arrise removing those please do a proper bug report and we can > look into it and come up with a proper fix (and keep this patch out until > we resolve all of those). > > Signed-off-by: Karol Herbst > CC: Alex Hung > CC: "Rafael J. Wysocki" > CC: Len Brown > CC: Lyude Paul > CC: linux-ker...@vger.kernel.org > CC: dri-de...@lists.freedesktop.org > CC: nouveau@lists.freedesktop.org > --- > drivers/acpi/osi.c | 24 > 1 file changed, 24 deletions(-) > > diff --git a/drivers/acpi/osi.c b/drivers/acpi/osi.c > index 9f68538091384..d4405e1ca9b97 100644 > --- a/drivers/acpi/osi.c > +++ b/drivers/acpi/osi.c > @@ -44,30 +44,6 @@ osi_setup_entries[OSI_STRING_ENTRIES_MAX] __initdata = { > {"Processor Device", true}, > {"3.0 _SCP Extensions", true}, > {"Processor Aggregator Device", true}, > - /* > - * Linux-Dell-Video is used by BIOS to disable RTD3 for NVidia graphics > - * cards as RTD3 is not supported by drivers now. Systems with NVidia > - * cards will hang without RTD3 disabled. > - * > - * Once NVidia drivers officially support RTD3, this _OSI strings can > - * be removed if both new and old graphics cards are supported. > - */ > - {"Linux-Dell-Video", true}, > - /* > - * Linux-Lenovo-NV-HDMI-Audio is used by BIOS to power on NVidia's HDMI > - * audio device which is turned off for power-saving in Windows OS. > - * This power management feature observed on some Lenovo Thinkpad > - * systems which will not be able to output audio via HDMI without > - * a BIOS workaround. > - */ > - {"Linux-Lenovo-NV-HDMI-Audio", true}, > - /* > - * Linux-HPI-Hybrid-Graphics is used by BIOS to enable dGPU to > - * output video directly to external monitors on HP Inc. mobile > - * workstations as Nvidia and AMD VGA drivers provide limited > - * hybrid graphics supports. > - */ > - {"Linux-HPI-Hybrid-Graphics", true}, > }; > > static u32 acpi_osi_handler(acpi_string interface, u32 supported) ___ Nouveau mailing list Nouveau@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/nouveau
[Nouveau] [PATCH] RFC: ACPI / OSI: remove workarounds for hybrid graphics laptops
It's hard to figure out what systems are actually affected and right now I don't see a good way of removing those... But I'd like to see thos getting removed and drivers fixed instead (which happened at least for nouveau). And as mentioned before, I prefer people working on fixing issues instead of spending time to add firmware level workarounds which are hard to know to which systems they apply to, hard to remove and basically a big huge pain to work with. In the end I have no idea how to even figure out what systems are affected and which not by this, so I have no idea how to even verify we can safely remove this (which just means those are impossible to remove unless we risk breaking systems, which again makes those supper annoying to deal with). Also from the comments it's hard to get what those bits really do. Are they just preventing runtime pm or do the devices are powered down when booting? I am sure it's the former, still... Please, don't do this again. For now, those workaround prevent power savings on systems those workaround applies to, which might be any so those should get removed asap and if new issues arrise removing those please do a proper bug report and we can look into it and come up with a proper fix (and keep this patch out until we resolve all of those). Signed-off-by: Karol Herbst CC: Alex Hung CC: "Rafael J. Wysocki" CC: Len Brown CC: Lyude Paul CC: linux-ker...@vger.kernel.org CC: dri-de...@lists.freedesktop.org CC: nouveau@lists.freedesktop.org --- drivers/acpi/osi.c | 24 1 file changed, 24 deletions(-) diff --git a/drivers/acpi/osi.c b/drivers/acpi/osi.c index 9f68538091384..d4405e1ca9b97 100644 --- a/drivers/acpi/osi.c +++ b/drivers/acpi/osi.c @@ -44,30 +44,6 @@ osi_setup_entries[OSI_STRING_ENTRIES_MAX] __initdata = { {"Processor Device", true}, {"3.0 _SCP Extensions", true}, {"Processor Aggregator Device", true}, - /* -* Linux-Dell-Video is used by BIOS to disable RTD3 for NVidia graphics -* cards as RTD3 is not supported by drivers now. Systems with NVidia -* cards will hang without RTD3 disabled. -* -* Once NVidia drivers officially support RTD3, this _OSI strings can -* be removed if both new and old graphics cards are supported. -*/ - {"Linux-Dell-Video", true}, - /* -* Linux-Lenovo-NV-HDMI-Audio is used by BIOS to power on NVidia's HDMI -* audio device which is turned off for power-saving in Windows OS. -* This power management feature observed on some Lenovo Thinkpad -* systems which will not be able to output audio via HDMI without -* a BIOS workaround. -*/ - {"Linux-Lenovo-NV-HDMI-Audio", true}, - /* -* Linux-HPI-Hybrid-Graphics is used by BIOS to enable dGPU to -* output video directly to external monitors on HP Inc. mobile -* workstations as Nvidia and AMD VGA drivers provide limited -* hybrid graphics supports. -*/ - {"Linux-HPI-Hybrid-Graphics", true}, }; static u32 acpi_osi_handler(acpi_string interface, u32 supported) -- 2.26.2 ___ Nouveau mailing list Nouveau@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/nouveau
Re: [Nouveau] nouveau regression with 5.7 caused by "PCI/PM: Assume ports without DLL Link Active train links in 100 ms"
On Thu, 2020-07-16 at 18:54 -0500, Bjorn Helgaas wrote: > [+cc Sasha -- stable kernel regression] > [+cc Patrick, Kai-Heng, LKML] > > On Fri, Jul 17, 2020 at 12:10:39AM +0200, Karol Herbst wrote: > > On Tue, Jul 7, 2020 at 9:30 PM Karol Herbst wrote: > > > Hi everybody, > > > > > > with the mentioned commit Nouveau isn't able to load firmware onto the > > > GPU on one of my systems here. Even though the issue doesn't always > > > happen I am quite confident this is the commit breaking it. > > > > > > I am still digging into the issue and trying to figure out what > > > exactly breaks, but it shows up in different ways. Either we are not > > > able to boot the engines on the GPU or the GPU becomes unresponsive. > > > Btw, this is also a system where our runtime power management issue > > > shows up, so maybe there is indeed something funky with the bridge > > > controller. > > > > > > Just pinging you in case you have an idea on how this could break Nouveau > > > > > > most of the times it shows up like this: > > > nouveau :01:00.0: acr: AHESASC binary failed > > > > > > Sometimes it works at boot and fails at runtime resuming with random > > > faults. So I will be investigating a bit more, but yeah... I am super > > > sure the commit triggered this issue, no idea if it actually causes > > > it. > > > > so yeah.. I reverted that locally and never ran into issues again. > > Still valid on latest 5.7. So can we get this reverted or properly > > fixed? This breaks runtime pm for us on at least some hardware. > > Yeah, that stinks. We had another similar report from Patrick: > > > https://lore.kernel.org/r/caerspo5stek_my1dehwp7ahd0xop87+ohywktjbl7algdbx...@mail.gmail.com > > Apparently the problem is ec411e02b7a2 ("PCI/PM: Assume ports without > DLL Link Active train links in 100 ms"), which Patrick found was > backported to v5.4.49 as 828b192c57e8, and you found was backported to > v5.7.6 as afaff825e3a4. > > Oddly, Patrick reported that v5.7.7 worked correctly, even though it > still contains afaff825e3a4. > > I guess in the absence of any other clues we'll have to revert it. > I hate to do that because that means we'll have slow resume of > Thunderbolt-connected devices again, but that's better than having > GPUs completely broken. > > Could you and Patrick open bugzilla.kernel.org reports, attach dmesg > logs and "sudo lspci -vv" output, and add the URLs to Kai-Heng's > original report at https://bugzilla.kernel.org/show_bug.cgi?id=206837 > and to this thread? > > There must be a way to fix the slow resume problem without breaking > the GPUs. Isn't it possible to tell whether a PCI device is connected through thunderbolt or not? We could probably get away with just defaulting to 100ms for thunderbolt devices without DLL Link Active specified, and then default to the old delay value for non-thunderbolt devices. > > Bjorn > ___ Nouveau mailing list Nouveau@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/nouveau
Re: [Nouveau] [PATCH] drm/nouveau: Accept 'legacy' format modifiers
This should resolve the inability to start X with the new NV format modifier support in nouveau. FYI, I'm offline next week, but I'll check in tonight in case there are any review comments. When I'm back, I'll get the associated userspace fixes cleaned up and out to the appropriate lists. Thanks, -James On 7/17/20 11:57 AM, James Jones wrote: Accept the DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK() family of modifiers to handle broken userspace Xorg modesetting and Mesa drivers. Tested with Xorg 1.20 modesetting driver, weston@c46c70dac84a4b3030cd05b380f9f410536690fc, gnome & KDE wayland desktops from Ubuntu 18.04, and sway 1.5 Signed-off-by: James Jones --- drivers/gpu/drm/nouveau/nouveau_display.c | 26 +-- 1 file changed, 24 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/nouveau/nouveau_display.c b/drivers/gpu/drm/nouveau/nouveau_display.c index 496c4621cc78..31543086254b 100644 --- a/drivers/gpu/drm/nouveau/nouveau_display.c +++ b/drivers/gpu/drm/nouveau/nouveau_display.c @@ -191,8 +191,14 @@ nouveau_decode_mod(struct nouveau_drm *drm, uint32_t *tile_mode, uint8_t *kind) { + struct nouveau_display *disp = nouveau_display(drm->dev); BUG_ON(!tile_mode || !kind); + if ((modifier & (0xffull << 12)) == 0ull) { + /* Legacy modifier. Translate to this device's 'kind.' */ + modifier |= disp->format_modifiers[0] & (0xffull << 12); + } + if (modifier == DRM_FORMAT_MOD_LINEAR) { /* tile_mode will not be used in this case */ *tile_mode = 0; @@ -227,6 +233,16 @@ nouveau_framebuffer_get_layout(struct drm_framebuffer *fb, } } +static const u64 legacy_modifiers[] = { + DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(0), + DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(1), + DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(2), + DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(3), + DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(4), + DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(5), + DRM_FORMAT_MOD_INVALID +}; + static int nouveau_validate_decode_mod(struct nouveau_drm *drm, uint64_t modifier, @@ -247,8 +263,14 @@ nouveau_validate_decode_mod(struct nouveau_drm *drm, (disp->format_modifiers[mod] != modifier); mod++); - if (disp->format_modifiers[mod] == DRM_FORMAT_MOD_INVALID) - return -EINVAL; + if (disp->format_modifiers[mod] == DRM_FORMAT_MOD_INVALID) { + for (mod = 0; +(legacy_modifiers[mod] != DRM_FORMAT_MOD_INVALID) && +(legacy_modifiers[mod] != modifier); +mod++); + if (legacy_modifiers[mod] == DRM_FORMAT_MOD_INVALID) + return -EINVAL; + } nouveau_decode_mod(drm, modifier, tile_mode, kind); ___ Nouveau mailing list Nouveau@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/nouveau
[Nouveau] [PATCH] drm/nouveau: Accept 'legacy' format modifiers
Accept the DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK() family of modifiers to handle broken userspace Xorg modesetting and Mesa drivers. Tested with Xorg 1.20 modesetting driver, weston@c46c70dac84a4b3030cd05b380f9f410536690fc, gnome & KDE wayland desktops from Ubuntu 18.04, and sway 1.5 Signed-off-by: James Jones --- drivers/gpu/drm/nouveau/nouveau_display.c | 26 +-- 1 file changed, 24 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/nouveau/nouveau_display.c b/drivers/gpu/drm/nouveau/nouveau_display.c index 496c4621cc78..31543086254b 100644 --- a/drivers/gpu/drm/nouveau/nouveau_display.c +++ b/drivers/gpu/drm/nouveau/nouveau_display.c @@ -191,8 +191,14 @@ nouveau_decode_mod(struct nouveau_drm *drm, uint32_t *tile_mode, uint8_t *kind) { + struct nouveau_display *disp = nouveau_display(drm->dev); BUG_ON(!tile_mode || !kind); + if ((modifier & (0xffull << 12)) == 0ull) { + /* Legacy modifier. Translate to this device's 'kind.' */ + modifier |= disp->format_modifiers[0] & (0xffull << 12); + } + if (modifier == DRM_FORMAT_MOD_LINEAR) { /* tile_mode will not be used in this case */ *tile_mode = 0; @@ -227,6 +233,16 @@ nouveau_framebuffer_get_layout(struct drm_framebuffer *fb, } } +static const u64 legacy_modifiers[] = { + DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(0), + DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(1), + DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(2), + DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(3), + DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(4), + DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(5), + DRM_FORMAT_MOD_INVALID +}; + static int nouveau_validate_decode_mod(struct nouveau_drm *drm, uint64_t modifier, @@ -247,8 +263,14 @@ nouveau_validate_decode_mod(struct nouveau_drm *drm, (disp->format_modifiers[mod] != modifier); mod++); - if (disp->format_modifiers[mod] == DRM_FORMAT_MOD_INVALID) - return -EINVAL; + if (disp->format_modifiers[mod] == DRM_FORMAT_MOD_INVALID) { + for (mod = 0; +(legacy_modifiers[mod] != DRM_FORMAT_MOD_INVALID) && +(legacy_modifiers[mod] != modifier); +mod++); + if (legacy_modifiers[mod] == DRM_FORMAT_MOD_INVALID) + return -EINVAL; + } nouveau_decode_mod(drm, modifier, tile_mode, kind); -- 2.17.1 ___ Nouveau mailing list Nouveau@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/nouveau
Re: [Nouveau] nouveau regression with 5.7 caused by "PCI/PM: Assume ports without DLL Link Active train links in 100 ms"
On Fri, Jul 17, 2020 at 02:43:52AM +0200, Karol Herbst wrote: On Fri, Jul 17, 2020 at 1:54 AM Bjorn Helgaas wrote: [+cc Sasha -- stable kernel regression] [+cc Patrick, Kai-Heng, LKML] On Fri, Jul 17, 2020 at 12:10:39AM +0200, Karol Herbst wrote: > On Tue, Jul 7, 2020 at 9:30 PM Karol Herbst wrote: > > > > Hi everybody, > > > > with the mentioned commit Nouveau isn't able to load firmware onto the > > GPU on one of my systems here. Even though the issue doesn't always > > happen I am quite confident this is the commit breaking it. > > > > I am still digging into the issue and trying to figure out what > > exactly breaks, but it shows up in different ways. Either we are not > > able to boot the engines on the GPU or the GPU becomes unresponsive. > > Btw, this is also a system where our runtime power management issue > > shows up, so maybe there is indeed something funky with the bridge > > controller. > > > > Just pinging you in case you have an idea on how this could break Nouveau > > > > most of the times it shows up like this: > > nouveau :01:00.0: acr: AHESASC binary failed > > > > Sometimes it works at boot and fails at runtime resuming with random > > faults. So I will be investigating a bit more, but yeah... I am super > > sure the commit triggered this issue, no idea if it actually causes > > it. > > so yeah.. I reverted that locally and never ran into issues again. > Still valid on latest 5.7. So can we get this reverted or properly > fixed? This breaks runtime pm for us on at least some hardware. Yeah, that stinks. We had another similar report from Patrick: https://lore.kernel.org/r/caerspo5stek_my1dehwp7ahd0xop87+ohywktjbl7algdbx...@mail.gmail.com Apparently the problem is ec411e02b7a2 ("PCI/PM: Assume ports without DLL Link Active train links in 100 ms"), which Patrick found was backported to v5.4.49 as 828b192c57e8, and you found was backported to v5.7.6 as afaff825e3a4. Oddly, Patrick reported that v5.7.7 worked correctly, even though it still contains afaff825e3a4. I guess in the absence of any other clues we'll have to revert it. I hate to do that because that means we'll have slow resume of Thunderbolt-connected devices again, but that's better than having GPUs completely broken. Could you and Patrick open bugzilla.kernel.org reports, attach dmesg logs and "sudo lspci -vv" output, and add the URLs to Kai-Heng's original report at https://bugzilla.kernel.org/show_bug.cgi?id=206837 and to this thread? There must be a way to fix the slow resume problem without breaking the GPUs. I wouldn't be surprised if this is related to the Intel bridge we check against for Nouveau.. I still have to check on another laptop with the same bridge our workaround was required as well but wouldn't be surprised if it shows the same problem. Will get you the information from both systems tomorrow then. I take it that ec411e02b7a2 will be reverted upstream? -- Thanks, Sasha ___ Nouveau mailing list Nouveau@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/nouveau
Re: [Nouveau] nouveau regression with 5.7 caused by "PCI/PM: Assume ports without DLL Link Active train links in 100 ms"
Filed at https://bugzilla.kernel.org/show_bug.cgi?id=208597 oddly enough I wasn't able to reproduce it on my XPS 9560, will ping once something breaks. On Fri, Jul 17, 2020 at 2:43 AM Karol Herbst wrote: > > On Fri, Jul 17, 2020 at 1:54 AM Bjorn Helgaas wrote: > > > > [+cc Sasha -- stable kernel regression] > > [+cc Patrick, Kai-Heng, LKML] > > > > On Fri, Jul 17, 2020 at 12:10:39AM +0200, Karol Herbst wrote: > > > On Tue, Jul 7, 2020 at 9:30 PM Karol Herbst wrote: > > > > > > > > Hi everybody, > > > > > > > > with the mentioned commit Nouveau isn't able to load firmware onto the > > > > GPU on one of my systems here. Even though the issue doesn't always > > > > happen I am quite confident this is the commit breaking it. > > > > > > > > I am still digging into the issue and trying to figure out what > > > > exactly breaks, but it shows up in different ways. Either we are not > > > > able to boot the engines on the GPU or the GPU becomes unresponsive. > > > > Btw, this is also a system where our runtime power management issue > > > > shows up, so maybe there is indeed something funky with the bridge > > > > controller. > > > > > > > > Just pinging you in case you have an idea on how this could break > > > > Nouveau > > > > > > > > most of the times it shows up like this: > > > > nouveau :01:00.0: acr: AHESASC binary failed > > > > > > > > Sometimes it works at boot and fails at runtime resuming with random > > > > faults. So I will be investigating a bit more, but yeah... I am super > > > > sure the commit triggered this issue, no idea if it actually causes > > > > it. > > > > > > so yeah.. I reverted that locally and never ran into issues again. > > > Still valid on latest 5.7. So can we get this reverted or properly > > > fixed? This breaks runtime pm for us on at least some hardware. > > > > Yeah, that stinks. We had another similar report from Patrick: > > > > > > https://lore.kernel.org/r/caerspo5stek_my1dehwp7ahd0xop87+ohywktjbl7algdbx...@mail.gmail.com > > > > Apparently the problem is ec411e02b7a2 ("PCI/PM: Assume ports without > > DLL Link Active train links in 100 ms"), which Patrick found was > > backported to v5.4.49 as 828b192c57e8, and you found was backported to > > v5.7.6 as afaff825e3a4. > > > > Oddly, Patrick reported that v5.7.7 worked correctly, even though it > > still contains afaff825e3a4. > > > > I guess in the absence of any other clues we'll have to revert it. > > I hate to do that because that means we'll have slow resume of > > Thunderbolt-connected devices again, but that's better than having > > GPUs completely broken. > > > > Could you and Patrick open bugzilla.kernel.org reports, attach dmesg > > logs and "sudo lspci -vv" output, and add the URLs to Kai-Heng's > > original report at https://bugzilla.kernel.org/show_bug.cgi?id=206837 > > and to this thread? > > > > There must be a way to fix the slow resume problem without breaking > > the GPUs. > > > > I wouldn't be surprised if this is related to the Intel bridge we > check against for Nouveau.. I still have to check on another laptop > with the same bridge our workaround was required as well but wouldn't > be surprised if it shows the same problem. Will get you the > information from both systems tomorrow then. > > > Bjorn > > ___ Nouveau mailing list Nouveau@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/nouveau