Re: [Nouveau] [PATCH] drm/nouveau: Accept 'legacy' format modifiers

2020-07-17 Thread James Jones
Did you just cherry-pick my change, or were you running the latest 
drm-next or drm-fixes code?  There do appear to be various MM-related 
fixes that may be related to this in drm-fixes when I scroll down the 
log looking for nouveau stuff.  Shot in the dark, but might be worth 
trying with Dave's tree if you weren't already.  I was testing with 
drm-fixes-2020-07-17-1 from here:


git://anongit.freedesktop.org/drm/drm

Thanks,
-James

On 7/17/20 8:13 PM, James Jones wrote:

This doesn't look related.

-James

On 7/17/20 2:30 PM, Kirill A. Shutemov wrote:

On Fri, Jul 17, 2020 at 11:57:57AM -0700, James Jones wrote:

Accept the DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK()
family of modifiers to handle broken userspace
Xorg modesetting and Mesa drivers.

Tested with Xorg 1.20 modesetting driver,
weston@c46c70dac84a4b3030cd05b380f9f410536690fc,
gnome & KDE wayland desktops from Ubuntu 18.04,
and sway 1.5

Signed-off-by: James Jones 


I tried and it crashes. Not sure if it's related.

[drm:drm_ioctl] pid=3327, dev=0xe200, auth=1, NOUVEAU_GEM_PUSHBUF
[drm:vblank_disable_fn] disabling vblank on crtc 0
[drm:drm_ioctl] pid=3351, dev=0xe200, auth=1, NOUVEAU_GEM_PUSHBUF
[drm:drm_ioctl] pid=3351, dev=0xe200, auth=1, NOUVEAU_GEM_CPU_PREP
[drm:drm_ioctl] pid=3351, dev=0xe200, auth=1, NOUVEAU_GEM_PUSHBUF
[drm:drm_ioctl] pid=3351, dev=0xe200, auth=1, NOUVEAU_GEM_PUSHBUF
BUG: unable to handle page fault for address: 059c
#PF: supervisor read access in kernel mode
#PF: error_code(0x) - not-present page
PGD 0 P4D 0
Oops:  [#1] PREEMPT SMP PTI
CPU: 13 PID: 3351 Comm: alacritty Tainted: G  I   
5.8.0-rc5-00191-g086f86c033f9 #53
Hardware name: Gigabyte Technology Co., Ltd. X299 AORUS Gaming 3 
Pro/X299 AORUS Gaming 3 Pro-CF, BIOS F5d 11/28/2019
RIP: 0010:kmem_cache_alloc_trace 
(/home/kas/linux/torvalds/mm/slub.c:272 
/home/kas/linux/torvalds/mm/slub.c:278 
/home/kas/linux/torvalds/mm/slub.c:292 
/home/kas/linux/torvalds/mm/slub.c:2791 
/home/kas/linux/torvalds/mm/slub.c:2832 
/home/kas/linux/torvalds/mm/slub.c:2849)
Code: 8b 51 08 48 89 c8 65 48 03 05 d4 0e ca 70 48 8b 70 08 48 39 f2 
75 e7 4c 8b 38 4d 85 ff 0f 84 8f 01 00 00 8b 45 20 48 8b 7d 00 <49> 8b 
1c 07 40 f6 c7 0f 0f 85 95 01 00 00 48 8d 8a 80 00 00 00 4c

All code

    0:    8b 51 08 mov    0x8(%rcx),%edx
    3:    48 89 c8 mov    %rcx,%rax
    6:    65 48 03 05 d4 0e ca add
%gs:0x70ca0ed4(%rip),%rax    # 0x70ca0ee2

    d:    70
    e:    48 8b 70 08  mov    0x8(%rax),%rsi
   12:    48 39 f2 cmp    %rsi,%rdx
   15:    75 e7    jne    0xfffe
   17:    4c 8b 38 mov    (%rax),%r15
   1a:    4d 85 ff test   %r15,%r15
   1d:    0f 84 8f 01 00 00    je 0x1b2
   23:    8b 45 20 mov    0x20(%rbp),%eax
   26:    48 8b 7d 00  mov    0x0(%rbp),%rdi
   2a:*    49 8b 1c 07  mov    (%r15,%rax,1),%rbx
<-- trapping instruction

   2e:    40 f6 c7 0f  test   $0xf,%dil
   32:    0f 85 95 01 00 00    jne    0x1cd
   38:    48 8d 8a 80 00 00 00 lea    0x80(%rdx),%rcx
   3f:    4c   rex.WR

Code starting with the faulting instruction
===
    0:    49 8b 1c 07  mov    (%r15,%rax,1),%rbx
    4:    40 f6 c7 0f  test   $0xf,%dil
    8:    0f 85 95 01 00 00    jne    0x1a3
    e:    48 8d 8a 80 00 00 00 lea    0x80(%rdx),%rcx
   15:    4c   rex.WR
RSP: 0018:a8a381bcfba0 EFLAGS: 00010206
RAX: 0030 RBX: 9c0b15e05e00 RCX: 0003fe50
RDX: fc8d RSI: fc8d RDI: 0003fe50
RBP: 9c0b18407840 R08:  R09: 0001
R10: 9c0b06c28000 R11: 0001 R12: 0dc0
[drm:drm_ioctl] pid=3327, dev=0xe200, auth=1, DRM_IOCTL_CRTC_GET_SEQUENCE
R13: 0060 R14: 8fa35a47 R15: 056c
FS:  7fbe7a8e3900() GS:9c0b1f88() 
knlGS:

CS:  0010 DS:  ES:  CR0: 80050033
CR2: 059c CR3: 00103c7fe004 CR4: 003606e0
DR0:  DR1:  DR2: 
DR3:  DR6: fffe0ff0 DR7: 0400
Call Trace:
nouveau_fence_new (/home/kas/linux/torvalds/include/linux/slab.h:555 
/home/kas/linux/torvalds/include/linux/slab.h:669 
/home/kas/linux/torvalds/drivers/gpu/drm/nouveau/nouveau_fence.c:423)

[drm:drm_vblank_enable] enabling vblank on crtc 0, ret: 0
nouveau_gem_ioctl_pushbuf 
(/home/kas/linux/torvalds/drivers/gpu/drm/nouveau/nouveau_gem.c:852)
[drm:drm_ioctl] pid=3327, dev=0xe200, auth=1, 
DRM_IOCTL_CRTC_QUEUE_SEQUENCE
? nouveau_gem_ioctl_new 
(/home/kas/linux/torvalds/drivers/gpu/drm/nouveau/nouveau_gem.c:680)
? drm_ioctl_kernel 
(/home/kas/linux/torvalds/drivers/gpu/drm/drm_ioctl.c:793)
? nouveau_gem_ioctl_new 

Re: [Nouveau] [PATCH] drm/nouveau: Accept 'legacy' format modifiers

2020-07-17 Thread James Jones

On 7/17/20 12:47 PM, Daniel Vetter wrote:

On Fri, Jul 17, 2020 at 11:57:57AM -0700, James Jones wrote:

Accept the DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK()
family of modifiers to handle broken userspace
Xorg modesetting and Mesa drivers.

Tested with Xorg 1.20 modesetting driver,
weston@c46c70dac84a4b3030cd05b380f9f410536690fc,
gnome & KDE wayland desktops from Ubuntu 18.04,
and sway 1.5


Just bikeshed, but maybe a few more words on what exactly is broken and
how this works around it. Specifically why we only accept these, but don't
advertise them.


Added quite a few words.



Signed-off-by: James Jones 


Needs Fixes: line here. Also nice to mention the bug reporter/link.


Done in v2.


---
  drivers/gpu/drm/nouveau/nouveau_display.c | 26 +--
  1 file changed, 24 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_display.c 
b/drivers/gpu/drm/nouveau/nouveau_display.c
index 496c4621cc78..31543086254b 100644
--- a/drivers/gpu/drm/nouveau/nouveau_display.c
+++ b/drivers/gpu/drm/nouveau/nouveau_display.c
@@ -191,8 +191,14 @@ nouveau_decode_mod(struct nouveau_drm *drm,
   uint32_t *tile_mode,
   uint8_t *kind)
  {
+   struct nouveau_display *disp = nouveau_display(drm->dev);
BUG_ON(!tile_mode || !kind);
  
+	if ((modifier & (0xffull << 12)) == 0ull) {

+   /* Legacy modifier.  Translate to this device's 'kind.' */
+   modifier |= disp->format_modifiers[0] & (0xffull << 12);
+   }


Hm I tried to understand what this magic does by looking at drm_fourcc.h,
but the drm_fourcc_canonicalize_nvidia_format_mod() in there implements
something else. Is that function wrong, or should we use it here instead?


> Or is there something else going on entirely?

This may be slightly clearer with the expanded change description:

Canonicalize assumes the old modifiers are only used by certain Tegra 
revisions, because the Mesa patches were supposed to land and obliterate 
all uses beyond that.  That assumption means it can assume the specific 
page kind (0xfe) used by the display-engine-compatible layout on those 
specific devices.  There is no way to generally canonicalize a legacy 
modifier without referencing a specific device type, as is indirectly 
done here.


This code does a limited device-specific canonicalization: It 
substitutes the display-appropriate page kind used by this specific 
device, ensuring we derive this correct page kind later in the function. 
 I iterated on the best way to accomplish this a few times, and this 
was the least-invasive thing I came up with, but it does require a 
pretty thorough understanding of the NVIDIA modifier macros.


Thanks for the quick review.

-James



Cheers, Daniel


+
if (modifier == DRM_FORMAT_MOD_LINEAR) {
/* tile_mode will not be used in this case */
*tile_mode = 0;
@@ -227,6 +233,16 @@ nouveau_framebuffer_get_layout(struct drm_framebuffer *fb,
}
  }
  
+static const u64 legacy_modifiers[] = {

+   DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(0),
+   DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(1),
+   DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(2),
+   DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(3),
+   DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(4),
+   DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(5),
+   DRM_FORMAT_MOD_INVALID
+};
+
  static int
  nouveau_validate_decode_mod(struct nouveau_drm *drm,
uint64_t modifier,
@@ -247,8 +263,14 @@ nouveau_validate_decode_mod(struct nouveau_drm *drm,
 (disp->format_modifiers[mod] != modifier);
 mod++);
  
-	if (disp->format_modifiers[mod] == DRM_FORMAT_MOD_INVALID)

-   return -EINVAL;
+   if (disp->format_modifiers[mod] == DRM_FORMAT_MOD_INVALID) {
+   for (mod = 0;
+(legacy_modifiers[mod] != DRM_FORMAT_MOD_INVALID) &&
+(legacy_modifiers[mod] != modifier);
+mod++);
+   if (legacy_modifiers[mod] == DRM_FORMAT_MOD_INVALID)
+   return -EINVAL;
+   }
  
  	nouveau_decode_mod(drm, modifier, tile_mode, kind);
  
--

2.17.1




___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau


[Nouveau] [PATCH v2] drm/nouveau: Accept 'legacy' format modifiers

2020-07-17 Thread James Jones
Accept the DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK()
family of modifiers to handle broken userspace
Xorg modesetting and Mesa drivers. Existing Mesa
drivers are still aware of only these older
format modifiers which do not differentiate
between different variations of the block linear
layout. When the format modifier support flag was
flipped in the nouveau kernel driver, the X.org
modesetting driver began attempting to use its
format modifier-enabled framebuffer path. Because
the set of format modifiers advertised by the
kernel prior to this change do not intersect with
the set of format modifiers advertised by Mesa,
allocating GBM buffers using format modifiers
fails and the modesetting driver falls back to
non-modifier allocation. However, it still later
queries the modifier of the GBM buffer when
creating its DRM-KMS framebuffer object, receives
the old-format modifier from Mesa, and attempts
to create a framebuffer with it. Since the kernel
is still not aware of these formats, this fails.

Userspace should not be attempting to query format
modifiers of GBM buffers allocated with a non-
format-modifier-aware allocation path, but to
avoid breaking existing userspace behavior, this
change accepts the old-style format modifiers when
creating framebuffers and applying them to planes
by translating them to the equivalent new-style
modifier. To accomplish this, some layout
parameters must be assumed to match properties of
the device targeted by the relevant ioctls. To
avoid perpetuating misuse of the old-style
modifiers, this change does not advertise support
for them. Doing so would imply compatibility
between devices with incompatible memory layouts.

Tested with Xorg 1.20 modesetting driver,
weston@c46c70dac84a4b3030cd05b380f9f410536690fc,
gnome & KDE wayland desktops from Ubuntu 18.04,
and sway 1.5

Reported-by: Kirill A. Shutemov 
Fixes: fa4f4c213f5f ("drm/nouveau/kms: Support NVIDIA format modifiers")
Link: https://lkml.org/lkml/2020/6/30/1251
Signed-off-by: James Jones 
---
 drivers/gpu/drm/nouveau/nouveau_display.c | 26 +--
 1 file changed, 24 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_display.c 
b/drivers/gpu/drm/nouveau/nouveau_display.c
index 496c4621cc78..31543086254b 100644
--- a/drivers/gpu/drm/nouveau/nouveau_display.c
+++ b/drivers/gpu/drm/nouveau/nouveau_display.c
@@ -191,8 +191,14 @@ nouveau_decode_mod(struct nouveau_drm *drm,
   uint32_t *tile_mode,
   uint8_t *kind)
 {
+   struct nouveau_display *disp = nouveau_display(drm->dev);
BUG_ON(!tile_mode || !kind);
 
+   if ((modifier & (0xffull << 12)) == 0ull) {
+   /* Legacy modifier.  Translate to this device's 'kind.' */
+   modifier |= disp->format_modifiers[0] & (0xffull << 12);
+   }
+
if (modifier == DRM_FORMAT_MOD_LINEAR) {
/* tile_mode will not be used in this case */
*tile_mode = 0;
@@ -227,6 +233,16 @@ nouveau_framebuffer_get_layout(struct drm_framebuffer *fb,
}
 }
 
+static const u64 legacy_modifiers[] = {
+   DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(0),
+   DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(1),
+   DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(2),
+   DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(3),
+   DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(4),
+   DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(5),
+   DRM_FORMAT_MOD_INVALID
+};
+
 static int
 nouveau_validate_decode_mod(struct nouveau_drm *drm,
uint64_t modifier,
@@ -247,8 +263,14 @@ nouveau_validate_decode_mod(struct nouveau_drm *drm,
 (disp->format_modifiers[mod] != modifier);
 mod++);
 
-   if (disp->format_modifiers[mod] == DRM_FORMAT_MOD_INVALID)
-   return -EINVAL;
+   if (disp->format_modifiers[mod] == DRM_FORMAT_MOD_INVALID) {
+   for (mod = 0;
+(legacy_modifiers[mod] != DRM_FORMAT_MOD_INVALID) &&
+(legacy_modifiers[mod] != modifier);
+mod++);
+   if (legacy_modifiers[mod] == DRM_FORMAT_MOD_INVALID)
+   return -EINVAL;
+   }
 
nouveau_decode_mod(drm, modifier, tile_mode, kind);
 
-- 
2.17.1

___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau


Re: [Nouveau] [PATCH] drm/nouveau: Accept 'legacy' format modifiers

2020-07-17 Thread James Jones

This doesn't look related.

-James

On 7/17/20 2:30 PM, Kirill A. Shutemov wrote:

On Fri, Jul 17, 2020 at 11:57:57AM -0700, James Jones wrote:

Accept the DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK()
family of modifiers to handle broken userspace
Xorg modesetting and Mesa drivers.

Tested with Xorg 1.20 modesetting driver,
weston@c46c70dac84a4b3030cd05b380f9f410536690fc,
gnome & KDE wayland desktops from Ubuntu 18.04,
and sway 1.5

Signed-off-by: James Jones 


I tried and it crashes. Not sure if it's related.

[drm:drm_ioctl] pid=3327, dev=0xe200, auth=1, NOUVEAU_GEM_PUSHBUF
[drm:vblank_disable_fn] disabling vblank on crtc 0
[drm:drm_ioctl] pid=3351, dev=0xe200, auth=1, NOUVEAU_GEM_PUSHBUF
[drm:drm_ioctl] pid=3351, dev=0xe200, auth=1, NOUVEAU_GEM_CPU_PREP
[drm:drm_ioctl] pid=3351, dev=0xe200, auth=1, NOUVEAU_GEM_PUSHBUF
[drm:drm_ioctl] pid=3351, dev=0xe200, auth=1, NOUVEAU_GEM_PUSHBUF
BUG: unable to handle page fault for address: 059c
#PF: supervisor read access in kernel mode
#PF: error_code(0x) - not-present page
PGD 0 P4D 0
Oops:  [#1] PREEMPT SMP PTI
CPU: 13 PID: 3351 Comm: alacritty Tainted: G  I   
5.8.0-rc5-00191-g086f86c033f9 #53
Hardware name: Gigabyte Technology Co., Ltd. X299 AORUS Gaming 3 Pro/X299 AORUS 
Gaming 3 Pro-CF, BIOS F5d 11/28/2019
RIP: 0010:kmem_cache_alloc_trace (/home/kas/linux/torvalds/mm/slub.c:272 
/home/kas/linux/torvalds/mm/slub.c:278 /home/kas/linux/torvalds/mm/slub.c:292 
/home/kas/linux/torvalds/mm/slub.c:2791 /home/kas/linux/torvalds/mm/slub.c:2832 
/home/kas/linux/torvalds/mm/slub.c:2849)
Code: 8b 51 08 48 89 c8 65 48 03 05 d4 0e ca 70 48 8b 70 08 48 39 f2 75 e7 4c 8b 38 
4d 85 ff 0f 84 8f 01 00 00 8b 45 20 48 8b 7d 00 <49> 8b 1c 07 40 f6 c7 0f 0f 85 
95 01 00 00 48 8d 8a 80 00 00 00 4c
All code

0:  8b 51 08mov0x8(%rcx),%edx
3:  48 89 c8mov%rcx,%rax
6:  65 48 03 05 d4 0e caadd%gs:0x70ca0ed4(%rip),%rax# 
0x70ca0ee2
d:  70
e:  48 8b 70 08 mov0x8(%rax),%rsi
   12:  48 39 f2cmp%rsi,%rdx
   15:  75 e7   jne0xfffe
   17:  4c 8b 38mov(%rax),%r15
   1a:  4d 85 fftest   %r15,%r15
   1d:  0f 84 8f 01 00 00   je 0x1b2
   23:  8b 45 20mov0x20(%rbp),%eax
   26:  48 8b 7d 00 mov0x0(%rbp),%rdi
   2a:* 49 8b 1c 07 mov(%r15,%rax,1),%rbx   <-- 
trapping instruction
   2e:  40 f6 c7 0f test   $0xf,%dil
   32:  0f 85 95 01 00 00   jne0x1cd
   38:  48 8d 8a 80 00 00 00lea0x80(%rdx),%rcx
   3f:  4c  rex.WR

Code starting with the faulting instruction
===
0:  49 8b 1c 07 mov(%r15,%rax,1),%rbx
4:  40 f6 c7 0f test   $0xf,%dil
8:  0f 85 95 01 00 00   jne0x1a3
e:  48 8d 8a 80 00 00 00lea0x80(%rdx),%rcx
   15:  4c  rex.WR
RSP: 0018:a8a381bcfba0 EFLAGS: 00010206
RAX: 0030 RBX: 9c0b15e05e00 RCX: 0003fe50
RDX: fc8d RSI: fc8d RDI: 0003fe50
RBP: 9c0b18407840 R08:  R09: 0001
R10: 9c0b06c28000 R11: 0001 R12: 0dc0
[drm:drm_ioctl] pid=3327, dev=0xe200, auth=1, DRM_IOCTL_CRTC_GET_SEQUENCE
R13: 0060 R14: 8fa35a47 R15: 056c
FS:  7fbe7a8e3900() GS:9c0b1f88() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 059c CR3: 00103c7fe004 CR4: 003606e0
DR0:  DR1:  DR2: 
DR3:  DR6: fffe0ff0 DR7: 0400
Call Trace:
nouveau_fence_new (/home/kas/linux/torvalds/include/linux/slab.h:555 
/home/kas/linux/torvalds/include/linux/slab.h:669 
/home/kas/linux/torvalds/drivers/gpu/drm/nouveau/nouveau_fence.c:423)
[drm:drm_vblank_enable] enabling vblank on crtc 0, ret: 0
nouveau_gem_ioctl_pushbuf 
(/home/kas/linux/torvalds/drivers/gpu/drm/nouveau/nouveau_gem.c:852)
[drm:drm_ioctl] pid=3327, dev=0xe200, auth=1, DRM_IOCTL_CRTC_QUEUE_SEQUENCE
? nouveau_gem_ioctl_new 
(/home/kas/linux/torvalds/drivers/gpu/drm/nouveau/nouveau_gem.c:680)
? drm_ioctl_kernel (/home/kas/linux/torvalds/drivers/gpu/drm/drm_ioctl.c:793)
? nouveau_gem_ioctl_new 
(/home/kas/linux/torvalds/drivers/gpu/drm/nouveau/nouveau_gem.c:680)
drm_ioctl_kernel (/home/kas/linux/torvalds/drivers/gpu/drm/drm_ioctl.c:793)
drm_ioctl (/home/kas/linux/torvalds/drivers/gpu/drm/drm_ioctl.c:888)
? nouveau_gem_ioctl_new 
(/home/kas/linux/torvalds/drivers/gpu/drm/nouveau/nouveau_gem.c:680)
? _raw_spin_unlock_irqrestore 
(/home/kas/linux/torvalds/arch/x86/include/asm/irqflags.h:41 
/home/kas/linux/torvalds/arch/x86/include/asm/irqflags.h:84 
/home/kas/linux/torvalds/include/linux/spinlock_api_smp.h:160 

Re: [Nouveau] nouveau regression with 5.7 caused by "PCI/PM: Assume ports without DLL Link Active train links in 100 ms"

2020-07-17 Thread Bjorn Helgaas
On Fri, Jul 17, 2020 at 10:43:18AM -0400, Sasha Levin wrote:
> On Fri, Jul 17, 2020 at 02:43:52AM +0200, Karol Herbst wrote:
> > On Fri, Jul 17, 2020 at 1:54 AM Bjorn Helgaas  wrote:
> > > On Fri, Jul 17, 2020 at 12:10:39AM +0200, Karol Herbst wrote:
> > > > On Tue, Jul 7, 2020 at 9:30 PM Karol Herbst  wrote:
> > > > >
> > > > > Hi everybody,
> > > > >
> > > > > with the mentioned commit Nouveau isn't able to load firmware onto the
> > > > > GPU on one of my systems here. Even though the issue doesn't always
> > > > > happen I am quite confident this is the commit breaking it.
> > > > >
> > > > > I am still digging into the issue and trying to figure out what
> > > > > exactly breaks, but it shows up in different ways. Either we are not
> > > > > able to boot the engines on the GPU or the GPU becomes unresponsive.
> > > > > Btw, this is also a system where our runtime power management issue
> > > > > shows up, so maybe there is indeed something funky with the bridge
> > > > > controller.
> > > > >
> > > > > Just pinging you in case you have an idea on how this could break 
> > > > > Nouveau
> > > > >
> > > > > most of the times it shows up like this:
> > > > > nouveau :01:00.0: acr: AHESASC binary failed
> > > > >
> > > > > Sometimes it works at boot and fails at runtime resuming with random
> > > > > faults. So I will be investigating a bit more, but yeah... I am super
> > > > > sure the commit triggered this issue, no idea if it actually causes
> > > > > it.
> > > >
> > > > so yeah.. I reverted that locally and never ran into issues again.
> > > > Still valid on latest 5.7. So can we get this reverted or properly
> > > > fixed? This breaks runtime pm for us on at least some hardware.
> > > 
> > > Yeah, that stinks.  We had another similar report from Patrick:
> > > 
> > >   
> > > https://lore.kernel.org/r/caerspo5stek_my1dehwp7ahd0xop87+ohywktjbl7algdbx...@mail.gmail.com
> > > 
> > > Apparently the problem is ec411e02b7a2 ("PCI/PM: Assume ports without
> > > DLL Link Active train links in 100 ms"), which Patrick found was
> > > backported to v5.4.49 as 828b192c57e8, and you found was backported to
> > > v5.7.6 as afaff825e3a4.
> > > 
> > > Oddly, Patrick reported that v5.7.7 worked correctly, even though it
> > > still contains afaff825e3a4.
> > > 
> > > I guess in the absence of any other clues we'll have to revert it.
> > > I hate to do that because that means we'll have slow resume of
> > > Thunderbolt-connected devices again, but that's better than having
> > > GPUs completely broken.
> > > 
> > > Could you and Patrick open bugzilla.kernel.org reports, attach dmesg
> > > logs and "sudo lspci -vv" output, and add the URLs to Kai-Heng's
> > > original report at https://bugzilla.kernel.org/show_bug.cgi?id=206837
> > > and to this thread?
> > > 
> > > There must be a way to fix the slow resume problem without breaking
> > > the GPUs.
> > > 
> > 
> > I wouldn't be surprised if this is related to the Intel bridge we
> > check against for Nouveau.. I still have to check on another laptop
> > with the same bridge our workaround was required as well but wouldn't
> > be surprised if it shows the same problem. Will get you the
> > information from both systems tomorrow then.
> 
> I take it that ec411e02b7a2 will be reverted upstream?

Yes, unless we have a better fix soon.  I applied the revert to my
for-linus branch, so it will appear in -next soon.  I think it's a
little late to get it in -rc5, so I'll probably ask Linus to pull it
next week for -rc6.

Bjorn
___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau


Re: [Nouveau] nouveau regression with 5.7 caused by "PCI/PM: Assume ports without DLL Link Active train links in 100 ms"

2020-07-17 Thread Lukas Wunner
On Fri, Jul 17, 2020 at 03:04:10PM -0400, Lyude Paul wrote:
> Isn't it possible to tell whether a PCI device is connected through
> thunderbolt or not? We could probably get away with just defaulting
> to 100ms for thunderbolt devices without DLL Link Active specified,
> and then default to the old delay value for non-thunderbolt devices.

pci_is_thunderbolt_attached()
___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau


Re: [Nouveau] [PATCH] RFC: ACPI / OSI: remove workarounds for hybrid graphics laptops

2020-07-17 Thread Alex Hung
On 2020-07-17 1:05 p.m., Karol Herbst wrote:
> It's hard to figure out what systems are actually affected and right now I
> don't see a good way of removing those...
> 
> But I'd like to see thos getting removed and drivers fixed instead (which
> happened at least for nouveau).
> 
> And as mentioned before, I prefer people working on fixing issues instead
> of spending time to add firmware level workarounds which are hard to know
> to which systems they apply to, hard to remove and basically a big huge
> pain to work with.> In the end I have no idea how to even figure out what 
> systems are affected
> and which not by this, so I have no idea how to even verify we can safely
> remove this (which just means those are impossible to remove unless we risk
> breaking systems, which again makes those supper annoying to deal with).
> 
> Also from the comments it's hard to get what those bits really do. Are they
> just preventing runtime pm or do the devices are powered down when booting?
> I am sure it's the former, still...
> 
> Please, don't do this again.
> 
> For now, those workaround prevent power savings on systems those workaround
> applies to, which might be any so those should get removed asap and if
> new issues arrise removing those please do a proper bug report and we can
> look into it and come up with a proper fix (and keep this patch out until
> we resolve all of those).
> 
> Signed-off-by: Karol Herbst 
> CC: Alex Hung 
> CC: "Rafael J. Wysocki" 
> CC: Len Brown 
> CC: Lyude Paul 
> CC: linux-ker...@vger.kernel.org
> CC: dri-de...@lists.freedesktop.org
> CC: nouveau@lists.freedesktop.org
> ---
>  drivers/acpi/osi.c | 24 
>  1 file changed, 24 deletions(-)
> 
> diff --git a/drivers/acpi/osi.c b/drivers/acpi/osi.c
> index 9f68538091384..d4405e1ca9b97 100644
> --- a/drivers/acpi/osi.c
> +++ b/drivers/acpi/osi.c
> @@ -44,30 +44,6 @@ osi_setup_entries[OSI_STRING_ENTRIES_MAX] __initdata = {
>   {"Processor Device", true},
>   {"3.0 _SCP Extensions", true},
>   {"Processor Aggregator Device", true},
> - /*
> -  * Linux-Dell-Video is used by BIOS to disable RTD3 for NVidia graphics
> -  * cards as RTD3 is not supported by drivers now.  Systems with NVidia
> -  * cards will hang without RTD3 disabled.
> -  *
> -  * Once NVidia drivers officially support RTD3, this _OSI strings can
> -  * be removed if both new and old graphics cards are supported.
> -  */
> - {"Linux-Dell-Video", true},
> - /*
> -  * Linux-Lenovo-NV-HDMI-Audio is used by BIOS to power on NVidia's HDMI
> -  * audio device which is turned off for power-saving in Windows OS.
> -  * This power management feature observed on some Lenovo Thinkpad
> -  * systems which will not be able to output audio via HDMI without
> -  * a BIOS workaround.
> -  */
> - {"Linux-Lenovo-NV-HDMI-Audio", true},
> - /*
> -  * Linux-HPI-Hybrid-Graphics is used by BIOS to enable dGPU to
> -  * output video directly to external monitors on HP Inc. mobile
> -  * workstations as Nvidia and AMD VGA drivers provide limited
> -  * hybrid graphics supports.
> -  */
> - {"Linux-HPI-Hybrid-Graphics", true},
>  };
>  
>  static u32 acpi_osi_handler(acpi_string interface, u32 supported)
> 

The changes were discussed and tested a while ago, and no crashes were
observed. Thanks for solving PM issues in nouveau.

Acked-by: Alex Hung 
___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau


Re: [Nouveau] [PATCH] drm/nouveau: Accept 'legacy' format modifiers

2020-07-17 Thread Daniel Vetter
On Fri, Jul 17, 2020 at 11:57:57AM -0700, James Jones wrote:
> Accept the DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK()
> family of modifiers to handle broken userspace
> Xorg modesetting and Mesa drivers.
> 
> Tested with Xorg 1.20 modesetting driver,
> weston@c46c70dac84a4b3030cd05b380f9f410536690fc,
> gnome & KDE wayland desktops from Ubuntu 18.04,
> and sway 1.5

Just bikeshed, but maybe a few more words on what exactly is broken and
how this works around it. Specifically why we only accept these, but don't
advertise them.

> 
> Signed-off-by: James Jones 

Needs Fixes: line here. Also nice to mention the bug reporter/link.

> ---
>  drivers/gpu/drm/nouveau/nouveau_display.c | 26 +--
>  1 file changed, 24 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/nouveau/nouveau_display.c 
> b/drivers/gpu/drm/nouveau/nouveau_display.c
> index 496c4621cc78..31543086254b 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_display.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_display.c
> @@ -191,8 +191,14 @@ nouveau_decode_mod(struct nouveau_drm *drm,
>  uint32_t *tile_mode,
>  uint8_t *kind)
>  {
> + struct nouveau_display *disp = nouveau_display(drm->dev);
>   BUG_ON(!tile_mode || !kind);
>  
> + if ((modifier & (0xffull << 12)) == 0ull) {
> + /* Legacy modifier.  Translate to this device's 'kind.' */
> + modifier |= disp->format_modifiers[0] & (0xffull << 12);
> + }

Hm I tried to understand what this magic does by looking at drm_fourcc.h,
but the drm_fourcc_canonicalize_nvidia_format_mod() in there implements
something else. Is that function wrong, or should we use it here instead?

Or is there something else going on entirely?

Cheers, Daniel

> +
>   if (modifier == DRM_FORMAT_MOD_LINEAR) {
>   /* tile_mode will not be used in this case */
>   *tile_mode = 0;
> @@ -227,6 +233,16 @@ nouveau_framebuffer_get_layout(struct drm_framebuffer 
> *fb,
>   }
>  }
>  
> +static const u64 legacy_modifiers[] = {
> + DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(0),
> + DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(1),
> + DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(2),
> + DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(3),
> + DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(4),
> + DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(5),
> + DRM_FORMAT_MOD_INVALID
> +};
> +
>  static int
>  nouveau_validate_decode_mod(struct nouveau_drm *drm,
>   uint64_t modifier,
> @@ -247,8 +263,14 @@ nouveau_validate_decode_mod(struct nouveau_drm *drm,
>(disp->format_modifiers[mod] != modifier);
>mod++);
>  
> - if (disp->format_modifiers[mod] == DRM_FORMAT_MOD_INVALID)
> - return -EINVAL;
> + if (disp->format_modifiers[mod] == DRM_FORMAT_MOD_INVALID) {
> + for (mod = 0;
> +  (legacy_modifiers[mod] != DRM_FORMAT_MOD_INVALID) &&
> +  (legacy_modifiers[mod] != modifier);
> +  mod++);
> + if (legacy_modifiers[mod] == DRM_FORMAT_MOD_INVALID)
> + return -EINVAL;
> + }
>  
>   nouveau_decode_mod(drm, modifier, tile_mode, kind);
>  
> -- 
> 2.17.1
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau


Re: [Nouveau] [PATCH] RFC: ACPI / OSI: remove workarounds for hybrid graphics laptops

2020-07-17 Thread Lyude Paul
Hey-just wanted to give some additional context I think Karol missed here. It
should be noted that since the last time me and Karol looked at reverting these,
we've already fixed all of the nasty runtime PM (e.g. runtime D3) issues we
could find. This includes the infamous one that was affecting nearly all of the
nvidia pascal (+ some maxwell 2 and turing, it ended up being that the intel PCI
bridge was the culprit) machines on the market. Right now I'm only aware of one
major issue we have, which is the result of a recent PCI core change that we're
already in the process of getting reverted:

https://lkml.org/lkml/2020/7/16/1288

[if you do any testing of runtime PM, you may need to temporarily revert this
patch to make things work]

So, really-runtime D3 is very much supported with nouveau. And we've put a _lot_
of effort into making sure of that, and are happy to fix any issues you're still
finding. It also works just fine in AMD, but if you're finding systems it
doesn't work with amd on then please let the amdgpu folks know upstream so they
can properly fix it. Otherwise, these ACPI workarounds are likely making the
power consumption on these systems (for nouveau, amdgpu, and radeon)
dramatically higher then it needs to be.

On Fri, 2020-07-17 at 21:05 +0200, Karol Herbst wrote:
> It's hard to figure out what systems are actually affected and right now I
> don't see a good way of removing those...
> 
> But I'd like to see thos getting removed and drivers fixed instead (which
> happened at least for nouveau).
> 
> And as mentioned before, I prefer people working on fixing issues instead
> of spending time to add firmware level workarounds which are hard to know
> to which systems they apply to, hard to remove and basically a big huge
> pain to work with.
> In the end I have no idea how to even figure out what systems are affected
> and which not by this, so I have no idea how to even verify we can safely
> remove this (which just means those are impossible to remove unless we risk
> breaking systems, which again makes those supper annoying to deal with).
> 
> Also from the comments it's hard to get what those bits really do. Are they
> just preventing runtime pm or do the devices are powered down when booting?
> I am sure it's the former, still...
> 
> Please, don't do this again.
> 
> For now, those workaround prevent power savings on systems those workaround
> applies to, which might be any so those should get removed asap and if
> new issues arrise removing those please do a proper bug report and we can
> look into it and come up with a proper fix (and keep this patch out until
> we resolve all of those).
> 
> Signed-off-by: Karol Herbst 
> CC: Alex Hung 
> CC: "Rafael J. Wysocki" 
> CC: Len Brown 
> CC: Lyude Paul 
> CC: linux-ker...@vger.kernel.org
> CC: dri-de...@lists.freedesktop.org
> CC: nouveau@lists.freedesktop.org
> ---
>  drivers/acpi/osi.c | 24 
>  1 file changed, 24 deletions(-)
> 
> diff --git a/drivers/acpi/osi.c b/drivers/acpi/osi.c
> index 9f68538091384..d4405e1ca9b97 100644
> --- a/drivers/acpi/osi.c
> +++ b/drivers/acpi/osi.c
> @@ -44,30 +44,6 @@ osi_setup_entries[OSI_STRING_ENTRIES_MAX] __initdata = {
>   {"Processor Device", true},
>   {"3.0 _SCP Extensions", true},
>   {"Processor Aggregator Device", true},
> - /*
> -  * Linux-Dell-Video is used by BIOS to disable RTD3 for NVidia graphics
> -  * cards as RTD3 is not supported by drivers now.  Systems with NVidia
> -  * cards will hang without RTD3 disabled.
> -  *
> -  * Once NVidia drivers officially support RTD3, this _OSI strings can
> -  * be removed if both new and old graphics cards are supported.
> -  */
> - {"Linux-Dell-Video", true},
> - /*
> -  * Linux-Lenovo-NV-HDMI-Audio is used by BIOS to power on NVidia's HDMI
> -  * audio device which is turned off for power-saving in Windows OS.
> -  * This power management feature observed on some Lenovo Thinkpad
> -  * systems which will not be able to output audio via HDMI without
> -  * a BIOS workaround.
> -  */
> - {"Linux-Lenovo-NV-HDMI-Audio", true},
> - /*
> -  * Linux-HPI-Hybrid-Graphics is used by BIOS to enable dGPU to
> -  * output video directly to external monitors on HP Inc. mobile
> -  * workstations as Nvidia and AMD VGA drivers provide limited
> -  * hybrid graphics supports.
> -  */
> - {"Linux-HPI-Hybrid-Graphics", true},
>  };
>  
>  static u32 acpi_osi_handler(acpi_string interface, u32 supported)

___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau


[Nouveau] [PATCH] RFC: ACPI / OSI: remove workarounds for hybrid graphics laptops

2020-07-17 Thread Karol Herbst
It's hard to figure out what systems are actually affected and right now I
don't see a good way of removing those...

But I'd like to see thos getting removed and drivers fixed instead (which
happened at least for nouveau).

And as mentioned before, I prefer people working on fixing issues instead
of spending time to add firmware level workarounds which are hard to know
to which systems they apply to, hard to remove and basically a big huge
pain to work with.
In the end I have no idea how to even figure out what systems are affected
and which not by this, so I have no idea how to even verify we can safely
remove this (which just means those are impossible to remove unless we risk
breaking systems, which again makes those supper annoying to deal with).

Also from the comments it's hard to get what those bits really do. Are they
just preventing runtime pm or do the devices are powered down when booting?
I am sure it's the former, still...

Please, don't do this again.

For now, those workaround prevent power savings on systems those workaround
applies to, which might be any so those should get removed asap and if
new issues arrise removing those please do a proper bug report and we can
look into it and come up with a proper fix (and keep this patch out until
we resolve all of those).

Signed-off-by: Karol Herbst 
CC: Alex Hung 
CC: "Rafael J. Wysocki" 
CC: Len Brown 
CC: Lyude Paul 
CC: linux-ker...@vger.kernel.org
CC: dri-de...@lists.freedesktop.org
CC: nouveau@lists.freedesktop.org
---
 drivers/acpi/osi.c | 24 
 1 file changed, 24 deletions(-)

diff --git a/drivers/acpi/osi.c b/drivers/acpi/osi.c
index 9f68538091384..d4405e1ca9b97 100644
--- a/drivers/acpi/osi.c
+++ b/drivers/acpi/osi.c
@@ -44,30 +44,6 @@ osi_setup_entries[OSI_STRING_ENTRIES_MAX] __initdata = {
{"Processor Device", true},
{"3.0 _SCP Extensions", true},
{"Processor Aggregator Device", true},
-   /*
-* Linux-Dell-Video is used by BIOS to disable RTD3 for NVidia graphics
-* cards as RTD3 is not supported by drivers now.  Systems with NVidia
-* cards will hang without RTD3 disabled.
-*
-* Once NVidia drivers officially support RTD3, this _OSI strings can
-* be removed if both new and old graphics cards are supported.
-*/
-   {"Linux-Dell-Video", true},
-   /*
-* Linux-Lenovo-NV-HDMI-Audio is used by BIOS to power on NVidia's HDMI
-* audio device which is turned off for power-saving in Windows OS.
-* This power management feature observed on some Lenovo Thinkpad
-* systems which will not be able to output audio via HDMI without
-* a BIOS workaround.
-*/
-   {"Linux-Lenovo-NV-HDMI-Audio", true},
-   /*
-* Linux-HPI-Hybrid-Graphics is used by BIOS to enable dGPU to
-* output video directly to external monitors on HP Inc. mobile
-* workstations as Nvidia and AMD VGA drivers provide limited
-* hybrid graphics supports.
-*/
-   {"Linux-HPI-Hybrid-Graphics", true},
 };
 
 static u32 acpi_osi_handler(acpi_string interface, u32 supported)
-- 
2.26.2

___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau


Re: [Nouveau] nouveau regression with 5.7 caused by "PCI/PM: Assume ports without DLL Link Active train links in 100 ms"

2020-07-17 Thread Lyude Paul
On Thu, 2020-07-16 at 18:54 -0500, Bjorn Helgaas wrote:
> [+cc Sasha -- stable kernel regression]
> [+cc Patrick, Kai-Heng, LKML]
> 
> On Fri, Jul 17, 2020 at 12:10:39AM +0200, Karol Herbst wrote:
> > On Tue, Jul 7, 2020 at 9:30 PM Karol Herbst  wrote:
> > > Hi everybody,
> > > 
> > > with the mentioned commit Nouveau isn't able to load firmware onto the
> > > GPU on one of my systems here. Even though the issue doesn't always
> > > happen I am quite confident this is the commit breaking it.
> > > 
> > > I am still digging into the issue and trying to figure out what
> > > exactly breaks, but it shows up in different ways. Either we are not
> > > able to boot the engines on the GPU or the GPU becomes unresponsive.
> > > Btw, this is also a system where our runtime power management issue
> > > shows up, so maybe there is indeed something funky with the bridge
> > > controller.
> > > 
> > > Just pinging you in case you have an idea on how this could break Nouveau
> > > 
> > > most of the times it shows up like this:
> > > nouveau :01:00.0: acr: AHESASC binary failed
> > > 
> > > Sometimes it works at boot and fails at runtime resuming with random
> > > faults. So I will be investigating a bit more, but yeah... I am super
> > > sure the commit triggered this issue, no idea if it actually causes
> > > it.
> > 
> > so yeah.. I reverted that locally and never ran into issues again.
> > Still valid on latest 5.7. So can we get this reverted or properly
> > fixed? This breaks runtime pm for us on at least some hardware.
> 
> Yeah, that stinks.  We had another similar report from Patrick:
> 
>   
> https://lore.kernel.org/r/caerspo5stek_my1dehwp7ahd0xop87+ohywktjbl7algdbx...@mail.gmail.com
> 
> Apparently the problem is ec411e02b7a2 ("PCI/PM: Assume ports without
> DLL Link Active train links in 100 ms"), which Patrick found was
> backported to v5.4.49 as 828b192c57e8, and you found was backported to
> v5.7.6 as afaff825e3a4.
> 
> Oddly, Patrick reported that v5.7.7 worked correctly, even though it
> still contains afaff825e3a4.
> 
> I guess in the absence of any other clues we'll have to revert it.
> I hate to do that because that means we'll have slow resume of
> Thunderbolt-connected devices again, but that's better than having
> GPUs completely broken.
> 
> Could you and Patrick open bugzilla.kernel.org reports, attach dmesg
> logs and "sudo lspci -vv" output, and add the URLs to Kai-Heng's
> original report at https://bugzilla.kernel.org/show_bug.cgi?id=206837
> and to this thread?
> 
> There must be a way to fix the slow resume problem without breaking
> the GPUs.

Isn't it possible to tell whether a PCI device is connected through thunderbolt
or not? We could probably get away with just defaulting to 100ms for thunderbolt
devices without DLL Link Active specified, and then default to the old delay
value for non-thunderbolt devices.

> 
> Bjorn
> 

___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau


Re: [Nouveau] [PATCH] drm/nouveau: Accept 'legacy' format modifiers

2020-07-17 Thread James Jones
This should resolve the inability to start X with the new NV format 
modifier support in nouveau.  FYI, I'm offline next week, but I'll check 
in tonight in case there are any review comments.  When I'm back, I'll 
get the associated userspace fixes cleaned up and out to the appropriate 
lists.


Thanks,
-James

On 7/17/20 11:57 AM, James Jones wrote:

Accept the DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK()
family of modifiers to handle broken userspace
Xorg modesetting and Mesa drivers.

Tested with Xorg 1.20 modesetting driver,
weston@c46c70dac84a4b3030cd05b380f9f410536690fc,
gnome & KDE wayland desktops from Ubuntu 18.04,
and sway 1.5

Signed-off-by: James Jones 
---
  drivers/gpu/drm/nouveau/nouveau_display.c | 26 +--
  1 file changed, 24 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_display.c 
b/drivers/gpu/drm/nouveau/nouveau_display.c
index 496c4621cc78..31543086254b 100644
--- a/drivers/gpu/drm/nouveau/nouveau_display.c
+++ b/drivers/gpu/drm/nouveau/nouveau_display.c
@@ -191,8 +191,14 @@ nouveau_decode_mod(struct nouveau_drm *drm,
   uint32_t *tile_mode,
   uint8_t *kind)
  {
+   struct nouveau_display *disp = nouveau_display(drm->dev);
BUG_ON(!tile_mode || !kind);
  
+	if ((modifier & (0xffull << 12)) == 0ull) {

+   /* Legacy modifier.  Translate to this device's 'kind.' */
+   modifier |= disp->format_modifiers[0] & (0xffull << 12);
+   }
+
if (modifier == DRM_FORMAT_MOD_LINEAR) {
/* tile_mode will not be used in this case */
*tile_mode = 0;
@@ -227,6 +233,16 @@ nouveau_framebuffer_get_layout(struct drm_framebuffer *fb,
}
  }
  
+static const u64 legacy_modifiers[] = {

+   DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(0),
+   DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(1),
+   DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(2),
+   DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(3),
+   DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(4),
+   DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(5),
+   DRM_FORMAT_MOD_INVALID
+};
+
  static int
  nouveau_validate_decode_mod(struct nouveau_drm *drm,
uint64_t modifier,
@@ -247,8 +263,14 @@ nouveau_validate_decode_mod(struct nouveau_drm *drm,
 (disp->format_modifiers[mod] != modifier);
 mod++);
  
-	if (disp->format_modifiers[mod] == DRM_FORMAT_MOD_INVALID)

-   return -EINVAL;
+   if (disp->format_modifiers[mod] == DRM_FORMAT_MOD_INVALID) {
+   for (mod = 0;
+(legacy_modifiers[mod] != DRM_FORMAT_MOD_INVALID) &&
+(legacy_modifiers[mod] != modifier);
+mod++);
+   if (legacy_modifiers[mod] == DRM_FORMAT_MOD_INVALID)
+   return -EINVAL;
+   }
  
  	nouveau_decode_mod(drm, modifier, tile_mode, kind);
  


___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau


[Nouveau] [PATCH] drm/nouveau: Accept 'legacy' format modifiers

2020-07-17 Thread James Jones
Accept the DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK()
family of modifiers to handle broken userspace
Xorg modesetting and Mesa drivers.

Tested with Xorg 1.20 modesetting driver,
weston@c46c70dac84a4b3030cd05b380f9f410536690fc,
gnome & KDE wayland desktops from Ubuntu 18.04,
and sway 1.5

Signed-off-by: James Jones 
---
 drivers/gpu/drm/nouveau/nouveau_display.c | 26 +--
 1 file changed, 24 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_display.c 
b/drivers/gpu/drm/nouveau/nouveau_display.c
index 496c4621cc78..31543086254b 100644
--- a/drivers/gpu/drm/nouveau/nouveau_display.c
+++ b/drivers/gpu/drm/nouveau/nouveau_display.c
@@ -191,8 +191,14 @@ nouveau_decode_mod(struct nouveau_drm *drm,
   uint32_t *tile_mode,
   uint8_t *kind)
 {
+   struct nouveau_display *disp = nouveau_display(drm->dev);
BUG_ON(!tile_mode || !kind);
 
+   if ((modifier & (0xffull << 12)) == 0ull) {
+   /* Legacy modifier.  Translate to this device's 'kind.' */
+   modifier |= disp->format_modifiers[0] & (0xffull << 12);
+   }
+
if (modifier == DRM_FORMAT_MOD_LINEAR) {
/* tile_mode will not be used in this case */
*tile_mode = 0;
@@ -227,6 +233,16 @@ nouveau_framebuffer_get_layout(struct drm_framebuffer *fb,
}
 }
 
+static const u64 legacy_modifiers[] = {
+   DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(0),
+   DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(1),
+   DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(2),
+   DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(3),
+   DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(4),
+   DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(5),
+   DRM_FORMAT_MOD_INVALID
+};
+
 static int
 nouveau_validate_decode_mod(struct nouveau_drm *drm,
uint64_t modifier,
@@ -247,8 +263,14 @@ nouveau_validate_decode_mod(struct nouveau_drm *drm,
 (disp->format_modifiers[mod] != modifier);
 mod++);
 
-   if (disp->format_modifiers[mod] == DRM_FORMAT_MOD_INVALID)
-   return -EINVAL;
+   if (disp->format_modifiers[mod] == DRM_FORMAT_MOD_INVALID) {
+   for (mod = 0;
+(legacy_modifiers[mod] != DRM_FORMAT_MOD_INVALID) &&
+(legacy_modifiers[mod] != modifier);
+mod++);
+   if (legacy_modifiers[mod] == DRM_FORMAT_MOD_INVALID)
+   return -EINVAL;
+   }
 
nouveau_decode_mod(drm, modifier, tile_mode, kind);
 
-- 
2.17.1

___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau


Re: [Nouveau] nouveau regression with 5.7 caused by "PCI/PM: Assume ports without DLL Link Active train links in 100 ms"

2020-07-17 Thread Sasha Levin

On Fri, Jul 17, 2020 at 02:43:52AM +0200, Karol Herbst wrote:

On Fri, Jul 17, 2020 at 1:54 AM Bjorn Helgaas  wrote:


[+cc Sasha -- stable kernel regression]
[+cc Patrick, Kai-Heng, LKML]

On Fri, Jul 17, 2020 at 12:10:39AM +0200, Karol Herbst wrote:
> On Tue, Jul 7, 2020 at 9:30 PM Karol Herbst  wrote:
> >
> > Hi everybody,
> >
> > with the mentioned commit Nouveau isn't able to load firmware onto the
> > GPU on one of my systems here. Even though the issue doesn't always
> > happen I am quite confident this is the commit breaking it.
> >
> > I am still digging into the issue and trying to figure out what
> > exactly breaks, but it shows up in different ways. Either we are not
> > able to boot the engines on the GPU or the GPU becomes unresponsive.
> > Btw, this is also a system where our runtime power management issue
> > shows up, so maybe there is indeed something funky with the bridge
> > controller.
> >
> > Just pinging you in case you have an idea on how this could break Nouveau
> >
> > most of the times it shows up like this:
> > nouveau :01:00.0: acr: AHESASC binary failed
> >
> > Sometimes it works at boot and fails at runtime resuming with random
> > faults. So I will be investigating a bit more, but yeah... I am super
> > sure the commit triggered this issue, no idea if it actually causes
> > it.
>
> so yeah.. I reverted that locally and never ran into issues again.
> Still valid on latest 5.7. So can we get this reverted or properly
> fixed? This breaks runtime pm for us on at least some hardware.

Yeah, that stinks.  We had another similar report from Patrick:

  
https://lore.kernel.org/r/caerspo5stek_my1dehwp7ahd0xop87+ohywktjbl7algdbx...@mail.gmail.com

Apparently the problem is ec411e02b7a2 ("PCI/PM: Assume ports without
DLL Link Active train links in 100 ms"), which Patrick found was
backported to v5.4.49 as 828b192c57e8, and you found was backported to
v5.7.6 as afaff825e3a4.

Oddly, Patrick reported that v5.7.7 worked correctly, even though it
still contains afaff825e3a4.

I guess in the absence of any other clues we'll have to revert it.
I hate to do that because that means we'll have slow resume of
Thunderbolt-connected devices again, but that's better than having
GPUs completely broken.

Could you and Patrick open bugzilla.kernel.org reports, attach dmesg
logs and "sudo lspci -vv" output, and add the URLs to Kai-Heng's
original report at https://bugzilla.kernel.org/show_bug.cgi?id=206837
and to this thread?

There must be a way to fix the slow resume problem without breaking
the GPUs.



I wouldn't be surprised if this is related to the Intel bridge we
check against for Nouveau.. I still have to check on another laptop
with the same bridge our workaround was required as well but wouldn't
be surprised if it shows the same problem. Will get you the
information from both systems tomorrow then.


I take it that ec411e02b7a2 will be reverted upstream?

--
Thanks,
Sasha
___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau


Re: [Nouveau] nouveau regression with 5.7 caused by "PCI/PM: Assume ports without DLL Link Active train links in 100 ms"

2020-07-17 Thread Karol Herbst
Filed at https://bugzilla.kernel.org/show_bug.cgi?id=208597

oddly enough I wasn't able to reproduce it on my XPS 9560, will ping
once something breaks.

On Fri, Jul 17, 2020 at 2:43 AM Karol Herbst  wrote:
>
> On Fri, Jul 17, 2020 at 1:54 AM Bjorn Helgaas  wrote:
> >
> > [+cc Sasha -- stable kernel regression]
> > [+cc Patrick, Kai-Heng, LKML]
> >
> > On Fri, Jul 17, 2020 at 12:10:39AM +0200, Karol Herbst wrote:
> > > On Tue, Jul 7, 2020 at 9:30 PM Karol Herbst  wrote:
> > > >
> > > > Hi everybody,
> > > >
> > > > with the mentioned commit Nouveau isn't able to load firmware onto the
> > > > GPU on one of my systems here. Even though the issue doesn't always
> > > > happen I am quite confident this is the commit breaking it.
> > > >
> > > > I am still digging into the issue and trying to figure out what
> > > > exactly breaks, but it shows up in different ways. Either we are not
> > > > able to boot the engines on the GPU or the GPU becomes unresponsive.
> > > > Btw, this is also a system where our runtime power management issue
> > > > shows up, so maybe there is indeed something funky with the bridge
> > > > controller.
> > > >
> > > > Just pinging you in case you have an idea on how this could break 
> > > > Nouveau
> > > >
> > > > most of the times it shows up like this:
> > > > nouveau :01:00.0: acr: AHESASC binary failed
> > > >
> > > > Sometimes it works at boot and fails at runtime resuming with random
> > > > faults. So I will be investigating a bit more, but yeah... I am super
> > > > sure the commit triggered this issue, no idea if it actually causes
> > > > it.
> > >
> > > so yeah.. I reverted that locally and never ran into issues again.
> > > Still valid on latest 5.7. So can we get this reverted or properly
> > > fixed? This breaks runtime pm for us on at least some hardware.
> >
> > Yeah, that stinks.  We had another similar report from Patrick:
> >
> >   
> > https://lore.kernel.org/r/caerspo5stek_my1dehwp7ahd0xop87+ohywktjbl7algdbx...@mail.gmail.com
> >
> > Apparently the problem is ec411e02b7a2 ("PCI/PM: Assume ports without
> > DLL Link Active train links in 100 ms"), which Patrick found was
> > backported to v5.4.49 as 828b192c57e8, and you found was backported to
> > v5.7.6 as afaff825e3a4.
> >
> > Oddly, Patrick reported that v5.7.7 worked correctly, even though it
> > still contains afaff825e3a4.
> >
> > I guess in the absence of any other clues we'll have to revert it.
> > I hate to do that because that means we'll have slow resume of
> > Thunderbolt-connected devices again, but that's better than having
> > GPUs completely broken.
> >
> > Could you and Patrick open bugzilla.kernel.org reports, attach dmesg
> > logs and "sudo lspci -vv" output, and add the URLs to Kai-Heng's
> > original report at https://bugzilla.kernel.org/show_bug.cgi?id=206837
> > and to this thread?
> >
> > There must be a way to fix the slow resume problem without breaking
> > the GPUs.
> >
>
> I wouldn't be surprised if this is related to the Intel bridge we
> check against for Nouveau.. I still have to check on another laptop
> with the same bridge our workaround was required as well but wouldn't
> be surprised if it shows the same problem. Will get you the
> information from both systems tomorrow then.
>
> > Bjorn
> >

___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau