Re: drm-next-misc merge breaks vmwgfx
On Thu, Apr 6, 2017 at 8:01 PM, Thomas Hellstromwrote: > On 04/06/2017 04:46 PM, Daniel Vetter wrote: >> On Thu, Apr 6, 2017 at 4:10 PM, Thomas Hellstrom >> wrote: >>> On 04/06/2017 02:34 PM, Daniel Vetter wrote: Hi Thomas, Bisected an offender already? Afaik there's no one else who reported issues thus far, and for our own CI it seems all still fine. -Daniel >>> Hi, Daniel, >>> >>> Yes, I rebased drm-misc-next on top of vmwgfx-next and found the culprit >>> to be >>> >>> 38b6441e "drm/atomic-helper: Remove the backoff hack from set_config.." >>> >>> Reverting first 1fa4da04 and then >>> 38b6441e >>> >>> fixes the problem. >> Yeah, we seem to have a solid functional conflict between the vmwgfx >> atomic conversion, and the changes in drm-misc-next. Preliminary >> analysis, but I think what's going on is: >> - With the above changes in -misc we punt the deadlock retry loop to >> the callers of ->set_config. >> - But since it would have been way too invasive, I only fixed up the >> atomic callers (in most places we have special paths for atomic and >> non-atomic due to slightly different semantics), which means for >> legacy functions we in some cases pass a NULL ctx down to >> ->set_config. But since legacy paths only get called on legacy >> drivers, no problem. >> - Well except I've done that audit before vmwgfx became atomic, and >> that audit is now wrong, and I've forgotten to properly re-audit when >> the conflicts happened all around. But since I half-expect to hit a >> mid-driver conversion with this I did sprinkle >> WARN_ON(drm_drv_uses_atomic_modeset()) over all these paths. >> >> So assuming this is correct, you should see a pile of WARN_ON >> backtraces that you're hitting in the atomic-vmwgfx+drm-misc-next >> combo. The proper fix would be to switch over to atomic primitives for >> all these cases. On a quick look I see some in the vmwgfx fbdev >> emulation code, might even be worth it to check whether we could reuse >> the core helpers (which do this split handling alread) in some cases. >> >> Cheers, Daniel > > So with the two reverts previously mentioned applied, I see the > following. Is this consistent with the above. > > FWIW I did a pretty big vmwgfx fbdev rewrite some time ago, but at that > time we didn't have the callbacks > necessary to use the helpers. Maybe that has changed with the atomic > implementation. > > Considering that Sinclair just had a baby, I'm not 100% sure though, > that I have time to fix this up in the vmwgfx driver for this merge > window... > > /Thomas > > > [9.547101] WARNING: CPU: 3 PID: 359 at > drivers/gpu/drm/drm_modeset_lock.c:107 drm_modeset_lock_all+0xb8/0xc0 [drm] > [9.547102] Modules linked in: snd_rawmidi snd_timer > ghash_clmulni_intel intel_rapl_perf ppdev snd_seq_device vmw_balloon snd > rfkill joydev soundcore nfit parport_pc parport acpi_cpufreq tpm_tis > tpm_tis_core tpm shpchp vmw_vmci i2c_piix4 nfsd auth_rpcgss nfs_acl > lockd grace sunrpc vmwgfx drm_kms_helper ttm drm mptspi > scsi_transport_spi mptscsih crc32c_intel e1000 mptbase ata_generic > serio_raw pata_acpi uas usb_storage > [9.547122] CPU: 3 PID: 359 Comm: plymouthd Tainted: GW > 4.11.0-rc4+ #2 > [9.547122] Hardware name: VMware, Inc. VMware Virtual Platform/440BX > Desktop Reference Platform, BIOS 6.00 01/24/2017 > [9.547123] Call Trace: > [9.547128] dump_stack+0x63/0x86 > [9.547130] __warn+0xcb/0xf0 > [9.547131] warn_slowpath_null+0x1d/0x20 > [9.547137] drm_modeset_lock_all+0xb8/0xc0 [drm] > [9.547143] vmw_framebuffer_dmabuf_dirty+0x4c/0x200 [vmwgfx] > [9.547145] ? __check_object_size+0x100/0x19d > [9.547152] drm_mode_dirtyfb_ioctl+0x178/0x1a0 [drm] > [9.547158] drm_ioctl+0x209/0x4c0 [drm] > [9.547164] ? drm_mode_getfb+0x100/0x100 [drm] > [9.547165] ? __do_fault+0x1e/0x110 > [9.547169] vmw_generic_ioctl+0x193/0x2d0 [vmwgfx] > [9.547175] ? drm_getunique+0xa0/0xa0 [drm] > [9.547179] vmw_unlocked_ioctl+0x15/0x20 [vmwgfx] > [9.547180] do_vfs_ioctl+0xa3/0x5f0 > [9.547181] SyS_ioctl+0x79/0x90 > [9.547182] do_syscall_64+0x67/0x180 > [9.547184] entry_SYSCALL64_slow_path+0x25/0x25 > [9.547185] RIP: 0033:0x7fd4c93b7787 > [9.547186] RSP: 002b:7fff17d06b88 EFLAGS: 0246 ORIG_RAX: > 0010 > [9.547187] RAX: ffda RBX: 0c80 RCX: > 7fd4c93b7787 > [9.547187] RDX: 7fff17d06bc0 RSI: c01864b1 RDI: > 0009 > [9.547188] RBP: 7fff17d06bc0 R08: 7fd4c7554000 R09: > 7fd4ca1e9010 > [9.547188] R10: 558ffe14ca40 R11: 0246 R12: > c01864b1 > [9.547188] R13: 0009 R14: R15: > 0258 > [9.547190] ---[ end trace 46a3554c8816a28b ]--- This is an artifact of the two reverts, I've forgotten to properly clear config->acquire_ctx again in the intermediate states.
Re: drm-next-misc merge breaks vmwgfx
On 04/06/2017 04:46 PM, Daniel Vetter wrote: > On Thu, Apr 6, 2017 at 4:10 PM, Thomas Hellstrom> wrote: >> On 04/06/2017 02:34 PM, Daniel Vetter wrote: >>> Hi Thomas, >>> >>> Bisected an offender already? Afaik there's no one else who reported >>> issues thus far, and for our own CI it seems all still fine. >>> -Daniel >> Hi, Daniel, >> >> Yes, I rebased drm-misc-next on top of vmwgfx-next and found the culprit >> to be >> >> 38b6441e "drm/atomic-helper: Remove the backoff hack from set_config.." >> >> Reverting first 1fa4da04 and then >> 38b6441e >> >> fixes the problem. > Yeah, we seem to have a solid functional conflict between the vmwgfx > atomic conversion, and the changes in drm-misc-next. Preliminary > analysis, but I think what's going on is: > - With the above changes in -misc we punt the deadlock retry loop to > the callers of ->set_config. > - But since it would have been way too invasive, I only fixed up the > atomic callers (in most places we have special paths for atomic and > non-atomic due to slightly different semantics), which means for > legacy functions we in some cases pass a NULL ctx down to > ->set_config. But since legacy paths only get called on legacy > drivers, no problem. > - Well except I've done that audit before vmwgfx became atomic, and > that audit is now wrong, and I've forgotten to properly re-audit when > the conflicts happened all around. But since I half-expect to hit a > mid-driver conversion with this I did sprinkle > WARN_ON(drm_drv_uses_atomic_modeset()) over all these paths. > > So assuming this is correct, you should see a pile of WARN_ON > backtraces that you're hitting in the atomic-vmwgfx+drm-misc-next > combo. The proper fix would be to switch over to atomic primitives for > all these cases. On a quick look I see some in the vmwgfx fbdev > emulation code, might even be worth it to check whether we could reuse > the core helpers (which do this split handling alread) in some cases. > > Cheers, Daniel So with the two reverts previously mentioned applied, I see the following. Is this consistent with the above. FWIW I did a pretty big vmwgfx fbdev rewrite some time ago, but at that time we didn't have the callbacks necessary to use the helpers. Maybe that has changed with the atomic implementation. Considering that Sinclair just had a baby, I'm not 100% sure though, that I have time to fix this up in the vmwgfx driver for this merge window... /Thomas [9.547101] WARNING: CPU: 3 PID: 359 at drivers/gpu/drm/drm_modeset_lock.c:107 drm_modeset_lock_all+0xb8/0xc0 [drm] [9.547102] Modules linked in: snd_rawmidi snd_timer ghash_clmulni_intel intel_rapl_perf ppdev snd_seq_device vmw_balloon snd rfkill joydev soundcore nfit parport_pc parport acpi_cpufreq tpm_tis tpm_tis_core tpm shpchp vmw_vmci i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc vmwgfx drm_kms_helper ttm drm mptspi scsi_transport_spi mptscsih crc32c_intel e1000 mptbase ata_generic serio_raw pata_acpi uas usb_storage [9.547122] CPU: 3 PID: 359 Comm: plymouthd Tainted: GW 4.11.0-rc4+ #2 [9.547122] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 01/24/2017 [9.547123] Call Trace: [9.547128] dump_stack+0x63/0x86 [9.547130] __warn+0xcb/0xf0 [9.547131] warn_slowpath_null+0x1d/0x20 [9.547137] drm_modeset_lock_all+0xb8/0xc0 [drm] [9.547143] vmw_framebuffer_dmabuf_dirty+0x4c/0x200 [vmwgfx] [9.547145] ? __check_object_size+0x100/0x19d [9.547152] drm_mode_dirtyfb_ioctl+0x178/0x1a0 [drm] [9.547158] drm_ioctl+0x209/0x4c0 [drm] [9.547164] ? drm_mode_getfb+0x100/0x100 [drm] [9.547165] ? __do_fault+0x1e/0x110 [9.547169] vmw_generic_ioctl+0x193/0x2d0 [vmwgfx] [9.547175] ? drm_getunique+0xa0/0xa0 [drm] [9.547179] vmw_unlocked_ioctl+0x15/0x20 [vmwgfx] [9.547180] do_vfs_ioctl+0xa3/0x5f0 [9.547181] SyS_ioctl+0x79/0x90 [9.547182] do_syscall_64+0x67/0x180 [9.547184] entry_SYSCALL64_slow_path+0x25/0x25 [9.547185] RIP: 0033:0x7fd4c93b7787 [9.547186] RSP: 002b:7fff17d06b88 EFLAGS: 0246 ORIG_RAX: 0010 [9.547187] RAX: ffda RBX: 0c80 RCX: 7fd4c93b7787 [9.547187] RDX: 7fff17d06bc0 RSI: c01864b1 RDI: 0009 [9.547188] RBP: 7fff17d06bc0 R08: 7fd4c7554000 R09: 7fd4ca1e9010 [9.547188] R10: 558ffe14ca40 R11: 0246 R12: c01864b1 [9.547188] R13: 0009 R14: R15: 0258 [9.547190] ---[ end trace 46a3554c8816a28b ]--- 4.824456] WARNING: CPU: 2 PID: 359 at drivers/gpu/drm/drm_crtc.c:499 drm_mode_set_config_internal+0x40/0x50 [drm] [4.824457] Modules linked in: vmwgfx drm_kms_helper ttm drm mptspi scsi_transport_spi mptscsih crc32c_intel e1000(+) mptbase ata_generic serio_raw pata_acpi uas usb_storage [4.824467] CPU: 2 PID: 359 Comm: plymouthd Tainted: GW
Re: drm-next-misc merge breaks vmwgfx
On 04/06/2017 05:01 PM, Daniel Vetter wrote: > On Thu, Apr 6, 2017 at 4:56 PM, Thomas Hellstrom> wrote: >> On 04/06/2017 04:47 PM, Daniel Vetter wrote: >>> On Thu, Apr 6, 2017 at 4:10 PM, Thomas Hellstrom >>> wrote: Also, when testing the tip of drm-misc-next (with the non-atomic vmwgfx) there appeared to be warnings about a non-NULL dev->mode_config.acquire_ctx. I'll see if I can reproduce those, but perhaps removing the line dev->mode_config.acquire_ctx = in drm_mode_setcrtc() is part of the problem. >>> Hm, where do you hit that? And by tip of drm-misc-next, do you mean >>> the very latest state, which includes atomic vmwgfx, or is this with >>> the non-atomic vmwgfx? Please paste the backtraces (and for which tree >>> they are). >>> -Daniel >> Actually this must have been from a confused rebase-bisect state >> somewhere. I can't reproduce this. > Have you changed CONFIG_DEBUG_WW_MUTEX_SLOWPATH perhaps? Without this > you shouldn't be able to hit any retry path (so minus warnings the > patch you bisected won't have an affected), with you will hit the > deadlock retry paths pseudo-randomly. It might take a few trials to > hit it again, depending upon timing. > -Daniel No, it's not that. I've set it to off for all these tests, but lockdep and hang-checks on. /Thomas ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: drm-next-misc merge breaks vmwgfx
On Thu, Apr 6, 2017 at 4:56 PM, Thomas Hellstromwrote: > On 04/06/2017 04:47 PM, Daniel Vetter wrote: >> On Thu, Apr 6, 2017 at 4:10 PM, Thomas Hellstrom >> wrote: >>> Also, when testing the tip of drm-misc-next (with the non-atomic vmwgfx) >>> there appeared to be warnings about a non-NULL >>> dev->mode_config.acquire_ctx. I'll see if I can reproduce those, but >>> perhaps removing the line >>> >>> dev->mode_config.acquire_ctx = >>> >>> in drm_mode_setcrtc() >>> >>> is part of the problem. >> Hm, where do you hit that? And by tip of drm-misc-next, do you mean >> the very latest state, which includes atomic vmwgfx, or is this with >> the non-atomic vmwgfx? Please paste the backtraces (and for which tree >> they are). >> -Daniel > > Actually this must have been from a confused rebase-bisect state > somewhere. I can't reproduce this. Have you changed CONFIG_DEBUG_WW_MUTEX_SLOWPATH perhaps? Without this you shouldn't be able to hit any retry path (so minus warnings the patch you bisected won't have an affected), with you will hit the deadlock retry paths pseudo-randomly. It might take a few trials to hit it again, depending upon timing. -Daniel -- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: drm-next-misc merge breaks vmwgfx
On 04/06/2017 04:47 PM, Daniel Vetter wrote: > On Thu, Apr 6, 2017 at 4:10 PM, Thomas Hellstrom> wrote: >> Also, when testing the tip of drm-misc-next (with the non-atomic vmwgfx) >> there appeared to be warnings about a non-NULL >> dev->mode_config.acquire_ctx. I'll see if I can reproduce those, but >> perhaps removing the line >> >> dev->mode_config.acquire_ctx = >> >> in drm_mode_setcrtc() >> >> is part of the problem. > Hm, where do you hit that? And by tip of drm-misc-next, do you mean > the very latest state, which includes atomic vmwgfx, or is this with > the non-atomic vmwgfx? Please paste the backtraces (and for which tree > they are). > -Daniel Actually this must have been from a confused rebase-bisect state somewhere. I can't reproduce this. /Thomas ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: drm-next-misc merge breaks vmwgfx
On Thu, Apr 6, 2017 at 4:10 PM, Thomas Hellstromwrote: > > Also, when testing the tip of drm-misc-next (with the non-atomic vmwgfx) > there appeared to be warnings about a non-NULL > dev->mode_config.acquire_ctx. I'll see if I can reproduce those, but > perhaps removing the line > > dev->mode_config.acquire_ctx = > > in drm_mode_setcrtc() > > is part of the problem. Hm, where do you hit that? And by tip of drm-misc-next, do you mean the very latest state, which includes atomic vmwgfx, or is this with the non-atomic vmwgfx? Please paste the backtraces (and for which tree they are). -Daniel -- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: drm-next-misc merge breaks vmwgfx
On Thu, Apr 6, 2017 at 4:10 PM, Thomas Hellstromwrote: > On 04/06/2017 02:34 PM, Daniel Vetter wrote: >> Hi Thomas, >> >> Bisected an offender already? Afaik there's no one else who reported >> issues thus far, and for our own CI it seems all still fine. >> -Daniel > > Hi, Daniel, > > Yes, I rebased drm-misc-next on top of vmwgfx-next and found the culprit > to be > > 38b6441e "drm/atomic-helper: Remove the backoff hack from set_config.." > > Reverting first 1fa4da04 and then > 38b6441e > > fixes the problem. Yeah, we seem to have a solid functional conflict between the vmwgfx atomic conversion, and the changes in drm-misc-next. Preliminary analysis, but I think what's going on is: - With the above changes in -misc we punt the deadlock retry loop to the callers of ->set_config. - But since it would have been way too invasive, I only fixed up the atomic callers (in most places we have special paths for atomic and non-atomic due to slightly different semantics), which means for legacy functions we in some cases pass a NULL ctx down to ->set_config. But since legacy paths only get called on legacy drivers, no problem. - Well except I've done that audit before vmwgfx became atomic, and that audit is now wrong, and I've forgotten to properly re-audit when the conflicts happened all around. But since I half-expect to hit a mid-driver conversion with this I did sprinkle WARN_ON(drm_drv_uses_atomic_modeset()) over all these paths. So assuming this is correct, you should see a pile of WARN_ON backtraces that you're hitting in the atomic-vmwgfx+drm-misc-next combo. The proper fix would be to switch over to atomic primitives for all these cases. On a quick look I see some in the vmwgfx fbdev emulation code, might even be worth it to check whether we could reuse the core helpers (which do this split handling alread) in some cases. Cheers, Daniel -- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: drm-next-misc merge breaks vmwgfx
On 04/06/2017 02:34 PM, Daniel Vetter wrote: > Hi Thomas, > > Bisected an offender already? Afaik there's no one else who reported > issues thus far, and for our own CI it seems all still fine. > -Daniel Hi, Daniel, Yes, I rebased drm-misc-next on top of vmwgfx-next and found the culprit to be 38b6441e "drm/atomic-helper: Remove the backoff hack from set_config.." Reverting first 1fa4da04 and then 38b6441e fixes the problem. Also, when testing the tip of drm-misc-next (with the non-atomic vmwgfx) there appeared to be warnings about a non-NULL dev->mode_config.acquire_ctx. I'll see if I can reproduce those, but perhaps removing the line dev->mode_config.acquire_ctx = in drm_mode_setcrtc() is part of the problem. /Thomas > On Wed, Apr 5, 2017 at 7:45 PM, Thomas Hellstrom> wrote: >> Hi! >> >> It appears like the drm-next commit 320d8c3d3 "Merge tag >> 'drm-misc-next-2017-03-31..." breaks vmwgfx. >> >> A black screen shown where plymouthd use to show the boot splash on >> Fedora 25. Nothing relevant in the logs. >> >> I eyed through the vmwgfx conflict fixes, but nothing immediately stands >> out as incorrect. >> >> I'll dig into this tomorrow. Does anybody know of any other drivers that >> don't work well with current drm-next? >> >> /Thomas >> >> >> >> >> > > ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: drm-next-misc merge breaks vmwgfx
Hi Thomas, Bisected an offender already? Afaik there's no one else who reported issues thus far, and for our own CI it seems all still fine. -Daniel On Wed, Apr 5, 2017 at 7:45 PM, Thomas Hellstromwrote: > Hi! > > It appears like the drm-next commit 320d8c3d3 "Merge tag > 'drm-misc-next-2017-03-31..." breaks vmwgfx. > > A black screen shown where plymouthd use to show the boot splash on > Fedora 25. Nothing relevant in the logs. > > I eyed through the vmwgfx conflict fixes, but nothing immediately stands > out as incorrect. > > I'll dig into this tomorrow. Does anybody know of any other drivers that > don't work well with current drm-next? > > /Thomas > > > > > -- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
drm-next-misc merge breaks vmwgfx
Hi! It appears like the drm-next commit 320d8c3d3 "Merge tag 'drm-misc-next-2017-03-31..." breaks vmwgfx. A black screen shown where plymouthd use to show the boot splash on Fedora 25. Nothing relevant in the logs. I eyed through the vmwgfx conflict fixes, but nothing immediately stands out as incorrect. I'll dig into this tomorrow. Does anybody know of any other drivers that don't work well with current drm-next? /Thomas ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel