Re: drm-next-misc merge breaks vmwgfx

2017-04-06 Thread Daniel Vetter
On Thu, Apr 6, 2017 at 8:01 PM, Thomas Hellstrom  wrote:
> On 04/06/2017 04:46 PM, Daniel Vetter wrote:
>> On Thu, Apr 6, 2017 at 4:10 PM, Thomas Hellstrom  
>> wrote:
>>> On 04/06/2017 02:34 PM, Daniel Vetter wrote:
 Hi Thomas,

 Bisected an offender already? Afaik there's no one else who reported
 issues thus far, and for our own CI it seems all still fine.
 -Daniel
>>> Hi, Daniel,
>>>
>>> Yes, I rebased drm-misc-next on top of vmwgfx-next and found the culprit
>>> to be
>>>
>>> 38b6441e "drm/atomic-helper: Remove the backoff hack from set_config.."
>>>
>>> Reverting first 1fa4da04 and then
>>> 38b6441e
>>>
>>> fixes the problem.
>> Yeah, we seem to have a solid functional conflict between the vmwgfx
>> atomic conversion, and the changes in drm-misc-next. Preliminary
>> analysis, but I think what's going on is:
>> - With the above changes in -misc we punt the deadlock retry loop to
>> the callers of ->set_config.
>> - But since it would have been way too invasive, I only fixed up the
>> atomic callers (in most places we have special paths for atomic and
>> non-atomic due to slightly different semantics), which means for
>> legacy functions we in some cases pass a NULL ctx down to
>> ->set_config. But since legacy paths only get called on legacy
>> drivers, no problem.
>> - Well except I've done that audit before vmwgfx became atomic, and
>> that audit is now wrong, and I've forgotten to properly re-audit when
>> the conflicts happened all around. But since I half-expect to hit a
>> mid-driver conversion with this I did sprinkle
>> WARN_ON(drm_drv_uses_atomic_modeset()) over all these paths.
>>
>> So assuming this is correct, you should see a pile of WARN_ON
>> backtraces that you're hitting in the atomic-vmwgfx+drm-misc-next
>> combo. The proper fix would be to switch over to atomic primitives for
>> all these cases. On a quick look I see some in the vmwgfx fbdev
>> emulation code, might even be worth it to check whether we could reuse
>> the core helpers (which do this split handling alread) in some cases.
>>
>> Cheers, Daniel
>
> So with the two reverts previously mentioned applied, I see the
> following. Is this consistent with the above.
>
> FWIW I did a pretty big vmwgfx fbdev rewrite some time ago, but at that
> time we didn't have the callbacks
> necessary to use the helpers. Maybe that has changed with the atomic
> implementation.
>
> Considering that Sinclair just had a baby, I'm not 100% sure though,
> that I have time to fix this up in the vmwgfx driver for this merge
> window...
>
> /Thomas
>
>
> [9.547101] WARNING: CPU: 3 PID: 359 at
> drivers/gpu/drm/drm_modeset_lock.c:107 drm_modeset_lock_all+0xb8/0xc0 [drm]
> [9.547102] Modules linked in: snd_rawmidi snd_timer
> ghash_clmulni_intel intel_rapl_perf ppdev snd_seq_device vmw_balloon snd
> rfkill joydev soundcore nfit parport_pc parport acpi_cpufreq tpm_tis
> tpm_tis_core tpm shpchp vmw_vmci i2c_piix4 nfsd auth_rpcgss nfs_acl
> lockd grace sunrpc vmwgfx drm_kms_helper ttm drm mptspi
> scsi_transport_spi mptscsih crc32c_intel e1000 mptbase ata_generic
> serio_raw pata_acpi uas usb_storage
> [9.547122] CPU: 3 PID: 359 Comm: plymouthd Tainted: GW
> 4.11.0-rc4+ #2
> [9.547122] Hardware name: VMware, Inc. VMware Virtual Platform/440BX
> Desktop Reference Platform, BIOS 6.00 01/24/2017
> [9.547123] Call Trace:
> [9.547128]  dump_stack+0x63/0x86
> [9.547130]  __warn+0xcb/0xf0
> [9.547131]  warn_slowpath_null+0x1d/0x20
> [9.547137]  drm_modeset_lock_all+0xb8/0xc0 [drm]
> [9.547143]  vmw_framebuffer_dmabuf_dirty+0x4c/0x200 [vmwgfx]
> [9.547145]  ? __check_object_size+0x100/0x19d
> [9.547152]  drm_mode_dirtyfb_ioctl+0x178/0x1a0 [drm]
> [9.547158]  drm_ioctl+0x209/0x4c0 [drm]
> [9.547164]  ? drm_mode_getfb+0x100/0x100 [drm]
> [9.547165]  ? __do_fault+0x1e/0x110
> [9.547169]  vmw_generic_ioctl+0x193/0x2d0 [vmwgfx]
> [9.547175]  ? drm_getunique+0xa0/0xa0 [drm]
> [9.547179]  vmw_unlocked_ioctl+0x15/0x20 [vmwgfx]
> [9.547180]  do_vfs_ioctl+0xa3/0x5f0
> [9.547181]  SyS_ioctl+0x79/0x90
> [9.547182]  do_syscall_64+0x67/0x180
> [9.547184]  entry_SYSCALL64_slow_path+0x25/0x25
> [9.547185] RIP: 0033:0x7fd4c93b7787
> [9.547186] RSP: 002b:7fff17d06b88 EFLAGS: 0246 ORIG_RAX:
> 0010
> [9.547187] RAX: ffda RBX: 0c80 RCX:
> 7fd4c93b7787
> [9.547187] RDX: 7fff17d06bc0 RSI: c01864b1 RDI:
> 0009
> [9.547188] RBP: 7fff17d06bc0 R08: 7fd4c7554000 R09:
> 7fd4ca1e9010
> [9.547188] R10: 558ffe14ca40 R11: 0246 R12:
> c01864b1
> [9.547188] R13: 0009 R14:  R15:
> 0258
> [9.547190] ---[ end trace 46a3554c8816a28b ]---

This is an artifact of the two reverts, I've forgotten to properly
clear config->acquire_ctx again in the intermediate states.


Re: drm-next-misc merge breaks vmwgfx

2017-04-06 Thread Thomas Hellstrom
On 04/06/2017 04:46 PM, Daniel Vetter wrote:
> On Thu, Apr 6, 2017 at 4:10 PM, Thomas Hellstrom  
> wrote:
>> On 04/06/2017 02:34 PM, Daniel Vetter wrote:
>>> Hi Thomas,
>>>
>>> Bisected an offender already? Afaik there's no one else who reported
>>> issues thus far, and for our own CI it seems all still fine.
>>> -Daniel
>> Hi, Daniel,
>>
>> Yes, I rebased drm-misc-next on top of vmwgfx-next and found the culprit
>> to be
>>
>> 38b6441e "drm/atomic-helper: Remove the backoff hack from set_config.."
>>
>> Reverting first 1fa4da04 and then
>> 38b6441e
>>
>> fixes the problem.
> Yeah, we seem to have a solid functional conflict between the vmwgfx
> atomic conversion, and the changes in drm-misc-next. Preliminary
> analysis, but I think what's going on is:
> - With the above changes in -misc we punt the deadlock retry loop to
> the callers of ->set_config.
> - But since it would have been way too invasive, I only fixed up the
> atomic callers (in most places we have special paths for atomic and
> non-atomic due to slightly different semantics), which means for
> legacy functions we in some cases pass a NULL ctx down to
> ->set_config. But since legacy paths only get called on legacy
> drivers, no problem.
> - Well except I've done that audit before vmwgfx became atomic, and
> that audit is now wrong, and I've forgotten to properly re-audit when
> the conflicts happened all around. But since I half-expect to hit a
> mid-driver conversion with this I did sprinkle
> WARN_ON(drm_drv_uses_atomic_modeset()) over all these paths.
>
> So assuming this is correct, you should see a pile of WARN_ON
> backtraces that you're hitting in the atomic-vmwgfx+drm-misc-next
> combo. The proper fix would be to switch over to atomic primitives for
> all these cases. On a quick look I see some in the vmwgfx fbdev
> emulation code, might even be worth it to check whether we could reuse
> the core helpers (which do this split handling alread) in some cases.
>
> Cheers, Daniel

So with the two reverts previously mentioned applied, I see the
following. Is this consistent with the above.

FWIW I did a pretty big vmwgfx fbdev rewrite some time ago, but at that
time we didn't have the callbacks
necessary to use the helpers. Maybe that has changed with the atomic
implementation.

Considering that Sinclair just had a baby, I'm not 100% sure though,
that I have time to fix this up in the vmwgfx driver for this merge
window...

/Thomas


[9.547101] WARNING: CPU: 3 PID: 359 at
drivers/gpu/drm/drm_modeset_lock.c:107 drm_modeset_lock_all+0xb8/0xc0 [drm]
[9.547102] Modules linked in: snd_rawmidi snd_timer
ghash_clmulni_intel intel_rapl_perf ppdev snd_seq_device vmw_balloon snd
rfkill joydev soundcore nfit parport_pc parport acpi_cpufreq tpm_tis
tpm_tis_core tpm shpchp vmw_vmci i2c_piix4 nfsd auth_rpcgss nfs_acl
lockd grace sunrpc vmwgfx drm_kms_helper ttm drm mptspi
scsi_transport_spi mptscsih crc32c_intel e1000 mptbase ata_generic
serio_raw pata_acpi uas usb_storage
[9.547122] CPU: 3 PID: 359 Comm: plymouthd Tainted: GW  
4.11.0-rc4+ #2
[9.547122] Hardware name: VMware, Inc. VMware Virtual Platform/440BX
Desktop Reference Platform, BIOS 6.00 01/24/2017
[9.547123] Call Trace:
[9.547128]  dump_stack+0x63/0x86
[9.547130]  __warn+0xcb/0xf0
[9.547131]  warn_slowpath_null+0x1d/0x20
[9.547137]  drm_modeset_lock_all+0xb8/0xc0 [drm]
[9.547143]  vmw_framebuffer_dmabuf_dirty+0x4c/0x200 [vmwgfx]
[9.547145]  ? __check_object_size+0x100/0x19d
[9.547152]  drm_mode_dirtyfb_ioctl+0x178/0x1a0 [drm]
[9.547158]  drm_ioctl+0x209/0x4c0 [drm]
[9.547164]  ? drm_mode_getfb+0x100/0x100 [drm]
[9.547165]  ? __do_fault+0x1e/0x110
[9.547169]  vmw_generic_ioctl+0x193/0x2d0 [vmwgfx]
[9.547175]  ? drm_getunique+0xa0/0xa0 [drm]
[9.547179]  vmw_unlocked_ioctl+0x15/0x20 [vmwgfx]
[9.547180]  do_vfs_ioctl+0xa3/0x5f0
[9.547181]  SyS_ioctl+0x79/0x90
[9.547182]  do_syscall_64+0x67/0x180
[9.547184]  entry_SYSCALL64_slow_path+0x25/0x25
[9.547185] RIP: 0033:0x7fd4c93b7787
[9.547186] RSP: 002b:7fff17d06b88 EFLAGS: 0246 ORIG_RAX:
0010
[9.547187] RAX: ffda RBX: 0c80 RCX:
7fd4c93b7787
[9.547187] RDX: 7fff17d06bc0 RSI: c01864b1 RDI:
0009
[9.547188] RBP: 7fff17d06bc0 R08: 7fd4c7554000 R09:
7fd4ca1e9010
[9.547188] R10: 558ffe14ca40 R11: 0246 R12:
c01864b1
[9.547188] R13: 0009 R14:  R15:
0258
[9.547190] ---[ end trace 46a3554c8816a28b ]---


4.824456] WARNING: CPU: 2 PID: 359 at drivers/gpu/drm/drm_crtc.c:499
drm_mode_set_config_internal+0x40/0x50 [drm]
[4.824457] Modules linked in: vmwgfx drm_kms_helper ttm drm mptspi
scsi_transport_spi mptscsih crc32c_intel e1000(+) mptbase ata_generic
serio_raw pata_acpi uas usb_storage
[4.824467] CPU: 2 PID: 359 Comm: plymouthd Tainted: GW   

Re: drm-next-misc merge breaks vmwgfx

2017-04-06 Thread Thomas Hellstrom
On 04/06/2017 05:01 PM, Daniel Vetter wrote:
> On Thu, Apr 6, 2017 at 4:56 PM, Thomas Hellstrom  
> wrote:
>> On 04/06/2017 04:47 PM, Daniel Vetter wrote:
>>> On Thu, Apr 6, 2017 at 4:10 PM, Thomas Hellstrom  
>>> wrote:
 Also, when testing the tip of drm-misc-next (with the non-atomic vmwgfx)
 there appeared to be warnings about a non-NULL
 dev->mode_config.acquire_ctx. I'll see if I can reproduce those, but
 perhaps removing the line

 dev->mode_config.acquire_ctx = 

 in drm_mode_setcrtc()

 is part of the problem.
>>> Hm, where do you hit that? And by tip of drm-misc-next, do you mean
>>> the very latest state, which includes atomic vmwgfx, or is this with
>>> the non-atomic vmwgfx? Please paste the backtraces (and for which tree
>>> they are).
>>> -Daniel
>> Actually this must have been from a confused rebase-bisect state
>> somewhere. I can't reproduce this.
> Have you changed CONFIG_DEBUG_WW_MUTEX_SLOWPATH perhaps? Without this
> you shouldn't be able to hit any retry path (so minus warnings the
> patch you bisected won't have an affected), with you will hit the
> deadlock retry paths pseudo-randomly. It might take a few trials to
> hit it again, depending upon timing.
> -Daniel

No, it's not that. I've set it to off for all these tests, but lockdep
and hang-checks on.

/Thomas


___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: drm-next-misc merge breaks vmwgfx

2017-04-06 Thread Daniel Vetter
On Thu, Apr 6, 2017 at 4:56 PM, Thomas Hellstrom  wrote:
> On 04/06/2017 04:47 PM, Daniel Vetter wrote:
>> On Thu, Apr 6, 2017 at 4:10 PM, Thomas Hellstrom  
>> wrote:
>>> Also, when testing the tip of drm-misc-next (with the non-atomic vmwgfx)
>>> there appeared to be warnings about a non-NULL
>>> dev->mode_config.acquire_ctx. I'll see if I can reproduce those, but
>>> perhaps removing the line
>>>
>>> dev->mode_config.acquire_ctx = 
>>>
>>> in drm_mode_setcrtc()
>>>
>>> is part of the problem.
>> Hm, where do you hit that? And by tip of drm-misc-next, do you mean
>> the very latest state, which includes atomic vmwgfx, or is this with
>> the non-atomic vmwgfx? Please paste the backtraces (and for which tree
>> they are).
>> -Daniel
>
> Actually this must have been from a confused rebase-bisect state
> somewhere. I can't reproduce this.

Have you changed CONFIG_DEBUG_WW_MUTEX_SLOWPATH perhaps? Without this
you shouldn't be able to hit any retry path (so minus warnings the
patch you bisected won't have an affected), with you will hit the
deadlock retry paths pseudo-randomly. It might take a few trials to
hit it again, depending upon timing.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: drm-next-misc merge breaks vmwgfx

2017-04-06 Thread Thomas Hellstrom
On 04/06/2017 04:47 PM, Daniel Vetter wrote:
> On Thu, Apr 6, 2017 at 4:10 PM, Thomas Hellstrom  
> wrote:
>> Also, when testing the tip of drm-misc-next (with the non-atomic vmwgfx)
>> there appeared to be warnings about a non-NULL
>> dev->mode_config.acquire_ctx. I'll see if I can reproduce those, but
>> perhaps removing the line
>>
>> dev->mode_config.acquire_ctx = 
>>
>> in drm_mode_setcrtc()
>>
>> is part of the problem.
> Hm, where do you hit that? And by tip of drm-misc-next, do you mean
> the very latest state, which includes atomic vmwgfx, or is this with
> the non-atomic vmwgfx? Please paste the backtraces (and for which tree
> they are).
> -Daniel

Actually this must have been from a confused rebase-bisect state
somewhere. I can't reproduce this.

/Thomas


___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: drm-next-misc merge breaks vmwgfx

2017-04-06 Thread Daniel Vetter
On Thu, Apr 6, 2017 at 4:10 PM, Thomas Hellstrom  wrote:
>
> Also, when testing the tip of drm-misc-next (with the non-atomic vmwgfx)
> there appeared to be warnings about a non-NULL
> dev->mode_config.acquire_ctx. I'll see if I can reproduce those, but
> perhaps removing the line
>
> dev->mode_config.acquire_ctx = 
>
> in drm_mode_setcrtc()
>
> is part of the problem.

Hm, where do you hit that? And by tip of drm-misc-next, do you mean
the very latest state, which includes atomic vmwgfx, or is this with
the non-atomic vmwgfx? Please paste the backtraces (and for which tree
they are).
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: drm-next-misc merge breaks vmwgfx

2017-04-06 Thread Daniel Vetter
On Thu, Apr 6, 2017 at 4:10 PM, Thomas Hellstrom  wrote:
> On 04/06/2017 02:34 PM, Daniel Vetter wrote:
>> Hi Thomas,
>>
>> Bisected an offender already? Afaik there's no one else who reported
>> issues thus far, and for our own CI it seems all still fine.
>> -Daniel
>
> Hi, Daniel,
>
> Yes, I rebased drm-misc-next on top of vmwgfx-next and found the culprit
> to be
>
> 38b6441e "drm/atomic-helper: Remove the backoff hack from set_config.."
>
> Reverting first 1fa4da04 and then
> 38b6441e
>
> fixes the problem.

Yeah, we seem to have a solid functional conflict between the vmwgfx
atomic conversion, and the changes in drm-misc-next. Preliminary
analysis, but I think what's going on is:
- With the above changes in -misc we punt the deadlock retry loop to
the callers of ->set_config.
- But since it would have been way too invasive, I only fixed up the
atomic callers (in most places we have special paths for atomic and
non-atomic due to slightly different semantics), which means for
legacy functions we in some cases pass a NULL ctx down to
->set_config. But since legacy paths only get called on legacy
drivers, no problem.
- Well except I've done that audit before vmwgfx became atomic, and
that audit is now wrong, and I've forgotten to properly re-audit when
the conflicts happened all around. But since I half-expect to hit a
mid-driver conversion with this I did sprinkle
WARN_ON(drm_drv_uses_atomic_modeset()) over all these paths.

So assuming this is correct, you should see a pile of WARN_ON
backtraces that you're hitting in the atomic-vmwgfx+drm-misc-next
combo. The proper fix would be to switch over to atomic primitives for
all these cases. On a quick look I see some in the vmwgfx fbdev
emulation code, might even be worth it to check whether we could reuse
the core helpers (which do this split handling alread) in some cases.

Cheers, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: drm-next-misc merge breaks vmwgfx

2017-04-06 Thread Thomas Hellstrom
On 04/06/2017 02:34 PM, Daniel Vetter wrote:
> Hi Thomas,
>
> Bisected an offender already? Afaik there's no one else who reported
> issues thus far, and for our own CI it seems all still fine.
> -Daniel

Hi, Daniel,

Yes, I rebased drm-misc-next on top of vmwgfx-next and found the culprit
to be

38b6441e "drm/atomic-helper: Remove the backoff hack from set_config.."

Reverting first 1fa4da04 and then
38b6441e

fixes the problem.

Also, when testing the tip of drm-misc-next (with the non-atomic vmwgfx)
there appeared to be warnings about a non-NULL
dev->mode_config.acquire_ctx. I'll see if I can reproduce those, but
perhaps removing the line

dev->mode_config.acquire_ctx = 

in drm_mode_setcrtc()

is part of the problem.

/Thomas




> On Wed, Apr 5, 2017 at 7:45 PM, Thomas Hellstrom  
> wrote:
>> Hi!
>>
>> It appears like the drm-next commit 320d8c3d3 "Merge tag
>> 'drm-misc-next-2017-03-31..." breaks vmwgfx.
>>
>> A black screen shown where plymouthd use to show the boot splash on
>> Fedora 25. Nothing relevant in the logs.
>>
>> I eyed through the vmwgfx conflict fixes, but nothing immediately stands
>> out as incorrect.
>>
>> I'll dig into this tomorrow. Does anybody know of any other drivers that
>> don't work well with current drm-next?
>>
>> /Thomas
>>
>>
>>
>>
>>
>
>


___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: drm-next-misc merge breaks vmwgfx

2017-04-06 Thread Daniel Vetter
Hi Thomas,

Bisected an offender already? Afaik there's no one else who reported
issues thus far, and for our own CI it seems all still fine.
-Daniel

On Wed, Apr 5, 2017 at 7:45 PM, Thomas Hellstrom  wrote:
> Hi!
>
> It appears like the drm-next commit 320d8c3d3 "Merge tag
> 'drm-misc-next-2017-03-31..." breaks vmwgfx.
>
> A black screen shown where plymouthd use to show the boot splash on
> Fedora 25. Nothing relevant in the logs.
>
> I eyed through the vmwgfx conflict fixes, but nothing immediately stands
> out as incorrect.
>
> I'll dig into this tomorrow. Does anybody know of any other drivers that
> don't work well with current drm-next?
>
> /Thomas
>
>
>
>
>



-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


drm-next-misc merge breaks vmwgfx

2017-04-05 Thread Thomas Hellstrom
Hi!

It appears like the drm-next commit 320d8c3d3 "Merge tag
'drm-misc-next-2017-03-31..." breaks vmwgfx.

A black screen shown where plymouthd use to show the boot splash on
Fedora 25. Nothing relevant in the logs.

I eyed through the vmwgfx conflict fixes, but nothing immediately stands
out as incorrect.

I'll dig into this tomorrow. Does anybody know of any other drivers that
don't work well with current drm-next?

/Thomas





___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel