Re: [Intel-gfx] 4.10-rc2 oops in DRM connector code

2017-01-11 Thread Daniel Vetter
On Wed, Jan 11, 2017 at 08:16:56AM -0800, Dave Hansen wrote:
> On 01/11/2017 07:39 AM, Daniel Vetter wrote:
> > Hm, just cherry-picked it on top of Linus' latest 4.10 git, applies
> > cleanly there. The substituation was for 4.9. I can send you the patch
> > here, but seems all fine from what I can tell ...
> 
> All of the printk's that I added were making it fail to apply.
> 
> So, I took a 4.10-rc3 kernel with i915 compiled in (not as a module) and
> applied e73ab00e9a0f17, which I grabbed from linux-next.
> 
> I'm seeing basically the same behavior that I did before applying
> e73ab00e9a0f17.  sysfs_create_dir_ns() fails because of a NULL kobj->parent.
> 
> Have you guys tried testing this yourselves?  It seems really easy to
> reproduce if you just compile the driver in.

With some screaming I can reproduce it. I'll try to take a look, but since
I'm travelling to lca might need someone else to look into this too (I
don't have an mst machine to carry around). And I still don't get why this
blows up even.
-Daniel

> 
> > [1.400797] drm_dev_register(88040c73)::730 cpu: 2
> > [1.400860] drm_connector_register(88040c76b000)::382 
> > connector->registered: 0 cpu: 1
> > [1.400870] sysfs_create_dir_ns()::53 error: -2
> > [1.400874] create_dir()::75 error: -2 cpu: 1
> > [1.400878] [ cut here ]
> > [1.400884] WARNING: CPU: 1 PID: 91 at lib/kobject.c:249 
> > kobject_add_internal+0x273/0x330
> > [1.400888] kobject_add_internal failed for card0-DP-3 (error: -2 
> > parent: card0)
> > [1.400892] Modules linked in:
> > [1.400896] CPU: 1 PID: 91 Comm: kworker/1:2 Not tainted 
> > 4.10.0-rc3-i915borked-dirty #67
> > [1.400900] Hardware name: LENOVO 20F5S7V800/20F5S7V800, BIOS R02ET50W 
> > (1.23 ) 09/20/2016
> > [1.400906] Workqueue: events_long drm_dp_mst_link_probe_work
> > [1.400909] Call Trace:
> > [1.400914]  dump_stack+0x67/0x99
> > [1.400918]  __warn+0xd1/0xf0
> > [1.400922]  warn_slowpath_fmt+0x4f/0x60
> > [1.400925]  kobject_add_internal+0x273/0x330
> > [1.400927]  kobject_add+0x65/0xb0
> > [1.400931]  ? klist_init+0x31/0x40
> > [1.400936]  device_add+0x102/0x5d0
> > [1.400940]  ? kfree_const+0x22/0x30
> > [1.400944]  device_create_groups_vargs+0xd8/0x100
> > [1.400947]  device_create_with_groups+0x36/0x40
> > [1.400952]  ? vprintk_default+0x29/0x50
> > [1.400957]  ? __might_sleep+0x4a/0x90
> > [1.400962]  drm_sysfs_connector_add+0x60/0xe0
> > [1.400967]  drm_connector_register+0x74/0xd0
> > [1.400971]  intel_dp_register_mst_connector+0x41/0x50
> > [1.400975]  drm_dp_add_port+0x350/0x450
> > [1.400977] drm_connector_register(88040ee6f800)::382 
> > connector->registered: 0 cpu: 2
> > [1.400982]  ? rcu_early_boot_tests+0x1/0x10
> > [1.400986]  ? schedule_timeout+0x1cd/0x390
> > [1.400989]  ? __might_sleep+0x4a/0x90
> > [1.400992]  ? mutex_lock+0x25/0x50
> > [1.400995]  ? drm_dp_mst_wait_tx_reply+0x118/0x1e0
> > [1.400996] drm_sysfs_connector_add() connector: 88040ee6f800 kdev: 
> > 88040eef9c00
> > [1.401002]  ? prepare_to_wait_event+0x120/0x120
> > [1.401005]  ? drm_dp_check_mstb_guid+0x3d/0x120
> > [1.401008]  drm_dp_send_link_address+0x185/0x1f0
> > [1.401012]  drm_dp_check_and_send_link_address+0xad/0xc0
> > [1.401015]  drm_dp_mst_link_probe_work+0x57/0xa0
> > [1.401018]  process_one_work+0x14b/0x430
> > [1.401021]  worker_thread+0x12b/0x4a0
> > [1.401025]  kthread+0x10c/0x140
> > [1.401027]  ? process_one_work+0x430/0x430
> > [1.401030]  ? kthread_create_on_node+0x40/0x40
> > [1.401034]  ret_from_fork+0x27/0x40
> > [1.401038] ---[ end trace ba43fc250fbf282d ]---
> > [1.401041] drm_sysfs_connector_add() connector: 88040c76b000 kdev: 
> > fffe
> > [1.401043] drm_connector_register(88040c768000)::382 
> > connector->registered: 0 cpu: 2
> > [1.401050] [drm:drm_sysfs_connector_add] *ERROR* failed to register 
> > connector device: -2
> > [1.401057] drm_sysfs_connector_add() connector: 88040c768000 kdev: 
> > 88040eefa000
> > [1.401093] drm_connector_register(88040c768800)::382 
> > connector->registered: 0 cpu: 2
> > [1.401113] drm_sysfs_connector_add() connector: 88040c768800 kdev: 
> > 88040eefa400
> > [1.401122] drm_connector_register(88040c769000)::382 
> > connector->registered: 0 cpu: 2
> > [1.401140] drm_sysfs_connector_add() connector: 88040c769000 kdev: 
> > 88040eefa800
> > [1.401167] drm_connector_register(88040c769800)::382 
> > connector->registered: 0 cpu: 2
> > [1.401186] drm_sysfs_connector_add() connector: 88040c769800 kdev: 
> > 88040eefac00
> > [1.401195] drm_connector_register(88040c76b000)::382 
> > connector->registered: 0 cpu: 2
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [Intel-gfx] 4.10-rc2 oops in DRM connector code

2017-01-11 Thread Daniel Vetter
On Wed, Jan 11, 2017 at 08:16:56AM -0800, Dave Hansen wrote:
> On 01/11/2017 07:39 AM, Daniel Vetter wrote:
> > Hm, just cherry-picked it on top of Linus' latest 4.10 git, applies
> > cleanly there. The substituation was for 4.9. I can send you the patch
> > here, but seems all fine from what I can tell ...
> 
> All of the printk's that I added were making it fail to apply.
> 
> So, I took a 4.10-rc3 kernel with i915 compiled in (not as a module) and
> applied e73ab00e9a0f17, which I grabbed from linux-next.
> 
> I'm seeing basically the same behavior that I did before applying
> e73ab00e9a0f17.  sysfs_create_dir_ns() fails because of a NULL kobj->parent.
> 
> Have you guys tried testing this yourselves?  It seems really easy to
> reproduce if you just compile the driver in.

With some screaming I can reproduce it. I'll try to take a look, but since
I'm travelling to lca might need someone else to look into this too (I
don't have an mst machine to carry around). And I still don't get why this
blows up even.
-Daniel

> 
> > [1.400797] drm_dev_register(88040c73)::730 cpu: 2
> > [1.400860] drm_connector_register(88040c76b000)::382 
> > connector->registered: 0 cpu: 1
> > [1.400870] sysfs_create_dir_ns()::53 error: -2
> > [1.400874] create_dir()::75 error: -2 cpu: 1
> > [1.400878] [ cut here ]
> > [1.400884] WARNING: CPU: 1 PID: 91 at lib/kobject.c:249 
> > kobject_add_internal+0x273/0x330
> > [1.400888] kobject_add_internal failed for card0-DP-3 (error: -2 
> > parent: card0)
> > [1.400892] Modules linked in:
> > [1.400896] CPU: 1 PID: 91 Comm: kworker/1:2 Not tainted 
> > 4.10.0-rc3-i915borked-dirty #67
> > [1.400900] Hardware name: LENOVO 20F5S7V800/20F5S7V800, BIOS R02ET50W 
> > (1.23 ) 09/20/2016
> > [1.400906] Workqueue: events_long drm_dp_mst_link_probe_work
> > [1.400909] Call Trace:
> > [1.400914]  dump_stack+0x67/0x99
> > [1.400918]  __warn+0xd1/0xf0
> > [1.400922]  warn_slowpath_fmt+0x4f/0x60
> > [1.400925]  kobject_add_internal+0x273/0x330
> > [1.400927]  kobject_add+0x65/0xb0
> > [1.400931]  ? klist_init+0x31/0x40
> > [1.400936]  device_add+0x102/0x5d0
> > [1.400940]  ? kfree_const+0x22/0x30
> > [1.400944]  device_create_groups_vargs+0xd8/0x100
> > [1.400947]  device_create_with_groups+0x36/0x40
> > [1.400952]  ? vprintk_default+0x29/0x50
> > [1.400957]  ? __might_sleep+0x4a/0x90
> > [1.400962]  drm_sysfs_connector_add+0x60/0xe0
> > [1.400967]  drm_connector_register+0x74/0xd0
> > [1.400971]  intel_dp_register_mst_connector+0x41/0x50
> > [1.400975]  drm_dp_add_port+0x350/0x450
> > [1.400977] drm_connector_register(88040ee6f800)::382 
> > connector->registered: 0 cpu: 2
> > [1.400982]  ? rcu_early_boot_tests+0x1/0x10
> > [1.400986]  ? schedule_timeout+0x1cd/0x390
> > [1.400989]  ? __might_sleep+0x4a/0x90
> > [1.400992]  ? mutex_lock+0x25/0x50
> > [1.400995]  ? drm_dp_mst_wait_tx_reply+0x118/0x1e0
> > [1.400996] drm_sysfs_connector_add() connector: 88040ee6f800 kdev: 
> > 88040eef9c00
> > [1.401002]  ? prepare_to_wait_event+0x120/0x120
> > [1.401005]  ? drm_dp_check_mstb_guid+0x3d/0x120
> > [1.401008]  drm_dp_send_link_address+0x185/0x1f0
> > [1.401012]  drm_dp_check_and_send_link_address+0xad/0xc0
> > [1.401015]  drm_dp_mst_link_probe_work+0x57/0xa0
> > [1.401018]  process_one_work+0x14b/0x430
> > [1.401021]  worker_thread+0x12b/0x4a0
> > [1.401025]  kthread+0x10c/0x140
> > [1.401027]  ? process_one_work+0x430/0x430
> > [1.401030]  ? kthread_create_on_node+0x40/0x40
> > [1.401034]  ret_from_fork+0x27/0x40
> > [1.401038] ---[ end trace ba43fc250fbf282d ]---
> > [1.401041] drm_sysfs_connector_add() connector: 88040c76b000 kdev: 
> > fffe
> > [1.401043] drm_connector_register(88040c768000)::382 
> > connector->registered: 0 cpu: 2
> > [1.401050] [drm:drm_sysfs_connector_add] *ERROR* failed to register 
> > connector device: -2
> > [1.401057] drm_sysfs_connector_add() connector: 88040c768000 kdev: 
> > 88040eefa000
> > [1.401093] drm_connector_register(88040c768800)::382 
> > connector->registered: 0 cpu: 2
> > [1.401113] drm_sysfs_connector_add() connector: 88040c768800 kdev: 
> > 88040eefa400
> > [1.401122] drm_connector_register(88040c769000)::382 
> > connector->registered: 0 cpu: 2
> > [1.401140] drm_sysfs_connector_add() connector: 88040c769000 kdev: 
> > 88040eefa800
> > [1.401167] drm_connector_register(88040c769800)::382 
> > connector->registered: 0 cpu: 2
> > [1.401186] drm_sysfs_connector_add() connector: 88040c769800 kdev: 
> > 88040eefac00
> > [1.401195] drm_connector_register(88040c76b000)::382 
> > connector->registered: 0 cpu: 2
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [Intel-gfx] 4.10-rc2 oops in DRM connector code

2017-01-11 Thread Dave Hansen
On 01/11/2017 07:39 AM, Daniel Vetter wrote:
> Hm, just cherry-picked it on top of Linus' latest 4.10 git, applies
> cleanly there. The substituation was for 4.9. I can send you the patch
> here, but seems all fine from what I can tell ...

All of the printk's that I added were making it fail to apply.

So, I took a 4.10-rc3 kernel with i915 compiled in (not as a module) and
applied e73ab00e9a0f17, which I grabbed from linux-next.

I'm seeing basically the same behavior that I did before applying
e73ab00e9a0f17.  sysfs_create_dir_ns() fails because of a NULL kobj->parent.

Have you guys tried testing this yourselves?  It seems really easy to
reproduce if you just compile the driver in.

> [1.400797] drm_dev_register(88040c73)::730 cpu: 2
> [1.400860] drm_connector_register(88040c76b000)::382 
> connector->registered: 0 cpu: 1
> [1.400870] sysfs_create_dir_ns()::53 error: -2
> [1.400874] create_dir()::75 error: -2 cpu: 1
> [1.400878] [ cut here ]
> [1.400884] WARNING: CPU: 1 PID: 91 at lib/kobject.c:249 
> kobject_add_internal+0x273/0x330
> [1.400888] kobject_add_internal failed for card0-DP-3 (error: -2 parent: 
> card0)
> [1.400892] Modules linked in:
> [1.400896] CPU: 1 PID: 91 Comm: kworker/1:2 Not tainted 
> 4.10.0-rc3-i915borked-dirty #67
> [1.400900] Hardware name: LENOVO 20F5S7V800/20F5S7V800, BIOS R02ET50W 
> (1.23 ) 09/20/2016
> [1.400906] Workqueue: events_long drm_dp_mst_link_probe_work
> [1.400909] Call Trace:
> [1.400914]  dump_stack+0x67/0x99
> [1.400918]  __warn+0xd1/0xf0
> [1.400922]  warn_slowpath_fmt+0x4f/0x60
> [1.400925]  kobject_add_internal+0x273/0x330
> [1.400927]  kobject_add+0x65/0xb0
> [1.400931]  ? klist_init+0x31/0x40
> [1.400936]  device_add+0x102/0x5d0
> [1.400940]  ? kfree_const+0x22/0x30
> [1.400944]  device_create_groups_vargs+0xd8/0x100
> [1.400947]  device_create_with_groups+0x36/0x40
> [1.400952]  ? vprintk_default+0x29/0x50
> [1.400957]  ? __might_sleep+0x4a/0x90
> [1.400962]  drm_sysfs_connector_add+0x60/0xe0
> [1.400967]  drm_connector_register+0x74/0xd0
> [1.400971]  intel_dp_register_mst_connector+0x41/0x50
> [1.400975]  drm_dp_add_port+0x350/0x450
> [1.400977] drm_connector_register(88040ee6f800)::382 
> connector->registered: 0 cpu: 2
> [1.400982]  ? rcu_early_boot_tests+0x1/0x10
> [1.400986]  ? schedule_timeout+0x1cd/0x390
> [1.400989]  ? __might_sleep+0x4a/0x90
> [1.400992]  ? mutex_lock+0x25/0x50
> [1.400995]  ? drm_dp_mst_wait_tx_reply+0x118/0x1e0
> [1.400996] drm_sysfs_connector_add() connector: 88040ee6f800 kdev: 
> 88040eef9c00
> [1.401002]  ? prepare_to_wait_event+0x120/0x120
> [1.401005]  ? drm_dp_check_mstb_guid+0x3d/0x120
> [1.401008]  drm_dp_send_link_address+0x185/0x1f0
> [1.401012]  drm_dp_check_and_send_link_address+0xad/0xc0
> [1.401015]  drm_dp_mst_link_probe_work+0x57/0xa0
> [1.401018]  process_one_work+0x14b/0x430
> [1.401021]  worker_thread+0x12b/0x4a0
> [1.401025]  kthread+0x10c/0x140
> [1.401027]  ? process_one_work+0x430/0x430
> [1.401030]  ? kthread_create_on_node+0x40/0x40
> [1.401034]  ret_from_fork+0x27/0x40
> [1.401038] ---[ end trace ba43fc250fbf282d ]---
> [1.401041] drm_sysfs_connector_add() connector: 88040c76b000 kdev: 
> fffe
> [1.401043] drm_connector_register(88040c768000)::382 
> connector->registered: 0 cpu: 2
> [1.401050] [drm:drm_sysfs_connector_add] *ERROR* failed to register 
> connector device: -2
> [1.401057] drm_sysfs_connector_add() connector: 88040c768000 kdev: 
> 88040eefa000
> [1.401093] drm_connector_register(88040c768800)::382 
> connector->registered: 0 cpu: 2
> [1.401113] drm_sysfs_connector_add() connector: 88040c768800 kdev: 
> 88040eefa400
> [1.401122] drm_connector_register(88040c769000)::382 
> connector->registered: 0 cpu: 2
> [1.401140] drm_sysfs_connector_add() connector: 88040c769000 kdev: 
> 88040eefa800
> [1.401167] drm_connector_register(88040c769800)::382 
> connector->registered: 0 cpu: 2
> [1.401186] drm_sysfs_connector_add() connector: 88040c769800 kdev: 
> 88040eefac00
> [1.401195] drm_connector_register(88040c76b000)::382 
> connector->registered: 0 cpu: 2



Re: [Intel-gfx] 4.10-rc2 oops in DRM connector code

2017-01-11 Thread Dave Hansen
On 01/11/2017 07:39 AM, Daniel Vetter wrote:
> Hm, just cherry-picked it on top of Linus' latest 4.10 git, applies
> cleanly there. The substituation was for 4.9. I can send you the patch
> here, but seems all fine from what I can tell ...

All of the printk's that I added were making it fail to apply.

So, I took a 4.10-rc3 kernel with i915 compiled in (not as a module) and
applied e73ab00e9a0f17, which I grabbed from linux-next.

I'm seeing basically the same behavior that I did before applying
e73ab00e9a0f17.  sysfs_create_dir_ns() fails because of a NULL kobj->parent.

Have you guys tried testing this yourselves?  It seems really easy to
reproduce if you just compile the driver in.

> [1.400797] drm_dev_register(88040c73)::730 cpu: 2
> [1.400860] drm_connector_register(88040c76b000)::382 
> connector->registered: 0 cpu: 1
> [1.400870] sysfs_create_dir_ns()::53 error: -2
> [1.400874] create_dir()::75 error: -2 cpu: 1
> [1.400878] [ cut here ]
> [1.400884] WARNING: CPU: 1 PID: 91 at lib/kobject.c:249 
> kobject_add_internal+0x273/0x330
> [1.400888] kobject_add_internal failed for card0-DP-3 (error: -2 parent: 
> card0)
> [1.400892] Modules linked in:
> [1.400896] CPU: 1 PID: 91 Comm: kworker/1:2 Not tainted 
> 4.10.0-rc3-i915borked-dirty #67
> [1.400900] Hardware name: LENOVO 20F5S7V800/20F5S7V800, BIOS R02ET50W 
> (1.23 ) 09/20/2016
> [1.400906] Workqueue: events_long drm_dp_mst_link_probe_work
> [1.400909] Call Trace:
> [1.400914]  dump_stack+0x67/0x99
> [1.400918]  __warn+0xd1/0xf0
> [1.400922]  warn_slowpath_fmt+0x4f/0x60
> [1.400925]  kobject_add_internal+0x273/0x330
> [1.400927]  kobject_add+0x65/0xb0
> [1.400931]  ? klist_init+0x31/0x40
> [1.400936]  device_add+0x102/0x5d0
> [1.400940]  ? kfree_const+0x22/0x30
> [1.400944]  device_create_groups_vargs+0xd8/0x100
> [1.400947]  device_create_with_groups+0x36/0x40
> [1.400952]  ? vprintk_default+0x29/0x50
> [1.400957]  ? __might_sleep+0x4a/0x90
> [1.400962]  drm_sysfs_connector_add+0x60/0xe0
> [1.400967]  drm_connector_register+0x74/0xd0
> [1.400971]  intel_dp_register_mst_connector+0x41/0x50
> [1.400975]  drm_dp_add_port+0x350/0x450
> [1.400977] drm_connector_register(88040ee6f800)::382 
> connector->registered: 0 cpu: 2
> [1.400982]  ? rcu_early_boot_tests+0x1/0x10
> [1.400986]  ? schedule_timeout+0x1cd/0x390
> [1.400989]  ? __might_sleep+0x4a/0x90
> [1.400992]  ? mutex_lock+0x25/0x50
> [1.400995]  ? drm_dp_mst_wait_tx_reply+0x118/0x1e0
> [1.400996] drm_sysfs_connector_add() connector: 88040ee6f800 kdev: 
> 88040eef9c00
> [1.401002]  ? prepare_to_wait_event+0x120/0x120
> [1.401005]  ? drm_dp_check_mstb_guid+0x3d/0x120
> [1.401008]  drm_dp_send_link_address+0x185/0x1f0
> [1.401012]  drm_dp_check_and_send_link_address+0xad/0xc0
> [1.401015]  drm_dp_mst_link_probe_work+0x57/0xa0
> [1.401018]  process_one_work+0x14b/0x430
> [1.401021]  worker_thread+0x12b/0x4a0
> [1.401025]  kthread+0x10c/0x140
> [1.401027]  ? process_one_work+0x430/0x430
> [1.401030]  ? kthread_create_on_node+0x40/0x40
> [1.401034]  ret_from_fork+0x27/0x40
> [1.401038] ---[ end trace ba43fc250fbf282d ]---
> [1.401041] drm_sysfs_connector_add() connector: 88040c76b000 kdev: 
> fffe
> [1.401043] drm_connector_register(88040c768000)::382 
> connector->registered: 0 cpu: 2
> [1.401050] [drm:drm_sysfs_connector_add] *ERROR* failed to register 
> connector device: -2
> [1.401057] drm_sysfs_connector_add() connector: 88040c768000 kdev: 
> 88040eefa000
> [1.401093] drm_connector_register(88040c768800)::382 
> connector->registered: 0 cpu: 2
> [1.401113] drm_sysfs_connector_add() connector: 88040c768800 kdev: 
> 88040eefa400
> [1.401122] drm_connector_register(88040c769000)::382 
> connector->registered: 0 cpu: 2
> [1.401140] drm_sysfs_connector_add() connector: 88040c769000 kdev: 
> 88040eefa800
> [1.401167] drm_connector_register(88040c769800)::382 
> connector->registered: 0 cpu: 2
> [1.401186] drm_sysfs_connector_add() connector: 88040c769800 kdev: 
> 88040eefac00
> [1.401195] drm_connector_register(88040c76b000)::382 
> connector->registered: 0 cpu: 2



Re: [Intel-gfx] 4.10-rc2 oops in DRM connector code

2017-01-11 Thread Chris Wilson
On Wed, Jan 11, 2017 at 07:24:45AM -0800, Dave Hansen wrote:
> On 01/10/2017 11:43 PM, Daniel Vetter wrote:
> > On Tue, Jan 10, 2017 at 08:52:47AM -0800, Dave Hansen wrote:
> >> On 01/10/2017 02:31 AM, Daniel Vetter wrote:
> >>> commit e73ab00e9a0f1731f34d0620a9c55f5c30c4ad4e
> >>> Author: Daniel Vetter 
> >>> Date:   Sun Dec 18 14:35:45 2016 +0100
> >>>
> >>> drm: prevent double-(un)registration for connectors
> >>>
> >>> Lack of that would perfectly explain that oops ... Otherwise still no idea
> >>> what's going wrong.
> >> No...  That's not in mainline as far as I can see.  Should I test with
> >> it applied?
> > Hm, I guess failed to cc: stable that one properly, iirc we decided the
> > race fix is too academic and can't be hit in reality ;-)
> > 
> > Testing would be great. Probably conflicts because we extracted
> > drm_connector.c only recently, but running s/drm_connector\.c/drm_crtc.c/
> > over the diff and then applying with some fudge should take care of that.
> 
> It doesn't apply to mainline, with or without the substitution you suggest.

I was hoping that the locking was the real cause here and would be an
easy fix to apply. I did have a look at trying to reorder the DP-MST
worker with driver registration. Hacky to say the least.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
>From 30ac9092e934295f12775f03d73170fc480b7fc8 Mon Sep 17 00:00:00 2001
From: Chris Wilson 
Date: Tue, 10 Jan 2017 10:46:25 +
Subject: [PATCH] dp-mst-register

---
 drivers/gpu/drm/i915/intel_dp.c | 12 +++-
 drivers/gpu/drm/i915/intel_dp_mst.c |  9 ++---
 2 files changed, 17 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_dp.c b/drivers/gpu/drm/i915/intel_dp.c
index f0f44cdbe4b4..fc10eb2c8563 100644
--- a/drivers/gpu/drm/i915/intel_dp.c
+++ b/drivers/gpu/drm/i915/intel_dp.c
@@ -4762,7 +4762,17 @@ intel_dp_connector_register(struct drm_connector *connector)
 		  intel_dp->aux.name, connector->kdev->kobj.name);
 
 	intel_dp->aux.dev = connector->kdev;
-	return drm_dp_aux_register(_dp->aux);
+	ret = drm_dp_aux_register(_dp->aux);
+	if (ret)
+		return ret;
+
+	if (intel_dp->mst_mgr.cbs) {
+		intel_dp->can_mst = true;
+		if (intel_dp->attached_connector)
+			intel_dp->attached_connector->base.status = intel_dp_long_pulse(intel_dp->attached_connector);
+	}
+
+	return 0;
 }
 
 static void
diff --git a/drivers/gpu/drm/i915/intel_dp_mst.c b/drivers/gpu/drm/i915/intel_dp_mst.c
index c93c1999a494..f0a664041dbc 100644
--- a/drivers/gpu/drm/i915/intel_dp_mst.c
+++ b/drivers/gpu/drm/i915/intel_dp_mst.c
@@ -582,16 +582,19 @@ intel_dp_mst_encoder_init(struct intel_digital_port *intel_dig_port, int conn_ba
 	struct drm_device *dev = intel_dig_port->base.base.dev;
 	int ret;
 
-	intel_dp->can_mst = true;
+	intel_dp->can_mst = false;
 	intel_dp->mst_mgr.cbs = _cbs;
 
 	/* create encoders */
 	intel_dp_create_fake_mst_encoders(intel_dig_port);
-	ret = drm_dp_mst_topology_mgr_init(_dp->mst_mgr, dev->dev, _dp->aux, 16, 3, conn_base_id);
+	ret = drm_dp_mst_topology_mgr_init(_dp->mst_mgr, dev->dev,
+	_dp->aux, 16, 3,
+	conn_base_id);
 	if (ret) {
-		intel_dp->can_mst = false;
+		intel_dp->mst_mgr.cbs = NULL;
 		return ret;
 	}
+
 	return 0;
 }
 
-- 
2.11.0



Re: [Intel-gfx] 4.10-rc2 oops in DRM connector code

2017-01-11 Thread Chris Wilson
On Wed, Jan 11, 2017 at 07:24:45AM -0800, Dave Hansen wrote:
> On 01/10/2017 11:43 PM, Daniel Vetter wrote:
> > On Tue, Jan 10, 2017 at 08:52:47AM -0800, Dave Hansen wrote:
> >> On 01/10/2017 02:31 AM, Daniel Vetter wrote:
> >>> commit e73ab00e9a0f1731f34d0620a9c55f5c30c4ad4e
> >>> Author: Daniel Vetter 
> >>> Date:   Sun Dec 18 14:35:45 2016 +0100
> >>>
> >>> drm: prevent double-(un)registration for connectors
> >>>
> >>> Lack of that would perfectly explain that oops ... Otherwise still no idea
> >>> what's going wrong.
> >> No...  That's not in mainline as far as I can see.  Should I test with
> >> it applied?
> > Hm, I guess failed to cc: stable that one properly, iirc we decided the
> > race fix is too academic and can't be hit in reality ;-)
> > 
> > Testing would be great. Probably conflicts because we extracted
> > drm_connector.c only recently, but running s/drm_connector\.c/drm_crtc.c/
> > over the diff and then applying with some fudge should take care of that.
> 
> It doesn't apply to mainline, with or without the substitution you suggest.

I was hoping that the locking was the real cause here and would be an
easy fix to apply. I did have a look at trying to reorder the DP-MST
worker with driver registration. Hacky to say the least.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
>From 30ac9092e934295f12775f03d73170fc480b7fc8 Mon Sep 17 00:00:00 2001
From: Chris Wilson 
Date: Tue, 10 Jan 2017 10:46:25 +
Subject: [PATCH] dp-mst-register

---
 drivers/gpu/drm/i915/intel_dp.c | 12 +++-
 drivers/gpu/drm/i915/intel_dp_mst.c |  9 ++---
 2 files changed, 17 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_dp.c b/drivers/gpu/drm/i915/intel_dp.c
index f0f44cdbe4b4..fc10eb2c8563 100644
--- a/drivers/gpu/drm/i915/intel_dp.c
+++ b/drivers/gpu/drm/i915/intel_dp.c
@@ -4762,7 +4762,17 @@ intel_dp_connector_register(struct drm_connector *connector)
 		  intel_dp->aux.name, connector->kdev->kobj.name);
 
 	intel_dp->aux.dev = connector->kdev;
-	return drm_dp_aux_register(_dp->aux);
+	ret = drm_dp_aux_register(_dp->aux);
+	if (ret)
+		return ret;
+
+	if (intel_dp->mst_mgr.cbs) {
+		intel_dp->can_mst = true;
+		if (intel_dp->attached_connector)
+			intel_dp->attached_connector->base.status = intel_dp_long_pulse(intel_dp->attached_connector);
+	}
+
+	return 0;
 }
 
 static void
diff --git a/drivers/gpu/drm/i915/intel_dp_mst.c b/drivers/gpu/drm/i915/intel_dp_mst.c
index c93c1999a494..f0a664041dbc 100644
--- a/drivers/gpu/drm/i915/intel_dp_mst.c
+++ b/drivers/gpu/drm/i915/intel_dp_mst.c
@@ -582,16 +582,19 @@ intel_dp_mst_encoder_init(struct intel_digital_port *intel_dig_port, int conn_ba
 	struct drm_device *dev = intel_dig_port->base.base.dev;
 	int ret;
 
-	intel_dp->can_mst = true;
+	intel_dp->can_mst = false;
 	intel_dp->mst_mgr.cbs = _cbs;
 
 	/* create encoders */
 	intel_dp_create_fake_mst_encoders(intel_dig_port);
-	ret = drm_dp_mst_topology_mgr_init(_dp->mst_mgr, dev->dev, _dp->aux, 16, 3, conn_base_id);
+	ret = drm_dp_mst_topology_mgr_init(_dp->mst_mgr, dev->dev,
+	_dp->aux, 16, 3,
+	conn_base_id);
 	if (ret) {
-		intel_dp->can_mst = false;
+		intel_dp->mst_mgr.cbs = NULL;
 		return ret;
 	}
+
 	return 0;
 }
 
-- 
2.11.0



Re: [Intel-gfx] 4.10-rc2 oops in DRM connector code

2017-01-11 Thread Daniel Vetter
On Wed, Jan 11, 2017 at 4:24 PM, Dave Hansen  wrote:
> On 01/10/2017 11:43 PM, Daniel Vetter wrote:
>> On Tue, Jan 10, 2017 at 08:52:47AM -0800, Dave Hansen wrote:
>>> On 01/10/2017 02:31 AM, Daniel Vetter wrote:
 commit e73ab00e9a0f1731f34d0620a9c55f5c30c4ad4e
 Author: Daniel Vetter 
 Date:   Sun Dec 18 14:35:45 2016 +0100

 drm: prevent double-(un)registration for connectors

 Lack of that would perfectly explain that oops ... Otherwise still no idea
 what's going wrong.
>>> No...  That's not in mainline as far as I can see.  Should I test with
>>> it applied?
>> Hm, I guess failed to cc: stable that one properly, iirc we decided the
>> race fix is too academic and can't be hit in reality ;-)
>>
>> Testing would be great. Probably conflicts because we extracted
>> drm_connector.c only recently, but running s/drm_connector\.c/drm_crtc.c/
>> over the diff and then applying with some fudge should take care of that.
>
> It doesn't apply to mainline, with or without the substitution you suggest.
>
> Do you want that commit itself tested from -next?

Hm, just cherry-picked it on top of Linus' latest 4.10 git, applies
cleanly there. The substituation was for 4.9. I can send you the patch
here, but seems all fine from what I can tell ...
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch


Re: [Intel-gfx] 4.10-rc2 oops in DRM connector code

2017-01-11 Thread Daniel Vetter
On Wed, Jan 11, 2017 at 4:24 PM, Dave Hansen  wrote:
> On 01/10/2017 11:43 PM, Daniel Vetter wrote:
>> On Tue, Jan 10, 2017 at 08:52:47AM -0800, Dave Hansen wrote:
>>> On 01/10/2017 02:31 AM, Daniel Vetter wrote:
 commit e73ab00e9a0f1731f34d0620a9c55f5c30c4ad4e
 Author: Daniel Vetter 
 Date:   Sun Dec 18 14:35:45 2016 +0100

 drm: prevent double-(un)registration for connectors

 Lack of that would perfectly explain that oops ... Otherwise still no idea
 what's going wrong.
>>> No...  That's not in mainline as far as I can see.  Should I test with
>>> it applied?
>> Hm, I guess failed to cc: stable that one properly, iirc we decided the
>> race fix is too academic and can't be hit in reality ;-)
>>
>> Testing would be great. Probably conflicts because we extracted
>> drm_connector.c only recently, but running s/drm_connector\.c/drm_crtc.c/
>> over the diff and then applying with some fudge should take care of that.
>
> It doesn't apply to mainline, with or without the substitution you suggest.
>
> Do you want that commit itself tested from -next?

Hm, just cherry-picked it on top of Linus' latest 4.10 git, applies
cleanly there. The substituation was for 4.9. I can send you the patch
here, but seems all fine from what I can tell ...
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch


Re: [Intel-gfx] 4.10-rc2 oops in DRM connector code

2017-01-11 Thread Dave Hansen
On 01/10/2017 11:43 PM, Daniel Vetter wrote:
> On Tue, Jan 10, 2017 at 08:52:47AM -0800, Dave Hansen wrote:
>> On 01/10/2017 02:31 AM, Daniel Vetter wrote:
>>> commit e73ab00e9a0f1731f34d0620a9c55f5c30c4ad4e
>>> Author: Daniel Vetter 
>>> Date:   Sun Dec 18 14:35:45 2016 +0100
>>>
>>> drm: prevent double-(un)registration for connectors
>>>
>>> Lack of that would perfectly explain that oops ... Otherwise still no idea
>>> what's going wrong.
>> No...  That's not in mainline as far as I can see.  Should I test with
>> it applied?
> Hm, I guess failed to cc: stable that one properly, iirc we decided the
> race fix is too academic and can't be hit in reality ;-)
> 
> Testing would be great. Probably conflicts because we extracted
> drm_connector.c only recently, but running s/drm_connector\.c/drm_crtc.c/
> over the diff and then applying with some fudge should take care of that.

It doesn't apply to mainline, with or without the substitution you suggest.

Do you want that commit itself tested from -next?


Re: [Intel-gfx] 4.10-rc2 oops in DRM connector code

2017-01-11 Thread Dave Hansen
On 01/10/2017 11:43 PM, Daniel Vetter wrote:
> On Tue, Jan 10, 2017 at 08:52:47AM -0800, Dave Hansen wrote:
>> On 01/10/2017 02:31 AM, Daniel Vetter wrote:
>>> commit e73ab00e9a0f1731f34d0620a9c55f5c30c4ad4e
>>> Author: Daniel Vetter 
>>> Date:   Sun Dec 18 14:35:45 2016 +0100
>>>
>>> drm: prevent double-(un)registration for connectors
>>>
>>> Lack of that would perfectly explain that oops ... Otherwise still no idea
>>> what's going wrong.
>> No...  That's not in mainline as far as I can see.  Should I test with
>> it applied?
> Hm, I guess failed to cc: stable that one properly, iirc we decided the
> race fix is too academic and can't be hit in reality ;-)
> 
> Testing would be great. Probably conflicts because we extracted
> drm_connector.c only recently, but running s/drm_connector\.c/drm_crtc.c/
> over the diff and then applying with some fudge should take care of that.

It doesn't apply to mainline, with or without the substitution you suggest.

Do you want that commit itself tested from -next?


Re: [Intel-gfx] 4.10-rc2 oops in DRM connector code

2017-01-10 Thread Daniel Vetter
On Tue, Jan 10, 2017 at 08:52:47AM -0800, Dave Hansen wrote:
> On 01/10/2017 02:31 AM, Daniel Vetter wrote:
> > commit e73ab00e9a0f1731f34d0620a9c55f5c30c4ad4e
> > Author: Daniel Vetter 
> > Date:   Sun Dec 18 14:35:45 2016 +0100
> > 
> > drm: prevent double-(un)registration for connectors
> > 
> > Lack of that would perfectly explain that oops ... Otherwise still no idea
> > what's going wrong.
> 
> No...  That's not in mainline as far as I can see.  Should I test with
> it applied?

Hm, I guess failed to cc: stable that one properly, iirc we decided the
race fix is too academic and can't be hit in reality ;-)

Testing would be great. Probably conflicts because we extracted
drm_connector.c only recently, but running s/drm_connector\.c/drm_crtc.c/
over the diff and then applying with some fudge should take care of that.

Thanks, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [Intel-gfx] 4.10-rc2 oops in DRM connector code

2017-01-10 Thread Daniel Vetter
On Tue, Jan 10, 2017 at 08:52:47AM -0800, Dave Hansen wrote:
> On 01/10/2017 02:31 AM, Daniel Vetter wrote:
> > commit e73ab00e9a0f1731f34d0620a9c55f5c30c4ad4e
> > Author: Daniel Vetter 
> > Date:   Sun Dec 18 14:35:45 2016 +0100
> > 
> > drm: prevent double-(un)registration for connectors
> > 
> > Lack of that would perfectly explain that oops ... Otherwise still no idea
> > what's going wrong.
> 
> No...  That's not in mainline as far as I can see.  Should I test with
> it applied?

Hm, I guess failed to cc: stable that one properly, iirc we decided the
race fix is too academic and can't be hit in reality ;-)

Testing would be great. Probably conflicts because we extracted
drm_connector.c only recently, but running s/drm_connector\.c/drm_crtc.c/
over the diff and then applying with some fudge should take care of that.

Thanks, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [Intel-gfx] 4.10-rc2 oops in DRM connector code

2017-01-10 Thread Dave Hansen
On 01/10/2017 02:31 AM, Daniel Vetter wrote:
> commit e73ab00e9a0f1731f34d0620a9c55f5c30c4ad4e
> Author: Daniel Vetter 
> Date:   Sun Dec 18 14:35:45 2016 +0100
> 
> drm: prevent double-(un)registration for connectors
> 
> Lack of that would perfectly explain that oops ... Otherwise still no idea
> what's going wrong.

No...  That's not in mainline as far as I can see.  Should I test with
it applied?


Re: [Intel-gfx] 4.10-rc2 oops in DRM connector code

2017-01-10 Thread Dave Hansen
On 01/10/2017 02:31 AM, Daniel Vetter wrote:
> commit e73ab00e9a0f1731f34d0620a9c55f5c30c4ad4e
> Author: Daniel Vetter 
> Date:   Sun Dec 18 14:35:45 2016 +0100
> 
> drm: prevent double-(un)registration for connectors
> 
> Lack of that would perfectly explain that oops ... Otherwise still no idea
> what's going wrong.

No...  That's not in mainline as far as I can see.  Should I test with
it applied?


Re: [Intel-gfx] 4.10-rc2 oops in DRM connector code

2017-01-10 Thread Daniel Vetter
On Mon, Jan 09, 2017 at 09:42:22AM -0800, Dave Hansen wrote:
> On 01/09/2017 08:59 AM, Daniel Vetter wrote:
> > On Mon, Jan 9, 2017 at 5:50 PM, Dave Hansen  wrote:
> >> On 01/09/2017 08:41 AM, Daniel Vetter wrote:
> >>> On Mon, Jan 9, 2017 at 2:40 PM, Dave Hansen  wrote:
>  Well, now I found where the -2 comes from.
>  intel_dp_register_mst_connector() calls drm_connector_register(), which
>  fails to add the kobject (warning below).  But, it does zero error
>  checking on the drm_connector_register() call and leaves the
>  partially-constructed connector in place.
> 
>  The next time some poor, hapless code goes and tries to do anything with
>  that kdev, they oops.  I'm perplexed by this, though.  The
>  drm_dp_mst_topology_cbs->register_connector just returns void.  It seems
>  a bit goofy that it can't even _return_ failure.
> 
>  Is there some stable code to go back to here?  Or, is there something
>  about my configuration that's unique?  I really wonder why nobody else
>  is running into this.
> 
>  There's probably some other race going on here.  This warning doesn't
>  happen on every boot.
> >>> This smells more like the root-cause: Something goes wrong on boot
> >>> that prevents connectors from properly registering, then we fall over
> >>> later on. And the register callback is intentionally void, assuming
> >>> that any prep work has been done earlier and that therefore the
> >>> register step can't fail. Can you pls check whether the oops later on
> >>> only happens together with this warning at boot, or whether they're
> >>> not correlated?
> >>
> >> Looking through my logs, I can't find any instance of the oops without
> >> the warning at boot.  So I do think the later oops is entirely caused by
> >> the issue warned about in early boot.
> > 
> > Hm, I guess then we'd need to fix that boot-up warning. Can you try to
> > figure out why it's unhappy? On a hunch it could be that we call
> > drm_connector_register from the mst probe worker before the main
> > driver load thread has reached the drm_dev_register call. A few printk
> > to decide whether that's the case (plus a few boot-up tests to gather
> > the statistics, sorry about that) would be real great.
> 
> I think your suspicion is correct.
> 
> The WARNING is getting printed from CPU0 (the
> drm_dp_mst_link_probe_work() thread) and we can see the
> drm_connector_register()s happening in parallel on CPU1 while the
> WARNING is getting printed out.
> 
> This seems to happen very consistently when i915 is built into the kernel.

Do you have

commit e73ab00e9a0f1731f34d0620a9c55f5c30c4ad4e
Author: Daniel Vetter 
Date:   Sun Dec 18 14:35:45 2016 +0100

drm: prevent double-(un)registration for connectors

Lack of that would perfectly explain that oops ... Otherwise still no idea
what's going wrong.
-Daniel

> 
> > [1.404237] drm_dev_register(02da)::209092608 cpu: 1
> > [1.404284] drm_connector_register(88040ed3e800)::379 
> > connector->registered: 0 cpu: 0
> > [1.404293] sysfs_create_dir_ns()::53 error: -2
> > [1.404295] create_dir()::75 error: -2 cpu: 0
> > [1.404298] [ cut here ]
> > [1.404304] WARNING: CPU: 0 PID: 3 at lib/kobject.c:249 
> > kobject_add_internal+0x273/0x330
> > [1.404308] kobject_add_internal failed for card0-DP-3 (error: -2 
> > parent: card0)
> > [1.404311] Modules linked in:
> > [1.404316] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 
> > 4.10.0-rc3-i915borked-dirty #66
> > [1.404319] Hardware name: LENOVO 20F5S7V800/20F5S7V800, BIOS R02ET50W 
> > (1.23 ) 09/20/2016
> > [1.404325] Workqueue: events_long drm_dp_mst_link_probe_work
> > [1.404328] Call Trace:
> > [1.404334]  dump_stack+0x67/0x99
> > [1.404338]  __warn+0xd1/0xf0
> > [1.404342]  warn_slowpath_fmt+0x4f/0x60
> > [1.404345]  kobject_add_internal+0x273/0x330
> > [1.404347]  kobject_add+0x65/0xb0
> > [1.404352]  ? klist_init+0x31/0x40
> > [1.404356]  device_add+0x102/0x5d0
> > [1.404360]  ? kfree_const+0x22/0x30
> > [1.404364]  device_create_groups_vargs+0xd8/0x100
> > [1.404367]  device_create_with_groups+0x36/0x40
> > [1.404370] drm_connector_register(88040c779000)::379 
> > connector->registered: 0 cpu: 1
> > [1.404374]  ? vprintk_default+0x29/0x50
> > [1.404375]  ? printk+0x4d/0x4f
> > [1.404378]  drm_sysfs_connector_add+0x60/0xe0
> > [1.404381]  drm_connector_register+0x4e/0x100
> > [1.404383]  intel_dp_register_mst_connector+0x41/0x50
> > [1.404385]  drm_dp_add_port+0x350/0x450
> > [1.404388]  ? rcu_early_boot_tests+0x1/0x10
> > [1.404390]  ? schedule_timeout+0x1cd/0x390
> > [1.404392]  ? __might_sleep+0x4a/0x90
> > [1.404394]  ? mutex_lock+0x25/0x50
> > [1.404395]  ? drm_dp_mst_wait_tx_reply+0x118/0x1e0
> > [1.404397]  ? 

Re: [Intel-gfx] 4.10-rc2 oops in DRM connector code

2017-01-10 Thread Daniel Vetter
On Mon, Jan 09, 2017 at 09:42:22AM -0800, Dave Hansen wrote:
> On 01/09/2017 08:59 AM, Daniel Vetter wrote:
> > On Mon, Jan 9, 2017 at 5:50 PM, Dave Hansen  wrote:
> >> On 01/09/2017 08:41 AM, Daniel Vetter wrote:
> >>> On Mon, Jan 9, 2017 at 2:40 PM, Dave Hansen  wrote:
>  Well, now I found where the -2 comes from.
>  intel_dp_register_mst_connector() calls drm_connector_register(), which
>  fails to add the kobject (warning below).  But, it does zero error
>  checking on the drm_connector_register() call and leaves the
>  partially-constructed connector in place.
> 
>  The next time some poor, hapless code goes and tries to do anything with
>  that kdev, they oops.  I'm perplexed by this, though.  The
>  drm_dp_mst_topology_cbs->register_connector just returns void.  It seems
>  a bit goofy that it can't even _return_ failure.
> 
>  Is there some stable code to go back to here?  Or, is there something
>  about my configuration that's unique?  I really wonder why nobody else
>  is running into this.
> 
>  There's probably some other race going on here.  This warning doesn't
>  happen on every boot.
> >>> This smells more like the root-cause: Something goes wrong on boot
> >>> that prevents connectors from properly registering, then we fall over
> >>> later on. And the register callback is intentionally void, assuming
> >>> that any prep work has been done earlier and that therefore the
> >>> register step can't fail. Can you pls check whether the oops later on
> >>> only happens together with this warning at boot, or whether they're
> >>> not correlated?
> >>
> >> Looking through my logs, I can't find any instance of the oops without
> >> the warning at boot.  So I do think the later oops is entirely caused by
> >> the issue warned about in early boot.
> > 
> > Hm, I guess then we'd need to fix that boot-up warning. Can you try to
> > figure out why it's unhappy? On a hunch it could be that we call
> > drm_connector_register from the mst probe worker before the main
> > driver load thread has reached the drm_dev_register call. A few printk
> > to decide whether that's the case (plus a few boot-up tests to gather
> > the statistics, sorry about that) would be real great.
> 
> I think your suspicion is correct.
> 
> The WARNING is getting printed from CPU0 (the
> drm_dp_mst_link_probe_work() thread) and we can see the
> drm_connector_register()s happening in parallel on CPU1 while the
> WARNING is getting printed out.
> 
> This seems to happen very consistently when i915 is built into the kernel.

Do you have

commit e73ab00e9a0f1731f34d0620a9c55f5c30c4ad4e
Author: Daniel Vetter 
Date:   Sun Dec 18 14:35:45 2016 +0100

drm: prevent double-(un)registration for connectors

Lack of that would perfectly explain that oops ... Otherwise still no idea
what's going wrong.
-Daniel

> 
> > [1.404237] drm_dev_register(02da)::209092608 cpu: 1
> > [1.404284] drm_connector_register(88040ed3e800)::379 
> > connector->registered: 0 cpu: 0
> > [1.404293] sysfs_create_dir_ns()::53 error: -2
> > [1.404295] create_dir()::75 error: -2 cpu: 0
> > [1.404298] [ cut here ]
> > [1.404304] WARNING: CPU: 0 PID: 3 at lib/kobject.c:249 
> > kobject_add_internal+0x273/0x330
> > [1.404308] kobject_add_internal failed for card0-DP-3 (error: -2 
> > parent: card0)
> > [1.404311] Modules linked in:
> > [1.404316] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 
> > 4.10.0-rc3-i915borked-dirty #66
> > [1.404319] Hardware name: LENOVO 20F5S7V800/20F5S7V800, BIOS R02ET50W 
> > (1.23 ) 09/20/2016
> > [1.404325] Workqueue: events_long drm_dp_mst_link_probe_work
> > [1.404328] Call Trace:
> > [1.404334]  dump_stack+0x67/0x99
> > [1.404338]  __warn+0xd1/0xf0
> > [1.404342]  warn_slowpath_fmt+0x4f/0x60
> > [1.404345]  kobject_add_internal+0x273/0x330
> > [1.404347]  kobject_add+0x65/0xb0
> > [1.404352]  ? klist_init+0x31/0x40
> > [1.404356]  device_add+0x102/0x5d0
> > [1.404360]  ? kfree_const+0x22/0x30
> > [1.404364]  device_create_groups_vargs+0xd8/0x100
> > [1.404367]  device_create_with_groups+0x36/0x40
> > [1.404370] drm_connector_register(88040c779000)::379 
> > connector->registered: 0 cpu: 1
> > [1.404374]  ? vprintk_default+0x29/0x50
> > [1.404375]  ? printk+0x4d/0x4f
> > [1.404378]  drm_sysfs_connector_add+0x60/0xe0
> > [1.404381]  drm_connector_register+0x4e/0x100
> > [1.404383]  intel_dp_register_mst_connector+0x41/0x50
> > [1.404385]  drm_dp_add_port+0x350/0x450
> > [1.404388]  ? rcu_early_boot_tests+0x1/0x10
> > [1.404390]  ? schedule_timeout+0x1cd/0x390
> > [1.404392]  ? __might_sleep+0x4a/0x90
> > [1.404394]  ? mutex_lock+0x25/0x50
> > [1.404395]  ? drm_dp_mst_wait_tx_reply+0x118/0x1e0
> > [1.404397]  ? prepare_to_wait_event+0x120/0x120
> > [1.404399]  ? 

Re: [Intel-gfx] 4.10-rc2 oops in DRM connector code

2017-01-09 Thread Alex Deucher
On Mon, Jan 9, 2017 at 12:22 PM, Dave Hansen  wrote:
> On 01/09/2017 08:59 AM, Daniel Vetter wrote:
>> On Mon, Jan 9, 2017 at 5:50 PM, Dave Hansen  wrote:
>>> On 01/09/2017 08:41 AM, Daniel Vetter wrote:
 On Mon, Jan 9, 2017 at 2:40 PM, Dave Hansen  wrote:
> Well, now I found where the -2 comes from.
> intel_dp_register_mst_connector() calls drm_connector_register(), which
> fails to add the kobject (warning below).  But, it does zero error
> checking on the drm_connector_register() call and leaves the
> partially-constructed connector in place.
>
> The next time some poor, hapless code goes and tries to do anything with
> that kdev, they oops.  I'm perplexed by this, though.  The
> drm_dp_mst_topology_cbs->register_connector just returns void.  It seems
> a bit goofy that it can't even _return_ failure.
>
> Is there some stable code to go back to here?  Or, is there something
> about my configuration that's unique?  I really wonder why nobody else
> is running into this.
>
> There's probably some other race going on here.  This warning doesn't
> happen on every boot.
 This smells more like the root-cause: Something goes wrong on boot
 that prevents connectors from properly registering, then we fall over
 later on. And the register callback is intentionally void, assuming
 that any prep work has been done earlier and that therefore the
 register step can't fail. Can you pls check whether the oops later on
 only happens together with this warning at boot, or whether they're
 not correlated?
>>>
>>> Looking through my logs, I can't find any instance of the oops without
>>> the warning at boot.  So I do think the later oops is entirely caused by
>>> the issue warned about in early boot.
>>
>> Hm, I guess then we'd need to fix that boot-up warning. Can you try to
>> figure out why it's unhappy? On a hunch it could be that we call
>> drm_connector_register from the mst probe worker before the main
>> driver load thread has reached the drm_dev_register call. A few printk
>> to decide whether that's the case (plus a few boot-up tests to gather
>> the statistics, sorry about that) would be real great.
>>
>> If that's inconclusive I'm again a bit low on ideas ...
>
> I'll do that shortly.  But, for now I can confirm that the failure is
> precipitated by the !parent check in sysfs_create_dir_ns().
>
> I also can't reproduce this if I build i915 as a module.  It only
> happens when built in.

FWIW, we ran into a race with fbdev and mst topology discovery.  Maybe
you are seeing something similar?

Alex


>
>> Jan  9 09:07:34 ray kernel: [1.400547] sysfs_create_dir_ns()::53 error: 
>> -2
>> Jan  9 09:07:34 ray kernel: [1.400554] create_dir()::75 error: -2
>> Jan  9 09:07:34 ray kernel: [1.400558] [ cut here 
>> ]
>> Jan  9 09:07:34 ray kernel: [1.400565] WARNING: CPU: 1 PID: 90 at 
>> lib/kobject.c:249 kobject_add_internal+0x273/0x320
>> Jan  9 09:07:34 ray kernel: [1.400569] kobject_add_internal failed for 
>> card0-DP-3 (error: -2 parent: card0)
>> Jan  9 09:07:34 ray kernel: [1.400572] Modules linked in:
>> Jan  9 09:07:34 ray kernel: [1.400577] CPU: 1 PID: 90 Comm: kworker/1:2 
>> Not tainted 4.10.0-rc3-dirty #61
>> Jan  9 09:07:34 ray kernel: [1.400579] Hardware name: LENOVO 
>> 20F5S7V800/20F5S7V800, BIOS R02ET50W (1.23 ) 09/20/2016
>> Jan  9 09:07:34 ray kernel: [1.400585] Workqueue: events_long 
>> drm_dp_mst_link_probe_work
>> Jan  9 09:07:34 ray kernel: [1.400588] Call Trace:
>> Jan  9 09:07:34 ray kernel: [1.400593]  dump_stack+0x67/0x99
>> Jan  9 09:07:34 ray kernel: [1.400598]  __warn+0xd1/0xf0
>> Jan  9 09:07:34 ray kernel: [1.400601]  warn_slowpath_fmt+0x4f/0x60
>> Jan  9 09:07:34 ray kernel: [1.400604]  kobject_add_internal+0x273/0x320
>> Jan  9 09:07:34 ray kernel: [1.400607]  kobject_add+0x65/0xb0
>> Jan  9 09:07:34 ray kernel: [1.400611]  ? klist_init+0x31/0x40
>> Jan  9 09:07:34 ray kernel: [1.400615]  device_add+0x102/0x5d0
>> Jan  9 09:07:34 ray kernel: [1.400619]  ? kfree_const+0x22/0x30
>> Jan  9 09:07:34 ray kernel: [1.400623]  
>> device_create_groups_vargs+0xd8/0x100
>> Jan  9 09:07:34 ray kernel: [1.400626]  
>> device_create_with_groups+0x36/0x40
>> Jan  9 09:07:34 ray kernel: [1.400631]  ? 
>> drm_fb_helper_add_one_connector+0x57/0xd0
>> Jan  9 09:07:34 ray kernel: [1.400636]  ? 
>> kmem_cache_alloc_trace+0x1d2/0x1f0
>> Jan  9 09:07:34 ray kernel: [1.400641]  drm_sysfs_connector_add+0x60/0xe0
>> Jan  9 09:07:34 ray kernel: [1.400645]  drm_connector_register+0x21/0xc0
>> Jan  9 09:07:34 ray kernel: [1.400649]  
>> intel_dp_register_mst_connector+0x41/0x50
>> Jan  9 09:07:34 ray kernel: [1.400653]  drm_dp_add_port+0x350/0x450
>> Jan  9 09:07:34 ray kernel: [1.400657]  ? 

Re: [Intel-gfx] 4.10-rc2 oops in DRM connector code

2017-01-09 Thread Alex Deucher
On Mon, Jan 9, 2017 at 12:22 PM, Dave Hansen  wrote:
> On 01/09/2017 08:59 AM, Daniel Vetter wrote:
>> On Mon, Jan 9, 2017 at 5:50 PM, Dave Hansen  wrote:
>>> On 01/09/2017 08:41 AM, Daniel Vetter wrote:
 On Mon, Jan 9, 2017 at 2:40 PM, Dave Hansen  wrote:
> Well, now I found where the -2 comes from.
> intel_dp_register_mst_connector() calls drm_connector_register(), which
> fails to add the kobject (warning below).  But, it does zero error
> checking on the drm_connector_register() call and leaves the
> partially-constructed connector in place.
>
> The next time some poor, hapless code goes and tries to do anything with
> that kdev, they oops.  I'm perplexed by this, though.  The
> drm_dp_mst_topology_cbs->register_connector just returns void.  It seems
> a bit goofy that it can't even _return_ failure.
>
> Is there some stable code to go back to here?  Or, is there something
> about my configuration that's unique?  I really wonder why nobody else
> is running into this.
>
> There's probably some other race going on here.  This warning doesn't
> happen on every boot.
 This smells more like the root-cause: Something goes wrong on boot
 that prevents connectors from properly registering, then we fall over
 later on. And the register callback is intentionally void, assuming
 that any prep work has been done earlier and that therefore the
 register step can't fail. Can you pls check whether the oops later on
 only happens together with this warning at boot, or whether they're
 not correlated?
>>>
>>> Looking through my logs, I can't find any instance of the oops without
>>> the warning at boot.  So I do think the later oops is entirely caused by
>>> the issue warned about in early boot.
>>
>> Hm, I guess then we'd need to fix that boot-up warning. Can you try to
>> figure out why it's unhappy? On a hunch it could be that we call
>> drm_connector_register from the mst probe worker before the main
>> driver load thread has reached the drm_dev_register call. A few printk
>> to decide whether that's the case (plus a few boot-up tests to gather
>> the statistics, sorry about that) would be real great.
>>
>> If that's inconclusive I'm again a bit low on ideas ...
>
> I'll do that shortly.  But, for now I can confirm that the failure is
> precipitated by the !parent check in sysfs_create_dir_ns().
>
> I also can't reproduce this if I build i915 as a module.  It only
> happens when built in.

FWIW, we ran into a race with fbdev and mst topology discovery.  Maybe
you are seeing something similar?

Alex


>
>> Jan  9 09:07:34 ray kernel: [1.400547] sysfs_create_dir_ns()::53 error: 
>> -2
>> Jan  9 09:07:34 ray kernel: [1.400554] create_dir()::75 error: -2
>> Jan  9 09:07:34 ray kernel: [1.400558] [ cut here 
>> ]
>> Jan  9 09:07:34 ray kernel: [1.400565] WARNING: CPU: 1 PID: 90 at 
>> lib/kobject.c:249 kobject_add_internal+0x273/0x320
>> Jan  9 09:07:34 ray kernel: [1.400569] kobject_add_internal failed for 
>> card0-DP-3 (error: -2 parent: card0)
>> Jan  9 09:07:34 ray kernel: [1.400572] Modules linked in:
>> Jan  9 09:07:34 ray kernel: [1.400577] CPU: 1 PID: 90 Comm: kworker/1:2 
>> Not tainted 4.10.0-rc3-dirty #61
>> Jan  9 09:07:34 ray kernel: [1.400579] Hardware name: LENOVO 
>> 20F5S7V800/20F5S7V800, BIOS R02ET50W (1.23 ) 09/20/2016
>> Jan  9 09:07:34 ray kernel: [1.400585] Workqueue: events_long 
>> drm_dp_mst_link_probe_work
>> Jan  9 09:07:34 ray kernel: [1.400588] Call Trace:
>> Jan  9 09:07:34 ray kernel: [1.400593]  dump_stack+0x67/0x99
>> Jan  9 09:07:34 ray kernel: [1.400598]  __warn+0xd1/0xf0
>> Jan  9 09:07:34 ray kernel: [1.400601]  warn_slowpath_fmt+0x4f/0x60
>> Jan  9 09:07:34 ray kernel: [1.400604]  kobject_add_internal+0x273/0x320
>> Jan  9 09:07:34 ray kernel: [1.400607]  kobject_add+0x65/0xb0
>> Jan  9 09:07:34 ray kernel: [1.400611]  ? klist_init+0x31/0x40
>> Jan  9 09:07:34 ray kernel: [1.400615]  device_add+0x102/0x5d0
>> Jan  9 09:07:34 ray kernel: [1.400619]  ? kfree_const+0x22/0x30
>> Jan  9 09:07:34 ray kernel: [1.400623]  
>> device_create_groups_vargs+0xd8/0x100
>> Jan  9 09:07:34 ray kernel: [1.400626]  
>> device_create_with_groups+0x36/0x40
>> Jan  9 09:07:34 ray kernel: [1.400631]  ? 
>> drm_fb_helper_add_one_connector+0x57/0xd0
>> Jan  9 09:07:34 ray kernel: [1.400636]  ? 
>> kmem_cache_alloc_trace+0x1d2/0x1f0
>> Jan  9 09:07:34 ray kernel: [1.400641]  drm_sysfs_connector_add+0x60/0xe0
>> Jan  9 09:07:34 ray kernel: [1.400645]  drm_connector_register+0x21/0xc0
>> Jan  9 09:07:34 ray kernel: [1.400649]  
>> intel_dp_register_mst_connector+0x41/0x50
>> Jan  9 09:07:34 ray kernel: [1.400653]  drm_dp_add_port+0x350/0x450
>> Jan  9 09:07:34 ray kernel: [1.400657]  ? rcu_early_boot_tests+0x1/0x10
>> Jan  9 09:07:34 ray kernel: [1.400660]  ? 

Re: [Intel-gfx] 4.10-rc2 oops in DRM connector code

2017-01-09 Thread Dave Hansen
On 01/09/2017 08:59 AM, Daniel Vetter wrote:
> On Mon, Jan 9, 2017 at 5:50 PM, Dave Hansen  wrote:
>> On 01/09/2017 08:41 AM, Daniel Vetter wrote:
>>> On Mon, Jan 9, 2017 at 2:40 PM, Dave Hansen  wrote:
 Well, now I found where the -2 comes from.
 intel_dp_register_mst_connector() calls drm_connector_register(), which
 fails to add the kobject (warning below).  But, it does zero error
 checking on the drm_connector_register() call and leaves the
 partially-constructed connector in place.

 The next time some poor, hapless code goes and tries to do anything with
 that kdev, they oops.  I'm perplexed by this, though.  The
 drm_dp_mst_topology_cbs->register_connector just returns void.  It seems
 a bit goofy that it can't even _return_ failure.

 Is there some stable code to go back to here?  Or, is there something
 about my configuration that's unique?  I really wonder why nobody else
 is running into this.

 There's probably some other race going on here.  This warning doesn't
 happen on every boot.
>>> This smells more like the root-cause: Something goes wrong on boot
>>> that prevents connectors from properly registering, then we fall over
>>> later on. And the register callback is intentionally void, assuming
>>> that any prep work has been done earlier and that therefore the
>>> register step can't fail. Can you pls check whether the oops later on
>>> only happens together with this warning at boot, or whether they're
>>> not correlated?
>>
>> Looking through my logs, I can't find any instance of the oops without
>> the warning at boot.  So I do think the later oops is entirely caused by
>> the issue warned about in early boot.
> 
> Hm, I guess then we'd need to fix that boot-up warning. Can you try to
> figure out why it's unhappy? On a hunch it could be that we call
> drm_connector_register from the mst probe worker before the main
> driver load thread has reached the drm_dev_register call. A few printk
> to decide whether that's the case (plus a few boot-up tests to gather
> the statistics, sorry about that) would be real great.

I think your suspicion is correct.

The WARNING is getting printed from CPU0 (the
drm_dp_mst_link_probe_work() thread) and we can see the
drm_connector_register()s happening in parallel on CPU1 while the
WARNING is getting printed out.

This seems to happen very consistently when i915 is built into the kernel.

> [1.404237] drm_dev_register(02da)::209092608 cpu: 1
> [1.404284] drm_connector_register(88040ed3e800)::379 
> connector->registered: 0 cpu: 0
> [1.404293] sysfs_create_dir_ns()::53 error: -2
> [1.404295] create_dir()::75 error: -2 cpu: 0
> [1.404298] [ cut here ]
> [1.404304] WARNING: CPU: 0 PID: 3 at lib/kobject.c:249 
> kobject_add_internal+0x273/0x330
> [1.404308] kobject_add_internal failed for card0-DP-3 (error: -2 parent: 
> card0)
> [1.404311] Modules linked in:
> [1.404316] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 
> 4.10.0-rc3-i915borked-dirty #66
> [1.404319] Hardware name: LENOVO 20F5S7V800/20F5S7V800, BIOS R02ET50W 
> (1.23 ) 09/20/2016
> [1.404325] Workqueue: events_long drm_dp_mst_link_probe_work
> [1.404328] Call Trace:
> [1.404334]  dump_stack+0x67/0x99
> [1.404338]  __warn+0xd1/0xf0
> [1.404342]  warn_slowpath_fmt+0x4f/0x60
> [1.404345]  kobject_add_internal+0x273/0x330
> [1.404347]  kobject_add+0x65/0xb0
> [1.404352]  ? klist_init+0x31/0x40
> [1.404356]  device_add+0x102/0x5d0
> [1.404360]  ? kfree_const+0x22/0x30
> [1.404364]  device_create_groups_vargs+0xd8/0x100
> [1.404367]  device_create_with_groups+0x36/0x40
> [1.404370] drm_connector_register(88040c779000)::379 
> connector->registered: 0 cpu: 1
> [1.404374]  ? vprintk_default+0x29/0x50
> [1.404375]  ? printk+0x4d/0x4f
> [1.404378]  drm_sysfs_connector_add+0x60/0xe0
> [1.404381]  drm_connector_register+0x4e/0x100
> [1.404383]  intel_dp_register_mst_connector+0x41/0x50
> [1.404385]  drm_dp_add_port+0x350/0x450
> [1.404388]  ? rcu_early_boot_tests+0x1/0x10
> [1.404390]  ? schedule_timeout+0x1cd/0x390
> [1.404392]  ? __might_sleep+0x4a/0x90
> [1.404394]  ? mutex_lock+0x25/0x50
> [1.404395]  ? drm_dp_mst_wait_tx_reply+0x118/0x1e0
> [1.404397]  ? prepare_to_wait_event+0x120/0x120
> [1.404399]  ? drm_dp_check_mstb_guid+0x3d/0x120
> [1.404401]  drm_dp_send_link_address+0x185/0x1f0
> [1.404403]  drm_dp_check_and_send_link_address+0xad/0xc0
> [1.404405]  drm_dp_mst_link_probe_work+0x57/0xa0
> [1.404407]  process_one_work+0x14b/0x430
> [1.404408]  worker_thread+0x12b/0x4a0
> [1.404410]  kthread+0x10c/0x140
> [1.404411]  ? process_one_work+0x430/0x430
> [1.404413]  ? kthread_create_on_node+0x40/0x40
> [1.404415]  ret_from_fork+0x27/0x40
> [1.404418] ---[ end 

Re: [Intel-gfx] 4.10-rc2 oops in DRM connector code

2017-01-09 Thread Dave Hansen
On 01/09/2017 08:59 AM, Daniel Vetter wrote:
> On Mon, Jan 9, 2017 at 5:50 PM, Dave Hansen  wrote:
>> On 01/09/2017 08:41 AM, Daniel Vetter wrote:
>>> On Mon, Jan 9, 2017 at 2:40 PM, Dave Hansen  wrote:
 Well, now I found where the -2 comes from.
 intel_dp_register_mst_connector() calls drm_connector_register(), which
 fails to add the kobject (warning below).  But, it does zero error
 checking on the drm_connector_register() call and leaves the
 partially-constructed connector in place.

 The next time some poor, hapless code goes and tries to do anything with
 that kdev, they oops.  I'm perplexed by this, though.  The
 drm_dp_mst_topology_cbs->register_connector just returns void.  It seems
 a bit goofy that it can't even _return_ failure.

 Is there some stable code to go back to here?  Or, is there something
 about my configuration that's unique?  I really wonder why nobody else
 is running into this.

 There's probably some other race going on here.  This warning doesn't
 happen on every boot.
>>> This smells more like the root-cause: Something goes wrong on boot
>>> that prevents connectors from properly registering, then we fall over
>>> later on. And the register callback is intentionally void, assuming
>>> that any prep work has been done earlier and that therefore the
>>> register step can't fail. Can you pls check whether the oops later on
>>> only happens together with this warning at boot, or whether they're
>>> not correlated?
>>
>> Looking through my logs, I can't find any instance of the oops without
>> the warning at boot.  So I do think the later oops is entirely caused by
>> the issue warned about in early boot.
> 
> Hm, I guess then we'd need to fix that boot-up warning. Can you try to
> figure out why it's unhappy? On a hunch it could be that we call
> drm_connector_register from the mst probe worker before the main
> driver load thread has reached the drm_dev_register call. A few printk
> to decide whether that's the case (plus a few boot-up tests to gather
> the statistics, sorry about that) would be real great.

I think your suspicion is correct.

The WARNING is getting printed from CPU0 (the
drm_dp_mst_link_probe_work() thread) and we can see the
drm_connector_register()s happening in parallel on CPU1 while the
WARNING is getting printed out.

This seems to happen very consistently when i915 is built into the kernel.

> [1.404237] drm_dev_register(02da)::209092608 cpu: 1
> [1.404284] drm_connector_register(88040ed3e800)::379 
> connector->registered: 0 cpu: 0
> [1.404293] sysfs_create_dir_ns()::53 error: -2
> [1.404295] create_dir()::75 error: -2 cpu: 0
> [1.404298] [ cut here ]
> [1.404304] WARNING: CPU: 0 PID: 3 at lib/kobject.c:249 
> kobject_add_internal+0x273/0x330
> [1.404308] kobject_add_internal failed for card0-DP-3 (error: -2 parent: 
> card0)
> [1.404311] Modules linked in:
> [1.404316] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 
> 4.10.0-rc3-i915borked-dirty #66
> [1.404319] Hardware name: LENOVO 20F5S7V800/20F5S7V800, BIOS R02ET50W 
> (1.23 ) 09/20/2016
> [1.404325] Workqueue: events_long drm_dp_mst_link_probe_work
> [1.404328] Call Trace:
> [1.404334]  dump_stack+0x67/0x99
> [1.404338]  __warn+0xd1/0xf0
> [1.404342]  warn_slowpath_fmt+0x4f/0x60
> [1.404345]  kobject_add_internal+0x273/0x330
> [1.404347]  kobject_add+0x65/0xb0
> [1.404352]  ? klist_init+0x31/0x40
> [1.404356]  device_add+0x102/0x5d0
> [1.404360]  ? kfree_const+0x22/0x30
> [1.404364]  device_create_groups_vargs+0xd8/0x100
> [1.404367]  device_create_with_groups+0x36/0x40
> [1.404370] drm_connector_register(88040c779000)::379 
> connector->registered: 0 cpu: 1
> [1.404374]  ? vprintk_default+0x29/0x50
> [1.404375]  ? printk+0x4d/0x4f
> [1.404378]  drm_sysfs_connector_add+0x60/0xe0
> [1.404381]  drm_connector_register+0x4e/0x100
> [1.404383]  intel_dp_register_mst_connector+0x41/0x50
> [1.404385]  drm_dp_add_port+0x350/0x450
> [1.404388]  ? rcu_early_boot_tests+0x1/0x10
> [1.404390]  ? schedule_timeout+0x1cd/0x390
> [1.404392]  ? __might_sleep+0x4a/0x90
> [1.404394]  ? mutex_lock+0x25/0x50
> [1.404395]  ? drm_dp_mst_wait_tx_reply+0x118/0x1e0
> [1.404397]  ? prepare_to_wait_event+0x120/0x120
> [1.404399]  ? drm_dp_check_mstb_guid+0x3d/0x120
> [1.404401]  drm_dp_send_link_address+0x185/0x1f0
> [1.404403]  drm_dp_check_and_send_link_address+0xad/0xc0
> [1.404405]  drm_dp_mst_link_probe_work+0x57/0xa0
> [1.404407]  process_one_work+0x14b/0x430
> [1.404408]  worker_thread+0x12b/0x4a0
> [1.404410]  kthread+0x10c/0x140
> [1.404411]  ? process_one_work+0x430/0x430
> [1.404413]  ? kthread_create_on_node+0x40/0x40
> [1.404415]  ret_from_fork+0x27/0x40
> [1.404418] ---[ end trace 952a97b7cba4984f ]---
> [1.404420] 

Re: [Intel-gfx] 4.10-rc2 oops in DRM connector code

2017-01-09 Thread Dave Hansen
On 01/09/2017 08:59 AM, Daniel Vetter wrote:
> On Mon, Jan 9, 2017 at 5:50 PM, Dave Hansen  wrote:
>> On 01/09/2017 08:41 AM, Daniel Vetter wrote:
>>> On Mon, Jan 9, 2017 at 2:40 PM, Dave Hansen  wrote:
 Well, now I found where the -2 comes from.
 intel_dp_register_mst_connector() calls drm_connector_register(), which
 fails to add the kobject (warning below).  But, it does zero error
 checking on the drm_connector_register() call and leaves the
 partially-constructed connector in place.

 The next time some poor, hapless code goes and tries to do anything with
 that kdev, they oops.  I'm perplexed by this, though.  The
 drm_dp_mst_topology_cbs->register_connector just returns void.  It seems
 a bit goofy that it can't even _return_ failure.

 Is there some stable code to go back to here?  Or, is there something
 about my configuration that's unique?  I really wonder why nobody else
 is running into this.

 There's probably some other race going on here.  This warning doesn't
 happen on every boot.
>>> This smells more like the root-cause: Something goes wrong on boot
>>> that prevents connectors from properly registering, then we fall over
>>> later on. And the register callback is intentionally void, assuming
>>> that any prep work has been done earlier and that therefore the
>>> register step can't fail. Can you pls check whether the oops later on
>>> only happens together with this warning at boot, or whether they're
>>> not correlated?
>>
>> Looking through my logs, I can't find any instance of the oops without
>> the warning at boot.  So I do think the later oops is entirely caused by
>> the issue warned about in early boot.
> 
> Hm, I guess then we'd need to fix that boot-up warning. Can you try to
> figure out why it's unhappy? On a hunch it could be that we call
> drm_connector_register from the mst probe worker before the main
> driver load thread has reached the drm_dev_register call. A few printk
> to decide whether that's the case (plus a few boot-up tests to gather
> the statistics, sorry about that) would be real great.
> 
> If that's inconclusive I'm again a bit low on ideas ...

I'll do that shortly.  But, for now I can confirm that the failure is
precipitated by the !parent check in sysfs_create_dir_ns().

I also can't reproduce this if I build i915 as a module.  It only
happens when built in.

> Jan  9 09:07:34 ray kernel: [1.400547] sysfs_create_dir_ns()::53 error: -2
> Jan  9 09:07:34 ray kernel: [1.400554] create_dir()::75 error: -2
> Jan  9 09:07:34 ray kernel: [1.400558] [ cut here 
> ]
> Jan  9 09:07:34 ray kernel: [1.400565] WARNING: CPU: 1 PID: 90 at 
> lib/kobject.c:249 kobject_add_internal+0x273/0x320
> Jan  9 09:07:34 ray kernel: [1.400569] kobject_add_internal failed for 
> card0-DP-3 (error: -2 parent: card0)
> Jan  9 09:07:34 ray kernel: [1.400572] Modules linked in:
> Jan  9 09:07:34 ray kernel: [1.400577] CPU: 1 PID: 90 Comm: kworker/1:2 
> Not tainted 4.10.0-rc3-dirty #61
> Jan  9 09:07:34 ray kernel: [1.400579] Hardware name: LENOVO 
> 20F5S7V800/20F5S7V800, BIOS R02ET50W (1.23 ) 09/20/2016
> Jan  9 09:07:34 ray kernel: [1.400585] Workqueue: events_long 
> drm_dp_mst_link_probe_work
> Jan  9 09:07:34 ray kernel: [1.400588] Call Trace:
> Jan  9 09:07:34 ray kernel: [1.400593]  dump_stack+0x67/0x99
> Jan  9 09:07:34 ray kernel: [1.400598]  __warn+0xd1/0xf0
> Jan  9 09:07:34 ray kernel: [1.400601]  warn_slowpath_fmt+0x4f/0x60
> Jan  9 09:07:34 ray kernel: [1.400604]  kobject_add_internal+0x273/0x320
> Jan  9 09:07:34 ray kernel: [1.400607]  kobject_add+0x65/0xb0
> Jan  9 09:07:34 ray kernel: [1.400611]  ? klist_init+0x31/0x40
> Jan  9 09:07:34 ray kernel: [1.400615]  device_add+0x102/0x5d0
> Jan  9 09:07:34 ray kernel: [1.400619]  ? kfree_const+0x22/0x30
> Jan  9 09:07:34 ray kernel: [1.400623]  
> device_create_groups_vargs+0xd8/0x100
> Jan  9 09:07:34 ray kernel: [1.400626]  
> device_create_with_groups+0x36/0x40
> Jan  9 09:07:34 ray kernel: [1.400631]  ? 
> drm_fb_helper_add_one_connector+0x57/0xd0
> Jan  9 09:07:34 ray kernel: [1.400636]  ? 
> kmem_cache_alloc_trace+0x1d2/0x1f0
> Jan  9 09:07:34 ray kernel: [1.400641]  drm_sysfs_connector_add+0x60/0xe0
> Jan  9 09:07:34 ray kernel: [1.400645]  drm_connector_register+0x21/0xc0
> Jan  9 09:07:34 ray kernel: [1.400649]  
> intel_dp_register_mst_connector+0x41/0x50
> Jan  9 09:07:34 ray kernel: [1.400653]  drm_dp_add_port+0x350/0x450
> Jan  9 09:07:34 ray kernel: [1.400657]  ? rcu_early_boot_tests+0x1/0x10
> Jan  9 09:07:34 ray kernel: [1.400660]  ? schedule_timeout+0x1cd/0x390
> Jan  9 09:07:34 ray kernel: [1.400664]  ? __might_sleep+0x4a/0x90
> Jan  9 09:07:34 ray kernel: [1.400667]  ? mutex_lock+0x25/0x50
> Jan  9 09:07:34 ray kernel: [1.400670]  ? 
> 

Re: [Intel-gfx] 4.10-rc2 oops in DRM connector code

2017-01-09 Thread Dave Hansen
On 01/09/2017 08:59 AM, Daniel Vetter wrote:
> On Mon, Jan 9, 2017 at 5:50 PM, Dave Hansen  wrote:
>> On 01/09/2017 08:41 AM, Daniel Vetter wrote:
>>> On Mon, Jan 9, 2017 at 2:40 PM, Dave Hansen  wrote:
 Well, now I found where the -2 comes from.
 intel_dp_register_mst_connector() calls drm_connector_register(), which
 fails to add the kobject (warning below).  But, it does zero error
 checking on the drm_connector_register() call and leaves the
 partially-constructed connector in place.

 The next time some poor, hapless code goes and tries to do anything with
 that kdev, they oops.  I'm perplexed by this, though.  The
 drm_dp_mst_topology_cbs->register_connector just returns void.  It seems
 a bit goofy that it can't even _return_ failure.

 Is there some stable code to go back to here?  Or, is there something
 about my configuration that's unique?  I really wonder why nobody else
 is running into this.

 There's probably some other race going on here.  This warning doesn't
 happen on every boot.
>>> This smells more like the root-cause: Something goes wrong on boot
>>> that prevents connectors from properly registering, then we fall over
>>> later on. And the register callback is intentionally void, assuming
>>> that any prep work has been done earlier and that therefore the
>>> register step can't fail. Can you pls check whether the oops later on
>>> only happens together with this warning at boot, or whether they're
>>> not correlated?
>>
>> Looking through my logs, I can't find any instance of the oops without
>> the warning at boot.  So I do think the later oops is entirely caused by
>> the issue warned about in early boot.
> 
> Hm, I guess then we'd need to fix that boot-up warning. Can you try to
> figure out why it's unhappy? On a hunch it could be that we call
> drm_connector_register from the mst probe worker before the main
> driver load thread has reached the drm_dev_register call. A few printk
> to decide whether that's the case (plus a few boot-up tests to gather
> the statistics, sorry about that) would be real great.
> 
> If that's inconclusive I'm again a bit low on ideas ...

I'll do that shortly.  But, for now I can confirm that the failure is
precipitated by the !parent check in sysfs_create_dir_ns().

I also can't reproduce this if I build i915 as a module.  It only
happens when built in.

> Jan  9 09:07:34 ray kernel: [1.400547] sysfs_create_dir_ns()::53 error: -2
> Jan  9 09:07:34 ray kernel: [1.400554] create_dir()::75 error: -2
> Jan  9 09:07:34 ray kernel: [1.400558] [ cut here 
> ]
> Jan  9 09:07:34 ray kernel: [1.400565] WARNING: CPU: 1 PID: 90 at 
> lib/kobject.c:249 kobject_add_internal+0x273/0x320
> Jan  9 09:07:34 ray kernel: [1.400569] kobject_add_internal failed for 
> card0-DP-3 (error: -2 parent: card0)
> Jan  9 09:07:34 ray kernel: [1.400572] Modules linked in:
> Jan  9 09:07:34 ray kernel: [1.400577] CPU: 1 PID: 90 Comm: kworker/1:2 
> Not tainted 4.10.0-rc3-dirty #61
> Jan  9 09:07:34 ray kernel: [1.400579] Hardware name: LENOVO 
> 20F5S7V800/20F5S7V800, BIOS R02ET50W (1.23 ) 09/20/2016
> Jan  9 09:07:34 ray kernel: [1.400585] Workqueue: events_long 
> drm_dp_mst_link_probe_work
> Jan  9 09:07:34 ray kernel: [1.400588] Call Trace:
> Jan  9 09:07:34 ray kernel: [1.400593]  dump_stack+0x67/0x99
> Jan  9 09:07:34 ray kernel: [1.400598]  __warn+0xd1/0xf0
> Jan  9 09:07:34 ray kernel: [1.400601]  warn_slowpath_fmt+0x4f/0x60
> Jan  9 09:07:34 ray kernel: [1.400604]  kobject_add_internal+0x273/0x320
> Jan  9 09:07:34 ray kernel: [1.400607]  kobject_add+0x65/0xb0
> Jan  9 09:07:34 ray kernel: [1.400611]  ? klist_init+0x31/0x40
> Jan  9 09:07:34 ray kernel: [1.400615]  device_add+0x102/0x5d0
> Jan  9 09:07:34 ray kernel: [1.400619]  ? kfree_const+0x22/0x30
> Jan  9 09:07:34 ray kernel: [1.400623]  
> device_create_groups_vargs+0xd8/0x100
> Jan  9 09:07:34 ray kernel: [1.400626]  
> device_create_with_groups+0x36/0x40
> Jan  9 09:07:34 ray kernel: [1.400631]  ? 
> drm_fb_helper_add_one_connector+0x57/0xd0
> Jan  9 09:07:34 ray kernel: [1.400636]  ? 
> kmem_cache_alloc_trace+0x1d2/0x1f0
> Jan  9 09:07:34 ray kernel: [1.400641]  drm_sysfs_connector_add+0x60/0xe0
> Jan  9 09:07:34 ray kernel: [1.400645]  drm_connector_register+0x21/0xc0
> Jan  9 09:07:34 ray kernel: [1.400649]  
> intel_dp_register_mst_connector+0x41/0x50
> Jan  9 09:07:34 ray kernel: [1.400653]  drm_dp_add_port+0x350/0x450
> Jan  9 09:07:34 ray kernel: [1.400657]  ? rcu_early_boot_tests+0x1/0x10
> Jan  9 09:07:34 ray kernel: [1.400660]  ? schedule_timeout+0x1cd/0x390
> Jan  9 09:07:34 ray kernel: [1.400664]  ? __might_sleep+0x4a/0x90
> Jan  9 09:07:34 ray kernel: [1.400667]  ? mutex_lock+0x25/0x50
> Jan  9 09:07:34 ray kernel: [1.400670]  ? 
> drm_dp_mst_wait_tx_reply+0x118/0x1e0
> Jan  9 09:07:34 

Re: [Intel-gfx] 4.10-rc2 oops in DRM connector code

2017-01-09 Thread Daniel Vetter
On Mon, Jan 9, 2017 at 5:50 PM, Dave Hansen  wrote:
> On 01/09/2017 08:41 AM, Daniel Vetter wrote:
>> On Mon, Jan 9, 2017 at 2:40 PM, Dave Hansen  wrote:
>>> Well, now I found where the -2 comes from.
>>> intel_dp_register_mst_connector() calls drm_connector_register(), which
>>> fails to add the kobject (warning below).  But, it does zero error
>>> checking on the drm_connector_register() call and leaves the
>>> partially-constructed connector in place.
>>>
>>> The next time some poor, hapless code goes and tries to do anything with
>>> that kdev, they oops.  I'm perplexed by this, though.  The
>>> drm_dp_mst_topology_cbs->register_connector just returns void.  It seems
>>> a bit goofy that it can't even _return_ failure.
>>>
>>> Is there some stable code to go back to here?  Or, is there something
>>> about my configuration that's unique?  I really wonder why nobody else
>>> is running into this.
>>>
>>> There's probably some other race going on here.  This warning doesn't
>>> happen on every boot.
>> This smells more like the root-cause: Something goes wrong on boot
>> that prevents connectors from properly registering, then we fall over
>> later on. And the register callback is intentionally void, assuming
>> that any prep work has been done earlier and that therefore the
>> register step can't fail. Can you pls check whether the oops later on
>> only happens together with this warning at boot, or whether they're
>> not correlated?
>
> Looking through my logs, I can't find any instance of the oops without
> the warning at boot.  So I do think the later oops is entirely caused by
> the issue warned about in early boot.

Hm, I guess then we'd need to fix that boot-up warning. Can you try to
figure out why it's unhappy? On a hunch it could be that we call
drm_connector_register from the mst probe worker before the main
driver load thread has reached the drm_dev_register call. A few printk
to decide whether that's the case (plus a few boot-up tests to gather
the statistics, sorry about that) would be real great.

If that's inconclusive I'm again a bit low on ideas ...

> My distro kernel (4.4.0-57-generic) is also unstable, but I haven't
> managed to capture a good oops there.  It's hitting this, which I assume
> is unrelated:
>
> WARNING: CPU: 0 PID: 41 at /build/linux-lts-xenial-FdAdUy/linux-
> lts-xenial-4.4.0/ubuntu/i915/intel_pm.c:3675
> skl_update_other_pipe_wm+0x191/0x1a0 [i915_bpo]()

wm programming issues, which will kill your box. Needs a newer kernel
to fix (both the wm programming issues, and that wm programming issues
lead to system death).
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch


Re: [Intel-gfx] 4.10-rc2 oops in DRM connector code

2017-01-09 Thread Daniel Vetter
On Mon, Jan 9, 2017 at 5:50 PM, Dave Hansen  wrote:
> On 01/09/2017 08:41 AM, Daniel Vetter wrote:
>> On Mon, Jan 9, 2017 at 2:40 PM, Dave Hansen  wrote:
>>> Well, now I found where the -2 comes from.
>>> intel_dp_register_mst_connector() calls drm_connector_register(), which
>>> fails to add the kobject (warning below).  But, it does zero error
>>> checking on the drm_connector_register() call and leaves the
>>> partially-constructed connector in place.
>>>
>>> The next time some poor, hapless code goes and tries to do anything with
>>> that kdev, they oops.  I'm perplexed by this, though.  The
>>> drm_dp_mst_topology_cbs->register_connector just returns void.  It seems
>>> a bit goofy that it can't even _return_ failure.
>>>
>>> Is there some stable code to go back to here?  Or, is there something
>>> about my configuration that's unique?  I really wonder why nobody else
>>> is running into this.
>>>
>>> There's probably some other race going on here.  This warning doesn't
>>> happen on every boot.
>> This smells more like the root-cause: Something goes wrong on boot
>> that prevents connectors from properly registering, then we fall over
>> later on. And the register callback is intentionally void, assuming
>> that any prep work has been done earlier and that therefore the
>> register step can't fail. Can you pls check whether the oops later on
>> only happens together with this warning at boot, or whether they're
>> not correlated?
>
> Looking through my logs, I can't find any instance of the oops without
> the warning at boot.  So I do think the later oops is entirely caused by
> the issue warned about in early boot.

Hm, I guess then we'd need to fix that boot-up warning. Can you try to
figure out why it's unhappy? On a hunch it could be that we call
drm_connector_register from the mst probe worker before the main
driver load thread has reached the drm_dev_register call. A few printk
to decide whether that's the case (plus a few boot-up tests to gather
the statistics, sorry about that) would be real great.

If that's inconclusive I'm again a bit low on ideas ...

> My distro kernel (4.4.0-57-generic) is also unstable, but I haven't
> managed to capture a good oops there.  It's hitting this, which I assume
> is unrelated:
>
> WARNING: CPU: 0 PID: 41 at /build/linux-lts-xenial-FdAdUy/linux-
> lts-xenial-4.4.0/ubuntu/i915/intel_pm.c:3675
> skl_update_other_pipe_wm+0x191/0x1a0 [i915_bpo]()

wm programming issues, which will kill your box. Needs a newer kernel
to fix (both the wm programming issues, and that wm programming issues
lead to system death).
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch


Re: [Intel-gfx] 4.10-rc2 oops in DRM connector code

2017-01-09 Thread Dave Hansen
On 01/09/2017 08:41 AM, Daniel Vetter wrote:
> On Mon, Jan 9, 2017 at 2:40 PM, Dave Hansen  wrote:
>> Well, now I found where the -2 comes from.
>> intel_dp_register_mst_connector() calls drm_connector_register(), which
>> fails to add the kobject (warning below).  But, it does zero error
>> checking on the drm_connector_register() call and leaves the
>> partially-constructed connector in place.
>>
>> The next time some poor, hapless code goes and tries to do anything with
>> that kdev, they oops.  I'm perplexed by this, though.  The
>> drm_dp_mst_topology_cbs->register_connector just returns void.  It seems
>> a bit goofy that it can't even _return_ failure.
>>
>> Is there some stable code to go back to here?  Or, is there something
>> about my configuration that's unique?  I really wonder why nobody else
>> is running into this.
>>
>> There's probably some other race going on here.  This warning doesn't
>> happen on every boot.
> This smells more like the root-cause: Something goes wrong on boot
> that prevents connectors from properly registering, then we fall over
> later on. And the register callback is intentionally void, assuming
> that any prep work has been done earlier and that therefore the
> register step can't fail. Can you pls check whether the oops later on
> only happens together with this warning at boot, or whether they're
> not correlated?

Looking through my logs, I can't find any instance of the oops without
the warning at boot.  So I do think the later oops is entirely caused by
the issue warned about in early boot.

My distro kernel (4.4.0-57-generic) is also unstable, but I haven't
managed to capture a good oops there.  It's hitting this, which I assume
is unrelated:

WARNING: CPU: 0 PID: 41 at /build/linux-lts-xenial-FdAdUy/linux-
lts-xenial-4.4.0/ubuntu/i915/intel_pm.c:3675
skl_update_other_pipe_wm+0x191/0x1a0 [i915_bpo]()



Re: [Intel-gfx] 4.10-rc2 oops in DRM connector code

2017-01-09 Thread Dave Hansen
On 01/09/2017 08:41 AM, Daniel Vetter wrote:
> On Mon, Jan 9, 2017 at 2:40 PM, Dave Hansen  wrote:
>> Well, now I found where the -2 comes from.
>> intel_dp_register_mst_connector() calls drm_connector_register(), which
>> fails to add the kobject (warning below).  But, it does zero error
>> checking on the drm_connector_register() call and leaves the
>> partially-constructed connector in place.
>>
>> The next time some poor, hapless code goes and tries to do anything with
>> that kdev, they oops.  I'm perplexed by this, though.  The
>> drm_dp_mst_topology_cbs->register_connector just returns void.  It seems
>> a bit goofy that it can't even _return_ failure.
>>
>> Is there some stable code to go back to here?  Or, is there something
>> about my configuration that's unique?  I really wonder why nobody else
>> is running into this.
>>
>> There's probably some other race going on here.  This warning doesn't
>> happen on every boot.
> This smells more like the root-cause: Something goes wrong on boot
> that prevents connectors from properly registering, then we fall over
> later on. And the register callback is intentionally void, assuming
> that any prep work has been done earlier and that therefore the
> register step can't fail. Can you pls check whether the oops later on
> only happens together with this warning at boot, or whether they're
> not correlated?

Looking through my logs, I can't find any instance of the oops without
the warning at boot.  So I do think the later oops is entirely caused by
the issue warned about in early boot.

My distro kernel (4.4.0-57-generic) is also unstable, but I haven't
managed to capture a good oops there.  It's hitting this, which I assume
is unrelated:

WARNING: CPU: 0 PID: 41 at /build/linux-lts-xenial-FdAdUy/linux-
lts-xenial-4.4.0/ubuntu/i915/intel_pm.c:3675
skl_update_other_pipe_wm+0x191/0x1a0 [i915_bpo]()



Re: [Intel-gfx] 4.10-rc2 oops in DRM connector code

2017-01-09 Thread Daniel Vetter
On Mon, Jan 9, 2017 at 2:40 PM, Dave Hansen  wrote:
> Well, now I found where the -2 comes from.
> intel_dp_register_mst_connector() calls drm_connector_register(), which
> fails to add the kobject (warning below).  But, it does zero error
> checking on the drm_connector_register() call and leaves the
> partially-constructed connector in place.
>
> The next time some poor, hapless code goes and tries to do anything with
> that kdev, they oops.  I'm perplexed by this, though.  The
> drm_dp_mst_topology_cbs->register_connector just returns void.  It seems
> a bit goofy that it can't even _return_ failure.
>
> Is there some stable code to go back to here?  Or, is there something
> about my configuration that's unique?  I really wonder why nobody else
> is running into this.
>
> There's probably some other race going on here.  This warning doesn't
> happen on every boot.

This smells more like the root-cause: Something goes wrong on boot
that prevents connectors from properly registering, then we fall over
later on. And the register callback is intentionally void, assuming
that any prep work has been done earlier and that therefore the
register step can't fail. Can you pls check whether the oops later on
only happens together with this warning at boot, or whether they're
not correlated?

Still no idea though why this blows up for you.
-Daniel

>
>> [1.396475] [ cut here ]
>> [1.396486] WARNING: CPU: 0 PID: 91 at lib/kobject.c:244 
>> kobject_add_internal+0x108/0x2d0
>> [1.396492] kobject_add_internal failed for card0-DP-3 (error: -2 parent: 
>> card0)
>> [1.396496] Modules linked in:
>> [1.396500] CPU: 0 PID: 91 Comm: kworker/0:2 Not tainted 4.10.0-rc2-dirty 
>> #58
>> [1.396503] Hardware name: LENOVO 20F5S7V800/20F5S7V800, BIOS R02ET50W 
>> (1.23 ) 09/20/2016
>> [1.396509] Workqueue: events_long drm_dp_mst_link_probe_work
>> [1.396511] Call Trace:
>> [1.396517]  dump_stack+0x67/0x99
>> [1.396521]  __warn+0xd1/0xf0
>> [1.396524]  warn_slowpath_fmt+0x4f/0x60
>> [1.396527]  kobject_add_internal+0x108/0x2d0
>> [1.396531]  ? vsnprintf+0x112/0x4b0
>> [1.396533]  kobject_add+0x65/0xb0
>> [1.396537]  ? klist_init+0x31/0x40
>> [1.396542]  device_add+0x102/0x5d0
>> [1.396546]  ? kfree_const+0x22/0x30
>> [1.396549]  device_create_groups_vargs+0xd8/0x100
>> [1.396553]  device_create_with_groups+0x36/0x40
>> [1.396557]  ? drm_fb_helper_add_one_connector+0x57/0xd0
>> [1.396561]  ? kmem_cache_alloc_trace+0x1d2/0x1f0
>> [1.396566]  drm_sysfs_connector_add+0x60/0xe0
>> [1.396570]  drm_connector_register+0x21/0xc0
>> [1.396575]  intel_dp_register_mst_connector+0x41/0x50
>> [1.396579]  drm_dp_add_port+0x350/0x450
>> [1.396583]  ? rcu_early_boot_tests+0x1/0x10
>> [1.396587]  ? schedule_timeout+0x1cd/0x390
>> [1.396591]  ? __might_sleep+0x4a/0x90
>> [1.396594]  ? mutex_lock+0x25/0x50
>> [1.396597]  ? drm_dp_mst_wait_tx_reply+0x118/0x1e0
>> [1.396600]  ? prepare_to_wait_event+0x120/0x120
>> [1.396603]  ? drm_dp_check_mstb_guid+0x3d/0x120
>> [1.396606] drm_sysfs_connector_add() connector: 88040ed3f000 kdev: 
>> 88040c55dc00
>> [1.396610]  drm_dp_send_link_address+0x185/0x1f0
>> [1.396614]  drm_dp_check_and_send_link_address+0xad/0xc0
>> [1.396617]  drm_dp_mst_link_probe_work+0x57/0xa0
>> [1.396620]  process_one_work+0x14b/0x430
>> [1.396623]  worker_thread+0x12b/0x4a0
>> [1.396626]  kthread+0x10c/0x140
>> [1.396629]  ? process_one_work+0x430/0x430
>> [1.396632]  ? kthread_create_on_node+0x40/0x40
>> [1.396635]  ret_from_fork+0x27/0x40
>> [1.396639] ---[ end trace dd05bb2cd9ebd900 ]---
>> [1.396643] drm_sysfs_connector_add() connector: 88040c75a000 kdev: 
>> fffe
>> [1.396648] [drm:drm_sysfs_connector_add] *ERROR* failed to register 
>> connector device: -2
>> [1.396651] drm_connector_register(88040ed3f000) 
>> connector->registered: 1
>> [1.39] drm_sysfs_connector_add() connector: 88040ed3f800 kdev: 
>> 88040c55e400
>> [1.396700] drm_connector_register(88040ed3f800) 
>> connector->registered: 1
>> [1.396717] drm_sysfs_connector_add() connector: 88040ed39800 kdev: 
>> 88040c55e800
>> [1.396726] drm_connector_register(88040ed39800) 
>> connector->registered: 1
>> [1.396743] drm_sysfs_connector_add() connector: 88040c758000 kdev: 
>> 88040c55ec00
>> [1.396769] drm_connector_register(88040c758000) 
>> connector->registered: 1
>> [1.396786] drm_sysfs_connector_add() connector: 88040c758800 kdev: 
>> 88040c55f000
>> [1.396799] drm_connector_register(88040c758800) 
>> connector->registered: 1
>> [1.396804] drm_connector_register(88040c75a000) 
>> connector->registered: 1
>> [1.398665] ACPI: Video Device [GFX0] (multi-head: yes  rom: no  post: no)
>> [1.398893] drm_sysfs_connector_add() 

Re: [Intel-gfx] 4.10-rc2 oops in DRM connector code

2017-01-09 Thread Daniel Vetter
On Mon, Jan 9, 2017 at 2:40 PM, Dave Hansen  wrote:
> Well, now I found where the -2 comes from.
> intel_dp_register_mst_connector() calls drm_connector_register(), which
> fails to add the kobject (warning below).  But, it does zero error
> checking on the drm_connector_register() call and leaves the
> partially-constructed connector in place.
>
> The next time some poor, hapless code goes and tries to do anything with
> that kdev, they oops.  I'm perplexed by this, though.  The
> drm_dp_mst_topology_cbs->register_connector just returns void.  It seems
> a bit goofy that it can't even _return_ failure.
>
> Is there some stable code to go back to here?  Or, is there something
> about my configuration that's unique?  I really wonder why nobody else
> is running into this.
>
> There's probably some other race going on here.  This warning doesn't
> happen on every boot.

This smells more like the root-cause: Something goes wrong on boot
that prevents connectors from properly registering, then we fall over
later on. And the register callback is intentionally void, assuming
that any prep work has been done earlier and that therefore the
register step can't fail. Can you pls check whether the oops later on
only happens together with this warning at boot, or whether they're
not correlated?

Still no idea though why this blows up for you.
-Daniel

>
>> [1.396475] [ cut here ]
>> [1.396486] WARNING: CPU: 0 PID: 91 at lib/kobject.c:244 
>> kobject_add_internal+0x108/0x2d0
>> [1.396492] kobject_add_internal failed for card0-DP-3 (error: -2 parent: 
>> card0)
>> [1.396496] Modules linked in:
>> [1.396500] CPU: 0 PID: 91 Comm: kworker/0:2 Not tainted 4.10.0-rc2-dirty 
>> #58
>> [1.396503] Hardware name: LENOVO 20F5S7V800/20F5S7V800, BIOS R02ET50W 
>> (1.23 ) 09/20/2016
>> [1.396509] Workqueue: events_long drm_dp_mst_link_probe_work
>> [1.396511] Call Trace:
>> [1.396517]  dump_stack+0x67/0x99
>> [1.396521]  __warn+0xd1/0xf0
>> [1.396524]  warn_slowpath_fmt+0x4f/0x60
>> [1.396527]  kobject_add_internal+0x108/0x2d0
>> [1.396531]  ? vsnprintf+0x112/0x4b0
>> [1.396533]  kobject_add+0x65/0xb0
>> [1.396537]  ? klist_init+0x31/0x40
>> [1.396542]  device_add+0x102/0x5d0
>> [1.396546]  ? kfree_const+0x22/0x30
>> [1.396549]  device_create_groups_vargs+0xd8/0x100
>> [1.396553]  device_create_with_groups+0x36/0x40
>> [1.396557]  ? drm_fb_helper_add_one_connector+0x57/0xd0
>> [1.396561]  ? kmem_cache_alloc_trace+0x1d2/0x1f0
>> [1.396566]  drm_sysfs_connector_add+0x60/0xe0
>> [1.396570]  drm_connector_register+0x21/0xc0
>> [1.396575]  intel_dp_register_mst_connector+0x41/0x50
>> [1.396579]  drm_dp_add_port+0x350/0x450
>> [1.396583]  ? rcu_early_boot_tests+0x1/0x10
>> [1.396587]  ? schedule_timeout+0x1cd/0x390
>> [1.396591]  ? __might_sleep+0x4a/0x90
>> [1.396594]  ? mutex_lock+0x25/0x50
>> [1.396597]  ? drm_dp_mst_wait_tx_reply+0x118/0x1e0
>> [1.396600]  ? prepare_to_wait_event+0x120/0x120
>> [1.396603]  ? drm_dp_check_mstb_guid+0x3d/0x120
>> [1.396606] drm_sysfs_connector_add() connector: 88040ed3f000 kdev: 
>> 88040c55dc00
>> [1.396610]  drm_dp_send_link_address+0x185/0x1f0
>> [1.396614]  drm_dp_check_and_send_link_address+0xad/0xc0
>> [1.396617]  drm_dp_mst_link_probe_work+0x57/0xa0
>> [1.396620]  process_one_work+0x14b/0x430
>> [1.396623]  worker_thread+0x12b/0x4a0
>> [1.396626]  kthread+0x10c/0x140
>> [1.396629]  ? process_one_work+0x430/0x430
>> [1.396632]  ? kthread_create_on_node+0x40/0x40
>> [1.396635]  ret_from_fork+0x27/0x40
>> [1.396639] ---[ end trace dd05bb2cd9ebd900 ]---
>> [1.396643] drm_sysfs_connector_add() connector: 88040c75a000 kdev: 
>> fffe
>> [1.396648] [drm:drm_sysfs_connector_add] *ERROR* failed to register 
>> connector device: -2
>> [1.396651] drm_connector_register(88040ed3f000) 
>> connector->registered: 1
>> [1.39] drm_sysfs_connector_add() connector: 88040ed3f800 kdev: 
>> 88040c55e400
>> [1.396700] drm_connector_register(88040ed3f800) 
>> connector->registered: 1
>> [1.396717] drm_sysfs_connector_add() connector: 88040ed39800 kdev: 
>> 88040c55e800
>> [1.396726] drm_connector_register(88040ed39800) 
>> connector->registered: 1
>> [1.396743] drm_sysfs_connector_add() connector: 88040c758000 kdev: 
>> 88040c55ec00
>> [1.396769] drm_connector_register(88040c758000) 
>> connector->registered: 1
>> [1.396786] drm_sysfs_connector_add() connector: 88040c758800 kdev: 
>> 88040c55f000
>> [1.396799] drm_connector_register(88040c758800) 
>> connector->registered: 1
>> [1.396804] drm_connector_register(88040c75a000) 
>> connector->registered: 1
>> [1.398665] ACPI: Video Device [GFX0] (multi-head: yes  rom: no  post: no)
>> [1.398893] drm_sysfs_connector_add() connector: 

Re: [Intel-gfx] 4.10-rc2 oops in DRM connector code

2017-01-09 Thread Dave Hansen
On 01/09/2017 05:40 AM, Dave Hansen wrote:
> Is there some stable code to go back to here?  Or, is there something
> about my configuration that's unique?  I really wonder why nobody else
> is running into this.

Here are a couple of similar-looking reports, if that helps:

https://bugzilla.redhat.com/attachment.cgi?id=1209947
https://bugzilla.redhat.com/show_bug.cgi?id=1382342
http://marc.info/?l=openbsd-bugs=147487082821348=2


Re: [Intel-gfx] 4.10-rc2 oops in DRM connector code

2017-01-09 Thread Dave Hansen
On 01/09/2017 05:40 AM, Dave Hansen wrote:
> Is there some stable code to go back to here?  Or, is there something
> about my configuration that's unique?  I really wonder why nobody else
> is running into this.

Here are a couple of similar-looking reports, if that helps:

https://bugzilla.redhat.com/attachment.cgi?id=1209947
https://bugzilla.redhat.com/show_bug.cgi?id=1382342
http://marc.info/?l=openbsd-bugs=147487082821348=2


Re: [Intel-gfx] 4.10-rc2 oops in DRM connector code

2017-01-09 Thread Dave Hansen
Well, now I found where the -2 comes from.
intel_dp_register_mst_connector() calls drm_connector_register(), which
fails to add the kobject (warning below).  But, it does zero error
checking on the drm_connector_register() call and leaves the
partially-constructed connector in place.

The next time some poor, hapless code goes and tries to do anything with
that kdev, they oops.  I'm perplexed by this, though.  The
drm_dp_mst_topology_cbs->register_connector just returns void.  It seems
a bit goofy that it can't even _return_ failure.

Is there some stable code to go back to here?  Or, is there something
about my configuration that's unique?  I really wonder why nobody else
is running into this.

There's probably some other race going on here.  This warning doesn't
happen on every boot.

> [1.396475] [ cut here ]
> [1.396486] WARNING: CPU: 0 PID: 91 at lib/kobject.c:244 
> kobject_add_internal+0x108/0x2d0
> [1.396492] kobject_add_internal failed for card0-DP-3 (error: -2 parent: 
> card0)
> [1.396496] Modules linked in:
> [1.396500] CPU: 0 PID: 91 Comm: kworker/0:2 Not tainted 4.10.0-rc2-dirty 
> #58
> [1.396503] Hardware name: LENOVO 20F5S7V800/20F5S7V800, BIOS R02ET50W 
> (1.23 ) 09/20/2016
> [1.396509] Workqueue: events_long drm_dp_mst_link_probe_work
> [1.396511] Call Trace:
> [1.396517]  dump_stack+0x67/0x99
> [1.396521]  __warn+0xd1/0xf0
> [1.396524]  warn_slowpath_fmt+0x4f/0x60
> [1.396527]  kobject_add_internal+0x108/0x2d0
> [1.396531]  ? vsnprintf+0x112/0x4b0
> [1.396533]  kobject_add+0x65/0xb0
> [1.396537]  ? klist_init+0x31/0x40
> [1.396542]  device_add+0x102/0x5d0
> [1.396546]  ? kfree_const+0x22/0x30
> [1.396549]  device_create_groups_vargs+0xd8/0x100
> [1.396553]  device_create_with_groups+0x36/0x40
> [1.396557]  ? drm_fb_helper_add_one_connector+0x57/0xd0
> [1.396561]  ? kmem_cache_alloc_trace+0x1d2/0x1f0
> [1.396566]  drm_sysfs_connector_add+0x60/0xe0
> [1.396570]  drm_connector_register+0x21/0xc0
> [1.396575]  intel_dp_register_mst_connector+0x41/0x50
> [1.396579]  drm_dp_add_port+0x350/0x450
> [1.396583]  ? rcu_early_boot_tests+0x1/0x10
> [1.396587]  ? schedule_timeout+0x1cd/0x390
> [1.396591]  ? __might_sleep+0x4a/0x90
> [1.396594]  ? mutex_lock+0x25/0x50
> [1.396597]  ? drm_dp_mst_wait_tx_reply+0x118/0x1e0
> [1.396600]  ? prepare_to_wait_event+0x120/0x120
> [1.396603]  ? drm_dp_check_mstb_guid+0x3d/0x120
> [1.396606] drm_sysfs_connector_add() connector: 88040ed3f000 kdev: 
> 88040c55dc00
> [1.396610]  drm_dp_send_link_address+0x185/0x1f0
> [1.396614]  drm_dp_check_and_send_link_address+0xad/0xc0
> [1.396617]  drm_dp_mst_link_probe_work+0x57/0xa0
> [1.396620]  process_one_work+0x14b/0x430
> [1.396623]  worker_thread+0x12b/0x4a0
> [1.396626]  kthread+0x10c/0x140
> [1.396629]  ? process_one_work+0x430/0x430
> [1.396632]  ? kthread_create_on_node+0x40/0x40
> [1.396635]  ret_from_fork+0x27/0x40
> [1.396639] ---[ end trace dd05bb2cd9ebd900 ]---
> [1.396643] drm_sysfs_connector_add() connector: 88040c75a000 kdev: 
> fffe
> [1.396648] [drm:drm_sysfs_connector_add] *ERROR* failed to register 
> connector device: -2
> [1.396651] drm_connector_register(88040ed3f000) 
> connector->registered: 1
> [1.39] drm_sysfs_connector_add() connector: 88040ed3f800 kdev: 
> 88040c55e400
> [1.396700] drm_connector_register(88040ed3f800) 
> connector->registered: 1
> [1.396717] drm_sysfs_connector_add() connector: 88040ed39800 kdev: 
> 88040c55e800
> [1.396726] drm_connector_register(88040ed39800) 
> connector->registered: 1
> [1.396743] drm_sysfs_connector_add() connector: 88040c758000 kdev: 
> 88040c55ec00
> [1.396769] drm_connector_register(88040c758000) 
> connector->registered: 1
> [1.396786] drm_sysfs_connector_add() connector: 88040c758800 kdev: 
> 88040c55f000
> [1.396799] drm_connector_register(88040c758800) 
> connector->registered: 1
> [1.396804] drm_connector_register(88040c75a000) 
> connector->registered: 1
> [1.398665] ACPI: Video Device [GFX0] (multi-head: yes  rom: no  post: no)
> [1.398893] drm_sysfs_connector_add() connector: 88040c75b000 kdev: 
> 88040c790400
> [1.398907] drm_connector_register(88040c75b000) 
> connector->registered: 1
> [1.398930] drm_sysfs_connector_add() connector: 88040c75c000 kdev: 
> 88040c790800
> [1.398939] drm_connector_register(88040c75c000) 
> connector->registered: 1


Re: [Intel-gfx] 4.10-rc2 oops in DRM connector code

2017-01-09 Thread Dave Hansen
Well, now I found where the -2 comes from.
intel_dp_register_mst_connector() calls drm_connector_register(), which
fails to add the kobject (warning below).  But, it does zero error
checking on the drm_connector_register() call and leaves the
partially-constructed connector in place.

The next time some poor, hapless code goes and tries to do anything with
that kdev, they oops.  I'm perplexed by this, though.  The
drm_dp_mst_topology_cbs->register_connector just returns void.  It seems
a bit goofy that it can't even _return_ failure.

Is there some stable code to go back to here?  Or, is there something
about my configuration that's unique?  I really wonder why nobody else
is running into this.

There's probably some other race going on here.  This warning doesn't
happen on every boot.

> [1.396475] [ cut here ]
> [1.396486] WARNING: CPU: 0 PID: 91 at lib/kobject.c:244 
> kobject_add_internal+0x108/0x2d0
> [1.396492] kobject_add_internal failed for card0-DP-3 (error: -2 parent: 
> card0)
> [1.396496] Modules linked in:
> [1.396500] CPU: 0 PID: 91 Comm: kworker/0:2 Not tainted 4.10.0-rc2-dirty 
> #58
> [1.396503] Hardware name: LENOVO 20F5S7V800/20F5S7V800, BIOS R02ET50W 
> (1.23 ) 09/20/2016
> [1.396509] Workqueue: events_long drm_dp_mst_link_probe_work
> [1.396511] Call Trace:
> [1.396517]  dump_stack+0x67/0x99
> [1.396521]  __warn+0xd1/0xf0
> [1.396524]  warn_slowpath_fmt+0x4f/0x60
> [1.396527]  kobject_add_internal+0x108/0x2d0
> [1.396531]  ? vsnprintf+0x112/0x4b0
> [1.396533]  kobject_add+0x65/0xb0
> [1.396537]  ? klist_init+0x31/0x40
> [1.396542]  device_add+0x102/0x5d0
> [1.396546]  ? kfree_const+0x22/0x30
> [1.396549]  device_create_groups_vargs+0xd8/0x100
> [1.396553]  device_create_with_groups+0x36/0x40
> [1.396557]  ? drm_fb_helper_add_one_connector+0x57/0xd0
> [1.396561]  ? kmem_cache_alloc_trace+0x1d2/0x1f0
> [1.396566]  drm_sysfs_connector_add+0x60/0xe0
> [1.396570]  drm_connector_register+0x21/0xc0
> [1.396575]  intel_dp_register_mst_connector+0x41/0x50
> [1.396579]  drm_dp_add_port+0x350/0x450
> [1.396583]  ? rcu_early_boot_tests+0x1/0x10
> [1.396587]  ? schedule_timeout+0x1cd/0x390
> [1.396591]  ? __might_sleep+0x4a/0x90
> [1.396594]  ? mutex_lock+0x25/0x50
> [1.396597]  ? drm_dp_mst_wait_tx_reply+0x118/0x1e0
> [1.396600]  ? prepare_to_wait_event+0x120/0x120
> [1.396603]  ? drm_dp_check_mstb_guid+0x3d/0x120
> [1.396606] drm_sysfs_connector_add() connector: 88040ed3f000 kdev: 
> 88040c55dc00
> [1.396610]  drm_dp_send_link_address+0x185/0x1f0
> [1.396614]  drm_dp_check_and_send_link_address+0xad/0xc0
> [1.396617]  drm_dp_mst_link_probe_work+0x57/0xa0
> [1.396620]  process_one_work+0x14b/0x430
> [1.396623]  worker_thread+0x12b/0x4a0
> [1.396626]  kthread+0x10c/0x140
> [1.396629]  ? process_one_work+0x430/0x430
> [1.396632]  ? kthread_create_on_node+0x40/0x40
> [1.396635]  ret_from_fork+0x27/0x40
> [1.396639] ---[ end trace dd05bb2cd9ebd900 ]---
> [1.396643] drm_sysfs_connector_add() connector: 88040c75a000 kdev: 
> fffe
> [1.396648] [drm:drm_sysfs_connector_add] *ERROR* failed to register 
> connector device: -2
> [1.396651] drm_connector_register(88040ed3f000) 
> connector->registered: 1
> [1.39] drm_sysfs_connector_add() connector: 88040ed3f800 kdev: 
> 88040c55e400
> [1.396700] drm_connector_register(88040ed3f800) 
> connector->registered: 1
> [1.396717] drm_sysfs_connector_add() connector: 88040ed39800 kdev: 
> 88040c55e800
> [1.396726] drm_connector_register(88040ed39800) 
> connector->registered: 1
> [1.396743] drm_sysfs_connector_add() connector: 88040c758000 kdev: 
> 88040c55ec00
> [1.396769] drm_connector_register(88040c758000) 
> connector->registered: 1
> [1.396786] drm_sysfs_connector_add() connector: 88040c758800 kdev: 
> 88040c55f000
> [1.396799] drm_connector_register(88040c758800) 
> connector->registered: 1
> [1.396804] drm_connector_register(88040c75a000) 
> connector->registered: 1
> [1.398665] ACPI: Video Device [GFX0] (multi-head: yes  rom: no  post: no)
> [1.398893] drm_sysfs_connector_add() connector: 88040c75b000 kdev: 
> 88040c790400
> [1.398907] drm_connector_register(88040c75b000) 
> connector->registered: 1
> [1.398930] drm_sysfs_connector_add() connector: 88040c75c000 kdev: 
> 88040c790800
> [1.398939] drm_connector_register(88040c75c000) 
> connector->registered: 1


Re: [Intel-gfx] 4.10-rc2 oops in DRM connector code

2017-01-09 Thread Dave Hansen
On 01/09/2017 02:15 AM, Daniel Vetter wrote:
...
> Can you pls do some printk tracing to make sure that without your patch
> we're indeed releasing the same connector twice from this loop? I suspect
> you're just ever-so-slightly shifting the timing and things blow up
> somewhre else. But no idea where :(

Your suspicious appear correct.  I still get an oops even with my patch
applied.  I did add some printk's but they're weird.  I'm seeing a
'kdev' that looks to be -2, although it should never get set to that
unless device_create_with_groups() and th error checking failed somehow.

I'll take out the locking portion of the patch and add some more
printk's and see if anything else shows up.  Here's a copy of the
printk's and associated oops, with my locking modification patch
applied, if this helps at all:

> [ 7927.049763] drm_dp_destroy_port() kfree(88040c759000)
> [ 7927.050039] drm_dp_destroy_connector_work() port: 88040c759800 
> connector: 88040c75a000
> [ 7927.050061] drm_sysfs_connector_remove() connector: 88040c75a000 kdev: 
> fffe
> [ 7927.050106] BUG: unable to handle kernel NULL pointer dereference at 
> 009e
> [ 7927.050242] IP: device_del+0x19/0x330
> [ 7927.050300] PGD 0 
> [ 7927.050302] 
> [ 7927.050364] Oops:  [#1] SMP
> [ 7927.050414] Modules linked in: netconsole ctr ccm ipt_MASQUERADE 
> nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 
> nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 
> xt_CHECKSUM iptable_mangle xt_tcpudp bridge stp llc iptable_filter ip_tables 
> ebtable_nat ebtables x_tables cmac rfcomm bnep dm_crypt arc4 iwlmvm mac80211 
> snd_hda_codec_hdmi iwlwifi snd_hda_codec_realtek snd_hda_codec_generic 
> snd_hda_intel snd_seq_midi snd_seq_midi_event snd_hda_codec cfg80211 
> snd_hwdep snd_rawmidi intel_rapl snd_hda_core iosf_mbi snd_pcm snd_seq btusb 
> x86_pkg_temp_thermal hid_logitech_hidpp btrtl btbcm btintel coretemp shpchp 
> snd_timer ghash_clmulni_intel bluetooth joydev thinkpad_acpi snd_seq_device 
> nvram wmi snd soundcore mac_hid aesni_intel aes_x86_64 crypto_simd cryptd 
> glue_helper
> [ 7927.051492]  kvm_intel kvm irqbypass hid_generic hid_logitech_dj usbhid 
> hid [last unloaded: netconsole]
> [ 7927.051642] CPU: 2 PID: 123 Comm: kworker/2:2 Tainted: GW   
> 4.10.0-rc2-dirty #53
> [ 7927.051763] Hardware name: LENOVO 20F5S7V800/20F5S7V800, BIOS R02ET50W 
> (1.23 ) 09/20/2016
> [ 7927.051887] Workqueue: events drm_dp_destroy_connector_work
> [ 7927.051971] task: 88040c21bc00 task.stack: c90002484000
> [ 7927.052065] RIP: 0010:device_del+0x19/0x330
> [ 7927.052129] RSP: 0018:c90002487d58 EFLAGS: 00010282
> [ 7927.052209] RAX:  RBX: fffe RCX: 
> 88040c72d1f8
> [ 7927.052310] RDX: 81cb69b2 RSI: 0001 RDI: 
> fffe
> [ 7927.052412] RBP: c90002487d90 R08:  R09: 
> 0392
> [ 7927.052517] R10: 0003e05de802 R11:  R12: 
> fffe
> [ 7927.052677] R13: 88040c516c18 R14:  R15: 
> 88040c516bd8
> [ 7927.052781] FS:  () GS:88042150() 
> knlGS:
> [ 7927.052897] CS:  0010 DS:  ES:  CR0: 80050033
> [ 7927.052981] CR2: 009e CR3: 01e0b000 CR4: 
> 003406e0
> [ 7927.053082] DR0:  DR1:  DR2: 
> 
> [ 7927.053186] DR3:  DR6: fffe0ff0 DR7: 
> 0400
> [ 7927.053288] Call Trace:
> [ 7927.053338]  ? printk+0x4d/0x4f
> [ 7927.053396]  device_unregister+0x12/0x30
> [ 7927.053464]  drm_sysfs_connector_remove+0x57/0x70
> [ 7927.053538]  drm_connector_unregister.part.8+0x27/0x40
> [ 7927.053616]  drm_connector_unregister+0x14/0x20
> [ 7927.053690]  intel_dp_destroy_mst_connector+0x1a/0x80
> [ 7927.053771]  drm_dp_destroy_connector_work+0xce/0x150
> [ 7927.053850]  process_one_work+0x14b/0x430
> [ 7927.053914]  worker_thread+0x12b/0x4a0
> [ 7927.053982]  kthread+0x10c/0x140
> [ 7927.054037]  ? process_one_work+0x430/0x430
> [ 7927.054104]  ? kthread_create_on_node+0x40/0x40
> [ 7927.054179]  ret_from_fork+0x27/0x40
> [ 7927.054237] Code: 00 00 00 00 00 00 00 5b 41 5c 41 5d 5d c3 0f 1f 40 00 0f 
> 1f 44 00 00 55 48 89 e5 41 57 41 56 41 55 41 54 49 89 fc 53 48 83 ec 10 <48> 
> 8b 87 a0 00 00 00 4c 8b 2f 48 85 c0 74 1b 48 8b b8 90 00 00 
> [ 7927.054462] RIP: device_del+0x19/0x330 RSP: c90002487d58
> [ 7927.054486] CR2: 009e
> [ 7927.065278] ---[ end trace 1a2a5ca2cf6c3a4f ]---



Re: [Intel-gfx] 4.10-rc2 oops in DRM connector code

2017-01-09 Thread Dave Hansen
On 01/09/2017 02:15 AM, Daniel Vetter wrote:
...
> Can you pls do some printk tracing to make sure that without your patch
> we're indeed releasing the same connector twice from this loop? I suspect
> you're just ever-so-slightly shifting the timing and things blow up
> somewhre else. But no idea where :(

Your suspicious appear correct.  I still get an oops even with my patch
applied.  I did add some printk's but they're weird.  I'm seeing a
'kdev' that looks to be -2, although it should never get set to that
unless device_create_with_groups() and th error checking failed somehow.

I'll take out the locking portion of the patch and add some more
printk's and see if anything else shows up.  Here's a copy of the
printk's and associated oops, with my locking modification patch
applied, if this helps at all:

> [ 7927.049763] drm_dp_destroy_port() kfree(88040c759000)
> [ 7927.050039] drm_dp_destroy_connector_work() port: 88040c759800 
> connector: 88040c75a000
> [ 7927.050061] drm_sysfs_connector_remove() connector: 88040c75a000 kdev: 
> fffe
> [ 7927.050106] BUG: unable to handle kernel NULL pointer dereference at 
> 009e
> [ 7927.050242] IP: device_del+0x19/0x330
> [ 7927.050300] PGD 0 
> [ 7927.050302] 
> [ 7927.050364] Oops:  [#1] SMP
> [ 7927.050414] Modules linked in: netconsole ctr ccm ipt_MASQUERADE 
> nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 
> nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 
> xt_CHECKSUM iptable_mangle xt_tcpudp bridge stp llc iptable_filter ip_tables 
> ebtable_nat ebtables x_tables cmac rfcomm bnep dm_crypt arc4 iwlmvm mac80211 
> snd_hda_codec_hdmi iwlwifi snd_hda_codec_realtek snd_hda_codec_generic 
> snd_hda_intel snd_seq_midi snd_seq_midi_event snd_hda_codec cfg80211 
> snd_hwdep snd_rawmidi intel_rapl snd_hda_core iosf_mbi snd_pcm snd_seq btusb 
> x86_pkg_temp_thermal hid_logitech_hidpp btrtl btbcm btintel coretemp shpchp 
> snd_timer ghash_clmulni_intel bluetooth joydev thinkpad_acpi snd_seq_device 
> nvram wmi snd soundcore mac_hid aesni_intel aes_x86_64 crypto_simd cryptd 
> glue_helper
> [ 7927.051492]  kvm_intel kvm irqbypass hid_generic hid_logitech_dj usbhid 
> hid [last unloaded: netconsole]
> [ 7927.051642] CPU: 2 PID: 123 Comm: kworker/2:2 Tainted: GW   
> 4.10.0-rc2-dirty #53
> [ 7927.051763] Hardware name: LENOVO 20F5S7V800/20F5S7V800, BIOS R02ET50W 
> (1.23 ) 09/20/2016
> [ 7927.051887] Workqueue: events drm_dp_destroy_connector_work
> [ 7927.051971] task: 88040c21bc00 task.stack: c90002484000
> [ 7927.052065] RIP: 0010:device_del+0x19/0x330
> [ 7927.052129] RSP: 0018:c90002487d58 EFLAGS: 00010282
> [ 7927.052209] RAX:  RBX: fffe RCX: 
> 88040c72d1f8
> [ 7927.052310] RDX: 81cb69b2 RSI: 0001 RDI: 
> fffe
> [ 7927.052412] RBP: c90002487d90 R08:  R09: 
> 0392
> [ 7927.052517] R10: 0003e05de802 R11:  R12: 
> fffe
> [ 7927.052677] R13: 88040c516c18 R14:  R15: 
> 88040c516bd8
> [ 7927.052781] FS:  () GS:88042150() 
> knlGS:
> [ 7927.052897] CS:  0010 DS:  ES:  CR0: 80050033
> [ 7927.052981] CR2: 009e CR3: 01e0b000 CR4: 
> 003406e0
> [ 7927.053082] DR0:  DR1:  DR2: 
> 
> [ 7927.053186] DR3:  DR6: fffe0ff0 DR7: 
> 0400
> [ 7927.053288] Call Trace:
> [ 7927.053338]  ? printk+0x4d/0x4f
> [ 7927.053396]  device_unregister+0x12/0x30
> [ 7927.053464]  drm_sysfs_connector_remove+0x57/0x70
> [ 7927.053538]  drm_connector_unregister.part.8+0x27/0x40
> [ 7927.053616]  drm_connector_unregister+0x14/0x20
> [ 7927.053690]  intel_dp_destroy_mst_connector+0x1a/0x80
> [ 7927.053771]  drm_dp_destroy_connector_work+0xce/0x150
> [ 7927.053850]  process_one_work+0x14b/0x430
> [ 7927.053914]  worker_thread+0x12b/0x4a0
> [ 7927.053982]  kthread+0x10c/0x140
> [ 7927.054037]  ? process_one_work+0x430/0x430
> [ 7927.054104]  ? kthread_create_on_node+0x40/0x40
> [ 7927.054179]  ret_from_fork+0x27/0x40
> [ 7927.054237] Code: 00 00 00 00 00 00 00 5b 41 5c 41 5d 5d c3 0f 1f 40 00 0f 
> 1f 44 00 00 55 48 89 e5 41 57 41 56 41 55 41 54 49 89 fc 53 48 83 ec 10 <48> 
> 8b 87 a0 00 00 00 4c 8b 2f 48 85 c0 74 1b 48 8b b8 90 00 00 
> [ 7927.054462] RIP: device_del+0x19/0x330 RSP: c90002487d58
> [ 7927.054486] CR2: 009e
> [ 7927.065278] ---[ end trace 1a2a5ca2cf6c3a4f ]---



Re: [Intel-gfx] 4.10-rc2 oops in DRM connector code

2017-01-09 Thread Daniel Vetter
On Thu, Jan 05, 2017 at 11:03:44AM -0800, Dave Hansen wrote:
> My Thinkpad x260 doesn't like to be unplugged from its dock.  I don't
> think this is a new bug.  It's happening on my distro's 4.4 kernel
> as well.
> 
> The actual oops is in device_del().  It appears to have been passed a
> null 'struct device *'.
> 
> There appears to have been a race _around_ here fixed in 1f7717552e.
> I've looked for and tried to find the locking that prevents
> drm_connector_unregister() from being called twice concurrently.  I'm
> unable to find anything.
> 
> drm_dp_destroy_connector_work() has some locking that looks useful:
> 
>   mutex_lock(>destroy_connector_lock)
> 
> but it's released before the offending call:
> 
>   mgr->cbs->destroy_connector(mgr, port->connector);
> 
> which actually calls intel_dp_destroy_mst_connector().  I have no idea
> if it's correct (and haven't even run it with lockdep), but the attached
> patch does seem to fix my oopses.
> 
> Any ideas?
> 
> > Jan  5 10:22:32 ray kernel: [  537.087042] BUG: unable to handle kernel 
> > NULL pointer dereference at 009e
> > Jan  5 10:22:32 ray kernel: [  537.087954] IP: device_del+0x19/0x330
> > Jan  5 10:22:32 ray kernel: [  537.088860] PGD 0
> > Jan  5 10:22:32 ray kernel: [  537.088860]
> > Jan  5 10:22:32 ray kernel: [  537.090578] Oops:  [#1] SMP
> > Jan  5 10:22:32 ray kernel: [  537.091406] Modules linked in: ctr ccm 
> > ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat 
> > nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT 
> > nf_reject_ipv4 xt_CHECKSUM iptable_mangle xt_tcpudp bridge stp llc 
> > iptable_filter ip_tables ebtable_nat ebtables x_tables cmac rfcomm bnep 
> > dm_crypt arc4 iwlmvm mac80211 snd_hda_codec_hdmi snd_hda_codec_realtek 
> > snd_hda_codec_generic iwlwifi intel_rapl snd_hda_intel iosf_mbi 
> > hid_logitech_hidpp snd_seq_midi cfg80211 x86_pkg_temp_thermal snd_hda_codec 
> > snd_seq_midi_event snd_hwdep btusb snd_rawmidi snd_hda_core btrtl coretemp 
> > snd_seq snd_pcm btbcm btintel joydev bluetooth ghash_clmulni_intel 
> > snd_timer shpchp thinkpad_acpi snd_seq_device nvram wmi snd soundcore 
> > mac_hid aesni_intel aes_x86_64 crypto_simd cryptd glue_helper kvm_intel
> > Jan  5 10:22:32 ray kernel: [  537.095222]  kvm irqbypass hid_generic 
> > hid_logitech_dj usbhid hid
> > Jan  5 10:22:32 ray kernel: [  537.096272] CPU: 2 PID: 23 Comm: kworker/2:0 
> > Tainted: GW   4.10.0-rc2 #47
> > Jan  5 10:22:32 ray kernel: [  537.097263] Hardware name: LENOVO 
> > 20F5S7V800/20F5S7V800, BIOS R02ET50W (1.23 ) 09/20/2016
> > Jan  5 10:22:32 ray kernel: [  537.098291] Workqueue: events 
> > drm_dp_destroy_connector_work
> > Jan  5 10:22:32 ray kernel: [  537.099328] task: 88040f2f1e00 
> > task.stack: c9000198c000
> > Jan  5 10:22:32 ray kernel: [  537.100335] RIP: 0010:device_del+0x19/0x330
> > Jan  5 10:22:32 ray kernel: [  537.101340] RSP: 0018:c9000198fd58 
> > EFLAGS: 00010282
> > Jan  5 10:22:32 ray kernel: [  537.102361] RAX:  RBX: 
> > fffe RCX: 88040c5191b0
> > Jan  5 10:22:32 ray kernel: [  537.103418] RDX: 81cb6246 RSI: 
> > 0001 RDI: fffe
> > Jan  5 10:22:32 ray kernel: [  537.104473] RBP: c9000198fd90 R08: 
> >  R09: 880421517780
> > Jan  5 10:22:32 ray kernel: [  537.105574] R10: 007d0ce17c93 R11: 
> > 0001 R12: fffe
> > Jan  5 10:22:32 ray kernel: [  537.106636] R13: 88040ed36bd8 R14: 
> > 88040ed36788 R15: 88040c728810
> > Jan  5 10:22:32 ray kernel: [  537.107728] FS:  () 
> > GS:88042150() knlGS:
> > Jan  5 10:22:32 ray kernel: [  537.108812] CS:  0010 DS:  ES:  CR0: 
> > 80050033
> > Jan  5 10:22:32 ray kernel: [  537.109937] CR2: 009e CR3: 
> > 000384894000 CR4: 003406e0
> > Jan  5 10:22:32 ray kernel: [  537.111038] DR0:  DR1: 
> >  DR2: 
> > Jan  5 10:22:32 ray kernel: [  537.112142] DR3:  DR6: 
> > fffe0ff0 DR7: 0400
> > Jan  5 10:22:32 ray kernel: [  537.113223] Call Trace:
> > Jan  5 10:22:32 ray kernel: [  537.114293]  device_unregister+0x12/0x30
> > Jan  5 10:22:32 ray kernel: [  537.115354]  
> > drm_sysfs_connector_remove+0x3b/0x50
> > Jan  5 10:22:32 ray kernel: [  537.116391]  
> > drm_connector_unregister.part.8+0x27/0x40
> > Jan  5 10:22:32 ray kernel: [  537.117433]  
> > drm_connector_unregister+0x14/0x20
> > Jan  5 10:22:32 ray kernel: [  537.118478]  
> > intel_dp_destroy_mst_connector+0x1a/0x80
> > Jan  5 10:22:32 ray kernel: [  537.119513]  
> > drm_dp_destroy_connector_work+0xa9/0x150
> > Jan  5 10:22:32 ray kernel: [  537.120539]  process_one_work+0x14b/0x430
> > Jan  5 10:22:32 ray kernel: [  537.121568]  worker_thread+0x12b/0x4a0
> > Jan  5 10:22:32 ray kernel: [  537.122581]  kthread+0x10c/0x140
> > Jan  5 

Re: [Intel-gfx] 4.10-rc2 oops in DRM connector code

2017-01-09 Thread Daniel Vetter
On Thu, Jan 05, 2017 at 11:03:44AM -0800, Dave Hansen wrote:
> My Thinkpad x260 doesn't like to be unplugged from its dock.  I don't
> think this is a new bug.  It's happening on my distro's 4.4 kernel
> as well.
> 
> The actual oops is in device_del().  It appears to have been passed a
> null 'struct device *'.
> 
> There appears to have been a race _around_ here fixed in 1f7717552e.
> I've looked for and tried to find the locking that prevents
> drm_connector_unregister() from being called twice concurrently.  I'm
> unable to find anything.
> 
> drm_dp_destroy_connector_work() has some locking that looks useful:
> 
>   mutex_lock(>destroy_connector_lock)
> 
> but it's released before the offending call:
> 
>   mgr->cbs->destroy_connector(mgr, port->connector);
> 
> which actually calls intel_dp_destroy_mst_connector().  I have no idea
> if it's correct (and haven't even run it with lockdep), but the attached
> patch does seem to fix my oopses.
> 
> Any ideas?
> 
> > Jan  5 10:22:32 ray kernel: [  537.087042] BUG: unable to handle kernel 
> > NULL pointer dereference at 009e
> > Jan  5 10:22:32 ray kernel: [  537.087954] IP: device_del+0x19/0x330
> > Jan  5 10:22:32 ray kernel: [  537.088860] PGD 0
> > Jan  5 10:22:32 ray kernel: [  537.088860]
> > Jan  5 10:22:32 ray kernel: [  537.090578] Oops:  [#1] SMP
> > Jan  5 10:22:32 ray kernel: [  537.091406] Modules linked in: ctr ccm 
> > ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat 
> > nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT 
> > nf_reject_ipv4 xt_CHECKSUM iptable_mangle xt_tcpudp bridge stp llc 
> > iptable_filter ip_tables ebtable_nat ebtables x_tables cmac rfcomm bnep 
> > dm_crypt arc4 iwlmvm mac80211 snd_hda_codec_hdmi snd_hda_codec_realtek 
> > snd_hda_codec_generic iwlwifi intel_rapl snd_hda_intel iosf_mbi 
> > hid_logitech_hidpp snd_seq_midi cfg80211 x86_pkg_temp_thermal snd_hda_codec 
> > snd_seq_midi_event snd_hwdep btusb snd_rawmidi snd_hda_core btrtl coretemp 
> > snd_seq snd_pcm btbcm btintel joydev bluetooth ghash_clmulni_intel 
> > snd_timer shpchp thinkpad_acpi snd_seq_device nvram wmi snd soundcore 
> > mac_hid aesni_intel aes_x86_64 crypto_simd cryptd glue_helper kvm_intel
> > Jan  5 10:22:32 ray kernel: [  537.095222]  kvm irqbypass hid_generic 
> > hid_logitech_dj usbhid hid
> > Jan  5 10:22:32 ray kernel: [  537.096272] CPU: 2 PID: 23 Comm: kworker/2:0 
> > Tainted: GW   4.10.0-rc2 #47
> > Jan  5 10:22:32 ray kernel: [  537.097263] Hardware name: LENOVO 
> > 20F5S7V800/20F5S7V800, BIOS R02ET50W (1.23 ) 09/20/2016
> > Jan  5 10:22:32 ray kernel: [  537.098291] Workqueue: events 
> > drm_dp_destroy_connector_work
> > Jan  5 10:22:32 ray kernel: [  537.099328] task: 88040f2f1e00 
> > task.stack: c9000198c000
> > Jan  5 10:22:32 ray kernel: [  537.100335] RIP: 0010:device_del+0x19/0x330
> > Jan  5 10:22:32 ray kernel: [  537.101340] RSP: 0018:c9000198fd58 
> > EFLAGS: 00010282
> > Jan  5 10:22:32 ray kernel: [  537.102361] RAX:  RBX: 
> > fffe RCX: 88040c5191b0
> > Jan  5 10:22:32 ray kernel: [  537.103418] RDX: 81cb6246 RSI: 
> > 0001 RDI: fffe
> > Jan  5 10:22:32 ray kernel: [  537.104473] RBP: c9000198fd90 R08: 
> >  R09: 880421517780
> > Jan  5 10:22:32 ray kernel: [  537.105574] R10: 007d0ce17c93 R11: 
> > 0001 R12: fffe
> > Jan  5 10:22:32 ray kernel: [  537.106636] R13: 88040ed36bd8 R14: 
> > 88040ed36788 R15: 88040c728810
> > Jan  5 10:22:32 ray kernel: [  537.107728] FS:  () 
> > GS:88042150() knlGS:
> > Jan  5 10:22:32 ray kernel: [  537.108812] CS:  0010 DS:  ES:  CR0: 
> > 80050033
> > Jan  5 10:22:32 ray kernel: [  537.109937] CR2: 009e CR3: 
> > 000384894000 CR4: 003406e0
> > Jan  5 10:22:32 ray kernel: [  537.111038] DR0:  DR1: 
> >  DR2: 
> > Jan  5 10:22:32 ray kernel: [  537.112142] DR3:  DR6: 
> > fffe0ff0 DR7: 0400
> > Jan  5 10:22:32 ray kernel: [  537.113223] Call Trace:
> > Jan  5 10:22:32 ray kernel: [  537.114293]  device_unregister+0x12/0x30
> > Jan  5 10:22:32 ray kernel: [  537.115354]  
> > drm_sysfs_connector_remove+0x3b/0x50
> > Jan  5 10:22:32 ray kernel: [  537.116391]  
> > drm_connector_unregister.part.8+0x27/0x40
> > Jan  5 10:22:32 ray kernel: [  537.117433]  
> > drm_connector_unregister+0x14/0x20
> > Jan  5 10:22:32 ray kernel: [  537.118478]  
> > intel_dp_destroy_mst_connector+0x1a/0x80
> > Jan  5 10:22:32 ray kernel: [  537.119513]  
> > drm_dp_destroy_connector_work+0xa9/0x150
> > Jan  5 10:22:32 ray kernel: [  537.120539]  process_one_work+0x14b/0x430
> > Jan  5 10:22:32 ray kernel: [  537.121568]  worker_thread+0x12b/0x4a0
> > Jan  5 10:22:32 ray kernel: [  537.122581]  kthread+0x10c/0x140
> > Jan  5