nouveau_fan_update: possible circular locking dependency detected

2014-03-13 Thread Marcin Slusarz
On Thu, Mar 13, 2014 at 09:38:45AM -0400, Ilia Mirkin wrote:
> On Sun, Mar 9, 2014 at 10:51 AM, Marcin Slusarz wrote:
> > [  326.168487] ==
> > [  326.168491] [ INFO: possible circular locking dependency detected ]
> > [  326.168496] 3.13.6 #1270 Not tainted
> > [  326.168500] ---
> > [  326.168504] ldconfig/22297 is trying to acquire lock:
> > [  326.168507]  (&(&priv->fan->lock)->rlock){-.-...}, at: 
> > [] nouveau_fan_update+0xeb/0x252 [nouveau]
> > [  326.168551]
> > but task is already holding lock:
> > [  326.168555]  (&(&priv->sensor.alarm_program_lock)->rlock){-.-...}, at: 
> > [] alarm_timer_callback+0xf1/0x179 [nouveau]
> > [  326.168587]
> > which lock already depends on the new lock.
> > (...)
> 
> Marcin, how reproducible is this? What hardware was this on? If it's
> reasonably reproducible perhaps it makes sense to file a bug in the
> fd.o tracker?

It has happened only once so far (though I don't use this machine every day), on
the first boot of the 3.13 kernel. At the time the machine was quite hot (it was
rebuilding the whole system (Gentoo) *and* the CPU fan was dusty), which probably
affected the GPU temperature.
It's an NVA8 card (dmesg attached).

Marcin
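
The lockdep report quoted above comes down to two code paths taking the same
pair of locks in opposite order: in the full trace, path #1 takes
alarm_program_lock while fan->lock appears to be held, and path #0 takes
fan->lock while alarm_program_lock is held. The following is a minimal
userspace sketch of that kind of ordering check, not the kernel's actual
lockdep implementation. The names record_order, reachable, MAX_LOCKS and the
enum values are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_LOCKS 8

/* Illustrative lock identifiers; these are not nouveau or kernel symbols. */
enum { FAN_LOCK, ALARM_PROGRAM_LOCK };

/* edge[a][b] is true once lock b has been taken while lock a was held. */
static bool edge[MAX_LOCKS][MAX_LOCKS];

/* Depth-first search: is lock `to` reachable from lock `from` via recorded edges? */
static bool reachable(int from, int to, bool *seen)
{
    if (from == to)
        return true;
    seen[from] = true;
    for (int next = 0; next < MAX_LOCKS; next++) {
        if (edge[from][next] && !seen[next] && reachable(next, to, seen))
            return true;
    }
    return false;
}

/*
 * Record "acquiring `wanted` while holding `held`".
 * Returns false, and records nothing, if the new edge would close a cycle,
 * that is, if `held` is already reachable from `wanted`.
 */
static bool record_order(int held, int wanted)
{
    bool seen[MAX_LOCKS] = { false };

    assert(held >= 0 && held < MAX_LOCKS);
    assert(wanted >= 0 && wanted < MAX_LOCKS);

    if (reachable(wanted, held, seen))
        return false;
    edge[held][wanted] = true;
    return true;
}
```

Calling record_order(FAN_LOCK, ALARM_PROGRAM_LOCK) and then
record_order(ALARM_PROGRAM_LOCK, FAN_LOCK) accepts the first ordering and
rejects the second, which mirrors the "possible circular locking dependency"
in the report: no deadlock has to happen for the inverted order to be noticed.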
-- next part --
[0.00] Linux version 3.13.6 (marcin@joi) (gcc version 4.7.3 (Gentoo 4.7.3-r1 p1.4, pie-0.5.5) ) #1274 SMP PREEMPT Thu Mar 13 22:55:41 CET 2014
[0.00] Command line: BOOT_IMAGE=/boot/kernel-3.13.6 root=UUID=a55f9cc0-8726-4a17-9198-a153da676c85 netconsole=3@192.168.1.123/eth0,@192.168.1.102/00:06:5b:6a:a5:74
[0.00] e820: BIOS-provided physical RAM map:
[0.00] BIOS-e820: [mem 0x-0x0009fbff] usable
[0.00] BIOS-e820: [mem 0x0009fc00-0x0009] reserved
[0.00] BIOS-e820: [mem 0x000e4c00-0x000f] reserved
[0.00] BIOS-e820: [mem 0x0010-0xbf77] usable
[0.00] BIOS-e820: [mem 0xbf78-0xbf797fff] ACPI data
[0.00] BIOS-e820: [mem 0xbf798000-0xbf7dbfff] ACPI NVS
[0.00] BIOS-e820: [mem 0xbf7dc000-0xbfff] reserved
[0.00] BIOS-e820: [mem 0xfee0-0xfee00fff] reserved
[0.00] BIOS-e820: [mem 0xffe0-0x] reserved
[0.00] BIOS-e820: [mem 0x0001-0x0001bfff] usable
[0.00] NX (Execute Disable) protection: active
[0.00] SMBIOS 2.5 present.
[0.00] DMI: System manufacturer System Product Name/P6T SE, BIOS 0603 09/02/2009
[0.00] e820: update [mem 0x-0x0fff] usable ==> reserved
[0.00] e820: remove [mem 0x000a-0x000f] usable
[0.00] No AGP bridge found
[0.00] e820: last_pfn = 0x1c max_arch_pfn = 0x4
[0.00] MTRR default type: uncachable
[0.00] MTRR fixed ranges enabled:
[0.00]   0-9 write-back
[0.00]   A-B uncachable
[0.00]   C-E3FFF write-protect
[0.00]   E4000-EBFFF write-through
[0.00]   EC000-F write-protect
[0.00] MTRR variable ranges enabled:
[0.00]   0 base 1C000 mask FC000 uncachable
[0.00]   1 base 0 mask E write-back
[0.00]   2 base 0C000 mask FC000 uncachable
[0.00]   3 base 0BF80 mask FFF80 uncachable
[0.00]   4 disabled
[0.00]   5 disabled
[0.00]   6 disabled
[0.00]   7 disabled
[0.00] x86 PAT enabled: cpu 0, old 0x7040600070406, new 0x7010600070106
[0.00] original variable MTRRs
[0.00] reg 0, base: 7GB, range: 1GB, type UC
[0.00] reg 1, base: 0GB, range: 8GB, type WB
[0.00] reg 2, base: 3GB, range: 1GB, type UC
[0.00] reg 3, base: 3064MB, range: 8MB, type UC
[0.00] total RAM covered: 6136M
[0.00] Found optimal setting for mtrr clean up
[0.00]  gran_size: 64K  chunk_size: 16M num_reg: 5  lose cover RAM: 0G
[0.00] New variable MTRRs
[0.00] reg 0, base: 0GB, range: 2GB, type WB
[0.00] reg 1, base: 2GB, range: 1GB, type WB
[0.00] reg 2, base: 3064MB, range: 8MB, type UC
[0.00] reg 3, base: 4GB, range: 2GB, type WB
[0.00] reg 4, base: 6GB, range: 1GB, type WB
[0.00] e820: update [mem 0xbf80-0x] usable ==> reserved
[0.00] e820: last_pfn = 0xbf780 max_arch_pfn = 0x4
[0.00] Base memory trampoline at [88099000] 99000 size 24576
[0.00] init_memory_mapping: [mem 0x-0x000f]
[0.00]  [mem 0x-0x000f] page 4k
[0.00] BRK [0x027db000, 0x027dbfff] PGTABLE
[0.00] BRK [0x027dc000, 0x027dcfff] PGTABLE
[0.00] BRK [0x027dd000, 0x027ddfff] PGTABLE
[0.00] init_memory_mapping: [mem 0x1bfe0-0x1bfff

nouveau_fan_update: possible circular locking dependency detected

2014-03-13 Thread Martin Peres
On 13/03/2014 14:38, Ilia Mirkin wrote:
> On Sun, Mar 9, 2014 at 10:51 AM, Marcin Slusarz wrote:
>> [  326.168487] ==
>> [  326.168491] [ INFO: possible circular locking dependency detected ]
>> [  326.168496] 3.13.6 #1270 Not tainted
>> [  326.168500] ---
>> [  326.168504] ldconfig/22297 is trying to acquire lock:
>> [  326.168507]  (&(&priv->fan->lock)->rlock){-.-...}, at: 
>> [] nouveau_fan_update+0xeb/0x252 [nouveau]
>> [  326.168551]
>> but task is already holding lock:
>> [  326.168555]  (&(&priv->sensor.alarm_program_lock)->rlock){-.-...}, at: 
>> [] alarm_timer_callback+0xf1/0x179 [nouveau]
>> [  326.168587]
>> which lock already depends on the new lock.
>>
>> [  326.168592]
>> the existing dependency chain (in reverse order) is:
>> [  326.168596]
>> -> #1 (&(&priv->sensor.alarm_program_lock)->rlock){-.-...}:
>> [  326.168606][] lock_acquire+0xce/0x117
>> [  326.168615][] _raw_spin_lock_irqsave+0x3f/0x51
>> [  326.168623][] alarm_timer_callback+0xf1/0x179 
>> [nouveau]
>> [  326.168651][] 
>> nv04_timer_alarm_trigger+0x1b1/0x1cb [nouveau]
>> [  326.168679][] nv04_timer_alarm+0xb5/0xbe 
>> [nouveau]
>> [  326.168708][] nouveau_fan_update+0x234/0x252 
>> [nouveau]
>> [  326.168735][] nouveau_fan_alarm+0x15/0x17 
>> [nouveau]
>> [  326.168763][] 
>> nv04_timer_alarm_trigger+0x1b1/0x1cb [nouveau]
>> [  326.168790][] nv04_timer_intr+0x5b/0x13c 
>> [nouveau]
>> [  326.168817][] nouveau_mc_intr+0x2e2/0x3b1 
>> [nouveau]
>> [  326.168838][] handle_irq_event_percpu+0x5c/0x1dc
>> [  326.168846][] handle_irq_event+0x3c/0x5c
>> [  326.168852][] handle_edge_irq+0xc4/0xeb
>> [  326.168860][] handle_irq+0x120/0x12d
>> [  326.168868][] do_IRQ+0x48/0xaf
>> [  326.168873][] ret_from_intr+0x0/0x13
>> [  326.168881][] arch_cpu_idle+0x13/0x1d
>> [  326.168887][] cpu_startup_entry+0x140/0x218
>> [  326.168895][] start_secondary+0x1bf/0x1c4
>> [  326.168902]
>> -> #0 (&(&priv->fan->lock)->rlock){-.-...}:
>> [  326.168913][] __lock_acquire+0x10be/0x182b
>> [  326.168920][] lock_acquire+0xce/0x117
>> [  326.168924][] _raw_spin_lock_irqsave+0x3f/0x51
>> [  326.168931][] nouveau_fan_update+0xeb/0x252 
>> [nouveau]
>> [  326.168958][] nouveau_therm_fan_set+0x14/0x16 
>> [nouveau]
>> [  326.168984][] nouveau_therm_update+0x303/0x312 
>> [nouveau]
>> [  326.169011][] nouveau_therm_alarm+0x13/0x15 
>> [nouveau]
>> [  326.169038][] 
>> nv04_timer_alarm_trigger+0x1b1/0x1cb [nouveau]
>> [  326.169059][] nv04_timer_alarm+0xb5/0xbe 
>> [nouveau]
>> [  326.169079][] alarm_timer_callback+0x15e/0x179 
>> [nouveau]
>> [  326.169101][] 
>> nv04_timer_alarm_trigger+0x1b1/0x1cb [nouveau]
>> [  326.169121][] nv04_timer_intr+0x5b/0x13c 
>> [nouveau]
>> [  326.169142][] nouveau_mc_intr+0x2e2/0x3b1 
>> [nouveau]
>> [  326.169160][] handle_irq_event_percpu+0x5c/0x1dc
>> [  326.169165][] handle_irq_event+0x3c/0x5c
>> [  326.169170][] handle_edge_irq+0xc4/0xeb
>> [  326.169175][] handle_irq+0x120/0x12d
>> [  326.169179][] do_IRQ+0x48/0xaf
>> [  326.169183][] ret_from_intr+0x0/0x13
>> [  326.169189]
>> other info that might help us debug this:
>>
>> [  326.169193]  Possible unsafe locking scenario:
>>
>> [  326.169195]CPU0CPU1
>> [  326.169197]
>> [  326.169199]   lock(&(&priv->sensor.alarm_program_lock)->rlock);
>> [  326.169205]
>> lock(&(&priv->fan->lock)->rlock);
>> [  326.169211]
>> lock(&(&priv->sensor.alarm_program_lock)->rlock);
>> [  326.169216]   lock(&(&priv->fan->lock)->rlock);
>> [  326.169221]
>>   *** DEADLOCK ***
>>
>>   [  326.169225] 1 lock held by ldconfig/22297:
>>   [  326.169229]  #0:  (&(&priv->sensor.alarm_program_lock)->rlock){-.-...}, 
>> at: [] alarm_timer_callback+0xf1/0x179 [nouveau]
>>   [  326.169253]
>>   stack backtrace:
>>   [  326.169258] CPU: 7 PID: 22297 Comm: ldconfig Not tainted 3.13.6 #1270
>>   [  326.169260] Hardware name: System manufacturer System Product Name/P6T 
>> SE, BIOS 060309/02/2009
>>   [  326.169264]  90fb6360 8801bfdc3a38 9059e369 
>> 0006
>>   [  326.169273]  90fb61b0 8801bfdc3a88 905998cf 
>> 0002
>>   [  326.169282]  8800b148dbe0 0001 8800b148e1e0 
>> 0001
>>   [  326.169342] Call Trace:
>>   [  326.169344][] dump_stack+0x4e/0x71
>>   [  326.169352]  [] print_circular_bug+0x2ad/0x2be
>>   [  326.169356]  [] __lock_acquire+0x10be/0x182b
>>   [  326.169360]  [] ? check_irq_usage+0x99/0xab
>>   [  326.169365]  [] lock_acquire+0xce/0x117
>>   [  326.169384]  []

nouveau_fan_update: possible circular locking dependency detected

2014-03-13 Thread Ilia Mirkin
On Sun, Mar 9, 2014 at 10:51 AM, Marcin Slusarz wrote:
> [  326.168487] ==
> [  326.168491] [ INFO: possible circular locking dependency detected ]
> [  326.168496] 3.13.6 #1270 Not tainted
> [  326.168500] ---
> [  326.168504] ldconfig/22297 is trying to acquire lock:
> [  326.168507]  (&(&priv->fan->lock)->rlock){-.-...}, at: 
> [] nouveau_fan_update+0xeb/0x252 [nouveau]
> [  326.168551]
> but task is already holding lock:
> [  326.168555]  (&(&priv->sensor.alarm_program_lock)->rlock){-.-...}, at: 
> [] alarm_timer_callback+0xf1/0x179 [nouveau]
> [  326.168587]
> which lock already depends on the new lock.
>
> [  326.168592]
> the existing dependency chain (in reverse order) is:
> [  326.168596]
> -> #1 (&(&priv->sensor.alarm_program_lock)->rlock){-.-...}:
> [  326.168606][] lock_acquire+0xce/0x117
> [  326.168615][] _raw_spin_lock_irqsave+0x3f/0x51
> [  326.168623][] alarm_timer_callback+0xf1/0x179 
> [nouveau]
> [  326.168651][] 
> nv04_timer_alarm_trigger+0x1b1/0x1cb [nouveau]
> [  326.168679][] nv04_timer_alarm+0xb5/0xbe 
> [nouveau]
> [  326.168708][] nouveau_fan_update+0x234/0x252 
> [nouveau]
> [  326.168735][] nouveau_fan_alarm+0x15/0x17 
> [nouveau]
> [  326.168763][] 
> nv04_timer_alarm_trigger+0x1b1/0x1cb [nouveau]
> [  326.168790][] nv04_timer_intr+0x5b/0x13c 
> [nouveau]
> [  326.168817][] nouveau_mc_intr+0x2e2/0x3b1 
> [nouveau]
> [  326.168838][] handle_irq_event_percpu+0x5c/0x1dc
> [  326.168846][] handle_irq_event+0x3c/0x5c
> [  326.168852][] handle_edge_irq+0xc4/0xeb
> [  326.168860][] handle_irq+0x120/0x12d
> [  326.168868][] do_IRQ+0x48/0xaf
> [  326.168873][] ret_from_intr+0x0/0x13
> [  326.168881][] arch_cpu_idle+0x13/0x1d
> [  326.168887][] cpu_startup_entry+0x140/0x218
> [  326.168895][] start_secondary+0x1bf/0x1c4
> [  326.168902]
> -> #0 (&(&priv->fan->lock)->rlock){-.-...}:
> [  326.168913][] __lock_acquire+0x10be/0x182b
> [  326.168920][] lock_acquire+0xce/0x117
> [  326.168924][] _raw_spin_lock_irqsave+0x3f/0x51
> [  326.168931][] nouveau_fan_update+0xeb/0x252 
> [nouveau]
> [  326.168958][] nouveau_therm_fan_set+0x14/0x16 
> [nouveau]
> [  326.168984][] nouveau_therm_update+0x303/0x312 
> [nouveau]
> [  326.169011][] nouveau_therm_alarm+0x13/0x15 
> [nouveau]
> [  326.169038][] 
> nv04_timer_alarm_trigger+0x1b1/0x1cb [nouveau]
> [  326.169059][] nv04_timer_alarm+0xb5/0xbe 
> [nouveau]
> [  326.169079][] alarm_timer_callback+0x15e/0x179 
> [nouveau]
> [  326.169101][] 
> nv04_timer_alarm_trigger+0x1b1/0x1cb [nouveau]
> [  326.169121][] nv04_timer_intr+0x5b/0x13c 
> [nouveau]
> [  326.169142][] nouveau_mc_intr+0x2e2/0x3b1 
> [nouveau]
> [  326.169160][] handle_irq_event_percpu+0x5c/0x1dc
> [  326.169165][] handle_irq_event+0x3c/0x5c
> [  326.169170][] handle_edge_irq+0xc4/0xeb
> [  326.169175][] handle_irq+0x120/0x12d
> [  326.169179][] do_IRQ+0x48/0xaf
> [  326.169183][] ret_from_intr+0x0/0x13
> [  326.169189]
> other info that might help us debug this:
>
> [  326.169193]  Possible unsafe locking scenario:
>
> [  326.169195]CPU0CPU1
> [  326.169197]
> [  326.169199]   lock(&(&priv->sensor.alarm_program_lock)->rlock);
> [  326.169205]
> lock(&(&priv->fan->lock)->rlock);
> [  326.169211]
> lock(&(&priv->sensor.alarm_program_lock)->rlock);
> [  326.169216]   lock(&(&priv->fan->lock)->rlock);
> [  326.169221]
>  *** DEADLOCK ***
>
>  [  326.169225] 1 lock held by ldconfig/22297:
>  [  326.169229]  #0:  (&(&priv->sensor.alarm_program_lock)->rlock){-.-...}, 
> at: [] alarm_timer_callback+0xf1/0x179 [nouveau]
>  [  326.169253]
>  stack backtrace:
>  [  326.169258] CPU: 7 PID: 22297 Comm: ldconfig Not tainted 3.13.6 #1270
>  [  326.169260] Hardware name: System manufacturer System Product Name/P6T 
> SE, BIOS 060309/02/2009
>  [  326.169264]  90fb6360 8801bfdc3a38 9059e369 
> 0006
>  [  326.169273]  90fb61b0 8801bfdc3a88 905998cf 
> 0002
>  [  326.169282]  8800b148dbe0 0001 8800b148e1e0 
> 0001
>  [  326.169342] Call Trace:
>  [  326.169344][] dump_stack+0x4e/0x71
>  [  326.169352]  [] print_circular_bug+0x2ad/0x2be
>  [  326.169356]  [] __lock_acquire+0x10be/0x182b
>  [  326.169360]  [] ? check_irq_usage+0x99/0xab
>  [  326.169365]  [] lock_acquire+0xce/0x117
>  [  326.169384]  [] ? nouveau_fan_update+0xeb/0x252 
> [nouveau]
>  [  326.169388]  [] _raw_spin_lock_irqsave+0x3f/0x51
>  [  326.169407]  [] ? nouveau_fan_update+0xeb/0x252 
> [nouveau]
>  [  326

nouveau_fan_update: possible circular locking dependency detected

2014-03-09 Thread Marcin Slusarz
[  326.168487] ==
[  326.168491] [ INFO: possible circular locking dependency detected ]
[  326.168496] 3.13.6 #1270 Not tainted
[  326.168500] ---
[  326.168504] ldconfig/22297 is trying to acquire lock:
[  326.168507]  (&(&priv->fan->lock)->rlock){-.-...}, at: [] 
nouveau_fan_update+0xeb/0x252 [nouveau]
[  326.168551] 
but task is already holding lock:
[  326.168555]  (&(&priv->sensor.alarm_program_lock)->rlock){-.-...}, at: 
[] alarm_timer_callback+0xf1/0x179 [nouveau]
[  326.168587] 
which lock already depends on the new lock.

[  326.168592] 
the existing dependency chain (in reverse order) is:
[  326.168596] 
-> #1 (&(&priv->sensor.alarm_program_lock)->rlock){-.-...}:
[  326.168606][] lock_acquire+0xce/0x117
[  326.168615][] _raw_spin_lock_irqsave+0x3f/0x51
[  326.168623][] alarm_timer_callback+0xf1/0x179 
[nouveau]
[  326.168651][] nv04_timer_alarm_trigger+0x1b1/0x1cb 
[nouveau]
[  326.168679][] nv04_timer_alarm+0xb5/0xbe [nouveau]
[  326.168708][] nouveau_fan_update+0x234/0x252 
[nouveau]
[  326.168735][] nouveau_fan_alarm+0x15/0x17 [nouveau]
[  326.168763][] nv04_timer_alarm_trigger+0x1b1/0x1cb 
[nouveau]
[  326.168790][] nv04_timer_intr+0x5b/0x13c [nouveau]
[  326.168817][] nouveau_mc_intr+0x2e2/0x3b1 [nouveau]
[  326.168838][] handle_irq_event_percpu+0x5c/0x1dc
[  326.168846][] handle_irq_event+0x3c/0x5c
[  326.168852][] handle_edge_irq+0xc4/0xeb
[  326.168860][] handle_irq+0x120/0x12d
[  326.168868][] do_IRQ+0x48/0xaf
[  326.168873][] ret_from_intr+0x0/0x13
[  326.168881][] arch_cpu_idle+0x13/0x1d
[  326.168887][] cpu_startup_entry+0x140/0x218
[  326.168895][] start_secondary+0x1bf/0x1c4
[  326.168902] 
-> #0 (&(&priv->fan->lock)->rlock){-.-...}:
[  326.168913][] __lock_acquire+0x10be/0x182b
[  326.168920][] lock_acquire+0xce/0x117
[  326.168924][] _raw_spin_lock_irqsave+0x3f/0x51
[  326.168931][] nouveau_fan_update+0xeb/0x252 
[nouveau]
[  326.168958][] nouveau_therm_fan_set+0x14/0x16 
[nouveau]
[  326.168984][] nouveau_therm_update+0x303/0x312 
[nouveau]
[  326.169011][] nouveau_therm_alarm+0x13/0x15 
[nouveau]
[  326.169038][] nv04_timer_alarm_trigger+0x1b1/0x1cb 
[nouveau]
[  326.169059][] nv04_timer_alarm+0xb5/0xbe [nouveau]
[  326.169079][] alarm_timer_callback+0x15e/0x179 
[nouveau]
[  326.169101][] nv04_timer_alarm_trigger+0x1b1/0x1cb 
[nouveau]
[  326.169121][] nv04_timer_intr+0x5b/0x13c [nouveau]
[  326.169142][] nouveau_mc_intr+0x2e2/0x3b1 [nouveau]
[  326.169160][] handle_irq_event_percpu+0x5c/0x1dc
[  326.169165][] handle_irq_event+0x3c/0x5c
[  326.169170][] handle_edge_irq+0xc4/0xeb
[  326.169175][] handle_irq+0x120/0x12d
[  326.169179][] do_IRQ+0x48/0xaf
[  326.169183][] ret_from_intr+0x0/0x13
[  326.169189] 
other info that might help us debug this:

[  326.169193]  Possible unsafe locking scenario:

[  326.169195]CPU0CPU1
[  326.169197]
[  326.169199]   lock(&(&priv->sensor.alarm_program_lock)->rlock);
[  326.169205]lock(&(&priv->fan->lock)->rlock);
[  326.169211]
lock(&(&priv->sensor.alarm_program_lock)->rlock);
[  326.169216]   lock(&(&priv->fan->lock)->rlock);
[  326.169221] 
 *** DEADLOCK ***

 [  326.169225] 1 lock held by ldconfig/22297:
 [  326.169229]  #0:  (&(&priv->sensor.alarm_program_lock)->rlock){-.-...}, at: 
[] alarm_timer_callback+0xf1/0x179 [nouveau]
 [  326.169253] 
 stack backtrace:
 [  326.169258] CPU: 7 PID: 22297 Comm: ldconfig Not tainted 3.13.6 #1270
 [  326.169260] Hardware name: System manufacturer System Product Name/P6T SE, 
BIOS 060309/02/2009
 [  326.169264]  90fb6360 8801bfdc3a38 9059e369 
0006
 [  326.169273]  90fb61b0 8801bfdc3a88 905998cf 
0002
 [  326.169282]  8800b148dbe0 0001 8800b148e1e0 
0001
 [  326.169342] Call Trace:
 [  326.169344][] dump_stack+0x4e/0x71
 [  326.169352]  [] print_circular_bug+0x2ad/0x2be
 [  326.169356]  [] __lock_acquire+0x10be/0x182b
 [  326.169360]  [] ? check_irq_usage+0x99/0xab
 [  326.169365]  [] lock_acquire+0xce/0x117
 [  326.169384]  [] ? nouveau_fan_update+0xeb/0x252 [nouveau]
 [  326.169388]  [] _raw_spin_lock_irqsave+0x3f/0x51
 [  326.169407]  [] ? nouveau_fan_update+0xeb/0x252 [nouveau]
 [  326.169426]  [] ? nv04_timer_alarm_trigger+0x18d/0x1cb 
[nouveau]
 [  326.169445]  [] nouveau_fan_update+0xeb/0x252 [nouveau]
 [  326.169465]  [] nouveau_therm_fan_set+0x14/0x16 [nouveau]
 [  326.169483]  [] nouveau_therm_update+0x303/0x312 [nouveau]
 [  326.169502]  [] nouveau_therm_alarm+0x