Re: printk() + memory offline deadlock (WAS Re: page_alloc.shuffle=1 + CONFIG_PROVE_LOCKING=y = arm64 hang)

2019-09-23 Thread Sergey Senozhatsky
On (09/23/19 14:58), Petr Mladek wrote:
> 
> If I understand it correctly then this is the re-appearing problem.
> The only systematic solution with the current approach is to
> take port->lock in printk_safe/printk_deferred context.

It probably is.
We have a number of reverse paths. TTY invokes MM under port->lock,
UART invokes TTY under port->lock, MM invokes UART under zone->lock,
UART invokes sched under port->lock, shced invokes UART, UART invokes
UART under port->lock (!), and so on.

> But this is a massive change that almost nobody wants. Instead,
> we want the changes that were discussed on Plumbers.
>
> Now, the question is what to do with existing kernels. There were
> several lockdep reports. And I am a bit lost. Did anyone seen
> real deadlocks or just the lockdep reports?

I think so. Qian Cai mentioned "a hang" in one of his reports
(was it unseeded random()?). I'll send a formal patch maybe,
since there were no objections.

> To be clear. I would feel more comfortable when there are no
> deadlocks. But I also do not want to invest too much time
> into old kernels. All these problems were there for ages.
> We could finally see them because lockdep was enabled in printk()
> thanks to printk_safe.

True.
Everyone was so much happier when printk() used to do

   lockdep_off();
   call_console_drivers();
   lockdep_on();

Now we can have lockdep and RCU checks enabled, yet somehow
printk_safe is still "a terrible thing" :)

-ss


Re: printk() + memory offline deadlock (WAS Re: page_alloc.shuffle=1 + CONFIG_PROVE_LOCKING=y = arm64 hang)

2019-09-23 Thread Qian Cai
On Mon, 2019-09-23 at 14:58 +0200, Petr Mladek wrote:
> On Mon 2019-09-23 19:21:00, Sergey Senozhatsky wrote:
> > So we have
> > 
> > port->lock -> MM -> zone->lock
> > // from pty_write()->__tty_buffer_request_room()->kmalloc()
> > 
> > vs
> > 
> > zone->lock -> printk() -> port->lock
> > // from __offline_pages()->__offline_isolated_pages()->printk()
> 
> If I understand it correctly then this is the re-appearing problem.
> The only systematic solution with the current approach is to
> take port->lock in printk_safe/printk_deferred context.
> 
> But this is a massive change that almost nobody wants. Instead,
> we want the changes that were discussed on Plumbers.
> 
> Now, the question is what to do with existing kernels. There were
> several lockdep reports. And I am a bit lost. Did anyone seen
> real deadlocks or just the lockdep reports?

Yes, there is a real deadlock which causes an arm64 system hang on boot where
replace printk() with printk_deferred() in _warn_unseeded_randomness() will fix
it.

[ 1078.214683][T43784] WARNING: possible circular locking dependency detected
[ 1078.221550][T43784] 5.3.0-rc7-next-20190904 #14 Not tainted
[ 1078.227112][T43784] --
[ 1078.233976][T43784] vi/43784 is trying to acquire lock:
[ 1078.239192][T43784] 008b7cff9290 (&(>lock)->rlock){-.-.}, at:
rmqueue_bulk.constprop.21+0xb0/0x1218
[ 1078.249111][T43784] 
[ 1078.249111][T43784] but task is already holding lock:
[ 1078.256322][T43784] 00938db47d40 (&(>lock)->rlock){-.-.}, at:
pty_write+0x78/0x100
[ 1078.264760][T43784] 
[ 1078.264760][T43784] which lock already depends on the new lock.
[ 1078.264760][T43784] 
[ 1078.275008][T43784] 
[ 1078.275008][T43784] the existing dependency chain (in reverse order) is:
[ 1078.283869][T43784] 
[ 1078.283869][T43784] -> #3 (&(>lock)->rlock){-.-.}:
[ 1078.291350][T43784]__lock_acquire+0x5c8/0xbb0
[ 1078.296394][T43784]lock_acquire+0x154/0x428
[ 1078.301266][T43784]_raw_spin_lock_irqsave+0x80/0xa0
[ 1078.306831][T43784]tty_port_tty_get+0x28/0x68
[ 1078.311873][T43784]tty_port_default_wakeup+0x20/0x40
[ 1078.317523][T43784]tty_port_tty_wakeup+0x38/0x48
[ 1078.322827][T43784]uart_write_wakeup+0x2c/0x50
[ 1078.327956][T43784]pl011_tx_chars+0x240/0x260
[ 1078.332999][T43784]pl011_start_tx+0x24/0xa8
[ 1078.337868][T43784]__uart_start+0x90/0xa0
[ 1078.342563][T43784]uart_write+0x15c/0x2c8
[ 1078.347261][T43784]do_output_char+0x1c8/0x2b0
[ 1078.352304][T43784]n_tty_write+0x300/0x668
[ 1078.357087][T43784]tty_write+0x2e8/0x430
[ 1078.361696][T43784]redirected_tty_write+0xcc/0xe8
[ 1078.367086][T43784]do_iter_write+0x228/0x270
[ 1078.372041][T43784]vfs_writev+0x10c/0x1c8
[ 1078.376735][T43784]do_writev+0xdc/0x180
[ 1078.381257][T43784]__arm64_sys_writev+0x50/0x60
[ 1078.386476][T43784]el0_svc_handler+0x11c/0x1f0
[ 1078.391606][T43784]el0_svc+0x8/0xc
[ 1078.395691][T43784] 
[ 1078.395691][T43784] -> #2 (_lock_key){-.-.}:
[ 1078.402561][T43784]__lock_acquire+0x5c8/0xbb0
[ 1078.407604][T43784]lock_acquire+0x154/0x428
[ 1078.412474][T43784]_raw_spin_lock+0x68/0x88
[ 1078.417343][T43784]pl011_console_write+0x2ac/0x318
[ 1078.422820][T43784]console_unlock+0x3c4/0x898
[ 1078.427863][T43784]vprintk_emit+0x2d4/0x460
[ 1078.432732][T43784]vprintk_default+0x48/0x58
[ 1078.437688][T43784]vprintk_func+0x194/0x250
[ 1078.442557][T43784]printk+0xbc/0xec
[ 1078.446732][T43784]register_console+0x4a8/0x580
[ 1078.451947][T43784]uart_add_one_port+0x748/0x878
[ 1078.457250][T43784]pl011_register_port+0x98/0x128
[ 1078.462639][T43784]sbsa_uart_probe+0x398/0x480
[ 1078.467772][T43784]platform_drv_probe+0x70/0x108
[ 1078.473075][T43784]really_probe+0x15c/0x5d8
[ 1078.477944][T43784]driver_probe_device+0x94/0x1d0
[ 1078.483335][T43784]__device_attach_driver+0x11c/0x1a8
[ 1078.489072][T43784]bus_for_each_drv+0xf8/0x158
[ 1078.494201][T43784]__device_attach+0x164/0x240
[ 1078.499331][T43784]device_initial_probe+0x24/0x30
[ 1078.504721][T43784]bus_probe_device+0xf0/0x100
[ 1078.509850][T43784]device_add+0x63c/0x960
[ 1078.514546][T43784]platform_device_add+0x1ac/0x3b8
[ 1078.520023][T43784]platform_device_register_full+0x1fc/0x290
[ 1078.526373][T43784]acpi_create_platform_device.part.0+0x264/0x3a8
[ 1078.533152][T43784]acpi_create_platform_device+0x68/0x80
[ 1078.539150][T43784]acpi_default_enumeration+0x34/0x78
[ 1078.544887][T43784]acpi_bus_attach+0x340/0x3b8
[ 1078.550015][T43784]acpi_bus_attach+0xf8/0x3b8
[ 1078.555057][T43784]acpi_bus_attach+0xf8/0x3b8
[ 1078.560099][T43784]acpi_bus_attach+0xf8/0x3b8
[ 1078.565142][T43784]

Re: printk() + memory offline deadlock (WAS Re: page_alloc.shuffle=1 + CONFIG_PROVE_LOCKING=y = arm64 hang)

2019-09-23 Thread Qian Cai
On Mon, 2019-09-23 at 19:21 +0900, Sergey Senozhatsky wrote:
> On (09/18/19 12:10), Qian Cai wrote:
> [..]
> > > So you have debug objects enabled. Right? This thing does not behave
> > > when it comes to printing. debug_objects are slightly problematic.
> > 
> > Yes, but there is an also a similar splat without the debug_objects. It 
> > looks
> > like anything try to allocate memory in that path will trigger it anyway.
> 
> Appears to be different, yet somehow very familiar.
> 
> > [  297.425908] WARNING: possible circular locking dependency detected
> > [  297.425908] 5.3.0-next-20190917 #8 Not tainted
> > [  297.425909] --
> > [  297.425910] test.sh/8653 is trying to acquire lock:
> > [  297.425911] 865a4460 (console_owner){-.-.}, at:
> > console_unlock+0x207/0x750
> > 
> > [  297.425914] but task is already holding lock:
> > [  297.425915] 3fff3c58 (&(>lock)->rlock){-.-.}, at:
> > __offline_isolated_pages+0x179/0x3e0
> > 
> > [  297.425919] which lock already depends on the new lock.
> > 
> > 
> > [  297.425920] the existing dependency chain (in reverse order) is:
> > 
> > [  297.425922] -> #3 (&(>lock)->rlock){-.-.}:
> > [  297.425925]__lock_acquire+0x5b3/0xb40
> > [  297.425925]lock_acquire+0x126/0x280
> > [  297.425926]_raw_spin_lock+0x2f/0x40
> > [  297.425927]rmqueue_bulk.constprop.21+0xb6/0x1160
> > [  297.425928]get_page_from_freelist+0x898/0x22c0
> > [  297.425928]__alloc_pages_nodemask+0x2f3/0x1cd0
> > [  297.425929]alloc_pages_current+0x9c/0x110
> > [  297.425930]allocate_slab+0x4c6/0x19c0
> > [  297.425931]new_slab+0x46/0x70
> > [  297.425931]___slab_alloc+0x58b/0x960
> > [  297.425932]__slab_alloc+0x43/0x70
> > [  297.425933]__kmalloc+0x3ad/0x4b0
> > [  297.425933]__tty_buffer_request_room+0x100/0x250
> > [  297.425934]tty_insert_flip_string_fixed_flag+0x67/0x110
> > [  297.425935]pty_write+0xa2/0xf0
> > [  297.425936]n_tty_write+0x36b/0x7b0
> > [  297.425936]tty_write+0x284/0x4c0
> > [  297.425937]__vfs_write+0x50/0xa0
> > [  297.425938]vfs_write+0x105/0x290
> > [  297.425939]redirected_tty_write+0x6a/0xc0
> > [  297.425939]do_iter_write+0x248/0x2a0
> > [  297.425940]vfs_writev+0x106/0x1e0
> > [  297.425941]do_writev+0xd4/0x180
> > [  297.425941]__x64_sys_writev+0x45/0x50
> > [  297.425942]do_syscall_64+0xcc/0x76c
> > [  297.425943]entry_SYSCALL_64_after_hwframe+0x49/0xbe
> > 
> > [  297.425944] -> #2 (&(>lock)->rlock){-.-.}:
> > [  297.425946]__lock_acquire+0x5b3/0xb40
> > [  297.425947]lock_acquire+0x126/0x280
> > [  297.425948]_raw_spin_lock_irqsave+0x3a/0x50
> > [  297.425949]tty_port_tty_get+0x20/0x60
> > [  297.425949]tty_port_default_wakeup+0xf/0x30
> > [  297.425950]tty_port_tty_wakeup+0x39/0x40
> > [  297.425951]uart_write_wakeup+0x2a/0x40
> > [  297.425952]serial8250_tx_chars+0x22e/0x440
> > [  297.425952]serial8250_handle_irq.part.8+0x14a/0x170
> > [  297.425953]serial8250_default_handle_irq+0x5c/0x90
> > [  297.425954]serial8250_interrupt+0xa6/0x130
> > [  297.425955]__handle_irq_event_percpu+0x78/0x4f0
> > [  297.425955]handle_irq_event_percpu+0x70/0x100
> > [  297.425956]handle_irq_event+0x5a/0x8b
> > [  297.425957]handle_edge_irq+0x117/0x370
> > [  297.425958]do_IRQ+0x9e/0x1e0
> > [  297.425958]ret_from_intr+0x0/0x2a
> > [  297.425959]cpuidle_enter_state+0x156/0x8e0
> > [  297.425960]cpuidle_enter+0x41/0x70
> > [  297.425960]call_cpuidle+0x5e/0x90
> > [  297.425961]do_idle+0x333/0x370
> > [  297.425962]cpu_startup_entry+0x1d/0x1f
> > [  297.425962]start_secondary+0x290/0x330
> > [  297.425963]secondary_startup_64+0xb6/0xc0
> > 
> > [  297.425964] -> #1 (_lock_key){-.-.}:
> > [  297.425967]__lock_acquire+0x5b3/0xb40
> > [  297.425967]lock_acquire+0x126/0x280
> > [  297.425968]_raw_spin_lock_irqsave+0x3a/0x50
> > [  297.425969]serial8250_console_write+0x3e4/0x450
> > [  297.425970]univ8250_console_write+0x4b/0x60
> > [  297.425970]console_unlock+0x501/0x750
> > [  297.425971]vprintk_emit+0x10d/0x340
> > [  297.425972]vprintk_default+0x1f/0x30
> > [  297.425972]vprintk_func+0x44/0xd4
> > [  297.425973]printk+0x9f/0xc5
> > [  297.425974]register_console+0x39c/0x520
> > [  297.425975]univ8250_console_init+0x23/0x2d
> > [  297.425975]console_init+0x338/0x4cd
> > [  297.425976]start_kernel+0x534/0x724
> > [  297.425977]x86_64_start_reservations+0x24/0x26
> > [  297.425977]x86_64_start_kernel+0xf4/0xfb
> > [  297.425978]secondary_startup_64+0xb6/0xc0
> > 
> > [  297.425979] -> #0 

Re: printk() + memory offline deadlock (WAS Re: page_alloc.shuffle=1 + CONFIG_PROVE_LOCKING=y = arm64 hang)

2019-09-23 Thread Petr Mladek
On Mon 2019-09-23 19:21:00, Sergey Senozhatsky wrote:
> So we have
> 
> port->lock -> MM -> zone->lock
>   // from pty_write()->__tty_buffer_request_room()->kmalloc()
> 
> vs
> 
> zone->lock -> printk() -> port->lock
>   // from __offline_pages()->__offline_isolated_pages()->printk()

If I understand it correctly then this is the re-appearing problem.
The only systematic solution with the current approach is to
take port->lock in printk_safe/printk_deferred context.

But this is a massive change that almost nobody wants. Instead,
we want the changes that were discussed on Plumbers.

Now, the question is what to do with existing kernels. There were
several lockdep reports. And I am a bit lost. Did anyone seen
real deadlocks or just the lockdep reports?

To be clear. I would feel more comfortable when there are no
deadlocks. But I also do not want to invest too much time
into old kernels. All these problems were there for ages.
We could finally see them because lockdep was enabled in printk()
thanks to printk_safe. Well, it is getting worse over time with
the increasing complexity and number of debugging messages.

> A number of debugging options make the kernel less stable.
> Sad but true.

Yeah. The only good thing is that most debug options are not
enabled on production systems. It is not an excuse for ignoring
the problems. But it might be important for prioritizing.

Best Regards,
Petr


Re: printk() + memory offline deadlock (WAS Re: page_alloc.shuffle=1 + CONFIG_PROVE_LOCKING=y = arm64 hang)

2019-09-23 Thread Sergey Senozhatsky
On (09/18/19 12:10), Qian Cai wrote:
[..]
> > So you have debug objects enabled. Right? This thing does not behave
> > when it comes to printing. debug_objects are slightly problematic.
> 
> Yes, but there is an also a similar splat without the debug_objects. It looks
> like anything try to allocate memory in that path will trigger it anyway.

Appears to be different, yet somehow very familiar.

> [  297.425908] WARNING: possible circular locking dependency detected
> [  297.425908] 5.3.0-next-20190917 #8 Not tainted
> [  297.425909] --
> [  297.425910] test.sh/8653 is trying to acquire lock:
> [  297.425911] 865a4460 (console_owner){-.-.}, at:
> console_unlock+0x207/0x750
> 
> [  297.425914] but task is already holding lock:
> [  297.425915] 3fff3c58 (&(>lock)->rlock){-.-.}, at:
> __offline_isolated_pages+0x179/0x3e0
> 
> [  297.425919] which lock already depends on the new lock.
> 
> 
> [  297.425920] the existing dependency chain (in reverse order) is:
> 
> [  297.425922] -> #3 (&(>lock)->rlock){-.-.}:
> [  297.425925]__lock_acquire+0x5b3/0xb40
> [  297.425925]lock_acquire+0x126/0x280
> [  297.425926]_raw_spin_lock+0x2f/0x40
> [  297.425927]rmqueue_bulk.constprop.21+0xb6/0x1160
> [  297.425928]get_page_from_freelist+0x898/0x22c0
> [  297.425928]__alloc_pages_nodemask+0x2f3/0x1cd0
> [  297.425929]alloc_pages_current+0x9c/0x110
> [  297.425930]allocate_slab+0x4c6/0x19c0
> [  297.425931]new_slab+0x46/0x70
> [  297.425931]___slab_alloc+0x58b/0x960
> [  297.425932]__slab_alloc+0x43/0x70
> [  297.425933]__kmalloc+0x3ad/0x4b0
> [  297.425933]__tty_buffer_request_room+0x100/0x250
> [  297.425934]tty_insert_flip_string_fixed_flag+0x67/0x110
> [  297.425935]pty_write+0xa2/0xf0
> [  297.425936]n_tty_write+0x36b/0x7b0
> [  297.425936]tty_write+0x284/0x4c0
> [  297.425937]__vfs_write+0x50/0xa0
> [  297.425938]vfs_write+0x105/0x290
> [  297.425939]redirected_tty_write+0x6a/0xc0
> [  297.425939]do_iter_write+0x248/0x2a0
> [  297.425940]vfs_writev+0x106/0x1e0
> [  297.425941]do_writev+0xd4/0x180
> [  297.425941]__x64_sys_writev+0x45/0x50
> [  297.425942]do_syscall_64+0xcc/0x76c
> [  297.425943]entry_SYSCALL_64_after_hwframe+0x49/0xbe
> 
> [  297.425944] -> #2 (&(>lock)->rlock){-.-.}:
> [  297.425946]__lock_acquire+0x5b3/0xb40
> [  297.425947]lock_acquire+0x126/0x280
> [  297.425948]_raw_spin_lock_irqsave+0x3a/0x50
> [  297.425949]tty_port_tty_get+0x20/0x60
> [  297.425949]tty_port_default_wakeup+0xf/0x30
> [  297.425950]tty_port_tty_wakeup+0x39/0x40
> [  297.425951]uart_write_wakeup+0x2a/0x40
> [  297.425952]serial8250_tx_chars+0x22e/0x440
> [  297.425952]serial8250_handle_irq.part.8+0x14a/0x170
> [  297.425953]serial8250_default_handle_irq+0x5c/0x90
> [  297.425954]serial8250_interrupt+0xa6/0x130
> [  297.425955]__handle_irq_event_percpu+0x78/0x4f0
> [  297.425955]handle_irq_event_percpu+0x70/0x100
> [  297.425956]handle_irq_event+0x5a/0x8b
> [  297.425957]handle_edge_irq+0x117/0x370
> [  297.425958]do_IRQ+0x9e/0x1e0
> [  297.425958]ret_from_intr+0x0/0x2a
> [  297.425959]cpuidle_enter_state+0x156/0x8e0
> [  297.425960]cpuidle_enter+0x41/0x70
> [  297.425960]call_cpuidle+0x5e/0x90
> [  297.425961]do_idle+0x333/0x370
> [  297.425962]cpu_startup_entry+0x1d/0x1f
> [  297.425962]start_secondary+0x290/0x330
> [  297.425963]secondary_startup_64+0xb6/0xc0
> 
> [  297.425964] -> #1 (_lock_key){-.-.}:
> [  297.425967]__lock_acquire+0x5b3/0xb40
> [  297.425967]lock_acquire+0x126/0x280
> [  297.425968]_raw_spin_lock_irqsave+0x3a/0x50
> [  297.425969]serial8250_console_write+0x3e4/0x450
> [  297.425970]univ8250_console_write+0x4b/0x60
> [  297.425970]console_unlock+0x501/0x750
> [  297.425971]vprintk_emit+0x10d/0x340
> [  297.425972]vprintk_default+0x1f/0x30
> [  297.425972]vprintk_func+0x44/0xd4
> [  297.425973]printk+0x9f/0xc5
> [  297.425974]register_console+0x39c/0x520
> [  297.425975]univ8250_console_init+0x23/0x2d
> [  297.425975]console_init+0x338/0x4cd
> [  297.425976]start_kernel+0x534/0x724
> [  297.425977]x86_64_start_reservations+0x24/0x26
> [  297.425977]x86_64_start_kernel+0xf4/0xfb
> [  297.425978]secondary_startup_64+0xb6/0xc0
> 
> [  297.425979] -> #0 (console_owner){-.-.}:
> [  297.425982]check_prev_add+0x107/0xea0
> [  297.425982]validate_chain+0x8fc/0x1200
> [  297.425983]__lock_acquire+0x5b3/0xb40
> [  297.425984]lock_acquire+0x126/0x280
> [  297.425984]

Re: printk() + memory offline deadlock (WAS Re: page_alloc.shuffle=1 + CONFIG_PROVE_LOCKING=y = arm64 hang)

2019-09-18 Thread Qian Cai
On Thu, 2019-09-19 at 00:50 +0900, Sergey Senozhatsky wrote:
> On (09/18/19 10:39), Qian Cai wrote:
> > > Perhaps for a quick fix (and a comment that says this needs to be fixed
> > > properly). I think the changes to printk() that was discussed at
> > > Plumbers may also solve this properly.
> > 
> > I assume that the new printk() stuff will also fix this deadlock between
> > printk() and memory offline.
> 
> Mother chicken...
> 
> Do you actually see a deadlock? I'd rather expect a lockdep splat, but
> anyway...

Not yet, just a lockdep splat so far.

> 
> > [  317.337595] WARNING: possible circular locking dependency detected
> > [  317.337596] 5.3.0-next-20190917+ #9 Not tainted
> > [  317.337597] --
> > [  317.337597] test.sh/8738 is trying to acquire lock:
> > [  317.337598] b33a4978 ((console_sem).lock){-.-.}, at:> 
> > down_trylock+0x16/0x50
> > 
> > [  317.337602] but task is already holding lock:
> > [  317.337602] 3fff4318 (&(>lock)->rlock){-.-.}, at:> 
> > start_isolate_page_range+0x1f7/0x570
> > 
> > [  317.337606] which lock already depends on the new lock.
> > 
> > [  317.337608] the existing dependency chain (in reverse order) is:
> > 
> > [  317.337609] -> #3 (&(>lock)->rlock){-.-.}:
> > [  317.337612]__lock_acquire+0x5b3/0xb40
> > [  317.337613]lock_acquire+0x126/0x280
> > [  317.337613]_raw_spin_lock+0x2f/0x40
> > [  317.337614]rmqueue_bulk.constprop.21+0xb6/0x1160
> > [  317.337615]get_page_from_freelist+0x898/0x22c0
> > [  317.337616]__alloc_pages_nodemask+0x2f3/0x1cd0
> > [  317.337617]alloc_page_interleave+0x18/0x130
> > [  317.337618]alloc_pages_current+0xf6/0x110
> > [  317.337619]allocate_slab+0x4c6/0x19c0
> > [  317.337620]new_slab+0x46/0x70
> > [  317.337621]___slab_alloc+0x58b/0x960
> > [  317.337621]__slab_alloc+0x43/0x70
> > [  317.337622]kmem_cache_alloc+0x354/0x460
> > [  317.337623]fill_pool+0x272/0x4b0
> > [  317.337624]__debug_object_init+0x86/0x790
> > [  317.337624]debug_object_init+0x16/0x20
> > [  317.337625]hrtimer_init+0x27/0x1e0
> > [  317.337626]init_dl_task_timer+0x20/0x40
> > [  317.337627]__sched_fork+0x10b/0x1f0
> > [  317.337627]init_idle+0xac/0x520
> > [  317.337628]idle_thread_get+0x7c/0xc0
> > [  317.337629]bringup_cpu+0x1a/0x1e0
> > [  317.337630]cpuhp_invoke_callback+0x197/0x1120
> > [  317.337630]_cpu_up+0x171/0x280
> > [  317.337631]do_cpu_up+0xb1/0x120
> > [  317.337632]cpu_up+0x13/0x20
> > [  317.337632]smp_init+0xa4/0x12d
> > [  317.337633]kernel_init_freeable+0x37e/0x76e
> > [  317.337634]kernel_init+0x11/0x12f
> > [  317.337635]ret_from_fork+0x3a/0x50
> 
> So you have debug objects enabled. Right? This thing does not behave
> when it comes to printing. debug_objects are slightly problematic.

Yes, but there is an also a similar splat without the debug_objects. It looks
like anything try to allocate memory in that path will trigger it anyway.

[  297.425908] WARNING: possible circular locking dependency detected
[  297.425908] 5.3.0-next-20190917 #8 Not tainted
[  297.425909] --
[  297.425910] test.sh/8653 is trying to acquire lock:
[  297.425911] 865a4460 (console_owner){-.-.}, at:
console_unlock+0x207/0x750

[  297.425914] but task is already holding lock:
[  297.425915] 3fff3c58 (&(>lock)->rlock){-.-.}, at:
__offline_isolated_pages+0x179/0x3e0

[  297.425919] which lock already depends on the new lock.


[  297.425920] the existing dependency chain (in reverse order) is:

[  297.425922] -> #3 (&(>lock)->rlock){-.-.}:
[  297.425925]__lock_acquire+0x5b3/0xb40
[  297.425925]lock_acquire+0x126/0x280
[  297.425926]_raw_spin_lock+0x2f/0x40
[  297.425927]rmqueue_bulk.constprop.21+0xb6/0x1160
[  297.425928]get_page_from_freelist+0x898/0x22c0
[  297.425928]__alloc_pages_nodemask+0x2f3/0x1cd0
[  297.425929]alloc_pages_current+0x9c/0x110
[  297.425930]allocate_slab+0x4c6/0x19c0
[  297.425931]new_slab+0x46/0x70
[  297.425931]___slab_alloc+0x58b/0x960
[  297.425932]__slab_alloc+0x43/0x70
[  297.425933]__kmalloc+0x3ad/0x4b0
[  297.425933]__tty_buffer_request_room+0x100/0x250
[  297.425934]tty_insert_flip_string_fixed_flag+0x67/0x110
[  297.425935]pty_write+0xa2/0xf0
[  297.425936]n_tty_write+0x36b/0x7b0
[  297.425936]tty_write+0x284/0x4c0
[  297.425937]__vfs_write+0x50/0xa0
[  297.425938]vfs_write+0x105/0x290
[  297.425939]redirected_tty_write+0x6a/0xc0
[  297.425939]do_iter_write+0x248/0x2a0
[  297.425940]vfs_writev+0x106/0x1e0
[  297.425941]do_writev+0xd4/0x180
[  297.425941]

Re: printk() + memory offline deadlock (WAS Re: page_alloc.shuffle=1 + CONFIG_PROVE_LOCKING=y = arm64 hang)

2019-09-18 Thread Sergey Senozhatsky
A correction:

On (09/19/19 00:51), Sergey Senozhatsky wrote:
[..]
>
> zone->lock --> console_sem->lock
> 
> So then we have
> 
>   zone->lock --> console_sem->lock --> pi_lock --> rq->lock
> 
>   vs. the reverse chain
> 
>   rq->lock --> console_sem->lock

 ^^^ zone->lock

-ss


Re: printk() + memory offline deadlock (WAS Re: page_alloc.shuffle=1 + CONFIG_PROVE_LOCKING=y = arm64 hang)

2019-09-18 Thread Sergey Senozhatsky
On (09/18/19 10:39), Qian Cai wrote:
> > Perhaps for a quick fix (and a comment that says this needs to be fixed
> > properly). I think the changes to printk() that was discussed at
> > Plumbers may also solve this properly.
> 
> I assume that the new printk() stuff will also fix this deadlock between
> printk() and memory offline.

Mother chicken...

Do you actually see a deadlock? I'd rather expect a lockdep splat, but
anyway...

> [  317.337595] WARNING: possible circular locking dependency detected
> [  317.337596] 5.3.0-next-20190917+ #9 Not tainted
> [  317.337597] --
> [  317.337597] test.sh/8738 is trying to acquire lock:
> [  317.337598] b33a4978 ((console_sem).lock){-.-.}, at:> 
> down_trylock+0x16/0x50
> 
> [  317.337602] but task is already holding lock:
> [  317.337602] 3fff4318 (&(>lock)->rlock){-.-.}, at:> 
> start_isolate_page_range+0x1f7/0x570
> 
> [  317.337606] which lock already depends on the new lock.
>
> [  317.337608] the existing dependency chain (in reverse order) is:
> 
> [  317.337609] -> #3 (&(>lock)->rlock){-.-.}:
> [  317.337612]__lock_acquire+0x5b3/0xb40
> [  317.337613]lock_acquire+0x126/0x280
> [  317.337613]_raw_spin_lock+0x2f/0x40
> [  317.337614]rmqueue_bulk.constprop.21+0xb6/0x1160
> [  317.337615]get_page_from_freelist+0x898/0x22c0
> [  317.337616]__alloc_pages_nodemask+0x2f3/0x1cd0
> [  317.337617]alloc_page_interleave+0x18/0x130
> [  317.337618]alloc_pages_current+0xf6/0x110
> [  317.337619]allocate_slab+0x4c6/0x19c0
> [  317.337620]new_slab+0x46/0x70
> [  317.337621]___slab_alloc+0x58b/0x960
> [  317.337621]__slab_alloc+0x43/0x70
> [  317.337622]kmem_cache_alloc+0x354/0x460
> [  317.337623]fill_pool+0x272/0x4b0
> [  317.337624]__debug_object_init+0x86/0x790
> [  317.337624]debug_object_init+0x16/0x20
> [  317.337625]hrtimer_init+0x27/0x1e0
> [  317.337626]init_dl_task_timer+0x20/0x40
> [  317.337627]__sched_fork+0x10b/0x1f0
> [  317.337627]init_idle+0xac/0x520
> [  317.337628]idle_thread_get+0x7c/0xc0
> [  317.337629]bringup_cpu+0x1a/0x1e0
> [  317.337630]cpuhp_invoke_callback+0x197/0x1120
> [  317.337630]_cpu_up+0x171/0x280
> [  317.337631]do_cpu_up+0xb1/0x120
> [  317.337632]cpu_up+0x13/0x20
> [  317.337632]smp_init+0xa4/0x12d
> [  317.337633]kernel_init_freeable+0x37e/0x76e
> [  317.337634]kernel_init+0x11/0x12f
> [  317.337635]ret_from_fork+0x3a/0x50

So you have debug objects enabled. Right? This thing does not behave
when it comes to printing. debug_objects are slightly problematic.

This thing does

rq->lock --> zone->lock

It takes rq->lock and then calls into 
__sched_fork()->hrtimer_init()->debug_objects()->MM

This doesn't look very right - a dive into MM under rq->lock.

Peter, Thomas am I wrong?

> [  317.337635] -> #2 (>lock){-.-.}:
> [  317.337638]__lock_acquire+0x5b3/0xb40
> [  317.337639]lock_acquire+0x126/0x280
> [  317.337639]_raw_spin_lock+0x2f/0x40
> [  317.337640]task_fork_fair+0x43/0x200
> [  317.337641]sched_fork+0x29b/0x420
> [  317.337642]copy_process+0xf3c/0x2fd0
> [  317.337642]_do_fork+0xef/0x950
> [  317.337643]kernel_thread+0xa8/0xe0
> [  317.337644]rest_init+0x28/0x311
> [  317.337645]arch_call_rest_init+0xe/0x1b
> [  317.337645]start_kernel+0x6eb/0x724
> [  317.337646]x86_64_start_reservations+0x24/0x26
> [  317.337647]x86_64_start_kernel+0xf4/0xfb
> [  317.337648]secondary_startup_64+0xb6/0xc0

pi_lock --> rq->lock

> [  317.337649] -> #1 (>pi_lock){-.-.}:
> [  317.337651]__lock_acquire+0x5b3/0xb40
> [  317.337652]lock_acquire+0x126/0x280
> [  317.337653]_raw_spin_lock_irqsave+0x3a/0x50
> [  317.337653]try_to_wake_up+0xb4/0x1030
> [  317.337654]wake_up_process+0x15/0x20
> [  317.337655]__up+0xaa/0xc0
> [  317.337655]up+0x55/0x60
> [  317.337656]__up_console_sem+0x37/0x60
> [  317.337657]console_unlock+0x3a0/0x750
> [  317.337658]vprintk_emit+0x10d/0x340
> [  317.337658]vprintk_default+0x1f/0x30
> [  317.337659]vprintk_func+0x44/0xd4
> [  317.337660]printk+0x9f/0xc5
> [  317.337660]crng_reseed+0x3cc/0x440
> [  317.337661]credit_entropy_bits+0x3e8/0x4f0
> [  317.337662]random_ioctl+0x1eb/0x250
> [  317.337663]do_vfs_ioctl+0x13e/0xa70
> [  317.337663]ksys_ioctl+0x41/0x80
> [  317.337664]__x64_sys_ioctl+0x43/0x4c
> [  317.337665]do_syscall_64+0xcc/0x76c
> [  317.337666]entry_SYSCALL_64_after_hwframe+0x49/0xbe

console_sem->lock --> pi_lock

This also covers console_sem->lock --> rq->lock, and maintains
pi_lock --> rq->lock

So we have


printk() + memory offline deadlock (WAS Re: page_alloc.shuffle=1 + CONFIG_PROVE_LOCKING=y = arm64 hang)

2019-09-18 Thread Qian Cai
On Mon, 2019-09-16 at 10:42 -0400, Steven Rostedt wrote:
> On Thu, 12 Sep 2019 08:05:41 -0400
> Qian Cai  wrote:
> 
> > >  drivers/char/random.c | 7 ---
> > >  1 file changed, 4 insertions(+), 3 deletions(-)
> > > 
> > > diff --git a/drivers/char/random.c b/drivers/char/random.c
> > > index 9b54cdb301d3..975015857200 100644
> > > --- a/drivers/char/random.c
> > > +++ b/drivers/char/random.c
> > > @@ -1687,8 +1687,9 @@ static void _warn_unseeded_randomness(const char 
> > > *func_name, void *caller,
> > >   print_once = true;
> > >  #endif
> > >   if (__ratelimit(_warning))
> > > - pr_notice("random: %s called from %pS with crng_init=%d\n",
> > > -   func_name, caller, crng_init);
> > > + printk_deferred(KERN_NOTICE "random: %s called from %pS "
> > > + "with crng_init=%d\n", func_name, caller,
> > > + crng_init);
> > >  }
> > >  
> > >  /*
> > > @@ -2462,4 +2463,4 @@ void add_bootloader_randomness(const void *buf, 
> > > unsigned int size)
> > >   else
> > >   add_device_randomness(buf, size);
> > >  }
> > > -EXPORT_SYMBOL_GPL(add_bootloader_randomness);
> > > \ No newline at end of file
> > > +EXPORT_SYMBOL_GPL(add_bootloader_randomness);  
> > 
> > This will also fix the hang.
> > 
> > Sergey, do you plan to submit this Ted?
> 
> Perhaps for a quick fix (and a comment that says this needs to be fixed
> properly). I think the changes to printk() that was discussed at
> Plumbers may also solve this properly.

I assume that the new printk() stuff will also fix this deadlock between
printk() and memory offline.

[  317.337595] WARNING: possible circular locking dependency detected
[  317.337596] 5.3.0-next-20190917+ #9 Not tainted
[  317.337597] --
[  317.337597] test.sh/8738 is trying to acquire lock:
[  317.337598] b33a4978 ((console_sem).lock){-.-.}, at:
down_trylock+0x16/0x50

[  317.337602] but task is already holding lock:
[  317.337602] 3fff4318 (&(>lock)->rlock){-.-.}, at:
start_isolate_page_range+0x1f7/0x570

[  317.337606] which lock already depends on the new lock.


[  317.337608] the existing dependency chain (in reverse order) is:

[  317.337609] -> #3 (&(>lock)->rlock){-.-.}:
[  317.337612]__lock_acquire+0x5b3/0xb40
[  317.337613]lock_acquire+0x126/0x280
[  317.337613]_raw_spin_lock+0x2f/0x40
[  317.337614]rmqueue_bulk.constprop.21+0xb6/0x1160
[  317.337615]get_page_from_freelist+0x898/0x22c0
[  317.337616]__alloc_pages_nodemask+0x2f3/0x1cd0
[  317.337617]alloc_page_interleave+0x18/0x130
[  317.337618]alloc_pages_current+0xf6/0x110
[  317.337619]allocate_slab+0x4c6/0x19c0
[  317.337620]new_slab+0x46/0x70
[  317.337621]___slab_alloc+0x58b/0x960
[  317.337621]__slab_alloc+0x43/0x70
[  317.337622]kmem_cache_alloc+0x354/0x460
[  317.337623]fill_pool+0x272/0x4b0
[  317.337624]__debug_object_init+0x86/0x790
[  317.337624]debug_object_init+0x16/0x20
[  317.337625]hrtimer_init+0x27/0x1e0
[  317.337626]init_dl_task_timer+0x20/0x40
[  317.337627]__sched_fork+0x10b/0x1f0
[  317.337627]init_idle+0xac/0x520
[  317.337628]idle_thread_get+0x7c/0xc0
[  317.337629]bringup_cpu+0x1a/0x1e0
[  317.337630]cpuhp_invoke_callback+0x197/0x1120
[  317.337630]_cpu_up+0x171/0x280
[  317.337631]do_cpu_up+0xb1/0x120
[  317.337632]cpu_up+0x13/0x20
[  317.337632]smp_init+0xa4/0x12d
[  317.337633]kernel_init_freeable+0x37e/0x76e
[  317.337634]kernel_init+0x11/0x12f
[  317.337635]ret_from_fork+0x3a/0x50

[  317.337635] -> #2 (>lock){-.-.}:
[  317.337638]__lock_acquire+0x5b3/0xb40
[  317.337639]lock_acquire+0x126/0x280
[  317.337639]_raw_spin_lock+0x2f/0x40
[  317.337640]task_fork_fair+0x43/0x200
[  317.337641]sched_fork+0x29b/0x420
[  317.337642]copy_process+0xf3c/0x2fd0
[  317.337642]_do_fork+0xef/0x950
[  317.337643]kernel_thread+0xa8/0xe0
[  317.337644]rest_init+0x28/0x311
[  317.337645]arch_call_rest_init+0xe/0x1b
[  317.337645]start_kernel+0x6eb/0x724
[  317.337646]x86_64_start_reservations+0x24/0x26
[  317.337647]x86_64_start_kernel+0xf4/0xfb
[  317.337648]secondary_startup_64+0xb6/0xc0

[  317.337649] -> #1 (>pi_lock){-.-.}:
[  317.337651]__lock_acquire+0x5b3/0xb40
[  317.337652]lock_acquire+0x126/0x280
[  317.337653]_raw_spin_lock_irqsave+0x3a/0x50
[  317.337653]try_to_wake_up+0xb4/0x1030
[  317.337654]wake_up_process+0x15/0x20
[  317.337655]__up+0xaa/0xc0
[  317.337655]up+0x55/0x60
[  317.337656]__up_console_sem+0x37/0x60
[  317.337657]console_unlock+0x3a0/0x750
[  317.337658]vprintk_emit+0x10d/0x340
[  317.337658]