Re: [PATCH 7/7] dyndbg: enable 'cache' of active pr_debug callsites
On Wed, Nov 25, 2020 at 1:54 PM Jason Baron wrote: > > > > On 11/25/20 2:36 PM, Jim Cromie wrote: > > In ddebug_putsite(), dont zs_unmap the callsite if it is enabled for > > printing. This means that the next time this pr_debug callsite is > > executed, the _getsite() will succeed quickly without remapping the > > zrec. > > > > Once the callsite is disabled via >control, a following _putsite() > > will see the flag cleared, and zs_unmap it. > > > > This changes the lifetime of our zs_mappings from brief (only for the > > single printk) to as long as uptime (if a prdebug is enabled til > > poweroff). This appears to be triggering the ensuing mayhem. > > > > I am able to get through init, to root console, by disabling all > > dynamic-debugs, including the ones that are enabled at compile. > > > > $ kruna --kopt \*.dyndbg=-p --kopt debug_locks_proceed=1 > > > > but enabling any pr-debug crashes. > > Plain old boot also panics, as pasted below. > > > > > Hi Jim, > > Looks like you've made a lot of progress with this series...but if I > understand correctly there are still some unresolved issues (panic). > So I think this series is more 'RFC' at this point? > yes, RFC. WIP suggests I know whats needed next. certainly Ive seen a variety of sleeping while atomic ... BUGs, > I was also expecting to see updates to the actual printing functions > to now use dp->site. But perhaps I missed those bits? those bits were changed in patch 5 basically changes are ~ s/dp/dc/g, and in ddebug_getsite / _putsite > > Thanks, > > -Jason > > > $ kruna --kopt debug_locks_proceed=1 > > [1.915768] ? rest_init+0x24d/0x24d > > [1.916417] kernel_init+0xaf/0x103 > > [1.917038] ret_from_fork+0x22/0x30 > > [1.917726] Kernel Offset: disabled > > [1.918324] ---[ end Kernel panic - not syncing: Requested init /bin/sh > > failed (error -14). ]--- > > QEMU 5.1.0 monitor - type 'help' for more information > > > > Signed-off-by: Jim Cromie > > --- > > lib/dynamic_debug.c | 4 +++- > > 1 file changed, 3 insertions(+), 1 deletion(-) > > > > diff --git a/lib/dynamic_debug.c b/lib/dynamic_debug.c > > index 534d73e45844..0c69aa52395d 100644 > > --- a/lib/dynamic_debug.c > > +++ b/lib/dynamic_debug.c > > @@ -165,7 +165,9 @@ static void ddebug_putsite(struct _ddebug *dp) > > /* need to keep site until zmap backing exists */ > > return; > > > > - /* always unmap for now. if !pr-debug has locking issues */ > > + if (dp->flags & _DPRINTK_FLAGS_PRINT) > > + return; /* keep maps of enabled pr_debugs */ > > + > > zs_unmap_object(dd_callsite_zpool, dp->zhandle); > > dp->site = NULL; > > } > >
Re: [PATCH 7/7] dyndbg: enable 'cache' of active pr_debug callsites
On 11/25/20 2:36 PM, Jim Cromie wrote: > In ddebug_putsite(), dont zs_unmap the callsite if it is enabled for > printing. This means that the next time this pr_debug callsite is > executed, the _getsite() will succeed quickly without remapping the > zrec. > > Once the callsite is disabled via >control, a following _putsite() > will see the flag cleared, and zs_unmap it. > > This changes the lifetime of our zs_mappings from brief (only for the > single printk) to as long as uptime (if a prdebug is enabled til > poweroff). This appears to be triggering the ensuing mayhem. > > I am able to get through init, to root console, by disabling all > dynamic-debugs, including the ones that are enabled at compile. > > $ kruna --kopt \*.dyndbg=-p --kopt debug_locks_proceed=1 > > but enabling any pr-debug crashes. > Plain old boot also panics, as pasted below. > Hi Jim, Looks like you've made a lot of progress with this series...but if I understand correctly there are still some unresolved issues (panic). So I think this series is more 'RFC' at this point? I was also expecting to see updates to the actual printing functions to now use dp->site. But perhaps I missed those bits? Thanks, -Jason > $ kruna --kopt debug_locks_proceed=1 > > [0.443926] > - > [0.445085] BUG: 14 unexpected failures (out of 350) - debugging > proceeding anyway! | > [0.446498] > - > ... > [0.617347] dyndbg: 224 modules, 2014 entries and 8960 bytes in ddebug > tables, 80560 bytes in __dyndbg section, 80560 bytes in __dyndbg_callsites > section > [0.617537] dyndbg: 2014 entries. repeated entries: 1789 module 1667 file > 809 func > ... > [1.456548] zswap: loaded using pool zstd/zsmalloc > [1.462802] dyndbg: total pages: 24 compaction: 0 > [1.463965] PM: Magic number: 12:980:889 > [1.464860] BUG: sleeping function called from invalid context at > mm/slab.h:506 > [1.466150] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 1, > name: swapper/0 > [1.467282] 3 locks held by swapper/0/1: > [1.467936] #0: 82875460 (cpu_hotplug_lock){}-{0:0}, at: > cpufreq_register_driver+0xb5/0x2f0 > [1.469622] #1: 888003cfbd28 (subsys mutex#5){+.+.}-{4:4}, at: > subsys_interface_register+0x5f/0x130 > [1.471448] #2: 8880057f00b0 (>lock){.+.+}-{3:3}, at: > zs_map_object+0x7b/0x2b0 > [1.472666] CPU: 0 PID: 1 Comm: swapper/0 Not tainted > 5.10.0-rc3-00304-g7a4ad0dc5042 #75 > [1.474007] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS > 1.13.0-3.fc33 04/01/2014 > [1.475314] Call Trace: > [1.475682] dump_stack+0x7d/0x9f > [1.476162] ___might_sleep.cold+0xa6/0xb7 > [1.476745] __might_sleep+0x46/0x80 > [1.477266] __kmalloc_track_caller+0x1dd/0x2d0 > [1.477895] kstrdup_const+0x45/0x70 > [1.478678] __kernfs_new_node+0x4b/0x280 > [1.479280] ? dynamic_emit_prefix+0x1dd/0x1f0 > [1.479907] ? __dynamic_pr_debug+0x8a/0xb0 > [1.480496] kernfs_create_dir_ns+0x44/0xc0 > [1.481188] sysfs_create_dir_ns+0x6c/0xd0 > [1.481820] kobject_add_internal+0x102/0x300 > [1.482505] kobject_init_and_add+0x71/0xa0 > [1.483194] ? kmem_cache_alloc_trace+0x21d/0x270 > [1.483993] ? cpufreq_online+0x62/0x960 > [1.484628] cpufreq_online+0xa5/0x960 > [1.485325] ? system_root_device_release+0x10/0x10 > [1.486223] cpufreq_add_dev+0x79/0x90 > [1.486868] subsys_interface_register+0x11e/0x130 > [1.487787] cpufreq_register_driver+0x171/0x2f0 > [1.488679] acpi_cpufreq_init+0x266/0x294 > [1.489508] ? CPU_FREQ_GOV_ONDEMAND_init+0x12/0x12 > [1.490350] do_one_initcall+0x5a/0x2c0 > [1.491037] ? rcu_read_lock_sched_held+0x41/0x80 > [1.491820] kernel_init_freeable+0x23b/0x287 > [1.492561] ? rest_init+0x24d/0x24d > [1.493180] kernel_init+0x9/0x103 > [1.493825] ret_from_fork+0x22/0x30 > [1.494557] > [1.494804] = > [1.495503] [ BUG: Invalid wait context ] > [1.496134] 5.10.0-rc3-00304-g7a4ad0dc5042 #75 Tainted: GW > [1.497346] - > [1.497986] swapper/0/1 is trying to lock: > [1.498668] 8299ad90 (iattr_mutex){+.+.}-{4:4}, at: > kernfs_xattr_get+0x25/0x60 > [1.499855] other info that might help us debug this: > [1.500655] context-{5:5} > [1.501118] 3 locks held by swapper/0/1: > [1.501732] #0: 82875460 (cpu_hotplug_lock){}-{0:0}, at: > cpufreq_register_driver+0xb5/0x2f0 > [1.503384] #1: 888003cfbd28 (subsys mutex#5){+.+.}-{4:4}, at: > subsys_interface_register+0x5f/0x130 > [1.505289] #2: 8880057f00b0 (>lock){.+.+}-{3:3}, at: > zs_map_object+0x7b/0x2b0 > [1.506743] stack backtrace: > [1.507228] CPU: 0 PID: 1 Comm: swapper/0 Tainted: GW >
[PATCH 7/7] dyndbg: enable 'cache' of active pr_debug callsites
In ddebug_putsite(), dont zs_unmap the callsite if it is enabled for printing. This means that the next time this pr_debug callsite is executed, the _getsite() will succeed quickly without remapping the zrec. Once the callsite is disabled via >control, a following _putsite() will see the flag cleared, and zs_unmap it. This changes the lifetime of our zs_mappings from brief (only for the single printk) to as long as uptime (if a prdebug is enabled til poweroff). This appears to be triggering the ensuing mayhem. I am able to get through init, to root console, by disabling all dynamic-debugs, including the ones that are enabled at compile. $ kruna --kopt \*.dyndbg=-p --kopt debug_locks_proceed=1 but enabling any pr-debug crashes. Plain old boot also panics, as pasted below. $ kruna --kopt debug_locks_proceed=1 [0.443926] - [0.445085] BUG: 14 unexpected failures (out of 350) - debugging proceeding anyway! | [0.446498] - ... [0.617347] dyndbg: 224 modules, 2014 entries and 8960 bytes in ddebug tables, 80560 bytes in __dyndbg section, 80560 bytes in __dyndbg_callsites section [0.617537] dyndbg: 2014 entries. repeated entries: 1789 module 1667 file 809 func ... [1.456548] zswap: loaded using pool zstd/zsmalloc [1.462802] dyndbg: total pages: 24 compaction: 0 [1.463965] PM: Magic number: 12:980:889 [1.464860] BUG: sleeping function called from invalid context at mm/slab.h:506 [1.466150] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 1, name: swapper/0 [1.467282] 3 locks held by swapper/0/1: [1.467936] #0: 82875460 (cpu_hotplug_lock){}-{0:0}, at: cpufreq_register_driver+0xb5/0x2f0 [1.469622] #1: 888003cfbd28 (subsys mutex#5){+.+.}-{4:4}, at: subsys_interface_register+0x5f/0x130 [1.471448] #2: 8880057f00b0 (>lock){.+.+}-{3:3}, at: zs_map_object+0x7b/0x2b0 [1.472666] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.10.0-rc3-00304-g7a4ad0dc5042 #75 [1.474007] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-3.fc33 04/01/2014 [1.475314] Call Trace: [1.475682] dump_stack+0x7d/0x9f [1.476162] ___might_sleep.cold+0xa6/0xb7 [1.476745] __might_sleep+0x46/0x80 [1.477266] __kmalloc_track_caller+0x1dd/0x2d0 [1.477895] kstrdup_const+0x45/0x70 [1.478678] __kernfs_new_node+0x4b/0x280 [1.479280] ? dynamic_emit_prefix+0x1dd/0x1f0 [1.479907] ? __dynamic_pr_debug+0x8a/0xb0 [1.480496] kernfs_create_dir_ns+0x44/0xc0 [1.481188] sysfs_create_dir_ns+0x6c/0xd0 [1.481820] kobject_add_internal+0x102/0x300 [1.482505] kobject_init_and_add+0x71/0xa0 [1.483194] ? kmem_cache_alloc_trace+0x21d/0x270 [1.483993] ? cpufreq_online+0x62/0x960 [1.484628] cpufreq_online+0xa5/0x960 [1.485325] ? system_root_device_release+0x10/0x10 [1.486223] cpufreq_add_dev+0x79/0x90 [1.486868] subsys_interface_register+0x11e/0x130 [1.487787] cpufreq_register_driver+0x171/0x2f0 [1.488679] acpi_cpufreq_init+0x266/0x294 [1.489508] ? CPU_FREQ_GOV_ONDEMAND_init+0x12/0x12 [1.490350] do_one_initcall+0x5a/0x2c0 [1.491037] ? rcu_read_lock_sched_held+0x41/0x80 [1.491820] kernel_init_freeable+0x23b/0x287 [1.492561] ? rest_init+0x24d/0x24d [1.493180] kernel_init+0x9/0x103 [1.493825] ret_from_fork+0x22/0x30 [1.494557] [1.494804] = [1.495503] [ BUG: Invalid wait context ] [1.496134] 5.10.0-rc3-00304-g7a4ad0dc5042 #75 Tainted: GW [1.497346] - [1.497986] swapper/0/1 is trying to lock: [1.498668] 8299ad90 (iattr_mutex){+.+.}-{4:4}, at: kernfs_xattr_get+0x25/0x60 [1.499855] other info that might help us debug this: [1.500655] context-{5:5} [1.501118] 3 locks held by swapper/0/1: [1.501732] #0: 82875460 (cpu_hotplug_lock){}-{0:0}, at: cpufreq_register_driver+0xb5/0x2f0 [1.503384] #1: 888003cfbd28 (subsys mutex#5){+.+.}-{4:4}, at: subsys_interface_register+0x5f/0x130 [1.505289] #2: 8880057f00b0 (>lock){.+.+}-{3:3}, at: zs_map_object+0x7b/0x2b0 [1.506743] stack backtrace: [1.507228] CPU: 0 PID: 1 Comm: swapper/0 Tainted: GW 5.10.0-rc3-00304-g7a4ad0dc5042 #75 [1.509060] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-3.fc33 04/01/2014 [1.510505] Call Trace: [1.510887] dump_stack+0x7d/0x9f [1.511456] __lock_acquire.cold+0xb9/0x2cd [1.512389] ? usage_match+0x20/0x20 [1.512960] ? __bfs+0xf3/0x220 [1.513442] lock_acquire+0x137/0x3e0 [1.514069] ? kernfs_xattr_get+0x25/0x60 [1.514724] ? ___might_sleep+0x15c/0x190 [1.515369] __mutex_lock+0x8e/0x940 [1.515866] ? kernfs_xattr_get+0x25/0x60 [1.516515] ? asm_sysvec_apic_timer_interrupt+0x12/0x20 [1.517343] ?
[PATCH 7/7] dyndbg: enable 'cache' of active pr_debug callsites
in ddebug_zpool_put() dont zs_unmap the callsite, if it is enabled for printing. This will eliminate possibly repeated un-maps then re-maps of enabled and invoked pr-debug callsites, and will promptly retire all other uses. But this causes kernel to BUG [1.364303] BUG: sleeping function called from invalid context at mm/slab.h:567 [jimc@frodo build-v2]$ krun -a main.dyndbg=+pmf -q=-s -q=-S ./.virtme_mods/lib/modules/0.0.0 /usr/bin/qemu-system-x86_64 -fsdev local,id=virtfs1,path=/,security_model=none,readonly,multidevs=remap -device virtio-9p-pci,fsdev=virtfs1,mount_tag=/dev/root -fsdev local,id=virtfs5,path=/usr/local/lib/python3.8/site-packages/virtme-0.1.1-py3.8.egg/virtme/guest,security_model=none,readonly,multidevs=remap -device virtio-9p-pci,fsdev=virtfs5,mount_tag=virtme.guesttools -machine accel=kvm:tcg -watchdog i6300esb -cpu host -parallel none -net none -echr 1 -serial none -chardev stdio,id=console,signal=off,mux=on -serial chardev:console -mon chardev=console -vga none -display none -kernel ./arch/x86/boot/bzImage -append 'virtme_link_mods=/home/jimc/projects/lx/linux.git/build-v2/.virtme_mods/lib/modules/0.0.0 earlyprintk=serial,ttyS0,115200 console=ttyS0 psmouse.proto=exps "virtme_stty_con=rows 25 cols 102 iutf8" TERM=xterm-256color rootfstype=9p rootflags=version=9p2000.L,trans=virtio,access=any raid=noautodetect ro nokaslr dynamic_debug.verbose=3 module.dyndbg=+pm main.dyndbg=+pmf init=/bin/sh -- -c "mount -t tmpfs run /run;mkdir -p /run/virtme/guesttools;/bin/mount -n -t 9p -o ro,version=9p2000.L,trans=virtio,access=any virtme.guesttools /run/virtme/guesttools;exec /run/virtme/guesttools/virtme-init"' -s -S Wrong EFI loader signature. early console in extract_kernel input_data: 0x033373a8 input_len: 0x00aba748 output: 0x0100 output_len: 0x025f9e28 kernel_total_size: 0x02e2c000 needed_size: 0x0300 trampoline_32bit: 0x0009d000 KASLR disabled: 'nokaslr' on cmdline. Decompressing Linux... Parsing ELF... No relocation needed... done. Booting the kernel. [0.00] Linux version 5.8.0-00025-g4e76f4427bf8 (jimc@frodo) (gcc (GCC) 10.2.1 20200723 (Red Hat 10.2.1-1), GNU ld version 2.34-4.fc32) #30 SMP Thu Aug 6 16:39:03 MDT 2020 [0.00] Command line: virtme_link_mods=/home/jimc/projects/lx/linux.git/build-v2/.virtme_mods/lib/modules/0.0.0 earlyprintk=serial,ttyS0,115200 console=ttyS0 psmouse.proto=exps "virtme_stty_con=rows 25 cols 102 iutf8" TERM=xterm-256color rootfstype=9p rootflags=version=9p2000.L,trans=virtio,access=any raid=noautodetect ro nokaslr dynamic_debug.verbose=3 module.dyndbg=+pm main.dyndbg=+pmf init=/bin/sh -- -c "mount -t tmpfs run /run;mkdir -p /run/virtme/guesttools;/bin/mount -n -t 9p -o ro,version=9p2000.L,trans=virtio,access=any virtme.guesttools /run/virtme/guesttools;exec /run/virtme/guesttools/virtme-init" [0.00] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers' [0.00] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers' [0.00] x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers' [0.00] x86/fpu: Supporting XSAVE feature 0x008: 'MPX bounds registers' [0.00] x86/fpu: Supporting XSAVE feature 0x010: 'MPX CSR' [0.00] x86/fpu: xstate_offset[2]: 576, xstate_sizes[2]: 256 [0.00] x86/fpu: xstate_offset[3]: 832, xstate_sizes[3]: 64 [0.00] x86/fpu: xstate_offset[4]: 896, xstate_sizes[4]: 64 [0.00] x86/fpu: Enabled xstate features 0x1f, context size is 960 bytes, using 'compacted' format. [0.00] BIOS-provided physical RAM map: [0.00] BIOS-e820: [mem 0x-0x0009fbff] usable [0.00] BIOS-e820: [mem 0x0009fc00-0x0009] reserved [0.00] BIOS-e820: [mem 0x000f-0x000f] reserved [0.00] BIOS-e820: [mem 0x0010-0x07fd] usable [0.00] BIOS-e820: [mem 0x07fe-0x07ff] reserved [0.00] BIOS-e820: [mem 0xfeffc000-0xfeff] reserved [0.00] BIOS-e820: [mem 0xfffc-0x] reserved [0.00] printk: bootconsole [earlyser0] enabled [0.00] NX (Execute Disable) protection: active [0.00] SMBIOS 2.8 present. [0.00] DMI: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-2.fc32 04/01/2014 [0.00] Hypervisor detected: KVM [0.00] kvm-clock: Using msrs 4b564d01 and 4b564d00 [0.00] kvm-clock: cpu 0, msr 3141001, primary cpu clock [0.00] kvm-clock: using sched offset of 218135524 cycles [0.000900] clocksource: kvm-clock: mask: 0x max_cycles: 0x1cd42e4dffb, max_idle_ns: 881590591483 ns [0.003580] tsc: Detected 2591.998 MHz processor [0.004611] last_pfn = 0x7fe0 max_arch_pfn = 0x4 [0.005518] x86/PAT: Configuration [0-7]: WB WC UC- UC WB WP UC- WT [0.013334] found SMP