Re: [PATCH 7/7] dyndbg: enable 'cache' of active pr_debug callsites

2020-11-25 Thread jim . cromie
On Wed, Nov 25, 2020 at 1:54 PM Jason Baron  wrote:
>
>
>
> On 11/25/20 2:36 PM, Jim Cromie wrote:
> > In ddebug_putsite(), dont zs_unmap the callsite if it is enabled for
> > printing.  This means that the next time this pr_debug callsite is
> > executed, the _getsite() will succeed quickly without remapping the
> > zrec.
> >
> > Once the callsite is disabled via >control, a following _putsite()
> > will see the flag cleared, and zs_unmap it.
> >
> > This changes the lifetime of our zs_mappings from brief (only for the
> > single printk) to as long as uptime (if a prdebug is enabled til
> > poweroff).  This appears to be triggering the ensuing mayhem.
> >
> > I am able to get through init, to root console, by disabling all
> > dynamic-debugs, including the ones that are enabled at compile.
> >
> > $ kruna --kopt \*.dyndbg=-p --kopt debug_locks_proceed=1
> >
> > but enabling any pr-debug crashes.
> > Plain old boot also panics, as pasted below.
> >
>
>
> Hi Jim,
>
> Looks like you've made a lot of progress with this series...but if I
> understand correctly there are still some unresolved issues (panic).
> So I think this series is more 'RFC' at this point?
>

yes, RFC.  WIP suggests I know whats needed next.

certainly Ive seen a variety of sleeping while atomic ... BUGs,



> I was also expecting to see updates to the actual printing functions
> to now use dp->site. But perhaps I missed those bits?

those bits were changed in patch 5
basically changes are ~ s/dp/dc/g, and in ddebug_getsite / _putsite

>
> Thanks,
>
> -Jason
>
> > $ kruna --kopt debug_locks_proceed=1
> >




 [1.915768]  ? rest_init+0x24d/0x24d
> > [1.916417]  kernel_init+0xaf/0x103
> > [1.917038]  ret_from_fork+0x22/0x30
> > [1.917726] Kernel Offset: disabled
> > [1.918324] ---[ end Kernel panic - not syncing: Requested init /bin/sh 
> > failed (error -14). ]---
> > QEMU 5.1.0 monitor - type 'help' for more information
> >
> > Signed-off-by: Jim Cromie 
> > ---
> >  lib/dynamic_debug.c | 4 +++-
> >  1 file changed, 3 insertions(+), 1 deletion(-)
> >
> > diff --git a/lib/dynamic_debug.c b/lib/dynamic_debug.c
> > index 534d73e45844..0c69aa52395d 100644
> > --- a/lib/dynamic_debug.c
> > +++ b/lib/dynamic_debug.c
> > @@ -165,7 +165,9 @@ static void ddebug_putsite(struct _ddebug *dp)
> >   /* need to keep site until zmap backing exists */
> >   return;
> >
> > - /* always unmap for now. if !pr-debug has locking issues */
> > + if (dp->flags & _DPRINTK_FLAGS_PRINT)
> > + return; /* keep maps of enabled pr_debugs */
> > +
> >   zs_unmap_object(dd_callsite_zpool, dp->zhandle);
> >   dp->site = NULL;
> >  }
> >


Re: [PATCH 7/7] dyndbg: enable 'cache' of active pr_debug callsites

2020-11-25 Thread Jason Baron



On 11/25/20 2:36 PM, Jim Cromie wrote:
> In ddebug_putsite(), dont zs_unmap the callsite if it is enabled for
> printing.  This means that the next time this pr_debug callsite is
> executed, the _getsite() will succeed quickly without remapping the
> zrec.
> 
> Once the callsite is disabled via >control, a following _putsite()
> will see the flag cleared, and zs_unmap it.
> 
> This changes the lifetime of our zs_mappings from brief (only for the
> single printk) to as long as uptime (if a prdebug is enabled til
> poweroff).  This appears to be triggering the ensuing mayhem.
> 
> I am able to get through init, to root console, by disabling all
> dynamic-debugs, including the ones that are enabled at compile.
> 
> $ kruna --kopt \*.dyndbg=-p --kopt debug_locks_proceed=1
> 
> but enabling any pr-debug crashes.
> Plain old boot also panics, as pasted below.
> 


Hi Jim,

Looks like you've made a lot of progress with this series...but if I
understand correctly there are still some unresolved issues (panic).
So I think this series is more 'RFC' at this point?

I was also expecting to see updates to the actual printing functions
to now use dp->site. But perhaps I missed those bits?

Thanks,

-Jason

> $ kruna --kopt debug_locks_proceed=1
> 
> [0.443926] 
> -
> [0.445085] BUG:  14 unexpected failures (out of 350) - debugging 
> proceeding anyway! |
> [0.446498] 
> -
> ...
> [0.617347] dyndbg: 224 modules, 2014 entries and 8960 bytes in ddebug 
> tables, 80560 bytes in __dyndbg section, 80560 bytes in __dyndbg_callsites 
> section
> [0.617537] dyndbg: 2014 entries. repeated entries: 1789 module 1667 file 
> 809 func
> ...
> [1.456548] zswap: loaded using pool zstd/zsmalloc
> [1.462802] dyndbg: total pages: 24 compaction: 0
> [1.463965] PM:   Magic number: 12:980:889
> [1.464860] BUG: sleeping function called from invalid context at 
> mm/slab.h:506
> [1.466150] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 1, 
> name: swapper/0
> [1.467282] 3 locks held by swapper/0/1:
> [1.467936]  #0: 82875460 (cpu_hotplug_lock){}-{0:0}, at: 
> cpufreq_register_driver+0xb5/0x2f0
> [1.469622]  #1: 888003cfbd28 (subsys mutex#5){+.+.}-{4:4}, at: 
> subsys_interface_register+0x5f/0x130
> [1.471448]  #2: 8880057f00b0 (>lock){.+.+}-{3:3}, at: 
> zs_map_object+0x7b/0x2b0
> [1.472666] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 
> 5.10.0-rc3-00304-g7a4ad0dc5042 #75
> [1.474007] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
> 1.13.0-3.fc33 04/01/2014
> [1.475314] Call Trace:
> [1.475682]  dump_stack+0x7d/0x9f
> [1.476162]  ___might_sleep.cold+0xa6/0xb7
> [1.476745]  __might_sleep+0x46/0x80
> [1.477266]  __kmalloc_track_caller+0x1dd/0x2d0
> [1.477895]  kstrdup_const+0x45/0x70
> [1.478678]  __kernfs_new_node+0x4b/0x280
> [1.479280]  ? dynamic_emit_prefix+0x1dd/0x1f0
> [1.479907]  ? __dynamic_pr_debug+0x8a/0xb0
> [1.480496]  kernfs_create_dir_ns+0x44/0xc0
> [1.481188]  sysfs_create_dir_ns+0x6c/0xd0
> [1.481820]  kobject_add_internal+0x102/0x300
> [1.482505]  kobject_init_and_add+0x71/0xa0
> [1.483194]  ? kmem_cache_alloc_trace+0x21d/0x270
> [1.483993]  ? cpufreq_online+0x62/0x960
> [1.484628]  cpufreq_online+0xa5/0x960
> [1.485325]  ? system_root_device_release+0x10/0x10
> [1.486223]  cpufreq_add_dev+0x79/0x90
> [1.486868]  subsys_interface_register+0x11e/0x130
> [1.487787]  cpufreq_register_driver+0x171/0x2f0
> [1.488679]  acpi_cpufreq_init+0x266/0x294
> [1.489508]  ? CPU_FREQ_GOV_ONDEMAND_init+0x12/0x12
> [1.490350]  do_one_initcall+0x5a/0x2c0
> [1.491037]  ? rcu_read_lock_sched_held+0x41/0x80
> [1.491820]  kernel_init_freeable+0x23b/0x287
> [1.492561]  ? rest_init+0x24d/0x24d
> [1.493180]  kernel_init+0x9/0x103
> [1.493825]  ret_from_fork+0x22/0x30
> [1.494557]
> [1.494804] =
> [1.495503] [ BUG: Invalid wait context ]
> [1.496134] 5.10.0-rc3-00304-g7a4ad0dc5042 #75 Tainted: GW
> [1.497346] -
> [1.497986] swapper/0/1 is trying to lock:
> [1.498668] 8299ad90 (iattr_mutex){+.+.}-{4:4}, at: 
> kernfs_xattr_get+0x25/0x60
> [1.499855] other info that might help us debug this:
> [1.500655] context-{5:5}
> [1.501118] 3 locks held by swapper/0/1:
> [1.501732]  #0: 82875460 (cpu_hotplug_lock){}-{0:0}, at: 
> cpufreq_register_driver+0xb5/0x2f0
> [1.503384]  #1: 888003cfbd28 (subsys mutex#5){+.+.}-{4:4}, at: 
> subsys_interface_register+0x5f/0x130
> [1.505289]  #2: 8880057f00b0 (>lock){.+.+}-{3:3}, at: 
> zs_map_object+0x7b/0x2b0
> [1.506743] stack backtrace:
> [1.507228] CPU: 0 PID: 1 Comm: swapper/0 Tainted: GW 
> 

[PATCH 7/7] dyndbg: enable 'cache' of active pr_debug callsites

2020-11-25 Thread Jim Cromie
In ddebug_putsite(), dont zs_unmap the callsite if it is enabled for
printing.  This means that the next time this pr_debug callsite is
executed, the _getsite() will succeed quickly without remapping the
zrec.

Once the callsite is disabled via >control, a following _putsite()
will see the flag cleared, and zs_unmap it.

This changes the lifetime of our zs_mappings from brief (only for the
single printk) to as long as uptime (if a prdebug is enabled til
poweroff).  This appears to be triggering the ensuing mayhem.

I am able to get through init, to root console, by disabling all
dynamic-debugs, including the ones that are enabled at compile.

$ kruna --kopt \*.dyndbg=-p --kopt debug_locks_proceed=1

but enabling any pr-debug crashes.
Plain old boot also panics, as pasted below.

$ kruna --kopt debug_locks_proceed=1

[0.443926] -
[0.445085] BUG:  14 unexpected failures (out of 350) - debugging proceeding 
anyway! |
[0.446498] -
...
[0.617347] dyndbg: 224 modules, 2014 entries and 8960 bytes in ddebug 
tables, 80560 bytes in __dyndbg section, 80560 bytes in __dyndbg_callsites 
section
[0.617537] dyndbg: 2014 entries. repeated entries: 1789 module 1667 file 
809 func
...
[1.456548] zswap: loaded using pool zstd/zsmalloc
[1.462802] dyndbg: total pages: 24 compaction: 0
[1.463965] PM:   Magic number: 12:980:889
[1.464860] BUG: sleeping function called from invalid context at 
mm/slab.h:506
[1.466150] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 1, name: 
swapper/0
[1.467282] 3 locks held by swapper/0/1:
[1.467936]  #0: 82875460 (cpu_hotplug_lock){}-{0:0}, at: 
cpufreq_register_driver+0xb5/0x2f0
[1.469622]  #1: 888003cfbd28 (subsys mutex#5){+.+.}-{4:4}, at: 
subsys_interface_register+0x5f/0x130
[1.471448]  #2: 8880057f00b0 (>lock){.+.+}-{3:3}, at: 
zs_map_object+0x7b/0x2b0
[1.472666] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 
5.10.0-rc3-00304-g7a4ad0dc5042 #75
[1.474007] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
1.13.0-3.fc33 04/01/2014
[1.475314] Call Trace:
[1.475682]  dump_stack+0x7d/0x9f
[1.476162]  ___might_sleep.cold+0xa6/0xb7
[1.476745]  __might_sleep+0x46/0x80
[1.477266]  __kmalloc_track_caller+0x1dd/0x2d0
[1.477895]  kstrdup_const+0x45/0x70
[1.478678]  __kernfs_new_node+0x4b/0x280
[1.479280]  ? dynamic_emit_prefix+0x1dd/0x1f0
[1.479907]  ? __dynamic_pr_debug+0x8a/0xb0
[1.480496]  kernfs_create_dir_ns+0x44/0xc0
[1.481188]  sysfs_create_dir_ns+0x6c/0xd0
[1.481820]  kobject_add_internal+0x102/0x300
[1.482505]  kobject_init_and_add+0x71/0xa0
[1.483194]  ? kmem_cache_alloc_trace+0x21d/0x270
[1.483993]  ? cpufreq_online+0x62/0x960
[1.484628]  cpufreq_online+0xa5/0x960
[1.485325]  ? system_root_device_release+0x10/0x10
[1.486223]  cpufreq_add_dev+0x79/0x90
[1.486868]  subsys_interface_register+0x11e/0x130
[1.487787]  cpufreq_register_driver+0x171/0x2f0
[1.488679]  acpi_cpufreq_init+0x266/0x294
[1.489508]  ? CPU_FREQ_GOV_ONDEMAND_init+0x12/0x12
[1.490350]  do_one_initcall+0x5a/0x2c0
[1.491037]  ? rcu_read_lock_sched_held+0x41/0x80
[1.491820]  kernel_init_freeable+0x23b/0x287
[1.492561]  ? rest_init+0x24d/0x24d
[1.493180]  kernel_init+0x9/0x103
[1.493825]  ret_from_fork+0x22/0x30
[1.494557]
[1.494804] =
[1.495503] [ BUG: Invalid wait context ]
[1.496134] 5.10.0-rc3-00304-g7a4ad0dc5042 #75 Tainted: GW
[1.497346] -
[1.497986] swapper/0/1 is trying to lock:
[1.498668] 8299ad90 (iattr_mutex){+.+.}-{4:4}, at: 
kernfs_xattr_get+0x25/0x60
[1.499855] other info that might help us debug this:
[1.500655] context-{5:5}
[1.501118] 3 locks held by swapper/0/1:
[1.501732]  #0: 82875460 (cpu_hotplug_lock){}-{0:0}, at: 
cpufreq_register_driver+0xb5/0x2f0
[1.503384]  #1: 888003cfbd28 (subsys mutex#5){+.+.}-{4:4}, at: 
subsys_interface_register+0x5f/0x130
[1.505289]  #2: 8880057f00b0 (>lock){.+.+}-{3:3}, at: 
zs_map_object+0x7b/0x2b0
[1.506743] stack backtrace:
[1.507228] CPU: 0 PID: 1 Comm: swapper/0 Tainted: GW 
5.10.0-rc3-00304-g7a4ad0dc5042 #75
[1.509060] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
1.13.0-3.fc33 04/01/2014
[1.510505] Call Trace:
[1.510887]  dump_stack+0x7d/0x9f
[1.511456]  __lock_acquire.cold+0xb9/0x2cd
[1.512389]  ? usage_match+0x20/0x20
[1.512960]  ? __bfs+0xf3/0x220
[1.513442]  lock_acquire+0x137/0x3e0
[1.514069]  ? kernfs_xattr_get+0x25/0x60
[1.514724]  ? ___might_sleep+0x15c/0x190
[1.515369]  __mutex_lock+0x8e/0x940
[1.515866]  ? kernfs_xattr_get+0x25/0x60
[1.516515]  ? asm_sysvec_apic_timer_interrupt+0x12/0x20
[1.517343]  ? 

[PATCH 7/7] dyndbg: enable 'cache' of active pr_debug callsites

2020-08-07 Thread Jim Cromie
in ddebug_zpool_put() dont zs_unmap the callsite, if it is enabled for
printing.  This will eliminate possibly repeated un-maps then re-maps
of enabled and invoked pr-debug callsites, and will promptly retire
all other uses.

But this causes kernel to BUG
[1.364303] BUG: sleeping function called from invalid context at 
mm/slab.h:567

[jimc@frodo build-v2]$ krun -a main.dyndbg=+pmf -q=-s -q=-S
./.virtme_mods/lib/modules/0.0.0
/usr/bin/qemu-system-x86_64 -fsdev 
local,id=virtfs1,path=/,security_model=none,readonly,multidevs=remap -device 
virtio-9p-pci,fsdev=virtfs1,mount_tag=/dev/root -fsdev 
local,id=virtfs5,path=/usr/local/lib/python3.8/site-packages/virtme-0.1.1-py3.8.egg/virtme/guest,security_model=none,readonly,multidevs=remap
 -device virtio-9p-pci,fsdev=virtfs5,mount_tag=virtme.guesttools -machine 
accel=kvm:tcg -watchdog i6300esb -cpu host -parallel none -net none -echr 1 
-serial none -chardev stdio,id=console,signal=off,mux=on -serial 
chardev:console -mon chardev=console -vga none -display none -kernel 
./arch/x86/boot/bzImage -append 
'virtme_link_mods=/home/jimc/projects/lx/linux.git/build-v2/.virtme_mods/lib/modules/0.0.0
 earlyprintk=serial,ttyS0,115200 console=ttyS0 psmouse.proto=exps 
"virtme_stty_con=rows 25 cols 102 iutf8" TERM=xterm-256color rootfstype=9p 
rootflags=version=9p2000.L,trans=virtio,access=any raid=noautodetect ro nokaslr 
dynamic_debug.verbose=3 module.dyndbg=+pm main.dyndbg=+pmf init=/bin/sh -- -c 
"mount -t tmpfs run /run;mkdir -p /run/virtme/guesttools;/bin/mount -n -t 9p -o 
ro,version=9p2000.L,trans=virtio,access=any virtme.guesttools 
/run/virtme/guesttools;exec /run/virtme/guesttools/virtme-init"' -s -S
Wrong EFI loader signature.
early console in extract_kernel
input_data: 0x033373a8
input_len: 0x00aba748
output: 0x0100
output_len: 0x025f9e28
kernel_total_size: 0x02e2c000
needed_size: 0x0300
trampoline_32bit: 0x0009d000

KASLR disabled: 'nokaslr' on cmdline.

Decompressing Linux... Parsing ELF... No relocation needed... done.
Booting the kernel.
[0.00] Linux version 5.8.0-00025-g4e76f4427bf8 (jimc@frodo) (gcc (GCC) 
10.2.1 20200723 (Red Hat 10.2.1-1), GNU ld version 2.34-4.fc32) #30 SMP Thu Aug 
6 16:39:03 MDT 2020
[0.00] Command line: 
virtme_link_mods=/home/jimc/projects/lx/linux.git/build-v2/.virtme_mods/lib/modules/0.0.0
 earlyprintk=serial,ttyS0,115200 console=ttyS0 psmouse.proto=exps 
"virtme_stty_con=rows 25 cols 102 iutf8" TERM=xterm-256color rootfstype=9p 
rootflags=version=9p2000.L,trans=virtio,access=any raid=noautodetect ro nokaslr 
dynamic_debug.verbose=3 module.dyndbg=+pm main.dyndbg=+pmf init=/bin/sh -- -c 
"mount -t tmpfs run /run;mkdir -p /run/virtme/guesttools;/bin/mount -n -t 9p -o 
ro,version=9p2000.L,trans=virtio,access=any virtme.guesttools 
/run/virtme/guesttools;exec /run/virtme/guesttools/virtme-init"
[0.00] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point 
registers'
[0.00] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
[0.00] x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
[0.00] x86/fpu: Supporting XSAVE feature 0x008: 'MPX bounds registers'
[0.00] x86/fpu: Supporting XSAVE feature 0x010: 'MPX CSR'
[0.00] x86/fpu: xstate_offset[2]:  576, xstate_sizes[2]:  256
[0.00] x86/fpu: xstate_offset[3]:  832, xstate_sizes[3]:   64
[0.00] x86/fpu: xstate_offset[4]:  896, xstate_sizes[4]:   64
[0.00] x86/fpu: Enabled xstate features 0x1f, context size is 960 
bytes, using 'compacted' format.
[0.00] BIOS-provided physical RAM map:
[0.00] BIOS-e820: [mem 0x-0x0009fbff] usable
[0.00] BIOS-e820: [mem 0x0009fc00-0x0009] reserved
[0.00] BIOS-e820: [mem 0x000f-0x000f] reserved
[0.00] BIOS-e820: [mem 0x0010-0x07fd] usable
[0.00] BIOS-e820: [mem 0x07fe-0x07ff] reserved
[0.00] BIOS-e820: [mem 0xfeffc000-0xfeff] reserved
[0.00] BIOS-e820: [mem 0xfffc-0x] reserved
[0.00] printk: bootconsole [earlyser0] enabled
[0.00] NX (Execute Disable) protection: active
[0.00] SMBIOS 2.8 present.
[0.00] DMI: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-2.fc32 
04/01/2014
[0.00] Hypervisor detected: KVM
[0.00] kvm-clock: Using msrs 4b564d01 and 4b564d00
[0.00] kvm-clock: cpu 0, msr 3141001, primary cpu clock
[0.00] kvm-clock: using sched offset of 218135524 cycles
[0.000900] clocksource: kvm-clock: mask: 0x max_cycles: 
0x1cd42e4dffb, max_idle_ns: 881590591483 ns
[0.003580] tsc: Detected 2591.998 MHz processor
[0.004611] last_pfn = 0x7fe0 max_arch_pfn = 0x4
[0.005518] x86/PAT: Configuration [0-7]: WB  WC  UC- UC  WB  WP  UC- WT
[0.013334] found SMP