Re: [RFC PATCH v2 00/19] dynamic debug diet plan

2021-01-06 Thread jim . cromie
On Tue, Dec 29, 2020 at 11:56 AM Joe Perches  wrote:
>
> On Fri, 2020-12-25 at 13:19 -0700, Jim Cromie wrote:
> > Well, we're mostly overeating, but we can all look forward to a diet
> > in January.  And more exersize.
> >
> > dyndbg's compiled-in data-table currently uses 56 bytes per prdebug;
> > this includes 3 pointers to hierarchical "decorator" data, which is
> > primarily for adding "module:function:line:" prefixes to prdebug
> > messages, and for enabling and modifying those prdebugs selectively.
> >
> > This patchset decouples "decorator" data, and makes it optional, and
> > disposable.  By separating that data, it opens up possiblities to
> > compress it, swap it out, map it selectively, etc.
>
> While this may be somewhat useful, what debugging does it really help?
> Are there really memory limited platforms that enable dynamic debug?
>
>

hi Joe, happy new year!

Who wants to drop 5 lbs of weight for free?
You don't even have to put down the turkey leg.

Seriously, I can't point to any particular use case that suddenly becomes
possible, and there are no powerful new debugging features here either.
But the patch
  dynamic_debug: add an option to enable dynamic debug for modules only
recently reduced dyndbg's system footprint, surely to open up new use
cases and users.  This series is an orthogonal (and more involved)
approach to dropping more weight, and improving the coefficients in a
user's cost-benefit equation.


I tried out DRM as a user
https://lore.kernel.org/lkml/20201204035318.332419-1-jim.cro...@gmail.com/

It works, but I got the impression Ville is inclined to use static-keys
directly to replace drm_debug_enabled(), avoiding dyndbg overheads.

The possible in-memory savings here are asymptotically 24/64 (or 24/56,
maybe) of the footprint, which is easy to get if a subsystem doesn't
need the decorators/selectors; DRM has that option.

Possible savings in dyndbg aside, a static-key takes 16 bytes.
I think I can get struct _ddebug down to 32 bytes (RFC on 18,19 particularly),
so I'm still playing catch-up wrt what a minimal static-keys drm update could do.
There's also a vector-of-jump-labels form of static-keys
that Ville may be able to exploit too.
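
For concreteness, the byte counts above can be reproduced with a small
userspace sketch.  The assumptions are mine, not lifted from the patches:
an LP64 target, a 16-byte stand-in for the static-key, and a guess at
which fields move into the callsite struct, chosen so the sizes come out
at the 56, 40 and 24 quoted in this thread.  All demo_ names are
hypothetical.

```c
#include <stddef.h>

/* Stand-in for the kernel's static key: 16 bytes on LP64 (assumption). */
struct demo_static_key {
	int enabled;
	void *entries;
};

/* Rough shape of the unsplit record: 4 pointers + bitfields + key. */
struct demo_ddebug_old {
	const char *modname;
	const char *function;
	const char *filename;
	const char *format;
	unsigned int lineno:18;
	unsigned int flags:8;
	struct demo_static_key key;
};

/* The 3 "decorator" pointers, split out so they can be optional. */
struct demo_callsite {
	const char *modname;
	const char *function;
	const char *filename;
};

/* What remains per prdebug, plus a pointer to its decorations. */
struct demo_ddebug_new {
	struct demo_callsite *site;
	const char *format;
	unsigned int lineno:18;
	unsigned int flags:8;
	struct demo_static_key key;
};

/* Table bytes for n prdebugs, with or without the decorator data. */
static size_t demo_footprint(size_t n, int keep_sites)
{
	size_t per = sizeof(struct demo_ddebug_new);

	if (keep_sites)
		per += sizeof(struct demo_callsite);
	return n * per;
}
```

On such a build the three structs are 56, 24 and 40 bytes, so
demo_footprint(1000, 1) is 64000 and demo_footprint(1000, 0) is 40000,
which is where the 24/64 comes from.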

IOW, drm is not my ace card, but memory savings are still nice.

Where I'd like comments (the RFC part):

(patch-19) DEFINE_DYNAMIC_DEBUG_TABLE(i915) worked.
It adds a pair of header records into the 2 ELF sections. It will let me drop
the site pointer currently needed to get each site's decorations, when needed.

But I had to code it in manually, as a test. It's not a general solution.

I'd like to figure out how to have it defined in module scope automatically,
and weakly, and maybe-unused, so that if the module does not have any
pr_debugs, the header record is silently excluded, and that module's
sections are left empty.

When the header is linked in, as with my hacked i915.ko,
it becomes possible to finally lose the _ddebug.site pointer.
.module_index and container_of can replace it:
that gets us from struct _ddebug *p back up to the header,
then we could follow header.sites_vector[.module_index]
to the right decorators/selectors.
It's a modest cost increase for a rarely used path,
to shave 8/40 off our minimum footprint.
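
A minimal userspace sketch of that lookup, to show the pointer arithmetic
only.  The struct layouts, the demo_ names, and the idea that the header
sits directly in front of its records are assumptions for illustration;
the real section layout in patch 19 may differ.

```c
#include <stddef.h>

/* Userspace equivalent of the kernel's container_of(). */
#define demo_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct demo_site {
	const char *modname;
	const char *function;
};

struct demo_ddebug {
	const char *format;
	unsigned int module_index;	/* position within this module's records */
};

/* Hypothetical per-module header, directly followed by its records. */
struct demo_table {
	struct demo_site *sites_vector;
	struct demo_ddebug records[3];
};

static struct demo_site demo_sites[3] = {
	{ "i915", "probe" },
	{ "i915", "suspend" },
	{ "i915", "resume" },
};

static struct demo_table demo_i915 = {
	.sites_vector = demo_sites,
	.records = {
		{ "probing\n", 0 },
		{ "suspending\n", 1 },
		{ "resuming\n", 2 },
	},
};

/* Find a record's decorations without a per-record site pointer. */
static struct demo_site *demo_site_of(struct demo_ddebug *p)
{
	/* Step back to the module's first record ... */
	struct demo_ddebug *first = p - p->module_index;
	/* ... then up to the header that contains it. */
	struct demo_table *hdr =
		demo_container_of(first, struct demo_table, records);

	return &hdr->sites_vector[p->module_index];
}
```

Here demo_site_of(&demo_i915.records[2]) returns &demo_sites[2]: one
subtraction and one extra load on the slow path, in exchange for 8 bytes
per record.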

Then the total footprint reduces back to 56 bytes/callsite,
but now with 24 of them optional, and manageable.
module_index would be a fine lookup key into a compressed RO table of callsites,
and a good-enough key to a hashtable of active/enabled pr_debug callsites.

I played with zs-pool to store callsite data. Though it had problems,
I did see 3/1 pages/zs-page, which is a decent (slightly pessimistic)
proxy for what could be had with another (block) compression choice.

Once compressed callsites work, we can drop and recycle the
__dyndbg_callsites section.


Other pertinent points:

The relative ordering of the 2 sections may be a consequence of:
- natural ordering of compilation & lexical placement of the paired declarations
- OR the site pointer, and its initialization between the 2 records.
I suspect the former.

If it's the latter, dropping site may lose the constraint between the 2 sections.
I haven't yet tried the drop to see what happens; once site is gone,
I cannot use the current BUG_ON (site_iter != iter->site) construct.

I tried invoking TABLE from METADATA,
hoping that __weak and __maybe_unused would allow redundant definitions.
It errored, something about "local" and "section" mumble.
I now believe that the initialization in TABLE is part of the problem.


I tried:
$ objcopy --dump-section __dyndbg_callsites=dd_callsites \
    --dump-section __dyndbg=dd vmlinux.o

I got mostly null data, as if some final linking wasn't yet done.

[jimc@frodo local-i915m]$ od -c dd_callsites
000  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
*
0205620

Trying it on vmlinux doesn't work:
objcopy: vmlinux: can't dump section '__dyndbg' - it does not exist: file format not recognized
objcopy: vmlinux: can't dump section '__dyndbg_callsites' - it does not exist: file format not recognized

[jimc@frodo local-i915m]$ ll dd*
-r

Re: [RFC PATCH v2 00/19] dynamic debug diet plan

2020-12-29 Thread Joe Perches
On Fri, 2020-12-25 at 13:19 -0700, Jim Cromie wrote:
> Well, we're mostly overeating, but we can all look forward to a diet
> in January.  And more exersize.
> 
> dyndbg's compiled-in data-table currently uses 56 bytes per prdebug;
> this includes 3 pointers to hierarchical "decorator" data, which is
> primarily for adding "module:function:line:" prefixes to prdebug
> messages, and for enabling and modifying those prdebugs selectively.
> 
> This patchset decouples "decorator" data, and makes it optional, and
> disposable.  By separating that data, it opens up possiblities to
> compress it, swap it out, map it selectively, etc.

While this may be somewhat useful, what debugging does it really help?
Are there really memory limited platforms that enable dynamic debug?





[RFC PATCH v2 00/19] dynamic debug diet plan

2020-12-25 Thread Jim Cromie
Well, we're mostly overeating, but we can all look forward to a diet
in January.  And more exercise.

dyndbg's compiled-in data-table currently uses 56 bytes per prdebug;
this includes 3 pointers to hierarchical "decorator" data, which is
primarily for adding "module:function:line:" prefixes to prdebug
messages, and for enabling and modifying those prdebugs selectively.

This patchset decouples "decorator" data, and makes it optional, and
disposable.  By separating that data, it opens up possibilities to
compress it, swap it out, map it selectively, etc.


In more detail, the patchset does:

1- split struct _ddebug in 2, move "decorator" fields to _ddebug_callsites.

while this adds a pointer per site to memory footprint, it allows:

a- dropping decorations and storage, for some use cases.
   this could include DRM:
   - want to save calls to drm_debug_enabled()
   - use distinct categories, can map to format-prefixes, ex: "drm:kms:"
   - don't need "module:function:line" dynamic prefixes.
   - don't mind loss of info/context in /proc/dynamic_debug/control

b- ddebug_callsites[] contents are hierarchical, compressible.
c- ddebug_callsites[] in separate section is compressible as a block.
d- for just enabled prdebugs, could allocate callsites and fill from zblock.

2- make ddebug_callsites optional internally.
   This lets us drop them outright, for any reason, perhaps memory pressure.

3- allow dropping callsites by those users.
   echo module drm +D > /proc/dynamic_debug/control
   this doesn't currently recover __dyndbg_callsites storage

4- drop _ddebug.site, convert to _ddebug[N].property lookup.
   RFC is mostly here.
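
As a rough illustration of item 2 (a sketch, not the code from the
patches): once the callsite pointer is allowed to be NULL, prefix
generation has to skip the decorations it can no longer reach.  The
DEMO_ flag bits and demo_ names below are hypothetical stand-ins, and
keeping the line number in the main record is an assumption.

```c
#include <stdio.h>

/* Hypothetical flag bits, standing in for dyndbg's prefix flags. */
#define DEMO_INCL_MODNAME	(1u << 0)
#define DEMO_INCL_FUNCNAME	(1u << 1)
#define DEMO_INCL_LINENO	(1u << 2)

struct demo_callsite {
	const char *modname;
	const char *function;
};

struct demo_ddebug {
	struct demo_callsite *site;	/* NULL once decorations are dropped */
	unsigned int lineno;
	unsigned int flags;
};

/* Advance pos by an snprintf() result, clamped to stay inside the buffer. */
static size_t demo_advance(size_t pos, int n, size_t len)
{
	if (n < 0)
		return pos;
	pos += (size_t)n;
	return pos < len ? pos : len - 1;
}

/* Build the prefix into buf, tolerating a NULL site; returns its length. */
static size_t demo_emit_prefix(const struct demo_ddebug *dd,
			       char *buf, size_t len)
{
	size_t pos = 0;

	if (len == 0)
		return 0;
	buf[0] = '\0';

	if (dd->site && (dd->flags & DEMO_INCL_MODNAME))
		pos = demo_advance(pos, snprintf(buf + pos, len - pos, "%s:",
						 dd->site->modname), len);
	if (dd->site && (dd->flags & DEMO_INCL_FUNCNAME))
		pos = demo_advance(pos, snprintf(buf + pos, len - pos, "%s:",
						 dd->site->function), len);
	if (dd->flags & DEMO_INCL_LINENO)
		pos = demo_advance(pos, snprintf(buf + pos, len - pos, "%u:",
						 dd->lineno), len);
	if (pos > 0)
		pos = demo_advance(pos, snprintf(buf + pos, len - pos, " "),
				   len);
	return pos;
}

static struct demo_callsite demo_site = { "i915", "probe" };
static struct demo_ddebug demo_full = {
	&demo_site, 42,
	DEMO_INCL_MODNAME | DEMO_INCL_FUNCNAME | DEMO_INCL_LINENO
};
static struct demo_ddebug demo_bare = {
	NULL, 42,
	DEMO_INCL_MODNAME | DEMO_INCL_FUNCNAME | DEMO_INCL_LINENO
};
static char demo_buf[64];
```

With all three flags set, demo_full yields the prefix "i915:probe:42: ",
while demo_bare, whose site has been dropped, degrades to "42: " instead
of dereferencing NULL.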

rev1: 
https://lore.kernel.org/lkml/20201125194855.2267337-1-jim.cro...@gmail.com/

rev2 differs by dropping zram attempt, making callsite data optional, etc.


Jim Cromie (19): against v5.10

  dyndbg: fix use before null check
1 dyndbg: split struct _ddebug, move display fields to new
  _ddebug_callsite

2 dyndbg: refactor part of ddebug_change to ddebug_match_site
  dyndbg: accept null site in ddebug_match_site
  dyndbg: hoist ->site out of ddebug_match_site
  dyndbg: accept null site in ddebug_change
  dyndbg: accept null site in dynamic_emit_prefix
  dyndbg: accept null site in ddebug_proc_show
  
  dyndbg: optimize ddebug_emit_prefix
  dyndbg: avoid calling dyndbg_emit_prefix when it has no work
  
3 dyndbg: refactor ddebug_alter_site out of ddebug_change
  dyndbg: allow deleting site info via control interface
  
4 dyndbg: verify __dyndbg & __dyndbg_callsite invariant
  dyndbg+module: expose dyndbg_callsites to modules
  dyndbg: add ddebug_site_get/put api with pass-thru impl
  dyndbg: ddebug_site_get/put api commentary
  dyndbg: rearrange struct ddebug_callsites
  dyndbg: add module_index to struct _ddebug
  dyndbg: try DEFINE_DYNAMIC_DEBUG_TABLE

 drivers/gpu/drm/i915/i915_drv.c   |   3 +
 include/asm-generic/vmlinux.lds.h |   4 +
 include/linux/dynamic_debug.h |  97 ---
 kernel/module-internal.h  |   1 +
 kernel/module.c   |   9 +-
 lib/dynamic_debug.c   | 271 +-
 6 files changed, 283 insertions(+), 102 deletions(-)

-- 
2.29.2