[Bug debug/99178] Emit .debug_names

2024-01-10 Thread tromey at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99178

--- Comment #7 from Tom Tromey  ---
(In reply to David Blaikie from comment #6)

> Ideally that'd be detected by looking at the abbreviation table, rather than
> the augmentation string - if parent info is necessary for a usage of the
> table, that'd be a generic way to check for it & ensure the unusable indexes
> are ignored while not ignoring usable ones.

Good idea.

> Ah, but yeah, if you need extensions, then positive augmentation string
> checking seems likely necessary.

It's possible we could accept indexes without the extensions and sort of
limp along or have bugs.  Not sure offhand.

> (though this starts to feel like websites checking browser versions,
> unfortunately :/ )

https://github.com/rust-lang/rust/issues/41252#issuecomment-293676579

> Any sense of how bad the performance is if names without template parameters
> (strawman: this could be communicated via another flag on the entry in the
> index) did require DWARF parsing to check template parameters?

It can be bad -- gdb actually has a related bug right now:

https://sourceware.org/bugzilla/show_bug.cgi?id=30520

(There's a few possible dups of this.)

Maybe it would be possible to do some kind of 2-phase expansion.
But we already have 2 DWARF scanners and this would be adding a 3rd...

> Hmm, I missed a step here - perhaps you can help me understand. Maybe,
> ultimately, I agree with you here - I've pushed back on the lldb folks
> relying on character identical name lookup in the index due to the problems
> you've described (there's no real canonical demangling) - but where does
> DWARF say that writers should "use the system demangling style"?

https://wiki.dwarfstd.org/Best_Practices.md

See also bug#94845.

> Oh, that makes loads of sense - yeah, beats me how lldb deals with tab
> completion using the hash table... maybe it builds some other side table or
> something. That's something I've wondered about for a while and it's good to
> know how GDB deals with this, and why its index looks different.

FWIW you can't really go by what gdb does today: its .debug_names it just
super wrong, and .gdb_index is a hash table but explicitly relies on the
name canonicalization that gdb does.  Which itself is unsafe (like what if
that changes between versions) but we didn't really think through all the
problems back then.

> Does that mean you want the entries in the table to be sorted? Do you emit
> them that way and then, based on augmentation string, rely on them being
> sorted? Or do a quick scan at startup and build a sorted list in memory
> regardless of the order in .debug_names? (.debug_names entry list isn't
> suited to random access, is it? The records aren't all the same size so I
> don't think you could binary search through them)

The new reader just reads the entries and creates the same internal data
structures that the new DWARF scanner creates.  It handles sorting,
canonicalization, etc, during the scan.  This work is done in a worker
thread to hide it from the user (although I think it's reasonably quick
anyway).

> > The only other thing I can think of offhand is that the reliance on
> > .debug_str means that gdb may have to augment the string table when
> > DW_FORM_string is in use.  This is also caused by the "(anonymous 
> > namespace)"
> > special case.
> 
> And Split DWARF, I guess? The strings wouldn't otherwise be in the
> executables .debug_str if not for the index - they'd only appear in the
> dwo/dwp .debug_str.dwo sections.

Yeah, I haven't really looked at this too much.

[Bug debug/99178] Emit .debug_names

2024-01-10 Thread dblaikie at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99178

--- Comment #6 from David Blaikie  ---
(In reply to Tom Tromey from comment #5)
> (In reply to David Blaikie from comment #4)
> 
> I don't remember filing this bug.  At the time maybe I thought it
> would be worthwhile to have "end to end" .debug_names generation,
> that is, to try to have the index and also not slow down
> compilation or link times too much.  Not sure how I feel about it now.

Certainly what's been possible with .debug_gnu_pubnames/types + -Wl,--gdb-index
today. It'd be nice to have that same workflow, but in a more portable form.

> > It'd be great to get GCC/GDB folks take on the name tables - get some
> > practical experience with their contents, file any bugs about missing
> > elements, etc. It's possible they're leaning too heavily towards lldb's
> > style of name lookup since they derived from an existing apple
> > implementation there & it'd be good to generalize them where needed.
> 
> gdb has long done the wrong thing with .debug_names.
> https://sourceware.org/bugzilla/show_bug.cgi?id=24820

Ah, thanks for the link - I followed up there with some context/thoughts.

> I don't really know how/why that happened.  However, I wrote patches to
> fix it:
> 
> https://sourceware.org/pipermail/gdb-patches/2023-December/204949.html
> 
> This version of gdb will look at the augmentation string and only
> allow certain indexes to be used.  This is done to avoid known bugs --
> mainly coming from the "old" (current) gdb, but also clang has some
> issues (from memory, it doesn't include parent info).

Ideally that'd be detected by looking at the abbreviation table, rather than
the augmentation string - if parent info is necessary for a usage of the table,
that'd be a generic way to check for it & ensure the unusable indexes are
ignored while not ignoring usable ones.

> Also, gdb relies on its extensions (see below).

Ah, but yeah, if you need extensions, then positive augmentation string
checking seems likely necessary.

(though this starts to feel like websites checking browser versions,
unfortunately :/ )

> When writing the new scanner, I found a few bugs in DWARF related to
> which names appear in the index.  I don't recall offhand what these are,
> and I didn't file them due to the late unpleasantness (sorry).  

No worries - and totally understandable. If they happen to come to mind at any
point, I'd love to hear about them.

> They could probably be dug up by reading the scanner and comparing to the 
> spec.
> 
> gdb will also emit some extensions.  You can see these in the docs patch:
> 
> https://sourceware.org/pipermail/gdb-patches/2023-December/204947.html

Awesome - appreciate the documentation!

> Generally the goal of these is to avoid having to do any DIE reading
> in order to reject a lookup.  Note that this means gdb relies on
> template parameters being in the name -- something we discussed in
> gdb bugzilla a bit.

Yeah, I'd love to figure out how to deal with that better, but don't have
immediate suggestions.

Any sense of how bad the performance is if names without template parameters
(strawman: this could be communicated via another flag on the entry in the
index) did require DWARF parsing to check template parameters? Is that
something that'd be an option? (especially with a flag in the entry, then it'd
only be a runtime cost to those using this naming mechanism - as much as I'd
like to move to that mechanism being normal/the default, perhaps this would be
a safe transition path)

But I guess Google's probably the only one super interested in the simplified
template names at the moment (& we're mostly investing in lldb these days), so
might be unlikely anyone would be signing up to do that work in gdb.

> With these patches, gdb will not generate or use the hash table.
> This is explained here:
> 
> https://sourceware.org/pipermail/gdb-patches/2023-December/204963.html

Oh, that's got some good details/answers some of my questions - thanks!

> I consider this hash table to be essentially useless in general, due to the 
> name canonicalization problem -- while DWARF says that writers should use the 
> system demangling style, (1) this style varies across systems, so it can't 
> truly be relied on; and (2) at least GCC and one other compiler don't 
> actually follow this part of the spec anyway.

Hmm, I missed a step here - perhaps you can help me understand. Maybe,
ultimately, I agree with you here - I've pushed back on the lldb folks relying
on character identical name lookup in the index due to the problems you've
described (there's no real canonical demangling) - but where does DWARF say
that writers should "use the system demangling style"?

> It's important to note, though, that even if the hash was somehow useful, GDB 
> probably still would not use it -- a sorted list of names is needed for 
> completion and performs reasonably well for other lookups, so a hash table is 
> just overhead, IMO.

Oh, that makes loa

[Bug debug/99178] Emit .debug_names

2024-01-09 Thread tromey at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99178

--- Comment #5 from Tom Tromey  ---
(In reply to David Blaikie from comment #4)

I don't remember filing this bug.  At the time maybe I thought it
would be worthwhile to have "end to end" .debug_names generation,
that is, to try to have the index and also not slow down
compilation or link times too much.  Not sure how I feel about it now.

> It'd be great to get GCC/GDB folks take on the name tables - get some
> practical experience with their contents, file any bugs about missing
> elements, etc. It's possible they're leaning too heavily towards lldb's
> style of name lookup since they derived from an existing apple
> implementation there & it'd be good to generalize them where needed.

gdb has long done the wrong thing with .debug_names.
https://sourceware.org/bugzilla/show_bug.cgi?id=24820

I don't really know how/why that happened.  However, I wrote patches to
fix it:

https://sourceware.org/pipermail/gdb-patches/2023-December/204949.html

This version of gdb will look at the augmentation string and only
allow certain indexes to be used.  This is done to avoid known bugs --
mainly coming from the "old" (current) gdb, but also clang has some
issues (from memory, it doesn't include parent info).  Also, gdb relies
on its extensions (see below).

When writing the new scanner, I found a few bugs in DWARF related to
which names appear in the index.  I don't recall offhand what these are,
and I didn't file them due to the late unpleasantness (sorry).  They
could probably be dug up by reading the scanner and comparing to the spec.

gdb will also emit some extensions.  You can see these in the docs patch:

https://sourceware.org/pipermail/gdb-patches/2023-December/204947.html

Generally the goal of these is to avoid having to do any DIE reading
in order to reject a lookup.  Note that this means gdb relies on
template parameters being in the name -- something we discussed in
gdb bugzilla a bit.

With these patches, gdb will not generate or use the hash table.
This is explained here:

https://sourceware.org/pipermail/gdb-patches/2023-December/204963.html

The only other thing I can think of offhand is that the reliance on
.debug_str means that gdb may have to augment the string table when
DW_FORM_string is in use.  This is also caused by the "(anonymous namespace)"
special case.

[Bug debug/99178] Emit .debug_names

2024-01-09 Thread dblaikie at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99178

David Blaikie  changed:

   What|Removed |Added

 CC||dblaikie at gmail dot com

--- Comment #4 from David Blaikie  ---
(In reply to Mark Wielaard from comment #3)
> So if the compiler would emit the .debug_name index would that make any
> link/post-processing steps easier or more efficient?

Right - that's the intent. You can omit the hash table part of .debug_names -
in which case it's just like a newer pubnames/pubtypes - maybe with the
opportunity to have more guaranteed contents (the lack of those guarantees I
think is why debug_gnu_pubnames/types came to be, yeah?).

At least on the lld side, we're working on adding the requisite merging - like
`-Wl,--gdb-index`, except instead of merging debug_gnu_pubnames/pubtypes ->
gdb_index, it merges debug_names -> debug_names.

This is relevant/important/necessary for Split DWARF in particular, where the
linker wouldn't have access to the DWARF to index it anyway (& you don't always
want to run the dwp tool, which would have access to all the DWARF to index it
- but it'd be nice to avoid that in iterative developer scenarios, and save it
only for archival situations) - and even if you do have all the DWARF, it's
certainly faster to merge some tables than to reparse all the DWARF from
scratch.

It'd be great to get GCC/GDB folks take on the name tables - get some practical
experience with their contents, file any bugs about missing elements, etc. It's
possible they're leaning too heavily towards lldb's style of name lookup since
they derived from an existing apple implementation there & it'd be good to
generalize them where needed.

[Bug debug/99178] Emit .debug_names

2021-02-22 Thread mark at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99178

--- Comment #3 from Mark Wielaard  ---
So if the compiler would emit the .debug_name index would that make any
link/post-processing steps easier or more efficient?

[Bug debug/99178] Emit .debug_names

2021-02-22 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99178

Jakub Jelinek  changed:

   What|Removed |Added

 CC||jakub at gcc dot gnu.org,
   ||mark at gcc dot gnu.org,
   ||vries at gcc dot gnu.org

--- Comment #2 from Jakub Jelinek  ---
I'm not sure if it is a good idea.
The indexes are useful when they cover the whole binary or library, when each
TU has its own index, then I don't see how it is helpful for the consumer to
search each TU's index compared to walking the DIEs.
I think .debug_names should be created during linking or post-linking, so ld,
dwz or gdb.

[Bug debug/99178] Emit .debug_names

2021-02-22 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99178

Richard Biener  changed:

   What|Removed |Added

 Ever confirmed|0   |1
 Status|UNCONFIRMED |NEW
   Last reconfirmed||2021-02-22
   Severity|normal  |enhancement
Version|unknown |11.0

--- Comment #1 from Richard Biener  ---
Confirmed.