Apologies for the delay, the email fell through the cracks somehow.

The updated patch looks like it would work alright, only needs a couple tests, 
e.g.:
https://github.com/rust-lang/rustc-demangle/blob/2811a1ad6f7c8bead2ef3671e4fdc10de1553e96/src/lib.rs#L413-L422
https://github.com/rust-lang/rustc-demangle/blob/2811a1ad6f7c8bead2ef3671e4fdc10de1553e96/src/v0.rs#L1442-L1444

Thanks,
- Eddy B.

On Tue, Dec 7, 2021, at 21:16, Mark Wielaard wrote:
> Hi Eddy,
>
> On Fri, 2021-12-03 at 01:14 +0200, Eduard-Mihai Burtescu wrote:
>> On Fri, Dec 3, 2021, at 00:07, Mark Wielaard wrote:
>> > On Thu, Dec 02, 2021 at 07:35:17PM +0200, Eduard-Mihai Burtescu
>> > wrote:
>> > > That also means that for consistency, suffixes like these should
>> > > be
>> > > handled uniformly for both v0 and legacy (as rustc-demangle
>> > > does),
>> > > since LLVM doesn't distinguish.
>> > 
>> > The problem with the legacy mangling is that dot '.' is a valid
>> > character. That is why the patch only handles the v0 mangling case
>> > (where dot '.' isn't valid).
>> 
>> Thought so, that's an annoying complication - however, see later down
>> why that's still not a blocker to the way rustc-demangle handles it.
>> 
>> > > You may even be able to get Clang to generate C++ mangled symbols
>> > > with ".llvm." suffixes, with enough application of LTO.  This is
>> > > not
>> > > unlike GCC ".clone" suffixes, AFAIK.  Sadly I don't think there's
>> > > a
>> > > way to handle both as "outside the symbol", without hardcoding
>> > > ".llvm." in the implementation.
>> > 
>> > We could use the scheme used by c++ where the .suffix is added as "
>> > [clone .suffix]", it even handles multiple dots, where something
>> > like
>> > _Z3fooi.part.9.165493.constprop.775.31805
>> > demangles to
>> > foo(int) [clone .part.9.165493] [clone .constprop.775.31805]
>> > 
>> > I just don't think that is very useful and a little confusing.
>> 
>> Calling it "clone" is a bit weird, but I just checked what rustc-
>> demangle
>> does for printing suffixes back out and it's not great either:
>> - ".llvm." (and everything after it) is completely removed
>> - any left-over suffixes (after demangling), if they start with ".",
>> are
>>   not considered errors, but printed out verbatim after the
>> demangling
>> 
>> > > I don't recall the libiberty demangling API having any provisions
>> > > for the demangler deciding that a mangled symbol "stops early",
>> > > which would maybe allow for a more general solution.
>> > 
>> > No, there indeed is no interface. We might introduce a new option
>> > flag
>> > for treating '.' as end of symbol. But do we really need that
>> > flexibility?
>> 
>> That's not what I meant - a v0 or legacy symbol is self-terminating
>> in
>> its parsing (or at the very least there are not dots allowed outside
>> of a length-prefixed identifier), so that when you see the start of
>> a valid mangled symbol, you can always find its end in the string,
>> even when that end is half-way through (and is followed by suffixes
>> or any other unrelated noise).
>> 
>> What I was imagining is a way to return to the caller the number of
>> chars from the start of the original string, that were demangled,
>> letting the caller do something else with the rest of that string.
>> (see below for how rustc-demangle already does something similar)
>>
>> > > Despite all that, if it helps in practice, I would still not mind
>> > > this patch landing in its current form, I just wanted to share my
>> > > perspective on the larger issue.
>> > 
>> > Thanks for that. Do you happen to know what other rust demanglers
>> > do?
>> 
>> rustc-demangle's internal API returns a pair of the demangler and the
>> "leftover" parts of the original string, after the end of the symbol.
>> You can see here how that suffix is further checked, and kept:
>> https://github.com/rust-lang/rustc-demangle/blob/2811a1ad6f7c8bead2ef3671e4fdc10de1553e96/src/lib.rs#L108-L138
>
> Yes, returning a struct that returns the style detected, the demangled
> string and the left over chars makes sense. But we don't have an
> interface like that at the moment, and I am not sure we (currently)
> have users who want this.
>
>> As mentioned above, ".llvm." is handled differently, just above the snippet
>> linked - perhaps it was deemed too common to let it pollute the output.
>
> But that also makes it a slightly odd interface. I would imagine that
> people would be interested in the .llvm. part. Now that just gets
> dropped.
>
> Since we don't have an interface to return the suffix (and I find the
> choice of dropping .llvm. but not other suffixes odd), I think we
> should just simply always drop the .suffix. I understand now how to do
> that for legacy symbols too, thanks for the hints.
>
> See the attached update to the patch. What do you think?
>
> Thanks,
>
> Mark
>
> Attachments:
> * 0001-libiberty-rust-demangle-ignore-.suffix.patch

Reply via email to