I must admit I've never played around with C++ demangling, but I wonder if our 
purposes in demangling might inform how we do this?

We use demangled names for a couple of purposes.  One is to print names in 
backtraces and thread reporting when we stop.  For the most part, the feedback 
we've gotten here is that the full demangled names are too noisy and 
impossible to read, and that we need to cut them down for usability's sake.  For 
instance, we added a display mode to the swift demangler so that backtraces 
were actually useful.  But in any case, this part can be done lazily when a 
name shows up in a backtrace, and so is not so performance sensitive.

The other reason we use them is to allow the various name lookups to work with 
human-level names (often partially specialized) and find their way to the 
actual symbols.  This is generally why we have to do mass demangling of symbols 
when we read in a module.  Having a full demangled name here does allow folks 
to specify a particular overload (for setting breakpoints, etc.) but that part 
of our symbol lookups is more frustrating than helpful because you have to know 
pretty much exactly how the compiler spelled the demangled name, at which point 
it's generally easier just to use the mangled name.

So I wonder if it wouldn't be possible to make a demangler that doesn't attempt 
full fidelity, but rather is crafted to pick out the pieces that we actually 
need and use to do heuristic name matches, and then we could use the faithful 
demangler when we are intentionally presenting a name - at which point the 
speed will be much less important.

I'm probably missing some uses of demangled names that might not make this 
possible, but it seems worth considering.
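
To make the idea a bit more concrete, here is a rough sketch of the kind of 
lossy pass I have in mind.  It only walks the length-prefixed components of a 
simple Itanium nested name (_ZN ... E) and returns the last one as the base 
name; anything it doesn't understand (substitutions, template arguments, 
operators, and so on) makes it give up so the caller can fall back to the 
faithful demangler.  The name ExtractSimpleBaseName is made up for 
illustration and is not an existing lldb function, and a real version would 
need to cover far more of the grammar.

```cpp
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical helper, not an lldb API. Returns the last component of a
// simple Itanium nested name (_ZN <len><id> ... E) without building the
// full demangled string. Returns std::nullopt for anything it does not
// understand so the caller can fall back to a full demangler.
std::optional<std::string> ExtractSimpleBaseName(std::string_view mangled) {
  if (mangled.substr(0, 3) != "_ZN")
    return std::nullopt;
  mangled.remove_prefix(3);

  // Skip CV-qualifiers on member functions (e.g. the K in _ZNK...).
  while (!mangled.empty() &&
         (mangled[0] == 'K' || mangled[0] == 'V' || mangled[0] == 'r'))
    mangled.remove_prefix(1);

  std::string_view last;
  while (!mangled.empty() && mangled[0] != 'E') {
    if (!std::isdigit(static_cast<unsigned char>(mangled[0])))
      return std::nullopt; // Not a plain <length><identifier> component.

    // Parse the decimal length prefix.
    std::size_t len = 0;
    while (!mangled.empty() &&
           std::isdigit(static_cast<unsigned char>(mangled[0]))) {
      len = len * 10 + static_cast<std::size_t>(mangled[0] - '0');
      mangled.remove_prefix(1);
    }
    if (len == 0 || len > mangled.size())
      return std::nullopt; // Malformed or truncated component.

    last = mangled.substr(0, len);
    mangled.remove_prefix(len);
  }

  if (mangled.empty() || last.empty())
    return std::nullopt; // Missing terminating E or no components at all.
  return std::string(last);
}
```

For something like _ZN3foo3barEv this returns "bar" without ever looking at 
the parameter list.  For the shk name quoted below it bails out at the first 
template argument list (the I after 17CallbackPublisher), which is exactly 
the sort of case where we'd have to decide how much structure we really need.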


> The mangled name length threshold would be the easiest to implement.
> However, I fear we may not be able to find a good cutoff length,
> because it's not the length of it that matters, but the number (and
> recursiveness) of back-references. For example, I was able to find a
> mangled name of 757 characters in lldb:
> _ZN12lldb_private23ScriptInterpreterPython21InitializeInterpreterEPFvvEPFbPKcS4_RKSt10shared_ptrINS_10StackFrameEERKS5_INS_18BreakpointLocationEEEPFbS4_S4_S9_RKS5_INS_10WatchpointEEEPFbS4_PvRKNS_10SharingPtrINS_11ValueObjectEEEPSM_RKS5_INS_18TypeSummaryOptionsEERNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEEPFSM_S4_S4_SR_EPFSM_S4_S4_S5_INS_8DebuggerEEEPFmSM_jEPFSM_SM_jEPFiSM_S4_EPFSM_SM_EPFSP_SM_EPFbSM_ES1N_S1J_PFbS4_S4_RS19_S4_RNS_19CommandReturnObjectES5_INS_19ExecutionContextRefEEEPFbSM_S1O_S4_S1Q_S1S_EPFbS4_S4_S1O_EPFSM_S4_S4_RKS5_INS_7ProcessEEEPFbS4_S4_RS20_S13_EPFbS4_S4_RS5_INS_6ThreadEES13_EPFbS4_S4_RS5_INS_6TargetEES13_EPFbS4_S4_RS7_S13_EPFbS4_S4_RSP_S13_EPFSM_SM_S4_RKS2E_EPFSM_S4_S4_RKS5_INS_10ThreadPlanEEEPFbSM_S4_PNS_5EventERbE
> This demangles to a string of length 2534 and I think it would be good to
> handle it. On the other hand, I was able to produce a mangled name of
> only 168 characters:
> which demangles to a 70MB string. (It takes about 3 seconds to compile
> a file with this symbol and 0.8s to demangle it).
> So we may need to limit the output buffer size instead, but this
> will require cooperation from the demangling library. Fortunately, all
> targets nowadays use either the "fast" demangler or
> llvm::itaniumDemangle by default, which we can modify to add a
> threshold like this.
> pl
>> That's true, but shouldn't it be possible to demangle up until the last
>> point you got something meaningful?  (I don't know the details of itanium
>> mangling, just assuming this is possible)
>> Anywhere you cut the string many things can go wrong. I think this would
>> fall under the "start to demangle the string and if the output buffer goes
>> over a certain length, abort the demangling" approach, which is solution #4
>> from my original email.
>>> If you just cut off the string, then it might not demangle without an
>>> error if you truncate the mangled string at a specific point...
>>> What about doing a partial demangle?   Take at most 1024 (for example)
>>> characters from the mangled name, demangle that, and then display ... at the
>>> end.
>>>> I have an issue where I am debugging a C++ binary that is around 250MB in
>>>> size. It contains some mangled names that are crazy:
>>>> _ZNK3shk6detail17CallbackPublisherIZNS_5ThrowERKNSt15__exception_ptr13exception_ptrEEUlOT_E_E9SubscribeINS0_9ConcatMapINS0_18CallbackSubscriberIZNS_6GetAllIiNS1_IZZNS_9ConcatMapIZNS_6ConcatIJNS1_IZZNS_3MapIZZNS_7IfEmptyIS9_EEDaS7_ENKUlS6_E_clINS1_IZZNS_4TakeIiEESI_S7_ENKUlS6_E_clINS1_IZZNS_6FilterIZNS_9ElementAtEmEUlS7_E_EESI_S7_ENKUlS6_E_clINS1_IZZNSL_ImEESI_S7_ENKUlS6_E_clINS1_IZNS_4FromINS0_22InfiniteRangeContainerIiEEEESI_S7_EUlS7_E_EEEESI_S6_EUlS7_E_EEEESI_S6_EUlS7_E_EEEESI_S6_EUlS7_E_EEEESI_S6_EUlS7_E_EESI_S7_ENKUlS6_E_clIS14_EESI_S6_EUlS7_E_EERNS1_IZZNSH_IS9_EESI_S7_ENKSK_IS14_EESI_S6_EUlS7_E0_EEEEESI_DpOT_EUlS7_E_EESI_S7_ENKUlS6_E_clINS1_IZNS_5StartIJZNS_4JustIJS19_S1C_EEESI_S1F_EUlvE_ZNS1K_IJS19_S1C_EEESI_S1F_EUlvE0_EEESI_S1F_EUlS7_E_EEEESI_S6_EUlS7_E_EEEESt6vectorIS6_SaIS6_EERKT0_NS_12ElementCountEbEUlS7_E_ZNSD_IiS1Q_EES1T_S1W_S1X_bEUlOS3_E_ZNSD_IiS1Q_EES1T_S1W_S1X_bEUlvE_EES1G_S1O_E25ConcatMapValuesSubscriberEEEDaS7_
>>>> This de-mangles to something that is 72MB in size and takes 280 seconds
>>>> (try running "time c++filt -n" on the above string).
>>>> There are probably many symbols like this in this binary. Currently lldb
>>>> will de-mangle all names in the symbol table so that we can chop up the
>>>> names so we know function base names and we might be able to classify a
>>>> base name as a method or function for breakpoint categorization.
>>>> My question is: how do we work around such issues in LLDB? A few
>>>> solutions I can think of:
>>>> 1 - time each name demangle and if it takes too long somehow stop
>>>> de-mangling similar symbols or symbols over a certain length?
>>>> 2 - allow a setting that says "don't de-mangle names that start with..."
>>>> and the setting has a list of prefixes.
>>>> 3 - have a setting that turns off de-mangling symbols over a certain
>>>> length all of the time with a default of something like 256 or 512
>>>> 4 - modify our FastDemangler to abort if the de-mangled string goes over
>>>> a certain limit to avoid bad cases like this...
>>>> #1 would still mean we get a huge delay (like 280 seconds) when starting
>>>> to debug this binary, but might prevent multiple symbols from adding to
>>>> that delay...
>>>> #2 would require debugging once and then knowing which symbols
>>>> took a while to de-mangle. If we time each de-mangle, we can warn that
>>>> there are large mangled names and print the mangled name so the user
>>>> might know?
>>>> #3 would disable de-mangling of long names at the risk of not de-mangling
>>>> names that are close to the limit
>>>> #4 requires that our FastDemangle code can decode the mangled
>>>> string. The fast de-mangler currently aborts on tricky de-mangling and we
>>>> fall back onto cxa_demangle from the C++ library which does not have a
>>>> cutoff on length...
>>>> Can anyone else think of any other solutions?
>>>> Greg Clayton
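
For reference, the output-size cutoff discussed above (solution #4, and the 
output buffer limit from the reply) might look roughly like the sketch below. 
This is only an illustration under assumptions: BoundedOutput and 
ExpandNested are hypothetical names, not part of lldb's FastDemangle or 
llvm::itaniumDemangle, and ExpandNested is a toy stand-in for back-reference 
expansion rather than a real demangler.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical output sink, not an lldb or LLVM API: refuses to grow past
// a fixed number of bytes so a demangler built on it can bail out early.
class BoundedOutput {
public:
  explicit BoundedOutput(std::size_t limit) : m_limit(limit) {}

  // Returns false (and marks the sink as overflowed) instead of growing
  // past the limit. Callers are expected to stop as soon as this fails.
  bool Append(std::string_view piece) {
    if (m_overflowed || piece.size() > m_limit - m_text.size()) {
      m_overflowed = true;
      return false;
    }
    m_text.append(piece);
    return true;
  }

  bool Overflowed() const { return m_overflowed; }
  const std::string &Text() const { return m_text; }

private:
  std::size_t m_limit;
  bool m_overflowed = false;
  std::string m_text;
};

// Toy stand-in for recursive back-reference expansion: each level emits the
// previous level twice, so the output roughly doubles per level while the
// "input" (the depth) only grows by one.
bool ExpandNested(unsigned depth, BoundedOutput &out) {
  if (depth == 0)
    return out.Append("T");
  return out.Append("P<") && ExpandNested(depth - 1, out) &&
         out.Append(",") && ExpandNested(depth - 1, out) && out.Append(">");
}
```

With a limit of 1024 bytes, ExpandNested(40, out) stops almost immediately 
and reports failure instead of trying to produce an enormous string, while 
small depths still expand fully.  The point of the design is that the cutoff 
is on the output produced so far, not on the length of the mangled input.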
