[PATCH] D143640: [Tooling/Inclusion] Support overload symbols in the stdlib.

Kadir Cetinkaya via Phabricator via cfe-commits Mon, 13 Feb 2023 05:09:02 -0800

kadircet added a comment.

In D143640#4122603 <https://reviews.llvm.org/D143640#4122603>, @hokein wrote:

> Thanks for the comment, putting more thoughts.
>
> In D143640#4121998 <https://reviews.llvm.org/D143640#4121998>, @kadircet 
> wrote:
>
>> All this complexity for handling only 4 symbols feels wrong. is this the 
>> whole set? are there any other symbols that fall under this class? can we 
>> disambiguate all of them solely based on number of parameters?
>
> No, these 4 symbols is not the whole set, there are other symbols (e.g. 
> `isspace` from `<cctype>`, or `<locale>`), but these are not important, IMO.

I agree them being not important. But I think having such a complex 
infrastructure for just 4 symbols and making it more complex in the future when 
we want to handle these just sounds like a big negative.

> The number of parameters is the most critical information to disambiguate -- 
> there are some exceptions, e.g. `std::get()` has a few 1-parameter overloads 
> for array, vector, etc. We will add more information in the `SymbolName` 
> structure for disambiguation in the future.

Makes sense. but maybe there's a way that involves not relying on parameter 
count at all and something completely different. that's another reason why I 
was asking for other symbols that are causing problems today.

>> Downsides:
>>
>> - We are relying heavily on number of parameters in certain cases now
>
> Yeah, I think this is expected if we want to do disambiguations.
>
>> - We only have that information available for symbols with variants
>
> My intention of this information is internal, and merely used for 
> disambiguation. And we're not exposing them to public APIs.

I can see that purpose, but having those symbols available internally without 
any way for users to poke at them sounds like a shortcoming of the design 
that'll never address. Hence I was trying to avoid that.

>> - We're creating a discrepancy in Symbol::all() vs Symbol::named() views of 
>> the world
>
> Oh, this is not intended, we can remove this discrepancy, and make 
> `Symbol::all()` and `Symbol::named()` aligned.

I don't think removing that discrepancy by also dropping these symbols from 
::all is the right trade-off though. As they're still available as a `Symbol` 
through Recognizer. I think the more appropriate thing would require us to 
change `::named` to return multiple symbols, with enough details for 
disambiguation if needed.

>> - In addition to generator and hand curated list, now there's a 3rd place to 
>> augment symbol mapping
>
> Right, this bit is untweaked in this current patch (there is a FIXME), we can 
> merge it with the hand-curated list.
>
>> This complexity both in terms of mental load when introducing new symbols, 
>> and also possible discrepancies that can be encountered when using the 
>> public APIs makes me question if we're doing the sensible thing that'll 
>> generalize for the future variant symbols, or just trying to patch up a 
>> solution for couple cases we know to be not working.
>> I'd suggest:
>> 1- doing a little bit more design work here and at least getting rid of the 
>> burden on the public APIs and a way to introduce these symbols through hand 
>> curated lists (e.g. we should be returning all alternatives in 
>> Symbol::named, with enough details for the caller to have a chance to 
>> disambiguate and have maybe new syntax in raw symbol map files to indicate 
>> this variant situation).
>
> OK, looks like we have some different ideas on the design here. I'm not sure 
> returning all alternatives in `Symbol::named` and exposing all information 
> for disambiguation is a good idea (I'm also not sure it is an important 
> usecase). This seems complicating the public APIs too much.

I think we agree on them being not important so far. But I don't see why having 
a discrepancy between `Symbol`s received through `Recognizer` and 
`Symbol::named` is preferred instead of having a unified view between the two 
(which we already have right now, by excluding variants from both).
I was suggesting preserving this invariant because it's easier to reason about 
the library when we only have a single view rather than 2. Could you elaborate 
a little on why you think "exposing all information for disambiguation" is not 
a good idea? The public interfaces stays ~the same. We'll be having these extra 
information in `Symbol`, if we can find a meaningful set.

> My original design is
>
> - full disambiguation is *only* performed in the `Recognizer` API where the 
> input is the AST node
> - other qualified-name-based APIs `Symbol::named` remains the same (or best 
> effort returning the symbol with most common header)
>
> I share your concern about the current approach, it is not perfect, but I 
> think this is moving towards a right direction, the tradeoff seems OK to me 
> (rather than putting the special-case handling to *every* application, we 
> isolate it in one place).

I agree that sharing this logic with applications is definitely a plus (and is 
the main goal of this library). I am mostly concerned of current approach 
because it's adding too much complexity for handling of 2 special symbols, that 
might not generalise to others in the future and we can special case these 
symbols on the applications with only couple lines of code + this special 
casing still needs to exist in applications that uses Symbol::named interfaces. 
Therefore this discrepancy might create bugs in the future, just because people 
are used to behaviour in one side and expect the same on the other.

>> 2- or handling this in include-cleaner only, similar to how we do it in 
>> clangd and other applications, until we encounter more problematic symbols. 
>> as handling this in the general case seems hard (especially in the absence 
>> of an AST), and we haven't seen enough signals for symbols other than 
>> `std::move` to cause trouble.
>>
>> i feel like 2nd alternative gets us a long way with minimal effort. WDYT?
>
> The 2nd alternative definitely sounds good to me. Before we settle down the 
> design, I will prepare a patch to special-case these symbols in 
> include-cleaner.

Thanks a lot!

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D143640/new/

https://reviews.llvm.org/D143640

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D143640: [Tooling/Inclusion] Support overload symbols in the stdlib.

Reply via email to