[PATCH] D143496: [clangd] Add support for missing includes analysis.

Kadir Cetinkaya via Phabricator via cfe-commits Wed, 15 Feb 2023 05:11:06 -0800

kadircet added a comment.

Thanks! Details mostly looks good, but i think we had different ideas about how 
the final diagnostics should look like. Let's try to clear that up a bit. My 
suggested proposal is:


- Having a main diagnostic for each symbol that doesn't have a satisfying 
include in the main file, attached to the first use of the symbol, with message 
`<foo> providing 'ns::Foo' is not directly included`, with severity warning
- Having notes attached to this main diagnostic, on rest of the references of 
the same symbol inside the main file, with message `Also used here`, with 
severity info
- Having a single fix attached to main diagnostic inserting the first provider.

I believe in the future the main diagnostic message should change to `No header 
providing 'ns::Foo' is directly included`, and we should have multiple fixes 
attached. But there's value in explicitly spelling the header name when there's 
only one fix to get rid of indirections.

This means:

  + we'll emit different diagnostics for each unsatisfied symbol even if 
they're satisfied by the same provider.
  + the user will be able to see usage sites for each symbol as a whole (as 
we're attaching them as notes), and if need be can make a decision around 
dropping the dependency or introducing a forward decl.
  + we won't "butcher" the diagnostic panel with 100 duplicated diagnostics all 
talking about the same thing.
  - in certain editors, like vscode, related information doesn't show up as 
extra decorations in code. so if the user ignores diagnostic panel and the 
first usage of a symbol is not visible. they'll likely miss the finding.

WDYT about this approach? any other alternatives you've in mind? I think it 
might make sense to get others opinion's here as well. Especially around the 
diagnostic message, as usually whatever is convenient to a single person isn't 
necessarily clear/obvious for every one :D



================
Comment at: clang-tools-extra/clangd/IncludeCleaner.cpp:516
+
+  std::vector<include_cleaner::SymbolReference> Macros =
+      collectMacroReferences(AST);
----------------
VitaNuo wrote:
> kadircet wrote:
> > 
> Why would you use `auto` here? The return type is not obvious from the 
> function call.
> 
> The style guide says: "types that you and your reviewer experience as 
> unnecessary clutter will very often provide useful information to others. For 
> example, you can assume that the return type of `make_unique<Foo>()` is 
> obvious, but the return type of `MyWidgetFactory()` probably isn't." 
> (http://go/cstyle#Type_deduction)
well, feel free to keep it. but at least, inside clangd codebase, we use auto 
frequently (maybe more than we should). especially in cases like this where a 
declaration would otherwise fit a single line and because the name of the 
initializer implies this will be a container of macro references and any extra 
details about the container is probably irrelevant.


================
Comment at: clang-tools-extra/clangd/IncludeCleaner.cpp:547
+convertIncludes(const SourceManager &SM,
+                std::vector<Inclusion> MainFileIncludes) {
+  include_cleaner::Includes Includes;
----------------
VitaNuo wrote:
> kadircet wrote:
> > you can just pass an `llvm::ArrayRef<Inclusion>` to prevent a copy
> By preventing a copy, do you mean that the construction of 
> `llvm::ArrayRef<Inclusion>` will only copy a pointer to the data rather than 
> the whole vector? AFAIU `const std::vector<Inclusion>&` should be even better 
> then, no copies involved. CMIIW.
well from performance-wise they're pretty close, passing by const ref doesn't 
mean you don't do any copies, it'll still require address of the entity to be 
signalled somehow, which is a pointer copy. passing an arrayref implies copying 
a pointer to data and the size (so it's slightly worse in that regard). but it 
can represent any chunk of contiguous memory, the data source doesn't need to 
be a vector. moreover you can easily pass a slice of a vector rather than the 
whole vector etc.

the performance implications rarely matters in practice, and the abstraction it 
provides on the interfaces is usually a benefit, e.g. if we were to change 
underlying type from vector to llvm::SmallVector, the APIs would still work and 
you don't need to think about any concrete types. hence we tend to prefer 
having ArrayRef on API boundaries whenever possible.


================
Comment at: clang-tools-extra/clangd/IncludeCleaner.cpp:512
+include_cleaner::Includes
+convertIncludes(const SourceManager &SM,
+                const std::vector<Inclusion> &MainFileIncludes) {
----------------
since this is a local symbol, either mark it as `static` or move it to 
anonymous namespace above so that it won't be visible to other translation 
units.


================
Comment at: clang-tools-extra/clangd/IncludeCleaner.cpp:513
+convertIncludes(const SourceManager &SM,
+                const std::vector<Inclusion> &MainFileIncludes) {
+  include_cleaner::Includes Includes;
----------------
nit: you can use `llvm::ArrayRef` instead of a `const std::vector &`. 
`ArrayRef` is a trivially copyable/constructible type for providing views into 
a consecutive set of elements.


================
Comment at: clang-tools-extra/clangd/IncludeCleaner.cpp:521
+        SM.getComposedLoc(SM.getMainFileID(), Inc.HashOffset);
+    TransformedInc.Line = Inc.HashLine;
+    TransformedInc.Angled = WrittenRef.starts_with("<");
----------------
include_cleaner::Includes::Line is 1-based, whereas 
clang::clangd::Inclusion::HashLine is 0-based. so we need to have a `+1` on the 
RHS. we are probably missing some test coverage if this didn't fail


================
Comment at: clang-tools-extra/clangd/IncludeCleaner.cpp:523
+    TransformedInc.Angled = WrittenRef.starts_with("<");
+    if (auto FE = SM.getFileManager().getFile(Inc.Resolved))
+      TransformedInc.Resolved = *FE;
----------------
i don't think we should convert any unresolved includes. it usually means 
header wasn't found, so we can't perform any reliable analysis on them. 
anything i am missing?


================
Comment at: clang-tools-extra/clangd/IncludeCleaner.cpp:535
+  Diag D;
+  D.Message = llvm::formatv("header {0} is missing", Spelling);
+  D.Name = "missing-includes";
----------------
i think the symbol name should also be part of the diagnostic. as editors can 
show these diagnostics without context (e.g. you've got a big file open, 
there's a diagnostic panel displaying messages/fixes for all of the file. you 
should be able to reason about the diagnostics and fixes without jumping all 
over the file). so maybe something like:

`formatv("{0} providing '{1}' is not directly included", Header, Symbol)`


================
Comment at: clang-tools-extra/clangd/IncludeCleaner.cpp:545
+                          offsetToPosition(Code, Range.endOffset())};
+  tooling::HeaderIncludes HeaderIncludes(FileName, Code,
+                                         tooling::IncludeStyle{});
----------------
this is actually an expensive object to create (it scans the file for existing 
includes). so we should create this once for the whole file. i think it's 
better to generate whole set of diagnostics in this function directly, rather 
than creating one by one.


================
Comment at: clang-tools-extra/clangd/IncludeCleaner.cpp:549
+  bool Angled = HeaderRef.starts_with("<");
+  std::optional<tooling::Replacement> Replacement = HeaderIncludes.insert(
+      HeaderRef.trim("\"<>"), Angled, tooling::IncludeDirective::Include);
----------------
this returns nullopt only if Header is already included in the main file. our 
analysis should never suggest such a header for include, unless the include is 
coming from a PP-disabled region.

so i think if Replacement generation fails, we should drop the diagnostic 
completely rather than just dropping the fix. WDYT?


================
Comment at: clang-tools-extra/clangd/IncludeCleaner.cpp:561
+
+Diag computeUnusedIncludesDiagnostic(std::string FileName, const Inclusion 
*Inc,
+                                     llvm::StringRef Code) {
----------------
s/computeUnusedIncludesDiagnostic/generateUnusedIncludeDiagnostic/


================
Comment at: clang-tools-extra/clangd/IncludeCleaner.cpp:588
+
+std::string resolveSpelledHeader(ParsedAST &AST, const FileEntry *MainFile,
+                                 include_cleaner::Header Provider) {
----------------
s/resolveSpelledHeader/spellHeader/

again make this static or put into anon namespace above


================
Comment at: clang-tools-extra/clangd/IncludeCleaner.cpp:590
+                                 include_cleaner::Header Provider) {
+  std::string SpelledHeader;
+  if (Provider.kind() == include_cleaner::Header::Physical) {
----------------
nit: you can rewrite this as:
```
// Give URI schemes a chance to customize header spellings
if(Provider.kind() == Physical) {
   if(auto CanPath = getCanonicalPath(Provider.physical(), 
AST.getSourceManager())) {
         // URI::includeSpelling only fails when URI scheme is unknown. Since 
we're creating URI ourselves here, we can't get an unknown scheme.
         std::string Header = 
llvm::cantFail(URI::includeSpelling(URI::create(*CanonicalPath));
         if (!Header.empty())
             return Header;
   }
}
return spellHeader(...);
```


================
Comment at: clang-tools-extra/clangd/IncludeCleaner.cpp:613
+  const auto &Includes = AST.getIncludeStructure();
+  include_cleaner::Includes IncludeCleanerIncludes =
+      convertIncludes(AST.getSourceManager(), Includes.MainFileIncludes);
----------------
s/IncludeCleanerIncludes/ConvertedIncludes/ ?


================
Comment at: clang-tools-extra/clangd/IncludeCleaner.cpp:636
+        bool Satisfied = false;
+        for (const auto &H : Providers) {
+          if (!IncludeCleanerIncludes.match(H).empty()) {
----------------
you can directly use `IncludeCleanerIncludes` now, without the need for 
matching the header against clangd's Includes. e.g:

```
bool Satisfied = false;
for (auto &H : Providers) {
   if (H.kind() == Physical && H.physical() == MainFile) {
      Satisfied = true;
      continue;
   }
   for (auto &Inc : IncludeCleanerIncludes.match(H)) {
      Satisfied = true;
      auto HeaderID = Includes.getID(Inc.Resolved);
      assert(HeaderID.has_value() && "IncludeCleanerIncludes only contain 
resolved includes.");
      Used.insert(*HeaderID);
   }
}
```

this way you can also get rid of the logic that generates `BySpelling` mapping 
above.


================
Comment at: clang-tools-extra/clangd/IncludeCleaner.cpp:662
+          std::string SpelledHeader =
+              resolveSpelledHeader(AST, MainFile, Providers.front());
+
----------------
sorry wasn't thinking this through in the initial set of comments. but i think 
we should be grouping findings per Symbol rather than per Header spelling, 
because:
- for reasons i pointed elsewhere, we should be including symbol name in the 
diagnostic messages
- again as pointed elsewhere, we should have diagnostics attached to all 
references not just the first one.
- in the future we'll likely extend this to cover providing fixes for 
alternative providers. we shouldn't be emitting multiple diags for each header 
but rather have multiple fixes on the same diagnostic.

so probably a mapping like:

llvm::DenseMap<Symbol, pair</*Providers*/vector<Header>, /*reference 
locations*/vector<Range>>; // instead of a pair, a dedicated struct would look 
better

we also should delay spelling of the Header, as it's expensive and we might not 
be diagnosing missing includes.

later on we can consume the whole map at once and convert them into a set of 
clangd::Diag. WDYT?


================
Comment at: clang-tools-extra/clangd/IncludeCleaner.cpp:668
+          if (SpelledForExpanded.has_value())
+            MissingIncludes.insert(
+                {SpelledHeader, std::move(SpelledForExpanded.value())});
----------------
any reason for storing tokens? you can store a range directly here as a 
`clangd::Range`, or a `syntax::FileRange` if you don't want to pay for 
`offsetToPosition` calls if we're not going to emit.


================
Comment at: clang-tools-extra/clangd/IncludeCleaner.cpp:668
+          if (SpelledForExpanded.has_value())
+            MissingIncludes.insert(
+                {SpelledHeader, std::move(SpelledForExpanded.value())});
----------------
kadircet wrote:
> any reason for storing tokens? you can store a range directly here as a 
> `clangd::Range`, or a `syntax::FileRange` if you don't want to pay for 
> `offsetToPosition` calls if we're not going to emit.
this is not a multimap, so we're only retaining the range for a missing header 
on the first range that referenced a symbol provided by it. i think we should 
be collecting all the reference ranges instead. e.g. if i have a file:
```
#include <all>

void foo() {
  std::string x;
  std::string y;
}
```

there should be missing include diagnostics attached to both mentions of 
`std::string` not only the first (again in big files it might be impossible to 
see the first range the diagnostic is attached to and people have a tendency to 
only care about the parts of the code they've touched). does that make sense?


================
Comment at: clang-tools-extra/clangd/IncludeCleaner.cpp:720
+
   std::string FileName =
       AST.getSourceManager()
----------------
nit: while here you can replace all uses of `FileName` to `AST.tuPath()`


================
Comment at: clang-tools-extra/clangd/IncludeCleaner.cpp:729
+  }
+  for (const llvm::StringMapEntry<llvm::ArrayRef<syntax::Token>> &Missing :
+       MissingIncludes) {
----------------
convention is definitely to use `auto` in place of such detailed type names. 
e.g. `auto &Entry : MissingIncludes`


================
Comment at: clang-tools-extra/clangd/ParsedAST.cpp:694
+        issueIncludeCleanerDiagnostics(Result, Inputs.Contents);
     Result.Diags->insert(Result.Diags->end(),
+                         make_move_iterator(IncludeCleanerDiags.begin()),
----------------
nit: `llvm::move(issueIncludeCleanerDiagnostics(...), 
std::back_inserter(*Result.Diags))`


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D143496/new/

https://reviews.llvm.org/D143496

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D143496: [clangd] Add support for missing includes analysis.

Reply via email to