sammccall added a comment.

In D136723#3915853 <https://reviews.llvm.org/D136723#3915853>, @maryammo wrote:

> This commit causes build failure on  
> `https://lab.llvm.org/buildbot/#/builders/121/builds/24947`  : 
> I was able to reproduce the failure and by reverting this commit locally it 
> passed, can you please take a look? @sammccall

It looks like this was already fixed in
https://github.com/llvm/llvm-project/commit/d19ba74dee0b9ab553bd8a6ef5b67ff349f4bf13



================
Comment at: 
clang-tools-extra/include-cleaner/include/clang-include-cleaner/Record.h:81
+    llvm::StringMap<llvm::SmallVector<unsigned>> BySpelling;
+    llvm::DenseMap<const FileEntry *, llvm::SmallVector<unsigned>> ByFile;
+  } Includes;
----------------
kadircet wrote:
> it feels like having a set of includes corresponding to the same fileentry, 
> rather than a single one, feels subtle enough to deserve a comment.
> 
> the only case i can think of is, two logically different files resolving to 
> same physical file (i.e. symlinks). these will actually have completely 
> different spellings, and probably it'll make things hard when deciding which 
> include to "keep" or "insert". are there other cases where we can have 
> multiple includes corresponding to the same fileentry?
Sure:
```
#include "foo.h"
#include "bar.h"
#include "foo.h"
```

it isn't useful, but people write it (by mistake).
Maybe we want to clean it up, maybe we don't, but I don't see much reason to 
model it wrong on purpose - it doesn't really simplify anything, and may 
further confuse a confusing situation.

I can add a comment - what do you want it to say?
"There may be more than one #include of the same file" just seems to restate 
the obvious interpretation of the type here. (And this is the header, so it's 
not an ideal place to dump a bunch of implementation comments)


================
Comment at: 
clang-tools-extra/include-cleaner/include/clang-include-cleaner/Record.h:54
+struct RecordedPP {
+  // The callback (when installed into clang) tracks macros/includes in this.
+  std::unique_ptr<PPCallbacks> record(const Preprocessor &PP);
----------------
kadircet wrote:
> s/this/the main file/
"this" as in the current object, as the data flow here isn't obvious. Changed 
to `*this`, clearer?


================
Comment at: 
clang-tools-extra/include-cleaner/include/clang-include-cleaner/Record.h:75
+    //  - for a logical file like <vector>, we check Spelled
+    llvm::SmallVector<const Include *> match(Header H) const;
+
----------------
kadircet wrote:
> hokein wrote:
> > sammccall wrote:
> > > in the prototype I reimplemented this function in clangd, but I expect we 
> > > can just reuse the RecordedIncludes class instead, copying from clangd's 
> > > includes list is cheap.
> > > 
> > > (That's one argument for putting this in a different header, which I can 
> > > do already if you prefer)
> > that's an interesting idea, but we need to do it carefully because of 
> > `FileEntry*`, the `Include` structure has a filed of `FileEntry*`, it is 
> > not feasible for clangd to propagate it (clangd's `Inclusion` doesn't have 
> > it, instead it uses a `HeaderID` which is based on the `fs::UniqueID` to 
> > identify a physical file).
> > 
> > But I think this is not something we should worry about at the moment, and 
> > the current interfaces look quite good.
> right, let's keep it to the minimum now. we can extend/move-around later on
Good point. We could give Include a UniqueID instead I think. Let's defer this 
though.


================
Comment at: 
clang-tools-extra/include-cleaner/include/clang-include-cleaner/Record.h:75
+    //  - for a logical file like <vector>, we check Spelled
+    llvm::SmallVector<const Include *> match(Header H) const;
+
----------------
sammccall wrote:
> kadircet wrote:
> > hokein wrote:
> > > sammccall wrote:
> > > > in the prototype I reimplemented this function in clangd, but I expect 
> > > > we can just reuse the RecordedIncludes class instead, copying from 
> > > > clangd's includes list is cheap.
> > > > 
> > > > (That's one argument for putting this in a different header, which I 
> > > > can do already if you prefer)
> > > that's an interesting idea, but we need to do it carefully because of 
> > > `FileEntry*`, the `Include` structure has a filed of `FileEntry*`, it is 
> > > not feasible for clangd to propagate it (clangd's `Inclusion` doesn't 
> > > have it, instead it uses a `HeaderID` which is based on the 
> > > `fs::UniqueID` to identify a physical file).
> > > 
> > > But I think this is not something we should worry about at the moment, 
> > > and the current interfaces look quite good.
> > right, let's keep it to the minimum now. we can extend/move-around later on
> Good point. We could give Include a UniqueID instead I think. Let's defer 
> this though.
> that's an interesting idea, but we need to do it carefully because of 
> `FileEntry*`, the `Include` structure has a filed of `FileEntry*`, it is not 
> feasible for clangd to propagate it (clangd's `Inclusion` doesn't have it, 
> instead it uses a `HeaderID` which is based on the `fs::UniqueID` to identify 
> a physical file).
> 
> But I think this is not something we should worry about at the moment, and 
> the current interfaces look quite good.




================
Comment at: 
clang-tools-extra/include-cleaner/include/clang-include-cleaner/Types.h:52
 struct Symbol {
   enum Kind {
     // A canonical clang declaration.
----------------
hokein wrote:
>  probably worth adding a comment:  the order of the enumerators must match 
> the  the order of template arguments in Storage. 
Done in the base patch. D136710


================
Comment at: 
clang-tools-extra/include-cleaner/include/clang-include-cleaner/Types.h:117
+  llvm::StringRef Spelled;             // e.g. vector
+  const FileEntry *Resolved = nullptr; // e.g. /path/to/c++/v1/vector
+  SourceLocation HashLocation;         // of hash in #include <vector>
----------------
kadircet wrote:
> i think it's worth leaving a comment here mentioning that this can be null 
> (couldn't resolve, or maybe we shouldn't be caring, e.g. logical stl include).
> 
> even better, maybe we should actually have a `Header` here? that way this 
> would be conceptual equivalent of SymbolReference, but for includes?
Done (comment.)

> maybe we should actually have a Header here
A `Header` is conceptually something like a `Matcher<Include>`. If we put 
Header here then we're communicating what the reference is going to look like: 
`#include "foo.h"` is going to satisfy a symbol usage in `/path/to/foo.h` or 
one in a file with `// IWYU pragma private: include "foo.h"`, which we don't 
know.
(And it could be both, which would no longer be expressible).

Technically you could store an Header that *always* has a FileEntry, but that 
just seems like obfuscation.


================
Comment at: 
clang-tools-extra/include-cleaner/include/clang-include-cleaner/Types.h:41
+struct Macro {
+  IdentifierInfo *Name;
+  // The location of the Name where the macro is defined.
----------------
kadircet wrote:
> what about just a stringref to name?
That's twice as big, and this is already the critical path for 
sizeof(SymbolReference) which me can accumulate a lot of.

I don't think it's critical, but I also don't see why StringRef is better. 
(Less indirection vs carries some semantics + lifetime information)


================
Comment at: 
clang-tools-extra/include-cleaner/include/clang-include-cleaner/Types.h:46
+  bool operator==(const Macro &S) const {
+    return Name == S.Name && Definition == S.Definition;
+  }
----------------
kadircet wrote:
> OOC: can we really have multiple macro identifiers defined at the same source 
> location? I guess it's cheap to compare another pointer, but might be easier 
> if we actually didn't make the identifierinfo part of the identity.
Probably not? Removed from comparison.
(I think it's *simpler* to include it in the comparison, but it doesn't matter 
much)


================
Comment at: 
clang-tools-extra/include-cleaner/include/clang-include-cleaner/Types.h:118
+  const FileEntry *Resolved = nullptr; // e.g. /path/to/c++/v1/vector
+  SourceLocation Location;             // of hash in #include <vector>
+  unsigned Line = 0;                   // 1-based line number for #include
----------------
hokein wrote:
> nit: HashLoc seems clearer than `Location`.
> 
> And it looks like `Location` and `Line` feels somewhat redundant -- we can 
> get the Line loc from the `Location` with `SourceManager`. But I think it is 
> fair to hold both, Line is an important property of the include, which will 
> probably widely used.
Agree, we could one of these but it's a bit sad: we need SourceLocation to 
diagnose things, but Line is much more convenient the rest of the time.

Renamed, leaving this open to see what Kadir thinks too.


================
Comment at: clang-tools-extra/include-cleaner/lib/Record.cpp:20
 
+class PPRecorder : public PPCallbacks {
+public:
----------------
hokein wrote:
> this should be hidden in anonymous namespace, right? 
Right! This used to be friended.


================
Comment at: clang-tools-extra/include-cleaner/lib/Record.cpp:139
+  case Header::Physical:
+    for (unsigned I : ByFile.lookup(H.physical()))
+      Result.push_back(&All[I]);
----------------
kadircet wrote:
> we might have different handles (fileentry*) to the same file (if we end up 
> having multiple filemanagers in action, which is unlikely as things stand 
> today, but still). so what about using uniqueids for comparison here?
> 
> AFAICT, filemanager won't have multiple `fileenrty`s with the same `uniqueid` 
> anyway. so we wouldn't really lose any generality.
This is approximately the same issue as above (`> that's an interesting idea, 
but we need to do it carefully because of `FileEntry*`).
I agree and it doesn't really add code complexity but it does add subtlety, 
especially if we use FileEntry in the data structure and UniqueID for matching.
So I think deferring *all* the multi-file-manager concerns makes sense (or 
handling them all now if you prefer)


================
Comment at: clang-tools-extra/include-cleaner/lib/Record.cpp:28
+                           FileID PrevFID) override {
+    Active = SM.isWrittenInMainFile(Loc);
+  }
----------------
kadircet wrote:
> we can keep a include depth based on the `Reason` and bail out on other 
> operations when depth is not `1` (getFileID on Loc here probably hits the 
> SourceManager cache anyways, so might not be important, but i feel like it's 
> better to not rely on that).
That sounds complicated, wait until it shows up in a profile?

The conceptual stack definitely isn't properly maintained in all situations. 
e.g. a normal parse leaks 1 entry for... builtins or something, and clangd's 
parse with preamble leaks 3. I'm not specifically sure there's a problem in 
this scenario, but it's *not* a trivial thing that obviously can't go wrong.

(& I think if we're afraid to rely on the cache for the file we're *currently 
lexing* then we should seriously consider fixing the cache instead)


================
Comment at: clang-tools-extra/include-cleaner/unittests/RecordTest.cpp:122
+// Matches an Include with a particular spelling.
+MATCHER_P(spelled, S, "") { return arg.Spelled == S; }
+
----------------
kadircet wrote:
> i know these look more like functions, but convention is to use `PascalCase` 
> for matchers, so `Spelled` instead?
These are functions - I don't think there's a convention here, just confusion 
and bugs.
OK to fix `named` intead?



================
Comment at: clang-tools-extra/include-cleaner/unittests/RecordTest.cpp:186
+      Header.point("def"));
+  ASSERT_NE(Recorded.MacroReferences.size(), 0u);
+  Symbol OrigX = Recorded.MacroReferences.front().Symbol;
----------------
kadircet wrote:
> nit: `ASSERT_FALSE(Recorded.MacroReferences.empty())`
Why? This means we get "expected empty() is false, got true" instead of 
"expected size() is 0, got 7", which seems strictly worse...

Added a somewhat-useful printer for SymbolReference and switched to IsEmpty() 
instead.


================
Comment at: clang-tools-extra/include-cleaner/unittests/RecordTest.cpp:180
+  Inputs.ExtraFiles["header.h"] = Header.code();
+  Inputs.ErrorOK = true; // missing header
+  auto AST = build();
----------------
hokein wrote:
> IIUC, ErrorOK is irrelevant, and should be removed.
Hmm, it's not irrelevant, it means I get to make lots of mistakes in my 
examples :-)

Removed.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D136723/new/

https://reviews.llvm.org/D136723

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to