tonyelewis added inline comments.

================
Comment at: 
clang-tools-extra/clang-tidy/misc/MissingHeaderFileDeclarationCheck.cpp:75
+      continue;
+    if (AnyHeader || File->NamePart.equals_lower(ThisFile->NamePart))
+      return; // Found a good candidate for matching decl
----------------
njames93 wrote:
> tonyelewis wrote:
> > njames93 wrote:
> > > tonyelewis wrote:
> > > > njames93 wrote:
> > > > > gribozavr2 wrote:
> > > > > > njames93 wrote:
> > > > > > > gribozavr2 wrote:
> > > > > > > > njames93 wrote:
> > > > > > > > > gribozavr2 wrote:
> > > > > > > > > > This heuristic should be a lot more complex. In practice 
> > > > > > > > > > people have more complex naming conventions (for example, 
> > > > > > > > > > in Clang, Sema.h corresponds to many files named 
> > > > > > > > > > SemaSomethingSomething.cpp).
> > > > > > > > > > 
> > > > > > > > > > I think there is a high chance that checking only a header 
> > > > > > > > > > with a matching name will satisfy too few projects to be 
> > > > > > > > > > worth implementing.
> > > > > > > > > > 
> > > > > > > > > > On the other hand, if you could use C++ or Clang modules to 
> > > > > > > > > > narrow down the list of headers where the declaration 
> > > > > > > > > > should appear, that would be a much stronger signal.
> > > > > > > > > That is the reason I added the CheckAnyHeader option. For 
> > > > > > > > > small projects the matching name would be usually be enough, 
> > > > > > > > > but for big complex projects there is no feasible check that 
> > > > > > > > > would work. Falling back to making sure every external 
> > > > > > > > > definition has a declaration in at least one header will 
> > > > > > > > > always be a good warning
> > > > > > > > That's the thing -- due to the lack of consistency in projects 
> > > > > > > > and style guides for C++, I think most projects will have to 
> > > > > > > > turn on CheckAnyHeader. We get implementation complexity in 
> > > > > > > > ClangTidy, false positives in high-profile projects, force 
> > > > > > > > users to configure ClangTidy to work well for their projects 
> > > > > > > > (unless we set CheckAnyHeader=1 by default... which then nobody 
> > > > > > > > would flip back to zero), and little benefit for end users.
> > > > > > > Would you say the best way would be check if the source file name 
> > > > > > > starts with the header file name by default. Or is that very 
> > > > > > > specific to Clang?
> > > > > > > 
> > > > > > > ```
> > > > > > > /// <SomeHeaderImpl.cpp>
> > > > > > > #include "SomeHeader.h"
> > > > > > > ```
> > > > > > > This file would check for declarations in `SomeHeader.h`
> > > > > > > 
> > > > > > > Or maybe check if the header file name starts with the source 
> > > > > > > file?
> > > > > > > 
> > > > > > > ```
> > > > > > > /// <SomeHeader.cpp>
> > > > > > > #include "SomeHeaderImpl.h"
> > > > > > > ```
> > > > > > > This file would check for declarations in `SomeHeaderImpl.h`.
> > > > > > > Or even check both?
> > > > > > > Would you say the best way would be check if the source file name 
> > > > > > > starts with the header file name by default. Or is that very 
> > > > > > > specific to Clang?
> > > > > > 
> > > > > > I don't think it is too specific, it should cover many projects I 
> > > > > > think.
> > > > > > 
> > > > > > > Or maybe check if the header file name starts with the source 
> > > > > > > file?
> > > > > > 
> > > > > > Seems like an unusual case to me, but I'm willing to be convinced 
> > > > > > otherwise.
> > > > > I thought it was an unusual case and wasn't thinking it would be a 
> > > > > good idea to add. I'll just set it up so that it will always look in 
> > > > > header files whose names appear at the start of the main source file.
> > > > FWIW, my instinct is that there are two separate questions:
> > > > 
> > > >  1. What sort of files should have their external-linkage definitions 
> > > > checked?
> > > >  2. What sort of included files should be part of the search for 
> > > > adequate declarations that match those definitions?
> > > > 
> > > > ...and that my answers, in reverse order, are:
> > > > 
> > > >  2. All included files should be included (ie a check for a given 
> > > > external-linkage definition should be appeased by _any_ matching 
> > > > declaration in _any_ included file)
> > > >  1. Only main files. And (to prevent lots of false-positives when the 
> > > > tool is run over headers), only "c", "cc", "cxx", "cpp" files by 
> > > > default but with an option to check in _all_ main files.
> > > > 
> > > > I think this has the merits that it:
> > > >  * allows users to put their valid declaration in files with any 
> > > > extension they like
> > > >  * defaults to only firing when run against c/cc/cxx/cpp files but 
> > > > provides the option to fire when run against a file with _any_ extension
> > > > 
> > > > 
> > > > So I propose that there be one option and it be as follows:
> > > > 
> > > > 
> > > > > .. option:: CheckExtDefnsInAllMainFiles
> > > > > 
> > > > > If set to `0` only check external-linkage definitions in main files 
> > > > > with typical source-file extensions ("c", "cc", "cxx", "cpp").
> > > > > If set to `1` check external linkage definitions in all main files.
> > > > > Default value is `0`.
> > > So current behaviour is kind of explained that you can either search in 
> > > headers that are a starting substring of the source file or just search 
> > > for all headers, I tried to explain it in the documentation. By default 
> > > it only checks for definitions in files with the extension `c`, `cc`, 
> > > `cpp` or `cxx`. I could add an option to check all definitions expanded 
> > > in the main file, regardless of the extension.
> > Yep - I //think// I understand the current behaviour (but could well be 
> > wrong about that of course :) ).
> > 
> > But IIUC, I share @gribozavr2's concerns that the current behaviour's 
> > default of only accepting matching declarations in included files //that 
> > match a specific pattern// is going to trigger a lot of false-positives in 
> > a lot of projects. And I don't think this check should care about //where// 
> > the matching declaration is found, so long as there is such a declaration 
> > in a file included within the TU.
> > 
> > To my thinking, the implication of having a default policy that restricts 
> > the names of files in which we'll accept matching declarations is that 
> > we're policing the way that users name and organise their files (by 
> > spamming them with lots of false-positives if they don't do it in a way 
> > that satisfies our rule). I'd argue that file naming/organisation doesn't 
> > belong as part of this check (or even as part of any other clang-tidy 
> > check).
> > 
> > The other half of my previous comment was about a 
> > `CheckExtDefnsInAllMainFiles` option but I think that that's less important 
> > and that having a restriction to `c`, `cc`, `cpp` and `cxx` seems pretty 
> > reasonable to me—it could always be broadened in the future.
> > 
> > Overall, I think it's better to err on the side of //not// triggering the 
> > warning so as to avoid excessive false-positives; the check can always be 
> > made more sensitive in the future. I see the implications of that stance as 
> > being:
> >  * default to only checking the definitions in files that look like source 
> > files (so I think restricting to `c`, `cc`, `cpp` and `cxx` is good)
> >  * default to accepting matching declarations in all included files.
> > 
> The reason for the unrelated header check is to prevent potential bugs. If a 
> declaration is in a completely unrelated header that happens to get included, 
> it more than likely could be a collision which could then break ODR rules 
> during linking. Perhaps a better thing to do would be if the decl isn't in a 
> "corresponding" header, emit a warning, but say there was a decl found in 
> `example.h`
Thanks.

When you say "warning", do you mean something different from a check violation? 
Does that mean the check isn't fired but a warning message is printed?

Either way, my feeling is that it would be better to avoid diagnosing this 
situation (of a definition in a source-file matching a declaration in a source 
file with a filename stem that isn't a leading substring of the source file's 
name) at all.

One minor aspect: if this ODR-motivated diagnostic message is tied to the 
`misc-missing-header-file-declaration check`, that has the potential to be 
confusing to users. The existing check is about one specific issue ("you said 
you want X to be accessible in multiple TUs but you haven't provided a 
declaration for other TUs to share") which IMHO would be muddied by being 
conflated with a quite different issue ("the names of the files containing the 
definition and declaration of X are quite different, which we think smells a 
bit like an ODR violation").

Another issue: it feels like an inappropriate thing to be enforcing. C++ gives 
users complete freedom about the way they name their source-code files (indeed 
I don't think the standard even assumes source-code comes in files) and I don't 
think there's anything like universally agreed upon best-practice that 
deviating from my_stuff.cpp / my_stuff.hpp pairs is always bad. So it feels 
wrong to start making life difficult for users who deviate from that pattern 
for any reason.

And I think the rest of my concern relates to two empirical questions:
 * what fraction of the all such instances in the world's source code are 
actually OK (not an ODR violation)?
 * what fraction of projects contain such instances?

If I was convinced these numbers were both really low, I'd feel much more 
comfortable with clang-tidy firing warnings/check-violations. In that world, we 
could say: even though it's not //wrong// to deviate from matching my_stuff.cpp 
/ my_stuff.hpp pairs, it's worth //complete// adherence to that pattern for the 
resultant ODR-violation-detection gains. (Side note: another empirical question 
is how many ODR-violations this check would detect that aren't already caught 
by other ODR-violation detecting tools, eg linkers' and [[ 
https://github.com/google/sanitizers/wiki/AddressSanitizerOneDefinitionRuleViolation
 | ASan's ]])

But my instinct is that those fractions are both pretty high: that lots of 
projects involve some file-naming structures that would fall afoul of the rule 
and cause lots of false positives.

In other words, my instinct is that this pattern isn't very strong evidence of 
an ODR violation and that the line-noise/inconvenience cost would far outweigh 
the ODR-detecting benefit.

So I think that the check should always accept matching declarations in //any// 
header (or at least that that should be the default policy and that users 
explicitly opt in to the stricter policy).

Again: thanks for all the work on this check - it's really great and I really 
appreciate it.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D73413/new/

https://reviews.llvm.org/D73413



_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to