aaron.ballman added inline comments.

================
Comment at: clang/test/Frontend/backend-attribute-error-warning-optimize.c:12
   foo(); // expected-error {{call to 'foo' declared with 'error' attribute: oh 
no foo}}
+         // expected-note@* {{In function 'baz'}}
   if (x())
----------------
nickdesaulniers wrote:
> cjdb wrote:
> > aaron.ballman wrote:
> > > nickdesaulniers wrote:
> > > > cjdb wrote:
> > > > > aaron.ballman wrote:
> > > > > > nickdesaulniers wrote:
> > > > > > > nickdesaulniers wrote:
> > > > > > > > aaron.ballman wrote:
> > > > > > > > > Instead of allowing this note to appear anywhere in the file, 
> > > > > > > > > I think it's better to use "bookmark" comments. e.g.,
> > > > > > > > > ```
> > > > > > > > > void baz() { // #baz_defn
> > > > > > > > > }
> > > > > > > > > 
> > > > > > > > > void foo() {
> > > > > > > > >   baz(); // expected-note@#baz_defn {{whatever note text}}
> > > > > > > > > }
> > > > > > > > > ```
> > > > > > > > > so that we're testing where the diagnostic is emitted. (This 
> > > > > > > > > is especially important given that the changes are about 
> > > > > > > > > location tracking.)
> > > > > > > > oh, that's new (to me)! will do
> > > > > > > It looks like the "bookmarks" don't work because the notes do not 
> > > > > > > line+col info. They follow the warning/error diagnostic which 
> > > > > > > does, on the bottom most call site.
> > > > > > > 
> > > > > > > The warning is supposed to be emitted on the callsite, not the 
> > > > > > > definition.
> > > > > > I still don't think this is reasonable for test coverage because 
> > > > > > this means we'll accept the note *anywhere* in the file. This makes 
> > > > > > it much too easy to regress the behavior accidentally (e.g., 
> > > > > > accidentally emit all the notes on line 1 of the file). I think we 
> > > > > > need *some* way to nail down where these notes are emitted in the 
> > > > > > test. However, I might be asking you to solve "please track 
> > > > > > location information through the backend" for that, so perhaps this 
> > > > > > isn't a reasonable request?
> > > > > > 
> > > > > > Also, CC @cjdb for awareness of how this kind of diagnostic is 
> > > > > > produced (Chris is working on an idea for how we emit diagnostics 
> > > > > > so we get better structured information from them, so knowing about 
> > > > > > this novel approach might impact that idea).
> > > > > Very interesting, thanks for the heads up! Cross-phase diagnostics 
> > > > > are definitely something I hadn't considered, and it **does** impact 
> > > > > the design (but not in a derailing way).
> > > > I think we're back at "please track location information through the 
> > > > backend".
> > > > 
> > > > That is the tradeoff with this approach; not measurably regressing 
> > > > performance when these attributes aren't used in source in exchange for 
> > > > Note diagnostics that lack precise line+col info, but in practice 
> > > > provide enough info for the user to understand what's going wrong where.
> > > > 
> > > > Your call if the tradeoff is worth it.
> > > Here's my thinking, please correct any misunderstandings I have:
> > > 
> > > * This functionality is primarily for use with `_FORTIFY_SOURCE` (but not 
> > > solely for that purpose), and that's an important security feature used 
> > > by glibc and the Linux kernel.
> > > * Security of C and C++ source code is Very Important, especially now 
> > > that various gov't agencies [1][2] are suggesting to not use C or C++ for 
> > > new projects specifically because the security posture of the languages 
> > > and their tools.
> > > * Therefore, the ergonomics of this functionality are quite important for 
> > > the intended use cases and the ecosystem as a whole.
> > > * Emitting only the function name but without location information does 
> > > not give a good user experience, especially in circumstances where the 
> > > function has multiple overloads, because it pushes the burden onto the 
> > > programmer to figure out which functions are actually involved in the 
> > > call chain. This issue is also present in C because of support for 
> > > `__attribute__((overloadable))` and is not just a C++ concern.
> > > * The compile-time performance costs of tracking this location 
> > > information are roughly 1%, and there are no runtime or binary size 
> > > overhead costs unless other functionality is enabled (such as emitting 
> > > debug info).
> > > * We can determine whether we need to enable this location tracking while 
> > > building the AST when we see one of these two attributes being used, so 
> > > we could enable this functionality selectively without burdening the user 
> > > with enabling it manually via a flag. (We could still consider giving the 
> > > user a flag to never track this even in the presence of the attributes if 
> > > a user had a compelling use case for it.)
> > > 
> > > If this is all reasonably accurate, I think it's worth seriously 
> > > considering accepting the costs. I don't want to increase our compile 
> > > time overhead, but at the same time, if ~1% is a "make or break" amount 
> > > of compile-time performance to lose, we should be addressing *that* issue 
> > > rather than hamstringing a user-facing diagnostic feature related to 
> > > security. It's hard to argue "that CVE was totally worth it because we 
> > > could then compile 1% faster". That said, perhaps others have a strong 
> > > contrary opinion they'd like to argue in favor of?
> > > 
> > > [1] 
> > > https://media.defense.gov/2022/Nov/10/2003112742/-1/-1/0/CSI_SOFTWARE_MEMORY_SAFETY.PDF
> > > [2] https://doi.org/10.6028/NIST.IR.8397
> > > 
> > My attitude is that we should prioritise ergonomics and usability over 
> > speed. Yes, having to wait a little longer for stuff to build sucks, but 
> > there's a clear ergonomics issue, which is motivation enough for me in this 
> > case (adding the CVE argument is the cherry on top). Folks either aren't 
> > going to use a feature if it's difficult to get right, or they're going to 
> > use it with large amounts of stress (and I don't think this is fair).
> > 
> > Moreover, I think there may be some opportunities we can look into to 
> > cancel out this pessimisation.
> > The compile-time performance costs of tracking this location information 
> > are roughly 1%, and there are no runtime or binary size overhead costs 
> > unless other functionality is enabled (such as emitting debug info).
> 
> (Just a note for passers-by: https://reviews.llvm.org/D141451 is not a 1% 
> performance regression; @aaron.ballman is referring to 
> https://discourse.llvm.org/t/rfc-improving-clangs-middle-and-back-end-diagnostics/69261?u=nickdesaulniers
>  Solution 2 where would could track all location info unconditionally)
> 
> That 1% measurement was for Linux kernel builds, as reported 
> https://discourse.llvm.org/t/rfc-improving-clangs-middle-and-back-end-diagnostics/69261?u=nickdesaulniers,
>  and it gets worse for other projects. The ones tracked on llvm compile time 
> tracker were much worse `geomean slowdown of 4.77% of wall time to build 
> various open source projects (ranging from 1.95% to 6.9%). ` 
> http://llvm-compile-time-tracker.com/compare.php?from=c1125ae5b05f5e23c7e61e85afb359e095444d18&to=059eb5565a580f7ea37c2d62dcffdb7401e12e55&stat=wall-time
> 
> I did try tracking //only// source locations of //calls//.
> https://discourse.llvm.org/t/rfc-improving-clangs-middle-and-back-end-diagnostics/69261/11?u=nickdesaulniers
> That was faster, but I only measured Linux kernel build times, and not llvm 
> compile time tracker times.  I could do a similar experiment (test on llvm 
> compile time tracker) if that helps?
> 
> > We can determine whether we need to enable this location tracking while 
> > building the AST when we see one of these two attributes being used
> 
> I had looked into that; I wasn't able to get something like that working. 
> IIRC, it's hard to modify codegen during parsing, or the decision on what 
> kind of DICompileUnit to emit had already been made.  Or maybe I was trying 
> to do it from CodeGen after we already started producing IR, I don't remember 
> now.  Are we guaranteed to never start CodeGen of IR until after Sema is 
> complete?  I could look into this again/try harder though.
> 
> > We could still consider giving the user a flag to never track this even in 
> > the presence of the attributes if a user had a compelling use case for it.
> 
> Yeah, I'm concerned with the interaction between <whatever ships> and `-g0`. 
> If <whatever ships> == location tracking via debug info, then the presence of 
> `-g0` may change our diagnostics!
> 
> > That said, perhaps others have a strong contrary opinion they'd like to 
> > argue in favor of?
> 
> @efriedma had some such feedback on the RFC: 
> https://discourse.llvm.org/t/rfc-improving-clangs-middle-and-back-end-diagnostics/69261/2?u=nickdesaulniers
> That 1% measurement was for Linux kernel builds, as reported 
> https://discourse.llvm.org/t/rfc-improving-clangs-middle-and-back-end-diagnostics/69261?u=nickdesaulniers,
>  and it gets worse for other projects. The ones tracked on llvm compile time 
> tracker were much worse geomean slowdown of 4.77% of wall time to build 
> various open source projects (ranging from 1.95% to 6.9%).  
> http://llvm-compile-time-tracker.com/compare.php?from=c1125ae5b05f5e23c7e61e85afb359e095444d18&to=059eb5565a580f7ea37c2d62dcffdb7401e12e55&stat=wall-time

Oh, that makes me sad. A 5% slowdown is quite the hit for this.

> I did try tracking only source locations of calls.
https://discourse.llvm.org/t/rfc-improving-clangs-middle-and-back-end-diagnostics/69261/11?u=nickdesaulniers
That was faster, but I only measured Linux kernel build times, and not llvm 
compile time tracker times. I could do a similar experiment (test on llvm 
compile time tracker) if that helps?

I don't want to send you on a wild goose chase, but if we can find a solution 
that gets us closer to 1% (or less!), that would make it easier to decide what 
to do here. I think 5% is... probably too much, but then I wonder "how do we 
make Fortify work?".

> I had looked into that; I wasn't able to get something like that working. 
> IIRC, it's hard to modify codegen during parsing, 

Ah, that I'd believe (I forgot, language options are basically const by the 
time we start parsing). However, could we squirrel this away on `ASTContext` 
instead?

>> That said, perhaps others have a strong contrary opinion they'd like to 
>> argue in favor of?
> @efriedma had some such feedback on the RFC: 
> https://discourse.llvm.org/t/rfc-improving-clangs-middle-and-back-end-diagnostics/69261/2?u=nickdesaulniers

Ah, thank you for reminding me of that! However, I disagree with this:

> (Most code doesn’t normally contain either inline asm or warning/error 
> attributes, and I’m not really convinced by other potential diagnostics.)

because this attribute is also used by glibc: 
https://codebrowser.dev/glibc/glibc/misc/sys/cdefs.h.html#203 and that macro 
does get used: https://codebrowser.dev/glibc/glibc/stdlib/bits/stdlib.h.html#33 
or https://codebrowser.dev/glibc/glibc/posix/bits/unistd.h.html#32 (etc), so I 
think this could potentially be significantly wider than just the Linux kernel. 
Does that change your opinion at all @efriedma?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D141451/new/

https://reviews.llvm.org/D141451

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to