nickdesaulniers added inline comments.

================
Comment at: clang/test/Frontend/backend-attribute-error-warning-optimize.c:12
   foo(); // expected-error {{call to 'foo' declared with 'error' attribute: oh 
no foo}}
+         // expected-note@* {{In function 'baz'}}
   if (x())
----------------
cjdb wrote:
> aaron.ballman wrote:
> > nickdesaulniers wrote:
> > > cjdb wrote:
> > > > aaron.ballman wrote:
> > > > > nickdesaulniers wrote:
> > > > > > nickdesaulniers wrote:
> > > > > > > aaron.ballman wrote:
> > > > > > > > Instead of allowing this note to appear anywhere in the file, I 
> > > > > > > > think it's better to use "bookmark" comments. e.g.,
> > > > > > > > ```
> > > > > > > > void baz() { // #baz_defn
> > > > > > > > }
> > > > > > > > 
> > > > > > > > void foo() {
> > > > > > > >   baz(); // expected-note@#baz_defn {{whatever note text}}
> > > > > > > > }
> > > > > > > > ```
> > > > > > > > so that we're testing where the diagnostic is emitted. (This is 
> > > > > > > > especially important given that the changes are about location 
> > > > > > > > tracking.)
> > > > > > > oh, that's new (to me)! will do
> > > > > > It looks like the "bookmarks" don't work because the notes do not 
> > > > > > line+col info. They follow the warning/error diagnostic which does, 
> > > > > > on the bottom most call site.
> > > > > > 
> > > > > > The warning is supposed to be emitted on the callsite, not the 
> > > > > > definition.
> > > > > I still don't think this is reasonable for test coverage because this 
> > > > > means we'll accept the note *anywhere* in the file. This makes it 
> > > > > much too easy to regress the behavior accidentally (e.g., 
> > > > > accidentally emit all the notes on line 1 of the file). I think we 
> > > > > need *some* way to nail down where these notes are emitted in the 
> > > > > test. However, I might be asking you to solve "please track location 
> > > > > information through the backend" for that, so perhaps this isn't a 
> > > > > reasonable request?
> > > > > 
> > > > > Also, CC @cjdb for awareness of how this kind of diagnostic is 
> > > > > produced (Chris is working on an idea for how we emit diagnostics so 
> > > > > we get better structured information from them, so knowing about this 
> > > > > novel approach might impact that idea).
> > > > Very interesting, thanks for the heads up! Cross-phase diagnostics are 
> > > > definitely something I hadn't considered, and it **does** impact the 
> > > > design (but not in a derailing way).
> > > I think we're back at "please track location information through the 
> > > backend".
> > > 
> > > That is the tradeoff with this approach; not measurably regressing 
> > > performance when these attributes aren't used in source in exchange for 
> > > Note diagnostics that lack precise line+col info, but in practice provide 
> > > enough info for the user to understand what's going wrong where.
> > > 
> > > Your call if the tradeoff is worth it.
> > Here's my thinking, please correct any misunderstandings I have:
> > 
> > * This functionality is primarily for use with `_FORTIFY_SOURCE` (but not 
> > solely for that purpose), and that's an important security feature used by 
> > glibc and the Linux kernel.
> > * Security of C and C++ source code is Very Important, especially now that 
> > various gov't agencies [1][2] are suggesting to not use C or C++ for new 
> > projects specifically because the security posture of the languages and 
> > their tools.
> > * Therefore, the ergonomics of this functionality are quite important for 
> > the intended use cases and the ecosystem as a whole.
> > * Emitting only the function name but without location information does not 
> > give a good user experience, especially in circumstances where the function 
> > has multiple overloads, because it pushes the burden onto the programmer to 
> > figure out which functions are actually involved in the call chain. This 
> > issue is also present in C because of support for 
> > `__attribute__((overloadable))` and is not just a C++ concern.
> > * The compile-time performance costs of tracking this location information 
> > are roughly 1%, and there are no runtime or binary size overhead costs 
> > unless other functionality is enabled (such as emitting debug info).
> > * We can determine whether we need to enable this location tracking while 
> > building the AST when we see one of these two attributes being used, so we 
> > could enable this functionality selectively without burdening the user with 
> > enabling it manually via a flag. (We could still consider giving the user a 
> > flag to never track this even in the presence of the attributes if a user 
> > had a compelling use case for it.)
> > 
> > If this is all reasonably accurate, I think it's worth seriously 
> > considering accepting the costs. I don't want to increase our compile time 
> > overhead, but at the same time, if ~1% is a "make or break" amount of 
> > compile-time performance to lose, we should be addressing *that* issue 
> > rather than hamstringing a user-facing diagnostic feature related to 
> > security. It's hard to argue "that CVE was totally worth it because we 
> > could then compile 1% faster". That said, perhaps others have a strong 
> > contrary opinion they'd like to argue in favor of?
> > 
> > [1] 
> > https://media.defense.gov/2022/Nov/10/2003112742/-1/-1/0/CSI_SOFTWARE_MEMORY_SAFETY.PDF
> > [2] https://doi.org/10.6028/NIST.IR.8397
> > 
> My attitude is that we should prioritise ergonomics and usability over speed. 
> Yes, having to wait a little longer for stuff to build sucks, but there's a 
> clear ergonomics issue, which is motivation enough for me in this case 
> (adding the CVE argument is the cherry on top). Folks either aren't going to 
> use a feature if it's difficult to get right, or they're going to use it with 
> large amounts of stress (and I don't think this is fair).
> 
> Moreover, I think there may be some opportunities we can look into to cancel 
> out this pessimisation.
> The compile-time performance costs of tracking this location information are 
> roughly 1%, and there are no runtime or binary size overhead costs unless 
> other functionality is enabled (such as emitting debug info).

(Just a note for passers-by: https://reviews.llvm.org/D141451 is not a 1% 
performance regression; @aaron.ballman is referring to 
https://discourse.llvm.org/t/rfc-improving-clangs-middle-and-back-end-diagnostics/69261?u=nickdesaulniers
 Solution 2 where would could track all location info unconditionally)

That 1% measurement was for Linux kernel builds, as reported 
https://discourse.llvm.org/t/rfc-improving-clangs-middle-and-back-end-diagnostics/69261?u=nickdesaulniers,
 and it gets worse for other projects. The ones tracked on llvm compile time 
tracker were much worse `geomean slowdown of 4.77% of wall time to build 
various open source projects (ranging from 1.95% to 6.9%). ` 
http://llvm-compile-time-tracker.com/compare.php?from=c1125ae5b05f5e23c7e61e85afb359e095444d18&to=059eb5565a580f7ea37c2d62dcffdb7401e12e55&stat=wall-time

I did try tracking //only// source locations of //calls//.
https://discourse.llvm.org/t/rfc-improving-clangs-middle-and-back-end-diagnostics/69261/11?u=nickdesaulniers
That was faster, but I only measured Linux kernel build times, and not llvm 
compile time tracker times.  I could do a similar experiment (test on llvm 
compile time tracker) if that helps?

> We can determine whether we need to enable this location tracking while 
> building the AST when we see one of these two attributes being used

I had looked into that; I wasn't able to get something like that working. IIRC, 
it's hard to modify codegen during parsing, or the decision on what kind of 
DICompileUnit to emit had already been made.  Or maybe I was trying to do it 
from CodeGen after we already started producing IR, I don't remember now.  Are 
we guaranteed to never start CodeGen of IR until after Sema is complete?  I 
could look into this again/try harder though.

> We could still consider giving the user a flag to never track this even in 
> the presence of the attributes if a user had a compelling use case for it.

Yeah, I'm concerned with the interaction between <whatever ships> and `-g0`. If 
<whatever ships> == location tracking via debug info, then the presence of 
`-g0` may change our diagnostics!

> That said, perhaps others have a strong contrary opinion they'd like to argue 
> in favor of?

@efriedma had some such feedback on the RFC: 
https://discourse.llvm.org/t/rfc-improving-clangs-middle-and-back-end-diagnostics/69261/2?u=nickdesaulniers


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D141451/new/

https://reviews.llvm.org/D141451

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to