Re: static_branch/jump_label vs branch merging
On Fri, 2021-04-09 at 13:12 +0200, Peter Zijlstra wrote: > On Fri, Apr 09, 2021 at 11:57:22AM +0200, Ard Biesheuvel wrote: > > On Thu, 8 Apr 2021 at 18:53, Peter Zijlstra > > wrote: > > > > Is there *any* way in which we can have the compiler recognise > > > that the > > > asm_goto only depends on its arguments and have it merge the > > > branches > > > itself? > > > > > > I do realize that asm-goto being volatile this is a fairly huge > > > ask, but > > > I figured I should at least raise the issue, if only to raise > > > awareness. > > > > > > > Wouldn't that require the compiler to interpret the contents of the > > asm() block? > > Yeah, this is more or less asking for ponies :-) One option would be > some annotation that conveys the desired semantics without it having > to > untangle the mess in the asm block. > > The thing the compiler needs to know is that the branch is constant > for > any @key, and hence allow the obvious optimizations. I'm not sure if > this is something compiler folks would be even willing to consider, > but > I figured asking never hurts. > Sorry if this is a dumb question, but does the function attribute: __attribute__ ((pure)) help here? It's meant to allow multiple calls to a predicate to be merged - though I'd be nervous of using it here, the predicate isn't 100% pure, since AIUI the whole point of what you've built is for predicates that very rarely change - but can change occasionally. Hope this is constructive Dave
Re: static_branch/jump_label vs branch merging
On Fri, 2021-04-09 at 15:13 +0200, Peter Zijlstra wrote: > On Fri, Apr 09, 2021 at 03:01:50PM +0200, Peter Zijlstra wrote: > > On Fri, Apr 09, 2021 at 02:03:46PM +0200, Peter Zijlstra wrote: > > > On Fri, Apr 09, 2021 at 07:55:42AM -0400, David Malcolm wrote: > > > > > > Sorry if this is a dumb question, but does the function > > > > attribute: > > > > __attribute__ ((pure)) > > > > help here? It's meant to allow multiple calls to a predicate > > > > to be > > > > merged - though I'd be nervous of using it here, the predicate > > > > isn't > > > > 100% pure, since AIUI the whole point of what you've built is > > > > for > > > > predicates that very rarely change - but can change > > > > occasionally. > > > > > > I actually tried that, but it doesn't seem to work. Given the > > > function > > > arguments are all compile time constants it should DTRT AFAICT, > > > but > > > alas. > > > > FWIW, I tried the below patch and GCC-10.2.1 on current tip/master. > > I also just tried __attribute__((__const__)), which is stronger still > than __pure__ and that's also not working :/ > > I then also tried to replace the __buildin_types_compatible_p() magic > in > static_branch_unlikely() with _Generic(), but still no joy. > [..snip...] You tried __pure on arch_static_branch; did you try it on static_branch_unlikely? With the caveat that my knowledge of GCC's middle-end is mostly about implementing warnings, rather than optimization, I did some experimentation, with gcc trunk on x86_64 FWIW. Given: int __attribute__((pure)) foo(void); int t(void) { int a; if (foo()) a++; if (foo()) a++; if (foo()) a++; return a; } At -O1 and above this is optimized to a single call to foo, returning 0 or 3 accordingly. -fdump-tree-all shows that it's the "fre1" pass that eliminates the subsequent calls to foo, replacing them with reuses of the result of the first call. This is in gcc/tree-ssa-sccvn.c, a value-numbering pass. I think you want to somehow "teach" the compiler that: static_branch_unlikely(&sched_schedstats) is "pure-ish", that for some portion of the surrounding code that you want the result to be treated as pure - though I suspect compiler maintainers with more experience than me are thinking "but which portion? what is it safe to assume, and what will users be annoyed about if we optimize away? what if t itself is inlined somewhere?" and similar concerns. Or maybe the asm stmt itself could somehow be marked as pure??? (with similar concerns about semantics as above) Hope this is constructive Dave
Re: static_branch/jump_label vs branch merging
On Fri, 2021-04-09 at 20:40 +0200, Peter Zijlstra wrote: > On Fri, Apr 09, 2021 at 09:48:33AM -0400, David Malcolm wrote: > > You tried __pure on arch_static_branch; did you try it on > > static_branch_unlikely? > > static_branch_unlikely() is a CPP macro that expands to a statement > expression, or as with the later patch, a _Generic(). I'm not sure > how > to apply the attribute to either of them since it is a function > attribute. > > I was hoping the attribute would percolate through, so to speak. > > > With the caveat that my knowledge of GCC's middle-end is mostly > > about > > implementing warnings, rather than optimization, I did some > > experimentation, with gcc trunk on x86_64 FWIW. > > > > Given: > > > > int __attribute__((pure)) foo(void); > > > > int t(void) > > { > > int a; > > if (foo()) > > a++; > > if (foo()) > > a++; > > if (foo()) > > a++; > > return a; > > } > > > > At -O1 and above this is optimized to a single call to foo, > > returning 0 > > or 3 accordingly. > > > > -fdump-tree-all shows that it's the "fre1" pass that eliminates the > > subsequent calls to foo, replacing them with reuses of the result > > of > > the first call. > > > > This is in gcc/tree-ssa-sccvn.c, a value-numbering pass. > > > > I think you want to somehow "teach" the compiler that: > > static_branch_unlikely(&sched_schedstats) > > is "pure-ish", that for some portion of the surrounding code that > > you > > want the result to be treated as pure - though I suspect compiler > > maintainers with more experience than me are thinking "but which > > portion? what is it safe to assume, and what will users be annoyed > > about if we optimize away? what if t itself is inlined somewhere?" > > and > > similar concerns. > > Right, pure or even const. As to the scope, as wide as possible. It > literally is a global constant, the value returned is the same > everywhere. [Caveat: I'm a gcc developer, not a kernel expert] But it's not *quite* a global constant, or presumably you would be simply using a global constant, right? As the optimizer gets smarter, you don't want to have it one day decide that actually it really is constant, and optimize away everything at compile-time (e.g. when LTO is turned on, or whatnot). I get the impression that you're resorting to assembler because you're pushing beyond what the C language can express. Taking things to a slightly higher level, am I right in thinking that what you're trying to achieve is a control flow construct that almost always takes one of the given branches, but which can (very rarely) be switched to permanently take one of the other branches, and that you want the lowest possible overhead for the common case where the control flow hasn't been touched yet? (and presumably little overhead for when it has been?)... and that you want to be able to merge repeated such conditionals. Perhaps a __builtin_ to hint that a conditional should work that way (analogous to __builtin_expect)? I can imagine a way of implementing such a construct in gcc's gimple and RTL representations, but it would be a ton of work (and I have a full plate already) Or maybe another way of thinking about it is that you're reading a value and you would like the compiler to amortize away repeated reads of the value (perhaps just within the current function). It's kind of the opposite of "volatile" - something that the user is happy for the compiler to treat as not changing much, as opposed to something the user is warning the compiler about changing from under it. A "const-ish" value? Sorry if I'm being incoherent; I'm kind of thinking aloud here. Hope this is constructive Dave > > All we need GCC to do for the static_branch construct is to emit both > branches; that is, it must not treat the result as a constant and > elide > the other branches. But it can consider consecutive calls (as far and > wide as it wants) to return the same value. > > > Or maybe the asm stmt itself could somehow be marked as pure??? > > (with > > similar concerns about semantics as above) > > Yeah, not sure, someone with more clue will have to inform us what, > if > anything more than marking it either pure or const is required. > Perhaps > that attribute is sufficient and the compiler just isn't optimizing > for > an unrelated reason. > > Regards, > > Peter >
Re: static_branch/jump_label vs branch merging
On Fri, 2021-04-09 at 22:09 +0200, Peter Zijlstra wrote: > On Fri, Apr 09, 2021 at 03:21:49PM -0400, David Malcolm wrote: > > [Caveat: I'm a gcc developer, not a kernel expert] > > > > But it's not *quite* a global constant, or presumably you would be > > simply using a global constant, right? As the optimizer gets > > smarter, > > you don't want to have it one day decide that actually it really is > > constant, and optimize away everything at compile-time (e.g. when > > LTO > > is turned on, or whatnot). > > Right; as I said, the result is not a constant, but any invocation > ever, > will return the same result. Small but subtle difference :-) > > > I get the impression that you're resorting to assembler because > > you're > > pushing beyond what the C language can express. > > Of course :-) I tend to always push way past what C considers > sane. > Lets say I'm firmly in the C-as-Optimizing-Assembler camp :-) Yeah, I got that :) > > Taking things to a slightly higher level, am I right in thinking > > that > > what you're trying to achieve is a control flow construct that > > almost > > always takes one of the given branches, but which can (very rarely) > > be > > switched to permanently take one of the other branches, and that > > you > > want the lowest possible overhead for the common case where the > > control flow hasn't been touched yet? > > Correct, that's what it is. We do runtime code patching to flip the > branch if/when needed. We've been doing this for many many years now. > > The issue of today is all this clever stuff defeating some simple > optimizations. It's certainly clever - though, if you'll forgive me, that's not always a good thing :) > > (and presumably little overhead for when it > > has been?)... and that you want to be able to merge repeated such > > conditionals. > > This.. So the 'static' branches have been upstream and in use ever > since > GCC added asm-goto, it was in fact the driving force to get asm-goto > implemented. This was 2010 according to git history. > > So we emit, using asm goto, either a "NOP5" or "JMP.d32" (x86 > speaking), > and a special section entry into which we encode the key address and > the > instruction address and the jump target. > > GCC, not knowing what the asm does, only sees the 2 edges and all is > well. > > Then, at runtime, when we decide we want the other edge for a given > key, > we iterate our section and rewrite the code to either nop5 or jmp.d32 > with the correct jump target. > > > It's kind of the opposite of "volatile" - something that the user > > is > > happy for the compiler to treat as not changing much, as opposed to > > something the user is warning the compiler about changing from > > under > > it. A "const-ish" value? > > Just so. Encoded in text, not data. > > > Sorry if I'm being incoherent; I'm kind of thinking aloud here. > > No problem, we're way outside of what is generally considered normal, > and I did somewhat assume people were familiar with our 'dodgy' > construct (some on this list are more than others). > > I hope it's all a little clearer now. Yeah. This is actually on two mailing lists; I'm only subscribed to linux-toolchains, which AIUI is about sharing ideas between Linux and the toolchains. You've built a very specific thing out of asm-goto to fulfil the tough requirements you outlined above - as well as the nops, there's a thing in another section to contend with. How to merge these asm-goto constructs? Doing so feels very special-case to the kernel and not something that other GCC users would find useful. I can imagine a GCC plugin that implemented a custom optimization pass for that - basically something that spots the asm-gotos in the gimple IR and optimizes away duplicates by replacing them with jumps, but having read about Linus's feelings about GCC plugins recently: https://lwn.net/Articles/851090/ I suspect that that isn't going to fly (and if you're going down the route of adding an optimization pass via a plugin, there's probably a way to do that that doesn't involve asm). In theory, something to optimize the asm-gotos could be relatively simple, but that said, we don't really have a GCC plugin API; all of our internal APIs are exposed, and are liable to change from release to release, which I know is a pain (I've managed to break one of my own plugins with one of my own API changes at least once). Hope this is constructive Dave
Re: [PATCH 05/11] perf parse-event: Fix memory leak in evsel->unit
On Tue, 2020-09-15 at 11:59 -0700, Ian Rogers wrote: > On Tue, Sep 15, 2020 at 5:19 AM Arnaldo Carvalho de Melo > wrote: > > Em Tue, Sep 15, 2020 at 12:18:13PM +0900, Namhyung Kim escreveu: > > > The evsel->unit borrows a pointer of pmu event or alias instead > > > of > > > owns a string. But tool event (duration_time) passes a result of > > > strdup() caused a leak. > > > > > > It was found by ASAN during metric test: > > > > Thanks, applied. > > Thanks Namhyung and Arnaldo, just to raise a meta point. A lot of the > parse-events asan failures were caused by a lack of strdup causing > frees of string literals. It seems we're now adding strdup > defensively > but introducing memory leaks. Could we be doing this in a smarter > way? > For C++ I'd likely use std::string and walk away. For perf code the > best source of "ownership" I've found is to look at the "delete" > functions and figure out ownership from what gets freed there - this > can be burdensome. For strings, the code is also using strbuf and > asprintf. One possible improvement could be to document ownership > next > to the struct member variable declarations. Another idea would be to > declare a macro whose usage would look like: > > struct evsel { > ... > OWNER(char *name, "this"); > ... > UNOWNED(const char *unit); > ... > > Maybe then we could get a static analyzer to complain if a literal > were assigned to an owned struct variable. Perhaps if a strdup were > assigned to an UNOWNED struct variable perhaps it could warn too, as > presumably the memory allocation is a request to own the memory. > > There was a talk about GCC's -fanalyzer option doing malloc/free > checking at Linux plumbers 2 weeks ago: > https://linuxplumbersconf.org/event/7/contributions/721/attachments/542/961/2020-LPC-analyzer-talk.pdf > I added David Malcolm, the LPC presenter, as he may have ideas on how > we could do this in a better way. Hi Ian. Some ideas (with the caveat that I'm a GCC developer, and not a regular on LKML): can you capture the ownership status in the type system? I'm brainstorming here but how about: typedef char *owned_string_t; typedef const char *borrowed_string_t; This would at least capture the intent in human-readable form, and *might* make things more amenable to checking by a machine. It's also less macro cruft. I take it that capturing the ownership status with a runtime flag next to the pointer in a struct is too expensive for your code? Some notes on -fanalyzer: Caveat: The implementation of -fanalyzer in gcc 10 is an early prototype and although it has found its first CVE I don't recommend it for use "in anger" yet; I'm working on getting it more suitable for general usage for C in gcc 11. (mostly scaling issues and other bugfixing) -fanalyzer associates state machines with APIs; one of these state machines implements leak detection for malloc, along with e.g. double- free detection. I'm generalizing this checker to other acquire/release APIs: I have a semi-working patch under development (targeting GCC 11) that exposes this via a fndecl attribute, currently named "deallocated_by", so that fn decls can be labeled e.g.: extern void foo_release (foo *); extern foo *foo_acquire (void) __attribute__((deallocated_by(foo_release)); and have -fanalyzer detect leaks, double-releases, use-after-release, failure to check for NULL (alloc failure) etc. Ultimately this attribute might land in the libc header for strdup (and friends), but I can also special-case strdup so that the analyzer "knows" that the result needs to be freed if non-NULL (and that it can fail and return NULL). Hope this is constructive Dave [...]