On Thu, Feb 25, 2016 at 6:33 PM, Torvald Riegel <trie...@redhat.com> wrote: > On Wed, 2016-02-24 at 13:14 +0100, Richard Biener wrote: >> On Tue, Feb 23, 2016 at 8:38 PM, Torvald Riegel <trie...@redhat.com> wrote: >> > I'd like to know, based on the GCC experience, how important we consider >> > optimizations that may turn data dependencies of pointers into control >> > dependencies. I'm thinking about all optimizations or transformations >> > that guess that a pointer might have a specific value, and then create >> > (specialized) code that assumes this value that is only executed if the >> > pointer actually has this value. For example: >> > >> > int d[2] = {23, compute_something()}; >> > >> > int compute(int v) { >> > if (likely(v == 23)) return 23; >> > else <lots of stuff>; >> > } >> > >> > int bar() { >> > int *p = ptr.load(memory_order_consume); >> > size_t reveal_that_p_is_in_d = p - d[0]; >> > return compute(*p); >> > } >> > >> > Could be transformed to (after inlining compute(), and specializing for >> > the likely path): >> > >> > int bar() { >> > int *p = ptr.load(memory_order_consume); >> > if (p == d) return 23; >> > else <lots of stuff(*p)>; >> > } >> >> Note that if a user writes >> >> if (p == d) >> { >> ... do lots of stuff via p ... >> } >> >> GCC might rewrite accesses to p as accesses to d and thus expose >> those opportunities. Is that a transform that isn't valid then or is >> the code written by the user (establishing the equivalency) to blame? > > In the context of this memory_order_consume proposal, this transform > would be valid because the program has already "reveiled" what value p > has after the branch has been taken. > >> There's a PR where this kind of equivalencies lead to unexpected (wrong?) >> points-to results for example. >> >> > Other potential examples that come to mind are de-virtualization, or >> > feedback-directed optimizations that has observed at runtime that a >> > certain pointer is likely to be always equal to some other pointer (eg., >> > if p is almost always d[0], and specializing for that). >> >> That's the cases that are quite important in practice. > > Could you quantify this somehow, even if it's a very rough estimate? > I'm asking because it's significant and widely used, then this would > require users or compiler implementors to make a difficult trade-off > (ie, do you want mo_consume performance or performance through those > other optimizations?).
Probably resoved by your followup on how the transform is safe anyway. >> > Also, it would be interesting to me to know how often we may turn data >> > dependencies into control dependencies in cases where this doesn't >> > affect performance significantly. >> >> I suppose we try to avoid that but can we ever know for sure? Like >> speculative devirtualization does this (with the intent that it _does_ >> matter, >> of course). >> >> I suppose establishing non-dependence isn't an issue, like with the >> vectorizer adding runtime dependence checks and applying versioning >> to get a vectorized and a not vectorized loop (in case there are >> dependences)? > > I'm not sure I understand you correctly. Do you have a brief example, > perhaps? For mo_consume and its data dependencies, if there might be a > dependence, the compiler would have to preserve it; but I guess that > both a vectorized loop an one that accessses each element separately > would preserve dependences because it's doing those accesses, and they > depend on the input data. > OTOH, peraps HW vector instructions don't get the ordering guarantees > from data dependences -- Paul, do you know of any such cases? A brief example would be for void foo (int *a, int *b, int n) { for (int i = 0; i < n; ++i) a[i] = b[i]; } which we can vectorize like if (a + n < b || b + n < a) { vectorized loop } else { not vectorized loop } note how we're not establishing equivalences between pointers but non-dependence vs. possible dependence. >> > The background for this question is Paul McKenney's recently updated >> > proposal for a different memory_order_consume specification: >> > http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0190r0.pdf >> > >> > In a nutshell, this requires a compiler to either prove that a pointer >> > value is not carrying a dependency (simplified, its value somehow >> > originates from a memory_order_consume load), or it has to >> > conservatively assume that it does; if it does, the compiler must not >> > turn data dependencies into control dependencies in generated code. >> > (The data dependencies, in contrast to control dependencies, enforce >> > memory ordering on archs such as Power and ARM; these orderings than >> > allow for not having to use an acquire HW barrier in the generated >> > code.) >> > >> > Given that such a proof will likely be hard for a compiler (dependency >> > chains propagate through assignments to variables on the heap and stack, >> > chains are not marked in the code, and points-to analysis can be hard), >> > a compiler faces a trade-off between either: >> > (1) trying to support this memory_order_consume specification and likely >> > disallowing all transformations that change data dependencies into >> > control dependencies, or >> > (2) not support the proposal by simply emitting memory_order_acquire >> > code, but get no new constraints on transformations in return (ie, what >> > we do for memory_order_consume today). >> > >> > A compiler could let users make this choice, but this will be hard for >> > users too, and the compiler would still have to pick a default. >> > >> > Therefore, it would be good to know how important such transformations >> > or optimizations are in practice. If they are, it either means somewhat >> > slower code everywhere (or at least having to change much in todays >> > compilers), or not supporting the new memory_order_consume (at least not >> > in the default setting of the compiler). >> >> IMHO all the memory order semantics are too complicated already so what's >> the reason to complicate it even more...? > > The hardware memory models of archs like Power an ARM guarantee ordering > for accesses that have a data dependency, thus allowing things like > traversing a concurrently modified list to not have to use acquire > memory barriers (ie, a programmer could use memory_order_consume instead > of memory_order_acquire). The Linux kernel uses this heavily in RCU, > for example. It doesn't matter for archs like x86 where acquire / > release barriers are implicit. That's memory commit ordering on a single CPU core, right? How would it be ever valid to change commit order when there is a data dependence? > Paul is looking for a solution that would make both the compiler and the > kernel happy; being able to standardize this in C/C++ would be another > good outcome from his perspective, I believe. For GCC, the former might > also have benefits because we'd have clearer requirements for something > the Linux kernel is using heavily. > The Linux kernel guys seem to be opposed to marking the dependency > chains a program relies on (except for the initial memory_order_consume > load). I pointed out that this proposal would disallow the > transformations I have asked about in this email, and Paul wanted to > know how important they are. So I asked here ... :) > > If the optimizations that would get disallowed in practice are important > for performance, my inclination would be that we might need a different > proposal for C++ standardization. For the kernel use case only, it > could nonetheless still be interesting for us to further investigate it; > if the kernel doesn't need these optimizations, we'd at least have a > better understanding of what the kernel expects and what we deliver. > >