On Thu, Feb 25, 2016 at 6:33 PM, Torvald Riegel <trie...@redhat.com> wrote:
> On Wed, 2016-02-24 at 13:14 +0100, Richard Biener wrote:
>> On Tue, Feb 23, 2016 at 8:38 PM, Torvald Riegel <trie...@redhat.com> wrote:
>> > I'd like to know, based on the GCC experience, how important we consider
>> > optimizations that may turn data dependencies of pointers into control
>> > dependencies.  I'm thinking about all optimizations or transformations
>> > that guess that a pointer might have a specific value, and then create
>> > (specialized) code that assumes this value that is only executed if the
>> > pointer actually has this value.  For example:
>> >
>> > int d[2] = {23, compute_something()};
>> >
>> > int compute(int v) {
>> >   if (likely(v == 23)) return 23;
>> >   else <lots of stuff>;
>> > }
>> >
>> > int bar() {
>> >   int *p = ptr.load(memory_order_consume);
>> >   size_t reveal_that_p_is_in_d = p - d;
>> >   return compute(*p);
>> > }
>> >
>> > Could be transformed to (after inlining compute(), and specializing for
>> > the likely path):
>> >
>> > int bar() {
>> >   int *p = ptr.load(memory_order_consume);
>> >   if (p == d) return 23;
>> >   else <lots of stuff(*p)>;
>> > }
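
To spell out the hazard (illustrative comments, not compiler output):
on the fast path of the transformed bar() the dependent load through p
is gone entirely, and only a control dependency remains, which on
Power/ARM does not order subsequent loads the way an address
dependency does:

  int bar() {
    int *p = ptr.load(memory_order_consume);
    if (p == d)              // control dependency only
      return 23;             // the constant replaces the load *p, so
                             // the hardware ordering that the address
                             // dependency provided is lost on this path
    else
      return compute(*p);    // address dependency still intact here
  }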
>>
>> Note that if a user writes
>>
>>   if (p == d)
>>    {
>>      ... do lots of stuff via p ...
>>    }
>>
>> GCC might rewrite accesses to p as accesses to d and thus expose
>> those opportunities.  Is that a transform that isn't valid then or is
>> the code written by the user (establishing the equivalency) to blame?
>
> In the context of this memory_order_consume proposal, this transform
> would be valid because the program has already "revealed" what value p
> has after the branch has been taken.
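
Concretely, the rewrite in question (illustrative, not a claim about
what current GCC emits; use() stands in for arbitrary code):

  // as written by the user:
  if (p == d)
    {
      use (p[0], p[1]);   // accesses via p
    }

  // after GCC exploits the equivalence established by the comparison:
  if (p == d)
    {
      use (d[0], d[1]);   // same accesses via d: the data dependency
                          // on p is gone inside the branch, leaving
                          // only the control dependency of the test
    }

Under the proposal this stays valid only because the p == d comparison
itself already reveals p's value, which is Torvald's point just above.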
>
>> There's a PR where this kind of equivalence leads to unexpected (wrong?)
>> points-to results for example.
>>
>> > Other potential examples that come to mind are de-virtualization, or
>> > feedback-directed optimizations that have observed at runtime that a
>> > certain pointer is likely to be always equal to some other pointer (eg.,
>> > if p is almost always equal to &d[0], and specializing for that).
>>
>> Those are the cases that are quite important in practice.
>
> Could you quantify this somehow, even if it's a very rough estimate?
> I'm asking because if it's significant and widely used, this would
> require users or compiler implementors to make a difficult trade-off
> (ie, do you want mo_consume performance or performance through those
> other optimizations?).

Probably resolved by your followup on how the transform is safe anyway.

>> > Also, it would be interesting to me to know how often we may turn data
>> > dependencies into control dependencies in cases where this doesn't
>> > affect performance significantly.
>>
>> I suppose we try to avoid that, but can we ever know for sure?  Like
>> speculative devirtualization does this (with the intent that it _does_
>> matter, of course).
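
For reference, speculative devirtualization guards a direct call with a
cheap identity test; roughly (a hand-written sketch with hypothetical
types, using typeid as a stand-in for the vtable-pointer compare the
compiler actually emits):

  #include <typeinfo>

  struct Base { virtual void f (); };
  struct Foo : Base { void f () override; };

  void call (Base *obj)
  {
    if (typeid (*obj) == typeid (Foo))  // guess: obj is almost always
      static_cast<Foo *> (obj)->f ();   // a Foo; direct, inlinable call
    else
      obj->f ();                        // fallback: indirect call
  }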
>>
>> I suppose establishing non-dependence isn't an issue, like with the
>> vectorizer adding runtime dependence checks and applying versioning
>> to get a vectorized and a non-vectorized loop (in case there are
>> dependences)?
>
> I'm not sure I understand you correctly.  Do you have a brief example,
> perhaps?  For mo_consume and its data dependencies, if there might be a
> dependence, the compiler would have to preserve it; but I guess that
> both a vectorized loop and one that accesses each element separately
> would preserve dependences because it's doing those accesses, and they
> depend on the input data.
> OTOH, perhaps HW vector instructions don't get the ordering guarantees
> from data dependences -- Paul, do you know of any such cases?

A brief example would be for

void foo (int *a, int *b, int n)
{
  for (int i = 0; i < n; ++i)
    a[i] = b[i];
}

which we can vectorize like

  if (a + n < b || b + n < a)
    {
      vectorized loop
    }
  else
    {
      not vectorized loop
    }

Note how we're not establishing equivalences between pointers but
non-dependence vs. possible dependence.
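
Spelled out, the versioned form is roughly this (a sketch of the
transform, not actual GCC output; the address comparison happens in
the compiler's IR, where it is safe even though source-level
relational comparison of pointers into different objects would be
undefined):

  void foo_versioned (int *a, int *b, int n)
  {
    if (a + n < b || b + n < a)       /* runtime check: the two ranges
                                         provably do not overlap */
      {
        for (int i = 0; i < n; ++i)   /* no dependence: a real compiler
                                         emits vector loads and stores */
          a[i] = b[i];
      }
    else
      {
        for (int i = 0; i < n; ++i)   /* possible overlap: the original
                                         scalar loop is kept */
          a[i] = b[i];
      }
  }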

>> > The background for this question is Paul McKenney's recently updated
>> > proposal for a different memory_order_consume specification:
>> > http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0190r0.pdf
>> >
>> > In a nutshell, this requires a compiler to either prove that a pointer
>> > value is not carrying a dependency (simplified, its value somehow
>> > originates from a memory_order_consume load), or it has to
>> > conservatively assume that it does; if it does, the compiler must not
>> > turn data dependencies into control dependencies in generated code.
>> > (The data dependencies, in contrast to control dependencies, enforce
>> > memory ordering on archs such as Power and ARM; these orderings then
>> > allow for not having to use an acquire HW barrier in the generated
>> > code.)
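
As a minimal sketch of such a dependency chain (C++ atomics,
illustrative names; the comment shows the kind of rewrite the proposal
would forbid):

  #include <atomic>

  int payload;
  std::atomic<int *> ptr;

  void publish ()
  {
    payload = 42;
    ptr.store (&payload, std::memory_order_release);
  }

  int read_via_consume ()
  {
    int *p = ptr.load (std::memory_order_consume);
    return *p;  // address dependency: Power/ARM order this load after
                // the consume load in hardware, no acquire barrier.
                // Forbidden rewrite, data -> control dependency:
                //   if (p == &payload) return payload; else return *p;
  }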
>> >
>> > Given that such a proof will likely be hard for a compiler (dependency
>> > chains propagate through assignments to variables on the heap and stack,
>> > chains are not marked in the code, and points-to analysis can be hard),
>> > a compiler faces a trade-off between either:
>> > (1) trying to support this memory_order_consume specification and likely
>> > disallowing all transformations that change data dependencies into
>> > control dependencies, or
>> > (2) not supporting the proposal, simply emitting memory_order_acquire
>> > code, but getting no new constraints on transformations in return (ie,
>> > what we do for memory_order_consume today).
>> >
>> > A compiler could let users make this choice, but this will be hard for
>> > users too, and the compiler would still have to pick a default.
>> >
>> > Therefore, it would be good to know how important such transformations
>> > or optimizations are in practice.  If they are, it either means somewhat
>> > slower code everywhere (or at least having to change much in today's
>> > compilers), or not supporting the new memory_order_consume (at least not
>> > in the default setting of the compiler).
>>
>> IMHO all the memory order semantics are too complicated already so what's
>> the reason to complicate it even more...?
>
> The hardware memory models of archs like Power and ARM guarantee ordering
> for accesses that have a data dependency, thus allowing things like
> traversing a concurrently modified list to not have to use acquire
> memory barriers (ie, a programmer could use memory_order_consume instead
> of memory_order_acquire).  The Linux kernel uses this heavily in RCU,
> for example.  It doesn't matter for archs like x86 where acquire /
> release barriers are implicit.
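
The list traversal Torvald mentions looks roughly like this in C++
atomics terms (a sketch of the RCU-reader idiom with illustrative
names; the kernel spells the consume loads rcu_dereference()):

  #include <atomic>

  struct node
  {
    int value;
    std::atomic<node *> next;
  };

  std::atomic<node *> head;

  int sum_list ()
  {
    int sum = 0;
    for (node *n = head.load (std::memory_order_consume);
         n != nullptr;
         n = n->next.load (std::memory_order_consume))
      sum += n->value;  // each value/next read depends on an address
                        // obtained from a consume load, so Power/ARM
                        // order it without an acquire barrier
    return sum;
  }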

That's memory commit ordering on a single CPU core, right?  How would it
ever be valid to change commit order when there is a data dependence?

> Paul is looking for a solution that would make both the compiler and the
> kernel happy; being able to standardize this in C/C++ would be another
> good outcome from his perspective, I believe.  For GCC, the former might
> also have benefits because we'd have clearer requirements for something
> the Linux kernel is using heavily.
> The Linux kernel guys seem to be opposed to marking the dependency
> chains a program relies on (except for the initial memory_order_consume
> load).  I pointed out that this proposal would disallow the
> transformations I have asked about in this email, and Paul wanted to
> know how important they are.  So I asked here ... :)
>
> If the optimizations that would get disallowed in practice are important
> for performance, my inclination would be that we might need a different
> proposal for C++ standardization.  For the kernel use case only, it
> could nonetheless still be interesting for us to further investigate it;
> if the kernel doesn't need these optimizations, we'd at least have a
> better understanding of what the kernel expects and what we deliver.
>
>
