(not a review)

On Thu, 10 May 2018, Hans-Peter Nilsson wrote:

Replacing a division feeding a division helps only when the
second division is the only user, and "fusing" the divisions is

Well, that's not quite true.
int x, y;
void f(int n){
  int c = 3 << 20;
  x = n / c;
  y = n / c;
}

Here we can optimize the last division to y = 0. After your patch, we likely need VRP to do that simplification. There are probably more complicated transformations this disables.

downright bad if another user of the result of first division is
a modulus of the same value as the second division, forming a
divmod pair.  See the test-case, where for the tested
architectures (which all fail the test-case before the patch)
the div and mod are implemented using the high-part of a widened
multiplication and shift, emitted separately but combined as
late as in rtl, with the multiplicaton and shift re-used.  That
of course does not happen if later passes see (y / 48; y % 3).
While in match.pd, I noticed the corresponding mul-mul match,
which I believe should be guarded the same way.

Did you notice bad codegen because of the multiplication? You are only adding a test for divisions. I am asking because I know such a change will help some cases and hurt others...

Now a rant on the match.pd ":s" flag, which reasonable people
may reasonably suggest I should have used in the patch instead
of the (if (single_use ...)).

Initially, I got the match.pd-language (which deserves a proper
name) all wrong.  Then I read the documentation and still got it
wrong.  I "misunderstood" that the ":s" on an operand "O" was
supposed to have the effect of conditionalize the replacement
"R" by wrapping it in "(if (single_use (O)) R)" as in the
suggested patch (above).  To wit, this does not work; it will
*not* stop the replacement as seen in the test-case (THIS IS NOT
A SUGGESTED PATCH):

(for div (trunc_div exact_div)
 (simplify
-  (div (div @0 INTEGER_CST@1) INTEGER_CST@2)
+  (div (div:s @0 INTEGER_CST@1) INTEGER_CST@2)
  (with {
    bool overflow_p;
    wide_int mul = wi::mul (wi::to_wide (@1), wi::to_wide (@2),

In PR69556, it seems other people seem to have read the
documentation of ":s" the same way, but are corrected by other
comments there, so I guess it's not my reading that's flawed.

I suggest preferably (1) correcting the semantics of ":s" to do
as the documentation says because I don't understand the
explanation in PR69556 comment #4 that the replacement "is still
allowed if it is a single operation as that replaces at least
one other (the one we are simplifying)"; I see that as a
complete nullification of the :s flag, making it a nop.
Alternatively (2), if the :s is *not* a nop in some case add an
example of that case and sufficient explanation to
match-and-simplify.texi *and* the suggested ":S" flag
(i.e. *really* conditionalize the replacement by a single-use of
that operand).

To simplify, the goal of :s is to avoid increasing the number of instructions. Normally, the transformation output is smaller (or the same size but cheaper, simpler, more canonical) than the input. But if we can't get rid of some of the input intermediate results, the size may still increase. :s does test single_use, but it has a special case. If the output (possibly after several rounds of resimplifications) is at most one instruction, then it cannot be larger than the input (we are at least getting rid of the instruction we called the simplification on), so the transformation does happen even if !single_use. Originally this was only done when the simplification led to an SSA_NAME or a constant, IIRC.

Then people start wanting single_use restrictions to reduce register pressure, reduce the size / number of constants, etc. And not all of those want exactly the same conditions.

It is useful for high-level transformations to push the canonicalization as far as possible, to notice equivalent quantities or constant bounds in particular. So on a case by case basis, we use :s or single_use or whatever...

If we use both y=x/3 and z=x/15 in the same function, should we make an effort to detect it and rewrite to z=y/5?

--
Marc Glisse

Reply via email to