[Abe wrote:]
After fixing the known regressions, I plan to reg-test for AArch64; after
that, I think I'm going to need some community help to reg-test for other
ISAs.
[Alan wrote:]
> OK, I'm confused. When you write "making the new if-converter not mangle IR"...
> does "the new if-converter" mean your scratchpad fix to PR46029
Basically, yes. The code I currently have in "tree-if-conv.c" is, IMO,
sufficiently {changed/rewritten/replaced} relative to the version of that file
in trunk that I am referring to the pre-patch code as "the old if converter"
and to the code in my Git GCC checkout as "the new if converter".
[Alan wrote:]
> or is there some other new if-conversion phase that you are still working on
> and haven't posted yet?
No; as of now, Sebastian and I are not working on adding any more passes or
phases: we are just trying to
replace the old, potentially-unsafe, gives-up-more-easily converter with the
new one that uses the scratchpad
idea both to produce safer conversions and to be able to convert a greater
percentage of the time.
The only significant difficulty with the preceding at this time, AFAIK, is
that the work-in-progress version of the new converter produces too many
levels of indirection, so the vectorizer gives up trying to vectorize the
result[s] of the conversion, which makes the whole thing a big "fail" [not in
the DejaGNU sense, although pretty close]. That's why I would love to have
some help in fixing the regressions.
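For readers who have not followed the PR46029 discussion, here is a minimal
sketch of the scratchpad idea and of the extra indirection it introduces. It
is hand-written C, not output of the pass, and it assumes the simplest case: a
conditional store inside a loop. The function names and the local "scratchpad"
variable are hypothetical and exist only for illustration.

```c
#include <stddef.h>

/* Original form: the store to a[i] happens only when cond[i] is nonzero,
   so the loop body contains a branch. */
void store_branchy(int *a, const int *cond, int x, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (cond[i])
            a[i] = x;
}

/* Scratchpad form (illustrative assumption, not the pass's real output):
   the store always executes, but when the condition is false it is
   redirected to a dummy location, so a[i] is never written spuriously.
   The price is one extra level of indirection through `dest`. */
void store_scratchpad(int *a, const int *cond, int x, size_t n)
{
    int scratchpad = 0;  /* hypothetical thread-local dummy target */
    for (size_t i = 0; i < n; i++) {
        int *dest = cond[i] ? &a[i] : &scratchpad;
        *dest = x;
    }
    (void)scratchpad;    /* the value is deliberately never used */
}
```

Both functions leave the array in the same state; the second one is
branch-free in the loop body but stores through a computed pointer, which is
the kind of indirection the vectorizer then has to see through.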
[Alan wrote:]
> I haven't yet understood what you mean about "vectorizer-friendly" IR being
> mangled; is the problem that your new […] transforms IR that can currently be
> if-converted by the existing phase, into IR that can't? (Example?)
Not exactly, but close.
"TLDR" for the following [from "vvv" to "^^^"]: "too much unnecessary indirection is
being added".
--- vvv --- TLDR'd above --- vvv ---
Let's pretend the programmer of the code being compiled knows all about
if-conversion and vectorization and does the conversion _manually_. In other
words, the C/C++/etc. code passed in to the compiler _already_ looks something
like:
  temp0 = compute_condition();
  temp1 = foo();
  temp2 = bar();
  result = temp0 ? temp1 : temp2;
IOW, essentially IR/assembler written in C. Since the C code is lowered to
GIMPLE and the '?' operator is lowered [too early IMO *] to blocks and "goto"s,
the if-conversion pass no longer "sees" "result = temp0 ? temp1 : temp2;" --
instead, it sees something like:
  if temp0
    goto <bb 4>
  else
    goto <bb 5>
  /* --- bb 4 --- */
  result = temp1;
  goto <bb 6>
  /* --- bb 5 --- */
  result = temp2;
  /* --- bb 6 --- */
… which the old if converter handles in such a way as to
keep the vectorizer happy, but the new one does not yet.
'*': another issue to discuss and something that IMO should be
fixed/improved in GCC but outside of "tree-if-conv.c"
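To make the two shapes concrete, here is a small compilable sketch in C,
assuming the simplest case of a loop over int arrays. The function names and
the labels bb4/bb5/bb6 are hypothetical; the second function only mimics the
lowered control flow with "goto"s and is not real GIMPLE.

```c
#include <stddef.h>

/* "Manually if-converted" form: each iteration is a straight-line select. */
void select_ternary(int *result, const int *c,
                    const int *t, const int *f, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        int temp0 = c[i];
        int temp1 = t[i];
        int temp2 = f[i];
        result[i] = temp0 ? temp1 : temp2;
    }
}

/* Same computation written the way the lowered code is shaped:
   a branch to one of two blocks that each assign the result. */
void select_branchy(int *result, const int *c,
                    const int *t, const int *f, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (c[i])
            goto bb4;
        else
            goto bb5;
    bb4:
        result[i] = t[i];
        goto bb6;
    bb5:
        result[i] = f[i];
    bb6:
        ;
    }
}
```

The two functions compute the same result for every input; the job of an if
converter is to get from the second shape back to something like the first.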
The new if converter is doing something at least similar to "ooh! Ooh!
The programmer wrote a valid-conversion-candidate ''if'' that I know how to
convert!"
and messes it up; IIRC the way in which this gets messed up is something like:
  _ifcvt_temp_0 = temp0;
  …
  _ifcvt_temp_1 = temp1;
  …
  _ifcvt_temp_2 = temp2;
  …
  result = _ifcvt_temp_0 ? _ifcvt_temp_1 : _ifcvt_temp_2;
… at which the vectorizer turns up its nose, says "too much indirection"
and/or "too much gather", and doesn't vectorize -- for a {source code, target
architecture} pair for which the code can and should be vectorized by GCC. In
particular, the old converter handles it well IIRC, probably b/c the old
converter has less indirection overhead, which it can easily "afford" since it
never adds an indirection through a scratchpad.
I think what is needed here is basically to reduce the indirection overhead in
the new converter by making the new converter realize "oh, <foo> is already
pure and thread-local, so I don't need to copy <foo> into a temporary before
using it". At least, that seems to me like _one_ way to fix this category of
regressions in the new converter. Another way is to make GCC overall not
convert e.g. "x ? y : z" into basic blocks etc. so early all of the time; my
impression is that GCC is not doing both of {inspecting the purity, analyzing
the cost} of the expressions and then converting into basic blocks etc. only
when 'y'/'z'/both is/are impure and/or at least one of {'y', 'z'} is an
"expensive" thing to compute. For low-cost pure expressions, e.g. "x ? y : z"
should be retained as-is IMO -- i.e. not lowered into separate BBs -- for as
long as possible. Ideally, it is encoded as a "COND_EXPR" and stays that way
for as long as possible when there is no purity/high-computational-cost
problem with either the second or the third param. In the worst case, this
could involve fixing/improving several front ends, so I want to push this off
into a separate subproject from the if conversion itself.
--- ^^^ --- TLDR'd above --- ^^^ ---
I hope the above is helpful, but since it's from memory I don't
guarantee that it's both 100% accurate and 100% complete. ;-)
[Alan wrote:]
> Then I might (only "might", sorry!) be able to help...
Great! Thanks. :-)
Ideally, I/we fix the above problem -- and the rest of the regressions in the
new if converter -- without any significant changes to core GCC files outside
of "tree-if-conv.c". IOW, I'd like to minimize the invasiveness of this patch
and first get rid of as many regressions as possible whose fixes lie entirely
inside "tree-if-conv.c", before proceeding to fix/improve the "lowered too
early" problem that I perceive current GCC trunk has in its lowering of
"x ? y : z".
I'm almost 100% certain that fixing/improving the lowering will require
significant alterations to code in files other than "tree-if-conv.c", so even
though that fix/improvement would likely fix at least one regression in the
new if converter, I'd rather do it separately, later, or both.
In particular, I remember that
"result = condition ? array1[index] : array2[maybe the same index, maybe not]"
is being converted too early IMO. IOW, somewhere in GCC an array dereference
is being considered as either impure, too-expensive, or both. "array[index]"
in C [but not in C++, because of operator overloading] AFAIK is always pure
and is always low-cost whenever the index expression is pure and low-cost.
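As a concrete instance of that array case, here is a minimal sketch in C of
what keeping the select unlowered amounts to: both array reads are done
unconditionally and the condition only picks between the two loaded values.
The function and parameter names are hypothetical, and the sketch assumes that
every index in idx1 and idx2 is in bounds, so that executing both loads is
safe.

```c
#include <stddef.h>

/* Assumption: idx1[i] and idx2[i] are valid indices into array1 and array2
   for every i, so both loads may be executed regardless of cond[i]. */
void select_indexed(int *result, const int *cond,
                    const int *array1, const size_t *idx1,
                    const int *array2, const size_t *idx2, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        int from1 = array1[idx1[i]];  /* pure, cheap load */
        int from2 = array2[idx2[i]];  /* pure, cheap load */
        result[i] = cond[i] ? from1 : from2;
    }
}
```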
Regards,
Abe