On 7/2/15 4:30 AM, Alan Lawrence wrote:
Hi, pleased to meet you :)
Likewise. :-)
[Abe wrote:]
* Always safe for stores, sometimes a little risky for loads:
speculative loads might cause multithreaded programs with
insufficient locking to fail due to writes by another thread
being "lost"/"missed", even though the same program works OK
"by luck" when compiled without if-conversion of loads.
This risk comes mainly/only from what the relevant literature
calls a "half hammock": an "if" with a "then" section but no
"else" section [or effectively vice-versa, e.g. an empty "then"
and a non-empty "else"]. In this case, e.g. "if (c) X[x] = Y[y];"
with no attached "else" section is risky to fully if-convert
in the event of the code being compiled running multithreaded
and not having been written with all the locking it really needs.
Respectively, e.g. "if (c) ; /* empty ''then'' */ else X[x] = Y[y];".
[Alan wrote:]
For the unenlightened, can you outline the problem with this code sequence?
(i.e. the expected transformation that makes it unsafe!?)
I would hope your scratchpad patch would turn this into something like
a1 = c ? &Y[y] : &scratch;
temp = *a1;
a2 = c ? &X[x] : &scratch;
*a2 = temp;
which seems OK to me
Yes, you are right. The problem I was thinking about is not present in the
above:
in the "'c' is false" case, the vectorized code for the above just wastes some
effort
by reading garbage from the scratchpad and writing it back to the scratchpad.
so is the scratchpad approach going away?
Not at all. :-)
My example[s] was/were not written well with regard to expressing what I had in
mind.
The problem I was thinking about is shown with a scalar destination, e.g.:
if (c) foo = X[x];
... which is if-converted into the equivalent of:
foo = c ? X[x] : foo;
The [perceived/potential] problem with the preceding is that the part that is equivalent
to "foo = foo;"
in the above can _not_ be optimized out, as it normally would for a non-"volatile"
"foo",
because it is part of a larger vectorized operation and this small part cannot
be broken out of the whole
without breaking vectorization. Therefore, the value of "foo" might be read
and then written back
a few {micro|nano|pico|whatever}-seconds later, which may cause an update to
the same location to be overwritten.
That`s why the pathological program-under-compilation is a badly-written
multithreaded program without enough locking:
without the "if something, then overwrite" being replaced by "read, then if
something then write the new value,
and if not that same something then rewrite the old value", the badly-written
multithreaded program might work correctly
through "good luck", but with the replacement [i.e. the if conversion] the
chances of success
[where "success" here basically means not "missing" a write by another thread]
is lower than it was before.
However, after some discussion with Sebastian I learned that this is already
taken care of, i.e. safe:
the "read, then if something then write the new value, and if not that same
something then rewrite the old value"
replacement strategy is only used for thread-local scalars. For global scalars
and static scalars,
we treat the scalar as if it were the first element of a length=1 array and
don`t have this problem.
In other words, the problem about which I was concerned is not going to be triggered by
e.g. "if (c) x = ..."
which lacks an attached "else x = ..." in a multithreaded program without
enough locking just because 'x' is global/static.
The only remaining case to consider is if some code being compiler takes the address of
something thread-local and then "gives"
that pointer to another thread. Even for _that_ extreme case, Sebastian says
that the gimplifier will detect this
"address has been taken" situation and do the right thing such that the new if
converter also does the right thing.
TLDR: Abe was being too paranoid; to the best of our knowledge, it`s OK as-is.
;-)
[Abe wrote:]
One of the reasons the new if converter has not yet been submitted
for incorporation into GCC`s trunk is that it still has some
performance regressions WRT the old converter, and most of those
are "true regressions", i.e. not just because the old converter
was less safe and the additional safety is what is causing the loss,
but rather because there is more work to do before the patch is ready.
As of this writing, the new if converter sometimes tries
to "convert" something that is already vectorizer-friendly,
and in doing so it renders that code now-NOT-vectorizer-friendly.
[Alan wrote:]
Can you give an example?
The test cases in the GCC tree at "gcc.dg/vect/pr61194.c" and
"gcc.dg/vect/vect-mask-load-1.c"
currently test as: the new if-converter is "converting" something that`s
already vectorizer-friendly,
messing up the IR in the process and thus disabling vectorization for the test
case in question.
My understanding was that the existing vectorizer bailed out pretty much
straightaway
if the number of basic blocks in the loop was not exactly 2 (for inner loops)
or 5
(for outermost loops, i.e. containing exactly one inner loop)...
I see nothing in that text that I know to be wrong.
that seems to rule out vectorization of *any* kind of
conditional execution, that the if-converter might convert?
If the converter takes something that`s already good [e.g. manually "converted"]
and doesn`t "understand" that the code is already good, it may analyze the code
in question,
arrive at the conclusion "if-conversion candidate", and then "convert" it,
making a mess in the process.
This mess, in turn, causes the vectorizer to arrive at the conclusion "NOT a
vectorization candidate"
whereas without the incorrect "conversion" its conclusion would have been
"vectorization candidate" and it would have vectorized it.
However, TTBOMK the vectorizer already "understands" that in cases where its
input looks like:
x = c ? y : z;
... and 'y' and 'z' are both pure [side-effect-free] -- including, but not limited to,
they must be non-"volatile" --
it may vectorize a loop containing code like the preceding, ignoring for this
particular instance the C mandate
that only one of {y, z} be evaluated, depending on the value of 'c', since when
both are truly pure expressions it`s only
at worst a waste of time/energy/both, not a semantics issue, and the vectorizer
is either assuming that it`s worth it
to make the transformation due to the factors-level speedup of vectorization,
or it is using a cost model
and arriving at the conclusion that for the loop in question, vectorization is
worth it.
Normally, the code would not really look all that close to the preceding, but
rather more like:
X[x] = c ? Y[y] : Z[z];
... since if all the operands are scalars, it`s not very likely to be worth
vectorizing, of course.
The code in the loop being vectorized might even look like:
X[x] = C[c] ? Y[y] : Z[z];
... i.e. the condition could also be read from an array rather than from a
scalar,
regardless of whether or not that scalar [in the previous example] is a
compiler-generated temporary.
I hope the above is helpful.
Regards,
Abe