On Fri, Jul 10, 2015 at 11:39 PM, Abe <abe_skol...@yahoo.com> wrote:
>> The GIMPLE level if-conversion code was purely
>> written to make loops suitable for vectorization.
>
> I'm not surprised to read that.
>
>> It wasn't meant to provide if-conversion of
>> scalar code in the end (even though it does).
>
> Serendipity sure is nice. ;-)
>
>> We've discussed enabling the versioning path unconditionally for example.
>
> It might make sense even without vectorization, for target arch.s with [1]
> either a wide range of predicated instructions or at least [a] usable
> "cmove-like" instruction[s] _and_ [2] the target either _never_ runs off
> battery or only extremely-rarely runs off battery. When running off
> "wall outlet" electricity [and not caring about the electric bill ;-)],
> wasted speculation is [usually?] just wasted energy that didn't cost any
> extra time b/c the CPU/core would have been idle otherwise.
> When running off battery, the wasted energy _can_ be unacceptable, but is
> not _necessarily_ so: it depends on the customer/programmer/user's
> priorities, esp. vis-a-vis execution speed vs. amount of time a single
> charge allows the machine to run.

Well, but we do have a pretty strong if-converter on RTL which has access to
target specific information.

> The preceding makes me wonder: has anybody considered adding an optimization
> profile for GCC, to add to the set {"-O"..."-O3", "-Ofast", "-Os"}, that
> optimizes for the amount of energy consumed? I don't remember reading about
> anything like that in relation to compiler research, but perhaps somebody
> reading this _has_ seen [or done!] something related and would kindly reply.
> Obviously, this is not an easy thing to figure out, since in _most_ cases
> finishing the job sooner -- i.e. running faster -- means less energy spent
> computing the job than would have otherwise been the case, but this is not
> _always_ true: for example, speculative execution that has a 50% probability
> of being wasteful instead of just idling in a low-power state.

I think there were GCC summit papers/talks about this.

>> So if the new scheme with scratch-pads produces more "correct" code but
>> code that is known to fail vectorization then it's done at the wrong place -
>> because the whole purpose of GIMPLE if-conversion is to enable more
>> vectorization.
>
> I think I understand, and I agree. The purpose of this pass is to enable more
> vectorization -- the recently-reported fact that it can also enable more
> cmove-style non-vectorized code can also be beneficial, but is not the main
> objective.
>
> The main benefit of the new if converter is not vs. "GCC without any if
> conversion", but rather is vs. the _old_ if converter. The old one can, in
> some cases, produce code that e.g. dereferences a null pointer when the same
> program given the same inputs would have not done so without the
> if-conversion "optimization".

Testcase? I don't think it can and if it can this bug needs to be fixed.

> The new converter reduces/eliminates this problem. Therefore, my current
> main goal is to eliminate the performance regressions that are not spurious [e.g.
> are not a direct result of the old conversion being unsafe],
> so that the new converter can be merged to trunk and also enabled implicitly
> by "-O3" for autovectorization-enabled arch.es, which the old converter
> AFAIK was _not_ [due to the aforementioned safety issues].

You mean the -ftree-loop-if-convert-stores path.

> In other words, the old if converter was like a sharp knife with a very small
> handle: usable by experts, but dangerous for people with little knowledge of
> the run-time properties of the code [e.g. will a pointer ever be null?] who
> just want to pass in "-O3" and have the code run faster without much
> thinking.
> A typical GCC user: "This code runs fine when compiled with ''-O1''
> and ''-O2'', so with ''-O3'' it should also be fine, only faster!"
>
> IMO, only those flags that _explicitly_ request unsafe transformations
> should be allowed to cause {source code that runs perfectly when compiled
> with a low optimization setting} to be compiled to code that may crash or
> may compute a different result than under a low-optimization setting
> [e.g. compiling floating-point code such that the executable ignores NaNs
> or equates denorms with zero] even when given the same inputs as a
> non-crashing correct-result-producing less-optimized build of the same
> source.
> AFAIK this is in accordance with GCC's philosophy, which explains why the
> old if converter was not enabled by default. The _new_ if converter, OTOH,
> is safe enough to enable by default under "-O3", and should be beneficial
> for targets that support vector operations and for which the autovectorizer
> is successful in generating vector code. Those are probably the main reasons
> why the new converter is worth hacking on to get it into shape,
> performance-regression-wise.
> Plus, this is work that my employer [Samsung] is willing and able
> to fund at this time [by paying my salary while I work on it ;-)].

Sure.

> Regards,
>
> Abe
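
For readers following the thread, here is a rough hand-written sketch of the
kind of transformation being debated. It is not GCC output and not a confirmed
testcase; the function names (store_branchy, store_unsafe, store_scratchpad)
are made up for illustration, and the "scratchpad" variant is only my reading
of the scheme described above.

```c
#include <stddef.h>
#include <string.h>

/* Original loop: p[i] is touched only when cond[i] is non-zero, so a
   NULL p is harmless as long as every cond[i] is zero. */
void store_branchy(int *p, const int *cond, int n, int v)
{
    for (int i = 0; i < n; i++)
        if (cond[i])
            p[i] = v;
}

/* Hypothetical unsafe if-conversion: the branch is gone, but p[i] is now
   loaded and stored on every iteration, so a NULL (or read-only) p faults
   even when every cond[i] is zero. */
void store_unsafe(int *p, const int *cond, int n, int v)
{
    for (int i = 0; i < n; i++)
        p[i] = cond[i] ? v : p[i];
}

/* Scratchpad-style conversion (assumed shape): the store is unconditional,
   but its address is redirected to a local dummy when the condition is
   false, so p is never dereferenced unless the original code would have. */
void store_scratchpad(int *p, const int *cond, int n, int v)
{
    int scratch;
    for (int i = 0; i < n; i++) {
        int *addr = cond[i] ? &p[i] : &scratch;
        *addr = v;
    }
}

/* Returns 1 if the scratchpad form produces the same array contents as the
   branchy form on a small mixed-condition input. */
int scratchpad_matches_branchy(void)
{
    int cond[4] = {1, 0, 1, 0};
    int a[4] = {10, 20, 30, 40};
    int b[4] = {10, 20, 30, 40};
    store_branchy(a, cond, 4, 7);
    store_scratchpad(b, cond, 4, 7);
    return memcmp(a, b, sizeof a) == 0;
}
```

Calling store_branchy or store_scratchpad with p == NULL and an all-zero cond
array is fine; calling store_unsafe the same way would dereference NULL. The
scratchpad form pays for its safety with an address select per iteration,
which is the sort of cost that can get in the way of vectorization.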