> On 04/04/16 11:13, Evandro Menezes wrote:
> > On 04/01/16 18:08, Wilco Dijkstra wrote:
> >> Evandro Menezes wrote:
> >>> I hope that this gets in the ballpark of what's been discussed
> >>> previously.
> >> Yes that's very close to what I had in mind. A minor issue is that
> >> the vector modes cannot work as they start at MAX_MODE_FLOAT (which
> >> is > 32):
> >>
> >> +/* Control approximate alternatives to certain FP operators. */
> >> +#define AARCH64_APPROX_MODE(MODE) \
> >> +  ((MIN_MODE_FLOAT <= (MODE) && (MODE) <= MAX_MODE_FLOAT) \
> >> +   ? (1 << ((MODE) - MIN_MODE_FLOAT)) \
> >> +   : (MIN_MODE_VECTOR_FLOAT <= (MODE) && (MODE) <= MAX_MODE_VECTOR_FLOAT) \
> >> +     ? (1 << ((MODE) - MIN_MODE_VECTOR_FLOAT + MAX_MODE_FLOAT + 1)) \
> >> +     : (0))
> >>
> >> That should be:
> >>
> >> +     ? (1 << ((MODE) - MIN_MODE_VECTOR_FLOAT + MAX_MODE_FLOAT - MIN_MODE_FLOAT + 1)) \
> >>
> >> It would be worth testing all the obvious cases to be sure they work.
> >>
> >> Also I don't think it is a good idea to enable all modes on Exynos-M1
> >> and XGene-1 - I haven't seen any evidence that shows it gives a
> >> speedup on real code for all modes (or at least on a good micro
> >> benchmark like the unit vector test I suggested - a simple throughput
> >> test does not count!).
> >
> > This approximation does benefit M1 in general across several
> > benchmarks. As for my choice for Xgene1, it preserves the original
> > setting. I believe that, with this more granular option, developers
> > can fine tune their targets.
> >
> >> The issue is it hides performance gains from an improved divider/sqrt
> >> on new revisions or microarchitectures. That means you should only
> >> enable cases where there is evidence of a major speedup that cannot
> >> be matched by a future improved divider/sqrt.
> >
> > I did notice that some benchmarks with heavy use of multiplication or
> > multiply-accumulation, the series may be detrimental, since it
> > increases the competition for the unit(s) that do(es) such operations.
> >
> > But those micro-architectures that get a better unit for division or
> > sqrt() are free to add their own tuning parameters. Granted, I assume
> > that running legacy code is not much of an issue only in a few markets.
>
> Ping^1
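
For reference, the index arithmetic discussed in the quoted review can be
checked in isolation. The following is only a sketch under stated
assumptions: the four mode bounds are made-up numbers chosen so that
MAX_MODE_FLOAT is above 32, not GCC's real machine_mode values, and the
helper names (approx_bit_posted, approx_bit_fixed, approx_mask_fixed,
approx_mask_selftest) are hypothetical. It returns the bit index rather
than shifting by it, so the out-of-range case can be observed without
relying on an undefined shift.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mode numbering, for illustration only.  These are NOT
   the values GCC generates; they are picked so that MAX_MODE_FLOAT
   is larger than 32, as noted in the review.  */
enum {
  MIN_MODE_FLOAT = 30,
  MAX_MODE_FLOAT = 35,
  MIN_MODE_VECTOR_FLOAT = 70,
  MAX_MODE_VECTOR_FLOAT = 79
};

/* Bit index as computed by the posted macro, or -1 if MODE is not a
   scalar or vector FP mode.  The vector offset is MAX_MODE_FLOAT + 1.  */
int approx_bit_posted (int mode)
{
  if (MIN_MODE_FLOAT <= mode && mode <= MAX_MODE_FLOAT)
    return mode - MIN_MODE_FLOAT;
  if (MIN_MODE_VECTOR_FLOAT <= mode && mode <= MAX_MODE_VECTOR_FLOAT)
    return mode - MIN_MODE_VECTOR_FLOAT + MAX_MODE_FLOAT + 1;
  return -1;
}

/* Bit index with the suggested correction: the vector offset is the
   number of scalar FP modes, MAX_MODE_FLOAT - MIN_MODE_FLOAT + 1.  */
int approx_bit_fixed (int mode)
{
  if (MIN_MODE_FLOAT <= mode && mode <= MAX_MODE_FLOAT)
    return mode - MIN_MODE_FLOAT;
  if (MIN_MODE_VECTOR_FLOAT <= mode && mode <= MAX_MODE_VECTOR_FLOAT)
    return mode - MIN_MODE_VECTOR_FLOAT + MAX_MODE_FLOAT - MIN_MODE_FLOAT + 1;
  return -1;
}

/* Mask built from the corrected index; 0 for non-FP modes.  */
uint32_t approx_mask_fixed (int mode)
{
  int bit = approx_bit_fixed (mode);
  return bit < 0 ? 0 : (uint32_t) 1 << bit;
}

/* Check the boundary modes under the hypothetical numbering above.  */
void approx_mask_selftest (void)
{
  assert (approx_bit_fixed (MIN_MODE_FLOAT) == 0);
  assert (approx_bit_fixed (MAX_MODE_FLOAT) == 5);
  assert (approx_bit_fixed (MIN_MODE_VECTOR_FLOAT) == 6);
  assert (approx_bit_fixed (MAX_MODE_VECTOR_FLOAT) == 15);
  /* The posted formula lands past bit 31 for every vector mode.  */
  assert (approx_bit_posted (MIN_MODE_VECTOR_FLOAT) >= 32);
}
```

With these assumed bounds the corrected formula packs the scalar modes
into bits 0..5 and the vector modes into bits 6..15, while the posted
formula would ask for bit 36 and up. Whether the real mode counts fit in
a 32-bit mask would still need checking against the generated mode
enumeration.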

Ping^2

-- 
Evandro Menezes
Austin, TX