On Wednesday, 19 August 2015 at 10:25:14 UTC, ponce wrote:
Loops in video coding already have no conditionals. And for the ones that do, the conditionals were already removable with existing instructions.

I think you are side-stepping the issue. Most people don't write video codecs. Most people also don't want to hand-optimize their inner loops. The typical and most likely scenario is to run some easy-to-read-but-suboptimal function over a dataset. You need both library and compiler support for that to work out.
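
To make that scenario concrete, here is a minimal sketch in C of the kind of loop I mean. The function name `scale_and_clamp` and the clamping operation are made up for illustration, not taken from any real codebase. It is written the obvious way, with conditionals in the loop body, and it is up to the compiler and the instruction set to turn those into min/max or masked vector operations.

```c
#include <stddef.h>

/* Hypothetical example: scale every sample by `gain` and clamp to [0, 1].
 * Written for readability, not speed. `restrict` tells the compiler that
 * dst and src do not overlap, which a vectorizer generally needs to know. */
void scale_and_clamp(float *restrict dst, const float *restrict src,
                     size_t n, float gain)
{
    for (size_t i = 0; i < n; ++i) {
        float v = src[i] * gain;
        if (v < 0.0f) v = 0.0f;   /* conditional in the inner loop */
        if (v > 1.0f) v = 1.0f;   /* conditional in the inner loop */
        dst[i] = v;
    }
}
```

Whether this actually ends up vectorized depends on the compiler, the flags, and the target, which is exactly why the backend matters here.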

But even then: a 10% difference in CPU benchmarks is a disaster.

I stand by what I know and measured: precious few things are sped up by AVX-xxx. It is almost always better to invest this time in optimizing somewhere else.

AVX-512 is too far into the future, but if you are going to write a backend you have to think about increasing register sizes. Just because the register size increases does not mean that throughput increases in the generation where it is introduced (a wide instruction could translate into several micro-ops).

But if you start redesigning your backend now, then maybe you have something good in 5 years, so you need to plan ahead, thinking not about the current generation but 1-3 generations ahead.

Keep in mind that clock speeds are unlikely to increase, but stacking memory on top of the CPU and getting improved memory bus speeds is quite a likely scenario.

A good use for the DMD backend would be to improve and redesign it for compile-time evaluation. Then use LLVM for codegen.
