Re: auto vectorization notes

Bruce Carneal via Digitalmars-d-learn Sat, 28 Mar 2020 15:26:09 -0700

On Saturday, 28 March 2020 at 18:01:37 UTC, Crayo List wrote:

On Saturday, 28 March 2020 at 06:56:14 UTC, Bruce Carneal wrote:
On Saturday, 28 March 2020 at 05:21:14 UTC, Crayo List wrote:
On Monday, 23 March 2020 at 18:52:16 UTC, Bruce Carneal wrote:
[snip]
Explicit SIMD code, ispc or other, isn't as readable or composable or vanilla portable but it certainly is performance predictable.
This is not true! The idea of ispc is to write portable code that will vectorize predictably based on the target CPU. The object file/binary is not portable, if that is what you meant.
Also, I find it readable.

There are many waypoints on the readability <==> performance axis. If ispc works for you along that axis, great!

I find SIMT code readability better than SIMD but a little worse than auto-vectorizable kernels. Hugely better performance though for less effort than SIMD if your platform supports it.
Again I don't think this is true. Unless I am misunderstanding you, SIMT and SIMD are not mutually exclusive and if you need performance then you must use both. Also based on the workload and processor SIMD may be much more effective than SIMT.

SIMD might become part of the solution under the hood for a number of reasons including: ease of deployment, programmer familiarity, PCIe xfer overheads, kernel launch overhead, memory subsystem suitability, existing code base issues, ...

SIMT works for me in high throughput situations where it's hard to "take a log" on the problem. SIMD, in auto-vectorizable or more explicit form, works in others.
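
For readers unfamiliar with the term, here is a minimal sketch of what an "auto-vectorizable kernel" might look like. It is written in C only because the thread itself shows no code; the function name `saxpy` and its signature are hypothetical, and whether a given compiler actually vectorizes the loop depends on the compiler, flags, and target CPU.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example kernel (not from the thread): y[i] = a * x[i] + y[i].
 * The loop is a plain element-wise update with no cross-iteration
 * dependency, and `restrict` promises the compiler that x and y do not
 * overlap. Loops of this shape are typical candidates for
 * auto-vectorization, but the compiler is free to decline. */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y)
{
    assert(n == 0 || (x != NULL && y != NULL));
    for (size_t i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

The source stays ordinary scalar code, which is the readability argument for this style; the trade-off discussed in this thread is that the resulting performance is less predictable than with explicit SIMD.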

Combinations can be useful but most of the work I've come in contact with splits pretty clearly along the memory bandwidth divide (SIMT on one side, SIMD/CPU on the other). Others need a plus-up in arithmetic horsepower. The more extreme the requirements, the more attractive SIMT appears. (hence my excitement about dcompute possibly expanding the dlang performance envelope with much less cognitive load than CUDA/OpenCL/SYCL/...)

On the readability front, I find per-lane programming, even with the current thread-divergence caveats, to be easier to reason about wrt correctness and performance predictability than other approaches. Apparently your mileage does vary.

When you have chosen SIMD, whether ispc or other, over SIMT what drove the decision? Performance? Ease of programming to reach a target speed?
