Re: faster splitter

David Nadlinger via Digitalmars-d Tue, 31 May 2016 15:56:57 -0700

On Tuesday, 31 May 2016 at 21:29:34 UTC, Andrei Alexandrescu wrote:

You may want to then try https://dpaste.dzfl.pl/392710b765a9, which generates inline code on all compilers. -- Andrei

In general, it might be beneficial to use ldc.intrinsics.llvm_expect (cf. __builtin_expect) for things like that in order to optimise basic block placement. (We should probably have a compiler-independent API for that in core.*, by the way.) In this case, the skip computation path is probably small enough for that not to matter much, though.
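To make the branch-hint idea concrete, here is a minimal sketch in C using __builtin_expect, the GCC/Clang counterpart mentioned above. It is not the code from the dpaste link: find_hinted and the LIKELY/UNLIKELY macros are hypothetical names, and the search is a deliberately simplified last-byte filter without any skip computation. The hint only tells the compiler which side of the branch to lay out as the fall-through path.

```c
#include <assert.h>
#include <stddef.h>

/* Branch hints: GCC and Clang provide __builtin_expect;
   on other compilers these macros are plain no-ops. */
#if defined(__GNUC__) || defined(__clang__)
#  define LIKELY(x)   __builtin_expect(!!(x), 1)
#  define UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
#  define LIKELY(x)   (x)
#  define UNLIKELY(x) (x)
#endif

/* Hypothetical helper: index of the first occurrence of needle
   (length m) in hay (length n), or n if it does not occur. */
size_t find_hinted(const char *hay, size_t n, const char *needle, size_t m)
{
    assert(hay != NULL && needle != NULL);
    if (m == 0) return 0;
    if (m > n) return n;

    const char last = needle[m - 1];
    for (size_t i = m - 1; i < n; ++i) {
        /* Hot path: assume most positions do not match the last byte. */
        if (LIKELY(hay[i] != last))
            continue;

        /* Cold path: verify the remaining m - 1 bytes of the needle. */
        size_t start = i - (m - 1);
        size_t k = 0;
        while (k < m - 1 && hay[start + k] == needle[k])
            ++k;
        if (k == m - 1)
            return start;
    }
    return n;
}
```

For example, find_hinted("hello world", 11, "wor", 3) yields 6. Whether the hint changes anything measurable depends on the compiler and the input, so it needs benchmarking like everything else here.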

Another thing that might be interesting to do (now that you have a "clever" baseline) is to start counting cycles and make some comparisons against manual asm/intrinsics implementations. For short(-ish) needles, PCMPESTRI is probably the most promising candidate, although I suspect that for \r\n scanning in long strings in particular, an optimised AVX2 solution might have higher throughput.
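As a rough idea of what such a SIMD baseline for \r\n scanning could look like, here is a sketch in C. To keep it compilable without extra flags it uses 16-byte SSE2 vectors as a stand-in rather than AVX2; an AVX2 variant would presumably follow the same pattern with 32-byte vectors. find_crlf is a hypothetical name, the function falls back to a scalar loop when SSE2 or the GCC/Clang builtin is unavailable, and nothing here has been tuned or measured.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper: index of the first "\r\n" in s[0..n),
   or n if there is none. */
size_t find_crlf(const char *s, size_t n)
{
    assert(s != NULL || n == 0);
    size_t i = 0;

#if defined(__SSE2__) && (defined(__GNUC__) || defined(__clang__))
    const __m128i cr = _mm_set1_epi8('\r');
    const __m128i lf = _mm_set1_epi8('\n');

    /* Each step tests the 16 pairs starting at i .. i+15, which
       reads bytes i .. i+16, so 17 bytes must be available. */
    while (i + 17 <= n) {
        __m128i a = _mm_loadu_si128((const __m128i *)(s + i));
        __m128i b = _mm_loadu_si128((const __m128i *)(s + i + 1));

        /* Bit k is set when s[i+k] == '\r' and s[i+k+1] == '\n'. */
        int mask = _mm_movemask_epi8(
            _mm_and_si128(_mm_cmpeq_epi8(a, cr), _mm_cmpeq_epi8(b, lf)));
        if (mask != 0)
            return i + (size_t)__builtin_ctz((unsigned)mask);
        i += 16;
    }
#endif

    /* Scalar tail (and the whole scan when SSE2 is not available). */
    for (; i + 1 < n; ++i)
        if (s[i] == '\r' && s[i + 1] == '\n')
            return i;
    return n;
}
```

Two overlapping unaligned loads avoid any shuffling to line up the '\r' and '\n' comparisons; whether that beats a single load plus a shifted mask is exactly the kind of thing cycle counting would have to settle.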

Of course these observations are not very valuable without backing them up with measurements, but it seems like before optimising a generic search algorithm for short-needle test cases, having one's eyes on a solid SIMD baseline would be a prudent thing to do.


 — David
