On Tuesday, 31 May 2016 at 21:29:34 UTC, Andrei Alexandrescu wrote:
You may want to then try https://dpaste.dzfl.pl/392710b765a9, which generates inline code on all compilers. -- Andrei

In general, it might be beneficial to use ldc.intrinsics.llvm_expect (cf. __builtin_expect) for things like that in order to optimise basic block placement. (We should probably have a compiler-independent API for that in core.*, by the way.) In this case, the skip computation path is probably small enough for that not to matter much, though.

Another thing that might be interesting to do (now that you have a "clever" baseline) is to start counting cycles and make some comparisons against manual asm/intrinsics implementations. For short(-ish) needles, PCMPESTRI is probably the most promising candidate, although I suspect that for \r\n scanning in long strings in particular, an optimised AVX2 solution might have higher throughput.

Of course these observations are not very valuable without backing them up with measurements, but it seems like before optimising a generic search algorithm for short-needle test cases, having one's eyes on a solid SIMD baseline would be a prudent thing to do.

 — David

Reply via email to