On Tuesday, 11 October 2016 at 10:01:41 UTC, Stefan Koch wrote:
On Tuesday, 11 October 2016 at 09:45:11 UTC, Temtaime wrote:

Sorry, this was also a typo in the code.

void popFront7(ref char[] s) @trusted pure nothrow
{
  import core.bitop;
  auto v = 7 - bsr(~s[0] | 1);
  s = s[v > 6 ? 1 : (v ? (v > s.length ? s.length : v) : 1)..$];
}

Please check this.

162 us

The branching, it hurts my eyes!

Something like the following should give correct (assuming I haven't written bad logic) branchless results using architecture-optimised min/max calls. Note that the minus/plus 1 on the third line, combined with the multiplication by sign, ensures that a value of 7 maps to 1; for every other value it is just an extra operation. The advantage is that you're not putting three branches in close proximity to each other, so the branch predictor has nothing to mispredict. (Of note, any performance test for these functions should use data designed to defeat the branch predictor in the code I quoted, keeping in mind that desktop Intel processors have a four-state branch predictor. I've not performance tested it myself, but I'd expect this to run faster on the AMD Jaguar processors than a version with branching checks.)

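// needs: import core.bitop : bsr; import std.algorithm.comparison : max, min;
int v = 7 - bsr( ~s[0] | 1 );
int sign = ( v - 7 ) >>> 31;   // unsigned shift: 1 when v < 7, 0 when v == 7
v = ( v - 1 ) * sign + 1;      // v == 7 maps to 1; other values are unchanged
s = s[ min( max( v, 1 ), s.length ) .. $ ];   // max keeps an ASCII lead byte (v == 0) advancing by 1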
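
As a sanity check on the logic, here is a minimal sketch that compares the stride produced by the quoted ternary chain with a branchless variant for every possible lead byte. The helper names (leadClass, strideBranching, strideBranchless) are made up for illustration. Two details are assumptions rather than part of either post: the complement is masked to 8 bits so the result does not depend on the compiler's integer promotion rules, and a max( v, 1 ) is included so that an ASCII lead byte (v == 0) still advances by one.

```d
import core.bitop : bsr;
import std.algorithm.comparison : max, min;

// Lead-byte class: 0 for ASCII, 1 for a continuation byte, 2..4 for
// multi-byte leads, 5..7 for bytes that cannot start valid UTF-8.
// Assumption: & 0xFF keeps the complement to 8 bits on any compiler.
int leadClass(ubyte b) pure nothrow @nogc @safe
{
    return 7 - bsr((~cast(uint) b & 0xFF) | 1);
}

// Stride as computed by the quoted ternary chain.
size_t strideBranching(ubyte b, size_t len) pure nothrow @nogc @safe
{
    immutable v = leadClass(b);
    return v > 6 ? 1 : (v ? (v > len ? len : v) : 1);
}

// Branchless sketch: a 0/1 factor replaces the v > 6 test,
// and max/min replace the remaining two comparisons.
size_t strideBranchless(ubyte b, size_t len) pure nothrow @nogc @safe
{
    int v = leadClass(b);
    immutable int keep = (v - 7) >>> 31; // 1 when v < 7, 0 when v == 7
    v = (v - 1) * keep + 1;              // 7 -> 1, everything else unchanged
    return min(cast(size_t) max(v, 1), len);
}

void main()
{
    // Exhaustive over all lead bytes and a few remaining lengths (len >= 1,
    // since popFront is only meaningful on a non-empty slice).
    foreach (size_t len; 1 .. 6)
        foreach (b; 0 .. 256)
            assert(strideBranching(cast(ubyte) b, len)
                == strideBranchless(cast(ubyte) b, len));
}
```

If the two versions agree, the program should exit silently. Whether min and max actually compile down to branch-free instructions depends on the compiler and target, so the generated assembly is worth inspecting before drawing performance conclusions.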
