On Monday, 24 March 2014 at 12:21:55 UTC, Daniel N wrote:
I'm currently too busy to submit a complete solution, but please feel free to use my idea if you think it sounds promising.

I now managed to dig up my old C source... but I'm still blocked by dmd not accepting the 'pext' instruction...

1) I know my solution is not directly comparable to the rest in this thread (for many reasons).
2) It's of course trivial to add a fast path for ASCII... if desired.
3) It throws safety and standards out the window.

#include <stdint.h>     // uint32_t
#include <x86intrin.h>  // __lzcnt32, _pext_u32 (needs LZCNT and BMI2 support)

char32_t front(const char* inp)
{
  static const uint32_t mask[33] =  // 33 entries: __lzcnt32(~rev) is 32 when rev is all ones
  {
      0b01111111'00000000'00000000'00000000,
      0b00000000'00000000'00000000'00000000,
      0b00011111'00111111'00000000'00000000,
      0b00001111'00111111'00111111'00000000,
      0b00000111'00111111'00111111'00111111,
  };
  uint32_t rev = __builtin_bswap32(reinterpret_cast<const uint32_t*>(inp)[0]);
  uint32_t len = __lzcnt32(~rev);

  return _pext_u32(rev, mask[len]);
}

This is what clang 3.4 generated:
## BB#0:
        pushq   %rbp
Ltmp2:
        .cfi_def_cfa_offset 16
Ltmp3:
        .cfi_offset %rbp, -16
        movq    %rsp, %rbp
Ltmp4:
        .cfi_def_cfa_register %rbp
        movl    (%rdi), %eax
        bswapl  %eax
        movl    %eax, %ecx
        notl    %ecx
        lzcntl  %ecx, %ecx
        leaq    __ZZ5frontPKcE4mask(%rip), %rdx
        pextl   (%rdx,%rcx,4), %eax, %eax
        popq    %rbp
        ret
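
For anyone who wants to try the idea without BMI2/LZCNT hardware, here is a rough portable sketch of the same table-driven decode. It is not the code above: pext_soft, leading_ones and front_portable are names made up for this illustration, the bit extraction is an ordinary loop standing in for _pext_u32, and the table has 33 entries because the leading-ones count can reach 32 when all four bytes are 0xFF. Like the original, it always reads 4 bytes from inp and does no validation.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical software stand-in for _pext_u32: gathers the bits of src
// selected by mask into the low bits of the result, preserving their order.
static uint32_t pext_soft(uint32_t src, uint32_t mask)
{
    uint32_t out = 0;
    for (uint32_t bit = 1; mask != 0; mask &= mask - 1, bit <<= 1)
    {
        uint32_t lowest = mask & (0u - mask); // isolate lowest set mask bit
        if (src & lowest)
            out |= bit;
    }
    return out;
}

// Hypothetical stand-in for __lzcnt32(~x): counts leading one bits (0..32).
static unsigned leading_ones(uint32_t x)
{
    unsigned n = 0;
    while (n < 32 && (x & (0x80000000u >> n)))
        ++n;
    return n;
}

// Same decode as front() above, minus the intrinsics.
// Assumption: inp points at 4 or more readable bytes holding valid UTF-8.
char32_t front_portable(const char* inp)
{
    static const uint32_t mask[33] =
    {
        0x7F000000u, // 0 leading ones: ASCII
        0x00000000u, // 1 leading one: continuation byte, not a valid start
        0x1F3F0000u, // 2 leading ones: 2-byte sequence
        0x0F3F3F00u, // 3 leading ones: 3-byte sequence
        0x073F3F3Fu, // 4 leading ones: 4-byte sequence
    };               // remaining entries are zero

    unsigned char b[4];
    std::memcpy(b, inp, 4); // avoids the unaligned reinterpret_cast load

    // Assemble big-endian so the lead byte ends up in the top 8 bits,
    // which is what the bswap achieves on little-endian x86.
    uint32_t rev = (uint32_t(b[0]) << 24) | (uint32_t(b[1]) << 16)
                 | (uint32_t(b[2]) << 8)  |  uint32_t(b[3]);

    return pext_soft(rev, mask[leading_ones(rev)]);
}
```

The table does all the work: the number of leading one bits in the lead byte selects a mask, and the mask keeps only the payload bits of each byte, which the extraction step then packs together into the code point.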
