Awesome addition! Would it make sense to use the PDEP instruction from x86's BMI2
extension, or is the interleave computation too small a share of the runtime to
justify introducing not-so-easy-to-port code? Also, I think it needs a bit more
documentation to explain the logic, e.g. a link to
https://stackoverflow.com/questions/39490345/interleave-bits-efficiently ? Thx
for making it faster :)
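Roughly what I had in mind, as an untested sketch. It assumes 16-bit inputs interleaved into a 32-bit result, and `interleave16` / `spread_bits16` are made-up names, not anything from the patch. The portable path is the "binary magic numbers" shift-and-mask trick; the PDEP path is only compiled in when the compiler targets BMI2 (e.g. `-mbmi2`):

```c
#include <stdint.h>
#ifdef __BMI2__
#include <immintrin.h>
#endif

/* Portable: move bit i of the low 16 bits of x to bit 2*i,
 * doubling the gaps between bit groups at each step. */
uint32_t spread_bits16(uint32_t x)
{
    x &= 0x0000FFFFu;
    x = (x | (x << 8)) & 0x00FF00FFu;
    x = (x | (x << 4)) & 0x0F0F0F0Fu;
    x = (x | (x << 2)) & 0x33333333u;
    x = (x | (x << 1)) & 0x55555555u;
    return x;
}

/* Bits of x land in the even positions, bits of y in the odd ones. */
uint32_t interleave16(uint16_t x, uint16_t y)
{
#ifdef __BMI2__
    /* PDEP deposits the low bits of the source into the set bits of the mask. */
    return _pdep_u32(x, 0x55555555u) | _pdep_u32(y, 0xAAAAAAAAu);
#else
    return spread_bits16(x) | (spread_bits16(y) << 1);
#endif
}
```

This only picks the path at compile time; a generic distro build would need CPUID-based runtime dispatch to benefit. PDEP is also microcoded and slow on AMD CPUs before Zen 3, which is part of why I'm not sure it's worth it.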
