On 12/27/2021 at 4:39 PM, Bart via lazarus wrote:
pn8^              =11100010   //first byte
(pn8^ shr 7)      =11111111  //<<-- I would have expected that to be 00000001 ?

That depends on whether pn8^ is signed or not; for a signed shift it makes sense. Declaring it as pint8 (instead of puint8) is an odd choice.
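To make the signed/unsigned difference concrete, here is a minimal sketch that shifts the same bit pattern once as UInt8 and once as Int8. The program and variable names are made up for illustration, and it assumes a reasonably recent Free Pascal; only the low 8 bits of each result are printed.

```pascal
program ShiftDemo;
{$mode objfpc}

var
  u: UInt8;
  s: Int8;
begin
  u := %11100010;   // the first byte from the quoted output
  s := Int8(u);     // same bit pattern, read as a signed value (-30)

  // BinStr(value, 8) prints the lowest 8 bits of the shift result.
  WriteLn('unsigned: ', BinStr(u shr 7, 8));  // 00000001
  WriteLn('signed:   ', BinStr(s shr 7, 8));  // expected to match the 11111111 quoted above
end.
```

If the signed line shows 11111111, the value was sign-extended before the shift, which would explain the output you are seeing.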

The expression seems to be 1 when the top bits are 10, in other words when the byte is a UTF-8 follow byte. That is what the comment says, and as far as I can see the signedness doesn't matter.

Basically, to me that seems to be a branchless version of:

if (p[i] and %11000000)=%10000000 then
   inc(result);

...which counts all UTF-8 follow bytes. That count is then subtracted from the number of bytes in the string to find the number of UTF-8 sequences/codepoints.
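As a rough sketch of that idea, the program below counts follow bytes both with the branch and with one possible branchless form, then subtracts the count from the byte length. The function names and the sample bytes are my own, and the branchless expression is only an illustration on unsigned bytes, not necessarily the exact expression in the code under discussion.

```pascal
program CountCodepoints;
{$mode objfpc}

// Hypothetical helper: counts bytes of the form 10xxxxxx using a branch.
function CountFollowBytes(p: PByte; len: SizeInt): SizeInt;
var
  i: SizeInt;
begin
  Result := 0;
  for i := 0 to len - 1 do
    if (p[i] and %11000000) = %10000000 then
      Inc(Result);
end;

// Hypothetical branchless variant: adds 1 only when bit 7 is set
// and bit 6 is clear. The "and" with (p[i] shr 7), which is 0 or 1,
// masks off any higher bits produced by the "not".
function CountFollowBytesBranchless(p: PByte; len: SizeInt): SizeInt;
var
  i: SizeInt;
begin
  Result := 0;
  for i := 0 to len - 1 do
    Inc(Result, (p[i] shr 7) and ((not p[i]) shr 6));
end;

const
  // 'a' (1 byte), U+00E9 (2 bytes), U+20AC (3 bytes) as raw UTF-8
  Sample: array[0..5] of Byte = ($61, $C3, $A9, $E2, $82, $AC);
begin
  // 6 bytes minus 3 follow bytes leaves 3 codepoints
  WriteLn(Length(Sample) - CountFollowBytes(@Sample[0], Length(Sample)));
  WriteLn(Length(Sample) - CountFollowBytesBranchless(@Sample[0], Length(Sample)));
end.
```

Both lines should print 3 for this input. Working on PByte rather than a signed pointer type sidesteps the sign-extension question entirely.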


Maybe the absolute stuff confuses things somehow? Also make sure the input is 100% the same by printing the byte values of the input string.

--
_______________________________________________
lazarus mailing list
lazarus@lists.lazarus-ide.org
https://lists.lazarus-ide.org/listinfo/lazarus
