(this only applies for strict UTF-8)

On Monday 22 August 2016 23:19:51 Karl Williamson wrote:
> The code could be tweaked to call UTF8_IS_SUPER first, but I'm
> asserting that an optimizing compiler will see that any call to
> is_utf8_char_slow() is pointless, and will optimize it out.

Such an optimization cannot be done, because the compiler cannot know
that. You have this code:

+    const STRLEN char_len = isUTF8_CHAR(x, send);
+
+    if (       UNLIKELY(! char_len)
+        || (   UNLIKELY(isUTF8_POSSIBLY_PROBLEMATIC(*x))
+            && (   UNLIKELY(UTF8_IS_SURROGATE(x, send))
+                || UNLIKELY(UTF8_IS_SUPER(x, send))
+                || UNLIKELY(UTF8_IS_NONCHAR(x, send)))))
+    {
+        *ep = x;
+        return FALSE;
+    }

Here the isUTF8_CHAR() macro calls the function is_utf8_char_slow() when
the condition IS_UTF8_CHAR_FAST(UTF8SKIP(x)) is false. And because
is_utf8_char_slow() is an external library function, the compiler has
absolutely no idea what that function does. Outside a purely functional
world such a function could have side effects, so the compiler really
cannot eliminate the call.

Moving UTF8_IS_SUPER before isUTF8_CHAR might help, but I'm skeptical
that gcc can really propagate the constants from the PL_utf8skip[] array
back and prove that IS_UTF8_CHAR_FAST must always be true whenever
UTF8_IS_SUPER is false.

Rather, add an IS_UTF8_CHAR_FAST(UTF8SKIP(x)) check (or similar) before
the isUTF8_CHAR() call. That should completely eliminate generating code
that calls is_utf8_char_slow(). With UTF8_IS_SUPER alone, a branch that
is never taken can still end up in the binary.
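
To illustrate the last suggestion, here is a minimal sketch, not the real perl code. All names in it (skip_len, CHAR_FAST, char_fast, char_slow, char_len, strict_char_len, slow_calls) are hypothetical stand-ins for UTF8SKIP, IS_UTF8_CHAR_FAST, isUTF8_CHAR and is_utf8_char_slow(), and it assumes an ASCII platform where "fast" means a sequence of at most 4 bytes. The validation itself is simplified (it only checks continuation bytes). The counter exists only to make calls to the slow path observable.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins, NOT the real perl definitions. */
static int slow_calls = 0;          /* how often the slow path was entered */

/* Sequence length implied by the lead byte (simplified). */
static size_t skip_len(unsigned char lead)
{
    if (lead < 0xC0) return 1;
    if (lead < 0xE0) return 2;
    if (lead < 0xF0) return 3;
    if (lead < 0xF8) return 4;
    if (lead < 0xFC) return 5;
    if (lead < 0xFE) return 6;
    return 7;
}

#define CHAR_FAST(len) ((len) <= 4)

/* Returns 1 if s[1..len-1] are all continuation bytes (0x80..0xBF). */
static int continuations_ok(const unsigned char *s, size_t len)
{
    size_t i;
    for (i = 1; i < len; i++)
        if (s[i] < 0x80 || s[i] > 0xBF) return 0;
    return 1;
}

/* Fast path: only ever called for lengths 1..4. */
static size_t char_fast(const unsigned char *s, size_t len)
{
    assert(len >= 1 && len <= 4);
    if (len == 1) return *s < 0x80 ? 1 : 0;
    return continuations_ok(s, len) ? len : 0;
}

/* Slow path: in perl this is an external function the compiler cannot see into. */
static size_t char_slow(const unsigned char *s, const unsigned char *e)
{
    size_t len = skip_len(*s);
    (void)e;                        /* the caller already checked the length */
    slow_calls++;
    return continuations_ok(s, len) ? len : 0;
}

/* Shaped like isUTF8_CHAR(): falls back to the slow path when not "fast". */
static size_t char_len(const unsigned char *s, const unsigned char *e)
{
    size_t len;
    if (s >= e) return 0;
    len = skip_len(*s);
    if ((size_t)(e - s) < len) return 0;
    return CHAR_FAST(len) ? char_fast(s, len) : char_slow(s, e);
}

/* Strict variant: the explicit guard rejects long sequences up front, so
 * the slow branch inside char_len() can never be reached from here. */
static size_t strict_char_len(const unsigned char *s, const unsigned char *e)
{
    if (s >= e || !CHAR_FAST(skip_len(*s))) return 0;
    return char_len(s, e);
}
```

With the guard written out like this, the condition that selects the slow path is visibly false at the call site, so a compiler that inlines char_len() has a realistic chance of dropping the call entirely. Whether a given compiler actually does so would still need to be checked in the generated code.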