On Thu, Jul 09, 2015 at 03:46:08PM +0200, Richard Biener wrote:
> On Thu, 9 Jul 2015, Bernhard Reutner-Fischer wrote:
>
> > gcc/ChangeLog
> >
> > 2015-07-09  Bernhard Reutner-Fischer  <al...@gcc.gnu.org>
> >
> >     * builtins.c (fold_builtin_tolower, fold_builtin_toupper): New
> >     static functions.
> >     (fold_builtin_1): Handle BUILT_IN_TOLOWER, BUILT_IN_TOUPPER.
>
> As I read it you fold tolower (X) to (X) >= target_char_set ('A')
> && (X) <= target_char_set ('Z') ? (X) - target_char_set ('A') +
> target_char_set ('a');
>
> I don't think this can be correct for all locales which need not
> have a lower-case character for all upper-case ones nor do
> all letters having one need to be in the range of 'A' to 'Z'.
>
> Joseph will surely correct me if I am wrong.
>
That's correct; this doesn't handle toupper('č') in an appropriate
single-byte locale. You cannot even rely on conversion happening only
in the 'A'..'Z' range when x < 128: there are locales where that
doesn't hold, and we need to check _NL_CTYPE_NONASCII_CASE. We don't
export that, so you would need to check it while constructing a table
with 256 entries.
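
A minimal sketch of that table check, using only the public ctype
interface. _NL_CTYPE_NONASCII_CASE is internal, so this does not touch
it; it calls tolower() for every byte value in the current locale and
reports whether the bytes below 128 follow the plain ASCII rule. The
name build_tolower_table and its return convention are made up for
illustration, and an ASCII-compatible character set is assumed.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: fill table[0..255] with tolower() of every byte
   value in the current locale.  Returns 1 if every byte below 128
   follows the plain ASCII rule ('A'..'Z' gain 0x20, everything else is
   unchanged), 0 otherwise.  Assumes an ASCII-compatible charset.  */
int build_tolower_table(unsigned char table[256])
{
  int ascii_only = 1;

  assert(table != NULL);
  for (int x = 0; x < 256; x++)
    {
      int expect = (x >= 'A' && x <= 'Z') ? x + 0x20 : x;

      table[x] = (unsigned char) tolower(x);
      if (x < 128 && table[x] != expect)
        ascii_only = 0;
    }
  return ascii_only;
}
```

A caller would presumably run setlocale(LC_CTYPE, "") first and, when
the function returns 0, fall back to a per-character table lookup
instead of any range-based shortcut.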
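
Also, your example is invalid, as you used __builtin_tolower instead of
tolower. As usual, gcc builtins are slow; you will get better
performance with the following.

#include <ctype.h>

int foo(char *c)
{
  int i;
  /* Cast so that negative char values are valid tolower arguments.  */
  for (i = 0; i < 1000; i++)
    c[i] = tolower((unsigned char) c[i]);
  return 0;
}

As for your example, the first problem is that it doesn't work with
UTF-8 because of multibyte characters. The second problem is that
SSE4.2 doesn't help at all, as generating masks with it is quite slow.
Using just SSE2 is faster here. It could be possible to add such a
function to libc.

For vectorization you would need to use the following after checking
that _NL_CTYPE_NONASCII_CASE didn't happen. I haven't finished or
tested it. The constants a and z are set up so that the signed
comparisons test 'A' <= x[i] and x[i] <= 'Z'; the 128 <= x[i] test
needs no constant, since those are exactly the bytes with the top bit
set.

#include <ctype.h>
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <strings.h>    /* ffs */

void c16(char *_x, char *y)
{
  /* Signed byte compares: x > 'A' - 1 and 'Z' + 1 > x.  */
  const __m128i a = _mm_set1_epi8('A' - 1);
  const __m128i z = _mm_set1_epi8('Z' + 1);
  const __m128i tolower_bit = _mm_set1_epi8(0x20);
  __m128i x = _mm_loadu_si128((const __m128i *) _x);
  /* One mask bit per byte with the top bit set, i.e. 128 <= x[i].  */
  int mask = _mm_movemask_epi8(x);
  x = _mm_or_si128(x, _mm_and_si128(tolower_bit,
        _mm_and_si128(_mm_cmpgt_epi8(x, a), _mm_cmpgt_epi8(z, x))));
  _mm_storeu_si128((__m128i *) y, x);
  /* Let the locale's tolower handle the non-ASCII bytes.  */
  while (mask)
    {
      int i = ffs(mask) - 1;  /* ffs is 1-based */
      y[i] = tolower((unsigned char) y[i]);
      mask = mask & (mask - 1);
    }
}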
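
To use a 16-byte kernel like the one above on a whole buffer, something
along these lines might do. It is only a sketch: lower_buf is a made-up
name, __builtin_ctz is a GCC/Clang builtin, and the caller is assumed to
have already verified that bytes below 128 follow the ASCII rule in the
current locale. When __SSE2__ is not defined the function degrades to
the plain scalar loop.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#ifdef __SSE2__
#include <emmintrin.h>
#endif

/* Hypothetical wrapper: lower-case n bytes from src into dst
   (dst == src is allowed).  */
void lower_buf(char *dst, const char *src, size_t n)
{
  size_t i = 0;

  assert(dst != NULL && src != NULL);
#ifdef __SSE2__
  const __m128i a = _mm_set1_epi8('A' - 1);
  const __m128i z = _mm_set1_epi8('Z' + 1);
  const __m128i bit = _mm_set1_epi8(0x20);

  for (; i + 16 <= n; i += 16)
    {
      __m128i x = _mm_loadu_si128((const __m128i *) (src + i));
      /* Sign bits mark the bytes >= 128 that need the locale's tolower.  */
      unsigned mask = (unsigned) _mm_movemask_epi8(x);
      /* Signed compares: true only for bytes in 'A'..'Z'.  */
      __m128i upper = _mm_and_si128(_mm_cmpgt_epi8(x, a),
                                    _mm_cmpgt_epi8(z, x));

      x = _mm_or_si128(x, _mm_and_si128(bit, upper));
      _mm_storeu_si128((__m128i *) (dst + i), x);
      for (; mask; mask &= mask - 1)
        {
          size_t j = i + (size_t) __builtin_ctz(mask);
          dst[j] = (char) tolower((unsigned char) dst[j]);
        }
    }
#endif
  /* Scalar tail, and the whole buffer when SSE2 is unavailable.  */
  for (; i < n; i++)
    dst[i] = (char) tolower((unsigned char) src[i]);
}
```

Whether this beats the plain loop would have to be measured; buffers
with many bytes >= 128 spend most of their time in the inner fix-up
loop.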