On Thu, Jul 09, 2015 at 03:46:08PM +0200, Richard Biener wrote:
> On Thu, 9 Jul 2015, Bernhard Reutner-Fischer wrote:
> 
> > gcc/ChangeLog
> > 
> > 2015-07-09  Bernhard Reutner-Fischer  <al...@gcc.gnu.org>
> > 
> >     * builtins.c (fold_builtin_tolower, fold_builtin_toupper): New
> >     static functions.
> >     (fold_builtin_1): Handle BUILT_IN_TOLOWER, BUILT_IN_TOUPPER.
> 
> As I read it you fold tolower (X) to (X) >= target_char_set ('A')
> && (X) <= target_char_set ('Z') ? (X) - target_char_set ('A') + 
> target_char_set ('a');
> 
> I don't think this can be correct for all locales which need not
> have a lower-case character for all upper-case ones nor do
> all letters having one need to be in the range of 'A' to 'Z'.
> 
> Joseph will surely correct me if I am wrong.
> 
That's correct, as this doesn't handle toupper('č') in an appropriate
single-byte locale. You cannot even rely on the fact that for x < 128 the
only conversion happens in the 'A'..'Z' range. There are locales where that
doesn't hold (Turkish is the usual example, where 'I' lowercases to dotless
'ı'), and we need to check _NL_CTYPE_NONASCII_CASE. We don't export that,
so you would need to detect it while constructing a table with 256 entries.
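A table built that way could look like this minimal sketch. build_tolower_table
is a hypothetical name, not something glibc exports. It assumes an ASCII
execution character set and a single-byte locale, and it only reports whether
bytes below 128 follow the plain 'A'..'Z' rule, which is the condition an
inline or vectorized fast path would need.

```c
#include <ctype.h>
#include <stdbool.h>

/* Hypothetical helper: fill table[b] = tolower(b) for every byte value in
   the current locale and report whether bytes below 128 follow the plain
   ASCII rule (only 'A'..'Z' change, each to the matching 'a'..'z').  */
static bool build_tolower_table(unsigned char table[256])
{
  bool ascii_only = true;
  for (int b = 0; b < 256; b++)
    {
      int l = tolower(b);          /* b is a valid unsigned char value */
      table[b] = (unsigned char) l;
      if (b < 128)
        {
          int expected = (b >= 'A' && b <= 'Z') ? b + ('a' - 'A') : b;
          if (l != expected)
            ascii_only = false;    /* locale has non-ASCII case rules */
        }
    }
  return ascii_only;
}
```

In the default "C" locale this returns true. After setlocale() switches to a
locale whose low half does not follow the ASCII rule, it should return false,
and a caller would have to stay on the table lookup.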

Also, your example is invalid, as you used __builtin_tolower instead of
tolower. As usual, gcc builtins are slow; you will get better performance
with the following:

#include <ctype.h>

void foo(char *c)
{
  int i;
  for (i = 0; i < 1000; i++)
    /* Cast so that negative chars are not passed to tolower (undefined).  */
    c[i] = tolower((unsigned char) c[i]);
}


As for your example, the first problem is that it doesn't work with UTF-8
due to multibyte characters.
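To make the multibyte point concrete: 'Č' (U+010C) is C4 8C in UTF-8 and
'č' (U+010D) is C4 8D, so lowercasing changes only the continuation byte,
which a per-byte tolower never sees as a character of its own. The small
encoder here is a hypothetical helper written for illustration, not a libc
function, and it does not reject surrogate code points.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode code point cp as UTF-8 into out.
   Returns the number of bytes written (1-4), or 0 above U+10FFFF.  */
static size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
  if (cp < 0x80)
    {
      out[0] = (unsigned char) cp;
      return 1;
    }
  if (cp < 0x800)
    {
      out[0] = (unsigned char) (0xC0 | (cp >> 6));
      out[1] = (unsigned char) (0x80 | (cp & 0x3F));
      return 2;
    }
  if (cp < 0x10000)
    {
      out[0] = (unsigned char) (0xE0 | (cp >> 12));
      out[1] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
      out[2] = (unsigned char) (0x80 | (cp & 0x3F));
      return 3;
    }
  if (cp <= 0x10FFFF)
    {
      out[0] = (unsigned char) (0xF0 | (cp >> 18));
      out[1] = (unsigned char) (0x80 | ((cp >> 12) & 0x3F));
      out[2] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
      out[3] = (unsigned char) (0x80 | (cp & 0x3F));
      return 4;
    }
  return 0;
}
```

Handling this properly means decoding with mbrtowc() and applying towlower()
to the wide character, which a byte-at-a-time fold cannot express.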

The second problem is that SSE4.2 doesn't help at all, as generating masks
with it is quite slow. Using just SSE2 is faster here.

It could be possible to add such a function to libc. For vectorization you
would need to use the following, after checking that the locale does not set
_NL_CTYPE_NONASCII_CASE. I didn't finish or test it; char128, a and z need
to be set up to test 128 <= x[i], 'A' <= x[i] and x[i] <= 'Z'.

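#include <ctype.h>
#include <emmintrin.h>
#include <strings.h>

/* Lowercase the 16 bytes at _x into y.  ASCII 'A'..'Z' is handled with
   SSE2; bytes >= 128 fall back to tolower.  Assumes the caller already
   checked _NL_CTYPE_NONASCII_CASE.  */
void c16(const char *_x, char *y)
{
  __m128i x = _mm_loadu_si128((const __m128i *) _x);
  const __m128i char128 = _mm_setzero_si128(); /* 0 > x[i] signed <=> x[i] >= 128 */
  const __m128i a = _mm_set1_epi8('A' - 1);    /* x[i] > a <=> x[i] >= 'A' */
  const __m128i z = _mm_set1_epi8('Z' + 1);    /* z > x[i] <=> x[i] <= 'Z' */
  const __m128i tolower_bit = _mm_set1_epi8(0x20);

  int mask = _mm_movemask_epi8(_mm_cmpgt_epi8(char128, x));
  x = _mm_or_si128(x, _mm_and_si128(tolower_bit,
                   _mm_and_si128(_mm_cmpgt_epi8(x, a), _mm_cmpgt_epi8(z, x))));
  _mm_storeu_si128((__m128i *) y, x);
  while (mask)
    {
      int i = ffs(mask) - 1;  /* ffs() is 1-based.  */
      y[i] = tolower((unsigned char) y[i]);
      mask = mask & (mask - 1);
    }
}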
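Since the vector code is untested, a scalar model of the same per-byte rule
could serve as a reference when checking it on 16-byte inputs. This is a
minimal sketch; c16_ref and its length parameter are hypothetical, and it
mirrors the rule (bytes >= 128 through tolower, 'A'..'Z' with bit 0x20 set,
everything else copied) rather than any existing libc code.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical scalar reference for the SSE2 sketch: applies the same
   per-byte rule to n bytes from x, writing the result to y.  */
static void c16_ref(const char *x, char *y, size_t n)
{
  for (size_t i = 0; i < n; i++)
    {
      unsigned char c = (unsigned char) x[i];
      if (c >= 128)
        y[i] = (char) tolower(c);       /* locale-dependent slow path */
      else if (c >= 'A' && c <= 'Z')
        y[i] = (char) (c | 0x20);       /* set the ASCII lowercase bit */
      else
        y[i] = (char) c;                /* unchanged */
    }
}
```

Comparing c16 against c16_ref on random 16-byte buffers in a few locales
would be a cheap way to catch mistakes in the mask constants.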
