bug#16232: [PATCH] grep: make --ignore-case (-i) faster (sometimes 10x) in multibyte locales

Pádraig Brady Fri, 10 Jan 2014 17:50:23 -0800

Cool, so it does this transformation:

  sed 's/./[\L&\U&]/g'
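
To make the transformation concrete, here is a minimal C sketch. It is hypothetical and is not the code in the patch. It assumes a literal pattern with no bracket expressions or backslash escapes of its own, and it only brackets ASCII letters. Bracketing every character, as the sed one-liner does, would turn a ':' into "[::]", which a regex engine reads as the start of a character class. Case mapping is plain ASCII arithmetic, so locale rules such as the Turkish dotted/dotless i are deliberately ignored.

```c
#include <stddef.h>

/* Hypothetical sketch, not grep's code: expand each ASCII letter of a
   literal pattern into "[xX]", mirroring  sed 's/./[\L&\U&]/g'.
   Non-letters (including all bytes >= 0x80) are copied unchanged.
   Case mapping is ASCII arithmetic, ignoring locale rules.
   Returns the output length, or (size_t) -1 if OUT is too small.  */
size_t icase_expand (const char *pat, char *out, size_t outsz)
{
  size_t n = 0;
  if (outsz == 0)
    return (size_t) -1;
  for (const unsigned char *p = (const unsigned char *) pat; *p; p++)
    {
      unsigned char c = *p;
      int is_lower = c >= 'a' && c <= 'z';
      int is_upper = c >= 'A' && c <= 'Z';
      if (is_lower || is_upper)
        {
          if (n + 4 >= outsz)   /* 4 bytes plus room for the final NUL */
            return (size_t) -1;
          out[n++] = '[';
          out[n++] = (char) (is_lower ? c : c - 'A' + 'a');
          out[n++] = (char) (is_upper ? c : c - 'a' + 'A');
          out[n++] = ']';
        }
      else
        {
          if (n + 1 >= outsz)   /* 1 byte plus room for the final NUL */
            return (size_t) -1;
          out[n++] = (char) c;
        }
    }
  out[n] = '\0';
  return n;
}
```

For example, expanding "I:i" gives "[iI]:[iI]", which a case-sensitive matcher can then search for directly.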


Though multibyte case handling has all sorts of edge cases (pardon the pun);
is it always valid to treat each character independently?
For example, see some of the tests in:
http://git.sv.gnu.org/gitweb/?p=gnulib.git;a=blob;f=tests/unicase/test-ulc-casecmp.c;hb=HEAD
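
One concrete reason per-character treatment can break: under Unicode full case folding, some single code points fold to several code points, while a bracket like "[xX]" always matches exactly one character. The C sketch below encodes a few such entries from the Unicode CaseFolding.txt data (status F). It is an illustrative excerpt only, and the struct and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative excerpt of Unicode full case folding (CaseFolding.txt,
   status F) where ONE code point folds to SEVERAL.  Hand-picked
   examples only; a real implementation needs the complete table.  */
struct multi_fold
{
  uint32_t cp;          /* source code point */
  uint32_t fold[2];     /* its full case folding */
  size_t len;           /* number of code points in FOLD */
};

static const struct multi_fold multi_folds[] = {
  { 0x00DF, { 0x0073, 0x0073 }, 2 },  /* sharp s -> "ss" */
  { 0x0130, { 0x0069, 0x0307 }, 2 },  /* I with dot above -> "i" + combining dot */
  { 0xFB00, { 0x0066, 0x0066 }, 2 },  /* ligature ff -> "ff" */
};

/* Number of code points CP folds to according to the excerpt above.
   Returns 1 for anything not in the excerpt, which is NOT true of the
   full table.  Any CP returning more than 1 cannot be matched by a
   one-character bracket expression.  */
size_t full_fold_length (uint32_t cp)
{
  for (size_t i = 0; i < sizeof multi_folds / sizeof multi_folds[0]; i++)
    if (multi_folds[i].cp == cp)
      return multi_folds[i].len;
  return 1;
}
```

So a pattern containing "ß" rewritten character by character could never match "SS" in the data, even though full case folding treats them as equal.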

I wonder whether this faster path might be restricted to a safer, but still
very common, input subset:

(MB_CUR_MAX == 1 || (in_utf8 && *c < 0x80))
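
One possible reading of that restriction, sketched in C: apply the condition to every byte of the pattern and take the fast path only if all of them pass, otherwise keep the existing -i code. The function name is hypothetical, and MB_CUR_MAX and in_utf8 are passed in as parameters here so the sketch does not depend on the current locale.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical predicate applying
     (MB_CUR_MAX == 1 || (in_utf8 && *c < 0x80))
   to every byte of PAT.  Returns true only if the whole pattern lies in
   the "safe" subset, so the per-character rewrite could be used;
   otherwise the caller would fall back to the existing -i path.  */
bool icase_fast_path_ok (const char *pat, size_t mb_cur_max, bool in_utf8)
{
  if (mb_cur_max == 1)
    return true;            /* unibyte locale: every byte is a character */
  if (!in_utf8)
    return false;           /* other multibyte encodings: stay conservative */
  for (const unsigned char *c = (const unsigned char *) pat; *c; c++)
    if (*c >= 0x80)
      return false;         /* part of a non-ASCII UTF-8 sequence */
  return true;
}
```

Checking the whole pattern up front keeps the decision simple: a pattern either goes entirely through the rewrite or entirely through the slower, more careful path, so non-ASCII characters are never silently matched case-sensitively.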

Also, are the following printfs in the test redundant?

> +data=$(      printf "I:$I $i:i")
> +search_str=$(printf "$i:i I:$I")

Nice improvement!
Pádraig.
