On 07.07.2015 at 02:02, Duy Nguyen <[email protected]> wrote:
> On Tue, Jul 7, 2015 at 3:10 AM, René Scharfe <[email protected]> wrote:
> > On 06.07.2015 at 14:42, Nguyễn Thái Ngọc Duy wrote:
> > So the optimization before this patch was that if a string was searched for
> > without -F then it would be treated as a fixed string anyway unless it
> > contained regex special characters. Searching for fixed strings using the
> > kwset functions is faster than using regcomp and regexec, which makes the
> > exercise worthwhile.
> >
> > Your patch disables the optimization if non-ASCII characters are searched
> > for because kwset handles case transformations only for ASCII chars.
> >
> > Another consequence of this limitation is that -Fi (explicit
> > case-insensitive fixed-string search) doesn't work properly with non-ASCII
> > chars either. How can we handle this one? Fall back to regcomp by
> > escaping all special characters? Or at least warn?
>
> Hehe.. I noticed it too shortly after sending the patch. I was torn
> between simply documenting the limitation and waiting for the next
> person to come and fix it, or quoting the regex then passing to
> regcomp. GNU grep does the quoting in this case, but that code is
> GPLv3 so we can't simply copy over. It could be a problem if we need
> to quote a regex in a multibyte charset where ascii is not a subset.
> But I guess we can just go with UTF-8.

I played with the code a bit and came up with this function, which
backslash-escapes the regex special characters in a UTF-8 pattern so that
it can be handed to regcomp as a literal string. Hope it helps.

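static void escape_regexp(const char *pattern, size_t len,
			  char **new_pattern, size_t *new_len)
{
	const char *p = pattern;
	/* worst case: every byte needs a backslash in front of it */
	char *np = *new_pattern = xmalloc(2 * len);
	int chrlen;

	while (len) {
		/* advances p and decrements len by one utf-8 character */
		chrlen = mbs_chrlen(&p, &len, "utf-8");
		/* only single-byte (ASCII) characters can be special */
		if (chrlen == 1 && is_regex_special(*pattern))
			*np++ = '\\';
		memcpy(np, pattern, chrlen);
		np += chrlen;
		pattern = p;
	}
	/* the result is not NUL-terminated; callers must use new_len */
	*new_len = np - *new_pattern;
}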
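
For anyone who wants to try the idea outside of git, here is a rough
standalone sketch. It is not the code above: is_special(), escape_fixed()
and match_fixed_icase() are names I made up for illustration, the special
character list is my assumption of what needs quoting in a POSIX extended
regex, and it walks bytes instead of calling mbs_chrlen, which should be
safe for UTF-8 because every byte of a multibyte sequence is >= 0x80.
Unlike the function above it NUL-terminates the result so it can go
straight to regcomp.

```c
#include <assert.h>
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for git's is_regex_special(). */
static int is_special(unsigned char c)
{
	return c != '\0' && strchr("\\.[*^$+?(){|", c) != NULL;
}

/*
 * Return a malloc'ed, NUL-terminated copy of pattern in which every
 * special ASCII character is preceded by a backslash.  Bytes >= 0x80
 * are copied through untouched.  Returns NULL if malloc fails.
 */
static char *escape_fixed(const char *pattern, size_t len)
{
	char *out = malloc(2 * len + 1);
	char *np = out;
	size_t i;

	if (!out)
		return NULL;
	for (i = 0; i < len; i++) {
		unsigned char c = (unsigned char)pattern[i];
		if (is_special(c))
			*np++ = '\\';
		*np++ = (char)c;
	}
	*np = '\0';
	return out;
}

/*
 * Search text for needle as a literal string, ignoring case.
 * Returns 1 on a match, 0 on no match, -1 on error.
 * REG_EXTENDED matters: in a basic regex, "\+" or "\(" could turn
 * into operators instead of literals.
 */
static int match_fixed_icase(const char *needle, const char *text)
{
	regex_t re;
	char *quoted = escape_fixed(needle, strlen(needle));
	int rc;

	if (!quoted)
		return -1;
	rc = regcomp(&re, quoted, REG_EXTENDED | REG_ICASE | REG_NOSUB);
	free(quoted);
	if (rc)
		return -1;
	rc = regexec(&re, text, 0, NULL, 0);
	regfree(&re);
	return rc == 0;
}

/* Small usage example: the dot must match only a literal dot. */
static void demo(void)
{
	assert(match_fixed_icase("a.b", "xA.Bx") == 1);
	assert(match_fixed_icase("a.b", "axb") == 0);
}
```

Whether REG_ICASE actually folds non-ASCII characters depends on the
regex library and the locale in effect, so the sketch only shows the
quoting half of the problem.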