Don't use the \C escape in regexes - Why not?

Michael Ludwig Mon, 03 May 2010 11:35:08 -0700

"Don't use the \C escape in regexes" - taken from Juerd's Unicode Advice page:


  http://juerd.nl/site.plp/perluniadvice

Why not?

------ perldoc perlre:
\C  Match a single C char (octet) even under Unicode.
    NOTE: breaks up characters into their UTF-8 bytes,
    so you may end up with malformed pieces of UTF-8.
    Unsupported in lookbehind.

------ URI::Escape
sub escape_char {
    return join '', @URI::Escape::escapes{$_[0] =~ /(\C)/g};
}

The regular expression is used to break an incoming text string into
individual bytes, which are then used as the keys of a hash slice. That is a
legitimate use case, and \C seems to do the job. So what is the problem with
the \C escape?
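
For comparison, here is a minimal sketch of the same byte-splitting done without \C. It assumes the input is a character string and that the bytes wanted are those of its UTF-8 encoding. The names %escapes and escape_char_bytes are hypothetical stand-ins of mine, not the actual URI::Escape code. It also avoids relying on \C in perls that no longer accept it (5.24 and later, as far as I know).

```perl
use strict;
use warnings;

# Hypothetical stand-in for %URI::Escape::escapes:
# maps every possible byte to its %XX form.
my %escapes = map { chr($_) => sprintf('%%%02X', $_) } 0 .. 255;

# Escape each byte of the UTF-8 encoding of the given string,
# without using \C.
sub escape_char_bytes {
    my ($char) = @_;
    utf8::encode($char);    # in place, on our copy: characters -> UTF-8 octets
    # After encoding, every "character" in $char is a single octet,
    # so /(.)/gs yields the list of bytes for the hash slice.
    return join '', @escapes{ $char =~ /(.)/gs };
}

print escape_char_bytes("\x{20AC}"), "\n";    # EURO SIGN, prints %E2%82%AC
```

The explicit utf8::encode makes the choice of encoding visible, whereas \C splits the string according to however perl happens to store it internally.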

-- 
Michael.Ludwig (#) XING.com
