Hi,
I would like to discuss a few issues before taking the ticket, which asks for
a new builtin function (e.g. string regex_escape(string_pattern)). The
purpose of the function is to escape a set of special characters in the
string pattern by replacing each of them with its escaped form.
1. Define candidates of escaped characters
When I researched escaping in other languages, I found that each language
behaves somewhat differently.
We should decide on our own set of escaped characters. Here is a summary of
the discussion linked below:
- Perl: Escapes every character that is not alphanumeric or underscore
(i.e. not in [A-Za-z_0-9]).
- PHP: Escapes the following special characters: . \ + * ? [ ^ ] $ ( ) { }
= ! < > | : -
- Python: Same as Perl's approach; the underscore character has not been
escaped since version 3.3.
- Ruby: Escapes the following special characters: [ ] { } ( ) | - * . \ ? +
^ $ #
Ruby escapes the comment character (#), but does not escape context-sensitive
characters (: <).
- Java: A different approach. Java treats the whole string "as if it were a
literal pattern" by wrapping it in "\Q" and "\E".
- C#: Escapes the following special characters: \ * + ? | { [ ( ) ^ $ . #
whitespace
C# does not escape ] and }.
See the discussion if you want to see more details:
https://github.com/benjamingr/RegExp.escape/blob/master/data/other_languages/discussions.md
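To illustrate how much the choice of character set matters, here is a minimal Python sketch that applies three of the strategies summarized above to the same input. The helper names (escape_set, escape_non_word) are hypothetical, the PHP and Ruby sets are copied from the lists in this mail, and the Perl-style helper is a simplification that ignores any special handling of non-ASCII input.

```python
import re

# Character sets copied from the summary above (separating spaces removed).
PHP_SPECIALS = set(".\\+*?[^]$(){}=!<>|:-")
RUBY_SPECIALS = set("[]{}()|-*.\\?+^$#")


def escape_set(text, specials):
    """Prefix a backslash to every character of text that is in specials."""
    return "".join("\\" + ch if ch in specials else ch for ch in text)


def escape_non_word(text):
    """Perl-style (simplified): escape every character outside [A-Za-z_0-9]."""
    return re.sub(r"([^A-Za-z_0-9])", r"\\\1", text)


sample = "a.b#c:d"
print(escape_set(sample, PHP_SPECIALS))   # a\.b#c\:d   ('#' is not in the PHP list above)
print(escape_set(sample, RUBY_SPECIALS))  # a\.b\#c:d   (':' is not in the Ruby list above)
print(escape_non_word(sample))            # a\.b\#c\:d
```

All three results match the original string literally when used as a pattern in Python's re module, but they differ in which characters carry a backslash, which is why we need to fix one set for our builtin.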
2. Built-in function name
The reporter proposed "regex_escape". I think the function name is
intuitive and self-explanatory. Please suggest a better name if you have one.
3. Signature of the built-in function
Do we have to extend the function signature? I guess a user may want to pass
a set of customized characters.
regex_escape(string_pattern, [delimiter])
delimiter
:= "[^A-Za-z0-9]"
| "[.\?\[^()\]{}=!<>|:-]"
"[^A-Za-z0-9]" means "escape non-alphanumeric characters"
"[.\?\[^()\]{}=!<>|:-]" means "escape the specified characters"
In the delimiter, the following characters must themselves be escaped: [ ]
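To make the proposed signature concrete, here is a minimal Python sketch of the two-argument form. It is only an illustration under stated assumptions, not the intended implementation: the optional delimiter is assumed to be a standard regex character class (so the non-alphanumeric case is spelled [^A-Za-z0-9]), and the default value and the DEFAULT_CLASS name are hypothetical.

```python
import re

# Assumption: without a delimiter, every non-alphanumeric character is escaped.
DEFAULT_CLASS = r"[^A-Za-z0-9]"


def regex_escape(string_pattern, delimiter=DEFAULT_CLASS):
    """Backslash-escape every character of string_pattern matched by delimiter."""
    char_class = re.compile(delimiter)
    # Each match is a single character; put a backslash in front of it.
    return char_class.sub(lambda m: "\\" + m.group(0), string_pattern)


print(regex_escape("a.b_c"))                                # a\.b\_c
print(regex_escape("a.b(c)=d", r"[.\?\[^()\]{}=!<>|:-]"))   # a\.b\(c\)\=d
```

Passing the character class as a pattern keeps the signature small, at the cost of requiring the caller to escape [ and ] inside it.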
Best regards,
Jinchul