Hi,

Before taking the ticket, which asks for a new builtin function (e.g.
string regex_escape(string_pattern)), I would like to discuss a few open
issues. The purpose of the function is to escape the special characters in
a string pattern by replacing each one with its escaped form.

1. Define candidates of escaped characters
When I researched how other languages handle escaping, I found that each
one chooses a different set of characters.

We should decide on our own set of escaped characters. Here is a summary
of what other languages do:

- Perl: Escapes every character outside [A-Za-z_0-9].
- PHP: Escapes the following special characters: . \ + * ? [ ^ ] $ ( ) { }
= ! < > | : -
- Python: Same as Perl's approach; the underscore character is no longer
escaped since version 3.3.
- Ruby: Escapes the following special characters: [ ] { } ( ) | - * . \ ? +
^ $ #
Ruby escapes the comment character (#), but does not escape
context-sensitive characters (: <).
- Java: A different approach. Java treats the input "as if it were a
literal pattern" by wrapping it in "\Q" and "\E".
- C#: Escapes the following special characters: \ * + ? | { [ ( ) ^ $ . #
whitespace
C# does not escape ] and }.

See the discussion if you want to see more details:
https://github.com/benjamingr/RegExp.escape/blob/master/data/other_languages/discussions.md
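
To make the two main families above concrete, here is a rough Python
sketch. It is only an illustration, not a proposed implementation: the
names escape_non_word and escape_listed are hypothetical, the fixed set is
copied from the PHP entry in the summary, and only ASCII input is
considered.

```python
import re

# Characters the PHP entry in the summary lists as escaped.
PHP_STYLE_SPECIALS = set(r".\+*?[^]$(){}=!<>|:-")


def escape_non_word(pattern: str) -> str:
    """Perl-style: backslash-escape every character outside [A-Za-z_0-9]."""
    return re.sub(r"([^A-Za-z_0-9])", r"\\\1", pattern)


def escape_listed(pattern: str, specials=PHP_STYLE_SPECIALS) -> str:
    """PHP-style: backslash-escape only the characters in a fixed set."""
    return "".join("\\" + ch if ch in specials else ch for ch in pattern)
```

For the input "a.b c", the first function escapes both the dot and the
space, while the second escapes only the dot. Either result still matches
the original text literally; the difference is mostly in how readable the
escaped pattern stays.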

2. Built-in function name
The reporter proposed "regex_escape". I think the function name is
intuitive and self-explanatory. Please suggest a better name if you have one.

3. Signature of the built-in function
Do we need to extend the function signature? I guess a user may want to
pass a customized set of characters.

regex_escape(string_pattern, [delimiter])

delimiter
  := "^[A-Za-z0-9]"
  | "[.\?\[^()\]{}=!<>|:-]"

"^[A-Za-z0-9]" means "escapes non-alphanumeric characters"
"[.\?\[^()\]{}=!<>|:-]" means "escapes the specified characters"
Within the delimiter, the characters [ and ] must themselves be escaped.
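
As a rough illustration of how the extended signature could behave, here
is a Python sketch. Everything in it is an assumption made for discussion:
the default value of delimiter, the ValueError for unsupported forms, and
the choice to translate the "^[...]" form into a negated character class.
It also does not handle edge cases such as a delimiter whose class body
itself starts with "^".

```python
import re


def regex_escape(pattern: str, delimiter: str = "^[A-Za-z0-9]") -> str:
    """Backslash-escape the characters of `pattern` selected by `delimiter`.

    Assumed forms of `delimiter` (taken from the proposal above):
      "[...]"   escape exactly the listed characters
      "^[...]"  escape every character that is NOT listed
    """
    negate = delimiter.startswith("^")
    body = delimiter[1:] if negate else delimiter
    if not (body.startswith("[") and body.endswith("]") and len(body) > 2):
        raise ValueError(f"unsupported delimiter: {delimiter!r}")
    # Turn the proposal's "^[...]" notation into a regex negated class.
    char_class = "[^" + body[1:-1] + "]" if negate else body
    return re.sub("(" + char_class + ")", r"\\\1", pattern)
```

With the default, regex_escape("a.b c") escapes both the dot and the
space. With the second form, regex_escape("x[0]?", r"[.\?\[^()\]{}=!<>|:-]")
escapes only the brackets and the question mark.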

Best regards,
Jinchul
