https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83601

            Bug ID: 83601
           Summary: std::regex_replace C++14 conformance issue: escaping
                    in SED mode
           Product: gcc
           Version: 8.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: libstdc++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: andrey.y.guskov at intel dot com
  Target Milestone: ---

C++14 standard (page 1107, see here:
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n4296.pdf#1121), 28.5.2
[Bitmask type regex_constants::match_flag_type]:

...
format_sed
When a regular expression match is to be replaced by a new string, the new
string shall be constructed using the rules used by the sed utility in POSIX.
...


The rules which SED uses are documented in IEEE 1003.1 (p. 3221):

An <ampersand> ('&') appearing in the replacement shall be replaced by the
string matching the BRE. The special meaning of '&' in this context can be
suppressed by preceding it by a <backslash>. The characters "\n", where n is a
digit, shall be replaced by the text matched by the corresponding
back-reference expression.
...
The special meaning of "\n" where n is a digit in this context, can be
suppressed by preceding it by a <backslash>.
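
To make the quoted rules concrete, here is a minimal sketch of the expansion they describe. The names sed_expand and sed_expand_examples are hypothetical helpers written for this illustration and are not part of libstdc++. Treating "\0" as the whole match is an assumption that follows how this report uses it.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not a libstdc++ function): expands a sed-style
// replacement string. groups[0] is the whole match and groups[n] is the
// n-th parenthesized subexpression.
std::string sed_expand(const std::string& repl,
                       const std::vector<std::string>& groups) {
    std::string out;
    for (std::size_t i = 0; i < repl.size(); ++i) {
        const char c = repl[i];
        if (c == '&') {
            // An unescaped '&' stands for the whole match.
            if (!groups.empty()) out += groups[0];
        } else if (c == '\\' && i + 1 < repl.size()) {
            const char next = repl[++i];
            if (std::isdigit(static_cast<unsigned char>(next))) {
                // "\n" stands for group n; a missing group expands to nothing.
                // Assumption: "\0" is treated as the whole match.
                const std::size_t n = static_cast<std::size_t>(next - '0');
                if (n < groups.size()) out += groups[n];
            } else {
                // "\&", "\\" and any other escaped character are literal.
                out += next;
            }
        } else {
            out += c;  // ordinary character, or a trailing lone backslash
        }
    }
    return out;
}

// Usage examples for a match "ab" with groups "a" and "b".
void sed_expand_examples() {
    const std::vector<std::string> g{"ab", "a", "b"};
    assert(sed_expand("!&!", g) == "!ab!");       // '&' expands
    assert(sed_expand("!\\2!", g) == "!b!");      // \2 expands
    assert(sed_expand("!\\&!", g) == "!&!");      // \& is a literal '&'
    assert(sed_expand("!\\\\2!", g) == "!\\2!");  // \\2 is a literal \2
    assert(sed_expand("!\\\\0!", g) == "!\\0!");  // \\0 is a literal \0
}
```

The last three assertions correspond to the escaped forms that the reproducer below exercises.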


The current implementation of std::regex_replace does not comply with the
standard: the special meanings of &, \0, and \2 cannot be suppressed by
escaping them with backslashes.
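
For contrast, the unescaped forms are the baseline that the escaped forms are supposed to suppress. A minimal narrow-character sketch, where sed_replace is a hypothetical wrapper name introduced only for this illustration:

```cpp
#include <regex>
#include <string>

// Hypothetical wrapper: applies the pattern used in this report to the
// input and formats each match with the sed rules (format_sed).
std::string sed_replace(const std::string& input, const std::string& repl) {
    static const std::regex rgx("(a*)(b+)");
    return std::regex_replace(input, rgx, repl,
                              std::regex_constants::format_sed);
}
```

With the input "xbbyabz" the pattern matches "bb" and then "ab", so an unescaped "[&]" should yield "x[bb]y[ab]z" and an unescaped "[\2]" (written "[\\2]" in C++ source) should yield "x[bb]y[b]z".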


Reproducer:

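#include <cstdio>    // std::printf
#include <iterator>  // std::back_inserter
#include <regex>
#include <string>

// Replaces every match of (a*)(b+) in istr with rstr in sed mode, then
// prints the result next to the expected string ostr ("==" or "!=").
int frep(const wchar_t *istr, const wchar_t *rstr, const wchar_t *ostr) {
    std::basic_regex<wchar_t> wrgx(L"(a*)(b+)");
    std::basic_string<wchar_t> wstr = istr, wret = ostr, test;
    std::regex_replace(std::back_inserter(test), wstr.begin(), wstr.end(),
                       wrgx, std::basic_string<wchar_t>(rstr),
                       std::regex_constants::format_sed);
    // printf returns the number of characters written, so this yields 0.
    return !std::printf("'%ls' %c= '%ls'\n",
                        test.c_str(), (test == wret)? '=' : '!', wret.c_str());
}
int main() {
    // Escaped "\2", "\0" and "&" are expected to come out as literal text.
    frep(L"xbbyabz", L"!\\\\2!", L"x!\\2!y!\\2!z");
    frep(L"xbbyabz", L"!\\\\0!", L"x!\\0!y!\\0!z");
    return frep(L"xbbyabz", L"!\\&!", L"x!&!y!&!z");
}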
