https://bugs.llvm.org/show_bug.cgi?id=40904

            Bug ID: 40904
           Summary: regex_search on MacOS gives wrong results when \D
                    found in a character class
           Product: libc++
           Version: unspecified
          Hardware: Macintosh
                OS: All
            Status: NEW
          Severity: normal
          Priority: P
         Component: All Bugs
          Assignee: [email protected]
          Reporter: [email protected]
                CC: [email protected], [email protected]

Pre-C++20, there's no way to turn on /s (dot matches newline), so instead of a
pattern like /ab.cd/ (where the third character could be a newline) we must
write something like /ab[\d\D]cd/ (using the union of "digits" and
"non-digits" to match "any character").
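To illustrate why the workaround is needed, here is a minimal sketch (not part
of the original reproduction; the helper names found, dot_matches_newline and
class_matches_newline are made up for this example). It assumes the ECMAScript
grammar, where "." excludes line terminators, and compares it with the
[\d\D] class on an input containing a newline.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true if `pattern` (ECMAScript grammar) matches
// anywhere in `input`.
bool found(const std::string& pattern, const std::string& input)
{
    const std::regex re(pattern, std::regex_constants::ECMAScript);
    return std::regex_search(input, re);
}

// "." does not match a line terminator in the ECMAScript grammar,
// so this search is expected to fail.
bool dot_matches_newline()
{
    return found("ab.cd", "ab\ncd");
}

// The workaround: the union of \d and \D should cover every character,
// including '\n'. Whether this returns true depends on the standard
// library in use, which is the subject of this report.
bool class_matches_newline()
{
    return found(R"(ab[\d\D]cd)", "ab\ncd");
}
```

Calling both functions from main and printing the results with std::boolalpha
shows whether a given standard library treats the two patterns differently.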

Unfortunately, libc++ does not match such a pattern correctly.

Example:

  #include <regex>
  #include <string>
  #include <iostream>
  #include <iomanip>

  int main()
  {
      const std::string input = "abZcd";
      char const* pattern = R"REGEX(^ab[\d\D]cd)REGEX";

      std::regex::flag_type flags = std::regex_constants::ECMAScript;
      std::regex re(pattern, flags);

      std::cout << std::boolalpha
                << std::regex_search(input.cbegin(), input.cend(), re) << '\n';
  }

Output is "false" with:

  $ clang --version
  Apple LLVM version 10.0.0 (clang-1000.10.44.4)
  Target: x86_64-apple-darwin18.2.0
  Thread model: posix
  InstalledDir: /Library/Developer/CommandLineTools/usr/bin

But "true" (as expected) with g++ (GCC) 8.2.0.

Looking into it a bit, here are the results with some variants:

Pattern        Input    Should match?    Matches?
-------------------------------------------------
/^ab[\d\D]cd/  abZcd        Yes             No      <--- !
/^ab[\d\D]cd/  ab5cd        Yes             No      <--- !
/^ab[\D]cd/    abZcd        Yes             No      <--- !
/^ab\Dcd/      abZcd        Yes             Yes
/^ab[\d]cd/    ab5cd        Yes             Yes
/^ab\dcd/      ab5cd        Yes             Yes
/^ab\dcd/      abZcd        No              No
/^ab\Dcd/      ab5cd        No              No

The common feature amongst the three failures is the \D inside a character
class.

The behaviour is the same when switching to std::regex_match.
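
The table above can be regenerated with something like the following sketch.
It is not the program originally used for the report; the names Variant,
variants, matches and count_mismatches are made up for illustration. The
patterns, inputs and expected results are copied from the table, and
std::regex_search is used with the ECMAScript grammar as in the example
earlier.

```cpp
#include <iostream>
#include <regex>
#include <string>
#include <vector>

// One row of the table: pattern, input, and whether a match is expected.
struct Variant {
    std::string pattern;
    std::string input;
    bool expected;
};

// The eight pattern/input combinations from the table, in the same order.
const std::vector<Variant>& variants()
{
    static const std::vector<Variant> rows = {
        {R"(^ab[\d\D]cd)", "abZcd", true},
        {R"(^ab[\d\D]cd)", "ab5cd", true},
        {R"(^ab[\D]cd)",   "abZcd", true},
        {R"(^ab\Dcd)",     "abZcd", true},
        {R"(^ab[\d]cd)",   "ab5cd", true},
        {R"(^ab\dcd)",     "ab5cd", true},
        {R"(^ab\dcd)",     "abZcd", false},
        {R"(^ab\Dcd)",     "ab5cd", false},
    };
    return rows;
}

// Runs std::regex_search for a single row.
bool matches(const Variant& v)
{
    const std::regex re(v.pattern, std::regex_constants::ECMAScript);
    return std::regex_search(v.input.cbegin(), v.input.cend(), re);
}

// Prints one line per row and returns how many rows disagree with the
// expected result.
int count_mismatches(std::ostream& out)
{
    int mismatches = 0;
    out << std::boolalpha;
    for (const Variant& v : variants()) {
        const bool actual = matches(v);
        out << '/' << v.pattern << "/  " << v.input
            << "  expected=" << v.expected
            << "  actual=" << actual;
        if (actual != v.expected) {
            out << "  <--- !";
            ++mismatches;
        }
        out << '\n';
    }
    return mismatches;
}
```

Calling count_mismatches(std::cout) from main prints the rows and flags any
that disagree; a return value of 0 means the library under test handles all
eight variants as expected.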

For added fun, I get the expected results on Linux:

  $ clang++ --version
  clang version 5.0.0-3~16.04.1 (tags/RELEASE_500/final)
  Target: x86_64-pc-linux-gnu
  Thread model: posix
  InstalledDir: /usr/bin

Related to bug 21363 (locale fun)?
