[PATCH] D50534: [libc++] Fix handling of negated character classes in regex

2018-09-06 Thread Hans Wennborg via Phabricator via cfe-commits
hans added a comment.

In https://reviews.llvm.org/D50534#1225591, @xbolva00 wrote:

> is this fixed in 7.0 release branch too?
>
> @hans


I've merged it in r341529 now. Thanks for letting me know.


Repository:
  rCXX libc++

https://reviews.llvm.org/D50534



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D50534: [libc++] Fix handling of negated character classes in regex

2018-09-06 Thread Dávid Bolvanský via Phabricator via cfe-commits
xbolva00 added subscribers: hans, xbolva00.
xbolva00 added a comment.

is this fixed in 7.0 release branch too?

@hans


Repository:
  rCXX libc++

https://reviews.llvm.org/D50534



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D50534: [libc++] Fix handling of negated character classes in regex

2018-08-24 Thread Louis Dionne via Phabricator via cfe-commits
This revision was automatically updated to reflect the committed changes.
Closed by commit rCXX340609: [libc++] Fix handling of negated character classes 
in regex (authored by ldionne, committed by ).

Changed prior to commit:
  https://reviews.llvm.org/D50534?vs=160169&id=162377#toc

Repository:
  rCXX libc++

https://reviews.llvm.org/D50534

Files:
  include/regex
  test/std/re/re.alg/re.alg.match/inverted_character_classes.pass.cpp
  test/std/re/re.alg/re.alg.search/invert_neg_word_search.pass.cpp


Index: include/regex
===
--- include/regex
+++ include/regex
@@ -2414,20 +2414,17 @@
 goto __exit;
 }
 }
-// set of "__found" chars =
+// When there's at least one of __neg_chars_ and __neg_mask_, the set
+// of "__found" chars is
 //   union(complement(union(__neg_chars_, __neg_mask_)),
 // other cases...)
 //
-// __neg_chars_ and __neg_mask_'d better be handled together, as there
-// are no short circuit opportunities.
-//
-// In addition, when __neg_mask_/__neg_chars_ is empty, they should be
-// treated as all ones/all chars.
+// It doesn't make sense to check this when there are no __neg_chars_
+// and no __neg_mask_.
+if (!(__neg_mask_ == 0 && __neg_chars_.empty()))
 {
-  const bool __in_neg_mask = (__neg_mask_ == 0) ||
-  __traits_.isctype(__ch, __neg_mask_);
+const bool __in_neg_mask = __traits_.isctype(__ch, __neg_mask_);
   const bool __in_neg_chars =
-  __neg_chars_.empty() ||
   std::find(__neg_chars_.begin(), __neg_chars_.end(), __ch) !=
   __neg_chars_.end();
   if (!(__in_neg_mask || __in_neg_chars))
Index: test/std/re/re.alg/re.alg.search/invert_neg_word_search.pass.cpp
===
--- test/std/re/re.alg/re.alg.search/invert_neg_word_search.pass.cpp
+++ test/std/re/re.alg/re.alg.search/invert_neg_word_search.pass.cpp
@@ -18,7 +18,7 @@
 
 #include 
 #include 
-#include "test_macros.h"
+
 
 // PR34310
 int main()
Index: test/std/re/re.alg/re.alg.match/inverted_character_classes.pass.cpp
===
--- test/std/re/re.alg/re.alg.match/inverted_character_classes.pass.cpp
+++ test/std/re/re.alg/re.alg.match/inverted_character_classes.pass.cpp
@@ -0,0 +1,44 @@
+//===--===//
+//
+// The LLVM Compiler Infrastructure
+//
+// This file is dual licensed under the MIT and the University of Illinois Open
+// Source Licenses. See LICENSE.TXT for details.
+//
+//===--===//
+
+// 
+// UNSUPPORTED: c++98, c++03
+
+// Make sure that we correctly match inverted character classes.
+
+#include 
+#include 
+
+
+int main() {
+assert(std::regex_match("X", std::regex("[X]")));
+assert(std::regex_match("X", std::regex("[XY]")));
+assert(!std::regex_match("X", std::regex("[^X]")));
+assert(!std::regex_match("X", std::regex("[^XY]")));
+
+assert(std::regex_match("X", std::regex("[\\S]")));
+assert(!std::regex_match("X", std::regex("[^\\S]")));
+
+assert(!std::regex_match("X", std::regex("[\\s]")));
+assert(std::regex_match("X", std::regex("[^\\s]")));
+
+assert(std::regex_match("X", std::regex("[\\s\\S]")));
+assert(std::regex_match("X", std::regex("[^Y\\s]")));
+assert(!std::regex_match("X", std::regex("[^X\\s]")));
+
+assert(std::regex_match("X", std::regex("[\\w]")));
+assert(std::regex_match("_", std::regex("[\\w]")));
+assert(!std::regex_match("X", std::regex("[^\\w]")));
+assert(!std::regex_match("_", std::regex("[^\\w]")));
+
+assert(!std::regex_match("X", std::regex("[\\W]")));
+assert(!std::regex_match("_", std::regex("[\\W]")));
+assert(std::regex_match("X", std::regex("[^\\W]")));
+assert(std::regex_match("_", std::regex("[^\\W]")));
+}


Index: include/regex
===
--- include/regex
+++ include/regex
@@ -2414,20 +2414,17 @@
 goto __exit;
 }
 }
-// set of "__found" chars =
+// When there's at least one of __neg_chars_ and __neg_mask_, the set
+// of "__found" chars is
 //   union(complement(union(__neg_chars_, __neg_mask_)),
 // other cases...)
 //
-// __neg_chars_ and __neg_mask_'d better be handled together, as there
-// are no short circuit opportunities.
-//
-// In addition, when __neg_mask_/__neg_chars_ is empty, they should be
-// treated as all ones/all chars.
+// It doesn't make sense to check this when there are no __neg_chars_
+// and no __neg_mask_.

[PATCH] D50534: [libc++] Fix handling of negated character classes in regex

2018-08-10 Thread Tim Shen via Phabricator via cfe-commits
timshen accepted this revision.
timshen added a comment.
This revision is now accepted and ready to land.

That looks more correct to me, thanks! Although I'm still puzzled by the empty 
check at all, it's clearly an improvement.


Repository:
  rCXX libc++

https://reviews.llvm.org/D50534



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D50534: [libc++] Fix handling of negated character classes in regex

2018-08-10 Thread Louis Dionne via Phabricator via cfe-commits
ldionne added a comment.

In https://reviews.llvm.org/D50534#1194664, @timshen wrote:

> I'm not fully equipped with the context right now, but something doesn't add 
> up. if `__neg_chars_.empty()` check is removed, the `(__neg_mask_ == 0)` 
> above should be removed too. They have to be consistent.
>
> However, there is more weirdness in it. The comment above describes the 
> intention:
>
>   union(complement(union(__neg_chars_, __neg_mask_)), other cases...)
>   
>
> With the `__neg_chars_.empty()` and `(__neg_mask_ == 0)` removed, I believe 
> that the code exactly matches the comment. Let's see what happens when users 
> don't specify any negative class or chars. `__neg_chars_` and `__neg_mask_` 
> will be empty sets, and `union(complement(union(__neg_chars_, __neg_mask_)), 
> other cases...)` always evaluate to full set, which means it always matches 
> all characters. This can't be right.
>
> It's likely that the comment description doesn't fully describe the intended 
> behavior. I think we need to figure that out first.


Yes, I think you were right. I think what was missing is that the comment does 
not apply when there's no `neg_mask` and no `neg_chars`. Otherwise we get into 
that situation where we're taking the complement of an empty set, and we match 
everything, which is not what we want.


Repository:
  rCXX libc++

https://reviews.llvm.org/D50534



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D50534: [libc++] Fix handling of negated character classes in regex

2018-08-10 Thread Louis Dionne via Phabricator via cfe-commits
ldionne updated this revision to Diff 160169.
ldionne added a comment.

Rewrite the patch in light of timshen's comment, and add tests.


Repository:
  rCXX libc++

https://reviews.llvm.org/D50534

Files:
  libcxx/include/regex
  libcxx/test/std/re/re.alg/re.alg.match/inverted_character_classes.pass.cpp
  libcxx/test/std/re/re.alg/re.alg.search/invert_neg_word_search.pass.cpp


Index: libcxx/test/std/re/re.alg/re.alg.search/invert_neg_word_search.pass.cpp
===
--- libcxx/test/std/re/re.alg/re.alg.search/invert_neg_word_search.pass.cpp
+++ libcxx/test/std/re/re.alg/re.alg.search/invert_neg_word_search.pass.cpp
@@ -18,7 +18,7 @@
 
 #include 
 #include 
-#include "test_macros.h"
+
 
 // PR34310
 int main()
Index: 
libcxx/test/std/re/re.alg/re.alg.match/inverted_character_classes.pass.cpp
===
--- /dev/null
+++ libcxx/test/std/re/re.alg/re.alg.match/inverted_character_classes.pass.cpp
@@ -0,0 +1,44 @@
+//===--===//
+//
+// The LLVM Compiler Infrastructure
+//
+// This file is dual licensed under the MIT and the University of Illinois Open
+// Source Licenses. See LICENSE.TXT for details.
+//
+//===--===//
+
+// 
+// UNSUPPORTED: c++98, c++03
+
+// Make sure that we correctly match inverted character classes.
+
+#include 
+#include 
+
+
+int main() {
+assert(std::regex_match("X", std::regex("[X]")));
+assert(std::regex_match("X", std::regex("[XY]")));
+assert(!std::regex_match("X", std::regex("[^X]")));
+assert(!std::regex_match("X", std::regex("[^XY]")));
+
+assert(std::regex_match("X", std::regex("[\\S]")));
+assert(!std::regex_match("X", std::regex("[^\\S]")));
+
+assert(!std::regex_match("X", std::regex("[\\s]")));
+assert(std::regex_match("X", std::regex("[^\\s]")));
+
+assert(std::regex_match("X", std::regex("[\\s\\S]")));
+assert(std::regex_match("X", std::regex("[^Y\\s]")));
+assert(!std::regex_match("X", std::regex("[^X\\s]")));
+
+assert(std::regex_match("X", std::regex("[\\w]")));
+assert(std::regex_match("_", std::regex("[\\w]")));
+assert(!std::regex_match("X", std::regex("[^\\w]")));
+assert(!std::regex_match("_", std::regex("[^\\w]")));
+
+assert(!std::regex_match("X", std::regex("[\\W]")));
+assert(!std::regex_match("_", std::regex("[\\W]")));
+assert(std::regex_match("X", std::regex("[^\\W]")));
+assert(std::regex_match("_", std::regex("[^\\W]")));
+}
Index: libcxx/include/regex
===
--- libcxx/include/regex
+++ libcxx/include/regex
@@ -2414,20 +2414,17 @@
 goto __exit;
 }
 }
-// set of "__found" chars =
+// When there's at least one of __neg_chars_ and __neg_mask_, the set
+// of "__found" chars is
 //   union(complement(union(__neg_chars_, __neg_mask_)),
 // other cases...)
 //
-// __neg_chars_ and __neg_mask_'d better be handled together, as there
-// are no short circuit opportunities.
-//
-// In addition, when __neg_mask_/__neg_chars_ is empty, they should be
-// treated as all ones/all chars.
+// It doesn't make sense to check this when there are no __neg_chars_
+// and no __neg_mask_.
+if (!(__neg_mask_ == 0 && __neg_chars_.empty()))
 {
-  const bool __in_neg_mask = (__neg_mask_ == 0) ||
-  __traits_.isctype(__ch, __neg_mask_);
+const bool __in_neg_mask = __traits_.isctype(__ch, __neg_mask_);
   const bool __in_neg_chars =
-  __neg_chars_.empty() ||
   std::find(__neg_chars_.begin(), __neg_chars_.end(), __ch) !=
   __neg_chars_.end();
   if (!(__in_neg_mask || __in_neg_chars))


Index: libcxx/test/std/re/re.alg/re.alg.search/invert_neg_word_search.pass.cpp
===
--- libcxx/test/std/re/re.alg/re.alg.search/invert_neg_word_search.pass.cpp
+++ libcxx/test/std/re/re.alg/re.alg.search/invert_neg_word_search.pass.cpp
@@ -18,7 +18,7 @@
 
 #include 
 #include 
-#include "test_macros.h"
+
 
 // PR34310
 int main()
Index: libcxx/test/std/re/re.alg/re.alg.match/inverted_character_classes.pass.cpp
===
--- /dev/null
+++ libcxx/test/std/re/re.alg/re.alg.match/inverted_character_classes.pass.cpp
@@ -0,0 +1,44 @@
+//===--===//
+//
+// The LLVM Compiler Infrastructure
+//
+// This file is dual licensed under the MIT and the University of Illinois Open
+// Source Licenses. See LICENSE.TXT for details.
+//
+//===

[PATCH] D50534: [libc++] Fix handling of negated character classes in regex

2018-08-09 Thread Tim Shen via Phabricator via cfe-commits
timshen added a comment.

I'm not fully equipped with the context right now, but something doesn't add 
up. if `__neg_chars_.empty()` check is removed, the `(__neg_mask_ == 0)` above 
should be removed too. They have to be consistent.

However, there is more weirdness in it. The comment above describes the 
intention:

  union(complement(union(__neg_chars_, __neg_mask_)), other cases...)

With the `__neg_chars_.empty()` and `(__neg_mask_ == 0)` removed, I believe 
that the code exactly matches the comment. Let's see what happens when users 
don't specify any negative class or chars. __neg_chars_ and __neg_mask_ will be 
empty sets, and `union(complement(union(__neg_chars_, __neg_mask_)), other 
cases...)` always evaluate to true, which means it always matches all 
characters. This can't be right.

It's likely that the comment description doesn't fully describe the intended 
behavior. I think we need to figure that out first.


Repository:
  rCXX libc++

https://reviews.llvm.org/D50534



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D50534: [libc++] Fix handling of negated character classes in regex

2018-08-09 Thread Louis Dionne via Phabricator via cfe-commits
ldionne added a comment.

The changeset that introduced this regression was reviewed as 
https://reviews.llvm.org/D37955. @timshen, I'm curious to understand why you 
or'ed with `__neg_chars_.empty()` when initializing `__in_neg_chars`. The 
logic's not clear to me, and my tests seem to show that this is wrong, but I 
don't understand the reasoning behind doing it in the first place, so I might 
be introducing another bug here. Please double-check.


Repository:
  rCXX libc++

https://reviews.llvm.org/D50534



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D50534: [libc++] Fix handling of negated character classes in regex

2018-08-09 Thread Louis Dionne via Phabricator via cfe-commits
ldionne created this revision.
ldionne added reviewers: mclow.lists, timshen.
Herald added a reviewer: EricWF.
Herald added subscribers: cfe-commits, dexonsmith, christof.

This commit fixes a regression introduced in r316095, where we don't match
inverted character classes when there's no negated characrers in the []'s.

rdar://problem/43060054


Repository:
  rCXX libc++

https://reviews.llvm.org/D50534

Files:
  libcxx/include/regex
  libcxx/test/std/re/re.alg/re.alg.match/inverted_character_classes.pass.cpp


Index: 
libcxx/test/std/re/re.alg/re.alg.match/inverted_character_classes.pass.cpp
===
--- /dev/null
+++ libcxx/test/std/re/re.alg/re.alg.match/inverted_character_classes.pass.cpp
@@ -0,0 +1,29 @@
+//===--===//
+//
+// The LLVM Compiler Infrastructure
+//
+// This file is dual licensed under the MIT and the University of Illinois Open
+// Source Licenses. See LICENSE.TXT for details.
+//
+//===--===//
+
+// 
+// UNSUPPORTED: c++98, c++03
+
+// Make sure that we correctly match inverted character classes.
+
+#include 
+#include 
+
+
+int main() {
+assert(std::regex_match("X", std::regex("[\\S]")));
+assert(!std::regex_match("X", std::regex("[^\\S]")));
+
+assert(!std::regex_match("X", std::regex("[\\s]")));
+assert(std::regex_match("X", std::regex("[^\\s]")));
+
+assert(std::regex_match("X", std::regex("[\\s\\S]")));
+assert(std::regex_match("X", std::regex("[^Y\\s]")));
+assert(!std::regex_match("X", std::regex("[^X\\s]")));
+}
Index: libcxx/include/regex
===
--- libcxx/include/regex
+++ libcxx/include/regex
@@ -2427,7 +2427,6 @@
   const bool __in_neg_mask = (__neg_mask_ == 0) ||
   __traits_.isctype(__ch, __neg_mask_);
   const bool __in_neg_chars =
-  __neg_chars_.empty() ||
   std::find(__neg_chars_.begin(), __neg_chars_.end(), __ch) !=
   __neg_chars_.end();
   if (!(__in_neg_mask || __in_neg_chars))


Index: libcxx/test/std/re/re.alg/re.alg.match/inverted_character_classes.pass.cpp
===
--- /dev/null
+++ libcxx/test/std/re/re.alg/re.alg.match/inverted_character_classes.pass.cpp
@@ -0,0 +1,29 @@
+//===--===//
+//
+// The LLVM Compiler Infrastructure
+//
+// This file is dual licensed under the MIT and the University of Illinois Open
+// Source Licenses. See LICENSE.TXT for details.
+//
+//===--===//
+
+// 
+// UNSUPPORTED: c++98, c++03
+
+// Make sure that we correctly match inverted character classes.
+
+#include 
+#include 
+
+
+int main() {
+assert(std::regex_match("X", std::regex("[\\S]")));
+assert(!std::regex_match("X", std::regex("[^\\S]")));
+
+assert(!std::regex_match("X", std::regex("[\\s]")));
+assert(std::regex_match("X", std::regex("[^\\s]")));
+
+assert(std::regex_match("X", std::regex("[\\s\\S]")));
+assert(std::regex_match("X", std::regex("[^Y\\s]")));
+assert(!std::regex_match("X", std::regex("[^X\\s]")));
+}
Index: libcxx/include/regex
===
--- libcxx/include/regex
+++ libcxx/include/regex
@@ -2427,7 +2427,6 @@
   const bool __in_neg_mask = (__neg_mask_ == 0) ||
   __traits_.isctype(__ch, __neg_mask_);
   const bool __in_neg_chars =
-  __neg_chars_.empty() ||
   std::find(__neg_chars_.begin(), __neg_chars_.end(), __ch) !=
   __neg_chars_.end();
   if (!(__in_neg_mask || __in_neg_chars))
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits