[Bug libstdc++/88947] regex_match doesn't fail early when given a non-matching pattern with a start-of-input anchor
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88947

--- Comment #7 from Tomalak Geret'kal ---
(In reply to Tim Shen from comment #5)
> For the original test case, have you tried regex_match() with "what.*"?

That behaves as I'd expect (http://quick-bench.com/AKdMnnhA03T1vwfN9sf53xlbD6s).

> Do you have any non-trivial testcase in mind that is still unexpectedly slow
> with regex_match()?

The original real-world pattern that led to me discovering this was:

    /^[\x02-\x7f]\0..[\x01-\x0c]\0..\0\0/

Switching to regex_match() for that pattern also yields the expected result (http://quick-bench.com/g6lZj00gBswzvd-rjO7QwRE0Exg), so that's a good workaround here.

But, adapting your earlier example to "(^abc|xyz)", this would require chaining a regex_match with a regex_search, which gets unwieldy quite quickly.

Well, okay, I suppose in that example we could regex_match on "(?:(abc).*|.*(xyz).*)", but I really don't think we should have to rewrite patterns like this in order to get the behaviour that's common in other ecosystems' regex impls.

So, although I'm open to being convinced otherwise, I still think we would reasonably expect regex_search to fail fast.
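For illustration, the two rewrites of "(^abc|xyz)" discussed in this comment might look roughly like the sketch below. The helper names chained() and rewritten() are hypothetical and do not come from the thread. Note that ".*" in the ECMAScript grammar does not cross line terminators, so this sketch assumes single-line input; whether either form actually fails fast on a given standard library is something to confirm by benchmarking.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: the "chained" form of "(^abc|xyz)".
// regex_match must consume the whole input, so "abc.*" only succeeds
// when the input starts with "abc"; "xyz" may appear anywhere.
bool chained(const std::string& s) {
    static const std::regex anchored("abc.*");
    static const std::regex floating("xyz");
    return std::regex_match(s, anchored) || std::regex_search(s, floating);
}

// Hypothetical helper: the single-pattern rewrite quoted in the comment.
bool rewritten(const std::string& s) {
    static const std::regex re("(?:(abc).*|.*(xyz).*)");
    return std::regex_match(s, re);
}
```

Both helpers accept "abcdef" and "..xyz.." and reject "zabc", which is the behaviour the anchored original is meant to have.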
[Bug libstdc++/88947] regex_match doesn't fail early when given a non-matching pattern with a start-of-input anchor
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88947 --- Comment #4 from Tomalak Geret'kal --- To be honest I'd expect this in less trivial circumstances too. If, at a given stage of processing, the only possible paths towards a match all require a prefix that's already been ruled out, that should be an immediate return false. To the best of my knowledge this is commonly what happens in regex engines (though again libstdc++ is far from alone in the C++ world in not doing so!)
[Bug libstdc++/88947] New: regex_match doesn't fail early when given a non-matching pattern with a start-of-input anchor
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88947

Bug ID: 88947
Summary: regex_match doesn't fail early when given a non-matching pattern with a start-of-input anchor
Product: gcc
Version: unknown
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: libstdc++
Assignee: unassigned at gcc dot gnu.org
Reporter: tom at kera dot name
Target Milestone: ---

I first raised this on SO (https://stackoverflow.com/q/54237547/560648), on which I have posted some benchmarks to back up the claim(s) below.

Take the following:

    #include <cstddef>
    #include <iterator>
    #include <regex>

    int main()
    {
        static const std::size_t BufSize = 100;
        char buf[BufSize] = {};
        auto begin = std::cbegin(buf), end = std::cend(buf);

        std::cmatch groups;
        std::regex::flag_type flags = std::regex_constants::ECMAScript;
        std::regex re("^what", flags);
        std::regex_search(begin, end, groups, re);
    }

This attempts to match the pattern "^what" against a block of 100 characters. The match is not expected to succeed (in this case, the input is simply 100 '\0's, but the problem exists for any non-matching input). However, I expect the match to fail as soon as the first character of input is examined.

By adjusting BufSize to increasingly large values, we observe that the execution time increases also, suggesting that the regex engine is examining the entire input even though the presence of the anchor "^" guarantees that a match will never be found. It only needed to examine the first character to know this. When BufSize reaches larger values like 100KB, this becomes quite problematic.

It is clear from the implementation code (https://github.com/gcc-mirror/gcc/blob/464ac146f6d2aaab847f653edde3ae84a8366c94/libstdc%2B%2B-v3/include/bits/regex_executor.tcc#L37-L54) that there is simply no logic in place to "fail fast" or "fail early" in a case like this: the only way a "no match" result is returned is after examining the whole input, regardless of the pattern.

It is my opinion that this is a quality of implementation issue, and one that only appears in C++ implementations of regular expressions. This problem is common to libstdc++, libc++ and also Visual Studio's stdlib impl. (I am raising bugs against all three.)

As a workaround I'm having to artificially select a prefix of the input data in order to get a fast result -- in the example above, that could be:

    auto begin = std::cbegin(buf), end = std::cbegin(buf)+4;

However, not all examples are so trivial (indeed, the example above would be much better approached with a simple string prefix comparison) and the workaround is not always so easy. When the pattern is more complex, it is not always easy to find the best number of characters to send to the regex engine, and the resulting code is not particularly elegant. It would be much better if the engine could be given the whole input without having to worry about scale.

Hopefully my expectation isn't unreasonable; Safari's implementation of regex behaves as I'd expect. That is, the time to return a "no match" result is constant (and fast) given the JS equivalent of the above example.

Is it possible that the regex_match implementation could be given a little more intelligence?

(Apologies that I am not sufficiently familiar with libstdc++ version history to select an appropriate version number for this bug.)
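As a rough sketch of the workaround described in this report, the first helper below restricts the search to a four-character prefix. The second helper shows an alternative that is not mentioned in the report: passing std::regex_constants::match_continuous, which the standard defines as only matching a sub-sequence that begins at the first position. Both helper names are hypothetical, and whether a particular library returns early with that flag is an assumption that should be checked with a benchmark.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>

// Hypothetical helper: the prefix workaround from the report.
// Only the first four characters are handed to the engine.
bool starts_with_what_prefix(const char* begin, const char* end) {
    static const std::regex re("^what");
    const std::ptrdiff_t prefix_len = 4;  // length of "what"
    const char* limit = (end - begin > prefix_len) ? begin + prefix_len : end;
    return std::regex_search(begin, limit, re);
}

// Hypothetical helper (not from the report): match_continuous asks
// regex_search to consider only matches starting at `begin`, so the
// "^" anchor is no longer needed in the pattern.
bool starts_with_what_continuous(const char* begin, const char* end) {
    static const std::regex re("what");
    return std::regex_search(begin, end, re,
                             std::regex_constants::match_continuous);
}
```

Both helpers give the same answer on the inputs in the report; the second one avoids having to work out a prefix length for each pattern.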
[Bug libstdc++/88802] std::hash<std::nullptr_t> not implemented
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88802

--- Comment #1 from Tomalak Geret'kal ---
[unord.hash]/2

> Each specialization of hash is either enabled or disabled, as described
> below. [ Note: Enabled specializations meet the Cpp17Hash requirements, and
> disabled specializations do not. — end note ] Each header that declares the
> template hash provides enabled specializations of hash for nullptr_t and all
> cv-unqualified arithmetic, enumeration, and pointer types. For any type Key
> for which neither the library nor the user provides an explicit or partial
> specialization of the class template hash, hash<Key> is disabled.

(Clang HEAD does support this, it turns out.)
[Bug libstdc++/88802] New: std::hash<std::nullptr_t> not implemented
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88802

Bug ID: 88802
Summary: std::hash<std::nullptr_t> not implemented
Product: gcc
Version: 6.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: libstdc++
Assignee: unassigned at gcc dot gnu.org
Reporter: tom at kera dot name
Target Milestone: ---

See https://stackoverflow.com/q/54147254/560648.

C++17 requires that std::hash<std::nullptr_t> be provided. MSVS does, but dev libstdc++ doesn't (and neither does libc++). This seems to be the case on trunk still.

    #include <functional>
    int main()
    {
        std::hash<std::nullptr_t> h;
        return h(nullptr);
    }

Result:

    main.cpp: In function 'int main()':
    main.cpp:4:31: error: use of deleted function 'std::hash<std::nullptr_t>::hash()'
         std::hash<std::nullptr_t> h;

Expected result: Good build and some return code.
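One way to check whether a given standard library provides the enabled specialization is a compile-time probe along the following lines. This is only a sketch: it assumes the translation unit is built as C++17 or later, and the function name hash_of_null() is hypothetical. On a library where the specialization is disabled, the static_assert is expected to fire instead of the deleted-constructor error quoted in the report.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <type_traits>

// An enabled std::hash specialization is default constructible;
// a disabled one is not.
static_assert(
    std::is_default_constructible<std::hash<std::nullptr_t>>::value,
    "std::hash<std::nullptr_t> is disabled in this standard library");

// Hypothetical helper: hashes nullptr using the library specialization.
std::size_t hash_of_null() {
    std::hash<std::nullptr_t> h;
    return h(nullptr);
}
```

The hash value itself is unspecified; the only property relied on here is that repeated calls within one program run return the same value.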
[Bug c++/86049] Array bindings are not const when initializer is
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86049 Tomalak Geret'kal changed: What|Removed |Added CC||tom at kera dot name --- Comment #3 from Tomalak Geret'kal --- I disagree and think that Clang is wrong. The top-level qualifiers of T (the type of One) should be "cv", and cv is "the cv-qualifiers in the decl-specifier-seq". The decl-specifier-seq is "auto", not "const auto". That "auto" will infer "const int" doesn't seem to be relevant. Discussion on https://stackoverflow.com/q/53726135/560648.
[Bug libstdc++/53838] _GLIBCXX_DEBUG and empty ostringstream
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53838 Tomalak Geret'kal tom at kera dot name changed: What|Removed |Added CC||tom at kera dot name --- Comment #6 from Tomalak Geret'kal tom at kera dot name --- Not a GCC bug? Really? I beg to differ. - Bug 54173 - Bug 33021 - Bug 64504
[Bug c++/64791] New: Generic lambda fails to implicitly capture `const` variable
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64791

Bug ID: 64791
Summary: Generic lambda fails to implicitly capture `const` variable
Product: gcc
Version: 4.9.2
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: tom at kera dot name

From http://stackoverflow.com/q/28141403/560648

Reproduction:

    #include <iostream>
    #include <functional>

    int main()
    {
        const int a = 2;
        std::function<void(int)> f = [](auto b) { std::cout << a << ", " << b << std::endl; };
        f(3);
    }

Taking any of the following steps allows the program to build and run with the expected output "2, 3":

- remove `const` from declaration of a
- name `a` in the capture-list instead of relying on implicit capture
- change declaration of `f` from `std::function<void(int)>` to `auto`
- make the lambda non-generic by changing `auto b` to `int b`
- use Clang (e.g. v3.5.0)

Suspect detection of odr-use is breaking, or this could be related to bug 61814, or something else entirely?
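The second workaround in the list above (naming `a` in the capture-list) could be sketched as follows. The function name run_with_explicit_capture() and the use of a string stream in place of std::cout are assumptions made so the result can be inspected; the lambda is otherwise the one from the report, and it needs C++14 or later for the generic parameter.

```cpp
#include <cassert>
#include <functional>
#include <sstream>
#include <string>

// Hypothetical helper: same lambda as in the report, but `a` is named
// explicitly in the capture-list rather than left to implicit capture.
std::string run_with_explicit_capture() {
    const int a = 2;
    std::ostringstream out;  // stands in for std::cout
    std::function<void(int)> f = [a, &out](auto b) {
        out << a << ", " << b;
    };
    f(3);
    return out.str();
}
```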
[Bug c++/64791] Generic lambda fails to implicitly capture `const` variable
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64791

--- Comment #1 from Tomalak Geret'kal tom at kera dot name ---
Build error:

    main.cpp: In instantiation of 'main()::<lambda(auto:1)> [with auto:1 = int]':
    /usr/local/include/c++/4.9.2/functional:2149:71: required by substitution of 'template<class _Res, class ... _ArgTypes> template<class _Functor> using _Invoke = decltype (std::__callable_functor(declval<_Functor>())((declval<_ArgTypes>)()...)) [with _Functor = main()::<lambda(auto:1)>; _Res = void; _ArgTypes = {int}]'
    /usr/local/include/c++/4.9.2/functional:2225:9: required by substitution of 'template<class _Functor, class> std::function<_Res(_ArgTypes ...)>::function(_Functor) [with _Functor = main()::<lambda(auto:1)>; <template-parameter-1-2> = <missing>]'
    main.cpp:7:90: required from here
    main.cpp:7:58: error: 'a' was not declared in this scope
         std::function<void(int)> f = [](auto b) { std::cout << a << ", " << b << std::endl; };
                                                                ^
[Bug c++/64791] Generic lambda fails to implicitly capture `const` variable
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64791 --- Comment #2 from Tomalak Geret'kal tom at kera dot name --- Actually, I'm no longer sure that `a` *is* odr-used...
[Bug libstdc++/53984] iostream operation throwing exception when exceptions not enabled
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53984 Tomalak Geret'kal tom at kera dot name changed: What|Removed |Added CC||tom at kera dot name --- Comment #3 from Tomalak Geret'kal tom at kera dot name --- Another testcase was proposed under the following Stack Overflow question: http://stackoverflow.com/q/20371956/560648 The answer to that question was a link to this bug.
[Bug libstdc++/52015] std::to_string does not work under MinGW
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52015 Tomalak Geret'kal tom at kera dot name changed: What|Removed |Added CC||tom at kera dot name --- Comment #23 from Tomalak Geret'kal tom at kera dot name --- Nathan, read comment 15. :)
[Bug libstdc++/52169] the ifstream readsome() method does not signal any bit on eof.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52169

Tomalak Geret'kal tom at kera dot name changed:

What|Removed |Added
CC||tom at kera dot name

--- Comment #1 from Tomalak Geret'kal tom at kera dot name 2012-02-08 10:39:54 UTC ---
cplusplus.com is (a) not authoritative, (b) full of mistakes, and (c) otherwise just awful. Instead, we'll quote the standard(s):

[C++11: 27.7.2.3]:

    streamsize readsome(char_type* s, streamsize n);

    32/ Effects: Behaves as an unformatted input function (as described in 27.7.2.3, paragraph 1). After constructing a sentry object, if !good() calls setstate(failbit) which may throw an exception, and return. Otherwise extracts characters and stores them into successive locations of an array whose first element is designated by s.
      - If rdbuf()->in_avail() == -1, calls setstate(eofbit) (which may throw ios_base::failure (27.5.5.4)), and extracts no characters;
      - If rdbuf()->in_avail() == 0, extracts no characters
      - If rdbuf()->in_avail() > 0, extracts min(rdbuf()->in_avail(),n)).

    33/ Returns: The number of characters extracted.

[C++03: 27.6.1.3]:

    streamsize readsome(char_type* s, streamsize n);

    30/ Effects: Behaves as an unformatted input function (as described in 27.6.1.3, paragraph 1). After constructing a sentry object, if !good() calls setstate(failbit) which may throw an exception, and return. Otherwise extracts characters and stores them into successive locations of an array whose first element is designated by s.
      - If rdbuf()->in_avail() == -1, calls setstate(eofbit) (which may throw ios_base::failure (27.4.4.3)), and extracts no characters;
      - If rdbuf()->in_avail() == 0, extracts no characters
      - If rdbuf()->in_avail() > 0, extracts min(rdbuf()->in_avail(),n)).

    31/ Returns: The number of characters extracted.
[Bug libstdc++/52169] the ifstream readsome() method does not signal any bit on eof.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52169 --- Comment #2 from Tomalak Geret'kal tom at kera dot name 2012-02-08 10:45:17 UTC --- Are you sure it's not just that in_avail() is 0? Why should it be -1 here? i.e. doesn't readsome() become a no-op when there's nothing to read?
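The in_avail() == 0 case can be observed with a small probe such as the sketch below. It uses std::istringstream rather than the std::ifstream from the bug title, which is an assumption made to keep the example self-contained; a file buffer may report availability differently. The struct and function names are hypothetical.

```cpp
#include <cassert>
#include <ios>
#include <sstream>
#include <string>

// Hypothetical result type for the probe below.
struct ReadsomeProbe {
    std::streamsize first;   // characters extracted by the first call
    std::streamsize second;  // characters extracted by the second call
    bool eof_after;          // eofbit state after both calls
    bool good_after;         // good() after both calls
};

// Hypothetical helper: calls readsome() twice on a string-backed stream.
// The second call finds nothing buffered, so in_avail() is expected to
// be 0 rather than -1, and readsome() extracts nothing without
// touching eofbit.
ReadsomeProbe probe_readsome(const std::string& data) {
    std::istringstream in(data);
    char buf[64];
    const std::streamsize cap = static_cast<std::streamsize>(sizeof buf);
    ReadsomeProbe r{};
    r.first = in.readsome(buf, cap);
    r.second = in.readsome(buf, cap);
    r.eof_after = in.eof();
    r.good_after = in.good();
    return r;
}
```

With libstdc++ or libc++, calling probe_readsome("hello") is expected to report five characters from the first call, none from the second, and a stream that is still good(), matching the reading of the standard quoted in comment #1.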
[Bug c++/40942] GCC accepts code that Comeau and MSVC deems invalid.
--- Comment #4 from tom at kera dot name 2009-08-25 14:48 ---
(In reply to comment #2)
> Why would this be ambiguous? A string literal has type array of n const char
> (see 2.13.4/1), so it should go with the array constructor. Do you disagree?
> W.

Table 9 under 13.3.3/1 shows that array-to-pointer conversion is Exact Match. As is lvalue-to-rvalue conversion (though string literals are lvalues anyway :D). As is Identity, which is what applies here.

There is no clear ranking between them, hence Comeau reporting ambiguity. Although I personally think Identity should overrule absolutely everything, it doesn't. So IMO GCC is buggy in this way.

--
tom at kera dot name changed:

What|Removed |Added
CC||tom at kera dot name

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40942