[Bug libstdc++/88947] regex_match doesn't fail early when given a non-matching pattern with a start-of-input anchor

2019-01-22 Thread tom at kera dot name
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88947

--- Comment #7 from Tomalak Geret'kal <tom at kera dot name> ---
(In reply to Tim Shen from comment #5)
> For the original test case, have you tried regex_match() with "what.*"?

That behaves as I'd expect
(http://quick-bench.com/AKdMnnhA03T1vwfN9sf53xlbD6s).

> Do you have any non-trivial testcase in mind that is still unexpectedly slow
> with regex_match()?

The original real-world pattern that led to me discovering this was:

/^[\x02-\x7f]\0..[\x01-\x0c]\0..\0\0/

Switching to regex_match() for that pattern also yields the expected result
(http://quick-bench.com/g6lZj00gBswzvd-rjO7QwRE0Exg), so that's a good
workaround here.

But, adapting your earlier example to "(^abc|xyz)", this would require chaining
a regex_match with a regex_search, which gets unwieldy quite quickly.

Well, okay, I suppose in that example we could regex_match on
"(?:(abc).*|.*(xyz).*)", but I really don't think we should have to rewrite
patterns like this in order to get the behaviour that's common in other
ecosystems' regex impls. So, although I'm open to being convinced otherwise, I
still think we would reasonably expect regex_search to fail fast.
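The rewrite described above can be sketched as follows. This is only an
illustration of the two formulations side by side; the function names are
hypothetical and not part of any test case attached to this bug.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Original formulation: a search where only one alternative is anchored.
bool search_original(const std::string& input) {
    static const std::regex re("(^abc|xyz)");
    return std::regex_search(input, re);
}

// Rewritten formulation: a whole-input match with explicit ".*" padding.
// In ECMAScript mode "." does not match line terminators, so this sketch
// assumes the input contains none.
bool match_rewritten(const std::string& input) {
    static const std::regex re("(?:(abc).*|.*(xyz).*)");
    return std::regex_match(input, re);
}
```

Both functions should agree on inputs such as "abcdef", "00xyz00" and "0abc",
but the second pattern is noticeably harder to read, which is the point made
above.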

[Bug libstdc++/88947] regex_match doesn't fail early when given a non-matching pattern with a start-of-input anchor

2019-01-22 Thread tom at kera dot name
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88947

--- Comment #4 from Tomalak Geret'kal <tom at kera dot name> ---
To be honest I'd expect this in less trivial circumstances too. If, at a given
stage of processing, the only possible paths towards a match all require a
prefix that's already been ruled out, that should be an immediate return false.
To the best of my knowledge this is commonly what happens in regex engines
(though again libstdc++ is far from alone in the C++ world in not doing so!)

[Bug libstdc++/88947] New: regex_match doesn't fail early when given a non-matching pattern with a start-of-input anchor

2019-01-21 Thread tom at kera dot name
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88947

Bug ID: 88947
   Summary: regex_match doesn't fail early when given a
non-matching pattern with a start-of-input anchor
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: libstdc++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: tom at kera dot name
  Target Milestone: ---

I first raised this on SO (https://stackoverflow.com/q/54237547/560648), on
which I have posted some benchmarks to back up the claim(s) below.

Take the following:

#include <cstddef>
#include <iterator>
#include <regex>

int main()
{
  static const std::size_t BufSize = 100;
  char buf[BufSize] = {};
  auto begin = std::cbegin(buf), end = std::cend(buf);

  std::cmatch groups;
  std::regex::flag_type flags = std::regex_constants::ECMAScript;
  std::regex re("^what", flags);
  std::regex_search(begin, end, groups, re);
}

This attempts to match the pattern "^what" against a block of 100 characters.
The match is not expected to succeed (in this case, the input is simply 100
'\0's, but the problem exists for any non-matching input).

However, I expect the match to fail as soon as the first character of input is
examined. By adjusting BufSize to increasingly large values, we observe that
the execution time increases also, suggesting that the regex engine is
examining the entire input even though the presence of the anchor "^"
guarantees that a match will never be found. It only needed to examine the
first character to know this. When BufSize reaches larger values like 100KB,
this becomes quite problematic.

It is clear from the implementation code
(https://github.com/gcc-mirror/gcc/blob/464ac146f6d2aaab847f653edde3ae84a8366c94/libstdc%2B%2B-v3/include/bits/regex_executor.tcc#L37-L54)
that there is simply no logic in place to "fail fast" or "fail early" in a case
like this: the only way a "no match" result is returned is after examining the
whole input, regardless of the pattern.

It is my opinion that this is a quality of implementation issue, and one that
only appears in C++ implementations of regular expressions. This problem is
common to libstdc++, libc++ and also Visual Studio's stdlib impl. (I am raising
bugs against all three.)

As a workaround I'm having to artificially select a prefix of the input data in
order to get a fast result -- in the example above, that could be:

  auto begin = std::cbegin(buf), end = std::cbegin(buf)+4;

However, not all examples are so trivial (indeed, the example above would be
much better approached with a simple string prefix comparison) and the
workaround not always so easy. When the pattern is more complex, it is not
always easy to find the best number of characters to send to the regex engine,
and the resulting code not particularly elegant. It would be much better if the
engine could be given the whole input without having to worry about scale.
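For illustration, the prefix workaround can be wrapped in a small helper. The
second helper uses the standard match_continuous flag, which restricts a
search to matches beginning at the first character; it only covers patterns
whose sole anchor is a leading "^" (written here without the anchor). Both
helper names are hypothetical, and this is a sketch rather than a claim about
how any particular implementation performs.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <regex>

// Workaround from the report: hand the engine only the first max_prefix
// characters, so a failed search cannot scan the rest of the buffer.
// Choosing max_prefix is the caller's problem, as noted above.
bool search_prefix(const char* begin, const char* end,
                   const std::regex& re, std::size_t max_prefix) {
    const std::size_t len = static_cast<std::size_t>(end - begin);
    const char* limit = begin + std::min(len, max_prefix);
    return std::regex_search(begin, limit, re);
}

// Alternative: ask for a match that must start at the first character.
// The pattern is assumed to be written without a leading "^".
bool starts_with_pattern(const char* begin, const char* end,
                         const std::regex& re) {
    return std::regex_search(begin, end, re,
                             std::regex_constants::match_continuous);
}
```

With the 100-byte zeroed buffer from the example, search_prefix(buf, buf + 100,
std::regex("^what"), 4) examines at most four characters.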

Hopefully my expectation isn't unreasonable; Safari's implementation of regex
behaves as I'd expect. That is, the time to return a "no match" result is
constant (and fast) given the JS equivalent of the above example.

Is it possible that the regex_search implementation could be given a little more
intelligence?

(Apologies that I am not sufficiently familiar with libstdc++ version history
to select an appropriate version number for this bug.)

[Bug libstdc++/88802] std::hash<std::nullptr_t> not implemented

2019-01-11 Thread tom at kera dot name
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88802

--- Comment #1 from Tomalak Geret'kal <tom at kera dot name> ---
[unord.hash]/2
> Each specialization of hash is either enabled or disabled, as described 
> below. [ Note: Enabled specializations meet the Cpp17Hash requirements, and 
> disabled specializations do not. — end note ] Each header that declares the 
> template hash provides enabled specializations of hash for nullptr_t and all
> cv-unqualified arithmetic, enumeration, and pointer types. For any type Key
> for which neither the library nor the user provides an explicit or partial
> specialization of the class template hash, hash<Key> is disabled.

(Clang HEAD does support this, it turns out.)
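A quick way to probe whether a given specialization is enabled is to test
default constructibility, since C++17 specifies that a disabled
specialization is not default constructible. The sketch below assumes a
C++17 library; the trait name hash_enabled and the type NoHash are made up
for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <type_traits>

// Hypothetical helper: true when std::hash<Key> is an enabled specialization,
// detected through default constructibility.
template <typename Key>
constexpr bool hash_enabled =
    std::is_default_constructible<std::hash<Key>>::value;

struct NoHash {};  // neither the library nor the user specializes std::hash

// Expected to be true on a library that implements the wording quoted above;
// it is false on the libstdc++ versions this bug was filed against.
constexpr bool nullptr_hash_enabled = hash_enabled<std::nullptr_t>;
```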

[Bug libstdc++/88802] New: std::hash<std::nullptr_t> not implemented

2019-01-11 Thread tom at kera dot name
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88802

Bug ID: 88802
   Summary: std::hash<std::nullptr_t> not implemented
   Product: gcc
   Version: 6.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: libstdc++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: tom at kera dot name
  Target Milestone: ---

See https://stackoverflow.com/q/54147254/560648.

C++17 requires that std::hash<std::nullptr_t> be provided. MSVS does, but dev
libstdc++ doesn't (and neither does libc++). This seems to be the case on trunk
still.


#include <functional>
int main()
{
    std::hash<std::nullptr_t> h;
    return h(nullptr);
}


Result:

main.cpp: In function 'int main()':
main.cpp:4:31: error: use of deleted function
'std::hash<std::nullptr_t>::hash()'
     std::hash<std::nullptr_t> h;


Expected result:

Good build and some return code.

[Bug c++/86049] Array bindings are not const when initializer is

2018-12-11 Thread tom at kera dot name
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86049

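Tomalak Geret'kal <tom at kera dot name> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |tom at kera dot name

--- Comment #3 from Tomalak Geret'kal <tom at kera dot name> ---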
I disagree and think that Clang is wrong.

The top-level qualifiers of T (the type of One) should be "cv", and cv is "the
cv-qualifiers in the decl-specifier-seq". The decl-specifier-seq is "auto", not
"const auto". That "auto" will infer "const int" doesn't seem to be relevant.

Discussion on https://stackoverflow.com/q/53726135/560648.

[Bug libstdc++/53838] _GLIBCXX_DEBUG and empty ostringstream

2015-04-02 Thread tom at kera dot name
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53838

Tomalak Geret'kal <tom at kera dot name> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |tom at kera dot name

--- Comment #6 from Tomalak Geret'kal <tom at kera dot name> ---
Not a GCC bug? Really? I beg to differ.

 - Bug 54173
 - Bug 33021
 - Bug 64504


[Bug c++/64791] New: Generic lambda fails to implicitly capture `const` variable

2015-01-25 Thread tom at kera dot name
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64791

Bug ID: 64791
   Summary: Generic lambda fails to implicitly capture `const`
variable
   Product: gcc
   Version: 4.9.2
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: tom at kera dot name

From http://stackoverflow.com/q/28141403/560648

Reproduction:

/
#include <iostream>
#include <functional>

int main()
{
    const int a = 2;
    std::function<void(int)> f = [&](auto b) { std::cout << a << ", " << b << std::endl; };
f(3);
}
/


Taking any of the following steps allows the program to build and run with the
expected output "2, 3":

- remove `const` from declaration of a
- name `a` in the capture-list instead of relying on implicit capture
- change declaration of `f` from `std::function<void(int)>` to `auto`
- make the lambda non-generic by changing `auto b` to `int b`
- use Clang (e.g. v3.5.0)

Suspect detection of odr-use is breaking, or this could be related to bug
61814, or something else entirely?
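As an illustration of the second workaround in the list (naming `a` in the
capture-list), here is a minimal sketch. It writes to a std::ostringstream
instead of std::cout so the result can be checked; the function name is
hypothetical and the code is not taken from the linked question.

```cpp
#include <cassert>
#include <functional>
#include <sstream>
#include <string>

// Same shape as the reproduction, but `a` is captured explicitly rather than
// through a capture-default. Requires C++14 for the generic lambda.
std::string run_explicit_capture() {
    const int a = 2;
    std::ostringstream out;
    std::function<void(int)> f = [a, &out](auto b) { out << a << ", " << b; };
    f(3);
    return out.str();
}
```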


[Bug c++/64791] Generic lambda fails to implicitly capture `const` variable

2015-01-25 Thread tom at kera dot name
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64791

--- Comment #1 from Tomalak Geret'kal <tom at kera dot name> ---
Build error:

/
main.cpp: In instantiation of 'main()::<lambda(auto:1)> [with auto:1 = int]':
/usr/local/include/c++/4.9.2/functional:2149:71:   required by substitution of
'template<class _Res, class ... _ArgTypes> template<class _Functor> using
_Invoke = decltype
(std::__callable_functor(declval<_Functor&>())((declval<_ArgTypes>)()...))
[with _Functor = main()::<lambda(auto:1)>; _Res = void; _ArgTypes = {int}]'
/usr/local/include/c++/4.9.2/functional:2225:9:   required by substitution of
'template<class _Functor, class> std::function<_Res(_ArgTypes
...)>::function(_Functor) [with _Functor = main()::<lambda(auto:1)>;
<template-parameter-1-2> = <missing>]'
main.cpp:7:90:   required from here
main.cpp:7:58: error: 'a' was not declared in this scope
     std::function<void(int)> f = [&](auto b) { std::cout << a << ", " << b << std::endl; };
  ^
/


[Bug c++/64791] Generic lambda fails to implicitly capture `const` variable

2015-01-25 Thread tom at kera dot name
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64791

--- Comment #2 from Tomalak Geret'kal <tom at kera dot name> ---
Actually, I'm no longer sure that `a` *is* odr-used...


[Bug libstdc++/53984] iostream operation throwing exception when exceptions not enabled

2013-12-04 Thread tom at kera dot name
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53984

Tomalak Geret'kal <tom at kera dot name> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |tom at kera dot name

--- Comment #3 from Tomalak Geret'kal <tom at kera dot name> ---
Another testcase was proposed under the following Stack Overflow question:

  http://stackoverflow.com/q/20371956/560648

The answer to that question was a link to this bug.


[Bug libstdc++/52015] std::to_string does not work under MinGW

2013-10-21 Thread tom at kera dot name
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52015

Tomalak Geret'kal <tom at kera dot name> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |tom at kera dot name

--- Comment #23 from Tomalak Geret'kal <tom at kera dot name> ---
Nathan, read comment 15. :)


[Bug libstdc++/52169] the ifstream readsome() method does not signal any bit on eof.

2012-02-08 Thread tom at kera dot name
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52169

Tomalak Geret'kal <tom at kera dot name> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |tom at kera dot name

--- Comment #1 from Tomalak Geret'kal <tom at kera dot name> 2012-02-08 10:39:54 UTC ---
cplusplus.com is (a) not authoritative, (b) full of mistakes, and (c) otherwise
just awful.

Instead, we'll quote the standard(s):

[C++11: 27.7.2.3]:
  streamsize readsome(char_type* s, streamsize n);
32/ Effects: Behaves as an unformatted input function (as described in
27.7.2.3, paragraph 1). After constructing a sentry object, if !good() calls
setstate(failbit) which may throw an exception, and return. Otherwise extracts
characters and stores them into successive locations of an array whose first
element is designated by s. If rdbuf()->in_avail() == -1, calls
setstate(eofbit) (which may throw ios_base::failure (27.5.5.4)), and extracts
no characters;
- If rdbuf()->in_avail() == 0, extracts no characters
- If rdbuf()->in_avail() > 0, extracts min(rdbuf()->in_avail(),n)).
33/ Returns: The number of characters extracted.

[C++03: 27.6.1.3]:
  streamsize readsome(char_type* s, streamsize n);
30/ Effects: Behaves as an unformatted input function (as described in
27.6.1.3, paragraph 1). After constructing a sentry object, if !good() calls
setstate(failbit) which may throw an exception, and return. Otherwise extracts
characters and stores them into successive locations of an array whose first
element is designated by s. If rdbuf()->in_avail() == -1, calls
setstate(eofbit) (which may throw ios_base::failure (27.4.4.3)), and extracts
no characters;
- If rdbuf()->in_avail() == 0, extracts no characters
- If rdbuf()->in_avail() > 0, extracts min(rdbuf()->in_avail(),n)).
31/ Returns: The number of characters extracted.


[Bug libstdc++/52169] the ifstream readsome() method does not signal any bit on eof.

2012-02-08 Thread tom at kera dot name
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52169

--- Comment #2 from Tomalak Geret'kal <tom at kera dot name> 2012-02-08 10:45:17 UTC ---
Are you sure it's not just that in_avail is 0? Why should it be -1 here?

i.e. doesn't readsome become a noop when there's nothing to read?
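The difference between in_avail() returning 0 and returning -1 can be shown
with a stream buffer that reports a fixed value. This is a sketch only: the
class FixedAvailBuf, the struct ReadsomeResult and the function try_readsome
are hypothetical, and the buffer deliberately has an empty get area so that
in_avail() forwards to showmanyc().

```cpp
#include <cassert>
#include <istream>
#include <streambuf>

// Hypothetical buffer with no get area; in_avail() therefore returns
// whatever showmanyc() reports.
class FixedAvailBuf : public std::streambuf {
public:
    explicit FixedAvailBuf(std::streamsize avail) : avail_(avail) {}
protected:
    std::streamsize showmanyc() override { return avail_; }
private:
    std::streamsize avail_;
};

struct ReadsomeResult {
    std::streamsize count;  // value returned by readsome()
    bool eof;               // eofbit set afterwards?
    bool fail;              // failbit or badbit set afterwards?
};

// Calls readsome() once on a stream whose buffer reports `avail`.
ReadsomeResult try_readsome(std::streamsize avail) {
    FixedAvailBuf buf(avail);
    std::istream in(&buf);
    char chars[8];
    const std::streamsize n = in.readsome(chars, sizeof chars);
    return {n, in.eof(), in.fail()};
}
```

Under the wording quoted earlier in this thread, try_readsome(0) extracts
nothing and sets no state bits, while try_readsome(-1) extracts nothing and
sets eofbit.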


[Bug c++/40942] GCC accepts code that Comeau and MSVC deems invalid.

2009-08-25 Thread tom at kera dot name


--- Comment #4 from tom at kera dot name  2009-08-25 14:48 ---
(In reply to comment #2)
> Why would this be ambiguous? A string literal has type "array of n const char"
> (see 2.13.4/1), so it should go with the array constructor. Do you disagree?
>
> W.

Table 9 under 13.3.3/1 shows that array-to-pointer conversion is Exact Match.
As is rvalue-to-lvalue conversion (though string literals are lvalues anyway
:D). As is Identity, which is what applies here.

There is no clear ranking between them, hence Comeau reporting ambiguity.
Although I personally think Identity should overrule absolutely everything, it
doesn't. So IMO GCC is buggy in this way.


-- 

tom at kera dot name changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |tom at kera dot name


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40942