[Bug libstdc++/100128] New: Behavior and performance depends on order of ctype.h and stdlib.h include

2021-04-16 Thread travis.downs at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100128

Bug ID: 100128
   Summary: Behavior and performance depends on order of ctype.h
and stdlib.h include
   Product: gcc
   Version: 10.3.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: libstdc++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: travis.downs at gmail dot com
  Target Milestone: ---

When ctype.h is included as the first header in a file, it will be processed
without __NO_CTYPE being defined. This results in several differences versus
the case where __NO_CTYPE is defined.

For example, toupper() is defined as extern inline or as a macro if __NO_CTYPE
is not defined, but is otherwise only declared, not defined.

As another example, isalnum_l and many similar functions are defined as
macros if __NO_CTYPE is not defined, but otherwise are not.

On the other hand, if you include stdlib.h (or many other headers) in a C++
compile, the C++ "override" file include/c++/10.3.0/stdlib.h gets included,
which ultimately ends up including x86_64-linux-gnu/bits/os_defines.h, which
defines __NO_CTYPE.

If ctype.h is subsequently included, its effect is different, as described
above.
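
The effect can be observed from inside a translation unit with a couple of
preprocessor "canaries". The following is only a minimal sketch, assuming
glibc and libstdc++ as described above; the helper names no_ctype_defined,
isalnum_l_is_macro and upper are made up for illustration, and on other C
libraries neither macro may ever be defined.

```cpp
// Swap the order of the next two includes to compare the results.
#include <stdlib.h>  // in C++ this is expected to reach libstdc++'s wrapper first
#include <ctype.h>
#include <cassert>

// Hypothetical helper: true if __NO_CTYPE is defined at this point.
constexpr bool no_ctype_defined() {
#ifdef __NO_CTYPE
    return true;
#else
    return false;
#endif
}

// Hypothetical helper: true if ctype.h exposed isalnum_l as a macro.
constexpr bool isalnum_l_is_macro() {
#ifdef isalnum_l
    return true;
#else
    return false;
#endif
}

// Whichever form toupper() takes, its observable result should be the same;
// only the generated code (inline lookup versus a library call) may differ.
inline int upper(int c) {
    assert(c >= 0 && c <= 127);  // keep the sketch to plain ASCII input
    return toupper(c);
}
```

Printing or inspecting the two constexpr values for each include order shows
which configuration the translation unit ended up with.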

I suppose this is an ODR violation in one way or another (e.g., if two files
are compiled into the same program, one with and one without __NO_CTYPE), and
it can also have a significant impact on performance, as described here:

https://travisdowns.github.io/blog/2019/11/19/toupper.html

Evidently, the behavior and definitions exposed by these headers should not
depend on the order of inclusion. I suspect there are other cases besides
__NO_CTYPE, as long as headers like ctype.h exist that don't trigger the C++
header include chain.

You can play with this example on godbolt:

https://godbolt.org/z/vY4EnE51z

Try swapping the order of ctype and stdlib includes to see the effect. The int
variables are canaries so you can see which macros were defined in the
preprocessed output.

This is the same as glibc bug #25214, but I was advised over there that this
should be filed against libstdc++ instead.

https://sourceware.org/bugzilla/show_bug.cgi?id=25214

[Bug c++/95400] New: -march=native and -march=icelake-client produce different results on icelake client

2020-05-28 Thread travis.downs at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95400

Bug ID: 95400
   Summary: -march=native and -march=icelake-client produce
different results on icelake client
   Product: gcc
   Version: 9.2.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: travis.downs at gmail dot com
  Target Milestone: ---

On an Ice Lake client machine, using -O3 -march=native produces 512-bit AVX-512
instructions, whereas -O3 -march=icelake-client produces 256-bit instructions.

Since this machine *is* Ice Lake client, I would expect both options to do the
same thing.
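
The report does not include a reproducer. As a hypothetical one, a plain loop
like the following is normally auto-vectorized at -O3, so the generated
assembly can be compared for the two -march settings (zmm registers for
512-bit code, ymm for 256-bit). The names add_arrays and max_vector_bits are
made up for this sketch. Comparing the output of "gcc -Q --help=target" for
both options may also show whether the resolved tuning settings differ.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical test loop (not from the report): element-wise float addition.
void add_arrays(float* dst, const float* a, const float* b, std::size_t n) {
    assert((dst && a && b) || n == 0);
    for (std::size_t i = 0; i < n; ++i) {
        dst[i] = a[i] + b[i];
    }
}

// Widest vector extension the compiler was allowed to use, judged from
// predefined macros. This reflects ISA availability only: both options in
// the report would be expected to enable AVX-512, so the difference lies in
// the width the vectorizer chooses, not in this value.
constexpr int max_vector_bits() {
#if defined(__AVX512F__)
    return 512;
#elif defined(__AVX__)
    return 256;
#elif defined(__SSE2__)
    return 128;
#else
    return 0;
#endif
}
```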

[Bug c++/92440] New: Error output for first error truncated with -fmax-errors=1

2019-11-10 Thread travis.downs at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92440

Bug ID: 92440
   Summary: Error output for first error truncated with
-fmax-errors=1
   Product: gcc
   Version: 9.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: travis.downs at gmail dot com
  Target Milestone: ---

Consider the following code snippet:

template <int T>
struct S {
    template <class U>
    friend struct S;
};

S<0> s;

Compiled with gcc trunk 10.0.0 and any earlier version I tried, it produces
the following error without any explicit command line flags:

<source>: In instantiation of 'struct S<0>':
<source>:7:6:   required from here
<source>:1:15: error: template parameter 'int T'
    1 | template <int T>
      |               ^
<source>:4:19: error: redeclared here as 'class U'
    4 |     friend struct S;
      |                   ^
Compiler returned: 1

That's logically a single error.

With -fmax-errors=1 the error is truncated in the middle of a "sentence" with
only the first part visible, which prevents understanding the error:

<source>: In instantiation of 'struct S<0>':
<source>:7:6:   required from here
<source>:1:15: error: template parameter 'int T'
    1 | template <int T>
      |               ^
compilation terminated due to -fmax-errors=1.
Compiler returned: 1


Godbolt link:

https://godbolt.org/z/bX2z4f

[Bug target/62011] False Data Dependency in popcnt instruction

2017-11-11 Thread travis.downs at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62011

--- Comment #16 from Travis Downs  ---
Also, this is fixed for Skylake for tzcnt and lzcnt but not popcnt.

[Bug target/62011] False Data Dependency in popcnt instruction

2017-11-11 Thread travis.downs at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62011

Travis Downs changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |travis.downs at gmail dot com

--- Comment #15 from Travis Downs  ---
For what it's worth, and because Richard asked for it above, there is an Intel
erratum for this, at least as of Haswell, for example HSD146: "POPCNT
Instruction May Take Longer to Execute Than Expected".

It mentions only popcnt, and I found it for Haswell, Skylake (SKL029) and
Broadwell. The text is:

POPCNT Instruction May Take Longer to Execute Than Expected

Problem:
POPCNT instruction execution with a 32 or 64 bit operand may be delayed until 
previous non-dependent instructions have executed.

Implication:
Software using the POPCNT instruction may experience lower performance than
expected. 

Workaround:
None identified
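
For illustration, the kind of loop where this tends to show up looks roughly
like the sketch below. It is not taken from the bug; popcount_sum is a made-up
name, and __builtin_popcountll is the GCC/Clang builtin, which typically
becomes a popcnt instruction when compiled with -mpopcnt or a suitable -march.
The four counts in the unrolled body are logically independent, but if the
compiler happens to write several popcnt results to a register that another
instruction in the chain still depends on, the delay described in the erratum
can serialize them.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Sums the population count of n 64-bit words using four independent
// accumulators, so the only dependencies between iterations are the adds.
std::uint64_t popcount_sum(const std::uint64_t* data, std::size_t n) {
    assert(data != nullptr || n == 0);
    std::uint64_t c0 = 0, c1 = 0, c2 = 0, c3 = 0;
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        c0 += static_cast<std::uint64_t>(__builtin_popcountll(data[i + 0]));
        c1 += static_cast<std::uint64_t>(__builtin_popcountll(data[i + 1]));
        c2 += static_cast<std::uint64_t>(__builtin_popcountll(data[i + 2]));
        c3 += static_cast<std::uint64_t>(__builtin_popcountll(data[i + 3]));
    }
    // Handle the remaining 0..3 words.
    for (; i < n; ++i) {
        c0 += static_cast<std::uint64_t>(__builtin_popcountll(data[i]));
    }
    return c0 + c1 + c2 + c3;
}
```

Whether a given build is affected depends on register allocation, so the
generated assembly has to be inspected for each compiler version and flag set.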