Possibly minor pessimisation at -O2 for logical-AND'd branch statements

2021-04-24 Thread Soul Studios

The following:
#include <cstdlib>

int main()
{
	const bool a = (rand() % 20) > 5;
	const bool b = (rand() & 1) == 1;

	int x = rand() % 5, y = rand() % 2, z = rand() % 3, c = rand() % 4,
	    d = rand() % 5;

	if (a & b) d = y * c;
	else if (a & !b) c = 2 * d;
	else if (!a & b) y = c / z;
	else z = c / d * 5;

	return x * c * d * y * z;
}



This results in 3 branch instructions at -O0, as one would expect, but 6 
branch instructions at -O2 (x86-64, gcc 10), which I wouldn't expect. At 
-O2 the output appears to mimic the same code written with && instead of &.
I would expect this to be a pessimisation due to the increased number of 
branch instructions. GCC may have been trying to reduce the overall number 
of condition evaluations, but missed the fact that those conditions are 
const variables and not (for example) function calls. Clang keeps this at 
3 branches at -O2.


Further into the 'can we do better-than/as-good-as clang' territory, the 
same code above but using && instead of &, resolves to 5 branches in 
clang (12) and 6 in gcc (10.3), at -O2 (x86-64).
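
To compare the two forms in isolation, the branch chain can be pulled out 
of main() into a pair of functions and the assembly inspected (for example 
with g++ -O2 -S). This is only a sketch: the names values, chain_bitand and 
chain_logand are made up for illustration, and the inputs are passed in 
rather than taken from rand() so that the divisors can be kept non-zero.

```cpp
#include <cassert>

// Hypothetical helper type: the five integers the original fills with rand().
struct values { int x, y, z, c, d; };

// Chain written with bitwise & on bools: both operands are always evaluated.
int chain_bitand(bool a, bool b, values v)
{
    assert(v.z != 0 && v.d != 0);   // the chain divides by z and by d

    if (a & b) v.d = v.y * v.c;
    else if (a & !b) v.c = 2 * v.d;
    else if (!a & b) v.y = v.c / v.z;
    else v.z = v.c / v.d * 5;

    return v.x * v.c * v.d * v.y * v.z;
}

// The same chain written with logical &&, which short-circuits.
int chain_logand(bool a, bool b, values v)
{
    assert(v.z != 0 && v.d != 0);

    if (a && b) v.d = v.y * v.c;
    else if (a && !b) v.c = 2 * v.d;
    else if (!a && b) v.y = v.c / v.z;
    else v.z = v.c / v.d * 5;

    return v.x * v.c * v.d * v.y * v.z;
}
```

Both functions return the same value for any inputs, so any difference in 
the number of branch instructions comes from how the compiler lowers & 
versus &&, not from the logic.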


Cheers,
matt Bentley


Re: removing toxic emailers

2021-04-22 Thread Soul Studios

On 15/04/2021 10:40 am, Frosku wrote:

On Wed Apr 14, 2021 at 9:49 PM BST, Paul Koning via Gcc wrote:


My answer is "it depends". More precisely, in the past I would have
favored those who decline because the environment is unpleasant -- with
the implied assumption being that their objections are reasonable. Given
the emergency of cancel culture, that assumption is no longer
automatically valid.

This is why I asked the question "who decides?" Given a disagreement in
which the proposed remedy is to ostracise a participant, it is necessary
to inquire for what reason this should be done (and, perhaps, who is
pushing for it to be done). My suggestion is that this judgment can be
made by the community (via secret ballot), unless it is decided to
delegate that power to a smaller body, considered as trustees, or
whatever you choose to call them.

paul


I think, in general, it's fine to leave this decision to moderators. It's
just a little disconcerting when one of the people who would probably be
moderating is saying that he could have shut down the discussion if he
could only ban jerks, as if to imply that everyone who dares to disagree
with his position is a jerk worthy of a ban.



A little late to the party, but thought this was worth commenting on- 
from my perspective, as long as there is some sort of consensus amongst 
moderators about who is worth banning, as opposed to whether it can be 
fixed by calling the person out on their ongoing behaviour, it's 
probably worth doing. If that power is left to one mod, it's not a good 
thing. 3 or a larger odd number of mods is best for avoiding stalemates, 
and more is better.
As an example of a controversial mod choice and without wanting to 
reopen wounds here, if I were a mod I could quite easily ban Nathan for 
the dishonesty and divisiveness of his initial post (see below if you 
require substantive talk around that), despite the fact that I have no 
particular love for Stallman or any investment in the topic. But another 
mod might see that contribution as 'the end justifying the means' in 
terms of bringing in an inevitable debate around Stallman's offputting 
personal manner, and whether that fits in today's society. Another mod 
might have another opinion etc.


Two or three heads are better than one when it comes to judging 
behaviour, particularly when an international community is at stake. 
And the more temperamentally/culturally diverse the mods are, the 
better for decision-making overall.









=

 1. 'skeptical that voluntarily pedophilia harms children.’ 
stallman's own  archives 2006-mar-jun  I note that children are

*incapable* of consenting. That’s what the age of consent means.


He has recanted on this as of 2019 
(https://www.stallman.org/archives/2019-jul-oct.html#14_September_2019_(Sex_between_an_adult_and_a_child_is_wrong))


because people took the time to point out to him why his opinion was 
wrong. Omitting his recantation is, by my standards, a lie by omission. 
It doesn't make what he initially said any less terrible. But it 
clarifies his actual position.




 2. 'end censorship of “child pornography”’. Stallman's archives 
2012-jul-oct.html Notice use of “quotes” to down play what is actually

being requested.


While I don't actually agree with Stallman in the slightest, his stated 
objection is "it's common practice for teenagers to exchange nude photos 
with their lovers, and they all potentially could be imprisoned for 
this. A substantial fraction of them are actually prosecuted. "


That's very different from how it's been presented here - a lie by omission.



 3. 'gentle expressions of attraction’ Stallman's archives > 

2012-jul-oct.html Condoning a variant of the wolf-whistle.  Unless

one’s talking to one’s lover, ‘gentle invitations for sex’ by a
stranger is *grooming* (be it child or of-age).


If you've ever been to a bar, or an open-air event, or god forbid a party, 
you are aware that this is an obvious lie (for adults).


Secondarily, nothing in Richard's text relates to wolf-whistling or 
variants.




Re: removing toxic emailers

2021-04-14 Thread Soul Studios



On 15/04/2021 11:09 am, Adrian via Gcc wrote:

Eric S. Raymond :
Speaking as a "high functioning autist", I'm aware of the difficulties that
some of us have with social interactions - and also that many of us
construct a persona or multiple personae to interact with others, a
phenomenon known as "masking".

I understand why "Asshole" can function as a viable mask for many people,
because there are cultures where it's tolerated, particularly in
remote-working groups like mailing lists, where physical altercations are
unlikely and no-one has to confront the results of their interactions with
others if they don't want to.



Just wanted to say thanks for this Adrian-
regardless of which position people take on this, I think this serves as 
a collection of useful insights - as does Eric's contribution.

cheers


Re: Remove RMS from the GCC Steering Committee

2021-03-29 Thread Soul Studios



On 30/03/2021 1:18 am, Richard Kenner wrote:

I think I will leave this discussion up to those who have more
familiarity with the guy than I do. There's no doubt that some of the
stuff Stallman has written creeps me the hell out, and I think it was
more the tone of the OP I objected to.


I mostly want to stay out of this and will leave much of this discussion to
others (though I have met RMS personally on a number of occaisions), but I
want to mostly say that I agree with Jeff that it's important that this
discussion stay civil.

I believe that to a large extent, the discussion here is reflective of a
much larger discussion in society of to what extent, if at all, an entity
associated with an person must or should take action based on things that
that person does while not associated with that entity.


It's worth noting that when RMS was kicked from FSF, there was a 
2k-strong petition in favour, and a 3.5k-strong petition against. So 
clearly there is a discussion to be had, but as long as the left-wing 
(through self-righteousness and threats of exclusion/withdrawal) and the 
right-wing (through belligerence and abuse/hostility) are trying 
actively to shut down discussion, that will not take place.


Re: Remove RMS from the GCC Steering Committee

2021-03-28 Thread Soul Studios

We are not talking about some single recent incident, but about
decades of problematic behavior. At the last face-to-face GNU Tools
Cauldron, everybody I talked to about it had some story about being
harassed by RMS, had witnessed such harassment or heard from or knew
someone who had been.


I think I will leave this discussion up to those who have more 
familiarity with the guy than I do. There's no doubt that some of the 
stuff Stallman has written creeps me the hell out, and I think it was 
more the tone of the OP I objected to.
Giving twitter as reference points doesn't really help matters, but it 
appears as though the problems are more offline than on.


Re: 10-12% performance decrease in benchmark going from GCC8 to GCC9

2020-09-30 Thread Soul Studios

I created a bug report a while back, and have recently identified the 
smallest code change necessary to 'turn off' the bug. The report has 
been updated with that:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96750

If anyone has any input I'd love to see this fixed.
Thanks-
M

On 11/08/2020 12:58 am, Bill Schmidt wrote:


On 8/10/20 3:30 AM, Jonathan Wakely via Gcc wrote:

Hi Matt,

The best thing to do here is file a bug report with the code to 
reproduce it:

https://gcc.gnu.org/bugzill

Thanks



Also, be sure to follow the instructions at https://gcc.gnu.org/bugs/.

Bill



On Sat, 8 Aug 2020 at 23:01, Soul Studios  wrote:

Hi all,
recently have been working on a new version of the plf::colony container
(plflib.org) and found GCC9 was giving 10-12% worse performance on a
given benchmark than GCC8.

Previous versions of the colony container did not experience this
performance loss going from GCC8 to GCC9.
However Clang 6 and MSVC2019 show no performance loss going from the old
colony version to the new version.

The effect is repeatable across architectures - I've tested on xubuntu,
windows running nuwen mingw, and on Core2 and Haswell CPUs, with and
without -march=native specified.

Compiler flags are: -O2;-march=native;-std=c++17

Code is attached with an absolute minimum use-case - other benchmarks
have not shown such strong performance differences - including both
simpler and more complex tests.
So I cannot reduce further, please do not ask me to do so.

The benchmark in question inserts into a container initially then
iterates over container elements repeatedly, randomly erasing and/or
inserting new elements.


In addition I've attached the assembly output under both GCC8 and GCC9.
In this case I have output from 8.2 and 9.2 respectively, but the same
effects apply to 8.4 and 9.3. The output for 8 is a lot larger than 9,
wondering if there's more unrolling occurring.

Any questions let me know. I will help where I can, but my knowledge of
assembly is limited. If supplying the older version of colony is useful
I'm happy to do so.

Nanotimer is a ~nanosecond-precision sub-timeslice cross-platform timer.
Colony is a bucket-array-like unordered sequence container.
Thanks,
Matt




10-12% performance decrease in benchmark going from GCC8 to GCC9

2020-08-08 Thread Soul Studios

Hi all,
recently have been working on a new version of the plf::colony container 
(plflib.org) and found GCC9 was giving 10-12% worse performance on a 
given benchmark than GCC8.


Previous versions of the colony container did not experience this 
performance loss going from GCC8 to GCC9.
However Clang 6 and MSVC2019 show no performance loss going from the old 
colony version to the new version.


The effect is repeatable across architectures - I've tested on xubuntu, 
windows running nuwen mingw, and on Core2 and Haswell CPUs, with and 
without -march=native specified.


Compiler flags are: -O2;-march=native;-std=c++17

Code is attached with an absolute minimum use-case - other benchmarks 
have not shown such strong performance differences - including both 
simpler and more complex tests.

So I cannot reduce further, please do not ask me to do so.

The benchmark in question inserts into a container initially then 
iterates over container elements repeatedly, randomly erasing and/or 
inserting new elements.



In addition I've attached the assembly output under both GCC8 and GCC9. 
In this case I have output from 8.2 and 9.2 respectively, but the same 
effects apply to 8.4 and 9.3. The output for 8 is a lot larger than 9, 
wondering if there's more unrolling occurring.


Any questions let me know. I will help where I can, but my knowledge of 
assembly is limited. If supplying the older version of colony is useful 
I'm happy to do so.


Nanotimer is a ~nanosecond-precision sub-timeslice cross-platform timer.
Colony is a bucket-array-like unordered sequence container.
Thanks,
Matt




Re: -Wclass-memaccess warning should be in -Wextra, not -Wall

2018-07-10 Thread Soul Studios
I guess the phrasing is a bit weak, "some users" obviously has to refer 
to a significant proportion of users, "easy to avoid" cannot have too 
many drawbacks (in particular, generated code should be of equivalent 
quality), etc.


-Wclass-memaccess fits the "easy to avoid" quite well, since a simple 
cast disables it. -Wmaybe-uninitialized is much worse: it produces many 
false positives, that change with every release and are super hard to 
avoid. And even in the "easy to avoid" category where we don't want to 
litter the code with casts to quiet the warnings, I find -Wsign-compare 
way worse in practice than -Wclass-memaccess.


I personally get annoyed with the amount of hand-holding compilers seem 
to need now in order to do what I ask them to do - and this isn't 
leveled only at GCC, but also at Clang and MSVC.
Some of it makes sense in order to check the programmer meant to make 
that conversion, but you end up with a hell of a lot of static/other 
casts, and it's not great for readability.
As a side-note, the workaround for memset/cpy/move as mentioned isn't 
shown in the warning text, which I think it probably should be, rather 
than only in the GCC documentation.
Still, as you all have mentioned, the workaround is pretty 
straightforward in this case, so that's good.

M@


Re: -Wclass-memaccess warning should be in -Wextra, not -Wall

2018-07-10 Thread Soul Studios
Not sure how kosher it is to address several replies in one email, but 
I'm going to attempt it as there are overlapping topics:



Martin:


Simply because a struct has a constructor does not mean it isn't a
viable target/source for use with memcpy/memmove/memset.


As the documentation that Segher quoted explains, it does
mean exactly that.


I just thought I'd go back and address this, because it doesn't; you can 
have a struct with a basic constructor that is just a convenience 
function (or an optimization to avoid default type assignment), where all 
the member types are POD. This is an obvious viable target for 
memcpy/memmove.
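
As a rough illustration of that case, here is a sketch with a made-up 
struct (point) whose constructors are pure convenience. The standard type 
traits report it as non-trivial but still trivially copyable, which is the 
property that makes a bytewise copy well-defined. The cast to void * is 
the suppression described in the GCC documentation for -Wclass-memaccess; 
whether a given GCC version warns without it is not something this sketch 
claims.

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

// Hypothetical example type: every member is a plain int, and the
// constructors exist only for convenience.
struct point
{
    int x, y;
    point() : x(0), y(0) {}
    point(int x_, int y_) : x(x_), y(y_) {}
};

// The user-provided default constructor stops the type being trivial...
static_assert(!std::is_trivial<point>::value,
              "point has a user-provided default constructor");
// ...but copying is still trivial, so memcpy has defined behaviour.
static_assert(std::is_trivially_copyable<point>::value,
              "point can be copied bytewise");

inline point copy_with_memcpy(const point &source)
{
    point destination;
    // Casting the destination to void* is the documented way to tell GCC
    // that the raw memory access is intentional.
    std::memcpy(static_cast<void *>(&destination), &source, sizeof(point));
    assert(destination.x == source.x && destination.y == source.y);
    return destination;
}
```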




Please open bugs with small test cases showing
the inefficiencies so the optimizers can be improved.


I've done one today (86471), I may not have time to do more. In 
particular I've found GCC to generate less-optimal code for Core2 
processors as opposed to the i3 and upwards range, when avoiding memcpy.




No, programmers don't always know that.  In fact, it's easy even
for an expert programmer to make the mistake that what looks like
a POD struct can safely be cleared by memset or copied by memcpy
when doing so is undefined because one of the struct members is
of a non-trivial type (such a container like string).


I'm not sure it would be 'easy' for an expert to make that mistake 
(Marc's example aside), but okay.
I'm coming from the understanding that the orthodox approach (and the 
one currently being taught in schools/universities) is to use the C++ 
templated functions as opposed to the older C-style functions, and this 
is the avenue that the majority of novice C++ programmers will be coming 
from.
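
For what it's worth, the mistake Martin describes can be turned into a 
compile-time error rather than left to the programmer's memory. The 
following is a sketch only; checked_memcpy, plain_record and named_record 
are hypothetical names, not anything from GCC or plf.

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <type_traits>

struct plain_record { int id; double weight; };
struct named_record { int id; std::string name; };  // looks POD-like, is not

static_assert(std::is_trivially_copyable<plain_record>::value,
              "plain_record is safe to memcpy");
static_assert(!std::is_trivially_copyable<named_record>::value,
              "the std::string member makes memcpy undefined here");

// Hypothetical wrapper: refuses to compile for types that must not be
// copied bytewise, and otherwise forwards to memcpy.
template <typename T>
void checked_memcpy(T *destination, const T *source)
{
    static_assert(std::is_trivially_copyable<T>::value,
                  "T must be trivially copyable to be copied with memcpy");
    assert(destination != nullptr && source != nullptr);
    std::memcpy(static_cast<void *>(destination),
                static_cast<const void *>(source), sizeof(T));
}
```

Calling checked_memcpy on two named_record objects fails to compile, while 
the plain_record case goes straight through to memcpy.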




Quite a lot of thought and discussion went into the design and
implementation of the warning, so venting your frustrations or
insulting those of us involved in the process is unlikely to
help you effect a change.  To make a compelling argument you


Where did I insult you? I didn't. And expressing (not venting) 
frustration is fine, I think, if I feel people are not accurately 
reading what I've written. Don't take too much offence.




Jonathan:

> It was clear in your first post, but that doesn't make it correct. The
> statement "any programmer [invoking undefined behaviour] is going to
> know what they're getting into" is laughable.

I didn't say [invoking undefined behaviour]. I said using 
memset/memcpy/memmove on the structs/classes-with-constructors-etc where 
it is viable to do so. And that may not be invoking undefined behaviour, 
as above-



> I've seen far more cases of assignment operators implemented with
> memcpy that were wrong and broken and due to ignorance or incompetence
> than I have seen them done by programmers who knew what they were
> doing, or knew what they were getting into. There are programmers who
> come from C, and don't realise that a std::string shouldn't be copied
> with memcpy. There are programmers who are too lazy to write out
> memberwise assignment for each member, so just want to save a few
> lines of code by copying everything in one go with memcpy. There are
> lots of other ways to do it wrong. Your statement is simply not based
> in fact, it's more likely based on your limited experience, and
> assumption that everybody is probably doing what you're doing.

Fair enough, if that's your experience, it's fine. My assumption was 
that the orthodox (as above) approach is currently 
pretty-well-propagated on the internet-and-academia, and that anyone not 
doing that would have their reasons. But, thank you for actually 
addressing my post's point and not simply dancing around the edge of it.
There are other non-lazy performance reasons to use these, even for 
assignment, but it depends on the situation. I don't say this without 
having benchmarked it extensively, and correctly, in my own code.


M@


Re: -Wclass-memaccess warning should be in -Wextra, not -Wall

2018-07-09 Thread Soul Studios

On 07/05/2018 05:14 PM, Soul Studios wrote:

Simply because a struct has a constructor does not mean it isn't a
viable target/source for use with memcpy/memmove/memset.


As the documentation that Segher quoted explains, it does
mean exactly that.

Some classes have user-defined copy and default ctors with
the same effect as memcpy/memset.  In modern C++ those ctors
should be defaulted (= default) and GCC should emit optimal
code for them.  In fact, in loops they can result in more
efficient code than the equivalent memset/memcpy calls.  In
any case, "native" operations lend themselves more readily
to code analysis than raw memory accesses and as a result
allow all compilers (not just GCC) do a better a job of
detecting bugs or performing interesting transformations
that they may not be able to do otherwise.


Having benchmarked the alternatives memcpy/memmove/memset definitely
makes a difference in various scenarios.


Please open bugs with small test cases showing
the inefficiencies so the optimizers can be improved.

Martin




My point to all of this (and I'm annoyed that I'm having to repeat it 
again, as if my first post wasn't clear enough - which it was) was that 
any programmer using memcpy/memmove/memset is going to know what they're 
getting into.
Therefore it makes no sense to penalize them by getting them to write 
ugly, needless code - regardless of the surrounding politics/codewars.

-Wextra seems a suitable place to put this; -Wall doesn't.

As for test cases, well, this is something I've benchmarked over a range 
of scenarios in various projects over the past 3 years (mainly 
plf::colony and plf::list).

If I have time I'll submit a sample.


-Wclass-memaccess warning should be in -Wextra, not -Wall

2018-07-05 Thread Soul Studios
Simply because a struct has a constructor does not mean it isn't a 
viable target/source for use with memcpy/memmove/memset.
Having benchmarked the alternatives, memcpy/memmove/memset definitely 
make a difference in various scenarios.
The workaround, littering code with needless reinterpret_casts, is 
fugly.
Members which are invariants should of course be noted, but anyone using 
memset/cpy/move probably knows this.
Please discuss this annoyance. I would prefer this be moved into -Wextra, 
as this is less commonly used.


Re: Apparent deeply-nested missing error bug with gcc 7.3

2018-06-21 Thread Soul Studios

UPDATE: My bad.
The original compiler feature detection on the test suite was broken/not
matching the correct libstdc++ versions.
Hence the emplace_back/emplace_front tests were not running.


Told you so :-P



However, it does surprise me that GCC doesn't check this code.


It's a dependent expression so can't be fully checked until
instantiated -- and as you've discovered, it wasn't being
instantiated. There's a trade-off between compilation speed and doing
additional work to check uninstantiated templates with arbitrarily
complex expressions in them.



Yeah, I get it - saves a lot of time with heavily-templated setups and 
large projects.
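
A minimal sketch of the behaviour described above, using a made-up class 
template (holder). The body of broken() would be ill-formed for T = int, 
but because the expression depends on T and the member is never called, 
its definition is never instantiated and the translation unit is expected 
to compile.

```cpp
#include <cassert>

// Hypothetical example type, not taken from plf_list.
template <typename T>
struct holder
{
    T value;

    T &get() { return value; }

    // For T = int this would try to bind an int* to an int&. The expression
    // is dependent on T, so it is only checked once the member function is
    // actually instantiated, i.e. once something calls it.
    T &broken() { return &value; }
};

inline int read_through_get()
{
    holder<int> h{42};
    assert(h.value == 42);
    return h.get();   // only get() is instantiated; broken() never is
}
```

Adding a call to h.broken() anywhere should be enough to make the error 
appear, which matches what happened once the test suite's feature 
detection was fixed.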


Re: Apparent deeply-nested missing error bug with gcc 7.3

2018-06-18 Thread Soul Studios


It's never called.

I added a call to abort() to that function, and the tests all pass. So
the function is never used, so GCC never compiles it and doesn't
notice that the return type is invalid. That's allowed by the
standard. The compiler is not required to diagnose ill-formed code in
uninstantiated templates.




UPDATE: My bad.
The original compiler feature detection on the test suite was broken/not 
matching the correct libstdc++ versions.

Hence the emplace_back/emplace_front tests were not running.
However, it does surprise me that GCC doesn't check this code.
Cheers-


Re: Apparent deeply-nested missing error bug with gcc 7.3

2018-06-18 Thread Soul Studios


It's never called.

I added a call to abort() to that function, and the tests all pass. So
the function is never used, so GCC never compiles it and doesn't
notice that the return type is invalid. That's allowed by the
standard. The compiler is not required to diagnose ill-formed code in
uninstantiated templates.



As I mentioned in the original message, it is called several times.
Both MSVC and Clang pick up the error, but GCC does not. It's a bug.


Apparent deeply-nested missing error bug with gcc 7.3

2018-06-17 Thread Soul Studios
In the following case GCC correctly throws an error since 
simple_return_value is returning a pointer, not a reference:


"#include 

int & simple_return_value(int &value)
{
return &value;
}


int main()
{
int temp = 42;
return simple_return_value(temp);
}"



However in deeply-nested code GCC appears to miss the error, and in fact 
still returns a reference despite the return value of a pointer.


Take the following code in plf_list 
(https://github.com/mattreecebentley/plf_list):


"template<typename... arguments>
inline PLF_LIST_FORCE_INLINE reference emplace_back(arguments &&... parameters)
{
	return (emplace(end_iterator, std::forward<arguments>(parameters)...)).node_pointer->element;
}
"


emplace returns an iterator which contains a pointer to a node, which is 
then used to return the element at that node. However if you change the 
line to:
	"return &((emplace(end_iterator, std::forward<arguments>(parameters)...)).node_pointer->element);"



GCC 7.3 doesn't blink. Worse, it appears to return a reference anyway. 
The test suite cpp explicitly tests the return value of emplace_back, so 
changing the line should result in a test fail as well as a compile 
error. Neither occur.
You can test this yourself by downloading the code + test suite at 
http://www.plflib.org/plf_list_18-06-2018.zip

and editing the line yourself (line 1981).

Clang 3.7.1 detects the error immediately.


I'm not sure at which stage this bug appears. Templating the code at the 
top of this mail doesn't recreate the bug, so it must be something to do 
with templated classes (I'm guessing here).


Hopefully someone can shed some light on this.


Re: Missing warning opportunity - code after break;

2017-06-21 Thread Soul Studios

Doesn't look like I can create a GCC bugzilla account, but 7 years? Wow.


On 20/06/2017 3:48 p.m., Eric Gallager wrote:

On 6/19/17, Soul Studios <m...@soulstudios.co.nz> wrote:

Just noticed this:
while(something)
{
	// stuff

	if (num_elements == 0)
	{
		break;
		--current_group;
	}
}

doesn't trigger a warning in GCC 5.1, 6.3 and 7.1. The line after
"break;" is unused, probably should be before the break, ie. user error.

Matt Bentley



This is bug 46476: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=46476



Missing warning opportunity - code after break;

2017-06-19 Thread Soul Studios

Just noticed this:
while(something)
{
	// stuff

	if (num_elements == 0)
	{
		break;
		--current_group;
	}
}

doesn't trigger a warning in GCC 5.1, 6.3 and 7.1. The line after 
"break;" is unreachable, and probably should be before the break, i.e. 
user error.


Matt Bentley
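
To make the effect concrete, here is a sketch with the loop wrapped in two 
hypothetical functions, one with the statement after the break and one 
with it before. The surrounding loop body is invented for illustration; 
only the if block mirrors the snippet above.

```cpp
#include <cassert>

// Statement placed after the break: it can never execute.
inline int last_group_buggy(int current_group, int num_elements)
{
    assert(num_elements >= 0);   // guarantees the loop terminates
    while (true)
    {
        if (num_elements == 0)
        {
            break;
            --current_group;     // unreachable
        }
        --num_elements;
    }
    return current_group;
}

// Statement placed before the break: the decrement takes effect.
inline int last_group_fixed(int current_group, int num_elements)
{
    assert(num_elements >= 0);
    while (true)
    {
        if (num_elements == 0)
        {
            --current_group;
            break;
        }
        --num_elements;
    }
    return current_group;
}
```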


Re: [PING][RFC] Assertions as optimization hints

2016-11-27 Thread Soul Studios

The main problem with __assume is that it should never be used.
Literally, if you have to use it, then the code it refers to should
actually be commented out.



ps. This is a bit of a simplification - I should have stated, if the 
assumption is wrong, it can lead to dangerous code which breaks.
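
As a sketch of what such a hint looks like in practice: MSVC provides 
__assume, and GCC and Clang provide __builtin_unreachable(), which can be 
wrapped to the same effect. The macro name SKETCH_ASSUME and the function 
add_twice are made up for this example. If the condition is ever false at 
run time the behaviour is undefined, which is the danger mentioned above.

```cpp
#include <cassert>

// Hypothetical portability macro; not part of any compiler or library.
#if defined(_MSC_VER)
  #define SKETCH_ASSUME(cond) __assume(cond)
#elif defined(__GNUC__) || defined(__clang__)
  #define SKETCH_ASSUME(cond) \
      do { if (!(cond)) __builtin_unreachable(); } while (0)
#else
  #define SKETCH_ASSUME(cond) ((void)0)
#endif

// Adds *source to *destination twice. The hint tells the optimizer the two
// pointers never alias, so it may keep *source in a register between the
// two additions instead of reloading it.
inline void add_twice(int *destination, const int *source)
{
    assert(destination != source);        // checked in debug builds
    SKETCH_ASSUME(destination != source); // trusted in all builds
    *destination += *source;
    *destination += *source;
}
```

Pairing the hint with an assert on the same condition at least catches a 
wrong assumption in debug builds before it turns into broken release code.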


Re: [PING][RFC] Assertions as optimization hints

2016-11-27 Thread Soul Studios



Does this approach make sense in general? If it does I can probably
come up with more measurements.


Sounds good to me-not sure why there hasn't been more response to this, it
seems logical-
if functions can make assumptions such as pointerA != pointerB then that
leads the way for avoiding aliasing restrictions.



As a side note, at least some users may consider this a useful feature:
http://www.nntp.perl.org/group/perl.perl5.porters/2013/11/msg209482.html


The main problem with __assume is that it should never be used. 
Literally, if you have to use it, then the code it refers to should 
actually be commented out.


libstdc++ deque allocation

2016-06-22 Thread Soul Studios

Hi there-
quick question,
does deque as defined in libstdc++ allocate upon initialisation or upon 
first insertion?

Trying to dig through the code but can't figure it out.
Reason being, its insertion graphs seem to show a surprisingly linear 
progression from small amounts of N to large amounts.
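
One way to check this empirically rather than by reading the source is to 
give the deque an allocator that counts its calls. The following is a 
sketch only; counting_allocator, allocation_report and probe_deque are 
hypothetical names, and the numbers it reports will depend on the standard 
library in use.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <new>

// Bumped by every allocation made through counting_allocator.
static std::size_t allocation_count = 0;

// Minimal C++11 allocator that forwards to operator new and counts calls.
template <typename T>
struct counting_allocator
{
    using value_type = T;

    counting_allocator() noexcept = default;
    template <typename U>
    counting_allocator(const counting_allocator<U> &) noexcept {}

    T *allocate(std::size_t n)
    {
        ++allocation_count;
        return static_cast<T *>(::operator new(n * sizeof(T)));
    }
    void deallocate(T *p, std::size_t) noexcept { ::operator delete(p); }
};

template <typename T, typename U>
bool operator==(const counting_allocator<T> &, const counting_allocator<U> &) noexcept { return true; }
template <typename T, typename U>
bool operator!=(const counting_allocator<T> &, const counting_allocator<U> &) noexcept { return false; }

struct allocation_report { std::size_t on_construction, on_first_insert; };

// Counts allocations made by default construction and by the first push_back.
inline allocation_report probe_deque()
{
    const std::size_t start = allocation_count;
    std::deque<int, counting_allocator<int>> d;
    const std::size_t after_construction = allocation_count;
    d.push_back(1);
    const std::size_t after_insert = allocation_count;
    assert(d.size() == 1);
    return {after_construction - start, after_insert - after_construction};
}
```

A non-zero on_construction value would mean the implementation allocates 
up front, before any element is inserted.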

Thanks in advance,
Matt


std::list iteration performance for under 1000 elements

2016-04-24 Thread Soul Studios

Hi guys,
I was wondering if any of you could explain this performance for me:
www.plflib.org/colony.htm#benchmarks

(full disclosure, this is my website and benchmarks - I just don't 
understand the std::list results I'm getting at the moment)


If you look at the iteration performance graphs, you'll see that 
std::list under gcc (and MSVC, from further testing) has /really good/ 
forward-iteration performance for under 1000 elements (fewer elements 
for larger datatypes).
Why is this? Everything I know about std::list's implementation 
(non-contiguous memory allocation, cache effects) tells me it should have 
terrible iteration performance. But for under 1000 elements it's often 
better than std::deque.


Benchmarking is done with templates, so there's no different code 
between std::deque and std::list (with the exception of std::list using 
push_front rather than push_back, for these tests) - and subsequent 
changes to the benchmark code have made no difference to the results.
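
For readers who want to try something similar, a templated iteration test 
might look like the sketch below. This is not the benchmark code behind 
the graphs: time_forward_iteration and iteration_result are made-up names, 
std::chrono::steady_clock stands in for the timer, and push_back is used 
for every container (the real tests used push_front for std::list).

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <deque>
#include <list>

struct iteration_result { long long sum; double nanoseconds_per_pass; };

// Fills a container with element_count ints, then times repeated forward
// iteration over it. The same code is instantiated for every container type.
template <typename Container>
iteration_result time_forward_iteration(std::size_t element_count, std::size_t passes)
{
    assert(passes > 0);

    Container container;
    for (std::size_t i = 0; i != element_count; ++i)
        container.push_back(static_cast<int>(i));

    long long sum = 0;   // accumulated so the loop cannot be optimized away
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t pass = 0; pass != passes; ++pass)
        for (const int value : container)
            sum += value;
    const auto stop = std::chrono::steady_clock::now();

    const std::chrono::duration<double, std::nano> elapsed = stop - start;
    return {sum, elapsed.count() / static_cast<double>(passes)};
}
```

Calling time_forward_iteration<std::list<int>>(n, passes) and 
time_forward_iteration<std::deque<int>>(n, passes) for a range of n gives 
directly comparable per-pass timings.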


Anyway, if anyone here is a GCC developer and has an understanding of 
why this happens, I'd be appreciative.


Cheers,
matt