Re: Redundant constants in coremark crc8 for RISCV/aarch64 (no-if-conversion)

2022-10-18 Thread Jeff Law via Gcc



On 10/18/22 20:09, Vineet Gupta wrote:


On 10/18/22 16:36, Jeff Law wrote:
There isn't a great place in GCC to handle this right now.  If the 
constraints were relaxed in PRE, then we'd have a chance, but 
getting the cost model right is going to be tough.


It would have been better (for this specific case) if loop unrolling 
was not being done so early. The tree pass cunroll is flattening it 
out and leaving it to the rest of the tree/RTL passes to pick up the 
pieces and remove any redundancies, if at all. It obviously needs to 
be early if we are injecting 7x more instructions, but seems like a 
lot to unravel.


Yup.  If that loop gets unrolled, it's going to be a mess.  It will 
almost certainly make this problem worse as each iteration is going 
to have a pair of constants loaded and no good way to remove them.


That's the original problem that I started this thread with. I'd 
snipped the disassembly as it would have been too much text, but 
basically on RV, the Coremark crc8 loop with a constant 8 iterations 
gets unrolled, including 8 extraneous insn pairs to load the same 
constant, which is preposterous. Other arches side-step this by using 
if-conversion / cond moves, the latter currently WIP in RV 
International. x86 w/o if-convert seems OK since the const can be 
encoded in the xor insn.


OTOH, given that the gimple/tree pass cunroll is doing the culprit loop 
unrolling and introducing the redundant const 8 times, can it be 
addressed there somehow?
tree_estimate_loop_size() seems to identify constant expressions, not 
just operands. Can it be taught to identify a "non-trivial const" 
and hoist/code-move the expression? Sorry, just rambling here, most 
likely nonsense.


Oh, cunroll.  There might be a distinct flag for complete unrolling.


I really expect something like Click's work is the way forward. 
Essentially when you VN the function you'll identify those constants and 
collapse them all down to a single instance.  Then the GCM phase will 
kick in and find a place to put the evaluation so that you have one and 
only one.
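
A rough sketch of that idea on the five-block CFG discussed in this thread (edges 2->3, 2->4, 3->4, 4->5, 4->6, 5->6) follows. It is not Click's algorithm in full: value numbering is reduced to grouping identical constants, and "code motion" is reduced to placing the single evaluation at the common dominator of all uses. The function names are made up for illustration.

```python
# CFG from the thread: 2->3, 2->4, 3->4, 4->5, 4->6, 5->6.
SUCCS = {2: [3, 4], 3: [4], 4: [5, 6], 5: [6], 6: []}
ENTRY = 2


def immediate_dominators(succs, entry):
    """Iterative dominator sets, then the closest strict dominator per block.

    Assumes every block is reachable from `entry`.
    """
    blocks = list(succs)
    preds = {b: [p for p in blocks if b in succs[p]] for b in blocks}
    dom = {b: set(blocks) for b in blocks}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for b in blocks:
            if b == entry:
                continue
            new = {b} | set.intersection(*(dom[p] for p in preds[b]))
            if new != dom[b]:
                dom[b], changed = new, True
    idom = {}
    for b in blocks:
        strict = dom[b] - {b}
        # The deepest strict dominator has the largest dominator set.
        idom[b] = max(strict, key=lambda d: len(dom[d])) if strict else None
    return idom


def common_dominator(idom, a, b):
    """Lowest common ancestor of a and b in the dominator tree."""
    ancestors = set()
    while a is not None:
        ancestors.add(a)
        a = idom[a]
    while b not in ancestors:
        b = idom[b]
    return b


def place_constants(uses):
    """uses: list of (block, constant). Returns {constant: block of its one evaluation}."""
    idom = immediate_dominators(SUCCS, ENTRY)
    by_value = {}  # "value numbering": identical constants share one entry
    for block, const in uses:
        by_value.setdefault(const, []).append(block)
    placement = {}
    for const, blocks in by_value.items():
        where = blocks[0]
        for b in blocks[1:]:
            where = common_dominator(idom, where, b)
        placement[const] = where
    return placement


print(place_constants([(3, 0xA001), (5, 0xA001)]))  # {40961: 2}
```

The single evaluation lands in BB2, which also puts it on the path 2->4->6 that previously had none. That is the trade-off PRE refuses to make and this style of placement accepts.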


Some of Bodik's work might catch it as well, though implementing his 
ideas is likely a lot more work.



Jeff


Re: Redundant constants in coremark crc8 for RISCV/aarch64 (no-if-conversion)

2022-10-18 Thread Vineet Gupta



On 10/18/22 16:36, Jeff Law wrote:
There isn't a great place in GCC to handle this right now.  If the 
constraints were relaxed in PRE, then we'd have a chance, but 
getting the cost model right is going to be tough.


It would have been better (for this specific case) if loop unrolling 
was not being done so early. The tree pass cunroll is flattening it 
out and leaving it to the rest of the tree/RTL passes to pick up the 
pieces and remove any redundancies, if at all. It obviously needs to 
be early if we are injecting 7x more instructions, but seems like a 
lot to unravel.


Yup.  If that loop gets unrolled, it's going to be a mess.  It will 
almost certainly make this problem worse as each iteration is going to 
have a pair of constants loaded and no good way to remove them.


That's the original problem that I started this thread with. I'd snipped 
the disassembly as it would have been too much text, but basically on RV, 
the Coremark crc8 loop with a constant 8 iterations gets unrolled, 
including 8 extraneous insn pairs to load the same constant, which is 
preposterous. Other arches side-step this by using if-conversion / cond 
moves, the latter currently WIP in RV International. x86 w/o if-convert 
seems OK since the const can be encoded in the xor insn.


OTOH, given that the gimple/tree pass cunroll is doing the culprit loop 
unrolling and introducing the redundant const 8 times, can it be 
addressed there somehow?
tree_estimate_loop_size() seems to identify constant expressions, not 
just operands. Can it be taught to identify a "non-trivial const" and 
hoist/code-move the expression? Sorry, just rambling here, most likely 
nonsense.






FWIW -fno-unroll-loops only seems to work at -O2. At -O3 it always 
unrolls. Is that expected ?


The only case I'm immediately aware of where this wouldn't work would 
be if -O3 came after -fno-unroll-loops.


Weird that gcc-12, gcc-11, gcc-10 all seem to be silently ignoring 
-fno-unroll-loops despite it following -O3. Perhaps a different toggle is 
needed to suppress the issue.


Thx,
-Vineet


Re: Redundant constants in coremark crc8 for RISCV/aarch64 (no-if-conversion)

2022-10-18 Thread Jeff Law via Gcc



On 10/18/22 15:51, Vineet Gupta wrote:




Where BB4 corresponds to .L2 and BB6 corresponds to .L3. Evaluation 
of the constants occurs in BB3 and BB5.


And evaluation here means use of the constant (vs. definition)?


In this case, use of the constant.




PRE/GCSE is better suited for this scenario, but it has a critical 
constraint.  In particular our PRE formulation is never allowed to 
put an evaluation of an expression on a path that didn't have one 
before. So while there is clearly a redundancy on the path 2->3->4->5 
(BB3 and BB5), there is nowhere we could put an evaluation that would 
reduce the number of evaluations on that path without introducing an 
evaluation on paths that didn't have one.  So consider 2->4->6.  On 
that path there are zero evaluations.  So we can't place an eval in 
BB2 because that will cause evaluations on 2->4->6 which didn't have 
any evaluations.


OK. How does PRE calculate all possible paths to consider, say your 
examples 2-3-4-5 and 2-4-6? Are those just indicative, or would they 
actually be the ones PRE calculates for this case? Would there be more?


PRE has a series of dataflow equations it solves which gives it the 
answer to that question.  The one that computes this property is usually 
called anticipated.  Given some block B in a graph G, an expression is 
anticipated at B when the expression is guaranteed to be computed if we 
reach B.  That doesn't mean the evaluation must happen in B, just that 
evaluation at some point is guaranteed if we reach B.


If an expression is not anticipated in a block, then PRE will not insert 
in that block since doing so would add evaluations on paths where they 
did not previously have any.
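
As a minimal sketch of that property, the snippet below solves the backward "anticipated" equation for a single expression on the five-block CFG from this thread (edges 2->3, 2->4, 3->4, 4->5, 4->6, 5->6, with evaluations in BB3 and BB5). It assumes nothing kills the expression, which is reasonable for a constant, and it is not GCC's implementation; the names are illustrative only.

```python
# CFG from the thread; the expression (the constant load) is evaluated in BB3 and BB5.
SUCCS = {2: [3, 4], 3: [4], 4: [5, 6], 5: [6], 6: []}
EVALUATES = {3, 5}


def anticipated_in(succs, evaluates):
    """Return {block: True if the expression is anticipated on entry to block}.

    ANT_IN[b]  = b evaluates the expression, or ANT_OUT[b]
    ANT_OUT[b] = all successors have ANT_IN (False for an exit block)
    Solved as a greatest fixed point, so start from the optimistic answer.
    """
    ant = {b: True for b in succs}
    changed = True
    while changed:
        changed = False
        for b in succs:
            out = bool(succs[b]) and all(ant[s] for s in succs[b])
            new = (b in evaluates) or out
            if new != ant[b]:
                ant[b], changed = new, True
    return ant


for block, flag in anticipated_in(SUCCS, EVALUATES).items():
    print(f"BB{block}: anticipated={flag}")
```

Only BB3 and BB5 come out as anticipated. BB2 and BB4 do not, because the paths through 2->4->6 and 4->6 never evaluate the expression, so an insertion in either block is ruled out.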





There isn't a great place in GCC to handle this right now.  If the 
constraints were relaxed in PRE, then we'd have a chance, but getting 
the cost model right is going to be tough.


It would have been better (for this specific case) if loop unrolling 
was not being done so early. The tree pass cunroll is flattening it 
out and leaving it to the rest of the tree/RTL passes to pick up the 
pieces and remove any redundancies, if at all. It obviously needs to 
be early if we are injecting 7x more instructions, but seems like a 
lot to unravel.


Yup.  If that loop gets unrolled, it's going to be a mess.  It will 
almost certainly make this problem worse as each iteration is going to 
have a pair of constants loaded and no good way to remove them.





FWIW -fno-unroll-loops only seems to work at -O2. At -O3 it always 
unrolls. Is that expected ?


The only case I'm immediately aware of where this wouldn't work would be 
if -O3 came after -fno-unroll-loops.





If this seems worthwhile and you have ideas to do this any better, I'd 
be happy to work on this with some guidance.


I don't see a great solution here.  Something like Cliff Click's work 
might help, but it's far from a guarantee.  Click's work essentially 
throws away the PRE constraint about never inserting an expression 
evaluation on a path where it didn't exist, along with all kinds of other 
things.  Essentially it's a total reformulation of redundancy elimination.



I did an implementation eons ago in gimple, but never was able to 
convince myself the implementation was correct or that integrating it 
was a good thing.   It's almost certainly going to cause performance 
regressions elsewhere so it may end up doing more harm than good.  I 
don't really know.



https://courses.cs.washington.edu/courses/cse501/06wi/reading/click-pldi95.pdf


Jeff





Re: Redundant constants in coremark crc8 for RISCV/aarch64 (no-if-conversion)

2022-10-18 Thread Vineet Gupta

Hi Jeff,

On 10/14/22 09:54, Jeff Law via Gcc wrote:

...

.L2:
xor    a4,a4,a5
andi    a4,a4,1
srli    a3,a0,2
srli    a5,a5,1
beq    a4,zero,.L3

li    a4,-24576    # 0x_A000
addi    a4,a4,1    # 0x_A001
xor    a5,a5,a4
zext.h    a5,a5

.L3:
xor    a3,a3,a5
andi    a3,a3,1
srli    a4,a0,3
srli    a5,a5,1
beq    a3,zero,.L4

li    a3,-24576    # 0x_A000
addi    a3,a3,1    # 0x_A001
...
...

I see that with small tests cse1 is able to substitute redundant 
constant reg with equivalent old reg.
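
For reference, here is a small model of what each unrolled step in the listing above computes. The thread does not quote the C source, so this is reconstructed from the disassembly only (xor/andi to form the test bit, srli to shift, then a conditional xor with the constant); the function name and the 16-bit masking are assumptions.

```python
# Constant materialized by each li/addi pair: -24576 is 0xA000 in 16 bits, plus 1.
POLY = (-24576 + 1) & 0xFFFF


def crcu8(data, crc):
    """Model of the 8-iteration loop that cunroll flattens into 8 copies."""
    for _ in range(8):
        bit = (data ^ crc) & 1  # xor + andi
        data >>= 1              # srli on the data register
        crc >>= 1               # srli on the crc register
        if bit:                 # beq ... skips the next line when bit == 0
            crc ^= POLY         # needs the constant in a register on RISC-V
    return crc & 0xFFFF         # zext.h


print(hex(POLY))          # 0xa001
print(hex(crcu8(1, 0)))   # 0xc0c1
```

0xA001 does not fit a 12-bit immediate, which is why every unrolled copy of the conditional xor carries its own two-instruction load of the same value.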


I find it easier to reason about this stuff with a graphical CFG, so a 
bit of ascii art...



      2
     / \
    3-->4
       / \
      5-->6



Yeah, a picture is worth a thousand words :-)


Where BB4 corresponds to .L2 and BB6 corresponds to .L3. Evaluation of 
the constants occurs in BB3 and BB5.


And evaluation here means use of the constant (vs. definition)?

CSE isn't going to catch this.  The best way to think about CSE's 
capabilities is that it can work on extended basic blocks. An 
extended basic block can have jumps out, but not jumps in.  There are 3 
EBBs in this code: (2,3), (4,5) and 6.  So BB4 is in a different EBB 
than BB3.  So the evaluation in BB3 can't be used by CSE in the EBB 
containing BB4, BB5.
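
The "jumps out, but not jumps in" rule can be sketched as follows, using the block numbers from the diagram (edges 2->3, 2->4, 3->4, 4->5, 4->6, 5->6). This is a simplified partition for illustration, not the code in GCC's cse pass: a block starts a new EBB if it is the entry or has more than one predecessor, and otherwise joins the EBB of its only predecessor.

```python
SUCCS = {2: [3, 4], 3: [4], 4: [5, 6], 5: [6], 6: []}
ENTRY = 2


def extended_basic_blocks(succs, entry):
    """Group blocks into EBBs. Assumes every block is reachable from `entry`."""
    preds = {b: [p for p in succs if b in succs[p]] for b in succs}
    leader_of = {}

    def leader(b):
        if b not in leader_of:
            # A block with exactly one predecessor has no other jump into it,
            # so it extends that predecessor's EBB.
            single_pred = len(preds[b]) == 1 and b != entry
            leader_of[b] = leader(preds[b][0]) if single_pred else b
        return leader_of[b]

    groups = {}
    for b in succs:
        groups.setdefault(leader(b), []).append(b)
    return list(groups.values())


print(extended_basic_blocks(SUCCS, ENTRY))  # [[2, 3], [4, 5], [6]]
```

BB4 has two predecessors (BB2 and BB3), so it starts its own EBB and nothing computed in BB3 is visible to CSE there.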


Thanks for the detailed explanation.

PRE/GCSE is better suited for this scenario, but it has a critical 
constraint.  In particular our PRE formulation is never allowed to put 
an evaluation of an expression on a path that didn't have one before. So 
while there is clearly a redundancy on the path 2->3->4->5 (BB3 and BB5), 
there is nowhere we could put an evaluation that would reduce the number 
of evaluations on that path without introducing an evaluation on paths 
that didn't have one.  So consider 2->4->6.  On that path there are zero 
evaluations.  So we can't place an eval in BB2 because that will cause 
evaluations on 2->4->6 which didn't have any evaluations.
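
To make the constraint concrete, the sketch below enumerates every path from BB2 to BB6 and counts evaluations per path, before and after a hypothetical hoist of the constant into BB2. Real PRE works from dataflow equations rather than path enumeration; this brute-force check is only for illustration and its helper names are made up.

```python
SUCCS = {2: [3, 4], 3: [4], 4: [5, 6], 5: [6], 6: []}


def paths(succs, start, end):
    """All acyclic-CFG paths from start to end, as lists of block numbers."""
    if start == end:
        return [[end]]
    return [[start] + rest for s in succs[start] for rest in paths(succs, s, end)]


def evaluations_per_path(succs, evaluates, start=2, end=6):
    """Map each path (as a tuple) to the number of evaluations along it."""
    return {tuple(p): sum(b in evaluates for b in p) for p in paths(succs, start, end)}


def pre_allows(before, after):
    """PRE's rule: no path may end up with more evaluations than it had."""
    return all(after[p] <= before[p] for p in before)


before = evaluations_per_path(SUCCS, {3, 5})  # as generated: BB3 and BB5
after = evaluations_per_path(SUCCS, {2})      # hypothetical: hoisted into BB2
for p in before:
    print(p, before[p], "->", after[p])
print(pre_allows(before, after))  # False
```

The path (2, 3, 4, 5, 6) improves from 2 evaluations to 1, but (2, 4, 6) goes from 0 to 1, so the hoist is rejected.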


OK. How does PRE calculate all possible paths to consider, say your 
examples 2-3-4-5 and 2-4-6? Are those just indicative, or would they 
actually be the ones PRE calculates for this case? Would there be more?


There isn't a great place in GCC to handle this right now.  If the 
constraints were relaxed in PRE, then we'd have a chance, but getting 
the cost model right is going to be tough.


It would have been better (for this specific case) if loop unrolling was 
not being done so early. The tree pass cunroll is flattening it out and 
leaving it to the rest of the tree/RTL passes to pick up the pieces and 
remove any redundancies, if at all. It obviously needs to be early if we 
are injecting 7x more instructions, but seems like a lot to unravel.


FWIW -fno-unroll-loops only seems to work at -O2. At -O3 it always 
unrolls. Is that expected ?


If this seems worthwhile and you have ideas to do this any better, I'd 
be happy to work on this with some guidance.


Thx,
-Vineet


Re: Toolchain Infrastructure project statement of support

2022-10-18 Thread Paul Smith
On Tue, 2022-10-18 at 14:14 -0400, Siddhesh Poyarekar wrote:
> On 2022-10-18 14:13, Siddhesh Poyarekar wrote:
> > only Job Corbet's questions to Carlos/David are pending an answer;
> 
> s/Job/Jon/ sorry about misspelling your name.

I thought it was great!  We all have known for years that Jon has the
requisite patience for that role...


Re: Toolchain Infrastructure project statement of support

2022-10-18 Thread Siddhesh Poyarekar

On 2022-10-18 14:13, Siddhesh Poyarekar wrote:
only Job Corbet's questions to Carlos/David are pending an answer; I 


s/Job/Jon/ sorry about misspelling your name.

Sid


Re: Toolchain Infrastructure project statement of support

2022-10-18 Thread Siddhesh Poyarekar

On 2022-10-18 12:42, Christopher Faylor wrote:

On Tue, Oct 18, 2022 at 11:17:15AM -0400, Siddhesh Poyarekar wrote:

That is not true, Mark.  Your objections and questions have been answered at
every stage, privately as well as publicly.


Actually, going back through this thread, I see outstanding
questions/issues raised by Mark, Frank, Alexandre Oliva, Jon Corbet, and
Andrew Pinski.


As far as actual questions regarding the proposal are concerned, I think 
only Job Corbet's questions to Carlos/David are pending an answer; I 
deferred them to Carlos or David because I think they're better placed 
to answer them in their entirety.  The rest, AFAICT, are either fear of 
some kind of corporate takeover, discussions about current sourceware 
infrastructure, or just rhetoric, none of which I'm interested in 
engaging with.


The corporate takeover fear especially is amusing to me given how much 
of GNU toolchain development and infrastructure is sponsored by 
corporations right now but that's just my personal opinion.


Maybe the FSF hosted call next week[1] would be a suitable forum to 
discuss fears of corporate control of the GNU toolchain project 
infrastructure due to the LF IT migration, or for that matter any other 
questions you or others think may have gone unanswered.  Since 
sourceware is neither a GNU nor an FSF project, it probably does not 
make sense to discuss current sourceware infrastructure there.


Sid

[1] https://sourceware.org/pipermail/overseers/2022q4/018997.html


Re: Toolchain Infrastructure project statement of support

2022-10-18 Thread Christopher Faylor via Gcc
On Tue, Oct 18, 2022 at 11:17:15AM -0400, Siddhesh Poyarekar wrote:
>That is not true, Mark.  Your objections and questions have been answered at
>every stage, privately as well as publicly.

Actually, going back through this thread, I see outstanding
questions/issues raised by Mark, Frank, Alexandre Oliva, Jon Corbet, and
Andrew Pinski.



Re: Toolchain Infrastructure project statement of support

2022-10-18 Thread Siddhesh Poyarekar

On 2022-10-18 05:50, Mark Wielaard wrote:

Hi Siddhesh,

On Mon, Oct 17, 2022 at 12:11:53PM -0400, Siddhesh Poyarekar wrote:

There seems to be little to discuss from the GNU toolchain perspective IMO;


Yes, it is clear you don't want any discussion or to answer any questions
about the proposals, 


That is not true, Mark.  Your objections and questions have been 
answered at every stage, privately as well as publicly.  What *is* clear 
is that we have been talking past each other because despite our common 
high level intentions, we appear to have little common ground for our 
goals.  You want to retain the current sourceware infrastructure and try 
and see what we can do within that framework and I want us to migrate 
services to infrastructure with better funding (that's not just limited 
to services), dedicated ops management and an actually scalable future.



how funds can be used,


Let me turn that around: how *would* you like funds to be used beyond 
what is currently proposed in the LF IT proposal?


what the budget is, 


Around $400,000.


what
the requirements are, 


Your lack of clarity about requirements IMO has more to do with you 
wanting to fulfill those requirements within sourceware and not with 
their existence.  I and others have repeated them here and the overseers 
have either questioned their validity or noted them in bugzilla as 
possible things to explore in the current sourceware context.  With 
sourceware migration to LF IT off the table, there's little incentive 
for me personally to explore them.


how the governance structure works, 


I think you know how it works, maybe you meant to say that you don't 
like it?


The governance structure and its workings have been described in the 
GTI introduction.  There are two key bodies that govern the project: the 
Technical Advisory Council (comprised of project community members) 
manages the technical details of the infrastructure and the governing 
board (comprised of representatives from funding companies) manages the 
funding for those technical details.


The current TAC consists of people from the initial community 
stakeholders who were contacted and subsequently accepted the invitation 
to be part of TAC.  You, along with other overseers, were invited too 
but most of you declined.



what
alternatives we have, etc.

For projects the alternatives they have are:

1. Migrate to LF IT infrastructure
2. Have a presence on sourceware as well as LF IT, contingent on Red 
   Hat's decision on the hardware infrastructure
3. Stay fully on sourceware

For sourceware as infrastructure the alternatives are:

1. Migrate to LF IT infrastructure
2. Stay as it currently is

For sourceware overseers, the choices are contingent on what projects 
decide and what Red Hat decides w.r.t. sourceware.


All of the above has been clear all along.  Maybe the problem here is 
that you're not happy with the alternatives?



But personally I think it is healthy to have real discussions, doing
resource analysis, creating public roadmaps, collecting infrastructure
enhancement requests, discussing how to organize, arguing about the needed
budget and how to use funding most efficiently, etc. To make sure that
sourceware keeps being a healthy and viable free software
infrastructure project for the next 24 years, hopefully including the
various GNU toolchain projects.


You want to talk about sourceware without including the LF IT proposal 
whereas I'd love to talk about sourceware as an LF IT maintained 
infrastructure.  There's a real disconnect that precludes any real 
discussions.


Sid


Re: [PATCH RESEND 0/1] RFC: P1689R5 support

2022-10-18 Thread Ben Boeckel via Gcc
On Thu, Oct 13, 2022 at 13:08:46 -0400, David Malcolm wrote:
> On Mon, 2022-10-10 at 16:21 -0400, Jason Merrill wrote:
> > David Malcolm would probably know best about JSON wrangling.
> 
> Unfortunately our JSON output doesn't make any guarantees about the
> ordering of keys within an object, so the precise textual output
> changes from run to run.  I've coped with that in my test cases by
> limiting myself to simple regexes of fragments of the JSON output.
> 
> Martin Liska [CCed] went much further in
> 4e275dccfc2467b3fe39012a3dd2a80bac257dd0 by adding a run-gcov-pytest
> DejaGnu directive, allowing for test cases for gcov to be written in
> Python, which can thus test much more interesting assertions about the
> generated JSON.

Ok, if Python is acceptable, I'll use its stdlib to do "fancy" things.
Part of this is because I want to assert that unnecessary fields don't
exist and that sounds…unlikely to be possible in any maintainable way
(assuming it is possible) with regexen. `jq` could help immensely, but
that is probably a bridge too far :) .
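
A sketch of what such a stdlib-only check could look like is below. It parses the output with `json` so key ordering no longer matters, then reports any key outside an allowed set. The key names and the sample documents are placeholders chosen for illustration, not the actual P1689R5 schema or GCC's output, and this is not the run-gcov-pytest harness itself.

```python
import json

# Hypothetical allow-list; a real test would take this from the format's spec.
ALLOWED_RULE_KEYS = {"primary-output", "provides", "requires"}


def unexpected_keys(text):
    """Return (rule index, key) pairs for keys that should not be present."""
    doc = json.loads(text)
    extras = []
    for i, rule in enumerate(doc.get("rules", [])):
        for key in sorted(set(rule) - ALLOWED_RULE_KEYS):
            extras.append((i, key))
    return extras


# Same content, different key order: both pass, unlike a textual regex match.
a = '{"rules": [{"primary-output": "a.o", "requires": []}]}'
b = '{"rules": [{"requires": [], "primary-output": "a.o"}]}'
c = '{"rules": [{"primary-output": "a.o", "debug-info": 1}]}'

print(unexpected_keys(a))  # []
print(unexpected_keys(b))  # []
print(unexpected_keys(c))  # [(0, 'debug-info')]
```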

Thanks,

--Ben


Re: [PATCH RESEND 1/1] p1689r5: initial support

2022-10-18 Thread Ben Boeckel via Gcc
On Tue, Oct 11, 2022 at 07:42:43 -0400, Ben Boeckel wrote:
> On Mon, Oct 10, 2022 at 17:04:09 -0400, Jason Merrill wrote:
> > Can we share utf8 parsing code with decode_utf8_char in pretty-print.cc?
> 
> I can look at factoring that out. I'll have to decode its logic to see
> how much overlap there is.

There is some mismatch. First, that is in `gcc` and this is in `libcpp`.
Second, `pretty-print.cc`'s implementation:

- fails on an empty string;
- accepts extended-length (5+-byte) encodings which are invalid Unicode;
  and
- decodes codepoint-by-codepoint instead of just validating the entire
  string.
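
For comparison, the behaviour wanted here (accept the empty string, reject 5+-byte forms, validate the whole string in one pass) can be sketched as below. This is an illustration written for this note, not the code from the patch and not `decode_utf8_char`; it follows the well-formed byte ranges from the Unicode Standard.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Validate an entire byte string as UTF-8 without decoding code points."""
    i, n = 0, len(data)
    while i < n:
        lead = data[i]
        if lead < 0x80:  # ASCII
            i += 1
            continue
        if 0xC2 <= lead <= 0xDF:
            need, lo, hi = 1, 0x80, 0xBF
        elif 0xE0 <= lead <= 0xEF:
            need = 2
            lo = 0xA0 if lead == 0xE0 else 0x80  # reject overlong 3-byte forms
            hi = 0x9F if lead == 0xED else 0xBF  # reject UTF-16 surrogates
        elif 0xF0 <= lead <= 0xF4:
            need = 3
            lo = 0x90 if lead == 0xF0 else 0x80  # reject overlong 4-byte forms
            hi = 0x8F if lead == 0xF4 else 0xBF  # reject values above U+10FFFF
        else:
            # Stray continuation byte, 0xC0/0xC1, or a lead byte for an
            # out-of-range or 5+-byte sequence (0xF5 and up).
            return False
        if i + need >= n:  # sequence is truncated
            return False
        if not lo <= data[i + 1] <= hi:
            return False
        if any(not 0x80 <= c <= 0xBF for c in data[i + 2 : i + 1 + need]):
            return False
        i += need + 1
    return True


print(is_valid_utf8(b""))                      # True
print(is_valid_utf8("€".encode("utf-8")))      # True
print(is_valid_utf8(b"\xf8\x88\x80\x80\x80"))  # False
```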

--Ben


I'm legal forever

2022-10-18 Thread Sitthiphong Singphon via Gcc
All of our lists  have public archives.

Copyright (C) Free Software Foundation, Inc.  Verbatim
copying and distribution of this entire article is permitted in any medium,
provided this notice is preserved.

Thank you sir.

 Hybr


GNU Tools Cauldron 2022 - video recordings

2022-10-18 Thread Martin Liška
Hello,

Video recordings of the GNU Tools Cauldron 2022 are available now
at YouTube:

https://www.youtube.com/playlist?list=PL_GiHdX17WtzDK0OVhD_u_a3ti4shpXLl

Cheers,
Martin


Re: Toolchain Infrastructure project statement of support

2022-10-18 Thread Mark Wielaard
Hi Siddhesh,

On Mon, Oct 17, 2022 at 12:11:53PM -0400, Siddhesh Poyarekar wrote:
> There seems to be little to discuss from the GNU toolchain perspective IMO;

Yes, it is clear you don't want any discussion or to answer any questions
about the proposals, how funds can be used, what the budget is, what
the requirements are, how the governance structure works, what
alternatives we have, etc.

But personally I think it is healthy to have real discussions, doing
resource analysis, creating public roadmaps, collecting infrastructure
enhancement requests, discussing how to organize, arguing about the needed
budget and how to use funding most efficiently, etc. To make sure that
sourceware keeps being a healthy and viable free software
infrastructure project for the next 24 years, hopefully including the
various GNU toolchain projects.

Cheers,

Mark