Re: Stepping up as maintainer for ia64

2024-03-08 Thread Peter Bergner via Gcc
On 3/8/24 5:28 PM, Jonathan Wakely wrote:
> On Fri, 8 Mar 2024 at 22:35, Frank Scheiner via Gcc  wrote:
>>
>> On 08.03.24 23:00, Peter Bergner wrote:
>>> On 3/8/24 7:16 AM, Richard Biener via Gcc wrote:
>>>> I CCed Jeff who is on the committee to forward the maintainer proposal
>>>> though I guess this will not go forward as a first step.  Instead
>>>> you are probably expected to show activity on the port, for example
>>>> post the patch series to make ia64 use LRA, get write access to the
>>>> git repository and then be promoted maintainer.
>>>
>>> One other method for showing activity is posting regular testsuite
>>> results on the gcc-testresults mailing list to show the community
>>> the port is "working".
>>
>> I don't want to spam this or the other list each and every week, but I
> 
> Sending test results to the gcc-testresults list is **not** spamming,
> that's what the list is for!

100% agree!  If you look at what we (IBM) post, we roughly post somewhere
around 7 testsuite results per day due to runs on different hardware,
endianness and OS (Linux versus AIX).  So spam ...err... post away!
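
For anyone who hasn't posted results before, here is a minimal sketch of
one way to do it.  It assumes you run it from the build directory after
"make -k check" and that your GCC source tree still carries the
contrib/test_summary script; the function name and the way the source
path is passed in are just illustration, not anything official.

```shell
#!/bin/sh
# Sketch: mail one testsuite run to the gcc-testresults list.
# Assumes the current directory is the GCC build directory and that
# "make -k check" has already finished.

post_results() {
    srcdir=$1                               # path to your GCC source tree
    summary="$srcdir/contrib/test_summary"
    if [ ! -f "$summary" ]; then
        echo "cannot find $summary" >&2
        return 1
    fi
    # test_summary only prints a shell script on stdout; piping that
    # script to sh is what actually sends the mail.
    sh "$summary" -m gcc-testresults@gcc.gnu.org | sh
}
```

Calling "post_results ../gcc" from the build directory would then send
one report.  Dropping the final "| sh" lets you read the generated
script first, which is probably a good idea the first time.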



> If you're testing uncommon targets (e.g. ia64-linux) then sending test
> results to the list is essential so we know the target builds, because
> nobody else is testing it.

Again, 100% agree!

Peter



Re: [PATCH] fix PowerPC < 7 w/ Altivec not to default to power7

2024-03-08 Thread Peter Bergner via Gcc
On 3/8/24 5:30 AM, Jonathan Wakely via Gcc wrote:
> Patches should be sent to the gcc-patches list instead of this one,
> and should be against trunk not an old gcc-11 RC. See
> https://gcc.gnu.org/contribute.html#patches for more details - thanks!

And you need to CC the rs6000/powerpc port maintainers which you can find
along with their preferred email addresses in the MAINTAINERS file.  If you
don't CC them, they may miss seeing the patch.

Peter




Re: Stepping up as maintainer for ia64

2024-03-08 Thread Peter Bergner via Gcc
On 3/8/24 7:16 AM, Richard Biener via Gcc wrote:
> I CCed Jeff who is on the committee to forward the maintainer proposal
> though I guess this will not go forward as a first step.  Instead
> you are probably expected to show activity on the port, for example
> post the patch series to make ia64 use LRA, get write access to the
> git repository and then be promoted maintainer.

One other method for showing activity is posting regular testsuite
results on the gcc-testresults mailing list to show the community
the port is "working".

Peter



Re: [RFC Linux patch] powerpc: add documentation for HWCAPs

2022-05-20 Thread Peter Bergner via Gcc
On 5/20/22 12:15 AM, Nicholas Piggin via Gcc wrote:
> +PPC_FEATURE_HAS_ALTIVEC
> +Vector (aka Altivec, VSX) facility is available.

Slight typo.  s/VSX/VMX/


Peter



Re: [power-ieee128] What should the math functions be annotated with?

2021-12-04 Thread Peter Bergner via Gcc
On 12/4/21 11:40 AM, Thomas Koenig wrote:
> OK, what I have now is
> 
> tkoenig@gcc-fortran:~$ echo $PATH
> /home/tkoenig/bin:/opt/at15.0/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin
> tkoenig@gcc-fortran:~$ echo $LD_LIBRARY_PATH
> /home/tkoenig/lib64
> 
> I generally use LD_LIBRARY_PATH to point to where the shared
> libgfortran and other libraries is installed.
> 
> However, this breaks man (and I don't know what else):

LD_LIBRARY_PATH is searched before the directories in ld.so.cache,
so you end up picking up some "new" libs from /home/tkoenig/lib64,
and some of these rely on the newer libs in AT15.  However, man and
some of the other system binaries use the system dynamic linker.  They
search LD_LIBRARY_PATH first and, not finding something there, fall
back to /etc/ld.so.cache, which doesn't have the newer AT15 libs, so
you hit errors.

Instead of setting LD_LIBRARY_PATH=/home/tkoenig/lib64, could you try
setting LD_LIBRARY_PATH='$ORIGIN/lib64'?  That way the other system
binaries would not find your /home/tkoenig/lib64 directory, so they'd
behave normally.  However, any binary that was compiled in a directory
where your lib64/ exists would find your new libs and use them.  I'm
not sure whether limiting yourself to compiling your tests in that one
directory cramps your testing.

If that doesn't work, could you leave LD_LIBRARY_PATH unset and
instead compile using -L/home/tkoenig/lib64 -R/home/tkoenig/lib64 ?
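
A rough sketch of that second option, assuming the GNU linker, where the
run path is usually spelled -Wl,-rpath,DIR when going through the
compiler driver.  The helper name and the test file name are made up for
illustration.

```shell
#!/bin/sh
# Sketch: build link flags that record the library directory in the
# binary itself, so LD_LIBRARY_PATH need not be exported for the whole
# session (which is what confused man and friends).

rpath_flags() {
    # $1 = directory holding the freshly built shared libraries
    printf '%s\n' "-L$1 -Wl,-rpath,$1"
}

# Example (hypothetical source file name):
#   gfortran test.f90 $(rpath_flags /home/tkoenig/lib64) -o test
```

Only binaries linked this way look in that directory, so the system
tools keep using whatever the system loader finds for them.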

Peter





Re: [power-ieee128] What should the math functions be annotated with?

2021-12-04 Thread Peter Bergner via Gcc
On 12/4/21 10:19 AM, Jakub Jelinek wrote:
> But when Thomas is working on the vanilla gcc tree, trying to make it work
> for Fortran, I think he'll need to patch that gcc tree too to use the
> AT15's dynamic linker and rpath like the AT15 gcc is.

That is part of the magic that happens when you configure with
--with-advance-toolchain=at15.0: it forces gcc to use AT15's
dynamic linker, and AT15's ld.so.cache makes sure that the
dynamic linker finds AT15's libs etc.

Peter




Re: [power-ieee128] What should the math functions be annotated with?

2021-12-04 Thread Peter Bergner via Gcc
On 12/4/21 9:37 AM, Peter Bergner wrote:
> On 12/4/21 9:25 AM, Michael Meissner wrote:
> ubuntu@gcc-fortran:/home/tkoenig/Tst$ ldd ./a.out 
> ./a.out: /lib/powerpc64le-linux-gnu/libc.so.6: version `GLIBC_2.34' not found 
> (required by ./a.out)
>   linux-vdso64.so.1 (0x7f633962)
>   libc.so.6 => /lib/powerpc64le-linux-gnu/libc.so.6 (0x7f63393d)
>   /opt/at15.0/lib64/ld64.so.2 => /lib64/ld64.so.2 (0x7f633964)

To go into a little more depth, the important thing is that your a.out
was linked with the correct loader:

ubuntu@gcc-fortran:/home/tkoenig/Tst$ readelf -l a.out | grep interpreter
  [Requesting program interpreter: /opt/at15.0/lib64/ld64.so.2]


...and the error message you saw was a good thing: it showed your a.out was
expecting to see the newer GLIBC 2.34 and didn't.  The reason it didn't
is that the system ldd, which you used, does some magic and overrides the
a.out runtime loader with the system loader, and that loader uses its
own ld.so.cache, which doesn't include AT15's library paths.  The AT15
loader has its own /opt/at15.0/etc/ld.so.cache, which includes its lib dirs
as well as the system lib dirs.  This way, the AT15 libs are found first and
any library AT15 doesn't provide is automatically picked up from the system.
As long as you keep the AT15 bin path before the system bin dirs, you should
be fine.
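
If you want to script the loader check above, something like the
following might do.  It is only a sketch: the function names are mine,
and it assumes readelf prints the "Requesting program interpreter" line
in the form shown above.

```shell
#!/bin/sh
# Sketch: report which dynamic loader a binary asks for.

parse_interp() {
    # stdin: output of "readelf -l"; prints just the interpreter path
    sed -n 's/.*\[Requesting program interpreter: \(.*\)]$/\1/p'
}

elf_interp() {
    # $1 = path to the executable to inspect
    readelf -l "$1" | parse_interp
}

uses_at15_loader() {
    # succeeds if the binary requests a loader under /opt/at15.0
    case $(elf_interp "$1") in
        /opt/at15.0/*) return 0 ;;
        *)             return 1 ;;
    esac
}
```

For example, "uses_at15_loader ./a.out && echo ok" would confirm the
link step picked up the AT15 loader without going through ldd at all.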

Peter





Re: [power-ieee128] What should the math functions be annotated with?

2021-12-04 Thread Peter Bergner via Gcc
On 12/4/21 9:25 AM, Michael Meissner wrote:
> On Sat, Dec 04, 2021 at 02:42:13PM +0100, Thomas Koenig wrote:
> Note, the system ldd does not tend to accurately report the library
> dependencies for AT libraries:

And AT15's ldd shows that your a.out is linked to the correct libc:

ubuntu@gcc-fortran:/home/tkoenig/Tst$ ldd ./a.out 
./a.out: /lib/powerpc64le-linux-gnu/libc.so.6: version `GLIBC_2.34' not found 
(required by ./a.out)
linux-vdso64.so.1 (0x7f633962)
libc.so.6 => /lib/powerpc64le-linux-gnu/libc.so.6 (0x7f63393d)
/opt/at15.0/lib64/ld64.so.2 => /lib64/ld64.so.2 (0x7f633964)
ubuntu@gcc-fortran:/home/tkoenig/Tst$ /opt/at15.0/bin/ldd ./a.out 
linux-vdso64.so.1 (0x7158fb1c)
libc.so.6 => /opt/at15.0/lib64/power9/libc.so.6 (0x7158faf4)
/opt/at15.0/lib64/ld64.so.2 (0x7158fb1e)


What I would do is place /opt/at15.0/bin as the 2nd directory in your PATH,
with your new GCC install dir being first.  That way, things should be
seamless for you.
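
As a small sketch of that ordering (the install directory in the example
is hypothetical, so substitute wherever your new GCC actually lives):

```shell
#!/bin/sh
# Sketch: put the new GCC first, AT15 second, the old PATH last.

build_path() {
    # $1 = bin dir of the new GCC install
    # $2 = AT15 bin dir
    # $3 = the existing PATH
    printf '%s:%s:%s\n' "$1" "$2" "$3"
}

# Example (hypothetical install dir):
#   PATH=$(build_path "$HOME/gcc-install/bin" /opt/at15.0/bin "$PATH")
#   export PATH
```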

Peter




Re: How to describe ‘earlyclobber’ explicitly for specific source operand ?

2021-11-19 Thread Peter Bergner via Gcc
On 11/19/21 1:28 AM, Jojo R via Gcc wrote:
>   We know gcc supply earlyclobber function to avoid register overlap,
> 
>   but it can not describe explicitly for specific source operand, is it 
> right ?

You add the early clobber to the OUTPUT operand(s) that can clobber any of the
input source operands.  You don't mark the source operands that could be 
clobbered.

Peter




Re: libgfortran.so SONAME and powerpc64le-linux ABI changes

2021-10-06 Thread Peter Bergner via Gcc
On 10/6/21 12:50 PM, Segher Boessenkool wrote:
> So we have three options (well, four):
> 
> 0) Do nothing.  We will stay in this hell forever.  Not my choice :-)
> 1) Use a soft-float-like parameter passing everywhere.  This works but
>will be horridly slow on newer systems.  We can do better than that.
> 2) Use the current setup where -mcpu=power8 (or later) makes QP float
>available.  Most BE stuff isn't compiled with that currently, and it
>will split our ecosystem.
> 3) As Joseph reminds me the high VSRs are the VRs, so we could use the
>same parameter passing on anything with AltiVec.  We could even
>simply require -maltivec for QP float to be supported (we currently
>require -mvsx, this would not be a restriction).
> 
> I think I like 3) :-)

I like 3 too, meaning requiring -maltivec to support IEEE QP at all.
This would cover POWER6 and later server CPUs, as well as some other
CPUs like those in the Power Macs.

Anything without Altivec hardware would need to either not support
IEEE QP at all, or go through the work themselves of coming up with
a -msoft-altivec like ABI.

Peter



Re: GCC trunk commit a325bdd195ee96f826b208c3afb9bed2ec077e12

2021-06-16 Thread Peter Bergner via Gcc
On 6/16/21 1:32 PM, Uros Bizjak wrote:
> On Wed, Jun 16, 2021 at 6:08 PM Liu Hao  wrote:
>> It looks like Uroš was on 00d07ec6e12, committed his changes mistakenly with 
>> `git commit --amend`
>> (which changed the commit message but did not reset the author), then 
>> rebased the modified commit
>> onto ee52bf609bac. Git is smart enough to drop duplicate changes, but the 
>> leftovers formed a new
>> commit, which was exactly a325bdd195e.
> 
> Indeed, IIRC - contrib/gcc_update failed due to the unresolved merge,
> and I changed my commit with --amend. There were some issues, but I
> was under the impression that I fixed them. It looks like I forgot
> something, so the result is the commit with wrong author attribution.
> 
> Perhaps a notice in the documentation should be added what to do if
> contrib/gcc_update fails, or perhaps this script should be made more
> robust.

I admit that if the same thing happened to me, I would have made the
same mistake...or worse :-), so yeah, a comment about what to do to "fix"
things when gcc_update fails would be greatly appreciated by me too!

Peter





GCC trunk commit a325bdd195ee96f826b208c3afb9bed2ec077e12

2021-06-16 Thread Peter Bergner via Gcc
Hi all,

I recently did a search on a git log of gcc trunk looking for a particular
commit of mine, so was searching for my name, and I came across a commit
from Uroš that lists me as the Author.  I did not author that commit and
talking with Uroš offline, he assures me that he didn't use --author when
committing that, so we're wondering whether there might be a bug in one
of the commit hooks.  Is there someone who can dig into the commit below
and try to find out how the author field was incorrectly set?
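
For what it's worth, a quick way to list candidates like this is to
compare the author and committer names in the log.  This is just a
sketch with a made-up helper name, and a mismatch is not wrong by itself
(people commit patches on behalf of others all the time), so it only
narrows down what to look at.

```shell
#!/bin/sh
# Sketch: list commits whose author name differs from the committer name.

mismatched_authors() {
    # stdin: one "hash|author name|committer name" line per commit
    awk -F'|' '$2 != $3 { print $1 }'
}

# Example:
#   git log --format='%h|%an|%cn' | mismatched_authors
```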

Peter


commit a325bdd195ee96f826b208c3afb9bed2ec077e12
Author: Peter Bergner 
AuthorDate: Thu Jun 10 13:54:12 2021 -0500
Commit: Uros Bizjak 
CommitDate: Thu Jun 10 23:55:24 2021 +0200

i386: Add V8QI and other 64bit vector permutations [PR89021]

In addition to V8QI permutations, several other missing permutations are
added for 64bit vector modes for TARGET_SSSE3 and TARGET_SSE4_1 targets.

2021-06-10  Uroš Bizjak  

gcc/
PR target/89021
* config/i386/i386-expand.c (ix86_split_mmx_punpck):
Handle V2SF mode.  Emit SHUFPS to fixup unpack-high for V2SF mode.
(expand_vec_perm_blend): Handle 64bit modes for TARGET_SSE4_1.
(expand_vec_perm_pshufb): Handle 64bit modes for TARGET_SSSE3.
(expand_vec_perm_pblendv): Handle 64bit modes for TARGET_SSE4_1.
(expand_vec_perm_interleave2): Handle 64bit modes.
(expand_vec_perm_even_odd_pack): Handle V8QI mode.
(expand_vec_perm_even_odd_1): Ditto.
(ix86_vectorize_vec_perm_const): Ditto.
* config/i386/i386.md (UNSPEC_PSHUFB): Move from ...
* config/i386/sse.md: ... here.
* config/i386/mmx.md (*vec_interleave_lowv2sf):
New insn_and_split pattern.
(*vec_interleave_highv2sf): Ditto.
(mmx_pshufbv8qi3): New insn pattern.
(*mmx_pblendw): Ditto.



Re: D build on powerpc broken (was Re: GCC 11.1 Release Candidate available from gcc.gnu.org)

2021-04-20 Thread Peter Bergner via Gcc
On 4/20/21 4:20 PM, Jakub Jelinek via Gcc wrote:
> On Tue, Apr 20, 2021 at 03:27:08PM -0500, William Seurer via Gcc wrote:
>> /tmp/cc8zG8DV.s: Assembler messages:
>> /tmp/cc8zG8DV.s:2566: Error: unsupported relocation against r13
>> /tmp/cc8zG8DV.s:2570: Error: unsupported relocation against r14
[snip]
> So do we need to change
> +else version (PPC)
> +{
> +void*[19] regs = void;
> +asm pure nothrow @nogc
> +{
> +"stw r13, %0" : "=m" (regs[ 0]);
> +"stw r14, %0" : "=m" (regs[ 1]);
> ...
> +else version (PPC64)
> +{
> +void*[19] regs = void;
> +asm pure nothrow @nogc
> +{
> +"std r13, %0" : "=m" (regs[ 0]);
> +"std r14, %0" : "=m" (regs[ 1]);
> ...
> to "stw 13, %0" and "std 13, %0" etc. unconditionally, or
> to "stw %%r13, %0" etc. under some conditions?

Yes, I think so.  The "r13", etc. names are not accepted by gas unless you
use the -mregnames option.  It's easier to just remove the 'r'.
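
If there are a lot of these lines, the rename can be done mechanically.
A sketch, assuming every affected line has the form "stw rN, ..." or
"std rN, ..." as in the quoted snippet (the function name is mine):

```shell
#!/bin/sh
# Sketch: drop the 'r' prefix from the register operand of stw/std
# inside the quoted asm strings.

strip_regnames() {
    # stdin: source lines; stdout: same lines with "stw r13," -> "stw 13,"
    sed -E 's/"(stw|std) r([0-9]+),/"\1 \2,/'
}

# Example (hypothetical file name):
#   strip_regnames < threadasm.d > threadasm.d.new
```

Anything that doesn't match the pattern passes through untouched, so it
is worth diffing the result before committing it.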

Peter



Re: subversion status on gcc.gnu.org

2020-03-24 Thread Peter Bergner via Gcc
On 3/24/20 12:06 PM, Frank Ch. Eigler wrote:
>> Thanks for working on this!!!  However, I still see at least one issue
>> in the following bugzilla entry:
>>
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94123#c4
>>
>> The first two git style links work, but the last one which points
>> to the SVN revision doesn't.  Is that a bug in the actual url that
>> bugzilla added or can we handle these too?
> 
> We can/do handle the last one too.  httpd mod_rewrite is powerful.

Works now.  Thanks for fixing!

Peter




Re: subversion status on gcc.gnu.org

2020-03-24 Thread Peter Bergner via Gcc
On 3/20/20 12:37 PM, Frank Ch. Eigler via Gcc wrote:
> Hi -
> 
> Both svn: and ssh+svn: now work for your archeological needs.
> Further, URLs such as
> 
> https://gcc.gnu.org/viewcvs?rev=279160&root=gcc&view=rev
> https://gcc.gnu.org/r123456
> 
> are mapped to gitweb searches that try to locate the matching
> From-SVN: rABCDEF commit.  This way, historical URLs from bugzilla
> should work.
> 
> If you badly need something else subversionish, please let me know.

Thanks for working on this!!!  However, I still see at least one issue
in the following bugzilla entry:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94123#c4

The first two git style links work, but the last one which points
to the SVN revision doesn't.  Is that a bug in the actual url that
bugzilla added or can we handle these too?

Peter




Re: Merges from release branches to vendor tracking branches

2020-01-23 Thread Peter Bergner
On 1/23/20 12:09 PM, Peter Bergner wrote:
> On 1/23/20 4:29 AM, Jakub Jelinek wrote:
>> so it is not a fast forward merge and we have the requirement that
>> From-SVN: shouldn't appear in commit logs of new commits.
> 
> So I just did "git merge releases/gcc-9" into our branch and I'm not
> seeing any From-SVN: in any of the commit messages.  Where/how are
> you seeing those?

Actually, I see them now.  I'm not sure what happened before.

So Joseph said these are actually ok on the vendor branches,
but what was the original concern with them being there in the
commit logs?  Is it still useful to remove them?

Peter





Re: Merges from release branches to vendor tracking branches

2020-01-23 Thread Peter Bergner
On 1/23/20 4:29 AM, Jakub Jelinek wrote:
> Just FYI if somebody needs to do something similar, I needed to do a merge
> from origin/releases/gcc-9 to our vendor branch - 
> refs/vendors/redhat/heads/gcc-9-branch
> This branch has some extra commits origin/releases/gcc-9 branch doesn't
> have,

This is good timing, as I'd like to do the same for our IBM 9 branch
refs/vendors/ibm/heads/gcc-9-branch.



> so it is not a fast forward merge and we have the requirement that
> From-SVN: shouldn't appear in commit logs of new commits.

So I just did "git merge releases/gcc-9" into our branch and I'm not
seeing any From-SVN: in any of the commit messages.  Where/how are
you seeing those?
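
For reference, this is roughly how one could check, assuming the marker
sits at the start of a line in the commit message body and that the
merge has just been done so ORIG_HEAD still points at the pre-merge
commit.  The helper name is made up.

```shell
#!/bin/sh
# Sketch: count From-SVN: marker lines in a stream of commit messages.

count_from_svn() {
    # stdin: commit message bodies; prints the number of marker lines.
    # grep -c exits non-zero when the count is 0, hence the "|| true".
    grep -c '^From-SVN:' || true
}

# Example, right after the merge:
#   git log --format=%B ORIG_HEAD..HEAD | count_from_svn
```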


Peter




Re: git conversion in progress

2020-01-22 Thread Peter Bergner
On 1/22/20 3:26 AM, Gerald Pfeifer wrote:
> On Mon, 13 Jan 2020, Joseph Myers wrote:
>> In addition, once git.html is more complete (has the list of branches 
>> added, at least) we need to update the GCC home page to link to the new 
>> pages in place of those for SVN, redirect the old pages to the new ones, 
>> and generally update references to SVN in wwwdocs and the GCC manuals.
> 
> I have removed all references to svnwrite.html and svn.html from our
> own pages, added redirects to gitwrite.html and git.html, respectively,
> and after svnwrite.html a few days ago now also removed svn.html.

The rsync.html page can be removed too, since that was a way to download
the entire svn repo.  With git clone, you get the entire repo, so rsync
isn't needed anymore.

Peter




Help with new GCC git workflow...

2020-01-14 Thread Peter Bergner
As somewhat of a git newbie and given gcc developers will do a git push of
our changes rather than employing a git pull development model, I'd like
a little hand holding on what my new gcc git workflow should be, so I don't
screw up the upstream repo by pushing something to the wrong place. :-)

I know enough that I should be using local branches to develop my changes,
so I want something like:

  git checkout master
  git pull
  git checkout -b 
  
  git commit -m "My commit message1"
  
  git commit -m "My commit message2"
  
  git commit -m "My commit message3"
  

At this point, I get a little confused. :-)  I know that to submit my
patch for review, I'll want to squash my commits down into one patch,
but how does one do that?  Should I do that now or only when I'm ready
to push this change to the upstream repo or ???  Do I even need to do that?

Also, when I'm ready to push this "change" upstream to trunk, I'll need
to move this over to my master and then push.  What are the recommended
commands for doing that?  I assume I need to rebase my branch to
current upstream master, since that probably has moved forward since
I checked my code out.
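
To make the squashing question concrete, here is one common way to do
it, as a sketch only.  It is not necessarily the workflow the project
recommends, the function name is mine, and it assumes the work sits on a
local branch that forked from master.

```shell
#!/bin/sh
# Sketch: collapse every commit made on the current branch since it
# forked from a base branch into a single commit with a new message.

squash_since() {
    base=$1                     # branch the work forked from, e.g. master
    msg=$2                      # final commit message for the squashed commit
    fork=$(git merge-base "$base" HEAD) || return 1
    # Move the branch pointer back to the fork point but keep all the
    # changes staged, then record them as one commit.
    git reset --soft "$fork" && git commit -q -m "$msg"
}
```

After something like "squash_since master 'Final message'", a "git fetch
origin" followed by "git rebase origin/master" would bring the single
commit up to date with upstream before it is sent for review or pushed.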

Also, at what point do I write my final commit message, which is different
than the (possibly simple) commit messages above?  Is that done after I've
pulled my local branch into my master?  ...or before?  ...or during the
merge over?

...and this is just for changes going to trunk.  How does all this change
when I want to push changes to a release or vendor branch?

I guess I'm just looking for some simple workflow commands for both
trunk and release/vendor branches I can follow until I'm a little more
confident in my git knowledge.

I'm guessing I'm not the only one who would like this info, so maybe
someone can add this to our wiki?


Peter




Re: BountySource campaign for gcc PR/91851

2019-10-30 Thread Peter Bergner
On 10/30/19 2:31 PM, Georg-Johann Lay wrote:
> Hi, have the cc0 backends been deprecated?
> 
> I didn't follow the lists for some time...  At least neither v9 or v10
> release notes caveats mention such deprecation, neither is there
> respective PRs for the cc0 targets.

https://gcc.gnu.org/ml/gcc-patches/2019-09/msg01256.html

Peter




Re: Question regarding constraint usage within inline asm

2019-02-21 Thread Peter Bergner
On 2/20/19 9:39 PM, Alan Modra wrote:
> On Wed, Feb 20, 2019 at 08:57:52PM -0600, Peter Bergner wrote:
>> Yes, because they don't have my IRA and LRA patches that exposed this
>> problem. I would say they were buggy for not complaining and silently
>> spilling a hard register in the case where we used asm reg("...").
> 
> I don't follow your reasoning.  It seems to me that giving some
> variable a register asm doesn't mean that the value of that variable
> can't appear in some other register.  An obvious example is when
> passing that variable to a function.

I don't disagree with you here.  For sure, multiple registers can hold
the same value, just as multiple variables can hold the same value.



> So why shouldn't a hard reg be reloaded in order to satisfy
> incompatible constraints?

About the only guaranteed use of register asm variables is in
inline asm.  If you specify a hard register for a variable
and then use that variable in an inline asm, you are guaranteed
that the variable will use that register in the inline asm.
Now in this case, "input" doesn't have the register asm, but
asmcons rewrites the rtl such that it looks like "input" was
assigned via a register asm.

LRA doesn't know about register asms, it just sees pseudos and hard
registers, so I think it needs to be conservative and assume the
explicit hard registers it sees could have come from a register asm,
and not spill them, but rather error out and let the user fix it.

That said, the "bug" in the case we're seeing, is that asmcons
rewrote all of "input"'s pseudos, and it should be more careful
to not create rtl with illegal constraint usage that LRA cannot
fix up.  With the fix, operand %1 in the inline asm is no longer
hard coded to r3 and it uses the pseudo instead, so everything
is copacetic.

Peter



Re: Question regarding constraint usage within inline asm

2019-02-20 Thread Peter Bergner
On 2/20/19 4:04 PM, Alan Modra wrote:
> On Wed, Feb 20, 2019 at 10:08:07AM -0600, Peter Bergner wrote:
>> On 2/19/19 9:09 PM, Alan Modra wrote:
>> That said, talking with Segher and Uli offline, they both think the
>> inline asm usage in the test case should be legal
> 
> Good, it seems we are in agreement.  Incidentally, the single pseudo
> for the inputs happens even for testcases like
> 
> long input;
> long
> bug (void)
> {
>   register long output /* asm ("r3") */;
>   asm ("blah %0, %1, %2" : "=r" (output) : "wi" (input), "0" (input));
>   return output;
> }

This is a different problem than the one I'm fixing, but you are correct
that asmcons shouldn't replace operand %1, since its constraint is not
compatible with that of the output operand.  In this case, it's probably
"ok" to spill even though it's a hard register, because it doesn't match
the regclass it is supposed to have.  I'm not sure how important
this is to fix.  I can also imagine that this would be hard to
handle, since we'd have to call into the backend to see whether the
two constraints are compatible, and with the overlap between different
constraints, that could be very very messy!

Peter







Re: Question regarding constraint usage within inline asm

2019-02-20 Thread Peter Bergner
On 2/20/19 4:19 PM, Alan Modra wrote:
> I forgot to say, gcc-6, gcc-7 and gcc-8 handle your original testcase
> with the register asm just fine.

Yes, because they don't have my IRA and LRA patches that exposed this
problem. I would say they were buggy for not complaining and silently
spilling a hard register in the case where we used asm reg("...").

Peter



Re: Question regarding constraint usage within inline asm

2019-02-20 Thread Peter Bergner
On 2/19/19 9:09 PM, Alan Modra wrote:
> On Mon, Feb 18, 2019 at 01:13:31PM -0600, Peter Bergner wrote:
>> long input;
>> long
>> bug (void)
>> {
>>   register long output asm ("r3");
>>   asm ("blah %0, %1, %2" : "=&r" (output) : "r" (input), "0" (input));
>>   return output;
>> }
>>
>> I know an input operand can have a matching constraint associated with
>> an early clobber operand, as there seems to be code that explicitly
>> mentions this scenario.  In this case, the user has to manually ensure
>> that the input operand is not clobbered by the early clobber operand.
>> In the case that the input operand uses an "r" constraint, we just
>> ensure that the early clobber operand and the input operand are assigned
>> different registers.  My question is, what about the case above where
>> we have the same variable being used for two different inputs with
>> constraints that seem to be incompatible?
> 
> Without the asm("r3") gcc will provide your "blah" instruction with
> one register for %0 and %2, and another register for %1.  Both
> registers will be initialised with the value of "input".

That's not what I'm seeing.  I see one pseudo (123) used for the output
operand and one pseudo (121) used for both input operands.  Like so:

(insn 8 6 7 (parallel [
(set (reg:DI 123 [ outputD.2831 ])
(asm_operands:DI ("blah %0, %1, %2") ("=&r") 0 [
(reg/v:DI 121 [  ]) repeated x2
]
 [
(asm_input:DI ("r") bug.i:6)
(asm_input:DI ("0") bug.i:6)
]
 [] bug.i:6))
(clobber (reg:SI 76 ca))
]) "bug.i":6:3 -1
 (nil))

The only difference between using asm("r3") and not using it is that
pseudo 123 is replaced with hard reg 3 in the output operand.  The input
operands use pseudo 121 in both cases.  It stays this way up until the
asmcons pass (ie, match_asm_constraints_1) which notices that operand %2
has a matching constraint with operand %0, so it emits a copy before
the asm that writes "input"'s pseudo into "output"'s pseudo and then
rewrites the asm operand %2 to use "output"'s pseudo.  But then it goes
ahead and rewrites all other uses of "input"'s pseudos with "output"'s
pseudo, so operand %1 also gets rewritten.  So we end up with:

(insn 15 6 8 2 (set (reg:DI 123 [ outputD.2831 ])
(reg/v:DI 121 [  ])) "bug.i":6:3 -1
 (nil))
(insn 8 15 12 2 (parallel [
(set (reg:DI 123 [ outputD.2831 ])
(asm_operands:DI ("blah %0, %1, %2") ("=&r") 0 [
(reg:DI 123 [ outputD.2831 ]) repeated x2
]
 [
(asm_input:DI ("r") bug.i:6)
(asm_input:DI ("0") bug.i:6)
]
 [] bug.i:6))
(clobber (reg:SI 76 ca))
]) "bug.i":6:3 -1
 (expr_list:REG_DEAD (reg/v:DI 121 [  ])
(expr_list:REG_UNUSED (reg:SI 76 ca)
(nil))))

Now the case above (ie, not using asm("r3")) compiles fine.  We assign
pseudo 123 to r3 and LRA's constraint checking code notices that operand
%1 should not be assigned to the same register as the early clobber
output operand, so it spills it.  However, when we use asm("r3"),
LRA's constraint checking code again sees that operand %1 shouldn't
have the same register as operand %0, but since it's a preassigned
hard register, it cannot spill it, since there may have been a valid
reason why that particular operand is supposed to be in r3, so we ICE.
I'm not sure we can ever safely spill a hard register.

That said, talking with Segher and Uli offline, they both think the
inline asm usage in the test case should be legal, so that tells me
that the bug is in the asmcons pass when it rewrites operand %1's
pseudo.  It really should notice that operand %1's pseudo must not
be updated, because doing so conflicts with the early clobber operand %0.
That would then allow operand %1 and operand %2 to have different
registers.  I'll try and prepare a patch that checks for that scenario.

Peter



Question regarding constraint usage within inline asm

2019-02-18 Thread Peter Bergner
I have a question about constraint usage in inline asm when we have
an early clobber output operand.  The test case is from PR89313 and
looks like the code below (I'm using "r3" for the reg on ppc, but
you could also use "rax" on x86_64, etc.).

long input;
long
bug (void)
{
  register long output asm ("r3");
  asm ("blah %0, %1, %2" : "=&r" (output) : "r" (input), "0" (input));
  return output;
}

I know an input operand can have a matching constraint associated with
an early clobber operand, as there seems to be code that explicitly
mentions this scenario.  In this case, the user has to manually ensure
that the input operand is not clobbered by the early clobber operand.
In the case that the input operand uses an "r" constraint, we just
ensure that the early clobber operand and the input operand are assigned
different registers.  My question is, what about the case above where
we have the same variable being used for two different inputs with
constraints that seem to be incompatible?  Clearly, we cannot assign
a register to the "input" variable that is both the same as and different
from the register that is assigned to "output".

Is it outright invalid to have "input" use both a matching and a
non-matching constraint with an early clobber operand?  Or is it
expected that reload/LRA will come along and fix up the "r" usage
to use a different register?

My guess is that this is invalid usage and I have a patch to
expand_asm_stmt() to catch this, but it only works if we've
preassigned "output" to a hard register.  If this is truly
invalid, should I flag this even if "output" isn't preassigned?

If it is valid, then should match_asm_constraints_1() really rewrite
all of the uses of "input" with the register assigned to "output", as
it is doing now?  That is what is causing the problems in LRA.
LRA sees that both input operands are using r3 and it catches the
constraint violation of the "r" input and tries to spill it, but
it's not a pseudo, but an explicit hard register already.  I'm not
sure LRA can really safely spill an operand that is an explicit hard
register.

Thoughts?

Peter




Re: Spectre V1 diagnostic / mitigation

2018-12-19 Thread Peter Bergner
On 12/19/18 7:59 AM, Florian Weimer wrote:
> * Richard Biener:
> 
>> Sure, if we'd ever deploy this in production placing this in the
>> TCB for glibc targets might be beneficial.  But as said the
>> current implementation was just an experiment intended to be
>> maximum portable.  I suppose the dynamic loader takes care
>> of initializing the TCB data?
> 
> Yes, the dynamic linker will initialize it.  If you need 100% reliable
> initialization with something that is not zero, it's going to be tricky
> though.  Initial-exec TLS memory has this covered, but in the TCB, we
> only have zeroed-out reservations today.

We have non-zero initialized TCB entries on powerpc*-linux which are used
for the GCC __builtin_cpu_is() and __builtin_cpu_supports() builtin
functions.  Tulio would know the magic that was used to get them setup.

Peter





Re: LRA reload produces invalid insn

2018-11-02 Thread Peter Bergner
On 11/1/18 10:37 PM, Vladimir Makarov wrote:
> On 11/01/2018 08:25 PM, Paul Koning wrote:
>> Is this an LRA bug, or is there something I need to do in the target to 
>> prevent this happening?
> It is hard to say whose code is responsible for this.  It might be a wrong 
> machine-dependent code or a LRA bug.
> 
> Paul, could you send me full LRA dump file (.reload).  It might help me to 
> say more specific reason for the bug.  LRA has iterated sub-passes and the 
> full dump can say where LRA started to behave wrongly.
> 

I'll note that when we ported the rs6000 (ie, ppc*) port over to LRA
from reload, we hit many target problems.  It seems LRA is much less
forgiving of bad constraints, predicates, etc. than reload was.
I think that's actually a good thing.

Peter



Re: LRA reload produces invalid insn

2018-11-01 Thread Peter Bergner
On 11/1/18 8:40 PM, Segher Boessenkool wrote:
> Hi Peter,
> 
> On Thu, Nov 01, 2018 at 07:49:36PM -0500, Peter Bergner wrote:
>> On 11/1/18 7:25 PM, Paul Koning wrote:
>>> I'm running the testsuite on the pdp11 target, and I get a failure when 
>>> using LRA that works correctly with the old allocator.  The issue is that 
>>> LRA is producing an insn that is invalid (it violates the constraints 
>>> stated in the insn definition).
>> [snip]
>>> which is the correct sequence given the matching operand constraint in the 
>>> define_insn.
>>>
>>> Is this an LRA bug, or is there something I need to do in the target to 
>>> prevent this happening?
>>
>> What do you mean by "old allocator"?
> 
> I think Paul just means old reload.

In that case, my patch may still help.

Peter



Re: LRA reload produces invalid insn

2018-11-01 Thread Peter Bergner
On 11/1/18 7:25 PM, Paul Koning wrote:
> I'm running the testsuite on the pdp11 target, and I get a failure when using 
> LRA that works correctly with the old allocator.  The issue is that LRA is 
> producing an insn that is invalid (it violates the constraints stated in the 
> insn definition).
[snip]
> which is the correct sequence given the matching operand constraint in the 
> define_insn.
> 
> Is this an LRA bug, or is there something I need to do in the target to 
> prevent this happening?

What do you mean by "old allocator"?  Just an older revision?  Does it work
before my revision 264897 commit and break after it?  If so, could you try
the following to see whether that fixes things for you?

https://gcc.gnu.org/ml/gcc-patches/2018-10/msg01757.html

My commit above exposed some latent LRA bugs and my patch above tries
to fix issues similar to what you're seeing.

Peter



Re: Even numbered register pairs restriction on some instructions

2018-08-31 Thread Peter Bergner
On 8/31/18 10:41 AM, Matthew Malcomson wrote:
> I'm looking into whether it's possible to require even numbered registers on
> modes that need more than one hard-register to represent them. But only in
> some cases.

Yes, it's possible.  You can look at TDmode (128-bit decimal floating point)
on powerpc64*-linux, which is only allowed in even-odd register pairs.
It's in *all* cases though, not some of the time.

Peter



Re: Transactional memory test case reduction failure

2018-08-27 Thread Peter Bergner
On 8/27/18 1:20 PM, sameeran joshi wrote:
> On 8/27/18, Peter Bergner  wrote:
>> On 8/27/18 12:13 PM, sameeran joshi wrote:
>>> On 8/27/18, Peter Bergner  wrote:
>>>> Well what does:
>>>>
>>>>   linux% gcc -I/home/swamimauli/upload/csmith/runtime/ -Wall bug.c
>>>
>>> running above command on terminal,gives many warnings and asks for the
>>> -fgnu-tm option.
>>>
> 
> this shows me ICE if I include -fgnu-tm flag.

Then you need to add -fgnu-tm to your compile options in your
creduce script.  Otherwise, how can creduce reduce your test
case down to a minimal, but still ICEing test case, if you
don't tell it how to make it ICE?

Peter




Re: Transactional memory test case reduction failure

2018-08-27 Thread Peter Bergner
On 8/27/18 12:13 PM, sameeran joshi wrote:
> On 8/27/18, Peter Bergner  wrote:
>> Well what does:
>>
>>   linux% gcc -I/home/swamimauli/upload/csmith/runtime/ -Wall bug.c
> 
> running above command on terminal,gives many warnings and asks for the
> -fgnu-tm option.
> 
> bug.c:1091:2: error: ‘__transaction_relaxed ’ without transactional
> memory support enabled
>   __transaction_relaxed {

Well there's your problem then, meaning your compile command doesn't
result in the "internal compiler error: " message you're expecting
to see.

Peter



Re: Transactional memory test case reduction failure

2018-08-27 Thread Peter Bergner
On 8/27/18 11:42 AM, sameeran joshi wrote:
> It's still giving output as 1,I included the -squiggle option still,it
> dosen't work for me? any Ideas?
> 
> #!/bin/bash
> 
> CC="-I/home/swamimauli/upload/csmith/runtime/"
> OPTS="-Wall"
> TEST="bug.c"
> gcc ${CC} ${OPTS} ${TEST} 2>&1 | grep 'internal compiler error:in
> expand_expr_addr_expr_1, at expr.c:7862'
> if ! test $? = 0; then
> exit 1
> fi
> exit 0

Well what does:

  linux% gcc -I/home/swamimauli/upload/csmith/runtime/ -Wall bug.c

return?

And also, what does:

  linux% gcc -I/home/swamimauli/upload/csmith/runtime/ -Wall bug.c 2>&1 | grep 'internal compiler error: in expand_expr_addr_expr_1, at expr.c:7862'
  linux% echo $?

return?

Peter



Re: Transactional memory test case reduction failure

2018-08-27 Thread Peter Bergner
On 8/27/18 10:35 AM, Shubham Narlawar wrote:
> Here is the file. I am getting some error in sending .sh file, so I send it
> as below.
> 
> #!/bin/bash
> gcc -fgnu-tm testcase.c > out.txt 2>&1 &&\
> if
> grep 'internal compiler error' out.txt
> then
>   exit 0
> else
>   exit 1
> fi

When I use creduce, I never write my output to an actual file, but
just pipe it directly into grep.  My creduce.sh scripts usually look
like the following, which has worked for me in the past.

Peter


#!/bin/bash

CC="/home/bergner/gcc/build/gcc-fsf-6-pr78543-debug/gcc/xgcc -B/home/bergner/gcc/build/gcc-fsf-6-pr78543-debug/gcc"
OPTS="-O3 -S"
TEST=pr78543-2.i

${CC} ${OPTS} ${TEST} 2>&1 | grep 'internal compiler error: in push_reload, at reload.c:1349'
if ! test $? = 0; then
  exit 1
fi
exit 0



Re: Question regarding preventing optimizing out of register in expansion

2018-06-26 Thread Peter Bergner
On 6/26/18 4:05 AM, Peryt, Sebastian wrote:
> With some changes simplified implementation of my expansion is as follows:
> tmp_op0 = gen_reg_rtx (mode);
> emit_move_insn (tmp_op0, op0);

You set tmp_op0 here, and then


> emit_insn (gen_rtx_SET (tmp_op0, reg));

You set it again here without ever using it above, so it's dead code,
which explains why it's removed.

Peter




Re: Why does IRA force all pseudos live across a setjmp call to be spilled?

2018-03-06 Thread Peter Bergner
On 3/5/18 9:33 AM, Segher Boessenkool wrote:
> On Mon, Mar 05, 2018 at 08:01:14AM +0100, Eric Botcazou wrote:
>> Apparently the authors of the SPARC psABI thought that the last part of your 
>> sentence is an interpolation and that the (historical) requirements were 
>> vague 
>> enough to allow their interpretation, IOW that the compiler can do the work.
> 
> Maybe we should have a target hook that says setjmp/longjmp are
> implemented by simple function calls (or as-if by function calls), so
> as not to penalize everyone who has an, erm, more conservative ABI?

Unless someone really wants to work on this, I'll have a look at
adding this once stage1 opens up.

Peter



Re: Why does IRA force all pseudos live across a setjmp call to be spilled?

2018-03-04 Thread Peter Bergner
On 3/4/18 7:57 AM, Eric Botcazou wrote:
>> I can't argue with anything in that comment, other than the conclusion. :-)
>> It's not the compiler's job to implement the setjmp/longjmp save/restore.
>> Maybe Kenny was working around a problem with some target's buggy setjmp
>> and spilling everything "fixed" it?
> 
> What are the requirements imposed on setjmp exactly and by whom?  The psABI 
> on 
> SPARC (the SCD) has an explicit note saying that setjmp/sigsetjmp/vfork don't 
> (have to) preserve the usual non-volatile registers.

I'm not a language lawyer and I don't play one on TV either, but I believe
the requirements come from multiple sources.  You've pointed out your ABI
and Andreas pointed out the C standard also places requirements:

https://gcc.gnu.org/ml/gcc/2018-03/msg00030.html

I wouldn't be surprised if there are more specs/standards that place
restrictions too.  Clearly returning from the function that calls
setjmp before calling longjmp must be illegal, since that would result
in clobbering of the stack frame the longjmp would attempt to restore to.
I don't know off hand who/what states that restriction.

Peter




Re: Why does IRA force all pseudos live across a setjmp call to be spilled?

2018-03-03 Thread Peter Bergner
On 3/3/18 5:47 PM, Peter Bergner wrote:
> On 3/3/18 10:29 AM, Jeff Law wrote:
>> Here's the comment from regstat.c:
>>
>>   /* We have a problem with any pseudoreg that lives
>>  across the setjmp.  ANSI says that if a user variable
>>  does not change in value between the setjmp and the
>>  longjmp, then the longjmp preserves it.  This
>>  includes longjmp from a place where the pseudo
>>  appears dead.  (In principle, the value still exists
>>  if it is in scope.)  If the pseudo goes in a hard
>>  reg, some other value may occupy that hard reg where
>>  this pseudo is dead, thus clobbering the pseudo.
>>  Conclusion: such a pseudo must not go in a hard
>>  reg.  */
> 
> I can't argue with anything in that comment, other than the conclusion. :-)
> It's not the compiler's job to implement the setjmp/longjmp save/restore.
> Maybe Kenny was working around a problem with some target's buggy setjmp
> and spilling everything "fixed" it?

The only observable difference I can see between a variable that has been
spilled to memory versus one that is assigned to a non-volatile hard reg
is if it is modified between the setjmp and the longjmp.  In the case
where the variable is spilled to memory, the "new" updated value is the
value you _may_ see on the return from setjmp (the return caused by the
call to longjmp), whereas if it is assigned to a non-volatile register,
then you _will_ see the "old" value that was saved by the setjmp call.
I say _may_ see above, because there are cases where we might not store
the "new" updated value to memory, even if we've spilled the pseudo.
Examples would be spill code optimization, or the variable has been
broken into separate live ranges/pseudos, etc.  I guess I can even
think of cases where we could see both "old" and "new" values of a
variable.  Think of a variable that has been spilled/split like below:

    a =                 [start of live range, a assigned to non-volatile reg]
    spill store a
    ...
    setjmp()
    ...
1)  ... = ... a ...     [end of live range]
    ...                 [a not assigned to a reg in this region]
    spill load a        [start of live range]
2)  ... = ... a ...     [end of live range]
    ...
    if (...)
       a =              [start of live range]
3)     spill store a    [end of live range]
    ...                 [a not assigned to a reg in this region]
    longjmp()


On return from setjmp (the return caused by the call to longjmp),
the use of "a" at "1)" will use the non-volatile hard register
that was saved by the initial call to setjmp, so it will see the
"old" value of "a".  However, since the use of "a" at "2)" loads
the value from memory, it will use the "new" value stored by
the spill store at "3)"!

That said, the comment above only talks about variables that do not
change between the setjmp and the longjmp and in that case, you will
see the same "old" value (which is the only value, since it wasn't
modified) regardless of whether it was spilled or not.

What does ANSI (or any spec) say about what should happen to variables
that are modified between the setjmp and longjmp calls?  Maybe all bets
are off, given the example above, since even spilling a variable live
across a setjmp can still lead to strange behavior unless you don't
allow spill/split optimization and I don't think we'd want that at all.

Peter




Re: Why does IRA force all pseudos live across a setjmp call to be spilled?

2018-03-03 Thread Peter Bergner
On 3/3/18 10:29 AM, Jeff Law wrote:
> Here's the comment from regstat.c:
> 
>   /* We have a problem with any pseudoreg that lives
>  across the setjmp.  ANSI says that if a user variable
>  does not change in value between the setjmp and the
>  longjmp, then the longjmp preserves it.  This
>  includes longjmp from a place where the pseudo
>  appears dead.  (In principle, the value still exists
>  if it is in scope.)  If the pseudo goes in a hard
>  reg, some other value may occupy that hard reg where
>  this pseudo is dead, thus clobbering the pseudo.
>  Conclusion: such a pseudo must not go in a hard
>  reg.  */

I can't argue with anything in that comment, other than the conclusion. :-)
It's not the compiler's job to implement the setjmp/longjmp save/restore.
Maybe Kenny was working around a problem with some target's buggy setjmp
and spilling everything "fixed" it?

It is absolutely fine for a pseudo that is live across a setjmp call to
occupy a (non-volatile) hard register at the setjmp's call site, even if
some other value eventually occupies the same hard register between the
setjmp and the longjmp.  The reason is that setjmp saves all of the non-
volatile hard registers in the jmp_buf.  If our pseudo was assigned to
one of those non-volatile hard registers, then its value at the time of
the setjmp call is saved, so even if its hard register is clobbered before
we get to the longjmp call, the longjmp will restore the pseudo's value from
the jmp_buf into the hard register, restoring the value it had at the time
of the setjmp call.

The only way I can see the above not working is either setjmp doesn't
save the entire register state it should, the jmp_buf somehow gets clobbered
before the longjmp call or longjmp doesn't restore the entire register
state that it should.  All of those would be bugs in my book.

The only thing the register allocator should need to do, is treat setjmp
just like any other function call and make all pseudos that are live across
it interfere with all volatile hard registers, so that they will be assigned
to either non-volatile hard registers or spilled (if no non-volatile registers
are available).

Peter



Re: Why does IRA force all pseudos live across a setjmp call to be spilled?

2018-03-02 Thread Peter Bergner
On 3/2/18 3:26 PM, Jeff Law wrote:
> On 03/02/2018 12:45 PM, Peter Bergner wrote:
>> ...which forces us to spill everything live across the setjmp by forcing
>> the pseudos to interfere all hardregs.  That can't be good for performance.
>> What am I missing?
>
> You might want to hold off a bit.  I've got changes for 21161 which can
> help this significantly.  Basically the live-across-setjmp set is way
> too conservative -- it includes everything live at the setjmp, but it
> really just needs what's live on the longjump path.
> 
> As for why, I believe it's related to trying to make sure everything has
> the right values if we perform a longjmp.

I can understand why we might save/restore across functions that can throw
exceptions since the program state hasn't been saved at the point of the
call or in the call, but what is special about setjmp()?  We don't need
to save/restore the volatile regs since all functions clobber them and
the non-volatile regs are saved/restored by setjmp(), just like any
normal function call.  ...and as far as I know, setjmp() doesn't save
or restore the stack contents, just the stack pointer, pc, etc.
So I guess I still don't know why we treat it differently than any
other function call wrt register allocation.

Peter




Why does IRA force all pseudos live across a setjmp call to be spilled?

2018-03-02 Thread Peter Bergner
While debugging the PR84264 ICE caused by the following test case:

  void _setjmp ();
  void a (unsigned long *);
  void
  b ()
  {
    for (;;)
      {
        _setjmp ();
        unsigned long args[9]{};
        a (args);
      }
  }

I noticed that IRA is spilling all pseudos that are live across the call
to setjmp.  Why is that?  Trying to look through the history of this, I see
Jim committed a patch to reload that removed it spilling everything across
all setjmps:

  https://gcc.gnu.org/ml/gcc-patches/2003-11/msg01667.html

But currently ira-lives.c:process_bb_node_lives() has:

  /* Don't allocate allocnos that cross setjmps or any
     call, if this function receives a nonlocal
     goto.  */
  if (cfun->has_nonlocal_label
      || find_reg_note (insn, REG_SETJMP,
                        NULL_RTX) != NULL_RTX)
    {
      SET_HARD_REG_SET (OBJECT_CONFLICT_HARD_REGS (obj));
      SET_HARD_REG_SET (OBJECT_TOTAL_CONFLICT_HARD_REGS (obj));
    }

...which forces us to spill everything live across the setjmp by forcing
the pseudos to interfere with all hardregs.  That can't be good for performance.
What am I missing?

Peter




Re: Register Allocation Graph Coloring algorithm and Others

2017-12-15 Thread Peter Bergner
On 12/14/17 9:18 PM, Leslie Zhai wrote:
> * The papers by Briggs and Chaiten contradict[2] themselves when examine
> the text of the paper vs. the pseudocode provided?

I've read both of these papers many times (in the past) and I don't recall
any contradictions in them.  Can you (Dave?) be more specific about what you
think are contradictions?

I do admit that pseudo code in papers can be very terse, to the point that
it doesn't show all the little details that are needed to actually implement
the algorithm, but it definitely shouldn't contradict the written description.
I was very grateful that Preston was more than willing to answer all my many
questions regarding his allocator and the many many details he couldn't
mention in his Ph.D. thesis, let alone a short paper.

Peter



Re: PowerPC -many

2017-02-14 Thread Peter Bergner

On 2/14/17 6:06 PM, Alan Modra wrote:

> Since we've been talking about obsoleting cpu support, how about
> getting rid of -many in ASM_CPU_SPEC for gcc-8?


+1

Peter





Re: -mcx16 vs. not using CAS for atomic loads

2017-01-24 Thread Peter Bergner

On 1/24/17 3:06 PM, Richard Henderson wrote:

> The only possible concern I see might be with simulators that force HTM
> failure, for the purpose of forcibly testing fallback paths.  I guess we'd have
> to continue to fall back to the lock path for that case.


IIRC, this was the path that valgrind was going to use all of the time,
because actually implementing the HTM instructions was too hard.

Peter




Re: Remove sel-sched?

2016-01-15 Thread Peter Bergner
On Fri, 2016-01-15 at 11:13 +0100, Richard Biener wrote:
> Btw, I'd like people to start thinking if the scheduling algorithms
> working on loops (and sometimes requiring unrolling of loops) can be
> implemented in a way to apply that unrolling on the GIMPLE level
> (not the scheduling itself of course).

We've been underwhelmed with the RTL unroller on POWER and I think
we concur that a GIMPLE level unroller would be interesting.

Peter



Re: building gcc with macro support for gdb?

2015-12-02 Thread Peter Bergner
On Wed, 2015-12-02 at 20:05 -0500, Ryan Burn wrote:
> Is there any way to easily build a stage1 gcc with macro support for 
> debugging?
> 
> I tried setting CFLAGS, and CXXFLAGS to specify "-O0 -g3" via the
> command line before running configure, but that only includes those
> flags for some of the compilation steps.
> 
> I was only successful after I manually edited the makefile to replace
> "-g" with "-g3".

Try CFLAGS_FOR_TARGET='-O0 -g3 -fno-inline' and
CXXFLAGS_FOR_TARGET='-O0 -g3 -fno-inline'

Peter



Re: Powerpc atomic_load

2015-09-23 Thread Peter Bergner
On Wed, 2015-09-23 at 16:15 +0200, Sebastian Huber wrote:
> On 10/09/15 19:52, David Edelsohn wrote:
> > https://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html
> 
> Is there specific reason why the SYNC L,E (Elemental Memory Barriers) 
> defined by Power-ISA V2.07 doesn't appear in this table?

Probably because that category is only implemented on some (one?) cpus
(eg, E6500) and not on any of the server cpus (eg, power[45678]), so no
one cared enough to add that info? :-)  It would probably be useful to
add though.

Peter




Re: Repository for the conversion machinery

2015-08-28 Thread Peter Bergner
On Fri, 2015-08-28 at 11:00 -0400, Eric S. Raymond wrote:
> Peter Bergner :
> > On Thu, 2015-08-27 at 10:38 -0400, Eric S. Raymond wrote:
> > > I've made it available at:
> > > 
> > > http://thyrsus.com/gitweb/?p=gcc-conversion.git
> > > 
> > > The interesting content is gcc.map (the contributor map) and gcc.lift.
> > > 
> > > Presently the only command in gcc.lift expunges the hooks directory.
> > 
> > From your list, I also see that alanm and amodra are both listed with
> > Alan's old bigpond.net.au address which no longer exists.  He now uses:
> > 
> >amo...@gmail.com

It looks like you have a cut/paste error, with Alan's email address:

alanm = Alan Modra 
amodra = Alan Modra 

s/amodra@amodra@/amodra@/

Peter





Re: Repository for the conversion machinery

2015-08-28 Thread Peter Bergner
> azanella = Adhemerval Zanella 

Adhemerval now works for Linaro, so his email address should be:

adhemerval.zane...@linaro.org


> bje = Ben Elliston 

Ben is no longer at Red Hat...or IBM.  He went back to school and his
new email address seems to be:

b.ellis...@unsw.edu.au


> dnovillo = Diego Novillo 

Diego is now at google, so his email address should be:

dnovi...@google.com


> drepper = Ulrich Drepper 

Uli is no longer at Red Hat (now at Goldman Sachs?).
His last email to the GCC mailing list used this address:

drep...@gmail.com

> janis = Janis Johnson 

Janis is now retired.  Her personal email address as listed in
the MAINTAINERS file is:

janis.marie.john...@gmail.com


> jgrimm = Jon Grimm 

Jon is now at Canonical.  I'm not sure which of the two email addresses
that seem to be active for him he prefers:

jon.gr...@canonical.com 
jon.gr...@gmail.com


> luisgpm = Luis Machado 

Luis is now with Codesourcery.  His email address is:

lgust...@codesourcery.com


> meissner = Michael Meissner 

Mike is now at IBM, but his email address in the MAINTAINERS file is:

g...@the-meissners.org


> mircea = Mircea Namolaru 

Mircea is now working at INRIA.  His email address is:

mircea.namol...@inria.fr


> olga = Olga Golovanevsky 

Olga is no longer at IBM.  I believe she is now at Cavium, but her
recent GNU Cauldron presentation used this address:

golovanevsky.o...@gmail.com


> spop = Sebastian Pop 

Sebastian is now at Samsung and his address is:

s@samsung.com


Peter




Re: Repository for the conversion machinery

2015-08-28 Thread Peter Bergner
On Thu, 2015-08-27 at 10:38 -0400, Eric S. Raymond wrote:
> I've made it available at:
> 
> http://thyrsus.com/gitweb/?p=gcc-conversion.git
> 
> The interesting content is gcc.map (the contributor map) and gcc.lift.
> 
> Presently the only command in gcc.lift expunges the hooks directory.

From your list, I also see that alanm and amodra are both listed with
Alan's old bigpond.net.au address which no longer exists.  He now uses:

   amo...@gmail.com

Peter




Re: Repository for the conversion machinery

2015-08-27 Thread Peter Bergner
On Thu, 2015-08-27 at 16:13 +, Joseph Myers wrote:
> 273 missing usernames (this is based on grepping the output of svn log on 
> an rsync mirror of the repository, so it's possible one or two could be 
> spurious, but should be pretty accurate).  I've made no attempt to map 
> these to emails yet.

> acsawdey

Aaron Sawdey / acsaw...@linux.vnet.ibm.com


> bergner

Peter Bergner / berg...@vnet.ibm.com


> boger

Lynn Boger ?   labo...@linux.vnet.ibm.com


> pthaugen

Pat Haugen / pthau...@linux.vnet.ibm.com


> revitale

Revital Eres / e...@il.ibm.com


> wschmidt

Bill Schmidt / wschm...@linux.vnet.ibm.com


> zaks

Ayal Zaks / His MAINTAINERS entry still lists his IBM email address,
but he is no longer with IBM.  I'm not sure whether he now prefers
his az...@ee.technion.ac.il or ayal.z...@intel.com email addresses.


Peter




Re: 33 unknowns left

2015-08-26 Thread Peter Bergner
On Wed, 2015-08-26 at 20:12 -0400, Eric S. Raymond wrote:
> Peter Bergner :
> > On Wed, 2015-08-26 at 16:35 -0400, Eric S. Raymond wrote:
> > > Joseph Myers :
> > > > > irar = irar 
> > > > 
> > > > Ira Rosen 
> > > 
> > > I pretty much knew these two guys went with these two names, but couldn't
> > > figure out which was which.  Thanks.
> > 
> > Actually, Ira Rosen is a "she" and not a "he".
> > 
> > Peter
> > 
> 
> Really?  Interesting.  I have bever encountered "Ira" as a female name before.
> What language does this?

She works for IBM's Haifa research lab.

  https://il.linkedin.com/pub/ira-rosen/34/b73/433

Peter





Re: 33 unknowns left

2015-08-26 Thread Peter Bergner
On Wed, 2015-08-26 at 18:55 -0500, Peter Bergner wrote:
> On Wed, 2015-08-26 at 16:35 -0400, Eric S. Raymond wrote:
> > Joseph Myers :
> > > > irar = irar 
> > > 
> > > Ira Rosen 
> > 
> > I pretty much knew these two guys went with these two names, but couldn't
> > figure out which was which.  Thanks.
> 
> Actually, Ira Rosen is a "she" and not a "he".

Ah, I see Nick Clifton has been fingered.  Nevermind.

Peter




Re: 33 unknowns left

2015-08-26 Thread Peter Bergner
On Wed, 2015-08-26 at 13:44 -0700, Ian Lance Taylor wrote:
> On Wed, Aug 26, 2015 at 12:31 PM, Eric S. Raymond  wrote:
> > click = click 
> 
> You've got me on that one.  Any hints?

Just purely looking at the name, did Cliff Click ever
contribute to gcc in the past?

Peter




Re: 33 unknowns left

2015-08-26 Thread Peter Bergner
On Wed, 2015-08-26 at 16:35 -0400, Eric S. Raymond wrote:
> Joseph Myers :
> > > irar = irar 
> > 
> > Ira Rosen 
> 
> I pretty much knew these two guys went with these two names, but couldn't
> figure out which was which.  Thanks.

Actually, Ira Rosen is a "she" and not a "he".

Peter




Re: Moving to git

2015-08-21 Thread Peter Bergner
On Fri, 2015-08-21 at 16:09 +0200, Andreas Schwab wrote:
> Ramana Radhakrishnan  writes:
> 
> > On Fri, Aug 21, 2015 at 11:48 AM, Jonathan Wakely  
> > wrote:
> >> Teams following a different model could use a separate repo shared by
> >> those developers, not the gcc.gnu.org one. It's much easier to do that
> >> with git.
> >
> > Yes you are right they sure can, but one of the reasons that teams are
> > doing their development on a feature branch is so that they can obtain
> > feedback and collaborate with others in the community.
> 
> It is also much easier for others to pull from foreign repositories with
> git, so this isn't a severe downside.

It may be easy for git to pull from foreign repositories, but it may
be difficult/impossible (policy wise) for some developers from some
companies to be able to write to foreign repositories.  At IBM, we
cannot host our own source repositories that others can access.  We can
only write to the official source code repositories for the projects
that we have clearance to work in.  We currently have an IBM vendor
directory where we have our branches.  If we move to git (I'm all for
it), I would hope that those can remain in the official source code
repository.

That said, if the GCC project created an "official" side repository
where branches are stored, we could participate in that.

Peter





Re: Fail to compile trunk

2015-04-14 Thread Peter Bergner
On Tue, 2015-04-14 at 17:37 +0200, Harald Servat wrote:
>   I'm trying to compile the GCC's trunk but I find out the following
> error while compiling it. I've configured it such as

This question is not appropriate for this mailing list, as this
list is only for questions about gcc development.  You should continue
this on the gcc-help mailing list.


>   ./configure --prefix=/home/harald/pkg/gcc/git
> --enable-languages=c,c++ --disable-multilib

Building gcc within the GCC source tree is not supported.
Try creating an empty build directory and using:

  /path/to/gcc/source/directory/configure 

Peter




Re: build broken on ppc linux?!

2013-11-22 Thread Peter Bergner
On Fri, 2013-11-22 at 12:30 +0100, Richard Biener wrote:
> On Fri, Nov 22, 2013 at 1:57 AM, Jonathan Wakely  
> wrote:
> > Yes, it only seems to be a problem with SUSE kernels:
> > http://gcc.gnu.org/ml/gcc/2013-11/msg00090.html
> 
> As my bugreport is being ignored it would help if one ouf our
> partners (hint! hint!) would raise this issue via the appropriate
> channel ;)

Ok, I'll open a bug on our side and we'll see if that helps
move things along.

Peter




Re: powerpc64 bootstrap broken due to libsanitizer merge from upstream

2013-11-07 Thread Peter Bergner
On Fri, 2013-11-08 at 00:03 +0100, Steven Bosscher wrote:
> powerpc64-linux bootstrap is broken by the libsanitizer merge:

I already reported the failures here:

http://gcc.gnu.org/ml/gcc-patches/2013-11/msg00312.html

It seems others have reported it breaks bootstrap for them as
well on other arches.  It's sad it's been broken this long,
given it affects so many people.  Anyway, the powerpc64-linux
breakage is being tracked here:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59009

Peter





Re: Bootstrap broken in libobjc/sendmsg.c

2013-09-06 Thread Peter Bergner
On Fri, 2013-09-06 at 13:36 +0200, Paolo Carlini wrote:
> . on x86_64-linux, this commit broke the build of that file:
> 
> http://gcc.gnu.org/ml/gcc-cvs/2013-09/msg00149.html
> 
> CC-ing Peter.

Can you try the patch that HJ suggested?

  http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58139#c9

Peter





Re: [x86-64 psABI]: Extend x86-64 psABI to support AVX-512

2013-07-24 Thread Peter Bergner
On Wed, 2013-07-24 at 10:42 -0700, H.J. Lu wrote:
> Are there any other Linux targets with callee saved vector registers?

Yes, on POWER.  From our ABI:

  On processors with the VMX feature.
    v0-v1      Volatile scratch registers
    v2-v13     Volatile vector parameters registers
    v14-v19    Volatile scratch registers
    v20-v31    Non-volatile registers

I'll note that the new VSX registers we recently added with power7
were made volatile, but then we already had these non-volatile altivec
regs to use.

Peter




Re: Libitm issues porting to POWER8 HTM

2013-06-19 Thread Peter Bergner
On Tue, 2013-06-18 at 21:48 +0200, Andi Kleen wrote:
> > Given Torvald's comment, can you verify whether your hw txn succeeds
> > (all the way to commit) or whether it is failing and somehow skips
> > the fall through code that is hanging for us (Power and S390)?
> 
> All the 3 transactions in reentrant.c abort.

Can you please explain the above?  When you say abort, do you mean
that libitm is calling htm_abort() or that your xbegin hardware
instruction isn't succeeding?

> That's not surprising, because there are usually lots of aborts in
> the startup phase of programs, and the test doesn't use a loop.

Is this a libitm statement or an Intel RTM statement, that the
startup phase usually has lots of aborts?

Peter





Re: Libitm issues porting to POWER8 HTM

2013-06-18 Thread Peter Bergner
On Tue, 2013-06-18 at 18:41 +0200, Torvald Riegel wrote:
> On Fri, 2013-06-14 at 19:44 -0500, Peter Bergner wrote:
> > I'll note that if I hack the call to
> > htm_abort_should_retry(ret) so that we break of of the loop and fallback
> > to SW TM, then the test case executes correctly.
> 
> That matches what I suppose the bug is.
> 
> Please feel free to create a bug report.  I will work on a patch.

Done.  http://gcc.gnu.org/PR57643

Since this seems to pass on x86, let me know if you want me to test a
patch on our power8 system.

Peter





Re: Libitm issues porting to POWER8 HTM

2013-06-18 Thread Peter Bergner
On Tue, 2013-06-18 at 11:22 -0700, Andi Kleen wrote:
> Peter Bergner  writes:
> >
> > I have yet to track down who has the write lock and why, but I am working
> > towards that.  Talking with Andreas, he said he is seeing the same failure
> > on S390, so I'm wondering whether this might be a generic libitm issue
> > and it might hit Intel too.  Does anyone know whether this executes 
> > correctly
> > on Intel hardware with RTM?  I'll note that if I hack the call to
> 
> FWIW on a TSX system I get the following for libitm with current
> trunk. So no hangs on reentrant at least.

Given Torvald's comment, can you verify whether your hw txn succeeds
(all the way to commit) or whether it is failing and somehow skips
the fall through code that is hanging for us (Power and S390)?

Thanks!

Peter





Libitm issues porting to POWER8 HTM

2013-06-14 Thread Peter Bergner
I'm currently implementing support for hardware transactional memory in
the rs6000 backend for POWER8.  Things seem to be mostly working, but I
have run into a few issues I'm wondering whether other people are seeing.

For me, all of the libitm execution test cases in libitm/testsuite/libitm.c/
compile and execute without error, except for reentrant.c, which hangs for me.
My gdb hasn't been ported to support HTM on Power yet, so debugging has been
slow, but what I've learned is that my tbegin. instruction succeeds, but I
fail the test (meaning someone has the write lock) at beginend.cc:200:

if (unlikely(serial_lock.is_write_locked()))
  htm_abort();

...so we abort the transaction.  The failure is not persistent, so we do
not break out of the loop due to:

if (!htm_abort_should_retry(ret))
  break;

We then fall into the following code, where we hang trying to get the
read lock:

serial_lock.read_lock(tx);

I have yet to track down who has the write lock and why, but I am working
towards that.  Talking with Andreas, he said he is seeing the same failure
on S390, so I'm wondering whether this might be a generic libitm issue
and it might hit Intel too.  Does anyone know whether this executes correctly
on Intel hardware with RTM?  I'll note that if I hack the call to
htm_abort_should_retry(ret) so that we break out of the loop and fall back
to SW TM, then the test case executes correctly.
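
For anyone trying to follow the control flow described above, here is a minimal C sketch of the begin/retry loop as this message describes it.  It is not the libitm source (which is C++): struct htm_mock, choose_begin_path() and the PATH_* names are hypothetical, and the hardware transaction is reduced to a few flags.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the state the begin path looks at.  */
struct htm_mock {
  bool write_locked;        /* stands in for serial_lock.is_write_locked() */
  bool abort_is_persistent; /* stands in for !htm_abort_should_retry(ret) */
  bool writer_ever_unlocks; /* false models the hang seen with reentrant.c */
};

enum begin_path {
  PATH_HW_TXN,       /* running as a hardware transaction */
  PATH_SW_FALLBACK,  /* left the loop, software TM takes over */
  PATH_BLOCKED       /* stuck waiting for the read lock */
};

enum begin_path choose_begin_path(struct htm_mock *m, int max_tries)
{
  assert(max_tries >= 0);
  for (int t = 0; t < max_tries; t++) {
    /* The hardware begin itself is assumed to succeed here.  */
    if (!m->write_locked)
      return PATH_HW_TXN;
    /* The write lock is held, so the transaction is aborted.  */
    if (m->abort_is_persistent)
      break;                    /* give up on hardware attempts */
    /* Retryable abort: wait for the writer by taking the read lock.  */
    if (!m->writer_ever_unlocks)
      return PATH_BLOCKED;      /* the read lock is never granted */
    m->write_locked = false;    /* writer left, retry in hardware */
  }
  return PATH_SW_FALLBACK;
}
```

In this model, a held write lock with a retryable abort and a writer that never unlocks ends in PATH_BLOCKED, which corresponds to the hang.  Forcing abort_is_persistent corresponds to the hack that falls back to SW TM.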


Secondly, many of the test cases in libitm/testsuite/libitm.c++/ fail
to build for me when I use -static with the following error:

/home/bergner/gcc/install/gcc-fsf-mainline-htm/lib64/libitm.a(method-serial.o):(.opd+0x1098):
 multiple definition of `__cxa_pure_virtual'
/home/bergner/gcc/install/gcc-fsf-mainline-htm/lib64/libstdc++.a(pure.o):(.opd+0x0):
 first defined here
collect2: error: ld returned 1 exit status

The comment in method-serial.cc says it's trying to avoid a dependency
on libstdc++.  Is the __cxa_pure_virtual workaround in method-serial.cc
supposed to work with -static?


Finally, when compiling (static or non-static) static-ctor.C, I'm seeing:

/home/bergner/gcc/gcc-fsf-mainline-htm/libitm/testsuite/libitm.c++/static_ctor.C:12:18:
 error: unsafe function call 'void __cxa_guard_release(long long int*)' within 
'transaction_safe' function
   static int y = x;
  ^
/home/bergner/gcc/gcc-fsf-mainline-htm/libitm/testsuite/libitm.c++/static_ctor.C:12:18:
 error: unsafe function call 'int __cxa_guard_acquire(long long int*)' within 
'transaction_safe' function

Does x86 not get calls to __cxa_guard_acquire and __cxa_guard_release for
this access, so it doesn't see this error?  To be honest, I'm not sure
what we're supposed to do with this error.


Peter





Re: register indirect addressing for global variables on powerpc

2013-01-14 Thread Peter Bergner
On Mon, 2013-01-14 at 08:00 +0100, Thomas Baier wrote:
> The operating system I'd like to use gcc for (OS-9, for the curious)
> requires an ABI, where global variables are only accessed through
> register indirect addressing. On the powerpc platform, r2 is used for
> indirect addressing. There is already a feature in gcc which can use
> register indirect addressing for the powerpc target for global variables
> using a special small data area, but unfortunately this is not enough.

If you look at the -mcmodel={small,medium,large} support we (IBM)
added to powerpc64-linux, you will see how one can generate larger
offsets to r2 (16-bit, 32-bit and 64-bit respectively).
Maybe you can borrow some of that code?
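
As a rough illustration of why the larger offsets need more than a single 16-bit displacement, here is a small C sketch of the usual high-adjusted/low split of a 32-bit offset from a base register.  The function names are made up for this example and this is not code from the rs6000 backend.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if OFF can be encoded as one signed 16-bit displacement.  */
bool fits_signed16(int64_t off)
{
  return off >= -32768 && off <= 32767;
}

/* Split a 32-bit offset into a high-adjusted part and a signed low
   part so that hi * 65536 + lo == off.  */
void split_ha_lo(int32_t off, int32_t *hi, int32_t *lo)
{
  assert(hi != 0 && lo != 0);
  int32_t low = off & 0xffff;   /* low 16 bits, 0..65535 */
  if (low >= 0x8000)
    low -= 0x10000;             /* reinterpret as signed 16-bit */
  *lo = low;
  /* off - low is a multiple of 65536, so the division is exact.  */
  *hi = (int32_t)(((int64_t)off - low) / 65536);
}
```

The high part is adjusted upward whenever the low part comes out negative, which is why the two halves still add up to the original offset.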

Peter





Re: bootstrap comparison failure ppc64 FreeBSD

2012-11-14 Thread Peter Bergner
On Wed, 2012-11-14 at 18:51 +0100, Andreas Tobler wrote:
> Hello,
> 
> on trunk (193501) I get a comparison failure:
> ---
> Bootstrap comparison failure!
> gcc/tree-ssa-forwprop.o differs
> ---
> 
> This is with --disable-checking. Without --disable-checking, the
> bootstrap completes successfully.

I just fired off a --disable-checking build and I see the same
thing on powerpc64-linux.


> -9658:  e8 89 00 09   ldu r4,8(r9)
> -965c:  39 08 00 01   addi r8,r8,1
> +9658:  39 08 00 01   addi r8,r8,1
> +965c:  e8 89 00 09   ldu r4,8(r9)

Looks like a harmless scheduling difference, but enough to trigger
the stage2/stage3 comparison. :(

Peter





Re: GCC 4.8.0 Status Report (2012-10-29), Stage 1 to end soon

2012-11-05 Thread Peter Bergner
On Mon, 2012-11-05 at 15:47 +0100, Jakub Jelinek wrote:
> On Mon, Nov 05, 2012 at 08:40:00AM -0600, Peter Bergner wrote:
> > Well we also patch config.in and configure.ac/configure.  If those are
> > acceptable to be patched later too, then great.  If not, the patch
> 
> That is the same thing as config.gcc bits.
> 
> > isn't really very large.  We did do this for power7 initially too:
> > 
> >   http://gcc.gnu.org/ml/gcc-patches/2008-08/msg00162.html
> 
> But then power7 patch went in during stage1 of the n+1 release, and
> wasn't really backported to release branch (just to distro vendor branches),
> right?

I think we could have done better there, yes, but not all of our patches
were appropriate for backporting, especially those parts that touched
outside of the port.  There will be portions of power8 we won't/don't
want to backport either, but I would like to get the major backend
portions like machine description files and the like backported to
4.8 when the time comes.  Having the configurey changes in would help
that, but if you say those are things we can get in after stage1,
then that can ease things a bit.  That said, I'll post our current
patch as is and discuss within our team and with David on what our
next course of action should be.

Peter




Re: GCC 4.8.0 Status Report (2012-10-29), Stage 1 to end soon

2012-11-05 Thread Peter Bergner
On Mon, 2012-11-05 at 13:53 +0100, Jakub Jelinek wrote:
> On Mon, Nov 05, 2012 at 06:41:47AM -0600, Peter Bergner wrote:
> > I'd like to post later today (hopefully this morning) a very minimal
> > configure patch that adds the -mcpu=power8 and -mtune=power8 compiler
> > options to gcc.  Currently, power8 will be an alias for power7, but
> > getting this patch in now allows us to add power8 support to the
> > compiler without having to touch the arch independent configure script.
> 
> config.gcc target specific hunks are part of the backend, the individual
> target maintainers can approve changes to that, I really don't see a reason
> to add a dummy alias now just for that.  If the power8 enablement is
> approved and non-intrusive enough that it would be acceptable even during
> stage 3, then so would be corresponding config.gcc changes.

Well we also patch config.in and configure.ac/configure.  If those are
acceptable to be patched later too, then great.  If not, the patch
isn't really very large.  We did do this for power7 initially too:

  http://gcc.gnu.org/ml/gcc-patches/2008-08/msg00162.html

Peter




Re: GCC 4.8.0 Status Report (2012-10-29), Stage 1 to end soon

2012-11-05 Thread Peter Bergner
On Mon, 2012-10-29 at 18:56 +0100, Jakub Jelinek wrote:
> Status
> ==
> 
> I'd like to close the stage 1 phase of GCC 4.8 development
> on Monday, November 5th.  If you have still patches for new features you'd
> like to see in GCC 4.8, please post them for review soon.  Patches
> posted before the freeze, but reviewed shortly after the freeze, may
> still go in, further changes should be just bugfixes and documentation
> fixes.

I'd like to post later today (hopefully this morning) a very minimal
configure patch that adds the -mcpu=power8 and -mtune=power8 compiler
options to gcc.  Currently, power8 will be an alias for power7, but
getting this patch in now allows us to add power8 support to the
compiler without having to touch the arch independent configure script.

The only hang up at the moment is we're still determining the
assembler mnemonic we'll be releasing that the gcc configure script
will use to test for power8 assembler support.

Peter




Re: Memory corruption due to word sharing

2012-02-01 Thread Peter Bergner
On Wed, 2012-02-01 at 13:09 -0500, David Miller wrote:
> From: Michael Matz 
> Date: Wed, 1 Feb 2012 18:41:05 +0100 (CET)
> 
> > One problem is that it's not a new problem, GCC emitted similar code since 
> > about forever, and still they turned up only now (well, probably because 
> > ia64 is dead, but sparc64 should have similar problems). 
> 
> Indeed, on sparc64 it does do the silly 64-bit access too:
> 
> wrong:
> ldx [%o0+8], %g2
> sethi   %hi(2147483648), %g1
> or  %g2, %g1, %g1
> jmp %o7+8
>  stx %g1, [%o0+8]

Ditto for powerpc64-linux:

ld 9,8(3)
li 10,1
rldimi 9,10,31,32
std 9,8(3)
blr
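
To spell out why the 64-bit access is a problem, here is a small single-threaded C replay of one unlucky interleaving.  It assumes the other 32 bits of the doubleword hold an unrelated field that another writer updates; FULL_BIT, LOCK_ONE and the function names are made up for this sketch.

```c
#include <assert.h>
#include <stdint.h>

#define FULL_BIT (UINT64_C(1) << 31)  /* the bit the code above sets */
#define LOCK_ONE (UINT64_C(1) << 32)  /* low bit of the assumed neighbouring field */

/* What the generated code does to the 64-bit word it loaded.  */
uint64_t set_flag_in_word(uint64_t loaded)
{
  return loaded | FULL_BIT;
}

/* Replay one bad interleaving of two writers on a shared word.  */
uint64_t replay_lost_update(uint64_t word)
{
  uint64_t stale = word;            /* writer A: 64-bit load */
  word |= LOCK_ONE;                 /* writer B: updates the neighbouring field */
  word = set_flag_in_word(stale);   /* writer A: 64-bit store of stale data */
  return word;
}
```

Starting from zero, the final word has the flag bit set but writer B's update to the neighbouring field is gone.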


Peter





Re: Recovering REG_EXPR information after temporary expression replacement

2012-01-27 Thread Peter Bergner
On Fri, 2012-01-27 at 18:40 +0100, Michael Matz wrote:
> The hack below works in this specific situation (TERed into a switch), and 
> adds a REG_EXPR when a TERed SSA name ever expanded into a pseudo (i.e. 
> also for some more generic situations).

FYI, I bootstrapped and regtested your patch on powerpc64-linux and did
not see any regressions.


Peter





Re: IRA issue with shuffle copies...

2012-01-12 Thread Peter Bergner
On Wed, 2012-01-11 at 12:29 -0500, Vladimir Makarov wrote:
> There is no visible effect of the patch on SPECFP2000 performance and 
> size (the size increase is only about 0.02%) for x86 and x86-64.
> 
> The patch does worsen performance of SPECINT2000 on x86 (about 0.5%) and 
> x86-64 (about 0.3%).  x86-64 SPECINT2000 code size increase is about 
> 0.05% and there is no visible change in code size on x86.
> 
> So I'd say the patch does not work for x86/x86-64.

Pat ran SPEC2000 and SPEC2006 and we had some wins and some losses.
We'll dig into a couple of the losses to see if we can learn anything
and report back if we do.  Thanks for doing the x86* runs.

Peter





Re: IRA issue with shuffle copies...

2012-01-10 Thread Peter Bergner
On Tue, 2012-01-10 at 12:20 -0500, Vladimir Makarov wrote:
> > Do we really need or want to create shuffle copies for insns that do not
> > have a two operand constraint?
> Yes, I think so.  As I remember I did some benchmarking and it gave some 
> "order" in hard register assignments and improved code slightly (at 
> least for SPEC2000) even for 3-ops insn architectures.

I'm a little skeptical about 3-op insn architectures, but will take
your word for it since you tested it.  I may have someone on the team
disable it completely for ppc, just as a test, so we can analyze why
it helps.  Sometimes just knowing why is a good thing. :)



> Your patch might work.  But we need to test it for major 2-ops 
> architecture x86/x86-64 and 3-ops ppc (I believe SPEC2000 would be ok 
> for this).

Ok, I'll have someone on my team kick off this patch on ppc, but it would
be nice if someone else could do the runs on x86/x86_64 or other cpus that
might be affected that we don't have access to.


Peter





IRA issue with shuffle copies...

2012-01-06 Thread Peter Bergner
Hi Vlad,

While debugging a slightly modified version of the test case in PR16458:

  int
  foo (unsigned int a, unsigned int b)
  {
    if (a == b) return 1;
    if (a > b)  return 2;
    if (a < b)  return 3;
    if (a != b) return 4;
    return 0;
  }

I noticed a couple of ugly code gen warts which I tracked back to IRA.
Namely, compiling the above with -O2 -m32 on powerpc64-linux, I'm seeing:

li 9,3
mr 3,9
blr
and:
li 9,1
mr 3,9
blr

If we look at the rtl just before IRA, we have the following:

BB2:
  (set (reg/v:SI 122 [ a ]) (reg:SI 3 3 [ a ])) 
REG_DEAD (reg:SI 3 3 [ a ])
  (set (reg/v:SI 123 [ b ]) (reg:SI 4 4 [ b ])) 
REG_DEAD (reg:SI 4 4 [ b ])
  (set (reg:CC 124) (compare:CC (reg/v:SI 122 [ a ]) (reg/v:SI 123 [ b ])))
  (if_then_else (eq (reg:CC 124) (const_int 0 [0]))
goto BB6;

BB3:
  (set (reg:CCUNS 125) (compare:CCUNS (reg/v:SI 122 [ a ]) (reg/v:SI 123 [ b ])))
REG_DEAD (reg/v:SI 123 [ b ])
REG_DEAD (reg/v:SI 122 [ a ])
  (set (reg:SI 120 [ D.1379 ]) (const_int 2 [0x2]))
  (if_then_else (gtu (reg:CC 124) (const_int 0 [0]))
goto BB8;

BB4:
  (if_then_else (geu (reg:CC 124) (const_int 0 [0]))
goto BB7;

BB5:
  (set (reg:SI 120 [ D.1379 ]) (const_int 3 [0x3]))
  goto BB8;

BB6:
  (set (reg:SI 120 [ D.1379 ]) (const_int 1 [0x1]))
  goto BB8;

BB7:
  (set (reg:SI 120 [ D.1379 ]) (const_int 4 [0x4]))

BB8:
  (set (reg/i:SI 3 3) (reg:SI 120 [ D.1379 ])) REG_DEAD (reg:SI 120 [ D.1379 ])
  (use (reg/i:SI 3 3))
  return

When we start coloring the allocnos, we get the following:

Pass 1 for finding pseudo/allocno costs

r125: preferred CR_REGS, ...
r124: preferred CR_REGS, ...
r123: preferred GENERAL_REGS, ...
r122: preferred GENERAL_REGS, ...
r120: preferred GENERAL_REGS, ...

...

  Popping a3(r122,l0)  -- assign reg 3
  Popping a2(r123,l0)  -- assign reg 4
  Popping a0(r120,l0)  -- assign reg 9
  Popping a4(r124,l0)  -- assign reg 75
  Popping a1(r125,l0)  -- assign reg 3
Assigning 75 to a1r125

This looks a little startling, since we're initially assigning r125 to r3,
even though its preferred class is CR_REGS, before improve_allocation()
saves us and reassigns r125 to r75 (a real CR reg).  The reason r125
ends up initially in r3 is that we detect a "shuffle" copy during the
set of r125, because r122 (and r123) dies in the insn r125 is defined in.
This ends up preferencing the costs for r125, such that it wants r3.
This in turn via ALLOCNO_UPDATED_HARD_REG_COSTS() increases the cost
of assigning r120 to r3, such that r120 ends up with r9 instead, when
we really really want it to get r3.

Your comments about the "shuffle" copies seem to imply that they're being
used to try and help insns with two operand constraints, but in the case
above, they're over preferencing things.  As an experiment, I disabled all
shuffle copies and the code gen for the test case above is much improved.

Do we really need or want to create shuffle copies for insns that do not
have a two operand constraint?  If not, do you know how we can test for that?
If you think we do need that for non two operand constraint insns, can we
at least disable creating shuffle copies for allocnos that have different
preferred classes, since they're probably not going to be assigned the
same hard reg?  Ala:

Index: ira-conflicts.c
===
--- ira-conflicts.c (revision 182936)
+++ ira-conflicts.c (working copy)
@@ -397,6 +397,11 @@ process_regs_for_copy (rtx reg1, rtx reg
   enum machine_mode mode;
   ira_copy_t cp;
 
+  if (!constraint_p
+  && reg_preferred_class (REGNO (reg1))
+!= reg_preferred_class (REGNO (reg2)))
+return false;
+
   gcc_assert (REG_SUBREG_P (reg1) && REG_SUBREG_P (reg2));
   only_regs_p = REG_P (reg1) && REG_P (reg2);
   reg1 = go_through_subreg (reg1, &offset1);
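
Stripped of the IRA data structures, the early return in the hunk above amounts to the following decision.  This is only a stand-alone C sketch: enum class_sketch and want_copy() are hypothetical names, and the real code works on rtx registers rather than on plain enums.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the preferred register classes.  */
enum class_sketch { SK_GENERAL_REGS, SK_CR_REGS };

/* Decide whether a copy between two allocnos is worth recording.
   CONSTRAINT_P is true when the copy comes from a real two operand
   constraint rather than from a "shuffle" heuristic.  */
bool want_copy(bool constraint_p,
               enum class_sketch pref1, enum class_sketch pref2)
{
  assert(pref1 == SK_GENERAL_REGS || pref1 == SK_CR_REGS);
  assert(pref2 == SK_GENERAL_REGS || pref2 == SK_CR_REGS);
  if (!constraint_p && pref1 != pref2)
    return false;   /* different classes, unlikely to share a hard reg */
  return true;
}
```

For the test case above, r125 (CR_REGS) against r122 (GENERAL_REGS) with no constraint would give false, so no shuffle copy would bias r125 towards r3.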


Your thoughts?


Peter



 





Re: Discussion: What is unspec_volatile?

2010-11-13 Thread Peter Bergner
On Sat, 2010-11-13 at 11:27 +0100, Paolo Bonzini wrote:
> On 11/12/2010 03:25 PM, H.J. Lu wrote:
> > IRA may move instructions across an unspec_volatile,
> 
> Do you have a testcase?

Are you sure it's IRA and not our old friend update_equiv_regs()
which IRA calls?  http://gcc.gnu.org/PR41171 shows an example
where update_equiv_regs() moves code around.

Peter





Re: %pc relative addressing of string literals/const data

2010-11-08 Thread Peter Bergner
On Mon, 2010-11-08 at 21:13 +, Dave Korn wrote:
> On 08/11/2010 13:44, Joakim Tjernlund wrote:
> > One ping and a few days later and nothing. Very frustrating. I don't
> > believe all PPC devs are so "busy" that none has the time to look
> > at a simple one liner. What is up?
> 
>   There's only the one of him.  He probably is that busy.  He's a very nice
> bloke and wouldn't be snubbing you just to be nasty, but he does have a day
> job as well as volunteering for GCC.

Not to mention he was at the recent GCC Summit and probably has a large
backlog of email to catch up with.

Hälsningar,

Peter





Re: GCC Binary

2010-08-06 Thread Peter Bergner
On Fri, 2010-08-06 at 12:27 -0700, Erick Garske wrote:
> Is there a location where I can download the binary of GCC for the IBM i?
> 
> http://gcc.gnu.org/install/binaries.html
> 
> Are any of these compatible for the IBM i at V6R1M0?

There is no support in GCC for native iSeries (AKA AS/400).


Peter





Re: A question about mov pattern

2010-06-24 Thread Peter Bergner
On Thu, 2010-06-24 at 08:57 -0600, Jeff Law wrote:
> On 06/24/10 02:02, Revital1 Eres wrote:
> > Hello,
> >
> > In the new target I'm working on there are branch regs and gprs.
> > The loads and store instructions are only to/from the gprs, so if a
> > branch reg needs to be spilled it first needs to be moved to a gpr and
> > then stored to memory.  I've implemented mov pattern in the machine
> > description file for the gprs and a mov pattern between gprs and branch
> > regs; however I am not sure if I need to add more to model the behavior
> > described above and if so how to do it.
> >
> Secondary reloads is the answer.
> 
> This isn't a terribly uncommon situation.  Handling of the shift 
> register  (SAR) on the PA would be a good example.  You can move the SAR 
> to/from a GPR, but SAR can not be stored directly to memory.  Searches 
> for SAR in pa.c will get you a long way.

The same is true for the condition register on PowerPC.
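
As a toy model of what a secondary reload expresses, the C sketch below answers one question: which intermediate register class is needed before a value can be stored to memory.  The enum and function names are hypothetical; the real mechanism is the target's secondary reload support, which this does not reproduce.

```c
#include <assert.h>

/* Hypothetical register classes for a target like the one described.  */
enum rc_sketch { RC_NONE, RC_GPR, RC_BRANCH };

/* Class needed as an intermediate when a value in class SRC must be
   stored to memory.  RC_NONE means a direct store exists.  */
enum rc_sketch intermediate_for_store(enum rc_sketch src)
{
  assert(src == RC_GPR || src == RC_BRANCH);
  return src == RC_BRANCH ? RC_GPR : RC_NONE;
}

/* Instructions needed for the spill: a store alone, or a register
   move followed by a store when an intermediate is required.  */
int spill_insn_count(enum rc_sketch src)
{
  return intermediate_for_store(src) == RC_NONE ? 1 : 2;
}
```

A branch register spill therefore costs a move to a GPR plus a store, while a GPR spill is a single store.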

Peter





Re: IRA undoing scheduling decisions

2009-09-02 Thread Peter Bergner
On Wed, 2009-09-02 at 11:49 -0400, Vladimir Makarov wrote:
> So it is probably worth doing update_equiv_regs as a separate pass.

Agreed.


> I'll submit a patch next week (sorry, I am a bit busy this week).

Sounds good.  Thanks for taking care of this!

Peter





Re: IRA undoing scheduling decisions

2009-09-01 Thread Peter Bergner
On Tue, 2009-09-01 at 16:46 -0400, Vladimir Makarov wrote:
> Peter Bergner wrote:
> > Were you going to whip that patch up or did you want me to?
> >
> I am going to do it by myself.

Great!  I'd like to see how your patch affects POWER6 performance.
Do you have access to a POWER6 box?  If not, can you send Pat and me
the patch and we'll fire off a run on our POWER6 benchmark system.
Thanks.

Peter





Re: IRA undoing scheduling decisions

2009-09-01 Thread Peter Bergner
On Tue, 2009-09-01 at 10:38 -0400, Vladimir Makarov wrote:
> We could do update_equiv_regs in a separate pass before the 1st insn 
> scheduling as it was before IRA.

IIRC, update_equiv_regs() was always called as part of local-alloc,
so it was always after sched1 even before IRA.  That said, moving it
to its own pass before sched1 sounds like an interesting idea.
My patch from the other note basically didn't affect SPEC2000 at all,
and we could use it, but if your idea works, I'm more than happy to
dump my patch. :)

Were you going to whip that patch up or did you want me to?

Peter





Re: IRA undoing scheduling decisions

2009-09-01 Thread Peter Bergner
On Wed, 2009-08-26 at 17:12 -0500, Peter Bergner wrote:
> On Wed, 2009-08-26 at 23:30 +0200, Richard Guenther wrote:
> > Hmm.  I suppose if you conditionalize it on flag_schedule_insns it might be
> > an overall win.  Care to SPEC test that change?
> 
> I assume you mean like the change below?  Yeah, I can SPEC test that.
> 
> Peter
> 
> 
> Index: ira.c
> ===
> --- ira.c (revision 15)
> +++ ira.c (working copy)
> @@ -2510,6 +2510,8 @@ update_equiv_regs (void)
>calls.  */
> 
> if (REG_N_REFS (regno) == 2
> +   && (!flag_schedule_insns
> +   || REG_BASIC_BLOCK (regno) < NUM_FIXED_BLOCKS)
> && (rtx_equal_p (x, src)
> || ! equiv_init_varies_p (src))
> && NONJUMP_INSN_P (insn)

Pat ran the patch on SPEC2000 and it was very neutral.  The overall
SPECFP number didn't change and the SPECINT number only improved by
0.2%, which is pretty much in the noise.

I think Vlad's suggestion of moving update_equiv_regs() to its own pass
before sched1 sounds interesting.  If that works, it's probably better
than this patch.

Peter





Re: IRA undoing scheduling decisions

2009-08-26 Thread Peter Bergner
On Wed, 2009-08-26 at 23:30 +0200, Richard Guenther wrote:
> On Wed, Aug 26, 2009 at 10:47 PM, Peter Bergner wrote:
> > Looking at update_equiv_regs(), if I disable the replacement for regs
> > that are local to one basic block (patch below) like it existed before
> > John Wehle's patch way back in Oct 2000:
> >
> >  http://gcc.gnu.org/ml/gcc-patches/2000-09/msg00782.html
> >
> > then we get the ordering we want.  Does anyone know why John removed
> > that part of the test in his patch?  Thoughts anyone?
> 
> Hmm.  I suppose if you conditionalize it on flag_schedule_insns it might be
> an overall win.  Care to SPEC test that change?

I assume you mean like the change below?  Yeah, I can SPEC test that.

Peter


Index: ira.c
===
--- ira.c   (revision 15)
+++ ira.c   (working copy)
@@ -2510,6 +2510,8 @@ update_equiv_regs (void)
 calls.  */
 
  if (REG_N_REFS (regno) == 2
+ && (!flag_schedule_insns
+ || REG_BASIC_BLOCK (regno) < NUM_FIXED_BLOCKS)
  && (rtx_equal_p (x, src)
  || ! equiv_init_varies_p (src))
  && NONJUMP_INSN_P (insn)
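
Read in isolation, the two added lines behave like the predicate sketched below in C.  It models only the REG_N_REFS test and the new clause, leaving out the remaining conditions of the real if statement.  SK_NUM_FIXED_BLOCKS is given the value 2 as an assumption about GCC's entry and exit blocks, and guard_allows_replacement() is a made-up name.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed: block indices below this do not name a single real block,
   so a register with such an index is not local to one basic block.  */
#define SK_NUM_FIXED_BLOCKS 2

/* True if the guarded replacement may still happen for a register
   with N_REFS references whose block index is REG_BASIC_BLOCK.  */
bool guard_allows_replacement(int n_refs, int reg_basic_block,
                              bool schedule_insns)
{
  assert(n_refs >= 0);
  if (n_refs != 2)
    return false;
  return !schedule_insns || reg_basic_block < SK_NUM_FIXED_BLOCKS;
}
```

With scheduling enabled, a register that lives in one real block no longer qualifies, which is what keeps the scheduler's ordering intact.  Without scheduling the old behavior is unchanged.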




Re: IRA undoing scheduling decisions

2009-08-26 Thread Peter Bergner
On Mon, 2009-08-24 at 23:56 +, Charles J. Tabony wrote:
> I am seeing a performance regression on the port I maintain, and I would 
> appreciate some pointers.
> 
> When I compile the following code
> 
> void f(int *x, int *y){
>   *x = 7;
>   *y = 4;
> }
> 
> with GCC 4.3.2, I get the desired sequence of instructions.  I'll call it 
> sequence A:
> 
> r0 = 7
> r1 = 4
> [x] = r0
> [y] = r1
> 
> When I compile the same code with GCC 4.4.0, I get a sequence that is lower 
> performance for my target machine.  I'll call it sequence B:
> 
> r0 = 7
> [x] = r0
> r0 = 4
> [y] = r0

This is caused by update_equiv_regs(), which IRA inherited from local-alloc.c.
Although you don't see the problem with gcc 4.3 and earlier, it is still there:
if you look at the 4.3 dumps, you will see that update_equiv_regs() undoes the
scheduler's ordering there too.  What saves us is that sched2 reschedules the
insns into the order we want.  With 4.4, IRA happens to reuse the same
register for both pseudos, so sched2's hands are tied and it cannot schedule
them back again for us.

Looking at update_equiv_regs(), if I disable the replacement for regs
that are local to one basic block (patch below) like it existed before
John Wehle's patch way back in Oct 2000:

  http://gcc.gnu.org/ml/gcc-patches/2000-09/msg00782.html

then we get the ordering we want.  Does anyone know why John removed
that part of the test in his patch?  Thoughts anyone?


Peter


Index: ira.c
===
--- ira.c   (revision 15)
+++ ira.c   (working copy)
@@ -2510,6 +2510,7 @@ update_equiv_regs (void)
 calls.  */
 
  if (REG_N_REFS (regno) == 2
+ && REG_BASIC_BLOCK (regno) < NUM_FIXED_BLOCKS
  && (rtx_equal_p (x, src)
  || ! equiv_init_varies_p (src))
  && NONJUMP_INSN_P (insn)





Re: Incorrect line info in printf for powerpc-eabisim -mhard-float

2009-07-21 Thread Peter Bergner
On Thu, 2009-07-16 at 13:55 -0700, Michael Eager wrote:
> I've tracked down a failure in gdb to hit a breakpoint
> set at printf to the breakpoint being placed incorrectly.
> 
> Here is the code generated for printf with -mhard-float:
> 
>   .loc 1 29 0
>   .cfi_startproc
> .LVL0:
>   mflr 0
>   stwu 1,-112(1)
> .LCFI0:
>   .cfi_def_cfa_offset 112
>   stw 5,24(1)
>   stw 0,116(1)
>   stw 6,28(1)
>   stw 7,32(1)
>   stw 8,36(1)
>   stw 9,40(1)
>   stw 10,44(1)
>   bne- 1,.L2  <<<  - 1
>   .cfi_offset 65, 4
>   .loc 1 29 0 <<<  - 2
>   stfd 1,48(1)<<<  - 3
>   stfd 2,56(1)
>   stfd 3,64(1)
>   stfd 4,72(1)
>   stfd 5,80(1)
>   stfd 6,88(1)
>   stfd 7,96(1)
>   stfd 8,104(1)
> .L2:
>   .loc 1 34 0
> 
> Gdb places a breakpoint at printf() at the stfd instruction (3).
> This appears to be because of the .loc at (2).  When the code is
> executed, the branch (1) is taken, jumping over the breakpoint.
> I think that the .loc at (2) should not be generated, since it is
> in the middle of the prologue code.

Luis, isn't there a bugzilla regarding this?  This seems to me to
be similar to what you had been looking at.

Peter





Re: (known?) Issue with bitmap iterators

2009-06-20 Thread Peter Bergner
On Sat, 2009-06-20 at 17:01 +0200, Richard Guenther wrote:
> On Sat, Jun 20, 2009 at 4:54 PM, Jeff Law wrote:
> >
> > Imagine a loop like this
> >
> > EXECUTE_IF_SET_IN_BITMAP (something, 0, i, bi)
> >  {
> >   bitmap_clear_bit (something, i)
> >   [ ... whatever code we want to process i, ... ]
> >  }
> >
> > This code is unsafe.

[snip]

> It is known (but maybe not appropriately documented) that deleting
> bits in the bitmap you iterate over is not safe.  If it would be me I would
> see if I could make it safe though.

FYI, that's what I did with the sparseset implementation, so:

  EXECUTE_IF_SET_IN_SPARSESET (something, i)
    {
      sparseset_clear_bit (something, i);
      [ ... whatever code we want to process i, ... ]
    }

is safe.  In fact, we use it for one of the special cases in
sparseset_and() and sparseset_and_compl().
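
For readers who have not looked at sparse sets, here is a self-contained C sketch of one way to make "clear the current element while iterating" safe: keep the members in a dense array and walk it backwards, so the element swapped into a freed slot is always one that was already visited.  This is an illustration under that assumption, not the code in GCC's sparseset implementation, and all ss_* names are made up.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define SS_UNIVERSE 64

struct sparse_set {
  unsigned sparse[SS_UNIVERSE];  /* element -> index into dense[] */
  unsigned dense[SS_UNIVERSE];   /* members, packed at the front */
  unsigned members;
};

void ss_init(struct sparse_set *s)
{
  memset(s, 0, sizeof *s);
}

bool ss_test(const struct sparse_set *s, unsigned e)
{
  assert(e < SS_UNIVERSE);
  unsigned i = s->sparse[e];
  return i < s->members && s->dense[i] == e;
}

void ss_set(struct sparse_set *s, unsigned e)
{
  if (ss_test(s, e))
    return;
  s->sparse[e] = s->members;
  s->dense[s->members++] = e;
}

void ss_clear(struct sparse_set *s, unsigned e)
{
  if (!ss_test(s, e))
    return;
  unsigned i = s->sparse[e];
  unsigned last = s->dense[s->members - 1];
  s->dense[i] = last;            /* move the last member into the hole */
  s->sparse[last] = i;
  s->members--;
}

/* Visit every member once, clearing the even ones as we go.
   Returns how many members were visited.  */
unsigned ss_visit_and_clear_even(struct sparse_set *s)
{
  unsigned visited = 0;
  for (unsigned i = s->members; i-- > 0; ) {
    unsigned e = s->dense[i];
    visited++;
    if (e % 2 == 0)
      ss_clear(s, e);            /* safe: the swapped-in member was already seen */
  }
  return visited;
}
```

Clearing is constant time, and the backward walk visits each original member exactly once even though the set shrinks underneath it.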


Peter





Re: Status of the DLX backend for GCC?

2008-10-07 Thread Peter Bergner
On Sat, 2008-10-04 at 18:48 +0200, Gerald Pfeifer wrote:
> Thanks for the background on this, Peter, and the background on this
> site disappearing.
> 
> The reason I asked was that we have that reference from our site to that
> URL and I failed to find any replacement so far.  The first two hits that
> I get in Google actually are mails by you in the gcc archives. ;-)
> 
> I guess we'll just have to remove that reference?

I talked with Aaron Sawdey and he still had the tarballs which he has
given me.  Let me go through a build process with them to make sure they
still build and then I'll post them somewhere you can link to.

Peter






Re: improving testsuite runtime

2008-09-18 Thread Peter Bergner
On Fri, 2008-09-19 at 09:41 +1000, Ben Elliston wrote:
> On Thu, 2008-09-18 at 10:44 -0600, Tom Tromey wrote:
> > Yeah, this seems necessary.  Ideally the order ought to be stable, too.
> 
> Do you think that the current order of .exps should be preserved in the
> resultant .sum and .logs?  I guess some people and/or build farms
> actually use diff rather than compare_tests?

Do people still use compare_tests?  Talking with Janis, she mentioned that
it wasn't multilib (i.e., RUNTESTFLAGS="--target_board=unix'{-m32,-m64}'")
compatible, but that test_summary was.  It's what I've been using to
compare two runs.

Peter





Re: IRA copy heuristics

2008-09-04 Thread Peter Bergner
On Thu, 2008-09-04 at 20:28 -0400, David Edelsohn wrote:
> On Thu, Sep 4, 2008 at 7:39 PM, Vladimir Makarov <[EMAIL PROTECTED]> wrote:
> > Meanwhile I am going to submit your second patch with an added
> > comment.  The patch permits gcc to generate the same quality code as
> > before your first patch.
> 
> Why?
> 
> As Richard said before:
> 
> "... it changes
> the heuristics _without any explanation of why this is necessary_.
> IMO, that's unacceptable for our shiny, new (and generally very nice)
> register allocator.  And I think it's unacceptable even if it happens
> to fix a performance regression."

I have to agree with Richard and David here.  I find it troubling that
allocation order affects performance by anything other than a small
amount due to heuristic noise.  In the end there may be a
valid reason why Richard's patch has a positive benefit, but until
we know why, I'd rather wait.

Peter





Re: Bootstrap failures on ToT, changes with no ChangeLog entry?

2008-07-24 Thread Peter Bergner
On Thu, 2008-07-24 at 18:48 +0200, Andreas Schwab wrote:
> Definitely something fishy around that time.  svn log says:
> 
> 
> r138082 | meissner | 2008-07-23 13:18:03 +0200 (Mi, 23 Jul 2008) | 1 line
> 
> Add missing ChangeLog from 138075
> 
> r138078 | meissner | 2008-07-23 13:06:42 +0200 (Mi, 23 Jul 2008) | 1 line
> 
> undo 138077
> 
> r138075 | meissner | 2008-07-23 12:28:06 +0200 (Mi, 23 Jul 2008) | 1 line
> 
> Add ability to set target options (ix86 only) and optimization options on a func
> 
> 
> And svn diff says:
> 
> $ svn diff -c138078
> svn: Unable to find repository location for '' in revision 138077
> $ svn diff -c138077
> svn: The location for '' for revision 138077 does not exist in the repository
> or refers to an unrelated object
> 
> Apparently the repository has some issues with revision 138077.

Maybe it's related to this #gcc comment:

 [snip]
   However, I did accidentally delete the trunk when I was trying to delete
   the branch, and did a copy from the previous version.  Is there any way on
   the svn pre-commits to prevent somebody deleting the trunk?

Peter





Re: Bad code generation on HPPA platform

2008-05-08 Thread Peter Bergner
On Thu, 2008-05-08 at 11:38 -0700, Steve Ellcey wrote:
> The pseudo for %r8 does have REG_POINTER set and the pseudo for %r19
> does not.  I first see REG_POINTER set for ivtmp___1536 (the pseudo for
> %r8) in flow.c.138r.loop2_invariant.  This seems interesting because
> Peter's patch, that fixes this problem without undoing Andrew's patch,
> includes a change to loop-invariant.c, though that change should be
> preserving REG_POINTER's during optimization not preventing them.

Similar to hppa, power6 cares about knowing whether a pseudo is a pointer
or not, because for regA + regB load/store addressing, we get much better
performance if regA is the pointer and regB is the offset rather than
the other way around.  What I found was that the loop invariant and
GCSE code were creating some pseudos to copy expressions into, but were
failing to copy the REG_POINTER/MEM_POINTER attribute along with it.
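
Here is the idea in miniature, as a C sketch with a mock pseudo record instead of rtx.  The struct and function names are hypothetical, and the pointer flag stands in for the REG_POINTER attribute.

```c
#include <assert.h>
#include <stdbool.h>

/* Mock pseudo register: just a number and a pointer flag.  */
struct pseudo_sketch {
  int regno;
  bool is_pointer;
};

/* Create a new pseudo to hold a copy of SRC, carrying the pointer
   flag along instead of dropping it.  */
struct pseudo_sketch copy_pseudo_with_attrs(const struct pseudo_sketch *src,
                                            int new_regno)
{
  assert(src != 0);
  struct pseudo_sketch p = { new_regno, src->is_pointer };
  return p;
}

/* For a regA + regB address, put the known pointer first.  */
void order_address_operands(struct pseudo_sketch *a, struct pseudo_sketch *b)
{
  assert(a != 0 && b != 0);
  if (!a->is_pointer && b->is_pointer) {
    struct pseudo_sketch tmp = *a;
    *a = *b;
    *b = tmp;
  }
}
```

If the copy dropped the flag, order_address_operands() would have nothing to go on and could leave the offset in the regA position.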

The hunk from:

  http://gcc.gnu.org/ml/gcc-patches/2008-04/msg00693.html

which replaced the rtlanal.c hunk from the first commit was needed at -O0,
because the only chance to order the operands at -O0 is at expand time.


Peter





Re: Bad code generation on HPPA platform

2008-05-07 Thread Peter Bergner
On Wed, 2008-05-07 at 11:03 -0700, Steve Ellcey wrote:
> > Can you please also add the replacement hunk from:
> > 
> >   http://gcc.gnu.org/ml/gcc-patches/2008-04/msg00693.html
> > 
> > If the first part gets backported, I'd like the second hunk to
> > go along with it if possible.  Thanks.
> > 
> > Peter
> 
> I was wondering about that patch since it seems to be related to the
> other changes.  I will include it in my 4.3 branch testing.

Yes, it ends up doing the same thing the rtlanal.c hunk that was reverted
did, but in a manner much more friendly to CRIS.  Thanks.


Peter





Re: Bad code generation on HPPA platform

2008-05-07 Thread Peter Bergner
On Wed, 2008-05-07 at 10:10 -0700, Steve Ellcey wrote:
> Yes, it looks like it is.  I added -fno-strict-aliasing and the perl
> benchmarks passed when compiled with ToT GCC.  That makes me feel better
> about the idea of putting Peter's patch (with the revert) on the 4.3
> branch as a way to fix the HPPA bad code generation bug.  I am going to
> test that patch on the branch and verify that it fixes my SPEC/GCC
> failure.

Can you please also add the replacement hunk from:

  http://gcc.gnu.org/ml/gcc-patches/2008-04/msg00693.html

If the first part gets backported, I'd like the second hunk to
go along with it if possible.  Thanks.

Peter





Re: Bad code generation on HPPA platform

2008-05-07 Thread Peter Bergner
On Wed, 2008-05-07 at 07:45 -0700, Steve Ellcey wrote:
> I have found that this problem does not occur on the ToT sources and
> that the problem went away with this patch:
> 
>   2008-04-07  Peter Bergner  <[EMAIL PROTECTED]>
> 
>PR middle-end/PR28690
>* rtlanal.c: Update copyright years.
>(commutative_operand_precedence): Give SYMBOL_REF's the same precedence
>as REG_POINTER and MEM_POINTER operands.
>* emit-rtl.c (gen_reg_rtx_and_attrs): New function.
>(set_reg_attrs_from_value): Call mark_reg_pointer as appropriate.
>* rtl.h (gen_reg_rtx_and_attrs): Add prototype for new function.
>* gcse.c: Update copyright years.
>(pre_delete): Call gen_reg_rtx_and_attrs.
>(hoist_code): Likewise.
>(build_store_vectors): Likewise.
>(delete_store): Likewise.
>* loop-invariant.c (move_invariant_reg): Likewise.
>Update copyright years.
> 
> I don't know if porting this patch to the 4.3 branch is an option or not
> but it might be the easiest way to fix this problem without having to
> revert Andrew's patch.

Note that the rtlanal.c:commutative_operand_precedence() hunk was reverted
because it caused some problems on CRIS and was replaced by the following
safer change:

http://gcc.gnu.org/ml/gcc-patches/2008-04/msg00693.html

Peter





Re: IRA for GCC 4.4

2008-04-28 Thread Peter Bergner
On Mon, 2008-04-28 at 18:07 -0400, Vladimir Makarov wrote:
> I am currently working on bit matrix compression.  It is not implemented 
> yet.  I hope it will be ready in a week.

Ahh, ok.  Well, hopefully the code I wrote on the trunk is useful for IRA.
If you have questions about it, let me know, or if you want me to look into
it on IRA, just point me to your current code that does this and I'll try
and take a look when I have some free cycles.

I'll note that the real key to eliminating the space from the bit matrix
isn't that we know two allocnos do not interfere, but rather that we know
we'll never test for whether they conflict or not.  Since our definition
of conflict is "live at the definition of another", that simply translates
into, if they're never simultaneously live, then we'll never call any
bit matrix routines asking whether they conflict or not, so we don't
need to reserve space for any conflict info.

The fact that local allocnos from different blocks are never simultaneously
live was just a very easy and inexpensive property to measure.  If your live
range info can easily and cheaply partition the allocnos into sets that are
and are not live simultaneously, then you should be able to see some further
reductions over what I'm seeing, which, as I think I've shown, can be considerable.
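
To put rough numbers on the partitioning argument, here is a small sketch in C. It assumes the only pairs that can ever be tested are global/global, global/local, and local/local within a single block; the function names are made up for the illustration and are not taken from global.c.

```c
#include <stddef.h>

/* Bits needed by a plain upper triangular matrix over n allocnos.  */
unsigned long
triangular_bits (unsigned long n)
{
  return n * (n - 1) / 2;
}

/* Bits needed when only these pairs can ever be tested:
   global/global, global/local, and local/local within one block.
   local_counts[i] is the number of allocnos local to block i.
   Hypothetical helper for illustration only.  */
unsigned long
compressed_bits (unsigned long num_globals,
                 const unsigned long *local_counts, size_t num_blocks)
{
  unsigned long bits = triangular_bits (num_globals);

  for (size_t i = 0; i < num_blocks; i++)
    {
      bits += num_globals * local_counts[i];       /* global vs. local */
      bits += triangular_bits (local_counts[i]);   /* local vs. local */
    }
  return bits;
}
```

With 3 globals and three blocks holding 3, 2 and 4 local allocnos (12 in total), the plain triangular matrix needs 66 bits while this count gives 40.  A finer partition, such as one derived from live ranges, would shrink the per-group terms further.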

Peter






Re: IRA for GCC 4.4

2008-04-28 Thread Peter Bergner
On Mon, 2008-04-28 at 16:01 -0400, Vladimir Makarov wrote:
> Thanks, Peter.  That was clever and email is very enlightening.  I have 
> analogous idea for more compact conflict matrix representation.  IRA 
> builds allocno live ranges first (they are ranges of program points 
> where the allocno lives).  I can use this information for fast searching 
> potential conflicts to sort the allocnos.  Probably the matrix will be 
> even more compact because live ranges contain more detail info than 
> basic blocks where the local allocnos live.  For example, the ranges 
> even can show that allocnos local in the same block will never 
> conflicts.  It means that matrix even for fppp can be compressed.

You say you use your analogous idea now?  Can you point me to the code?
I thought I heard you (maybe someone else?) say that your conflict information
was much bigger than old mainline.  If this is true and you are compacting
the bit matrix like I am, why is it so big?


> I tried to use sparsets for the same purposes (only for maintaining and 
> processing allocnos currently living).  But usage of sparsets for this 
> purposes gave practically nothing (I had to use valgrind lackey to see 
> the difference).  Therefore I decided not to introduce the additional 
> data and use just bitmaps for this.
> 
> Sparsets already exists in a compiler.  I am thinking about their usage 
> too.  May be you have a benchmark where the sparsets give a visible 
> compiler speed improvement (my favorite was combine.i).  I'd appreciate 
> if you point me such benchmark.  It could help me to make a decision to 
> use sparsets.

Yes, I added the sparseset implementation that has been in since gcc 4.3.
Did you use my sparseset implementation or did you write your own for your
tests?  I don't recall which file(s) I saw the difference on.  All I recall
is that I tried it both ways, saw a difference somewhere and promptly threw
the slower code away, along with any record of which file(s) showed the
difference.  Sorry I can't be of more help.

Given how sparsesets are implemented, I cannot see how they could ever be
slower than bitmaps for the use of "live", but I can see how they might be
faster.  That said, if your allocator is spending enough time elsewhere,
then I can easily imagine the difference being swamped such that you don't
see any difference at all.
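
For readers who have not looked at the data structure, below is a minimal sparse set sketch in the Briggs/Torczon style, which as far as I know is the design GCC's sparseset follows.  The names here are invented for the example and are not the sparseset.h API.  The point is that test, add, remove and clear are all constant time, and iterating the members touches only the live elements rather than the whole universe.

```c
#include <stdlib.h>

/* Minimal sparse set over the universe [0, size).  Illustrative only.  */
struct sparse_set
{
  unsigned *dense;    /* members, packed in dense[0 .. members-1] */
  unsigned *sparse;   /* sparse[e] = position of e in dense[] */
  unsigned members;
  unsigned size;
};

/* Returns nonzero on success.  calloc keeps the sketch free of
   uninitialized reads; the classic formulation does not require
   sparse[] to be initialized at all.  */
int
sparse_set_init (struct sparse_set *s, unsigned size)
{
  s->dense = calloc (size, sizeof *s->dense);
  s->sparse = calloc (size, sizeof *s->sparse);
  s->members = 0;
  s->size = size;
  return s->dense != NULL && s->sparse != NULL;
}

int
sparse_set_test (const struct sparse_set *s, unsigned e)
{
  unsigned idx = s->sparse[e];
  return idx < s->members && s->dense[idx] == e;
}

void
sparse_set_add (struct sparse_set *s, unsigned e)
{
  if (!sparse_set_test (s, e))
    {
      s->sparse[e] = s->members;
      s->dense[s->members++] = e;
    }
}

void
sparse_set_remove (struct sparse_set *s, unsigned e)
{
  if (sparse_set_test (s, e))
    {
      /* Move the last member into the vacated slot.  */
      unsigned idx = s->sparse[e];
      unsigned last = s->dense[--s->members];
      s->dense[idx] = last;
      s->sparse[last] = idx;
    }
}

void
sparse_set_clear (struct sparse_set *s)
{
  s->members = 0;   /* constant time, regardless of size */
}
```

A bitmap-based "live" set has to scan words to enumerate its members, whereas here the members are simply dense[0 .. members-1].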


Peter






Re: IRA for GCC 4.4

2008-04-25 Thread Peter Bergner
On Thu, 2008-04-24 at 20:23 -0400, Vladimir Makarov wrote:
> Hi, Peter.  The last time I looked at the conflict builder 
> (ra-conflict.c), I did not see the compressed matrix.  Is it in the 
> trunk?  What should I look at?

Yes, the compressed bit matrix was committed as revision 129037 on
October 5th, so it's been there a while.  Note that the old square
bit matrix was used not only for testing for conflicts, but also for
visiting an allocno's neighbors.  The new code (like all compilers I've
worked on/with) uses a {,compressed} upper triangular bit matrix for
testing for conflicts and an adjacency list for visiting neighbors.
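
As a toy illustration of that split, the sketch below keeps a triangular bit vector for "do A and B conflict?" queries and a per-allocno neighbor list for walking the graph.  The fixed size N, the struct layout and the function names are all assumptions made for the example; the real code in global.c is organized differently.

```c
#include <assert.h>

#define N 8                       /* number of allocnos in this toy graph */
#define NBITS (N * (N - 1) / 2)   /* pairs (low, high) with low < high */

struct conflict_graph
{
  unsigned char bits[(NBITS + 7) / 8];  /* answers conflict queries */
  int adj[N][N - 1];                    /* neighbor lists */
  int degree[N];                        /* length of each neighbor list */
};

/* Index of the unordered pair {a, b} in the triangular bit vector.  */
static int
pair_bit (int a, int b)
{
  int low = a < b ? a : b;
  int high = a < b ? b : a;

  assert (low != high);
  return high * (high - 1) / 2 + low;
}

int
conflicts_p (const struct conflict_graph *g, int a, int b)
{
  int bit = pair_bit (a, b);
  return (g->bits[bit / 8] >> (bit % 8)) & 1;
}

/* Record a conflict once: set the bit and extend both neighbor lists.  */
void
add_conflict (struct conflict_graph *g, int a, int b)
{
  int bit = pair_bit (a, b);

  if (conflicts_p (g, a, b))
    return;
  g->bits[bit / 8] |= 1u << (bit % 8);
  g->adj[a][g->degree[a]++] = b;
  g->adj[b][g->degree[b]++] = a;
}
```

The bit test keeps duplicate edges out of the neighbor lists, so visiting an allocno's neighbors costs time proportional to its degree rather than to N.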

The code that allocates and initializes the compressed bit matrix is in
global.c.  If you remember how an upper triangular bit matrix works, it's
just one big bit vector, where the bit number that represents the conflict
between allocnos LOW and HIGH is given by either of these two functions:

  1) bitnum = f(HIGH) + LOW
  2) bitnum = f(LOW) + HIGH

where:

  1) f(HIGH) = (HIGH * (HIGH - 1)) / 2
  2) f(LOW) = LOW * (max_allocno - LOW) + (LOW * (LOW - 1)) / 2 - LOW - 1

As mentioned in some of the conflict graph bit matrix literature (actually,
they only mention expression #1 above), the expensive functions f(HIGH) and
f(LOW) can be precomputed and stored in an array, so to access the conflict
graph bits only takes a load and an addition.  Below is an example bit matrix
with initialized array:
  
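              0    1    2    3    4    5    6    7    8    9   10   11
-----------------------------------------------------------------------
| -1 |  0 |    |  0 |  1 |  2 |  3 |  4 |  5 |  6 |  7 |  8 |  9 | 10 |
-----------------------------------------------------------------------
|  9 |  1 |    |    | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 |
-----------------------------------------------------------------------
| 18 |  2 |    |    |    | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 |
-----------------------------------------------------------------------
| 26 |  3 |    |    |    |    | 30 | 31 | 32 | 33 | 34 | 35 | 36 | 37 |
-----------------------------------------------------------------------
| 33 |  4 |    |    |    |    |    | 38 | 39 | 40 | 41 | 42 | 43 | 44 |
-----------------------------------------------------------------------
| 39 |  5 |    |    |    |    |    |    | 45 | 46 | 47 | 48 | 49 | 50 |
-----------------------------------------------------------------------
| 44 |  6 |    |    |    |    |    |    |    | 51 | 52 | 53 | 54 | 55 |
-----------------------------------------------------------------------
| 48 |  7 |    |    |    |    |    |    |    |    | 56 | 57 | 58 | 59 |
-----------------------------------------------------------------------
| 51 |  8 |    |    |    |    |    |    |    |    |    | 60 | 61 | 62 |
-----------------------------------------------------------------------
| 53 |  9 |    |    |    |    |    |    |    |    |    |    | 63 | 64 |
-----------------------------------------------------------------------
| 54 | 10 |    |    |    |    |    |    |    |    |    |    |    | 65 |
-----------------------------------------------------------------------
| NA | 11 |    |    |    |    |    |    |    |    |    |    |    |    |
-----------------------------------------------------------------------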

As an example, if we look at the interference between allocnos 8 and 10, we
compute "array[8] + 10" = "51 + 10" = "61", which if you look above, you will
see is the correct bit number for that interference bit.
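
The lookup just described can be sketched in a few lines of C.  This follows expression #2 directly; the function names are hypothetical and this is not the code in global.c.

```c
#include <assert.h>

/* Fill offsets[low] with f(LOW) from expression #2 for a matrix of
   max_allocno allocnos.  Hypothetical helper, not from global.c.  */
void
fill_row_offsets (long *offsets, int max_allocno)
{
  for (int low = 0; low < max_allocno; low++)
    offsets[low] = (long) low * (max_allocno - low)
                   + ((long) low * (low - 1)) / 2 - low - 1;
}

/* Bit number for the conflict between LOW and HIGH, with LOW < HIGH.
   After the precomputation this is one load and one addition.  */
long
conflict_bitnum (const long *offsets, int low, int high)
{
  assert (low < high);
  return offsets[low] + high;
}
```

With max_allocno = 12 the offsets come out as -1, 9, 18, 26, ... as in the left-hand column of the table, and conflict_bitnum (offsets, 8, 10) gives 61.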

The difference between a compressed upper triangular bit matrix and a standard
upper triangular bit matrix like the one above is that we eliminate space from the
bit matrix for conflicts we _know_ can never exist.  The easiest case to catch,
and the only one we catch at the moment, is that allocnos that are "local" to a
basic block B1 cannot conflict with allocnos that are local to basic block B2,
where B1 != B2.  To take advantage of this fact, I updated the code in global.c
to sort the allocnos such that all "global" allocnos (allocnos that are live in
more than one basic block) are given smaller allocno numbers than the "local"
allocnos, and all allocnos for a given basic block are grouped together in a
contiguous range of allocno numbers.  The sorting is accomplished by:

  /* ...so we can sort them in the order we want them to receive
     their allocnos.  */
  qsort (reg_allocno, max_allocno, sizeof (int), regno_compare);
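
To make the ordering concrete, here is a sketch of a comparator with the properties described above: globals first, then locals grouped by basic block.  The struct pseudo_info record, the GLOBAL_BLOCK marker and the name pseudo_compare are invented for this example; the actual regno_compare in global.c works on different data.

```c
#include <stdlib.h>

/* Marks a pseudo that is live in more than one basic block.  Being
   negative, it sorts ahead of every real block index.  */
#define GLOBAL_BLOCK (-1)

/* Hypothetical per-pseudo record for the illustration.  */
struct pseudo_info
{
  int regno;   /* pseudo register number */
  int block;   /* owning basic block, or GLOBAL_BLOCK */
};

/* qsort comparator: globals first, then locals grouped by block,
   with regno as a tie-breaker so the order is deterministic.  */
int
pseudo_compare (const void *a, const void *b)
{
  const struct pseudo_info *pa = a;
  const struct pseudo_info *pb = b;

  if (pa->block != pb->block)
    return pa->block < pb->block ? -1 : 1;
  return (pa->regno > pb->regno) - (pa->regno < pb->regno);
}
```

After sorting an array of such records with qsort, the array index can serve as the allocno number, which gives each basic block a contiguous range.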

Once we have them sorted, our conceptual view of the compressed bit matrix
will now look like:

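                G    G    G   B0   B0   B0   B1   B1   B2   B2   B2   B2

                0    1    2    3    4    5    6    7    8    9   10   11
-------------------------------------------------------------------------
| -1 |G   0 |    |  0 |  1 |  2 |  3 |  4 |  5 |  6 |  7 |  8 |  9 | 10 |
-------------------------------------------------------------------------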

Re: IRA for GCC 4.4

2008-04-24 Thread Peter Bergner
On Thu, 2008-04-24 at 16:33 -0500, Peter Bergner wrote:
> On Thu, 2008-04-24 at 16:51 +0200, Paolo Bonzini wrote:
> > >> (The testcase is 400k lines of preprocessed Fortran code, 16M is size,
> > >> available here:
> > >> http://www.pci.unizh.ch/vandevondele/tmp/all_cp2k_gfortran.f90.gz)
> > >>
> > >>   
> > > Thanks, I'll check it.
> > 
> > Vlad, I think you should also try to understand what does trunk do with 
> >   global (and without local allocation) at -O0.  That will give a 
> > measure of the benefit from Peter's patches for conflict graph building.
> 
> I took a patch from Ken/Steven that disabled local_alloc and instead runs
> global_alloc() at -O0 and summing up all of the bit matrix allocation
> info we emit into the *.greg output, the new conflict builder saves a lot
> of space compared to the old square bit matrix (almost 20x less space).
> Here's the accumulated data for the test case above:
> 
> compressed upper triangular:  431210251 bits,   53902848 bytes
> upper triangular:            4264666581 bits,  533084851 bytes
> square:                      8531372796 bits, 1066423618 bytes

The SPEC2000 numbers look even better (29x less space):

compressed upper triangular:  281657797 bits,   35212532 bytes
upper triangular:            4094809686 bits,  511856604 bytes
square:                      8191641644 bits, 1023962188 bytes

Peter





Re: IRA for GCC 4.4

2008-04-24 Thread Peter Bergner
On Thu, 2008-04-24 at 16:51 +0200, Paolo Bonzini wrote:
> >> (The testcase is 400k lines of preprocessed Fortran code, 16M is size,
> >> available here:
> >> http://www.pci.unizh.ch/vandevondele/tmp/all_cp2k_gfortran.f90.gz)
> >>
> >>   
> > Thanks, I'll check it.
> 
> Vlad, I think you should also try to understand what does trunk do with 
>   global (and without local allocation) at -O0.  That will give a 
> measure of the benefit from Peter's patches for conflict graph building.

I took a patch from Ken/Steven that disabled local_alloc and instead runs
global_alloc() at -O0 and summing up all of the bit matrix allocation
info we emit into the *.greg output, the new conflict builder saves a lot
of space compared to the old square bit matrix (almost 20x less space).
Here's the accumulated data for the test case above:

compressed upper triangular:  431210251 bits,   53902848 bytes
upper triangular:            4264666581 bits,  533084851 bytes
square:                      8531372796 bits, 1066423618 bytes

Peter




