Re: Stepping up as maintainer for ia64
On 3/8/24 5:28 PM, Jonathan Wakely wrote: > On Fri, 8 Mar 2024 at 22:35, Frank Scheiner via Gcc wrote: >> >> On 08.03.24 23:00, Peter Bergner wrote: >>> On 3/8/24 7:16 AM, Richard Biener via Gcc wrote: >>>> I CCed Jeff who is on the commitee to forward the maintainer proposal >>>> though I guess this will not go forward as a first step. Instead >>>> you are probably expected to show activity on the port, for example >>>> post the patch series to make ia64 use LRA, get write access to the >>>> git repository and then be promoted maintainer. >>> >>> One other method for showing activity is posting regular testsuite >>> results on the gcc-testresults mailing list to show the community >>> the port is "working". >> >> I don't want to spam this or the other list each and every week, but I > > Sending test results to the gcc-testresults list is **not** spamming, > that's what the list is for! 100% agree! If you look at what we (IBM) post, we roughly post somewhere around 7 testsuite results per day due to runs on different hardware, endianness and OS (Linux versus AIX). So spam ...err... post away! > If you're testing uncommon targets (e.g. ia64-linux) then sending test > results to the list is essential so we know the target builds, because > nobody else is testing it. Again, 100% agree! Peter
Re: [PATCH] fix PowerPC < 7 w/ Altivec not to default to power7
On 3/8/24 5:30 AM, Jonathan Wakely via Gcc wrote: > Patches should be sent to the gcc-patches list instead of this one, > and should be against trunk not an old gcc-11 RC. See > https://gcc.gnu.org/contribute.html#patches for more details - thanks! And you need to CC the rs6000/powerpc port maintainers which you can find along with their preferred email addresses in the MAINTAINERS file. If you don't CC them, they may miss seeing the patch. Peter
Re: Stepping up as maintainer for ia64
On 3/8/24 7:16 AM, Richard Biener via Gcc wrote: > I CCed Jeff who is on the commitee to forward the maintainer proposal > though I guess this will not go forward as a first step. Instead > you are probably expected to show activity on the port, for example > post the patch series to make ia64 use LRA, get write access to the > git repository and then be promoted maintainer. One other method for showing activity is posting regular testsuite results on the gcc-testresults mailing list to show the community the port is "working". Peter
Re: [RFC Linux patch] powerpc: add documentation for HWCAPs
On 5/20/22 12:15 AM, Nicholas Piggin via Gcc wrote: > +PPC_FEATURE_HAS_ALTIVEC > +Vector (aka Altivec, VSX) facility is available. Slight typo. s/VSX/VMX/ Peter
Re: [power-ieee128] What should the math functions be annotated with?
On 12/4/21 11:40 AM, Thomas Koenig wrote: > OK, what I have now is > > tkoenig@gcc-fortran:~$ echo $PATH > /home/tkoenig/bin:/opt/at15.0/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin > tkoenig@gcc-fortran:~$ echo $LD_LIBRARY_PATH > /home/tkoenig/lib64 > > I generally use LD_LIBRARY_PATH to point to where the shared > libgfortran and other libraries is installed. > > However, this breaks man (and I don't know what else): So LD_LIBRARY_PATH is searched before the directories in ld.so.cache, so you end up picking up some "new" libs from /home/tkoenig/lib64 and some of these rely on the newer libs in AT15. However, man and some of the other system binaries use the system dynamic linker, so they search first through LD_LIBRARY_PATH and, not finding something there, fall back to /etc/ld.so.cache, which doesn't have the newer AT15 libs, so you hit errors. Instead of setting LD_LIBRARY_PATH=/home/tkoenig/lib64 could you try setting it to LD_LIBRARY_PATH='$ORIGIN/lib64' instead? This would allow the other system binaries to not find your /home/tkoenig/lib64 directory so they'd behave normally. However, any binary that was compiled in a directory where your lib64/ exists would find your new libs and use them. I'm not sure if that cramps your testing or not, to limit yourself to compiling your tests in that one directory. If that doesn't work, could you instead not set LD_LIBRARY_PATH and instead compile using -L/home/bergner/lib64 -R/home/bergner/lib64 ? Peter
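For the last suggestion (link with -L/-R instead of exporting LD_LIBRARY_PATH), a minimal sketch of how the flags could be generated. The helper name rpath_flags is made up for illustration, and -Wl,-rpath,DIR is used here as the spelling that hands the directory to the GNU linker explicitly; whether the driver accepts a bare -R may depend on the target.

```shell
#!/bin/bash
# Hypothetical helper (not part of GCC): print link flags that find the
# library at link time (-L) and also record its directory in the binary's
# run-time search path (-Wl,-rpath), so LD_LIBRARY_PATH can stay unset.
rpath_flags() {
    printf '%s' "-L$1 -Wl,-rpath,$1"
}

# Example use (not run here), with the lib64 directory from the thread:
#   gfortran test.f90 $(rpath_flags /home/tkoenig/lib64)
```

Because the search path is stored in the test binary itself, system programs such as man never see the extra directory.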
Re: [power-ieee128] What should the math functions be annotated with?
On 12/4/21 10:19 AM, Jakub Jelinek wrote: > But when Thomas is working on the vanilla gcc tree, trying to make it work > for Fortran, I think he'll need to patch that gcc tree too to use the > AT15's dynamic linker and rpath like the AT15 gcc is. That is part of the magic that happens when you configure with --with-advance-toolchain=at15.0, it forces the gcc to use AT15's dynamic linker and AT15's ld.so.cache makes it so that the dynamic linker finds AT15's libs etc. Peter
Re: [power-ieee128] What should the math functions be annotated with?
On 12/4/21 9:37 AM, Peter Bergner wrote: > On 12/4/21 9:25 AM, Michael Meissner wrote: > ubuntu@gcc-fortran:/home/tkoenig/Tst$ ldd ./a.out > ./a.out: /lib/powerpc64le-linux-gnu/libc.so.6: version `GLIBC_2.34' not found > (required by ./a.out) > linux-vdso64.so.1 (0x7f633962) > libc.so.6 => /lib/powerpc64le-linux-gnu/libc.so.6 (0x7f63393d) > /opt/at15.0/lib64/ld64.so.2 => /lib64/ld64.so.2 (0x7f633964) To go into a little more depth, the important thing is your a.out was linked with the correct loader: ubuntu@gcc-fortran:/home/tkoenig/Tst$ readelf -l a.out | grep interpreter [Requesting program interpreter: /opt/at15.0/lib64/ld64.so.2] ...and the error message you saw was a good thing, it showed your a.out was expecting to see the newer GLIBC 2.34 and didn't. The reason it didn't was that the system ldd which you used does some magic and overrides the a.out runtime loader with the system loader and that loader uses its own ld.so.cache which doesn't include AT15's library paths. The AT15 loader has its own /opt/at15.0/etc/ld.so.cache which includes its lib dirs as well as the system lib dirs. This way, the AT15 libs are found first and any library AT15 doesn't provide is automatically picked up from the system. As long as you keep the AT15 bin path before the system bin dirs, you should be fine. Peter
Re: [power-ieee128] What should the math functions be annotated with?
On 12/4/21 9:25 AM, Michael Meissner wrote: > On Sat, Dec 04, 2021 at 02:42:13PM +0100, Thomas Koenig wrote: > Note, the system ldd does not tend to accurately report the library > dependencies for AT libraries: And using AT15's ldd, it shows your a.out is linked to the correct libc: ubuntu@gcc-fortran:/home/tkoenig/Tst$ ldd ./a.out ./a.out: /lib/powerpc64le-linux-gnu/libc.so.6: version `GLIBC_2.34' not found (required by ./a.out) linux-vdso64.so.1 (0x7f633962) libc.so.6 => /lib/powerpc64le-linux-gnu/libc.so.6 (0x7f63393d) /opt/at15.0/lib64/ld64.so.2 => /lib64/ld64.so.2 (0x7f633964) ubuntu@gcc-fortran:/home/tkoenig/Tst$ /opt/at15.0/bin/ldd ./a.out linux-vdso64.so.1 (0x7158fb1c) libc.so.6 => /opt/at15.0/lib64/power9/libc.so.6 (0x7158faf4) /opt/at15.0/lib64/ld64.so.2 (0x7158fb1e) What I would do is place /opt/at15.0/bin as the 2nd directory in your PATH, with your new GCC install dir being first. That way, things should be seamless for you. Peter
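The PATH ordering suggested above could be set up roughly as follows. This is only a sketch: GCC_PREFIX is a made-up install prefix for the newly built GCC, so substitute whatever --prefix the build was actually configured with.

```shell
#!/bin/bash
# GCC_PREFIX is a hypothetical install prefix for the freshly built GCC.
GCC_PREFIX=/home/tkoenig/gcc-install

# New GCC first, the AT15 tools (including its ldd) second,
# and the existing system directories after that.
PATH="$GCC_PREFIX/bin:/opt/at15.0/bin:$PATH"
export PATH
```

With this ordering a plain "ldd ./a.out" resolves to /opt/at15.0/bin/ldd rather than the system ldd, provided the new GCC's bin directory does not ship its own ldd.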
Re: How to describe ‘earlyclobber’ explicitly for specific source operand ?
On 11/19/21 1:28 AM, Jojo R via Gcc wrote: > We know gcc supply earlyclobber function to avoid register overlap, > > but it can not describe explicitly for specific source operand, is it > right ? You add the early clobber to the OUTPUT operand(s) that can clobber any of the input source operands. You don't mark the source operands that could be clobbered. Peter
Re: libgfortran.so SONAME and powerpc64le-linux ABI changes
On 10/6/21 12:50 PM, Segher Boessenkool wrote: > So we have three options (well, four): > > 0) Do nothing. We will stay in this hell forever. Not my choice :-) > 1) Use a soft-float-like parameter passing everywhere. This works but >will be horridly slow on newer systems. We can do better than that. > 2) Use the current setup where -mcpu=power8 (or later) makes QP float >available. Most BE stuff isn't compiled with that currently, and it >will split our ecosystem. > 3) As Joseph reminds me the high VSRs are the VRs, so we could use the >same parameter passing on anything with AltiVec. We could even >simply require -maltivec for QP float to be supported (we currently >require -mvsx, this would not be a restriction). > > I think I like 3) :-) I like 3 too, meaning requiring -maltivec to support IEEE QP at all. This would cover POWER6 and later server CPUs, as well as some other cpus like in the Power Macs. Anything without Altivec hardware would need to either not support IEEE QP at all, or go through the work themselves of coming up with a -msoft-altivec like ABI. Peter
Re: GCC trunk commit a325bdd195ee96f826b208c3afb9bed2ec077e12
On 6/16/21 1:32 PM, Uros Bizjak wrote: > On Wed, Jun 16, 2021 at 6:08 PM Liu Hao wrote: >> It looks like Uroš was on 00d07ec6e12, committed his changes mistakenly with >> `git commit --amend` >> (which changed the commit message but did not reset the author), then >> rebased the modified commit >> onto ee52bf609bac. Git is smart enough to drop duplicate changes, but the >> leftovers formed a new >> commit, which was exactly a325bdd195e. > > Indeed, IIRC - contrib/gcc_update failed due to the unresolved merge, > and I changed my commit with --amend. There were some issues, but I > was under the impression that I fixed them. It looks like I forgot > something, so the result is the commit with wrong author attribution. > > Perhaps a notice in the documentation should be added what to do if > contrib/gcc_update fails, or perhaps this script should be made more > robust. I admit, that if the same thing happened to me, I would have made the same mistake...or worse :-), so yeah, a comment about what to do to "fix" things when gcc_update fails would be greatly appreciated by me too! Peter
GCC trunk commit a325bdd195ee96f826b208c3afb9bed2ec077e12
Hi all, I recently did a search on a git log of gcc trunk looking for a particular commit of mine, so was searching for my name, and I came across a commit from Uroš that lists me as the Author. I did not author that commit and talking with Uroš offline, he assures me that he didn't use --author when committing that, so we're wondering whether there might be a bug in one of the commit hooks. Is there someone who can dig into the commit below and try to find out how the author field was incorrectly set? Peter commit a325bdd195ee96f826b208c3afb9bed2ec077e12 Author: Peter Bergner AuthorDate: Thu Jun 10 13:54:12 2021 -0500 Commit: Uros Bizjak CommitDate: Thu Jun 10 23:55:24 2021 +0200 i386: Add V8QI and other 64bit vector permutations [PR89021] In addition to V8QI permutations, several other missing permutations are added for 64bit vector modes for TARGET_SSSE3 and TARGET_SSE4_1 targets. 2021-06-10 Uroš Bizjak gcc/ PR target/89021 * config/i386/i386-expand.c (ix86_split_mmx_punpck): Handle V2SF mode. Emit SHUFPS to fixup unpack-high for V2SF mode. (expand_vec_perm_blend): Handle 64bit modes for TARGET_SSE4_1. (expand_vec_perm_pshufb): Handle 64bit modes for TARGET_SSSE3. (expand_vec_perm_pblendv): Handle 64bit modes for TARGET_SSE4_1. (expand_vec_perm_interleave2): Handle 64bit modes. (expand_vec_perm_even_odd_pack): Handle V8QI mode. (expand_vec_perm_even_odd_1): Ditto. (ix86_vectorize_vec_perm_const): Ditto. * config/i386/i386.md (UNSPEC_PSHUFB): Move from ... * config/i386/sse.md: ... here. * config/i386/mmx.md (*vec_interleave_lowv2sf): New insn_and_split pattern. (*vec_interleave_highv2sf): Ditto. (mmx_pshufbv8qi3): New insn pattern. (*mmx_pblendw): Ditto.
Re: D build on powerpc broken (was Re: GCC 11.1 Release Candidate available from gcc.gnu.org)
On 4/20/21 4:20 PM, Jakub Jelinek via Gcc wrote: > On Tue, Apr 20, 2021 at 03:27:08PM -0500, William Seurer via Gcc wrote: >> /tmp/cc8zG8DV.s: Assembler messages: >> /tmp/cc8zG8DV.s:2566: Error: unsupported relocation against r13 >> /tmp/cc8zG8DV.s:2570: Error: unsupported relocation against r14 [snip] > So do we need to change > +else version (PPC) > > > +{ > > > +void*[19] regs = void; > > > +asm pure nothrow @nogc > > > +{ > > > +"stw r13, %0" : "=m" (regs[ 0]); > > > +"stw r14, %0" : "=m" (regs[ 1]); > > > ... > +else version (PPC64) > > > +{ > > > +void*[19] regs = void; > > > +asm pure nothrow @nogc > > > +{ > > > +"std r13, %0" : "=m" (regs[ 0]); > > > +"std r14, %0" : "=m" (regs[ 1]); > > > ... > to "stw 13, %0" and "std 13, %0" etc. unconditionally, or > to "stw %%r13, %0" etc. under some conditions? Yes, I think so. The "r13", etc. names are not accepted by gas unless you use the -mregnames option. It's easier to just remove the 'r'. Peter
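To illustrate what "just remove the 'r'" amounts to for the quoted lines, here is a small sketch. The helper name strip_reg_prefix is made up, and the pattern only covers the stw/std forms shown in the quoted patch; it is not meant as the actual fix that went into the tree.

```shell
#!/bin/bash
# Hypothetical helper: drop the "r" prefix from the register operand of
# stw/std lines so that gas accepts them without -mregnames.
strip_reg_prefix() {
    sed -E 's/(stw|std) r([0-9]+),/\1 \2,/'
}

# One of the quoted lines, before and after:
printf '%s\n' '"std r13, %0" : "=m" (regs[ 0]);' | strip_reg_prefix
# prints: "std 13, %0" : "=m" (regs[ 0]);
```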
Re: subversion status on gcc.gnu.org
On 3/24/20 12:06 PM, Frank Ch. Eigler wrote: >> Thanks for working on this!!! However, I still see at least one issue >> in the following bugzilla entry: >> >> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94123#c4 >> >> The first two git style links work, but the last one which points >> to the SVN revision doesn't. Is that a bug in the actual url that >> bugzilla added or can we handle these too? > > We can/do handle the last one too. httpd mod_rewrite is powerful. Works now. Thanks for fixing! Peter
Re: subversion status on gcc.gnu.org
On 3/20/20 12:37 PM, Frank Ch. Eigler via Gcc wrote: > Hi - > > Both svn: and ssh+svn: now work for your archeological needs. > Further, URLs such as > > https://gcc.gnu.org/viewcvs?rev=279160&root=gcc&view=rev > https://gcc.gnu.org/r123456 > > are mapped to gitweb searches that try to locate the matching > From-SVN: rABCDEF commit. This way, historical URLs from bugzilla > should work. > > If you badly need something else subversionish, please let me know. Thanks for working on this!!! However, I still see at least one issue in the following bugzilla entry: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94123#c4 The first two git style links work, but the last one which points to the SVN revision doesn't. Is that a bug in the actual url that bugzilla added or can we handle these too? Peter
Re: Merges from release branches to vendor tracking branches
On 1/23/20 12:09 PM, Peter Bergner wrote: > On 1/23/20 4:29 AM, Jakub Jelinek wrote: >> so it is not a fast forward merge and we have the requirement that >> From-SVN: shouldn't appear in commit logs of new commits. > > So I just did "git merge releases/gcc-9" into our branch and I'm not > seeing any From-SVN: in any of the commit messages. Where/how are > you seeing those? Actually, I see them now. I'm not sure what happened before. So Joseph said these are actually ok on the vendor branches, but what was the original concern with them being there in the commit logs? Is it still useful to remove them? Peter
Re: Merges from release branches to vendor tracking branches
On 1/23/20 4:29 AM, Jakub Jelinek wrote: > Just FYI if somebody needs to do something similar, I needed to do a merge > from origin/releases/gcc-9 to our vendor branch - > refs/vendors/redhat/heads/gcc-9-branch > This branch has some extra commits origin/releases/gcc-9 branch doesn't > have, This is good timing, as I'd like to do the same for our IBM 9 branch refs/vendors/ibm/heads/gcc-9-branch. > so it is not a fast forward merge and we have the requirement that > From-SVN: shouldn't appear in commit logs of new commits. So I just did "git merge releases/gcc-9" into our branch and I'm not seeing any From-SVN: in any of the commit messages. Where/how are you seeing those? Peter
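One way to check whether a merge brought in commits carrying a From-SVN: line is to grep the commit messages of the merged range. This is a sketch, not an official GCC script; the helper name list_from_svn is made up, and ORIG_HEAD is simply what git records as the branch tip from before the merge.

```shell
#!/bin/bash
# Hypothetical helper: print one-line summaries of the commits in a revision
# range whose commit message contains a "From-SVN:" line.
list_from_svn() {
    git log --oneline --grep='From-SVN:' "$1"
}

# Example use (not run here), right after "git merge releases/gcc-9":
#   list_from_svn ORIG_HEAD..HEAD
```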
Re: git conversion in progress
On 1/22/20 3:26 AM, Gerald Pfeifer wrote: > On Mon, 13 Jan 2020, Joseph Myers wrote: >> In addition, once git.html is more complete (has the list of branches >> added, at least) we need to update the GCC home page to link to the new >> pages in place of those for SVN, redirect the old pages to the new ones, >> and generally update references to SVN in wwwdocs and the GCC manuals. > > I have removed all references to svnwrite.html and svn.html from our > own pages, added redirects to gitwrite.html and git.html, respectively, > and after svnwrite.html a few days ago now also removed svn.html. The rsync.html page can be removed too, since that was a way to download the entire svn repo. With git clone, you get the entire repo, so rsync isn't needed anymore. Peter
Help with new GCC git workflow...
As somewhat of a git newbie and given gcc developers will do a git push of our changes rather than employing a git pull development model, I'd like a little hand holding on what my new gcc git workflow should be, so I don't screw up the upstream repo by pushing something to the wrong place. :-) I know enough that I should be using local branches to develop my changes, so I want something like: git checkout master git pull git checkout -b git commit -m "My commit message1" git commit -m "My commit message2" git commit -m "My commit message3" At this point, I get a little confused. :-) I know to submit my patch for review, I'll want to squash my commits down into one patch, but how does one do that? Should I do that now or only when I'm ready to push this change to the upstream repo or ??? Do I need to even do that? Also, when I'm ready to push this "change" upstream to trunk, I'll need to move this over to my master and then push. What are the recommended commands for doing that? I assume I need to rebase my branch to current upstream master, since that probably has moved forward since I checked my code out. Also, at what point do I write my final commit message, which is different than the (possibly simple) commit messages above? Is that done after I've pulled my local branch into my master? ...or before? ...or during the merge over? ...and this is just for changes going to trunk. How does all this change when I want to push changes to a release or vendor branch? I guess I'm just looking for some simple workflow commands for both trunk and release/vendor branches I can follow until I'm a little more confident in my git knowledge. I'm guessing I'm not the only one who would like this info, so maybe someone can add this to our wiki? Peter
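For the squashing question, one commonly used non-interactive approach is a soft reset to the fork point followed by a single new commit, which is also the moment to write the final commit message. The sketch below is a generic git technique, not an official GCC workflow; squash_topic is a made-up helper name, and the push step is deliberately left out because the right target depends on the repository's conventions.

```shell
#!/bin/bash
# Sketch only: squash every commit made on the current topic branch since it
# forked from BASE (default: origin/master) into one commit carrying the
# final commit message. squash_topic is a hypothetical helper name.
squash_topic() {
    local msg="$1" base="${2:-origin/master}"
    local fork
    fork=$(git merge-base "$base" HEAD) || return 1
    # --soft keeps the combined changes staged and only moves the branch tip.
    git reset --soft "$fork" &&
        git commit -q -m "$msg"
}

# Example use (not run here), on the topic branch:
#   git fetch origin
#   git rebase origin/master
#   squash_topic "Final commit message with ChangeLog entries"
```

"git rebase -i origin/master" reaches the same result interactively, by marking all but the first commit as "squash" or "fixup".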
Re: BountySource campaign for gcc PR/91851
On 10/30/19 2:31 PM, Georg-Johann Lay wrote: > Hi, have the cc0 backends been deprecated? > > I didn't follow the lists for some time... At least neither v9 or v10 > release notes caveats mention such deprecation, neither is there > respective PRs for the cc0 targets. https://gcc.gnu.org/ml/gcc-patches/2019-09/msg01256.html Peter
Re: Question regarding constraint usage within inline asm
On 2/20/19 9:39 PM, Alan Modra wrote: > On Wed, Feb 20, 2019 at 08:57:52PM -0600, Peter Bergner wrote: >> Yes, because they don't have my IRA and LRA patches that exposed this >> problem. I would say they were buggy for not complaining and silently >> spilling a hard register in the case where we used asm reg("..."). > > I don't follow your reasoning. It seems to me that giving some > variable a register asm doesn't mean that the value of that variable > can't appear in some other register. An obvious example is when > passing that variable to a function. I don't disagree with you here. For sure, multiple registers can hold the same value, the same way that multiple variables can hold the same value. > So why shouldn't a hard reg be reloaded in order to satisfy > incompatible constraints? About the only usage of register asm that is guaranteed, is their usage in inline asm. If you specify a hard register for a variable and then use that variable in an inline asm, you are guaranteed that that variable will use that register in the inline asm. Now in this case, "input" doesn't have the register asm, but asmcons rewrites the rtl such that it looks like "input" was assigned via a register asm. LRA doesn't know about register asms, it just sees pseudos and hard registers, so I think it needs to be conservative and assume the explicit hard registers it sees could have come from a register asm, and not spill it, but rather error out and let the user fix it. That said, the "bug" in the case we're seeing, is that asmcons rewrote all of "input"'s pseudos, and it should be more careful to not create rtl with illegal constraint usage that LRA cannot fix up. With the fix, operand %1 in the inline asm is no longer hard coded to r3 and it uses the pseudo instead, so everything is copacetic. Peter
Re: Question regarding constraint usage within inline asm
On 2/20/19 4:04 PM, Alan Modra wrote: > On Wed, Feb 20, 2019 at 10:08:07AM -0600, Peter Bergner wrote: >> On 2/19/19 9:09 PM, Alan Modra wrote: >> That said, talking with Segher and Uli offline, they both think the >> inline asm usage in the test case should be legal > > Good, it seems we are in agreement. Incidentally, the single pseudo > for the inputs happens even for testcases like > > long input; > long > bug (void) > { > register long output /* asm ("r3") */; > asm ("blah %0, %1, %2" : "=r" (output) : "wi" (input), "0" (input)); > return output; > } This is a different problem than I'm fixing, but you are correct that asmcons shouldn't replace operand %1 since its constraint is not compatible with that of the output operand. In this case, it's probably "ok" to spill even though it's a hard register, because it doesn't match the regclass it is supposed to have. I'm not sure how important this is to fix. I can also imagine that this would be hard to handle, since we'd have to call into the backend to see whether the two constraints are compatible and with the overlap between different constraints, that could be very very messy! Peter
Re: Question regarding constraint usage within inline asm
On 2/20/19 4:19 PM, Alan Modra wrote: > I forgot to say, gcc-6, gcc-7 and gcc-8 handle your original testcase > with the register asm just fine. Yes, because they don't have my IRA and LRA patches that exposed this problem. I would say they were buggy for not complaining and silently spilling a hard register in the case where we used asm reg("..."). Peter
Re: Question regarding constraint usage within inline asm
On 2/19/19 9:09 PM, Alan Modra wrote: > On Mon, Feb 18, 2019 at 01:13:31PM -0600, Peter Bergner wrote: >> long input; >> long >> bug (void) >> { >> register long output asm ("r3"); >> asm ("blah %0, %1, %2" : "=&r" (output) : "r" (input), "0" (input)); >> return output; >> } >> >> I know an input operand can have a matching constraint associated with >> an early clobber operand, as there seems to be code that explicitly >> mentions this scenario. In this case, the user has to manually ensure >> that the input operand is not clobbered by the early clobber operand. >> In the case that the input operand uses an "r" constraint, we just >> ensure that the early clobber operand and the input operand are assigned >> different registers. My question is, what about the case above where >> we have the same variable being used for two different inputs with >> constraints that seem to be incompatible? > > Without the asm("r3") gcc will provide your "blah" instruction with > one register for %0 and %2, and another register for %1. Both > registers will be initialised with the value of "input". That's not what I'm seeing. I see one pseudo (123) used for the output operand and one pseudo (121) used for both input operands. Like so: (insn 8 6 7 (parallel [ (set (reg:DI 123 [ outputD.2831 ]) (asm_operands:DI ("blah %0, %1, %2") ("=&r") 0 [ (reg/v:DI 121 [ ]) repeated x2 ] [ (asm_input:DI ("r") bug.i:6) (asm_input:DI ("0") bug.i:6) ] [] bug.i:6)) (clobber (reg:SI 76 ca)) ]) "bug.i":6:3 -1 (nil)) The only difference between using asm("r3") and not using it is that pseudo 123 is replaced with hard reg 3 in the output operand. The input operands use pseudo 121 in both cases. It stays this way up until the asmcons pass (ie, match_asm_constraints_1) which notices that operand %2 has a matching constraint with operand %0, so it emits a copy before the asm that writes "input"'s pseudo into "output"'s pseudo and then rewrites the asm operand %2 to use "output"'s pseudo. 
But then it goes ahead and rewrites all other uses of "input"'s pseudos with "output"'s pseudo, so operand %1 also gets rewritten. So we end up with: (insn 15 6 8 2 (set (reg:DI 123 [ outputD.2831 ]) (reg/v:DI 121 [ ])) "bug.i":6:3 -1 (nil)) (insn 8 15 12 2 (parallel [ (set (reg:DI 123 [ outputD.2831 ]) (asm_operands:DI ("blah %0, %1, %2") ("=&r") 0 [ (reg:DI 123 [ outputD.2831 ]) repeated x2 ] [ (asm_input:DI ("r") bug.i:6) (asm_input:DI ("0") bug.i:6) ] [] bug.i:6)) (clobber (reg:SI 76 ca)) ]) "bug.i":6:3 -1 (expr_list:REG_DEAD (reg/v:DI 121 [ ]) (expr_list:REG_UNUSED (reg:SI 76 ca) (nil Now the case above (ie, not using asm("r3")) compiles fine. We assign pseudo 123 to r3 and LRA's constraint checking code notices that operand %1 should not be assigned to the same register as the early clobber output operand, so it spills it. However, when we use asm("r3"), LRA's constraint checking code again sees that operand %1 shouldn't have the same register as operand %0, but since it's a preassigned hard register, it cannot spill it, since there may have been a valid reason why that particular operand is supposed to be in r3, so we ICE. I'm not sure we can ever safely spill a hard register. That said, talking with Segher and Uli offline, they both think the inline asm usage in the test case should be legal, so that tells me then that the bug is in the asmcons pass when it rewrites operand %1's pseudo. It really should check that operand %1's pseudo should not be updated because it conflicts with the early clobber operand %0. That would then allow operand %1 and operand %2 to have different registers. I'll try and prepare a patch that checks for that scenario. Peter
Question regarding constraint usage within inline asm
I have a question about constraint usage in inline asm when we have an early clobber output operand. The test case is from PR89313 and looks like the code below (I'm using "r3" for the reg on ppc, but you could also use "rax" on x86_64, etc.). long input; long bug (void) { register long output asm ("r3"); asm ("blah %0, %1, %2" : "=&r" (output) : "r" (input), "0" (input)); return output; } I know an input operand can have a matching constraint associated with an early clobber operand, as there seems to be code that explicitly mentions this scenario. In this case, the user has to manually ensure that the input operand is not clobbered by the early clobber operand. In the case that the input operand uses an "r" constraint, we just ensure that the early clobber operand and the input operand are assigned different registers. My question is, what about the case above where we have the same variable being used for two different inputs with constraints that seem to be incompatible? Clearly, we cannot assign a register to the "input" variable that is both the same and different to the register that is assigned to "output". Is this outright invalid to have "input" use both a matching and non-matching constraint with an early clobber operand? Or is it expected that reload/LRA will come along and fix up the "r" usage to use a different register? My guess is that this is invalid usage and I have a patch to expand_asm_stmt() to catch this, but it only works if we've preassigned "output" to a hard register. If this is truly invalid, should I flag this even if "output" isn't preassigned? If it is valid, then should match_asm_constraints_1() really rewrite all of the uses of "input" with the register assigned to output as it is doing now, which is what is causing the problems in LRA. LRA sees that both input operands are using r3 and it catches the constraint violation of the "r" input and tries to spill it, but it's not a pseudo, but an explicit hard register already. 
I'm not sure LRA can really safely spill an operand that is an explicit hard register. Thoughts? Peter
Re: Spectre V1 diagnostic / mitigation
On 12/19/18 7:59 AM, Florian Weimer wrote: > * Richard Biener: > >> Sure, if we'd ever deploy this in production placing this in the >> TCB for glibc targets might be beneifical. But as said the >> current implementation was just an experiment intended to be >> maximum portable. I suppose the dynamic loader takes care >> of initializing the TCB data? > > Yes, the dynamic linker will initialize it. If you need 100% reliable > initialization with something that is not zero, it's going to be tricky > though. Initial-exec TLS memory has this covered, but in the TCB, we > only have zeroed-out reservations today. We have non-zero initialized TCB entries on powerpc*-linux which are used for the GCC __builtin_cpu_is() and __builtin_cpu_supports() builtin functions. Tulio would know the magic that was used to get them setup. Peter
Re: LRA reload produces invalid insn
On 11/1/18 10:37 PM, Vladimir Makarov wrote: > On 11/01/2018 08:25 PM, Paul Koning wrote: >> Is this an LRA bug, or is there something I need to do in the target to >> prevent this happening? > It is hard to say whose code is responsible for this. It might be a wrong > machine-dependent code or a LRA bug. > > Paul, could you send me full LRA dump file (.reload). It might help me to > say more specific reason for the bug. LRA has iterated sub-passes and the > full dump can say where LRA started to behave wrongly. > I'll note that when we ported the rs6000 (ie, ppc*) port over to LRA from reload, we hit many target problems. It seems LRA is much less forgiving to bad constraints, predicates, etc. than reload was. I think that's actually a good thing. Peter
Re: LRA reload produces invalid insn
On 11/1/18 8:40 PM, Segher Boessenkool wrote: > Hi Peter, > > On Thu, Nov 01, 2018 at 07:49:36PM -0500, Peter Bergner wrote: >> On 11/1/18 7:25 PM, Paul Koning wrote: >>> I'm running the testsuite on the pdp11 target, and I get a failure when >>> using LRA that works correctly with the old allocator. The issue is that >>> LRA is producing an insn that is invalid (it violates the constraints >>> stated in the insn definition). >> [snip] >>> which is the correct sequence given the matching operand constraint in the >>> define_insn. >>> >>> Is this an LRA bug, or is there something I need to do in the target to >>> prevent this happening? >> >> What do you mean by "old allocator"? > > I think Paul just means old reload. In that case, my patch may still help. Peter
Re: LRA reload produces invalid insn
On 11/1/18 7:25 PM, Paul Koning wrote: > I'm running the testsuite on the pdp11 target, and I get a failure when using > LRA that works correctly with the old allocator. The issue is that LRA is > producing an insn that is invalid (it violates the constraints stated in the > insn definition). [snip] > which is the correct sequence given the matching operand constraint in the > define_insn. > > Is this an LRA bug, or is there something I need to do in the target to > prevent this happening? What do you mean by "old allocator"? Just an older revision? Does it work before my revision 264897 commit and broken after? If so, could you try the following to see whether that fixes things for you? https://gcc.gnu.org/ml/gcc-patches/2018-10/msg01757.html My commit above exposed some latent LRA bugs and my patch above tries to fix issues similar to what you're seeing. Peter
Re: Even numbered register pairs restriction on some instructions
On 8/31/18 10:41 AM, Matthew Malcomson wrote: > I'm looking into whether it's possible to require even numbered registers on > modes that need more than one hard-register to represent them. But only in > some cases. Yes, it's possible. You can look at TDmode (128-bit decimal floating point) on powerpc64*-linux, which is only allowed in even-odd register pairs. It's in *all* cases though, not some of the time. Peter
Re: Transactional memory test case reduction failure
On 8/27/18 1:20 PM, sameeran joshi wrote: > On 8/27/18, Peter Bergner wrote: >> On 8/27/18 12:13 PM, sameeran joshi wrote: >>> On 8/27/18, Peter Bergner wrote: >>>> Well what does: >>>> >>>> linux% gcc -I/home/swamimauli/upload/csmith/runtime/ -Wall bug.c >>> >>> running above command on terminal,gives many warnings and asks for the >>> -fgnu-tm option. >>> > > this shows me ICE if I include -fgnu-tm flag. Then you need to add -fgnu-tm to your compile options in your creduce script. Otherwise, how can creduce reduce your test case down to a minimal, but still ICEing test case, if you don't tell it how to make it ICE? Peter
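Putting that advice into script form, an interestingness test with -fgnu-tm added might look like the sketch below. The include path and the ICE text are taken from this thread; the function name still_ices and the GCC_CMD override are assumptions made for illustration, and the grep string has to match the compiler's message exactly, including the space after "error:".

```shell
#!/bin/bash
# Sketch of a creduce interestingness test: succeed (status 0) only while the
# test case still triggers the expected ICE.
# GCC_CMD is a hypothetical override for the compiler; it defaults to "gcc".
still_ices() {
    ${GCC_CMD:-gcc} -fgnu-tm -I/home/swamimauli/upload/csmith/runtime/ -Wall "$1" 2>&1 |
        grep -q -F 'internal compiler error: in expand_expr_addr_expr_1, at expr.c:7862'
}

# In the real script the last line would be:
#   still_ices bug.c
```

The function's exit status is that of grep, the last command in the pipeline, so no explicit "if ! test $? = 0" is needed.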
Re: Transactional memory test case reduction failure
On 8/27/18 12:13 PM, sameeran joshi wrote: > On 8/27/18, Peter Bergner wrote: >> Well what does: >> >> linux% gcc -I/home/swamimauli/upload/csmith/runtime/ -Wall bug.c > > running above command on terminal,gives many warnings and asks for the > -fgnu-tm option. > > bug.c:1091:2: error: ‘__transaction_relaxed ’ without transactional > memory support enabled > __transaction_relaxed { Well there's your problem then, meaning your compile command doesn't result in the "internal compiler error: " message you're expecting to see. Peter
Re: Transactional memory test case reduction failure
On 8/27/18 11:42 AM, sameeran joshi wrote: > It's still giving output as 1,I included the -squiggle option still,it > dosen't work for me? any Ideas? > > #!/bin/bash > > CC="-I/home/swamimauli/upload/csmith/runtime/" > OPTS="-Wall" > TEST="bug.c" > gcc ${CC} ${OPTS} ${TEST} 2>&1 | grep 'internal compiler error:in > expand_expr_addr_expr_1, at expr.c:7862' > if ! test $? = 0; then > exit 1 > fi > exit 0 Well what does: linux% gcc -I/home/swamimauli/upload/csmith/runtime/ -Wall bug.c return? And also, what does: linux% gcc -I/home/swamimauli/upload/csmith/runtime/ -Wall bug.c 2>&1 | grep 'internal compiler error: in expand_expr_addr_expr_1, at expr.c:7862' linux% echo $? return? Peter
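A note on the `echo $?` check above: after a pipeline, `$?` is the exit status of the last command (here grep), not of the compiler. The sketch below shows the difference using bash's `PIPESTATUS` array. `fake_compiler` and its message are hypothetical stand-ins, not output from the thread, and the sketch assumes the shell's default settings (no `pipefail`).

```shell
#!/bin/bash
# Hypothetical stand-in for a compiler that prints an ICE and exits non-zero.
fake_compiler() {
    echo "bug.c:1:1: internal compiler error: in some_function, at file.c:1" >&2
    return 4
}

# 2>&1 routes the diagnostics into the pipe so grep can see them.
fake_compiler 2>&1 | grep 'internal compiler error: in some_function' >/dev/null

# Copy PIPESTATUS immediately; the next command overwrites it.
statuses=("${PIPESTATUS[@]}")
echo "compiler exit status: ${statuses[0]}"
echo "grep exit status:     ${statuses[1]}"
```

So a script that tests `$?` after the pipeline is testing whether grep found the message, which is what a creduce script wants, provided the pattern matches the compiler's message character for character.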
Re: Transactional memory test case reduction failure
On 8/27/18 10:35 AM, Shubham Narlawar wrote: > Here is the file. I am getting some error in sending .sh file, so I send it > as below. > > #!/bin/bash > gcc -fgnu-tm testcase.c > out.txt 2>&1 &&\ > if > grep 'internal compiler error' out.txt > then > exit 0 > else > exit 1 > fi When I use creduce, I never write my output to an actual file, but just pipe it directly into grep. My creduce.sh scripts usually look like the following which have worked for me in the past. Peter #!/bin/bash CC="/home/bergner/gcc/build/gcc-fsf-6-pr78543-debug/gcc/xgcc -B/home/bergner/gcc/build/gcc-fsf-6-pr78543-debug/gcc" OPTS="-O3 -S" TEST=pr78543-2.i ${CC} ${OPTS} ${TEST} 2>&1 | grep 'internal compiler error: in push_reload, at reload.c:1349' if ! test $? = 0; then exit 1 fi exit 0
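For readers who want the same idea in a reusable form, here is a minimal sketch of the pipe-into-grep approach wrapped in a function. The defaults for `CC`, `OPTS`, `TEST` and `PATTERN` are placeholders and not values from this thread; substitute your own compiler, options, file and exact ICE message.

```shell
#!/bin/bash
# Sketch of a creduce interestingness test. All four settings are
# placeholders; override them in the environment or edit them here.
CC="${CC:-gcc}"
OPTS="${OPTS:--fgnu-tm}"
TEST="${TEST:-testcase.c}"
PATTERN="${PATTERN:-internal compiler error}"

is_interesting() {
    # A crashing compiler exits non-zero, so "|| true" keeps grep's result
    # as the only thing that decides the outcome. 2>&1 sends the diagnostics
    # down the pipe, and -F makes grep treat the pattern as a literal string.
    { ${CC} ${OPTS} "${TEST}" 2>&1 || true; } | grep -F -- "${PATTERN}" >/dev/null
}
```

Calling `is_interesting` as the last command of the script makes its status the script's exit status: 0 while the message is still produced, non-zero otherwise. Using the full message (function name and file:line) as `PATTERN` keeps creduce from drifting to a different ICE.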
Re: Question regarding preventing optimizing out of register in expansion
On 6/26/18 4:05 AM, Peryt, Sebastian wrote: > With some changes simplified implementation of my expansion is as follows: > tmp_op0 = gen_reg_rtx (mode); > emit_move_insn (tmp_op0, op0); You set tmp_op0 here, and then > emit_insn (gen_rtx_SET (tmp_op0, reg)); You set it again here without ever using it above, so it's dead code, which explains why it's removed. Peter
Re: Why does IRA force all pseudos live across a setjmp call to be spilled?
On 3/5/18 9:33 AM, Segher Boessenkool wrote: > On Mon, Mar 05, 2018 at 08:01:14AM +0100, Eric Botcazou wrote: >> Apparently the authors of the SPARC psABI thought that the last part of your >> sentence is an interpolation and that the (historical) requirements were >> vague >> enough to allow their interpretation, IOW that the compiler can do the work. > > Maybe we should have a target hook that says setjmp/longjmp are > implemented by simple function calls (or as-if by function calls), so > as not to penalize everyone who has an, erm, more conservative ABI? Unless someone really wants to work on this, I'll have a look at adding this once stage1 opens up. Peter
Re: Why does IRA force all pseudos live across a setjmp call to be spilled?
On 3/4/18 7:57 AM, Eric Botcazou wrote: >> I can't argue with anything in that comment, other than the conclusion. :-) >> It's not the compiler's job to implement the setjmp/longjmp save/restore. >> Maybe Kenny was working around a problem with some target's buggy setjmp >> and spilling everything "fixed" it? > > What are the requirements imposed on setjmp exactly and by whom? The psABI > on > SPARC (the SCD) has an explicit note saying that setjmp/sigsetjmp/vfork don't > (have to) preserve the usual non-volatile registers. I'm not a language lawyer and I don't play one on TV either, but I believe the requirements come from multiple sources. You've pointed out your ABI and Andreas pointed out the C standard also places requirements: https://gcc.gnu.org/ml/gcc/2018-03/msg00030.html I wouldn't be surprised if there are more specs/standards that place restrictions too. Clearly returning from the function that calls setjmp before calling longjmp must be illegal, since that would result in clobbering of the stack frame the longjmp would attempt to restore to. I don't know off hand who/what states that restriction. Peter
Re: Why does IRA force all pseudos live across a setjmp call to be spilled?
On 3/3/18 5:47 PM, Peter Bergner wrote: > On 3/3/18 10:29 AM, Jeff Law wrote: >> Here's the comment from regstat.c: >> >> /* We have a problem with any pseudoreg that lives >> across the setjmp. ANSI says that if a user variable >> does not change in value between the setjmp and the >> longjmp, then the longjmp preserves it. This >> includes longjmp from a place where the pseudo >> appears dead. (In principle, the value still exists >> if it is in scope.) If the pseudo goes in a hard >> reg, some other value may occupy that hard reg where >> this pseudo is dead, thus clobbering the pseudo. >> Conclusion: such a pseudo must not go in a hard >> reg. */ > > I can't argue with anything in that comment, other than the conclusion. :-) > It's not the compiler's job to implement the setjmp/longjmp save/restore. > Maybe Kenny was working around a problem with some target's buggy setjmp > and spilling everything "fixed" it? The only observable difference I can see between a variable that has been spilled to memory versus one that is assigned to a non-volatile hard reg is if it is modified between the setjmp and the longjmp. In the case where the variable is spilled to memory, the "new" updated value is the value you _may_ see on the return from setjmp (the return caused by the call to longjmp), whereas if it is assigned to a non-volatile register, then you _will_ see the "old" value that was saved by the setjmp call. I say _may_ see above, because there are cases where we might not store the "new" updated value to memory, even if we've spilled the pseudo. Examples would be spill code optimization, or the variable has been broken into separate live ranges/pseudos, etc. I guess I can even think of cases where we could see both "old" and "new" values of a variable. Think of a variable that has been spilled/split like below:

       a = [start of live range, a assigned to non-volatile reg]
       spill store a
       ...
       setjmp()
       ...
    1) ... = ... a ... [end of live range]
       ...
       [a not assigned to a reg in this region]
       spill load a [start of live range]
    2) ... = ... a ... [end of live range]
       ...
       if (...)
         a = [start of live range]
    3)   spill store a [end of live range]
       ...
       [a not assigned to a reg in this region]
       longjmp()

On return from setjmp (the return caused by the call to longjmp), the use of "a" at "1)" will use the non-volatile hard register that was saved by the initial call to setjmp, so it will see the "old" value of "a". However, since the use of "a" at "2)" loads the value from memory, it will use the "new" value stored by the spill store at "3)"! That said, the comment above only talks about variables that do not change between the setjmp and the longjmp and in that case, you will see the same "old" value (which is the only value, since it wasn't modified) regardless of whether it was spilled or not. What does ANSI (or any spec) say about what should happen to variables that are modified between the setjmp and longjmp calls? Maybe all bets are off, given the example above, since even spilling a variable live across a setjmp can still lead to strange behavior unless you don't allow spill/split optimization and I don't think we'd want that at all. Peter
Re: Why does IRA force all pseudos live across a setjmp call to be spilled?
On 3/3/18 10:29 AM, Jeff Law wrote: > Here's the comment from regstat.c: > > /* We have a problem with any pseudoreg that lives > across the setjmp. ANSI says that if a user variable > does not change in value between the setjmp and the > longjmp, then the longjmp preserves it. This > includes longjmp from a place where the pseudo > appears dead. (In principle, the value still exists > if it is in scope.) If the pseudo goes in a hard > reg, some other value may occupy that hard reg where > this pseudo is dead, thus clobbering the pseudo. > Conclusion: such a pseudo must not go in a hard > reg. */ I can't argue with anything in that comment, other than the conclusion. :-) It's not the compiler's job to implement the setjmp/longjmp save/restore. Maybe Kenny was working around a problem with some target's buggy setjmp and spilling everything "fixed" it? It is absolutely fine for a pseudo that is live across a setjmp call to occupy a (non-volatile) hard register at the setjmp's call site, even if some other value eventually occupies the same hard register between the setjmp and the longjmp. The reason is that setjmp saves all of the non-volatile hard registers in the jmp_buf. If our pseudo was assigned to one of those non-volatile hard registers, then its value at the time of the setjmp call is saved, so even if its hard register is clobbered before we get to the longjmp call, the longjmp will restore the pseudo's value from the jmp_buf into the hard register, restoring the value it had at the time of the setjmp call. The only way I can see the above not working is if setjmp doesn't save the entire register state it should, the jmp_buf somehow gets clobbered before the longjmp call, or longjmp doesn't restore the entire register state that it should. All of those would be bugs in my book. 
The only thing the register allocator should need to do, is treat setjmp just like any other function call and make all pseudos that are live across it interfere with all volatile hard registers, so that they will be assigned to either non-volatile hard registers or spilled (if no non-volatile registers are available). Peter
Re: Why does IRA force all pseudos live across a setjmp call to be spilled?
On 3/2/18 3:26 PM, Jeff Law wrote: > On 03/02/2018 12:45 PM, Peter Bergner wrote: >> ...which forces us to spill everything live across the setjmp by forcing >> the pseudos to interfere all hardregs. That can't be good for performance. >> What am I missing? > > You might want to hold off a bit. I've got changes for 21161 which can > help this significantly. Basically the live-across-setjmp set is way > too conservative -- it includes everything live at the setjmp, but it > really just needs what's live on the longjump path. > > As for why, I believe it's related to trying to make sure everything has > the right values if we perform a longjmp. I can understand why we might save/restore across functions that can throw exceptions since the program state hasn't been saved at the point of the call or in the call, but what is special about setjmp()? We don't need to save/restore the volatile regs since all functions clobber them and the non-volatile regs are saved/restored by setjmp(), just like any normal function call. ...and as far as I know, setjmp() doesn't save or restore the stack contents, just the stack pointer, pc, etc. So I guess I still don't know why we treat it differently than any other function call wrt register allocation. Peter
Why does IRA force all pseudos live across a setjmp call to be spilled?
While debugging the PR84264 ICE caused by the following test case:

    void _setjmp ();
    void a (unsigned long *);
    void b ()
    {
      for (;;)
        {
          _setjmp ();
          unsigned long args[9]{};
          a (args);
        }
    }

I noticed that IRA is spilling all pseudos that are live across the call to setjmp. Why is that? Trying to look through the history of this, I see Jim committed a patch to reload that removed it spilling everything across all setjmps: https://gcc.gnu.org/ml/gcc-patches/2003-11/msg01667.html But currently ira-lives.c:process_bb_node_lives() has:

    /* Don't allocate allocnos that cross setjmps or any
       call, if this function receives a nonlocal
       goto.  */
    if (cfun->has_nonlocal_label
        || find_reg_note (insn, REG_SETJMP, NULL_RTX) != NULL_RTX)
      {
        SET_HARD_REG_SET (OBJECT_CONFLICT_HARD_REGS (obj));
        SET_HARD_REG_SET (OBJECT_TOTAL_CONFLICT_HARD_REGS (obj));
      }

...which forces us to spill everything live across the setjmp by forcing the pseudos to interfere with all hardregs. That can't be good for performance. What am I missing? Peter
Re: Register Allocation Graph Coloring algorithm and Others
On 12/14/17 9:18 PM, Leslie Zhai wrote: > * The papers by Briggs and Chaitin contradict[2] themselves when examine > the text of the paper vs. the pseudocode provided? I've read both of these papers many times (in the past) and I don't recall any contradictions in them. Can you (Dave?) be more specific about what you think are contradictions? I do admit that pseudo code in papers can be very terse, to the point that they don't show all the little details that are needed to actually implement them, but they definitely shouldn't contradict their written description. I was very grateful that Preston was more than willing to answer all my many questions regarding his allocator and the many many details he couldn't mention in his Ph.D. thesis, let alone a short paper. Peter
Re: PowerPC -many
On 2/14/17 6:06 PM, Alan Modra wrote: > Since we've been talking about obsoleting cpu support, how about getting > rid of -many in ASM_CPU_SPEC for gcc-8? +1 Peter
Re: -mcx16 vs. not using CAS for atomic loads
On 1/24/17 3:06 PM, Richard Henderson wrote: The only possible concern I see might be with simulators that force HTM failure, for the purpose of forcibly testing fallback paths. I guess we'd have to continue to fall back to the lock path for that case. IIRC, this was the path that valgrind was going to use all of the time, because actually implementing the HTM instructions was too hard. Peter
Re: Remove sel-sched?
On Fri, 2016-01-15 at 11:13 +0100, Richard Biener wrote: > Btw, I'd like people to start thinking if the scheduling algorithms > working on loops (and sometimes requiring unrolling of loops) can be > implemented in a way to apply that unrolling on the GIMPLE level > (not the scheduling itself of course). We've been underwhelmed with the RTL unroller on POWER and I think we concur that a GIMPLE level unroller would be interesting. Peter
Re: building gcc with macro support for gdb?
On Wed, 2015-12-02 at 20:05 -0500, Ryan Burn wrote: > Is there any way to easily build a stage1 gcc with macro support for > debugging? > > I tried setting CFLAGS, and CXXFLAGS to specify "-O0 -g3" via the > command line before running configure, but that only includes those > flags for some of the compilation steps. > > I was only successful after I manually edited the makefile to replace > "-g" with "-g3". Try CFLAGS_FOR_TARGET='-O0 -g3 -fno-inline' and CXXFLAGS_FOR_TARGET='-O0 -g3 -fno-inline' Peter
Re: Powerpc atomic_load
On Wed, 2015-09-23 at 16:15 +0200, Sebastian Huber wrote: > On 10/09/15 19:52, David Edelsohn wrote: > > https://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html > > Is there specific reason why the SYNC L,E (Elemental Memory Barriers) > defined by Power-ISA V2.07 doesn't appear in this table? Probably because that category is only implemented on some (one?) cpus (eg, E6500) and not on any of the server cpus (eg, power[45678]), so no one cared enough to add that info? :-) It would probably be useful to add though. Peter
Re: Repository for the conversion machinery
On Fri, 2015-08-28 at 11:00 -0400, Eric S. Raymond wrote: > Peter Bergner : > > On Thu, 2015-08-27 at 10:38 -0400, Eric S. Raymond wrote: > > > I've made it available at: > > > > > > http://thyrsus.com/gitweb/?p=gcc-conversion.git > > > > > > The interesting content is gcc.map (the contributor map) and gcc.lift. > > > > > > Presently the only command in gcc.lift expunges the hooks directory. > > > > From your list, I also see that alanm and amodra are both listed with > > Alan's old bigpond.net.au address which no longer exists. He now uses: > > > >amo...@gmail.com It looks like you have a cut/paste error, with Alan's email address: alanm = Alan Modra amodra = Alan Modra s/amodra@amodra@/amodra@/ Peter
Re: Repository for the conversion machinery
> azanella = Adhemerval Zanella Adhemerval now works for Linaro, so his email address should be: adhemerval.zane...@linaro.org > bje = Ben Elliston Ben is no longer at Red Hat...or IBM. He went back to school and his new email address seems to be: b.ellis...@unsw.edu.au > dnovillo = Diego Novillo Diego is now at google, so his email address should be: dnovi...@google.com > drepper = Ulrich Drepper Uli is no longer at Red Hat (now at Goldman Sachs?). His last email to the GCC mailing list used this address: drep...@gmail.com > janis = Janis Johnson Janis is now retired. Her personal email address as listed in the MAINTAINERS file is: janis.marie.john...@gmail.com > jgrimm = Jon Grimm Jon is now at Canonical. I'm not sure which of the two email addresses that seem to be active for him he prefers: jon.gr...@canonical.com jon.gr...@gmail.com > luisgpm = Luis Machado Luis is now with Codesourcery. His email address is: lgust...@codesourcery.com > meissner = Michael Meissner Mike is now at IBM, but his email address in the MAINTAINERS file is: g...@the-meissners.org > mircea = Mircea Namolaru Mircea is now working at INRIA. His email address is: mircea.namol...@inria.fr > olga = Olga Golovanevsky Olga is no longer at IBM. I believe she is now at Cavium, but her recent GNU Cauldron presentation used this address: golovanevsky.o...@gmail.com > spop = Sebastian Pop Sebastian is now at Samsung and his address is: s@samsung.com Peter
Re: Repository for the conversion machinery
On Thu, 2015-08-27 at 10:38 -0400, Eric S. Raymond wrote: > I've made it available at: > > http://thyrsus.com/gitweb/?p=gcc-conversion.git > > The interesting content is gcc.map (the contributor map) and gcc.lift. > > Presently the only command in gcc.lift expunges the hooks directory. From your list, I also see that alanm and amodra are both listed with Alan's old bigpond.net.au address which no longer exists. He now uses: amo...@gmail.com Peter
Re: Repository for the conversion machinery
On Thu, 2015-08-27 at 16:13 +, Joseph Myers wrote: > 273 missing usernames (this is based on grepping the output of svn log on > an rsync mirror of the repository, so it's possible one or two could be > spurious, but should be pretty accurate). I've made no attempt to map > these to emails yet. > acsawdey Aaron Sawdey / acsaw...@linux.vnet.ibm.com > bergner Peter Bergner / berg...@vnet.ibm.com > boger Lynn Boger ? labo...@linux.vnet.ibm.com > pthaugen Pat Haugen / pthau...@linux.vnet.ibm.com > revitale Revital Eres / e...@il.ibm.com > wschmidt Bill Schmidt / wschm...@linux.vnet.ibm.com > zaks Ayal Zaks / His MAINTAINERS entry still lists his IBM email address, but he is no longer with IBM. I'm not sure whether he now prefers his az...@ee.technion.ac.il or ayal.z...@intel.com email addresses. Peter
Re: 33 unknowns left
On Wed, 2015-08-26 at 20:12 -0400, Eric S. Raymond wrote: > Peter Bergner : > > On Wed, 2015-08-26 at 16:35 -0400, Eric S. Raymond wrote: > > > Joseph Myers : > > > > > irar = irar > > > > > > > > Ira Rosen > > > > > > I pretty much knew these two guys went with these two names, but couldn't > > > figure out which was which. Thanks. > > > > Actually, Ira Rosen is a "she" and not a "he". > > > > Peter > > > > Really? Interesting. I have never encountered "Ira" as a female name before. > What language does this? She works for IBM's Haifa research lab. https://il.linkedin.com/pub/ira-rosen/34/b73/433 Peter
Re: 33 unknowns left
On Wed, 2015-08-26 at 18:55 -0500, Peter Bergner wrote: > On Wed, 2015-08-26 at 16:35 -0400, Eric S. Raymond wrote: > > Joseph Myers : > > > > irar = irar > > > > > > Ira Rosen > > > > I pretty much knew these two guys went with these two names, but couldn't > > figure out which was which. Thanks. > > Actually, Ira Rosen is a "she" and not a "he". Ah, I see Nick Clifton has been fingered. Nevermind. Peter
Re: 33 unknowns left
On Wed, 2015-08-26 at 13:44 -0700, Ian Lance Taylor wrote: > On Wed, Aug 26, 2015 at 12:31 PM, Eric S. Raymond wrote: > > click = click > > You've got me on that one. Any hints? Just purely looking at the name, did Cliff Click ever contribute to gcc in the past? Peter
Re: 33 unknowns left
On Wed, 2015-08-26 at 16:35 -0400, Eric S. Raymond wrote: > Joseph Myers : > > > irar = irar > > > > Ira Rosen > > I pretty much knew these two guys went with these two names, but couldn't > figure out which was which. Thanks. Actually, Ira Rosen is a "she" and not a "he". Peter
Re: Moving to git
On Fri, 2015-08-21 at 16:09 +0200, Andreas Schwab wrote: > Ramana Radhakrishnan writes: > > > On Fri, Aug 21, 2015 at 11:48 AM, Jonathan Wakely > > wrote: > >> Teams following a different model could use a separate repo shared by > >> those developers, not the gcc.gnu.org one. It's much easier to do that > >> with git. > > > > Yes you are right they sure can, but one of the reasons that teams are > > doing their development on a feature branch is so that they can obtain > > feedback and collaborate with others in the community. > > It is also much easier for others to pull from foreign repositories with > git, so this isn't a severe downside. It may be easy for git to pull from foreign repositories, but it may be difficult/impossible (policy wise) for some developers from some companies to be able to write to foreign repositories. At IBM, we cannot host our own source repositories that others can access. We can only write to the official source code repositories for the projects that we have clearance to work in. We currently have an IBM vendor directory where we have our branches. If we move to git (I'm all for it), I would hope that those can remain in the official source code repository. That said, if the GCC project created an "official" side repository where branches are stored, we could participate in that. Peter
Re: Fail to compile trunk
On Tue, 2015-04-14 at 17:37 +0200, Harald Servat wrote: > I'm trying to compile the GCC's trunk but I find out the following > error while compiling it. I've configured it such as This question is not appropriate for this mailing list, as this list is only for questions about gcc development. You should continue this on the gcc-help mailing list. > ./configure --prefix=/home/harald/pkg/gcc/git > --enable-languages=c,c++ --disable-multilib Building gcc within the GCC source tree is not supported. Try creating an empty build directory and running configure from there: /path/to/gcc/source/directory/configure Peter
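The advice above can be sketched as a small helper that creates a separate build directory and runs the source tree's configure from inside it. The function name and both directory arguments are illustrative only; nothing here is part of GCC's build system.

```shell
#!/bin/bash
# Sketch: configure from a separate build directory. The two directory
# arguments are placeholders chosen by the caller; any remaining
# arguments are passed through to configure unchanged.
configure_out_of_tree() {
    local src_dir build_dir
    src_dir=$(cd "$1" && pwd) || return 1
    mkdir -p "$2" || return 1
    build_dir=$(cd "$2" && pwd) || return 1
    shift 2
    if [ "$src_dir" = "$build_dir" ]; then
        echo "refusing to configure inside the source tree" >&2
        return 1
    fi
    # Run in a subshell so the caller's working directory is unchanged.
    ( cd "$build_dir" && "$src_dir/configure" "$@" )
}
```

For the configuration quoted above, a call might look like `configure_out_of_tree /path/to/gcc/source/directory /path/to/build --prefix=/home/harald/pkg/gcc/git --enable-languages=c,c++ --disable-multilib`, where `/path/to/build` is an assumed location outside the source tree.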
Re: build broken on ppc linux?!
On Fri, 2013-11-22 at 12:30 +0100, Richard Biener wrote: > On Fri, Nov 22, 2013 at 1:57 AM, Jonathan Wakely > wrote: > > Yes, it only seems to be a problem with SUSE kernels: > > http://gcc.gnu.org/ml/gcc/2013-11/msg00090.html > > As my bugreport is being ignored it would help if one of our > partners (hint! hint!) would raise this issue via the appropriate > channel ;) Ok, I'll open a bug on our side and we'll see if that helps move things along. Peter
Re: powerpc64 bootstrap broken due to libsanitizer merge from upstream
On Fri, 2013-11-08 at 00:03 +0100, Steven Bosscher wrote: > powerpc64-linux bootstrap is broken by the libsanitizer merge: I already reported the failures here: http://gcc.gnu.org/ml/gcc-patches/2013-11/msg00312.html It seems others have reported it breaks bootstrap for them as well on other arches. It's sad it's been broken this long, given it affects so many people. Anyway, the powerpc64-linux breakage is being tracked here: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59009 Peter
Re: Bootstrap broken in libobjc/sendmsg.c
On Fri, 2013-09-06 at 13:36 +0200, Paolo Carlini wrote: > . on x86_64-linux, this commit broke the build of that file: > > http://gcc.gnu.org/ml/gcc-cvs/2013-09/msg00149.html > > CC-ing Peter. Can you try the patch that HJ suggested? http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58139#c9 Peter
Re: [x86-64 psABI]: Extend x86-64 psABI to support AVX-512
On Wed, 2013-07-24 at 10:42 -0700, H.J. Lu wrote: > Are there any other Linux targets with callee saved vector registers? Yes, on POWER. From our ABI: On processors with the VMX feature:
    v0-v1    Volatile scratch registers
    v2-v13   Volatile vector parameters registers
    v14-v19  Volatile scratch registers
    v20-v31  Non-volatile registers
I'll note that the new VSX register state we recently added with power7 was made volatile, but then we already had these non-volatile altivec regs to use. Peter
Re: Libitm issues porting to POWER8 HTM
On Tue, 2013-06-18 at 21:48 +0200, Andi Kleen wrote: > > Given Torvald's comment, can you verify whether your hw txn succeeds > > (all the way to commit) or whether it is failing and somehow skips > > the fall through code that is hanging for us (Power and S390)? > > All the 3 transactions in reentrant.c abort. Can you please explain the above? When you say abort, do you mean that libitm is calling htm_abort() or that your xbegin hardware instruction isn't succeeding? > That's not surprising, because there are usually lots of aborts in > the startup phase of programs, and the test doesn't use a loop. Is this a libitm statement or an Intel RTM statement, that the startup phase usually has lots of aborts? Peter
Re: Libitm issues porting to POWER8 HTM
On Tue, 2013-06-18 at 18:41 +0200, Torvald Riegel wrote: > On Fri, 2013-06-14 at 19:44 -0500, Peter Bergner wrote: > > I'll note that if I hack the call to > > htm_abort_should_retry(ret) so that we break out of the loop and fall back > > to SW TM, then the test case executes correctly. > > That matches what I suppose the bug is. > > Please feel free to create a bug report. I will work on a patch. Done. http://gcc.gnu.org/PR57643 Since this seems to pass on x86, let me know if you want me to test a patch on our power8 system. Peter
Re: Libitm issues porting to POWER8 HTM
On Tue, 2013-06-18 at 11:22 -0700, Andi Kleen wrote: > Peter Bergner writes: > > > > I have yet to track down who has the write lock and why, but I am working > > towards that. Talking with Andreas, he said he is seeing the same failure > > on S390, so I'm wondering whether this might be a generic libitm issue > > and it might hit Intel too. Does anyone know whether this executes > > correctly > > on Intel hardware with RTM? I'll note that if I hack the call to > > FWIW on a TSX system I get the following for libitm with current > trunk. So no hangs on reentrant at least. Given Torvald's comment, can you verify whether your hw txn succeeds (all the way to commit) or whether it is failing and somehow skips the fall through code that is hanging for us (Power and S390)? Thanks! Peter
Libitm issues porting to POWER8 HTM
I'm currently implementing support for hardware transactional memory in the rs6000 backend for POWER8. Things seem to be mostly working, but I have run into a few issues I'm wondering whether other people are seeing. For me, all of the libitm execution test cases in libitm/testsuite/libitm.c/ compile and execute without error, except for reentrant.c, which hangs for me. My gdb hasn't been ported to support HTM on Power yet, so debugging has been slow, but what I've learned is that my tbegin. instruction succeeds, but I fail the test (meaning someone has the write lock) at beginend.cc:200: if (unlikely(serial_lock.is_write_locked())) htm_abort(); ...so we abort the transaction. The failure is not persistent, so we do not break out of the loop due to: if (!htm_abort_should_retry(ret)) break; We then fall into the following code, where we hang trying to get the read lock: serial_lock.read_lock(tx); I have yet to track down who has the write lock and why, but I am working towards that. Talking with Andreas, he said he is seeing the same failure on S390, so I'm wondering whether this might be a generic libitm issue and it might hit Intel too. Does anyone know whether this executes correctly on Intel hardware with RTM? I'll note that if I hack the call to htm_abort_should_retry(ret) so that we break out of the loop and fall back to SW TM, then the test case executes correctly. Secondly, many of the test cases in libitm/testsuite/libitm.c++/ fail to build for me when I use -static with the following error: /home/bergner/gcc/install/gcc-fsf-mainline-htm/lib64/libitm.a(method-serial.o):(.opd+0x1098): multiple definition of `__cxa_pure_virtual' /home/bergner/gcc/install/gcc-fsf-mainline-htm/lib64/libstdc++.a(pure.o):(.opd+0x0): first defined here collect2: error: ld returned 1 exit status The comment in method-serial.cc says it's trying to avoid a dependency on libstdc++. Is the __cxa_pure_virtual workaround in method-serial.cc supposed to work with -static? 
Finally, when compiling (static or non-static) static_ctor.C, I'm seeing: /home/bergner/gcc/gcc-fsf-mainline-htm/libitm/testsuite/libitm.c++/static_ctor.C:12:18: error: unsafe function call 'void __cxa_guard_release(long long int*)' within 'transaction_safe' function static int y = x; ^ /home/bergner/gcc/gcc-fsf-mainline-htm/libitm/testsuite/libitm.c++/static_ctor.C:12:18: error: unsafe function call 'int __cxa_guard_acquire(long long int*)' within 'transaction_safe' function Does x86 not get calls to __cxa_guard_acquire and __cxa_guard_release for this access, so it doesn't see this error? To be honest, I'm not sure what we're supposed to do with this error. Peter
Re: register indirect addressing for global variables on powerpc
On Mon, 2013-01-14 at 08:00 +0100, Thomas Baier wrote: > The operating system I'd like to use gcc for (OS-9, for the curious) > requires an ABI, where global variables are only accessed through > register indirect addressing. On the powerpc platform, r2 is used for > indirect addressing. There is already a feature in gcc which can use > register indirect addressing for the powerpc target for global variables > using a special small data area, but unfortunately this is not enough. If you look at the -mcmodel={small,medium,large} support we (IBM) added to powerpc64-linux, you will see how one can generate larger offsets to r2 (16-bit, 32-bit and 64-bit respectively). Maybe you can borrow some of that code? Peter
Re: bootstrap comparison failure ppc64 FreeBSD
On Wed, 2012-11-14 at 18:51 +0100, Andreas Tobler wrote: > Hello, > > on trunk (193501) I get a comparison failure: > --- > Bootstrap comparison failure! > gcc/tree-ssa-forwprop.o differs > --- > > This is with --disable-checking. Leaving disable-checking away, the > bootstrap completes successfully. I just fired off a --disable-checking build and I see the same thing on powerpc64-linux. > -9658: e8 89 00 09 ldu r4,8(r9) > -965c: 39 08 00 01 addi r8,r8,1 > +9658: 39 08 00 01 addi r8,r8,1 > +965c: e8 89 00 09 ldu r4,8(r9) Looks like a harmless scheduling difference, but enough to trigger the stage2/stage3 comparison failure. :( Peter
Re: GCC 4.8.0 Status Report (2012-10-29), Stage 1 to end soon
On Mon, 2012-11-05 at 15:47 +0100, Jakub Jelinek wrote: > On Mon, Nov 05, 2012 at 08:40:00AM -0600, Peter Bergner wrote: > > Well we also patch config.in and configure.ac/configure. If those are > > acceptable to be patched later too, then great. If not, the patch > > That is the same thing as config.gcc bits. > > > isn't really very large. We did do this for power7 initially too: > > > > http://gcc.gnu.org/ml/gcc-patches/2008-08/msg00162.html > > But then power7 patch went in during stage1 of the n+1 release, and > wasn't really backported to release branch (just to distro vendor branches), > right? I think we could have done better there, yes, but not all of our patches were appropriate for backporting, especially those parts that touched outside of the port. There will be portions of power8 we won't/don't want to backport either, but I would like to get the major backend portions like machine description files and the like backported to 4.8 when the time comes. Having the configurey changes in would help that, but if you say those are things we can get in after stage1, then that can ease things a bit. That said, I'll post our current patch as is and discuss within our team and with David on what our next course of action should be. Peter
Re: GCC 4.8.0 Status Report (2012-10-29), Stage 1 to end soon
On Mon, 2012-11-05 at 13:53 +0100, Jakub Jelinek wrote: > On Mon, Nov 05, 2012 at 06:41:47AM -0600, Peter Bergner wrote: > > I'd like to post later today (hopefully this morning) a very minimal > > configure patch that adds the -mcpu=power8 and -mtune=power8 compiler > > options to gcc. Currently, power8 will be an alias for power7, but > > getting this patch in now allows us to add power8 support to the > > compiler without having to touch the arch independent configure script. > > config.gcc target specific hunks are part of the backend, the individual > target maintainers can approve changes to that, I really don't see a reason > to add a dummy alias now just for that. If the power8 enablement is > approved and non-intrusive enough that it would be acceptable even during > stage 3, then so would be corresponding config.gcc changes. Well we also patch config.in and configure.ac/configure. If those are acceptable to be patched later too, then great. If not, the patch isn't really very large. We did do this for power7 initially too: http://gcc.gnu.org/ml/gcc-patches/2008-08/msg00162.html Peter
Re: GCC 4.8.0 Status Report (2012-10-29), Stage 1 to end soon
On Mon, 2012-10-29 at 18:56 +0100, Jakub Jelinek wrote: > Status > == > > I'd like to close the stage 1 phase of GCC 4.8 development > on Monday, November 5th. If you have still patches for new features you'd > like to see in GCC 4.8, please post them for review soon. Patches > posted before the freeze, but reviewed shortly after the freeze, may > still go in, further changes should be just bugfixes and documentation > fixes. I'd like to post later today (hopefully this morning) a very minimal configure patch that adds the -mcpu=power8 and -mtune=power8 compiler options to gcc. Currently, power8 will be an alias for power7, but getting this patch in now allows us to add power8 support to the compiler without having to touch the arch independent configure script. The only hang up at the moment is we're still determining the assembler mnemonic we'll be releasing that the gcc configure script will use to test for power6 assembler support. Peter
Re: Memory corruption due to word sharing
On Wed, 2012-02-01 at 13:09 -0500, David Miller wrote: > From: Michael Matz > Date: Wed, 1 Feb 2012 18:41:05 +0100 (CET) > > > One problem is that it's not a new problem, GCC emitted similar code since > > about forever, and still they turned up only now (well, probably because > > ia64 is dead, but sparc64 should have similar problems). > > Indeed, on sparc64 it does do the silly 64-bit access too: > > wrong: > ldx [%o0+8], %g2 > sethi %hi(2147483648), %g1 > or %g2, %g1, %g1 > jmp %o7+8 > stx %g1, [%o0+8] Ditto for powerpc64-linux: ld 9,8(3) li 10,1 rldimi 9,10,31,32 std 9,8(3) blr Peter
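The structure being stored to is not shown in this message. A minimal sketch of the kind of layout that produces such code, assuming a 1-bit bitfield placed directly after an independently updated 32-bit field (the names `struct x`, `lock` and `full` are assumptions for illustration):

```c
#include <assert.h>
#include <stddef.h>

/* Assumed layout, not taken from this message: on an LP64 target,
   'lock' and the storage for 'full' share one aligned 64-bit word
   at offset 8.  */
struct x {
    long a;
    unsigned int lock;      /* may be updated by another thread */
    unsigned int full : 1;  /* the only member wrong() means to touch */
};

/* Source-level view of the store.  The quoted sparc64 and powerpc64
   sequences implement a store like this as a 64-bit load, a bit
   insert, and a 64-bit store at offset 8, so the bytes of 'lock'
   are rewritten with whatever value was loaded earlier.  */
void wrong(struct x *ptr)
{
    assert(ptr != NULL);
    ptr->full = 1;
}
```

In a single thread the neighbouring field is unaffected; the corruption only shows up when another thread changes `lock` between the wide load and the wide store.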
Re: Recovering REG_EXPR information after temporary expression replacement
On Fri, 2012-01-27 at 18:40 +0100, Michael Matz wrote: > The hack below works in this specific situation (TERed into a switch), and > adds a REG_EXPR when an TERed SSA name ever expanded into a pseudo (i.e. > also for some more generic situations). FYI, I bootstrapped and regtested your patch on powerpc64-linux and did not see any regressions. Peter
Re: IRA issue with shuffle copies...
On Wed, 2012-01-11 at 12:29 -0500, Vladimir Makarov wrote: > There is no visible effect of the patch on SPECFP2000 performance and > size (the size increase is only about 0.02%) for x86 and x86-64. > > The patch does worsen performance of SPECINT2000 on x86 (about 0.5%) and > x86-64 (about 0.3%). x86-64 SPECINT2000 code size increase is about > 0.05% and there is no visible change in code size on x86. > > So I'd say the patch does not work for x86/x86-64. Pat ran SPEC2000 and SPEC2006 and we had some wins and some losses. We'll dig into a couple of the losses to see if we can learn anything and report back if we do. Thanks for doing the x86* runs. Peter
Re: IRA issue with shuffle copies...
On Tue, 2012-01-10 at 12:20 -0500, Vladimir Makarov wrote: > > Do we really need or want to create shuffle copies for insns that do not > > have a two operand constraint? > Yes, I think so. As I remember I did some benchmarking and it gave some > "order" in hard register assignments and improved code slightly (at > least for SPEC2000) even for 3-ops insn architectures. I'm a little skeptical about 3-op insn architectures, but will take your word for it since you tested it. I may have someone on the team disable it completely for ppc, just as a test, so we can analyze why it helps. Sometimes just knowing why is a good thing. :) > Your patch might work. But we need to test it for major 2-ops > architecture x86/x86-64 and 3-ops ppc (I believe SPEC2000 would be ok > for this). Ok, I'll have someone on my team kick off this patch on ppc, but it would be nice if someone else could do the runs on x86/x86_64 or other cpus that might be affected that we don't have access to. Peter
IRA issue with shuffle copies...
Hi Vlad, While debugging a slightly modified version of the test case in PR16458: int foo (unsigned int a, unsigned int b) { if (a == b) return 1; if (a > b) return 2; if (a < b) return 3; if (a != b) return 4; return 0; } I noticed a couple of ugly code gen warts which I tracked back to IRA. Namely, compiling the above with -O2 -m32 on powerpc64-linux, I'm seeing: li 9,3 mr 3,9 blr and: li 9,1 mr 3,9 blr If we look at the rtl just before IRA, we have the following: BB2: (set (reg/v:SI 122 [ a ]) (reg:SI 3 3 [ a ])) REG_DEAD (reg:SI 3 3 [ a ]) (set (reg/v:SI 123 [ b ]) (reg:SI 4 4 [ b ])) REG_DEAD (reg:SI 4 4 [ b ]) (set (reg:CC 124) (compare:CC (reg/v:SI 122 [ a ]) (reg/v:SI 123 [ b ]))) (if_then_else (eq (reg:CC 124) (const_int 0 [0])) goto BB6; BB3: (set (reg:CCUNS 125) (compare:CCUNS (reg/v:SI 122 [ a ]) (reg/v:SI 123 [ b ]))) REG_DEAD (reg/v:SI 123 [ b ]) REG_DEAD (reg/v:SI 122 [ a ]) (set (reg:SI 120 [ D.1379 ]) (const_int 2 [0x2])) (if_then_else (gtu (reg:CC 124) (const_int 0 [0])) goto BB8; BB4: (if_then_else (geu (reg:CC 124) (const_int 0 [0])) goto BB7; BB5: (set (reg:SI 120 [ D.1379 ]) (const_int 3 [0x3])) goto BB8; BB6: (set (reg:SI 120 [ D.1379 ]) (const_int 1 [0x1])) goto BB8; BB7: (set (reg:SI 120 [ D.1379 ]) (const_int 4 [0x4])) BB8: (set (reg/i:SI 3 3) (reg:SI 120 [ D.1379 ])) REG_DEAD (reg:SI 120 [ D.1379 ]) (use (reg/i:SI 3 3)) return When we start coloring the allocnos, we get the following: Pass 1 for finding pseudo/allocno costs r125: preferred CR_REGS, ... r124: preferred CR_REGS, ... r123: preferred GENERAL_REGS, ... r122: preferred GENERAL_REGS, ... r120: preferred GENERAL_REGS, ... ... 
Popping a3(r122,l0) -- assign reg 3 Popping a2(r123,l0) -- assign reg 4 Popping a0(r120,l0) -- assign reg 9 Popping a4(r124,l0) -- assign reg 75 Popping a1(r125,l0) -- assign reg 3 Assigning 75 to a1r125 This looks a little startling, since we're initially assigning r125 to r3, even though its preferred class is CR_REGS, before improve_allocation() saves us and reassigns r125 to r75 (a real CR reg). The reason r125 ends up initially in r3 is that we detect a "shuffle" copy during the set of r125, because r122 (and r123) dies in the insn r125 is defined in. This ends up preferencing the costs for r125, such that it wants r3. This in turn via ALLOCNO_UPDATED_HARD_REG_COSTS() increases the cost of assigning r120 to r3, such that r120 ends up with r9 instead, when we really really want it to get r3. Your comments about the "shuffle" copies seem to imply that they're being used to try and help insns with two operand constraints, but in the case above, they're over preferencing things. As an experiment, I disabled all shuffle copies and the code gen for the test case above is much improved. Do we really need or want to create shuffle copies for insns that do not have a two operand constraint? If not, do you know how we can test for that? If you think we do need that for non two operand constraint insns, can we at least disable creating shuffle copies for allocnos that have different preferred classes, since they're probably not going to be assigned the same hard reg? Ala: Index: ira-conflicts.c === --- ira-conflicts.c (revision 182936) +++ ira-conflicts.c (working copy) @@ -397,6 +397,11 @@ process_regs_for_copy (rtx reg1, rtx reg enum machine_mode mode; ira_copy_t cp; + if (!constraint_p + && reg_preferred_class (REGNO (reg1)) +!= reg_preferred_class (REGNO (reg2))) +return false; + gcc_assert (REG_SUBREG_P (reg1) && REG_SUBREG_P (reg2)); only_regs_p = REG_P (reg1) && REG_P (reg2); reg1 = go_through_subreg (reg1, &offset1); Your thoughts? Peter
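For reference, here is the test case from this message laid out as a compilable unit. The code is the same as in the message; only the layout and the comments are added. Under the setup described above (-O2 -m32 on powerpc64-linux), the `return 3` and `return 1` paths are the ones that showed the extra `li 9,N` / `mr 3,9` pair.

```c
/* Slightly modified version of the PR16458 test case, as quoted
   in the message.  */
int foo(unsigned int a, unsigned int b)
{
    if (a == b)
        return 1;
    if (a > b)
        return 2;
    if (a < b)
        return 3;
    if (a != b)
        return 4;   /* never reached: one of the three tests above always holds */
    return 0;       /* never reached, for the same reason */
}
```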
Re: Discussion: What is unspec_volatile?
On Sat, 2010-11-13 at 11:27 +0100, Paolo Bonzini wrote: > On 11/12/2010 03:25 PM, H.J. Lu wrote: > > IRA may move instructions across an unspec_volatile, > > Do you have a testcase? Are you sure it's IRA and not our old friend update_equiv_regs() which IRA calls? http://gcc.gnu.org/PR41171 shows an example where update_equiv_regs() moves code around. Peter
Re: %pc relative addressing of string literals/const data
On Mon, 2010-11-08 at 21:13 +, Dave Korn wrote: > On 08/11/2010 13:44, Joakim Tjernlund wrote: > > One ping and a few days later and nothing. Very frustrating. I don't > > believe all PPC devs are so "busy" that none has the time to look > > at a simple one liner. What is up? > > There's only the one of him. He probably is that busy. He's a very nice > bloke and wouldn't be snubbing you just to be nasty, but he does have a day > job as well as volunteering for GCC. Not to mention he was at the recent GCC Summit and probably has a large backlog of email to catch up with. Hälsningar, Peter
Re: GCC Binary
On Fri, 2010-08-06 at 12:27 -0700, Erick Garske wrote: > Is there a location where I can download the binary of GCC for the IBM i? > > http://gcc.gnu.org/install/binaries.html > > Are any of these compatible for the IBM i at V6R1M0? There is no support in GCC for native iSeries (AKA AS/400). Peter
Re: A question about mov pattern
On Thu, 2010-06-24 at 08:57 -0600, Jeff Law wrote: > On 06/24/10 02:02, Revital1 Eres wrote: > > Hello, > > > > In the new target I'm working on there are branch regs and gprs. > > The loads and store instructions are only to/from the gprs, so if a > > branch reg needs to be spilled it first needs to be moved to a gpr and > > then stored to memory. I've implemented mov pattern in the machine > > description file for the gprs and a mov pattern between gprs and branch > > regs; however I am not sure if I need to add more to model the behavior > > described above and if so how to do it. > > > Secondary reloads is the answer. > > This isn't a terribly uncommon situation. Handling of the shift > register (SAR) on the PA would be a good example. You can move the SAR > to/from a GPR, but SAR can not be stored directly to memory. Searches > for SAR in pa.c will get you a long way. The same is true for the condition register on PowerPC. Peter
Re: IRA undoing scheduling decisions
On Wed, 2009-09-02 at 11:49 -0400, Vladimir Makarov wrote: > So probably, it is worth to do update_equiv_reg as a separate pass. Agreed. > I'll submit a patch on next week (sorry, I am a bit busy this week). Sounds good. Thanks for taking care of this! Peter
Re: IRA undoing scheduling decisions
On Tue, 2009-09-01 at 16:46 -0400, Vladimir Makarov wrote: > Peter Bergner wrote: > > Were you going to whip that patch up or did you want me to? > > > I am going to do it by myself. Great! I'd like to see how your patch affects POWER6 performance. Do you have access to a POWER6 box? If not, can you send Pat and me the patch and we'll fire off a run on our POWER6 benchmark system. Thanks. Peter
Re: IRA undoing scheduling decisions
On Tue, 2009-09-01 at 10:38 -0400, Vladimir Makarov wrote: > We could do update_equiv_regs in a separate pass before the 1st insn > scheduling as it was before IRA. IIRC, update_equiv_regs() was always called as part of local-alloc, so it was always after sched1 even before IRA. That said, moving it to its own pass before sched1 sounds like an interesting idea. My patch from the other note basically didn't affect SPEC2000 at all, and we could use it, but if your idea works, I'm more than happy to dump my patch. :) Were you going to whip that patch up or did you want me to? Peter
Re: IRA undoing scheduling decisions
On Wed, 2009-08-26 at 17:12 -0500, Peter Bergner wrote: > On Wed, 2009-08-26 at 23:30 +0200, Richard Guenther wrote: > > Hmm. I suppose if you conditionalize it on flag_schedule_insns it might be > > an overall win. Care to SPEC test that change? > > I assume you mean like the change below? Yeah, I can SPEC test that. > > Peter > > > Index: ira.c > === > --- ira.c (revision 15) > +++ ira.c (working copy) > @@ -2510,6 +2510,8 @@ update_equiv_regs (void) >calls. */ > > if (REG_N_REFS (regno) == 2 > + && (!flag_schedule_insns > + || REG_BASIC_BLOCK (regno) < NUM_FIXED_BLOCKS) > && (rtx_equal_p (x, src) > || ! equiv_init_varies_p (src)) > && NONJUMP_INSN_P (insn) Pat ran the patch on SPEC2000 and it was very neutral. The overall SPECFP number didn't change and the SPECINT number only improved by 0.2%, which is pretty much in the noise. I think Vlad's suggestion of moving update_equiv_regs() to its own pass before sched1 sounds interesting. If that works, it's probably better than this patch. Peter
Re: IRA undoing scheduling decisions
On Wed, 2009-08-26 at 23:30 +0200, Richard Guenther wrote: > On Wed, Aug 26, 2009 at 10:47 PM, Peter Bergner wrote: > > Looking at update_equiv_regs(), if I disable the replacement for regs > > that are local to one basic block (patch below) like it existed before > > John Wehle's patch way back in Oct 2000: > > > > http://gcc.gnu.org/ml/gcc-patches/2000-09/msg00782.html > > > > then we get the ordering we want. Does anyone know why John removed > > that part of the test in his patch? Thoughts anyone? > > Hmm. I suppose if you conditionalize it on flag_schedule_insns it might be > an overall win. Care to SPEC test that change? I assume you mean like the change below? Yeah, I can SPEC test that. Peter Index: ira.c === --- ira.c (revision 15) +++ ira.c (working copy) @@ -2510,6 +2510,8 @@ update_equiv_regs (void) calls. */ if (REG_N_REFS (regno) == 2 + && (!flag_schedule_insns + || REG_BASIC_BLOCK (regno) < NUM_FIXED_BLOCKS) && (rtx_equal_p (x, src) || ! equiv_init_varies_p (src)) && NONJUMP_INSN_P (insn)
Re: IRA undoing scheduling decisions
On Mon, 2009-08-24 at 23:56 +, Charles J. Tabony wrote: > I am seeing a performance regression on the port I maintain, and I would > appreciate some pointers. > > When I compile the following code > > void f(int *x, int *y){ > *x = 7; > *y = 4; > } > > with GCC 4.3.2, I get the desired sequence of instructions. I'll call it > sequence A: > > r0 = 7 > r1 = 4 > [x] = r0 > [y] = r1 > > When I compile the same code with GCC 4.4.0, I get a sequence that is lower > performance for my target machine. I'll call it sequence B: > > r0 = 7 > [x] = r0 > r0 = 4 > [y] = r0 This is caused by update_equiv_regs() which IRA inherited from local-alloc.c. Although with gcc 4.3 and earlier, you don't see the problem, it is still there, because if you look at the 4.3 dumps, you will see that update_equiv_regs() unordered them for us. What is saving us is that sched2 reschedules them again for us in the order we want. With 4.4, IRA happens to reuse the same register for both pseudos, so sched2's hands are tied and it cannot schedule them back again for us. Looking at update_equiv_regs(), if I disable the replacement for regs that are local to one basic block (patch below) like it existed before John Wehle's patch way back in Oct 2000: http://gcc.gnu.org/ml/gcc-patches/2000-09/msg00782.html then we get the ordering we want. Does anyone know why John removed that part of the test in his patch? Thoughts anyone? Peter Index: ira.c === --- ira.c (revision 15) +++ ira.c (working copy) @@ -2510,6 +2510,7 @@ update_equiv_regs (void) calls. */ if (REG_N_REFS (regno) == 2 + && REG_BASIC_BLOCK (regno) < NUM_FIXED_BLOCKS && (rtx_equal_p (x, src) || ! equiv_init_varies_p (src)) && NONJUMP_INSN_P (insn)
Re: Incorrect line info in printf for powerpc-eabisim -mhard-float
On Thu, 2009-07-16 at 13:55 -0700, Michael Eager wrote: > I've tracked down a failure in gdb to hit a breakpoint > set at printf to the breakpoint being placed incorrectly. > > Here is the code generated for printf with -mhard-float: > > .loc 1 29 0 > .cfi_startproc > .LVL0: > mflr 0 > stwu 1,-112(1) > .LCFI0: > .cfi_def_cfa_offset 112 > stw 5,24(1) > stw 0,116(1) > stw 6,28(1) > stw 7,32(1) > stw 8,36(1) > stw 9,40(1) > stw 10,44(1) > bne- 1,.L2 <<< - 1 > .cfi_offset 65, 4 > .loc 1 29 0 <<< - 2 > stfd 1,48(1)<<< - 3 > stfd 2,56(1) > stfd 3,64(1) > stfd 4,72(1) > stfd 5,80(1) > stfd 6,88(1) > stfd 7,96(1) > stfd 8,104(1) > .L2: > .loc 1 34 0 > > Gdb places a breakpoint at printf() at the stfd instruction (3). > This appears to be because of the .loc at (2). When the code is > executed, the branch (1) is taken, jumping over the breakpoint. > I think that the .loc at (2) should not be generated, since it is > in the middle of the prologue code. Luis, isn't there a bugzilla regarding this? This seems to me to be similar to what you had been looking at. Peter
Re: (known?) Issue with bitmap iterators
On Sat, 2009-06-20 at 17:01 +0200, Richard Guenther wrote: > On Sat, Jun 20, 2009 at 4:54 PM, Jeff Law wrote: > > > > Imagine a loop like this > > > > EXECUTE_IF_SET_IN_BITMAP (something, 0, i, bi) > > { > > bitmap_clear_bit (something, i) > > [ ... whatever code we want to process i, ... ] > > } > > > > This code is unsafe. [snip] > It is known (but maybe not appropriately documented) that deleting > bits in the bitmap you iterate over is not safe. If it would be me I would > see if I could make it safe though. FYI, that's what I did with the sparseset implementation, so: EXECUTE_IF_SET_IN_SPARSESET (something, i) { sparseset_clear_bit (something, i); [ ... whatever code we want to process i, ... ] } is safe. In fact, we use it for one of the special cases in sparseset_and() and sparseset_and_compl(). Peter
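To illustrate why a sparse set can tolerate deleting the element currently being visited, here is a minimal sketch of a Briggs/Torczon-style sparse set. It is not GCC's sparseset implementation: the `sset_*` names are hypothetical, and the reverse walk over the dense array is just one simple way to get the property, chosen for this sketch.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical sparse set over the universe [0, size).  */
typedef struct {
    unsigned *dense;    /* members, packed in dense[0 .. members-1] */
    unsigned *sparse;   /* sparse[e] = position of e in dense[] */
    unsigned members;
    unsigned size;
} sset;

sset *sset_new(unsigned size)
{
    sset *s = malloc(sizeof *s);
    assert(s != NULL && size > 0);
    s->dense = calloc(size, sizeof *s->dense);
    s->sparse = calloc(size, sizeof *s->sparse);
    assert(s->dense != NULL && s->sparse != NULL);
    s->members = 0;
    s->size = size;
    return s;
}

void sset_free(sset *s)
{
    free(s->dense);
    free(s->sparse);
    free(s);
}

int sset_has(const sset *s, unsigned e)
{
    unsigned idx = s->sparse[e];
    return idx < s->members && s->dense[idx] == e;
}

void sset_add(sset *s, unsigned e)
{
    assert(e < s->size);
    if (!sset_has(s, e)) {
        s->sparse[e] = s->members;
        s->dense[s->members++] = e;
    }
}

/* Delete e by moving the last dense member into its slot.  */
void sset_del(sset *s, unsigned e)
{
    if (sset_has(s, e)) {
        unsigned idx = s->sparse[e];
        unsigned last = s->dense[--s->members];
        s->dense[idx] = last;
        s->sparse[last] = idx;
    }
}

/* Visit every member exactly once, deleting those for which pred()
   is true.  Walking dense[] from the end means the member that
   sset_del() moves into slot i has always been visited already.
   Returns the number of members visited.  */
unsigned sset_remove_if(sset *s, int (*pred)(unsigned))
{
    unsigned visited = 0;
    for (unsigned i = s->members; i-- > 0; ) {
        unsigned e = s->dense[i];
        visited++;
        if (pred(e))
            sset_del(s, e);
    }
    return visited;
}

/* Example predicate for the sketch.  */
int is_even(unsigned e)
{
    return e % 2 == 0;
}
```

A linked-list bitmap has no equivalent cheap trick, since clearing a bit can free the very element the iterator is standing on.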
Re: Status of the DLX backend for GCC?
On Sat, 2008-10-04 at 18:48 +0200, Gerald Pfeifer wrote: > Thanks for the background on this, Peter, and the background on this > site disappearing. > > The reason I asked was that we have that reference from our site to that > URL and I failed to find any replacement so far. The first two hits that > I get in Google actually are mails by you in the gcc archives. ;-) > > I guess we'll just have to remove that reference? I talked with Aaron Sawdey and he still had the tarballs which he has given me. Let me go through a build process with them to make sure they still build and then I'll post them somewhere you can link to. Peter
Re: improving testsuite runtime
On Fri, 2008-09-19 at 09:41 +1000, Ben Elliston wrote: > On Thu, 2008-09-18 at 10:44 -0600, Tom Tromey wrote: > > Yeah, this seems necessary. Ideally the order ought to be stable, too. > > Do you think that the current order of .exps should be preserved in the > resultant .sum and .logs? I guess some people and/or build farms > actually use diff rather than compare_tests? Do people still use compare_tests? Talking with Janis, she mentioned that it wasn't multilib (ie, RUNTESTFLAGS="--target_board=unix'{-m32,-m64}') compatible, but that test_summary was. It's what I've been using to compare two runs. Peter
Re: IRA copy heuristics
On Thu, 2008-09-04 at 20:28 -0400, David Edelsohn wrote: > On Thu, Sep 4, 2008 at 7:39 PM, Vladimir Makarov <[EMAIL PROTECTED]> wrote: > > Meanwhile I am going to submit your second patch with an added > > comment. The patch permits gcc to generate the same quality code as > > before your first patch. > > Why? > > As Richard said before: > > "... it changes > the heuristics _without any explanation of why this is necessary_. > IMO, that's unacceptable for our shiny, new (and generally very nice) > register allocator. And I think it's unacceptable even if it happens > to fix a performance regression." I have to agree with Richard and David here. I find it troubling that allocation order affects performance by anything other than a small amount due to heuristic noise. It might be in the end there is a valid reason why Richard's patch has a positive benefit, but until we know why, I'd rather wait. Peter
Re: Bootstrap failures on ToT, changes with no ChangeLog entry?
On Thu, 2008-07-24 at 18:48 +0200, Andreas Schwab wrote: > Definitely something fishy around that time. svn log says: > > > r138082 | meissner | 2008-07-23 13:18:03 +0200 (Mi, 23 Jul 2008) | 1 line > > Add missing ChangeLog from 138075 > > r138078 | meissner | 2008-07-23 13:06:42 +0200 (Mi, 23 Jul 2008) | 1 line > > undo 138077 > > r138075 | meissner | 2008-07-23 12:28:06 +0200 (Mi, 23 Jul 2008) | 1 line > > Add ability to set target options (ix86 only) and optimization options on a > func > > > And svn diff says: > > $ svn diff -c138078 > svn: Unable to find repository location for '' in revision 138077 > $ svn diff -c138077 > svn: The location for '' for revision 138077 does not exist in the repository > or refers to an unrelated object > > Apparently the repository has some issues with revision 138077. Maybe it's related to this #gcc comment: [snip] However, I did accidentally delete the trunk when I was trying to delete the branch, and did a copy from the previous version. Is there any way on the svn pre-commits to prevent somebody deleting the trunk? Peter
Re: Bad code generation on HPPA platform
On Thu, 2008-05-08 at 11:38 -0700, Steve Ellcey wrote: > The pseudo for %r8 does have REG_POINTER set and the pseudo for %r19 > does not. I first see REG_POINTER set for ivtmp___1536 (the pseudo for > %r8) in flow.c.138r.loop2_invariant. This seems interesting because > Peter's patch, that fixes this problem without undoing Andrews patch, > includes a change to loop-invariant.c, though that change should be > preserving REG_POINTER's during optimization not preventing them. Similar to hppa, power6 cares about knowing whether a pseudo is a pointer or not, because for regA + regB load/store addressing, we get much better performance if regA is the pointer and regB is the offset rather than the other way around. What I found was that the loop invariant and GCSE code were creating some pseudos to copy expressions into, but were failing to copy the REG_POINTER/MEM_POINTER attribute along with it. The hunk from: http://gcc.gnu.org/ml/gcc-patches/2008-04/msg00693.html which replaced the rtlanal.c from the first commit was needed at -O0, because the only chance to order the operands at -O0 is at expand time. Peter
Re: Bad code generation on HPPA platform
On Wed, 2008-05-07 at 11:03 -0700, Steve Ellcey wrote: > > Can you please also add the replacement hunk from: > > > > http://gcc.gnu.org/ml/gcc-patches/2008-04/msg00693.html > > > > If the first part gets backported, I'd like the second hunk to > > go along with it if possible. Thanks. > > > > Peter > > I was wondering about that patch since it seems to be related to the > other changes. I will include it in my 4.3 branch testing. Yes, it ends up doing the same thing the rtlanal.c hunk that was reverted did, but in a manner much more friendly to CRIS. Thanks. Peter
Re: Bad code generation on HPPA platform
On Wed, 2008-05-07 at 10:10 -0700, Steve Ellcey wrote: > Yes, it looks like it is. I added -fno-strict-aliasing and the perl > benchmarks passed when compiled with ToT GCC. That makes me feel better > about the idea of putting Peter's patch (with the revert) on the 4.3 > branch as a way to fix the HPPA bad code generation bug. I am going to > test that patch on the branch and verify that it fixes my SPEC/GCC > failure. Can you please also add the replacement hunk from: http://gcc.gnu.org/ml/gcc-patches/2008-04/msg00693.html If the first part gets backported, I'd like the second hunk to go along with it if possible. Thanks. Peter
Re: Bad code generation on HPPA platform
On Wed, 2008-05-07 at 07:45 -0700, Steve Ellcey wrote: > I have found that this problem does not occur on the ToT sources and > that the problem went away with this patch: > > 2008-04-07 Peter Bergner <[EMAIL PROTECTED]> > >PR middle-end/PR28690 >* rtlanal.c: Update copyright years. >(commutative_operand_precedence): Give SYMBOL_REF's the same precedence >as REG_POINTER and MEM_POINTER operands. >* emit-rtl.c (gen_reg_rtx_and_attrs): New function. >(set_reg_attrs_from_value): Call mark_reg_pointer as appropriate. >* rtl.h (gen_reg_rtx_and_attrs): Add prototype for new function. >* gcse.c: Update copyright years. >(pre_delete): Call gen_reg_rtx_and_attrs. >(hoist_code): Likewise. >(build_store_vectors): Likewise. >(delete_store): Likewise. >* loop-invariant.c (move_invariant_reg): Likewise. >Update copyright years. > > I don't know if porting this patch to the 4.3 branch is an option or not > but it might be the easiest way to fix this problem without having to > revert Andrew's patch. Note that the rtlanal.c:commutative_operand_precedence() hunk was reverted because it caused some problems on CRIS and was replaced by the following safer change: http://gcc.gnu.org/ml/gcc-patches/2008-04/msg00693.html Peter
Re: IRA for GCC 4.4
On Mon, 2008-04-28 at 18:07 -0400, Vladimir Makarov wrote: > I am currently working on bit matrix compression. It is not implemented > yet. I hope it will be ready in a week. Ahh, ok. Well, hopefully the code I wrote on the trunk is useful for IRA. If you have questions about it, let me know, or if you want me to look into it on IRA, just point me to your current code that does this and I'll try and take a look when I have some free cycles. I'll note that the real key to eliminating the space from the bit matrix isn't that we know two allocnos do not interfere, but rather that we know we'll never test for whether they conflict or not. Since our definition of conflict is "live at the definition of another", that simply translates into, if they're never simultaneously live, then we'll never call any bit matrix routines asking whether they conflict or not, so we don't need to reserve space for any conflict info. The fact that local allocnos from different blocks are never simultaneously live was just a very easy and inexpensive property to measure. If your live range info can easily and cheaply partition the allocnos into sets that are and are not live simultaneously, then you should be able to see some further reductions over what I'm seeing...which I think I've shown, can be considerable. Peter
Re: IRA for GCC 4.4
On Mon, 2008-04-28 at 16:01 -0400, Vladimir Makarov wrote: > Thanks, Peter. That was clever and email is very enlightening. I have > analogous idea for more compact conflict matrix representation. IRA > builds allocno live ranges first (they are ranges of program points > where the allocno lives). I can use this information for fast searching > potential conflicts to sort the allocnos. Probably the matrix will be > even more compact because live ranges contain more detail info than > basic blocks where the local allocnos live. For example, the ranges > even can show that allocnos local in the same block will never > conflict. It means that matrix even for fppp can be compressed. You say you use your analogous idea now? Can you point me to the code? I thought I heard you (maybe someone else?) say that your conflict information was much bigger than old mainline. If this is true and you are compacting the bit matrix like I am, why is it so big? > I tried to use sparsets for the same purposes (only for maintaining and > processing allocnos currently living). But usage of sparsets for this > purposes gave practically nothing (I had to use valgrind lackey to see > the difference). Therefore I decided not to introduce the additional > data and use just bitmaps for this. > > Sparsets already exists in a compiler. I am thinking about their usage > too. May be you have a benchmark where the sparsets give a visible > compiler speed improvement (my favorite was combine.i). I'd appreciate > if you point me such benchmark. It could help me to make a decision to > use sparsets. Yes, I added the sparseset implementation that has been in since gcc 4.3. Did you use my sparseset implementation or did you write your own for your tests? I don't recall which file(s) I saw the difference on. All I recall is I tried it both ways, saw a difference somewhere and promptly threw the slower code away along with which file(s) I saw the difference on. Sorry I can't be of more help. 
Given how sparsesets are implemented, I cannot see how they could ever be slower than bitmaps for the use of "live", but I can see how they might be faster. That said, if your allocator is spending enough time elsewhere, then I can easily imagine the difference being swamped such that you don't see any difference at all. Peter
Re: IRA for GCC 4.4
On Thu, 2008-04-24 at 20:23 -0400, Vladimir Makarov wrote: > Hi, Peter. The last time I looked at the conflict builder > (ra-conflict.c), I did not see the compressed matrix. Is it in the > trunk? What should I look at? Yes, the compressed bit matrix was committed as revision 129037 on October 5th, so it's been there a while. Note that the old square bit matrix was used not only for testing for conflicts, but also for visiting an allocno's neighbors. The new code (and all compilers I've worked on/with), use a {,compressed} upper triangular bit matrix for testing for conflicts and an adjacency list for visiting neighbors. The code that allocates and initializes the compressed bit matrix is in global.c. If you remember how an upper triangular bit matrix works, it's just one big bit vector, where the bit number that represents the conflict between allocnos LOW and HIGH is given by either of these two functions:

  1) bitnum = f(HIGH) + LOW
  2) bitnum = f(LOW) + HIGH

where:

  1) f(HIGH) = (HIGH * (HIGH - 1)) / 2
  2) f(LOW) = LOW * (max_allocno - LOW) + (LOW * (LOW - 1)) / 2 - LOW - 1

As mentioned in some of the conflict graph bit matrix literature (actually, they only mention expression #1 above), the expensive functions f(HIGH) and f(LOW) can be precomputed and stored in an array, so to access the conflict graph bits only takes a load and an addition. 
Below is an example bit matrix with initialized array:

               0    1    2    3    4    5    6    7    8    9   10   11
          -------------------------------------------------------------
| -1 |  0 |    |  0 |  1 |  2 |  3 |  4 |  5 |  6 |  7 |  8 |  9 | 10 |
|  9 |  1 |    |    | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 |
| 18 |  2 |    |    |    | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 |
| 26 |  3 |    |    |    |    | 30 | 31 | 32 | 33 | 34 | 35 | 36 | 37 |
| 33 |  4 |    |    |    |    |    | 38 | 39 | 40 | 41 | 42 | 43 | 44 |
| 39 |  5 |    |    |    |    |    |    | 45 | 46 | 47 | 48 | 49 | 50 |
| 44 |  6 |    |    |    |    |    |    |    | 51 | 52 | 53 | 54 | 55 |
| 48 |  7 |    |    |    |    |    |    |    |    | 56 | 57 | 58 | 59 |
| 51 |  8 |    |    |    |    |    |    |    |    |    | 60 | 61 | 62 |
| 53 |  9 |    |    |    |    |    |    |    |    |    |    | 63 | 64 |
| 54 | 10 |    |    |    |    |    |    |    |    |    |    |    | 65 |
| NA | 11 |    |    |    |    |    |    |    |    |    |    |    |    |

As an example, if we look at the interference between allocnos 8 and 10, we compute "array[8] + 10" = "51 + 10" = "61", which if you look above, you will see is the correct bit number for that interference bit.

The difference between a compressed upper triangular bit matrix from a standard upper triangular bit matrix like the one above, is we eliminate space from the bit matrix for conflicts we _know_ can never exist. The easiest case to catch, and the only one we catch at the moment, is that allocnos that are "local" to a basic block B1 cannot conflict with allocnos that are local to basic block B2, where B1 != B2. To take advantage of this fact, I updated the code in global.c to sort the allocnos such that all "global" allocnos (allocnos that are live in more than one basic block) are given smaller allocno numbers than the "local" allocnos, and all allocnos for a given basic block are grouped together in a contiguous range to allocno numbers. The sorting is accomplished by:

  /* ...so we can sort them in the order we want them to receive
     their allocnos. */
  qsort (reg_allocno, max_allocno, sizeof (int), regno_compare);

Once we have them sorted, our conceptual view of the compressed bit matrix will now look like:

                 G    G    G   B0   B0   B0   B1   B1   B2   B2   B2   B2
                 0    1    2    3    4    5    6    7    8    9   10   11
            -------------------------------------------------------------
| -1 | G  0 |    |  0 |  1 |  2 |  3 |  4 |  5 |  6 |  7 |  8 |  9 | 10 |
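The uncompressed upper triangular indexing described in this message (expression #2, with f(LOW) precomputed into an array) can be sketched in a few lines of C. This is an illustration only, not the code in global.c; the function names `build_row_offsets` and `conflict_bitnum` are made up for the sketch.

```c
#include <assert.h>
#include <stdlib.h>

/* Precompute f(LOW) for every allocno, using expression #2 from the
   message:
     f(LOW) = LOW * (max_allocno - LOW) + (LOW * (LOW - 1)) / 2 - LOW - 1
   The entry for the last allocno is never used as a row, which is
   why the example table shows it as NA.  */
long *build_row_offsets(long max_allocno)
{
    assert(max_allocno > 0);
    long *array = malloc((size_t) max_allocno * sizeof *array);
    if (array == NULL)
        return NULL;
    for (long low = 0; low < max_allocno; low++)
        array[low] = low * (max_allocno - low)
                     + (low * (low - 1)) / 2 - low - 1;
    return array;
}

/* Bit number for the conflict between two distinct allocnos:
   one load from the precomputed array plus one addition.  */
long conflict_bitnum(const long *array, long a, long b)
{
    assert(a != b);
    long low = a < b ? a : b;
    long high = a < b ? b : a;
    return array[low] + high;
}
```

With 12 allocnos this reproduces the worked example: the entry for allocno 8 is 51, so the bit for the pair (8, 10) is 61.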
Re: IRA for GCC 4.4
On Thu, 2008-04-24 at 16:33 -0500, Peter Bergner wrote:
> On Thu, 2008-04-24 at 16:51 +0200, Paolo Bonzini wrote:
> > >> (The testcase is 400k lines of preprocessed Fortran code, 16M is size,
> > >> available here:
> > >> http://www.pci.unizh.ch/vandevondele/tmp/all_cp2k_gfortran.f90.gz)
> > >>
> > > Thanks, I'll check it.
> >
> > Vlad, I think you should also try to understand what does trunk do with
> > global (and without local allocation) at -O0.  That will give a
> > measure of the benefit from Peter's patches for conflict graph building.
>
> I took a patch from Ken/Steven that disabled local_alloc and instead runs
> global_alloc() at -O0 and summing up all of the bit matrix allocation
> info we emit into the *.greg output, the new conflict builder saves a lot
> of space compared to the old square bit matrix (almost 20x less space).
> Here's the accumulated data for the test case above:
>
>     compressed upper triangular:  431210251 bits,   53902848 bytes
>     upper triangular:            4264666581 bits,  533084851 bytes
>     square:                      8531372796 bits, 1066423618 bytes

The SPEC2000 numbers look even better (29x less space):

    compressed upper triangular:  281657797 bits,   35212532 bytes
    upper triangular:            4094809686 bits,  511856604 bytes
    square:                      8191641644 bits, 1023962188 bytes

Peter
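The "almost 20x" and "29x" figures can be rechecked from the bit counts
quoted above.  The following is a minimal sketch; space_ratio and the
constant names are made up for illustration, and only the numbers come
from the message.

```c
/* Bit counts copied from the figures quoted in the message.  */
static const unsigned long long cp2k_square_bits = 8531372796ULL;
static const unsigned long long cp2k_compressed_bits = 431210251ULL;
static const unsigned long long spec_square_bits = 8191641644ULL;
static const unsigned long long spec_compressed_bits = 281657797ULL;

/* Return how many times larger BASELINE_BITS is than COMPRESSED_BITS.  */
static double
space_ratio (unsigned long long baseline_bits,
             unsigned long long compressed_bits)
{
  return (double) baseline_bits / (double) compressed_bits;
}
```

Calling space_ratio on the cp2k pair gives roughly 19.8, and on the
SPEC2000 pair roughly 29.1, consistent with the savings stated in the
text.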
Re: IRA for GCC 4.4
On Thu, 2008-04-24 at 16:51 +0200, Paolo Bonzini wrote:
> >> (The testcase is 400k lines of preprocessed Fortran code, 16M is size,
> >> available here:
> >> http://www.pci.unizh.ch/vandevondele/tmp/all_cp2k_gfortran.f90.gz)
> >>
> > Thanks, I'll check it.
>
> Vlad, I think you should also try to understand what does trunk do with
> global (and without local allocation) at -O0.  That will give a
> measure of the benefit from Peter's patches for conflict graph building.

I took a patch from Ken/Steven that disables local_alloc and instead runs
global_alloc() at -O0.  Summing up all of the bit matrix allocation info
we emit into the *.greg output, the new conflict builder saves a lot of
space compared to the old square bit matrix (almost 20x less space).
Here's the accumulated data for the test case above:

    compressed upper triangular:  431210251 bits,   53902848 bytes
    upper triangular:            4264666581 bits,  533084851 bytes
    square:                      8531372796 bits, 1066423618 bytes

Peter
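The roughly 2x gap between the "square" and "upper triangular" rows is
what per-function bit counts would predict.  A minimal sketch, assuming
a square matrix spends n * n bits on n allocnos and the upper triangular
form stores only pairs (i, j) with i < j, with no diagonal; the function
names are hypothetical, and the byte totals actually allocated may
include rounding that this ignores.

```c
/* Bits needed by a full square conflict matrix over N allocnos.  */
static unsigned long long
square_bits (unsigned long long n)
{
  return n * n;
}

/* Bits needed when only the pairs (i, j) with i < j are stored.  */
static unsigned long long
upper_triangular_bits (unsigned long long n)
{
  return n * (n - 1) / 2;
}
```

For 12 allocnos this gives 144 bits for the square form against 66 for
the upper triangular form, so the triangular form is always slightly
less than half the square one.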