Re: [RFC][PATCH 0/5] arch: atomic rework
I don't know the specifics of your example, but from how I understand it, I don't see a problem if the compiler can prove that the store will always happen. To be more specific, if the compiler can prove that the store will happen anyway, and the region of code can be assumed to always run atomically (e.g., there's no loop or such in there), then it is known that we have one atomic region of code that will always perform the store, so we might as well do the stuff in the region in some order. Now, if any of the memory accesses are atomic, then the whole region of code containing those accesses is often not atomic because other threads might observe intermediate results in a data-race-free way. (I know that this isn't a very precise formulation, but I hope it brings my line of reasoning across.)

So given something like:

    if (x)
        y = 3;

assuming both x and y are atomic (so don't gimme crap for not knowing the C11 atomic incantations), and you can prove x is always true, you don't see a problem with not emitting the conditional?

Avoiding the conditional changes the result; see that control dependency email from earlier. In the above example the load of X and the store to Y are strictly ordered, due to control dependencies. Not emitting the condition, and maybe not even emitting the load, completely wrecks this. It's therefore an invalid optimization to take out the conditional or speculate the store, since it takes out the dependency.
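For readers who want the "C11 atomic incantations" spelled out, here is a minimal sketch of the example. The function names and the choice of memory_order_relaxed are assumptions made for illustration; the thread does not say which memory orders were intended. The second function shows the shape the code would take if a compiler proved x to be nonzero and dropped both the branch and the load, which is the transformation objected to above.

```c
#include <assert.h>
#include <stdatomic.h>

/* The example from the mail: the store to y is control-dependent on the
 * load of x. memory_order_relaxed is an assumption, not from the thread. */
static void conditional_store(atomic_int *x, atomic_int *y)
{
    assert(x && y);
    if (atomic_load_explicit(x, memory_order_relaxed))
        atomic_store_explicit(y, 3, memory_order_relaxed);
}

/* Hypothetical result of the disputed optimization: neither the branch nor
 * the load of x remains, so nothing orders the store after a load of x. */
static void store_without_condition(atomic_int *y)
{
    assert(y);
    atomic_store_explicit(y, 3, memory_order_relaxed);
}
```

In a single thread both versions leave y equal to 3 whenever x is nonzero; the difference only matters to other CPUs that rely on the load-to-store ordering.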
gnat.dg test: div_no_warning.adb
Hi,

I'm trying to use the tests in gcc/testsuite/gnat.dg and I'm having trouble understanding one particular test: gnat.dg/div_no_warning.adb. If I understand correctly, this test is expected to compile without any error or warning. When I compile this test with a native compiler (gnatmake), I get neither an error nor a warning. But when I compile it with a GCC port to a private target, I get the following output:

    div_no_warning.adb:13:20: warning: division by zero
    div_no_warning.adb:13:20: warning: Constraint_Error will be raised at run time

The source looks like:

    ...
     4: Flag : constant Boolean := False;
    ...
    12: if Flag and then F then
    13:    Int := Int / 0;
    14: end if;
    ...

I checked the assembler produced by my compiler; it does not contain any code corresponding to the division or to the if statement shown above. So I guess the compiler has detected that the "if False and then ..." is dead code, but I still get the warnings. I'm having a hard time finding what kind of option is activated or suppressed in the native compiler so that it does not output those warnings.

I'm working on GCC 4.7.3 and GNAT 7.1.2. Any help appreciated.

Thanks,
Didier
m68k optimisation for beginners?
Hi.

I would like to get started with how to improve code generation for a backend. Any pointers, especially to good documentation, are welcome.

For this example consider this C function for a reference counted type:

    void TCRelease(TCTypeRef tc) {
      if (--tc->retainCount == 0) {
        if (tc->destroy) {
          tc->destroy(tc);
        }
        free((void *)tc);
      }
    }

The generated m68k asm is this:

    _TCRelease:
            move.l %a2,-(%sp)
            move.l 8(%sp),%a2
            move.w (%a2),%d0        ; Question 1:
            subq.w #1,%d0
            move.w %d0,(%a2)
            jne .L7
            move.l 4(%a2),%a0       ; Question 2:
            cmp.w #0,%a0
            jeq .L9
            move.l %a2,-(%sp)       ; Question 3:
            jsr (%a0)
            addq.l #4,%sp
    .L9:
            move.l %a2,8(%sp)
            move.l (%sp)+,%a2
            jra _free
    .L7:
            move.l (%sp)+,%a2
            rts

Question 1: This could be done as one instruction, sub.l #1, (%a2); the result in d0 is never used again, and subtracting directly in memory will update the status flags. Would save 4 bytes, and 8 cycles on a 68000. How would I attack this problem? Peephole optimisation, or maybe GCC is not aware that the instruction updates flags?

Question 2: Doing this as a move.l 4(%a2), %d0 to a temporary data register would update the status register, allowing for the branch without the compare with immediate instruction. Obviously requiring an extra move %d0, %a0 if the branch is not taken to be able to make the jump. But still 2 bytes, and 8 cycles saved in worst case (12 cycles is best case). Is this a peephole optimisation? Or is it about providing accurate instruction costs for inst?

Question 3: Storing a2 on the stack is only ever needed if this code path is taken. Is this even worth bothering with? And is this something that moving from reload to LRA for the m68k target solves?

// Fredrik Olsson
Re: Fwd: LLVM collaboration?
On Tue, Feb 11, 2014 at 10:20 PM, Jan Hubicka hubi...@ucw.cz wrote:

Since both toolchains do the magic, binutils has no incentive to create any automatic detection of objects.

It is mostly a historical decision. At the time the design was for the plugin to be matched to the compiler, and so the compiler could pass that information down to the linker.

The trouble however is that one needs to pass an explicit --plugin argument specifying the particular plugin to load, and so GCC ships with its own wrappers (gcc-nm/gcc-ld/gcc-ar and the gcc driver itself) while LLVM does a similar thing.

These wrappers should not be necessary. While the linker currently requires a command line option, bfd has support for searching for a plugin. It will search inst/lib/bfd-plugin. See for example the instructions at http://llvm.org/docs/GoldPlugin.html.

My reading of bfd/plugin.c is that it basically walks the directory and looks for the first plugin that returns OK for onload (that is always the case for GCC/LLVM plugins). So if I install the GCC and LLVM plugins there, it will depend on which one ends up being first, and only that plugin will be used. We need multiple plugin support as suggested by the directory name ;) Also it seems that currently the plugin is not used if the file is ELF for ar/nm/ranlib (as mentioned by Markus), and GNU ld also seems to choke on LLVM object files even if it has the plugin. This probably needs to be sanitized.

This was done because ar and nm are not normally bound to any compiler. Had we realized this issue earlier we would probably have supported searching for plugins in the linker too.

So it seems that what you want could be done by
* having bfd-ld and gold search bfd-plugins (maybe rename the directory?)
* supporting loading multiple plugins, and asking each to see if it supports a given file. That way we could LTO when having a part GCC and part LLVM build.

Yes, that is what I have in mind. Plus perhaps an additional configuration file to avoid loading everything.
Say a user installs 3 versions of LLVM, open64 and ICC. If all of them load as a shared library, like LLVM does, it will probably slow down the tools measurably.

What about instead of our current odd way of identifying LTO objects simply add a special ELF note telling the linker the plugin to use?

    .note._linker_plugin '/./libltoplugin.so'

That way the linker should try 1) loading that plugin, 2) registering the specific object with that plugin. If a full path is undesired (depends on install setup) then specifying the plugin SONAME might also work (we'd of course need to bump our plugin's SONAME for each release to allow parallel install of multiple versions, or make the plugin contain all the dispatch-to-different-GCC-version-lto-wrapper code).

Richard.

* maybe be smart about version and load new ones first? (libLLVM-3.4 before libLLVM-3.3 for example). Probably the first one should always be the one given in the command line.

Yes, I think we may want to prioritize the list, so that a user can have his own version of GCC prevail over the system one, for example.

For OS X the situation is a bit different. There, instead of a plugin, the linker loads a library: libLTO.dylib. When doing LTO with a newer llvm, one needs to set DYLD_LIBRARY_PATH. I think I proposed setting that from clang some time ago, but I don't remember the outcome. In theory GCC could implement a libLTO.dylib and set DYLD_LIBRARY_PATH. The gold/bfd plugin that LLVM uses is basically an API mapping the other way, so the job would be inverting it. The LTO model in ld64 is a bit more strict about knowing all symbol definitions and uses (including inline asm), so there would be work to be done to cover that, but the simple cases shouldn't be too hard.

I would not care that much about symbols in asm definitions to start with. Even if we force users to non-LTO those object files, it would be an improvement over what we have now.
One problem is that we need a volunteer to implement the reverse glue (libLTO-plugin API), since I do not have an OS X box (well, I have an old G5, but even that is quite far from me right now).

Why are complete symbol tables required? Can't ld64 be changed to ignore unresolved symbols in the first stage, just like gold/gnu-ld does?

Honza

Cheers, Rafael
Re: sparse overlapping structs for vectorization
On Wed, Feb 12, 2014 at 7:21 AM, Albert Cahalan acaha...@gmail.com wrote: I had a problem that got solved in an ugly way. I think gcc ought to provide a few ways to make a nicer solution. There was an array of structs roughly like so: struct{int w;float x;char y[4];short z[2];}foo[512][4]; The types within the struct are 4 bytes each; I don't actually remember anything else and it doesn't matter except that they are distinct. I think it was bitfields actually, neatly grouped into groups of 32 bits. In other words, like 4 4-byte values but with more-or-less incompatible types. Note that 4 of the structs neatly fill a 64-byte cache line. An alignment attribute was used to ensure 64-byte alignment. The most common operation needed on this array is to compare the first struct member of 4 of the structs against a given value, looking to see if there is a match. SSE would be good. This would then be followed by using the matching entry if there is one, else picking one of the 4 to recycle and thus use. First bad solution: One could load up 4 SSE registers, shuffle things around... NO. Second bad solution: One could simply have 4 distinct arrays. This is bad because there are different cache lines for w, x, y, and z. Third bad solution: The array can be viewed as int foo[512][4][4] instead, with the struct forming the third array index. Note that the last two array indexes are both 4, so you can kind of swap them around. This groups 4 fields of each type together, allowing SSE. The problem here is loss of type safety; one must use array indexes instead of struct field names. Like so: foo[idx][WHERE_W_IS][i] Fourth bad solution: We lay things out as in the third solution, but we cast pointers to effectively lay sparse structs over each other like shingles. { int w; int pad_wx[3]; float x; int pad_xy[3]; char y[4]; int pad_yz[3]; short z[2]; } Performance is hurt by the need for __may_alias__ and of course the result is painful to look at. 
We went with this anyway, using SSE intrinsics, and performance was great. Maintainability... not so much.

BTW, an array of 512 structs containing 4-entry arrays was not used because we wanted to have a simple normal pointer to indicate the item being operated on. We didn't want to need a pointer,index pair.

Can something be done to help out here? The first thing that pops into mind is the ability to tell gcc that the struct-to-struct byte offset for array indexing is a user-specified value instead of simply the struct size.

It's possible we could have safely ignored the warning about aliasing. I don't know. Perhaps that would give even better performance, but the casting would still be very ugly.

Solutions that can be defined away for non-gcc compilers are better.

Do the overlay but use an overlay of type char[large enough] and load from that. Should be more maintainable than using may_alias and also work with other compilers.

Richard.
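Richard's suggestion could look roughly like the sketch below. Everything in it is an assumption for illustration: the struct name, the offsets, and the accessors are hypothetical, and plain memcpy stands in for the SSE intrinsics the original code used. The layout follows the "third bad solution" (4 lanes of w, then 4 of x, 4 of y, 4 of z in one 64-byte line), but the storage is a char array, so loads through memcpy do not need __may_alias__.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { LANES = 4, FIELD_BYTES = 4 };

/* Hypothetical byte offsets of each field group within one 64-byte line. */
enum { OFF_W = 0, OFF_X = 16, OFF_Y = 32, OFF_Z = 48 };

/* One cache line holding 4 logical structs, stored field-major. */
struct line {
    _Alignas(64) unsigned char bytes[64];
};

/* Typed accessors: memcpy to and from the char overlay is alias-safe. */
static int32_t line_get_w(const struct line *l, int lane)
{
    int32_t v;
    assert(lane >= 0 && lane < LANES);
    memcpy(&v, l->bytes + OFF_W + lane * FIELD_BYTES, sizeof v);
    return v;
}

static void line_set_w(struct line *l, int lane, int32_t v)
{
    assert(lane >= 0 && lane < LANES);
    memcpy(l->bytes + OFF_W + lane * FIELD_BYTES, &v, sizeof v);
}

static float line_get_x(const struct line *l, int lane)
{
    float v;
    assert(lane >= 0 && lane < LANES);
    memcpy(&v, l->bytes + OFF_X + lane * FIELD_BYTES, sizeof v);
    return v;
}

static void line_set_x(struct line *l, int lane, float v)
{
    assert(lane >= 0 && lane < LANES);
    memcpy(l->bytes + OFF_X + lane * FIELD_BYTES, &v, sizeof v);
}

/* Compare all four w values against key; return the matching lane or -1.
 * The four values are contiguous, so a compiler is free to use one
 * 16-byte load and a vector compare here. */
static int line_find_w(const struct line *l, int32_t key)
{
    int32_t w[LANES];
    memcpy(w, l->bytes + OFF_W, sizeof w);
    for (int i = 0; i < LANES; i++)
        if (w[i] == key)
            return i;
    return -1;
}
```

The accessors keep field names at the call site (line_get_w, line_get_x) instead of raw array indexes, which addresses part of the type-safety complaint; whether the generated code matches the hand-written intrinsics would need to be measured.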
Re: Fwd: LLVM collaboration?
What about instead of our current odd way of identifying LTO objects simply add a special ELF note telling the linker the plugin to use? .note._linker_plugin '/./libltoplugin.so' that way the linker should try 1) loading that plugin, 2) register the specific object with that plugin. If a full path is undesired (depends on install setup) then specifying the plugin SONAME might also work (we'd of course need to bump our plugins SONAME for each release to allow parallel install of multiple versions or make the plugin contain all the dispatch-to-different-GCC-version-lto-wrapper code). Might be an interesting addition to what we have, but keep in mind that LLVM uses thin non-ELF files. It is also able to load IR from previous versions, so for LLVM at least, using the newest plugin is probably the best default. Richard. Cheers, Rafael
Re: m68k optimisation for beginners?
On 02/12/14 02:37, Fredrik Olsson wrote:

Hi.

I would like to get started with how to improve code generation for a backend. Any pointers, especially to good documentation, are welcome.

For this example consider this C function for a reference counted type:

    void TCRelease(TCTypeRef tc) {
      if (--tc->retainCount == 0) {
        if (tc->destroy) {
          tc->destroy(tc);
        }
        free((void *)tc);
      }
    }

The generated m68k asm is this:

    _TCRelease:
            move.l %a2,-(%sp)
            move.l 8(%sp),%a2
            move.w (%a2),%d0        ; Question 1:
            subq.w #1,%d0
            move.w %d0,(%a2)
            jne .L7
            move.l 4(%a2),%a0       ; Question 2:
            cmp.w #0,%a0
            jeq .L9
            move.l %a2,-(%sp)       ; Question 3:
            jsr (%a0)
            addq.l #4,%sp
    .L9:
            move.l %a2,8(%sp)
            move.l (%sp)+,%a2
            jra _free
    .L7:
            move.l (%sp)+,%a2
            rts

Question 1: This could be done as one instruction, sub.l #1, (%a2); the result in d0 is never used again, and subtracting directly in memory will update the status flags. Would save 4 bytes, and 8 cycles on a 68000. How would I attack this problem? Peephole optimisation, or maybe GCC is not aware that the instruction updates flags?

Most likely an issue in the combiner. Prior to conversion to RTL the decrement is turned into a three-statement format (load from mem, decrement, store back to memory). The decremented value is used in the comparison. So I can reasonably guess the combiner is unable to squash all that back into a single insn.

Also note that flags are effectively not exposed on the m68k. Instead a conditional branch is modeled as two insns: one which sets a special register, cc0, and one that uses the cc0 register. Those two insns are kept consecutive throughout the RTL optimizers, and only during final assembly do we try to eliminate the compare by tracking the state of the flags register.

There are better ways to do that, but nobody has converted the m68k to the newer style. It's a fair amount of work and not a high priority.
Question 2: Doing this as a move.l 4(%a2), %d0 to a temporary data register would update the status register, allowing for the branch without the compare with immediate instruction. Obviously requiring an extra move %d0, %a0 if the branch is not taken to be able to make the jump. But still 2 bytes, and 8 cycles saved in worst case (12 cycles is best case). Is this a peephole optimisation? Or is it about providing accurate instruction costs for inst?

Can't be tackled without first fixing how we track the flags register.

Question 3: Storing a2 on the stack is only ever needed if this code path is taken. Is this even worth bothering with? And is this something that moving from reload to LRA for the m68k target solves?

This is called shrink wrapping. GCC has some limited support for shrink-wrapping these days. Someone would have to look into why the shrink-wrapping optimization did not apply here.

Jeff
Re: Fwd: LLVM collaboration?
On Wed, 12 Feb 2014, Richard Biener wrote: What about instead of our current odd way of identifying LTO objects simply add a special ELF note telling the linker the plugin to use? .note._linker_plugin '/./libltoplugin.so' that way the linker should try 1) loading that plugin, 2) register the specific object with that plugin. Unless this is only allowed for a whitelist of known-good plugins in known-good directories, it's a clear security hole for the linker to execute code in arbitrary files named by linker input. The linker should be safe to run on untrusted input files. -- Joseph S. Myers jos...@codesourcery.com
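As a rough illustration of the whitelist idea, a linker could refuse any plugin path that does not sit directly inside a known-good directory. The function name, its parameters, and the directory used in the usage note are hypothetical; this is not how bfd or gold behave, and a real implementation would also have to deal with symlinks and file ownership.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical check: accept a plugin path only if it names a file placed
 * directly in one of the allowed directories. */
static bool plugin_path_allowed(const char *path,
                                const char *const *dirs, size_t ndirs)
{
    assert(path != NULL);

    /* Conservative: reject any ".." so the path cannot climb out of the
     * directory. This also rejects harmless names such as "a..so". */
    if (strstr(path, "..") != NULL)
        return false;

    for (size_t i = 0; i < ndirs; i++) {
        size_t len = strlen(dirs[i]);

        /* Require "<dir>/<name>" with a non-empty name and no further
         * slash, so prefixes like "<dir>-evil/x.so" do not match. */
        if (strncmp(path, dirs[i], len) == 0 &&
            path[len] == '/' &&
            path[len + 1] != '\0' &&
            strchr(path + len + 1, '/') == NULL)
            return true;
    }
    return false;
}
```

With a single allowed directory such as /usr/lib/bfd-plugins (an assumed location), a plugin inside it is accepted, while /tmp/evil.so and paths containing ".." are rejected.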
Re: Fwd: LLVM collaboration?
On Wed, 12 Feb 2014, Richard Biener wrote:

What about instead of our current odd way of identifying LTO objects simply add a special ELF note telling the linker the plugin to use?

    .note._linker_plugin '/./libltoplugin.so'

That way the linker should try 1) loading that plugin, 2) registering the specific object with that plugin.

Unless this is only allowed for a whitelist of known-good plugins in known-good directories, it's a clear security hole for the linker to execute code in arbitrary files named by linker input. The linker should be safe to run on untrusted input files.

Also, I believe the files should be independent of the particular setup (that is, not contain a path) and probably of the host OS (that is, not have a .so extension), at least. We need some versioning scheme for different versions of compilers. Finally, we need a solution for non-ELF LTO objects (like LLVM). But yes, a compiler-independent way of declaring that a plugin is needed, and which plugin should be used, seems possible.

Honza

-- Joseph S. Myers jos...@codesourcery.com
Re: [RFC][PATCH 0/5] arch: atomic rework
On Tue, Feb 11, 2014 at 10:06:34PM -0800, Torvald Riegel wrote: On Tue, 2014-02-11 at 07:59 -0800, Paul E. McKenney wrote: On Mon, Feb 10, 2014 at 11:09:24AM -0800, Linus Torvalds wrote: On Sun, Feb 9, 2014 at 4:27 PM, Torvald Riegel trie...@redhat.com wrote: Intuitively, this is wrong because this let's the program take a step the abstract machine wouldn't do. This is different to the sequential code that Peter posted because it uses atomics, and thus one can't easily assume that the difference is not observable. Btw, what is the definition of observable for the atomics? Because I'm hoping that it's not the same as for volatiles, where observable is about the virtual machine itself, and as such volatile accesses cannot be combined or optimized at all. Now, I claim that atomic accesses cannot be done speculatively for writes, and not re-done for reads (because the value could change), but *combining* them would be possible and good. For example, we often have multiple independent atomic accesses that could certainly be combined: testing the individual bits of an atomic value with helper functions, causing things like load atomic, test bit, load same atomic, test another bit. The two atomic loads could be done as a single load without possibly changing semantics on a real machine, but if visibility is defined in the same way it is for volatile, that wouldn't be a valid transformation. Right now we use volatile semantics for these kinds of things, and they really can hurt. Same goes for multiple writes (possibly due to setting bits): combining multiple accesses into a single one is generally fine, it's *adding* write accesses speculatively that is broken by design.. At the same time, you can't combine atomic loads or stores infinitely - visibility on a real machine definitely is about timeliness. Removing all but the last write when there are multiple consecutive writes is generally fine, even if you unroll a loop to generate those writes. 
But if what remains is a loop, it might be a busy-loop basically waiting for something, so it would be wrong (untimely) to hoist a store in a loop entirely past the end of the loop, or hoist a load in a loop to before the loop. Does the standard allow for that kind of behavior?

You asked! ;-) So the current standard allows merging of both loads and stores, unless of course ordering constraints prevent the merging. Volatile semantics may be used to prevent this merging, if desired, for example, for real-time code.

Agreed.

Infinite merging is intended to be prohibited, but I am not certain that the current wording is bullet-proof (1.10p24 and 1.10p25).

Yeah, maybe not. But it at least seems to rather clearly indicate the intent ;)

That is my hope. ;-)

The only prohibition against speculative stores that I can see is in a non-normative note, and it can be argued to apply only to things that are not atomics (1.10p22).

I think this one is specifically about speculative stores that would affect memory locations that the abstract machine would not write to, and that might be observable or create data races. While a compiler could potentially prove that such stores aren't leading to a difference in the behavior of the program (e.g., by proving that there are no observers anywhere and this isn't overlapping with any volatile locations), I think that this is hard in general and most compilers will just not do such things. In GCC, bugs in that category were fixed after researchers doing fuzz-testing found them (IIRC, speculative stores by loops).

And that is my fear. ;-)

I don't see any prohibition against reordering a store to precede a load preceding a conditional branch -- which would not be speculative if the branch was known to be taken and the load hit in the store buffer. In a system where stores could be reordered, some other CPU might perceive the store as happening before the load that controlled the conditional branch. This needs to be addressed.
I don't know the specifics of your example, but from how I understand it, I don't see a problem if the compiler can prove that the store will always happen. The current Documentation/memory-barriers.txt formulation requires that both the load and the store have volatile semantics. Does that help? To be more specific, if the compiler can prove that the store will happen anyway, and the region of code can be assumed to always run atomically (e.g., there's no loop or such in there), then it is known that we have one atomic region of code that will always perform the store, so we might as well do the stuff in the region in some order. And it would be very hard to write a program that proved that the store had been reordered prior to the load in this case. Now, if any of the memory accesses are atomic, then the
Re: [RFC][PATCH 0/5] arch: atomic rework
On Wed, Feb 12, 2014 at 10:19:07AM +0100, Peter Zijlstra wrote:

I don't know the specifics of your example, but from how I understand it, I don't see a problem if the compiler can prove that the store will always happen. To be more specific, if the compiler can prove that the store will happen anyway, and the region of code can be assumed to always run atomically (e.g., there's no loop or such in there), then it is known that we have one atomic region of code that will always perform the store, so we might as well do the stuff in the region in some order. Now, if any of the memory accesses are atomic, then the whole region of code containing those accesses is often not atomic because other threads might observe intermediate results in a data-race-free way. (I know that this isn't a very precise formulation, but I hope it brings my line of reasoning across.)

So given something like:

    if (x)
        y = 3;

assuming both x and y are atomic (so don't gimme crap for not knowing the C11 atomic incantations), and you can prove x is always true, you don't see a problem with not emitting the conditional?

You need volatile semantics to force the compiler to ignore any proofs it might otherwise attempt to construct. Hence all the ACCESS_ONCE() calls in my email to Torvald. (Hopefully I translated your example reasonably.)

Thanx, Paul

Avoiding the conditional changes the result; see that control dependency email from earlier. In the above example the load of X and the store to Y are strictly ordered, due to control dependencies. Not emitting the condition, and maybe not even emitting the load, completely wrecks this. It's therefore an invalid optimization to take out the conditional or speculate the store, since it takes out the dependency.
Re: [RFC][PATCH 0/5] arch: atomic rework
On Wed, Feb 12, 2014 at 09:42:09AM -0800, Paul E. McKenney wrote:

You need volatile semantics to force the compiler to ignore any proofs it might otherwise attempt to construct. Hence all the ACCESS_ONCE() calls in my email to Torvald. (Hopefully I translated your example reasonably.)

My brain gave out for today, but it did appear to have the right structure. I would prefer it if C11 did not require the volatile casts. It should simply _never_ speculate with atomic writes, volatile or not.
Re: [RFC][PATCH 0/5] arch: atomic rework
On Tue, Feb 11, 2014 at 09:39:24PM -0800, Torvald Riegel wrote: On Mon, 2014-02-10 at 11:09 -0800, Linus Torvalds wrote: On Sun, Feb 9, 2014 at 4:27 PM, Torvald Riegel trie...@redhat.com wrote: Intuitively, this is wrong because this let's the program take a step the abstract machine wouldn't do. This is different to the sequential code that Peter posted because it uses atomics, and thus one can't easily assume that the difference is not observable. Btw, what is the definition of observable for the atomics? Because I'm hoping that it's not the same as for volatiles, where observable is about the virtual machine itself, and as such volatile accesses cannot be combined or optimized at all. No, atomics aren't an observable behavior of the abstract machine (unless they are volatile). See 1.8.p8 (citing the C++ standard). Us Linux-kernel hackers will often need to use volatile semantics in combination with C11 atomics in most cases. The C11 atomics do cover some of the reasons we currently use ACCESS_ONCE(), but not all of them -- in particular, it allows load/store merging. Now, I claim that atomic accesses cannot be done speculatively for writes, and not re-done for reads (because the value could change), Agreed, unless the compiler can prove that this doesn't make a difference in the program at hand and it's not volatile atomics. In general, that will be hard and thus won't happen often I suppose, but if correctly proved it would fall under the as-if rule I think. but *combining* them would be possible and good. Agreed. In some cases, agreed. But many uses in the Linux kernel will need volatile semantics in combination with C11 atomics. Which is OK, for the foreseeable future, anyway. For example, we often have multiple independent atomic accesses that could certainly be combined: testing the individual bits of an atomic value with helper functions, causing things like load atomic, test bit, load same atomic, test another bit. 
The two atomic loads could be done as a single load without possibly changing semantics on a real machine, but if visibility is defined in the same way it is for volatile, that wouldn't be a valid transformation. Right now we use volatile semantics for these kinds of things, and they really can hurt. Agreed. In your example, the compiler would have to prove that the abstract machine would always be able to run the two loads atomically (ie, as one load) without running into impossible/disallowed behavior of the program. But if there's no loop or branch or such in-between, this should be straight-forward because any hardware oddity or similar could merge those loads and it wouldn't be disallowed by the standard (considering that we're talking about a finite number of loads), so the compiler would be allowed to do it as well. As long as they are not marked volatile, agreed. Thanx, Paul Same goes for multiple writes (possibly due to setting bits): combining multiple accesses into a single one is generally fine, it's *adding* write accesses speculatively that is broken by design.. Agreed. As Paul points out, this being correct assumes that there are no other ordering guarantees or memory accesses interfering, but if the stores are to the same memory location and adjacent to each other in the program, then I don't see a reason why they wouldn't be combinable. At the same time, you can't combine atomic loads or stores infinitely - visibility on a real machine definitely is about timeliness. Removing all but the last write when there are multiple consecutive writes is generally fine, even if you unroll a loop to generate those writes. But if what remains is a loop, it might be a busy-loop basically waiting for something, so it would be wrong (untimely) to hoist a store in a loop entirely past the end of the loop, or hoist a load in a loop to before the loop. Agreed. 
That's what 1.10p24 and 1.10p25 are meant to specify for loads, although those might not be bullet-proof as Paul points out. Forward progress is rather vaguely specified in the standard, but at least parts of the committee (and people in ISO C++ SG1, in particular) are working on trying to improve this. Does the standard allow for that kind of behavior? I think the standard requires (or intends to require) the behavior that you (and I) seem to prefer in these examples.
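The load-merging case discussed above can be written out as a small sketch. The flag names, the helper, and the use of memory_order_relaxed are assumptions for illustration. The first function is the source-level pattern (a helper called once per bit, each call doing its own atomic load); the second is what a compiler would effectively produce if it merged the two loads into one.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define FLAG_A 0x1u   /* hypothetical bit names */
#define FLAG_B 0x2u

/* Helper in the style described: every call performs its own atomic load. */
static bool test_flag(atomic_uint *flags, unsigned mask)
{
    assert(mask != 0);
    return (atomic_load_explicit(flags, memory_order_relaxed) & mask) != 0;
}

/* As written in the source: load, test bit, load again, test another bit. */
static bool both_set_two_loads(atomic_uint *flags)
{
    return test_flag(flags, FLAG_A) && test_flag(flags, FLAG_B);
}

/* After merging: a single load feeds both bit tests. */
static bool both_set_one_load(atomic_uint *flags)
{
    unsigned v = atomic_load_explicit(flags, memory_order_relaxed);
    return (v & FLAG_A) != 0 && (v & FLAG_B) != 0;
}
```

Any result of the one-load version is also a possible result of the two-load version (both loads could have observed the same value), which is the informal argument for why the merge is allowed for non-volatile atomics. With volatile accesses, as with ACCESS_ONCE(), the two loads would have to stay separate.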
Re: [RFC][PATCH 0/5] arch: atomic rework
On Tue, Feb 11, 2014 at 09:13:34PM -0800, Torvald Riegel wrote: On Sun, 2014-02-09 at 19:51 -0800, Paul E. McKenney wrote: On Mon, Feb 10, 2014 at 01:06:48AM +0100, Torvald Riegel wrote: On Thu, 2014-02-06 at 20:20 -0800, Paul E. McKenney wrote: On Fri, Feb 07, 2014 at 12:44:48AM +0100, Torvald Riegel wrote: On Thu, 2014-02-06 at 14:11 -0800, Paul E. McKenney wrote: On Thu, Feb 06, 2014 at 10:17:03PM +0100, Torvald Riegel wrote: On Thu, 2014-02-06 at 11:27 -0800, Paul E. McKenney wrote: On Thu, Feb 06, 2014 at 06:59:10PM +, Will Deacon wrote: There are also so many ways to blow your head off it's untrue. For example, cmpxchg takes a separate memory model parameter for failure and success, but then there are restrictions on the sets you can use for each. It's not hard to find well-known memory-ordering experts shouting Just use memory_model_seq_cst for everything, it's too hard otherwise. Then there's the fun of load-consume vs load-acquire (arm64 GCC completely ignores consume atm and optimises all of the data dependencies away) as well as the definition of data races, which seem to be used as an excuse to miscompile a program at the earliest opportunity. Trust me, rcu_dereference() is not going to be defined in terms of memory_order_consume until the compilers implement it both correctly and efficiently. They are not there yet, and there is currently no shortage of compiler writers who would prefer to ignore memory_order_consume. Do you have any input on http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59448? In particular, the language standard's definition of dependencies? Let's see... 
1.10p9 says that a dependency must be carried unless:

- B is an invocation of any specialization of std::kill_dependency (29.3), or
- A is the left operand of a built-in logical AND (&&, see 5.14) or logical OR (||, see 5.15) operator, or
- A is the left operand of a conditional (?:, see 5.16) operator, or
- A is the left operand of the built-in comma (,) operator (5.18);

So the use of flag before the ? is ignored. But the flag - flag after the ? will carry a dependency, so the code fragment in 59448 needs to do the ordering rather than just optimizing flag - flag out of existence. One way to do that on both ARM and Power is to actually emit code for flag - flag, but there are a number of other ways to make that work.

And that's what would concern me, considering that these requirements seem to be able to creep out easily. Also, whereas the other atomics just constrain compilers wrt. reordering across atomic accesses or changes to the atomic accesses themselves, the dependencies are new requirements on pieces of otherwise non-synchronizing code. The latter seems far more involved to me.

Well, the wording of 1.10p9 is pretty explicit on this point. There are only a few exceptions to the rule that dependencies from memory_order_consume loads must be tracked. And to your point about requirements being placed on pieces of otherwise non-synchronizing code, we already have that with plain old load acquire and store release -- both of these put ordering constraints that affect the surrounding non-synchronizing code.

I think there's a significant difference. With acquire/release or more general memory orders, it's true that we can't order _across_ the atomic access. However, we can reorder and optimize without additional constraints if we do not reorder. This is not the case with consume memory order, as the (p + flag - flag) example shows.

Agreed, memory_order_consume does introduce additional restrictions.
This issue got a lot of discussion, and the compromise is that dependencies cannot leak into or out of functions unless the relevant parameters or return values are annotated with [[carries_dependency]]. This means that the compiler can see all the places where dependencies must be tracked. This is described in 7.6.4. I wasn't aware of 7.6.4 (but it isn't referred to as an additional constraint--what it is--in 1.10, so I guess at least that should be fixed). Also, AFAIU, 7.6.4p3 is wrong in that the attribute does make a semantic difference, at least if one is assuming that normal optimization of sequential code is the default, and that maintaining things such as (flag-flag) is not; if optimizing away (flag-flag) would require the insertion of fences unless there is the carries_dependency attribute, then this would be bad I think. No, the
Re: [LLVMdev] Zero-cost toolchain standardization process
On Feb 11, 2014, at 10:59 AM, Renato Golin renato.go...@linaro.org wrote:

Hi Folks, First of all, I'd like to thank everyone for their great responses and heart warming encouragement for such an enterprise. This will be my last email about this subject on these lists, so I'd like to just let everyone know what (and where) I'll be heading next with this topic. Feel free to reply to me personally, I don't want to spawn an ugly two-list thread.

Renato, thank you for spearheading this, but please do not cross post to both lists like this. Among other problems it is a severe pain for moderation.

I’m a fan of your goals, but I’d like to point out that we have already solved this problem in various ways. For example, C++ ABI issues are dealt with quite well across GCC, LLVM, and many other compilers on the “itanium” ABI mailing list. It’s a great example of a list hosted in a “neutral” place that many compiler vendors are on, including commercial ones.

Why don’t you just set up a few similar mailing lists to cover related topics (toolchain topics, language extensions, etc) and encourage the right people to join them? I feel like you’re turning a simple problem into a complex one.

-Chris
Re: [RFC][PATCH 0/5] arch: atomic rework
On Wed, Feb 12, 2014 at 10:07 AM, Paul E. McKenney paul...@linux.vnet.ibm.com wrote:

Us Linux-kernel hackers will often need to use volatile semantics in combination with C11 atomics in most cases. The C11 atomics do cover some of the reasons we currently use ACCESS_ONCE(), but not all of them -- in particular, it allows load/store merging.

I really disagree with the "will need to use volatile".

We should never need to use volatile (outside of whatever MMIO we do using C) if C11 defines atomics correctly.

Allowing load/store merging is *fine*. All sane CPU's do that anyway - it's called a cache - and there's no actual reason to think that ACCESS_ONCE() has to mean our current volatile.

Now, it's possible that the C standards simply get atomics _wrong_, so that they create visible semantics that are different from what a CPU cache already does, but that's a plain bug in the standard if so.

But merging loads and stores is fine. And I *guarantee* it is fine, exactly because CPU's already do it, so claiming that the compiler couldn't do it is just insanity.

Now, there are things that are *not* fine, like speculative stores that could be visible to other threads. Those are *bugs* (either in the compiler or in the standard), and anybody who claims otherwise is not worth discussing with.

But I really really disagree with the "we might have to use 'volatile'". Because if we *ever* have to use 'volatile' with the standard C atomic types, then we're just better off ignoring the atomic types entirely, because they are obviously broken shit - and we're better off doing it ourselves the way we have forever.

Seriously. This is not even hyperbole. It really is as simple as that.

Linus
Re: [RFC][PATCH 0/5] arch: atomic rework
On Wed, Feb 12, 2014 at 12:22:53PM -0800, Linus Torvalds wrote:

On Wed, Feb 12, 2014 at 10:07 AM, Paul E. McKenney paul...@linux.vnet.ibm.com wrote:

Us Linux-kernel hackers will often need to use volatile semantics in combination with C11 atomics in most cases. The C11 atomics do cover some of the reasons we currently use ACCESS_ONCE(), but not all of them -- in particular, it allows load/store merging.

I really disagree with the "will need to use volatile".

We should never need to use volatile (outside of whatever MMIO we do using C) if C11 defines atomics correctly.

Allowing load/store merging is *fine*. All sane CPU's do that anyway - it's called a cache - and there's no actual reason to think that ACCESS_ONCE() has to mean our current volatile.

Now, it's possible that the C standards simply get atomics _wrong_, so that they create visible semantics that are different from what a CPU cache already does, but that's a plain bug in the standard if so.

But merging loads and stores is fine. And I *guarantee* it is fine, exactly because CPU's already do it, so claiming that the compiler couldn't do it is just insanity.

Agreed, both CPUs and compilers can merge loads and stores. But CPUs normally get their stores pushed through the store buffer in reasonable time, and CPUs also use things like invalidations to ensure that a store is seen in reasonable time by readers. Compilers don't always have these two properties, so we do need to be more careful of load and store merging by compilers.

Now, there are things that are *not* fine, like speculative stores that could be visible to other threads. Those are *bugs* (either in the compiler or in the standard), and anybody who claims otherwise is not worth discussing with.

And as near as I can tell, volatile semantics are required in C11 to avoid speculative stores. I might be wrong about this, and hope that I am wrong. But I am currently not seeing it in the current standard. (Though I expect that most compilers would avoid speculating stores, especially in the near term.)

But I really really disagree with the "we might have to use 'volatile'". Because if we *ever* have to use 'volatile' with the standard C atomic types, then we're just better off ignoring the atomic types entirely, because they are obviously broken shit - and we're better off doing it ourselves the way we have forever.

Seriously. This is not even hyperbole. It really is as simple as that.

Agreed, if we are talking about replacing ACCESS_ONCE() with C11 relaxed atomics any time soon. But someone porting Linux to a new CPU architecture might use a carefully chosen subset of C11 atomics to implement some of the Linux atomic operations, especially non-value-returning atomics such as atomic_inc().

Thanx, Paul
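To make the load-merging point concrete, here is a small sketch contrasting a volatile-cast access with a C11 relaxed atomic load. The macro name `ACCESS_ONCE_SKETCH` and both helper functions are hypothetical, and the macro only illustrates "volatile semantics"; it is not claimed to be the kernel's exact definition. `__typeof__` is a GNU extension, so this assumes GCC or a compatible compiler.

```c
#include <assert.h>
#include <stdatomic.h>

/* Illustrative volatile-cast idiom (assumption: GNU __typeof__ available). */
#define ACCESS_ONCE_SKETCH(x) (*(volatile __typeof__(x) *)&(x))

static int plain_counter;
static atomic_int atomic_counter;

/* Two volatile reads: each access must be performed as written. */
static int sum_volatile(void)
{
    int a = ACCESS_ONCE_SKETCH(plain_counter);
    int b = ACCESS_ONCE_SKETCH(plain_counter);
    return a + b;
}

/* Two relaxed atomic reads: the thread's question is whether the standard
 * lets a compiler fold these into one load and double the result. */
static int sum_relaxed(void)
{
    int a = atomic_load_explicit(&atomic_counter, memory_order_relaxed);
    int b = atomic_load_explicit(&atomic_counter, memory_order_relaxed);
    return a + b;
}
```

In a single thread both functions return the same value; the disagreement above is only about what another thread may observe and how promptly.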
Aarch64 implementation for dwarf exception handling
Hi,

I have a question about the implementation of aarch64_final_eh_return_addr, which is used to locate the return address of the frame.

According to the source code, if FP is not needed:

    return gen_frame_mem (DImode,
                          plus_constant (Pmode, stack_pointer_rtx,
                                         fp_offset
                                         + cfun->machine->frame.saved_regs_size
                                         - 2 * UNITS_PER_WORD));

According to the frame layout:

        +-------------------------------+ <-- arg_pointer_rtx
        |                               |
        |  callee-allocated save area   |
        |  for register varargs         |
        |                               |
        +-------------------------------+
        |                               |
        |  local variables              |
        |                               |
        +-------------------------------+ <-- frame_pointer_rtx
        |                               |
        |  callee-saved registers       |
        |                               |
        +-------------------------------+
        |  LR'                          |
        +-------------------------------+
        |  FP'                          |
      P +-------------------------------+ <-- hard_frame_pointer_rtx
        |  dynamic allocation           |
        +-------------------------------+
        |                               |
        |  outgoing stack arguments     |
        |                               |
        +-------------------------------+ <-- stack_pointer_rtx

Shouldn't the return value be:

    return gen_frame_mem (DImode,
                          plus_constant (Pmode, stack_pointer_rtx,
                                         fp_offset + 2 * UNITS_PER_WORD));

Or am I just misunderstanding something? I hope someone can give me a tip. It would be very helpful.

Thanks,
Shiva Chen
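The two candidate expressions can be compared numerically with a short sketch. Everything here is an assumption for illustration: `UNITS_PER_WORD` is taken to be 8 (64-bit AArch64), the function names are hypothetical, and the sample values are made up. The sketch does not say which expression is right; it only shows when the two differ.

```c
#include <assert.h>

enum { UNITS_PER_WORD = 8 };   /* assumption: 8-byte words */

/* Offset from the stack pointer as computed in the quoted source. */
static long offset_in_source(long fp_offset, long saved_regs_size)
{
    return fp_offset + saved_regs_size - 2 * UNITS_PER_WORD;
}

/* Offset from the stack pointer as proposed in the question. */
static long offset_proposed(long fp_offset)
{
    return fp_offset + 2 * UNITS_PER_WORD;
}
```

With these definitions the two offsets coincide only when saved_regs_size equals 4 * UNITS_PER_WORD; for any other size of the register save area they name different slots.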
Dead code elimination PROBLEM
Hi all,

I developed a plugin that produces the following GIMPLE:

    test ()
    {
      int selected_fnc_var_.3;
      int random_Var.2;
      int D.2363;
      int _1;

      <bb 2>:
      random_Var.2_2 = rand ();
      selected_fnc_var_.3_3 = random_Var.2_2 %[fl] 5;
      if (selected_fnc_var_.3_3 == 4) goto <L7>;
      if (selected_fnc_var_.3_3 == 3) goto <L6>;
      if (selected_fnc_var_.3_3 == 2) goto <L5>;
      if (selected_fnc_var_.3_3 == 1) goto <L4>;
      if (selected_fnc_var_.3_3 == 0) goto <L3>;
      <L7>: _1 = f.clone.4 (t, t); goto <L8>;
      <L6>: _1 = f.clone.3 (t, t); goto <L8>;
      <L5>: _1 = f.clone.2 (t, t); goto <L8>;
      <L4>: _1 = f.clone.1 (t, t); goto <L8>;
      <L8>:
      if (_1 != 0) goto <bb 3>; else goto <bb 4>;

      <bb 3>:
      __builtin_puts (&"f success"[0]);
      goto <bb 5>;

      <bb 4>:
      __builtin_puts (&"f failed"[0]);

      <bb 5>:
      return;
    }

with this final code:

    004005c6 <test>:
      4005c6:  55                push   %rbp
      4005c7:  48 89 e5          mov    %rsp,%rbp
      4005ca:  53                push   %rbx
      4005cb:  48 83 ec 08       sub    $0x8,%rsp
      4005cf:  e8 6c fe ff ff    callq  400440 <rand@plt>
      4005d4:  89 d9             mov    %ebx,%ecx
      4005d6:  c1 f9 1f          sar    $0x1f,%ecx
      4005d9:  89 d8             mov    %ebx,%eax
      4005db:  31 c8             xor    %ecx,%eax
      4005dd:  ba 67 66 66 66    mov    $0x66666667,%edx
      4005e2:  f7 e2             mul    %edx
      4005e4:  89 d0             mov    %edx,%eax
      4005e6:  d1 e8             shr    %eax
      4005e8:  31 c8             xor    %ecx,%eax
      4005ea:  89 c2             mov    %eax,%edx
      4005ec:  c1 e2 02          shl    $0x2,%edx
      4005ef:  01 c2             add    %eax,%edx
      4005f1:  89 d8             mov    %ebx,%eax
      4005f3:  29 d0             sub    %edx,%eax
      4005f5:  83 f8 04          cmp    $0x4,%eax
      4005f8:  75 32             jne    40062c <test+0x66>
      4005fa:  83 f8 03          cmp    $0x3,%eax
      4005fd:  74 2d             je     40062c <test+0x66>
      4005ff:  83 f8 02          cmp    $0x2,%eax
      400602:  74 28             je     40062c <test+0x66>
      400604:  83 f8 01          cmp    $0x1,%eax
      400607:  74 23             je     40062c <test+0x66>
      400609:  85 c0             test   %eax,%eax
      40060b:  74 1f             je     40062c <test+0x66>
      40060d:  be bc 09 40 00    mov    $0x4009bc,%esi
      400612:  bf c6 09 40 00    mov    $0x4009c6,%edi
      400617:  e8 7d 02 00 00    callq  400899 <f.clone.4>
      40061c:  85 c0             test   %eax,%eax
      40061e:  75 0c             jne    40062c <test+0x66>
      400620:  bf d0 09 40 00    mov    $0x4009d0,%edi
      400625:  e8 e6 fd ff ff    callq  400410 <puts@plt>
      40062a:  eb 0a             jmp    400636 <test+0x70>
      40062c:  bf e8 09 40 00    mov    $0x4009e8,%edi
      400631:  e8 da fd ff ff    callq  400410 <puts@plt>
      400636:  48 83 c4 08       add    $0x8,%rsp
      40063a:  5b                pop    %rbx
      40063b:  5d                pop    %rbp
      40063c:  c3                retq

from this GIMPLE:

    test ()
    {
      int D.2363;
      int _1;

      <bb 2>:
      _1 = f (t, t);
      if (_1 != 0) goto <bb 3>; else goto <bb 4>;

      <bb 3>:
      __builtin_puts (&"f"[0]);
      goto <bb 5>;

      <bb 4>:
      __builtin_puts (&"f"[0]);

      <bb 5>:
      return;
    }

As you can see in the disassembly, the code only makes a call to f.clone.4 (callq 400899 <f.clone.4>). I suppose the dead code elimination pass is responsible for this. I tried to disable it with the -O0 compilation option, but without success.

My question is: how can I make the compiler produce the final code without deleting those supposedly dead portions? Do I need to create some kind of PHI nodes at the labels to achieve that, and if so, how could I do that?

Thanks in advance
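For comparison, the dispatch the plugin appears to be aiming for can be written at source level as below. This is only a sketch under assumptions: the `f_clone_N` functions are hypothetical stand-ins for f.clone.0 through f.clone.4 (here each simply returns its own index), and `pick_clone` models the `rand () % 5` selection. Each case leaves the switch explicitly, so every clone call is reachable through its own edge, whereas the posted GIMPLE has conditional gotos with no else target.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the cloned functions; each returns its index. */
static int f_clone_0(const char *a, const char *b) { (void)a; (void)b; return 0; }
static int f_clone_1(const char *a, const char *b) { (void)a; (void)b; return 1; }
static int f_clone_2(const char *a, const char *b) { (void)a; (void)b; return 2; }
static int f_clone_3(const char *a, const char *b) { (void)a; (void)b; return 3; }
static int f_clone_4(const char *a, const char *b) { (void)a; (void)b; return 4; }

/* Select one clone; every case returns, so no path falls into another. */
static int dispatch(int selected, const char *a, const char *b)
{
    switch (selected) {
    case 4:  return f_clone_4(a, b);
    case 3:  return f_clone_3(a, b);
    case 2:  return f_clone_2(a, b);
    case 1:  return f_clone_1(a, b);
    default: return f_clone_0(a, b);
    }
}

/* rand() is non-negative, so the result is always in the range 0..4. */
static int pick_clone(void)
{
    return rand() % 5;
}
```

Compiling such a function and dumping its GIMPLE (for example with -fdump-tree-all) is one way to see what a well-formed control-flow graph for this dispatch looks like before trying to build it by hand in a plugin.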
[Bug c/60156] New: GCC doesn't warn about variadic main
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60156 Bug ID: 60156 Summary: GCC doesn't warn about variadic main Product: gcc Version: 4.9.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c Assignee: unassigned at gcc dot gnu.org Reporter: mpolacek at gcc dot gnu.org E.g. on int main (int argc, char *argv[], ...) { } with -Wpedantic.
[Bug c/60156] GCC doesn't warn about variadic main
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60156

Marek Polacek mpolacek at gcc dot gnu.org changed:

               What|Removed                      |Added
           Keywords|                             |diagnostic
             Status|UNCONFIRMED                  |ASSIGNED
   Last reconfirmed|                             |2014-02-12
           Assignee|unassigned at gcc dot gnu.org|mpolacek at gcc dot gnu.org
     Ever confirmed|0                            |1

--- Comment #1 from Marek Polacek mpolacek at gcc dot gnu.org ---
I have a patch for 5.0.
[Bug c++/60047] [4.7/4.8/4.9 Regression] ICE with defaulted copy constructor and virtual base class
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60047

--- Comment #3 from paolo at gcc dot gnu.org paolo at gcc dot gnu.org ---
Author: paolo
Date: Wed Feb 12 08:45:46 2014
New Revision: 207712

URL: http://gcc.gnu.org/viewcvs?rev=207712&root=gcc&view=rev
Log:
/cp
2014-02-12  Paolo Carlini  paolo.carl...@oracle.com

    PR c++/60047
    * method.c (implicitly_declare_fn): A constructor of a class with
    virtual base classes isn't constexpr (7.1.5p4).

/testsuite
2014-02-12  Paolo Carlini  paolo.carl...@oracle.com

    PR c++/60047
    * g++.dg/cpp0x/pr60047.C: New.

Added:
    trunk/gcc/testsuite/g++.dg/cpp0x/pr60047.C
Modified:
    trunk/gcc/cp/ChangeLog
    trunk/gcc/cp/method.c
    trunk/gcc/testsuite/ChangeLog
[Bug c++/60047] [4.7/4.8 Regression] ICE with defaulted copy constructor and virtual base class
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60047

Paolo Carlini paolo.carlini at oracle dot com changed:

               What|Removed                        |Added
             Status|ASSIGNED                       |RESOLVED
                 CC|jason at gcc dot gnu.org       |
         Resolution|---                            |FIXED
           Assignee|paolo.carlini at oracle dot com|unassigned at gcc dot gnu.org
   Target Milestone|---                            |4.9.0
            Summary|[4.7/4.8/4.9 Regression]       |[4.7/4.8 Regression] ICE
                   |ICE with defaulted copy        |with defaulted copy
                   |constructor and virtual        |constructor and virtual
                   |base class                     |base class

--- Comment #4 from Paolo Carlini paolo.carlini at oracle dot com ---
Fixed for 4.9.0. Note that there is no ICE in release mode anyway.
[Bug target/60157] New: adding -mstrict-align for i386 and x86_64 architecture
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60157

            Bug ID: 60157
           Summary: adding -mstrict-align for i386 and x86_64 architecture
           Product: gcc
           Version: 4.4.6
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: vinxxe at gmail dot com

Created attachment 32113
  -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=32113&action=edit
source code to reproduce the problem

The pthread_cond_wait nptl function enters an infinite loop, never suspending the calling thread, if the address of the condition variable is misaligned. Now, this is not a gcc bug, obviously, but my question is: does it make sense to add the target option -mstrict-align to the i386 and x86_64 architectures, so that this kind of problem can be detected at compilation time?

Attached you will find a source code example to reproduce the problem. Execute the program with

    strace -f exec_name

to see a never-ending series of

    [pid 2922] futex(0x80499fd, FUTEX_WAIT_PRIVATE, 1, NULL) = -1 EINVAL (Invalid argument)

Here follows some info about my Linux machine.

cat /proc/version
    Linux version 2.6.32-220.7.1.el6.centos.plus.i686 (root@thalix11dev) (gcc version 4.4.6 20110731 (Red Hat 4.4.6-3) (GCC) ) #1 SMP Mon Oct 21 07:05:28 UTC 2013

rpm -qa | grep glibc
    glibc-devel-2.12-1.47.i686
    glibc-common-2.12-1.47.i686
    glibc-2.12-1.47.i686
    glibc-debuginfo-2.12-1.47.i686
    glibc-headers-2.12-1.47.i686
    glibc-utils-2.12-1.47.i686
    glibc-debuginfo-common-2.12-1.47.i686
    glibc-static-2.12-1.47.i686

cat /proc/cpuinfo
    processor: 0
    vendor_id: GenuineIntel
    cpu family: 15
    model: 3
    model name: Intel(R) Pentium(R) 4 CPU 2.80GHz
    stepping: 4
    cpu MHz: 2799.930
    cache size: 1024 KB
    fdiv_bug: no
    hlt_bug: no
    f00f_bug: no
    coma_bug: no
    fpu: yes
    fpu_exception: yes
    cpuid level: 5
    wp: yes
    flags: fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe constant_tsc up pebs bts pni dtes64 monitor ds_cpl cid xtpr
    bogomips: 5586.31
    clflush size: 64
    cache_alignment: 128
    address sizes: 36 bits physical, 32 bits virtual
    power management:

rpm -qa | grep gcc
    gcc-4.4.6-3.el6.i686
    libgcc-4.4.6-3.el6.i686
    gcc-c++-4.4.6-3.el6.i686
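The futex address in the strace output, 0x80499fd, is odd, which is consistent with the misalignment described in the report. A run-time check along the following lines could catch such an address before it reaches the kernel. This is a minimal sketch: the helper name `addr_is_aligned` is hypothetical, and the 4-byte requirement used in the usage note is an assumption about the futex word, not something stated in the report.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return nonzero if addr is a multiple of align.
 * align must be a nonzero power of two. */
static int addr_is_aligned(uintptr_t addr, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr & (align - 1)) == 0;
}
```

For instance, addr_is_aligned(0x80499fd, 4) is false, while the neighbouring address 0x80499fc passes the same check.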
[Bug rtl-optimization/60116] [4.8/4.9 Regression] wrong code at -Os on x86_64-linux-gnu in 32-bit mode
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60116

--- Comment #16 from Eric Botcazou ebotcazou at gcc dot gnu.org ---
Author: ebotcazou
Date: Wed Feb 12 08:49:55 2014
New Revision: 207713

URL: http://gcc.gnu.org/viewcvs?rev=207713&root=gcc&view=rev
Log:
    PR rtl-optimization/60116
    * combine.c (try_combine): Also remove dangling REG_DEAD notes on the
    other_insn once the combination has been validated.

Added:
    trunk/gcc/testsuite/gcc.c-torture/execute/20140212-1.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/combine.c
    trunk/gcc/testsuite/ChangeLog
[Bug target/60157] adding -mstrict-align for i386 and x86_64 architecture
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60157

vinxxe at gmail dot com changed:

               What|Removed                     |Added
           Severity|normal                      |enhancement
[Bug rtl-optimization/60116] [4.8/4.9 Regression] wrong code at -Os on x86_64-linux-gnu in 32-bit mode
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60116

--- Comment #17 from Eric Botcazou ebotcazou at gcc dot gnu.org ---
Author: ebotcazou
Date: Wed Feb 12 08:51:57 2014
New Revision: 207714

URL: http://gcc.gnu.org/viewcvs?rev=207714&root=gcc&view=rev
Log:
    PR rtl-optimization/60116
    * combine.c (try_combine): Also remove dangling REG_DEAD notes on the
    other_insn once the combination has been validated.

Added:
    branches/gcc-4_8-branch/gcc/testsuite/gcc.c-torture/execute/20140212-1.c
      - copied unchanged from r207713, trunk/gcc/testsuite/gcc.c-torture/execute/20140212-1.c
Modified:
    branches/gcc-4_8-branch/gcc/ChangeLog
    branches/gcc-4_8-branch/gcc/combine.c
    branches/gcc-4_8-branch/gcc/testsuite/ChangeLog
[Bug rtl-optimization/60116] [4.8/4.9 Regression] wrong code at -Os on x86_64-linux-gnu in 32-bit mode
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60116

Eric Botcazou ebotcazou at gcc dot gnu.org changed:

               What|Removed                     |Added
             Status|ASSIGNED                    |RESOLVED
         Resolution|---                         |FIXED

--- Comment #18 from Eric Botcazou ebotcazou at gcc dot gnu.org ---
Thanks for reporting the problem.
[Bug fortran/60060] [4.9 Regression] lto1: internal compiler error: in add_AT_specification, at dwarf2out.c:4096
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60060

--- Comment #9 from Richard Biener rguenth at gcc dot gnu.org ---
Author: rguenth
Date: Wed Feb 12 09:01:30 2014
New Revision: 207715

URL: http://gcc.gnu.org/viewcvs?rev=207715&root=gcc&view=rev
Log:
2014-02-12  Richard Biener  rguent...@suse.de

    PR lto/60060
    * lto-lang.c (lto_write_globals): Do not call
    wrapup_global_declarations or emit_debug_global_declarations
    but emit debug info for non-function scope variables directly.

Modified:
    trunk/gcc/lto/ChangeLog
    trunk/gcc/lto/lto-lang.c
[Bug fortran/49636] [F03] ASSOCIATE construct confused with slightly complicated case
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49636 --- Comment #6 from paul.richard.thomas at gmail dot com paul.richard.thomas at gmail dot com --- Dear Dominique, Thanks for the heads-up about -m32 - I thought that the code would be immune to word length changes ***sigh*** Cheers Paul On 12 February 2014 00:40, dominiq at lps dot ens.fr gcc-bugzi...@gcc.gnu.org wrote: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49636 --- Comment #5 from Dominique d'Humieres dominiq at lps dot ens.fr --- Created attachment 32098 [details] A fix for this problem AFAICT it fixes the problem for 64 bit mode only. In 32 bit mode the ICE is gone, but I get at run time i_good= 1 3 5 i_bad= 1** 3 I am sure that this trick will fix pr57019 too. This latter is claimed to be a regression but I am sure that it never worked :-) Nonetheless, I will take advantage of the regression label! I will work on it tomorrow night. By the way, this patch regtests OK on trunk. I have to make sure that substrings of character arrays work OK with ASSOCIATE. Did you regtest with -m32? I see gfortran.dg/associated_target_5.f03 failing at execution time with -m32, as well as the first test in pr57522 0 1 2 3 0 4 1 5 -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug.
[Bug debug/60152] [4.9 Regression] multiple AT_calling_convention attributes generated after r205679
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60152

Richard Biener rguenth at gcc dot gnu.org changed:

               What|Removed                     |Added
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2014-02-12
                 CC|rguenth at gcc dot gnu.org  |
   Target Milestone|---                         |4.9.0
            Summary|[4.9.0 Regression] multiple |[4.9 Regression] multiple
                   |AT_calling_convention       |AT_calling_convention
                   |attributes generated after  |attributes generated after
                   |r205679                     |r205679
     Ever confirmed|0                           |1

--- Comment #1 from Richard Biener rguenth at gcc dot gnu.org ---
Confirmed.
[Bug fortran/60060] [4.9 Regression] lto1: internal compiler error: in add_AT_specification, at dwarf2out.c:4096
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60060

Richard Biener rguenth at gcc dot gnu.org changed:

               What|Removed                     |Added
             Status|ASSIGNED                    |RESOLVED
         Resolution|---                         |FIXED

--- Comment #10 from Richard Biener rguenth at gcc dot gnu.org ---
Fixed.
[Bug middle-end/60092] posix_memalign not recognized to derive alias and alignment info
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60092

Tobias Burnus burnus at gcc dot gnu.org changed:

               What|Removed                     |Added
                 CC|                            |burnus at gcc dot gnu.org

--- Comment #18 from Tobias Burnus burnus at gcc dot gnu.org ---
(In reply to Richard Biener from comment #1)

We could lower

    posix_memalign (&ptr, align, size);

to

    posix_memalign (&ptr, align, size);
    ptr = __builtin_assume_aligned (ptr, align);

and hope for FRE to fix things up enough to make that useful.

I wonder about mm_malloc. I assume for config/i386/pmm_malloc.h, it is already handled via posix_memalign, but shouldn't one also handle config/i386/gmm_malloc.h? For instance via

    --- a/gcc/config/i386/gmm_malloc.h
    +++ b/gcc/config/i386/gmm_malloc.h
    @@ -61,7 +61,11 @@ _mm_malloc (size_t size, size_t align)
       /* Store the original pointer just before p.  */
       ((void **) aligned_ptr) [-1] = malloc_ptr;

    +#if defined(__GNUC__) && __GNUC__ >= 4 && __GNUC_MINOR__ >= 7
    +  return __builtin_assume_aligned(aligned_ptr, align);
    +#else
       return aligned_ptr;
    +#endif
     }

     static __inline__ void
[Bug debug/60152] [4.9 Regression] multiple AT_calling_convention attributes generated after r205679
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60152 --- Comment #2 from Tobias Burnus burnus at gcc dot gnu.org --- See PR 60060 comment 7 for some details and a backtrace of the two add_calling_convention_attribute calls.
[Bug rtl-optimization/60116] [4.8/4.9 Regression] wrong code at -Os on x86_64-linux-gnu in 32-bit mode
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60116

--- Comment #19 from Eric Botcazou ebotcazou at gcc dot gnu.org ---
Author: ebotcazou
Date: Wed Feb 12 10:16:34 2014
New Revision: 207716

URL: http://gcc.gnu.org/viewcvs?rev=207716&root=gcc&view=rev
Log:
    PR rtl-optimization/60116
    * combine.c (try_combine): Fix oversight in previous change.

Modified:
    trunk/gcc/combine.c
[Bug rtl-optimization/60116] [4.8/4.9 Regression] wrong code at -Os on x86_64-linux-gnu in 32-bit mode
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60116

--- Comment #20 from Eric Botcazou ebotcazou at gcc dot gnu.org ---
Author: ebotcazou
Date: Wed Feb 12 10:17:08 2014
New Revision: 207717

URL: http://gcc.gnu.org/viewcvs?rev=207717&root=gcc&view=rev
Log:
    PR rtl-optimization/60116
    * combine.c (try_combine): Fix oversight in previous change.

Modified:
    branches/gcc-4_8-branch/gcc/combine.c
[Bug middle-end/60092] posix_memalign not recognized to derive alias and alignment info
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60092

--- Comment #19 from Jakub Jelinek jakub at gcc dot gnu.org ---
(In reply to Tobias Burnus from comment #18)
(In reply to Richard Biener from comment #1)

We could lower

    posix_memalign (&ptr, align, size);

to

    posix_memalign (&ptr, align, size);
    ptr = __builtin_assume_aligned (ptr, align);

and hope for FRE to fix things up enough to make that useful.

I wonder about mm_malloc. I assume for config/i386/pmm_malloc.h, it is already handled via posix_memalign, but shouldn't one also handle config/i386/gmm_malloc.h? For instance via

    --- a/gcc/config/i386/gmm_malloc.h
    +++ b/gcc/config/i386/gmm_malloc.h
    @@ -61,7 +61,11 @@ _mm_malloc (size_t size, size_t align)
       /* Store the original pointer just before p.  */
       ((void **) aligned_ptr) [-1] = malloc_ptr;

    +#if defined(__GNUC__) && __GNUC__ >= 4 && __GNUC_MINOR__ >= 7
    +  return __builtin_assume_aligned(aligned_ptr, align);
    +#else
       return aligned_ptr;
    +#endif
     }

     static __inline__ void

No, why? ccp of course understands the dynamic realignment:

    aligned_ptr = (void *) (((size_t) malloc_ptr + align) & ~((size_t) (align) - 1));

so will know that aligned_ptr is align bytes aligned.
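The realignment expression quoted in comment #19 is plain integer arithmetic and can be checked in isolation. The sketch below applies the same formula to an integer address; the function name `align_up_like_gmm` is hypothetical, and it assumes align is a nonzero power of two, as the mask trick requires.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same shape as the quoted expression:
 *   (addr + align) & ~(align - 1)
 * The result is a multiple of align and is strictly greater than addr, which
 * leaves room just below it (the quoted code stores the original malloc
 * pointer there). */
static uintptr_t align_up_like_gmm(uintptr_t addr, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr + align) & ~((uintptr_t)align - 1);
}
```

Because the low bits are masked off unconditionally, the alignment of the result follows from the expression itself, which is the point made in the comment about ccp.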
[Bug c/60158] New: powerpc: usage of the .data.rel.ro.local section
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60158

            Bug ID: 60158
           Summary: powerpc: usage of the .data.rel.ro.local section
           Product: gcc
           Version: 4.8.2
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: jal2 at gmx dot de

This bug may concern the gcc documentation on section usage only.

Cross-compiling Das U-Boot with gcc 4.8.2 for powerpc with -fpic -mrelocatable, some addresses are put into a section .data.rel.ro.local, e.g. the address of "qwerty" from

    printf("%p\n", "qwerty");

There is no corresponding entry in the .fixup section. As Das U-Boot relocates itself to RAM using .got2/.got and .fixup sections only, how shall the section .data.rel.ro.local be handled? Currently it contains addresses only, but this may depend on the source code.

I put .data.rel.ro.local into the GOT which solved my problem, but I'm not sure if this is the intention of the gcc developers.

I've tried gcc 4.7.3 which put the address of "qwerty" into the GOT directly, i.e. there was no .data.rel.ro.local section and the string address was accessed with one redirection less.

details:
- gcc version: powerpc-softfloat-linux-gnuspe-gcc (Gentoo 4.8.2 p1.3r1, pie-0.5.8r1) 4.8.2
- gcc command line (some -I removed):

    -g -gdwarf-2 -Os -fpic -mrelocatable \
    -meabi \
    -D__KERNEL__ -DCONFIG_SYS_TEXT_BASE=0xef77 \
    -fno-builtin -ffreestanding \
    -isystem /usr/lib/gcc/powerpc-softfloat-linux-gnuspe/4.8.1/include \
    -nostdinc -pipe -DCONFIG_PPC -D__powerpc__ -ffixed-r2 -Wa,-me500 \
    -msoft-float -mno-string -mspe=yes -mno-spe -Wall -Wstrict-prototypes \
    -fno-stack-protector -Wno-format-nonliteral -Wno-format-security \
    -fstack-usage
[Bug rtl-optimization/60159] New: improve code for conditional sibcall
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60159

            Bug ID: 60159
           Summary: improve code for conditional sibcall
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: jay.foad at gmail dot com

If I compile this code for x86-64 I get:

    $ cat jcc.c
    extern int f(int x);
    int g(int x) { return x > 3 ? f(x) : x; }
    $ cc1 -quiet -O3 jcc.c -o -
    ...
    g:
    .LFB0:
            .cfi_startproc
            cmpl    $3, %edi
            jg      .L4
            movl    %edi, %eax
            ret
            .p2align 4,,10
            .p2align 3
    .L4:
            jmp     f
            .cfi_endproc

This code would be simpler and shorter if the jg-to-jmp sequence was replaced with a single "jg f" instruction.

I'm using gcc built from svn trunk r207717.
[Bug lto/60150] [4.9 Regression] ICE in function_and_variable_visibility, at ipa.c:1000
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60150

Richard Biener rguenth at gcc dot gnu.org changed:

               What|Removed                     |Added
   Target Milestone|---                         |4.9.0
[Bug target/43546] [4.7/4.8/4.9 Regression] ICE: in assign_stack_local_1, at function.c:353 with -mpreferred-stack-boundary=2 -msseregparm
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43546

Jakub Jelinek jakub at gcc dot gnu.org changed:

               What|Removed                     |Added
                 CC|                            |jakub at gcc dot gnu.org

--- Comment #14 from Jakub Jelinek jakub at gcc dot gnu.org ---
Created attachment 32114
  -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=32114&action=edit
gcc49-pr43546.patch

This untested patch fixes this for me, the dynamic stack realignment code is then aware of the DFmode that might need to be possibly spilled. The cost patch isn't wrong either, but at that level we really can't determine if the constant load will be zero cost (when we will attempt to load it into a i387 stack register) or more expensive (if it is loaded into a SSE register).
[Bug target/43546] [4.7/4.8/4.9 Regression] ICE: in assign_stack_local_1, at function.c:353 with -mpreferred-stack-boundary=2 -msseregparm
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43546

Jakub Jelinek jakub at gcc dot gnu.org changed:

               What|Removed                     |Added
                 CC|                            |uros at gcc dot gnu.org

--- Comment #15 from Jakub Jelinek jakub at gcc dot gnu.org ---
Yet another option, perhaps better, would be to add a new predicate, that would return true for a MEM operand for which avoid_constant_pool_reference returns a CONST_DOUBLE floating point constant (other than signalling NaN?), and add another define_insn before *extendsfdf2_i387 that would use that predicate on the second operand and would do what *extendsfdf2_i387 does, but have also a =x, m alternative that would be later on split into a load of the constant widened to DFmode in memory. Then we should get better code when trying to load a DFmode constant into a DFmode register and compress_float_constant decided to compress it, while it isn't a win in the end.

Or both my patch and this change.
[Bug rtl-optimization/60159] improve code for conditional sibcall
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60159

Jakub Jelinek jakub at gcc dot gnu.org changed:

               What|Removed                     |Added
                 CC|                            |jakub at gcc dot gnu.org

--- Comment #1 from Jakub Jelinek jakub at gcc dot gnu.org ---
Not sure if that is desirable though, it will mess up debug/unwind info.
[Bug rtl-optimization/59999] [4.9 Regression] Sign extension in loop regression blocks generation of zero overhead loop
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59999

--- Comment #22 from Paulo J. Matos pa...@matos-sorge.com ---
After some thought, I am concluding this cannot actually be optimized and that GCC 4.5.4 was better because it was taking advantage of an undefined behaviour that doesn't exist.

The thought process is as follows. The whole process has to do with this type of loop:

    void foo (int loopCount)
    {
      short i;
      for (i = 0; (int)i < loopCount; i++)
        ...
    }

GCC 4.5.4 was assuming i++ could have undefined behaviour and the increment was done in type short. Then i was promoted to int through a sign_extend and compared to loopCount. This undefined behaviour allows GCC 4.5.4 to generate an int scev for the loop.

In GCC 4.8 or later (haven't tested with 4.6 or 4.7), i++ is known not to have undefined behaviour. i++ due to C integer promotion rules is: i = (short) ((int) i + 1). GCC validly simplifies to i = (short) ((unsigned short)i + 1). This is then sign extended to int for comparison. GCC cannot generate an int scev because it's not simple: (int) (short) {1, +, 1}_1.

This can validly loop forever if loopCount > SHORT_MAX. For example, if loopCount is SHORT_MAX + 1, then when i reaches SHORT_MAX and is incremented by one the addition is fine because it is done in (unsigned short) and then truncated using modulo 2 (implementation defined behaviour) to short, therefore never reaching loopCount and looping forever.

In RTL we get the following sequence:

    r4:SI <- [loopCount]
    r0:HI <- 0
    code label...
    ...
    r2:HI <- r1:HI + 1
    r3:SI <- sign_extend r2:HI
    p0:BI <- r3:SI < r4:SI
    loop to code label if p0:BI

I was tempted to simplify this to:

    r4:SI <- [loopCount]
    r0:SI <- 0
    code label...
    ...
    r2:SI <- r1:SI + 1
    p0:BI <- r2:SI < r4:SI
    loop to code label if p0:BI

However this will never have an infinite loop behaviour if r4:SI == SHORT_MAX, therefore I think that at least in this case this cannot be optimized. I am tempted to close the bug report. Richard?
[Bug rtl-optimization/60155] ICE: in get_pressure_class_and_nregs at gcse.c:3438
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60155

--- Comment #1 from John David Anglin danglin at gcc dot gnu.org ---
With 4.6 and 4.7 compilers, this appears as:

    gcc-4.6 -g -O2 -Wall -Wpointer-arith -Wuninitialized -Wsign-compare -Wformat-security -Wno-pointer-sign -Wno-unused-result -fno-strict-aliasing -D_FORTIFY_SOURCE=2 -ftrapv -fno-builtin-memset -D_FORTIFY_SOURCE=2 -g -O2 -Wformat -Werror=format-security -DLOGIN_PROGRAM=\"/bin/login\" -DLOGIN_NO_ENDOPT -DSSH_EXTRAVERSION=\"Debian-2\" -I. -I.. -I/usr/include/editline -DSSHDIR=\"/etc/ssh\" -D_PATH_SSH_PROGRAM=\"/usr/bin/ssh\" -D_PATH_SSH_ASKPASS_DEFAULT=\"/usr/bin/ssh-askpass\" -D_PATH_SFTP_SERVER=\"/usr/lib/openssh/sftp-server\" -D_PATH_SSH_KEY_SIGN=\"/usr/lib/openssh/ssh-keysign\" -D_PATH_SSH_PKCS11_HELPER=\"/usr/lib/openssh/ssh-pkcs11-helper\" -D_PATH_SSH_PIDDIR=\"/var/run\" -D_PATH_PRIVSEP_CHROOT_DIR=\"/var/run/sshd\" -DHAVE_CONFIG_H -c ../ssh-keygen.c
    ../ssh-keygen.c: In function ‘do_fingerprint’:
    ../ssh-keygen.c:887:1: internal compiler error: in hoist_code, at gcse.c:4631
[Bug target/57202] Please make the intrinsics headers like immintrin.h be usable without compiler flags
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57202 --- Comment #4 from Marc Glisse glisse at gcc dot gnu.org --- Can this be closed?
[Bug rtl-optimization/59999] [4.9 Regression] Sign extension in loop regression blocks generation of zero overhead loop
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=5 --- Comment #23 from rguenther at suse dot de rguenther at suse dot de --- On Wed, 12 Feb 2014, pa...@matos-sorge.com wrote (comment #22): [...] I am tempted to close the bug report. Richard? Yes. That sounds correct.
[Bug middle-end/60092] posix_memalign not recognized to derive alias and alignment info
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60092 --- Comment #20 from Richard Biener rguenth at gcc dot gnu.org --- Author: rguenth Date: Wed Feb 12 13:36:08 2014 New Revision: 207720 URL: http://gcc.gnu.org/viewcvs?rev=207720&root=gcc&view=rev Log: 2014-02-12 Richard Biener rguent...@suse.de PR middle-end/60092 * gimple-low.c (lower_builtin_posix_memalign): Lower conditional of posix_memalign being successful. (lower_stmt): Restrict lowering of posix_memalign to when -ftree-bit-ccp is enabled. * gcc.dg/torture/pr60092.c: New testcase. * gcc.dg/tree-ssa/alias-31.c: Disable SRA. Added: trunk/gcc/testsuite/gcc.dg/torture/pr60092.c Modified: trunk/gcc/ChangeLog trunk/gcc/gimple-low.c trunk/gcc/testsuite/ChangeLog trunk/gcc/testsuite/gcc.dg/tree-ssa/alias-31.c
[Bug rtl-optimization/59999] [4.9 Regression] Sign extension in loop regression blocks generation of zero overhead loop
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=5 pmatos at gcc dot gnu.org changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|--- |INVALID --- Comment #24 from pmatos at gcc dot gnu.org --- Closing as invalid. Thanks Richard.
[Bug sanitizer/60142] [4.9 Regression][asan] -fsanitize=address breaks debugging - stepping into functions no longer possible
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60142 Jan Kratochvil jan.kratochvil at redhat dot com changed: What|Removed |Added CC||jan.kratochvil at redhat dot com --- Comment #4 from Jan Kratochvil jan.kratochvil at redhat dot com --- Verified GDB fails with it. GDB puts the breakpoint on the second .loc (that is not the first/initial .loc) in a function as currently neither GCC nor GCC use DW_LNS_set_prologue_end.

g++ (GCC) 4.9.0 20140212 (experimental) -S -g -fsanitize=address

        .type   _Z4testv, @function
_Z4testv:
.LASANPC512:
.LFB512:
        .file 2 "asantest.C"
        .loc 2 4 0
        .cfi_startproc
        .cfi_personality 0x3,__gxx_personality_v0
        .cfi_lsda 0x3,.LLSDA512
        pushq   %rbp
        .cfi_def_cfa_offset 16
        .cfi_offset 6, -16
        movq    %rsp, %rbp
        .cfi_def_cfa_register 6
        pushq   %r14
        pushq   %r13
        pushq   %r12
        pushq   %rbx
        subq    $112, %rsp
        .cfi_offset 14, -24
        .cfi_offset 13, -32
        .cfi_offset 12, -40
        .cfi_offset 3, -48
        leaq    -128(%rbp), %rbx
        movq    %rbx, %r14
        cmpl    $0, __asan_option_detect_stack_use_after_return(%rip)
        je      .L3
        .loc 2 4 0      <--- here GDB puts the breakpoint
        movq    %rbx, %rsi
        movl    $96, %edi
        call    __asan_stack_malloc_1
        movq    %rax, %rbx
.L3:

GDB already works around a similar case, GCC PR debug/48827; this asan prologue may look standard enough that it could possibly also be worked around in GDB.
[Bug target/57202] Please make the intrinsics headers like immintrin.h be usable without compiler flags
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57202 --- Comment #5 from Thiago Macieira thiago at kde dot org --- (In reply to Marc Glisse from comment #4) Can this be closed? Oh, yeah, this is working fine in GCC 4.9.
[Bug bootstrap/60160] New: Building with -flto in CFLAGS_FOR_TARGET / CXXFLAGS_FOR_TARGET
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60160 Bug ID: 60160 Summary: Building with -flto in CFLAGS_FOR_TARGET / CXXFLAGS_FOR_TARGET Product: gcc Version: 4.9.0 Status: UNCONFIRMED Severity: enhancement Priority: P3 Component: bootstrap Assignee: unassigned at gcc dot gnu.org Reporter: d.g.gorbachev at gmail dot com Created attachment 32115 --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=32115&action=edit Tentative patch Someone might want to build everything with LTO. Currently, I see two problems. 1. crtstuff.c: perhaps it'd be better to compile it with -fno-lto. 2. attribute used for _Unwind_* functions.
[Bug bootstrap/60160] Building with -flto in CFLAGS_FOR_TARGET / CXXFLAGS_FOR_TARGET
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60160 --- Comment #1 from Marc Glisse glisse at gcc dot gnu.org --- Note the related: http://gcc.gnu.org/ml/gcc-patches/2014-01/msg01480.html (PR 43538) and PR 59893.
[Bug bootstrap/60160] Building with -flto in CFLAGS_FOR_TARGET / CXXFLAGS_FOR_TARGET
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60160 --- Comment #2 from Markus Trippelsdorf trippels at gcc dot gnu.org --- libstdc++ also causes problems:

/var/tmp/gcc_build_dir_/./prev-gcc/xg++ -B/var/tmp/gcc_build_dir_/./prev-gcc/ -B/usr/x86_64-pc-linux-gnu/bin/ -nostdinc++ -B/var/tmp/gcc_build_dir_/prev-x86_64-pc-linux-gnu/libstdc++-v3/src/.libs -B/var/tmp/gcc_build_dir_/prev-x86_64-pc-linux-gnu/libstdc++-v3/libsupc++/.libs -I/var/tmp/gcc_build_dir_/prev-x86_64-pc-linux-gnu/libstdc++-v3/include/x86_64-pc-linux-gnu -I/var/tmp/gcc_build_dir_/prev-x86_64-pc-linux-gnu/libstdc++-v3/include -I/var/tmp/gcc/libstdc++-v3/libsupc++ -L/var/tmp/gcc_build_dir_/prev-x86_64-pc-linux-gnu/libstdc++-v3/src/.libs -L/var/tmp/gcc_build_dir_/prev-x86_64-pc-linux-gnu/libstdc++-v3/libsupc++/.libs -march=native -O3 -pipe -flto=jobserver -frandom-seed=1 -fprofile-generate -fno-lto -DIN_GCC -fno-exceptions -fno-rtti -fasynchronous-unwind-tables -W -Wall -Wno-narrowing -Wwrite-strings -Wcast-qual -Wmissing-format-attribute -pedantic -Wno-long-long -Wno-variadic-macros -Wno-overlength-strings -DHAVE_CONFIG_H -DGENERATOR_FILE -Wl,-O1,--hash-style=gnu,--as-needed,--gc-sections,--icf=safe,--icf-iterations=3 -o build/genconstants \
    build/genconstants.o build/read-md.o build/errors.o .././libiberty/libiberty.a

/var/tmp/gcc_build_dir_/prev-x86_64-pc-linux-gnu/libstdc++-v3/src/.libs/libstdc++.so: error: undefined reference to 'std::istream::ignore(long)'
/var/tmp/gcc_build_dir_/prev-x86_64-pc-linux-gnu/libstdc++-v3/src/.libs/libstdc++.so: error: undefined reference to 'std::basic_istream<wchar_t, std::char_traits<wchar_t> >::ignore(long)'
[Bug target/57202] Please make the intrinsics headers like immintrin.h be usable without compiler flags
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57202 Marc Glisse glisse at gcc dot gnu.org changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Known to work||4.9.0 Resolution|--- |FIXED Target Milestone|--- |4.9.0 --- Comment #6 from Marc Glisse glisse at gcc dot gnu.org --- Thanks.
[Bug rtl-optimization/56965] nonoverlapping_component_refs_p is bogus and slow
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56965 Richard Biener rguenth at gcc dot gnu.org changed: What|Removed |Added Status|NEW |ASSIGNED Assignee|unassigned at gcc dot gnu.org |rguenth at gcc dot gnu.org --- Comment #6 from Richard Biener rguenth at gcc dot gnu.org --- Mine.
[Bug target/60151] HAVE_AS_GOTOFF_IN_DATA is mis-detected on x86-64
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60151 --- Comment #1 from hjl at gcc dot gnu.org hjl at gcc dot gnu.org --- Author: hjl Date: Wed Feb 12 16:12:36 2014 New Revision: 207731 URL: http://gcc.gnu.org/viewcvs?rev=207731&root=gcc&view=rev Log: Pass --32 to GNU assembler for .long foo@GOTOFF check PR target/60151 * configure.ac (HAVE_AS_GOTOFF_IN_DATA): Pass --32 to GNU assembler. * configure: Regenerated. Modified: trunk/gcc/ChangeLog trunk/gcc/configure trunk/gcc/configure.ac
[Bug target/60151] HAVE_AS_GOTOFF_IN_DATA is mis-detected on x86-64
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60151 --- Comment #2 from hjl at gcc dot gnu.org hjl at gcc dot gnu.org --- Author: hjl Date: Wed Feb 12 16:38:50 2014 New Revision: 207733 URL: http://gcc.gnu.org/viewcvs?rev=207733&root=gcc&view=rev Log: Pass --32 to GNU assembler for .long foo@GOTOFF check Backport from mainline PR target/60151 * configure.ac (HAVE_AS_GOTOFF_IN_DATA): Pass --32 to GNU assembler. Modified: branches/gcc-4_8-branch/gcc/ChangeLog branches/gcc-4_8-branch/gcc/configure branches/gcc-4_8-branch/gcc/configure.ac
[Bug target/60151] HAVE_AS_GOTOFF_IN_DATA is mis-detected on x86-64
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60151 --- Comment #3 from hjl at gcc dot gnu.org hjl at gcc dot gnu.org --- Author: hjl Date: Wed Feb 12 16:43:47 2014 New Revision: 207734 URL: http://gcc.gnu.org/viewcvs?rev=207734&root=gcc&view=rev Log: Pass --32 to GNU assembler for .long foo@GOTOFF check Backport from mainline PR target/60151 * configure.ac (HAVE_AS_GOTOFF_IN_DATA): Pass --32 to GNU assembler. Modified: branches/gcc-4_7-branch/gcc/ChangeLog branches/gcc-4_7-branch/gcc/configure branches/gcc-4_7-branch/gcc/configure.ac
[Bug target/60151] HAVE_AS_GOTOFF_IN_DATA is mis-detected on x86-64
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60151 H.J. Lu hjl.tools at gmail dot com changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|--- |FIXED --- Comment #4 from H.J. Lu hjl.tools at gmail dot com --- Fixed in GCC 4.7.4/4.8.3/4.9.0.
[Bug other/59893] Use LTO for libgcc.a, libstdc++.a, etc
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59893 --- Comment #7 from Dmitry Gorbachev d.g.gorbachev at gmail dot com --- I used to build GCC 4.8/4.9 with -flto in C(XX)FLAGS_FOR_TARGET for quite some time (both native i686-pc-linux-gnu and a cross), and it seems to work. I saw two problems: PR 60160 (for which a patch exists), and PR 59472 (annoying, but not fatal).
[Bug middle-end/59737] [4.9 Regression] ice from optimize_inline_calls
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59737 Jakub Jelinek jakub at gcc dot gnu.org changed: What|Removed |Added Status|NEW |RESOLVED CC||jakub at gcc dot gnu.org Resolution|--- |FIXED --- Comment #5 from Jakub Jelinek jakub at gcc dot gnu.org --- Author: hubicka Date: Tue Feb 11 22:54:21 2014 New Revision: 207702 URL: http://gcc.gnu.org/viewcvs?rev=207702&root=gcc&view=rev Log: PR lto/59468 * ipa-utils.h (possible_polymorphic_call_targets): Update prototype and wrapper. * ipa-devirt.c: Include demangle.h (odr_violation_reported): New static variable. (add_type_duplicate): Update odr_violations. (maybe_record_node): Add completep parameter; update it. (record_target_from_binfo): Add COMPLETEP parameter; update it as needed. (possible_polymorphic_call_targets_1): Likewise. (struct polymorphic_call_target_d): Add nonconstruction_targets; rename FINAL to COMPLETE. (record_targets_from_bases): Sanity check we found the binfo; fix COMPLETEP updating. (possible_polymorphic_call_targets): Add NONCONSTRUTION_TARGETSP parameter, fix computing of COMPLETEP. (dump_possible_polymorphic_call_targets): Improve readability of dump; at LTO time do demangling. (ipa_devirt): Use nonconstruction_targets; Improve dumps. * gimple-fold.c (gimple_get_virt_method_for_vtable): Add can_refer parameter. (gimple_get_virt_method_for_binfo): Likewise. * gimple-fold.h (gimple_get_virt_method_for_binfo, gimple_get_virt_method_for_vtable): Update prototypes. PR lto/59468 * g++.dg/ipa/devirt-27.C: New testcase. * g++.dg/ipa/devirt-26.C: New testcase. Added: trunk/gcc/testsuite/g++.dg/ipa/devirt-26.C trunk/gcc/testsuite/g++.dg/ipa/devirt-27.C Modified: trunk/gcc/ChangeLog trunk/gcc/cp/decl2.c trunk/gcc/gimple-fold.c trunk/gcc/gimple-fold.h trunk/gcc/ipa-devirt.c trunk/gcc/ipa-utils.h trunk/gcc/testsuite/ChangeLog
[Bug libgcc/60161] New: updating collapsed because of no authentified software packets (lib32cc1)
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60161 Bug ID: 60161 Summary: updating collapsed because of no authentified software packets (lib32cc1) Product: gcc Version: unknown Status: UNCONFIRMED Severity: blocker Priority: P3 Component: libgcc Assignee: unassigned at gcc dot gnu.org Reporter: dierk.zeissler at gmail dot com Installation of packages that cannot be trusted is required. This action would require installing packages from unauthenticated software sources. lib32gcc1 Installed version: 4:0.8.9-0ubuntu0.12.04.1 Hardware: 64-bit version
[Bug middle-end/59737] [4.9 Regression] ice from optimize_inline_calls
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59737 --- Comment #4 from Jakub Jelinek jakub at gcc dot gnu.org --- Author: jakub Date: Wed Feb 12 16:55:51 2014 New Revision: 207735 URL: http://gcc.gnu.org/viewcvs?rev=207735&root=gcc&view=rev Log: PR middle-end/59737 * g++.dg/ipa/pr59737.C: New test. Added: trunk/gcc/testsuite/g++.dg/ipa/pr59737.C Modified: trunk/gcc/testsuite/ChangeLog
[Bug libgcc/60161] updating collapsed because of no authentified software packets (lib32cc1)
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60161 Andreas Schwab sch...@linux-m68k.org changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|--- |INVALID --- Comment #1 from Andreas Schwab sch...@linux-m68k.org --- Please report that to Ubuntu, this has nothing to do with gcc.
[Bug target/58115] testcase gcc.target/i386/intrinsics_4.c failure
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58115 Bernd Edlinger bernd.edlinger at hotmail dot de changed: What|Removed |Added Status|REOPENED|RESOLVED Resolution|--- |FIXED --- Comment #16 from Bernd Edlinger bernd.edlinger at hotmail dot de --- fixed on trunk. Thanks!
[Bug c++/43680] [DR 1022] G++ is too aggressive in optimizing away bounds checking with enums
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43680 Jason Merrill jason at gcc dot gnu.org changed: What|Removed |Added Known to fail|| --- Comment #19 from Jason Merrill jason at gcc dot gnu.org --- It looks like the committee is making this code undefined again: http://open-std.org/jtc1/sc22/wg21/docs/cwg_toc.html#1766
[Bug target/59516] [4.9 Regression] Multiple definition of `X' / of `non-virtual thunk to X' errors with LTO
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59516 Kai Tietz ktietz at gcc dot gnu.org changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED CC||ktietz at gcc dot gnu.org Resolution|--- |INVALID --- Comment #1 from Kai Tietz ktietz at gcc dot gnu.org --- This issue is a known binutils ld bug. The issue is that object-file arguments aren't treated properly for the LTO plugin. A workaround is to add all files to a library with the ar tool and to link via that library (side note: be aware that you will then need to mark classes via dllexport).
[Bug c/59193] Unused postfix operator temporaries
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59193 Max TenEyck Woodbury mtewoodbury at gmail dot com changed: What|Removed |Added Status|RESOLVED|REOPENED Resolution|INVALID |--- --- Comment #2 from Max TenEyck Woodbury mtewoodbury at gmail dot com --- The practice is very common in C (and the GCC code) and is NOT peculiar to C++. The creation of temporary values that are never used is a waste of resources and, even when removed by the optimizer, represent an, admittedly minor, defect. This may be a minor point but it is NOT controversial. Also, it is not really a matter of style. Your lack of insight on this is somewhat disturbing. Marking the argument as INVALID is just plain wrong. It should be left open to provide a reference for patches that address this problem.
[Bug c/59193] Unused postfix operator temporaries
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59193 Andrew Pinski pinskia at gcc dot gnu.org changed: What|Removed |Added Status|REOPENED|RESOLVED Resolution|--- |INVALID --- Comment #3 from Andrew Pinski pinskia at gcc dot gnu.org --- a++ and ++a should be treated as similar and don't change the semantics of the loading from the variable or increase the number of loads if never used for scalar types. Now in C++, they are different when you overload them for classes but we don't use that feature yet.
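The point about scalar types can be illustrated with a minimal sketch. The two functions below are hypothetical and not from the bug report; they differ only in whether the loop counter is bumped with `i++` or `++i`. Because the value of the increment expression is discarded, both forms have the same observable effect for a scalar counter.

```c
/* Sum 0..n-1 using a postfix increment whose value is never used. */
int sum_postfix(int n)
{
    int total = 0;
    for (int i = 0; i < n; i++)
        total += i;
    return total;
}

/* Same loop with a prefix increment; only the spelling differs. */
int sum_prefix(int n)
{
    int total = 0;
    for (int i = 0; i < n; ++i)
        total += i;
    return total;
}
```

Whether a given compiler emits identical code for the two is something to check in its assembly output rather than assume; the sketch only shows that the semantics match.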
[Bug rtl-optimization/57193] [4.7/4.8/4.9 Regression] suboptimal register allocation for SSE registers
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57193 Richard Henderson rth at gcc dot gnu.org changed: What|Removed |Added Last reconfirmed|2013-05-07 00:00:00 |2014-2-12 CC||rth at gcc dot gnu.org --- Comment #4 from Richard Henderson rth at gcc dot gnu.org --- It seems like incomplete reload inheritance:

(insn 19 16 21 2 (set (reg:V8HI 107)
        (truncate:V8HI (lshiftrt:V8SI (mult:V8SI (zero_extend:V8SI (subreg:V8HI (reg:V16QI 105) 0))
                    (zero_extend:V8SI (subreg:V8HI (reg/v:V2DI 101 [ f ]) 0)))
                (const_int 16 [0x10])))) include/emmintrin.h:1362 2134 {*umulv8hi3_highpart}
     (expr_list:REG_DEAD (reg:V16QI 105)
        (nil)))

Creating newreg=111 from oldreg=107, assigning class SSE_REGS to r111
   19: r111:V8HI=trunc(zero_extend(r111:V8HI)*zero_extend(r101:V2DI#0) 00x10)
      REG_DEAD r105:V16QI
    Inserting insn reload before:
   31: r111:V8HI=r105:V16QI#0
    Inserting insn reload after:
   32: r107:V8HI=r111:V8HI

The new register r111 does wind up inheriting from r107, but not transitively to r105. Thus we wind up leaving the copy insn 31.
[Bug target/58158] [4.8/4.9 Regression] ICE with conditional moves on GPRs with a floating point conditional on mipsel with loongson2f
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58158 Richard Henderson rth at gcc dot gnu.org changed: What|Removed |Added CC||rth at gcc dot gnu.org --- Comment #13 from Richard Henderson rth at gcc dot gnu.org --- (In reply to Tom Li from comment #12)

 {
+  if (!ISA_HAS_FP_CONDMOVE
+      && GET_MODE_CLASS (GET_MODE (XEXP (operands[1], 0))) != MODE_INT)
+    FAIL;

The patch is clearly wrong. It's attempting to look through a subreg around operands[1], but of course that subreg will not always exist.
[Bug target/58158] [4.8/4.9 Regression] ICE with conditional moves on GPRs with a floating point conditional on mipsel with loongson2f
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58158 --- Comment #14 from Andrew Pinski pinskia at gcc dot gnu.org --- (In reply to Richard Henderson from comment #13) (In reply to Tom Li from comment #12)

 {
+  if (!ISA_HAS_FP_CONDMOVE
+      && GET_MODE_CLASS (GET_MODE (XEXP (operands[1], 0))) != MODE_INT)
+    FAIL;

The patch is clearly wrong. It's attempting to look through a subreg around operands[1], but of course that subreg will not always exist.

Actually it is correct, as operands[1] will be a comparison_operator, which always has two operands itself.
[Bug middle-end/60162] New: [4.9 lra regression] mlra appears to be using the FP registers as a set of spill registers for ARM.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60162 Bug ID: 60162 Summary: [4.9 lra regression] mlra appears to be using the FP registers as a set of spill registers for ARM. Product: gcc Version: 4.9.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: middle-end Assignee: unassigned at gcc dot gnu.org Reporter: ramana at gcc dot gnu.org This is something that I've just noticed with spec2k gzip : longest_match. If the function is compiled for a cross arm-none-linux-gnueabihf toolchain with the following parameters, --with-arch=armv7-a --with-fpu=neon --with-float=hard With a cross toolchain using mlra by default I get code that loads a value into an FP register and then moves this over to an integer register. While this is not that big a problem on some of the newer cores, it will be an issue on older cores where the latency of such transfers can be pretty high. You can experiment with -mno-lra to see the difference in code generated and this is essentially something that has shown up rather recently. Bisecting and will follow up in the morning with a testcase.
[Bug ada/60163] New: Ada style checks: token spacing enforces space only around the first of several multiplying operators
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60163 Bug ID: 60163 Summary: Ada style checks: token spacing enforces space only around the first of several multiplying operators Product: gcc Version: 4.7.4 Status: UNCONFIRMED Severity: normal Priority: P3 Component: ada Assignee: unassigned at gcc dot gnu.org Reporter: piotr.trojanek at gmail dot com Created attachment 32116 --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=32116&action=edit Example of wrong The -gnatyt option of the GNAT Ada compiler should check if binary operators other than ** are surrounded by spaces. However, it works correctly only for the first of several multiplying operators in an expression. For example, expression x * x + x*x does not trigger any warning. When compiling the attached file with gnatmake -gnatyt -gnatwe -gnatf style there should be 4 warning messages, but currently there are only 2. The problem occurs in the 4.7.4 version of the GNAT compiler; tested on Linux x86_64, but probably is platform-independent.
[Bug libgomp/60035] [PATCH] make it possible to use OMP on both sides of a fork (without violating standard)
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60035 --- Comment #2 from Nathaniel J. Smith njs at pobox dot com --- Good point -- sent. http://gcc.gnu.org/ml/gcc-patches/2014-02/msg00813.html
[Bug rtl-optimization/60162] [4.9 lra regression] mlra appears to be using the FP registers for integer values and then moving on to GPR registers.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60162 Andrew Pinski pinskia at gcc dot gnu.org changed: What|Removed |Added Component|middle-end |rtl-optimization --- Comment #1 from Andrew Pinski pinskia at gcc dot gnu.org --- This sounds like the cost model of moving between register classes is not correct for the arm backend.
[Bug ada/60163] Ada style checks: token spacing enforces space only around the first of several multiplying operators
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60163 --- Comment #1 from Piotr Trojanek piotr.trojanek at gmail dot com --- Created attachment 32117 --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=32117&action=edit Patch to solve the problem The attached patch solves the problem. I tested it with GNAT GPL 2013, but the file is against the latest FSF sources.
[Bug ada/60164] New: Missing parenthesis in the documentation
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60164 Bug ID: 60164 Summary: Missing parenthesis in the documentation Product: gcc Version: unknown Status: UNCONFIRMED Severity: minor Priority: P3 Component: ada Assignee: unassigned at gcc dot gnu.org Reporter: piotr.trojanek at gmail dot com Created attachment 32118 --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=32118&action=edit Correct nested parentheses in the gnatmem documentation. There are nested parentheses in the documentation of the gnatmem. The closing parenthesis is missing. The attached patch solves the problem.
[Bug rtl-optimization/60155] ICE: in get_pressure_class_and_nregs at gcse.c:3438
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60155 --- Comment #2 from John David Anglin danglin at gcc dot gnu.org ---

Breakpoint 1, get_pressure_class_and_nregs (insn=0xfab51d98, nregs=0xfaf028c0) at ../../gcc/gcc/gcse.c:3459
3459      gcc_assert (set != NULL_RTX);
(gdb) p debug_rtx (insn)
(insn 212 211 213 18 (parallel [
            (set (reg/v:SI 114 [ num ])
                (plus:SI (reg/v:SI 114 [ num ])
                    (const_int 1 [0x1])))
            (trap_if (ne (plus:DI (sign_extend:DI (reg/v:SI 114 [ num ]))
                        (sign_extend:DI (const_int 1 [0x1])))
                    (sign_extend:DI (plus:SI (reg/v:SI 114 [ num ])
                            (const_int 1 [0x1]))))
                (const_int 0 [0]))
        ]) ../ssh-keygen.c:830 113 {addvsi3}
     (nil))
$1 = void
[Bug rtl-optimization/60155] ICE: in get_pressure_class_and_nregs at gcse.c:3438
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60155 --- Comment #3 from John David Anglin danglin at gcc dot gnu.org --- Function compiles without -ftrapv.
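For readers unfamiliar with the addvsi3 pattern in the dump above, its trap_if condition can be written out in C. This is only an illustrative sketch, not GCC's implementation: the function name `add_would_trap` is made up, and the code assumes a 32-bit int, a 64-bit long long, and GCC's modulo-2^N behaviour for the implementation-defined conversion from unsigned int to int.

```c
#include <limits.h>

/* Returns 1 when a + b would overflow a 32-bit int.  This mirrors the
   RTL condition: the sum of the sign-extended operands is compared with
   the sign extension of the wrapped 32-bit sum; they differ exactly
   when the narrow addition overflowed. */
int add_would_trap(int a, int b)
{
    long long wide = (long long)a + (long long)b;

    /* Wrapping add done in unsigned arithmetic to avoid undefined
       behaviour; the conversion back to int is implementation-defined. */
    int narrow = (int)((unsigned int)a + (unsigned int)b);

    return wide != (long long)narrow;
}
```

With -ftrapv the compiler arranges for a trap instead of returning a flag; the sketch only reproduces the comparison that decides whether to trap.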
[Bug c/16602] Spurious warnings about pointer to array -> const pointer to array conversion
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=16602 Sebastian Unger sebunger44 at gmail dot com changed: What|Removed |Added CC||sebunger44 at gmail dot com --- Comment #11 from Sebastian Unger sebunger44 at gmail dot com --- (In reply to Joseph S. Myers from comment #6) When you apply const to array of int, the resulting type is array of const int not const array of int; that's how type qualifiers and arrays interact in C, there is no such thing as a qualified array type. array of const int is not a const-qualified type in C. Can anybody provide a reference to the standard to the effect of this claim? Because I can't find any, and I do believe this statement is wrong. All other comments claiming this issue to be invalid are based on this (as are all examples claiming to show that the original issue breaks the constness promise). I'm inclined to reopen this issue unless someone can point me to the standard for this.
[Bug c/16602] Spurious warnings about pointer to array -> const pointer to array conversion
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=16602 --- Comment #12 from Andrew Pinski pinskia at gcc dot gnu.org --- (In reply to Sebastian Unger from comment #11) I'm inclined to reopen this issue unless someone can point me to the standard for this. From 6.7.3/9 (in the C11 draft): If the specification of an array type includes any type qualifiers, the element type is so qualified, not the array type. If the specification of a function type includes any type qualifiers, the behavior is undefined.
[Bug c/16602] Spurious warnings about pointer to array -> const pointer to array conversion
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=16602 --- Comment #13 from Sebastian Unger sebunger44 at gmail dot com --- I believe the intent behind that is that the qualification of an array type is identical to that of its element type. I.e. the statement here is that an 'array of const ints' is identical to a 'const array of ints' rather than that the latter does not exist. Thus a 'pointer to array of ints' is perfectly convertible to 'pointer to array of const ints' which makes perfect sense. Note that this is completely different from a 'pointer to pointer to int' or any such as has been given in previous examples. At the very least GCC should treat it such in gnu99 mode, as it makes perfect sense to have the following code compile successfully:

  typedef int IntArray[3];
  void foo(IntArray const* a);
  void bar(IntArray* a) { foo(a); }
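As a sketch of what is being argued about, the snippet below reuses the `IntArray` typedef from the comment and adds hypothetical helpers (`sum3`, `sum3_mutable`, `demo`) that are not part of the report. It uses an explicit cast for the pointer conversion so that it compiles without a diagnostic regardless of which reading of the standard a given compiler and language mode take; whether the cast ought to be necessary is exactly the open question in this thread.

```c
typedef int IntArray[3];

/* Reads through a pointer to an array whose elements are const int. */
int sum3(const IntArray *a)
{
    return (*a)[0] + (*a)[1] + (*a)[2];
}

/* Takes a pointer to a mutable array and forwards it.  The explicit
   cast is the conservative spelling; dropping it is the conversion
   the bug report is about. */
int sum3_mutable(IntArray *a)
{
    return sum3((const IntArray *)a);
}

/* Small driver so the sketch can be exercised without extra setup. */
int demo(void)
{
    IntArray v = {1, 2, 3};
    return sum3_mutable(&v);
}
```

Calling `demo()` adds the three elements through the const-qualified view; nothing is modified, which is why the reporter considers the implicit conversion harmless.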
[Bug fortran/60148] strings in NAMELIST do not honor DELIM= in open statement
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60148 --- Comment #7 from Jerry DeLisle jvdelisle at gcc dot gnu.org --- The regressions are twofold: 1) Tests are specifically looking for a " or a ' when no longer generated, and 2) We need to also revise namelist reading of character types which are no longer delimited.

  namelist_18.f90, modify test
  namelist_38.f90, error in read
  namelist_56.f90, ...
  namelist_70.f90, ...

My patch is as follows so far, a little different from Steve's. With this patch I don't explicitly write a space delim because we write one in the first chunk below. This was needed for namelist_16.f90

Index: write.c
===================================================================
--- write.c (revision 206864)
+++ write.c (working copy)
@@ -1921,7 +1921,8 @@
              to column 2. Reset the repeat counter. */

           dtp->u.p.no_leading_blank = 0;
-          write_character (dtp, semi_comma, 1, 1);
+          if (dtp->u.p.nml_delim || (obj->type != BT_CHARACTER))
+            write_character (dtp, semi_comma, 1, 1);
           if (num > 5)
             {
               num = 0;
@@ -1971,9 +1972,18 @@
   /* Set the delimiter for namelist output. */
   tmp_delim = dtp->u.p.current_unit->delim_status;

+  switch (tmp_delim)
+    {
+    case DELIM_APOSTROPHE:
+      dtp->u.p.nml_delim = '\'';
+      break;
+    case DELIM_QUOTE:
+      dtp->u.p.nml_delim = '"';
+      break;
+    default:
+      dtp->u.p.nml_delim = '\0';
+    }

-  dtp->u.p.nml_delim = tmp_delim == DELIM_APOSTROPHE ? '\'' : '"';
-
   /* Temporarily disable namelist delimters. */
   dtp->u.p.current_unit->delim_status = DELIM_NONE;

I have not looked at read yet.
[Bug tree-optimization/60165] New: may be used uninitialized warning with -O3 but not with -O2
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60165 Bug ID: 60165 Summary: may be used uninitialized warning with -O3 but not with -O2 Product: gcc Version: 4.9.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: vincent-gcc at vinc17 dot net With: gcc (Debian 20140111-1) 4.9.0 20140111 (experimental) [trunk revision 206552] I get the following inconsistency in the warnings:

ypig% cat out.i
int a, b;
int fn2 (int, int);
int fn1 (int *p1)
{
    if (fn2 (a, 0))
        *p1 = b;
    int c;
    fn1 (&c);
    return c;
}
ypig% gcc-snapshot -c -Wall -Werror=maybe-uninitialized -O2 out.i
ypig% gcc-snapshot -c -Wall -Werror=maybe-uninitialized -O3 out.i
out.i: In function 'fn1':
out.i:9:5: error: 'c' may be used uninitialized in this function [-Werror=maybe-uninitialized]
     return c;
     ^
cc1: some warnings being treated as errors
ypig%

I don't know whether this is regarded as normal, but this looks strange.

Note: I got this problem when compiling round_prec.c from the GNU MPFR trunk. I generated the preprocessed file with -E, then used creduce on the following script:

#!/bin/sh
{ gcc-snapshot -c -Wall -Werror=maybe-uninitialized -O2 out.i && \
  ! gcc-snapshot -c -Wall -Werror=maybe-uninitialized -O3 out.i
} >/dev/null 2>&1

to generate the simple testcase (and fixed the declarations to avoid additional warnings -- I think I should have used -Werror in the script to avoid them in the first place).
[Bug libgcc/60166] New: ARM default NAN encoding violates EABI
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60166 Bug ID: 60166 Summary: ARM default NAN encoding violates EABI Product: gcc Version: 4.9.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: libgcc Assignee: unassigned at gcc dot gnu.org Reporter: joey.ye at arm dot com

#include <stdio.h>
#include <string.h>
#include <math.h>
int g;
float i = 0.0, j = 0.0;
int main()
{
  float f = i / j;
  memcpy(&g, &f, sizeof(g));
  printf("f=%f, hex=%x\n", f, g);
  return 0;
}

When built for ARM thumb1, result is: f=nan, hex=7fff

While according to the RTABI (http://infocenter.arm.com/help/topic/com.arm.doc.ihi0043d/IHI0043D_rtabi.pdf) section 4.1.1.1: When not otherwise specified by IEEE 754, the result on an invalid operation should be the quiet NaN bit pattern with only the most significant bit of the significand set, and all other significand bits zero.

So current libgcc is violating ARM EABI. I have a patch under testing.
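The bit pattern quoted from the RTABI can be checked mechanically. The sketch below is an illustration under stated assumptions, not code from libgcc: the helper names `float_bits` and `is_rtabi_default_nan` are invented, float is assumed to be a 32-bit IEEE 754 single, and the sign bit is ignored, which is a choice of this sketch rather than something the quoted sentence spells out.

```c
#include <stdint.h>
#include <string.h>

/* Copy the object representation of a float into a 32-bit integer.
   memcpy avoids aliasing problems; note the address-of operators. */
uint32_t float_bits(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}

/* True when the pattern is a NaN whose significand has only its most
   significant bit set: exponent all ones, fraction 0x400000.
   Assumption of this sketch: the sign bit is masked off. */
int is_rtabi_default_nan(uint32_t bits)
{
    return (bits & 0x7fffffffu) == 0x7fc00000u;
}
```

A usage along the lines of the report would be `is_rtabi_default_nan(float_bits(i / j))`; a NaN with every significand bit set, such as 0x7fffffff, fails the check, while 0x7fc00000 passes.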
[Bug c++/60167] New: [4.9 regression] Bogus error: conflicting declaration
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60167 Bug ID: 60167 Summary: [4.9 regression] Bogus error: conflicting declaration Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: ppluzhnikov at google dot com The test case below fails to compile with current trunk: g++ (GCC) 4.9.0 20140213 (experimental)

g++ -c t.cc
t.cc:10:48: error: conflicting declaration ‘typename Foo<F>::Bar Foo<F>::cache’
 template <int& F> typename Foo<F>::Bar Foo<F>::cache;
                                                ^
t.cc:5:14: note: previous declaration as ‘Foo<F>::Bar Foo<F>::cache’
   static Bar cache;
              ^
t.cc:10:48: error: declaration of ‘Foo<F>::Bar Foo<F>::cache’ outside of class is not definition [-fpermissive]
 template <int& F> typename Foo<F>::Bar Foo<F>::cache;
                                                ^

/// --- cut ---
template <int& F>
struct Foo {
  typedef int Bar;
  static Bar cache;
};

// template <int& F> int Foo<F>::cache;  // OK
template <int& F> typename Foo<F>::Bar Foo<F>::cache;
/// --- cut ---

Removing reference (template <int F> struct Foo) also makes it compile. Compiles fine with gcc-4.8 and Clang.
[Bug c++/60168] New: Incorrect check in ~unique_ptr() when Deleter::pointer type is not a pointer type
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60168

Bug ID: 60168
Summary: Incorrect check in ~unique_ptr() when Deleter::pointer type is not a pointer type
Product: gcc
Version: 4.8.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: ashish.sadanandan at gmail dot com

The following compiles on both VS2013 and ICC 13.0.1

#include <memory>

struct del {
  using pointer = int;
  void operator()(int) {}
};

int main()
{
  std::unique_ptr<int, del> p;
}

It fails on gcc4.8.1 with this error

/usr/include/c++/4.8/bits/unique_ptr.h: In instantiation of 'std::unique_ptr<_Tp, _Dp>::~unique_ptr() [with _Tp = int; _Dp = del]':
main.cpp:13:35: required from here
/usr/include/c++/4.8/bits/unique_ptr.h:183:12: error: invalid operands of types 'int' and 'std::nullptr_t' to binary 'operator!='
  if (__ptr != nullptr)

I believe that last if statement should be

  if (__ptr != pointer())
[Bug target/60169] New: ICE ARM thumb1 handles far jump
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60169

Bug ID: 60169
Summary: ICE ARM thumb1 handles far jump
Product: gcc
Version: 4.9.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: joey.ye at arm dot com

Created attachment 32119
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=32119&action=edit
testcase

Trunk gcc 20140210:

arm-none-eabi-gcc -mthumb -fomit-frame-pointer -mthumb -fPIC -mcpu=cortex-m0 -mno-lra png.c -c
png.c: In function 'png_do_read_swap_alpha':
png.c:104:1: internal compiler error: in reload, at reload1.c:1058
 }
 ^
Please submit a full bug report, with preprocessed source if appropriate.
See http://gcc.gnu.org/bugs.html for instructions.
[Bug c++/60168] Incorrect check in ~unique_ptr() when Deleter::pointer type is not a pointer type
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60168

Jonathan Wakely redi at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |INVALID

--- Comment #1 from Jonathan Wakely redi at gcc dot gnu.org ---
Not a bug, the type unique_ptr<T,D>::pointer must meet the requirements of a NullablePointer, which includes being comparable with nullptr, so int is not allowed.
[Bug target/60169] ICE ARM thumb1 handles far jump
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60169

--- Comment #1 from Joey Ye joey.ye at arm dot com ---
Caused by http://gcc.gnu.org/ml/gcc-patches/2012-12/msg01229.html; the reason is that the stack layout shouldn't change during and after reload. I have a patch fixing it under testing.
[Bug c++/60168] Incorrect check in ~unique_ptr() when Deleter::pointer type is not a pointer type
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60168

--- Comment #2 from Jonathan Wakely redi at gcc dot gnu.org ---
The standard also specifies the behaviour of ~unique_ptr<T,D> in terms of comparing the stored pointer with nullptr.
[Bug plugins/59335] Plugin doesn't build on trunk
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59335

Joey Ye joey.ye at arm dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #9 from Joey Ye joey.ye at arm dot com ---
Resolved in trunk
[Bug c++/60168] Incorrect check in ~unique_ptr() when Deleter::pointer type is not a pointer type
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60168

--- Comment #3 from Ashish Sadanandan ashish.sadanandan at gmail dot com ---
You are right, of course. Not a bug, but it's disappointing that it isn't. If that comparison were against a value-initialized unique_ptr<T, D>::pointer, instead of nullptr, it'd allow unique_ptr to be used to manage any generic `handle` type, which may not meet the requirements of NullablePointer.
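For readers who hit the same error, one common way to stay within the NullablePointer requirement is to wrap the raw handle in a small class that can be constructed from and compared with nullptr. This is only a sketch under stated assumptions: IntHandle, CountingDeleter and release_count_demo are hypothetical names, the sentinel -1 is an arbitrary choice for the "null" handle, and the deleter merely counts calls instead of releasing a real resource.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical wrapper that lets an int handle model NullablePointer.
// The sentinel -1 plays the role of the null value (an assumption).
struct IntHandle {
  int value = -1;

  IntHandle() = default;
  IntHandle(std::nullptr_t) {}     // null handle from nullptr
  IntHandle(int v) : value(v) {}   // wrap a raw handle

  explicit operator bool() const { return value != -1; }

  friend bool operator==(IntHandle a, IntHandle b) { return a.value == b.value; }
  friend bool operator!=(IntHandle a, IntHandle b) { return !(a == b); }
};

// Hypothetical deleter: counts releases instead of closing a real resource.
struct CountingDeleter {
  using pointer = IntHandle;       // picked up as unique_ptr<T, D>::pointer
  int* released;
  void operator()(IntHandle) const { ++*released; }
};

// Returns how often the deleter ran for one live and one null handle.
inline int release_count_demo()
{
  int released = 0;
  {
    std::unique_ptr<int, CountingDeleter> live(IntHandle{3},
                                               CountingDeleter{&released});
    std::unique_ptr<int, CountingDeleter> empty(IntHandle{},
                                                CountingDeleter{&released});
    assert(live.get().value == 3);
    assert(!empty);
  }  // both destructors run here; only the live handle is released
  return released;
}
```

The destructor's comparison against nullptr compiles because nullptr converts implicitly to IntHandle, so the deleter is skipped for the sentinel value and invoked once for the live handle.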
[Bug c/60170] New: No -Wtype-limits warning with -O1
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60170

Bug ID: 60170
Summary: No -Wtype-limits warning with -O1
Product: gcc
Version: 4.9.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
Assignee: unassigned at gcc dot gnu.org
Reporter: chengniansun at gmail dot com

For the expression -4L == (*g = l == 0):
1) gcc -O0 warns that the comparison is always false, which is desired.
2) with gcc -O1, the expression is optimized away, and the function fn1 directly returns 0. No -Wtype-limits warning for this expression.

Since warning is a way to notify the programmers of potential bugs, IMHO it may still be necessary to report the warning at -O1, even though Gcc has no policy to ensure the warning consistency between -O0 and -O.

$: cat s.c
unsigned short *g;
int fn1() {
  unsigned char ***const l = 0;
  return -4L == (*g = l == 0);
}
$: gcc-trunk -Wtype-limits -c s.c
s.c: In function ‘fn1’:
s.c:4:14: warning: comparison is always false due to limited range of data type [-Wtype-limits]
   return -4L == (*g = l == 0);
              ^
$: gcc-trunk -Wtype-limits -c -O1 s.c
$: gcc-trunk --version
gcc-trunk (GCC) 4.9.0 20140210 (experimental)
Copyright (C) 2014 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
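Why the comparison in the report is always false can be checked exhaustively. The sketch below assumes a typical platform where long can represent every unsigned short value; the function names are hypothetical. The assignment expression (*g = l == 0) has type unsigned short, so its value is converted to long without ever becoming negative and therefore can never equal -4L.

```cpp
#include <cassert>
#include <limits>

// Mirrors the shape of the comparison in the report: the right-hand side
// has type unsigned short, so it is converted to long before comparing and
// can only hold values in [0, USHRT_MAX].
inline bool equals_minus_four(unsigned short v)
{
  return -4L == v;
}

// Counts the unsigned short values for which the comparison holds.
inline long count_matches()
{
  long matches = 0;
  for (long v = 0; v <= std::numeric_limits<unsigned short>::max(); ++v)
    if (equals_minus_four(static_cast<unsigned short>(v)))
      ++matches;
  return matches;
}

// Self-check: no representable value makes the comparison true.
inline void self_check()
{
  assert(count_matches() == 0);
}
```

Compiling this sketch with -Wtype-limits may itself trigger the warning on equals_minus_four, which is the diagnostic the reporter expects to see regardless of optimization level.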
Re: Warn about virtual table mismatches
On 02/11/2014 10:27 PM, Jan Hubicka wrote:
On 02/11/2014 07:54 PM, Jan Hubicka wrote:

+ /* Allow combining RTTI and non-RTTI is OK. */

You mean combining -frtti and -fno-rtti compiles? Yes, that's fine, though you need to prefer the -frtti version in case code from that translation unit uses the RTTI info.

Is there some mechanism that linker will do so? At the moment we just chose variant that would be selected by linker. I can make the choice, but what happens with non-LTO?

Hmm, the linker might well make the wrong choice. Might be worth warning about this as well.

+ a type with the same name but number of virtual methods is

but different number

Jason
Re: [C++ Patch/RFC] PR 60047
On 02/06/2014 02:59 AM, Paolo Carlini wrote:
> -  if (vec_safe_is_empty (vbases))
> +  if (vbases == NULL)

vec_safe_is_empty is still more correct here. The rest of the patch is OK.

Jason
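The review comment turns on the difference between "the vector pointer is null" and "there are no elements". The sketch below illustrates that difference with std::vector rather than GCC's own vec type; safe_is_empty and is_null are hypothetical helpers written for this illustration and are not the GCC functions themselves.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a null-tolerant emptiness check: a null pointer
// and a non-null pointer to a vector without elements both count as empty.
template <typename T>
bool safe_is_empty(const std::vector<T>* v)
{
  return v == nullptr || v->empty();
}

// A bare null check only covers the first of those two cases.
template <typename T>
bool is_null(const std::vector<T>* v)
{
  return v == nullptr;
}

// Self-check showing where the two predicates disagree.
inline void self_check()
{
  std::vector<int> allocated_but_empty;
  assert(safe_is_empty<int>(nullptr));
  assert(safe_is_empty(&allocated_but_empty));
  assert(!is_null(&allocated_but_empty));  // non-null, yet nothing to iterate
}
```

Code that branches on the null check alone would treat an allocated but empty vector as if it had contents, which is presumably the case the reviewer wants to keep covered.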
Fix broken build for AVR and SPU targets
The below patch fixes the build for AVR and SPU targets, which got broken when the signature of build_function_call_vec changed. The patch passes vNULL for the extra parameter added (arg_loc), which I hope is ok for builtins?

If ok, could someone commit please? I don't have commit access.

Regards
Senthil

gcc/ChangeLog

2014-02-12  Senthil Kumar Selvaraj  senthil_kumar.selva...@atmel.com

	* config/avr/avr-c.c (avr_resolve_overloaded_builtin): Pass vNULL for arg_loc.
	* config/spu/spu-c.c (spu_resolve_overloaded_builtin): Likewise.

diff --git gcc/config/avr/avr-c.c gcc/config/avr/avr-c.c
index 98650e0..101d280 100644
--- gcc/config/avr/avr-c.c
+++ gcc/config/avr/avr-c.c
@@ -115,7 +115,7 @@ avr_resolve_overloaded_builtin (unsigned int iloc, tree fndecl, void *vargs)
       fold = targetm.builtin_decl (id, true);
       if (fold != error_mark_node)
-        fold = build_function_call_vec (loc, fold, args, NULL);
+        fold = build_function_call_vec (loc, vNULL, fold, args, NULL);
       break; // absfx
@@ -181,7 +181,7 @@ avr_resolve_overloaded_builtin (unsigned int iloc, tree fndecl, void *vargs)
       fold = targetm.builtin_decl (id, true);
       if (fold != error_mark_node)
-        fold = build_function_call_vec (loc, fold, args, NULL);
+        fold = build_function_call_vec (loc, vNULL, fold, args, NULL);
       break; // roundfx
@@ -238,7 +238,7 @@ avr_resolve_overloaded_builtin (unsigned int iloc, tree fndecl, void *vargs)
       fold = targetm.builtin_decl (id, true);
       if (fold != error_mark_node)
-        fold = build_function_call_vec (loc, fold, args, NULL);
+        fold = build_function_call_vec (loc, vNULL, fold, args, NULL);
       break; // countlsfx
     }
diff --git gcc/config/spu/spu-c.c gcc/config/spu/spu-c.c
index 411496d..9d7aa5a 100644
--- gcc/config/spu/spu-c.c
+++ gcc/config/spu/spu-c.c
@@ -181,7 +181,7 @@ spu_resolve_overloaded_builtin (location_t loc, tree fndecl, void *passed_args)
       return error_mark_node;
     }
-  return build_function_call_vec (loc, match, fnargs, NULL);
+  return build_function_call_vec (loc, vNULL, match, fnargs, NULL);
 #undef SCALAR_TYPE_P
 }
Re: PATCH: PR target/60151: HAVE_AS_GOTOFF_IN_DATA is mis-detected on x86-64
On Tue, Feb 11, 2014 at 9:41 PM, H.J. Lu hjl.to...@gmail.com wrote:

HAVE_AS_GOTOFF_IN_DATA defines a 32-bit assembler feature, we need to pass --32 to assembler. Otherwise, we get the wrong result on x86-64. We already pass --32 to assembler on x86. It should be OK to do it in configure. OK for trunk?

This would break Solaris/x86 with as configurations, where this test currently passes, but would fail since as doesn't understand --32.

How about passing --32 to as only for Linux? OK to install?

I'd rather do it for gas instead, which can be used on non-Linux systems, too.

Sure. Here is the new patch. OK to install?

Attached is slightly changed patch that follows established configure.ac code formatting. Please check if this version works for you. The patch is OK for mainline and release branches.

Thanks,
Uros.

Index: configure
===================================================================
--- configure	(revision 207710)
+++ configure	(working copy)
@@ -25028,6 +25028,10 @@
     # These two are used unconditionally by i386.[ch]; it is to be defined
     # to 1 if the feature is present, 0 otherwise.
+    as_ix86_gotoff_in_data_opt=
+    if test x$gas = xyes; then
+      as_ix86_gotoff_in_data_opt=--32
+    fi
     { $as_echo "$as_me:${as_lineno-$LINENO}: checking assembler for GOTOFF in data" >&5
 $as_echo_n "checking assembler for GOTOFF in data... " >&6; }
 if test "${gcc_cv_as_ix86_gotoff_in_data+set}" = set; then :
@@ -25044,7 +25048,7 @@
 	nop
 	.data
 	.long .L0@GOTOFF' > conftest.s
-    if { ac_try='$gcc_cv_as $gcc_cv_as_flags -o conftest.o conftest.s >&5'
+    if { ac_try='$gcc_cv_as $gcc_cv_as_flags $as_ix86_gotoff_in_data_opt -o conftest.o conftest.s >&5'
   { { eval echo "\"\$as_me\":${as_lineno-$LINENO}: \"$ac_try\""; } >&5
   (eval $ac_try) 2>&5
   ac_status=$?
Index: configure.ac
===================================================================
--- configure.ac	(revision 207710)
+++ configure.ac	(working copy)
@@ -3867,8 +3867,13 @@
     # These two are used unconditionally by i386.[ch]; it is to be defined
     # to 1 if the feature is present, 0 otherwise.
+    as_ix86_gotoff_in_data_opt=
+    if test x$gas = xyes; then
+      as_ix86_gotoff_in_data_opt=--32
+    fi
     gcc_GAS_CHECK_FEATURE([GOTOFF in data],
-      gcc_cv_as_ix86_gotoff_in_data, [2,11,0],,
+      gcc_cv_as_ix86_gotoff_in_data, [2,11,0],
+      [$as_ix86_gotoff_in_data_opt],
 [	.text
 .L0:
 	nop